2009-08-26

3 ways to re-indent XML

A lot of data is stored in XML, but it is often hard to read: written by programs for programs, with everything on one line. Re-indenting such files automatically makes them much easier to read.

1. Using XSLT
I have a file with an XSL transformation:
<xsl:stylesheet version="1.0" 
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>
<xsl:param name="indent-increment" select="'   '" />

<xsl:template match="*">
   <xsl:param name="indent" select="'&#xA;'"/>

   <xsl:value-of select="$indent"/>
   <xsl:copy>
     <xsl:copy-of select="@*" />
     <xsl:apply-templates>
       <xsl:with-param name="indent"
            select="concat($indent, $indent-increment)"/>
     </xsl:apply-templates>
     <xsl:value-of select="$indent"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="comment()|processing-instruction()">
   <xsl:copy />
</xsl:template>

<!-- WARNING: this is dangerous. Handle with care -->
<xsl:template match="text()[normalize-space(.)='']"/>

</xsl:stylesheet>
I found it here. There are also some other variants.

In addition to the XSLT file, I have a one-line script that actually runs the transformation. It uses xmlstarlet, a handy command-line utility for working with XML.

#!/bin/sh
xmlstarlet tr ~/bin/indent-xml.xsl
I saved this script as xmlindent; it reads the XML from standard input, so run it as:
$ xmlindent < original.xml
Other XSLT processors can be used instead of xmlstarlet. For example, xsltproc should work too.
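
A minimal sketch of how the wrapper could fall back to xsltproc when xmlstarlet is not installed. The function name indent_xml and the STYLESHEET variable are made up for this illustration; the stylesheet path is the one used in the script above. I have only reasoned through the xsltproc branch, so treat it as a starting point rather than a tested recipe.

```shell
#!/bin/sh
# Hypothetical wrapper: re-indent XML from stdin with whichever XSLT
# processor is available. STYLESHEET is an assumed variable name.
STYLESHEET="${STYLESHEET:-$HOME/bin/indent-xml.xsl}"

indent_xml() {
  if command -v xmlstarlet >/dev/null 2>&1; then
    # With no input file given, xmlstarlet tr reads the document from stdin
    xmlstarlet tr "$STYLESHEET"
  elif command -v xsltproc >/dev/null 2>&1; then
    # "-" tells xsltproc to read the document from stdin
    xsltproc "$STYLESHEET" -
  else
    echo "indent_xml: neither xmlstarlet nor xsltproc found" >&2
    return 127
  fi
}
```

With the function loaded into the shell, usage would be: indent_xml < original.xml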
2. Using xmllint
The libxml2-utils package (Debian/Ubuntu) includes xmllint, an XML validation tool. It can also reformat (indent) XML:
$ xmllint --format original.xml
This is probably even easier.
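
xmllint can also be used as a filter, and the indent string can be changed through the XMLLINT_INDENT environment variable. The sketch below wraps both in a small function; the name format_xml is hypothetical, and the default of two spaces is meant to match what xmllint does on its own.

```shell
#!/bin/sh
# Hypothetical helper: reformat XML from stdin with xmllint.
# $1 (optional) is the indent string; two spaces if omitted.
format_xml() {
  # XMLLINT_INDENT is read by xmllint --format; "-" means read from stdin
  XMLLINT_INDENT="${1:-  }" xmllint --format -
}
```

For example, to indent with four spaces: format_xml '    ' < original.xml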
3. xmlindent
xmlindent is a utility written in pure C with almost no dependencies. It is intended to do just what its name says: indent XML. I have not tried it.


Monitor file changes in a shell script

Problem: monitor file changes from a shell script and execute some commands when they occur. For example, rebuild a LaTeX document or recompile a program every time one of its source files changes.

Solution: inotify-tools helps to monitor file changes. It provides two utilities. The first one, inotifywait, blocks until a change happens and then returns. If the event it was waiting for occurred, its exit code is 0 (success). See the example of using inotifywait below. The second utility, inotifywatch, monitors changes to files, collects statistics, and prints a summary table on exit. Please visit the inotify-tools site to see examples of its use.
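
To make the exit code concrete, here is a small sketch that waits for a modification with a timeout and reports what happened. The function name wait_for_change is made up for this illustration. It relies on inotifywait exiting with 0 when an event was received and 2 when the -t timeout expired; anything else is treated as an error.

```shell
#!/bin/sh
# Hypothetical helper: wait up to $1 seconds for one of the given files
# to be modified, then print "changed", "timeout" or "error".
wait_for_change() {
  timeout_s=$1
  shift
  rc=0
  # -qq: print nothing except fatal errors; -t: timeout in seconds;
  # -e modify: react only to modifications
  inotifywait -qq -t "$timeout_s" -e modify "$@" 2>/dev/null || rc=$?
  case $rc in
    0) echo changed ;;
    2) echo timeout ;;
    *) echo error; return 1 ;;
  esac
}
```

For example: wait_for_change 30 mypaper.tex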

Example: inotifywait monitors all *.tex and *.bib files in the current directory, and when any of them changes, the loop runs pdfLaTeX and BibTeX to rebuild the document:

while true ; do
  inotifywait *.tex *.bib \
  && ( pdflatex -interaction=nonstopmode mypaper && \
       bibtex mypaper && \
       pdflatex -interaction=nonstopmode mypaper )
done
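
The same idea can be wrapped in a reusable function. This is only a sketch: the name run_on_change is hypothetical, and restricting the events to close_write assumes that your editor writes files in place (editors that save by renaming a temporary file may need extra events such as move_self). Unlike the loop above, it stops when inotifywait fails instead of spinning forever.

```shell
#!/bin/sh
# Hypothetical helper: run a command every time one of the files is
# written. $1 is the command line; the remaining arguments are files.
run_on_change() {
  cmd=$1
  shift
  # The loop ends as soon as inotifywait returns a non-zero exit code
  # (missing file, inotifywait not installed, and so on).
  while inotifywait -qq -e close_write "$@" 2>/dev/null; do
    sh -c "$cmd"
  done
}
```

For example: run_on_change 'pdflatex -interaction=nonstopmode mypaper' *.tex *.bib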

P.S. Please note that when LaTeX runs with -interaction=nonstopmode, it does not stop to ask questions on errors, but the errors are still printed.

P.P.S. inotify-tools runs only on Linux. On *BSD you may need to use pnotify or kqueue instead.
