2009-08-26

3 ways to re-indent XML

A lot of data is stored in XML formats, but it is often hard to read: written by programs for programs, with everything on one line. Re-indenting the XML automatically makes such files much easier to read.

1. Using XSLT
I have a file with an XSL transformation:
<xsl:stylesheet version="1.0" 
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>
<xsl:param name="indent-increment" select="'   '" />

<xsl:template match="*">
   <xsl:param name="indent" select="'&#xA;'"/>

   <xsl:value-of select="$indent"/>
   <xsl:copy>
     <xsl:copy-of select="@*" />
     <xsl:apply-templates>
       <xsl:with-param name="indent"
            select="concat($indent, $indent-increment)"/>
     </xsl:apply-templates>
     <xsl:value-of select="$indent"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="comment()|processing-instruction()">
   <xsl:copy />
</xsl:template>

<!-- WARNING: this is dangerous. Handle with care -->
<xsl:template match="text()[normalize-space(.)='']"/>

</xsl:stylesheet>
I found it here. There are also some other variants.

In addition to the XSLT file, I have a one-line script that actually runs the transformation. I use xmlstarlet, a nice command-line utility for working with XML.

#!/bin/sh
xmlstarlet tr ~/bin/indent-xml.xsl
Run this script (saved here under the name xmlindent) as:
$ xmlindent < original.xml
Instead of xmlstarlet you can use other XSLT processors; xsltproc, for example, should work too.
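As a rough sketch of the xsltproc variant: the wrapper below assumes the stylesheet is saved at the same ~/bin/indent-xml.xsl path as above. The function name indent_xml and the INDENT_XSL variable are made up for illustration and are not part of the original script. xsltproc expects the stylesheet first and the input document second, and accepts - for standard input.

```shell
#!/bin/sh
# Hypothetical wrapper: indent_xml and INDENT_XSL are illustrative names.
# Assumes the stylesheet shown above is saved as ~/bin/indent-xml.xsl.
indent_xml() {
    # xsltproc syntax: xsltproc STYLESHEET FILE; "-" means standard input.
    # With no argument the function reads the document from stdin.
    xsltproc "${INDENT_XSL:-$HOME/bin/indent-xml.xsl}" "${1:--}"
}
```

It can then be called either as indent_xml original.xml or as indent_xml < original.xml.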

2. Using xmllint
The libxml2-utils package (Debian/Ubuntu) includes xmllint, an XML validation tool. It can also reformat (indent) XML:
$ xmllint --format original.xml
This is even easier.
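To use xmllint as a filter like the script in the first section, something along these lines should work. It is only a sketch: the function name xmlfmt is hypothetical, and the four-space indent is an arbitrary choice. It relies on two documented xmllint behaviours: - as the file name means standard input, and the XMLLINT_INDENT environment variable sets the indent string (two spaces by default).

```shell
#!/bin/sh
# Hypothetical helper: the name xmlfmt is illustrative.
xmlfmt() {
    # XMLLINT_INDENT overrides the default two-space indent string;
    # "-" tells xmllint to read the document from standard input.
    XMLLINT_INDENT='    ' xmllint --format "${1:--}"
}
```

Call it as xmlfmt original.xml, or as xmlfmt < original.xml in a pipeline.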

3. xmlindent
xmlindent is a pure C utility with almost no dependencies. It is meant to do just what its name says: indent XML. I haven't tried it.
