2008-06-09

antiodt: view OpenOffice documents as plain text

I don't like launching a heavy office application just to read a file. There are antiword and wv for reading MS Word *.doc files, unrtf for RTF, and pdftotext for PDF. Only ODT (OpenDocument, the open ISO-standard format produced by OpenOffice) cannot be read that way. o3read seems to be useless for the new ODT files.

So here is the one-and-a-half-line script (antiodt) that I use to view OpenOffice files quickly from the shell prompt:

#!/bin/sh
unzip -p "$1" content.xml | \
xmlstarlet sel -N text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" \
  -T -t -m '//text:p' -v . -n | less

An ODT file is just an ordinary ZIP archive that contains an XML file, content.xml, holding the document's content. I use xmlstarlet to extract the text paragraphs from that XML. All formatting is lost, of course, but it is fast:

$ antiodt document.odt

I got the idea from here.
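
The script above can also be wrapped in a function that checks its argument before touching the archive. This is only a sketch of one possible variant, not the script I actually use: the function name odt_text and the usage message are made up for illustration, and matching text:h (OpenDocument headings) alongside text:p is my own addition, on the assumption that headings are wanted in the output too.

```shell
#!/bin/sh
# Hypothetical variant of antiodt: same unzip + xmlstarlet pipeline,
# plus an argument check. odt_text is an illustrative name.

# Print the heading and paragraph text of an ODT file to standard output.
odt_text() {
    # Refuse to run without exactly one readable file argument.
    if [ $# -ne 1 ] || [ ! -r "$1" ]; then
        echo "usage: odt_text FILE.odt" >&2
        return 1
    fi
    # content.xml holds the document body; print each heading (text:h)
    # and paragraph (text:p) on its own line, in document order.
    unzip -p "$1" content.xml |
    xmlstarlet sel -N text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" \
        -T -t -m '//text:h | //text:p' -v . -n
}
```

Because the function writes to standard output instead of calling less itself, it can be paged with `odt_text document.odt | less` or piped into grep.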

Update 2009-09-23: To convert ODT to plain text while preserving some of the formatting, use the odt2txt.py script, which converts ODT to Markdown.

This post in Russian: antiodt: просмотр документов OpenOffice в виде простого текста