*nix tips: 2010

2010-11-25

Unplot.py: from plots to tabular data

Recently, when doing backups, I noticed a script on my hard drive, which I think may be useful to someone else. It takes an image with a line plot and generates a data file for that plot. So it does an operation inverse to plotting, i.e. unplotting.

Download: unplot.py.

The script is available on bitbucket. Feel free to improve.

To run the script, you first need to decide which part of the plot image you want to scan, and what values the pixels correspond to. I prefer to use Gimp or Geeqie to find pixel coordinates.

If there are many lines on the same plot, you may also want to recolour the line you are interested in with some distinct colour. Use Gimp if necessary. Write down the colour's HTML code.

This plot is a good starting point:

Then you can run the script against it. In the directory with the script, run:

./unplot.py "#00ff00" 0 151 0 475 5.0 824 0.09 85 /path/to/plot.png > /path/to/data.txt

The first parameter is the HTML colour code of the line to select. Next come the values and the pixel coordinates of the bottom left corner of the plot: X value, X pixel coordinate, Y value, Y pixel coordinate. Then the same for the top right corner. The last parameter is the name of the file with the plot. The output is redirected to a text file. Please note that the origin of digital images is usually the top left corner, so the Y pixel coordinate grows downwards.
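To make the role of the two corners clearer, here is a minimal sketch of how a pixel coordinate can be mapped back to a data value. It assumes linear axes and is not taken from unplot.py itself; the function name pixel_to_value is made up for this illustration.

```python
def pixel_to_value(pixel, value_a, pixel_a, value_b, pixel_b):
    """Linearly interpolate a data value from a pixel coordinate.

    (value_a, pixel_a) and (value_b, pixel_b) are two calibration points
    on the same axis, e.g. the bottom left and top right corners.
    """
    scale = (value_b - value_a) / float(pixel_b - pixel_a)
    return value_a + (pixel - pixel_a) * scale


# Calibration numbers from the command line above.
# X axis: value 0 at pixel 151, value 5.0 at pixel 824.
x_mid = pixel_to_value(487.5, 0, 151, 5.0, 824)
# Y axis: value 0 at pixel 475, value 0.09 at pixel 85 (pixels grow downwards).
y_mid = pixel_to_value(280, 0, 475, 0.09, 85)
print(x_mid, y_mid)
```

The same formula works for both axes: because the Y pixel coordinates are given in image order, the inverted direction is handled by the negative scale.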

After running the script, try plotting the data once again to make sure you selected the right colour and region. I went too far to the right in this example, and the green letters were mistaken for part of the plot.

In Russian: unplot.py: извлекаем табличные данные из графиков.

2010-11-11

How to select a region by intensity range in Gimp?

I wrote this as an answer to a Photo.StackExchange question on how to select by histogram range in Gimp.

Step-by-step

Make a copy of the layer (Layer → Duplicate Layer).
Select the duplicate layer, apply threshold (Colors → Threshold) to select the range of intensities.
Go to Layer → Mask → Add Layer Mask (or right-click in the list of layers). Select “Grayscale copy of layer” and “Invert mask”.
In the same menu, choose Mask to Selection.
Hide or remove the layer with mask.
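
For readers who think in code, the idea behind these steps (threshold, then turn the result into a selection) can be sketched in a few lines of Python. This is only an illustration on a tiny hand-made grayscale image, not something Gimp runs; the function name select_by_intensity and the sample data are made up.

```python
def select_by_intensity(image, low, high):
    """Return a mask that is True where low <= pixel <= high.

    `image` is a list of rows of grayscale values in the range 0..255.
    """
    return [[low <= px <= high for px in row] for row in image]


# A bright background with a dark 2x2 blob in the middle.
image = [
    [250, 250, 250, 250],
    [250,  10,  20, 250],
    [250,  15,  12, 250],
    [250, 250, 250, 250],
]

# Select the dark range, like thresholding so that the blob turns black.
mask = select_by_intensity(image, 0, 127)
selected = sum(px for row in mask for px in row)
print(selected)
```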

An example:

An original image. I want to select the circle:

Original image

Make a copy of the layer:

Duplicate layer

Apply threshold. Note that the area to be selected is black:

Threshold

Add Layer Mask using the grayscale value of the image:

Add Layer Mask

Now you've obtained an image with the mask. Everything except the black circle is transparent (we can see the bottom layer through it):

Image with the mask

Convert the mask to selection. Switch to the original layer. The circle is selected.

Mask to selection

2010-09-14

JavaScript highlighter for Haskell code

In this and in my other Blogspot blog I use SHJS to highlight syntax in Haskell snippets. SHJS is a JavaScript highlighter which uses language definitions from GNU Source-highlight.

To use it, you need to put online:

SHJS main script: sh_main.min.js,
One or more language definitions files. Haskell one: sh_haskell.min.js,
One of the stylesheets. (You can preview them on the main SHJS page)

You may wish to download more than one language definition and concatenate all JavaScript files together. It's OK.

Then put this somewhere in the <head>:

<head>
...
<script type="text/javascript" src="http://example.com/path/to/sh_main.min.js"></script>
<script type="text/javascript" src="http://example.com/path/to/sh_haskell.min.js"></script>
<link type="text/css" rel="stylesheet" href="http://example.com/path/to/stylesheet.css">
...
</head>

And add a callback to the document's body element:

<body onload="sh_highlightDocument();">

Finally, wrap your Haskell snippets with <pre class="sh_haskell"></pre>. This is how it looks:

-- Fibonacci numbers
fibs :: [Int]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

As far as I know, the Haskell language definition has already found its way into GNU Source-highlight, but the corresponding file is not yet shipped with SHJS. So you can either download it from my source repository, or generate it from the language definition file (haskell.lang) yourself.

Haskell is a quirky language to highlight, but this language definition handles it reasonably well, better than the current google-code-prettify, and much better than the current GeSHi highlighter.

Update, 2011-01-25: Surprisingly, this highlighter is still better than the more recent highlight.js and an alternative language definition for SHJS by Nicolas Wu (zenzike).

So this is the complete list of JavaScript highlighters for Haskell which I know about:

this highlighter (SHJS + Haskell language definition) (test screenshot)
highlight.js (Haskell is supported since January 2010) (test screenshot)
google-code-prettify (test screenshot)
GeSHi (test screenshot)
An alternative Haskell language definition for SHJS by Nicolas Wu (test screenshot)

Typography keyboard layout in Linux

From a Russian blog Slovomania I learned that in the new versions of X11 there is a typography layout option, similar to Ilya Birman layout. This is its Linux counterpart:

Typography layout in Linux

To enable it in GNOME: System → Preferences → Keyboard. Then Layouts tab, Options.... Enable Key to choose 3rd level (right Alt will do) and Enable extra typographic characters under Miscellaneous compatibility options.

Also in Russian: Типографская раскладка в Линуксе.

2010-09-04

LibZip 0.1: read and write zip archives from Haskell

I am happy to present a major release of my Haskell bindings to the libzip library, which manipulates zip archives. It took me longer than I initially expected, but I finally like the result. Essential links first:

Hackage page
Source repository
Documentation
Test coverage report
An example

What's new

LibZip 0.1 is a complete remake. Under the hood it was made with bindings-DSL instead of C2Hs as before. The new LibZip offers a lot:

Support of almost all features of libzip: creating, reading, updating, renaming, and deleting files in zip archives, reading and writing file and archive comments. (LibZip 0.0 was read-only)
Support of various data sources: supply contents of a file from a list, from a file on disk, from a file in another archive, or even from a Haskell function.
A new monadic interface. It takes care of managing handles and pointers behind the scenes. Less to type, less room for user error.
Unit tests, better documentation and examples.

LibZip is the only non-GPL library for Haskell to deal with zip archives. It is also fast when dealing with large on-disk archives (more about it below).

Users of LibZip 0.0 (all 1.5 of them) may still use the old API by importing Codec.Archive.LibZip.LegacyZeroZero instead of Codec.Archive.LibZip. However, the old API is deprecated and will not be supported in the future.

LibZip vs zip-archive

There is another Haskell library to deal with zip archives, namely zip-archive. And here are the differences:

                        LibZip 0.1/libzip   Zip-Archive 0.1
License                 BSD                 GPL v.2
Pure?                   No                  Yes
Large on-disk archives  Fast                Slow

A few notes about the last row. This is the actual reason why LibZip exists. Zip-archive was unacceptably memory-hungry and slow when dealing with large archives. So I started working on LibZip. I suppose that the problem with zip-archive is that it works with lazy bytestrings, not files. Bytestrings are sequential and don't have fseek. There is no reliable way to implement random access.
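
To see why seeking matters, recall that the zip format keeps its index (the central directory) at the end of the file. The sketch below, in Python rather than Haskell, builds a tiny archive in memory and reads only the trailing end-of-central-directory record. It illustrates the file format only and says nothing about how either library is implemented; the file names inside the archive are made up.

```python
import io
import struct
import zipfile

# Build a small archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "first")
    zf.writestr("b.txt", "second")
data = buf.getvalue()

# End-of-central-directory record: signature, disk numbers, entry counts,
# central directory size and offset, comment length (22 bytes in total).
EOCD = struct.Struct("<4s4H2LH")

# Assumption: the archive has no trailing comment, so the record is
# exactly the last 22 bytes of the file.
(sig, _disk, _cd_disk, _entries_here,
 total_entries, cd_size, cd_offset, comment_len) = EOCD.unpack(data[-EOCD.size:])

print(total_entries, cd_offset, cd_size)
```

With a seekable file, a reader can jump to the end, read this record, and then jump straight to the central directory without touching the compressed data in between.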

To get an idea of the problem, get a moderately sized zip archive off the web (for example, this one, 22 MiB), and print the list of files using both libraries.

With zip-archive, I used this code:

import Codec.Archive.Zip
import Control.Monad (liftM)
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as BS

main = mapM_ list =<< getArgs

list file = do
  a <- toArchive `liftM` BS.readFile file
  mapM_ print $ filesInArchive a

On my laptop, it takes 3.5 seconds to run against the downloaded archive:

$ time ./zip-archive-ls pak128-1.4.6--102.2.zip > /dev/null

real 0m3.499s
user 0m3.430s
sys 0m0.050s

And this is the code using LibZip:

import Codec.Archive.LibZip
import System.Environment (getArgs)

main = mapM_ list =<< getArgs

list file =
    withArchive [] file $ do
      names <- fileNames []
      lift $ mapM_ print names

It takes 0.05 seconds on the same file:

$ time ./libzip-ls pak128-1.4.6--102.2.zip > /dev/null

real 0m0.051s
user 0m0.040s
sys 0m0.010s

The difference gets more dramatic as the size of the archive increases. So, in my opinion, license issues aside, LibZip is a better choice when dealing with large archives on disk. Zip-archive may be a more suitable choice for generating small archives in memory, without even hitting the disk.

Some implementation notes

I switched from C2Hs to bindings-DSL to implement the FFI bindings, and I actually liked bindings-DSL more. It is simple and makes fewer assumptions about the semantics of the C code. As a result, I had working low-level bindings very early. The rest was just wrapping them in a higher-level API to my liking. The C2Hs experience was less smooth: in particular, when a C function was not designed the way C2Hs expects, I had to write wrappers manually anyway (for example, if a function returns a value and also writes something to memory). Bindings-DSL also seems to be better supported right now.

I changed the order of file names and file access flags in all API functions. Having the file name last seems to be more useful for partial function application. An example of such an order of arguments is:

fileSize :: [FileFlag] -> FilePath -> Archive Int

I ditched ByteString support from the new API. ByteStrings are my Haskell nightmare: there are too many flavours of them to support, and they are not interchangeable. With LibZip 0.0 I had to support two versions of otherwise identical code, only to discover some time later that I needed to pack . unpack bytestrings in the application code (impedance mismatch with another library, which chose to use a different flavour of bytestrings).

In this version I chose to use lists as input and output buffers to some functions (sourceBuffer, sourcePure, readBytes, readContents). I suspect this may have a negative performance impact, but it needs to be studied. Marshalling of the byte buffers is another question. I suppose that sourceFile and sourcePure may help to work around this problem if it actually arises. User feedback is required.

Some of the library functions (most notably sourceBuffer) accept Strings as data buffers. This is convenient for testing, but those Strings should not contain code points above 255. The library doesn't handle text encodings. The user is responsible for providing a correctly encoded byte stream to the library.
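
The same constraint can be shown with a Python analogue (not LibZip code): a string maps to a byte buffer one character per byte only if every code point fits in 0..255, so any other text has to be encoded explicitly first. What LibZip itself does with larger code points is not covered here; the helper name to_byte_buffer is made up.

```python
def to_byte_buffer(text):
    """Convert a string to bytes, one code point per byte.

    Raises UnicodeEncodeError if any code point is above 255.
    """
    return text.encode("latin-1")


# U+00E9 (e with acute accent) fits into a single byte.
ok = to_byte_buffer("caf\u00e9")

# Six Cyrillic letters: code points above 255, so encode explicitly first.
cyrillic = "\u043f\u0440\u0438\u0432\u0435\u0442"
utf8 = cyrillic.encode("utf-8")

try:
    to_byte_buffer(cyrillic)
    failed = False
except UnicodeEncodeError:
    failed = True

print(ok, len(utf8), failed)
```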

Libzip can use C callbacks as a data source. LibZip bindings can wrap a pure Haskell function and make the C library call it when necessary (see sourcePure). It is not as convenient as the usual lazy evaluation in Haskell, but, hopefully, it may somewhat compensate for the impurity of the library. I am also considering adding sourceIO.

Thanks?

LibZip is under the BSD3 license. So it is Free. If you want to say “thanks”, consider using this Flattr button:

I think Flattr is a great idea and I'll be glad if more people start using it.

If you use the library, please let me know. It will make me happier, and will motivate me more to support and improve the library.

2010-08-05

Cleaning sensor dust with Gimp

Recently I found an easy cure for dust spots on the photos. I am talking about sensor dust here. Any owner of a camera with interchangeable lenses (and more than one lens) sees out-of-focus dark spots sooner or later. It is sensor dust. For example:

It is easily recognised by always appearing in the same place. The dust spots are most noticeable when shooting a uniform bright object (e.g. sky) with a small aperture (f/22 or smaller). And cleaning the lens will not get rid of the spots. The right way to address it is to clean the sensor. Someone else wrote about it.

Let's talk about how to remove these spots from the photos already taken. For this purpose I use Gimp and its Resynthesizer plugin. The clone brush is OK too, but cleaning more than a photo or two is tiresome. Think of Resynthesizer as an automatic clone brush. Ubuntu users may install it with the gimp-resynthesizer package.

To clean dust spots, first find and select them. I usually use the free selection tool (Lasso). Press Shift to add more than one spot to the selection. Also select a handful of “clean” pixels around each spot. It matters which pixels are on the border of the selection.

Run the Filters → Map → Resynthesize filter. Default parameters should be OK; tiling options are not necessary for our purposes.

Now you have to wait a few seconds. Resynthesizer takes some time to redraw the selection. Anyway, it is faster than using the clone brush manually.

Finally, inspect the result. Make sure that the plugin didn't draw anything strange. Usually it is OK on the first attempt:

Enjoy!

Also in Russian: Удаление пыли на матрице в Gimp.

2010-07-21

.emacs of a Vim user

I've been using Emacs for a few weeks now, and I touch .emacs less and less often. Mostly I added alternatives to some Vim commands and defined more ergonomic keybindings. Here it is:

The hierarchy of numeric typeclasses in Haskell

Some time ago I posted this in my Russian blog. Re-posting it here, for it's inconvenient to not be able to find it when googling in English. I use it for reference :-)

Non-abstract types are gathered together in gray frames. Polymorphic types and type classes have rounded boxes. Their possible type parameters are indicated with inversed rhombus arrows.

2010-06-28

Working AppEngine environment on Ubuntu Lucid

Ubuntu Lucid ships Python 2.6. Python 2.5 has been completely removed. Google AppEngine requires Python 2.5 to work. So if you want to develop AppEngine apps on a Lucid machine, you need to set up your working environment manually. This post tells how to do it.

(I assume you install Python to /opt/python2.5 and the user can write there; I assume you create a virtual environment, a directory where Python packages are installed, in $HOME/.py25. Choose different locations if you like and adjust instructions accordingly)

1. Get the latest Python 2.5 release and build it from source.

Make sure that you have the necessary development libraries installed. In particular, you probably want to install libsqlite3-dev before building Python. Otherwise you'll have a Python without SQLite3 support, and GAE will not work with it.

Go to a directory where you build software and run from the command line:

wget http://www.python.org/ftp/python/2.5.5/Python-2.5.5.tar.bz2 -O - | tar jx
cd Python-2.5.5
./configure --prefix=/opt/python2.5
make -j 2
make install
cd ..

If there are configure errors, likely missing packages or header files for C libraries, install them and repeat.
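
Once the build finishes, one way to check that optional modules such as sqlite3 were actually compiled in is to try importing them with the new interpreter. This is a minimal sketch; the helper name has_module and the file name check_modules.py are made up for illustration. Save it as check_modules.py and run it with /opt/python2.5/bin/python.

```python
def has_module(name):
    """Return True if the module can be imported by this interpreter."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


# sqlite3 is the module mentioned above; add other names as needed.
for name in ("sqlite3",):
    if has_module(name):
        print(name + ": available")
    else:
        print(name + ": MISSING, install the development package and rebuild")
```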

2. Get virtualenv and set up a new Python environment

You will use a separate Python environment for AppEngine. So you will not mess with system packages. Fetch and unpack virtualenv:

wget http://bitbucket.org/ianb/virtualenv/get/tip.gz -O - | tar zx

Run it to get a new environment in ~/.py25 (I use full path to the newly installed Python 2.5 here):

/opt/python2.5/bin/python virtualenv/virtualenv.py ~/.py25

Now, to enable Python 2.5, source the activate script from this environment; to disable it, run deactivate:

$ which python ; python --version
/usr/bin/python
Python 2.6.5
$ source ~/.py25/bin/activate
(.py25)$ which python ; python --version
/home/sergey/.py25/bin/python
Python 2.5.5
(.py25)$ deactivate

To install packages for Python 2.5 you can use pip. For example, to install Python Imaging Library (PIL), which is used by AppEngine, run:

(.py25)$ pip install PIL

Or you can use -E ~/.py25 option of pip without activating the environment.

3. Run the development server

Put your GAE SDK in the PATH, get the source of the application, and run the server. For example:

(.py25)$ git clone http://github.com/anotherjesse/webpy-appengine-helloworld.git gae-hello
...
(.py25)$ cd gae-hello
...
(.py25)$ dev_appserver.py .
INFO     2010-06-28 16:35:04,755 appengine_rpc.py:159] Server: appengine.google.com
INFO     2010-06-28 16:35:04,761 appcfg.py:357] Checking for updates to the SDK.
INFO     2010-06-28 16:35:05,068 appcfg.py:371] The SDK is up to date.
INFO     2010-06-28 16:35:05,106 dev_appserver_main.py:431] Running application hello-webpy on port 8080: http://localhost:8080

2010-06-15

Network Management disabled

Sometimes (I suspect it happens after a complete battery discharge and an unclean shutdown), my Ubuntu Lucid machine boots with networking disabled. This is a nasty bug, because it prevents you from googling for a solution.

The solution is

sudo sed -i 's/\(NetworkingEnabled=\).*/\1true/' /var/lib/NetworkManager/NetworkManager.state

and then

sudo service network-manager restart

https://bugs.launchpad.net/ubuntu/+bug/555571
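
For those less familiar with sed, the substitution above keeps the key and replaces whatever follows the equals sign with true. Here is the same regular expression applied in Python to a made-up sample of the state file (the real file may contain other keys; the function name enable_networking is hypothetical):

```python
import re


def enable_networking(state_text):
    """Set NetworkingEnabled to true, leaving all other lines untouched."""
    return re.sub(r"(NetworkingEnabled=).*", r"\1true", state_text)


# Hypothetical file contents, for illustration only.
sample = "[main]\nNetworkingEnabled=false\nWirelessEnabled=true\n"
print(enable_networking(sample))
```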

OpenMP in the Multicore Era

Nice slides by Christian Terboven on “OpenMP in the Multicore Era”. Enough to get started.

http://www.autodiff.org/ad08/talks/ad08_terboven.pdf

By the way, to build an OpenMP program with modern GCC it’s enough to run:

gcc -fopenmp -o eval42 eval42.c

It should work out of the box. GCC 4.2 supports OpenMP 2.5, and GCC 4.4 supports OpenMP 3.0.

2010-06-14

Trying to do my daily tasks with Emacs

Lots of frustration, but I am slowly remembering some habits from 10 years ago. The thing I miss the most is the ability to move around quickly and to delete or change a text object, such as a word, a sentence, a paragraph, part of a line, a tag, etc. All those dsomething commands from Vim.

Some command correspondence (beyond the crash course available elsewhere): (open table in a separate page)

It seems it's time to learn Emacs Lisp. I've bound toggle-viper-mode to C-<escape>. It's my preferred mode to move the cursor and delete stuff.
