2011-03-09

How to choose a Haskell array library

Choosing an array type in Haskell is a difficult task. For one-dimensional random access data structure the vector library seems to be the optimal choice most of the time. Things are more complicated if you happen to need two- or multi-dimensional arrays (matrices), access their blocks and slices as first-class structures (like in Python), enjoy destructive updates, use some linear algebra, interoperate with C and run your code in parallel...

I've reviewed what array libraries are available on Hackage, and compiled this feature matrix.

It is not complete and not finished, please let me know if I am mistaken about some of the libraries.

Data.Array and its variants from the array library seem to be the standard choice of multi-dimensional arrays for a Haskeller. They are not good anymore when you need to write array-to-array operations, access their blocks and slices, or need some linear algebra in general. Choosing the right variant is another question.

Data.Vector from the vector library is fast and has very nice API. It is going to become part of the Haskell Platform. Unfortunately, it is not usable for multi-dimensional arrays, and it doesn't support slices (strides). Only boxed variant is parallelizable.

Data.Packed.Vector and Data.Packed.Matrix from the hmatrix library provide a very nice API, and can do almost anything, if all you need is at most two-dimensional array (a matrix). A big warning sign: hmatrix is GPL. Not a sensible LGPL, but the poisonous GPL library. Also, Data.Packed.Vector and Data.Packed.Matrix are not parallelizable as far as I can see.

Data.Vector.*, Data.Matrix.* and Data.Tensor.* from the blas library. They can do all the standard linear algebra which the BLAS level 3 can offer. Their API is designed with BLAS API in mind. I didn't check if they are interoperable with C arrays, or if the elements are unboxed, or if the operations are parallelizable. The last release was in January 2009, it is not buildable on new GHC 7 yet.

Finally, there is new Data.Array.Repa arrays from the repa library. Nice thing about them: they are designed for parallelization. Not so nice thing about them: they don't give performance advantages on GHC 6, and are not yet buildable on GHC 7. Some more important limitations of the repa: no access to strides or array blocks, no built-in linear algebra, no interoperability with C.

I didn't include in the table Vec and vect libraries, which provide special case solutions in the low-dimension arrays and low-rank matrices. They are mostly tailored towards computer graphics.