*nix tips: Random Python snippets

Lists and dictionaries

Given a flat list such as [key1, value1, key2, value2], convert it to an alist (a list of key/value pairs) or a dictionary:

>>> toalist = lambda kvs: zip(kvs[0::2], kvs[1::2])
>>> toalist(range(4))
[(0, 1), (2, 3)]
>>> dict(toalist(range(4)))
{0: 1, 2: 3}
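The session above is Python 2, where zip returns a list. On Python 3, zip returns a lazy iterator, so the result has to be wrapped in list() to see the pairs. A minimal sketch of the same idea for Python 3 (the name to_alist and the sample data are only illustrative):

```python
def to_alist(kvs):
    """Pair up a flat [k1, v1, k2, v2, ...] sequence into (key, value) tuples."""
    # Even-indexed items are keys, odd-indexed items are values.
    return list(zip(kvs[0::2], kvs[1::2]))


flat = ["a", 1, "b", 2]
pairs = to_alist(flat)   # list of (key, value) tuples
mapping = dict(pairs)    # the same pairs as a dictionary
print(pairs)
print(mapping)
```

If the flat list has an odd number of items, zip silently drops the trailing key, so it may be worth checking the length first.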

Convert a dictionary to a flat list:

>>> # dict to alist
... al = list({1:2,3:4}.iteritems())
>>> al
[(1, 2), (3, 4)]
>>> # alist to flat list
... reduce(lambda acc,t: acc + list(t), al, [])
[1, 2, 3, 4]
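The reduce version builds a new list on every step, which gets slow for large dictionaries. A possible alternative, sketched here for Python 3 (where iteritems() is gone and items() is used instead), is to let itertools.chain.from_iterable concatenate the pairs:

```python
from itertools import chain

d = {1: 2, 3: 4}

# items() yields (key, value) pairs; chain.from_iterable strings them
# together into one flat sequence of key, value, key, value, ...
flat = list(chain.from_iterable(d.items()))
print(flat)
```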

To transpose a list of lists/tuples, unpack it as function arguments and pass them to zip, i.e. zip(*mylist):

>>> l = list(enumerate("abcdef"))
>>> l
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]
>>> # transpose list of lists/tuples
... zip(*l)
[(0, 1, 2, 3, 4, 5), ('a', 'b', 'c', 'd', 'e', 'f')]
>>> # once again
... zip(*_)
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]
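One thing to watch for: zip stops at the shortest row, so transposing a ragged list silently drops items. The sketch below (Python 3, with made-up sample data) shows the difference between zip and itertools.zip_longest, which pads the missing cells with None instead:

```python
from itertools import zip_longest

rows = [(0, "a"), (1, "b"), (2, "c")]
columns = list(zip(*rows))          # regular transpose
roundtrip = list(zip(*columns))     # transposing twice gives the rows back

ragged = [(1, 2, 3), (4, 5)]
truncated = list(zip(*ragged))      # the unmatched 3 is dropped
padded = list(zip_longest(*ragged)) # the missing cell becomes None

print(columns)
print(truncated)
print(padded)
```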

Flatten a list of lists:

>>> lofl = [[1,2], [3], [4,5]]
>>> import operator
>>> reduce(operator.add, lofl)
[1, 2, 3, 4, 5]

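An alternative approach is to use chain from itertools. Because chain returns a lazy iterator, it also works on huge lists, as long as you iterate over the result instead of materializing it with list():

>>> import itertools
>>> list(itertools.chain(*lofl))
[1, 2, 3, 4, 5]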
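Note that chain(*lofl) still unpacks the whole outer list into arguments up front. When the outer sequence is itself a generator, chain.from_iterable (available since Python 2.6) consumes it lazily. A small sketch, where numbered_rows is a hypothetical endless source invented for the example:

```python
from itertools import chain, islice


def numbered_rows():
    """Hypothetical generator: yields [0, 1], [2, 3], [4, 5], ... forever."""
    n = 0
    while True:
        yield [n, n + 1]
        n += 2


# Nothing is consumed yet; chain.from_iterable pulls rows only on demand.
flat = chain.from_iterable(numbered_rows())

# Take just the first five flattened items from the endless stream.
first_five = list(islice(flat, 5))
print(first_five)
```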

Apply a function to either an iterable (list, tuple) or a scalar:

>>> def fmap(f,xs):
...   try: return map(f,xs)
...   except TypeError: return f(xs)
... 
>>> fmap(lambda x:x*x, range(5))
[0, 1, 4, 9, 16]
>>> fmap(lambda x:x*x, 5)
25
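The try/except around map also swallows any TypeError raised inside f itself, and on Python 3 map returns a lazy iterator rather than a list. One way around both, sketched here as an assumption rather than the only option, is to test for iterability explicitly with iter():

```python
def fmap(f, xs):
    """Apply f to each element of xs if xs is iterable, else to xs itself."""
    try:
        iterator = iter(xs)
    except TypeError:
        # xs is a scalar (not iterable): apply f directly.
        return f(xs)
    # A TypeError raised by f here is no longer masked.
    return [f(x) for x in iterator]


def square(x):
    return x * x


print(fmap(square, range(5)))
print(fmap(square, 5))
```

Strings are iterable too, so this version (like the original) maps over the characters of a string rather than treating it as a scalar.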

Strings and Unicode

Unicode handling changes in Python 3.0. In earlier versions, it is important to distinguish byte strings ("abc") from unicode strings (u"abc"). The former can be converted to the latter with unicode():

>>> "абв"
'\xd0\xb0\xd0\xb1\xd0\xb2'
>>> u"абв"
u'\u0430\u0431\u0432'
>>> unicode("абв","utf8")
u'\u0430\u0431\u0432'

Note that there are three Unicode characters in the original literal and three code points in the unicode string, whereas the plain string above holds six bytes. Text should be represented internally as unicode. Any communication with the outside world usually requires the unicode data to be encoded. There are various encodings; UTF-8 is one of the most common. Conversely, any encoded input should be decoded before it is processed:

>>> "абв".decode("utf8")
u'\u0430\u0431\u0432'
>>> u"абв".encode("utf8")
'\xd0\xb0\xd0\xb1\xd0\xb2'
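For comparison, here is a sketch of the same round trip on Python 3, where text literals are unicode by default (type str) and encoded data is a separate bytes type. The literal is written with escapes so the example does not depend on the source file encoding:

```python
text = "\u0430\u0431\u0432"          # the same three Cyrillic letters

encoded = text.encode("utf-8")       # str -> bytes, for output
decoded = encoded.decode("utf-8")    # bytes -> str, for processing

print(len(text))     # number of code points
print(len(encoded))  # number of bytes after UTF-8 encoding
print(encoded)
```

Each of these letters takes two bytes in UTF-8, which is why the encoded form is twice as long as the text.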

To live a long and happy life, it is important to know whether you are working with encoded data (practically binary data) or decoded unicode text. To test whether an object is a string (either a byte string or a unicode string), check whether it is an instance of basestring:

>>> isinstance("abc",basestring)
True
>>> isinstance(u"abc",basestring)
True
>>> isinstance(42,basestring)
False
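basestring no longer exists on Python 3. If a script has to run on both, one common workaround is a small shim like the sketch below; the name string_types is just a label chosen for this example:

```python
try:
    string_types = basestring  # Python 2: covers both str and unicode
except NameError:
    string_types = str         # Python 3: str is the only text type


def is_string(obj):
    """Return True if obj is a text string on the running Python version."""
    return isinstance(obj, string_types)


print(is_string("abc"))
print(is_string(42))
```

On Python 3 this deliberately excludes bytes, since encoded data is not text there.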

To convert to and from a string (the function to use depends on the target type):

>>> str(42)
'42'
>>> unicode(42)
u'42'
>>> int("42")
42
>>> float("42")
42.0

Backporting to Python 2.4

With Python 2.5, 2.6 and even 3.0 around, I still need to make some scripts run with Python 2.4. Here are just two tricks. To make sqlite3 code work:

try:
    import sqlite3
except ImportError:
    # Python 2.4 has no sqlite3 module; fall back to pysqlite2
    from pysqlite2 import dbapi2 as sqlite3

and to make ElementTree work:

try:
    import xml.etree.ElementTree as ET
except ImportError:
    # Python 2.4 has no xml.etree; fall back to the standalone cElementTree
    import cElementTree as ET


2009-05-21
