George V. Reilly's Technical Blog

Flattening List Comprehensions in Python

List Comprehension

Python has list comprehensions, syntactic sugar for building lists from an expression.

>>> [2 * i for i in (2, 3, 5, 7, 11)]
[4, 6, 10, 14, 22]

This doesn't work so well when the comprehension expression is itself a list: you end up with a list of lists.

>>> def gen():
...     for l in [['a', 'b'], ['c'], ['d', 'e', 'f']]:
...         yield l
...
>>> [l for l in gen()]
[['a', 'b'], ['c'], ['d', 'e', 'f']]

This is ugly. Here's one way to build a flattened list, but it's less elegant than the comprehension.

>>> x = []
>>> for l in gen():
...     x.extend(l)
...
>>> x
['a', 'b', 'c', 'd', 'e', 'f']

It took me a while to find a readable list comprehension, with a little help from Google. Use sum() on the outer list and prime it with an empty list, []. Python will concatenate the inner lists, producing a flattened list.

>>> sum([l for l in gen()], [])
['a', 'b', 'c', 'd', 'e', 'f']
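
To see why this works: sum() starts from the empty list and applies + to each inner list in turn. Here is a rough equivalent as a sketch (flatten_with_plus is a name made up for illustration, not anything in the standard library):

```python
def gen():
    # Same generator as in the examples above.
    for l in [['a', 'b'], ['c'], ['d', 'e', 'f']]:
        yield l

def flatten_with_plus(lists):
    # Hypothetical helper: mimics what sum(lists, []) does.
    result = []
    for inner in lists:
        # List + list concatenates and builds a brand-new list each time.
        result = result + inner
    return result

flat = flatten_with_plus(gen())
print(flat)
```

Because every + copies the list built so far, the cost grows roughly quadratically with the number of inner lists. That is fine for small inputs like this one.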

Alternatively, you can use itertools.chain().

>>> import itertools
>>> list(itertools.chain(*gen()))
['a', 'b', 'c', 'd', 'e', 'f']

That is more efficient on long inputs, since sum() builds a new intermediate list for every inner list, though I find the sum() to be a little more readable.
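
If the outer iterable is large or lazy, itertools.chain.from_iterable() does the same job without unpacking everything into arguments with *. A minimal sketch, reusing the same gen():

```python
import itertools

def gen():
    # Same generator as in the examples above.
    for l in [['a', 'b'], ['c'], ['d', 'e', 'f']]:
        yield l

# from_iterable() pulls inner lists from gen() one at a time,
# rather than expanding them all into call arguments first.
flat = list(itertools.chain.from_iterable(gen()))
print(flat)
```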

Edit: I forgot about nested comprehensions.

>>> [inner
...     for outer in gen()
...         for inner in outer]
['a', 'b', 'c', 'd', 'e', 'f']

It's somewhat cryptic on one line, however:

>>> [j for i in gen() for j in i]
['a', 'b', 'c', 'd', 'e', 'f']
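
One way to decode the one-liner: the for clauses in a comprehension run in the same order as the equivalent nested for loops, outermost first. A sketch of that expansion (flatten_loops is a made-up name for illustration):

```python
def gen():
    # Same generator as in the examples above.
    for l in [['a', 'b'], ['c'], ['d', 'e', 'f']]:
        yield l

def flatten_loops(lists):
    # Hypothetical helper: the loop form of [j for i in lists for j in i].
    flat = []
    for i in lists:      # first "for" clause in the comprehension
        for j in i:      # second "for" clause
            flat.append(j)
    return flat

flat = flatten_loops(gen())
print(flat)
```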

Comments

Ray Vega said:

This is great info on all of the possible flattening solutions.  The 'sum' version is certainly less verbose than the 'chain' approach. However, it always throws me off that sum can be used to concatenate (nested) lists not just numbers.  After all, the word "sum" commonly denotes the use of numbers.

I would not mind 'sum' being used for lists if it could also be used for strings.  For example, while the following is permissible:

>>> sum([['b'], ['c'], ['d']], ['a'])
['a', 'b', 'c', 'd']

this is not:

>>> sum(['b', 'c', 'd'], 'a')
Traceback (most recent call last):
  File "<pyshell#66>", line 1, in <module>
    sum(['b', 'c', 'd'], 'a')
TypeError: sum() can't sum strings [use ''.join(seq) instead]

Isn't a string a special type of list, anyway?  For example, list comprehensions certainly support manipulating strings.  Also, the '+' operator can be used on strings, so why not sum?

Python is built on the premise of clarity and terseness, but the recommended syntax of ''.join(seq) is somewhat less intuitive than sum would be.  For the same reason that flattening the nested lists with sum is more readable to you, using sum with strings could be a more readable alternative to join.
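
For what it's worth, functools.reduce with operator.add behaves the way sum might be expected to behave on strings. This is only a sketch of the idea, not a recommendation; ''.join is still the usual idiom:

```python
import functools
import operator

# reduce() applies + pairwise, starting from the initial value 'a':
# (('a' + 'b') + 'c') + 'd'
result = functools.reduce(operator.add, ['b', 'c', 'd'], 'a')
print(result)
```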

# March 25, 2009 6:35 PM

george_v_reilly said:

I'm used to join -- similar idioms are in C# and JavaScript -- so it hadn't really struck me as a wart.

Here's a sum function that sort of does what you want:

>>> def sum_(*l):
...     return ''.join(l)
...
>>> sum_('b', 'c', 'd', 'a')
'bcda'

Though it won't work with the example you gave, sum_(['b', 'c', 'd'], 'a'), as that's concatenating a list of strings with a string. That in turn could be fixed by flattening any arguments that are lists, then joining the result.
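
A rough sketch of that fix, assuming only one level of nesting and treating lists and tuples as the things to flatten (sum_flat is a made-up name for illustration):

```python
def sum_flat(*args):
    # Hypothetical variant of sum_: flatten list/tuple arguments one level,
    # then join all the pieces into a single string.
    parts = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            parts.extend(arg)   # splice the items of a list argument in place
        else:
            parts.append(arg)   # a plain string is kept as-is
    return ''.join(parts)

result = sum_flat(['b', 'c', 'd'], 'a')
print(result)
```

Note that the pieces are joined in argument order, so the trailing 'a' ends up last rather than acting as a starting value the way sum's second argument does.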

# March 26, 2009 10:19 PM

Ray Vega said:

>> I'm used to join -- similar idioms are in C# and JavaScript -- so it hadn't really struck me as a wart.

Yes, that is true. But if I were more comfortable with the functional programming idioms of map, reduce, and filter, as found in Python, then I might use those over list comprehensions, although LCs can be viewed as far more elegant.

For some reason, ''.join just feels more awkward to me for simple concatenations of lists than sum would. Maybe it's just me. It's just nice to have alternatives built into the language rather than having to roll my own.

It is all subjective I guess. :-)

# March 27, 2009 2:56 PM

Johann Visagie said:

Sorry for replying to such an old post, but I just stumbled across it while googling for something else.  Anyway, I must be really dense, but I don't see the point of the generator.

In Haskell, given that my original list is xss, I would flatten it with:

[x | xs <- xss, x <- xs]

This appears to work just fine if transliterated directly into Python, though maybe it's not optimally efficient.  Given that:

>>> ll = [['a', 'b'], ['c'], ['d', 'e', 'f']]

You can simply flatten it with:

>>> [x for l in ll for x in l]
['a', 'b', 'c', 'd', 'e', 'f']

# April 29, 2010 4:41 PM
