Copyright 2017-2024 Jason Ross, All Rights Reserved

One of the Python features that gets a lot of publicity is list comprehension and, to a lesser extent, dictionary comprehension. It sounds very complex, and it’s a very popular subject in technical interviews, but it’s really just syntactic sugar – something the language provides to let you simplify your code.

That’s not to say it’s a bad thing just because it’s syntactic sugar. Syntactic sugar can be a good thing – for example C# has the using keyword, which covers up a whole collection of IDisposable.Dispose calls, exception handling and so on. Just because the term syntactic sugar sounds frivolous doesn't mean it is.

The two concepts are closely related, but list comprehensions are a little simpler, so they’re probably the best place to start.

If you want to iterate through the contents of a list, collection, iterator or generator, a for loop is usually a good place to start. For example, if you wanted to calculate the sum of the squares of the numbers from 1 to 100 you could do it using the following:

result = 0

for n in range(0, 101):
    result += n * n

print(f"Sum of squares: {result}")

Filtering out empty strings, and keeping only the strings that start with a specific letter, could be done as follows:

source_strings = ["cat", None, "dog", "hamster", "bat", "coypu", "", "chinchilla"]
result = []

for s in source_strings:
    if s and s.startswith("c"):
        result.append(s)

print(f"Filtered strings: {result}")

You could use the Python filter function for this, but filter, like map, is apparently regarded as not Pythonic, beyond the pale, and generally not great.
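
For comparison, this is roughly what the filter version might look like. It is only a sketch: it reuses the source_strings list from above, and the lambda is just one of several ways to write the test.

```python
source_strings = ["cat", None, "dog", "hamster", "bat", "coypu", "", "chinchilla"]

# filter() returns a lazy iterator, so wrap it in list() to see the results.
# The lambda returns a falsy value for None and "", so those are dropped too.
result = list(filter(lambda s: s and s.startswith("c"), source_strings))

print(f"Filtered strings: {result}")
```

It produces the same list as the loop, but the lambda and the extra list() call make it a little harder to read.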

As a final example, if you wanted to calculate the average length of the strings in a collection, this would be a reasonable thing to do:

from statistics import mean

source_strings = ["cat", None, "dog", "hamster", "bat", "coypu", "", "chinchilla"]
lengths = []

for s in source_strings:
    lengths.append(len(s) if s else 0)

result = mean(lengths)

print(f"Average string length: {result}")

All of these examples are reasonable, but they can be made smaller, simpler and more readable with list comprehensions.

So what IS a list comprehension?

A list comprehension is a piece of code that iterates across a list, iterator or generator, optionally performing a function on each item that matches an optional condition, and returns a new list containing the results.

It takes the form:

[ expression for item in source if condition ]

The condition is important, as it lets you filter out values you don’t want to call the expression on. This can save a lot of time if the expression takes a long time to evaluate.
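
A small sketch of that point: slow_square below is a hypothetical stand-in for an expensive function, and call_log is only there to record how often it runs. The condition is checked first, so the expression is only evaluated for the items that pass.

```python
call_log = []

def slow_square(n):
    """Hypothetical stand-in for an expensive function; records each call."""
    call_log.append(n)
    return n * n

# The condition runs first, so slow_square is only called for even numbers.
result = [slow_square(n) for n in range(10) if n % 2 == 0]

print(result)
print(f"slow_square was called {len(call_log)} times for 10 items")
```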

Using a list comprehension, the squares from the sum of squares example earlier in this article can be generated with:

result = [ x * x for x in range(0, 101) ]

This returns a list of:

[0, 1, 4, 9, 16, 25, ... , 10000]
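
To get back to the single total that the original loop produced, the list can be passed to the built-in sum function. A minimal sketch, where the variable name squares is just an illustration:

```python
# Build the list of squares, then add them up.
squares = [n * n for n in range(0, 101)]
result = sum(squares)

print(f"Sum of squares: {result}")
```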

Similarly, the filtering example can be rewritten as:

result = [ s for s in source_strings if s and s.startswith("c") ]

The third example, calculating the average length of the strings in a collection, is a little more complex. It’s a combination of text and mathematical functions, with the result of the list comprehension being a list of numbers, and the statistics.mean function being used to calculate the average value of the list contents:

result = mean([ len(s) if s else 0 for s in source_strings ])
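
Where the test goes changes the answer, so it is worth being careful here. The sketch below uses the source_strings list from earlier; the variable names are only for illustration. A conditional expression at the front keeps every item, counting None and empty strings as length 0 just as the loop did, whereas an if clause at the end drops those items before the average is taken.

```python
from statistics import mean

source_strings = ["cat", None, "dog", "hamster", "bat", "coypu", "", "chinchilla"]

# Conditional expression: every item is kept, empty values count as 0.
average_with_zeros = mean([len(s) if s else 0 for s in source_strings])

# Filter clause: None and "" are removed before averaging.
average_non_empty = mean([len(s) for s in source_strings if s])

print(f"Counting empty values as 0: {average_with_zeros}")
print(f"Ignoring empty values: {average_non_empty}")
```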

You might be looking at all this and thinking “Is there anything else I can do with this?”, “I have an enormous amount of data I want to process with list comprehensions and I can’t fit it all into memory” or even “I’m in a job interview right now - help!”

List comprehensions can use a lot of memory because all of the data is processed before the comprehension returns. Normally this isn’t a problem, but if you have a lot of data then, as you’ve noticed, it can be. The solution to your problem is a quick and easy change. If you take your list comprehension:

result = [ f(x) for x in source ]

replace the square brackets with regular parentheses, so that your code follows the pattern:

result = ( f(x) for x in source )

and instead of returning a list, the code returns a generator. I covered generators in Five Minute Introduction: Python Generators, but in general they’re slower than list comprehensions. Their big advantage though is that they only generate the items in the result as they’re requested, so you won’t run out of memory.

For example:

result = [ n * n for n in range(1, 10000000) ]

is fast, but takes up a lot of memory. Turning it into a generator:

result = ( n * n for n in range(1, 10000000) )

is slower, but uses much less memory. Which one you use is up to you, and should depend entirely on your situation.
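
A rough way to see the difference is sketched below with a smaller range. Note that sys.getsizeof only measures the container object itself, not the integers it refers to, so it understates the list's real footprint. The variable names are illustrative.

```python
import sys

squares_list = [n * n for n in range(1, 100_000)]
squares_gen = (n * n for n in range(1, 100_000))

# The list holds a reference to every result; the generator holds only its state.
print(f"List object size: {sys.getsizeof(squares_list)} bytes")
print(f"Generator object size: {sys.getsizeof(squares_gen)} bytes")

# A generator produces its items on demand, for example when passed to sum().
total = sum(squares_gen)

# Once consumed, a generator is exhausted and yields nothing more.
leftover = list(squares_gen)
print(f"Items left in the generator: {len(leftover)}")
```

One thing to watch for: unlike a list, a generator can only be iterated once, so if you need the results more than once a list may still be the better choice.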

Can List Comprehensions Work With Two Lists?

Most of the time you’ll only need to iterate through one collection or iterator in your list comprehensions. But, what if you wanted to use two? After all, occasionally you probably find yourself working with code that looks like:

from pprint import pformat

a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]

result = []

for item_a in a:
    for item_b in b:
        result.append((item_a, item_b))

print(f"Results:\n{pformat(result)}")

The result of this code is a list containing the result of the function for every combination of the source collections. So in this case, the results would look like:

Results:
[(1, 10),
 (1, 20),
 (1, 30),
 (1, 40),
 (1, 50),
 (2, 10),
 (2, 20),
 (2, 30),
 (2, 40),
 (2, 50),
 (3, 10),
 (3, 20),
 (3, 30),
 (3, 40),
 (3, 50),
 (4, 10),
 (4, 20),
 (4, 30),
 (4, 40),
 (4, 50),
 (5, 10),
 (5, 20),
 (5, 30),
 (5, 40),
 (5, 50)]

It would be great if you could simplify these loops into a list comprehension, and you can:

[ f(x, y) for x in source_x for y in source_y ]

The loop example above can be converted to use a list comprehension with multiple sources, and looks like:

result = [(item_a, item_b) for item_a in a for item_b in b]

Of course you can add a conditional clause on the output and inputs, but that veers quickly into the realm of being overly complex and generally a bad thing to do.
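
For reference, this is a sketch of what such conditions might look like, using the a and b lists from above. The particular conditions are invented purely for illustration. Each if clause applies to the for clause before it.

```python
a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]

# Keep only odd values of item_a, and only pairs that sum to less than 40.
result = [(item_a, item_b) for item_a in a if item_a % 2 == 1
                           for item_b in b if item_a + item_b < 40]

print(result)
```

Even with only two sources and two simple conditions, it takes a moment to work out what this does, which is a good sign that a plain loop may be clearer.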

Can List Comprehensions Work With MORE THAN Two Lists?

Yes, you can add as many sources as you want – the following is valid:

result = [func(item_a, item_b, item_c, item_d, item_e, item_f) for item_a in a
                                                               for item_b in b
                                                               for item_c in c
                                                               for item_d in d
                                                               for item_e in e
                                                               for item_f in f]

Again, as with so many aspects of list comprehensions, don’t do this, it’s horrible. It’s unclear, and the performance is O(n^m), where n is the number of items in each source and m is the number of sources in the list comprehension. So if each of the collections a to f in the code above contained 10 items, the function func would be called 1,000,000 times. That's not going to help your system's performance.
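
The growth is easy to check on a smaller scale. The sketch below uses three sources of 10 items each, with a hypothetical combine function that records its calls. It also shows itertools.product from the standard library, which is not something this article relies on elsewhere but produces the same combinations in the same order.

```python
from itertools import product

sources = [range(10), range(10), range(10)]
call_log = []

def combine(x, y, z):
    """Hypothetical function; records each call so they can be counted."""
    call_log.append((x, y, z))
    return x + y + z

result = [combine(x, y, z) for x in sources[0]
                           for y in sources[1]
                           for z in sources[2]]

# 10 items in each of 3 sources gives 10 ** 3 calls.
print(f"combine was called {len(call_log)} times")

# itertools.product yields the same combinations in the same order.
via_product = [x + y + z for x, y, z in product(*sources)]
print(via_product == result)
```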

Dictionary Comprehensions

Another Python feature that’s related to list comprehensions, albeit much less popular, is dictionary comprehension. Perhaps unsurprisingly, dictionary comprehensions let you create dictionaries much more simply than you may otherwise have to.

The syntax for dictionary comprehensions is similar to that for list comprehensions:

{ key_expression: value_expression for item in source if condition }

Instead of carrying out a function on one value, as with a list comprehension, a dictionary comprehension lets you call two functions on each item or combination of items.

This is a great technique to use if you’re using something like AWS, which sometimes returns data as a list of name/value pairs, like:

source = [{"Name": "account", "Value": "JohnSmith"},
          {"Name": "email_address", "Value": "user@example.com"},
          {"Name": "phone", "Value": "1 555 555 5555"}]

A single line of code like this:

result = {item["Name"]: item["Value"] for item in source}

can turn the source into:

{'account': 'JohnSmith',
 'email_address': 'user@example.com',
 'phone': '1 555 555 5555'}
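
Dictionary comprehensions accept a condition in the same way that list comprehensions do. As a sketch, with made-up data in the same name/value shape, entries with an empty value can be skipped like this:

```python
# Hypothetical data: the "nickname" entry has an empty value.
source = [{"Name": "account", "Value": "JohnSmith"},
          {"Name": "nickname", "Value": ""},
          {"Name": "phone", "Value": "1 555 555 5555"}]

# Only items with a non-empty value make it into the dictionary.
result = {item["Name"]: item["Value"] for item in source if item["Value"]}

print(result)
```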

Just like list comprehensions, dictionary comprehensions can be used to combine an almost unlimited number of lists, generators and/or iterators. For example, two sources can be combined like this:

{ fk(x, y): fv(x, y) for x in source_x for y in source_y }

In the example above, fk(x,y) is the key generation function, and needs to return a unique value for each combination of x and y. If it generates duplicate key values, then the later ones will overwrite the earlier ones, as dictionary keys are always unique.

For example:

source_a = ["a", "b", "c", "d"]
source_b = ["q", "r", "s", "t"]

result = {(a, b): a + b for a in source_a for b in source_b}

Which gives the following result:

{('a', 'q'): 'aq',
 ('a', 'r'): 'ar',
 ('a', 's'): 'as',
 ('a', 't'): 'at',
 ('b', 'q'): 'bq',
 ('b', 'r'): 'br',
 ('b', 's'): 'bs',
 ('b', 't'): 'bt',
 ('c', 'q'): 'cq',
 ('c', 'r'): 'cr',
 ('c', 's'): 'cs',
 ('c', 't'): 'ct',
 ('d', 'q'): 'dq',
 ('d', 'r'): 'dr',
 ('d', 's'): 'ds',
 ('d', 't'): 'dt'}
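
By contrast, here is a sketch of what happens when the key is not unique for each combination. The data and the choice of a + b as the key are made up for illustration. Nine combinations go in, but only five keys come out, each holding the value from the last combination that produced it.

```python
source_a = [1, 2, 3]
source_b = [1, 2, 3]

# a + b is NOT unique per combination: (1, 2) and (2, 1) both give the key 3.
result = {a + b: (a, b) for a in source_a for b in source_b}

print(result)
print(f"{len(source_a) * len(source_b)} combinations, {len(result)} keys")
```

No error or warning is raised when a key is overwritten, so this kind of data loss is easy to miss.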

And, just as with list comprehensions, doing this with more than about two sources is a really bad idea. It introduces complexity, confusion and an unnecessary level of intricacy that will have future developers hating you more than they already will!

Summary

List comprehensions are a very useful, concise and efficient way of solving many problems that you would otherwise use loops for. They look a little strange when you first encounter them, especially if you’re coming from working in another language. They’re open to abuse, but then so are most programming features – trust me, I’ve seen it! Overall though, they make your code better. I recommend including them in your code.
