itertools.imap vs map over the entire iterable

Joe · Dec 31, 2013 · Viewed 8.3k times

I'm curious about a statement in the itertools documentation (http://docs.python.org/2/library/itertools.html#itertools.imap), which describes

sum(imap(operator.mul, vector1, vector2))

as an efficient dot-product. My understanding is that imap gives a generator instead of a list, and while I understand how it would be faster/consume less memory if you're only considering the first few elements, with the surrounding sum(), I don't see how it behaves any differently than:

sum(map(operator.mul, vector1, vector2))

Answer

Max Noel · Dec 31, 2013

The difference between map and imap becomes clear when you start increasing the size of what you're iterating over:

import itertools

# xrange object: values are produced on demand, so it takes up almost no memory
data = xrange(1000000000)

# Tries to build a list of 1 billion elements!
# Therefore, fails with MemoryError on 32-bit systems.
doubled = map(lambda x: x * 2, data)

# Iterator object that lazily doubles each item as it's iterated over.
# Takes up very little memory (constant, independent of data's size).
iter_doubled = itertools.imap(lambda x: x * 2, data)

# This is where the iteration and the doubling happen.
# Again, since no list is created, this doesn't run you out of memory.
sum(iter_doubled)

# (The result is 999999999000000000L, if you're interested.
# It takes a minute or two to compute, but consumes minimal memory.)
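
A rough way to see the same effect on a smaller input is to measure peak memory. The sketch below is written for Python 3, where map is already lazy and list(map(...)) stands in for Python 2's list-building map. The helper name peak_bytes and the size n are assumptions chosen for illustration, and the exact byte counts will vary by interpreter and platform.

```python
import tracemalloc

def peak_bytes(func):
    """Run func() and return (result, peak bytes allocated while it ran)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

n = 200000  # arbitrary size, small enough to run quickly

# Lazy: sum() pulls one doubled value at a time from the map iterator.
lazy_total, lazy_peak = peak_bytes(
    lambda: sum(map(lambda x: x * 2, range(n))))

# Eager: list() materializes every doubled value before sum() starts.
eager_total, eager_peak = peak_bytes(
    lambda: sum(list(map(lambda x: x * 2, range(n)))))

print(lazy_total == eager_total)  # both compute the same sum
print(lazy_peak, eager_peak)      # the eager peak should be far larger
```

Both versions do the same arithmetic, so the saving from the lazy form is mainly in memory, not in the number of multiplications.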

Note that in Python 3, the built-in map behaves like Python 2's itertools.imap (which was removed because it's no longer needed). To get the "old map" behaviour, you'd use list(map(...)), which is another good way to visualize how Python 2's itertools.imap and map differ from each other.
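
As a small illustration of that note, here is the dot product from the question written for Python 3. The vectors are made-up sample values, and the variable names are only for this sketch.

```python
import operator

# Hypothetical sample vectors for illustration.
vector1 = [1, 2, 3]
vector2 = [4, 5, 6]

# Python 3's map returns a lazy iterator, like Python 2's itertools.imap.
lazy_products = map(operator.mul, vector1, vector2)
is_lazy = not isinstance(lazy_products, list)

# sum() consumes the products one at a time; no intermediate list is built.
dot_lazy = sum(lazy_products)

# list(map(...)) reproduces Python 2's map: the whole list is built first.
eager_products = list(map(operator.mul, vector1, vector2))
dot_eager = sum(eager_products)

print(is_lazy)
print(dot_lazy, dot_eager)
```

Note that a map iterator can only be consumed once, which is why the eager version calls map again instead of reusing lazy_products.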