I'm curious about a statement from http://docs.python.org/2/library/itertools.html#itertools.imap, namely that it describes
```python
sum(imap(operator.mul, vector1, vector2))
```
as an efficient dot product. My understanding is that imap gives a generator instead of a list, and while I understand how that would be faster and use less memory if you only consume the first few elements, I don't see how, with the surrounding sum(), it behaves any differently than:
```python
sum(map(operator.mul, vector1, vector2))
```
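For comparison, here is a minimal sketch of the two expressions in Python 3 terms (Python 3 is an assumption; the question is about Python 2). In Python 3 the built-in map() is already lazy, so wrapping it in list() reproduces Python 2's eager map(). The names dot and dot_eager are hypothetical, chosen only for this illustration.

```python
import operator


def dot(vector1, vector2):
    """Dot product without an intermediate list of products.

    In Python 3, map() returns a lazy iterator, so sum() pulls one
    product at a time, like Python 2's itertools.imap.
    """
    return sum(map(operator.mul, vector1, vector2))


def dot_eager(vector1, vector2):
    """Same result, but list() first stores every product in memory,
    mirroring Python 2's built-in map()."""
    return sum(list(map(operator.mul, vector1, vector2)))


v1 = [1, 2, 3]
v2 = [4, 5, 6]
print(dot(v1, v2))        # 32
print(dot_eager(v1, v2))  # 32
```

Both functions return the same value; they differ only in whether the full list of products exists in memory at once.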
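In Python 2, map builds the complete list of products before sum() sees any of them, whereas imap hands sum() one product at a time, so only one intermediate value needs to exist at any moment. The difference becomes clear when you start increasing the size of what you're iterating over: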
```python
import itertools

# xrange object: takes up almost no memory, regardless of its length
data = xrange(1000000000)

# Tries to build a list of 1 billion elements!
# Therefore, fails with MemoryError on 32-bit systems.
doubled = map(lambda x: x * 2, data)

# imap iterator that lazily doubles each item as it's iterated over.
# Takes up very little (and constant, independent of data's size) memory.
iter_doubled = itertools.imap(lambda x: x * 2, data)

# This is where the iteration and the doubling happen.
# Again, since no list is created, this doesn't run you out of memory.
sum(iter_doubled)
# (The result is 999999999000000000L, if you're interested.
# It takes a minute or two to compute, but consumes minimal memory.)
```
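The same experiment can be scaled down and run in Python 3, where range plays the role of xrange and map plays the role of itertools.imap. This is a sketch assuming CPython, where sys.getsizeof reports the size of the container object itself and not of the integers it refers to; that is still enough to show that the list grows with the input while the map iterator does not.

```python
import sys

# Python 3 range is lazy, like Python 2's xrange.
data = range(1_000_000)

# Python 3 map() returns an iterator: a small, fixed-size object.
lazy_doubled = map(lambda x: x * 2, data)

# list() forces every element to be computed and stored up front,
# like Python 2's built-in map().
eager_doubled = list(map(lambda x: x * 2, data))

print(sys.getsizeof(lazy_doubled))   # small, and the same for any input size
print(sys.getsizeof(eager_doubled))  # grows with len(data)

# Both give the same total; only the peak memory use differs.
print(sum(lazy_doubled) == sum(eager_doubled))  # True
```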
Note that in Python 3, the built-in map behaves like Python 2's itertools.imap (which was removed because it's no longer needed). To get the "old map" behaviour, you'd use list(map(...)), which is another good way to visualize how Python 2's itertools.imap and map differ from each other.
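A quick way to see this laziness directly in Python 3 is to record when the mapped function actually runs. The helper double and the calls list are hypothetical, introduced only for this illustration.

```python
# Record each argument as the mapped function runs,
# so we can see *when* the work happens.
calls = []


def double(x):
    calls.append(x)
    return x * 2


lazy = map(double, range(5))
print(calls)        # [] : nothing has been computed yet

print(next(lazy))   # 0 : only the first element is computed
print(calls)        # [0]

calls.clear()
eager = list(map(double, range(5)))
print(calls)        # [0, 1, 2, 3, 4] : every element computed immediately
print(eager)        # [0, 2, 4, 6, 8]
```

Keep in mind that a map iterator can be consumed only once; if you need to walk over the results several times, list(map(...)) is the appropriate choice.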