List Comprehensions in Python to compute minimum and maximum values of a list

Gianni Spear picture Gianni Spear · Aug 5, 2014 · Viewed 10.6k times · Source

i have the following code to compute minimum and maximum values of a list in order to save memory efficiency

x_min = float('+inf')
x_max = float('-inf')

for p in points_in_list:
    x_min = min(x_min, p)
    x_max = max(x_max, p)

where points_in_list is a (large) list of numbers. I wish to know if there is a method to compute with a List Comprehensions the min and max value and saving the memory.

Answer

bgusach picture bgusach · Aug 5, 2014

I'm a big fan of generators and comprehensions, but in this case it seems they are not the right way to go, because:

  1. You want to compute the min and the max of the list
  2. Your list is huge

If you wanted to calculate only one of the min or max, you could just use the min/max function on it. But since you want both, you would have to loop over the list twice to extract first the min and then the max. I.e. something like this:

x_min = min(points)
x_max = max(points)

Let's play with some some timings. First calling both min and max on the list:

>>> import timeit
>>> def with_gens(l):
...     return min(l), max(l)
...
>>> timeit.timeit('with_gens(range(6000000))', 'from __main__ import with_gens', number=5)
1.7451060887015188

and now looping only once, using you code:

>>> def with_loop2(l):
...     x_max = float('+inf')
...     x_min = float('-inf')
...     for el in l:
...         x_min = min(x_min, el)
...         x_max = max(x_max, el)
...     return x_min, x_max
...
>>> timeit.timeit('with_loop2(range(6000000))', 'from __main__ import with_loop2', number=5)
11.636076105071083

Crazy, huh?

There is no memory problem at all with this approach. However, it sets the x_max and x_min in each loop, which is actually an unnecessary waste: you only want to reset the variable when you have found a bigger/smaller value. We can easily address this.

So... let's try looping only once, but avoiding unnecessary resettings.

>>> def with_loop(l):
...     x_min = float('-inf')
...     x_max = float('+inf')
...     for el in l:
...         if el < x_min:
...             x_min = el
...         elif el > x_max:
...             x_max = el
...     return x_min, x_max
...
>>> timeit.timeit('with_loop(range(6000000))', 'from __main__ import with_loop', number=5)
3.961046726963332

OH SURPRISE

It seems while the algorithm of looping only once is on the paper more efficient, it is beaten by the inner optimization of min and max. Moreover, the difference between setting the var in each loop and only when necessary is huge. You never stop learning.