When to use or not use iterator() in the django ORM

Lucas Ou-Yang picture Lucas Ou-Yang · Oct 1, 2012 · Viewed 27k times · Source

This is from the django docs on the queryset iterator() method:

A QuerySet typically caches its results internally so that repeated evaluations do not result in additional queries. In contrast, iterator() will read results directly, without doing any caching at the QuerySet level (internally, the default iterator calls iterator() and caches the return value). For a QuerySet which returns a large number of objects that you only need to access once, this can results in better performance and a significant reduction in memory.

After reading, I'm still confused: The line about increased performance and memory reduction suggests we should just use the iterator() method. Can someone give some examples of good and bad cases iterator() usage?

Even if the query results are not cached, if they really wanted to access the models more than once, can't someone just do the following?

saved_queries = list(Model.objects.all().iterator())

Answer

Steven picture Steven · Oct 2, 2012

Note the first part of the sentence you call out: For a QuerySet which returns a large number of objects that you only need to access once

So the converse of this is: if you need to re-use a set of results, and they are not so numerous as to cause a memory problem then you should not use iterator. Because the extra database round trip is always going to reduce your performance vs. using the cached result.

You could force your QuerySet to be evaluated into a list but:

  • it requires more typing than just saved_queries = Model.objects.all()
  • say you are paginating results on a web page: you will have forced all results into memory (back to possible memory problems) rather than allowing the subsequent paginator to select the slice of 20 results it needs
  • QuerySets are lazy, so you can have a context processor, for instance, that puts a QuerySet into the context of every request but only gets evaluated when you access it on certain requests but if you've forced evaluation that database hit happens every request

The typical web app case is for relatively small result sets (they have to be delivered to a browser in a timely fashion, so pagination or a similar technique is employed to decrease the data volume if required) so generally the standard QuerySet behaviour is what you want. As you are no doubt aware, you must store the QuerySet in a variable to get the benefit of the caching.

Good use of iterator: processing results that take up a large amount of available memory (lots of small objects or fewer large objects). In my experience this is often in management commands when doing heavy data processing.