Why is iterating through a large Django QuerySet consuming massive amounts of memory?

davidchambers picture davidchambers · Nov 19, 2010 · Viewed 44.7k times · Source

The table in question contains roughly ten million rows.

for event in Event.objects.all():
    print event

This causes memory usage to increase steadily to 4 GB or so, at which point the rows print rapidly. The lengthy delay before the first row printed surprised me – I expected it to print almost instantly.

I also tried Event.objects.iterator() which behaved the same way.

I don't understand what Django is loading into memory or why it is doing this. I expected Django to iterate through the results at the database level, which'd mean the results would be printed at roughly a constant rate (rather than all at once after a lengthy wait).

What have I misunderstood?

(I don't know whether it's relevant, but I'm using PostgreSQL.)

Answer

eternicode picture eternicode · Nov 19, 2010

Nate C was close, but not quite.

From the docs:

You can evaluate a QuerySet in the following ways:

  • Iteration. A QuerySet is iterable, and it executes its database query the first time you iterate over it. For example, this will print the headline of all entries in the database:

    for e in Entry.objects.all():
        print e.headline
    

So your ten million rows are retrieved, all at once, when you first enter that loop and get the iterating form of the queryset. The wait you experience is Django loading the database rows and creating objects for each one, before returning something you can actually iterate over. Then you have everything in memory, and the results come spilling out.

From my reading of the docs, iterator() does nothing more than bypass QuerySet's internal caching mechanisms. I think it might make sense for it to a do a one-by-one thing, but that would conversely require ten-million individual hits on your database. Maybe not all that desirable.

Iterating over large datasets efficiently is something we still haven't gotten quite right, but there are some snippets out there you might find useful for your purposes: