Python is slow when iterating over a large list

nycynik · Feb 22, 2012 · Viewed 8k times

I am currently selecting a large list of rows from a database using pyodbc. The result is then copied into a large list, and I am trying to iterate over that list. Before I abandon Python and try to recreate this in C#, I wanted to know whether there was something I was doing wrong.

clientItemsCursor.execute("SELECT ids FROM largetable WHERE year = ?", year)
allIDRows = clientItemsCursor.fetchall()  # takes maybe 8 seconds

count = 0
for clientItemRow in allIDRows:
    aID = str(clientItemRow[0])
    # Do something with aID -- removed while trying to determine what was slow
    count = count + 1
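
A rough way to time the two phases separately (a sketch only; it reuses clientItemsCursor from above, and time.time() is coarse but fine at this scale):

import time

t0 = time.time()
allIDRows = clientItemsCursor.fetchall()
t1 = time.time()
print("fetchall took %.2f s" % (t1 - t0))

count = 0
for clientItemRow in allIDRows:
    aID = str(clientItemRow[0])
    count = count + 1
t2 = time.time()
print("looped over %d rows in %.2f s" % (count, t2 - t1))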

Some more information:

  • The for loop currently runs at about 5 iterations per second, which seems insanely slow to me.
  • The total number of rows selected is ~489,000.
  • The machine it's running on has plenty of RAM and CPU, but it seems to use only one or two cores, and RAM usage is 1.72 GB of 4 GB.

Can anyone tell me what's wrong? Do scripts just run this slowly?

Thanks

Answer

jsbueno · Feb 22, 2012

This should not be slow with native Python lists - but maybe the ODBC driver is returning a "lazy" object that tries to be smart and just ends up being slow. Try just doing

allIDRows = list(clientItemsCursor.fetchall())

in your code and post further benchmarks.
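
For example, a sketch of that experiment (the type() check just shows what the driver actually handed back):

import time

rows = clientItemsCursor.fetchall()
print(type(rows))        # what did the driver actually return?

allIDRows = list(rows)   # force a plain Python list

start = time.time()
for clientItemRow in allIDRows:
    aID = str(clientItemRow[0])
print("loop took %.2f s" % (time.time() - start))

If the loop becomes fast after the list() conversion, the original object was doing per-row work behind the scenes on each iteration.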

(Python lists can get slow if you start inserting things in the middle, but just iterating over a large list should be fast.)
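
To put rough numbers on that, here is a standalone timeit sketch (list size matching the question's ~489,000 rows):

import timeit

# Iterating ~489,000 items: typically hundredths of a second.
print(timeit.timeit("for x in data: pass",
                    setup="data = list(range(489000))",
                    number=1))

# Inserting at the front is O(n) per insert, so repeated inserts crawl.
print(timeit.timeit("data.insert(0, None)",
                    setup="data = list(range(489000))",
                    number=1000))

On typical hardware, even a thousand front inserts on a list that size cost more than a full pass over it.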