How do cursors work in Python's DB-API?

Nicholas Leonard · Jan 18, 2009 · Viewed 13.5k times

I have been using Python with RDBMSs (MySQL and PostgreSQL), and I have noticed that I do not really understand how to use a cursor.

Usually, one has a script connect to the DB via a DB-API client library (like psycopg2 or MySQLdb):

connection = psycopg2.connect(host='otherhost', etc)

And then one creates a cursor:

cursor = connection.cursor()

And then one can issue queries and commands:

cursor.execute("SELECT * FROM etc")

Now, where is the result of the query, I wonder? Is it on the server, or partly on my client and partly on the server? And then, if we need to access some results, we fetch them:

row = cursor.fetchone()

or

rows = cursor.fetchmany()
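To make the fetch interface concrete, here is a minimal, self-contained sketch. It uses the stdlib `sqlite3` module (also a DB-API 2.0 driver) so it runs without a server; the same cursor calls apply unchanged to psycopg2 or MySQLdb, and the table/column names are made up for illustration:

```python
import sqlite3

# sqlite3 implements the same DB-API 2.0 cursor interface as psycopg2.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE etc (id INTEGER, name TEXT)")
cursor.executemany("INSERT INTO etc VALUES (?, ?)",
                   [(1, "a"), (2, "b"), (3, "c")])

cursor.execute("SELECT * FROM etc ORDER BY id")
row = cursor.fetchone()     # a single row tuple: (1, "a")
rows = cursor.fetchmany(2)  # a list with the next batch: [(2, "b"), (3, "c")]
print(row, rows)
```

Note that `fetchone()` returns one row (or `None` when exhausted), while `fetchmany(n)` returns a list of up to `n` rows.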

Now let's say I do not retrieve all the rows and decide to execute another query. What will happen to the previous results? Is there an overhead?

Also, should I create a cursor for each kind of command and continuously reuse it for those same commands somehow? I heard psycopg2 can somehow optimize commands that are executed many times but with different values. How, and is it worth it?

Thanks!

Answer

jwp · Jun 9, 2009

Yeah, I know it's months old. :P

DB-API's cursor appears to be closely modeled after SQL cursors. As far as resource (row) management is concerned, DB-API does not specify whether the client must retrieve all the rows or DECLARE an actual SQL cursor. As long as the fetchXXX interfaces do what they're supposed to, DB-API is happy.

As far as psycopg2 cursors are concerned (as you may well know), unnamed DB-API cursors fetch the entire result set, which, as far as I know, is buffered in memory by libpq. Named DB-API cursors (a psycopg2 extension that may not be portable) request the rows on demand via the fetchXXX methods.
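A sketch of the difference in psycopg2, assuming a reachable PostgreSQL database (the connection parameters and the `process` handler are placeholders):

```python
import psycopg2

connection = psycopg2.connect(host="otherhost")  # placeholder parameters

# Unnamed cursor: execute() transfers the whole result set to the client.
client_side = connection.cursor()

# Named cursor: psycopg2 DECLAREs a server-side cursor and fetches rows
# on demand, so a large result is not buffered in client memory at once.
server_side = connection.cursor(name="my_cursor")
server_side.itersize = 1000           # rows per network round trip
server_side.execute("SELECT * FROM etc")
for row in server_side:               # streams batches of itersize rows
    process(row)                      # hypothetical per-row handler
```

The `itersize` attribute controls how many rows each round trip pulls from the server-side cursor.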

As "unbeknown" noted, executemany can be used to optimize multiple runs of the same command. However, it doesn't accommodate the need for prepared statements; when repeated executions of a statement with different parameter sets are not directly sequential, executemany() will perform just as well as execute(). DB-API does give driver authors the ability to cache executed statements, but the implementation (what's the scope/lifetime of the statement?) is undefined, so it's impossible to set expectations across DB-API implementations.
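A minimal sketch of the executemany pattern, again using sqlite3 so it runs standalone (table and values are made up); the interface is identical in psycopg2:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE items (id INTEGER, price REAL)")

# One executemany call replaces a Python-level loop of execute calls.
# The driver may batch these, but DB-API makes no promise of statement caching.
params = [(1, 9.99), (2, 4.50), (3, 12.00)]
cursor.executemany("INSERT INTO items VALUES (?, ?)", params)

cursor.execute("SELECT COUNT(*) FROM items")
count = cursor.fetchone()[0]
print(count)  # 3
```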

If you are loading lots of data into PostgreSQL, I would strongly recommend trying to find a way to use COPY.
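For completeness, a sketch of COPY via psycopg2's `copy_expert`, assuming a reachable PostgreSQL database and an existing `etc` table (both placeholders):

```python
import io
import psycopg2

connection = psycopg2.connect(host="otherhost")  # placeholder parameters
cursor = connection.cursor()

# COPY streams all rows in a single command instead of one INSERT per row,
# which is much faster for bulk loads. Default COPY format is tab-separated.
data = io.StringIO("1\ta\n2\tb\n3\tc\n")
cursor.copy_expert("COPY etc (id, name) FROM STDIN", data)
connection.commit()
```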