This is more of a question on understanding than programming. I am quite new to Pandas and SQL. I am using pandas to read data from SQL with some specific chunksize. When I run a sql query e.g. import pandas as pd
df = pd.read_sql_query('select name, birthdate from table1', chunksize = 1000)
What I do not understand is when I do not give a chunksize, data is stored in the memory and I can see the memory growing however, when I give a chunksize the memory usage is not that high.
I have is that this df now contains a number of arrays which I can access as
for df_array in df:
print df.head(5)
What I do not understand here is if the entire result of the SQL statement is kept in memory i.e. df is an object carrying multiple arrays or if these are like pointers pointing towards a temp table created by SQL query.
I would be very glad to develop some understanding about how this process is actually working.
Let's consider two options and what happens in both cases:
For more details you can see pandas\io\sql.py module, it is well documented