Putting many python pandas dataframes to one excel worksheet

nyan314sn · Oct 5, 2015 · Viewed 52.8k times

It is quite easy to add many pandas dataframes to an Excel workbook as long as each goes on a different worksheet. But it is somewhat tricky to get many dataframes onto one worksheet if you want to use pandas' built-in df.to_excel functionality.

import pandas as pd
# Creating Excel Writer Object from Pandas
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
workbook = writer.book
worksheet = workbook.add_worksheet('Validation')
df.to_excel(writer, sheet_name='Validation', startrow=0, startcol=0)
another_df.to_excel(writer, sheet_name='Validation', startrow=20, startcol=0)

The above code won't work. It fails with the error:

 Sheetname 'Validation', with case ignored, is already in use.

After some experimenting, I found a way to make it work.

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')  # Creating Excel Writer Object from Pandas
workbook = writer.book
df.to_excel(writer, sheet_name='Validation', startrow=0, startcol=0)
another_df.to_excel(writer, sheet_name='Validation', startrow=20, startcol=0)
writer.close()  # write the workbook to disk

This works. My purpose in posting this question on Stack Overflow is twofold. Firstly, I hope it helps anyone trying to put many dataframes into a single Excel worksheet.

Secondly, can someone help me understand the difference between these two blocks of code? They look pretty much the same to me, except that the first block creates the worksheet called "Validation" in advance while the second does not. I get that part.

What I don't understand is why that should make any difference. Even if I don't create the worksheet in advance, this line, the one right before the last,

 df.to_excel(writer,sheet_name='Validation',startrow=0 , startcol=0)  

will create the worksheet anyway. Consequently, by the time we reach the last line of the second block, the worksheet "Validation" already exists there too. So my question is basically: why does the second block of code work while the first doesn't?

Please also share if there is another way to put many dataframes into Excel using the built-in df.to_excel functionality!

Answer

Adrian · Apr 20, 2017

To create the Worksheet in advance, you need to add the created sheet to the sheets dict:

writer.sheets['Validation'] = worksheet

Using your original code:

import pandas as pd
# Creating Excel Writer Object from Pandas
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
workbook = writer.book
worksheet = workbook.add_worksheet('Validation')
writer.sheets['Validation'] = worksheet  # register the sheet with the writer
df.to_excel(writer, sheet_name='Validation', startrow=0, startcol=0)
another_df.to_excel(writer, sheet_name='Validation', startrow=20, startcol=0)
writer.close()  # write the workbook to disk
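
The hard-coded startrow=20 only works while the first dataframe fits in 20 rows. As a rough sketch (not part of pandas), the start rows could be computed from the frame lengths instead. The helper names start_rows and write_stacked are made up for illustration, and header_rows=1 assumes the default header=True with single-level column labels:

```python
def start_rows(frames, gap=2, header_rows=1):
    """Return a startrow for each frame so they stack without overlapping.

    frames must be a list of objects supporting len(), such as DataFrames.
    gap is the number of blank rows left between consecutive frames.
    header_rows=1 is an assumption: one header row per frame.
    """
    rows = []
    next_row = 0
    for frame in frames:
        rows.append(next_row)
        # Skip past this frame's header, its data rows, and the blank gap.
        next_row += header_rows + len(frame) + gap
    return rows


def write_stacked(frames, writer, sheet_name, gap=2):
    """Write each frame below the previous one on a single sheet."""
    for frame, row in zip(frames, start_rows(frames, gap=gap)):
        frame.to_excel(writer, sheet_name=sheet_name, startrow=row, startcol=0)
```

With the code above, this would be called as write_stacked([df, another_df], writer, 'Validation') after creating the writer and before closing it.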

Explanation

If we look at the pandas function to_excel, it uses the writer's write_cells function:

excel_writer.write_cells(formatted_cells, sheet_name, startrow=startrow, startcol=startcol)

So looking at the write_cells function for xlsxwriter:

def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0):
    # Write the frame cells using xlsxwriter.
    sheet_name = self._get_sheet_name(sheet_name)
    if sheet_name in self.sheets:
        wks = self.sheets[sheet_name]
    else:
        wks = self.book.add_worksheet(sheet_name)
        self.sheets[sheet_name] = wks

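Here we can see that it checks for sheet_name in self.sheets, so a worksheet created directly on the workbook needs to be added there as well. In the first block, 'Validation' exists in the workbook but not in writer.sheets, so write_cells calls add_worksheet a second time and xlsxwriter rejects the duplicate name. In the second block, the first to_excel call creates the sheet and registers it in self.sheets, so the second call simply reuses it.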
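
To make the mechanism concrete, here is a toy model of the lookup-or-create branch above. ToyWorkbook and ToyWriter are hypothetical stand-ins written for this illustration; they are not the pandas or xlsxwriter classes, and the real behaviour may differ between pandas versions:

```python
class ToyWorkbook:
    """Stand-in for a workbook that rejects duplicate sheet names."""

    def __init__(self):
        self._names = set()

    def add_worksheet(self, name):
        if name.lower() in self._names:
            raise ValueError(
                f"Sheetname '{name}', with case ignored, is already in use."
            )
        self._names.add(name.lower())
        return {"name": name, "writes": []}  # a dict stands in for a worksheet


class ToyWriter:
    """Stand-in for the writer, with its own registry of known sheets."""

    def __init__(self):
        self.book = ToyWorkbook()
        self.sheets = {}

    def write_cells(self, cells, sheet_name, startrow=0, startcol=0):
        # Same lookup-or-create branch as in the excerpt above.
        if sheet_name in self.sheets:
            wks = self.sheets[sheet_name]
        else:
            wks = self.book.add_worksheet(sheet_name)
            self.sheets[sheet_name] = wks
        wks["writes"].append((startrow, startcol, cells))


def first_block():
    """Sheet created on the workbook but never registered with the writer."""
    writer = ToyWriter()
    writer.book.add_worksheet("Validation")
    try:
        writer.write_cells("df", "Validation", startrow=0)
    except ValueError as exc:
        return str(exc)
    return "ok"


def second_block():
    """No sheet created in advance: the first write registers it."""
    writer = ToyWriter()
    writer.write_cells("df", "Validation", startrow=0)
    writer.write_cells("another_df", "Validation", startrow=20)
    return len(writer.sheets["Validation"]["writes"])


def fixed_block():
    """Sheet created in advance and registered in writer.sheets."""
    writer = ToyWriter()
    worksheet = writer.book.add_worksheet("Validation")
    writer.sheets["Validation"] = worksheet
    writer.write_cells("df", "Validation", startrow=0)
    writer.write_cells("another_df", "Validation", startrow=20)
    return len(worksheet["writes"])
```

In this model first_block() returns the duplicate-name error message, while second_block() and fixed_block() both record two writes on the same sheet.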