Load an xlsx file from Google Drive in Colaboratory

dd_rookie · Nov 22, 2017

How can I import an MS Excel (.xlsx) file from Google Drive into Colaboratory?

excel_file = drive.CreateFile({'id':'some id'})

works (drive is a pydrive.drive.GoogleDrive object). But

print(excel_file.FetchContent())

returns None. And

excel_file.content()

throws:

TypeError                                 Traceback (most recent call last)
in ()
----> 1 excel_file.content()

TypeError: '_io.BytesIO' object is not callable

Given a valid file ID, my intent is to import the file as an io object that pandas read_excel() can read, and finally get a pandas DataFrame out of it.

Answer

Bob Smith · Nov 22, 2017

You'll want to use excel_file.GetContentFile to save the file locally. Then you can read it with the pandas read_excel function after running !pip install -q xlrd.

Here's a full example: https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC

What I did in more detail:

I created a new spreadsheet in Google Sheets to export as an .xlsx file.

Next, I exported it as an .xlsx file and uploaded it back to Drive. The URL is: https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM

Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.
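If you have many files, copying the ID by hand gets tedious. The sketch below pulls it out of a sharing URL with the standard library. The helper name drive_file_id is hypothetical, and it assumes only the two URL shapes shown in its docstring (the open?id= form used above and the /file/d/<ID>/view form); other Drive URL formats would need their own handling.

```python
import re
from urllib.parse import parse_qs, urlparse


def drive_file_id(url: str) -> str:
    """Extract a Drive file ID from a sharing URL (hypothetical helper).

    Handles two common URL shapes:
      https://drive.google.com/open?id=<ID>
      https://drive.google.com/file/d/<ID>/view
    """
    parsed = urlparse(url)
    # Shape 1: the ID is the `id` query parameter.
    query_ids = parse_qs(parsed.query).get("id")
    if query_ids:
        return query_ids[0]
    # Shape 2: the ID is the path segment right after /file/d/.
    match = re.search(r"/file/d/([^/]+)", parsed.path)
    if match:
        return match.group(1)
    raise ValueError(f"no Drive file ID found in {url!r}")


file_id = drive_file_id(
    "https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM"
)
print(file_id)  # 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM
```

The returned string can then be passed as the 'id' value to drive.CreateFile.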

Then, in Colab, I adapted Colab's Drive download snippet to download the file. The key bits are:

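# Assumes `drive` is an authenticated pydrive.drive.GoogleDrive object, as in the question.
file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
downloaded = drive.CreateFile({'id': file_id})
# Save the file's bytes to exported.xlsx in the Colab VM's working directory.
downloaded.GetContentFile('exported.xlsx')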
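Before handing the file to pandas, it can help to confirm that what landed on disk really is an .xlsx workbook (and not, say, an HTML error page). An .xlsx file is a ZIP-based Office Open XML package, so a rough stdlib check is possible. The helper looks_like_xlsx is hypothetical and purely heuristic: it assumes the workbook part sits at the conventional xl/workbook.xml path and does not validate the workbook contents.

```python
import zipfile


def looks_like_xlsx(path: str) -> bool:
    """Heuristic check that `path` is a ZIP-based .xlsx package.

    Hypothetical helper: it only inspects archive member names.
    """
    # is_zipfile returns False (rather than raising) for missing files.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
    # Office Open XML packages carry [Content_Types].xml at the root;
    # spreadsheets conventionally store the workbook part at xl/workbook.xml.
    return "[Content_Types].xml" in names and "xl/workbook.xml" in names
```

In the notebook, looks_like_xlsx('exported.xlsx') should return True right after the GetContentFile call if the download worked.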

Finally, to create a pandas DataFrame:

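!pip install -q xlrd
import pandas as pd
# Read the downloaded workbook; by default only the first sheet is loaded.
df = pd.read_excel('exported.xlsx')
# Display the DataFrame as the cell's output.
df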

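The !pip install line installs the xlrd library, which pandas used to read Excel files when this answer was written. Note that xlrd 2.0 and later read only legacy .xls files; recent pandas versions read .xlsx files with openpyxl instead, so on a current setup install that (!pip install -q openpyxl).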
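As for the error in the question: the traceback shows that excel_file.content is an io.BytesIO object, which suggests FetchContent() stores the downloaded bytes on that attribute and returns None instead of returning them. The sketch below reproduces that behaviour without Drive access, using a hypothetical stand-in class (FakeDriveFile is not part of PyDrive) and placeholder bytes rather than a real workbook.

```python
import io


class FakeDriveFile:
    """Hypothetical stand-in for a PyDrive file object; not the real class."""

    def __init__(self, payload: bytes):
        self._payload = payload
        self.content = None

    def FetchContent(self):
        # Mimics what the question observed: the bytes end up on the
        # `content` attribute as an io.BytesIO, and the method returns None.
        self.content = io.BytesIO(self._payload)


excel_file = FakeDriveFile(b"PK\x03\x04 placeholder bytes")
result = excel_file.FetchContent()
print(result)  # None

error_message = ""
try:
    excel_file.content()  # what the question did: call the attribute
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # '_io.BytesIO' object is not callable

# The attribute itself is the file-like object; read from it directly.
excel_file.content.seek(0)
first_bytes = excel_file.content.read(4)
```

Under that reading, the in-memory route the question asks about should work with a real PyDrive file as excel_file.FetchContent() followed by pd.read_excel(excel_file.content), since read_excel accepts file-like objects. This variant is untested here; the GetContentFile approach above is the one shown in the linked notebook.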