sheets of Excel Workbook from a URL into a `pandas.DataFrame`

benjaminmgross · Mar 23, 2013 · Viewed 23k times

After looking at different ways to read a URL pointing to a .xls file, I decided to go with xlrd.

I am having a difficult time converting an 'xlrd.book.Book' object to a 'pandas.DataFrame'.

I have the following:

import pandas
import xlrd
import urllib2

link = 'http://www.econ.yale.edu/~shiller/data/chapt26.xls'
socket = urllib2.urlopen(link)

# this line gets me the Excel workbook
xlfile = xlrd.open_workbook(file_contents=socket.read())

# storing the sheets
sheets = xlfile.sheets()

I want to take the last sheet in sheets and import it as a pandas.DataFrame. Any ideas as to how I can accomplish this? I've tried pandas.ExcelFile.parse(), but it wants a path to an Excel file. I can certainly save the file first and then parse it (using tempfile or something), but I'm trying to follow Pythonic guidelines and use functionality likely already written into pandas.

Any guidance is greatly appreciated as always.

Answer

DSM · Mar 23, 2013

You can pass your socket to ExcelFile:

>>> import pandas as pd
>>> import urllib2
>>> link = 'http://www.econ.yale.edu/~shiller/data/chapt26.xls'
>>> socket = urllib2.urlopen(link)
>>> xd = pd.ExcelFile(socket)
NOTE *** Ignoring non-worksheet data named u'PDVPlot' (type 0x02 = Chart)
NOTE *** Ignoring non-worksheet data named u'ConsumptionPlot' (type 0x02 = Chart)
>>> xd.sheet_names
[u'Data', u'Consumption', u'Calculations']
>>> df = xd.parse(xd.sheet_names[-1], header=None)
>>> df
                                   0   1   2   3         4
0        Average Real Interest Rate: NaN NaN NaN  1.028826
1    Geometric Average Stock Return: NaN NaN NaN  0.065533
2              exp(geo. Avg. return) NaN NaN NaN  0.067728
3  Geometric Average Dividend Growth NaN NaN NaN  0.012025
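
The session above is Python 2 (`urllib2`, `u'...'` strings). For anyone on Python 3, here is a minimal sketch of the same idea, assuming a reasonably recent pandas with an Excel engine installed (xlrd for .xls files, openpyxl for .xlsx). The helper names `last_sheet_from_bytes` and `last_sheet_from_url` are made up for illustration and are not part of pandas.

```python
import io
import urllib.request

import pandas as pd


def last_sheet_from_bytes(data, **parse_kwargs):
    """Parse the last worksheet of an in-memory Excel workbook.

    Hypothetical helper: `data` holds the raw bytes of the workbook, and
    any keyword arguments (e.g. header=None) are forwarded to
    ExcelFile.parse().
    """
    # pd.ExcelFile accepts a binary file-like object, so the bytes are
    # wrapped in a seekable buffer instead of being written to disk.
    with pd.ExcelFile(io.BytesIO(data)) as workbook:
        last_name = workbook.sheet_names[-1]
        return workbook.parse(last_name, **parse_kwargs)


def last_sheet_from_url(url, **parse_kwargs):
    """Download a workbook and parse its last worksheet (hypothetical helper)."""
    # urllib.request.urlopen is the Python 3 counterpart of urllib2.urlopen.
    with urllib.request.urlopen(url) as response:
        return last_sheet_from_bytes(response.read(), **parse_kwargs)
```

With the link from the question, this would be called as `df = last_sheet_from_url(link, header=None)`. Reading the response into a `BytesIO` buffer first is a deliberate choice here: the buffer is seekable, which the raw HTTP response is not.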