I'm looking to read in an Excel workbook with 15 fields and about 2000 rows, and convert each row to a dictionary in Python. I then want to append each dictionary to a list. I'd like each field in the top row of the workbook to be a key within each dictionary, with the corresponding cell value as that key's value. I've already looked at examples here and here, but I'd like to do something a bit different. The second example will work, but I feel like it would be more efficient to loop over the top row to populate the dictionary keys and then iterate through each row to get the values. My Excel file contains data from discussion forums and looks something like this (obviously with more columns):
id  thread_id  forum_id  post_time   votes  post_text
4   100        3         1377000566  1      'here is some text'
5   100        4         1289003444  0      'even more text here'
So, I'd like the fields id, thread_id, and so on to be the dictionary keys. I'd like my dictionaries to look like:
{'id': 4,
 'thread_id': 100,
 'forum_id': 3,
 'post_time': 1377000566,
 'votes': 1,
 'post_text': 'here is some text'}
Initially, I had some code like this iterating through the file, but my scope is wrong for some of the for-loops and I'm generating way too many dictionaries. Here's my initial code:
import xlrd
from xlrd import open_workbook, cellname

book = open_workbook('forum.xlsx', 'r')
sheet = book.sheet_by_index(3)
dict_list = []
for row_index in range(sheet.nrows):
    for col_index in range(sheet.ncols):
        d = {}
        # My intuition for the below for-loop is to take each cell in the top row of the
        # Excel sheet and add it as a key to the dictionary, and then pass the value at
        # the current indices of the above loops as the value to the dictionary. This
        # isn't working.
        for i in sheet.row(0):
            d[str(i)] = sheet.cell(row_index, col_index).value
        dict_list.append(d)
Any help would be greatly appreciated. Thanks in advance for reading.
The idea is to first read the header into a list. Then iterate over the sheet rows (starting from the row after the header), create a new dictionary from the header keys and the corresponding cell values, and append it to the list of dictionaries:
from xlrd import open_workbook

# Note: xlrd 2.0 and later only read legacy .xls files; opening an .xlsx
# file like this one needs xlrd < 2.0 (or a different library).
book = open_workbook('forum.xlsx')
sheet = book.sheet_by_index(3)

# read header values into the list
keys = [sheet.cell(0, col_index).value for col_index in range(sheet.ncols)]

dict_list = []
for row_index in range(1, sheet.nrows):
    # build one dictionary per data row, keyed by the header values
    d = {keys[col_index]: sheet.cell(row_index, col_index).value
         for col_index in range(sheet.ncols)}
    dict_list.append(d)

print(dict_list)
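The same idea can also be written by reading each whole row at once with xlrd's Sheet.row_values(rowx), which returns the row's cell values as a list, and pairing it with the header via zip. This is a minimal sketch, not the only way to do it: sheet_to_dicts is a hypothetical helper name, it assumes the header sits in row 0, and FakeSheet is a hypothetical stand-in that mimics only the nrows attribute and row_values() method so the example runs without a workbook. With a real file you would pass book.sheet_by_index(3) instead.

```python
def sheet_to_dicts(sheet, header_row=0):
    """Return one dict per data row, keyed by the values in header_row.

    Assumes an xlrd-style sheet: an nrows attribute and a row_values(rowx)
    method returning that row's cell values as a list.
    """
    keys = sheet.row_values(header_row)  # read the header only once
    # zip pairs each header key with the cell in the same column
    return [dict(zip(keys, sheet.row_values(row_index)))
            for row_index in range(header_row + 1, sheet.nrows)]


# Hypothetical stand-in for an xlrd Sheet, only to make the sketch runnable.
class FakeSheet:
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def row_values(self, rowx):
        return list(self._rows[rowx])


sheet = FakeSheet([
    ["A", "B", "C", "D"],
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
])
dict_list = sheet_to_dicts(sheet)
print(dict_list)
```

Note that zip stops at the shorter of its two inputs, so a row with fewer values than the header would simply produce a dictionary with fewer keys.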
For a sheet containing:
A B C D
1 2 3 4
5 6 7 8
it prints (the key order within each dictionary may differ between Python versions):
[{'A': 1.0, 'C': 3.0, 'B': 2.0, 'D': 4.0},
{'A': 5.0, 'C': 7.0, 'B': 6.0, 'D': 8.0}]
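xlrd returns every numeric cell as a Python float, which is why the output shows 1.0 rather than the 1 in the sheet. If integers are wanted (as in the 'id': 4 example in the question), one option is to post-process each dictionary. A minimal sketch, assuming any float with no fractional part should become an int (which would also convert a value that is genuinely meant to be 3.0); ints_where_possible is a hypothetical helper name, and the sample data adds one non-integral value to show it is left alone:

```python
def ints_where_possible(d):
    """Return a copy of d with integral floats (e.g. 4.0) converted to int.

    Non-float values (strings, etc.) and non-integral floats pass through.
    """
    return {key: int(value) if isinstance(value, float) and value.is_integer() else value
            for key, value in d.items()}


# Sample data shaped like the output above (7.5 added for illustration).
dict_list = [
    {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0},
    {"A": 5.0, "B": 6.0, "C": 7.5, "D": 8.0},
]
cleaned = [ints_where_possible(d) for d in dict_list]
print(cleaned)  # [{'A': 1, 'B': 2, 'C': 3, 'D': 4}, {'A': 5, 'B': 6, 'C': 7.5, 'D': 8}]
```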
Update: the dictionary comprehension above, expanded into a plain loop:
d = {}
for col_index in range(sheet.ncols):
    d[keys[col_index]] = sheet.cell(row_index, col_index).value