python xlrd receiving float from excel text cell

binaryFever picture binaryFever · Dec 17, 2011 · Viewed 11k times · Source

I am attempting to read values from an Excel file using xlrd. It has been working great on dates, numbers, and up until now text. I have a column (category) with cells containing text (the cells are formatted as text). When I print out the cell value a float is displayed instead of the text. I also printed out the ctype of the Cell object(s) to check and it is showing as Number. I've read through the documentation and tutorial of xlrd and can't seem to find why this is occurring. Could it be that my excel file is somehow messed up? Any suggestions or pointers in the right direction?

import xlrd
import datetime

workbook = xlrd.open_workbook('training.xls')
courseSheet = workbook.sheet_by_index(0)

for row in range(courseSheet.nrows):
    title = courseSheet.cell_value(row, 2)
    date = courseSheet.cell_value(row, 4)
    date = datetime.datetime(*xlrd.xldate_as_tuple(date, workbook.datemode))
    dateTuple = date.timetuple()
    category = courseSheet.cell_value(row, 7)
    print category

Answer

John Machin picture John Machin · Dec 18, 2011

Background: For each cell, xlrd reports the intrinsic value (if any) that is stored in the XLS file. The value types are assigned initially solely on the basis of the record type in the file (e.g. NUMBER and RK records contain floating-point numbers). It classifies formats as described here and uses that information to override the value type where it is apparent that a datetime, date, or time is intended rather that a number. xlrd does not purport to be able to render cell values according to the format ascribed to the cell.

The cells in question have evidently been entered as numbers. If they have had a text format applied to them, that does not make them "text cells".

You say """When I print out the cell value a float is displayed instead of the text""" ... please give some examples of (a) what was typed into the cell when the file was created (b) what is the evidence that "the cells are formatted as text" (c) what is repr(cell.value) (d) what is "the text" that you expected to be displayed?

You may find the following code useful:

import xlrd, sys

def dump_cell(sheet, rowx, colx):
    c = sheet.cell(rowx, colx)
    xf = sheet.book.xf_list[c.xf_index]
    fmt_obj = sheet.book.format_map[xf.format_key]
    print rowx, colx, repr(c.value), c.ctype, \ 
        fmt_obj.type, fmt_obj.format_key, fmt_obj.format_str

book = xlrd.open_workbook(sys.argv[1], formatting_info=1)
sheet = book.sheet_by_index(0)
for rowx in xrange(sheet.nrows):
    for colx in xrange(sheet.ncols):
        dump_cell(sheet, rowx, colx)