Python : XLRD; compare the columns length

Sathy picture Sathy · Jun 4, 2013 · Viewed 9.3k times · Source

I'm using xlrd to work on xls files. My xls file has got two columns and my requirement is to make sure both the columns have got equal number of rows. I learnt from help() that we have got a row_len() to look for the length of a row given with the index, but unable to find any for col_len. Can you please help with any

Here is my code

from xlrd import open_workbook
spread_sheet=open_workbook("simple.xls")
sheet1=spread_sheet.sheet_by_index(0)

#validates the no of columns in the Spread sheet
 if sheet1.ncols == 2:
  for sheet1_rows in range(sheet1.nrows):
    for sheet1_cols in range(sheet1.ncols):
        value=sheet1.cell(sheet1_rows,sheet1_cols).value
        source=sheet1.cell(sheet1_rows,0).value
        destination=sheet1.cell(sheet1_rows,1).value
    #ignores the Source and Destination Headers 
    if value not in ('Source','Destination'):
        print "Source is : %s \nDestination is : %s\n" %    (source,destination)
 else:
  print "XLS provided is not valid. Check the no of columns is 2"

Some other options apart from comparing the below please

>>> print len(sheet1.col_values(0))
8
>>> print len(sheet1.col_values(1))
8

Thanks for your reply @alecxe. Instead adding few more lines to my code, I found out something below. please advise will this work out

 >>> print len(sheet1.col_values(0))
 6
 >>> print len(sheet1.col_values(1))
 6
 >>> sheet1.col_values(0)
 [u'A', 1.0, 1.0, 1.0, 1.0, 2.0]
 >>> sheet1.col_values(1)
 [u'B', 2.0, 2.0, 2.0, 2.0, '']
 >>> print len(filter(None,sheet1.col_values(1)))
 5
 >>>

Answer

alecxe picture alecxe · Jun 5, 2013

You can't use len(sheet.col_values(index)) for measuring how many cells are set in the column (column length). col_values length is always equal to sheet.nrows.

Imagine you have the following in the input.xls:

A B
1 2
1 2
1 2
1 2
  2 

Then len(sheet.col_values(0)) will return 5 (as well as len(sheet.col_values(1))), which is incorrect. Should be 4.

Instead, it's better to use something like this:

from itertools import takewhile
import xlrd


def column_len(sheet, index):
    col_values = sheet.col_values(index)
    col_len = len(col_values)
    for _ in takewhile(lambda x: not x, reversed(col_values)):
        col_len -= 1
    return col_len


book = xlrd.open_workbook("input.xls")
sheet = book.sheet_by_index(0)

print column_len(sheet, 0)  # prints 4
print column_len(sheet, 1)  # prints 5

Hope that helps.