I'm using xlrd to sniff through a directory structure and pull out spreadsheets, reading the second row (row 1) down to "do stuff." The issue is that I don't know how to stop the reading/printing at the first empty row. I understand that rows are not "empty" objects, but I'd appreciate a little help showing how to check if all cells are empty. Here is the code I'm working with:
import xlrd
import os
def excel_file_filter(filename, extensions=['.xls', '.xlsx']):
return any(filename.endswith(e) for e in extensions)
def get_filenames(root):
filename_list = []
for path, subdirs, files in os.walk(root):
for filename in filter(excel_file_filter, files):
filename_list.append(os.path.join(path, filename))
return filename_list
spreadsheets = get_filenames('C:\\Temp')
for s in spreadsheets:
with xlrd.open_workbook(s) as wb:
cs = wb.sheet_by_index(0)
num_cols = cs.ncols
for row_index in range(1, cs.nrows):
print('Row: {}'.format(row_index))
for col_index in range(0, num_cols):
cell_object = cs.cell(row_index, col_index)
if cell_obj is not xlrd.empty_cell:
print('Col #: {} | Value: {}'.format(col_index, cell_obj))
What ends up happening is that it prints all the way through almost 1000 rows, when only the first say, 25 rows have content in them. The amount of content between the spreadsheets varies, so a generic solution (that does not rely on other optional libraries) that helps me understand how to detect an empty row and then break, would be appreciated.
First: To get the cell value then check if it is empty, use one of the methods explained in the answer of question How to detect if a cell is empty when reading Excel files using the xlrd library?
cell_val= cs.cell(row_index, col_index).value
to get the value:
if cell_vel == ''
cell_object = cs.cell(row_index, col_index)
to get the value:
cell_type = cs.cell_type(row_index, col_index)
if cell_type == xlrd.XL_CELL_EMPTY
Second: To check if the whole row is empty you can do the following:
The Code:
# define empty_cell boolean
empty_cell= False
with xlrd.open_workbook(s) as wb:
cs= wb.sheet_by_index(0)
num_cols= cs.ncols
num_rows= cs.nrows
for row_index in range(1, num_rows):
# set count empty cells
count_empty = 0
print('Row: {}'.format(row_index))
for col_index in range(0,num_cols):
# get cell value
cell_val= cs.cell(row_index, col_index).value
# check if cell is empty
if cell_val== '':
# set empty cell is True
empty_cell = True
# increment counter
count_empty+= 1
else:
# set empty cell is false
empty_cell= False
# check if cell is not empty
if not empty_cell:
# print value of cell
print('Col #: {} | Value: {}'.format(col_index, cell_val))
# check the counter if is = num_cols means the whole row is empty
if count_empty == num_cols:
print ('Row is empty')
# stop looping to next rows
break
Note: I used the first method cell_val= cs.cell(row_index, col_index).value
to get the value of cells, I see it's simpler.
If you want to use the other method change the following:
cell_val= cs.cell(row_index, col_index) # remove .value
cell_type= cs.cell_type(row_index, col_index) # add this line
# check if cell is empty
if cell_type == xlrd.XL_CELL_EMPTY: # change if cell_val== '':
Other links that helped me understand how to check if a cell is empty:
xlrd.XL_CELL_EMPTY and Validating a cell value using XLRD