Unable to read xlsb file using pandas

Syed Afsahul · Feb 10, 2020 · Viewed 9.3k times

I am trying to read a local xlsb file using pandas' read_excel, but I am getting an error. My code:

import pandas as pd
df3 = pd.read_excel('a.xlsb', engine='pyxlsb')


Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-06db88cb2446> in <module>
----> 1 pd.read_excel('a.xlsb', engine='pyxlsb')

/usr/local/lib/python3.5/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    186                 else:
    187                     kwargs[new_arg_name] = new_arg_value
--> 188             return func(*args, **kwargs)
    189         return wrapper
    190     return _deprecate_kwarg

/usr/local/lib/python3.5/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    186                 else:
    187                     kwargs[new_arg_name] = new_arg_value
--> 188             return func(*args, **kwargs)
    189         return wrapper
    190     return _deprecate_kwarg

/usr/local/lib/python3.5/dist-packages/pandas/io/excel.py in read_excel(io, sheet_name, header, names, index_col, parse_cols, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skip_footer, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    348 
    349     if not isinstance(io, ExcelFile):
--> 350         io = ExcelFile(io, engine=engine)
    351 
    352     return io.parse(

/usr/local/lib/python3.5/dist-packages/pandas/io/excel.py in __init__(self, io, engine)
    644             engine = 'xlrd'
    645         if engine not in self._engines:
--> 646             raise ValueError("Unknown engine: {engine}".format(engine=engine))
    647 
    648         # could be a str, ExcelFile, Book, etc.

ValueError: Unknown engine: pyxlsb

It works fine for csv and xlsx files.

python version: 3.5.2
pandas version: 0.24.2

Answer

Naveen kumar Nandyala · Feb 11, 2020

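First install pyxlsb, then run the code below. pandas 0.24.2 does not recognize engine='pyxlsb' (read_excel gained that engine in pandas 1.0.0), so the code reads the workbook with pyxlsb directly. After it runs, your data is stored in df1.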

pip install pyxlsb

import pandas as pd
from pyxlsb import open_workbook

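rows = []
# Sheets are numbered from 1 in pyxlsb.
with open_workbook('some.xlsb') as wb:
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            # Each row is a list of cells; .v holds the cell value.
            rows.append([item.v for item in row])

# Use the first row as the header and the remaining rows as data.
df1 = pd.DataFrame(rows[1:], columns=rows[0])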
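
If upgrading is an option, pandas 1.0.0 and later accept engine='pyxlsb' in read_excel directly. pandas 1.0 needs Python 3.6.1 or newer, so the Python 3.5.2 setup in the question would have to be upgraded as well. Below is a minimal sketch that picks the path based on the installed pandas version. The names supports_pyxlsb_engine and read_xlsb are hypothetical helpers made up for this example, not part of pandas or pyxlsb, and pyxlsb must be installed in either case.

```python
import re


def supports_pyxlsb_engine(pandas_version):
    """Return True if this pandas version accepts engine='pyxlsb' (1.0.0 or later)."""
    numbers = []
    for piece in pandas_version.split(".")[:2]:
        match = re.match(r"\d+", piece)  # leading digits only, so '0rc0' -> 0
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 2:  # pad a bare major version such as '2'
        numbers.append(0)
    return tuple(numbers) >= (1, 0)


def read_xlsb(path, sheet_index=0):
    """Hypothetical helper: read one sheet of an .xlsb file into a DataFrame."""
    # Imported here so the version check above can be used on its own.
    import pandas as pd

    if supports_pyxlsb_engine(pd.__version__):
        # pandas counts sheets from 0 when sheet_name is an integer.
        return pd.read_excel(path, sheet_name=sheet_index, engine="pyxlsb")

    # Older pandas: read the cells with pyxlsb directly, as in the answer above.
    from pyxlsb import open_workbook

    rows = []
    with open_workbook(path) as wb:
        # pyxlsb counts sheets from 1, hence the + 1.
        with wb.get_sheet(sheet_index + 1) as sheet:
            for row in sheet.rows():
                rows.append([cell.v for cell in row])
    # Assumes the first row of the sheet holds the column names.
    return pd.DataFrame(rows[1:], columns=rows[0])
```

With this in place, df3 = read_xlsb('a.xlsb') should work on both old and new pandas versions. On pandas 0.24.2 the version check returns False, so the fallback branch runs.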