Unable to read a parquet file

python pandas parquet pyarrow fastparquet

Anonymous Person · Mar 13, 2019 · Viewed 7.5k times · Source

I am breaking my head over this right now. I am new to parquet files, and I am running into a LOT of issues with them.

I get the error OSError: Passed non-file path: \datasets\proj\train\train.parquet each time I try to create a DataFrame from the file.

I've tried both pq.read_pandas(r'E:\datasets\proj\train\train.parquet').to_pandas() and od = pd.read_parquet(r'E:\datasets\proj\train\train.parquet', engine='pyarrow').

I also changed the letter of the drive the dataset resides on, and it's the SAME THING!

It's the same with all engines.

PLEASE HELP!

Answer

This might be a problem with Arrow's file path handling. You could instead pass in an already opened file:

import pandas as pd

with open(r'E:\datasets\proj\train\train.parquet', 'rb') as f:
    df = pd.read_parquet(f, engine='pyarrow')
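
If opening the file by hand fails as well, it may help to rule out a bad path or a damaged file before blaming the reader. The following is only a rough sketch using the standard library; the function name diagnose_parquet_path and its return strings are made up for illustration and are not part of pandas or pyarrow. It relies on the fact that an unencrypted Parquet file starts and ends with the 4-byte marker PAR1.

```python
from pathlib import Path

# Unencrypted Parquet files begin and end with this 4-byte marker.
PARQUET_MAGIC = b"PAR1"


def diagnose_parquet_path(path):
    """Hypothetical helper: return a short status string for `path`."""
    p = Path(path)
    if not p.exists():
        return "missing"      # nothing at this path (typo, wrong drive, ...)
    if not p.is_file():
        return "not a file"   # e.g. a directory
    if p.stat().st_size < 2 * len(PARQUET_MAGIC):
        return "too small"    # cannot even hold both markers
    with p.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), 2)  # 2 = seek relative to end of file
        tail = f.read(len(PARQUET_MAGIC))
    if head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
        return "not parquet"  # markers missing: truncated or another format
    return "ok"
```

Calling diagnose_parquet_path(r'E:\datasets\proj\train\train.parquet') and getting "ok" would suggest the file itself is fine and the problem lies in how the path is handed to the reader, in which case passing the open file object as shown above is a reasonable workaround.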
