Unable to read a parquet file

Anonymous Person · Mar 13, 2019

I am breaking my head over this right now. I am new to Parquet files, and I am running into a LOT of issues with them.

Each time I try to create a DataFrame from the file, I get this error: OSError: Passed non-file path: \datasets\proj\train\train.parquet

I've tried both of these:

pq.read_pandas(r'E:\datasets\proj\train\train.parquet').to_pandas()

od = pd.read_parquet(r'E:\datasets\proj\train\train.parquet', engine='pyarrow')

I also changed the drive letter of the drive the dataset resides on, and it's the SAME THING!

It's the same with all engines.

PLEASE HELP!

Answer

Uwe L. Korn · Mar 14, 2019

This might be a problem with Arrow's file path handling. You could instead pass in an already opened file:

import pandas as pd

with open(r'E:\datasets\proj\train\train.parquet', 'rb') as f:
    df = pd.read_parquet(f, engine='pyarrow')
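
If it is unclear whether the path itself is the problem, it may help to check it before handing anything to the reader. The following is only a sketch: open_parquet_source and read_parquet_via_handle are hypothetical helper names, not part of pandas or pyarrow, and the approach assumes the file is on a locally accessible drive. The path is validated with pathlib, and the reader is then given an open binary handle rather than a path string, as in the snippet above.

```python
from pathlib import Path


def open_parquet_source(path_str):
    """Return an open binary handle for path_str, or raise a clear error.

    Hypothetical helper: verifies that the path exists and is a regular
    file before it is passed on to a Parquet reader.
    """
    path = Path(path_str)
    if not path.exists():
        raise FileNotFoundError(f"No such path: {path}")
    if not path.is_file():
        raise OSError(f"Not a regular file: {path}")
    return path.open("rb")


def read_parquet_via_handle(path_str):
    """Read a Parquet file through an open handle instead of a path string."""
    # Imported lazily so the path check above works without pandas installed.
    import pandas as pd  # requires pandas and pyarrow

    with open_parquet_source(path_str) as f:
        return pd.read_parquet(f, engine="pyarrow")
```

With the path from the question, this would be called as df = read_parquet_via_handle(r'E:\datasets\proj\train\train.parquet'). If the first helper raises, the problem lies with the path or the drive rather than with the Parquet reader.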