Top "Parquet" questions

Apache Parquet is a columnar storage format for the Hadoop ecosystem.

Transferring and writing Parquet with Python and pandas: timestamp error

I tried to concat() two Parquet files with pandas in Python. It works, but when I try to write …

python pandas parquet

Spark DataFrame: validating column names for Parquet writes (Scala)

I'm processing events using DataFrames converted from a stream of JSON events, which eventually gets written out as Parquet …

apache-spark apache-spark-sql spark-streaming spark-dataframe parquet

Read Parquet file stored in S3 with AWS Lambda (Python 3)

I am trying to load, process and write Parquet files in S3 with AWS Lambda. My testing / deployment process is: …

python amazon-s3 aws-lambda parquet pyarrow

Reading gz.parquet file

Hello, I need to read the data from gz.parquet files but don't know how. I tried with Impala but …

apache-spark hive apache-kafka parquet flume-twitter

Assign schema to pa.Table.from_pandas()

I'm getting this error when converting a pandas DataFrame to Parquet using PyArrow: ArrowInvalid('Error converting from Python objects to …

python pandas parquet pyarrow

What is the benefit of using nested data types in Parquet?

Is there any performance benefit from using nested data types in the Parquet file format? AFAIK …

apache-spark nested parquet data-files