Top "Parquet" questions

Apache Parquet is a columnar storage format for Hadoop.

How to write a Parquet file from a pandas DataFrame to S3 in Python

I have a pandas DataFrame. I want to write this DataFrame to a Parquet file in S3. I need a sample …

python-3.x amazon-s3 parquet
Installing parquet-tools

I am trying to install parquet-tools on a FreeBSD machine. I cloned this repo: git clone https://github.com/…

java maven freebsd parquet parquet-mr
Convert Parquet to CSV

How to convert Parquet to CSV from a local file system (e.g. Python, some library, etc.) but WITHOUT Spark? (…

python csv command-line parquet
How to handle changing parquet schema in Apache Spark

I have run into a problem where I have Parquet data as daily chunks in S3 (in the form of …

apache-spark apache-spark-sql spark-dataframe emr parquet
Using pyarrow, how do you append to a Parquet file?

How do you append/update to a parquet file with pyarrow? import pandas as pd import pyarrow as pa import …

python pandas parquet pyarrow
Spark : Read file only if the path exists

I am trying to read the files present at Sequence of Paths in Scala. Below is the sample (pseudo) code: …

scala apache-spark parquet
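Spark's reader fails with a "Path does not exist" error if any input path is missing, so a common approach is to filter the path list before calling the reader. This sketch covers only the local filesystem and uses a hypothetical helper; for HDFS or S3 the analogous check in Scala is Hadoop's `FileSystem.exists`, with the `FileSystem` obtained from `spark.sparkContext.hadoopConfiguration`.

```python
from pathlib import Path
from typing import Iterable


def existing_paths(paths: Iterable[str]) -> list[str]:
    """Keep only the paths that exist locally, preserving their order."""
    return [p for p in paths if Path(p).exists()]
```

The filtered list can then be passed to the reader (`spark.read.parquet(*paths)` in PySpark, `spark.read.parquet(paths: _*)` in Scala), skipping the read entirely when the list comes back empty.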
Index in Parquet

I would like to be able to do a fast range query on a Parquet table. The amount of data …

indexing parquet
Is it better to have one large parquet file or lots of smaller parquet files?

I understand HDFS will split files into something like 64 MB chunks. We have data coming in streaming and we can …

hadoop apache-spark parquet
A comparison between fastparquet and pyarrow?

After some searching I failed to find a thorough comparison of fastparquet and pyarrow. I found this blog post (a …

python parquet dask pyarrow fastparquet
How to handle null values when writing to parquet from Spark

Until recently Parquet did not support null values - a questionable premise. In fact a recent version did finally add …

apache-spark parquet