Top "Parquet" questions

Apache Parquet is a columnar storage format for Hadoop.

Spark Parquet write gets slow as partitions grow

I have a Spark Streaming application that writes Parquet data from a stream. sqlContext.sql( """ |select |to_date(from_utc_timestamp(…

apache-spark partitioning parquet
Fast Parquet row count in Spark

The Parquet files contain a per-block row count field. Spark seems to read it at some point (SpecificParquetRecordReaderBase.java#L151). …

apache-spark parquet
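The row-count question depends on where Parquet keeps that count. A Parquet file starts and ends with the 4-byte magic b"PAR1". Just before the trailing magic is a 4-byte little-endian length, and before that is the Thrift-encoded FileMetaData footer, which holds the total num_rows and the per-row-group counts. The sketch below uses only the standard library to read that footer without touching any data pages. Decoding the Thrift bytes is not shown; a Parquet library does that, e.g. pyarrow's `pq.ParquetFile(path).metadata.num_rows` or `metadata.row_group(i).num_rows`. The function name `read_footer_bytes` and the synthetic file are illustrative only.

```python
import io
import os
import struct

MAGIC = b"PAR1"


def read_footer_bytes(f) -> bytes:
    """Return the raw, still Thrift-encoded FileMetaData of a Parquet file.

    Layout: b"PAR1" | column chunks ... | FileMetaData | footer length (4 bytes LE) | b"PAR1"
    Only the leading magic and the tail are read; data pages are skipped.
    """
    size = f.seek(0, os.SEEK_END)
    if size < 12:
        raise ValueError("file too small to be Parquet")
    f.seek(0)
    if f.read(4) != MAGIC:
        raise ValueError("missing leading PAR1 magic")
    f.seek(size - 8)
    tail = f.read(8)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length points outside the file")
    f.seek(size - 8 - footer_len)
    return f.read(footer_len)


# Usage with a synthetic byte string standing in for a real file
# (for a real file: with open(path, "rb") as f: read_footer_bytes(f)).
fake_footer = b"thrift-encoded metadata would go here"
fake_file = MAGIC + b"column chunks" + fake_footer + struct.pack("<I", len(fake_footer)) + MAGIC
print(read_footer_bytes(io.BytesIO(fake_file)))
```

Because the footer is a small, fixed-position structure, a row count taken from it costs one short read at the end of each file, regardless of how many rows the file holds.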
Convert file of JSON objects to Parquet file

Motivation: I want to load the data into Apache Drill. I understand that Drill can handle JSON input, but I …

json apache parquet apache-drill
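Whatever tool writes the Parquet file, every Parquet column has a single type. With a file of loose JSON objects, you usually need to check which keys appear and with which value types before converting. Drill itself can also write Parquet via CREATE TABLE AS once `store.format` is set to 'parquet'. The sketch below is a standard-library pre-check, assuming one JSON object per line; the function names `infer_columns` and `conflicting` are hypothetical.

```python
import json
from collections import defaultdict


def infer_columns(lines):
    """Map each top-level key to the set of JSON value types seen for it."""
    columns = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key, value in record.items():
            columns[key].add(type(value).__name__)
    return dict(columns)


def conflicting(columns):
    """Keys with more than one non-null type (e.g. int and str) need a decision before conversion."""
    return sorted(k for k, types in columns.items() if len(types - {"NoneType"}) > 1)


sample = ['{"id": 1, "name": "a"}', '', '{"id": "2", "name": null, "tags": ["x"]}']
cols = infer_columns(sample)
print(cols)
print(conflicting(cols))
```

A key that is sometimes `int` and sometimes `float` will also be flagged. That is intentional: such a column has to be promoted to a double (or cast) before it fits a single Parquet type.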
How to convert a 500GB SQL table into Apache Parquet?

Perhaps this is well documented, but I am getting very confused about how to do this (there are many Apache tools). …

mysql sql-server hadoop parquet
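Whichever tool is chosen (Spark's JDBC data source, with options such as partitionColumn, lowerBound, upperBound and numPartitions, is one common route), a 500GB table has to be streamed out in bounded batches rather than loaded at once. Here is a minimal sketch of keyset pagination. It assumes a unique, indexed key column and uses the standard-library `sqlite3` as a stand-in for MySQL or SQL Server. The table `orders` and its columns are hypothetical. Each batch would then go to a Parquet writer, for example a `pyarrow.parquet.ParquetWriter`; that step is assumed and not shown.

```python
import sqlite3


def iter_batches(conn, table, key, batch_size):
    """Yield rows of `table` in ascending `key` order, batch_size rows at a time.

    Keyset pagination: each query resumes after the last key seen, so the
    database never re-scans rows it already returned (unlike LIMIT/OFFSET).
    `table` and `key` are interpolated as identifiers, so pass only trusted names.
    """
    last = None
    while True:
        if last is None:
            cur = conn.execute(
                f"SELECT * FROM {table} ORDER BY {key} LIMIT ?", (batch_size,))
        else:
            cur = conn.execute(
                f"SELECT * FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?",
                (last, batch_size))
        rows = cur.fetchall()
        if not rows:
            return
        key_index = [col[0] for col in cur.description].index(key)
        last = rows[-1][key_index]
        yield rows  # hand each batch to a Parquet writer here


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(i * 1.5,) for i in range(10)])
sizes = [len(batch) for batch in iter_batches(conn, "orders", "id", 4)]
print(sizes)  # [4, 4, 2]
```

Writing one Parquet row group or file per batch (or per few batches) keeps memory flat on both the database side and the writer side.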
How to read and write Map<String, Object> from/to parquet file in Java or Scala?

Looking for a concise example on how to read and write Map<String, Object> from/to parquet file …

java scala avro parquet
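The Parquet format spec represents a MAP logical type as a repeated group named `key_value`. Each entry holds a required `key` and a `value`, which may be null. Since a Parquet column has one type, a `Map<String, Object>` needs its values pinned to a concrete type first. The sketch below (in Python rather than Java/Scala) shows that shape. It assumes values are serialized as JSON strings; this is an illustrative choice, not what parquet-mr or Avro does for you. `to_key_value` and `from_key_value` are hypothetical helper names.

```python
import json


def to_key_value(mapping):
    """Shape a dict like Parquet's MAP logical type: a list of key_value entries.

    Keys are required (non-null strings). Arbitrary Object values are encoded
    as JSON strings here so that the value column has a single type (an assumption).
    """
    entries = []
    for key, value in mapping.items():
        if not isinstance(key, str):
            raise TypeError("this sketch only handles string keys, like Map<String, Object>")
        entries.append({"key": key, "value": None if value is None else json.dumps(value)})
    return entries


def from_key_value(entries):
    """Inverse of to_key_value: decode the JSON-string values back into objects."""
    return {e["key"]: None if e["value"] is None else json.loads(e["value"]) for e in entries}


record = {"a": 1, "b": [1, 2], "c": None, "d": {"x": "y"}}
print(to_key_value(record))
```

The alternative to stringifying is to use a union-like Avro schema for the values. Either way, the value type has to be decided before writing, because the `value` column cannot hold "any object".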
Load Parquet files into Redshift

I have a bunch of Parquet files on S3; I want to load them into Redshift in the most optimal way. …

amazon-web-services amazon-ec2 amazon-redshift parquet amazon-redshift-spectrum
Spark + Parquet + Snappy: Overall compression ratio drops after Spark shuffles data

Community! Please help me understand how to get a better compression ratio with Spark. Let me describe the case: I have a dataset, …

apache-spark apache-spark-sql spark-dataframe parquet snappy
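One common explanation for the compression question is that a shuffle destroys whatever ordering or clustering the rows had. Parquet's dictionary and run-length encodings, and the codec applied afterwards, do best when equal values sit next to each other. The standard-library sketch below compares a clustered and an unclustered version of the same hypothetical low-cardinality column. It counts runs (what run-length encoding stores) and measures compressed size. `zlib` stands in for Snappy, which is not in the Python standard library, so the absolute sizes will differ from Snappy's.

```python
import random
import zlib


def count_runs(values):
    """Number of runs of equal adjacent values (what run-length encoding stores)."""
    runs = 0
    prev = object()  # sentinel that never equals a real value
    for v in values:
        if v != prev:
            runs += 1
            prev = v
    return runs


def compressed_size(values):
    """Bytes after zlib compression (a stand-in for Snappy)."""
    return len(zlib.compress("\n".join(values).encode(), 6))


rng = random.Random(42)
# Hypothetical low-cardinality column in arbitrary order, as after a shuffle.
column = [f"country_{rng.randrange(20)}" for _ in range(10_000)]
ordered = sorted(column)  # same values, clustered

print(count_runs(column), count_runs(ordered))
print(compressed_size(column), compressed_size(ordered))
```

In Spark, the analogous step would be to sort within partitions (e.g. `sortWithinPartitions` on the low-cardinality columns) before writing. Whether that restores the original ratio depends on how the data was laid out before the shuffle.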
How to Query parquet data from Amazon Athena?

Athena creates a temporary table using fields in the S3 table. I have done this using JSON data. Could you help …

amazon-web-services parquet amazon-athena
Nested data in Parquet with Python

I have a file that has one JSON per line. Here is a sample: { "product": { "id": "abcdef", "price": 19.99, "specs": { "voltage": "110…

python json parquet dask
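Parquet can store nested records like the `product`/`specs` sample natively, as struct columns encoded with repetition and definition levels. Libraries such as pyarrow read and write those directly. When a downstream tool works more comfortably with flat columns, one simple alternative is to flatten nested objects into dotted column names before writing. The sketch below does that with the standard library only. The record values are hypothetical, and lists are deliberately left as values rather than exploded.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names; non-dict values are kept as-is."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical JSON-lines input with the same nesting shape as the question.
lines = ['{"product": {"id": "x1", "price": 5.0, "specs": {"weight": 2}}, "tags": ["a"]}']
rows = [flatten(json.loads(line)) for line in lines]
print(rows[0])
```

One caveat of this approach: a key that already contains a dot can collide with a flattened name. Keeping the native nested structure avoids that, at the cost of needing a reader that understands struct columns.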
Documentation for Apache's Parquet Java API?

I would like to use Apache's parquet-mr project to read/write Parquet files programmatically with Java. I can't seem to …

parquet