Top "Parquet" questions

Apache Parquet is a columnar storage format for the Hadoop ecosystem.

Write parquet from AWS Kinesis firehose to AWS S3

I would like to ingest data into S3 from Kinesis Firehose formatted as Parquet. So far I have only found …

json amazon-web-services amazon-s3 parquet amazon-kinesis-firehose
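
Firehose can convert incoming JSON to Parquet on its own (record format conversion, configured on the delivery stream against a Glue schema), so no code is strictly required. If the conversion has to happen after the records land in S3, a minimal Spark sketch would look like the following; the bucket names and prefixes are made up:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("firehose-json-to-parquet")
  .getOrCreate()

// Firehose delivers newline-delimited JSON under its S3 prefix.
val raw = spark.read.json("s3a://my-bucket/firehose/raw/")

// Rewrite it as Parquet so Athena or Glue can query it efficiently.
raw.write.mode("overwrite").parquet("s3a://my-bucket/firehose/parquet/")
```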
How to Convert Many CSV files to Parquet using AWS Glue

I'm using AWS S3, Glue, and Athena with the following setup: S3 --> Glue --> Athena. My raw …

amazon-s3 parquet amazon-athena aws-glue
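
Glue jobs run Spark under the hood, so the heart of the conversion is a plain read-CSV, write-Parquet pass. A minimal sketch, with placeholder S3 paths and assuming the files carry a header row:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-to-parquet").getOrCreate()

// Read every CSV under the raw prefix; the header and inference
// options depend on what the actual files look like.
val csv = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("s3a://my-bucket/raw-csv/")

// One Parquet dataset that an Athena table can point at.
csv.write.mode("overwrite").parquet("s3a://my-bucket/parquet/")
```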
Parquet schema and Spark

I am trying to convert CSV files to Parquet and I am using Spark to accomplish this. SparkSession spark = SparkSession.…

java scala apache-spark parquet spark-csv
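
When the CSV schema matters, the usual move is to declare it up front instead of relying on inference, so the Parquet output gets exact types. A sketch in Scala; the column names and types are stand-ins for the real file:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("csv-schema").getOrCreate()

// Explicit schema: Parquet will carry these exact types.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = true),
  StructField("amount", DoubleType, nullable = true)
))

val df = spark.read.schema(schema).option("header", "true").csv("/data/input.csv")
df.write.parquet("/data/output.parquet")
```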
Redshift COPY command for Parquet format with Snappy compression

I have datasets in HDFS which are in Parquet format with Snappy as the compression codec. As far as my research …

amazon-s3 compression amazon-redshift parquet snappy
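
Redshift's COPY has since gained native Parquet support (FORMAT AS PARQUET), and Snappy compression inside the files is decompressed transparently, so no compression option is needed. A hedged sketch issuing the COPY over JDBC; the connection string, table, S3 prefix, and IAM role ARN are all placeholders:

```scala
import java.sql.DriverManager

// Placeholder endpoint and credentials; requires the Redshift JDBC driver.
val conn = DriverManager.getConnection(
  "jdbc:redshift://example.redshift.amazonaws.com:5439/dev", "user", "password")

val stmt = conn.createStatement()
// COPY reads Parquet directly; Snappy-compressed pages are handled
// automatically, so the statement names no compression at all.
stmt.execute(
  """COPY analytics.events
    |FROM 's3://my-bucket/parquet/events/'
    |IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    |FORMAT AS PARQUET""".stripMargin)

conn.close()
```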
Hive - Varchar vs String: is there any advantage if the storage format is Parquet?

I have a Hive table which will hold billions of records; it's time-series data, so the partition is per …

hive hql parquet hcatalog
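
In Parquet, both STRING and VARCHAR(n) are stored as the same UTF-8 annotated binary primitive, so VARCHAR buys no storage advantage; STRING simply skips the length-enforcement check Hive applies to VARCHAR. A sketch of the DDL via Spark's Hive support; the table and column names are invented:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-parquet-string")
  .enableHiveSupport()
  .getOrCreate()

// STRING and VARCHAR(n) land as identical bytes in Parquet;
// STRING avoids the per-row truncation check.
spark.sql(
  """CREATE TABLE IF NOT EXISTS ts_events (
    |  event_id STRING,
    |  payload  STRING
    |)
    |PARTITIONED BY (dt STRING)
    |STORED AS PARQUET""".stripMargin)
```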
Project_Bank.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [110, 111, 13, 10]

So I was trying to load the CSV file inferring a custom schema, but every time I end up with the following …

mysql csv apache-spark parquet spark-shell
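
The bytes [80, 65, 82, 49] spell "PAR1", the magic footer every Parquet file ends with, so this error means a CSV file was handed to a Parquet reader. The fix is to read it as CSV with the custom schema and only then write Parquet. A sketch, with a made-up schema standing in for the real one:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("read-csv-not-parquet").getOrCreate()

// Hypothetical columns; substitute the file's actual schema.
val schema = StructType(Seq(
  StructField("age", IntegerType, nullable = true),
  StructField("job", StringType, nullable = true)
))

// Read as CSV; calling spark.read.parquet on this file is exactly
// what triggers the "expected magic number at tail" error.
val bank = spark.read.schema(schema).csv("/data/Project_Bank.csv")

// Once it is a DataFrame, writing genuine Parquet is one call.
bank.write.parquet("/data/project_bank_parquet")
```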
How to convert spark SchemaRDD into RDD of my case class?

In the Spark docs it's clear how to create Parquet files from an RDD of your own case classes (from the …

sql apache-spark parquet
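
SchemaRDD is the pre-1.3 name for what became DataFrame; in current Spark the idiomatic route is .as[CaseClass] to get a typed Dataset, then .rdd if a plain RDD is really needed. A minimal sketch, assuming a Person case class and a hypothetical Parquet path:

```scala
import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int)

val spark = SparkSession.builder().appName("typed-read").getOrCreate()
import spark.implicits._  // brings the Person encoder into scope

// Read the Parquet data back as a typed Dataset of the case class,
// then drop down to RDD[Person] only if an RDD is required.
val people = spark.read.parquet("/data/people.parquet").as[Person]
val rdd: org.apache.spark.rdd.RDD[Person] = people.rdd
```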
Cloudera 5.6: Parquet does not support date. See HIVE-6384

I am currently using Cloudera 5.6, trying to create a Parquet-format table in Hive based on another table, but …

hive cloudera parquet
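
The Hive shipped with CDH 5.6 predates DATE support in Parquet (the HIVE-6384 work), so the usual workaround is to store the column as TIMESTAMP or STRING instead. A hedged sketch of a CTAS applying that cast; the table and column names are invented:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("parquet-date-workaround")
  .enableHiveSupport()
  .getOrCreate()

// On Hive versions hit by HIVE-6384, DATE columns cannot live in a
// Parquet table; casting to TIMESTAMP (or STRING) sidesteps the error.
spark.sql(
  """CREATE TABLE IF NOT EXISTS orders_parquet
    |STORED AS PARQUET
    |AS SELECT order_id, CAST(order_date AS TIMESTAMP) AS order_ts
    |FROM orders_text""".stripMargin)
```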
EntityTooLarge error when uploading a 5G file to Amazon S3

The Amazon S3 file size limit is supposed to be 5 TB according to this announcement, but I am getting the following …

amazon-s3 apache-spark jets3t parquet apache-spark-sql
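
The 5 TB ceiling only applies to multipart uploads; a single PUT is capped at 5 GB, which is what EntityTooLarge is reporting. The usual fixes are to upload in parts (the s3a connector does this) or to keep individual output files smaller. A sketch assuming the s3a connector:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("s3-multipart").getOrCreate()
val hc = spark.sparkContext.hadoopConfiguration

// s3a uploads objects in parts, sidestepping the 5 GB single-PUT
// limit behind EntityTooLarge; the part size here is 128 MB.
hc.set("fs.s3a.multipart.size", (128 * 1024 * 1024).toString)

// Repartitioning also keeps each output file well under 5 GB.
val df = spark.read.parquet("/data/huge")
df.repartition(64).write.parquet("s3a://my-bucket/output/")
```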
Multiple spark jobs appending parquet data to same base path with partitioning

I have multiple jobs that I want to execute in parallel, appending daily data into the same path using …

apache-spark parquet
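
Concurrent writers using partitionBy with append mode on one base path can clobber each other's shared _temporary staging directory. A common workaround is to have each job write its own partition directory directly. A sketch, assuming one job per day and a made-up base path:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("per-partition-append").getOrCreate()

// `day` would come from the job's arguments; each parallel job owns one day.
val day = "2017-01-15"
val df = spark.read.json(s"/incoming/$day")

// Writing straight into the partition directory means no two jobs
// ever share the same _temporary staging area under the base path.
df.write.mode(SaveMode.Overwrite).parquet(s"s3a://my-bucket/events/date=$day")
```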