Top "Parquet" questions

Apache Parquet is a columnar storage format for Hadoop.

Hive - How to print the classpath of a Hive service

I need to check the classpath of the Hive service to see the location of the jars it loads while …

hadoop hive hortonworks-data-platform parquet
Overwrite parquet files from dynamic frame in AWS Glue

I use dynamic frames to write a Parquet file to S3, but if a file already exists, my program appends …

amazon-web-services parquet aws-glue
How to write a partitioned Parquet file using Pandas

I'm trying to write a Pandas dataframe to a partitioned file: df.to_parquet('output.parquet', engine='pyarrow', partition_cols = […

python pandas parquet pyarrow
Impala - convert existing table to parquet format

I have a table that has partitions and I use avro files or text files to create and insert into …

text-files avro parquet impala
Spark lists all leaf nodes even in partitioned data

I have parquet data partitioned by date & hour, folder structure: events_v3 -- event_date=2015-01-01 -- event_…

apache-spark amazon-s3 apache-spark-sql partitioning parquet
Set Parquet/Snappy output file size in Hive?

I'm trying to split Parquet/Snappy files created by Hive INSERT OVERWRITE TABLE... on the dfs.block.size boundary, as Impala …

hive impala parquet snappy
Create a Hive external table from partitioned Parquet files in Azure HDInsight

I have data saved as parquet files in Azure blob storage. Data is partitioned by year, month, day and hour …

azure hive parquet azure-hdinsight
Creating a Parquet file in an AWS Lambda function

I'm receiving a set of (1 MB) CSV/JSON files on S3 that I would like to convert to Parquet. I …

java scala amazon-web-services parquet
Using parquet tools on files in hdfs

I downloaded and built parquet-1.5.0 from https://github.com/apache/parquet-mr. I now want to run some commands on my …

maven hdfs parquet parquet-mr
Spark 2: Can't write DataFrame to Parquet Hive table: `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`

I'm trying to save a DataFrame to a Hive table. In Spark 1.6 it works, but after migrating to 2.2.0 it doesn't work anymore. …

apache-spark hive parquet apache-spark-2.0