Top "Parquet" questions

Apache Parquet is a columnar storage file format, originally developed for the Hadoop ecosystem.

Can we load a Parquet file into Hive directly?

I know we can load a Parquet file with Spark SQL and with Impala, but I am wondering if we can do the …

hadoop hive apache-spark-sql hiveql parquet
How to partition and write a DataFrame in Spark without deleting partitions with no new data?

I am trying to save a DataFrame to HDFS in Parquet format using DataFrameWriter, partitioned by three column values, like …

apache-spark spark-dataframe partitioning parquet
How to read partitioned Parquet files from S3 using pyarrow in Python

I am looking for ways to read data from multiple partitioned directories on S3 using Python. data_folder/serial_number=1/cur_…

python parquet pyarrow fastparquet python-s3fs
Convert a CSV file to a Parquet file using Python

I am trying to convert a .csv file to a .parquet file. The csv file (Temp.csv) has the following …

python csv parquet
Reading Parquet files from multiple directories in PySpark

I need to read Parquet files from multiple paths that are not parent or child directories. For example, dir1 --- | …

pyspark parquet
Create Parquet files in Java

Is there a way to create Parquet files from Java? I have data in memory (Java classes) and I want …

java parquet
Schema evolution in Parquet format

Currently we are using the Avro data format in production. Among Avro's several advantages, we know that it …

apache-spark hadoop data-warehouse avro parquet
How do you control the size of the output file?

In Spark, what is the best way to control the size of the output file? For example, in log4j, …

apache-spark parquet
Spark DataFrame: How to efficiently split a DataFrame for each group based on the same column values

I have a DataFrame generated as follows: df.groupBy($"Hour", $"Category") .agg(sum($"value").alias("TotalValue")) .sort($"Hour".asc,$"TotalValue".…

scala apache-spark apache-spark-sql spark-dataframe parquet
How do I read a Parquet file in R and convert it to an R DataFrame?

I'd like to process Apache Parquet files (in my case, generated in Spark) in the R programming language. Is an …

r apache-spark parquet sparkr