Top "Parquet" questions

Apache Parquet is a columnar storage format for Hadoop.

Timestamp error when converting and writing Parquet with Python and pandas

I tried to concat() two Parquet files with pandas in Python. It works, but when I try to write …

python pandas parquet
Spark Dataframe validating column names for parquet writes (scala)

I'm processing events using DataFrames converted from a stream of JSON events, which eventually get written out as Parquet …

apache-spark apache-spark-sql spark-streaming spark-dataframe parquet
Read Parquet file stored in S3 with AWS Lambda (Python 3)

I am trying to load, process and write Parquet files in S3 with AWS Lambda. My testing / deployment process is: …

python amazon-s3 aws-lambda parquet pyarrow
Reading gz.parquet file

Hello, I need to read the data from gz.parquet files but don't know how. I tried with Impala but …

apache-spark hive apache-kafka parquet flume-twitter
Assign schema to pa.Table.from_pandas()

I'm getting this error when converting a pandas DataFrame to Parquet using PyArrow: ArrowInvalid('Error converting from Python objects to …

python pandas parquet pyarrow
What is the benefit of using nested data types in Parquet?

Is there any performance benefit from using nested data types in the Parquet file format? AFAIK …

apache-spark nested parquet data-files
Using predicates to filter rows from pyarrow.parquet.ParquetDataset

I have a Parquet dataset stored on S3, and I would like to query specific rows from the dataset. I …

python pandas amazon-s3 parquet pyarrow
Hive: is changing a table's file format from ORC to Parquet not supported?

I have a Hive table like this: CREATE TABLE `abtestmsg_orc`( `eventname` string COMMENT 'A/B test scheme reported event: ABTest', `eventtime` string COMMENT 'Event report time…

hive alter-table parquet orc
Unable to read a parquet file

I am breaking my head over this right now. I am new to Parquet files, and I am running …

python pandas parquet pyarrow fastparquet
Read Parquet files from HDFS using PyArrow

I know I can connect to an HDFS cluster via pyarrow using pyarrow.hdfs.connect(). I also know I can …

hdfs parquet pyarrow