Top "Orc" questions

The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data.

Parquet vs ORC vs ORC with Snappy

I am running a few tests on the storage formats available with Hive and using Parquet and ORC as major …

hadoop hive parquet snappy orc

Aggregating multiple columns with custom function in Spark

I was wondering if there is some way to specify a custom aggregation function for spark dataframes over multiple columns. …

scala apache-spark dataframe apache-spark-sql orc

Spark: Save Dataframe in ORC format

In the previous version, we used to have a 'saveAsOrcFile()' method on RDD. This is now gone! How do …

scala apache-spark apache-spark-sql orc

How to read an ORC file stored locally in Python Pandas?

Can I think of an ORC file as similar to a CSV file with column headings and row labels containing …

python pandas pyspark data-science orc

Reading an ORC file in Java

How do you read an ORC file in Java? I'm wanting to read in a small file for some unit …

java hadoop orc

CTAS with Dynamic Partition

I want to convert an existing table stored in text format to ORC format. I was able to do it …

hive partition orc

Hadoop ORC file - How it works - How to fetch metadata

I am new to ORC files. I went through many blogs but didn't get a clear understanding. Please help and clarify …

hadoop hive file-format orc

Hive: is changing a table's file format from ORC to Parquet not supported?

I have a Hive table like this: CREATE TABLE `abtestmsg_orc`( `eventname` string COMMENT 'AB test plan report event: ABTest', `eventtime` string COMMENT 'event report time…

hive alter-table parquet orc

How do I Combine or Merge Small ORC files into Larger ORC file?

Most questions/answers on SO and the web discuss using Hive to combine a bunch of small ORC files into …

java hive hdfs orc

access fields of an array within pyspark dataframe

I am developing SQL queries against a Spark DataFrame that is based on a group of ORC files. The program …

pyspark pyspark-sql orc