Top "Hadoop" questions

Hadoop is an Apache open-source project that provides software for reliable and scalable distributed computing.

What is the purpose of the shuffling and sorting phase in the reducer in MapReduce programming?

In MapReduce programming, the reduce phase has shuffling, sorting, and reduce as its sub-parts. Sorting is a costly affair. …

sorting hadoop mapreduce hdfs shuffle
Hive load CSV with commas in quoted fields

I am trying to load a CSV file into a Hive table like so: CREATE TABLE mytable ( num1 INT, text1 …

hadoop hbase hive hdfs delimiter
Apache Spark: The number of cores vs. the number of executors

I'm trying to understand the relationship between the number of cores and the number of executors when running a Spark …

hadoop apache-spark yarn
How to list all files in a directory and its subdirectories in Hadoop HDFS

I have a folder in HDFS which has two subfolders, each of which has about 30 subfolders, each of which in turn contains …

java hadoop hdfs
Hive ParseException - cannot recognize input near 'end' 'string'

I am getting the following error when trying to create a Hive table from an existing DynamoDB table: NoViableAltException(88@[]) at …

hadoop mapreduce hive bigdata amazon-dynamodb
How to copy data from one HDFS to another HDFS?

I have two HDFS setups and want to copy (not migrate or move) some tables from HDFS1 to HDFS2. How …

hadoop hdfs bigdata sqoop
What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism?

What's the difference between spark.sql.shuffle.partitions and spark.default.parallelism? I have tried to set both of them …

performance apache-spark hadoop apache-spark-sql
Is there any way to get the column names along with the output when executing a query in Hive?

In Hive, when we do a query (like: select * from employee), we do not get any column names in the …

hadoop hive rdbms
What are the pros and cons of parquet format compared to other formats?

Characteristics of Apache Parquet are: self-describing, columnar format, language-independent. In comparison to Avro, Sequence Files, RC File, etc., I want …

file hadoop hdfs avro parquet
Explode the Array of Struct in Hive

Below is the Hive table: CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable ( USER_ID BIGINT, NEW_ITEM ARRAY<…

hadoop mapreduce hive hiveql