Top "Hadoop" questions

Hadoop is an Apache open-source project that provides software for reliable and scalable distributed computing.

Parse CSV as DataFrame/DataSet with Apache Spark and Java

I am new to Spark, and I want to use group-by and reduce to find the following from a CSV (one …

java apache-spark hadoop apache-spark-sql hdfs

How can I list all Hive databases that are in use or have been created so far?

Similar to the SHOW TABLES command, is there a command to list all databases created so far?

hadoop hive hiveql

List the namenode and datanodes of a cluster from any node?

From any node in a Hadoop cluster, what is the command to identify the running namenode? And to identify all running datanodes? …

hadoop mapreduce

Where does the Hadoop MapReduce framework send my System.out.print() statements? (stdout)

I want to debug a MapReduce script and, without going to much trouble, tried to put some print statements in …

hadoop mapreduce

There are 0 datanode(s) running and no node(s) are excluded in this operation

I have set up a multi-node Hadoop cluster. The NameNode and Secondary NameNode run on the same machine and …

ubuntu hadoop amazon-ec2 hdfs hadoop2

What's the difference between "hadoop fs" shell commands and "hdfs dfs" shell commands?

Are they supposed to be equal? But why do the "hadoop fs" commands show the HDFS files while the "hdfs dfs" …

hadoop hdfs

Hive dynamic partitioning

I'm trying to create a partitioned table using dynamic partitioning, but I'm facing an issue. I'm running Hive 0.12 on Hortonworks …

hadoop hive hiveql

Writing to HDFS could only be replicated to 0 nodes instead of minReplication (=1)

I have 3 datanodes running. While running a job, I am getting the following error: java.io.IOException: …

java hadoop mapreduce hive hdfs

How do I get schema / column names from parquet file?

I have a file stored in HDFS as part-m-00000.gz.parquet. I've tried to run hdfs dfs -text dir/part-m-00000.…

hadoop apache-pig hdfs parquet

How does the MapReduce sort algorithm work?

One of the main examples that is used in demonstrating the power of MapReduce is the Terasort benchmark. I'm having …

algorithm sorting parallel-processing hadoop mapreduce