Top "Hadoop" questions

Hadoop is an Apache open-source project that provides software for reliable and scalable distributed computing.

How to delete files from the HDFS?

I just downloaded the Hortonworks sandbox VM, which includes Hadoop version 2.7.1. I am adding some files by using …

hadoop hdfs hortonworks-data-platform
How to rename a hive table without changing location?

Based on the Hive doc below: Rename Table ALTER TABLE table_name RENAME TO new_table_name; This statement lets …

hadoop hive hiveql
Hive query to quickly find table size (number of rows)

Is there a Hive query to quickly find table size (i.e. number of rows) without launching a time-consuming MapReduce …

hadoop hive
Where are logs in Spark on YARN?

I'm new to Spark. I can now run Spark 0.9.1 on YARN (2.0.0-cdh4.2.1), but there is no log after execution. The …

hadoop logging apache-spark cloudera yarn
Avro vs. Parquet

I'm planning to use one of the Hadoop file formats for my Hadoop-related project. I understand Parquet is efficient …

hadoop avro parquet
HDFS free space available command

Is there an HDFS command to see the available free space in HDFS? We can see that through the browser at master:…

hadoop hdfs
Hive cluster by vs order by vs sort by

As far as I understand: sort by only sorts within the reducer; order by orders things globally but shoves …

hadoop hql hive
Cannot Read a file from HDFS using Spark

I have installed Cloudera CDH 5 using Cloudera Manager. I can easily do hadoop fs -ls /input/war-and-peace.txt hadoop …

hadoop apache-spark cloudera-cdh
Convert string to timestamp in Hive

I have the following string representation of a timestamp in my Hive table: 20130502081559999 I need to convert it to a …

hadoop hive hiveql
No data nodes are started

I am trying to set up Hadoop version 0.20.203.0 in a pseudo-distributed configuration using the following guide: http://www.javacodegeeks.com/2012/01/…

hadoop hdfs