Top "hdfs" questions

The Hadoop Distributed File System (HDFS) is the default storage system used by Apache Hadoop.

No data nodes are started

I am trying to set up Hadoop version 0.20.203.0 in a pseudo-distributed configuration using the following guide: http://www.javacodegeeks.com/2012/01/…

hadoop hdfs
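
With 0.20-era pseudo-distributed setups, DataNodes that refuse to start are very often a namespaceID mismatch: the DataNode's storage directory survives a `hadoop namenode -format` and no longer matches the freshly formatted NameNode, so the DataNode logs an "Incompatible namespaceIDs" error and exits. Pinning the storage directories explicitly (instead of the `/tmp` defaults, which a reboot can wipe) makes this easier to diagnose. A minimal `conf/hdfs-site.xml` sketch; the paths are placeholders:

```xml
<configuration>
  <property>
    <!-- single-node setup, so one replica -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- NameNode metadata; survives restarts, unlike the /tmp default -->
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop/name</value>
  </property>
  <property>
    <!-- DataNode block storage; clear this directory before reformatting
         the NameNode to avoid the namespaceID mismatch -->
    <name>dfs.data.dir</name>
    <value>/var/lib/hadoop/data</value>
  </property>
</configuration>
```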
The default NameNode port of HDFS is 50070. But I have come across 8020 or 9000 in some places

When I set up the Hadoop cluster, I read that the NameNode runs on 50070, so I set it up accordingly, and it's running …

hadoop hdfs
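
The three port numbers serve different roles, so they are not in conflict: 8020 and 9000 are both common choices for the NameNode's client RPC port, while 50070 is the NameNode's HTTP web UI. A sketch for a 1.x-era setup; the hostnames and the choice of 9000 over 8020 are arbitrary (newer releases use `fs.defaultFS` and `dfs.namenode.http-address` for the same two settings):

```xml
<!-- core-site.xml: the filesystem RPC endpoint clients talk to
     (8020 and 9000 are both conventional; either works) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

<!-- hdfs-site.xml: the NameNode web UI, which is what 50070 refers to -->
<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:50070</value>
</property>
```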
hadoop fs -put command

I have constructed a single-node Hadoop environment on CentOS using the Cloudera CDH repository. When I want to copy a …

shell hadoop hdfs put
Difference between HBase and Hadoop/HDFS

This is kind of a naive question, but I am new to the NoSQL paradigm and don't know much about it. So …

hadoop nosql hbase hdfs difference
Hadoop copy a directory?

Is there an HDFS API that can copy an entire local directory to HDFS? I found an API for …

hadoop hdfs
Write to multiple outputs by key Spark - one Spark job

How can you write to multiple outputs, dependent on the key, using Spark in a single job? Related: Write to …

scala hadoop output hdfs apache-spark
data block size in HDFS, why 64MB?

The default data block size of HDFS/Hadoop is 64 MB. The block size on disk is generally 4 KB. What does 64…

database hadoop mapreduce block hdfs
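
The usual justification is twofold: fewer blocks means less NameNode metadata (every block is an object held in the NameNode's memory), and larger blocks amortize disk seek time over a long sequential transfer. A back-of-envelope sketch; the 10 ms seek time and 100 MB/s transfer rate are illustrative assumptions, not HDFS constants:

```python
# Back-of-envelope arithmetic for HDFS block sizing (illustrative numbers).

TB = 1024 ** 4
MB = 1024 ** 2
KB = 1024

file_size = 1 * TB

# Number of blocks the NameNode must track for one 1 TB file,
# per block-size choice.
blocks_64mb = file_size // (64 * MB)   # 2**14 blocks
blocks_4kb = file_size // (4 * KB)     # 2**28 blocks

print(blocks_64mb)  # 16384
print(blocks_4kb)   # 268435456

# Seek-time amortization: assume a 10 ms seek and 100 MB/s transfer rate.
seek_s = 0.010
rate = 100 * MB  # bytes per second


def seek_fraction(block_size):
    """Fraction of total read time spent seeking rather than transferring."""
    transfer_s = block_size / rate
    return seek_s / (seek_s + transfer_s)


print(round(seek_fraction(64 * MB), 4))  # a ~1.5% seek overhead
print(round(seek_fraction(4 * KB), 4))   # almost all time spent seeking
```

So with 64 MB blocks the disk spends its time transferring data, while 4 KB blocks would drown a sequential read in seeks and flood the NameNode with per-block metadata.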
Reading HDFS and local files in Java

I want to read file paths irrespective of whether they are HDFS or local. Currently, I pass the local paths …

java hadoop mapreduce hdfs
Python read file as stream from HDFS

Here is my problem: I have a file in HDFS which can potentially be huge (i.e. not enough to fit all …

python hadoop subprocess hdfs
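
One common approach is to pipe the HDFS CLI's output through `subprocess` and iterate over it lazily, so only one line is held in memory at a time. A sketch under assumptions: `hdfs dfs -cat` is the assumed CLI entry point, and `stream_command_lines` is a made-up helper name:

```python
import subprocess


def stream_command_lines(cmd):
    """Yield decoded lines from a command's stdout without buffering the
    whole output in memory.

    For HDFS the command would be something like
    ["hdfs", "dfs", "-cat", "/path/in/hdfs"]
    (or ["hadoop", "fs", "-cat", ...] on older installs).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:  # iterates lazily, line by line
            yield line.decode("utf-8")
    finally:
        proc.stdout.close()
        proc.wait()


# Usage against HDFS (assumes the hdfs CLI is on PATH):
# for line in stream_command_lines(["hdfs", "dfs", "-cat", "/big/file"]):
#     process(line)
```

Because the helper only depends on the command producing stdout, it works with any streaming source, which also makes it easy to test locally with `cat`.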
Parse CSV as DataFrame/DataSet with Apache Spark and Java

I am new to Spark, and I want to use group-by & reduce to find the following from a CSV (one …

java apache-spark hadoop apache-spark-sql hdfs