Top "HDFS" questions

Hadoop Distributed File System (HDFS) is the default file storage system used by Apache Hadoop.

Best splittable compression for Hadoop input = bz2?

We've realized a bit too late that archiving our files in GZip format for Hadoop processing isn't such a great …

hadoop gzip hdfs bzip2
High throughput vs low latency in HDFS

I tried to define, in my own words, what high throughput vs. low latency means in HDFS, and came …

hadoop hdfs low-latency throughput
Setting fs.default.name in core-site.xml Sets HDFS to Safemode

I installed the Cloudera CDH4 distribution on a single machine in pseudo-distributed mode and successfully tested that it was working …

hadoop hdfs cloudera
Pig: Get top n values per group

I have data that's already grouped and aggregated; it looks like this: user value count ---- -------- ------ Alice third 5 …

hadoop hdfs apache-pig
How to read only n rows of large CSV file on HDFS using spark-csv package?

I have a big distributed file on HDFS and each time I use sqlContext with spark-csv package, it first loads …

apache-spark pyspark hdfs apache-spark-sql spark-csv
Viewing the number of blocks for a file in Hadoop

How can I view how many blocks a file has been broken into in a Hadoop file system?

hadoop hdfs
Hadoop user file permissions

I have a problem setting Hadoop file permissions in Hortonworks and Cloudera. My requirement is: 1. create a new user …

hadoop permissions hdfs cloudera hortonworks-data-platform
Delete files older than 10 days on HDFS

Is there a way to delete files older than 10 days on HDFS? In Linux I would use: find /path/to/…

hadoop hdfs
Checksum verification in Hadoop

Do we need to verify the checksum after we move files to Hadoop (HDFS) from a Linux server through WebHDFS? …

hadoop hdfs checksum
The node /hbase is not in ZooKeeper

I am a newbie in Hadoop trying to install HBase in pseudo-distributed mode, version hbase-0.98.10.1-hadoop1-bin, with Hadoop 2.5.2. …

hadoop hbase hdfs