Where HDFS stores data

CuriousMind · Mar 21, 2014 · Viewed 32.2k times

I am trying to understand where Hadoop stores data in HDFS. I am referring to the config files core-site.xml and hdfs-site.xml.

The properties that I have set are:

  • In core-site.xml:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/tmp</value>
    </property>
    
  • In hdfs-site.xml:

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hadoop/hdfs/namenode</value>
    </property>
    
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/hdfs/datanode</value>
    </property>
    

With the above configuration, the data blocks should be stored in the directory given by dfs.datanode.data.dir. Is this correct?

I referred to the Apache Hadoop documentation, and from it I see this:

  • core-default.xml: hadoop.tmp.dir --> A base for other temporary directories.

  • hdfs-default.xml: dfs.datanode.data.dir --> Determines where on the local filesystem a DFS data node should store its blocks.

    The default value for this property is file://${hadoop.tmp.dir}/dfs/data
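The default only resolves to a concrete directory through ${...} substitution. The sketch below roughly mimics that expansion with a hypothetical expand helper. It is a simplification, not Hadoop's own code: Hadoop's Configuration class also repeats substitution for nested references and consults Java system properties, which this ignores.

```python
import re

# Matches ${name} references such as ${hadoop.tmp.dir}.
_VAR = re.compile(r"\$\{([^}]+)\}")


def expand(value: str, props: dict[str, str]) -> str:
    """Replace each ${name} with props[name]; unknown names are left as-is.

    Hypothetical, single-pass helper: no nested expansion and no
    system-property fallback, unlike Hadoop's real Configuration class.
    """
    return _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)


props = {"hadoop.tmp.dir": "/hadoop/tmp"}  # value from the core-site.xml above
default_data_dir = "file://${hadoop.tmp.dir}/dfs/data"  # documented default
print(expand(default_data_dir, props))  # file:///hadoop/tmp/dfs/data
```

With hadoop.tmp.dir set to /hadoop/tmp, the default expands to the /hadoop/tmp/dfs/data directory, which is the one the question reports seeing.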

Since I explicitly set dfs.datanode.data.dir in hdfs-site.xml, does that mean data will be stored in that location? If so, would dfs/data be appended to ${dfs.datanode.data.dir}, i.e. would the path become /hadoop/hdfs/datanode/dfs/data?

However, I didn't see this directory structure being created.

One observation from my environment:

After I run some MapReduce programs, the directory /hadoop/tmp/dfs/data gets created.

So I'm not sure whether data actually gets stored in the directory given by the property dfs.datanode.data.dir.

Does anyone have similar experience?

Answer

RickH · Mar 21, 2014

The data for HDFS files will be stored in the directory specified in dfs.datanode.data.dir, and the /dfs/data suffix that you see in the default value will not be appended.
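A minimal model of that layering, assuming site values simply replace defaults key by key (Hadoop's Configuration also honours <final> flags, which this ignores). The names HDFS_SITE, DEFAULTS and parse_site are hypothetical; the property values are the ones from the question.

```python
import xml.etree.ElementTree as ET

# hdfs-site.xml values from the question, wrapped in <configuration>.
HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/hdfs/datanode</value>
  </property>
</configuration>
"""

# Default as quoted from hdfs-default.xml in the question (unexpanded).
DEFAULTS = {"dfs.datanode.data.dir": "file://${hadoop.tmp.dir}/dfs/data"}


def parse_site(xml_text: str) -> dict[str, str]:
    """Return {name: value} for every <property> in a *-site.xml document."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


# Later layers win: a site value replaces the default wholesale,
# so nothing from the default (such as /dfs/data) is appended to it.
effective = {**DEFAULTS, **parse_site(HDFS_SITE)}
print(effective["dfs.datanode.data.dir"])  # file:/hadoop/hdfs/datanode
```

The site value is used verbatim; the ${hadoop.tmp.dir}/dfs/data path only comes into play when dfs.datanode.data.dir is not set at all.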

If you edit hdfs-site.xml, you'll have to restart the DataNode service for the change to take effect. Also remember that after you change the value, the DataNode will no longer be able to serve blocks that were stored in the previous location.

Lastly, above you have your values specified as file:/... instead of the conventional file:///... (file:// followed by an absolute path). Switching to that form is worth trying, to rule out the URI format as the reason these values might be reverting to the defaults.
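To see what the slash count changes, here is a sketch that uses Python's urllib.parse as a stand-in for generic URI parsing. Hadoop itself relies on Java's URI handling, so treat this as an approximation rather than Hadoop's behaviour; split_file_uri is a hypothetical helper.

```python
from urllib.parse import urlparse


def split_file_uri(uri: str) -> tuple[str, str]:
    """Return (authority, path) as a generic URI parser sees them."""
    parts = urlparse(uri)
    return parts.netloc, parts.path


# Under generic rules, "file:/" and "file:///" both give an empty authority
# and the same absolute path, while "file://" followed directly by a name
# treats that name as a host and drops it from the path.
for uri in ("file:/hadoop/hdfs/datanode",
            "file:///hadoop/hdfs/datanode",
            "file://hadoop/hdfs/datanode"):
    print(uri, "->", split_file_uri(uri))
```

The pitfall the extra slash guards against is the last case: writing file:// and then a path without its leading slash silently turns the first directory name into a hostname.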