Is there any way to download an HDFS file using the WebHDFS REST API?

Tariq · May 31, 2013

Is there any way to download a file from HDFS using the WebHDFS REST API? The closest I have come is to use the open operation to read the file and save the content.

curl -i -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt

Is there any API that will allow me to download the file directly without having to open it? I went through the official documentation and tried Google as well, but could not find anything. Could somebody point me in the right direction?

Thank you so much for your valuable time.

Answer

Charles Menguy · Jun 1, 2013

You could probably use the DataNode API for this (port 50075 by default). It supports a streamFile command that you could take advantage of. Using wget, this would look something like:

wget http://$datanode:50075/streamFile/demofile.txt -O ~/demofile.txt

Note that this command needs to target the datanode itself, not the namenode!
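As a rough sketch, the streamFile request above could be wrapped in a small shell helper. The function names streamfile_url and fetch_from_datanode and the DATANODE_PORT variable are made up for illustration; only the URL layout and the default port 50075 come from the command above.

```shell
#!/bin/sh
# Hypothetical helper: build the streamFile URL for a datanode.
# 50075 is the default datanode HTTP port; set DATANODE_PORT to override it.
streamfile_url() {
    # $1 = datanode host, $2 = absolute HDFS path (e.g. /demofile.txt)
    printf 'http://%s:%s/streamFile%s\n' "$1" "${DATANODE_PORT:-50075}" "$2"
}

# Hypothetical helper: download an HDFS file through a datanode.
fetch_from_datanode() {
    # $1 = datanode host, $2 = absolute HDFS path, $3 = local destination file
    wget "$(streamfile_url "$1" "$2")" -O "$3"
}
```

Something like fetch_from_datanode "$datanode" /demofile.txt ~/demofile.txt would then issue the same request as the wget line above.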

Alternatively, if you don't know which datanode to hit, you could ask the namenode, and it will redirect you to the right datanode with this URL:

http://$namenode:50070/data/demofile.txt
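A minimal sketch of using that namenode URL from the shell, assuming the cluster exposes the /data/ endpoint as described. The function names and the NAMENODE_PORT variable are hypothetical; the curl options used (-s, -o, -w with %{redirect_url}, -f, -L) are standard curl flags.

```shell
#!/bin/sh
# Hypothetical helper: build the namenode /data/ URL.
# 50070 is the namenode HTTP port used above; set NAMENODE_PORT to override it.
namenode_data_url() {
    # $1 = namenode host, $2 = absolute HDFS path (e.g. /demofile.txt)
    printf 'http://%s:%s/data%s\n' "$1" "${NAMENODE_PORT:-50070}" "$2"
}

# Print the datanode URL the namenode redirects to, without downloading anything.
resolve_datanode() {
    curl -s -o /dev/null -w '%{redirect_url}\n' "$(namenode_data_url "$1" "$2")"
}

# Download the file by following the redirect to the datanode.
# -f makes curl fail on HTTP errors; -i is left out so that response
# headers are not written into the saved file.
fetch_via_namenode() {
    # $1 = namenode host, $2 = absolute HDFS path, $3 = local destination file
    curl -f -L "$(namenode_data_url "$1" "$2")" -o "$3"
}
```

Calling resolve_datanode "$namenode" /demofile.txt first is one way to check which datanode would serve the file before running fetch_via_namenode.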