Is there any way to download a file from HDFS using the WebHDFS REST API? The closest I have come is to use the OPEN operation to read the file and save the content:
curl -i -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt
Is there any API that will let me download the file directly, without having to open it? I went through the official documentation and searched Google as well, but could not find anything. Could somebody point me in the right direction?
Thank you so much for your valuable time.
You could probably use the DataNode API for this (port 50075 by default). It supports a streamFile command that you could take advantage of. Using wget, this would look something like:
wget http://$datanode:50075/streamFile/demofile.txt -O ~/demofile.txt
Note that this command needs to be executed on the datanode itself, not on the namenode!
Alternatively, if you don't know which datanode to hit, you could ask the namenode, and it will redirect you to the right datanode with this URL:
http://$namenode:50070/data/demofile.txt
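For comparison, the WebHDFS OPEN approach from the question can also be wrapped in a small shell helper. This is only a rough sketch: the function names webhdfs_open_url and webhdfs_download are made up for illustration, and it assumes WebHDFS is enabled and reachable on the namenode's HTTP port (50070 here, as in the URL above; your cluster may use a different port). It drops curl's -i flag, because -i writes the HTTP response headers into the saved file along with the content.

```shell
# Hypothetical helpers; host, port and paths are assumptions, adjust for your cluster.

# Build the WebHDFS OPEN URL for an absolute HDFS path.
# $1 = namenode host, $2 = HTTP port, $3 = absolute HDFS path
webhdfs_open_url() {
    printf 'http://%s:%s/webhdfs/v1%s?op=OPEN\n' "$1" "$2" "$3"
}

# Download an HDFS file to a local path.
# $1 = namenode host, $2 = HTTP port, $3 = HDFS path, $4 = local file
webhdfs_download() {
    # -L follows the namenode's redirect to a datanode,
    # -f makes curl return non-zero on HTTP errors,
    # -sS hides the progress meter but still prints errors.
    curl -sS -f -L "$(webhdfs_open_url "$1" "$2" "$3")" -o "$4"
}
```

It could then be called as: webhdfs_download localhost 50070 /demofile.txt ~/demofile.txt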