I am writing a shell script to put data into Hadoop as soon as it is generated. I can ssh to my master node, copy the files to a folder there, and then put them into Hadoop. I am looking for a shell command that avoids copying the file to the master node's local disk first. To better explain what I need, here is what I have so far:
1) copy the file to the master node's local disk:
scp test.txt username@masternode:/folderName/
I have already set up SSH key-based authentication, so no password is needed for this.
2) I can use ssh to remotely execute the hadoop put command:
ssh username@masternode "hadoop dfs -put /folderName/test.txt hadoopFolderName/"
What I am looking for is how to pipe/combine these two steps into one and skip the local copy of the file on the master node's disk. In other words, I want to pipe the commands together so that the data goes straight into Hadoop without ever being written to the master node's local disk.
Thanks.
Try this (untested):
cat test.txt | ssh username@masternode "hadoop dfs -put - hadoopFoldername/test.txt"
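If you need to do this for each file as soon as it is generated, you can wrap that pipe in a small loop on the machine producing the data. A minimal sketch, untested, where outgoing/, sent/, the username/host, and hadoopFolderName/ are all placeholders you would adjust:

# push every file in outgoing/ straight into HDFS with no temp copy on the master node
for f in outgoing/*; do
  cat "$f" | ssh username@masternode "hadoop dfs -put - hadoopFolderName/$(basename "$f")" \
    && mv "$f" sent/   # move successfully uploaded files aside so they are not sent again
done

Note that $(basename "$f") is expanded locally before ssh runs, so the remote side just sees the plain file name.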
I've used similar tricks to copy directories around:
tar cf - . | ssh remote "(cd /destination && tar xvf -)"
This sends the output of the local tar into the input of the remote tar.
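The same pattern works in the other direction too; a rough sketch, assuming /source exists on the remote machine and that "remote" is a placeholder host:

# pull a remote directory into the current local directory
ssh remote "(cd /source && tar cf - .)" | tar xvf -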