I am running my shell script on machineA, which copies files from machineB and machineC to machineA.
If a file is not on machineB, then it is sure to be on machineC. So I try to copy from machineB first, and if the file is not there, I go to machineC to copy the same file.
On machineB and machineC there are folders named in YYYYMMDD format inside /data/pe_t1_snapshot. Whichever YYYYMMDD folder has the latest date gives me the full path to copy from. For example, if the latest date folder inside /data/pe_t1_snapshot is 20140317, then the full path is /data/pe_t1_snapshot/20140317, and that is where I need to copy the files from on machineB and machineC. I need to copy around 400 files to machineA from machineB and machineC, and each file is 1.5 GB.
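The selection rule above can be sketched as a small filter. This is a minimal sketch, not the script's actual method: `latest_snapshot` is a hypothetical helper name, and it assumes the directory listing arrives one path per line on stdin. It picks the newest folder by name, which works because YYYYMMDD strings sort chronologically as plain text (whereas `ls -t` orders by modification time, which usually but not necessarily agrees with the name).

```shell
# latest_snapshot (hypothetical helper): read directory paths on stdin, one per
# line, and print the one whose final component is the highest YYYYMMDD value.
latest_snapshot() {
    grep -E '/[0-9]{8}/?$' |   # keep only paths ending in exactly eight digits
        sed 's:/*$::' |        # drop any trailing slash
        sort |                 # YYYYMMDD sorts chronologically as text
        tail -n 1              # the last line is the latest date
}
```

For example, `ssh david@machineB 'ls -d /data/pe_t1_snapshot/*/' | latest_snapshot` would print /data/pe_t1_snapshot/20140317 if 20140317 is the latest date folder on that host.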
Currently I have the shell script below, which works fine using scp, but it takes ~2 hours to copy the 400 files to machineA, which seems too long to me. :(
Below is my shell script -
#!/bin/bash
readonly PRIMARY=/export/home/david/dist/primary
readonly SECONDARY=/export/home/david/dist/secondary
readonly FILERS_LOCATION=(machineB machineC)
readonly MEMORY_MAPPED_LOCATION=/data/pe_t1_snapshot
PRIMARY_PARTITION=(0 3 5 7 9) # this will have more file numbers around 200
SECONDARY_PARTITION=(1 2 4 6 8) # this will have more file numbers around 200
dir1=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[0]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[1]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
echo $dir1
echo $dir2
if [ "$dir1" = "$dir2" ]
then
    # delete all the files first
    find "$PRIMARY" -mindepth 1 -delete
    for el in "${PRIMARY_PARTITION[@]}"
    do
        scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.
    done
    # delete all the files first
    find "$SECONDARY" -mindepth 1 -delete
    for sl in "${SECONDARY_PARTITION[@]}"
    do
        scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.
    done
fi
I am copying the PRIMARY_PARTITION files into the PRIMARY folder and the SECONDARY_PARTITION files into the SECONDARY folder on machineA.
Is there any way to move the files to machineA faster? Can I copy 5 or 10 files at a time in parallel to speed up this process, or is there any other approach?
NOTE: machineA is running on SSD.
UPDATE:
Parallel shell script that I tried; the top portion of the script is the same as shown above.
if [ "$dir1" = "$dir2" ] && [ "$length1" -gt 0 ] && [ "$length2" -gt 0 ]
then
    find "$PRIMARY" -mindepth 1 -delete
    for el in "${PRIMARY_PARTITION[@]}"
    do
        (scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.) &
        WAITPID="$WAITPID $!"
    done
    find "$SECONDARY" -mindepth 1 -delete
    for sl in "${SECONDARY_PARTITION[@]}"
    do
        (scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.) &
        WAITPID="$WAITPID $!"
    done
    wait $WAITPID
    echo "All files done copying."
fi
Errors I got with the parallel shell script -
channel 24: open failed: administratively prohibited: open failed
channel 25: open failed: administratively prohibited: open failed
channel 26: open failed: administratively prohibited: open failed
channel 28: open failed: administratively prohibited: open failed
channel 30: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 32: open failed: administratively prohibited: open failed
channel 36: open failed: administratively prohibited: open failed
channel 37: open failed: administratively prohibited: open failed
channel 38: open failed: administratively prohibited: open failed
channel 40: open failed: administratively prohibited: open failed
channel 46: open failed: administratively prohibited: open failed
channel 47: open failed: administratively prohibited: open failed
channel 49: open failed: administratively prohibited: open failed
channel 52: open failed: administratively prohibited: open failed
channel 54: open failed: administratively prohibited: open failed
channel 55: open failed: administratively prohibited: open failed
channel 56: open failed: administratively prohibited: open failed
channel 57: open failed: administratively prohibited: open failed
channel 59: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 61: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 64: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 68: open failed: administratively prohibited: open failed
channel 72: open failed: administratively prohibited: open failed
channel 74: open failed: administratively prohibited: open failed
channel 76: open failed: administratively prohibited: open failed
channel 78: open failed: administratively prohibited: open failed
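Messages like `channel N: open failed: administratively prohibited` together with `Session open refused by peer` are typical of asking one multiplexed SSH connection (the ControlMaster socket) for more simultaneous sessions than the server's `MaxSessions` setting allows; OpenSSH's default is 10. Starting every copy in the background at once can exceed that limit. Below is a minimal sketch of capping the concurrency instead, assuming an `xargs` that supports `-P` (GNU and BSD versions do). `run_limited` and `MAX_JOBS` are hypothetical names, and the value 5 is only an assumed starting point to tune.

```shell
# Assumed cap on simultaneous copies; chosen to stay below sshd's default
# MaxSessions of 10. Override by exporting MAX_JOBS before sourcing.
MAX_JOBS=${MAX_JOBS:-5}

# run_limited (hypothetical helper): read one item per line on stdin and run
# the given command once per item, with the item appended as the last
# argument, keeping at most MAX_JOBS invocations running at a time.
run_limited() {
    xargs -P "$MAX_JOBS" -n 1 "$@"
}
```

For example, `printf '%s\n' "${PRIMARY_PARTITION[@]}" | run_limited ./copy_one.sh "$PRIMARY"` would call a hypothetical `copy_one.sh` (wrapping the `scp ... || scp ...` line) once per partition number, five at a time. Another option is to pass `-o ControlPath=none` so that each scp opens its own connection, at the cost of a fresh handshake per file.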
You can try rsync. From man rsync: The rsync remote-update protocol allows rsync to transfer just the differences between two sets of files across the network connection, using an efficient checksum-search algorithm described in the technical report that accompanies this package.
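As a rough illustration of this suggestion, here is a sketch that fetches one file with rsync and falls back to a second host, mirroring the `scp ... || scp ...` pattern in the question. `fetch_with_fallback` and the `RSYNC` variable are hypothetical names introduced here, not part of rsync. Note that the delta transfer only has something to compare against when an older copy of the file is still present at the destination, so it may not help if the target folder is emptied first.

```shell
# RSYNC is overridable (an assumption of this sketch) so that the function can
# be exercised with a stub instead of a real network transfer.
RSYNC=${RSYNC:-rsync}

# fetch_with_fallback REMOTE_PATH DEST_DIR HOST...
# Try each host in order and stop at the first successful transfer.
fetch_with_fallback() {
    fwf_path=$1
    fwf_dest=$2
    shift 2
    for fwf_host in "$@"; do
        # -a preserves file attributes; --partial keeps an interrupted
        # transfer on disk so that a retry can resume from it.
        if "$RSYNC" -a --partial "$fwf_host:$fwf_path" "$fwf_dest/"; then
            return 0
        fi
    done
    return 1   # no host had the file
}
```

A call such as `fetch_with_fallback "$dir1/t1_weekly_1680_0_200003_5.data" "$PRIMARY" david@machineB david@machineC` would try machineB first and machineC only if that fails.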