I am just starting to learn Python and have a question.
How do I create a script to do the following? (I will write how I do it in bash.)
Copy <file>.gz from remote server1 to local storage:
    cp /dumps/server1/file1.gz /local/
Then extract that file locally:
    gunzip /local/file1.gz
Then copy the extracted file to remote server2 (for archiving and deduplication purposes):
    cp /local/file1.dump /dedupmount
Then delete the local copy of the .gz file to free space on the "temporary" storage:
    rm -f /local/file1.gz
I need to run all of that in a loop over all the files. All files and directories are NFS-mounted on the same server.
A for loop goes through the /dumps/ folder and looks for .gz files. Each .gz file will first be copied to the /local directory and extracted there. Once extracted, the unzipped .dmp file will be copied to the /dedupmount folder for archiving.
I'm just banging my head against the wall trying to write this.
While the shell code might be shorter, the whole process can be done natively in Python. The key points of the Python solution are:
With the gzip module, gzipped files are as easy to read as normal files.
To obtain the list of source files, the glob module is used. It is modeled after the shell glob feature.
To manipulate paths, use the os.path module. It provides an OS-independent interface to the file system.
Here is sample code:
import gzip
import glob
import os.path

source_dir = "/dumps/server1"
dest_dir = "/dedupmount"

for src_name in glob.glob(os.path.join(source_dir, '*.gz')):
    base = os.path.basename(src_name)
    dest_name = os.path.join(dest_dir, base[:-3])
    with gzip.open(src_name, 'rb') as infile:
        with open(dest_name, 'wb') as outfile:
            for line in infile:
                outfile.write(line)
This code reads from server1 and writes to server2. There is no need for a local copy unless you want one.
In this code, all decompression is done by the CPU on the local machine.
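As a side note (not part of the original answer), the inner line-by-line loop can be replaced with shutil.copyfileobj, which copies in fixed-size chunks. For binary dump files this avoids splitting the stream on newline bytes and is a little faster. A sketch, with gunzip_to as an illustrative helper name:

```python
import gzip
import shutil

def gunzip_to(src_name, dest_name):
    """Decompress src_name (a .gz file) into dest_name."""
    with gzip.open(src_name, 'rb') as infile:
        with open(dest_name, 'wb') as outfile:
            # copyfileobj reads fixed-size blocks (64 KiB by default),
            # which is safer for binary data than iterating by lines.
            shutil.copyfileobj(infile, outfile)
```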
For comparison, here is the equivalent shell code:
for src in /dumps/server1/*.gz
do
    base=${src##*/}
    dest="/dedupmount/${base%.gz}"
    zcat "$src" >"$dest"
done
This slightly more complex approach implements the OP's three-step algorithm which uses a temporary file on the local machine:
import gzip
import glob
import os.path
import shutil

source_dir = "./dumps/server1"
dest_dir = "./dedupmount"
tmpfile = "/tmp/delete.me"

for src_name in glob.glob(os.path.join(source_dir, '*.gz')):
    base = os.path.basename(src_name)
    dest_name = os.path.join(dest_dir, base[:-3])
    shutil.copyfile(src_name, tmpfile)
    with gzip.open(tmpfile, 'rb') as infile:
        with open(dest_name, 'wb') as outfile:
            for line in infile:
                outfile.write(line)
This copies the source file to a temporary file on the local machine, tmpfile, and then gunzips it from there to the destination file. tmpfile is overwritten on every invocation of this script.
Temporary files can be a security issue. To avoid this, place the temporary file in a directory that is writable only by the user who runs this script.
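One way to handle that, sketched here as a suggestion rather than part of the original answer, is the standard tempfile module: tempfile.NamedTemporaryFile creates the file with permissions readable only by the current user and with an unpredictable name. Deleting it afterwards also covers the OP's "free space on temporary storage" step. copy_and_gunzip is an illustrative helper name:

```python
import gzip
import os
import shutil
import tempfile

def copy_and_gunzip(src_name, dest_name):
    """Copy src_name to a private temporary file, then gunzip it to dest_name."""
    # delete=False lets us close the handle and reopen the file by name;
    # we remove it ourselves in the finally block.
    tmp = tempfile.NamedTemporaryFile(suffix='.gz', delete=False)
    try:
        tmp.close()
        shutil.copyfile(src_name, tmp.name)     # step 1: copy to local temp
        with gzip.open(tmp.name, 'rb') as infile:
            with open(dest_name, 'wb') as outfile:
                shutil.copyfileobj(infile, outfile)  # step 2: extract to destination
    finally:
        os.remove(tmp.name)                     # step 3: free the temporary storage
```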