Unix shell scripting for copying files and creating directories

Nick Fortescue · Jan 30, 2009

I have a source directory eg /my/source/directory/ and a destination directory eg /my/dest/directory/, which I want to mirror with some constraints.

  • I want to copy files which meet certain criteria of the find command, eg -ctime -2 (less than 2 days old) to the dest directory to mirror it
  • I want to include some of the prefix so I know where it came from, eg /source/directory
  • I'd like to do all this with absolute paths so it doesn't depend which directory I run from
  • I'd guess not having cd commands is good practice too.
  • I want the subdirectories created if they don't exist

So

/my/source/directory/1/foo.txt -> /my/dest/directory/source/directory/1/foo.txt
/my/source/directory/2/3/bar.txt -> /my/dest/directory/source/directory/2/3/bar.txt

I've hacked together the following command line but it seems a bit ugly, can anyone do better?

find /my/source/directory -ctime -2 -type f -printf "%P\n" | xargs -IFILE rsync -avR /my/./source/directory/FILE /my/dest/directory/

Please comment if you think I should add this command line as an answer myself, I didn't want to be greedy for reputation.

Answer

Jonathan Leffler · Jan 30, 2009

This is remarkably similar to a (closed) question: Bash scripting copying files without overwriting. The answer I gave cites the 'find | cpio' solution mentioned in other answers (minus the time criteria, but that's the difference between 'similar' and 'same'), and also outlines a solution using GNU 'tar'.

ctime

When I tested on Solaris, neither GNU tar nor (Solaris) cpio was able to preserve the ctime setting; indeed, I'm not sure that there is any way to do that. For example, the touch command can set the atime or the mtime or both - but not the ctime. The utime() system call also only takes the mtime or atime values; it does not handle ctime. So, I believe that if you find a solution that preserves ctime, that solution is likely to be platform-specific. (Weird example: hack the disk device and edit the data in the inode - not portable, requires elevated privileges.) Rereading the question, though, I see that 'preserving ctime' is not part of the requirements (phew); it is simply the criterion for whether the file is copied or not.
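The point about touch can be checked with a small sketch. It assumes GNU stat (for -c with %Y and %Z, the mtime and ctime in seconds since the epoch); the temporary file exists only for the demonstration.

```shell
#!/bin/sh
# Sketch: touch can backdate mtime, but ctime is set by the kernel.
# Assumes GNU stat; BSD stat uses a different format option.
f=$(mktemp)

# Backdate atime and mtime to 2001-01-01 00:00 (POSIX -t format).
touch -t 200101010000 "$f"

mtime=$(stat -c %Y "$f")   # last data modification, seconds since the epoch
ctime=$(stat -c %Z "$f")   # last inode change, seconds since the epoch
now=$(date +%s)

# mtime lands back in 2001, while ctime is the moment touch ran.
echo "mtime=$mtime ctime=$ctime now=$now"

rm -f "$f"
```

Because every copy creates a new inode (or changes an existing one), the copied file's ctime will be the time of the copy, whichever tool does the copying.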

chdir

I think that the 'cd' operations are necessary, but they can be wholly localized to the script or command line by running them in a subshell, as illustrated in the question cited and in the command lines below. The second command assumes GNU tar.

(cd /my; find source/directory -ctime -2 | cpio -pvdm /my/dest/directory)

(cd /my; find source/directory -type f -ctime -2 | tar -cf - -T - ) |
    (cd /my/dest/directory; tar -xf -)

Without using chdir() (aka cd), you need specialized tools or options to handle the manipulation of the pathnames on the fly.
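As one possible illustration of such options, here is a sketch that avoids cd entirely. It assumes GNU find (for -printf, which the question already uses) and GNU tar (for -C, --null and -T); the function name mirror_recent and its three parameters are hypothetical, and the demo tree under a temporary directory stands in for /my.

```shell
#!/bin/sh
# Sketch only: mirror files changed in the last 2 days without any cd.
# Assumes GNU find and GNU tar. mirror_recent is a hypothetical helper.
mirror_recent() {
    base=$1      # leading part of the path to strip, e.g. /my
    subdir=$2    # part of the path to keep, e.g. source/directory
    dest=$3      # destination root, e.g. /my/dest/directory

    mkdir -p "$dest" || return 1

    # find prints each name relative to $base, NUL-terminated;
    # tar -C resolves those names against $base, and the second
    # tar -C unpacks them under $dest.
    # Caveat: $subdir is used inside a -printf format, so it must
    # not contain % or backslash characters.
    find "$base/$subdir" -type f -ctime -2 -printf "$subdir"'/%P\0' |
        tar -C "$base" -cf - --null -T - |
        tar -C "$dest" -xf -
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/my/source/directory/1" "$demo/my/source/directory/2/3"
echo foo > "$demo/my/source/directory/1/foo.txt"
echo bar > "$demo/my/source/directory/2/3/bar baz.txt"

mirror_recent "$demo/my" source/directory "$demo/my/dest/directory"
```

Called as mirror_recent /my source/directory /my/dest/directory, this should map /my/source/directory/1/foo.txt to /my/dest/directory/source/directory/1/foo.txt, which is the layout the question asks for.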

Names with blanks, newlines, etc

The GNU-specific 'find -print0' and 'xargs -0' are very powerful and effective, as noted by Adam Hawes. Funnily enough, GNU cpio has an option to handle the output from 'find -print0', and that is '--null' or its short form '-0'. So, using GNU find and GNU cpio, the safe command is:

(cd /my; find source/directory -ctime -2 -print0 |
    cpio -pvdm0 /my/dest/directory)

Note: This does not overwrite pre-existing files under the destination directory. Add -u to the cpio command for that.

Similarly, GNU tar supports --null (apparently with no -0 short form), placed before the -T option it applies to, and could also be used:

(cd /my; find source/directory -type f -ctime -2 -print0 | tar -cf - --null -T - ) |
    (cd /my/dest/directory; tar -xf -)

The GNU handling of file names with the null terminator is extremely clever and a valuable innovation (though I only became aware of it fairly recently, courtesy of SO; it has been in GNU tar for at least a decade).
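
To see why the null terminator matters, the following sketch builds a throwaway directory containing a file name with an embedded newline and compares the two ways of listing it. It assumes a find that supports -print0; the directory and file names are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: newline-delimited file lists miscount names containing newlines.
# Assumes find supports -print0 (GNU and most modern BSD finds do).
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/with space.txt" "$tmp/with
newline.txt"

# One "record" per newline: the embedded newline splits one name in two.
newline_count=$(find "$tmp" -type f -print | wc -l)

# One record per NUL byte: count the NULs, one per file.
null_count=$(find "$tmp" -type f -print0 | tr -cd '\0' | wc -c)

echo "newline-delimited records: $newline_count"
echo "NUL-delimited records:     $null_count"

rm -rf "$tmp"
```

The newline-delimited listing reports one record more than there are files, so any tool reading that list would be handed two names that do not exist; the NUL-delimited listing is unambiguous because NUL cannot appear in a file name.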