Python shutil copytree: use ignore function to keep specific file types

Peter Wilson · Feb 27, 2017

I'm trying to figure out how to copy CAD drawings (".dwg", ".dxf") from a source directory with subfolders to a destination directory while maintaining the original directory and subfolder structure.

  • Original Directory: H:\Tanzania...\Bagamoyo_Single_line.dwg
  • Destination Directory: H:\CAD\Tanzania...\Bagamoyo_Single_line.dwg

I found the following answer from @martineau within the following post: Python Factory Function

from fnmatch import filter  # note: shadows the built-in filter() in this module
from os.path import isdir, join
from shutil import copytree

def include_patterns(*patterns):
    """Factory function that can be used with copytree() ignore parameter.

    Arguments define a sequence of glob-style patterns
    that are used to specify what files to NOT ignore.
    Creates and returns a function that determines this for each directory
    in the file hierarchy rooted at the source directory when used with
    shutil.copytree().
    """
    def _ignore_patterns(path, names):
        keep = set(name for pattern in patterns
                            for name in filter(names, pattern))
        ignore = set(name for name in names
                        if name not in keep and not isdir(join(path, name)))
        return ignore
    return _ignore_patterns

# sample usage

copytree(src_directory, dst_directory,
         ignore=include_patterns('*.dwg', '*.dxf'))

Updated: 18:21. The code above works as expected, except that I'd also like to skip folders that don't contain any files matching include_patterns('*.dwg', '*.dxf').

Answer

Jan · Feb 27, 2017

shutil already contains a function, ignore_patterns, so you don't have to provide your own. Straight from the documentation:

from shutil import copytree, ignore_patterns

copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))

This will copy everything except .pyc files and files or directories whose name starts with tmp.

Explaining what's going on is a bit tricky (and not strictly necessary): ignore_patterns returns a function, _ignore_patterns, as its return value. That function is passed to copytree as the ignore argument, and copytree calls it once for each directory it visits, so you don't have to know or care how to call _ignore_patterns yourself. In practice it just means that you can exclude unneeded cruft files (like *.pyc) from being copied. The leading underscore in the name _ignore_patterns is a hint that this function is an implementation detail you may ignore.
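If you are curious anyway, you can call the returned function by hand to see what copytree would get back for one directory. This is only a small sketch: the directory path and the file names below are made up for illustration, and nothing is read from disk.

```python
from shutil import ignore_patterns

# ignore_patterns() is a factory: it returns a callable that copytree
# invokes with (directory_path, list_of_names_in_that_directory).
ignore = ignore_patterns('*.pyc', 'tmp*')

# Hypothetical directory listing, invented for this example.
names = ['main.py', 'main.pyc', 'tmp_cache', 'drawings']

# The callable returns the names that copytree should skip.
skipped = set(ignore('some/source/dir', names))
print(sorted(skipped))
```

Everything not in the returned collection (here main.py and drawings) is copied as usual.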

By default, copytree expects that the destination folder doesn't exist yet. It is not a problem that this folder and its subfolders come into existence once copytree starts to work; copytree creates them itself.
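To illustrate the point about the destination folder, here is a rough sketch that copies a throwaway tree twice. The folder and file names are invented, and the dirs_exist_ok argument assumes Python 3.8 or newer; on older versions you would have to remove the destination first.

```python
import tempfile
from pathlib import Path
from shutil import copytree

def copy_twice_demo():
    """Copy a tiny temporary tree twice to show the 'destination exists' rule."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        dst = Path(tmp) / "dst"
        (src / "sub").mkdir(parents=True)
        (src / "sub" / "plan.dwg").write_text("drawing")

        copytree(src, dst)  # first copy: dst does not exist yet, so this works

        try:
            copytree(src, dst)  # second copy: dst now exists
            raised = False
        except FileExistsError:
            raised = True

        # Python 3.8+ only: explicitly allow copying into an existing folder.
        copytree(src, dst, dirs_exist_ok=True)
        return raised, (dst / "sub" / "plan.dwg").exists()

raised, copied = copy_twice_demo()
print(raised, copied)
```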

Now include_patterns is written to do the opposite: ignore everything that's not explicitly included. But it works the same way: you just call it, it returns a function under the hood, and copytree knows what to do with that function:

copytree(source, destination, ignore=include_patterns('*.dwg', '*.dxf'))
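As for the update in the question (skipping folders that end up with no drawings in them): include_patterns never ignores directories, so the whole folder skeleton is still copied. One possible approach, sketched here under assumptions rather than offered as the definitive way, is to copy first and then delete the empty folders from the destination, bottom-up. The helper prune_empty_dirs and the sample folder names are hypothetical, and note that this also removes folders that were already empty in the source.

```python
import os
import tempfile
from fnmatch import filter as fnfilter
from os.path import isdir, join
from shutil import copytree

def include_patterns(*patterns):
    """Same idea as the factory in the question: keep only matching files."""
    def _ignore_patterns(path, names):
        keep = {name for pattern in patterns for name in fnfilter(names, pattern)}
        # Directories are never ignored, so the walk can descend into them.
        return {name for name in names
                if name not in keep and not isdir(join(path, name))}
    return _ignore_patterns

def prune_empty_dirs(root):
    """Hypothetical helper: remove empty subfolders of root, deepest first."""
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        # Re-list the folder, since its children may just have been removed.
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = join(tmp, "src"), join(tmp, "dst")
        os.makedirs(join(src, "site_a", "drawings"))
        os.makedirs(join(src, "site_b", "notes"))
        open(join(src, "site_a", "drawings", "plan.dwg"), "w").close()
        open(join(src, "site_a", "readme.txt"), "w").close()
        open(join(src, "site_b", "notes", "todo.txt"), "w").close()

        copytree(src, dst, ignore=include_patterns("*.dwg", "*.dxf"))
        prune_empty_dirs(dst)

        # List what is left in dst, relative to dst, with forward slashes.
        return sorted(
            os.path.relpath(join(dirpath, name), dst).replace(os.sep, "/")
            for dirpath, dirnames, filenames in os.walk(dst)
            for name in dirnames + filenames
        )

remaining = demo()
print(remaining)
```

In this sample only site_a and its drawings subfolder survive, because site_b contained no .dwg or .dxf files at any level.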