Python rglob pattern for directory search

tsekine · Aug 31, 2018

I am trying to get the names of subdirectories with a Python 3 script on Windows 10, so I wrote the following code:

from pathlib2 import Path
p = "./path/to/target/dir"
[str(item) for item in Path(p).rglob(".")]
# Returns only the paths of the subdirectories, including the target directory itself.

This result is what I want, but I don't understand why passing this pattern to rglob returns it.

Can someone explain this?

Thanks.

Answer

Arne · Aug 31, 2018

Every directory in a POSIX-style filesystem has two special entries from the start: .., which refers to the parent directory, and ., which refers to the directory itself:

$ mkdir tmp; cd tmp
tmp$ ls -a
. ..
tmp$ cd .
tmp$  # <-- still in the same directory

The notable exception is /.., which refers to the root itself, since the root has no parent.
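
The same equivalence can be checked from Python. This is a minimal sketch, assuming a throwaway directory created with tempfile; the names base and child are made up for illustration:

```python
import os
import tempfile


def dot_entries_behave_as_described():
    """Check that '.' names a directory itself and '..' names its parent."""
    with tempfile.TemporaryDirectory() as base:
        child = os.path.join(base, "child")
        os.mkdir(child)
        # child/. should be the very same directory as child.
        same_as_self = os.path.samefile(os.path.join(child, "."), child)
        # child/.. should be the very same directory as base.
        same_as_parent = os.path.samefile(os.path.join(child, ".."), base)
        return same_as_self, same_as_parent


print(dot_entries_behave_as_described())
```

os.path.samefile compares what the two paths point to on disk, not their spelling, which is why it suits this check.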

A Path object from Python's pathlib is, when it is created, just a wrapper around a string that is assumed to point somewhere in the filesystem. It only refers to something tangible once it is resolved:

>>> Path('.')
PosixPath('.')  # just a fancy string
>>> Path('.').resolve()
PosixPath('/current/working/dir')  # an actual point in your filesystem

The bottom line is that

  • the paths /current/working/dir and /current/working/dir/. are, from the filesystem's point of view, completely equivalent, and
  • a pathlib.Path will also reflect that as soon as it is resolved.
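
Both points can be tried out directly. The sketch below is only an illustration: it reuses the /current/working/dir placeholder from above for a purely lexical comparison, and a temporary directory (an assumption, not part of the question) for the resolved comparison. pathlib also collapses single-dot components when it builds a path, so the first comparison holds without touching the filesystem:

```python
import tempfile
from pathlib import Path, PurePosixPath

# Purely lexical comparison: no filesystem access is involved.
# pathlib collapses the trailing '.' when the path object is built.
lexically_equal = (
    PurePosixPath("/current/working/dir/.") == PurePosixPath("/current/working/dir")
)


def resolved_equal():
    """Compare a real directory with its '.' spelling after resolving both."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        return (base / ".").resolve() == base.resolve()


print(lexically_equal, resolved_equal())
```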

By passing . as the rglob pattern, you matched each directory's reference to itself, for the starting directory and every directory below it. pathlib drops single-dot components when it builds the resulting paths, so the . no longer appears in the output.
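
If you would rather not depend on how the . pattern is handled, the same kind of listing can be written out explicitly. This is a hedged sketch and not the only way to do it: list_dirs is a hypothetical helper name, and the temporary tree exists only to make the example self-contained:

```python
import tempfile
from pathlib import Path


def list_dirs(root):
    """Return root followed by every directory below it, as strings."""
    root = Path(root)
    # rglob("*") walks everything below root; keep only the directories.
    subdirs = sorted(p for p in root.rglob("*") if p.is_dir())
    return [str(p) for p in [root, *subdirs]]


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "a" / "b").mkdir(parents=True)
    (demo_root / "a" / "file.txt").write_text("not a directory")
    for name in list_dirs(demo_root):
        print(name)
```

Filtering with is_dir() makes the intent visible to the next reader, at the cost of visiting files as well as directories.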

As a source for this behavior, see PEP 428 (which serves as the specification for pathlib), where path equivalence is briefly mentioned.