git-svn clone ignore-paths regular expression for folders

zapping · Mar 19, 2013 · Viewed 11.4k times

I am trying to run git svn clone to import all the files from SVN into Git. The command I was given is:

git svn clone --stdlayout --ignore-paths='(/cache|/tmps|/file/conf/setting.xml)' --authors-file=../authors.txt file:///svnFolder/local-repos/PRG PRG.git

The clone succeeds, but it ignores every file and folder whose path contains cache or tmps. For instance, it also ignores these:

new/folder/cache
meta/files/sets/tmps.html

Can anybody help me write a regular expression for --ignore-paths that ignores only the files and subdirectories inside the root folder's cache and tmps directories?

Answer

Russ Watson · Jul 31, 2013

Your ignore-paths regex is too general. The regular expression is matched against the full path of each file, and it does not have to match the whole path: a match anywhere within it is enough. For example, if your repository layout is:

svn_root/path/to/your_project

and it has a standard layout of trunk, branches, and tags, a set of sample paths that get evaluated might be:

svn_root/path/to/your_project/trunk/new/folder/cache
svn_root/path/to/your_project/trunk/meta/files/sets/tmps.html
svn_root/path/to/your_project/trunk/file/conf/setting.xml
svn_root/path/to/your_project/trunk/cache/...
svn_root/path/to/your_project/trunk/tmps/...

Let's start by analyzing the regex you provided as part of the ignore-paths parameter:

'(/cache|/tmps|/file/conf/setting.xml)'
  1. The surrounding parentheses form a capturing group around the expression within.
  2. The pipes denote alternation: the target string is tested against each of the alternatives, and a match on any one of them is enough.
  3. Each expression is very straightforward, but let's analyze each:
    • /cache
      1. Find a literal character "/"
      2. Find a literal character "c"
      3. Find a literal character "a"
      4. Find a literal character "c"
      5. Find a literal character "h"
      6. Find a literal character "e"
    • /tmps
      1. Find a literal character "/"
      2. Find a literal character "t"
      3. Find a literal character "m"
      4. Find a literal character "p"
      5. Find a literal character "s"
    • /file/conf/setting.xml
      1. Find a literal character "/"
      2. Find a literal character "f"
      3. Find a literal character "i"
      4. Find a literal character "l"
      5. Find a literal character "e"
      6. Find a literal character "/"
      7. Find a literal character "c"
      8. Find a literal character "o"
      9. Find a literal character "n"
      10. Find a literal character "f"
      11. Find a literal character "/"
      12. Find a literal character "s"
      13. Find a literal character "e"
      14. Find a literal character "t"
      15. Find a literal character "t"
      16. Find a literal character "i"
      17. Find a literal character "n"
      18. Find a literal character "g"
      19. Match (almost) any character (the "." is not escaped, so it acts as a wildcard rather than a literal dot)
      20. Find a literal character "x"
      21. Find a literal character "m"
      22. Find a literal character "l"
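
Step 19 above is easy to miss, so here is a small sketch of what the unescaped dot does. It uses Python's re module as a stand-in for the Perl regex engine that git-svn uses (an assumption, though the two should agree on patterns this simple). The "lookalike" path is hypothetical and exists only for illustration.

```python
import re

# Python's re stands in for Perl's regex engine here (assumption: they
# behave the same for these simple patterns).
unescaped = re.compile(r"/file/conf/setting.xml")   # "." is a wildcard
escaped = re.compile(r"/file/conf/setting\.xml")    # "\." is a literal dot

intended = "trunk/file/conf/setting.xml"
lookalike = "trunk/file/conf/setting-xml"  # hypothetical path, illustration only

# search() looks for a match anywhere in the string, like an unanchored
# Perl match.
print(bool(unescaped.search(intended)))   # True
print(bool(unescaped.search(lookalike)))  # True: "." matched the "-"
print(bool(escaped.search(intended)))     # True
print(bool(escaped.search(lookalike)))    # False
```

In practice the wildcard rarely causes trouble for a path like this one, but escaping the dot makes the pattern say what it means.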

With your regular expression analyzed, let's walk through the sample paths given above:

String to evaluate:

svn_root/path/to/your_project/trunk/new/folder/cache
  1. Loop through each character looking for a literal "/", followed by "c", etc... until a complete match is found with your first sub-expression "/cache". This path is ignored.

String to evaluate:

svn_root/path/to/your_project/trunk/meta/files/sets/tmps.html
  1. Loop through each character looking for a literal "/", followed by "c", etc... No match is found
  2. Loop through each character looking for a literal "/", followed by "t", etc... until a complete match is found with your second sub-expression "/tmps". This path is ignored.

String to evaluate:

svn_root/path/to/your_project/trunk/file/conf/setting.xml
  1. Loop through each character and evaluate against the first sub-expression. No match is found
  2. Loop through each character and evaluate against the second sub-expression. No match is found
  3. Loop through each character and evaluate against the last sub-expression. A match is found. This path is ignored.

From here, you can probably see why the following two are also ignored: one of the sub-expressions matches a portion of each path.

svn_root/path/to/your_project/trunk/cache/...
svn_root/path/to/your_project/trunk/tmps/...
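
The walkthrough above can be reproduced with a short script. This is only a sketch: it uses Python's re module in place of the Perl engine that git-svn uses (an assumption that should hold for a pattern this simple), and the helper name matched_alternative is made up for the example. The sample paths are copied verbatim, including the literal "..." placeholders.

```python
import re

# The original ignore-paths expression from the question.
ORIGINAL = re.compile(r"(/cache|/tmps|/file/conf/setting.xml)")

# Sample paths from the answer; "..." is kept as a literal placeholder.
SAMPLE_PATHS = [
    "svn_root/path/to/your_project/trunk/new/folder/cache",
    "svn_root/path/to/your_project/trunk/meta/files/sets/tmps.html",
    "svn_root/path/to/your_project/trunk/file/conf/setting.xml",
    "svn_root/path/to/your_project/trunk/cache/...",
    "svn_root/path/to/your_project/trunk/tmps/...",
]

def matched_alternative(path):
    """Return the text captured by the group, or None if nothing matched."""
    match = ORIGINAL.search(path)  # unanchored: may match anywhere in path
    return match.group(1) if match else None

for path in SAMPLE_PATHS:
    print(path, "->", matched_alternative(path))
```

Every one of the five paths produces a match, which is why all of them end up ignored, including the two the question wanted to keep.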

There are several ways to solve this problem, but if you are only trying to ignore a couple of specific directories in the trunk, you could modify your expression as follows:

'(trunk/cache|trunk/tmps|/file/conf/setting\.xml)'
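
As a quick check of the revised expression, the sketch below runs it over the same sample paths. Python's re module again stands in for the Perl engine used by git-svn (an assumption). The TIGHTER pattern and the cache_old path are not part of the answer; they are hypothetical additions showing one way to go further if similarly named directories exist.

```python
import re

# The revised expression suggested above.
REVISED = re.compile(r"(trunk/cache|trunk/tmps|/file/conf/setting\.xml)")

# Sample paths from the answer; "..." is kept as a literal placeholder.
SAMPLE_PATHS = [
    "svn_root/path/to/your_project/trunk/new/folder/cache",
    "svn_root/path/to/your_project/trunk/meta/files/sets/tmps.html",
    "svn_root/path/to/your_project/trunk/file/conf/setting.xml",
    "svn_root/path/to/your_project/trunk/cache/...",
    "svn_root/path/to/your_project/trunk/tmps/...",
]

ignored = [p for p in SAMPLE_PATHS if REVISED.search(p)]
kept = [p for p in SAMPLE_PATHS if not REVISED.search(p)]
print("ignored:", ignored)  # the last three paths
print("kept:", kept)        # the first two paths

# Hypothetical refinement (an assumption, not from the answer): require a
# "/" or the end of the path after the directory name.
TIGHTER = re.compile(r"(trunk/cache(/|$)|trunk/tmps(/|$)|/file/conf/setting\.xml$)")
lookalike = "svn_root/path/to/your_project/trunk/cache_old/notes.txt"  # made up
print(bool(REVISED.search(lookalike)))  # True: "trunk/cache" is a prefix
print(bool(TIGHTER.search(lookalike)))  # False
```

The revised expression keeps new/folder/cache and tmps.html while still ignoring the trunk-level cache and tmps directories and setting.xml.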

It really depends on which specific paths you want to ignore. If you need more help, please clarify in detail how your repository is laid out and which directories are to be ignored.