How do I check for valid Git branch names?

Alex Chamberlain · Aug 23, 2012

I'm developing a git post-receive hook in Python. Data is supplied on stdin with lines similar to

ef4d4037f8568e386629457d4d960915a85da2ae 61a4033ccf9159ae69f951f709d9c987d3c9f580 refs/heads/master

The first hash is the old-ref, the second the new-ref and the third column is the reference being updated.

I want to split this into 3 variables, whilst also validating input. How do I validate the branch name?

I am currently using the following regular expression:

^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/([0-9a-zA-Z]+)$

This doesn't accept all possible branch names, as set out by man git-check-ref-format. For example, it excludes a branch by the name of build-master, which is valid.
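To make the limitation concrete, here is a minimal sketch that runs the regex above against the sample line from the question and against a build-master line. The helper name parse and the constants OLD and NEW are only for illustration.

```python
import re

# The regex from the question: two 40-digit hex hashes, then an
# alphanumeric-only branch name under refs/heads/.
LINE_RE = re.compile(r"^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/([0-9a-zA-Z]+)$")

OLD = "ef4d4037f8568e386629457d4d960915a85da2ae"
NEW = "61a4033ccf9159ae69f951f709d9c987d3c9f580"


def parse(line):
    """Return (old, new, branch) for a hook input line, or None if it does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    return m.groups() if m else None


master = parse(f"{OLD} {NEW} refs/heads/master\n")
build_master = parse(f"{OLD} {NEW} refs/heads/build-master\n")
```

The first call splits the line into three values; the second returns None because the hyphen is not in [0-9a-zA-Z].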

Bonus marks

I actually want to exclude any branch that starts with "build-". Can this be done in the same regex?

Tests

Given the great answers below, I wrote some tests, which can be found at https://github.com/alexchamberlain/githooks/blob/master/miscellaneous/git-branch-re-test.py.

Status: All the regexes below are failing to compile. This could indicate there's a problem with my script or incompatible syntaxes.

Answer

Joey · Aug 23, 2012

Let's dissect the various rules and build regex parts from them:

  1. They can include slash / for hierarchical (directory) grouping, but no slash-separated component can begin with a dot . or end with the sequence .lock.

     # must not contain /.
     (?!.*/\.)
     # must not end with .lock
     (?<!\.lock)$
    
  2. They must contain at least one /. This enforces the presence of a category like heads/, tags/ etc. but the actual names are not restricted. If the --allow-onelevel option is used, this rule is waived.

     .+/.+  # may get more precise later
    
  3. They cannot have two consecutive dots .. anywhere.

     (?!.*\.\.)
    
  4. They cannot have ASCII control characters (i.e. bytes whose values are lower than \040, or \177 DEL), space, tilde ~, caret ^, or colon : anywhere.

     [^\000-\037\177 ~^:]+   # pattern for allowed characters
    
  5. They cannot have question-mark ?, asterisk *, or open bracket [ anywhere. See the --refspec-pattern option below for an exception to this rule.

     [^\000-\037\177 ~^:?*[]+   # new pattern for allowed characters
    
  6. They cannot begin or end with a slash / or contain multiple consecutive slashes (see the --normalize option below for an exception to this rule)

     ^(?!/)
     (?<!/)$
     (?!.*//)
    
  7. They cannot end with a dot ..

     (?<!\.)$
    
  8. They cannot contain a sequence @{.

     (?!.*@\{)
    
  9. They cannot contain a \.

     (?!.*\\)
    

Piecing it all together we arrive at the following monstrosity:

^(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
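Since the question is about Python, here is a rough sketch of how the combined pattern might be tried with the re module. The names ALLOWED, REF_RE and is_valid_ref are made up for this example. Two small liberties are taken: the [ inside the character class is escaped, and fullmatch is used because $ in Python also matches just before a trailing newline.

```python
import re

# Allowed characters (rules 4 and 5); "[" is escaped inside the class.
ALLOWED = r"[^\000-\037\177 ~^:?*\[]+"

REF_RE = re.compile(
    r"^(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)"  # rules 1, 3, 6, 8, 9
    + ALLOWED + "/" + ALLOWED                               # rules 2, 4, 5
    + r"(?<!\.lock)(?<!/)(?<!\.)$"                          # rules 1, 6, 7
)


def is_valid_ref(name):
    """True if name passes the combined pattern; fullmatch rejects a trailing newline."""
    return REF_RE.fullmatch(name) is not None


for candidate in ["refs/heads/master", "refs/heads/build-master",
                  "master", "refs/heads/a..b", "refs/heads/foo.lock"]:
    print(candidate, is_valid_ref(candidate))
```

Note that the pattern is meant for a full ref name such as refs/heads/master; a bare master fails rule 2 because it contains no slash.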

And if you want to exclude names that start with build-, just add another lookahead at the start of the pattern:

^(?!build-)(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
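For the hook in the question, the lookahead has to sit where the branch name starts, that is, after refs/heads/, because (?!build-) at the very start of the pattern tests the beginning of the whole string. One possible adaptation, as a sketch only: HOOK_LINE_RE and parse_line are hypothetical names, and the pattern assumes input lines shaped like the sample in the question.

```python
import re

# Allowed characters for a ref name; "[" is escaped inside the class.
ALLOWED = r"[^\000-\037\177 ~^:?*\[]+"

# Hypothetical pattern for one post-receive line: old hash, new hash, branch ref.
HOOK_LINE_RE = re.compile(
    r"([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/"
    r"(?!build-)"                                    # reject branches starting with build-
    r"(?![/.])"                                      # branch must not start with / or .
    r"(?!.*/\.)(?!.*\.\.)(?!.*//)(?!.*@\{)(?!.*\\)"  # forbidden sequences anywhere
    r"(" + ALLOWED + r")"                            # the branch name itself
    r"(?<!\.lock)(?<!/)(?<!\.)"                      # forbidden endings
)


def parse_line(line):
    """Return (old, new, branch) for a valid hook line, or None otherwise."""
    m = HOOK_LINE_RE.fullmatch(line.rstrip("\n"))
    return m.groups() if m else None
```

With this placement, refs/heads/build-master is rejected, while names such as feature/build-x or rebuild-1 still pass because they merely contain build- rather than start with it.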

This can be optimized a bit as well by conflating a few things that look for common patterns:

^(?!@$|build-|/|.*([/.]\.|//|@\{|\\))[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock|[/.])$
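One caveat for Python users: the standard re module only accepts fixed-width lookbehinds, and (?<!\.lock|[/.]) mixes a five-character alternative with a one-character one, so this last pattern is rejected at compile time. That may be related to the compile failures mentioned in the question, though I have not checked the test script. A sketch of a workaround is to split the lookbehind in two; HEAD and PORTABLE_RE are names invented for this example.

```python
import re

# Allowed characters for a ref name; "[" is escaped inside the class.
ALLOWED = r"[^\000-\037\177 ~^:?*\[]+"

# Everything up to, but not including, the trailing lookbehind.
HEAD = r"^(?!@$|build-|/|.*([/.]\.|//|@\{|\\))" + ALLOWED + "/" + ALLOWED

# The lookbehind as written has alternatives of width 5 and 1,
# which Python's re refuses to compile.
try:
    re.compile(HEAD + r"(?<!\.lock|[/.])$")
    compiled_as_written = True
except re.error:
    compiled_as_written = False

# Two separate fixed-width lookbehinds express the same condition.
PORTABLE_RE = re.compile(HEAD + r"(?<!\.lock)(?<![/.])$")

print(compiled_as_written)
print(bool(PORTABLE_RE.fullmatch("refs/heads/master")))
```

Both lookbehinds are evaluated at the same position, so the pair rejects exactly the endings the single alternation was meant to reject.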