Git-Based Source Control in the Enterprise: Suggested Tools and Practices?

Bob Murphy · Mar 5, 2010 · Viewed 19.2k times

I use git for personal projects and think it's great. It's fast, flexible, powerful, and works great for remote development.

But now it's mandated at work and, frankly, we're having problems.

Out of the box, git doesn't seem to work well for centralized development in a large (20+ developer) organization with developers of varying abilities and levels of git sophistication - especially compared with other source-control systems like Perforce or Subversion, which are aimed at that kind of environment. (Yes, I know, Linus never intended it for that.)

But - for political reasons - we're stuck with git, even if it sucks for what we're trying to do with it.

Here are some of the things we're seeing:

  • The GUI tools aren't mature
  • Using the command line tools, it's far too easy to screw up a merge and obliterate someone else's changes
  • It doesn't offer per-user repository permissions beyond global read-only or read-write privileges
  • If you have a permission to ANY part of a repository, you can do that same thing to EVERY part of the repository, so you can't do something like make a small-group tracking branch on the central server that other people can't mess with.
  • Workflows other than "anything goes" or "benevolent dictator" are hard to encourage, let alone enforce
  • It's not clear whether it's better to use a single big repository (which lets everybody mess with everything) or lots of per-component repositories (which make for headaches trying to synchronize versions).
  • With multiple repositories, it's also not clear how to replicate all the sources someone else has by pulling from the central repository, or to do something like get everything as of 4:30 yesterday afternoon.

However, I've heard that people are using git successfully in large development organizations.

If you're in that situation - or if you generally have tools, tips and tricks for making it easier and more productive to use git in a large organization where some folks are not command line fans - I'd love to hear what you have to suggest.

BTW, I've asked a version of this question already on LinkedIn, and got no real answers but lots of "gosh, I'd love to know that too!"

UPDATE: Let me clarify...

Where I work, we can't use ANYTHING other than git. It's not an option. We're stuck with it. We can't use mercurial, svn, bitkeeper, Visual Source Safe, ClearCase, PVCS, SCCS, RCS, bazaar, Darcs, monotone, Perforce, Fossil, AccuRev, CVS, or even Apple's good ol' Projector that I used in 1987. So while you're welcome to discuss other options, you ain't gonna get the bounty if you don't discuss git.

Also, I'm looking for practical tips on how to use git in the enterprise. I put a whole laundry list of problems we're having at the top of this question. Again, people are welcome to discuss theory, but if you want to earn the bounty, give me solutions.

Answer

Johannes Rudolph · Mar 9, 2010

Contrary to common opinion, I think that using a DVCS is an ideal choice in an enterprise setting because it enables very flexible workflows. I will first talk about DVCS vs. CVCS, then about best practices, and then about git in particular.

DVCS vs. CVCS in an enterprise context:

I won't talk about the general pros/cons here, but rather focus on your context. The common conception is that using a DVCS requires a more disciplined team than using a centralized system. This is because a centralized system provides you with an easy way to enforce your workflow, whereas a decentralized system requires more communication and discipline to stick to the established conventions. While this may seem like it induces overhead, I see benefit in the increased communication necessary to make it a good process. Your team will need to communicate about code, about changes and about project status in general.

Another dimension in the context of discipline is encouraging branching and experiments. Martin Fowler found a very concise description of this phenomenon in his recent bliki entry on Version Control Tools:

DVCS encourages quick branching for experimentation. You can do branches in Subversion, but the fact that they are visible to all discourages people from opening up a branch for experimental work. Similarly a DVCS encourages check-pointing of work: committing incomplete changes, that may not even compile or pass tests, to your local repository. Again you could do this on a developer branch in Subversion, but the fact that such branches are in the shared space makes people less likely to do so.

A DVCS enables flexible workflows because it tracks changesets via globally unique identifiers in a directed acyclic graph (DAG) instead of simple textual diffs. This lets it transparently track the origin and history of a changeset, which can be quite important.
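
To make the DAG idea concrete, here is a minimal Python sketch. It is not git's actual object format: it assumes a simplified scheme in which a commit id is the SHA-1 of a change description plus the ids of its parents. The names `commit_id` and `History` are hypothetical and exist only for this illustration.

```python
import hashlib


def commit_id(content, parents):
    """Derive an id from the change's content plus the ids of its parents."""
    payload = "parents " + " ".join(parents) + "\n" + content
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class History:
    """A toy commit graph: each commit id maps to the ids of its parents."""

    def __init__(self):
        self.parents = {}

    def commit(self, content, parents=None):
        """Record a commit and return its content-derived id."""
        parents = list(parents or [])
        cid = commit_id(content, parents)
        self.parents[cid] = parents
        return cid

    def ancestors(self, cid):
        """Return every commit reachable from cid by following parent links."""
        seen = set()
        stack = list(self.parents[cid])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen


# Two lines of work that share a base and are merged later.
repo = History()
base = repo.commit("add login form")
fix = repo.commit("fix validation", [base])
feature = repo.commit("add OAuth", [base])
merge = repo.commit("merge OAuth work", [fix, feature])

# The merge commit knows exactly where each change came from.
print(base in repo.ancestors(merge))
```

Because the id depends on the parents as well as the content, the same change applied on top of a different history gets a different id, which is what lets any clone agree on what "this changeset" means without a central server handing out revision numbers.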

Workflows:

Larry Osterman (a Microsoft dev working on the Windows team) has a great blog post about the workflow they employ at the Windows team. Most notably they have:

  • A clean trunk that contains only high-quality code (master repo)
  • All development happens on feature branches
  • Feature teams have team repos
  • They regularly merge the latest trunk changes into their feature branch (Forward Integrate)
  • Complete features must pass several quality gates, e.g. review, test coverage, QA (repos on their own)
  • If a feature is completed and has acceptable quality it is merged into the trunk (Reverse Integrate)
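
The forward/reverse integrate cycle above could be sketched roughly as follows. This is a simplification, assuming trunk and feature work are branches reachable through a single remote, whereas the Windows team uses separate repos per level. The branch names, the remote name `origin`, and the helper functions are hypothetical; only the git subcommands and flags are real.

```python
import subprocess


def forward_integrate(feature, trunk="master", remote="origin"):
    """Commands that bring the latest trunk changes into a feature branch."""
    return [
        ["git", "fetch", remote],
        ["git", "checkout", feature],
        ["git", "merge", f"{remote}/{trunk}"],
    ]


def reverse_integrate(feature, trunk="master", remote="origin"):
    """Commands that merge a finished feature into the trunk and publish it.

    Run this only after the feature has passed its quality gates.
    """
    return [
        ["git", "fetch", remote],
        ["git", "checkout", trunk],
        # Bring the local trunk up to date without creating a merge commit.
        ["git", "merge", "--ff-only", f"{remote}/{trunk}"],
        # Always record a merge commit so the feature stays visible in history.
        ["git", "merge", "--no-ff", feature],
        ["git", "push", remote, trunk],
    ]


def run(commands, cwd="."):
    """Execute each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


# Show the plan without touching a repository.
for argv in forward_integrate("feature/login"):
    print(" ".join(argv))
```

Keeping the command lists separate from `run` means the plan can be reviewed, logged, or wrapped in a GUI button for colleagues who prefer not to type git commands themselves.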

As you can see, because each of these repositories lives on its own, you can decouple different teams advancing at different paces. The possibility of implementing a flexible quality gate system also distinguishes a DVCS from a CVCS. You can solve your permission issues at this level too. Only a handful of people should be allowed access to the master repo. For each level of the hierarchy, have a separate repo with the corresponding access policies. Indeed, this approach can be very flexible at the team level. You should leave it up to each team to decide whether they want to share their team repo among themselves or whether they want a more hierarchical approach where only the team lead may commit to the team repo.
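
If per-level repositories turn out to be too coarse, one possible refinement is a server-side `update` hook, which git runs once per pushed ref with the ref name and the old and new object ids as arguments; a non-zero exit rejects the update of that ref. The sketch below is an illustration under stated assumptions, not a hardened access-control system: the `ACL` table, the user names, and reading the pusher from the `USER` environment variable are all hypothetical and depend on how your server authenticates users.

```python
import fnmatch
import os
import sys

# Hypothetical policy: ref pattern -> users allowed to update matching refs.
# Patterns are checked in order and the first match decides.
ACL = {
    "refs/heads/master": {"alice", "bob"},
    "refs/heads/team-a/*": {"carol", "dave"},
    "refs/heads/*": {"*"},  # any authenticated user may use other branches
}


def is_push_allowed(user, refname, acl=ACL):
    """Return True if the first pattern matching refname permits this user."""
    for pattern, users in acl.items():
        if fnmatch.fnmatchcase(refname, pattern):
            return "*" in users or user in users
    return False  # refs that match no pattern (e.g. tags) are rejected


def main(argv, environ):
    """Entry point for the hook; returns the process exit code."""
    refname = argv[1]  # git passes: refname, old object id, new object id
    user = environ.get("USER", "")  # assumption: the server sets USER
    if is_push_allowed(user, refname):
        return 0
    sys.stderr.write(f"{user or 'unknown user'} may not update {refname}\n")
    return 1


# In an installed hooks/update script, finish with:
#     sys.exit(main(sys.argv, os.environ))
```

With this table, `carol` can push to `refs/heads/team-a/search` but not to `refs/heads/master`, which gives a small-group branch on the central server that other people cannot modify.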

Hierarchical Repositories

(The picture is stolen from Joel Spolsky's hginit.com.)

One thing remains to be said at this point, though: even though a DVCS provides great merging capabilities, this is never a replacement for using Continuous Integration. Even there you have a great deal of flexibility: CI for the trunk repo, CI for team repos, QA repos, etc.

Git in an enterprise context:

Git is maybe not the ideal solution for an enterprise context as you have already pointed out. Repeating some of your concerns, I think most notably they are:

  • Still somewhat immature support on Windows (please correct me if that changed recently). Windows now has the GitHub Windows client, TortoiseGit, and SourceTree from Atlassian.
  • Lack of mature GUI tools, no first class citizen vdiff/merge tool integration
  • Inconsistent interface with a very low level of abstractions on top of its inner workings
  • A very steep learning curve for svn users
  • Git is very powerful and makes it easy to modify history, which is very dangerous if you don't know what you are doing (and sometimes you won't, even if you thought you did)
  • No commercial support options available

I don't want to start a git vs. hg flamewar here; you have already taken the right step by switching to a DVCS. Mercurial addresses some of the points above, and I think it is therefore better suited to an enterprise context:

  • All platforms that run Python are supported
  • Great GUI tools on all major platforms (win/linux/OS X), first class merge/vdiff tool integration
  • Very consistent interface, easy transition for svn users
  • Can do most of the things git can do too, but provides a cleaner abstraction. Dangerous operations are always explicit. Advanced features are provided via extensions that must explicitly be enabled.
  • Commercial support is available from Selenic.

In short, when using a DVCS in an enterprise, I think it's important to choose a tool that introduces the least friction. For the transition to be successful, it's especially important to consider the varying skill levels among developers (with regard to VCS).


Reducing friction:

OK, since you appear to be really stuck with the situation, there are two options left IMHO. There is no tool to make git less complicated; git is complicated. Either you confront this or you work around git:

  1. Get a git introductory course for the whole team. This should include the basics only and some exercises (important!).
  2. Convert the master repo to svn and let the "young stars" use git-svn. This gives most of the developers an easy-to-use interface and may compensate for the lack of discipline in your team, while the young stars can continue to use git for their own repos.

To be honest, I think you really have a people problem rather than a tool problem. What can be done to improve upon this situation?

  • You should make it clear that you think your current process will end up with a maintainable codebase.
  • Invest some time in Continuous Integration. As I outlined above, regardless of which kind of VCS you use, there's never a replacement for CI. You stated that there are people who push crap into the master repo: have them fix their crap while a red alert goes off and blames them for breaking the build (or not meeting a quality metric or whatever).