How safe is it to host sensitive data on repository sites like GitHub, Bitbucket, etc.?

alien · Oct 28, 2011

This is just a question out of curiosity. How safe is it generally considered to be to host sensitive data on repository websites like GitHub, Bitbucket, etc.? Is it safe enough to get rid of all code on local machines and just store everything there? And what about safety in the sense of keeping company secrets? I notice these sites tout that big companies like Google and Yahoo use their services, but do these big companies actually store their trade secrets and important company code on websites like this?

GitHub has a page (http://help.github.com/security) with some interesting information, and it shows that they market the service as the kind of foolproof solution I described. But in practice, do big companies like Google really find that their proprietary secrets and massive amounts of code are safe from prying eyes and disastrous occurrences on sites like these?

Answer

Christian Specht · Oct 28, 2011

As always, it depends :-)

There can be two different meanings of "safety":

  1. Can I trust the hoster to keep my stuff (intellectual property, company secrets...) private?
  2. What happens to my code if the hoster suddenly goes out of service?

For 1., there is no 100% guarantee.
Of course, big hosters like GitHub and Bitbucket won't intentionally share your code with third parties, but there is always the possibility that a hacker manages to get at the content of your private repositories.
(This could also happen if you host your code internally in your company. It's less likely, though: unless your company is as well known as, say, Google, the chance of someone attacking your company is much smaller than the chance of someone attacking a well-known public hoster.)

Plus, you have to consider the laws of the country where the hoster resides.
A few weeks ago I read somewhere that if your hoster is in the USA, it can be forced by law to hand your data over to the US government under certain circumstances, without even being allowed to tell you about it (I don't remember the name of the law, but maybe someone else knows).

I guess all this is why most "big" companies don't host their code on a public service (my company is mid-sized, and we host our code privately as well).

By the way, as you mentioned Google:
I'm sure that Google in particular does not use Bitbucket or GitHub. They have their own complete infrastructure for project hosting, so I guess they use it internally, too. Why would they use an external service? It's in the cloud, yes... but it's their cloud.

Concerning 2.: it's unlikely that GitHub or Bitbucket will go bankrupt tomorrow, but you never know.
IMO it's your responsibility to back up your code yourself.
The nature of a DVCS means you will have some local copies of your code anyway, but it might be difficult to search lots of developer machines for the newest versions of all your projects.
I do this by regularly pulling all my repositories to my local machine (I wrote a tool that can do this for Bitbucket, which I use for my private projects).
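For anyone who wants to do something similar without a dedicated tool, here is a minimal sketch of the "pull everything to a local machine regularly" idea. It is not the tool mentioned above. It assumes `git` is on the PATH, that credentials for the hoster are already configured (SSH keys or a credential helper), and that you maintain the list of repository URLs yourself. The function names and example URLs are hypothetical. It uses `git clone --mirror` for the first run, which copies all branches and tags into a bare repository, and `git remote update --prune` on later runs to fetch new commits and drop refs deleted upstream.

```python
import subprocess
from pathlib import Path


def backup_commands(repo_urls, backup_root):
    """Build (command, working_dir) pairs: mirror-clone new repos, update existing mirrors.

    Hypothetical helper; it only builds the commands and does not run anything.
    """
    root = Path(backup_root)
    commands = []
    for url in repo_urls:
        # Use the last URL segment as the local directory name, e.g. "project.git".
        name = url.rstrip("/").rsplit("/", 1)[-1]
        if not name.endswith(".git"):
            name += ".git"
        target = root / name
        if target.exists():
            # Mirror already exists: fetch all refs and prune those deleted upstream.
            commands.append((["git", "remote", "update", "--prune"], target))
        else:
            # First run: a bare mirror clone copies every branch and tag.
            commands.append((["git", "clone", "--mirror", url, str(target)], root))
    return commands


def run_backup(repo_urls, backup_root):
    """Create the backup folder if needed, then run each git command (stops on first failure)."""
    Path(backup_root).mkdir(parents=True, exist_ok=True)
    for cmd, cwd in backup_commands(repo_urls, backup_root):
        subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Hypothetical repository URLs; replace them with your own.
    run_backup(
        [
            "https://bitbucket.org/example-team/project-a",
            "git@github.com:example-team/project-b.git",
        ],
        Path.home() / "repo-backups",
    )
```

Keeping command construction separate from execution makes it easy to check what would happen before touching anything. Run from a scheduler (cron, Windows Task Scheduler), this gives you a complete, up-to-date copy of each repository even if the hoster disappears.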