Why are relational databases having scalability issues?

lahiru madhumal picture lahiru madhumal · Aug 31, 2012 · Viewed 12.2k times · Source

Recenctly I read some articles online that indicates relational databases have scaling issues and not good to use when it comes to big data. Specially in cloud computing where the data is big. But I could not find good solid reasons to why it isn't scalable much, by googling. Can you please explain me the limitations of relational databases when it comes to scalability?

Thanks.

Answer

Erwin Smout picture Erwin Smout · Sep 3, 2012

Imagine two different kinds of crossroads.

One has traffic lights or police officers regulating traffic, motion on the crossroad is at limited speed, and there's a watchdog registering precisely what car drove on the crossroad at what time precisely, and what direction it went.

The other has none of that and everyone who arrives at the crossroad at whatever speed he's driving, just dives in and wants to get through as quick as possible.

The former is any traditional database engine. The crossroad is the data itself. The cars are the transactions that want to access the data. The traffic lights or police officer is the DBMS. The watchdog keeps the logs and journals.

The latter is a NOACID type of engine.

Both have a saturation point, at which point arriving cars are forced to start queueing up at the entry points. Both have a maximal throughput. That threshold lies at a lower value for the former type of crossroad, and the reason should be obvious.

The advantage of the former type of crossroad should however also be obvious. Way less opportunity for accidents to happen. On the second type of crossroad, you can expect accidents not to happen only if traffic density is at a much much lower point than the theoretical maximal throughput of the crossroad. And in translation to data management engines, it translates to a guarantee of consistent and coherent results, which only the former type of crossroad (the classical database engine, whether relational or networked or hierarchical) can deliver.

The analogy can be stretched further. Imagine what happens if an accident DOES happen. On the second type of crossroad, the primary concern will probably be to clear the road as quick as possible, so traffic can resume, and when that is done, what info is still available to investigate who caused the accident and how ? Nothing at all. It won't be known. The crossroad is open just waiting for the next accident to happen. On the regulated crossroad, there's the police officer regulating the traffic who saw what happened and can testify. There's the logs saying which car entered at what time precisely, at which entry point precisely, at what speed precisely, a lot of material is available for inspection to determine the root cause of the accident. But of course none of that comes for free.

Colourful enough as an explanation ?