MySQL Cluster vs. Hadoop for handling big data

hadoop mapreduce hive bigdata mysql-cluster

Tobi Weißhaar · Jan 29, 2014 · Viewed 13.8k times · Source

I want to know the advantages and disadvantages of using a MySQL Cluster versus the Hadoop framework. Which is the better solution? I would like to read your opinions.

I think the advantages of using a MySQL Cluster are:

high availability
good scalability
high performance / real-time data access
you can use commodity hardware

And I don't see a disadvantage! Does it have any disadvantages that Hadoop does not have?

The advantages of Hadoop with Hive on top of it are:

also good scalability
you can also use commodity hardware
the ability to run in heterogeneous environments
parallel computing with the MapReduce framework
Hive with HiveQL

and the disadvantage is:

no real-time data access. It may take minutes or hours to analyze the data.

So in my opinion a MySQL Cluster is the better solution for handling big data. Why is Hadoop considered the holy grail of handling big data? What is your opinion?

Answer

Both of the above answers miss a huge distinction between MySQL and Hadoop. MySQL requires you to store data in a fixed format. It likes heavily structured data: you declare the data type of each column in a table, and so on. Hadoop doesn't care about this at all.

Example: if you have a billion text log files, then to make analysis even possible in MySQL you'd need to parse the data and load it into a MySQL table first, typing each column along the way. With Hadoop and MapReduce, you define the function that scans, analyzes, and returns the data from its raw source; you don't need a pre-processing ETL step to get it pre-structured.
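To make that concrete, here is a rough sketch in plain Python of what "define the function over the raw source" can look like. It is not Hadoop code: `run` is just a local stand-in for the framework, and the log format, `LINE_RE`, `mapper`, and `reducer` are all names and assumptions made up for this illustration. The point is only that the parsing lives inside the map function, so no table has to be loaded or typed beforehand.

```python
import re
from collections import defaultdict

# Hypothetical log format, assumed only for this sketch:
# "2014-01-29 12:00:01 ERROR disk full"
LINE_RE = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) ")

def mapper(line):
    """Emit (level, 1) for each raw line that matches; skip anything else."""
    match = LINE_RE.match(line)
    if match:
        yield match.group("level"), 1

def reducer(pairs):
    """Sum the counts per key, as a reduce phase would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def run(lines):
    """Local stand-in for the framework: map every line, then reduce."""
    pairs = (pair for line in lines for pair in mapper(line))
    return reducer(pairs)

raw_lines = [
    "2014-01-29 12:00:01 ERROR disk full",
    "2014-01-29 12:00:02 INFO job started",
    "not a structured line at all",
    "2014-01-29 12:00:03 ERROR timeout",
]
counts = run(raw_lines)
print(counts)
```

Note that the malformed line is simply skipped by the mapper. With a MySQL table you would have to decide how to handle it at load time, before any analysis could start.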

If the data is already structured and in MySQL, then (hopefully) it's well structured, so why export it for Hadoop to analyze? If it isn't, why spend the time to ETL the data?
