Hadoop Vs Data Lake

Kishore · Mar 14, 2016 · Viewed 15.3k times

I recently came across a new term, "data lake". I googled it and found this:

A data lake is a large-scale storage repository and processing engine. A data lake provides "massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs"

The term data lake is often associated with Hadoop-oriented object storage. In such a scenario, an organization's data is first loaded into the Hadoop platform, and then business analytics and data mining tools are applied to the data where it resides on Hadoop's cluster nodes of commodity computers.

Hadoop does the same thing: we have HDFS for storage and MapReduce for computation. I am a little confused about Hadoop and data lakes. What is the difference between the two? If they are the same, why did this new term arise? If not, how should a data lake be defined?

Answer

facha · Mar 14, 2016

A data lake is an abstract idea, whereas Hadoop is a specific technology (a piece of software). You can implement a data lake using Hadoop or using a different tool.
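To make the idea-versus-implementation distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `DataLake` interface, the two backends, and the `count_lines` job are invented names, not part of Hadoop or any real library. The `FileSystemLake` class is just a local-disk stand-in for where a cluster store such as HDFS could sit.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class DataLake(ABC):
    """The abstract idea: keep raw data of any kind under a key, read it back later."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class InMemoryLake(DataLake):
    """One possible implementation: a plain dictionary (toy example)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class FileSystemLake(DataLake):
    """Another implementation: files under a root directory.

    Assumption: this local-disk version only stands in for a distributed
    store such as HDFS; it does not talk to Hadoop.
    """

    def __init__(self, root: str) -> None:
        self._root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def count_lines(lake: DataLake, key: str) -> int:
    """A tiny 'analytics job' that depends only on the abstract interface."""
    return len(lake.get(key).splitlines())
```

The `count_lines` job runs unchanged against either backend, which is the point of the answer: the data lake is the interface-level concept, and Hadoop is one of several technologies that could sit behind it.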