Minimum system requirements for running a Hadoop Cluster with High Availability

Prabhath · Sep 24, 2015

From what I understand, for high availability in Hadoop we need one NameNode and one Standby NameNode, network shared storage (shared between the two NameNodes), and at least 2 DataNodes for running the Hadoop cluster.

  1. Can we run the DataNode server on the same machine that runs the NameNode?

  2. Can YARN run on the machine that runs the NameNode or DataNode server?

Please suggest if I am missing any other service that is necessary for a production Hadoop environment.

What should the system requirements be for the NameNode, given that it only handles metadata (is it I/O intensive or CPU intensive)? The data we are crunching is mostly I/O intensive.

Answer

pradeep · Sep 24, 2015

For Hadoop HA you need at least two separate machines: one to run the active NameNode and one to run the standby NameNode. So in theory you can build a Hadoop HA cluster with as few as 2 machines, but that is not very useful in practice.
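To make the two-machine minimum concrete, here is a small Python sketch that checks a planned layout. The layout format (a dict of host name to role names), the role names `namenode` and `standby_namenode`, and the function name are hypothetical; they are not Hadoop configuration keys. The sketch only encodes the rule that the active and standby NameNode must sit on separate machines.

```python
from typing import Dict, List, Set

# Hypothetical planning format: host name -> set of role names running there.
Layout = Dict[str, Set[str]]


def check_ha_namenodes(layout: Layout) -> List[str]:
    """Return problems with the NameNode HA placement; an empty list means OK."""
    active = [host for host, roles in layout.items() if "namenode" in roles]
    standby = [host for host, roles in layout.items() if "standby_namenode" in roles]
    problems = []
    if len(active) != 1:
        problems.append(f"expected exactly 1 active NameNode, found {len(active)}")
    if not standby:
        problems.append("expected at least 1 standby NameNode, found 0")
    # HA only helps if losing a single machine cannot take out both NameNodes.
    if set(active) & set(standby):
        problems.append("active and standby NameNode share a host")
    return problems


two_hosts = {
    "host1": {"namenode", "datanode"},
    "host2": {"standby_namenode", "datanode"},
}
print(check_ha_namenodes(two_hosts))  # []
print(check_ha_namenodes({"host1": {"namenode", "standby_namenode"}}))
# ['active and standby NameNode share a host']
```

The two-host layout passes, while putting both NameNodes on one machine is flagged, because a single hardware failure would then take down both.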

To answer your other questions:

  1. Yes, you can run the DataNode service on the machine that runs the NameNode service. This is the usual setup in a PoC, where you have a small cluster (roughly 3-7 nodes). NOTE: As a best practice in production, you should use dedicated machines for master services such as the NameNode.

  2. Yes, you can run YARN services on a machine that runs the DataNode, the NameNode, or both. In fact, on a single-node cluster all services run on one machine. All of these services (NameNode, DataNode, and the YARN ResourceManager and NodeManager) are Java processes, each running in its own JVM, so you can host them on the same node or on different nodes as you wish.
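One rough way to express this guidance in code: since each daemon is its own JVM, co-location is just a matter of which roles you list per host, and the production best practice is to keep master roles off worker hosts. The sketch below uses hypothetical role names, a made-up layout dict, and hypothetical function names; it is a planning aid, not anything Hadoop itself reads.

```python
from typing import Dict, List, Set

# Hypothetical role names; each corresponds to a separate Java daemon (its own JVM).
MASTER_ROLES = {"namenode", "standby_namenode", "resourcemanager"}
WORKER_ROLES = {"datanode", "nodemanager"}


def mixed_hosts(layout: Dict[str, Set[str]]) -> List[str]:
    """Hosts running both master and worker daemons (fine for a PoC, avoid in production)."""
    return sorted(
        host for host, roles in layout.items()
        if roles & MASTER_ROLES and roles & WORKER_ROLES
    )


def jvm_count(layout: Dict[str, Set[str]]) -> Dict[str, int]:
    """One JVM per daemon, so the count per host is simply the number of roles."""
    return {host: len(roles) for host, roles in layout.items()}


# A 3-node PoC layout where every node also does DataNode/NodeManager work.
poc = {
    "node1": {"namenode", "resourcemanager", "datanode", "nodemanager"},
    "node2": {"standby_namenode", "datanode", "nodemanager"},
    "node3": {"datanode", "nodemanager"},
}
print(mixed_hosts(poc))  # ['node1', 'node2']
print(jvm_count(poc))    # {'node1': 4, 'node2': 3, 'node3': 2}
```

In a production layout you would aim for `mixed_hosts` to return an empty list by moving the master roles onto dedicated machines.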

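The NameNode mostly needs RAM, because it keeps the entire file system namespace and block map in memory. How much it needs depends on your cluster's data size and on the number of files and blocks you have, or expect to have, in the cluster. Generally, whether your workloads are CPU or I/O intensive does not affect the NameNode's system requirements.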
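As a back-of-the-envelope check on why file and block counts matter, the sketch below applies a commonly quoted rule of thumb that every file, directory, and block occupies about 150 bytes of NameNode heap. The 150-byte figure, the function name, and the assumption that every file has the same average size are simplifications; the result ignores replication and JVM overhead, so treat it as a lower bound rather than a sizing recommendation.

```python
import math

# Rule of thumb (an assumption, not an exact figure): each file, directory
# and block is one in-memory object of roughly 150 bytes on the NameNode.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files: int, num_dirs: int,
                                 avg_file_bytes: int,
                                 block_size: int = 128 * 1024 * 1024) -> int:
    """Rough metadata heap estimate; block_size defaults to 128 MiB."""
    # Simplification: count at least one block per file, even for tiny files.
    blocks_per_file = max(1, math.ceil(avg_file_bytes / block_size))
    num_blocks = num_files * blocks_per_file
    objects = num_files + num_dirs + num_blocks
    return objects * BYTES_PER_OBJECT


# Same file and directory counts, very different data volumes.
big = estimate_namenode_heap_bytes(10_000_000, 1_000_000, 200 * 1024**2)
small = estimate_namenode_heap_bytes(10_000_000, 1_000_000, 1 * 1024**2)
print(big, small)  # 4650000000 3150000000
```

Storing 200 times less data in the second case reduces the estimate by only about a third, because the number of files, not the data volume, dominates the NameNode's memory use.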

For more details on these services, refer to:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html