Woodstock Blog

a tech blog for general algorithmic interview questions

[Design] Hadoop Cluster

link

A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment.

Such clusters run Hadoop’s open source distributed processing software on low-cost commodity computers.

Typically one machine in the cluster is designated as the NameNode and another machine the as JobTracker; these are the masters. The rest of the machines in the cluster act as both DataNode and TaskTracker; these are the slaves.

Hadoop clusters are known for boosting the speed of data analysis applications. They also are highly scalable.

As of early 2013, Facebook was recognized as having the largest Hadoop cluster in the world.