Woodstock Blog

a tech blog for general algorithmic interview questions

[Design] Overview of Big Data Technology


Traditional RDBMS

Data is organized in a highly structured manner, following the relational model: each table has a fixed schema of columns, and every row must conform to it.

At extremely large volumes, this requirement for well-structured data becomes a substantial burden.
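To make "highly structured" concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a relational database. The `users` table and its columns are hypothetical. The schema is declared before any data arrives, rows that violate it are rejected, and storing a new attribute means altering the schema first.

```python
import sqlite3

# In-memory database; the "users" table is a hypothetical example schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE
    )
    """
)

# A row that matches the declared schema is accepted.
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Ada", "ada@example.com"))

# A row that breaks the schema (no email) is rejected by the database itself.
try:
    conn.execute("INSERT INTO users (name) VALUES (?)", ("Bob",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Storing a new kind of attribute requires changing the schema first.
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(rejected, row_count)  # True 1
```

SQLite is used only because it ships with Python. It is loosely typed, so the sketch relies on `NOT NULL` rather than column types to show the database enforcing structure.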

NoSQL

NoSQL is a completely different class of databases that allows high-performance, agile processing of information at massive scale.

NoSQL is built around the concept of distributed databases, where data is spread across many machines.

It’s horizontally scalable: as data continues to explode, just add more hardware to keep up.
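How can a cluster "just add more hardware" without reshuffling all existing data? One common technique, which the post does not name, is consistent hashing. Dynamo-style stores such as Apache Cassandra, for example, partition data on a token ring. The sketch below is illustrative only; it assumes string keys, md5 as the hash function, and hypothetical node names, and it is not any particular database's implementation.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # md5 gives the same result in every process, unlike Python's built-in hash().
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring (illustrative sketch, not a real library)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes   # virtual points per node, to even out the load
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = stable_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one machine
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

In expectation about a quarter of the keys move, and every moved key lands on the new `node-d`. By contrast, naive `hash(key) % num_nodes` placement would reassign roughly three quarters of the keys when going from 3 to 4 nodes.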

Hadoop

Hadoop is not a type of database, but rather a software ecosystem that allows for massively parallel computing.

Hadoop is an open-source implementation of the MapReduce programming model. Unlike Google's original MapReduce, which runs on the Google File System (GFS), Hadoop relies on its own Hadoop Distributed File System (HDFS).

MapReduce

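MapReduce is one component of the Hadoop ecosystem.

It’s a computational model that takes a data-intensive job and spreads the computation across a potentially very large number of servers: a map phase processes chunks of the input in parallel, and a reduce phase combines the intermediate results by key.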
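The classic illustration of the model is word count. Below is a minimal single-machine sketch in Python in which a thread pool stands in for the servers of a cluster. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and are not Hadoop's API (Hadoop's native API is in Java). In a real cluster the framework would also handle splitting the input, moving intermediate data between machines, and retrying failed tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(split: str):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(splits, workers=4):
    # Threads stand in for cluster machines; each split is mapped independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
        groups = shuffle(chain.from_iterable(mapped))
        results = pool.map(lambda kv: reduce_phase(*kv), groups.items())
        return dict(results)


splits = ["big data is big", "data is everywhere"]
print(word_count(splits))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each mapper only sees its own split and each reducer only sees one key's values, both phases can run on as many machines as there are splits or keys. That independence is what lets the model scale out.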