Woodstock Blog

a tech blog for general algorithmic interview questions

[Design] Big Data Storage

ref

Question

Given 1 trillion messages on Facebook, where each message has at most 10 words.

How do you build the index table and how many machines do you need on the cluster to store the index table?

One possible answer

Total data = 1 trillion * 10 words * 6 bytes/word = 60 TB, which fits on one small NetApp box.

Index by hashed user ID, which distributes traffic evenly across servers, and cache active users' recent messages in memory.
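As a rough illustration of indexing by hashed user ID, here is a minimal sketch. The function name server_for_user, the cluster size of 1000, and the choice of SHA-256 are assumptions made for the example, not details from the original answer. A stable hash is used because Python's built-in hash() for strings changes between processes.

```python
import hashlib

# Hypothetical cluster size, chosen only for illustration.
NUM_SERVERS = 1000


def server_for_user(user_id: str, num_servers: int = NUM_SERVERS) -> int:
    """Map a user ID to a server index using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take it
    # modulo the number of servers to pick a shard.
    return int.from_bytes(digest[:8], "big") % num_servers


if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, "->", server_for_user(uid))
```

Plain modulo hashing like this remaps most users whenever the number of servers changes; consistent hashing is the usual way to reduce that reshuffling.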

However, a NetApp box is not an option. From what I read on the Facebook engineering blog, they keep all of this data in the servers' main memory.

Total data = 1 trillion * 10 words * 6 bytes/word = 60 TB, plus 1 TB for indexes = 61 TB.

Assume each server has 64 GB of RAM, of which 61 GB is usable for storage. Then 61 TB / 61 GB = 1000 servers.
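The back-of-the-envelope estimate can be checked with a short calculation. This is only a sketch: the constant and function names are made up for the example, and it assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), which is what makes the numbers come out round.

```python
# Inputs taken from the estimate in the post.
MESSAGES = 10**12              # 1 trillion messages
WORDS_PER_MESSAGE = 10         # at most 10 words each
BYTES_PER_WORD = 6             # assumed average word size
INDEX_BYTES = 1 * 10**12       # 1 TB set aside for indexes
USABLE_RAM_BYTES = 61 * 10**9  # 61 GB usable out of 64 GB per server


def total_bytes() -> int:
    """Message data plus index overhead, in bytes."""
    return MESSAGES * WORDS_PER_MESSAGE * BYTES_PER_WORD + INDEX_BYTES


def servers_needed() -> int:
    """Number of servers, rounded up, to hold everything in RAM."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_bytes() // USABLE_RAM_BYTES)


if __name__ == "__main__":
    print("total TB:", total_bytes() / 10**12)
    print("servers:", servers_needed())
```

Changing BYTES_PER_WORD or USABLE_RAM_BYTES shows how sensitive the server count is to these assumptions.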

For more information

Read 2 other posts: [Design] Distributed hash table and [Design] Cloud, Grid and Cluster.