Woodstock Blog

a tech blog for general algorithmic interview questions

[NineChap System Design] Class 2.2: Database

Question 5

Continue from last post, now let’s support VIP services.

A user can buy VIP services using his account balance.

class ProService {
    int userId;
    double money;
    long endTime;

    public void addMoney() {}
    public void buyPro() {}
}

Q5.1: System crash when purchasing VIP

Solution: transaction with log

WriteLOG
Transaction_1123: BEGIN
money=20; endTime=16_07_2016

If the system crashes here, it will read the log during recovery and roll back to the original data. Do not try to complete the transaction - just fail it.

WriteLOG
Transaction_1123: BEGIN
money=20; endTime=16_07_2016
WriteLOG
Transaction_1123: END
money=10; endTime=16_08_2016
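
To make the idea concrete, here is a minimal sketch of buyPro() as a single database transaction. The JDBC Connection, the pro_service table and the 30-day extension are assumptions for illustration, not the course's actual code; the point is that commit()/rollback() give the BEGIN/END log behaviour shown above.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class ProServiceDao {
    // Buys 30 days of Pro atomically: either both money and endTime change, or neither does.
    void buyPro(Connection conn, int userId, double price) throws SQLException {
        try {
            conn.setAutoCommit(false);   // BEGIN transaction: the DB starts writing its log
            try (PreparedStatement st = conn.prepareStatement(
                    "UPDATE pro_service SET money = money - ?, "
                    + "end_time = end_time + 2592000000 WHERE user_id = ? AND money >= ?")) {
                st.setDouble(1, price);
                st.setInt(2, userId);
                st.setDouble(3, price);
                if (st.executeUpdate() != 1) {   // insufficient balance or no such user: just fail
                    conn.rollback();
                    return;
                }
            }
            conn.commit();               // END transaction: the new money/endTime become durable
        } catch (SQLException e) {
            conn.rollback();             // crash or error mid-way: recovery rolls back to the original data
            throw e;
        }
    }
}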

What happens if the system crashes while writing the log, or during the rollback?

Q5.2: dataset contains bad data

  1. One user id has 2 corresponding pro-service records.
  2. Shallow user: a pro-service record has no corresponding user.

Solution: have a checker class.
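
A minimal sketch of such a checker, assuming the ProService class above and an in-memory view of all user ids (the names here are illustrative):

import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ProServiceChecker {
    List<ProService> proTable;   // all pro-service rows
    Set<Integer> userIds;        // all user ids from the user table

    void check() {
        Set<Integer> seen = new HashSet<>();
        for (ProService p : proTable) {
            if (!seen.add(p.userId)) {
                System.out.println("Duplicate pro-service rows for user " + p.userId);
            }
            if (!userIds.contains(p.userId)) {
                System.out.println("Shallow user: pro-service for user " + p.userId + " has no matching user");
            }
        }
    }
}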

Q5.3: simultaneous purchase by 2 users

Solution: lock.

  1. The first process locks money & endTime.
  2. It reads money = 20.
  3. Another process tries to acquire the lock, but ends up waiting (sleeping).
  4. The first process does the job and releases the lock.
  5. The other process wakes up.

Locks have time-out settings. They can be applied in distributed systems as well.
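
A minimal sketch of this locking flow in Java, using a ReentrantLock with a time-out; the class and field names are illustrative, not the course's code:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockedProService {
    private final ReentrantLock lock = new ReentrantLock();
    double money = 20;
    long endTime;

    boolean buyPro(double price, long extension) throws InterruptedException {
        // A second process calling buyPro() at the same time waits (sleeps) here, up to 1 second.
        if (!lock.tryLock(1, TimeUnit.SECONDS)) {
            return false;                // timed out waiting for the lock
        }
        try {
            if (money < price) return false;
            money -= price;              // read and write happen while holding the lock
            endTime += extension;
            return true;
        } finally {
            lock.unlock();               // release: the waiting process wakes up
        }
    }
}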

Question: does lock make your execution slow?

  1. If one process is sleeping on the lock, the CPU is free for other processes, so overall throughput is not hurt.

  2. You can do some async processing, too.

  3. When you lock, try to lock only a small piece of code, not the entire method. In DB, lock only a row, not a table.

  4. Java CAS (compare-and-swap) can be used instead of a lock; see the sketch below.
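
For point 4, a minimal CAS sketch with java.util.concurrent.atomic; keeping the balance in cents inside an AtomicLong is an assumption made just for this sketch:

import java.util.concurrent.atomic.AtomicLong;

class CasBalance {
    private final AtomicLong cents = new AtomicLong(2000);   // $20.00

    boolean withdraw(long amountCents) {
        while (true) {
            long current = cents.get();
            if (current < amountCents) return false;          // insufficient balance
            // Succeeds only if nobody changed the value in between; otherwise loop and retry.
            if (cents.compareAndSet(current, current - amountCents)) return true;
        }
    }
}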

Q5.4: Server crash

Solution: duplication.

  1. How many copies?

    Google used 3 copies: normally 2 in the same data center, and 1 in another location.

    Backup data is normally on another machine. But there’s also RAID (redundant array of independent disks), which:

    combines multiple physical disk drive components into a single logical unit for the purposes of data redundancy, performance improvement, or both.

  2. When does the copy happen?

    Option 1 is a nightly copy every day. This is called a ‘checkpoint’.

    Option 2 is to use another server for shadow redundancy. All data from Machine 0 is copied to Machine 1 WHILE it happens. The result is that Machine 1 is identical to Machine 0. If Machine 0 crashes, Machine 1 may be behind by less than 1 second.

    The way to duplicate is either to re-play all the actions, or to read Machine 0’s log and apply the new data.

  3. How to copy?

    The user sends data to 1 server, and from that server it is pipelined to the others. This makes good use of server bandwidth, and the serial transfer of data keeps latency low (several ms).

    It’s also possible to do a tree-like transfer, but the above method is preferred because all machines consume the same bandwidth.

  4. What is log?

    It is actually ‘checkpoint’ + log. It allows you to roll back.

Data redundancy - Summary:

Final note: Data inconsistency

The main sources of inconsistency are:

  1. network fault
  2. disk error

Disk errors are handled by checksums (compared during disk writes).

Summary

ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.

  1. Atomicity: all or nothing

    Q5.1: System crash when purchasing VIP

  2. Consistency: validate according to defined rules

    Q5.2: dataset contains bad data

  3. Isolation: independence between transactions (lock)

    Q5.3: simultaneous purchase by 2 users

  4. Durability: stored permanently

    Q5.4: Server crash

Additional Questions:

  1. Design a user system (Netflix 2015)

Hint: table design, register, login/out, password check, find password.

  2. Design a payment system (Yelp, BigCommerce 2015)

Hint: the question does not ask about the table/data-structure design itself, but rather about the problems associated with payment. Read about the ACID principles.

[NineChap System Design] Class 2.1: Database

Overview

This class covers database design:

  1. design data with class and inheritance
  2. design a user system (Netflix 2015)
  3. design a payment system (Yelp, BigCommerce 2015)

Question 1

design an account (login/out) system for our radio app.

Step one, scenario

  1. register, update, remove account
  2. login/out
  3. user balance, VIP services

Step Two, Necessary

  1. Ask

    1. total users: 100 million
    2. daily users: 1 million
  2. predict

    1. daily active users in 3 months: 10 million
    2. register percentage: 1%
    3. daily new registrations: 100 thousand
  3. more predictions

    1. login percentage: 15%
    2. average login frequency: 1.2 (people may mistype the password ~20% of the time)
    3. daily login attempts: 10 million * 15% * 1.2 = 1.8 million
    4. average login rate: 1.8 million / 24 hr ≈ 21 logins/sec
    5. normal login rate: 21 * 2 = 42 logins/sec
    6. peak login rate: 42 * 3 = 126 logins/sec

Step Three, Application

Step Four, Kilobit

Data - User table should contain name and password. What else?

class User {
    int userId;   // primary key
    String name;
    String password;
}

Data - User Table

class UserTable {
    List<User> table;

    public void insert(User user) {}
    public void delete(User user) {}
    public void update(User user) {}
    public User select(int userId) { return null; }
}

CRUD (sometimes called SCRUD, with an “S” for Search) stands for the four basic functions of persistent storage: Create, Read, Update, Delete.

Question 2

verification and forbidden accounts

We have to know the concept of Account State Lifecycle Graph:

  1. ban: fake users, advertising users… banned by the management

  2. inactive: the user chooses to suspend his own account, voluntarily.

  3. delete account: normally we won’t remove all related data (we just mark the userId as “deleted”). Otherwise a lot of related data would be broken. All your chatting history actually remains.

redesign User Table

Old User table:

class User {
    int userId;   // primary key
    String name;
    String password;
}

Modified User table:

class User {
    int userId;
    char name[10];
    char hiddenPassword[10];
    int state;
}

  1. We added state, to support the account life cycle (a minimal sketch of the states follows below).

  2. We changed the username to a fixed size, for better performance when searching and storing. This can also prevent certain attacks.

  3. We save the encrypted password.
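
A minimal sketch of what the state values might look like, following the ban/inactive/delete lifecycle above (the exact constants are an assumption):

class UserState {
    static final int ACTIVE   = 0;   // normal account
    static final int INACTIVE = 1;   // suspended by the user himself, voluntarily
    static final int BANNED   = 2;   // fake/advertising users banned by the management
    static final int DELETED  = 3;   // marked as deleted; related data is retained
}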

Question 3

design login/out process

  1. A user account is automatically logged out after a certain period of time.
  2. The same account can be logged in from multiple devices at the same time.

Session

Session is a conversation between user and server.

  1. A user can have more than 1 session if he logs in from different devices.
  2. Sessions must be verified, thus we have to keep a sessionId.

Session status: “iPad online”, “PC online”…

Modify User table:

class User {
    int userId;
    char name[10];
    char hiddenPassword[10];
    int state;
    List<Session> sessionList;
}

Important in Session table:

  1. device ID
  2. time-out period

    class Session {
        private int sessionId;
        int userId;
        int deviceCode;
        long timeOut;
    }

User table would include a session list.

further improvement: session

  1. we update sessionList very frequently.
  2. size of sessionList is dynamic.

Solution: separate the table.

Question: When should we clean up the session data (considering the huge amount of data and frequent calculation)?

Answer: at the end of every day. Or store sessions in-memory, so all data is lost when the machine restarts (this is used on gaming platforms). Or we can clean up one user’s session list whenever that user performs a new log-in.

We do not remove each session the moment it expires - that would be too much calculation.

further improvement: inheritance

Apply inheritance to UserTable and SessionTable:

class Table {
    List<Row> table;

    public void insert(Row row) {}
    public void delete(Row row) {}
    public void update(Row row) {}
    public Row select(int key) { return null; }
}

class UserTable extends Table {
}

class SessionTable extends Table {
}

As for the Row class:

class Row {
    List<Attributes> row;
}

class User extends Row {
}

class Session extends Row {
}

Question 4

design search algorithm

  1. find my userId
  2. find my userId range

Solution 1: add a HashMap to the table. It can do a point search in O(1), but can’t find a range.

Solution 2: a BST data structure. It supports both point search and range search in O(log2 n) time.

Best solution: B+ tree

B+ tree - everything in O(logB n), where B is the branching factor, which is close to constant time.

Plus, the B+ tree is hard-disk friendly. Read more in a future post.
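
A small illustration of the trade-off using the JDK’s own structures: HashMap gives O(1) point lookups, TreeMap (a balanced BST) gives O(log n) point lookups plus range queries. A real database index would use a B+ tree instead:

import java.util.HashMap;
import java.util.TreeMap;

class UserIndexDemo {
    public static void main(String[] args) {
        HashMap<Integer, String> hashIndex = new HashMap<>();
        TreeMap<Integer, String> treeIndex = new TreeMap<>();
        for (int id = 1; id <= 100; id++) {
            hashIndex.put(id, "user" + id);
            treeIndex.put(id, "user" + id);
        }

        System.out.println(hashIndex.get(42));          // point lookup, O(1)
        System.out.println(treeIndex.get(42));          // point lookup, O(log n)
        System.out.println(treeIndex.subMap(40, 45));   // range query: ids 40..44
        // hashIndex has no equivalent of subMap - hashing destroys the ordering.
    }
}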

[NineChap System Design] Class 1.3: Improvement

From Level 0 to Level 1

Refer to the previous question. How can we improve it?

  1. performance
  2. scalability
  3. robustness

performance

A better algo: Inverted Index

Average performance improves to ~20 ms per request (with some optimization of the MapReduce procedure, discussed later).

Max QPS increases to 50.
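
A sketch of the inverted-index idea: index music -> users so that, for a query user, we only touch users who share at least one song instead of scanning everybody. The data shapes here are assumptions for illustration:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InvertedIndexRecommender {
    Map<Integer, List<Integer>> userToMusic = new HashMap<>();    // forward index
    Map<Integer, List<Integer>> musicToUsers = new HashMap<>();   // inverted index

    void index(int userId, List<Integer> musicIds) {
        userToMusic.put(userId, musicIds);
        for (int m : musicIds) {
            musicToUsers.computeIfAbsent(m, k -> new ArrayList<>()).add(userId);
        }
    }

    // Similarity(u1, u2) = number of shared songs, counted via the inverted index.
    Map<Integer, Integer> similarities(int userId) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int m : userToMusic.getOrDefault(userId, List.of())) {
            for (int other : musicToUsers.getOrDefault(m, List.of())) {
                if (other != userId) counts.merge(other, 1, Integer::sum);
            }
        }
        return counts;   // take the max entry to get the top-1 similar user
    }
}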

scalability

Use a dispatcher to re-direct the requests to multiple machines.

How many machines do we need then?

Well, we need 500 QPS. The algorithm above achieves ~50 QPS. Do we need 10 machines?

The answer is NO. A machine with an 8 (or 16) core CPU should be able to handle it.

We can also have a hot-standby, to be safe.

hot standby is used as a failover mechanism to provide reliability in system configurations.

When a key component fails, the hot spare is switched into operation.

robustness

Tips about system design for senior engineers:

Draw 1 machine first. This machine can contain multiple datasets and run multiple processes.

On top of this machine, the interface layer is one single Manager process. The Manager is in charge of almost everything: handling data loss, handling high concurrency, copying multiple instances of itself…

Like this:

Back-end

Now we need a cluster of Datasets (with a Manager on top of it), and a cluster of Recommenders. The Manager is in charge of copying multiple instances.

Datasets can be put in different physical locations. Recommenders don’t really need that, because they only do calculation work.

Receiving requests

Just now we used a Receptionist (or Dispatcher) to handle requests. Now we use a Web Service (e.g. Apache). It’s not necessary to make it a cluster.

Big Brother

We need a monitor system to oversee everything.

Also, Big Brother is in charge of heart-beats. If a heart-beat is not received, Big Brother has some double-check mechanism.

Connecting the dots

The Dispatcher is used to connect the 4 components. It’s like a message queue that collects and distributes jobs among everybody (e.g. control and distribution info).

It can be stateful or stateless.

Keep in mind that the direct connection between Dataset and Recommender remains. It would be slow to go through the Dispatcher.

Distribute it

During development, the 5 components can be put on the same machine. When we deploy in a distributed way, we use socket connections (keep-alive) to connect them.

Notice the Web Service is connection-heavy, which consumes a lot of CPU and RAM. It’s better to separate it onto its own machine.

Big Brother is read/write heavy, so it’s okay to put it on the same machine as the Dispatcher.

Since the Dataset and the Recommender exchange data, it’s a good idea to put them on the same machine.

Additional questions

Implement the Dispatcher with the producer-consumer model.
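
A minimal sketch of that model: the Dispatcher is a bounded blocking queue; producers (e.g. the Web Service) put jobs in, and consumer threads (e.g. Recommender workers) take them out. Representing a job as a plain String is an assumption made for brevity:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class Dispatcher {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

    void submit(String job) throws InterruptedException {
        queue.put(job);                      // producer blocks if the queue is full
    }

    void startWorker(String name) {
        new Thread(() -> {
            try {
                while (true) {
                    String job = queue.take();   // consumer blocks until a job arrives
                    System.out.println(name + " handling: " + job);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name).start();
    }

    public static void main(String[] args) throws InterruptedException {
        Dispatcher d = new Dispatcher();
        d.startWorker("worker-1");
        d.startWorker("worker-2");
        d.submit("recommend for user 42");
        d.submit("recommend for user 7");
    }
}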

[NineChap System Design] Class 1.2: An Example

A bottom-up example

Example two: design a recommendation module

A simple algo:

u1={m3,m5,m7,m11}
u2={m1,m2,m3,m4,m5,m6,m7,m8,m9}
Similarity( u1, u2 ) = 3

m - music

u - user

Similarity = # of songs the two users have in common

Adv algo:

find his top-1 similar user. Stay tuned for future posts.

Use the 5 Steps (SNAKE)

  1. Step One, Scenario
  2. Step Two, Necessary
  3. Step Three, Application
  4. Step Four, Kilobit: data
  5. Last Step, Evolve

Because this question is relatively easy, we will not do case-analysis (Macro).

Instead, we do micro design by starting at the interface.

Step One, Scenario

Interface

class Recommender {
    public int findSimilarUser(int userId) {
        //
    }
}

Step Two, Necessary

  1. ask

    1. total users = 100,000,000
    2. total music = 10,000,000
    3. peak users in 3 month = 6,000,000

    However, not everyone is logged in. Thus we won’t need to recommend for everybody. On average, the logged-in ratio is 1% - 30%. Let’s assume 5%.

    1. participation percentage = 5%

    And a user’s interests won’t change every minute. Let’s recalculate only every 10 minutes.

    1. calculation frequency = 1 update/10min/user
  2. predict

    1. user analysis (skip)
    2. Traffic analysis (skip)
    3. Memory analysis (skip)
    4. QPS

    Peak QPS = 6,000,000 * 5% / (10 * 60) = 500/s

Step Three, Application

The simplest algorithm: brute-force (BF) compare. The complexity is O(m n) for each user, where m is the # of songs a person likes, and n is the total # of users. For k users, it takes O(k m n) time (k can be the number of peak concurrent users).

This is roughly 0.2s per user. Thus Max QPS = 5.
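
A minimal sketch of this brute-force comparison, assuming all user -> music sets fit in memory (which is itself a big assumption at this scale):

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BruteForceRecommender {
    Map<Integer, Set<Integer>> userToMusic;   // assumed to be loaded in memory

    int similarity(Set<Integer> a, Set<Integer> b) {
        Set<Integer> shared = new HashSet<>(a);
        shared.retainAll(b);                  // set intersection
        return shared.size();
    }

    // O(m * n): compares the query user against every other user.
    public int findSimilarUser(int userId) {
        Set<Integer> mine = userToMusic.get(userId);
        int bestUser = -1, bestScore = -1;
        for (Map.Entry<Integer, Set<Integer>> e : userToMusic.entrySet()) {
            if (e.getKey() == userId) continue;
            int score = similarity(mine, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                bestUser = e.getKey();
            }
        }
        return bestUser;
    }
}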

One word about complexity-to-seconds estimation.

O(n ^ 3) -> 1s

O(n ^ 2) -> 0.2s

O(n) -> 20ms

O(k) -> k ms

Step Four, Kilobit: data

Very simple:

Last Step, Evolve

Read on.

[NineChap System Design] Class 1.1: Overview

Overview

Definition

the process of defining the architecture, components, modules, interfaces and data to satisfy specified requirements.

  1. conceptual design (macro)
  2. logical design
  3. physical design (micro)

Top-down design

Eg. MS Office, Huawei Security System

Bottom-up design

Most start-ups use this: build an MVP first, e.g. using Meteor + MongoDB.

5 steps (SNAKE Principle):

  1. Scenario: case/interface - input
  2. Necessary: constrain/hypothesis - input
  3. Application: service/algorithm - output
  4. Kilobit: data - output
  5. Evolve - solution

A top-down example

Example one: design a radio

Step One, Scenario

  1. brain storm

    1. register/log in
    2. play music
    3. recommendation
  2. prioritize

    1. play music
      1. Get channel
      2. select a channel
      3. play

Step Two, Necessary

  1. ask

    1. total users - 100,000,000
    2. daily users - 1,000,000
  2. predict

    1. user analysis
    2. Traffic analysis
    3. Memory analysis
    4. QPS

Details:

  1. user analysis

    Avg Concurrent users = daily users / 5 = 200,000

    Peak Concurrent users = Avg Concurrent users * 3 = 600,000

    Considering your product may grow in the next 3 months:

    Max Peak users in 3 months = Peak Concurrent users * 10 = 6,000,000

  2. Traffic analysis

    Request of new music per user: 1 music/min

    Music size = 3MB

    Max peak traffic (in 3 months) = 6,000,000 * 3MB / 60 = 300GB/s

  3. Memory analysis

    Memory per user (metadata) = 10KB

    Max daily memory = 1,000,000 users * 10KB * 10 = 100 million KB = 100GB

    (the factor 10 allows for 10 times the average daily users)

Step Three, Application

  1. Replay the case, one service for each
  2. Merge the services

Step Four, Kilobit: data

  1. Append 1 dataset for each service

    Eg. User service: stable data; mostly insertions, few modifications and deletions.

    Eg. Channel Service: high concurrency, MongoDB

    Eg. Music Service: MP3 File Systems

Last Step, Evolve

  1. Better: constrains

    eg. able to handle 300GB/s traffic?

  2. Broader: new cases

    share music? delete user account?

  3. Deeper: details design

From views of Performance, Scalability and Robustness.

[Design] How Is Pipe Implemented in Unix/Linux

Overview

In Unix-like OS, a pipeline is a set of processes chained by their standard streams, so that the output of each process (stdout) feeds directly as input (stdin) to the next one.

Pipes are unidirectional byte streams which connect the standard output from one process into the standard input of another process. Neither process is aware of this redirection and behaves just as it would normally. It is the shell which sets up these temporary pipes between the processes.

Process

  1. Linux has a VFS called pipefs that is mounted in the kernel (not in user space)

    PipeFS is a unique virtual filesystem. This filesystem is mounted inside the kernel rather than in the userspace. While most filesystems are mounted under “/”, PipeFS is mounted on “pipe:”, making PipeFS its own root (yes, a second root filesystem).

    This filesystem has a single superblock system-wide. The entry point of this filesystem/second-root is the system call “pipe()”. Unlike the other virtual/pseudo filesystems, this one cannot be viewed.

    Many of you may be wondering what purpose this PipeFS filesystem serves. Unix pipes use this filesystem. When a pipe is used (eg. ls | less), the pipe() system-call makes a new pipe object on this filesystem. Without this filesystem, pipes cannot be made.

    Also, threads and forks communicate together via pipes. Without PipeFS, processes could not fork and threads could not communicate.

    Network pipes also rely on this virtual/pseudo filesystem.

  2. pipefs has a single super block and is mounted at its own root (pipe:), alongside /

  3. pipefs cannot be viewed directly unlike most file systems

  4. The entry to pipefs is via the pipe(2) syscall

  5. The pipe(2) syscall used by shells for piping with the | operator (or manually from any other process) creates a new file in pipefs which behaves pretty much like a normal file

  6. The command on the left-hand side of the pipe operator has its stdout redirected to the temporary file created in pipefs

  7. The command on the right-hand side of the pipe operator has its stdin set to that file on pipefs

  8. pipefs lives entirely in memory rather than on disk; the kernel buffers the data passing through the pipe

Ref: http://unix.stackexchange.com/q/148401
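
As a user-space illustration (not the kernel internals above), here is roughly what the shell sets up for "ls | wc -l", sketched with Java 9's ProcessBuilder.startPipeline; the two commands are just examples:

import java.io.IOException;
import java.util.List;

public class PipeDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder ls = new ProcessBuilder("ls");
        ProcessBuilder wc = new ProcessBuilder("wc", "-l");
        wc.redirectOutput(ProcessBuilder.Redirect.INHERIT);   // show wc's result on our stdout

        // The kernel pipe connects ls's stdout to wc's stdin; neither program is aware of it.
        List<Process> pipeline = ProcessBuilder.startPipeline(List.of(ls, wc));
        for (Process p : pipeline) {
            p.waitFor();
        }
    }
}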

[Design] Cryptographic Standard, AES and RSA

Overview

3 areas of cryptographic standard:

  1. encryption standard

    1. Data Encryption Standard (obsolete)
    2. Triple DES
    3. Advanced Encryption Standard (AES)
    4. RSA
    5. OpenPGP
    6. CipherSaber
  2. hash standard

    1. MD5
    2. SHA-1
    3. SHA-2
    4. HMAC
    5. PBKDF2
  3. digital signature standard

    1. Digital Signature Algorithm (DSA)
    2. RSA
    3. Elliptic

Symmetric-key algorithm

Use the same cryptographic keys for both encryption and decryption.

The keys represent a shared secret between two parties, and maintain a private information link.

This requirement that both parties have access to the secret key is one of the main drawbacks.

Public-key cryptography

The public key is used to:

  1. encrypt plaintext
  2. verify a digital signature

The private key is used to:

  1. decrypt ciphertext
  2. create a digital signature.

Encryption standard

RSA Vs. AES

RSA is very computationally expensive by comparison with AES. It involves mathematics with very large numbers, whilst AES can be implemented with relatively simple bit operations.

RSA is a public-key encryption algorithm (asymmetric), while AES is a symmetric key algorithm. Often a cryptosystem will use both algorithms.

A good compromise is to use RSA to encrypt the symmetric key that is then used in AES encryption of the larger data.
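
A minimal sketch of that hybrid approach with the JDK's standard crypto classes; the key sizes, modes and padding here are just illustrative defaults, not a hardened configuration:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.security.KeyPair;
import java.security.KeyPairGenerator;

public class HybridCryptoDemo {
    public static void main(String[] args) throws Exception {
        // Receiver's long-term RSA key pair.
        KeyPairGenerator rsaGen = KeyPairGenerator.getInstance("RSA");
        rsaGen.initialize(2048);
        KeyPair rsa = rsaGen.generateKeyPair();

        // One-time AES key for this message.
        KeyGenerator aesGen = KeyGenerator.getInstance("AES");
        aesGen.init(128);
        SecretKey aesKey = aesGen.generateKey();

        // 1) AES encrypts the (possibly large) data.
        Cipher aes = Cipher.getInstance("AES");
        aes.init(Cipher.ENCRYPT_MODE, aesKey);
        byte[] encryptedData = aes.doFinal("a large message...".getBytes());

        // 2) RSA encrypts only the short AES key.
        Cipher rsaCipher = Cipher.getInstance("RSA");
        rsaCipher.init(Cipher.ENCRYPT_MODE, rsa.getPublic());
        byte[] encryptedKey = rsaCipher.doFinal(aesKey.getEncoded());

        // Receiver: RSA-decrypt the AES key, then AES-decrypt the data.
        rsaCipher.init(Cipher.DECRYPT_MODE, rsa.getPrivate());
        SecretKey recovered = new SecretKeySpec(rsaCipher.doFinal(encryptedKey), "AES");
        aes.init(Cipher.DECRYPT_MODE, recovered);
        System.out.println(new String(aes.doFinal(encryptedData)));
    }
}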

GitHub

uses RSA keys (for example, for SSH authentication).

hash standard

MD5

The MD5 message-digest algorithm is a widely used cryptographic hash function producing a 128-bit (16-byte) hash value, i.e. 32 hex digits.

d -> 8277e0910d750195b448797616e091ad

good morning -> 2b849500e4585dab4196ec9a415edf8f

SHA-1

SHA-1 produces a 160-bit (20-byte) hash value, i.e. 40 hex digits.
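
A small sketch reproducing those digest sizes with java.security.MessageDigest (the hex helper is ours, not part of the API):

import java.security.MessageDigest;

public class HashDemo {
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] input = "good morning".getBytes();
        System.out.println(hex(MessageDigest.getInstance("MD5").digest(input)));     // 32 hex digits
        System.out.println(hex(MessageDigest.getInstance("SHA-1").digest(input)));   // 40 hex digits
    }
}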

For more

About MD5, SHA-1 and others, refer to [Design] Cryptographic Hash, MD5 and Digital Signature.

digital signature standard

A valid digital signature gives a recipient confidence that the message was created by a known sender.

It is commonly used for software distribution and financial transactions.

To create a digital signature, signing software (such as an email program) creates a one-way hash of the data to be signed. The private key is then used to encrypt the hash.

The reason for encrypting the hash instead of the entire message is that a hash function can convert an arbitrary input into a fixed-length value, which is usually much shorter.

The other party validates the integrity of the data by using the signer’s public key to decrypt the hash.
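
A minimal sketch of the sign/verify flow using java.security.Signature; "SHA256withRSA" is one common hash-plus-RSA choice, picked here only for illustration:

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public class SignatureDemo {
    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair keys = gen.generateKeyPair();
        byte[] message = "pay Alice $10".getBytes();

        // Sender: hash the message and sign (encrypt) the hash with the private key.
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(keys.getPrivate());
        signer.update(message);
        byte[] signature = signer.sign();

        // Recipient: verify the signature with the sender's public key.
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(keys.getPublic());
        verifier.update(message);
        System.out.println("valid? " + verifier.verify(signature));
    }
}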

Note: you can choose to ‘ Add digital signature to this message ’ in Microsoft Office.

[Design] Linux and TCP Ports

Overview

A port is a software construct serving as a communications endpoint in a computer’s host operating system.

The purpose of ports is to uniquely identify different applications or processes running on a single computer, and thereby enable them to share a single physical connection to a packet-switched network like the Internet.

The protocols that primarily use ports are the Transport Layer protocols, such as TCP and UDP.

Port info can be viewed in the Linux /etc/services file.

there are only 65536 ports

In the TCP/IP stack, the port number field is just a 16-bit unsigned integer, so port numbers range from 0 to 65535.

well-known ports

Well-known ports (or Privileged Ports) are those from 0 through 1023.

  • 20 & 21: File Transfer Protocol (FTP)
  • 22: Secure Shell (SSH)
  • 23: Telnet remote login service
  • 25: Simple Mail Transfer Protocol (SMTP)
  • 53: Domain Name System (DNS) service
  • 80: Hypertext Transfer Protocol (HTTP) used in the World Wide Web
  • 110: Post Office Protocol (POP3)
  • 119: Network News Transfer Protocol (NNTP)
  • 143: Internet Message Access Protocol (IMAP)
  • 161: Simple Network Management Protocol (SNMP)
  • 194: Internet Relay Chat (IRC)
  • 443: HTTP Secure (HTTPS)
  • 465: SMTP Secure (SMTPS)

Socket

A socket is the combination of a port and an IP address.

Protocol number

In an IP header, the Protocol field identifies the service in the next higher level in the protocol stack to which data is passed. Do not confuse this with port number, which is used for communication by TCP/UDP.

Service: Protocol Number

  • Internet Control Message Protocol (ICMP): 1
  • Transmission Control Protocol (TCP): 6
  • User Datagram Protocol (UDP): 17
  • General Routing Encapsulation (PPTP data over GRE): 47
  • Authentication Header (AH) IPSec: 51
  • Encapsulation Security Payload (ESP) IPSec: 50
  • Exterior Gateway Protocol (EGP): 8
  • Gateway-Gateway Protocol (GGP): 3
  • Host Monitoring Protocol (HMP): 20
  • Internet Group Management Protocol (IGMP): 2
  • MIT Remote Virtual Disk (RVD): 66
  • OSPF Open Shortest Path First: 89
  • PARC Universal Packet Protocol (PUP): 12
  • Reliable Datagram Protocol (RDP): 27
  • Reservation Protocol (RSVP) QoS: 46

When the IP packet contains TCP data, the protocol number field will have the value 6 in it, so the payload is handed to the TCP stack; TCP then uses the port numbers to send the data to the correct application. The same goes for UDP with protocol number 17.

Another way to look at the IP protocol number field: if we didn’t have this field in the IP header, IP would only be capable of carrying one type of data. Adding this field allows IP to carry multiple types of data, differentiated by the protocol number. The same goes for TCP/UDP using ports to serve multiple applications, and for Ethernet using the EtherType, and so on.

can multiple apps bind to (or listen on) the same port?

No, because you can only have one application listening on a given port at one time.

The app opens a port, gets a handle to it, and the OS notifies it (via that handle) when a client connection (or, in the UDP case, a packet) arrives.

If the OS allowed two apps to open the same port, how would it know which one to notify?
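
A small sketch of that behaviour in Java: the second attempt to listen on the same TCP port fails with a BindException (port 8080 is an arbitrary choice):

import java.net.BindException;
import java.net.ServerSocket;

public class PortDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket first = new ServerSocket(8080);        // the first app owns port 8080
        System.out.println("listening on " + first.getLocalPort());
        try {
            ServerSocket second = new ServerSocket(8080);   // same port again
            second.close();
        } catch (BindException e) {
            System.out.println("second bind failed: " + e.getMessage());
        }
        first.close();
    }
}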

[Java OOP] Overload, Override, Compile, Runtime (Static/Dynamic Polymph)

The master statement

Overloading is Compile Time and Overriding is Runtime.

Examples

Overloading

public static class test
{
    static void Main(string[] args)
    {
        Foo();
        Foo("test");
    }

    public static void Foo()
    {
        Console.WriteLine("No message supplied");
    }

    public static void Foo(string message)
    {
        Console.WriteLine(message);
    }
}

This is called static (compile-time) polymorphism because the compiler is aware of exactly which method you are calling.

Overriding

When the PrintMessage() function is called, it determines which version of GetMessage() to use at runtime, based on the type of IMessage that is passed in.

public static class MessagePrinter
{
    public static void PrintMessage(IMessage message)
    {
        Console.WriteLine(message.GetMessage());
    }
}

public interface IMessage
{
    string GetMessage();
}

public class XMLMessage : IMessage
{
    public string GetMessage()
    {
        return "This is an XML Message";
    }
}

public class SOAPMessage : IMessage
{
    public string GetMessage()
    {
        return "This is a SOAP Message";
    }
}

This is dynamic (runtime) polymorphism. This is due to the fact that the compiler doesn’t necessarily know what type of object is being passed in at compile-time.

Conclusion

Overloading

Compile time Polymorphism = Static Polymorphism = Early binding

Overriding

Runtime Polymorphism = Dynamic Polymorphism = Late binding

Reference: How Overloading is Compile Time and Overriding is Runtime from stackoverflow.com