Woodstock Blog

a tech blog for general algorithmic interview questions

[Design] P2P Technology

Overview

Peer-to-peer (P2P) networking is a distributed application architecture that partitions tasks or work loads between peers.

Peers are both suppliers and consumers of resources, in contrast to the traditional client-server model where communication is usually to and from a central server. A typical example of a file transfer that uses the client-server model is the File Transfer Protocol (FTP) service in which the client and server programs are distinct: the clients initiate the transfer, and the servers satisfy these requests.

This architecture was popularized by the file sharing system Napster, originally released in 1999.

Precedure

  1. Alice run P2P client software.
    1. connect to Internet and get new IP address for each connection
    2. register her files in P2P system
    3. request “Harry Potter”
    4. find other peers who have the copy
    5. choose one and copy to her PC.
    6. meanwhile, Alice is servig her files for other people
  2. Act like a server
  3. Act like a client
  4. User keyword to search content (like google)

P2P Types

  1. Unstructured P2P: no coupling between nodes and file location

    1. Napster
    2. Gnutella
    3. KaZaA
  2. Structured P2P: tight coupling between nodes and file location

    1. DHT

Napster

Register at Napster server.

Centralized search, and P2P distributing.

Gnutella

Decentralized design for searching:

  1. No central directory server
  2. Each node maintain a list of neighbors (overlay network)

Search by flooding:

  1. BFS traversal.
  2. Define maximum number of hops
  3. Expanded-ring TTL search means to try 1 hop first, then try 2 hops, then 3…

Join nodes:

  1. Use Bootstrap node to get some IP addresses
  2. Join these IP, which becomes neighbors.

Shortcomings:

  1. Flooding is NOT a scalable design.
  2. Download may not complete.
  3. Possibility of search failure, even then the resource presents.

KaZaA

Combines Napster and Gnutella.

Each peer is a supernode or assigned to a supernode. Each supernode connects to 30~50 other supernodes. The supernode acts like a mini-Napster hub.

At registration, a PC connects to a supernode. If a supernode goes down, obtains updated list and elect another one.

Search within supernode, then in other supernodes. If found many nodes holding the file, do parallel downloading.

Automatic recovery if 1 server peer goes down. Use ContentHash to search.

Structured P2P

For Distributed HashTable services, refer to [Design] Distributed hash table.

Conclusion

  1. Unstructured P2P:

    1. no coupling between nodes and file location
    2. Centralized direcotry service (except Gnutella)
    3. Search by flooding (overhead)
    4. Hierarchical architecture (non-scalable)
  2. Structured P2P:

    1. tight coupling between nodes and file location
    2. DHT using consistent hashing (eg. Chord, and many other types)
    3. A node is assigned to hold particular content
    4. Search with more efficiency