Overview
Peer-to-peer (P2P) networking is a distributed application architecture that partitions tasks or workloads between peers.
Peers are both suppliers and consumers of resources, in contrast to the traditional client-server model where communication is usually to and from a central server. A typical example of a file transfer that uses the client-server model is the File Transfer Protocol (FTP) service in which the client and server programs are distinct: the clients initiate the transfer, and the servers satisfy these requests.
This architecture was popularized by the file sharing system Napster, originally released in 1999.
Procedure
- Alice runs P2P client software.
- She connects to the Internet and gets a new IP address for each connection.
- She registers her files in the P2P system.
- She requests “Harry Potter”.
- She finds other peers that have a copy.
- She chooses one and copies the file to her PC.
- Meanwhile, Alice serves her own files to other people.
- She acts like a server when others download from her.
- She acts like a client when she downloads from others.
- She uses keywords to search for content (like Google).
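The walkthrough above can be sketched as a toy in-memory model in which every peer both serves and fetches files. The `Peer` class and its method names are hypothetical, and plain dictionaries stand in for real sockets and disks.

```python
class Peer:
    """Toy peer that acts as both a server and a client (hypothetical model)."""

    def __init__(self, name, files=None):
        self.name = name
        # filename -> content; stands in for the peer's shared folder
        self.files = dict(files or {})

    def serve(self, filename):
        """Server role: hand out a file this peer holds, or None."""
        return self.files.get(filename)

    def download(self, filename, peers):
        """Client role: copy the file from the first other peer that has it."""
        for peer in peers:
            if peer is self:
                continue
            content = peer.serve(filename)
            if content is not None:
                self.files[filename] = content  # the copy can now be served too
                return peer.name
        return None


alice = Peer("alice", {"holiday.jpg": b"..."})
bob = Peer("bob", {"Harry Potter": b"book bytes"})
source = alice.download("Harry Potter", [alice, bob])
```

After the download, Alice holds her own copy, so a later `alice.serve("Harry Potter")` succeeds: the same object plays both roles.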
P2P Types
Unstructured P2P: no coupling between nodes and file location
- Napster
- Gnutella
- KaZaA
Structured P2P: tight coupling between nodes and file location
- DHT
Napster
Peers register their files at the Napster server.
Search is centralized, while file distribution is peer-to-peer.
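A minimal sketch of the centralized part, assuming the server only keeps a directory of who holds what and never touches the files themselves. `CentralIndex`, its methods, and the peer addresses are made up for illustration and do not reflect Napster's actual protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Toy directory server: maps a filename to the peers that registered it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, peer_addr, filenames):
        for name in filenames:
            self._holders[name].add(peer_addr)

    def unregister(self, peer_addr):
        for holders in self._holders.values():
            holders.discard(peer_addr)

    def search(self, keyword):
        """Return {filename: sorted peer addresses} for names containing keyword."""
        keyword = keyword.lower()
        return {
            name: sorted(holders)
            for name, holders in self._holders.items()
            if keyword in name.lower() and holders
        }


index = CentralIndex()
index.register("10.0.0.2:6699", ["Harry Potter.pdf"])
index.register("10.0.0.3:6699", ["Harry Potter.pdf", "notes.txt"])
hits = index.search("harry")
```

The search result only names the peers; the requester would then contact one of them directly to transfer the file.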
Gnutella
Decentralized design for searching:
- No central directory server
- Each node maintains a list of neighbors (overlay network)
Search by flooding:
- Breadth-first search (BFS) traversal of the overlay.
- Define a maximum number of hops (TTL).
- Expanding-ring TTL search: try 1 hop first, then 2 hops, then 3…
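The flooding search can be sketched as a hop-limited BFS over the overlay graph. This is a single-process simulation, not the Gnutella wire protocol; the function names, the adjacency-list `graph`, and the `files` mapping are assumptions made for the example.

```python
from collections import deque


def flood_search(graph, files, start, wanted, ttl):
    """BFS from start, visiting nodes at most ttl hops away.

    Returns (nodes holding the file, number of nodes queried).
    """
    seen = {start}
    queue = deque([(start, 0)])
    hits, queried = [], 0
    while queue:
        node, hops = queue.popleft()
        if node != start:
            queried += 1
            if wanted in files.get(node, ()):
                hits.append(node)
        if hops < ttl:  # forward the query only while TTL remains
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
    return hits, queried


def expanding_ring_search(graph, files, start, wanted, max_ttl):
    """Retry the flood with TTL 1, 2, ... and stop at the first hit."""
    for ttl in range(1, max_ttl + 1):
        hits, _ = flood_search(graph, files, start, wanted, ttl)
        if hits:
            return ttl, hits
    return None, []


graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"E": {"song.mp3"}}
```

In this graph the file sits three hops from `A`, so a search capped at two hops fails even though the file exists, while a cap of three finds it after querying every other node.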
Joining the network:
- Use a bootstrap node to obtain the IP addresses of some existing peers.
- Connect to these addresses; those peers become the new node's neighbors.
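A rough sketch of the join step, assuming the bootstrap node has already returned a list of candidate addresses. The `join` function, the `max_neighbors` limit, and the dictionary-based overlay are all hypothetical simplifications.

```python
import random


def join(new_node, bootstrap_peers, overlay, max_neighbors=3, rng=random):
    """Add new_node to the overlay using addresses from a bootstrap node.

    bootstrap_peers: addresses the bootstrap node returned (assumed given).
    overlay: dict mapping each node to the set of its neighbors.
    """
    # Keep only candidates that are actually alive in the overlay.
    candidates = [p for p in bootstrap_peers if p in overlay and p != new_node]
    chosen = rng.sample(candidates, min(max_neighbors, len(candidates)))
    overlay[new_node] = set(chosen)
    for peer in chosen:
        overlay[peer].add(new_node)  # links are bidirectional in this model
    return chosen
```

Passing a seeded `random.Random` instance as `rng` makes the neighbor choice reproducible when experimenting.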
Shortcomings:
- Flooding is NOT a scalable design.
- A download may not complete.
- A search can fail even when the resource is present (for example, when it lies beyond the hop limit).
KaZaA
Combines the Napster and Gnutella approaches.
Each peer is either a supernode or assigned to a supernode. Each supernode connects to 30~50 other supernodes and acts like a mini-Napster hub for its peers.
At registration, a PC connects to a supernode. If that supernode goes down, the peer obtains an updated supernode list and picks another one.
A search runs within the peer's own supernode first, then in other supernodes. If many nodes hold the file, it is downloaded from them in parallel.
Recovery is automatic if one serving peer goes down. Searches use ContentHash to identify files.
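The supernode idea can be sketched as a small index per supernode plus a query to neighboring supernodes. Everything here is an illustrative assumption: the `SuperNode` class, SHA-1 as a stand-in for KaZaA's actual ContentHash, and the round-robin chunk plan for parallel downloading.

```python
import hashlib


def content_hash(data):
    """Identify a file by its bytes, so renamed copies still match."""
    return hashlib.sha1(data).hexdigest()


class SuperNode:
    """Toy supernode: indexes the files of the peers assigned to it."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # content hash -> set of peer names
        self.neighbors = []  # other supernodes

    def register(self, peer, file_blobs):
        for blob in file_blobs:
            self.index.setdefault(content_hash(blob), set()).add(peer)

    def search(self, digest):
        """Merge local hits with hits from neighboring supernodes."""
        holders = set(self.index.get(digest, ()))
        for other in self.neighbors:
            holders |= other.index.get(digest, set())
        return sorted(holders)


def plan_parallel_download(holders, num_chunks):
    """Assign chunk numbers to holders round-robin."""
    if not holders:
        return {}
    plan = {h: [] for h in holders}
    for chunk in range(num_chunks):
        plan[holders[chunk % len(holders)]].append(chunk)
    return plan
```

If one holder disappears mid-transfer, calling `plan_parallel_download` again with the remaining holders gives a simple form of recovery.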
Structured P2P
For distributed hash table (DHT) services, refer to [Design] Distributed hash table.
Conclusion
Unstructured P2P:
- no coupling between nodes and file location
- Centralized directory service (except Gnutella)
- Search by flooding (overhead)
- Hierarchical architecture (non-scalable)
Structured P2P:
- tight coupling between nodes and file location
- DHT using consistent hashing (e.g. Chord, and many other types)
- A node is assigned to hold particular content
- Search with more efficiency
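To illustrate how a node can be assigned to hold particular content, here is a minimal consistent-hashing ring. It is a single-process sketch with a global view of all nodes, not Chord itself (which routes lookups across nodes); `HashRing`, `ring_position`, and the 32-bit ring size are assumptions chosen for brevity.

```python
import bisect
import hashlib


def ring_position(key, bits=32):
    """Map a string onto a ring of size 2**bits."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        bisect.insort(self._points, (ring_position(node), node))

    def remove(self, node):
        self._points.remove((ring_position(node), node))

    def lookup(self, key):
        """Return the first node at or after the key's position (its successor)."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._points, (ring_position(key), ""))
        return self._points[i % len(self._points)][1]  # wrap around the ring
```

Because a key always belongs to its successor on the ring, removing a node only moves the keys that node was holding; every other key keeps its owner.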