Introduction to Peer-to-Peer Networks
What is a P2P network • A P2P network is a large distributed system. It uses the vast resources of PCs distributed at the edge of the Internet to build a network that allows resource sharing without any central authority. • Client-server vs. peer-to-peer: a peer is both a client and a server. Control is decentralized. • Much more than a system for sharing pirated music.
A P2P network is an overlay network • A network of peers. • Each link between peers consists of one or more IP links. • The overlay network resides in the application layer. [Figure: peers Bob, Alice, and Carol]
Well-known P2P Systems • Napster • Gnutella • KaZaA • eDonkey • Chord • Tapestry • CAN • Pastry • BitTorrent
Some important issues • Search • Storage • Security • Applications
A Distributed Storage Service [Figure: peers Bob, Alice, David, and Carol]
Promises (consider file sharing as an example) • Available 24/7 • Durable despite machine failures • Information is protected • Resilient to denial of service
Additional Goals • Massive scalability • Anonymity • Deniability • Resistance to censorship
Challenges • A P2P network must be self-organizing. Join and leave operations must be self-managed. • The infrastructure is untrusted and the components are unreliable. The number of faulty nodes grows linearly with system size. Yet the aggregate behavior has to be trustworthy.
Challenges • Tolerance to failures and churn • Efficient routing even if the structure of the network is unpredictable • Dealing with free riders • Load balancing • Security issues
Looking up data • How do you locate data/files/objects in a scalable manner in a large P2P system built around a dynamic set of nodes, without any centralized server or hierarchy? • Napster index servers used a central database, with questionable scalability and poor resilience. • Compare with how names are looked up in the Internet’s DNS.
Napster • Developed by Shawn Fanning in 1999; shut down after 2 years for copyright infringement. • Directory servers store indices of songs only. • Centralized directory servers were a bottleneck. [Figure: users, the Internet, a root/redirector, and directory servers]
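The central-directory idea can be sketched in a few lines. This is only an illustration, not Napster's actual protocol: the `DirectoryServer` class, its method names, and the peer names are hypothetical. The sketch stores index entries only (title to peers), never the songs themselves, and every lookup goes through the same object, which is where the bottleneck comes from.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical central index: maps a song title to the peers that hold it."""

    def __init__(self):
        self.index = defaultdict(set)

    def register(self, peer, titles):
        # A peer announces the titles it shares; only index entries are stored.
        for title in titles:
            self.index[title.lower()].add(peer)

    def unregister(self, peer):
        # Drop a departing peer from every index entry.
        for peers in self.index.values():
            peers.discard(peer)

    def lookup(self, title):
        # Every query in the system goes through this one call.
        return sorted(self.index.get(title.lower(), set()))


directory = DirectoryServer()
directory.register("alice", ["Double Helix"])
directory.register("bob", ["Double Helix", "Another Song"])
print(directory.lookup("double helix"))
```

The file transfer itself would then happen directly between the peers returned by `lookup`; if the directory is unavailable, no search can succeed.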
Gnutella • A truly decentralized system. • A search such as “Where is Double Helix?” is carried out by flooding the query over a graph of arbitrary topology. • This has an obvious scalability problem, and the wasted bandwidth caused serious inefficiencies.
Gnutella graph [Figure: a client looking for “double helix” in the Gnutella graph]
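Flooding can be sketched as a breadth-first spread of the query over the overlay graph. This is a rough illustration, not the Gnutella wire protocol: the function name, the example graph, and the hop limit `ttl` are assumptions made for the sketch (the slides do not mention a hop limit). The message counter shows where the bandwidth goes: copies are sent even to peers that have already seen the query.

```python
from collections import deque


def flood_search(graph, holders, start, ttl):
    """Flood a query from `start`; return (peers holding the object, messages sent)."""
    seen = {start}
    hits = {start} if start in holders else set()
    messages = 0
    frontier = deque([(start, None, ttl)])
    while frontier:
        node, sender, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # hop limit reached, do not forward further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # never send the query back where it came from
            messages += 1  # every forwarded copy costs bandwidth
            if neighbor in seen:
                continue  # duplicate copy is dropped on arrival
            seen.add(neighbor)
            if neighbor in holders:
                hits.add(neighbor)
            frontier.append((neighbor, node, hops_left - 1))
    return hits, messages


# Hypothetical overlay: adjacency lists of an arbitrary topology.
graph = {
    "client": ["a", "b"],
    "a": ["client", "b", "c"],
    "b": ["client", "a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}
holders = {"d"}  # peers that store "double helix"
print(flood_search(graph, holders, "client", ttl=2))
```

With a hop limit of 1 the query never reaches `d`; raising the limit finds it but multiplies the number of messages, which is the scalability problem in miniature.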
Unstructured vs. Structured • Unstructured P2P networks allow resources to be placed at any node. The network topology is arbitrary, and the growth is spontaneous. • Structured P2P networks simplify resource location and load balancing by defining a topology and defining rules for resource placement.
Distributed Hash Table (DHT) • Object-to-machine mapping uses unique keys. • H(object name) = key, where H is a hash function. • H(machine name) = key. • An object whose name maps to key k is placed on the machine whose name maps to key k. • This simplifies object location.
Distributed Hash Table (DHT): basic idea [Figure: a keyspace from 0 to N-1 with keys a, b, and c; a machine name and an object name are both hashed to b]
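The mapping can be sketched as follows. The keyspace size, the use of SHA-1, and the helper names are assumptions made for illustration. One further assumption is labeled in the code: an object's key rarely equals a machine's key exactly, so this sketch places the object on the first machine whose key is at or after the object's key, wrapping around the keyspace, as Chord-style designs do.

```python
import hashlib
from bisect import bisect_left

KEYSPACE = 2 ** 16  # N, an illustrative keyspace size


def H(name):
    """Hash a machine or object name to a key in 0 .. N-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEYSPACE


def build_ring(machines):
    """Sorted list of (key, machine name) pairs."""
    return sorted((H(m), m) for m in machines)


def successor(ring, key):
    # Assumption: the responsible machine is the first one whose key is
    # at or after `key`, wrapping around to the start of the keyspace.
    keys = [k for k, _ in ring]
    return ring[bisect_left(keys, key) % len(ring)][1]


def locate(ring, object_name):
    """Return the machine responsible for the named object."""
    return successor(ring, H(object_name))


ring = build_ring(["Alice", "Bob", "Carol"])
print(H("double helix"), locate(ring, "double helix"))
```

Any peer that knows the hash function and the set of machine keys computes the same answer, so no central directory or flooding is needed to find where an object lives.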