P2P Search COP5711
P2P Search Techniques • Centralized P2P systems • e.g. Napster, SETI@home • Decentralized & unstructured P2P systems • e.g. Gnutella • Hybrid - partially decentralized • e.g., Freenet • Structured P2P systems • DHT • CAN
P2P Network • P2P network is an overlay network built on top of a real physical network (e.g., Internet) • In a P2P network, peers are network nodes connected by virtual or logical links • A logical link is a path through many physical links in the underlying network
Napster: Publish a File • Users upload their IP address and the music titles they wish to share to the Napster server (central catalog) • Example entry: (xyz.mp3, 192.1.2.3)
Napster: Query for a File • Users search for peers to download desired files • Example: the query "xyz.mp3?" sent to the central Napster server returns 192.1.2.3
Napster: Transfer Requested File • File transfer is P2P, using a proprietary protocol • Example: the requester asks peer 192.1.2.3 for xyz.mp3 directly, not through the central Napster server
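The publish and query steps above can be sketched as a small in-memory catalog. This is only an illustration: the class name `CentralCatalog` and its methods are hypothetical and are not Napster's actual protocol or API.

```python
from collections import defaultdict


class CentralCatalog:
    """Toy stand-in for the central Napster server (hypothetical API)."""

    def __init__(self):
        # title -> set of peer IP addresses that share it
        self._index = defaultdict(set)

    def publish(self, title, peer_ip):
        """A peer announces that it shares `title`."""
        self._index[title].add(peer_ip)

    def query(self, title):
        """Return the peers to contact; the file itself never passes through here."""
        return sorted(self._index.get(title, ()))


catalog = CentralCatalog()
catalog.publish("xyz.mp3", "192.1.2.3")
print(catalog.query("xyz.mp3"))  # ['192.1.2.3']
print(catalog.query("abc.mp3"))  # []
```

Every publish and every query goes through the one `catalog` object, which is why a central directory becomes both a bottleneck and a single point of failure.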
Disadvantages of Centralized Directory • Performance bottleneck • Single point of failure • Can we do it without a directory?
Decentralized P2P - Gnutella • No catalog • Pings network to locate Gnutella peers • File requests are broadcast to peers • Flooding or breadth-first search • When provider is located, the file is transferred via HTTP
Gnutella: Join the Network • A special peer is maintained by Gnutella • Peers are Internet edges • A joining peer asks "Who are my neighbors?" • Pings network to locate peers
Gnutella: Broadcast Request to Peers xyz.mp3 ?
Gnutella: Flood the Request (Breadth-first search) I have it.
Gnutella: Reply with the File (via HTTP) I have it. xyz.mp3
Gnutella - Disadvantages • Network flooding - unnecessary network traffic • Using TTL - some files might not be found • Alternatives: • using ultranodes (or supernodes) • using depth-first search, e.g., Freenet
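A minimal sketch of flooding with a TTL, showing both disadvantages at once: the message count grows with every forwarded copy, and a small TTL can miss a file that exists. The overlay, the function name `flood_search`, and the message-counting rule are assumptions for illustration, not the Gnutella wire protocol.

```python
from collections import deque


def flood_search(neighbors, files, start, wanted, ttl):
    """Breadth-first flood from `start`; returns (providers, messages_sent)."""
    visited = {start}
    queue = deque([(start, ttl)])
    providers, messages = [], 0
    while queue:
        peer, remaining = queue.popleft()
        if wanted in files.get(peer, ()):
            providers.append(peer)
        if remaining == 0:
            continue  # TTL expired: this peer does not forward the request
        for nxt in neighbors[peer]:
            # Simplification: count every forwarded copy, including
            # duplicates that the receiver has already seen and discards.
            messages += 1
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, remaining - 1))
    return providers, messages


# Hypothetical overlay: A - B - C - D, plus A - E. Only D holds the file.
neighbors = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C"], "E": ["A"]}
files = {"D": {"xyz.mp3"}}

print(flood_search(neighbors, files, "A", "xyz.mp3", ttl=2))  # ([], 5)
print(flood_search(neighbors, files, "A", "xyz.mp3", ttl=3))  # (['D'], 7)
```

With `ttl=2` the request dies at C, one hop short of the provider, even though five messages were already sent.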
Morpheus, Kazaa: Flooding only the Supernodes • Supernode layer
Using Ultranodes • Queries flood only the network of ultranodes • Other peer nodes are shielded from query traffic • Combines the benefits of centralized and decentralized search • Takes advantage of the heterogeneity in peer capabilities
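A rough sketch of the two-tier idea, assuming each ultranode keeps an index of the files held by its own leaf peers (the data layout and the function name `supernode_search` are hypothetical). The query visits only ultranodes; leaves never see it.

```python
def supernode_search(super_links, index, start_super, wanted):
    """Flood only the ultranode layer; returns (leaf providers, ultranodes visited)."""
    visited = {start_super}
    pending = [start_super]
    hits = []
    while pending:
        sn = pending.pop()
        # Each ultranode answers on behalf of the leaves it indexes.
        hits.extend(index[sn].get(wanted, []))
        for nxt in super_links[sn]:
            if nxt not in visited:
                visited.add(nxt)
                pending.append(nxt)
    return sorted(hits), len(visited)


# Hypothetical ultranode layer S1 - S2 - S3 with per-ultranode leaf indexes.
super_links = {"S1": ["S2"], "S2": ["S1", "S3"], "S3": ["S2"]}
index = {
    "S1": {},
    "S2": {"abc.mp3": ["10.0.0.7"]},
    "S3": {"xyz.mp3": ["192.1.2.3"]},
}

print(supernode_search(super_links, index, "S1", "xyz.mp3"))  # (['192.1.2.3'], 3)
```

Each ultranode acts like a small central directory for its leaves, while the ultranodes among themselves behave like a decentralized flooding network.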
Freenet – File not Found • One peer holds the file ("I have file X") • The requested file is not found due to a poor routing decision made at peer D • In this case, the query backs out of the dead end and tries another peer in a depth-first manner
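The back-out behavior can be sketched as a depth-first search with backtracking. This is a simplification under stated assumptions: real Freenet chooses the next peer by key closeness, whereas here the neighbor list order stands in for the routing decision, and the overlay and the name `dfs_lookup` are made up for the example.

```python
def dfs_lookup(neighbors, files, peer, wanted, hops_left, visited=None):
    """Return the path to a provider, or None if this branch is a dead end."""
    if visited is None:
        visited = set()
    visited.add(peer)
    if wanted in files.get(peer, ()):
        return [peer]
    if hops_left == 0:
        return None
    for nxt in neighbors[peer]:
        if nxt in visited:
            continue
        path = dfs_lookup(neighbors, files, nxt, wanted, hops_left - 1, visited)
        if path is not None:
            return [peer] + path
        # Dead end: back out and try the next neighbor.
    return None


# Hypothetical overlay: D tries E first (a poor choice), then F, which has X.
overlay = {"A": ["B"], "B": ["A", "D"], "D": ["B", "E", "F"],
           "E": ["D"], "F": ["D"]}
holdings = {"F": {"X"}}

print(dfs_lookup(overlay, holdings, "A", "X", hops_left=5))  # ['A', 'B', 'D', 'F']
print(dfs_lookup(overlay, holdings, "A", "X", hops_left=2))  # None
```

Only one request is in flight at any time, so traffic stays low, but a run of bad choices costs extra hops before the file is found.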
Using Distributed Directory • Data objects are everywhere • Distribute subsets of the data directory (sub-directories) among peers • If we can find the relevant sub-directory, we can locate the data object
How to Bound Search Space? Basic Idea - Hashing • Objects have hash keys: object "y" is published under H(y) • Peer nodes also have hash keys in the same hash space: peer "x" joins under H(x) • Place location information about an object at the peer with the closest hash key (i.e., a distributed directory)
Viewed as a Distributed Hash Table • The hash table spans keys 0 to 2^128 - 1 • Each peer node is responsible for a range of the hash table, according to the peer hash key • Location information about objects is placed in the peer with the closest key (information redundancy)
How to Find an Object? • Look for the peer with the corresponding peer hash key • A peer knows its logical neighbors • Find peer X based on multihop routing • X knows who has the object (e.g., "Peer Y has the file") • Hash table range: 0 to 2^128 - 1
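A small sketch of the hashing idea and of multihop lookup. The assumptions are mine: MD5 is used only because its output is 128 bits wide, peers are arranged on a line in key order and know just their two adjacent peers, and closeness is plain numeric distance. Real DHTs differ in all of these details.

```python
import hashlib


def h(name):
    """Map a peer or object name into the hash space 0 .. 2**128 - 1."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")


def responsible_peer(peer_keys, key):
    """The peer whose hash key is closest to `key` holds the location info."""
    return min(peer_keys, key=lambda p: abs(peer_keys[p] - key))


def route(peer_keys, neighbors, start, key):
    """Greedy multihop routing: each hop moves to the neighbor closest to `key`."""
    path = [start]
    current = start
    while True:
        # Listing `current` first means ties keep the message where it is.
        candidates = [current] + neighbors[current]
        best = min(candidates, key=lambda p: abs(peer_keys[p] - key))
        if best == current:
            return path
        current = best
        path.append(current)


peers = ["peer-a", "peer-b", "peer-c", "peer-d", "peer-e", "peer-f"]
peer_keys = {p: h(p) for p in peers}
ordered = sorted(peers, key=peer_keys.get)
# Each peer knows only its logical neighbors: the adjacent peers in key order.
neighbors = {p: [ordered[j] for j in (i - 1, i + 1) if 0 <= j < len(ordered)]
             for i, p in enumerate(ordered)}

target = h("xyz.mp3")
owner = responsible_peer(peer_keys, target)
path = route(peer_keys, neighbors, ordered[0], target)
print(owner, path)
```

No peer needs a global view: the lookup ends at the responsible peer using only local neighbor knowledge.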
Distributed Hash Table (DHT) in action • Each node holds a table of key-value (K, V) pairs
DHT in action: put() • Want to share a file: insert(K1,V1) • Operation: route message, "I have the file," to node holding key K1
DHT in action: put() • (K1,V1) • Operation: take key as input; route messages to node holding key
DHT in action: get() • retrieve(K1) • Operation: retrieve message V1 at node holding key K1
DHT in action • Retrieve file according to V1
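The put() and get() operations above might look like this in miniature. The class `ToyDHT`, the integer node ids, and the closest-id rule are assumptions for illustration; the routing between nodes is collapsed into a single lookup.

```python
class ToyDHT:
    """Each node keeps its own (K, V) table; keys go to the closest node id."""

    def __init__(self, node_ids):
        self.tables = {nid: {} for nid in node_ids}

    def _holder(self, key):
        # Stand-in for routing: pick the node whose id is closest to the key.
        return min(self.tables, key=lambda nid: abs(nid - key))

    def put(self, key, value):
        """Store 'I have the file' information at the node holding `key`."""
        holder = self._holder(key)
        self.tables[holder][key] = value
        return holder

    def get(self, key):
        """Fetch the stored value from the node holding `key`."""
        return self.tables[self._holder(key)].get(key)


dht = ToyDHT([10, 40, 70, 100])
K1, V1 = 65, "192.1.2.3"  # V1 is the location of the peer sharing the file
print(dht.put(K1, V1))    # 70
print(dht.get(K1))        # 192.1.2.3
```

The value returned by `get` is only location information; the file itself is then fetched directly from the peer named in V1.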
Still Flooding • Still flood the network although intermediate nodes do not need to search • Can we avoid flooding ?
CAN – Content Addressable Network • Each peer is responsible for one zone, i.e., stores all (key, value) pairs of the zone • Each peer knows the neighbors of its zone • Random assignment of peers to zones at startup – split zone if not empty • Dimensional-ordered multihop routing
CAN: Object Publishing • node I::publish(K,V) • (1) a = h_x(K), b = h_y(K), which gives the point (x = a, y = b) • (2) route (K,V) -> J, the node whose zone contains (a,b) • (3) J stores (K,V)
CAN: Object Retrieval • node I::retrieve(K) • (1) a = h_x(K), b = h_y(K) • (2) route "retrieve(K)" to node J, which is in charge of (a,b) and stores (K,V)
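Publishing and retrieval in a 2-D CAN can be sketched as follows. Several things here are assumptions rather than part of the slides: the space is a fixed 4x4 grid of equal zones with one peer per zone (no dynamic splitting), the two hash functions are salted SHA-256 prefixes, and a zone is named by its (column, row) pair.

```python
import hashlib

GRID = 4        # assumption: fixed 4x4 grid of equal zones, one peer per zone
SIDE = 2 ** 16  # coordinates are integers in 0 .. SIDE - 1


def _coord(key, salt):
    digest = hashlib.sha256((salt + key).encode()).digest()
    return int.from_bytes(digest[:2], "big")


def h_x(key):
    return _coord(key, "x:")


def h_y(key):
    return _coord(key, "y:")


def zone_of(a, b):
    """Zone (column, row) whose area contains the point (a, b)."""
    return (a * GRID // SIDE, b * GRID // SIDE)


def route(src, dst):
    """Dimension-ordered routing: correct x first, then y, one neighbor per hop."""
    path = [src]
    col, row = src
    while col != dst[0]:
        col += 1 if dst[0] > col else -1
        path.append((col, row))
    while row != dst[1]:
        row += 1 if dst[1] > row else -1
        path.append((col, row))
    return path


store = {}  # zone -> {key: value}, i.e. the pairs each zone owner stores


def publish(src, key, value):
    dst = zone_of(h_x(key), h_y(key))
    store.setdefault(dst, {})[key] = value
    return route(src, dst)


def retrieve(src, key):
    dst = zone_of(h_x(key), h_y(key))
    return store.get(dst, {}).get(key), route(src, dst)


pub_path = publish((0, 0), "xyz.mp3", "192.1.2.3")
value, get_path = retrieve((3, 3), "xyz.mp3")
print(pub_path, value, get_path)
```

Both operations end at the same zone because the destination depends only on the key, and every hop goes to an adjacent zone, so a peer needs to know only its neighbors.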
Maintenance • Inform neighbors that you are alive at a fixed time interval t • If a neighbor does not send an alive message within time t, take over its zone
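A minimal sketch of the alive-message rule, with time passed in explicitly so the example needs no real clock. The class `ZoneOwner` and its methods are hypothetical, and the sketch ignores how several neighbors of the same failed peer would agree on which of them takes over.

```python
class ZoneOwner:
    """Tracks alive messages from neighbors and absorbs the zones of silent ones."""

    def __init__(self, name, zones, timeout):
        self.name = name
        self.zones = set(zones)
        self.timeout = timeout    # the interval t from the slide
        self.last_seen = {}       # neighbor -> time of its last alive message
        self.neighbor_zones = {}  # neighbor -> zones it currently owns

    def on_alive(self, neighbor, zones, now):
        """Record an alive message received at time `now`."""
        self.last_seen[neighbor] = now
        self.neighbor_zones[neighbor] = set(zones)

    def check_neighbors(self, now):
        """Take over the zones of every neighbor silent for longer than t."""
        dead = [n for n, seen in self.last_seen.items()
                if now - seen > self.timeout]
        for n in dead:
            self.zones |= self.neighbor_zones.pop(n)
            del self.last_seen[n]
        return dead


peer = ZoneOwner("I", {"z1"}, timeout=5)
peer.on_alive("J", {"z2"}, now=0)
peer.on_alive("K", {"z3"}, now=0)
peer.on_alive("K", {"z3"}, now=4)   # K keeps reporting; J has gone silent
print(peer.check_neighbors(now=6))  # ['J']
print(sorted(peer.zones))           # ['z1', 'z2']
```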
P2P Benefits • Efficient use of resources • Use unused bandwidth, storage, and processing power at the edge of the network • Scalability • Consumers of resources also donate resources • Reliability • Replicas, geographic distribution • No single point of failure • Ease of administration • Self-organized nodes • Built-in reliability and load balancing
Some Prototypes at UCF • iSEE (Internet-scale Sensor Exploration Environment) • Publishing real-time sensor data • Browsing and querying real-time sensor data • P2P Video Streaming for VoD and Live Broadcast Applications