
P2P Search COP5711


Presentation Transcript


  1. P2P Search COP5711

  2. P2P Search Techniques
  • Centralized P2P systems, e.g., Napster, SETI@home
  • Decentralized & unstructured P2P systems, e.g., Gnutella
  • Hybrid (partially decentralized), e.g., Freenet
  • Structured P2P systems: DHT, CAN

  3. P2P Network
  • A P2P network is an overlay network built on top of a real physical network (e.g., the Internet)
  • In a P2P network, peers are network nodes connected by virtual or logical links
  • A logical link is a path through many physical links in the underlying network
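A toy illustration of this layering (all names are made up): the P2P layer sees only logical neighbors, while each logical link actually corresponds to a multi-hop path through the underlying physical network.

```python
# Hypothetical example: logical overlay links vs. the physical paths beneath them.
overlay_neighbors = {
    "peerA": ["peerB", "peerC"],            # what the P2P protocol sees
}

physical_path = {
    ("peerA", "peerB"): ["peerA", "routerX", "routerY", "peerB"],  # one logical link
    ("peerA", "peerC"): ["peerA", "routerZ", "peerC"],
}

for nbr in overlay_neighbors["peerA"]:
    print(nbr, "is reached via", physical_path[("peerA", nbr)])
```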

  4. Napster: Publish a File
  • Users upload their IP address and the music titles they wish to share to the central Napster server (the central catalog), e.g., (xyz.mp3, 192.1.2.3)

  5. Napster: Query for a File
  • Users ask the central Napster server which peers have the desired file, e.g., "xyz.mp3?" answers 192.1.2.3

  6. Napster: Transfer Requested File
  • The file transfer itself is P2P, using a proprietary protocol
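The three Napster slides above amount to a central catalog mapping titles to peer addresses; here is a minimal sketch with hypothetical class and method names (the actual Napster protocol is not reproduced), where only the directory lookup is centralized and the download itself stays peer-to-peer.

```python
# Minimal sketch of a Napster-style central catalog (hypothetical names).
class CentralCatalog:
    def __init__(self):
        self.index = {}                       # title -> set of peer IPs

    def publish(self, peer_ip, titles):
        for title in titles:                  # peer uploads the titles it shares
            self.index.setdefault(title, set()).add(peer_ip)

    def query(self, title):
        return self.index.get(title, set())   # peers that claim to have the file


catalog = CentralCatalog()
catalog.publish("192.1.2.3", ["xyz.mp3"])
print(catalog.query("xyz.mp3"))               # {'192.1.2.3'}; the transfer is then P2P
```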

  7. Disadvantages of a Centralized Directory
  • Performance bottleneck
  • Single point of failure
  Can we do it without a directory?

  8. Decentralized P2P - Gnutella
  • No catalog
  • Pings the network to locate Gnutella peers
  • File requests are broadcast to peers
  • Flooding, i.e., breadth-first search
  • When a provider is located, the file is transferred via HTTP
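A minimal sketch of the flooding step, assuming a plain adjacency map for the overlay and a TTL hop limit; real Gnutella also deduplicates queries by message ID, which the `seen` set only approximates.

```python
from collections import deque

def flood_query(neighbors, start_peer, has_file, ttl=4):
    """Breadth-first flood of a query, bounded by a TTL (toy model)."""
    seen = {start_peer}
    queue = deque([(start_peer, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if has_file(peer):
            return peer                      # provider found; transfer then goes via HTTP
        if hops_left == 0:
            continue                         # TTL expired: stop forwarding on this branch
        for nbr in neighbors.get(peer, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, hops_left - 1))
    return None                              # the file may exist beyond the TTL horizon
```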

  9. Gnutella: Join the Network
  • A joining peer asks "Who are my neighbors?" by pinging the network to locate peers, starting from a special peer maintained by Gnutella; peers sit at the edges of the Internet

  10. Gnutella: Broadcast Request to Peers ("xyz.mp3?")

  11. Gnutella: Flood the Request (breadth-first search) until some peer answers "I have it."

  12. Gnutella: Reply with the File (via HTTP) - the peer that has xyz.mp3 sends it back

  13. Gnutella - Disadvantages
  • Network flooding - unnecessary network traffic
  • Using a TTL - some files might not be found
  • Alternatively: use ultranodes (or supernodes), or use depth-first search, i.e., Freenet

  14. Morpheus, Kazaa: Flooding Only the Supernodes (supernode layer)

  15. Using Ultranodes
  • Queries flood only the network of ultranodes
  • Other peer nodes are shielded from query traffic
  • Combines the benefits of centralized and decentralized search
  • Takes advantage of the heterogeneity in peer capabilities

  16. Freenet - Depth-First Search

  17. Freenet - File Not Found
  • The requested file is not found because of a poor routing decision made at peer D, even though another peer announces "I have file X"
  • In this case, the query backs out of the dead end and tries another peer in depth-first manner
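A rough sketch of the depth-first behavior described above, under the simplifying assumption that each peer tries its neighbors one at a time and backs out of dead ends (actual Freenet routing is key-based and more involved).

```python
def dfs_search(neighbors, peer, has_file, visited=None):
    """Depth-first search with backtracking over a toy overlay adjacency map."""
    visited = visited if visited is not None else set()
    visited.add(peer)
    if has_file(peer):
        return [peer]                         # found: return the tail of the path
    for nbr in neighbors.get(peer, []):
        if nbr not in visited:
            path = dfs_search(neighbors, nbr, has_file, visited)
            if path:                          # success somewhere down this branch
                return [peer] + path
    return None                               # dead end: the query backs out to the caller
```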

  18. Using a Distributed Directory
  • Data objects are everywhere
  • Distribute subsets of the data directory (sub-directories) among peers
  • If we can find the relevant sub-directory, we can locate the data object

  19. How to Bound the Search Space? Basic Idea - Hashing
  • Objects have hash keys: object "y" hashes to H(y), published via Publish(H(y))
  • Peer nodes also have hash keys in the same hash space: peer "x" hashes to H(x), joining via Join(H(x))
  • Place location information about an object at the peer with the closest hash key (i.e., a distributed directory)
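A minimal sketch of this placement rule: peers and objects are hashed into the same key space, and the object's location record goes to the peer with the closest key. SHA-1 and "closest by absolute difference" are only illustrative choices, not what any particular system mandates.

```python
import hashlib

def hash_key(name):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)   # 160-bit key

peer_keys = {p: hash_key(p) for p in ["peerA", "peerB", "peerC"]}

def directory_peer(object_name):
    k = hash_key(object_name)
    # place the object's location record at the peer with the closest hash key
    return min(peer_keys, key=lambda p: abs(peer_keys[p] - k))

print(directory_peer("xyz.mp3"))   # the peer that stores "who has xyz.mp3"
```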

  20. Viewed as a Distributed Hash Table
  • The hash table spans the key space 0 to 2^128 - 1, and each peer node is responsible for a range of it, according to the peer's hash key
  • Location information about an object is placed at the peer with the closest key (information redundancy)

  21. How to Find an Object?
  • Look for the peer with the corresponding peer hash key
  • A peer knows its logical neighbors, so peer X is found by multihop routing
  • X knows who has the object (e.g., peer Y has the file)
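A sketch of the multihop lookup idea, assuming each peer knows only a small neighbor table and greedily forwards toward the peer whose key is closest to the target; the keys and neighbor table below are made up, and this greedy rule is a generic illustration rather than any specific DHT's routing scheme.

```python
def lookup(target_key, start_peer, peer_key, neighbors):
    """Greedily forward toward the peer whose key is closest to target_key.

    peer_key: peer -> numeric hash key; neighbors: peer -> list of logical neighbors.
    """
    current, route = start_peer, [start_peer]
    while True:
        candidates = [current] + neighbors.get(current, [])
        best = min(candidates, key=lambda p: abs(peer_key[p] - target_key))
        if best == current:              # no neighbor is closer: this peer owns the key
            return current, route
        current = best                   # one hop closer to the responsible peer
        route.append(current)

# Toy usage with made-up keys and a small neighbor table
keys = {"A": 10, "B": 40, "C": 70, "D": 95}
nbrs = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(lookup(66, "A", keys, nbrs))       # ('C', ['A', 'B', 'C'])
```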

  22. Distributed Hash Table (DHT) in action: each peer node holds a slice of the (K, V) table

  23. DHT in action

  24. DHT in action: put() - a peer that wants to share a file calls insert(K1, V1); the operation routes the message "I have the file" to the node holding key K1

  25. DHT in action: put() - the operation takes the key as input and routes the message to the node holding that key, where (K1, V1) is stored

  26. DHT in action: get() - retrieve(K1) fetches the message V1 from the node holding key K1

  27. DHT in action - retrieve the file from the peer identified by V1
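Putting these frames together, a self-contained toy put()/get() in the same spirit; the "route to the node holding the key" step is shortcut here by computing the responsible peer locally, whereas a real DHT would reach it over multiple hops.

```python
import hashlib

def key_of(name):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

class ToyDHT:
    def __init__(self, peer_names):
        self.peer_keys = {p: key_of(p) for p in peer_names}
        self.tables = {p: {} for p in peer_names}        # each peer's K -> V slice

    def _owner(self, key):
        k = key_of(key)
        return min(self.peer_keys, key=lambda p: abs(self.peer_keys[p] - k))

    def put(self, key, value):
        self.tables[self._owner(key)][key] = value       # insert(K1, V1)

    def get(self, key):
        return self.tables[self._owner(key)].get(key)    # retrieve(K1) -> V1

dht = ToyDHT(["peerA", "peerB", "peerC"])
dht.put("xyz.mp3", "192.1.2.3")                          # the "I have the file" record
print(dht.get("xyz.mp3"))                                # tells us whom to download from
```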

  28. Still Flooding
  • The network is still flooded, although intermediate nodes do not need to search
  • Can we avoid flooding?

  29. CAN - Content Addressable Network
  • Each peer is responsible for one zone, i.e., it stores all (key, value) pairs of that zone
  • Each peer knows the neighbors of its zone
  • Peers are assigned to zones at random at startup - a zone is split if it is not empty
  • Dimensional-ordered multihop routing
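A hedged sketch of the zone-splitting step at join time, assuming a 2-D coordinate space [0,1) x [0,1) and a split along the longer side of the zone; the zone representation and peer names are illustrative only.

```python
import random

def split_zone(zone):
    """Split a rectangular zone (x_lo, x_hi, y_lo, y_hi) in half along its longer side."""
    x_lo, x_hi, y_lo, y_hi = zone
    if x_hi - x_lo >= y_hi - y_lo:
        mid = (x_lo + x_hi) / 2
        return (x_lo, mid, y_lo, y_hi), (mid, x_hi, y_lo, y_hi)
    mid = (y_lo + y_hi) / 2
    return (x_lo, x_hi, y_lo, mid), (x_lo, x_hi, mid, y_hi)

def join(zones, new_peer):
    """zones: peer -> zone. The new peer lands at a random point and splits that zone."""
    x, y = random.random(), random.random()
    owner = next(p for p, (xl, xh, yl, yh) in zones.items()
                 if xl <= x < xh and yl <= y < yh)
    zones[owner], zones[new_peer] = split_zone(zones[owner])

zones = {"peerA": (0.0, 1.0, 0.0, 1.0)}   # the first peer owns the whole space
join(zones, "peerB")
join(zones, "peerC")
print(zones)                               # three zones partitioning the coordinate space
```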

  30. CAN: Object Publishing - node I calls I::publish(K,V)

  31. CAN: Object Publishing - (1) compute a = hx(K)

  32. CAN: Object Publishing - (1) compute a = hx(K) and b = hy(K)

  33. CAN: Object Publishing - (1) a = hx(K), b = hy(K); (2) route (K,V) -> J

  34. CAN: Object Publishing - (1) a = hx(K), b = hy(K); (2) route (K,V) -> J; (3) J stores (K,V)

  35. CAN: Object Retrieval - node I calls retrieve(K): (1) a = hx(K), b = hy(K); (2) route "retrieve(K)" to J, the node in charge of point (a,b)
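A compact sketch of the publish and retrieve steps above. The hx/hy below are illustrative hash functions mapping a key into [0,1), and the routing to J is shortcut by locating the owning zone directly rather than hopping dimension by dimension as a real CAN would.

```python
import hashlib

def _h(key, salt):
    digest = hashlib.sha1((salt + key).encode()).hexdigest()
    return int(digest, 16) / 16**40            # map the key into [0, 1)

def hx(key): return _h(key, "x")               # illustrative stand-ins for hx, hy
def hy(key): return _h(key, "y")

def owner_of(zones, a, b):
    """Find the peer whose zone contains the point (a, b)."""
    return next(p for p, (xl, xh, yl, yh) in zones.items()
                if xl <= a < xh and yl <= b < yh)

def publish(zones, stores, key, value):
    a, b = hx(key), hy(key)                    # (1) hash the key to a point
    j = owner_of(zones, a, b)                  # (2) "route" (K, V) to J (located directly here)
    stores.setdefault(j, {})[key] = value      # (3) J stores (K, V)

def retrieve(zones, stores, key):
    j = owner_of(zones, hx(key), hy(key))      # the same point leads to the same node J
    return stores.get(j, {}).get(key)

zones = {"peerJ": (0.0, 1.0, 0.0, 1.0)}        # toy: a single zone covers everything
stores = {}
publish(zones, stores, "xyz.mp3", "192.1.2.3")
print(retrieve(zones, stores, "xyz.mp3"))      # '192.1.2.3'
```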

  36. Maintenance
  • Each peer informs its neighbors that it is alive at a discrete time interval t
  • If a neighbor does not send an alive message within time t, take over its zone
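A toy version of this liveness rule, with time passed in as a plain number (an assumption for the sketch) rather than read from a clock, and zones represented by opaque labels.

```python
HEARTBEAT_INTERVAL = 1.0   # the discrete interval t from the slide (arbitrary unit)

def check_neighbors(now, last_seen, neighbor_zones, my_zones):
    """Take over the zone of any neighbor whose 'alive' message is overdue.

    last_seen: neighbor -> time its last alive message arrived
    neighbor_zones: neighbor -> the zone it is responsible for
    my_zones: list of zones this peer currently owns
    """
    for neighbor in list(last_seen):
        if now - last_seen[neighbor] > HEARTBEAT_INTERVAL:    # neighbor presumed dead
            my_zones.append(neighbor_zones.pop(neighbor))     # take over its zone
            del last_seen[neighbor]
    return my_zones

# Toy usage with simulated timestamps: peerC is overdue, so its zone is taken over
print(check_neighbors(now=3.0,
                      last_seen={"peerB": 2.5, "peerC": 1.0},
                      neighbor_zones={"peerB": "zoneB", "peerC": "zoneC"},
                      my_zones=["zoneA"]))                    # ['zoneA', 'zoneC']
```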

  37. P2P Benefits
  • Efficient use of resources: uses otherwise unused bandwidth, storage, and processing power at the edge of the network
  • Scalability: consumers of resources also donate resources
  • Reliability: replicas and geographic distribution mean no single point of failure
  • Ease of administration: self-organized nodes, built-in reliability and load balancing

  38. Some Prototypes at UCF
  • iSEE (Internet-scale Sensor Exploration Environment): publishing, browsing, and querying real-time sensor data
  • P2P video streaming for VoD and live broadcast applications
