Search in Distributed Networks
Lecture: Peer-to-peer networks
Professor: Dr. Robert Tolksdorf
Elena Antonenko, elena.Antonenko@web.de
Malte Münchert, muencher@inf.fu-berlin.de
Jing Zhao, zhao@inf.fu-berlin.de
Shunfeng Zhang, zhang@inf.fu-berlin.de
Language of the talk
• English instead of German!
• Comment: German is also a very beautiful language!
• Questions can be asked in German!
Structure of our talk
• Introduction
• Content-Agnostic Search (Shunfeng)
• Content-Based Search (Elena)
• Pastry (Malte)
• JXTA Search (Jing)
Introduction
• Most applications (file sharing, instant messaging, chat) involve
  • finding objects and resources of interest
  • exchanging resources with other peers
• Both are accomplished by a system of advertisements and queries
• Advertisement/query model:
  • Resource providers publish resources and resource consumers send search queries; or
  • Resource seekers advertise their needs on the network and resource providers query the network for those needs
• The problem reduces to:
  • advertisement consumers querying a dynamic, distributed directory of advertisements
• The distributed directory is built using a subset of all the peers in the network
Content-Agnostic Search >>> basic concept
• The organization of the peers does not depend on the resources they index or point to
Content-Agnostic Search >>> central mediator
• Peers register content with the central server
• Peers query the central server for information
• Roles of the central server:
  • Matchmaker
  • Broker
Content-Agnostic Search >>> central mediator as Matchmaker
• [Figure: message exchange between Requester, Matchmaker and Peer. Labels: Advertise, Unadvertise; ASK-ALL: "who can help?"; Reply: name1 + info1…; STREAM-ALL "request"; REPLY…]
Content-Agnostic Search >>> central mediator as Matchmaker
• Requester: an agent with an objective that it wants some other agent to achieve.
• Matchmaker: an agent that knows the names of many agents and their corresponding capabilities.
• Server: an agent that has committed itself to fulfilling objectives on behalf of other agents.
Content-Agnostic Search >>> central mediator as Broker
• [Figure: message exchange between Requester, Broker and Peer. Labels: Advertise, Unadvertise; STREAM-ALL: "Request"; REPLY]
Content-Agnostic Search >>> central mediator as Broker
• Requester: an agent with an objective that it wants another agent to achieve.
• Broker: an agent that
  • knows the names of some other agents and their corresponding capabilities, and
  • advertises its own capabilities as some function of the capabilities of these other agents.
• Brokered Server: an agent that has committed to the broker to take on a predetermined class of objectives.
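The difference between the two mediator roles can be sketched in a few lines of Python. This is only an illustration under assumptions: the class and method names (`Peer`, `Matchmaker`, `Broker`, `advertise`, `unadvertise`, `ask_all`, `request`) are hypothetical and merely echo the message labels on the slides. The matchmaker only answers "who can help?", while the broker also forwards the request and relays the replies.

```python
class Peer:
    """A server agent that fulfils objectives for one named capability."""

    def __init__(self, name, capability):
        self.name = name
        self.capability = capability

    def handle(self, request):
        return f"{self.name} handled {request!r}"


class Matchmaker:
    """Knows agent names and capabilities; only reports who can help."""

    def __init__(self):
        self.directory = {}  # capability -> list of advertising peers

    def advertise(self, peer):
        self.directory.setdefault(peer.capability, []).append(peer)

    def unadvertise(self, peer):
        peers = self.directory.get(peer.capability, [])
        if peer in peers:
            peers.remove(peer)

    def ask_all(self, capability):
        # The requester receives the matching peers and contacts them itself.
        return list(self.directory.get(capability, []))


class Broker(Matchmaker):
    """Additionally forwards the request and relays the peers' replies."""

    def request(self, capability, request):
        return [peer.handle(request) for peer in self.ask_all(capability)]


broker = Broker()
broker.advertise(Peer("peer1", "mp3"))
print([p.name for p in broker.ask_all("mp3")])  # matchmaker-style answer
print(broker.request("mp3", "song"))            # broker-style answer
```

With a matchmaker the requester needs a second message exchange with the peer; with a broker the requester never learns which peer did the work.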
Content-Agnostic Search >>> central mediator
• Advantages
  • Comprehensive
  • Fast updates
  • Minimized message exchange
• Disadvantages
  • Single point of failure
  • Not scalable
  • Needs a central authority
• Comment: these can be solved with decentralized mediators
Content-Agnostic Search >>> networks forming randomly connected graphs
• Nodes are connected to a few random neighbors
• Example: Gnutella network
  • Already covered in the 2nd talk of the lecture
• Power law networks
  • The search takes advantage of the power-law link distribution of naturally occurring networks
Content-Agnostic Search >>> Power Law Networks
• Power law distribution:
  • a few nodes have very high connectivity
  • many nodes have very low connectivity
• Rule: at each step, one new node joins with two edges, which connect preferentially to nodes with higher degree
• Power law graphs are constructed dynamically
• Nodes are not rewired at random; they attach preferentially to the most connected nodes
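The growth rule above can be sketched as follows. This is a minimal illustration, assuming a fully connected starting core and degree-proportional sampling; the function name `grow_power_law_graph` and its parameters are hypothetical and not taken from the talk.

```python
import random


def grow_power_law_graph(num_nodes, edges_per_node=2, seed=0):
    """Grow a graph by preferential attachment (needs num_nodes > edges_per_node)."""
    rng = random.Random(seed)
    # Assumption: start from a small fully connected core.
    core = range(edges_per_node + 1)
    adjacency = {i: {j for j in core if j != i} for i in core}
    # A node appears in this list once per incident edge, so a uniform
    # draw from it picks nodes with probability proportional to degree.
    endpoints = [n for n, nbrs in adjacency.items() for _ in nbrs]
    for new in range(edges_per_node + 1, num_nodes):
        targets = set()
        while len(targets) < edges_per_node:
            targets.add(rng.choice(endpoints))
        adjacency[new] = set()
        for target in sorted(targets):
            adjacency[new].add(target)
            adjacency[target].add(new)
            endpoints.extend([new, target])
    return adjacency


graph = grow_power_law_graph(1000)
degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
print("highest degrees:", degrees[:5])
print("median degree:", degrees[len(degrees) // 2])
```

Running it shows the expected shape: a handful of hubs with many links and a large majority of nodes that keep only a few.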
Content-Agnostic Search >>> Power Law Networks
• The power-law search algorithm requires a modification of the basic Gnutella approach
• The Gnutella approach
  • Broadcasts queries to all neighbors
  • Can exchange with every neighbor
• Modified Gnutella
  • Forwards queries to the neighbor with the highest number of connections
  • Exchanges with the first- and second-degree neighbors
• Advantages of PLN
  • A network of decentralized mediators
  • Broadcasting queries to all neighbors is avoided
  • Search cost is reduced
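A rough sketch of the modified search follows. It assumes that every node can see the items held by its direct neighbors and that the query is always handed to the not-yet-visited neighbor with the most connections. The function name `high_degree_search` and the `adjacency`/`items` dictionaries are hypothetical and only illustrate the idea.

```python
def high_degree_search(adjacency, items, start, wanted, max_hops=50):
    """Walk toward well-connected nodes; return (holder, hops) or (None, hops)."""
    current, visited = start, {start}
    for hops in range(max_hops + 1):
        # Assumption: a node knows its own items and those of its neighbors.
        for node in [current, *sorted(adjacency[current])]:
            if wanted in items.get(node, ()):
                return node, hops
        candidates = sorted(n for n in adjacency[current] if n not in visited)
        if not candidates:
            return None, hops
        # Forward to one neighbor only: the one with the highest degree.
        current = max(candidates, key=lambda n: len(adjacency[n]))
        visited.add(current)
    return None, max_hops


adjacency = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1, 4}, 4: {3}}
items = {4: {"song.mp3"}}
print(high_degree_search(adjacency, items, start=0, wanted="song.mp3"))
```

Each hop sends a single message instead of a broadcast, which is where the reduced search cost comes from.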
Content-Based Search: Introduction
• The content of queries is used to route messages efficiently to the most relevant peers
• Search techniques include:
  • Content-mapping networks
  • Some variations of publish/subscribe networks
Content-Mapping Search Networks
• Each peer in the network indexes a "zone" of the advertisement space
• The zone is dynamic
• The size of the zone depends on the number of peers
• Peers map advertisement content to the space
• The mapping is performed using hash functions
• Examples include: CAN, Chord, Tapestry, Pastry
Distributed Hash Table (DHT)
• A DHT provides the same functionality as a traditional hash table
• A DHT stores key-value pairs
• The data structure is distributed over different nodes
• Provides the functions:
  • insert(id, item);
  • item = query(id);
• An item can be anything: a data object, document, file, or pointer to a file
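The insert/query interface can be illustrated with a toy, in-process version. The class name `ToyDHT` is hypothetical, and the placement rule (SHA-1 of the id modulo the number of nodes) is a deliberately naive stand-in chosen for brevity; CAN, Chord and Pastry each use their own rule for deciding which node is responsible for an id.

```python
import hashlib


class ToyDHT:
    """Key-value store whose entries are spread over several named nodes."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}  # per-node storage
        self._names = sorted(self.nodes)

    def _responsible_node(self, key_id):
        # Naive placement (assumption): hash the id, take it modulo node count.
        digest = hashlib.sha1(str(key_id).encode("utf-8")).digest()
        return self._names[int.from_bytes(digest, "big") % len(self._names)]

    def insert(self, key_id, item):
        self.nodes[self._responsible_node(key_id)][key_id] = item

    def query(self, key_id):
        return self.nodes[self._responsible_node(key_id)].get(key_id)


dht = ToyDHT(["n1", "n2", "n3"])
dht.insert("jingle-bells.mp3", "pointer to the file")
print(dht.query("jingle-bells.mp3"))
```

The caller never names a node: the id alone determines where the item is stored and where the query is answered.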
Content Addressable Network (CAN)
• CAN is based on a virtual d-dimensional coordinate space
• Each node and item is assigned a unique id in the d-dimensional space
• Goals
  • Scale to hundreds of thousands of nodes
  • Handle rapid arrival and failure of nodes
CAN Example: Two-Dimensional Space
• The space is divided between the nodes
• Together, the nodes cover the entire space
• Each node covers either a square or a rectangular area
• Example: node n1:(1, 2) is the first node to join and covers the entire space
• Node n2:(4, 2) joins; the space is divided between n1 and n2
• Node n3:(3, 5) joins too
• Nodes n4:(5, 5) and n5:(6, 6) join
• Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
• Items: f1:(2, 3); f2:(5, 0); f3:(2, 1); f4:(7, 5)
• Each item is stored by the node that owns the zone its id maps to
CAN: Query Example
• Each node knows its neighbors in the d-space
• A query is forwarded to the neighbor that is closest to the query id
• Example: assume n1 queries f4
CAN Routing
• For d dimensions with n equal zones, each node has 2d neighbors
• Routing table size: O(d)
• Guarantees that a file is found in at most d × n^(1/d) steps, where n is the total number of nodes
• Algorithm: choose the neighbor nearest to the destination
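The greedy rule can be sketched on a regular grid of equal zones. The sketch makes simplifying assumptions: zones are addressed by integer coordinates, the grid has `side` zones per dimension, and there is no wrap-around, so zones on the border have fewer than 2d neighbors. The names `neighbors`, `squared_distance` and `greedy_route` are hypothetical.

```python
def neighbors(zone, side):
    """Zones that differ from `zone` by one step along exactly one axis."""
    result = []
    for axis in range(len(zone)):
        for step in (-1, 1):
            coord = zone[axis] + step
            if 0 <= coord < side:
                result.append(zone[:axis] + (coord,) + zone[axis + 1:])
    return result


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_route(source, target, side):
    """Repeatedly forward to the neighbor nearest to the target zone."""
    path = [source]
    while path[-1] != target:
        nearest = min(neighbors(path[-1], side),
                      key=lambda zone: squared_distance(zone, target))
        path.append(nearest)
    return path


side, d = 4, 2                      # n = side ** d = 16 equal zones
path = greedy_route((0, 0), (3, 3), side)
print(len(path) - 1, "hops; bound:", d * side)
```

Every hop moves one zone closer along some axis, so the hop count stays below d × n^(1/d), which is d × side for this grid.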
CAN: Multi-Dimension
• Increasing the number of dimensions reduces the path length
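The effect can be checked by evaluating the bound d × n^(1/d) from the routing slide for a fixed number of nodes. The helper name `can_path_bound` and the choice of n = 4096 are illustrative assumptions.

```python
def can_path_bound(num_nodes, dimensions):
    """Upper bound d * n**(1/d) on routing steps, as quoted for CAN routing."""
    return dimensions * num_nodes ** (1.0 / dimensions)


for d in (2, 3, 4, 6):
    print(f"d={d}: at most about {can_path_bound(4096, d):.0f} steps")
```

For n = 4096 the bound drops from about 128 steps at d = 2 to about 24 steps at d = 6, at the price of a routing table that grows with d. The gain is not unlimited: for a fixed n the expression stops shrinking once d passes roughly ln n.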
Chord: Introduction
• Chord is a distributed lookup protocol
• Given a key (data item), it maps the key onto a node (peer)
• A hash function assigns each node and key an m-bit identifier
• A node's identifier is produced by hashing the node's IP address
• A key identifier is produced by hashing the key
  • ID(node) = hash(196.178.0.1)
  • ID(key) = hash("jingle-bells.mp3")
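A minimal sketch of the identifier assignment follows. The slide only says "hash function"; using SHA-1 and truncating to m bits with a modulo are assumptions made here for illustration, and `chord_id` is a hypothetical helper name.

```python
import hashlib


def chord_id(value, m=160):
    """Map a string (IP address or key) to an m-bit identifier."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


node_id = chord_id("196.178.0.1", m=8)
key_id = chord_id("jingle-bells.mp3", m=8)
print(node_id, key_id)  # both lie in the range 0..255
```

Because nodes and keys are hashed into the same identifier space, a key can be assigned to a node purely by comparing identifiers.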
Chord: Data Structure
• Identifiers are ordered on a virtual ring of size 2^m
• Each node maintains
  • a finger table: entry i in the finger table of node n is the first node that succeeds or equals n + 2^i, i.e. successor(n + 2^i)
  • its predecessor node
• An item identified by id is stored on the successor node of id
Chord: Example
• Assume an identifier space 0..7
• Node n1:(1) joins; all entries in its finger table are initialized to itself
• Nodes n2:(2), n0:(0), n6:(6) join
• Nodes: n0:(0), n1:(1), n2:(2), n6:(6)
• Items: f1:(1), f7:(7)
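The finger tables and item placement for this example can be computed directly from the definitions (identifier space 0..7, so m = 3). The helper names `successor` and `finger_table` are hypothetical, and finger entries are indexed from i = 0.

```python
def successor(ident, nodes, m):
    """First node on the ring whose id equals or follows `ident`."""
    ident %= 2 ** m
    ring = sorted(nodes)
    for node in ring:
        if node >= ident:
            return node
    return ring[0]  # wrap around the ring


def finger_table(node, nodes, m):
    """Entry i is successor(node + 2**i) for i = 0 .. m-1."""
    return [successor(node + 2 ** i, nodes, m) for i in range(m)]


nodes, m = [0, 1, 2, 6], 3
for n in nodes:
    print(f"n{n}: fingers -> {finger_table(n, nodes, m)}")
for item in (1, 7):
    print(f"f{item} is stored on n{successor(item, nodes, m)}")
```

Item f1 lands on n1 because a node with that exact id exists, while f7 wraps around the ring and is stored on n0.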
Chord: Example
• Upon receiving a query for item id, a node
  • checks whether it stores the item locally
  • if not, forwards the query to the largest node in its finger (successor) table that does not exceed id
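The forwarding rule can be traced with a short sketch. It assumes the finger tables of the example ring (nodes 0, 1, 2, 6 in the identifier space 0..7) are already known, and it interprets "does not exceed id" clockwise on the ring. When no finger qualifies, the query is handed to the immediate successor. The function name `chord_lookup` and the `fingers`/`stored` dictionaries are hypothetical.

```python
def chord_lookup(start, item_id, fingers, stored, m):
    """Return the list of nodes a query visits until the item is found."""
    size = 2 ** m
    current, path = start, [start]
    while item_id not in stored[current]:
        gap = (item_id - current) % size  # clockwise distance to the item id
        # Fingers that lie between the current node and the item id.
        preceding = [f for f in fingers[current]
                     if 0 < (f - current) % size <= gap]
        if preceding:
            current = max(preceding, key=lambda f: (f - current) % size)
        else:
            current = fingers[current][0]  # immediate successor
        path.append(current)
        if len(path) > size:
            raise LookupError(f"item {item_id} is not stored on the ring")
    return path


fingers = {0: [1, 2, 6], 1: [2, 6, 6], 2: [6, 6, 6], 6: [0, 0, 2]}
stored = {0: {7}, 1: {1}, 2: set(), 6: set()}
print(chord_lookup(1, 7, fingers, stored, m=3))
print(chord_lookup(6, 1, fingers, stored, m=3))
```

Each hop covers a large part of the remaining clockwise distance, which is the intuition behind the logarithmic number of steps.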
Chord: Properties
• Routing table size is O(log N), where N is the total number of nodes
• Guarantees that a file is found in O(log N) steps
Pastry - Introduction
• Decentralized and scalable DHT network
• Designed for efficient message routing between nodes
What does DHT mean?
• Distributed Hash Table
• Every peer has a hash value
• Every peer knows some other peers (stored in a hash table)
• Together, the hash tables of all peers form a complete map of all peers
The Pastry namespace
• Peers reside on a virtual circle made up of all possible addresses
• [Figure: the circular namespace, labeled 2^0 and 2^128; blue points represent peers]
Pastry routing
• A message is sent to the (known) node that is numerically closest to the target node
• The procedure is repeated until the target node is reached
• [Figure labels: Origin, Closest to target, Distance, Destination]
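The routing step described above can be sketched as a loop that always forwards to the numerically closest known node. This follows the simplified description on the slide only: actual Pastry routing is prefix-based and uses a routing table plus a leaf set, which this sketch ignores. The names `ring_distance`, `route` and the `known` dictionary are hypothetical.

```python
NAMESPACE = 2 ** 128  # size of the circular id space


def ring_distance(a, b, size=NAMESPACE):
    """Shortest distance between two ids on the circular namespace."""
    d = abs(a - b) % size
    return min(d, size - d)


def route(origin, target, known, size=NAMESPACE):
    """Forward to the closest known node until no known node is closer."""
    path, current = [origin], origin
    while current != target:
        best = min(known.get(current, ()),
                   key=lambda n: ring_distance(n, target, size),
                   default=current)
        if ring_distance(best, target, size) >= ring_distance(current, target, size):
            break  # nobody closer is known; current is the closest node reached
        current = best
        path.append(current)
    return path


# Small 8-bit namespace for readability (assumption; Pastry ids are 128-bit).
known = {10: [50, 200], 50: [90, 120], 90: [100], 200: []}
print(route(10, 100, known, size=256))
```

Because every hop strictly reduces the distance to the target, the loop always terminates, either at the target or at the closest node that could be reached.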