Routing Indices For P-to-P Systems ICDCS 2002
Introduction • Search in a P2P system • Mechanisms without an index • Mechanisms with specialized index nodes (centralized search) • Mechanisms with indices at each node • Structured P2P network • Unstructured P2P network • Parallel vs. sequential search • Response time • Network traffic
Routing indices (RI) • Query • Documents are on zero or more “topics”, and queries request documents on particular topics. • Document topics are independent • Local index • RI • Each node has a local routing index which contains the following information • The number of documents along each path • The number of documents on each topic of interest • Allows a node to select the “best” neighbors to send a query to
The RI may be “coarser” than the local indices • Overcounts • Undercounts
Goodness measure • Number of results in a path • Using Routing indices
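A minimal sketch of how a node might turn its routing index into a goodness score, assuming the independence of topics stated on the earlier slide: the fraction of documents matching every query topic is taken to be the product of the per-topic fractions. The neighbor names, topic names, and counts below are hypothetical, and `estimate_goodness` and `rank_neighbors` are illustrative names, not taken from the slides.

```python
from math import prod


def estimate_goodness(total_docs, topic_counts, query_topics):
    """Estimate how many documents along one path match all query topics.

    Assumption: topics are independent, so per-topic fractions multiply.
    """
    if total_docs == 0:
        return 0.0
    fractions = (topic_counts.get(t, 0) / total_docs for t in query_topics)
    return total_docs * prod(fractions)


# Hypothetical routing index: neighbor -> (documents along path, per-topic counts)
routing_index = {
    "B": (100, {"databases": 20, "networks": 0, "languages": 30}),
    "C": (1000, {"databases": 0, "networks": 300, "languages": 50}),
    "D": (200, {"databases": 100, "networks": 0, "languages": 150}),
}


def rank_neighbors(ri, query_topics):
    """Return neighbors ordered from best to worst estimated goodness."""
    scores = {
        neighbor: estimate_goodness(total, counts, query_topics)
        for neighbor, (total, counts) in ri.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Because the counts are summaries, the score is only an estimate; it can overcount or undercount the true number of matching documents.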
Storage space • N: number of nodes in the P2P network • b: branching factor • c: number of categories • s: counter size in bytes • Centralized index: s*(c+1)*N • Distributed system: s*(c+1)*b (at each node)
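The two storage formulas can be compared with a few lines of arithmetic. The parameter values used in the comments are hypothetical and chosen only to illustrate the scale; the function names are illustrative.

```python
def centralized_index_bytes(s, c, n):
    """Storage for one centralized index: s * (c + 1) * N."""
    return s * (c + 1) * n


def routing_index_bytes_per_node(s, c, b):
    """Storage for the routing index held by a single node: s * (c + 1) * b."""
    return s * (c + 1) * b


def routing_index_bytes_total(s, c, b, n):
    """Storage summed over all N nodes (per-node cost times N)."""
    return routing_index_bytes_per_node(s, c, b) * n


# Hypothetical values: 4-byte counters, 1000 categories, 60000 nodes, 4 neighbors.
# centralized_index_bytes(4, 1000, 60000)  is 240240000 bytes on one server
# routing_index_bytes_per_node(4, 1000, 4) is 16016 bytes on each node
```

The distributed total is b times larger than the centralized index, but the cost is spread across the nodes.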
Maintaining Routing Indices • Trade-off between RI freshness and update cost • Does not require the participation of a disconnecting node • Discussion • What if the search topics are dependent? • Can the number of “hops” necessary to reach a document be estimated?
Alternative Routing Indices • Hop-count RI • Aggregated RIs for each “hop” up to a maximum number of hops are stored
Search cost • Number of messages • The goodness of a neighbor • The ratio between the number of documents available through that neighbor and the number of messages required to get those documents • Regular tree with fanout F • It takes F^h messages to find all documents at hop h • Storage cost?
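The documents-per-message ratio can be sketched for a hop-count RI as follows, assuming the regular-tree cost model in which reaching everything at hop h costs F**h messages. Indexing hops from 0 (the neighbor itself) is an assumption of this sketch, and `hop_count_goodness` is an illustrative name.

```python
def hop_count_goodness(docs_per_hop, fanout):
    """Sum, over hops, the estimated documents divided by the message cost.

    docs_per_hop[h] is the estimated number of matching documents exactly
    h hops away through this neighbor (h = 0 is the neighbor itself, an
    assumption of this sketch). The cost at hop h is taken as fanout ** h.
    """
    return sum(docs / fanout ** h for h, docs in enumerate(docs_per_hop))


# With fanout 2, a neighbor holding 13 documents itself and 10 one hop
# further scores higher than a neighbor with 31 documents all one hop away.
near = hop_count_goodness([13, 10], 2)
far = hop_count_goodness([0, 31], 2)
```

The second neighbor has more documents in total, but they cost more messages to reach, so its score is lower.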
Exponentially aggregated RI • Store the result of applying the regular-tree cost formula to a hop-count RI • How to compute the goodness of a path for a query containing several topics?
Improving Search in Peer-to-Peer Networks ICDCS 2002 Beverly Yang Hector Garcia-Molina
Outline • Introduction • Techniques • Experiment
Introduction • We present three techniques for efficient search in P2P systems. • The basic idea is to reduce the number of nodes that process a query
Current Techniques • Gnutella • BFS with depth limit D. • Wastes bandwidth and processing resources • Freenet • DFS with depth limit D. • Poor response time.
Iterative Deepening • Under policy P = {a, b, c} and waiting time W • See example.
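A simplified simulation of iterative deepening on an in-memory graph: a BFS is run to each depth in the policy in turn, stopping as soon as enough results are found. This sketch restarts the search from the source at every iteration and does not model the waiting time W; the graph, the `has_result` predicate, and the function names are hypothetical.

```python
from collections import deque


def bfs_within(graph, source, depth_limit):
    """Return the nodes within depth_limit hops of source, in BFS order."""
    seen = {source}
    frontier = deque([(source, 0)])
    reached = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == depth_limit:
            continue  # do not expand beyond the depth limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return reached


def iterative_deepening(graph, source, has_result, policy, wanted):
    """Search to each depth in policy until at least `wanted` results exist.

    Returns (results, depth_used). If the policy is exhausted, the results
    of the deepest search are returned.
    """
    results, depth = [], None
    for depth in policy:
        results = [n for n in bfs_within(graph, source, depth) if has_result(n)]
        if len(results) >= wanted:
            break
    return results, depth


# Hypothetical overlay: s - a - c - d, plus s - b.
graph = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s"], "c": ["a", "d"], "d": ["c"]}
holders = {"c", "d"}
```

With policy [1, 2, 3], a query that needs one result stops at depth 2 and never contacts node d.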
Directed BFS • A source sends query messages to just a subset of its neighbors • A node maintains simple statistics on its neighbors • Number of results received from each neighbor • Latency of connection
Candidate nodes • Returned the highest number of results • Low hop count • High message count
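Neighbor selection for directed BFS can be sketched as a ranking over per-neighbor statistics. The `NeighborStats` fields, the neighbor names, and the choice of sending to the top k neighbors are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Hypothetical statistics a node keeps about one neighbor."""
    results_returned: int = 0
    avg_hops: float = 0.0
    messages_forwarded: int = 0


def pick_neighbors(stats, k, key=lambda s: s.results_returned):
    """Return the k neighbors with the highest value of `key`."""
    ranked = sorted(stats, key=lambda name: key(stats[name]), reverse=True)
    return ranked[:k]


stats = {
    "n1": NeighborStats(results_returned=5, avg_hops=3.0, messages_forwarded=40),
    "n2": NeighborStats(results_returned=12, avg_hops=4.5, messages_forwarded=10),
    "n3": NeighborStats(results_returned=1, avg_hops=1.5, messages_forwarded=90),
}

by_results = pick_neighbors(stats, 2)
# Negate avg_hops so that a lower hop count ranks higher.
by_low_hops = pick_neighbors(stats, 2, key=lambda s: -s.avg_hops)
by_messages = pick_neighbors(stats, 2, key=lambda s: s.messages_forwarded)
```

Passing the heuristic as a `key` function keeps the selection logic the same for every candidate criterion.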
Local Indices • Each node n maintains an index over the data of all nodes within a radius of r hops. • All nodes at depths not listed in the policy simply forward the query. • Example: policy P = {1, 5}
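Building a local index of radius r can be sketched as a bounded BFS followed by a merge of the data held by every node reached. The graph, the `data` mapping, and the function names are hypothetical; a real system would build the index through message exchange rather than global knowledge.

```python
from collections import deque


def nodes_within(graph, source, r):
    """Return the set of nodes at most r hops from source (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue  # stop expanding at the radius
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def build_local_index(graph, data, node, r):
    """Map each item to the nodes within r hops of `node` that store it."""
    index = {}
    for owner in nodes_within(graph, node, r):
        for item in data.get(owner, ()):
            index.setdefault(item, set()).add(owner)
    return index


# Hypothetical chain a - b - c - d with one or two items per node.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
data = {"a": ["x"], "b": ["y"], "c": ["x"], "d": ["z"]}
```

A node holding such an index can answer a query on behalf of every node within r hops, which is why nodes at the other depths only need to forward it.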
Experimental Setup • For each response, we log: • Number of hops taken • IP address from which the Response message came • Response time • Individual results
Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems Kunwadee Sripanidkulchai Bruce Maggs Hui Zhang IEEE INFOCOM 2003
Motivation • Although flooding is simple and robust, it is not scalable. • A content location solution in which peers are organized into an interest-based structure on top of Gnutella. • The algorithm is called interest-based shortcuts
Shortcuts Architecture and Design Goals • To create additional links on top of a peer-to-peer system’s overlay • As a separate performance enhancement layer on top of existing content location mechanisms
Shortcut Discovery • The first lookup returns a set of peers that store the content • These are potential candidates. • One peer is selected at random from the set and added to the shortcut list • For scalability, each peer allocates a fixed-size amount of storage to implement shortcuts.
Shortcut selection • We rank shortcuts based on their perceived utility • A peer sequentially asks all of the shortcuts on its list.
Ranking metrics • Probability of providing content • Latency of the path to the shortcut • Load at the shortcut • A combination of metrics can be used based on each peer’s preference
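A minimal sketch of a fixed-size shortcut list that is ranked by one of the metrics above, the observed probability of providing content. The eviction rule (drop the lowest-utility shortcut when the list is full) and all names are assumptions of this sketch.

```python
class ShortcutList:
    """Fixed-capacity list of shortcuts ranked by observed success rate."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.stats = {}  # peer -> [successes, attempts]

    def utility(self, peer):
        """Fraction of lookups this shortcut has answered (0.0 if untried)."""
        successes, attempts = self.stats[peer]
        return successes / attempts if attempts else 0.0

    def add(self, peer):
        """Add a shortcut, evicting the lowest-utility one if full (assumed policy)."""
        if peer in self.stats:
            return
        if len(self.stats) >= self.capacity:
            worst = min(self.stats, key=self.utility)
            del self.stats[worst]
        self.stats[peer] = [0, 0]

    def ranked(self):
        """Return shortcuts from highest to lowest utility."""
        return sorted(self.stats, key=self.utility, reverse=True)

    def lookup(self, item, has_content):
        """Ask shortcuts one at a time, in rank order; return the first hit."""
        for peer in self.ranked():
            self.stats[peer][1] += 1
            if has_content(peer, item):
                self.stats[peer][0] += 1
                return peer
        return None  # caller would fall back to the underlying search
```

When `lookup` returns `None`, the peer would fall back to the underlying content location mechanism, such as Gnutella flooding.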
Performance indices • Success rate • Load characteristics • Query scope • Minimum reply path lengths • Additional state
Potential and Limitations • Adding 5 shortcuts at a time produces success rates that are close to the best possible. • Slightly increasing the shortest path length from 1 to 2 hops yields a better success rate.
Conclusion • A simple and practical mechanism was proposed.
Introduction • Structured P2P network • Only supports search with a single keyword • Similarity between two documents • Keyword sets • Vector space • Measure • Problems • Search problem • New keyword?
Meteorograph • Absolute angle
Publishing and Searching • Publish • Hash • Publish the item to the node np whose hash key is closest to the item's hash value
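The publish step can be sketched as choosing the node whose key is numerically closest to the item's hash value. The sketch assumes keys are plain numbers in a linear (non-wrapping) key space and leaves the hash function itself out; `closest_node` is an illustrative name.

```python
def closest_node(node_keys, item_key):
    """Return the node key closest to item_key (first one wins on a tie).

    Assumption: keys are numbers in a linear key space with no wrap-around.
    """
    if not node_keys:
        raise ValueError("no nodes to publish to")
    return min(node_keys, key=lambda k: abs(k - item_key))


# Hypothetical node keys; an item hashing to 35 would be published at node 40.
target = closest_node([10, 40, 90], 35)
```

A search for the same hash value would be routed to the same node, which is what lets a lookup find the published item.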
Search problem • Nearest answers • k-nearest answers • e • Partial • Comprehensive • Search strategy • Discussions • What happens when the keyword vector is represented by q?
Other issues • Load balance • Changes of vector space • Republished? • Comprehensive set of keywords • Other methods?
SWAM: A Family of Access Methods for Similarity-Search in Peer-to-Peer Data Networks Farnoush Banaei-Kashani Cyrus Shahabi (CIKM04)
PDN access method • Defines • How to organize the PDN topology into an index-like structure • How to use the index structure
Hilbert space • Hilbert space (V, Lp) • Key k = (a1, a2, …, ad) • d: the dimension of the vector space • The domain is a contiguous and finite interval of R • The Lp norm, with p in Z+ • The distance function used to measure dissimilarity
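The Lp distance between two keys can be written directly from its definition. This sketch treats keys as equal-length tuples of numbers; `lp_distance` is an illustrative name.

```python
def lp_distance(a, b, p):
    """Lp distance between two d-dimensional keys, for an integer p >= 1."""
    if len(a) != len(b):
        raise ValueError("keys must have the same dimension")
    if p < 1:
        raise ValueError("p must be a positive integer")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


# p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
manhattan = lp_distance((0, 0), (3, 4), 1)
euclidean = lp_distance((0, 0), (3, 4), 2)
```

A smaller distance means the two tuples are more similar.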
Topology • Topology of a PDN can be modelled as a directed graph G(N, E) • A(n) is the set of neighbors for node n • A node maintains • A limited amount of information about its neighbors, including • The keys of the tuples maintained at neighbors • The physical addresses of neighbors
The processing of the query is completed when all expected tuples in the relevant result set are visited • Access methods • Join, leave for virtual nodes • Forward for using local information to process queries and make forwarding decisions
The small world example • Grid component • Random graph component • The processing of queries (exact, range, kNN) in the high-locality topology
Flat partitioning • SWAM also employs the space partitioning idea: flat partitioning
Query Processing • Exact-Match query processing • Range query processing • kNN Query processing
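The three query types can be pinned down by what they should return over a set of keys. The brute-force functions below only define the expected result sets; they do not model how SWAM forwards a query between nodes. Euclidean (L2) distance and the sample keys are assumptions of this sketch.

```python
import heapq
from math import dist  # Euclidean distance, available in Python 3.8+


def exact_match(keys, q):
    """Keys equal to the query key."""
    return [k for k in keys if k == q]


def range_query(keys, q, radius):
    """Keys whose distance to q is at most radius."""
    return [k for k in keys if dist(k, q) <= radius]


def knn_query(keys, q, k):
    """The k keys closest to q, nearest first."""
    return heapq.nsmallest(k, keys, key=lambda key: dist(key, q))


# Hypothetical two-dimensional keys.
keys = [(0, 0), (1, 1), (5, 5), (2, 0)]
```

A distributed access method is correct for these query types when the tuples it visits cover the same result sets.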
Data Indexing in Peer-to-Peer DHT Networks ICDCS 2004