Data Indexing in Peer-to-Peer DHT Networks
L. Garcés-Erice, P.A. Felber, E.W. Biersack, G. Urvoy-Keller, K.W. Ross
ICDCS 2004
DHT
• Structured P2P
• Distributed Hash Table: a mapping between a file identifier and the file's location
Example:
• Search for the file "Starwars.divx"
• Convert "Starwars.divx" to a key, say "123456789"
• Look up "123456789" in the DHT to find the file location
• Download the file
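The four steps in the example above can be sketched in a few lines of Python. This is a minimal illustration, not the design of any particular DHT: the `ToyDHT` class, the `key_for` helper, the node names, the file location string, and the rule that a key belongs to the first node whose identifier is greater than or equal to it (wrapping around) are all assumptions made for the sketch.

```python
import hashlib
from bisect import bisect_left


def key_for(name: str, bits: int = 32) -> int:
    """Hash a name to an integer key in [0, 2**bits)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """In-process stand-in for a DHT (hypothetical, for illustration only)."""

    def __init__(self, node_names):
        # Node identifiers live in the same key space as file keys.
        self.ring = sorted((key_for(n), n) for n in node_names)
        self.store = {n: {} for n in node_names}

    def responsible_node(self, key: int) -> str:
        # Assumed placement rule: first node id >= key, wrapping around.
        ids = [node_id for node_id, _ in self.ring]
        pos = bisect_left(ids, key) % len(self.ring)
        return self.ring[pos][1]

    def put(self, name: str, location: str) -> None:
        key = key_for(name)
        self.store[self.responsible_node(key)][key] = location

    def get(self, name: str):
        # Convert the name to a key, then ask the responsible node.
        key = key_for(name)
        return self.store[self.responsible_node(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("Starwars.divx", "peer-42:/movies/Starwars.divx")
location = dht.get("Starwars.divx")  # the download would start from here
```

Note that the lookup only works when the exact file name is known, which is the limitation the indexing slides address.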
Indexing
• Indexes do not contain key-to-data mappings
• Indexes provide a key-to-key service, or more precisely a query-to-query service
Example:
• Given a query q, the index returns a list of more specific queries covered by q
• The user selects one of the returned queries and looks it up in turn
• If q is the most specific query of a file, the lookup returns the file
Maintain
• To maintain an index consisting of query-to-query mappings, each node provides two functions:
• Insert(q, qi), with q covering qi: adds a mapping (q; qi) to the index of the node responsible for key q
• Lookup(q), with q not being the most specific query of a file: returns a list of all the queries qi such that there is a mapping (q; qi) in the index of the node responsible for key q
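The Insert and Lookup functions can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a query is modeled as a set of field/value constraints, "q covers qi" is taken to mean that every constraint of q also appears in qi, and a single in-memory `QueryIndex` object (a hypothetical name, like `query`, `covers`, `store_file`, and `search`) stands in for the per-node indexes reached through the DHT.

```python
def query(**fields):
    """Model a query as a frozenset of (field, value) constraints."""
    return frozenset(fields.items())


def covers(q, qi):
    """Assumed covering rule: q is broader, so its constraints are a subset of qi's."""
    return q <= qi


class QueryIndex:
    """Stand-in for the indexes held by the nodes responsible for each key q."""

    def __init__(self):
        self.mappings = {}  # q -> set of more specific queries qi
        self.files = {}     # most specific query -> file

    def insert(self, q, qi):
        if not covers(q, qi):
            raise ValueError("q must cover qi")
        self.mappings.setdefault(q, set()).add(qi)

    def store_file(self, most_specific_query, file):
        self.files[most_specific_query] = file

    def lookup(self, q):
        """Return every qi such that a mapping (q; qi) exists."""
        return sorted(self.mappings.get(q, ()), key=sorted)


def search(index, q):
    """Follow mappings from a broad query down to the files it covers."""
    found, pending, seen = [], [q], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in index.files:
            found.append(index.files[current])
        else:
            pending.extend(index.lookup(current))
    return sorted(found)


index = QueryIndex()
by_author = query(author="Smith")
by_pair = query(author="Smith", title="P2P")
full = query(author="Smith", title="P2P", year="2004")
index.insert(by_author, by_pair)
index.insert(by_pair, full)
index.store_file(full, "p2p.pdf")
```

In an interactive setting the user would pick one entry from each `lookup` result; `search` simply follows all of them automatically.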
Example: bibliographic database (figure: query-to-key and query-to-query mappings)
Discussion
Some interesting properties of this indexing technique:
• Space efficiency
• Scalability
• Loose coupling between data and indexes
• Versatility
• Adaptability
• Decentralized architecture
• Resilience to arbitrary linking
System point of view
• The search process should be simple
• The amount of network traffic should be minimized
• The storage space dedicated to the indexing metadata should remain within reasonable limits
Evaluation
• Distributed bibliographic database
• Bibliographic database sites:
• BibFinder: http://kilimanjaro.eas.asu.edu
• NetBib: http://edas.info/S.cgi?search=1
Indexing scheme (figures: simple indexing scheme, flat indexing scheme, complex indexing scheme)
Indexing scheme
• Simple: a query for an author or a title returns a set of author and title pairs. It is the most space-efficient of the three schemes, requiring 152 MB of extra storage in the system.
• Flat: the index query length is always 2. It requires 37% more space.
• Complex: some queries in the simple scheme are split into more specific queries. It requires 25% more space.
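As a rough illustration of the one behavior of the simple scheme that is stated above (an author query or a title query returns author and title pairs), the index entries could be built like this. The function name `build_simple_index`, the tuple-based query representation, and the sample records are assumptions for the sketch; the full layout of the three schemes is given in the figures and is not reproduced here.

```python
from collections import defaultdict


def build_simple_index(articles):
    """Map each author query and each title query to (author, title) pairs.

    `articles` is an iterable of (author, title) tuples (hypothetical data).
    """
    index = defaultdict(set)
    for author, title in articles:
        pair = (author, title)
        index[("author", author)].add(pair)
        index[("title", title)].add(pair)
    return index


articles = [
    ("Smith", "Indexing in DHTs"),
    ("Smith", "Caching Shortcuts"),
    ("Jones", "Indexing in DHTs"),
]
simple = build_simple_index(articles)

# Counting stored mappings gives a crude proxy for index storage cost.
mapping_count = sum(len(pairs) for pairs in simple.values())
```

Counting mappings this way is only a proxy; the storage figures quoted above come from the evaluation, not from this sketch.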
Caching
• Multi-cache: shortcuts are created on each node along the lookup path. Cache size is unbounded.
• Single-cache: shortcuts are created only on the first node that was contacted. Cache size is unbounded.
• LRU (least recently used): only a limited number of shortcuts can be stored on each node.
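The three caching policies can be sketched with one cache class and one helper. This is a sketch under assumptions: a shortcut is taken to be a direct mapping from a query to the target the lookup eventually reached, and the names `ShortcutCache` and `record_shortcut` as well as the policy strings are hypothetical. A capacity of `None` models the unbounded caches; a finite capacity gives LRU eviction.

```python
from collections import OrderedDict


class ShortcutCache:
    """Per-node shortcut cache; capacity=None means unbounded."""

    def __init__(self, capacity=None):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, query):
        if query not in self.entries:
            return None
        # A hit makes the entry the most recently used one.
        self.entries.move_to_end(query)
        return self.entries[query]

    def put(self, query, target):
        self.entries[query] = target
        self.entries.move_to_end(query)
        if self.capacity is not None and len(self.entries) > self.capacity:
            # Evict the least recently used shortcut.
            self.entries.popitem(last=False)


def record_shortcut(path_caches, query, target, policy):
    """Store a shortcut on the nodes chosen by the policy.

    `path_caches` lists the caches of the contacted nodes in lookup order.
    "multi" writes to every node on the path, "single" only to the first.
    """
    chosen = path_caches if policy == "multi" else path_caches[:1]
    for cache in chosen:
        cache.put(query, target)


# Multi-cache versus single-cache on a three-node lookup path.
multi_path = [ShortcutCache() for _ in range(3)]
single_path = [ShortcutCache() for _ in range(3)]
record_shortcut(multi_path, "author=Smith", "p2p.pdf", "multi")
record_shortcut(single_path, "author=Smith", "p2p.pdf", "single")

# LRU with room for two shortcuts.
lru = ShortcutCache(capacity=2)
lru.put("q1", "f1")
lru.put("q2", "f2")
lru.get("q1")        # q1 becomes the most recently used entry
lru.put("q3", "f3")  # q2 is evicted
```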
Conclusion
• The approach indexes the data stored in the peer-to-peer network. Indexes are distributed across the nodes of the network and contain key-to-key (or query-to-query) mappings.
• Given a broad query, a user can look up the more specific queries that match their original query.