Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems
Kjetil Nørvåg, Norwegian University of Science and Technology, Trondheim, Norway
Christos Doulkeridis and Michalis Vazirgiannis, Athens University of Economics and Business, Athens, Greece
Outline
• Motivation and example application
• Taxonomies and taxonomy-based querying
• Taxonomy-based query routing
• Taxonomy caching: architecture and maintenance
• Experimental results
• Summary and further work
ICPS'2006
Motivation
• Mobile devices have high storage capacity and wireless support
• They contain multimedia documents that can be shared
• Possibly other data/services:
  • Temperature or other environmental data
• Important challenge: find the files and services!
• Problem:
  • Dynamic contents, location, and visibility
  • Limited bandwidth
• Consequence: centralized indexing/search engines are not applicable, so a P2P network and P2P search are used instead
Example application: MobiShare
• Devices share resources by hosting web services
• Each device is connected to a CAS
• The CASs are connected to each other in a P2P network
• [More details in Valavanis et al., Web Intelligence'2003]
Outline of basic idea
1) Describe contents according to a taxonomy
2) Taxonomy information is cached at remote peers
3) Use the cached knowledge to route queries to appropriate peers
Why?
1) Should reduce latency
2) Should increase recall at the same cost
Resource description
• Taxonomy-based resource description
• Also applicable to audio/video
• More than one taxonomy might exist in the system
• Resource description: taxonomy ID and set of categories
Taxonomy-based querying
Query:
1) Request for all resources belonging to category Cj, or
2) Request for all resources belonging to category Cj and satisfying some additional property
Example properties: text contents, metadata
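The two query forms above could be represented roughly as follows. This is only a minimal sketch: the names `Resource`, `Query`, and `matches` are hypothetical, and it assumes that "belonging to category Cj" means exact category membership (the slides do not say whether subcategories also count).

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource description: taxonomy ID plus a set of categories."""
    name: str
    taxonomy_id: str
    categories: FrozenSet[str]
    text: str = ""  # optional text contents, used only by property predicates


@dataclass(frozen=True)
class Query:
    """Form 1: category only. Form 2: category plus an additional property."""
    taxonomy_id: str
    category: str
    predicate: Optional[Callable[[Resource], bool]] = None


def matches(query: Query, resource: Resource) -> bool:
    # Assumption: exact membership in the category, same taxonomy required.
    if resource.taxonomy_id != query.taxonomy_id:
        return False
    if query.category not in resource.categories:
        return False
    return query.predicate is None or query.predicate(resource)


song = Resource("take5.mp3", "music-tax", frozenset({"Jazz"}), text="Dave Brubeck")
by_category = Query("music-tax", "Jazz")
with_property = Query("music-tax", "Jazz", lambda r: "Coltrane" in r.text)
```

Here `matches(by_category, song)` holds, while `with_property` rejects the same resource because the additional text property is not satisfied.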
Searching in unstructured P2P networks
• Basic search technique: local execution of the query, then forwarding if TTL > 0
  • Naïve flooding (all neighbors)
  • Normalized flooding (only K neighbors)
  • Random walks: only one random neighbor, but W walks initiated
• Problem: only a limited number of peers can be searched (query horizon)
• Possible improvements:
  • Routing indices
  • Summary indexing (Bloom filters etc.)
  • Result caching
• However: still limited scalability and coverage
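To make the query horizon concrete, the following sketch simulates TTL-limited flooding on an adjacency-list graph. It is an illustration under assumptions, not the simulator used for the experiments: the function name `flood`, the graph representation, and the rule that duplicate queries are dropped on arrival are all hypothetical. Passing `k` gives normalized flooding; random walks are not covered.

```python
import random


def flood(graph, start, has_match, ttl, k=None, rng=None):
    """Return (peers holding a match, number of query messages sent).

    graph: dict mapping each peer to a list of its neighbors.
    ttl:   maximum number of hops the query may travel from `start`.
    k:     if given, forward to at most k randomly chosen neighbors
           (normalized flooding); otherwise to all neighbors (naive flooding).
    """
    rng = rng or random.Random(0)
    visited = {start}
    hits = {start} if has_match(start) else set()
    frontier = [(start, None)]  # (peer, peer the query came from)
    messages = 0
    while frontier and ttl > 0:
        next_frontier = []
        for peer, previous in frontier:
            neighbors = [n for n in graph[peer] if n != previous]
            if k is not None and len(neighbors) > k:
                neighbors = rng.sample(neighbors, k)
            for n in neighbors:
                messages += 1
                if n in visited:
                    continue  # assumption: duplicate queries are dropped
                visited.add(n)
                if has_match(n):
                    hits.add(n)
                next_frontier.append((n, peer))
        frontier = next_frontier
        ttl -= 1
    return hits, messages


# A chain A - B - C - D where only D holds a matching resource.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
```

With `ttl=2` a query from `A` stops at `C`, so `D` lies beyond the query horizon; with `ttl=3` it is reached.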
Taxonomy caching
• Basic idea:
  • Maintain taxonomic information about remote contents in a taxonomy cache (TCache)
  • Mapping from taxonomic concept to set of peers
• Advantages:
  • Cheaper to maintain than a full-text index
  • More applicable to multimedia data
  • More robust with respect to changes in contents
• Used to improve query routing, giving higher recall and reduced latency
Query routing using taxonomy cache (TCache)
• Basis: one of the traditional routing strategies
• Query forward peers: PF
• Starting point: PF = neighbors = PN = {PN1,…,PNn}
• Lookup in TCache: Lookup(category) → PC = {PC1,…,PCm}
• PF = PN + PC
• Query forwarded to (a subset of) PF
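The construction of PF can be sketched as below. The TCache is modeled here as a plain dictionary from category to `{peer: number of resources}`, which is an assumption for illustration; `forward_candidates` is a hypothetical name, and excluding the peer the query arrived from follows the "excl. previous" convention used for neighbor counts.

```python
def forward_candidates(neighbors, tcache, category, previous=None):
    """Build PF = PN + PC without duplicates, keeping neighbors first.

    neighbors: list of neighbor peer IDs (PN).
    tcache:    dict mapping category -> {peer ID: number of resources}.
    previous:  peer the query arrived from, excluded from forwarding.
    """
    pn = [p for p in neighbors if p != previous]
    pc = [p for p in tcache.get(category, {}) if p != previous]
    seen, pf = set(), []
    for peer in pn + pc:
        if peer not in seen:
            seen.add(peer)
            pf.append(peer)
    return pf


tcache = {"Jazz": {"p9": 4, "n2": 1}}
pf = forward_candidates(["n1", "n2"], tcache, "Jazz", previous="n1")
```

In this example `pf` contains the remaining neighbor `n2` plus the cached remote peer `p9`; which subset of PF actually receives the query depends on the forwarding alternative in use.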
Query forwarding alternatives (1)
• Query forward peers: PF
• # of neighbors (excl. previous): Nn
• # of matches from lookup: Nc
• Ranking of peers in PC:
  • Based on # of resources within a category
  • High # of resources: considered experts
• TCB:
  • Highest ranked in PC + the Nn neighbors in {PN1,…,PNn}
  • Forwarding to a peer in PC is called a jump
  • A jump can be to a peer beyond the query horizon!
• TCA:
  • If Nc ≥ Nn: forward to the Nn highest ranked peers in PC
  • If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) randomly selected neighbors
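A possible reading of TCB and TCA in code follows. The sketch assumes that PC is given as a dictionary `{peer: number of resources in the category}`, that ties in the ranking are broken by peer ID, and that TCB jumps to a single highest-ranked peer; none of these details are fixed by the slide. The function names are hypothetical.

```python
import random


def rank(pc_counts):
    """Peers in PC ordered by resource count, highest first (ties by peer ID)."""
    return sorted(pc_counts, key=lambda p: (-pc_counts[p], p))


def tcb(neighbors, pc_counts):
    """TCB: all Nn neighbors plus a jump to the highest-ranked peer in PC."""
    targets = list(neighbors)
    ranked = rank(pc_counts)
    if ranked and ranked[0] not in targets:
        targets.append(ranked[0])
    return targets


def tca(neighbors, pc_counts, rng=None):
    """TCA: Nn best peers in PC, padded with random neighbors if Nc < Nn."""
    rng = rng or random.Random(0)
    nn = len(neighbors)
    ranked = rank(pc_counts)
    if len(ranked) >= nn:
        return ranked[:nn]
    remaining = [p for p in neighbors if p not in ranked]
    extra = rng.sample(remaining, min(nn - len(ranked), len(remaining)))
    return ranked + extra


neighbors = ["n1", "n2", "n3"]
pc = {"x": 10, "y": 3, "z": 7, "w": 1}
```

Note that under this reading TCB sends one more message per hop than plain flooding, while TCA keeps the number of forwarded messages at Nn.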
Query forwarding alternatives (2)
• TCCN:
  • If Nc ≥ Nn: forward to all Nc peers in PC
  • If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) neighbors
• TCDN:
  • If Nc ≥ Nn: forward to the Nn/2 highest ranked peers in PC + a random selection of Nn/2 other peers in PC
  • If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) neighbors
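TCCN and TCDN could be sketched along the same lines. Several details here are assumptions rather than statements from the slide: PC is a dictionary `{peer: resource count}`, the (Nn-Nc) padding neighbors are taken in list order, and for odd Nn the split is floor(Nn/2) top-ranked peers plus the remainder chosen at random so that Nn peers are contacted in total. All function names are hypothetical.

```python
import random


def rank(pc_counts):
    """Peers in PC ordered by resource count, highest first (ties by peer ID)."""
    return sorted(pc_counts, key=lambda p: (-pc_counts[p], p))


def pad_with_neighbors(chosen, neighbors, total):
    """Append neighbors not already chosen until `total` peers are selected."""
    extra = [p for p in neighbors if p not in chosen]
    return chosen + extra[: max(0, total - len(chosen))]


def tccn(neighbors, pc_counts):
    """TCCN: all peers in PC; padded with neighbors only if Nc < Nn."""
    nn, ranked = len(neighbors), rank(pc_counts)
    if len(ranked) >= nn:
        return ranked
    return pad_with_neighbors(ranked, neighbors, nn)


def tcdn(neighbors, pc_counts, rng=None):
    """TCDN: half experts, half randomly chosen other peers from PC."""
    rng = rng or random.Random(0)
    nn, ranked = len(neighbors), rank(pc_counts)
    if len(ranked) < nn:
        return pad_with_neighbors(ranked, neighbors, nn)
    half = nn // 2  # assumption: floor for the expert half when Nn is odd
    return ranked[:half] + rng.sample(ranked[half:], nn - half)


neighbors = ["n1", "n2", "n3", "n4"]
pc = {"a": 9, "b": 8, "c": 7, "d": 6, "e": 5, "f": 4}
```

The random half in TCDN spreads queries over peers that are not top-ranked, so that forwarding is not concentrated on a few experts.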
Distributing taxonomic information
• Basic mechanism: piggyback the matching category with the query result
• Result is returned through the original path, possibly involving jumps
• Makes revalidation of the contents of intermediate TCaches possible
• Coverage will be gradually extended (beyond the query horizon)
• Lazy distribution by gossiping is also possible
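The piggybacking mechanism might look as follows in a simulation. This is a sketch under assumptions: the query path is recorded as a list of peer IDs from the querying peer to the answering peer, each TCache is a nested dictionary, and `return_result` is a hypothetical name. Gossiping is not shown.

```python
def return_result(path, category, n_resources, tcaches):
    """Send a result back along `path` and update every TCache on the way.

    path:     peer IDs from the querying peer (first) to the answering
              peer (last); jumps simply appear as consecutive entries.
    tcaches:  dict mapping peer ID -> {category: {peer ID: resource count}}.
    """
    answering_peer = path[-1]
    # Walk the path backwards, skipping the answering peer itself.
    for peer in reversed(path[:-1]):
        cache = tcaches.setdefault(peer, {})
        cache.setdefault(category, {})[answering_peer] = n_resources
    return tcaches


tcaches = {}
return_result(["q", "a", "b", "expert"], "Jazz", 12, tcaches)
```

After this call the querying peer `q` and the intermediate peers `a` and `b` all know that `expert` holds 12 resources in the category, even if `expert` is beyond the original query horizon of `q`.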
TCache architecture and maintenance
• Aim: provide an efficient mapping C → {PC1,…,PCm}
• For each category: peers, # of resources, and TTL
• TTL:
  • Regularly decremented
  • Reset to start value at revalidation
• Caching policy: aggressive vs. selective
• Compacting techniques: peer upgrade and non-expert pruning
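A minimal TCache with TTL handling could be sketched like this. The class layout and method names are hypothetical, and evicting an entry once its TTL reaches zero is an assumption (the slide only states that the TTL is decremented and reset). Caching policies and the compacting techniques are not modeled.

```python
class TCache:
    """Maps category -> {peer: [number of resources, remaining TTL]}."""

    def __init__(self, start_ttl=3):
        self.start_ttl = start_ttl
        self.entries = {}

    def revalidate(self, category, peer, n_resources):
        """Insert or refresh an entry and reset its TTL to the start value."""
        self.entries.setdefault(category, {})[peer] = [n_resources, self.start_ttl]

    def tick(self):
        """Regular maintenance step: decrement every TTL.

        Assumption: entries whose TTL reaches zero are evicted.
        """
        for category in list(self.entries):
            peers = self.entries[category]
            for peer in list(peers):
                peers[peer][1] -= 1
                if peers[peer][1] <= 0:
                    del peers[peer]
            if not peers:
                del self.entries[category]

    def lookup(self, category):
        """Return the peers for a category, highest resource count first."""
        peers = self.entries.get(category, {})
        return sorted(peers, key=lambda p: (-peers[p][0], p))


cache = TCache(start_ttl=2)
cache.revalidate("Jazz", "p1", 5)
cache.revalidate("Jazz", "p2", 9)
```

Entries that keep appearing in piggybacked results are refreshed and stay cached, while entries for peers that have left or changed their contents eventually expire.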
Experimental setup
• Simulations
• Excerpts of the DMOZ taxonomy
• Synthetic network topologies
• Resource allocation: 80/20 rule
• Queries are taxonomic categories
• A number of peers have the role of querying peers
• Measured: contacted peers, messages, recall, and latency
• In this presentation: results using flooding and TCDN query routing
Improvements in recall
Primary reason for improvement: more intelligent query forwarding
Improvement and scalability
Latency reduction
• TCache results in very fast retrieval of the first results
• Finding all results takes approximately the same time with both techniques, because both use flooding
Summary and further work
• Presented motivation and context
• Taxonomy-based querying and query routing
• TCache architecture and maintenance
• Experimental results supporting our claims
• Future/ongoing work:
  • Employing the techniques for XML/XPath querying in a P2P context (to appear at IEEE P2P'2006)
  • Integration of different taxonomies