Query Caching in Agent-based Distributed Information Retrieval

Hemali Majithia - CADIP, UMBC

Problem Definition
• DIR (IR) systems access their collections to perform searches and answer queries
• Query resolution on large corpora is expensive in terms of time and resources
• Similar queries produce similar results
• Repetitive and redundant searching of the collections
• Resource wastage and inefficiency
• Solution: "CACHING QUERIES"

Solution
• Caching Mechanism
  • Cache new queries along with their results
  • Answer future similar queries using the cached queries
• New Query
  • A query that has not been answered before
• Similar Query
  • A query that is identical or similar to a query already in the cache
• Emphasis
  • If similar queries exist in the cache, their results can be reused; an exact match is not required
  • Retrieval: linear time in the collection size

Caching Mechanism
• Two-level Caching Mechanism
  • First level: Exact Match
  • Second level: Inverted Index of the queries
• Caching Algorithm
  • Least Recently Used (LRU)
  • Least Frequently Used (LFU)
  • Lowest Relative Value (LRV)
• Similarity Metric
  • Cosine Similarity
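The two levels above can be sketched roughly as follows. This is a minimal illustration, not the implementation behind the slides: the class name `TwoLevelQueryCache`, the whitespace tokenization, the `threshold` value, and the choice of LRU as the replacement policy are all assumptions made for the example.

```python
import math
from collections import Counter, OrderedDict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoLevelQueryCache:
    """Hypothetical two-level query cache with LRU replacement."""

    def __init__(self, capacity: int = 100, threshold: float = 0.8):
        self.capacity = capacity      # maximum number of cached queries
        self.threshold = threshold    # minimum cosine similarity for a level-2 hit
        self.entries = OrderedDict()  # query -> (term vector, results), oldest first
        self.index = {}               # term -> set of cached queries containing it

    @staticmethod
    def _normalize(query: str) -> str:
        return " ".join(query.lower().split())

    def get(self, query: str):
        key = self._normalize(query)
        # Level 1: exact match on the normalized query string.
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key][1]
        # Level 2: the inverted index yields cached queries sharing a term.
        vec = Counter(key.split())
        candidates = set()
        for term in vec:
            candidates |= self.index.get(term, set())
        best_key, best_sim = None, 0.0
        for cand in candidates:
            sim = cosine(vec, self.entries[cand][0])
            if sim > best_sim:
                best_key, best_sim = cand, sim
        if best_key is not None and best_sim >= self.threshold:
            self.entries.move_to_end(best_key)
            return self.entries[best_key][1]
        return None  # miss: the caller has to search the collection

    def put(self, query: str, results) -> None:
        key = self._normalize(query)
        if key in self.entries:
            self.entries[key] = (self.entries[key][0], results)
            self.entries.move_to_end(key)
            return
        if len(self.entries) >= self.capacity:
            # LRU: evict the least recently used query and unindex its terms.
            old_key, (old_vec, _) = self.entries.popitem(last=False)
            for term in old_vec:
                self.index[term].discard(old_key)
                if not self.index[term]:
                    del self.index[term]
        vec = Counter(key.split())
        self.entries[key] = (vec, results)
        for term in vec:
            self.index.setdefault(term, set()).add(key)
```

Swapping LRU for LFU or LRV would only change which entry `put` evicts; the two lookup levels stay the same.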

Caching in CARROT–II
[Figure. Labels: Node I, Node II, Query Agent, C2 Agent (six), Primary cache (six), Secondary Cache (two). Numbered steps shown in the figure:]
1. User query
2. Lookup
3. MISS
4. Query forwarded
5. Miss
6. Query forwarded to best C2
7. Lookup
8. HIT
9. Update cache
10. Results returned
11. Response
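One possible reading of the numbered steps is a chain of cache lookups in which each miss forwards the query onward and a hit updates the caches that missed. The sketch below illustrates only that general pattern. Which cache each step touches is not recoverable from the transcript, so the tier names, the `resolve` function, and the `search` fallback are hypothetical.

```python
def resolve(query, tiers, search):
    """Try each cache tier in order; on a hit, copy the results into the
    tiers that missed. `tiers` is an ordered list of (name, dict) pairs and
    `search` is the fallback that queries the collection (both hypothetical).
    """
    trace = []
    for i, (name, cache) in enumerate(tiers):
        if query in cache:
            trace.append(f"{name}: HIT")
            results = cache[query]
            break
        trace.append(f"{name}: MISS")
    else:
        # Every tier missed: search the collection itself.
        i = len(tiers)
        results = search(query)
        trace.append("collection searched")
    # Update the caches that missed, so the next lookup hits earlier.
    for _, cache in tiers[:i]:
        cache[query] = results
    return results, trace


# Assumed tier order, for illustration only.
query_agent_cache = {}
first_c2_cache = {}
best_c2_cache = {"agent caching": ["d7"]}
tiers = [
    ("query agent cache", query_agent_cache),
    ("first C2 agent cache", first_c2_cache),
    ("best C2 agent cache", best_c2_cache),
]
results, trace = resolve("agent caching", tiers, search=lambda q: [])
```

After this call, a repeat of the same query would hit in the first tier without being forwarded.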

Metrics for Evaluation of Caching Mechanism
• Efficiency
  • Round Trip Time (RTT) = total time to answer the queries fired at the system
  • Hit Rate = computed for each agent cache and as a total hit rate
  • Cost of caching = the overhead caused by caching (assuming that the hit rate is 0)
• Effectiveness
  • Precision = fraction of retrieved documents that are relevant
  • Recall = fraction of relevant documents that are retrieved
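As a small worked illustration of these definitions, the metrics could be computed as below. The function names and sample numbers are made up for the example. Reading the cost of caching as the RTT of an all-miss run minus the RTT of a run without any cache is an assumption, since the slide does not spell out the formula.

```python
def hit_rate(hits: int, lookups: int) -> float:
    """Fraction of cache lookups answered from the cache."""
    return hits / lookups if lookups else 0.0


def precision(retrieved, relevant) -> float:
    """Fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant) -> float:
    """Fraction of relevant documents that are retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def caching_overhead(rtt_all_misses: float, rtt_no_cache: float) -> float:
    """Assumed reading of 'cost of caching': extra round trip time paid
    when the cache is consulted but never hits (hit rate 0)."""
    return rtt_all_misses - rtt_no_cache


retrieved = ["d1", "d2", "d3", "d4"]  # documents returned for a query
relevant = ["d1", "d2", "d5"]         # documents judged relevant
p = precision(retrieved, relevant)    # 2 of the 4 retrieved are relevant
r = recall(retrieved, relevant)       # 2 of the 3 relevant were retrieved
h = hit_rate(hits=30, lookups=120)
```

Comparing precision and recall for answers served from the cache against answers from a full search would show how much effectiveness is traded for the efficiency gain.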
