Query Caching in Agent-based Distributed Information Retrieval
Hemali Majithia - CADIP, UMBC
Problem Definition
• DIR (IR) systems access their collections to perform searches and answer queries
• Query resolution on large corpora is expensive in terms of time and resources
• Similar queries produce similar results
• Without caching, the collections are searched repetitively and redundantly
• The result is wasted resources and inefficiency
• Solution: "CACHING QUERIES"
Solution
• Caching Mechanism
  • Cache new queries along with their results
  • Answer future similar queries using the cached queries
• New Query
  • A query that has not been answered before
• Similar Query
  • A query that is identical or similar to a query already in the cache
• Emphasis
  • If similar queries exist in the cache, their results can be reused for the new query; an exact match is not required
  • Retrieval time is linear in collection size
Caching Mechanism
• Two-level Caching Mechanism
  • First level: Exact Match
  • Second level: Inverted Index of the queries
• Caching Algorithm
  • Least Recently Used (LRU)
  • Least Frequently Used (LFU)
  • Lowest Relative Value (LRV)
• Similarity Metric
  • Cosine Similarity
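The two-level lookup above can be illustrated with a minimal Python sketch. It is not the CARROT-II implementation: the class name `TwoLevelQueryCache`, the whitespace tokenizer, the raw term-frequency vectors, and the similarity threshold are all assumptions made for illustration, and only the LRU policy is shown.

```python
import math
from collections import Counter, OrderedDict, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TwoLevelQueryCache:
    """Hypothetical cache: exact match first, then similar match (LRU eviction)."""

    def __init__(self, capacity: int = 100, threshold: float = 0.8):
        self.capacity = capacity
        self.threshold = threshold      # assumed similarity cutoff
        self.entries = OrderedDict()    # query text -> results, oldest first
        self.index = defaultdict(set)   # term -> cached queries containing it

    @staticmethod
    def _terms(query: str) -> Counter:
        # Assumed tokenizer: lowercase and split on whitespace.
        return Counter(query.lower().split())

    def put(self, query: str, results: list) -> None:
        if query in self.entries:
            self.entries.move_to_end(query)
        self.entries[query] = results
        for term in self._terms(query):
            self.index[term].add(query)
        if len(self.entries) > self.capacity:
            # Evict the least recently used query and unindex its terms.
            victim, _ = self.entries.popitem(last=False)
            for term in self._terms(victim):
                self.index[term].discard(victim)

    def get(self, query: str):
        # First level: exact match on the query text.
        if query in self.entries:
            self.entries.move_to_end(query)
            return self.entries[query]
        # Second level: the inverted index yields cached queries that share
        # at least one term; keep the one with the highest cosine similarity.
        q = self._terms(query)
        candidates = set()
        for term in q:
            candidates |= self.index.get(term, set())
        best, best_sim = None, 0.0
        for cand in candidates:
            sim = cosine(q, self._terms(cand))
            if sim > best_sim:
                best, best_sim = cand, sim
        if best is not None and best_sim >= self.threshold:
            self.entries.move_to_end(best)
            return self.entries[best]
        return None  # miss: the caller has to search the collection
```

On a miss the caller would search the collection and then call `put` with the new query and its results. Swapping LRU for LFU or LRV would change only the eviction step in `put`.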
Caching in CARROT–II
[Figure: Node I and Node II, showing a Query Agent, six C2 Agents, six primary caches, and two secondary caches. Labeled message flow:]
1. User query
2. Lookup
3. Miss
4. Query forwarded
5. Miss
6. Query forwarded to best C2
7. Lookup
8. Hit
9. Update cache
10. Results returned
11. Response
Metrics for Evaluation of Caching Mechanism
• Efficiency
  • Round Trip Time (RTT) = total time to answer the queries submitted to the system
  • Hit Rate = fraction of lookups answered from the cache, measured for each agent cache and in total
  • Cost of caching = the overhead caused by caching (assuming that the hit rate is 0)
• Effectiveness
  • Precision = fraction of retrieved documents that are relevant
  • Recall = fraction of relevant documents that are retrieved
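As a rough illustration of the metrics above, the sketch below computes precision, recall, and hit rate from sets of document identifiers and simple counters. The function names are hypothetical, and `caching_overhead` encodes one possible reading of the cost of caching (RTT with a cache that never hits, minus RTT with no cache); the slides do not spell out the exact formula.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: set, relevant: set) -> float:
    """Fraction of relevant documents that are retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def hit_rate(hits: int, lookups: int) -> float:
    """Fraction of cache lookups that were answered from the cache."""
    return hits / lookups if lookups else 0.0


def caching_overhead(rtt_all_misses: float, rtt_no_cache: float) -> float:
    """Assumed definition: extra time spent when every lookup misses."""
    return rtt_all_misses - rtt_no_cache


# Example with made-up document identifiers.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
print(precision(retrieved, relevant))  # 0.5
print(hit_rate(3, 12))                 # 0.25
```

Per-agent and total hit rates can both be obtained from `hit_rate`, by passing either one agent's counters or the sums over all agents.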