This presentation discusses the concept of Navigation-Aided Retrieval (NAR), which utilizes organic structure in documents to enhance search tasks. It introduces a formal model of NAR and evaluates its effectiveness through a user study.
Navigation-Aided Retrieval. Shashank Pandit and Christopher Olston. Presentation by Yang Yu, CSE 450 Web Data Mining
Outline • Introduction • Related Work • System Model • Prototype System • Evaluation • Summary & Future Work
Introduction • Motivation for this work • Difficulty in formulating appropriate queries • Open-ended search tasks • Users' preference for orienteering • Navigation-Aided Retrieval (NAR)
Introduction • Organic versus Synthetic Structure • Synthetic: structure is synthesized automatically and overlaid on query results • Organic: structure that exists naturally in the documents is exploited • Advantages of organic NAR • Human oversight • Familiar user interface • A single view of the document collection • Robust third-party implementation • Contributions • A formal model of navigation-aided retrieval • An overview of techniques for building a NAR-based retrieval system • Empirical evaluation via a user study
Related Work • Selecting Starting Points • Best Trails system • Uses an ad hoc scoring function for starting points • Restricts starting points to documents that themselves match the query • Does not take navigability factors into account • Its user interface departs substantially from the traditional search interface • Topic distillation, mainly using HITS • Only effective for broad topic areas with many hubs and authorities • Guiding Navigation • WebWatcher highlights hyperlinks along paths taken by previous users who posed similar queries
System Model • Generic Model • Query submodel • Navigation submodel • Generic scoring function • Assumption: every member of the relevance set St is a singleton set • “Fatten” St into {d1, d2, …, dn}
System Model • Instantiations of Generic Model • Conventional Probabilistic IR Model • Navigation-Conscious Model • The two terms embody the two key factors: • the number of documents reachable from d that are relevant to the search task • the ease and accuracy with which the user is able to navigate to those documents
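To make the two factors concrete, here is a minimal sketch (in Python) of what a navigation-conscious starting-point score could look like. It is an illustration of the idea on the slide, not the paper's exact formula; the names reachable, relevance, and nav_ease are placeholders for whatever estimates the system provides.

def nav_conscious_score(d, q, reachable, relevance, nav_ease):
    """
    d         : candidate starting-point document
    q         : the query
    reachable : function d -> iterable of documents reachable from d
    relevance : function (d', q) -> relevance score R(d', q)
    nav_ease  : function (d, d') -> estimated ease/accuracy of navigating d -> d'
    """
    # Sum relevance of reachable documents, discounted by how easily the
    # user can actually navigate to them from d.
    return sum(relevance(dp, q) * nav_ease(d, dp) for dp in reachable(d))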
Prototype System • Preprocessing • Content Engine • Connectivity Engine: <d1, d2, dW, W(N(d2), d1, d2)> • Intermediary
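The connectivity engine's tuples can be thought of as a lookup structure keyed by document pairs. The sketch below shows one plausible in-memory layout, assuming the tuple <d1, d2, dW, W(N(d2), d1, d2)> means: from d1 the user can reach d2, dW is the link on d1 to follow toward d2, and W is a navigability weight. The class and method names are illustrative, not the prototype's actual schema.

from collections import defaultdict
from typing import NamedTuple

class ConnTuple(NamedTuple):
    source: str     # d1: document the user is currently viewing
    target: str     # d2: document reachable from d1
    waypoint: str   # dW: link on d1 to follow toward d2
    weight: float   # W(N(d2), d1, d2): navigability weight

class ConnectivityEngine:
    def __init__(self):
        self.by_pair = {}                   # (d1, d2) -> ConnTuple
        self.by_target = defaultdict(list)  # d2 -> tuples whose target is d2

    def add(self, t: ConnTuple):
        self.by_pair[(t.source, t.target)] = t
        self.by_target[t.target].append(t)

    def sources_of(self, d2):
        """All tuples whose target is d2, i.e. documents that can navigate to d2."""
        return self.by_target[d2]

    def lookup(self, d1, d2):
        """The tuple for the pair <d1, d2>, if it exists."""
        return self.by_pair.get((d1, d2))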
Prototype System • Selecting Starting Points • 1. Retrieve from the content engine all documents d’ relevant to q. • 2. For each relevant document d’ retrieved in Step 1, retrieve from the connectivity engine all documents d that can navigate to d’. • 3. For each unique document d found in Step 2, compute its starting point score. • 4. Sort the documents in decreasing order of this score and truncate after the top k documents.
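A minimal sketch of the four-step procedure above, assuming a content engine that returns documents relevant to q, a connectivity engine that can list the documents from which a given document is reachable, and some starting-point scoring function (for example, the navigation-conscious score sketched earlier). All method names are assumed, not Volant's actual API.

def select_starting_points(q, content_engine, connectivity_engine,
                           starting_point_score, k=10):
    # Step 1: all documents d' relevant to q.
    relevant = content_engine.relevant_docs(q)

    # Step 2: all documents d from which some relevant d' is reachable.
    candidates = set()
    for d_prime in relevant:
        for t in connectivity_engine.sources_of(d_prime):
            candidates.add(t.source)

    # Step 3: score each unique candidate as a starting point.
    scored = [(starting_point_score(d, q), d) for d in candidates]

    # Step 4: sort by decreasing score and keep the top k.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]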
Prototype System • Adding Navigation Guidance • 1. Retrieve from the content engine all documents d’ for which R(d’, q) >= T. • 2. For each document d’ retrieved in Step 1, retrieve from the connectivity engine the tuple corresponding to <d, d’>, if it exists, where d is the document currently being viewed. • 3. For each retrieved tuple <d, d’, dW, W(N(d’), d, d’)>, highlight the links on d that point to dW. • Efficiency and Scalability
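The guidance step can be sketched the same way: given the document d the user is currently viewing, find the waypoints dW whose links on d should be highlighted. R, T, and the engine methods follow the slide's notation but are assumed names rather than the prototype's actual interface.

def links_to_highlight(d, q, content_engine, connectivity_engine, T):
    # Step 1: documents d' whose relevance to q meets the threshold T.
    relevant = [dp for dp in content_engine.relevant_docs(q)
                if content_engine.relevance(dp, q) >= T]

    # Steps 2-3: for each <d, d'> tuple, collect the waypoint dW so that the
    # corresponding link on d can be highlighted in the UI.
    waypoints = set()
    for d_prime in relevant:
        t = connectivity_engine.lookup(d, d_prime)
        if t is not None:
            waypoints.add(t.waypoint)
    return waypoints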
Evaluation • Experimental Hypotheses • In query-only scenarios, Volant does not perform significantly worse than a conventional search engine • In combined query/navigation scenarios, Volant performs better • The best organic starting point is of higher quality than one that can be synthesized using existing techniques • Search Task Test Sets • Unambiguous queries • Ambiguous queries • Performance on Unambiguous Queries
Evaluation • Performance on Ambiguous Queries • Four criteria: breadth, accessibility, appeal, usefulness
Summary and Future Work • Summary • Effectiveness • Relationship to conventional IR • Relationship to synthetic approaches • Future Work • Add redundancy to corpora • Tune the scoring function to apply to synthetic starting points as well • A unified method that supports both exploration and direct retrieval of documents
Thank you! Questions or Comments?