Even More TopX: Relevance Feedback

Ralf Schenkel
Joint work with Osama Samodi, Martin Theobald

TopX Results with INEX 2007
• 660,000 XMLified English Wikipedia articles
• 107 topics, each with
  • a structural query (CAS)
  • a nonstructural (aka keyword) query (CO)
  • an informal description of the information need
  • assessed answers (text passages)
• Evaluation metric based on recall/precision: the fraction of retrieved characters that are relevant, taken at the point in the result list where 1% recall is reached
  • C: #characters retrieved
  • R: #relevant characters retrieved
  • P[0.01] = R/C at 1% recall
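The metric on this slide can be sketched as a walk down the ranked result list. This is a minimal sketch under assumptions: the function name, the input layout (one pair of character counts per retrieved passage), and the non-interpolated reading of P[0.01] are illustrative choices, and the official INEX tooling may differ (for example by interpolating precision).

```python
def precision_at_recall(results, total_relevant_chars, recall_level=0.01):
    """Return R/C at the first rank where recall reaches recall_level.

    results: ranked list of (chars_retrieved, relevant_chars_retrieved)
             pairs, one per retrieved passage (hypothetical input layout).
    total_relevant_chars: number of relevant characters for the topic.
    """
    C = 0  # characters retrieved so far
    R = 0  # relevant characters retrieved so far
    for chars, relevant_chars in results:
        C += chars
        R += relevant_chars
        if R >= recall_level * total_relevant_chars:
            return R / C
    return 0.0  # the recall level is never reached


# Toy run: 5,000 relevant characters exist, so 1% recall means 50 characters.
example = [(100, 0), (200, 50), (100, 100)]
p_001 = precision_at_recall(example, total_relevant_chars=5000)
```

In the toy run the first passage contributes no relevant characters, so the 1% recall point is reached only after the second passage.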

Results with INEX 2007
[Figure: result quality for structure queries, keyword queries, improved structure queries (unchecked), improved keyword queries, and document retrieval]
Structural constraints can improve result quality

Users vs. Structural XML IR
• Structural query languages do not work in practice:
  • Schema is unknown or heterogeneous
  • Language is too complex
  • Humans don't think in XPath
  • Results are often unsatisfying
• Example: "I need information about a professor in SB who teaches IR." has to be written as
  //professor[contains(., "SB") and contains(.//course, "IR")]
• System support to generate "good" structured queries:
  • User interfaces ("advanced search")
  • Natural language processing
  • Interactive query refinement
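To make the schema problem concrete, the professor query can be evaluated by hand on a toy document. This is only an illustrative sketch: the sample XML, the element names other than professor and course, and the helper functions are assumptions, and the check on course is slightly more lenient than XPath's contains(), which would only look at the first course node.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the second document uses a different tag name.
doc_a = ET.fromstring(
    "<dept>"
    "<professor><name>A. Example</name><city>SB</city><course>IR</course></professor>"
    "<professor><name>B. Example</name><city>KL</city><course>IR</course></professor>"
    "</dept>"
)
doc_b = ET.fromstring(
    "<dept>"
    "<professor><name>C. Example</name><city>SB</city><lecture>IR</lecture></professor>"
    "</dept>"
)


def text_of(element):
    """Concatenate all text below an element (like XPath's string value)."""
    return "".join(element.itertext())


def professors_in_sb_teaching_ir(root):
    """Names of professors whose text contains SB and who have a course containing IR."""
    return [
        prof.findtext("name")
        for prof in root.iter("professor")
        if "SB" in text_of(prof)
        and any("IR" in text_of(course) for course in prof.iter("course"))
    ]


hits_a = professors_in_sb_teaching_ir(doc_a)
hits_b = professors_in_sb_teaching_ir(doc_b)  # empty: this schema says "lecture"
```

The second document holds a matching professor, but the structural constraint on course silently excludes it, which is the heterogeneous-schema problem named above.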

Relevance Feedback for Interactive Query Refinement
1. User submits query
2. User marks relevant and nonrelevant docs
3. System finds best terms to distinguish between relevant and nonrelevant docs
4. System submits expanded query
[Figure: four ranked result documents containing terms such as XML, IR, index, and Fagin; the query "query evaluation" is expanded with "XML not(Fagin)"]
• Feedback for XML IR:
  • Start with keyword query
  • Find structural expansions
  • Create structural query

Structural Features
User marks relevant result
[Figure: document tree of an article with frontmatter, body, and backmatter; author "Baeza-Yates"; sec "Semistructured data…"; subsec "XML has evolved…"; p "With the advent of XSLT…"]
Possible features:
• C: content of result, e.g. XML
• D: tag+content of descendants, e.g. p[XSLT]
• A: tag+content of ancestors, e.g. sec[data]
• AD: tag+content of descendants of ancestors, e.g. article//author[Baeza]
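The four feature classes can be illustrated by collecting candidates around one marked result in a toy tree. This is a rough sketch, not the TopX implementation: the sample document, the tokenizer, and the choice to use only an element's own leading text for D, A, and AD features are all assumptions.

```python
import re
import xml.etree.ElementTree as ET


def terms(text):
    """Lowercased alphabetic tokens of a string (assumed tokenizer)."""
    return set(re.findall(r"[a-z]+", (text or "").lower()))


def candidate_features(root, result):
    """Collect C, D, A and AD candidate features for one marked result."""
    parent = {child: p for p in root.iter() for child in p}
    ancestors = []
    node = result
    while node in parent:  # walk up to the root
        node = parent[node]
        ancestors.append(node)
    inside = set(result.iter())  # the result and its descendants
    feats = {"C": terms("".join(result.itertext())), "D": set(), "A": set(), "AD": set()}
    for d in inside - {result}:
        feats["D"] |= {(d.tag, t) for t in terms(d.text)}
    for a in ancestors:
        feats["A"] |= {(a.tag, t) for t in terms(a.text)}
        for d in a.iter():
            if d is a or d in inside or d in ancestors:
                continue  # only descendants of ancestors off the result's path
            feats["AD"] |= {(a.tag, d.tag, t) for t in terms(d.text)}
    return feats


# Hypothetical document loosely following the figure on this slide.
article = ET.fromstring(
    "<article><frontmatter><author>Baeza-Yates</author></frontmatter>"
    "<body><sec>Semistructured data"
    "<subsec>XML has evolved<p>With the advent of XSLT</p></subsec>"
    "</sec></body></article>"
)
features = candidate_features(article, article.find(".//subsec"))
```

With the subsec element as the marked result, the sketch yields the slide's examples among its candidates: XML (C), p[XSLT] (D), sec[data] (A), and article//author[Baeza] (AD).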

Feature Selection
• Order features by Robertson Selection Value, which depends on
  • p_f: probability that f occurs in a relevant result
  • q_f: probability that f occurs in a nonrelevant result
• Compute the Robertson-Sparck-Jones weight for each feature (also used as its weight in the query) from
  • r_f: number of relevant results with f
  • R: number of relevant results
  • ef_f: number of elements that contain f
  • E: number of all elements
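The slide's formulas did not survive extraction, so the following sketch uses the textbook forms of both quantities, which is an assumption: the Robertson-Sparck-Jones weight with 0.5 smoothing, and RSV as weight times (p_f - q_f). The estimates p_f = r_f/R and q_f = (ef_f - r_f)/(E - R), the function names, and the toy statistics are likewise illustrative and may differ from what TopX uses.

```python
import math


def rsj_weight(r_f, R, ef_f, E):
    """Textbook Robertson-Sparck-Jones weight with 0.5 smoothing (assumed form)."""
    return math.log(
        ((r_f + 0.5) * (E - ef_f - R + r_f + 0.5))
        / ((R - r_f + 0.5) * (ef_f - r_f + 0.5))
    )


def rsv(r_f, R, ef_f, E):
    """Robertson Selection Value: weight * (p_f - q_f), with assumed estimates."""
    p_f = r_f / R                 # share of relevant results containing f
    q_f = (ef_f - r_f) / (E - R)  # share of all other elements containing f
    return rsj_weight(r_f, R, ef_f, E) * (p_f - q_f)


def select_features(stats, R, E, k=10):
    """Return the k features with the highest RSV.

    stats maps a feature to (r_f, ef_f).
    """
    ranked = sorted(stats, key=lambda f: rsv(stats[f][0], R, stats[f][1], E), reverse=True)
    return ranked[:k]


# Made-up statistics: 5 relevant results in a collection of 100,000 elements.
toy_stats = {
    "p[XSLT]": (4, 50),
    "sec[data]": (1, 5000),
    "author[Baeza]": (3, 10),
}
top_two = select_features(toy_stats, R=5, E=100_000, k=2)
```

In the toy data, sec[data] is dropped because the feature is frequent in the whole collection but rare among the relevant results.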

Query Construction
• Initial query: *[query evaluation]
• With content feature C: XML: *[query evaluation XML]
• Structural features D: p[XSLT], A: sec[data], and AD: article//author[Baeza] are attached via the descendant-or-self axis
• Needs schema information!
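One way to picture the construction step is to assemble the selected features into a single path query string. This is a hedged sketch: the function, its parameters, and the about()-style output notation are assumptions modelled loosely on NEXI, and the output is not checked against the NEXI grammar. The caller has to pass the ancestor steps in document order, which is where the schema information comes in.

```python
def build_query(keywords, c_terms, d_feats, a_steps, ad_feats):
    """Assemble an expanded structural query string (illustrative notation).

    keywords: original query terms
    c_terms:  C features (extra content terms for the target element)
    d_feats:  D features as (tag, term)
    a_steps:  ancestor steps as (tag, term or None), outermost first;
              this order must come from schema knowledge
    ad_feats: AD features as (ancestor_tag, descendant_tag, term)
    """
    target_conds = ["about(., " + " ".join(keywords + c_terms) + ")"]
    target_conds += ["about(.//%s, %s)" % (tag, term) for tag, term in d_feats]
    target = "//*[" + " and ".join(target_conds) + "]"

    steps = []
    for tag, term in a_steps:
        conds = []
        if term is not None:
            conds.append("about(., %s)" % term)
        conds += ["about(.//%s, %s)" % (d, t) for a, d, t in ad_feats if a == tag]
        steps.append("//" + tag + ("[" + " and ".join(conds) + "]" if conds else ""))
    return "".join(steps) + target


expanded = build_query(
    keywords=["query", "evaluation"],
    c_terms=["XML"],
    d_feats=[("p", "XSLT")],
    a_steps=[("article", None), ("sec", "data")],
    ad_feats=[("article", "author", "Baeza")],
)
```

For the slide's features this produces `//article[about(.//author, Baeza)]//sec[about(., data)]//*[about(., query evaluation XML) and about(.//p, XSLT)]`.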

More Fancy Query Construction
[Figure: query graph over p[XSLT], article, sec[data], author[Baeza], *[query evaluation], and *[query evaluation XML]]
• No valid NEXI query, but XPath (ancestor axis)
• DAG queries in TopX
• Needs disjunctive evaluation

Example: "pyramids of egypt"

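Architecture
[Figure: data flow between the components listed below]
• TopX Search Engine: takes the query (and later the expanded query), returns results
• INEX Tools & Assessments: turn query + results into feedback
• Candidate classes: C Module, D Module, A Module, AD Module
• Weighting + Selection: produces the expanded query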

RF in the TopX 2.0 Interface

Evaluation Methodology
Goal: avoid "training on the data"
• Freeze known results at the top
• Remove known results+X from the collection
  • resColl-result: remove results only (~doc retrieval)
  • resColl-desc: remove results+descendants
  • resColl-anc: remove results+ancestors
  • resColl-path: remove results+desc+anc
  • resColl-doc: remove whole doc with known results
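The residual-collection variants differ only in which elements around a known result are excluded from evaluation. The sketch below makes that explicit under assumptions: elements are identified by a hypothetical (document id, path) pair, where the path is a tuple of child positions from the root, and the mode names are shortened forms of the slide's labels.

```python
def is_prefix(a, b):
    """True if path a is a prefix of path b (a is b or an ancestor of b)."""
    return len(a) <= len(b) and b[:len(a)] == a


def is_removed(element, known_results, mode):
    """Decide whether an element is dropped from the residual collection.

    element, known_results: (doc_id, path) pairs (assumed identifiers).
    mode: "result", "desc", "anc", "path" or "doc".
    """
    doc, path = element
    for known_doc, known_path in known_results:
        if doc != known_doc:
            continue
        same = path == known_path
        below = is_prefix(known_path, path)  # the result itself or a descendant
        above = is_prefix(path, known_path)  # the result itself or an ancestor
        if (
            mode == "doc"
            or (mode == "result" and same)
            or (mode == "desc" and below)
            or (mode == "anc" and above)
            or (mode == "path" and (below or above))
        ):
            return True
    return False


known = [("d1", (0, 1))]
descendant = ("d1", (0, 1, 2))
ancestor = ("d1", (0,))
sibling = ("d1", (0, 2))
```

For example, `is_removed(descendant, known, "desc")` is true, while the sibling element is only removed in "doc" mode.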

Evaluation: INEX 2003&2004
• INEX collection (IEEE-CS journal and conference articles):
  • 12,107 XML docs with 12 million elements
  • queries with manual relevance assessments
• 52 keyword queries from 2003 & 2004 with our TopX Search Engine [VLDB05]
• Baseline run with MAP~0.1, Precision@20=0.174
• Automatic feedback for top-k from relevance assessments
• Evaluation ignores results used for feedback and descendants of results (resColl-desc)

INEX 2003&2004, resColl-desc
[Figure: results per feature dimension]
• All dimensions together are best.
• Reasonable results for INEX 2005 RF Track

Results for INEX 2005 Track
• INEX IEEE collection (scientific articles)
• Feedback for the top-20 from the assessments (with the strict quantisation -> only "relevant" and "nonrelevant")
• top 10 expansion features
• runs with top 1500 results
• MAP with inex_eval (with strict quantisation)

(Some) Results for INEX 2006 RF Track
• INEX Wikipedia collection
• Feedback for the top-20 from the assessments (with the generalised quantisation -> graded relevance)
• top 10 expansion features
• runs with top 100 results for first 50 topics (time…)
• MAP with inex_eval (with generalised quantisation)
• Significance tests (Wilcoxon signed-rank, t-test)

Conclusions
• Queries with structural constraints to improve result quality
• Relevance Feedback to create such queries
• Structure of collection matters a lot
