Rutgers Information Interaction Lab at TREC 2005: Trying HARD N.J. Belkin, M. Cole, J. Gwizdka, Y.-L. Li, J.-J. Liu, G. Muresan, D. Roussinov*, C.A. Smith, A. Taylor, X.-J. Yuan Rutgers University; *Arizona State University
Our Major Goal • Clarification forms (CFs) are simulations of user-system interaction • Users are unwilling to engage in explicit interaction unless payoff is high, and interaction is understood as relevant • Is explicit interaction worthwhile, and if so, under what circumstances?
General Approach to the Question • Use relatively “standard” interactive elicitation techniques to enhance/disambiguate the original query • Compare results to baseline • Compare results to baseline plus relatively “standard” non-interactive query enhancement techniques, in particular, pseudo-rf
Methods for Automatic Query Enhancement • Pseudo-relevance feedback (standard Lemur) • Language modeling-based query expansion (clarity), derived from collection • Web-based query expansion
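For concreteness, a minimal Python sketch of the pseudo-relevance feedback step; the simple frequency-based term selection and the function name are illustrative assumptions, not the exact Lemur behaviour.

    from collections import Counter

    def pseudo_relevance_feedback(query_terms, ranked_docs, k_docs=10, n_terms=5):
        # ranked_docs: tokenized documents in rank order for the baseline query.
        # Assume the top k_docs are relevant and harvest their most frequent
        # terms (excluding terms already in the query) as expansion terms.
        counts = Counter()
        for doc in ranked_docs[:k_docs]:
            counts.update(doc)
        expansion = [t for t, _ in counts.most_common() if t not in query_terms][:n_terms]
        return list(query_terms) + expansion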
Methods for User-Based Query Enhancement • User selection of terms suggested by “clarity” and web methods (user selection based on Koenemann & Belkin, 1996; Belkin, et al., 2000) • Elicitation of extended information problem descriptions (elicitation based on Kelly, Dollu & Fu, 2004; 2005)
Hypotheses for Automatic Enhancement • H1: Query expansion using “clarity”-derived terms will improve performance over baseline & baseline + pseudo-rf • H2: Query expansion using web-derived terms will improve performance, ditto • H2b: Query expansion using both clarity- and web-derived terms will improve performance, ditto
Hypotheses for User-Based Query Enhancement • H3: Query expansion with terms selected by the user from those suggested by clarity- and web-derived terms will improve performance, over everything else • H4: Query expansion using “problem statements” elicited from users will increase performance over baseline & baseline + pseudo-rf
Hypothesis for When Elicitation is Useful • H5: The effectiveness of query expansion using problem statements will be negatively correlated with query clarity.
Query Run Designations • RUTGBL: Baseline query (title + description) • RUTGBF3: Baseline + pseudo-rf (Lemur) • RUTGWS1: Baseline + 0.1(Web-suggested terms) • RUTGLS1: Baseline + 0.1(clarity-suggested terms) • RUTGAS1: Baseline + 0.1(all suggested terms) • RUTGUS1: Baseline + 0.1(terms selected by user) • RUTGUG1: Baseline + 0.1(user-generated terms) • RUTGALL: Baseline + all suggested terms and all user-generated terms
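A rough sketch of how the 0.1 differential weighting of suggested terms can be applied at scoring time; the linear combination and the names below are assumptions, since the official runs were produced with Lemur's structured query evaluation (see System Implementation).

    def build_weighted_query(baseline_terms, suggested_terms, expansion_weight=0.1):
        # Baseline (title + description) terms keep full weight; suggested
        # terms are down-weighted, e.g. by 0.1 as in the *S1 / *G1 runs.
        weights = {t: 1.0 for t in baseline_terms}
        for t in suggested_terms:
            weights.setdefault(t, expansion_weight)
        return weights

    def score_document(per_term_scores, query_weights):
        # per_term_scores: retrieval score of each query term for one document.
        return sum(w * per_term_scores.get(t, 0.0) for t, w in query_weights.items())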
Identification of Suggested Terms • Clarity: Compute query clarity for the topic baseline (Lemur QueryClarity); sort terms accordingly; choose the top ten • Web: via the Navigation by Expansion Paradigm (NBE), described on the next slides
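A minimal sketch of the clarity-based term selection: the clarity score is the KL divergence between the query language model P(w|Q) and the collection model P(w|C), and sorting terms by their contribution to that sum gives the candidate expansion terms. How Lemur's QueryClarity estimates P(w|Q) (smoothing, number of feedback documents) is not reproduced here.

    import math

    def clarity_top_terms(p_w_given_q, p_w_given_coll, n_terms=10):
        # Both arguments map term -> probability. Each term's contribution to
        # the KL divergence sum over w of P(w|Q) * log2(P(w|Q) / P(w|C)) is
        # computed, and the n_terms largest contributors are suggested.
        contrib = {}
        for w, p_q in p_w_given_q.items():
            p_c = p_w_given_coll.get(w, 0.0)
            if p_q > 0 and p_c > 0:
                contrib[w] = p_q * math.log2(p_q / p_c)
        clarity = sum(contrib.values())
        top_terms = sorted(contrib, key=contrib.get, reverse=True)[:n_terms]
        return clarity, top_terms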
Navigation by Expansion Paradigm (NBE) [Diagram: the example topic (Title: human smuggling; Description: Identify incidents of human smuggling) is linked through the web (WWW) to related terms such as “undocumented aliens”, “arrested”, “border”, “trafficked”, “haitians”]
Navigation by Expansion Paradigm (NBE) • Step 1: Overview of the surroundings: produces words and phrases “clearly related” to the topic via Internet mining (topic sent to Google), with logistic regression on the “signal to noise” ratio: Signal = df(results)/#results, Noise = df(web)/#web, Pr = 1 - exp(-(signal/noise - 1)/a) • Step 2: Valid “moves” identified: related concepts from Step 1 that are present in AQUAINT and would affect search results if selected; impact estimate = P*df*idf • Step 3: Selected moves executed, e.g. by query expansion: Score = original score + expansion score * expansion factor
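The slide's formulas turned into a small Python sketch; the clamping of negative probabilities, the handling of zero noise, and the default value of the fitted constant a are assumptions.

    import math

    def relatedness_probability(df_results, n_results, df_web, n_web, a=1.0):
        # Step 1: signal = df(results)/#results, noise = df(web)/#web,
        # Pr = 1 - exp(-(signal/noise - 1)/a).
        signal = df_results / n_results
        if df_web == 0:
            return 1.0  # term unseen on the web at large (assumption: maximally related)
        noise = df_web / n_web
        # Terms no more frequent in the topic's results than on the web get Pr = 0 (assumption).
        return max(0.0, 1.0 - math.exp(-(signal / noise - 1.0) / a))

    def move_impact(pr, df, idf):
        # Step 2: impact estimate for a candidate move present in AQUAINT.
        return pr * df * idf

    def expanded_score(original_score, expansion_score, expansion_factor):
        # Step 3: document score after executing an expansion move.
        return original_score + expansion_score * expansion_factor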
“Combination” Run • Combining pseudo-rf with user-selected terms from CF1 (run RUTBE) • R-Prec. for RUTBE 0.334 • Substantially better than all other runs, but not comparable, because it used a different ranking function (BM25) and different differential weighting (0.3 for added terms) • Indicative of possible improvements
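A sketch of the kind of differential weighting used in the combination run: baseline terms at full weight, added (pseudo-rf and user-selected) terms at 0.3, scored with a standard BM25 formula. The BM25 parameters and this particular formulation are assumptions, not the run's actual implementation.

    import math

    def bm25(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
        # Standard BM25 score of one term in one document.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))

    def combination_score(doc, stats, baseline_terms, added_terms, added_weight=0.3):
        # doc: {"tf": term frequencies, "len": length}; stats: collection statistics.
        def term_score(t):
            return bm25(doc["tf"].get(t, 0), stats["df"].get(t, 0),
                        stats["n_docs"], doc["len"], stats["avg_len"])
        return (sum(term_score(t) for t in baseline_terms)
                + added_weight * sum(term_score(t) for t in added_terms))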
System Implementation • Lemur 3.1, 4.0, 4.1, using StructQueryEval • Could we ask for somewhat more detailed documentation from the Lemur group?
Comparison to Other Sites
                          R-Precision        MAP                p@10
                          Mean     SD        Mean     SD        Mean    SD
Overall Baseline median   0.252    0.149     0.190    0.147     0.408   0.28
RUTGBL                    0.270    0.167     0.206    0.163     0.408   0.30
Overall Final median      0.264    0.152     0.207    0.161     0.45    0.30
RUTGALL                   0.299*   0.182     0.253    0.188     0.49**  0.31
Summary of Significant Differences, R-Prec.
Run    R-Prec    BL     AS1    LS1    WS1    US1    UG1    BF3    ALL
BL     0.270     ----
AS1    0.278     *      ----
LS1    0.279     *      n/s    ----
WS1    0.281     *      n/s    n/s    ----
US1    0.282     *      n/s    n/s    n/s    ----
UG1    0.286     *      n/s    n/s    n/s    n/s    ----
BF3    0.287     n/s    n/s    n/s    n/s    n/s    n/s    ----
ALL    0.299     n/s    n/s    n/s    n/s    n/s    n/s    n/s    ----
(* = significant difference between the two runs; n/s = not significant)
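The * / n/s entries above come from pairwise significance tests over per-topic scores; below is a sketch of one such comparison. The choice of a paired t-test is an assumption (the slides do not state which test was used; a Wilcoxon signed-rank test is an alternative).

    from scipy import stats

    def compare_runs(scores_a, scores_b):
        # scores_a, scores_b: per-topic R-precision for two runs, same topic order.
        # Returns the mean difference, the t statistic, and the two-sided p-value.
        t_stat, p_value = stats.ttest_rel(scores_a, scores_b)
        mean_diff = sum(b - a for a, b in zip(scores_a, scores_b)) / len(scores_a)
        return mean_diff, t_stat, p_value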
CF2 & Baseline Terms, Equal Weights
Run name    R-Precision         Precision at 10       Mean Average Precision
            Mean      SD        Mean       SD         Mean       SD
RUTGBL      0.270     0.167     0.408      0.3        0.206      0.16
Q1          0.290     0.178     0.498*     0.325      0.236      0.183
Q2          0.274     0.181     0.474*     0.321      0.223      0.181
Q3          0.295     0.164     0.498**    0.303      0.237**    0.175
Q1Q2        0.298*    0.182     0.514**    0.326      0.248**    0.190
Q1Q3        0.313*    0.176     0.538***   0.314      0.263**    0.186
Q1Q2Q3      0.314**   0.179     0.564***   0.304      0.268**    0.190
Results w.r.t. Hypotheses • H1, H2, H3, H4 weakly supported w.r.t. the baseline, but not w.r.t. pseudo-rf • H5 not supported • No correlation between baseline query clarity and the effectiveness of expanding with CF2 terms
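As a sketch of how the H5 check might look: correlate baseline query clarity with the per-topic gain from adding CF2 terms. The use of Pearson correlation here is an illustrative assumption.

    from scipy import stats

    def clarity_gain_correlation(per_topic_clarity, per_topic_gain):
        # per_topic_gain: e.g. R-precision(Q1Q2Q3) - R-precision(RUTGBL) per topic.
        # H5 predicts r < 0; r near zero with a high p-value indicates no correlation.
        r, p = stats.pearsonr(per_topic_clarity, per_topic_gain)
        return r, p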
Discussion (1) • Both automatic and user-based query enhancement improved performance over the baseline, but not over pseudo-rf • No significant differences in performance between any of the enhancement methods, except Q1 vs. Q1Q3 (R-precision 0.290 vs. 0.313)
Discussion (2) • Some benefit both from automatic methods and from explicit interaction with the user, which requires effort from the user beyond the initial query formulation • This interpretation of the results depends on the assumption that title+description queries are accurate simulations of user behavior
(Tentative) Conclusions • Results indicate that invoking user interaction for query clarification is unlikely to be cost-effective • An alternative might be to develop ways to encourage more elaborate query formulation in the first instance, enhanced with automatic methods • Subsequent enhancement could come from implicit sources of evidence, rather than explicit questioning, requiring no additional effort from the user