
Le Zhao*, Xiaozhong Liu#, Jamie Callan*

WikiQuery.org -- An interactive collaboration interface for creating, storing and sharing effective CNF queries.
Le Zhao*, Xiaozhong Liu#, Jamie Callan*
*: Language Tech Institute, SCS, Carnegie Mellon University
#: School of Information, University of Indiana Bloomington

Presentation Transcript


  1. WikiQuery.org -- An interactive collaboration interface for creating, storing and sharing effective CNF queries
     Le Zhao*, Xiaozhong Liu#, Jamie Callan*
     *: Language Tech Institute, SCS, Carnegie Mellon University
     #: School of Information, University of Indiana Bloomington
     @OSIR 2012, Portland, OR

  2. Status Quo
     • Current open source search engines
       • Good at: attracting software apps/service providers
         • Lucene, Lemur, Terrier, …
       • Lacking: end users
         • study users' search behavior
     • Two necessary conditions for success:
       • Attract users
         • with a unique feature/functionality
         • (not offered by current Web search engines)
       • Retain users
         • not easily copied by the current search engines

  3. This Work
     • Unique opportunity
       • The term mismatch problem
         • Significant problem w/ huge potential [Zhao10,12]
         • Web search engines: automatic expansion
         • Lots of room for manual expansion
     • Solution
       • Conjunctive normal form expansion
         • Commonly & effectively used by expert searchers [Lancaster68, Harter86, Hearst96, Baron07]

  4. Term Mismatch Problem
     • Average term mismatch rate: 30-40% [Zhao10]
     • A common cause of search failure [Harman03, Zhao10]
     • Frequent user frustration [Feild10]
     • More than 50-300% gain in retrieval accuracy [Zhao12]
     (Figure labels: relevant docs not returned; Web, short queries, stemmed, inlinks included)
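One way to read the mismatch rate on this slide is as the fraction of relevant documents that do not contain a given query term, i.e. 1 - P(t | R). The sketch below assumes that reading. The function name `mismatch_rate`, the whitespace tokenizer, and the four toy documents are all invented for illustration and are not taken from the talk.

```python
def mismatch_rate(term, relevant_docs):
    """Fraction of relevant documents that do NOT contain `term`.

    Assumes simple lowercase whitespace tokenization (no stemming).
    """
    if not relevant_docs:
        raise ValueError("need at least one relevant document")
    term = term.lower()
    missing = sum(1 for doc in relevant_docs if term not in doc.lower().split())
    return missing / len(relevant_docs)


# Toy relevant set, made up for this example.
relevant_docs = [
    "children watch television ads with brand mascots",
    "kids see logos on tv",
    "guidelines for signage on cable networks aimed at teens",
    "television logos and children",
]

rate = mismatch_rate("television", relevant_docs)
print(f"mismatch rate for 'television': {rate:.2f}")
```

In this toy set, two of the four relevant documents talk about "tv" or "cable" instead of "television", which is the kind of vocabulary gap that expansion terms are meant to close.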

  5. Term Mismatch & Boolean Conjunctive Normal Form (CNF) Expansion
     Keyword query: approval of logos on television watched by children
     Manual CNF (TREC Legal track 2006):
       (approval OR guideline OR strategy)
       AND (logos OR promotion OR signage OR brand OR mascot OR marque OR mark)
       AND (television OR TV OR cable OR network)
       AND (watched OR view OR viewer)
       AND (children OR child OR teen OR juvenile OR kid OR adolescent)
     • Expressive & compact (1 CNF == 100s alternatives)
     • Highly effective (50-300% over base keyword [Zhao12])
     • Widely used by experts (library, legal, medical …)
     • But, tedious to create
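The CNF query on this slide can be modeled as a list of OR-groups that are ANDed together. The sketch below is only an illustration of that structure: the helper names (`render`, `matches`, `count_alternatives`) and the whitespace-token matching are assumptions, not part of WikiQuery.

```python
from math import prod

# The manual CNF query from the slide: one inner list per OR-group.
cnf = [
    ["approval", "guideline", "strategy"],
    ["logos", "promotion", "signage", "brand", "mascot", "marque", "mark"],
    ["television", "TV", "cable", "network"],
    ["watched", "view", "viewer"],
    ["children", "child", "teen", "juvenile", "kid", "adolescent"],
]


def render(cnf):
    """Write the query as (a OR b) AND (c OR d) ..."""
    return " AND ".join("(" + " OR ".join(clause) + ")" for clause in cnf)


def matches(cnf, text):
    """True if every OR-group has at least one term among the text's tokens."""
    tokens = set(text.lower().split())
    return all(any(term.lower() in tokens for term in clause) for clause in cnf)


def count_alternatives(cnf):
    """Number of plain conjunctive keyword queries the CNF stands for."""
    return prod(len(clause) for clause in cnf)


doc = "network guideline on brand mascot for teen viewer"
print(render(cnf))
print(matches(cnf, doc))
print(count_alternatives(cnf))
```

The toy document contains none of the original keywords (approval, logos, television, watched, children), yet it satisfies the CNF query because each group is matched by one of its expansion terms. Picking one term from each of the five groups gives 3 × 7 × 4 × 3 × 6 = 1512 distinct conjunctive queries, which is what makes the CNF form compact.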

  6. Goal of WikiQuery
     To facilitate the
     • creation (a proper tool is lacking for practitioners)
     • storing (for refinding & sharing)
     • and sharing (for collaboration on query parts)
     of high quality CNF queries

  7. One Wiki Page == One Search Need

  8. Displaying CNF Query

  9. Links to Search Result Pages
     • Relying on existing search engines for results
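Linking a stored CNF query to an existing search engine amounts to serializing the query into that engine's URL. The sketch below shows one way this could be done and is not WikiQuery's actual implementation. It assumes the engine takes the query in a `q` parameter, treats a space between groups as AND, and accepts `OR` with parentheses; operator support differs between engines, so the `base` and `param` defaults are placeholders to adjust.

```python
from urllib.parse import urlencode


def search_url(cnf, base="https://www.google.com/search", param="q"):
    """Build a result-page link for a CNF query.

    `cnf` is a list of OR-groups (lists of terms). Groups are joined by a
    space, which is assumed to mean AND for the target engine.
    """
    query = " ".join("(" + " OR ".join(clause) + ")" for clause in cnf)
    return base + "?" + urlencode({param: query})


url = search_url([["television", "TV"], ["children", "kid"]])
print(url)
```

Because the link is rebuilt from the stored query each time, editing an OR-group on the wiki page would immediately change the results the link leads to.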

  10. Creating/Editing CNF Queries

  11. Search Result Access When Editing
     Interactions with search results are usually necessary to ensure quality of the created query.

  12. Looks Similar, but Different from
     • Library advanced search: simple Boolean
       • For example, Library of Congress Advanced Search:
         • Term1 AND/OR Term2 AND/OR Term3
     • LexisNexis: free form Boolean
     • ERIC: more flexible, joining with AND or OR

  13. Evaluating the WikiQuery Interface

  14. Experiment Setup
     • Hypothesis
       • Users (with limited knowledge about Boolean queries) can create effective Boolean queries using WikiQuery
     • Preliminary user studies
       • Classroom users
         • 6 students, limited prior knowledge of Boolean queries or IR
         • a 10-minute session of editing Wiki pages
       • TREC topics => user Boolean CNF queries
         • Total 12 topics, 12 final Boolean queries
         • Interacted with Google and Yahoo (Spring, 2010) (40 minutes per topic)
     • Evaluating on the Web & TREC collections

  15. Results
     • :-)
       • on average 30-50% gain over keyword queries
     • :-(
       • But, not consistent (Web eval > TREC eval)
       • not statistically significant over strong kw baseline
     • :-O
       • 75% contain more restrictive formulations, unstable
       • < 50% of queries are CNF expanded, bit more stable
       • Need better tools to check quality of expansion terms

  16. Conclusions
     • Conjunctive Normal Form expansion has great potential
     • Users need guidance and learning to create effective CNF queries
       • need to be warned against more restrictive queries
     • Uses of WikiQuery:
       • Expert searchers
       • Classroom: becoming experts
       • Whenever you face a hard query

  17. Source Code
     • Based on MediaWiki version 1.17
     • Available at https://github.com/lezhao/wikiquery
     • Questions: now OR catch me at lunch!
