WSCD09: Workshop on Web Search Click Data 2009
Survey and evaluation of query intent detection methods
David J. Brenes, Daniel Gayo Avello, Kilian Pérez-González
A query log (pretty raw, huh?):
phenoxyphenyl
entertainment tonight
jewish christmas
mortgage calculator
eoc phi
famington, new mexico
"oneida indian nation" "oneida, ny" and "employment"
simcity 4 cheats
benihana of novi
total body kc
ferenc molnar
cucumber shampoo molton brown
who likes grinding in wow?
solitude utah ski
nick nilsson training in orlando
homunculi tattoo
"robert louis stevenson"+quotes
wwii military installations south carolina
bogen 3016
sch 1830v
samsung
www.brookdaleliving.com
air temperature gauge
old calvary church
steelers playoff game
One query (still raw).
st. thomas bed and breakfast
• Intent (transactional?)
• Topic (shopping/travel/lodging?)
• Geographical location (Virgin Islands?)
Intent is just one of the query's dimensions:
• Query topic
• Geographical location
• Commercial/product/job seeking
• ...
• Human vs. ‘robot’ queries
• Query topic classification: KDD Cup 2005
• Log analysis and geographic query identification: LogCLEF 2009
• User goals, humans vs. robots, topic analysis of queries, etc.: WSCD’09
However… topic and intent are the classical ones, and we are mainly interested in query intent detection.
“To replicate most of the techniques to perform automatic query intent detection in addition to study the feasibility of a pooling strategy à-la-TREC to evaluate the different techniques.” “We hope to anticipate some of the benefits and difficulties from a hypothetical bakeoff on broad query classification.”
Query intent (Andrei Z. Broder)
• Informational. The intent is to acquire some information assumed to be present on one or more web pages.
• Navigational. The immediate intent is to reach a particular site.
• Transactional. The intent is to perform some web-mediated activity.
Difficult to distinguish...
Query intent
• Navigational/transactional (find the ‘right’ website): yellowpages; the huns yellow pages.com
• Informational (find the ‘right’ answer): symptoms separation anxiety dogs; should snow dogs wear booties; what is addison's diseases for dogs
Telling apart informational from navigational/transactional queries matters.
Query intent detection methods replicated
• Lee et al. 2005: click-through data + document collection, ad hoc thresholds
• Liu et al. 2006: click-through data, ad hoc evidence
• Jansen et al. 2008: rule-based
• Brenes and Gayo 2008: click-through data, ad hoc evidence
These were just the evaluated methods; those requiring training and/or external corpora were left for future research.
Query intent detection methods replicated
• Lee et al. 2005
• Click distribution (from click-through data) and anchor-link distribution (from the document collection)
• Flat distributions → informational queries
• Skewed distributions → navigational/transactional queries
• Additionally, average number of clicks per query
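The flat-vs-skewed idea can be illustrated on the click side alone. The sketch below is an assumption, not Lee et al.'s exact formulation: it measures flatness with normalized entropy over clicked URLs, and the function names and the 0.5 threshold are hypothetical (Lee et al. used their own ad hoc thresholds and also the anchor-link distribution, which is omitted here).

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Normalized entropy (0..1) of the clicks over result URLs.

    Near 0: clicks concentrate on one URL (skewed distribution).
    Near 1: clicks are spread evenly (flat distribution).
    Queries with fewer than two distinct clicked URLs return 0.0.
    """
    counts = Counter(clicked_urls)
    if len(counts) <= 1:
        return 0.0
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(len(counts))  # divide by the maximum possible entropy


def guess_intent(clicked_urls, threshold=0.5):
    # HYPOTHETICAL threshold chosen only for illustration.
    if click_entropy(clicked_urls) > threshold:
        return "informational"
    return "navigational/transactional"
```

A query whose clicks mostly land on a single site (9 of 10 on one URL) falls below the threshold, while one whose clicks are spread across several pages is labeled informational.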
Query intent detection methods replicated
• Liu et al. 2006
• n Clicks Satisfied (nCS): number of sessions where the user clicked, at most, n results (n = 2)
• Top n Results Satisfied (nRS): number of sessions where the user clicked, at most, the top n results (n = 5)
• Additionally, click distribution (Lee et al. 2005)
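A minimal sketch of the two Liu et al. features, assuming each session for a given query is represented as a list of clicked result ranks (1-based). Normalizing the session counts into fractions, and counting zero-click sessions as satisfying "at most n", are assumptions made here; the function names are hypothetical. Higher values suggest the user found what they wanted quickly, which is the behaviour expected of navigational queries.

```python
def n_clicks_satisfied(sessions, n=2):
    """nCS: fraction of sessions in which the user clicked at most n results."""
    if not sessions:
        return 0.0
    hits = sum(1 for clicks in sessions if len(clicks) <= n)
    return hits / len(sessions)


def top_n_results_satisfied(sessions, n=5):
    """nRS: fraction of sessions in which every click fell within the top n ranks."""
    if not sessions:
        return 0.0
    hits = sum(1 for clicks in sessions if all(rank <= n for rank in clicks))
    return hits / len(sessions)


# Each inner list holds the ranks clicked in one session for the same query.
example_sessions = [[1, 2, 3], [1], [4]]
```

For `example_sessions`, two of three sessions have at most two clicks (nCS = 2/3), while every click lies in the top five (nRS = 1.0).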
Query intent detection methods replicated
• Jansen et al. 2008
• Navigational queries:
• Contain names of companies, businesses, organizations or people
• Contain domain suffixes
• Have fewer than three terms
• Most of the data obtained from Freebase (http://www.freebase.com)
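The rules above can be sketched as a tiny classifier. This is illustrative only: `KNOWN_NAMES` is a hypothetical stand-in for the names drawn from Freebase, only single-token names are matched, the list of suffixes is an assumption, and the way the rules are combined (a domain suffix alone suffices; otherwise a known name in a query shorter than three terms) is this sketch's own choice rather than Jansen et al.'s published logic.

```python
import re

# HYPOTHETICAL name list; the replicated method took most names from Freebase.
KNOWN_NAMES = {"samsung", "benihana", "yellowpages", "steelers"}

# Assumed set of domain suffixes; ".com" counts even when glued to a word.
DOMAIN_SUFFIX = re.compile(r"\.(com|org|net|edu|gov)\b")


def is_navigational(query, names=KNOWN_NAMES):
    q = query.lower()
    terms = q.split()
    # Rule: the query contains a domain suffix.
    if DOMAIN_SUFFIX.search(q):
        return True
    # Rules: a known company/organization/person name AND fewer than three terms.
    has_name = any(term in names for term in terms)
    return has_name and len(terms) < 3
```

With these assumptions, `www.brookdaleliving.com` and `samsung` are flagged as navigational, while `mortgage calculator` and `steelers playoff game` (three terms) are not.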
Query intent detection methods replicated
• Brenes and Gayo 2008
• Weight of the most popular result (cPopular)
• Number of distinct visited results (cDistinct)
• Navigational queries tend to appear isolated (cSession)
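One way to compute the three Brenes and Gayo features from a click log is sketched below. The data model is an assumption: `clicks` is a list of `(query, url)` pairs and `sessions` maps a session id to the queries issued in it. Defining cPopular as the share of clicks on the most-clicked URL, and cSession as the fraction of the query's sessions that contain no other query, are likewise this sketch's interpretations.

```python
from collections import Counter


def brenes_gayo_features(query, clicks, sessions):
    """clicks: list of (query, url) pairs; sessions: dict session_id -> list of queries."""
    urls = Counter(url for q, url in clicks if q == query)
    total = sum(urls.values())
    # cPopular: weight of the most popular result among this query's clicks.
    c_popular = max(urls.values()) / total if total else 0.0
    # cDistinct: number of distinct results visited for this query.
    c_distinct = len(urls)
    # cSession: how often the query appears isolated (alone in its session).
    with_query = [qs for qs in sessions.values() if query in qs]
    isolated = sum(1 for qs in with_query if set(qs) == {query})
    c_session = isolated / len(with_query) if with_query else 0.0
    return {"cPopular": c_popular, "cDistinct": c_distinct, "cSession": c_session}


example_clicks = [("yellowpages", "yp.com")] * 3 + [("yellowpages", "other.com")]
example_sessions = {
    "s1": ["yellowpages"],
    "s2": ["yellowpages"],
    "s3": ["yellowpages", "phone book"],
}
```

For the example data, three of four clicks go to one URL (cPopular = 0.75), two distinct URLs are visited (cDistinct = 2), and the query is alone in two of its three sessions (cSession = 2/3).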
Proposed evaluation method
• Precision, recall and F-measure
• 6,624 queries (1‰ of the log) were manually tagged
• 10 judges plus the authors
• Each judge tagged approx. 1,000 queries
• A third judge resolved inconsistencies
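Against the manually tagged sample, each method can be scored per class. The sketch below assumes the balanced F-measure (F1) and represents predictions and judgments as sets of query ids labeled with the target class (for example, navigational); the function name is hypothetical.

```python
def precision_recall_f(predicted, gold):
    """predicted, gold: sets of query ids assigned to the target class."""
    tp = len(predicted & gold)  # queries both flagged and judged in the class
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)  # harmonic mean (F1)
    return precision, recall, f
```

If a method flags queries {1, 2, 3, 4} and the judges tagged {2, 3, 5}, two are correct: precision is 0.5, recall is 2/3 and F1 is 4/7.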
What about the pooling strategy?
• Unfeasible :(
• 81% of the actual navigational queries appear in the pool
• Most of the queries are flagged as navigational by at least one method: 80% of the queries (5 million) would have to be manually evaluated
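The two pooling figures correspond to simple set operations. In this sketch (method names and data are hypothetical), the pool is the union of the queries flagged as navigational by any method. Coverage is the share of judged navigational queries that land in the pool, and the judging load is the share of all queries that would need manual evaluation.

```python
def pool_stats(flagged_by_method, all_queries, gold_navigational):
    """flagged_by_method: dict method_name -> set of query ids flagged navigational."""
    pool = set().union(*flagged_by_method.values())  # TREC-style pool: union of all runs
    coverage = (len(pool & gold_navigational) / len(gold_navigational)
                if gold_navigational else 0.0)
    judging_load = len(pool) / len(all_queries) if all_queries else 0.0
    return pool, coverage, judging_load


example_flags = {"lee": {1, 2}, "liu": {2, 3}, "jansen": {4}}
```

With ten queries in total and navigational queries {1, 3, 5}, the pool is {1, 2, 3, 4}. It covers 2/3 of the navigational queries, yet 40% of the log would still have to be judged. At the scale of the real log, that is the trade-off that made pooling unfeasible.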
But the sample was pretty “small”…
• Sure. Bigger samples!
• Dolores Labs Blog: a must-read!
And what abouttheresults?
Best achievers
• Liu et al.: nCS (0.378), nRS (0.398)
• Brenes and Gayo: cPopular (0.374)
• Jansen et al.: heuristic rules (0.360)
Promising features
• nRS (users' behaviour)
• Using domains and website names (external knowledge)
• Monte Carlo simulation (?) (filling the gap)
Conclusions and future work
• Feasible evaluation methodology
• Pooling evaluation is unfeasible
• Larger tagged samples are needed
• Further refinement to improve performance