170 likes | 340 Views
Intent Mining from Search Results. Jan Pedersen. Outline. Intro to Web Search Free text queries Architecture Why it works Result Set Mining Disambiguation Correction Amplification. The Worst Interface ( ca 1990). The Search Interface ( ca 2010). Search wasn’t always like this.
E N D
Intent Mining from Search Results Jan Pedersen
Outline • Intro to Web Search • Free text queries • Architecture • Why it works • Result Set Mining • Disambiguation • Correction • Amplification
The Worst Interface (ca 1990) The Search Interface (ca 2010)
Search wasn’t always like this ttl/(tennis and (racquet or racket))isd/1/8/2002 and motorcyclein/newmar-julie Source: USPTO
Salton’s Contribution • Free text queries • Approximate matching • Relevance ranking • Exploit redundancy • Meta data • Scored-OR Source: cs.cornell.edu
Life of a query Gerry Salton (Scored-OR 10, ([(“Gerry” or “Gerald”),0.3], [“Salton”,0.7])) Index • Separation between user query and backend query • Relevance scoring and ranking • Query-in-context summaries
Query Expansion • [Gerry Salton] [Gerry Salton Cornell] • Disambiguation via Expansion • Pseudo Relevance Feedback (Evans)
Life of a query (2) Gerry Salton (Scored-OR 10, ([(“Gerry” or “Gerald”),0.3], [“Salton”,0.7])) Gerry Salton Gerry Salton Cornell Index • Result Set Analysis • Automated Query expansion • Reranking
Spelling Correction Scored-AND(200, OR(“britinay”, “britney”), OR(“spares”, “spears”)) Blend(Scored-AND(200, “britinay”, “spares”), Scored-AND(200, “britney”, “spears”)) • Session Log Mining • Multiple queries with Blending • Behavioral feedback loop
Web Search Gerry Salton • Speller • Synonyms Third Stage reRanking: 50 (Scored-AND 200,”Gerry”, “Salton”) Second Stage reRanking: 5K First Stage reRanking: 100K News Index Local Index Index Index Index Index 100B • Query Understanding • Federation • ReRanking and Blending
Entity Detection • Grouping • Summarization
Post Result Triggering • Alternative to Answer Blending • Structured Data integration • Off-page data joins
Grouping • Reranked Results • Compressed Presentation • Coherently grouped
Summary • Web Queries are not User Intent • Suffer from ambiguity and errors • Intent can be mined from results • Query Correction • Disambiguation • Grouping and Organization