Learning Joint Query Interpretation and Response Ranking. Uma Sawant, Soumen Chakrabarti, IIT Bombay. Searching the “Web of things”. At least 14% of Web search queries mention target type or category (Lin et al., WWW 2012). Telegraphic entity search queries.
Learning Joint Query Interpretation and Response Ranking • Uma Sawant, Soumen Chakrabarti • IIT Bombay
Searching the “Web of things” • At least 14% of Web search queries mention a target type or category (Lin et al., WWW 2012)
Telegraphic entity search queries • No reliable syntax clues for the search engine • Free word order • No or rare capitalization • Rare to find quoted phrases • Few function or relational words
How to answer entity queries? (simplified view of related work) [Diagram: a telegraphic query or NLQ enters a 2-stage process. Query interpretation, using templates, produces an execution-ready query; execution and ranking over a knowledge base return entities e1, e2, e3.]
Our Proposal [Diagram: a telegraphic query gives rise to multiple interpretations, each paired with a response. Joint query interpretation and ranking over an annotated corpus, using generative and discriminative models, returns entities e1, e2, e3.]
The annotated Web [Diagram: a type hierarchy rooted at Type: All, with Type: Major_league_baseball_teams linked by subTypeOf; Entity: San_Diego_Padres is an instanceOf that type; an annotated document carries a mentionOf the entity: “… By comparison, the Padres have been to two World Series, losing in 1984 and 1998. …”]
Query = type hints + word matchers • Large type catalog • Most query words match some type • Padres rarely co-occurs with hockey • Can know this only from corpus stats • Example query: losing team baseball world series 1998 • Incorrect type: World_Series_Hockey_teams
Query = type hints + word matchers • Large type catalog • Most query words match some type • Padres rarely co-occurs with hockey • Can know this only from corpus stats • Need joint type inference and snippet scoring • Example query: losing team baseball world series 1998 • Correct type: Major_league_baseball_teams • instanceOf Entity: San_Diego_Padres • mentionOf evidence snippet with word matches: “By comparison, the Padres have been to two World Series, losing in 1984 and 1998.”
Generative model: generate query from entity • Entity E: San Diego Padres • Type T: Major league baseball team • Context: “Padres have been to two World Series, losing in 1984 and 1998” • Type hints (from the type model): baseball, team • Context matchers (from the context model): lost, 1998, world series • Switch Z chooses between the two models for each word • Query q: losing team baseball world series 1998
Generative approach: plate diagram [Diagram: choose entity E; choose type T to describe the entity; for each query word, a “switch” variable Z decides whether the word hints at the type or is a matcher; generate query word W from the type description language model (hints) or from the entity context language model (matchers). Plates repeat per query word and per query.]
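The generative story above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each word's switch is assumed to be drawn independently with a fixed hint probability `p_hint`, the language models are plain unigram dictionaries with a small floor for unseen words, and every name and number below (`query_likelihood`, `type_lms`, the toy probabilities) is hypothetical.

```python
import math

def word_prob(lm, word, floor=1e-6):
    """Unigram probability, with an assumed small floor for unseen words."""
    return lm.get(word, floor)

def query_likelihood(query, type_prior, type_lms, context_lm, p_hint=0.5):
    """Pr(q | e): sum over candidate types t, marginalising each word's switch.

    type_prior: {type: Pr(t | e)}
    type_lms:   {type: unigram LM built from the type description}
    context_lm: unigram LM built from snippets that mention the entity
    """
    total = 0.0
    for t, p_t in type_prior.items():
        log_p = math.log(p_t)
        for w in query:
            # Switch z: the word is either a type hint or a context matcher.
            mix = (p_hint * word_prob(type_lms[t], w)
                   + (1.0 - p_hint) * word_prob(context_lm, w))
            log_p += math.log(mix)
        total += math.exp(log_p)
    return total

# Toy, made-up language models for illustration only.
query = "losing team baseball world series 1998".split()
type_lms = {
    "baseball_team": {"baseball": 0.4, "team": 0.4, "league": 0.2},
    "hockey_team": {"hockey": 0.4, "team": 0.4, "league": 0.2},
}
padres_context = {"losing": 0.2, "world": 0.2, "series": 0.2, "1998": 0.2, "padres": 0.2}
other_context = {"hockey": 0.3, "cup": 0.3, "series": 0.2, "world": 0.2}

padres = query_likelihood(query, {"baseball_team": 1.0}, type_lms, padres_context)
other = query_likelihood(query, {"hockey_team": 1.0}, type_lms, other_context)
print(padres > other)
```

Multiplying `query_likelihood` by an entity prior would give a quantity that can be used to rank entities; marginalising over types and switches, rather than fixing them first, is what keeps the interpretation soft.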
Discriminative model: separate correct and incorrect entities • q: losing team baseball world series 1998 • Candidate entities: San_Diego_Padres, 1998_World_Series • Each candidate is scored under alternative interpretations of q, for example t = baseball team and t = series
Feature vector design inspired by the generative model • Feature vector given query, entity, type, switches • Models entity prior • Models type prior Pr(t|e) • Hints: compatibility between hint words and type • Matchers: compatibility between matchers and snippets that mention e
Discriminative framework • Constraints are formulated using the best scoring interpretation • Non-convex formulation • Annealing algorithms
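A rough sketch of how constraints built on the best-scoring interpretation might look, assuming a linear scorer and a pairwise hinge loss. The loss form, the two-dimensional feature vectors, and all names here are assumptions for illustration; the slides do not specify them.

```python
def dot(w, phi):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, phi))

def best_interpretation(w, interpretations):
    """Return (score, label) of the highest-scoring interpretation of one entity.

    interpretations: list of (label, feature_vector) pairs, one per
    (type, switch assignment) considered for that entity.
    """
    return max((dot(w, phi), label) for label, phi in interpretations)

def pairwise_hinge(w, good, bad, margin=1.0):
    """Loss is zero once the correct entity beats the incorrect one by `margin`."""
    good_score, _ = best_interpretation(w, good)
    bad_score, _ = best_interpretation(w, bad)
    return max(0.0, margin - (good_score - bad_score))

# Hypothetical features: [hint-type compatibility, matcher-snippet compatibility].
w = [1.0, 1.0]
padres = [("t=baseball team", [0.9, 0.8]), ("t=series", [0.2, 0.8])]
world_series_1998 = [("t=baseball team", [0.1, 0.3]), ("t=series", [0.6, 0.3])]

print(best_interpretation(w, padres)[1])
print(pairwise_hinge(w, padres, world_series_1998))
```

Because the correct entity's score is itself a max over interpretations, the loss is not convex in `w`, which is consistent with the slide's mention of a non-convex formulation and annealing.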
Testbed • YAGO entity and type catalog: ~0.2 million types and 1.9 million entities • Annotated corpus: Web corpus of 500 million pages, ~16 annotations per page • ~700 entity search queries from TREC + INEX, converted to telegraphic form, with most probable type and answer entities
Experiment 1: Entity ranking using joint inference • To reach: human-recommended type • To surpass: most generic type in catalog (no type inference) • Entity-level NDCG measure (MAP and MRR follow the same trend, details in paper)
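For readers unfamiliar with the measure, here is a minimal NDCG sketch. It assumes the common formulation with linear gain and a log2 rank discount, and it computes the ideal ordering from the relevance labels passed in; the exact variant used in the paper may differ.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k results (rank 1 has discount 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (sorted) ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# relevances[i] is 1 if the entity at rank i+1 is a correct answer, else 0.
print(ndcg_at_k([1, 0, 0], 3))
print(ndcg_at_k([0, 1], 2))
```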
Human > Discriminative > Generative > Generic [Figure: NDCG (y-axis, 0.1 to 0.8) against rank (x-axis, 1 to 10) for human, discriminative, generative, and generic] • Generative significantly better than generic (lower bound) • Generative fills 28% of the gap to human (upper bound) • Discriminative significantly better than generic (lower bound) • Discriminative fills 43% of the gap to human (upper bound) • Discriminative significantly better than generative • Easier to balance diverse scales of probabilities
Generic vs. discriminative • Correct hint match & type choice: cathedral claude monet painting • Incorrect hint match & type choice: amazing grace hymn writer
Discriminative better than human • Correct entity unreachable from human-recommended type • Discriminative recovers using corpus feedback • Example query: patsy cline producer; type labels shown: producer, manufacturer; answer entity: Owen Bradley
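The "fills X% of the gap" figures are presumably computed as the share of the distance between the generic lower bound and the human upper bound that a system covers. A small sketch of that arithmetic follows; the function name is hypothetical and the example NDCG values are made up to show the calculation, not taken from the paper.

```python
def gap_filled(system, lower, upper):
    """Fraction of the lower-to-upper gap covered by `system`."""
    if upper == lower:
        raise ValueError("upper and lower bounds must differ")
    return (system - lower) / (upper - lower)

# Made-up NDCG values, for illustration only.
generic, human, system = 0.30, 0.50, 0.386
print(round(gap_filled(system, generic, human), 2))
```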
Experiment 2: Target type inference • Aggregate ranks of top-k interpretations to rank types • Compare type-level NDCG with B&N 2012 • Example: the top-k interpretations of hermitage museum bank river, with types (river), (museum), …, (building), yield possible target types river, museum, building
Joint prediction improves type inference • Data: [B&N 2012], DBpedia catalog • Joint prediction improves type inference too!
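One way to aggregate the ranks of the top-k interpretations into a type ranking is sketched below. The slide does not say which aggregation is used, so the reciprocal-rank voting here is an assumption, and the entity labels `e1`, `e2`, `e3` are placeholders.

```python
from collections import defaultdict

def rank_types(interpretations, k):
    """Rank types by summed reciprocal rank over the top-k interpretations.

    interpretations: list of (entity, type_name) pairs, best first.
    The reciprocal-rank vote is an assumed aggregation rule.
    """
    votes = defaultdict(float)
    for rank, (_entity, type_name) in enumerate(interpretations[:k], start=1):
        votes[type_name] += 1.0 / rank
    # Highest vote first; ties broken alphabetically for determinism.
    return sorted(votes, key=lambda t: (-votes[t], t))

# Interpretations of "hermitage museum bank river", best first (illustrative).
top = [("e1", "river"), ("e2", "museum"), ("e1", "river"), ("e3", "building")]
print(rank_types(top, k=4))
```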
Experiment 3: joint vs. two-stage • Two-stage • Stage 1, type inference: best type prediction from Experiment 2 • Stage 2, ranking: launch type-restricted query on annotated corpus • Top m types to improve recall • Measure entity-level NDCG • Example: from predicted types river, museum, building, form the query (river) + matchers, or (river OR museum) + matchers when more types are used
Joint entity ranking better than two-stage [Figure: NDCG (y-axis, 0.2 to 0.6) against rank (x-axis, 1 to 10) for Joint, 2stage(m=1), 2stage(m=5), and 2stage(m=10)] • Using more types brings little benefit to 2-stage • Joint type prediction and ranking significantly better than 2-stage
Conclusion • Large percentage of Web search queries contain a mention of the target type • Identification of target type hint words and type itself is rewarding, but non-trivial • Joint query interpretation and ranking approach significantly better than two-stage • Joint prediction improves type inference • Datasets available at bit.ly/WSpxvr
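The second stage of the baseline can be pictured as building a type-restricted query from the top m predicted types. The sketch below only mirrors the "(river OR museum) + matchers" shape shown on the slide; the string syntax and the function name are hypothetical, not the query language of any real index.

```python
def type_restricted_query(ranked_types, matchers, m):
    """Combine the top-m predicted types (OR-ed) with the matcher words."""
    top = ranked_types[:m]
    if not top:
        raise ValueError("need at least one predicted type")
    type_clause = "(" + " OR ".join(top) + ")"
    return " ".join([type_clause] + list(matchers))

ranked = ["river", "museum", "building"]   # output of stage 1 (type inference)
matchers = ["hermitage", "bank"]           # illustrative choice of matcher words
print(type_restricted_query(ranked, matchers, m=1))
print(type_restricted_query(ranked, matchers, m=2))
```

A hard restriction like this commits to the stage-1 types before any corpus evidence is seen, which is the contrast with the joint approach.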
References • P. Pantel, T. Lin, M. Gamon: Mining Entity Types from Query Logs via User Intent Modeling. ACL (1) 2012: 563-571 • K. Balog, R. Neumayer: Hierarchical Target Type Identification for Entity-oriented Queries. CIKM 2012 • T. Lin, P. Pantel, M. Gamon, A. Kannan, A. Fuxman: Active Objects: Actions for Entity-Centric Search. WWW 2012
Components of the model • Entity prior: (weighted) fraction of snippets attached to an entity in the corpus • Type prior: generality or specificity of types • Hint-type compatibility: probability of generating hint words from a language model built using the type description; hint sub-sequence matches some type name exactly • Matcher-entity compatibility: weighted fraction of snippets attached to an entity, retrieved using matchers; rarity of matchers + number of supporting snippets
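Two of these components can be illustrated on a toy corpus. The sketch below is one possible reading, with assumptions stated plainly: snippets are unweighted, tokenisation is a whitespace split, and matcher-entity compatibility is taken as the share of an entity's snippets that contain every matcher. The function names and the snippets are invented for the example.

```python
def entity_prior(snippets_by_entity, entity):
    """Unweighted fraction of all snippets that are attached to `entity`."""
    total = sum(len(s) for s in snippets_by_entity.values())
    return len(snippets_by_entity.get(entity, [])) / total if total else 0.0

def matcher_compatibility(snippets_by_entity, entity, matchers):
    """Fraction of the entity's snippets that contain every matcher word."""
    snippets = snippets_by_entity.get(entity, [])
    if not snippets:
        return 0.0
    wanted = {m.lower() for m in matchers}
    hits = sum(1 for s in snippets if wanted <= set(s.lower().split()))
    return hits / len(snippets)

# Invented snippets, for illustration only.
corpus = {
    "San_Diego_Padres": [
        "padres lost the world series in 1998",
        "padres signed a new pitcher",
    ],
    "1998_World_Series": [
        "the 1998 world series was a sweep",
        "tickets for the series sold out",
    ],
}
print(entity_prior(corpus, "San_Diego_Padres"))
print(matcher_compatibility(corpus, "San_Diego_Padres", ["world", "series", "1998"]))
```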
Implementation details • Additive features • One generic query executed on index, rest in memory • Pruned large search space using easy heuristics • Continuous hint words
Not entity disambiguation in query • ymca in query refers to song or organization? • Similar to entity disambiguation in documents • Uses accompanying words • Misinterpreting target type: usually disastrous • Avoid early or hard commitment • Example: Query: ymca lyrics, Entity: YMCA_(song), instanceOf Type: Music; Query: ymca address, Entity: YMCA_(org), instanceOf Type: Organization; a topic model is learned for each
Future work • Better type description model • More generic query than “hint+matchers” • Entities as literals • Different models • Explore non-linear models (boosting) • List-wise loss • Use click data
Generative framework [Diagram: choose entity E to describe; choose type T to describe the entity; for each query word, a “switch” variable Z decides whether the word hints at the type or is a matcher; generate query word W from the type description language model or the entity context language model. Plates repeat per query word and per query.]
Discriminative framework • Feature vector given query, entity, type, switches • Models entity prior • Models type prior Pr(t|e) • Hints: compatibility between hint words and type • Matchers: compatibility between matchers and snippets that mention e • Given q, score of response e is: [formula not preserved] • Ranking model trained by distant supervision
Joint entity ranking better than two-stage • State of the art target type predictor • Does not use corpus information • Pick top k types to improve type recall • Launch type-restricted query on annotated corpus • Significantly worse than joint type prediction and ranking
How to answer entity queries? (simplified view of related work) [Diagram: a telegraphic query or NLQ enters a 2-stage process. Query interpretation, using templates, produces an execution-ready query; execution and ranking return entities e1, e2, e3. Data sources shown: annotated corpus, and knowledge in the form of RDF tuples and tables.]
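The scoring formula did not survive on this slide. One form consistent with the statement that constraints use the best-scoring interpretation would be a linear model maximised over types and switch assignments. The symbols w (weight vector) and φ (feature map) are assumptions introduced here, not notation confirmed by the slides:

```latex
\operatorname{score}(e \mid q) \;=\; \max_{t,\,z}\; w^{\top} \phi(q, e, t, z)
```

Here t ranges over candidate types of e and z over assignments of query words to hints or matchers.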