Local and Global Algorithms for Disambiguation to Wikipedia. Lev Ratinov 1 , Dan Roth 1 , Doug Downey 2 , Mike Anderson 3 1 University of Illinois at Urbana-Champaign 2 Northwestern University 3 Rexonomy. March 2011 . Information overload. Organizing knowledge.
Local and Global Algorithms for Disambiguation to Wikipedia. Lev Ratinov (1), Dan Roth (1), Doug Downey (2), Mike Anderson (3). (1) University of Illinois at Urbana-Champaign, (2) Northwestern University, (3) Rexonomy. March 2011
Organizing knowledge It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
Cross-document co-reference resolution
Reference resolution (disambiguation to Wikipedia)
The “reference” collection has structure [Figure: the linked Wikipedia titles connected by the relations Is_a, Is_a, Used_In, Released, Succeeded]
Analysis of Information Networks
Here: Wikipedia as a knowledge resource … but we can use other resources. [Figure: relations Is_a, Is_a, Used_In, Released, Succeeded]
Talk outline • High-level algorithmic approach. • Bi-partite graph matching with global and local inference. • Local Inference. • Experiments & Results • Global Inference. • Experiments & Results • Results, Conclusions • Demo
Problem formulation - matching/ranking problem Text Document(s)—News, Blogs,… Wikipedia Articles
Local approach Text Document(s)—News, Blogs,… Wikipedia Articles • Γ is a solution to the problem • A set of pairs (m,t) • m: a mention in the document • t: the matched Wikipedia Title
Local approach Text Document(s)—News, Blogs,… Wikipedia Articles • Γ is a solution to the problem • A set of pairs (m,t) • m: a mention in the document • t: the matched Wikipedia Title Local score of matching the mention to the title
Local + Global : using the Wikipedia structure Text Document(s)—News, Blogs,… Wikipedia Articles A “global” term – evaluating how good the structure of the solution is
The global formulation is NP-hard Text Document(s): News, Blogs, … Wikipedia Articles
A tractable variation Text Document(s)—News, Blogs,… Wikipedia Articles • Invent a surrogate solution Γ’; • disambiguate each mention independently. • Evaluate the structure based on pair-wise coherence scores Ψ(ti,tj)
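The formulas shown on the preceding slides did not survive in this transcript. The block below is a reconstruction from the slide text alone (local score, global term, surrogate solution Γ’, pair-wise coherence Ψ(ti,tj)). The symbol φ for the local score and the exact summation form are assumptions, not a copy of the original slides.

```latex
% Local approach: choose each title by its local score only.
\Gamma^{*}_{\text{local}} = \arg\max_{\Gamma} \sum_{i=1}^{N} \phi(m_i, t_i)

% Local + global: add a term that scores the structure of the whole solution.
\Gamma^{*} = \arg\max_{\Gamma} \Big[ \sum_{i=1}^{N} \phi(m_i, t_i) + \Psi(\Gamma) \Big]

% Tractable variation: score each title against a fixed surrogate solution \Gamma'.
\Gamma^{*} \approx \arg\max_{\Gamma} \sum_{i=1}^{N} \Big[ \phi(m_i, t_i) + \sum_{t_j \in \Gamma'} \Psi(t_i, t_j) \Big]
```

Because Γ’ is fixed before the maximization, each mention can be disambiguated on its own, which is what makes the last form tractable.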
Talk outline • High-level algorithmic approach. • Bi-partite graph matching with global and local inference. • Local Inference. • Experiments & Results • Global Inference. • Experiments & Results • Results, Conclusions • Demo
I. Baseline : P(Title|Surface Form) P(Title|”Chicago”)
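A minimal sketch of how such a baseline could be estimated, assuming the probabilities come from counts of how often a surface form is used as link text for each title. The counts and the function names below are hypothetical and only illustrate the idea.

```python
from collections import Counter, defaultdict

# Hypothetical (surface form, linked title) observations.
# Real counts would come from hyperlink anchor text in a Wikipedia dump.
ANCHORS = [
    ("Chicago", "Chicago"),
    ("Chicago", "Chicago"),
    ("Chicago", "Chicago"),
    ("Chicago", "Chicago_(band)"),
    ("Chicago", "Chicago_(typeface)"),
]


def build_prior(anchors):
    """Return a dict: surface form -> {title: P(title | surface form)}."""
    counts = defaultdict(Counter)
    for surface, title in anchors:
        counts[surface][title] += 1
    prior = {}
    for surface, title_counts in counts.items():
        total = sum(title_counts.values())
        prior[surface] = {t: c / total for t, c in title_counts.items()}
    return prior


def baseline_title(prior, surface):
    """Pick the most probable title, or None if the surface form is unseen."""
    candidates = prior.get(surface)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


prior = build_prior(ANCHORS)
best = baseline_title(prior, "Chicago")
```

With these toy counts the baseline always answers with the city, whatever the surrounding text says, which is why the later slides add context and text features.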
II. Context(Title) Context(Charcoal)+= “a font called __ is used to”
III. Text(Title) Just the text of the page (one per title)
Putting it all together • City vs. Font: (0.99-0.0001, 0.01-0.2, 0.03-0.01) • Band vs. Font: (0.001-0.0001, 0.001-0.2, 0.02-0.01) • Each tuple appears to hold the differences between the two titles’ baseline, context and text scores. • Training a ranking SVM: • Consider all title pairs. • Train a ranker on the pairs (learn to prefer the correct solution). • Inference = knockout tournament. • Key: Abstracts over the text – learns which scores are important.
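The knockout-tournament inference can be sketched as follows. This is an illustration under stated assumptions: the ranker is taken to be linear over score differences, the feature order (baseline, context, text) is read off the slide's tuples, and the weights are invented for the example rather than learned.

```python
def pair_score(weights, feats_a, feats_b):
    """Linear ranker applied to the feature difference; > 0 means prefer a."""
    return sum(w * (a - b) for w, a, b in zip(weights, feats_a, feats_b))


def knockout(candidates, weights):
    """Run a sequential knockout: the current winner faces each challenger.

    candidates: dict mapping title -> feature vector.
    Returns the surviving title, or None for an empty candidate set.
    """
    titles = list(candidates)
    if not titles:
        return None
    winner = titles[0]
    for challenger in titles[1:]:
        if pair_score(weights, candidates[challenger], candidates[winner]) > 0:
            winner = challenger
    return winner


# Feature values taken from the slide's tuples: (baseline, context, text).
CANDIDATES = {
    "Chicago": (0.99, 0.01, 0.03),
    "Chicago_(band)": (0.001, 0.001, 0.02),
    "Chicago_(typeface)": (0.0001, 0.2, 0.01),
}

# Hypothetical weights; a trained ranking SVM would supply these.
WEIGHTS = (0.5, 10.0, 5.0)

winner = knockout(CANDIDATES, WEIGHTS)
```

With a purely linear ranker the tournament returns the same title as a plain argmax over scores; the pairwise form matters once the comparison function is no longer a simple sum.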
Example: font or city? Text(Chicago_city), Context(Chicago_city) It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_font), Context(Chicago_font)
Lexical matching Text(Chicago_city), Context(Chicago_city) Cosine similarity, TF-IDF weighting It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_font), Context(Chicago_font)
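A small sketch of TF-IDF weighted cosine similarity between the mention's text and a candidate page, using plain dictionaries as sparse vectors. The document frequencies, corpus size and token lists are made up for illustration; the slides do not give the exact weighting variant used.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Map tokens to a sparse TF-IDF vector (dict: term -> weight)."""
    tf = Counter(tokens)
    return {
        term: count * math.log(n_docs / doc_freq[term])
        for term, count in tf.items()
        if term in doc_freq
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical document frequencies over a corpus of 100 pages.
N_DOCS = 100
DOC_FREQ = {"font": 5, "menu": 8, "macintosh": 4,
            "city": 50, "illinois": 10, "version": 60}

mention_vec = tfidf_vector(["version", "macintosh", "menu", "font"], DOC_FREQ, N_DOCS)
font_vec = tfidf_vector(["font", "macintosh", "menu", "font"], DOC_FREQ, N_DOCS)
city_vec = tfidf_vector(["city", "illinois", "city"], DOC_FREQ, N_DOCS)

sim_font = cosine(mention_vec, font_vec)
sim_city = cosine(mention_vec, city_vec)
```

In this toy setup the font page shares rare terms with the mention's sentence and the city page shares none, so the font scores higher.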
Ranking – font vs. city Text(Chicago_city), Context(Chicago_city) 0.2 0.8 0.5 0.1 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. 0.2 0.5 0.3 0.3 Text(Chicago_font), Context(Chicago_font)
Train a ranking SVM Text(Chicago_city), Context(Chicago_city) (0.5, 0.2 , 0.1, 0.8) It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. [(0.2, 0, -0.2, 0.3), -1] (0.3, 0.2, 0.3, 0.5) Text(Chicago_font), Context(Chicago_font)
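The training example on this slide is the element-wise difference of the two score vectors, labelled by which title is correct. A minimal sketch of that construction, with a hypothetical helper name and the slide's own numbers:

```python
def pairwise_example(feats_a, feats_b, a_is_correct):
    """Build one ranking example: (feats_a - feats_b, +1 if a is correct else -1)."""
    # Rounding only hides floating-point noise in this small illustration.
    diff = tuple(round(a - b, 10) for a, b in zip(feats_a, feats_b))
    label = 1 if a_is_correct else -1
    return diff, label


city = (0.5, 0.2, 0.1, 0.8)   # scores for Chicago_city, as on the slide
font = (0.3, 0.2, 0.3, 0.5)   # scores for Chicago_font, as on the slide

# The mention refers to the font, so the city is the wrong side of the pair.
example = pairwise_example(city, font, a_is_correct=False)
```

Feeding such difference vectors and labels to a linear classifier is the usual way a ranking SVM is trained; the mirrored pair (font minus city, label +1) can be added as well.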
Scaling issues – one of our key contributions Text(Chicago_city), Context(Chicago_city) It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_font), Context(Chicago_font)
Scaling issues Text(Chicago_city), Context(Chicago_city) These representations are large and are loaded into memory from disk It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Text(Chicago_font), Context(Chicago_font)
Improving performance Text(Chicago_city), Context(Chicago_city) Rather than computing TF-IDF weighted cosine similarity, we want to train a classifier on the fly. But due to the aggressive feature pruning, we choose PrTFIDF It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_font), Context(Chicago_font)
Talk outline • High-level algorithmic approach. • Bi-partite graph matching with global and local inference. • Local Inference. • Experiments & Results • Global Inference. • Experiments & Results • Results, Conclusions • Demo
Co-occurrence(Title1,Title2) The city senses of Boston and Chicago appear together often.
Co-occurrence(Title1,Title2) Rock music and albums appear together often
Global ranking • How to approximate the “global semantic context” in the document? (What is Γ’?) • Use only non-ambiguous mentions for Γ’ • Use the top baseline disambiguation for NER surface forms. • Use the top baseline disambiguation for all the surface forms. • How to define relatedness between two titles? (What is Ψ?)
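Two of the options for Γ’ listed above can be sketched as follows, assuming a table of P(Title|Surface Form) is already available. The table contents and function names are hypothetical; the NER-based option is left out because it needs a named-entity tagger.

```python
# Hypothetical prior: surface form -> {title: P(title | surface form)}.
PRIOR = {
    "Chicago": {"Chicago": 0.6, "Chicago_(band)": 0.3, "Chicago_(typeface)": 0.1},
    "MacOS": {"Mac_OS": 1.0},
}


def surrogate_unambiguous(mentions, prior):
    """Γ' built only from mentions that have exactly one candidate title."""
    return {
        next(iter(prior[m]))
        for m in mentions
        if m in prior and len(prior[m]) == 1
    }


def surrogate_top_baseline(mentions, prior):
    """Γ' built from the most probable title of every known surface form."""
    return {
        max(prior[m], key=prior[m].get)
        for m in mentions
        if prior.get(m)
    }


MENTIONS = ["Chicago", "MacOS", "Peter Thompson"]
gamma_unambiguous = surrogate_unambiguous(MENTIONS, PRIOR)
gamma_baseline = surrogate_top_baseline(MENTIONS, PRIOR)
```

The first option is cleaner but can leave Γ’ nearly empty; the second covers more mentions at the cost of including some wrong titles.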
Ψ: Pair-wise relatedness between two titles • Normalized Google Distance (NGD) • Pointwise Mutual Information (PMI)
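The formulas for the two measures are missing from this transcript. The sketch below uses the common link-based definitions, with each title represented by the set of articles that link to it; treating the slide's measures as exactly these forms (including the un-logged PMI ratio) is an assumption.

```python
import math


def ngd(links_a, links_b, n_articles):
    """Normalized Google Distance over link sets; 0 means identical usage."""
    common = len(links_a & links_b)
    if common == 0:
        return float("inf")
    big = max(len(links_a), len(links_b))
    small = min(len(links_a), len(links_b))
    denom = math.log(n_articles) - math.log(small)
    if denom == 0.0:
        return 0.0
    return (math.log(big) - math.log(common)) / denom


def pmi(links_a, links_b, n_articles):
    """Ratio P(a, b) / (P(a) * P(b)); take its log for the usual PMI."""
    if not links_a or not links_b:
        return 0.0
    p_ab = len(links_a & links_b) / n_articles
    p_a = len(links_a) / n_articles
    p_b = len(links_b) / n_articles
    return p_ab / (p_a * p_b)


# Hypothetical link sets (article ids) in a collection of 100 articles.
N = 100
boston = {1, 2, 3, 4}
chicago_city = {2, 3, 4, 5}
chicago_font = {9}

related = pmi(boston, chicago_city, N)
unrelated = pmi(boston, chicago_font, N)
```

NGD is a distance (smaller means more related) while PMI is a similarity (larger means more related), so one of them has to be inverted before both are used as coherence features.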
What is the best Γ’? (ranker accuracy, solvable mentions)
Talk outline • High-level algorithmic approach. • Bi-partite graph matching with global and local inference. • Local Inference. • Experiments & Results • Global Inference. • Experiments & Results • Results, Conclusions • Demo
Conclusions: • Dealing with a very large scale knowledge acquisition and extraction problem • State-of-the-art algorithmic tools that exploit the content & structure of the network. • Formulated a framework for local & global reference resolution and disambiguation into knowledge networks • Proposed local and global algorithms: state-of-the-art performance. • Addressed the scaling issue, a major one. • Identified key remaining challenges (next slide).
We want to know what we don’t know (mentions with no Wikipedia page) • Not dealt with well in the literature • “As Peter Thompson, a 16-year-old hunter, said …” • “Dorothy Byrne, a state coordinator for the Florida Green Party…” • We train a separate SVM classifier to identify such cases. The features are: • All the baseline, lexical and semantic scores of the top candidate. • Score assigned to the top candidate by the ranker. • The “confidence” of the ranker in the top candidate relative to the second-best disambiguation. • Good-Turing probability of out-of-Wikipedia occurrence for the mention. • Limited success; future research.
Comparison to the previous state of the art (all mentions, including OOW)