Identifying Implicit Relationships J. Chu-Carroll, E. W. Brown, A. Lally, and J. W. Murdock Presented by Christine Boucher, HON 111
Outline • Introduction to Implicit Relationships • Spreading Activation • Watson’s 3 Information Resources • Application to COMMON BOND Questions • Application to Missing Link Questions • Evaluation against Watson’s baseline system • Conclusion
Introduction • Resolving an implicit reference to a hidden concept • Question Types: • COMMON BOND • Feet, eyebrows and McDonald’s have arches in common • Trout, loose change and compliments are things that you fish for • Missing Link questions • “The 1648 Peace of Westphalia ended a war that began on May 23 of this year.”
Identifying the missing link • "The 1648 Peace of Westphalia ended a war that began on May 23 of this year." • Chain: Peace of Westphalia → ended the Thirty Years' War → the war began in 1618 (the answer)
Problem? • Need to identify concepts that are closely related to those given in the question… • …then use that information to solve for the final answer
Spreading Activation • Theory of spreading activation • Originated in cognitive psychology, used to explain semantic processing and lexical retrieval • Activation of a semantic network • Concepts in a network are activated through their connections to already active concepts
Spreading Activation Algorithm • Measure concept relatedness on the basis of how frequently concepts co-occur • Activation over natural-language text uses Watson's 3 resources: • n-gram corpus • PRISMATIC knowledge base • Wikipedia links • Parameters: fan size f, depth d • Expand to the f most-related concepts of the currently activated concept • Recursively invoked to depth d
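A minimal sketch of the fan/depth expansion described above, assuming a hypothetical `related(concept)` lookup that stands in for one of Watson's three resources; the toy concepts and scores are illustrative, not Watson's actual implementation.

```python
from collections import defaultdict

def spread_activation(seed, related, fan=5, depth=2):
    """Expand a seed concept through a relatedness resource.

    `related(concept)` is a hypothetical stand-in for a resource query
    (n-gram corpus, PRISMATIC, or Wikipedia links); it returns a dict
    mapping neighboring concepts to relatedness scores.
    """
    activation = defaultdict(float)

    def expand(concept, weight, level):
        if level == 0:
            return
        # Keep only the `fan` most-related neighbors of the current concept.
        neighbors = sorted(related(concept).items(),
                           key=lambda kv: kv[1], reverse=True)[:fan]
        for neighbor, score in neighbors:
            activation[neighbor] += weight * score
            expand(neighbor, weight * score, level - 1)  # recurse to depth d

    expand(seed, 1.0, depth)
    return dict(activation)

# Toy relatedness lookup standing in for one of the three resources.
toy = {
    "Peace of Westphalia": {"Thirty Years' War": 0.9, "1648": 0.7},
    "Thirty Years' War":   {"1618": 0.8, "Habsburgs": 0.5},
}
related = lambda c: toy.get(c, {})
print(spread_activation("Peace of Westphalia", related, fan=2, depth=2))
```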
n-gram corpus • Contiguous sequence of n items from a sequence of text or speech • Could be sequences of letters, syllables, words, etc. • 5-gram corpus: corpus of 5-word sequences from text (with function words removed) • E.g. "Pineapples grow in the tropical climate of Hawaii and taste sweet." • Lexical collocation • Retrieval of frequently collocated terms leads to a computation of semantic similarity • E.g. high collocation frequency between the terms "JFK" and "airport" and between "JFK" and "assassination"
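A toy illustration of turning a 5-gram corpus into collocation counts; the two hand-written 5-grams and the pair-counting scheme are assumptions for illustration, not the real corpus or Watson's scoring.

```python
from collections import Counter
from itertools import combinations

# A tiny illustrative 5-gram "corpus" (function words already removed);
# real counts would come from a large-scale n-gram resource.
five_grams = [
    ("jfk", "airport", "new", "york", "flights"),
    ("jfk", "assassination", "dallas", "november", "1963"),
]

collocation = Counter()
for gram in five_grams:
    # Every pair of terms co-occurring inside a 5-gram counts as a collocation.
    for a, b in combinations(sorted(set(gram)), 2):
        collocation[(a, b)] += 1

def collocation_count(t1, t2):
    return collocation[tuple(sorted((t1, t2)))]

print(collocation_count("jfk", "airport"))        # 1
print(collocation_count("jfk", "assassination"))  # 1
```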
PRISMATIC knowledge base • Extracts frames and slots based on syntactic relationships • Syntactic frame – links arguments and predicates • Example frame: SVO (subject-verb-object): "Ford pardoned Nixon in 1974" → (Ford, pardon, Nixon) • A query provides the count of SVO tuples with subject Ford, etc. • Other frame types: SVPO and NPO • Counts for the 3 frame types are combined to compute the total frequency of links between two terms and a relatedness score
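A sketch of combining counts from the three frame types into one relatedness score; `frame_counts` and its numbers are hypothetical, and the verb/preposition slots are collapsed for brevity rather than queried as in the real PRISMATIC store.

```python
def prismatic_relatedness(term1, term2, frame_counts):
    """Combine frame counts linking two terms into a single relatedness score.

    `frame_counts` is a hypothetical dict of dicts:
    frame type -> {(arg1, arg2): count}, standing in for queries against
    the PRISMATIC store over SVO, SVPO, and NPO frames.
    """
    total = 0
    for frame_type in ("SVO", "SVPO", "NPO"):
        counts = frame_counts.get(frame_type, {})
        total += counts.get((term1, term2), 0) + counts.get((term2, term1), 0)
    return total

frame_counts = {
    "SVO":  {("ford", "nixon"): 37},   # e.g. "Ford pardoned Nixon in 1974"
    "SVPO": {("ford", "nixon"): 12},
    "NPO":  {("nixon", "ford"): 5},
}
print(prismatic_relatedness("ford", "nixon", frame_counts))  # 54
```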
Covering the gaps left by the n-gram corpus • n-gram counts relate words that appear lexically near each other, while PRISMATIC counts words that are syntactically connected • E.g. "Ford did not act hastily but did finally pardon Nixon in September." (PRISMATIC still links Ford and Nixon via the SVO frame even though they fall outside a 5-word window)
Wikipedia Links • Uses metadata encoded in Web documents rather than the texts themselves • Documents link to other documents • Target documents typically describe concepts closely related to the source document
Using Wikipedia links, continued • Capture semantic relatedness using article titles • Article titles represent the canonical form of concepts → higher likelihood of finding a common related concept given 2 or more concepts • Essence: given a term t, identify the Wikipedia document whose title best matches t and return all target document titles from the links in that document
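A minimal sketch of that link-expansion step, assuming a hypothetical `articles` mapping of titles to linked titles in place of Wikipedia's real link metadata; title matching here is simple case-insensitive equality, whereas Watson's matching is more forgiving.

```python
def wiki_link_expansion(term, articles):
    """Return the titles linked from the article whose title best matches `term`.

    `articles` is a hypothetical dict of title -> list of linked titles.
    """
    best = next((t for t in articles if t.lower() == term.lower()), None)
    return articles.get(best, [])

articles = {
    "George Mallory": ["Mount Everest", "1924 British Mount Everest expedition"],
    "Edmund Hillary": ["Mount Everest", "Tenzing Norgay"],
}
print(wiki_link_expansion("george mallory", articles))
```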
Application to Common-bond Questions • The answers are all semantically related to the given entities • Calls for use of spreading activation • Identify concepts that are closely related to each given entity • Score each concept on the basis of its degree of relatedness to all given entities
Candidate Generation • Spreading activation invoked on each entity • Example: Bobby, bowling, rolling (pins) • bobby: Robert, British police officer, pin • bowling: lane, strike, 300, pin • rolling: Rolling Stone, ramp, pin • Related concepts found are generated as candidate answers • strike, British police officer, Rolling Stone, pin, ramp • Search n-gram corpus for most frequently collocated terms
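A sketch of pooling candidates from a spreading-activation pass over each clue entity; the `related` lookup and its scores are made up for the bobby/bowling/rolling example, not drawn from the real n-gram corpus.

```python
def generate_candidates(entities, related, fan=5):
    """Pool the top related concepts of each clue entity into one candidate set.

    `related(entity)` is a hypothetical lookup returning concept -> score,
    standing in for a spreading-activation pass over Watson's resources.
    """
    candidates = set()
    for entity in entities:
        top = sorted(related(entity).items(),
                     key=lambda kv: kv[1], reverse=True)[:fan]
        candidates.update(concept for concept, _ in top)
    return candidates

related = lambda e: {
    "bobby":   {"Robert": 0.9, "British police officer": 0.8, "pin": 0.6},
    "bowling": {"lane": 0.9, "strike": 0.8, "pin": 0.8, "300": 0.5},
    "rolling": {"Rolling Stone": 0.9, "ramp": 0.4, "pin": 0.7},
}[e]

print(generate_candidates(["bobby", "bowling", "rolling"], related))
```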
Common-bond answer scorer • Candidates scored on basis of semantic relatedness to each given entity • Relatedness of 'strike', 'British police officer', 'Rolling Stone', 'pin', 'ramp' to 'bobby', 'bowling', 'rolling' • Multiply 3 NGD (Normalized Google Distance) scores for overall goodness score of candidate as common-bond answer • f(Bobby, pin) × f(bowling, pin) × f(rolling, pin) = pin's score • f(Bobby, ramp) × f(bowling, ramp) × f(rolling, ramp) = ramp's score • 'pin' wins!
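A sketch of the product-of-relatedness scoring for common-bond candidates; the relatedness values are invented, and the NGD-derived score is treated here as "higher = more related" purely for illustration.

```python
def common_bond_score(candidate, entities, relatedness):
    """Score a candidate by multiplying its relatedness to every clue entity."""
    score = 1.0
    for entity in entities:
        score *= relatedness(entity, candidate)
    return score

# Toy relatedness table for the bobby / bowling / rolling example.
table = {
    ("bobby", "pin"): 0.6, ("bowling", "pin"): 0.8, ("rolling", "pin"): 0.7,
    ("bobby", "ramp"): 0.1, ("bowling", "ramp"): 0.4, ("rolling", "ramp"): 0.2,
}
rel = lambda a, b: table.get((a, b), 0.0)

entities = ["bobby", "bowling", "rolling"]
for cand in ["pin", "ramp"]:
    print(cand, common_bond_score(cand, entities, rel))
# "pin" scores highest, so it wins as the common bond.
```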
Application to Missing-link Questions • Questions in which a missing entity is either explicitly or implicitly referred to (often Final Jeopardy! questions) • "On hearing of the discovery of George Mallory's body, this explorer still thinks he was first." (Answer: "Edmund Hillary") • Chain: George Mallory → Mount Everest → Edmund Hillary • 3-step solution: missing-link identification, candidate generation, and scoring
Missing link identification • 2 criteria: highly related to concepts in the question, and able to be ruled out as a possible correct final answer • Search for entities semantically highly associated with key concepts in the question • Many of these are actually the correct final answer, so they can't be the missing link • Attempt to definitively rule out possible correct final answers • Wrong answer type (e.g. "Thirty Years' War" appears as a high-association entity but is not of the right answer type "year", and thus is a prime candidate as a missing link)
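A sketch of the missing-link filter, assuming hypothetical association scores and a toy `type_of` function in place of Watson's answer-type checking.

```python
def identify_missing_links(associated, answer_type, type_of, top_k=3):
    """Pick missing-link candidates: entities strongly associated with the
    question that can be ruled out as the final answer (wrong answer type).

    `associated` maps entity -> association score; `type_of` and
    `answer_type` are hypothetical type labels (e.g. "year").
    """
    ruled_out = {e: s for e, s in associated.items()
                 if type_of(e) != answer_type}
    return sorted(ruled_out, key=ruled_out.get, reverse=True)[:top_k]

associated = {"Thirty Years' War": 0.9, "1648": 0.7, "1618": 0.6}
type_of = lambda e: "year" if e.isdigit() else "other"
print(identify_missing_links(associated, "year", type_of))
# ["Thirty Years' War"] -- wrong type for the answer, so a prime missing link
```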
Candidate generation using missing links • Invoke the system again, using the missing links in the search process • The hope is that the new search results include correct answers that previously failed to be generated • New search queries are produced by augmenting each existing query with a missing link • "The 1648 Peace of Westphalia ended a war that began on May 23 of this year." • Augmented query: Peace of Westphalia, Thirty Years' War, began, May 23 • Focuses the search on key concepts from the question, with an additional bias toward the inferred missing link
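A sketch of query augmentation with an identified missing link; queries are kept as plain term lists for illustration rather than the structured queries Watson actually issues.

```python
def augment_queries(queries, missing_links):
    """Create new queries by appending each missing link to each existing
    query, biasing retrieval toward the inferred hidden concept."""
    return [query + [link] for query in queries for link in missing_links]

queries = [["Peace of Westphalia", "war", "began", "May 23"]]
missing_links = ["Thirty Years' War"]
print(augment_queries(queries, missing_links))
# [['Peace of Westphalia', 'war', 'began', 'May 23', "Thirty Years' War"]]
```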
Missing-link answer scorer • The second iteration produces a list of answers ranked by confidence • New scorers were developed for scoring the semantic relatedness of a candidate answer and concepts in the question via the identified missing link • For each candidate-answer and missing-link pair, compute a semantic relatedness score using the spreading-activation process • E.g. given its strong association with George Mallory, it is fairly straightforward to identify "Mount Everest" as a missing link • Compute relatedness scores of the missing link and candidate answers: (Mt. Everest, Apa Sherpa), (Mt. Everest, Edmund Hillary), (Mt. Everest, Jordan Romero) • Edmund Hillary wins!
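A sketch of ranking candidate answers by relatedness to the missing link; the relatedness table is invented for the Mallory/Everest example.

```python
def score_by_missing_link(candidates, missing_link, relatedness):
    """Rank candidate answers by semantic relatedness to the missing link."""
    return sorted(((relatedness(missing_link, c), c) for c in candidates),
                  reverse=True)

table = {("Mount Everest", "Edmund Hillary"): 0.9,
         ("Mount Everest", "Apa Sherpa"): 0.5,
         ("Mount Everest", "Jordan Romero"): 0.3}
rel = lambda a, b: table.get((a, b), 0.0)

print(score_by_missing_link(
    ["Apa Sherpa", "Edmund Hillary", "Jordan Romero"], "Mount Everest", rel))
# Edmund Hillary ranks first.
```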
Evaluation against the baseline system • The baseline technique stumbles on a high-scoring candidate that is strongly related to just one of the clue phrases • E.g. "COMMON BONDS: Spice, interrupted, Georgy" • The baseline system prefers "Girl, Interrupted" • The Common Bond Answer Generator is able to prefer "girls," which is associated with all three clue phrases
Evaluation against the baseline system • a) old top answer becomes the missing link (1, 2) • b) initial answer incorrect but of the correct type • When the missing link is taken into consideration before final answer generation, it aids promotion of the correct answer candidate to the top position
Conclusion • Spreading activation approach for concept expansion and measuring semantic relatedness • Implemented over three new knowledge resources • n-gram corpus – semantic relatedness based on lexical collocation • PRISMATIC knowledge base – relatedness of concepts based on syntactic collocation • Wikipedia links – metadata from link structures to indicate semantic relatedness • Process identifies missing semantic associations between concepts and improves performance on common-bond and Final Jeopardy! questions