
Identifying Implicit Relationships

Identifying Implicit Relationships. J. Chu-Carroll E.W. Brown Lally J.W. Murdock. Christine Boucher HON 111. Outline. Introduction to Implicit Relationships Spreading Activation Watson’s 3 Information Resources Application to COMMON BOND Questions Application to Missing Link Questions


Presentation Transcript


  1. Identifying Implicit Relationships. J. Chu-Carroll, E.W. Brown, Lally, J.W. Murdock. Christine Boucher, HON 111

  2. Outline • Introduction to Implicit Relationships • Spreading Activation • Watson’s 3 Information Resources • Application to COMMON BOND Questions • Application to Missing Link Questions • Evaluation against Watson’s baseline system • Conclusion

  3. Introduction • Resolving an implicit reference to a hidden concept • Question Types: • COMMON BOND • Feet, eyebrows and McDonald’s have arches in common • Trout, loose change and compliments are things that you fish for • Missing Link questions • “The 1648 Peace of Westphalia ended a war that began on May 23 of this year.”

  4. Identifying the missing link • “The 1648 Peace of Westphalia ended a war that began on May 23 of this year.” • Peace of Westphalia → ended the Thirty Years’ War → began in 1618

  5. Problem? • Need to identify concepts that are closely related to those given in the question… • …then use that information to solve for the final answer

  6. Spreading Activation • Theory of Spreading Activation • Originated in cognitive psychology, where it is used to explain semantic processing and lexical retrieval • Activation of a semantic network • Concepts in a network are activated through their connections to already active concepts

  7. Spreading Activation Algorithm • Measure concept relatedness on the basis of how frequently concepts co-occur • Activation over natural-language text: Watson’s 3 Resources • n-gram corpus • PRISMATIC knowledge base • Wikipedia links • Two parameters: fan size f and depth d • Expand the f most related concepts of the currently activated concept • Recursively invoked d times
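
The fan-size and depth parameters above can be sketched as follows. This is only a minimal illustration, not Watson's implementation: the function name `spread_activation`, the `related` lookup, and the toy relatedness table are all assumptions made for the example, and the recursion is written as an equivalent level-by-level loop.

```python
from typing import Callable, List


def spread_activation(
    seed: str,
    related: Callable[[str], List[str]],
    fan_size: int,
    depth: int,
) -> List[str]:
    """Expand the fan_size most related concepts of each active concept,
    repeating the expansion depth times (one level per iteration)."""
    seen = {seed}
    frontier = [seed]
    activated: List[str] = []
    for _ in range(depth):
        next_frontier: List[str] = []
        for concept in frontier:
            # Assumption: related() returns concepts sorted by relatedness.
            for neighbor in related(concept)[:fan_size]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    activated.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activated


# Toy relatedness table (illustrative only, not Watson data).
TOY_RELATED = {
    "Peace of Westphalia": ["Thirty Years' War", "1648", "treaty"],
    "Thirty Years' War": ["1618", "Holy Roman Empire"],
}


def toy_related(concept: str) -> List[str]:
    return TOY_RELATED.get(concept, [])


print(spread_activation("Peace of Westphalia", toy_related, fan_size=2, depth=2))
```

With f = 2 and d = 2, "treaty" is never activated because it falls outside the fan, while "1618" is reached at the second level through "Thirty Years' War".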

  8. n-gram corpus • Contiguous sequence of n items from a sequence of text or speech • Could be sequences of letters, syllables, words, etc. • 5-gram corpus: corpus of 5-word sequences from text • (with function words removed) • E.g. “Pineapples grow in the tropical climate of Hawaii and taste sweet.” • Lexical collocation • retrieval of frequently collocated terms • leads to computation of semantic similarity • E.g. High collocation frequency between terms “JFK” and “airport” and “JFK” and “assassination”
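
As a rough sketch of lexical collocation over 5-grams, the snippet below drops function words, slides a 5-word window over what remains, and counts how often two terms share a window. The stop-word list, the function names, and the counting scheme are assumptions for illustration; the slides do not describe how the actual corpus was built.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

# Tiny illustrative function-word list (an assumption, not the real one).
FUNCTION_WORDS = {"the", "of", "and", "in", "a", "an", "to"}


def content_ngrams(text: str, n: int = 5) -> List[Tuple[str, ...]]:
    """Lowercase the text, drop function words, return all n-word windows."""
    words = [w.strip(".,").lower() for w in text.split()]
    content = [w for w in words if w and w not in FUNCTION_WORDS]
    return [tuple(content[i:i + n]) for i in range(len(content) - n + 1)]


def collocation_counts(texts: Iterable[str], n: int = 5) -> Counter:
    """Count, for each unordered term pair, the n-grams containing both."""
    counts: Counter = Counter()
    for text in texts:
        for gram in content_ngrams(text, n):
            for a, b in combinations(sorted(set(gram)), 2):
                counts[(a, b)] += 1
    return counts


sentence = "Pineapples grow in the tropical climate of Hawaii and taste sweet."
counts = collocation_counts([sentence])
print(counts[("climate", "hawaii")])    # pair shares every window
print(counts[("pineapples", "sweet")])  # pair never shares a window
```

Pairs are stored in alphabetical order, so a lookup should use the sorted pair, e.g. `("hawaii", "pineapples")`.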

  9. PRISMATIC knowledge base • Extracts frames and slots based on syntactic relationships • Syntactic frame – links arguments and predicates • Example frame: SVO (subject-verb-object) “Ford pardoned Nixon in 1974” (Ford, pardon, Nixon) • A query returns the count of SVO tuples with subject Ford, etc. • Other types of frames: SVPO and NPO • Counts for the 3 frames are combined to compute the total frequency of links between two terms and compute a relatedness score
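
One way to picture the combination of frame counts is the sketch below. It is not the PRISMATIC API: the in-memory frame list, the counts, and the function `link_frequency` are hypothetical, and the slot layouts for SVPO and NPO are assumed to be subject-verb-preposition-object and noun-preposition-object. The slides do not say how the total frequency becomes a relatedness score, so the example stops at the total.

```python
from typing import List, Tuple

Frame = Tuple[str, Tuple[str, ...], int]  # (frame type, slot values, count)

# Hypothetical frame counts, invented purely for illustration.
TOY_FRAMES: List[Frame] = [
    ("SVO", ("Ford", "pardon", "Nixon"), 12),
    ("SVPO", ("Ford", "succeed", "after", "Nixon"), 3),
    ("NPO", ("pardon", "of", "Nixon"), 7),
]


def link_frequency(frames: List[Frame], term_a: str, term_b: str) -> int:
    """Sum the counts of every frame, of any type, that mentions both terms."""
    return sum(
        count
        for _frame_type, slots, count in frames
        if term_a in slots and term_b in slots
    )


print(link_frequency(TOY_FRAMES, "Ford", "Nixon"))    # SVO + SVPO counts
print(link_frequency(TOY_FRAMES, "pardon", "Nixon"))  # SVO + NPO counts
```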

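  10. Covering the gaps left by n-gram • n-gram counts related words that appear lexically near each other, while PRISMATIC counts words that are syntactically connected • “Ford did not act hastily but did finally pardon Nixon in September.” • Here “Ford” and “Nixon” are far apart in the sentence but are still linked by the SVO frame (Ford, pardon, Nixon)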

  11. Wikipedia Links • Uses metadata encoded in Web documents rather than the texts themselves • Documents link to other documents • Target documents are typically closely related concepts to source documents

  12. Using Wikipedia links, continued • Capture semantic relatedness using article titles • Article titles represent the canonical form of concepts → higher likelihood of finding a common related concept given 2 or more concepts • Essence: given term t, we identify the Wikipedia document whose title best matches t and return all target document titles from links in that document.
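
The title-matching step can be sketched as below. The link table is a toy stand-in for Wikipedia's link structure, and using `difflib.get_close_matches` for "best matches" is an assumption made for the example; the slides do not say how titles are actually matched.

```python
import difflib
from typing import Dict, List, Set

# Toy link table: article title -> titles of linked articles (illustrative).
TOY_WIKI_LINKS: Dict[str, List[str]] = {
    "George Mallory": ["Mount Everest", "Andrew Irvine", "Mountaineering"],
    "Edmund Hillary": ["Mount Everest", "Tenzing Norgay", "New Zealand"],
}


def linked_titles(term: str, links: Dict[str, List[str]]) -> List[str]:
    """Find the article whose title best matches term; return its link targets."""
    match = difflib.get_close_matches(term, list(links), n=1, cutoff=0.6)
    return links[match[0]] if match else []


def common_targets(terms: List[str], links: Dict[str, List[str]]) -> Set[str]:
    """Concepts linked from the articles of all given terms."""
    target_sets = [set(linked_titles(t, links)) for t in terms]
    return set.intersection(*target_sets) if target_sets else set()


print(common_targets(["George Mallory", "Edmund Hillary"], TOY_WIKI_LINKS))
```

Because titles are canonical, two different terms that both link to "Mount Everest" yield exactly the same string, which is what makes the intersection meaningful.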

  13. Application to Common-bond Questions • The answers are all semantically related to the given entities • Calls for use of spreading activation • Identify concepts that are closely related to each given entity • Score each concept on the basis of its degree of relatedness to all given entities

  14. Candidate Generation • Spreading activation invoked on each entity • Example: Bobby, bowling, rolling (pins) • bobby: Robert, British police officer, pin • bowling: lane, strike, 300, pin • rolling: Rolling Stone, ramp, pin • Related concepts found are generated as candidate answers • strike, British police officer, Rolling Stone, pin, ramp • Search n-gram corpus for most frequently collocated terms

  15. Common-bond answer scorer • Candidates scored on basis of semantic relatedness to each given entity • Relatedness of ‘strike, British police officer, Rolling Stone, pin, ramp’ to ‘bobby,’ ‘bowling,’ ‘rolling’ • Multiply 3 NGD (Normalized Google Distance) scores for overall goodness score of candidate as common bond answer • f(Bobby, pin) × f(bowling, pin) × f(rolling, pin) = pin’s score • f(Bobby, ramp) × f(bowling, ramp) × f(rolling, ramp) = ramp’s score • ‘pin’ wins!
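
The product rule above can be sketched in a few lines. The scores in `TOY_RELATEDNESS` are invented so the example runs, and they are assumed to be relatedness values where higher means more related (i.e. already converted from a distance such as NGD); the function names are hypothetical.

```python
from math import prod
from typing import List

# Invented relatedness scores f(entity, candidate); higher = more related.
TOY_RELATEDNESS = {
    ("bobby", "pin"): 0.5, ("bowling", "pin"): 0.7, ("rolling", "pin"): 0.4,
    ("bobby", "ramp"): 0.01, ("bowling", "ramp"): 0.05, ("rolling", "ramp"): 0.6,
    ("bobby", "strike"): 0.02, ("bowling", "strike"): 0.9, ("rolling", "strike"): 0.03,
}


def relatedness(entity: str, candidate: str) -> float:
    return TOY_RELATEDNESS.get((entity, candidate), 0.0)


def common_bond_score(candidate: str, entities: List[str]) -> float:
    """Multiply the candidate's relatedness to every given entity."""
    return prod(relatedness(entity, candidate) for entity in entities)


def best_common_bond(candidates: List[str], entities: List[str]) -> str:
    return max(candidates, key=lambda c: common_bond_score(c, entities))


ENTITIES = ["bobby", "bowling", "rolling"]
print(best_common_bond(["strike", "pin", "ramp"], ENTITIES))
```

Multiplying rather than adding means a candidate that is strongly tied to only one entity ("strike" for "bowling") is pulled down by its weak ties to the other two.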

  16. Application to Missing-link Questions • Questions in which a missing entity is either explicitly or implicitly referred to (often Final Jeopardy! questions) • “On hearing of the discovery of George Mallory’s body, this explorer still thinks he was first.” (Answer: “Edmund Hillary”) • George Mallory → Mount Everest → Edmund Hillary • 3-step solving: missing link identification, candidate generation, and scoring

  17. Missing link identification • 2 criteria: highly related to concepts in the question and must be ruled out as a possible correct final answer • Search for semantically highly associated entities to key concepts in Q • Many are actually the correct final answer, so can’t be the missing link • Attempt to definitively rule out possible correct final answers • Wrong answer type (e.g. “Thirty Years’ War” appears as a high-association answer but is not of the right answer type “year” and thus is a prime candidate as a missing link.)
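
The two criteria can be sketched as a simple filter: keep entities that are highly associated with the question but whose type rules them out as the final answer. The association scores, the type table, the threshold, and the function name are all assumptions for illustration.

```python
from typing import Dict, List


def identify_missing_links(
    associations: Dict[str, float],
    type_of: Dict[str, str],
    answer_type: str,
    min_score: float = 0.5,
) -> List[str]:
    """Return highly associated entities that cannot be the final answer
    because their type differs from the expected answer type."""
    ranked = sorted(associations.items(), key=lambda kv: kv[1], reverse=True)
    return [
        entity
        for entity, score in ranked
        if score >= min_score and type_of.get(entity) != answer_type
    ]


# Invented scores and types for the Peace of Westphalia clue.
associations = {"Thirty Years' War": 0.9, "1648": 0.8, "Sweden": 0.3}
type_of = {"Thirty Years' War": "war", "1648": "year", "Sweden": "country"}

print(identify_missing_links(associations, type_of, answer_type="year"))
```

"1648" is dropped because it has the right answer type and so might be a final answer, and "Sweden" is dropped because its association is too weak.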

  18. Candidate generation using missing links • Invoke system again using missing links in search process • Hope that new search results include correct answers that previously failed to be generated • New search queries produced by augmenting each existing query with a missing link • “The 1648 Peace of Westphalia ended a war that began on May 23 of this year.” • Peace of Westphalia, Thirty Years’ War, began, May 23 • Focuses search on key concepts from Q with additional bias toward the inferred missing link
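
Query augmentation as described above amounts to adding each missing link to each existing query. A minimal sketch, assuming queries are represented as lists of key terms (the representation and the name `augment_queries` are hypothetical):

```python
from typing import List


def augment_queries(
    queries: List[List[str]], missing_links: List[str]
) -> List[List[str]]:
    """Create one new query per (existing query, missing link) pair."""
    return [query + [link] for query in queries for link in missing_links]


original = [["Peace of Westphalia", "began", "May 23"]]
print(augment_queries(original, ["Thirty Years' War"]))
```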

  19. Missing-link answer scorer • Second iteration produces list of answers ranked by confidence • Developed new scorers for scoring semantic relatedness of candidate answer and concepts in the question via the identified missing link • For each candidate-answer and missing-link pair, compute semantic relatedness score using spreading-activation process • E.g. Given its strong association with George Mallory, it is fairly straightforward to identify “Mount Everest” as a missing link. • Compute relatedness scores of missing link and candidate answers: (Mt. Everest, Apa Sherpa), (Mt. Everest, Edmund Hillary), (Mt. Everest, Jordan Romero) • Edmund Hillary wins!
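
The pairwise scoring can be sketched as follows. The relatedness numbers are invented solely so the example runs and do not reflect Watson's output; taking the maximum over missing links is likewise an assumption, since the slides do not say how pair scores are combined.

```python
from typing import Callable, List, Tuple

# Invented relatedness scores for (missing link, candidate) pairs.
TOY_PAIR_SCORES = {
    ("Mount Everest", "Edmund Hillary"): 0.9,
    ("Mount Everest", "Apa Sherpa"): 0.6,
    ("Mount Everest", "Jordan Romero"): 0.4,
}


def pair_relatedness(link: str, candidate: str) -> float:
    return TOY_PAIR_SCORES.get((link, candidate), 0.0)


def rank_by_missing_link(
    candidates: List[str],
    missing_links: List[str],
    relatedness: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each candidate by its best relatedness to any missing link."""
    scored = [
        (c, max((relatedness(link, c) for link in missing_links), default=0.0))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_missing_link(
    ["Apa Sherpa", "Edmund Hillary", "Jordan Romero"],
    ["Mount Everest"],
    pair_relatedness,
)
print(ranking[0][0])
```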

  20. Evaluation against the baseline system • Baseline technique stumbles on a high-scoring candidate that is strongly related to just one of the clue phrases • E.g. “COMMON BONDS: Spice, interrupted, Georgy” • Baseline system prefers “Girl, Interrupted” • Common Bond Answer Generator is able to prefer “girls,” which is associated with all three clue phrases

  21. Evaluation against the baseline system • a) old top answer becomes missing link (1,2) • b) initial answer incorrect but of correct type • When the missing link is taken into consideration before final answer generation, it aids promotion of the correct answer candidate to the top position

  22. Conclusion • Spreading activation approach for concept expansion and measuring semantic relatedness • Implemented three new knowledge resources • n-gram corpus – semantic relatedness based on lexical collocation • PRISMATIC knowledge base – relatedness of concepts based on syntactic collocation • Wikipedia links – metadata from link structures to indicate semantic relatedness • Process identifies missing semantic associations between concepts and improves performance on common-bond and Final Jeopardy! questions
