TextNet – A Text-Based Intelligent System
Sanda Harabagiu, Dan Moldovan
as (mis-)interpreted by Peter Clark
Introduction
• Overall goal:
  • Given a sentence/paragraph, create a representation of the unstated, extra knowledge (“context”) which it suggests.
  • Input: sentence graph; Output: bigger, richer graph
  • Purpose: Question-answering etc. (?)
• Sources of this extra knowledge:
  • (Extended) WordNet
  • the Internet
WordNet
• Organized around concepts (“synsets”), not words
• Contains:
  • ~100k concepts (“synsets”)
  • ~350k connections (14 types)
  • English definitions (“glosses”) for most synsets
[Figure: two synsets joined by an isa link]
  {“athletic game”} 132132: “Game involving athletic activity.”
  {“tennis”, “lawn tennis”} 433243: “A game played with rackets by two or four players who hit the ball over a net that divides the court.”
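The synset organization above can be sketched as a small data structure. This is only an illustration, not WordNet's actual storage format: the `Synset` class, its field names, and the `ancestors` helper are hypothetical, and the two entries simply reuse the example synsets and ids shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """Hypothetical record for one concept: id, member words, gloss, isa parents."""
    synset_id: int
    words: tuple
    gloss: str
    isa: list = field(default_factory=list)  # ids of parent synsets


# The two example synsets from the slide (ids as shown there).
SYNSETS = {
    132132: Synset(132132, ("athletic game",), "Game involving athletic activity."),
    433243: Synset(
        433243,
        ("tennis", "lawn tennis"),
        "A game played with rackets by two or four players who hit the ball "
        "over a net that divides the court.",
        isa=[132132],
    ),
}


def ancestors(synset_id, synsets):
    """Follow isa links upward and return the ids reached, nearest first."""
    result = []
    queue = list(synsets[synset_id].isa)
    while queue:
        current = queue.pop(0)
        if current not in result:
            result.append(current)
            queue.extend(synsets[current].isa)
    return result


print(ancestors(433243, SYNSETS))
```

Keying everything by synset id rather than by word reflects the point that the network is organized around concepts: several words can share one node.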
Extended WordNet
• Disambiguate and transform glosses into network representations.
“Tennis court: A court in which tennis is played.”
[Figure: gloss graph. Nodes: tennis court, court, play, tennis (the synset {“tennis”, “lawn tennis”}). Edge labels: def, location-of, object.]
Extended WordNet
• Disambiguate and transform glosses into network representations.
“Serve: A stroke in tennis that puts the ball in play.”
[Figure: gloss graph. Nodes: serve, stroke, put, tennis, ball, play. Edge labels: def, agent, object, manner, context.]
Extended WordNet
• Resulting structure is no longer just a big graph
[Figure: two layers joined by def links (example node: ball).
  Original WordNet: “raw” concepts (isa hierarchy, other relations).
  Processed glossary definitions: concepts in context (particular subtypes/situations for concepts).]
Part I: Adding Relevant, Contextual Knowledge from WordNet
“The kid hit the ball very hard.”
[Figure: sentence graph: hit –agent kid, hit –object ball, hit –manner hard]
“Inference Extraction”
“The kid hit the ball very hard.”
[Figure: sentence graph: hit –agent kid, hit –object ball, hit –manner hard]
• Goals:
  • provide supplementary information about a sentence
  • explain relation between sentences
• Approach:
  • Deductive inference (e.g., “snore –entails sleep”)
  • Find and add information into the sentence representation
• Challenge:
  • Many possible connections
Path-finding
To find path(s) between A and B:
• use spreading activation/marker passing:
  • place markers at A and B
  • propagate markers to neighboring nodes
  • at quiescence, look for marker collisions
• “Propagation rules” determine when to propagate
  • “asymmetric and transitive relations are more useful”
  • “going up the isa hierarchy allows hierarchical deductions”
  • “the same is true for relations such as entail and causation. For example, if a man is snoring, then he is sleeping, and further he is temporarily unconscious.”
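The marker-passing steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the TextNet algorithm: the toy `EDGES` list is invented for the example (it is not taken from WordNet), edges are traversed in both directions for simplicity, a hop limit stands in for quiescence, and the only “propagation rule” is an optional whitelist of relation names (`allowed`). All function names are hypothetical.

```python
from collections import deque

# Toy labeled graph; these edges are illustrative assumptions only.
EDGES = [
    ("kid", "isa", "person"),
    ("player", "isa", "person"),
    ("player", "agent-of", "play"),
    ("play", "object", "game"),
    ("tennis", "isa", "game"),
    ("hit", "context", "tennis"),
]


def build_adjacency(edges, allowed=None):
    """Undirected adjacency lists, keeping only relations the rule allows."""
    adj = {}
    for src, rel, dst in edges:
        if allowed is not None and rel not in allowed:
            continue
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, []).append(src)
    return adj


def spread(adj, start, max_hops):
    """Propagate a marker breadth-first; map each reached node to its path."""
    marks = {start: (start,)}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if len(marks[node]) - 1 >= max_hops:
            continue  # this marker has travelled as far as it may
        for nxt in adj.get(node, []):
            if nxt not in marks:
                marks[nxt] = marks[node] + (nxt,)
                frontier.append(nxt)
    return marks


def find_paths(edges, a, b, max_hops=3, allowed=None):
    """Place markers at a and b, spread both, and join paths at collisions."""
    adj = build_adjacency(edges, allowed)
    marks_a = spread(adj, a, max_hops)
    marks_b = spread(adj, b, max_hops)
    paths = set()
    for node in marks_a.keys() & marks_b.keys():  # marker collisions
        path = marks_a[node] + marks_b[node][::-1][1:]
        if len(set(path)) == len(path):  # drop paths that revisit a node
            paths.add(path)
    return sorted(paths, key=lambda p: (len(p), p))


print(find_paths(EDGES, "kid", "hit"))
print(find_paths(EDGES, "kid", "hit", allowed={"isa"}))
```

On this toy graph the two markers meet at `play`, giving a single path from `kid` to `hit`; restricting propagation to `isa` edges leaves `hit` isolated, so no path is found. A fuller version would keep relation labels and direction on the path, since the rules quoted above depend on them.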
“The kid hit the ball very hard.”
[Figure: sentence graph: hit –agent kid, hit –object ball, hit –manner hard]
• Find connections which “explain” these relations
[Figure: two paths, each drawn within the context of tennis and within the context of ball.
  Path ending at kid: nodes hit, game, play, player, person, kid; edge labels context, agent, isa, isa, object-of.
  Path ending at ball: nodes hit, game, play, player, hit, ball; edge labels agent, agent-of, object, context, object-of.]
“The kid hit the ball very hard.”
[Figure: sentence graph: hit –agent kid, hit –object ball, hit –manner hard]
• Find connections which “explain” these relations
[Figure: path connecting hard to hit.
  Within context of return and within context of drive: nodes hard, return, stroke, tennis; edge labels manner-of, gloss (“isa”), gloss (“isa”), context.
  Within context of tennis: nodes game, play, player, hit; edge labels agent, agent-of, object-of.]
Inter-sentential Global Context
• Find connections between “local contexts”
S1: The kid hit the ball very hard.
S2: It landed almost always near the baseline.
[Figure: paths linking the two sentence graphs.
  Within context of move: nodes hit, move, change location; edge labels isa, gloss (“isa”), object, isa.
  Within context of destination and within context of arrive: nodes place, destination, reach, arrive, land; edge labels gloss (“isa”), object, gloss (“isa”), isa.]
Is WordNet (or a dictionary) sufficient to fully build the context?
“GPS systems are used for hiking.”
• Question: Can we relate “GPS” and “hiking” using a dictionary?
• From Oxford Dictionary:
  • “GPS: a navigation system”
  • “Hiking: long walk in the countryside taken for pleasure”
  • “Walk: place or track or route for foot passengers”
  • “Route: course or way taken from starting point to destination”
• But:
  • Missing knowledge that hiking involves following/navigating a particular trail, as opposed to just wandering aimlessly
Finding and Adding Extra, Contextual Knowledge from the Internet
• WordNet doesn’t contain all the background knowledge
• So can we add extra knowledge using other texts too?
  • run-time, extra elaboration of current graph
  • further expansion of WordNet?
• Approach:
  • Start with some initial “seed” text
  • Retrieve paragraphs containing relevant words
  • Elaborate their “local and global contexts”
  • Determine relevance using a similarity measure
  • Select “the most appropriate new context”
  • Add its graph (or parts of it?) to the original graph
Finding Relevant Documents
• Two problems:
  • Discovery: Which keywords to search with?
    • use words in the original seed text, or closely related words
    • e.g., “play AND (tennis OR ball OR baseline) AND hit”
  • Quality: How relevant are the results?
    • measure the degree of overlap of graphs for seed and new texts
• Lexical ambiguity is a root problem
  • Disambiguation by assuming new words belong to same/close synsets as in the original query (dubious!)
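Both problems above can be illustrated with a short sketch. The slides do not say how the query is assembled or which overlap measure is used, so everything here is an assumption: `build_query` simply joins seed words with AND/OR, graphs are modeled as sets of (node, relation, node) triples, and `graph_overlap` uses Jaccard similarity as a stand-in for “degree of overlap”. The candidate graph is made up for the example.

```python
def build_query(terms):
    """AND the terms together; a list of alternatives becomes an OR group."""
    parts = []
    for term in terms:
        if isinstance(term, str):
            parts.append(term)
        else:
            parts.append("(" + " OR ".join(term) + ")")
    return " AND ".join(parts)


def graph_overlap(seed, candidate):
    """Jaccard similarity of two edge sets (an assumed overlap measure)."""
    union = seed | candidate
    return len(seed & candidate) / len(union) if union else 0.0


# Seed graph for "The kid hit the ball very hard."
seed_graph = {
    ("hit", "agent", "kid"),
    ("hit", "object", "ball"),
    ("hit", "manner", "hard"),
}

# Hypothetical graph built from a retrieved paragraph.
candidate_graph = {
    ("hit", "object", "ball"),
    ("hit", "agent", "player"),
}

query = build_query(["play", ["tennis", "ball", "baseline"], "hit"])
print(query)
print(graph_overlap(seed_graph, candidate_graph))
```

Exact triple matching is deliberately strict: `kid` and `player` do not match here even though a path-finding step might relate them, which is one reason the quality problem is harder than a set intersection suggests.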
A Real Example…
• Text: about a player who gets tendinitis from hitting the ball too hard
• Build initial graph of sentences (but info missing)
• Look for additional information on Internet
  • try multiple queries
  • select the best result (= graph most coherent with original text)
  • layer this graph on top of the original text graph
• Original text + WordNet:
  • hit –isa affect isa– injure –result injury
  • hit –purpose land –location backline
• Internet text:
  • backline –result ace
• WordNet:
  • ace –isa serve –attr unreachable –purpose win
• Hence (!):
  • “Winning is the motivation for actions causing tennis injuries”
Summary
• Interesting, ambitious
• Right idea (used by others too)
• Didn’t work (?); no further publications on TextNet
• Critical details not clear from the paper
• Problem: finding good connections, or rather, avoiding finding bad connections