CSE 454 Advanced Internet Systems. Features for Relation Extraction. Dan Weld
Preprocessed Data Files. Each line corresponds to a sentence, e.g. "John likes eating sausage."
Preprocessed Data Files. Each line corresponds to a sentence, e.g. "John likes eating sausage." • Grade school: “9 parts of speech in English” • Noun • Verb • Article • Adjective • Preposition • Pronoun • Adverb • Conjunction • Interjection • But: plurals, possessive, case, tense, aspect, …
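The "But:" caveat is why taggers use a finer tagset than the nine grade-school classes. The sketch below is a hypothetical, partial mapping (not an official table) from the Penn Treebank tags that appear in the preprocessed parse to the coarse classes; collapsing VBZ and VBG into one "Verb" class discards exactly the tense/aspect distinctions the slide mentions.

```python
# Hypothetical, partial mapping from Penn Treebank tags to the nine
# "grade school" parts of speech (not an official table).
PENN_TO_COARSE = {
    "NN": "Noun", "NNS": "Noun", "NNP": "Noun", "NNPS": "Noun",
    "VB": "Verb", "VBD": "Verb", "VBG": "Verb", "VBN": "Verb",
    "VBP": "Verb", "VBZ": "Verb",
    "DT": "Article",          # Penn's DT also covers other determiners
    "JJ": "Adjective", "JJR": "Adjective", "JJS": "Adjective",
    "IN": "Preposition",      # IN also covers subordinating conjunctions
    "PRP": "Pronoun", "PRP$": "Pronoun",
    "RB": "Adverb", "RBR": "Adverb", "RBS": "Adverb",
    "CC": "Conjunction",
    "UH": "Interjection",
}


def coarsen(tagged):
    """Map (word, Penn tag) pairs to (word, coarse tag); unmapped tags become 'Other'."""
    return [(word, PENN_TO_COARSE.get(tag, "Other")) for word, tag in tagged]


# Tags taken from the preprocessed parse of the example sentence.
sentence = [("John", "NNP"), ("likes", "VBZ"), ("eating", "VBG"),
            ("sausage", "NN"), (".", ".")]
print(coarsen(sentence))
```

Both "likes" (VBZ, present 3rd-person singular) and "eating" (VBG, gerund) come out as plain "Verb".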
Learning Relational Extractors. Training set: "Citigroup has taken over EMI, the British …" (+); "Citigroup’s acquisition of EMI comes just ahead of …" (+); "Google’s Adwords system has long included … Youtube." (−). Input: text → Extractor → Output: R(a,b) tuples.
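A minimal sketch of the input/output contract, assuming the extractor works by classifying every ordered pair of entity mentions in a sentence and emitting R(a,b) for the accepted pairs. The relation name `acquired` and the keyword rule standing in for a learned classifier are both hypothetical.

```python
from itertools import permutations
from typing import Callable, List, Tuple

# A sentence paired with the entity mentions found in it (e.g. by an NER tagger).
Sentence = Tuple[str, List[str]]
Fact = Tuple[str, str, str]  # (relation, arg1, arg2)


def extract(sentences: List[Sentence], relation: str,
            classify: Callable[[str, str, str], bool]) -> List[Fact]:
    """Emit relation(a, b) for every ordered mention pair the classifier accepts."""
    facts = []
    for text, mentions in sentences:
        for a, b in permutations(mentions, 2):
            if classify(text, a, b):
                facts.append((relation, a, b))
    return facts


def keyword_classifier(text: str, a: str, b: str) -> bool:
    """Hypothetical stand-in for a learned model: a cue phrase between a and b."""
    i, j = text.find(a), text.find(b)
    if i < 0 or j < 0 or i >= j:
        return False
    between = text[i + len(a):j]
    return "taken over" in between or "acquisition of" in between


sentences = [
    ("Citigroup has taken over EMI, the British …", ["Citigroup", "EMI"]),
    ("Citigroup's acquisition of EMI comes just ahead of …", ["Citigroup", "EMI"]),
    ("Google's Adwords system has long included … Youtube.", ["Google", "Youtube"]),
]
print(extract(sentences, "acquired", keyword_classifier))
```

The two positive sentences each yield `("acquired", "Citigroup", "EMI")`; the negative one yields nothing. Learning replaces the hand-written `keyword_classifier`.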
Learning Relational Extractors. Training set: "Citigroup has taken over EMI, the British …" (+); "Citigroup’s acquisition of EMI comes just ahead of …" (+); "Google’s Adwords system has long included … Youtube." (−). Each sentence becomes an example <X1, …, Xk, Y>, where X1 … Xk are features and Y is the label.
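Once every sentence is an <X1, …, Xk, Y> example, any binary classifier can be trained on it. As one illustrative choice (not necessarily what the course used), here is a plain perceptron; the four feature columns and their values are hypothetical.

```python
from typing import List, Tuple

# One training example <X1, ..., Xk, Y>: binary features plus a label in {+1, -1}.
Example = Tuple[List[int], int]


def score(w: List[float], b: float, x: List[int]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(examples: List[Example], epochs: int = 50):
    """Classic perceptron: on each mistake, move the weights toward the true label."""
    k = len(examples[0][0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if y * score(w, b, x) <= 0:  # wrong side of (or on) the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # training data separated; stop early
            break
    return w, b


def predict(w: List[float], b: float, x: List[int]) -> int:
    return 1 if score(w, b, x) > 0 else -1


# Hypothetical feature columns:
# [Arg1 is ORG, Arg2 is ORG, "acquire"-type word in span, bigram "take over" in span]
train = [
    ([1, 1, 0, 1], +1),  # Citigroup has taken over EMI, ...
    ([1, 1, 1, 0], +1),  # Citigroup's acquisition of EMI ...
    ([1, 1, 0, 0], -1),  # Google's Adwords system ... Youtube.
]
w, b = train_perceptron(train)
print([predict(w, b, x) for x, _ in train])
```

The NER features are identical across all three examples, so the model has to separate them using the span features alone.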
Features. For "Citigroup has taken over EMI, the British …", each feature Xi might be: • NER tag of Arg1 • NER tag of Arg2 • Does word-53 (acquire) appear in the span? • Consider all words? • Just use verbs & prepositions? • Does bigram-199 (take over) appear in the span? • Trigrams?
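A sketch of how these questions could become binary features, assuming the span is the tokens strictly between the two arguments. The NER tags, the small lemma table (standing in for a real lemmatizer so that "taken over" can match "take over"), and the feature-name format are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # token offsets, end exclusive

# Hypothetical lemma table; a real pipeline would use a lemmatizer.
LEMMAS = {"taken": "take", "took": "take", "has": "have", "acquired": "acquire"}


def span_features(tokens: List[str], ner: List[str], arg1: Span, arg2: Span,
                  words: Set[str], bigrams: Set[str]) -> Dict[str, int]:
    """Binary features for the (arg1, arg2) pair; arg1 is assumed to precede arg2."""
    feats = {
        "ner_arg1=" + ner[arg1[0]]: 1,
        "ner_arg2=" + ner[arg2[0]]: 1,
    }
    # Lemmatized tokens strictly between the two arguments (the "span").
    between = [LEMMAS.get(t.lower(), t.lower()) for t in tokens[arg1[1]:arg2[0]]]
    for w in between:
        if w in words:
            feats["word_in_span=" + w] = 1
    for left, right in zip(between, between[1:]):
        bigram = left + " " + right
        if bigram in bigrams:
            feats["bigram_in_span=" + bigram] = 1
    return feats


tokens = ["Citigroup", "has", "taken", "over", "EMI", ",", "the", "British"]
ner = ["ORGANIZATION", "O", "O", "O", "ORGANIZATION", "O", "O", "O"]
feats = span_features(tokens, ner, (0, 1), (4, 5),
                      words={"acquire"}, bigrams={"take over"})
print(feats)
```

Here the word feature for "acquire" stays off while the "take over" bigram fires. Restricting `between` to verbs and prepositions, or adding trigrams, would be small changes to the same loop.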
Outside the Span. Birthplace relation: • "Dan had lunch in Boston" • "Returning to his birthplace, Dan had lunch in Boston" • "Dan had lunch in Boston, his birthplace."
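In these examples the span between "Dan" and "Boston" is identical. Only the words outside it reveal the birthplace. One simple way to capture this, sketched here as an assumption rather than the course's feature set, is a bag of words from a fixed-size window before the first argument and after the second. The window size of 4 is arbitrary.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # token offsets, end exclusive


def window_features(tokens: List[str], arg1: Span, arg2: Span, size: int = 4) -> Set[str]:
    """Bag of lowercased words just before the first argument and just after the second."""
    first, second = sorted([arg1, arg2])
    before = tokens[max(0, first[0] - size):first[0]]
    after = tokens[second[1]:second[1] + size]
    return {"before=" + w.lower() for w in before} | {"after=" + w.lower() for w in after}


s1 = "Dan had lunch in Boston".split()
s2 = "Returning to his birthplace , Dan had lunch in Boston".split()
s3 = "Dan had lunch in Boston , his birthplace .".split()
print(window_features(s1, (0, 1), (4, 5)))   # set(): nothing outside the span
print(window_features(s2, (5, 6), (9, 10)))  # includes 'before=birthplace'
print(window_features(s3, (0, 1), (4, 5)))   # includes 'after=birthplace'
```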
Proximity. Birthplace relation: "Dan, who was very tired from deadlines and cranky because of problems with his boss, was born in Boston"
Proximity. Birthplace relation: "Dan, who was very tired from deadlines and cranky because of problems with his boss, was born in Boston" [Dependency-parse figure: nsubj(born, Dan), prep_in(born, Boston), rcmod(Dan, tired), plus two prep_from edges involving tired, deadlines, and cranky]
Proximity. Birthplace relation: "Dan, who was very tired from deadlines and a screaming baby, was born in Boston" [Dependency-parse figure: nsubj(born, Dan), prep_in(born, Boston), rcmod(Dan, tired), prep_from(tired, deadlines), prep_from(tired, baby); screaming attaches to baby]
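In the dependency graph, Dan and Boston are only two edges apart, however long the relative clause grows. A common feature, assumed here rather than taken from the slides, is therefore the shortest dependency path between the arguments. The sketch runs breadth-first search over the edges listed in the figure (the unlabeled "screaming" edge is omitted).

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (head, relation, dependent)


def dependency_path(edges: List[Edge], source: str, target: str) -> Optional[str]:
    """Shortest path between two words, ignoring edge direction for the search
    but recording it in the output (e.g. 'Dan <-nsubj- born')."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for head, rel, dep in edges:
        adj.setdefault(head, []).append((dep, f"-{rel}->"))
        adj.setdefault(dep, []).append((head, f"<-{rel}-"))
    prev = {source: None}  # node -> (previous node, arrow used to reach it)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nbr, arrow in adj.get(node, []):
            if nbr not in prev:
                prev[nbr] = (node, arrow)
                queue.append(nbr)
    if target not in prev:
        return None
    parts, node = [target], target
    while prev[node] is not None:
        node, arrow = prev[node]
        parts = [node, arrow] + parts
    return " ".join(parts)


edges = [("born", "nsubj", "Dan"), ("born", "prep_in", "Boston"),
         ("Dan", "rcmod", "tired"), ("tired", "prep_from", "deadlines"),
         ("tired", "prep_from", "baby")]
print(dependency_path(edges, "Dan", "Boston"))  # Dan <-nsubj- born -prep_in-> Boston
```

The path string itself (or its sequence of relation labels) can then be used as a single feature.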
Parsing Ambiguity. "Papa ate the caviar with a spoon": (S (NP Papa) (VP (VP (V ate) (NP (Det the) (N caviar))) (PP (P with) (NP (Det a) (N spoon)))))
Parsing Ambiguity: Prepositional Phrase Attachment. "Please Don’t Eat Me!" (S (NP Papa) (VP (V ate) (NP (NP (Det the) (N caviar)) (PP (P with) (NP (Det a) (N spoon))))))
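The two trees come from two rules that both let a PP attach: VP → VP PP and NP → NP PP. A CKY chart that counts trees per cell makes the ambiguity concrete. The toy grammar below is an assumption, written in Chomsky normal form to match the two trees.

```python
from collections import defaultdict
from typing import Dict, List

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP modifies the verb phrase: ate ... using a spoon
    ("NP", "NP", "PP"),  # PP modifies the noun phrase: the caviar that comes with a spoon
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON: Dict[str, List[str]] = {
    "Papa": ["NP"], "ate": ["V"], "the": ["Det"], "a": ["Det"],
    "caviar": ["N"], "spoon": ["N"], "with": ["P"],
}


def count_parses(words: List[str], start: str = "S") -> int:
    """CKY chart where each cell counts the distinct trees per nonterminal."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for tag in LEXICON[w]:
            chart[i][i + 1][tag] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    ways = chart[i][k][left] * chart[k][j][right]
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n][start]


print(count_parses("Papa ate the caviar with a spoon".split()))  # 2
print(count_parses("Papa ate the caviar".split()))               # 1
```

A statistical parser keeps the same chart but scores each tree instead of just counting, so it can prefer the instrument reading.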
Extracting grammatical relations from statistical constituency parsers [de Marneffe et al. LREC 2006]. Example: "Bills on ports and immigration were submitted by Senator Brownback" [Figure: constituency parse (S (NP (NP (NNS Bills)) (PP (IN on) (NP (NNS ports) (CC and) (NN immigration)))) (VP (VBD were) (VP (VBN submitted) (PP (IN by) (NP (NNP Senator) (NNP Brownback)))))) with typed dependencies nsubjpass(submitted, Bills), auxpass(submitted, were), agent(submitted, Brownback), nn(Brownback, Senator), prep_on(Bills, ports), cc_and(ports, immigration)] • Exploit the high-quality syntactic analysis done by statistical constituency parsers to get the grammatical relations (typed dependencies) • Dependencies are generated by pattern-matching rules
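To illustrate "dependencies generated by pattern-matching rules", here is a toy converter with two hypothetical rules (S → NP VP gives nsubj, VP → verb NP gives dobj) and deliberately crude head rules. The real converter's rule set is far larger and handles cases such as the passive Brownback example. The toy is applied to the parse of "John likes eating sausage." from the preprocessed files.

```python
# Trees as nested tuples: (label, child, ...) for phrases, (tag, word) for preterminals.
Tree = tuple


def is_preterminal(t: Tree) -> bool:
    return isinstance(t[1], str)


def head(t: Tree) -> str:
    """Greatly simplified head rules (an assumption, far cruder than real head finders)."""
    if is_preterminal(t):
        return t[1]
    label, kids = t[0], t[1:]
    if label == "VP":
        for k in kids:
            if k[0].startswith("VB"):
                return head(k)
    if label == "NP":
        for k in reversed(kids):
            if k[0].startswith("NN"):
                return head(k)
    if label == "S":
        for k in kids:
            if k[0] == "VP":
                return head(k)
    return head(kids[0])


def dependencies(t: Tree, out=None):
    """Apply two hypothetical pattern-matching rules at every node of the tree."""
    if out is None:
        out = []
    if is_preterminal(t):
        return out
    label, kids = t[0], t[1:]
    labels = [k[0] for k in kids]
    # Rule 1: (S ... NP VP ...) yields nsubj(head of VP, head of NP)
    if label == "S" and "NP" in labels and "VP" in labels:
        out.append(("nsubj", head(kids[labels.index("VP")]), head(kids[labels.index("NP")])))
    # Rule 2: (VP VB* ... NP ...) yields dobj(head verb, head of NP)
    if label == "VP" and labels[0].startswith("VB") and "NP" in labels[1:]:
        out.append(("dobj", head(kids[0]), head(kids[labels.index("NP")])))
    for k in kids:
        dependencies(k, out)
    return out


# Parse of "John likes eating sausage." from the preprocessed data files.
tree = ("S",
        ("NP", ("NNP", "John")),
        ("VP", ("VBZ", "likes"),
               ("S", ("VP", ("VBG", "eating"), ("NP", ("NN", "sausage"))))),
        (".", "."))
print(dependencies(tree))
```

The two rules recover nsubj(likes, John) and dobj(eating, sausage). The link between "likes" and "eating" would need a further rule.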
Preprocessed Data Files: (S (NP (NNP John)) (VP (VBZ likes) (S (VP (VBG eating) (NP (NN sausage))))) (. .))
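A minimal reader for one line of this file, assuming each line holds exactly one bracketed tree with a labeled root (no extra unlabeled outer bracket). It builds nested lists and reads the (word, tag) pairs off the preterminals.

```python
import re
from typing import List, Tuple


def read_tree(s: str) -> list:
    """Parse one bracketed line into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse() -> list:
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        node = [tokens[pos + 1]]  # the constituent label or POS tag
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())
            else:  # a word
                node.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return node

    return parse()


def tagged_words(tree: list) -> List[Tuple[str, str]]:
    """(word, POS tag) pairs read off the preterminal nodes, left to right."""
    if len(tree) == 2 and isinstance(tree[1], str):
        return [(tree[1], tree[0])]
    pairs = []
    for child in tree[1:]:
        pairs.extend(tagged_words(child))
    return pairs


line = "(S (NP (NNP John)) (VP (VBZ likes) (S (VP (VBG eating) (NP (NN sausage))))) (. .))"
print(tagged_words(read_tree(line)))
```

Since each line is one sentence, a whole file can be processed with `[read_tree(l) for l in open(path) if l.strip()]`, where `path` is a placeholder.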
Why Extract Temporal Information? • Many relations and events are temporally bounded: • a person's place of residence or employer • an organization's members • the duration of a war between two countries • the precise time at which a plane landed • … • Temporal information distribution • One of every fifty lines of database application code involves a date or time value (Snodgrass, 1998) • Each news document in PropBank (Kingsbury and Palmer, 2002) includes eight temporal arguments. Slide from Dan Roth, Heng Ji, Taylor Cassidy, Quang Do, TIE Tutorial
Time-intensive Slot Types
Temporal Expression Examples. Reference Date = December 8, 2012
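Normalizing a relative expression means resolving it against the reference date. The sketch below covers a handful of hypothetical rules (not the ones from the tutorial). "last Friday" is read as the most recent Friday strictly before the reference date, which is one possible interpretation among several.

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def normalize(expr: str, ref: date) -> Optional[str]:
    """Resolve a few relative expressions against the reference date (toy rule set)."""
    e = expr.lower().strip()
    if e == "today":
        return ref.isoformat()
    if e == "yesterday":
        return (ref - timedelta(days=1)).isoformat()
    if e == "tomorrow":
        return (ref + timedelta(days=1)).isoformat()
    words = e.split()
    if len(words) == 2 and words[0] == "last" and words[1] in WEEKDAYS:
        # Assumed reading: the most recent such weekday strictly before ref.
        back = (ref.weekday() - WEEKDAYS.index(words[1])) % 7 or 7
        return (ref - timedelta(days=back)).isoformat()
    if len(words) == 3 and words[0].isdigit() and words[1:] == ["days", "ago"]:
        return (ref - timedelta(days=int(words[0]))).isoformat()
    return None  # not covered by these rules


ref = date(2012, 12, 8)  # the reference date on the slide (a Saturday)
for expr in ["today", "yesterday", "last Friday", "3 days ago"]:
    print(expr, "->", normalize(expr, ref))
```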
Temporal Expression Extraction • Rule-based (Strötgen and Gertz, 2010; Chang and Manning, 2012; Do et al., 2012) • Machine learning • Risk Minimization Model (Boguraev and Ando, 2005) • Conditional Random Fields (Ahn et al., 2005; UzZaman and Allen, 2010) • State of the art: about 95% F-measure for extraction and 85% F-measure for normalization
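The F-measure figures compare system output against gold annotations. A minimal sketch with exact matching follows. Whether extraction is scored on exact or overlapping spans varies by evaluation, and this code assumes exact. The spans are invented for illustration.

```python
from typing import Hashable, Iterable, Tuple


def prf(gold: Iterable[Hashable], predicted: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two collections of items."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical token spans: four gold expressions, three system outputs,
# one of which has the wrong boundary.
gold = {(0, 1), (5, 7), (10, 12), (20, 21)}
pred = {(0, 1), (5, 7), (10, 11)}
print(prf(gold, pred))  # precision 2/3, recall 1/2, F1 4/7
```

Normalization can be scored with the same function by comparing (span, normalized value) pairs instead of bare spans.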
Ordering events in discourse. (1) John entered the room at 5:00pm. (2) It was pitch black. (3) It had been three days since he’d slept. [Timeline figure: State: John slept, Time: 3 days; Event: John entered the room, Time: 5pm; State: PitchBlack; Time: Now]
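Once each event or state is anchored to a time, ordering reduces to sorting. In this sketch the calendar date is an arbitrary assumption, since the text gives only a clock time. Only the 5pm anchor and the roughly three-day gap come from the sentences.

```python
from datetime import datetime, timedelta

# Assumption: the story's date is unknown, so anchor it arbitrarily; only
# the 5pm time and the 3-day gap come from the text.
entered = datetime(2000, 1, 10, 17, 0)   # (1) John entered the room at 5:00pm
slept = entered - timedelta(days=3)      # (3) three days since he'd slept (approx.)

timeline = [
    ("Event: John entered the room", entered),
    ("State: PitchBlack (holds at entry)", entered),  # (2) overlaps the entry
    ("State: John slept", slept),
]
ordered = sorted(timeline, key=lambda item: item[1])  # stable: ties keep list order
for label, when in ordered:
    print(when.isoformat(sep=" "), label)
```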
Ordering events in time. Speech (S), Event (E), & Reference (R) time (Reichenbach, 1947) • Tense relates R and S; grammatical aspect relates R and E • R is associated with temporal anaphora (Partee, 1984) • Order events by comparing R across sentences: "By the time Boris noticed his blunder, John had (already) won the game" • See Michaelis (2006) for a good explanation of tense and grammatical aspect
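A small sketch of comparing events through a shared R. It encodes a few Reichenbach-style tense structures, then orders two events under the assumption (per the temporal-anaphora point) that both clauses share one reference time. The five-tense table is a simplified subset.

```python
# Reichenbach (1947)-style tense structures: how E (event) relates to R
# (reference), and how R relates to S (speech), each as "<", "=", or ">".
TENSES = {
    "past perfect":    ("<", "<"),  # E < R < S  "had won"
    "simple past":     ("=", "<"),  # E = R < S  "noticed"
    "present perfect": ("<", "="),  # E < R = S  "has won"
    "present":         ("=", "="),  # E = R = S  "wins"
    "simple future":   ("=", ">"),  # S < R = E  "will win"
}
RANK = {"<": -1, "=": 0, ">": 1}


def order_events(tense1: str, tense2: str) -> str:
    """Order E1 relative to E2, assuming both clauses share one reference time R.
    Returns '<', '=', '>', or '?' when the tenses do not decide the order.
    Only the E-to-R part of each structure is needed for this comparison."""
    a, b = RANK[TENSES[tense1][0]], RANK[TENSES[tense2][0]]
    if a == b:
        return "=" if a == 0 else "?"  # both strictly on the same side of R
    return "<" if a < b else ">"


# "By the time Boris noticed his blunder, John had (already) won the game"
print(order_events("simple past", "past perfect"))  # '>': noticing came after winning
```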
High-Level Architecture [Figure; components: Text, Distant Supervision, Manual Labeling, Feature Markup, KB, Training Data, Wikifier, Slot Patterns, Extractor, Learner, Inference, Manual Generation, Tuples]
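The architecture feeds KB tuples and text into Distant Supervision to produce Training Data. The sketch shows one widely used heuristic, assumed here rather than taken from the slide: label a sentence positive for R if it mentions both arguments of a KB fact R(a, b). The KB facts are hypothetical. The third sentence, marked negative in the training-set slide, gets a spurious positive label, which illustrates why distantly supervised data is noisy.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, arg1, arg2)


def distant_label(sentences: List[str], kb: Set[Fact]) -> List[Tuple[str, List[str]]]:
    """Heuristic labeling: a sentence counts as a positive example of R if it
    mentions both arguments of some KB fact R(a, b). Matching is naive,
    case-insensitive substring search (an assumption for this sketch)."""
    labeled = []
    for s in sentences:
        text = s.lower()
        rels = sorted({rel for rel, a, b in kb if a.lower() in text and b.lower() in text})
        labeled.append((s, rels))
    return labeled


# Hypothetical KB facts.
kb = {("acquired", "Citigroup", "EMI"), ("acquired", "Google", "YouTube")}
sentences = [
    "Citigroup has taken over EMI, the British …",
    "Citigroup's acquisition of EMI comes just ahead of …",
    "Google's Adwords system has long included … Youtube.",
]
for s, rels in distant_label(sentences, kb):
    print(rels, s)
```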
Teams • Named Entity Linking (1) • Time (1) • Distant Supervision (1) • InstaRead (1) • Relation-Specific (3-5)