Forest-based Semantic Role Labeling
Hao Xiong, Haitao Mi, Yang Liu and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
AAAI 2010, Atlanta
Semantic Role Labeling
• Given a sentence and its verbs
• Identify the arguments of the verbs
• Assign semantic labels (the roles they play)
• Example: [Agent This company] [ArgMod-TeMPoral last year] sold [Patient 1000 cars] [ArgMod-LOCation in the U.S.]
• PropBank (Kingsbury and Palmer 2002)
One Conventional Approach
• the role of Celimene is played by Kim Cattrall (labels, left to right: Patient, Agent)
One Conventional Approach
• [Figure: parse tree (nodes S, NP, VP, PP, AUX, VBN) over the sentence]
• the role of Celimene is played by Kim Cattrall (labels, left to right: Patient, Agent)
One Conventional Approach
• more than 15%
• [Figure: the parse tree over the sentence, with the top NP node replaced by "?"]
• the role of Celimene is played by Kim Cattrall (labels, left to right: Patient, Agent)
Solution
• k-best parses:
  • limited scope: k
  • too much redundancy
  • 2^5 < 50 < 2^6
• [Figure: parse trees 1, 2, 3, …, k shown side by side]
Our Solution
• Forest
  • A compact representation of many parses
  • By sharing common sub-derivations
  • Polynomial-space encoding of exponentially large set
• [Figure: a packed forest and the individual parse trees obtained by unpacking it]
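The claim that a forest packs exponentially many parses into polynomial space can be checked on a toy example. The sketch below is only an illustration: the chain-shaped forest, the node names and the helper functions `build_chain_forest` and `count_trees` are all hypothetical and do not come from the talk. Each node stores its alternative hyperedges, and shared sub-derivations are counted once through memoization.

```python
from functools import lru_cache

def build_chain_forest(n):
    """Toy forest (hypothetical): node Xi has two alternative hyperedges,
    and both lead to the shared node X(i+1)."""
    forest = {f"X{i}": [(f"X{i + 1}",), (f"X{i + 1}", "w")] for i in range(n)}
    forest[f"X{n}"] = []  # leaves have no incoming hyperedges
    forest["w"] = []
    return forest

def count_trees(forest, root):
    """Number of distinct trees packed in the forest below `root`."""
    @lru_cache(maxsize=None)
    def count(node):
        edges = forest[node]
        if not edges:
            return 1
        total = 0
        for tails in edges:          # alternatives add up
            product = 1
            for tail in tails:       # children of one hyperedge multiply
                product *= count(tail)
            total += product
        return total
    return count(root)

forest = build_chain_forest(10)
num_edges = sum(len(edges) for edges in forest.values())
print(num_edges, count_trees(forest, "X0"))  # 20 hyperedges, 1024 trees
```

With ten binary choices the forest holds 20 hyperedges but 1024 trees, which is why a 50-best list covers only a small part of the same space.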
Outline
• Tree-based Semantic Role Labeling
  • Parsing
  • Selecting candidates
  • Extracting features
  • Classifying
• Forest-based Semantic Role Labeling
• Experiments
• Conclusion
Parsing
• (S (NP (DT This) (NN company)) (NP (JJ last) (NN year)) (VP (VBD sold) (NP (CD 1000) (NNS cars)) (PP (IN in) (NP (DT the) (NNP U.S.)))))
Selecting Candidates
• [Figure: parse tree of "This company last year sold 1000 cars in the U.S."]
Extracting Features
• Path to the predicate: NP↑S↓VP↓VBD
• Position: left
• Head word: company
• Head POS tag: NN
• …
• [Figure: parse tree of "This company last year sold 1000 cars in the U.S."]
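As a rough illustration of how these tree features can be read off a parse, the sketch below computes the path, position and head features for the NP "This company" relative to the predicate "sold". The nested-tuple tree encoding, the address tuples and all function names are assumptions made for this example. The head finder is a crude rightmost-child stand-in for real head rules.

```python
# Each node is (label, child, child, ...); a preterminal is (POS, word).
TREE = ("S",
        ("NP", ("DT", "This"), ("NN", "company")),
        ("NP", ("JJ", "last"), ("NN", "year")),
        ("VP", ("VBD", "sold"),
               ("NP", ("CD", "1000"), ("NNS", "cars")),
               ("PP", ("IN", "in"), ("NP", ("DT", "the"), ("NNP", "U.S.")))))

def subtree(tree, address):
    """Follow child indices from the root; children start at tuple index 1."""
    for i in address:
        tree = tree[i + 1]
    return tree

def labels_along(tree, address):
    """Labels from the root down to the node at `address`."""
    labels = [tree[0]]
    for i in address:
        tree = tree[i + 1]
        labels.append(tree[0])
    return labels

def path_feature(tree, cand, pred):
    """Labels from the candidate up to the lowest common ancestor, then down."""
    k = 0  # length of the shared address prefix = depth of the common ancestor
    while k < min(len(cand), len(pred)) and cand[k] == pred[k]:
        k += 1
    up = labels_along(tree, cand)[k:][::-1]
    down = labels_along(tree, pred)[k + 1:]
    return "↑".join(up) + "".join("↓" + d for d in down)

def position_feature(cand, pred):
    """Addresses compare left to right, so a smaller address lies to the left."""
    return "left" if cand < pred else "right"

def head_feature(node):
    """Crude assumption: the head is the rightmost preterminal."""
    while not isinstance(node[1], str):
        node = node[-1]
    return node[1], node[0]

cand, pred = (0,), (2, 0)  # NP "This company" and VBD "sold"
print(path_feature(TREE, cand, pred))      # NP↑S↓VP↓VBD
print(position_feature(cand, pred))        # left
print(head_feature(subtree(TREE, cand)))   # ('company', 'NN')
```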
Classifying
• Computing Score using a trained classifier
• Candidate scores shown on the parse tree:
  • S(Agent)=0.8 S(Patient)=0.1 S(None)=0.1 …
  • S(AM-LOC)=0.9 S(Agent)=0.1 S(None)=0.1 …
  • S(Agent)=0.1 S(Patient)=0.1 S(None)=0.5 …
  • S(AM-TMP)=0.9 S(Patient)=0.1 S(None)=0.1 …
  • S(Agent)=0.2 S(Patient)=0.8 S(None)=0.1 …
Classifying
• Best score for each constituent
• Simply sort them
• Choose the best label sequence
• Best scores shown on the parse tree: S(Agent)=0.8 …, S(AM-LOC)=0.9 …, S(None)=0.5 …, S(AM-TMP)=0.9 …, S(Patient)=0.8 …
Classifying
• [Agent This company] [AM-TMP last year] [V sold] [Patient 1000 cars] [AM-LOC in the U.S.]
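The steps "best score for each constituent, sort, choose the best label sequence" can be sketched as a greedy selection. The slides do not say how conflicts between candidates are resolved, so the non-overlap check below is an assumption. The assignment of each score set to a word span is also a guess made for the example, and `choose_labels` is a hypothetical name.

```python
def choose_labels(candidates):
    """candidates: list of ((start, end), {label: score}), end exclusive.
    Returns the selected (span, label) pairs in sentence order."""
    best = []
    for span, scores in candidates:
        label, score = max(scores.items(), key=lambda kv: kv[1])
        if label != "None":              # "None" means: not an argument
            best.append((score, span, label))
    best.sort(key=lambda item: item[0], reverse=True)

    chosen = []
    for score, span, label in best:
        # Assumed constraint: arguments of one predicate do not overlap.
        if all(span[1] <= s[0] or s[1] <= span[0] for s, _ in chosen):
            chosen.append((span, label))
    return sorted(chosen)

# Word indices: This(0) company(1) last(2) year(3) sold(4)
#               1000(5) cars(6) in(7) the(8) U.S.(9)
candidates = [
    ((0, 2), {"Agent": 0.8, "Patient": 0.1, "None": 0.1}),
    ((2, 4), {"AM-TMP": 0.9, "Patient": 0.1, "None": 0.1}),
    ((5, 7), {"Agent": 0.2, "Patient": 0.8, "None": 0.1}),
    ((7, 10), {"AM-LOC": 0.9, "Agent": 0.1, "None": 0.1}),
    ((8, 10), {"Agent": 0.1, "Patient": 0.1, "None": 0.5}),
]
for span, label in choose_labels(candidates):
    print(span, label)
```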
Outline
• Tree-based Semantic Role Labeling
• Forest-based Semantic Role Labeling
  • Parsing into a forest
  • Selecting candidates
  • Extracting features on forest
  • Classifying
• Experiments
• Conclusion
Forest
• Hyper-graph
• Hyper-edge
• Node
• [Figure: packed forest for "the role of Celimene is played by Kim Cattrall"]
Selecting Candidates
• [Figure: packed forest for "the role of Celimene is played by Kim Cattrall"]
Extracting features
• Path to the predicate
  • NP↑NP↑S↓VP↓VP↓VBN
  • shortest: NP↑S↓VP↓VP↓VBN
• [Figure: packed forest for "the role of Celimene is played by Kim Cattrall"]
Extracting features
• Parent Label in the shortest path: S
• Shortest path: NP↑S↓VP↓VP↓VBN
• [Figure: packed forest for "the role of Celimene is played by Kim Cattrall"]
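In a forest a node can have several parents, so a candidate may reach the predicate by more than one path. One way to pick the shortest is a breadth-first search upward from both nodes. The sketch below assumes a toy forest whose structure and node names (`NP_big`, `NP_small`, `VP1`, ...) are invented to reproduce the two NP paths on the slide. It is not the authors' implementation.

```python
from collections import deque

# Hypothetical forest: head node -> list of hyperedges (tuples of children).
FOREST = {
    "S": [("NP_big", "VP1"), ("NP_small", "PP_of", "VP1")],
    "NP_big": [("NP_small", "PP_of")],
    "VP1": [("AUX", "VP2")],
    "VP2": [("VBN", "PP_by")],
}
LABELS = {"S": "S", "NP_big": "NP", "NP_small": "NP", "PP_of": "PP",
          "VP1": "VP", "VP2": "VP", "AUX": "AUX", "VBN": "VBN", "PP_by": "PP"}

def up_chains(parents, start):
    """BFS upward: shortest chain of nodes from `start` to each ancestor."""
    chain = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in chain:
                chain[parent] = chain[node] + [parent]
                queue.append(parent)
    return chain

def shortest_path_feature(forest, labels, cand, pred):
    parents = {}
    for head, edges in forest.items():
        for tails in edges:
            for tail in tails:
                parents.setdefault(tail, []).append(head)
    up_c = up_chains(parents, cand)
    up_p = up_chains(parents, pred)
    common = [n for n in up_c if n in up_p]
    # Turning point = common ancestor with the fewest steps in total.
    top = min(common, key=lambda n: len(up_c[n]) + len(up_p[n]))
    up = [labels[n] for n in up_c[top]]
    down = [labels[n] for n in reversed(up_p[top][:-1])]
    return "↑".join(up) + "".join("↓" + d for d in down)

print(shortest_path_feature(FOREST, LABELS, "NP_small", "VBN"))  # NP↑S↓VP↓VP↓VBN
```

The parent label on the shortest path is then simply the second label of the upward part, here S.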
New Features
• Parsing score (Fractional value (Mi et al., 2008))
  • Inside-outside
  • Marginal prob.
• Feature values shown: NP↑S↓VP↓VP↓VBN, S, f(NP3)
• [Figure: packed forest for "the role of Celimene is played by Kim Cattrall"]
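The slide names inside-outside and marginal probability as the source of the parsing-score feature without spelling out the computation. The sketch below shows the standard sum-product inside-outside recursion on a hypergraph, under the assumption that each hyperedge carries a probability. The toy forest, its probabilities and the function name `node_marginals` are hypothetical, and the exact fractional value of Mi et al. (2008) may be defined differently.

```python
def node_marginals(forest, root):
    """forest: head -> list of (probability, tails). Leaves have no entry.
    Returns inside(v) * outside(v) / inside(root) for every reachable node."""
    inside = {}
    order = []  # post-order: children appear before their parents

    def visit(node):
        if node in inside:
            return inside[node]
        edges = forest.get(node, [])
        if not edges:
            value = 1.0
        else:
            value = 0.0
            for prob, tails in edges:
                score = prob
                for tail in tails:
                    score *= visit(tail)
                value += score
        inside[node] = value
        order.append(node)
        return value

    visit(root)
    outside = {node: 0.0 for node in inside}
    outside[root] = 1.0
    for node in reversed(order):  # parents are processed before children
        for prob, tails in forest.get(node, []):
            for i, tail in enumerate(tails):
                rest = prob
                for j, sibling in enumerate(tails):
                    if j != i:
                        rest *= inside[sibling]
                outside[tail] += outside[node] * rest
    return {n: inside[n] * outside[n] / inside[root] for n in inside}

# Hypothetical forest: S is built either from NP_big or directly from NP_small.
forest = {
    "S": [(0.6, ("NP_big", "VP")), (0.4, ("NP_small", "PP_of", "VP"))],
    "NP_big": [(1.0, ("NP_small", "PP_of"))],
}
marginals = node_marginals(forest, "S")
print(round(marginals["NP_big"], 3), round(marginals["NP_small"], 3))
```

In this toy forest NP_big appears only in the derivation with weight 0.6, while NP_small appears in every derivation, so their marginals differ even though both are NP nodes over the same region.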
Classifying
• Candidate scores shown on the forest:
  • S(Patient)=0.8 S(Agent)=0.1 S(None)=0.2 …
  • S(Agent)=0.8 S(Patient)=0.1 S(None)=0.2 …
  • S(Patient)=0.5 S(Agent)=0.1 S(None)=0.3 …
Classifying
• Best scores shown on the forest: S(Patient)=0.8 …, S(Agent)=0.8 …
• the role of Celimene is played by Kim Cattrall (labels, left to right: Patient, Agent)
Outline
• Tree-based Semantic Role Labeling
• Forest-based Semantic Role Labeling
• Experiments
• Conclusion
Experiments
• Corpus: CoNLL-2005 shared task
  • Sections 02-21 of PropBank for training
  • Section 24 for development set
  • Section 23 for test set
• Total
  • 43,594 sentences
  • 262,281 arguments
Experiments
• Training sentences
  • Parse into 1-best and forest
  • Prune forest using inside-outside algorithm
  • Train classifiers
• Decoding sentences
  • Parse into 1-best and forest
  • Prune forest using inside-outside algorithm
  • Use classifiers
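The slides do not give the pruning criterion. One common form of inside-outside pruning keeps a hyperedge only if the best derivation that uses it is within a threshold of the best derivation overall. The sketch below assumes that form, with costs read as negative log probabilities. The toy forest, its costs and the function name `prune` are hypothetical.

```python
import math

def prune(forest, root, threshold):
    """forest: head -> list of (cost, tails); lower cost is better.
    Returns head -> surviving hyperedges."""
    best_in = {}
    order = []  # post-order: children before parents

    def visit(node):
        if node not in best_in:
            edges = forest.get(node, [])
            best_in[node] = min(
                (cost + sum(visit(t) for t in tails) for cost, tails in edges),
                default=0.0,  # leaves cost nothing
            )
            order.append(node)
        return best_in[node]

    visit(root)
    best_out = {node: math.inf for node in best_in}
    best_out[root] = 0.0
    for node in reversed(order):  # parents before children
        for cost, tails in forest.get(node, []):
            total = best_out[node] + cost + sum(best_in[t] for t in tails)
            for tail in tails:
                best_out[tail] = min(best_out[tail], total - best_in[tail])

    pruned = {}
    for node in best_in:
        kept = []
        for cost, tails in forest.get(node, []):
            # Cost of the best full derivation that goes through this hyperedge.
            merit = best_out[node] + cost + sum(best_in[t] for t in tails)
            if merit - best_in[root] <= threshold:
                kept.append((cost, tails))
        if kept:
            pruned[node] = kept
    return pruned

forest = {
    "S": [(0.5, ("NP_big", "VP")), (4.0, ("NP_small", "PP_of", "VP"))],
    "NP_big": [(0.2, ("NP_small", "PP_of"))],
}
print(len(prune(forest, "S", 3.0)["S"]), len(prune(forest, "S", 5.0)["S"]))  # 1 2
```

A larger threshold keeps more hyperedges and therefore more candidate constituents, at the cost of a larger forest.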
Features
• Predicate lemma
• Path to predicate
• Path length
• Partial path
• Position
• Voice
• Head word/POS tag
• …
Results on Dev Set
• [Figure: precision, recall and F for 1-best, 50-best, forest (p3) and forest (p5); annotated values 9.63×10^5 and 5.78×10^6]
Results on Test Set
Outline
• Tree-based Semantic Role Labeling
• Forest-based Semantic Role Labeling
• Experiments
• Conclusion
Conclusion
• Forest
  • Encodes exponentially many parses
  • Enlarges the candidate space
  • Allows richer features to be explored
  • Improves the quality significantly
  • A very large forest is not necessary
  • k-best lists can NOT simulate a forest
• Future work
  • Features on forest
Thank you!