Two-Phase Semantic Role Labeling based on Support Vector Machines Kyung-Mi Park Young-Sook Hwang Hae-Chang Rim NLP Lab. Korea Univ.
Contents • Introduction • Two-phase semantic role labeling based on SVMs • Semantic argument boundary identification phase • Semantic role classification phase • Experiments • Conclusion
Introduction(1) • Advantages of using SVMs • high generalization performance in high-dimensional feature spaces • combinations of multiple features can be learned by virtue of polynomial kernel functions • The Semantic Role Labeling (SRL) task is a multiclass classification task • since an SVM is a binary classifier, we have to extend SVMs to multiclass classification • a multiclass classification task often suffers from the unbalanced class distribution problem
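To illustrate why a polynomial kernel learns feature combinations, the sketch below checks that a degree-2 kernel equals a dot product in an expanded space that contains every pairwise product of features. Degree 2 is the value reported on the Experiments slide; the constant term 1 and the function names are assumptions made for illustration, not details taken from the slides.

```python
import math

def poly_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel (x . y + c)^degree; c = 1 is an assumed constant."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree

def explicit_degree2_map(x):
    """Explicit feature map whose dot product equals (x . y + 1)^2."""
    n = len(x)
    feats = [1.0]                                   # constant term
    feats += [math.sqrt(2) * xi for xi in x]        # single features
    feats += [xi * xi for xi in x]                  # squared features
    feats += [math.sqrt(2) * x[i] * x[j]            # pairwise combinations
              for i in range(n) for j in range(i + 1, n)]
    return feats

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Two hypothetical binary feature vectors.
x = [1, 0, 1]
y = [1, 1, 0]
implicit = poly_kernel(x, y)
explicit = dot(explicit_degree2_map(x), explicit_degree2_map(y))
```

Both routes give the same value, but the kernel never builds the pairwise-product features explicitly, which is what makes it practical in high-dimensional feature spaces.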
Introduction(2) • To apply SVMs to the SRL task • we have to find a way to resolve the unbalanced class distribution problem • We propose a two-phase SRL method • Boundary identification phase + Role classification phase • This alleviates the unbalanced class distribution problem • In the identification phase, only three SVM classifiers are required to identify B-ARG, I-ARG, O • This decreases the number of negative examples • In the classification phase, we can ignore non-argument constituents
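The effect on class balance can be sketched with a small calculation. The label counts below are hypothetical, chosen only to show the arithmetic; they are not figures from the slides. The function counts how many negative examples a one-vs-rest classifier for one label sees per positive example.

```python
def negatives_per_positive(label_counts, target):
    """Ratio of negative to positive examples for a one-vs-rest classifier."""
    positives = label_counts[target]
    negatives = sum(n for label, n in label_counts.items() if label != target)
    return negatives / positives

# Hypothetical constituent counts: most constituents are non-arguments ("O").
single_phase = {"A0": 100, "A1": 150, "O": 2000}

# In the role classification phase, non-argument constituents are ignored.
classification_phase = {k: v for k, v in single_phase.items() if k != "O"}

ratio_single = negatives_per_positive(single_phase, "A0")
ratio_two_phase = negatives_per_positive(classification_phase, "A0")
```

With these made-up counts, the A0 classifier sees 21.5 negatives per positive when roles are assigned in a single step, and 1.5 once non-arguments have been filtered out by the identification phase.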
Two-phase Semantic Role Labeling(1) • First phase: semantic argument identification phase • Identify the boundaries of semantic arguments • First, segment a sentence into syntactic constituents (c), using a chunk or a subclause as the unit • Second, classify the syntactic constituents into B-ARG, I-ARG, O • Second phase: semantic role classification phase • Assign appropriate semantic roles to the identified semantic arguments
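As a minimal sketch of how the first phase's output feeds the second, the function below groups a sequence of B-ARG / I-ARG / O tags (one per syntactic constituent) into argument spans. The function name and the handling of an I-ARG that has no preceding B-ARG are assumptions; the slides do not say how such cases are treated.

```python
def bio_to_spans(tags):
    """Group per-constituent B-ARG/I-ARG/O tags into (start, end) spans.

    Indices are constituent positions, end inclusive.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-ARG":
            if start is not None:          # close the previous argument
                spans.append((start, i - 1))
            start = i
        elif tag == "I-ARG":
            if start is None:              # assumption: stray I-ARG opens a span
                start = i
        else:                              # "O": outside any argument
            if start is not None:
                spans.append((start, i - 1))
                start = None
    if start is not None:                  # argument running to sentence end
        spans.append((start, len(tags) - 1))
    return spans

spans = bio_to_spans(["B-ARG", "I-ARG", "O", "B-ARG", "B-ARG", "I-ARG"])
```

Each resulting span is one candidate semantic argument, and only these spans are passed to the role classification phase.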
Semantic Argument Boundary Identification(1) • Restrict the search space in terms of the constituents • the left search boundary is set to the left boundary of the second upper clause • the right search boundary is set to the right boundary of the immediate clause • Utilize features for identifying syntactic constituents that are dependent on the predicate • Semantic arguments are dependent on the predicate • Features for finding dependency relations are implicitly represented
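One possible reading of the search-space restriction is sketched below. It assumes the clauses enclosing the predicate are available as (start, end) constituent spans ordered from innermost to outermost, and it interprets "second upper clause" as the clause directly above the immediate clause. Both the data layout and that interpretation are assumptions, as is the fallback when no upper clause exists.

```python
def search_window(enclosing_clauses):
    """Return (left, right) constituent bounds for argument search.

    enclosing_clauses: (start, end) spans that contain the predicate,
    ordered innermost first (hypothetical input format).
    """
    immediate = enclosing_clauses[0]
    # Assumption: fall back to the immediate clause if there is no upper clause.
    upper = enclosing_clauses[1] if len(enclosing_clauses) > 1 else immediate
    left = upper[0]        # left boundary of the upper clause
    right = immediate[1]   # right boundary of the immediate clause
    return left, right

window = search_window([(5, 9), (2, 12), (0, 15)])
```

Only constituents inside the returned window would be considered as candidate arguments of the predicate.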
Semantic Argument Boundary Identification(2) • 29 features are used to represent syntactic and semantic information related to the dependency relationships between syntactic constituents and the predicate
Semantic Role Classification(1) • We consider only 18 semantic roles, selected by frequency in the training data • AM-MOD, AM-NEG are post-processed by hand-crafted rules • We do not consider the 19 semantic roles that appear fewer than 36 times in the training data • A5, AM-PRD, AM-REC, AA • R-A3, R-AA, R-AM-TMP, R-AM-LOC, R-AM-MNR, R-AM-ADV, R-AM-PNC • C-A0, C-A2, C-A3, C-AM-MNR, C-AM-ADV, C-AM-EXT, C-AM-DIS, C-AM-CAU
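The frequency cutoff can be sketched as follows. The threshold of 36 comes from the slide; the function name and the toy label list are hypothetical, and the sketch does not cover the rule-based handling of AM-MOD and AM-NEG.

```python
from collections import Counter

def frequent_roles(role_labels, min_count=36):
    """Keep roles that occur at least min_count times in the training labels."""
    counts = Counter(role_labels)
    return {role for role, n in counts.items() if n >= min_count}

# Toy training labels (made-up counts, for illustration only).
toy_labels = ["A0"] * 40 + ["A1"] * 36 + ["A5"] * 35
kept = frequent_roles(toy_labels)
```

Here A5 is dropped because it appears fewer than 36 times, while A0 and A1 are kept as classification targets.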
Semantic Role Classification(2) • This phase also uses all features applied in the identification phase • except for # of POS[:] and POS[“] & POS[”] • In addition, we use a voice feature • This is a binary feature identifying whether the target phrase is active or passive • Named-entity information is not used • performance decreases when NE information is included
Experiments(1) • We used the SVM light package (http://svm-light.joachims.org/) • In both phases, we used a polynomial kernel (degree 2) with the one-vs-rest classification method • Results on the development set (closed challenge)
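The one-vs-rest scheme can be sketched in a few lines: one binary SVM is trained per label, and the label whose classifier gives the highest decision value wins. The code below is a toy illustration and does not call SVM light; the model layout, the hand-set coefficients, the kernel constant 1, and the sign convention for the bias are all assumptions.

```python
def poly2(x, y):
    """Degree-2 polynomial kernel; the constant term 1 is an assumption."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** 2

def decision_value(model, x):
    """Kernel expansion: sum of coef * K(sv, x) plus a bias term."""
    return sum(coef * poly2(sv, x) for coef, sv in model["sv"]) + model["b"]

def predict(models, x):
    """One-vs-rest: choose the label with the highest decision value."""
    return max(models, key=lambda label: decision_value(models[label], x))

# Hypothetical hand-set binary classifiers, one per identification label.
models = {
    "B-ARG": {"sv": [(1.0, [1, 0])], "b": -1.0},
    "I-ARG": {"sv": [(1.0, [0, 1])], "b": -1.0},
    "O":     {"sv": [],              "b": 0.5},
}

label = predict(models, [1, 0])
```

In the identification phase this means three binary classifiers (B-ARG, I-ARG, O); in the classification phase there is one classifier per retained semantic role.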
Experiments(2) • Results on the test set (closed challenge)
Conclusion • We proposed a two-phase semantic role labeling method based on support vector machines • By applying the two-phase method, we can alleviate the unbalanced class distribution problem caused by the negative examples • Our system obtains an F-measure of 63.99% on the test set and 65.78% on the development set
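For reference, the F-measure can be computed as sketched below, assuming the balanced form (the harmonic mean of precision and recall). The slides report only the final scores, so the argument counts in the example are hypothetical and unrelated to the reported results.

```python
def f_measure(correct, predicted, gold):
    """Balanced F-measure from argument counts.

    correct:   predicted arguments that match a gold argument
    predicted: all arguments output by the system
    gold:      all arguments in the reference annotation
    """
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: precision 0.75, recall 0.5.
score = f_measure(correct=60, predicted=80, gold=120)
```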