Dependency Parsing with Reference to Slovene, Spanish and Swedish
Simon Corston-Oliver, Anthony Aue
Microsoft Research
Noteworthy results
• Slovene
  • Labeled DA = 72.42% (second)
  • Not significantly different from #1 (73.44%)
• Swedish
  • #1 for unlabeled DA (89.54%)
  • Much worse than #3 for labeled DA (79.69% vs 82.31%)
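The gap between the unlabeled and labeled Swedish scores is easier to read with the two metrics side by side. The sketch below assumes that "DA" means dependency accuracy, i.e. the share of tokens whose predicted head (unlabeled) or head plus label (labeled) matches the gold standard. The function name, the toy sentence, and its labels are hypothetical and are not taken from the slides.

```python
from typing import List, Tuple

# One entry per token: (head index, dependency label).
# Head index 0 is an assumed artificial root node.
Arc = Tuple[int, str]


def dependency_accuracy(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (unlabeled, labeled) accuracy over two aligned token lists."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    # Unlabeled: only the head has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # Labeled: head and label both have to match.
    both_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / len(gold), both_hits / len(gold)


# Hypothetical four-token sentence.
gold = [(2, "det"), (3, "subj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "obj")]
unlabeled_da, labeled_da = dependency_accuracy(gold, pred)
```

A token with the right head but the wrong label counts toward unlabeled DA only, so the first-stage parser can rank well while the labeling stage lowers the labeled score.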
Outline
Two stage pipeline
• Identify unlabeled directed dependencies
• Label the dependencies
Parser
• Unlabeled directed dependencies
• Discriminatively trained linear classifier
• Projective dependencies only
• Parse features
  • Case-normalized surface form and lemma
  • POS of each token
  • POS of intervening and neighboring tokens
  • Combinations of these
  • Direction and distance of attachment
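As a rough illustration of the feature list above, the sketch below builds sparse binary features for one candidate head-to-dependent arc, the kind of input a linear classifier would score. The feature names, the distance cap of 5, the padding symbol, and the particular combinations are assumptions made for this example; the slides do not give the actual templates.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str


def arc_features(sent: List[Token], head: int, dep: int) -> Dict[str, float]:
    """Binary features for a candidate arc head -> dep (0-based token indices)."""

    def pos_at(i: int) -> str:
        # Neighbors outside the sentence get an assumed padding symbol.
        return sent[i].pos if 0 <= i < len(sent) else "<pad>"

    h, d = sent[head], sent[dep]
    direction = "R" if head < dep else "L"   # does the dependent lie right or left of the head?
    dist = min(abs(head - dep), 5)           # assumed cap on attachment distance
    lo, hi = sorted((head, dep))

    names = [
        # Case-normalized surface form and lemma
        f"h_form={h.form.lower()}", f"h_lemma={h.lemma.lower()}",
        f"d_form={d.form.lower()}", f"d_lemma={d.lemma.lower()}",
        # POS of each token, and one simple combination
        f"h_pos={h.pos}", f"d_pos={d.pos}", f"h_pos+d_pos={h.pos}+{d.pos}",
        # POS of neighboring tokens
        f"h_prev_pos={pos_at(head - 1)}", f"h_next_pos={pos_at(head + 1)}",
        f"d_prev_pos={pos_at(dep - 1)}", f"d_next_pos={pos_at(dep + 1)}",
        # Direction and distance of attachment, alone and combined with POS
        f"dir={direction}", f"dist={dist}",
        f"dir+dist+h_pos+d_pos={direction}+{dist}+{h.pos}+{d.pos}",
    ]
    # POS of each intervening token, combined with the two endpoint tags
    names += [f"between_pos={h.pos}+{sent[i].pos}+{d.pos}" for i in range(lo + 1, hi)]
    return {name: 1.0 for name in names}


# Hypothetical English example with illustrative tags.
sent = [Token("The", "the", "DT"), Token("dog", "dog", "NN"), Token("barked", "bark", "VBD")]
feats = arc_features(sent, head=2, dep=0)
```

A linear model would score the arc as the sum of the learned weights of the features that fire, and a projective decoder would then pick the highest-scoring tree.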
POS features
• Use fine POS tags for all languages except Dutch and Turkish
• Swedish: Normalize tags for auxiliaries
  • Orig: “vara” (be) = AV; “måst” (must) = MV
  • Replace with “aux”
  • Unlabeled DA: 89.23% → 89.45%
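The Swedish tag normalization amounts to a small lookup applied before feature extraction. The sketch below covers only the two tags named on the slide (AV and MV); whether other auxiliary tags were also collapsed is not stated, so the set is left at those two. The tag "XX" stands in for any other tag and is a placeholder, not a real tag from the treebank.

```python
from typing import List

# Only the two auxiliary tags named on the slide; any further tags
# that may have been collapsed are not known from the source.
AUX_TAGS = {"AV", "MV"}


def normalize_pos(tag: str) -> str:
    """Map lexically specific auxiliary tags to one shared 'aux' tag."""
    return "aux" if tag in AUX_TAGS else tag


def normalize_tags(tags: List[str]) -> List[str]:
    return [normalize_pos(t) for t in tags]


# "XX" is a placeholder for an arbitrary non-auxiliary tag.
normalized = normalize_tags(["XX", "AV", "MV"])
```

Collapsing the tags lets the parser share statistics across auxiliaries instead of learning a separate pattern for each verb.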
Root identification features
• Many errors identifying root in periphrastic constructions with aux and participle
  • E.g. German aux/modal in second position in declarative main clause; initial with subj-aux inversion
• New features:
  • POS sequence to left of each token
  • “Leftmost finite verb and not preceded by subordinating conj or relative pron”
  • “Sentence does (not) contain finite verb”
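One possible reading of the three new root features is sketched below. It assumes that finiteness and subordinator status can be read off the POS tag, that the left context is truncated to three tags, and that "not preceded by" means no subordinating conjunction or relative pronoun occurs anywhere to the left. The tag names (VFIN, VPART, SCONJ, RELPRO, PRON) are hypothetical.

```python
from typing import Dict, List, Set


def root_features(pos_tags: List[str], i: int,
                  finite: Set[str], subord: Set[str]) -> Dict[str, float]:
    """Root-candidate features for token i of a POS-tagged sentence."""
    left = pos_tags[:i]
    has_finite = any(t in finite for t in pos_tags)
    # Token i is a finite verb and no finite verb occurs before it.
    leftmost_finite = pos_tags[i] in finite and not any(t in finite for t in left)
    # Some subordinating conjunction or relative pronoun occurs to the left.
    after_subord = any(t in subord for t in left)

    feats = {
        # POS sequence to the left (window of 3 is an assumption)
        "left_pos_seq=" + "_".join(left[-3:] or ["<s>"]): 1.0,
        # Sentence does (not) contain a finite verb
        f"sent_has_finite_verb={has_finite}": 1.0,
    }
    if leftmost_finite and not after_subord:
        feats["leftmost_finite_not_after_subord"] = 1.0
    return feats


FINITE = {"VFIN"}               # hypothetical finite-verb tag
SUBORD = {"SCONJ", "RELPRO"}    # hypothetical subordinator tags

# Main clause: pronoun, finite auxiliary, participle.
main = root_features(["PRON", "VFIN", "VPART"], 1, FINITE, SUBORD)
# Subordinate clause: the finite verb follows a subordinating conjunction.
sub = root_features(["SCONJ", "PRON", "VFIN"], 2, FINITE, SUBORD)
```

In the main-clause example the auxiliary receives the leftmost-finite feature, which would push the model toward choosing it over the participle as root; in the subordinate-clause example the feature does not fire.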
Root identification features
• Danish improved
  • RA 94.12% → 94.72%
• Spanish improved
  • RA 80.08% → 83.57%
Labeling dependencies
• Use a maximum entropy classifier (Berger et al., 1996)
  • Fast to train
  • Good probability estimates
• Intended to jointly model sets of labels
• Actually labeled independently
• Better results with SVM?
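To make "labeled independently" concrete, the sketch below scores each arc with a maximum entropy model, p(label | features) proportional to exp(sum of the weights of the active features), and takes the best label per arc without looking at the labels of its siblings. The weights are hand-set for illustration rather than trained, and the feature and label names are hypothetical.

```python
import math
from typing import Dict, List

Weights = Dict[str, Dict[str, float]]  # label -> feature name -> weight


def label_distribution(feats: List[str], weights: Weights) -> Dict[str, float]:
    """Maxent (softmax) distribution over labels for one arc."""
    scores = {lab: sum(w.get(f, 0.0) for f in feats) for lab, w in weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    z = sum(exps.values())
    return {lab: e / z for lab, e in exps.items()}


def label_arcs(arcs: List[List[str]], weights: Weights) -> List[str]:
    """Label each arc on its own: argmax of its distribution, no joint constraints."""
    labels = []
    for feats in arcs:
        dist = label_distribution(feats, weights)
        labels.append(max(dist, key=dist.get))
    return labels


# Hand-set illustrative weights (not trained values).
weights = {
    "subj": {"d_pos=NN": 1.0, "dir=L": 1.0},
    "obj": {"d_pos=NN": 1.0, "dir=R": 1.0},
}
arcs = [["d_pos=NN", "dir=L"], ["d_pos=NN", "dir=R"]]
labels = label_arcs(arcs, weights)
first_dist = label_distribution(arcs[0], weights)
```

Because every arc is decided separately, nothing stops two dependents of the same head from both being labeled as subject. A joint model over the set of labels for one head could use the per-arc probabilities to rule such combinations out.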
Conclusion
• Two stage pipeline
• Feature engineering important
  • For predicting dependencies
  • For labeling dependencies
• Replacing maxent classifier with SVM gave boost