Machine Learning 2 Inductive Dependency Parsing Joakim Nivre
Inductive Dependency Parsing • Dependency-based representations … • have restricted expressivity but provide a transparent encoding of semantic structure. • have restricted complexity in parsing. • Inductive machine learning … • is necessary for accurate disambiguation. • is beneficial for robustness. • makes (formal) grammars superfluous.
Dependency Graph [Figure: dependency graph of an example sentence, with arcs labeled ROOT, SBJ, OBJ, NMOD, PMOD and P]
Key Ideas • Deterministic: • Deterministic algorithms for building dependency graphs (Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Nivre 2003) • History-based: • History-based models for predicting the next parser action (Black et al. 1992, Magerman 1995, Ratnaparkhi 1997, Collins 1997) • Discriminative: • Discriminative machine learning to map histories to actions (Veenstra and Daelemans 2000, Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Nivre et al. 2004)
Guided Parsing • Deterministic parsing: • Greedy algorithm for disambiguation • Optimal strategy given an oracle • Guided deterministic parsing: • Guide = Approximation of oracle • Desiderata: • High prediction accuracy • Efficient implementation (constant time) • Solution: • Discriminative classifier induced from treebank data
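The greedy strategy above can be sketched in a few lines. This is a minimal, unlabeled sketch assuming arc-eager style transitions (LEFT-ARC, RIGHT-ARC, REDUCE, SHIFT); the names `parse` and `make_oracle`, the fallback to SHIFT for illegal actions, and the 9-token example tree in the usage note are illustrative assumptions, not taken from the slides. The oracle reads the gold tree directly; a guide would be any classifier with the same call signature.

```python
from typing import Callable, Dict, List, Tuple

# Parser state: (stack of token ids, remaining input, heads assigned so far).
# Token 0 is an artificial root; real tokens are numbered from 1.
State = Tuple[List[int], List[int], Dict[int, int]]


def parse(n_tokens: int, guide: Callable[[State], str]) -> Dict[int, int]:
    """Greedy parsing: one guide call per transition, no backtracking."""
    stack: List[int] = [0]
    buffer: List[int] = list(range(1, n_tokens + 1))
    heads: Dict[int, int] = {}
    while buffer:
        action = guide((stack, buffer, heads))
        top, nxt = stack[-1], buffer[0]
        if action == "LEFT-ARC" and top != 0 and top not in heads:
            heads[top] = nxt            # next becomes the head of top
            stack.pop()
        elif action == "RIGHT-ARC":
            heads[nxt] = top            # top becomes the head of next
            stack.append(buffer.pop(0))
        elif action == "REDUCE" and top in heads:
            stack.pop()
        else:                           # SHIFT, also used if the action is illegal
            stack.append(buffer.pop(0))
    return heads


def make_oracle(gold: Dict[int, int]) -> Callable[[State], str]:
    """Oracle for a projective gold tree (hypothetical helper for illustration)."""
    def oracle(state: State) -> str:
        stack, buffer, heads = state
        top, nxt = stack[-1], buffer[0]
        if gold.get(top) == nxt:
            return "LEFT-ARC"
        if gold.get(nxt) == top:
            return "RIGHT-ARC"
        # Pop top once it has its head and next still needs a token deeper in the stack.
        deeper = stack[:-1]
        if top in heads and any(gold.get(nxt) == k or gold.get(k) == nxt for k in deeper):
            return "REDUCE"
        return "SHIFT"
    return oracle
```

With an oracle the greedy parser recovers the gold tree exactly, for example `parse(9, make_oracle({1: 2, 2: 3, 3: 0, 4: 5, 5: 3, 6: 5, 7: 8, 8: 6, 9: 3}))` returns the same dictionary. Replacing the oracle with a trained classifier gives guided parsing, where accuracy depends on how well the guide approximates the oracle.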
Learning • Classification problem (S → T) • Parser states: S = { s | s = (φ_1, …, φ_p) } • Parser actions: T = { t_1, …, t_m } • Training data: • D = { (s_{i-1}, t_i) | t_i(s_{i-1}) = s_i in gold standard derivation s_1, …, s_n } • Learning methods: • Memory-based learning • Support vector machines • Maximum entropy modeling • …
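As a rough illustration of the first learning method listed, the sketch below stores the training pairs from D and classifies a new state by majority vote among its nearest stored neighbours, using a simple feature-overlap similarity. It is a simplified sketch and not TiMBL's implementation; the class name `MemoryBasedGuide`, the parameter `k`, and the toy feature vectors in the usage note are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# One training instance: (feature vector of state s_{i-1}, action t_i).
Instance = Tuple[Sequence[str], str]


class MemoryBasedGuide:
    """k-nearest-neighbour classifier over symbolic feature vectors."""

    def __init__(self, k: int = 1) -> None:
        self.k = k
        self.memory: List[Instance] = []

    def fit(self, data: Iterable[Instance]) -> "MemoryBasedGuide":
        # "Learning" is just storing every instance in memory.
        self.memory = list(data)
        return self

    def predict(self, features: Sequence[str]) -> str:
        if not self.memory:
            raise ValueError("no training instances stored")

        def overlap(stored: Sequence[str]) -> int:
            # Similarity = number of feature positions with identical values.
            return sum(a == b for a, b in zip(stored, features))

        # Stable sort: ties keep the order in which instances were stored.
        ranked = sorted(self.memory, key=lambda inst: overlap(inst[0]), reverse=True)
        votes = Counter(action for _, action in ranked[: self.k])
        return votes.most_common(1)[0][0]
```

A toy run with two part-of-speech features (top, next): after `fit([(("JJ", "NN"), "LEFT-ARC"), (("ROOT", "VBD"), "RIGHT-ARC"), (("VBD", "JJ"), "SHIFT")])`, the call `predict(("JJ", "NNS"))` returns `"LEFT-ARC"`, because the first stored instance shares one feature value with the query and the others share none.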
Feature Models [Figure: parser configuration showing the stack (t1, th, …, top) and the input (next, n1, n2, n3), with arcs labeled hd, ld and rd] • Model P: PoS: t1, top, next, n1, n2 • Model D: P + DepTypes: t.hd, t.ld, t.rd, n.ld • Model L2: D + Words: top, next • Model L4: L2 + Words: top.hd, n1
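The four feature models can be read as nested feature vectors, each extending the previous one. The sketch below shows one possible encoding under several assumptions that the slide does not spell out: t1 is taken to be the stack token directly below top, t.hd is taken to be the dependency type linking top to its head, ld and rd are the leftmost dependent to the left and the rightmost dependent to the right, and missing values are encoded as `<none>`. The `State` class and the function name `features` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NONE = "<none>"  # placeholder for a feature with no value (assumption)


@dataclass
class State:
    words: List[str]                 # words[0] is an artificial root token
    tags: List[str]                  # part-of-speech tags, same indexing
    stack: List[int]                 # stack[-1] is top
    buffer: List[int]                # buffer[0] is next
    heads: Dict[int, int] = field(default_factory=dict)   # dependent -> head
    labels: Dict[int, str] = field(default_factory=dict)  # dependent -> type


def features(s: State, model: str = "L4") -> List[str]:
    """Feature vector for Model P, D, L2 or L4 (each extends the previous)."""
    def stack_at(d: int) -> Optional[int]:
        return s.stack[-1 - d] if len(s.stack) > d else None

    def input_at(o: int) -> Optional[int]:
        return s.buffer[o] if len(s.buffer) > o else None

    def tag(i: Optional[int]) -> str:
        return s.tags[i] if i is not None else NONE

    def word(i: Optional[int]) -> str:
        return s.words[i] if i is not None else NONE

    def label(i: Optional[int]) -> str:
        return s.labels.get(i, NONE) if i is not None else NONE

    def ld(i: Optional[int]) -> Optional[int]:
        left = [d for d, h in s.heads.items() if h == i and d < i]
        return min(left) if left else None

    def rd(i: Optional[int]) -> Optional[int]:
        right = [d for d, h in s.heads.items() if h == i and d > i]
        return max(right) if right else None

    top, t1 = stack_at(0), stack_at(1)
    nxt, n1, n2 = input_at(0), input_at(1), input_at(2)

    feats = [tag(t1), tag(top), tag(nxt), tag(n1), tag(n2)]               # Model P
    if model == "P":
        return feats
    feats += [label(top), label(ld(top)), label(rd(top)), label(ld(nxt))]  # Model D
    if model == "D":
        return feats
    feats += [word(top), word(nxt)]                                        # Model L2
    if model == "L2":
        return feats
    head_of_top = s.heads.get(top) if top is not None else None
    feats += [word(head_of_top), word(n1)]                                 # Model L4
    return feats
```

Under these assumptions Model P yields 5 features, Model D 9, Model L2 11 and Model L4 13, so moving to a richer model only appends columns to the training instances.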
Experimental Results (MBL) • Results: – Dependency features help – Lexicalisation helps … – … up to a point (?)
Parameter Optimization • Learning algorithm parameter optimization: • Manual (Nivre 2005) vs. paramsearch (van den Bosch 2003)
Learning Curves Swedish: • Attachment score (U/L) • Models: D, L2 • 10K tokens/section English: • Attachment score (U/L) • Models: D, L2 • 100K tokens/section
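For reference, the unlabeled and labeled attachment scores (U/L) can be computed as below. This is a minimal sketch assuming every token is scored; evaluation setups differ on details such as whether punctuation is counted, and the function name `attachment_scores` is an illustrative choice.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency type) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (unlabeled, labeled) attachment score over all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses differ in length")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    # Unlabeled: the head is correct. Labeled: head and dependency type are both correct.
    unlabeled = sum(g[0] == p[0] for g, p in zip(gold, pred))
    labeled = sum(g == p for g, p in zip(gold, pred))
    return unlabeled / len(gold), labeled / len(gold)
```

For four tokens with one wrong head and one wrong label on an otherwise correct arc, the function returns `(0.75, 0.5)`: the labeled score can never exceed the unlabeled one.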
Dependency Types: Swedish • High accuracy (84% ≤ labeled F): IM (marker → infinitive) 98.5%; PR (preposition → noun) 90.6%; UK (complementizer → verb) 86.4%; VC (auxiliary verb → main verb) 86.1%; DET (noun → determiner) 89.5%; ROOT 87.8%; SUB (verb → subject) 84.5% • Medium accuracy (76% ≤ labeled F ≤ 80%): ATT (noun modifier) 79.2%; CC (coordination) 78.9%; OBJ (verb → object) 77.7%; PRD (verb → predicative) 76.8%; ADV (adverbial) 76.3% • Low accuracy (labeled F ≤ 70%): INF, APP, XX, ID
Dependency Types: English • High accuracy (86% ≤ labeled F): VC (auxiliary verb → main verb) 95.0%; NMOD (noun modifier) 91.0%; SBJ (verb → subject) 89.3%; PMOD (preposition → modifier) 88.6%; SBAR (complementizer → verb) 86.1% • Medium accuracy (73% ≤ labeled F ≤ 83%): ROOT 82.4%; OBJ (verb → object) 81.1%; VMOD (verb modifier) 76.8%; AMOD (adjective/adverb modifier) 76.7%; PRD (predicative) 73.8% • Low accuracy (labeled F ≤ 70%): DEP (null label)
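The per-type figures on the two slides above are labeled F scores. A minimal sketch of how such scores could be computed follows, assuming that F is the harmonic mean of labeled precision and labeled recall for each dependency type, and that an arc counts as correct only when both head and type match. The function name `labeled_f_by_type` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Arc = Tuple[int, str]  # (head index, dependency type) for one token


def labeled_f_by_type(gold: List[Arc], pred: List[Arc]) -> Dict[str, float]:
    """Labeled F score per dependency type."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses differ in length")
    gold_n: Counter = Counter()
    pred_n: Counter = Counter()
    correct: Counter = Counter()
    for g, p in zip(gold, pred):
        gold_n[g[1]] += 1
        pred_n[p[1]] += 1
        if g == p:                      # head and type both correct
            correct[g[1]] += 1
    scores: Dict[str, float] = {}
    for label in set(gold_n) | set(pred_n):
        precision = correct[label] / pred_n[label] if pred_n[label] else 0.0
        recall = correct[label] / gold_n[label] if gold_n[label] else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores
```

A type that is over-predicted loses precision, and a type that is never predicted gets F = 0, which is one way a rare type can end up in the low-accuracy band.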
MaltParser • Software for inductive dependency parsing: • Freely available for research and education (http://www.msi.vxu.se/users/nivre/research/MaltParser.html) • Version 0.3: • Parsing algorithms: • Nivre (2003) (arc-eager, arc-standard) • Covington (2001) (projective, non-projective) • Learning algorithms: • MBL (TIMBL) • SVM (LIBSVM) • Feature models: • Arbitrary combinations of part-of-speech features, dependency type features and lexical features • Auxiliary tools: • MaltEval • MaltConverter • Proj
Possible Projects • CoNLL Shared Task: • Work on one or more languages • With or without MaltParser • Data sets available • Parsing spoken language: • Talbanken05: Swedish treebank with written and spoken data, cross-training experiments • GSLC: 1.2M corpus of spoken Swedish