Parsing with Soft and Hard Constraints on Dependency Length
Jason Eisner and Noah A. Smith
Department of Computer Science / Center for Language and Speech Processing
Johns Hopkins University
{jason,nasmith}@cs.jhu.edu
IWPT 2005 • J. Eisner & N. A. Smith • Parsing with Soft & Hard Constraints on Dependency Length
Premise
here at IWPT 2005: Burstein; Sagae & Lavie; Tsuruoka & Tsujii; Dzikovska and Rosé; ...
Many parsing consumers (IE, ASR, MT) will benefit more from fast, precise partial parsing than from full, deep parses that are slow to build.
Outline of the Talk
• The Short Dependency Preference
• Soft constraints:
  • Review of split bilexical grammars (SBGs): O(n³) algorithm
  • Modeling dependency length
  • Experiments
• Hard constraints:
  • Constraining dependency length in a parser: O(n) algorithm, same grammar constant as SBG
  • Experiments
Short-Dependency Preference
A word’s dependents (adjuncts, arguments) tend to fall near it in the string.
length of a dependency ≈ surface distance
[figure: an example sentence whose dependency arcs have lengths 3, 1, 1, 1]
50% of English dependencies have length 1, another 20% have length 2, 10% have length 3 ...
[chart: fraction of all dependencies vs. length]
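A distribution like the one charted above can be read directly off a dependency treebank. A minimal sketch; the `heads` encoding (word i's parent index, with 0 for the wall $) is an assumption of this illustration, not notation from the talk:

```python
from collections import Counter

def length_histogram(heads):
    """heads[i-1] = index of word i's parent (0 = the wall $).
    Counts |child position - parent position| for every non-wall arc."""
    lengths = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return Counter(lengths)

# Toy tree: word 2 is the root; words 1, 3, 4 all attach at distance 1.
print(length_histogram([2, 0, 2, 3]))  # → Counter({1: 3})
```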
Related Ideas
• Score parses based on what’s between a head and child (Collins, 1997; Zeman, 2004; McDonald et al., 2005)
• Assume short → faster human processing (Church, 1980; Gibson, 1998)
• “Attach low” heuristic for PPs (English) (Frazier, 1979; Hobbs and Bear, 1990)
• Obligatory and optional re-orderings (English) (see paper)
Split Bilexical Grammars (Eisner, 1996; 2000)
• Bilexical: capture relationships between two words using rules of the form
  X[p] → Y[p] Z[c]
  X[p] → Z[c] Y[p]
  X[w] → w
  (grammar size = N³|Σ|²)
• Split: left children conditionally independent of right children, given parent (equivalent to split HAGs; Eisner and Satta, 1999)
Generating with SBGs
• Start with the left wall $
• Generate the root w0
• Generate left children w-1, w-2, ..., w-ℓ from the FSA λw0
• Generate right children w1, w2, ..., wr from the FSA ρw0
• Recurse on each wi for i in {-ℓ, ..., -1, 1, ..., r}, sampling the subtree string αi (steps 2-4)
• Return α-ℓ ... α-1 w0 α1 ... αr
Naïve Recognition/Parsing
O(n⁵) combinations of indices; O(n⁵N³) if N nonterminals
[chart diagram: items indexed by head position p, child position c, and split points i, j, k over “It takes two to tango”; the goal item spans 0 to n]
Cubic Recognition/Parsing (Eisner & Satta, 1999)
A triangle is a head with some left (or right) subtrees.
One trapezoid per dependency.
[diagram over “It takes two to tango”]
Cubic Recognition/Parsing (Eisner & Satta, 1999)
• goal: O(n) combinations
• joining two triangles into a trapezoid: O(n³) combinations over (i, k, j)
• extending a triangle with a trapezoid: O(n³) combinations over (i, k, j)
• O(n³g²N) if N nonterminals, polysemy g
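As a concrete (hypothetical) rendering of the triangle/trapezoid chart, here is a minimal first-order Viterbi version of the O(n³) algorithm. It scores each arc h → m with an assumed matrix s[h][m] and tracks no FSA states or nonterminals, so the grammar constant is 1; it returns only the best score, not the tree:

```python
def eisner_score(s):
    """Best projective dependency tree score in O(n^3).
    s is (n+1)x(n+1); s[h][m] scores the arc h -> m, with index 0 the wall $."""
    n = len(s) - 1
    NEG = float("-inf")
    # Cr/Cl: complete "triangles" headed at the left/right end of the span.
    # Ir/Il: incomplete "trapezoids" carrying the arc i -> j / j -> i.
    Cr = [[NEG] * (n + 1) for _ in range(n + 1)]
    Cl = [[NEG] * (n + 1) for _ in range(n + 1)]
    Ir = [[NEG] * (n + 1) for _ in range(n + 1)]
    Il = [[NEG] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        Cr[i][i] = Cl[i][i] = 0.0
    for w in range(1, n + 1):            # span width
        for i in range(n + 1 - w):
            j = i + w
            # Two facing triangles join into a trapezoid (one per dependency).
            best = max(Cr[i][k] + Cl[k + 1][j] for k in range(i, j))
            Ir[i][j] = best + s[i][j]
            Il[i][j] = best + s[j][i]
            # A triangle absorbs an adjacent trapezoid to grow.
            Cl[i][j] = max(Cl[i][k] + Il[k][j] for k in range(i, j))
            Cr[i][j] = max(Ir[i][k] + Cr[k][j] for k in range(i + 1, j + 1))
    return Cr[0][n]
```

For example, with s = [[0, 1, 5], [0, 0, 3], [0, 2, 0]] the best tree is $ → 2 → 1, scoring 5 + 2 = 7.0.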
Implementation
• Augment items with (Viterbi) weights; order by weight.
• Agenda-based, best-first algorithm.
• We use Dyna [see the HLT-EMNLP paper] to implement all parsers here.
• Count the number of items built → a measure of runtime.
Very Simple Model for λw and ρw
p(child | first, parent, direction)      p(stop | first, parent, direction)
p(child | notfirst, parent, direction)   p(stop | notfirst, parent, direction)
*We parse POS tag sequences, not words.
[diagram: λtakes and ρtakes generating the dependents of takes in “It takes two to ...”]
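In code, each λw/ρw is a two-state automaton (first vs. notfirst child). A sketch under assumed toy probability tables; treating stop as a separate Bernoulli, with p(child | ...) normalized over child tags, is one common parameterization of such a model and is an assumption here:

```python
def child_seq_prob(parent, children, direction, p_child, p_stop):
    """Probability of a head's child-tag sequence in one direction under
    the two-state (first / notfirst) model."""
    prob, state = 1.0, "first"
    for c in children:
        prob *= (1.0 - p_stop[state, parent, direction])   # continue
        prob *= p_child[state, parent, direction].get(c, 0.0)
        state = "notfirst"
    return prob * p_stop[state, parent, direction]          # then stop

# Made-up tables for a verb taking noun dependents on its left.
p_stop = {("first", "V", "L"): 0.3, ("notfirst", "V", "L"): 0.6}
p_child = {("first", "V", "L"): {"N": 1.0}, ("notfirst", "V", "L"): {"N": 1.0}}
```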
Baseline
[results table not preserved]
Improvements
[diagram: candidate improvements around the 73%-accuracy baseline: smoothing/max ent; parse words, not tags; bigger FSAs/more nonterminals; LTAG, CCG, etc.; special NP-treatment, punctuation; train discriminatively; model dependency length?]
Modeling Dependency Length
p′ = p · p(3 | r, a, L) · p(2 | r, b, L) · p(1 | b, c, R) · p(1 | r, d, R) · p(1 | d, e, R) · p(1 | e, f, R)   (DEFICIENT)
*When running the parsing algorithm, just multiply in these probabilities at the appropriate time.
[diagram: example dependency tree over the word sequence a b c r d e f]
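Multiplying in the (deficient) length probabilities is just a per-arc product. A sketch with an assumed table p_len keyed by (parent tag, child tag, direction, length); the table values below are made up:

```python
def add_length_factors(p, arcs, p_len):
    """arcs: (parent_tag, child_tag, parent_pos, child_pos) tuples.
    Multiplies p by p(length | parent, child, direction) for each arc."""
    for h_tag, c_tag, h, m in arcs:
        d = "R" if m > h else "L"
        p *= p_len.get((h_tag, c_tag, d, abs(m - h)), 0.0)
    return p

# Two of the slide's arcs: r takes a at distance 3 on its left,
# b takes c at distance 1 on its right.
p_len = {("r", "a", "L", 3): 0.1, ("b", "c", "R", 1): 0.5}
```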
Modeling Dependency Length
[results table: each baseline model vs. the same model + length]
Conclusion (I)
Modeling dependency length can cut runtime of simple models by 26-37%, with effects ranging from -3% to +4% on recall. (The loss on recall is perhaps due to deficient/MLE estimation.)
Going to Extremes
Longer dependencies are less likely. What if we eliminate them completely?
Hard Constraints
Disallow dependencies between words at distance > b ...
Risk: the best parse may be contrived, or there may be no parse at all!
Solution: allow fragments (partial parsing; Hindle, 1990, inter alia). Why not model the sequence of fragments?
From SBG to Vine SBG
An SBG wall ($) has one child: L(λ$) = {ε}, L(ρ$) ⊆ Σ.
A vine SBG wall has a sequence of children: L(λ$) = {ε}, L(ρ$) ⊆ Σ+.
Building a Vine SBG Parser
• Grammar: generates a sequence of trees from $
• Parser: recognizes sequences of trees without long dependencies
• Need to modify the training data so the model is consistent with the parser.
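Making the training trees consistent with the parser means grafting: every dependency longer than b is cut, and the orphaned subtree root becomes a child of $, as in the figures that follow. A minimal sketch, assuming the heads-array encoding (0 = wall):

```python
def graft_to_vine(heads, b):
    """Cut every dependency longer than b; each orphaned subtree root is
    re-attached to the wall $ (head index 0)."""
    return [0 if h != 0 and abs(i - h) > b else h
            for i, h in enumerate(heads, start=1)]
```

With b = 0, every arc is cut and all words hang directly from $, which is what the last figure in the sequence shows.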
[figures: an example sentence from the Penn Treebank grafted onto the vine for b = ∞, 4, 3, 2, 1, 0; as b shrinks, each dependency longer than b is cut and its subtree root is re-attached to $, until at b = 0 every word hangs directly from the wall]
Observation
• Even for small b, “bunches” can grow to arbitrary size: [diagram]
• But arbitrary center-embedding is out: [diagram]
Vine SBG is Finite-State
Could compile into an FSA and get O(n) parsing!
Problem: what’s the grammar constant? EXPONENTIAL: the FSA state must track, e.g., that insider still has no parent and that cut, would, and $ can all take more children.
[FSA diagram over “According to some estimates , the rule changes would cut insider ...”]
Alternative
Instead, we adapt an SBG chart parser (which implicitly shares fragments of stack state) to the vine case, eliminating unnecessary work.
Quadratic Recognition/Parsing
• assembling parsed fragments along the vine: O(n²b) combinations
• only construct trapezoids (and triangles) such that k − i ≤ b
• O(n³) combinations → O(nb²)
[diagram: bounded-width chart items hanging from the vine]
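The speedup is easy to see by counting the (i, k, j) trapezoid combinations: unrestricted there are Θ(n³) of them, while requiring width j − i ≤ b leaves only O(nb²). A quick check (counting only, not parsing):

```python
def trapezoid_combinations(n, b=None):
    """Number of (i, k, j) combinations with 0 <= i <= k < j <= n,
    optionally restricted to spans of width j - i <= b."""
    return sum(j - i                      # j - i choices of split point k
               for i in range(n)
               for j in range(i + 1, n + 1)
               if b is None or j - i <= b)
```

For n = 40, the unrestricted count is 11480, but with b = 5 only 560 combinations remain.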
O(nb) vine construction
[diagram: the b = 4 vine over “According to some , the new changes would cut insider filings by more than a third .”; the fragment roots hang from $, and all chart items have width ≤ 4]
Parsing Algorithm
• Same grammar constant as Eisner and Satta (1999)
• O(n³) → O(nb²) runtime
• Includes some overhead (a low-order term) for constructing the vine
• Reality check ... is it worth it?
Results: Penn Treebank
*evaluation against the original ungrafted Treebank; non-punctuation only
[chart: accuracy vs. runtime as the bound varies from b = 1 to b = 20]
Results: Chinese Treebank
*evaluation against the original ungrafted Treebank; non-punctuation only
[chart: accuracy vs. runtime as the bound varies from b = 1 to b = 20]
Results: TIGER Corpus
*evaluation against the original ungrafted Treebank; non-punctuation only
[chart: accuracy vs. runtime as the bound varies from b = 1 to b = 20]
Type-Specific Bounds
• b can be specific to the dependency type: e.g., b(V-O) can be longer than b(S-V)
• b specific to ‹parent, child, direction›: gradually tighten based on training data
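A type-specific bound is just a lookup before an arc is proposed. A sketch with an assumed bounds table keyed by ‹parent, child, direction›; the tag names, numbers, and default bound below are illustrative, not values from the talk:

```python
def arc_allowed(parent, child, direction, dist, bounds, default_b):
    """Hard, type-specific length bound on a proposed dependency."""
    return dist <= bounds.get((parent, child, direction), default_b)

# Illustrative: a verb may take a right (object-like) dependent farther
# away than a left (subject-like) one.
bounds = {("V", "N", "R"): 7, ("V", "N", "L"): 3}
```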
• English: 50% runtime, no loss
• Chinese: 55% runtime, no loss
• German: 44% runtime, 2% loss
Related Work
Nederhof (2000) surveys finite-state approximation of context-free languages (CFG → FSA).
We limit all dependency lengths (not just center-embedding), and derive weights from the Treebank (not by approximation).
Chart parser → reasonable grammar constant.
Future Work
• apply to state-of-the-art parsing models
• better parameter estimation
• applications: MT, IE, grammar induction
Conclusion (II)
Dependency length can be a helpful feature in improving the speed and accuracy (or trading off between them) of simple parsing models that consider dependencies.
This Talk in a Nutshell
length of a dependency ≈ surface distance [figure: example with arc lengths 3, 1, 1, 1]
• Formal results: a hard bound b on dependency length
  • results in a regular language.
  • allows O(nb²) parsing.
• Empirical results (English, Chinese, German):
  • Hard constraints cut runtime in half or more with no accuracy loss (English, Chinese) or by 44% with -2.2% accuracy (German).
  • Soft constraints affect accuracy of simple models by -3% to +4% and cut runtime by 25% to 40%.