Parsing with Soft and Hard Constraints on Dependency Length
Jason Eisner and Noah A. Smith
Department of Computer Science / Center for Language and Speech Processing
Johns Hopkins University
{jason,nasmith}@cs.jhu.edu
IWPT 2005 • J. Eisner & N. A. Smith • Parsing with Soft & Hard Constraints on Dependency Length
Premise
here at IWPT 2005: Burstein; Sagae & Lavie; Tsuruoka & Tsujii; Dzikovska and Rosé; ...
Many parsing consumers (IE, ASR, MT) will benefit more from fast, precise partial parsing than from full, deep parses that are slow to build.
Outline of the Talk
• The Short Dependency Preference
• Soft constraints:
  • Review of split bilexical grammars (SBGs): O(n³) algorithm
  • Modeling dependency length
  • Experiments
• Hard constraints:
  • Constraining dependency length in a parser: O(n) algorithm, same grammar constant as SBG
  • Experiments
Short-Dependency Preference
A word’s dependents (adjuncts, arguments) tend to fall near it in the string.
length of a dependency ≈ surface distance
[figure: an example sentence whose dependency arcs have lengths 3, 1, 1, 1]
50% of English dependencies have length 1, another 20% have length 2, 10% have length 3 ...
[chart: fraction of all dependencies vs. length]
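A distribution like the one charted above can be read directly off a dependency treebank. A minimal sketch; the `heads` encoding (word i's parent index, with 0 for the wall $) is an assumption of this illustration, not notation from the talk:

```python
from collections import Counter

def length_histogram(heads):
    """heads[i-1] = index of word i's parent (0 = the wall $).
    Counts |child position - parent position| for every non-wall arc."""
    lengths = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return Counter(lengths)

# Toy tree: word 2 is the root; words 1, 3, 4 all attach at distance 1.
print(length_histogram([2, 0, 2, 3]))  # → Counter({1: 3})
```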
Related Ideas
• Score parses based on what’s between a head and child (Collins, 1997; Zeman, 2004; McDonald et al., 2005)
• Assume short → faster human processing (Church, 1980; Gibson, 1998)
• “Attach low” heuristic for PPs (English) (Frazier, 1979; Hobbs and Bear, 1990)
• Obligatory and optional re-orderings (English) (see paper)
Split Bilexical Grammars (Eisner, 1996; 2000)
• Bilexical: capture relationships between two words using rules of the form
  X[p] → Y[p] Z[c]
  X[p] → Z[c] Y[p]
  X[w] → w
  (grammar size = N³|Σ|²)
• Split: left children conditionally independent of right children, given parent (equivalent to split HAGs; Eisner and Satta, 1999)
Generating with SBGs
• Start with the left wall $
• Generate the root w0
• Generate left children w-1, w-2, ..., w-ℓ from the FSA λw0
• Generate right children w1, w2, ..., wr from the FSA ρw0
• Recurse on each wi for i in {-ℓ, ..., -1, 1, ..., r}, sampling the subtree string αi (steps 2-4)
• Return α-ℓ ... α-1 w0 α1 ... αr
Naïve Recognition/Parsing
O(n⁵) combinations of indices; O(n⁵N³) if N nonterminals
[chart diagram: items indexed by head position p, child position c, and split points i, j, k over “It takes two to tango”; the goal item spans 0 to n]
Cubic Recognition/Parsing (Eisner & Satta, 1999)
A triangle is a head with some left (or right) subtrees.
One trapezoid per dependency.
[diagram over “It takes two to tango”]
Cubic Recognition/Parsing (Eisner & Satta, 1999)
• goal: O(n) combinations
• joining two triangles into a trapezoid: O(n³) combinations over (i, k, j)
• extending a triangle with a trapezoid: O(n³) combinations over (i, k, j)
• O(n³g²N) if N nonterminals, polysemy g
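As a concrete (hypothetical) rendering of the triangle/trapezoid chart, here is a minimal first-order Viterbi version of the O(n³) algorithm. It scores each arc h → m with an assumed matrix s[h][m] and tracks no FSA states or nonterminals, so the grammar constant is 1; it returns only the best score, not the tree:

```python
def eisner_score(s):
    """Best projective dependency tree score in O(n^3).
    s is (n+1)x(n+1); s[h][m] scores the arc h -> m, with index 0 the wall $."""
    n = len(s) - 1
    NEG = float("-inf")
    # Cr/Cl: complete "triangles" headed at the left/right end of the span.
    # Ir/Il: incomplete "trapezoids" carrying the arc i -> j / j -> i.
    Cr = [[NEG] * (n + 1) for _ in range(n + 1)]
    Cl = [[NEG] * (n + 1) for _ in range(n + 1)]
    Ir = [[NEG] * (n + 1) for _ in range(n + 1)]
    Il = [[NEG] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        Cr[i][i] = Cl[i][i] = 0.0
    for w in range(1, n + 1):            # span width
        for i in range(n + 1 - w):
            j = i + w
            # Two facing triangles join into a trapezoid (one per dependency).
            best = max(Cr[i][k] + Cl[k + 1][j] for k in range(i, j))
            Ir[i][j] = best + s[i][j]
            Il[i][j] = best + s[j][i]
            # A triangle absorbs an adjacent trapezoid to grow.
            Cl[i][j] = max(Cl[i][k] + Il[k][j] for k in range(i, j))
            Cr[i][j] = max(Ir[i][k] + Cr[k][j] for k in range(i + 1, j + 1))
    return Cr[0][n]
```

For example, with s = [[0, 1, 5], [0, 0, 3], [0, 2, 0]] the best tree is $ → 2 → 1, scoring 5 + 2 = 7.0.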
Implementation
• Augment items with (Viterbi) weights; order by weight.
• Agenda-based, best-first algorithm.
• We use Dyna [see the HLT-EMNLP paper] to implement all parsers here.
• Count the number of items built → a measure of runtime.
Very Simple Model for λw and ρw
p(child | first, parent, direction)      p(stop | first, parent, direction)
p(child | notfirst, parent, direction)   p(stop | notfirst, parent, direction)
*We parse POS tag sequences, not words.
[diagram: λtakes and ρtakes generating the dependents of takes in “It takes two to ...”]
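In code, each λw/ρw is a two-state automaton (first vs. notfirst child). A sketch under assumed toy probability tables; treating stop as a separate Bernoulli, with p(child | ...) normalized over child tags, is one common parameterization of such a model and is an assumption here:

```python
def child_seq_prob(parent, children, direction, p_child, p_stop):
    """Probability of a head's child-tag sequence in one direction under
    the two-state (first / notfirst) model."""
    prob, state = 1.0, "first"
    for c in children:
        prob *= (1.0 - p_stop[state, parent, direction])   # continue
        prob *= p_child[state, parent, direction].get(c, 0.0)
        state = "notfirst"
    return prob * p_stop[state, parent, direction]          # then stop

# Made-up tables for a verb taking noun dependents on its left.
p_stop = {("first", "V", "L"): 0.3, ("notfirst", "V", "L"): 0.6}
p_child = {("first", "V", "L"): {"N": 1.0}, ("notfirst", "V", "L"): {"N": 1.0}}
```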
Baseline
[results table not preserved]
Improvements
[diagram: candidate improvements around the 73%-accuracy baseline: smoothing/max ent; parse words, not tags; bigger FSAs/more nonterminals; LTAG, CCG, etc.; special NP-treatment, punctuation; train discriminatively; model dependency length?]
Modeling Dependency Length
p′ = p · p(3 | r, a, L) · p(2 | r, b, L) · p(1 | b, c, R) · p(1 | r, d, R) · p(1 | d, e, R) · p(1 | e, f, R)   (DEFICIENT)
*When running the parsing algorithm, just multiply in these probabilities at the appropriate time.
[diagram: example dependency tree over the word sequence a b c r d e f]
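Multiplying in the (deficient) length probabilities is just a per-arc product. A sketch with an assumed table p_len keyed by (parent tag, child tag, direction, length); the table values below are made up:

```python
def add_length_factors(p, arcs, p_len):
    """arcs: (parent_tag, child_tag, parent_pos, child_pos) tuples.
    Multiplies p by p(length | parent, child, direction) for each arc."""
    for h_tag, c_tag, h, m in arcs:
        d = "R" if m > h else "L"
        p *= p_len.get((h_tag, c_tag, d, abs(m - h)), 0.0)
    return p

# Two of the slide's arcs: r takes a at distance 3 on its left,
# b takes c at distance 1 on its right.
p_len = {("r", "a", "L", 3): 0.1, ("b", "c", "R", 1): 0.5}
```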
Modeling Dependency Length
[results table: each baseline model vs. the same model + length]
Conclusion (I)
Modeling dependency length can cut runtime of simple models by 26-37%, with effects ranging from -3% to +4% on recall. (The loss on recall is perhaps due to deficient/MLE estimation.)
Going to Extremes
Longer dependencies are less likely. What if we eliminate them completely?
Hard Constraints
Disallow dependencies between words at distance > b ...
Risk: the best parse may be contrived, or there may be no parse at all!
Solution: allow fragments (partial parsing; Hindle, 1990, inter alia). Why not model the sequence of fragments?
From SBG to Vine SBG
An SBG wall ($) has one child: L(λ$) = {ε}, L(ρ$) ⊆ Σ.
A vine SBG wall has a sequence of children: L(λ$) = {ε}, L(ρ$) ⊆ Σ+.
Building a Vine SBG Parser
• Grammar: generates a sequence of trees from $
• Parser: recognizes sequences of trees without long dependencies
• Need to modify the training data so the model is consistent with the parser.
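Making the training trees consistent with the parser means grafting: every dependency longer than b is cut, and the orphaned subtree root becomes a child of $, as in the figures that follow. A minimal sketch, assuming the heads-array encoding (0 = wall):

```python
def graft_to_vine(heads, b):
    """Cut every dependency longer than b; each orphaned subtree root is
    re-attached to the wall $ (head index 0)."""
    return [0 if h != 0 and abs(i - h) > b else h
            for i, h in enumerate(heads, start=1)]
```

With b = 0, every arc is cut and all words hang directly from $, which is what the last figure in the sequence shows.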
[figures: an example sentence from the Penn Treebank grafted onto the vine for b = ∞, 4, 3, 2, 1, 0; as b shrinks, each dependency longer than b is cut and its subtree root is re-attached to $, until at b = 0 every word hangs directly from the wall]
Observation
• Even for small b, “bunches” can grow to arbitrary size: [diagram]
• But arbitrary center-embedding is out: [diagram]
Vine SBG is Finite-State
Could compile into an FSA and get O(n) parsing!
Problem: what’s the grammar constant? EXPONENTIAL: the FSA state must track, e.g., that insider still has no parent and that cut, would, and $ can all take more children.
[FSA diagram over “According to some estimates , the rule changes would cut insider ...”]
Alternative
Instead, we adapt an SBG chart parser (which implicitly shares fragments of stack state) to the vine case, eliminating unnecessary work.
Quadratic Recognition/Parsing
• assembling parsed fragments along the vine: O(n²b) combinations
• only construct trapezoids (and triangles) such that k − i ≤ b
• O(n³) combinations → O(nb²)
[diagram: bounded-width chart items hanging from the vine]
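The speedup is easy to see by counting the (i, k, j) trapezoid combinations: unrestricted there are Θ(n³) of them, while requiring width j − i ≤ b leaves only O(nb²). A quick check (counting only, not parsing):

```python
def trapezoid_combinations(n, b=None):
    """Number of (i, k, j) combinations with 0 <= i <= k < j <= n,
    optionally restricted to spans of width j - i <= b."""
    return sum(j - i                      # j - i choices of split point k
               for i in range(n)
               for j in range(i + 1, n + 1)
               if b is None or j - i <= b)
```

For n = 40, the unrestricted count is 11480, but with b = 5 only 560 combinations remain.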
O(nb) vine construction
[diagram: the b = 4 vine over “According to some , the new changes would cut insider filings by more than a third .”; the fragment roots hang from $, and all chart items have width ≤ 4]
Parsing Algorithm
• Same grammar constant as Eisner and Satta (1999)
• O(n³) → O(nb²) runtime
• Includes some overhead (a low-order term) for constructing the vine
• Reality check ... is it worth it?
Results: Penn Treebank
*evaluation against the original ungrafted Treebank; non-punctuation only
[chart: accuracy vs. runtime as the bound varies from b = 1 to b = 20]
Results: Chinese Treebank
*evaluation against the original ungrafted Treebank; non-punctuation only
[chart: accuracy vs. runtime as the bound varies from b = 1 to b = 20]
Results: TIGER Corpus
*evaluation against the original ungrafted Treebank; non-punctuation only
[chart: accuracy vs. runtime as the bound varies from b = 1 to b = 20]
Type-Specific Bounds
• b can be specific to the dependency type: e.g., b(V-O) can be longer than b(S-V)
• b specific to ‹parent, child, direction›: gradually tighten based on training data
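A type-specific bound is just a lookup before an arc is proposed. A sketch with an assumed bounds table keyed by ‹parent, child, direction›; the tag names, numbers, and default bound below are illustrative, not values from the talk:

```python
def arc_allowed(parent, child, direction, dist, bounds, default_b):
    """Hard, type-specific length bound on a proposed dependency."""
    return dist <= bounds.get((parent, child, direction), default_b)

# Illustrative: a verb may take a right (object-like) dependent farther
# away than a left (subject-like) one.
bounds = {("V", "N", "R"): 7, ("V", "N", "L"): 3}
```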
• English: 50% runtime, no loss
• Chinese: 55% runtime, no loss
• German: 44% runtime, 2% loss
Related Work
Nederhof (2000) surveys finite-state approximation of context-free languages (CFG → FSA).
We limit all dependency lengths (not just center-embedding), and derive weights from the Treebank (not by approximation).
Chart parser → reasonable grammar constant.
Future Work
• apply to state-of-the-art parsing models
• better parameter estimation
• applications: MT, IE, grammar induction
Conclusion (II)
Dependency length can be a helpful feature in improving the speed and accuracy (or trading off between them) of simple parsing models that consider dependencies.
This Talk in a Nutshell
length of a dependency ≈ surface distance [figure: example with arc lengths 3, 1, 1, 1]
• Formal results: a hard bound b on dependency length
  • results in a regular language.
  • allows O(nb²) parsing.
• Empirical results (English, Chinese, German):
  • Hard constraints cut runtime in half or more with no accuracy loss (English, Chinese) or by 44% with -2.2% accuracy (German).
  • Soft constraints affect accuracy of simple models by -3% to +4% and cut runtime by 25% to 40%.