This paper presents a fast finite-state relaxation method for enforcing global constraints on sequence decoding, with applications such as label structure in semantic role labeling and agreement in named entity recognition. The approach exploits the quality of the local models: it decodes with the local model first and dynamically applies only those global constraints that the current best path violates. In the reported experiments, it decodes faster than an ILP-based approach.
A Fast Finite-state Relaxation Method for Enforcing Global Constraints on Sequence Decoding Roy Tromble & Jason Eisner Johns Hopkins University
Agreement: • Named Entity Recognition (Finkel et al., ACL 2005) • Seminar announcements (Finkel et al., ACL 2005) Label structure: • Bibliography parsing (Peng & McCallum, HLT-NAACL 2004) • Semantic Role Labeling (Roth & Yih, ICML 2005) • One role per string • One string per role We know what the labels should look like! Example announcement: Seminar – Friday, April 1 Speaker: Monty Hall Location: Auditorium #1 “Let’s Make a Dilemma” Monty Hall will host a discussion of his famous paradox.
Finite-state constraint relaxation [Diagram: local models and global constraints, positioned by sequence modeling quality and decoding runtime] Exploit the quality of the local models!
Semantic Role Labeling • Label each argument to a verb • Six core argument types (A0-A5) • CoNLL-2004 shared task • Penn Treebank section 20 • 4305 propositions • Follow Roth & Yih (ICML 2005) Example: Sales for the quarter rose to $1.63 billion from $1.47 billion. Argument labels: A1 A4 A3 Label sequence: A1 A1 A1 O O A4 O A3 O
Roth & Yih’s constraints as FSAs [^A0]*A0*[^A0]* [^A1]*A1*[^A1]* NO DUPLICATE ARGUMENTS Each argument type (A0, A1, ...) can label at most one sub-sequence of the input.
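A minimal sketch of checking NO DUPLICATE ARGUMENTS with an ordinary regular expression, assuming each label is first mapped to a single character so that the pattern `[^A0]*A0*[^A0]*` becomes `[^0]*0*[^0]*`. The helper names (`encode`, `no_duplicate`, `violated_arguments`) and the encoding are illustrative assumptions, not part of the original system.

```python
import re

CORE = ["A0", "A1", "A2", "A3", "A4", "A5"]
# Hypothetical encoding: one character per label, so plain regexes apply.
SYMBOL = {"O": "O", **{label: label[1] for label in CORE}}


def encode(labels):
    """Turn a label sequence into a string with one character per label."""
    return "".join(SYMBOL[label] for label in labels)


def no_duplicate(labels, arg):
    """True if `arg` labels at most one contiguous sub-sequence."""
    s = SYMBOL[arg]
    return re.fullmatch(f"[^{s}]*{s}*[^{s}]*", encode(labels)) is not None


def violated_arguments(labels):
    """Core argument types whose no-duplicate constraint is violated."""
    return [arg for arg in CORE if not no_duplicate(labels, arg)]


print(violated_arguments(["A1", "A1", "A1", "O", "O", "A4", "O", "A3", "O"]))
print(violated_arguments(["A1", "O", "A1", "A4"]))
```

The first sequence satisfies the constraint for every argument type; in the second, A1 labels two separate sub-sequences, so its constraint is reported as violated.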
Roth & Yih’s constraints as FSAs • Regular expressions on any sequences: • grep for sequence models O*[^O]?* AT LEAST ONE ARGUMENT The label sequence must contain at least one instance that is not O.
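The "grep for sequence models" idea can be tried directly with Python's `re` module. This sketch assumes that `?` in the slide's pattern is a wildcard matching any single label, so `O*[^O]?*` is written as `O*[^O].*` in Python syntax; the function name `has_argument` and the one-character encoding are hypothetical.

```python
import re


def has_argument(labels):
    """True if the label sequence contains at least one non-O label."""
    # Hypothetical encoding: "O" stays "O", every argument label becomes "X".
    encoded = "".join("O" if label == "O" else "X" for label in labels)
    # Python spelling of O*[^O]?* under the wildcard assumption above.
    return re.fullmatch(r"O*[^O].*", encoded) is not None


print(has_argument(["O", "A1", "O"]))
print(has_argument(["O", "O", "O"]))
```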
Roth & Yih’s constraints as FSAs DISALLOW ARGUMENTS Only allow argument types that are compatible with the proposition’s verb.
Roth & Yih’s constraints as FSAs KNOWN VERB POSITION The proposition’s verb must be labeled O.
Roth & Yih’s constraints as FSAs Any constraints on bounded-length sequences ARGUMENT CANDIDATES Certain sub-sequences must receive a single label.
Roth & Yih’s local model as a lattice • Unigram model! • “Soft constraints” or “features”
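Because a unigram model scores each position independently, the best unconstrained path through its lattice is just the highest-scoring label at each position. The sketch below illustrates this with a lattice stored as a list of per-position score dictionaries; the scores, the `toy_lattice` name, and the function name are made up for illustration.

```python
def best_unconstrained(lattice):
    """Best path through a unigram lattice: pick the top label per position."""
    labels = [max(scores, key=scores.get) for scores in lattice]
    total = sum(scores[label] for scores, label in zip(lattice, labels))
    return labels, total


# Hypothetical log-scores for a four-position input.
toy_lattice = [
    {"A1": -0.1, "A0": -2.5, "O": -3.0},
    {"O": -0.2, "A1": -2.0, "A0": -3.0},
    {"A1": -0.3, "A0": -0.9, "O": -1.5},
    {"O": -0.1, "A0": -2.5, "A1": -3.0},
]

labels, total = best_unconstrained(toy_lattice)
print(labels, total)
```

On this toy input the local model alone picks A1 at two separate positions, which is exactly the kind of output the global constraints are meant to rule out.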
A brute-force FSA decoder [Diagram: the sentence’s local model is intersected with the global constraints, then decoded to produce the labeling]
Satisfying global constraints is NP-hard. Any approach would blow up in the worst case! NO DUPLICATE ARGUMENTS
Handling an NP-hard problem Roth & Yih (ICML 2005): • Express path decoding and global constraints as an integer linear program (ILP). • Apply ILP solver: • Relax ILP to (real-valued) LP. • Apply polynomial-time LP solver. • Branch and bound to find optimal integer solution.
The ILP solver doesn’t know it’s labeling sequences Path constraints: State 0: outflow ≤ 1; State 3: inflow ≤ 1 States 1 & 2: outflow = inflow At least one argument: Arcs labeled O: flow ≤ 1
Finite-state constraint relaxation • Local models already capture much structure. • Relax the constraints instead! • Find best path using linear decoding algorithm. • Apply only those global constraints that path violates.
Brute-force algorithm [Diagram: sentence → local model → intersect with all global constraints → decode → labeling]
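Constraint relaxation algorithm [Diagram: sentence → local model → decode → labeling; test the labeling against global constraints C1, C2, C3. If none is violated (no), the labeling is optimal; otherwise (yes), intersect the violated constraints and decode again. Some constraints are never intersected.]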
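The constraint relaxation loop can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the authors' implementation: the `DFA` tuple, `decode`, and `relax` are hypothetical names, the scores are invented, a simple product construction stands in for a real finite-state toolkit, and adding one violated constraint per iteration is only one possible policy.

```python
from collections import namedtuple

# Hypothetical DFA: step(state, label) returns the next state, or None if dead.
DFA = namedtuple("DFA", "name start step accept")


def no_duplicate(arg):
    """States: 0 = arg not seen, 1 = inside its run, 2 = run has ended."""
    def step(q, label):
        if label == arg:
            return None if q == 2 else 1
        return 0 if q == 0 else 2
    return DFA(f"no-dup-{arg}", 0, step, lambda q: True)


AT_LEAST_ONE = DFA("at-least-one", 0,
                   lambda q, label: 1 if q == 1 or label != "O" else 0,
                   lambda q: q == 1)


def accepts(dfa, labels):
    q = dfa.start
    for label in labels:
        q = dfa.step(q, label)
        if q is None:
            return False
    return dfa.accept(q)


def decode(lattice, active):
    """Best path through the lattice intersected with the active DFAs."""
    beams = {tuple(c.start for c in active): (0.0, [])}
    for scores in lattice:
        nxt = {}
        for states, (total, labels) in beams.items():
            for label, score in scores.items():
                new = tuple(c.step(q, label) for c, q in zip(active, states))
                if None in new:
                    continue  # some active constraint rejects this prefix
                if new not in nxt or total + score > nxt[new][0]:
                    nxt[new] = (total + score, labels + [label])
        beams = nxt
    finals = [path for states, path in beams.items()
              if all(c.accept(q) for c, q in zip(active, states))]
    return max(finals, key=lambda path: path[0]) if finals else None


def relax(lattice, constraints):
    """Decode, test all constraints, intersect a violated one, repeat."""
    active = []
    while True:
        best = decode(lattice, active)
        if best is None:
            return None, active  # no labeling satisfies the active constraints
        violated = [c for c in constraints
                    if c not in active and not accepts(c, best[1])]
        if not violated:
            return best, active  # optimal under all constraints
        active.append(violated[0])  # assumption: add one constraint at a time


toy_lattice = [
    {"A1": -0.1, "A0": -2.5, "O": -3.0},
    {"O": -0.2, "A1": -2.0, "A0": -3.0},
    {"A1": -0.3, "A0": -0.9, "O": -1.5},
    {"O": -0.1, "A0": -2.5, "A1": -3.0},
]
constraints = [no_duplicate("A0"), no_duplicate("A1"), AT_LEAST_ONE]
(best_score, best_labels), used = relax(toy_lattice, constraints)
print(best_labels, [c.name for c in used])
```

On this toy input only the A1 no-duplicate constraint is ever intersected; the other two are satisfied by the relaxed solution and never enlarge the lattice.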
Why? Finite-state constraint relaxation is faster than the ILP solver • State-of-the-art implementations: • Xpress-MP for ILP, • FSA (Kanthak & Ney, ACL 2004) for constraint relaxation.
Many take one iteration even though two constraints were violated. No sentences required more than a few iterations
Buy one, get one free Sales for the quarter rose to $1.63 billion from $1.47 billion. A1 A1 A4 A3
Lattices remained small [Chart: arcs at each iteration compared with arcs in the brute-force lattice, for examples that required 5 intersections]
Take-home message • Global constraints aren’t usually doing that much work for you: • Typical examples violate only a small number using local models. • They shouldn’t have to slow you down so much, even though they’re NP-hard in the worst case: • Figure out dynamically which ones need to be applied.
Future work • General soft constraints (We discuss binary soft constraints in the paper.) • Choose order to test and apply constraints, e.g. by reinforcement learning. • k-best decoding
Thanks • to Scott Yih for providing both data and runtime, and • to Stephan Kanthak for FSA.