Efficient Inference on Sequence Segmentation Models
Sunita Sarawagi, IIT Bombay, sunita@iitb.ac.in
Sequence segmentation models
• Flexible and accurate models for many applications
  • Speech segmentation into phonemes
  • Syntactic chunking
  • Protein/gene finding
  • Information extraction with entity-level features (examples below, sketched in code after this slide):
    • Whole-entity match with a database of entities
    • Length of entity between 3 and 8 words
    • Third or fourth token of the entity is a "-"
    • Last three tokens are digits
(Speech example from Keshet et al., NIPS '05 workshop)
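To make "entity-level feature" concrete, here is a minimal sketch of the example features listed above, assuming a segment is given as inclusive token indices (start, end) over a token list; the function name, the `author_lexicon` set, and the feature names are illustrative, not the paper's actual feature set.

```python
# Illustrative entity-level feature functions over a whole segment.
# A segment is (start, end, label) with inclusive token indices.
# `author_lexicon` is a hypothetical set of known entity strings.

author_lexicon = {"r. fagin", "j. ullman"}

def entity_features(tokens, start, end, label):
    """Return a dict of entity-level features for tokens[start..end]."""
    entity = tokens[start:end + 1]
    feats = {}
    # Whole-entity match against a database/lexicon of entities.
    feats["whole_entity_in_lexicon"] = " ".join(entity).lower() in author_lexicon
    # Entity length between 3 and 8 words.
    feats["length_3_to_8"] = 3 <= len(entity) <= 8
    # Third or fourth token of the entity is a "-".
    feats["dash_at_3_or_4"] = any(t == "-" for t in entity[2:4])
    # Last three tokens are digits.
    feats["last_3_digits"] = len(entity) >= 3 and all(t.isdigit() for t in entity[-3:])
    return feats
```

Such features need the whole segment at once; a plain sequence model that labels one token at a time cannot express them directly, which is what motivates segmentation models.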
Sequence vs. Segmentation Models
[Figure: a sequence model assigns a label y to each position t, so features describe a single word such as "Fagin"; a segmentation model assigns a label y to a whole segment (l, u), so features describe the full entity, e.g. similarity to the author column in a database.]
Segmentation models
• Input: sequence x = x1, x2, ..., xn, label set Y
• Output: segmentation S = s1, s2, ..., sp
  • sj = (start position, end position, label) = (tj, uj, yj)
• Score (a scoring sketch follows below): F(x, S) = Σj [ A(tj, yj-1, yj) + B(tj, uj, yj) ]
  • Transition potentials A(i, y', y): segment starting at i has label y and the previous label is y'
  • Segment potentials B(i', i, y): segment starting at i', ending at i, with label y
    • All positions from i' to i get the same label
• Inference
  • Most likely segmentation (max-margin trainers)
  • Marginals around segments (likelihood-based and exponentiated-gradient trainers)
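As a concrete reading of the score F(x, S), here is a minimal sketch assuming generic `trans_potential` and `seg_potential` callables for the transition and segment potentials; the names are illustrative.

```python
def segmentation_score(segments, trans_potential, seg_potential):
    """Score F(x, S) of a segmentation S = [(t, u, y), ...], t and u inclusive.

    trans_potential(t, y_prev, y): transition score for a segment starting at t
    seg_potential(t, u, y): score of the segment spanning positions t..u with label y
    """
    score = 0.0
    y_prev = None  # dummy "start" label before the first segment
    for (t, u, y) in segments:
        score += trans_potential(t, y_prev, y) + seg_potential(t, u, y)
        y_prev = y
    return score
```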
Inference: marginal for a segment
• Forward messages (L = max segment length), sketched below:
  α_i(y) = Σ_{d=1..L} Σ_{y'} α_{i-d}(y') · exp( A(i-d+1, y', y) + B(i-d+1, i, y) )
• Cost: O(nL²)
• Matrix notation: for L = n the recurrence can be written with matrix-vector products
• Segment marginal for s = (i', i, y):
  μ(i', i, y) ∝ Σ_{y'} α_{i'-1}(y') · exp( A(i', y', y) + B(i', i, y) ) · β_i(y)
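A minimal sketch of the unoptimized forward pass, assuming log-potentials `trans` and `seg` as in the score F(x, S) above; positions are 1-indexed and, for brevity, the computation is done in probability space rather than log space (a real implementation would work in log space for numerical stability).

```python
import math

def semicrf_forward(n, labels, L, trans, seg):
    """Forward messages for a segmentation (semi-Markov) model.

    alpha[i][y] = sum of exp-scores of all segmentations of x[1..i]
    whose last segment ends at position i with label y.
    trans(t, y_prev, y) and seg(t, u, y) are log-potentials; y_prev is None
    before the first segment.
    """
    alpha = [{y: 0.0 for y in labels} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            total = 0.0
            for d in range(1, min(L, i) + 1):
                t = i - d + 1                      # segment start
                if t == 1:                         # first segment of the sequence
                    total += math.exp(trans(t, None, y) + seg(t, i, y))
                else:
                    for y_prev in labels:
                        total += alpha[t - 1][y_prev] * math.exp(
                            trans(t, y_prev, y) + seg(t, i, y))
            alpha[i][y] = total
    Z = sum(alpha[n].values())                     # partition function
    return alpha, Z
```

Segment marginals additionally need the analogous backward messages β; the point of the rest of the talk is to remove the dependence on the hard limit L in this recursion.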
Goal
• Speed up segmentation models
  • Currently 3 to 8 times slower than sequence models
  • Eliminate L, the hard limit on segment length
• Efficiently handle a mix of potentials spanning varying numbers of tokens
  • Pay the penalty of segmentation models only for the longer entity-level features, not for all of them
• Empirical results on extraction tasks:
  • Segmentation models with a few entity features: higher accuracy at the same cost as sequence models
Succinct potentials
• Key insight
  • Compactly represent features on overlapping segments
• Main challenge
  • Inference algorithms on compact potentials whose cost is independent of the number of segments a potential applies to
• Four kinds of potentials
Applications with mixed potentials
• Named entity recognition
• Speech segmentation into phonemes
Efficient inference: forward pass
• Sharing computation
• Split each segment's potentials into parts:
  • a part common to all segments ending after i-1
  • a part common to all segments starting before i-m
  • a remaining part that differs per segment
• m = maximum gap between the boundary of any succinct potential and the segment boundary
Optimized forward pass
• Two sets of modified forward messages (the sharing idea is sketched below)
• Cost: O(nm²)
• Similarly, two sets of backward messages
• Same strategy for max-product inference
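The idea of sharing work across overlapping segments can be illustrated with a simpler special case: if part of a segment potential decomposes over tokens, that part can be precomputed as prefix sums so every candidate segment pays O(1) for it, and only the genuinely entity-level features pay per-segment cost. This is a hedged illustration of the sharing idea, not the paper's exact decomposition; `token_potential` and `entity_potential` are assumed callables.

```python
def shared_segment_potentials(n, labels, token_potential, entity_potential):
    """Build B(t, u, y) = sum_{k=t..u} token_potential(k, y) + entity_potential(t, u, y),
    with the token-level part costing O(1) per segment via prefix sums."""
    # prefix[y][i] = sum of token potentials for positions 1..i under label y
    prefix = {y: [0.0] * (n + 1) for y in labels}
    for y in labels:
        for k in range(1, n + 1):
            prefix[y][k] = prefix[y][k - 1] + token_potential(k, y)

    def seg(t, u, y):
        token_part = prefix[y][u] - prefix[y][t - 1]   # shared across segments, O(1)
        return token_part + entity_potential(t, u, y)  # entity-level features only
    return seg
```

Plugging such a `seg` into the forward pass above means the per-segment work is driven by the (few, short) entity-level features rather than by the full segment length.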
Marginals around potentials
• Direct computation of marginals is O(n²)
• Reduced to O(1) by two tricks:
  • Decomposing potentials into the two parts (a) and (b) above
  • Sharing computations across adjacent potentials (a bit more tricky)
Complexity and data structures
• Complexity of computing marginals
  • Optimized: O(nm + H); original: O(nL + G)
  • H = number of features in succinct form
  • G = O(L²H) (on real data, |G| is 5 to 10 times |H|)
• Achieved via incremental computation of q (sketched below)
  • Special data structure for storing the succinct potentials, so q_{i':i} is computed in O(1) time from q_{i':i-1}
  • Marginals computed in sorted order: increasing start boundary, decreasing end boundary
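A heavily hedged illustration of the incremental-q idea: the definition of q used here (sum of the values of succinct potentials contained in the segment [start, i]) is an assumption for illustration, not necessarily the paper's quantity. The point is the update pattern: indexing succinct potentials by their end boundary lets q(start, i) be obtained from q(start, i-1) by touching only the potentials that end exactly at i, so each potential is processed once as the end boundary sweeps forward.

```python
from collections import defaultdict

def marginal_sums_incremental(n, start, potentials):
    """Compute q(start, i) for i = start..n incrementally.

    `potentials` is a list of (t, u, value) succinct potentials; q(start, i)
    is taken to be the sum of values of potentials contained in [start, i].
    """
    by_end = defaultdict(list)
    for (t, u, v) in potentials:
        by_end[u].append((t, v))

    q = 0.0
    out = {}
    for i in range(start, n + 1):
        for (t, v) in by_end[i]:      # only potentials ending exactly at i
            if t >= start:            # contained in [start, i]
                q += v
        out[i] = q                    # q(start, i), updated in O(1) amortized
    return out
```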
Empirical evaluation
• Tasks:
  • Citations: Cora articles (L = 20)
  • Address: Indian addresses (L = 7)
• Features:
  • Token-level: orthographic properties / lexicon match of words at the start, end, middle, left, and right of a segment
  • Entity-level: TF-IDF match with a lexicon, entity length
• Methods:
  • Sequence-BCEU: sequence model with Begin-Continue-End-Unique labels
  • Segment: original un-optimized algorithm
  • Segment-Opt: optimized inference with compact potentials
Limit on segment length (L)
• L (hard limit on segment length)
  • Too small: reduced accuracy (from 90 to 81)
  • Too large: increased running time (from 30 minutes to 1 hour)
• m (maximum span of entity-level features)
  • Reduced by half: accuracy still 3% higher than Sequence
  • Too large: running time increases by only 30%
Concluding remarks
• Segmentation models: natural, flexible, accurate
  • Main limitation: inference is expensive
• Solved via a compact design of shared potentials
  • New efficient inference algorithms
  • Pay the penalty of entity-level features only when needed
  • Running time comparable to sequence models
  • No hard limit on segment length
• Future work:
  • Features that are functions of distance from a boundary
  • Other models: 2-D segmentation?
Code: http://crf.sourceforge.net