
Inference and Learning via Integer Linear Programming



  1. Inference and Learning via Integer Linear Programming Vasin, Dan, Scott, Dav

  2. Outline • Problem Definition • Integer Linear Programming (ILP) • Its generality • Learning and Inference via ILP • Experiments • Extension to hierarchical learning • Future Direction • Hidden Variables

  3. Problem Definition • X = (X1,...,Xk) ∈ X1 × ... × Xk = X • Y = (Y1,...,Yl) ∈ Y1 × ... × Yl = Y • Given X = x, find Y = y • Notation agreements • Capital letters denote variables • Lowercase letters denote values • Bold indicates vectors or matrices • X, Y denote sets

  4. Example (Text Chunking) y = NP ADJP VP ADVP VP x = The guy presenting now is so tired

  5. Classifiers • A classifier • h: X × Y^(l-1) × Y × {1,..,l} → R • Example • score(x, y-3, NP, 3) = 0.3 • score(x, y-3, VP, 3) = 0.5 • score(x, y-3, ADVP, 3) = 0.2 • score(x, y-3, ADJP, 3) = 1.2 • score(x, y-3, NULL, 3) = 0.1

  6. Inference • Goal: x → y • Given • x input • score(x, y-t, y, t) for all (y-t, y) ∈ Y^l, t ∈ {1,..,l} • C a set of constraints over Y • Find y that • maximizes the global function • score(x,y) = ∑t score(x, y-t, yt, t) • satisfies the constraints C

  7. Integer Linear Programming • Boolean variables: U = (U1,...,Ud) ∈ {0,1}^d • Cost vector: p = (p1,…,pd) ∈ R^d • Cost function: p·U • Constraint matrix: c ∈ R^(e×d) • Maximize p·U • Subject to c·U ≥ 0 (c·U = 0, c·U ≥ 3 also possible)

  8. ILP (Example) • U = (U1,U2,U3) • p = (0.3, 0.5, 0.8) • c = [ 1 2 3 ; -1 -2 2 ; 0 -3 2 ] (one constraint per row) • Maximize p·U • Subject to c·U ≥ 0
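
This toy instance is small enough to check exhaustively. A minimal sketch that solves it by brute force over all 2^3 assignments (a real ILP solver would be used for larger d; the matrix and costs are the ones above):

```python
from itertools import product

p = [0.3, 0.5, 0.8]                       # cost vector
c = [[1, 2, 3], [-1, -2, 2], [0, -3, 2]]  # constraint matrix, one row per constraint

def feasible(u):
    # c.U >= 0 means every row of c must satisfy row . u >= 0
    return all(sum(ci * ui for ci, ui in zip(row, u)) >= 0 for row in c)

best = max((u for u in product([0, 1], repeat=3) if feasible(u)),
           key=lambda u: sum(pi * ui for pi, ui in zip(p, u)))
print(best)  # (1, 0, 1), objective value 1.1
```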

  9. Boolean Functions as Linear Constraints • Conjunction • U1 ∧ U2 ∧ U3 ⇔ U1 = 1, U2 = 1, U3 = 1 • Disjunction • U1 ∨ U2 ∨ U3 ⇔ U1 + U2 + U3 ≥ 1 • CNF • (U1 ∨ U2) ∧ (U3 ∨ U4) ⇔ U1 + U2 ≥ 1, U3 + U4 ≥ 1
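
These equivalences can be verified mechanically. A small sketch that checks the CNF encoding over all truth assignments:

```python
from itertools import product

# check the CNF encoding: (u1 or u2) and (u3 or u4)  <=>  u1+u2 >= 1 and u3+u4 >= 1
for u1, u2, u3, u4 in product([0, 1], repeat=4):
    boolean = (u1 or u2) and (u3 or u4)
    linear = (u1 + u2 >= 1) and (u3 + u4 >= 1)
    assert bool(boolean) == linear
print("CNF encoding verified on all 16 assignments")
```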

  10. Text Chunking • Indicator Variables • U1,NP, U1,NULL, U2,VP, ... ⇔ y1 = NP, y1 = NULL, y2 = VP, ... • U1,NP indicates that phrase 1 is labeled NP • Cost Vector • p1,NP = score(x,NP,1) • p1,NULL = score(x,NULL,1) • p2,VP = score(x,VP,2) • ... • p·U = score(x,y) = ∑t score(x,yt,t), subject to constraints

  11. Structural Constraints • Coherency • yt can take only one value • ∑y∈{NP,...,NULL} Ut,y = 1 • Non-Overlapping • if y1 and y2 overlap • U1,NULL + U2,NULL ≥ 1

  12. Linguistic Constraints • Every sentence must have at least one VP • ∑t Ut,VP ≥ 1 • Every sentence must have at least one NP • ∑t Ut,NP ≥ 1 • ...
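
Slides 10-12 together specify a complete ILP for chunking. A minimal sketch of that encoding, assuming the PuLP package as the modeling layer (the talk names no tool) and invented scores for a three-phrase input:

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

labels = ["NP", "VP", "ADVP", "ADJP", "NULL"]
score = {  # score(x, y, t) for phrases t = 1..3; numbers are made up
    (1, "NP"): 0.8, (1, "VP"): 0.1, (1, "ADVP"): 0.0, (1, "ADJP"): 0.2, (1, "NULL"): 0.1,
    (2, "NP"): 0.2, (2, "VP"): 0.9, (2, "ADVP"): 0.1, (2, "ADJP"): 0.0, (2, "NULL"): 0.3,
    (3, "NP"): 0.3, (3, "VP"): 0.2, (3, "ADVP"): 0.6, (3, "ADJP"): 0.1, (3, "NULL"): 0.4,
}
U = {(t, y): LpVariable(f"U_{t}_{y}", cat="Binary") for t in (1, 2, 3) for y in labels}

prob = LpProblem("chunking", LpMaximize)
prob += lpSum(score[t, y] * U[t, y] for t in (1, 2, 3) for y in labels)  # p.U
for t in (1, 2, 3):                                  # coherency: one label per phrase
    prob += lpSum(U[t, y] for y in labels) == 1
prob += lpSum(U[t, "VP"] for t in (1, 2, 3)) >= 1    # at least one VP
prob += lpSum(U[t, "NP"] for t in (1, 2, 3)) >= 1    # at least one NP
prob.solve()
print([y for t in (1, 2, 3) for y in labels if value(U[t, y]) == 1])
# -> ['NP', 'VP', 'ADVP']: a valid label sequence satisfying all constraints
```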

  13. Interacting Classifiers • The classifier for an output yt uses the other outputs y-t as inputs • score(x, y-t, y, t) • Need to ensure that the final output from the ILP is computed from a consistent y • Introduce additional variables • Introduce additional coherency constraints

  14. Interacting Classifiers • Additional variables • UY,y indicating Y = y, for every possible (y-t, y) • Additional coherency constraints • UY,y = 1 iff Ut,yt = 1 for all yt in y • ∑yt in y Ut,yt - UY,y ≤ l - 1 • ∑yt in y Ut,yt - l·UY,y ≥ 0
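
The two constraints above are a standard linearization of a conjunction; a quick exhaustive check that they force UY,y to behave as the AND of the Ut,yt (here with l = 3):

```python
from itertools import product

l = 3  # number of outputs in y
# check: sum_t U_t - U_y <= l - 1  and  sum_t U_t - l * U_y >= 0
# together force U_y = 1 exactly when all U_t = 1
for bits in product([0, 1], repeat=l):
    for u_y in (0, 1):
        ok = (sum(bits) - u_y <= l - 1) and (sum(bits) - l * u_y >= 0)
        if ok:
            assert u_y == int(all(bits))
print("conjunction constraints verified")
```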

  15. Learning Classifiers • score(x, y-t, y, t) = wy·Φ(x, y-t, t) • Learn wy for all y ∈ Y • Multi-class learning • Example (x,y) → {(Φ(x, y-t, t), yt)}t=1..l • Learn each classifier independently

  16. Learn with Inference Feedback • Learn by observing global behavior • For each example (x,y) • Make a prediction with the current classifiers and ILP • y’ = argmaxy ∑t score(x, y-t, yt, t) • For each t, update • If y’t ≠ yt • Promote score(x, y-t, yt, t) • Demote score(x, y’-t, y’t, t)
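
A schematic of this training loop in Python; `ilp_inference` and the feature map `phi` are hypothetical stand-ins for the inference procedure and features, which the slides do not spell out:

```python
def train_with_inference_feedback(examples, w, phi, ilp_inference, epochs=10, lr=1.0):
    """Perceptron-style updates driven by the global ILP prediction.

    w[y]          : weight vector for label y (dict label -> list of floats)
    phi           : feature map phi(x, y_context, t) -> list of floats
    ilp_inference : returns the constraint-satisfying y' maximizing the global score
    """
    def add(vec, feats, scale):
        for i, f in enumerate(feats):
            vec[i] += scale * f

    for _ in range(epochs):
        for x, y in examples:
            y_pred = ilp_inference(x, w)           # global prediction under constraints
            for t, (gold, pred) in enumerate(zip(y, y_pred)):
                if pred != gold:                   # feedback only where the output is wrong
                    add(w[gold], phi(x, y, t), +lr)       # promote score(x, y_-t, y_t, t)
                    add(w[pred], phi(x, y_pred, t), -lr)  # demote score(x, y'_-t, y'_t, t)
    return w
```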

  17. Experiments • Semantic Role Labeling • Assume correct boundaries are given • Only sentences with more than 5 arguments are included

  18. Experimental Results • [Result tables for Winnow and Perceptron omitted] • For the difficult task: inference feedback during training improves performance • For the easy task: learning without inference feedback is better

  19. Conservative Updating • Update only when necessary • Example • U1 + U2 = 1 • Predicted (U1, U2) = (1,0) • Correct (U1, U2) = (0,1) • Feedback: demote class 1, promote class 2 • But U1 = 0 ⇒ U2 = 1, so only demote class 1

  20. Conservative Updating • S = minset(Constraints) • Set of functions that, if changed, would make global prediction correct. • Promote (Demote) only those functions in the minset S
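
One way to make the minset idea concrete is brute force: try correcting ever-larger subsets of the wrong outputs and keep the first subset whose correction, propagated through constrained inference, recovers the gold output. A sketch under that interpretation (`constrained_argmax` is a hypothetical helper):

```python
from itertools import combinations

def minset(y_pred, y_gold, constrained_argmax):
    """Brute-force illustration of minset: the smallest set of positions whose
    correction, with inference re-run under the constraints, recovers the gold
    output. constrained_argmax(fixed) returns the best constraint-satisfying
    output agreeing with the partial assignment `fixed` (position -> label)."""
    wrong = [t for t, (p, g) in enumerate(zip(y_pred, y_gold)) if p != g]
    for k in range(len(wrong) + 1):
        for subset in combinations(wrong, k):
            fixed = {t: y_gold[t] for t in subset}
            if constrained_argmax(fixed) == list(y_gold):
                return subset       # update only these classifiers
    return tuple(wrong)
```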

  21. Hierarchical Learning • Given x • Compute hierarchically • z1 = h1(x) • z2 = h2(x,z1) • … • y = hs+1(x,z1,…,zs) • Assume all z are known in training

  22. Hierarchical Learning • Assume each hj can be computed via ILP • pj, Uj, cj • y = argmaxy maxz1,…,zs ∑j λj pj·Uj • Subject to • c1·U1 ≥ 0, c2·U2 ≥ 0, …, cs+1·Us+1 ≥ 0 • where λj is a constant large enough to preserve the hierarchy
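
A toy numeric check of the role of λj: if the weight on the lower level exceeds any achievable score difference at the higher level, the joint maximum can never trade lower-level score for higher-level gain:

```python
# Toy check that a large enough lambda makes the joint objective respect
# the hierarchy: the level-1 score dominates, level 2 only breaks ties.
candidates = [
    # (level-1 score, level-2 score)
    (0.9, 0.1),
    (0.8, 5.0),   # better at level 2, worse at level 1
    (0.9, 0.3),
]
lam = 100.0  # larger than any possible level-2 score difference
best = max(candidates, key=lambda s: lam * s[0] + s[1])
print(best)   # (0.9, 0.3): top level-1 score, tie broken by level 2
```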

  23. Hidden Variables • Given x • y = h(x,z) • z is not known in training • y = argmaxy maxz ∑t score(x, z, y-t, yt, t) • Subject to some constraints

  24. Learning with Hidden Variables • Truncated EM-style learning • For each example (x,y) • Compute z with the current classifiers and ILP • z = argmaxz ∑t score(x, z, y-t, yt, t) • Make a prediction with the current classifiers and ILP • (y’,z’) = argmaxy,z ∑t score(x, z, y-t, yt, t) • For each t, update • If y’t ≠ yt • Promote score(x, z, y-t, yt, t) • Demote score(x, z’, y’-t, y’t, t)
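
A schematic of this truncated-EM loop; the two inference calls and the update routines are hypothetical stand-ins, mirroring the inference-feedback sketch above:

```python
def train_with_hidden_variables(examples, w, ilp_argmax_z, ilp_argmax_yz,
                                promote, demote, epochs=5):
    """Truncated-EM-style loop (a sketch; the four callables are hypothetical).

    ilp_argmax_z(x, y, w) : best hidden z given the observed gold y (E-like step)
    ilp_argmax_yz(x, w)   : joint best (y', z') under the constraints
    promote/demote        : classifier updates as in the inference-feedback loop
    """
    for _ in range(epochs):
        for x, y in examples:
            z = ilp_argmax_z(x, y, w)        # fill in hidden variables for the gold y
            y_pred, z_pred = ilp_argmax_yz(x, w)
            for t in range(len(y)):
                if y_pred[t] != y[t]:
                    promote(w, x, z, y, t)           # toward score(x, z, y_-t, y_t, t)
                    demote(w, x, z_pred, y_pred, t)  # away from score(x, z', y'_-t, y'_t, t)
    return w
```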

  25. Conclusion • ILP is • powerful • general • learnable • useful • fast (or at least not too slow) • extendable

  26. Boolean Functions as Linear Constraints • Conjunction • a ∧ b ∧ c ⇔ Ua + Ub + Uc ≥ 3 • Disjunction • a ∨ b ∨ c ⇔ Ua + Ub + Uc ≥ 1 • DNF • (a ∧ b) ∨ (c ∧ d) ⇔ Iab + Icd ≥ 1 • Introduce new variables Iab, Icd

  27. Helper Variables • We must link Ia, Ib, and Iab • Iab ⇔ a ∧ b • Ia ∧ Ib ⇒ Iab • Ia + Ib ≤ Iab + 1 • Iab ⇒ Ia ∧ Ib • 2·Iab ≤ Ia + Ib
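
Again the linking can be checked by enumeration; assuming the reconstruction above (Ia + Ib ≤ Iab + 1 and 2·Iab ≤ Ia + Ib), the two rows force Iab to equal Ia ∧ Ib:

```python
from itertools import product

# check: Ia + Ib <= Iab + 1  and  2*Iab <= Ia + Ib  force Iab = Ia AND Ib
for ia, ib, iab in product([0, 1], repeat=3):
    feasible = (ia + ib <= iab + 1) and (2 * iab <= ia + ib)
    if feasible:
        assert iab == (ia and ib)
print("helper-variable linking verified")
```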

  28. Semantic Role Labeling • a, b, c, ... ⇔ ph1 = A0, ph1 = A1, ph2 = A0, ... • Cost Vector • pa = score(ph1 = A0) • pb = score(ph1 = A1) • ... • Indicator Variables • Ia indicates that phrase 1 is labeled A0 • pa·Ia = 0.3 if Ia = 1 and 0 otherwise

  29. Learning • X = (X1,...,Xk) ∈ X1 × … × Xk = X • Y-t = (Y1,...,Yt-1,Yt+1,...,Yl) ∈ Y1 × … × Yt-1 × Yt+1 × … × Yl = Y-t • Yt ∈ Yt • Given X = x and Y-t = y-t, find Yt = yt or a score for each possible yt • X × Y-t → Yt or X × Y-t × Yt → R

  30. SRL via Generalized Inference

  31. Outline • Find potential argument candidates • Classify arguments to types • Inference for Argument Structure • Integer linear programming (ILP) • Cost Function • Constraints • Features

  32. Find Potential Arguments • [Figure: candidate bracketings over "I left my nice pearls to her"] • Every chunk can be an argument • Restrict potential arguments • BEGIN(word) • BEGIN(word) = 1 ⇔ "word begins argument" • END(word) • END(word) = 1 ⇔ "word ends argument" • Argument • (wi,...,wj) is a potential argument iff • BEGIN(wi) = 1 and END(wj) = 1 • Reduces the set of potential arguments

  33. Details... • BEGIN(word) • Learn a function • B(word, context, structure) → {0,1} • END(word) • Learn a function • E(word, context, structure) → {0,1} • POTARG = {arg | BEGIN(first(arg)) and END(last(arg))}
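
A small sketch of the candidate-generation step; the two 0/1 predictors stand in for the learned B and E functions, and the chosen positions are invented for illustration:

```python
def potential_arguments(words, begin, end):
    """Candidate arguments from BEGIN/END predictions (sketch; `begin` and
    `end` are hypothetical 0/1 predictors over word positions)."""
    begins = [i for i, w in enumerate(words) if begin(i)]
    ends = [j for j, w in enumerate(words) if end(j)]
    # POTARG = {(i, j) | BEGIN(w_i) = 1, END(w_j) = 1, i <= j}
    return [(i, j) for i in begins for j in ends if i <= j]

words = "I left my nice pearls to her".split()
# toy predictors: arguments may begin at "I", "my", "to" and end at "I", "pearls", "her"
POTARG = potential_arguments(words,
                             begin=lambda i: i in (0, 2, 5),
                             end=lambda j: j in (0, 4, 6))
print(POTARG)  # [(0, 0), (0, 4), (0, 6), (2, 4), (2, 6), (5, 6)]
```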

  34. Argument Type Likelihood • [Figure: candidate arguments over "I left my nice pearls to her"] • Assign a type-likelihood • How likely is it that arg a is of type t? • For all a ∈ POTARG, t ∈ T • P(argument a = type t), e.g. over the types (A0, CA1, A1, Ø): • a1: (0.3, 0.2, 0.2, 0.3) • a2: (0.6, 0.0, 0.0, 0.4)

  35. Details... • Learn a classifier • ARGTYPE(arg) • Φ(arg) → {A0, A1, ..., CA0, ..., LOC, ...} • argmaxt∈{A0,A1,...,CA0,...,LOC,...} wt·Φ(arg) • Estimate probabilities • P(a = t) = wt·Φ(a) / Z

  36. What is a Good Assignment? • Likelihood of being correct • P(Arg a = Type t), where t is the correct type for argument a • For a set of arguments a1, a2, ..., an • Expected number of arguments correct • ∑i P(ai = ti) • We search for the assignment maximizing the expected number correct

  37. Inference • Maximize expected number correct • T* = argmaxT ∑i P(ai = ti) • Subject to some constraints • Structural and linguistic • Example ("I left my nice pearls to her"): • Independent max: cost = 0.3 + 0.6 + 0.5 + 0.4 = 1.8 • Non-overlapping: cost = 0.3 + 0.4 + 0.5 + 0.4 = 1.6 • Blue ⇒ Red & non-overlapping: cost = 0.3 + 0.4 + 0.3 + 0.4 = 1.4 • [Table of type likelihoods omitted]

  38. Everything is Linear • Cost function • ∑a∈POTARG P(a = ta) = ∑a∈POTARG, t∈T P(a = t)·Ia,t • Constraints • Non-Overlapping • a and a’ overlap ⇒ Ia,Ø + Ia’,Ø ≥ 1 • Linguistic • ∃ CA0 ⇒ ∃ A0: ∑a Ia,CA0 ≤ ∑a Ia,A0 • Integer Linear Programming
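
A minimal sketch of this ILP with PuLP (an assumed modeling layer), using the two overlapping candidates and the slide-34 likelihoods, with Ø written as NULL:

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

types = ["A0", "CA1", "A1", "NULL"]
P = {"a1": dict(zip(types, [0.3, 0.2, 0.2, 0.3])),
     "a2": dict(zip(types, [0.6, 0.0, 0.0, 0.4]))}
I = {(a, t): LpVariable(f"I_{a}_{t}", cat="Binary") for a in P for t in types}

prob = LpProblem("srl", LpMaximize)
prob += lpSum(P[a][t] * I[a, t] for a in P for t in types)  # expected number correct
for a in P:                                  # each argument takes exactly one type
    prob += lpSum(I[a, t] for t in types) == 1
# a1 and a2 overlap: at least one of them must be NULL
prob += I["a1", "NULL"] + I["a2", "NULL"] >= 1
prob.solve()
print({a: t for a in P for t in types if value(I[a, t]) == 1})
# -> {'a1': 'NULL', 'a2': 'A0'}: the weaker overlapping candidate is nulled
```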

  39. Features are Important • Here, a discussion of the features should go. • Which are most important? • Comparison to other people.

  40. [Figure: candidate argument bracketings of "I left my nice pearls to her"]
