Constrained Conditional Models Tutorial

Jingyu Chen, Xiao Cheng

  1. Constrained Conditional Models Tutorial Jingyu Chen, Xiao Cheng

  2. Introduction

  3. Main ideas: • Idea 1: Modeling. Separate modeling and problem formulation from algorithms • Similar to the philosophy of probabilistic modeling • Idea 2: Inference. Keep the model simple, make expressive decisions (via constraints) • Unlike probabilistic modeling, where models become more expressive • Inject background knowledge • Idea 3: Learning. Expressive structured decisions can be supported by simply learned models • Global inference can be used to amplify the simple models (even with minimal supervision).

  4. Task of interest: Structured Prediction • Common formulation • e.g. HMM, CRF, Structured Perceptron etc. • Covers a lot of NLP problems: • Parsing; Semantic Parsing; Summarization; Transliteration; Co-reference resolution, Textual Entailment… • IE problems: • Entities, relations, attributes… • How to improve without incurring performance issues?

  5. Pipeline? • Very crude approximation to the real problem; propagates errors. • Ignores dependencies: • e.g. in relation extraction, the label of an entity depends on the relations it is involved in, and the relation label depends on the labels of its arguments.

  6. Model Formulation • Typical models • With CCM we choose • [Equation labels: local dependency (e.g. HMM, CRF); penalty; violation measure; regularization]
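
The equations on this slide are not in the transcript. As a hedged reconstruction, the CCM objective is usually written as below. The symbols ($\mathbf{w}$, $\phi$, $\rho_k$, $d_{C_k}$) are the conventional ones from the CCM literature and are assumed here, not taken from the slide.

```latex
\hat{y} \;=\; \arg\max_{y \in \mathcal{Y}} \;
  \underbrace{\mathbf{w}^{\top} \phi(x, y)}_{\text{local model, e.g. HMM, CRF}}
  \;-\; \sum_{k=1}^{K}
  \underbrace{\rho_k}_{\text{penalty}} \,
  \underbrace{d_{C_k}(x, y)}_{\text{violation measure}}
```

Here $d_{C_k}(x, y)$ measures how strongly the assignment $y$ violates constraint $C_k$, and $\rho_k$ is the price paid per unit of violation. Setting $\rho_k = \infty$ turns $C_k$ into a hard constraint.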

  7. Constraint expressivity • Multiclass problem: ideal classification can be expressed through constraints • One-vs-all approximation:
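
The two formulas on this slide were lost in extraction. A plausible reading, assuming one linear scorer $\mathbf{w}_y$ per label and a training example $(x, y^{*})$ (notation assumed, not from the slide), is:

```latex
\begin{align*}
% Ideal multiclass classification: the correct label only has to win.
\forall\, y \neq y^{*}: \quad & \mathbf{w}_{y^{*}}^{\top} x \;>\; \mathbf{w}_{y}^{\top} x \\
% One-vs-all approximation: every binary classifier must be right on its own.
& \mathbf{w}_{y^{*}}^{\top} x > 0
  \quad\text{and}\quad
  \forall\, y \neq y^{*}: \;\; \mathbf{w}_{y}^{\top} x < 0
\end{align*}
```

The one-vs-all conditions imply the ideal ones but not the other way around, which is why one-vs-all is only an approximation.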

  8. Implementations

  9. How do we use CCM to learn?

  10. Example 1: joint inference-based learning • Constrained HMM in information extraction

  11. Typical work flow • Define basic classifiers • Define constraints as linear inequalities • Combine the two into an objective function
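
The three workflow steps can be sketched in a few lines of Python. This is a toy illustration, not the tutorial's implementation: the label set, the scores in `TOY_SCORES`, the single constraint, and the penalty `rho` are all made up, and brute-force enumeration stands in for the ILP solver a real system would use.

```python
from itertools import product

LABELS = ["AUTHOR", "TITLE", "DATE"]

# Step 1: a basic (local) classifier. These scores are invented for illustration.
TOY_SCORES = {
    "Smith":   {"AUTHOR": 2.0, "TITLE": 0.5, "DATE": 0.0},
    "1999":    {"AUTHOR": 0.2, "TITLE": 1.0, "DATE": 0.8},
    "Parsing": {"AUTHOR": 0.3, "TITLE": 1.5, "DATE": 0.0},
}


def local_score(token, label):
    return TOY_SCORES[token][label]


# Step 2: a constraint, expressed as a count of violations.
def date_violations(tokens, labels):
    """Number of four-digit 19xx/20xx tokens that are not labeled DATE."""
    return sum(
        1
        for tok, lab in zip(tokens, labels)
        if len(tok) == 4 and tok.isdigit() and tok[:2] in ("19", "20") and lab != "DATE"
    )


# Step 3: combine both into one objective: local scores minus penalized violations.
def objective(tokens, labels, rho):
    local = sum(local_score(t, l) for t, l in zip(tokens, labels))
    return local - rho * date_violations(tokens, labels)


def infer(tokens, rho):
    """Exhaustive argmax over all label sequences (fine only for tiny inputs)."""
    return max(
        product(LABELS, repeat=len(tokens)),
        key=lambda labels: objective(tokens, labels, rho),
    )


TOKENS = ["Smith", "1999", "Parsing"]
WITHOUT_CONSTRAINT = infer(TOKENS, rho=0.0)
WITH_CONSTRAINT = infer(TOKENS, rho=5.0)
```

With `rho=0.0` the local classifier labels "1999" as TITLE; with `rho=5.0` the penalty outweighs the local preference and the token is labeled DATE.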

  12. HMM CCM Example • Information extraction without prior knowledge • Use HMM

  13. HMM CCM Example • Violates a lot of natural constraints

  14. HMM CCM Example • Each field must be a consecutive list of words and can appear at most once in a citation. • State transitions must occur on punctuation marks. • The citation can only start with AUTHOR or EDITOR. • The words pp., pages correspond to PAGE. • Four-digit numbers of the form 20xx or 19xx are DATE. • Quotations can appear only in TITLE.

  15. HMM CCM Example • How do we use constraints with HMM? • Standard HMM: • Learn the joint probability of the label sequence and the input: • Inference: take the most likely label sequence:
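
The two formulas referred to here are missing from the transcript. For a standard first-order HMM they take the following textbook form, where $x = x_1 \dots x_n$ is the input, $y = y_1 \dots y_n$ the label sequence, and $y_0$ a designated start state (this notation is assumed, not from the slide):

```latex
\begin{align*}
P(x, y) &= \prod_{i=1}^{n} P(y_i \mid y_{i-1}) \, P(x_i \mid y_i) \\
\hat{y} &= \arg\max_{y} P(y \mid x) \;=\; \arg\max_{y} P(x, y)
\end{align*}
```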

  16. HMM CCM Example • New objective function involving constraints • Penalize the probability of a sequence if it violates a constraint • A penalty is incurred each time the constraint is violated

  17. HMM CCM Example • Transform to linear model
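
The objective on slides 16 and 17 is not in the transcript. Under the usual CCM notation (assumed here: $\rho_k$ is the penalty for constraint $C_k$, and $d_{C_k}(x, y)$ counts how many times $y$ violates it), the penalized HMM objective and its linear form can be sketched as:

```latex
\begin{align*}
\hat{y} &= \arg\max_{y} \; \log P(x, y) \;-\; \sum_{k=1}^{K} \rho_k \, d_{C_k}(x, y) \\
\log P(x, y) &= \sum_{i=1}^{n} \bigl[ \log P(y_i \mid y_{i-1}) + \log P(x_i \mid y_i) \bigr]
             \;=\; \mathbf{w}^{\top} \phi(x, y) \\
\hat{y} &= \arg\max_{y} \; \mathbf{w}^{\top} \phi(x, y) \;-\; \sum_{k=1}^{K} \rho_k \, d_{C_k}(x, y)
\end{align*}
```

In the middle line, $\mathbf{w}$ collects the log transition and emission probabilities and $\phi(x, y)$ counts how often each transition and emission occurs in $(x, y)$, so the HMM score is linear in $\mathbf{w}$.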

  18. HMM CCM Example • We need to learn the new parameters that maximize the scoring function • Although the scoring function is no longer a log-likelihood of the dataset, it is still a smooth concave function with a unique global maximum, where the gradient is zero.

  19. HMM CCM Example • Simply count to estimate the probability of the constraints being violated

  20. HMM CCM Example

  21. Are there other ways to learn? Can this paradigm be generalized?

  22. Training paradigms

  23. Training paradigms • Decompose • Learn • Inference

  24. Prior knowledge: Features vs. Constraints

  25. Comparison with MLN • In Markov Logic Networks (MLN), constraints are formulated as explicit probabilities, jointly with the overall distribution: • e.g. • Constraints in CCM are formulated as linear inequalities • e.g. • Theoretically the same, very different in practice

  26. Training paradigms • Learning + Inference (L+I): Train with some constraints, apply all constraints only in inference • No need to retrain an existing system • Fast and modular • Inference-Based Training (IBT): Train jointly with constraints and dependencies (e.g. graphical models) • Better for strong interactions between output labels • Other training paradigms: • Pipeline-like sequential model [Roth, Small, Titov: AI&Stat’09] • Constraints Driven Learning (CODL) [Chang et al. ’07, ’12]

  27. Which paradigm is better?

  28. Algorithmic view of the differences: IBT vs. L+I • For each iteration • For each example in the training data • If [condition and update not in the transcript] • endif • endfor • endfor
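
The condition and update of this loop are missing from the transcript. The sketch below shows one common way to contrast the two paradigms with a perceptron: L+I updates each local classifier on its own unconstrained prediction, while IBT predicts with constrained inference first and updates only where that global prediction is wrong. The toy constraint ("exactly one output is +1"), the data, and all function names are assumptions made for illustration.

```python
from itertools import product


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def feasible(y):
    """Toy constraint: exactly one output is +1."""
    return sum(1 for v in y if v == 1) == 1


def constrained_inference(W, x):
    """Best feasible assignment under the summed local scores."""
    candidates = [y for y in product((-1, 1), repeat=len(W)) if feasible(y)]
    return max(candidates, key=lambda y: sum(yi * dot(w, x) for yi, w in zip(y, W)))


def local_prediction(W, x):
    """Each classifier decides alone, ignoring the constraint."""
    return tuple(1 if dot(w, x) > 0 else -1 for w in W)


def train(data, n_outputs, n_features, epochs, use_inference):
    """use_inference=False gives L+I training, True gives IBT."""
    W = [[0.0] * n_features for _ in range(n_outputs)]
    for _ in range(epochs):                      # for each iteration
        for x, gold in data:                     # for each training example
            if use_inference:
                pred = constrained_inference(W, x)
            else:
                pred = local_prediction(W, x)
            if pred != gold:                     # update only on mistakes
                for i in range(n_outputs):
                    if pred[i] != gold[i]:
                        W[i] = [wi + gold[i] * xi for wi, xi in zip(W[i], x)]
    return W


DATA = [((1.0, 0.0, 1.0), (1, -1)), ((0.0, 1.0, 1.0), (-1, 1))]
W_LI = train(DATA, n_outputs=2, n_features=3, epochs=10, use_inference=False)
W_IBT = train(DATA, n_outputs=2, n_features=3, epochs=10, use_inference=True)
```

At test time both variants call `constrained_inference`; the only difference is whether the constraint is visible during training.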

  29. L+I vs. IBT tradeoffs • In some cases problems are hard due to lack of training data: semi-supervised learning • [Plot; axis label: # of features]

  30. Choice of paradigm • IBT: • Better when the interaction between output labels is strong • L+I: • Faster computationally • Modular, no need to retrain existing classifier and works with simple models such as

  31. Paradigm 2: learning + inference • An example with entity-relation extraction

  32. Entity-Relation Extraction [RothYi07] • Decision-time inference • Example sentence: "Dole's wife, Elizabeth, is a native of N.C." • Entities: E1 (Dole), E2 (Elizabeth), E3 (N.C.) • Relations: R12, R23

  33. Entity-Relation Extraction [RothYi07] • Formulation 1: joint global model • Intractable to learn • Needs decomposition

  34. Entity-Relation Extraction [RothYi07] • Formulation 2: Local learning + global inference

  35. Entity-Relation Extraction [RothYi07] • Entities: E1 (Dole), E2 (Elizabeth), E3 (N.C.) • Relations: R12, R21, R23, R32, R13, R31 • Cost function: c{E1 = per} · x{E1 = per} + c{E1 = loc} · x{E1 = loc} + … + c{R12 = spouse_of} · x{R12 = spouse_of} + … + c{R12 = ∅} · x{R12 = ∅} + …

  36. Entity-Relation Extraction [RothYi07] • Exactly one label for each relation and entity • Relation and entity type constraints • Integral constraints, in effect boolean

  37. Entity-Relation Extraction [RothYi07] • Each entity is a person, organization, or location, or has no label: • x{E1 = per} + x{E1 = loc} + x{E1 = org} + x{E1 = ∅} = 1 • (R12 = spouse_of) ⇒ (E1 = person) ∧ (E2 = person) • x{R12 = spouse_of} ≤ x{E1 = per} • x{R12 = spouse_of} ≤ x{E2 = per}
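
A minimal sketch of how the spouse_of constraint changes the decision, assuming invented classifier scores in `SCORES` and using exhaustive search in place of an ILP solver. Assigning exactly one label per variable enforces the "sums to 1" constraint by construction.

```python
from itertools import product

ENTITY_LABELS = ["per", "loc", "org", "none"]
RELATION_LABELS = ["spouse_of", "born_in", "none"]

# Hypothetical local classifier scores; not taken from the paper or the slides.
SCORES = {
    "E1": {"per": 0.4, "loc": 0.5, "org": 0.1, "none": 0.0},
    "E2": {"per": 0.8, "loc": 0.1, "org": 0.1, "none": 0.0},
    "R12": {"spouse_of": 0.7, "born_in": 0.2, "none": 0.1},
}


def satisfies_constraints(e1, e2, r12):
    """(R12 = spouse_of) implies (E1 = per) and (E2 = per)."""
    if r12 == "spouse_of":
        return e1 == "per" and e2 == "per"
    return True


def total_score(e1, e2, r12):
    return SCORES["E1"][e1] + SCORES["E2"][e2] + SCORES["R12"][r12]


def infer(constrained=True):
    """Return the best (E1, E2, R12) labeling, optionally restricted to feasible ones."""
    candidates = [
        (e1, e2, r12)
        for e1, e2, r12 in product(ENTITY_LABELS, ENTITY_LABELS, RELATION_LABELS)
        if not constrained or satisfies_constraints(e1, e2, r12)
    ]
    return max(candidates, key=lambda labels: total_score(*labels))


LOCAL_ONLY = infer(constrained=False)
GLOBAL = infer(constrained=True)
```

Without the constraint, E1 is labeled `loc` even though R12 is `spouse_of`; with it, the strong relation score pulls E1 to `per`.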

  38. Entity-Relation Extraction [RothYi07] • Entity classification results

  39. Entity-Relation Extraction [RothYi07] • Relation identification results

  40. Entity-Relation Extraction [RothYi07] • Relation identification results

  41. Inner workings of Inference

  42. Constraints Encoding • Atoms • Existential quantification • Negation • Conjunction • Disjunction
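
Each of these logical forms has a standard encoding as a linear inequality over 0/1 indicator variables. The sketch below lists one such encoding per connective and checks it against the truth table; the dictionary layout and function names are illustrative choices, not part of the tutorial.

```python
from itertools import product

# Each entry: (logical formula, linear inequality over 0/1 indicator variables).
ENCODINGS = {
    "negation: not a":      (lambda a, b: not a,        lambda a, b: 1 - a >= 1),
    "conjunction: a and b": (lambda a, b: a and b,      lambda a, b: a + b >= 2),
    "disjunction: a or b":  (lambda a, b: a or b,       lambda a, b: a + b >= 1),
    "implication: a => b":  (lambda a, b: (not a) or b, lambda a, b: a <= b),
}


def is_exact(logic, linear):
    """True if the inequality holds on exactly the assignments where the formula is true."""
    return all(
        bool(logic(a, b)) == bool(linear(a, b))
        for a, b in product((0, 1), repeat=2)
    )


def existential_is_exact(n):
    """Exists i: a_i  <=>  sum_i x_i >= 1, checked over all 0/1 vectors of length n."""
    return all(any(xs) == (sum(xs) >= 1) for xs in product((0, 1), repeat=n))


RESULTS = {name: is_exact(logic, linear) for name, (logic, linear) in ENCODINGS.items()}
```

An atom on its own is just the indicator variable itself, so it needs no separate inequality.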

  43. Integer Linear Programming (ILP) • Powerful tool, very general • NP-hard even in the binary case, but efficient in practice for most NLP problems • If ILP cannot solve the problem efficiently, we can fall back to approximate solutions using heuristic search

  44. Integer Linear Programming (ILP)

  45. Integer Linear Programming (ILP)

  46. Sentence compression

  47. Sentence Compression Example • Modelling Compression with Discourse Constraints, James Clarke and Mirella Lapata, COLING/ACL 2006 • 1. What is sentence compression? • Sentence compression is commonly expressed as a word deletion problem: given an input sentence of words W = w1, w2, . . . , wn, the aim is to produce a compression by removing any subset of these words (Knight and Marcu 2002).

  48. A trigram language model: maximize a scoring function by ILP • p_i: word w_i starts the compression • q_{i,j}: sequence w_i, w_j ends the compression • x_{i,j,k}: trigram w_i, w_j, w_k is in the compression • y_i: word w_i is in the compression • Each p, q, x, y is either 0 or 1
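
The scoring function itself is not in the transcript. With the variables defined on this slide, a trigram language model objective would plausibly take the following shape. This is a sketch: the index ranges are simplified, and the consistency constraints that tie $p$, $q$, $x$, and $y$ together are omitted.

```latex
\begin{align*}
\max \quad & \sum_{i} p_i \, P(w_i \mid \text{start})
  \;+\; \sum_{i < j < k} x_{i,j,k} \, P(w_k \mid w_i, w_j)
  \;+\; \sum_{i < j} q_{i,j} \, P(\text{end} \mid w_i, w_j) \\
\text{s.t.} \quad & p_i,\; q_{i,j},\; x_{i,j,k},\; y_i \in \{0, 1\}
\end{align*}
```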

  49. Sentential constraints: • 1. Disallow the inclusion of modifiers without their head words: • 2. Require modifiers to be present when the head is retained in the compression: • 3. Require that if a verb is present in the compression, then so are its arguments:
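
The inequalities for these constraints are missing from the transcript. The sketch below encodes constraints 1 and 3 over the inclusion variables y_i and applies them to a toy sentence. The word scores, the dependency pairs, and the per-word objective (a stand-in for the trigram model) are all assumptions made for illustration.

```python
from itertools import product

WORDS = ["John", "quickly", "ate", "red", "apples"]
# Hypothetical per-word scores standing in for the language model objective.
WORD_SCORE = [1.0, -0.5, 1.0, 0.6, -0.2]
# (head index, modifier index): "quickly" modifies "ate", "red" modifies "apples".
MODIFIER_OF = [(2, 1), (4, 3)]
# (verb index, argument index): "ate" takes "John" and "apples".
VERB_ARGS = [(2, 0), (2, 4)]


def feasible(y):
    # Constraint 1: a modifier may be kept only if its head is kept.
    mods_ok = all(y[head] - y[mod] >= 0 for head, mod in MODIFIER_OF)
    # Constraint 3: if a verb is kept, its arguments are kept too.
    args_ok = all(y[arg] - y[verb] >= 0 for verb, arg in VERB_ARGS)
    return mods_ok and args_ok


def compress(constrained=True):
    """Pick the best 0/1 inclusion vector by exhaustive search."""
    candidates = [
        y for y in product((0, 1), repeat=len(WORDS))
        if not constrained or feasible(y)
    ]
    best = max(candidates, key=lambda y: sum(s * yi for s, yi in zip(WORD_SCORE, y)))
    return [w for w, yi in zip(WORDS, best) if yi]


UNCONSTRAINED = compress(constrained=False)
CONSTRAINED = compress(constrained=True)
```

Without constraints the search keeps "red" but drops its head "apples"; with them, "apples" is retained both as the head of "red" and as an argument of "ate".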

  50. Modifier Constraint Example