
Transformation-based error-driven learning (TBL)



Presentation Transcript


  1. Transformation-based error-driven learning (TBL) LING 572 Fei Xia 2/15/07

  2. Outline • Basic concept and properties • Relation between DT, DL, and TBL • Case study • Summary

  3. Basic concepts and properties

  4. TBL overview • Introduced by Eric Brill (1992) • Intuition: • Start with some simple solution to the problem • Then apply a sequence of transformations to improve the results • Applications: • Classification problem • Sequence labeling problem: e.g., POS tagging

  5. TBL flowchart for training

  6. Transformations • A transformation has two components: • A trigger environment: e.g., the previous tag is DT • A rewrite rule: e.g., change the current tag from MD to N If (prev_tag == DT) then MD → N • Similar to a rule in a decision tree, but the rewrite rule can be complicated (e.g., changing a parse tree) → a transformation list is a processor and not (just) a classifier.
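A transformation of this shape can be represented as a simple trigger/rewrite pair. The following is a minimal sketch; the function and variable names are illustrative, not from Brill's implementation:

```python
# Representing a transformation as a trigger environment plus a rewrite
# rule. Illustrative sketch; names are not from Brill's implementation.

def make_transformation(trigger, from_tag, to_tag):
    """Return a per-position rule: rewrite from_tag -> to_tag
    wherever the trigger environment holds."""
    def rule(tags, i):
        if tags[i] == from_tag and trigger(tags, i):
            return to_tag
        return tags[i]
    return rule

# Example from the slide: if the previous tag is DT, change MD to N.
prev_is_dt = lambda tags, i: i > 0 and tags[i - 1] == "DT"
md_to_n = make_transformation(prev_is_dt, "MD", "N")

tags = ["DT", "MD", "VB"]
new_tags = [md_to_n(tags, i) for i in range(len(tags))]
print(new_tags)  # ['DT', 'N', 'VB']
```

Note that the rule reads its trigger from the original tag sequence while writing into a new one; whether rewrites are instead visible immediately is a design parameter discussed on slide 10.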

  7. Training time: learn transformations • Initialize each instance in the training data with an initial annotator. • Consider all the possible transformations, and choose the one with the highest score. • Append it to the transformation list and apply it to the training corpus to obtain a “new” corpus. • Repeat steps 2-3. → Steps 2-3 can be expensive; there are various ways to address this.

  8. Testing time: applying transformations • Initialize each example in the test data with the same initial annotator • Apply the transformations in the same order as they were learned.
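The training loop (slide 7) and the application step (slide 8) can be sketched as follows. This is a minimal greedy sketch in which a transformation is represented as a function from a label sequence to a new label sequence, and the score is the reduction in training errors; the names are illustrative, not from Brill's implementation:

```python
def train_tbl(corpus, gold, candidates, min_gain=1):
    """Greedily learn an ordered transformation list.
    corpus: current labels (output of the initial annotator);
    gold: true labels; candidates: all allowable transformations,
    each a function mapping a label list to a new label list."""
    learned = []
    while True:
        # Step 2: score every candidate by its error reduction.
        def gain(t):
            before = sum(c != g for c, g in zip(corpus, gold))
            after = sum(c != g for c, g in zip(t(corpus), gold))
            return before - after
        best = max(candidates, key=gain)
        if gain(best) < min_gain:
            break
        # Step 3: append it to the list and relabel the corpus.
        learned.append(best)
        corpus = best(corpus)
    return learned

def apply_tbl(corpus, learned):
    """Testing: apply the transformations in the order learned."""
    for t in learned:
        corpus = t(corpus)
    return corpus
```

The stopping threshold `min_gain` is one simple way to end training; scoring every candidate on every iteration is exactly the expense the slide mentions.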

  9. Using TBL • Pick the initial state-annotator • Decide the space of allowable transformations • Triggering environments • Rewrite rules • Choose an objective function (e.g., minimize the error rate): • for comparing the corpus to the truth • for choosing a transformation

  10. Using TBL (cont) • Two more parameters: • Whether the effect of a transformation is visible to following transformations • If so, what’s the order in which transformations are applied to a corpus? • left-to-right • right-to-left

  11. The order matters • Transformation: If prevLabel=A then change the curLabel from A to B. • Input: A A A A A A • Output: • “Not immediate” results: A B B B B B • Immediate results, left-to-right: A B A B A B • Immediate results, right-to-left: A B B B B B
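The three application orders on this slide can be checked with a short script (an illustrative sketch of the single rule above, not general TBL machinery):

```python
def apply_rule(labels, immediate=True, right_to_left=False):
    """Rule from the slide: if prevLabel == 'A',
    change curLabel from 'A' to 'B'."""
    out = list(labels)
    # 'prev' is read from the updated sequence if effects are
    # immediate, otherwise from a frozen copy of the input.
    ref = out if immediate else list(labels)
    positions = (range(len(out) - 1, 0, -1) if right_to_left
                 else range(1, len(out)))
    for i in positions:
        if ref[i - 1] == "A" and out[i] == "A":
            out[i] = "B"
    return out

seq = list("AAAAAA")
print(apply_rule(seq, immediate=False))                      # A B B B B B
print(apply_rule(seq, immediate=True))                       # A B A B A B
print(apply_rule(seq, immediate=True, right_to_left=True))   # A B B B B B
```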

  12. Relation between DT, DL, and TBL

  13. DT and TBL • DT is a subset of TBL. • Proof: by induction. • When depth(DT)=1: • Label with S • If X then S → A • S → B

  14. DT is a subset of TBL • Depth=n: by induction, the subtrees T1 and T2 have transformation lists • T1: Label with S'; then L1' • T2: Label with S''; then L2' • Depth=n+1: • Label with S • If X then S → S' • S → S'' • L1' • L2'

  15. DL and TBL • DL is a proper subset of TBL. • In two-class TBL: (if q then y' → y) ⇔ (if q then y) • If multiple transformations apply to an example, only the last one matters.

  16. Two-class TBL ⇔ DL? • Two-class TBL → DL: • Replace “if q then y' → y” with “if q then y” • Reverse the rule order • DL → two-class TBL: • Replace “if q then y” with “if q then y' → y” • Reverse the rule order • The reduction does not hold for “dynamic” problems: • Dynamic problem: the answers to the questions are not static. • Ex: in POS tagging, changing the tag of a word changes the answers to questions for nearby words.

  17. An example • DL: • If q1 then c1 • If q2 then c2 • c1 • TBL: • c1 • If q2 then c2 • If q1 then c1
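The equivalence in this example can be verified directly: a decision list fires the first matching rule, while a two-class TBL applies every matching transformation in order, so only the last one to fire matters. A minimal sketch, with q1 and q2 as arbitrary illustrative predicates:

```python
def decision_list(x, rules, default):
    """Return the class of the first rule whose test fires."""
    for test, label in rules:
        if test(x):
            return label
    return default

def two_class_tbl(x, default, transformations):
    """Start from the default label; every firing transformation
    overwrites it, so the last one to fire wins."""
    label = default
    for test, new_label in transformations:
        if test(x):
            label = new_label
    return label

q1 = lambda x: x % 2 == 0   # illustrative predicates
q2 = lambda x: x % 3 == 0

# DL:  if q1 then c1; if q2 then c2; else c1
# TBL: c1; if q2 then c2; if q1 then c1   (same rules, reversed)
for x in range(20):
    dl = decision_list(x, [(q1, "c1"), (q2, "c2")], "c1")
    tbl = two_class_tbl(x, "c1", [(q2, "c2"), (q1, "c1")])
    assert dl == tbl
```

Both classifiers output c2 exactly when q2 holds and q1 does not, which is why reversing the rule order makes the two lists agree.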

  18. DT, DL, and TBL (summary) • k-DT is a proper subset of k-DL. • DL is a proper subset of TBL. • The extra power of TBL comes from: • TBL has a current-state field. • Transformations are applied in sequence. • Results of previous transformations are visible to following transformations. • TBL transforms the training data; it does not split it. • TBL is a processor, not just a classifier.

  19. Case study

  20. TBL for POS tagging • The initial state-annotator: most common tag for a word. • The space of allowable transformations • Rewrite rules: change cur_tag from X to Y. • Triggering environments (feature types): unlexicalized or lexicalized

  21. Unlexicalized features • t-1 is z • t-1 or t-2 is z • t-1 or t-2 or t-3 is z • t-1 is z and t+1 is w • …

  22. Lexicalized features • w0 is w. • w-1 is w • w-1 or w-2 is w • t-1 is z and w0 is w. • …
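These trigger templates can be expressed as small predicate factories over a tagged sentence. A sketch, assuming tags and words are parallel lists and i is the current position (the helper names are illustrative):

```python
# Trigger-environment templates for POS tagging (illustrative names).
# tags/words are parallel lists; i is the current position.

def prev_tag_is(z):                  # unlexicalized: t-1 is z
    return lambda tags, words, i: i >= 1 and tags[i - 1] == z

def tag_in_prev2_is(z):              # unlexicalized: t-1 or t-2 is z
    return lambda tags, words, i: z in tags[max(0, i - 2):i]

def cur_word_is(w):                  # lexicalized: w0 is w
    return lambda tags, words, i: words[i] == w

def prev_tag_and_cur_word(z, w):     # lexicalized: t-1 is z and w0 is w
    return lambda tags, words, i: (i >= 1 and tags[i - 1] == z
                                   and words[i] == w)

tags = ["DT", "NN", "VBZ"]
words = ["the", "dog", "runs"]
assert prev_tag_is("DT")(tags, words, 1)
assert tag_in_prev2_is("DT")(tags, words, 2)
assert cur_word_is("runs")(tags, words, 2)
```

Instantiating each template for every tag z and word w seen in training is what makes the candidate space, and hence training, expensive.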

  23. TBL for POS tagging (cont) • The objective function: tagging accuracy • for comparing the corpus to the truth: • For choosing a transformation: choose the one that results in the greatest error reduction. • The order of applying transformations: left-to-right. • The results of applying transformations are not visible to other transformations.

  24. Learned transformations

  25. Experiments

  26. Summary

  27. Properties • Existence of initial annotator • Existence of current label: those labels are updated in each iteration. • Sequence labeling • Features can refer to the current label of any token in the sequence.

  28. Strengths of TBL • TBL is more powerful than DL and DT: • Existence of an initial annotator. • Transformations are applied in sequence. • Results of previous transformations are visible to following transformations. • Existence of a current label → it can handle dynamic problems well. • TBL is more than a classifier: • Classification problems: e.g., POS tagging • Other problems: e.g., parsing • TBL performs well because it minimizes (training) errors directly.

  29. Weaknesses of TBL • Learning can be expensive → various speed-up methods exist. • TBL is not probabilistic: it cannot produce top-N hypotheses or confidence scores.

  30. Additional slides

  31. DT is a proper subset of TBL • There exists a problem that can be solved by TBL but not by a DT, for a fixed set of primitive queries. • Ex: given a sequence of characters, classify each char based on its position: • If pos % 4 == 0 then “yes” else “no” • Input attributes available: the previous two chars

  32. Transformation list: • Label with S: A/S A/S A/S A/S A/S A/S A/S • If there is no previous character, then S → F: A/F A/S A/S A/S A/S A/S A/S • If the char two to the left is labeled F, then S → F: A/F A/S A/F A/S A/F A/S A/F • If the char two to the left is labeled F, then F → S: A/F A/S A/S A/S A/F A/S A/S • F → yes • S → no
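This transformation list can be run directly. With immediate, left-to-right application it labels exactly the characters at positions 0, 4, 8, … (0-indexed) as “yes”, which no fixed-depth DT over the previous two characters can do. A sketch of the slide's rules:

```python
def label_every_fourth(chars):
    """Apply the slide's transformation list with immediate,
    left-to-right effects."""
    n = len(chars)
    labels = ["S"] * n                  # 1. label with S
    if n > 0:
        labels[0] = "F"                 # 2. no previous char: S -> F
    for i in range(n):                  # 3. two-left is F: S -> F
        if i >= 2 and labels[i - 2] == "F" and labels[i] == "S":
            labels[i] = "F"
    for i in range(n):                  # 4. two-left is F: F -> S
        if i >= 2 and labels[i - 2] == "F" and labels[i] == "F":
            labels[i] = "S"
    return ["yes" if l == "F" else "no" for l in labels]  # 5-6.

print(label_every_fourth("AAAAAAA"))
# ['yes', 'no', 'no', 'no', 'yes', 'no', 'no']
```

Step 3 first marks every even position F; step 4 then, reading its own updates, unmarks every second F, leaving exactly the positions divisible by four.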

  33. DT is a subset of TBL • Label with S • If X then S → S' • S → S'' • L1' (renaming X as X') • L2' (renaming X as X'') • X' → X • X'' → X
