
Unsupervised Constraint Driven Learning for Transliteration Discovery


Presentation Transcript


  1. Unsupervised Constraint Driven Learning for Transliteration Discovery M. Chang, D. Goldwasser, D. Roth, and Y. Tu

  2. What I am going to do today… • Goal 1: Present the transliteration work • Get feedback! • Goal 2: Think about this work with CCM • Tutorial… ☺ • I will try to present this work in a slightly different way • Some of these are my personal comments • Different from our discussion yesterday • Please give us comments on this • Make this work more general (not only transliteration)

  3. Wait a sec! What is CCM? • I have gotten this question 100 times already! • Informal answer: • everything that uses constraints is CCM! ☺ • Formal answer: • a plain model has no constraints in its objective • a CCM adds them at prediction time (see the sketch below) • We do not define the training method • Definition: a CCM makes predictions with constraints!
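For readers who want the formula: a minimal sketch of the usual CCM objective, following the notation of Chang, Ratinov, and Roth (the exact form on the slide did not survive transcription):

```latex
% w: model weights; phi: feature functions; each constraint C_k is
% penalized by rho_k times d, a distance measuring its degree of violation.
\[
  y^{*} \;=\; \arg\max_{y}\; \mathbf{w}^{\top}\phi(x,y)
        \;-\; \sum_{k} \rho_{k}\, d\big(y,\, 1_{C_{k}}(x)\big)
\]
```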

  4. Constraint-Driven Learning • Why constraints? • The goal: building a good system easily • We have prior knowledge at hand • Why not inject that knowledge directly? • How useful are constraints? • Useful for supervised learning [Yih and Roth 04] [many others] • Useful for semi-supervised learning [Chang et al. ACL 2007] • Sometimes more efficient than labeling data directly

  5. Unsupervised Constraint-Driven Learning • In this work • We do not use any labeled instances • We achieve good performance, competitive with several supervised models • Compared to [Chang et al. ACL 2007] • In ACL 07, they used a small amount of labeled data (5-20 examples) • Reason: bad models cannot benefit from constraints! • For some applications, we have very good resources • We do not need labeled instances at all!

  6. In a nutshell: traditional semi-supervised learning. [Diagram: learn from labeled data → model → prediction → label unlabeled data → feedback, over the unlabeled data; in the unsupervised setting, a resource replaces the labeled data.] The model can drift from the correct one.

  7. In a nutshell: CODL improves a "simple" model using expressive constraints. CODL uses constraints to generate better training samples in unsupervised learning. [Diagram: model → prediction + constraints → more accurate labeling of the unlabeled data → feedback → better model.]

  8. Outline • Constraint-Driven Learning (CoDL) • Transliteration Discovery • Algorithm • Experimental Results

  9. Transliteration Generation (not our focus) • Given a source word, what is its target transliteration? • Bush → 布希 • Sushi → 壽司 • Issues • Ambiguity: • For the same source word, there are many different transliterations • Think about Chinese • What we want: find the most widely used transliteration

  10. Transliteration Discovery (our focus) • Problem setting • Given two lists of words, map them! • Advantages • A relatively easy problem • Can find the most widely used transliteration • Assumptions: • Source: English • Each source entity has a transliteration among the target candidates • Target candidates might not be named entities

  11. Outline • Constraint-Driven Learning (CoDL) • Transliteration Discovery • Algorithm • Experimental Results

  12. Algorithm Outline • Prediction Model • How do we use an existing resource to construct the model? • Constraints? • Learning Algorithm

  13. The Prediction Model • How do we make predictions? • Given a source word, how do we predict the best target? • Model 1: (v_s, v_t) → Yes or No • Issue: not many obvious constraints can be added • Not a structured prediction problem • Model 2: (v_s, v_t) → hidden variables → Yes or No • Predicting F (the hidden character alignment) is a structured prediction problem • We can add constraints more easily

  14. The Prediction Model [Slide shows the scoring function for a pair, a CCM formulation with a slightly different scoring function; callouts mark the hidden variables and the constraint-violation terms. More on this in the next few slides.]
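The formula itself was lost in transcription; as a sketch of what the slide likely shows (my reconstruction, not the paper's exact notation), the score of a pair maximizes over hidden character alignments that satisfy the constraints:

```latex
% F: a hidden set of character-mapping features; C(v_s, v_t): the set of
% alignments allowed by the constraints; w_f: the weight of feature f.
\[
  \mathrm{score}(v_s, v_t) \;=\; \max_{F \,\in\, C(v_s, v_t)} \; \sum_{f \in F} w_f
\]
```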

  15. Prediction Model: Another View • The scoring function looks like weights times features! • If there is a bad feature, the score goes to −∞ • Our hidden variables (feature vectors): • Character mappings
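A minimal sketch of that "weights times features" view in Python (the names are hypothetical; a disallowed character pair contributes −∞ and kills the whole score):

```python
def alignment_score(alignment, weights):
    """Score a hidden alignment: a list of (src_char, tgt_char) pairs.

    A pair missing from `weights` acts as a "bad feature": it contributes
    -inf, so any alignment that uses it scores -inf overall.
    """
    return sum(weights.get(pair, float("-inf")) for pair in alignment)
```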

  16. [Slide shows the space of all possible character-mapping features ("everything"): (a,a), (o,O), (w,_), …]

  17. Algorithm Outline • Prediction Model • How do we use an existing resource to construct the model? • Constraints? • Learning Algorithm

  18. Resource: Romanization Table • Hebrew, Russian • How can you type Hebrew or Russian? • Use an English keyboard: C maps to • a similar character, "C" or "S", in Hebrew or Russian • Very easy to get • Ambiguous • Special case: Chinese (Pin Yin) • 壽司 → shòu sī (low ambiguity) • Map Pin-Yin to English (sushi) • Romanization table? [Slide shows an example table entry: a → a]
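As a toy illustration (the entries below are made up for this sketch, not taken from the paper), such a table is just an ambiguous character-to-characters map:

```python
# Toy romanization table for Russian; the entries are illustrative only.
romanization_table = {
    "c": ["с", "к"],  # Latin "c" could stand for the s-sound or the k-sound
    "s": ["с"],
    "a": ["а"],
}
```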

  19. Initialize the Table • Every character pair in the Romanization Table: • weight = 0 • Everything else: −1 • There could be better ways to do the initialization • Note: all pairs (v_s, v_t) will get score zero without constraints
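A minimal sketch of this initialization, assuming the table is given as a set of character pairs (names are hypothetical):

```python
from collections import defaultdict

def init_weights(romanization_pairs):
    """Weight 0 for pairs in the romanization table, -1 for everything else."""
    weights = defaultdict(lambda: -1.0)   # everything else: -1
    for pair in romanization_pairs:
        weights[pair] = 0.0               # table pairs: 0
    return weights

weights = init_weights([("a", "a"), ("s", "s")])  # toy table
```

Since every table pair scores exactly zero, any alignment built only from table pairs also scores zero, which is why constraints are needed to break ties, as the slide notes.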

  20. Algorithm Outline • Prediction Model • How do we use an existing resource to construct the model? • Constraints? • Learning Algorithm

  21. Constraints • General constraints • Coverage: all characters need to be mapped at least once • No crossing: character mappings cannot cross each other • Language-specific constraints • General restricted mappings • Initial restricted mappings • Length restrictions • (A sketch of checking the two general constraints follows below.)
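A hedged sketch of the two general constraints, for an alignment given as index pairs (this is my formulation; the paper's exact encoding may differ):

```python
def satisfies_general_constraints(alignment, src, tgt):
    """alignment: list of (i, j) position pairs; j may be None (null mapping)."""
    # Coverage: every source and every target position appears in some pair.
    if {i for i, _ in alignment} != set(range(len(src))):
        return False
    if {j for _, j in alignment if j is not None} != set(range(len(tgt))):
        return False
    # No crossing: sorted by source position, target positions never decrease.
    mapped = sorted((i, j) for i, j in alignment if j is not None)
    targets = [j for _, j in mapped]
    return targets == sorted(targets)
```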

  22. Constraints [Slide shows examples of language-specific restricted-mapping constraints for Pin-Yin to English.] Many other works use similar information as well!

  23. Algorithm Outline • Prediction Model • How do we use an existing resource to construct the model? • Constraints? • Learning Algorithm

  24. High-Level Overview • Model ← Resource • While not converged: • Use the model + constraints to get labels (for both F and y) • Update the model with the newly labeled F and y (without constraints; details on the next slide) • Similar to ACL 07: • Update the model without constraints • Different from ACL 07: • We get feedback from the labels of both the hidden variables and the output • (A sketch of this loop follows below.)
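A high-level sketch of this loop in Python (the inference and learning routines are passed in as stand-ins; all names are hypothetical):

```python
def train(initial_model, unlabeled_pairs, constrained_inference, learn, n_iters=10):
    """constrained_inference(x, model) -> (F, y); learn(examples) -> model."""
    model = initial_model                            # model built from the resource
    for _ in range(n_iters):                         # "while not converged"
        labeled = []
        for x in unlabeled_pairs:
            F, y = constrained_inference(x, model)   # labels via model + constraints
            labeled.append((x, F, y))                # feedback from both F and y
        model = learn(labeled)                       # retrain without the constraints
    return model
```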

  25. Training [Slide shows the training algorithm: predict the hidden variables and the labels, then update the model.]

  26. Outline • Constraint-Driven Learning (CoDL) • Transliteration Discovery • Algorithm • Experimental Results

  27. Experimental Setting • Evaluation • ACC: the top candidate is (one of) the right answer(s) • Learning algorithm • Linear SVM with C = 0.5 • Datasets (source:target list sizes) • English-Hebrew 300:300 • English-Chinese 581:681 • English-Russian 727:50648 (the target list includes all words)
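Purely illustrative (the slide does not say which SVM implementation was used): configuring a linear SVM with C = 0.5 in scikit-learn, for instance, would look like this:

```python
from sklearn.svm import LinearSVC

clf = LinearSVC(C=0.5)  # C = 0.5 as in the slide; smaller C = stronger regularization
```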

  28. Results - Hebrew

  29. Results - Russian

  30. Analysis [Callouts on a results chart; a small Russian subset was used here.] 1) Without constraints (on the features), the Romanization Table is useless! 2) General constraints are more important! 3) Learning has a great impact here, but constraints are very important, too! 4) Better constraints lead to better final results.

  31. Related Work (need more work here) • Learning the score for edit distance • Previous transliteration work • Machine translation?

  32. Conclusion • ML: an unsupervised constraint-driven algorithm • Use hidden variables to find more constraints (e.g. co-ref) • Use constraints to find a "cleaner" feature representation • Transliteration: • Use of the Romanization Table as the starting point • We can get good results without training data • The right constraints (modeling) are the key • Future work • Transliteration model: better model, quicker inference • CoDL: other applications for unsupervised CoDL

  33. Constraint-Driven Learning (CODL)
    γ = learn(Tr)                 // any supervised learning algorithm, parameterized by γ
    For N iterations do:
      T = ∅
      For each x in the unlabeled dataset:
        y = Inference(x, C, γ)    // any inference algorithm (with constraints)
        T = T ∪ {(x, y)}          // augmenting the training set (feedback)
      γ = α·γ + (1 − α)·learn(T)  // learn from the new training data; weight the
                                  // supervised and unsupervised models (Nigam 2000)

  34. Unsupervised Constraint-Driven Learning
    γ = Construct(Resource)       // construct the model with resources
    For N iterations do:
      T = ∅
      For each x in the unlabeled dataset:
        y = Inference(x, C, γ)    // any inference algorithm (with constraints)
        T = T ∪ {(x, y)}          // augmenting the training set (feedback)
      γ = α·γ + (1 − α)·learn(T)  // learn from the new training data; α = 0 in this work
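A minimal sketch of that interpolated update in Python (weights as a plain list; with α = 0 it reduces to plain retraining, as the slide notes):

```python
def codl_update(gamma_old, gamma_new, alpha=0.0):
    """gamma = alpha * gamma_old + (1 - alpha) * gamma_new, elementwise."""
    return [alpha * g0 + (1 - alpha) * g1 for g0, g1 in zip(gamma_old, gamma_new)]

# With alpha = 0 (this work), the old model is discarded each iteration:
assert codl_update([1.0, 2.0], [0.5, 0.5], alpha=0.0) == [0.5, 0.5]
```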
