1 / 39

Prototype-Driven Learning for Sequence Models

Prototype-Driven Learning for Sequence Models. Aria Haghighi and Dan Klein Computer Science Division University of California Berkeley. Overview. Target Label. Prototypes. Target Label. Prototypes. Annotated Data. Unlabeled Data. Prototype List. +.

juro
Download Presentation

Prototype-Driven Learning for Sequence Models

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Prototype-Driven Learningfor Sequence Models Aria Haghighi and Dan Klein Computer Science Division University of California Berkeley

  2. Overview Target Label Prototypes Target Label Prototypes Annotated Data Unlabeled Data Prototype List +

  3. Sequence Modeling Tasks Size Restrict Terms Location Features Information Extraction: Classified Ads Newly remodeled2 Bdrms/1 Bath, spacious upper unit, located in Hilltop Mall area. Walking distance to shopping, public transportation, schools and park.Paid water and garbage.No dogs allowed. Newly remodeled 2 Bdrms/1 Bath, spacious upper unit, located in Hilltop Mall area. Walking distance to shopping, public transportation, schools and park. Paid water and garbage. No dogs allowed. Prototype List

  4. Sequence Modeling Tasks PUNC NN VBN CC JJ CD IN DET NNS IN NNP RB English POS Newly remodeled 2Bdrms/1Bath,spacious upper unit,locatedin Hilltop Mallarea.Walkingdistance toshopping, public transportation,schoolsandpark.Paid water andgarbage. Nodogs allowed. Newly remodeled 2 Bdrms/1 Bath, spacious upper unit, located in Hilltop Mall area. Walking distance to shopping, public transportation, schools and park. Paid water and garbage. No dogs allowed. Prototype List

  5. Generalizing Prototypes awitness reported awitness reported said the president • Tie each word to its most similar prototype

  6. Generalizing Prototypes DET NN VBD y: a witness reported x: ‘reported’ & VBD suffix-2=‘ed’ & VBD sim=‘said’ & VBD Weights ‘reported’ Æ VBD = 0.35 suffix-2=‘ed’ Æ VBD= 0.23 sim=‘said’ Æ VBD = 0.35

  7. Markov Random Fields for Unlabeled Data

  8. Markov Random Fields DET NN VBD y: a witness reported x: x: input sentence y: hidden labels

  9. Markov Random Fields DET NN VBD y: a witness reported x: ‘a’ Æ DET sim=‘the’ Æ DET suffix-1=‘a’ Æ DET Weights sim=‘the’ Æ DET = 0.75 ‘a’ Æ DET = 1.5 suffix-1=‘a’ Æ DET = 0.15

  10. Markov Random Fields DET NN VBD y: a witness reported x: ‘witness’ Æ NN sim=‘president’ Æ NN suffix-2=‘ss’ Æ NN Weights  ‘witness’ Æ NN = 0.35 sim=‘president’ Æ NN = 0.35 suffix-2=‘ss’ Æ NN= 0.23

  11. Markov Random Fields DET NN VBD y: a witness reported x: ‘reported’ Æ VBD sim=‘said’ Æ VBD suffix-2=‘ed’ Æ VBD Weights  ‘reported’ Æ VBD= 0.35 sim=‘said’ Æ VBD = 0.35 suffix-2=‘ed’ Æ ed = 0.23

  12. Markov Random Fields DET NN VBD y: DET Æ NN Æ VBD a witness reported x: Weights DET Æ NN Æ VBD = 1.15

  13. Markov Random Fields DET NN VBD y: a witness reported x: DET Æ NN Æ VBD ‘witness Æ NN suffix-2=‘ed’ Æ VBD ‘suffix-1=‘s’ Æ NN suffix-1=‘a’ Æ DET ‘a’ Æ DET score(x,y) = exp(T )

  14. Markov Random Fields • Joint Probability Model p(x,y) = score(x,y) / Z() • Partition Function Z() = x,yscore(x,y) Sum over infinite inputs!

  15. Objective Function • Given unlabeled sentences {x1,…,xn} choose  to maximize

  16. Optimization First Expectation Second Expectation ? ? ? ? ? ? • Forward Backward Algorithm • For fixed input length, Forward Backward Algorithm for Lattices a the of in …… witness the of in …… reported the of in ……

  17. Partition Function ? ? ? ? ? ? ? ? ? + + the In … the In … the In … the In … the of In … the of In … the In … the In … the of In … • Length Lattice • Compute sum for fixed length • Lattice Forward Backward [Smith & Eisner 05] • Approximation • Truncate to finite length

  18. Experiments

  19. English POS Experiments • Data • 193K tokens (about 8K sentences) of WSJ portion of Penn Treebank • Features [Smith & Eisner 05] • Trigram tagger • Word type, suffixes up to length 3, contains hyphen, contains digit, initial capitalization

  20. English POS Experiments BASE • Fully Unsupervised • Random initialization • Greedy label remapping

  21. English POS Experiments • Prototype List • 3 prototypes per tag • Automatically extracted by frequency

  22. English POS Distributional Similarity -1 +1 • Judge a word by the company it keeps <s> the president said a downturn is near </s> • Collect context counts from 40M words of WSJ • Similarity [Schuetze 93] • SVD dimensionality reduction • cos() similarity measure

  23. English POS Experiments PROTO+SIM • Add similarity features • Top five most similar prototypes that exceed threshold 67.8% on non-prototype accuracy

  24. English POS Transition Counts Learned Structure Target Structure

  25. Classified Ads Experiments • Data • 100 ads (about 119K tokens) from [Grenager et. al. 05] • Features • Trigram tagger • Word type

  26. Classified Ads Experiments BASE • Fully Unsupervised • Random initialization • Greedy label remapping

  27. Classified Ads Experiments • Prototype List • 3 prototypes per tag • 33 words in total • Automatically extracted by frequency

  28. Classified Ads Distributional Similarity <s> the president said a downturn is near </s> -1 +1 walking distance to shopping , public transportation • Different from English POS • Similar to topic model

  29. Classified Ads Experiments • Add similarity features PROTO + SIM

  30. Reacting to observed errors Location Terms • Boundary Model • Augment Prototype List schools and park . Paid water and schools and park .Paid water and schools and park .Paid water and

  31. Classified Ads Experiments BOUND • AddBoundary field

  32. Information Extraction Transition Counts Target Structure Learned Structure

  33. Conclusion • Prototype-Driven learning • Novel flexible weakly-supervised learning framework • Merged distributional clustering techniques with supervised structured models

  34. Thanks! Questions?

  35. English POS Experiments 47.7% on non-prototype accuracy 41.3 BASE 68.8 PROTO 40 50 60 70 Accuracy • Fix Prototypes to their tag • No random initialization • No remapping PROTO

  36. Classified Ads Experiments 46.4 BASE 53.7 PROTO 40 50 60 70 80 • Fix Prototypes to their tag • No random initialization • No remapping PROTO Accuracy

  37. Objective Function ? ? ? a witness reported • Sum over hidden labels • Forward-Backward Algorithm

  38. Objective Function ? ? ? ? ? ? …… + + ? ? ? ? ? ? • Infinite sum over all lengths of input • Can be computed exactly under certain conditions

  39. English POS Distributional Similarity • Collect context counts form BLIPP corpus • Similarity [Schuetze 93] • SVD dimensionality reduction • cos() between context vectors

More Related