Prototype-Driven Learning for Sequence Models
Aria Haghighi and Dan Klein
Computer Science Division, University of California, Berkeley
Overview
• Traditional supervised learning starts from annotated data
• Prototype-driven learning starts from unlabeled data plus a prototype list: for each target label, a handful of prototypical words
Sequence Modeling Tasks
Information Extraction: Classified Ads
Target labels: Size, Restrict, Terms, Location, Features
Example ad: "Newly remodeled 2 Bdrms/1 Bath, spacious upper unit, located in Hilltop Mall area. Walking distance to shopping, public transportation, schools and park. Paid water and garbage. No dogs allowed."
Prototype List
Sequence Modeling Tasks
English POS Tagging
Target labels: PUNC, NN, VBN, CC, JJ, CD, IN, DET, NNS, NNP, RB
Example: "Newly remodeled 2 Bdrms/1 Bath, spacious upper unit, located in Hilltop Mall area. Walking distance to shopping, public transportation, schools and park. Paid water and garbage. No dogs allowed."
Prototype List
Generalizing Prototypes
• Tie each word to its most similar prototype: in "a witness reported", tie 'a' to 'the', 'witness' to 'president', 'reported' to 'said'
Generalizing Prototypes
y: DET NN VBD
x: a witness reported
Active features for 'reported': 'reported' ∧ VBD, suffix-2='ed' ∧ VBD, sim='said' ∧ VBD
Weights: 'reported' → VBD = 0.35, suffix-2='ed' → VBD = 0.23, sim='said' → VBD = 0.35
Markov Random Fields for Unlabeled Data
Markov Random Fields
y: DET NN VBD
x: a witness reported
x: input sentence, y: hidden labels
Markov Random Fields
y: DET NN VBD
x: a witness reported
Active features for 'a': 'a' ∧ DET, sim='the' ∧ DET, suffix-1='a' ∧ DET
Weights: 'a' → DET = 1.5, sim='the' → DET = 0.75, suffix-1='a' → DET = 0.15
Markov Random Fields
y: DET NN VBD
x: a witness reported
Active features for 'witness': 'witness' ∧ NN, sim='president' ∧ NN, suffix-2='ss' ∧ NN
Weights: 'witness' → NN = 0.35, sim='president' → NN = 0.35, suffix-2='ss' → NN = 0.23
Markov Random Fields
y: DET NN VBD
x: a witness reported
Active features for 'reported': 'reported' ∧ VBD, sim='said' ∧ VBD, suffix-2='ed' ∧ VBD
Weights: 'reported' → VBD = 0.35, sim='said' → VBD = 0.35, suffix-2='ed' → VBD = 0.23
Markov Random Fields
y: DET NN VBD
x: a witness reported
Transition feature: DET → NN → VBD
Weights: DET → NN → VBD = 1.15
Markov Random Fields
y: DET NN VBD
x: a witness reported
All active features: DET → NN → VBD, 'a' → DET, suffix-1='a' → DET, 'witness' → NN, suffix-1='s' → NN, suffix-2='ed' → VBD
score(x, y) = exp(θᵀ f(x, y))
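As a concrete illustration, here is a minimal Python sketch of this score. The feature names and weight values are illustrative, and it uses bigram transitions and suffix-1 features for brevity where the actual system uses a trigram tagger with richer features.

```python
import math
from collections import Counter

def features(words, labels):
    """Count active node and transition features for a labeled sentence."""
    f = Counter()
    for w, t in zip(words, labels):
        f[(w, t)] += 1                      # word identity, e.g. 'a' -> DET
        f[("suffix-1=" + w[-1], t)] += 1    # suffix feature
    for prev, cur in zip(labels, labels[1:]):
        f[(prev, cur)] += 1                 # transition, e.g. DET -> NN
    return f

def score(theta, words, labels):
    """score(x, y) = exp of the dot product of weights and feature counts."""
    return math.exp(sum(theta.get(k, 0.0) * v
                        for k, v in features(words, labels).items()))

# Example with weights echoing the slides (values illustrative only):
theta = {("a", "DET"): 1.5, ("suffix-1=a", "DET"): 0.15,
         ("witness", "NN"): 0.35, ("reported", "VBD"): 0.35,
         ("DET", "NN"): 0.6, ("NN", "VBD"): 0.55}
print(score(theta, ["a", "witness", "reported"], ["DET", "NN", "VBD"]))
```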
Markov Random Fields
• Joint probability model: p(x, y) = score(x, y) / Z(θ)
• Partition function: Z(θ) = Σ_{x,y} score(x, y), a sum over infinitely many inputs!
Objective Function
• Given unlabeled sentences {x₁, …, xₙ}, choose θ to maximize the log-likelihood L(θ) = Σᵢ log Σ_y p(xᵢ, y; θ)
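Written out (a standard exponential-family derivation, consistent with the definitions above), the objective and its gradient are:

```latex
\mathcal{L}(\theta) \;=\; \sum_{i=1}^{n} \log \sum_{y} p(x_i, y; \theta)

\nabla_{\theta}\,\mathcal{L}(\theta) \;=\; \sum_{i=1}^{n}
  \Big( \mathbb{E}_{p(y \mid x_i;\,\theta)}\big[f(x_i, y)\big]
  \;-\; \mathbb{E}_{p(x, y;\,\theta)}\big[f(x, y)\big] \Big)
```

These two terms are the "first" and "second" expectations named on the next slide.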
Optimization
• The gradient requires two expectations (see the sketch below)
• First expectation: over labelings y of each observed sentence, computed with the Forward-Backward algorithm
• Second expectation: over all inputs and labelings; for a fixed input length, computed with Forward-Backward over a lattice whose positions range over the entire vocabulary (a, the, of, in, …)
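A minimal sketch of the forward pass behind the first expectation, assuming log-potential functions node_pot(word, tag) and trans_pot(prev, tag) (hypothetical names). A real implementation also runs the backward pass to obtain per-position marginals and uses a numerically stable log-sum-exp.

```python
import math

def forward_logsum(words, tags, node_pot, trans_pot):
    """log of the sum over all tag sequences y of score(words, y),
    where node_pot and trans_pot return log-potentials."""
    # alpha[t] = log-sum of scores of all prefixes ending in tag t
    alpha = {t: node_pot(words[0], t) for t in tags}
    for w in words[1:]:
        alpha = {t: math.log(sum(math.exp(alpha[s] + trans_pot(s, t))
                                 for s in tags)) + node_pot(w, t)
                 for t in tags}
    return math.log(sum(math.exp(a) for a in alpha.values()))
```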
Partition Function
• Length lattice: compute the sum for each fixed length
• Lattice Forward-Backward [Smith & Eisner 05]
• Approximation: truncate the sum over lengths to a finite maximum (sketch below)
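A minimal sketch of this truncated approximation: it reuses the forward-pass idea, but sums word potentials over the whole vocabulary at each lattice position, then sums the per-length results up to a cap. Function and parameter names are hypothetical.

```python
import math

def log_partition_fixed_length(length, vocab, tags, node_pot, trans_pot):
    """log sum of score(x, y) over all sentences x of the given length
    and all labelings y: a forward pass over the (word, tag) lattice."""
    alpha = {t: math.log(sum(math.exp(node_pot(w, t)) for w in vocab))
             for t in tags}
    for _ in range(length - 1):
        alpha = {t: math.log(sum(math.exp(alpha[s] + trans_pot(s, t))
                                 for s in tags))
                    + math.log(sum(math.exp(node_pot(w, t)) for w in vocab))
                 for t in tags}
    return math.log(sum(math.exp(a) for a in alpha.values()))

def log_partition_truncated(max_len, vocab, tags, node_pot, trans_pot):
    """Truncate the infinite sum over input lengths at max_len."""
    logs = [log_partition_fixed_length(L, vocab, tags, node_pot, trans_pot)
            for L in range(1, max_len + 1)]
    m = max(logs)  # max-trick for a stable log-sum-exp over lengths
    return m + math.log(sum(math.exp(v - m) for v in logs))
```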
English POS Experiments
• Data: 193K tokens (about 8K sentences) of the WSJ portion of the Penn Treebank
• Features [Smith & Eisner 05]: trigram tagger; word type, suffixes up to length 3, contains-hyphen, contains-digit, initial capitalization
English POS Experiments: BASE
• Fully unsupervised
• Random initialization
• Greedy label remapping
English POS Experiments
• Prototype list: 3 prototypes per tag
• Automatically extracted by frequency (see the sketch below)
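One way this extraction could look, as a sketch: for each tag, keep the words occurring most often with that tag. The data format and selection details here are assumptions, not the paper's exact procedure.

```python
from collections import Counter, defaultdict

def extract_prototypes(tagged_corpus, per_tag=3):
    """tagged_corpus: iterable of (word, tag) pairs (hypothetical format).
    Returns the per_tag most frequent words for each tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[tag][word] += 1
    return {tag: [w for w, _ in c.most_common(per_tag)]
            for tag, c in counts.items()}

# e.g. extract_prototypes(corpus) -> {'DET': ['the', 'a', 'an'], ...}
```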
English POS Distributional Similarity
• Judge a word by the company it keeps: count the words at context positions −1 and +1, e.g. in "<s> the president said a downturn is near </s>"
• Collect context counts from 40M words of WSJ
• Similarity [Schuetze 93]: SVD dimensionality reduction, cosine similarity measure (sketch below)
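A minimal sketch of the pipeline, with the ±1 window from the slide; the SVD rank and helper names are illustrative choices.

```python
import numpy as np

def context_vectors(sentences, rank=50):
    """Map each word to a reduced left/right context-count vector."""
    words = sorted({w for s in sentences for w in s} | {"<s>", "</s>"})
    idx = {w: i for i, w in enumerate(words)}
    M = np.zeros((len(words), 2 * len(words)))   # [-1 block | +1 block]
    for s in sentences:
        padded = ["<s>"] + list(s) + ["</s>"]
        for i in range(1, len(padded) - 1):
            w, left, right = padded[i], padded[i - 1], padded[i + 1]
            M[idx[w], idx[left]] += 1
            M[idx[w], len(words) + idx[right]] += 1
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    R = U[:, :rank] * S[:rank]                   # truncated-SVD embedding
    return {w: R[idx[w]] for w in words}

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
```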
English POS Experiments: PROTO+SIM
• Add similarity features: the top five most similar prototypes that exceed a threshold (see the sketch below)
• 67.8% on non-prototype accuracy
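A sketch of how such features might be generated from the context vectors in the previous sketch; the threshold value and names are illustrative.

```python
import numpy as np

def sim_features(word, vectors, prototypes, k=5, threshold=0.35):
    """Fire a sim=<prototype> indicator for the top-k most similar
    prototypes whose cosine with the word exceeds the threshold."""
    cos = lambda u, v: float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    scored = sorted(((cos(vectors[word], vectors[p]), p)
                     for p in prototypes if p in vectors), reverse=True)
    return ["sim=" + p for s, p in scored[:k] if s > threshold]

# e.g. sim_features("reported", vectors, ["said", "the", "president"])
# might return ["sim=said"]
```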
English POS Transition Counts
[Figure: learned vs. target transition-count structure]
Classified Ads Experiments
• Data: 100 ads (about 119K tokens) from [Grenager et al. 05]
• Features: trigram tagger, word type
Classified Ads Experiments: BASE
• Fully unsupervised
• Random initialization
• Greedy label remapping
Classified Ads Experiments
• Prototype list: 3 prototypes per tag, 33 words in total
• Automatically extracted by frequency
Classified Ads Distributional Similarity
• Different from English POS: context is a wider window around the word (e.g. "walking distance to shopping , public transportation"), not just positions −1/+1
• Similar to a topic model
Classified Ads Experiments: PROTO+SIM
• Add similarity features
Reacting to Observed Errors
• Observed confusions between Location and Terms around field boundaries, e.g. "schools and park . Paid water and"
• Boundary model: augment the prototype list
Classified Ads Experiments: BOUND
• Add boundary field
Information Extraction Transition Counts
[Figure: target vs. learned transition-count structure]
Conclusion
• Prototype-driven learning: a novel, flexible, weakly supervised learning framework
• Merges distributional clustering techniques with supervised structured models
Thanks! Questions?
English POS Experiments: PROTO
• Fix prototypes to their tag: no random initialization, no label remapping
• Accuracy: BASE 41.3% vs. PROTO 68.8% (47.7% on non-prototype accuracy)
Classified Ads Experiments: PROTO
• Fix prototypes to their tag: no random initialization, no label remapping
• Accuracy: BASE 46.4% vs. PROTO 53.7%
Objective Function
• Sum over hidden labels for an observed sentence (e.g. "a witness reported"): Forward-Backward algorithm
Objective Function
• The partition function is an infinite sum over inputs of all lengths
• Can be computed exactly under certain conditions
English POS Distributional Similarity
• Collect context counts from the BLLIP corpus
• Similarity [Schuetze 93]: SVD dimensionality reduction, cosine between context vectors