Unsupervised Models for Coreference Resolution Vincent Ng Human Language Technology Research Institute University of Texas at Dallas
Plan for the Talk • Supervised learning for coreference resolution • how and when supervised coreference research started • standard machine learning approach • Unsupervised learning for coreference resolution • self-training • EM clustering (Ng, 2008) • nonparametric Bayesian modeling (Haghighi and Klein, 2007) • three modifications
Machine Learning for Coreference Resolution • started in mid-1990s • Connolly et al. (1994), Aone and Bennett (1995), McCarthy and Lehnert (1995) • propelled by availability of annotated corpora produced by • Message Understanding Conferences (MUC-6/7: 1995, 1998) • English only • Automatic Content Extraction (ACE 2003, 2004, 2005, 2008) • English, Chinese, Arabic • identified as an important task for information extraction • identity coreference only
Identity Coreference • Identify the noun phrases (or mentions) that refer to the same real-world entity • Example: Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment... • Lots of prior work on supervised coreference resolution
Standard Supervised Learning Approach • Classification • a classifier is trained to determine whether two mentions are coreferent or not coreferent [Figure: the mention pairs in "[Queen Elizabeth] set about transforming [her] [husband], ..." each marked "coref?" or "not coref?"]
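The classification step operates on pairs of mentions. A minimal sketch of how those pairs might be enumerated, assuming every pair of mentions in the text becomes one instance (the slide does not say which pairs are actually used):

```python
from itertools import combinations

# Mentions from the example sentence, in textual order.
mentions = ["Queen Elizabeth", "her", "husband"]

# Assumption: each unordered pair of mentions is one instance
# for the pairwise coreference classifier.
pairs = list(combinations(mentions, 2))

for first, second in pairs:
    print(f"({first}, {second}) -> coref or not coref?")
```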
Standard Supervised Learning Approach • Clustering • coordinates possibly contradictory pairwise classification decisions [Figure: pairwise coref / not coref decisions for "[Queen Elizabeth] set about transforming [her] [husband] ..." are passed to a clustering algorithm, which outputs the clusters {Queen Elizabeth, her}, {husband, King George VI, the King, his}, {Logue, a renowned speech therapist}]
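One simple way to turn pairwise decisions into clusters is to take the transitive closure of the positive decisions. The sketch below does this with a union-find structure. It is only an illustration, not the clustering algorithm used in the talk, and the function name `cluster` and the list of positive pairs are assumptions made for the example.

```python
def cluster(mentions, coref_pairs):
    """Merge mentions linked by positive pairwise decisions (transitive closure)."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in coref_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())


mentions = ["Queen Elizabeth", "her", "husband", "King George VI",
            "the King", "his", "Logue", "a renowned speech therapist"]

# Hypothetical positive decisions from a pairwise classifier.
positive = [("Queen Elizabeth", "her"),
            ("husband", "King George VI"),
            ("King George VI", "the King"),
            ("the King", "his"),
            ("Logue", "a renowned speech therapist")]

clusters = cluster(mentions, positive)
for c in clusters:
    print(c)
```

With transitive closure, a contradictory set of decisions such as (A, B) coref, (B, C) coref, (A, C) not coref is resolved by putting all three mentions in one cluster, which is one reason more careful clustering algorithms are used in practice.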
Standard Supervised Learning Approach • Typically relies on a large amount of labeled data • What if we only have a small amount of annotated data?
First Attempt: Supervised Learning • train on whatever annotated data we have • need to specify • learning algorithm (Bayes) • feature set • clustering algorithm (Bell-tree)
The Bayes Classifier • finds the class value y (COREF or NOT COREF) that is the most probable given the feature vector x1,..,xn • finds y* such that y* = argmax_y P(y | x1,..,xn) = argmax_y P(y) P(x1,..,xn | y) • What features to use in the feature representation?
Linguistic Features • Use 7 linguistic features divided into 3 groups • E.g., for the mention pair (Barack Obama, president-elect), the feature value is (Name, Nominal)
The Bayes Classifier • finds the class value y (COREF or NOT COREF) that is the most probable given the feature vector x1,..,xn • finds y* such that y* = argmax_y P(y) P(x1,..,xn | y) • But we may have a data sparseness problem: P(x1,..,xn | y) is a joint distribution over all the feature values, so many value combinations will be rare or unseen in a small training set • Let's simplify this term! • assume that feature values from different groups are independent of each other given the class
The Bayes Classifier • with the independence assumption across feature groups, finds y* such that y* = argmax_y P(y) P(x1, x2, x3 | y) P(x4, x5, x6 | y) P(x7 | y) • P(y), P(x1, x2, x3 | y), P(x4, x5, x6 | y), and P(x7 | y) are the model parameters (to be estimated from annotated data using maximum likelihood estimation)
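The grouped Bayes classifier can be sketched in a few lines. The example below estimates P(y) and the three group-conditional distributions by relative frequency (unsmoothed maximum likelihood) and then picks the class with the highest product. The grouping of features 1-3, 4-6, and 7 follows the slides. The function names, the toy training instances, and the placeholder feature values ("a", "b", ...) are assumptions made for illustration, not the talk's actual features or data.

```python
from collections import Counter

# Feature indices per group: (x1, x2, x3), (x4, x5, x6), (x7).
GROUPS = [(0, 1, 2), (3, 4, 5), (6,)]


def train(instances):
    """Count classes and per-group value tuples; instances are (x, y) pairs."""
    class_counts = Counter(y for _, y in instances)
    group_counts = Counter()
    for x, y in instances:
        for g, idx in enumerate(GROUPS):
            group_counts[(g, tuple(x[i] for i in idx), y)] += 1
    return class_counts, group_counts


def predict(x, class_counts, group_counts):
    """Return the class maximizing P(y) * product over groups of P(group values | y)."""
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for y, n_y in class_counts.items():
        score = n_y / total  # MLE of P(y)
        for g, idx in enumerate(GROUPS):
            # MLE of P(group values | y); zero if never seen with class y.
            score *= group_counts[(g, tuple(x[i] for i in idx), y)] / n_y
        if score > best_score:
            best, best_score = y, score
    return best


# Hypothetical toy data with placeholder feature values.
data = [
    (("a", "a", "a", "b", "b", "b", "c"), "COREF"),
    (("a", "a", "a", "b", "b", "b", "d"), "COREF"),
    (("z", "z", "z", "b", "b", "b", "d"), "NOT COREF"),
    (("z", "z", "z", "y", "y", "y", "d"), "NOT COREF"),
]
class_counts, group_counts = train(data)
print(predict(("a", "a", "a", "b", "b", "b", "d"), class_counts, group_counts))
```

Because the estimates are unsmoothed, any group value tuple unseen with a class drives that class's score to zero, which illustrates the data sparseness problem that grouping is meant to reduce.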
The Bayes Classifier • Generative model: specifies how an instance is generated • Generate the class y with P(y) • Given y, generate x1, x2, and x3 with P(x1, x2, x3 | y) • Given y, generate x4, x5, and x6 with P(x4, x5, x6 | y) • Given y, generate x7 with P(x7 | y)
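The generative story can be read as a sampling procedure: draw the class first, then draw each feature group given the class. The sketch below follows those four steps. All probabilities and feature values are made-up placeholders, and `sample_instance` is a hypothetical helper name.

```python
import random


def sample_instance(prior, group_dists, rng):
    """Sample (x, y): first y from P(y), then each feature group from P(group | y)."""
    classes = list(prior)
    y = rng.choices(classes, weights=[prior[c] for c in classes])[0]
    x = []
    for dist in group_dists:
        values = list(dist[y])
        weights = [dist[y][v] for v in values]
        x.extend(rng.choices(values, weights=weights)[0])
    return tuple(x), y


# Made-up parameters for illustration only.
prior = {"COREF": 0.3, "NOT COREF": 0.7}
group_dists = [
    # P(x1, x2, x3 | y)
    {"COREF": {("a", "a", "a"): 0.8, ("z", "z", "z"): 0.2},
     "NOT COREF": {("a", "a", "a"): 0.1, ("z", "z", "z"): 0.9}},
    # P(x4, x5, x6 | y)
    {"COREF": {("b", "b", "b"): 1.0},
     "NOT COREF": {("b", "b", "b"): 0.5, ("y", "y", "y"): 0.5}},
    # P(x7 | y)
    {"COREF": {("c",): 0.5, ("d",): 0.5},
     "NOT COREF": {("d",): 1.0}},
]

rng = random.Random(0)
x, y = sample_instance(prior, group_dists, rng)
print(x, y)
```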
Bell-Tree Clustering (Luo et al., 2004) • searches for the most probable partition of a set of mentions • structures the search space as a Bell tree • root: [1] • after adding mention 2: [12], [1][2] • after adding mention 3: [123], [12][3] (children of [12]); [13][2], [1][23], [1][2][3] (children of [1][2]) • Leaves contain all the possible partitions of all of the mentions • Computationally infeasible to expand all nodes in the Bell tree
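The Bell tree above can be generated level by level: each node is a partial partition, and its children add the next mention either to one of the existing clusters or to a new cluster. The sketch below enumerates the leaves and shows how quickly their number grows. The `beam_search` function is one plausible way to avoid expanding every node, with the scoring function left as a caller-supplied assumption; it is not claimed to be the exact search procedure of Luo et al. (2004).

```python
def expand(partition, mention):
    """Children of a Bell-tree node: add mention to each cluster, or start a new one."""
    children = []
    for i in range(len(partition)):
        children.append(partition[:i] + [partition[i] + [mention]] + partition[i + 1:])
    children.append(partition + [[mention]])
    return children


def bell_tree_leaves(n):
    """All partitions of mentions 1..n, built by full expansion of the Bell tree."""
    level = [[[1]]]
    for m in range(2, n + 1):
        level = [child for node in level for child in expand(node, m)]
    return level


def beam_search(n, score, beam_width):
    """Keep only the beam_width best partial partitions at each level (assumed strategy)."""
    level = [[[1]]]
    for m in range(2, n + 1):
        candidates = [child for node in level for child in expand(node, m)]
        level = sorted(candidates, key=score, reverse=True)[:beam_width]
    return level[0]


for leaf in bell_tree_leaves(3):
    print(leaf)

# The number of leaves is the Bell number of n, which grows very quickly.
for n in (3, 5, 8, 10):
    print(n, len(bell_tree_leaves(n)))
```

In a real system the `score` argument would come from the learned model's probability of a partial partition; any function from a partition to a number works for experimenting with the search.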