
Unsupervised Models for Coreference Resolution


Presentation Transcript


  1. Unsupervised Models for Coreference Resolution Vincent Ng Human Language Technology Research Institute University of Texas at Dallas

  2. Plan for the Talk • Supervised learning for coreference resolution • how and when supervised coreference research started • standard machine learning approach

  3. Plan for the Talk • Supervised learning for coreference resolution • how and when supervised coreference research started • standard machine learning approach • Unsupervised learning for coreference resolution • self-training • EM clustering (Ng, 2008) • nonparametric Bayesian modeling (Haghighi and Klein, 2007) • three modifications

  4. Machine Learning for Coreference Resolution • started in mid-1990s • Connolly et al. (1994), Aone and Bennett (1995), McCarthy and Lehnert (1995) • propelled by availability of annotated corpora produced by • Message Understanding Conferences (MUC-6/7: 1995, 1998) • English only • Automatic Content Extraction (ACE 2003, 2004, 2005, 2008) • English, Chinese, Arabic

  5. Machine Learning for Coreference Resolution • started in mid-1990s • Connolly et al. (1994), Aone and Bennett (1995), McCarthy and Lehnert (1995) • propelled by availability of annotated corpora produced by • Message Understanding Conferences (MUC-6/7: 1995, 1998) • English only • Automatic Content Extraction (ACE 2003, 2004, 2005, 2008) • English, Chinese, Arabic • identified as an important task for information extraction • identity coreference only

  6. Identity Coreference • Identify the noun phrases (or mentions) that refer to the same real-world entity Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment...


  12. Identity Coreference • Identify the noun phrases (or mentions) that refer to the same real-world entity • Lots of prior work on supervised coreference resolution Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment...
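The desired output for the passage above can be pictured as entity clusters. A minimal sketch in Python (the cluster assignments follow the example passage; the set-of-sets representation itself is just illustrative):

```python
# Each entity is a set of coreferent mentions from the example passage.
entities = [
    {"Queen Elizabeth", "her"},
    {"husband", "King George VI", "the King", "his"},
    {"Logue", "a renowned speech therapist"},
]

def same_entity(m1, m2, entities):
    """Return True if two mentions belong to the same entity cluster."""
    return any(m1 in e and m2 in e for e in entities)

print(same_entity("her", "Queen Elizabeth", entities))  # True
print(same_entity("his", "Logue", entities))            # False
```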

  13. Standard Supervised Learning Approach • Classification • a classifier is trained to determine whether two mentions are coreferent or not coreferent

  14. Standard Supervised Learning Approach • Classification • a classifier is trained to determine whether two mentions are coreferent or not coreferent [Diagram: [Queen Elizabeth] set about transforming [her] [husband], ... with pairwise links labeled "coref?" / "not coref?"]

  15. Standard Supervised Learning Approach • Clustering • coordinates possibly contradictory pairwise coreference decisions [Diagram: pairwise coref / not coref decisions over [Queen Elizabeth], set about transforming [her] [husband] ... are fed to a clustering algorithm, which outputs the clusters {Queen Elizabeth, her}, {husband, King George VI, the King, his}, {Logue, a renowned speech therapist}]

  16. Standard Supervised Learning Approach • Clustering • coordinates possibly contradictory pairwise classification decisions [Diagram: pairwise coref / not coref decisions over [Queen Elizabeth], set about transforming [her] [husband] ... are fed to a clustering algorithm, which outputs the clusters {Queen Elizabeth, her}, {husband, King George VI, the King, his}, {Logue, a renowned speech therapist}]

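One common way to coordinate pairwise decisions (a simple stand-in here, not necessarily the clustering used in the talk, which is Bell-tree clustering) is to take the transitive closure of the positive pairs, e.g. with union-find:

```python
# Merge mentions connected by positive pairwise decisions (transitive closure).
# This is one simple clustering strategy for coordinating pairwise outputs.

def cluster(mentions, coref_pairs):
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for a, b in coref_pairs:
        parent[find(a)] = find(b)          # union the two clusters

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), set()).add(m)
    return list(clusters.values())

mentions = ["Queen Elizabeth", "her", "husband", "King George VI"]
pairs = [("Queen Elizabeth", "her"), ("husband", "King George VI")]
print(cluster(mentions, pairs))
```

Note that transitive closure cannot undo a contradictory positive link, which is one reason more global clustering schemes are attractive.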

  18. Standard Supervised Learning Approach • Typically relies on a large amount of labeled data What if we only have a small amount of annotated data?

  19. First Attempt: Supervised Learning • train on whatever annotated data we have • need to specify • learning algorithm • feature set • clustering algorithm

  20. First Attempt: Supervised Learning • train on whatever annotated data we have • need to specify • learning algorithm (Bayes) • feature set • clustering algorithm (Bell-tree)

  21. The Bayes Classifier • finds the class value y that is the most probable given the feature vector x1,..,xn

  22. The Bayes Classifier Coref, Not Coref • finds the class value y that is the most probable given the feature vector x1,..,xn

  23. The Bayes Classifier Coref, Not Coref • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that y* = argmax_y P(y | x1,..,xn) = argmax_y P(y) P(x1,..,xn | y)


  25. The Bayes Classifier Coref, Not Coref • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that What features to use in the feature representation?

  26. Linguistic Features • Use 7 linguistic features divided into 3 groups


  30. Linguistic Features • Use 7 linguistic features divided into 3 groups E.g., for the mention pair (Barack Obama, president-elect), the feature value is (Name, Nominal)
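The mention-type feature from this slide can be sketched as follows. The type lookup table is a toy assumption standing in for a real mention detector; the talk's full feature set has two more groups not shown here:

```python
# Toy sketch of the mention-type pair feature from the slide.
# MENTION_TYPES is a hypothetical stand-in for a real mention detector.
MENTION_TYPES = {
    "Barack Obama": "Name",
    "president-elect": "Nominal",
    "he": "Pronoun",
}

def mention_type_feature(m1, m2):
    """Return the (type1, type2) feature value for a mention pair."""
    return (MENTION_TYPES.get(m1, "Unknown"), MENTION_TYPES.get(m2, "Unknown"))

print(mention_type_feature("Barack Obama", "president-elect"))  # ('Name', 'Nominal')
```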

  31. The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that

  32. The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that But we may have a data sparseness problem

  33. The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that But we may have a data sparseness problem Let’s simplify this term!

  34. The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that But we may have a data sparseness problem Let’s simplify this term! • assume that feature values from different groups are independent of each other given the class

  35. The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that y* = argmax_y P(y) P(x1, x2, x3 | y) P(x4, x5, x6 | y) P(x7 | y)

  36. The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that These are the model parameters (to be estimated from annotated data using maximum likelihood estimation)

  37. The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that Generative model: specifies how an instance is generated

  38. Generate the class y with P(y) The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that Generative model: specifies how an instance is generated

  39. Generate the class y with P(y) Given y, generate x1, x2, and x3 with P(x1, x2, x3 | y) The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that Generative model: specifies how an instance is generated

  40. Generate the class y with P(y) Given y, generate x1, x2, and x3 with P(x1, x2, x3 | y) Given y, generate x4, x5, and x6 with P(x4, x5, x6 | y) The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that Generative model: specifies how an instance is generated

  41. Generate the class y with P(y) Given y, generate x1, x2, and x3 with P(x1, x2, x3 | y) Given y, generate x4, x5, and x6 with P(x4, x5, x6 | y) Given y, generate x7 with P(x7 | y) The Bayes Classifier COREF or NOT COREF • finds the class value y that is the most probable given the feature vector x1,..,xn • finds y* such that Generative model: specifies how an instance is generated
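The group-factored Bayes model described in slides 33 to 41 can be sketched as follows. Here the three feature groups are assumed conditionally independent given the class, while features inside a group stay joint; the training instances are a toy assumption, not the talk's corpus:

```python
from collections import Counter, defaultdict

# Group-factored Bayes classifier: feature groups are conditionally
# independent given the class; parameters are maximum-likelihood counts.

def train(instances):
    """instances: list of (y, (g1, g2, g3)), each g a feature-group tuple."""
    class_counts = Counter()
    group_counts = [defaultdict(Counter) for _ in range(3)]
    for y, groups in instances:
        class_counts[y] += 1
        for i, g in enumerate(groups):
            group_counts[i][y][g] += 1
    return class_counts, group_counts

def predict(class_counts, group_counts, groups):
    total = sum(class_counts.values())
    best_y, best_p = None, -1.0
    for y, cy in class_counts.items():
        p = cy / total                          # P(y)
        for i, g in enumerate(groups):
            p *= group_counts[i][y][g] / cy     # P(group_i | y), MLE
        if p > best_p:
            best_y, best_p = y, p
    return best_y

data = [
    ("COREF",     (("exact",),   ("same-gender",), ("close",))),
    ("COREF",     (("partial",), ("same-gender",), ("close",))),
    ("NOT_COREF", (("none",),    ("diff-gender",), ("far",))),
]
cc, gc = train(data)
print(predict(cc, gc, (("exact",), ("same-gender",), ("close",))))  # COREF
```

Unsmoothed MLE assigns zero probability to unseen feature-group values, which is exactly the data-sparseness concern the factoring is meant to mitigate.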

  42. First Attempt: Supervised Learning • train on whatever annotated data we have • need to specify • learning algorithm • feature set • clustering algorithm

  43. Bell-Tree Clustering (Luo et al., 2004) • searches for the most probable partition of a set of mentions • structures the search space as a Bell tree

  44. Bell-Tree Clustering (Luo et al., 2004) • searches for the most probable partition of a set of mentions • structures the search space as a Bell tree [Diagram: Bell tree with root node [1]]

  45. Bell-Tree Clustering (Luo et al., 2004) • searches for the most probable partition of a set of mentions • structures the search space as a Bell tree [Diagram: root [1] expands to children [12] and [1][2]]

  46. Bell-Tree Clustering (Luo et al., 2004) • searches for the most probable partition of a set of mentions • structures the search space as a Bell tree [Diagram: [1] expands to [12] and [1][2]; [12] expands to [123] and [12][3]]

  47. Bell-Tree Clustering (Luo et al., 2004) • searches for the most probable partition of a set of mentions • structures the search space as a Bell tree [Diagram: full Bell tree over three mentions, with leaves [123], [12][3], [13][2], [1][23], [1][2][3]]


  49. Bell-Tree Clustering (Luo et al., 2004) • searches for the most probable partition of a set of mentions • structures the search space as a Bell tree. Leaves contain all the possible partitions of all of the mentions. [Diagram: Bell tree with leaves [123], [12][3], [13][2], [1][23], [1][2][3]]

  50. Bell-Tree Clustering (Luo et al., 2004) • searches for the most probable partition of a set of mentions • structures the search space as a Bell tree. Leaves contain all the possible partitions of all of the mentions. Computationally infeasible to expand all nodes in the Bell tree. [Diagram: Bell tree with leaves [123], [12][3], [13][2], [1][23], [1][2][3]]
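The Bell-tree search space can be sketched as follows: at each level, the next mention either joins an existing cluster or starts a new one, so the leaves enumerate every partition. Exhaustive expansion, done here for three mentions, is exactly what is infeasible for real documents, which is why Luo et al. (2004) prune during search:

```python
# Sketch of the Bell-tree search space: each new mention either joins an
# existing cluster or starts its own, so leaves are all possible partitions.

def expand(partitions, mention):
    """One level of the Bell tree: add `mention` to every node."""
    children = []
    for part in partitions:
        for i in range(len(part)):                  # join an existing cluster
            children.append([c | {mention} if j == i else c
                             for j, c in enumerate(part)])
        children.append(part + [{mention}])         # start a new cluster
    return children

nodes = [[{1}]]                 # root: first mention in its own cluster
for m in (2, 3):
    nodes = expand(nodes, m)

print(len(nodes))  # 5 partitions of 3 mentions (the Bell number B(3))
```

The number of leaves grows as the Bell numbers, which are super-exponential in the number of mentions, so practical systems expand only the most promising nodes.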
