
Conditional Random Fields




  1. Conditional Random Fields Advanced Statistical Methods in NLP Ling 572 February 9, 2012

  2. Roadmap • Graphical Models • Modeling independence • Models revisited • Generative & discriminative models • Conditional random fields • Linear chain models • Skip chain models

  6. Preview • Conditional random fields • Undirected graphical model • Due to Lafferty, McCallum, and Pereira, 2001 • Discriminative model • Supports integration of rich feature sets • Allows a range of dependency structures • Linear-chain, skip-chain, general • Can encode long-distance dependencies • Used in diverse NLP sequence labeling tasks: • Named entity recognition, coreference resolution, etc.

  7. Graphical Models

  11. Graphical Models • Graphical model: a simple, graphical notation for conditional independence • Probabilistic model where: • Graph structure denotes conditional independence between random variables • Nodes: random variables • Edges: dependency relations between random variables • Model types: • Bayesian Networks • Markov Random Fields

  15. Modeling (In)dependence • Bayesian network • Directed acyclic graph (DAG) • Nodes = random variables • Arc ~ directly influences, conditional dependency • Arcs = child depends on parent(s) • No incoming arcs = independent (0 incoming: only a priori probabilities) • Parents of X = nodes with arcs into X • For each X, need P(X | parents(X))
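Putting these local conditionals together yields the standard Bayesian network factorization (the MCBN1 instance of which appears on slide 38 below): the joint distribution is the product of each variable's conditional given its parents,

P(X1, ..., Xn) = P(X1 | parents(X1)) * P(X2 | parents(X2)) * ... * P(Xn | parents(Xn))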

  16. Example I (Russell & Norvig, AIMA) [figure]

  23. Simple Bayesian Network • MCBN1 [graph: A → B, A → C, B → D, C → D, C → E]

  Need:        Truth table size:
  P(A)         2        (A: only a priori)
  P(B|A)       2*2      (B depends on A)
  P(C|A)       2*2      (C depends on A)
  P(D|B,C)     2*2*2    (D depends on B, C)
  P(E|C)       2*2      (E depends on C)
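As a quick check on the table sizes: with binary variables, each conditional probability table has one entry per joint assignment to the variable and its parents, so P(D|B,C) needs 2*2*2 = 8 entries, whereas the unfactored joint over all five variables would need 2^5 = 32.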

  24. Holmes Example (Pearl) Holmes is worried that his house will be burgled. For the time period of interest, there is a 10^-4 a priori chance of this happening, and Holmes has installed a burglar alarm to try to forestall this event. The alarm is 95% reliable in sounding when a burglary happens, but also has a false positive rate of 1%. Holmes’ neighbor, Watson, is 90% sure to call Holmes at his office if the alarm sounds, but he is also a bit of a practical joker and, knowing Holmes’ concern, might (30%) call even if the alarm is silent. Holmes’ other neighbor Mrs. Gibbons is a well-known lush and often befuddled, but Holmes believes that she is four times more likely to call him if there is an alarm than not.

  29. Holmes Example: Model There are four binary random variables: B: whether Holmes' house has been burgled; A: whether his alarm sounded; W: whether Watson called; G: whether Gibbons called. [graph: B → A, A → W, A → G]

  30. Holmes Example: Tables

  P(B):             B=#t: 0.0001    B=#f: 0.9999

  P(A|B):   B=#t:   A=#t: 0.95      A=#f: 0.05
            B=#f:   A=#t: 0.01      A=#f: 0.99

  P(W|A):   A=#t:   W=#t: 0.90      W=#f: 0.10
            A=#f:   W=#t: 0.30      W=#f: 0.70

  P(G|A):   A=#t:   G=#t: 0.40      G=#f: 0.60
            A=#f:   G=#t: 0.10      G=#f: 0.90
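A minimal Python sketch (not from the slides; names are illustrative) showing how these tables support exact inference by enumeration, using the network's factorization P(B,A,W,G) = P(B) P(A|B) P(W|A) P(G|A):

  from itertools import product

  # Conditional probability tables, indexed by the conditioning value first.
  P_B = {True: 0.0001, False: 0.9999}
  P_A_given_B = {True: {True: 0.95, False: 0.05},
                 False: {True: 0.01, False: 0.99}}
  P_W_given_A = {True: {True: 0.90, False: 0.10},
                 False: {True: 0.30, False: 0.70}}
  P_G_given_A = {True: {True: 0.40, False: 0.60},
                 False: {True: 0.10, False: 0.90}}

  def joint(b, a, w, g):
      # P(B, A, W, G) = P(B) P(A|B) P(W|A) P(G|A)
      return P_B[b] * P_A_given_B[b][a] * P_W_given_A[a][w] * P_G_given_A[a][g]

  def p_burglary_given_watson():
      # P(B=#t | W=#t): marginalize out A and G, then normalize over B.
      score = {b: sum(joint(b, a, True, g)
                      for a, g in product((True, False), repeat=2))
               for b in (True, False)}
      return score[True] / (score[True] + score[False])

  print(p_burglary_given_watson())  # ~0.00028

Running this gives roughly 0.00028: Watson's call raises the probability of a burglary to about three times the 10^-4 prior, but it stays small because of his 30% false-call rate.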

  33. Bayes' Nets: Markov Property • Bayes nets satisfy the local Markov property • Variables are conditionally independent of their non-descendants given their parents
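Stated symbolically, for any variable X with non-descendants ND(X):

P(X | ND(X), parents(X)) = P(X | parents(X))

It is this property that licenses dropping everything but the parents from each conditioning set when factoring the joint, as the next slide does for MCBN1.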

  38. Simple Bayesian Network • MCBN1 [graph: A → B, A → C, B → D, C → D, C → E] • A: only a priori; B depends on A; C depends on A; D depends on B, C; E depends on C • P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|C) • There exist algorithms for training and inference on BNs
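This factorization is just the chain rule plus the local Markov property: expand the joint in topological order, then drop the non-parents from each conditioning set:

P(A,B,C,D,E) = P(A) P(B|A) P(C|A,B) P(D|A,B,C) P(E|A,B,C,D)   (chain rule)
             = P(A) P(B|A) P(C|A)   P(D|B,C)   P(E|C)         (local Markov property)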

  41. Naïve Bayes Model • Bayes' net with conditional independence of features given the class • [graph: Y → f1, Y → f2, Y → f3, ..., Y → fk]
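Under this structure the joint factorizes as

P(Y, f1, ..., fk) = P(Y) P(f1|Y) P(f2|Y) ... P(fk|Y)

so classification amounts to choosing the class y that maximizes P(y) times the product of the P(fi|y) terms.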

  47. Hidden Markov Model • Bayesian network where: • yt depends on yt-1 • xt depends on yt • [graph: y1 → y2 → y3 → ... → yk, with each yt → xt]
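The corresponding factorization of the joint over a state sequence and its observations is

P(x1, ..., xk, y1, ..., yk) = P(y1) P(x1|y1) * Π_{t=2..k} P(yt|yt-1) P(xt|yt)

where P(y1) is the initial state distribution.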

  50. Generative Models • Both Naïve Bayes and HMMs are generative models • "We use the term generative model to refer to a directed graphical model in which the outputs topologically precede the inputs, that is, no x in X can be a parent of an output y in Y." (Sutton & McCallum, 2006) • State y generates an observation (instance) x • Maximum Entropy and linear-chain Conditional Random Fields (CRFs) are, respectively, their discriminative model counterparts
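To make that counterpart concrete (anticipating the linear-chain models in the roadmap), a linear-chain CRF models the conditional distribution directly:

P(y|x) = (1/Z(x)) * Π_t exp( Σ_k λk fk(yt-1, yt, x, t) )

where the fk are feature functions with learned weights λk, and Z(x) normalizes by summing the same product over all possible label sequences (Lafferty, McCallum & Pereira, 2001).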
