Conditional Random Fields Advanced Statistical Methods in NLP Ling 572 February 9, 2012
Roadmap • Graphical Models • Modeling independence • Models revisited • Generative & discriminative models • Conditional random fields • Linear chain models • Skip chain models
Preview • Conditional random fields • Undirected graphical model • Due to Lafferty, McCallum, and Pereira, 2001 • Discriminative model • Supports integration of rich feature sets • Allows a range of dependency structures • Linear-chain, skip-chain, general • Can encode long-distance dependencies • Used for diverse NLP sequence labeling tasks: • Named entity recognition, coreference resolution, etc.
Graphical Models • Graphical model • Simple, graphical notation for conditional independence • Probabilistic model where: • Graph structure denotes conditional independence between random variables • Nodes: random variables • Edges: dependency relations between random variables • Model types: • Bayesian Networks • Markov Random Fields
Modeling (In)dependence • Bayesian network • Directed acyclic graph (DAG) • Nodes = Random Variables • Arc ~ directly influences, conditional dependency • Arcs = Child depends on parent(s) • No arcs = independent (0 incoming: only a priori) • Parents of X = π(X) • For each X, need P(X | π(X))
Example I (figure from Russell & Norvig, AIMA)
Simple Bayesian Network • MCBN1 (nodes A, B, C, D, E)

Node | Depends on | Truth table entries
A    | (a priori) | 2
B    | A          | 2*2
C    | A          | 2*2
D    | B, C       | 2*2*2
E    | C          | 2*2

Need: P(A), P(B|A), P(C|A), P(D|B,C), P(E|C)
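To make the parameter counting concrete, here is a minimal sketch (hypothetical helper code, not from the slides) that derives each table size from a node's parent set:

```python
# Sketch: CPT sizes for MCBN1 (node/parent lists from the slide; the
# counting itself is just 2 * 2^k for a binary node with k binary parents).
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}

for node, ps in parents.items():
    size = 2 * (2 ** len(ps))                     # 2^k parent configs x 2 values
    label = f"P({node}|{','.join(ps)})" if ps else f"P({node})"
    print(f"{label}: {size} entries")             # matches the table: 2, 4, 4, 8, 4
```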
Holmes Example (Pearl) Holmes is worried that his house will be burgled. For the time period of interest, there is a 10^-4 a priori chance of this happening, and Holmes has installed a burglar alarm to try to forestall this event. The alarm is 95% reliable in sounding when a burglary happens, but also has a false positive rate of 1%. Holmes’ neighbor, Watson, is 90% sure to call Holmes at his office if the alarm sounds, but he is also a bit of a practical joker and, knowing Holmes’ concern, might (30%) call even if the alarm is silent. Holmes’ other neighbor Mrs. Gibbons is a well-known lush and often befuddled, but Holmes believes that she is four times more likely to call him if there is an alarm than not.
Holmes Example: Model • There are four binary random variables: • B: whether Holmes' house has been burgled • A: whether his alarm sounded • W: whether Watson called • G: whether Gibbons called • (Graph structure: B → A, A → W, A → G)
Holmes Example: Tables

P(B):    P(B=#t) = 0.0001      P(B=#f) = 0.9999
P(A|B):  P(A=#t|B=#t) = 0.95   P(A=#f|B=#t) = 0.05
         P(A=#t|B=#f) = 0.01   P(A=#f|B=#f) = 0.99
P(W|A):  P(W=#t|A=#t) = 0.90   P(W=#f|A=#t) = 0.10
         P(W=#t|A=#f) = 0.30   P(W=#f|A=#f) = 0.70
P(G|A):  P(G=#t|A=#t) = 0.40   P(G=#f|A=#t) = 0.60
         P(G=#t|A=#f) = 0.10   P(G=#f|A=#f) = 0.90
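Given these tables, posterior questions such as "how likely is a burglary if Watson calls?" can be answered by brute-force enumeration over the joint. A minimal sketch, assuming the CPT values above (the dict encoding and function names are mine, not from the slides):

```python
# Sketch: exact inference by enumeration in the Holmes network.
p_b = {True: 0.0001, False: 0.9999}                  # P(B)
p_a_given_b = {True: {True: 0.95, False: 0.05},      # P(A=a | B=b) as p_a_given_b[b][a]
               False: {True: 0.01, False: 0.99}}
p_w_given_a = {True: {True: 0.90, False: 0.10},      # P(W=w | A=a)
               False: {True: 0.30, False: 0.70}}
p_g_given_a = {True: {True: 0.40, False: 0.60},      # P(G=g | A=a)
               False: {True: 0.10, False: 0.90}}

def joint(b, a, w, g):
    # P(B,A,W,G) = P(B) P(A|B) P(W|A) P(G|A), per the network structure
    return p_b[b] * p_a_given_b[b][a] * p_w_given_a[a][w] * p_g_given_a[a][g]

# P(B=#t | W=#t): marginalize out A and G, then normalize.
num = sum(joint(True, a, True, g) for a in (True, False) for g in (True, False))
den = sum(joint(b, a, True, g) for b in (True, False)
                               for a in (True, False) for g in (True, False))
print(num / den)  # ~0.00028: Watson's 30% prank rate swamps the 1e-4 prior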
Bayes’ Nets: Markov Property • Bayes’ nets satisfy the local Markov property • Variables: conditionally independent of non-descendants given their parents
Simple Bayesian Network • MCBN1 (nodes A, B, C, D, E) • A = only a priori; B depends on A; C depends on A; D depends on B,C; E depends on C • P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|C) • There exist algorithms for training and inference on BNs
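The factorization itself is directly executable. A minimal sketch with invented CPT values; only the structure P(A)P(B|A)P(C|A)P(D|B,C)P(E|C) comes from the slide:

```python
# Sketch: the MCBN1 factorization, executed over hypothetical CPTs.
from itertools import product

T, F = True, False
cpt = {
    "A":    {T: 0.3, F: 0.7},
    "B|A":  {T: {T: 0.8, F: 0.2}, F: {T: 0.1, F: 0.9}},
    "C|A":  {T: {T: 0.6, F: 0.4}, F: {T: 0.5, F: 0.5}},
    "D|BC": {(T, T): {T: 0.9, F: 0.1}, (T, F): {T: 0.7, F: 0.3},
             (F, T): {T: 0.4, F: 0.6}, (F, F): {T: 0.2, F: 0.8}},
    "E|C":  {T: {T: 0.25, F: 0.75}, F: {T: 0.05, F: 0.95}},
}

def joint(a, b, c, d, e):
    # One conditional per node, conditioned only on its parents
    return (cpt["A"][a] * cpt["B|A"][a][b] * cpt["C|A"][a][c]
            * cpt["D|BC"][(b, c)][d] * cpt["E|C"][c][e])

# Sanity check: a proper joint sums to 1 over all 2^5 assignments.
print(sum(joint(*v) for v in product((T, F), repeat=5)))  # 1.0
```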
Naïve Bayes Model • Bayes’ net: class Y with feature children f1, f2, f3, …, fk • Conditional independence of features given class: P(Y, f1, …, fk) = P(Y) ∏i P(fi | Y)
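Under this independence assumption, classification reduces to a product of per-feature likelihoods. A minimal sketch with illustrative probabilities (the class names and feature values are made up, not from the slides):

```python
# Sketch: Naive Bayes scoring, P(y | f1..fk) ∝ P(y) * prod_i P(fi | y).
import math

prior = {"pos": 0.5, "neg": 0.5}                                  # P(Y)
likelihood = {                                                    # P(f | Y)
    "pos": {"good": 0.30, "bad": 0.05, "plot": 0.10},
    "neg": {"good": 0.08, "bad": 0.25, "plot": 0.10},
}

def posterior(features):
    # Work in log space to avoid underflow for large k
    scores = {y: math.log(prior[y]) + sum(math.log(likelihood[y][f]) for f in features)
              for y in prior}
    z = math.log(sum(math.exp(s) for s in scores.values()))      # normalizer
    return {y: math.exp(s - z) for y, s in scores.items()}

print(posterior(["good", "plot"]))  # {'pos': ~0.79, 'neg': ~0.21}
```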
Hidden Markov Model • Bayesian network where: • yt depends on yt-1 • xt depends on yt • States y1, y2, y3, …, yk over observations x1, x2, x3, …, xk • Joint: P(x, y) = ∏t P(yt | yt-1) P(xt | yt)
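The joint factorizes over time steps, so it can be computed by a single left-to-right pass. A small sketch; the start distribution and all table values are illustrative assumptions (the slide specifies only the dependency structure):

```python
# Sketch: the HMM joint P(x,y) = P(y1) P(x1|y1) * prod_{t>1} P(yt|yt-1) P(xt|yt).
start = {"N": 0.6, "V": 0.4}                                      # P(y1), assumed
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}    # P(yt | yt-1)
emit  = {"N": {"dog": 0.5, "runs": 0.1},                          # P(xt | yt)
         "V": {"dog": 0.1, "runs": 0.6}}

def joint(states, words):
    p = start[states[0]] * emit[states[0]][words[0]]
    for prev, cur, w in zip(states, states[1:], words[1:]):
        p *= trans[prev][cur] * emit[cur][w]
    return p

print(joint(["N", "V"], ["dog", "runs"]))  # 0.6 * 0.5 * 0.7 * 0.6 = 0.126
```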
Generative Models • Both Naïve Bayes and HMMs are generative models • “We use the term generative model to refer to a directed graphical model in which the outputs topologically precede the inputs, that is, no x in X can be a parent of an output y in Y.” (Sutton & McCallum, 2006) • State y generates an observation (instance) x • Maximum Entropy and linear-chain Conditional Random Fields (CRFs) are, respectively, their discriminative model counterparts