Conditional Random Fields. Presented by Shira Kritchman & Lena Gorelick May 13, 2007. Advanced Topics in Computer and Human Vision Spring 2007. Outline. Introduction Statistical Modeling Generative vs. Discriminative Models Naïve Bayes vs. Logistic Regression Sequence Modeling: HMM
Conditional Random Fields Presented by Shira Kritchman & Lena Gorelick May 13, 2007 Advanced Topics in Computer and Human Vision Spring 2007
Outline • Introduction • Statistical Modeling • Generative vs. Discriminative Models • Naïve Bayes vs. Logistic Regression • Sequence Modeling: HMM • CRF • Sequence Modeling: Linear Chain CRF • Learning (Parameter Estimation) • Improved Iterative Scaling (IIS) • S algorithm • General CRF • Applications
Problem Formulation – Label Assignment • Classification • Segmentation Horse Horse Non-horse
Problem Formulation – Label Assignment Labels Observation Labels Horse Horse Non-horse
Problem Formulation – Label Assignment • Classification Biology Math
Problem Formulation – Label Assignment • Parsing: TAKE/verb THE/article GREEN/adjective APPLE/noun FROM/preposition THE/article BOX/noun THEN/conjunction HIT/verb IT/pronoun WITH/preposition MY/adjective SWORD/noun
Problem Formulation • Find the mechanism = LEARNING • Inputs: data and prior knowledge (horses are rarely blue; horses have 4 legs; …; neighboring pixels have similar labels)
Generative Modeling • Define a joint probability distribution $p(x, y)$ over observation and label pairs $(x, y)$ • Label Assignment (Bayes): $\hat{y} = \arg\max_y p(y \mid x) = \arg\max_y p(x, y)$
What does it generate? • Assumption: the “outputs” $y$ probabilistically generate the “inputs” $x$ • So we use $p(x, y) = p(y)\,p(x \mid y)$
Generative Modeling • What are the candidate distributions for $p(x, y)$? • Too simple • Underfitting • Too sparse • Overfitting
Generative Modeling – Model Family • Define a model family $\{p_\theta(x, y)\}$, guided by the data and by prior knowledge: horses are rarely blue; horses have 4 legs; …; neighboring pixels have similar labels
Generative Modeling – Likelihood • We look for the parameters $\theta$ that maximize the likelihood of the training data: $\theta^* = \arg\max_\theta \prod_i p_\theta(x^{(i)}, y^{(i)})$
Discriminative Modeling • Directly defines a conditional probability distribution $p(y \mid x)$ over the labels $y$ given the observation $x$ • Label Assignment: $\hat{y} = \arg\max_y p(y \mid x)$
Discriminative Modeling • Does not include a model of the observation distribution $p(x)$ • Which is not needed for label assignment anyway!
Discriminative Modeling – Model Family • Define a model family $\{p_\theta(y \mid x)\}$
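Discriminative Modeling – Likelihood • We look for the parameters $\theta$ that maximize the conditional likelihood of the training data: $\theta^* = \arg\max_\theta \prod_i p_\theta(y^{(i)} \mid x^{(i)})$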
What is Conditional Likelihood? • Factor the joint likelihood: $p(x, y) = p(y \mid x)\,p(x)$ • $p(x)$: not required for the labeling task • $p(y \mid x)$: the conditional likelihood!
Generative Model – Example • Naïve Bayes – Horse Horse Non-horse
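A minimal sketch of the Naïve Bayes idea, assuming the usual factorization p(y, x) = p(y) ∏_k p(x_k | y) with binary features and add-one smoothing. The toy data and the names `train_nb` and `predict_nb` are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (features, label); features is a tuple of 0/1 values."""
    label_counts = Counter(label for _, label in samples)
    # on_counts[label][k] = number of samples with this label whose feature k is 1
    on_counts = defaultdict(Counter)
    for features, label in samples:
        for k, value in enumerate(features):
            on_counts[label][k] += value
    return label_counts, on_counts

def predict_nb(model, features):
    label_counts, on_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # log p(y) + sum_k log p(x_k | y), with add-one smoothing
        score = math.log(count / total)
        for k, value in enumerate(features):
            p_on = (on_counts[label][k] + 1) / (count + 2)
            score += math.log(p_on if value else 1.0 - p_on)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: features = (has_four_legs, is_blue)
data = [((1, 0), "horse"), ((1, 0), "horse"),
        ((0, 1), "non-horse"), ((0, 0), "non-horse")]
model = train_nb(data)
print(predict_nb(model, (1, 0)))
```

Each feature contributes its own independent factor, which is exactly the strict independence assumption that the generative model relies on.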
Discriminative Model – Example • Logistic Regression –
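A minimal sketch of binary logistic regression, assuming the form p(y = 1 | x) = 1 / (1 + exp(-(w · x + b))) and plain gradient ascent on the conditional log-likelihood. The toy data, step size, and function names are illustrative assumptions, not taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(samples, steps=2000, lr=0.5):
    """Gradient ascent on the conditional log-likelihood sum_i log p(y_i | x_i)."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in samples:
            # d/dw log p(y | x) = (y - p(y = 1 | x)) * x
            err = y - sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)
            grad_w = [g + err * xk for g, xk in zip(grad_w, x)]
            grad_b += err
        w = [wk + lr * g / len(samples) for wk, g in zip(w, grad_w)]
        b += lr * grad_b / len(samples)
    return w, b

def prob_positive(model, x):
    w, b = model
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)

# Hypothetical toy data: x = (has_four_legs, is_blue), y = 1 for "horse"
data = [((1, 0), 1), ((1, 0), 1), ((0, 1), 0), ((0, 0), 0)]
model = train_logreg(data)
print(prob_positive(model, (1, 0)), prob_positive(model, (0, 1)))
```

Nothing in the training loop models how the features themselves are distributed; only the mapping from x to y is fitted.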
Generative vs. Discriminative • Naïve Bayes: the features $x_k$ are independent given $y$ • Logistic regression: can have complex dependencies among the features $x_k$ • A discriminative model is better suited to contain rich, overlapping features
Generative vs. Discriminative • Model the relation between $x$ = (age, weight, blood pressure) and $y$ = will suffer from a heart attack soon (binary) • Natural to model $p(y \mid x)$ • Unnatural to model $p(x \mid y)$
Generative vs. Discriminative • Generative: models $p(x, y)$; strict independence assumptions on the observations • Discriminative: models $p(y \mid x)$; allows arbitrary, interdependent features of the observation; does not spend effort on modeling $p(x)$
Classifiers and Graphical Models • Naïve Bayes and logistic regression predict a single variable • What about predicting many variables that are interdependent? • Use a graphical model
Sequence Models – HMM • Simple graphical models Hidden states Observable variables
Sequence Models – HMM • Parsing (hidden state per observed word): TAKE/verb THE/article GREEN/adjective APPLE/noun FROM/preposition THE/article BOX/noun THEN/conjunction HIT/verb IT/pronoun WITH/preposition MY/adjective SWORD/noun
HMM – Exponential Form • Rewrite as features noun verb noun apple
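The rewrite can be checked numerically. The sketch below assumes a two-tag HMM with made-up probabilities and sets one weight per indicator feature equal to the log of the matching start, transition, or emission probability. All names and numbers are hypothetical.

```python
import math

# Hypothetical two-tag HMM; all probabilities are made up for illustration.
start = {"noun": 0.6, "verb": 0.4}
trans = {("noun", "noun"): 0.3, ("noun", "verb"): 0.7,
         ("verb", "noun"): 0.8, ("verb", "verb"): 0.2}   # trans[(prev, cur)]
emit = {("noun", "apple"): 0.9, ("noun", "hit"): 0.1,
        ("verb", "apple"): 0.2, ("verb", "hit"): 0.8}    # emit[(tag, word)]

def joint_product(tags, words):
    """p(y, x) as the usual product of start, transition and emission terms."""
    p = start[tags[0]] * emit[(tags[0], words[0])]
    for t in range(1, len(tags)):
        p *= trans[(tags[t - 1], tags[t])] * emit[(tags[t], words[t])]
    return p

# One weight per indicator feature: the log of the matching probability.
weights = {("start", y): math.log(p) for y, p in start.items()}
weights.update({("trans",) + k: math.log(p) for k, p in trans.items()})
weights.update({("emit",) + k: math.log(p) for k, p in emit.items()})

def joint_exponential(tags, words):
    """p(y, x) = exp(sum of the weights of the features that fire)."""
    fired = [("start", tags[0])]
    fired += [("trans", tags[t - 1], tags[t]) for t in range(1, len(tags))]
    fired += [("emit", tags[t], words[t]) for t in range(len(tags))]
    return math.exp(sum(weights[f] for f in fired))

tags, words = ["verb", "noun"], ["hit", "apple"]
print(joint_product(tags, words), joint_exponential(tags, words))
```

Both functions return the same number, so the HMM is a log-linear model whose features are restricted to tag-tag and tag-word indicators.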
From HMM to CRF • The underlying conditional distribution: $p(y \mid x) = \frac{p(y, x)}{\sum_{y'} p(y', x)}$ • The denominator $Z(x)$ is a partition function computed per observation
From HMM to CRF • We can now use richer features of the observation for the same price!
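Linear-Chain CRF – Definition • $X, Y$ random vectors • $\Lambda = \{\lambda_k\} \in \mathbb{R}^K$ a parameter vector • $\{f_k(y_t, y_{t-1}, x_t)\}_{k=1}^{K}$ real-valued feature functions • $p(y \mid x) = \frac{1}{Z(x)} \exp\big(\sum_t \sum_k \lambda_k f_k(y_t, y_{t-1}, x_t)\big)$ • The conditional distribution of an HMM is a linear-chain CRF with a particular choice of features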
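A minimal sketch of the definition, assuming the standard linear-chain form in which the score of a labeling is the sum over positions of weighted feature functions f_k(y_t, y_{t-1}, x_t), normalized by a per-observation partition function Z(x). The feature functions, weights, and two-word sentence are hypothetical, and Z(x) is computed by brute-force enumeration only to keep the example short.

```python
import itertools
import math

LABELS = ["noun", "verb"]

# Hypothetical (weight, feature function) pairs; f_k(y_t, y_prev, x_t).
FEATURES = [
    (1.5, lambda y, y_prev, x: y == "noun" and x == "apple"),
    (1.0, lambda y, y_prev, x: y == "verb" and x.endswith("s")),  # overlaps with
    (0.8, lambda y, y_prev, x: y == "verb" and x == "hits"),      # the next one
    (0.5, lambda y, y_prev, x: y_prev == "verb" and y == "noun"),
]

def score(labels, words):
    """sum_t sum_k lambda_k * f_k(y_t, y_{t-1}, x_t); position 0 has no predecessor."""
    total = 0.0
    for t, word in enumerate(words):
        y_prev = labels[t - 1] if t > 0 else None
        total += sum(w for w, f in FEATURES if f(labels[t], y_prev, word))
    return total

def partition(words):
    """Z(x): brute-force sum over every label sequence (exponential cost)."""
    return sum(math.exp(score(ys, words))
               for ys in itertools.product(LABELS, repeat=len(words)))

def conditional(labels, words):
    """p(y | x) = exp(score(y, x)) / Z(x)."""
    return math.exp(score(labels, words)) / partition(words)

words = ["hits", "apple"]
for ys in itertools.product(LABELS, repeat=len(words)):
    print(ys, round(conditional(ys, words), 3))
```

Two of the features fire on the same word without any independence assumption being violated, because only p(y | x) is modeled.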
Parameter Estimation – Maximum Likelihood • Maximize the conditional log likelihood: $\ell(\Lambda) = \sum_i \log p_\Lambda(y^{(i)} \mid x^{(i)})$ • Concave, so a local maximum is the global maximum
Parameter Estimation – Maximum Likelihood • Take partial derivatives w.r.t. $\lambda_k$: each one is the empirical mean of $f_k$ minus the model expectation of $f_k$ • There is no closed-form solution, since the $\lambda_k$ are coupled. Any alternatives? • Detour
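As a worked form of the derivative, assume the standard linear-chain parameterization $p_\Lambda(y \mid x) = \frac{1}{Z(x)} \exp\big(\sum_t \sum_k \lambda_k f_k(y_t, y_{t-1}, x_t)\big)$ and training pairs $(x^{(i)}, y^{(i)})$. This notation is an assumption, since the slide's own formula did not survive extraction.

```latex
\ell(\Lambda) = \sum_i \sum_t \sum_k \lambda_k\, f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, x^{(i)}_t\big)
              - \sum_i \log Z\big(x^{(i)}\big)

\frac{\partial \ell}{\partial \lambda_k}
  = \underbrace{\sum_i \sum_t f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, x^{(i)}_t\big)}_{\text{empirical count}}
  - \underbrace{\sum_i \sum_t \sum_{y, y'} f_k\big(y, y', x^{(i)}_t\big)\,
      p_\Lambda\big(y_t = y,\, y_{t-1} = y' \mid x^{(i)}\big)}_{\text{model expectation}}
```

Setting the derivative to zero equates the two counts, but every pairwise marginal depends on all of the weights through $Z(x)$, which is why the equations cannot be solved one $\lambda_k$ at a time.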
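Maximum Likelihood – Maximum Entropy • Maximum likelihood: we assumed the exponential form $p(y \mid x) \propto \exp\big(\sum_k \lambda_k f_k\big)$; we maximized the conditional likelihood; we got the constraints (model expectation of $f_k$) = (empirical mean of $f_k$) • Maximum entropy: we assume those same constraints; we maximize the conditional entropy; we get the exponential form • We get the same distribution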
Parameter Estimation - Finding $\Lambda$ • Given the current parameter estimate $\Lambda$ • Find a new set of parameters $\Lambda + \Delta$ s.t. $\ell(\Lambda + \Delta) - \ell(\Lambda) \ge 0$ (gain in likelihood) • Repeat until convergence
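Parameter Estimation - Finding $\Lambda$ • Bound the gain in likelihood from below with an auxiliary function $A(\Lambda, \Delta)$ • Maximize $A(\Lambda, \Delta)$ w.r.t. $\Delta$ • Update $\Lambda \leftarrow \Lambda + \Delta$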
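Parameter Estimation - Finding $\Lambda$ • Improved Iterative Scaling Algorithm (IIS): • Start with some (arbitrary) value for each $\lambda_k$ • Repeat until convergence: • Solve $\partial A / \partial \delta_k = 0$ for $\delta_k$, where $A$ is the auxiliary function • Set $\lambda_k \leftarrow \lambda_k + \delta_k$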
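Parameter Estimation: IIS – S algorithm • Differentiating the auxiliary function w.r.t. $\delta_k$ gives an equation in $\delta_k$ that involves the total feature count $T(x, y) = \sum_k f_k(x, y)$ • Note that if $T(x, y)$ is constant, the equation can be solved in closed form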
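For reference, the standard IIS update equation can be written as below. The notation is assumed rather than recovered from the slide: $\tilde{p}$ is the empirical distribution, $\tilde{E}[f_k]$ the empirical mean of $f_k$, $E_\Lambda[f_k]$ its expectation under the current model, and $f_k(x, y)$ the feature summed over all positions of the sequence.

```latex
\tilde{E}[f_k] = \sum_{x, y} \tilde{p}(x)\, p_\Lambda(y \mid x)\, f_k(x, y)\,
                 \exp\big(\delta_k\, T(x, y)\big),
\qquad T(x, y) = \sum_k f_k(x, y)

\text{If } T(x, y) = S \text{ for all } (x, y): \quad
\delta_k = \frac{1}{S} \log \frac{\tilde{E}[f_k]}{E_\Lambda[f_k]}
```

When the total feature count varies with $(x, y)$, each $\delta_k$ has to be found numerically (for example with Newton's method), which is what a constant total avoids.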
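Parameter Estimation: IIS – S algorithm • Define a new slack feature $s(x, y) = S - \sum_t \sum_k f_k(y_t, y_{t-1}, x_t)$, with the constant $S$ chosen so that $s(x, y) \ge 0$ and the total feature count equals $S$ • And we have an additional constraint for the slack feature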
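The slack-feature trick can be tried on a much smaller problem than a CRF. The sketch below is an assumption-laden analogue, not the authors' algorithm: it uses a plain (non-sequence) conditional model with two hypothetical binary features, adds a slack feature so that every total equals S, and applies the closed-form step delta_k = (1/S) log(empirical_k / model_k), in the style of generalized iterative scaling.

```python
import math

LABELS = [0, 1]
# Hypothetical binary features f_k(x, y); all are non-negative, as required.
FEATURES = [
    lambda x, y: 1.0 if (x == 1 and y == 1) else 0.0,
    lambda x, y: 1.0 if y == 1 else 0.0,
]
S = 2.0  # upper bound on the total feature count sum_k f_k(x, y)

def feature_vector(x, y):
    values = [f(x, y) for f in FEATURES]
    return values + [S - sum(values)]  # slack feature makes the total exactly S

def conditional(weights, x):
    scores = [math.exp(sum(w * v for w, v in zip(weights, feature_vector(x, y))))
              for y in LABELS]
    z = sum(scores)
    return [s / z for s in scores]

def expectations(weights, data):
    """Empirical mean and model expectation of every feature (slack included)."""
    n_feat = len(FEATURES) + 1
    empirical, model = [0.0] * n_feat, [0.0] * n_feat
    for x, y_obs in data:
        probs = conditional(weights, x)
        for k, v in enumerate(feature_vector(x, y_obs)):
            empirical[k] += v / len(data)
        for y, p in zip(LABELS, probs):
            for k, v in enumerate(feature_vector(x, y)):
                model[k] += p * v / len(data)
    return empirical, model

def train(data, iterations=500):
    weights = [0.0] * (len(FEATURES) + 1)
    for _ in range(iterations):
        empirical, model = expectations(weights, data)
        # Closed-form step: delta_k = (1 / S) * log(empirical_k / model_k)
        weights = [w + math.log(e / m) / S
                   for w, e, m in zip(weights, empirical, model)]
    return weights

data = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1)]  # made-up (x, y) pairs
weights = train(data)
print(conditional(weights, 1), conditional(weights, 0))
```

In a linear-chain CRF the same step applies, except that the model expectations come from pairwise marginals over label sequences instead of a direct sum over single labels.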
Parameter Estimation: IIS – S algorithm For each need to compute marginals at every iteration! Local in
Computing Marginals with BP • Computing marginals in Linear-Chain CRF • efficient and exact BP • IIS algorithm # optimization steps # in training
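On a chain, belief propagation reduces to the forward-backward recursions. The sketch below assumes a hypothetical per-position score with made-up weights and compares the forward-backward marginals against brute-force enumeration. The function names and the binary label set are illustrative choices.

```python
import itertools
import math

LABELS = [0, 1]  # labels double as list indices in the tables below

def log_potential(t, y_prev, y, xs):
    """Hypothetical score sum_k lambda_k f_k(y_t, y_{t-1}, x_t); made-up weights."""
    s = 1.2 if y == xs[t] else 0.0            # label agrees with the observation
    if y_prev is not None and y_prev == y:    # neighboring labels agree
        s += 0.7
    return s

def forward_backward(xs):
    n = len(xs)
    # alpha[t][y]: total weight of label prefixes ending in y at position t
    alpha = [[math.exp(log_potential(0, None, y, xs)) for y in LABELS]]
    for t in range(1, n):
        alpha.append([sum(alpha[t - 1][yp] * math.exp(log_potential(t, yp, y, xs))
                          for yp in LABELS) for y in LABELS])
    # beta[t][y]: total weight of label suffixes that follow y at position t
    beta = [[1.0 for _ in LABELS] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(math.exp(log_potential(t + 1, y, yn, xs)) * beta[t + 1][yn]
                       for yn in LABELS) for y in LABELS]
    z = sum(alpha[n - 1])
    marginals = [[alpha[t][y] * beta[t][y] / z for y in LABELS] for t in range(n)]
    return z, marginals

def brute_force(xs):
    """Reference answer by enumerating all label sequences."""
    n = len(xs)
    z, marginals = 0.0, [[0.0 for _ in LABELS] for _ in range(n)]
    for ys in itertools.product(LABELS, repeat=n):
        w = math.exp(sum(log_potential(t, ys[t - 1] if t else None, ys[t], xs)
                         for t in range(n)))
        z += w
        for t in range(n):
            marginals[t][ys[t]] += w
    return z, [[m / z for m in row] for row in marginals]

xs = [1, 1, 0, 1]
z, marginals = forward_backward(xs)
print(z, marginals[2])
```

For a sequence of length n with M labels, the recursions cost on the order of n·M² operations, while the enumeration visits all M^n label sequences.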
Parameter Estimation: IIS(S) – Summary • Closed form solution • Converges to global maximum • $S$ is proportional to the length of the longest training sequence • Small optimization steps for large $S$ • T algorithm