Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification. Authors: Sanjiv Kumar and Martial Hebert
Slides prepared by Chihoon Lee
Before we start
• Prerequisites:
  • A basic understanding of probability and statistics
  • Elementary linear algebra
  • Familiarity with the concepts of graphical models
  • The distinction between discriminative and generative models
• Brief notes on each item will be made available soon.
Discriminative Random Fields
• Problem
  • Classify random variables by incorporating neighborhood interactions in the labels as well as in the observed data
• Advantages
  • Relax the strong assumption of conditional independence of the observed data, which the MRF framework generally adopts for tractability
  • Derive their classification power from probabilistic discriminative models
  • All parameters of the DRF model are estimated simultaneously from training data
DRFs
• Introduction
• Undirected Graphical Model (MRFs)
• DRFs
• Representation
• Local Function
• Association Potentials
• Interaction Potential
• Parameter Estimation
• Inference
• Experiments
• Conclusion
Introduction
• MRFs are generally used in a probabilistic generative framework that models the joint probability of the observed data and the corresponding labels.
• $X = \{X_i\}_{i \in S}$, where $X_i$ is the data from the $i$th site and $S$ is the set of sites. An image is thus represented as $\{x_1, x_2, \ldots, x_n\}$, where $n = |S|$.
• $Y = \{Y_i\}_{i \in S}$, where $Y_i \in C$ is the label at image site $i$ and $C$ is the set of labels.
Introduction
• In the MRF framework, the posterior over the labels given the observations is expressed as
  • $P(Y \mid X) = P(X, Y) / P(X)$, with $P(X, Y) = P(Y)\,P(X \mid Y)$, where the prior over labels, $P(Y)$, is modeled as an MRF.
  • The likelihood model is assumed to factorize over sites: $P(X \mid Y) = \prod_{i \in S} P(X_i \mid Y_i)$. This assumption is too restrictive for classification problems.
Introduction
• For classification
  • Estimate the posterior $P(Y \mid X)$
• In the generative framework (MRFs)
  • $P(Y \mid X) = P(X \mid Y)\,P(Y) / P(X)$
  • So the observations themselves must be modeled, through $P(X \mid Y)$
• In the discriminative framework
  • Model $P(Y \mid X)$ directly from data
Quick Peek into DRFs
• Based on the concept of Conditional Random Fields (CRFs), which model $P(Y \mid X)$ directly
• Can capture arbitrary dependencies between the observations
• DRFs = CRFs + local discriminative models
  • Capture the class associations at individual sites as well as the interactions with neighboring sites on a 2-D lattice
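MRFs and DRFs
[Figure: graph structures of a Markov Random Field (labels Y1, Y2, …, YN with per-site observations X1, X2, …, XN) and a Discriminative Random Field (labels Y1, Y2, …, YN with a single global observation X).]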
DRFs
• $X$: observations
• $Y$: labels, $Y_i \in \{1, -1\}$
• $P(Y \mid X)$ is modeled directly from data, without modeling the prior $P(Y)$.
• The marginal $P(X)$ is not explicitly modeled.
• Joint distribution
• Formal definition of a CRF
DRFs
• Definition of CRFs
  • Let $G = (S, E)$ be a graph such that $Y$ is indexed by the vertices of $G$. Then $(X, Y)$ is said to be a conditional random field if, when conditioned on $X$, the random variables $Y_i$ obey the Markov property with respect to the graph: $P(Y_i \mid X, Y_{S - \{i\}}) = P(Y_i \mid X, Y_{N_i})$, where $S - \{i\}$ is the set of all nodes in the graph except node $i$, $N_i$ is the set of neighbors of node $i$ in $G$, and $Y$ with a set as subscript denotes the labels at the nodes in that set.
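DRFs
• Thus, a CRF is a random field globally conditioned on the observation $X$.
• $P(Y \mid X) > 0$ is assumed.
• Since the marginal $P(X)$ is not explicitly modeled, the distribution over the labels is written directly in conditional form, using only unary and pairwise potentials:
  $P(Y \mid X) = \frac{1}{Z} \exp\Big( \sum_{i \in S} A_i(Y_i, X) + \sum_{i \in S} \sum_{j \in N_i} I_{ij}(Y_i, Y_j, X) \Big)$,
  where $Z$ is a normalization constant known as the partition function, and $A_i$ and $I_{ij}$ are the unary and pairwise potentials.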
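The role of the partition function $Z$ can be made concrete on a toy problem. The sketch below is an illustration only, not the slides' method: it enumerates all labelings of a hypothetical 3-site chain and normalizes by brute force. The function names, the chain, and the numeric potentials are assumptions, and the dependence on $X$ is folded into the `unary` and `pairwise` callables.

```python
import math
from itertools import product

def log_score(y, unary, pairwise, neighbors):
    """Sum of unary potentials plus pairwise potentials over all (i, j in N_i)."""
    total = sum(unary(i, y[i]) for i in range(len(y)))
    total += sum(pairwise(i, j, y[i], y[j])
                 for i in range(len(y)) for j in neighbors[i])
    return total

def conditional_distribution(n, unary, pairwise, neighbors):
    """Return P(Y | X) for every labeling in {-1, +1}^n by explicit enumeration."""
    labelings = list(product([-1, 1], repeat=n))
    weights = [math.exp(log_score(y, unary, pairwise, neighbors)) for y in labelings]
    Z = sum(weights)  # partition function
    return {y: w / Z for y, w in zip(labelings, weights)}

# Hypothetical 3-site chain with made-up potentials.
neighbors = {0: [1], 1: [0, 2], 2: [1]}

def unary(i, label):
    return 0.2 * label  # every site mildly prefers +1

def pairwise(i, j, label_i, label_j):
    return 0.5 * label_i * label_j  # rewards equal neighboring labels

dist = conditional_distribution(3, unary, pairwise, neighbors)
print(max(dist, key=dist.get))  # (1, 1, 1)
```

Enumeration costs $2^n$ evaluations, which is why $Z$ is never computed this way on real images.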
DRFs
• Association Potentials
  • $A(Y_i, X)$ is modeled using a local discriminative model that outputs the association of site $i$ with class $Y_i$.
  • For each site $i$, $f_i: X \to \mathbb{R}^l$ is a function that maps the observations $X$ onto a feature vector $f_i(X)$. Using the logistic function, the local class posterior can be modeled as
    $P(Y_i = 1 \mid f_i(X)) = \frac{1}{1 + e^{-(w_0 + w_1^T f_i(X))}} = \sigma(w_0 + w_1^T f_i(X))$   (Eq. 1)
DRFs
• Here $w = (w_0, w_1)$ are the model parameters.
• To extend the logistic model so that it induces a nonlinear decision boundary in the feature space, a transformed feature vector is defined at each site $i$ as
  $h_i(X) = [1, \phi_1(f_i(X)), \ldots, \phi_R(f_i(X))]^T$,
  where each $\phi_k(\cdot)$ is an arbitrary nonlinear function.
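DRFs
• Eq. 1 can be rephrased as
  $P(Y_i \mid X) = \sigma(Y_i\, w^T h_i(X))$
• Finally, the association potential is defined as
  $A(Y_i, X) = \log \sigma(Y_i\, w^T h_i(X))$
• Difference from the MRF framework
  • In MRFs, the association term can use the data from only one particular site, $X_i$, through the log-likelihood $\log P(X_i \mid Y_i)$; in DRFs it is a function of all the observations $X$.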
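A minimal sketch of the association potential, assuming the logistic form $A(Y_i, X) = \log \sigma(Y_i\, w^T h_i(X))$ with labels in $\{-1, +1\}$. The helper names and the numeric values of `w` and `h_i` are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def association_potential(y_i, w, h_i):
    """log sigma(y_i * w^T h_i) for a label y_i in {-1, +1}."""
    score = sum(wk * hk for wk, hk in zip(w, h_i))
    return math.log(sigmoid(y_i * score))

# Made-up parameters; h_i starts with the constant 1 that multiplies the bias.
w = [0.5, 1.0, -2.0]
h_i = [1.0, 0.8, 0.1]

a_pos = association_potential(+1, w, h_i)
a_neg = association_potential(-1, w, h_i)

# Because 1 - sigma(z) = sigma(-z), the two label probabilities sum to one.
print(math.exp(a_pos) + math.exp(a_neg))
```

Writing the label inside the sigmoid is what lets one expression cover both classes.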
DRFs
• Interaction Potential
  • In MRFs, the interaction potential is $I = \beta Y_i Y_j$, which penalizes every dissimilar pair of labels by the cost $\beta$.
  • In DRFs, the interaction potential is a function of all the observations $X$.
  • $P(Y_i = Y_j \mid \psi_i(X), \psi_j(X)) = P(t_{ij} \mid \psi_i(X), \psi_j(X))$, where $t_{ij}$ is $1$ if $Y_i = Y_j$ and $-1$ otherwise.
  • Here $\psi_k(X)$ is a function that maps the observations $X$ onto a feature vector; in general $\psi_k(X) \neq f_k(X)$.
DRFs
• Using a feature function denoted by $u_{ij}(X)$, the pairwise discriminative term is defined as $P(t_{ij} \mid \psi_i(X), \psi_j(X)) = \sigma(t_{ij}\, v^T u_{ij}(X))$, where $v$ are the model parameters.
• The interaction potential in DRFs is modeled as a convex combination of two terms:
  $I(Y_i, Y_j, X) = \beta \{ K Y_i Y_j + (1 - K)(2\sigma(t_{ij}\, v^T u_{ij}(X)) - 1) \}$, where $0 \le K \le 1$.
  • First term: a data-independent smoothing term
  • Second term: a data-dependent term that acts as a discontinuity-adaptive model, moderating the smoothing when the data from the two sites differ
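The convex combination can be sketched in a few lines, assuming the form $I = \beta\{K Y_i Y_j + (1-K)(2\sigma(t_{ij} v^T u_{ij}) - 1)\}$. The function names and all numeric values are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def interaction_potential(y_i, y_j, v, u_ij, beta, K):
    """Convex combination of a data-independent and a data-dependent term."""
    t_ij = 1 if y_i == y_j else -1
    score = sum(vk * uk for vk, uk in zip(v, u_ij))
    data_term = 2.0 * sigmoid(t_ij * score) - 1.0  # lies in (-1, 1)
    return beta * (K * y_i * y_j + (1.0 - K) * data_term)

v = [0.3, -0.1]      # made-up pairwise parameters
u_ij = [1.0, 2.0]    # made-up pairwise feature vector

# K = 1 recovers the MRF-style term beta * y_i * y_j.
print(interaction_potential(1, -1, v, u_ij, beta=2.0, K=1.0))  # -2.0
# K = 0 keeps only the data-dependent term.
print(interaction_potential(1, 1, v, u_ij, beta=2.0, K=0.0))
```

With `K` between 0 and 1, the data-dependent term can weaken the smoothing wherever the pairwise features suggest the two sites differ.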
DRFs
• $\beta$ is the interaction coefficient that controls the degree of smoothing; a larger value of $\beta$ produces smoother solutions.
• The parameters of the models defined so far now need to be estimated.
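DRFs
• Estimation of parameters
  • $\theta = \{w, v, \beta, K\}$
  • Standard maximum likelihood is approximated by the pseudo-likelihood:
    $\hat{\theta} \approx \arg\max_{\theta} \prod_{m=1}^{M} \prod_{i \in S} P(Y_i^m \mid Y_{N_i}^m, X^m, \theta)$,
    where $M$ is the total number of training examples.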
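Pseudo-likelihood is tractable because each factor is a local conditional, $P(Y_i \mid Y_{N_i}, X) \propto \exp\{A(Y_i, X) + \sum_{j \in N_i} I(Y_i, Y_j, X)\}$, whose normalizer sums over only the two values of $Y_i$. The sketch below evaluates the log pseudo-likelihood of one labeling. The function names, the chain, and the potentials are assumptions, with the dependence on $X$ folded into the callables.

```python
import math

def local_conditional(i, y, unary, pairwise, neighbors):
    """P(Y_i = y[i] | Y_{N_i}, X), normalized over the two labels of site i."""
    def local_score(label):
        return unary(i, label) + sum(
            pairwise(i, j, label, y[j]) for j in neighbors[i])
    z_i = math.exp(local_score(-1)) + math.exp(local_score(1))
    return math.exp(local_score(y[i])) / z_i

def log_pseudo_likelihood(y, unary, pairwise, neighbors):
    """Sum over sites of the log local conditional for one training labeling."""
    return sum(math.log(local_conditional(i, y, unary, pairwise, neighbors))
               for i in range(len(y)))

# Hypothetical 3-site chain with made-up potentials.
neighbors = {0: [1], 1: [0, 2], 2: [1]}

def unary(i, label):
    return 0.2 * label

def pairwise(i, j, label_i, label_j):
    return 0.5 * label_i * label_j

y = [1, 1, 1]
print(log_pseudo_likelihood(y, unary, pairwise, neighbors))
```

Over a training set, this quantity is summed across the examples and maximized with respect to the parameters hidden inside `unary` and `pairwise`.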
DRFs
• Initialization: $w$ (and likewise $v$) is learned using standard maximum-likelihood logistic regression, assuming all the labels $Y_i^m$ to be independent given the observation $X^m$.
DRFs
• Inference
  • Given new test data $X$, the goal is to find the optimal label configuration $Y$.
  • The maximum a posteriori (MAP) solution is a widely used estimate that is optimal with respect to the zero-one cost function $C(Y, Y^*) = 1 - \delta(Y - Y^*)$, where $Y^*$ is the true label configuration and $\delta(Y - Y^*)$ is $1$ if $Y = Y^*$ and $0$ otherwise.
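The zero-one cost treats a labeling with one wrong site exactly like a labeling that is wrong everywhere. The sketch below shows this, with a per-site error count included only for contrast. Both function names and the example labelings are hypothetical.

```python
def zero_one_cost(y, y_true):
    """C(Y, Y*) = 1 - delta(Y - Y*): zero only if every site is correct."""
    return 0 if list(y) == list(y_true) else 1

def sitewise_cost(y, y_true):
    """Number of sites whose label differs from the true label."""
    return sum(1 for a, b in zip(y, y_true) if a != b)

y_true = [1, 1, -1, -1]
one_error = [1, 1, -1, 1]
all_wrong = [-1, -1, 1, 1]

print(zero_one_cost(one_error, y_true), zero_one_cost(all_wrong, y_true))  # 1 1
print(sitewise_cost(one_error, y_true), sitewise_cost(all_wrong, y_true))  # 1 4
```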
DRFs
• An alternative to MAP
  • Maximum posterior marginal (MPM), where the cost function is defined as $C(Y, Y^*) = \sum_{i \in S} (1 - \delta(Y_i - Y_i^*))$
• Iterated Conditional Modes (ICM)
  • Given an initial configuration, ICM iteratively maximizes the local conditional probabilities, i.e. $Y_i \leftarrow \arg\max_{y_i} P(y_i \mid Y_{N_i}, X)$
  • Converges only to a local maximum
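A minimal ICM sketch on a hypothetical 4-site chain. The function name `icm`, the potentials, and the initialization are assumptions for illustration. Since the local normalizer does not depend on the candidate label, maximizing the local conditional is the same as maximizing the local score (unary plus pairwise terms), so the normalizer is never computed.

```python
def icm(y_init, unary, pairwise, neighbors, max_sweeps=20):
    """Update one site at a time to its best label given its neighbors."""
    y = list(y_init)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):
            def local_score(label):
                return unary(i, label) + sum(
                    pairwise(i, j, label, y[j]) for j in neighbors[i])
            best = max((-1, 1), key=local_score)
            # Change the label only on a strict improvement, so sweeps terminate.
            if local_score(best) > local_score(y[i]):
                y[i] = best
                changed = True
        if not changed:
            break
    return y

# Made-up chain: site 2 weakly prefers -1, its neighbors strongly prefer +1.
site_scores = [1.0, 1.0, -0.2, 1.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def unary(i, label):
    return site_scores[i] * label

def pairwise(i, j, label_i, label_j):
    return 0.5 * label_i * label_j

y0 = [1 if s > 0 else -1 for s in site_scores]  # site-wise initial guess
labels = icm(y0, unary, pairwise, neighbors)
print(labels)  # [1, 1, 1, 1]
```

The result depends on the initial configuration, which is the practical meaning of convergence to a local maximum.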
Experiments
• The proposed DRFs were applied to the detection of man-made structures in natural scenes.
• Training set: 108 images (256 by 384 pixels)
• Test set: 129 images
• The training set contains 36,269 blocks from the non-structured class and 3,004 blocks from the structured class.
Experiments
• Feature description
  • Histogram
  • Orientation-based features
• Single-site features ($s_i(X_i)$)
  • 3 moments and 2 orientation-based features at site $i$
  • No consideration of correlations in the neighborhood
• Multi-scale features ($f_i(X)$)
  • Explicitly describe the dependencies in the observation $X$
Experiments
• $f_i(X)$: feature space
  • 14 dimensions
• $\phi_k(f_i(X))$: transformed feature space
  • $14 + 14(14 + 1)/2 = 119$ dimensions
  • Equivalent to the kernel mapping of the data using a polynomial kernel of degree two
• The results are compared with an MRF
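The dimension count can be checked with a short sketch of a degree-two expansion: the 14 original features plus every product of two features (squares included). The function name and the feature values are hypothetical, and any constant bias entry is left out of the count.

```python
from itertools import combinations_with_replacement

def quadratic_features(f):
    """Return f followed by all products f[a] * f[b] with a <= b."""
    pairs = combinations_with_replacement(range(len(f)), 2)
    return list(f) + [f[a] * f[b] for a, b in pairs]

f_i = [0.1 * k for k in range(14)]  # made-up 14-dimensional feature vector
phi = quadratic_features(f_i)
print(len(phi))  # 119 = 14 + 14 * 15 / 2
```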
Experiments
• $\beta_m$ is the interaction parameter of the MRF.
• The class-conditional density was modeled as a mixture of Gaussians.
• Performance evaluation
  • DRFs outperform both
    • MRFs
    • The logistic classifier
Conclusion
• Introduced discriminative random fields, which model the conditional distribution of the class labels without modeling the class density distribution
• Explicitly modeled data dependencies between neighbors
• Parameter initialization is a hard problem because of local optima (pseudo-likelihood)
• Extension to the multi-class case
• Possible application to tumor segmentation
