Improved Initialisation and Gaussian Mixture Pairwise Terms for Dense Random Fields with Mean-field Inference
Vibhav Vineet, Jonathan Warrell, Paul Sturgess, Philip H.S. Torr
http://cms.brookes.ac.uk/research/visiongroup/
Labelling Problem
• Assign a label to each image pixel
• Example tasks: object segmentation, stereo, object detection
Problem Formulation
• Find a labelling that maximises the conditional probability or, equivalently, minimises the energy function
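The equations for this slide did not survive in the transcript. As a sketch, the standard pairwise CRF formulation is given below; the symbols (unary term \psi_u, pairwise term \psi_p, partition function Z) are assumed notation and may differ from the authors' own.

```latex
P(\mathbf{x} \mid \mathbf{I}) = \frac{1}{Z(\mathbf{I})} \exp\bigl(-E(\mathbf{x} \mid \mathbf{I})\bigr),
\qquad
E(\mathbf{x} \mid \mathbf{I}) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j),
\qquad
\mathbf{x}^{*} = \arg\max_{\mathbf{x}} P(\mathbf{x} \mid \mathbf{I}) = \arg\min_{\mathbf{x}} E(\mathbf{x} \mid \mathbf{I})
```

Because Z does not depend on the labelling, maximising the probability and minimising the energy select the same labelling.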
Problem Formulation
• Grid CRF leads to over-smoothing around boundaries
• Dense CRF is able to recover fine boundaries
[Figure: grid CRF construction and inference result; dense CRF construction and inference result]
Inference in Dense CRF
• Very high time complexity
• Graph-cut based methods are not feasible: alpha-expansion takes almost 1200 seconds per image with a neighbourhood size of 15 on the PascalVOC segmentation dataset
Inference in Dense CRF
• Filter-based mean-field inference method takes 0.2 seconds*
• Efficient inference under two assumptions
  • Mean-field approximation to the CRF
  • Pairwise weights are Gaussian kernels
*Krahenbuhl et al., Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 11
Efficient inference in dense CRF
• Mean-field methods (Jordan et al., 1999)
• Inference with the true distribution P is intractable
• Approximate P with a distribution Q from a tractable family
Naïve mean field
• Assume all variables are independent, so Q factorises over pixels: Q(x) = ∏_i Q_i(x_i)
Efficient inference in dense CRF
• Assume Gaussian pairwise weights
• Mixture of Gaussian kernels: a spatial kernel and a bilateral kernel
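The kernel formula is missing from the transcript. The sketch below follows the two-kernel form used by Krahenbuhl et al. (a bilateral kernel over position and colour plus a spatial kernel over position only). The function name, the weights and the bandwidth values are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def dense_crf_kernel(pos, rgb, w_bilateral=1.0, w_spatial=1.0,
                     theta_alpha=60.0, theta_beta=20.0, theta_gamma=3.0):
    """Return the N x N matrix of pairwise kernel weights k(f_i, f_j).

    pos: (N, 2) pixel coordinates, rgb: (N, 3) colour vectors.
    All weights and bandwidths are illustrative defaults (assumptions).
    """
    # Squared distances between every pair of pixels, in position and colour.
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    d_rgb = ((rgb[:, None, :] - rgb[None, :, :]) ** 2).sum(-1)
    # Bilateral kernel: nearby pixels with similar colour interact strongly.
    bilateral = np.exp(-d_pos / (2 * theta_alpha ** 2)
                       - d_rgb / (2 * theta_beta ** 2))
    # Spatial kernel: nearby pixels interact regardless of colour.
    spatial = np.exp(-d_pos / (2 * theta_gamma ** 2))
    k = w_bilateral * bilateral + w_spatial * spatial
    np.fill_diagonal(k, 0.0)  # a pixel sends no message to itself
    return k
```

Building the full N x N matrix is only practical for toy inputs; it is shown here to make the definition concrete, not as an efficient implementation.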
Marginal update
• The marginal update involves the expectation of the cost over the distribution Q, given that x_i takes label l
• The expensive message-passing step is solved using a highly efficient permutohedral-lattice-based filtering approach
• Maximum posterior marginal (MPM) with the approximate distribution: x_i* = argmax_l Q_i(x_i = l)
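The update described above can be sketched by brute force as follows. This is a minimal illustration assuming a precomputed N x N kernel matrix and a label compatibility matrix `compat` (both hypothetical inputs); it replaces the permutohedral lattice filtering with a plain matrix product, so it costs O(N^2) per iteration and only suits toy problems.

```python
import numpy as np

def mean_field(unary, kernel, compat, n_iters=10, q_init=None):
    """Naive mean-field inference for a dense pairwise CRF (brute force).

    unary:  (N, L) unary costs
    kernel: (N, N) pairwise weights k(f_i, f_j) with zero diagonal
    compat: (L, L) label compatibility costs mu(l, l')
    q_init: optional (N, L) initial marginals; defaults to softmax(-unary)
    """
    def softmax(a):
        a = a - a.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(a)
        return e / e.sum(axis=1, keepdims=True)

    q = softmax(-unary) if q_init is None else q_init.copy()
    for _ in range(n_iters):
        msg = kernel @ q           # message passing: sum_j k(f_i, f_j) Q_j(l')
        pairwise = msg @ compat.T  # expected pairwise cost of each label l
        q = softmax(-unary - pairwise)
    return q, q.argmax(axis=1)     # marginals and the MPM labelling
```

The `q_init` argument is included so that a different starting point can be supplied, which is the setting the later initialisation slides are concerned with.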
Q distribution
• Q distribution for different classes across different iterations on the CamVid dataset
[Figure: Q distribution maps at iterations 0, 1, 2 and 10; colour scale from 0 to 1]
Two issues associated with the method
• Sensitive to initialisation
• Restrictive Gaussian pairwise weights
Our Contributions
• Resolve two issues associated with the method
  • Sensitivity to initialisation: propose a SIFT-flow based initialisation method
  • Restrictive Gaussian pairwise weights: an expectation maximisation (EM) based strategy to learn a more general Gaussian mixture model
Sensitivity to initialisation
• Experiment on the PascalVOC-10 segmentation dataset
• Initialising the mean-field inference with the ground truth labelling improves the I/U (intersection over union) score by almost 13%
• A good initialisation can lead to a better solution
• Propose a SIFT-flow based method for better initialisation
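For reference, the I/U score can be sketched as the per-class ratio of intersection to union between the predicted and ground truth label maps, averaged over classes. The function below is a simplified illustration: it skips classes absent from both maps and does not handle the void pixels that the PascalVOC evaluation ignores.

```python
import numpy as np

def intersection_over_union(pred, gt, n_classes):
    """Mean per-class I/U, i.e. TP / (TP + FP + FN) averaged over classes."""
    scores = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both prediction and truth
            scores.append(inter / union)
    return float(np.mean(scores))
```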
SIFT-flow based correspondence
• Given a test image, we first retrieve a set of nearest neighbours from the training set using GIST features
[Figure: test image; nearest neighbours retrieved from the training set]
SIFT-flow based correspondence
• The K nearest neighbours are warped to the test image
[Figure: test image; warped nearest neighbours and corresponding flow values: 13.31, 14.31, 23.31, 18.38, 22, 22, 22, 30.87, 27.2]
SIFT-flow based correspondence
• Pick the best nearest neighbour based on the flow value
[Figure: test image; nearest neighbour; warped image; flow: 13.31]
Label transfer
• Warp the ground truth of the best nearest neighbour according to the correspondence
• Transfer labels from the top-1 neighbour using the flow
[Figure: ground truth of test image; ground truth of the best nearest neighbour; flow; warped ground truth according to flow]
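The neighbour selection and label transfer steps might be sketched as below. Both function names are hypothetical, the flow is assumed to be an integer per-pixel displacement field already computed by SIFT flow, and the lowest flow value is assumed to indicate the best match (consistent with the value 13.31 shown for the selected neighbour).

```python
import numpy as np

def pick_best_neighbour(flow_values):
    """Index of the neighbour with the lowest flow value (assumed best)."""
    return int(np.argmin(flow_values))

def warp_labels(neighbour_labels, flow):
    """Warp a neighbour's ground truth label map onto the test image.

    neighbour_labels: (H, W) integer labels of the retrieved neighbour
    flow: (H, W, 2) integer displacements (dy, dx); test pixel (y, x) is
          assumed to correspond to neighbour pixel (y + dy, x + dx)
    """
    h, w = neighbour_labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Clamp correspondences that fall outside the neighbour image.
    src_y = np.clip(ys + flow[..., 0], 0, h - 1)
    src_x = np.clip(xs + flow[..., 1], 0, w - 1)
    return neighbour_labels[src_y, src_x]
```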
SIFT-flow based initialisation
• Rescore the unary potential
• s rescores the unary potential of a variable based on the label observed after the label transfer stage
• The parameter is set through cross-validation
• Qualitative improvement in accuracy after using the rescored unary potential
[Figure: test image; ground truth; without rescoring; after rescoring]
SIFT-flow based initialisation
• Initialise the mean-field solution
• Qualitative improvement in accuracy after initialisation of mean-field
[Figure: test image; ground truth; without initialisation; with initialisation]
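The rescoring formula is not preserved in the transcript, so the sketch below uses one plausible form only: lowering the unary cost of the transferred label by a constant `s`, then initialising Q from the rescored costs. The additive form, the function names and the convention that -1 marks pixels without a transferred label are all assumptions, not the authors' definition.

```python
import numpy as np

def rescore_unary(unary, transferred, s=1.0):
    """Lower the cost of the label suggested by label transfer (assumed form).

    unary:       (N, L) unary costs
    transferred: (N,) transferred labels, -1 where no label is available
    s:           rescoring strength, to be chosen by cross-validation
    """
    out = unary.copy()
    has_label = transferred >= 0
    out[has_label, transferred[has_label]] -= s
    return out

def init_q(unary):
    """Initial mean-field marginals: a softmax over negated unary costs."""
    a = -unary
    a = a - a.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)
```

With this form, a larger `s` makes the starting marginals follow the transferred labels more closely, while `s = 0` recovers the unrescored initialisation.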
Gaussian pairwise weights
• Experiment on the PascalVOC-10 segmentation dataset
• Plotted the distribution of class-class interactions by selecting random pairs of points (i, j)
[Figure: distributions for Aeroplane-Aeroplane, Car-Person, Horse-Person]
Gaussian pairwise weights
• Experiment on the PascalVOC-10 segmentation dataset
• Such complex structure in the data cannot be captured by a zero-mean Gaussian
[Figure annotations: distributed horizontally; distributed vertically; not centred around zero mean]
• Propose an EM-based learning strategy to incorporate a more general class of Gaussian mixture models
Our model
• Our energy function takes the following form: [Equation: energy function with Gaussian mixture pairwise terms]
• We use separate weights for label pairs, but the Gaussian components are shared
• We follow a piecewise learning strategy to learn the parameters of our energy function
Learning mixture model
• Learn the parameters as in the model of Krahenbuhl et al.*
• Learn the parameters of the Gaussian mixture: mean, standard deviation, mixing coefficients
• Lambda is set through cross-validation
*Krahenbuhl et al., Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 11
Our model
• We follow a generative training model
• Maximise the joint likelihood of pairs of labels and features, with a latent variable for the cluster assignment
• We follow an expectation maximisation (EM) based method to maximise the likelihood function
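As an illustration of the EM step, the sketch below fits a diagonal-covariance Gaussian mixture to a set of feature vectors. It is generic EM, not the authors' exact procedure: in the setting above the inputs would presumably be feature differences between pixel pairs for a given label pair, and the farthest-point initialisation of the means is a choice made here for determinism.

```python
import numpy as np

def em_gmm(x, n_components=2, n_iters=50):
    """Fit a diagonal-covariance Gaussian mixture to x of shape (N, D)."""
    # Deterministic farthest-point initialisation of the means (assumption).
    mean = [x[0]]
    for _ in range(1, n_components):
        d2 = np.min([((x - m) ** 2).sum(axis=1) for m in mean], axis=0)
        mean.append(x[np.argmax(d2)])
    mean = np.array(mean, dtype=float)
    var = np.tile(x.var(axis=0), (n_components, 1)) + 1e-6
    pi = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iters):
        # E-step: responsibility of each component for each point.
        diff = x[:, None, :] - mean[None, :, :]               # (N, M, D)
        log_p = -0.5 * (diff ** 2 / var + np.log(2 * np.pi * var)).sum(-1)
        log_r = np.log(pi) + log_p
        log_r -= log_r.max(axis=1, keepdims=True)             # stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing coefficients, means and variances.
        nk = r.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mean = (r.T @ x) / nk[:, None]
        diff = x[:, None, :] - mean[None, :, :]
        var = (r[:, :, None] * diff ** 2).sum(axis=0) / nk[:, None] + 1e-6
    return pi, mean, var
```

The returned means are free to sit away from zero, which is the property a zero-mean Gaussian kernel lacks.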
Learning mixture model
• Our model is able to capture the true distribution of class-class interactions
[Figure: Aeroplane-Aeroplane, Car-Person, Horse-Person]
Inference with mixture model
• Involves evaluating M extra Gaussian terms
• Perform blurring on mean-shifted points
• Increases time complexity
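The idea of blurring mean-shifted points can be sketched by brute force: a Gaussian with non-zero mean mu evaluated on f_i - f_j equals a zero-mean Gaussian between f_i and the shifted point f_j + mu. The function name and arguments below are hypothetical, and a filtering-based implementation would replace the explicit N x N matrix.

```python
import numpy as np

def shifted_gaussian_messages(feats, q, mu, sigma):
    """Compute sum_j exp(-|f_i - f_j - mu|^2 / (2 sigma^2)) Q_j(l).

    feats: (N, D) feature vectors, q: (N, L) current marginals,
    mu: (D,) mean of one mixture component, sigma: its bandwidth.
    """
    shifted = feats + mu  # mean-shifted input points f_j + mu
    d2 = ((feats[:, None, :] - shifted[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(k, 0.0)  # exclude the j = i term
    return k @ q
```

Calling this once per mixture component makes the cost grow with the number of components, which matches the increase in time complexity noted above.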
Experiments on CamVid
• Q distribution for the building class on the CamVid dataset at iterations 0, 1, 2 and 10
• Confidence of building pixels increases with initialisation
[Figure: ground truth; without initialisation; with initialisation; colour scale from 0 to 1]
Experiments on CamVid
• Building is properly recovered with our initialisation strategy
[Figure: image 2; ground truth; without initialisation; with initialisation]
Experiments on CamVid
• Quantitative results on the CamVid dataset
• Our model with unary and pairwise terms achieves better accuracy than other complex models
• Generally achieves very high efficiency compared to other methods
Experiments on CamVid
• Qualitative results on the CamVid dataset
• Able to recover building and tree properly
[Figure: image; ground truth; alpha-expansion; ours]
Experiments on PascalVOC-10
• Qualitative results of the SIFT-flow method
• Able to recover missing body parts
[Figure: image; warped nearest ground truth image; output without SIFT-flow; output with SIFT-flow; ground truth]
Experiments on PascalVOC-10
• Quantitative results on the PascalVOC-10 segmentation dataset
• Our model with unary and pairwise terms achieves better accuracy than other complex models
• Generally achieves very high efficiency compared to other methods
Experiments on PascalVOC-10
• Qualitative results on the PascalVOC-10 segmentation dataset
• Able to recover missing object and body parts
[Figure: image; ground truth; alpha-expansion; dense CRF; ours]
Conclusion
• Filter-based mean-field inference promises high efficiency and accuracy
• Proposed methods to make the basic mean-field method more robust
  • SIFT-flow based method for better initialisation
  • EM-based algorithm for learning a general Gaussian mixture model
• More complex higher-order models can be incorporated into the pairwise model