A Bayesian Approach to Recognition Moshe Blank Ita Lifshitz Reverend Thomas Bayes 1702-1761
Agenda • Bayesian decision theory • Maximum Likelihood • Bayesian Estimation • Recognition • Simple probabilistic model • Mixture model • More advanced probabilistic model • “One-Shot” Learning
Bayesian Decision Theory • We are given a training set T of samples of class c. • Given a query image x, we want to know how probable it is under the class, p(x|T). • We know that the class has some fixed distribution with unknown parameters θ, i.e. p(x|θ) is known as a function of θ. • Bayes' rule tells us: p(x|T) = ∫p(x,θ|T)dθ = ∫p(x|θ)p(θ|T)dθ • What can we do about p(θ|T)?
Maximum Likelihood Estimation What can we do about p(θ|T)? Choose the parameter value θML that makes the training data most probable: θML = arg maxθ p(T|θ) Then p(θ|T) = δ(θ – θML), so ∫p(x|θ)p(θ|T)dθ = p(x|θML)
ML Illustration Assume that the points of T are drawn from some normal distribution with known variance and unknown mean
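For the Gaussian example above, the ML estimate has a closed form: it is the sample mean. A minimal sketch with made-up toy numbers (not from the slides):

```python
# Sketch (toy numbers, not from the slides): ML estimation of the unknown
# mean of a Gaussian with known variance is simply the sample mean of T.
import numpy as np

rng = np.random.default_rng(0)
true_mean, known_sigma = 2.0, 1.0
T = rng.normal(true_mean, known_sigma, size=20)   # training samples

mu_ml = T.mean()                  # theta_ML = arg max_mu p(T | mu)
print(f"ML estimate of the mean: {mu_ml:.3f}")
```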
Bayesian Estimation • The Bayesian Estimation approach considers θ as a random variable. • Before we observe the training data, the parameters are described by a prior p(θ), which is typically very broad. • Once the data is observed, we can use Bayes' formula to find the posterior p(θ|T). Since some values of the parameters are more consistent with the data than others, the posterior is narrower than the prior.
Bayesian Estimation • Unlike ML, Bayesian estimation does not choose a specific value for θ, but instead performs a weighted average over all possible values of θ. • Why is it more accurate than ML?
Maximum Likelihood vs Bayesian • ML and Bayesian estimation are asymptotically equivalent and "consistent". • ML is typically computationally easier. • ML is often easier to interpret: it returns the single best model (parameter), whereas Bayesian gives a weighted average of models. • But for finite training data (and given a reliable prior) Bayesian is more accurate (it uses more of the information). • Bayesian estimation with a "flat" prior is essentially ML; with an informative (e.g. asymmetric) prior the two methods lead to different solutions (compare the sketch below).
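A minimal sketch of the Bayesian counterpart for the same toy Gaussian problem, assuming a conjugate Gaussian prior on the mean (all values are illustrative, not from the slides):

```python
# Sketch (toy values, conjugate Gaussian prior assumed): Bayesian estimation
# for the same Gaussian-with-known-variance problem. Instead of a single
# theta_ML we obtain a posterior p(theta | T), and the predictive density
# p(x | T) averages over it.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                                   # known standard deviation
T = rng.normal(2.0, sigma, size=20)           # training samples

mu0, sigma0 = 0.0, 10.0                       # broad prior N(mu0, sigma0^2) on the mean
n, xbar = len(T), T.mean()

# Posterior p(mu | T) is Gaussian by conjugacy:
post_var = 1.0 / (n / sigma**2 + 1.0 / sigma0**2)
post_mean = post_var * (n * xbar / sigma**2 + mu0 / sigma0**2)

# Predictive p(x | T) = integral of p(x | mu) p(mu | T) dmu is again Gaussian:
pred_mean, pred_var = post_mean, sigma**2 + post_var

print(f"ML mean:        {xbar:.3f}")
print(f"posterior mean: {post_mean:.3f}  (variance {post_var:.4f})")
print(f"predictive:     N({pred_mean:.3f}, {pred_var:.4f})")
```

With this broad prior the posterior mean is close to the ML estimate, illustrating the last bullet above.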
Agenda • Bayesian decision theory • Recognition • Simple probabilistic model • Mixture model • More advanced probabilistic model • “One-Shot” Learning
Objective Given an image, decide whether or not it contains an object of a specific class.
Main Issues • Representation • Learning • Recognition
Approaches to Recognition • Photometric properties – filter subspaces, neural networks, principal component analysis… • Geometric constraints between low-level object features – alignment, geometric invariance, geometric hashing… • Object Model
Model: constellation of Parts • Yuille, ‘91 • Brunelli & Poggio, ‘93 • Lades, v.d. Malsburg et al. ‘93 • Cootes, Lanitis, Taylor et al. ‘95 • Amit & Geman, ‘95, ‘99 • Perona et al. ‘95, ‘96, ‘98, ‘00, ‘02 Fischler & Elschlager, 1973
Perona’s Approach • Objects are represented as a probabilistic constellation of rigid parts (features). • The variability within a class is represented by a joint probability density function on the shape of the constellation and the appearance of the parts.
Agenda • Bayesian decision theory • Recognition • Simple probabilistic model • Model parameterization • Feature Selection • Learning • Mixture model • More advanced probabilistic model • “One-Shot” Learning
Weber, Welling, Perona – 2000 • Unsupervised Learning of Models for Recognition • Towards Automatic Discovery of Object Categories
Unsupervised Learning Learn to recognize an object class given a set of class and background pictures, without any preprocessing – no labeling, segmentation or alignment.
Model Description • Each object is constructed of F parts, each of a certain type. • Relations between the part locations define the shape of the object.
Image Model • An image is transformed into a collection of parts • Objects are modeled as sub-collections of these parts
Model Parameterization Given an image, we detect potential object parts to obtain the observable Xo – the candidate locations of the detected parts of each type.
Hypothesis • When presented with an un-segmented and unlabeled image, we do not know which parts correspond to the foreground. • Assuming the image contains the object, we use a vector of indices h to indicate which of the observables correspond to foreground points (i.e. real parts of the object). • We call h a hypothesis, since it is a guess about the structure of the object. h = (h1, …, hF) is not observable.
Additional Hidden Variables • We denote by xm the locations of the unobserved (missing) object parts. • b = sign(h) – a binary vector indicating which parts were detected • n – the number of background detections of each part type (a small sketch of how b and n follow from h is given below)
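A minimal sketch, with made-up numbers, of how b and n are derived from a hypothesis vector h:

```python
# Sketch (made-up numbers): deriving the hidden indicators b and the
# background counts n from a hypothesis vector h.
import numpy as np

num_detections = np.array([3, 2, 4])   # candidates found for each part type (assumed)

# h[f] = 1-based index of the candidate explained as foreground part f,
#        or 0 if part f is not detected under this hypothesis.
h = np.array([2, 0, 1])

b = (h > 0).astype(int)                # which parts are present: b = sign(h)
n = num_detections - b                 # all remaining candidates are background

print("b =", b)    # [1 0 1]
print("n =", n)    # [2 2 3]
```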
Probabilistic Model • We can now define a generative probabilistic model for the object class using the joint probability density p(Xo, xm, h, n, b):
Model Details Since n and b are fully determined by Xo and h, we have p(Xo, xm, h, n, b) = p(Xo, xm, h). By Bayes' rule this factors as: p(Xo, xm, h, n, b) = p(Xo, xm | h, n, b) · p(h | n, b) · p(n) · p(b)
Model Details p(b): a full table of joint detection probabilities (for small F), or F independent per-part detection probabilities (for large F)
Model Details p(n): a Poisson probability density function with parameter Mt for the number of background detections of features of type t
Model Details p(h | n, b): a uniform probability over all hypotheses consistent with n and b
Model Details p(Xo, xm | h, n, b): the coordinates of all foreground detections (together with the missing parts xm) are modeled by a joint Gaussian over the constellation shape, while the coordinates of all background detections are modeled as uniform over the image (a toy evaluation of these factors is sketched below)
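A minimal sketch, with made-up parameter values and only two parts (both detected, so xm is empty), of how one hypothesis' contribution to this density could be evaluated:

```python
# Sketch (made-up parameter values, 2 parts, both detected so xm is empty):
# evaluating one hypothesis' contribution to
#   p(Xo, xm, h, n, b) = p(Xo, xm | h, n, b) p(h | n, b) p(n) p(b)
import numpy as np
from scipy.stats import multivariate_normal, poisson

A = 200.0 * 200.0                                # image area (background is uniform)
mu = np.array([50.0, 60.0, 120.0, 65.0])         # assumed shape mean (x1,y1,x2,y2)
Sigma = np.diag([25.0, 25.0, 25.0, 25.0])        # assumed shape covariance
M = np.array([2.0, 3.0])                         # mean background detections per type
p_b = {(1, 1): 0.7, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.1}   # joint table of p(b)

b = (1, 1)                                       # both parts present in this hypothesis
num_detections = np.array([3, 2])                # candidates found for each part type
n = num_detections - np.array(b)                 # remaining candidates are background

x_fg = np.array([52.0, 58.0, 118.0, 67.0])       # coordinates picked as foreground
num_bg = int(n.sum())

p_shape = multivariate_normal.pdf(x_fg, mean=mu, cov=Sigma)  # Gaussian foreground
p_bg = (1.0 / A) ** num_bg                                   # uniform background
p_n = np.prod(poisson.pmf(n, M))                             # Poisson background counts
p_h = 1.0 / np.prod(num_detections ** np.array(b))           # uniform over consistent h
density = p_shape * p_bg * p_n * p_h * p_b[b]
print(f"contribution of this hypothesis: {density:.3e}")
```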
Invariance to Translation, Rotation and Scale • There is no use in modeling the shape of the object in terms of absolute pixel positions of the features. • We apply a transformation to the features' coordinates to make the shape invariant to translation, rotation and scale (one possible normalization is sketched below). • But the feature detector must be invariant to these transformations as well!
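One common similarity normalization (an assumption for illustration, not necessarily the one used in the paper) maps two reference parts onto a canonical frame:

```python
# Sketch (one common choice, assumed for illustration): make part coordinates
# invariant to translation, rotation and scale by mapping the first two parts
# onto a canonical reference frame.
import numpy as np

def normalize_shape(points):
    """points: (F, 2) array of part locations; returns canonical coordinates."""
    p0, p1 = points[0], points[1]
    t = points - p0                      # translation: part 0 goes to the origin
    d = p1 - p0
    scale = np.hypot(d[0], d[1])         # distance part0 -> part1
    angle = np.arctan2(d[1], d[0])
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])      # rotate part 1 onto the x-axis
    return (t @ R.T) / scale             # scale so that |part1 - part0| = 1

pts = np.array([[10.0, 20.0], [30.0, 40.0], [15.0, 50.0]])
print(normalize_shape(pts))              # part 0 -> (0,0), part 1 -> (1,0)
```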
Automatic Part Selection • Find points of interest in all training images • Apply Vector Quantization and clustering to get 100 total candidate patterns.
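A minimal sketch of the vector-quantization step; the 11×11 patch size and the use of scikit-learn's k-means are assumptions, not stated in the slides:

```python
# Sketch (patch size and library are assumptions): vector-quantizing small
# patches around the interest points into ~100 candidate part patterns.
import numpy as np
from sklearn.cluster import KMeans

# `patches` stands for 11x11 gray patches cut around interest points in all
# training images, flattened to vectors (random stand-in data here).
rng = np.random.default_rng(0)
patches = rng.random((5000, 11 * 11))

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(patches)
candidate_parts = kmeans.cluster_centers_.reshape(100, 11, 11)
print(candidate_parts.shape)           # 100 candidate patterns
```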
Automatic Part Selection [Figure: points of interest detected in the training images and the resulting candidate part patterns]
Method Scheme: Part Selection → Model Learning → Test
Automatic Part Selection • Find the subset of candidate parts of (small) size F that, when used in the model, gives the best performance in the learning phase (a possible greedy search is sketched below).
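The slides do not spell out the search procedure; a simple greedy forward selection with a hypothetical scoring callback might look like this:

```python
# Sketch (the slides do not specify the search): greedy forward selection of
# F parts from the candidate pool, scoring each subset with a hypothetical
# callback that trains a model on it and returns validation performance.
def select_parts(candidates, F, train_and_score):
    chosen = []
    for _ in range(F):
        best_part, best_score = None, float("-inf")
        for c in candidates:
            if c in chosen:
                continue
            score = train_and_score(chosen + [c])   # e.g. 1 - error on validation
            if score > best_score:
                best_part, best_score = c, score
        chosen.append(best_part)
    return chosen
```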
Learning • Goal: find θ = {μ, Σ, p(b), M} which best explains the observed (training) data • μ, Σ – mean and covariance parameters of the joint Gaussian modeling the shape of the foreground • b – random variable denoting whether each of the parts of the model is detected or not • M – average number of background detections for each of the part types
Learning • Goal: find θ = {μ, Σ, p(b), M} which best explains the observed (training) data, i.e. maximize the likelihood: θ* = arg maxθ p(Xo | θ) • Done using the EM method
Expectation Maximization (EM) • EM is an iterative optimization method for estimating unknown parameters θ from measurement data U when some "hidden" variables J are not given. • We want to maximize the posterior probability of the parameters θ given the data U, marginalizing over J: p(θ | U) = ∫ p(θ, J | U) dJ
Expectation Maximization (EM) • Choose an initial parameter guess θ0 • E-Step: estimate the unobserved (hidden) data from the observed data using the current parameter guess θk • M-Step: compute the maximum-likelihood estimate of the parameters, θk+1, using the estimated data • Repeat with the new guess θk+1
Expectation Maximization (EM) • Alternate between estimating the unknown parameters θ and the hidden variables J. • The EM algorithm converges to a local maximum (a toy EM iteration is sketched below).
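A minimal stand-in example of the E/M alternation, on a 1-D mixture of two Gaussians rather than the constellation model itself (all numbers are made up):

```python
# Sketch (made-up data, a stand-in problem): the E/M alternation on a 1-D
# mixture of two Gaussians, to illustrate the structure of the iteration.
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

mu = np.array([-1.0, 1.0])        # initial parameter guess theta_0
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: responsibility of each component for each point (hidden data)
    r = pi * gauss(data[:, None], mu, sigma)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: maximum-likelihood re-estimation from the soft assignments
    Nk = r.sum(axis=0)
    mu = (r * data[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((r * (data[:, None] - mu) ** 2).sum(axis=0) / Nk)
    pi = Nk / len(data)

print("means:", mu, "stds:", sigma, "weights:", pi)
```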
Method Scheme: Part Selection → Model Learning → Test
Recognition • Using the maximum a posteriori approach, we consider the ratio R = Σh p(Xo, h | θ) / p(Xo, h0 | θ), where h0 is the null hypothesis, which explains all detected parts as background noise. • We accept the image as belonging to the class if R is above a certain threshold (see the sketch below).
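A minimal sketch of the decision rule; `object_likelihood` and `null_likelihood` are hypothetical placeholders for the two densities above:

```python
# Sketch (hypothetical function names): the likelihood-ratio test used for
# recognition. `object_likelihood` stands for the class density summed over
# object hypotheses, `null_likelihood` for the all-background term p(Xo, h0).
def accept(Xo, theta, object_likelihood, null_likelihood, threshold=1.0):
    R = object_likelihood(Xo, theta) / null_likelihood(Xo, theta)
    return R > threshold

# In practice one would compare log-likelihoods for numerical stability:
#   accept if log p(object) - log p(background) > log(threshold)
```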
Database • Two classes – faces and cars • 100 training images for each class • 100 test images for each class • Images vary in scale, location of the object, lighting conditions • Images have cluttered background • No manual preprocessing
Model Performance Average training and testing errors, measured as 1 − Area(ROC), suggest a 4-part model for faces and a 5-part model for cars as optimal.
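A minimal sketch (toy labels and scores) of how the 1 − Area(ROC) error measure can be computed, assuming scikit-learn:

```python
# Sketch (toy labels and scores): computing the 1 - Area(ROC) error measure
# from the ratio R on a labelled test set, using scikit-learn.
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])                   # 1 = object present
scores = np.array([2.3, 1.8, 0.4, 0.2, 1.1, 0.1, 3.0, 0.3])   # ratio R per image

error = 1.0 - roc_auc_score(labels, scores)
print(f"1 - Area(ROC) = {error:.3f}")
```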
Multiple Use of Parts Part 'C' has high variance along the vertical direction – it can be detected in several locations: bumper, license plate or roof.
Recognition Results • Average success rate (at equal false-positive and false-negative rates): • Faces: 93.5% • Cars: 86.5%
Agenda • Bayesian decision theory • Recognition • Simple probabilistic model • Mixture model • More advanced probabilistic model • “One-Shot” Learning
Mixture Model • The Gaussian model works well for homogeneous classes, but real-life objects can be far from homogeneous. • Can we extend our approach to multi-modal classes?
Mixture Model • An object is modeled using Ω different components, each of which is a probabilistic model of its own (see the sketch below). • Each component "sees the whole picture". • Components are trained together.
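A minimal sketch (with stand-in component densities) of how the mixture scores an image as a weighted sum over its Ω components:

```python
# Sketch (stand-in component densities): the mixture scores an image by a
# weighted sum of the per-component constellation densities.
def mixture_likelihood(Xo, components, weights):
    # components: list of callables p_omega(Xo); weights: mixing proportions
    return sum(w * p(Xo) for w, p in zip(weights, components))

# Example with pretend densities for Omega = 2 components:
comps = [lambda Xo: 0.20, lambda Xo: 0.05]
print(mixture_likelihood(None, comps, [0.6, 0.4]))   # 0.14
```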
Database • Faces with different viewing angles – 0°, 15°, …, 90° • Cars – rear view and side view • Tree leaves – of several types