A Bayesian Approach to Recognition Moshe Blank Ita Lifshitz Reverend Thomas Bayes 1702-1761
Agenda • Bayesian decision theory • Maximum Likelihood • Bayesian Estimation • Recognition • Simple probabilistic model • Mixture model • More advanced probabilistic model • “One-Shot” Learning
Bayesian Decision Theory • We are given a training set T of samples of class c. • Given a query image x, we want to know how probable it is under the class, p(x|T). • We know that the class has some fixed distribution with unknown parameters θ, i.e. p(x|θ) is known as a function of θ. • Bayes' rule tells us: p(x|T) = ∫p(x,θ|T)dθ = ∫p(x|θ)p(θ|T)dθ • What can we do about p(θ|T)?
Maximum Likelihood Estimation What can we do about p(θ|T)? Choose the parameter value θML that makes the training data most probable: θML = arg maxθ p(T|θ) Then p(θ|T) = δ(θ – θML), so ∫p(x|θ)p(θ|T)dθ = p(x|θML)
ML Illustration Assume that the points of T are drawn from some normal distribution with known variance and unknown mean
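For the Gaussian example above, the ML estimate has a closed form: it is the sample mean. A minimal sketch with made-up toy numbers (not from the slides):

```python
# Sketch (toy numbers, not from the slides): ML estimation of the unknown
# mean of a Gaussian with known variance is simply the sample mean of T.
import numpy as np

rng = np.random.default_rng(0)
true_mean, known_sigma = 2.0, 1.0
T = rng.normal(true_mean, known_sigma, size=20)   # training samples

mu_ml = T.mean()                  # theta_ML = arg max_mu p(T | mu)
print(f"ML estimate of the mean: {mu_ml:.3f}")
```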
Bayesian Estimation • The Bayesian Estimation approach considers θ as a random variable. • Before we observe the training data, the parameters are described by a prior p(θ), which is typically very broad. • Once the data is observed, we can use Bayes' formula to find the posterior p(θ|T). Since some values of the parameters are more consistent with the data than others, the posterior is narrower than the prior.
Bayesian Estimation • Unlike ML, Bayesian estimation does not choose a specific value for θ, but instead performs a weighted average over all possible values of θ. • Why is it more accurate than ML?
Maximum Likelihood vs Bayesian • ML and Bayesian estimation are asymptotically equivalent and "consistent". • ML is typically computationally easier. • ML is often easier to interpret: it returns the single best model (parameter), whereas Bayesian gives a weighted average of models. • But for finite training data (and given a reliable prior) Bayesian is more accurate (it uses more of the information). • Bayesian estimation with a "flat" prior is essentially ML; with an informative (e.g. asymmetric) prior the two methods lead to different solutions (compare the sketch below).
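A minimal sketch of the Bayesian counterpart for the same toy Gaussian problem, assuming a conjugate Gaussian prior on the mean (all values are illustrative, not from the slides):

```python
# Sketch (toy values, conjugate Gaussian prior assumed): Bayesian estimation
# for the same Gaussian-with-known-variance problem. Instead of a single
# theta_ML we obtain a posterior p(theta | T), and the predictive density
# p(x | T) averages over it.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                                   # known standard deviation
T = rng.normal(2.0, sigma, size=20)           # training samples

mu0, sigma0 = 0.0, 10.0                       # broad prior N(mu0, sigma0^2) on the mean
n, xbar = len(T), T.mean()

# Posterior p(mu | T) is Gaussian by conjugacy:
post_var = 1.0 / (n / sigma**2 + 1.0 / sigma0**2)
post_mean = post_var * (n * xbar / sigma**2 + mu0 / sigma0**2)

# Predictive p(x | T) = integral of p(x | mu) p(mu | T) dmu is again Gaussian:
pred_mean, pred_var = post_mean, sigma**2 + post_var

print(f"ML mean:        {xbar:.3f}")
print(f"posterior mean: {post_mean:.3f}  (variance {post_var:.4f})")
print(f"predictive:     N({pred_mean:.3f}, {pred_var:.4f})")
```

With this broad prior the posterior mean is close to the ML estimate, illustrating the last bullet above.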
Agenda • Bayesian decision theory • Recognition • Simple probabilistic model • Mixture model • More advanced probabilistic model • “One-Shot” Learning
Objective Given an image, decide whether or not it contains an object of a specific class.
Main Issues • Representation • Learning • Recognition
Approaches to Recognition • Photometric properties – filter subspaces, neural networks, principal component analysis… • Geometric constraints between low-level object features – alignment, geometric invariance, geometric hashing… • Object Model
Model: constellation of Parts • Yuille, ‘91 • Brunelli & Poggio, ‘93 • Lades, v.d. Malsburg et al. ‘93 • Cootes, Lanitis, Taylor et al. ‘95 • Amit & Geman, ‘95, ‘99 • Perona et al. ‘95, ‘96, ‘98, ‘00, ‘02 Fischler & Elschlager, 1973
Perona’s Approach • Objects are represented as a probabilistic constellation of rigid parts (features). • The variability within a class is represented by a joint probability density function on the shape of the constellation and the appearance of the parts.
Agenda • Bayesian decision theory • Recognition • Simple probabilistic model • Model parameterization • Feature Selection • Learning • Mixture model • More advanced probabilistic model • “One-Shot” Learning
Weber, Welling, Perona – 2000 • Unsupervised Learning of Models for Recognition • Towards Automatic Discovery of Object Categories
Unsupervised Learning Learn to recognize an object class given a set of class and background pictures, without any preprocessing – no labeling, segmentation or alignment.
Model Description • Each object is constructed of F parts, each of a certain type. • Relations between the part locations define the shape of the object.
Image Model • An image is transformed into a collection of parts • Objects are modeled as sub-collections of these parts
Model Parameterization Given an image, we detect potential object parts to obtain the observable Xo – the candidate locations of the detected parts of each type.
Hypothesis • When presented with an un-segmented and unlabeled image, we do not know which parts correspond to the foreground. • Assuming the image contains the object, we use a vector of indices h to indicate which of the observables correspond to foreground points (i.e. real parts of the object). • We call h a hypothesis, since it is a guess about the structure of the object. h = (h1, …, hF) is not observable.
Additional Hidden Variables • We denote by xm the locations of the unobserved (missing) object parts. • b = sign(h) – a binary vector indicating which parts were detected • n – the number of background detections of each part type (a small sketch of how b and n follow from h is given below)
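A minimal sketch, with made-up numbers, of how b and n are derived from a hypothesis vector h:

```python
# Sketch (made-up numbers): deriving the hidden indicators b and the
# background counts n from a hypothesis vector h.
import numpy as np

num_detections = np.array([3, 2, 4])   # candidates found for each part type (assumed)

# h[f] = 1-based index of the candidate explained as foreground part f,
#        or 0 if part f is not detected under this hypothesis.
h = np.array([2, 0, 1])

b = (h > 0).astype(int)                # which parts are present: b = sign(h)
n = num_detections - b                 # all remaining candidates are background

print("b =", b)    # [1 0 1]
print("n =", n)    # [2 2 3]
```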
Probabilistic Model • We can now define a generative probabilistic model for the object class using the joint probability density p(Xo, xm, h, n, b):
Model Details Since n and b are fully determined by Xo and h, we have p(Xo, xm, h, n, b) = p(Xo, xm, h). By Bayes' rule this factors as: p(Xo, xm, h, n, b) = p(Xo, xm | h, n, b) · p(h | n, b) · p(n) · p(b)
Model Details p(b): a full table of joint detection probabilities (for small F), or F independent per-part detection probabilities (for large F)
Model Details p(n): a Poisson probability density function with parameter Mt for the number of background detections of features of type t
Model Details p(h | n, b): a uniform probability over all hypotheses consistent with n and b
Model Details p(Xo, xm | h, n, b): the coordinates of all foreground detections (together with the missing parts xm) are modeled by a joint Gaussian over the constellation shape, while the coordinates of all background detections are modeled as uniform over the image (a toy evaluation of these factors is sketched below)
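A minimal sketch, with made-up parameter values and only two parts (both detected, so xm is empty), of how one hypothesis' contribution to this density could be evaluated:

```python
# Sketch (made-up parameter values, 2 parts, both detected so xm is empty):
# evaluating one hypothesis' contribution to
#   p(Xo, xm, h, n, b) = p(Xo, xm | h, n, b) p(h | n, b) p(n) p(b)
import numpy as np
from scipy.stats import multivariate_normal, poisson

A = 200.0 * 200.0                                # image area (background is uniform)
mu = np.array([50.0, 60.0, 120.0, 65.0])         # assumed shape mean (x1,y1,x2,y2)
Sigma = np.diag([25.0, 25.0, 25.0, 25.0])        # assumed shape covariance
M = np.array([2.0, 3.0])                         # mean background detections per type
p_b = {(1, 1): 0.7, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.1}   # joint table of p(b)

b = (1, 1)                                       # both parts present in this hypothesis
num_detections = np.array([3, 2])                # candidates found for each part type
n = num_detections - np.array(b)                 # remaining candidates are background

x_fg = np.array([52.0, 58.0, 118.0, 67.0])       # coordinates picked as foreground
num_bg = int(n.sum())

p_shape = multivariate_normal.pdf(x_fg, mean=mu, cov=Sigma)  # Gaussian foreground
p_bg = (1.0 / A) ** num_bg                                   # uniform background
p_n = np.prod(poisson.pmf(n, M))                             # Poisson background counts
p_h = 1.0 / np.prod(num_detections ** np.array(b))           # uniform over consistent h
density = p_shape * p_bg * p_n * p_h * p_b[b]
print(f"contribution of this hypothesis: {density:.3e}")
```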
Invariance to Translation, Rotation and Scale • There is no use in modeling the shape of the object in terms of absolute pixel positions of the features. • We apply a transformation to the features' coordinates to make the shape invariant to translation, rotation and scale (one possible normalization is sketched below). • But the feature detector must be invariant to these transformations as well!
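One common similarity normalization (an assumption for illustration, not necessarily the one used in the paper) maps two reference parts onto a canonical frame:

```python
# Sketch (one common choice, assumed for illustration): make part coordinates
# invariant to translation, rotation and scale by mapping the first two parts
# onto a canonical reference frame.
import numpy as np

def normalize_shape(points):
    """points: (F, 2) array of part locations; returns canonical coordinates."""
    p0, p1 = points[0], points[1]
    t = points - p0                      # translation: part 0 goes to the origin
    d = p1 - p0
    scale = np.hypot(d[0], d[1])         # distance part0 -> part1
    angle = np.arctan2(d[1], d[0])
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])      # rotate part 1 onto the x-axis
    return (t @ R.T) / scale             # scale so that |part1 - part0| = 1

pts = np.array([[10.0, 20.0], [30.0, 40.0], [15.0, 50.0]])
print(normalize_shape(pts))              # part 0 -> (0,0), part 1 -> (1,0)
```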
Automatic Part Selection • Find points of interest in all training images • Apply Vector Quantization and clustering to get 100 total candidate patterns.
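A minimal sketch of the vector-quantization step; the 11×11 patch size and the use of scikit-learn's k-means are assumptions, not stated in the slides:

```python
# Sketch (patch size and library are assumptions): vector-quantizing small
# patches around the interest points into ~100 candidate part patterns.
import numpy as np
from sklearn.cluster import KMeans

# `patches` stands for 11x11 gray patches cut around interest points in all
# training images, flattened to vectors (random stand-in data here).
rng = np.random.default_rng(0)
patches = rng.random((5000, 11 * 11))

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(patches)
candidate_parts = kmeans.cluster_centers_.reshape(100, 11, 11)
print(candidate_parts.shape)           # 100 candidate patterns
```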
Automatic Part Selection [Figure: points of interest detected in the training images and the resulting candidate part patterns]
Method Scheme: Part Selection → Model Learning → Test
Automatic Part Selection • Find the subset of candidate parts of (small) size F that, when used in the model, gives the best performance in the learning phase (a possible greedy search is sketched below).
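The slides do not spell out the search procedure; a simple greedy forward selection with a hypothetical scoring callback might look like this:

```python
# Sketch (the slides do not specify the search): greedy forward selection of
# F parts from the candidate pool, scoring each subset with a hypothetical
# callback that trains a model on it and returns validation performance.
def select_parts(candidates, F, train_and_score):
    chosen = []
    for _ in range(F):
        best_part, best_score = None, float("-inf")
        for c in candidates:
            if c in chosen:
                continue
            score = train_and_score(chosen + [c])   # e.g. 1 - error on validation
            if score > best_score:
                best_part, best_score = c, score
        chosen.append(best_part)
    return chosen
```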
Learning • Goal: find θ = {μ, Σ, p(b), M} which best explains the observed (training) data • μ, Σ – mean and covariance parameters of the joint Gaussian modeling the shape of the foreground • b – random variable denoting whether each of the parts of the model is detected or not • M – average number of background detections for each of the part types
Learning • Goal: find θ = {μ, Σ, p(b), M} which best explains the observed (training) data, i.e. maximize the likelihood: θ* = arg maxθ p(Xo | θ) • Done using the EM method
Expectation Maximization (EM) • EM is an iterative optimization method for estimating unknown parameters θ from measurement data U when some "hidden" variables J are not given. • We want to maximize the posterior probability of the parameters θ given the data U, marginalizing over J: p(θ | U) = ∫ p(θ, J | U) dJ
Expectation Maximization (EM) • Choose an initial parameter guess θ0 • E-Step: estimate the unobserved (hidden) data from the observed data using the current parameter guess θk • M-Step: compute the maximum-likelihood estimate of the parameters, θk+1, using the estimated data • Repeat with the new guess θk+1
Expectation Maximization (EM) • Alternate between estimating the unknown parameters θ and the hidden variables J. • The EM algorithm converges to a local maximum (a toy EM iteration is sketched below).
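A minimal stand-in example of the E/M alternation, on a 1-D mixture of two Gaussians rather than the constellation model itself (all numbers are made up):

```python
# Sketch (made-up data, a stand-in problem): the E/M alternation on a 1-D
# mixture of two Gaussians, to illustrate the structure of the iteration.
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

mu = np.array([-1.0, 1.0])        # initial parameter guess theta_0
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: responsibility of each component for each point (hidden data)
    r = pi * gauss(data[:, None], mu, sigma)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: maximum-likelihood re-estimation from the soft assignments
    Nk = r.sum(axis=0)
    mu = (r * data[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((r * (data[:, None] - mu) ** 2).sum(axis=0) / Nk)
    pi = Nk / len(data)

print("means:", mu, "stds:", sigma, "weights:", pi)
```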
Method Scheme: Part Selection → Model Learning → Test
Recognition • Using the maximum a posteriori approach, we consider the ratio R = Σh p(Xo, h | θ) / p(Xo, h0 | θ), where h0 is the null hypothesis, which explains all detected parts as background noise. • We accept the image as belonging to the class if R is above a certain threshold (see the sketch below).
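A minimal sketch of the decision rule; `object_likelihood` and `null_likelihood` are hypothetical placeholders for the two densities above:

```python
# Sketch (hypothetical function names): the likelihood-ratio test used for
# recognition. `object_likelihood` stands for the class density summed over
# object hypotheses, `null_likelihood` for the all-background term p(Xo, h0).
def accept(Xo, theta, object_likelihood, null_likelihood, threshold=1.0):
    R = object_likelihood(Xo, theta) / null_likelihood(Xo, theta)
    return R > threshold

# In practice one would compare log-likelihoods for numerical stability:
#   accept if log p(object) - log p(background) > log(threshold)
```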
Database • Two classes – faces and cars • 100 training images for each class • 100 test images for each class • Images vary in scale, location of the object, lighting conditions • Images have cluttered background • No manual preprocessing
Model Performance Average training and testing errors, measured as 1 − Area(ROC), suggest a 4-part model for faces and a 5-part model for cars as optimal.
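A minimal sketch (toy labels and scores) of how the 1 − Area(ROC) error measure can be computed, assuming scikit-learn:

```python
# Sketch (toy labels and scores): computing the 1 - Area(ROC) error measure
# from the ratio R on a labelled test set, using scikit-learn.
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])                   # 1 = object present
scores = np.array([2.3, 1.8, 0.4, 0.2, 1.1, 0.1, 3.0, 0.3])   # ratio R per image

error = 1.0 - roc_auc_score(labels, scores)
print(f"1 - Area(ROC) = {error:.3f}")
```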
Multiple Use of Parts Part 'C' has high variance along the vertical direction – it can be detected in several locations: bumper, license plate or roof.
Recognition Results • Average success rate (at equal false-positive and false-negative rates): • Faces: 93.5% • Cars: 86.5%
Agenda • Bayesian decision theory • Recognition • Simple probabilistic model • Mixture model • More advanced probabilistic model • “One-Shot” Learning
Mixture Model • The Gaussian model works well for homogeneous classes, but real-life objects can be far from homogeneous. • Can we extend our approach to multi-modal classes?
Mixture Model • An object is modeled using Ω different components, each of which is a probabilistic model of its own (see the sketch below). • Each component "sees the whole picture". • Components are trained together.
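A minimal sketch (with stand-in component densities) of how the mixture scores an image as a weighted sum over its Ω components:

```python
# Sketch (stand-in component densities): the mixture scores an image by a
# weighted sum of the per-component constellation densities.
def mixture_likelihood(Xo, components, weights):
    # components: list of callables p_omega(Xo); weights: mixing proportions
    return sum(w * p(Xo) for w, p in zip(weights, components))

# Example with pretend densities for Omega = 2 components:
comps = [lambda Xo: 0.20, lambda Xo: 0.05]
print(mixture_likelihood(None, comps, [0.6, 0.4]))   # 0.14
```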
Database • Faces with different viewing angles – 0°, 15°, …, 90° • Cars – rear view and side view • Tree leaves – of several types