Dense Object Recognition 2. Template Matching
Face Detection
We will investigate face detection using a scanning-window technique. Think this task sounds easy?
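As a rough sketch of the scanning-window idea (an assumed illustration, not the lecture's own code): slide a fixed-size window over the image, score every position with a face model of the kind developed in the following slides, and repeat at several image scales. `score_window` is a hypothetical placeholder for that model:

```python
import numpy as np

def scan_image(image, score_window, size=60, step=4):
    """Slide a size x size window over an image and score every position."""
    H, W = image.shape[:2]
    scores = []
    for top in range(0, H - size + 1, step):
        for left in range(0, W - size + 1, step):
            window = image[top:top + size, left:left + size]
            scores.append((score_window(window), top, left))  # score_window: hypothetical face model
    return scores

# To detect faces of different sizes, rescale the image and scan again at each scale.
```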
Training Data
Faces: 800 face images, 60x60 pixels, taken from an online dating website.
Non-faces: 800 random non-face regions, 60x60 pixels, taken from the same data as the faces.
Vectorizing Images
Concatenate the face pixels into a "vector" x = [x1, x2, x3, ..., xN].
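A minimal sketch of this vectorization step in NumPy, assuming the 60x60 (RGB) crops are already loaded as arrays; the array names here (`faces`, `X_face`) are illustrative, not from the slides:

```python
import numpy as np

def vectorize_image(image):
    """Concatenate all pixel values of one image into a single vector x."""
    return image.reshape(-1).astype(np.float64)

# Stack the training crops into a data matrix, one vectorized image per row.
faces = np.zeros((800, 60, 60, 3))                        # placeholder for the real 60x60 face crops
X_face = np.stack([vectorize_image(im) for im in faces])  # shape (800, 10800)
```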
Overview of Approach
GENERATIVE APPROACH
• Calculate models for the data likelihood given each class.
• Compare likelihoods – in this case we will just calculate the likelihood ratio Pr(x|face) / Pr(x|non-face).
• Threshold the likelihood ratio to decide face / non-face.
All that remains is to specify the form of the likelihood terms.
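A minimal sketch of the likelihood-ratio test, assuming each class likelihood is wrapped in an object exposing a `logpdf` method (e.g. a frozen scipy distribution); the names are illustrative:

```python
def log_likelihood_ratio(x, face_model, nonface_model):
    """log Pr(x|face) - log Pr(x|non-face) for one vectorized image x."""
    return face_model.logpdf(x) - nonface_model.logpdf(x)

def is_face(x, face_model, nonface_model, threshold=0.0):
    """Threshold the log likelihood ratio to decide face / non-face."""
    return log_likelihood_ratio(x, face_model, nonface_model) > threshold
```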
The Multivariate Gaussian
Norm_x[μ, Σ] denotes an n-dimensional Gaussian or Normal distribution in the variable x, with mean μ and symmetric positive definite covariance matrix Σ. The covariance comes in three flavours: spherical (uniform), diagonal, and full.
Model #1: Gaussian, uniform covariance
Fit the model using the maximum likelihood criterion.
[Figure: mean images μ_face and μ_non-face (the face "template"), and a 2-pixel illustration of spherical covariance]
Fitted standard deviations: face 59.1, non-face 69.1.
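A sketch of the maximum likelihood fit for this model, assuming `X` is an (N, D) array of vectorized training images for one class (as in the earlier vectorization sketch):

```python
import numpy as np

def fit_uniform_gaussian(X):
    """ML fit of a Gaussian with spherical (uniform) covariance: one mean vector, one scalar variance."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean()   # single variance shared by every pixel
    return mu, sigma2

def log_likelihood_uniform(x, mu, sigma2):
    """log Pr(x) under the uniform-covariance Gaussian."""
    D = x.size
    return -0.5 * (D * np.log(2 * np.pi * sigma2) + ((x - mu) ** 2).sum() / sigma2)
```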
Model 1 Results
Results based on 200 cropped faces and 200 non-faces from the same database.
[ROC curve: Pr(Hit) against Pr(False Alarm)]
How does this work with a real image?
Scales 1, 2 and 3
[Figures: maxima in the log likelihood ratio at each of the three scanning-window scales]
Threshold Maxima
[Figure: maxima at scales 1, 2 and 3, shown before and after thresholding]
Results
[Figure panels: original image, superimposed log likelihood ratio, positions of maxima, detected faces]
Model #2: Gaussian, diagonal covariance
Fit the model using the maximum likelihood criterion.
[Figure: mean and per-pixel standard deviation images for face and non-face classes, and a 2-pixel illustration of diagonal covariance]
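The same sketch adapted to the diagonal-covariance model, again assuming `X` is an (N, D) array of vectorized images for one class:

```python
import numpy as np

def fit_diagonal_gaussian(X, min_var=1e-6):
    """ML fit of a Gaussian with diagonal covariance: one mean and one variance per pixel."""
    mu = X.mean(axis=0)
    var = X.var(axis=0) + min_var   # small floor keeps the log-likelihood finite
    return mu, var

def log_likelihood_diagonal(x, mu, var):
    """log Pr(x) under the diagonal-covariance Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum()
```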
Model 2 Results
Results based on 200 cropped faces and 200 non-faces from the same database. The more sophisticated model unsurprisingly classifies new faces and non-faces better.
[ROC curves: Pr(Hit) against Pr(False Alarm), diagonal vs. uniform covariance]
Model #3: Gaussian, full covariance
Fit the model using the maximum likelihood criterion.
PROBLEM: we cannot fit this model – we don't have enough data to estimate the full covariance matrix.
N = 800 training images, D = 10,800 dimensions.
Total number of measured numbers = ND = 800 x 10,800 = 8,640,000.
Total number of parameters in the covariance matrix = D(D+1)/2 = 10,800 x 10,801 / 2 = 58,325,400.
[Figure: 2-pixel illustration of full covariance]
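The parameter-counting argument as a few lines of arithmetic (the dimensionality 10,800 corresponds to 60 x 60 x 3 values per image):

```python
N, D = 800, 10_800                   # training images and dimensionality (60 x 60 x 3)
measured_numbers = N * D             # total measured numbers
full_cov_params = D * (D + 1) // 2   # free parameters in a symmetric D x D covariance matrix
print(measured_numbers)              # 8640000
print(full_cov_params)               # 58325400 - far more parameters than measurements
```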
Possible Solution
We could induce some covariance by using a mixture of Gaussians model in which each component is uniform or diagonal. For a small number of mixture components, the number of parameters is not too bad.
[Figure: 2-pixel illustration of mixture components inducing covariance]
For diagonal Gaussians there are 2D+1 unknowns per component (D parameters for the mean, D for the diagonal covariance, and 1 for the weight of the Gaussian), i.e. K(2D+1) for K components.
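One way to fit such a mixture in practice is scikit-learn's GaussianMixture with diagonal covariances; this is a hedged sketch rather than the lecture's own implementation, and `X_face` is the assumed (N, D) matrix of vectorized faces from the earlier sketch:

```python
from sklearn.mixture import GaussianMixture

K = 2
face_mog = GaussianMixture(n_components=K, covariance_type="diag", max_iter=200)
face_mog.fit(X_face)                       # X_face: (N, D) array of vectorized face images

log_lik = face_mog.score_samples(X_face)   # per-image log Pr(x | face) under the mixture
```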
Dense Object Recognition 3. Mixtures of Templates
Mixture of Gaussians
Key idea: represent the probability density as a weighted sum (mixture) of Gaussian distributions, Pr(x) = Σ_k w_k Norm_x[μ_k, Σ_k]. The weights must sum to 1, otherwise the result is not a valid pdf.
[Figure: 1D mixture density Pr(x) built from its weighted components]
Hidden Variable Interpretation
Try to think about the same problem in a different way: marginalize over a hidden variable h,
Pr(x) = Σ_h Pr(x, h) = Σ_h Pr(x|h) Pr(h).
Hidden Variable Interpretation
ASSUMPTIONS
• For each training datum x_i there is a hidden variable h_i.
• h_i represents which Gaussian x_i came from, hence h_i takes discrete values.
OUR GOAL
• To estimate the parameters θ: the means μ, variances σ² and weights w for each of the K components.
THING TO NOTICE #1: If we knew the hidden variables h_i for the training data it would be very easy to estimate the parameters θ – just estimate the individual Gaussians separately.
Hidden Variable Interpretation
THING TO NOTICE #2: If we knew the parameters θ it would be very easy to estimate the posterior distribution over each hidden variable h_i using Bayes' rule:
Pr(h_i = k | x_i) = Pr(x_i | h_i = k) Pr(h_i = k) / Σ_j Pr(x_i | h_i = j) Pr(h_i = j).
[Figure: component likelihoods Pr(x|h=1), Pr(x|h=2), Pr(x|h=3) and the resulting posterior Pr(h|x)]
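A small sketch of this Bayes' rule computation for a 1D mixture with fixed, illustrative parameters:

```python
import numpy as np
from scipy.stats import norm

def posterior_over_h(x, means, stds, weights):
    """Bayes' rule for a 1D mixture: Pr(h=k|x) for each component k, given fixed parameters."""
    joint = weights * norm.pdf(x, loc=means, scale=stds)   # Pr(x|h=k) Pr(h=k)
    return joint / joint.sum()                             # normalize over k

# Example with assumed (illustrative) parameters for a 3-component mixture.
print(posterior_over_h(1.0,
                       means=np.array([-2.0, 0.0, 2.0]),
                       stds=np.array([1.0, 1.0, 1.0]),
                       weights=np.array([0.3, 0.4, 0.3])))
```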
Expectation Maximization
Chicken and egg problem:
• could find h_1...N if we knew θ
• could find θ if we knew h_1...N
Solution: the Expectation Maximization (EM) algorithm (Dempster, Laird and Rubin 1977). Alternate between:
1. Expectation Step (E-Step): for fixed θ, find the posterior distribution over h_1...N.
2. Maximization Step (M-Step): given these distributions, maximize a lower bound on the likelihood w.r.t. θ.
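A minimal EM loop for a 1D mixture of Gaussians, written as an illustrative sketch of the E-step/M-step alternation (not the code used for the face models):

```python
import numpy as np
from scipy.stats import norm

def em_1d_mog(x, K, n_iter=100, seed=0):
    """EM for a 1D mixture of Gaussians; x is a 1D data array."""
    rng = np.random.default_rng(seed)
    means = rng.choice(x, size=K)
    stds = np.full(K, x.std())
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = Pr(h_i = k | x_i, theta)
        joint = weights * norm.pdf(x[:, None], loc=means, scale=stds)
        r = joint / joint.sum(axis=1, keepdims=True)
        # M-step: re-estimate theta from the responsibility-weighted data
        Nk = r.sum(axis=0)
        means = (r * x[:, None]).sum(axis=0) / Nk
        stds = np.sqrt((r * (x[:, None] - means) ** 2).sum(axis=0) / Nk)
        weights = Nk / len(x)
    return means, stds, weights
```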
MOG 2 Components
Face model parameters: component priors 0.4999 and 0.5001 [Figure: mean and standard-deviation images per component].
Non-face model parameters: component priors 0.5325 and 0.4675 [Figure: mean and standard-deviation images per component].
The face model and non-face model have each divided their data into two clusters, and in each case these clusters have roughly equal weights. The primary thing the clusters seem to have captured is the photometric (luminance) variation. Note that the standard deviations have become smaller than for the single Gaussian model, as any given data point is likely to be close to one mean or the other.
Results for MOG 2 Model
Performance improves relative to a single Gaussian model, although it is not dramatic. We have a better description of the data likelihood.
[ROC curves: Pr(Hit) against Pr(False Alarm), MOG 2 vs. diagonal vs. uniform]
MOG 5 Components
Face model parameters: component priors 0.0988, 0.1925, 0.2062, 0.2275, 0.1575 [Figure: mean and standard-deviation images per component].
Non-face model parameters: component priors 0.1737, 0.2250, 0.1950, 0.2200, 0.1863 [Figure: mean and standard-deviation images per component].
MOG 10 Components
Component priors, face model: 0.0075, 0.1425, 0.1437, 0.0988, 0.1038, 0.1187, 0.1638, 0.1175, 0.1038, 0.0000.
Component priors, non-face model: 0.1137, 0.0688, 0.0763, 0.0800, 0.1338, 0.1063, 0.1063, 0.1263, 0.0900, 0.0988.
[Figures: mean and standard-deviation images per component]
Results for MOG 10 Model
Performance improves slightly more, particularly at low false alarm rates. What if we move to an infinite number of Gaussians?
[ROC curves: Pr(Hit) against Pr(False Alarm), MOG 10 vs. MOG 2 vs. diagonal vs. uniform]
Dense Object Recognition 4. Subspace models: factor analysis
Factor Analysis: Intuitions
Consider putting the means of the Gaussian mixture components all on a line and forcing their diagonal covariances to be identical. What happens if we keep adding more and more Gaussians along this line? In the limit the hidden variable becomes continuous.
[Figure: Gaussians placed at h = -2, -1, 0, 1, 2 along a line in pixel space; marginalizing over the now continuous hidden variable h]
Now consider weighting the constituent Gaussians. If the weights decrease with distance from a central point, we can get something like an oriented Gaussian.
Factor Analysis: Maths
Place the component means along the line μ + hφ, each with the same diagonal covariance Σ:
Pr(x|h) = Norm_x[μ + φh, Σ].
Letting the discrete hidden variable h become continuous and weighting the components by another Gaussian distribution with mean 0 and variance 1, Pr(h) = Norm_h[0, 1], we marginalize over h:
Pr(x) = ∫ Pr(x|h) Pr(h) dh.
[Figures: Gaussians centred at μ + hφ for h = -2, ..., 2 in pixel space; the continuous hidden variable h; the resulting marginal distribution over the two pixels]
Factor Analysis: Maths
• This integral does actually evaluate to a new Gaussian whose principal axis is oriented along the line given by μ + hφ. This is not obvious!
• The line along which the Gaussians are placed is termed a subspace.
• Since h was just a number and there was a single column in φ, it was a one-dimensional subspace. This is not necessarily the case, though d_h < d_x always holds.
Factor Analysis: Maths
For a general subspace of d_h dimensions in a larger space of size d_x:
• Φ has d_h columns, each of length d_x – these are termed factors.
• They are basis vectors that span the subspace.
• h now weights these basis vectors to define a position in the subspace.
Concrete example: a 2D subspace in a 3D space.
• Φ will contain two 3D vectors in its columns, spanning a plane subspace.
• h determines the weighting of these vectors and hence the position on the plane.
A Generative View
We have considered factor analysis as an infinite mixture of Gaussians, but there are other ways to think about it. Consider a rule for creating new data points x_i from some smaller underlying random variables h_i.
To generate:
• choose factor loadings h_i from a standard normal distribution
• multiply by the factors Φ
• add the mean μ
• add a random noise component ε_i with diagonal covariance Σ
[Figure: graphical model with hidden variable h generating observed variable x]
A Generative View
x_i = μ + Φh_i + ε_i : a deterministic transformation of the hidden variable plus additive noise.
[Figure: points h_i in the 2D hidden space mapped onto a plane in the 3D observed space, then perturbed by noise ε]
A Generative View
Equivalent description:
Pr(h_i) = Norm_h[0, I]
Pr(x_i | h_i) = Norm_x[μ + Φh_i, Σ]
Joint distribution: Pr(x_i, h_i) = Pr(x_i | h_i) Pr(h_i); marginalize over h_i to get Pr(x_i) = Norm_x[μ, ΦΦᵀ + Σ].
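A small generative sketch, with an illustrative 2D subspace embedded in a 3D observed space (the specific μ, Φ and Σ values are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, 2.0, 3.0])             # mean
Phi = np.array([[1.0, 0.0],                # factors: two 3D basis vectors as columns
                [0.0, 1.0],
                [1.0, 1.0]])
Sigma_diag = np.array([0.1, 0.1, 0.1])     # diagonal noise covariance

def generate(n):
    h = rng.standard_normal((n, Phi.shape[1]))                      # loadings ~ Norm[0, I]
    eps = rng.standard_normal((n, mu.size)) * np.sqrt(Sigma_diag)   # noise ~ Norm[0, Sigma]
    return mu + h @ Phi.T + eps                                     # x = mu + Phi h + eps

X = generate(100_000)
# The sample covariance should be close to Phi Phi^T + Sigma.
print(np.cov(X, rowvar=False))
print(Phi @ Phi.T + np.diag(Sigma_diag))
```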
Factor Analysis Parameter Count
For a general subspace of d_h dimensions in a larger space of size d_x, the factor analysis covariance has:
• d_h d_x parameters in the factor matrix Φ
• d_x parameters in the noise covariance Σ
This gives a total of d_x(d_h + 1) parameters. If d_h is reasonably small and d_x is large, this is much less than the full covariance, which has d_x(d_x + 1)/2.
It is a reasonable assumption that an ensemble of images (like faces) genuinely lies largely within a subspace of the very high-dimensional image space, so this is not a bad model.
But given some data, how do we estimate Φ, Σ and μ? Unfortunately, to do this we will need some more maths!
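The parameter counts as arithmetic, assuming the 10,800-dimensional face images and an illustrative subspace dimension d_h = 10:

```python
d_x, d_h = 10_800, 10                   # observed dimensionality and an assumed subspace size
fa_params = d_x * (d_h + 1)             # factor matrix plus diagonal noise covariance
full_cov_params = d_x * (d_x + 1) // 2  # full covariance for comparison
print(fa_params)                        # 118800
print(full_cov_params)                  # 58325400
```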
Dense Object Recognition Interlude: Gaussian and Matrix Identities
Multivariate Normal Distribution
The multivariate generalization of the 1D Gaussian or Normal distribution. It depends on a mean vector μ and a (symmetric, positive definite) covariance matrix Σ. The multivariate normal distribution has PDF
Pr(x) = (2π)^(-n/2) |Σ|^(-1/2) exp[ -(x - μ)ᵀ Σ⁻¹ (x - μ) / 2 ]
where n is the dimensionality of the space under consideration.
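A quick check of this PDF, comparing scipy's implementation with the formula written out directly (the numbers are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # symmetric positive definite
x = np.array([0.5, 0.5])

print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))

# The same density from the formula above.
n = mu.size
diff = x - mu
print((2 * np.pi) ** (-n / 2) * np.linalg.det(Sigma) ** -0.5
      * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff))
```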
Gaussian Identity #1: Multiplication of Gaussians
Property: when we multiply two Gaussian distributions (common when applying Bayes' rule), the resulting distribution is also Gaussian. In particular:
Norm_x[a, A] · Norm_x[b, B] = κ · Norm_x[c, C]
where
C = (A⁻¹ + B⁻¹)⁻¹, c = C(A⁻¹a + B⁻¹b), κ = Norm_a[b, A + B].
The normalization constant κ is also Gaussian in either a or b. Intuitively you can see that the product must be a Gaussian, as each of the original Gaussians has an exponent that is quadratic in x. When we multiply the two Gaussians, we add the exponents, giving another quadratic.
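A numerical sanity check of this identity for illustrative 2D Gaussians, using the reconstructed expressions for C, c and κ above:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

a, A = np.array([0.0, 0.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
b, B = np.array([1.0, -1.0]), np.array([[1.0, 0.0], [0.0, 3.0]])

# Product parameters from the identity above.
C = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
c = C @ (np.linalg.inv(A) @ a + np.linalg.inv(B) @ b)
kappa = mvn(mean=b, cov=A + B).pdf(a)

x = np.array([0.4, -0.2])
lhs = mvn(mean=a, cov=A).pdf(x) * mvn(mean=b, cov=B).pdf(x)
rhs = kappa * mvn(mean=c, cov=C).pdf(x)
print(lhs, rhs)   # the two values should agree up to floating point error
```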