Understand Bayesian classification and GMM for structured classification, handwriting recognition, and spam filtering. Learn about the importance of prior knowledge and generative models in classification problems.
Bayesian Learning & Gaussian Mixture Models
Jianping Fan, Dept. of Computer Science, UNC-Charlotte
Basic Classification (Input → Output)
• Spam filtering (binary): input is an email such as "!!!!$$$!!!!", output is Spam vs. Not-Spam.
• Character recognition (multi-class): input is a character image, e.g. 'C', output is 'C' vs. the other 25 characters.
Structured Classification (Input → Output)
• Handwriting recognition: input is a handwritten word image, output is structured text, e.g. "brace".
• 3D object recognition: input is an image of a scene (e.g. a building and a tree), output is a structured description of the objects.
Overview of Bayesian Decision
• Bayesian classification: one example
• E.g. how to decide whether a patient is sick or healthy, based on:
• A probabilistic model of the observed data (data distributions)
• Prior knowledge (class ratio or importance)
Bayes’ Rule: Who is who in Bayes’ rule
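The rule itself, with the role of each term labelled (the slide shows it only as an image):

  P(h \mid d) = \frac{P(d \mid h)\, P(h)}{P(d)}
  \qquad \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}

Here h is a hypothesis (class) and d the observed data, matching the notation of the next slide.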
Classification problem
• Training data: examples of the form (d, h(d))
• where d is the data object to classify (the input)
• and h(d) is the correct class label for d, with h(d) ∈ {1, …, K}
• Goal: given a new object d_new, predict h(d_new)
Why Bayesian?
• Provides practical learning algorithms, e.g. Naïve Bayes
• Prior knowledge and observed data can be combined
• It is a generative (model-based) approach, which offers a useful conceptual framework
• E.g. sequences can also be classified, based on a probabilistic model specification
• Any kind of object can be classified, based on a probabilistic model specification
Univariate Normal Sample: draw an i.i.d. sample x1, …, xn from a normal distribution N(μ, σ²).
Maximum Likelihood: the likelihood of the sample is L(μ, σ² | x) = ∏i p(xi | μ, σ²); given x, it is a function of μ and σ², and we want to maximize it.
Log-Likelihood Function: maximize the log-likelihood l(μ, σ²) = log L(μ, σ² | x) instead; the maximum is found by setting ∂l/∂μ = 0 and ∂l/∂σ² = 0.
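The equations on these slides are not in the extracted text; for an i.i.d. sample x1, …, xn from N(μ, σ²), the standard log-likelihood and its maximizers are:

  l(\mu,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2,
  \qquad
  \hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
  \qquad
  \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2.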
Missing Data: now suppose the sample is incomplete, i.e. only part of x1, …, xn is observed and the remaining values are missing.
E-Step: Let θ(t) = (μ(t), σ²(t)) be the estimated parameters at the start of the t-th iteration. The E-step fills in the missing values with their expectations under θ(t).
M-Step: With the expectations from the E-step, the M-step re-estimates the parameters, giving θ(t+1).
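The slide equations are missing from the text; for a normal sample with x1, …, xm observed and the remaining n − m values missing, the standard EM updates are:

  \text{E-step:}\quad E[x_i \mid \theta^{(t)}] = \mu^{(t)}, \qquad E[x_i^2 \mid \theta^{(t)}] = (\mu^{(t)})^2 + \sigma^{2\,(t)} \quad (i > m),
  \text{M-step:}\quad \mu^{(t+1)} = \frac{1}{n}\Big(\sum_{i=1}^{m} x_i + (n-m)\,\mu^{(t)}\Big),
  \qquad \sigma^{2\,(t+1)} = \frac{1}{n}\Big(\sum_{i=1}^{m} x_i^2 + (n-m)\big((\mu^{(t)})^2 + \sigma^{2\,(t)}\big)\Big) - \big(\mu^{(t+1)}\big)^2.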
Exercise: n = 40 (10 data values missing); estimate μ and σ² using different initial conditions.
Observed data (30 values):
375.081556 362.275902 332.612068 351.383048 304.823174 386.438672
430.079689 395.317406 369.029845 365.343938 243.548664 382.789939
374.419161 337.289831 418.928822 364.086502 343.854855 371.279406
439.241736 338.281616 454.981077 479.685107 336.634962 407.030453
297.821512 311.267105 528.267783 419.841982 392.684770 301.910093
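A minimal Python sketch of these updates, using the 30 observed values from the exercise (the variable names and convergence tolerance are mine, not the slides'):

    import numpy as np

    # The 30 observed values from the exercise; 10 of the n = 40 values are missing.
    observed = np.array([
        375.081556, 362.275902, 332.612068, 351.383048, 304.823174, 386.438672,
        430.079689, 395.317406, 369.029845, 365.343938, 243.548664, 382.789939,
        374.419161, 337.289831, 418.928822, 364.086502, 343.854855, 371.279406,
        439.241736, 338.281616, 454.981077, 479.685107, 336.634962, 407.030453,
        297.821512, 311.267105, 528.267783, 419.841982, 392.684770, 301.910093,
    ])
    n, n_missing = 40, 10
    mu, var = 200.0, 100.0            # try different initial conditions here

    for _ in range(200):
        # E-step: expected sufficient statistics of the missing values under (mu, var).
        sum_x  = observed.sum() + n_missing * mu
        sum_x2 = (observed ** 2).sum() + n_missing * (mu ** 2 + var)
        # M-step: re-estimate mu and sigma^2 from the completed statistics.
        mu_new  = sum_x / n
        var_new = sum_x2 / n - mu_new ** 2
        if abs(mu_new - mu) < 1e-9 and abs(var_new - var) < 1e-9:
            break
        mu, var = mu_new, var_new

    print(mu, var)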
Multinomial Population: sampling, N samples drawn from a multinomial population.
Maximum Likelihood: given the N samples, the likelihood is a function of the model parameters, and we want to maximize it.
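The slide's formulas are not in the text; for a plain multinomial with observed counts x1, …, xk (with x1 + … + xk = N), the likelihood and its maximizer are the standard:

  L(p_1,\dots,p_k \mid x) = \frac{N!}{x_1!\cdots x_k!}\; p_1^{x_1}\cdots p_k^{x_k},
  \qquad \hat p_i = \frac{x_i}{N}.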
Mixed Attributes: the same sampling of N samples, except that the count x3 is not available (unobserved).
E-Step: N samples, with x3 not available. Given θ(t), what can you say about x3?
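The slide's working is shown only as images; in general terms, the E-step replaces the unavailable count by its conditional expectation E[x3 | observed counts, θ(t)] and forms

  Q(\theta \mid \theta^{(t)}) = E\big[\log L(\theta \mid x_1,\dots,x_k)\,\big|\, x_{\text{observed}}, \theta^{(t)}\big],

which the M-step then maximizes over θ.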
Exercise: estimate the parameters using different initial conditions.
Binomial/Poisson Mixture
M: married obasong; X: number of children. Married obasongs may have children; unmarried obasongs have no children.
Observed data (counts of obasongs by number of children):
  # Children:  0   1   2   3   4   5   6
  # Obasongs:  n0  n1  n2  n3  n4  n5  n6
Unobserved data: n0 is the sum of nA (# married obasongs with no children) and nB (# unmarried obasongs).
Complete data: the counts (nA, nB, n1, …, n6), with corresponding probabilities (pA, pB, p1, …, p6).
Complete Data Likelihood: the likelihood of the complete data (nA, nB, n1, …, n6) under these probabilities.
Maximum Likelihood
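A minimal sketch of the EM iteration for this model, assuming married obasongs' child counts follow a Poisson(λ) distribution and a fraction π of obasongs are married (the counts below are purely illustrative, not from the slides):

    import math

    # Illustrative counts of obasongs with 0..6 children (n0..n6); NOT data from the slides.
    counts = [3062, 587, 284, 103, 33, 4, 2]

    pi_married, lam = 0.5, 1.0        # initial guesses for P(married) and the Poisson rate

    for _ in range(200):
        # E-step: split n0 into the expected number of married obasongs with 0 children (nA).
        p_married_zero = pi_married * math.exp(-lam)     # P(married and 0 children)
        p_unmarried = 1.0 - pi_married                   # unmarried => always 0 children
        nA = counts[0] * p_married_zero / (p_married_zero + p_unmarried)
        # M-step: re-estimate pi and lambda from the completed counts.
        n_total = sum(counts)
        n_married = nA + sum(counts[1:])
        pi_married = n_married / n_total
        lam = sum(k * counts[k] for k in range(1, len(counts))) / n_married

    print(pi_married, lam)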
Latent Variables: the observed data X are the incomplete data; together with the latent (unobserved) variables Y they form the complete data (X, Y).
Complete Data Likelihood: the likelihood of the complete data (X, Y), L(θ | X, Y) = p(X, Y | θ).
It is a function of the latent variable Y and the parameter θ, whereas the incomplete-data likelihood is a function of the parameter alone. If we are given a value of θ, the complete-data log-likelihood becomes a function of the random variable Y only: it is computable, and the result is expressed in terms of Y.
Expectation Step: Let θ(i−1) be the parameter vector obtained at the (i−1)-th step. Define Q(θ | θ(i−1)) as the expected complete-data log-likelihood.
Maximization Step: With θ(i−1) from the (i−1)-th step, choose θ(i) as the value of θ that maximizes Q(θ | θ(i−1)).
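Written out, the two defining equations of EM (shown only as images on the slides) are the standard ones:

  \text{E-step:}\quad Q(\theta \mid \theta^{(i-1)}) = E\big[\log L(\theta \mid X, Y)\,\big|\, X, \theta^{(i-1)}\big],
  \qquad
  \text{M-step:}\quad \theta^{(i)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(i-1)}).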
Mixture Models
• If there is reason to believe that a data set comprises several distinct populations, a mixture model can be used.
• It has the form p(x | Θ) = Σ_{j=1..M} α_j p_j(x | θ_j), with α_j ≥ 0 and α_1 + … + α_M = 1.
Mixture Models: Let yi ∈ {1, …, M} represent the (unobserved) source that generates data point xi.
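With this latent source variable, the quantities manipulated on the following slides are the standard mixture-model identities:

  p(x_i, y_i = l \mid \Theta) = \alpha_l\, p_l(x_i \mid \theta_l),
  \qquad
  p(y_i = l \mid x_i, \Theta) = \frac{\alpha_l\, p_l(x_i \mid \theta_l)}{\sum_{k=1}^{M} \alpha_k\, p_k(x_i \mid \theta_k)}.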
Mixture Models
Expectation: in the expected complete-data log-likelihood, the term is zero when yi ≠ l.
Maximization: Given the initial guess Θ^g, we want to find Θ to maximize the above expectation. In fact, this is done iteratively.
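Maximizing the expectation with respect to the mixing weights under the constraint Σ_l α_l = 1 (a standard Lagrange-multiplier step) gives:

  \alpha_l^{\text{new}} = \frac{1}{N} \sum_{i=1}^{N} p(l \mid x_i, \Theta^{g}).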
The GMM (Gaussian Mixture Model): Gaussian model of a d-dimensional source, say source j; GMM with M sources.
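The slide's formulas are not in the extracted text; the standard forms are p_j(x | μ_j, Σ_j) = N(x; μ_j, Σ_j), the d-dimensional Gaussian density of source j, and p(x | Θ) = Σ_{j=1..M} α_j N(x; μ_j, Σ_j) for the full mixture. Below is a minimal NumPy sketch of EM for this model; the function name, initialization scheme, and the small covariance regularizer are my own choices, not the slides':

    import numpy as np

    def gmm_em(X, M, n_iter=100, seed=0):
        """EM for a Gaussian mixture with M sources and full covariances."""
        rng = np.random.default_rng(seed)
        N, d = X.shape
        # Initialization: equal weights, M random data points as means, pooled covariance.
        alpha = np.full(M, 1.0 / M)
        mu = X[rng.choice(N, size=M, replace=False)].copy()
        Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(M)])

        for _ in range(n_iter):
            # E-step: responsibilities r[i, j] = p(y_i = j | x_i, Theta).
            log_r = np.empty((N, M))
            for j in range(M):
                diff = X - mu[j]
                inv = np.linalg.inv(Sigma[j])
                mahal = np.einsum('ni,ij,nj->n', diff, inv, diff)
                log_det = np.linalg.slogdet(Sigma[j])[1]
                log_r[:, j] = np.log(alpha[j]) - 0.5 * (mahal + log_det + d * np.log(2 * np.pi))
            log_r -= log_r.max(axis=1, keepdims=True)     # for numerical stability
            r = np.exp(log_r)
            r /= r.sum(axis=1, keepdims=True)

            # M-step: re-estimate alpha_j, mu_j, Sigma_j from the responsibilities.
            Nj = r.sum(axis=0)
            alpha = Nj / N
            mu = (r.T @ X) / Nj[:, None]
            for j in range(M):
                diff = X - mu[j]
                Sigma[j] = (r[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
        return alpha, mu, Sigma

For a data matrix X of shape (N, d), calling alpha, mu, Sigma = gmm_em(X, M=3) returns the estimated mixing weights, means, and covariances.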