EE3J2 Data Mining
Lecture 10: Statistical Modelling
Martin Russell
Objectives
• To review basic statistical modelling
• To review the notion of a probability distribution
• To review the notion of a probability density function
• To introduce mixture densities
• To introduce the multivariate Gaussian density
Discrete variables
• Suppose that Y is a random variable which can take any value in a discrete set X = {x_1, x_2, …, x_M}
• Suppose that y_1, y_2, …, y_N are samples of the random variable Y
• If c_m is the number of times that y_n = x_m, then an estimate of the probability that y_n takes the value x_m is given by:

  P(x_m) ≈ c_m / N
Discrete Probability Mass Function

  Symbol             1    2    3    4    5    6    7    8    9   Total
  Num. Occurrences  120  231   90   87   63   57  156  203   91   1098
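The counting estimate above can be sketched in a few lines of Python (a minimal illustration using the counts from the table; the variable names are not from the lecture):

```python
# Estimating a discrete probability mass function by counting.
# Occurrence counts c_m for each symbol x_m, taken from the table above.
counts = {1: 120, 2: 231, 3: 90, 4: 87, 5: 63, 6: 57, 7: 156, 8: 203, 9: 91}
total = sum(counts.values())  # N = 1098

# P(x_m) is estimated as c_m / N
pmf = {symbol: c / total for symbol, c in counts.items()}

print(round(pmf[2], 4))            # most frequent symbol, ≈ 0.2104
print(round(sum(pmf.values()), 6)) # estimated probabilities sum to 1
```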
Continuous Random Variables
• In most practical applications the data are not restricted to a finite set of values – they can take any value in N-dimensional space
• Simply counting the number of occurrences of each value is no longer a viable way of estimating probabilities…
• …but there are generalisations of this approach which are applicable to continuous variables – these are referred to as non-parametric methods
Continuous Random Variables
• An alternative is to use a parametric model
• In a parametric model, probabilities are defined by a small set of parameters
• The simplest example is a normal, or Gaussian, model
• A Gaussian probability density function (PDF) is defined by two parameters – its mean and variance
Gaussian PDF
• ‘Standard’ 1-dimensional Gaussian PDF:
• mean μ = 0
• variance σ² = 1
Gaussian PDF
• The probability that the variable lies between a and b is the area under the PDF between those points:

  P(a ≤ X ≤ b) = ∫_a^b p(x) dx
Gaussian PDF
• For a 1-dimensional Gaussian PDF p with mean μ and variance σ²:

  p(y) = (1 / √(2πσ²)) · exp( −(y − μ)² / (2σ²) )

• The constant 1/√(2πσ²) ensures the area under the curve is 1; the exponential term defines the ‘bell’ shape
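The density formula can be evaluated directly; a minimal sketch (the function name gaussian_pdf is illustrative, not from the lecture):

```python
import math

def gaussian_pdf(y, mu, var):
    """1-D Gaussian density: (1 / sqrt(2*pi*var)) * exp(-(y - mu)**2 / (2*var))."""
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Peak of the standard Gaussian (mu = 0, var = 1) is 1/sqrt(2*pi) ≈ 0.3989
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))
```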
More examples
• [Figure: Gaussian PDFs with variances σ² = 0.1, 1.0, 5.0, 10.0]
Fitting a Gaussian PDF to Data
• Suppose y = y_1, …, y_n, …, y_N is a set of N data values
• Given a Gaussian PDF p with mean μ and variance σ², define:

  p(y | μ, σ²) = ∏_{n=1}^{N} p(y_n | μ, σ²)

• How do we choose μ and σ² to maximise this probability?
Fitting a Gaussian PDF to Data
• [Figure: a Gaussian curve that follows the shape of the data (good fit) versus one that does not (poor fit)]
Maximum Likelihood Estimation
• Define the best-fitting Gaussian to be the one such that p(y | μ, σ²) is maximised
• Terminology:
• p(y | μ, σ²), thought of as a function of y, is the probability (density) of y
• p(y | μ, σ²), thought of as a function of μ, σ², is the likelihood of μ, σ²
• Maximising p(y | μ, σ²) with respect to μ, σ² is called Maximum Likelihood (ML) estimation of μ, σ²
ML estimation of μ, σ²
• Intuitively:
• The maximum likelihood estimate of μ should be the average value of y_1, …, y_N (the sample mean)
• The maximum likelihood estimate of σ² should be the variance of y_1, …, y_N (the sample variance)
• This turns out to be true: p(y | μ, σ²) is maximised by setting:

  μ̂ = (1/N) ∑_{n=1}^{N} y_n,    σ̂² = (1/N) ∑_{n=1}^{N} (y_n − μ̂)²
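These closed-form ML estimates translate directly into code; a minimal sketch (the helper name ml_estimates is illustrative):

```python
def ml_estimates(ys):
    """Maximum-likelihood mean and variance of a 1-D sample."""
    n = len(ys)
    mu = sum(ys) / n                               # sample mean
    var = sum((y - mu) ** 2 for y in ys) / n       # sample variance (divide by N, not N-1)
    return mu, var

mu, var = ml_estimates([1.0, 2.0, 3.0, 4.0, 5.0])
print(mu, var)  # 3.0 2.0
```

Note the ML variance divides by N; the unbiased estimator would divide by N − 1, but that is not the likelihood-maximising choice.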
Multi-modal distributions
• In practice the distributions of many naturally occurring phenomena do not follow the simple bell-shaped Gaussian curve
• For example, if the data arises from several different sources, there may be several distinct peaks (e.g. the distribution of heights of adults)
• These peaks are the modes of the distribution, and the distribution is called multi-modal
Gaussian Mixture PDFs
• Gaussian Mixture PDFs, or Gaussian Mixture Models (GMMs), are commonly used to model multi-modal or other non-Gaussian distributions
• A GMM is just a weighted average of several Gaussian PDFs, called the component PDFs
• For example, if p_1 and p_2 are Gaussian PDFs, then p(y) = w_1 p_1(y) + w_2 p_2(y) defines a 2-component Gaussian mixture PDF
Gaussian Mixture – Example
• 2-component mixture model
• Component 1: μ = 0, σ² = 0.1
• Component 2: μ = 2, σ² = 1
• w_1 = w_2 = 0.5

Example 2
• 2-component mixture model
• Component 1: μ = 0, σ² = 0.1
• Component 2: μ = 2, σ² = 1
• w_1 = 0.2, w_2 = 0.8

Example 3
• 2-component mixture model
• Component 1: μ = 0, σ² = 0.1
• Component 2: μ = 2, σ² = 1
• w_1 = 0.2, w_2 = 0.8

Example 4
• 5-component Gaussian mixture PDF
Gaussian Mixture Model
• In general, an M-component Gaussian mixture PDF is defined by:

  p(y) = ∑_{m=1}^{M} w_m p_m(y)

  where each p_m is a Gaussian PDF and ∑_{m=1}^{M} w_m = 1, w_m ≥ 0
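The weighted-sum definition can be sketched as follows (function names are illustrative; the example parameters are those of the two-component mixture from the earlier slides):

```python
import math

def gaussian_pdf(y, mu, var):
    """1-D Gaussian density."""
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_pdf(y, weights, means, variances):
    """M-component Gaussian mixture: weighted sum of component densities."""
    return sum(w * gaussian_pdf(y, mu, var)
               for w, mu, var in zip(weights, means, variances))

# Two-component mixture: mu_1=0, var_1=0.1; mu_2=2, var_2=1; equal weights
p = gmm_pdf(0.0, [0.5, 0.5], [0.0, 2.0], [0.1, 1.0])
```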
Estimating the parameters of a Gaussian mixture model
• A Gaussian Mixture Model with M components has:
• M means: μ_1, …, μ_M
• M variances: σ²_1, …, σ²_M
• M mixture weights: w_1, …, w_M
• Given a set of data y = y_1, …, y_N, how can we estimate these parameters?
• I.e. how do we find a maximum likelihood estimate of μ_1, …, μ_M, σ²_1, …, σ²_M, w_1, …, w_M?
Parameter Estimation
• If we knew which component each sample y_n came from, then parameter estimation would be easy:
• Set μ_m to be the average value of the samples which belong to the mth component
• Set σ²_m to be the variance of the samples which belong to the mth component
• Set w_m to be the proportion of samples which belong to the mth component
• But we don’t know which component each sample belongs to
Solution – the E-M algorithm
• Guess initial values for the parameters
• For each n calculate the probabilities λ_nm = P(m | y_n) – a measure of how much y_n ‘belongs to’ the mth component
• Use these probabilities to estimate how much each sample y_n ‘belongs to’ the mth component
• Calculate:

  μ̂_m = ( ∑_{n=1}^{N} λ_nm y_n ) / ( ∑_{n=1}^{N} λ_nm )
  σ̂²_m = ( ∑_{n=1}^{N} λ_nm (y_n − μ̂_m)² ) / ( ∑_{n=1}^{N} λ_nm )
  ŵ_m = (1/N) ∑_{n=1}^{N} λ_nm

• REPEAT
The E-M algorithm
• [Figure: the likelihood p(y | θ) plotted against the parameter set; successive estimates θ^(0), …, θ^(i) climb towards a local optimum]
E-M Algorithm
• Let’s just look at estimation of the mean μ of a single component of a GMM
• In fact, λ_nm = P(m | y_n)
• In other words, λ_nm is the probability of the mth component given the data point y_n
E-M continued
• From Bayes’ theorem:

  λ_nm = P(m | y_n) = w_m p_m(y_n) / ∑_{k=1}^{M} w_k p_k(y_n)

• The numerator is the mth weight multiplied by the mth Gaussian component, calculated at y_n; the denominator sums the same quantity over all components
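Putting the Bayes’-theorem E-step together with the weighted re-estimation M-step gives one E-M iteration. A minimal 1-dimensional sketch, assuming the standard GMM update formulas (the function name em_step is illustrative):

```python
import math

def gaussian_pdf(y, mu, var):
    """1-D Gaussian density."""
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(ys, weights, means, variances):
    """One E-M iteration for a 1-D Gaussian mixture model."""
    M, N = len(weights), len(ys)
    # E-step: responsibilities lam[n][m] = P(m | y_n), via Bayes' theorem
    lam = []
    for y in ys:
        joint = [w * gaussian_pdf(y, mu, var)
                 for w, mu, var in zip(weights, means, variances)]
        total = sum(joint)
        lam.append([j / total for j in joint])
    # M-step: weighted re-estimates of each component's parameters
    new_w, new_mu, new_var = [], [], []
    for m in range(M):
        occ = sum(lam[n][m] for n in range(N))                     # soft count
        mu = sum(lam[n][m] * ys[n] for n in range(N)) / occ        # weighted mean
        var = sum(lam[n][m] * (ys[n] - mu) ** 2 for n in range(N)) / occ
        new_w.append(occ / N)
        new_mu.append(mu)
        new_var.append(var)
    return new_w, new_mu, new_var
```

Each call increases (or leaves unchanged) the likelihood p(y | θ); iterating to convergence reaches a local optimum, as sketched on the previous slide.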
Example – initial model
• [Figure: a two-component initial model; for the data point y_6, component m_1 has responsibility P(m_1 | y_6) = λ_1 and component m_2 has responsibility P(m_2 | y_6) = λ_2]
Example – after 1st iteration of E-M

Example – after 2nd iteration of E-M

Example – after 4th iteration of E-M

Example – after 10th iteration of E-M
Multivariate Gaussian PDFs
• All PDFs so far have been 1-dimensional – they take scalar values
• But most real data will be represented as D-dimensional vectors
• The vector equivalent of a Gaussian PDF is called a multivariate Gaussian PDF
Multivariate Gaussian PDFs
• [Figures: 1-dimensional Gaussian PDFs alongside 2-dimensional multivariate Gaussian PDFs, the latter shown both as surfaces and as contours of equal probability]
Multivariate Gaussian PDF
• The parameters of a multivariate Gaussian PDF are:
• The (vector) mean μ
• The covariance matrix Σ, which contains the variances of the individual dimensions on its diagonal and the covariances between dimensions off the diagonal
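For the 2-dimensional case, the multivariate Gaussian density can be evaluated with an explicit 2×2 matrix inverse; a minimal sketch (the function name mvn_pdf_2d is illustrative, not from the lecture):

```python
import math

def mvn_pdf_2d(y, mean, cov):
    """2-D multivariate Gaussian density with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c                      # determinant of the covariance matrix
    inv = [[d / det, -b / det],
           [-c / det, a / det]]              # inverse of a 2x2 matrix
    dy = [y[0] - mean[0], y[1] - mean[1]]
    # quadratic form dy^T * inv * dy
    q = (dy[0] * (inv[0][0] * dy[0] + inv[0][1] * dy[1])
         + dy[1] * (inv[1][0] * dy[0] + inv[1][1] * dy[1]))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

# Standard 2-D Gaussian (zero mean, identity covariance) at its mean: 1/(2*pi)
p = mvn_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

With identity covariance the contours of equal probability are circles; unequal diagonal entries stretch them into axis-aligned ellipses, and non-zero off-diagonal covariances rotate them.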
Multivariate Gaussian PDFs
• Multivariate Gaussian PDFs are commonly used in pattern processing and data mining
• Vector data is often not unimodal, so we use mixtures of multivariate Gaussian PDFs
• The E-M algorithm works for multivariate Gaussian mixture PDFs
Summary
• Basic statistical modelling
• Probability distributions
• Probability density functions
• Gaussian PDFs
• Gaussian mixture PDFs and the E-M algorithm
• Multivariate Gaussian PDFs