Kernel Methods, Part 2
Bing Han, June 26, 2008
Local Likelihood • Logistic Regression: the two-class linear logit model is $\log\frac{\Pr(G=1\mid X=x)}{\Pr(G=2\mid X=x)} = \beta_0 + \beta^T x$
Logistic Regression • After a simple calculation, we get $\Pr(G=1\mid X=x) = \frac{e^{\beta_0+\beta^T x}}{1+e^{\beta_0+\beta^T x}}$ and $\Pr(G=2\mid X=x) = \frac{1}{1+e^{\beta_0+\beta^T x}}$ • We denote the probabilities $p(x;\beta)$ and $1-p(x;\beta)$ • Logistic regression models are usually fit by maximum likelihood
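A minimal numpy sketch of these two-class probabilities; the function name and the toy inputs are illustrative, not from the slides:

```python
import numpy as np

def logistic_probs(X, beta0, beta):
    """Two-class logistic regression probabilities.

    Pr(G=1|x) = exp(b0 + b'x) / (1 + exp(b0 + b'x)); Pr(G=2|x) is the complement.
    """
    eta = beta0 + X @ beta            # linear predictor, shape (n,)
    p1 = 1.0 / (1.0 + np.exp(-eta))   # equals exp(eta) / (1 + exp(eta))
    return np.column_stack([p1, 1.0 - p1])

# toy check with beta0 = 0, beta = (1, -1)
X = np.array([[0.5, 0.2], [-1.0, 2.0]])
print(logistic_probs(X, 0.0, np.array([1.0, -1.0])))
```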
Local Likelihood • Each observation has features $x_i$ and a class label in $\{1,2,\dots,J\}$ • The linear model is $\log\frac{\Pr(G=j\mid X=x)}{\Pr(G=J\mid X=x)} = \beta_{j0} + \beta_j^T x, \quad j = 1,\dots,J-1$
Local Likelihood • Local logistic regression fits this model locally: the parameters become functions $\beta_{j0}(x_0)$, $\beta_j(x_0)$ of the target point $x_0$ • The local log-likelihood for this J-class model is $\sum_{i=1}^{N} K_\lambda(x_0,x_i)\Big\{\beta_{g_i 0}(x_0) + \beta_{g_i}(x_0)^T(x_i-x_0) - \log\big[1+\sum_{k=1}^{J-1}\exp\big(\beta_{k0}(x_0)+\beta_k(x_0)^T(x_i-x_0)\big)\big]\Big\}$, where $g_i$ is the class of observation $i$
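As a sketch of local fitting, one can maximize the kernel-weighted likelihood by handing the weights $K_\lambda(x_0, x_i)$ to an off-the-shelf logistic regression; the helper name, the Gaussian weight formula, and the bandwidth `lam` are assumptions for illustration (scikit-learn also applies a small L2 penalty by default):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_logistic_predict(x0, X, y, lam=0.5):
    # Gaussian kernel weights K_lam(x0, x_i)
    w = np.exp(-0.5 * np.sum((X - x0) ** 2, axis=1) / lam ** 2)
    # Weighted fit, centered at x0 to match the beta(x0) parameterization
    clf = LogisticRegression().fit(X - x0, y, sample_weight=w)
    # Class probabilities evaluated at the target point itself
    return clf.predict_proba(np.zeros((1, X.shape[1])))[0]
```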
Kernel Density Estimation • Given a random sample $x_1, x_2, \dots, x_N$, we want to estimate the probability density $f_X(x)$ • A natural local estimate is $\hat f_X(x_0) = \frac{\#\{x_i \in \mathcal N(x_0)\}}{N\lambda}$, where $\mathcal N(x_0)$ is a small neighborhood of width $\lambda$ around $x_0$ • The smooth Parzen estimate is $\hat f_X(x_0) = \frac{1}{N\lambda}\sum_{i=1}^N K_\lambda(x_0, x_i)$
Kernel Density Estimation • A popular choice is the Gaussian kernel $K_\lambda(x_0,x) = \phi(|x-x_0|/\lambda)$ • A natural generalization of the Gaussian density estimate to $\mathbb R^p$ uses the Gaussian product kernel: $\hat f_X(x_0) = \frac{1}{N(2\lambda^2\pi)^{p/2}}\sum_{i=1}^N e^{-\frac{1}{2}(\|x_i-x_0\|/\lambda)^2}$
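A direct numpy sketch of the Gaussian product-kernel estimate above; the helper name `kde_gaussian` and the default bandwidth are illustrative:

```python
import numpy as np

def kde_gaussian(x0, X, lam=1.0):
    """Gaussian product-kernel density estimate at x0:
    f_hat(x0) = 1/(N (2 pi lam^2)^{p/2}) * sum_i exp(-||x_i - x0||^2 / (2 lam^2))."""
    N, p = X.shape
    sq = np.sum((X - x0) ** 2, axis=1)              # squared distances to x0
    norm = N * (2.0 * np.pi * lam ** 2) ** (p / 2.0)
    return np.exp(-0.5 * sq / lam ** 2).sum() / norm
```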
Kernel Density Classification • Fit a nonparametric density estimate $\hat f_j(X)$ separately in each class $j$ • Estimate the class priors $\hat\pi_j = N_j/N$ • By Bayes' theorem, $\widehat{\Pr}(G=j\mid X=x_0) = \frac{\hat\pi_j \hat f_j(x_0)}{\sum_{k=1}^J \hat\pi_k \hat f_k(x_0)}$
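A sketch of the Bayes combination, reusing the hypothetical `kde_gaussian` helper from the previous sketch for the per-class densities:

```python
import numpy as np

def kde_classify(x0, X, y, lam=1.0):
    """Class posteriors from per-class kernel density estimates and
    empirical priors pi_j = N_j / N, combined by Bayes' theorem."""
    classes = np.unique(y)
    N = len(y)
    post = np.array([(np.sum(y == j) / N) * kde_gaussian(x0, X[y == j], lam)
                     for j in classes])
    return classes, post / post.sum()   # normalized posterior probabilities
```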
Naïve Bayes Classifier • Assume that, given a class $G = j$, the features $X_k$ are independent: $f_j(X) = \prod_{k=1}^p f_{jk}(X_k)$
Naïve Bayes Classifier • The logit transform then yields a generalized additive model: $\log\frac{\Pr(G=l\mid X)}{\Pr(G=J\mid X)} = \log\frac{\pi_l f_l(X)}{\pi_J f_J(X)} = \log\frac{\pi_l}{\pi_J} + \sum_{k=1}^p \log\frac{f_{lk}(X_k)}{f_{Jk}(X_k)} = \alpha_l + \sum_{k=1}^p g_{lk}(X_k)$
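One way to realize this with kernel density estimates for the marginals $f_{jk}$ is scipy's univariate `gaussian_kde`; this is a sketch with default bandwidths, not the book's exact recipe:

```python
import numpy as np
from scipy.stats import gaussian_kde

def naive_bayes_kde(x0, X, y):
    """Naive Bayes: f_j(x) = prod_k f_jk(x_k), with each marginal f_jk
    estimated by a univariate Gaussian KDE. Returns normalized posteriors."""
    classes = np.unique(y)
    N = len(y)
    post = []
    for j in classes:
        Xj = X[y == j]
        prior = len(Xj) / N                       # pi_j = N_j / N
        dens = np.prod([gaussian_kde(Xj[:, k])(x0[k])[0]
                        for k in range(X.shape[1])])
        post.append(prior * dens)
    post = np.array(post)
    return classes, post / post.sum()
```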
Radial Basis Functions • Functions can be represented as expansions in basis functions: $f(x) = \sum_{j=1}^M \beta_j h_j(x)$ • Radial basis functions treat kernel functions as basis functions, each with its own prototype $\xi_j$ and scale $\lambda_j$. This leads to the model $f(x) = \sum_{j=1}^M K_{\lambda_j}(\xi_j, x)\,\beta_j$
Method of Learning Parameters • Optimize the sum of squares with respect to all of the parameters: $\min_{\{\lambda_j,\xi_j,\beta_j\}_1^M}\ \sum_{i=1}^N\Big(y_i - \beta_0 - \sum_{j=1}^M \beta_j \exp\big\{-\tfrac{(x_i-\xi_j)^T(x_i-\xi_j)}{\lambda_j^2}\big\}\Big)^2$
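The full criterion optimizes centers and scales too; a common simplification, sketched below, fixes the centers by k-means and the width $\lambda$, leaving only a linear least-squares problem for $\beta$ (the function name and defaults are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_rbf(X, y, M=10, lam=1.0):
    """Simplified RBF fit: centers from k-means, fixed width lam,
    then ordinary least squares for the linear coefficients beta."""
    centers = KMeans(n_clusters=M, n_init=10).fit(X).cluster_centers_
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)      # (N, M)
    H = np.hstack([np.ones((len(X), 1)), np.exp(-sq / lam ** 2)])  # basis matrix
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, beta
```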
Radial Basis Functions • Reducing the parameter set, for example by assuming a constant scale $\lambda_j = \lambda$, can produce an undesirable effect: holes, i.e., regions of the input space where none of the kernels has appreciable support • Renormalized radial basis functions avoid this: $h_j(x) = \frac{D(\|x-\xi_j\|/\lambda)}{\sum_{k=1}^M D(\|x-\xi_k\|/\lambda)}$
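A small sketch of the renormalized basis with a Gaussian base density $D$; the centers and width are assumed given:

```python
import numpy as np

def renormalized_rbf(X, centers, lam=1.0):
    """Renormalized radial basis functions:
    h_j(x) = D(||x - xi_j|| / lam) / sum_k D(||x - xi_k|| / lam)."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    D = np.exp(-0.5 * sq / lam ** 2)           # Gaussian base, up to a constant
    return D / D.sum(axis=1, keepdims=True)    # each row sums to one: no holes
```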
Mixture Models • The Gaussian mixture model for density estimation is $f(x) = \sum_{m=1}^M \alpha_m\,\phi(x;\mu_m,\Sigma_m)$, with mixing proportions $\sum_m \alpha_m = 1$ • In general, mixture models can use any component densities; the Gaussian mixture model is the most popular
Mixture Models • If the covariance matrices are constrained to be scalar, $\Sigma_m = \sigma_m^2 I$, the mixture has the form of a radial basis expansion • If in addition $\sigma_m = \sigma > 0$ is fixed and $M \uparrow N$, the maximum likelihood estimate approaches the kernel density estimate, where $\hat\alpha_m = 1/N$ and $\hat\mu_m = x_m$
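A sketch of Gaussian mixture density estimation with scikit-learn; the random data and the number of components are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Mixture density estimate: f(x) = sum_m alpha_m phi(x; mu_m, Sigma_m).
X = np.random.randn(500, 2)                    # toy sample
gmm = GaussianMixture(n_components=3).fit(X)   # ML fit via EM
density = np.exp(gmm.score_samples(X))         # f_hat(x_i); score_samples is log f
```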
Mixture Models • The parameters are usually fit by maximum likelihood, e.g., using the EM algorithm • The mixture model also provides an estimate of the probability that observation $i$ belongs to component $m$: $\hat r_{im} = \frac{\hat\alpha_m \phi(x_i;\hat\mu_m,\hat\Sigma_m)}{\sum_{k=1}^M \hat\alpha_k \phi(x_i;\hat\mu_k,\hat\Sigma_k)}$
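Continuing the sketch above, `predict_proba` returns exactly these component posteriors (the responsibilities):

```python
# Responsibilities r_im: posterior probability that observation i
# came from component m, for the gmm fit in the previous sketch.
resp = gmm.predict_proba(X)   # shape (N, M); rows sum to one
```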