THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY
CSIT 600N: Reasoning and Decision under Uncertainty
Summer 2010
Nevin L. Zhang
Room 3504, phone: 2358-7015, Email: lzhang@cs.ust.hk
Home page
L09: Probabilistic Models (PMs) for Classification and Clustering
• PMs for Classification
• PMs for Clustering: Continuous data
• PMs for Clustering: Discrete data
Classification
• The problem:
  • Given data: labeled instances over attributes A1, A2, …, An and a class variable C
  • Find a mapping (A1, A2, …, An) ↦ C
• Possible solutions:
  • ANN
  • Decision tree (Quinlan)
  • …
  • (SVM: continuous data)
Bayesian Networks for Classification
• The Naïve Bayes model often performs well in practice.
• Drawbacks of Naïve Bayes:
  • It assumes the attributes are mutually independent given the class variable.
  • This assumption is often violated, which leads to double counting of evidence.
• Fixes:
  • General BN classifiers
  • Tree-augmented Naïve Bayes (TAN) models
  • …
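A minimal sketch of a Naïve Bayes classifier for discrete attributes, in Python for illustration only. The function names and the Laplace smoothing parameter `alpha` are assumptions, not part of the lecture. The `log_odds` helper makes the double-counting problem visible: if an attribute is duplicated, its contribution to the log-odds between two classes doubles, although no new information was added.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels, alpha=1.0):
    """Estimate P(C) and P(A_i | C) from discrete data (hypothetical helper)."""
    n_attrs = len(rows[0])
    class_counts = Counter(labels)
    # values[i]: the set of values observed for attribute i
    values = [set(r[i] for r in rows) for i in range(n_attrs)]
    # counts[(i, c)][v]: number of rows with class c and A_i = v
    counts = defaultdict(Counter)
    for r, c in zip(rows, labels):
        for i, v in enumerate(r):
            counts[(i, c)][v] += 1
    n = len(labels)
    prior = {c: k / n for c, k in class_counts.items()}

    def cond(i, v, c):
        # Laplace-smoothed estimate of P(A_i = v | C = c)
        return (counts[(i, c)][v] + alpha) / (class_counts[c] + alpha * len(values[i]))

    return prior, cond

def classify(prior, cond, x):
    """Return arg max_c [log P(c) + sum_i log P(a_i | c)]."""
    def score(c):
        return math.log(prior[c]) + sum(math.log(cond(i, v, c)) for i, v in enumerate(x))
    return max(prior, key=score)

def log_odds(prior, cond, x, c1, c2):
    """log P(c1 | x) - log P(c2 | x) under the Naive Bayes factorization."""
    return math.log(prior[c1] / prior[c2]) + sum(
        math.log(cond(i, v, c1) / cond(i, v, c2)) for i, v in enumerate(x))
```

Training once on the data and once on the same data with every attribute column copied shows that the copied attribute's evidence is counted twice.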
Bayesian Networks for Classification
• General BN classifier:
  • Treat the class variable as just another variable.
  • Learn a BN.
  • Classify the next instance based on the values of the variables in the Markov blanket of the class variable.
  • Often performs poorly: only the variables in the Markov blanket (boundary) of the class variable are used, so the information in the remaining attributes is not utilized.
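The Markov blanket of a node in a BN consists of its parents, its children, and the other parents of its children. The sketch below computes it from a DAG stored as a mapping from each node to its list of parents; this representation and the example network are hypothetical, chosen only for illustration.

```python
def markov_blanket(parents, target):
    """Parents, children, and children's other parents of `target` in a DAG
    given as {node: [parent, ...]}."""
    children = [v for v, ps in parents.items() if target in ps]
    blanket = set(parents.get(target, ()))
    blanket.update(children)
    for ch in children:
        blanket.update(parents[ch])  # co-parents of each child
    blanket.discard(target)
    return blanket

# Hypothetical network: A4 depends on C only through A1.
net = {"C": [], "A1": ["C"], "A2": ["C", "A3"], "A3": [], "A4": ["A1"]}
print(sorted(markov_blanket(net, "C")))  # ['A1', 'A2', 'A3']
```

In this made-up network A4 lies outside the blanket of C, so a general BN classifier built on it would ignore A4 entirely when classifying.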
Bayesian Networks for Classification
• Tree-Augmented Naïve Bayes (TAN) model
  • Captures dependence among attributes using a tree structure.
  • During learning:
    • First learn a tree among the attributes using the Chow-Liu algorithm (a special structure-learning problem that is easy to solve).
    • Then add the class variable and estimate the parameters.
  • Classification:
    • Compute arg max_c P(C=c | A1=a1, …, An=an) by BN inference.
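A sketch of the TAN structure-learning step, assuming discrete attributes. Plain Chow-Liu weights each attribute pair by mutual information; the TAN variant of Friedman, Geiger and Goldszmidt weights it by the conditional mutual information I(Ai; Aj | C), which is what this sketch uses. A maximum-weight spanning tree is then found with Kruskal's algorithm, the edges are directed away from an arbitrary root, and C is added as a parent of every attribute. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cond_mutual_info(xs, ys, cs):
    """Empirical I(X; Y | C) in nats from three aligned lists of discrete values."""
    n = len(cs)
    n_xyc, n_xc = Counter(zip(xs, ys, cs)), Counter(zip(xs, cs))
    n_yc, n_c = Counter(zip(ys, cs)), Counter(cs)
    total = 0.0
    for (x, y, c), k in n_xyc.items():
        # P(x,y,c) * log[P(x,y|c) / (P(x|c) P(y|c))] written with counts
        total += k / n * math.log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total

def tan_tree(rows, labels):
    """Undirected edges of a maximum-weight spanning tree over attribute
    indices, with edge weights I(A_i; A_j | C) (Kruskal + union-find)."""
    n_attrs = len(rows[0])
    cols = [[r[i] for r in rows] for i in range(n_attrs)]
    weighted = sorted(
        ((cond_mutual_info(cols[i], cols[j], labels), i, j)
         for i, j in combinations(range(n_attrs), 2)),
        reverse=True)
    comp = list(range(n_attrs))

    def find(u):
        while comp[u] != u:
            comp[u] = comp[comp[u]]  # path halving
            u = comp[u]
        return u

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            comp[ri] = rj
            edges.append((i, j))
    return edges

def orient(edges, n_attrs, root=0):
    """Direct tree edges away from `root`; every attribute also gets C as a parent."""
    adj = {i: [] for i in range(n_attrs)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    parents = {i: ["C"] for i in range(n_attrs)}
    stack, seen = [root], {root}
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                parents[v].append(u)
                stack.append(v)
    return parents
```

Parameter estimation then reduces to counting P(Ai | parents(Ai)) for each attribute, and classification picks the class c that maximizes P(c) times the product of those conditional probabilities.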
Outline
• PMs for Classification
• PMs for Clustering: Continuous data
  • Gaussian distributions
  • Parameter estimation for Gaussian distributions
  • Gaussian mixtures
  • Learning Gaussian mixtures
• PMs for Clustering: Discrete data
http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html
• Real-world examples of normal distributions?
Example
• Data
• Mean vector
• Covariance matrix
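For N data points x_1, …, x_N, the maximum-likelihood estimates are the sample mean vector μ̂ = (1/N) Σ_k x_k and the covariance matrix Σ̂ = (1/N) Σ_k (x_k − μ̂)(x_k − μ̂)^T. Below is a minimal NumPy sketch, assuming the data are stored as an (N, d) array; the function names are hypothetical. Note that the MLE divides by N rather than N − 1 (the unbiased estimator). The density N(x; μ, Σ) = exp(−½ (x − μ)^T Σ^{-1} (x − μ)) / sqrt((2π)^d |Σ|) is included so the estimates can be plugged straight in.

```python
import numpy as np

def gaussian_mle(X):
    """MLE mean vector and covariance matrix of an (N, d) data array."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]  # divide by N, not N - 1
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density N(x; mu, sigma)."""
    d = len(mu)
    diff = np.asarray(x, dtype=float) - mu
    quad = diff @ np.linalg.solve(sigma, diff)  # (x - mu)^T sigma^{-1} (x - mu)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm
```

The same covariance estimate is available as `np.cov(X, rowvar=False, bias=True)`; writing it out makes the 1/N factor explicit.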
L09: Probabilistic Models (PMs) for Classification and Clustering
• PMs for Classification
• PMs for Clustering: Continuous data
• PMs for Clustering: Discrete data
  • A generalization