COMP 328: Final Review, Spring 2010 • Nevin L. Zhang, Department of Computer Science & Engineering, The Hong Kong University of Science & Technology • http://www.cse.ust.hk/~lzhang/ • Can be used as cheat sheet
Pre-Midterm • Algorithms for supervised learning: decision trees, instance-based learning, naïve Bayes classifiers, neural networks, support vector machines • General issues regarding supervised learning: classification error and confidence interval, bias-variance tradeoff, PAC learning theory
Post-Midterm • Clustering: Distance-Based Clustering, Model-Based Clustering • Dimension Reduction: Principal Component Analysis • Reinforcement Learning • Ensemble Learning
Distance-Based Clustering • Partitional and Hierarchical clustering
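K-Means: Partitional Clustering • Different initial points may lead to different partitions, because each run converges only to a local minimum of the SSE • Solution: run K-means multiple times and use an evaluation criterion such as SSE (sum of squared errors) to pick the best partition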
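The multiple-runs remedy can be sketched as follows, assuming Euclidean points given as tuples and initial centroids drawn as k distinct data points. The helper names (`kmeans`, `best_of_runs`) and the toy data are illustrative, not from the slides.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def kmeans(points, k, rng, max_iter=100):
    """One run of K-means (Lloyd's algorithm) from k randomly chosen data points.
    Returns (centroids, labels, sse)."""
    centroids = rng.sample(points, k)  # initial points: k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments did not change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse

def best_of_runs(points, k, n_runs=10, seed=0):
    """Run K-means n_runs times and keep the run with the smallest SSE."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[2])

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels, sse = best_of_runs(data, k=2)
```

Each run may end in a different local minimum; comparing SSE across runs is what selects the final partition.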
Hierarchical Clustering • Agglomerative and Divisive
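A minimal sketch of the agglomerative (bottom-up) variant, assuming single-link distance between clusters; the slide does not fix a linkage, so `single_link` is a hypothetical choice that can be swapped out. Divisive (top-down) clustering is not shown.

```python
import math

def single_link(c1, c2, points):
    """Single-link distance: closest pair of points across two clusters."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points, num_clusters, linkage=single_link):
    """Bottom-up clustering: start with singletons, repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(points))]  # each point starts as its own cluster
    merges = []  # (cluster_a, cluster_b, distance) for each merge: the dendrogram as a list
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # new list, so the recorded merge is unchanged
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges

pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
clusters, merges = agglomerative(pts, num_clusters=2)
```

This naive version rescans all cluster pairs on every merge, which is fine for illustration but cubic in the number of points.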
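Cluster Validation • External indices • Entropy: average impurity of the clusters obtained with respect to the class labels, weighted by cluster size (lower entropy means purer clusters) • Mutual information between class label and cluster label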
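Both external indices can be computed from the class and cluster labels alone. The sketch below assumes base-2 logarithms and labels given as parallel lists; `cluster_entropy` and `mutual_information` are illustrative names.

```python
import math
from collections import Counter

def cluster_entropy(classes, clusters):
    """Size-weighted average (base-2) entropy of the class labels inside each cluster.
    0 means every cluster is pure; larger values mean more mixed clusters."""
    n = len(classes)
    total = 0.0
    for cl, size in Counter(clusters).items():
        counts = Counter(c for c, k in zip(classes, clusters) if k == cl)
        h = -sum((m / size) * math.log2(m / size) for m in counts.values())
        total += (size / n) * h
    return total

def mutual_information(classes, clusters):
    """Mutual information (bits) between the class label and the cluster label."""
    n = len(classes)
    joint = Counter(zip(classes, clusters))
    class_counts, cluster_counts = Counter(classes), Counter(clusters)
    return sum((nij / n) * math.log2(nij * n / (class_counts[c] * cluster_counts[k]))
               for (c, k), nij in joint.items())

truth = ["a", "a", "b", "b"]
perfect = [0, 0, 1, 1]  # clusters match the classes
mixed = [0, 1, 0, 1]    # each cluster mixes both classes
```

For `perfect` the entropy is 0 and the mutual information is 1 bit; for `mixed` the entropy is 1 bit and the mutual information is 0.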
Cluster Validation • External measures: Jaccard index and Rand index • Both measure the similarity between two relations on pairs of objects: in-same-class and in-same-cluster
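The pairwise view can be made concrete by counting, over all pairs of objects, whether they share a class and whether they share a cluster. A sketch with illustrative function names, assuming labels in parallel lists:

```python
from itertools import combinations

def pair_counts(classes, clusters):
    """Count object pairs by (same class?, same cluster?)."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(classes)), 2):
        same_class = classes[i] == classes[j]
        same_cluster = clusters[i] == clusters[j]
        if same_class and same_cluster:
            ss += 1  # both relations hold
        elif same_class:
            sd += 1  # same class, different clusters
        elif same_cluster:
            ds += 1  # different classes, same cluster
        else:
            dd += 1  # neither relation holds
    return ss, sd, ds, dd

def rand_index(classes, clusters):
    """Fraction of pairs on which the two relations agree."""
    ss, sd, ds, dd = pair_counts(classes, clusters)
    return (ss + dd) / (ss + sd + ds + dd)

def jaccard_index(classes, clusters):
    """Like Rand, but ignores pairs that are apart in both relations."""
    ss, sd, ds, dd = pair_counts(classes, clusters)
    return ss / (ss + sd + ds)  # undefined if no pair is together in either relation
```

For example, with classes `["a", "a", "a", "b"]` and clusters `[0, 0, 1, 1]` there are 6 pairs (1 together in both, 2 same-class only, 1 same-cluster only, 2 apart in both), giving a Rand index of 0.5 and a Jaccard index of 0.25.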
Cluster Validation • Internal Measure • Dunn’s index
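The slide names Dunn's index without a formula. A common definition, assumed here, divides the smallest distance between two points in different clusters by the largest cluster diameter, so larger values indicate compact, well-separated clusters. The sketch requires at least two clusters and at least one cluster with two or more points.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """Min inter-cluster point distance divided by max intra-cluster diameter."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    clusters = list(groups.values())
    # Smallest distance between two points that lie in different clusters.
    min_between = min(math.dist(p, q)
                      for g1, g2 in combinations(clusters, 2) for p in g1 for q in g2)
    # Largest distance between two points in the same cluster (cluster diameter).
    max_diameter = max((math.dist(p, q) for g in clusters for p, q in combinations(g, 2)),
                       default=0.0)
    return min_between / max_diameter

pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
score = dunn_index(pts, [0, 0, 1, 1])  # separation 5, diameter 1
```

Other variants replace the point-to-point distances with centroid or average distances; the structure of the ratio stays the same.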
Cluster Validation • Internal Measure
Model-Based Clustering • Assume the data are generated from a mixture model with K components • Estimate the parameters of the model from the data • Assign objects to clusters based on posterior probability (soft assignment)
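Soft assignment is just Bayes' rule over the mixture components. The sketch assumes one-dimensional Gaussian components with known weights, means and variances; `responsibilities` is an illustrative name.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def responsibilities(x, weights, means, variances):
    """Posterior P(component k | x) for every component: the soft assignment of x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)  # P(x) under the mixture
    return [p / total for p in joint]

# A point halfway between two identical components is split evenly between them.
r_mid = responsibilities(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
# A point at the second mean leans towards the second component.
r_right = responsibilities(1.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```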
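EM • l(t): log likelihood of the model after the t-th iteration • l(t) increases monotonically with t (more precisely, it never decreases) • But it might go to infinity in case of singularity, e.g., when a Gaussian component collapses onto a single data point and its covariance matrix becomes singular • Solution: place a lower bound on the eigenvalues of the covariance matrices • EM may converge to a local maximum • Solution: multiple restarts; use the likelihood to pick the best model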
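These properties can be illustrated with a minimal EM sketch for a one-dimensional Gaussian mixture (an assumption; the slides do not fix the component type). In 1-D each covariance matrix is a single variance, so the eigenvalue bound becomes a variance floor, here the hypothetical parameter `min_var`. The sketch records l(1), l(2), ... and keeps the restart with the highest final log likelihood.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Log likelihood of the data under the mixture."""
    return sum(math.log(sum(w * gaussian_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)

def em_gmm_1d(data, k, rng, n_iter=50, min_var=1e-3):
    """EM for a 1-D Gaussian mixture; min_var keeps every variance away from zero.
    Returns (weights, means, variances, history) with history = [l(1), l(2), ...]."""
    n = len(data)
    means = rng.sample(data, k)  # initialize means at random data points
    mu = sum(data) / n
    var0 = max(sum((x - mu) ** 2 for x in data) / n, min_var)
    variances = [var0] * k       # start every component with the data variance
    weights = [1.0 / k] * k
    history = []
    for _ in range(n_iter):
        # E-step: soft assignments (posterior probability of each component).
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances from the soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj == 0.0:
                continue  # no responsibility at all: keep this component's mean and variance
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # the bound that prevents l(t) from diverging
        history.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, history

def best_of_restarts(data, k, n_restarts=5, seed=0):
    """Multiple restarts; keep the model with the highest final log likelihood."""
    rng = random.Random(seed)
    fits = [em_gmm_1d(data, k, rng) for _ in range(n_restarts)]
    return max(fits, key=lambda fit: fit[3][-1])

sample = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8]
weights, means, variances, history = best_of_restarts(sample, k=2)
```

Clamping the variance still maximizes the M-step objective over the allowed range, so the recorded log likelihood stays non-decreasing (up to floating-point rounding).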
EM and K-Means • K-Means is hard-assignment EM
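The correspondence holds when the mixture has equal weights and a shared spherical variance (assumed here, in 1-D). The posterior then depends only on squared distances to the means. As the shared variance shrinks, the soft assignment concentrates on the nearest mean, which is exactly the K-means assignment step. Function names are illustrative.

```python
import math

def soft_assign(x, means, var):
    """Posterior over components for equal weights and a shared variance var.
    The weights and normalizing constants cancel, so only squared distances matter."""
    logits = [-(x - m) ** 2 / (2 * var) for m in means]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # shift by the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def hard_assign(x, means):
    """K-means assignment step: index of the nearest mean."""
    return min(range(len(means)), key=lambda j: (x - means[j]) ** 2)

centers = [0.0, 2.0]
for v in (1.0, 0.1, 0.01):
    # Probability mass moves onto the nearest mean as the variance shrinks.
    print(v, soft_assign(0.8, centers, v))
```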
Learning Latent Class Models • Always converges
Dimension Reduction • Necessary because some data sets have so many attributes that learning algorithms find them difficult to handle
Markov Decision Process • A model of how an agent interacts with its environment
Q-Learning • Derived from Q-function-based value iteration • Ideas: in-place (asynchronous) value iteration; approximate the expectation using samples; ε-greedy policy for the exploration/exploitation tradeoff
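The three ideas can be seen together in a tabular sketch. The environment is a hypothetical five-state chain (states 0 to 4, actions left/right, reward 1 on reaching the terminal state 4), and the step size `alpha`, discount `gamma` and `epsilon` are illustrative values, not from the slides.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic toy environment (assumption): returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(q[s], epsilon, rng)
            s2, r, done = step(s, a)
            # One sampled transition replaces the expectation over next states.
            target = r if done else r + gamma * max(q[s2])
            # In-place (asynchronous) update of a single Q entry.
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = q_learning()
```

After training, the greedy action in every non-terminal state should be "right", and the value of moving right from state 3 should be close to the reward of 1.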