EM Algorithm: Expectation Maximization Clustering Algorithm
Book: Data Mining, Morgan Kaufmann (Witten & Frank), pp. 218-227
Mining Lab, Kim Wan-seop, October 27, 2004
Content
• Clustering
• K-Means vs. EM
• Mixture Model
• EM Algorithm
• Simple examples of EM
• EM Application: WEKA
• References
Clustering (1/2)
• What is clustering?
• Clustering algorithms divide a data set into natural groups (clusters).
• Instances in the same cluster are similar to each other; they share certain properties.
• e.g., customer segmentation.
• Clustering vs. Classification
• Classification is supervised learning; clustering is unsupervised learning.
• There is no target variable to be predicted.
Clustering (2/2)
• Categorization of clustering methods
• Partitioning methods: K-Means / K-Medoids / PAM / CLARA / CLARANS
• Hierarchical methods: CURE / CHAMELEON / BIRCH
• Density-based methods: DBSCAN / OPTICS
• Grid-based methods: STING / CLIQUE / WaveCluster
• Model-based methods: EM / COBWEB / Bayesian / Neural (also called model-based, probability-based, or statistical clustering)
K-Means (1) Algorithm
• Step 0: Select k objects as initial centroids.
• Step 1 (Assignment): For each object, compute its distance to each of the k centroids and assign it to the cluster whose centroid is closest.
• Step 2 (New Centroids): Compute a new centroid for each cluster.
• Step 3 (Convergence): Stop if the change in the centroids is less than the selected convergence criterion; otherwise repeat from Step 1.
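A minimal Python sketch of these four steps (illustrative code, not from the book; NumPy is assumed, and empty clusters are not handled):

import numpy as np

def k_means(X, k, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 0: select k objects at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 1 (Assignment): distance from every object to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2 (New Centroids): mean of the objects in each cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 3 (Convergence): stop when the centroids barely move
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels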
K-Means (2) Simple Example
(Figure: one K-Means run shown step by step: input data, random initial centroids, assignment, new centroids with convergence check, reassignment, and final centroids once the check passes.)
K-Means (4) Calculation
Without an outlier, data = (4,4), (3,4), (4,2), (0,2), (1,1), (1,0):
• Initial clusters: {(4,4), (3,4)} and {(4,2), (0,2), (1,1), (1,0)}.
• Iteration 1: centroids <3.5, 4> and <1.5, 1.25>; assignment gives <3.5, 4> ← (3,4), (4,4), (4,2) and <1.5, 1.25> ← (0,2), (1,1), (1,0).
• Iteration 2: centroids <3.67, 3.33> and <0.67, 1>; the assignment no longer changes, so the algorithm stops.
With the outlier (100, 0) added:
• Initial clusters: {(4,4), (3,4)} and {(4,2), (0,2), (1,1), (1,0), (100,0)}.
• Iteration 1: centroids <3.5, 4> and <21.2, 1>; assignment gives <3.5, 4> ← (0,2), (1,1), (1,0), (3,4), (4,4), (4,2) and <21.2, 1> ← (100,0).
• Iteration 2: centroids <2.17, 2.17> and <100, 0>; the assignment no longer changes. A single outlier has pulled one centroid far away from the data.
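The arithmetic above is easy to check with a few lines of NumPy (the data points are the ones on the slide):

import numpy as np

data = np.array([(4, 4), (3, 4), (4, 2), (0, 2), (1, 1), (1, 0)], dtype=float)
print(data[:3].mean(axis=0))  # {(4,4),(3,4),(4,2)} -> [3.67, 3.33]
print(data[3:].mean(axis=0))  # {(0,2),(1,1),(1,0)} -> [0.67, 1.0]
# With the outlier (100, 0) the six ordinary points end up in one cluster
# whose centroid is their overall mean, while the outlier gets its own:
print(data.mean(axis=0))      # -> [2.17, 2.17]; the other centroid sits at (100, 0)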
K-Means (5) Comparison with EM
• K-Means
• Hard clustering: an instance belongs to exactly one cluster.
• Based on Euclidean distance.
• Not robust to outliers or differing value ranges.
• EM
• Soft clustering: an instance belongs to every cluster with a membership probability (e.g., 0.7 for cluster C1 and 0.3 for C2 in the figure).
• Based on probability density.
• Can handle both numeric and nominal attributes.
Mixture Model (1)
• A mixture is a set of k probability distributions, representing k clusters.
• Each distribution has a mean and a variance.
• The mixture model combines several normal distributions.
Mixture Model (2)
• With only one numeric attribute and two clusters A and B, the model has five parameters: the two means μA and μB, the two standard deviations σA and σB, and the mixing probability pA (pB = 1 - pA).
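Written out (a standard form, matching the book's two-cluster example), the mixture density is:

\[ \Pr[x] \;=\; p_A\, f(x;\mu_A,\sigma_A) \;+\; p_B\, f(x;\mu_B,\sigma_B), \qquad p_B = 1 - p_A \]

so the five parameters are \mu_A, \sigma_A, \mu_B, \sigma_B, and p_A.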
Mixture Model (3) Simple Example
• Probability that an instance x belongs to cluster A, computed from the probability density function (formula below).
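The membership probability the slide refers to is Bayes' rule applied to the mixture (standard form):

\[ \Pr[A \mid x] \;=\; \frac{f(x;\mu_A,\sigma_A)\,\Pr[A]}{\Pr[x]} \]

where f is the cluster's density function and \Pr[x] is the mixture density above.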
Mixture Model (4) Probability Density Function
• Normal distribution: the Gaussian density function.
• Poisson distribution (for count-valued attributes).
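For reference, the two density functions named above (standard definitions):

\[ f(x;\mu,\sigma) \;=\; \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad \text{(Gaussian)} \]
\[ \Pr[X=k] \;=\; \frac{\lambda^{k} e^{-\lambda}}{k!} \qquad \text{(Poisson)} \]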
Mixture Model (5) Probability Density Function
(Figure: the estimated mixture density plotted after successive iterations.)
EM Algorithm (1)
• Step 1 (Initialization): Assign random cluster probabilities (weights) to each record.
• Step 2 (Maximization Step): Re-create the cluster model: re-compute the parameters Θ (mean, variance) of each normal distribution from the weighted records (parameter adjustment).
• Step 3 (Expectation Step): Update each record's cluster weights (weight adjustment).
• Step 4: Calculate the log-likelihood. If the value saturates, exit; if not, go to Step 2.
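The same four steps as a self-contained Python sketch for a one-dimensional, two-cluster mixture (an illustration under the slide's assumptions, not Weka's implementation; names are mine, and degenerate zero-variance clusters are not guarded against):

import numpy as np
from scipy.stats import norm  # Gaussian density f(x; mu, sigma)

def em_two_clusters(x, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.random(len(x))                # Step 1: random cluster-A weights
    old_ll = -np.inf
    for _ in range(max_iter):
        # Step 2 (M-Step): re-compute parameters from the weighted records
        p_a  = w.mean()
        mu_a = np.average(x, weights=w)
        sd_a = np.sqrt(np.average((x - mu_a) ** 2, weights=w))
        mu_b = np.average(x, weights=1 - w)
        sd_b = np.sqrt(np.average((x - mu_b) ** 2, weights=1 - w))
        # Step 3 (E-Step): update each record's weight
        f_a = p_a * norm.pdf(x, mu_a, sd_a)
        f_b = (1 - p_a) * norm.pdf(x, mu_b, sd_b)
        w = f_a / (f_a + f_b)
        # Step 4: log-likelihood; exit when it saturates
        ll = np.log(f_a + f_b).sum()
        if ll - old_ll < tol:
            break
        old_ll = ll
    return (p_a, mu_a, sd_a, mu_b, sd_b), w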
EM Algorithm (2) Initialization
• Start from random probabilities: each record gets a random cluster weight.
• An initial M-Step on these weights yields the starting parameters.
• Example (figure).
EM Algorithm (3) M-Step: Parameters (Mean, Deviation)
• Estimate the parameters from the weighted instances.
• Parameters: means and standard deviations (formulas below).
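The weighted estimates the slide points to, for cluster A with instance weights w_i (as in the book; cluster B is analogous):

\[ \mu_A = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad \sigma_A^2 = \frac{\sum_i w_i (x_i - \mu_A)^2}{\sum_i w_i} \]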
EM Algorithm (4) E-Step: Weight
• Compute each instance's weight for each cluster (formula below).
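The weight formula (the standard E-step; f is the Gaussian density):

\[ w_i \;=\; \Pr[A \mid x_i] \;=\; \frac{p_A\, f(x_i;\mu_A,\sigma_A)}{p_A\, f(x_i;\mu_A,\sigma_A) + p_B\, f(x_i;\mu_B,\sigma_B)} \]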
EM Algorithm (5) Objective Function (check)
• Log-likelihood function: the overall probability of all instances under the mixture; taking logs turns the product into a sum, which is easier to analyze.
• For 1-dimensional data and 2 clusters A, B:
\[ \log L \;=\; \sum_i \log\!\big( p_A\, f(x_i;\mu_A,\sigma_A) + p_B\, f(x_i;\mu_B,\sigma_B) \big) \]
• For N-dimensional data and K clusters the same form holds, with a mean vector and a covariance matrix per cluster:
\[ \log L \;=\; \sum_i \log \sum_{k=1}^{K} p_k\, f(\mathbf{x}_i;\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k) \]
EM Algorithm (6) Objective Function (check)
• The N-dimensional density above is the multivariate normal, parameterized by the mean vector \boldsymbol{\mu} and covariance matrix \boldsymbol{\Sigma}:
\[ f(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma}) \;=\; (2\pi)^{-N/2}\, |\boldsymbol{\Sigma}|^{-1/2} \exp\!\Big( -\tfrac12 (\mathbf{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \Big) \]
EM Algorithm (7) Termination
• The procedure stops when the log-likelihood saturates, i.e. when the gain between iterations falls below a small threshold.
(Figure: log-likelihood values Q0, Q1, Q2, Q3, Q4 plotted against the number of iterations; the curve flattens as EM converges.)
EM Algorithm (1) Simple Data
• EM example
• 6 data points (3 samples per class)
• 2 classes (circle, rectangle)
EM Algorithm (2) Simple Data
• Likelihood function of the two component means θ1 and θ2.
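With a shared, known variance σ and equal priors (the equal-prior assumption is mine, not stated on the slide), the likelihood surface over the two component means has the standard form:

\[ L(\theta_1,\theta_2) \;=\; \prod_{i=1}^{6}\left[\tfrac12 f(x_i;\theta_1,\sigma) + \tfrac12 f(x_i;\theta_2,\sigma)\right] \]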
EM Example (1)
• Example dataset: 2 columns (Math, English), 6 records.
EM Example (2)
• Distribution of Math: mean = 56.67, variance = 776.73.
• Distribution of English: mean = 82.5, variance = 197.50.
EM Example (3)
• Assign random cluster weights to each record.
EM Example (4) Iteration 1
• Maximization Step (parameter adjustment)
EM Example (5) Iteration 2
• Expectation Step (weight adjustment)
• Maximization Step (parameter adjustment)
EM Example (6) Iteration 3
• Expectation Step (weight adjustment)
• Maximization Step (parameter adjustment)
EM Application (1) Weka
• Weka
• Developed at the University of Waikato in New Zealand.
• Open-source data mining tool.
• http://www.cs.waikato.ac.nz/ml/weka
• Experiment data
• Iris data
• Real data: department customer data and modified customer data.
EM Application (2) Iris Data
• Data info
• Attribute information: sepal length / sepal width / petal length / petal width (all in cm).
• Class: Iris Setosa / Iris Versicolour / Iris Virginica.
EM Application (3) Weka Usage
• Weka clustering package: weka.clusterers
• Command-line execution (-t gives the training ARFF file, -N the number of clusters):
java weka.clusterers.EM -t iris.arff -N 2
java weka.clusterers.EM -t iris.arff -N 2 -V
• GUI execution:
java -jar weka.jar
EM Application (4) Weka Usage
• Options for clustering in Weka.
EM Application (5) Weka Usage - Input File Format (iris.arff)

% Summary Statistics:
%                Min  Max  Mean  SD    Class Correlation
% sepal length:  4.3  7.9  5.84  0.83   0.7826
% sepal width:   2.0  4.4  3.05  0.43  -0.4194
% petal length:  1.0  6.9  3.76  1.76   0.9490 (high!)
% petal width:   0.1  2.5  1.20  0.76   0.9565 (high!)

@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
EM Application (6) Weka Usage - Output Format

Number of clusters: 3

Cluster: 0  Prior probability: 0.3333
  Attribute: sepallength  Normal Distribution. Mean = 5.006  StdDev = 0.3489
  Attribute: sepalwidth   Normal Distribution. Mean = 3.418  StdDev = 0.3772
  Attribute: petallength  Normal Distribution. Mean = 1.464  StdDev = 0.1718
  Attribute: petalwidth   Normal Distribution. Mean = 0.244  StdDev = 0.1061
  Attribute: class        Discrete Estimator. Counts = 51 1 1 (Total = 53)

Clustered Instances
  0   50 (33%)
  1   48 (32%)
  2   52 (35%)

Log likelihood: -2.21138
References
• Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, pp. 218-255.
• Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Chapter 8.
• Frank Dellaert, "The Expectation Maximization Algorithm," February 2002.
• Jeff A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models."