1. Clustering © Dragoljub Pokrajac 2003
2. Clustering and Unsupervised Learning Clustering is one of the algorithms for unsupervised learning
Class labels are not known in advance
Unlike in classification, clustering models are learned purely based on attribute values
3. What is Clustering? Given: Set of unlabeled patterns
Each pattern contains one or more attributes
Goal: Group the patterns such that patterns “similar” to each other are in the same group, and “dissimilar” patterns are in distinct groups
Such distinguished groups are called clusters
4. Example Group these figures according to some criterion
Attributes:
Number of edges
Color
5. Clustering by color
6. Clustering by number of edges
7. Some Issues in Clustering The actual number of clusters is not known
Potential lack of a priori knowledge about the data and cluster shapes
Clustering could be performed on-line
Time complexity, when working with large amounts of data
8. Types of Clustering Algorithms Partitioning
Hierarchical
Clustering large data sets
9. Partitioning Methods Only one set of clusters is created at the output of the algorithm
Number of clusters is usually specified
The dataset is partitioned into several groups, and the groups are updated through iterations
K-means, EM, PAM, CLARA, CLARANS…
10. K Means Algorithm Randomly initialize K cluster centers
Repeat:
Assign each point to the nearest cluster center
Re-estimate the cluster centers by averaging the coordinates of the points assigned to each cluster (a MATLAB sketch is given below)
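A minimal MATLAB sketch of these steps, assuming the data are stored in an N-by-d matrix X with one pattern per row and that K and n_iterations are set by the user; pdist2 is from the Statistics Toolbox, and all variable names here are illustrative rather than taken from the original slides:

N = size(X,1);
centers = X(randperm(N,K),:);              % randomly pick K patterns as initial centers
for iter = 1:n_iterations
    % Assignment step: index of the nearest center for every pattern
    [~, labels] = min(pdist2(X, centers), [], 2);
    % Update step: re-estimate each center as the mean of its assigned patterns
    for h = 1:K
        if any(labels == h)
            centers(h,:) = mean(X(labels == h, :), 1);
        end
    end
end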
32. Problems With K-means The number of clusters must be prespecified
The algorithm is sensitive to initialization
It may not converge to the proper clusters
The algorithm does not take the shape of the clusters into account
The algorithm does not take cluster densities into account
(In the example, the “blue” cluster is much denser than the other clusters)
33. EM Algorithm Idea:
Each point came from one of K Gaussian distributions
Goal: estimate the parameters of these Gaussian distributions
34. Mixture of Gaussian Distributions With probability p1 the data came from distribution D1, determined by:
Mean μ1, covariance matrix Σ1, conditional density function p(x|D1)
With probability p2 the data came from distribution D2, determined by:
Mean μ2, covariance matrix Σ2, conditional density function p(x|D2)
…
With probability pK the data came from distribution DK, determined by:
Mean μK, covariance matrix ΣK, conditional density function p(x|DK)
35. Gaussian Mixture - Formula Similar to the well-known formula of total probability, we have a formula of “total” probability density:
p(x) = p1·p(x|D1) + p2·p(x|D2) + … + pK·p(x|DK)
Since the conditional distributions here are Gaussians, we have:
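The density formula the slide refers to is not reproduced in the text; for a d-dimensional pattern x it takes the standard multivariate Gaussian form

p(x) = \sum_{k=1}^{K} p_k \, p(x \mid D_k), \qquad
p(x \mid D_k) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}} \exp\!\Big(-\tfrac{1}{2}(x-\mu_k)^{\mathsf T}\Sigma_k^{-1}(x-\mu_k)\Big),

where μk and Σk are the mean and covariance matrix of distribution Dk.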
36. EM Algorithm Details We need to set the number of clusters K in advance
Consists of two phases
Expectation
Compute, for every pattern, the probability that it came from each of the K clusters, conditioned on the observed attributes
Maximization
Update estimated values for
Means of the clusters
Covariance matrices of the clusters
Cluster priors (probability that a random point belongs to given cluster)
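In symbols, writing w_{ik} for the probability that pattern x_i came from cluster k (this notation is introduced here only for clarity; these are the standard EM updates for a Gaussian mixture):

Expectation: \quad w_{ik} = \frac{p_k \, p(x_i \mid D_k)}{\sum_{j=1}^{K} p_j \, p(x_i \mid D_j)}

Maximization: \quad p_k = \frac{1}{N}\sum_{i=1}^{N} w_{ik}, \qquad
\mu_k = \frac{\sum_i w_{ik}\, x_i}{\sum_i w_{ik}}, \qquad
\Sigma_k = \frac{\sum_i w_{ik}\,(x_i-\mu_k)(x_i-\mu_k)^{\mathsf T}}{\sum_i w_{ik}}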
37. EM in Matlab
for j=1:n_iterations
    % "Expectation" phase: compute, for every pattern, the probability that it
    % came from each of the K clusters (multinorm_distr_value evaluates the
    % multivariate Gaussian density p(x|Dh); a similar built-in is mvnpdf)
    for h=1:K
        Pmat(:,h)=p(h)*multinorm_distr_value(mu(h,:),sigma{h},X);
    end
    Pmat=Pmat./repmat(sum(Pmat,2),1,K);   % normalize rows so each sums to 1
    % "Maximization" phase
    % Compute new means (sum(Pmat(:,h)) is the total responsibility of cluster h)
    for h=1:K
        mu(h,:)=Pmat(:,h)'*X/sum(Pmat(:,h));   % weighted average over the N patterns
    end
    % Compute new sigmas (covariance matrices)
    for h=1:K
        XM=(X-repmat(mu(h,:),N,1)).*repmat(sqrt(Pmat(:,h)),1,size(X,2));
        sigma{h}=XM'*XM/sum(Pmat(:,h));
    end
    % Compute new prior probabilities
    for h=1:K
        p(h)=sum(Pmat(:,h))/N;
    end
end
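The loop above assumes that X, N, K, mu, sigma, p, Pmat and n_iterations already exist. A possible initialization, roughly matching the setup described on the next slide (means taken from the data, one shared covariance matrix, equal priors); nothing here is prescribed by the original slides:

[N, d] = size(X);             % N patterns, d attributes
K = 3;                        % number of clusters, set in advance
n_iterations = 100;
mu = X(randperm(N,K),:);      % K patterns chosen as initial means
C = cov(X);                   % covariance matrix of all the data
sigma = cell(1,K);
for h = 1:K
    sigma{h} = C;             % every cluster starts from the same covariance
end
p = ones(1,K)/K;              % equal priors
Pmat = zeros(N,K);            % responsibilities, filled in by the E-step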
38. EM Algorithm - Example 3500 points from a mixture of three two-dimensional Gaussian distributions
EM algorithm initialized with:
Distribution means close to the true means
Covariance matrices equal to the covariance matrix of all the data
Equal priors
Each point is colored by a mixture of three primary colors
The clearer the color, the more certain the cluster membership
51. Problems with EM Algorithm Slow
Convergence depends on the initialization
Assumes Gaussian clusters
52. Hierarchical Clustering A set of nested clusters is created
The way clusters are created is depicted by a dendrogram
Agglomerative
Divisive
53. Agglomerative Clustering Put each pattern into its own separate cluster
While there are more than c clusters:
Merge the two clusters that are closest according to some distance criterion
Output the c clusters
54. Distance Criterion SINGLE LINK
Minimal distance between points in two clusters
COMPLETE LINK
Maximal distance between points in two clusters
AVERAGE LINK
Average distance between points in two clusters
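In MATLAB, these three criteria correspond to the 'single', 'complete' and 'average' options of the Statistics Toolbox function linkage. A short sketch, assuming a data matrix X with one pattern per row and a desired number of clusters c:

D = pdist(X);                        % pairwise distances between all patterns
Z = linkage(D, 'single');            % or 'complete' / 'average'
labels = cluster(Z, 'maxclust', c);  % cut the hierarchy into c clusters
dendrogram(Z);                       % visualize how the clusters were merged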
58. Properties of Single Link Distance Favors elongated clusters
59. Properties of Complete Link Distance Favors compact clusters
67. Main Problems Slow
Distances must be recomputed after each merge
O(n²) time complexity
68. Divisive Clustering We start from a single cluster and successively split clusters into smaller ones
E.g. using the Minimal Spanning Tree (MST)
A Minimal Spanning Tree is a tree connecting all vertices of the graph such that the sum of edge lengths is minimal
Note: the MST can also be used to perform single-link clustering…
69. Divisive Clustering using MST Consider the patterns as vertices of a fully connected graph
Each pair of vertices is connected by an edge whose length equals the distance between the corresponding points
Compute the MST
Sort the edges of the MST in decreasing order
While there are remaining edges:
Form a new cluster by deleting the longest remaining edge (a MATLAB sketch of this procedure follows)
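A possible MATLAB sketch of this procedure, using the built-in graph and minspantree functions (available in recent MATLAB versions); X is the data matrix, c the desired number of clusters, and the variable names are illustrative:

D = squareform(pdist(X));            % full symmetric distance matrix
G = graph(D);                        % fully connected weighted graph
T = minspantree(G);                  % minimal spanning tree
% Deleting the c-1 longest MST edges leaves c connected components,
% which are the resulting clusters
[~, order] = sort(T.Edges.Weight, 'descend');
T = rmedge(T, order(1:c-1));
labels = conncomp(T);                % cluster label for every pattern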
75. Clustering Large Datasets Issues
Time complexity vs.
Number of patterns
Number of attributes
Space complexity
What if the whole dataset cannot fit in main memory?
76. DBSCAN Density-based clusters
Time complexity, using a special data structure (R*-trees), is O(N log N), where N is the number of patterns
77. A Couple of Definitions Core point: a pattern whose neighborhood contains at least Nmin patterns
Nmin: the minimal number of patterns in the neighborhood
Example: Nmin = 9
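As an illustration (not code from the original slides), the core-point test can be written as a small MATLAB helper; eps_r denotes the neighborhood radius, and the function name is hypothetical:

function [is_core, neighbors] = core_point(X, i, eps_r, Nmin)
% Indices of the patterns in the eps_r-neighborhood of pattern i (itself included),
% and whether pattern i is a core point
dist = sqrt(sum((X - repmat(X(i,:), size(X,1), 1)).^2, 2));
neighbors = find(dist <= eps_r);
is_core = numel(neighbors) >= Nmin;
end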
78. Density Reachable Points
79. Density Reachable Points –Formally… Point q is density reachable from p1 if:
p1 is a core point
There are some core points p2, p3,…,pM such that
p2 is in neighborhood of p1
p3 is in neighborhood of p2
p4 is in neighborhood of p3
…
pM is in neighborhood of pM-1
q is in neighborhood of pM
NOTE: q does not need to be a core point!
80. Density-Based Cluster A density-based cluster contains all the points density reachable from an arbitrary core point in the cluster!
81. Idea of DBSCAN Initially, all patterns of the database are unlabelled.
BUT: For each pattern, we check whether it is labeled (it can be labeled if it was in some previously detected cluster)
If the pattern is not labeled, we will check whether it is a core point, so that it may initiate a new cluster
If the pattern already has a cluster label, we do nothing and process the next pattern
82. Idea of DBSCAN -Cont If the examined point is a core point, it seeds a new cluster.
We observe the neighbors
If the neighbor is already labeled, it has already been examined, so we do not need to assign a label or re-examine it
Otherwise (the neighbor is unlabeled):
Each such neighbor is assigned the label of the new cluster
We recursively examine all core points in the neighborhood
83. DBSCAN - Algorithm DBSCAN:
FOR each pattern in dataset
IF the pattern is not already assigned to a cluster
IF CORE_POINT(pattern)==Yes
ASSIGN new cluster label to the pattern
EXAMINE (pattern.neighbors)
84. Important Note In practical realization, we can avoid having recursive calls
Maintain and update the list of all nodes from various neighborhoods that need to be examined
NOTE: Instead of list, we could improve performance by using sets (sets do not contain duplicates…)
This leads to the following practical, non-recursive version of DBSCAN
85. Non-Recursive DBSCAN
FOR each pattern in dataset
IF the pattern is not already assigned to a cluster
IF CORE_POINT(pattern)==Yes
ASSIGN new cluster label to the pattern
ADD pattern.neighbors to the list
WHILE list is not empty
TAKE neighbor from the beginning of the list (and remove it from the list)
IF neighbor is not already assigned to a cluster
ASSIGN the label of the new cluster to the neighbor
IF CORE_POINT(neighbor)==Yes
ADD neighbor.neighbors to the list;
END WHILE
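A MATLAB sketch of this non-recursive version, reusing the core_point helper sketched earlier; eps_r and Nmin are the two DBSCAN parameters, and the code is an illustrative reconstruction rather than the original implementation:

N = size(X,1);
labels = zeros(N,1);                 % 0 = not yet assigned to a cluster
current = 0;                         % label of the current cluster
for i = 1:N
    if labels(i) == 0
        [is_core, nbrs] = core_point(X, i, eps_r, Nmin);
        if is_core
            current = current + 1;
            labels(i) = current;     % the core point seeds a new cluster
            list = nbrs(:);          % neighbors waiting to be examined
            while ~isempty(list)
                q = list(1); list(1) = [];          % take from the beginning of the list
                if labels(q) == 0
                    labels(q) = current;            % assign the cluster label
                    [q_core, q_nbrs] = core_point(X, q, eps_r, Nmin);
                    if q_core
                        list = [list; q_nbrs(:)];   % its neighbors must be examined too
                    end
                end
            end
        end
    end
end
% Patterns still labeled 0 at the end play the role of NOISE (see the next slide)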
86. Remark In addition to the functionality described above, DBSCAN may assign a NOISE label to a pattern
A pattern is NOISE if it is not a core point and is not density reachable from any core point
NOISE patterns do not belong to any cluster
87. DBSCAN - Example
111. Problems with DBSCAN How to choose the optimal parameters (the size of the neighborhood and the minimal number of points)?
May not work well with clusters of non-uniform density and/or Gaussian clusters
How does it scale with the number of attributes?
112. Choice of Optimal Parameter Values (Ester et al., 1996) Use Nmin = 4
For each point in the dataset, compute the distance to its 4th nearest neighbor
Sort these distances and plot them
Choose the distance threshold such that:
It is situated at the “knee” of the curve, or
The percentage of noise is pre-specified
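A short MATLAB sketch of this procedure, using knnsearch from the Statistics Toolbox (with Nmin = 4 we want the 4th nearest neighbor, so K = 5 because the nearest "neighbor" of every point is the point itself):

[~, D] = knnsearch(X, X, 'K', 5);    % distances to the 4 nearest neighbors
kdist = sort(D(:,5), 'descend');     % 4th-NN distance of every pattern, sorted
plot(kdist);                         % the distance threshold is read off at the "knee"
xlabel('patterns sorted by 4th-NN distance');
ylabel('4th-NN distance');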
114. Example Let's detect the same Gaussian clusters we successfully discovered with the k-means and EM algorithms
118. Problem of Dimensionality The speed of DBSCAN depends on efficient search for neighbors
We need special indexing structures
R*-trees work well with up to 6 attributes
X-trees work well with up to 12 attributes
Is there an indexing structure that scales well with the number of attributes?