Lecture 09 Clustering-based Learning • Topics • Basics • K-Means • Self-Organizing Maps • Applications • Discussions
Basics • Clustering • Grouping a collection of objects (examples) into clusters, such that objects are most similar within each cluster and least similar between clusters. • Core problem: defining similarity • Intra-cluster similarity • Inter-cluster similarity • Inductive learning • Unsupervised learning
Basics • Minimizing intra-cluster dissimilarity is equivalent to maximizing inter-cluster dissimilarity • Clustering performance in terms of intra-cluster dissimilarity: W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} d(x_i, x_{i'}) • K for K clusters and d(x_i, x_{i'}) for the dissimilarity measure; C(i) = k means object x_i is assigned to cluster k
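A minimal Python sketch of this quantity, assuming squared Euclidean distance for d(x_i, x_{i'}); the function name within_cluster_scatter is mine, not from the lecture:

```python
import numpy as np

def within_cluster_scatter(X, labels):
    """W = 1/2 * sum over clusters k of all pairwise d(x_i, x_i') within k."""
    W = 0.0
    for k in np.unique(labels):
        Ck = X[labels == k]                      # members of cluster k
        diffs = Ck[:, None, :] - Ck[None, :, :]  # all pairwise differences
        W += 0.5 * np.sum(diffs ** 2)            # 1/2: each pair counted twice
    return W
```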
Basics • Dissimilarity measure depends on value types and value coding systems • Some examples • Quantitative variables: squared difference (x_{ij} - x_{i'j})^2 or absolute difference |x_{ij} - x_{i'j}| • Ordinal variables: replace the value ranked i among M ordered values by (i - 1/2)/M, then treat it as quantitative • Categorical variables: simple matching, d = 0 if the values agree and d = 1 otherwise
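As an illustration, hedged sketches of one standard choice per value type (these specific formulas are common textbook conventions, assumed here rather than taken from the slides):

```python
def d_quantitative(x, y):
    # squared-error dissimilarity between two numeric values
    return (x - y) ** 2

def d_ordinal(rank_x, rank_y, M):
    # map the value ranked i among M ordered values to (i - 1/2) / M,
    # then treat the result as quantitative
    fx, fy = (rank_x - 0.5) / M, (rank_y - 0.5) / M
    return (fx - fy) ** 2

def d_categorical(x, y):
    # simple matching: 0 if the categories agree, 1 otherwise
    return 0.0 if x == y else 1.0
```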
Basics • Clustering algorithms • Combinatorial Algorithms • Work directly on the observed data • K-Means • Self-Organizing Maps
K-Means • A statistical learning mechanism • A given object is assigned to a cluster if it has the least dissimilarity to the mean value of that cluster. • Euclidean or Manhattan distance is commonly used to measure dissimilarity • The mean value of each cluster is recalculated in each iteration
K-Means • Step 1: Selecting Centers Select k objects randomly, each becoming the center (mean) of an initial cluster. • Step 2: Clustering Assign each of the remaining objects to the cluster with the nearest center. The most popular method for calculating distance is Euclidean distance. Given two points p = (p_1, p_2, …, p_k) and q = (q_1, q_2, …, q_k), their Euclidean distance is defined as: d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_k - q_k)^2}
K-Means • Step 3: Computing New Centers Compute new cluster centers. Let x_i be one of the elements assigned to the kth cluster, and N_k be the number of elements in the cluster. The new center of cluster k, C_k, is calculated as: C_k = \frac{1}{N_k} \sum_{x_i \in \text{cluster } k} x_i • Step 4: Iteration Repeat steps 2 and 3 until no members change their clusters.
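Putting Steps 1–4 together, a minimal NumPy sketch (function and parameter names are mine; the slides prescribe only the four steps):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: select k objects at random as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # Step 4: no membership change
            break
        labels = new_labels
        # Step 3: recompute each center C_k as the mean of its N_k members
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

For the two-cluster example on the next slide, the call would be labels, centers = k_means(X, k=2).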
K-Means • Example (figure): with K = 2, arbitrarily choose K objects as the initial cluster centers, assign each object to the most similar center, update the cluster means, then reassign and update repeatedly until membership stabilizes.
K-Means • Usually, K is fixed by the problem setting itself • If K is not given, to find the best K we examine the intra-cluster dissimilarity W_k, which is a function of K • Usually W_k decreases with increasing K
K-Means • Deciding K • Plot W_k against K: a sharp drop ("elbow") of W_k is observed at a good choice of K
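A sketch of that heuristic, reusing the k_means and within_cluster_scatter sketches above:

```python
def elbow_curve(X, k_max=10):
    # W_k for K = 1..k_max; plot against K and pick K at the sharp drop
    return [within_cluster_scatter(X, k_means(X, k)[0])
            for k in range(1, k_max + 1)]
```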
K-Means • Hierarchical Clustering • Builds a hierarchy (dendrogram) of nested clusters rather than a single flat partition
K-Means • Agglomerative Hierarchical Clustering • Bottom-up: start with each object in its own cluster and repeatedly merge the closest pair of clusters
K-Means • Divisive Hierarchical Clustering • Top-down: start with all objects in one cluster and recursively split clusters
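For concreteness, a minimal sketch of the agglomerative variant; single linkage is my choice here, since the slides do not fix a linkage criterion:

```python
import numpy as np

def agglomerative(X, n_clusters):
    # start bottom-up with one cluster per object
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)  # merge the closest pair of clusters
    return clusters
```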
Self-Organizing Maps • The self-organising map (SOM) is a subsymbolic learning algorithm; input data need to be numerically coded • It is based on competitive learning: neurons compete among themselves to be activated, but only a single output neuron can be active at any time. • The output neuron that wins the "competition" is called the winner-takes-all neuron
Self-Organizing Maps • Emulating brain structure • Our brain is dominated by the cerebral cortex, a very complex structure of billions of neurons and hundreds of billions of synapses. • The cortex includes areas that are responsible for different human activities (motor, visual, auditory, etc.), and associated with different sensory inputs. • We can say that each sensory input is mapped into a corresponding area of the cerebral cortex. • The cortex is a self-organising computational map in the human brain.
Self-Organizing Maps • SOM provides a topological mapping: it maps input patterns from the input layer onto a fixed number of neurons in the output, or Kohonen, layer, which is typically a lower-dimensional lattice. • Training in SOM begins with a fairly large neighborhood around the winner. Then, as training proceeds, the neighborhood size gradually decreases.
Self-Organizing Maps • Conceptual architecture
Self-Organizing Maps • The lateral connections are used to create a competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner. This neuron is the only neuron that produces an output signal. The activity of all other neurons is suppressed in the competition. • The lateral feedback connections produce excitatory or inhibitory effects, depending on the distance from the winning neuron. This can be achieved by the use of a Mexican hat function which describes synaptic weights between neurons in the Kohonen layer.
Self-Organizing Maps • Mexican hat function of lateral connection
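One common functional form for this profile is the Ricker ("Mexican hat") wavelet; the exact shape behind the slide's figure is not specified, so this parameterization is an assumption:

```python
import numpy as np

def mexican_hat(d, sigma=1.0):
    # lateral weight vs. distance d from the winner: excitatory nearby,
    # inhibitory at intermediate distance, fading to zero far away
    s = (d / sigma) ** 2
    return (1.0 - s) * np.exp(-s / 2.0)
```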
SOM – Competitive Learning Algorithm • Step 1: Initialization Set initial weights to small random values, say in the interval [0, 1], and assign a small positive value, e.g., 0.2 to 0.5, to the learning rate parameter α.
SOM – Competitive Learning Algorithm • Step 2: Activation and Similarity Matching Activate the SOM by applying the input vector X, and find the best-matching (winner) neuron j_X at iteration p, using the minimum Euclidean distance criterion: j_X(p) = \arg\min_j \|X - W_j(p)\|, where \|X - W_j(p)\| = \left[\sum_{i=1}^{n} (x_i - w_{ij}(p))^2\right]^{1/2}, n is the number of neurons in the input layer, m is the number of neurons in the Kohonen layer, and j = 1, 2, …, m.
SOM – Competitive Learning Algorithm • Step 3: Learning (a) Calculate the weight corrections according to the competitive learning rule: \Delta w_{ij}(p) = \alpha [x_i - w_{ij}(p)] if j \in \Lambda_{j_X}(p), and 0 otherwise, where Λ_{j_X}(p) is the neighborhood of the winner neuron j_X, shrinking over time from its initial size d_0 as iteration p approaches the total number of repetitions T (e.g., d(p) = d_0 (1 - p/T)).
SOM – Competitive Learning Algorithm • Step 3: Learning (Continued) (b) Update the weights: w_{ij}(p + 1) = w_{ij}(p) + \Delta w_{ij}(p), where Δw_{ij}(p) is the weight correction at iteration p. • Step 4: Iteration Increase iteration p by one, go back to Step 2 and continue until the minimum-distance Euclidean criterion is satisfied, or no noticeable changes occur in the feature map.
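A compact sketch of Steps 1–4 for a one-dimensional Kohonen layer. The shrinking schedule d(p) = d_0 (1 - p/T) is one plausible reading of the slide's d_0 and T, and all names are mine:

```python
import numpy as np

def train_som(X, m, alpha=0.3, d0=None, T=1000, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, 1.0, size=(m, X.shape[1]))  # Step 1: random weights
    d0 = m / 2 if d0 is None else d0
    for p in range(T):
        x = X[rng.integers(len(X))]                  # apply one input vector
        # Step 2: winner = neuron with minimum Euclidean distance to x
        j_x = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        # Step 3: update the winner and its shrinking neighborhood
        radius = max(d0 * (1.0 - p / T), 1.0)        # floor of 1 is my choice
        for j in range(m):
            if abs(j - j_x) <= radius:               # j in the neighborhood
                W[j] += alpha * (x - W[j])
    return W
```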
Self-Organizing Maps • SOM is online K-Means: when a new object arrives, only the winning (nearest) center is moved toward it, rather than recomputing all cluster means in a batch.
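Seen that way, each arriving object triggers a single-center update; a sketch of that online step (names are mine):

```python
import numpy as np

def online_kmeans_update(centers, x, alpha=0.1):
    # the nearest center "wins" and moves a step alpha toward the new object
    j = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    centers[j] += alpha * (x - centers[j])
    return centers
```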
Self-Organizing Maps • Example: A SOM with 100 neurons arranged in the form of a two-dimensional lattice with 10 rows and 10 columns. It is required to classify two-dimensional input vectors: each neuron in the network should respond only to the input vectors occurring in its region. • The network is trained with 1000 two-dimensional input vectors generated randomly in a square region in the interval between –1 and +1. The learning rate parameter is fixed, equal to 0.1.
Self-Organizing Maps Initial random weights
Self-Organizing Maps 100 repetitions
Self-Organizing Maps 1,000 repetitions
Self-Organizing Maps 10,000 repetitions
Applications • K-Means • Cluster ECG signals according to Correlation Dimensions • Self-Organizing Maps • Find churner groups • Speech recognition
Discussions • Clustering algorithm in Open Sesame! • Attribute-based representation of events • Attribute-based similarity measure for clusters • Hierarchical clustering of event sequences • Generalization, e.g., • "A ∧ B ∧ C" generalized to "A ∧ B" • "A ∨ B" generalized to "A ∨ B ∨ C" • via an ontology • Specialization