
Lecture 09 Clustering-based Learning


Presentation Transcript


  1. Lecture 09 Clustering-based Learning • Topics • Basics • K-Means • Self-Organizing Maps • Applications • Discussions

  2. Basics • Clustering • Grouping a collection of objects (examples) into clusters, such that objects are most similar inside each cluster and least similar between clusters. • Core problem: similarity definition • Intra cluster similarity • Inter cluster similarity • Inductive learning • Unsupervised learning

  3. Basics • Minimizing intra cluster dissimilarity is equivalent to maximizing inter cluster dissimilarity • Clustering performance in terms of intra cluster dissimilarity: W_K = (1/2) Σ_{k=1}^{K} Σ_{C(i)=k} Σ_{C(i′)=k} d(x_i, x_i′), where C(i) = k means object x_i is assigned to cluster k • K for K clusters and d(x_i, x_i′) for the dissimilarity measure
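The within-cluster criterion above can be computed directly; a minimal numpy sketch, assuming a squared-Euclidean dissimilarity d(x_i, x_i′) and cluster labels already produced by some clustering algorithm (the function name is illustrative):

```python
import numpy as np

def within_cluster_dissimilarity(X, labels):
    """W_K = 1/2 * sum over the K clusters of all pairwise dissimilarities
    d(x_i, x_i') between objects assigned to the same cluster."""
    W = 0.0
    for k in np.unique(labels):
        members = X[labels == k]                        # objects in cluster k
        diffs = members[:, None, :] - members[None, :, :]
        W += 0.5 * np.sum(diffs ** 2)                   # squared-Euclidean d(x_i, x_i')
    return W
```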

  4. Basics • Dissimilarity measure depends on value types and value coding systems • Some examples • Quantitative variables: e.g. squared or absolute difference, d(x_i, x_i′) = (x_i − x_i′)² or |x_i − x_i′| • Ordinal variables: replace the M ordered levels by (rank − 1/2)/M and treat the result as quantitative • Categorical variables: simple matching, d = 0 if the values agree and d = 1 otherwise
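A minimal sketch of one common convention for each value type (these are standard textbook choices, not taken from the slides; all names are illustrative):

```python
def quantitative_dissim(x, y):
    # numeric values: e.g. squared difference
    return (x - y) ** 2

def ordinal_dissim(rank_x, rank_y, M):
    # ordinal values: map each of the M ordered levels to (rank - 1/2) / M,
    # then treat the result as quantitative
    return ((rank_x - 0.5) / M - (rank_y - 0.5) / M) ** 2

def categorical_dissim(x, y):
    # categorical values: simple matching, 0 if equal, 1 otherwise
    return 0.0 if x == y else 1.0
```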

  5. Basics • Clustering algorithms • Combinatorial Algorithms • Work directly on the observed data • K-Means • Self-Organizing Maps

  6. K-Means • A statistical learning mechanism • A given object is assigned to a cluster if it has the least dissimilarity to the mean value of the cluster. • Euclidean or Manhattan distance is commonly used to measure dissimilarity • The mean value of each cluster is recalculated in each iteration

  7. K-Means • Step 1: Selecting Centers Select k objects randomly, each becoming the center (mean) of an initial cluster. • Step 2: Clustering Assign each of the remaining objects to the cluster whose center is nearest. The most popular method for calculating distance is Euclidean distance. Given two points p = ( p1, p2, …, pk ) and q = ( q1, q2, …, qk ), their Euclidean distance is defined as d(p, q) = √( (p1 − q1)² + (p2 − q2)² + … + (pk − qk)² ).
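Steps 1 and 2 rely only on a distance function; a minimal sketch of the Euclidean (and, for comparison, Manhattan) distance between two k-dimensional points:

```python
import numpy as np

def euclidean(p, q):
    # d(p, q) = sqrt((p1 - q1)^2 + ... + (pk - qk)^2)
    return np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2))

def manhattan(p, q):
    # d(p, q) = |p1 - q1| + ... + |pk - qk|
    return np.sum(np.abs(np.asarray(p) - np.asarray(q)))
```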

  8. K-Means • Step 3: Computing New Centers Compute new cluster centers. Let x_i be one of the elements assigned to the kth cluster, and N_k be the number of elements in the cluster. The new center of cluster k, C_k, is calculated as C_k = (1/N_k) Σ_{x_i ∈ cluster k} x_i. • Step 4: Iteration Repeat steps 2 and 3 until no members change their clusters.
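Putting Steps 1 through 4 together, a minimal numpy sketch of the whole procedure (random initialization, Euclidean distance; names and the empty-cluster handling are illustrative, not from the slides):

```python
import numpy as np

def k_means(X, K, max_iter=100, seed=None):
    rng = np.random.default_rng(seed)
    # Step 1: pick K objects at random as initial cluster centers
    centers = X[rng.choice(len(X), size=K, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when no object changes its cluster
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members
        # (a cluster left empty keeps its previous center)
        centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(K)])
    return labels, centers
```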

  9. K-Means • Example (K = 2): arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign objects and update the means again until assignments stop changing. [Figure: scatter plots on a 10 × 10 grid showing the assignments and centers over successive iterations]

  10. K-Means • Usually, the problem setting itself specifies K • If K is not given, in order to find the best K, we examine the intra cluster dissimilarity Wk, which is a function of K • Usually Wk decreases with increasing K

  11. K-Means • Decide K: a sharp drop of Wk is observed at a good choice of K. [Figure: Wk plotted against K]
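A minimal sketch of this procedure, reusing the k_means and within_cluster_dissimilarity sketches above: compute Wk for a range of K and look for the point where the curve stops dropping sharply.

```python
def wk_curve(X, k_max=10):
    # Wk usually decreases as K grows; a good K is where the decrease flattens
    return [(K, within_cluster_dissimilarity(X, k_means(X, K)[0]))
            for K in range(1, k_max + 1)]
```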

  12. K-Means • Hierarchical Clustering

  13. K-Means • Agglomerative Hierarchical Clustering

  14. K-Means • Divisive Hierarchical Clustering
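As an illustration of the agglomerative (bottom-up) variant, a minimal example assuming SciPy is available; the divisive (top-down) variant would instead split clusters recursively:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)                        # 20 two-dimensional objects
Z = linkage(X, method="average")                 # merge the closest clusters bottom-up
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the dendrogram into 3 clusters
```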

  15. Self-Organizing Maps • The self-organising map (SOM) is a subsymbolic learning algorithm; data inputs need to be numerically coded • It is based on competitive learning: Neurons compete among themselves to be activated, but only a single output neuron can be active at any time. • The output neuron that wins the “competition” is called the winner-takes-all neuron

  16. Self-Organizing Maps

  17. Self-Organizing Maps • Emulating brain structure • Our brain is dominated by the cerebral cortex, a very complex structure of billions of neurons and hundreds of billions of synapses. • The cortex includes areas that are responsible for different human activities (motor, visual, auditory, etc.), and associated with different sensory inputs. • We can say that each sensory input is mapped into a corresponding area of the cerebral cortex. • The cortex is a self-organising computational map in the human brain.

  18. Self-Organizing Maps • SOM provides a topological mapping. It places a fixed number of input patterns from the input layer into a higher-dimensional output or Kohonen layer. • Training in SOM begins with the winner’s neighborhood of a fairly large size. Then, as training proceeds, the neighborhood size gradually decreases.

  19. Self-Organizing Maps • Conceptual architecture

  20. Self-Organizing Maps • The lateral connections are used to create a competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner. This neuron is the only neuron that produces an output signal. The activity of all other neurons is suppressed in the competition. • The lateral feedback connections produce excitatory or inhibitory effects, depending on the distance from the winning neuron. This can be achieved by the use of a Mexican hat function which describes synaptic weights between neurons in the Kohonen layer.

  21. Self-Organizing Maps • Mexican hat function of lateral connection
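The exact parameterization used in the figure is not given; one common way to sketch such a profile is the Ricker (“Mexican hat”) wavelet, which is excitatory near the winner and inhibitory farther away:

```python
import numpy as np

def mexican_hat(d, sigma=1.0):
    # positive (excitatory) for small distances d from the winning neuron,
    # negative (inhibitory) at intermediate distances, near zero far away
    r = (d / sigma) ** 2
    return (1.0 - r) * np.exp(-r / 2.0)
```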

  22. SOM – Competitive Learning Algorithm • Step 1: Initialization Set initial weights to small random values, say in an interval [0, 1], and assign a small positive value, e.g., 0.2 to 0.5, to the learning rate parameter α.

  23. SOM – Competitive Learning Algorithm • Step 2: Activation and Similarity Matching Activate the SOM by applying the input vector X, and find the best-matching (winner-takes-all) neuron j_X at iteration p, using the minimum Euclidean distance criterion: j_X(p) = argmin_j ‖X − W_j(p)‖ = argmin_j [ Σ_{i=1}^{n} (x_i − w_ij(p))² ]^{1/2}, where n is the number of neurons in the input layer, m is the number of neurons in the Kohonen layer, and j = 1, 2, …, m.

  24. SOM – Competitive Learning Algorithm • Step 3: Learning (a) Calculate weight corrections according to the competitive learning rule: Δw_ij(p) = α [x_i − w_ij(p)] if neuron j lies in the neighborhood Λ_J(p) of the winning neuron, and Δw_ij(p) = 0 otherwise, with the neighborhood size decreasing over training, e.g. d(p) = d0 (1 − p/T), where Λ_J: neighborhood of winning neuron J, d0: initial neighborhood size and T: total repetitions.

  25. SOM – Competitive Learning Algorithm • Step 3: Learning (Continued) (b) Update the weights: w_ij(p + 1) = w_ij(p) + Δw_ij(p), where Δw_ij(p) is the weight correction at iteration p. • Step 4: Iteration Increase iteration p by one, go back to Step 2 and continue until the minimum-distance Euclidean criterion is satisfied, or no noticeable changes occur in the feature map.
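A minimal numpy sketch of Steps 1 through 4 on a square Kohonen lattice, with a simple winner-plus-neighbours update and a linearly shrinking neighbourhood radius (the decay schedule and all names are illustrative assumptions, not prescribed by the slides):

```python
import numpy as np

def som_train(X, grid=10, epochs=1000, alpha=0.1, seed=None):
    rng = np.random.default_rng(seed)
    m, n = grid * grid, X.shape[1]           # Kohonen neurons, input dimensionality
    # Step 1: small random initial weights in [0, 1]
    W = rng.uniform(0.0, 1.0, size=(m, n))
    # lattice coordinates of each neuron, used to define the neighbourhood
    coords = np.array([(r, c) for r in range(grid) for c in range(grid)], dtype=float)
    d0 = grid / 2.0                          # initial neighbourhood size (assumed)
    for p in range(epochs):
        x = X[rng.integers(len(X))]          # Step 2: apply an input vector
        winner = np.argmin(np.linalg.norm(W - x, axis=1))   # minimum Euclidean distance
        radius = max(d0 * (1.0 - p / epochs), 1.0)          # neighbourhood shrinks
        hood = np.linalg.norm(coords - coords[winner], axis=1) <= radius
        # Step 3: move the winner and its lattice neighbours towards x
        W[hood] += alpha * (x - W[hood])
    # Step 4 is the loop itself: repeat until the map stops changing noticeably
    return W
```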

  26. Self-Organizing Maps • SOM is online K-Means [Figure: a new object is presented to the map and assigned to the winning neuron]

  27. Self-Organizing Maps • Example: A SOM with 100 neurons arranged in the form of a two-dimensional lattice with 10 rows and 10 columns. It is required to classify two-dimensional input vectors, so each neuron in the network should respond only to the input vectors occurring in its region. • The network is trained with 1000 two-dimensional input vectors generated randomly in a square region in the interval between –1 and +1. The learning rate parameter α is fixed, equal to 0.1.
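Using the som_train sketch above, this setup could be reproduced roughly as follows (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))    # 1000 random 2-D inputs in [-1, +1]
W = som_train(X, grid=10, epochs=10_000, alpha=0.1)
# W now holds the 100 weight vectors of the 10 x 10 Kohonen lattice
```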

  28. Self-Organizing Maps Initial random weights

  29. Self-Organizing Maps 100 repetitions

  30. Self-Organizing Maps 1,000 repetitions

  31. Self-Organizing Maps 10,000 repetitions

  32. Applications • K-Means • Cluster ECG signals according to Correlation Dimensions • Self-Organizing Maps • Find churner groups • Speech recognition

  33. Discussions • Clustering algorithm in Open Sesame! • Attribute-based representation of events • Attribute-based similarity measure for clusters • Hierarchical clustering of event sequences • Generalization, e.g., • “A ∧ B ∧ C” generalized to “A ∧ B” • “A ∨ B” generalized to “A ∨ B ∨ C” • Ontology • Specialization
