
Prepared by: Mahmoud Rafeek Al-Farra

College of Science & Technology, Dept. of Computer Science & IT, BSc of Information Technology. Data Mining. Chapter 6_2: Clustering Methods. Prepared by: Mahmoud Rafeek Al-Farra. 2013. www.cst.ps/staff/mfarra.


Presentation Transcript


  1. College of Science & Technology, Dept. of Computer Science & IT, BSc of Information Technology. Data Mining. Chapter 6_2: Clustering Methods. Prepared by: Mahmoud Rafeek Al-Farra. 2013. www.cst.ps/staff/mfarra

  2. Course Outline • Introduction • Data Preparation and Preprocessing • Data Representation • Classification Methods • Evaluation • Clustering Methods • Mid Exam • Association Rules • Knowledge Representation • Special Case Study: Document Clustering • Discussion of Case Studies by Students

  3. Outline • Definition of Clustering • Clustering Process • Clustering Algorithms (Methods) • Cluster Validation

  4. Definition • Clustering is the division of data into groups of similar objects. • Each group, called a cluster, consists of objects that are similar to one another and dissimilar to the objects of other groups. • Clustering is an unsupervised classification problem.

  5. Clustering Process (Document Case) • [Figure: four-stage pipeline leading from document samples, through clusters, to knowledge] • Preprocessing step: document cleaning; feature selection or extraction. • Clustering algorithm: similarity measure; criterion clustering function. • Cluster validation: external indices; internal indices; relative indices. • Results interpretation.

  6. Clustering algorithm design or selection • This step is usually combined with the selection of a corresponding proximity measure and the construction of a criterion function. • Obviously, the proximity measure directly affects the formation of the resulting clusters. • Almost all clustering algorithms are explicitly or implicitly connected to some definition of proximity measure.

  7. Clustering algorithm design or selection • To group similar data objects, a proximity metric is needed to determine which objects (or clusters) are similar. • Proximity can be measured either as how similar two objects are to each other (similarity) or as how different they are (dissimilarity). • The literature reports a large number of similarity metrics, owing to the large number of representation models and clustering algorithms.
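The two views of proximity on this slide can be illustrated with a minimal sketch. It assumes documents are represented as term-count vectors and uses cosine similarity and Euclidean distance as examples; the slides do not name a specific metric, and the three vectors are made-up data.

```python
import math

def cosine_similarity(a, b):
    """Similarity: close to 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: an empty vector matches nothing
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Dissimilarity: 0.0 for identical vectors, growing as they move apart."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical term-count vectors for three documents (illustrative data only).
doc1 = [2, 1, 0]
doc2 = [4, 2, 0]  # same word proportions as doc1, but twice as long
doc3 = [0, 0, 3]  # shares no terms with doc1

print(cosine_similarity(doc1, doc2))   # high: same direction
print(cosine_similarity(doc1, doc3))   # zero: no shared terms
print(euclidean_distance(doc1, doc2))  # non-zero even though the proportions match
```

The example also hints at why the choice matters: doc1 and doc2 look nearly identical under cosine similarity but are clearly apart under Euclidean distance, so the two measures can lead to different clusters.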

  8. Clustering algorithm design or selection • [Figure: three document clusters, labeled with intra-cluster similarity and inter-cluster similarity]

  9. Clustering Algorithms • Once a proximity measure is chosen, the construction of a clustering criterion function turns the partitioning into clusters into an optimization problem, which is well defined mathematically and has many solutions in the literature.
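One common instance of such a criterion function, given here as an illustration rather than as the definition the slides have in mind, is the sum of squared errors that k-means minimizes. The symbols are notation assumed for this sketch: K is the number of clusters, C_k the k-th cluster, x_i an object, and \mu_k the centroid of C_k.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Under this criterion, clustering means finding the assignment of objects to C_1, ..., C_K that makes J as small as possible.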

  10. Clustering Algorithms • Partitional Clustering: K-means, Fuzzy C-means, Bisecting k-means • Hierarchical Clustering: Agglomerative (AHC), Divisive (DHC) • NN Clustering • Density Clustering • Grid Clustering

  11. Hierarchical Clustering • Hierarchical techniques produce a nested sequence of partitions, with a single all-inclusive cluster at the top and singleton clusters of individual objects at the bottom. • The result of a hierarchical clustering algorithm can be viewed as a tree, called a dendrogram. • [Figure: dendrogram over the objects a, b, c, d, e, with levels {a, b, c, d, e}; {a}, {b, c, d, e}; {a}, {b, c}, {d, e}; {a}, {b, c}, {d}, {e}; {a}, {b}, {c}, {d}, {e}]

  12. Hierarchical Clustering • AHC starts with each object as an individual cluster; then, at each step, it merges the two most similar clusters. • This process is repeated until a minimum number of clusters has been reached or, if a complete hierarchy is required, until only one cluster is left.
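The merge loop described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: objects are plain numbers, "most similar" is taken to mean the smallest single-link distance (closest pair of members), and the function names and the `min_clusters` parameter are hypothetical.

```python
def single_link_distance(c1, c2):
    """Distance between two clusters, taken as the distance of their closest members."""
    return min(abs(x - y) for x in c1 for y in c2)

def agglomerative(points, min_clusters=1):
    """Return the list of partitions produced by bottom-up merging."""
    # Start with every object in its own cluster.
    clusters = [[p] for p in points]
    history = [[list(c) for c in clusters]]
    while len(clusters) > min_clusters:
        # Find the pair of clusters that are closest, i.e. most similar.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Replace the chosen pair with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append([list(c) for c in clusters])
    return history

# Illustrative data: stop at two clusters, then build the complete hierarchy.
for partition in agglomerative([1, 2, 10, 13, 30], min_clusters=2):
    print(partition)
print(agglomerative([1, 2, 10, 13, 30])[-1])
```

Each entry of the returned history corresponds to one level of the hierarchy, read from the bottom (singletons) upward. Other linkage rules, such as complete or average link, would only change `single_link_distance`.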

  13. Hierarchical Clustering • DHC methods work from top to bottom, starting with the whole data set as one cluster and, at each step, splitting a cluster until only singleton clusters of individual objects remain.
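A top-down counterpart can be sketched in the same spirit. The slide does not say which cluster to split or how to split it, so both rules here are assumptions made for illustration: the non-singleton cluster with the widest spread is split next, and it is cut at its largest gap. Objects are again plain numbers and the function names are hypothetical.

```python
def split_at_widest_gap(cluster):
    """Split a 1-D cluster into two parts at its largest gap between neighbors."""
    ordered = sorted(cluster)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]

def divisive(points):
    """Return the list of partitions produced by top-down splitting."""
    clusters = [sorted(points)]  # start with one all-inclusive cluster
    levels = [[list(c) for c in clusters]]
    while any(len(c) > 1 for c in clusters):
        # Pick the non-singleton cluster with the widest spread to split next.
        target = max((c for c in clusters if len(c) > 1), key=lambda c: c[-1] - c[0])
        left, right = split_at_widest_gap(target)
        clusters.remove(target)
        clusters.extend([left, right])
        levels.append([list(c) for c in clusters])
    return levels

# Illustrative data: each printed line is one level of the hierarchy, top to bottom.
for level in divisive([1, 2, 10, 13, 30]):
    print(level)
```

In practice the split step is often delegated to a partitional algorithm, as bisecting k-means does; the gap rule above is only the simplest stand-in.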

  14. Partitional Clustering • Partitional clustering techniques create a one-level (un-nested) partitioning of the data points. • If K is the desired number of clusters, partitional approaches typically find all K clusters at once. • The best-known partitional clustering algorithms are k-means and its variants. • [Figure label: Centroids]
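As a hedged illustration of how a partitional method finds all K clusters at once, here is a minimal k-means sketch. It assumes points are numeric tuples, uses squared Euclidean distance, and, purely to keep the example deterministic, takes the first k points as the initial centroids; real implementations usually initialize differently. The function name and parameters are hypothetical.

```python
def kmeans(points, k, max_iters=100):
    """Return (assignment, centroids) for a basic k-means run."""
    # Assumption for this sketch: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Illustrative data: two visually separate groups of 2-D points.
data = [(1, 1), (1, 2), (8, 8), (9, 8), (2, 1), (8, 9)]
labels, centers = kmeans(data, k=2)
print(labels)
print(centers)
```

The result is a single flat partition: every point carries exactly one cluster label, and each cluster is summarized by its centroid.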

  15. Neural Networks-Based Clustering • Neural networks (NNs) are able to learn complex relationships from data samples in either a supervised or an unsupervised fashion. • In supervised learning, a labeled set of data is used to train the network to model the relationship between inputs and outputs, prior to testing. • Unsupervised networks, by contrast, do not use such a priori knowledge; they learn the underlying relationships from the data itself.

  16. Next: • Cluster validation • Examples of clustering algorithms • Prepare 2 slides for each of the following clustering algorithms: • Density Clustering • Grid Clustering

  17. Thanks
