

  1. Brief introduction to lectures
  Lecture 5: Automatic cluster detection
  Lecture 6: Artificial neural networks
  Lecture 7: Evaluation of discovered knowledge
  Transparencies prepared by Ho Tu Bao [JAIST]

  2. Lecture 5: Automatic Cluster Detection
  • One of the most widely used KDD techniques for unsupervised data.
  • Content of the lecture:
  1. Introduction
  2. Partitioning clustering
  3. Hierarchical clustering
  4. Software and case studies
  • Prerequisite: nothing special

  3. Partitioning Clustering
  • Each cluster must contain at least one object.
  • Each object must belong to exactly one cluster.

  4. Partitioning Clustering
  What is a "good" partitioning clustering? Key idea: objects in the same group are similar, and objects in different groups are dissimilar. For example, ten objects can be partitioned into three groups:

  $P = \{\underbrace{\{x_1, x_4, x_7, x_9, x_{10}\}}_{P_1},\ \underbrace{\{x_2, x_8\}}_{P_2},\ \underbrace{\{x_3, x_5, x_6\}}_{P_3}\}$

  The goal is to minimize the within-group distance and maximize the between-group distance. Notice: there are many ways to define the within-group distance (e.g., the average distance to the group's center, or the average distance between all pairs of objects) and the between-group distance. Because the number of possible partitions grows explosively with the number of objects, finding the globally optimal clustering is in general computationally infeasible.
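  The within-group criterion on this slide is exactly what k-means, a standard partitioning method, greedily optimizes. A minimal sketch (not from the lecture; it assumes NumPy and squared Euclidean distance as the within-group measure):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Alternate between assigning each object to its nearest center
    and recomputing each center as the mean of its group."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # within-group criterion: squared Euclidean distance to the center
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        # keep the old center if a group happens to become empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments no longer change: a local optimum
        centers = new_centers
    return labels, centers
```

  Each iteration can only decrease the within-group distance, so the algorithm converges, but only to a local optimum, which is why it is usually restarted from several random initializations.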

  5. Hierarchical Clustering
  Partition Q is nested into partition P if every component of Q is a subset of a component of P. A hierarchical clustering is a sequence of partitions in which each partition is nested into the next partition in the sequence. (This definition is for bottom-up hierarchical clustering; in the case of top-down hierarchical clustering, "next" becomes "previous".)

  6. Bottom-up Hierarchical Clustering
  [Figure: dendrogram showing objects x1–x6 merged step by step into a single cluster]

  7. Top-Down Hierarchical Clustering
  [Figure: dendrogram showing one cluster of objects x1–x6 split step by step into singletons]
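  A minimal sketch of the bottom-up procedure from these two slides, assuming NumPy and single linkage (the distance between two clusters is the distance between their closest pair of members); neither the linkage choice nor the code is from the lecture. The recorded merges are the dendrogram:

```python
import numpy as np

def single_linkage(X):
    """Bottom-up clustering: start from singleton clusters and
    repeatedly merge the two closest clusters, recording each merge."""
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: closest pair of members across the clusters
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (a, b)] + [clusters[a] + clusters[b]]
    return merges  # the sequence of nested partitions
```

  Cutting this merge sequence at any distance threshold yields one partition of the nested sequence defined on slide 5.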

  8. OSHAM: Hybrid Model
  [Figure: OSHAM screenshot on the Wisconsin Breast Cancer data, showing the discovered concept hierarchy with multiple-inheritance concepts, brief descriptions of the concepts, and their attributes]

  9. Brief introduction to lectures
  Lecture 1: Overview of KDD
  Lecture 2: Preparing data
  Lecture 3: Decision tree induction
  Lecture 4: Mining association rules
  Lecture 5: Automatic cluster detection
  Lecture 6: Artificial neural networks
  Lecture 7: Evaluation of discovered knowledge

  10. Lecture 6: Neural networks
  • One of the most widely used KDD classification techniques.
  • Content of the lecture:
  1. Neural network representation
  2. Feed-forward neural networks
  3. Using the back-propagation algorithm
  4. Case studies
  • Prerequisite: nothing special
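  A minimal sketch of items 2 and 3 in the list above: a one-hidden-layer feed-forward network trained with back-propagation. This is an illustrative NumPy version, not the lecture's code; the sigmoid units, squared-error loss, and XOR data are assumptions made here:

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer feed-forward network trained with
    back-propagation (sigmoid units, squared-error loss)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    W2 = rng.normal(0, 0.5, (hidden, 1))
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        # forward pass through both layers
        h = sig(X @ W1)
        out = sig(h @ W2)
        # backward pass: propagate the error signal layer by layer
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # gradient-descent weight updates
        W2 -= lr * h.T @ d_out
        W1 -= lr * X.T @ d_h
    return W1, W2

# XOR, the classic problem a single-layer network cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)
W1, W2 = train_mlp(X, y)
```

  The hidden layer is what lets the network represent XOR; with these (assumed) hyperparameters training usually converges, though back-propagation offers no guarantee of reaching a global minimum.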

  11. Brief introduction to lectures
  Lecture 1: Overview of KDD
  Lecture 2: Preparing data
  Lecture 3: Decision tree induction
  Lecture 4: Mining association rules
  Lecture 5: Automatic cluster detection
  Lecture 6: Artificial neural networks
  Lecture 7: Evaluation of discovered knowledge

  12. Lecture 7: Evaluation of discovered knowledge
  • Covers the most widely used techniques for evaluating knowledge discovered by KDD methods.
  • Content of the lecture:
  1. Cross validation
  2. Bootstrapping
  3. Case studies
  • Prerequisite: nothing special
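  Bootstrapping appears in the content list but gets no detail slide below; a minimal sketch of one common variant, out-of-bag bootstrap error estimation, assuming NumPy and hypothetical fit(X, y) -> model and error(model, X, y) callables that stand in for any induction method:

```python
import numpy as np

def bootstrap_error(X, y, fit, error, n_boot=200, seed=0):
    """Bootstrap error estimation: train on a resample drawn with
    replacement, test on the objects left out of that resample."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))    # resample with replacement
        oob = np.setdiff1d(np.arange(len(X)), idx)    # "out-of-bag" objects
        if len(oob) == 0:
            continue  # rare: every object was drawn at least once
        model = fit(X[idx], y[idx])                   # hypothetical callable
        errs.append(error(model, X[oob], y[oob]))     # hypothetical callable
    return float(np.mean(errs))
```

  On average about 36.8% of the objects are left out of each resample, so every run has an independent test set of useful size.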

  13. Out-of-sample testing
  [Diagram: a sampling method draws sample data from the historical data (warehouse); a second sampling step splits it into 2/3 training data and 1/3 testing data. The induction method builds a model from the training data, and the model's error is estimated on the testing data.]
  The quality of the test-sample estimate depends on the number of test cases and on the validity of the independence assumption.
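  A minimal sketch of the 2/3 train / 1/3 test split in the diagram, assuming NumPy; drawing the split from a random permutation is what keeps the test cases (approximately) independent of training:

```python
import numpy as np

def holdout_split(X, y, test_frac=1/3, seed=0):
    """Randomly hold out a fraction of the sample for testing,
    as in the 2/3 train / 1/3 test scheme on the slide."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random order removes bias
    cut = int(len(X) * (1 - test_frac))
    train, test = idx[:cut], idx[cut:]
    return X[train], y[train], X[test], y[test]
```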

  14. Cross Validation
  [Diagram: the sampling method draws sample data from the historical data (warehouse) and splits it into mutually exclusive samples of equal size: Sample 1, Sample 2, …, Sample n. The procedure iterates over the samples; in each run the induction method builds a model and the run's error is estimated on the held-out sample, and the runs' errors are combined into the overall error estimation.]
  10-fold cross validation appears adequate (n = 10).

  15. Evaluation: k-fold cross validation (k = 3)
  Given a data set and a method to be evaluated:
  1. Randomly split the data set into 3 subsets of equal size.
  2. For each subset in turn, run the method on the other 2 subsets as training data to find knowledge, and test on the remaining subset as testing data to evaluate the accuracy.
  3. Average all the accuracies as the final evaluation.
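  A minimal sketch of this procedure, assuming NumPy and the same hypothetical fit/error callables as in the bootstrap sketch; with k = 10 it is the 10-fold scheme from slide 14:

```python
import numpy as np

def kfold_error(X, y, fit, error, k=3, seed=0):
    """k-fold cross validation: split into k mutually exclusive folds
    of (near-)equal size; train on k-1 folds, test on the held-out one."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])            # hypothetical callable
        errs.append(error(model, X[test], y[test]))  # hypothetical callable
    return float(np.mean(errs))  # average over the k runs
```

  Every object is used for testing exactly once and for training k-1 times, which is what makes the averaged estimate less variable than a single holdout split.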

  16. Outline of the presentation
  • Objectives, prerequisites and content
  • Brief introduction to lectures
  • Discussion and conclusion
  This presentation summarizes the content and organization of the lectures in the module "Knowledge Discovery and Data Mining".
