Brief introduction to lectures: Lecture 5: Automatic cluster detection; Lecture 6: Artificial neural networks; Lecture 7: Evaluation of discovered knowledge. Transparencies prepared by Ho Tu Bao [JAIST].
Lecture 5: Automatic Cluster Detection
• One of the most widely used KDD techniques for classifying unlabeled (unsupervised) data into groups.
• Content of the lecture
1. Introduction
2. Partitioning Clustering
3. Hierarchical Clustering
4. Software and case-studies
• Prerequisite: Nothing special
Partitioning Clustering
• Each cluster must contain at least one object.
• Each object must belong to exactly one group.
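The two conditions of a partitioning clustering can be checked mechanically. The following is a minimal Python sketch, assuming groups are given as Python sets and objects by hypothetical names such as "x1"; `is_partition` is an illustrative helper, not part of any library.

```python
def is_partition(groups, objects):
    """Return True if `groups` is a partitioning clustering of `objects`."""
    # Condition 1: each cluster contains at least one object.
    if any(len(g) == 0 for g in groups):
        return False
    # Condition 2: each object belongs to exactly one group.
    counts = {obj: 0 for obj in objects}
    for g in groups:
        for obj in g:
            if obj not in counts:
                return False  # object not in the data set at all
            counts[obj] += 1
    return all(c == 1 for c in counts.values())

objects = ["x1", "x2", "x3", "x4"]
print(is_partition([{"x1", "x4"}, {"x2", "x3"}], objects))
print(is_partition([{"x1", "x4"}, {"x2"}, set()], objects))  # empty group, x3 unassigned
```

The first call satisfies both conditions; the second fails because one group is empty (and x3 belongs to no group).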
Partitioning Clustering
What is a “good” partitioning clustering? Key idea: objects in the same group are similar, and objects in different groups are dissimilar. For example, ten objects x1, ..., x10 might be partitioned into three groups:
$$P = \{\underbrace{\{x_1, x_4, x_7, x_9, x_{10}\}}_{P_1}, \underbrace{\{x_2, x_8\}}_{P_2}, \underbrace{\{x_3, x_5, x_6\}}_{P_3}\}$$
Goal: minimize the within-group distance and maximize the between-group distance.
Note: there are many ways to define the “within-group distance” (e.g., the average distance to the group’s center, or the average distance between all pairs of objects in the group) and the “between-group distance”. In general it is impractical to find the optimal clustering, because the number of possible partitions grows very rapidly with the number of objects.
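To make the within-group and between-group distances concrete, here is a small Python sketch for a partition of ten objects into three groups P1, P2, P3. It assumes each object is a single hypothetical number (the values below are invented for illustration), uses both within-group definitions mentioned on the slide, and takes the distance between group centers as one possible between-group distance.

```python
from itertools import combinations
from statistics import mean

# Hypothetical 1-D values for x1..x10, chosen only for illustration.
values = {"x1": 1.0, "x2": 5.0, "x3": 9.0, "x4": 1.2, "x5": 9.3,
          "x6": 8.7, "x7": 0.8, "x8": 5.4, "x9": 1.1, "x10": 0.9}

# Three groups P1, P2, P3 of a partitioning clustering.
P = [["x1", "x4", "x7", "x9", "x10"], ["x2", "x8"], ["x3", "x5", "x6"]]

def center(group):
    """Center of a group: the mean of its members' values."""
    return mean(values[x] for x in group)

def within_to_center(group):
    """Within-group distance: average distance from each member to the center."""
    c = center(group)
    return mean(abs(values[x] - c) for x in group)

def within_pairwise(group):
    """Within-group distance: average distance over all pairs of members."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0  # a singleton group has no pairs
    return mean(abs(values[a] - values[b]) for a, b in pairs)

def between_centers(g, h):
    """One possible between-group distance: distance between the two centers."""
    return abs(center(g) - center(h))

for i, g in enumerate(P, start=1):
    print(f"P{i}: to-center {within_to_center(g):.2f}, pairwise {within_pairwise(g):.2f}")
for (i, g), (j, h) in combinations(enumerate(P, start=1), 2):
    print(f"P{i}-P{j}: {between_centers(g, h):.2f}")
```

For this data every within-group distance is far smaller than every between-group distance, which is what a “good” partitioning looks like under these definitions.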
Hierarchical Clustering
Partition Q is nested into partition P if every component of Q is a subset of a component of P. A hierarchical clustering is a sequence of partitions in which each partition is nested into the next partition in the sequence. (This definition is for bottom-up hierarchical clustering. In the case of top-down hierarchical clustering, “next” becomes “previous”.)
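The nesting definition translates directly into code. A minimal Python sketch, assuming partitions are lists of sets of hypothetical object names; `is_nested` and the two hierarchy checks are illustrative helpers.

```python
def is_nested(Q, P):
    """Q is nested into P if every component of Q is a subset of some component of P."""
    return all(any(set(q) <= set(p) for p in P) for q in Q)

def is_bottom_up_hierarchy(partitions):
    """Each partition must be nested into the next one in the sequence."""
    return all(is_nested(partitions[i], partitions[i + 1])
               for i in range(len(partitions) - 1))

def is_top_down_hierarchy(partitions):
    """Each partition must be nested into the previous one in the sequence."""
    return is_bottom_up_hierarchy(partitions[::-1])

seq = [
    [{"x1"}, {"x2"}, {"x3"}, {"x4"}],
    [{"x1", "x2"}, {"x3"}, {"x4"}],
    [{"x1", "x2"}, {"x3", "x4"}],
    [{"x1", "x2", "x3", "x4"}],
]
print(is_bottom_up_hierarchy(seq))       # singletons -> one group
print(is_top_down_hierarchy(seq[::-1]))  # the same hierarchy read top-down
```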
Bottom-up Hierarchical Clustering
[Figure: objects x1 to x6 merged bottom-up into a hierarchy]
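Bottom-up clustering starts with every object in its own group and repeatedly merges the two closest groups, producing a sequence in which each partition is nested into the next. The Python sketch below assumes hypothetical 1-D positions for x1 to x6 and uses single linkage (the distance between the closest members of two groups) as the group distance; single linkage is one common choice, not necessarily the one used in the lecture.

```python
# Hypothetical 1-D positions for x1..x6 (illustration only).
points = {"x1": 0.0, "x2": 0.5, "x3": 3.0, "x4": 3.4, "x5": 7.0, "x6": 7.2}

def single_link(a, b):
    """Single linkage: cluster distance = smallest distance between members."""
    return min(abs(points[p] - points[q]) for p in a for q in b)

def agglomerate(objects):
    """Start from singletons and repeatedly merge the two closest clusters,
    recording the partition produced at every step."""
    clusters = [frozenset([o]) for o in objects]
    history = [list(clusters)]
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append(list(clusters))
    return history

for level, partition in enumerate(agglomerate(sorted(points))):
    print(level, sorted(sorted(c) for c in partition))
```

With these positions the merges happen in the order {x5, x6}, {x3, x4}, {x1, x2}, then {x1, ..., x4}, and finally a single group.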
Top-Down Hierarchical Clustering
[Figure: objects x1 to x6 split top-down into a hierarchy]
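Top-down clustering runs the other way: it starts with one group containing all objects and repeatedly splits a group, so each partition is nested into the previous one. The Python sketch below is a deliberately simple, hypothetical splitting rule for 1-D data (split the group with the widest spread at its largest gap between neighbouring values). It is a toy stand-in chosen for clarity, not a standard divisive algorithm such as DIANA, and the positions are invented.

```python
# Hypothetical 1-D positions for x1..x6 (illustration only).
points = {"x1": 0.0, "x2": 0.5, "x3": 3.0, "x4": 3.4, "x5": 7.0, "x6": 7.2}

def spread(cluster):
    """Width of a cluster: largest value minus smallest value."""
    return max(points[o] for o in cluster) - min(points[o] for o in cluster)

def split_at_largest_gap(cluster):
    """Split a 1-D cluster in two at the widest gap between neighbours."""
    ordered = sorted(cluster, key=points.get)
    gaps = [points[ordered[k + 1]] - points[ordered[k]] for k in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return frozenset(ordered[:cut]), frozenset(ordered[cut:])

def divide(objects):
    """Start with one cluster holding everything and repeatedly split the
    widest cluster until every object is on its own."""
    clusters = [frozenset(objects)]
    history = [list(clusters)]
    while any(len(c) > 1 for c in clusters):
        widest = max((c for c in clusters if len(c) > 1), key=spread)
        clusters.remove(widest)
        clusters.extend(split_at_largest_gap(widest))
        history.append(list(clusters))
    return history

for level, partition in enumerate(divide(sorted(points))):
    print(level, sorted(sorted(c) for c in partition))
```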
OSHAM: Hybrid Model
[Figure: OSHAM applied to the Wisconsin Breast Cancer data, with panels for the concept hierarchy, multiple-inheritance concepts, a brief description of concepts, discovered concepts, and attributes]
Brief introduction to lectures
Lecture 1: Overview of KDD
Lecture 2: Preparing data
Lecture 3: Decision tree induction
Lecture 4: Mining association rules
Lecture 5: Automatic cluster detection
Lecture 6: Artificial neural networks
Lecture 7: Evaluation of discovered knowledge
Lecture 6: Neural networks
• One of the most widely used KDD classification techniques.
• Content of the lecture
1. Neural network representation
2. Feed-forward neural networks
3. Using back-propagation algorithm
4. Case-studies
• Prerequisite: Nothing special
Lecture 7: Evaluation of discovered knowledge
• Techniques for estimating how accurately discovered knowledge (e.g., a classifier) will perform on unseen data.
• Content of the lecture
1. Cross validation
2. Bootstrapping
3. Case-studies
• Prerequisite: Nothing special
Out-of-sample testing
[Figure: sample data drawn from historical data (warehouse) are split by a sampling method into training data (2/3), from which the induction method builds a model, and testing data (1/3), on which the model's error is estimated]
The quality of the test-sample estimate depends on the number of test cases and on the validity of the independence assumption (that the test cases are independent of the training cases).
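A minimal Python sketch of out-of-sample (holdout) testing with the 2/3 to 1/3 split from the slide. The data set is hypothetical, and the "induction method" is a stand-in that always predicts the most frequent training class; any learner taking a training list and returning a prediction function could be plugged in instead.

```python
import random
from collections import Counter

def holdout_split(data, train_fraction=2 / 3, seed=0):
    """Randomly split data into training (about 2/3) and testing (about 1/3) sets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def majority_class_learner(train):
    """Stand-in induction method: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def holdout_error(data, learner, train_fraction=2 / 3, seed=0):
    """Build a model on the training part and return its error rate on the test part."""
    train, test = holdout_split(data, train_fraction, seed)
    model = learner(train)
    errors = sum(model(x) != y for x, y in test)
    return errors / len(test)

# Hypothetical labelled sample: (feature, class) pairs.
sample = [(i, "yes" if i % 3 else "no") for i in range(30)]
print(holdout_error(sample, majority_class_learner))
```

Because the estimate comes from only one third of the data, it can vary noticeably with the random split, which is the motivation for cross validation.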
Cross Validation
[Figure: sample data drawn from historical data (warehouse) are split by a sampling method into n samples (Sample 1, Sample 2, ..., Sample n); in each iteration the induction method builds a model, the run's error is recorded, and the run errors are combined into the error estimate]
• The samples are mutually exclusive and of equal size.
• 10-fold cross validation (n = 10) appears adequate.
Evaluation: k-fold cross validation (k = 3)
1. Randomly split the data set into 3 subsets of equal size.
2. Run the method to be evaluated on each combination of 2 subsets as training data to find knowledge.
3. Test on the remaining subset as testing data to evaluate the accuracy.
4. Average all the accuracies as the final evaluation.
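The k-fold procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the data are hypothetical (feature, class) pairs, the method being evaluated is a stand-in 1-nearest-neighbour learner, and folds are formed by shuffling indices and dealing them out round-robin so they are mutually exclusive with sizes differing by at most one.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly split indices 0..n-1 into k mutually exclusive folds whose
    sizes differ by at most one (equal when k divides n). Assumes n >= k."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def nearest_neighbour_learner(train):
    """Stand-in induction method: 1-nearest-neighbour on a numeric feature."""
    def predict(x):
        return min(train, key=lambda ex: abs(ex[0] - x))[1]
    return predict

def cross_validate(data, learner, k=10, seed=0):
    """Train on k-1 folds, test on the remaining fold, repeat for every fold,
    and return the average accuracy."""
    accuracies = []
    for test_idx in k_fold_indices(len(data), k, seed):
        held_out = set(test_idx)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        test = [data[i] for i in test_idx]
        model = learner(train)
        correct = sum(model(x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Hypothetical labelled data: (feature, class) pairs.
data = [(x, "low" if x < 15 else "high") for x in range(30)]
print(cross_validate(data, nearest_neighbour_learner, k=3))
```

Setting `k=10` gives the 10-fold variant that the previous slide suggests is adequate; every example is used for testing exactly once and for training k - 1 times.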
Outline of the presentation
• Objectives, Prerequisite and Content
• Brief Introduction to Lectures
• Discussion and Conclusion
This presentation summarizes the content and organization of the lectures in the module “Knowledge Discovery and Data Mining”.