Lecture 5: Automatic cluster detection Lecture 6: Artificial neural networks

Brief introduction to lectures
Lecture 5: Automatic cluster detection
Lecture 6: Artificial neural networks
Lecture 7: Evaluation of discovered knowledge
Transparencies prepared by Ho Tu Bao [JAIST]

Lecture 5: Automatic Cluster Detection
• One of the most widely used KDD techniques for unsupervised (unlabeled) data.
• Content of the lecture
1. Introduction
2. Partitioning Clustering
3. Hierarchical Clustering
4. Software and case-studies
• Prerequisite: Nothing special

Partitioning Clustering
• Each cluster must contain at least one object.
• Each object must belong to exactly one group.
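The two constraints can be checked mechanically. The following is a minimal sketch, assuming a clustering is represented as a list of Python sets over a known set of objects; the function name `is_valid_partition` is hypothetical.

```python
from typing import Hashable, Sequence, Set


def is_valid_partition(objects: Set[Hashable], clusters: Sequence[Set[Hashable]]) -> bool:
    # Constraint 1: each cluster must contain at least one object.
    if any(len(c) == 0 for c in clusters):
        return False
    # Constraint 2: each object must belong to exactly one group.
    seen: Set[Hashable] = set()
    for c in clusters:
        if seen & c:  # some object already appeared in an earlier cluster
            return False
        seen |= c
    # Every object is covered, and no cluster holds anything outside the data.
    return seen == objects
```

For example, `[{1, 2}, {3, 4}]` is a valid partition of `{1, 2, 3, 4}`, while `[{1, 2}, {2, 3, 4}]` is not, because object 2 belongs to two groups.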

Partitioning Clustering
What is a “good” partitioning clustering?
Key ideas: objects in the same group are similar, and objects in different groups are dissimilar. That is, minimize the within-group distance and maximize the between-group distance.
Notice: there are many ways to define the “within-group distance” (for example, the average distance to the group’s center, or the average distance between all pairs of objects in the group) and the “between-group distance”. In general it is computationally infeasible to find the optimal clustering: the number of possible partitions grows extremely fast with the number of objects, so exhaustive search is out of reach.
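The two within-group definitions mentioned above, plus one possible between-group definition, can be sketched as follows. This assumes objects are numeric points compared with Euclidean distance, and it takes the distance between the group centers as the between-group distance; the slides leave that choice open, and all function names here are illustrative.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def centroid(group: Sequence[Point]) -> Point:
    # Component-wise mean of the group's points (the group's "center").
    return tuple(sum(coords) / len(group) for coords in zip(*group))


def within_to_center(group: Sequence[Point]) -> float:
    # Within-group distance, definition 1: average distance to the group's center.
    c = centroid(group)
    return sum(math.dist(p, c) for p in group) / len(group)


def within_all_pairs(group: Sequence[Point]) -> float:
    # Within-group distance, definition 2: average distance over all pairs of objects.
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0  # a single-object group has no pairs
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def between_centers(a: Sequence[Point], b: Sequence[Point]) -> float:
    # Between-group distance (assumed definition): distance between the two centers.
    return math.dist(centroid(a), centroid(b))


# Two candidate partitions of the same four points on a line (hypothetical data).
good = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(2.0, 0.0), (12.0, 0.0)]]
```

Under both within-group definitions, `good` has smaller within-group distances and a larger between-group distance than `bad` (1.0 versus 5.0 to the center, 10.0 versus 2.0 between centers), which is what the key idea asks for.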

Hierarchical Clustering
Partition Q is nested into partition P if every component of Q is a subset of a component of P.
A hierarchical clustering is a sequence of partitions in which each partition is nested into the next partition in the sequence. (This definition is for bottom-up hierarchical clustering. For top-down hierarchical clustering, “next” becomes “previous”.)
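The nesting definition translates almost directly into code. A sketch, assuming partitions are lists of frozensets; the names `is_nested`, `is_hierarchy` and `parts`, and the example sequence over x1 to x6, are hypothetical.

```python
from typing import FrozenSet, List

Partition = List[FrozenSet[str]]


def is_nested(q: Partition, p: Partition) -> bool:
    # Q is nested into P if every component of Q is a subset of some component of P.
    return all(any(qc <= pc for pc in p) for qc in q)


def is_hierarchy(partitions: List[Partition], bottom_up: bool = True) -> bool:
    # Bottom-up: each partition is nested into the next one.
    # Top-down: "next" becomes "previous", so check the reversed sequence.
    seq = partitions if bottom_up else partitions[::-1]
    return all(is_nested(seq[i], seq[i + 1]) for i in range(len(seq) - 1))


def parts(*groups: str) -> Partition:
    # Hypothetical helper: each string lists the members of one component.
    return [frozenset(g.split()) for g in groups]


example = [
    parts("x1", "x2", "x3", "x4", "x5", "x6"),
    parts("x1 x2", "x3", "x4", "x5 x6"),
    parts("x1 x2 x3", "x4 x5 x6"),
    parts("x1 x2 x3 x4 x5 x6"),
]
```

Here `is_hierarchy(example)` holds, and the reversed sequence is a valid top-down hierarchy.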

Bottom-up Hierarchical Clustering
[Figure: bottom-up clustering of objects x1, x2, x3, x4, x5, x6]
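One common way to build a bottom-up hierarchy is to start from singletons and repeatedly merge the two closest clusters. The sketch below assumes single linkage (the distance between the closest members of two clusters) and hypothetical 2-D coordinates for the objects. The slides do not fix a linkage rule, so this is only one illustrative choice. Each merge yields the next partition in the sequence.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Positions = Dict[str, Tuple[float, float]]


def single_link(a: FrozenSet[str], b: FrozenSet[str], pos: Positions) -> float:
    # Single linkage: distance between the closest pair of members.
    return min(math.dist(pos[x], pos[y]) for x in a for y in b)


def agglomerate(pos: Positions) -> List[List[FrozenSet[str]]]:
    # Start with every object in its own cluster (the finest partition).
    partition = [frozenset([name]) for name in pos]
    history = [list(partition)]
    while len(partition) > 1:
        # Pick the pair of clusters with the smallest single-link distance.
        i, j = min(
            combinations(range(len(partition)), 2),
            key=lambda ij: single_link(partition[ij[0]], partition[ij[1]], pos),
        )
        merged = partition[i] | partition[j]
        partition = [c for k, c in enumerate(partition) if k not in (i, j)] + [merged]
        history.append(list(partition))  # each new partition coarsens the previous one
    return history


# Hypothetical coordinates for x1..x6 (not taken from the figure).
pos = {"x1": (0.0, 0.0), "x2": (1.0, 0.0), "x3": (5.0, 0.0),
       "x4": (6.0, 0.0), "x5": (20.0, 0.0), "x6": (21.5, 0.0)}
history = agglomerate(pos)
```

With six objects there are five merges, so `history` holds six partitions, each nested into the next.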

Top-Down Hierarchical Clustering
[Figure: top-down clustering of objects x1, x2, x3, x4, x5, x6]
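Top-down clustering runs the other way: start with one cluster containing all objects and split until every object stands alone. The splitting rule below is a hypothetical choice for illustration. It splits the widest cluster and seeds the two halves with that cluster's two farthest-apart members. It also assumes all objects have distinct coordinates; otherwise a split could come out empty.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Positions = Dict[str, Tuple[float, float]]


def diameter(cluster: FrozenSet[str], pos: Positions) -> float:
    # Largest distance between two members; 0 for a singleton.
    if len(cluster) < 2:
        return 0.0
    return max(math.dist(pos[x], pos[y]) for x, y in combinations(cluster, 2))


def split(cluster: FrozenSet[str], pos: Positions) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    # Hypothetical splitting rule: seed with the two farthest-apart members,
    # then assign every member to the nearer seed.
    a, b = max(combinations(sorted(cluster), 2),
               key=lambda ab: math.dist(pos[ab[0]], pos[ab[1]]))
    left = frozenset(x for x in cluster
                     if math.dist(pos[x], pos[a]) <= math.dist(pos[x], pos[b]))
    return left, cluster - left


def divide(pos: Positions) -> List[List[FrozenSet[str]]]:
    # Start with one cluster holding every object (the coarsest partition).
    partition = [frozenset(pos)]
    history = [list(partition)]
    while any(len(c) > 1 for c in partition):
        target = max(partition, key=lambda c: diameter(c, pos))  # widest cluster
        left, right = split(target, pos)
        partition = [c for c in partition if c != target] + [left, right]
        history.append(list(partition))  # each new partition refines the previous one
    return history


# Hypothetical coordinates for x1..x6 (not taken from the figure).
pos = {"x1": (0.0, 0.0), "x2": (1.0, 0.0), "x3": (5.0, 0.0),
       "x4": (6.0, 0.0), "x5": (20.0, 0.0), "x6": (21.5, 0.0)}
history = divide(pos)
```

Read from the end backwards, `history` satisfies the bottom-up definition; read forwards, each partition is nested into the previous one.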

OSHAM: Hybrid Model
[Figure panels: Multiple Inheritance Concepts; Brief Description of Concepts; Wisconsin Breast Cancer Data; Concept Hierarchy; Discovered Concepts; Attributes]

Brief introduction to lectures
Lecture 1: Overview of KDD
Lecture 2: Preparing data
Lecture 3: Decision tree induction
Lecture 4: Mining association rules
Lecture 5: Automatic cluster detection
Lecture 6: Artificial neural networks
Lecture 7: Evaluation of discovered knowledge

Lecture 6: Neural networks
• One of the most widely used KDD classification techniques.
• Content of the lecture
1. Neural network representation
2. Feed-forward neural networks
3. Using the back-propagation algorithm
4. Case-studies
• Prerequisite: Nothing special
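The forward pass and back-propagation step for a small feed-forward network can be sketched in pure Python. This sketch assumes one hidden layer of sigmoid units, a single sigmoid output, and squared error E = ½(y − t)² on one training example at a time. The class `TinyNet`, its initialization and its learning rate are illustrative, not taken from the lecture.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Feed-forward net: n_in inputs -> n_hidden sigmoid units -> 1 sigmoid output."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Propagate the input forward, layer by layer.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)
        return h, y

    def loss(self, x, t) -> float:
        _, y = self.forward(x)
        return 0.5 * (y - t) ** 2

    def gradients(self, x, t):
        h, y = self.forward(x)
        d_out = (y - t) * y * (1 - y)  # dE/d(net input of the output unit)
        gW2 = [d_out * hj for hj in h]
        # Back-propagate the output error to each hidden unit.
        d_hid = [d_out * w * hj * (1 - hj) for w, hj in zip(self.W2, h)]
        gW1 = [[d * xi for xi in x] for d in d_hid]
        return gW1, d_hid, gW2, d_out  # gradients for W1, b1, W2, b2

    def step(self, x, t, lr: float = 0.5) -> None:
        # One gradient-descent update; all gradients use the pre-update weights.
        gW1, gb1, gW2, gb2 = self.gradients(x, t)
        for j, row in enumerate(self.W1):
            for i in range(len(row)):
                row[i] -= lr * gW1[j][i]
            self.b1[j] -= lr * gb1[j]
        for j in range(len(self.W2)):
            self.W2[j] -= lr * gW2[j]
        self.b2 -= lr * gb2


net = TinyNet(n_in=2, n_hidden=3, seed=1)
x, t = [0.5, -1.0], 1.0
```

A numerical gradient check, which compares `gradients` against finite differences of `loss`, is a cheap way to confirm that the back-propagation formulas are right before training on real data.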


Lecture 7: Evaluation of discovered knowledge
• Methods for estimating how accurate discovered knowledge (such as a classifier) is on unseen data.
• Content of the lecture
1. Cross validation
2. Bootstrapping
3. Case-studies
• Prerequisite: Nothing special

Out-of-sample testing
[Figure: a sampling method draws sample data from the historical data (warehouse) and splits it into training data (2/3) and testing data (1/3); the induction method builds a model from the training data, and the model's error is estimated on the testing data.]
The quality of the test-sample estimate depends on the number of test cases and on the validity of the independence assumption.
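A holdout (out-of-sample) estimate with the 2/3 to 1/3 split can be sketched as follows. The slides do not name an induction method, so a 1-nearest-neighbour classifier stands in for it here. Rows are assumed to be (features, label) pairs, and the function names and data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Tuple[float, ...], str]  # (features, class label)


def holdout_split(rows: Sequence[Row], train_frac: float = 2 / 3, seed: int = 0):
    # Sampling method: shuffle, then cut into training (2/3) and testing (1/3) data.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def induce_1nn(train: List[Row]) -> Callable[[Tuple[float, ...]], str]:
    # Placeholder induction method (assumption): 1-nearest-neighbour classifier.
    def model(x: Tuple[float, ...]) -> str:
        return min(train, key=lambda row: math.dist(row[0], x))[1]
    return model


def error_rate(model: Callable[[Tuple[float, ...]], str], test: List[Row]) -> float:
    # Error estimation on testing data the model never saw during induction.
    return sum(1 for x, label in test if model(x) != label) / len(test)


# Hypothetical data: two well-separated classes on a line.
data = [((float(i), 0.0), "a") for i in range(15)] + \
       [((100.0 + i, 0.0), "b") for i in range(15)]
train, test = holdout_split(data)
```

With only 10 test cases, a single holdout estimate is coarse (it moves in steps of 0.1), which illustrates the slide's point that the quality of the estimate depends on the number of test cases.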

Cross Validation
[Figure: a sampling method draws sample data from the historical data (warehouse) and splits it into Sample 1, Sample 2, ..., Sample n; the induction method is iterated over the samples, each run's error is estimated, and the run errors are combined into the overall error estimate.]
• The samples are mutually exclusive and of equal size.
• 10-fold cross validation (n = 10) appears adequate.

Evaluation: k-fold cross validation (k = 3)
Given a data set and a method to be evaluated:
1. Randomly split the data set into 3 subsets of equal size.
2. Run the method on each combination of 2 subsets, used as training data, to find knowledge.
3. Test on the remaining subset, used as testing data, to evaluate the accuracy.
4. Average the accuracies as the final evaluation.
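The k-fold procedure can be sketched as below. Assumptions: rows are (features, label) pairs, `induce` is any function that maps training rows to a model, and a 1-nearest-neighbour classifier is used only as a placeholder induction method. All names and data are hypothetical. Dealing shuffled indices round-robin keeps the folds mutually exclusive, with sizes that differ by at most one when n is not divisible by k.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Tuple[Tuple[float, ...], str]  # (features, class label)


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    # Randomly split indices 0..n-1 into k mutually exclusive folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def induce_1nn(train: List[Row]):
    # Placeholder induction method (assumption): 1-nearest-neighbour classifier.
    return lambda x: min(train, key=lambda row: math.dist(row[0], x))[1]


def cross_validate(rows: Sequence[Row], induce, k: int = 3, seed: int = 0):
    folds = k_fold_indices(len(rows), k, seed)
    accuracies = []
    for f, test_idx in enumerate(folds):
        # Train on the other k-1 folds, test on the held-out fold.
        train = [rows[i] for g, fold in enumerate(folds) if g != f for i in fold]
        test = [rows[i] for i in test_idx]
        model = induce(train)
        accuracies.append(sum(1 for x, y in test if model(x) == y) / len(test))
    # Final evaluation: the average of the k accuracies.
    return sum(accuracies) / k, accuracies


# Hypothetical data: two well-separated classes on a line.
data = [((float(i), 0.0), "a") for i in range(15)] + \
       [((100.0 + i, 0.0), "b") for i in range(15)]
mean_acc, per_fold = cross_validate(data, induce_1nn, k=3)
```

Every row is used for testing exactly once and for training k − 1 times, so the averaged accuracy uses all the data while never testing a model on rows it was trained on.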

Outline of the presentation
• Objectives, Prerequisite and Content
• Brief Introduction to Lectures
• Discussion and Conclusion
This presentation summarizes the content and organization of the lectures in the module “Knowledge Discovery and Data Mining”.
