MIS 451 Building Business Intelligence Systems • Clustering (1)
Problem • Target marketing • Diapers, baby food, and toys • Swiss cheese, Belgian chocolate, and French wine
Clustering • Clustering is a data mining method that groups data points so that points within the same cluster are similar and points in different clusters are dissimilar. • How do we calculate the similarity between data points?
Measuring Similarity • Continuous variables • Use distance to measure dissimilarity between data points: the larger the distance, the less similar the points • The distance between two data points can be measured in two ways • Manhattan distance • Euclidean distance
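Measuring Similarity • For two continuous data points $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_n)$, Manhattan distance is defined as:
$$d_{\text{Manhattan}}(X, Y) = \sum_{i=1}^{n} |x_i - y_i|$$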
Measuring Similarity • Example of Manhattan distance

NAME   AGE   SPENDING ($)
Sue    21    2300
Carl   27    2600
Tom    45    5400
Jack   52    6000
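The worked numbers for this example did not survive in the transcript, so the following is only a minimal Python sketch of how the Manhattan distance could be computed for the four customers in the table. The dictionary `customers` and the function name `manhattan` are illustrative choices, not part of the original slides.

```python
# Customer table from the slide: name -> (age, spending in $).
# The variable and function names here are hypothetical, chosen for illustration.
customers = {
    "Sue": (21, 2300),
    "Carl": (27, 2600),
    "Tom": (45, 5400),
    "Jack": (52, 6000),
}


def manhattan(x, y):
    """Manhattan distance: sum of absolute differences over all variables."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Sue vs. Carl: |21 - 27| + |2300 - 2600| = 6 + 300
print(manhattan(customers["Sue"], customers["Carl"]))  # 306
# Tom vs. Jack: |45 - 52| + |5400 - 6000| = 7 + 600
print(manhattan(customers["Tom"], customers["Jack"]))  # 607
# Sue vs. Tom: |21 - 45| + |2300 - 5400| = 24 + 3100
print(manhattan(customers["Sue"], customers["Tom"]))   # 3124
```

Under this measure Sue is far closer to Carl (306) than to Tom (3124), which matches the intuition that Sue and Carl belong to one group and Tom and Jack to another.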
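Measuring Similarity • For two continuous data points $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_n)$, Euclidean distance is defined as:
$$d_{\text{Euclidean}}(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$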
Measuring Similarity • Example of Euclidean distance

NAME   AGE   SPENDING ($)
Sue    21    2300
Carl   27    2600
Tom    45    5400
Jack   52    6000

Reading: Data mining book, pp. 335-341
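As a rough illustration of the Euclidean example, the sketch below computes the distance for pairs of customers from the table using only the Python standard library. The names `customers` and `euclidean` are assumptions made for this sketch and do not come from the slides.

```python
import math

# Customer table from the slide: name -> (age, spending in $).
# The variable and function names here are hypothetical, chosen for illustration.
customers = {
    "Sue": (21, 2300),
    "Carl": (27, 2600),
    "Tom": (45, 5400),
    "Jack": (52, 6000),
}


def euclidean(x, y):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Sue vs. Carl: sqrt((21 - 27)^2 + (2300 - 2600)^2) = sqrt(36 + 90000)
print(round(euclidean(customers["Sue"], customers["Carl"]), 2))  # 300.06
# Tom vs. Jack: sqrt((45 - 52)^2 + (5400 - 6000)^2) = sqrt(49 + 360000)
print(round(euclidean(customers["Tom"], customers["Jack"]), 2))  # 600.04
```

For the same pair, the Euclidean distance is never larger than the Manhattan distance: Sue and Carl are about 300.06 apart here, compared with a Manhattan distance of 6 + 300 = 306.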