Canopy Clustering
Given a distance measure and two threshold distances T1 > T2:
1. Determine canopy centers: go through the list of input points to form a list of "clusterCenters". If a point is within T2 of a point already in clusterCenters, ignore it; otherwise, append it to clusterCenters.
2. Determine canopy membership: for each point in the input set, if the point is within T1 of a cluster center, it is a member of the corresponding canopy. Because T1 > T2, canopies can overlap, so a point may belong to more than one.
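The two steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, treats "within" as less than or equal to the threshold, and the function names `canopy_centers` and `canopy_membership` are made up for the example.

```python
import math


def canopy_centers(points, t2, distance=math.dist):
    """Step 1: keep a point as a center unless it is within T2 of an existing center."""
    centers = []
    for p in points:
        # Assumption: "within T2" means distance <= T2.
        if all(distance(p, c) > t2 for c in centers):
            centers.append(p)
    return centers


def canopy_membership(points, centers, t1, distance=math.dist):
    """Step 2: map each center index to the points within T1 of that center."""
    return {
        i: [p for p in points if distance(p, c) <= t1]
        for i, c in enumerate(centers)
    }


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 0), (6, 0), (3, 0)]
    centers = canopy_centers(pts, t2=1.5)
    print(centers)
    print(canopy_membership(pts, centers, t1=3.0))
```

In this toy input the point (3, 0) lies within T1 of all three centers, which shows how a single point can fall into several canopies.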
Combine Canopy and kMeans or EM
Only calculate distances between a centroid and the points that share a canopy with it. (Assign infinite distance to points outside the canopies containing the centroid.)
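A rough sketch of the k-means assignment step under this restriction is shown below. It assumes Euclidean distance and that canopies are represented only by their centers plus T1; the names `canopies_of`, `restricted_distance`, and `assign` are hypothetical, and a practical version would precompute each point's canopy set once rather than on every call.

```python
import math


def canopies_of(x, centers, t1):
    """Indices of the canopies (given by their centers) that contain x."""
    return {i for i, c in enumerate(centers) if math.dist(x, c) <= t1}


def restricted_distance(point, centroid, centers, t1):
    """Real distance if point and centroid share a canopy, infinity otherwise."""
    if canopies_of(point, centers, t1) & canopies_of(centroid, centers, t1):
        return math.dist(point, centroid)
    return math.inf


def assign(points, centroids, centers, t1):
    """Label each point with the index of its nearest centroid under the restriction."""
    labels = []
    for p in points:
        dists = [restricted_distance(p, m, centers, t1) for m in centroids]
        labels.append(min(range(len(centroids)), key=dists.__getitem__))
    return labels


if __name__ == "__main__":
    centers = [(0, 0), (10, 0)]
    print(assign([(1, 0), (9, 0)], [(0.5, 0), (9.5, 0)], centers, t1=3.0))
```

The saving comes from skipping the expensive distance computation for point and centroid pairs that share no canopy; in this sketch the cheap canopy test reuses the same metric only for brevity.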
Canopy Clustering with MR
Given a distance metric and the tighter threshold T2:
Mapper: start with an empty set of canopyCenters. For each x in inputData, if x is farther than T2 from every member of canopyCenters, add x to canopyCenters and emit (1, x).
Reducer: start with an empty set of canopyCenters. Input = (key, iterator over the mappers' cluster centers). For each x in the iterator, if x is farther than T2 from every member of canopyCenters, add x to canopyCenters and emit (1, x).
This results in a list of canopy centers to be used for determining canopy membership.
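The mapper and reducer can be simulated in plain Python to see how the two passes interact. This is only a single-process sketch with no MapReduce framework involved: `emit_centers` and `run_job` are hypothetical names, Euclidean distance is assumed, and the in-memory dictionary stands in for the shuffle phase. Since every mapper emits the same key, 1, all candidate centers end up in one reducer call.

```python
import math


def emit_centers(xs, t2, key=1):
    """Shared mapper/reducer logic: emit x if it is farther than T2 from every center so far."""
    centers = []
    for x in xs:
        if all(math.dist(x, c) > t2 for c in centers):
            centers.append(x)
            yield (key, x)


def run_job(splits, t2):
    """Simulate the job: one mapper per input split, then one reducer per key."""
    grouped = {}  # stand-in for the shuffle: key -> list of mapper outputs
    for split in splits:
        for key, x in emit_centers(split, t2):  # mapper
            grouped.setdefault(key, []).append(x)
    result = []
    for key, values in grouped.items():
        result.extend(x for _, x in emit_centers(iter(values), t2, key))  # reducer
    return result


if __name__ == "__main__":
    splits = [[(0, 0), (1, 0), (5, 0)], [(0.5, 0), (5.5, 0), (10, 0)]]
    print(run_job(splits, t2=1.5))
```

The reducer pass is what removes near-duplicate centers chosen independently by different mappers, such as (0, 0) and (0.5, 0) in the sample splits.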