Efficient Algorithms for Non-parametric Clustering With Clutter

Efficient Algorithms for Non-parametric Clustering With Clutter. Weng-Keen Wong Andrew Moore (In partial fulfillment of the speaking requirement). Problems From the Physical Sciences. Minefield detection (Dasgupta and Raftery 1998). Earthquake faults (Byers and Raftery 1998).

Presentation Transcript


  1. Efficient Algorithms for Non-parametric Clustering With Clutter Weng-Keen Wong Andrew Moore (In partial fulfillment of the speaking requirement)

  2. Problems From the Physical Sciences Minefield detection (Dasgupta and Raftery 1998) Earthquake faults (Byers and Raftery 1998)

  3. Problems From the Physical Sciences (Pereira 2002) (Sloan Digital Sky Survey 2000)

  4. A Simplified Example

  5. Clustering with Single Linkage Clustering [Figure panels: Single Linkage Clustering MST; Clusters]

  6. Clustering with Mixture Models [Figure panels: Mixture of Gaussians with a Uniform Background Component; Resulting Clusters]

  7. Clustering with CFF [Figure panels: Original Dataset; Cuevas-Febrero-Fraiman]

  8. Related Work (Dasgupta and Raftery 98) • Mixture model approach – mixture of Gaussians for features, Poisson process for clutter (Byers and Raftery 98) • K-nearest neighbour distances for all points modeled as a mixture of two gamma distributions, one for clutter and one for the features • Classify each data point based on which component it was most likely generated from

  9. Outline 1. Introduction: Clustering and Clutter 2. The Cuevas-Febrero-Fraiman Algorithm 3. Optimizing Step One of CFF 4. Optimizing Step Two of CFF 5. Results

  10. The CFF Algorithm Step One Find the high density datapoints

  11. The CFF Algorithm Step Two • Cluster the high density points using Single Linkage Clustering • Stop when the link length exceeds a threshold
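Stopping single linkage once the next link would be longer than a threshold leaves the same groups as another construction: link every pair of points that are no farther apart than the threshold, then take the connected components. The sketch below shows that idea on a small scale. It assumes a brute-force O(n²) pair loop and a union-find structure. The function name `single_linkage_clusters` and the threshold parameter `epsilon` are hypothetical names chosen for illustration. The quadratic loop is exactly the cost that the EMST machinery later in the talk is meant to avoid.

```python
import math

def single_linkage_clusters(points, epsilon):
    """Hypothetical helper: single linkage clustering stopped before any
    merge whose link length exceeds epsilon.

    Equivalent to the connected components of the graph that links every
    pair of points at distance <= epsilon. Returns one cluster label per point.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute force over all pairs (O(n^2)); only suitable for small inputs.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= epsilon:
                parent[find(i)] = find(j)

    # Relabel component roots as consecutive cluster ids 0, 1, 2, ...
    roots, labels = {}, []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels
```

For example, with the points `(0,0), (1,0), (10,0), (11,0)` and `epsilon=2.0`, the result is `[0, 0, 1, 1]`.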

  12. The CFF Algorithm • Originally intended to estimate the number of clusters • Can also be used to find clusters against a noisy background

  13. Step One: Density Estimators • Finding high density points requires a density estimator • Want to make as few assumptions about underlying density as possible • Use a non-parametric density estimator

  14. A Simple Non-Parametric Density Estimator A datapoint is a high density datapoint if: The number of datapoints within a hypersphere of radius h is > threshold c
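Here is a brute-force sketch of the high-density test. The slide leaves two details open, and this sketch makes assumptions for both:

- a point counts itself among its neighbours;
- a point lying exactly on the hypersphere boundary is inside it.

The function name is hypothetical. The O(n²) scan is the cost that the dual-tree and early-cutoff ideas on the following slide are meant to reduce.

```python
import math

def high_density_points(points, h, c):
    """Hypothetical helper: indices of points whose radius-h hypersphere
    contains more than c datapoints (the point itself is counted here)."""
    result = []
    for i, p in enumerate(points):
        # Count neighbours within distance h by brute force.
        count = sum(1 for q in points if math.dist(p, q) <= h)
        if count > c:
            result.append(i)
    return result
```

With `[(0, 0), (0.1, 0), (0, 0.1), (5, 5)]`, `h=0.5`, and `c=2`, the first three points each have three neighbours and qualify. The isolated point `(5, 5)` does not.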

  15. Speeding up the Non-Parametric Density Estimator • Addressed in a separate paper (Gray and Moore 2001) • Two basic ideas: 1. Use a dual tree algorithm (Gray and Moore 2000) 2. Cut search off early without computing exact densities (Moore 2000)

  16. Step Two: Euclidean Minimum Spanning Trees (EMSTs) • Traditional MST algorithms assume you are given all the distances • Implies $O(N^2)$ memory usage • Want to use a Euclidean Minimum Spanning Tree algorithm
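The memory point can be made concrete with the dense form of Prim's algorithm, which computes each Euclidean distance when it is needed instead of storing an N×N matrix. This sketch is not from the talk. It uses O(N) memory, but its running time is still O(N²), which is why the talk turns to computational-geometry EMST methods instead. The function name is hypothetical.

```python
import math

def euclidean_mst_prim(points):
    """Hypothetical helper: dense Prim's algorithm with on-the-fly distances.

    Keeps only O(N) state (no distance matrix); runs in O(N^2) time.
    Returns MST edges as (tree_endpoint, new_point, length) tuples.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known link from the tree to each point
    link = [-1] * n        # tree endpoint of that shortest link
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((link[u], u, best[u]))
        # Update shortest links using distances from the new tree point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges
```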

  17. Optimizing Clustering Step • Exploit recent results in computational geometry for efficient EMSTs • Involves a modification to the GeoMST2 algorithm by (Narasimhan et al. 2000) • GeoMST2 is based on Well-Separated Pairwise Decompositions (WSPDs) (Callahan 1995) • Our optimizations gain an order of magnitude speedup, especially in higher dimensions

  18. Outline for Optimizing Step Two 1. High level overview of GeoMST2 2. Properties of a WSPD 3. How to create a WSPD 4. More detailed description of GeoMST2 5. Our optimizations

  19. Intuition behind GeoMST2

  20. Intuition behind GeoMST2

  21. High Level Overview of GeoMST2 Well-Separated Pairwise Decomposition $(A_1,B_1), (A_2,B_2), \ldots, (A_m,B_m)$

  22. High Level Overview of GeoMST2 Well-Separated Pairwise Decomposition Each pair $(A_i,B_i)$ represents a possible edge in the MST $(A_1,B_1), (A_2,B_2), \ldots, (A_m,B_m)$

  23. High Level Overview of GeoMST2 1. Create the Well-Separated Pairwise Decomposition $(A_1,B_1), (A_2,B_2), \ldots, (A_m,B_m)$ 2. Take the pair $(A_i,B_i)$ that corresponds to the shortest edge 3. If the vertices of that edge are not in the same connected component, add the edge to the MST. Repeat Step 2.

  24. A Well-Separated Pair (Callahan 1995) • Let A and B be point sets in $\mathbb{R}^d$ • Let $R_A$ and $R_B$ be their respective bounding hyper-rectangles • Define MargDistance(A,B) to be the minimum distance between $R_A$ and $R_B$

  25. A Well-Separated Pair (Cont) The point sets A and B are considered to be well-separated if: $\mathrm{MargDistance}(A,B) \geq \max\{\mathrm{Diam}(R_A), \mathrm{Diam}(R_B)\}$
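The well-separation test on bounding hyper-rectangles can be sketched as follows. Three choices here are assumptions rather than statements from the slides:

- Diam is taken to be the length of the rectangle's diagonal, since the slides do not define it.
- The comparison is read as ≥.
- All function names are hypothetical.

```python
import math

def bounding_box(points):
    """(lo, hi) corners of the axis-aligned bounding hyper-rectangle."""
    dims = range(len(points[0]))
    return ([min(p[k] for p in points) for k in dims],
            [max(p[k] for p in points) for k in dims])

def diam(box):
    """Assumed Diam: length of the hyper-rectangle's diagonal."""
    lo, hi = box
    return math.dist(lo, hi)

def marg_distance(box_a, box_b):
    """Minimum distance between two hyper-rectangles (0 if they overlap)."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    # Per-dimension gap between the intervals; 0 where they overlap.
    gaps = [max(0.0, lo_b[k] - hi_a[k], lo_a[k] - hi_b[k]) for k in range(len(lo_a))]
    return math.hypot(*gaps)

def is_well_separated(A, B):
    ra, rb = bounding_box(A), bounding_box(B)
    return marg_distance(ra, rb) >= max(diam(ra), diam(rb))
```

Because a single point has a zero-size box, any two distinct points pass this test. That agrees with the later remark that two distinct points are always well-separated.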

  26. Interaction Product The interaction product between two point sets A and B is defined as: $A \otimes B = \{\{p,p'\} \mid p \in A,\ p' \in B,\ p \neq p'\}$

  27. Interaction Product The interaction product between two point sets A and B is defined as: $A \otimes B = \{\{p,p'\} \mid p \in A,\ p' \in B,\ p \neq p'\}$ This is the set of all distinct pairs with one element of the pair from A and the other element from B

  28. Interaction Product Definition The interaction product between two point sets A and B is defined as: $A \otimes B = \{\{p,p'\} \mid p \in A,\ p' \in B,\ p \neq p'\}$ For example: A = {1,2,3}, B = {4,5}, $A \otimes B = \{\{1,4\}, \{1,5\}, \{2,4\}, \{2,5\}, \{3,4\}, \{3,5\}\}$

  29. Interaction Product Now let A and B be the same point set, i.e. A = {0,1,2,3,4}, B = {0,1,2,3,4}: $A \otimes B = \{\{0,1\}, \{0,2\}, \{0,3\}, \{0,4\}, \{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}\}$

  30. Interaction Product Now let A and B be the same point set, i.e. A = {0,1,2,3,4}, B = {0,1,2,3,4}: $A \otimes B = \{\{0,1\}, \{0,2\}, \{0,3\}, \{0,4\}, \{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}\}$ Think of this as all possible edges in a complete, undirected graph with {0,1,2,3,4} as the vertices
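The interaction product translates almost directly into a set comprehension. In this sketch each unordered pair is represented as a `frozenset`, so {0,1} and {1,0} are the same element. As a result, passing the same set twice yields the 10 edges of the complete graph on five vertices. The function name is hypothetical.

```python
from itertools import product

def interaction_product(A, B):
    """Hypothetical helper: all distinct unordered pairs {p, q} with
    p in A, q in B, and p != q, each stored as a frozenset."""
    return {frozenset((p, q)) for p, q in product(A, B) if p != q}
```

For example, `interaction_product({1, 2, 3}, {4, 5})` gives the six pairs listed above. `len(interaction_product(range(5), range(5)))` is 10.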

  31. A Well-Separated Pairwise Decomposition Pair #1: ([0],[1]) Pair #2: ([0,1], [2]) Pair #3: ([0,1,2],[3,4]) Pair #4: ([3], [4]) Claim: The set of pairs {([0],[1]), ([0,1], [2]), ([0,1,2],[3,4]), ([3], [4])} forms a Well-Separated Pairwise Decomposition.

  32. Interaction Product Properties If P is a point set in $\mathbb{R}^d$, then a WSPD of P is a set of pairs $(A_1,B_1),\ldots,(A_k,B_k)$ with the following properties: 1. $A_i \subseteq P$ and $B_i \subseteq P$ for all $i = 1,\ldots,k$ 2. $A_i \cap B_i = \emptyset$ for all $i = 1,\ldots,k$. For P = {0,1,2,3,4}, the decomposition {([0],[1]), ([0,1], [2]), ([0,1,2],[3,4]), ([3], [4])} clearly satisfies Properties 1 and 2.

  33. Interaction Product Property 3 3. $(A_i \otimes B_i) \cap (A_j \otimes B_j) = \emptyset$ for all $i, j$ such that $i \neq j$. From {([0],[1]), ([0,1], [2]), ([0,1,2],[3,4]), ([3], [4])} we get the following interaction products: $A_1 \otimes B_1 = \{\{0,1\}\}$, $A_2 \otimes B_2 = \{\{0,2\},\{1,2\}\}$, $A_3 \otimes B_3 = \{\{0,3\},\{1,3\},\{2,3\},\{0,4\},\{1,4\},\{2,4\}\}$, $A_4 \otimes B_4 = \{\{3,4\}\}$. These interaction products are all disjoint.

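  34. Interaction Product Property 4 4. $P \otimes P = \bigcup_{i=1}^{k} A_i \otimes B_i$. Here $P \otimes P = \{\{0,1\}, \{0,2\}, \{0,3\}, \{0,4\}, \{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}\}$, and $A_1 \otimes B_1 = \{\{0,1\}\}$, $A_2 \otimes B_2 = \{\{0,2\},\{1,2\}\}$, $A_3 \otimes B_3 = \{\{0,3\},\{1,3\},\{2,3\},\{0,4\},\{1,4\},\{2,4\}\}$, $A_4 \otimes B_4 = \{\{3,4\}\}$. The union of the above interaction products gives back $P \otimes P$.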

  35. Interaction Product Property 5 5. $A_i$ and $B_i$ are well-separated for all $i = 1,\ldots,k$
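The five properties can be checked mechanically on the five-point example. Property 5 needs actual coordinates, so this sketch places the points on a line at the hypothetical positions 0, 1, 3, 10, and 11, where the example decomposition is well-separated. The sketch also assumes the following:

- the bounding-box test uses Diam as the box diagonal;
- every set in a pair is non-empty;
- all function names are hypothetical.

```python
import math
from itertools import product

def interaction_product(A, B):
    """Distinct unordered pairs {p, q} with p in A, q in B, p != q."""
    return {frozenset((p, q)) for p, q in product(A, B) if p != q}

def is_well_separated(A, B, coords):
    """Bounding-box test; Diam is assumed to be the box diagonal."""
    def box(S):
        pts = [coords[i] for i in S]
        dims = range(len(pts[0]))
        return ([min(p[k] for p in pts) for k in dims],
                [max(p[k] for p in pts) for k in dims])
    (lo_a, hi_a), (lo_b, hi_b) = box(A), box(B)
    gaps = [max(0.0, lo_b[k] - hi_a[k], lo_a[k] - hi_b[k]) for k in range(len(lo_a))]
    return math.hypot(*gaps) >= max(math.dist(lo_a, hi_a), math.dist(lo_b, hi_b))

def is_wspd(P, pairs, coords):
    """Hypothetical checker for WSPD properties 1-5 over index sets."""
    P = set(P)
    products = []
    for A, B in pairs:
        A, B = set(A), set(B)
        if not (A <= P and B <= P):  # Property 1: both sets are subsets of P
            return False
        if A & B:                    # Property 2: A_i and B_i are disjoint
            return False
        products.append(interaction_product(A, B))
    union = set().union(*products)
    # Property 3: products are pairwise disjoint iff no pair is counted twice.
    if sum(len(x) for x in products) != len(union):
        return False
    # Property 4: together the products give back P (x) P.
    if union != interaction_product(P, P):
        return False
    # Property 5: every pair is well-separated.
    return all(is_well_separated(A, B, coords) for A, B in pairs)

coords = {0: (0.0,), 1: (1.0,), 2: (3.0,), 3: (10.0,), 4: (11.0,)}
example = [([0], [1]), ([0, 1], [2]), ([0, 1, 2], [3, 4]), ([3], [4])]
```

With these coordinates, `is_wspd(range(5), example, coords)` holds. Dropping the last pair breaks Property 4. Adding `([0], [2])` breaks Property 3, because {0,2} would then be covered twice.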

  36. Two Points to Note about WSPDs • Two distinct points are considered to be well-separated • For any data set of size n, there is a trivial WSPD of size (n choose 2)

  37. A Well-Separated Pairwise Decomposition (Continued) If there are n points in P, a WSPD of P containing $O(n)$ pairs can be constructed in $O(n \log n)$ time using a fair split tree (Callahan 1995)

  38. A Fair Split Tree

  39. Creating a WSPD Are the nodes outlined in yellow well-separated? No.

  40. Creating a WSPD Recurse on children of node with widest dimension

  41. Creating a WSPD Recurse on children of node with widest dimension

  42. Creating a WSPD Recurse on children of node with widest dimension

  43. Creating a WSPD And so on…

  44. Base Case Eventually you will find a well-separated pair of nodes. Add this pair to the WSPD.

  45. Another Example of the Base Case

  46. Creating a WSPD
      FindWSPD(W, NodeA, NodeB)
        if IsWellSeparated(NodeA, NodeB)
          AddPair(W, NodeA, NodeB)
        else
          if MaxHrectDimLength(NodeA) < MaxHrectDimLength(NodeB)
            Swap(NodeA, NodeB)
          FindWSPD(W, NodeA->Left, NodeB)
          FindWSPD(W, NodeA->Right, NodeB)
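A runnable sketch of the FindWSPD recursion, together with a simple split tree, is given below. Several parts are assumptions rather than details from the slides:

- The tree splits the widest side of each bounding box at its midpoint. This is taken to be one acceptable fair split.
- Diam is the box diagonal.
- Points are assumed to be distinct.
- The slides show only the recursive step, so the driver `build_wspd` is an assumption. It calls the recursion on the two children of every internal node.
- `Node`, `max_side`, `find_wspd`, and `build_wspd` are hypothetical names. `max_side` plays the role of MaxHrectDimLength.

```python
import math

class Node:
    """Split-tree node over distinct points, with its bounding box."""
    def __init__(self, pts):
        self.pts = pts
        dims = range(len(pts[0]))
        self.lo = [min(p[k] for p in pts) for k in dims]
        self.hi = [max(p[k] for p in pts) for k in dims]
        self.left = self.right = None
        if len(pts) > 1:
            # Split the widest side of the box at its midpoint.
            k = max(dims, key=lambda d: self.hi[d] - self.lo[d])
            mid = (self.lo[k] + self.hi[k]) / 2
            left = [p for p in pts if p[k] <= mid]
            right = [p for p in pts if p[k] > mid]
            if not left or not right:  # guard against float rounding
                pts = sorted(pts, key=lambda p: p[k])
                left, right = pts[:len(pts) // 2], pts[len(pts) // 2:]
            self.left, self.right = Node(left), Node(right)

def max_side(node):
    """Longest side of the node's bounding hyper-rectangle."""
    return max(h - l for l, h in zip(node.lo, node.hi))

def is_well_separated(a, b):
    gaps = [max(0.0, b.lo[k] - a.hi[k], a.lo[k] - b.hi[k]) for k in range(len(a.lo))]
    return math.hypot(*gaps) >= max(math.dist(a.lo, a.hi), math.dist(b.lo, b.hi))

def find_wspd(W, a, b):
    if is_well_separated(a, b):
        W.append((a, b))
    else:
        # Always split the node with the longer box side.
        if max_side(a) < max_side(b):
            a, b = b, a
        find_wspd(W, a.left, b)
        find_wspd(W, a.right, b)

def build_wspd(points):
    """Assumed driver: pair up the two children of every internal node."""
    W = []
    def visit(node):
        if node.left is not None:
            find_wspd(W, node.left, node.right)
            visit(node.left)
            visit(node.right)
    visit(Node(points))
    return W
```

The points 0, 1, 3, 10, and 11 on a line produce four pairs: {0,1,3} with {10,11}, {0,1} with {3}, {0} with {1}, and {10} with {11}. Together these cover each of the 10 point pairs exactly once.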

  47. High Level Overview of GeoMST2 1. Create the Well-Separated Pairwise Decomposition $(A_1,B_1), (A_2,B_2), \ldots, (A_m,B_m)$ 2. Take the pair $(A_i,B_i)$ that corresponds to the shortest edge 3. If the vertices of that edge are not in the same connected component, add the edge to the MST. Repeat Step 2

  48. Bichromatic Closest Pair Distance Given a pair of sets $(A_i,B_i)$, the Bichromatic Closest Pair (BCP) distance is the shortest distance from a point in $A_i$ to a point in $B_i$
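A brute-force version of the BCP computation checks every cross pair, at a cost of O(|A|·|B|). This sketch is for illustration only and is not how an efficient implementation would find it. The function name is hypothetical.

```python
import math

def bichromatic_closest_pair(A, B):
    """Hypothetical helper: return (distance, a, b) for the closest
    a in A and b in B, by checking every cross pair."""
    return min((math.dist(a, b), a, b) for a in A for b in B)
```

For example, `bichromatic_closest_pair([(0.0, 0.0), (1.0, 0.0)], [(3.0, 0.0), (5.0, 1.0)])` returns `(2.0, (1.0, 0.0), (3.0, 0.0))`.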

  49. High Level Overview of GeoMST2 1. Create the Well-Separated Pairwise Decomposition $(A_1,B_1), (A_2,B_2), \ldots, (A_m,B_m)$ 2. Take the pair $(A_i,B_i)$ with the shortest BCP distance 3. If the endpoints of that BCP edge are not already connected, add the edge to the MST. Repeat Step 2.
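The loop can be sketched as a Kruskal-style pass over the WSPD pairs, sorted by BCP distance, with a union-find structure tracking connected components. This is a simplification, not GeoMST2 itself: it computes every BCP up front by brute force, whereas the published algorithm is organized to avoid much of that work.

The shortcut relies on the well-separation condition. Suppose an edge between $A_i$ and $B_i$ is longer than their BCP. It can be bypassed by a path through the BCP edge and edges inside $A_i$ and $B_i$, each no longer than that BCP. So a spanning tree built only from BCP edges loses nothing. The function name and the input layout (points as coordinate tuples, pairs as index lists) are assumptions.

```python
import math

def mst_from_wspd(points, wspd):
    """Hypothetical helper: MST edges (i, j, length) from a WSPD given as
    a list of (A, B) index lists, processing pairs by BCP distance."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # One candidate edge per pair: its bichromatic closest pair.
    candidates = []
    for A, B in wspd:
        candidates.append(min((math.dist(points[i], points[j]), i, j)
                              for i in A for j in B))
    candidates.sort()

    mst = []
    for d, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # endpoints lie in different components: keep the edge
            parent[ri] = rj
            mst.append((i, j, d))
    return mst
```

Take the five-point example from the WSPD slides, with the hypothetical coordinates 0, 1, 3, 10, and 11. This yields the edges {0,1}, {3,4}, {1,2}, and {2,3}, with total length 11.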

  50. GeoMST2 Example [Figure panels: Start; Current MST]
