
Pairwise Nearest Neighbor Method Revisited (Finnish title: Parittainen yhdistelymenetelmä uudistettuna)

Olli Virmajoki. University of Joensuu, Department of Computer Science, Joensuu, Finland. 11.12.2004.


Presentation Transcript


  1. Pairwise Nearest Neighbor Method Revisited (Finnish title: Parittainen yhdistelymenetelmä uudistettuna) Olli Virmajoki UNIVERSITY OF JOENSUU DEPARTMENT OF COMPUTER SCIENCE JOENSUU, FINLAND 11.12.2004

  2. Clustering • An important combinatorial optimization problem that often must be solved as part of more complicated tasks in • data analysis • pattern recognition • data mining • other fields of science and engineering • Entails partitioning a data set so that similar objects are grouped together and dissimilar objects are placed in separate groups

  3. Example of data sets: Employment statistics, RGB data

  4. Summary of data sets

  5. Data sets

  6. An example of clustering

  7. Clustering • Given a set of N data vectors X = {x1, x2, ..., xN} in K-dimensional space, clustering aims at finding the partition P = {p1, p2, ..., pN}, which defines for each data vector the index of the cluster to which it is assigned. • Cluster sa = {xi | pi = a} • Clustering S = {s1, s2, ..., sM} • Codebook C = {c1, c2, ..., cM} • Cost function • Combinatorial optimization problem
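The slide's cost function formula did not survive extraction. As a sketch only, assuming the cost is the mean squared error (a common choice for PNN and k-means), the definitions above can be written out as follows; `squared_distance`, `centroids` and `mse` are hypothetical helper names, not taken from the slides.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance between two K-dimensional vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def centroids(X: Sequence[Vector], P: Sequence[int], M: int) -> List[Vector]:
    """Codebook C: c_a is the mean of cluster s_a = {x_i | p_i = a}."""
    K = len(X[0])
    sums = [[0.0] * K for _ in range(M)]
    counts = [0] * M
    for x, p in zip(X, P):
        counts[p] += 1
        for k in range(K):
            sums[p][k] += x[k]
    # An empty cluster is left as a zero vector in this sketch.
    return [[v / n for v in row] if n else row for row, n in zip(sums, counts)]


def mse(X: Sequence[Vector], P: Sequence[int], C: Sequence[Vector]) -> float:
    """Assumed cost: mean squared distance from each x_i to its code vector c_{p_i}."""
    return sum(squared_distance(x, C[p]) for x, p in zip(X, P)) / len(X)
```

For X = [[0, 0], [0, 2], [10, 0], [10, 2]] with P = [0, 0, 1, 1], the codebook is [[0, 1], [10, 1]] and every vector lies at squared distance 1 from its code vector, so the MSE is 1.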

  8. Clustering algorithms • Heuristic methods • Optimization methods • k-means • Genetic algorithms • Graph-theoretical methods • Hierarchical methods • Divisive • Agglomerative (Finnish: yhdistelevä)

  9. Agglomerative clustering: N = 22 (number of data points), M = 3 (number of final clusters)

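  10. Ward’s method (PNN in VQ) • Merge cost: d(a, b) = (na·nb / (na + nb))·||ca - cb||², where na is the size and ca the centroid of cluster sa • Local optimization strategy: merge the pair (a, b) with the smallest merge cost • Nearest neighbor search is needed for: • finding the cluster pair to be merged • updating the NN pointers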
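Ward's merge cost is exactly the increase in total squared error caused by the merge. The following minimal sketch (the function names are illustrative, not from the slides) computes the cost and the merged centroid, and its test checks the cost against a direct computation of the squared error before and after merging.

```python
from typing import List, Sequence, Tuple


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def merge_cost(na: int, ca: Sequence[float], nb: int, cb: Sequence[float]) -> float:
    """Ward's merge cost d(a, b) = na*nb/(na+nb) * ||ca - cb||^2."""
    return na * nb * sq_dist(ca, cb) / (na + nb)


def merge(na: int, ca: Sequence[float], nb: int, cb: Sequence[float]) -> Tuple[int, List[float]]:
    """Size and centroid of the merged cluster (size-weighted mean of the centroids)."""
    n = na + nb
    return n, [(na * x + nb * y) / n for x, y in zip(ca, cb)]


def sse(points: Sequence[Sequence[float]]) -> float:
    """Total squared error of a set of points around their own mean."""
    K = len(points[0])
    mean = [sum(p[k] for p in points) / len(points) for k in range(K)]
    return sum(sq_dist(p, mean) for p in points)
```

For a = {(0, 0), (2, 0)} and b = {(10, 0)}, the merge cost is 2·1/3·81 = 54, which equals the squared error of the union (56) minus that of a (2) and b (0).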

  11. The PNN method (figure: the codebook is reduced one merge at a time, M = 5000, 4999, 4988, ..., 50, ..., 16, 15)
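A straightforward (slow) version of the PNN loop can be sketched as below, assuming Ward's merge cost and a full search over all cluster pairs at every step; this is the baseline that the speed-up methods in the following slides improve on, not the author's implementation.

```python
from itertools import combinations


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def ward(na, ca, nb, cb):
    """Ward's merge cost: increase in total squared error caused by the merge."""
    return na * nb * sq_dist(ca, cb) / (na + nb)


def pnn(X, M):
    """Naive PNN: start with every vector as its own cluster and repeatedly
    merge the pair with the smallest Ward cost until M clusters remain.
    Each step is a full O(N^2) pair search, so the whole run is O(N^3)."""
    cents = [list(x) for x in X]
    sizes = [1] * len(X)
    while len(cents) > M:
        a, b = min(
            combinations(range(len(cents)), 2),
            key=lambda ab: ward(sizes[ab[0]], cents[ab[0]], sizes[ab[1]], cents[ab[1]]),
        )
        n = sizes[a] + sizes[b]
        # Merged centroid is the size-weighted mean of the two centroids.
        cents[a] = [(sizes[a] * x + sizes[b] * y) / n for x, y in zip(cents[a], cents[b])]
        sizes[a] = n
        # combinations yields a < b, so removing b leaves index a valid.
        del cents[b]
        del sizes[b]
    return cents, sizes
```

On the points 0, 1, 10, 11 with M = 2, the two cheapest merges join {0, 1} and {10, 11}, giving centroids 0.5 and 10.5.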

  12. Nearest neighbor pointers • Fast exact PNN method: reduces the number of nearest neighbor searches in each iteration • Time complexity between Ω(N²) and O(N³)
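The idea of nearest neighbor pointers can be sketched as follows, under our own assumptions about the bookkeeping: each cluster stores its nearest neighbor and the merge cost to it, and after a merge only the merged cluster and the clusters that pointed to either merged cluster get a new full search. Because Ward's cost is reducible (merging the globally cheapest pair never brings the merged cluster closer to a third cluster than that cluster's current nearest neighbor), the result matches the naive PNN. The function name `fast_pnn` is hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def ward(na, ca, nb, cb):
    """Ward's merge cost."""
    return na * nb * sq_dist(ca, cb) / (na + nb)


def fast_pnn(X, M):
    """PNN with nearest-neighbor pointers (sketch)."""
    cents = {i: list(x) for i, x in enumerate(X)}  # cluster id -> centroid
    sizes = {i: 1 for i in cents}
    nn, cost = {}, {}  # cluster id -> nearest neighbor id, merge cost to it

    def find_nn(a):
        """Full O(N) search for the nearest neighbor of cluster a."""
        best, best_d = None, float("inf")
        for j in cents:
            if j != a:
                d = ward(sizes[a], cents[a], sizes[j], cents[j])
                if d < best_d:
                    best, best_d = j, d
        nn[a], cost[a] = best, best_d

    for i in cents:
        find_nn(i)
    while len(cents) > M:
        a = min(cents, key=lambda i: cost[i])  # cluster with the cheapest merge
        b = nn[a]
        n = sizes[a] + sizes[b]
        cents[a] = [(sizes[a] * x + sizes[b] * y) / n for x, y in zip(cents[a], cents[b])]
        sizes[a] = n
        del cents[b], sizes[b], nn[b], cost[b]
        # Only the merged cluster and clusters that pointed to a or b are searched again;
        # reducibility of Ward's cost keeps every other pointer valid.
        for i in list(cents):
            if i == a or nn[i] in (a, b):
                find_nn(i)
    return list(cents.values()), list(sizes.values())
```

In most iterations only a few pointers refer to the merged pair, which is where the saving over a full pair search comes from; the worst case is still cubic.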

  13. Combining the PNN and k-means
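The figure for this slide is lost. One plausible combination, assumed here, is to use the PNN codebook as the initial codebook of k-means, which alternates the nearest-centroid partition and centroid updates and never increases the squared error; whether k-means is also applied between merges is not recoverable from the slide. The sketch below shows only the k-means refinement step.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(X, C, iterations=100):
    """Refine an initial codebook C (for example, the PNN result) with k-means."""
    C = [list(c) for c in C]
    P = None
    for _ in range(iterations):
        # Partition step: assign every vector to its nearest code vector.
        new_P = [min(range(len(C)), key=lambda j: sq_dist(x, C[j])) for x in X]
        if new_P == P:
            break  # partition unchanged: converged
        P = new_P
        # Centroid step; an empty cluster keeps its previous code vector.
        for j in range(len(C)):
            members = [x for x, p in zip(X, P) if p == j]
            if members:
                C[j] = [sum(col) / len(members) for col in zip(*members)]
    return C, P
```

Starting from the poor codebook [0, 3] on the points 0, 1, 2, 10, 11, 12, two partition steps are enough to reach the centroids 1 and 11.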

  14. PNN as a crossover method in the genetic algorithm (figure: two random codebooks, Initial1 and Initial2, each with M = 15, are combined by union into a codebook with M = 30, which PNN reduces to the final codebook with M = 15)
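The crossover on this slide can be sketched as follows. The details are assumptions: the data is repartitioned with the combined 2M codebook so that every code vector gets a centroid and a cluster size, empty clusters are dropped, and a size-weighted PNN merges back down to M. `pnn_crossover` is a hypothetical name.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pnn_crossover(X, C1, C2):
    """Combine two parent codebooks of size M into one child of size at most M."""
    M = len(C1)
    union = [list(c) for c in C1] + [list(c) for c in C2]  # 2M code vectors
    # Repartition the data with the combined codebook.
    groups = [[] for _ in union]
    for x in X:
        nearest = min(range(len(union)), key=lambda k: sq_dist(x, union[k]))
        groups[nearest].append(x)
    # Centroids and sizes of the non-empty clusters.
    cents = [[sum(col) / len(g) for col in zip(*g)] for g in groups if g]
    sizes = [len(g) for g in groups if g]
    # Weighted PNN: merge the pair with the smallest Ward cost until M remain.
    while len(cents) > M:
        best = None
        for a in range(len(cents)):
            for b in range(a + 1, len(cents)):
                d = sizes[a] * sizes[b] * sq_dist(cents[a], cents[b]) / (sizes[a] + sizes[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        n = sizes[a] + sizes[b]
        cents[a] = [(sizes[a] * x + sizes[b] * y) / n for x, y in zip(cents[a], cents[b])]
        sizes[a] = n
        del cents[b]
        del sizes[b]
    return cents
```

Because the child is built deterministically from the union, a good cluster present in either parent tends to survive, which is the motivation for using PNN as the crossover.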

  15. Publication 1: Speed-up methods • Partial distortion search (PDS) • Mean-distance-ordered search (MPS) • Uses the component means of the vectors • Derives a precondition for the distance calculations • Reduces the run time to 2 to 15% of the original
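Partial distortion search can be illustrated on a plain nearest-code-vector search: the squared distance is accumulated one component at a time, and the candidate is abandoned as soon as the partial sum reaches the best distance found so far. This is a generic sketch; inside PNN the same early exit would apply to the partial sum scaled by the Ward weight na·nb/(na + nb).

```python
from typing import Sequence, Tuple


def pds_nearest(x: Sequence[float], C: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Index and squared distance of the nearest code vector, using PDS."""
    best_j, best_d = -1, float("inf")
    for j, c in enumerate(C):
        d = 0.0
        for xk, ck in zip(x, c):
            d += (xk - ck) ** 2
            if d >= best_d:
                break  # cannot beat the current best; skip the remaining components
        else:
            best_j, best_d = j, d  # all components summed without exceeding the best
    return best_j, best_d
```

For x = (1.2, 0.9), the candidate (5, 5) is rejected after its first component because 3.8² already exceeds the current best distance 2.25.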

  16. Example of the MPS method (figure: input vector and best candidate)
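The mean-distance-ordered search can be sketched as follows, assuming the standard precondition K·(mean(x) - mean(c))² ≤ ||x - c||², which follows from the Cauchy-Schwarz inequality. Code vectors are sorted by their component mean, the search starts at the position of mean(x) and proceeds in both directions, and each direction stops once the lower bound reaches the best distance found. `mps_nearest` is a hypothetical name.

```python
import bisect


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mps_nearest(x, C):
    """Nearest code vector by mean-distance-ordered search (MPS)."""
    K = len(x)
    order = sorted(range(len(C)), key=lambda j: sum(C[j]) / K)  # codebook sorted by mean
    means = [sum(C[j]) / K for j in order]
    mx = sum(x) / K
    best_j, best_d = -1, float("inf")
    up = bisect.bisect_left(means, mx)  # first candidate with mean >= mean(x)
    down = up - 1
    up_open = down_open = True
    while up_open or down_open:
        if up_open:
            if up >= len(order) or K * (means[up] - mx) ** 2 >= best_d:
                up_open = False  # every remaining candidate above is excluded
            else:
                d = sq_dist(x, C[order[up]])
                if d < best_d:
                    best_j, best_d = order[up], d
                up += 1
        if down_open:
            if down < 0 or K * (means[down] - mx) ** 2 >= best_d:
                down_open = False  # every remaining candidate below is excluded
            else:
                d = sq_dist(x, C[order[down]])
                if d < best_d:
                    best_j, best_d = order[down], d
                down -= 1
    return best_j, best_d
```

For brevity the codebook is sorted on every call; a practical implementation would sort it once and reuse the order for all queries.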

  17. Publication 2: Graph-based PNN • Based on the exact PNN method • NN search is limited to the k clusters that are connected by the graph structure • Reduces the time complexity of every search from O(N) to O(k) • Reduces the run time to 1 to 4% of the original

  18. Why a graph structure? • Full search: O(N) searches (N = 4096) • Graph structure: only O(k) searches (k = 3)
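A graph-based PNN can be sketched as below. The details are assumptions, not taken from the publication: the k-nearest-neighbor graph is built by brute force and made symmetric, each cluster keeps a pointer to its cheapest graph neighbor, and a merged cluster inherits the neighbors of both merged clusters. The per-cluster search then only looks at graph neighbors, while the global cheapest pointer is found here by a linear scan. `graph_pnn` is a hypothetical name.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def graph_pnn(X, M, k=3):
    """PNN whose nearest-neighbor searches are restricted to a kNN graph."""
    cents = {i: list(x) for i, x in enumerate(X)}
    sizes = {i: 1 for i in cents}

    def cost(a, b):
        """Ward's merge cost between clusters a and b."""
        return sizes[a] * sizes[b] * sq_dist(cents[a], cents[b]) / (sizes[a] + sizes[b])

    # kNN graph of the input vectors (brute force), then made symmetric.
    graph = {i: set(sorted((j for j in cents if j != i), key=lambda j: cost(i, j))[:k])
             for i in cents}
    for i in list(graph):
        for j in list(graph[i]):
            graph[j].add(i)

    nn = {}

    def update(i):
        """Search only the graph neighbors of cluster i."""
        nn[i] = min(graph[i], key=lambda j: cost(i, j)) if graph[i] else None

    for i in cents:
        update(i)
    while len(cents) > M:
        candidates = [i for i in cents if nn[i] is not None]
        if not candidates:
            break  # graph split into more than M components; stop early
        a = min(candidates, key=lambda i: cost(i, nn[i]))
        b = nn[a]
        n = sizes[a] + sizes[b]
        cents[a] = [(sizes[a] * x + sizes[b] * y) / n for x, y in zip(cents[a], cents[b])]
        sizes[a] = n
        # The merged cluster inherits the neighbors of both a and b.
        merged_nbrs = (graph[a] | graph[b]) - {a, b}
        for j in graph[b] - {a}:
            graph[j].discard(b)
            graph[j].add(a)
        graph[a] = merged_nbrs
        del graph[b], cents[b], sizes[b], nn[b]
        # Only the merged cluster and its neighbors need new pointers.
        for j in merged_nbrs | {a}:
            update(j)
    return list(cents.values())
```

Since merges can only happen along graph edges, the result may differ from the exact PNN; the trade-off is that each search costs O(k) distance evaluations instead of O(N).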

  19. Sample graph

  20. Publication 3: Multilevel thresholding • Can be considered a special case of vector quantization (VQ) in which the vectors are 1-dimensional • Existing method: O(N²) • PNN thresholding can be implemented in O(N·log N) • The proposed method works in real time for any number of thresholds

  21. Distances in a heap structure: O(log N) per update, O(1) access to the smallest distance
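A heap-based PNN thresholding can be sketched as follows, assuming that clusters are kept as contiguous intervals of gray levels so that only neighboring intervals are merge candidates. The merge costs of adjacent pairs live in a min-heap; entries made stale by a merge are detected with per-interval version counters and skipped. Each merge pushes at most two entries, so the whole run takes O(N·log N) for N non-empty gray levels. `pnn_thresholds` is a hypothetical name.

```python
import heapq


def pnn_thresholds(hist, T):
    """hist[g] = number of pixels with gray level g; returns T thresholds."""
    levels = [g for g, h in enumerate(hist) if h > 0]
    n = {g: hist[g] for g in levels}      # pixels in the interval starting at g
    s = {g: g * hist[g] for g in levels}  # sum of their gray levels
    nxt = dict(zip(levels, levels[1:]))   # interval to the right
    prv = dict(zip(levels[1:], levels))   # interval to the left
    ver = {g: 0 for g in levels}          # bumped whenever an interval changes
    heap = []

    def cost(a, b):
        """Ward's merge cost for adjacent intervals a < b."""
        return n[a] * n[b] * (s[a] / n[a] - s[b] / n[b]) ** 2 / (n[a] + n[b])

    def push(a, b):
        heapq.heappush(heap, (cost(a, b), a, b, ver[a], ver[b]))

    for a, b in nxt.items():
        push(a, b)
    alive = len(levels)
    while alive > T + 1 and heap:
        _, a, b, va, vb = heapq.heappop(heap)  # O(log N)
        if ver.get(a) != va or ver.get(b) != vb:
            continue  # stale entry left behind by an earlier merge
        # Absorb interval b into its left neighbor a.
        n[a] += n.pop(b)
        s[a] += s.pop(b)
        del ver[b]
        ver[a] += 1
        alive -= 1
        c = nxt.pop(b, None)
        del prv[b]
        if c is None:
            del nxt[a]
        else:
            nxt[a], prv[c] = c, a
            push(a, c)
        if a in prv:
            push(prv[a], a)
    return sorted(ver)[1:]  # first gray level of every interval except the lowest
```

With a two-peak histogram [5, 5, 0, 0, 0, 0, 0, 0, 5, 5] and one threshold, the two peaks are merged internally first, and the threshold lands at gray level 8.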

  22. Publication 4: Iterative shrinking (IS) • Generates the clustering by a sequence of cluster removal operations • In the IS method the vectors can be reassigned more freely than in the PNN method • Can be applied as a crossover method in the genetic algorithm (GAIS) • GAIS outperformed all other clustering algorithms in the comparison
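The difference from PNN is that a removed cluster's vectors may go to different clusters. The publication's own removal-cost formula is not reproduced in the slides, so this brute-force sketch simply tries every candidate removal, reassigns each vector of the removed cluster to its nearest remaining centroid (centroids are not updated during the reassignment), and keeps the removal with the smallest resulting total squared error. `iterative_shrinking` is a hypothetical name.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(points):
    """Centroid of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def total_sse(clusters):
    """Total squared error of a {label: [vectors]} clustering."""
    total = 0.0
    for pts in clusters.values():
        c = mean(pts)
        total += sum(sq_dist(x, c) for x in pts)
    return total


def iterative_shrinking(X, P, M):
    """Remove clusters one at a time, starting from partition P, until M remain."""
    clusters = {}
    for x, p in zip(X, P):
        clusters.setdefault(p, []).append(list(x))
    while len(clusters) > M:
        best = None
        for r in clusters:
            # Tentatively remove cluster r.
            rest = {j: list(pts) for j, pts in clusters.items() if j != r}
            cents = {j: mean(pts) for j, pts in rest.items()}
            # Each vector of r goes to its own nearest remaining centroid,
            # so the vectors of r may end up in different clusters.
            for x in clusters[r]:
                target = min(cents, key=lambda j: sq_dist(x, cents[j]))
                rest[target].append(x)
            error = total_sse(rest)
            if best is None or error < best[0]:
                best = (error, rest)
        clusters = best[1]
    return clusters
```

Evaluating every removal exactly costs far more than a merge-cost lookup, which is why a cheaper removal cost estimate is needed in practice.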

  23. Example of the PNN method

  24. Example of the iterative shrinking method

  25. The PNN and IS methods in searching for the number of clusters

  26. Time-distortion performance

  27. Publication 5: Optimal clustering • Can be found by considering all possible merge sequences and choosing the one that minimizes the cost function • Can be implemented as a branch-and-bound (BB) technique • Two suboptimal, polynomial-time variants: • Piecewise optimization • Look-ahead optimization

  28. Example of a non-redundant search tree: branches that do not contain any valid clustering have been cut out
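A branch-and-bound search over merge sequences can be sketched as a depth-first search in which each branch merges one pair of clusters. Two assumptions of this sketch, not taken from the slides: a branch is pruned as soon as its total squared error reaches the best complete solution (valid because merging never decreases the error), and partitions already visited through another merge order are skipped, which keeps the tree non-redundant. The search is exponential and only usable for very small N.

```python
from itertools import combinations


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sse(X, cluster):
    """Squared error of the vectors with indices in `cluster` around their mean."""
    pts = [X[i] for i in cluster]
    c = [sum(col) / len(pts) for col in zip(*pts)]
    return sum(sq_dist(p, c) for p in pts)


def optimal_clustering(X, M):
    """Return (error, partition) of the optimal M-clustering of a tiny data set."""
    best = [float("inf"), None]
    seen = set()

    def search(state, error):
        if error >= best[0] or state in seen:
            return  # bound reached, or this partition was already explored
        seen.add(state)
        if len(state) == M:
            best[0], best[1] = error, state
            return
        for a, b in combinations(list(state), 2):
            merged = a | b
            new_error = error - sse(X, a) - sse(X, b) + sse(X, merged)
            search((state - {a, b}) | {merged}, new_error)

    start = frozenset(frozenset([i]) for i in range(len(X)))
    search(start, 0.0)
    return best[0], best[1]
```

On the points 0, 1, 10, 11, 12 with M = 2, the optimum is {0, 1} and {10, 11, 12} with total squared error 0.5 + 2 = 2.5.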

  29. Illustration of piecewise optimization

  30. Comparative results

  31. Comparative results

  32. Comparative results

  33. Example of clustering: k-means vs. agglomerative clustering

  34. Conclusions • Several speed-up methods • Projection-based search • Partial distortion search • k-nearest-neighbor graph • Efficient O(N·log N)-time implementation for the 1-dimensional case • Generalization of the merge phase by the cluster removal philosophy (IS) for better quality • Optimal clustering based on the PNN method
