OPTIMCLASS: Simultaneous identification of optimal clustering method and optimal number of clusters in vegetation classification studies. Tichý Lubomír, Chytrý Milan, Botta-Dukát Zoltán, Hájek Michal, Talbot Stephen S.

Presentation Transcript


  1. OPTIMCLASS: Simultaneous identification of optimal clustering method and optimal number of clusters in vegetation classification studies. Tichý Lubomír (1), Chytrý Milan (1), Botta-Dukát Zoltán (2), Hájek Michal (1), Talbot Stephen S. (3). (1) Masaryk University, Brno, Czech Republic; (2) Hungarian Academy of Sciences, Vácrátot, Hungary; (3) U.S. Fish and Wildlife Service, Anchorage, USA.

  2. Why do we need a method for identification of optimal clustering algorithm and optimal number of clusters? [Figure: the same dataset]

  3. Why do we need a method for identification of optimal clustering algorithm and optimal number of clusters? A huge variety of clustering methods produce “reasonable” results. Subjective selection of the clustering method and the number of clusters is usually based on empirical experience. Methods published: most algorithms identify the optimal partition mathematically, without considering ecological interpretation.

  4. The Method.

     Species 1   98788   12112   3.211
     Species 2   51123   1223.   11132
     Species 3   23132   .....   .....
     Species 4   ..2.4   112..   1..5.
     Species 5   .....   .1.1.   1.213

     A posteriori description of phytosociological tables is based on diagnostic species. Diagnostic species describe a cluster. Therefore, the number of diagnostic species determines whether the classified table can be sufficiently interpreted.

  5. The Method. [Figure: the same dataset]

  6. The Method. Measure of the classification quality: the total sum of diagnostic species. Fisher’s exact test calculates the probability of the observed occurrence of a species across clusters under a right-tailed test hypothesis. The measure reduces the importance of very small clusters. Easy interpretation: the more diagnostic species in the dataset, the better the description of the clusters.

  7. The Method. Test on three different datasets: Southern Siberia, Sayan Mountains (310 plots; forest, steppe and tundra vegetation); Central Europe, Carpathians (241 plots; mire vegetation); Alaska, Kenai Peninsula (171 plots; wetlands).

  8. The Method. Classifications tested. Flexible beta clustering, Ward’s clustering and UPGMA (PC-ORD); distance measures (Bray-Curtis, Manhattan, Euclidean). Ordinal cluster analysis (SYN-TAX); distance measures (Kruskal-Wallis, Kendall, Gower-Podani coefficient). Cover transformations (percentages, log percentages, Braun-Blanquet, presence/absence). Modified TWINSPAN classification (JUICE): the sequence of splits in divisive classification is determined by the internal heterogeneity of clusters, so any number of clusters is possible (three modifications of pseudospecies cut levels).

  9. Results. Sayan Mountains, Siberia (310 plots, 1036 species). [Figure: number of diagnostic species vs. number of clusters; panels for probability = 10^-3, 10^-6 and 10^-9]

  10. Results. Sayan Mountains, Siberia (310 plots, 1036 species). Untransformed cover data. [Figure: number of diagnostic species vs. number of clusters]

  11. Results. Sayan Mountains, Siberia (310 plots, 1036 species). Euclidean distance measure. [Figure: number of diagnostic species vs. number of clusters]

  12. Results. Sayan Mountains, Siberia (310 plots, 1036 species). Manhattan distance measure. [Figure: number of diagnostic species vs. number of clusters]

  13. Results. Sayan Mountains, Siberia (310 plots, 1036 species). Bray-Curtis distance measure. [Figure: number of diagnostic species vs. number of clusters]

  14. Results. Sayan Mountains, Siberia (310 plots, 1036 species). UPGMA. [Figure: number of diagnostic species vs. number of clusters]

  15. Results. Sayan Mountains, Siberia (310 plots, 1036 species). Ward’s method. [Figure: number of diagnostic species vs. number of clusters]

  16. Results. Sayan Mountains, Siberia (310 plots, 1036 species). Flexible beta (beta = -0.25). [Figure: number of diagnostic species vs. number of clusters]

  17. Results. Sayan Mountains, Siberia (310 plots, 1036 species). Ordinal cluster analyses (SYN-TAX). [Figure: number of diagnostic species vs. number of clusters]

  18. Results. Sayan Mountains, Siberia (310 plots, 1036 species). Modified TWINSPAN. [Figure: number of diagnostic species vs. number of clusters]

  19. The Method. Test on three different datasets: Southern Siberia, Sayan Mountains (310 plots; forest, steppe and tundra vegetation); Central Europe, Carpathians (241 plots; mire vegetation); Alaska, Kenai Peninsula (171 plots; wetlands). Similar results.

  20. Conclusions. Classifications based on transformed cover values give better results than those based on percentage covers. Euclidean distance gives slightly poorer results than Manhattan or Bray-Curtis distances. The UPGMA clustering method gives poorer results than Ward’s and flexible beta methods. There is no significant difference between ordinal cluster analysis proposed by Podani (SYN-TAX 2000) and other clustering methods. Modified TWINSPAN performs well with small numbers of clusters.

  21. Modified TWINSPAN classification. [Figure: number of diagnostic species occurrences vs. number of clusters]

  22. Modified TWINSPAN classification. [Figure: sum of diagnostic species vs. number of clusters]

  23. Modified TWINSPAN classification. [Figure: number of clusters with more than 4 diagnostic species vs. number of clusters]
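
The quality measure described on slide 6 (the total sum of diagnostic species, judged by a right-tailed Fisher's exact test) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `fisher_right_tail` and `count_diagnostic_species`, the presence/absence reading of the slide 4 table, its split into three clusters of five plots, and the `alpha` values are all assumptions made for illustration.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """Right-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: plots in the cluster with the species, b: plots in the cluster without it,
    c: plots outside the cluster with the species, d: plots outside without it.
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    n_cluster = a + b
    n_occurrences = a + c
    n_plots = a + b + c + d
    upper = min(n_cluster, n_occurrences)
    favourable = sum(
        comb(n_occurrences, k) * comb(n_plots - n_occurrences, n_cluster - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n_plots, n_cluster)


def count_diagnostic_species(presence, labels, alpha=1e-3):
    """Sum, over all clusters, the species whose right-tailed p-value is <= alpha.

    presence: dict mapping species name -> list of booleans (one per plot).
    labels:   list of cluster labels (one per plot).
    alpha:    probability threshold (hypothetical default).
    """
    total = 0
    for occurrences in presence.values():
        for cluster in sorted(set(labels)):
            a = sum(1 for occ, lab in zip(occurrences, labels) if occ and lab == cluster)
            b = sum(1 for lab in labels if lab == cluster) - a
            c = sum(1 for occ in occurrences if occ) - a
            d = len(labels) - a - b - c
            if fisher_right_tail(a, b, c, d) <= alpha:
                total += 1
    return total


# Example table from slide 4; a dot means the species is absent from the plot.
# Reading it as three clusters of five plots each is an assumption.
table = {
    "Species 1": "98788 12112 3.211",
    "Species 2": "51123 1223. 11132",
    "Species 3": "23132 ..... .....",
    "Species 4": "..2.4 112.. 1..5.",
    "Species 5": "..... .1.1. 1.213",
}
presence = {
    name: [ch != "." for ch in row.replace(" ", "")] for name, row in table.items()
}
labels = [1] * 5 + [2] * 5 + [3] * 5

print(count_diagnostic_species(presence, labels, alpha=1e-3))  # 1
print(count_diagnostic_species(presence, labels, alpha=0.05))  # 2
```

Under these assumptions, only Species 3 (confined to the first cluster) passes the stricter threshold; at the looser threshold, Species 5 also counts for the third cluster. Repeating the count for each candidate partition, and comparing the totals across clustering methods and numbers of clusters, is the kind of comparison plotted on the Results slides.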
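
The cover transformations and three of the distance measures listed on slide 8 can be sketched as follows. The names `transform`, `manhattan`, `euclidean` and `bray_curtis` are hypothetical helpers; the use of `log(x + 1)` for the log transformation is an assumption, since the slides do not give the exact formula; and the Braun-Blanquet transformation is omitted because its scale boundaries are not stated in the source.

```python
from math import log, sqrt


def transform(cover, kind):
    """Transform a list of percentage cover values for one plot."""
    if kind == "percentage":
        return [float(x) for x in cover]
    if kind == "log":
        # Assumption: log(x + 1), so that absent species (0 %) stay at 0.
        return [log(x + 1.0) for x in cover]
    if kind == "presence":
        return [1.0 if x > 0 else 0.0 for x in cover]
    raise ValueError(f"unknown transformation: {kind}")


def manhattan(x, y):
    """Sum of absolute differences between two plots."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """Square root of the sum of squared differences between two plots."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bray_curtis(x, y):
    """Manhattan distance divided by the total abundance of both plots."""
    total = sum(x) + sum(y)
    return manhattan(x, y) / total if total else 0.0


# Two made-up plots with percentage covers of four species (illustrative only).
plot_a = [60, 30, 0, 10]
plot_b = [20, 30, 40, 10]

for kind in ("percentage", "log", "presence"):
    xa, xb = transform(plot_a, kind), transform(plot_b, kind)
    print(kind, manhattan(xa, xb), euclidean(xa, xb), bray_curtis(xa, xb))
```

The example shows why the choice matters: on percentage covers the two plots differ mainly through the dominant species, whereas on presence/absence data they differ by a single species.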
