
Classification: Cluster Analysis and Related Techniques

Tanya, Caroline, Nick. Introduction to Classification: search for divisions within data → identify groups of individuals with similar characteristics and cluster them together.




Presentation Transcript


  1. Classification: Cluster Analysis and Related Techniques • Tanya, Caroline, Nick

  2. Introduction to Classification • Search for divisions within data → identify groups of individuals with similar characteristics and cluster them together • Like ordination, classification helps researchers explore data and generate hypotheses • Ordination techniques vs. classification techniques

  3. Objective?? • What is a cluster? • No formal rule exists for identifying clusters → it is subjective; you make the call

  4. Hierarchical vs. Non-Hierarchical • Hierarchical techniques divide data into clusters and look for relationships between them to create higher-order clusters → create dendrograms • Dendrograms subdivide a set of individuals into progressively smaller clusters until a stopping condition is encountered • Non-hierarchical techniques divide data into clusters without looking at relationships between clusters

  5. Dendrogram of Classification Techniques

  6. Hierarchical Techniques • Monothetic vs. Polythetic • Monothetic techniques impose classifications based on the presence or absence of one attribute at a time • Example: association analysis • Polythetic techniques use all information within the data • Most common modern approach • Examples: cluster analysis, TWINSPAN

  7. Cluster Analysis • Many procedures and algorithms may be used to create a valid dendrogram • Similar in technique to Bray-Curtis ordination • Procedure: square matrix of dissimilarities → find the lowest distance in the matrix → identify the pair that generated it → fuse the two observations together (first cluster)
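The first fusion step in the procedure above can be sketched in a few lines of Python. This is only an illustration: the observation labels, the dissimilarity values, and the helper name `closest_pair` are all hypothetical and do not come from the slides.

```python
from itertools import combinations

def closest_pair(dissim):
    """Return (i, j, d) for the smallest off-diagonal entry of a square matrix."""
    n = len(dissim)
    # Scan every unordered pair of observations once (i < j).
    i, j = min(combinations(range(n), 2), key=lambda p: dissim[p[0]][p[1]])
    return i, j, dissim[i][j]

# Hypothetical dissimilarities among four observations (symmetric, zero diagonal).
labels = ["A", "B", "C", "D"]
dissim = [
    [0.0, 0.2, 0.7, 0.9],
    [0.2, 0.0, 0.6, 0.8],
    [0.7, 0.6, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]

i, j, d = closest_pair(dissim)
first_cluster = (labels[i], labels[j])  # the pair that is fused first
print(first_cluster, d)
```

With these made-up values, A and B are the least dissimilar pair, so they form the first cluster. How the matrix is updated after each fusion depends on the clustering rule chosen.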

  8. Example

  9. Example

  10. Dissimilarity Matrix

  11. Rules for Cluster Formation • Single-link clustering (AKA nearest-neighbor clustering) • Clusters are fused according to the smallest distance between any pair of their individuals • Chaining: two individuals end up in the same cluster despite having a large dissimilarity → occurs if they are linked by a series of closely connected points • Constituent clusters may grow gradually, with each fusion adding one or a small number of elements → inconclusive and hard to interpret

  12. Other Rules • Complete-link clustering • Fusion between clusters is judged by the greatest distance separating their members • Exact opposite of single link • May end up separating individuals that are very similar • Minimum variance clustering (Ward’s technique) • Intermediate
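To make the difference between the single-link and complete-link rules concrete, here is a minimal sketch that clusters points on a line under each rule. The data values and the function names `cluster_distance` and `agglomerate` are assumptions chosen for illustration; Ward's technique is not covered here.

```python
from itertools import combinations

def cluster_distance(a, b, rule):
    """Distance between two clusters of 1-D points under a linkage rule."""
    pair_distances = [abs(x - y) for x in a for y in b]
    # Single link uses the closest pair of members; complete link the farthest.
    return min(pair_distances) if rule == "single" else max(pair_distances)

def agglomerate(points, k, rule):
    """Fuse the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]], rule),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

# Hypothetical observations: each gap is slightly wider than the previous one.
points = [0.0, 1.0, 2.1, 3.3, 4.6, 6.5]

single = agglomerate(points, 2, "single")
complete = agglomerate(points, 2, "complete")
print(single)    # [[0.0, 1.0, 2.1, 3.3, 4.6], [6.5]]
print(complete)  # [[0.0, 1.0, 2.1, 3.3], [4.6, 6.5]]
```

Under single link the first cluster grows one point at a time, so 0.0 and 4.6 end up together even though they are far apart, which is the chaining effect. Complete link produces more compact groups on the same data.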

  13. Interpretation • There are NO objective rules for interpreting dendrograms • Use the dendrogram for hypothesis formation → look for divisions that coincide with existing knowledge about the data → metadata (Chapter 1) • Complementary analysis

  14. Divisive Classification Techniques • Take an entire dataset and divide it into categories • As always, the boundaries for these categories are subjective • On the plus side, this forces us to admit that there is some uncertainty, which a software package wouldn’t tell us

  15. TWINSPAN • Acronym for Two-Way Indicator Species Analysis • Polythetic divisive classification technique • Output is presented as two-way tables

  16. TWINSPAN Tables • There are two ordered lists, one for species and one for observations • There are two dendrograms, one to classify species and one to classify observations • Pseudospecies are constructs that convert continuous abundance values into discrete presence/absence data
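The pseudospecies idea can be sketched as a simple thresholding step. Everything specific here is an assumption for illustration: the cut levels, the "strictly greater than" rule, the site names, and the cover values are not taken from the slides, and actual TWINSPAN software may define the levels differently.

```python
# Assumed cut levels (percent cover); illustrative only.
CUT_LEVELS = [0, 2, 5, 10, 20]

def pseudospecies(abundance, cut_levels=CUT_LEVELS):
    """Turn one continuous abundance into a presence/absence flag per cut level.

    Assumption: a pseudospecies is 'present' when the abundance is strictly
    greater than its cut level.
    """
    return [1 if abundance > cut else 0 for cut in cut_levels]

# Hypothetical percent cover of one species at four sites.
cover = {"site1": 0, "site2": 3, "site3": 12, "site4": 40}
table = {site: pseudospecies(value) for site, value in cover.items()}

for site, flags in table.items():
    print(site, flags)
```

Each species becomes several presence/absence columns, so a more abundant occurrence is "present" at more levels. This lets a method built on presence/absence data still make use of abundance information.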

  17. HOMEWORK!!!!!! 1) What is the difference between hierarchical and non-hierarchical classification techniques? 2) Define "cluster." 3) T/F: There can be only one valid dendrogram for a single data set. (Correct if false.) **********Bonus********** What is the background of the PowerPoint supposed to represent?
