Classification: Cluster Analysis and Related Techniques Tanya, Caroline, Nick
Introduction to Classification • Search for divisions within data → identify groups of individuals with similar characteristics and cluster them together • Like ordination, helps researchers explore data and generate hypotheses • Ordination techniques vs. Classification techniques
Objective ?? • What is a cluster? • No formal rule exists for identifying clusters → it is subjective; you make the call
Hierarchical vs. Non-Hierarchical • Hierarchical techniques divide data into clusters and look for relationships between them to create higher-order clusters → create dendrograms • Dendrograms subdivide a set of individuals into progressively smaller clusters until a stopping condition is encountered • Non-hierarchical techniques divide data into clusters without looking at relationships between clusters
Hierarchical Techniques • Monothetic vs. Polythetic • Monothetic imposes classifications based on the presence or absence of one attribute at a time • Association analysis • Polythetic uses all information within data • Most common modern approach • Cluster analysis • TWINSPAN
Cluster Analysis • Many procedures and algorithms may be used to create a valid dendrogram • Similar in technique to Bray-Curtis Ordination • Procedure: • Square matrix of dissimilarities → Find lowest distance in matrix → Identify pair that generated this → Fuse two observations together (first cluster)
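The first fusion in the procedure above can be sketched in a few lines of Python. This is only an illustration: the labels A to D and the dissimilarity values are made up, and `first_fusion` is a hypothetical helper name, not part of any package mentioned in the slides.

```python
def first_fusion(labels, dist):
    """Scan the upper triangle of a square dissimilarity matrix and
    return (lowest distance, label 1, label 2) for the closest pair."""
    best = None
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if best is None or dist[i][j] < best[0]:
                best = (dist[i][j], labels[i], labels[j])
    return best


# Hypothetical observations and dissimilarities (symmetric, zero diagonal).
labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.6, 0.2, 0.9],
    [0.6, 0.0, 0.5, 0.4],
    [0.2, 0.5, 0.0, 0.7],
    [0.9, 0.4, 0.7, 0.0],
]

d, a, b = first_fusion(labels, dist)
print(f"First cluster: {a} + {b} at dissimilarity {d}")
```

Later fusions repeat the same search, but the distance between a cluster and everything else then depends on the rule chosen for cluster formation.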
Rules for cluster formation • Single-link clustering (AKA nearest-neighbor clustering) • Clusters are defined by fusing the individual pairs with the smallest distance • Chaining: two individuals end up in the same cluster despite having a large dissimilarity → occurs if they are linked by a chain of closely connected points • Constituent clusters may grow gradually, with each fusion adding one or a small number of elements → inconclusive and hard to interpret
Other Rules • Complete-link clustering • The distance between two clusters is set by their most distant pair of members, so two clusters fuse only when even their farthest members are close • Exact opposite of single link • May end up separating individuals that are very similar • Minimum variance clustering (Ward’s technique) • Intermediate
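A minimal sketch of how the single-link and complete-link rules can give different clusters from the same data. The one-dimensional points are invented for illustration, and `cluster_distance` and `agglomerate` are hypothetical helper names; Ward's technique is not implemented here.

```python
def cluster_distance(c1, c2, rule):
    """Distance between two clusters of 1-D points under a linkage rule."""
    pair_distances = [abs(a - b) for a in c1 for b in c2]
    # Single link uses the closest pair, complete link the farthest pair.
    return min(pair_distances) if rule == "single" else max(pair_distances)


def agglomerate(points, rule, n_clusters):
    """Fuse the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], rule)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Made-up points whose gaps widen slowly: 10, 11, 12, 13, then 24.
points = [0, 10, 21, 33, 46, 70]

single = agglomerate(points, "single", 2)
complete = agglomerate(points, "complete", 2)
print("single link:  ", single)
print("complete link:", complete)
```

With these points, single link adds one element per fusion and ends with 0 and 46 in the same cluster (chaining), while complete link puts 46 with 70 instead, which separates 46 from its nearer neighbor 33.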
Interpretation • There are NO objective rules for interpreting dendrograms • Use dendrogram for Hypothesis Formation → look for divisions that coincide with existing knowledge about the data → Metadata (Chapter 1) • Complementary Analysis
Divisive Classification Techniques • Take an entire dataset and divide it into categories • As always, the boundaries for these categories are subjective • On the plus side, this forces us to admit that there is some uncertainty, which a software package wouldn’t tell us
TWINSPAN • Acronym for Two-way indicator species analysis • Polythetic divisive classification technique • Output is in two-way tables
TWINSPAN Tables • There are two ordered lists, one for species and one for observations • There are two dendrograms, one to classify species, and one to classify observations • Pseudospecies are constructs that convert continuous distributions to discrete presence/absence data
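The pseudospecies idea can be sketched as follows. This is not TWINSPAN itself: the cut levels, the threshold convention (present when the abundance is above zero and at or above the cut), the species names, and the cover values are all assumptions chosen for illustration.

```python
# Assumed cut levels (e.g. percent cover); real analyses may use others.
CUT_LEVELS = (0, 2, 5, 10, 20)


def pseudospecies(abundance, cut_levels=CUT_LEVELS):
    """Turn one continuous abundance into a presence/absence flag per cut
    level. A flag is 1 when the species is present (abundance > 0) and its
    abundance reaches that cut level."""
    return [1 if abundance > 0 and abundance >= cut else 0
            for cut in cut_levels]


# Hypothetical cover values for three species in one observation.
plot_cover = {"Species A": 7, "Species B": 0, "Species C": 25}

table = {name: pseudospecies(cover) for name, cover in plot_cover.items()}
for name, flags in table.items():
    print(name, flags)
```

Each species is thereby replaced by several presence/absence columns, so a species that is abundant scores as present at more levels than one that is merely there.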
HOMEWORK!!!!!! 1) What is the difference between hierarchical and non-hierarchical classification techniques? 2) Define cluster. 3) T/F: There can be only one valid dendrogram for a single data set. (Correct if false.) **********Bonus********** What is the background of the PowerPoint supposed to represent?