Automated Barcoding Using the Characteristic Attribute Organization System
Indra Neil Sarkar, PhD
Divisions of Invertebrate Zoology & Library Services, American Museum of Natural History
Consortium for the Barcoding of Life Data Analysis Working Group, Muséum National d’Histoire Naturelle
July 06, 2006
Ambition & Being BOLD http://www.jaestudio.com/
Barcoding
• Identify Species
• Recall
• Precision
• Speed
• Simplicity
• Consistency
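Recall and precision can be made concrete for species identification with a small tally. The sketch below is illustrative only: the function name is hypothetical, and it assumes that an identification of None means the specimen was left unclassified, which lowers recall for its true species without counting against any species' precision.

```python
from collections import Counter


def per_species_precision_recall(true_labels, predicted_labels):
    """Return {species: (precision, recall)} for paired true/predicted labels.

    precision = correct calls of a species / all calls of that species
    recall    = correct calls of a species / all specimens truly of that species
    A prediction of None (hypothetical "unclassified") is not counted as a call.
    """
    actual = Counter(true_labels)
    predicted = Counter(p for p in predicted_labels if p is not None)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    scores = {}
    for species in sorted(set(actual) | set(predicted)):
        precision = correct[species] / predicted[species] if predicted[species] else 0.0
        recall = correct[species] / actual[species] if actual[species] else 0.0
        scores[species] = (precision, recall)
    return scores


# Hypothetical example: four specimens, one misidentified and one left unclassified.
print(per_species_precision_recall(["A", "A", "B", "B"], ["A", "B", "B", None]))
# {'A': (1.0, 0.5), 'B': (0.5, 0.5)}
```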
Similarity-Based Methods
• BLAST
• Database Retrieval
• Clustering Algorithms
• Phenetics
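A minimal sketch of similarity-based identification, assuming pre-aligned sequences: the query gets the species of its closest reference by uncorrected p-distance. This is a simple stand-in for the idea, not BLAST or any database-retrieval tool, and the function names and reference data are hypothetical.

```python
def p_distance(a, b):
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous 'N' are skipped.
    """
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else 1.0


def nearest_neighbor(query, references):
    """references: list of (species, aligned_sequence); returns (species, distance)."""
    return min(
        ((species, p_distance(query, seq)) for species, seq in references),
        key=lambda pair: pair[1],
    )


refs = [("sp1", "ACGTACGT"), ("sp2", "ACGTTCGA")]  # hypothetical reference library
print(nearest_neighbor("ACGTACGT", refs))  # ('sp1', 0.0)
```

The whole sequence is reduced to one number per comparison, which is what makes this approach fast and also what hides which individual sites drove the match.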
Phenetic vs Cladistic
• Tree Topologies Are Often Different!
• Which Is Right?
• Does It Matter?
• Similarity Methods (Phenetic)
  • Evolution of Complete Sequences
  • FAST
• Character Methods (Cladistic)
  • Evolution of Individual Characters
  • SLOW
A Character Mindset
[Figure: taxa MLAT, MLBT, MRBT, MLCT, MRCT and MRCA; successive builds of the slide highlight their Characters, then their Character States]
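In character terms, each column of an alignment is a character and each nucleotide observed in that column is a character state. The sketch below tabulates states per group at every position; the function name and the {group: [sequences]} input format are assumptions for illustration.

```python
from collections import defaultdict


def character_states(alignment):
    """Tabulate character states by group.

    alignment: {group: [aligned sequences of equal length]}
    Returns {position: {group: set of states observed}}; each alignment
    column is treated as a character and each nucleotide in it as a state.
    """
    table = defaultdict(lambda: defaultdict(set))
    for group, sequences in alignment.items():
        for seq in sequences:
            for position, state in enumerate(seq.upper()):
                table[position][group].add(state)
    return {pos: dict(groups) for pos, groups in table.items()}


states = character_states({"A": ["ACGT", "ACGA"], "B": ["TCGT"]})  # hypothetical data
print(states[0])  # {'A': {'A'}, 'B': {'T'}}
```

Position 0 already separates the two groups here, while position 3 does not, because group A carries both T and A there.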
CAOS
• Characteristic (Character States)
• Attribute (Characters)
• Organization System
• Originally Designed as a Character-Based Heuristic for Phylogenetic Classification
CAOS
[Figure: four groups, labeled A, B, C and D]
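CAOS: Types of Characteristic Attributes (CAs)
• Simple (s): CAs at a single position
• Compound (c): CAs spanning multiple positions
• Pure (Pu): ALL members of one group have the same character state
• Private (Pr): SOME members of one group have the same character state
Combining the two axes gives simple pure (sPu), simple private (sPr), compound pure (cPu) and compound private (cPr) CAs.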
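A sketch of how simple CAs might be found for one group, assuming (as in published descriptions of CAOS) that a state only counts as a CA if no member of any other group carries it at that position. The pure/private split follows the slide: all members versus only some. Function name, input format and the treatment of gaps and 'N' are assumptions; compound CAs are not covered.

```python
def simple_cas(alignment, group):
    """Find simple (single-position) CAs for one group.

    Assumed definitions: a state is a CA for `group` only if no sequence in any
    other group has it at that position; it is pure (sPu) if every member of
    `group` has it, private (sPr) if only some members do.
    Gaps ('-') and 'N' are never reported as CAs.
    """
    members = [s.upper() for s in alignment[group]]
    others = [s.upper() for g, seqs in alignment.items() if g != group for s in seqs]
    pure, private = [], []
    for pos in range(len(members[0])):
        outside = {seq[pos] for seq in others}
        candidates = {seq[pos] for seq in members} - outside - {"-", "N"}
        for state in sorted(candidates):
            count = sum(seq[pos] == state for seq in members)
            (pure if count == len(members) else private).append((pos, state))
    return pure, private


alignment = {"G1": ["ACGT", "ACGA", "ACCA"], "G2": ["TCGT", "TCGT"]}  # hypothetical
print(simple_cas(alignment, "G1"))  # ([(0, 'A')], [(2, 'C'), (3, 'A')])
```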
CAOS
[Figure: a Rule Set and an Unclassified Sequence are given to CAOS, which returns a Classification]
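The rule-set step can be sketched as a vote over diagnostic states. This is a deliberately flat simplification under a hypothetical rule format of (position, state, group) tuples, not the CAOS classification procedure itself: the unclassified sequence goes to the group whose diagnostic states it matches most often, and stays unclassified when nothing matches or the top groups tie.

```python
from collections import Counter


def classify(sequence, rules):
    """Assign `sequence` to the group whose diagnostic states it matches most.

    rules: list of (position, state, group) tuples, a hypothetical flat rule
    format meaning "this state at this position indicates this group".
    Returns None when no rule matches or the top groups tie.
    """
    votes = Counter(
        group
        for position, state, group in rules
        if position < len(sequence) and sequence[position].upper() == state
    )
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: leave unclassified
    return ranked[0][0]


rules = [(0, "A", "G1"), (2, "C", "G1"), (0, "T", "G2")]  # hypothetical rule set
print(classify("ACCA", rules))  # G1
```

Returning None instead of guessing keeps an ambiguous query visible as unclassified rather than silently lowering precision.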
Characters vs Vectors
• Characters = Diagnostic
  • Apomorphies
• Vectors ≠ Diagnostic
  • Similarity Score
Which approach provides a consistent phylogenetic representation of data?
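A toy illustration of why a similarity score is not diagnostic: two queries can score identically against a reference while only one of them has lost the state that separates the groups. The sequences and the diagnostic position below are invented for the example.

```python
def identity(a, b):
    """Fraction of aligned sites at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def differing_sites(a, b):
    """Positions at which two aligned sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


reference = "ACGTACGTAC"  # hypothetical member of group 1
DIAGNOSTIC_SITE = 4       # hypothetical: group 1 has 'A' here, group 2 has 'T'
query_1 = "ACGTACGTAG"    # one change, at a non-diagnostic site
query_2 = "ACGTTCGTAC"    # one change, at the diagnostic site

print(identity(reference, query_1), identity(reference, query_2))  # 0.9 0.9
print(DIAGNOSTIC_SITE in differing_sites(reference, query_1))  # False
print(DIAGNOSTIC_SITE in differing_sites(reference, query_2))  # True
```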
Mopalia Test Case
• 569 bp COI
• 19 In-Group Species
• 116 Individuals (~6/Species)
• What Happens to Classification Accuracy with Limited Sampling (e.g., 50%)?
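The slide does not say how the limited sample was drawn. One plausible design, sketched here purely as an assumption, keeps about half of each species' individuals as reference data and holds out the rest as queries, always leaving at least one reference per species. The function name and seed handling are hypothetical.

```python
import random
from collections import defaultdict


def split_per_species(specimens, fraction=0.5, seed=0):
    """Split (species, specimen_id) pairs into training and test sets.

    About `fraction` of each species' specimens go to training (at least one
    per species, so every species keeps some reference data); the rest are
    held out as queries.
    """
    by_species = defaultdict(list)
    for species, specimen_id in specimens:
        by_species[species].append(specimen_id)
    rng = random.Random(seed)
    train, test = [], []
    for species in sorted(by_species):
        ids = list(by_species[species])
        rng.shuffle(ids)
        k = max(1, round(len(ids) * fraction))
        train += [(species, i) for i in ids[:k]]
        test += [(species, i) for i in ids[k:]]
    return train, test


# Hypothetical data: six specimens of one species, four of another.
specimens = [("sp1", f"s1_{i}") for i in range(6)] + [("sp2", f"s2_{i}") for i in range(4)]
train, test = split_per_species(specimens)
print(len(train), len(test))  # 5 5
```

Splitting within each species, rather than over the pooled specimens, keeps every species represented in the reference set.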
[Figure, panels A and B: aligned sequences from groups 1 and 2 that are identical except at a single position, where one group has T and the other has A; abbreviated rows (dashes) show only that position]
On Being Ambitious...
• Inter- vs. Intra-Species Classification
• Limited Sampling Strategies
• Accuracy at the Cost of Speed
On Being BOLD...
• Diagnostics
  • Primers
  • Drop-Off
  • PCR
  • TAQ Assay
  • Single Molecular Sequencing Oligos
• Diagnostic-Based Query Interface (In Addition to NJ [Neighbor-Joining] Interface)
Acknowledgments
• Rob DeSalle
• Ryan P. Kelly
• Paul J. Planet
• Mark Siddall
• Al Phillips
• MLA Donald A.B. Lindberg Research Fellowship
• National Science Foundation (IIS-0241229)
• Lewis B. & Dorothy Program for Molecular Systematics