Education and Computational Biology Dean L. Zeller Kent State University OCCBIO ‘06 July 28-30, 2006
“…the great Tree of Life fills with its dead and broken branches the crust of the earth, and covers the surface with its ever-branching and beautiful ramifications.” Charles Darwin (1809-1882) Father of Evolution Education of Computational Biology
Initial Inspiration
• Colloquium by Dr. Lonnie Welch on March 15th for the KSU Department of Computer Science: “Extraterrestrials, Cryptanalysis, and Genomes: Perspectives on Bioinformatics Research”
• Looking for new perspectives in bioinformatics.
• My perspective: educate a younger audience of computational biologists.
Outline
• Goals of research
• Evolution trees
• Assignment 1 – Atlas of Evolution Trees
• Assignment 2 – Atlas of Distance Graphs
• Assignment 3 – Phylogeny Reconstruction
• Future Work
Goals of Research
Specific Goals
• Create “teachable” lessons on bioinformatics suitable for a mid-level computer science, mathematics, or biology class.
• Make use of and create more adequate evolution models.
Long Term Goals
• Discover methods of phylogeny reconstruction from a new perspective.
• Educate the next generation of computational biologists.
Evolution Tree example
Tree inferred by Unweighted Pair Group Method with Arithmetic mean (UPGMA) clustering of the Sarich (1969) immunological distance data set. [Felsenstein, p. 166]
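For readers who have not met UPGMA, here is a minimal sketch of the clustering idea: repeatedly merge the two closest clusters and average their distances to everything else, weighted by cluster size. The species labels `A` to `D` and their distances are hypothetical toy values, not the Sarich data, and the function name `upgma` is chosen only for this illustration.

```python
def upgma(labels, distance):
    """Cluster with UPGMA and return the tree as nested tuples.

    labels:   list of species names (strings)
    distance: dict mapping frozenset({x, y}) to a number
    """
    clusters = {label: 1 for label in labels}   # cluster -> number of leaves
    dist = dict(distance)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)           # closest pair of clusters
        x, y = sorted(pair, key=str)             # fixed order for a stable result
        size_x, size_y = clusters.pop(x), clusters.pop(y)
        merged = (x, y)
        for z in clusters:
            dxz = dist.pop(frozenset((x, z)))
            dyz = dist.pop(frozenset((y, z)))
            # size-weighted average distance from the merged cluster to z
            dist[frozenset((merged, z))] = (
                (size_x * dxz + size_y * dyz) / (size_x + size_y)
            )
        del dist[pair]
        clusters[merged] = size_x + size_y
    return next(iter(clusters))


# Hypothetical toy distances (not taken from the slides).
toy = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6,
    frozenset("AD"): 10, frozenset("BD"): 10, frozenset("CD"): 10,
}
print(upgma(["A", "B", "C", "D"], toy))   # ((('A', 'B'), 'C'), 'D')
```

Branch lengths are omitted to keep the sketch short; only the branching order is returned.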
Evolution Tree example (figure)
Class Assignments
• Assignment 1 – Drawing Trees
  • The student will use a graphics package to create diagrams of binary evolution trees.
• Assignment 2 – Phylogenetic Distance Graphs
  • The student will use a graphics package to construct distance graphs (k-leaf powers) for the evolution trees created in Assignment 1.
• Assignment 3 – Phylogeny Reconstruction
  • The student will demonstrate an algorithm for phylogeny reconstruction from the results of theoretical experiments using the incremental k-leaf power.
(Tested on CS10051 students, Spring 2006)
Assumptions
• By making simple assumptions, the problem complexity is greatly reduced.
• Redundant nodes are removed.
• Multiple-split nodes are replaced with isomorphic approximations.
• Only isomorphically unique trees are considered.
Assumption #1
• Redundant nodes are removed without loss of data.
• It is already assumed that the species changes slowly over time, so considering a single intermediate point along the way adds nothing to the problem.
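A minimal sketch of this assumption, under a representation chosen only for illustration: a leaf is a species label (a string), an internal node is a tuple of its children, and a "redundant" node is taken to mean an internal node with exactly one child. The function name `remove_redundant` is hypothetical.

```python
def remove_redundant(tree):
    """Collapse every internal node that has exactly one child."""
    if isinstance(tree, str):           # leaf: a species label
        return tree
    children = tuple(remove_redundant(child) for child in tree)
    if len(children) == 1:              # redundant node: pass its only child up
        return children[0]
    return children


# ("a",) and (("c", "d"),) are single-child chains in this example.
tree = ((("a",), "b"), (("c", "d"),))
print(remove_redundant(tree))   # (('a', 'b'), ('c', 'd'))
```

The leaf set and the branching order are unchanged, which is the sense in which no data is lost.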
Assumption #2
• Multiple-split nodes are replaced with isomorphic approximations.
• Some loss of data, but greatly reduces the problem complexity.
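One possible reading of this assumption, sketched with the same nested-tuple convention (leaf = string, internal node = tuple of children): a node with more than two children is resolved into a chain of two-way splits. The left-to-right resolution order used here is an arbitrary choice for illustration, not necessarily the approximation used in the assignments, and `binarize` is a hypothetical name.

```python
def binarize(tree):
    """Replace every node with more than two children by nested binary splits."""
    if isinstance(tree, str):            # leaf: a species label
        return tree
    children = [binarize(child) for child in tree]
    while len(children) > 2:
        # Join the first two children under a new internal node.
        children = [(children[0], children[1])] + children[2:]
    return tuple(children)


print(binarize(("a", "b", "c", "d")))   # ((('a', 'b'), 'c'), 'd')
```

The information that the four lineages split at the same point is lost, which is the data loss the slide mentions.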
Assumption #3
• Isomorphically unique trees
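To make "isomorphically unique" concrete, here is a small sketch that assumes two trees count as the same when they have the same shape, ignoring leaf labels and the left/right order of children. Each tree (leaf = string, internal node = tuple of children) is reduced to a canonical string; equal strings mean isomorphic trees. The names `canonical` and `isomorphic` are illustrative.

```python
def canonical(tree):
    """Return a string that depends only on the shape of the tree."""
    if isinstance(tree, str):
        return "*"                       # every leaf looks the same
    # Sorting the children's codes removes the left/right ordering.
    return "(" + "".join(sorted(canonical(child) for child in tree)) + ")"


def isomorphic(first, second):
    return canonical(first) == canonical(second)


print(isomorphic((("a", "b"), "c"), ("x", ("y", "z"))))                  # True
print(isomorphic((("a", "b"), ("c", "d")), ((("a", "b"), "c"), "d")))    # False
```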
Assignment 1: Atlas of Evolution Trees
• Inspired by An Atlas of Graphs [Read and Wilson, 1999]
• Elegant yet simple way to analyze graphs and trees, useful for instructional purposes.
• Apply same style to phylogenies.
Atlas of Evolution Trees (5 leaves)
Atlas of Evolution Trees (6 leaves)
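An atlas page can be generated mechanically. The sketch below assumes the atlas lists rooted binary trees with unlabeled leaves, counted up to isomorphism; whether that is exactly the convention used in the assignment is an assumption. Each shape is written as a canonical string in which `*` stands for a leaf, and the name `shapes` is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shapes(n):
    """All isomorphically unique rooted binary tree shapes with n leaves."""
    if n == 1:
        return ("*",)
    found = set()
    # Split the n leaves between the two subtrees of the root.
    for left in range(1, n // 2 + 1):
        for a in shapes(left):
            for b in shapes(n - left):
                # Sorting the two subtree codes makes mirror images identical.
                found.add("(" + "".join(sorted((a, b))) + ")")
    return tuple(sorted(found))


for leaves in (5, 6):
    print(leaves, "leaves:", len(shapes(leaves)), "shapes")
# 5 leaves: 3 shapes
# 6 leaves: 6 shapes
```

Under this convention the counts grow slowly (1, 1, 1, 2, 3, 6, 11 for one to seven leaves), which keeps a hand-drawn atlas feasible for small trees.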
Assignment 2: Atlas of Distance Graphs (k-leaf powers)
• Builds on Assignment 1 – create the associated k-leaf powers for each tree.
• Useful as a reference for studying the relationship between cliques, k-leaf powers, and k-leaf roots.
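The k-leaf power of a tree is the graph on its leaves in which two leaves are joined whenever their distance in the tree (number of edges on the path) is at most k. The sketch below computes it by breadth-first search. The nested-tuple tree format (leaf = string, internal node = tuple of children), the example tree, and the function names are assumptions made for illustration.

```python
from collections import deque


def build_adjacency(tree):
    """Return (adjacency dict, leaf labels) for a nested-tuple tree."""
    adjacency, leaves, counter = {}, [], [0]

    def visit(node, parent):
        if isinstance(node, str):
            name = node
            leaves.append(name)
        else:
            name = ("internal", counter[0])   # unique id for an internal node
            counter[0] += 1
        adjacency.setdefault(name, [])
        if parent is not None:
            adjacency[name].append(parent)
            adjacency[parent].append(name)
        if not isinstance(node, str):
            for child in node:
                visit(child, name)

    visit(tree, None)
    return adjacency, leaves


def leaf_distances(tree):
    """Map each leaf pair (x, y) with x < y to its path length in the tree."""
    adjacency, leaves = build_adjacency(tree)
    distances = {}
    for source in leaves:
        seen = {source: 0}
        queue = deque([source])
        while queue:                          # breadth-first search
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen[neighbor] = seen[node] + 1
                    queue.append(neighbor)
        for target in leaves:
            if source < target:
                distances[(source, target)] = seen[target]
    return distances


def k_leaf_power(tree, k):
    """Edges of the k-leaf power: leaf pairs at tree distance <= k."""
    return sorted(pair for pair, d in leaf_distances(tree).items() if d <= k)


tree = ((("a", "b"), "c"), ("d", ("e", "f")))
print(k_leaf_power(tree, 2))   # [('a', 'b'), ('e', 'f')]
print(k_leaf_power(tree, 3))
```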
Atlas of Distance Graphs (figure; panels labeled k=2 through k=4)
Atlas of Distance Graphs (figure; panels labeled k=2 through k=5)
Distance Graph Simulator (screenshot showing k = 2 through k = 8, nodes a through i, and the status “Graph complete”)
Phylogeny Reconstruction from Binary Genetic Data
• Test returns 1 if species x and y are genetically close to a certain degree, and 0 otherwise.
• Data collected to form a similarity grid and distance graph (k-leaf power).
Reconstruction
Step 1 – Difference Summary Table
    a  b  c  d  e  f
a      1  1  0  0  0
b         1  0  0  0
c            1  0  0
d               1  1
e                  1
f
Step 2 – k-leaf power
Step 3 – phylogeny (k-leaf root)
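Going from Step 1 to Step 2 is mechanical: every 1 in the upper triangle of the table becomes an edge of the graph. A minimal sketch follows, with the table values copied from the slide; the row layout (`ROWS` holds, for each species, the entries to the right of the diagonal) and the name `table_to_graph` are choices made for this illustration.

```python
SPECIES = "abcdef"

# Upper-triangular entries: ROWS[x][i] is the test result for x and the
# (i + 1)-th species after x in SPECIES.
ROWS = {
    "a": [1, 1, 0, 0, 0],
    "b": [1, 0, 0, 0],
    "c": [1, 0, 0],
    "d": [1, 1],
    "e": [1],
}


def table_to_graph(species, rows):
    """Build an undirected graph (dict of neighbor sets) from the 0/1 table."""
    graph = {s: set() for s in species}
    for i, x in enumerate(species):
        for offset, close in enumerate(rows.get(x, [])):
            y = species[i + 1 + offset]
            if close:                    # 1 means "genetically close": add an edge
                graph[x].add(y)
                graph[y].add(x)
    return graph


graph = table_to_graph(SPECIES, ROWS)
for node in SPECIES:
    print(node, sorted(graph[node]))
```

In the resulting graph, {a, b, c} and {d, e, f} are triangles joined by the single edge c to d. Step 3, finding a tree whose k-leaf power is this graph, is the hard part and is not attempted here.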
Reconstruction
• Linear time solution exists for k = 3 [Brandstädt and Le, 2006]
• … and k = 4 [Brandstädt et al, 2006]
• An open problem for k ≥ 5
• Severely limits analysis capability.
Assignment 3: Phylogeny Reconstruction from Discrete Genetic Data
• Genetic test returns a discrete value (k=2,3,4,…) denoting distance between x and y in tree.
• Data collected to form a distance grid.
• Create k-leaf powers incrementally.
Reconstruction
Difference Summary Table
    a  b  c  d  e  f
a      2  3  5  6  6
b         3  5  6  6
c            4  5  5
d               3  3
e                  2
f
(figure: k-leaf power panels for k values 2 through 6)
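The incremental construction can be sketched directly from the distance grid: for each k, keep the pairs whose distance is at most k. The distances below are copied from the table on this slide; the dictionary layout and the name `incremental_powers` are illustrative choices.

```python
DISTANCES = {
    ("a", "b"): 2, ("a", "c"): 3, ("a", "d"): 5, ("a", "e"): 6, ("a", "f"): 6,
    ("b", "c"): 3, ("b", "d"): 5, ("b", "e"): 6, ("b", "f"): 6,
    ("c", "d"): 4, ("c", "e"): 5, ("c", "f"): 5,
    ("d", "e"): 3, ("d", "f"): 3,
    ("e", "f"): 2,
}


def incremental_powers(distances):
    """Map each k (from 2 up to the largest distance) to its k-leaf power edges."""
    powers = {}
    for k in range(2, max(distances.values()) + 1):
        powers[k] = sorted(pair for pair, d in distances.items() if d <= k)
    return powers


for k, edges in incremental_powers(DISTANCES).items():
    print(f"k = {k}: {len(edges)} edges")
# k = 2: 2 edges
# k = 3: 6 edges
# k = 4: 7 edges
# k = 5: 11 edges
# k = 6: 15 edges
```

With these values the k = 4 graph has exactly the seven pairs marked 1 in the binary table on the earlier Reconstruction slide, and at k = 6 the graph on six species is complete.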
Incremental k-leaf power
• Distance 2: direct neighbors
• Distance 3: close relatives
• Distance 4: tree complete
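The "distance 2 means direct neighbors" observation suggests a simple reconstruction sketch. This is not necessarily the algorithm taught in Assignment 3; it is a sibling-merging illustration that assumes the grid holds exact path lengths in a rooted tree where every internal node has two children. Two nodes at distance 2 are joined under a new parent, the parent is one edge closer to everything else, and the process repeats. The distances are the ones from the discrete-data table; `reconstruct` and the bracketed string output are illustrative choices.

```python
DISTANCES = {
    ("a", "b"): 2, ("a", "c"): 3, ("a", "d"): 5, ("a", "e"): 6, ("a", "f"): 6,
    ("b", "c"): 3, ("b", "d"): 5, ("b", "e"): 6, ("b", "f"): 6,
    ("c", "d"): 4, ("c", "e"): 5, ("c", "f"): 5,
    ("d", "e"): 3, ("d", "f"): 3,
    ("e", "f"): 2,
}


def reconstruct(distances):
    """Rebuild a tree, written as a bracketed string, from exact path lengths."""
    dist = {frozenset(pair): d for pair, d in distances.items()}
    nodes = {name for pair in dist for name in pair}
    while len(nodes) > 1:
        # Direct neighbors (siblings) are exactly the pairs at distance 2.
        siblings = sorted(tuple(sorted(pair)) for pair, d in dist.items() if d == 2)
        if not siblings:
            raise ValueError("no pair at distance 2; assumptions violated")
        x, y = siblings[0]
        parent = f"({x},{y})"
        others = nodes - {x, y}
        for z in others:
            # The parent is one edge closer to z than either child is.
            dist[frozenset((parent, z))] = dist[frozenset((x, z))] - 1
        # Forget every entry that mentions the merged children.
        dist = {pair: d for pair, d in dist.items()
                if x not in pair and y not in pair}
        nodes = others | {parent}
    return nodes.pop()


print(reconstruct(DISTANCES))   # (((a,b),c),((e,f),d))
```

Real genetic data is noisy, so distances rarely fit a tree this exactly; the sketch only shows why exact discrete distances make the problem much easier than the 0/1 case.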
Literature Review of Related Methods
• Additive and Ultrametric Trees [Wu and Chao, 2004]
• Minimum Increment Evolution Tree (MIET) [Wu and Chao, 2004]
• Evolutionary Tree Insertion with Minimum Increment (ETIMI) [Wu and Chao, 2004]
• Maximum Homeomorphic Agreement Subtree (MHT) [Gąsieniec et al, 1997]
• Maximum Agreement Subtree (MAST) [Gąsieniec et al, 1997]
• Maximum Inferred Consensus Tree (MICT) [Lingas et al, 1999]
• Maximum Inferred Local Consensus Tree (MILCT) [Lingas et al, 1999]
• Balanced Randomized Tree Splitting (BRTS) [Kao et al, 1999]
• Merging Partial Evolution Trees (MPET) [Lingas et al, 1999]
Future Work
• Additional class assignments
• Implement the Phylogeny Reconstruction Simulator using NetworkX
• Remove redundant node and isomorphic approximation assumptions
References
[Br06a] Brandstädt, A. and V. B. Le (2006). “Structure and Linear Time Recognition of 3-Leaf Powers”, Information Processing Letters (98), 133-138.
[Br06b] Brandstädt, A., V.B. Le, and R. Sritharan (2005). “Structure and Linear Time Recognition of 4-Leaf Powers”, Unpublished manuscript.
[Fe04] J. Felsenstein (2004). Inferring Phylogenies, Sinauer Associates, Inc.
[Ga97] L. Gąsieniec, J. Jansson, A. Lingas, and A. Östlin (1997), “On the complexity of computing evolutionary trees,” Proceedings of Computing and Combinatorics, Third Annual International Conference, COCOON ’97, Shanghai, China, pp. 134-145, Aug 97.
[Ka99] Y. Kao, A. Lingas, and A. Östlin (1999), “Balanced Randomized Tree Splitting with Applications to Evolutionary Tree Constructions,” Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science, Trier, Germany, pp. 184-196, March 1999.
[Li99] A. Lingas, H. Olsson, and A. Östlin (1999), “Efficient Merging, Construction, and Maintenance of Evolutionary Trees,” Proceedings of the 26th International Colloquium on Automata, Languages, and Programming (ICALP) ’99, Prague, Czech Republic, pp. 544-553, July 1999.
[Re99] Read, R.C. and R.J. Wilson (1999). An Atlas of Graphs, Oxford Science Publications.
[Wu04] Wu, B.Y. and K.M. Chao (2004). Spanning Trees and Optimization Problems. Chapman & Hall/CRC.
Thank You
• The full text of the paper, assignments, this presentation, and student examples are available on the author’s web page: http://www.cs.kent.edu/~dzeller/research