Phylogenetic Tree Construction
Mark Eldridge, Andrew Larsen, Michael Lollis, Thomas Marley, Michael Smith
Intro page (overview of talk):
• Tom -- Intro to the topic
• Andrew -- Reading in objects from a FASTA file and MUSCLE compare
• Mike S. -- Getting the Matrix
• Mark -- Determining the Matrix
• Mike L. -- Building the Tree
• Conclusion -- Some examples of our program in action
• Q & A
We set out to...
Turn this:
Label   Sequence
A       GATTCCAG
B       GATTCTGG
C       GGTTCCGG
D       GGTTTCGG
E       GGCTCCGA
F       GGCCCCGG
into this: a phylogenetic tree of sequences A through F.
How? UPGMA: Unweighted Pair Group Method with Arithmetic Mean
• Construct a distance matrix (pairwise between groups)
• Merge the two closest groups
• Repeat the two steps above until only two groups remain
• Note: the distance to a merged group is the arithmetic mean of the distances to all of its members
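The loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the talk: the function name `upgma`, the `frozenset`-keyed distance dictionary, and the nested-tuple result are all assumptions made for this sketch. The distances are the five-taxon values from the worked matrix later in the talk. The sketch also performs the final merge of the last two groups so that a single rooted tree comes back.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster `labels` by UPGMA.

    `dist` maps frozenset({a, b}) -> distance for every pair of labels.
    Returns a nested tuple that records the merge order (hypothetical format).
    """
    # Each current group is keyed by its label (or nested tuple) and lists its leaves.
    clusters = {label: [label] for label in labels}
    d = dict(dist)
    while len(clusters) > 1:
        # Find the two closest groups (first pair wins on ties).
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = len(clusters[a]), len(clusters[b])
        # Distance from the merged group to every other group: the mean over
        # all members, i.e. the two old distances weighted by group size.
        for other in clusters:
            if other == a or other == b:
                continue
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        members = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        clusters[merged] = members
    (tree,) = clusters
    return tree


labels = ["A", "B", "C", "D", "E"]
pairs = {
    ("A", "B"): 4, ("A", "C"): 4, ("A", "D"): 2, ("A", "E"): 3,
    ("B", "C"): 3, ("B", "D"): 1, ("B", "E"): 4,
    ("C", "D"): 2, ("C", "E"): 5, ("D", "E"): 3,
}
dist = {frozenset(pair): value for pair, value in pairs.items()}
tree = upgma(labels, dist)
print(tree)
```

B and D merge first (distance 1), C then joins them (2.5), A and E pair up (3), and the last merge joins the two remaining groups.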
FASTA file and MUSCLE compare
• Format, standards, and lots of data...
• We figured out how to read in "SeqIO objects"
• Now that we have the objects, what do we do with them?
• MUSCLE power.
• So now what do we have?
• A pretty ideal way to access a semi-large dataset.
• We normalized the data for later functions and computing.
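For readers who have not seen the format, here is a dependency-free sketch of reading FASTA records. The talk used Biopython, where `SeqIO.parse(handle, "fasta")` yields record objects with `.id` and `.seq`. The `read_fasta` helper below is a hypothetical stand-in, and the upper-casing is only one guess at what "normalizing" might involve. Alignment with MUSCLE is a separate step and is not shown.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # The identifier is the first word after '>'.
            name, chunks = (header[0] if header else ""), []
        else:
            # Assumed normalization: upper-case every residue.
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


# Two tiny in-memory records; a real run would open a .fasta file instead.
example = io.StringIO(">A demo record\nGATT\nCCAG\n>B\ngattctgg\n")
records = list(read_fasta(example))
print(records)
```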
Getting the Matrix
• Each object has an ID to identify the gene, and the sequence
• MUSCLE has already aligned the sequences to be the same length
• The compare function does a character-to-character comparison of similarities
• Using NumPy, we created a matrix and filled it with the first run of comparisons
• It was then in a format for successive similarity calls
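A rough sketch of this step, using the six example sequences from the start of the talk. Several things here are assumptions: the score is taken to be a count of mismatched positions (the worked example picks the smallest entry, which suggests the entries behave like distances), only the lower triangle is filled, and -1 marks unused cells. The talk built the matrix with NumPy; plain nested lists are used here only to keep the sketch dependency-free, and the function names are hypothetical.

```python
def mismatches(seq_a, seq_b):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def distance_matrix(records):
    """Build a lower-triangular matrix of pairwise mismatch counts.

    `records` is a list of (identifier, sequence) pairs; unused cells hold -1.
    """
    n = len(records)
    matrix = [[-1] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            matrix[i][j] = mismatches(records[i][1], records[j][1])
    return matrix


records = [
    ("A", "GATTCCAG"), ("B", "GATTCTGG"), ("C", "GGTTCCGG"),
    ("D", "GGTTTCGG"), ("E", "GGCTCCGA"), ("F", "GGCCCCGG"),
]
matrix = distance_matrix(records)
for (label, _), row in zip(records, matrix):
    print(label, row)
```

With these sequences the smallest entry is 1, between C and D, so those two would be merged first.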
Recursive Function to Determine Next Matrix

First List
0: 'A'  1: 'B'  2: 'C'  3: 'D'  4: 'E'

First Matrix (-1 marks an unfilled cell)
       A     B     C     D     E
A     -1    -1    -1    -1    -1
B      4    -1    -1    -1    -1
C      4     3    -1    -1    -1
D      2     1     2    -1    -1
E      3     4     5     3    -1

Min = 1
Min = (3, 1) -> (B, D)
For new matrix, append D onto B.
BD to A = (4 + 2) / 2 = 3
BD to C = (3 + 2) / 2 = 2.5
BD to E = (4 + 3) / 2 = 3.5

Second List
0: 'A'  1: '(B, D)'  2: 'C'  3: 'E'

Second Matrix
       A     BD    C     E
A     -1    -1    -1    -1
BD     3    -1    -1    -1
C      4     2.5  -1    -1
E      3     3.5   5    -1

Min = 2.5
Min = (2, 1) -> (BD, C)

Next Matrix, Initial Formula
       A     BDC   E
A     -1    -1    -1
BDC    3.5  -1    -1
E      3     4.25 -1

Next Matrix, Weighted Formula
       A     BDC   E
A     -1    -1    -1
BDC    3.33 -1    -1
E      3     4    -1
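The difference between the two formulas is easiest to see as arithmetic. The sketch below recomputes the BDC row both ways from the first matrix. It assumes the "initial" formula is the plain mean of the two merged groups' distances and the "weighted" formula is the mean over every individual member. The function names are made up for this illustration.

```python
# Pairwise distances copied from the first matrix on the slide.
first = {
    ("A", "B"): 4, ("A", "C"): 4, ("A", "D"): 2, ("A", "E"): 3,
    ("B", "C"): 3, ("B", "D"): 1, ("B", "E"): 4,
    ("C", "D"): 2, ("C", "E"): 5, ("D", "E"): 3,
}


def d(x, y):
    """Look up the distance between two single labels, in either order."""
    return first[tuple(sorted((x, y)))]


def initial_formula(dist_group_1, dist_group_2):
    """Plain mean of the two merged groups' distances to the target."""
    return (dist_group_1 + dist_group_2) / 2


def weighted_formula(members, other):
    """Mean over every individual member's distance to the target."""
    return sum(d(m, other) for m in members) / len(members)


# BD was formed from two single labels, so both formulas agree at that step.
bd_to_a = weighted_formula(["B", "D"], "A")  # (4 + 2) / 2
bd_to_e = weighted_formula(["B", "D"], "E")  # (4 + 3) / 2

# Merging C into BD is where the two formulas diverge.
initial_a = initial_formula(bd_to_a, d("C", "A"))    # (3 + 4) / 2
initial_e = initial_formula(bd_to_e, d("C", "E"))    # (3.5 + 5) / 2
weighted_a = weighted_formula(["B", "D", "C"], "A")  # (4 + 2 + 4) / 3
weighted_e = weighted_formula(["B", "D", "C"], "E")  # (4 + 3 + 5) / 3

print(initial_a, initial_e)
print(round(weighted_a, 2), weighted_e)
```

The weighted version is the one that matches the UPGMA rule of averaging over all members. The plain mean lets C count as much as B and D combined, which is the behavior usually called WPGMA.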
What is DendroPy and why did we use it?
• DendroPy is a Python library of functions that allow the user to create phylogenetic tree structures and display them.
• Phylo vs. DendroPy
• Phylo was "too powerful" and didn't allow for much "under the hood" code.
• DendroPy provided more basic functionality.
How did we build the tree?
• Build upon a Newick-formatted string each time Mark's algorithm recursed.
• Draw an ASCII representation of the phylogenetic tree.
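One way the Newick string could grow with each merge is sketched below. It follows the five-label worked example, where the label at the larger index is appended onto the label at the smaller one. The `merge_labels` helper is hypothetical, and branch lengths are left out for brevity.

```python
def merge_labels(labels, row, col):
    """Return a new label list with labels[row] folded into labels[col].

    (row, col) is the position of the minimum in the lower-triangular
    matrix, so row > col. The merged entry is a Newick group "(x,y)".
    """
    merged = list(labels)
    merged[col] = "(" + labels[col] + "," + labels[row] + ")"
    del merged[row]
    return merged


labels = ["A", "B", "C", "D", "E"]
step1 = merge_labels(labels, 3, 1)  # Min at (3, 1): D is appended onto B
step2 = merge_labels(step1, 2, 1)   # Min at (2, 1): C joins (B,D)
step3 = merge_labels(step2, 2, 0)   # Weighted matrix: min is 3 at (2, 0), E joins A
step4 = merge_labels(step3, 1, 0)   # Final merge of the last two groups
newick = step4[0] + ";"             # A Newick tree ends with a semicolon
print(newick)
```

If DendroPy 4 is installed, a string like this should load with `dendropy.Tree.get(data=newick, schema="newick")`, and the resulting tree can be drawn as ASCII with its `print_plot()` method.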