Marc A. Schaub, February 22nd, 2008. CS 262 Problem Session. Problem Set 2 Solutions. Tree Reconstruction Algorithms. Based on slides by Andreas Sundquist and George Asimenos (problem 1) and Serafim Batzoglou (tree reconstruction).
Problem 1(b) • Baum-Welch: Suppose Forward: Similar for Backward
Problem 1(b) • Baum-Welch: Given Inductive step: After training:
Problem 1(b) • Viterbi: The Viterbi parse may arbitrarily choose state k over state k’. (Diagram labels: $A_{kl}$, $A_{k'l}$, $a'_{kl}$, $a'_{k'l}$.)
Problem 1(c) Viterbi
Additive Distances
Given a tree, a distance measure is additive if the distance between any pair of leaves is the sum of the lengths of the edges connecting them.
Given a tree T and additive distances $d_{ij}$, the edge lengths can be uniquely reconstructed:
• Find two neighboring leaves i, j with common parent k
• Place the parent node k at distance $d_{km} = \frac{1}{2}(d_{im} + d_{jm} - d_{ij})$ from any node m ≠ i, j
(Figure: example tree with labels 1 through 12 and the distance $d_{1,4}$ marked.)
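The parent-placement formula can be checked on a small case. The sketch below uses a hypothetical five-edge tree (leaves a, b under parent k; leaves c, e under parent p); the tree, its edge lengths, and the function name `parent_distance` are illustrative assumptions, not taken from the slides.

```python
def parent_distance(d, i, j, m):
    """Distance from the parent k of neighboring leaves i, j to another node m."""
    return 0.5 * (d[i][m] + d[j][m] - d[i][j])

# Hypothetical tree (assumption): a-k = 2, b-k = 3, k-p = 1, c-p = 4, e-p = 5.
# Leaf-to-leaf distances are sums of edge lengths along the connecting path.
pairs = {("a", "b"): 5, ("a", "c"): 7, ("a", "e"): 8,
         ("b", "c"): 8, ("b", "e"): 9, ("c", "e"): 9}
d = {}
for (x, y), v in pairs.items():
    d.setdefault(x, {})[y] = v
    d.setdefault(y, {})[x] = v

d_kc = parent_distance(d, "a", "b", "c")  # k-p plus p-c = 1 + 4
d_ke = parent_distance(d, "a", "b", "e")  # k-p plus p-e = 1 + 5
d_ak = d["a"]["c"] - d_kc                 # edge a-k, by additivity
print(d_kc, d_ke, d_ak)  # 5.0 6.0 2.0
```

Once k is placed, the edge from leaf a to k follows from additivity as $d_{ak} = d_{ac} - d_{kc}$, which recovers the length 2 used to build the example.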
Neighbor-Joining
• Guaranteed to produce the correct tree if the distance is additive
• May produce a good tree even when the distance is not additive
Step 1: Finding neighboring leaves
Define $D_{ij} = (N - 2)\, d_{ij} - \sum_{k \neq i} d_{ik} - \sum_{k \neq j} d_{jk}$
Claim: The above “magic trick” ensures that $D_{ij}$ is minimal iff i, j are neighbors
(Figure: four-leaf tree with leaves 1, 2, 3, 4 and edge lengths 0.1, 0.1, 0.1, 0.4, 0.4.)
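A small numeric check shows why the closest pair of leaves is not always a pair of neighbors. The sketch below uses the edge lengths listed on the slide, but the topology is an assumption: leaves 1 and 3 share a parent (edges 0.1 and 0.4), leaves 2 and 4 share a parent (edges 0.1 and 0.4), and the internal edge has length 0.1. Distances are stored in units of 0.1 so the arithmetic stays exact; `nj_criterion` is an illustrative name.

```python
from itertools import combinations

def nj_criterion(d):
    """Return D[i, j] = (N - 2) * d_ij - sum_k d_ik - sum_k d_jk for all leaf pairs."""
    leaves = sorted(d)
    n = len(leaves)
    total = {i: sum(d[i][k] for k in leaves if k != i) for i in leaves}
    return {(i, j): (n - 2) * d[i][j] - total[i] - total[j]
            for i, j in combinations(leaves, 2)}

# Assumed topology, in units of 0.1: leaf edges 1 -> 1, 2 -> 1, 3 -> 4, 4 -> 4,
# internal edge 1; neighbors are (1, 3) and (2, 4).
pair_dist = {(1, 2): 3, (1, 3): 5, (1, 4): 6, (2, 3): 6, (2, 4): 5, (3, 4): 9}
d = {i: {} for i in range(1, 5)}
for (i, j), v in pair_dist.items():
    d[i][j] = d[j][i] = v

D = nj_criterion(d)
closest = min(pair_dist, key=pair_dist.get)       # pair with smallest raw distance
best = [p for p in D if D[p] == min(D.values())]  # pairs minimizing D
print(closest)  # (1, 2)
print(best)     # [(1, 3), (2, 4)]
```

Under this assumed topology, leaves 1 and 2 have the smallest raw distance even though they are not neighbors, while the corrected criterion $D_{ij}$ is minimized exactly by the two true neighbor pairs.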
Algorithm: Neighbor-joining
Initialization: Define T to be the set of leaf nodes, one per sequence. Let L = T.
Iteration:
• Pick i, j such that $D_{ij}$ is minimal
• Define a new node k, and set $d_{km} = \frac{1}{2}(d_{im} + d_{jm} - d_{ij})$ for all m ∈ L
• Add k to T, with edges of lengths $d_{ik} = \frac{1}{2}(d_{ij} + r_i - r_j)$ and $d_{jk} = d_{ij} - d_{ik}$, where $r_i = (N - 2)^{-1} \sum_{k \neq i} d_{ik}$
• Remove i, j from L; add k to L
Termination: When L consists of two nodes i, j, add the edge between them, of length $d_{ij}$
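The iteration above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: N is taken to be the current size of L at each step, ties in $D_{ij}$ are broken by picking the first minimal pair, internal nodes get the made-up names `node0`, `node1`, ..., and the input tree is a hypothetical one (a-k = 2, b-k = 3, k-p = 1, c-p = 4, e-p = 5).

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """names: leaf labels; dist: dict mapping frozenset({a, b}) -> distance.
    Returns the tree as a list of edges (node, node, length)."""
    d = dict(dist)
    active = list(names)  # the set L
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)  # assumption: N is the current size of L
        total = {i: sum(d[frozenset((i, k))] for k in active if k != i)
                 for i in active}
        # Pick the pair minimizing D_ij = (N - 2) d_ij - sum_k d_ik - sum_k d_jk.
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]])
        k = f"node{next_id}"  # hypothetical naming scheme for internal nodes
        next_id += 1
        dij = d[frozenset((i, j))]
        r_i, r_j = total[i] / (n - 2), total[j] / (n - 2)
        d_ik = 0.5 * (dij + r_i - r_j)
        edges += [(i, k, d_ik), (j, k, dij - d_ik)]
        active = [m for m in active if m not in (i, j)]
        for m in active:  # distances from the new node k to the remaining nodes
            d[frozenset((k, m))] = 0.5 * (d[frozenset((i, m))] + d[frozenset((j, m))] - dij)
        active.append(k)
    i, j = active
    edges.append((i, j, d[frozenset((i, j))]))  # final edge between the last two nodes
    return edges

# Additive distances from the hypothetical tree described in the lead-in.
raw = {("a", "b"): 5, ("a", "c"): 7, ("a", "e"): 8,
       ("b", "c"): 8, ("b", "e"): 9, ("c", "e"): 9}
dist = {frozenset(p): v for p, v in raw.items()}
edges = neighbor_joining(["a", "b", "c", "e"], dist)
for edge in edges:
    print(edge)
```

Because the input distances are additive, the returned edge lengths (2, 3, 4, 5 for the leaves and 1 for the internal edge) match the tree that generated them.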
Parsimony: a direct method that does not use distances
• One of the most popular methods
• GIVEN a multiple alignment
• FIND the tree and history of substitutions explaining the alignment
Idea: Find the tree that explains the observed sequences with a minimal number of substitutions
Two computational subproblems:
• Find the parsimony cost of a given tree (easy)
• Search through all tree topologies (hard)
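To give a sense of why the topology search is the hard subproblem, the sketch below counts unrooted binary tree topologies on n labeled leaves using the standard formula $(2n - 5)!! = 3 \cdot 5 \cdots (2n - 5)$. The formula is a known combinatorial fact rather than something stated on the slides, and the function name is illustrative.

```python
def num_unrooted_topologies(n):
    """Number of unrooted binary tree topologies on n >= 3 labeled leaves."""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiplies 3 * 5 * ... * (2n - 5)
        count *= k
    return count

for n in (4, 5, 10):
    print(n, num_unrooted_topologies(n))
# 4 3
# 5 15
# 10 2027025
```

Already at ten sequences there are over two million candidate topologies, so exhaustive search quickly becomes impractical and heuristic search is used instead.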
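Example: Parsimony cost of one column
(Figure: a tree with leaf characters A, B, A, A. One internal node is labeled {A, B}, where the cost is incremented, C += 1; the other two internal nodes are labeled {A}. Final cost C = 1.)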
Parsimony Scoring
Given a tree and an alignment column u, label the internal nodes to minimize the number of required substitutions.
Initialization: Set cost C = 0; node k = 2N − 1 (the root)
Iteration:
• If k is a leaf, set $R_k = \{ x_k[u] \}$ // $R_k$ is simply the character of the kth species
• If k is not a leaf, let i, j be the daughter nodes:
  Set $R_k = R_i \cap R_j$ if the intersection is nonempty
  Set $R_k = R_i \cup R_j$, and C += 1, if the intersection is empty
Termination: The minimal cost of the tree for column u is C
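Example
(Figure: a tree with leaf characters A, A, A, A, B, B, A, B and leaf sets {A}, {A}, {A}, {A}, {B}, {B}, {A}, {B}; the internal nodes are labeled {B}, {A, B}, {A}, {B}, {A}, {A, B}, {A}.)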
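The scoring procedure can be sketched as a short recursion over a nested-tuple tree. This is an illustrative variant: it recurses from the root instead of indexing nodes by k, and the balanced topologies used for the two examples are assumptions, since only the leaf characters and node labels survive from the figures.

```python
def parsimony_cost(tree):
    """tree: a leaf character (str) or a pair (left, right) of subtrees.
    Returns (R, C): the candidate set at this node and the substitution count."""
    if isinstance(tree, str):
        return {tree}, 0  # leaf: R_k is just the observed character
    left, right = tree
    r_i, c_i = parsimony_cost(left)
    r_j, c_j = parsimony_cost(right)
    common = r_i & r_j
    if common:  # nonempty intersection: no substitution needed here
        return common, c_i + c_j
    return r_i | r_j, c_i + c_j + 1  # empty intersection: take the union, C += 1

# Four-leaf column A, B, A, A on an assumed balanced topology.
small = (("A", "B"), ("A", "A"))
# Eight-leaf column A, A, A, A, B, B, A, B on an assumed balanced topology.
large = ((("A", "A"), ("A", "A")), (("B", "B"), ("A", "B")))

root_small, cost_small = parsimony_cost(small)
root_large, cost_large = parsimony_cost(large)
print(cost_small)  # 1
print(cost_large)  # 2
```

With the assumed eight-leaf topology, the internal labels come out as three {A}, two {B}, and two {A, B}, the same collection of labels listed for the example figure.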