1 / 27

Presentation partially taken from Dan Geiger, modified by Benny Chor.

Maximum Likelihood ( ML ) Parameter Estimation with applications to inferring phylogenetic trees Comput. Genomics, lecture 7a. Background reading: Durbin et al Chapter 8. Presentation partially taken from Dan Geiger, modified by Benny Chor. Our Probabilistic Model (Reminder).

Download Presentation

Presentation partially taken from Dan Geiger, modified by Benny Chor.

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic treesComput. Genomics, lecture 7a Background reading: Durbin et al Chapter 8. Presentation partially taken from Dan Geiger, modified by Benny Chor. .

  2. Our Probabilistic Model(Reminder) A single edge is a fairly boring tree… Now we don’t know the states at internal node(s), nor the edge parameters pe1, pe2, pe3 YXYXX XXYXY pe2 pe1 pe3 ????? YYYYX

  3. Maximum Likelihood Maximize likelihood (over edge parameters), while averaging over states of unknown, internal node(s). YXYXX XXYXY pe2 pe1 pe3 ????? YYYYX

  4. Maximum Likelihood (2) Consider the phylogenetic tree to be a stochastic process. XXX Unobserved XXX XYX XXX XYX Observed XXY YYX The probability of transition from character a to character b along edge e is given by parameters pe. Given the complete tree, the likelihood of data is determined by the values of the pe‘s.

  5. X X X X Y X X X Y Y Y X Maximum Likelihood (3) We assume each site evolves independently of the others. Pr(D|Tree, )=iPr(D(i)|Tree, ) This allows us to decompose the likelihood of the data (sequences at leaves) to the product of each site, given the (same) tree and edge probabilities. This is the first key to an efficient DP algorithm for the tiny ML problem. (Felsenstein, 1981). Will now show how Pr(D(i)|Tree, ) is efficiently computed.

  6. Computing the Likelihood Let T be a binary tree with subtrees T1 and T2. Let Lx(D | T, ) bethe likelihood of T with X at T’sroot. Define LY(D | T, ) similarly. X Y p1 p2 tree1 tree2

  7. X Y p1 p2 tree1 tree2 Computing the Likelihood (2) By the definition of likelihood (sum over internal assignments), L(D | T, ) = Lx(D | T, ) + LY(D | T, ) This is the second key to an efficient DP algorithm for the tiny ML problem. (Felsenstein, 1981)

  8. Computing Lx(D | Tree,  ) X p1 p2 Y Y X X tree1 tree2

  9. Computing Lx(D | Tree,  ) Lx(D | Tree,  ) = ( Lx(D | Tree1,  )(1- p1)+ LY(D | Tree1,  ) p1 ) * ( Lx(D | Tree2,  )(1- p2)+ LY(D | Tree2,  ) p2 ) X p1 p2 Y Y X X tree1 tree2

  10. The Dynamic Programming Algorithm The algorithm starts from the leaves and proceeds up towards the root. For each sub-tree visited, keep both Lx(D | sub-tree, )and LY(D | sub-tree, ). This enables computing Lx andLYlikelihoodsw.r.t T using5 multiplications and 2 additions. X p1 p2 Y Y X X tree1 tree2

  11. The Dynamic Programming Algorithm The algorithm thus takes O(1) floating point operations per internal node of the tree. If there are n leaves, the number of internal nodes is n-1,so overall complexity is O(n). X p1 p2 Y Y X X tree1 tree2

  12. What About Initialization? Well, this is easy. If T is a leaf that contains X, then Lx(D | T, ) = 1, andLx(D | T, ) = 0. ( the case where T is a leaf that contains Y is left as a bonus assignment  ) X p1 p2 Y Y X X tree1 tree2

  13. A Few More Question Marks • What if tree is not binary? Would it not effect complexity… • What if tree unrooted? Can show symmetry of substitution • probabilities implies likelihood invariant under choice of • roots. • Numerical questions • (underflow, stability). • Non binary alphabet. X p1 p2 Y Y X X tree1 tree2

  14. From Two to Four States Model Maximize likelihood (over edge parameters), while averaging over states of unknown, internal node(s). But what do the edge probabilities mean now? AAGTT ACCGT pe2 pe1 pe3 ????? CGGCT

  15. From Two to Four States Model(2) • So far, our models consisted of a “regular” tree, where in addition, edges are assigned substituion probabilities. • For simplicity, assumed our “DNA” has only two states, say X and Y. • If edge eis assigned probability pe, this means that the probability of substitution (X Y) across e is pe. • Now a single pecan no longerexpressall 16-4=12 possiblesubstitution probabilities.

  16. From Two to Four States Model • Now a singlepecan no longerexpressall 16-4=12 possiblesubstitution probabilities. • The most general model will indeed have 12 independent parameters per edge, e.g.pe (C->A), pe (T->A),etc. It need not be symmetric. • Still, most popular models are symmetric, and use far less parameters per edge. • For example, the Jukes-Cantor substitution model assumes equal substitution probability of any unequal pair of nucleotides (across each edge separately).

  17. A G   C T  The Jukes-Cantor model (1969) Jukes-Cantor assume equal prob. of change: 1-3

  18. Tiny ML on Four States : Like Before, Only More Cases Can handle DNA subst. models, AA subst. models, ... Constant (per node) depends on alphabet size. A C G T P(GC) *PC(left subtree) A CGT A CGT

  19. Kimura’s K2P model (1980) Jukes-Cantor model does not take into account that transitions rates (between purines) AG and (between pyrmidine) CT are different from transversions rates (AC, AT, CG, GT). Kimura 2 parameter model uses a different substitution matrix:

  20. Kimura’s K2P model (Cont) Leading using similar methods to: Where:

  21. Additional Models There are yet more involved DNA substitution models, responding to phenomena occurring in DNA. Some of the models (like Jukes-Cantor, Kimura 2 parameters, and others) exhibit a “group-like” structure that helps analysis. The most general of these is a matrix where all rates of change are distinct (12 parameters). For AA (proteins), models typically have less structure. Further discussion is out of scope for this course. Please refer to the Molecular Evolution course (life science).

  22. Back to the 2 States Model Showed efficient solution to the tiny ML problem. Now want to efficiently solve the tiny AML problem. YXYXX XXYXY pe2 pe1 pe3 ????? YYYYX

  23. Two Ways to Go In the second version (maximize over states of internal nodes) we are looking for the “most likely” ancestral states. This is called ancestral maximum likelihood (AML). In some sense AML is “between” MP (having ancestral states) and ML (because the goal is still to maximize likelihood). YXYXX XXYXY pe2 pe1 pe3 ????? YYYYX

  24. Two Ways to Go In some sense AML is “between” MP (having ancestral states) and ML (because the goal is still to maximize likelihood). The tiny AML algorithm will be like Fitch small MP algorithm: It goes up to the root, then back down to the leaves. YXYXX XXYXY pe2 pe1 pe3 ????? YYYYX

  25. X Y p1 p2 tree1 tree2 Computing the Ancestral Likelihood Let T be a binary tree with subtrees T1 and T2. Let LE(D | T, ) bethe ancestral likelihood of T with E (X or Y) at the node of T’sfather. E p

  26. Computing the Ancestral Likelihood (2) By the definition of ancestral likelihood (maximizing over internal assignments), LX(D| T, ) = max((1-p)Lx(D | tree1, ) * Lx(D | tree2, ) , pLY(D | tree1, )* LY(D | tree2, )) This is key to an efficient DP algorithm for the tiny AML problem (Pupko et. al, 2000) X p X Y p1 p2 tree1 tree2

  27. Computing the Ancestral Likelihood (2) Boundary conditions: At leaves LX(D| T, ) = 1-p if leaf label is X, p otherwise. At root: We pick label E (XorY)that maximizes LE(D | tree1, ) LE(D | tree2, ). We now go down the tree. At each node we pick the label that maximizes the likelihood, given the (known) label of father. Total run time is O(n). X p X Y p1 p2 tree1 tree2

More Related