1 / 23

Multiple Sequence Alignment (I)

Multiple Sequence Alignment (I). (Lecture for CS498-CXZ Algorithms in Bioinformatics) Oct. 4, 2005 ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign. Outline. Motivation Scoring of multiple sequence alignments Algorithms Dynamic programming

shea
Download Presentation

Multiple Sequence Alignment (I)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multiple Sequence Alignment (I) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Oct. 4, 2005 ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

  2. Outline • Motivation • Scoring of multiple sequence alignments • Algorithms • Dynamic programming • Progressive alignment (next class)

  3. Why Multiple Alignments? • Characterize protein families: Identify shared regions of homology in a multiple sequence alignment • Determination of the consensus sequence of several aligned sequences. • Help predict the secondary and tertiary structures of new sequences • Help predict the function of new sequences • Preliminary step in molecular evolution analysis using phylogenetic trees.

  4. Example of Multiple Alignment Multiple sequence alignment of 7 neuroglobins using clustalx (Slide from Craig A. Struble)

  5. 4 Basic Questions in Multiple Alignment Q1: How should we define s? Q2: How should we define A? Model: scoring function s: A X1=x11,…,x1m1 X1=x11,…,x1m1 Possible alignments of all Xi’s: A ={a1,…,ak} Find the best alignment(s) X2=x21,…,x2m2 X2=x21,…,x2m2 … … S(a*)= 21 XN=xN1,…,xNmN XN=xN1,…,xNmN Q4: Is the alignment biologically Meaningful? Q3: How can we find a* quickly?

  6. Defining Multi-Sequence Alignment • We may generalize our definition of pairwise sequence alignment • Alignment of 2 sequences is represented as a 2-row matrix • In a similar way, we represent alignment of 3 sequences as a 3-row matrix A T _ G C G _A _ C G T _ AA T C A C _ A • A column must have at least one nucleotide • Question: How many possible global alignments are there for 3 sequences each of length 2?

  7. How do we score a multiple alignment?

  8. Scoring a Multiple Alignment • Ideally, it should be based on evolutionary models • In practice, • We often assume columns are independent • Use “Sum of Pairs” (SP scores) G is the gap score

  9. Minimum Entropy Scoring Intuition: A perfectly aligned column has one single symbol (least uncertainty) A poorly aligned column has many distinct symbols (high uncertainty) Count of symbol a in column i This is related to the HMM formulation of the alignment problem, which we will cover later …

  10. Entropy: Example Best case Worst case

  11. Entropy of an Alignment: Example column entropy: -( pAlogpA+ pClogpC + pGlogpG + pTlogpT) • Column 1 = -[1*log(1) + 0*log0 + 0*log0 +0*log0] = 0 • Column 2 = -[(1/4)*log(1/4) + (3/4)*log(3/4) + 0*log0 + 0*log0] = -[ (1/4)*(-2) + (3/4)*(-.415) ] = +0.811 • Column 3 = -[(1/4)*log(1/4)+(1/4)*log(1/4)+(1/4)*log(1/4) +(1/4)*log(1/4)] = 4* -[(1/4)*(-2)] = +2 • Alignment Entropy = 0 + 0.811 + 2 = +2.811

  12. How can we find a multiple alignment quickly? Can we generalize the dynamic programming algorithm used for pairwise alignment?

  13. Alignments = Paths in… • Align 3 sequences: ATGC, AATC,ATGC

  14. Alignment Paths x coordinate

  15. Alignment Paths • Align the following 3 sequences: ATGC, AATC,ATGC x coordinate y coordinate

  16. Alignment Paths x coordinate y coordinate z coordinate • Resulting path in (x,y,z) space: • (0,0,0)(1,1,0)(1,2,1) (2,3,2) (3,3,3) (4,4,4)

  17. 2-D vs 3-D Alignment Grid V W 2-D edit graph 3-D?

  18. Architecture of 3-D Alignment Grid In 2-D, 3 edges in each unit square In 3-D, 7 edges in each unit cube

  19. A Cell of 3-D Alignment Grid (i-1,j,k-1) (i-1,j-1,k-1) (i-1,j,k) (i-1,j-1,k) (i,j,k-1) (i,j-1,k-1) (i,j,k) (i,j-1,k)

  20. si-1,j-1,k-1 + (vi, wj, uk) si-1,j-1,k + (vi, wj, _ ) si-1,j,k-1 + (vi, _, uk) si,j-1,k-1 + (_, wj, uk) si-1,j,k + (vi, _ , _) si,j-1,k + (_, wj, _) si,j,k-1 + (_, _, uk) Multiple Alignment: Dynamic Programming cube diagonal: no indels • si,j,k = max • (x, y, z) is an entry in the 3-D scoring matrix and can be computed using sum of pairs or entropy face diagonal: one indel edge diagonal: two indels

  21. Multiple Alignment: Running Time • For 3 sequences of length n, the run time is 7n3; O(n3) • For ksequences, building a k-dimensional edit graph has run time (2k-1)(nk); O(2knk) • Conclusion: dynamic programming approach for alignment between two sequences is easily extended to k sequences but it is impractical due to exponential running time

  22. In the next class, we will cover more efficient algorithms -- progressive alignment ….

  23. What You Should Know • How to score a multi-sequence alignment • How the dynamic programming algorithm works • Computational complexity of dynamic programming algorithms

More Related