
Pairwise Sequence Alignment (cont.)




  1. Pairwise Sequence Alignment (cont.)
     Lecture for CS397-CXZ Algorithms in Bioinformatics, Feb. 4, 2004
     ChengXiang Zhai, Department of Computer Science, University of Illinois, Urbana-Champaign

  2. Outline
     • Variations of the basic global/local alignment algorithms
     • Basic information theory concepts
     • Significance of alignment scores (to be continued in the next class)

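  3. Dynamic Programming Equations
     • Global alignment, from F(0,0) to F(n,m), with $F(i,0) = -id$ and $F(0,j) = -jd$:
       $F(i,j) = \max\{F(i-1,j-1) + s(x_i, y_j),\; F(i-1,j) - d,\; F(i,j-1) - d\}$
     • Local alignment, from 0 to the maximum F(i,j), with $F(i,0) = F(0,j) = 0$:
       $F(i,j) = \max\{0,\; F(i-1,j-1) + s(x_i, y_j),\; F(i-1,j) - d,\; F(i,j-1) - d\}$
     • We can vary both the model and the alignment strategies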
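The two recurrences can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code. The slides leave the model abstract, so the substitution score `s` (+2 for identical residues, -1 otherwise) and the linear gap penalty `D = 2` are hypothetical toy values.

```python
# Hypothetical toy model: s(a, b) = +2 for identical residues, -1 otherwise,
# and a linear gap penalty d = 2. The slides leave s and d abstract.
def s(a: str, b: str) -> int:
    return 2 if a == b else -1


D = 2  # assumed linear gap penalty d


def global_score(x: str, y: str) -> int:
    """Global alignment: fill F starting at F(0,0) and return F(n,m)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * D  # prefix of x aligned entirely to gaps
    for j in range(1, m + 1):
        F[0][j] = -j * D  # prefix of y aligned entirely to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - D,
                          F[i][j - 1] - D)
    return F[n][m]


def local_score(x: str, y: str) -> int:
    """Local alignment: boundary is 0, every cell is floored at 0; return max F(i,j)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - D,
                          F[i][j - 1] - D)
            best = max(best, F[i][j])
    return best
```

The two functions differ only in their initial values, the extra 0 in the recursion, and where the path ends. Those are the same knobs the next slide lists.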

  4. In general, we can vary
     • Initial values
     • Recursive functions
     • Start and end of paths
     • The model (substitution scores s and gap penalty d)

  5. Variation I: Repeated Matches
     • Find non-overlapping copies of sections of Y in X
     • Example: X is on top; the row below shows the matched sections of Y ('.' marks unmatched regions of X, '-' marks a gap):
         X = HEAGAWGHEE
             HEA.AW-HE.
     • Alignment: from F(0,0) to F(n+1,0)
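A minimal sketch of the repeated-match recursion, assuming the common formulation with a threshold T. Each matched region must score more than T to be worth keeping, and T is subtracted once per match. The result is read from the extra cell F(n+1, 0). The toy score `s`, the gap penalty `d = 2`, and the parameter `T` are assumptions for illustration; the slide does not give them.

```python
# Hypothetical toy substitution score; the slide does not fix s, d, or T.
def s(a: str, b: str) -> int:
    return 2 if a == b else -1


def repeated_match_score(x: str, y: str, T: int, d: int = 2) -> int:
    """Best total score of non-overlapping matches of sections of y within x.

    Recurrences (T is an assumed threshold paid once per matched region):
      F(i,0) = max(F(i-1,0), F(i-1,j) - T for j = 1..m)
      F(i,j) = max(F(i,0), F(i-1,j-1) + s(x_i,y_j), F(i-1,j) - d, F(i,j-1) - d)
    """
    n, m = len(x), len(y)
    # Row 0 is all zeros: F(0,0) = 0 and F(0,j) = max(F(0,0), F(0,j-1) - d) = 0.
    F = [[0] * (m + 1) for _ in range(n + 2)]
    for i in range(1, n + 2):
        # F(i,0): stay in an unmatched region of x, or close a match that ended in row i-1.
        F[i][0] = max([F[i - 1][0]] + [F[i - 1][j] - T for j in range(1, m + 1)])
        if i == n + 1:
            break  # the extra row n+1 only needs its column-0 cell
        for j in range(1, m + 1):
            F[i][j] = max(F[i][0],  # enter a matched region from column 0
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
    return F[n + 1][0]
```

A larger T keeps only longer, stronger matches. When no region scores above T, the result is 0, meaning X is left entirely unmatched.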

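  6. Variation II: Overlap Matches
     • One sequence contains the other (e.g., X is contained in Y) or the two overlap; "overhanging ends" are not penalized
     • Ignore the overhanging prefix: F(i,0) = F(0,j) = 0
     • Ignore the overhanging suffix: the path may end anywhere on the last row or last column
     • Alignment: from F(0,0) to max{F(i,m), F(n,j)}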
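An overlap alignment needs only two changes to the global recursion: zero boundaries and a free end. A minimal sketch follows; the toy score `s` and gap penalty `d = 2` are hypothetical.

```python
def s(a: str, b: str) -> int:
    # Hypothetical toy substitution score.
    return 2 if a == b else -1


def overlap_score(x: str, y: str, d: int = 2) -> int:
    """Overlap alignment: overhanging prefixes and suffixes are not penalized."""
    n, m = len(x), len(y)
    # F(i,0) = F(0,j) = 0: an overhanging prefix of either sequence costs nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
    # An overhanging suffix is also free: take the best F(n,j) or F(i,m).
    last_row = max(F[n][j] for j in range(m + 1))
    last_col = max(F[i][m] for i in range(n + 1))
    return max(last_row, last_col)
```

The same function covers both cases on the slide. For a suffix of X overlapping a prefix of Y, the path ends on the last row. For X contained in Y, it also ends on the last row, after starting partway along row 0.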

  7. Variation III: A General Gap Model
     • Gap-open penalty d and gap-extension penalty e: a gap of length g contributes $\gamma(g) = -d - (g-1)e$ to the score
     • Alignment: from F(0,0) to F(n,m)
     • This is more easily described as a Finite State Automaton (FSA) with 3 states: Match, Insertion in X, Insertion in Y
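Each state of the three-state automaton becomes its own DP matrix, in the Gotoh-style formulation:
- `M(i,j)` ends with x_i aligned to y_j.
- `Ix(i,j)` ends with x_i against a gap (insertion in X).
- `Iy(i,j)` ends with y_j against a gap (insertion in Y).

Opening a gap costs d and each further position costs e. This is a sketch under assumed toy values (`s` as +2/-1, `d = 4`, `e = 1`). As in the usual textbook version, it allows no direct transition between the two insertion states.

```python
NEG = float("-inf")  # score of an impossible state


def s(a: str, b: str) -> int:
    # Hypothetical toy substitution score.
    return 2 if a == b else -1


def affine_global_score(x: str, y: str, d: int = 4, e: int = 1) -> int:
    """Global alignment with affine gaps: a gap of length g scores -d - (g-1)e."""
    n, m = len(x), len(y)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]   # Match state
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]  # x_i aligned to a gap
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]  # y_j aligned to a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = -d - (i - 1) * e  # leading gap in y of length i
    for j in range(1, m + 1):
        Iy[0][j] = -d - (j - 1) * e  # leading gap in x of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = s(x[i - 1], y[j - 1]) + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] - d,   # open a gap
                           Ix[i - 1][j] - e)  # extend a gap
            Iy[i][j] = max(M[i][j - 1] - d,
                           Iy[i][j - 1] - e)
    return max(M[n][m], Ix[n][m], Iy[n][m])
```

With these values, one gap of length 2 scores -5, whereas two separate gaps of length 1 would score -8. This is why the model prefers fewer, longer gaps.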

  8. Heuristic Alignment Algorithms
     • Motivation: alignment algorithms take O(nm) time
       • Current protein database: about 100 million residues
       • Imagine aligning a 1,000-residue query against every sequence in it
       • This takes about 3 hours!
     • Heuristic algorithms speed up the search at the price of possibly missing the best-scoring alignment
     • Two well-known programs
       • BLAST: Basic Local Alignment Search Tool
       • FASTA
     • Both find high-scoring local alignments between a query sequence and a target database
     • Basic idea: first locate high-scoring short stretches, then extend them
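The slide's figures can be checked with a rough calculation. It reads the 100 million figure as residues and counts one DP cell per pair of query and database residues. The resulting throughput of roughly 10^7 cells per second is implied by the 3-hour figure; the slide does not state it.

```latex
\underbrace{10^{8}}_{\text{database residues}} \times \underbrace{10^{3}}_{\text{query residues}} = 10^{11}\ \text{DP cells},
\qquad
\frac{10^{11}\ \text{cells}}{3\ \text{h}} = \frac{10^{11}}{1.08 \times 10^{4}\ \text{s}} \approx 9.3 \times 10^{6}\ \text{cells/s}
```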

  9. BLAST (Basic Local Alignment Search Tool)
     • Three steps
       • Compile a list of high-scoring “words” of fixed length
       • Scan the database for occurrences of these words
       • Extend each word occurrence
     • Basic BLAST finds only ungapped alignments; newer versions (Gapped BLAST, PSI-BLAST) can find gapped alignments
     • Visit BLAST (need some help!)
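The three BLAST steps can be sketched as a simplified, hypothetical Python illustration; it is not NCBI's implementation. Several stand-ins are assumptions:
- a 4-letter alphabet and +2/-1 scores replace a real substitution matrix;
- the word length `w`, neighborhood threshold `T`, and drop-off `xdrop` are illustrative values;
- extension is ungapped, as in basic BLAST.

```python
from itertools import product

# Hypothetical toy setup: a 4-letter alphabet and +2/-1 scores stand in for
# a real substitution matrix; w, T and xdrop values are illustrative only.
ALPHABET = "ACGT"


def s(a: str, b: str) -> int:
    return 2 if a == b else -1


def word_list(query: str, w: int = 3, T: int = 3) -> dict:
    """Step 1: map every length-w word scoring >= T against a query word to its query offsets."""
    words: dict = {}
    for q in range(len(query) - w + 1):
        qword = query[q:q + w]
        for letters in product(ALPHABET, repeat=w):
            if sum(s(a, b) for a, b in zip(letters, qword)) >= T:
                words.setdefault("".join(letters), []).append(q)
    return words


def extend(query: str, db: str, q: int, t: int, w: int, xdrop: int) -> int:
    """Step 3: ungapped extension of a word hit in both directions along its diagonal."""
    total = sum(s(query[q + k], db[t + k]) for k in range(w))
    for step in (1, -1):  # extend to the right, then to the left
        run = best = 0
        i, j = (q + w, t + w) if step == 1 else (q - 1, t - 1)
        while 0 <= i < len(query) and 0 <= j < len(db):
            run += s(query[i], db[j])
            best = max(best, run)
            if run < best - xdrop:  # stop once the score falls too far below its best
                break
            i, j = i + step, j + step
        total += best
    return total


def blast_like(query: str, db: str, w: int = 3, T: int = 3, xdrop: int = 3):
    """Return (score, query_offset, db_offset) of the best ungapped hit, or None."""
    words = word_list(query, w, T)
    hits = []
    for t in range(len(db) - w + 1):  # Step 2: scan the database for listed words
        for q in words.get(db[t:t + w], []):
            hits.append((extend(query, db, q, t, w, xdrop), q, t))
    return max(hits, default=None)
```

Raising `T` shrinks the word list, which speeds up the scan but risks missing weak hits. This is the speed-versus-sensitivity trade-off the previous slide describes.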

  10. FASTA (Fast Alignment)
     • Quite similar to BLAST
     • Multi-step procedure
       • Locate all identically matching words of a fixed length (1-2 for proteins, 4-6 for DNA)
       • Look for diagonals with many mutually supporting word matches
       • Select the best diagonals as “seeds” for extension
       • Extend each seed to find maximal-scoring ungapped regions (possibly joining several seeds)
       • Check whether adjacent ungapped matches can be joined by a gapped region, allowing for gap costs
       • Finally, run the full dynamic programming algorithm on the regions of best-matching alignments
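The first two FASTA steps are sketched below. All identical k-tuple matches are found through a lookup table on the query, and each match is counted on its diagonal, using the offset j - i (target position minus query position). Matches that share a diagonal are the "mutually supporting" ones. This is a hypothetical illustration: the function names and the `top` cutoff are assumptions, and the later extension and DP steps are omitted.

```python
from collections import defaultdict


def ktup_diagonals(query: str, target: str, ktup: int = 2) -> dict:
    """Count identical k-tuple matches per diagonal offset (target pos - query pos)."""
    index = defaultdict(list)  # k-tuple -> list of query positions
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    counts = defaultdict(int)
    for j in range(len(target) - ktup + 1):
        for i in index.get(target[j:j + ktup], []):
            counts[j - i] += 1
    return dict(counts)


def best_diagonals(query: str, target: str, ktup: int = 2, top: int = 3) -> list:
    """Diagonals with the most supporting matches: candidate seeds for extension."""
    counts = ktup_diagonals(query, target, ktup)
    return sorted(counts, key=lambda diag: (-counts[diag], diag))[:top]
```

Building the index once on the query makes the scan linear in the target length plus the number of matches. This is why a small k-tuple (1-2 for proteins) remains affordable.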

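  11. Significance of Scores
     • How do we assess the significance of an alignment score?
     • Two basic approaches
       • The classical approach: extreme value distribution
         • Assume a null (random) model M0 for scores
         • $P(\text{Score} > s \mid M_0, x, y) = ?$
       • The Bayesian approach: model comparison
         • Assume two models for (x, y): random, M0; aligned, M1
         • $P(M_1 \mid x, y) / P(M_0 \mid x, y) = ?$ Taking logs:
           $\log\frac{P(M_1 \mid x, y)}{P(M_0 \mid x, y)} = \underbrace{\log\frac{P(x, y \mid M_1)}{P(x, y \mid M_0)}}_{\text{log-odds score of the alignment}} + \underbrace{\log\frac{P(M_1)}{P(M_0)}}_{\text{prior}}$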
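Both approaches reduce to small formulas, sketched below with hypothetical helper names.

For the classical approach, the sketch assumes the Gumbel (extreme value) form used for local alignment scores: E = K m n e^(-λ s) and P(Score ≥ s) = 1 - e^(-E). The constants K and λ depend on the scoring system and must be estimated; the values in the checks are placeholders.

For the Bayesian approach, the posterior is the logistic function of the log-odds score plus the prior log-odds. Both must be in natural-log units.

```python
import math


def evd_pvalue(s: float, m: int, n: int, K: float, lam: float) -> float:
    """Classical approach: P(Score >= s | M0) under an extreme value (Gumbel) approximation.

    K and lam are scoring-system constants that must be estimated; m, n are sequence lengths.
    """
    expected_hits = K * m * n * math.exp(-lam * s)
    return -math.expm1(-expected_hits)  # equals 1 - exp(-E), accurate when E is small


def posterior_aligned(log_odds: float, prior_m1: float) -> float:
    """Bayesian approach: P(M1 | x, y) from a natural-log log-odds score and the prior P(M1)."""
    prior_log_odds = math.log(prior_m1 / (1.0 - prior_m1))
    z = log_odds + prior_log_odds  # log of P(M1|x,y) / P(M0|x,y)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function
```

The prior term matters in database searches. When nearly all database sequences are unrelated, P(M1) is small, so a high log-odds score is needed before the posterior favors M1.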
