
Space/Time Tradeoff and Heuristic Approaches in Pairwise Alignment


Presentation Transcript


  1. Space/Time Tradeoff and Heuristic Approaches in Pairwise Alignment

  2. Alignment and Resources • Aligning two sequences of length ~1,000 requires a table of ~1,000,000 cells • Can we use less space if we only want the alignment score? • Hint: The table was constructed one row at a time

  3. Alignment and Resources • If only the alignment score is needed, it can be computed using a matrix of only two rows

  17. Alignment and Resources • If the sequences have lengths m and n, only 2*min(m, n) cells are needed to compute the alignment score (the “window” could also have been slid vertically, so the shorter sequence determines its size) • The alignment itself cannot be recovered because the trace-back arrows are not stored • It is possible to design an algorithm that uses m+n cells but still allows the alignment to be recovered: D. S. Hirschberg. Algorithms for the longest common subsequence problem. J. ACM, 24:664-675, 1977.
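The two-row idea can be sketched in Python as follows. This is only an illustration, not the code used in the course: the function name `alignment_score` and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions. The shorter sequence is placed along the row, so each of the two rows holds min(m, n) + 1 cells, and only the score is returned because no trace-back arrows are kept.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score using only two rows of the DP table.

    The scoring scheme (match/mismatch/gap) is an assumed example.
    """
    # Put the shorter sequence along the row: each row has min(m, n) + 1 cells.
    if len(b) > len(a):
        a, b = b, a
    # Row 0: aligning a prefix of b against nothing costs one gap per symbol.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # First cell of the row: a[:i] aligned against nothing.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(diag, up, left)
        prev = curr  # the older row is discarded; it is never needed again
    return prev[-1]


print(alignment_score("GATTACA", "GATTACA"))  # identical sequences
print(alignment_score("AC", "AGC"))           # one gap is required
```

Because each row overwrites the one before it, the path that produced the final score is lost, which is exactly why the alignment cannot be recovered from this version.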

  18. Alignment and Resources • Given two sequences, each of length ~1,000: • the original algorithm must store ~1,000,000 = 1,000*1,000 cells • the modified version requires 2,000 = 2*min(1,000, 1,000) cells • If the value of a cell can be computed in 1 μs, how much time does each algorithm require? • Both algorithms are impractical if you need to search a database of hundreds of thousands of sequences • Heuristic approaches (BLAST, FASTA) have been developed to cope with this problem • They may not find the overall best alignment, but do well in practice
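The timing question can be worked through with a few lines of arithmetic. The two-row version saves space, not time, since it still evaluates every cell of the table. The database size of 100,000 sequences below is an assumed figure chosen to match "hundreds of thousands"; the variable names are illustrative.

```python
CELL_TIME_S = 1e-6   # 1 microsecond per cell, as stated on the slide
m = n = 1_000        # sequence lengths from the slide

# Both the full-table and the two-row algorithm evaluate every cell once.
cells = m * n
seconds_per_pair = cells * CELL_TIME_S

# Assumed database size, for illustration only.
db_size = 100_000
hours_for_db = seconds_per_pair * db_size / 3600

print(f"{cells:,} cells -> about {seconds_per_pair:.1f} s per pairwise alignment")
print(f"{db_size:,} database sequences -> about {hours_for_db:.1f} hours per query")
```

Under these assumptions one pairwise alignment takes about a second, but a single query against the whole database takes more than a day, which is the motivation for the heuristics that follow.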

  19. BLAST • Basic Local Alignment Search Tool: computes local alignments and performs very well in practice • Altschul, Gish, Miller, Myers, Lipman. Basic Local Alignment Search Tool. Journal of Molecular Biology, 215(3), 403-410. • [Diagram labels: QUERY sequence(s), BLAST database, BLAST program, BLAST results]

  20. BLAST • Main idea: identify short stretches of high-scoring local alignment between the query and target sequences, then extend them • “The central idea of the BLAST algorithm is to confine attention to segment pairs that contain a word pair of length w with a score of at least T.” Altschul et al. (1990) • The procedure: • use a sliding window to extract all words of size w from the query sequence • for each word, build a “hit list” of words with pairwise score at least T • scan the database for sequences that contain words from the “hit list” • extend each hit until the score drops below some cutoff
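The last step of the procedure, extending a hit until the score falls off, might look roughly like the sketch below. It is a simplification and not BLAST's actual implementation: it extends only to the right, assumes the seed is an exact match, uses an assumed +1/-1 scoring scheme instead of a substitution matrix, and interprets the cutoff as "stop once the running score is more than `drop` below the best score seen". The function name, parameters, and sequences are all hypothetical.

```python
def extend_right(query, subject, qi, si, w, match=1, mismatch=-1, drop=2):
    """Extend a length-w seed at query[qi:], subject[si:] to the right without gaps.

    Returns (best_score, best_length). Scoring and drop-off rule are assumptions.
    """
    score = best = w * match  # the seed itself is assumed to match exactly
    best_len = w
    k = w
    while qi + k < len(query) and si + k < len(subject):
        score += match if query[qi + k] == subject[si + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k  # remember the best extension so far
        elif best - score > drop:
            break                      # fallen too far below the best: stop
    return best, best_len


# Hypothetical sequences sharing a 7-letter prefix, seeded with the first 3 letters.
print(extend_right("ACGTACGTTT", "ACGTACGAAA", qi=0, si=0, w=3))
```

A fuller version would also extend to the left from the seed and report the start and end coordinates of the resulting segment pair.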

  21. BLAST • Example with w=3, T=11, query=…FSGTWYA… • use a sliding window to extract all words of size w from the query sequence: … FSG, SGT, GTW, TWY, WYA, … • for each word, build a “hit list” of words with pairwise score at least T; for the word GTW: GTW 6+5+11 = 22, ASW 0+1+11 = 12, QTW -2+5+11 = 14 • scan the database for sequences that contain words from the “hit list” • extend each hit until the score drops below some cutoff: query ENFDKARFSGTWYAMAKKD, database sequence QNFDKTRYAGTWYAVAKKD • Adapted from JHMI 140.638.01
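The first three steps of this example can be reproduced with a short Python sketch. The substitution scores below are only the handful of values quoted on the slide, not a full matrix, so `word_score` works only for the residue pairs listed; a real search would use a complete matrix such as BLOSUM62. All function names are illustrative.

```python
# Partial substitution scores copied from the slide's example (treated as symmetric).
SCORES = {
    ("G", "G"): 6, ("T", "T"): 5, ("W", "W"): 11,
    ("A", "G"): 0, ("S", "T"): 1, ("Q", "G"): -2,
}


def pair_score(x, y):
    """Look up a residue pair in either order (only pairs in SCORES are supported)."""
    return SCORES.get((x, y), SCORES.get((y, x)))


def words(seq, w):
    """Slide a window of size w across seq."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def word_score(u, v):
    """Sum of position-by-position substitution scores for two equal-length words."""
    return sum(pair_score(x, y) for x, y in zip(u, v))


def hit_list(word, candidates, T):
    """Keep the candidate words that score at least T against word."""
    return [c for c in candidates if word_score(word, c) >= T]


def scan(subject, hits, w):
    """Report (position, word) for every word of subject that is on the hit list."""
    return [(i, subject[i:i + w]) for i in range(len(subject) - w + 1)
            if subject[i:i + w] in hits]


print(words("FSGTWYA", 3))
hits = hit_list("GTW", ["GTW", "ASW", "QTW"], T=11)
print([(h, word_score("GTW", h)) for h in hits])
print(scan("QNFDKTRYAGTWYAVAKKD", set(hits), 3))
```

In this sketch the candidate words are supplied by hand; BLAST itself considers every possible word of length w when building the hit list.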

  22. BLAST Server • http://blast.ncbi.nlm.nih.gov/Blast.cgi

  23. FASTA • Runs dynamic programming on a restricted part of the table • Lipman, Pearson. Rapid and sensitive protein similarity searches. Science, 227(4693): 1435-41. • Procedure: • identify all matches of size k between the sequences (dot-plot-like); these matches form diagonals in the matrix • keep only the top-scoring matches (scored with a PAM or BLOSUM matrix); the score for these matches is called init1 • attempt to join top-scoring regions that could form a longer alignment; the score for these alignments is called initn • apply full dynamic programming on a narrow band around the high-scoring diagonal; the score for the final alignment is called opt
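The first step of the procedure, finding k-letter matches and grouping them by diagonal, can be sketched as below. This covers only the dot-plot-like stage; the rescoring (init1), joining (initn), and banded dynamic programming (opt) stages are not implemented. The function names and the example sequences are hypothetical, and a match at positions i and j is assigned to diagonal i - j.

```python
from collections import defaultdict


def ktuple_diagonals(a, b, k):
    """Count exact k-letter matches between a and b, grouped by diagonal i - j."""
    # Index every k-letter word of b by its start positions.
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    # Look up each k-letter word of a and tally the diagonal of every match.
    diagonals = defaultdict(int)
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], []):
            diagonals[i - j] += 1
    return dict(diagonals)


def best_diagonal(a, b, k):
    """Return the diagonal with the most k-letter matches, or None if there are none."""
    counts = ktuple_diagonals(a, b, k)
    return max(counts, key=counts.get) if counts else None


# Hypothetical sequences: b matches a starting two positions into a.
print(ktuple_diagonals("XXGTWYAM", "GTWYAV", 2))
print(best_diagonal("XXGTWYAM", "GTWYAV", 2))
```

The banded step would then restrict the dynamic programming table to cells whose i - j lies within a small distance of the best diagonal, which is what keeps the final pass cheap.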

  24. “Protein Structure prediction – a practical approach”

  25. FASTA Server • http://fasta.bioch.virginia.edu

  26. Exam Topics • Python programming • be able to write Python functions • be able to predict the output of a function • Chapter 4 • 4.1: principles of sequence alignment • 4.2: scoring alignments, dot plots • 4.3: substitution matrices (high-level difference between PAM and BLOSUM) • 4.4: handling gaps • 4.5: types of alignment (pairwise only) • 4.6: searching databases (BLAST, FASTA) • Chapter 5 • 5.1: substitution matrices (know how BLOSUM works, up to p.124) • 5.2: dynamic programming algorithms (skip pp.134, 135) • Chapter 8 • 8.1: Jukes-Cantor, Kimura models (pp.271-273)
