1 / 18

BLAST – A heuristic algorithm

BLAST – A heuristic algorithm. Anjali Tiwari Pannaben Patel Pushkala Venkataraman. Basic Local Alignment Search Tool. BLAST. Rapid Searching of Protein & nucleotide DBs. Seeking similar sequences. GenBank. nr. SwissProt. Database. PDB. PRF. PIR. nr = non redundant database.

snow
Download Presentation

BLAST – A heuristic algorithm

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. BLAST – A heuristic algorithm Anjali Tiwari Pannaben Patel Pushkala Venkataraman

  2. Basic Local Alignment Search Tool BLAST Rapid Searching of Protein & nucleotide DBs Seeking similar sequences GenBank nr SwissProt Database PDB PRF PIR nr = non redundant database

  3. BLAST – 3 STEP ALGORITHM Compile Words Scan DB Extend

  4. Some definitions Process of lining up 2 or more sequences to asses similarity Alignment A 20*20 substitution matrix for amino acids BLOSUM62 • Space introduced into alignment to compensate for insertions/deletions in 1 sequence relative to another Gap

  5. Local Search Algorithms Similarity Measures Identities & Conservative Replacements = +ve Similarity Matrix - BLOSUM Unlikely Replacements = -ve

  6. General Concept of working of BLAST 1000’s of sequences Query Input Calculate HSP Calculate MSP MSP – Maximal Segment Pair HSP – High Scoring Pair Display output

  7. Key Idea – BLAST1 Compile a list of high scoring words of length w from query (w=3 for proteins, 11 for nucleic acids) Step 1 Scan for word hits in the database of score greater than threshold, T Step 2 Extend word hit in both directions to find High Scoring Pairs with scores greater than S Step 3

  8. Example Step -1 Query – QQGPHUIQEGQQGKEEDPP Words of length 3 –w = QQG, QGP, GPH, PHU, HUI… Take first triple – QQG Make neighborhood words – w’ = QQG, QEG, GQG… Find high scoring triples – Blosum(w, w’) > T where T = Threshold parameter Suppose Blosum (QQG, QEG) =18 Blosum(QQG,GQG) = 12 Blosum(QQG, QQG)= 16 T=13 Choose QQG and QEG since Blosum Value > T value

  9. Step -2 Suppose Database Sequence = PKLMMQQGKQEGM Matching Word Pairs in DB sequence

  10. Step -3 Query QQGPHUIQEGQQGKEEDPP DB Sequence PKLMMQQGKQEGM Blosum(QQG, QQG) =16 QQGPHUIQEGQQGKEEDPP PKLMMQQGKQEGM Blosum(QQGK, QQGK) =21 QQGPHUIQEGQQGKEEDPP PKLMMQQGKQEGM Blosum(QQGKE, QQGKQ) =23 QQGPHUIQEGQQGKEEDPP PKLMMQQGKQEGM Blosum(QQGKEE, QQGKQE) =28 QQGPHUIQEGQQGKEEDPP PKLMMQQGKQEGM Blosum(QQGKEED, QQGKQEG) =27

  11. Extension to the right stops here because BLOSUM value is beginning to decrease • ADVANTAGES • Faster than Dynamic Programming • Removes low complexity regions • Spends less time on uninteresting • search • Statistical significance of results can • be obtained & these are very good • DISADVANTAGES • Finds & reports only local • alignments • Finds too many word hits per • Sequence thus reducing speed • Does not allow for gaps in sequence *** New Models to combat disadvantages *** BLAST2, PSI Blast

  12. BLAST2 – Combination of 2 Hit & Gapped 2 Hit Method - 3 Step method Step 1 and Step 2 as BLAST –1 Step – 3 is where they differ BLAST now looks for 2 words in a sequence instead of 1 while aligning. The 2 words are at a distance < A and are not overlapping. Typically A=40 A

  13. Gapped Blast • Gapped alignment is introduced to get an optimal alignment • Two sequences: Seq A = ACGTA Seq B = ACATA Normal alignment is ACGTA ACATA But if a penalty of mismatch is larger than the penalty of gap then the best optimal alignment is as below. AC-GTA ACG-TA ACA-TA AC-ATA

  14. Gapped BLAST - Allows gaps to come while aligning Query – ATTGTCAAAGACTTGAGCTGATGCAT DB GGCAGACATGACTGACAAGGGTATCG ATTGTCAAAGACTTGAGCTGATGCAT GGCAGACATGA CTGACAAGGGTATCG Mismatch Gap

  15. PSI – BLAST- Position specific iterated BLAST. Used for multiple alignments Query Sequence BLAST search of DB Sequences with high scores collected New sequences added & process iterated Multiple alignment & profile made DB searched with profile

  16. References • Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." Journal of Molecular Biology 215:403-410. • Altschul, S.F.,Thomas L.M., Alejandro A.S, Jinghui Z, Zheng Z, W. Miller & David J.L. (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.” Nucleic Acids Research. • http://www.ncbi.nlm.nih.gov/ • http://bioinf.man.ac.uk/ember/prototype/

  17. References (Continued) • http://www.psc.edu/biomed/training/tutorials/sequence/db/index.html • http://aracyc.stanford.edu/~jshrager/jeff/mbcs/match.html • http://www.ime.usp.br/~durham/cursos/ibi5032/pub/doc/allignmentTutorial.pdf • http://ibivu.cs.vu.nl/teaching/masters/seq_analysis/sa_lecture3.pdf

More Related