BLAST – A heuristic algorithm
Anjali Tiwari, Pannaben Patel, Pushkala Venkataraman
BLAST: Basic Local Alignment Search Tool
• Rapid searching of protein & nucleotide DBs
• Seeks sequences similar to the query
• Databases: GenBank, SwissProt, PDB, PRF, PIR, nr
• nr = non-redundant database
BLAST – 3 STEP ALGORITHM
1. Compile Words
2. Scan DB
3. Extend
Some definitions
• Alignment: the process of lining up 2 or more sequences to assess their similarity
• BLOSUM62: a 20 × 20 substitution matrix for amino acids
• Gap: a space introduced into an alignment to compensate for insertions/deletions in 1 sequence relative to another
Local Search Algorithms: Similarity Measures
• Similarity matrix: BLOSUM
• Identities & conservative replacements = +ve score
• Unlikely replacements = -ve score
General Concept of How BLAST Works
• Input: a query, searched against 1000s of database sequences
• Calculate HSP (HSP – High Scoring Pair)
• Calculate MSP (MSP – Maximal Segment Pair)
• Display output
Key Idea – BLAST1
Step 1: Compile a list of high-scoring words of length w from the query (w = 3 for proteins, 11 for nucleic acids)
Step 2: Scan the database for word hits with a score greater than the threshold, T
Step 3: Extend each word hit in both directions to find High Scoring Pairs with scores greater than S
Example: Step 1
Query: QQGPHUIQEGQQGKEEDPP
Words of length 3: w = QQG, QGP, GPH, PHU, HUI, …
Take the first triple: QQG
Make neighborhood words: w' = QQG, QEG, GQG, …
Find high-scoring triples: Blosum(w, w') > T, where T = threshold parameter
Suppose (illustrative scores):
Blosum(QQG, QEG) = 18
Blosum(QQG, GQG) = 12
Blosum(QQG, QQG) = 16
T = 13
Choose QQG and QEG, since their Blosum values are greater than T; GQG is discarded.
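Step 1 can be sketched in a few lines of Python. This is only an illustration of the slide's example, not BLAST's actual implementation: the function names `compile_words` and `high_scoring_neighbors` are hypothetical, and the neighbor scores are simply the values supposed on the slide rather than values computed from a BLOSUM matrix.

```python
def compile_words(query, w=3):
    """Return every overlapping word of length w in the query, in order."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def high_scoring_neighbors(candidate_scores, threshold):
    """Keep candidate words whose score against the query word exceeds T."""
    return [word for word, score in candidate_scores.items() if score > threshold]


query = "QQGPHUIQEGQQGKEEDPP"
words = compile_words(query)
print(words[:5])  # ['QQG', 'QGP', 'GPH', 'PHU', 'HUI']

# Scores supposed on the slide for neighbors of QQG (illustrative, not computed here).
supposed_scores = {"QQG": 16, "QEG": 18, "GQG": 12}
print(high_scoring_neighbors(supposed_scores, threshold=13))  # ['QQG', 'QEG']
```

A full implementation would enumerate all candidate words over the amino-acid alphabet and score each against the query word with the substitution matrix.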
Step 2
Suppose the database sequence = PKLMMQQGKQEGM
Matching words in the DB sequence: QQG (residues 6-8) and QEG (residues 10-12)
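The scan in Step 2 amounts to looking up every word of the database sequence in the word list from Step 1. A minimal sketch, assuming exact string matching against that list; the name `find_word_hits` is hypothetical and positions are 0-based:

```python
def find_word_hits(db_seq, words):
    """Map each listed word to the 0-based positions where it occurs in db_seq."""
    words = set(words)
    w = len(next(iter(words)))  # all words are assumed to share one length
    hits = {}
    for i in range(len(db_seq) - w + 1):
        kmer = db_seq[i:i + w]
        if kmer in words:
            hits.setdefault(kmer, []).append(i)
    return hits


print(find_word_hits("PKLMMQQGKQEGM", ["QQG", "QEG"]))
# {'QQG': [5], 'QEG': [9]}
```

Keeping the word list in a set makes each lookup constant time, so the scan is linear in the length of the database sequence.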
Step 3
Query: QQGPHUIQEGQQGKEEDPP
DB sequence: PKLMMQQGKQEGM
Extending the QQG hit to the right, one residue at a time:
Blosum(QQG, QQG) = 16
Blosum(QQGK, QQGK) = 21
Blosum(QQGKE, QQGKQ) = 23
Blosum(QQGKEE, QQGKQE) = 28
Blosum(QQGKEED, QQGKQEG) = 27
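The running totals in Step 3 can be reproduced with a short sketch. The per-residue scores below are the ones implied by the slide's totals (for example, 21 - 16 = 5 for K/K), and the stopping rule is the slide's simplified one: stop as soon as the total decreases. The names `pair_score` and `extend_right` are hypothetical.

```python
# Pair scores implied by the running totals on the slide (a partial table only).
PAIR_SCORES = {
    ("Q", "Q"): 5, ("G", "G"): 6, ("K", "K"): 5,
    ("E", "Q"): 2, ("E", "E"): 5, ("D", "G"): -1,
}


def pair_score(a, b):
    """Look up a residue pair in either order; raises KeyError if absent."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]


def extend_right(query, db, q_start, d_start, w=3):
    """Extend a word hit to the right until the running total first decreases."""
    score = sum(pair_score(query[q_start + k], db[d_start + k]) for k in range(w))
    totals = [score]
    best, best_len = score, w
    q, d = q_start + w, d_start + w
    while q < len(query) and d < len(db):
        score += pair_score(query[q], db[d])
        totals.append(score)
        if score < best:  # simplified rule from the slide: stop at the first decrease
            break
        best, best_len = score, q - q_start + 1
        q += 1
        d += 1
    return best, best_len, totals


# The hit uses the second QQG in the query (0-based position 10) and QQG in the DB (position 5).
print(extend_right("QQGPHUIQEGQQGKEEDPP", "PKLMMQQGKQEGM", 10, 5))
# (28, 6, [16, 21, 23, 28, 27])
```

The best segment is therefore QQGKEE against QQGKQE with score 28. The sketch covers only the rightward direction; Step 3 of BLAST1 extends in both directions.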
Extension to the right stops at QQGKEED / QQGKQEG, because the cumulative BLOSUM score begins to decrease there (28 to 27).
ADVANTAGES
• Faster than dynamic programming
• Filters out low-complexity regions
• Spends less time on uninteresting parts of the search
• Statistical significance of results can be obtained, and these estimates are very good
DISADVANTAGES
• Finds & reports only local alignments
• Finds too many word hits per sequence, which reduces speed
• Does not allow for gaps in the sequence
New models to combat these disadvantages: BLAST2, PSI-BLAST
BLAST2 – Combination of the 2-Hit and Gapped methods
2-Hit Method: a 3-step method
• Steps 1 and 2 are the same as in BLAST1
• Step 3 is where they differ: BLAST now looks for 2 words in a sequence instead of 1 while aligning
• The 2 words are at a distance < A and are not overlapping. Typically A = 40
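The 2-hit condition can be sketched as a filter over word hits. Each hit is represented as a (query position, DB position) pair, and the two hits are required to lie on the same diagonal (same DB position minus query position), as in the Gapped BLAST paper listed in the references. The function name `two_hit_seeds` and the hit representation are assumptions made for illustration; only neighboring hits on a diagonal are compared, which is a simplification.

```python
from collections import defaultdict


def two_hit_seeds(hits, w=3, A=40):
    """Return (diagonal, first_q, second_q) for pairs of non-overlapping hits
    on the same diagonal whose start positions are less than A apart."""
    by_diagonal = defaultdict(list)
    for q_pos, d_pos in hits:
        by_diagonal[d_pos - q_pos].append(q_pos)

    seeds = []
    for diagonal, q_positions in by_diagonal.items():
        q_positions.sort()
        # Compare each hit with the next hit on the same diagonal.
        for first, second in zip(q_positions, q_positions[1:]):
            distance = second - first
            if w <= distance < A:  # distance >= w means the words do not overlap
                seeds.append((diagonal, first, second))
    return seeds


hits = [(0, 10), (5, 15), (1, 11), (30, 2), (100, 110)]
print(two_hit_seeds(hits))  # [(10, 1, 5)]
```

In this example the hits at query positions 0 and 1 overlap, the hit at 100 is too far away, and the hit on diagonal -28 has no partner, so only one pair triggers an extension.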
Gapped BLAST
• Gapped alignment is introduced to get an optimal alignment
• Two sequences: Seq A = ACGTA, Seq B = ACATA
• The normal (ungapped) alignment is:
ACGTA
ACATA
• But if the penalty for a mismatch is larger than the penalty for two gaps, then the optimal alignment is one of the following:
AC-GTA
ACA-TA
or
ACG-TA
AC-ATA
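The trade-off between a mismatch and two gaps can be checked with a small dynamic-programming sketch. This is a global alignment with a linear gap penalty, chosen only because it is short; it is not the algorithm Gapped BLAST uses (that is a local, gapped extension around a seed). The function name `alignment_score` and the penalty values are assumptions.

```python
def alignment_score(a, b, match, mismatch, gap):
    """Best global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        dp[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diagonal, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]


# Cheap mismatch: the ungapped alignment wins (4 matches + 1 mismatch).
print(alignment_score("ACGTA", "ACATA", match=1, mismatch=-1, gap=-1))  # 3
# Costly mismatch: two gaps are cheaper (4 matches + 2 gaps).
print(alignment_score("ACGTA", "ACATA", match=1, mismatch=-3, gap=-1))  # 2
```

With mismatch = -3 the ungapped alignment would score only 1, so the gapped alignments shown above become optimal.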
Gapped BLAST: allows gaps while aligning
Query: ATTGTCAAAGACTTGAGCTGATGCAT
DB: GGCAGACATGACTGACAAGGGTATCG
Alignment (the gap is shown as "-"):
ATTGTCAAAGACTTGAGCTGATGCAT
GGCAGACATGA-CTGACAAGGGTATCG
The aligned pair contains both mismatches and a gap.
PSI-BLAST: Position-Specific Iterated BLAST. Used for multiple alignments.
1. Query sequence
2. BLAST search of the DB
3. Sequences with high scores are collected
4. Multiple alignment & profile are made
5. DB is searched with the profile
6. New sequences are added & the process is iterated
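The "profile" in the PSI-BLAST loop is a position-specific scoring table derived from the multiple alignment. A much-simplified sketch follows, assuming a uniform background frequency and a flat pseudocount; real PSI-BLAST uses sequence weighting and more careful pseudocounts. The names `build_profile` and `score_with_profile` and the three aligned sequences are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(aligned_seqs, pseudocount=1.0):
    """Per-column log-odds scores (log2 of observed/background frequency)."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    profile = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(seq[col] for seq in aligned_seqs if seq[col] != "-")
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score_with_profile(profile, segment):
    """Sum the column-specific scores of an ungapped segment."""
    return sum(column[aa] for column, aa in zip(profile, segment))


profile = build_profile(["QQGKE", "QEGKQ", "QQGKE"])
print(score_with_profile(profile, "QQGKE") > score_with_profile(profile, "PPPPP"))  # True
```

Searching the database with such a profile rewards residues that are conserved at each position of the alignment, and each iteration rebuilds the profile from the enlarged set of hits.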
References
• Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." Journal of Molecular Biology 215:403-410.
• Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Research 25:3389-3402.
• http://www.ncbi.nlm.nih.gov/
• http://bioinf.man.ac.uk/ember/prototype/
References (Continued)
• http://www.psc.edu/biomed/training/tutorials/sequence/db/index.html
• http://aracyc.stanford.edu/~jshrager/jeff/mbcs/match.html
• http://www.ime.usp.br/~durham/cursos/ibi5032/pub/doc/allignmentTutorial.pdf
• http://ibivu.cs.vu.nl/teaching/masters/seq_analysis/sa_lecture3.pdf