CS 6990 Bioinformatics: BLAST
Fall 2004
Dr. Susan Bridges
Department of Computer Science and Engineering Bioinformatics
Overview
• BLAST: Basic Local Alignment Search Tool
• BLAST is a collection of programs
• Developed by Altschul et al.
• A heuristic simplification of the Smith-Waterman dynamic programming algorithm
• It looks for matches of short words (the matches need not be exact)
BLAST Terminology
• Segment: a substring of a sequence
• Segment pair of two sequences: a pair of segments of the same length (no gaps), one from each sequence
• w-mer: a substring (or word) of w characters
Goal
• Form a gapless alignment between the two segments of a segment pair and score the alignment using an amino acid substitution matrix.
• Example (using PAM 120):
   K   A   L   M   R
   V   A   K   N   S
  -4   3  -4  -3  -1
  Total score of alignment = -9
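The scoring in the example above can be sketched in a few lines of Python. This is a minimal illustration, not BLAST's own code: the dictionary holds only the five pair scores quoted on the slide (it is not a full PAM 120 matrix), and the names `PAM120_EXCERPT`, `pair_score`, and `score_segment_pair` are made up for this sketch.

```python
# Pair scores copied from the slide's PAM 120 example. Only these five
# entries are known here, so this is NOT a complete substitution matrix.
PAM120_EXCERPT = {
    ("K", "V"): -4,
    ("A", "A"): 3,
    ("L", "K"): -4,
    ("M", "N"): -3,
    ("R", "S"): -1,
}


def pair_score(a, b, matrix):
    """Look up the score for residues a and b, treating the matrix as symmetric."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def score_segment_pair(seg1, seg2, matrix):
    """Score a gapless alignment: sum the pair scores position by position."""
    if len(seg1) != len(seg2):
        raise ValueError("segments in a segment pair must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seg1, seg2))


total = score_segment_pair("KALMR", "VAKNS", PAM120_EXCERPT)
print(total)
```

Because there are no gaps, the score is simply the sum of one matrix lookup per column.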
Steps in the Algorithm
• Compile a list of high-scoring words in the query sequence
• Find matches in the database (db) for each high-scoring word
• For each match in the db, extend the alignment in both directions
Step 1
• Compile a list of high-scoring words in the query sequence
• Default word length: w=3 for proteins and w=11 for nucleic acid sequences
• A query sequence of length n contains n-w+1 words
• Each candidate word has a score t against a query word, computed using the scoring matrix
• Threshold T: a word pair whose score t is above T is treated as a pair of synonyms (T is called the neighborhood word score threshold)
Step 1 Example (w=2)
Adipokinetic hormone II of migratory locust: q l n f s a g w
Words: ql ln nf fs sa ag gw
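As a rough sketch of how the word list for this example could be produced, the Python snippet below slides a window of length w along the query. The function name `query_words` is an illustrative choice, not part of BLAST.

```python
def query_words(query, w):
    """Return every overlapping word of length w in the query, left to right."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


query = "qlnfsagw"  # adipokinetic hormone II, as written on the slide
words = query_words(query, 2)
print(words)
print(len(words) == len(query) - 2 + 1)  # checks the n-w+1 word count
```

The window can start at positions 0 through n-w, which is where the n-w+1 count comes from.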
Step 1 continued
• Find all words in the db that are synonyms of the high-scoring query words
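One way to picture the synonym search is to score every possible word of length w against a query word and keep those that score above T. The sketch below assumes a toy scoring function (+4 for identical residues, -1 otherwise) in place of PAM 120, and enumerates all words over the 20-letter amino acid alphabet rather than reading them from a database. Both choices, and the names used, are assumptions made only for illustration.

```python
from itertools import product

AMINO_ACIDS = "acdefghiklmnpqrstvwy"  # 20 standard residues, lowercase as on the slides


def toy_score(a, b):
    """Hypothetical stand-in for a substitution matrix such as PAM 120."""
    return 4 if a == b else -1


def word_score(word1, word2, score=toy_score):
    """Score two equal-length words position by position."""
    return sum(score(a, b) for a, b in zip(word1, word2))


def synonyms(query_word, threshold, alphabet=AMINO_ACIDS, score=toy_score):
    """Return all words whose score against query_word is above the threshold T."""
    found = []
    for letters in product(alphabet, repeat=len(query_word)):
        candidate = "".join(letters)
        if word_score(query_word, candidate, score) > threshold:
            found.append(candidate)
    return found


neighbors = synonyms("ql", threshold=2)
print(len(neighbors))
```

With a real substitution matrix the synonym list depends on which substitutions are conservative, so it would differ from what this toy scoring produces.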
Example continued (T=8, PAM120 Scoring Matrix)
Step 2
• For each word or synonym from the query, search for a hit in all db sequences
• Each hit is considered a seed alignment and is extended in both directions as long as the score of the alignment increases (newer versions allow short gaps)
  q l n f s a g w
  w i d f a a c p
• If the score for the segment pair is higher than a threshold S, the score and the endpoints are stored
• High-scoring segment pairs are called HSPs
• The highest-scoring segment pair for the whole pairwise comparison is referred to as the maximal-scoring segment pair (MSP)
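The hit search and the gapless extension described in this step can be sketched as follows. This is a simplified illustration under stated assumptions: it matches exact words only (synonyms would be added to the same lookup table), it uses a toy scoring function instead of PAM 120, it stops extending as soon as the next residue pair fails to raise the score, and the subject sequence "wdnfsacp" is invented for the demonstration. Production BLAST uses a more tolerant stopping rule, which is not shown here.

```python
from collections import defaultdict


def toy_score(a, b):
    """Hypothetical stand-in for a substitution matrix such as PAM 120."""
    return 4 if a == b else -1


def find_hits(query, subject, w):
    """Return (query_pos, subject_pos) pairs where a query word occurs in the subject."""
    index = defaultdict(list)  # word -> start positions in the query
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def extend_seed(query, subject, q_start, s_start, w, score=toy_score):
    """Extend a seed without gaps in both directions while the score increases."""
    total = sum(score(query[q_start + k], subject[s_start + k]) for k in range(w))
    q_lo, s_lo = q_start, s_start
    q_hi, s_hi = q_start + w, s_start + w  # exclusive end positions
    # Extend to the right.
    while (q_hi < len(query) and s_hi < len(subject)
           and score(query[q_hi], subject[s_hi]) > 0):
        total += score(query[q_hi], subject[s_hi])
        q_hi += 1
        s_hi += 1
    # Extend to the left.
    while q_lo > 0 and s_lo > 0 and score(query[q_lo - 1], subject[s_lo - 1]) > 0:
        total += score(query[q_lo - 1], subject[s_lo - 1])
        q_lo -= 1
        s_lo -= 1
    return total, (q_lo, q_hi), (s_lo, s_hi)


query, subject = "qlnfsagw", "wdnfsacp"  # subject is a made-up db sequence
hits = find_hits(query, subject, 2)
print(hits)
print(extend_seed(query, subject, 3, 3, 2))
```

The returned score and endpoints are the pieces of information that would be stored if the segment pair scored above the threshold S.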
Step 3
The HSPs of the entire database are compared to a cutoff score S, and those scoring above S are returned.
Query: q l n f s a g w
Return all matched sequences with scores greater than 8
Step 4
• Compute the statistical significance of each HSP score.
Step 5
• The segments are aligned
• The alignment score is obtained
• The E() value for this score is calculated
• If the calculated E() for the database sequence meets the user-specified E() threshold for the program, the score is reported
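The slides do not spell out how E() is computed. A common formulation is the Karlin-Altschul estimate E = K·m·n·e^(-λS), where m and n are the query and database lengths and S is the raw alignment score. The sketch below assumes that formula; the parameter values (λ = 0.3, K = 0.1) and the sequence lengths are placeholders chosen for illustration, not values taken from the slides or from any real scoring system.

```python
import math


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def is_reported(raw_score, query_len, db_len, lam, k, e_threshold):
    """A hit is reported when its E-value does not exceed the user's threshold."""
    return e_value(raw_score, query_len, db_len, lam, k) <= e_threshold


# Placeholder inputs, for illustration only.
e = e_value(raw_score=50, query_len=100, db_len=1_000_000, lam=0.3, k=0.1)
print(e)
print(is_reported(50, 100, 1_000_000, 0.3, 0.1, e_threshold=10.0))
```

A higher score gives a smaller E-value, so a stricter (smaller) user threshold keeps only the higher-scoring hits.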
BLAST output
• The list of hits
• Database accession codes, name, description, and general information about each hit
• Score in bits: the alignment score expressed in units of information
• Expectation value E()
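For the score in bits, a commonly used conversion is S' = (λS - ln K) / ln 2, after which the expectation value can be written as E = m·n·2^(-S'). The slides do not state these formulas, so the sketch below is an assumption based on Karlin-Altschul statistics, and the numeric inputs are placeholders.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to bits, given the parameters lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value_from_bits(bits, query_len, db_len):
    """Expectation value expressed in terms of the bit score."""
    return query_len * db_len * 2.0 ** (-bits)


# Placeholder inputs, for illustration only.
bits = bit_score(raw_score=50, lam=0.3, k=0.1)
print(bits)
print(e_value_from_bits(bits, query_len=100, db_len=1_000_000))
```

Bit scores already fold in λ and K, which is why they can be compared across searches that use different scoring matrices.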
BLAST programs
References
• Setubal and Meidanis, Introduction to Computational Molecular Biology
• NCBI Education Pages, http://www.ncbi.nih.gov/Education/BLASTinfo/BLAST_algorithm.html
• Weizmann Institute of Science, http://bioportal.weizmann.ac.il/course/introbioinfo/
• Computers and the Human Genome Project, http://www-cse.stanford.edu/classes/sophomore-college/projects-00/computers-and-the-hgp/BLAST.html
• The BLAST Help Manual, http://www.ncbi.nlm.nih.gov/BLAST/blast_help.shtml