CS 6990 Bioinformatics BLAST

CS 6990 Bioinformatics: BLAST
Fall 2004
Dr. Susan Bridges
Department of Computer Science and Engineering Bioinformatics

Overview
• BLAST stands for Basic Local Alignment Search Tool
• BLAST is a collection of programs
• Developed by Altschul et al.
• Simplification of the Smith-Waterman dynamic programming algorithm
• It looks for matches of short words (not necessarily exact matches)

BLAST Terminology
• Segment: a substring of a sequence
• Segment pair of two sequences: a pair of segments of the same length (no gaps), one from each sequence
• w-mer: a substring (or word) of w characters

Goal
• Form a gapless alignment between segment pairs and score the alignment using an amino acid substitution matrix.
• Example (using PAM 120):
     K   A   L   M   R
     V   A   K   N   S
    -4   3  -4  -3  -1
  Total score of alignment = -9
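The scoring of a gapless segment pair can be sketched in a few lines. This is only an illustration: the table below holds just the five PAM 120 entries quoted on the slide, not the full matrix, and the names `SLIDE_SCORES`, `pair_score` and `score_segment_pair` are hypothetical.

```python
# Only the five substitution scores quoted on the slide (PAM 120).
# A real run would load the complete matrix instead.
SLIDE_SCORES = {
    ("K", "V"): -4,
    ("A", "A"): 3,
    ("L", "K"): -4,
    ("M", "N"): -3,
    ("R", "S"): -1,
}


def pair_score(a, b, table):
    """Look up a substitution score, treating the table as symmetric."""
    if (a, b) in table:
        return table[(a, b)]
    return table[(b, a)]


def score_segment_pair(s, t, table):
    """Sum the column scores of two equal-length segments (no gaps)."""
    if len(s) != len(t):
        raise ValueError("segments in a segment pair must have equal length")
    return sum(pair_score(a, b, table) for a, b in zip(s, t))


print(score_segment_pair("KALMR", "VAKNS", SLIDE_SCORES))  # -9
```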

Steps in the Algorithm
• Compile a list of high-scoring words in the query sequence
• Find matches in the database (db) for each high-scoring word
• For each match in the db, extend the alignment in both directions

Step 1
• Compile a list of high-scoring words in the query sequence
• Default word length: w=3 for proteins and w=11 for nucleic acid sequences
• A query sequence of length n contains n-w+1 words
• Each candidate word has a score t against a query word, computed using the scoring matrix
• Threshold T: a word pair whose score t is above T is treated as a pair of synonyms (T is called the neighborhood word score threshold)

Step 1 Example (w=2)
Adipokinetic hormone II of migratory locust
Query: q l n f s a g w
Words: ql  ln  nf  fs  sa  ag  gw
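The word list in this example can be reproduced with a short sketch. The function name `words` is hypothetical; it simply slides a window of width w along the query, which yields n-w+1 words.

```python
def words(seq, w):
    """Return every overlapping w-mer of seq, in order of position."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


query = "qlnfsagw"
w = 2
print(words(query, w))       # ['ql', 'ln', 'nf', 'fs', 'sa', 'ag', 'gw']
print(len(words(query, w)))  # 7, i.e. n - w + 1 with n = 8
```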

Step 1 continued
• Find all words in the db that are synonyms of the high-scoring query words

Example continued (T=8, PAM120 Scoring Matrix)
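A rough sketch of how the synonyms (neighborhood words) of one query word could be enumerated is shown here. It does not use PAM120: `toy_score` is a made-up scoring function (+5 for identical letters, -1 otherwise), the three-letter alphabet is arbitrary, and the threshold T=4 is chosen only to suit that toy scoring. The comparison is written as "at least T"; whether the boundary is inclusive is treated as an assumption.

```python
from itertools import product


def toy_score(a, b):
    """Hypothetical substitution score, NOT PAM120: +5 match, -1 mismatch."""
    return 5 if a == b else -1


def word_score(u, v, score):
    """Score two equal-length words column by column."""
    return sum(score(a, b) for a, b in zip(u, v))


def neighborhood(word, alphabet, T, score):
    """All words over alphabet that score at least T against word."""
    candidates = ("".join(c) for c in product(alphabet, repeat=len(word)))
    return sorted(c for c in candidates if word_score(word, c, score) >= T)


# With this toy scoring, an exact match scores 10 and one mismatch scores 4.
print(neighborhood("ql", "lnq", 4, toy_score))
# ['ll', 'nl', 'ql', 'qn', 'qq']
```

Enumerating every possible word is only practical for short words and small alphabets, which is one reason the default protein word length is small.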

Step 2
• For each word or synonym from the query, search for a hit in all db sequences
• Each hit is considered a seed alignment and is extended in both directions as long as the score of the alignment increases (newer versions allow short gaps)
    q l n f s a g w
    w i d f a a c p
• If the score for the segment pair is higher than a threshold S, the score and the endpoints are stored
• High-scoring segment pairs are called HSPs
• The highest-scoring segment pair for the whole pairwise comparison is referred to as the maximal-scoring segment pair (MSP)
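The extension of a seed can be sketched as below. Several things here are assumptions rather than part of the slides: `toy_score` (+5 match, -4 mismatch) stands in for a real substitution matrix, the seed is taken to be the word starting at position 3 in both sequences, and the stopping rule uses a drop-off parameter `x_drop` (extension stops once the running score falls more than `x_drop` below the best score seen). With `x_drop=0` this reduces to stopping as soon as the score stops increasing.

```python
def toy_score(a, b):
    """Hypothetical substitution score: +5 match, -4 mismatch."""
    return 5 if a == b else -4


def extend_seed(query, subject, q_start, s_start, w, score, x_drop=0):
    """Extend an ungapped seed of length w in both directions.

    Returns (best_score, q_begin, q_end, s_begin); the segment pair is
    query[q_begin:q_end] aligned with subject[s_begin:s_begin + length].
    """
    best = sum(score(query[q_start + k], subject[s_start + k]) for k in range(w))

    # Extend to the right, remembering the best-scoring end point.
    running, right, i = best, w, w
    while q_start + i < len(query) and s_start + i < len(subject):
        running += score(query[q_start + i], subject[s_start + i])
        if running > best:
            best, right = running, i + 1
        elif best - running > x_drop:
            break
        i += 1

    # Extend to the left from the best score reached so far.
    running, left, i = best, 0, 1
    while q_start - i >= 0 and s_start - i >= 0:
        running += score(query[q_start - i], subject[s_start - i])
        if running > best:
            best, left = running, i
        elif best - running > x_drop:
            break
        i += 1

    return best, q_start - left, q_start + right, s_start - left


print(extend_seed("qlnfsagw", "widfaacp", 3, 3, 2, toy_score))
# (6, 3, 6, 3): 'fsa' aligned with 'faa'
```

A segment pair returned this way would be kept as an HSP only if its score exceeds the threshold S.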

Step 3
The HSPs from the entire database are compared to a cutoff score S, and those scoring greater than S are returned.
Query: q l n f s a g w
Return all matched sequences with scores greater than 8

Step 4
• Compute the statistical significance of each HSP score.

Step 5
• The segments are aligned
• The alignment score is obtained
• The E() value for this score is calculated
• If the calculated E() for the database sequence is at or below the E() threshold given by the user, the score is reported
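The slides do not give a formula for E(). A commonly used form is the Karlin-Altschul expression E = K·m·n·e^(-λS), where S is the raw alignment score, m and n are the query and database lengths, and λ and K depend on the scoring system. The sketch below assumes that form; the numeric values of `lam` and `K`, the lengths, and the cutoff of 10 are placeholders, not parameters of any real matrix or program setting.

```python
import math


def e_value(raw_score, m, n, lam, K):
    """Expected number of chance HSPs scoring at least raw_score."""
    return K * m * n * math.exp(-lam * raw_score)


def is_reported(raw_score, m, n, lam, K, e_cutoff):
    """Report a hit when its E-value is at or below the user's cutoff."""
    return e_value(raw_score, m, n, lam, K) <= e_cutoff


# Placeholder parameters for illustration only.
lam, K = 0.3, 0.1
m, n = 8, 1_000_000
for s in (20, 40, 60):
    print(s, e_value(s, m, n, lam, K), is_reported(s, m, n, lam, K, e_cutoff=10.0))
```

Higher raw scores give smaller E-values, so fewer such hits are expected by chance.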

BLAST output
• The list of hits
• Database accession codes, name, description, and general information about each hit
• Score in bits: the alignment score expressed in units of information
• Expectation value E()
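The relation between a raw score and a score in bits can be sketched as follows, assuming the usual normalization S' = (λS - ln K) / ln 2, under which E = m·n·2^(-S'). The values of `lam` and `K` are placeholders, and the function names are hypothetical.

```python
import math


def bit_score(raw_score, lam, K):
    """Normalize a raw score so it no longer depends on the scoring system."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value_from_bits(bits, m, n):
    """E-value computed from a bit score and the search-space size m * n."""
    return m * n * 2.0 ** (-bits)


# Placeholder parameters for illustration only.
lam, K = 0.3, 0.1
bits = bit_score(50, lam, K)
print(bits, e_value_from_bits(bits, m=8, n=1_000_000))
```

Because λ and K are folded into the bit score, bit scores from searches that used different scoring matrices can be compared directly.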

BLAST programs

