BLAST

BLAST. Chris Cobb Bioinformatics April 2008. Overview. Sequence comparison Compare regions of similarity Global alignment not necessarily 'valuable' Indicative of evolutionary relationships. Not always... Horizontal gene transfer Convergent evolution


  1. BLAST • Chris Cobb • Bioinformatics • April 2008

  2. Overview • Sequence comparison • Compare regions of similarity • Global alignment not necessarily 'valuable' • Indicative of evolutionary relationships. • Not always... • Horizontal gene transfer • Convergent evolution • Similar function different sequence • Databases • Growing exponentially

  3. Sequence similarity • Percent identity • Max matches in an alignment / alignment length • The best alignment gives the maximum number of matches • Example • Cytochrome c • Small electron carrier protein in humans • Remember 3D structure! • 104 amino acids

  4. Cytochrome c • Comparison between mouse and human • (95/104) = 91% match. • Evolutionary distance measured by similarity value.
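
The percent identity calculation from slides 3 and 4 can be sketched in a few lines. This is a minimal illustration that assumes the two sequences are already aligned to equal length with '-' marking gaps; the function name and the toy sequences are hypothetical, not real cytochrome c data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Matches divided by alignment length, as a percentage.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Gap columns count toward the length but never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy aligned pair (made up for illustration): 7 of 8 columns match.
print(percent_identity("GDVEKGKK", "GDVEKGKA"))

# The mouse/human figure quoted on the slide: 95 matches over 104 positions.
print(round(100.0 * 95 / 104, 1))
```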

  5. Sequence Alignment • How do we perform this alignment? • What do we need to consider? • The intuitive solution is to place one sequence along each axis of a grid and compare. • In CS this is a dynamic programming problem. • Small example

  6. Dynamic Programming • Set weights on different situations • +1 match • -1 mismatch • -2 insertion/deletion • Note: this gets expensive • Gaps resolved by insertion/deletion • If every gap position is treated as an individual event, a long gap carries a lot of weight • Instead use affine penalty function • penalty = G + nL • G: gap opening penalty • L: gap extension penalty • n: size of gap
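
To see why the affine function helps, the sketch below compares it with the flat per-position penalty from the slide. The default values G = 5 and L = 1 are assumptions chosen for illustration, not values from the presentation.

```python
def linear_gap_penalty(n: int, per_position: int = 2) -> int:
    """Flat penalty: every gap position costs the same (the slide's -2)."""
    return per_position * n


def affine_gap_penalty(n: int, G: int = 5, L: int = 1) -> int:
    """penalty = G + n*L for a gap of n positions (0 if there is no gap).

    G (open) and L (extension) defaults are illustrative assumptions.
    """
    return G + n * L if n > 0 else 0


# One gap of 10 positions versus ten separate gaps of 1 position each.
print(linear_gap_penalty(10), 10 * linear_gap_penalty(1))
print(affine_gap_penalty(10), 10 * affine_gap_penalty(1))
```

With the flat penalty both situations cost the same. With the affine penalty a single long gap is much cheaper than many scattered ones, which matches the idea that one insertion or deletion event can span several positions.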

  7. Smith-Waterman • Dynamic programming algorithm for local alignment, developed by Smith and Waterman in 1981; commonly run with an affine gap penalty • Guarantees optimal local alignment • Local alignment avoids noise • Reliable statistical model for optimal local alignments • Expectation that an alignment this good would occur by chance (Karlin-Altschul) • Low expectation = high chance the sequences are homologous • Open source implementations available
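
A compact sketch of the local alignment recurrence follows. For brevity it uses the flat +1 / -1 / -2 weights from slide 6 rather than an affine gap penalty, so it is a simplified variant and not a full implementation; the function name and test sequences are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1,
                   gap: int = -2):
    """Return (best score, aligned part of a, aligned part of b).

    Simplification: a linear gap penalty is used instead of an affine one.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row/column stay 0
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment restart anywhere: this makes it local.
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGACGGG"))
```

The table has len(a) x len(b) cells, which is why running this against every sequence in a large database is slow and why a heuristic such as BLAST is attractive.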

  8. Substitution Matrix • A flat +1 for every match is not a good metric. • Substitution matrix • Scores how likely one amino acid is to be substituted for another. • Built empirically from a large dataset to reflect true probabilities of mutation during the evolutionary process • BLOSUM, PAM common examples. • BLOSUM: derived from local alignments (conserved blocks) • PAM: derived from global alignments of closely related proteins
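
Matrices of this kind are commonly built as log-odds scores: the observed frequency of a residue pair in trusted alignments is compared with the frequency expected if the two residues were paired at random. The sketch below shows only that calculation; the frequencies are made-up numbers, and the scale of 2 (half-bit units) is an assumption for illustration.

```python
import math


def log_odds_score(q_ab: float, p_a: float, p_b: float,
                   scale: float = 2.0) -> int:
    """Rounded, scaled log2 of observed / expected pair frequency.

    q_ab: observed frequency of the pair (a, b) in trusted alignments
    p_a, p_b: background frequencies of a and b
    scale: units per bit (2.0 means half-bit units; assumed here)
    """
    return round(scale * math.log2(q_ab / (p_a * p_b)))


# Hypothetical frequencies: a pair seen 8x more often than chance predicts
# gets a positive score, a pair seen less often than chance a negative one.
print(log_odds_score(0.02, 0.05, 0.05))
print(log_odds_score(0.001, 0.05, 0.05))
```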

  9. Karlin-Altschul scoring • Expected frequency of a High-scoring Segment Pair (HSP) occurring by chance: • E = K M N e^(-λS) • S: alignment score • λ: unique positive value used to normalize the score • MN: search space (|query| * |target|) • K: constant, approx. 0.1 • E vs. MN: linear change • E vs. S: exponential change (a small change in score has a big effect on the value). • Assumptions: • i.i.d. (independent and identically distributed) • Roughly equal in length
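
The formula is easy to evaluate directly, which makes the linear and exponential dependencies concrete. In this sketch K = 0.1 follows the slide, while λ = 0.3 and the sequence lengths are placeholder assumptions; real values depend on the scoring system.

```python
import math


def expected_hsps(score: float, m: int, n: int,
                  K: float = 0.1, lam: float = 0.3) -> float:
    """E = K * M * N * exp(-lambda * S).

    K follows the slide's approximate value; lam is an assumed placeholder.
    """
    return K * m * n * math.exp(-lam * score)


base = expected_hsps(score=50, m=200, n=10_000)
doubled_space = expected_hsps(score=50, m=400, n=10_000)
higher_score = expected_hsps(score=60, m=200, n=10_000)

print(doubled_space / base)   # doubling the search space doubles E
print(base / higher_score)    # +10 in score divides E by exp(lam * 10)
```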

  10. BLAST!!!

  11. BLAST!!! • Basic Local Alignment Search Tool

  12. BLAST • Pre-indexed database • Position of every 'word' is remembered • High-scoring Segment Pair (HSP) • Local alignment with no gaps that scores high. • Search starts with the best HSP and extends from there.

  13. BLAST search • Query • Break query into 'words' ('k' character strings) • n – k + 1 words • n = query length • k = word length (BLAST defaults: 11 for nucleotides, 3 for proteins) • Scan database for words • Don't just use the exact word, use similar words • A database word that scores at least T against a query word counts as a hit and seeds a segment pair. • Extend the HSP until the score drops by 'X' below its max value • Report statistically significant scores.
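
The word-splitting and extension steps can be sketched as below. This is a deliberately reduced illustration: it seeds on exact words only (no neighborhood words or threshold T), extends only to the right, and scores with +1 / -1 instead of a substitution matrix. All function names and sequences are hypothetical.

```python
def query_words(query: str, k: int):
    """All overlapping k-letter words with their start offsets (n - k + 1)."""
    return [(i, query[i:i + k]) for i in range(len(query) - k + 1)]


def extend_right(query: str, target: str, q_start: int, t_start: int,
                 k: int, x_drop: int, match: int = 1, mismatch: int = -1):
    """Ungapped rightward extension of an exact k-word hit.

    Stops once the running score falls x_drop or more below the best score
    seen so far. Returns (best score, length of the best segment).
    """
    score = best = k * match          # the seed word itself matches exactly
    best_len = k
    i, j = q_start + k, t_start + k
    while i < len(query) and j < len(target):
        score += match if query[i] == target[j] else mismatch
        if score > best:
            best, best_len = score, i - q_start + 1
        elif best - score >= x_drop:
            break
        i += 1
        j += 1
    return best, best_len


words = query_words("ACGTAC", 3)
print(len(words), words)
print(extend_right("ACGTTTGA", "ACGTTTCC", 0, 0, k=3, x_drop=2))
```

A real search would also extend to the left and would look up each word in the pre-built database index rather than being handed the hit position.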

  14. BLAST statistics • HSP has a P value based on the Poisson distribution • Small P value means significant score • Applies to ungapped segments • For gapped alignment, calculate E-value • Karlin-Altschul method • E value decreases exponentially as score increases! • Boils down to: • Results that aren't likely to happen by chance are best. • When we find these, report them as significant • Low E values mean a more significant result.
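
Under the Poisson model, the P value and the E value are linked by P = 1 - e^(-E): the chance of seeing at least one HSP scoring that well when E are expected. The short sketch below, with a hypothetical function name, shows that the two are nearly identical when E is small.

```python
import math


def p_value_from_e(e_value: float) -> float:
    """P(at least one chance HSP) = 1 - exp(-E) under a Poisson model."""
    return -math.expm1(-e_value)  # numerically stable form of 1 - exp(-E)


for e in (0.001, 0.01, 1.0, 10.0):
    print(e, p_value_from_e(e))
```

For E below about 0.01 the two numbers agree closely, so either can be read as the significance of a hit; for large E the P value saturates near 1 while E keeps growing.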

  15. Using BLAST • http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/tut1.html

  16. BLAST customization • Lowering the Neighborhood Word Threshold (T) finds more distantly related sequences • Raising the Segment Extension Cutoff (X) extends the region that might be considered an HSP • Changing the Expectation (E) just changes the score threshold (how good a hit has to be to show up).

  17. BLAST scores • Raw score • Sum of substitution score + gap score • Substitution score given by substitution matrix (BLOSUM, PAM) • Gap score calculated using affine penalty function • G: gap opening penalty • L: gap extension penalty • Bit score • Normalized raw score so that scores from different substitution matrices can be compared. • E-value • Expected number of alignments with the same or better score that would occur by chance • Based on the Karlin-Altschul method
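
The bit score normalization can be written as S' = (λS - ln K) / ln 2, after which E = M N 2^(-S') no longer depends on λ or K. The sketch below checks that this agrees with E = K M N e^(-λS); the λ and K defaults are placeholder assumptions, not values for any particular matrix.

```python
import math


def bit_score(raw_score: float, lam: float = 0.3, K: float = 0.1) -> float:
    """S' = (lambda * S - ln K) / ln 2. lam and K are assumed placeholders."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value_from_bits(bits: float, m: int, n: int) -> float:
    """E = M * N * 2^(-S'), with no matrix-specific constants needed."""
    return m * n * 2.0 ** (-bits)


bits = bit_score(50)
e_from_bits = e_value_from_bits(bits, m=200, n=10_000)
e_direct = 0.1 * 200 * 10_000 * math.exp(-0.3 * 50)
print(bits, e_from_bits, e_direct)
```

Because λ and K are folded into the bit score, two hits scored with different substitution matrices can be compared by their bit scores alone.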

  18. Sources • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=bioinfo.chapter.A05 • http://www.ludwig.edu.au/course/course2002/talks/flegg02search/sld018.htm • http://en.wikipedia.org/wiki/Smith-Waterman_algorithm • http://math.la.asu.edu/~cbs/pdfs/projects/Fall_2005/Karlin-AltschulStatistics.pdf • http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html
