BLAST is a heuristic method for performing local alignments through searches of high scoring segment pairs (HSPs). It is the fastest and most frequently used sequence alignment tool, offering both sensitivity and speed. This lecture covers the uses of BLAST and provides access to different flavors of BLAST, as well as instructions on how to run NCBI BLAST.
Lecture 3 BLAST
BLAST • Basic Local Alignment Search Tool • A heuristic method for performing local alignments by searching for high-scoring segment pairs (HSPs) • First to use statistics to predict the significance of initial matches • Offers both sensitivity and speed
BLAST • Looks for clusters of nearby or locally dense “similar or homologous” k-tuples • Uses “look-up” tables to shorten search time • Uses a larger “word size” than FASTA to accelerate the search process • Performs local alignment only (as the name says, it is not a global alignment tool) • Fastest and most frequently used sequence alignment tool -- THE STANDARD
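The look-up table idea above can be sketched in a few lines of Python. This is a simplified illustration, not BLAST's actual implementation: it indexes only exact word matches, whereas protein BLAST also accepts similar "neighbourhood" words that score above a threshold. The function names and the word size of 3 are assumptions made for the example.

```python
from collections import defaultdict


def build_lookup(sequence, word_size=3):
    """Map every word (k-tuple) of length word_size to its start positions."""
    table = defaultdict(list)
    for i in range(len(sequence) - word_size + 1):
        table[sequence[i:i + word_size]].append(i)
    return table


def find_seed_hits(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where an identical word occurs."""
    table = build_lookup(query, word_size)
    hits = []
    for j in range(len(subject) - word_size + 1):
        # One dictionary lookup per subject word instead of rescanning the query.
        for i in table.get(subject[j:j + word_size], ()):
            hits.append((i, j))
    return hits


def hits_by_diagonal(hits):
    """Group hits by diagonal (subject_pos - query_pos).

    Several hits on the same diagonal form a locally dense cluster,
    which is a candidate region for an HSP.
    """
    diagonals = defaultdict(list)
    for i, j in hits:
        diagonals[j - i].append((i, j))
    return dict(diagonals)


if __name__ == "__main__":
    seeds = find_seed_hits("KIQIYGTG", "AAKIQIYG")
    print(hits_by_diagonal(seeds))
```

The table is built once from the query, so each word in a database sequence costs a single lookup. That is where the time saving comes from.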
Uses of BLAST • Identifying species: BLAST can help identify the species a sequence comes from and/or find homologous species. • Locating domains: A protein sequence can be submitted to BLAST to locate known domains within it. • Establishing phylogeny: BLAST results can be used to build a phylogenetic tree from the BLAST web page. Phylogenies based on BLAST alone are less reliable than purpose-built computational phylogenetic methods, so they should be relied on only for "first pass" phylogenetic analyses. • DNA mapping: When working with a known species and looking to sequence a gene at an unknown location, BLAST can compare the chromosomal position of the sequence of interest to relevant sequences in the database(s). • Comparison: BLAST can locate common genes in two related species and can be used to map annotations from one organism to another.
BLAST Access • NCBI BLAST • http://www.ncbi.nlm.nih.gov/BLAST/ • Canadian Bioinformatics Resource BLAST • http://cbr-rbc.nrc-cnrc.gc.ca/blast/ • European Bioinformatics Institute BLAST • http://www.ebi.ac.uk/blastall/ • http://www.ebi.ac.uk/blast2/
Different Flavours of BLAST • BLASTP - protein query against protein DB • BLASTN - DNA/RNA query against GenBank (DNA) • BLASTX - 6-frame translated DNA query against protein DB • TBLASTN - protein query against 6-frame translated GenBank • TBLASTX - 6-frame translated DNA query against 6-frame translated GenBank • PSI-BLAST - protein ‘profile’ query against protein DB • PHI-BLAST - protein pattern against protein DB
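The "6-frame translation" used by BLASTX, TBLASTN and TBLASTX can be illustrated with a short Python sketch. It assumes the standard genetic code and unambiguous DNA. The function names and frame labels (+1 to +3, -1 to -3) are choices made for this example, not BLAST's own code.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unrecognised codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate three reading frames on each strand."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGAAAATTCAG").items():
        print(frame, protein)
```

Each of the six protein strings is then searched as if it were a protein sequence. This is why the translated flavours are slower than BLASTP or BLASTN.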
Other BLAST Services • MEGABLAST - for comparison of large sets of long DNA sequences • RPS-BLAST - Conserved Domain Detection • BLAST 2 Sequences - for performing pairwise alignments for 2 chosen sequences • Genomic BLAST - for alignments against select human, microbial or malarial genomes
MT0895 • MMKIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIMGRVASKEEIKKILS
Running NCBI BLAST • Paste in the sequence (FASTA format, raw sequence, or type in a GI or accession number)
>Mysequence MT0895
KIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIDS
OR
>
KIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIDS
OR
KIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIDS
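The three pasted forms (FASTA with a title, FASTA with an empty ">" line, and raw sequence) all carry the same residues. The sketch below is a hypothetical helper showing how such input could be normalised before a search. `parse_query` is a name invented for this example and is not part of any BLAST tool. It does not handle GI or accession numbers.

```python
def parse_query(text):
    """Return (title, sequence) from FASTA-formatted or raw sequence text.

    A line starting with '>' is treated as the title line; every other
    non-empty line is treated as sequence. Whitespace inside sequence
    lines is dropped.
    """
    title = ""
    residues = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if residues:
                # A second record would start here; this sketch handles one query.
                raise ValueError("expected a single sequence")
            title = line[1:].strip()
        else:
            residues.append("".join(line.split()))
    return title, "".join(residues).upper()


if __name__ == "__main__":
    seq = "KIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIDS"
    print(parse_query(">Mysequence MT0895\n" + seq))
    print(parse_query(">\n" + seq))
    print(parse_query(seq))
```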
Running NCBI BLAST • Choose a range of interest in the sequence “set subsequences” (not usually used) • Select the database from pull-down menu (usually choose nr = non-redundant) • Leave “Options” unchanged (use defaults)
Running NCBI BLAST Select Database
Running NCBI BLAST Click BLAST!
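The same steps (paste a sequence, choose a program and database, keep the defaults, submit) can also be expressed as a scripted request. The sketch below only builds the form data and sends nothing. It assumes NCBI's URL API at blast.ncbi.nlm.nih.gov with the parameters CMD, PROGRAM, DATABASE and QUERY. The endpoint and the helper name `build_submission` are assumptions of this example, not taken from the lecture, so check the current NCBI documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint of NCBI's BLAST URL API (not stated in the lecture).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submission(query, program="blastp", database="nr"):
    """Build the URL and form body for submitting one query.

    Only the sequence, program and database are set; all other search
    options are left at the server's defaults.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # e.g. blastp, blastn, blastx
        "DATABASE": database,  # nr = non-redundant
        "QUERY": query,        # FASTA text, raw sequence, or an accession
    }
    return BLAST_URL, urlencode(params)


if __name__ == "__main__":
    url, body = build_submission(
        "KIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIDS"
    )
    print(url)
    print(body)
```

Sending the body as a POST request and then polling for results is left out on purpose. A scripted client should also respect NCBI's usage guidelines on request frequency.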
BLAST Parameters • Identities - number and % of exact residue matches • Positives - number and % of similar plus identical matches • Gaps - number and % of gaps introduced • Score - summed HSP score (S) • Bit Score - a normalized score (S’) • Expect (E) - expected number of HSP alignments arising by chance
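How these quantities relate can be sketched in Python using the standard Karlin-Altschul relations: S' = (λS − ln K) / ln 2 and E = m·n·2^(−S'), where m and n are the query and database lengths. The λ and K values in the usage lines are illustrative assumptions. Real values depend on the scoring matrix and gap penalties and are printed in the BLAST report. "Positives" are omitted because counting them needs a substitution matrix.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalise a raw score S into a bit score S' = (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance HSPs with at least this bit score: m*n*2^-S'."""
    return query_len * db_len * 2.0 ** (-bits)


def alignment_counts(aligned_query, aligned_subject):
    """Count identities and gap positions in a pairwise alignment.

    Both strings must have equal length and use '-' for gaps.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    identities = gaps = 0
    for a, b in zip(aligned_query, aligned_subject):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            identities += 1
    return {"length": len(aligned_query), "identities": identities, "gaps": gaps}


if __name__ == "__main__":
    # lam and k below are illustrative assumptions, not values from the lecture.
    bits = bit_score(raw_score=120, lam=0.267, k=0.041)
    print(round(bits, 1), expect_value(bits, query_len=62, db_len=1_000_000))
    print(alignment_counts("KIQ-YG", "KIQIYA"))
```

Because the bit score already folds in λ and K, bit scores from searches with different scoring systems can be compared directly, while raw scores cannot.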
Conclusions • BLAST is the most important program in bioinformatics (maybe all of biology) • BLAST is based on sound statistical principles (key to its speed and sensitivity) • A basic understanding of its principles is key for using/interpreting BLAST output • Use BLASTN or MEGABLAST for DNA • Use PSI-BLAST for protein searches