
BLAST


clorene


Presentation Transcript


  1. BLAST

  2. Similarity and Homology • Similarity is a measure of “sameness”. It is expressed as a percentage and does not imply any reason for the observed sameness; it is simply a measure of the observed likeness. • Homology is an evolutionary term describing relationship through descent from a common ancestor. Homologous things are often similar (for example, the DNA sequence for actin in humans and chickens), but not always (for example, the flipper of a whale and your arm). • Homology is NEVER expressed as a percentage: the things being compared are either related or they are not. • Similarity is not homology: things may be some percentage similar, but they are either homologous or not.
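Percent identity is the simplest percentage-style similarity measure. The minimal sketch below assumes the two sequences have already been aligned, with gaps written as "-". The function name `percent_identity` is hypothetical. The denominator convention is also an assumption: here it is the full alignment length, including gap columns. Other tools may divide by a different length.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Assumes the inputs are already aligned (equal length). Identical non-gap
    columns are divided by the alignment length, gap columns included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * identical / len(aligned_a)


print(percent_identity("ACGT-A", "ACGTTA"))  # 5 of 6 columns identical
```

A high percentage alone says nothing about ancestry. It is an observation, which is exactly the distinction the slide draws.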

  3. Similarity and Homology • Sequence homology can be reliably inferred from statistically significant similarity over a majority of the sequence length. • Non-homology CANNOT be inferred from non-similarity because non-similar things can still share a common ancestor. • Homologous proteins share common structures, but not necessarily common sequence or function.
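BLAST usually reports "statistically significant similarity" as an E-value from Karlin-Altschul statistics. The formula is E = K·m·n·e^(-λS), where:
- S is the raw alignment score.
- m and n are the query and database lengths.
- λ and K depend on the scoring system.

The equivalent bit score is S' = (λS - ln K) / ln 2, which gives E = m·n·2^(-S'). The sketch below only evaluates these formulas. The λ and K values are placeholders, and BLAST's own edge-effect length corrections are left out.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score in bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance alignments scoring >= S: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """The same E-value expressed through the bit score: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Placeholder parameters for illustration only: real lambda and K depend on the
# scoring matrix and gap penalties, and BLAST also corrects m and n for edge effects.
LAMBDA, K = 0.267, 0.041
query_len, db_len = 300, 10**8
for s in (40, 60, 80):
    print(s, round(bit_score(s, LAMBDA, K), 1), evalue(s, query_len, db_len, LAMBDA, K))
```

An E-value far below 1 means a score that high is unlikely to arise by chance in a search of that size. That is the kind of evidence from which homology can be inferred. A large E-value does not establish non-homology.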

  4. What is BLAST? • Basic Local Alignment Search Tool • A sequence database search program • It compares a query sequence against each sequence in a target database • Produces local alignments: only a portion of each sequence is aligned • Uses statistical theory to estimate whether a match might have occurred by chance
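BLAST finds local alignments heuristically. The exact dynamic-programming method for local alignment is Smith-Waterman, and a small version of it shows what "only a portion of each sequence is aligned" means. This is a sketch under simplifying assumptions:
- The scores are toy values: match +2, mismatch -1, linear gap -2.
- BLAST itself uses substitution matrices such as BLOSUM62 and affine gap penalties, and it avoids filling the full matrix.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere: this makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTAAA", "GGACGTGG"))
```

Only the shared "ACGT" core is reported. The unrelated flanks of both sequences are left out of the alignment.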

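  5. BLAST programs by query and database type • blastn: nucleotide query vs. nucleotide database • blastp: protein query vs. protein database • blastx: nucleotide query, translated in 6 frames, vs. protein database • tblastn: protein query vs. nucleotide database translated in 6 frames • tblastx: nucleotide query translated in 6 frames vs. nucleotide database translated in 6 frames (both sides are compared as amino acid sequences)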
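"Translated in 6 frames" means reading a DNA sequence in three offsets on each strand. The sketch below uses the standard genetic code to show this. Some details are assumptions:
- The function names are hypothetical.
- Frames -1 to -3 are numbered here from the start of the reverse complement; other tools may label the reverse frames differently.
- A trailing partial codon is dropped.
- Codons containing non-ACGT bases become "X".

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks stops, 'X' marks codons with non-ACGT bases."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Frames +1..+3 read the given strand; -1..-3 read its reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA"))
```

blastx and tblastx apply this kind of translation to the query. tblastn and tblastx apply it to the database sequences.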

  6. BLAST at NCBI

  7. NCBI BLAST Databases
Peptide Sequence Databases
• nr: All non-redundant GenBank CDS translations + RefSeq Proteins + PDB + SwissProt + PIR + PRF.
• refseq: RefSeq protein sequences from NCBI's Reference Sequence Project.
• swissprot: Last major release of the SWISS-PROT protein sequence database (no updates).
• pat: Proteins from the Patent division of GenPept.
• pdb: Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank.
• month: All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days.
• env_nr: Protein sequences from environmental samples.
Nucleotide Sequence Databases
• nr: All GenBank + RefSeq Nucleotides + EMBL + DDBJ + PDB sequences (excluding HTGS phases 0, 1 and 2, EST, GSS, STS, PAT and WGS). No longer "non-redundant".
• refseq_rna: RNA entries from NCBI's Reference Sequence project.
• refseq_genomic: Genomic entries from NCBI's Reference Sequence project.
• est: Database of GenBank + EMBL + DDBJ sequences from the EST Divisions.
• est_human: Human subset of est.
• est_mouse: Mouse subset of est.
• est_others: Non-mouse, non-human subset of est.
• gss: Genome Survey Sequence; includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
• htgs: Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2 (finished, phase 3 HTG sequences are in nr).
• pat: Nucleotides from the Patent division of GenBank.
• pdb: Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank.
• month: All new or revised GenBank + EMBL + DDBJ + PDB sequences released in the last 30 days.
• dbsts: Database of GenBank + EMBL + DDBJ sequences from the STS Divisions.
• chromosome: A database with complete genomes and chromosomes from the NCBI Reference Sequence project.
• wgs: A database for whole genome shotgun sequence entries.
• env_nt: Nucleotide sequences from environmental samples.

  8. Graphical Overview

  9. Using a filter (SEG) on a query.
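SEG masks low-complexity regions of a protein query with "X". This keeps compositionally biased stretches from producing spurious hits. The sketch below is a simplified stand-in, NOT the SEG algorithm itself. It flags any sliding window whose Shannon entropy (in bits) falls below a threshold. The window length of 12 and the threshold of 2.2 bits are illustrative assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2,
                        mask_char: str = "X") -> str:
    """Replace every residue inside a low-entropy window with mask_char.

    Simplified entropy filter for illustration; SEG uses its own two-stage
    procedure. Entropy is always measured on the unmasked input.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for k in range(start, start + window):
                masked[k] = mask_char
    return "".join(masked)


print(mask_low_complexity("ACDEFGHIKLMN" + "Q" * 12 + "PRSTVWYACDEF"))
```

The poly-Q run is masked, along with a few flanking residues whose windows are still dominated by Q. The diverse ends are left untouched.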

  10. http://www.ncbi.nlm.nih.gov/blast/producttable.shtml

  11. What do you need to run BLAST? • The BLAST program • A BLAST-formatted database that can be queried • A query sequence • Query parameters

  12. Making your own BLAST DB • Any file of FASTA-formatted sequences can be turned into a BLAST DB. • How you do this depends on which BLAST variant you are using. • NCBI BLAST, protein DB: formatdb -p T -i myseqfile • NCBI BLAST, nucleotide DB: formatdb -p F -i myseqfile
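Before formatting a file into a BLAST DB, it can help to confirm that it really is FASTA. A FASTA file consists of records, each with a ">" header line followed by one or more sequence lines. The reader below is a minimal sketch. The name `read_fasta` is hypothetical, and rejecting sequence data that appears before the first header is a design choice. In practice, Biopython's `Bio.SeqIO.parse(handle, "fasta")` covers the same ground.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined, blank lines skipped."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 first record\nACGT\nTTGA\n>seq2\nMKV\n")
for name, seq in read_fasta(example):
    print(name, len(seq))
```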

  13. Command line BLAST • blastall -p blastp -d formatteddb -i myseq -o myseq.blastp
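`formatdb` and `blastall` come from the legacy NCBI toolkit. The current BLAST+ suite replaces them with `makeblastdb` and one executable per program, such as `blastp`. The sketch below assembles both command sequences as argument lists and runs them only if every program is on the PATH. Several details are assumptions:
- The helper names are hypothetical.
- The database name is assumed to equal the input file name, which is the default naming when no output name is given.
- `myseqfile`, `myseq` and `myseq.blastp` are the slides' placeholder file names.

```python
import shutil
import subprocess
from typing import List


def legacy_commands(seq_file: str, query: str, out: str) -> List[List[str]]:
    """Legacy NCBI BLAST steps as on the slides: build a protein DB, then run blastp."""
    return [
        ["formatdb", "-p", "T", "-i", seq_file],
        ["blastall", "-p", "blastp", "-d", seq_file, "-i", query, "-o", out],
    ]


def blast_plus_commands(seq_file: str, query: str, out: str) -> List[List[str]]:
    """The same two steps with BLAST+ (makeblastdb, then blastp)."""
    return [
        ["makeblastdb", "-in", seq_file, "-dbtype", "prot"],
        ["blastp", "-query", query, "-db", seq_file, "-out", out],
    ]


def run_if_available(commands: List[List[str]]) -> bool:
    """Run the commands in order if all programs are installed; report whether they ran."""
    if not all(shutil.which(cmd[0]) for cmd in commands):
        return False
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True


for cmd in blast_plus_commands("myseqfile", "myseq", "myseq.blastp"):
    print(" ".join(cmd))
```

Building argument lists instead of shell strings avoids quoting problems when file names contain spaces.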

  14. PSI-BLAST • PSI stands for Position-Specific Iterated. • This search method uses a profile: a position-specific account of which amino acid residues occur at each position in a family of aligned homologous proteins. • PSI-BLAST accepts a protein sequence as input and first conducts a normal BLAST search to identify homologues in the database. • A profile is constructed from the spectrum of sequences found among these initial homologues. • This profile is then used as the search key to identify more distant relatives. • The process is iterated, each time refining the profile by including the newly found members. • Ideally, the process converges on a unique set of genes.
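The loop described above can be illustrated with a toy, ungapped version:
1. Build a log-odds profile from aligned sequences.
2. Score every database sequence with it.
3. Add the new hits to the alignment.
4. Stop when a round finds nothing new, which is convergence.

Everything here is a simplifying assumption: uniform background frequencies, a flat pseudocount, no sequence weighting, ungapped windows and a fixed score threshold. Real PSI-BLAST uses gapped alignments, E-value inclusion thresholds, sequence weighting and more careful pseudocounts. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# The 20 standard amino acids; other characters (gaps, X) are ignored when counting.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Profile = List[Dict[str, float]]


def build_pssm(alignment: List[str], pseudocount: float = 1.0) -> Profile:
    """Log-odds profile (bits) from equal-length aligned sequences, uniform background."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile: Profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                        for aa in AMINO_ACIDS})
    return profile


def score_with_pssm(profile: Profile, segment: str) -> float:
    """Sum of position-specific scores for an ungapped segment (unknown residues score 0)."""
    if len(segment) != len(profile):
        raise ValueError("segment length must match profile length")
    return sum(column.get(res, 0.0) for column, res in zip(profile, segment))


def best_window(profile: Profile, seq: str) -> Tuple[float, str]:
    """Best-scoring ungapped window of seq against the profile."""
    width = len(profile)
    best: Tuple[float, str] = (float("-inf"), "")
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = score_with_pssm(profile, window)
        if score > best[0]:
            best = (score, window)
    return best


def iterate(seed: List[str], database: Dict[str, str],
            threshold: float, max_rounds: int = 5) -> Set[str]:
    """Toy PSI-BLAST-style loop: build profile, search, add new hits, repeat."""
    alignment = list(seed)
    members: Set[str] = set()
    for _ in range(max_rounds):
        profile = build_pssm(alignment)
        new_hits: Dict[str, str] = {}
        for name, seq in database.items():
            if name in members:
                continue
            score, window = best_window(profile, seq)
            if window and score >= threshold:
                new_hits[name] = window
        if not new_hits:  # converged: this round found no new members
            break
        members.update(new_hits)
        alignment.extend(new_hits.values())  # refine the profile with the new members
    return members
```

Conserved columns (such as the W in the first position of the seed) end up with high scores for their residue. A distant relative can therefore still score well if it keeps those key positions.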

  15. PHI-BLAST • Pattern Hit Initiated BLAST • PHI-BLAST expects as input a protein query sequence and a pattern contained in that sequence. • PHI-BLAST searches the specified database for other protein sequences that also contain the input pattern and have significant similarity to the query sequence in the vicinity of the pattern occurrences. • PHI-BLAST is integrated with Position-Specific Iterated BLAST (PSI-BLAST), so the results of a PHI-BLAST query can be used to initiate one or more rounds of PSI-BLAST searching. • Filling in the "regular expression" box on the PSI-BLAST page executes a PHI-BLAST search. • PHI-BLAST enforces the presence of a motif in addition to the usual PSI-BLAST criteria for matching. An example of a regular expression is W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P. This means a W, followed by 9 to 11 residues of any type, followed by one of V, F, or Y, and so on.
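The pattern above uses PROSITE-style syntax, which maps directly onto ordinary regular expressions. The sketch below translates a common subset of that syntax into a Python regex and scans a sequence for the motif:

| PROSITE | Meaning | Regex |
| --- | --- | --- |
| x | any residue | . |
| [ABC] | one of A, B, C | [ABC] |
| {ABC} | none of A, B, C | [^ABC] |
| (n) or (n,m) | repeat count | {n} or {n,m} |
| < and > | sequence ends | ^ and $ |

The function names are hypothetical, and some constructs (such as ">" inside brackets) are not handled. PHI-BLAST's own pattern parser may accept more than this.

```python
import re

# One PROSITE element: optional '<', a residue spec, optional repeat '(n)' or '(n,m)', optional '>'.
_ELEMENT = re.compile(r"(<)?(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>)?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into an equivalent Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, residue, low, high, c_term = m.groups()
        if residue == "x":
            token = "."                          # x: any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # {ABC}: any residue except A, B, C
        else:
            token = residue                      # single residue or [ABC] class, unchanged
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + token + ("$" if c_term else ""))
    return "".join(parts)


def find_motifs(pattern: str, sequence: str):
    """Return (start, matched_text) for each non-overlapping match of the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P"))
```

Matching the motif is only the first filter. PHI-BLAST additionally requires significant similarity to the query around each occurrence, which this sketch does not attempt.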

  16. BLAST Assignment • http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml • After reading the tutorial, go to the basic BLAST page, input a sequence, and run BLAST. • Go to the advanced BLAST page and use the same input sequence; change the parameters and see whether the output changes. • Go to the PSI-BLAST tutorial page, follow the tutorial, and proceed to a PHI-BLAST search.

  17. BLAST, by Ian Korf and M. Yandell (O’Reilly Publishing)
