Sequence Alignment

Dive into global and local alignment methods and scoring systems for matches, mismatches, and gaps in sequence alignment. Explore dynamic programming, BLAST programs, affine gap penalties, and protein and nucleotide database searches.

fswenson

Presentation Transcript


  1. Sequence Alignment

  2. Two general methods for sequence alignment:
     • Global alignment: considers similarity across the full extent of the sequences, e.g. MegAlign.
     • Local alignment: focuses on regions of similarity in parts of the sequences only, e.g. BLAST programs.

  3. Questions:
     • How similar are two sequences?
     • What is the best alignment between the two sequences?
     • How should alignments be scored?
     • And, if gaps are allowed, how should they be scored?
     Three things are required:
     • a means of scoring matches and mismatches,
     • a means of scoring gaps, and
     • a method of using the two to evaluate numerous possible alignments.

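  4. Example alignment. In the middle line, a letter marks an identical residue and "+" marks a conservative substitution:

         Sequence 1  ALCPQCDIE
                     ALC +CD+E
         Sequence 2  ALCAKCDVE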

  5. Grouping of amino acids based on physico-chemical properties important in protein structures.

  6. Commonly used substitution matrices are:
     • Point Accepted Mutation (PAM) matrices, e.g. PAM250
     • BLOcks SUBstitution Matrix (BLOSUM), e.g. BLOSUM62
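As a rough illustration of how a substitution matrix turns an alignment into a score, the sketch below scores the ungapped example alignment from slide 4. It assumes BLOSUM62 and uses only the few matrix entries that this example needs, typed in by hand; the names `BLOSUM62_SUBSET`, `pair_score` and `score_ungapped` are made up for this example, and a real analysis would load the full matrix from a trusted source.

```python
# Hand-entered subset of BLOSUM62 (assumption: only the residue pairs
# needed for the slide-4 example; keys are alphabetically sorted pairs).
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("L", "L"): 4, ("C", "C"): 9,
    ("D", "D"): 6, ("E", "E"): 5,
    ("A", "P"): -1, ("K", "Q"): 1, ("I", "V"): 3,
}


def pair_score(x, y):
    """Look up the substitution score for residues x and y (order-independent)."""
    return BLOSUM62_SUBSET[tuple(sorted((x, y)))]


def score_ungapped(seq1, seq2):
    """Score two equal-length sequences column by column.

    Returns the total score and a middle line in which a letter marks an
    identity, '+' marks a positive-scoring substitution, and a space
    marks everything else.
    """
    if len(seq1) != len(seq2):
        raise ValueError("ungapped scoring needs sequences of equal length")
    total = 0
    midline = []
    for x, y in zip(seq1, seq2):
        s = pair_score(x, y)
        total += s
        if x == y:
            midline.append(x)
        elif s > 0:
            midline.append("+")
        else:
            midline.append(" ")
    return total, "".join(midline)


total, midline = score_ungapped("ALCPQCDIE", "ALCAKCDVE")
print("ALCPQCDIE")
print(midline)
print("ALCAKCDVE")
print("score:", total)
```

With these entries, the two "+" columns (Q/K and I/V) add to the score, while the P/A column subtracts from it.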

  7. Gap penalties: mutational events include not only substitutions but also insertions and deletions.
     • Affine gap penalties impose an 'opening' penalty for starting a gap and a smaller 'extension' penalty for each additional position in an already opened gap, so one long gap costs less than several short gaps of the same total length.

         Sequence 1  ALCPQCDIE
                     ALC   D+E
         Sequence 2  ALCA--DVE
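The opening and extension penalties can be sketched as a small function. The values 10 and 1 are hypothetical, chosen only for illustration, and the convention used here (the first gap position costs the opening penalty, each further position costs the extension penalty) is one of several in use; some tools charge the extension penalty for the first position as well.

```python
def affine_gap_penalty(gap_length, gap_open=10, gap_extend=1):
    """Penalty for a single gap of `gap_length` positions.

    Assumed convention: the first position costs `gap_open`, and every
    additional position costs `gap_extend`. Default values are
    illustrative, not taken from any particular program.
    """
    if gap_length <= 0:
        return 0
    return gap_open + gap_extend * (gap_length - 1)


# One gap of two positions (as in ALCA--DVE) versus two separate
# one-position gaps: the single longer gap is penalised less.
one_long_gap = affine_gap_penalty(2)
two_short_gaps = 2 * affine_gap_penalty(1)
print(one_long_gap, two_short_gaps)
```

Because extending an open gap is cheap, the scoring favours alignments that explain an insertion or deletion as one event rather than many.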

  8. Sequence Search

  9. Sensitivity versus speed
     • FASTA looks for exactly matching 'words'.
     • BLAST uses a scoring matrix.

  10. BLAST (Basic Local Alignment Search Tool)
      • The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity.
      • They include a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA.
      • The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits.
      • Local alignment may produce more biologically meaningful and sensitive results.

  11. Dynamic programming
      • First described in the 1950s.
      • First applied in this context by Needleman and Wunsch in 1970.
      • Works by breaking the original problem into smaller and smaller subproblems until the subproblems have a trivial solution, and then using those solutions to construct solutions for larger and larger portions of the original problem.
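The idea can be sketched as a minimal global alignment in the style of Needleman and Wunsch. This is a simplified illustration, not the published method in full: it assumes a flat match/mismatch score instead of a substitution matrix and a linear gap penalty instead of an affine one, and the default values are arbitrary.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming (simplified sketch).

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is built from three smaller subproblems.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one best alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ALCPQCDIE", "ALCAKCDVE"))
```

The table has one cell per pair of prefixes, so the work grows with the product of the two sequence lengths, which is why faster heuristics are used for database searches.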

  12. All BLAST programs take the following steps:
      • The query is divided into short, overlapping words of a fixed size (e.g. 3 for an amino acid sequence, 11 for a nucleotide sequence).
      • Words with simple compositions are filtered out.
      • The remaining words are searched for in the databases.
      • After the best-matching sequence is found for each word, the match is extended in both directions until the high-scoring segment pairs (HSPs) are found.
      • HSPs are reported to the client.

      Example: the query MNPLSSSGQPHTLM is divided into the 3-letter words MNP, NPL, PLS, LSS, SSS, SSG, SGQ, GQP, QPH, PHT, HTL and TLM. The word LSS is found in the database sequence MNGPLSSSGQTSTSPH, and the match is extended in both directions to PLSSSGQ.
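The word-splitting and extension steps can be sketched with the two sequences from this slide. This is a deliberately simplified illustration and not how BLAST itself is implemented: it assumes exact word matches only, skips the low-complexity filter, and extends a hit only while residues are identical, whereas BLAST scores the extension with a substitution matrix. All function names are made up for this example.

```python
def split_into_words(query, k):
    """Return every overlapping word of length k in the query."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]


def extend_hit(query, subject, q_start, s_start, k):
    """Extend an exact word hit left and right while residues are identical.

    Simplifying assumption: extension stops at the first non-identical
    pair instead of using a score-based stopping rule.
    """
    q_lo, s_lo = q_start, s_start
    while q_lo > 0 and s_lo > 0 and query[q_lo - 1] == subject[s_lo - 1]:
        q_lo -= 1
        s_lo -= 1
    q_hi, s_hi = q_start + k, s_start + k
    while q_hi < len(query) and s_hi < len(subject) and query[q_hi] == subject[s_hi]:
        q_hi += 1
        s_hi += 1
    return query[q_lo:q_hi]


def seed_and_extend(query, subject, k=3):
    """Find exact word hits in the subject and return the extended segments."""
    segments = set()
    for q_start, word in enumerate(split_into_words(query, k)):
        s_start = subject.find(word)
        while s_start != -1:
            segments.add(extend_hit(query, subject, q_start, s_start, k))
            s_start = subject.find(word, s_start + 1)
    return segments


query = "MNPLSSSGQPHTLM"
subject = "MNGPLSSSGQTSTSPH"
print(split_into_words(query, 3))
print(seed_and_extend(query, subject))
```

Several different words (PLS, LSS, SSS, SSG, SGQ) seed the same region here, and all of them extend to the same segment.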

  13. BLAST Programs
      • BLASTN: compares a nucleotide query sequence against a nucleotide sequence database.
      • BLASTP: compares an amino acid query sequence against a protein sequence database.
      • BLASTX: compares a nucleotide query sequence translated in all reading frames against a protein sequence database.

  14. BLAST Programs (continued)
      • tblastn: compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
      • tblastx: compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
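"Translated in all reading frames" means three frames on the given strand plus three on its reverse complement. The sketch below shows one way to produce those six translations. It assumes the standard genetic code and an unambiguous A/C/G/T sequence; the function names and frame labels (`+1` to `-3`) are choices made for this example rather than anything BLAST exposes.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop.

    Ambiguous bases such as N are not handled and raise KeyError.
    """
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translate both strands in all three frames (+1..+3 and -1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translations("ATGGCC").items():
    print(name, protein)
```

A translated search then compares each of these protein strings instead of the nucleotide sequence itself, so the correct frame does not have to be known in advance.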

  15. If your sequence is NUCLEOTIDE

      | Length | Database | Purpose | BLAST program |
      |---|---|---|---|
      | 20 bp or longer | Nucleotide | Identify the query sequence | MEGABLAST (accepts batch queries); Standard BLAST (blastn) |
      | 20 bp or longer | Nucleotide | Find sequences similar to query sequence | Standard BLAST (blastn) |
      | 20 bp or longer | Nucleotide | Find similar proteins to translated query in a translated database | Translated BLAST (tblastx) |
      | 20 bp or longer | Protein | Find similar proteins to translated query in a protein database | Translated BLAST (blastx) |
      | 7-20 bp | Nucleotide | Find primer binding sites or map short contiguous motifs | Search for short, nearly exact matches |

  16. If your sequence is PROTEIN

      | Length | Database | Purpose | BLAST program |
      |---|---|---|---|
      | 15 residues or longer | Protein | Identify the query sequence or find protein sequences similar to query | Standard Protein BLAST (blastp) |
      | 15 residues or longer | Protein | Find members of a protein family or build a custom position-specific score matrix | PSI-BLAST |
      | 15 residues or longer | Protein | Find proteins similar to the query around a given pattern | PHI-BLAST |
      | 15 residues or longer | Conserved Domains | Find conserved domains in the query | CD-search (RPS-BLAST) |
      | 15 residues or longer | Conserved Domains | Find conserved domains in the query and identify other proteins with similar domain architectures | Domain Architecture Retrieval Tool (DART) |
      | 15 residues or longer | Nucleotide | Find similar proteins in a translated nucleotide database | Translated BLAST (tblastn) |
      | 5-15 residues | Protein | Search for peptide motifs | Search for short, nearly exact matches |

  17. BLAST search examples
