1 / 22

© Wiley Publishing. 2007. All Rights Reserved.

© Wiley Publishing. 2007. All Rights Reserved. Comparing Two Sequences. Learning Objectives. Get the basics about dot plots Know how to interpret the most common patterns in a dot plot Use Dotlet Use Lalign to extract local alignments. Outline. Some reasons for comparing two sequences

malaya
Download Presentation

© Wiley Publishing. 2007. All Rights Reserved.

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. © Wiley Publishing. 2007. All Rights Reserved. Comparing Two Sequences

  2. Learning Objectives • Get the basics about dot plots • Know how to interpret the most common patterns in a dot plot • Use Dotlet • Use Lalign to extract local alignments

  3. Outline • Some reasons for comparing two sequences • Basic principles of dot-plot comparisons • Using Dotlet • Making local alignments with Lalign

  4. Why Compare Two Sequences? • Database searches are useful for finding homologues • Database searches don’t provide precise comparisons • More precise tools are needed to analyze the sequences in detail including • Dot plots for graphic analysis • Local or global alignments for residue/residue analysis • The alignment of two sequences is called a pairwise alignment

  5. Using The Right Tool

  6. Some Applications of Pairwise Alignments • Convince yourself two sequences are homologous • Identify a shared domain • Identify a duplicated region • Locate important features such as • Catalytic domains • Disulphide bridges • Compare a gene and its product

  7. What Is a Dot Plot ? • A dot plot is a graphic representation of pairwise similarity • The simplicity of dot plots prevents artifacts • Ideal for looking for features that may come in different orders • Reveal complex patterns • Benefit from the most sophisticated statistical-analysis tool in the universe . . . your brain

  8. Choosing Your Two Sequences • Making pairwise comparisons takes time • Use BLAST to rapidly select your sequences • More than 70% identity for DNA • More than 25% identity for proteins • If your sequences are too similar, comparing them yields no useful information

  9. Self-comparisons • Start comparing your sequence with itself • You can discover • Repeated domains • Motifs repeated many times (low complexity) • Mirror regions (palindromes) in nucleic acids

  10. What Can You Analyze with a Dot Plot ? • Any pair of sequences • DNA • Proteins • RNA • DNA with proteins • Dotlet is an appropriate tool • To compare full genomes, install the program locally • Sequences longer than 1000 symbols are hard to analyze online

  11. Some Typical Dot-plot Comparisons • Divergent sequences where only a segment is homologous • Long insertions and deletions • Tandem repeats • The square shape of the pattern is characteristic of these repeats

  12. Using Dotlet • Dotlet is one of the handiest tools for making dot plots • Dotlet is a Java applet • Open and download the applet at the following site: • www.isrec.isb-sib.ch/java/dotlet • Use Firefox or IE (if one doesn’t work, use the other)

  13. Set Dotlet Parameters • Dotlet slides a window along each sequence • If the windows are more similar than the threshold, Dotlet prints a dot at their intersection • You can control the similarity threshold with the little window on the left Window size Window Size Threshold Threshold

  14. The Dotlet Threshold • Every dot has a score given by the window comparison • When the score is • Below threshold 1  black dot • Between thresholds 1 and 2  grey dot • Above threshold 2  white dot • The blue curve is the distribution of scores in the sequences • The peak  most common score, • Most common  less informative Log curve

  15. Getting Your Dot Plot Right • Window size and the stringency control the aspect of your dot plot • Very stringent = clean dot plot, little signal • Not stringent enough = noisy dot plot, too much signal • Play with the threshold until a usable signal appears

  16. Which Size for the Window? • Long window • Clean dot plots • Little sensitivity • Short window • Noisy dot plots • Very sensitive • The size of the window should be in the range of the elements you are looking for • Conserved domains: 50 amino acids • Transmembrane segments: 20 amino acids • Shorten the window to compare distantly related sequences

  17. Looking at Repeated Domains with Dotlet • The square shape is typical of tandem repeats • The repeats are not perfect because the sequences have diverged after their duplication

  18. Comparing a Gene and Its Product • Eukaryotic genes are transcribed into RNA • The RNA is then spliced to remove the introns’ sequences • It may be necessary to compare the gene and its product • Dotlet makes this comparative analysis easy

  19. Aligning Sequences • Dotlet dot plots are a good way to provide an overview • Dot plots don’t provide residue/residue analysis • For this analysis you need an alignment • The most convenient tool for making precise local alignments is Lalign

  20. Lalign and BLAST • Lalign is like a very precise BLAST • It works on only two sequences at a time • You must provide both sequences

  21. Lalign Output • Lalign produces an output similar to the alignment section of BLAST • The E-value indicates the significance of each alignment • Low E-value  good alignment

  22. Going Farther • If you need to align coding DNA with a protein, try these sites: • www.tcoffee.org => protogene • coot.embl.de/pal2nal • If you need to align very large sequences, try this site: • www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi • If you need a precise estimate of your alignment’s statistical significance, use PRSS • The program is available at fasta.bioch.virginia.edu

More Related