1 / 17

Multiple alignment

Multiple alignment. June 29, 2007 Learning objectives- Review sequence alignment answer and answer questions you may have. Understand how the E value may be altered by changing your search criteria. Understand usefulness of multiple sequence alignment. Become familiar with Clustal W.

fathi
Download Presentation

Multiple alignment

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multiple alignment • June 29, 2007 • Learning objectives- • Review sequence alignment answer and answer questions you may have. • Understand how the E value may be altered by changing your search criteria. • Understand usefulness of multiple sequence alignment. Become familiar with Clustal W.

  2. Answer to pairwise sequence alignment workshop Similarity score: 60 %similarity: 10/12 x 100 = 83.3% P-RCLCQRDNCBA | || | |::|:| PBRCKC-RNDCDA

  3. P – R C L C Q R D N C B A | | | | | : : | : | P B R C K C – R N D C D A 9-5 7 12-3 12-5 7 2 2 12 5 5 SUM = 60

  4. Multiple Sequence Alignment 1 • Collection of three or more protein (or nucleic acid) sequences partially or completely aligned. • Aligned residues tend to occupy corresponding positions in the 3-D structure of each aligned protein.

  5. Practical use of MSA • Places proteins into a group of related proteins (paralogs and orthologs). • Identifies conserved domains and motifs • Identifies sequencing errors in nucleotide sequences • Identifies important regulatory regions in the promoters of genes.

  6. Practical uses Create Alignment If possible, edit the alignment to ensure that regions of functional or structural similarity are preserved Find conserved motifs to deduce function Structural Analysis Design of PCR primers Phylogenetic Analysis

  7. Clustal W (Thompson et al., 1994) • CLUSTAL=Cluster alignment • The underlying concept is that groups of sequences are phylogenetically related. If they can be aligned, then one can construct a tree. • Step1-pairwise alignments • Step2-create a guide tree • Step3-progressive alignment

  8. Flowchart of computation steps in Clustal W (Thompson et al., 1994) Pairwise Alignment: Calculation of distance matrix Creation of unrooted Neighbor-Joining Tree Create rooted NJ Tree (Guide Tree) and calculate sequence weights Progressive alignment following the Guide Tree

  9. Step 1-Pairwise alignments Compare each sequence with each other and pairwise alignment scores Human EYSGSSEKIDLLASDPHEALICKSERVHSKSVESNIEDKIFGKTYRKKASLPNLSHVTEN 480 Dog EYSGSSEKIDLMASDPQDAFICESERVHTKPVGGNIEDKIFGKTYRRKASLPKVSHTTEV 477 Mouse GGFSSSRKTDLVTPDPHHTLMCKSGRDFSKPVEDNISDKIFGKSYQRKGSRPHLNHVTE 476 • SeqA Name Len(aa) SeqB Name Len(aa) Score • human 60 2 dog 60 76 • 1 human 60 3 mouse 59 57 • 2 dog 60 3 mouse 59 49

  10. Step 1-Pairwise alignments Compare each sequence with each other and calculate similarity scores. H - D 76 - M 57 49 - Different sequences H D M The highest similarity score is between sequence H and sequence D

  11. Sreal(ij) – Srand(ij) Sident(ij) – Srand(ij) Seff = x 100 Step 2-Create Guide Tree Use the similarity scores to create a guide tree to determine the “order” of the sequences to be aligned. H - D 76 - M 57 49 - Sreal(ij) describes the similarity score for two aligned sequences i and j. Different sequences Sident(ij) is the average of the two scores for the two sequences compared to themselves H D M Srand(ij) is the mean alignment score derived from many random shufflings D ranges from 0 to 1 D = -ln(Seff) ( human:0.07429, dog:0.15904, mouse:0.34944);

  12. human:0.07429 dog:0.15904 mouse:0.3494 Guide Tree Step 2-Create Guide Tree (human:0.07429, dog:0.15904, mouse:0.34944); This branch length is proportional to the estimated divergence between the Dog sequence and an “average” sequence common to all three sequences

  13. human:0.07429 dog:0.15904 mouse:0.3494 Guide Tree Step 3-Progressive Alignment Align human and dog first. Then add mouse to the previous alignment. In closely aligned sequences, gaps are given lower penalty than penalty for gaps in more diver- gent sequences. General rule: “once a gap always a gap” Why is there a lower penalty for the closely aligned sequences? In closely aligned sequences, those gaps suggest that they are located in areas between functional or structural domains. In more divergent sequences gaps may be located in areas where the sequences that are dissimilar regardless of whether they break up functional or structural domains.

  14. Other Gap treatment • Short stretches of 5 hydrophilic residues often indicate loop or random coil regions (not essential for structure) and therefore gap penalties are lowered if they separate such areas. • Alignments of proteins of known structure show that proteins gaps do not occur more frequently than every eight residues. Therefore penalties for gaps increase when required at 8 residues or less for alignment. This gives a lower alignment score in that region. • A gap penalty is assigned after each aa according to the frequency that such a gap naturally occurs after that aa to align known homologs.

  15. Amino acid weight matrices • As we know, there are many scoring matrices that one can use depending on the relatedness of the aligned proteins. • In CLUSTAL W, as the alignment proceeds to longer branches the aa scoring matrices are automatically changed to more divergent scoring matrices (lower BLOSUM Scoring Matrices). The length of the branch is used to determine which matrix to use.

  16. Example of Sequence Alignment using Clustal W

  17. Multiple Alignment Considerations • Quality of guide tree. It would be good to have a set of closely related sequences in the alignment to set the pattern for more divergent sequences. • If the initial alignments have a problem, the problem is magnified in subsequent steps. • CLUSTAL W is best when aligning sequences that are related to each other over their entire lengths. • Do not use when there are variable N- and C- terminal regions.

More Related