1 / 57

Bioinformatics – NSF Summer School 2003 Z. Luthey-Schulten, UIUC

Bioinformatics – NSF Summer School 2003 Z. Luthey-Schulten, UIUC. Sequence-Sequence Alignment. Smith-Watermann Needleman-Wunsch. Sequence-Structure Alignment. Threading Hidden Markov. Sequence Alignment & Dynamic Programming. number of possible alignments:.

luna
Download Presentation

Bioinformatics – NSF Summer School 2003 Z. Luthey-Schulten, UIUC

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Bioinformatics – NSF Summer School 2003 Z. Luthey-Schulten, UIUC

  2. Sequence-Sequence Alignment • Smith-Watermann • Needleman-Wunsch Sequence-Structure Alignment • Threading • Hidden Markov

  3. Sequence Alignment & Dynamic Programming number of possible alignments: Seq. 1: a1 a2 a3 - - a4 a5…an Seq. 2: c1 - c2 c3 c4 c5 - …cm Smith-Waterman alignment algorithm Score Matrix H: Traceback AWGHE AW--HE

  4. Smith-Waterman Local Alignment Score Matrix AWGHE AW--HE

  5. Blosum 40 Substitution Matrix

  6. Protein Structural Relationships Can protein structural relationships help us to understand evolutionary dynamics? Is there a connection between evolutionary events and changes in protein structure? What is the effect of gene duplication, horizontal gene transfer, and other evolutionary mechanisms on protein shape? Substitution Indel Domain Insertion O’Donoghue and Luthey-Schulten, UIUC 2003

  7. Sequence Alignment & Dynamic Programming number of possible alignments: Seq. 1: a1 a2 a3 - - a4 a5…an Seq. 2: c1 - c2 c3 c4 c5 - …cm Needleman-Wunsch alignment algorithm Score Matrix H: Traceback ??? Tutorial: W=d

  8. Needleman-Wunsch Global Alignment Initialization of Gap Penalties Similarity Values http://www.dkfz-heidelberg.de/tbi/bioinfo/PracticalSection/AliApplet/index.html

  9. Filling out the Score Matrix H http://www.dkfz-heidelberg.de/tbi/bioinfo/PracticalSection/AliApplet/index.html

  10. Traceback and Alignment The Alignment Traceback (blue) from optimal score http://www.dkfz-heidelberg.de/tbi/bioinfo/PracticalSection/AliApplet/index.html

  11. Energy Landscape Theory of Structure Prediction

  12. Protein Structure Prediction 1-D protein sequence 3-D protein structure Ab Initio protein folding SISSIRVKSKRIQLG…. Sequence Alignment Target protein of unknown structure SISSRVKSKRIQLGLNQAELAQKV------GTTQ… QFANEFKVRRIKLGYTQTNVGEALAAVHGS… Homologous/analogous protein of known structure Sequence Alignment: the Energy Function E= Ematch + Egap Egap= ? Ematch=

  13. Threading: Sequence-Structure Alignment “Scaffold” structure Target sequence threading alignment between target and scaffold A1 A2 A3 A4 A5 … Threading Energy Function R. Goldstein, Z. Luthey-Schulten, P. Wolynes (1992, PNAS)

  14. Gap Penalties Distribution of Gaps Insertion Deletion Sequence-Structure Gap Energy target scaffold Bulge R. Goldstein, Z. Luthey-Schulten, P. Wolynes (1994) Proc 27th Annu Hawaii Int Conf Sys Sci:306.

  15. Similarity Measures Sequence Identity fraction of identically matched residues Q “Structural Identity” fraction of native contacts

  16. <Es/E> 2 Es

  17. Homology Modeling - Threading

  18. Results from CASP5 CM/FR The prediction is never better than the scaffold. Threading Energy function requires improvement.

  19. You are now entering the twilight zone of sequence identity. We need profiles! Watch for Bioinformants!!!

  20. Profiles – Evolution Revisited • “What molecular sequences taught us in the 1960’s was that the genealogical history of an organism is written to one extent or another into the sequences of each of its genes, an insight that became the central tenet of a new discipline, molecular evolution” • Woese (PNAS, 2000) Pauling (1965)

  21. Universal Tree The Universal Phylogenetic Tree inferred from comparative analyses of rRNA sequences: Woese(PNAS, 1990)

  22. Horizontal Gene Transfer O’Donoghue and Luthey-Schulten, UIUC 2003

  23. Multiple Sequence Alignments • “The aminoacyl-tRNA synthetases, perhaps better than any other molecules in the cell, eptiomize the current situation and help to under standard (the effects) of HGT” Woese (PNAS, 2000; MMBR 2000)

  24. Standard Dogma Molecular Biology • DNA RNA Proteins • Role of AARS? • Charging of t-RNA

  25. NCBI 3D

  26. LeuRS Canonical Tree Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll (Yale) Micro. Mol. Biol. Rev. March 2000..

  27. D,N Sequence Phylogenetic Trees Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll (Yale) Micro. Mol. Biol. Rev. March 2000..

  28. Fold Motifs of AARSs O’Donoghue and Luthey-Schulten, UIUC 2003

  29. Structure Conserved More than Sequence Structural Overlap of Class II AARS Conserved helices Conserved sheets

  30. Subset of Class II Structural Tree O’Donoghue and Luthey-Schulten, UIUC 2003

  31. Novel Evolutionary Connections from Sequence and Structure B A E Canonical Pattern D E F L W Y Canonical Pattern + AB I H P M Gemini K1 K2 C S G N Q Basal Canonical V T A R B A No canonical pattern Horizontal transfer after B-AE split. O’Donoghue, Luthey-Schulten, UIUC 2003 Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll (Yale) Micro. Mol. Biol. Rev. March 2000..

  32. Gap Distribution Functions Length Gap Distribution Function Spatial Gap Distribution Funciton l, gap length (residues) rij, spatial gap distance (Å) B. Qian & R. Goldstein. (2001) Proteins45:102.

  33. Structural Alignment Methods • PDB - Structural Neighbors – CE (Bourne) • Stamp - Russell

  34. Multiple Structural Alignments • STAMP • Initial Alignment • Multiple Sequence alignment • Ridged Body “Scan” • Refine Initial Alignment & Produce Multiple Structural Alignment • Dynamic Programming (Smith-Waterman) through P matrix gives optimal set of equivalent residues. • This set is used to re-superpose the two chains. Then iterate until alignment score is unchanged. • This procedure is performed for all pairs. R. Russell, G. Barton (1992) Proteins14: 309.

  35. Multiple Structural Alignments • STAMP – cont’d • Refine Initial Alignment & Produce Multiple Structural Alignment Alignment score: • Multiple Alignment: • Create a dendrogram using the alignment score. • Successively align groups of proteins (from branch tips to root). • When 2 or more sequences are in a group, • then average coordinates are used.

  36. Stamp Output/Secondary Structure O’Donoghue and Luthey-Schulten, UIUC 2003

  37. Stamp Output/Clustal Format O’Donoghue and Luthey-Schulten, UIUC 2003

  38. Examples of Useful Web Tools • Genomes – Sequence and Gene Information • Domain Architecture • Multiple Sequence Alignments • Phylogenetic Trees • Structural Databases • Hidden Markov Methods

  39. NCBI: Genomes

  40. Charging the tRNA Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll (Yale) Micro. Mol. Biol. Rev. March 2000..

  41. NCBI 3D

  42. Report from SWISS-PROT

  43. PFAM Report

  44. Sequence Dendrogram from Clustal Luthey-Schulten, UIUC 2003

  45. Phylogenetic Tree in Tutorial Pogorelov and Luthey-Schulten, UIUC 2003

  46. Alignment in MOE

  47. Alignment in MOE

More Related