
Structural Bioinformatics

Structural Bioinformatics. Elodie Laine Master BIM-BMC Semestre 3, 2017-2018. Laboratoire de Biologie Computationnelle et Quantitative (LCQB) e-documents : http://www.lcqb.upmc.fr/laine/STRUCT e-mail : elodie.laine@upmc.fr. Lecture 1 – Classification. Elodie Laine – 19.09.2016.


Presentation Transcript


  1. Structural Bioinformatics. Elodie Laine. Master BIM-BMC Semestre 3, 2017-2018. Laboratoire de Biologie Computationnelle et Quantitative (LCQB). e-documents: http://www.lcqb.upmc.fr/laine/STRUCT e-mail: elodie.laine@upmc.fr

  2. Lecture 1 – Classification Elodie Laine – 19.09.2016

  3. General principles
     • Comparing structures: structural similarity, structural alignment, segment shapes, secondary structure elements
     • Classifying structures: evolutionary relationships, functional motifs

  4. A brief history of protein structures. Linus Pauling (1901-1994), Caltech, Nobel Prize in 1954. Pauling & Corey (1951) PNAS.

  5. A brief history of protein structures. John Kendrew (1917-1997), Cavendish Laboratory, Nobel Prize in 1962 with Max Perutz. Kendrew et al. (1958) Nature.

  6. Protein structures are puzzling. 3-dimensional structure of myoglobin (1958, 2 Å resolution). Kendrew et al. (1958) Nature: "Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates, and it is more complicated than has been predicted by any theory of protein structure."

  7. BACK TO BASICS

  8. General principle driving protein folding: hydrophobic core / hydrophilic surface
     • Observation: the main driving force for folding water-soluble globular proteins is to pack the hydrophobic side chains into the interior of the molecule.
     • Problem: to pack side chains inside the protein core, the main chain must also fold into the interior, but it is highly polar and thus hydrophilic.
     • Solution: formation of secondary structures characterized by hydrogen bonding between the main-chain NH and C'=O groups.

  9. Secondary structure: alpha (α) helix. A few facts about α-helices:
     • 3.6 residues per turn
     • H-bond pattern: C'=O (n) ··· NH (n+4)
     • length l ranges from 4-5 to more than 40 residues, with a mean μ of 10 residues
     • rise of 1.5 Å per residue, so 10 residues span 15 Å
     Space-filling model of the α-helix: Pauling & Corey (1951) PNAS.
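
The helix figures above lend themselves to a quick arithmetic check. The following is a minimal sketch, assuming an ideal α-helix with exactly the rise and periodicity quoted on the slide; the function names are illustrative, not taken from the lecture.

```python
# Ideal alpha-helix geometry, using the two constants quoted on the slide.
RISE_PER_RESIDUE = 1.5    # Å travelled along the helix axis per residue
RESIDUES_PER_TURN = 3.6   # residues needed for one full turn


def helix_length(n_residues):
    """Length along the axis (in Å) of an ideal helix of n_residues."""
    return n_residues * RISE_PER_RESIDUE


def helix_pitch():
    """Axial distance (in Å) covered by one full turn."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE


def helix_turns(n_residues):
    """Number of turns made by an ideal helix of n_residues."""
    return n_residues / RESIDUES_PER_TURN


if __name__ == "__main__":
    print("10-residue helix length (Å):", helix_length(10))
    print("pitch (Å):", round(helix_pitch(), 2))
    print("turns in 10 residues:", round(helix_turns(10), 2))
```

Real helices deviate from these ideal values, so the numbers are best read as order-of-magnitude estimates.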

  10. Secondary structure: beta (β) sheet, anti-parallel or parallel. A few facts about β-strands:
     • H-bond pattern: C'=O (n) ··· NH (n_adj), where n_adj is a residue on an adjacent strand
     • length l in [5; 10] residues
     • strands form pleated β-sheets

  11. Secondary structure: loop regions. A few facts about loop regions:
     • main-chain groups hydrogen-bond to water: C'=O ··· wat; NH ··· wat
     • various lengths & irregular shape
     • highly flexible
     • involved in binding sites
     • …
     Hairpin loops: Type I, Type II.

  12. Secondary structure assignment
     • Given the 3D coordinates of the atoms of a protein, it is possible to assign a secondary structure to each amino acid residue.
     • The DSSP (Define Secondary Structure of Proteins) algorithm, Kabsch & Sander (1983) Biopolymers:
     • identifies intra-backbone H-bonds using an electrostatic definition of the H-bond energy
     • defines eight types of secondary structure:
       • 3₁₀ helix (G), α helix (H) and π helix (I) are recognized by a repetitive sequence of H-bonds between residues 3, 4 or 5 positions apart
       • β-structures are either isolated β-bridges (B), with only a few H-bonds, or extended strands (E), defined by longer sets of H-bonds
       • turns (T) feature H-bonds typical of helices
       • bends (S) are regions with high curvature, where the angle between the vectors Cα(i-2)→Cα(i) and Cα(i)→Cα(i+2) exceeds 70°
       • loops (a blank or space) where no other rule applies
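
The slide does not reproduce the electrostatic definition itself. As published by Kabsch & Sander (1983), the energy between a backbone C=O group and a backbone N-H group is E = q1·q2·f·(1/r_ON + 1/r_CH − 1/r_OH − 1/r_CN), with q1 = 0.42e, q2 = 0.20e, f = 332 and distances in Å, giving E in kcal/mol; an H-bond is assigned when E < −0.5 kcal/mol. The sketch below applies that formula to four atom positions. The function names and the toy coordinates are assumptions made for illustration, and this is not the DSSP program's actual code.

```python
import math

# Constants from the Kabsch & Sander (1983) H-bond energy definition.
Q1 = 0.42        # partial charge on C and O (units of e)
Q2 = 0.20        # partial charge on N and H (units of e)
F = 332.0        # dimensional factor, gives kcal/mol for distances in Å
CUTOFF = -0.5    # kcal/mol; below this value an H-bond is assigned


def hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a C=O group and an N-H group.

    Each argument is an (x, y, z) tuple in Å.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c, o, n, h):
    """True if the pair of groups satisfies the energy criterion."""
    return hbond_energy(c, o, n, h) < CUTOFF


if __name__ == "__main__":
    # Toy linear geometry (hypothetical): C=O ... H-N along the x axis.
    c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
    h, n = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
    print(round(hbond_energy(c, o, n, h), 2), is_hbond(c, o, n, h))
```

In practice the hydrogen position is usually not present in crystal structures and has to be placed from the backbone geometry before this energy can be evaluated.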

  13. Topology diagrams: nucleoplasmin (1K5J), auxin binding protein 1 (1LRH).

  14. Structural motifs. Structural motifs are simple combinations of a few secondary structure elements, with a specific geometrical arrangement. Some motifs may be associated with a particular function. Example: the EF-hand in calmodulin.

  15. Protein domains. The fundamental unit of tertiary structure is the domain. A domain can fold independently. Example: receptor tyrosine kinases.

  16. What is a domain?! Protein domains are stable units of protein structure that can fold autonomously. In the past, protein domains have been described in terms of structure compactness, function and evolution, or folding. Small proteins and most medium-sized ones have just one domain. Often the different domains of a protein are associated with different functions. Domains are formed by different combinations of secondary structure elements and motifs.

  17. A JUNGLE OF SHAPES

  18. All α domains. The first globular protein to be solved, myoglobin, belongs to the class of α-domain structures. In α-domain structures, α-helices are packed against each other to produce a stable globular structure, whose hydrophobic core is protected from the solvent. Alpha-helices are sufficiently versatile to produce many very different classes of structures. In membrane-bound proteins, the regions inside the membranes are frequently α-helices.

  19. All α domains. Alpha-helices are sufficiently versatile to produce many very different classes of structures. An isolated α-helix in solution is marginally stable. F. Crick showed (1953) that the side-chain interactions are maximized for helices wound around each other in a coiled-coil arrangement. Example: molecular chaperone prefoldin (1FXK), Siegert et al. (2000) Cell.

  20. All α domains. The four-helix bundle is a common domain structure in α-proteins. Four-helix bundles are formed by four α-helices packed against each other with their helical axes almost parallel to each other. Hydrophobic side chains are buried between the helices, forming the hydrophobic core. Example: growth hormone (1HWH), Sundstrom et al. (1996) J. Biol. Chem.

  21. All α domains. The globin fold is present in myoglobin and hemoglobin.
     Myoglobin (1MBN):
     • 153 aa residues + heme group
     • compact structure with 8 α-helical parts
     • used for oxygen storage in the muscles
     • has to bind oxygen reversibly at low pressures
     Hemoglobin:
     • 4 subunits: 2 α-chains (141 aa) and 2 β-chains (146 aa)
     • each subunit has one heme group
     • used for oxygen transport
     • can bind 4 oxygen molecules reversibly
     • allosteric protein

  22. α/β domains. The most frequent of the domain structures are the alpha/beta (α/β) domains. Alpha/beta domains consist of a central parallel or mixed β-sheet surrounded by α-helices. Parallel β-strands are arranged in barrels or sheets, according to three main classes: the TIM barrel, the Rossmann fold and the horseshoe fold.

  23. α/β domains. The TIM barrel has a central cylinder or barrel of β-sheet formed from 8 parallel β-strands. This very common fold is found in many proteins with diverse functions and no detectable sequence identity (convergent evolution). The active sites are all formed by loop regions at the carboxyl ends of the β-strands that connect to the α-helices. Example: triose phosphate isomerase (5CSS).

  24. α/β domains. The Rossmann fold is composed of six parallel β-strands linked to two pairs of α-helices. This fold is common in proteins that bind nucleotides. It was named after Michael Rossmann, Purdue University, who first discovered it in the enzyme lactate dehydrogenase in 1970. Example: 3QVO.

  25. α/β domains. The horseshoe fold is characteristic of leucine-rich repeats. This fold is composed of repeating 20–30 aa stretches that are unusually rich in leucine. The β-strands form a curved parallel β-sheet with all the helices on the outside. One face of the β-sheet and one side of the helix array are exposed to solvent. Example: 1DFJ.

  26. All β domains. Antiparallel beta (β) structures represent the most functionally diverse group of protein structures; the group includes enzymes, transport proteins, antibodies, cell surface proteins, and virus coat proteins. The cores of these structures are built up by β-strands, from 4-5 to over 10. The β-strands are arranged in a predominantly antiparallel fashion. They usually form two β-sheets (twisted by definition) joined together and packed against each other, resulting in a barrel-like structure.

  27. All β domains. The plasma-borne retinol-binding protein, RBP, is an up-and-down β-barrel. The structure can be viewed as two β-sheets (green and blue) packed against each other. Red β-strands participate in both β-sheets. A retinol molecule, vitamin A (yellow), binds inside the barrel and is transported from the liver to various tissues before RBP is degraded. Example: 5HBS.

  28. All β domains. The influenza virus neuraminidase soluble head is a homotetramer (4 × 400 aa). In this up-and-down β-sheet structure, the β-sheets do not form a simple barrel but instead 6 small sheets, each with 4 strands, arranged like the blades of a 6-bladed propeller. The active site is in the middle of one side of the propeller. Example: 7NN9, P. Colman (1991), 2.9 Å resolution.

  29. All β domains. The γ-crystallin molecule has two domains, each domain built from 2 Greek key motifs. The crystallins are lens-specific proteins responsible for the transparency and refractive power of the lenses in our eyes. The 4 Greek key motifs are evolutionarily related (2 events of duplication and fusion). Example: 1A45, T. Blundell (1981), 1.9 Å resolution.

  30. All β domains. 2 Greek key motifs can be found in the jelly roll barrel, very common in subunits of spherical viruses. This complex nonlocal structure contains 4 pairs of antiparallel β-strands, only one of which is adjacent in sequence, "wrapped" in 3D to form a barrel shape. Example: 1QW9.

  31. All β domains: up-and-down, γ-crystallin-like, jelly-roll.

  32. LET'S PUT SOME ORDER

  33. Hierarchical taxonomy. The structural classification of proteins is centered on the notion of domains. Levels, in order of increasing similarity:
     • class: secondary structure content
     • fold/topology: global shape
     • superfamily: homology & similar function

  34. Protein structure classification resources
     • CATH/Gene3D: 16 million protein domains classified into 2,626 superfamilies. http://www.cathdb.info/
     • SCOP/SCOPe: 59,514 PDB entries representing 167,547 domains. http://scop.berkeley.edu/
     • Superfamily-level annotations on a collection of hidden Markov models for 2,478 completely sequenced genomes

  35. SCOP & CATH – the standard of truth. The SCOP database is mainly based on expert knowledge. The building process of CATH contains more automatic steps and less human intervention.
     SCOP levels:
     • class: 'all α', 'all β', 'α/β', 'α+β'
     • fold: sec. struct. + connectivity
     • superfamily: low seq. id, high struct. similarity
     • family: high seq. similarity or functional evidence
     CATH levels:
     • class: 'all α', 'all β', 'α/β'
     • architecture: general shape
     • topology: sec. struct. + connectivity
     • homologous superfamily: low seq. id, high struct. similarity
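
CATH identifies each superfamily with a four-part dotted code whose fields follow the four levels listed above (class, architecture, topology, homologous superfamily). A minimal sketch of how such a code can be split and compared is given below; the helper names are hypothetical, the codes are used purely as illustrative inputs, and nothing here queries the CATH database.

```python
# The four CATH levels, from the most general to the most specific.
CATH_LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


def parse_cath_code(code):
    """Split a dotted CATH code into the identifier of each level.

    '3.40.50.720' -> class '3', architecture '3.40',
    topology '3.40.50', homologous superfamily '3.40.50.720'.
    """
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-level CATH code: {code!r}")
    return {
        level: ".".join(parts[: i + 1])
        for i, level in enumerate(CATH_LEVELS)
    }


def deepest_shared_level(code_a, code_b):
    """Name of the most specific level two domains share, or None."""
    a, b = parse_cath_code(code_a), parse_cath_code(code_b)
    shared = None
    for level in CATH_LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared


if __name__ == "__main__":
    print(parse_cath_code("3.40.50.720"))
    print(deepest_shared_level("3.40.50.720", "3.40.50.300"))
```

Comparing codes level by level mirrors the hierarchy: two domains that share only the class are far less similar than two that share the topology.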

  36. CATH example

  37. Classes. Class: secondary structure content. Levitt and Chothia (MRC lab):
     • All alpha (α)
     • All beta (β)
     • Alpha and beta, mixed (α/β)
     • Alpha and beta, segregated (α+β)
     • …

  38. Folds and superfamilies. Domains belonging to the same fold have the same major secondary structures in the same arrangement with the same topological connections. Examples: Globin-like, Long alpha-hairpin, Type I dockerin domain… The domains within a fold are further classified into superfamilies. Domains belonging to the same superfamily have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology. Example: Globin-like and Alpha-helical ferredoxin are the two superfamilies of the Globin-like fold.

  39. Superfamilies: PA superfamily

  40. SCOP and CATH comparison. A very large number of domain pairs are not classified consistently in the two hierarchies. Csaba G, Birzele F, Zimmer R. (2009) BMC Struct Biol.

  41. Protein structure space. The protein structure space is sparsely populated, and all of the proteins of known structures cluster mostly into four elongated regions, which correspond approximately to four SCOP classes (all-α, all-β, α+β, and α/β). Choi and Kim (2006) PNAS.

  42. NOW, LET'S GO BACK THROUGH TIME

  43. Protein evolution. The evolution of proteins is different from the evolution of organisms: it does not need to follow the evolutionary path of organismic reproduction. Rather, the evolution of proteins is directly related to improved, unaltered or diversified molecular functions, and protein function is directly related to protein structure. Structures tend to diverge less than sequences. Proteins displaying a certain degree of sequence similarity adopt similar shapes: generally, above 40% sequence identity, the structures are very much alike. No remains of primitive proteins exist. All information about protein structures is derived from the proteins of present-day organisms, and the current protein universe represents a time-sliced view of all proteins at their various stages of evolution.
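
To apply the 40% rule of thumb, one needs a sequence identity value for a pair of aligned sequences. The sketch below assumes the two sequences are already aligned to equal length with '-' as the gap character, and it counts identities over the columns where neither sequence has a gap. Other denominators are in common use, so this definition is one convention among several, and the function name is illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Identical residues are counted over the columns in which
    neither sequence carries a gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a, aligned_b)
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences).
    identity = percent_identity("MKV-LSA", "MKVALTA")
    print(round(identity, 1), identity > 40.0)
```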

  44. Evolutionary processes. Do all proteins displaying identical folds share a common ancestor?
     • Divergent evolution leads to homology; convergent evolution leads to analogy.
     Criteria supporting homology:
     • structural similarity above a certain level
     • conservation of rare structural characteristics, e.g. the left-handed βαβ unit
     • low, yet significant, sequence identity
     • key residues in the active site
     • transitivity: if A & B are homologous and B & C are homologous, then A & C are homologous
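
The transitivity criterion means that pairwise homology assignments can be merged into groups: any chain of homologous pairs places all its members in the same group. A minimal sketch using a union-find structure is shown below; the function name and the protein labels are hypothetical, and the choice of union-find is an illustration rather than a method taken from the lecture.

```python
def homology_groups(homologous_pairs):
    """Group proteins into homology clusters by applying transitivity.

    homologous_pairs is an iterable of (protein_a, protein_b) tuples.
    Returns a sorted list of sorted member lists.
    """
    parent = {}

    def find(x):
        # Locate the representative of x, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in homologous_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for protein in list(parent):
        groups.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    pairs = [("A", "B"), ("B", "C"), ("D", "E")]
    print(homology_groups(pairs))
```

A caveat worth keeping in mind: transitivity propagates errors too, so a single false pairwise assignment merges two unrelated groups.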

  45. Can protein structures evolve? α-amylase (1bpl), G4-amylase (2amg), GPK (1phk), sonic hedgehog (1vhh), HIN recomb. (1hcr), CAP (1cgp), L11 (1fow), biotin repressor (1bia).

  46. Protein structure evolutionary ages. One can estimate the evolutionary ages of protein structures:
     • from a representative protein structure, retrieve all homologous sequences
     • map these sequences on the tree of life
     • find the most recent common ancestor of the organisms that contain these homologous sequences
     Choi and Kim (2006) PNAS.
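
The last step of the procedure above can be sketched as a longest-common-prefix search, assuming each organism is described by its lineage written from the root of the tree of life down to the species. The lineages below are simplified toy data, and the function name is illustrative; a real analysis would draw lineages from a taxonomy resource.

```python
def most_recent_common_ancestor(lineages):
    """Deepest taxon shared by all lineages.

    Each lineage is a list of taxon names ordered from the root
    of the tree down to the organism. Returns None if the lineages
    do not even share their first taxon.
    """
    if not lineages:
        raise ValueError("at least one lineage is required")
    ancestor = None
    # zip stops at the shortest lineage, which is all that is needed.
    for taxa_at_depth in zip(*lineages):
        if all(taxon == taxa_at_depth[0] for taxon in taxa_at_depth):
            ancestor = taxa_at_depth[0]
        else:
            break
    return ancestor


if __name__ == "__main__":
    # Simplified toy lineages (not complete taxonomies).
    human = ["cellular organisms", "Eukaryota", "Metazoa", "Homo sapiens"]
    yeast = ["cellular organisms", "Eukaryota", "Fungi", "Saccharomyces cerevisiae"]
    ecoli = ["cellular organisms", "Bacteria", "Escherichia coli"]
    print(most_recent_common_ancestor([human, yeast]))
    print(most_recent_common_ancestor([human, yeast, ecoli]))
```

The deeper the common ancestor sits in the tree, the younger the structure is inferred to be; an ancestor at the root suggests an ancient structure.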

  47. When homology is difficult to assess. Decarboxylases: convergent or divergent evolution? Benzoylformate decarboxylase (BFD) and pyruvate decarboxylase (PDC) share a common fold and overall biochemical function, but they recognize different substrates and have low (21%) sequence identity.

  48. When homology modeling fails. The K homology (KH) module is a widespread RNA-binding motif. The type I and type II KH domains belong to different protein folds. Thus KH motif proteins provide a rare example of protein domains that share significant sequence similarity in the motif regions but possess globally distinct structures.

  49. HOW DO WE DO IN PRACTICE?

  50. How can we compare 2 structures? Root Mean Square Deviation (RMSD) is a measure of structural similarity. It expresses the minimal global mean distance between the n corresponding atoms of the superimposed structures a and b: $\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[(x_i^a - x_i^b)^2 + (y_i^a - y_i^b)^2 + (z_i^a - z_i^b)^2\right]}$, where (x,y,z) are the atomic Cartesian coordinates. The RMSD can be computed on a selection of atoms (backbone, heavy atoms…). The RMSD computation requires that exactly n atoms from structure a correspond to n atoms from structure b. The RMSD is generally computed after superimposition of structures a and b.
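
A minimal sketch of the RMSD computation follows. It assumes the two coordinate lists are already paired atom by atom and already superimposed; finding the optimal superposition (for instance with the Kabsch algorithm) is a separate step that is not shown here. The function name and the toy coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in the units of the input, typically Å) between paired atoms.

    coords_a and coords_b are equal-length sequences of (x, y, z) tuples;
    atom i of structure a must correspond to atom i of structure b.
    No superposition is performed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must have the same number of atoms")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates (hypothetical): structure b is a shifted by 1 Å along z.
    a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
    print(rmsd(a, b))
```

Without the superposition step the value reflects the relative placement of the two structures as well as their shape difference, which is why the RMSD is generally reported after superimposition.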
