All about the Protein Data Bank

Presentation Transcript


  1. All about the Protein Data Bank Helen Berman

  2. 1960’s • Protein crystallography begins to take off • Emerging interest in protein folding • Use of computer graphics to represent structure • Nobel Prize awarded for the first 3D protein structures: myoglobin and hemoglobin • Structures shown: myoglobin, hemoglobin, lysozyme, ribonuclease • References: Myoglobin: Kendrew, Bodo, Dintzis, Parrish, Wyckoff, Phillips (1958) Nature 181: 662-666; Hemoglobin: Perutz (1962) Proc. R. Soc. A265: 161-187; Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757; Ribonuclease: Kartha, Bello, Harker (1967) Nature 213: 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242: 3753-3757.

  3. 1970’s • Grassroots efforts to archive data • Protein crystallographers discuss how to archive data • June 1971 Cold Spring Harbor meeting brings groups together (Cold Spring Harbor Symposia on Quantitative Biology, vol. XXXVI, 1972) • October 1971 PDB is announced in Nature New Biology (7 structures; vol 233, 1971, page 223) • 1975 PDB receives first funding from NSF (~32 structures)

  4. Enzymes • Chart: proportion of enzyme classes (oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases) relative to total enzyme structures, in percent, by decade • Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757 • Ribonuclease: Kartha, Bello, Harker (1967) Nature 213: 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242: 3753-3757 • RNA-containing structures • Chart: RNA only, protein/RNA complexes, DNA/RNA hybrid, and protein/DNA/RNA complexes, by decade • tRNA: J.L. Sussman, S.-H. Kim (1976) Biochem. Biophys. Res. Commun. 68: 89-96; J.D. Robertus, J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown, B.F.C. Clark, & A. Klug (1974) Nature 250: 546-551.

  5. 1980’s • Technology takes off • Structural biology is able to focus on medical problems • Community efforts to promote data sharing • IUCr guidelines requiring data deposition in the PDB are published

  6. DNA-containing structures, protein-nucleic acid complexes, and viruses • Chart: DNA-containing structures by year (DNA only, protein/DNA complexes, DNA/RNA hybrid, protein/DNA/RNA complexes) • Chart: protein-nucleic acid complexes by year (protein/DNA complexes, protein/RNA complexes, protein/DNA/RNA complexes) • Chart: viruses by year • B-DNA (1bna): Dickerson & Drew (1981) J. Mol. Biol. 149: 761-786 • Z-DNA (2dcg): Wang, Quigley, Kolpak, Crawford, van Boom, van der Marel, Rich (1979) Nature 282: 680-686 • Phage 434 repressor-operator (2or1): Aggarwal, Rodgers, Drottar, Ptashne, & Harrison (1988) Science 242: 899-907 • Viruses: Hopper, Harrison, Sauer (1984) Structure of tomato bushy stunt virus. V. Coat protein sequence determination and its structural implications. J. Mol. Biol. 177: 701-713; Silva, Rossmann (1985) The refinement of southern bean mosaic virus in reciprocal space. Acta Crystallogr. B41: 147-157.

  7. Cooperative community action • Individual letters to editors of journals • Committees: IUCr commission on Biological Macromolecules; ACA/USNCCr; Richards committee • Funding agencies • Articles in journals • People named: Fred Richards, Marvin Cassman, Richard Dickerson

  8. 1990’s • Number of structures increases exponentially • Complexity of structures increases • mmCIF dictionary created • New databases begin to emerge • User base expands dramatically • PDB archive moves • Figure caption: mmCIF Working Group Members

  9. Ribosome structures (50S, 30S) and electron microscopy structures • Ribosome: Ban, Nissen, Hansen, Moore, & Steitz (2000) Science 289: 905-920; Clemons Jr., May, Wimberly, McCutcheon, Capel, & Ramakrishnan (1999) Nature 400: 833-840; Schluenzen, Tocilj, Zarivach, Harms, Gluehmann, Janell, Bashan, Bartels, Agmon, Franceschi, Yonath (2000) Cell 102: 615-623; Yusupova, Yusupov, Cate, & Noller (2001) Cell 106: 233-241 • Bacteriorhodopsin: Henderson, Baldwin, Ceska, Zemlin, Beckmann, Downing (1990) J. Mol. Biol. 213: 899-929.

  10. 2000’s • wwPDB is formed • Continued growth in structures • Structural genomics takes off • Chart: structures solved as of 2007 • Figure captions: wwPDB AC 2009; wwPDB Directors

  11. Worldwide Protein Data Bank • Formalization of current working practice • Members: RCSB PDB (Research Collaboratory for Structural Bioinformatics), PDBj (Osaka University), PDBe (EMBL-EBI), BioMagResBank (University of Wisconsin, Madison) • MOU signed July 1, 2003 • Announced in Nature Structural Biology, November 21, 2003 • wwpdb.org

  12. Depositions to the PDB by decade • Chart: number of released entries by year

  13. Archive Contents • Public archive (as of January 2011): more than 70,000 entries; more than 503,000 files; requires over 120 GB of storage; data dictionaries; derived data files • For each entry: atomic coordinates; sequence information; description of structure; experimental data; release status information • Internal archive: depositor correspondence; depositor contact information; paper records; documentation; historical records from Day One
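
For a rough sense of scale, the public-archive figures on slide 13 can be turned into per-entry averages. The sketch below is only back-of-the-envelope arithmetic: it treats the slide's "more than" and "over" figures as if they were exact, and it assumes 1 GB = 1024 MB, so the results are estimates rather than archive statistics.

```python
# Figures quoted on slide 13 (public archive, January 2011).
# Assumption: the "more than"/"over" values are used as if exact.
entries = 70_000
files = 503_000
storage_gb = 120

# Average number of files associated with each entry.
files_per_entry = files / entries

# Average storage per entry in MB (assumes 1 GB = 1024 MB).
mb_per_entry = storage_gb * 1024 / entries

print(f"about {files_per_entry:.1f} files per entry")
print(f"about {mb_per_entry:.2f} MB per entry")
```

Under these assumptions the archive averages roughly seven files and under 2 MB per entry.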

  14. What can the PDB archive tell us?

  15. Structure distribution • Chart: number of structures by year (protein only, protein-DNA complexes, protein-RNA complexes, DNA only, RNA only, RNA-DNA hybrid, other) • Chart: structure determination methods • Charts: resolution distribution by year for protein structures, other structures, and all structures

  16. Distinct and novel protein sequences • Redundancy: protein clusters • Chart: percent of distinct/novel structures (0 to 70%) by period (1972-1979, 1980-1989, 1990-1999, 2000-2008) • Series: structures containing distinct protein sequences (<98%); structures containing novel protein sequences (<30%); subset of PSI structures; subset of other SG structures
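
The thresholds on slide 16 (<98% for "distinct", <30% for "novel") are presumably sequence identity cutoffs used to group redundant entries into clusters. The sketch below illustrates the general idea with a greedy clustering pass. It is not the procedure used to produce the slide: the `identity` function is a deliberately naive, alignment-free stand-in, and both function names and the example sequences are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Naive identity: matching positions divided by the longer length.

    Assumption: no alignment is performed; a real analysis would align first.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_clusters(seqs: List[str], threshold: float) -> List[List[str]]:
    """Assign each sequence to the first cluster whose representative
    (its first member) is at least `threshold` identical; otherwise
    start a new cluster."""
    clusters: List[List[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


# Made-up sequences for illustration only.
seqs = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
distinct = greedy_clusters(seqs, 0.98)  # near-identical sequences merge
novel = greedy_clusters(seqs, 0.30)     # only remote sequences stay apart
print(len(distinct), len(novel))
```

Lowering the threshold merges more sequences into each cluster, which is why the "novel" count can never exceed the "distinct" count for the same data.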

  17. Lysozyme: Lessons learned • T4 bacteriophage (459 structures): amino acid replacement studies suggest that the fraction of amino acid residues that define the structure of T4 lysozyme is about 50% (B.W. Matthews (1996) FASEB J. 10: 35-41); insight into folding and catalysis • Hen egg white (297 structures): low sequence identity; structural similarity of active site to T4 (B.W. Matthews, S.J. Remington, M.G. Grutter, W.F. Anderson (1981) J. Mol. Biol. 147: 545-58); insight into evolution and catalysis • Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206: 757.

  18. Myoglobin and hemoglobin: Lessons learned • Whale myoglobin (185 structures): different ligands (oxygen, carbon monoxide) [1]; amino acid substitution studies [2]; Laue studies [3]; insight into function and dynamics • Other species myoglobin: low sequence identity, same structure [4]; insight into evolution • Human hemoglobin (178 structures): insight into function and disease (sickle cell anemia, thalassemia) [5] • Other species hemoglobin: low sequence identity, same structure [4]; profound insight into evolution • Figure: Lodish et al. [6] • References: [1] Kuriyan, Wilz, Karplus, Petsko (1986) J. Mol. Biol. 192: 133-154; [2] Quillin, Arduini, Olson, Phillips Jr. (1993) J. Mol. Biol. 234: 140-155; Carver, Brantley Jr., Singleton, Arduini, Quillin, Phillips Jr., Olson (1992) J. Biol. Chem. 267: 14443-14450; [3] Bourgeois, Vallone, Schotte, Arcovito, Miele, Sciara, Wulff, Anfinrud, Brunori (2003) PNAS 100: 8704-8709; [4] Dickerson, Geis (1983) Hemoglobin: structure, function, and pathology; [5] Kidd, Baker, Mathews, Brittain, Baker (2001) Prot. Sci. 10: 1739-1749; Harrington, Adachi, Royer Jr. (1998) J. Biol. Chem. 273: 32690-32696; [6] Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell (2000) Molecular Cell Biology, WH Freeman & Co.

  19. TIM barrel proteins: Lessons learned • TIM barrel structures (1727; http://www.cathdb.info) • Share the same fold but represent significant sequence and functional diversity • Are enzymes or enzyme-related proteins involved in molecular or energy metabolism • Comparative structure analysis indicates evolutionary relatedness of TIM barrel proteins • References: Banner, Bloomer, Petsko, Phillips, Wilson (1976) Biochem. Biophys. Res. Commun. 72: 146-155; Nagano, Orengo, Thornton (2002) J. Mol. Biol. 321: 741-65.

  20. HIV-related structures • Chart: HIV-related structures by year, by category (protease, reverse transcriptase, integrase, Gag protein, other); counts shown in the figure: 311, 122, 27, 39, 110 • Drugs shown with HIV-1 reverse transcriptase: Abacavir (GSK), Nevirapine (BI), Stavudine (BMS), Efavirenz (BMS), Lamivudine (GSK), Zidovudine (GSK), Emtricitabine (Gilead), Tenofovir (Gilead), Zalcitabine (Hoffmann-La Roche), Etravirine (Tibotec), Delavirdine (Pfizer) • Drugs shown with HIV-1 protease: Amprenavir (GSK), Fosamprenavir (GSK), Lopinavir (Abbott), Atazanavir (BMS), Nelfinavir (Agouron), Darunavir (Tibotec), Tipranavir (BI), Indinavir (Merck), Ritonavir (Abbott), Saquinavir (Roche) • PDB IDs listed in the figure, in the groups shown: 2HND, 2HNY, 1S1U, 1S1X, 1LW0, 1LWE, 1LWC, 1LWF, 1JLB, 1JLF, 1FKP, 1VRT, 3HVT; 1T7J, 1HPV; 2FXE, 2FXD, 2O4K, 2AQU, 2FND; 2RKG, 2RKF, 2QHC, 2Z54, 2Q5K, 2O4S, 1RV7, 1MUI; 1JKH, 1IKW, 1IKV, 1FKO, 1FK9; 2QAK, 2PYM, 2Q63, 2PYN, 2Q64, 2R5Q, 1OHR; 2R5P, 2B7Z, 2AVV, 2AVO, 2AVS, 1SGU, 1SDT, 1SDV, 1SDU, 1K6C, 1C6Y, 2BPX, 1HSG, 1HSH; 2O4N, 2O4L, 2O4P, 1D4Y, 1D4S; 2B60, 1RL8, 1SH9, 1N49, 1HXW; 1T05; 3D1X, 3D1Y, 3CYX, 2NMW, 2NMZ, 2NNP, 2NMY, 2NNK, 1C6Z, 1FB7; 1S6P

  21. Scientific challenges to the PDB • Number of data files continues to increase • Information content of each data file is increasing • Many more very large macromolecular complexes • New structure determination methods

  22. Growth of PDB Depositions • By deposition and processing site • By experimental method

  23. PDB Depositors

  24. Technical challenges in data management • How do we represent diverse data? • How do we make a searchable database? • How do we integrate with other data resources? • How do we make a scalable system? • How do we meet the needs of a diverse community?

  25. The pipeline: deposition to release • Structure Determination • PDB Deposition • Data Processing • Data Archiving • Data Distribution & Query

  26. Data In: What happens with PDB depositions? • Diagram: RCSB and wwPDB Full Data Flow • Stage headings: Deposition and Processing, Annotation, Integration, Dissemination • Labels recoverable from the diagram: depositors; web communication with depositor; PDB ID; Harvest, Prepare, Prevalidate; Validation; Annotation; Release; Loaders; Database; RCSB Web; RCSB external access to data; consumers; master PDB FTP archive; shared RCSB archive; sites named: UCSD, RU; data exchange (daily upload); PDBe Web, PDBe access to data, PDB FTP mirror; PDBj Web, PDBj access to data, PDB FTP mirror

  27. After deposition: annotation and validation • Check all incoming files: sequence/structure correspondences; small molecule ligands; biological assembly (PISA, author-defined); agreement with experimental data; agreement with known geometrical features (Molprobity, Procheck, SFCheck, NUCheck) • Update and maintain data processing database daily • Developing method-specific standards: Validation Task Forces • X-ray: April 2008 at EMBL-EBI, Hinxton, UK; Randy Read (Chair) • NMR: September 2009 in Paris, France; January 2011 at Rutgers; Guy Montelione, Michael Nilges (Co-chairs) • EM: September 2010 at Rutgers
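
One of the checks listed on slide 27 is sequence/structure correspondence: the deposited sequence should agree with the residues actually modeled in the coordinates. The sketch below shows the idea in its simplest form. It is an illustration only, not the wwPDB annotation software: the function name, the input layout (a one-letter sequence plus a dictionary from 1-based residue number to one-letter code), and the example data are all assumptions.

```python
from typing import Dict, List, Tuple


def check_correspondence(sequence: str, observed: Dict[int, str]) -> dict:
    """Compare a deposited sequence with residues seen in the coordinates.

    `observed` maps 1-based sequence positions to one-letter residue codes
    (hypothetical input format chosen for this sketch).
    """
    mismatches: List[Tuple[int, str, str]] = []
    missing: List[int] = []
    for pos, expected in enumerate(sequence, start=1):
        found = observed.get(pos)
        if found is None:
            # Residue is in the sequence but has no coordinates.
            missing.append(pos)
        elif found != expected:
            # Coordinates disagree with the deposited sequence.
            mismatches.append((pos, expected, found))
    # Residues numbered outside the deposited sequence.
    out_of_range = sorted(p for p in observed if p < 1 or p > len(sequence))
    return {
        "mismatches": mismatches,
        "missing": missing,
        "out_of_range": out_of_range,
    }


# Made-up example: position 2 disagrees, position 3 is unmodeled,
# and residue 7 lies beyond the end of the deposited sequence.
report = check_correspondence("MKTA", {1: "M", 2: "R", 4: "A", 7: "G"})
print(report)
```

Unmodeled residues are common in real structures (disordered loops, for example), so a check like this would normally flag them for an annotator to review rather than reject the entry.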

  28. Planning for the Future: wwPDB Deposition and Annotation Tool • Goal: To collaboratively develop the new processes and supporting systems that will support the wwPDB over the next 10 years. • The new systems will provide a high quality and dependable resource that will effectively: support increases in deposition throughput; address the anticipated increase in complexity and experimental variety of submissions; focus on quality enhancement through the use of community-based validation tools

  29. Data Out: What happens when data are released? • FTP site for wwPDB • Data downloaded by hundreds of external resources • Each wwPDB member maintains websites with different services

  30. RCSB PDB portal www.pdb.org

  31. MyPDB: Keep up-to-date with new structures...automatically! • Framework to store user preferences • Saves queries in a private account • Notifies users via email when new structures match stored queries
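
The behavior described on slide 31 (store a query, then notify the user when newly released structures match it) can be sketched as follows. This is a hypothetical illustration, not MyPDB's implementation: the `SavedQuery` class, the keyword-in-title matching rule, and the entry IDs and titles are all invented for the example, and sending email is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SavedQuery:
    """A stored keyword query that remembers which entries it already reported."""
    name: str
    keyword: str
    seen_ids: Set[str] = field(default_factory=set)

    def new_matches(self, released: Dict[str, str]) -> List[str]:
        """Return IDs of released entries whose title contains the keyword
        and that have not been reported before."""
        hits = sorted(
            entry_id
            for entry_id, title in released.items()
            if self.keyword.lower() in title.lower()
            and entry_id not in self.seen_ids
        )
        # Remember the hits so the user is notified only once per entry.
        self.seen_ids.update(hits)
        return hits


# Placeholder IDs and titles, not real PDB entries.
weekly_release = {
    "0AAA": "HIV-1 protease in complex with an inhibitor",
    "0BBB": "T4 lysozyme mutant",
}
query = SavedQuery(name="my protease alert", keyword="protease")
first = query.new_matches(weekly_release)   # would trigger a notification
second = query.new_matches(weekly_release)  # nothing new the second time
print(first, second)
```

Tracking already-reported IDs per query is the design choice that keeps repeated checks from sending duplicate notifications.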

  32. Interactive Views of Domain Annotations

  33. Structure Explorer Summary Page • Information summarized in easy-to-read 2-column format • Related information presented in customizable “widgets” • Abstract from PubMed is displayed

  34. Visualization Options • 3D Viewers are context-sensitive • Asymmetric unit • Biological assembly • Biological assembly is displayed by default • Presumed oligomeric state of biological molecule is displayed (for X-ray structures)

  35. Protein-Ligand Interaction View • Simplified user interface • Added metal interactions • Display of bond orders from Chemical Component Dictionary

  36. Integration of Structural Bioinformatics Activities • Data In: deposition, validation, annotation, ligands • Data Out: query, visualization, reports, analysis • Outreach: conferences, news, data views, impact • Outreach to give a structural view of biology

  37. What we have learned so far • Sequence-structure-function relationships are complex • Low sequence identity-same structure (hemoglobin) • Same structure/different function (TIM) • Different overall structure/same function (lysozyme) • New protein targets lead to new drugs (HIV protease) • Technology-science cycle closely coupled in structural biology • A structural view of biology is closer than we thought • “If it can be done, it will be done”

  38. Acknowledgements Funding Agencies for all Projects: NSF, NIGMS, DOE, NLM, NCI, NINDS, NIDDK; EMBL, Wellcome Trust, BBSRC, NIGMS, EU; BIRD-JST, MEXT; NLM; NIGMS
