An Overview of the RCSB Protein Data Bank http://www.pdb.org/ • info@rcsb.org
History of the PDB
1970s
• Community discussions about how to establish a PDB
• Cold Spring Harbor meeting in protein crystallography
• PDB established at Brookhaven (October 1971; 7 structures)
1980s
• Number of structures increases as technology improves
• Community discussions about requiring depositions
• IUCr guidelines established
• Number of structures deposited increases
• Independent biological databases established, e.g., the NDB
1990s
• mmCIF project completed
• Structural genomics begins
• PDB moves to RCSB
2000s
• RCSB PDB renewed
• wwPDB established
PDB Mission
To provide the most accurate, well-annotated data in the most timely and efficient way possible to facilitate new discoveries and advances in science.
Structural Biology
• Understand biological processes through structural analyses
• Several methods (X-ray crystallography, NMR, cryo-electron microscopy)
Structural Genomics
“The next step beyond the human genome project”
From the NIH Request for Proposals for Structural Genomics Centers: “These studies should lead to an understanding of structure/function relationships and the ability to obtain structural models of all proteins identified by genomics. This project will require the determination of a large number of protein structures in a high-throughput mode.”
The Rules Driving Structural Genomics
• Much information derived from structure is not available from sequence alone, yet there are 2-3 orders of magnitude more sequences than structures
• If two sequences are similar, there is a high likelihood that they will have similar structures
• Two dissimilar sequences can share a similar structure as a result of divergent or convergent evolution
• Similar structures may confer similar functions
Challenges
• Growth in number of structures
• Increase in complexity of structures
• New methods for structure determination
• Demand for complex queries
• Demand for more annotation
• Integration with other genomic and proteomic information
• Larger and more diverse community of users
PDB Timeline
Year                                 1993     1998     2003     2008
Total structures                     1727     8942    23793   60000?
Structures deposited per year         792     2178     4831    9000?
Average Web hits per day              N/A    57000   180000        ?
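The timeline figures can be summarized as an average yearly growth rate with the compound-growth formula (end / start)^(1/n) - 1. The short Python sketch below applies it to the 1993 and 2003 columns only, because the 2008 values carry question marks on the slide (projections); the function name cagr is an illustrative choice.

```python
# Figures copied from the PDB Timeline slide. The 2008 column was marked "?"
# (projected), so it is deliberately left out of the calculation.
totals = {1993: 1727, 1998: 8942, 2003: 23793}
deposits_per_year = {1993: 792, 1998: 2178, 2003: 4831}


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    for label, series in [("total structures", totals),
                          ("deposits per year", deposits_per_year)]:
        rate = cagr(series[1993], series[2003], 2003 - 1993)
        print(f"{label}: {rate:.1%} per year, 1993-2003")
```

By this calculation the archive grew by roughly 30% per year in total size, and annual deposits by roughly 20% per year, between 1993 and 2003.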
Structure Determination (X-ray) [pipeline diagram stages: Target Selection; Crystallomics (Isolation, Expression, Purification, Crystallization); Data Collection; Structure Solution; Structure Refinement; PDB Deposition; Publication; Functional Annotation]
System for Data Collection and Archiving [diagram components: Depositor; ADIT (AutoDep Input Tool); MAXIT; Validation; Reports; Database Loader; Final Files; Metadata Dictionaries; Data Views]
Data Processing System Features
• Different dictionaries without software changes
• Simple customization of both functionality and content
• Automatically scales with changes in content
• Can be distributed to multiple deposition sites
• Reference data and standard nomenclature (ERFs)
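Supporting different dictionaries without software changes amounts to keeping the validation rules in data rather than in code. A minimal Python sketch of that idea follows; the item names imitate mmCIF style, but the rule set, the DICTIONARY structure, and the validate function are hypothetical illustrations, not the PDB's actual software.

```python
"""Dictionary-driven validation: the rules live in data, not in code.

Item names imitate mmCIF style; the constraints are hypothetical examples,
not the real PDB exchange dictionary.
"""
from typing import Any

# Each data item declares its type, whether it is mandatory and, optionally,
# an enumeration of allowed values or a numeric range.
DICTIONARY: dict[str, dict[str, Any]] = {
    "_exptl.method": {"type": str, "mandatory": True,
                      "enum": {"X-RAY DIFFRACTION", "SOLUTION NMR"}},
    "_refine.ls_d_res_high": {"type": float, "mandatory": False,
                              "range": (0.0, 50.0)},
    "_struct.title": {"type": str, "mandatory": True},
}


def validate(entry: dict[str, str], dictionary: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable problems found in `entry` according to `dictionary`."""
    problems: list[str] = []
    for name, rule in dictionary.items():
        if name not in entry:
            if rule.get("mandatory"):
                problems.append(f"missing mandatory item {name}")
            continue
        raw = entry[name]
        try:
            value = rule["type"](raw)  # convert the text value to the declared type
        except ValueError:
            problems.append(f"{name}: cannot parse {raw!r} as {rule['type'].__name__}")
            continue
        if "enum" in rule and value not in rule["enum"]:
            problems.append(f"{name}: {value!r} not in allowed values")
        if "range" in rule:
            low, high = rule["range"]
            if not low <= value <= high:
                problems.append(f"{name}: {value} outside [{low}, {high}]")
    # Items the dictionary does not define are reported rather than silently kept.
    for name in entry:
        if name not in dictionary:
            problems.append(f"unknown item {name}")
    return problems
```

In this design, supporting a new data item means adding one entry to DICTIONARY; validate itself stays unchanged.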
Data Content of Each PDB Entry
• 1970s: name, source, reference, resolution, sequence, secondary structure, crystal data, coordinates, unstructured remarks
• 1990s: name, source, reference, resolution, refinement details, data collection and processing details, symmetry details, biological unit information, missing residues, related entries, sequence, ligands and ions, secondary structure, crystal data, coordinates, few unstructured remarks
Annotation and Validation
• ADIT
  • Reviewing, adding, and correcting entry information
• MAXIT
  • File format conversions
• Blast Automation Tool results
  • Sequence discrepancies, protein names, synonyms, source information, EC number
• Validation Server reports
  • Format and nomenclature consistency
  • Sequence/coordinate mismatches
  • Geometrical checks (NUCheck, PROCHECK)
  • Experimental checks (SFCheck)
• Ligand Depot, ChemDraw
• RasMol for visualization
• PubMed, Citation Tracker, Citation Tool
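One of the validation-report checks, sequence/coordinate mismatches, can be illustrated by comparing the deposited one-letter sequence with the sequence read from the coordinate records. The sketch uses Python's standard difflib.SequenceMatcher as a generic stand-in for a proper biological sequence alignment; the function name and example sequences are hypothetical.

```python
"""Flag disagreements between the deposited sequence and the modelled residues.

Illustrative only: difflib.SequenceMatcher is a generic diff, standing in
for the alignment a real annotation pipeline would perform.
"""
from difflib import SequenceMatcher


def sequence_coordinate_mismatches(seqres: str, atom_seq: str) -> list[str]:
    """Compare a one-letter SEQRES string with the sequence taken from ATOM records."""
    report: list[str] = []
    matcher = SequenceMatcher(a=seqres, b=atom_seq, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            # In the sequence but absent from the coordinates,
            # typically disordered (missing) residues.
            report.append(f"missing in coordinates: SEQRES {i1 + 1}-{i2} {seqres[i1:i2]}")
        elif tag == "insert":
            report.append(f"extra in coordinates: {atom_seq[j1:j2]}")
        elif tag == "replace":
            report.append(
                f"conflict: SEQRES {i1 + 1}-{i2} {seqres[i1:i2]} vs coordinates {atom_seq[j1:j2]}"
            )
    return report
```

A missing N-terminal residue, for example, shows up as a single "missing in coordinates" line, while a point difference shows up as a "conflict" line for an annotator to review.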
Data Uniformity
• Sequence
  • Resolve anomalies relative to Swiss-Prot and GenBank
  • Resolve anomalies between sequence and atom records
• Atom nomenclature
  • Atom naming problems in 40% of structures
  • Redundant atom labels
  • Errors in chirality
• Biologically active molecule described
• Ligands
  • Names standardized
  • http://deposit.pdb.org/public-components-erf.cif
• Biological assembly
ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/
The Protein Data Bank: Unifying the Archive. Nucleic Acids Research 2002, 30:245-248
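One common way to detect chirality errors is a signed (chiral) volume: the triple product of three bond vectors from a chiral centre changes sign when the centre is mirrored. The Python sketch below computes that volume; it assumes the expected sign is supplied from a reference template for the same residue or ligand, and the helper names are illustrative.

```python
"""Signed-volume check for the handedness of a chiral centre.

Sketch only: the reference sign would come from an ideal template of the
same residue or ligand; here it is passed in as `expected_sign`.
"""

Vec = tuple[float, float, float]


def sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def chiral_volume(centre: Vec, a: Vec, b: Vec, c: Vec) -> float:
    """Triple product (a - centre) . ((b - centre) x (c - centre)); its sign encodes handedness."""
    return dot(sub(a, centre), cross(sub(b, centre), sub(c, centre)))


def chirality_ok(centre: Vec, a: Vec, b: Vec, c: Vec, expected_sign: int) -> bool:
    """True if the observed handedness matches the reference sign (+1 or -1)."""
    v = chiral_volume(centre, a, b, c)
    return v != 0 and (v > 0) == (expected_sign > 0)
```

Mirroring the coordinates (for example negating every x) flips the sign of the volume, which is exactly how an inverted centre appears in this check.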
Additional Requirements of Structural Genomics
• All data in the Materials and Methods section of a journal article should be captured
• Tracking of all experiments must be publicly available
Extending Data Dictionaries for Deposition
• X-ray
  • Structure determination data items
  • http://deposit.pdb.org/mmcif/sg-data/xstal.html
• NMR
  • Structure determination data items
  • http://deposit.pdb.org/mmcif/sg-data/nmr.html
• Protein Production
  • http://deposit.pdb.org/mmcif/sg-data/protprod.html
Current Integration Strategy
• Collect bits of mmCIF output from each program step
• Merge the mmCIF data from each step
• Use the ADIT deposition tool to enter the remaining data and check the results
• Make all data files available in the representation of the exchange dictionary
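The first two steps of this strategy, collecting mmCIF fragments from each program step and merging them, can be sketched in Python as below. This is a minimal illustration under stated assumptions: it reads only simple "_category.item value" lines (no loop_ tables or multi-line text fields), keeps the first value seen for each item, and reports disagreements for a curator to resolve. The item names follow mmCIF naming, but the fragment contents and step labels are hypothetical.

```python
"""Merge mmCIF key-value fragments produced by separate program steps.

Minimal sketch: only simple `_category.item value` lines are handled (no
loop_ tables, no multi-line ;-delimited text fields). Step labels and values
are hypothetical.
"""


def parse_items(fragment: str) -> dict[str, str]:
    """Parse `_category.item value` pairs from one mmCIF fragment."""
    items: dict[str, str] = {}
    for raw_line in fragment.splitlines():
        line = raw_line.strip()
        if not line.startswith("_"):
            continue  # skip data_ headers, comments and blank lines
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop matching surrounding quotes
        items[name] = value
    return items


def merge_fragments(fragments: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Merge fragments in insertion order, reporting items whose values disagree."""
    merged: dict[str, str] = {}
    origin: dict[str, str] = {}
    conflicts: list[str] = []
    for step, text in fragments.items():
        for name, value in parse_items(text).items():
            if name not in merged:
                merged[name] = value
                origin[name] = step
            elif merged[name] != value:
                # Keep the first value; a curator resolves the conflict later.
                conflicts.append(
                    f"{name}: {origin[name]} gave {merged[name]!r}, {step} gave {value!r}"
                )
    return merged, conflicts


FRAGMENTS = {
    "data_collection": "data_step1\n_diffrn.ambient_temp 100\n_diffrn_source.source 'ROTATING ANODE'\n",
    "refinement": "data_step2\n_refine.ls_d_res_high 1.74\n_diffrn.ambient_temp 293\n",
}
```

The remaining steps, entering missing data in ADIT and exporting in the exchange representation, are outside the scope of this sketch.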
Target Registration Database: TargetDB
• http://targetdb.pdb.org/
• All targets downloadable in XML (~51,000 targets)
• Targets downloaded from 18 centers weekly
• Target search by:
  • Sequence (FASTA), project target ID, project site, status (selected, cloned, expressed, … in PDB), update date, protein name, source organism
• Report output in HTML, FASTA, and XML
• Integrates PDB entry sequences (~55,600 sequences)
• Includes PDB pre-release sequence data
• Provides links to related sequence databases
• Open to all Structural Genomics projects
• Summary reports of target or project progress
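Searching the XML download by status and source organism, as the TargetDB search allows, might look like the Python sketch below. The XML element names, target IDs, and site names are hypothetical (the real TargetDB schema should be consulted), and the status order keeps only the stages named on this slide, since the slide elides the intermediate ones.

```python
"""Query a TargetDB-style XML export by status and source organism.

The element names (<target>, <id>, <status>, ...) are hypothetical; check
the real TargetDB schema before adapting this.
"""
import xml.etree.ElementTree as ET
from typing import Optional

SAMPLE_XML = """
<targets>
  <target><id>XYZ-001</id><site>CenterA</site><status>cloned</status>
    <organism>Escherichia coli</organism></target>
  <target><id>XYZ-002</id><site>CenterA</site><status>in PDB</status>
    <organism>Thermotoga maritima</organism></target>
  <target><id>QRS-117</id><site>CenterB</site><status>expressed</status>
    <organism>Escherichia coli</organism></target>
</targets>
"""

# Work-flow order of the statuses named on the slide (abridged: the slide
# elides intermediate stages).
STATUS_ORDER = ["selected", "cloned", "expressed", "in PDB"]


def search(xml_text: str, min_status: str = "selected",
           organism: Optional[str] = None) -> list[str]:
    """Return IDs of targets that reached at least `min_status`, optionally for one organism."""
    threshold = STATUS_ORDER.index(min_status)
    root = ET.fromstring(xml_text)
    hits: list[str] = []
    for target in root.iter("target"):
        status = target.findtext("status", default="")
        if status not in STATUS_ORDER or STATUS_ORDER.index(status) < threshold:
            continue  # unknown status or not far enough along the pipeline
        if organism is not None and target.findtext("organism") != organism:
            continue
        hits.append(target.findtext("id", default=""))
    return hits
```

Treating status as an ordered progression makes "at least expressed" a simple index comparison rather than a list of matching strings.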
Beyond TargetDB: PepcDB
Protein Expression, Purification, and Crystallization Database
• All information about targets, including the protocols for protein production
Current Query System [architecture diagram: WWW user interfaces (SearchFields, SearchLite, Query Result Browser, Structure Explorer); CGI integration layer; DB integration layer; back ends: flat files, keyword search, derived data, core DB, BMCD, FTP tree (download), POM, Sybase, Lucene]
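The query system lists Lucene alongside keyword search; Lucene's core data structure is an inverted index that maps each term to the documents containing it. A toy Python version is sketched below. Lucene's tokenization, scoring, and on-disk storage are far more elaborate, and the two sample entries use abbreviated titles of structures cited in this deck.

```python
"""A toy inverted index, the core idea behind keyword engines such as Lucene.

Tokenization, relevance scoring and on-disk segments are omitted.
"""
import re
from collections import defaultdict

ENTRY_TEXT = {
    "1AEW": "Horse apoferritin; recombinant human H and horse L ferritins",
    "4HHB": "Crystal structure of human deoxyhaemoglobin",
}


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of entry IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """AND query: IDs whose text contains every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Answering a query touches only the postings for its terms, not every entry, which is why this structure scales to a large archive.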
Biological Assembly View (Structure page)
Tutorial at http://www.rcsb.org/pdb/biounit_tutorial.html
1AEW: Horse Apoferritin
Hempstead, P. D., Yewdall, S. J., Fernie, A. R., Lawson, D. M., Artymiuk, P. J., Rice, D. W., Ford, G. C., Harrison, P. M.: Comparison of the three-dimensional structures of recombinant human H and horse L ferritins at high resolution. J. Mol. Biol. 268, 424 (1997)
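A biological assembly such as the 24-subunit ferritin shell is generated by applying a set of rotation-plus-translation operators (x' = Rx + t) to the deposited coordinates. The Python sketch below shows the mechanics with a single made-up atom and a two-fold rotation about z; the real operators for an entry come from its assembly records, and the function and constant names are illustrative.

```python
"""Generate a biological assembly by applying operators to the deposited chain.

Each operator is a 3x3 rotation R plus a translation t (x' = R x + t).
The operators and coordinates here are made up for illustration.
"""

Vec = tuple[float, float, float]
Matrix = tuple[Vec, Vec, Vec]

IDENTITY: Matrix = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
TWOFOLD_Z: Matrix = ((-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0))  # 180 deg about z
NO_SHIFT: Vec = (0.0, 0.0, 0.0)


def apply_operator(rot: Matrix, trans: Vec, xyz: Vec) -> Vec:
    """Return R @ xyz + t for one atom."""
    return tuple(sum(rot[i][k] * xyz[k] for k in range(3)) + trans[i] for i in range(3))


def build_assembly(chain: list[Vec], operators: list[tuple[Matrix, Vec]]) -> list[list[Vec]]:
    """Return one transformed copy of `chain` per operator."""
    return [[apply_operator(rot, trans, atom) for atom in chain] for rot, trans in operators]
```

Storing one copy plus a list of operators, rather than every copy, keeps the deposited file small while still letting viewers rebuild the full assembly.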
Structure Explorer Summary Page [screenshot callouts: Go to EC site; Search by EC number; Go to NCBI Taxonomy; Go to PubMed abstract; Search by author; Search for related citations; Search by chemical component]
Design of the New PDB Database
3-tier architecture
• Separates database, applications, and presentation
• Supports high access rates on multiple machines
• Serves very large data sets
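The separation of database, applications, and presentation can be sketched in Python as three small layers, each calling only the one below it. The class names are hypothetical and an in-memory dictionary stands in for the database so the example runs on its own.

```python
"""Three separate tiers: data access, application logic, presentation.

Hypothetical class names; a dict stands in for the database.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    pdb_id: str
    title: str
    resolution: Optional[float]


class EntryRepository:
    """Data tier: the only layer that knows how entries are stored."""

    def __init__(self, rows: dict[str, Entry]) -> None:
        self._rows = rows

    def get(self, pdb_id: str) -> Optional[Entry]:
        return self._rows.get(pdb_id.upper())


class EntryService:
    """Application tier: business rules, independent of storage and display."""

    def __init__(self, repo: EntryRepository) -> None:
        self._repo = repo

    def summary(self, pdb_id: str) -> Optional[dict[str, str]]:
        entry = self._repo.get(pdb_id)
        if entry is None:
            return None
        res = f"{entry.resolution:.2f} Å" if entry.resolution is not None else "n/a"
        return {"id": entry.pdb_id, "title": entry.title, "resolution": res}


def render_text(summary: Optional[dict[str, str]]) -> str:
    """Presentation tier: formatting only; could be swapped for HTML or XML output."""
    if summary is None:
        return "Entry not found"
    return f"{summary['id']}: {summary['title']} ({summary['resolution']})"
```

Because each tier depends only on the one beneath it, the storage engine or the output format can be replaced, or the tiers spread over multiple machines, without rewriting the others.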
Navigation
• Persistent search box
• Integrated help (context-sensitive)
• Getting Started
• Persistent navigation bar
• Hierarchical menu items
• Site search
Browsing
• Gene Ontology
• Enzyme Classification
• Taxonomy
• Disease
• Ligands
• CATH/SCOP
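Browsing by Enzyme Classification works naturally as a tree because EC numbers have four levels (class, subclass, sub-subclass, serial number). The Python sketch below files each entry under every prefix of its EC number so that any level can be browsed; the entry IDs and their EC assignments are hypothetical.

```python
"""Browse entries by Enzyme Classification prefix.

EC numbers have four dot-separated levels, so every prefix is a browse node.
Entry IDs and assignments are hypothetical.
"""
from collections import defaultdict


def build_ec_tree(assignments: dict[str, str]) -> dict[str, set[str]]:
    """Map every EC prefix ('3', '3.4', '3.4.21', '3.4.21.4') to the entries under it."""
    tree: dict[str, set[str]] = defaultdict(set)
    for entry_id, ec in assignments.items():
        parts = ec.split(".")
        for depth in range(1, len(parts) + 1):
            tree[".".join(parts[:depth])].add(entry_id)
    return tree


def children(tree: dict[str, set[str]], prefix: str = "") -> list[str]:
    """Immediate sub-levels of `prefix` ('' lists the top-level enzyme classes)."""
    level = prefix.count(".") + 1 if prefix else 0
    start = prefix + "." if prefix else ""
    return sorted(k for k in tree if k.startswith(start) and k.count(".") == level)


ASSIGNMENTS = {"ENTRY1": "3.4.21.4", "ENTRY2": "3.4.21.5", "ENTRY3": "1.1.1.1"}
```

Precomputing every prefix trades some memory for constant-time lookup of the entries under any node the user clicks.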
Molecular Visualization
• Simple viewer built from the Molecular Biology Toolkit (http://mbt.sdsc.edu)
• Envisioned to be a future query interface, e.g., “What other structures contain this ligand?”
Molecular Biology Toolkit authors: John Moreland and Apostol Gramada
4HHB: Fermi, G., Perutz, M. F., Shaanan, B., Fourme, R.: The crystal structure of human deoxyhaemoglobin at 1.74 Å resolution. J. Mol. Biol. 175, 159 (1984)
Worldwide PDB (wwPDB): http://www.wwpdb.org/
• RCSB (Research Collaboratory for Structural Bioinformatics)
• PDBj (Osaka University)
• Macromolecular Structure Database (EBI)
• Purpose: to ensure that PDB files remain in a single archive that best serves the worldwide community of depositors and users
http://www.pdb.org/
Operated by three members of the RCSB: Rutgers, The State University of New Jersey; San Diego Supercomputer Center at the University of California, San Diego; Center for Advanced Research in Biotechnology/UMBI/NIST. The RCSB PDB is supported by funds from the National Science Foundation (NSF), the National Institute of General Medical Sciences (NIGMS), the Office of Science, Department of Energy (DOE), the National Library of Medicine (NLM), the National Cancer Institute (NCI), the National Center for Research Resources (NCRR), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and the National Institute of Neurological Disorders and Stroke (NINDS).
[Website screenshots: RCSB PDB, MSD-EBI, PDBj at Osaka]
RCSB PDB Team: Ken Addess, Helen M. Berman, Wolfgang F. Bluhm, Phil Bourne, Kyle Burkhardt, Al Carlson, Li Chen, Sharon Cousin, Nita Deshpande, Shuchismita Dutta, Zukang Feng, Lew-Christiane Fernandez, Judith L. Flippen-Anderson, Gary Gilliland, Rachel Kramer Green, Vladimir Guranovic, Shri Jain, Jeff Merino-Ott, Rose Oughtred, Irina Persikova, Suzanne Richman, Melcoir Rosas, Kathryn Rosecrans, Bohdan Schneider, Wayne Townsend-Merino, Elizabeth Walker, John Westbrook, Huanwang Yang, Jasmin Yang, Christine Zardecki