This summary provides an overview of the Worldwide Protein Data Bank's accomplishments and future plans, including the establishment of validation procedures, expansion into new regions, increased data deposition and download rates, and the development of a common deposition and annotation tool.
Worldwide Protein Data Bank www.wwpdb.org Summary Overview Haruki Nakamura
wwPDBAC 2008 Recommendations (recommendation: status)
• Establish validation procedures for X-ray and NMR: X-ray Task Force draft report; NMR Task Force established
• Establish recommendations for additional data deposition and release requirements: policy document updated
• Work with SAXS/SANS community: first report completed
• Definition of purview of the PDB: for approval at this meeting
• Establish feasibility of chemical shift depositions: implementation plan
• Independent funding for wwPDB: wwPDB Foundation established
• Develop wwPDBAC membership plan: done
• Establish EM Task Force: first EMDB AC meeting held March 2009
• Broaden wwPDBAC to include China and India: Associate Members Zihe Rao (China) and Manju Bansal (India)
wwPDB October 2008 - September 2009
• New leadership: PDBe, Gerard Kleywegt
• Funding stable at 4 sites
• Continued growth of archive
• Increased use of data
• PDB Archive Version 3.15 released: archival snapshots
• Further coordination of FTP and web updates
• Common Tool Project underway
• Establishment of wwPDB Foundation
• Continued outreach
Funding
• RCSB PDB: competitive renewal funded by NSF, January 2009 - December 2013
• PDBe: competitive grant from the Wellcome Trust, December 2009 - November 2014; additional new permanent EMBL posts from 2013, giving a stable core of at least 15 permanent posts (from 6 in 2008!)
• PDBj: competitive renewal funded by JST (Japan Science & Technology Agency); April 2006 - March 2011: current program; April 2011 - March 2014: a new funding system for life-science databases is planned
• BMRB: competitive renewal funded by the National Library of Medicine; September 2009 - August 2014 (parent grant); September 2009 - August 2011 (administrative supplement, US recovery act funding); September 2009 - August 2011 (competitive renewal, US recovery act funding)
PDB Depositions
[Charts: depositions by deposition and processing site, and by experimental type; * projected (8322)]
Increase of PDB data depositions from Asia and Oceania regions
[Chart series: India, New Zealand, Australia, Taiwan, Hong Kong, China]
PDB Downloads
[Chart: download counts; * marks the 1st month after version 3.0/3.1 files were released and the 1st month after version 3.15 files were released]
PDB FTP Traffic (July 2008 - June 2009)
• RCSB PDB: 200 million data downloads
• PDBe: 37 million data downloads
• PDBj: 14 million data downloads
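The three per-site figures can be totalled to show each site's share of the July 2008 - June 2009 FTP traffic. A minimal Python sketch that uses only the numbers on this slide (in millions of downloads); the `shares` helper is purely illustrative:

```python
# FTP download counts from the slide, in millions (July 2008 - June 2009).
downloads = {"RCSB PDB": 200, "PDBe": 37, "PDBj": 14}

def shares(counts):
    """Return the total and each site's percentage share, rounded to 0.1%."""
    total = sum(counts.values())
    return total, {site: round(100 * n / total, 1) for site, n in counts.items()}

total, pct = shares(downloads)
print(total)  # 251
print(pct)    # {'RCSB PDB': 79.7, 'PDBe': 14.7, 'PDBj': 5.6}
```

So roughly four in five downloads in that period went through the RCSB PDB site.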
Remediation Rollout Complete
• PDB File Format Contents Guide Version 3.20 released (Sep 2008)
• PDB format version 3.15 files released (March 2009)
• New records/remarks: describe models, ligands and zero-occupancy atoms/residues
• Enhancements: assemblies, SITE records, chain IDs, database references (including taxonomy IDs, PubMed and DOI IDs) added for all files
• Various corrections (including sequence, beamline, wavelength, atom connectivities, atom nomenclature, mmCIF consistency)
• Missing remarks restored
Common Tool for Deposition and Annotation
• Manage increased data load without an increase in resources
• Create global deposition and annotation tools
• Proof of concept delivered July 2009
• First test system due June 2010
wwPDB Foundation • Will enable funding for wwPDBAC meetings and outreach activities • Bylaws created • Paperwork filed
Outreach
• Joint publications
• Meeting posters, exhibit booths, and presentations: eChemInfo, EMBO Practical Course: Macromolecular Crystallography, Biophysical Society, Experimental Biology, Biocuration Meeting, ISMB/ECCB, Protein Society, ACA, ACS, AsCA
• Website
• Helpdesk
[At AsCA’09, Beijing: Mitchell Guss, Se Won Suh]
Formalized Internal Communications
• Phone conferences among site directors
• Regular exchange visits
• Weekly VTCs among staff: Common Tool, EM, NMR, data annotation
• Electronic reporting (validation, structure assignments, FTP updates, etc.)
• wwPDB Retreat 2009
Common Deposition and Annotation (D&A) Tool Martha Quesada Worldwide Protein Data Bank www.wwpdb.org
2007 wwPDB Retreat
What strategic objectives should we start to address now to meet our goals in the next 5-10 years?
[Figure axes: Cost, Impact]
Drivers and Opportunities for the wwPDB Common D&A Tool Project
• New deposition types: multiple methods and new data
• New validation procedures
• Need to support higher throughput
• Limited resources for new development and maintenance
• Process inefficiencies
• Redundant tools in use (AutoDep and ADIT); data harmonization required
• Good collaborations among sites
• Possibility of sharing the workload
• Precedent for common tools for NMR and EM
Inflection Point
[Figure: Usage, Impact vs. Time; labels "Business as usual: ready for today" and "Change the game: new capabilities"]
Decision to come together as the wwPDB in developing the tools that will support the shared functions of the wwPDB for the next 10 years
wwPDB Common D&A Tool Project
The goal is to implement a set of common deposition and annotation processes and tools that will enable the wwPDB to deliver a resource of increasingly high quality and dependability over the next 10 years. The tools will address the increase in complexity and experimental variety of submissions and the increase in deposition throughput. The processes and tools will maximize the efficiency and effectiveness of data handling and of support for the scientific community.
Project Scope
Common deposition interface and processing:
• Coordinates (x,y,z) regardless of experiment origin (X-ray, NMR, EM)
• NMR restraints, chemical shifts
• X-ray structure factors
• 3D-EM maps
Data processing: validation tools
• Fit of model to data
• Coordinates for polymers and ligands
• NMR restraints
• Structure factor validation
• EM map validation
• Partly developed by external task forces; to be integrated over time
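A common deposition interface that accepts every experiment type has to know which data items each method brings. The sketch below, in Python, checks a submission against a per-method list; the mapping is an assumption read off the scope list above (coordinates for all methods, plus structure factors, restraints and chemical shifts, or maps), not a statement of wwPDB policy, and `REQUIRED` and `missing_items` are hypothetical names:

```python
# Hypothetical per-method requirements, assumed from the scope list:
# every entry carries coordinates, plus method-specific experimental data.
REQUIRED = {
    "X-ray": {"coordinates", "structure_factors"},
    "NMR": {"coordinates", "restraints", "chemical_shifts"},
    "EM": {"coordinates", "map"},
}

def missing_items(method: str, provided: set[str]) -> set[str]:
    """Return the data items still needed for a deposition of this method."""
    if method not in REQUIRED:
        raise ValueError(f"unknown experimental method: {method}")
    return REQUIRED[method] - provided

print(missing_items("NMR", {"coordinates", "restraints"}))  # {'chemical_shifts'}
```

Keeping the requirements in a single table rather than in per-method code paths is one way to add a new deposition type without touching the checking logic.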
Assumptions & Constraints
Functional requirements
• Deposition tool must handle all current, agreed-upon data entry and report formats from the user community
• All data elements covered within the PDB annotation manual must be included
Technical requirements
• Design will enable flexibility for growth and evolution
• Technical level: a reasonable standard, neither bleeding edge nor declining
• Design must enable integration with community data capture
For example:
• Deposition will capture all currently deposited experimental data for each method
• The tool will support all data formats and validation requirements for all deposition types
• The system will allow for workload balancing during deposition
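The slide does not say how workload balancing would work. One simple policy is to send each new deposition to the site with the fewest open entries. The Python sketch below implements that least-loaded policy with a heap; the policy itself, the `assign` function, and the entry counts in the example are assumptions for illustration only:

```python
import heapq

def assign(depositions, open_counts):
    """Assign each deposition ID to the site with the fewest open entries.

    open_counts: dict mapping site name -> current number of open entries.
    Ties are broken alphabetically by site name.
    Returns a dict mapping deposition ID -> site.
    """
    heap = [(count, site) for site, count in open_counts.items()]
    heapq.heapify(heap)
    assignment = {}
    for dep_id in depositions:
        count, site = heapq.heappop(heap)        # least-loaded site
        assignment[dep_id] = site
        heapq.heappush(heap, (count + 1, site))  # it now has one more entry
    return assignment

# Hypothetical open-entry counts per site.
print(assign(["D1", "D2", "D3"], {"RCSB PDB": 5, "PDBe": 3, "PDBj": 4}))
# {'D1': 'PDBe', 'D2': 'PDBe', 'D3': 'PDBj'}
```

A real scheme would probably also weigh method expertise (for example NMR entries at BMRB), which this sketch ignores.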
Common D&A Project Characteristics
• Large and complex: break into smaller bits
• Distributed developers: establish development controls and communication
• Some requirements well understood: policies in place
• Some requirements evolving
• Potentially novel designs require early experimentation
[Figure: iterative development. Iteration 1 → increment delivered → review lessons learned → Iteration 2 → … → Iteration n; each iteration cycles through steps 1-6]
Modern Requirements Analysis
• Process (workflow) driven: "as is" vs. "to be"
  - What exactly do we do? Analyze current processes
  - What works and where are the opportunities?
  - What would be the ideal process?
• Alignment of requirements to workflow
• Functional requirements: calculations, decision trees, reports, communication
• Data flow requirements
• Technical (strategic and tactical) requirements
Project Phases, Structure and Roles
• Steering Committee: governance; milestone reviews and guidance
• Concept Team: initial requirements and design
• Core Team (functional leaders): plan and manage the project
• Project Teams: design, develop and test component solutions; deliver the solution
Phases: Initiation → Concept → Requirements → Design → Develop & Test → Delivery
Communication and Coordination
• Among all of the project stakeholders: inward facing and outward facing
• Between distributed functional and development groups
Initiation / Concept
• Steering Committee (wwPDB Directors), October 2007
  - Set the project goal, scope, assumptions and constraints, and initial timelines
  - Approves the project at each milestone
• Concept Team, November 2007
  - Objectives, strategies and metrics
  - Stakeholder analysis & risk assessment
  - New system requirements
  - Concept process maps
  - Approved May 2008
Objectives & Strategies
• Improve data quality beginning at data capture
  - Provide interactive feedback and value to depositors during the deposition process
  - Employ community-driven validation methods
• Improve efficiency
  - Standardization, automation and more flexible data sharing
• Improve existing tools
  - Use “best of breed” existing tools where possible
  - Free resources to redevelop or develop new common tools
• Enable system maintenance and evolution
  - System modularity
Initiation of Design and Planning
The Core Team, representing the functional groups and sites, leads the project through design and implementation in consultation with the Steering Committee.
[Phases: Requirements, Design, Develop & Test]
• RCSB PDB: John Westbrook, Jasmine Young
• PDBe: Tom Oldfield, Sameer Velankar, Jawahar Swaminathan
• BMRB: Steve Mading, Eldon Ulrich
• PDBj: Takanori Matssura
Project Team: Distributed
Subject and technical experts from all sites
• Quarterly face-to-face meetings
• Weekly VTC team working meetings
• Ongoing teleconferences and email
• Shared web-based document and code management tools
[Phases: Requirements, Design, Develop & Test, Delivery]
Key Design Elements • Modular construction through an API • Reuse of “best of breed” existing tools; redevelop tools as time and need dictate. • Enable system maintenance and evolution • Improved workflow efficiency for faster processing • Workflow automation - workflow engine and manager • Improved collaboration • More flexible data sharing • Proposed technical design and deliverables reviewed and approved by the Steering Committee
July 2009: Technical Design Proof of Concept
• Application Programming Interface (API): “wrapped” application functionality
• Faster processing through improved efficiency: workflow automation implementation
• Improved collaboration: SnapMirror tested
• Potentially novel designs require early experimentation
[Architecture diagram layers: Workflow Engine; Python; Core API Layer; C/C++ Apps; Fortran Apps; RDBMS; Other Services]
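“Wrapping” existing C/C++ or Fortran applications behind an API usually means the workflow never calls a program directly, only a uniform interface that reports success and output the same way for every tool. A minimal Python sketch of that idea, assuming the wrapped tools are command-line executables; `ToolResult` and `run_tool` are hypothetical names, and a Python one-liner stands in for a real application:

```python
import subprocess
import sys
from dataclasses import dataclass

@dataclass
class ToolResult:
    """Uniform result returned for every wrapped application."""
    ok: bool
    stdout: str
    stderr: str

def run_tool(argv: list[str], timeout: float = 60.0) -> ToolResult:
    """Run an external program and wrap its outcome in a ToolResult."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return ToolResult(ok=proc.returncode == 0, stdout=proc.stdout, stderr=proc.stderr)

if __name__ == "__main__":
    # Stand-in for a wrapped C/C++ or Fortran application.
    result = run_tool([sys.executable, "-c", "print('validation done')"])
    print(result.ok, result.stdout.strip())  # True validation done
```

Because every tool answers through the same `ToolResult`, a workflow engine can chain, retry or swap tools without knowing which language each one was written in.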
January 2010: Production Deliverable
• Implementation of an annotation module
• Expansion of workflow proof of concept
• Implementation of the API using existing functionality and the “Master Format”
• Introduction of “Go Back” functionality
• Improved user interface
• Integrated with existing workflows
[Figure label: With GO BACK functionality]
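The slides do not describe how “Go Back” is implemented. One common way to let an annotator return to an earlier step is to snapshot the working state before each step, restore a snapshot on request, and rerun from there. A minimal Python sketch of that checkpointing approach; the class, its methods and the example steps are all hypothetical:

```python
import copy
from typing import Callable

Step = Callable[[dict], dict]

class CheckpointedWorkflow:
    """Run named steps in order, saving the state before each one so the
    workflow can go back to any completed step and continue from there."""

    def __init__(self, steps: list[tuple[str, Step]]):
        self.steps = steps
        self.names = [name for name, _ in steps]
        self.snapshots: list[dict] = []  # snapshots[i] = state before step i

    def run(self, state: dict, start: int = 0) -> dict:
        self.snapshots = self.snapshots[:start]  # discard later checkpoints
        for _, step in self.steps[start:]:
            self.snapshots.append(copy.deepcopy(state))
            state = step(state)
        return state

    def go_back(self, step_name: str) -> tuple[int, dict]:
        """Return the step index and a copy of the state saved before it."""
        index = self.names.index(step_name)
        return index, copy.deepcopy(self.snapshots[index])

# Hypothetical two-step workflow.
wf = CheckpointedWorkflow([
    ("check", lambda s: {**s, "checked": True}),
    ("annotate", lambda s: {**s, "note": f"x={s['x']}"}),
])
wf.run({"x": 1})
index, state = wf.go_back("annotate")  # return to the annotation step
state["x"] = 10                         # annotator corrects the input
print(wf.run(state, start=index))       # {'x': 10, 'checked': True, 'note': 'x=10'}
```

Deep copies keep each snapshot independent of later edits, at the cost of memory for large entries.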
Project Progress
• July 2008: Initiate design and development planning (Core Team)
• Nov 2008: Define data model requirements (Project Team meeting); flesh out design elements
• March 2009: Finalize design elements and initiate development of the “proof of concept”
• July 2009: Deliver design “proof of concept”
• January 2010: First production deliverable
[Timeline figure, 4Q 2007 to 2011: Initiation, Concept, Requirements, Design, Development, Test, Delivery]
wwPDB Common D&A Tool Project Timeline Going Forward
[Timeline figure, 4Q 2007 to 2011: Initiation, Concept, Requirements, Design, Development, Test, Delivery]
• Concept
• Define deliverables
• Initial design
• Process definition
• Data model definition
• Requirements elaboration
• Data flow documentation
• Technical design
• Data sharing & replication
• API, Master Format
• Automated workflow
• Technical proof of concept
• Development of initial production deliverable
• Communication design
• Production deliverables
• D&A system delivery
Ultimate Project Deliverables
For Depositors
• Interactive and informative deposition interface
• Value-added validation input and annotation during deposition
• Faster processing
For Annotators
• Improved efficiency, freeing time for more advanced annotation
• Improved quality early in the process
• Automation of appropriate processing steps
• Best of breed tools
• Expanded functionality
• System maintenance and evolution enabled through system modularity
For Data Users
• Higher quality archive
Method and Molecule-specific Activities John Markley Gerard Kleywegt Worldwide Protein Data Bank www.wwpdb.org
NMR Update
• Remediated NMR restraints project is near completion
• NMR Validation Task Force established; first meeting held Sept 21, 2009, in Paris, France
• Implementation plan for chemical shifts requirement in progress
• Status of SMSDep
NMR Validation Task Force: Charge
• Advise on validation of new NMR data depositions
• Provide a report for the wwPDB AC
• Provide recommendations for structure validation criteria and tools
• Tools and procedures recommended should be freely available and simple to install and maintain, so that users can easily run them in their own laboratories
• Tools should not be used as a basis to “reject” structures, but to flag potential problems for the depositor/user to be aware of
• Recommendations should be assembled into a “white paper” for publication
• Recommendations should be targeted to software developers, depositors, journal editors, and PDB users
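The “flag, don't reject” principle changes the shape of a validation tool: it returns a list of warnings instead of a pass/fail verdict. A minimal Python sketch of that shape; the metric names and limits below are invented placeholders, not Task Force recommendations, and `Flag` and `validation_flags` are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Flag:
    metric: str
    value: float
    limit: float

# Hypothetical metrics and limits, for illustration only.
LIMITS = {
    "ramachandran_outliers_pct": 2.0,
    "max_distance_violation_A": 0.5,
}

def validation_flags(metrics: dict[str, float],
                     limits: dict[str, float] = LIMITS) -> list[Flag]:
    """Return one Flag per metric that exceeds its limit.

    The result is advisory: nothing is rejected, and an empty list simply
    means no potential problems were found for the depositor or user."""
    return [
        Flag(name, metrics[name], limit)
        for name, limit in limits.items()
        if name in metrics and metrics[name] > limit
    ]

print(validation_flags({"ramachandran_outliers_pct": 3.1,
                        "max_distance_violation_A": 0.2}))
```

The caller decides what to do with the flags (show them in a report, attach them to the entry), which keeps the validation step itself free of any acceptance policy.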
NMR Validation Task Force
Committee Members
• Gaetano Montelione (Co-Chair, Rutgers)
• Michael Nilges (Co-Chair, Institut Pasteur)
• Ad Bax (NIH)*
• Peter Guentert (University Frankfurt)
• Torsten Herrmann (CNRS/ENS Lyon)
• Jane Richardson (Duke University)
• Charles Schwieters (NIH)
• Geerten Vuister (Radboud University)*
• David Wishart (University of Alberta)
Meeting Observers
• Naohiro Kobayashi (PDBj-BMRB)
• John Markley (NMR VTF Organizer)
• Randy Read (Chair, X-ray VTF)
• Eldon Ulrich (BMRB)*
• Wim Vranken (PDBe)
• John Westbrook (RCSB PDB)
* Notes on the Paris meeting: Ad Bax and Eldon Ulrich were unable to attend the meeting; Jurgen Doreleijers attended as a substitute for Geerten Vuister.
NMR VTF: Outcome of first meeting (September 21, 2009; Paris, France)
• General consensus on the value of expanded NMR validation for the scientific community
• Consensus on coordination with the X-ray VTF on common validation issues
• Requirements and available tools for validation were assessed during the meeting
• Areas targeted for further research: format consistency for restraints, treatment of internal dynamics and ensemble averaging
• Website and mail archive created to support task force communication
Chemical Shifts: Progress in implementation
• BMRB has been the primary deposition and processing site for NMR chemical shift (CS) data
• Mandatory chemical shift and reference data items have been defined, and a prototype mandatory CS system is in place
• wwPDB to perform minimal processing: format and sanity checks at deposition; substitution of explicit atoms for pseudo-atoms; maintenance of nomenclature correspondence during annotation
• Data files are to be transferred to BMRB for further annotation
• PDB will release chemical shift files in NMR-STAR format along with coordinate data files
• Download statistics for chemical shift data files will be maintained for BMRB (needed for grant reporting)
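Substituting explicit atoms for pseudo-atoms means that a shift reported for, say, the alanine methyl pseudo-atom is written out once for each of the three methyl protons it stands for. A minimal Python sketch of that expansion; the three-entry table is a small illustrative subset using IUPAC-style names (M for methyl, Q for other equivalent groups), not the wwPDB's actual mapping, and `expand_pseudo_atoms` is a hypothetical name:

```python
# Small, hypothetical subset of pseudo-atom names mapped to the explicit
# atoms they represent (IUPAC-style: M = methyl group, Q = other group).
PSEUDO_ATOMS = {
    ("ALA", "MB"): ["HB1", "HB2", "HB3"],
    ("VAL", "MG1"): ["HG11", "HG12", "HG13"],
    ("GLY", "QA"): ["HA2", "HA3"],
}

def expand_pseudo_atoms(shifts):
    """Replace each pseudo-atom shift with one entry per explicit atom.

    shifts: list of (residue_name, atom_name, shift_ppm) tuples.
    Atoms without a mapping are passed through unchanged.
    """
    expanded = []
    for residue, atom, value in shifts:
        for explicit in PSEUDO_ATOMS.get((residue, atom), [atom]):
            expanded.append((residue, explicit, value))
    return expanded

print(expand_pseudo_atoms([("ALA", "MB", 1.39), ("ALA", "CA", 52.5)]))
# [('ALA', 'HB1', 1.39), ('ALA', 'HB2', 1.39), ('ALA', 'HB3', 1.39), ('ALA', 'CA', 52.5)]
```

Keeping the mapping as a lookup table also makes it easy to go in the other direction when nomenclature correspondence has to be maintained during annotation.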
SMSDep
• Deposition system is in place and accepting structures and associated NMR chemical shifts
• Current policy is to accept data only for small peptides or nucleic acids (processing and annotation are carried out at PDBj-BMRB)
• We need to monitor the level of activity to determine whether this site should be maintained
wwPDB X-ray Validation Task Force
Initial meeting: April 14-16, 2008, EBI, Hinxton, UK
Members: R. Read (Chair), P. Adams, A. Brunger, P. Emsley, R. Joosten, G. Kleywegt, E. Krissinel, T. Luetteke, Z. Otwinowski, T. Perrakis, J. Richardson, W. Sheffler, J. Smith, I. Tickle, G. Vriend
Goal
• Gather recommendations and consensus on additional validation for PDB entries, and identify software applications for these validation tasks
• Provide code/algorithms for the validation-software pipeline
Preliminary Outcome
• Candidate global and local validation measures were identified
• These measures were reviewed in terms of the requirements of depositors, reviewers, and users
X-ray Validation Task Force: Next Steps
• May 2008 - September 2009: discussions (e-mail, Gordon Conference) and report writing
• October 2009: meeting to complete the report during the Cold Spring Harbor Laboratory Crystallography Course
• November 2009: report presented at the wwPDBAC
• wwPDB partners are pooling manpower to implement Task Force recommendations: one dedicated programmer to implement the validation-software pipeline (Swanand Gore)
• Validation tools and procedures will also be incorporated in the new wwPDB Common Deposition and Annotation system
wwPDB X-ray Validation Task Force
• Apply new knowledge of structure: proteins, nucleic acids, carbohydrates, ligands
• New opportunities from mandatory data: fit to data, quality of data, pathologies
• Exploit new technologies: machine-readable annotation
• Serve the different communities: users, depositors, editors/referees