RECOORD: REcalculated COORdinates Database


  1. RECOORD: REcalculated COORdinates Database
     • Jurgen Doreleijers, Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison, jurgen@bmrb.wisc.edu
     • Aart Nederveen, Bijvoet Center for Biomolecular Research, Utrecht University, a.j.nederveen@chem.uu.nl
     • Wim Vranken, Macromolecular Structure Database, European Bioinformatics Institute, wim@ebi.ac.uk

  2. Aim
     • Recalculation of protein structures based on deposited NMR restraints using state-of-the-art methods
     • Goals:
       • decrease user- and software-dependent biases
       • allow a better comparison between structures
       • comparison between different structure calculation programs
       • provide a database for the development and assessment of validation tools and calculation protocols

  3. Overview recalculation project (flowchart)
     • Steps 1 and 2: PDB (coordinates, restraints) and BMRB (STAR files; Doreleijers et al. 2003)
     • Step 3: EBI/UU, generation of consistent STAR files
     • Steps 4 and 5: CNS (topology, MD SA, refinement) and CYANA (sequence, MD SA, …)
     • Step 6: analysis (improvement? correlations? …)
     • Stage labels in the flowchart: restraint manipulation, recalculation, design of RECOORD, analysis

  4. Databases now publicly available
     • DOCR/FRED (BMRB): databases containing converted and filtered restraints, http://www.bmrb.wisc.edu/servlets/MRGridServlet
     • RECOORD (EBI): database containing recalculated coordinates, http://www.ebi.ac.uk/msd/recoord

  5. Selection (flowchart steps 1 and 2: PDB coordinates and restraints; BMRB STAR files, Doreleijers et al. 2003)
     • Formats (if distance restraints available):
       • CNS/XPLOR
       • DIANA/DYANA/CYANA
       • DISCOVER/MSI
     • PDB entries selected:
       • only proteins
       • no HET atoms
       • multimers allowed (not yet recalculated)
       • at least 20 residues
     • Finally, 545 monomers were selected
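The selection criteria on this slide could be written as a simple filter. This is only a minimal sketch: the `Entry` record and all of its field names are hypothetical and are not taken from the RECOORD scripts.

```python
from dataclasses import dataclass

# Restraint formats listed on the slide.
ACCEPTED_FORMATS = {"CNS/XPLOR", "DIANA/DYANA/CYANA", "DISCOVER/MSI"}


@dataclass
class Entry:
    """Hypothetical summary of one PDB entry (field names are assumptions)."""
    pdb_id: str
    restraint_format: str
    has_distance_restraints: bool
    protein_only: bool
    n_het_atoms: int
    n_chains: int
    n_residues: int


def is_selected(e: Entry) -> bool:
    """Apply the slide's selection criteria; multimers pass this filter."""
    return (
        e.has_distance_restraints
        and e.restraint_format in ACCEPTED_FORMATS
        and e.protein_only
        and e.n_het_atoms == 0
        and e.n_residues >= 20
    )


def is_recalculated(e: Entry) -> bool:
    """Only monomers had been recalculated at the time of the talk."""
    return is_selected(e) and e.n_chains == 1
```

Keeping `is_selected` and `is_recalculated` separate mirrors the slide: multimers are allowed in the selection but were not yet recalculated.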

  6. Conversion issues (flowchart step 3: EBI/UU, generation of consistent STAR files)
     • Data are converted to formats readable by the calculation software (e.g. XPLOR/CNS and CYANA) by the FormatConverter available within the CCPN software (Wim Vranken, EBI).
     • Problems:
       • Differences between coordinate and restraint data:
         • e.g. 1 chain in the PDB entry, 2 chains in the restraint list
         • residue numbering can differ between the PDB entry and the restraint list
         • restraints for residues not present in the PDB entry …
       • Nomenclature in the restraint list

  7. Building topology (flowchart steps 4 and 5: CNS topology, MD SA, refinement; CYANA sequence, MD SA, …)
     • Starting script: generate_easy.inp from CNS
     • Automated detection in the original ensemble of:
       • Disulfide bridges (S-S distance < 3 Å in original first models)
       • Cis peptides (if |ω| < 25° in original first models)
       • Protonation state of histidines (use CNS patches HISD, HISE)
     • CYANA: sequence based on CNS topology
       • Add CYSS, HIST, HIST+, cPRO in sequence
       • Automated generation of disulfide restraints
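The two geometric checks on this slide (S-S distance below 3 Å, omega dihedral within 25° of zero) can be sketched in plain Python. This is an illustration, not the project's script: the function names are made up here, coordinates are assumed to be (x, y, z) tuples in Å, and omega is taken in the usual way over the atoms CA(i), C(i), N(i+1), CA(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def is_disulfide(sg_a, sg_b, cutoff=3.0):
    """True if two cysteine sulfur atoms are closer than the cutoff (Å)."""
    return math.dist(sg_a, sg_b) < cutoff


def is_cis_peptide(ca_i, c_i, n_next, ca_next, cutoff=25.0):
    """True if |omega| is below the cutoff (degrees)."""
    return abs(dihedral(ca_i, c_i, n_next, ca_next)) < cutoff
```

Only the absolute value of omega is used, so the sign convention of the dihedral does not affect the cis test.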

  8. CONDOR computer cluster, CS University Madison (flowchart steps 4 and 5: CNS and CYANA)
     • More than 800 processors used
     • Total CPU time: 31,169 hours (3.5 years on a single workstation)
     • Example 2EZM, calculation of 1 model (101 a.a., 2.2 GHz P4 computer):
       • CYANA: 31 seconds
       • CNS: 340 seconds
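The figures on this slide can be checked with a few lines of arithmetic. The sketch assumes a 365-day year of continuous computing; the variable names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365           # assumes a 365-day year, running nonstop

total_cpu_hours = 31_169            # from the slide
years_on_one_machine = total_cpu_hours / HOURS_PER_YEAR

cyana_seconds = 31                  # 2EZM, one model
cns_seconds = 340                   # 2EZM, one model
cns_to_cyana_ratio = cns_seconds / cyana_seconds

print(f"{years_on_one_machine:.2f} years, CNS/CYANA time ratio {cns_to_cyana_ratio:.1f}")
```

The total corresponds to roughly three and a half years on one machine, and for this example CNS takes about eleven times as long per model as CYANA.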

  9. Evaluation of structure quality (flowchart step 6: analysis)
     • Agreement with experimental restraints
     • Improvement?
     • Comparison of CNS and CYANA
     • Relation between NMR data quality and structural quality

  10. Distance restraint violations
     • ORG: 0.08 Å (0.14 Å), original entries
     • CNW: 0.04 Å (0.05 Å), recalculated in CNS and refined in water
     • [Histogram axes: frequency; RMS distance restraint violations (Å)]
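As an illustration of the quantity on this slide, an RMS distance restraint violation can be computed as below. The talk does not state the exact convention used (for instance whether the average runs over all restraints or only violated ones, or how models are combined), so this sketch assumes simple lower and upper bounds and an average over all restraints of a single model; the function names are hypothetical.

```python
import math


def distance_violation(d, lower, upper):
    """Amount (Å) by which distance d falls outside [lower, upper]; 0 if inside."""
    if d > upper:
        return d - upper
    if d < lower:
        return lower - d
    return 0.0


def rms_violation(distances, bounds):
    """RMS violation over all restraints (assumed convention, see lead-in)."""
    squared = [distance_violation(d, lo, up) ** 2
               for d, (lo, up) in zip(distances, bounds)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, three restraints with bounds 1.8 to 5.0 Å and measured distances of 3.0, 5.5 and 1.5 Å give violations of 0, 0.5 and 0.3 Å.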

  11. Dihedral restraint violations
     • ORG: 1.6° (4.6°), original entries
     • CNW: 0.5° (0.5°), recalculated in CNS and refined in water
     • [Histogram axes: frequency; RMS dihedral restraint violations (degrees)]
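Dihedral violations differ from distance violations in that angles are periodic, so the difference has to be wrapped before comparing it with the allowed range. The sketch below assumes each restraint is given as a target angle plus a symmetric allowed range in degrees; that representation and the function names are assumptions, not something stated in the talk.

```python
import math


def wrap_degrees(delta):
    """Map an angle difference into the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def dihedral_violation(angle, target, allowed_range):
    """Degrees by which angle lies outside target +/- allowed_range."""
    return max(0.0, abs(wrap_degrees(angle - target)) - allowed_range)


def rms_dihedral_violation(angles, restraints):
    """RMS over all restraints; restraints are (target, allowed_range) pairs."""
    squared = [dihedral_violation(a, t, r) ** 2
               for a, (t, r) in zip(angles, restraints)]
    return math.sqrt(sum(squared) / len(squared))
```

Without the wrapping step, an angle of 175° compared with a target of -175° would appear to be 350° off rather than 10°.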

  12. Results: quality indicators, performance of CNS vs. CYANA (no water refinement yet)

  13. Results: quality indicators, performance of CNS before and after water refinement

  14. Improvement: packing and Ramachandran Z-scores
     • Improvement in Z-score: ΔZ = Z_refined - Z_original
     • For ~5% of entries no improvement was possible because of missing NMR data compared to the authors
     • [Plot labels: improvement Ramachandran; improvement packing; missing data]

  15. In search of correlations (Pearson coefficient)
     • refined: correlations higher
     • original: correlations lower
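For reference, the Pearson coefficient used in the correlation slides can be computed as in the sketch below. The function name is illustrative and the talk does not say which software produced its r values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant.
    return sxy / math.sqrt(sxx * syy)
```

In this setting each point would be one entry, with for example its NMR data density as x and its Ramachandran Z-score as y.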

  16. In search of correlations (Bumps) [Panels: refined; original]

  17. In search of correlations (NMR data density) [Panels: refined; original]

  18. Correlation of NMR data density and Ramachandran Z-score: r = 0.31
     • [Plot axes: Ramachandran Z-score; NMR data density]

  19. Correlation of NOE completeness and packing Z-score: r = 0.20
     • NMR data-based indicators cannot yield any indication of the normality of the structures
     • [Plot axes: packing Z-score; NOE completeness]

  20. In search of correlations (Precision) [Panels: refined; original]

  21. Correlation between precision and data density: r = -0.46
     • [Plot axes: circular variance; NMR data density]
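Precision is expressed here as circular variance. The talk does not spell out which angles are used or how values are averaged per entry, so the following sketch only shows the standard definition for one set of angles (for example one dihedral across the models of an ensemble): one minus the length of the mean unit vector.

```python
import math


def circular_variance(angles_deg):
    """Circular variance of angles in degrees: 0 = identical, 1 = fully spread."""
    n = len(angles_deg)
    if n == 0:
        raise ValueError("need at least one angle")
    sum_cos = sum(math.cos(math.radians(a)) for a in angles_deg)
    sum_sin = sum(math.sin(math.radians(a)) for a in angles_deg)
    mean_resultant_length = math.hypot(sum_cos, sum_sin) / n
    return 1.0 - mean_resultant_length
```

A tightly defined dihedral gives a value near 0, while angles scattered evenly around the circle give a value near 1, so smaller circular variance means higher precision.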

  22. Correlation between precision and Ramachandran: r = -0.67
     • A protein with high Ramachandran normality will have small circular variance
     • [Plot axes: circular variance; Ramachandran plot appearance (Z-score); entry 1SUT labeled]

  23. Correlation between RMSD and structural uncertainty (QUEEN): r = -0.69
     • Structural uncertainty imposes a lower limit on the RMSD
     • [Plot axes: backbone RMSD (Å); structural uncertainty]

  24. Conclusions I
     • NMR-STAR files made consistent for 545 out of ±1700 entries
     • Protocols and scripts available for recalculation in CYANA and CNS
     • Validation database available for testing of new protocols
     • Improvement compared to original data: 1 standard deviation closer to X-ray db
     • Violations in original data do not limit recalculation effort
     • Refinement in water required
     • 5% no improvement: data missing

  25. Conclusions II
     • Correlations higher after recalculation and refinement, though most of them still weak
     • Highest correlation: precision vs. Ramachandran score and structural uncertainty (QUEEN)

  26. Acknowledgements
     • Utrecht University: Alexandre Bonvin, Rob Kaptein
     • EBI Cambridge: Wim Vranken
     • CESG/BMRB: Jurgen Doreleijers, Zachary Miller, Eldon Ulrich, John Markley
     • Radboud University Nijmegen: Chris Spronk, Sander Nabuurs
     • RIKEN Japan: Peter Güntert
     • Institut Pasteur Paris: Michael Nilges