This workshop talk discusses the possibility of reusing phenix.refine to refine protein structures against powder diffraction data, focusing on the challenges and potential solutions. Topics include the modular design of the software and the development of powder-specific target functions.
Reusing phenix.refine for powder data? Ralf W. Grosse-Kunstleve, Computational Crystallography Initiative, Lawrence Berkeley National Laboratory. Workshop on developments and directions of powder diffraction on proteins, June 22/23, 2007
My two lives • Life 1 (PhD project): • Zeolite structure determination from powder data using extracted intensities • Life 2: • Contributions to Xplor/CNS • Single-crystal protein crystallography • About 80% of all PDB entries refined with Xplor/CNS • Phenix project • Fresh start after losing a legal battle
Phenix Collaboration • Computational Crystallography Initiative (LBNL), CCI APPS: Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nicholas Sauter, Peter Zwart • Los Alamos National Lab (LANL), SOLVE / RESOLVE: Tom Terwilliger, Li-Wei Hung • Cambridge University, PHASER: Randy Read, Airlie McCoy • Texas A&M University, TEXTAL: Tom Ioerger, Jim Sacchettini, Erik McKee • Duke University, MolProbity / REDUCE: Jane Richardson, David Richardson, Ian Davis • Funding: NIH Program Project (NIGMS, PSI), Director: Paul Adams
Spectrum of phenix components • Automated analysis of data quality: phenix.xtriage • Rapid substructure determination: phenix.hyss • Phasing: Maximum likelihood – SOLVE, PHASER for SAD • Density modification: Statistical density modification (RESOLVE) • Automated model building: • Pattern matching methods (RESOLVE or TEXTAL) • Structure refinement: phenix.refine (likelihood, annealing, TLS) • Advanced automation: AutoSol – hkl to map • Ligand building and fitting: eLBOW, AutoLigand • Validation and Hydrogens: MolProbity + Reduce
phenix.refine - Restrained refinement (xyz, iso/aniso ADP) - Automatic water picking - Bond density - Unrestrained refinement • FFT or direct summation • Hydrogens - Group ADP refinement - Rigid body refinement - Automatic NCS restraints - Simulated Annealing - Occupancies (individual, group) - TLS refinement - Twinned data • X-ray, Neutron, joint X-ray + Neutron refinement
Refinement flowchart • Input: PDB model, data in any format (CNS, Shelx, MTZ, …) • Input data and model processing • Refinement strategy selection • Repeated several times: • Bulk-solvent, anisotropic scaling, twinning parameters refinement • Ordered solvent (add / remove) • Target weights calculation • Coordinate refinement (rigid body, individual; minimization or simulated annealing) • ADP refinement (TLS, group, individual iso / aniso) • Occupancy refinement (individual, group) • Output: refined model, various maps, structure factors, complete statistics; files for COOT, O, PyMol
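The flowchart can be read as a loop over macro-cycles. The sketch below only illustrates that control flow: the `Model` class, the step names, and `refine` are hypothetical placeholders, not the phenix.refine or mmtbx API, and each step merely records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Model:
    """Hypothetical stand-in for a refinable model; records the steps applied."""
    history: List[str] = field(default_factory=list)


def make_step(name: str) -> Callable[[Model], None]:
    """Build a placeholder step that only logs its name (no real refinement)."""
    def step(model: Model) -> None:
        model.history.append(name)
    return step


# Order follows the flowchart; every entry is a placeholder, not a real algorithm.
MACRO_CYCLE_STEPS = [
    make_step("bulk_solvent_and_scaling"),
    make_step("ordered_solvent"),
    make_step("target_weights"),
    make_step("coordinates"),
    make_step("adp"),
    make_step("occupancies"),
]


def refine(model: Model, number_of_macro_cycles: int = 3) -> Model:
    """Run every step once per macro-cycle, in flowchart order."""
    for _ in range(number_of_macro_cycles):
        for step in MACRO_CYCLE_STEPS:
            step(model)
    return model


if __name__ == "__main__":
    result = refine(Model(), number_of_macro_cycles=2)
    print(result.history)
```

Keeping the steps in a plain list is one way to make strategy selection cheap: a step that is not part of the chosen strategy would simply be left out of the list.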
Designed to be very easy to use
Refinement of individual coordinates and B-factors:
  % phenix.refine model.pdb data.hkl
Same as above plus water picking:
  % phenix.refine model.pdb data.hkl ordered_solvent=true
Run with parameter file:
  % phenix.refine model.pdb data.hkl parameter_file

  refinement.main {
    high_resolution = 2.0
    simulated_annealing = True
    ordered_solvent = True
    number_of_macro_cycles = 5
  }
  refinement.refine.adp {
    tls = chain A
    tls = chain B
  }
How to best make ends meet? • GSAS & proteins • Extending a small-molecule powder program to deal with proteins • Advantage: program designed for the field • Community used to inputs, outputs, idiosyncrasies • Disadvantage: some approaches suitable for small molecules don’t scale • Direct-summation structure factor calculation • Neighborhood calculations (nonbonded interactions, a.k.a. anti-bumping restraints) • phenix.refine • Extending a single-crystal protein program to deal with powders • Advantage: program designed to deal with large structures • Protein, RNA/DNA restraint libraries, optimized algorithms • Disadvantage: new data formats, differences in terminology
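To see why direct summation does not scale, it helps to write it out. The following is a minimal sketch under simplifying assumptions that are not from the talk: space group P1, constant scattering factors, and no atomic displacement parameters. The function names are illustrative only.

```python
import cmath
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Hkl = Tuple[int, int, int]


def structure_factor_direct(
    hkl: Hkl,
    sites_frac: Sequence[Vec3],
    scattering_factors: Sequence[float],
) -> complex:
    """F(hkl) = sum_j f_j * exp(2*pi*i * (h*x_j + k*y_j + l*z_j))."""
    h, k, l = hkl
    total = 0j
    for (x, y, z), f in zip(sites_frac, scattering_factors):
        total += f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
    return total


def all_structure_factors(
    miller_indices: Sequence[Hkl],
    sites_frac: Sequence[Vec3],
    scattering_factors: Sequence[float],
) -> List[complex]:
    """One inner loop over all atoms per reflection."""
    return [
        structure_factor_direct(hkl, sites_frac, scattering_factors)
        for hkl in miller_indices
    ]


if __name__ == "__main__":
    sites = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
    print(all_structure_factors([(1, 0, 0), (2, 0, 0)], sites, [6.0, 6.0]))
```

The work grows with the number of reflections times the number of atoms. Both numbers are large for a protein, which is the scaling problem that FFT-based structure factor calculation avoids.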
Two main challenges • Challenge 1: • Input/output of powder-specific format • Fundamentally trivial but potentially tedious • New command? • No interference with existing, non-trivial algorithms for automatic recognition, processing, and consolidation of already very heterogeneous inputs • Extend the existing input algorithms? • Nicer, but requires higher degree of collaboration • Challenge 2: • Development of a powder-specific target function • Based on extracted intensities or primary pattern + pre-fitted profile parameters? • Maximum likelihood with or without cross-validation? • Will probably require some refactoring of the refinement engine
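For the extracted-intensities option, one conceivable form of the target is a weighted least-squares residual on the summed intensity of each group of overlapping reflections. This is an assumption for illustration, not a form proposed in the talk; the function name, its arguments, and the single overall scale factor are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Hkl = Tuple[int, int, int]


def overlap_group_target(
    groups: Sequence[List[Hkl]],
    i_obs: Sequence[float],
    sigmas: Sequence[float],
    f_calc: Dict[Hkl, complex],
    multiplicity: Dict[Hkl, int],
    scale: float = 1.0,
) -> float:
    """Weighted least-squares residual on summed intensities per overlap group.

    T = sum_g ((I_obs_g - scale * sum_{h in g} m_h * |F_calc_h|^2) / sigma_g)^2
    """
    target = 0.0
    for members, obs, sigma in zip(groups, i_obs, sigmas):
        # Only the sum over the group is compared with the observation,
        # because the individual intensities are not resolved in the pattern.
        calc = scale * sum(multiplicity[h] * abs(f_calc[h]) ** 2 for h in members)
        target += ((obs - calc) / sigma) ** 2
    return target


if __name__ == "__main__":
    f_calc = {(1, 0, 0): 3 + 4j, (0, 1, 0): 2 + 0j, (1, 1, 0): 1j}
    mult = {hkl: 2 for hkl in f_calc}
    groups = [[(1, 0, 0), (0, 1, 0)], [(1, 1, 0)]]
    print(overlap_group_target(groups, [58.0, 4.0], [1.0, 2.0], f_calc, mult))
```

A target on the primary pattern with pre-fitted profile parameters, or a maximum-likelihood variant, would replace the body of this function but could keep the same inputs from the model side (the calculated structure factors).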
Modular design • Application level • phenix wizards (data in, structure out) • phenix.refine • phenix.hyss (hybrid substructure search) • Visible source • Library level • cctbx project, organized in modules • libtbx, scitbx, cctbx, iotbx, mmtbx • cctbx is intended to cover small-molecule work • But nothing yet specific to powders • Unrestricted open source
Existing target functions • Least-squares (variety) • Maximum likelihood on amplitudes • Maximum likelihood with experimental phases • Least-squares twin target • SAD-specific maximum likelihood target implemented in Phaser • Reusing target from external application! • Dirty laundry • Severe code duplication in implementation of twin target • Needs to be consolidated • Some friction integrating the Phaser ML-SAD target • Phaser target relatively slow: we need better bookkeeping to avoid repeated calculations with exactly the same input
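The bookkeeping mentioned for the slow Phaser target amounts to remembering results for inputs that have already been seen. A minimal sketch of that idea uses memoization from the Python standard library; `expensive_target` is a hypothetical placeholder (a sum of squares) and is not the Phaser ML-SAD target.

```python
from functools import lru_cache
from typing import Iterable, Tuple


@lru_cache(maxsize=32)
def expensive_target(parameters: Tuple[float, ...]) -> float:
    """Placeholder for a slow target evaluation (here: a sum of squares)."""
    return sum(p * p for p in parameters)


def evaluate(parameters: Iterable[float]) -> float:
    """Convert to a hashable tuple so that identical inputs hit the cache."""
    return expensive_target(tuple(float(p) for p in parameters))


if __name__ == "__main__":
    evaluate([1.0, 2.0])
    evaluate([1.0, 2.0])  # same input: served from the cache
    print(expensive_target.cache_info())
```

The cache only helps when the inputs are exactly identical, which matches the case described on the slide. Inputs that differ by rounding noise would need a different key.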
Precedent for reusing cctbx? • cctbx used heavily by all phenix collaborators • Phaser uses cctbx -> cctbx supported by CCP4 6.0 and up • smtbx: small-molecule toolbox • Group at Durham University, U.K. collaborating with David Watkin at Oxford University, U.K. • Long-term goal: highly integrated single-crystal structure determination (direct methods), automatic model building and refinement • Initial focus: iterative model building and refinement • Initial approach: reuse + adjust cctbx core libraries directly combined with copying sub-modules to smtbx where they are modified • Long term: consolidate duplications as much as possible • half the code = half the bugs, reuse of optimizations
Summary of ideas • Implement powder-specific target function(s) that plug into the refinement engine in the open source cctbx libraries • Can be done stand-alone using ad-hoc input/output methods • Collaborate in making the necessary adjustments to the existing libraries • Figure out the best way to handle input/output at the application level • Learn and re-evaluate as we go • If the powder field joins in there will be the potential for direct cross-fertilization between three specializations in crystallography • Single-crystal protein • Single-crystal small-molecule • Powder diffraction protein • More? (powder diffraction small-molecule) • cctbx libraries are very general • Ever increasing integration is the secret behind the stunning successes in the development of computing technology • Can we make this idea work in crystallography?
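The idea of target functions that plug into a shared refinement engine can be sketched as a small interface. Everything below is hypothetical: the `Target` protocol, `QuadraticTarget`, and the toy `minimize` engine (steepest descent with finite-difference gradients) are illustrations and do not reflect the actual cctbx or mmtbx design.

```python
from typing import List, Protocol, Sequence


class Target(Protocol):
    """Hypothetical interface: anything with value(params) can be refined against."""

    def value(self, params: Sequence[float]) -> float: ...


class QuadraticTarget:
    """Toy target standing in for, e.g., a powder-specific residual."""

    def __init__(self, optimum: Sequence[float]) -> None:
        self.optimum = list(optimum)

    def value(self, params: Sequence[float]) -> float:
        return sum((p - o) ** 2 for p, o in zip(params, self.optimum))


def minimize(
    target: Target,
    start: Sequence[float],
    step: float = 0.1,
    iterations: int = 200,
    eps: float = 1e-6,
) -> List[float]:
    """Toy engine: steepest descent with forward-difference gradients."""
    params = list(start)
    for _ in range(iterations):
        base = target.value(params)
        gradient = []
        for i in range(len(params)):
            shifted = params[:]
            shifted[i] += eps
            gradient.append((target.value(shifted) - base) / eps)
        params = [p - step * g for p, g in zip(params, gradient)]
    return params


if __name__ == "__main__":
    print(minimize(QuadraticTarget([1.0, -2.0]), start=[0.0, 0.0]))
```

The engine never inspects the target beyond calling `value`, so a single-crystal target and a powder target could be swapped without touching the minimizer. A real engine would also ask the target for analytical gradients.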
Availability • Phenix incl. Graphical User Interface • http://www.phenix-online.org/ • Freely available to academic (non-profit) groups • Core libraries (cctbx) • http://cctbx.sourceforge.net/ • Freely available to all
Acknowledgments • Phenix developers • P.D. Adams • P. Afonine • T.R. Ioerger • A.J. McCoy • E.W. McKee • N.W. Moriarty • R.J. Read • N.K. Sauter • J.N. Smith • L.C. Storoni • T.C. Terwilliger • P.H. Zwart • Funding: • LBNL (DE-AC03-76SF00098) • NIH/NIGMS (1P01GM063210) • PHENIX Industrial Consortium