Reusing phenix.refine for powder data?

Ralf W. Grosse-Kunstleve
Computational Crystallography Initiative, Lawrence Berkeley National Laboratory
Workshop on developments and directions of powder diffraction on proteins, June 22/23, 2007

My two lives
• Life 1 (PhD project):
  • Zeolite structure determination from powder data using extracted intensities
• Life 2:
  • Contributions to Xplor/CNS
  • Single-crystal protein crystallography
  • About 80% of all PDB entries refined with Xplor/CNS
  • Phenix project
  • Fresh start after losing a legal battle

Phenix Collaboration
• Computational Crystallography Initiative (LBNL): CCI APPS
  • Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nicholas Sauter, Peter Zwart
• Los Alamos National Lab (LANL): SOLVE / RESOLVE
  • Tom Terwilliger, Li-Wei Hung
• Cambridge University: PHASER
  • Randy Read, Airlie McCoy
• Texas A&M University: TEXTAL
  • Tom Ioerger, Jim Sacchettini, Erik McKee
• Duke University: MolProbity / REDUCE
  • Jane Richardson, David Richardson, Ian Davis
Funding: NIH Program Project (NIGMS, PSI), Director: Paul Adams

Spectrum of phenix components
• Automated analysis of data quality: phenix.xtriage
• Rapid substructure determination: phenix.hyss
• Phasing: Maximum likelihood – SOLVE, PHASER for SAD
• Density modification: Statistical density modification (RESOLVE)
• Automated model building:
  • Pattern matching methods (RESOLVE or TEXTAL)
• Structure refinement: phenix.refine (likelihood, annealing, TLS)
• Advanced automation: AutoSol – hkl to map
• Ligand building and fitting: eLBOW, AutoLigand
• Validation and Hydrogens: MolProbity + Reduce

phenix.refine
- Restrained refinement (xyz, iso/aniso ADP)
- Automatic water picking
- Bond density
- Unrestrained refinement
• FFT or direct summation
• Hydrogens
- Group ADP refinement
- Rigid body refinement
- Automatic NCS restraints
- Simulated Annealing
- Occupancies (individual, group)
- TLS refinement
- Twinned data
• X-ray, Neutron, joint X-ray + Neutron refinement

Refinement flowchart
Input: PDB model, any data format (CNS, Shelx, MTZ, …)
• Input data and model processing
• Refinement strategy selection
• Bulk-solvent, Anisotropic scaling, Twinning parameters refinement
• Ordered solvent (add / remove)
• Target weights calculation
• Coordinate refinement (rigid body, individual) (minimization or Simulated Annealing)
• ADP refinement (TLS, group, individual iso / aniso)
• Occupancy refinement (individual, group)
• The refinement steps are repeated several times (macro-cycles)
Output: Refined model, various maps, structure factors, complete statistics
Files for COOT, O, PyMol

Designed to be very easy to use
Refinement of individual coordinates and B-factors:
  % phenix.refine model.pdb data.hkl
Same as above plus water picking:
  % phenix.refine model.pdb data.hkl ordered_solvent=true
Run with parameter file:
  % phenix.refine model.pdb data.hkl parameter_file
  refinement.main {
    high_resolution = 2.0
    simulated_annealing = True
    ordered_solvent = True
    number_of_macro_cycles = 5
  }
  refinement.refine.adp {
    tls = chain A
    tls = chain B
  }

How to best make ends meet?
• GSAS & proteins
  • Extending a small-molecule powder program to deal with proteins
  • Advantage: program designed for the field
    • Community used to inputs, outputs, idiosyncrasies
  • Disadvantage: some approaches suitable for small molecules don’t scale
    • Direct-summation structure factor calculation
    • Neighborhood calculations (nonbonded interactions, a.k.a. anti-bumping restraints)
• phenix.refine
  • Extending a single-crystal protein program to deal with powders
  • Advantage: program designed to deal with large structures
    • Protein, RNA/DNA restraint libraries, optimized algorithms
  • Disadvantage: new data formats, differences in terminology
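Why direct summation stops scaling can be seen in a minimal sketch. The function below is an illustration only, not code from GSAS or cctbx: it assumes space group P1, constant scattering factors, and no displacement parameters, and all names are hypothetical. The nested loops make the cost proportional to (number of reflections) x (number of atoms), which is acceptable for a zeolite but not for a protein with thousands of atoms and tens of thousands of reflections.

```python
import cmath
import math


def direct_summation(miller_indices, sites_frac, scattering_factors):
    """Return F(hkl) for each Miller index by explicit summation over atoms.

    Assumptions (for illustration): P1 symmetry, constant scattering
    factors, no atomic displacement parameters.
    """
    two_pi_i = 2j * math.pi
    result = []
    for h, k, l in miller_indices:  # loop over reflections
        f_hkl = 0j
        for (x, y, z), f in zip(sites_frac, scattering_factors):  # loop over atoms
            f_hkl += f * cmath.exp(two_pi_i * (h * x + k * y + l * z))
        result.append(f_hkl)
    return result


# Two identical atoms related by inversion: F(hkl) is real and equals
# 2 * cos(2 * pi * (h*x + k*y + l*z)).
sites = [(0.1, 0.2, 0.3), (-0.1, -0.2, -0.3)]
f_calc = direct_summation([(0, 0, 0), (1, 0, 0)], sites, [1.0, 1.0])
```

Here `f_calc[0]` is the sum of the scattering factors and `f_calc[1]` is real because of the inversion-related pair.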

Two main challenges
• Challenge 1:
  • Input/output of powder-specific format
  • Fundamentally trivial but potentially tedious
  • New command?
    • No interference with existing, non-trivial algorithms for automatic recognition, processing, and consolidation of already very heterogeneous inputs
  • Extend the existing input algorithms?
    • Nicer, but requires higher degree of collaboration
• Challenge 2:
  • Development of a powder-specific target function
  • Based on extracted intensities or primary pattern + pre-fitted profile parameters?
  • Maximum likelihood with or without cross-validation?
  • Will probably require some refactoring of the refinement engine
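To make the "extracted intensities" option concrete, here is one possible form such a target could take. This is an assumption for illustration, not a target that exists in phenix.refine or cctbx: reflections that overlap in the powder pattern are collected into groups, the calculated quantity is the multiplicity-weighted sum of |F_calc|^2 over each group, and a least-squares residual with an analytically determined overall scale is formed. All function and variable names are hypothetical.

```python
def group_intensities(f_calc_abs, multiplicities, overlap_groups):
    """Sum m * |F_calc|^2 over the reflections in each overlap group."""
    return [
        sum(multiplicities[i] * f_calc_abs[i] ** 2 for i in group)
        for group in overlap_groups
    ]


def least_squares_scale(i_obs, i_calc):
    """Scale k that minimizes sum((i_obs - k * i_calc)^2)."""
    numerator = sum(o * c for o, c in zip(i_obs, i_calc))
    denominator = sum(c * c for c in i_calc)
    return numerator / denominator


def powder_ls_target(i_obs, f_calc_abs, multiplicities, overlap_groups):
    """Normalized least-squares residual on overlap-group intensities.

    Returns (target, scale). Unit weights are assumed for simplicity.
    """
    i_calc = group_intensities(f_calc_abs, multiplicities, overlap_groups)
    k = least_squares_scale(i_obs, i_calc)
    residual = sum((o - k * c) ** 2 for o, c in zip(i_obs, i_calc))
    normalization = sum(o * o for o in i_obs)
    return residual / normalization, k


# Three reflections; the last two overlap and contribute to one observation.
f_calc_abs = [1.0, 2.0, 3.0]
multiplicities = [6, 12, 8]
overlap_groups = [[0], [1, 2]]
i_obs = [12.0, 240.0]  # made-up values, exactly twice the calculated sums
target, scale = powder_ls_target(i_obs, f_calc_abs, multiplicities, overlap_groups)
```

A maximum-likelihood variant, or a target on the primary pattern with pre-fitted profile parameters, would replace `powder_ls_target` while keeping the grouping step.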

Modular design
• Application level
  • phenix wizards (data in, structure out)
  • phenix.refine
  • phenix.hyss (hybrid substructure search)
  • Visible source
• Library level
  • cctbx project, organized in modules
    • libtbx, scitbx, cctbx, iotbx, mmtbx
  • cctbx is intended to cover small-molecule work
    • But nothing yet specific to powders
  • Unrestricted open source

Existing target functions
• Least-squares (variety)
• Maximum likelihood on amplitudes
• Maximum likelihood with experimental phases
• Least-squares twin target
• SAD-specific maximum likelihood target implemented in Phaser
  • Reusing target from external application!
• Dirty laundry
  • Severe code duplication in implementation of twin target
    • Needs to be consolidated
  • Some friction integrating the Phaser ML-SAD target
    • Phaser target relatively slow: we need better bookkeeping to avoid repeated calculations with exactly the same input
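The "better bookkeeping" point can be illustrated with a small caching wrapper. This is only a sketch of the general idea and does not describe how phenix.refine or Phaser handle it; the class name and the stand-in target are hypothetical. The wrapper remembers the most recent input and returns the stored value when the same input is seen again.

```python
class CachedTarget:
    """Wrap an expensive target function and remember the last result."""

    def __init__(self, target_function):
        self._target_function = target_function
        self._last_key = None
        self._last_value = None
        self.n_evaluations = 0  # counts real (uncached) evaluations

    def __call__(self, parameters):
        key = tuple(parameters)
        if key != self._last_key:
            # Input changed: recompute and store.
            self._last_value = self._target_function(parameters)
            self._last_key = key
            self.n_evaluations += 1
        return self._last_value


def slow_target(parameters):
    """Stand-in for an expensive target evaluation."""
    return sum(p * p for p in parameters)


cached = CachedTarget(slow_target)
first = cached([1.0, 2.0])
second = cached([1.0, 2.0])  # same input: served from the cache
third = cached([1.0, 3.0])   # new input: recomputed
```

After these three calls `cached.n_evaluations` is 2, since the second call reuses the stored value.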

Precedent for reusing cctbx?
• cctbx used heavily by all phenix collaborators
• Phaser uses cctbx -> cctbx supported by CCP4 6.0 and up
• smtbx: small-molecule toolbox
  • Group at Durham University, U.K. collaborating with David Watkin at Oxford University, U.K.
  • Long-term goal: highly integrated single-crystal structure determination (direct methods), automatic model building and refinement
  • Initial focus: iterative model building and refinement
  • Initial approach: reuse + adjust cctbx core libraries directly, combined with copying sub-modules to smtbx where they are modified
  • Long term: consolidate duplications as much as possible
    • half the code = half the bugs, reuse of optimizations

Summary of ideas
• Implement powder-specific target function(s) that plug into the refinement engine in the open source cctbx libraries
  • Can be done stand-alone using ad-hoc input/output methods
  • Collaborate in making the necessary adjustments to the existing libraries
• Figure out the best way to handle input/output at the application level
• Learn and re-evaluate as we go
• If the powder field joins in there will be the potential for direct cross-fertilization between three specializations in crystallography
  • Single-crystal protein
  • Single-crystal small-molecule
  • Powder diffraction protein
  • More? (powder diffraction small-molecule)
  • cctbx libraries are very general
• Ever increasing integration is the secret behind the stunning successes in the development of computing technology
  • Can we make this idea work in crystallography?
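The idea of a target function that "plugs into" a refinement engine can be sketched as follows. This is a toy illustration under stated assumptions, not the cctbx or phenix.refine interface: a target is taken to be any callable that returns a value and its gradient, and the "engine" is a plain steepest-descent loop. The quadratic functions are stand-ins for real crystallographic targets, and all names are hypothetical.

```python
def make_quadratic_target(center):
    """Build a stand-in target with its minimum at `center`."""
    def target(x):
        value = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        gradient = [2.0 * (xi - ci) for xi, ci in zip(x, center)]
        return value, gradient
    return target


def minimize(target, x0, step=0.1, n_steps=200):
    """Toy engine: steepest descent that only needs (value, gradient)."""
    x = list(x0)
    for _ in range(n_steps):
        _, gradient = target(x)
        x = [xi - step * gi for xi, gi in zip(x, gradient)]
    return x


# Two different targets run through the same, unchanged engine.
single_crystal_like = make_quadratic_target([1.0, 2.0])
powder_like = make_quadratic_target([-1.0, 0.5])
x_a = minimize(single_crystal_like, [0.0, 0.0])
x_b = minimize(powder_like, [0.0, 0.0])
```

The design point is that the engine never inspects what kind of data the target was built from, so a powder-specific target only has to honor the same calling convention.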

Availability
• Phenix incl. Graphical User Interface
  • http://www.phenix-online.org/
  • Freely available to academic (non-profit) groups
• Core libraries (cctbx)
  • http://cctbx.sourceforge.net/
  • Freely available to all

Acknowledgments
• Phenix developers
  • P.D. Adams
  • P. Afonine
  • T.R. Ioerger
  • A.J. McCoy
  • E.W. McKee
  • N.W. Moriarty
  • R.J. Read
  • N.K. Sauter
  • J.N. Smith
  • L.C. Storoni
  • T.C. Terwilliger
  • P.H. Zwart
• Funding:
  • LBNL (DE-AC03-76SF00098)
  • NIH/NIGMS (1P01GM063210)
  • PHENIX Industrial Consortium
