
Likelihood and automation in Phaser

Understand the concept of likelihood, its application in crystallography, and automation in the Phaser program. Explore how likelihood measures the consistency of a model with the data, how it is used to optimise model parameters, and its use in molecular replacement and in eliminating unknown phases.

dgoodloe

Presentation Transcript


  1. Likelihood and automation in Phaser R J Read, Department of Haematology, Cambridge Institute for Medical Research

  2. Likelihood and automation in Phaser • Likelihood • background, use in crystallography • Molecular replacement • SAD phasing • log-likelihood-gradient maps • SAD phasing from partial model • bootstrapping from MR solution • iterative phasing

  3. Concept of likelihood • Likelihood with dice • Roll 2, 3, 1, 1. Which die: 4-, 6-, 8- or 10-sided? • $p(4) = 1/4^4 = 1/256$ • $p(6) = 1/6^4 = 1/1296$ • $p(8) = 1/8^4 = 1/4096$ • $p(10) = 1/10^4 = 1/10000$ • (a 3-sided die would give $1/3^4 = 1/81$)
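The dice comparison can be reproduced in a few lines of Python. This is a minimal sketch that assumes fair dice; `die_likelihood` is a name introduced here for illustration. A roll that cannot occur on a given die (for example a 5 on a 4-sided die) gets likelihood zero, so the 4-sided die wins only because no roll exceeds 4.

```python
from fractions import Fraction

def die_likelihood(rolls, sides):
    """Probability of observing `rolls` on a fair die with `sides` faces."""
    if any(r < 1 or r > sides for r in rolls):
        return Fraction(0)  # at least one roll is impossible on this die
    return Fraction(1, sides) ** len(rolls)

rolls = [2, 3, 1, 1]
likelihoods = {n: die_likelihood(rolls, n) for n in (4, 6, 8, 10)}
for n, p in likelihoods.items():
    print(f"p({n}) = {p}")

best = max(likelihoods, key=likelihoods.get)
print("most likely die:", best)
```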

  4. Principle of maximum likelihood • How consistent is the model with the data? • What is the probability that the data would be measured if the model were correct? • Optimise model by adjusting parameters in probability distribution • parameters include variances (sources of error)

  5. Illustration of likelihood • Random data with Gaussian distribution • Mean? Variance? • Model parameters: mean = m, variance = s

  6. Illustration of likelihood (panels: m = 4, s = 1; m = 6, s = 1)

  7. Illustration of likelihood (panels: m = 5, s = 0.5; m = 5, s = 2)

  8. Illustration of likelihood (panel: m = 5, s = 1)
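The comparison across these panels amounts to evaluating the Gaussian log-likelihood for each choice of m and s and keeping the largest. The sketch below does this for a small hypothetical sample (not the data plotted on the slides), treating s as the variance as on slide 5; the closed-form maximum-likelihood estimates give the same answer.

```python
import math

def gaussian_log_likelihood(data, m, s):
    """Log-likelihood of `data` under a Gaussian with mean m and variance s."""
    n = len(data)
    ss = sum((x - m) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * s) - ss / (2 * s)

# Hypothetical sample, chosen so that its mean is 5 and its variance is 1.
data = [3.5, 4.5, 5.0, 5.5, 6.5]

candidates = [(4, 1), (6, 1), (5, 0.5), (5, 2), (5, 1)]
scores = {c: gaussian_log_likelihood(data, *c) for c in candidates}
for (m, s), ll in scores.items():
    print(f"m={m}, s={s}: log L = {ll:.3f}")
best = max(scores, key=scores.get)
print("best (m, s):", best)

# Closed-form maximum-likelihood estimates for comparison.
m_hat = sum(data) / len(data)
s_hat = sum((x - m_hat) ** 2 for x in data) / len(data)  # divides by n, not n - 1
```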

  9. Least squares and likelihood • Most experiments have multiple sources of error: Gaussian error in observations • Central Limit Theorem • Likelihood for Gaussians = least squares

  10. Least-squares line fitting [figure]
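For Gaussian errors of known, equal size σ, the log-likelihood is a constant minus (sum of squared residuals)/(2σ²), so the least-squares line is also the maximum-likelihood line. The sketch below fits a line to hypothetical measurements with the normal equations and checks that perturbing the fit lowers the likelihood; `fit_line` and `log_likelihood` are illustrative names.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x via the normal equations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def log_likelihood(xs, ys, a, b, sigma=1.0):
    """Gaussian log-likelihood of the residuals, with known error sigma."""
    ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    n = len(xs)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - ss / (2 * sigma ** 2)

# Hypothetical measurements scattered around y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

a, b = fit_line(xs, ys)
ll_best = log_likelihood(xs, ys, a, b)
# Any other line has a larger sum of squares and hence a lower likelihood.
ll_other = log_likelihood(xs, ys, a + 0.1, b - 0.05)
print(f"a = {a:.3f}, b = {b:.3f}; log L fit = {ll_best:.3f}, perturbed = {ll_other:.3f}")
```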

  11. Why not least squares in crystallography? • Gaussian error for observations • Error in predicting an observation generally includes the difference between structure factors • this is Gaussian in the phased difference • e.g. F vs. FC from model, FP vs. FPH • Phased error usually dominates • eliminating the unknown phase changes the probabilities

  12. Applying likelihood to crystallography • Find probability distribution for observations • start from structure factor probabilities • eliminate unknown phase angles • Adjust parameters to optimise likelihood Applications: • calculating model phase probabilities • structure refinement • experimental phasing (isomorphous/anomalous) • likelihood-based molecular replacement

  13. The Central Limit Theorem • Probability distribution of a sum of independent random variables tends to be Gaussian • regardless of distributions of variables in sum • Conditions: • sufficient number of independent random variables • none may dominate the distribution • Centroid (mean) of Gaussian is sum of centroids • Variance of Gaussian is sum of variances
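A quick simulation shows the theorem at work: summing twelve uniform random numbers (each far from Gaussian) gives a distribution whose mean and variance are the sums of the individual means (1/2) and variances (1/12), and whose spread matches a Gaussian. The sample sizes and seed below are arbitrary choices for the sketch.

```python
import math
import random
import statistics

random.seed(1)

N_TERMS = 12      # number of independent variables in each sum
N_TRIALS = 20000  # number of simulated sums

# Each term is uniform on [0, 1): mean 1/2, variance 1/12, clearly non-Gaussian.
sums = [sum(random.random() for _ in range(N_TERMS)) for _ in range(N_TRIALS)]

mean_sum = statistics.fmean(sums)
var_sum = statistics.pvariance(sums)
print(f"mean {mean_sum:.3f} (sum of means {N_TERMS * 0.5})")
print(f"variance {var_sum:.3f} (sum of variances {N_TERMS / 12})")

# For a Gaussian, about 68.3% of values lie within one standard deviation of the mean.
sd = math.sqrt(var_sum)
within = sum(abs(x - mean_sum) < sd for x in sums) / N_TRIALS
print(f"fraction within 1 sd: {within:.3f}")
```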

  14. Effect of atomic errors • Atomic errors give a “boomerang” distribution of possible atomic contributions • Portion of atomic contribution is correct [figure: Bragg planes]

  15. Structure factor with coordinate errors • Same direction as the sum of the atomic f (FC) • but shorter by a factor D, 0 < D < 1 • D = f(resolution) • Central Limit Theorem • Gaussian distribution for the total summed F, centred on DFC • σΔ = f(resolution) [figure: FC, DFC, F, σΔ]
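Why D depends on resolution can be seen with a small Monte Carlo sketch. Assuming Gaussian coordinate errors with standard deviation σ along the diffraction vector (0.5 Å here, a hypothetical value), the average of an atom's phase factor exp(2πi s·Δx) is exp(−2π²σ²s²) with s = 1/d, so the expected contribution shrinks at higher resolution. This is a Luzzati-style illustration only; in practice D and σΔ are refined against the data rather than computed from an assumed error.

```python
import math
import random

random.seed(0)

SIGMA_X = 0.5  # hypothetical r.m.s. coordinate error along the diffraction vector (Å)

def simulated_D(d_spacing, sigma_x, n=100000):
    """Monte Carlo average of an atom's phase factor under Gaussian coordinate errors."""
    s = 1.0 / d_spacing
    total = 0.0
    for _ in range(n):
        dx = random.gauss(0.0, sigma_x)
        # Real part of exp(2*pi*i*s*dx); the imaginary part averages to zero.
        total += math.cos(2 * math.pi * s * dx)
    return total / n

def predicted_D(d_spacing, sigma_x):
    """Analytical average exp(-2 pi^2 sigma^2 s^2) for Gaussian errors."""
    s = 1.0 / d_spacing
    return math.exp(-2 * math.pi ** 2 * sigma_x ** 2 * s ** 2)

results = {d: (simulated_D(d, SIGMA_X), predicted_D(d, SIGMA_X)) for d in (6.0, 3.0, 2.0)}
for d, (sim, pred) in results.items():
    print(f"d = {d} Å: simulated D = {sim:.3f}, predicted D = {pred:.3f}")
```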

  16. Amplitude probability distribution • Integrate over unknown phase angle to get Rice (Luzzati, Sim, Srinivasan) distribution
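The Rice distribution can be checked numerically by integrating the two-dimensional complex Gaussian (centred on DFC, variance σΔ²) over the unknown phase and comparing with the closed form (2F/σΔ²) exp(−(F² + D²FC²)/σΔ²) I0(2F·DFC/σΔ²). In the sketch, `var` stands for σΔ² (any expected-intensity factor ε is omitted), the reflection is assumed acentric, and the function names are illustrative.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0 from its power series (fine for moderate x)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2) ** 2 / k ** 2
        total += term
    return total

def rice_pdf(F, DFc, var):
    """Closed-form acentric Rice density of amplitude F, given D*|Fc| and variance var."""
    return (2 * F / var) * math.exp(-(F ** 2 + DFc ** 2) / var) * bessel_i0(2 * F * DFc / var)

def rice_by_phase_integration(F, DFc, var, n=720):
    """Integrate the 2D complex Gaussian over the unknown phase alpha (midpoint rule)."""
    h = 2 * math.pi / n
    total = 0.0
    for j in range(n):
        alpha = (j + 0.5) * h
        # |F e^{i alpha} - D Fc|^2, taking the phase of D*Fc as zero
        dist2 = F ** 2 + DFc ** 2 - 2 * F * DFc * math.cos(alpha)
        # F is the polar-coordinate Jacobian; 1/(pi var) normalises the complex Gaussian
        total += F * math.exp(-dist2 / var) / (math.pi * var) * h
    return total

direct = rice_pdf(1.0, 0.8, 0.5)
integrated = rice_by_phase_integration(1.0, 0.8, 0.5)
print(direct, integrated)

# The amplitude density should integrate to 1 over F >= 0.
step = 0.001
norm = sum(rice_pdf((i + 0.5) * step, 0.8, 0.5) * step for i in range(6000))
print(norm)
```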

  17. Rotation likelihood function • What structure factors could be obtained from an oriented model? • add up contributions from symmetry-related molecules, but unknown relative phase
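The effect of unknown relative phases can be illustrated on its own: if the contributions of symmetry-related copies are added with independent random phases, the cross terms average out and the expected intensity is the sum of the individual intensities. The amplitudes below are hypothetical, and this sketch shows only that averaging step, not Phaser's rotation likelihood function itself.

```python
import cmath
import math
import random

random.seed(2)

amplitudes = [3.0, 1.5, 2.0]  # hypothetical |F| of three symmetry-related contributions
n_trials = 100000

total = 0.0
for _ in range(n_trials):
    # Give each contribution an independent, uniformly distributed phase.
    F = sum(a * cmath.exp(1j * random.uniform(0.0, 2 * math.pi)) for a in amplitudes)
    total += abs(F) ** 2

mean_intensity = total / n_trials
expected = sum(a ** 2 for a in amplitudes)  # cross terms average to zero
print(f"mean |F|^2 = {mean_intensity:.2f}, sum of |F_j|^2 = {expected:.2f}")
```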

  18. Likelihood-based molecular replacement • Molecular replacement likelihood functions • account for expected coordinate error in model • account for missing components • exploit knowledge from partial solution • More sensitive than previous methods • succeeds with more distant homologues • succeeds with more components to find

  19. Likelihood and automation • Automated decisions require reliable scores • Likelihood provides semi-absolute score • compare different models against same data • likelihood should increase for better model • more accurate, more complete or more detailed
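One way to make the score semi-absolute, assumed here, is to report a log-likelihood gain (LLG): the log-likelihood of the data given the model minus the log-likelihood under a random-atom (Wilson) model. The sketch uses normalised amplitudes for hypothetical acentric reflections (so D = σA and σΔ² = 1 − σA²), simulates two models of different quality against the same data, and scores both with the Rice function. σA is plugged in at the value used in the simulation rather than refined, and all names are illustrative.

```python
import math
import random

random.seed(3)

def log_i0(x):
    """log of the modified Bessel function I0, from its power series (moderate x only)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2) ** 2 / k ** 2
        total += term
    return math.log(total)

def complex_gauss():
    """Complex Gaussian with E|z|^2 = 1, as for a normalised acentric E value."""
    return complex(random.gauss(0.0, math.sqrt(0.5)), random.gauss(0.0, math.sqrt(0.5)))

def llg(e_obs, e_calc, sigma_a):
    """Sum over reflections of ln p_Rice(|Eo|; sigma_a |Ec|) - ln p_Wilson(|Eo|)."""
    var = 1.0 - sigma_a ** 2          # sigma_Delta^2 for normalised data
    total = 0.0
    for eo, ec in zip(e_obs, e_calc):
        dec = sigma_a * abs(ec)       # D |Fc| with D = sigma_a
        ln_rice = (math.log(2 * eo / var) - (eo ** 2 + dec ** 2) / var
                   + log_i0(2 * eo * dec / var))
        ln_wilson = math.log(2 * eo) - eo ** 2
        total += ln_rice - ln_wilson
    return total

# Hypothetical "true" structure: 2000 acentric reflections; only amplitudes are observed.
true_E = [complex_gauss() for _ in range(2000)]
e_obs = [abs(e) for e in true_E]

def make_model(sigma_a):
    """Simulated model E values whose correlation with the truth is sigma_a."""
    return [sigma_a * e + math.sqrt(1 - sigma_a ** 2) * complex_gauss() for e in true_E]

llg_good = llg(e_obs, make_model(0.8), 0.8)
llg_poor = llg(e_obs, make_model(0.4), 0.4)
print(f"LLG, better model: {llg_good:.1f}; LLG, poorer model: {llg_poor:.1f}")
```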

  20. Programming for automation • Phaser developed in C++ • Different modes of operation • modes can call other modes • Functions exported to Python • run Phaser from Python scripts that can use functionality from other packages • e.g. AutoMR wizard in Phenix

  21. [Flowchart of the automated molecular replacement search: selected data with anisotropy correction; fast rotation and fast translation functions for the 1st model and for 2nd and subsequent models (loop over models), each followed by RF or TF peak selection criteria and packing criteria; best RF and TF solutions for the 1st model; loop over space groups; refinement and phasing; best solutions for the complete structure; anisotropy correction with all data; output for the best space group as .pdb, .mtz and .sol files]
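The peak selection steps in this search need a rule for how many rotation or translation peaks to carry forward. The sketch below assumes a simple criterion, keeping peaks that lie at least a given percentage of the way from the mean score to the top score; Phaser's actual criteria and defaults may differ, and `select_peaks` is a hypothetical name.

```python
def select_peaks(scores, percent=75.0):
    """Keep scores at least `percent` of the way from the mean score to the top score."""
    mean = sum(scores) / len(scores)
    top = max(scores)
    cutoff = mean + (percent / 100.0) * (top - mean)
    return sorted((s for s in scores if s >= cutoff), reverse=True)

rf_scores = [10.0, 20.0, 96.0, 100.0]  # hypothetical rotation-function peak heights
print(select_peaks(rf_scores))         # the two high peaks survive
print(select_peaks(rf_scores, 100.0))  # only the top peak
```

Raising the percentage trades sensitivity for speed: fewer peaks go on to the expensive translation and packing stages.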

  22. A31P mutant of ROP: four helix bundle • Originally solved by 23-dimensional Monte Carlo search with four copies of poly-Ala helix • space group C2 • helix = 15% of protein • Glykos & Kokkinidis (2003) • Can be solved in minutes by Phaser

  23. [Flowchart: ROP solution with four poly-Ala helices] Data to 2.9 Å • Anisotropy 15.4 Å² • Solutions at each stage (* = best): • Helix 1: RF/TF 24 (12*), packing 3 (1*), refined 3 (1*) • Helix 2: RF/TF 307 (283*), packing 68 (64*), refined 24 (2*) • Helix 3: RF/TF 6 (1*), packing 6 (1*), refined 6 (1*) • Helix 4: RF/TF 32 (20*), packing 22 (17*), refined 8 (1*) • Output: .pdb files, .mtz files

  24. Pushing the limits of molecular replacement • Investigate the use of smaller fragments • helices, subdomains • Extend the limits of homology (David Baker) • use ab initio models from Rosetta • (Qian et al., Nature 450: 259-264, 2007) • improve homology modeling before MR • increase convergence radius for refinement after MR • pilot project: angiotensinogen • Apply concepts to NMR structure solution (Ernest Laue)

  25. Likelihood-based SAD phasing • Conventional SAD phasing uses a least-squares term • New SAD likelihood function developed using multivariate statistics

  26. SAD likelihood function • Fix structure factors calculated from model • Factor joint probability into two parts • Integrate out unknown phases, α+ and α−
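Written out, the factorisation on this slide has the following structure. This is a generic sketch rather than the exact Phaser SAD function: F+ and F− are the complex structure factors with phases α+ and α−, both densities are conditional on the fixed model structure factors (dropped from the notation), and the factor |F+||F−| is the Jacobian for changing from complex values to amplitude and phase. The likelihood is this density evaluated at the observed amplitudes.

```latex
p\left(|F^+|, |F^-|\right)
  = \int_0^{2\pi}\!\!\int_0^{2\pi}
    p\left(F^-\right)\, p\left(F^+ \mid F^-\right)\,
    |F^+|\,|F^-| \; d\alpha^+ \, d\alpha^-
```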

  27. Intuitive understanding of SAD phasing • Expected value of F−* (H−*) • Expected difference between F+ and F−*

  28. Intuitive understanding of SAD phasing • Expected difference between F+ and F−* • Expected value of F−* (H−*) • Total likelihood is integral of the product of the two distributions under the black circle

  29. Absolute scaling • SAD target uses real (partial structure) scattering and anomalous scattering • best results if f″ known precisely • helps to have data on absolute scale • use BEST data from Sasha Popov • average intensities as a function of resolution • get Wilson B-factor, absolute scale • have to define composition of crystal
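The step of averaging intensities as a function of resolution can be sketched with a plain Wilson plot, where ln(⟨I⟩/Σf²) = ln k − 2B s² and s = sin θ/λ. The approach on this slide uses the BEST curve of average protein intensities instead of this idealised Wilson model, which fits poorly at low resolution. The bins below are hypothetical and noise-free, so the fit simply recovers the k and B used to generate them; `wilson_fit` is an illustrative name.

```python
import math

def wilson_fit(s2, mean_I, sum_f2):
    """Fit ln(<I>/sum f^2) = ln k - 2 B s^2 by least squares; s2 = (sin(theta)/lambda)^2."""
    ys = [math.log(i / f) for i, f in zip(mean_I, sum_f2)]
    n = len(s2)
    mx, my = sum(s2) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(s2, ys))
             / sum((x - mx) ** 2 for x in s2))
    intercept = my - slope * mx
    B = -slope / 2                    # Wilson B-factor
    k = math.exp(intercept)           # overall scale factor
    return k, B

# Hypothetical resolution bins, generated with k = 2.0 and B = 30 Å^2.
s2 = [0.02, 0.04, 0.06, 0.08, 0.10]
sum_f2 = [1000.0, 950.0, 900.0, 870.0, 850.0]   # hypothetical sum of f^2 per bin
mean_I = [2.0 * f * math.exp(-2 * 30.0 * x) for x, f in zip(s2, sum_f2)]

k, B = wilson_fit(s2, mean_I, sum_f2)
print(f"k = {k:.3f}, B = {B:.2f}")
```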

  30. Breakdown of Friedel’s law • Friedel’s law breaks down for a mixture of scatterers differing in real:anomalous ratio • SAD target can distinguish the hand for a model with a mixture of scattering types
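A one-dimensional sketch shows the breakdown. If every atom has the same anomalous:real ratio, the whole structure factor is a common complex multiple of a normal one, so |F(+h)| = |F(−h)|; mixing in a scatterer with a different ratio breaks the equality. Positions and scattering factors are hypothetical, and scattering factors are treated as independent of resolution.

```python
import cmath
import math

def structure_factor(atoms, h):
    """1D structure factor sum_j f_j exp(2 pi i h x_j); f_j = (f0 + f') + i f'' may be complex."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

# (scattering factor, fractional coordinate) pairs; all values hypothetical.
same_ratio = [(6 + 1.5j, 0.1), (20 + 5j, 0.3)]  # both atoms: f''/(f0 + f') = 0.25
mixed = [(6 + 0j, 0.1), (20 + 5j, 0.3)]          # ratios 0 and 0.25

results = {}
for name, atoms in (("same ratio", same_ratio), ("mixed", mixed)):
    f_plus = abs(structure_factor(atoms, 1))
    f_minus = abs(structure_factor(atoms, -1))
    results[name] = (f_plus, f_minus)
    print(f"{name}: |F(+h)| = {f_plus:.3f}, |F(-h)| = {f_minus:.3f}")
```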

  31. SAD log-likelihood gradient (LLG) map • Compute derivative of log-likelihood with respect to heavy-atom structure factor • opposite phase shifts for plus and minus hands • Fourier transform gives map of where likelihood target would like to see changes in anomalous scatterer model • Very sensitive to minor sites • picks up sites identified as water molecules in refined structures determined by halide soaks
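The map-from-gradient idea can be illustrated with a toy one-dimensional example. It swaps in a simple Gaussian target, −Σ|Fo − Fc|²/σ², whose gradient with respect to Fc* is proportional to Fo − Fc, and it assumes the "observed" phases are known, which is exactly what SAD data do not provide; with the real SAD likelihood the map coefficients would come from the derivative of that function instead. Transforming the gradient to real space puts a peak at the site missing from the model. All names and values are hypothetical.

```python
import cmath
import math

H = 10          # hypothetical resolution limit: Miller indices -H..H
GRID = 100      # map sampling points on the unit cell [0, 1)

def structure_factors(sites):
    """1D point-atom structure factors F(h) = sum_j exp(2 pi i h x_j)."""
    return {h: sum(cmath.exp(2j * math.pi * h * x) for x in sites) for h in range(-H, H + 1)}

def gradient_map(f_obs, f_calc, sigma2=1.0):
    """Fourier transform of the gradient coefficients (Fo - Fc)/sigma2 of the toy target."""
    coeffs = {h: (f_obs[h] - f_calc[h]) / sigma2 for h in f_obs}
    return [sum(c * cmath.exp(-2j * math.pi * h * i / GRID) for h, c in coeffs.items()).real
            for i in range(GRID)]

true_sites = [0.10, 0.35, 0.70]
model_sites = [0.10, 0.35]          # partial model: the site at 0.70 is missing

rho = gradient_map(structure_factors(true_sites), structure_factors(model_sites))
peak = max(range(GRID), key=lambda i: rho[i])
print("highest peak at x =", peak / GRID)
```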

  32. Locating anomalous scatterers in model solved by MR • Structure of thyroxine-binding globulin • Thyroxine does not bind in the accepted site • only 2.8 Å resolution, but thyroxine contains 4 iodine atoms • data collected at Daresbury SRS with λ = 0.979 Å • f″ ≈ 3 e • Compare conventional model-phased anomalous difference map with Phaser LLG map

  33. [Figure: mol 1 and mol 2; anomalous difference (Δano) map at 3.5σ; LLG map at 5.5σ]

  34. Iterative model-building with SAD • Nitrate reductase structure • integral membrane protein, 1976 residues • contains 21 Fe atoms, 1 Mo, 113 S • solved by Natalie Strynadka, using combination of Fe-MAD, MIRAS • Fe peak SAD data • find 11 “Fe” sites with phenix.hyss • several are super-sites of Fe4S4 clusters • phase and complete adding Fe with Phaser • total of 38 sites, some of which are S atoms • still ghosts of super-sites

  35. Round 1 of iterative model-building • Improve phases by density modification • Build with ARP/wARP (Resolve also works…) • 798 residues, 18 docked in sequence • LLG completion in Phaser, using partial polyAla model • Fe sites are now perfectly resolved

  36. Convergence of iterative model-building • LLG maps are better than random at identifying atom type • resolve any ambiguities by refined occupancy • Converges after 5 cycles • anomalous scatterer model from Phaser has 21 Fe, 1 Mo, 84 of 113 S • 1392/1976 residues, 731 docked in sequence • Could do better by preserving anomalous scatterers in refined models, refining against SAD likelihood target

  37. Automation of SAD phasing • Functions are all available from Python • part of AutoSolve wizard in Phenix • could run directly from HySS • will be a refinement target for phenix.refine • can run from HAPPy (CCP4) • Log-likelihood-gradient completion • look for one or several types of scatterer • start from MR model or partial substructure • analyse map to add sites, make atoms anisotropic • delete atoms that fade away • repeat to convergence
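The completion loop in the last group of bullets can be sketched as follows, using a toy one-dimensional gradient map (Gaussian target, phases assumed known) in place of the SAD log-likelihood gradient. Two further simplifications are hypothetical: sites live on grid points, and "atoms that fade away" are detected as sites sitting in a deep negative trough rather than by refining occupancies. The cutoff of 60% of a full single-atom peak is arbitrary.

```python
import cmath
import math

H, GRID = 10, 100                 # Miller indices -H..H; map sampled on GRID points
THRESHOLD = 0.6 * (2 * H + 1)     # hypothetical cutoff: 60% of a full single-atom peak

def structure_factors(sites):
    """Point-atom structure factors for sites given as grid indices."""
    return {h: sum(cmath.exp(2j * math.pi * h * i / GRID) for i in sites)
            for h in range(-H, H + 1)}

def gradient_map(f_obs, sites):
    """Toy gradient map: Fourier transform of (Fo - Fc), phases assumed known."""
    f_calc = structure_factors(sites)
    return [sum((f_obs[h] - f_calc[h]) * cmath.exp(-2j * math.pi * h * x / GRID)
                for h in f_obs).real
            for x in range(GRID)]

def complete(f_obs, sites, max_cycles=20):
    """Add the strongest positive peak and delete the site in the deepest trough, to convergence."""
    sites = sorted(sites)
    for _ in range(max_cycles):
        rho = gradient_map(f_obs, sites)
        changed = False
        top = max(range(GRID), key=lambda x: rho[x])
        if rho[top] > THRESHOLD and top not in sites:
            sites.append(top)
            changed = True
        worst = min(sites, key=lambda x: rho[x], default=None)
        if worst is not None and rho[worst] < -THRESHOLD:
            sites.remove(worst)       # a site that "fades away"
            changed = True
        sites.sort()
        if not changed:               # converged: nothing to add or delete
            break
    return sites

f_obs = structure_factors([10, 35, 70])   # hypothetical true substructure
print(complete(f_obs, [10, 50]))          # start: one correct site, one spurious
```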

  38. Future plans for experimental phasing • Account for translational NCS • Bi-wavelength anomalous diffraction (BAD) phasing • MIRAS • Account for radiation damage

  39. Contributors • Molecular replacement • Airlie McCoy, Laurent Storoni • SAD phasing • Raj Pannu, Airlie McCoy, Laurent Storoni • BEST data • Sasha Popov • ccp4i GUI • Anne Baker, Peter Briggs • PHENIX collaboration • Ralf Grosse-Kunstleve, Nigel Moriarty, Paul Adams • Tom Terwilliger (Wizards)

  40. Sponsors • Wellcome Trust • crystallographic theory and methods • structures of proteins relevant to pathogenesis • NIH • PHENIX package for automated crystallography • Paul Adams, Tom Terwilliger, David & Jane Richardson • implementation of likelihood-based methods • CCP4 • GUI development for Beast and Phaser
