360 likes | 383 Views
Explore the significance of phases, correlation in Fourier space, and fitting strategies into cryo-EM maps using Refmac program. Learn about available tools and useful websites for macromolecular model building and refinement.
E N D
Refinement against cryo-EM maps Garib Murshudov MRC-LMB, Cambridge, UK
Contents Importance of phases Fit into EM maps: likelihood and restraints Overfitting Effect of oversharpening Form factors Brown et al, (2015) Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions. ActaCryst. D71, 136-153 Murshudov (2016) Refinement of Atomic Structures Against cryo-EM Maps, Methods in Enzymology. Ed. Crowter, V579, 277-306 Nicholls et al (2018) Current approaches for the fitting and refinement of atomic models into cryo-EM maps using CCP-EM . Acta Cryst. D74, 492-505
Useful websites (related to us and related to atomic model building and refinement) ccpem: http://www.ccpem.ac.uk/ ccpem website. In future all our developments about EM will be distributed on this website. They also develop GUI for EM tools. coot: http://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/ Coot website by Paul Emsley. All coot related tools are available from this website refmac: http://www2.mrc-lmb.cam.ac.uk/groups/murshudov/ You can find some useful info on this website. Some of the tools are still available from this website. Ccp4 is the better site for refmac related info.
Importance of phases It is well known that phases are very important. It is one of the examples demonstrating it. It may be not very good example but nevertheless … Can we show in terms of numbers the meaning of this statement?
Importance of phases Correlation between two densities: Correlation can be calculated in real or Fourier space. Advantage of Fourier space is that we can calculate it in Fourier shells or resolution bins. <F> = 0 in all resolution bins apart from that containing the origin. We almost never need the origin. Working in Fourier space means that we can analyse the quality of maps depending on resolution (or frequency). We also will use this fact: var(F) = <|F|2> So correlation in resolution bin for Fourier coefficients (complex coefficients) is
Importance of phases Using definition of E values: we can write an expression for Fourier Shell Correlation (which is same as map vs map correlation if all Fourier components are used) Now let us consider several cases: • Phases are perfect, amplitudes are perfect then cor=1.0 • Phases are random, amplitudes are perfect then cor=0
Importance of phases • Phases are perfect, amplitudes are random (i.e. not related to the structure we are solving) Now if we use the properties of the distribution of |F| then we get: i.e. correlation becomes: If phases are perfect then correlation between true and observed electron densities with random amplitudes can be as high as 0.785. If we do know that amplitudes are random then we can replace them with constant value equal to the the expected value of the amplitudes on this resolution bin. E value of a constant equal to 1. So in this case correlation will be: One obvious conclusion is that if we have phase information then unobserved E values can be replaced with the expected value. It is how “free-lunch” algorithm works
Importance of phases: amplitudes also matter Red – cos(Δφ) Cyan – corr(|F1|,|F2| If we take phases and replace amplitudes with random numbers then map correlations is reduced. This reduction is difference between black diagonal line and red line cor(F1,F2): complex
About REFMAC Refmac is a program for refinement of atomic models into experimental data It was originally designed for Macromolecular Crystallography It is based on some elements of Bayesian statistics: it tries to fit chemically and structurally consistent atomic models into the data. It also can fit atomic models into cryo-EM maps. It can do some manipulation of maps, e.g. sharpening/blurring It is available from CCP4 and CCPEM
Current approach to fitting into cryo-EM map • Mask the map around molecule to be refined • If there are more than one maps then use them to generated “composite” map • Calculate structure factors from masked map • Generate reference structure restraints • Generate RNA/DNA related restraints • Use local symmetry (helical, icosahedral or others) • Refine coordinates using masked map • Monitor refinement behaviour using average FSC • Put the molecule into the original map coordinate frame This procedure works but it is not entirely satisfactory. At the moment we do not use error model of “observed” data, we do not calculate difference maps, we do not calculate “improved” maps.
+ = Data Atomic model Fit and refine We want to fit currently available model into the data and calculate differences between them. To do this fit properly we must use as much as possible information about model and data.
Likelihood function for cryo-EM maps Probability distribution of “observed” structure factors given coordinates of a molecule: Fo “observed” structure factors calculated from EM map Fc calculated structure factors – from coordinates variance of signal variance of noise N normalisation Major difference from crystallography: 1) variance of noise is large at high resolution; 2) complex structure factors are available
Restraints: prior knowledge There are several restraints types: • Standard bond lengths, angles etc. • Restraints against high resolution homologues • Jelly body restraints • Secondary structure restraints • Covalent links • Base pairs and parallel planes • B value restraints
Overfitting What leads to overfitting? Insufficient data (low resolution, partial occupancy) Ignoring data (cutting by resolution) Sub-optimal parameterisation Bad weights Excess of imagination
Reference map sharpening Cross validation requires maps to be on the same sharpening level. Refmac can sharpen given map to a reference map. Usually reference map is taken as full reconstruction map. Fourier transformation (structure factor) from reference map is calculated and then average values of modules of structure factors in resolution shells are calculated. For given map average values of modules of Fourier coefficients in resolution bins scaled to reference maps coefficients
Validation of overfitting Not overfittedOverfitted FSCwork= model refined against half map 1, compared to half map 1 FSCfree= model refined against half map 1, compared to half map 2
Monitoring fit to density: FSCaverage • Measure of fit to density (analogous to crystallographic Rfactor) • Used to follow the progress of refinement • FSCaverage avoids a dependence on weight • FSC is calculated over resolution shells • If shells are sufficiently narrow the weights are roughly the same within each shell
FSC between model and map As a basic rule one can use relationship between half data fsc and map correlation between observed and calculated maps to see how far fitting into the map can go, or if there is an overfitting. Fo – Fourier coefficients from observed maps Fc – Fourier coefficients from calculated maps FT – Fourier coefficients from “true” maps fsc – half data Fourier shell correlation Note that these formulas are valid under certain conditions: noise is random, independent of signal, two halves are independent,
FSC between model and map If the model is perfect then cor(FT,Fc) = 1 and then relationship becomes even simpler When fsc = 0.143 then cor(Fo,Fc) = 0.5 Therefore it is recommended to use fsc = 0.143 as an indicator of resolution. In this case signal and noise variances are equal Correlation between observed and calculated map should be below the red curve. If it is higher than red curve then model has been over-fitted
Meaning of average FSC Average fsc could be interpreted as: If average fsc would be 1 then the right side would be equal to 1. Essentially average fsc means that how much the effective resolution is reduced. We can calculate the effective resolution as: I.e. average fsc indicates how much the resolution of the data is reduced. Average fsc between map and model would mean what is the extend of common signal between map and model.
Map details, resolution and sharpening The paper by Wlodawer, Li, Dauter (2017) High-Resolution Cryo-EM maps and models: A Crystallographer’s Perspective” Structure, 25, 1-9 hightlights the problem of lack of clarity in definition of resolution. The same problem exist in crystallography: if we go to very low signal to noise ratio then how much do we gain? Do we add anything useful to the resolvability of the peaks.
Map details, resolution and sharpening Map details depend on the nominal resolution (i.e. sphere in Fourier space within which Fourier coefficients have been observed), mobility of the molecules (signal level) and noise in the map. In cryoEM reconstructions we can expect that noise level is more or less constant within the molecule. Nominal resolution is also constant for whole map. However signal level changes over the molecule (different mobility, occupancy). One side effect is that when the same sharpening is used over whole map noise may mask out signal where signal is relatively small.
<|F|> vs resolution Map from EMDB Blurred with B value 40 and 60 Bartesaghi et al, Science, 2015
Map from PDB Blurred: B = 60 Model refined against blurred map
Form factors for electrons Electrons diffract from electrostatic potential. Charges and electrostatic potential are related with each other via Poisson equation in real space and Mott-Bethe formula in reciprocal space: Atoms in molecule have certain B values. They need to be accounted for. Using Moth-Bethe formula means that we can calculate contribution from nuclear and electron separately (two centrehydrogens can be accounted for)
Electron scattering factor In principle using electron diffraction should allow refinement of charges, but data must be very accurate (we can use local neutrality restraints) Where o – overall occupancy, c – charge occupancy (charge occupancy refinement has not been implemented yet) Screening of charges might need to be accounted for. : For total Fourier coefficient (with total screening coefficient, I have implemented it but not tested):
Form factors Form factors have been tabulated for X-ray scattering. For tabulations isolated atoms are used. They do not account for bonding electrons effect. Peng et al converted X-ray form factors to electron form factors using Moth-Bethe formula – as sum of Gaussians. We need to update form factors. We need to calculate form factors for each atom (chemical) type. For example sp1, sp2, sp3 carbons should have different form factors.
Some additional features of refmac Calculate multiple sharpened and blurred maps Divide and conquer algorithm for large molecules Vector difference map (for ligand detection) Refinement against composite
Conclusions Cross-validation is a problem: refinement against half data maps might be useful Phases are important, however amplitudes do matter also Oversharpening can obscure features of the map. Multiple maps with different sharpening levels should be used for model building. Form factor for electron scattering must be improved
Acknowledgements MRC-LMB CCP-EM Fei Long Tom Burnley Rob Nicholls Martyn Winn Oleg Kovalevskiy Paul Emsley Michal Tykach RanganaWarshamanage Alan Brown Richard Henderson VenkiRamakrishnan Jan Löwe Many, many EM users and people of LMB People of CCP4 Users of our programs