Model Building, Refinement, and Validation
What can one see? • this determines what can be ascertained • and which parameters can be refined • it is resolution-dependent • note about maps: • contoured in standard deviations (σ) from the mean (which is 0.0) • experimental-type maps contoured at 1-1.25σ • difference maps contoured at ~±3σ
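A minimal Python sketch of σ-based contouring, assuming the map is available as a flat list of grid values (a simplification; real maps are 3D grids read from map files). The function names are hypothetical. It normalizes a map to mean 0 and σ = 1 and picks difference-map points beyond ±3σ.

```python
import statistics

def normalize_map(density):
    """Rescale map values so the mean is 0.0 and the standard deviation is 1.0 (σ units)."""
    mean = statistics.fmean(density)
    sd = statistics.pstdev(density)
    return [(v - mean) / sd for v in density]

def sigma_to_absolute(density, n_sigma):
    """Absolute density value corresponding to a contour level of n_sigma."""
    return statistics.fmean(density) + n_sigma * statistics.pstdev(density)

def difference_peaks(density, n_sigma=3.0):
    """Indices of grid points at or beyond +n_sigma (positive) and -n_sigma (negative)."""
    z = normalize_map(density)
    positive = [i for i, v in enumerate(z) if v >= n_sigma]
    negative = [i for i, v in enumerate(z) if v <= -n_sigma]
    return positive, negative

# Toy "difference map": flat background with one strong positive feature.
diff_map = [0.0] * 99 + [10.0]
print(difference_peaks(diff_map))  # ([99], [])
```

Contouring in σ units makes thresholds comparable between maps whose absolute scales differ.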
Fitting a Model into Density • start by tracing the backbone • α-helices are easiest to identify; β-sheets are harder • some loops might be untraceable until later in the process (or never) • side chains come later • check known rotamers first • many rounds of rebuilding might be necessary
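"Check known rotamers first" amounts to comparing side-chain torsions with a library of preferred values. A minimal Python sketch for χ1 only, assuming three idealized staggered positions labeled p (+60°), t (180°) and m (−60°); real rotamer libraries are residue-specific and cover all χ angles, and the names here are hypothetical.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

# Idealized staggered chi1 positions (an assumption, not a real library).
CHI1_ROTAMERS = {"p": 60.0, "t": 180.0, "m": -60.0}

def nearest_rotamer(chi1, library=CHI1_ROTAMERS):
    """Return (rotamer name, deviation in degrees) of the closest library chi1 value."""
    name = min(library, key=lambda k: angular_difference(chi1, library[k]))
    return name, angular_difference(chi1, library[name])

print(nearest_rotamer(-170.0))  # ('t', 10.0)
```

The wrap-around handling matters: −170° is only 10° from 180°.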
How does this Process Progress? • the first build is usually mostly backbone, with some side chains • cycles of building and refinement until the model ceases to improve • waters, ions, and ligands are typically added towards the end of the process
Refinement • based on what I see, what can I refine?
Constraints and Restraints • a constraint fixes a parameter exactly; a restraint only penalizes deviations from a target value • used to overcome a poor data-to-parameter ratio • atoms are not "free", which improves convergence • geometric restraints: • ideal bond lengths and angles in protein structures are well known from small-molecule x-ray crystallography • penalize excessive deviations from these values • planarity restraints for rings and planar end groups
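A minimal Python sketch of the two ideas above, with hypothetical function names and illustrative numbers: a harmonic bond-length restraint, Σ((d − d0)/σ)², and an observation-to-parameter ratio in which restraints count as extra observations (assuming 4 refined parameters per atom: x, y, z, B).

```python
def bond_restraint_penalty(bonds):
    """Harmonic restraint: sum of ((d - d0) / sigma)^2 over (d, d0, sigma) tuples, lengths in Å."""
    return sum(((d - d0) / sigma) ** 2 for d, d0, sigma in bonds)

def observation_to_parameter_ratio(n_reflections, n_atoms, n_restraints=0, params_per_atom=4):
    """(reflections + restraints) / refined parameters."""
    return (n_reflections + n_restraints) / (n_atoms * params_per_atom)

# Hypothetical numbers, for illustration only.
print(bond_restraint_penalty([(1.36, 1.33, 0.015)]))      # ~4.0: a 2-sigma deviation
print(observation_to_parameter_ratio(10000, 2000))        # 1.25
print(observation_to_parameter_ratio(10000, 2000, 6000))  # 2.0
```

Squaring the σ-scaled deviation means a bond 2σ off costs four times as much as one 1σ off.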
Non-crystallographic Symmetry • restraint: the copies of the molecule in the asymmetric unit must agree with each other to within a user-defined all-atom rmsd • constraint: all copies in the asymmetric unit must be identical • are the molecules identical? • strong density in the NCS-averaged map is a clue: features shared by all copies survive averaging, while features present in only one copy are weakened • e.g. (1.5 + 1.8)/2 = 1.65; (1.5 + 0.0)/2 = 0.75
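A minimal Python sketch of NCS averaging at equivalent grid points and of the rmsd between two copies. It assumes the copies are already superposed and their atoms matched one-to-one (real programs first determine the NCS operators); function names are hypothetical.

```python
import math

def ncs_average(values):
    """Average density at NCS-equivalent grid points (one value per copy)."""
    return sum(values) / len(values)

def rmsd(coords_a, coords_b):
    """RMSD (Å) between two superposed copies with matching atom order."""
    if len(coords_a) != len(coords_b):
        raise ValueError("copies must contain the same atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

print(ncs_average([1.5, 1.8]))  # ~1.65: density present in both copies
print(ncs_average([1.5, 0.0]))  # 0.75: density present in only one copy
```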
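B: The Temperature Factor • describes the mean-square displacement of an atom from its average position: $B = 8\pi^2\langle u^2\rangle$ • higher B = more mobile = less well ordered • $\langle u_x^2\rangle = \langle u_y^2\rangle = \langle u_z^2\rangle$ if B is isotropic (one parameter per atom); a symmetric 3x3 matrix (six parameters per atom) is needed if B is anisotropic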
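A small Python sketch converting an isotropic B into an rms displacement, using B = 8π²⟨u²⟩ with ⟨u²⟩ the mean-square displacement along one direction. Function names are hypothetical.

```python
import math

def b_to_rms_displacement(b):
    """RMS displacement (Å) along one direction for an isotropic B (Å^2)."""
    return math.sqrt(b / (8.0 * math.pi ** 2))

def rms_displacement_to_b(u_rms):
    """Inverse conversion: B (Å^2) from an rms displacement (Å)."""
    return 8.0 * math.pi ** 2 * u_rms ** 2

print(round(b_to_rms_displacement(20.0), 2))  # 0.5
```

A B of 20 Å² therefore corresponds to roughly 0.5 Å of rms motion.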
Evaluation of Refinement • R-factors (Rwork and Rfree): $R = \sum_{hkl}\big||F_o| - k|F_c|\big| \,/\, \sum_{hkl}|F_o|$ • Rfree is calculated exactly like Rwork, but over a subset of the data (5-10%) that is never used in refinement • if the model really improves, Rfree should decrease along with Rwork
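A minimal Python sketch of R = Σ| |Fo| − k|Fc| | / Σ|Fo|, evaluated separately on the working set and the free (test) set. The flag list `is_free` and the amplitudes are hypothetical; the scale k is fixed here, whereas refinement programs fit it.

```python
def r_factor(f_obs, f_calc, k=1.0):
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|) over the given reflections."""
    num = sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc))
    return num / sum(f_obs)

def r_work_free(f_obs, f_calc, is_free, k=1.0):
    """Split reflections by their free flag and return (Rwork, Rfree)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], k)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], k)
    return r_work, r_free

# Hypothetical amplitudes; the last reflection belongs to the free set.
print(r_work_free([100, 200, 300, 400], [90, 210, 300, 360], [False, False, False, True]))
```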
Rules of Thumb for Rwork and Rfree • depends on resolution • for most structures, Rfree should be less than ~28%, and the spread between Rwork and Rfree should be ~5% or less • very low resolution structures (3.5 Å or worse) might not conform to this • be careful not to overstate conclusions
Why are R-factors so High? • geometric restraints are not sophisticated enough to find the true minimum (at the true minimum, R should approach the error in the data, i.e. 4-9%) • R-factors are higher at lower resolution because there are fewer data but the same number of geometric parameters to be satisfied • the result is numerous very small errors (0.01-0.1 Å) in coordinate positions
Mechanics of Refinement • perturb x, y, z, and B so that Fobs and Fcalc come into maximal agreement • the old way: least-squares minimization • assuming errors follow a Gaussian distribution, the quantity minimized takes the form $\Phi = \sum_j \frac{(x_j - \hat{x}_j)^2}{\sigma_j^2}$, where $\hat{x}_j$ is the predicted value of $x_j$ and $\sigma_j$ is the standard deviation of the measurement $x_j$
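More Least Squares • the general form for refinement would be: $\Phi = \sum_{hkl}\frac{(|F_o| - k|F_c|)^2}{\sigma_{hkl}^2}$ • add geometric restraints, and in practice: $\Phi = w_a\sum_{hkl}(|F_o| - k|F_c|)^2 + \sum_{\text{restraints}}\frac{(d - d_0)^2}{\sigma_d^2}$, where $w_a$ balances the x-ray and geometry terms • the real-space equivalent of the x-ray part is: $\sum_{xyz}\left(\rho_o(xyz) - \rho_c(xyz)\right)^2$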
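A minimal Python sketch of the practical least-squares target Φ = w_a Σ(|Fo| − k|Fc|)² + Σ((d − d0)/σ_d)², plus the real-space residual Σ(ρo − ρc)². Only bond-length restraints are included; the function names, weight and numbers are hypothetical.

```python
def xray_lsq_term(f_obs, f_calc, k):
    """Unweighted x-ray term: sum of (|Fo| - k|Fc|)^2."""
    return sum((fo - k * fc) ** 2 for fo, fc in zip(f_obs, f_calc))

def geometry_term(bonds):
    """Restraint term: sum of ((d - d0) / sigma_d)^2 over (d, d0, sigma_d) tuples."""
    return sum(((d - d0) / s) ** 2 for d, d0, s in bonds)

def total_target(f_obs, f_calc, k, bonds, w_a):
    """Phi = w_a * x-ray term + geometry term."""
    return w_a * xray_lsq_term(f_obs, f_calc, k) + geometry_term(bonds)

def real_space_term(rho_obs, rho_calc):
    """Real-space residual: sum over grid points of (rho_o - rho_c)^2."""
    return sum((ro - rc) ** 2 for ro, rc in zip(rho_obs, rho_calc))

print(total_target([10, 20], [9, 22], 1.0, [(1.36, 1.33, 0.015)], w_a=0.5))  # ~6.5
```

Raising w_a lets the data pull the model further from ideal geometry; lowering it enforces geometry more strongly.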
Still More Least Squares • two ways to minimize Φ: • improve the model • introduce systematic errors that obliterate difference density • note that the σ² weighting term is eliminated in practice, because it was empirically shown to converge poorly • a sign that least squares is not appropriate • higher-resolution data also have to be introduced gradually, later in refinement
Why not Least Squares? • model phases are treated as error-free • model completeness is not taken into account • this leads to bias towards the existing model • all measurements are treated as having equal information content • i.e. an F with F/σ = 50 is weighted the same as an F with F/σ = 2 • additional phase information is not easy to incorporate
Maximum Likelihood to the Rescue • if we move an atom, how it is moved depends on the position of all other atoms • if they're not in the right place and we assume they are, our choice of move for the target atom will not place it correctly • we need an estimate of model accuracy and completeness to help guide this process • maximum likelihood allows us to explicitly account for errors in the model, completeness of model, errors in data • additional phase info easily incorporated
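Some Mathematical Background • in these slides "|" means "given", P is probability, and L is likelihood • assuming errors in the observations are independent, the joint probability is a product over reflections: $P(\text{obs}\,|\,\text{mod}) = \prod_{hkl} P(|F_o(hkl)|\,|\,\text{mod})$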
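Nasty Math Shown for Effect, not to be Fully Understood, let alone Memorized • need to know what the joint conditional probability of the observations given the current model, "P(obs|mod)", looks like • for a single reflection, ignoring measurement error: $P(|F_o|\,|\,\text{mod}) = \frac{2|F_o|}{\epsilon\sigma_\Delta^2}\exp\left(-\frac{|F_o|^2 + D^2|F_c|^2}{\epsilon\sigma_\Delta^2}\right) I_0\left(\frac{2|F_o|D|F_c|}{\epsilon\sigma_\Delta^2}\right)$, with $\sigma_\Delta^2 = \Sigma_Q + (1 - D^2)\Sigma_P$, ε the symmetry-dependent expected-intensity factor and $I_0$ the modified Bessel function of order zero; measurement error σ is usually folded in by inflating this variance • the above is for an acentric reflection • worked out in the '60s (i.e. before cable TV)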
Take-home Message from Previous Slide • $\Sigma_Q$ = amount of missing scattering matter • $\Sigma_P$ = scattering matter accounted for by the current model • D reflects errors in the current model • σ is the error of a given reflection • P(obs|mod) depends upon the magnitudes of Fo and Fc, the errors in Fo, the completeness of the current model, and the accuracy of the current model
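Maximum Likelihood • want to maximize: $L = P_0(\text{mod})\,P(\text{obs}\,|\,\text{mod})$ • equivalent to minimizing the negative logarithm (LLK is "log-likelihood"): $-\mathrm{LLK} = -\log P_0(\text{mod}) - \sum_{hkl}\log P(|F_o|\,|\,\text{mod})$ • $P_0(\text{mod})$ is our geometric restraints term ($-\log P_0(\text{mod})$ plays the role of the LSQ geometry term), and $-\sum_{hkl}\log P(|F_o|\,|\,\text{mod})$ replaces $\sum_{hkl}(|F_o| - k|F_c|)^2$ • cast in a similar form to LSQ, but with more complicated terms that reflect the complexity of the problem • less biased towards the model, data properly weighted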
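A minimal Python sketch of the x-ray part of −LLK for acentric reflections. It assumes the Rice-type probability P(|Fo| given |Fc|) = (2|Fo|/v) exp(−(|Fo|² + D²|Fc|²)/v) I0(2|Fo|D|Fc|/v) with v = εσ_Δ² and measurement error ignored. The Python standard library has no Bessel function, so log I0 is evaluated by its power series (with the leading asymptotic form for large arguments). Function names are hypothetical; real programs also treat centric reflections and measurement error.

```python
import math

def log_i0(x):
    """log I0(x) for x >= 0: power series, or the leading asymptotic form for large x."""
    if x > 500.0:
        return x - 0.5 * math.log(2.0 * math.pi * x)  # I0(x) ~ e^x / sqrt(2*pi*x)
    total, term, q = 1.0, 1.0, (x / 2.0) ** 2
    for k in range(1, 1000):
        term *= q / (k * k)  # term = (x/2)^(2k) / (k!)^2
        total += term
        if term < 1e-17 * total:
            break
    return math.log(total)

def acentric_nll(f_obs, f_calc, d, sigma_delta_sq, eps=1.0):
    """-log P(|Fo| given |Fc|) for one acentric reflection (requires f_obs > 0)."""
    v = eps * sigma_delta_sq
    log_p = (math.log(2.0 * f_obs / v)
             - (f_obs ** 2 + (d * f_calc) ** 2) / v
             + log_i0(2.0 * f_obs * d * f_calc / v))
    return -log_p

def xray_minus_llk(reflections, d, sigma_delta_sq):
    """Sum of -log P over (|Fo|, |Fc|) pairs: the x-ray part of -LLK."""
    return sum(acentric_nll(fo, fc, d, sigma_delta_sq) for fo, fc in reflections)
```

With D = 0 (a model that carries no information), the expression reduces to the Wilson distribution (2|Fo|/v) exp(−|Fo|²/v).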
Additional Phase Information (e.g. MIR or MAD) • easily incorporated in maximum likelihood (unlike least squares) as experimental restraints in the refinement process: $P(|F_o|\,|\,\text{mod})$ becomes $P(|F_o|, \text{experimental phases}\,|\,\text{mod})$!
Aside: Scaling Fo to Fc • not as easy as you think • Fc is calculated essentially in vacuum, whereas the real crystal (the source of Fo) contains bulk solvent (i.e. not the ordered waters you can see) • bulk solvent tends to dampen low-resolution reflections (pun intended) • poor scaling can mess up refinement • in olden times, all reflections at lower resolution than 8 Å were excluded from refinement, despite the fact that they're the most accurately measured
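A minimal Python sketch of the simplest overall scale: the k that minimizes Σ(|Fo| − k|Fc|)², which has the closed form k = Σ|Fo||Fc| / Σ|Fc|². The optional cutoff mimics the old practice of dropping reflections at lower resolution than 8 Å. Function names and data are hypothetical; real programs also fit resolution-dependent (and anisotropic) terms.

```python
def lsq_scale(f_obs, f_calc):
    """Scale k minimizing sum (|Fo| - k|Fc|)^2."""
    return sum(fo * fc for fo, fc in zip(f_obs, f_calc)) / sum(fc * fc for fc in f_calc)

def lsq_scale_with_cutoff(f_obs, f_calc, d_spacings, d_max=8.0):
    """Same fit, ignoring reflections with d-spacing above d_max (Å)."""
    kept = [(fo, fc) for fo, fc, d in zip(f_obs, f_calc, d_spacings) if d <= d_max]
    return lsq_scale([fo for fo, _ in kept], [fc for _, fc in kept])

# Hypothetical data: the 20 Å reflection is damped by bulk solvent in Fo but not in Fc.
f_obs, f_calc, d = [5, 40, 60], [10, 20, 30], [20.0, 5.0, 3.0]
print(lsq_scale(f_obs, f_calc))                 # ~1.89: pulled down by the damped reflection
print(lsq_scale_with_cutoff(f_obs, f_calc, d))  # 2.0
```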
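Bulk Solvent Correction: Exponential Scaling Model • assumes $F_{\text{solvent}}$ and $F_{\text{model}}$ have exactly opposite phases (Babinet's principle), so $|F_{\text{calc}}| = |F_{\text{model}}|\left[1 - k_{sol}\exp\left(-B_{sol}s^2/4\right)\right]$ with $s = 1/d$ • only really true at low resolution (~15 Å and lower) • for k_sol = 0.75, B_sol = 200 Å²: • 15 Å reflection: $|F_{\text{model}}|$ scaled by ≈ 0.40 • 4 Å reflection: scaled by ≈ 0.97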
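A minimal Python sketch of the exponential (Babinet) bulk-solvent factor 1 − k_sol exp(−B_sol s²/4), with s = 1/d so that s²/4 = 1/(4d²). Parameter defaults follow the k_sol = 0.75, B_sol = 200 Å² example; the function name is hypothetical.

```python
import math

def exponential_solvent_factor(d, k_sol=0.75, b_sol=200.0):
    """Multiplier applied to |F_model| for a reflection with d-spacing d (Å)."""
    s_sq_over_4 = 1.0 / (4.0 * d * d)  # sin^2(theta)/lambda^2
    return 1.0 - k_sol * math.exp(-b_sol * s_sq_over_4)

for d in (15.0, 4.0):
    print(d, round(exponential_solvent_factor(d), 2))  # 15 Å: 0.4, 4 Å: 0.97
```

The correction is large at low resolution and nearly vanishes at medium resolution, which is why bulk solvent mainly affects the low-resolution data.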
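Bulk Solvent Correction: The Mask Model • mask out the protein, calculate structure factors $F_{\text{mask}}$ for everything outside the protein mask, and add them to the model: $F_{\text{total}} = F_{\text{model}} + k_{sol}\exp\left(-B_{sol}s^2/4\right)F_{\text{mask}}$ • no assumption about solvent phases • k_sol and B_sol determined by LSQ fit • mask also optimized • more robust than the exponential scaling model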
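A minimal Python sketch of the mask-model sum F_total = F_model + k_sol exp(−B_sol s²/4) F_mask using complex structure factors, so the solvent phase enters explicitly instead of being assumed opposite. Computing F_mask itself (FFT of a solvent mask) is outside this sketch; the function name is hypothetical.

```python
import cmath
import math

def total_structure_factor(f_model, f_mask, d, k_sol, b_sol):
    """Complex F_total for one reflection with d-spacing d (Å)."""
    return f_model + k_sol * math.exp(-b_sol / (4.0 * d * d)) * f_mask

f_model = cmath.rect(100.0, 0.3)  # hypothetical amplitude 100, phase 0.3 rad
# If F_mask happens to be exactly opposite to F_model, this reduces to the exponential model.
f_total = total_structure_factor(f_model, -f_model, 15.0, 0.75, 200.0)
print(round(abs(f_total), 1))  # 39.9
```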
Rebuilding • after a round of refinement, model phases should be markedly improved, and the need for rebuilding becomes evident • side chains added • loops built • waters/ions/ligands added • incorrectly built areas remodeled
The "2Fo-Fc" Map • is our approximation to:
Maximum Likelihood Maps • 2Fo-Fc type map: $(2m|F_o| - D|F_c|)\,e^{i\varphi_c}$ • m = figure of merit of model phases • D = weight reflecting errors in the model • difference map: $(m|F_o| - D|F_c|)\,e^{i\varphi_c}$
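A minimal Python sketch of the maximum-likelihood map coefficients (2m|Fo| − D|Fc|) exp(iφc) and (m|Fo| − D|Fc|) exp(iφc) for one acentric reflection, with φc taken from the complex model structure factor. The m and D values are hypothetical inputs (in practice they are estimated, e.g. from the free set), and the function name is hypothetical.

```python
import cmath

def ml_map_coefficients(f_obs, f_calc, m, d):
    """Return (2mFo-DFc, mFo-DFc) complex coefficients, both carrying the model phase."""
    phase_factor = cmath.exp(1j * cmath.phase(f_calc))
    fc = abs(f_calc)
    two_fo_fc = (2.0 * m * f_obs - d * fc) * phase_factor
    diff = (m * f_obs - d * fc) * phase_factor
    return two_fo_fc, diff

a, b = ml_map_coefficients(100.0, cmath.rect(80.0, 0.5), m=0.9, d=0.8)
print(round(abs(a), 6), round(abs(b), 6))  # 116.0 26.0
```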
Validation • most obvious validation is Rfree • SFCHECK checks structure against data • other methods are model-based • all involve comparing present structure to well-refined structures in a database • some deviations from "standard" parameters will be functionally and/or structurally necessary • others will be errors in building
Procheck • very thorough check of a variety of geometry-based criteria • Ramachandran plot • main chain bond lengths and angles • planarity of rings and end groups (R,D,N,E,Q) • torsion angles, chirality • close non-bonded interactions, main chain H-bonds, disulfide bond geometry • residue by residue analysis of most of the above
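Ramachandran and torsion checks start from dihedral angles: φ is defined by C(i−1), N, CA, C and ψ by N, CA, C, N(i+1). A minimal Python sketch of the standard dihedral calculation on plain (x, y, z) tuples; function names are hypothetical.

```python
import math

def _sub(a, b): return tuple(x - y for x, y in zip(a, b))
def _dot(a, b): return sum(x * y for x, y in zip(a, b))
def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond (IUPAC sign convention)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / n for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # 90.0
```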
Errat • analyzes statistics of non-bonded interactions between different atom types • highlights unusual regions, giving "confidence level" that a region is in error • anything above the 99% confidence level in most cases needs to be rebuilt
Verify3D • 3D-1D profile analysis of structure versus its own sequence • if residue is in an unusual chemical environment, it will receive a bad score and should be inspected • environment defined by: • area of residue buried • fraction covered by polar atoms • local secondary structure
PROVE • analyzes departures from standard atomic volumes • presented as "Z-score" or RMS(Z-score): >3 BAD! ≥2 BAD!
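A minimal Python sketch of Z-scores and an RMS Z-score for atomic volumes, flagging values beyond the ">3" threshold above. The reference mean and standard deviation are hypothetical inputs standing in for PROVE's database statistics, and this is not PROVE's actual implementation.

```python
import math

def z_scores(volumes, ref_mean, ref_sd):
    """Z = (V - reference mean) / reference standard deviation, per atom."""
    return [(v - ref_mean) / ref_sd for v in volumes]

def rms_z(zs):
    """Root-mean-square of a list of Z-scores."""
    return math.sqrt(sum(z * z for z in zs) / len(zs))

def flag_outliers(zs, threshold=3.0):
    """Indices of atoms whose |Z| exceeds the threshold."""
    return [i for i, z in enumerate(zs) if abs(z) > threshold]

zs = z_scores([10.0, 12.0, 8.0, 17.0], ref_mean=10.0, ref_sd=2.0)  # hypothetical volumes
print(zs, flag_outliers(zs))  # [0.0, 1.0, -1.0, 3.5] [3]
```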