Exploring Symmetry, Outlier Detection & Twinning update Peter Zwart
Overview • Exploring metric symmetry • iotbx.explore_metric_symmetry • Outlier detection • mmtbx.remove_outliers • Twinning • mmtbx.twin_map_utils • Currently run as: cctbx.python $MMTBX_DIST/mmtbx/twinning/twin_map_utils.py
Exploring metric symmetry • Protein crystals grown under different conditions can exhibit drastic changes in symmetry and unit cell dimensions • Sometimes the crystal symmetries are related • The relation is not always obvious • Finding the relation between two unit cells is not always straightforward • Knowing the relations between the different crystal forms can be helpful during structure solution
Exploring metric symmetry • How to find relations between unit cells? • A sub-lattice formalism generates a family of related lattices from a given lattice • The number of unique unit cells that are N times larger than the original unit cell is quite small (Rutherford, Acta Cryst. (2006). A62, 93-97) • Unit cells of approximately equal volume can be compared by checking a large number of unimodular transforms • Ralf's work
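A brute-force sketch of the second idea: two cells of similar volume may describe the same lattice if some integer matrix M with det(M) = ±1 maps the metric tensor of one onto the other (G' = M G Mᵀ). The search over matrix entries in {-1, 0, 1}, the 2% tolerance and all function names are assumptions for illustration; this is not the algorithm implemented in cctbx.

```python
import itertools
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Metric tensor G (G_ij = a_i . a_j) from cell edges and angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [[a * a, a * b * cg, a * c * cb],
            [a * b * cg, b * b, b * c * ca],
            [a * c * cb, b * c * ca, c * c]]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def transform_metric(m, g):
    """G' = M G M^T: metric of the new basis a'_i = sum_j M_ij a_j."""
    mg = [[sum(m[i][k] * g[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    return [[sum(mg[i][k] * m[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


def find_unimodular_matches(g1, g2, tol=0.02, max_entry=1):
    """Yield integer M with det(M) = +/-1 and M G1 M^T close to G2.

    Assumed tolerance: every element may differ by tol * (largest |G2| element).
    """
    scale = max(abs(x) for row in g2 for x in row)
    entries = range(-max_entry, max_entry + 1)
    for flat in itertools.product(entries, repeat=9):
        m = [list(flat[0:3]), list(flat[3:6]), list(flat[6:9])]
        if abs(det3(m)) != 1:
            continue  # not unimodular: would change the cell volume
        gt = transform_metric(m, g1)
        if all(abs(gt[i][j] - g2[i][j]) <= tol * scale
               for i in range(3) for j in range(3)):
            yield m


# The same lattice in two settings: (a, b, c) and (a + b, b, c)
g_a = metric_tensor(10.0, 12.0, 15.0, 90.0, 90.0, 90.0)
g_b = transform_metric([[1, 1, 0], [0, 1, 0], [0, 0, 1]], g_a)
for m in find_unimodular_matches(g_a, g_b):
    print(m)
```

Brute force is only practical for small entries; a real tool would reduce both cells first so that the relating matrix has small elements.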
Exploring metric symmetry • Sub-lattice? • Given all lattice points, ignore some of them while ensuring that the remaining points still form a regular lattice
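The sub-lattices of a given index N can be listed exhaustively, because each one has exactly one integer basis in Hermite normal form with determinant N. The sketch below (function name assumed) produces these raw counts; how many of them are distinct once lattice symmetry is taken into account is a separate question.

```python
def sublattices(n):
    """All sublattices of index n of a 3D lattice, as Hermite normal forms.

    Each matrix H (rows = new basis vectors in terms of the old basis) is
        [[a, b, d],
         [0, c, e],
         [0, 0, f]]  with a*c*f == n, 0 <= b < c, 0 <= d < f, 0 <= e < f.
    Every sublattice of index n has exactly one basis of this form.
    """
    forms = []
    for a in range(1, n + 1):
        if n % a:
            continue
        for c in range(1, n // a + 1):
            if (n // a) % c:
                continue
            f = n // (a * c)
            for b in range(c):
                for d in range(f):
                    for e in range(f):
                        forms.append(((a, b, d), (0, c, e), (0, 0, f)))
    return forms


for n in (2, 3, 4):
    print(n, len(sublattices(n)))
# 2 7
# 3 13
# 4 35
```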
Exploring metric symmetry • Examples (a, b, c in Å; α, β, γ in °) • Native: P212121 61.8 97.7 148.9 90 90 90 • SeMet1: P21 115.5 149.0 115.6 90 115 90 • SeMet2: C2221 123.6 195.4 148.9 90 90 90 • Poulsen et al. (2001). Acta Cryst. D57, 1251-1259.
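A quick arithmetic check on these examples is to compare primitive-cell volumes, dividing the conventional volume of the C-centred cell by its two lattice points. The script is plain Python, not part of iotbx; the centring-point counts are the standard values for P and C lattices.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume (A^3) from cell edges (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


# Cells from the slide, with lattice points per conventional cell (P = 1, C = 2)
crystal_forms = {
    "Native (P212121)": ((61.8, 97.7, 148.9, 90, 90, 90), 1),
    "SeMet1 (P21)": ((115.5, 149.0, 115.6, 90, 115, 90), 1),
    "SeMet2 (C2221)": ((123.6, 195.4, 148.9, 90, 90, 90), 2),
}

v_native = cell_volume(*crystal_forms["Native (P212121)"][0])
for name, (cell, points) in crystal_forms.items():
    v_prim = cell_volume(*cell) / points  # primitive-cell volume
    print(f"{name:18s} V_prim = {v_prim:10.0f} A^3  ratio = {v_prim / v_native:.3f}")
```

The primitive volumes come out in a ratio of about 1 : 2 : 2, which hints that both SeMet forms are index-2 relatives of the native lattice; volume alone does not prove the relation.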
Exploring metric symmetry • Future • Provide reindexing methods between related unit cells • Would make molecular replacement of related structures easier • Useful for multi-crystal averaging • Obtain non-merohedral twin laws from these analyses
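Reindexing is listed here as future work, but the arithmetic underneath is simple: if the new basis vectors are integer combinations of the old ones, a'_i = Σ_j M_ij a_j, then Miller indices transform with the same matrix, h'_i = Σ_j M_ij h_j. The matrix below (a' = 2a, b' = 2b, c' = c between the native and SeMet2 cells) is only an assumption suggested by the cell edges; the function name is hypothetical.

```python
def reindex(hkl, m):
    """Miller indices in the new basis a'_i = sum_j m[i][j] * a_j.

    Miller indices transform like the basis vectors: h'_i = sum_j m[i][j] * h_j.
    """
    return tuple(sum(m[i][j] * hkl[j] for j in range(3)) for i in range(3))


# Hypothetical relation suggested by the cell edges: a' = 2a, b' = 2b, c' = c
native_to_semet2 = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]
for hkl in [(1, 0, 0), (1, 1, 3), (0, 3, 2)]:
    h2 = reindex(hkl, native_to_semet2)
    # With a and b doubled, h' and k' are always even, so h' + k' is even,
    # as required for reflections of a C-centred cell
    print(hkl, "->", h2, "h'+k' even:", (h2[0] + h2[1]) % 2 == 0)
```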
Outlier detection • Outliers can have a detrimental effect on the progress of structure solution and refinement • Read, Acta Cryst. (1999). D55, 1759-1764 • Outlier detection should use all available information • Use model information if you can • One would like the flexibility to correct mistakes made earlier • Reflections with E-values larger than 5 could have been valid observations!
Outlier detection • What is an outlier? • A data point that does not fit a model because of an abnormal situation, such as an erroneous measurement • How to spot them? • If Fobs is not reconcilable with Fcalc, Fobs might be an outlier • Reconcilable? • Fobs should be explainable from Fcalc given the current quality of the model (σA)
Outlier detection • Model-based outlier detection is done in a way similar to the method described by Read (Acta Cryst. (1999). D55, 1759-1764) • Fobs and Fcalc are normalized to get Eobs & Ecalc • σA is estimated for each reflection • Standard likelihood techniques are combined with kernel methods to obtain smoothly varying estimates • Find : • Compute :
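The formulas for the "Find" and "Compute" steps did not survive in this copy of the slides. As a hedged stand-in, the sketch below uses the standard conditional distribution of |Eobs| given |Ecalc| and σA (Rice for acentric, folded normal for centric reflections, centred on σA·|Ecalc| with variance 1 − σA²), finds its mode, and takes Q = 2[ln P(Emode) − ln P(Eobs)], a likelihood-ratio statistic. That definition of Q and all function names are assumptions, not necessarily what mmtbx computes.

```python
import math


def log_i0(x):
    """ln I0(x) for x >= 0: power series, asymptotic expansion for large x."""
    if x > 50.0:
        t = 1.0 / (8.0 * x)
        return (x - 0.5 * math.log(2.0 * math.pi * x)
                + math.log1p(t + 4.5 * t * t + 37.5 * t ** 3))
    q, term, total, k = 0.25 * x * x, 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return math.log(total)


def log_p_acentric(e_obs, e_calc, sigma_a):
    """ln P(|Eobs| given |Ecalc|, sigmaA) for acentric reflections (Rice)."""
    v, m = 1.0 - sigma_a ** 2, sigma_a * e_calc
    return (math.log(2.0 * e_obs / v) - (e_obs ** 2 + m ** 2) / v
            + log_i0(2.0 * e_obs * m / v))


def log_p_centric(e_obs, e_calc, sigma_a):
    """ln P(|Eobs| given |Ecalc|, sigmaA) for centric reflections (folded normal)."""
    v, m = 1.0 - sigma_a ** 2, sigma_a * e_calc
    y = e_obs * m / v
    log_cosh = y + math.log1p(math.exp(-2.0 * y)) - math.log(2.0)
    return (0.5 * math.log(2.0 / (math.pi * v))
            - (e_obs ** 2 + m ** 2) / (2.0 * v) + log_cosh)


def golden_max(f, lo, hi, tol=1e-9):
    """Maximise a unimodal function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
        else:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
    return 0.5 * (lo + hi)


def q_statistic(e_obs, e_calc, sigma_a, centric):
    """Assumed likelihood-ratio statistic Q = 2 [ln P(E_mode) - ln P(Eobs)]."""
    if not 0.0 <= sigma_a < 1.0:
        raise ValueError("sigmaA must lie in [0, 1)")
    log_p = log_p_centric if centric else log_p_acentric

    def f(e):
        return log_p(e, e_calc, sigma_a)

    e_mode = golden_max(f, 1e-9, sigma_a * e_calc + 10.0)
    e_obs = max(e_obs, 1e-9)  # ln P diverges at E = 0 for acentric reflections
    return max(0.0, 2.0 * (f(e_mode) - f(e_obs))), e_mode


# Example: sigmaA = 0.9, |Ecalc| = 1.0, acentric reflection
for e in (0.9, 2.0, 4.0):
    print(e, round(q_statistic(e, 1.0, 0.9, centric=False)[0], 2))
```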
Outlier detection • Q is approximately χ² distributed • Acceptable values of Q depend on the size of the dataset • In a large dataset, some large deviations are expected by chance • A p-value is computed for each reflection • The p-value is the probability that, if this particular Q value were the largest in the dataset, a Q value at least as large would be observed by chance • Observations with a p-value smaller than 5% are considered outliers
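The extreme-value p-value described above can be written as p_extreme = 1 − (1 − p)^N, where p is the single-reflection tail probability and N is the number of reflections. The sketch assumes 1 degree of freedom for centric and 2 for acentric reflections when turning Q into p; the slides only say that Q is approximately χ² distributed. Function names are hypothetical. For example, p = 1e-7 over 50,000 reflections gives p_extreme ≈ 5.0e-3.

```python
import math


def chi2_sf(q, dof):
    """Upper tail P(X >= q) of a chi-square distribution with 1 or 2 degrees of freedom."""
    if dof == 1:
        return math.erfc(math.sqrt(q / 2.0))
    if dof == 2:
        return math.exp(-q / 2.0)
    raise ValueError("this sketch only handles dof = 1 or 2")


def extreme_value_p(p_single, n):
    """P(largest of n independent statistics is at least this extreme) = 1 - (1 - p)^n.

    log1p/expm1 keep the result accurate when p_single is tiny.
    """
    if p_single >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-p_single))


def flag_outliers(q_values, dofs, level=0.05):
    """Indices of reflections whose extreme-value p-value is below `level`.

    dofs: assumed 1 for centric and 2 for acentric reflections.
    """
    n = len(q_values)
    return [i for i, (q, dof) in enumerate(zip(q_values, dofs))
            if extreme_value_p(chi2_sf(q, dof), n) < level]


print(extreme_value_p(1e-7, 50000))
```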
Outlier detection • Example: 1ty3 • Wilson statistics indicate 1 outlier, (25,6,-43): Eobs = 3.938, centric = True, p-wilson = 1.83E-07, p-extreme = 9.0E-03 • Model-based outlier detection indicates that (25,6,-43) is a valid observation
Outlier detection • The outlier detection algorithm is embedded in a class that caches the original observed data • This allows outlier detection to be repeated at different macro-cycles/rebuilding stages and the selection of outliers to be updated • Will be incorporated in phenix.refine at the appropriate juncture • Command line tool available
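The caching design can be illustrated with a small class. The class name, its method and the factor-of-two rejection rule in the usage example are hypothetical; the mmtbx class has its own interface and criterion. The point is that the original observations are never discarded, so a reflection rejected against an early, poor model can come back once the model improves.

```python
class CachedOutlierDetector:
    """Hypothetical sketch: cache the original data, redo rejection every cycle."""

    def __init__(self, f_obs):
        # Copy of the original observations (Miller index -> Fobs); never modified
        self._f_obs = dict(f_obs)

    def select(self, is_outlier):
        """Working set for the current macro-cycle.

        `is_outlier(hkl, fobs)` is evaluated against the current model, so a
        reflection rejected in an earlier cycle is restored if it now fits.
        """
        return {hkl: f for hkl, f in self._f_obs.items() if not is_outlier(hkl, f)}


def ratio_test(f_calc):
    """Hypothetical criterion: reject if Fobs/Fcalc lies outside [0.5, 2]."""
    return lambda hkl, f: not 0.5 <= f / f_calc[hkl] <= 2.0


detector = CachedOutlierDetector({(1, 0, 0): 100.0, (2, 1, 3): 40.0})
cycle1 = detector.select(ratio_test({(1, 0, 0): 95.0, (2, 1, 3): 10.0}))  # early model
cycle2 = detector.select(ratio_test({(1, 0, 0): 98.0, (2, 1, 3): 38.0}))  # improved model
print(sorted(cycle1), sorted(cycle2))
# [(1, 0, 0)] [(1, 0, 0), (2, 1, 3)]
```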
Twinning progress report • Routines available • Least-squares target functions • For both intensities and amplitudes • Target values and first derivatives • Detwinning • Standard and à la Sheldrick • R-values • Map coefficients • 2mFo-DFc & gradient maps • Bulk solvent scaling • Estimation of twin fraction, ksol, Bsol, U* and overall scale on twinned data • Using a global optimizer (differential evolution) for the moment
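For a two-domain (hemihedral) twin with twin fraction α, two of these routines have simple textbook forms: algebraic ("standard") detwinning, and a least-squares intensity target with its first derivative with respect to α. Function names, the weights and the overall scale k below are assumptions, not the mmtbx interface; Sheldrick-style detwinning is not shown.

```python
def detwin_pair(j1, j2, alpha):
    """Standard algebraic detwinning of a twin-related pair (hemihedral twin).

    Observed J1 = (1-a) I1 + a I2 and J2 = a I1 + (1-a) I2 are solved for I1, I2.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("algebraic detwinning needs 0 <= alpha < 0.5")
    d = 1.0 - 2.0 * alpha
    i1 = ((1.0 - alpha) * j1 - alpha * j2) / d
    i2 = ((1.0 - alpha) * j2 - alpha * j1) / d
    return i1, i2


def twin_ls_target(i_obs, i_calc, i_calc_twin, alpha, k=1.0, weights=None):
    """T = sum w (Iobs - k[(1-a) Ic(h) + a Ic(Th)])^2 and dT/d(alpha).

    i_calc_twin[n] is the calculated intensity of the twin-related index T(h_n).
    """
    if weights is None:
        weights = [1.0] * len(i_obs)
    target, d_alpha = 0.0, 0.0
    for w, io, ic, ict in zip(weights, i_obs, i_calc, i_calc_twin):
        resid = io - k * ((1.0 - alpha) * ic + alpha * ict)
        target += w * resid * resid
        d_alpha += -2.0 * w * resid * k * (ict - ic)
    return target, d_alpha


# Round trip: twin true intensities 100 and 20 with alpha = 0.3, then detwin
alpha = 0.3
j1 = (1 - alpha) * 100.0 + alpha * 20.0
j2 = alpha * 100.0 + (1 - alpha) * 20.0
print(detwin_pair(j1, j2, alpha))
```

Because the detwinned values are divided by 1 − 2α, errors are amplified as α approaches 0.5; for the 1eyx twin fraction of 0.47 that divisor is only 0.06.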
Twinning progress report • Bulk solvent scaling and detwinned map generation are available as a command line tool, mmtbx.twin_map_utils • Results similar to CNS • mmtbx.twin_map_utils should be seen as a first step toward full integration of twin utilities in phenix.refine
Twinning progress report • Figure panels: mmtbx.twin_map_utils / CNS
Twinning progress report • 1eyx: twin fraction = 0.47; difference maps contoured at 2.5 sigma • Ligands and waters deleted (10% of total model) • Figure panels: twinning not taken into account / twinning taken into account
Twinning progress report • Difference in 2mFo-DFc density is less striking • Figure panels: twinning not taken into account / twinning taken into account
Twinning progress report • Future plans • Likelihood based map coefficients • in collaboration with Randy Read • Incorporation of least squares targets in phenix.refine • Likelihood based targets • in collaboration with Randy Read
Funding: • LBNL (DE-AC03-76SF00098) • NIH/NIGMS (P01GM063210) • PHENIX Industrial Consortium
Acknowledgements • Cambridge: Randy Read, Airlie McCoy • Los Alamos: Tom Terwilliger, Li Wei Hung • Texas A&M University: Jim Sacchettini, Tom Ioerger, Eric McKee • Duke University: Jane Richardson, David Richardson • PHENIX Industrial Consortium: Robert Nolte, Eric Vogan • Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nick Sauter, Michael Hohn
Kernel methods • Discrete binning of X-ray data introduces discontinuous jumps in properties that vary continuously with resolution, such as • Mean intensity (normalisation) • The estimation of σA • Possible remedies: • Spline functions • Used extensively by K. Cowtan • Kernel methods
Kernel methods • Discrete binning assumes a constant value within each bin (resolution range)
Kernel methods • With kernel methods, the estimate at each position is based on the full dataset • The amount each datum contributes is determined by a weighting function (usually a function of the squared distance)
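A small comparison of the two approaches on synthetic data: a binned mean, which is constant within each bin, and a Nadaraya-Watson kernel estimate with Gaussian weights exp(−Δ²/2h²), where Δ is the distance in 1/d². The Gaussian kernel, the bandwidth h = 0.01 and the synthetic Wilson-like fall-off of ⟨I⟩ are assumptions; the slides do not specify the kernel used in cctbx.

```python
import math


def kernel_mean(x_values, y_values, x0, bandwidth):
    """Nadaraya-Watson estimate of <y> at x0 with Gaussian weights.

    Every datum contributes, weighted by exp(-(x - x0)^2 / (2 h^2)).
    """
    num = den = 0.0
    for x, y in zip(x_values, y_values):
        w = math.exp(-((x - x0) ** 2) / (2.0 * bandwidth ** 2))
        num += w * y
        den += w
    return num / den


def binned_mean(x_values, y_values, x0, edges):
    """Conventional estimate: mean of the bin containing x0 (piecewise constant)."""
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo <= x0 < hi:
            ys = [y for x, y in zip(x_values, y_values) if lo <= x < hi]
            return sum(ys) / len(ys)
    raise ValueError("x0 lies outside the binned range")


xs = [0.25 * i / 1000 for i in range(1000)]        # 1/d^2 values
ys = [1000.0 * math.exp(-10.0 * x) for x in xs]     # Wilson-like fall-off, B = 20 A^2 (assumed)
edges = [0.05 * i for i in range(6)]                # five bins in 1/d^2
for x0 in (0.0999, 0.1001):                         # just below / above a bin edge
    print(x0, binned_mean(xs, ys, x0, edges), kernel_mean(xs, ys, x0, 0.01))
```

Across the bin edge at 1/d² = 0.1 the binned estimate jumps by about 190 intensity units, while the kernel estimate changes by less than one.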
Kernel methods • Kernel method available for normalisation • Used by xtriage in intensity statistics • Kernel method available for σA estimation • Used in outlier detection
Kernel methods • Alpha determined from σA estimated with kernel methods is similar to the values obtained with the methods available in phenix.refine • Similar results for beta
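The conversion behind this comparison can be sketched with the standard relation between σA and the parameters of the likelihood target, D = σA·√(ΣN/ΣP) and σΔ² = ΣN(1 − σA²). Identifying phenix.refine's alpha and beta with D and σΔ², ignoring ε factors, and the function name are assumptions of this sketch.

```python
import math


def alpha_beta_from_sigma_a(sigma_a, sum_n, sum_p):
    """Convert sigmaA to alpha/beta (assumed to correspond to D and sigma_Delta^2).

    sum_n, sum_p: mean |Fobs|^2 and |Fcalc|^2 in the resolution range
    (epsilon factors ignored in this sketch).
        alpha = sigmaA * sqrt(sum_n / sum_p)
        beta  = sum_n * (1 - sigmaA^2)
    """
    alpha = sigma_a * math.sqrt(sum_n / sum_p)
    beta = sum_n * (1.0 - sigma_a * sigma_a)
    return alpha, beta


print(alpha_beta_from_sigma_a(0.8, 1500.0, 1200.0))
```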