
Exploring Symmetry, Outlier Detection & Twinning update

Exploring Symmetry, Outlier Detection & Twinning update. Peter Zwart. Overview. Exploring metric symmetry iotbx.explore_metric_symmetry Outlier detection mmtbx.remove_outliers Twinning mmtbx.twin_map_utils Actually: cctbx.python $MMTBX_DIST/mmtbx/twinning/twin_map_utils.py.


Presentation Transcript


  1. Exploring Symmetry, Outlier Detection & Twinning update Peter Zwart

  2. Overview • Exploring metric symmetry • iotbx.explore_metric_symmetry • Outlier detection • mmtbx.remove_outliers • Twinning • mmtbx.twin_map_utils • Actually: cctbx.python $MMTBX_DIST/mmtbx/twinning/twin_map_utils.py

  3. Exploring metric symmetry • Protein crystals grown under various conditions can sometimes exhibit drastic changes in symmetry and unit cell dimensions • Sometimes, the crystal symmetries are related • The relation is not always obvious • Finding the relation between two unit cells is not always straightforward • Knowing the relations between the different crystal forms can be helpful during structure solution

  4. Exploring metric symmetry • How to find relations between unit cells? • A sub-lattice formalism allows one to generate a family of related lattices from a given lattice • The number of unique unit cells that are N times larger than the original unit cell is quite small (Rutherford, Acta Cryst. (2006). A62, 93-97) • Unit cells of approximately equal volume can be compared to each other by checking a large number of unimodular transforms • Ralf's work

  5. Exploring metric symmetry • Sub lattice? • Given all lattice points, ignore some of them while ensuring that the remaining lattice points form a regular lattice
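
The sub-lattice idea above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind iotbx.explore_metric_symmetry. It assumes that each sub-lattice of index N corresponds to exactly one integer matrix in upper-triangular Hermite normal form with determinant N, and it applies no symmetry reduction, so the counts are upper bounds on the number of unique cells. The function name sublattice_matrices is hypothetical.

```python
def sublattice_matrices(n):
    """Enumerate upper-triangular Hermite-normal-form matrices of determinant n.

    Each matrix ((a, d, e), (0, b, f), (0, 0, c)) with a*b*c == n,
    0 <= d < b and 0 <= e, f < c describes one sub-lattice whose cell is
    n times larger than the original cell (no symmetry reduction applied).
    """
    matrices = []
    for a in range(1, n + 1):
        if n % a:
            continue
        rest = n // a
        for b in range(1, rest + 1):
            if rest % b:
                continue
            c = rest // b
            for d in range(b):
                for e in range(c):
                    for f in range(c):
                        matrices.append(((a, d, e), (0, b, f), (0, 0, c)))
    return matrices


def determinant(m):
    # The matrices are upper triangular, so the determinant is the
    # product of the diagonal elements.
    return m[0][0] * m[1][1] * m[2][2]
```

For N = 2, 3 and 4 the enumeration yields 7, 13 and 35 matrices respectively, which illustrates why an exhaustive search over small N is cheap.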

  6. Exploring metric symmetry • Examples (space group; a, b, c in Å; α, β, γ in °) • Native: P212121; 61.8 97.7 148.9; 90 90 90 • SeMet1: P21; 115.5 149.0 115.6; 90 115 90 • SeMet2: C2221; 123.6 195.4 148.9; 90 90 90 • Poulsen et al. (2001). Acta Cryst. D57, 1251-1259.
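
A quick volume check on the three example cells hints at how they are related. The sketch below uses the standard unit cell volume formula and assumes that the C-centred SeMet2 cell has a primitive cell of half its conventional volume. The function and variable names are illustrative only.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume from cell lengths and angles given in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


native = cell_volume(61.8, 97.7, 148.9, 90, 90, 90)
semet1 = cell_volume(115.5, 149.0, 115.6, 90, 115, 90)
semet2 = cell_volume(123.6, 195.4, 148.9, 90, 90, 90)

# Assumption: a C-centred cell contains two lattice points, so its
# primitive cell has half the conventional volume.
semet2_primitive = semet2 / 2

ratio_semet1 = semet1 / native
ratio_semet2 = semet2_primitive / native
```

Both ratios come out close to 2, which is consistent with the SeMet forms being describable on sub-lattices of index 2 of the native lattice.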

  7. Exploring metric symmetry • Future • Provide reindexing methods between related unit cells • Would make molecular replacement of related structures easier • Useful for multi-crystal averaging • Obtain non-merohedral twin laws from this analysis

  8. Outlier detection • Outliers can have a detrimental effect on the progress of structure solution and refinement • Read, Acta Cryst. (1999). D55, 1759-1764 • The detection of outliers should be performed on the basis of all information available • Use model info if you can • One would like to have the flexibility of correcting for mistakes made earlier • Those reflections with E-values larger than 5 could have been valid observations!

  9. Outlier detection • What is an outlier? • A data point that does not fit a model because of an abnormal situation such as an erroneous measurement • How to spot them? • If Fobs is not reconcilable with Fcalc, Fobs might be an outlier • Reconcilable? • Fobs should be explainable from Fcalc and the current quality of the model (σA)

  10. Outlier detection • Model based outlier detection is done in a similar way to the method described by Read (Acta Cryst. (1999). D55, 1759-1764) • Fobs and Fcalc are normalized to get Eobs & Ecalc • σA is estimated for each reflection • Standard likelihood techniques are combined with kernel methods to obtain smoothly varying estimates • Find: (equation not preserved in the transcript) • Compute: (equation not preserved in the transcript)

  11. Outlier detection • Q is approximately χ² distributed • Acceptable values of Q are determined by the size of the dataset • If the dataset is large, large deviations are expected • A p-value is computed for each reflection • The p-value is the probability that, if this particular Q-value were the largest in the dataset, a Q-value of equal or larger size would be observed by chance • Observations for which the p-value is smaller than 5% are considered outliers
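
The dependence on dataset size can be illustrated with a standard extreme-value argument. This is a sketch under stated assumptions, not the mmtbx.remove_outliers implementation: it assumes the observations are independent, and that the single-observation tail probability (for example from a χ² distribution) has already been computed. The function names and the reflection count used in the example are hypothetical.

```python
import math


def extreme_p_value(p_single, n_obs):
    """Probability that the largest of n_obs independent observations is at
    least as extreme as one whose single-observation tail probability is
    p_single, i.e. 1 - (1 - p_single)**n_obs, evaluated in a numerically
    stable way for very small p_single."""
    return -math.expm1(n_obs * math.log1p(-p_single))


def is_outlier(p_single, n_obs, threshold=0.05):
    """Flag an observation when its extreme-value p-value is below threshold."""
    return extreme_p_value(p_single, n_obs) < threshold


# Illustrative numbers only: the reflection count is made up.
p_small_set = extreme_p_value(1.0e-4, 100)
p_large_set = extreme_p_value(1.0e-4, 100000)
```

The same single-observation probability is far less remarkable in the larger dataset, which is why the acceptable range of Q widens as the number of reflections grows.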

  12. Outlier detection • Example: 1ty3 • Wilson statistics indicate 1 outlier • (25,6,-43): Eobs = 3.938; centric = True; p-wilson = 1.83E-07; p-extreme = 9.0E-03 • Model based outlier detection indicates that (25,6,-43) is a valid observation

  13. Outlier detection • The outlier detection algorithm is embedded in a class that caches the original observed data • This allows one to perform outlier detection during different macro-cycles/rebuilding states and update the set of outliers each time • Will be incorporated in phenix.refine at the appropriate juncture • Command line tool available

  14. Twinning progress report • Routines available • Least squares target functions • Both intensity and amplitude • Target values and first derivatives • Detwinning • Standard and à la Sheldrick • R-values • Map coefficients • 2mFo-DFc & gradient maps • Bulk solvent scaling • Estimation of twin fraction, ksol, Bsol, U* and overall scale on twinned data • Using global optimizer (differential evolution) for the moment
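
For reference, algebraic detwinning of a pair of twin-related intensities can be sketched as follows. This is the textbook two-domain (hemihedral) relation, assumed here to be what "standard" detwinning refers to. It is not the mmtbx code, and the function names are hypothetical.

```python
def twin_intensities(i1, i2, alpha):
    """Mix two true intensities with twin fraction alpha (0 <= alpha < 0.5)."""
    j1 = (1 - alpha) * i1 + alpha * i2
    j2 = alpha * i1 + (1 - alpha) * i2
    return j1, j2


def detwin_intensities(j1, j2, alpha):
    """Invert the twinning relation for a twin-related pair of observations.

    The 1/(1 - 2*alpha) factor amplifies errors as alpha approaches 0.5,
    where the inversion becomes impossible.
    """
    if abs(1 - 2 * alpha) < 1e-12:
        raise ValueError("cannot detwin algebraically at alpha = 0.5")
    i1 = ((1 - alpha) * j1 - alpha * j2) / (1 - 2 * alpha)
    i2 = ((1 - alpha) * j2 - alpha * j1) / (1 - 2 * alpha)
    return i1, i2
```

The growing error amplification near a twin fraction of 0.5 is one reason to prefer targets and map coefficients that model the twinning directly over detwinning the data first.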

  15. Twinning progress report • Bulk solvent scaling and detwinned map generation available as a command line tool mmtbx.twin_map_utils • Results similar to CNS • mmtbx.twin_map_utils should be seen as the first step to full integration of twin utilities in phenix.refine

  16. Twinning progress report • [Figure panels: mmtbx.twin_map_utils; CNS]

  17. Twinning progress report • 1eyx: twin fraction = 0.47; difference maps at 2.5 sigma • Ligands and waters deleted (10% of total model) • [Figure panels: Twinning not taken into account; Twinning taken into account]

  18. Twinning progress report • Difference in 2mFo-DFc density is less striking • [Figure panels: Twinning not taken into account; Twinning taken into account]

  19. Twinning progress report • Future plans • Likelihood based map coefficients • in collaboration with Randy Read • Incorporation of least squares targets in phenix.refine • Likelihood based targets • in collaboration with Randy Read

  20. Funding: • LBNL (DE-AC03-76SF00098) • NIH/NIGMS (P01GM063210) • PHENIX Industrial Consortium • Acknowledgements • Cambridge: Randy Read, Airlie McCoy • Los Alamos: Tom Terwilliger, Li Wei Hung • Texas A&M University: Jim Sacchettini, Tom Ioerger, Eric McKee • Duke University: Jane Richardson, David Richardson • PHENIX Industrial Consortium: Robert Nolte, Eric Vogan • Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nick Sauter, Michael Hohn

  21. Kernel methods • Discrete binning of X-ray data introduces discontinuous jumps in properties that vary continuously • Mean intensity (normalisation) • The estimation of σA • Possible remedies: • Spline functions • Used extensively by K. Cowtan • Kernel methods

  22. Kernel methods • Discrete binning assumes a constant value in a certain range

  23. Kernel methods • With kernel methods, the estimate at each position is based on the full dataset • The amount that each datum contributes is determined by a weighting function (usually depending on the squared distance)
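
The contrast between binning and kernel estimation can be made concrete with a small sketch. It assumes a Gaussian weighting function of the squared distance and a plain weighted average. The kernel and bandwidth actually used in the cctbx routines are not stated in the slides, so the function names, the kernel choice and the width parameter are all illustrative.

```python
import math


def binned_estimate(x0, xs, ys, bin_width):
    """Mean of the y values whose x falls in the same bin as x0.

    Constant within a bin, with a jump at each bin boundary.
    Assumes the bin containing x0 holds at least one data point.
    """
    target = int(x0 // bin_width)
    members = [y for x, y in zip(xs, ys) if int(x // bin_width) == target]
    return sum(members) / len(members)


def kernel_estimate(x0, xs, ys, width):
    """Weighted mean of all y values, using Gaussian weights that decay
    with the squared distance between each x and x0."""
    weights = [math.exp(-0.5 * ((x - x0) / width) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Toy data: a smoothly varying property sampled at 11 points.
xs = [i / 10 for i in range(11)]
ys = [2 * x + 1 for x in xs]
```

Evaluating binned_estimate on a fine grid gives a staircase, whereas kernel_estimate changes smoothly with x0 because every datum contributes with a continuously varying weight.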

  24. Kernel methods • Kernel method available for normalisation • Used by xtriage in intensity statistics • Kernel method available for σA estimation • Used in the outlier detection

  25. Kernel methods • Determining alpha from σA estimated with kernel methods gives values similar to those obtained with the method available in phenix.refine • Similar results for beta
