
Winners and Losers: Ranking Crystals from Diffraction Images


Presentation Transcript


  1. Winners and Losers: Ranking Crystals from Diffraction Images Angela R. Criswell Automation Scientist

  2. ACTOR Installations • Pharmaceutical Companies (11) • Abbott Laboratories (Chicago, IL) • Astex Technology (UK) • AstraZeneca (UK) • Aventis (Frankfurt) • BMS (Princeton, NJ) • Exelixis (San Francisco, CA) • Merck (West Point, PA) • Novartis (Basel, Switzerland) • Novartis (Cambridge, MA) • Pfizer (St. Louis, MO) • Schering-Plough Research Inst. (NJ) • Structural Genomics Groups (3) • SGC – Oxford (UK) • University of Georgia • University of Toronto • Beamlines (2) • Daresbury Laboratory (UK) • IMCA-CAT (APS) • Future Installations (4) • 2 additional beamlines (SLS, Diamond) • 1 pharmaceutical company • AGENT Installations (3) • ActiveSight (San Diego, CA) • 2 future pharmaceutical sites

  3. High Throughput Optimization • Automate the processes • Crystallization robots • Sample mounting robots • Automated structure solution • Increase robustness for automated processes • Hardware and software improvements • Sample tracking methods and database management • Ever increasing complexity • Incorporate intelligence and examine success/failure. • Heuristic and learning methods • Remote access and control of automated processes • VNC and mail-in crystallography • Diffraction improvement by controlled hydration • Free-mounting system (Proteros)

  4. Crystal Ranking: An Evolution

  5. How do Crystallographers Rank Crystals?? • Do I have another crystal?? • Is the crystal twinned? • How far does the crystal diffract? • Are there ice rings? • Do peaks have decent spot shapes? • Can I assign a unit cell for the sample? • What are the unit cell dimensions and space group? • I/sig(I) analysis is not sufficient • Single image is probably not sufficient

  6. Crystal Ranking Efforts • d*TREK (Rigaku/MSC – Pflugrath) • Automatic indexing, ranking, strategy, integration, scaling • DISTL and LABELIT (SSRL & LBNL) • Automatic ranking and indexing, data processing • DNA (SPINE) • Automatic ranking and indexing • CrySis (Brookhaven – Berntson, Stojanoff, and Takai) • Ranking with a neural network trained on 500 diffraction images • BEST (EMBL – Popov) • Data collection strategy based upon statistical modeling

  7. SpamAssassin • Performs cursory header analysis: spots emails that try to mask their identities • Performs in-depth text analysis: spam mails often have a characteristic style (to put it politely) • characteristic disclaimers and lots of !!!!! • webpage links • Enables blacklisting: blocks email from existing blacklist sites • Adaptive: learns to recognize spam based upon user scores and amends blacklists Email SCORE: Advertisement for SuperBowl Celebration Event • No. hits=3.9 Required=4.0 • tests=HTML_60_70, HTML_FONTCOLOR_RED, HTML_FONTCOLOR_UNSAFE, HTML_FONT_INVISIBLE, HTML_MESSAGE, HTTP_ESCAPED_HOST, HTTP_EXCESSIVE_ESCAPES, LINES_OF_YELLING

  8. Strategic Ranking Goals • Incorporate image analysis tools alone • Diffraction limits • Bragg peak intensities • Background radiation • Ice ring identification – strong and diffuse • Incorporate indexing and refinement results • Spot shape • Lattice quality • Spot prediction analysis (discriminates twinned from non-twinned crystals) • Incorporate Comparative analysis • Between samples (rank comparisons) • Images collected for same sample (different crystal orientations) • Automatic exposure time determination

  9. Rules 1 and 2 • Divide image into 10 resolution bins • Ignore lowest 3 bins • Analyze 7 highest resolution shells • Number of reflections per shell • Signal-to-noise of reflections per shell
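
The binning and scoring described on this slide can be sketched roughly as follows. This is only an illustration, not d*TREK code. The equal-volume (1/d³) shell spacing, the threshold of 5 reflections per shell, the 10 points per populated shell, and the "one point per 6 units of mean I/sigma, capped at 10" conversion are assumptions inferred from the printed score tables on slides 18 to 21. All function and parameter names are hypothetical.

```python
def shell_index(d, d_min, d_max, n_bins=10):
    """Shell number for a spot at resolution d (in Angstrom).

    0 is the highest-resolution shell, n_bins - 1 the lowest.
    ASSUMPTION: shells have equal volume in reciprocal space (equal steps in 1/d^3).
    """
    s_hi, s_lo = d_min ** -3, d_max ** -3
    frac = (s_hi - d ** -3) / (s_hi - s_lo)
    return min(n_bins - 1, max(0, int(frac * n_bins)))


def isig_points(mean_isig, step=6.0, cap=10):
    """ASSUMPTION: one point per `step` units of mean I/sigma, capped at `cap`."""
    return max(0, min(cap, int(mean_isig // step)))


def score_rules_1_and_2(spots, d_min, d_max, n_bins=10, n_ignored=3,
                        min_count=5, count_points=10):
    """Return (rule1, rule2) for spots given as (d, intensity, sigma) tuples."""
    shells = [[] for _ in range(n_bins)]
    for d, intensity, sigma in spots:
        if d_min <= d <= d_max and sigma > 0:
            shells[shell_index(d, d_min, d_max, n_bins)].append(intensity / sigma)

    rule1 = rule2 = 0
    # Only the highest-resolution shells are scored; the lowest n_ignored are skipped.
    for ratios in shells[:n_bins - n_ignored]:
        if len(ratios) >= min_count:
            rule1 += count_points                            # Rule 1: spot count
            rule2 += isig_points(sum(ratios) / len(ratios))  # Rule 2: signal-to-noise
    return rule1, rule2
```

With these assumed thresholds, `isig_points` reproduces the per-shell I/sig points printed in the lysozyme tables (for example 44.8 gives 7 and 60.1 gives 10), but the real conversion used by d*TREK may differ.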

  10. Rule 3: Spot Sharpness • Calculated for every peak: $\text{output} = \operatorname{avg}\left[\,2\,(A/B)\,\right]$ • $A$ = peak max position – peak center position • $B = (\Delta x^2 + \Delta y^2)^{1/2}$, the effective diameter of the peak
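
A minimal sketch of the sharpness measure on this slide, assuming that A is the Euclidean distance between the brightest pixel and the peak centroid and that Δx and Δy are the peak extents along the two detector axes. The dictionary keys and the function name are hypothetical, chosen only for illustration.

```python
import math


def spot_sharpness(peaks):
    """Average of 2 * A / B over all peaks.

    Each peak is a dict with (hypothetical) keys:
      "max_xy"    - (x, y) of the brightest pixel
      "center_xy" - (x, y) of the peak centroid
      "dx", "dy"  - peak extent along x and y, in pixels
    """
    values = []
    for peak in peaks:
        a = math.dist(peak["max_xy"], peak["center_xy"])  # offset of maximum from center
        b = math.hypot(peak["dx"], peak["dy"])            # effective peak diameter
        if b > 0:
            values.append(2.0 * a / b)
    return sum(values) / len(values) if values else 0.0
```

A value near zero means the maximum sits on the centroid, which is what a compact, symmetric spot would give under these assumptions.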

  11. Rules 4 – 5: Ice Ring Detection • Step 1: filter out peaks from images • Step 2: bin pixels by 2θ • Step 3: for each bin, sum pixel intensities • Example plot: [figure not reproduced in transcript]
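
The three steps can be sketched as a radial intensity profile. The sketch assumes a flat detector perpendicular to the beam, a known beam center, and a precomputed boolean peak mask. The function name, parameters, and the 0.5° default bin width are illustrative and not taken from d*TREK.

```python
import math


def two_theta_profile(image, beam_xy, distance_mm, pixel_mm,
                      peak_mask=None, bin_deg=0.5):
    """Sum pixel intensities in bins of 2-theta.

    image     - list of rows of pixel values
    beam_xy   - (x, y) beam center in pixel coordinates
    peak_mask - optional list of rows of booleans, True where a Bragg peak was found
    """
    bx, by = beam_xy
    sums = {}
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if peak_mask is not None and peak_mask[y][x]:
                continue                                  # Step 1: drop peak pixels
            r_mm = math.hypot(x - bx, y - by) * pixel_mm
            two_theta = math.degrees(math.atan2(r_mm, distance_mm))
            index = int(two_theta / bin_deg)              # Step 2: bin by 2-theta
            sums[index] = sums.get(index, 0.0) + value    # Step 3: sum intensities
    if not sums:
        return []
    return [sums.get(i, 0.0) for i in range(max(sums) + 1)]
```

An ice ring would then show up as a bin whose summed background stands out from its neighbours. How that excess is turned into the "strong" and "diffuse" penalties is not described on the slide and is left out here.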

  12. Lysozyme 2_05: rank = 202

  13. Lysozyme 2_01: rank = 179

  14. Lysozyme 2_10: rank = 124

  15. Rules 6 - 11 • Indexing: award for percentage of indexed spots • Refinement: penalty based upon RMSMM residual • Mosaicity: penalty based upon refined mosaicity • Refinement Coverage: award for percentage of accepted reflections in prediction list • Prediction: re-evaluate highest 7 resolution shells based upon number of found spots that match predicted reflection list • Refined Reflection Resolution: re-evaluate highest 7 resolution shells based upon the signal-to-noise ratio of predicted reflections
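
Two of these rules can be sketched from the numbers printed in the lysozyme score tables (slides 18 to 21). The conversions below are guesses that happen to match those tables, not documented d*TREK behaviour: roughly one award point per percent of matched spots, and a penalty of one point per 0.01 of RMS residual (0.164 gives -16, 0.086 gives -8). The mosaicity penalty is omitted because the printed mosaicity values are too coarsely rounded to infer a rule. Function names are hypothetical.

```python
import math


def percent_award(n_matched, n_total):
    """ASSUMPTION: award equals the truncated percentage of matched spots."""
    if n_total <= 0:
        return 0
    return int(100 * n_matched / n_total)


def residual_penalty(rms_residual):
    """ASSUMPTION: one penalty point per 0.01 of RMS residual, truncated."""
    # The small epsilon guards against floating-point values such as 28.999999.
    return -math.floor(100 * rms_residual + 1e-9)
```

Because each rule has its own adjustable parameters in d*TREK, the scale factors here should be read as placeholders.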

  16. Ranking Results
     Rule 1: Spot count in resolution shells (found spots)
     Rule 2: I/Sigma in resolution shells (found spots)
     Rule 3: Spot sharpness
     Rule 4: Strong ice rings
     Rule 5: Diffuse ice rings
     Rule 6: Percentage of spots indexed
     Rule 7: RMS residual after refinement
     Rule 8: Mosaicity
     Rule 9: Percentage of spots refined
     Rule 10: Spot count in resolution shells (predicted and found spots)
     Rule 11: I/Sigma in resolution shells (predicted and found spots)

     Sample / Rules                    1    2    3    4    5    6    7    8    9   10   11  Total
     L:\Images\lyso101_????.osc 1     70   60   -1  -10    0   50  -17  -20   28   70   62    292
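
The total is simply the sum of the eleven rule scores, which the following sketch checks against the lysozyme row on this slide. The helper names are hypothetical; only the scores come from the slide.

```python
RULES = [
    "Spot count in resolution shells (found spots)",
    "I/Sigma in resolution shells (found spots)",
    "Spot sharpness",
    "Strong ice rings",
    "Diffuse ice rings",
    "Percentage of spots indexed",
    "RMS residual after refinement",
    "Mosaicity",
    "Percentage of spots refined",
    "Spot count in resolution shells (predicted and found spots)",
    "I/Sigma in resolution shells (predicted and found spots)",
]


def total_rank(rule_scores):
    """Sum the per-rule awards (positive) and penalties (negative)."""
    if len(rule_scores) != len(RULES):
        raise ValueError(f"expected {len(RULES)} rule scores")
    return sum(rule_scores)


def rank_samples(samples):
    """Return sample names ordered from highest to lowest total rank."""
    return sorted(samples, key=lambda name: total_rank(samples[name]), reverse=True)


# Rule scores for the lyso101 sample as printed on the slide.
lyso101 = [70, 60, -1, -10, 0, 50, -17, -20, 28, 70, 62]
print(total_rank(lyso101))  # 292
```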

  17. Sample Group #1: Tests with Lysozyme crystals

  18. Lysozyme 2_05: rank = 202
     -------------------------------------------------------------------------------
     Category                                                        Points   Cumul
     -------------------------------------------------------------------------------
     >=5 reflns found in 2nd shell (1.79-1.86)Å                          10      10
     >=5 reflns found in 3rd shell (1.86-1.94)Å                          10      20
     >=5 reflns found in 4th shell (1.94-2.04)Å                          10      30
     >=5 reflns found in 5th shell (2.04-2.17)Å                          10      40
     >=5 reflns found in 6th shell (2.17-2.34)Å                          10      50
     >=5 reflns found in 7th shell (2.34-2.58)Å                          10      60
     I/sig == 44.8 in 2nd found shell (1.79-1.86)Å                        7      67
     I/sig == 56.8 in 3rd found shell (1.86-1.94)Å                        9      76
     I/sig == 60.1 in 4th found shell (1.94-2.04)Å                       10      86
     I/sig == 67.7 in 5th found shell (2.04-2.17)Å                       10      96
     I/sig == 74.2 in 6th found shell (2.17-2.34)Å                       10     106
     I/sig == 89.7 in 7th found shell (2.34-2.58)Å                       10     116
     Penalty for spot sharpness of 0.06                                  -1     115
     Penalty for strong ring (2.82%) near resln. 3.513                  -10     105
     Penalty for diffuse ring (0.70%) near resln. 3.943                  -5     100
     Indexed 404 spots, or 75% of all spots used in indexing             74     174
     Penalty for RMS residual value of 0.164                            -16     158
     Penalty for Mosaicity value of 0.4                                 -19     139
     Refined 44 spots, or 4% of all predictions                           3     142
     >=5 reflns predicted and found in 5th shell (2.04-2.17)Å            10     152
     >=5 reflns predicted and found in 6th shell (2.17-2.34)Å            10     162
     >=5 reflns predicted and found in 7th shell (2.34-2.58)Å            10     172
     I/sig == 77.7 in 5th predicted and found shell (2.04-2.17)Å         10     182
     I/sig == 80.8 in 6th predicted and found shell (2.17-2.34)Å         10     192
     I/sig == 94.5 in 7th predicted and found shell (2.34-2.58)Å         10     202
     -------------------------------------------------------------------------------
     Cumulative                                                                 202

  19. Lysozyme 2_01: rank = 179
     -------------------------------------------------------------------------------
     Category                                                        Points   Cumul
     -------------------------------------------------------------------------------
     >=5 reflns found in 2nd shell (1.79-1.86)Å                          10      10
     >=5 reflns found in 3rd shell (1.86-1.94)Å                          10      20
     >=5 reflns found in 4th shell (1.94-2.04)Å                          10      30
     >=5 reflns found in 5th shell (2.04-2.17)Å                          10      40
     >=5 reflns found in 6th shell (2.17-2.34)Å                          10      50
     >=5 reflns found in 7th shell (2.34-2.58)Å                          10      60
     I/sig == 49.8 in 2nd found shell (1.79-1.86)Å                        8      68
     I/sig == 47.0 in 3rd found shell (1.86-1.94)Å                        7      75
     I/sig == 52.8 in 4th found shell (1.94-2.04)Å                        8      83
     I/sig == 65.7 in 5th found shell (2.04-2.17)Å                       10      93
     I/sig == 69.9 in 6th found shell (2.17-2.34)Å                       10     103
     I/sig == 86.8 in 7th found shell (2.34-2.58)Å                       10     113
     Penalty for spot sharpness of 0.10                                  -1     112
     Penalty for strong ring (2.78%) near resln. 3.555                  -10     102
     Penalty for diffuse ring (0.55%) near resln. 3.943                  -5      97
     Indexed 342 spots, or 56% of all spots used in indexing             56     153
     Penalty for RMS residual value of 0.182                            -18     135
     Penalty for Mosaicity value of 0.3                                 -15     120
     Refined 24 spots, or 2% of all predictions                           2     122
     >=5 reflns predicted and found in 4th shell (1.94-2.04)Å            10     132
     >=5 reflns predicted and found in 5th shell (2.04-2.17)Å            10     142
     >=5 reflns predicted and found in 6th shell (2.17-2.34)Å            10     152
     I/sig == 44.4 in 4th predicted and found shell (1.94-2.04)Å          7     159
     I/sig == 87.2 in 5th predicted and found shell (2.04-2.17)Å         10     169
     I/sig == 67.0 in 6th predicted and found shell (2.17-2.34)Å         10     179
     -------------------------------------------------------------------------------
     Cumulative                                                                 179

  20. Lysozyme 2_10: rank = 124
     -------------------------------------------------------------------------------
     Category                                                        Points   Cumul
     -------------------------------------------------------------------------------
     >=5 reflns found in 3rd shell (1.86-1.94)Å                          10      10
     >=5 reflns found in 4th shell (1.94-2.04)Å                          10      20
     >=5 reflns found in 5th shell (2.04-2.17)Å                          10      30
     >=5 reflns found in 6th shell (2.17-2.34)Å                          10      40
     >=5 reflns found in 7th shell (2.34-2.58)Å                          10      50
     I/sig == 54.8 in 3rd found shell (1.86-1.94)Å                        9      59
     I/sig == 55.3 in 4th found shell (1.94-2.04)Å                        9      68
     I/sig == 64.3 in 5th found shell (2.04-2.17)Å                       10      78
     I/sig == 72.1 in 6th found shell (2.17-2.34)Å                       10      88
     I/sig == 86.1 in 7th found shell (2.34-2.58)Å                       10      98
     Penalty for spot sharpness of 0.07                                  -1      97
     Penalty for strong ring (2.64%) near resln. 4.162                  -10      87
     Penalty for strong ring (2.05%) near resln. 3.875                  -10      77
     Penalty for strong ring (1.84%) near resln. 3.434                  -10      67
     Penalty for strong ring (6.76%) near resln. 2.139                  -10      57
     Penalty for strong ring (7.87%) near resln. 1.975                  -10      47
     Penalty for strong ring (4.78%) near resln. 1.875                  -10      37
     Indexed 305 spots, or 58% of all spots used in indexing             58      95
     Penalty for RMS residual value of 0.121                            -12      83
     Penalty for Mosaicity value of 0.4                                 -18      65
     >=5 reflns predicted and found in 5th shell (2.04-2.17)Å            10      75
     >=5 reflns predicted and found in 6th shell (2.17-2.34)Å            10      85
     >=5 reflns predicted and found in 7th shell (2.34-2.58)Å            10      95
     I/sig == 57.8 in 5th predicted and found shell (2.04-2.17)Å          9     104
     I/sig == 61.2 in 6th predicted and found shell (2.17-2.34)Å         10     114
     I/sig == 103.3 in 7th predicted and found shell (2.34-2.58)Å        10     124
     -------------------------------------------------------------------------------
     Cumulative                                                                 124

  21. Lysozyme 4_12: rank = 112
     -------------------------------------------------------------------------------
     Category                                                        Points   Cumul
     -------------------------------------------------------------------------------
     >=5 reflns found in 5th shell (2.25-2.39)Å                          10      10
     >=5 reflns found in 6th shell (2.39-2.57)Å                          10      20
     >=5 reflns found in 7th shell (2.57-2.83)Å                          10      30
     I/sig == 15.7 in 5th found shell (2.25-2.39)Å                        2      32
     I/sig == 19.5 in 6th found shell (2.39-2.57)Å                        3      35
     I/sig == 22.9 in 7th found shell (2.57-2.83)Å                        3      38
     Penalty for spot sharpness of 0.10                                  -1      37
     Penalty for strong ring (1.09%) near resln. 4.031                  -10      27
     Indexed 242 spots, or 57% of all spots used in indexing             57      84
     Penalty for RMS residual value of 0.086                             -8      76
     Penalty for Mosaicity value of 0.5                                 -20      56
     Refined 186 spots, or 19% of all predictions                        18      74
     >=5 reflns predicted and found in 5th shell (2.25-2.39)Å            10      84
     >=5 reflns predicted and found in 6th shell (2.39-2.57)Å            10      94
     >=5 reflns predicted and found in 7th shell (2.57-2.83)Å            10     104
     I/sig == 17.6 in 5th predicted and found shell (2.25-2.39)Å          2     106
     I/sig == 19.7 in 6th predicted and found shell (2.39-2.57)Å          3     109
     I/sig == 22.4 in 7th predicted and found shell (2.57-2.83)Å          3     112
     -------------------------------------------------------------------------------
     Cumulative                                                                 112

  22. Effect of Indexing on Rank Values

  23. Score Variability: Rank Values vs. Exposure Time

     Images / Rules          1    2    3    4    5    6    7    8    9   10   11  Total

     Thaumatin – 5 sec/0.5°: Rmerge = 12.9 % (32.5 %); mean total of the four single images = 197.5
     thau3 501,561          60   22   -2  -20    0   56   -5   -9   19   70   23    214
     thau3 501              60   22   -2  -20    0   58   -5  -12   14   60   21    196
     thau3 545              50   18   -3  -20    0   55   -5   -6   24   50   16    179
     thau3 590              60   28   -3  -20    0   55   -6  -10   18   60   22    204
     thau3 626              50   22   -3  -20    0   59   -6   -7   20   70   26    211

     Thaumatin – 10 sec/0.5°: Rmerge = 10.3 % (27.5 %); mean total of the four single images = 221
     thau3 1001,1061        70   32   -3  -20    0   57   -6  -11   20   70   30    239
     thau3 1001             60   31   -3  -20    0   57   -6  -12   18   60   28    213
     thau3 1045             60   26   -3  -20    0   53   -6  -11   22   70   25    216
     thau3 1090             60   32   -3  -20    0   57   -6  -10   21   60   27    218
     thau3 1126             70   33   -2  -20    0   55   -6  -13   17   70   33    237

     Thaumatin – 30 sec/0.5°: Rmerge = 8.4 % (25.8 %); mean total of the four single images = 260
     thau3 3001,3061        70   46   -3  -20    0   53   -7  -12   21   70   42    260
     thau3 3001             60   40   -3  -20    0   57   -6  -11   21   60   40    238
     thau3 3045             70   45   -3  -20    0   54   -6  -10   24   70   40    264
     thau3 3090             70   48   -3  -20    0   57   -6  -11   23   70   42    270
     thau3 3126             70   47   -2  -20    0   56   -6  -11   20   70   44    268
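
The three values 197.5, 221 and 260 shown on this slide equal the means of the four single-image totals at each exposure time. A short check, using only the totals printed in the table (the dictionary layout is illustrative):

```python
from statistics import mean, stdev

# Single-image totals per exposure time, copied from the slide.
single_image_totals = {
    "5 sec": [196, 179, 204, 211],
    "10 sec": [213, 216, 218, 237],
    "30 sec": [238, 264, 270, 268],
}

for exposure, totals in single_image_totals.items():
    # The sample standard deviation gives a feel for image-to-image scatter.
    print(f"{exposure}: mean = {mean(totals)}, spread = {stdev(totals):.1f}")
```

The means rise with exposure time (197.5, 221, 260), in line with the improving Rmerge values.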

  24. Score Variability: Data sets collected with VariMax optics

     Images / Rules          1    2    3    4    5    6    7    8    9   10   11  Total

     VariMax-HR: Rmerge = 2.9 % (22.3 %); mean total of images 1, 45, 90 and 116 = 291
     LYS0503_screen 1-2     70   46   -1  -10   -5   51  -18  -13   46   70   42    278
     LYS0503_screen 1       70   46   -1  -10   -5   54  -15  -11   50   70   41    289
     LYS0503_screen 2       70   44   -1  -10    0   56  -18  -14   44   70   41    282
     LYS0503_ 1             70   46   -1    0   -5   56  -17  -12   48   70   42    297
     LYS0503_ 45            70   46   -1  -10    0   57  -15  -13   45   70   42    291
     LYS0503_ 90            70   46   -1  -10   -5   57  -16  -12   47   70   42    288
     LYS0503_ 116           70   46   -1  -10   -5   57  -16  -12   49   70   40    288

     VariMax-HR: Rmerge = 2.8 % (15.0 %); mean total of images 1, 45, 90 and 116 = 305
     LYS0503_screen 1-2     70   57   -1  -10    0   56  -23  -18   39   70   57    297
     LYS0503_screen 1       70   58   -1    0  -15   57  -23  -17   42   70   57    298
     LYS0503_screen 2       70   57   -1  -10    0   59  -23  -17   42   70   57    304
     LYS0503_ 1             70   57   -1  -10    0   58  -21  -17   43   70   56    305
     LYS0503_ 45            70   58   -1  -10    0   57  -21  -17   46   70   56    308
     LYS0503_ 90            70   58   -1  -10   -5   55  -22  -18   39   70   57    293
     LYS0503_ 116           70   57   -1    0   -5   57  -22  -14   47   70   55    314

  25. What Have We Learned? • Signal-to-noise is the predominant factor in the current d*TREK release • This is intentional! Should it be? • Each of the 11 rules has independent parameters that can be adjusted to optimize for your case • Image processing adds a domino effect to ranking • Better refinement, higher rank • Lower mosaicity, higher rank • Fewer twin spots, higher rank • Spot sharpness analysis is not robust • Incorporate graph theory • Potential Pitfalls • Weak diffractors • Lowest 3 resolution bins should not be excluded from spot analysis • Image Header Accuracies • Anisotropy • Need images at multiple angles • These effects become effectively ‘averaged’ across images • Merohedral twinning

  26. Recent d*TREK Improvements • Don’t ignore lowest resolution bins • Image Header Accuracies • Command line override • Anisotropy • Incorporated anisotropy check and another rule • Rank each image, calculate average and ESD • Apply penalty as multiple of ESD • Data Collection Strategy improvements • Automatic exposure time calculation (using ‘intelligent’ algorithm) • Optimize detector space for diffraction resolution • Multiple scan strategy, if possible
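
The anisotropy check listed on this slide (rank each image, calculate the average and ESD, apply a penalty as a multiple of the ESD) could look roughly like the sketch below. The function name, the use of the sample standard deviation as the ESD, and the default multiplier of 1.0 are assumptions, since the slide does not give the actual factor.

```python
from statistics import mean, stdev


def anisotropy_adjusted_rank(image_ranks, esd_multiple=1.0):
    """Average per-image rank minus a penalty proportional to its scatter.

    image_ranks  - total rank of each image from the same crystal,
                   ideally collected at different orientations
    esd_multiple - hypothetical scale factor for the penalty
    """
    average = mean(image_ranks)
    # With a single image there is no scatter to estimate, so no penalty applies.
    esd = stdev(image_ranks) if len(image_ranks) > 1 else 0.0
    return average - esd_multiple * esd
```

A crystal whose rank varies strongly with orientation is pulled down relative to one with the same average but consistent images.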

  27. Acknowledgements • Russ Athay • Robert Bolotovsky • Joseph D. Ferrara • Thad Niemeyer • Karen Opersteny • J.W. Pflugrath
