Winners and Losers: Ranking Crystals from Diffraction Images Angela R. Criswell Automation Scientist
ACTOR Installations • Pharmaceutical Companies (11) • Abbott Laboratories (Chicago, IL) • Astex Technology (UK) • AstraZeneca (UK) • Aventis (Frankfurt) • BMS (Princeton, NJ) • Exelixis (San Francisco, CA) • Merck (West Point, PA) • Novartis (Basel, Switzerland) • Novartis (Cambridge, MA) • Pfizer (St. Louis, MO) • Schering-Plough Research Inst. (NJ) • Structural Genomics Groups (3) • SGC – Oxford (UK) • University of Georgia • University of Toronto • Beamlines (2) • Daresbury Laboratory (UK) • IMCA-CAT (APS) • Future Installations (4) • 2 additional beamlines (SLS, Diamond) • 1 pharmaceutical company • AGENT Installations (3) • ActiveSight (San Diego, CA) • 2 future pharmaceutical sites
High Throughput Optimization • Automate the processes • Crystallization robots • Sample mounting robots • Automated structure solution • Increase robustness for automated processes • Hardware and software improvements • Sample tracking methods and database management • Ever increasing complexity • Incorporate intelligence and examine success/failure. • Heuristic and learning methods • Remote access and control of automated processes • VNC and mail-in crystallography • Diffraction improvement by controlled hydration • Free-mounting system (Proteros)
How Do Crystallographers Rank Crystals? • Do I have another crystal? • Is the crystal twinned? • How far does the crystal diffract? • Are there ice rings? • Do peaks have decent spot shapes? • Can I assign a unit cell for the sample? • What are the unit cell dimensions and space group? • I/sig(I) analysis is not sufficient • A single image is probably not sufficient
Crystal Ranking Efforts • d*TREK (Rigaku/MSC - Pflugrath) • Automatic indexing, ranking, strategy, integration, scaling • DISTL and LABELIT (SSRL & LBNL) • Automatic ranking and indexing, data processing • DNA (SPINE) • Automatic ranking and indexing • CrySis (Brookhaven – Berntson, Stojanoff, and Takai) • Ranking with a neural network trained on 500 diffraction images • BEST (EMBL – Popov) • Data collection strategy based upon statistical modeling
SpamAssassin • Performs cursory header analysis: spots emails that try to mask their identities • Performs in-depth text analysis: spam mails often have a characteristic style (to put it politely) • characteristic disclaimers and lots of !!!!! • webpage links • Enables blacklisting: blocks email from existing blacklist sites • Adaptive: learns to recognize spam based upon user scores and amends blacklists • Email SCORE: Advertisement for SuperBowl Celebration Event • No. hits=3.9 Required=4.0 • tests=HTML_60_70, HTML_FONTCOLOR_RED, HTML_FONTCOLOR_UNSAFE, HTML_FONT_INVISIBLE, HTML_MESSAGE, HTTP_ESCAPED_HOST, HTTP_EXCESSIVE_ESCAPES, LINES_OF_YELLING
Strategic Ranking Goals • Incorporate image analysis tools alone • Diffraction limits • Bragg peak intensities • Background radiation • Ice ring identification – strong and diffuse • Incorporate indexing and refinement results • Spot shape • Lattice quality • Spot prediction analysis (discriminates twinned from non-twinned crystals) • Incorporate comparative analysis • Between samples (rank comparisons) • Between images collected for the same sample (different crystal orientations) • Automatic exposure time determination
Rules 1 and 2 • Divide image into 10 resolution bins. • Ignore lowest 3 bins. • Analyze 7 highest resolution shells • # reflns / shell • S:N of reflns / shell
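A minimal Python sketch of how Rules 1 and 2 could be scored, assuming each shell is summarized by its spot count and mean I/sig. The point values are not stated on this slide: 10 points for a shell with at least 5 reflections, and one point per 6 units of I/sig capped at 10, are inferred from the example rank listings later in this talk and may not match the actual d*TREK parameters. `Shell` and `score_rules_1_and_2` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Shell:
    n_reflns: int      # number of spots found in the shell
    i_over_sig: float  # mean I/sig of those spots


def score_rules_1_and_2(shells, n_ignored=3, min_reflns=5,
                        shell_points=10, isig_per_point=6.0, isig_cap=10):
    """Return (rule1, rule2) for shells ordered highest resolution first.

    All thresholds are assumptions inferred from example listings,
    not documented d*TREK parameters.
    """
    # Drop the lowest-resolution bins; only the remaining shells are scored.
    analyzed = shells[:len(shells) - n_ignored]
    rule1 = rule2 = 0
    for shell in analyzed:
        if shell.n_reflns >= min_reflns:
            rule1 += shell_points
            rule2 += min(isig_cap, int(shell.i_over_sig // isig_per_point))
    return rule1, rule2


# I/sig values for shells 2-7 of the Lysozyme 2_05 listing; the spot counts
# (0, 5, 50) are hypothetical, since the listing only reports ">=5".
lyso_2_05 = ([Shell(0, 0.0)]
             + [Shell(5, v) for v in (44.8, 56.8, 60.1, 67.7, 74.2, 89.7)]
             + [Shell(50, 100.0)] * 3)
print(score_rules_1_and_2(lyso_2_05))  # (60, 56)
```

The two numbers add up to 116, the cumulative score reported after the I/sig lines of the Lysozyme 2_05 listing.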
Rule 3: Spot Sharpness • Calculated for every peak • $\text{output} = \operatorname{avg}\left[2(A/B)\right]$ • $A$ = peak max position − peak center position • $B = (\Delta x^2 + \Delta y^2)^{1/2}$ • $B$ is the effective diameter of the peak.
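A short Python sketch of the Rule 3 statistic as described on this slide. It assumes that A is the Euclidean distance between the maximum pixel and the peak center, and that Δx and Δy are the peak extents in x and y; the tuple layout and the function name `spot_sharpness` are hypothetical.

```python
import math


def spot_sharpness(peaks):
    """Average of 2*A/B over all peaks.

    Each peak is (max_xy, center_xy, dx, dy):
      A = distance from the peak maximum to the peak center (assumed Euclidean)
      B = sqrt(dx**2 + dy**2), the effective diameter of the peak
    """
    ratios = []
    for (mx, my), (cx, cy), dx, dy in peaks:
        a = math.hypot(mx - cx, my - cy)
        b = math.hypot(dx, dy)
        if b > 0:  # skip degenerate peaks with no extent
            ratios.append(2.0 * a / b)
    return sum(ratios) / len(ratios) if ratios else 0.0
```

Under this reading, a symmetric peak whose maximum sits on its center contributes 0, and the value grows as the maximum moves toward the edge of the peak.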
Rules 4 – 5: Ice Ring Detection • Step 1: filter out peaks from images • Step 2: bin pixels by 2θ • Step 3: for each bin, sum pixel intensities Example plot:
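The three steps can be sketched in plain Python as follows. This is an illustration only: it assumes a flat detector perpendicular to the beam, a precomputed boolean mask marking peak pixels, and evenly spaced 2θ bins. The function name and all parameters are hypothetical, and the criterion that turns the profile into a strong or diffuse ring penalty is not given on the slide, so it is left out.

```python
import math


def two_theta_profile(image, peak_mask, beam_xy, distance_mm, pixel_mm,
                      n_bins, max_two_theta_deg):
    """Sum non-peak pixel intensities in bins of scattering angle 2-theta."""
    sums = [0.0] * n_bins
    bx, by = beam_xy
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if peak_mask[y][x]:
                continue  # step 1: filter out pixels belonging to peaks
            # step 2: convert the pixel position to 2-theta and pick a bin
            r_mm = math.hypot(x - bx, y - by) * pixel_mm
            two_theta = math.degrees(math.atan2(r_mm, distance_mm))
            idx = int(two_theta / max_two_theta_deg * n_bins)
            if idx < n_bins:
                sums[idx] += value  # step 3: sum intensities per bin
    return sums
```

An ice ring would then show up as one or a few bins whose summed intensity stands well above the neighbouring bins.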
Rules 6 – 11 • Indexing: award for percentage of indexed spots • Refinement: penalty based upon RMSMM residual • Mosaicity: penalty based upon refined mosaicity • Refinement coverage: award for percentage of accepted reflections in prediction list • Prediction: re-evaluate highest 7 resolution shells based upon number of found spots that match predicted reflection list • Refined reflection resolution: re-evaluate highest 7 resolution shells based upon the signal-to-noise ratio of predicted reflections
Ranking Results
Rule 1: Spot count in resolution shells (found spots)
Rule 2: I/Sigma in resolution shells (found spots)
Rule 3: Spot sharpness
Rule 4: Strong ice rings
Rule 5: Diffuse ice rings
Rule 6: Percentage of spots indexed
Rule 7: RMS residual after refinement
Rule 8: Mosaicity
Rule 9: Percentage of spots refined
Rule 10: Spot count in resolution shells (predicted and found spots)
Rule 11: I/Sigma in resolution shells (predicted and found spots)
Sample: L:\Images\lyso101_????.osc 1
Rule       1    2    3    4    5    6    7    8    9   10   11  Total
Score     70   60   -1  -10    0   50  -17  -20   28   70   62    292
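As a small illustration, the total is simply the sum of the eleven rule scores, and samples can be ordered by that total. The helper names below are hypothetical; the scores are the lyso101 row shown above.

```python
def total_score(rule_scores):
    """Sum the scores of Rules 1-11 for one sample."""
    if len(rule_scores) != 11:
        raise ValueError("expected one score per rule (11 values)")
    return sum(rule_scores)


def rank_samples(samples):
    """Return sample names sorted from highest to lowest total score."""
    return sorted(samples, key=lambda name: total_score(samples[name]),
                  reverse=True)


lyso101 = [70, 60, -1, -10, 0, 50, -17, -20, 28, 70, 62]
print(total_score(lyso101))  # 292
```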
Lysozyme 2_05: rank = 202
--------------------------------------------------------------------------
Category                                                    Points   Cumul
--------------------------------------------------------------------------
>=5 reflns found in 2nd shell (1.79-1.86)Å                      10      10
>=5 reflns found in 3rd shell (1.86-1.94)Å                      10      20
>=5 reflns found in 4th shell (1.94-2.04)Å                      10      30
>=5 reflns found in 5th shell (2.04-2.17)Å                      10      40
>=5 reflns found in 6th shell (2.17-2.34)Å                      10      50
>=5 reflns found in 7th shell (2.34-2.58)Å                      10      60
I/sig == 44.8 in 2nd found shell (1.79-1.86)Å                    7      67
I/sig == 56.8 in 3rd found shell (1.86-1.94)Å                    9      76
I/sig == 60.1 in 4th found shell (1.94-2.04)Å                   10      86
I/sig == 67.7 in 5th found shell (2.04-2.17)Å                   10      96
I/sig == 74.2 in 6th found shell (2.17-2.34)Å                   10     106
I/sig == 89.7 in 7th found shell (2.34-2.58)Å                   10     116
Penalty for spot sharpness of 0.06                              -1     115
Penalty for strong ring (2.82%) near resln. 3.513              -10     105
Penalty for diffuse ring (0.70%) near resln. 3.943              -5     100
Indexed 404 spots, or 75% of all spots used in indexing         74     174
Penalty for RMS residual value of 0.164                        -16     158
Penalty for Mosaicity value of 0.4                             -19     139
Refined 44 spots, or 4% of all predictions                       3     142
>=5 reflns predicted and found in 5th shell (2.04-2.17)Å        10     152
>=5 reflns predicted and found in 6th shell (2.17-2.34)Å        10     162
>=5 reflns predicted and found in 7th shell (2.34-2.58)Å        10     172
I/sig == 77.7 in 5th predicted and found shell (2.04-2.17)Å     10     182
I/sig == 80.8 in 6th predicted and found shell (2.17-2.34)Å     10     192
I/sig == 94.5 in 7th predicted and found shell (2.34-2.58)Å     10     202
--------------------------------------------------------------------------
Cumulative                                                             202
Lysozyme 2_01: rank = 179
--------------------------------------------------------------------------
Category                                                    Points   Cumul
--------------------------------------------------------------------------
>=5 reflns found in 2nd shell (1.79-1.86)Å                      10      10
>=5 reflns found in 3rd shell (1.86-1.94)Å                      10      20
>=5 reflns found in 4th shell (1.94-2.04)Å                      10      30
>=5 reflns found in 5th shell (2.04-2.17)Å                      10      40
>=5 reflns found in 6th shell (2.17-2.34)Å                      10      50
>=5 reflns found in 7th shell (2.34-2.58)Å                      10      60
I/sig == 49.8 in 2nd found shell (1.79-1.86)Å                    8      68
I/sig == 47.0 in 3rd found shell (1.86-1.94)Å                    7      75
I/sig == 52.8 in 4th found shell (1.94-2.04)Å                    8      83
I/sig == 65.7 in 5th found shell (2.04-2.17)Å                   10      93
I/sig == 69.9 in 6th found shell (2.17-2.34)Å                   10     103
I/sig == 86.8 in 7th found shell (2.34-2.58)Å                   10     113
Penalty for spot sharpness of 0.10                              -1     112
Penalty for strong ring (2.78%) near resln. 3.555              -10     102
Penalty for diffuse ring (0.55%) near resln. 3.943              -5      97
Indexed 342 spots, or 56% of all spots used in indexing         56     153
Penalty for RMS residual value of 0.182                        -18     135
Penalty for Mosaicity value of 0.3                             -15     120
Refined 24 spots, or 2% of all predictions                       2     122
>=5 reflns predicted and found in 4th shell (1.94-2.04)Å        10     132
>=5 reflns predicted and found in 5th shell (2.04-2.17)Å        10     142
>=5 reflns predicted and found in 6th shell (2.17-2.34)Å        10     152
I/sig == 44.4 in 4th predicted and found shell (1.94-2.04)Å      7     159
I/sig == 87.2 in 5th predicted and found shell (2.04-2.17)Å     10     169
I/sig == 67.0 in 6th predicted and found shell (2.17-2.34)Å     10     179
--------------------------------------------------------------------------
Cumulative                                                             179
Lysozyme 2_10: rank = 124
--------------------------------------------------------------------------
Category                                                    Points   Cumul
--------------------------------------------------------------------------
>=5 reflns found in 3rd shell (1.86-1.94)Å                      10      10
>=5 reflns found in 4th shell (1.94-2.04)Å                      10      20
>=5 reflns found in 5th shell (2.04-2.17)Å                      10      30
>=5 reflns found in 6th shell (2.17-2.34)Å                      10      40
>=5 reflns found in 7th shell (2.34-2.58)Å                      10      50
I/sig == 54.8 in 3rd found shell (1.86-1.94)Å                    9      59
I/sig == 55.3 in 4th found shell (1.94-2.04)Å                    9      68
I/sig == 64.3 in 5th found shell (2.04-2.17)Å                   10      78
I/sig == 72.1 in 6th found shell (2.17-2.34)Å                   10      88
I/sig == 86.1 in 7th found shell (2.34-2.58)Å                   10      98
Penalty for spot sharpness of 0.07                              -1      97
Penalty for strong ring (2.64%) near resln. 4.162              -10      87
Penalty for strong ring (2.05%) near resln. 3.875              -10      77
Penalty for strong ring (1.84%) near resln. 3.434              -10      67
Penalty for strong ring (6.76%) near resln. 2.139              -10      57
Penalty for strong ring (7.87%) near resln. 1.975              -10      47
Penalty for strong ring (4.78%) near resln. 1.875              -10      37
Indexed 305 spots, or 58% of all spots used in indexing         58      95
Penalty for RMS residual value of 0.121                        -12      83
Penalty for Mosaicity value of 0.4                             -18      65
>=5 reflns predicted and found in 5th shell (2.04-2.17)Å        10      75
>=5 reflns predicted and found in 6th shell (2.17-2.34)Å        10      85
>=5 reflns predicted and found in 7th shell (2.34-2.58)Å        10      95
I/sig == 57.8 in 5th predicted and found shell (2.04-2.17)Å      9     104
I/sig == 61.2 in 6th predicted and found shell (2.17-2.34)Å     10     114
I/sig == 103.3 in 7th predicted and found shell (2.34-2.58)Å    10     124
--------------------------------------------------------------------------
Cumulative                                                             124
Lysozyme 4_12: rank = 112
--------------------------------------------------------------------------
Category                                                    Points   Cumul
--------------------------------------------------------------------------
>=5 reflns found in 5th shell (2.25-2.39)Å                      10      10
>=5 reflns found in 6th shell (2.39-2.57)Å                      10      20
>=5 reflns found in 7th shell (2.57-2.83)Å                      10      30
I/sig == 15.7 in 5th found shell (2.25-2.39)Å                    2      32
I/sig == 19.5 in 6th found shell (2.39-2.57)Å                    3      35
I/sig == 22.9 in 7th found shell (2.57-2.83)Å                    3      38
Penalty for spot sharpness of 0.10                              -1      37
Penalty for strong ring (1.09%) near resln. 4.031              -10      27
Indexed 242 spots, or 57% of all spots used in indexing         57      84
Penalty for RMS residual value of 0.086                         -8      76
Penalty for Mosaicity value of 0.5                             -20      56
Refined 186 spots, or 19% of all predictions                    18      74
>=5 reflns predicted and found in 5th shell (2.25-2.39)Å        10      84
>=5 reflns predicted and found in 6th shell (2.39-2.57)Å        10      94
>=5 reflns predicted and found in 7th shell (2.57-2.83)Å        10     104
I/sig == 17.6 in 5th predicted and found shell (2.25-2.39)Å      2     106
I/sig == 19.7 in 6th predicted and found shell (2.39-2.57)Å      3     109
I/sig == 22.4 in 7th predicted and found shell (2.57-2.83)Å      3     112
--------------------------------------------------------------------------
Cumulative                                                             112
Score Variability: Rank Values vs. Exposure Time
Images / Rules        1    2    3    4    5    6    7    8    9   10   11  Total
Thaumatin – 5 sec/0.5°: Rmerge = 12.9 % (32.5 %); average of the four single-image totals: 197.5
thau3 501,561        60   22   -2  -20    0   56   -5   -9   19   70   23    214
thau3 501            60   22   -2  -20    0   58   -5  -12   14   60   21    196
thau3 545            50   18   -3  -20    0   55   -5   -6   24   50   16    179
thau3 590            60   28   -3  -20    0   55   -6  -10   18   60   22    204
thau3 626            50   22   -3  -20    0   59   -6   -7   20   70   26    211
Thaumatin – 10 sec/0.5°: Rmerge = 10.3 % (27.5 %); average of the four single-image totals: 221
thau3 1001,1061      70   32   -3  -20    0   57   -6  -11   20   70   30    239
thau3 1001           60   31   -3  -20    0   57   -6  -12   18   60   28    213
thau3 1045           60   26   -3  -20    0   53   -6  -11   22   70   25    216
thau3 1090           60   32   -3  -20    0   57   -6  -10   21   60   27    218
thau3 1126           70   33   -2  -20    0   55   -6  -13   17   70   33    237
Thaumatin – 30 sec/0.5°: Rmerge = 8.4 % (25.8 %); average of the four single-image totals: 260
thau3 3001,3061      70   46   -3  -20    0   53   -7  -12   21   70   42    260
thau3 3001           60   40   -3  -20    0   57   -6  -11   21   60   40    238
thau3 3045           70   45   -3  -20    0   54   -6  -10   24   70   40    264
thau3 3090           70   48   -3  -20    0   57   -6  -11   23   70   42    270
thau3 3126           70   47   -2  -20    0   56   -6  -11   20   70   44    268
Score Variability: Data sets collected with VariMax optics
Images / Rules          1    2    3    4    5    6    7    8    9   10   11  Total
VariMax-HR : Rmerge = 2.9 % (22.3 %); average total of images 1, 45, 90 and 116: 291
LYS0503_screen 1-2     70   46   -1  -10   -5   51  -18  -13   46   70   42    278
LYS0503_screen 1       70   46   -1  -10   -5   54  -15  -11   50   70   41    289
LYS0503_screen 2       70   44   -1  -10    0   56  -18  -14   44   70   41    282
LYS0503_ 1             70   46   -1    0   -5   56  -17  -12   48   70   42    297
LYS0503_ 45            70   46   -1  -10    0   57  -15  -13   45   70   42    291
LYS0503_ 90            70   46   -1  -10   -5   57  -16  -12   47   70   42    288
LYS0503_ 116           70   46   -1  -10   -5   57  -16  -12   49   70   40    288
VariMax-HR : Rmerge = 2.8 % (15.0 %); average total of images 1, 45, 90 and 116: 305
LYS0503_screen 1-2     70   57   -1  -10    0   56  -23  -18   39   70   57    297
LYS0503_screen 1       70   58   -1    0  -15   57  -23  -17   42   70   57    298
LYS0503_screen 2       70   57   -1  -10    0   59  -23  -17   42   70   57    304
LYS0503_ 1             70   57   -1  -10    0   58  -21  -17   43   70   56    305
LYS0503_ 45            70   58   -1  -10    0   57  -21  -17   46   70   56    308
LYS0503_ 90            70   58   -1  -10   -5   55  -22  -18   39   70   57    293
LYS0503_ 116           70   57   -1    0   -5   57  -22  -14   47   70   55    314
What Have We Learned? • Signal-to-noise is the predominant factor in the current d*TREK release • This is intentional! Should it be? • Each of the 11 rules has independent parameters that can be adjusted to optimize for your case • Image processing adds a domino effect to ranking • Better refinement, higher rank • Lower mosaicity, higher rank • Fewer twin spots, higher rank • Spot sharpness analysis is not robust • Incorporate graph theory • Potential Pitfalls • Weak diffractors • Lowest 3 resolution bins should not be excluded from spot analysis • Image Header Accuracies • Anisotropy • Need images at multiple angles • These effects become effectively ‘averaged’ across images • Merohedral twinning
Recent d*TREK Improvements • Don’t ignore lowest resolution bins • Image Header Accuracies • Command line override • Anisotropy • Incorporated anisotropy check and another rule • Rank each image, calculate average and ESD • Apply penalty as multiple of ESD • Data Collection Strategy improvements • Automatic exposure time calculation (using ‘intelligent’ algorithm) • Optimize detector space for diffraction resolution • Multiple scan strategy, if possible
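The anisotropy check ("rank each image, calculate average and ESD, apply penalty as multiple of ESD") could look roughly like the Python sketch below. It assumes the ESD is the sample standard deviation of the per-image totals and that the penalty is subtracted from the average; the function name and the `esd_multiplier` default are hypothetical, since the talk does not give the multiplier.

```python
import statistics


def orientation_adjusted_rank(image_totals, esd_multiplier=1.0):
    """Return (adjusted, mean, esd) for per-image rank totals of one sample.

    The penalty is esd_multiplier * ESD, subtracted from the mean total.
    ESD is taken as the sample standard deviation (an assumption).
    """
    mean = statistics.mean(image_totals)
    esd = statistics.stdev(image_totals) if len(image_totals) > 1 else 0.0
    return mean - esd_multiplier * esd, mean, esd


# Totals of the four single thaumatin images collected at 5 sec/0.5 degrees.
adjusted, mean, esd = orientation_adjusted_rank([196, 179, 204, 211])
print(mean)  # 197.5
```

A sample whose rank varies strongly with orientation gets a larger ESD and therefore a larger penalty than one that scores consistently across images.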
Acknowledgements • Russ Athay • Robert Bolotovsky • Joseph D. Ferrara • Thad Niemeyer • Karen Opersteny • J.W. Pflugrath