
MDL Keys Revisited

MDL Keys Revisited. Joseph L. Durant, Burton A. Leland, Douglas R. Henry and James G. Nourse, MDL Information Systems. Overview: What are MDL Keys? Constructing better keys (metrics; optimization by "educated guesswork"; optimization by Genetic Algorithms). Conclusions.


Presentation Transcript


  1. MDL Keys Revisited • Joseph L. Durant, Burton A. Leland, Douglas R. Henry and James G. Nourse • MDL Information Systems

  2. Overview • What are MDL Keys? • Constructing better keys • metrics • optimization by "educated guesswork" • optimization by Genetic Algorithms • Conclusions

  3. What are MDL Keys • a.k.a. SSKeys • Originally designed to support sub-structure searching • Bits encoding molecular features • Most follow the structure of: • a property on atom A • a property on atom B • A and B are separated by N bonds (0<=N<=4) • this pattern is encountered M or more times
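
The general key pattern on this slide (a property on atom A, a property on atom B, N bonds apart, seen M or more times) can be illustrated with a minimal sketch. It is not MDL's implementation: it assumes the "property" is simply the element symbol, that "separated by N bonds" means shortest-path distance, and that a molecule is given as a hypothetical adjacency dictionary. The function names are invented for illustration.

```python
from collections import deque

def bond_distances(adjacency, start):
    """Breadth-first search: shortest bond count from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist

def count_pattern(symbols, adjacency, prop_a, prop_b, n_bonds):
    """Count unordered atom pairs (A, B) with the given symbols exactly n_bonds apart."""
    pairs = set()
    for a, sym_a in enumerate(symbols):
        if sym_a != prop_a:
            continue
        for b, d in bond_distances(adjacency, a).items():
            if d == n_bonds and symbols[b] == prop_b:
                pairs.add(frozenset((a, b)))  # a set avoids counting A-B and B-A twice
    return len(pairs)

def key_is_set(symbols, adjacency, prop_a, prop_b, n_bonds, min_count):
    """The keybit is on when the pattern occurs min_count (M) or more times."""
    return count_pattern(symbols, adjacency, prop_a, prop_b, n_bonds) >= min_count

# Heavy atoms of ethanol, C-C-O (assumed toy input, hydrogens omitted).
symbols = ["C", "C", "O"]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
print(key_is_set(symbols, adjacency, "C", "O", 1, 1))  # one C-O bond is present
```

With N = 0 the two atoms coincide, so the pattern reduces to a count of atoms carrying both properties.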

  4. MDL Keys - What Are They? • Some keys code for specific bonds (C-Cl, S-P) • Other keys code for a property in an atomic neighborhood (C-CCO, Q-OO) • Still others are custom properties • Sgroup properties • rings • atom types

  5. MDL Keys - Standard Implementation • MDL’s SSKeys are encountered in 2 flavors: • a 960 keybitset • a 166 keybitset (Subset or User Keys)

  6. The 960 Keybitset • Created to support substructure searching • Encodes 1387 molecule features • Encodes features with >0, >1, >2 and >4 occurrences • Features can turn on 1, 2 or 3 keybits • many of the keybits can be set by multiple features
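
The many-to-one mapping described here (a feature can turn on several keybits, and one keybit can be set by several features) can be sketched as follows. The bit map, feature names and bit indices are hypothetical; only the "more than 0, 1, 2 or 4 occurrences" thresholds come from the slide.

```python
# Occurrence thresholds from the slide: a keybit fires for > 0, > 1, > 2 or > 4 hits.
THRESHOLDS = (0, 1, 2, 4)

def set_keybits(feature_counts, bit_map, n_bits):
    """Turn on every keybit whose (feature, threshold) condition is met.

    bit_map maps (feature, threshold) -> list of bit positions (hypothetical layout).
    """
    bits = [0] * n_bits
    for (feature, threshold), targets in bit_map.items():
        if feature_counts.get(feature, 0) > threshold:
            for position in targets:
                bits[position] = 1
    return bits

# Toy map: "C-Cl" sets two bits, and bit 3 is shared with "S-P".
bit_map = {
    ("C-Cl", 0): [0, 3],
    ("C-Cl", 1): [1],
    ("S-P", 0): [3],
}
print(set_keybits({"C-Cl": 2}, bit_map, 4))
print(set_keybits({"S-P": 1}, bit_map, 4))
```

Because bit 3 is shared, seeing it on does not say which feature was present. That sharing is what the later "one feature per bit" variants remove.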

  7. The 166 Keybitset • Originally created to embody an earlier MDL keybitset • Largely correspond to “chemist-meaningful” features

  8. 166 Keybitset Definitions • 1 - isotope • 2 - 103 < atomic number < 256 • ... • 84 - NH2 • 85 - CN(C)C • 86 - CH2QCH2 • ... • 165 - ring • 166 - fragments

  9. Current Uses for MDL Keys • Clustering/diversity • Brown & Martin, JCICS, 1996, 36, 572-584. • McGregor & Pallai, JCICS, 1997, 37, 443-448. • Library generation/evaluation • Brown & Martin, J. Med. Chem., 1997, 40, 2304-2313. • Koehler, Dixon, & Villar, J. Med. Chem.,1999, 42, 4695-4704. • Ajay, Bemis, & Murcko, J. Med. Chem., 1999, 42, 4942-4951. • Koehler & Villar, J. Comp. Chem., 2000, 21, 1145-1152. • Information content/comparison • Brown & Martin, JCICS, 1997, 37, 1-9. • Jamois, Hassan, & Waldman, JCICS, 2000, 40, 63-70. • Briem & Lessel, Perspect. Drug Disc. Des., 2000, 20, 231-244.

  10. Can We Construct Better Keys? • Keybitsets optimized for substructure searching • Keybitsets constructed to minimize memory/storage footprint • But they work remarkably well already

  11. But... • bit-setting algorithm has untapped power • algorithm defines ~3200 unique features • algorithm allows keybit to be set for "N or more occurrences"

  12. Find a Metric

  13. Success Measure • Defined by Briem and Lessel, Perspect. Drug Disc. Des., 20, 231 (2000). • Modified to account for ties • Evaluates the ability to differentiate classes of activity

  14. Test Set • 134 PAF antagonists • 49 5-HT3 antagonists • 49 TXA2 antagonists • 40 ACE inhibitors • 111 HMG-CoA reductase inhibitors • 574 "random" MDDR compounds

  15. Success Measure - Evaluation • Calculate the 10 nearest neighbors for each "active" molecule • Calculate the fraction of nearest neighbors in the same activity class as the target • Allow for ties; expand the number of neighbors until the tie is broken
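
The evaluation steps above can be sketched in a few lines. Several details are assumptions rather than statements from the slides: similarity is taken to be the Tanimoto coefficient, fingerprints are represented as sets of on-bit indices, the per-molecule fractions are averaged over all actives, and "random" compounds (label `None` here) serve only as potential neighbors.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bit indices (assumed metric)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def success_measure(fingerprints, labels, k=10, inactive_label=None):
    """Mean fraction of same-class molecules among each active's k nearest neighbors.

    Ties at the k-th similarity expand the neighbor list until the tie is broken.
    """
    scores = []
    for i, (fp, label) in enumerate(zip(fingerprints, labels)):
        if label == inactive_label:
            continue  # only "active" molecules are used as queries
        sims = sorted(
            ((tanimoto(fp, other), labels[j])
             for j, other in enumerate(fingerprints) if j != i),
            key=lambda pair: pair[0],
            reverse=True,
        )
        if not sims:
            continue
        cutoff = sims[min(k, len(sims)) - 1][0]
        neighbours = [lab for sim, lab in sims if sim >= cutoff]  # includes all ties
        scores.append(sum(lab == label for lab in neighbours) / len(neighbours))
    return sum(scores) / len(scores)

# Toy data: two activity classes plus one "random" compound.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8}, {7, 8, 9}, {20}]
labels = ["A", "A", "B", "B", None]
print(success_measure(fps, labels, k=1))
print(success_measure(fps, labels, k=2))
```

In the toy data, asking for two neighbors forces a tie at similarity 0, so the neighbor list grows to all four other molecules and the score drops.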

  16. Success Measure

  17. Starting Points... • 166 keybitset • 960 keybitset • 3234 keybitset

  18. Modifying the 960 Keybitset • all the "singly mapped" keybits • 726 keybitset • all the 960 keybitset features, one feature per bit • 1387 keybitset

  19. Initial Success Measures

  20. Optimization?

  21. Results of Random Pruning

  22. Intelligent Selection (Educated Guesswork) • Differentiating compounds • active from inactive • active from other actives

  23. Surprisal Analysis • Surprisal = log(probability 1 / probability 2) • probability 1 = "active" molecules • probability 2 = "inactive" molecules • assume Poisson-distributed errors • |Surprisal S/N| = |Surprisal / σ_surprisal|
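
A worked sketch of the surprisal and its signal-to-noise ratio for a single keybit follows. The slide does not give the error formula, so the one used here is an assumption: each probability is the fraction of molecules with the bit set, the counts carry Poisson errors (standard deviation equal to the square root of the count), and first-order error propagation gives a surprisal standard deviation of sqrt(1/n1 + 1/n2). The natural logarithm is also assumed.

```python
import math

def surprisal(n_active_set, n_active, n_inactive_set, n_inactive):
    """Return (surprisal, |surprisal S/N|) for one keybit, or None if undefined.

    n_active_set / n_inactive_set: molecules in each group with the bit on.
    n_active / n_inactive: total molecules in each group.
    """
    if n_active_set == 0 or n_inactive_set == 0:
        return None  # log of zero or division by zero: surprisal is undefined
    p1 = n_active_set / n_active        # probability 1: "active" molecules
    p2 = n_inactive_set / n_inactive    # probability 2: "inactive" molecules
    s = math.log(p1 / p2)
    # Assumed Poisson error propagation: var(ln n) ~ 1/n for each count.
    sigma = math.sqrt(1.0 / n_active_set + 1.0 / n_inactive_set)
    return s, abs(s) / sigma

# Illustrative counts only: bit on in 40 of 134 actives and 20 of 574 inactives.
print(surprisal(40, 134, 20, 574))
```

A bit that is equally frequent in both groups has surprisal 0 and therefore S/N 0, which is why low-S/N bits are natural candidates for pruning.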

  24. Surprisals for 166 Keybitset

  25. Surprisal S/N for 166 Keybitset

  26. Surprisal Pruning

  27. Success Measure vs. Surprisal S/N

  28. Success Measure vs. # of Keys

  29. Success Measure vs. # of Keys

  30. Success Measures

  31. Success Measures

  32. What About Multiple Occurrences? • Keybits can be set for >0, >1, >2,... occurrences of features • Inclusion of multiple occurrence keybits enhances performance for substructure searching

  33. Assembling a Composite Keybitset • Construct keybitsets for >0, >1, >2, >3... occurrences • Surprisal prune to the 2-sigma level • Concatenate the resulting keybitsets • only add keybits for new features
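
The assembly procedure can be sketched as below. This is one reading of the slide, not a confirmed implementation: "only add keybits for new features" is interpreted as skipping a feature that an earlier (lower) occurrence threshold has already contributed, and the surprisal S/N values are assumed to be precomputed per threshold. The input layout and function name are hypothetical.

```python
def assemble_composite(sn_by_threshold, sn_cutoff=2.0):
    """Build a composite keybitset from per-threshold surprisal S/N tables.

    sn_by_threshold: {occurrence threshold: {feature: |surprisal S/N|}}.
    Returns a list of (feature, threshold) pairs, one per keybit.
    """
    composite = []
    seen = set()
    for threshold in sorted(sn_by_threshold):  # > 0 first, then > 1, > 2, ...
        for feature, sn in sorted(sn_by_threshold[threshold].items()):
            # Surprisal-prune at the 2-sigma level, and keep new features only.
            if sn >= sn_cutoff and feature not in seen:
                composite.append((feature, threshold))
                seen.add(feature)
    return composite

# Toy S/N tables: "ring" only becomes significant at "> 1 occurrences".
tables = {
    0: {"C-Cl": 3.1, "ring": 0.4},
    1: {"C-Cl": 4.0, "ring": 2.5},
}
print(assemble_composite(tables))
```

Under this reading, higher occurrence thresholds can only add features that failed the cutoff at lower thresholds.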

  34. Success Measure • Success Measure increases until "7 or more" occurrences • 1283 keybits in final set • Final success measure = 71.26%

  35. Genetic Algorithm • We used the SUGAL genetic algorithm package • written by Dr. Andrew Hunter at University of Sunderland, UK • Identification of local minima is straightforward • Small keybitsets with good performance can be identified • The global minimum is elusive
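
For readers unfamiliar with the approach, a generic genetic algorithm over keybit subsets might look like the sketch below. It is not SUGAL and makes no claim about the settings the authors used: the operators (truncation selection, one-point crossover, bit-flip mutation, elitism), all parameter values and the toy fitness function are assumptions. The sketch maximizes fitness, which is equivalent to searching for a minimum of its negative.

```python
import random

def evolve_key_subset(n_keys, fitness, pop_size=30, generations=40,
                      mutation_rate=0.02, seed=0):
    """Search for a 0/1 mask over n_keys keybits that maximizes fitness(mask).

    Requires n_keys >= 2 and pop_size >= 4. Returns (best_mask, best_fitness_history).
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_keys)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]             # elitism: keep the best mask
        parents = population[:pop_size // 2]      # truncation selection
        while len(next_gen) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_keys)        # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]            # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

# Toy fitness: reward three "useful" keys, penalize keybitset size.
USEFUL = {0, 3, 5}

def toy_fitness(mask):
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)

best, history = evolve_key_subset(8, toy_fitness)
print(best, history[-1])
```

In a real run the fitness would be the success measure of the keybitset selected by the mask, which is far more expensive to evaluate than the toy function here.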

  36. Final Success Measures

  37. Conclusions • Key performance can be substantially improved by reoptimizing keybitsets • Key performance is not substantially improved for MDL keybitsets longer than ~500 bits

  38. Acknowledgements • use of SUGAL Genetic Algorithm Package, written by Dr. Andrew Hunter at University of Sunderland, UK • correspondence with and MDDR extregs from Dr. Hans Briem, Boehringer Ingelheim Pharma KG
