MultiY Recursive Partitioning – Method and Applications • Robert Brown, Shashidhar Rao, Tom Stockfisch, Accelrys Inc • David Roush, Litai Zhang, FMC • UK-QSAR meeting – June 2002
Outline • Introduction • PUMP-RP Methodology • Selectivity Study – COX2 inhibitors • HTS study – FMC • Summary
Introduction • High-throughput chemistry and biology are creating a wealth of data that can yield knowledge to expedite the drug-discovery process • Requirement for high-throughput methods to model HTS data for in-silico screening • HTS data are characterized by a huge number of observations, low hit rates and a lot of noise • Need for high-speed prediction methods • e.g., Recursive Partitioning (CART, FIRM), Linear Discriminant Analysis, Neural Nets, Binary QSAR • Would like to understand trends and selectivity across assays • mine the HTS data matrix
Standard CART RP • Input • Multiple descriptors (X), continuous or categorical • Single screening result (Y), categorical (e.g. yes/no) • The decision tree aims to separate different types of observation into different leaves of the tree • Two-step procedure: (1) overgrow, then (2) prune • Step 1: decrease impurity during the growth phase • choose the split with the greatest drop in impurity • stepwise procedure with no look-ahead • examines only a small fraction of possible trees • over-grows the tree • Step 2: decrease $R_\alpha$ during pruning • $R_\alpha = R_0 + \alpha N_{\text{terminal}}$, where $R_0$ is the tree's misclassification cost, $N_{\text{terminal}}$ its number of leaves and $\alpha$ the complexity parameter • stepwise procedure finds the optimum $R_\alpha$ over all possible subtrees of the overgrown tree
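The two CART ingredients on this slide, greedy split selection by impurity drop and the cost-complexity measure $R_\alpha = R_0 + \alpha N_{\text{terminal}}$ used in pruning, can be sketched as below. This is a minimal illustration, not the Cerius2 implementation: Gini impurity is assumed (CART also supports other impurity functions), only a single continuous descriptor is searched, and the function names `gini`, `best_split` and `cost_complexity` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_c p_c^2; 0.0 for a pure (or empty) node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs: Sequence[float], ys: Sequence[str]) -> tuple[float, float]:
    """Return (threshold, impurity drop) of the best 'x <= t' split.

    Greedy with no look-ahead, as in the growth phase: every candidate
    threshold is scored only by the immediate drop in weighted impurity.
    Returns (nan, 0.0) if no split reduces impurity.
    """
    n = len(ys)
    parent = gini(ys)
    best_t, best_drop = float("nan"), 0.0
    # The largest value is skipped: 'x <= max' would leave the right side empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best_drop:
            best_t, best_drop = t, parent - child
    return best_t, best_drop


def cost_complexity(r0: float, n_terminal: int, alpha: float) -> float:
    """R_alpha = R_0 + alpha * N_terminal: cost plus a per-leaf penalty."""
    return r0 + alpha * n_terminal


if __name__ == "__main__":
    print(best_split([0.1, 0.2, 0.3, 0.4], ["I", "I", "A", "A"]))
    # With alpha = 0.01, a 3-leaf subtree with R_0 = 0.14 beats an
    # 8-leaf tree with R_0 = 0.10 (0.17 vs 0.18), so pruning prefers it.
    print(cost_complexity(0.14, 3, 0.01), cost_complexity(0.10, 8, 0.01))
```

The pruning example shows why the overgrown tree is not kept as-is: the $\alpha N_{\text{terminal}}$ term lets a smaller subtree with slightly higher error score better.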
Understanding Selectivity? • [Figure: two separate trees, for Target 1 and Target 2, with leaves labelled I (inactive) and A (active)] • It is hard or impossible to compare the trees to see what produces selectivity • Requires enough data to determine two separate trees
PUMP-RP • [Figure: a single tree with Yk-generic splits near the root, activity-type splits below them, and Yk-specific splits near the leaves; leaves labelled I1, A1, I2, A2] • One tree combines both responses • Easy to see what makes a molecule selective • Easy to see what the targets have in common • Twice as much activity data available to determine the generic portion • Use for specificity (e.g., Y1, Y2 are different targets) • Use for multi-physical models (e.g., Y1 = activity, Y2 = toxicity) • Partially Unified Multiple Property Recursive Partitioning: A New Method for Predicting and Understanding Drug Selectivity, Thomas Stockfisch, in preparation for J. Chem. Inf. Comput. Sci.
New algorithm
[Figure: example trees grown on the mapped data, including one in which the K split (Yk = 1) wins; schematic contrasting separate Y1 and Y2 models with a tree that combines X splits and Yk splits]
Multi-Y:
X1   X2   Y1  Y2
0.1  0.2  I   I
0.2  0.4  A   I
0.3  0.6  A   unk
Single-Y plus K column:
X1   X2   K  Y
0.1  0.2  1  1,I
0.1  0.2  2  2,I
0.2  0.4  1  1,A
0.2  0.4  2  2,I
0.3  0.6  1  1,A
(The third compound has no K = 2 row because its Y2 is unknown.)
• Obtain a balance between a single general tree and a series of unrelated specific trees • Procedure • 1. Map the data to a single Y variable • 2. Grow a purely specific tree, with the k node at level 1 • 3. Regrow a k-branch: save the k split and replace it with a non-k split • 4. Recursively repeat step 3, moving the k-nodes “down” until arriving at the maximally generic tree • 5. Prune the generic tree, replacing some generic branches with specific ones • 6. Find the optimal tree to balance specificity and generality
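Step 1 of the procedure (map the data to a single Y variable) is concrete enough to sketch. Under the assumptions of this sketch, each compound contributes one row per target k with a known response, the new label is the string "k,response" (e.g. "1,I"), and an unknown response ("unk" in the slide's example, `None` here) produces no row. The function name `to_single_y` and the tuple layout are hypothetical, chosen only to reproduce the slide's Multi-Y to Single-Y plus K column example.

```python
from typing import Optional

# (descriptor vector X, responses Y1..YK); None marks an unknown response.
Row = tuple[tuple[float, ...], list[Optional[str]]]
MappedRow = tuple[tuple[float, ...], int, str]


def to_single_y(rows: list[Row]) -> list[MappedRow]:
    """Expand multi-Y rows into (X, k, 'k,Y') rows, one per known response."""
    out: list[MappedRow] = []
    for x, ys in rows:
        for k, y in enumerate(ys, start=1):
            if y is None:  # unknown response: no row for this target
                continue
            out.append((x, k, f"{k},{y}"))
    return out


if __name__ == "__main__":
    slide_rows: list[Row] = [
        ((0.1, 0.2), ["I", "I"]),
        ((0.2, 0.4), ["A", "I"]),
        ((0.3, 0.6), ["A", None]),
    ]
    for mapped in to_single_y(slide_rows):
        print(mapped)
```

Because the K column is an ordinary categorical X variable after this mapping, a standard single-Y tree can split on it; the later steps of the procedure then control where in the tree those K splits may sit.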
Selectivity Study: COX-2 selectivity • Cyclooxygenase (COX) is a key enzyme in prostaglandin biosynthesis via the arachidonic acid pathway. • Two isoforms, COX-1 (constitutive) and COX-2 (triggered by inflammatory insults), are known and characterized. • COX-2 inhibitors are anti-inflammatory agents with minimal GI side effects • e.g., Celebrex and Vioxx • Inhibition of COX-1 can lead to gastric damage, hemorrhage or ulceration • NSAIDs, e.g., ibuprofen, aspirin • Partially Unified Multiple Property Recursive Partitioning (PUMP-RP) Analyses of Cyclooxygenase (COX) Inhibitors, Shashidhar N. Rao & Thomas P. Stockfisch, in preparation for J. Chem. Inf. Comput. Sci.
Study Input • 454 diaryl heterocycle cyclooxygenase (COX) inhibitors with phenyl sulfones and phenyl sulfonamides, taken from the published literature • Inhibitory activities (IC50) against the COX-1 and COX-2 isoforms of the enzyme • Divided into 2 classes for each target: • COX-1: IC50 > 5 µM (Class 0), IC50 <= 5 µM (Class 1) • COX-2: IC50 > 0.5 µM (Class 0), IC50 <= 0.5 µM (Class 1) • Divided into • a test set (TE) of 50 compounds, 17 of them COX-2 selective • a training set (TR) of 404 compounds, 181 of them COX-2 selective • External validation sets • 25 Merck cyclooxygenase inhibitors • represent a different class of chemistry from that covered by the training and test sets • 8 NSAIDs (aspirin, ketoprofen, naproxen, desmethylnaproxen, ibuprofen, indomethacin, phenytoin and diclofenac) • all active and non-selective
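The class assignment above is a pair of thresholds (5 µM for COX-1, 0.5 µM for COX-2), sketched below. The slide does not state the rule for "COX-2 selective"; this sketch assumes it means Class 1 for COX-2 and Class 0 for COX-1. The names `THRESHOLD_UM`, `activity_class` and `is_cox2_selective` are hypothetical.

```python
# Class thresholds in micromolar, from the study input slide.
THRESHOLD_UM = {"COX-1": 5.0, "COX-2": 0.5}


def activity_class(ic50_um: float, target: str) -> int:
    """Class 1 if IC50 <= the target's threshold, otherwise Class 0."""
    return 1 if ic50_um <= THRESHOLD_UM[target] else 0


def is_cox2_selective(ic50_cox1_um: float, ic50_cox2_um: float) -> bool:
    """ASSUMED rule: active on COX-2 (Class 1) and inactive on COX-1 (Class 0)."""
    return (activity_class(ic50_cox2_um, "COX-2") == 1
            and activity_class(ic50_cox1_um, "COX-1") == 0)


if __name__ == "__main__":
    # A compound with IC50 = 50 uM (COX-1) and 0.1 uM (COX-2):
    print(activity_class(50.0, "COX-1"), activity_class(0.1, "COX-2"))
    print(is_cox2_selective(50.0, 0.1))
```

Note the asymmetry: the COX-2 threshold is ten times stricter, so a compound with IC50 = 1 µM on both isoforms counts as COX-1 active but COX-2 inactive.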
Example Tree [Figure: PUMP-RP tree for COX-1/COX-2 inhibition, with generic splits, Yk = 1 splits and specific splits, TRUE/FALSE branches, and leaf counts in parentheses. Splits include HB Donor <= 1, Jurs-FNSA-3 <= -0.2, AlogP98 <= 3.1, ISIS_key59 and JY <= 2.083; one leaf, A2 (30), is marked COX-2 selective]
Why not just calculate two trees? [Figure: separate single-Y trees on the same compounds. COX-2 Inhibition tree splits: FH2O <= -30.1, AlogP98 <= 2.6, Apol <= 14051.8, JX <= 2.01, ISIS Key #75. COX-1 Inhibition tree splits: Dipole Mom. <= 5.87, JX <= 1.79, ISIS Key #94, Shdw-XZ fract <= 0.7, ISIS Key #66, Shdw-nu <= 2, AlogP98 <= 3.1]
Prediction of Selectivity • Percentage of actives correctly predicted by RP trees compared to experiment • Enrichment in COX-2 selectives • 1.56 to 1.86 in the training set (TR) • 1.60 to 2.29 in the test set (TE) • Remember: 44% of TR is COX-2 selective (181 of 404), so the best possible enrichment in TR would be ~2.2 (404/181 ≈ 2.23)
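The enrichment ceiling quoted above follows from the usual definition of enrichment, assumed here: the fraction of true selectives among compounds predicted selective, divided by the fraction of selectives in the whole set. If every predicted selective is correct, the numerator is 1 and enrichment equals 1 / base rate. The function names are hypothetical.

```python
def enrichment(true_pos: int, predicted_pos: int, actual_pos: int, total: int) -> float:
    """(true_pos / predicted_pos) / (actual_pos / total), the assumed definition."""
    if predicted_pos == 0:
        raise ValueError("no compounds predicted positive")
    return (true_pos / predicted_pos) / (actual_pos / total)


def max_enrichment(actual_pos: int, total: int) -> float:
    """Upper bound reached when every predicted positive is a true positive."""
    return total / actual_pos


if __name__ == "__main__":
    print(round(max_enrichment(181, 404), 2))  # training set: 2.23
    print(round(max_enrichment(17, 50), 2))    # test set: 2.94
```

On the same arithmetic the test set, with 17 selectives among 50 compounds, has a higher ceiling (about 2.94), which is worth keeping in mind when comparing the TR and TE enrichment ranges.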
False positive (SRfp) and false negative (SRfn) selectivity rates
External Validation Sets • 25 Merck compounds: 21 actives, including 13 COX-2 selective, and 4 inactives • 21 correctly predicted COX-2 active, 8 correctly predicted COX-1 active • 8 of the 13 correctly predicted COX-2 selective • The model correctly predicts that none are COX-1 selective • 8 NSAIDs: aspirin, ketoprofen, naproxen, desmethylnaproxen, ibuprofen, indomethacin, phenytoin and diclofenac • All predicted to be non-selective • Five of them (ketoprofen, naproxen, ibuprofen, indomethacin and diclofenac) are predicted to be active • The other three, including aspirin, are predicted to be inactive • Aspirin is a weak inhibitor of both COX-1 and COX-2 (IC50 ~ 150-300 nM)
Assay Enrichment Study • A library of 66000 FMC compounds was screened in two functional assays (I and II), each returning two classes of activity (0 and 1) • Assay I has two follow-up assays [I(1); I(2); I(3)] • 60, 33 and 24 actives respectively • Assay II has one follow-up assay [II(1); II(2)] • 109 and 12 actives respectively • X(1) is the primary assay, whilst X(2) and X(3) are related to specific mechanisms • Goal • Combine data from multiple assays for endpoint X to • explain the factors causing activity • use the maximum amount of data to get the best predictive model
Computational Protocol • The 66000 compounds were divided in half into training and test sets, with even distributions of actives/inactives for both assays • Six sets of descriptors • Bcuts (8) • Cerius2 Fast descriptors (199) • Jurs descriptors (30) • ISIS keys (166) • 3D Atom pairs (825) • CCG-2D (145) • Mining Large Databases Using Multiple Y Recursive Partitioning, David Roush, Litai Zhang, Thomas Stockfisch and Shashidhar Rao, in preparation for J. Chem. Inf. Comput. Sci.
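The slide does not say how the 50/50 split was made. One way to keep actives and inactives evenly distributed for both assays at once is a stratified split on the joint activity pattern, sketched below as an assumption: compounds are grouped by (assay I class, assay II class), each group is shuffled with a fixed seed, and members are dealt alternately to training and test. The function name `stratified_halves` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_halves(ids, labels_i, labels_ii, seed=0):
    """Split ids in half, stratified on the (assay I, assay II) class pair.

    Within each activity pattern the two halves differ by at most one
    compound, so rare actives are shared out as evenly as possible.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for cid, a, b in zip(ids, labels_i, labels_ii):
        groups[(a, b)].append(cid)
    train, test = [], []
    for key in sorted(groups):  # deterministic order over patterns
        members = groups[key]
        rng.shuffle(members)
        train.extend(members[0::2])  # even positions -> training
        test.extend(members[1::2])   # odd positions -> test
    return train, test


if __name__ == "__main__":
    ids = list(range(100))
    li = [1 if i < 10 else 0 for i in ids]        # 10 assay I actives
    lii = [1 if i % 10 == 0 else 0 for i in ids]  # 10 assay II actives
    tr, te = stratified_halves(ids, li, lii)
    print(len(tr), len(te))
```

Stratifying on the joint pattern rather than on each assay separately matters with hit rates as low as those reported (tens of actives in 66000 compounds), since compounds active in both assays would otherwise be easy to lose from one half.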
Single Y vs Multi Y – Cerius2 Descriptors Test Set Results
Single Y vs Multi Y – ISIS Keys Test Set Results
Single Y vs Multiple Y • Multi Y produces better enrichments with better false positive rates • Single Y produces better false negative rates • => More information has produced a more selective screen • Logistically, only one experiment to run • Multi Y allows the factors/descriptors important to all assays to be identified
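The deck does not define how its false positive and false negative rates were computed. The sketch below assumes the common definitions FP/(FP+TN) and FN/(FN+TP), applied to binary predictions versus experiment; `error_rates` is a hypothetical helper.

```python
def error_rates(predicted, actual):
    """Return (false positive rate, false negative rate) for 0/1 sequences.

    ASSUMED definitions: FPR = FP / (FP + TN), FNR = FN / (FN + TP).
    A rate is reported as 0.0 when its denominator is zero.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


if __name__ == "__main__":
    print(error_rates([1, 1, 0, 0, 1], [1, 0, 0, 1, 0]))
```

Under these definitions the trade-off on the slide is expected: a more selective screen flags fewer compounds, which tends to lower FPR while raising FNR.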
Summary • The PUMP-RP procedure creates a tree with target-generic splits near the root and target-specific splits near the leaves, separated by splits on the activity type • The generic splits benefit from being determined by a larger amount of data than if separate models were made • It is easy to interpret which splits determine specificity and which show what the targets have in common • Prediction and understanding of COX-2 selective molecules • Large-scale experiments with FMC show the use of multiple assay data to enhance understanding of activity • Commercially released in Cerius2 4.6
Forthcoming Publications • Methodology • Partially Unified Multiple Property Recursive Partitioning: A New Method for Predicting and Understanding Drug Selectivity, Thomas Stockfisch, in preparation for J. Chem. Inf. Comput. Sci. • COX Selectivity Study • Partially Unified Multiple Property Recursive Partitioning (PUMP-RP) Analyses of Cyclooxygenase (COX) Inhibitors, Shashidhar N. Rao & Thomas P. Stockfisch, in preparation for J. Chem. Inf. Comput. Sci. • FMC HTS Study • Mining Large Databases Using Multiple Y Recursive Partitioning, David Roush, Litai Zhang, Thomas Stockfisch and Shashidhar Rao, in preparation for J. Chem. Inf. Comput. Sci.