Forming focused libraries and discovering active molecules with Iterative Stochastic Elimination

Amiram Goldblum, Anwar Rayan and David Marcus
Dept. of Medicinal Chemistry, School of Pharmacy, Ein Kerem Campus
http://www.md.huji.ac.il/models

Iterative Stochastic Elimination (ISE): our generic tool for optimizing highly complex combinatorial problems
• Problem type: systems with many variables, where each variable has many discrete values, the variables interact with each other, and each state of the system can be evaluated and given a score (transportation, communication, electronic devices, life sciences).
• Method: ISE finds optimal system states (global and local minima/optima) by iteratively eliminating the values of variables that contribute to the worst results. Elimination is based on careful statistics of randomly picked states of the system.
• Why: ISE has been compared with Genetic Algorithms, Monte Carlo, Simulated Annealing, Support Vector Machines and other optimization methods on specific problems, and was found to do as well or better.

Iterative Stochastic Elimination publications
• Glick, M. & Goldblum, A. A novel energy-based stochastic method for positioning polar protons in protein structures from X-rays. Proteins-Structure Function and Genetics 38, 273-287 (2000).
• Glick, M., Rayan, A. & Goldblum, A. A stochastic algorithm for global optimization and for best populations: A test case of side chains in proteins. Proceedings of the National Academy of Sciences of the United States of America 99, 703-708 (2002).
• Noy, E., Gorelik, B., Rayan, A. & Goldblum, A. Stochastic path to form ensembles and to quantify flexibility in proteins. Abstracts of Papers of the American Chemical Society 225, U781-U781 (2003).
• Rayan, A., Barasch, D., Brinker, G., Cycowitz, A., Geva-Dotan, I., Scaiewicz, A. & Goldblum, A. New stochastic algorithm to determine drug-likeness. Abstracts of Papers of the American Chemical Society 226, U297-U297 (2003).
• Rayan, A., Scaiewicz, A., Geva-Dotan, I., Barasch, D. & Goldblum, A. Screening molecules for their drug-like index. Abstracts of Papers of the American Chemical Society 228, U358-U358 (2004).
• Rayan, A., Senderowitz, H. & Goldblum, A. Exploring the conformational space of cyclic peptides by a stochastic search method. Journal of Molecular Graphics & Modelling 22, 319-333 (2004).
• Rayan, A., Noy, E., Chema, D., Levitzki, A. & Goldblum, A. Stochastic algorithm for kinase homology model construction. Current Medicinal Chemistry 11, 675-692 (2004).
• Rayan, A., Scaiewicz, A., Geva-Dotan, I., Marcus, D., Barasch, D. & Goldblum, A. Determining the drug-like character of molecules and prioritizing them by a drug-like index. ACS presentations 2005-8 (2007).
• Noy, E., Tabakman, T. & Goldblum, A. Constructing ensembles of flexible fragments by ISE is relevant to protein-protein interfaces. Proteins 68, 702-711 (2007).
• Gorelik, B. & Goldblum, A. High quality binding modes in docking ligands to proteins. Proteins 71, 1373-1386 (2008).

General Model System
• Variables (A, B, C, D, E, ...)
• Values (A1, A2, ..., B4, B5, ..., C4, C5, ...)
• Interactions between the variables
The number of combinations: 7(A) x 8(B) x 7(C) x n(D) x m(E) x ... = a very large number
• An exhaustive calculation is not possible

(1) Randomly pick one value for each of the variables (e.g. A7, B4, C5, ...). This determines a single “conformation” or “configuration” of the system.
(2) Employ the “cost function” to score the current configuration.

(3) Repeat steps (1) and (2) for n conformations (n ~ 10^3-10^6), and calculate the total value of each: sample 1 gives the 1st value, sample 2 the 2nd value, ..., sample n the nth value.

(4) Construct a histogram of the distribution of values for all sampled conformations, from the low-values region to the high-values region.

(5) Examine the frequency of each variable value in the worst results (the high-values region) and compare it to the expected frequency.
[Figure: zoom on the high-values region, listing the variable values (A3, B4, B8, C6, D7, ...) found in conformations 220, 314 and 715; A3 and C6 recur most often.]

(6) Evict values that contribute more than expected to the worst scores and less than expected to the best. The total number of combinations is reduced.
(7) Repeat the process iteratively until all remaining combinations can be evaluated exhaustively and sorted. We obtain a population of results.
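Steps (1) to (7) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the group's implementation: the function name `ise_minimize`, the 10% tails used as "best" and "worst" regions, the rule of evicting at most one value per variable per iteration, and all default parameters are hypothetical choices made for this sketch.

```python
import random
from collections import Counter
from itertools import product


def ise_minimize(domains, cost, n_samples=2000, tail=0.1,
                 exhaustive_limit=5000, max_iter=50, seed=0):
    """Hypothetical ISE-style search; lower cost is better.

    domains: dict mapping variable name -> list of discrete values.
    cost:    function taking a {variable: value} dict, returning a number.
    Returns a list of (cost, state) pairs sorted from best to worst.
    """
    rng = random.Random(seed)
    domains = {v: list(vals) for v, vals in domains.items()}
    names = list(domains)

    def n_combinations():
        total = 1
        for vals in domains.values():
            total *= len(vals)
        return total

    for _ in range(max_iter):
        if n_combinations() <= exhaustive_limit:
            break
        # Steps 1-3: sample random configurations and rank them by score.
        states = [{v: rng.choice(domains[v]) for v in names}
                  for _ in range(n_samples)]
        states.sort(key=cost)
        k = max(1, int(tail * n_samples))  # assumed size of each tail
        # Steps 4-5: count how often each value occurs in the best and worst tails.
        in_best = Counter((v, s[v]) for s in states[:k] for v in names)
        in_worst = Counter((v, s[v]) for s in states[-k:] for v in names)
        # Step 6: evict values over-represented in the worst tail and
        # under-represented in the best tail (at most one per variable here).
        evicted = False
        for v in names:
            vals = domains[v]
            if len(vals) < 2:
                continue
            expected = k / len(vals)  # count expected if the value did not matter
            suspects = [x for x in vals
                        if in_worst[(v, x)] > expected > in_best[(v, x)]]
            if suspects:
                vals.remove(max(suspects,
                                key=lambda x: in_worst[(v, x)] - in_best[(v, x)]))
                evicted = True
        if not evicted:
            break
    if n_combinations() > exhaustive_limit:
        raise RuntimeError("search space still too large for the exhaustive step")
    # Step 7: evaluate every remaining combination and sort the population.
    survivors = [dict(zip(names, combo))
                 for combo in product(*(domains[v] for v in names))]
    return sorted(((cost(s), s) for s in survivors), key=lambda pair: pair[0])
```

The returned list is the whole surviving population rather than a single optimum, which mirrors the slides' emphasis on obtaining a population of graded results.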

Acetylcholinesterase inhibitors with ISE
• Starting point: > 2 million molecules
• Stages shown: molecular chemical properties, ISE “engine” (target specificity), ISE docking and scoring
• Outcome: 9 molecules, 5 measured, 3 active
• Inhibition measured by Marta Rosin (Novartis’ Exelon), Hebrew University School of Pharmacy

Bcr-Abl dimerization inhibition by peptides (64aa)
• Starting point: ~10^80 sequences
• Stages shown: properties of amino acids, ISE “engine” (target specificity), ISE protein design
• Outcome: 10 peptides, 6 active
• Synthesized and measured by Martin Ruthardt, Goethe Univ. Frankfurt

Distinguishing between actives and inactives on a specific target
• Classification: drugs vs. non-drugs, selectives vs. non-selectives
• A huge combinatorial problem with more than 10^100 options
• Optimization problem: find differences in molecular properties that distinguish between actives and inactives

Learning from known data
• “Actives”: molecules with activity < 100 nM
• “Selectives”: molecules with selectivity > 3:1
• “Inactives”: MDDR (randomly picked), or less active molecules
• Properties (“descriptors”, our variables) are produced by computer programs (MOE): molecular weight, number of H-bond donors and acceptors, partial charges, topological descriptors, polar surface, van der Waals, molar refraction, etc.

Optimization of property ranges by ISE to distinguish between the two databases
Each property is separated into two “sub-properties”. For a property that spans 0 to 1200:
• Lower range: 0 to 800, ~80 values at intervals of 10
• Upper range: 500 to 1200, ~70 values
• Example of a randomly picked range: 100 to 700
Overall there are 80*70 = 5.6*10^3 combinations for the ranges of this variable.

Using properties to optimize the difference between actives (selectives) and inactives
Example ranges: 2 < HD ≤ 6, -2 < logP ≤ 3, 150 < M.W. ≤ 775
• If we construct a RANGE for each property, the set of ranges is A FILTER
• Then we test each of the molecules in the actives and each in the inactives
• Determine whether each is TP, TN, FP or FN (P, N, Pf, Nf)
• Compute the fraction of each category in the full DB
• Use the Matthews Correlation to score

Scoring by the Matthews Correlation
Databases: actives (P or Nf) and inactives (N or Pf)
Each given range is for ACTIVES, so actives can only be P or Nf
• For a fully correct prediction C = 1
• For a completely erroneous prediction C = -1
• For a random prediction C ~ 0.00
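The three reference values of C follow from the standard Matthews correlation formula, sketched below in Python. The slide's own formula did not survive extraction, so the standard definition is assumed here; the function name and the choice to return 0.0 when the denominator vanishes are illustrative. The arguments may be raw counts or per-database fractions.

```python
import math


def matthews_correlation(p, n, pf, nf):
    """Matthews correlation from true positives (p), true negatives (n),
    false positives (pf) and false negatives (nf)."""
    numerator = p * n - pf * nf
    denominator = math.sqrt((p + pf) * (p + nf) * (n + pf) * (n + nf))
    # Assumed convention: a degenerate table (empty row or column) scores 0.
    return numerator / denominator if denominator else 0.0


# The three reference cases named on the slide:
perfect = matthews_correlation(100, 100, 0, 0)     # every molecule classified correctly
inverted = matthews_correlation(0, 0, 100, 100)    # every molecule classified wrongly
random_like = matthews_correlation(50, 50, 50, 50) # no better than chance
```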

Applying ISE to discriminate between actives and inactives by optimizing descriptor ranges
(1) Construct filter i: randomly pick a value for each of the variables, i.e., low range MW, high range MW, etc.
(2) Pass all actives and inactives of the training set through filter i to obtain P, N, Pf, Nf
(3) Get the MCC value for filter i
(4) Repeat until i = 10^6
Then: Histogram, Elimination, Iteration, Exhaustive, Test
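The sampling loop above (construct a random filter, pass both databases through it, score by MCC) might look as follows in Python. This is a sketch under assumptions: molecules are represented as plain dicts of descriptor values, bounds are picked independently so a filter with low >= high simply passes nothing, raw counts are used instead of fractions, and every function name is hypothetical. The histogram, elimination and exhaustive stages are not shown.

```python
import math
import random


def random_filter(candidates, rng):
    """Pick one lower and one upper bound per descriptor, independently.

    candidates: dict descriptor -> (list of lower bounds, list of upper bounds).
    """
    return {name: (rng.choice(lows), rng.choice(highs))
            for name, (lows, highs) in candidates.items()}


def passes(molecule, filt):
    """True if every descriptor value lies in its (low, high] range."""
    return all(low < molecule[name] <= high
               for name, (low, high) in filt.items())


def score_filter(filt, actives, inactives):
    """MCC of a filter; ranges describe actives, so a passing active counts as P."""
    p = sum(passes(m, filt) for m in actives)
    nf = len(actives) - p
    pf = sum(passes(m, filt) for m in inactives)
    n = len(inactives) - pf
    denom = math.sqrt((p + pf) * (p + nf) * (n + pf) * (n + nf))
    return (p * n - pf * nf) / denom if denom else 0.0


def sample_filters(candidates, actives, inactives, n_filters=1000, seed=0):
    """Score n_filters random filters; return (mcc, filter) pairs, best first."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_filters):
        filt = random_filter(candidates, rng)
        scored.append((score_filter(filt, actives, inactives), filt))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In a full ISE run, the bound values that recur in the low-MCC tail of this sample would then be evicted and the sampling repeated.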

Results of the exhaustive step, before clustering (figure label: Best filter)

Employing the “best sets of filters” to construct a Molecular Bioactivity Index (MBI)
• With good data, the range of MBI is large and we get good “resolution”
• We have shown that we can use MBI to “fish” a few active molecules out of a “sea” of inactive ones
http://www.md.huji.ac.il/models (look for “test MBI”)

Employing the “best sets of filters” to construct a Drug Likeness Index (DLI)
Drug likeness is different from Lipinski’s Rule of Five (ROF)!

MBI and DLI can make a difference in:
• High Throughput Screening
• Combinatorial Synthesis
• Hit to lead development
• Lead optimization
• Construction of Focused libraries
• Molecular scaffold optimization
• Selectivity optimization

Timeline for discovery, single processor
One target (enzyme, cells, organs...)
1. Model building: 2-3 days
2. ZINC scan: a few hours
3. Diversity, similarity, elimination of known actives: a few hours
4. SciFinder manual search: 4-5 days
5. Purchase/synthesize molecules
6. In vitro tests: 1-2 months

Input: VEGFR-2 (KDR) active inhibitors, < 100 nM
• 549 actives divided randomly into a training set of 412 and a test set of 137
• Inactives are from MDDR

Output: example of a filter with 6 descriptors
One of the best (high MCC); there are others with a higher MCC but with many descriptors
• Number of descriptors: 6
• MCC of test set: 0.79
• TP: 98.9
• TN: 78.6
Descriptor ranges:
• Bcut_SMR_3: 0.0 - 3.06
• SMR_VSA4: 0.1 - 100.6
• Vsa_pol: 0.1 - 102.4
• Reactive: 0.0 - 0.999
• balabanJ: 0.0 - 1.902
• Q_RPC-: 0.0 - 0.267

A 6-property filter
• Bcut_SMR_3: molar refraction
• SMR_VSA4: VdW surface area
• Vsa_pol: approximate VdW polar surface
• Reactive: reactive fragments
• balabanJ: topological variable
• Q_RPC-: relative negative partial charge

Enrichment in the training set of VEGFR2

Initial focused library from ZINC (2.1 million) ZINC library screening gave 7826 molecules with top MBI

Similarity of highest MBI to training set

BBB results

[Figure: ER-MBI “moving ensemble” (normalized MBI values) plotted against logRBA, with High, Moderate and Low classes]

ER-MBI Combined high/low MBI

Molecular bioactivity index

Molecular Bioactivity Index (MBI): fishing actives from a “bath” of “non-actives”
Mix 10 actives in 100,000 molecules:
• Find 9 in the best 100: enrichment of 900
• Find 5 in the best 10: enrichment of 5000
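The two enrichment values are consistent with the usual definition of an enrichment factor: the hit rate in the top-ranked subset divided by the hit rate in the whole library. The short Python check below assumes that definition (the slides do not state it), and the function name is illustrative.

```python
def enrichment_factor(hits_in_top, top_size, total_actives, library_size):
    """(hits_in_top / top_size) divided by (total_actives / library_size),
    rearranged so that the integer products are formed before dividing."""
    return (hits_in_top * library_size) / (top_size * total_actives)


# 10 actives mixed into a library of 100,000 molecules
ef_best_100 = enrichment_factor(9, 100, 10, 100_000)  # 9 actives in the best 100
ef_best_10 = enrichment_factor(5, 10, 10, 100_000)    # 5 actives in the best 10
```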

Polypharmacology with our indexing method
[Figure: map of MBI target1 against MBI target2, with regions for target1-selective, target2-selective, multi-target and non-active molecules]
• We use several MBIs (or MBI and DLI) to map activity onto multiple targets. This may be used to extract potential new poly-active compounds or selective compounds, depending on the behavior of the relevant disease.

Docking & Scoring
• Do the molecules bind?
• How strong is the binding affinity? (Score)
• What does the complex look like? (Binding mode)
Requirement: a 3D structure of the target (X-ray, NMR, homology model)

ISE-dock • A new docking program from our lab that uses the ISE algorithm in order to produce large sets of optimal results for docking of ligands to their targets

ISE-dock
• Better than AutoDock, the most cited docking program
• Much better in the main docking criteria than two other popular programs, Glide and GOLD
• Produces large near-optimal docking populations to study the nature of binding and to predict alternative binding modes
• Accounts for ligand and protein flexibility
• Correlation between ISE-dock populations and experimental multiple binding modes

Anti Alzheimer current main drug strategy

Based on:
• ~450 active molecules with IC50 < 10 micromolar
• ~8000 randomly picked molecules from ZINC, assumed to be inactive

Docking with ISE-dock/AutoDock
We used the crystal structure of mouse AChE (1q84) for docking. Compounds in the protonated state were docked to AChE by AutoDock 3.0 and ISE-dock. 751 out of 755 compounds were docked in the active site by both methods.

ISE-dock results
Fig 2: AChE with ACh; the red color represents the negatively charged gorge due to many side-chain aromatic rings.
[Figure: 10 different conformations of one ligand in AChE; each color represents a different pose.]

10 compounds from docking results (financial limitation)
The 10 compounds were picked by direct examination of each molecule in the active site, paying utmost attention to its conformation, H-bonds and other interactions.

Experimental Results
• 9 out of the 10 compounds were purchased
• 8 out of the 9 compounds reached our lab in sufficient quantity
• 5 out of the 8 compounds are soluble
• 3 out of the 5 compounds are active (IC50 = 3.25, 3.5, 3.75 µM)
• Similarity to known active compounds is less than 0.35
• The molecules are novel AChE inhibitors (not a single paper on any)

Conclusions
• ISE is useful for solving extremely complex optimization problems
• Provides large sets of graded results
• Achieves high enrichments of “actives” vs. “inactives” by MBI, DLI, MSI etc.
• Useful for developing multi-targeted drugs
• Discovers new binders for known drug targets
• Produces diverse sets of solutions

Molecular Modeling Group Partners
http://www.md.huji.ac.il/models
http://www.cancergrid.eu
• Prof. Andrej Bohac, Comenius U., Bratislava: VEGFR2 (Angiokem)
• DAC company, Milan: HDAC and HSP90 inhibition
• Prof. Mart Saarma, U. Helsinki: RET kinase inhibition
• Prof. Martin Ruthardt, U. Frankfurt: Bcr-Abl inhibition by peptides
• Prof. Yousef Najajreh, Al Quds University: Bcr-Abl inhibitor synthesis
• Prof. Yossi Schlessinger, Yale: FGFR inhibitors
• Prof. David Varon, Hadassah, Jerusalem: ADAMTS-13 inhibition
• Prof. Angelo Carotti, School of Pharmacy, Univ. of Bari: MMP inhibitors
• Prof. Marta Rosin, HUJI: AChE inhibitors

Molecular Modeling Group, HUJI http://www.md.huji.ac.il/models
