The Silicon Chemist: Using logic-based machine learning and virtual chemistry to design new drugs automatically. Department of Computing, Imperial College, and Department of Life Sciences, Imperial College. Christopher Reynolds and Mike Sternberg.
Introduction

New drug leads are always needed, and virtual screening is now often used to simulate the bioactivity of compounds, so that as much of chemical space as possible can be searched in a reasonable amount of time. This project takes an existing logic-based machine-learning approach to identifying bioactive compounds and incorporates chemical synthesis rules, in order to design novel, easily synthesisable, and effective pharmaceutical drugs.

Logic-based drug discovery

• Inductive Logic Programming (ILP) is a machine learning technique that learns human-interpretable qualitative rules from chemical knowledge of active drugs (see Figure 2). These rules relate structure to activity and can guide the next steps in drug design chemistry.
• The rules can then be weighted using Partial Least-Squares or Support Vector Machines.
• The weighted rules are used as a Quantitative Structure-Activity Relationship (QSAR) model to predict drug activity.
• This approach combines the human-comprehensible rules of ILP with the predictive accuracy of advanced regression techniques.
• This Support Vector Inductive Logic Programming (SVILP) method is being patented.
• INDDEx (Investigational Novel Drug Discovery by Example) is a proprietary virtual screening program, incorporating SVILP, and developed by Equinox Pharma Ltd., an Imperial College spin-off company.
• The QSAR model is then used to screen a database of all purchasable molecules to identify drug leads.
• In a blind test, INDDEx had a hit rate of 30%, predicting around 30 active molecules, each capable of being the start of a new drug series, and each sufficiently novel that it could be patented.

Figure 2. Example ILP QSAR rules, as generated by SVILP.
active(A):- positive(A, B), Nsp2(A, C), distance(A, B, C, 2.49, 0.5).
Molecule is active if there is a positive charge centre and an sp2 nitrogen atom 2.49±0.5 Å apart.
active(A):- phenyl(A, B), phenyl(A, C), distance(A, B, C, 4.1, 0.5).
Molecule is active if there are two phenyl rings 4.1±0.5 Å apart.

Results

Figure 3. A scatter plot showing observed vs. predicted pKa in a cross-validation of a PubChem bioassay. This shows that there are no false positives when using an activity cut-off of pKa 7. (Axes: observed pKa against predicted pKa. Regions labelled: true positives, false positives, true negatives, false negatives.)

Figure 4. Receiver Operating Characteristic (ROC) curve, showing the true positive rate against the false positive rate as the discrimination threshold is varied. (Curves: top 1%, 6 actives; top 5%, 25 actives; top 10%, 47 actives; active/inactive, 232 actives.) ROC values for 1%, 5%, 10% and all actives are 0.912, 0.830, 0.814, and 0.892 respectively. This illustrates that the sensitivity of active compound detection holds true up to the top 5% most active compounds provided to INDDEx as examples.

Virtual chemistry

Figure 5. From left to right: wireframe visualisations of the two reactant molecules, entered in SMILES format, are processed by the SMIRKS esterification reaction, and the resultant product molecule is returned.

Figure 1 (flowchart labels): Molecular database; Screen; Hit rate 30%; Fragmentation of molecules; Split molecule with reverse reaction; Database of virtual reactions; Modify using all viable reactions; Modified molecules; Screen; Novel verified hits on synthesisable molecules; Hit rate ?%.

To extend the machine learning method currently used in INDDEx, and move the process from hit to lead discovery, the logic-based rules produced by SVILP will be used to modify promising hits. This new process is shown in Figure 1.
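Before a hit can be modified, it helps to know which rules it already satisfies. The following is a minimal Python sketch of how the two Figure 2 rules could be checked against a molecule, and not the INDDEx implementation. It assumes a hypothetical representation in which each molecule maps a feature type to the 3D coordinates of its centres; the names RULES, satisfies, rules_met and the example molecule are invented for illustration.

```python
from itertools import product
from math import dist

# Hypothetical layout: feature type -> list of 3D centre coordinates (angstroms).
# The two rules are the Figure 2 examples; everything else is illustrative.
RULES = [
    ("phenyl", "phenyl", 4.1, 0.5),
    ("positive", "nsp2", 2.49, 0.5),
]

def satisfies(mol, type_b, type_c, target, tolerance):
    """True if any type_b centre and any type_c centre are target +/- tolerance apart."""
    pairs = product(mol.get(type_b, []), mol.get(type_c, []))
    return any(abs(dist(b, c) - target) <= tolerance for b, c in pairs)

def rules_met(mol):
    """One boolean per rule, in the order of RULES."""
    return [satisfies(mol, *rule) for rule in RULES]

# Made-up molecule: two phenyl centres 4.0 angstroms apart,
# one positive charge centre, and no sp2 nitrogen.
example = {
    "phenyl": [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    "positive": [(1.0, 1.0, 0.0)],
}
print(rules_met(example))  # [True, False]
```

Here the example molecule meets the phenyl rule but not the charge and nitrogen rule, so it would count as a partial match.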
This project will concentrate on scanning through the dataset of purchasable molecules for those that partially fulfil the rules, and then altering those molecules to try to fit the remaining rules. A database of virtual chemical synthesis reactions will be used to combine these molecules with a database of fragment-like molecules. Through this method, the program will generate focussed libraries of synthetic derivatives around the promising hit molecules, and so explore a far greater section of easily synthetically accessible chemical space.

Figure 1. Graphic showing the current method of finding matches for ILP-derived rules, as used by the INDDEx software, and this project's new method.
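The library-generation loop described above can be sketched in a few lines of Python. This is a toy under strong assumptions and not the project's code: molecules are reduced to a set of pharmacophore features plus a set of reactive groups, rules are reduced to required feature sets (ignoring the 3D distances used in the real rules), and the single "reaction" is a stand-in for a SMIRKS transformation. All names and data (RULES, esterify, focused_library, hit1, fragA, fragB) are hypothetical.

```python
from itertools import product

# Simplified rules: each rule is the set of features a molecule must contain.
RULES = [{"phenyl", "positive"}, {"positive", "nsp2"}]

def rules_met(mol):
    """Number of simplified rules whose features are all present."""
    return sum(rule <= mol["features"] for rule in RULES)

def esterify(acid, alcohol):
    """Toy virtual reaction: join a carboxylic acid to an alcohol, or return None."""
    if "acid" not in acid["groups"] or "alcohol" not in alcohol["groups"]:
        return None
    return {
        "name": f'{acid["name"]}+{alcohol["name"]}',
        "features": acid["features"] | alcohol["features"],
        "groups": (acid["groups"] - {"acid"}) | (alcohol["groups"] - {"alcohol"}) | {"ester"},
    }

def focused_library(hits, fragments, reactions):
    """Keep every derivative that satisfies more rules than the hit it came from."""
    library = []
    for hit, frag, react in product(hits, fragments, reactions):
        # Try both reactant orders, since a reaction may not be symmetric.
        for a, b in ((hit, frag), (frag, hit)):
            derivative = react(a, b)
            if derivative is not None and rules_met(derivative) > rules_met(hit):
                library.append(derivative)
    return library

# Made-up partial hit and fragment-like molecules.
hit = {"name": "hit1", "features": {"phenyl", "positive"}, "groups": {"acid"}}
fragments = [
    {"name": "fragA", "features": {"nsp2"}, "groups": {"alcohol"}},
    {"name": "fragB", "features": {"phenyl"}, "groups": {"alcohol"}},
]
library = focused_library([hit], fragments, [esterify])
print([m["name"] for m in library])  # ['hit1+fragA']
```

Only the derivative that gains the missing feature is kept, which mirrors the idea of building a focussed library around a promising hit. A real implementation would apply reactions to actual structures and re-evaluate the distance constraints on the products.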