De Novo design tools for the generation of synthetically accessible ligands

Peter Johnson, Krisztina Boda, Shane Weaver, Aniko Valko, Vilmos Valko

Receptor Structure Based Drug Design
Objective: to suggest potential leads that
• bind strongly to a given protein because of shape and electrostatic complementarity
• are easy to synthesise
Approaches:
• Docking methods (preferably flexible docking) identify new lead structures by rapidly screening a database of 3-D structures of known compounds
• De novo design methods (such as SPROUT) construct a diverse set of entirely novel potential leads from scratch

SPROUT Components
• Detects potential binding pockets of the protein structures
• Identifies favourable interaction sites (H-bonding, hydrophobic, covalent, metal, user defined)
• Docks structures to target interaction sites
• Generates 3D molecular structures of novel ligands by linking the docked starting fragments together in an incremental construction scheme
• Scores, sorts and clusters the solutions

Problem with Large Answer Sets
• De novo design programs such as SPROUT can suggest large sets of entirely novel potential leads
• Powerful heuristics are necessary to evaluate (and reduce) often large answer sets
• Binding Affinity Score: eliminate candidates with poor estimated binding affinity
• Synthetic Feasibility: eliminate candidates with complex molecular structures

For de novo design, prediction of synthetic accessibility is equally important
Hypothetical ligands, including those predicted to bind very strongly, have no practical value unless they can be readily synthesised.
Our Attempts to Provide Solutions:
• CAESA (estimates synthetic accessibility)
• Complexity Analysis (estimates structural complexity and drug-likeness)
• SynSPROUT (avoids the problem by building constraints into the structure generation process)

CAESA: Computer Assisted Estimation of Synthetic Accessibility
Glenn Myatt, Jon Baber

Goals of CAESA Project
• Clear need for an automated method of ranking hypothetical compounds according to perceived ease of synthesis
• Good synthetic chemists can do this job themselves on a small number of compounds but are unwilling to do it for hundreds or thousands of compounds
• CAESA attempts to do the same job but never gets bored!

Estimation of Synthetic Accessibility: Criteria used by CAESA
CAESA scores the synthetic accessibility of structures using two main criteria:
a) An estimate of structural complexity:
• stereocentres
• complex topological features (fusions etc.)
• functional group complexity
b) Availability of good starting materials:
• rapid retrosynthetic analysis
• database of commercially available materials
• reaction rule base (editable)

CAESA Components

Automatic Selection of Starting Materials
Starting Materials and Synthetic Accessibility
• Availability of suitable starting materials is a very important factor: good starting materials can dramatically reduce the difficulty of synthesising a compound.
• Good starting materials for part of the target molecule mean that the analysis of structural synthetic difficulty or complexity can be directed to just those portions of the target molecule that cannot be made from available starting materials
• Finding good starting materials through retrosynthetic analysis also provides possible synthetic routes as a byproduct

Traditional Retrosynthetic Analysis

Bidirectional Search for Synthetic Routes

Example of Starting Material Selection

Summary of CAESA Features
• CAESA carries out a retrosynthetic analysis which terminates when a starting material from a database (such as ACD) is found
• Found starting materials are scored according to length and difficulty of reaction sequence and coverage of target compound
• All chemistry rules and transformations are described in editable text knowledge bases easily modified by chemists
• Quality of the analysis depends on the chemistry included in the knowledge bases and the comprehensiveness of the starting material libraries
• But CAESA is relatively slow, and speedier methods are needed for pruning of large data sets
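The termination criterion summarised above (stop a branch as soon as a compound is found in the starting material database) can be illustrated with a minimal sketch. This is not CAESA's implementation: the transform table, the compound names and the `find_route` helper are all hypothetical, molecules are plain strings rather than structures, and a breadth-first search stands in for whatever search strategy CAESA actually uses. The returned step count is only a crude proxy for the "length of reaction sequence" criterion.

```python
from collections import deque

# Hypothetical retrosynthetic transforms: product -> alternative precursor sets.
TRANSFORMS = {
    "amide_target": [("acid_A", "amine_B")],
    "acid_A": [("ester_C",)],
    "amine_B": [("nitrile_D",)],
}

# Hypothetical stand-in for a database of available starting materials.
AVAILABLE = {"ester_C", "amine_B"}


def find_route(target, max_depth=5):
    """Breadth-first retrosynthesis that stops at available starting materials.

    Returns (number_of_steps, starting_materials) for the first route found,
    or None when no route ends entirely in available materials.
    """
    queue = deque([((target,), 0)])
    seen = set()
    while queue:
        open_set, steps = queue.popleft()
        unresolved = [c for c in open_set if c not in AVAILABLE]
        if not unresolved:
            # Every remaining compound can be bought: the analysis terminates.
            return steps, sorted(open_set)
        if steps >= max_depth:
            continue
        compound = unresolved[0]  # disconnect one unresolved compound at a time
        for precursors in TRANSFORMS.get(compound, []):
            new_set = tuple(sorted(set(open_set) - {compound} | set(precursors)))
            if new_set not in seen:
                seen.add(new_set)
                queue.append((new_set, steps + 1))
    return None


route = find_route("amide_target")
```

Because `amine_B` is already available in this toy table, the search never expands it further; only `acid_A` needs a second disconnection.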

Alternative Approach: Complexity Analysis
Based on statistical distribution of various substitution patterns found in databases of existing drugs and available starting materials.
Molecular Complexity Analysis of de Novo Designed Ligands. Krisztina Boda and A. Peter Johnson. J. Med. Chem.; 2006; ASAP Web Release Date: 26-Jan-2006

Assumption
Complexity analysis is based on the statistical distribution of various substitution patterns.
If a molecular structure contains ring and chain substitution patterns which are common in
• existing drugs, then the structure is likely to be "drug-like" as well as readily synthesisable
• available starting materials, then the structure is likely to be readily synthesisable

Building Complexity Database
Input structure
→ Enumerate chain patterns (1-centred, 2-centred, 3-centred, 4-centred) → Database of chains
→ Enumerate ring/ring substitution patterns → Database of rings/ring substitutions

Atom Substitution Hierarchy
Ring (and chain) substitutions are organised in hierarchies
The hierarchy stores:
• Atom type sequence
• Number of occurrences
• Binding properties
Total occurrences of the topology: 11,801
[Figure: example substitution hierarchy with the number of occurrences shown at each node]

Ligand Complexity Analysis
Uses the DATABASE of hierarchies + frequency of occurrences
1. Enumerate ring and chain patterns
2. Generate canonical names for each atom pattern (e.g. canonical names A, B, C)
3. Match canonical name against the hierarchy roots of the database
4. Retrieval of frequency of occurrences → Calculate score
5. Rank structures by complexity score
Speed of Complexity Analysis: ~1000-1200 structures / minute on Linux PC (3GHz)
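The enumerate → canonical name → frequency lookup sequence can be sketched on a toy molecular graph. This is an illustration under strong assumptions, not the published method: an "n-centred" chain pattern is approximated here by a linear path of n atoms, substituents, ring patterns and the hierarchy structure are ignored, and the names `chain_patterns` and `build_database` are hypothetical.

```python
from collections import Counter


def chain_patterns(atoms, bonds, max_centres=4):
    """Enumerate linear atom paths with 1..max_centres atoms as canonical names.

    atoms: dict mapping atom index -> element symbol
    bonds: iterable of (index, index) pairs
    """
    neighbours = {i: set() for i in atoms}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    names = []

    def extend(path):
        seq = [atoms[i] for i in path]
        # Each path is visited from both ends; keep a single orientation.
        if path[0] <= path[-1]:
            # Canonical name: the lexicographically smaller reading direction.
            names.append("-".join(min(seq, seq[::-1])))
        if len(path) < max_centres:
            for n in neighbours[path[-1]]:
                if n not in path:
                    extend(path + [n])

    for i in atoms:
        extend([i])
    return names


def build_database(molecules):
    """Count pattern occurrences over a collection of (atoms, bonds) graphs."""
    database = Counter()
    for atoms, bonds in molecules:
        database.update(chain_patterns(atoms, bonds))
    return database


# Toy C-C-O fragment (hydrogens omitted).
atoms = {0: "C", 1: "C", 2: "O"}
bonds = [(0, 1), (1, 2)]
database = build_database([(atoms, bonds)])
```

A `Counter` returns 0 for names it has never seen, which matches the idea of patterns that are "not present in the complexity database".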

Calculation of Complexity Score
CONCEPT: Penalise atom patterns which are infrequent or not present in the complexity database.
• Penalty values can be altered to tailor the system for different applications.
• In SPROUT the complexity analysis is followed by ranking the putative ligands according to their evaluated complexity score.
• The penalty values used in the examples presented here are 25, 20, 15 and 10 for 1-, 2-, 3- and 4-centred chain patterns, and 40 and 30 for rings and ring substitutions.
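A minimal sketch of how such penalties might be combined into a score follows. Only the six penalty values come from the slide; the frequency cut-off `min_count`, the flat summation and the function names are assumptions made for illustration, and the real scoring may weight frequencies differently.

```python
# Penalty values quoted on the slide.
PENALTIES = {
    "chain1": 25,
    "chain2": 20,
    "chain3": 15,
    "chain4": 10,
    "ring": 40,
    "ring_subst": 30,
}


def complexity_score(patterns, database, min_count=5):
    """Sum the penalties of patterns that are rare or absent in the database.

    patterns:  list of (kind, canonical_name) pairs found in one ligand
    database:  mapping canonical_name -> number of occurrences
    min_count: assumed cut-off below which a pattern counts as infrequent
    """
    return sum(
        PENALTIES[kind]
        for kind, name in patterns
        if database.get(name, 0) < min_count
    )


def rank_by_complexity(ligands, database):
    """Return (score, ligand_id) pairs ordered from least to most complex."""
    return sorted(
        (complexity_score(patterns, database), ligand_id)
        for ligand_id, patterns in ligands.items()
    )


# Hypothetical frequencies and ligands, for illustration only.
database = {"C-C": 100, "C-N": 2}
ligands = {
    "ligand_1": [("chain2", "C-C")],
    "ligand_2": [("chain2", "C-C"), ("chain2", "C-N"), ("ring", "unseen_ring")],
}
ranking = rank_by_complexity(ligands, database)
```

Changing the values in `PENALTIES` is the analogue of tailoring the system for different applications.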

Validation Experiment: Comparison with CAESA
Both methods were used to estimate synthetic accessibility for the same set of 50 top selling drugs

CAESA vs. Complexity Analysis
Complexity scores are calculated using the complexity database derived from available starting materials (SMs) + 2.0 penalty for each identified stereo centre in the structures.
Elapsed time:
• CAESA: 703 sec
• Complexity Analysis: 8 sec

Complexity Analysis vs CAESA • More suitable for prioritization of thousands of structures within a reasonable time frame. • Provides acceptable compromise between the speed of the analysis and the accuracy of calculated scores. • Because this approach is based on characteristics of existing readily available compounds, simple but novel structural features may be wrongly identified as complex

Yet another alternative approach
Build synthetic feasibility into the structure generation process

SynSPROUT Approach
Ease of synthesis is a key factor in drug development
Build synthetic constraints into structure generation process
• Classic SPROUT: fuse, spiro, new bond
• SynSPROUT: built in / user defined reactions (Amide formation, Ether formation, Ester formation, Amine alkylation, Reductive amination etc.)
SynSPROUT Scheme
Fragment Library (pool of readily available starting materials) + Synthetic Knowledge Base (reliable high yielding reactions) → VIRTUAL SYNTHESIS IN RECEPTOR CAVITY → Readily synthesisable putative ligand structures

Current Status • Promising structures with estimated high binding affinity • SynSPROUT provides the equivalent to screening a large number of combinatorial libraries • Potential for suggesting starting points for new combinatorial libraries • Combination of a large starting material library with a large reaction knowledgebase causes a combinatorial problem – even with parallel processing • Restricting either size of library or number of synthetic reactions gives acceptable run times

De Novo Structure Generation vs. Lead Optimization
De Novo Structure Generation
• AIM: To generate diverse putative ligands from scratch
• No structural information from any existing bound ligand is utilised
Lead Optimization
• AIM: To suggest better ligands structurally similar to the bound one
• The structure of a good bound ligand provides a starting point (core)

Variations on the SynSPROUT Theme: SPROUT LeadOpt
Two modes for structure based lead optimisation
• Core Extension: extends core structure (derived from lead) by virtual synthetic chemistry
• Monomer Replacement: replaces monomers which have been identified by retrosynthetic analysis of a lead compound

Core Extension • Import the modified bound ligand (core) + identify substitution points (functional groups) • Generate core + monomer product by performing virtual synthetic reaction(s) at selected functional groups • Estimate binding affinity for products

Core Extension Scheme (General Scheme)
• Core Structure
• Monomer Library: multiple low energy conformers + detected functional groups
• Synthetic Knowledge Base: list of reactions (between functional groups)
• Simulate synthetic reaction in the 3D context of receptor site
• All possible core + monomer combinations are generated
[Figure: CORE combined with monomers R11-R13, R21-R23 and R31-R33]

Automatic Monomer Library Generation
SDF file of 3D monomers
→ Atom & Ring Perception (Perception Knowledge Base): aromaticity, normalisation, hybridisation, H-bonding properties
→ Detect Functional Groups (joining points) (Synthetic Knowledge Base: functional groups, synthetic rules)
→ Monomer Library: multiple low energy conformers + detected functional groups

Synthetic Knowledge Base
[Figure: atom numbering (1-5) of the two reacting functional groups]
CHEMICAL-LABEL <Carboxylic Acid> C[SPCENTRE=2](=O)-O[HS=1]
CHEMICAL-LABEL <Primary Amine> C-N[HS=2];[CONNECTION=1]
EXPLANATION Amide Formation
IF Carboxylic Acid INTER Primary Amine
THEN
    delete-atom 3
    change-hybridization 5 to SP2
    form-bond - between 1 and 5
    DIHEDRAL-ATOMS 2 1 5 4
    DIHEDRAL 0 0
    BOND-LENGTH 1.35
END-THEN
Steps of Joining Rules
• Steps of formation
• Hybridization changes
• Bond type
• Bond length
• Dihedral penalty/angle
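Because the joining rules are plain text, they are straightforward to read into a data structure. The sketch below parses the amide formation rule shown on the slide; the line breaks in `RULE_TEXT`, the `parse_rule` helper and the dictionary layout are assumptions for illustration, not the format SynSPROUT actually uses internally. The two `DIHEDRAL` values are stored without interpretation, since the slide does not say which is the angle and which is the penalty.

```python
RULE_TEXT = """\
EXPLANATION Amide Formation
IF Carboxylic Acid INTER Primary Amine
THEN
delete-atom 3
change-hybridization 5 to SP2
form-bond - between 1 and 5
DIHEDRAL-ATOMS 2 1 5 4
DIHEDRAL 0 0
BOND-LENGTH 1.35
END-THEN
"""


def parse_rule(text):
    """Turn one joining rule into a dictionary of reaction steps and geometry."""
    rule = {"steps": []}
    in_then = False
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        head = tokens[0]
        if head == "EXPLANATION":
            rule["name"] = " ".join(tokens[1:])
        elif head == "IF":
            left, right = " ".join(tokens[1:]).split(" INTER ")
            rule["groups"] = (left, right)
        elif head == "THEN":
            in_then = True
        elif head == "END-THEN":
            in_then = False
        elif in_then:
            if head == "delete-atom":
                rule["steps"].append(("delete-atom", int(tokens[1])))
            elif head == "change-hybridization":
                # e.g. "change-hybridization 5 to SP2"
                rule["steps"].append((head, int(tokens[1]), tokens[3]))
            elif head == "form-bond":
                # e.g. "form-bond - between 1 and 5" ("-" is the bond type)
                rule["steps"].append((head, tokens[1], int(tokens[3]), int(tokens[5])))
            elif head == "DIHEDRAL-ATOMS":
                rule["dihedral_atoms"] = tuple(int(t) for t in tokens[1:])
            elif head == "DIHEDRAL":
                rule["dihedral"] = tuple(float(t) for t in tokens[1:])
            elif head == "BOND-LENGTH":
                rule["bond_length"] = float(tokens[1])
    return rule


amide_rule = parse_rule(RULE_TEXT)
```

Keeping the rule as data rather than code mirrors the slide's point that the knowledge base is editable by chemists.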

Importing the Core Structure (from MOL/PDB file in Elephant module)
• When importing from a PDB file, a pdb→mol converter is invoked
• Functional group(s) are automatically detected when the core structure is imported into the system
• Hydrogen donor/acceptor or spherical target sites anchor the imported core structure inside the receptor cavity, partially restricting the displacement of the core during lead optimization, but allowing slight movements in order to avoid boundary violations.

Product Generation I.
Step I. Generate products by mimicking synthetic reactions between core + monomers
[Figure: Core with R1 and R2 attached; reactions shown: Sulphonamide Formation, Amide Formation]

Product Generation II.
Step II. Ligand flexibility = generate multiple low energy conformers
• Primary monomer conformers generated by (a) CORINA + ROTATE (b) sampling discrete dihedral angles around formed bonds
• Secondary conformers generated by twisting about rotatable bonds of the low energy monomer conformers
• User defined parameters: max deviation, sampling of dihedral angles, max penalty
• Rigid body docking

Product Generation III. Step III. • Docking + rejection of conformers with • High internal energy • Boundary violation

Multiple Extension Points: Combinatorial Problem
• Clients-Master-Slaves architecture
• Mixed SGI/Linux cluster network (TCP/IP socket network communication)
• Each slave performs optimization on a different core + monomer combination
[Figure: Client1, Client2, Client3 … connect to the Master, which passes CORE + R1/R2/R3 combinations to Slave1, Slave2, Slave3 … on Linux and SGI machines]
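The division of work can be sketched in a few lines. This is only an analogy: a thread pool inside one process stands in for slave processes on a cluster communicating over TCP/IP sockets, `optimise` is a placeholder that returns a dummy number instead of docking anything, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def optimise(combination):
    """Placeholder for the work one slave does on one core + monomer combination.

    Returns the combination with a dummy score; a real worker would build the
    product, dock it and estimate its binding affinity.
    """
    core, r1, r2 = combination
    return combination, -float(len(r1) + len(r2))


def run_master(core, r1_library, r2_library, n_workers=3):
    """Enumerate all core + R1 + R2 combinations and farm them out to workers."""
    combinations = [(core, r1, r2) for r1, r2 in product(r1_library, r2_library)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(optimise, combinations))
    # Lowest (most negative) dummy score first.
    return sorted(results, key=lambda item: item[1])


results = run_master("CORE", ["a", "bb"], ["c"])
```

The number of combinations is the product of the library sizes, which is why the problem grows so quickly with more extension points.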

Case Study (CDK2)
PDB: 1KE8
[Figure: CORE with extension points R1 and R2]

Case Study (CDK2): Monomer Reagent Library Generation
Maybridge & Aldrich (~140,000 2D structures)
Applied filters:
• Number of heavy atoms ≥ 8
• Number of heavy atoms ≤ 16
• Number of acceptor atoms ≤ 5
• Number of donor atoms ≤ 3
• Number of rotatable bonds ≤ 2
• Max chain length ≤ 3
• Allowed atom types: H, B, C, N, O, F, S, Cl, Br
• Number of rings ≤ 3
• Stereo centres ≤ 1
• No 3-, 4-, 7-, 8- or 9-membered rings
→ 1171 2D structures, each with at least one of the following functional groups: Carboxylic Acid, Primary Amine, Primary Alkyl Halide, Carbonyl
→ CORINA + ROTATE
→ Monomer Library: 4557 3D conformers
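The filter list translates directly into a predicate. In the sketch below the thresholds are those quoted on the slide, while the `MonomerDescriptors` record and the assumption that every descriptor has already been computed by some cheminformatics tool are illustrative choices; no particular toolkit is implied.

```python
from dataclasses import dataclass

ALLOWED_ELEMENTS = {"H", "B", "C", "N", "O", "F", "S", "Cl", "Br"}
FORBIDDEN_RING_SIZES = {3, 4, 7, 8, 9}
REQUIRED_GROUPS = {"Carboxylic Acid", "Primary Amine", "Primary Alkyl Halide", "Carbonyl"}


@dataclass
class MonomerDescriptors:
    """Precomputed descriptors for one candidate monomer (hypothetical record)."""
    heavy_atoms: int
    acceptors: int
    donors: int
    rotatable_bonds: int
    max_chain_length: int
    elements: set
    ring_sizes: list
    stereo_centres: int
    functional_groups: set


def passes_filters(m):
    """Apply the monomer library filters listed on the slide."""
    return (
        8 <= m.heavy_atoms <= 16
        and m.acceptors <= 5
        and m.donors <= 3
        and m.rotatable_bonds <= 2
        and m.max_chain_length <= 3
        and m.elements <= ALLOWED_ELEMENTS          # subset test
        and len(m.ring_sizes) <= 3
        and m.stereo_centres <= 1
        and not FORBIDDEN_RING_SIZES.intersection(m.ring_sizes)
        and bool(REQUIRED_GROUPS & m.functional_groups)
    )


# Hypothetical descriptor values for two candidates.
small_acid = MonomerDescriptors(9, 2, 1, 1, 1, {"C", "H", "O"}, [6], 0, {"Carboxylic Acid"})
strained = MonomerDescriptors(9, 2, 1, 1, 1, {"C", "H", "O"}, [3], 0, {"Carboxylic Acid"})
```

Here `small_acid` passes every test, while `strained` is rejected solely because it contains a 3-membered ring.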

Case Study (CDK2)
[Figure: CORE with extension points R1 and R2]
Primary amine reacts with
• Carboxylic acid in amide reaction
• Primary alkyl halide in amine alkylation reaction
• Carbonyl in reductive amination and imine formation
Sulphonyl chloride reacts with
• Primary amine in sulphonamide formation

Case Study (CDK2): Results
• Elapsed time ~ 5 hours (with 100 slave processors)
• R1 Monomer Library x R2 Monomer Library = 432,345 R1 + Core + R2 combinations
• Screened 81.23%
• Failed 4.87%
• Accepted 13.90% (54,123)
Monomer counts: 523 Primary Amine, 293 Carboxylic Acid, 93 Primary Alkyl Halide, 393 Carbonyl

Case Study (Generated Products)
[Figure: eight generated products labelled with the values -7.95, -7.47, -7.82, -7.56, -7.75, -7.45, -7.60, -7.07]

Monomer Replacement
• Many lead compounds are composed of readily available starting materials (monomers) linked by reliable high yielding reactions
• Retrosynthetic analysis can be used to identify the monomers
• Structurally related analogues could be generated by exhaustive monomer replacement
• Considerable efficiency gains if the monomer library is arranged in a hierarchy based on substructural relationships

Hierarchy Construction
[Figure: amide monomer hierarchy; pairs of structures are labelled Substructure, Superstructure or No overlap]

Hierarchy Usage Amide

Monomer Replacement
Retro-synthetic analysis → Do the monomers exist in the starting materials HIERARCHY?

CASE STUDY: Optimisation of SPROUT designed inhibitors of P. falciparum Dihydro-orotate Dehydrogenase using Monomer Replacement

High scoring monomer replacement results
Monomer replacement gave 840 new structures (including multiple conformers of the same structure). Scores – 7.50 to 9.30.

Experimental Results for Some Ligands Suggested by SPROUT LeadOpt Monomer Replacement

Conclusions • Scoring functions for assessment of binding affinity of the hypothetical compounds produced by de novo design are far from perfect • Hence only readily synthesisable putative ligands will undergo experimental evaluation by medicinal chemists • Assessment of synthetic feasibility is a tractable problem
