2nd Joint Sheffield Conference on Chemoinformatics: Computational Tools for Lead Discovery Flexsim-R: A new 3D descriptor for combinatorial library design and in-silico screening
Outline • Introduction • The Flexsim-R Methodology • Validation • Conclusion and Outlook
Introduction What is Flexsim-R? Flexsim-R calculates 3D descriptors for reagents, based on the virtual affinity fingerprint idea
Motivation to develop Flexsim-R • Reagent-based descriptors are important for • combinatorial library design • virtual screening experiments • bioisosteric replacements • rational augmentation of the in-house reagent pool • For large combinatorial libraries, product-based descriptor calculation is often not feasible -> possible solution: reagent-based product selection (e.g. by a genetic algorithm, GA) • Descriptor calculation should be fast and automatable • Descriptors should be related to experimental affinity data • Encouragement by virtual affinity fingerprint methods
In-vitro Affinity Fingerprints Terrapin's affinity fingerprint approach (Kauvar et al., Chemistry & Biology, 1995, 2, 107-118): molecular similarity is defined by the in-vitro binding patterns ("affinity fingerprints") of a ligand set (L) in reference binding assays (A) [Figure: binding matrix of ligands L1-L6 against reference assays A1-A8]
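A minimal sketch of the affinity fingerprint idea: each ligand is a vector of binding values across the same reference assays, and similarity is judged by comparing those vectors. The binding values are invented for illustration, and Euclidean distance is an assumed metric here; the original work may compare fingerprints differently.

```python
import math

# Hypothetical affinity fingerprints: one binding value per reference assay.
# The numbers are invented for illustration only (arbitrary units).
fingerprints = {
    "L1": [5.2, 6.8, 4.1, 7.0],
    "L2": [5.0, 6.9, 4.3, 6.8],
    "L3": [8.1, 3.2, 7.5, 2.9],
}


def euclidean(a, b):
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, fps):
    """Name of the ligand whose binding pattern is closest to the query's."""
    others = [name for name in fps if name != query]
    return min(others, key=lambda name: euclidean(fps[query], fps[name]))


print(most_similar("L1", fingerprints))  # -> L2
```

Note that L1 and L2 come out as neighbours purely because they bind the assay panel in the same way; no structural information enters the comparison.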
Virtual Affinity Fingerprints (VAF) Terrapin's in-vitro screening in diverse reference assays is simulated • by computational docking into a reference panel of protein pockets (Docksim, Flexsim-X) • by computational fitting onto a reference panel of small molecules (Flexsim-S) (Briem and Lessel, Perspectives in Drug Discovery and Design, 20 (2000) 231-244)
The Flexsim-R Method Problems with Rgroups in conventional VAF approaches: • Rgroups tend to be smaller than "drug-like" molecules • The alignment rule given by the common core attachment point gets lost Solution: core-constrained multiple-site docking
The Flexsim-R Method Components of core-constrained multiple-site docking: 1. Rgroup set 2. Common core 3. Protein binding pockets
The Flexsim-R Method First step: • Docking of the common core group with FlexX • Multiple solutions (e.g. the 50 best) are stored • An RMS threshold can be applied to prevent the stored core positions from clustering
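One way to combine "keep the N best solutions" with an RMS threshold is a greedy diversity filter, sketched below. The greedy order, the function names and the assumption that a lower score is better are ours; poses are compared in place (no superposition) because all of them sit in the same pocket frame.

```python
import math


def rmsd(a, b):
    """RMSD between two poses given as equally ordered lists of (x, y, z) atoms.
    No superposition: both poses already live in the same pocket frame."""
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def select_core_positions(poses, n_best=50, rms_threshold=2.0):
    """Walk poses from best to worst score and keep a pose only if it differs from
    every already kept pose by more than rms_threshold (assumed greedy scheme).
    `poses` is a list of (score, coords); a lower score is assumed to be better."""
    kept = []
    for score, coords in sorted(poses, key=lambda p: p[0]):
        if all(rmsd(coords, k[1]) > rms_threshold for k in kept):
            kept.append((score, coords))
        if len(kept) == n_best:
            break
    return kept


# Toy two-atom "core": pose B is a near-duplicate of pose A, pose C is distinct.
poses = [
    (-20.0, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    (-18.0, [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]),
    (-15.0, [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]),
]
print([s for s, _ in select_core_positions(poses)])  # -> [-20.0, -15.0]
```

Pose B is dropped because it lies only 0.5 RMS units from the better-scoring pose A, which is exactly the kind of clustering the threshold is meant to suppress.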
The Flexsim-R Method Example: Thrombin active site with the 50 best FlexX solutions of hydantoin (RMS threshold = 2.0)
The Flexsim-R Method Second step: • Docking of core group + rgroup with FlexX • Pre-stored core positions serve as reference • FlexX scores are stored in a descriptor matrix (one matrix per protein pocket)
Descriptor matrix (example FlexX scores):
Rgroup   Core Pos1   Core Pos2   ...
R1       15.5        ...         ...
R2       11.2        ...         ...
R3       21.7        ...         ...
...
Other scores shown in the matrix: 15.7, 22.0, 13.5
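The second step amounts to filling a table: one row per rgroup, one column per pre-stored core position. In this sketch `dock_score` is a hypothetical callback standing in for a core-constrained FlexX run; FlexX itself is not called, and `toy_score` is a placeholder, not a scoring function.

```python
def build_descriptor_matrix(rgroups, core_positions, dock_score):
    """Fill one pocket's descriptor matrix.

    Rows correspond to rgroups, columns to the pre-stored core positions.
    `dock_score(rgroup, core_position)` is a hypothetical callback standing in for a
    core-constrained FlexX run of core + rgroup; it must return one number.
    """
    return {r: [dock_score(r, pos) for pos in core_positions] for r in rgroups}


# Toy stand-in for docking: NOT a real scoring function.
def toy_score(rgroup, pos):
    return float(len(rgroup) + pos)


matrix = build_descriptor_matrix(["Gly", "Phe"], [0, 1, 2], toy_score)
print(matrix["Gly"])  # -> [3.0, 4.0, 5.0]
```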
The Flexsim-R Method Multiple protein pockets -> concatenated descriptor matrix [Figure: per-pocket matrices for Pocket 1, Pocket 2 and Pocket 3 (columns: core positions C1, C2, C3; rows: R1, R2, R3, ...) joined side by side]
The Flexsim-R Method Multiple core attachment points -> concatenated descriptor matrix [Figure: blocks for attachment points X1-X4 (columns: core positions C1, C2, C3; rows: R1, R2, R3, ...) joined side by side]
The Flexsim-R Method Example: hydantoin core, 4 attachment points × 7 protein pockets × 50 FlexX solutions -> descriptor vector length = 1,400
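The concatenation over pockets and attachment points, and the resulting vector length, can be checked with a short sketch. Each block is one (attachment point, pocket) matrix in the row-per-rgroup layout; the all-zero blocks are placeholders used only to confirm the 4 × 7 × 50 = 1,400 arithmetic.

```python
def concatenate_blocks(blocks):
    """Join per-block descriptor rows side by side.

    `blocks` is an ordered list of dicts {rgroup: [scores]}, one per
    (attachment point, protein pocket) combination; all share the same rgroups.
    """
    rgroups = blocks[0].keys()
    return {r: [v for block in blocks for v in block[r]] for r in rgroups}


# Hydantoin example: 4 attachment points x 7 pockets, 50 core positions each.
n_points, n_pockets, n_positions = 4, 7, 50
blocks = [
    {"R1": [0.0] * n_positions, "R2": [0.0] * n_positions}  # placeholder scores
    for _ in range(n_points * n_pockets)
]
full = concatenate_blocks(blocks)
print(len(full["R1"]))  # -> 1400
```

Keeping a fixed block order matters: column k must refer to the same attachment point, pocket and core position for every rgroup, otherwise the vectors are not comparable.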
The Flexsim-R Method Test set for method development and evaluation: • Rgroups: 20 natural amino acids • Core groups: • 7 protein pockets: 1dwc, 1eed, 1pop, 2tsc, 3cla, 3dfr, 5ht2 (model)
Correlation Analysis • Analyses were performed to check the correlation between • different protein pockets • different cores • different attachment points • Analyses are based on Euclidean distance matrices over all 190 pairwise combinations of the 20 amino acid descriptor vectors
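A sketch of how two descriptor sets (e.g. two pockets) could be compared through their distance matrices: for 20 amino acids there are 20 × 19 / 2 = 190 pairwise distances per set, and the two distance lists are correlated. Using Pearson's r on the condensed distance lists (a Mantel-style comparison) is our assumption; the descriptors are random placeholders.

```python
import math
import random
from itertools import combinations


def pairwise_distances(vectors):
    """Euclidean distances for all unordered pairs (i < j): n*(n-1)/2 values."""
    return [math.dist(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented descriptors for 20 amino acids in two hypothetical pockets A and B.
rng = random.Random(0)
pocket_a = [[rng.uniform(0, 30) for _ in range(50)] for _ in range(20)]
pocket_b = [[2.0 * v for v in row] for row in pocket_a]  # B is just A rescaled

d_a, d_b = pairwise_distances(pocket_a), pairwise_distances(pocket_b)
print(len(d_a))  # -> 190
print(round(pearson(d_a, d_b), 6))  # -> 1.0
```

Comparing distance matrices rather than raw columns lets descriptor sets of different length (e.g. different numbers of core positions) be compared directly, since each yields the same 190 amino acid pairs.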
Correlation Analysis • Correlation matrix of protein pockets: (hydantoin core, all 4 attachment points)
Correlation Analysis • Correlation matrix of core groups: (all 7 protein pockets, all attachment points)
Correlation Analysis • Correlation matrix of attachment points: (hydantoin core, all 7 protein pockets)
Correlation Analysis Reduction of descriptor vector length (dimensionality): • no PCA was performed, since we want to retain information about the least correlated original descriptor columns • instead, an elimination method has been applied: • the complete pairwise correlation matrix is calculated • all pairs of columns with a correlation coefficient (r) above a user-defined threshold (e.g. 0.7) are considered for elimination • from each correlating pair, the column that is better described by multiple linear regression on the remaining descriptors is eliminated • the resulting matrix does not contain any pair of columns with a correlation coefficient above the threshold
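The elimination procedure can be sketched as a loop. Two details are our assumptions rather than the authors': the most strongly correlated pair is handled first, and "better described" is measured as the R² of an ordinary least-squares fit (with intercept) on all other remaining columns. In a wide matrix such as 20 amino acids × 1,400 columns an unregularized fit on all remaining columns is underdetermined (R² = 1 for every column), so this toy version is only meaningful with more rows than columns.

```python
import math
from itertools import combinations


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictor columns plus an
    intercept, computed by projecting y onto an orthonormal basis (Gram-Schmidt)."""
    n = len(y)
    basis = []
    for v in [[1.0] * n] + [list(p) for p in predictors]:
        for b in basis:  # remove components along the basis built so far
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-10:  # skip columns that are linear combinations of earlier ones
            basis.append([vi / norm for vi in v])
    fitted = [0.0] * n
    for b in basis:
        dot = sum(yi * bi for yi, bi in zip(y, b))
        fitted = [fi + dot * bi for fi, bi in zip(fitted, b)]
    mean = sum(y) / n
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def eliminate_correlated(matrix, names, threshold=0.7):
    """Drop columns until no remaining pair has |r| > threshold; return kept names."""
    cols = {j: [row[j] for row in matrix] for j in range(len(names))}
    keep = list(cols)
    while True:
        above = [(abs(pearson(cols[a], cols[b])), a, b)
                 for a, b in combinations(keep, 2)]
        above = [p for p in above if p[0] > threshold]
        if not above:
            return [names[j] for j in keep]
        _, a, b = max(above)  # assumption: handle the most correlated pair first
        # Of the two, remove the column that the other remaining columns explain better.
        r2 = {j: r_squared(cols[j], [cols[k] for k in keep if k != j]) for j in (a, b)}
        keep.remove(a if r2[a] >= r2[b] else b)


# Toy data: x1 is x0 plus a small wiggle, x2 is unrelated to both.
data = [[float(i), i + (0.3 if i % 2 else -0.3), float((7 * i) % 5)] for i in range(12)]
print(eliminate_correlated(data, ["x0", "x1", "x2"]))
```

On the toy data exactly one of x0 and x1 is removed, while x2 survives because its correlation with both is close to zero.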
Correlation Analysis Example: hydantoin core, all 7 proteins, all 4 attachment points [Figure: descriptor sets 1, 2 and 3]
Correlation Analysis Thrombin with the three most information-rich core positions
Descriptor Validation • Five peptide datasets, taken from the literature (refs. in Matter, H., J. Peptide Res. 52 (1998) 305-314) • Product descriptors are generated by concatenating the respective reagent descriptors • Validation by PLS analysis • Leave-one-out (LOO) and leave-random-groups-out (LRGO) cross-validation
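The validation bookkeeping can be sketched as follows: a peptide's descriptor is the concatenation of its residue descriptors, LRGO draws random test groups (LOO being the case of one compound per group), and q² is computed from the out-of-group predictions as 1 − PRESS/SS. The PLS fit itself is not shown, the helper names are hypothetical, and taking SS around the mean of all observed values is one common convention that we assume here.

```python
import random


def product_descriptor(residue_descriptors):
    """Concatenate the reagent descriptor vectors of one product in sequence order."""
    return [v for desc in residue_descriptors for v in desc]


def lrgo_folds(n, n_groups, seed=0):
    """Randomly split indices 0..n-1 into n_groups test groups of near-equal size.
    With n_groups == n this reduces to leave-one-out (LOO)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[g::n_groups] for g in range(n_groups)]


def q2(y, y_cv):
    """Cross-validated q^2 = 1 - PRESS / SS, where y_cv holds the predictions each
    compound received while it was in the left-out group."""
    mean = sum(y) / len(y)
    press = sum((obs - pred) ** 2 for obs, pred in zip(y, y_cv))
    ss = sum((obs - mean) ** 2 for obs in y)
    return 1.0 - press / ss


# A dipeptide built from two (invented) residue descriptor vectors.
print(product_descriptor([[1.0, 2.0], [3.0, 4.0]]))  # -> [1.0, 2.0, 3.0, 4.0]
print(lrgo_folds(10, 5))  # five random test groups of two compounds each
print(q2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # -> 0.0
```

A q² of 0 means the cross-validated model predicts no better than the mean of the data; LRGO is usually repeated with several seeds and the resulting q² values averaged.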
Descriptor Validation • Datasets:
Descriptor Validation: Results Leave-random-groups-out (LRGO) results for the datasets ACE, BIT, BRA, ENK and BR9:
Summary • Flexsim-R is a novel virtual affinity fingerprint method that calculates meaningful 3D descriptors for reagents • High correlation between different cores and attachment points • For 3 out of 5 validation sets, significant cross-validated q² values could be obtained • The rgroup alignment problem is tackled inherently • Flexsim-R calculations are fast and can be automated easily: • only clipped reagent structures are required • core positions need to be calculated only once
Outlook • More validation sets have to be tested (e.g. a "real-life" combichem dataset) • Is there a set of descriptors that works well for different datasets? • Integration into the Boehringer Ingelheim library design and virtual screening workflow
Acknowledgements • Alexander Weber (Boehringer Ingelheim/University of Marburg) • Andreas Teckentrup (Boehringer Ingelheim) • Hans Matter (Aventis) • BMBF for financial support