Chemical similarity with Toxmatch 1.03 and Ambit Discovery

1. Chemical similarity with Toxmatch 1.03 and Ambit Discovery. Nina Jeliazkova, Joanna Jaworska

2. Chemical similarity assessment using Ambit Database
   - Exact substructure search based on 2D
   - Structural similarity search (various methods)
   - Criteria on descriptors
   - Based on mechanistic understanding: Verhaar scheme

3. Another view on similarity assessments with Toxmatch and Ambit Discovery
   Ambit Discovery
   - Similarity to a set: the query compound is compared to a summary representation of a set
     - Descriptor space: center of the data cloud
     - Fingerprints: a consensus fingerprint
   Toxmatch
   - Pairwise similarities
   - Similarity to a set: the query compound is compared to its nearest neighbours, selected by various similarity measures
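
The two "summary representations" named on this slide can be illustrated with a minimal sketch. It assumes descriptors are plain lists of numbers and fingerprints are sets of "on" bit positions. The majority rule used for the consensus fingerprint (`min_fraction`) is an assumption made for illustration; the slides do not say how Ambit Discovery builds it.

```python
from math import sqrt

def centroid(descriptor_rows):
    """Mean of each descriptor column: the 'center of the data cloud'."""
    n = len(descriptor_rows)
    return [sum(column) / n for column in zip(*descriptor_rows)]

def distance_to_center(query, descriptor_rows):
    """Euclidean distance from the query compound to the set's centroid."""
    center = centroid(descriptor_rows)
    return sqrt(sum((q - c) ** 2 for q, c in zip(query, center)))

def consensus_fingerprint(fingerprints, min_fraction=0.5):
    """Bits set in at least min_fraction of the fingerprints (assumed rule)."""
    counts = {}
    for fp in fingerprints:
        for bit in fp:
            counts[bit] = counts.get(bit, 0) + 1
    threshold = min_fraction * len(fingerprints)
    return {bit for bit, count in counts.items() if count >= threshold}

# Hypothetical reference set: two descriptors per compound, three fingerprints
reference_descriptors = [[0.0, 0.0], [2.0, 4.0]]
reference_fingerprints = [{1, 2}, {2, 3}, {2, 3}]

center = centroid(reference_descriptors)
dist = distance_to_center([4.0, 6.0], reference_descriptors)
consensus = consensus_fingerprint(reference_fingerprints)
```

The query is then compared once against `center` or `consensus`, instead of against every member of the set.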

4. Similarity searching

5. Similarity searching
   Rationale
   - Based on the Similar Property Principle: "Structurally similar compounds tend to exhibit similar properties"
   Application 1
   - Calculate the pairwise similarity between a compound with known activity and each compound in the database
   - Rank the database on similarity to the known active
   - Select the top n% for further biological testing
   Application 2
   - Calculate the pairwise similarity between a compound with unknown activity and each compound from a set with known activity
   - Rank the dataset on similarity to the query compound
   - Infer the activity of the query compound from the activity of the top n% compounds
   Application 3
   - Calculate the similarity between a compound with unknown activity and two or more sets of compounds with known activity
   - Decide which set is more similar to the query compound
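
Application 1 (rank a database against a known active, then keep the top n%) can be sketched as follows. This is an illustration only: fingerprints are assumed to be sets of "on" bit positions, and the compound names and bit values are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_by_similarity(query, database, similarity=tanimoto):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, similarity(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def top_percent(ranked, percent):
    """Keep the top n% of a ranked list (always at least one entry)."""
    keep = max(1, round(len(ranked) * percent / 100))
    return ranked[:keep]

# Hypothetical database of fingerprints
database = {
    "cmpd_a": {1, 2, 3, 4},
    "cmpd_b": {1, 2, 7},
    "cmpd_c": {8, 9},
    "cmpd_d": {1, 2, 3},
}
known_active = {1, 2, 3}

ranked = rank_by_similarity(known_active, database)
shortlist = top_percent(ranked, 50)
```

Application 2 uses the same ranking with the roles reversed: the query has unknown activity and the ranked neighbours carry the known values.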

6. Chemical similarity quantified
   Numerical representation of chemical structure
   - Structural similarity: 2D, 3D
   - Descriptor-based similarity
   - Field-based
   - Spectral
   - Quantum mechanics
   - More...
   Comparison between numerical representations
   - Distance-like
   - Association, correlation

7. Structural similarity
   - Substructure searching
   - Maximum Common Substructure
   - Fragment approach
     - Atom, bond or ring counts, degree of connectivity
     - Atom-centred, bond-centred, ring-centred fragments
     - Fingerprints, molecular holograms, atom environments

8. Fingerprints with Tanimoto distance

9. 2D: Atom Environment (AE)
   Example: 6-aminoquinoline
   MOLPRINT 2D uses features calculated from the connectivity table. These are similar, for example, to the "augmented atoms" that came up in the late 1970s, or to the Scitegic circular fingerprints. The algorithm consists of two steps:
   - First, Sybyl mol2 atom types, which partially capture physicochemical properties such as hybridization state and lone pairs, are assigned to every heavy atom in the molecule.
   - Second, counts of heavy atoms at a certain number of bonds from the central atom are constructed for every heavy atom in the molecule.
   Thus, a molecule is described by a number of count vectors of atom types. The number of count vectors is equal to the number of heavy atoms in the molecule. (The process is described in more detail in JCICS 2004, 44, 170 and 2004, 44, 1710.)
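
The second step (counting atom types layer by layer around each heavy atom) can be sketched with a breadth-first search. This is not the MOLPRINT 2D implementation: atom typing is assumed to have been done already, the function name `atom_environments` and the `max_depth` default are hypothetical, and the molecule is passed as a list of type labels plus a list of bonds.

```python
from collections import Counter, deque

def atom_environments(atom_types, bonds, max_depth=2):
    """For every heavy atom, count atom types at 0..max_depth bonds away."""
    neighbours = {i: [] for i in range(len(atom_types))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    environments = []
    for centre in range(len(atom_types)):
        # Breadth-first search gives the topological distance from the centre
        distance = {centre: 0}
        queue = deque([centre])
        while queue:
            atom = queue.popleft()
            if distance[atom] == max_depth:
                continue  # do not expand beyond the outermost layer
            for nb in neighbours[atom]:
                if nb not in distance:
                    distance[nb] = distance[atom] + 1
                    queue.append(nb)
        # One Counter of atom types per layer
        layers = [Counter() for _ in range(max_depth + 1)]
        for atom, d in distance.items():
            layers[d][atom_types[atom]] += 1
        environments.append(layers)
    return environments

# Heavy atoms of ethanol (C-C-O) with Sybyl-style labels, for illustration
types = ["C.3", "C.3", "O.3"]
bonds = [(0, 1), (1, 2)]
envs = atom_environments(types, bonds)
```

Each entry of `envs` is one count vector per layer, so the molecule is described by as many environments as it has heavy atoms.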

10. What do we measure
   - We compare numerical representations of chemical compounds
   - The numerical representation is not unique
   - The numerical representation includes only part of all the information about the compound
   - A distance measure reflects "closeness" only if the data satisfy specific assumptions
   - Statistical assumptions and statistical error are involved

11. Fingerprint similarity: Tanimoto coefficient specifics
   - Information loss: fragment presence and absence instead of counts
   - Bit string saturation: within a large database almost all bits are set
   - Can give nonintuitive results:
     - The average similarity appears to increase with the complexity of the query compound
     - Queries for large molecules are more discriminating (flatter curve, Tanimoto values spread wider)
     - Queries for small molecules have a sharp peak and are unable to distinguish between molecules
   Reference: Flower D., On the Properties of Bit String-Based Measures of Chemical Similarity, J. Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998
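
The information-loss point can be made concrete with a small sketch. It compares the binary Tanimoto coefficient with a count-based analogue (sum of minima over sum of maxima), which is swapped in here purely for illustration and is not claimed to be what Toxmatch or Ambit compute. The fragment names and counts are hypothetical.

```python
def tanimoto_bits(a, b):
    """Binary Tanimoto: shared fragments / fragments present in either molecule."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def tanimoto_counts(a, b):
    """Count-based analogue: sum of minima over sum of maxima of fragment counts."""
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    denominator = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return numerator / denominator if denominator else 0.0

# Hypothetical fragment counts: same fragment types, very different amounts
mol_1 = {"C-C": 1, "C-O": 1}
mol_2 = {"C-C": 9, "C-O": 1}

binary = tanimoto_bits(set(mol_1), set(mol_2))   # presence/absence only
counted = tanimoto_counts(mol_1, mol_2)          # keeps the counts
```

With presence/absence bits the two molecules look identical, while the count-based value separates them clearly.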

12. Descriptor similarity: distance specifics
   - Euclidean distance
   - City-block distance
   - Mahalanobis distance
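
For reference, the standard textbook definitions of the three distances are given below. The notation is an assumption, since the slide shows no formulas: x and y are descriptor vectors of two compounds with n components, and S is the covariance matrix of the descriptors estimated from the data set.

```latex
\begin{align}
d_{\text{Euclidean}}(x, y)   &= \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \\
d_{\text{city-block}}(x, y)  &= \sum_{i=1}^{n} \lvert x_i - y_i \rvert \\
d_{\text{Mahalanobis}}(x, y) &= \sqrt{(x - y)^{\mathsf{T}} S^{-1} (x - y)}
\end{align}
```

The Mahalanobis distance reduces to the Euclidean distance when S is the identity matrix, that is, when the descriptors are uncorrelated and have unit variance.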

13. Chemical similarity by distance in relevant descriptor space
   - Data set: phenols, toxicity data to the ciliate Tetrahymena pyriformis
   - Data taken from the Schüürmann article in Quant. Struct.-Act. Relat., 21 (2002), "Multivariate Discrimination between Modes of Toxic Action of Phenols"
   - Descriptors: pKa (acidity constant); frontier orbital energies E_HOMO, E_LUMO (more, not illustrated)
   - Yellow: polar narcotics
   - Red: respiratory uncouplers
   The blue point will be classified as most similar to the yellow class (a), the red class (b) or not classified (c), according to the three different classification models. The probabilistic classifier gives the most accurate prediction.

14. Do structurally similar molecules have similar biological activity?
   - Set of 1645 chemicals with IC50s for monoamine oxidase inhibition
   - Daylight fingerprints, 1024 bits long (0-7 bonds)
   - When using the Tanimoto coefficient with a cut-off value of 0.85, only 30% of actives were detected

15. Structurally similar compounds can have very different 3D properties
   Usually the modeller resorts to similarity in structures in the hope that structurally similar compounds will also have the same mechanism of action [[i]]. This is a widely used approach, but the hope does not always come true. Several surprising structure-activity relationships demonstrate that chemically similar compounds may have significantly different biological actions and activities, and that different molecules can be very similar in their biological activities. Applying the results from one congeneric series to another may lead to completely wrong conclusions [54, [ii], [iii], [iv], [v]]. As illustrated in [Kubinyi], structurally similar compounds (in this example, eight compounds with the same connectivity, differing in only one or two substituents) can have very different volume and surface potentials, hydrophobic and polar regions, hydrogen bond donor potentials, hydrogen bond acceptor potentials and molecular electrostatic potentials. This also contradicts the long-repeated "basics of QSAR", which assert that similar compounds have similar properties and dissimilar compounds have dissimilar properties.
   References
   [[i]] Barratt, M.D., Castell, J.V., Chamberlain, M., Combes, R.D., Dearden, J.C., Fentem, J.H., Gerner, I., Giuliani, A., Gray, T.J.B., Livingstone, D.J., McLean Provan, W., Rutten, F.J.J.A.L., Verhaar, H.J.M. and Zbinden, P., The Integrated Use of Alternative Approaches for Predicting Toxic Hazard: The Report and Recommendations of ECVAM Workshop 8, http://altweb.jhsph.edu/publications/ECVAM/ecvam08.htm
   [[ii]] Burger, A., Isosterism and bioisosterism in drug design, Prog. Drug. Res., 37, 287-371 (1991).
   [[iii]] Patani, G.A. and LaVoie, E.J., Bioisosterism: A rational approach in drug design, Chem. Rev., 96, 3147-3176 (1996).
   [[iv]] Kubinyi, H., Similarity and Dissimilarity - A Medicinal Chemist's View, in: 3D QSAR in Drug Design. Volume II. Ligand-Protein Interactions and Molecular Similarity, H. Kubinyi, G. Folkers and Y. C. Martin, Eds., Kluwer/ESCOM, Dordrecht (1998), 225-252; also published in: Persp. Drug Design Discov. 9/10/11, 225-252 (1998).
   [[v]] Kubinyi, H., Chemical Similarity and Biological Activity, 3rd Workshop on Chemical Structure and Biological Activity: Perspectives on QSAR 2001 (November 8-10, 2001), Sao Paulo, Brazil, http://arara.iq.usp.br/l6.htm; Kubinyi, H., Chemical Similarity and Biological Activity, Hugo Kubinyi Lectures, http://home.t-online.de/home/kubinyi/dd-06.pdf

16. Back to the meaning of chemical similarity
   - Structural similarity (the available methods)
   - Similarity in activity (the need)
   - Reconciliation: similarity in "tailored" space
     - Proximity in relevant descriptor space
     - Structural similarity based on mechanism of action
     - Weighted structural similarity
     - Data fusion: combine different methods

17. What kind of searches are desirable?
   - Detailed analysis for pairwise similarity: Toxmatch
   - Similarity of a compound to compounds in the database: Ambit Database Tools
   - Similarity of a compound to a reference set: Toxmatch
   - Similarity of a set of compounds to compounds in the database: Ambit Discovery, Toxmatch
   - Grouping based on chemical class: Toxtree, Ambit db

18. AMBIT Discovery: software for applicability domain and similarity assessment

19. AMBIT Discovery: data visualisation

20. AMBIT Discovery: results

21. AMBIT Discovery: results (exported to MS Excel file)

22. Ambit Discovery applications
   Variety of ways to explore similarities
   - Based on properties
   - Based on structural similarity
   Application
   - Consensus domain for a robust assessment
     - SAR difficult due to multiple functional groups, multi-target toxicity
   - Ascribing a chemical to a particular group defined by an expert
     - Mechanistic reactivity domains for skin sensitization
   Reference: J. Jaworska, N. Nikolova-Jeliazkova, How can structural similarity analysis help in category formation, SAR and QSAR in Environmental Research, Vol. 18, No. 3-4, June 2007, 1-13

23. What can Toxmatch do for you?
   - Assess pairwise similarity
   - Classify into groups
   - Predict activity
   Pairwise similarities between each analyzed compound and the compounds in the training/test sets are calculated. A composite (averaged) similarity measure between a compound and a user-selected subset of compounds is computed. Besides predefined subsets for certain training data sets, subsets can be selected manually or automatically by a clustering algorithm. The software allows compounds to be ranked according to a selected similarity index.

24. Similarity
   What is implemented
   - Structural: fingerprints (Tanimoto), atom environments, MCCS
   - Descriptor: Euclidean distance, Hodgkin-Richards, Cosine, Tanimoto
   - Set of rules for specific activities (e.g. BfR rules)
   How to use similarity
   - Similarity values say nothing about biological activity
   - Similarity calculation has to be linked to a specific activity
     - Prediction: predict the dependent variable (activity)
     - Classification: classify into groups of activity

25. Similarity indices
   Distance-like similarity indices
   - General definition
   - Euclidean Distance Index (k=x=2)
   Correlation-like similarity indices
   - General definition
   - Hodgkin-Richards index
   - Tanimoto index
   - Cosine-like similarity index or Carbó index
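
The formulas on this slide did not survive in the transcript. As a hedged illustration, the sketch below uses the commonly cited forms of the three correlation-like indices for real-valued descriptor vectors. The exact definitions and any scaling used inside Toxmatch are not confirmed by the slides, and the general definitions with parameters k and x are not reproduced here.

```python
from math import sqrt

def dot(a, b):
    """Inner product of two equal-length descriptor vectors."""
    return sum(x * y for x, y in zip(a, b))

def hodgkin_richards(a, b):
    """2(a.b) / (a.a + b.b); assumes the vectors are not both zero."""
    return 2 * dot(a, b) / (dot(a, a) + dot(b, b))

def tanimoto(a, b):
    """(a.b) / (a.a + b.b - a.b); assumes the vectors are not both zero."""
    return dot(a, b) / (dot(a, a) + dot(b, b) - dot(a, b))

def cosine(a, b):
    """(a.b) / sqrt((a.a)(b.b)), the Carbo index; assumes non-zero vectors."""
    return dot(a, b) / sqrt(dot(a, a) * dot(b, b))

# Hypothetical descriptor vectors: b is a scaled copy of a
a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]

hr = hodgkin_richards(a, b)
t = tanimoto(a, b)
c = cosine(a, b)
```

Because `b` is `a` scaled by two, the cosine index reports full similarity, while the Hodgkin-Richards and Tanimoto indices penalise the difference in magnitude.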

26. Toxmatch: main screen

27. Similarity and activity
   Background
   - Prediction: predict the dependent variable (activity)
   - Classification: classify into groups of activity
   Implementation
   - Prediction: find the k most similar compounds and predict activity based on the activities of those compounds (weighted average)
   - Classification: classify the query compound into the group to which most of the k most similar compounds belong

28. Activity prediction by similarity
   - Predict the dependent variable (activity)
   - Measured activity values should be available for the training set
   - Find the k most similar compounds and predict activity based on the activities of these compounds
     - The actual set of k most similar compounds depends on the similarity measure
     - The predicted value is the weighted average of the k activities
   Reported values
   - Similarity to the entire data set
   - Predicted activity value
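
The prediction step can be sketched as a similarity-weighted average over the k nearest neighbours. The slide says only "weighted average", so using the similarity values themselves as weights is an assumption, and the function name `predict_activity` and the example numbers are hypothetical.

```python
def predict_activity(similarities, activities, k=3):
    """Similarity-weighted mean activity of the k most similar training compounds."""
    # Pair each training compound's similarity with its measured activity,
    # then keep the k pairs with the highest similarity to the query.
    ranked = sorted(zip(similarities, activities),
                    key=lambda pair: pair[0], reverse=True)[:k]
    total_weight = sum(s for s, _ in ranked)
    if total_weight == 0:
        raise ValueError("all selected neighbours have zero similarity")
    return sum(s * a for s, a in ranked) / total_weight

# Hypothetical similarities of the query to four training compounds
similarities = [0.9, 0.5, 0.1, 0.6]
activities = [2.0, 4.0, 10.0, 3.0]

predicted = predict_activity(similarities, activities, k=3)
```

The compound with similarity 0.1 is excluded by k=3, so its outlying activity of 10.0 does not affect the prediction.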

29. Classification by similarity
   - Classify into groups of activity
   - Activity groups should be available for the training set (e.g. potency classes or other grouping)
   - Find the k most similar compounds and classify the query compound into the group to which most of these compounds belong
     - The actual set of k most similar compounds depends on the similarity measure
   The predicted values are
   - The probability of belonging to a group (m/k, where m is the number of the k most similar compounds that are in the group)
   - The group predicted (the one with the highest probability)
   Reported values
   - Similarity to each group (Dataset.distance.group)
   - The group predicted
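
The m/k rule above can be sketched as a majority vote among the k nearest neighbours. The function name `classify`, the group labels and the similarity values are hypothetical; tie handling is not described in the slides, so this sketch simply returns whichever tied group was encountered first.

```python
from collections import Counter

def classify(similarities, groups, k=3):
    """Return (predicted group, {group: m/k}) from the k most similar compounds."""
    ranked = sorted(zip(similarities, groups),
                    key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter(group for _, group in ranked)
    # m/k: fraction of the selected neighbours that fall in each group
    probabilities = {group: m / len(ranked) for group, m in votes.items()}
    predicted = max(probabilities, key=probabilities.get)
    return predicted, probabilities

# Hypothetical similarities of the query to five training compounds
similarities = [0.9, 0.8, 0.4, 0.7, 0.2]
groups = ["strong", "weak", "weak", "strong", "weak"]

predicted_group, group_probabilities = classify(similarities, groups, k=3)
```

Although "weak" is the larger group in the training set, two of the three nearest neighbours are "strong", so that group is predicted.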

30. More in Toxmatch
   - Predefined training sets for 4 endpoints
   - Similarity matrix
   - Importing descriptors
   - Calculating descriptors
   - Comparing training and test sets
   - BfR rules for predicting skin irritation potential

31. Thank you!
