1. Chemical similarity with Toxmatch 1.03 and Ambit Discovery
Nina Jeliazkova, Joanna Jaworska
2. Chemical similarity assessment using Ambit Database
Exact substructure search based on 2D
Structural Similarity search (various methods)
Criteria on descriptors
Based on mechanistic understanding
Verhaar scheme
3. Another view on similarity assessments with Toxmatch and Ambit Discovery
Ambit Discovery
Similarity to a set
The query compound is compared to a summary representation of a set:
Descriptor space – center of the data cloud
Fingerprints – a consensus fingerprint
Toxmatch
Pairwise similarities
Similarity to a set
The query compound is compared to its nearest neighbours, selected by various similarity measures
4. Similarity searching
5. Similarity searching
Rationale
Based on the Similar Property Principle:
“Structurally similar compounds tend to exhibit similar properties”
Application 1
Calculate the pairwise similarity between a compound with known activity and each compound in the database
Rank the database on similarity to the known active
Select top n% for further biological testing
Application 2
Calculate the pairwise similarity between a compound with unknown activity and each compound from a set with known activity
Rank the dataset on similarity to the query compound
Infer the activity of the query compound from the activities of the top n% compounds
Application 3
Calculate the similarity between a compound with unknown activity and two or more sets of compounds with known activity
Decide which set is more similar to the query compound
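Application 1 can be sketched in a few lines of Python. This is a minimal illustration, not the Ambit or Toxmatch implementation: fingerprints are represented as sets of "on" bit positions, the compound names and bit positions are made up, and the Tanimoto coefficient is assumed as the similarity measure.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def rank_by_similarity(query, database):
    """Return (name, similarity) pairs, most similar to the query first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_fraction(ranked, fraction):
    """Keep the top `fraction` of a ranked list (at least one entry)."""
    n = max(1, round(len(ranked) * fraction))
    return ranked[:n]


# Hypothetical fingerprints: a known active and a four-compound database.
known_active = {1, 4, 7, 9}
database = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 7},
    "cmpd_C": {2, 3, 9},
    "cmpd_D": {5, 6},
}

ranked = rank_by_similarity(known_active, database)
selected = top_fraction(ranked, 0.5)  # "top n%" with n = 50
print(selected)
```

Applications 2 and 3 reuse the same ranking step; only the role of the query (unknown activity) and of the data set (known activity) changes.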
6. Chemical similarity quantified
Numerical representation of chemical structure
Structural similarity 2D, 3D
Descriptor-based similarity
Field-based
Spectral
Quantum mechanics
More…
Comparison between numerical representations
Distance-like
Association
Correlation
7. Structural similarity
Substructure searching
Maximum Common Substructure
Fragment approach
Atom, bond or ring counts, degree of connectivity
Atom-centred, bond-centred, ring-centred fragments
Fingerprints, molecular holograms, atom environments
8. Fingerprints with Tanimoto distance
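The slide itself carries only a figure. For reference, the usual definition of the Tanimoto coefficient on bit-string fingerprints is given below; the symbols a, b and c are notation introduced here, and taking the distance as 1 − T is a common convention assumed for this note rather than something stated on the slide.

```latex
% a = number of bits set in fingerprint A
% b = number of bits set in fingerprint B
% c = number of bits set in both A and B
T(A,B) = \frac{c}{a + b - c}, \qquad 0 \le T(A,B) \le 1,
\qquad
d_T(A,B) = 1 - T(A,B)
```

Two identical fingerprints give T = 1 (distance 0); two fingerprints with no bits in common give T = 0 (distance 1).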
9. 2D: Atom Environment (AE)
E.g. 6-aminoquinoline
MOLPRINT 2D uses features calculated from the connectivity table. They are similar to the “augmented atoms” introduced in the late 1970s and to the Scitegic circular fingerprints.
The algorithm consists of two steps. First, Sybyl mol2 atom types, which partially capture physicochemical properties such as hybridization state and lone pairs, are assigned to every heavy atom in the molecule.
Second, for every heavy atom in the molecule, counts of heavy atoms at a given number of bonds from that central atom are constructed. A molecule is thus described by a set of count vectors of atom types, one count vector per heavy atom. (The process is described in more detail in JCICS 2004, 44, 170 and 2004, 44, 1710.)
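The two steps above can be sketched as follows. This is a simplified illustration and not the MOLPRINT 2D code: atom typing is not performed (the Sybyl-style labels are assigned by hand), the function name `atom_environments` and the `max_depth` parameter are hypothetical, and ethanol is used instead of 6-aminoquinoline to keep the connectivity table short.

```python
from collections import Counter, deque


def atom_environments(atom_types, bonds, max_depth=2):
    """For every heavy atom, count the atom types found 0..max_depth bonds away.

    atom_types: list of type labels, one per heavy atom (step 1, done by hand here)
    bonds:      list of (i, j) index pairs from the connectivity table
    Returns one Counter per atom, keyed by (distance, atom_type).
    """
    neighbours = {i: [] for i in range(len(atom_types))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    environments = []
    for start in range(len(atom_types)):
        distance = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search yields topological distances
            current = queue.popleft()
            if distance[current] == max_depth:
                continue  # do not look beyond the requested depth
            for nxt in neighbours[current]:
                if nxt not in distance:
                    distance[nxt] = distance[current] + 1
                    queue.append(nxt)
        # step 2: count atom types at each distance from the central atom
        environments.append(Counter((d, atom_types[i]) for i, d in distance.items()))
    return environments


# Ethanol heavy atoms C-C-O with hand-assigned Sybyl-style types.
envs = atom_environments(["C.3", "C.3", "O.3"], [(0, 1), (1, 2)])
for index, env in enumerate(envs):
    print(index, dict(env))
```

The result contains one count vector per heavy atom, as described on the slide; two molecules can then be compared through the environments they share.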
10. What do we measure
We compare numerical representations of chemical compounds
The numerical representation is not unique
The numerical representation includes only part of all the information about the compound
A distance measure reflects “closeness” only if the data satisfy specific assumptions
Statistical assumptions and statistical error are involved
11. Fingerprint similarity: Tanimoto coefficient specifics
Information loss – fragment presence and absence instead of counts
Bit string saturation – within a large database almost all bits are set
Can give nonintuitive results:
The average similarity appears to increase with the complexity of the query compound
queries for large molecules are more discriminating (flatter curve, Tanimoto values spread more widely)
queries for small molecules give a sharp peak and cannot distinguish between molecules
Flower D., On the Properties of Bit String-Based Measures of Chemical Similarity, J. Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998
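The information-loss point can be made concrete with a small sketch. It compares the binary Tanimoto coefficient with a count-based generalisation (sum of minima over sum of maxima); the count-based form is a common extension brought in here for contrast and is not mentioned on the slide. The fragment dictionaries are hand-written counts of CH3 and CH2 groups in propane and decane.

```python
def tanimoto_binary(counts_a, counts_b):
    """Tanimoto on presence/absence: only which fragments occur matters."""
    a = {frag for frag, n in counts_a.items() if n > 0}
    b = {frag for frag, n in counts_b.items() if n > 0}
    if not a and not b:
        return 0.0  # convention chosen here for two empty molecules
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def tanimoto_counts(counts_a, counts_b):
    """Count-based generalisation: sum of minima over sum of maxima."""
    frags = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(f, 0), counts_b.get(f, 0)) for f in frags)
    total = sum(max(counts_a.get(f, 0), counts_b.get(f, 0)) for f in frags)
    return shared / total if total else 0.0


propane = {"CH3": 2, "CH2": 1}
decane = {"CH3": 2, "CH2": 8}

binary = tanimoto_binary(propane, decane)
counted = tanimoto_counts(propane, decane)
print(binary, counted)
```

With presence/absence bits the two alkanes are indistinguishable, while the count-based value reflects the difference in chain length.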
12. Descriptor similarity: Distances specifics
Euclidean distance
City-block distance
Mahalanobis distance
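A minimal sketch of the three distances follows, written in plain Python and restricted to two descriptors so that the covariance matrix can be inverted by hand. The function names and the descriptor values are hypothetical; the Mahalanobis distance is computed from a query point to the centre of a data set, using the sample covariance of that set.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x, y):
    """Sum of absolute differences along each descriptor."""
    return sum(abs(a - b) for a, b in zip(x, y))


def mahalanobis_2d(x, data):
    """Mahalanobis distance from point x to the centre of a 2-descriptor data set."""
    n = len(data)
    mean = [sum(row[k] for row in data) / n for k in range(2)]
    # sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((r[0] - mean[0]) ** 2 for r in data) / (n - 1)
    syy = sum((r[1] - mean[1]) ** 2 for r in data) / (n - 1)
    sxy = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # d^T S^-1 d, using the analytic inverse of a 2x2 matrix
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


# Hypothetical, strongly correlated descriptor values; the centre is near (3, 3).
data = [(1, 1.0), (2, 2.2), (3, 2.9), (4, 4.1), (5, 4.8)]
along_trend = (4, 4)   # follows the correlation in the data
across_trend = (4, 2)  # same Euclidean distance from the centre, but off-trend

print(mahalanobis_2d(along_trend, data), mahalanobis_2d(across_trend, data))
```

Both query points are equally far from the centre in Euclidean terms, but the Mahalanobis distance is much larger for the point that violates the correlation between the descriptors. This is the sense in which a distance reflects "closeness" only under specific assumptions about the data.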
13. Chemical similarity by distance in relevant descriptor space
Data set: phenols, toxicity data to the ciliate Tetrahymena pyriformis
Data taken from the Schüürmann article in Quant. Struct.-Act. Relat., 21 (2002), “Multivariate Discrimination between Modes of Toxic Action of Phenols”
Descriptors: pKa (acidity constant); frontier orbital energies EHOMO, ELUMO (more, not illustrated)
Yellow: polar narcotics
Red: respiratory uncouplers
The blue point is classified as most similar to the yellow class (a), to the red class (b), or is not classified (c), depending on which of the three classification models is used. The probabilistic classifier gives the most accurate prediction.
14. Do structurally similar molecules have similar biological activity?
Set of 1645 chemicals with IC50s for monoamine oxidase inhibition
Daylight fingerprints, 1024 bits long (0-7 bonds)
With a Tanimoto coefficient cutoff of 0.85, only 30% of the actives were detected
15. Structurally similar compounds can have very different 3D properties
Usually the modeller resorts to similarity in structures in the hope that structurally similar compounds will also have the same mechanism of action [[i]]. This approach is widely used, but the hope is not always fulfilled. Several surprising structure-activity relationships demonstrate that chemically similar compounds may have significantly different biological actions and activities, and that different molecules can be very similar in their biological activities. Applying the results from one congeneric series to another may lead to completely wrong conclusions [54, [ii], [iii], [iv], [v]].
As illustrated in [Kubinyi], structurally similar compounds (in this example, eight compounds with the same connectivity that differ in only one or two substituents) can have very different volume and surface potentials, hydrophobic and polar regions, hydrogen bond donor potentials, hydrogen bond acceptor potentials and molecular electrostatic potentials. This contradicts the long-repeated “basics of QSAR”, which assert that similar compounds have similar properties and dissimilar compounds have dissimilar properties.
[[i]] Barratt, M.D., Castell, J.V., Chamberlain, M., Combes, R.D., Dearden, J.C., Fentem, J.H., Gerner, I., Giuliani, A., Gray, T.J.B., Livingstone, D.J., McLean Provan W., Rutten, F.J.J.A.L., Verhaar, H.J.M. and Zbinden, P., The Integrated Use of Alternative Approaches for Predicting Toxic Hazard: The Report and Recommendations of ECVAM Workshop 8, http://altweb.jhsph.edu/publications/ECVAM/ecvam08.htm
[[ii]] Burger, A., Isosterism and bioisosterism in drug design, Prog. Drug. Res., 37, 287-371 (1991).
[[iii]] Patani, G.A. and LaVoie, E.J., Bioisosterism: A rational approach in drug design, Chem. Rev., 96, 3147-3176 (1996).
[[iv]] Kubinyi, H., Similarity and Dissimilarity - A Medicinal Chemist’s View, in: 3D QSAR in Drug Design. Volume II. Ligand-Protein Interactions and Molecular Similarity, H. Kubinyi, G. Folkers and Y. C. Martin, Eds., Kluwer/ESCOM, Dordrecht (1998), 225-252; also published in: Persp. Drug Design Discov. 9/10/11, 225-252 (1998).
[[v]] Kubinyi, H., Chemical Similarity and Biological Activity, 3rd Workshop on Chemical Structure and Biological Activity: Perspectives on QSAR 2001 (November 8-10, 2001), Sao Paulo, Brazil, http://arara.iq.usp.br/l6.htm
[Kubinyi] Kubinyi, H., Chemical Similarity and Biological Activity. Hugo Kubinyi Lectures, http://home.t-online.de/home/kubinyi/dd-06.pdf
16. Back to the meaning of chemical similarity
Structural similarity (the available methods)
Similarity in activity (the need)
Reconciliation: Similarity in “tailored” space
Proximity in relevant descriptor space
Structural similarity based on mechanism of action
Weighted structural similarity
Data fusion – combine different methods
17. What kind of searches are desirable?
Detailed analysis for pairwise similarity
Toxmatch
Similarity of a compound to compounds in the database
Ambit Database Tools
Similarity of a compound to a reference set
Toxmatch
Similarity of a set of compounds to compounds in the database
Ambit Discovery
Toxmatch
Grouping based on chemical class
Toxtree
Ambit db
18. AMBIT Discovery: Software for applicability domain and similarity assessment
19. AMBIT Discovery: Data visualisation
20. AMBIT Discovery: Results
21. AMBIT Discovery: Results (exported to MS Excel file)
22. Ambit Discovery applications
A variety of ways to explore similarity
Based on properties
Based on structural similarity
Application
Consensus domain for a robust assessment
SAR difficult due to multiple functional groups, multi target toxicity
Assigning a chemical to a particular group defined by an expert
Mechanistic reactivity domains for skin sensitization
J. Jaworska, N. Nikolova-Jeliazkova, How can structural similarity analysis help in category formation, SAR and QSAR in Environmental Research, Vol. 18, No. 3-4, June 2007, 1-13
23. What can Toxmatch do for you?
Assess pairwise similarity
Classify into groups
Predict activity
Pairwise similarities between each analyzed compound and the compounds in the training/test sets are calculated;
A composite (averaged) similarity measure between a compound and a user-selected subset of compounds is calculated. Besides the predefined subsets for certain training data sets, subsets can be selected manually or automatically by a clustering algorithm;
The software allows compounds to be ranked according to a selected similarity index.
24. Similarity
What is implemented
Structural – fingerprints (Tanimoto), atom environments, MCCS
Descriptor – Euclidean distance, Hodgkin-Richards, Cosine, Tanimoto
Set of rules for specific activities (e.g. BfR rules)
How to use similarity
Similarity values say nothing about biological activity
Similarity calculation has to be linked to specific activity
Prediction – predict dependent variable (activity)
Classification – classify into groups of activity
25. Similarity indices
Distance-like similarity indices:
General definition:
Euclidean Distance Index (k=x=2):
Correlation-like similarity indices:
General definition:
Hodgkin-Richards index:
Tanimoto index:
Cosine-like similarity index or Carbó index:
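The formulas on this slide did not survive extraction. The block below gives the forms in which these indices are usually written; it is a reconstruction from the general literature, not a copy of the slide. The notation is assumed: Z_AB denotes the overlap between the representations of compounds A and B (for descriptor vectors, the scalar product of the two vectors), and the two-parameter general definitions follow the parametrisation that makes k = x = 2 the Euclidean case, as the slide indicates.

```latex
% Assumed notation: Z_{AB} = \sum_i p_{iA}\, p_{iB} for descriptor vectors p_A, p_B

% Distance-like indices, general definition (0 \le x \le k)
D_{AB}(k,x) = \sqrt{\frac{k\,(Z_{AA} + Z_{BB})}{2} - x\,Z_{AB}}

% Euclidean distance index (k = x = 2)
D_{AB} = \sqrt{Z_{AA} + Z_{BB} - 2\,Z_{AB}}

% Correlation-like indices, general definition
C_{AB}(k,x) = \frac{(k - x)\,Z_{AB}}{\dfrac{k\,(Z_{AA} + Z_{BB})}{2} - x\,Z_{AB}}

% Hodgkin-Richards index (k = 2, x = 0)
H_{AB} = \frac{2\,Z_{AB}}{Z_{AA} + Z_{BB}}

% Tanimoto index (k = 2, x = 1)
T_{AB} = \frac{Z_{AB}}{Z_{AA} + Z_{BB} - Z_{AB}}

% Cosine-like (Carbo) index
R_{AB} = \frac{Z_{AB}}{\sqrt{Z_{AA}\,Z_{BB}}}
```

For binary fingerprints, Z_AA and Z_BB reduce to the number of bits set in each fingerprint and Z_AB to the number of bits set in both, so the Tanimoto index above reduces to the familiar bit-string form.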
26. Toxmatch: main screen
27. Similarity and activity
Background:
Prediction – predict dependent variable (activity)
Classification – classify into groups of activity
Implementation:
Prediction – find k most similar compounds and predict activity based on activities of those compounds (weighted average)
Classification – classify the query compound into the group where most of the k most similar compounds belong
28. Activity prediction by similarity
Predict dependent variable (activity)
Measured activity values should be available for the training set
Find k most similar compounds and predict activity based on activities of these compounds
The actual set of k most similar compounds depends on the similarity measure
The predicted value is the weighted average of the k activities
Reported values:
Similarity to the entire data set
Predicted activity value
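The prediction step can be sketched as follows. This is an illustration under stated assumptions, not Toxmatch's code: the slide says only "weighted average", so using the similarity values themselves as weights is an assumption, and the function name and data are hypothetical.

```python
def predict_activity(similarities, activities, k=3):
    """Similarity-weighted average of the activities of the k most similar compounds.

    similarities: similarity of the query to each training compound
    activities:   measured activity of each training compound (same order)
    """
    ranked = sorted(zip(similarities, activities),
                    key=lambda pair: pair[0], reverse=True)[:k]
    total_weight = sum(sim for sim, _ in ranked)
    if total_weight == 0:
        raise ValueError("all k neighbours have zero similarity to the query")
    # assumption: each neighbour is weighted by its similarity to the query
    return sum(sim * act for sim, act in ranked) / total_weight


# Hypothetical similarities to a query and measured activities of four compounds.
sims = [0.9, 0.8, 0.3, 0.5]
acts = [1.0, 2.0, 10.0, 4.0]

prediction = predict_activity(sims, acts, k=2)
print(prediction)  # (0.9 * 1.0 + 0.8 * 2.0) / (0.9 + 0.8)
```

Changing the similarity measure changes `sims`, and therefore which k compounds are selected and how they are weighted.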
29. Classification by similarity
Classify into groups of activity
Activity groups should be available for the training set (e.g. potency classes or other grouping)
Find k most similar compounds and classify the query compound into the group where most of these compounds belong
The actual set of k most similar compounds depends on the similarity measure
The predicted value is
Probability of belonging to a group (m/k, where m is the number of the k most similar compounds that belong to the group)
The group predicted (one with highest probability)
Reported values:
Similarity to each group (Dataset.distance.group)
The group predicted
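The classification step can be sketched in the same way. Again this is a minimal illustration rather than the Toxmatch implementation; the function name, the group labels and the similarity values are hypothetical.

```python
from collections import Counter


def classify_by_similarity(similarities, groups, k=3):
    """Assign the query to the group most common among its k most similar compounds.

    Returns the predicted group and the m/k probability for every group that
    occurs among the k neighbours.
    """
    ranked = sorted(zip(similarities, groups),
                    key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter(group for _, group in ranked)
    # m / k, where m is the number of neighbours belonging to the group
    probabilities = {group: m / len(ranked) for group, m in votes.items()}
    predicted = max(probabilities, key=probabilities.get)
    return predicted, probabilities


# Hypothetical similarities to a query and potency classes of five compounds.
sims = [0.9, 0.8, 0.7, 0.2, 0.1]
classes = ["strong", "weak", "strong", "weak", "weak"]

group, probs = classify_by_similarity(sims, classes, k=3)
print(group, probs)
```

Although "weak" is the majority class in the whole set, the three nearest neighbours vote two to one for "strong", which is why the choice of k and of the similarity measure matters.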
30. More in Toxmatch
Predefined training sets for 4 endpoints
Similarity matrix
Importing descriptors
Calculating descriptors
Comparing training and test sets
BfR rules for predicting skin irritation potential
31. Thank you!