Compound Set Enrichment: A novel approach to analysis of primary HTS data
Thibault Varin, Ansgar Schuffenhauer, Gubler, H., Parker, C., Zhang, JH., Raman, P., Ertl, P.
Compound Set Enrichment INTRODUCTION | Compound Set Enrichment | Thibault Varin | 10/07/14
Introduction
• Active series identification: Can relevant SAR be extracted from primary HTS data?
• Are activity data binary or continuous?
Introduction: Active series identification
Hypothesis 1: Within primary HTS screening data, structure-activity relationships (SAR) are apparent and can be used to help select active compound classes.
Introduction: Are the activity data binary or continuous?
[Figure: activity of the compounds in Scaffold 1 and Scaffold 2; legend: active compound (binary), inactive compound (binary)]
• Binary activity: 1 active / 5 inactives, so Scaffold 1 = Scaffold 2
• Continuous activity: Scaffold 1 > Scaffold 2
Introduction: Are the activity data binary or continuous?
[Figure: activity plots with two cut-offs, Threshold 1 and Threshold 2; legend: active compound (binary), inactive compound (binary)]
• Binary scaffold activity differs depending on the threshold.
Hypothesis 2: Methods based on an activity cut-off distort the activity information, leading to the incorrect assignment of active series of compounds.
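The threshold effect behind Hypothesis 2 can be illustrated with a minimal sketch. The activity values below are hypothetical (they are not taken from the slides or from any PubChem assay), and `hit_rate` and `mean` are helper names introduced only for this illustration.

```python
# Hypothetical primary-screen activities (e.g. % inhibition) for two scaffolds.
# These numbers are made up for illustration only.
scaffold_1 = [35, 42, 48, 55, 58, 85]
scaffold_2 = [2, 5, 8, 10, 12, 90]


def hit_rate(activities, threshold):
    """Fraction of compounds called active at a given activity cut-off."""
    return sum(a >= threshold for a in activities) / len(activities)


def mean(values):
    """Arithmetic mean, used here as a simple continuous summary."""
    return sum(values) / len(values)


# Binary view: the result depends on where the cut-off is placed.
for threshold in (80, 40):
    print(threshold, hit_rate(scaffold_1, threshold), hit_rate(scaffold_2, threshold))

# Continuous view: no cut-off is needed to see that scaffold 1 is more active overall.
print(mean(scaffold_1), mean(scaffold_2))
```

With the cut-off at 80 both scaffolds show 1 active and 5 inactives, so they look identical; with the cut-off at 40 scaffold 1 has 5 actives and scaffold 2 still has 1. The continuous summary separates the two scaffolds without choosing a threshold.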
Compound Set Enrichment METHODS
Methods: The Scaffold Tree classification
The Scaffold Tree – Visualization of the Scaffold Universe by Hierarchical Scaffold Classification. A. Schuffenhauer, P. Ertl et al. J. Chem. Inf. Model., 47, 47, 2007
Methods: Datasets
• 7 PubChem bioassays
• Ranging from 9389 to 263679 compounds
• Ranging from 0.03 to 26.29% of active compounds
[Diagram labels: Hypothesis 1; PubChem Annotation from CRC; Simulation of the primary screening data]
Methods: Single hypothesis test, summary procedure
1. State the null and the alternative hypotheses
   • H0: "the scaffold is inactive"
   • H1: "the scaffold is active"
2. Specify a significance level: α = 0.01
3. Compute the test statistic and the p-value
   • p-value = probability, if the scaffold is inactive (H0), of obtaining a statistic at least as extreme as the one observed
4. Decision step:
   • p-value > α: H0 is not rejected
   • p-value < α: H0 is rejected and H1 is accepted: "the scaffold is active"
Methods: The KS and the binomial hypothesis tests
[Figure: one bioassay, scaffold S3-2; binary data (inactives / actives) versus continuous data]
• Binary data, binomial test. H0: there is no difference between the proportion of active compounds among compounds having the scaffold S3-2 and the proportion of active compounds in the full dataset.
• Continuous data, KS test. H0: there is no difference between the activity distribution of compounds having the scaffold S3-2 and the background distribution.
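The two tests can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in the study: both tests are taken as one-sided (the scaffold is more active than the background), the KS p-value uses the simple large-sample approximation exp(−2·d²·nm/(n+m)), and all data and function names are hypothetical. In practice a statistics library (for example `scipy.stats.binomtest` and `scipy.stats.ks_2samp`) would normally be used instead.

```python
import bisect
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Probability of seeing at least k actives among the n compounds of a
    scaffold if the scaffold only had the background hit rate p (H0).
    """
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def ks_one_sided(sample, background):
    """One-sided two-sample Kolmogorov-Smirnov test (asymptotic p-value).

    d is the largest amount by which the background empirical CDF exceeds
    the scaffold empirical CDF, i.e. evidence that the scaffold activities
    are shifted towards higher values. The p-value is the large-sample
    approximation exp(-2 * d^2 * n*m / (n+m)); it is rough for small sets.
    """
    n, m = len(sample), len(background)
    s_sorted, b_sorted = sorted(sample), sorted(background)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in set(sample) | set(background):
        cdf_s = bisect.bisect_right(s_sorted, x) / n
        cdf_b = bisect.bisect_right(b_sorted, x) / m
        d = max(d, cdf_b - cdf_s)
    p_value = math.exp(-2.0 * d * d * n * m / (n + m))
    return d, p_value


# Hypothetical data: 100 background activities and one scaffold with 5 compounds.
background = list(range(100))
scaffold = [70, 80, 85, 90, 95]

# Binary view: 3 of the 5 scaffold members are "active" at an assumed
# background hit rate of 5%.
p_binomial = binomial_upper_tail(3, 5, 0.05)

# Continuous view: compare the whole activity distribution, no cut-off needed.
d_stat, p_ks = ks_one_sided(scaffold, background)

alpha = 0.01
print("binomial:", p_binomial, p_binomial < alpha)
print("KS:", d_stat, p_ks, p_ks < alpha)
```

The binomial test needs an activity cut-off to count actives, whereas the KS test works directly on the continuous activity values of every compound in the scaffold.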
Methods: Multiple hypothesis tests, Bonferroni correction
• Problem of false positives
   • α = probability of identifying an inactive scaffold as active (for each test done)
   • With 100 inactive scaffolds, the probability of identifying at least one as "active" by chance is 63% (1 − 0.99^100)
• Bonferroni suggests testing each scaffold at a critical significance level of α = 0.01 / number of scaffolds
• Makes the assumption that the individual tests are independent
• Each level in the Scaffold Tree has been treated separately
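The arithmetic on this slide can be checked with a short sketch. The function names and the example p-values are hypothetical; the family-wise error formula assumes independent tests, as the slide does.

```python
def family_wise_error(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def significant(p_values, alpha=0.01):
    """Flag each p-value as significant at the Bonferroni-corrected level."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


# 100 inactive scaffolds tested at alpha = 0.01 each: about a 63% chance
# of at least one false "active".
print(family_wise_error(0.01, 100))

# After correction each scaffold is tested at 0.01 / 100, which brings the
# family-wise error back below 0.01.
print(family_wise_error(bonferroni_alpha(0.01, 100), 100))

# Hypothetical p-values for three scaffolds on one level of the tree:
# 0.005 would pass at 0.01 uncorrected but fails at 0.01 / 3.
print(significant([0.00005, 0.005, 0.5]))
```

Applying the correction level by level, as the slide describes, means `n_tests` is the number of scaffolds on that level of the Scaffold Tree rather than the total number of scaffolds.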
Methods: Determining the activity of classes
[Workflow diagram for Hypothesis 1 and Hypothesis 2: scaffold activity evaluation, then multiple hypothesis test correction (Bonferroni), then comparison of results]
Compound Set Enrichment RESULTS
Results: Comparison of KSP and BTP predictions
With:
• KSP: KS Prediction
• BTP: Binomial Threshold Prediction
• Δ: KSP − BTP
• BPCA: Binomial PubChem Annotation
Findings:
• Both KSP and BTP retrieve the classes that are significantly active according to BPCA
• Most of the new KSP active classes are not significantly active according to BPCA
• Number of active classes: KSP > BTP
Results: KSP significantly active scaffolds found among PubChem inactives
[Figure: compound activity by PubChem annotation (Active, Inconclusive, Inactive); example scaffolds annotated "WA" or "Inconclusive(s)?"]
Results: Prioritize nodes instead of individual scaffolds
[Figure: scaffold activity (KS Prediction / Bonferroni); legend: not significantly active, significantly active]
Results: Visualization tool (Peter Ertl)
Compound Set Enrichment CONCLUSION
Conclusion: Compound Set Enrichment
• Validation of initial hypotheses
• A method to mine HTS data and identify active series of compounds
   • Chemical classification: Scaffold Tree
   • Statistical analysis: Kolmogorov-Smirnov hypothesis test
   • Multiple hypothesis test correction: Bonferroni correction
• Use all primary data
   • No activity cut-off
   • Identification of new active scaffolds not necessarily represented by very active compounds (latent hits) during the primary screen
Acknowledgments: With many thanks to
• Primary mentor: Ansgar Schuffenhauer
• Help: MLI group
• Scientific advisers: Christian Parker, Hanspeter Gubler, Ji-Hu Zhang, Peter Ertl, Edgar Jacoby
• Fellowship: Education office
• Discussions: Martin Beibel, Sebastian Bergling, Meir Glick, Alain Dietrich, Marie-Cecile Didiot
Questions?