Chemoinformatics tools for lead discovery

Virtual screening
• The huge numbers of molecules available in public and in-house databases mean that tools are needed to rank compounds in order of decreasing probability of activity
• A range of methods is available, varying in their sophistication and in the amount of information they need
• Structure-based methods can be used when an X-ray structure of the biological target is available
• If no such structure is available, information about known (or potential) ligands must be used instead

Ligand-Based Methods
• Similarity searching
  • Use when just a single bioactive reference structure is available
• 3D pharmacophore searching
  • Use when it has been possible to carry out a pharmacophore mapping exercise
• Machine learning
  • Use when a fair number of both actives and inactives have been identified

Similarity searching: I
• A similarity measure is used to quantify the resemblance between an active target (or reference) structure and each database structure
• The similar property principle means that high-ranked structures are likely to have activities similar to that of the target structure
• Similarity searching hence provides an obvious way of following up on an initial active

Similarity searching: II
• There are many ways in which the similarity between two molecules can be computed
• A similarity measure has two components
  • A structure representation
  • A similarity coefficient to compare two representations
• Most operational systems use similarity measures based on 2D fingerprints and the Tanimoto coefficient

Fragment bit-strings (fingerprints)
• Originally developed for 2D substructure search
• Similarity is based on the fragments common to two molecules
• Widely used in both in-house and commercial chemoinformatics systems

Similarity coefficients
• Tanimoto coefficient for binary bit strings: $S = C / (T + D - C)$
  • C bits set in common between the target and the database structure
  • T bits set in the target
  • D bits set in the database structure
• Values lie between zero (no bits in common) and unity (identical fingerprints)
• Many other, related similarity coefficients exist:
  • Tversky, cosine, Euclidean distance, ...
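As a minimal sketch, the Tanimoto coefficient for binary fingerprints, C / (T + D - C), can be computed as below. Representing each fingerprint as a Python set of set-bit positions is an assumption made for illustration, and the function name `tanimoto` and the example bit positions are hypothetical.

```python
def tanimoto(target_bits, database_bits):
    """Tanimoto coefficient C / (T + D - C) for two binary fingerprints.

    Each fingerprint is given as a set of the positions of its set bits
    (an illustrative representation, not a specific toolkit's format).
    """
    target = set(target_bits)
    database = set(database_bits)
    c = len(target & database)            # C: bits set in common
    denom = len(target) + len(database) - c  # T + D - C
    # Two empty fingerprints have no bits to compare; return 0.0 by convention here.
    return c / denom if denom else 0.0


# Hypothetical fingerprints: T = 4, D = 3, C = 2
target = {1, 4, 7, 9}
candidate = {1, 4, 8}
print(tanimoto(target, candidate))  # 2 / (4 + 3 - 2) = 0.4
```

Identical fingerprints give 1.0 and fingerprints with no bits in common give 0.0, matching the range stated on the slide.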

Combination of search techniques using data fusion: I
• Tanimoto/fingerprint measures are the most common, but many other types exist, e.g.,
  • Computed physicochemical properties
  • 3D grid describing the molecular electrostatic potential
• These reflect different molecular characteristics, so using more than one similarity measure may enhance search performance
• This is known as data fusion or consensus scoring

Combination of search techniques using data fusion: II
• Combination of different rankings of the same set of molecules
• Two basic approaches
  • Generate rankings from the same reference molecule using different similarity measures (similarity fusion)
  • Generate rankings from different reference molecules using the same similarity measure (group fusion)
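Similarity fusion can be sketched as follows. The slides do not say which fusion rule is used for similarity fusion, so summing ranks is an assumption here, as is the simplification of applying two coefficients (Tanimoto and cosine) to the same set-based fingerprint rather than using genuinely different representations. All names and data are hypothetical.

```python
from math import sqrt


def tanimoto(a, b):
    c = len(a & b)
    denom = len(a) + len(b) - c
    return c / denom if denom else 0.0


def cosine(a, b):
    # Cosine coefficient for binary fingerprints: C / sqrt(T * D)
    c = len(a & b)
    denom = sqrt(len(a) * len(b))
    return c / denom if denom else 0.0


def rank_positions(reference, database, measure):
    """Map each molecule id to its rank (1 = most similar) under one measure."""
    ordered = sorted(database, key=lambda m: (-measure(reference, database[m]), m))
    return {m: position for position, m in enumerate(ordered, start=1)}


def similarity_fusion(reference, database, measures):
    """Fuse the rankings from several measures by summing ranks (assumed rule)."""
    per_measure = [rank_positions(reference, database, measure) for measure in measures]
    summed = {m: sum(ranks[m] for ranks in per_measure) for m in database}
    # Lower summed rank is better; ties are broken by molecule id.
    return sorted(summed, key=lambda m: (summed[m], m))


reference = {1, 2, 3, 4}
database = {
    "p": {1},
    "q": {1, 2, 3, 5, 6, 7, 8, 9, 10, 11},
    "r": {20},
    "s": {1, 2, 3, 4},
}
fused_order = similarity_fusion(reference, database, [tanimoto, cosine])
print(fused_order)
```

In this toy data the two coefficients disagree on the order of "p" and "q", which is the situation in which fusing rankings can change the result.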

[Figure: Group fusion. Ranked lists for Reference 1, Reference 2 and Reference 3; after truncation to the required rank, the lists are fused and the fused list is truncated. Labels: r = 1000, r = 2000. Legend: new active; active found in earlier list.]

Group fusion rules
• Useful performance increases, even with just 10 actives, because multiple starting points give better coverage of structural space
• Improvement is most obvious when searching for heterogeneous sets of active molecules
• Best results are obtained by
  • Fusing similarity coefficient values, rather than ranks
  • Re-ranking using the maximum of the similarity values associated with each molecule
  • Using the Tanimoto coefficient
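The rules above (fuse Tanimoto scores rather than ranks, and re-rank on the maximum score per molecule) can be sketched as follows. The set-based fingerprints, the function names and the example data are hypothetical and only illustrate the MAX rule.

```python
def tanimoto(a, b):
    c = len(a & b)
    denom = len(a) + len(b) - c
    return c / denom if denom else 0.0


def group_fusion_max(references, database, top_n=None):
    """Rank database molecules by their maximum Tanimoto score over all references."""
    fused = {
        mol_id: max(tanimoto(ref, fp) for ref in references)
        for mol_id, fp in database.items()
    }
    # Sort by decreasing fused score; ties are broken by molecule id.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_n is None else ranked[:top_n]


# Two structurally different reference actives (hypothetical bit positions)
references = [{1, 2, 3}, {7, 8, 9}]
database = {
    "x": {1, 2, 4},
    "y": {7, 8, 9, 10},
    "z": {20, 21},
}
ranked = group_fusion_max(references, database)
print(ranked)  # [('y', 0.75), ('x', 0.5), ('z', 0.0)]
```

Each molecule is scored by whichever reference it resembles most, which is why multiple starting points help when the actives are heterogeneous.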

Turbo similarity searching: I
• Similar property principle: nearest neighbours are likely to exhibit the same activity as the reference structure
• Group fusion improves the identification of active compounds
• There is potential for further enhancement by group fusion of rankings from the reference structure and from its assumed active nearest neighbours

Turbo similarity searching: II
[Figure: workflow diagram with the labels reference structure, ranked list and nearest neighbours]
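A minimal sketch of turbo similarity searching, assuming set-based fingerprints and MAX fusion of Tanimoto scores: a conventional search is run first, its top k hits are treated as assumed actives, and the reference plus those neighbours are then group-fused. Function names and data are hypothetical.

```python
def tanimoto(a, b):
    c = len(a & b)
    denom = len(a) + len(b) - c
    return c / denom if denom else 0.0


def similarity_search(reference, database):
    """Conventional search: (id, score) pairs by decreasing Tanimoto similarity."""
    scores = {mol_id: tanimoto(reference, fp) for mol_id, fp in database.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def turbo_search(reference, database, k):
    """Group-fuse the reference with its k nearest neighbours (assumed actives)."""
    initial = similarity_search(reference, database)
    neighbours = [database[mol_id] for mol_id, _ in initial[:k]]
    references = [reference] + neighbours
    # MAX fusion: note that each neighbour scores 1.0 against itself.
    fused = {
        mol_id: max(tanimoto(ref, fp) for ref in references)
        for mol_id, fp in database.items()
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


reference = {1, 2, 3}
database = {
    "a": {1, 2, 3, 4, 5},
    "c": {4, 5, 6},
    "d": {3, 9, 10, 11},
}
conventional_order = [m for m, _ in similarity_search(reference, database)]
turbo_order = [m for m, _ in turbo_search(reference, database, k=1)]
print(conventional_order)  # ['a', 'd', 'c']
print(turbo_order)         # ['a', 'c', 'd']
```

In this toy example "c" shares no bits with the reference but resembles the nearest neighbour "a", so it moves up the list once "a" is used as a second reference.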

Experimental details
• MDL Drug Data Report (MDDR) dataset of 11 activity classes and 102K structures
• In all, 8294 actives in the 11 classes, with (turbo) similarity searches carried out using each of these as the reference structure
• ECFP_4 fingerprints/Tanimoto coefficient
• MAX group fusion on similarity scores
• Increasing numbers of nearest neighbours

Numbers of nearest neighbours

Upper and lower bound experiments

Rationale for upper bound results
• The true actives in the set of assumed actives yield significant enhancements in performance
• The true inactives in the set of assumed actives have little effect on performance
• Taken together, the two groups of compounds yield the observed net enhancement

Use of machine-learning methods for similarity searching: I
• Turbo similarity searching uses group fusion to enhance conventional similarity searching
• Machine learning is a more powerful virtual screening tool than similarity searching
  • But it requires a training set containing known actives and inactives
• Given an active reference structure, a training set can be generated by
  • Using the k nearest neighbours of the reference structure as the actives
  • Using k randomly chosen, low-similarity compounds as the inactives
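The training-set construction described above can be sketched as follows. The slides do not define "low-similarity", so the `low_similarity_cutoff` parameter and its default value are assumptions, as are the set-based fingerprints, the fixed random seed and all names. The classifier itself is not shown.

```python
import random


def tanimoto(a, b):
    c = len(a & b)
    denom = len(a) + len(b) - c
    return c / denom if denom else 0.0


def build_training_set(reference, database, k, low_similarity_cutoff=0.2, seed=0):
    """Return (assumed_actives, assumed_inactives) as lists of molecule ids."""
    ranked = sorted(database, key=lambda m: (-tanimoto(reference, database[m]), m))
    # Assumed actives: the k nearest neighbours of the reference structure.
    actives = ranked[:k]
    # Assumed inactives: k random picks from the low-similarity remainder.
    low_pool = [
        m for m in ranked[k:]
        if tanimoto(reference, database[m]) <= low_similarity_cutoff
    ]
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    inactives = rng.sample(low_pool, min(k, len(low_pool)))
    return actives, inactives


reference = {1, 2, 3}
database = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 5},
    "c": {10, 11},
    "d": {12, 13},
    "e": {3, 14, 15, 16, 17},
    "f": {20},
}
actives, inactives = build_training_set(reference, database, k=2)
print(actives)
print(inactives)
```

The two lists can then be passed, with their fingerprints, to whichever machine-learning method is being evaluated.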

Use of machine-learning methods for similarity searching: II

Results: I
• Experiments with the MDDR dataset show that group fusion is better than machine-learning methods when averaged over all of the classes
• However, group fusion is inferior for the most diverse datasets (as measured by the mean pair-wise similarities)
• Additional searches were carried out using 10 MDDR activity classes that are as structurally diverse as possible

Results: II

Conclusions: I
• Fingerprint-based similarity searching using a known reference structure is long-established in chemoinformatics
• When small numbers of actives are available, group fusion will enhance performance when the sought actives are structurally heterogeneous

Conclusions: II
• Conventional similarity searching can also be enhanced, even if there is just a single active, by assuming that the nearest neighbours are also active
• This can be effected in two ways
  • Use of group fusion to combine similarity rankings (overall best approach)
  • Use of substructural analysis to compute fragment weights (best with highly heterogeneous sets of actives)

Study questions
• Describe the role of chemoinformatics in QSAR
• Which chemoinformatics data and analyses are widely used in molecular docking?
• Similarity indices are widely used to obtain information about new compounds with high biological activity. Briefly explain how they work.
• The discovery of new drugs that are more potent than those already known makes extensive use of chemoinformatics. Explain with several examples.
• What are the differences in how chemoinformatics is used in QSAR, molecular docking and similarity searching?
