Use of Machine Learning in Chemoinformatics

Irene Kouskoumvekaki, Associate Professor
December 12th, 2012
Biological Sequence Analysis course

Major Aspects of Chemoinformatics
• Databases: Development of databases for storage and retrieval of small molecule structures and their properties.
• Machine learning: Training of Decision Trees, Neural Networks, Self-Organizing Maps, etc. on molecular data.
• Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, systems chemical biology networks…

Machine Learning

Machine learning classifiers

Clustering: Self-Organizing Maps
Distinguishing molecules of different biological activities and finding a new lead structure
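To make the clustering idea concrete, here is a minimal sketch of a one-dimensional self-organizing map in plain Python. It is not the implementation used for the slide's figure: the chain of four nodes, the learning-rate and radius schedules, and the two-dimensional "descriptor" vectors for actives and inactives are all assumptions made up for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_som(data, n_nodes=4, epochs=30, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D self-organizing map (a chain of n_nodes nodes)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1); data are assumed scaled to [0, 1].
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs           # shrinks from 1 towards 0
        lr = lr0 * decay                        # learning rate
        radius = max(radius0 * decay, 0.5)      # neighbourhood width
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood: nodes near the winner move more.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Hypothetical descriptor vectors, already scaled to [0, 1].
actives = [(0.10, 0.15), (0.12, 0.10), (0.08, 0.12)]
inactives = [(0.90, 0.85), (0.88, 0.92), (0.93, 0.90)]

som = train_som(actives + inactives)
for label, group in (("active", actives), ("inactive", inactives)):
    print(label, [best_matching_unit(som, x) for x in group])
```

With well-separated groups like these, the two activity classes would usually be expected to land on different nodes of the map. A new molecule can then be placed on the map with `best_matching_unit` and compared with its neighbours.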

Machine Learning
(diagram labels: QSAR, Virtual Screening, Clustering, Classification; Molecular Structures, Properties, Molecular Descriptors)

Different descriptor types
• Simple feature counts (such as number of rotatable bonds or molecular weight)
• Fragmental descriptors, which indicate the presence or absence (or count) of groups of atoms and substructures
• Physicochemical properties (density, solubility, van der Waals volume)
• Topological indices (size, branching, overall shape)
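As a rough illustration of the simplest descriptor types, the sketch below derives a few values from a molecular formula alone. The function names and the choice of descriptors are assumptions for this example. Real descriptor packages work on the full molecular structure, which is needed for anything like rotatable bonds or topological indices.

```python
import re

# Standard atomic masses (g/mol) for a handful of common elements.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}
HALOGENS = {"F", "Cl", "Br", "I"}


def parse_formula(formula):
    """Turn a formula such as 'C9H8O4' into {'C': 9, 'H': 8, 'O': 4}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def simple_descriptors(formula):
    """Compute a few formula-level descriptors (illustrative selection)."""
    counts = parse_formula(formula)
    return {
        # Simple feature values
        "molecular_weight": sum(ATOMIC_MASS[el] * n for el, n in counts.items()),
        "heavy_atom_count": sum(n for el, n in counts.items() if el != "H"),
        # A presence/absence flag in the spirit of fragmental descriptors
        "has_halogen": int(any(el in HALOGENS for el in counts)),
    }


# Aspirin, C9H8O4
print(simple_descriptors("C9H8O4"))
```

Each molecule becomes a fixed set of numbers, which is the form a machine learning method needs as input.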

Quantitative Structure-Activity Relationships (QSAR)
In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or another given property, P).
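The fitting step can be sketched in its simplest form: one descriptor x and a straight line P = a·x + b fitted by ordinary least squares. The single-descriptor linear form is a simplifying assumption (QSAR models generally use many descriptors and often non-linear methods), and the numbers below are made-up toy values, not measurements.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x_new):
    """Predict the property P for a new descriptor value."""
    slope, intercept = model
    return slope * x_new + intercept


# Hypothetical training set: one descriptor value and one measured property
# per molecule (toy numbers for illustration only).
descriptor = [1.0, 2.0, 3.0, 4.0]
measured_p = [2.1, 3.9, 6.2, 7.8]

model = fit_line(descriptor, measured_p)
print("slope, intercept:", model)
print("predicted P at x = 5:", predict(model, 5.0))
```

Once fitted, the model predicts P for molecules that have not been measured, provided their descriptor values lie in the range covered by the training data.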

Prediction of Solubility, ADME & Toxicity

hERG Classification with SVM

Evaluation of the data set

Performance of SVM
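The performance figures from this slide were not preserved in the transcript. As a general sketch of how a binary classifier (for example, hERG blocker versus non-blocker) is commonly evaluated, the code below computes accuracy, sensitivity, specificity and the Matthews correlation coefficient from true and predicted labels. The labels are invented for illustration and the choice of metrics is an assumption, not a record of what the slide reported.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return common binary-classification metrics as a dict.

    Assumes both classes occur in y_true, so the sensitivity and
    specificity denominators are non-zero.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # fraction of positives found
        "specificity": tn / (tn + fp),   # fraction of negatives found
        # MCC is set to 0.0 when a row or column of the table is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical labels: 1 = active (e.g. blocker), 0 = inactive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```

These metrics are meaningful only when computed on molecules that were held out from training.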

Virtual screening
• Computational techniques for the rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates.

Similarity Search
• Similar Property Principle: molecules having similar structures and properties are expected to exhibit similar biological activity.
• Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.
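One way to read "located closely together in the chemical space" is as a small distance between descriptor vectors. The sketch below ranks a tiny library by Euclidean distance to a query molecule. The molecule names, the two-dimensional descriptor vectors and the use of Euclidean distance are all assumptions for illustration, and descriptors would normally be scaled to comparable ranges first.

```python
import math


def nearest_neighbours(query, library, k=2):
    """Return the names of the k library molecules closest to the query.

    `library` maps a molecule name to its descriptor vector; distance is
    plain Euclidean distance in descriptor space (math.dist, Python 3.8+).
    """
    ranked = sorted(library.items(), key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical library: names and scaled descriptor vectors are made up.
library = {
    "mol_A": (0.10, 0.20),
    "mol_B": (0.15, 0.25),
    "mol_C": (0.90, 0.80),
    "mol_D": (0.50, 0.50),
}
query = (0.12, 0.22)

print(nearest_neighbours(query, library, k=2))
```

Under the similar property principle, the top-ranked neighbours are the first candidates to test for the query's biological activity.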

Fingerprint-based Similarity Search
• Widely used similarity search tool
• Consists of descriptors encoded as bit strings
• Bit strings of the query and the database molecules are compared using a similarity metric such as the Tanimoto coefficient
• MACCS fingerprints: 166 structural keys that answer questions of the type:
  • Is there a ring of size 4?
  • Is at least one F, Br, Cl, or I present?
  where the answer is either TRUE (1) or FALSE (0)
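The structural-key idea can be sketched with a toy fingerprint of four yes/no questions. These are not the actual MACCS key definitions. The molecule representation (a hand-annotated set of elements and a list of ring sizes) is likewise a hypothetical stand-in for a real structure parser.

```python
HALOGENS = {"F", "Cl", "Br", "I"}

# Toy structural keys: (question, test function). Only the first two echo
# the example questions above; none are taken from the real MACCS key list.
TOY_KEYS = [
    ("Is there a ring of size 4?", lambda m: 4 in m["ring_sizes"]),
    ("Is at least one F, Br, Cl, or I present?",
     lambda m: bool(HALOGENS & m["elements"])),
    ("Is there any ring?", lambda m: len(m["ring_sizes"]) > 0),
    ("Is nitrogen present?", lambda m: "N" in m["elements"]),
]


def fingerprint(mol):
    """Answer every key for one molecule: TRUE -> 1, FALSE -> 0."""
    return [int(test(mol)) for _, test in TOY_KEYS]


# Hand-annotated molecules (hypothetical input format).
chlorobenzene = {"elements": {"C", "H", "Cl"}, "ring_sizes": [6]}
cyclobutylamine = {"elements": {"C", "H", "N"}, "ring_sizes": [4]}

print(fingerprint(chlorobenzene))
print(fingerprint(cyclobutylamine))
```

Because every molecule answers the same questions in the same order, the resulting bit strings can be compared position by position.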

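Tanimoto Similarity
T = c / (a + b − c), where a and b are the numbers of bits set in the two fingerprints and c is the number of bits set in both; a value of T = 0.9 corresponds to 90% similarity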
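For bit-string fingerprints, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either. A minimal sketch follows. The two 12-bit fingerprints are invented so that the result comes out at 0.9, and returning 0.0 for two all-zero fingerprints is an arbitrary convention chosen for this example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length lists of 0/1 bits."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)                                # bits set in A
    b = sum(fp_b)                                # bits set in B
    c = sum(x & y for x, y in zip(fp_a, fp_b))   # bits set in both
    union = a + b - c
    # Convention chosen here: two empty fingerprints score 0.0.
    return c / union if union else 0.0


# Made-up fingerprints: A has 9 bits set, B has the same 9 plus one more.
fp_a = [1] * 9 + [0] * 3
fp_b = [1] * 10 + [0] * 2

print(tanimoto(fp_a, fp_b))  # 9 / (9 + 10 - 9) = 0.9
```

The coefficient ranges from 0 (no bits in common) to 1 (identical bit strings).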

Similarity Search

Questions?

Molecular editors and viewers http://www.chemaxon.com/products/marvin/

Molecular editors and viewers http://jmol.sourceforge.net/

Format conversion http://cactus.nci.nih.gov/translate/
