2. Outline Chemoinformatics-What is it?
Molecular descriptors and chemical spaces
Chemical spaces and molecular similarity
Molecular similarity, dissimilarity, diversity
Modification and Simplification of chemical spaces
Compound Classification and Selection
Similarity Searching
Machine Learning Methods
Library Design
Quantitative Structure Activity Relationship Analysis (QSAR)
Virtual Screening and compound filtering
3. Chemoinformatics-What is it? Use of computer and informational techniques, applied to a range of problems in the field of chemistry.
These in silico techniques are used in pharmaceutical companies during the drug discovery process.
4. Chemoinformatics-What is it?
5. Chemoinformatics-What is it?
6. Molecular descriptors and chemical spaces Chemical reference spaces – where molecular data sets are projected and analysis or design is carried out.
The definition of chemical spaces depends critically on the use of computational descriptors of molecular structure and of physical or chemical properties.
7. Molecular descriptors and chemical spaces
8. Molecular descriptors and chemical spaces
9. Chemical spaces and molecular similarity Similar Property Principle – Molecules having similar structures and properties should also exhibit similar activity. (Often but not always true)
Thus, molecules that are located closely together in chemical reference space are often considered to be functionally related.
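As a minimal sketch of "located closely together", assuming each molecule is represented by a vector of already scaled descriptor values, closeness can be measured with the Euclidean distance. The descriptor values below are hypothetical and only illustrate the idea; other distance measures are equally possible.

```python
import math

def euclidean_distance(a, b):
    """Distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical, already scaled descriptor vectors (illustrative values only).
mol_a = [0.2, 1.0, 0.5]
mol_b = [0.2, 1.0, 0.9]
mol_c = [2.0, 0.0, 3.0]

d_ab = euclidean_distance(mol_a, mol_b)
d_ac = euclidean_distance(mol_a, mol_c)

# mol_b lies closer to mol_a than mol_c does, so under the similar property
# principle it would be the more likely candidate for related activity.
print(d_ab, d_ac)
```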
10. Chemical spaces and molecular similarity
11. Molecular similarity, dissimilarity, and diversity Diversity analysis
Select different compounds from a given population
Evenly populate a given chemical space with candidate molecules. – Only selecting compounds that are at least a pre-defined minimum distance away from others.
Dissimilarity : Inverse of molecular similarity
Dissimilarity analysis played a major role in the pharmaceutical industry.
12. Molecular similarity, dissimilarity, and diversity Dissimilarity algorithm
Select a subset of k maximally dissimilar compounds
→ a non-trivial challenge because of the combinatorial problem
Other dissimilarity algorithm
Decide on a desired size, n, of a final subset
Select a seed compound and place it in the subset
Calculate the dissimilarity between each of the other compounds and those in the subset
Choose the next compound as the one most dissimilar to those in the subset
If fewer than n in the subset, repeat the calculation of the dissimilarity until n is achieved
Complexity varies as the square of n
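The stepwise algorithm above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: "most dissimilar to those in the subset" is read here as the MaxMin criterion (largest distance to the nearest selected compound), dissimilarity is taken as Euclidean distance, and the function names and data points are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance, used here as the dissimilarity measure (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def maxmin_select(points, n, seed_index=0):
    """Greedy dissimilarity selection: return indices of n selected compounds."""
    if not 0 < n <= len(points):
        raise ValueError("n must be between 1 and the number of compounds")
    selected = [seed_index]
    # For every compound, the distance to its nearest already selected compound.
    nearest = [distance(p, points[seed_index]) for p in points]
    while len(selected) < n:
        candidates = [i for i in range(len(points)) if i not in selected]
        # Next compound: the one farthest from everything selected so far.
        pick = max(candidates, key=lambda i: nearest[i])
        selected.append(pick)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], distance(p, points[pick]))
    return selected

# Hypothetical 2D descriptor coordinates.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 4.0)]
picked = maxmin_select(points, n=3)
print(picked)
```

Keeping the running `nearest` list avoids recomputing all pairwise dissimilarities on every pass; a different reading of "most dissimilar" (for example, the largest summed distance) would only change how `nearest` is updated.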
13. Modification and Simplification of Chemical Spaces High-dimensional chemical spaces are often too complex for carrying out meaningful analyses.
Why?
1) Major areas of a high-dimensional chemical space might not be populated and remain “empty”.
2) Correlation effects between selected descriptors dramatically distort the reference space.
Therefore,
1) Design low-dimensional reference spaces
2) Simplify high-dimensional spaces
3) Reduce their dimensionality
14. Modification and Simplification of Chemical Spaces (cont’d.) Auto scaling or variance scaling
Why? Descriptors with a large value range will dominate those with a smaller one.
Dimension reduction
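The autoscaling step mentioned above can be sketched as follows, assuming a small descriptor matrix with one row per compound and one column per descriptor. Each column is shifted to zero mean and scaled to unit variance; the descriptor values (molecular weight and a logP-like value) are hypothetical.

```python
import statistics

def autoscale(matrix):
    """Scale every descriptor column to zero mean and unit variance."""
    columns = list(zip(*matrix))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    scaled = []
    for row in matrix:
        scaled.append([
            # A constant column carries no information and is mapped to 0.
            (value - mean) / sd if sd > 0 else 0.0
            for value, mean, sd in zip(row, means, stdevs)
        ])
    return scaled

# Hypothetical descriptors: molecular weight and a logP-like value.
descriptors = [[180.0, 1.2], [320.0, 3.4], [450.0, 2.1]]
scaled = autoscale(descriptors)
print(scaled)
```

After scaling, the molecular weight column no longer dominates distance calculations simply because its raw values are two orders of magnitude larger.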
15. Modification and Simplification of Chemical Spaces (cont’d.) – Dimension reduction Assumption : High dimensional descriptor spaces have at least some intrinsic redundancy.
Two approaches:
Identify the descriptors that are most important for representing the original data set, and the relationships they form between objects, for a lower-dimensional representation
ex) multidimensional scaling (Agrafiotis et al., 2001)
Generate new descriptors for lower-dimensional spaces by combining important contributors from the original ones.
ex) Principal Component Analysis (PCA)
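A minimal sketch of the PCA idea, assuming two correlated descriptors: the first principal component is the direction of largest variance, and projecting onto it gives one new descriptor that combines both originals. Power iteration on the covariance matrix is used here only to keep the example dependency-free; in practice a linear algebra library would normally be used. The data are hypothetical.

```python
import math

def first_principal_component(matrix, iterations=200):
    """Return (unit direction, projected scores) of the first principal component."""
    n, dims = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in matrix]
    # Sample covariance matrix of the centered descriptors.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    # Power iteration; the all-ones start vector is an assumption and would
    # fail only if it were exactly orthogonal to the leading component.
    vec = [1.0] * dims
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    scores = [sum(r[j] * vec[j] for j in range(dims)) for r in centered]
    return vec, scores

# Hypothetical data: the second descriptor is roughly twice the first.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
component, scores = first_principal_component(data)
print(component, scores)
```

The two redundant descriptors collapse into a single score per compound, which is the intrinsic redundancy that dimension reduction exploits.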
16. Modification and Simplification of Chemical Spaces (cont’d.) - Simplification Simplification of n-dimensional descriptor spaces
ex) Binary descriptor transformation
above mean → 1, below mean → 0
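The binary transformation can be sketched as follows, assuming the mean is computed per descriptor column. Values exactly equal to the mean are mapped to 0 here, which is a choice made for this example; the descriptor values are hypothetical.

```python
def binarize_by_mean(matrix):
    """Map each descriptor value to 1 if above its column mean, else 0."""
    columns = list(zip(*matrix))
    means = [sum(col) / len(col) for col in columns]
    return [
        [1 if value > mean else 0 for value, mean in zip(row, means)]
        for row in matrix
    ]

# Hypothetical descriptors: molecular weight and a logP-like value.
descriptors = [[180.0, 1.2], [320.0, 3.4], [450.0, 2.1]]
binary = binarize_by_mean(descriptors)
print(binary)
```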
17. Compound Classification and Selection- CLUSTER ANALYSIS Aim is to divide a group into clusters where objects in the cluster are similar, but objects in other clusters are dissimilar
Many algorithms for doing this
Hierarchical methods seem to be better than non-hierarchical
Sometimes called a “distance-based” approach to compound selection, because distance is measured between pairs of compounds
18. Compound Classification and Selection- CLUSTER ANALYSIS
19. Compound Classification and Selection- Hierarchical Clustering The composition of each cluster depends on the one from which it was derived
Agglomerative methods start at the bottom and merge similar clusters (bottom-up)
Ward’s method: clusters are formed to minimize the variance (i.e., the sum of the squared deviations from the mean)
Others: centroid method and the median method
Divisive hierarchical clustering starts with all compounds in a single cluster and partitions the data (top-down)
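A minimal sketch of agglomerative clustering with Ward's criterion, assuming compounds are points in a descriptor space. At each step the pair of clusters whose merge gives the smallest increase in within-cluster sum of squares is joined. This naive version is written for readability, not speed, and the coordinates are hypothetical.

```python
def centroid(cluster):
    """Mean position of all points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared

def ward_clustering(points, n_clusters):
    """Bottom-up merging until n_clusters clusters remain."""
    clusters = [[p] for p in points]  # every compound starts on its own
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical 2D descriptor coordinates forming two obvious groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
clusters = ward_clustering(points, n_clusters=2)
print(clusters)
```

Recording the sequence of merges instead of stopping at a fixed number of clusters would give the full hierarchy.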
20. Compound Classification and Selection- Non-Hierarchical Clustering Organize compounds into an initially defined number of independent clusters.
Methods:
nearest neighbor: Jarvis Patrick clustering
relocation: K-means
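The relocation idea behind K-means can be sketched as follows: compounds are assigned to the nearest cluster centre, the centres are recomputed, and the two steps repeat. Taking the first k compounds as initial centres is a simplification made for this example (random or smarter seeding is more common), and the coordinates are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Return (cluster label per point, final cluster centres)."""
    centers = [list(p) for p in points[:k]]  # simplistic seeding (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each compound joins its nearest centre.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Relocation step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical 2D descriptor coordinates.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.2, 4.9), (0.1, 0.3), (4.9, 5.3)]
labels, centers = kmeans(points, k=2)
print(labels, centers)
```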
21. Compound Classification and Selection- Partitioning Rather than comparing molecular positions, establish a coordinate or reference system in chemical space.
Compounds that populate the same partition are considered to be similar.
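One way to realise such a reference system is cell-based partitioning: each descriptor axis is divided into equal-width bins, and compounds falling into the same cell are grouped. The sketch below assumes known lower and upper bounds per axis; the function name, bounds, and coordinates are hypothetical.

```python
from collections import defaultdict

def assign_cells(points, lower, upper, bins):
    """Map each compound index to the grid cell its descriptors fall into."""
    cells = defaultdict(list)
    for index, point in enumerate(points):
        cell = []
        for value, lo, hi in zip(point, lower, upper):
            width = (hi - lo) / bins
            position = int((value - lo) / width)
            # Clamp so that values on or beyond the bounds stay in the grid.
            cell.append(min(max(position, 0), bins - 1))
        cells[tuple(cell)].append(index)
    return dict(cells)

# Hypothetical descriptors scaled to the range 0..1, two bins per axis.
points = [(0.1, 0.2), (0.15, 0.3), (0.9, 0.8), (0.55, 0.1)]
cells = assign_cells(points, lower=(0.0, 0.0), upper=(1.0, 1.0), bins=2)
print(cells)
```

No pairwise distances are computed, which is why partitioning scales well to large compound collections.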
22. Compound Classification and Selection- Partitioning
23. Compound Classification and Selection- Statistical Partitioning Recursive partitioning – most popular statistical partitioning. A decision tree method
Divides datasets along decision trees formed by sequences of molecular descriptors.
ex) The compounds could be divided according to molecular weight.
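A minimal sketch of recursive partitioning, assuming binary activity labels and using Gini impurity to choose each split (the impurity measure is an assumption; the slides do not name one). The compounds, descriptor values (molecular weight and a logP-like value), and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 activity labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (score, descriptor index, threshold) with lowest weighted impurity."""
    best = None
    for d in range(len(rows[0])):
        values = sorted(set(r[d] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[d] <= t]
            right = [l for r, l in zip(rows, labels) if r[d] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, d, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Recursively divide the data set along the best descriptor split."""
    leaf = {"active_fraction": sum(labels) / len(labels), "size": len(labels)}
    if depth == max_depth or len(set(labels)) <= 1:
        return leaf
    split = best_split(rows, labels)
    if split is None:  # no descriptor separates the remaining compounds
        return leaf
    _, d, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[d] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[d] > t]
    return {
        "descriptor": d,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }

# Hypothetical compounds: (molecular weight, logP-like value), 1 = active.
rows = [(250, 1.0), (300, 2.0), (320, 4.5), (480, 1.5), (510, 3.0), (550, 4.0)]
labels = [1, 1, 1, 0, 0, 0]
tree = build_tree(rows, labels)
print(tree)
```

In this toy data set the first split falls on molecular weight, which mirrors the example given above.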
24. Compound Classification and Selection- Statistical Partitioning Statistical partitioning methods such as recursive partitioning are also attractive tools for the analysis of HTS data sets.
25. Similarity Searching –Structural queries and graphs
26. Similarity Searching –Structural queries and graphs Contemporary substructure search methods are mostly based on dictionaries of predefined molecular fragments.
Queries can be transformed into a machine-readable format such as Simplified Molecular Input Line Entry System (SMILES) code.
SMILES encodes 2D representations of molecules as linear strings of alphanumeric characters.
27. Similarity Searching –Structural queries and graphs (SMILES)
28. Similarity Searching –Structural queries and graphs Subgraph-isomorphism :
Common substructures can also be determined by systematic mapping of corresponding node positions in graphs.
However, computationally expensive
Reduced graph :
Nodes do not represent atoms but features such as functionally important groups or whole ring systems.
Reduced graphs are therefore more suitable for node matching procedures and similarity searching.
29. Similarity Searching –Structural queries and graphs (Reduced graph )
30. Similarity Searching – Pharmacophore A molecular framework that carries the essential features responsible for a drug’s biological activity
Spatial arrangements of atoms or groups that are responsible for biological activity
Often used as 3D queries for database searching
31. Similarity Searching –Fingerprints Fingerprints :
widely used similarity search tools.
consist of various descriptors that are encoded as bit strings
Bit strings of the query and database compounds are compared using a similarity metric such as the Tanimoto coefficient
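The Tanimoto coefficient for two bit strings is Tc = c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. A minimal sketch, assuming fingerprints are given as strings of "0" and "1"; the example fingerprints are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two equal-length bit strings."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    a = bits_a.count("1")
    b = bits_b.count("1")
    c = sum(1 for x, y in zip(bits_a, bits_b) if x == "1" and y == "1")
    denominator = a + b - c
    return c / denominator if denominator else 0.0

# Hypothetical 8-bit fingerprints.
query = "11010010"
database_compound = "11000011"

similarity = tanimoto(query, database_compound)
print(similarity)  # 3 shared bits, 4 + 4 - 3 = 5 bits set in total
```

The value ranges from 0 (no shared bits) to 1 (identical fingerprints).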
32. Machine Learning Methods Important role in chemoinformatics
For example, it is usually difficult to predict which types of descriptors are most suitable for a given search or classification task.
Therefore, machine learning techniques are often used to facilitate descriptor selection
Applied to generate complex predictive models by iterative processing of molecular learning sets
Genetic algorithms
Neural Networks
Self Organizing Maps (SOM)
33. Machine Learning Methods – Genetic algorithms Different parameters and model solutions to given problems are encoded in a chromosome and subjected to iterative random variation, thus generating a population.
Solutions provided by these chromosomes are evaluated by a fitness function that assigns high scores to desired results.
Chromosomes yielding the best intermediate solutions are subjected to mutation and crossover operations that correspond to random genetic mutations and gene recombination events.
The resulting modified chromosomes represent the next generation, and the process is continued until the obtained results meet a satisfactory convergence criterion.
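The loop described above can be sketched as follows for descriptor selection, where each chromosome is a list of bits marking which descriptors are switched on. Everything specific here is an assumption made for illustration: the toy fitness function (agreement with a hypothetical mask of "useful" descriptors), the parameter values, the fixed number of generations in place of a convergence test, and the elitism step that carries the best chromosome forward unchanged.

```python
import random

# Hypothetical mask of descriptors that a real fitness function would favour.
USEFUL = [1, 0, 1, 1, 0, 0, 1, 0]

def toy_fitness(chromosome):
    """Toy score: number of positions agreeing with the hypothetical mask."""
    return sum(1 for gene, target in zip(chromosome, USEFUL) if gene == target)

def evolve(fitness, length, population_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Chromosomes with the best intermediate solutions become parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:population_size // 2]
        next_generation = [parents[0][:]]  # elitism: keep the best unchanged
        while len(next_generation) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)       # single-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]             # random bit-flip mutation
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness)

best = evolve(toy_fitness, length=len(USEFUL))
print(best, toy_fitness(best))
```

In a real application the fitness function would score how well the selected descriptors perform in the search or classification task at hand.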
34. Library Design Diverse Library
Focused Library
35. Quantitative Structure Activity Relationship Analysis (QSAR) Goal : Evaluation of molecular features that determine biological activity and the prediction of compound potency as a function of structural modification
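As a minimal sketch of the QSAR idea, assuming a single descriptor and a linear relationship, potency can be modelled by an ordinary least-squares line. The descriptor (a logP-like value), the potency values, and the linear form itself are assumptions for illustration; real QSAR models usually combine several descriptors and need proper validation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training data: logP-like descriptor vs. measured potency (pIC50).
logp = [1.0, 2.0, 3.0, 4.0]
potency = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(logp, potency)

def predict(x):
    """Predicted potency for a compound with descriptor value x."""
    return slope * x + intercept

print(slope, intercept, predict(2.5))
```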
36. Virtual Screening and Compound Filtering VS(Virtual Screening) - the process of screening large databases on the computer for molecules having desired properties and biological activity.
A major application of VS techniques is the identification of novel active molecules in large compound databases.
A series of known active compounds is added as search templates to a source database; compounds identified as similar to these templates by VS calculations are then selected as candidate molecules for experimental evaluation.
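The template-based procedure can be sketched as follows, assuming fingerprints are stored as sets of on-bit positions and similarity is measured with the Tanimoto coefficient. The threshold, compound names, and fingerprints are hypothetical, and taking the best score over all templates is one possible way to combine them.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def screen(templates, database, threshold):
    """Return (name, score) hits ranked by best similarity to any template."""
    hits = []
    for name, fingerprint in database.items():
        score = max(tanimoto(fingerprint, t) for t in templates)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical fingerprints of known actives (templates) and database compounds.
templates = [{1, 4, 7, 9}, {1, 4, 8, 9}]
database = {
    "cpd_1": {1, 4, 7, 9, 12},
    "cpd_2": {2, 3, 5},
    "cpd_3": {1, 4, 8},
}

hits = screen(templates, database, threshold=0.7)
print(hits)
```

The ranked hit list is what would be passed on as candidates for experimental evaluation.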
38. Virtual Screening and Compound Filtering- Filter Functions Filter functions are very popular tools for VS
They attempt to identify compounds with desired properties and discard the others.
They have been implemented for the analysis of diverse molecular properties, including chemical reactivity, toxicity, drug-like character, and absorption, distribution, metabolism, excretion (ADME) parameters.
Ex) Aqueous solubility, passive absorption, blood-brain barrier penetration, metabolic stability, oral availability
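As an illustration of a filter function for drug-like character, the sketch below applies Lipinski's rule of five, a widely used filter that the slides do not name explicitly. It assumes the four properties have already been calculated for each compound; the dictionary keys, compound names, and values are hypothetical, and tolerating one violation is a common but adjustable choice.

```python
def passes_rule_of_five(compound, max_violations=1):
    """Lipinski-style filter on precomputed properties of one compound."""
    violations = sum([
        compound["mol_weight"] > 500,       # molecular weight above 500
        compound["logp"] > 5,               # calculated logP above 5
        compound["h_bond_donors"] > 5,      # more than 5 H-bond donors
        compound["h_bond_acceptors"] > 10,  # more than 10 H-bond acceptors
    ])
    return violations <= max_violations

# Hypothetical compounds with precomputed properties.
compounds = [
    {"name": "cpd_a", "mol_weight": 320.4, "logp": 2.1,
     "h_bond_donors": 2, "h_bond_acceptors": 5},
    {"name": "cpd_b", "mol_weight": 712.9, "logp": 6.3,
     "h_bond_donors": 4, "h_bond_acceptors": 12},
]

kept = [c["name"] for c in compounds if passes_rule_of_five(c)]
print(kept)
```

Filters for the other properties listed above follow the same pattern: compute or predict a property, compare it with a cut-off, and discard compounds that fail.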
39. Virtual Screening and Compound Filtering- Filter Functions
40. Thank You