In the name of GOD. Basic Steps of QSAR/QSPR Investigations M.H. FATEMI Mazandaran University mhfatemi@umz.ac.ir. QSAR. Quantitative Structure-Activity Relationships. Can one predict activity (or properties, in QSPR) simply on the basis of knowledge of the structure of the molecule?
In the name of GOD Basic Steps of QSAR/QSPR Investigations M.H. FATEMI Mazandaran University mhfatemi@umz.ac.ir
QSAR
• Quantitative Structure-Activity Relationships
• Can one predict activity (or properties, in QSPR) simply on the basis of knowledge of the structure of the molecule?
• In other words, if one systematically changes a component, will it have a systematic effect on the activity?
What is QSAR?
• A QSAR is a mathematical relationship between a biological activity of a molecular system and its geometric and chemical characteristics.
• QSAR attempts to find consistent relationships between biological activity and molecular properties, so that these “rules” can be used to evaluate the activity of new compounds.
Why QSAR?
• The number of compounds that must be synthesized in order to place 10 different groups in 4 positions of a benzene ring is 10^4 (10,000)
• Solution: synthesize a small number of compounds and, from their data, derive rules to predict the biological activity of other compounds.
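The count on this slide can be checked by brute-force enumeration. The sketch below assumes the four ring positions are treated as distinguishable (no symmetry reduction) and uses hypothetical group labels G1 to G10, which are not from the slides.

```python
from itertools import product

# Hypothetical labels for the 10 substituent groups (not named in the slides).
groups = [f"G{i}" for i in range(1, 11)]
n_positions = 4

# Every way of assigning one group to each of the 4 positions,
# treating the positions as distinguishable (no symmetry reduction).
analogs = list(product(groups, repeat=n_positions))
n_analogs = len(analogs)
print(n_analogs)
```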
QSXR
• X = A: Activity
• X = P: Property
• X = R: Retention
$$X = b_0 + b_1 D_1 + b_2 D_2 + \dots + b_n D_n$$
• $b_i$: regression coefficients
• $D_i$: descriptors
• $n$: number of descriptors
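As a minimal sketch of how the coefficients b0 ... bn in the linear model could be estimated, the following uses ordinary least squares from NumPy. The descriptor matrix and the "true" coefficients are synthetic values invented for illustration only; `predict` is a hypothetical helper name.

```python
import numpy as np

# Synthetic descriptor matrix: 6 compounds x 2 descriptors (D1, D2).
# All numbers here are invented for illustration.
D = np.array([
    [1.0, 0.5],
    [2.0, 1.5],
    [3.0, 1.0],
    [4.0, 3.0],
    [5.0, 2.5],
    [6.0, 4.0],
])

# Assumed coefficients (b0, b1, b2) used to generate noise-free responses.
true_b = np.array([0.5, 1.2, -0.7])

# Design matrix: a leading column of ones carries the intercept b0.
design = np.column_stack([np.ones(len(D)), D])
y = design @ true_b

# Least-squares estimate of b = (b0, b1, b2).
b, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)

def predict(descriptors):
    """Evaluate X = b0 + b1*D1 + ... + bn*Dn for one compound."""
    return b[0] + np.asarray(descriptors) @ b[1:]

print(b)
print(predict([2.0, 1.5]))
```

With real data the responses would contain noise, so the fitted coefficients would only approximate any underlying relationship.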
Early Examples • Hammett (1930s-1940s)
Hammett (cont.)
• Now suppose we have a related series
• σ reflects sensitivity to the substituent
• ρ reflects sensitivity to a different system
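The equation itself did not survive in the slide text. For reference, the standard form of the Hammett relation (stated here from general knowledge, not recovered from the slides) is:

```latex
\log \frac{K_X}{K_H} = \rho \, \sigma_X
```

Here $K_X$ and $K_H$ are the equilibrium (or rate) constants for the substituted and unsubstituted compounds, $\sigma_X$ is the substituent constant, and $\rho$ is the reaction constant. By convention $\rho = 1$ for the ionization of benzoic acids in water at 25 °C, which is the reference system used to define $\sigma$.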
Free-Wilson Analysis
• $\log(1/C) = \sum a_i + \mu$
• where $\log(1/C)$ is the predicted activity, $a_i$ is the contribution per group, and $\mu$ is the activity of the reference compound
Free-Wilson example: activity of analogs
$$\begin{aligned}
\log(1/C) = {} & -0.30\,[m\text{-F}] + 0.21\,[m\text{-Cl}] + 0.43\,[m\text{-Br}] + 0.58\,[m\text{-I}] + 0.45\,[m\text{-Me}] \\
& + 0.34\,[p\text{-F}] + 0.77\,[p\text{-Cl}] + 1.02\,[p\text{-Br}] + 1.43\,[p\text{-I}] + 1.26\,[p\text{-Me}] + 7.82
\end{aligned}$$
• Problems: at least two substituent positions are necessary, and the model can only predict new combinations of the substituents used in the analysis.
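To make the additive model concrete, here is a small sketch that evaluates the Free-Wilson equation for a chosen combination of substituents. The coefficients are the ones quoted on the slide; the function name `predict_log_inv_c` is a hypothetical helper, and each bracketed term is assumed to be an indicator equal to 1 when that substituent is present and 0 otherwise.

```python
# Group contributions a_i taken from the slide's example equation.
contributions = {
    "m-F": -0.30, "m-Cl": 0.21, "m-Br": 0.43, "m-I": 0.58, "m-Me": 0.45,
    "p-F": 0.34, "p-Cl": 0.77, "p-Br": 1.02, "p-I": 1.43, "p-Me": 1.26,
}
MU = 7.82  # activity of the reference compound (constant term)

def predict_log_inv_c(substituents):
    """Sum the contributions of the substituents present and add the reference term."""
    return MU + sum(contributions[s] for s in substituents)

# Example: an analog carrying meta-Cl and para-Br.
example = predict_log_inv_c(["m-Cl", "p-Br"])
print(round(example, 2))
```

A substituent that is missing from the dictionary raises a `KeyError`, which mirrors the limitation noted on the slide: only groups used in the analysis can be predicted.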
Hansch Analysis
• $\log(1/C) = a\pi + b\sigma + c$
• where $\pi_X = \log P_{RX} - \log P_{RH}$ and $P$ is the octanol/water partition coefficient
• This is also a linear free energy relation
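The two Hansch relations can be sketched in a few lines. All numeric inputs below (the log P values, σ, and the coefficients a, b, c) are illustrative placeholders rather than values from the slides, and the function names are hypothetical.

```python
def hansch_pi(log_p_rx, log_p_rh):
    """Hydrophobic substituent constant: pi_X = log P(RX) - log P(RH)."""
    return log_p_rx - log_p_rh

def hansch_activity(pi, sigma, a, b, c):
    """Hansch equation: log(1/C) = a*pi + b*sigma + c."""
    return a * pi + b * sigma + c

# Illustrative inputs only (assumed, not taken from the slides).
pi_x = hansch_pi(log_p_rx=2.84, log_p_rh=2.13)
activity = hansch_activity(pi_x, sigma=0.23, a=1.0, b=0.5, c=3.0)
print(round(pi_x, 2), round(activity, 3))
```

In practice a, b, and c would be obtained by regression over a series of analogs, in the same way as the coefficients of any other linear QSAR model.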
Applications of QSAR
• 1-Drug design
• 2-Prediction of chemical toxicity
• 3-Prediction of environmental activity
• 4-Prediction of molecular properties
• 5-Investigation of retention mechanism
Steps in QSPR/QSAR
• 1-Structure Entry & Molecular Modeling
• 2-Descriptor Generation
• 3-Feature Selection
• 4-Construct Model (MLRA or CNN)
• 5-Model Validation
Data set selection
• 1-Structural similarity of the studied molecules
• 2-Data collected under the same conditions
• 3-Data set should be as large as possible
INTRODUCTION to Molecular Descriptors
• Molecular descriptors are numerical values that characterize properties of molecules
• Molecular descriptors encode structural features of molecules as numerical values
• They vary in the complexity of the encoded information and in compute time
• Examples:
  • Physicochemical properties (empirical)
  • Values from algorithms, such as 2D fingerprints
Classical Classification of Molecular Descriptors
• Constitutional, Topological: 2-D structural formula
• Geometrical: 3-D shape and structure
• Quantum Chemical
• Physicochemical
• Hybrid descriptors
Topological Indexes: Example:
• Wiener Index
  • Counts the number of bonds between each pair of atoms and sums these distances over all pairs
• Molecular Connectivity Indexes
  • Randić branching index
    • Defines the “degree” of an atom as the number of adjacent non-hydrogen atoms
    • The bond connectivity value is the reciprocal of the square root of the product of the degrees of the two atoms in the bond
    • The branching index is the sum of the bond connectivities over all bonds in the molecule
  • Chi indexes: introduce valence values to encode sigma, pi, and lone pair electrons
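Both indexes described above can be computed directly from a hydrogen-suppressed molecular graph. The sketch below represents each molecule as an adjacency dictionary over its carbon atoms; the atom numbering and the function names are assumptions made for illustration.

```python
from collections import deque
from math import sqrt

def wiener_index(adj):
    """Sum of shortest-path bond counts over all unordered atom pairs."""
    total = 0
    for start in adj:
        # Breadth-first search gives the bond distance from `start` to every atom.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each pair was counted from both ends

def randic_index(adj):
    """Sum over bonds of 1 / sqrt(degree(u) * degree(v))."""
    total = 0.0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each bond once
                total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
    return total

# Hydrogen-suppressed carbon skeletons (atom numbering is arbitrary).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

for name, graph in [("n-butane", n_butane), ("isobutane", isobutane)]:
    print(name, wiener_index(graph), round(randic_index(graph), 4))
```

The branched isomer gives lower values for both indexes than the linear one, which is the kind of structural difference these descriptors are meant to encode.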
Electronic descriptors
• Electronic interactions play a very important role in controlling molecular properties.
• Electronic descriptors are calculated to encode aspects of the structures that are related to the electrons
• Electronic interaction is a function of the charge distribution on a molecule
Physicochemical Properties Used in this QSAR
• Liquid solubility Sw,L in mg/L and mmol/m³
• Octanol-water partition coefficient Kow
• Liquid vapor pressure Pv,L in Pa
• Henry’s Law constant Hc in Pa·m³/mol
• Boiling point
Feature Selection
• E.g. comparing faces first requires the identification of key features.
• How do we identify these?
• The same applies to molecules.
Objective feature selection
• After descriptors have been calculated for each compound, this set must be reduced to a set of descriptors that is as information-rich but as small as possible
• 1-Deleting constant or near-constant descriptors
• 2-Pair correlation cut-off selection
• 3-Cluster analysis
• 4-Principal component analysis
• 5-K correlation analysis
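The first two steps in this list can be sketched as a simple filter. This is one possible reading, assuming a variance tolerance for "near-constant" and a greedy rule that keeps the first descriptor of each highly correlated pair; the thresholds, the function name `objective_filter`, and the toy matrix are all assumptions.

```python
import numpy as np

def objective_filter(X, var_tol=1e-8, corr_cutoff=0.95):
    """Return column indexes that survive the variance and pair-correlation filters."""
    # Step 1: drop constant or near-constant descriptors.
    varying = [j for j in range(X.shape[1]) if np.var(X[:, j]) > var_tol]

    # Step 2: greedily keep a descriptor only if it is not highly
    # correlated with any descriptor that has already been kept.
    selected = []
    for j in varying:
        correlated = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= corr_cutoff
            for k in selected
        )
        if not correlated:
            selected.append(j)
    return selected

# Toy descriptor matrix: 5 compounds x 4 descriptors (invented values).
X = np.array([
    [1.0, 7.0,  3.0, 2.0],
    [2.0, 7.0,  5.0, 1.0],
    [3.0, 7.0,  7.0, 4.0],
    [4.0, 7.0,  9.0, 3.0],
    [5.0, 7.0, 11.0, 6.0],
])

selected = objective_filter(X)
print(selected)
```

In the toy matrix, column 1 is constant and column 2 is an exact linear function of column 0, so only columns 0 and 3 are retained.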
Variable reduction • Principal Component Analysis
Principal Component
• $PC_1 = a_{1,1}x_1 + a_{1,2}x_2 + \dots + a_{1,n}x_n$
• $PC_2 = a_{2,1}x_1 + a_{2,2}x_2 + \dots + a_{2,n}x_n$
• Keep only those components that possess the largest variation
• PCs are orthogonal to each other
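A minimal sketch of this construction uses the singular value decomposition of the mean-centered descriptor matrix. Mean-centering is an assumption here (the slide does not say how the x values are preprocessed), and the random data and the function name `pca` are for illustration only.

```python
import numpy as np

def pca(X, n_components):
    """Return scores, loadings (rows are a_k,1 ... a_k,n), and explained variance ratios."""
    Xc = X - X.mean(axis=0)                      # center each descriptor (assumed step)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]                 # coefficient vectors of PC1, PC2, ...
    scores = Xc @ loadings.T                     # PC values for every compound
    explained = (S ** 2) / np.sum(S ** 2)        # fraction of total variance per PC
    return scores, loadings, explained[:n_components]

# Synthetic data: 20 compounds x 4 descriptors with unequal spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

scores, loadings, explained = pca(X, n_components=2)
print(loadings.shape, scores.shape)
```

Because the singular values come back in descending order, keeping the first rows of `Vt` keeps the components with the largest variation, and those rows are orthonormal by construction.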
Subjective Feature Selection
• The aim is to reach the optimal model
• 1-Search all possible models (Best MLR)
• 2-Forward, Backward & Stepwise methods
• 3-Genetic algorithm
• 4-Mutation and selection uncover models
• 5-Cluster significance analysis
• 6-Leaps & bounds regression
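Option 1 in this list, searching all possible models, can be sketched as an exhaustive scan over descriptor subsets of a fixed size, scoring each by its residual sum of squares. The scoring criterion, the helper names `rss` and `best_subset`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

def rss(X, y, cols):
    """Residual sum of squares of an MLR model using the given descriptor columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def best_subset(X, y, size):
    """Try every subset of `size` descriptors and return the one with the lowest RSS."""
    return min(combinations(range(X.shape[1]), size),
               key=lambda cols: rss(X, y, cols))

# Synthetic data: the response depends only on descriptors 1 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = 1.0 + 2.0 * X[:, 1] - 3.0 * X[:, 3]

best = best_subset(X, y, size=2)
print(best)
```

The number of subsets grows combinatorially with the number of descriptors, which is why the other methods on the slide (stepwise, genetic algorithm, leaps and bounds) exist.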
Feature Selection:
• Most existing feature selection algorithms consist of:
  • A starting point in the feature space
  • A search procedure
  • An evaluation function
  • A criterion for stopping the search
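The four components listed above can be seen in a simple forward selection loop. This is a sketch under stated assumptions: the starting point is the empty descriptor set, the search adds one descriptor at a time, the evaluation function is the residual sum of squares of an MLR fit, and the search stops when the relative improvement falls below a threshold. The names and the deliberately strict threshold used on the toy data are illustrative choices.

```python
import numpy as np

def rss(X, y, cols):
    """Evaluation function: residual sum of squares of an MLR fit on `cols`."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def forward_selection(X, y, min_rel_gain=0.05):
    selected = []                                 # starting point: no descriptors
    remaining = list(range(X.shape[1]))
    current = rss(X, y, selected)
    while remaining:
        # Search procedure: try adding each remaining descriptor in turn.
        best_rss, best_j = min((rss(X, y, selected + [j]), j) for j in remaining)
        # Stopping criterion: the improvement is too small to justify another term.
        if current - best_rss <= min_rel_gain * current:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_rss
    return selected

# Synthetic data: the response depends on descriptors 1 and 3 plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 1.0 + 1.0 * X[:, 1] - 3.0 * X[:, 3] + 0.1 * rng.normal(size=60)

chosen = forward_selection(X, y, min_rel_gain=0.5)
print(chosen)
```

Backward and stepwise variants differ only in the starting point (all descriptors) and in whether previously added descriptors may be removed again.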