Department of Environmental Sciences
University of Milano - Bicocca
P.za della Scienza, 1 - 20126 Milano (Italy)
Website: michem.unimib.it/chm/
Milano Chemometrics and QSAR Research Group
Roberto Todeschini, Viviana Consonni, Manuela Pavan, Andrea Mauri, Davide Ballabio, Alberto Manganaro
chemometrics, molecular descriptors, QSAR, multicriteria decision making, environmetrics, experimental design, artificial neural networks, statistical process control
Molecular descriptors: autocorrelations, eigenvalue-based and information indices
Roberto Todeschini, Milano Chemometrics and QSAR Research Group
Iran - February 2009
Contents
• Autocorrelation descriptors
• Molecule representation by matrices
• Eigenvalue-based descriptors
• Information content
• Information indices
Autocorrelation on a molecular graph
w is the vector collecting the weights of the atoms.
- quadratic molecular property
Dimensions of the matrix product: 1 = (1, A) (A, A) (A, 1)
- quadratic molecular property with interaction terms
Autocorrelation on a molecular graph
Moreau - Broto autocorrelation of a topological structure (ATS), 1984. The lag is the topological distance between the two atoms of each pair.
Autocorrelation on a molecular graph
Example: 4-hydroxy-2-butanone
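A minimal sketch of how the Moreau - Broto autocorrelation could be computed for the hydrogen-depleted graph of 4-hydroxy-2-butanone. It assumes the common form $ATS_k = \sum_{i<j} w_i w_j \, \delta(d_{ij}; k)$ for lag $k > 0$ and $ATS_0 = \sum_i w_i^2$, where $d_{ij}$ is the topological distance. The atom numbering, the choice of atomic masses as weights, and counting each atom pair once are assumptions made for illustration; other conventions count each pair twice or rescale the result.

```python
from collections import deque

# Hydrogen-depleted graph of 4-hydroxy-2-butanone, CH3-C(=O)-CH2-CH2-OH.
# Assumed numbering: 0..3 are the carbons along the chain,
# 4 is the hydroxyl oxygen, 5 is the carbonyl oxygen.
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]
MASSES = [12.011, 12.011, 12.011, 12.011, 15.999, 15.999]  # assumed weights


def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths (in bonds) by breadth-first search."""
    neighbours = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    distances = []
    for start in range(n_atoms):
        row = [None] * n_atoms
        row[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if row[v] is None:
                    row[v] = row[u] + 1
                    queue.append(v)
        distances.append(row)
    return distances


def autocorrelation(weights, bonds, max_lag):
    """Return [ATS_0, ATS_1, ..., ATS_max_lag], each atom pair counted once."""
    n = len(weights)
    dist = topological_distances(n, bonds)
    ats = [0.0] * (max_lag + 1)
    ats[0] = sum(w * w for w in weights)
    for i in range(n):
        for j in range(i + 1, n):
            lag = dist[i][j]
            if lag is not None and 1 <= lag <= max_lag:
                ats[lag] += weights[i] * weights[j]
    return ats


if __name__ == "__main__":
    for lag, value in enumerate(autocorrelation(MASSES, BONDS, 4)):
        print(f"ATS{lag} = {value:.3f}")
```

With unit weights the function simply counts atom pairs at each lag, which is a quick way to check the distance bookkeeping before switching to real atomic properties.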
Eigenvalue-based descriptors
Eigenvalue descriptors are obtained by diagonalizing symmetric matrices derived from a molecular graph, such as:
• Adjacency matrix
• Vertex distance matrix
• Edge adjacency matrix
• Edge distance matrix
• Detour matrix
• Geometrical distance matrix
• Covariance matrix
• ... and any weighted symmetric matrix
Eigenvalue-based descriptors
Lovasz - Pelikan index (or leading eigenvalue), 1973: the largest eigenvalue derived from the adjacency matrix.
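As an illustration, the leading eigenvalue can be approximated without any linear-algebra library by power iteration. The sketch below is one possible approach, not the method used by the authors; the shift added to the diagonal is an assumption needed because bipartite graphs (chains, even rings) have eigenvalues of equal magnitude and opposite sign. In practice a library routine such as `numpy.linalg.eigvalsh` would normally be used instead.

```python
import math


def adjacency_matrix(n_atoms, bonds):
    """Symmetric 0/1 adjacency matrix of a hydrogen-depleted molecular graph."""
    a = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1.0
    return a


def leading_eigenvalue(matrix, shift=1.0, iterations=5000, tol=1e-13):
    """Largest eigenvalue of a symmetric non-negative matrix.

    Power iteration is applied to (matrix + shift * I) so that the largest
    eigenvalue is strictly dominant; the Rayleigh quotient is evaluated on
    the unshifted matrix.
    """
    n = len(matrix)
    x = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) + shift * x[i]
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        new_value = sum(y[i] * sum(matrix[i][j] * y[j] for j in range(n))
                        for i in range(n))
        if abs(new_value - value) < tol:
            return new_value
        x, value = y, new_value
    return value


if __name__ == "__main__":
    propane = adjacency_matrix(3, [(0, 1), (1, 2)])
    ring6 = adjacency_matrix(6, [(i, (i + 1) % 6) for i in range(6)])
    print(f"{leading_eigenvalue(propane):.5f}")  # 1.41421 (square root of 2)
    print(f"{leading_eigenvalue(ring6):.5f}")    # 2.00000
```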
Eigenvalue-based descriptors General functions of eigenvalues
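Eigenvalue-based descriptors
The trace of the adjacency matrix (and of the distance matrix) is equal to zero, because all diagonal entries are zero. Since the trace equals the sum of the eigenvalues, the eigenvalues of each of these matrices sum to zero.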
Eigenvalue-based descriptors VAA indices (from adjacency matrix) Balaban et al., 1991
Eigenvector-based descriptors
VEA indices (from adjacency matrix), Balaban et al., 1991, where the eigenvalue considered is the largest negative eigenvalue derived from the adjacency matrix.
Eigenvalue-based descriptors VAD, VED and VRD indices (from distance matrix) Balaban et al., 1991 The same indices defined above are calculated on the topological distance matrix
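Molecular geometry
The geometry matrix G (or geometric distance matrix) is a square symmetric matrix whose entry $r_{st}$ is the geometric distance, calculated as the Euclidean distance between the atoms s and t:
$r_{st} = \sqrt{(x_s - x_t)^2 + (y_s - y_t)^2 + (z_s - z_t)^2}$
where $(x_s, y_s, z_s)$ and $(x_t, y_t, z_t)$ are the Cartesian coordinates of the two atoms.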
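A short sketch of how the geometry matrix could be built from Cartesian coordinates. The function name and the toy coordinates are illustrative assumptions; `math.dist` requires Python 3.8 or later.

```python
import math


def geometry_matrix(coords):
    """Square symmetric matrix of Euclidean distances between all atom pairs.

    coords is a list of (x, y, z) tuples, one per atom.
    """
    n = len(coords)
    return [[math.dist(coords[s], coords[t]) for t in range(n)]
            for s in range(n)]


if __name__ == "__main__":
    # Hypothetical coordinates for three atoms, chosen only for illustration.
    atoms = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
    for row in geometry_matrix(atoms):
        print(["%.3f" % r for r in row])
```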
Distance / distance matrix (DD), Randic et al., 1994
Eigenvalue-based descriptors
Folding degree index, Randic et al., 1994
The largest eigenvalue derived from the distance/distance matrix. This quantity tends to 1 for linear molecules (of infinite length) and decreases as the molecule folds.
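The behaviour described above can be checked numerically. The sketch below assumes that each off-diagonal entry of DD is the ratio of the geometric distance to the topological distance between two atoms, that bond lengths are scaled to 1, and that the folding degree is the largest eigenvalue divided by the number of atoms A. The normalization by A is an assumption: without it the eigenvalue grows with molecular size rather than tending to 1. The chain geometries are hypothetical test cases.

```python
import math


def distance_distance_matrix(coords, topo_dist):
    """DD entries: geometric distance / topological distance, zero diagonal."""
    n = len(coords)
    return [[0.0 if s == t
             else math.dist(coords[s], coords[t]) / topo_dist[s][t]
             for t in range(n)] for s in range(n)]


def largest_eigenvalue(matrix, shift=1.0, iterations=10000, rel_tol=1e-12):
    """Shifted power iteration for a symmetric non-negative matrix."""
    n = len(matrix)
    x = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) + shift * x[i]
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        new_value = sum(y[i] * sum(matrix[i][j] * y[j] for j in range(n))
                        for i in range(n))
        if abs(new_value - value) <= rel_tol * max(1.0, abs(new_value)):
            return new_value
        x, value = y, new_value
    return value


def folding_degree(coords, topo_dist):
    """Largest eigenvalue of DD divided by the number of atoms (assumed form)."""
    dd = distance_distance_matrix(coords, topo_dist)
    return largest_eigenvalue(dd) / len(coords)


if __name__ == "__main__":
    n = 20
    # Unbranched chain: topological distance is the index difference.
    chain = [[abs(s - t) for t in range(n)] for s in range(n)]
    straight = [(float(i), 0.0, 0.0) for i in range(n)]
    staircase = [(float((i + 1) // 2), float(i // 2), 0.0) for i in range(n)]
    print(f"straight:  {folding_degree(straight, chain):.3f}")  # 0.950
    print(f"staircase: {folding_degree(staircase, chain):.3f}")
```

For a straight chain with unit bonds every off-diagonal entry of DD equals 1, so the largest eigenvalue is A - 1 and the index is (A - 1)/A, which approaches 1 as the chain grows.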
Conventional bond order
• single bond: π* = 1
• double bond: π* = 2
• triple bond: π* = 3
• conjugated bond: π* = 1.5
Eigenvalue-based descriptors
BCUT descriptors (Burden - CAS - University of Texas eigenvalues), 1997
The largest absolute eigenvalues λ1, λ2, λ3, ..., λL, derived from the following B matrix:
π*: conventional bond order
w: atomic properties
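A hedged sketch of a BCUT-style calculation. The diagonal of B holds the atomic property w and the off-diagonal entries depend on the conventional bond order, but the exact scaling used here (square root of the bond order for bonded pairs, 0.001 for non-bonded pairs) is an assumption chosen for illustration, as are the function names and the example weights. The eigenvalues are obtained with a plain cyclic Jacobi method so that the block needs only the standard library; a routine such as `numpy.linalg.eigvalsh` would be the usual choice.

```python
import math

BOND_ORDER = {"single": 1.0, "double": 2.0, "triple": 3.0, "conjugated": 1.5}


def burden_matrix(weights, bonds, nonbonded=0.001):
    """B matrix: atomic properties on the diagonal, bond terms off-diagonal.

    bonds maps an atom-index pair (i, j) to a bond type name.
    The off-diagonal scaling is an assumed convention.
    """
    n = len(weights)
    b = [[nonbonded] * n for _ in range(n)]
    for i in range(n):
        b[i][i] = weights[i]
    for (i, j), kind in bonds.items():
        b[i][j] = b[j][i] = math.sqrt(BOND_ORDER[kind])
    return b


def jacobi_eigenvalues(matrix, sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def bcut(weights, bonds, n_values):
    """The n_values eigenvalues of B with the largest absolute value."""
    eigenvalues = jacobi_eigenvalues(burden_matrix(weights, bonds))
    return sorted(eigenvalues, key=abs, reverse=True)[:n_values]


if __name__ == "__main__":
    # Hypothetical three-atom fragment C-C=O weighted by atomic mass.
    masses = [12.011, 12.011, 15.999]
    fragment = {(0, 1): "single", (1, 2): "double"}
    print(bcut(masses, fragment, 2))
```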
Topological information indices Indices based on the information content and entropy measures derived from the molecular graphs.
Information content
The information content of a system having n elements is a measure of the degree of diversity of the elements in the set:
$I_C = \sum_{g=1}^{G} n_g \log_2 n_g$
where G is the number of different equivalence classes, $n_g$ is the number of elements in the g-th class, and $n = \sum_{g=1}^{G} n_g$.
Information content
Maximum information content: $I_{MAX} = n \log_2 n$
Total information content: $I_T = I_{MAX} - I_C = n \log_2 n - \sum_{g=1}^{G} n_g \log_2 n_g$
Information content
The Shannon entropy of a system having n elements is the mean information content of a set of elements:
$H = -\sum_{g=1}^{G} p_g \log_2 p_g$
where G is the number of different equivalence classes, $p_g$ is the probability of the g-th class, and $p_g = n_g / n$.
Information content
Maximum entropy: $H_{MAX} = \log_2 n$
Standardized entropy: $H^{*} = H / H_{MAX}$
Information content ... on atoms
For both molecules (n = 9): I_MAX = 9 log2 9 = 28.529; H_MAX = log2 9 = 3.170

Molecule with n = 9: C = 7, F = 2
I_C = 7 log2 7 + 2 log2 2 = 19.651 + 2.000 = 21.651
I_T = 28.529 - 21.651 = 6.878
H = -(7/9) log2 (7/9) - (2/9) log2 (2/9) = 0.282 + 0.482 = 0.764
H* = 0.764 / 3.170 = 0.241

Molecule with n = 9: C = 7, F = 1, Br = 1
I_C = 7 log2 7 + 2 (1 log2 1) = 19.651 + 0 = 19.651
I_T = 28.529 - 19.651 = 8.878
H = -(7/9) log2 (7/9) - 2 (1/9) log2 (1/9) = 0.282 + 2 x 0.352 = 0.986
H* = 0.986 / 3.170 = 0.311
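The worked values above can be reproduced with a few lines of code. This is a minimal sketch that takes the sizes of the equivalence classes (here, counts of each element) and returns the indices; the function name and dictionary keys are illustrative choices, and the input is assumed to contain at least two elements in total.

```python
import math


def information_indices(class_sizes):
    """Information indices from the sizes n_g of the equivalence classes.

    Assumes sum(class_sizes) >= 2 so that H_MAX is non-zero.
    """
    n = sum(class_sizes)
    i_max = n * math.log2(n)
    i_c = sum(ng * math.log2(ng) for ng in class_sizes)
    h = -sum((ng / n) * math.log2(ng / n) for ng in class_sizes)
    h_max = math.log2(n)
    return {
        "I_MAX": i_max,
        "I_C": i_c,
        "I_T": i_max - i_c,
        "H": h,
        "H_MAX": h_max,
        "H*": h / h_max,
    }


if __name__ == "__main__":
    for label, sizes in [("C7 F2", [7, 2]), ("C7 F Br", [7, 1, 1])]:
        result = information_indices(sizes)
        print(label, {k: round(v, 3) for k, v in result.items()})
```

Note that H equals I_T / n, which is why the entropy is described as the mean information content.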
Information content ... on vertex degrees
Vertex degrees labelled on the 9-atom graph: 1, 1, 3, 2, 3, 3, 2, 1, 2
n = 9; V1 = 3, V2 = 3, V3 = 3
H = 3*[-(3/9) log2 (3/9)] = xxx

Information content ... on vertex degree magnitudes
n = 18; SV1 = 3, SV2 = 6, SV3 = 9
H = -(3/18) log2 (3/18) - (6/18) log2 (6/18) - (9/18) log2 (9/18) = xxxx
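The two entropies left as placeholders above can be evaluated directly. The sketch below uses the nine vertex degrees listed on the slide; the helper name is an illustrative choice. The first entropy partitions the atoms by vertex degree, the second partitions the sum of the degrees (18) by the contribution of each degree class.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (base 2) of a partition given by its class sizes."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Vertex degrees of the nine atoms, as listed on the slide.
degrees = [1, 1, 3, 2, 3, 3, 2, 1, 2]

# Partition on vertex degrees: V1 = 3, V2 = 3, V3 = 3 atoms.
class_counts = Counter(degrees)
h_degrees = shannon_entropy(class_counts.values())

# Partition on vertex degree magnitudes: SV1 = 3, SV2 = 6, SV3 = 9.
magnitude_sums = [deg * cnt for deg, cnt in sorted(class_counts.items())]
h_magnitudes = shannon_entropy(magnitude_sums)

if __name__ == "__main__":
    print(f"H on vertex degrees:           {h_degrees:.3f}")     # 1.585
    print(f"H on vertex degree magnitudes: {h_magnitudes:.3f}")  # 1.459
```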