QSAR AND CHEMOMETRIC APPROACHES TO THE SCREENING OF POPs FOR ENVIRONMENTAL PERSISTENCE AND LONG RANGE TRANSPORT

Paola Gramatica (a), Ester Papa (a) and Stefano Pozzi (b)
a) Department of Structural and Functional Biology, University of Insubria - Varese (Italy)
b) Laboratory of Environmental Studies (SPAA) - Lugano (Switzerland)
e-mail: paola.gramatica@uninsubria.it
http://fisio.dipbsf.uninsubria.it/qsar/
QSAR Research Unit
D13

Introduction
The need for a scientific foundation for the criteria used to evaluate the persistence and long-range transport (LRT) potential of POPs (Persistent Organic Pollutants) in the environment has recently been highlighted [1]. Persistence is a necessary condition for long-range transport, but persistent chemicals are not necessarily subject to it: the inherent tendency of compounds towards global mobility must also be taken into account.

The half-life of organic pollutants in various compartments is among the most commonly used criteria for studying persistence, but such studies are severely hindered by the limited availability of experimental degradation half-life data. There is therefore an incentive to develop reliable procedures, such as QSAR/QSPR, to estimate the missing data. The same is true for the physico-chemical properties that are particularly relevant for determining mobility potential [2].

Because the LRT potential of POPs results from the simultaneous influence of their persistence in the environment and their inherent tendency to mobility, finding the combination of chemical properties that minimizes LRT is a multicriteria problem. It can be usefully approached through MultiCriteria Decision-Making (MCDM) techniques [3]: procedures for combining the magnitude of several properties into a single quantitative measure of overall quality.
For modelling and predicting half-life we used a data set of 141 organic compounds for which experimental half-life values in different compartments are available from Howard [4], Mackay [5] and Rodan [6]. Molecular structure was represented by a wide set of molecular descriptors [7] calculated with software developed by R. Todeschini [7,8]: constitutional descriptors (56), topological descriptors (69), walk counts (20), BCUT descriptors (64), Galvez indices (21), 2D autocorrelations (96), charge descriptors (7), aromaticity descriptors (4), molecular profiles (40), geometrical descriptors (18), 3D-MoRSE descriptors (160), WHIM descriptors [9] (99), GETAWAY descriptors (196) and empirical descriptors (3).

The best subset of variables for modelling half-life was selected by a Genetic Algorithm - Variable Subset Selection (GA-VSS) approach, with the response modelled by ordinary least squares (OLS) regression. All models were validated by leave-one-out (LOO) and leave-more-out (LMO) cross-validation and by scrambling of the responses, using the MOBY-DIGS package [10].

General Persistence Index
Principal Component Analysis (PCA) of the experimental and predicted half-lives of the 141 pollutants in various media allows the chemicals to be ranked according to their overall half-life and their relative persistence in the different media. A general Persistence Index is obtained from the linear combination of half-life data in four environmental media (PC1 in Fig. 1). The chemicals on the right are the most globally persistent across the compartments.

Global Mobility Index
The inherent tendency of compounds towards global mobility is governed mainly by volatility, water solubility, Kow and Koa. A Global Mobility Index is obtained from the linear combination, by PCA, of these physico-chemical properties: the PC1 score (explained variance EV% = 74.6%) in Fig. 2. The chemicals on the right side of the plot are those with the greatest tendency to mobility.
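A PC1-based index of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual calculation: the function name `pc1_index`, the autoscaling step, the `orient_col` argument and the demo data are all hypothetical. Because the sign of a principal component is arbitrary, the sketch lets the analyst pick one property whose loading should be positive, so that "right side of the plot" has a fixed meaning.

```python
import numpy as np

def pc1_index(properties, orient_col=0):
    """Return (scores, loadings, explained_fraction) for the first principal component.

    properties : (n_chemicals, n_properties) array, e.g. log-transformed
                 physico-chemical properties (hypothetical input layout).
    orient_col : column whose PC1 loading is forced to be positive, since the
                 sign of a principal component is otherwise arbitrary.
    """
    X = np.asarray(properties, dtype=float)
    # Autoscale: centre each column and divide by its sample standard deviation
    # (an assumption; the source does not state the scaling that was used).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the scaled matrix; the rows of Vt are the PC loadings.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0].copy()
    if loadings[orient_col] < 0:
        loadings = -loadings
    scores = Z @ loadings                      # one index value per chemical
    explained = s[0] ** 2 / np.sum(s ** 2)     # fraction of variance on PC1
    return scores, loadings, explained

# Synthetic demo data (NOT values from the poster): three correlated properties.
t = np.linspace(-1.0, 1.0, 20)
demo = np.column_stack([t, -t + 0.1 * np.cos(5 * t), 3 * t + 0.1 * np.sin(9 * t)])
scores, loadings, explained = pc1_index(demo, orient_col=0)
ranking = np.argsort(scores)[::-1]             # highest index first
```

In this sketch the explained fraction plays the role of the EV% quoted for Fig. 2, and sorting the scores gives the left-to-right ordering of chemicals along PC1.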
Figure 1: PERSISTENCE
Figure 2: MOBILITY

Screening of Long Range Transport Potential
Finding the combination of chemical properties that minimizes LRT can be approached by MultiCriteria Decision-Making (MCDM) techniques: procedures for combining the magnitude of several properties into a single quantitative measure of overall quality. The utility function is chosen here as the combined criteria function and is applied to the properties most relevant in determining LRT, each expressed as a criterion f(x) to be minimized: the general Persistence Index (Fig. 1), derived from the PCA combination of half-lives in four environmental compartments; the Mobility Index (Fig. 2), derived from the physico-chemical properties cited above; and the air half-life, which is considered particularly relevant in determining LRT. The k = 3 properties, equally weighted (weight λ) and summed in the utility function as a weighted sum of the individual criteria, allow the studied chemicals to be ranked according to their LRT potential, giving an LRT index, F(x). The chemicals with the lowest utility (F(x) near 0), highlighted in Fig. 3, exhibit the highest LRT potential, while those with F(x) near 1 have the lowest potential for LRT.

Figure 3

The QSPR (Quantitative Structure-Property Relationships) approach is applied here in two steps: first, to fill the gaps in the experimental data for the studied properties, and then to model the scores of the MCDM function, the LRT index (Fig. 3). Different kinds of theoretical molecular descriptors were used to obtain OLS regression models (Fig. 4) and CART classification models (Fig. 5) with good predictive power (Q²LOO = 86.8% and Q²LMO = 86.2% for the regression; cross-validated misclassification risk = 6.2% for the classification).

References
1- Klecka, G.M., Ed. (1999). SETAC Pellston Workshop. Environ. Toxicol. Chem. (Suppl.), 18, 8.
2- Gramatica, P., Pozzi, S., Consonni, V. and Di Guardo, A. (2001). SAR and QSAR in Environ. Res., in press.
3- Hendriks, M.M.W.B., De Boer, J.H., Smilde, A.K. and Doornbos, D.A. (1992). Chemom. Intell. Lab. Syst. 16, 175.
4- Howard, P.H. et al. (1991). Handbook of Environmental Degradation Rates. http://esc.syrres.com/interkow/PhysProp.htm
5- Mackay, Shiu and Ma (2000). Illustrated Handbook of Physical-Chemical Properties and Environmental Fate for Organic Chemicals.
6- Rodan, B.D. et al. (1999). Environ. Sci. Technol. 33(2 3482-3488.
7- Todeschini, R. and Consonni, V. (2000). Handbook of Molecular Descriptors. Wiley.
8- Todeschini, R. (2000). DRAGON ver. 1.0, Milano. Free download from http://www.disat.unimib.it/chm
9- Todeschini, R. and Gramatica, P. (1997). Quant. Struct.-Act. Rel. 16, 113-119.
10- Todeschini, R. (1999). MOBY DIGS - Software for multilinear regression analysis and variable subset selection by Genetic Algorithm, rel. 2.1, Milan (Italy).
11- Beyer, A., Mackay, D., Matthies, M., Wania, F. and Webster, E. (2000). Environ. Sci. Technol. 34, 699-703.

Figure 4
Figure 5: Classification Tree (split variables and thresholds: nC 14.50, nC 7.00, E1u 0.40; terminal nodes give the assigned class)

Conclusions
The ranking of the studied chemicals according to their LRT potential, obtained by the MCDM utility function, can be proposed as an alternative to approaches based on characteristic travel distance (CTD) [11]. An additional advantage is that applying the QSPR models (both regression and classification) to the scores of the MCDM utility function (defined as the LRT index) allows a fast pre-screening of existing and new chemicals for their inherent tendency to LRT, based simply on knowledge of their molecular structure.
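The utility-function ranking behind the LRT index can be illustrated with a short Python sketch. It assumes, hypothetically, that each criterion is rescaled linearly to the interval [0, 1] so that the smallest raw value (the preferred one, since all criteria are to be minimized) maps to 1 and the largest to 0; the poster does not state which transformation was used. The function name `utility_index` and the demo values are likewise illustrative and are not data from the study.

```python
import numpy as np

def utility_index(criteria, weights=None):
    """Weighted-sum utility F(x) for criteria that should all be minimized.

    criteria : (n_chemicals, k) array of raw criterion values, e.g. a
               persistence index, a mobility index and an air half-life.
    weights  : k weights summing to 1; equal weights (1/k) if omitted.
    Returns one value per chemical in [0, 1]; values near 0 correspond to the
    least favourable chemicals (highest LRT potential in the poster's reading).
    """
    X = np.asarray(criteria, dtype=float)
    k = X.shape[1]
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    if w.shape != (k,) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must have one entry per criterion and sum to 1")
    lo, hi = X.min(axis=0), X.max(axis=0)
    if np.any(hi == lo):
        raise ValueError("each criterion must vary across the chemicals")
    # Assumed linear rescaling: lowest raw value -> 1, highest raw value -> 0.
    f = (hi - X) / (hi - lo)
    return f @ w

# Synthetic demo (NOT values from the poster): rows are chemicals, columns are
# persistence index, mobility index and air half-life on arbitrary scales.
demo = np.array([
    [0.0, 0.0, 0.0],   # low on every criterion
    [1.0, 1.0, 1.0],   # high on every criterion
    [0.5, 0.5, 0.5],   # intermediate
])
F = utility_index(demo)
most_lrt_first = np.argsort(F)   # chemicals ordered from lowest F(x) upward
```

With equal weights and k = 3, each criterion contributes one third of F(x). Other weightings or a non-linear rescaling would change the ranking, which is why both choices are exposed or flagged as assumptions here.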