Application of rule induction techniques for detecting the possible impact of endocrine disruptors on the North Sea ecosystems
Tim Verslycke¹, Peter Goethals¹,², Gert Vandenbergh¹, Karen Callebaut³ & Colin Janssen¹
¹ Laboratory of Environmental Toxicology and Aquatic Ecology, Ghent University
² Institute for Forestry and Game Management
³ Ecolas n.v.

Outline
• Introduction to endocrine disruptors
• ED-North project
• Database set-up
• Data mining and rule induction
• Practical application on the ED-North database
• Conclusions

Endocrine disruptors?
• Also called pseudo-hormones, endocrine modulators, xeno-hormones, …
• Compounds that interfere with the endocrine system, resulting in (negative) effects on the health and/or reproduction of organisms
• Since the 1990s: one of the fastest-growing research domains in environmental toxicology
• Dozens of lists, hundreds of compounds
• Worldwide involvement of industry, government and academia

Endocrine disruption in marine environments?
• The sea is the final sink for many chemicals
• The North Sea and its estuaries are under a heavy pollution load
• There are indications of potential endocrine disruption in these ecosystems
• A better overview of potential endocrine disruption in the North Sea and the Scheldt estuary is needed → ED-North project

ED-North project ~ Goals
• Critical evaluation of the literature on endocrine disruptors
• Build a reference list and database of chemicals with (potential) endocrine-disruptive activity
• Evaluate the described and suspected effects of endocrine disruptors on marine organisms
• Prioritize the selected chemicals
• If enough information is available: preliminary risk assessment
• Formulate research needs and policy actions (overview of the Belgian expertise)

ED-North project ~ Methods
• Literature study
  - electronic databases: Poltox, Medline, Current Contents, CAB Abstracts, Agris, Agricola, Web of Science, …
  - World Wide Web: USEPA, OECD, WWF, CEFIC, IEH, …
  - grey literature
• Database in MS Access (relational database)

ED-North project ~ Results
• General overview of endocrine disruption in humans and other mammals, birds, reptiles, fish and invertebrates
• Situation in Belgium and the Netherlands
• Expertise in Belgium
• Emission of synthetic and natural hormones in Belgium
• Sources, effects and occurrence of endocrine disruptors in the North Sea, plus prioritization
• Database of (potential) endocrine disruptors for the North Sea ecosystem

Relational database: anthropogenic (potential) endocrine disruptors
• CHEMICALS (765): Chemical ID, Chemical Name Nl, Chemical Name E, CAS, UN, Chemical Formula, Molecular Weight, Boiling Point, Melting Point, Density, Pressure, Solubility, Log Kow, Phase, Notes
• ENDOCRINE: Endocrine ID, Chemical ID, Reference ID, Group Name, Organism, Tissue, Age, In vivo, Lab, Flow, Duration, Route, Temperature, Concentration, Notes
• EFFECT (3516): Effect ID, Hormone Name, Endocrine ID, Effect Code, Effect description
• HORMONE: Hormone Name
• EFFECT CODE: Effect Code
• REFERENCES (423): Reference ID, Authors, Year, Title, Source
• GROUP: Group Name

Relational database: example record linked across the tables References, Endocrine and Chemicals
• Table References: Ref ID 26 | Authors: Soto, A.M., Chung, K.L., Sonnenschein, C. | Year: 1994 | Source: Environ. Health Perspect., 102:380-383
• Table Chemicals: Chem ID 240 | ChemNameNl: DDT | CAS: 50-29-3 | Chem Form: C14H9Cl5 | Mol weight: 354.49 | BP: 260°C | MP: 108°C | Pressure: 1.9E-7 mm Hg at 20°C | Solubility: 3.1-3.4 µg/l | Log Kow: 6.19 | Phase: Solid
• Table Endocrine: EndocrinID 2598 | Chem ID 240 | Ref ID 26 | Group: mammalian | Organism: Human | Tissue: MCF-7 cells | Age: (blank) | In Vivo: In vitro | Lab: Laboratory | Duration: 6 days | Concentration: 10 µM | Notes: Technical grade; E-screen

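A minimal sketch of how the shared ID columns tie the three tables together, using Python's built-in `sqlite3` module instead of MS Access (a substitution for illustration only). Table and column names are simplified, hypothetical renderings of the slide's fields (`refs` stands in for References, which collides with an SQL keyword), only a few columns are kept, and the ID assignment (Ref ID 26, Chem ID 240, Endocrine ID 2598) is our reading of the flattened slide.

```python
import sqlite3

# Hypothetical, reduced schema: only a few of the slide's columns are kept,
# and the SQL types are assumptions.
SCHEMA = """
CREATE TABLE chemicals (
    chem_id      INTEGER PRIMARY KEY,
    chem_name_nl TEXT,
    cas          TEXT,
    log_kow      REAL
);
CREATE TABLE refs (
    ref_id  INTEGER PRIMARY KEY,
    authors TEXT,
    year    INTEGER
);
CREATE TABLE endocrine (
    endocrine_id  INTEGER PRIMARY KEY,
    chem_id       INTEGER REFERENCES chemicals(chem_id),
    ref_id        INTEGER REFERENCES refs(ref_id),
    organism      TEXT,
    tissue        TEXT,
    concentration TEXT
);
"""


def build_example_db():
    """In-memory database holding the single DDT example record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO chemicals VALUES (240, 'DDT', '50-29-3', 6.19)")
    conn.execute("INSERT INTO refs VALUES "
                 "(26, 'Soto, A.M., Chung, K.L., Sonnenschein, C.', 1994)")
    conn.execute("INSERT INTO endocrine VALUES "
                 "(2598, 240, 26, 'Human', 'MCF-7 cells', '10 µM')")
    return conn


def flat_view(conn):
    """Join on the shared IDs to rebuild one flat row per endocrine test."""
    return conn.execute("""
        SELECT e.endocrine_id, c.chem_name_nl, c.cas, r.authors, r.year,
               e.organism, e.tissue
        FROM endocrine AS e
        JOIN chemicals AS c ON c.chem_id = e.chem_id
        JOIN refs AS r ON r.ref_id = e.ref_id
    """).fetchall()


for row in flat_view(build_example_db()):
    print(row)
```

Keeping chemical properties and literature details in their own tables means one chemical or one paper can be linked to many endocrine test records without repeating its data.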
Rule induction techniques
Data mining (analysis) techniques:
1) Clustering methods (which data are related or 'similar'), e.g. cluster analysis
2) Classification methods (how variables are related, expressed through classes, numerical or not, i.e. rules among variables), e.g. decision trees
3) Regression methods (quantitative description of the relation between two or more variables), e.g. multivariate regression

Rule induction techniques
• Classification and decision trees: induction of rules from data sets
  - which variables are related, e.g. which variables are mainly related to endocrine-disruptive effects in animals
  - how variables are related (quantitative rules using threshold values or classes), e.g. when the hormone concentration is higher than value A, estrogenic effects of type X will occur

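To make "quantitative rules using threshold values" concrete, here is a minimal sketch of how a decision-tree learner can choose a threshold on one numeric variable: try the midpoint between each pair of neighbouring sorted values and keep the split with the highest information gain. This is the basic criterion behind ID3/C4.5-style trees, not necessarily the exact procedure WEKA applied in this study; the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) of the split 'value <= t' vs 'value > t'
    with the highest information gain; (None, 0.0) if no split helps."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for (lo, _), (hi, _) in zip(pairs, pairs[1:]):
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        t = (lo + hi) / 2  # candidate threshold: midpoint of neighbours
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical toy data: a numeric variable (e.g. Log Kow) and an effect class.
print(best_threshold([2.0, 3.0, 3.5, 4.5, 5.0, 6.0],
                     ["E", "E", "E", "N", "N", "N"]))  # → (4.0, 1.0)
```

A full tree learner repeats this search on every candidate variable, splits on the best one and recurses into each branch, which is how nested rules such as "LogKow > 3.74, then Solubility ..." arise.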
Rule induction techniques
WEKA data mining software: usable from the DOS command window, but also through a visual Java interface

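WEKA reads its input in the ARFF text format. The sketch below is a hypothetical helper that writes numeric attributes plus a nominal class attribute to ARFF, marking missing values (such as the blank Age field in the database) with `?` as ARFF does. Attribute names and rows are placeholders, not values from the ED-North database.

```python
def to_arff(relation, numeric_attrs, class_attr, class_values, rows):
    """Write rows (numeric values followed by a class label) as ARFF text.
    Assumes names and labels contain no spaces or commas (no quoting done)."""
    lines = [f"@RELATION {relation}", ""]
    for name in numeric_attrs:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    # Nominal class attribute: the allowed values are listed in braces.
    lines.append(f"@ATTRIBUTE {class_attr} {{{','.join(class_values)}}}")
    lines += ["", "@DATA"]
    for row in rows:
        # ARFF marks a missing value with '?'.
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


# Placeholder rows, not data from the ED-North database.
print(to_arff("crustaceans", ["LogKow", "Solubility"], "Effect",
              ["Estrogenic", "None"],
              [(3.2, 0.01, "Estrogenic"), (5.1, None, "None")]))
```

The slides do not name the learner that was used. Assuming WEKA's J48 decision tree (its text output uses the same "LogKow <= ..." / "|" layout as the rule sets in the following slides), such a file could be run from the command window with `java -cp weka.jar weka.classifiers.trees.J48 -t crustaceans.arff`, where both file names are placeholders.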
[Figure panels: induced rule set; rule set performance indicators]

Applications on the ED-North database
Example on crustacean data:
1) Prediction of endocrine-disruptive effects based on the physical/chemical properties of chemicals
2) Prediction of the estrogenic effect of chemicals on the crustaceans in the database
3) Which factors (flow, concentration, duration, ...) affect this estrogenicity

1) Which molecular characteristics are related to estrogenic effects
Estrogenic effects in crustaceans (89 cases)
Tested variables: effects, molecular weight, boiling point, temperature, Log Kow, solubility
Induced rule set:
LogKow <= 3.74: Estrogenic effect
LogKow > 3.74
|   Solubility <= 0.00033: No estrogenic effect
|   Solubility > 0.00033: Estrogenic effect
Reliability (CCI, correctly classified instances): 63 %

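The induced rule set reads as nested conditions. A minimal sketch, assuming the comparison missing from the first rule is `<=` (the complement of the "> 3.74" branch) and leaving solubility in the slide's unstated units; `cci` simply counts correctly classified instances. The four cases are made up for illustration, and the slides do not say whether the 63 % was measured on the training data or by cross-validation.

```python
def predict(log_kow, solubility):
    """Induced rule set for crustaceans, as nested conditions.
    Assumes '<=' in the first rule; solubility units are not given."""
    if log_kow <= 3.74:
        return "Estrogenic effect"
    if solubility <= 0.00033:
        return "No estrogenic effect"
    return "Estrogenic effect"


def cci(cases):
    """Percentage of correctly classified instances (CCI)."""
    correct = sum(predict(k, s) == label for k, s, label in cases)
    return 100.0 * correct / len(cases)


# Made-up cases: (Log Kow, solubility, observed class).
cases = [
    (2.0, 0.1, "Estrogenic effect"),
    (5.0, 0.0001, "No estrogenic effect"),
    (5.0, 0.01, "No estrogenic effect"),  # misclassified by the rules
    (6.2, 0.001, "Estrogenic effect"),
]
print(cci(cases))  # → 75.0
```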
2) Which estrogenic effects are related to particular compounds in the environment
Estrogenic effects in crustaceans
Tested variables: effects, compounds
Induced rule set (23 rules, one for each compound):
CHEMID = 4-nonylphenol (p-nonylphenol): Estrogenic effect
CHEMID = ...
...
CHEMID = 20-hydroxyecdysone: No estrogenic effect
Reliability (CCI): 60 %

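A rule set with one rule per compound is what a single split on one nominal variable produces: each value is assigned the majority class of its cases (the same idea as the OneR learner). A minimal sketch with a hypothetical helper name; the compound names come from the slide, but the cases are invented.

```python
from collections import Counter, defaultdict


def rules_per_value(cases):
    """Map each value of one nominal attribute to its majority class."""
    counts = defaultdict(Counter)
    for value, label in cases:
        counts[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


# Compound names from the slide; the cases themselves are invented.
cases = [
    ("4-nonylphenol", "Estrogenic effect"),
    ("4-nonylphenol", "Estrogenic effect"),
    ("4-nonylphenol", "No estrogenic effect"),
    ("20-hydroxyecdysone", "No estrogenic effect"),
]
for compound, effect in rules_per_value(cases).items():
    print(f"CHEMID = {compound}: {effect}")
```

Every case whose class differs from its compound's majority class is misclassified, which is one way a per-compound rule set ends up with a CCI well below 100 %.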
2) Which estrogenic effects are related to particular compounds in the environment
Estrogenic effects in crustaceans
Tested variables: effects, organisms, compounds
Induced rule set (13 rules, one for each organism):
Organism = Balanus amphitrite: No estrogenic effect
Organism = Daphnia magna: Estrogenic effect
...
Reliability (CCI): 74 %

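When several nominal variables are offered, a tree learner places at the root the one that best separates the classes, so the per-organism rule set suggests that organism was the more informative variable in this run. A minimal sketch of that choice using information gain; the rows and the placeholder compounds "A" and "B" are hypothetical. Plain information gain favours variables with many values (23 compounds versus 13 organisms here), which is why C4.5-style learners use the gain ratio instead.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attr, target):
    """Information gain of a multiway split on the nominal attribute `attr`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder


def best_attribute(rows, attrs, target):
    """Attribute a tree learner would place at the root under this criterion."""
    return max(attrs, key=lambda a: info_gain(rows, a, target))


rows = [  # hypothetical cases; "A" and "B" are placeholder compounds
    {"organism": "Daphnia magna", "compound": "A", "effect": "Estrogenic effect"},
    {"organism": "Daphnia magna", "compound": "B", "effect": "Estrogenic effect"},
    {"organism": "Balanus amphitrite", "compound": "A", "effect": "No estrogenic effect"},
    {"organism": "Balanus amphitrite", "compound": "B", "effect": "No estrogenic effect"},
]
print(best_attribute(rows, ["compound", "organism"], "effect"))  # → organism
```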
3) Which factors affect the estrogenic effects
Estrogenic effects in crustaceans
Tested variables: effects, organisms, compounds, age, flow, in vitro/in vivo, duration
Induced rule set (16 rules: one for each age class and, for the larval class, one for each organism type):
Age = Juvenile: No estrogenic effect
Age = Larval
|   Organism = Balanus amphitrite: Estrogenic effect
|   Organism = ...
Age = Adult: Estrogenic effect
Age = Egg: Estrogenic effect
Reliability (CCI): 78 %

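The Age/Organism rule set is a two-level tree. A minimal sketch that stores it as nested dictionaries, prints it in the same "|"-indented layout and classifies a case by walking down the tree. Only the branches visible on the slide are included; the elided larval organisms ("Organism = ...") are left out rather than guessed, so `classify` raises `KeyError` for them. The function names are hypothetical.

```python
def format_rules(tree, depth=0):
    """Render a nested {attribute: {value: subtree_or_label}} tree as
    '|'-indented rules, one line per branch."""
    (attr, branches), = tree.items()  # each node tests exactly one attribute
    lines = []
    for value, sub in branches.items():
        rule = "|   " * depth + f"{attr} = {value}"
        if isinstance(sub, dict):
            lines.append(rule)
            lines.extend(format_rules(sub, depth + 1))
        else:
            lines.append(f"{rule}: {sub}")
    return lines


def classify(tree, case):
    """Follow the branch matching each tested attribute down to a leaf."""
    while isinstance(tree, dict):
        (attr, branches), = tree.items()
        tree = branches[case[attr]]
    return tree


# Only the branches visible on the slide; elided larval organisms omitted.
TREE = {"Age": {
    "Juvenile": "No estrogenic effect",
    "Larval": {"Organism": {"Balanus amphitrite": "Estrogenic effect"}},
    "Adult": "Estrogenic effect",
    "Egg": "Estrogenic effect",
}}

print("\n".join(format_rules(TREE)))
print(classify(TREE, {"Age": "Larval", "Organism": "Balanus amphitrite"}))
```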
General discussion
This exercise on the ED-North database showed that data mining can help to find relations between:
• compounds and their structure
• estrogenic effects
• test and environmental conditions
• type of organisms

General discussion
• Data mining helps to find errors and outliers in the data set, and provides insights to improve further data collection and database development
• Interaction between data miners and domain experts (ecologists, ecotoxicologists) is very important:
  1) it is easy to find 'reliable nonsense' rules when important variables are excluded from the analysis (need for the expertise of an ecotoxicologist)
  2) the parameter settings, and insight into how to tune them, strongly affect the richness of the outcome of the data mining exercise (need for data mining expertise)

General discussion
• The collected data set itself largely determines the outcome of the analysis:
  - the data should cover the whole range of the variables and their values/classes, and the instances need to be stratified
  - the choice of variable classes can strongly affect the results (e.g. the larval-adult problem, the number of effect classes, ...)

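One way to act on the stratification point when estimating rule reliability is stratified cross-validation, in which every fold keeps roughly the class proportions of the whole data set. A minimal sketch with a hypothetical function name that deals each class round-robin across k folds; in practice the indices within each class would be shuffled first, and the labels below are invented.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Split instance indices into k folds that keep roughly the class
    proportions of the full data set (each class dealt round-robin)."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    j = 0  # shared counter keeps fold sizes balanced across classes
    for indices in by_class.values():
        for i in indices:
            folds[j % k].append(i)
            j += 1
    return folds


labels = ["larval"] * 6 + ["adult"] * 3  # hypothetical class labels
print(stratified_folds(labels, 3))  # → [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```

Without stratification, a rare class (for example a small number of adult cases) can be missing from some test folds entirely, which makes the reported CCI less trustworthy.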
Conclusions
• Data mining reveals which gaps exist in the database and delivers information for sustainable data collection and management
• Data mining delivers insight into the data set: generation of knowledge from data
• Highly unpredictable parts of the data set are useful for focusing further research
• General, reliable rules are promising for decision support in environmental management
• Be aware that data mining explores correlations, not causal relations! Control by experts or further research (validation) is always necessary
• Data mining adds more colour to our data

Acknowledgements
• Federal Office for Scientific, Technical and Cultural Affairs (OSTC)
• Thesis students
• Ward Vanden Berghe (VLIZ)
• The Flemish Institute for the Promotion of Scientific and Technological Research in Industry (IWT)