Analysis & interpretation of evidence
Hypothesis testing: a simple scenario • Which scenario presents a stronger case for dissolved oxygen (DO) causing impairment? • SCENARIO 1 • DO measured upstream & downstream over 9 months • Upstream = 9.3 mg/L • Downstream = 8.4 mg/L • Difference significant at P < 0.05 • SCENARIO 2 • DO measured upstream & downstream over 3 months • Upstream = 7.9 mg/L • Downstream = 4.2 mg/L • Difference not significant at P < 0.05
Using statistics responsibly • Use caution in interpreting differences • Look at magnitude & consistency of differences, rather than statistical significance • Statistical significance detects differences exceeding natural variance • Does not detect stressor effects • Does not equal biological significance • Can use statistics, but also use your head • Consider the relationship between minimum detectable difference (power) & the biologically relevant difference
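The relationship between minimum detectable difference and the biologically relevant difference can be made concrete with a small calculation. The following is a minimal Python sketch (the deck's own tools are R-based), assuming a two-sided, two-sample comparison of means, equal sample sizes, a common standard deviation, and a normal approximation. The standard deviation, sample sizes, and the "biologically relevant" difference of 2.0 mg/L are hypothetical values chosen for illustration, not data from the case.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_difference(sd, n_per_group, alpha=0.05, power=0.80):
    """Approximate smallest true difference in means that a two-sided,
    two-sample test detects with the requested power (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return (z_alpha + z_power) * sd * sqrt(2 / n_per_group)


# Hypothetical inputs: DO standard deviation of 2.5 mg/L at both locations.
mdd_few = minimum_detectable_difference(sd=2.5, n_per_group=3)
mdd_many = minimum_detectable_difference(sd=2.5, n_per_group=9)
relevant_difference = 2.0  # hypothetical biologically relevant change, mg/L

for label, mdd in (("3 samples", mdd_few), ("9 samples", mdd_many)):
    verdict = "can" if mdd <= relevant_difference else "cannot reliably"
    print(f"{label}: MDD = {mdd:.2f} mg/L, so the design {verdict} "
          f"detect a {relevant_difference} mg/L difference")
```

When the minimum detectable difference exceeds the biologically relevant difference, a non-significant result says little about whether a meaningful difference exists.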
Use all available types of evidence to make an inferential assessment • Types of evidence using data from the case • Spatial/temporal co-occurrence • Evidence of exposure or biological mechanism • Causal pathway • Stressor-response relationships from the field • Manipulation of exposure • Laboratory tests of site media • Temporal sequence • Verified predictions • Symptoms • Types of evidence using data from elsewhere • Stressor-response relationships from other field studies • Stressor-response relationships from laboratory studies • Stressor-response relationships from ecological simulation models • Mechanistically plausible cause • Manipulation of exposure at other sites • Analogous stressors • (In the original slide, italics indicate commonly available types of evidence)
Sampling/data collection issues • Want to collect candidate cause & biological response variables at same sites, at same times • Want to time sampling to accurately assess exposures • For continuous exposures, try to catch sensitive life stages & conditions that enhance exposure or effects • For episodic exposures, try to characterize episodes through continuous, event, and post-event sampling • Want to collect good reference comparisons, in space & time • At unimpaired sites, at same times you sample impaired sites
Begin with data exploration • Want to examine relationships between different variables • Linear & more complex • Expected & unexpected • 1st step is visualizing data • Scatterplots & boxplots • No statistics, hypotheses, or p-values
Be sure to look at your data… • Do these data support or weaken the case for low dissolved oxygen as a candidate cause? • What type of evidence do the data represent?
Boxplots • Depict distribution of observations • Center box indicates spread of the middle 50% of observations • Whiskers indicate either max & min values, or the most extreme values within 1.5× the interquartile range of the box • Good data exploration & visualization tool • Can be used to evaluate data from case & data from elsewhere • Spatial/temporal co-occurrence • Stressor-response relationships from the field & from other field studies • Causal pathway
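The quantities a boxplot draws can be computed directly. This is a minimal Python sketch, assuming the 1.5 × IQR whisker convention and the "inclusive" quartile method from Python's standard library (other software may compute quartiles slightly differently). The dissolved oxygen values are hypothetical.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Return the box, whisker, and outlier values for one group of observations."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers extend to the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical dissolved oxygen measurements (mg/L) at one site.
do_mg_per_l = [8.1, 8.4, 7.9, 8.8, 9.0, 8.3, 8.6, 4.2, 8.5]
stats = boxplot_stats(do_mg_per_l)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Computing the same summary for upstream and downstream sites gives the numbers behind a side-by-side boxplot comparison.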
Scatterplots • Graphical displays of matched data • Dependent variable (y-axis) vs. independent variable (x-axis) • Often dependent variable is biological response, independent variable is measure of candidate cause • Good data exploration & visualization tool • Can view several at once in scatterplot matrix • Can be used to evaluate data from case & from elsewhere • Stressor-response relationships from the field, from other field studies, & from laboratory studies • Causal pathway • [Figure: example scatterplot with an axis labeled Conductivity (µS/cm)]
Example: scatterplot • [Figure: EPT taxa richness vs. maximum summer water temperature (°C), with the reference site and impaired site marked] • Do the data support or weaken the case for temperature as a candidate cause? • What type(s) of evidence do these data provide?
Correlation analysis • Method for measuring degree of association between 2 variables in a matched data set • Correlation coefficient provides quantitative measure of degree to which 2 variables co-vary • Good data exploration tool • Can be used to evaluate data from case • Stressor-response relationships from the field • Causal pathway
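As an illustration of the correlation coefficient, here is a minimal Python sketch of the Pearson coefficient for a matched data set. It assumes a roughly linear association (a rank-based coefficient such as Spearman's would be an alternative for monotonic but non-linear patterns). The temperature and EPT richness values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two matched sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be matched and contain at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical matched observations collected at the same sites and times.
temperature_c = [14.0, 16.5, 18.0, 19.5, 21.0, 23.5]
ept_richness = [22, 20, 17, 15, 11, 8]

r = pearson_r(temperature_c, ept_richness)
print(f"r = {r:.3f}")
```

A coefficient near -1 or +1 indicates that the two variables co-vary strongly; it does not by itself show that one causes the other.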
Regression analysis • Method for quantifying relationship between dependent & independent variables • Can have ≥ 1 independent variables • Models can be used to: • Predict value of response variable for new values of explanatory variable • Estimate value of explanatory variable needed to account for change in response variable • Model natural variation in stressors, to define reference expectations • [Figure annotation: 90th percentile prediction limits]
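The first two uses of a regression model listed above can be sketched with ordinary least squares and a single explanatory variable. This is a minimal Python illustration, not the deck's R tooling; the conductivity and EPT richness values are hypothetical, and prediction limits are omitted for brevity.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical matched data: conductivity (µS/cm) and EPT taxa richness.
conductivity = [100, 200, 300, 400, 500]
ept_richness = [24, 21, 17, 14, 9]

intercept, slope = fit_line(conductivity, ept_richness)

# Use 1: predict the response for a new value of the explanatory variable.
predicted_richness = intercept + slope * 350

# Use 2: estimate the explanatory value that would account for a given response.
conductivity_for_12_taxa = (12 - intercept) / slope

print(f"richness = {intercept:.2f} + ({slope:.4f}) * conductivity")
print(f"predicted richness at 350 µS/cm: {predicted_richness:.2f}")
print(f"conductivity matching 12 taxa: {conductivity_for_12_taxa:.0f} µS/cm")
```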
Quantile regression • Models relationship between specified quantile of response variable and explanatory variable (stressor) • 50th quantile gives median line, under which 50% of observed responses occur • 90th quantile gives line under which 90% of observed responses occur • Provides means of estimating location of upper boundary on scatterplot • Used to evaluate stressor-response relationships from other field studies (Step 4: Evaluating data from elsewhere)
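To show what fitting a 90th quantile line involves, here is a minimal Python sketch for one explanatory variable. It minimizes the standard "pinball" (check) loss by brute force, relying on the fact that an optimal line passes through at least two data points; this is practical only for small data sets, and dedicated quantile regression routines would normally be used instead. The stressor and response values are hypothetical.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Asymmetric loss: points above the line cost tau, points below cost 1 - tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def quantile_line(x, y, tau):
    """Return (intercept, slope) of the tau-th quantile regression line."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a line through these two points would be vertical
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(pinball_loss(b - (intercept + slope * a), tau)
                   for a, b in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two observations with distinct x values")
    return best[1], best[2]


# Hypothetical wedge-shaped data: the upper limit of the response
# declines as the stressor increases.
stressor = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
response = [30, 12, 27, 8, 24, 20, 5, 17, 14, 3, 9, 6]

intercept, slope = quantile_line(stressor, response, tau=0.9)
at_or_below = sum(b <= intercept + slope * a + 1e-9
                  for a, b in zip(stressor, response))
print(f"90th quantile line: response = {intercept:.2f} + ({slope:.2f}) * stressor")
print(f"{at_or_below} of {len(stressor)} observations lie on or below the line")
```

The fitted line approximates the upper boundary of the scatterplot, which is the feature compared against site observations when evaluating data from elsewhere.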
Applying quantile regression • [Figure: example plots labeled WEAKENS and SUPPORTS]
Predicting environmental conditions from biological observations (PECBO) • Uses taxa list & niche characteristics to predict environmental conditions at site • Maximum likelihood inference based on taxon-environment relationships • Useful when environmental data not available • [Figure: taxon-environment relationships for Ameletus and Diphetor, showing the predicted temperature if both are present at a site]
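The maximum likelihood idea behind PECBO can be sketched in a few lines. This Python example is an illustration only and is not the CADDIS implementation: it assumes a bell-shaped capture probability for each taxon and searches a temperature grid for the value that makes the observed presences and absences most likely. The optimum, tolerance, and peak values assigned to Ameletus and Diphetor are invented for the example, not measured taxon-environment relationships.

```python
from math import exp, log

# Hypothetical taxon-environment relationships (temperature in °C).
TAXA = {
    "Ameletus": {"optimum": 12.0, "tolerance": 3.0, "peak": 0.8},
    "Diphetor": {"optimum": 16.0, "tolerance": 3.0, "peak": 0.8},
}


def occurrence_probability(temp_c, optimum, tolerance, peak):
    """Bell-shaped probability of collecting a taxon at a given temperature."""
    return peak * exp(-0.5 * ((temp_c - optimum) / tolerance) ** 2)


def log_likelihood(temp_c, observed):
    """Log-likelihood of the observed presence/absence list at one temperature."""
    total = 0.0
    for taxon, present in observed.items():
        p = occurrence_probability(temp_c, **TAXA[taxon])
        p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite
        total += log(p) if present else log(1 - p)
    return total


def infer_temperature(observed, low=5.0, high=30.0, step=0.1):
    """Grid search for the temperature that maximizes the likelihood."""
    n_steps = int(round((high - low) / step))
    grid = [low + k * step for k in range(n_steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, observed))


both_present = infer_temperature({"Ameletus": True, "Diphetor": True})
ameletus_only = infer_temperature({"Ameletus": True, "Diphetor": False})
print(f"both present: inferred temperature about {both_present:.1f} °C")
print(f"Ameletus only: inferred temperature about {ameletus_only:.1f} °C")
```

With both taxa present, the inferred temperature falls between the two assumed optima; the absence of the warmer-water taxon pulls the estimate downward.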
Using PECBO to verify predictions • Biologically-based predictions can provide valuable supporting evidence, under “Verified predictions” • If stressor elevated at site, can test hypothesis that biologically-based prediction should also be elevated • Tools for calculating taxon-environment relationships & PECBO available on CADDIS • [Figure annotation: R² = 0.67]
Species sensitivity distributions (SSDs) • An SSD is a statistical distribution describing the variation of responses of a set of species to a stressor; it can be used to estimate the proportion of species adversely affected at a given stressor intensity. • [Figure: SSD plot showing LC50s for arthropods exposed to dissolved cadmium]
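Estimating the proportion of species affected at a given stressor intensity can be sketched as follows. This minimal Python example assumes the species' LC50s follow a log-normal distribution, which is a common but not universal choice for SSDs. The LC50 values are hypothetical and are not the cadmium data from the slide.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev


def fit_ssd(lc50_values):
    """Fit a log-normal SSD: a normal distribution on log-transformed LC50s."""
    logs = [log(v) for v in lc50_values]
    return NormalDist(mean(logs), stdev(logs))


def proportion_affected(ssd, concentration):
    """Estimated fraction of species whose LC50 is at or below the concentration."""
    return ssd.cdf(log(concentration))


def hazardous_concentration(ssd, fraction=0.05):
    """Concentration at which the given fraction of species is affected."""
    return exp(ssd.inv_cdf(fraction))


# Hypothetical LC50s (µg/L), one value per species.
lc50_ug_per_l = [3.5, 12.0, 28.0, 65.0, 150.0, 420.0, 1100.0]

ssd = fit_ssd(lc50_ug_per_l)
hc5 = hazardous_concentration(ssd, 0.05)
print(f"proportion affected at 50 µg/L: {proportion_affected(ssd, 50.0):.2f}")
print(f"concentration affecting 5% of species: {hc5:.2f} µg/L")
```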
Data analysis tools • Species sensitivity distribution (SSD) generator • Tool that calculates & plots proportion of species affected at different levels of exposure in laboratory toxicity tests • Command-line R scripts • Powerful and free statistical package [http://www.r-project.org/] • R scripts provided for PECBO • CADStat • Menu-driven package of data visualization & statistical techniques, based on JGR (Java GUI for R)
CADStat is based on R • R is a powerful and free statistical package [http://www.r-project.org/] • Provides flexible plotting algorithms & libraries for running virtually any statistical analysis • It’s run at the command line, so there’s a steep learning curve
Making R easier • A Graphical User Interface (GUI) has been developed for R • Java GUI for R (JGR) • provides point-and-click interface for basic R functions • CADStat adds several tools to JGR • point-and-click interfaces for statistical tools useful in causal analysis • rendering of analysis results in browser window • additional help files