Statistical basis for dynamic prediction of beach bacteria concentrations

Zhongfu Ge, National Research Council
Walter E. Frick, Ecosystems Research Div., NERL, USEPA, Athens, GA
NOAA’s Ocean and Human Health Initiative, All PI’s 2006 Annual Meeting, January 18-20, 2006, Charleston, SC

Statistical basis for the empirical model of Virtual Beach

Objectives
• To demonstrate multiple linear regression (MLR) modeling of E. coli concentrations
• To clarify a few misunderstandings and pitfalls in practice
• To promote the idea of dynamic modeling, based on a growing database

Example of modeling at a Lake Erie beach
• Huntington Beach, OH: data from 247 days in 2001; only four explanatory variables available

Cross-correlation with time delays
• Do the data records need to be synchronized? Not for this case: the highest correlation occurs at zero time delay
(Figure: correlation coefficient versus time delay; the correlations at nonzero delays are insignificant)
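The time-delay check above can be sketched in a few lines. This is a minimal illustration, assuming one explanatory series is shifted against the response by 0 to 2 days; the variable names and the data values are hypothetical and are not the Huntington Beach records.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lagged_correlation(x, y, max_lag):
    """Correlate y[t] with x[t - lag] for lag = 0..max_lag (x leads y)."""
    result = {}
    for lag in range(max_lag + 1):
        x_part = x[: len(x) - lag] if lag else x  # drop the last `lag` values of x
        y_part = y[lag:]                          # drop the first `lag` values of y
        result[lag] = pearson(x_part, y_part)
    return result

# Hypothetical daily series (illustrative values only).
turbidity = [3.0, 8.0, 2.0, 9.0, 4.0, 7.0, 1.0, 6.0, 5.0, 10.0]
log_ecoli = [1.6, 2.5, 1.3, 2.8, 1.9, 2.3, 1.2, 2.2, 2.0, 2.9]

corr = lagged_correlation(turbidity, log_ecoli, max_lag=2)
best_lag = max(corr, key=lambda k: abs(corr[k]))  # lag with the strongest correlation
```

If `best_lag` is 0, as it is for this made-up series, the records can be used day-for-day without shifting.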

Transformation is necessary
• Inspect scatter plots to see whether any variable needs a transformation to equalize the spread
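As a small illustration of why a transformation can equalize spread, the sketch below compares two groups before and after a base-10 log transform. The log transform is an assumption here (it is a common choice for bacteria counts); the slide does not say which transformations were used, and the values are hypothetical.

```python
from math import log10
from statistics import stdev

# Hypothetical concentrations (illustrative values only): the second group is
# 100 times the first, so its raw spread is 100 times larger.
low_group = [8.0, 9.0, 10.0, 11.0, 12.0]
high_group = [100.0 * v for v in low_group]

# Ratio of standard deviations on the raw scale.
raw_spread_ratio = stdev(high_group) / stdev(low_group)

# Ratio of standard deviations after the log transform.
log_low = [log10(v) for v in low_group]
log_high = [log10(v) for v in high_group]
log_spread_ratio = stdev(log_high) / stdev(log_low)
```

On the raw scale the spreads differ by a factor of 100; on the log scale they are equal, because multiplying by a constant only shifts the logged values.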

Interaction terms
• Including interaction terms can greatly improve fitting performance
• Remember that transformations for equal spread are still needed
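The effect of an interaction term can be sketched with synthetic data. This is an illustration under stated assumptions, not the authors' model: the two predictors `x1` and `x2`, the coefficients, and the noise level are all invented, and the fit uses ordinary least squares from NumPy.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; X must already contain an intercept column.
    Returns the coefficient vector and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

# Synthetic data (assumption): the response truly depends on the product x1 * x2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
X_main = np.column_stack([ones, x1, x2])            # main effects only
X_inter = np.column_stack([ones, x1, x2, x1 * x2])  # main effects plus interaction

_, r2_main = fit_ols(X_main, y)
beta_inter, r2_inter = fit_ols(X_inter, y)
```

Because the main-effects model is nested in the interaction model, `r2_inter` can never be lower than `r2_main`; whether the gain is worth the extra term is a model-selection question.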

Categorization of wind direction
• Wind direction was categorized into northerly (WD=0) and southerly (WD=1) winds; histograms show equal spreads
• The categorized data are nearly normally distributed
(Figure panels: Northerly winds; Southerly winds)

Other issues with data inspection
• Multicollinearity: variance inflation factor (VIF) for each explanatory variable
• Adjustment for “time-series effect”
(Table: correlation coefficients)
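The VIF check can be sketched as follows, using the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the remaining explanatory variables. The three synthetic columns are assumptions chosen so that two of them are nearly collinear.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column).
    Column j is regressed on the other columns plus an intercept."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic predictors (assumption): c is almost a copy of a, b is independent.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)

vifs = vif(np.column_stack([a, b, c]))
```

Here the VIFs for `a` and `c` come out large while the VIF for `b` stays near 1, which is the pattern that signals multicollinearity between two variables.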

MLR fitting of the full model and the normality of the residuals
• The residuals are very close to normally distributed

Outlier identification
• Adjustment for serial correlation
• Dealing with outliers is not simple; if there is no evidence of measurement error, they should be kept
Table 3: Influential cases and outliers from the full model; total number of cases: 247; numbers in red are influential outliers (appearing in both rows)
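One common way to screen for influential cases and outliers is to compute leverage, studentized residuals, and Cook's distance for every case. The sketch below uses those standard definitions; it is not necessarily the procedure behind Table 3, and the data (a straight line with one corrupted final observation) are synthetic.

```python
import numpy as np

def influence(X, y):
    """Case-influence statistics for an OLS fit; X includes the intercept column.
    Returns leverage h_ii, internally studentized residuals, and Cook's distance."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)  # diagonal of the hat matrix
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    s2 = (resid @ resid) / (n - p)                      # residual variance estimate
    student = resid / np.sqrt(s2 * (1.0 - leverage))
    cooks = student**2 * leverage / (p * (1.0 - leverage))
    return leverage, student, cooks

# Synthetic data (assumption): a clean linear trend with the last case corrupted.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=30)
y[29] += 8.0

X = np.column_stack([np.ones(30), x])
leverage, student, cooks = influence(X, y)
worst = int(np.argmax(cooks))  # index of the most influential case
```

Widely used rules of thumb flag cases with an absolute studentized residual above about 2 or a Cook's distance near or above 1; a flagged case is a prompt to inspect the measurement, not a licence to delete it.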

Model selection
• Backward elimination: Cp + R2 or BIC + R2
• The criteria are computed from the number of variables, the sample size, and the standard deviation
(Tables: sequence of elimination; best models)
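Backward elimination guided by Cp can be sketched as below. The slide's own formulas did not survive extraction, so this sketch assumes the standard Mallows form Cp = SSE_p / s2_full - n + 2p, where p counts the fitted coefficients including the intercept and s2_full is the residual variance of the full model. The stopping rule (stop when no single removal lowers Cp), the variable names, and the data are likewise assumptions.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def backward_eliminate(X, y, names):
    """Drop one variable at a time while doing so lowers Mallows' Cp.
    X holds the candidate variables without an intercept column."""
    n = len(y)
    ones = np.ones((n, 1))
    full = np.hstack([ones, X])
    s2_full = sse(full, y) / (n - full.shape[1])

    def cp(cols):
        design = np.hstack([ones, X[:, cols]])
        return sse(design, y) / s2_full - n + 2 * design.shape[1]

    keep = list(range(X.shape[1]))
    dropped = []
    current = cp(keep)  # equals the number of coefficients for the full model
    while keep:
        trials = [(cp([c for c in keep if c != j]), j) for j in keep]
        best_cp, best_j = min(trials)
        if best_cp >= current:
            break
        keep.remove(best_j)
        dropped.append(names[best_j])
        current = best_cp
    return [names[c] for c in keep], dropped

# Synthetic data (assumption): only x0 and x1 actually drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))
y = 1.0 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=150)
names = ["x0", "x1", "x2", "x3"]
kept, dropped = backward_eliminate(X, y, names)
```

The `dropped` list records the sequence of elimination. Cp is used together with R2 here in the spirit of the slide: Cp guards against overfitting, while R2 reports how much variation the surviving model explains.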

Model selection
• What if we didn’t have interaction terms? All models are biased, so R2 means nothing

Dynamic modeling
• Models are updated when new observations are added to the database
(Figure: predictions with (left) and without (right) the outliers, days #1 and #135)

Dynamic modeling
• The table below shows how the models change with time
(Table legend: a marked entry means the variable is in the model; R2 is consistently around 48%)
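The idea of updating the model as the database grows can be sketched as an expanding-window refit: each day the regression is re-estimated on all earlier days and then used to predict the current day. This is a minimal sketch with a fixed set of variables; re-running model selection at each update, as the table implies, is omitted. The data, the variable names, and the `min_train` warm-up length are assumptions.

```python
import numpy as np

def expanding_window_forecast(X, y, min_train):
    """For each day t >= min_train, fit OLS on days [0, t) and predict day t.
    X includes the intercept column. Days before min_train get NaN."""
    preds = np.full(len(y), np.nan)
    for t in range(min_train, len(y)):
        beta, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        preds[t] = X[t] @ beta
    return preds

# Synthetic daily records (assumption): two explanatory variables plus noise.
rng = np.random.default_rng(3)
n = 120
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x1, x2])

preds = expanding_window_forecast(X, y, min_train=30)
```

Each prediction uses only data available before that day, so comparing `preds[30:]` with `y[30:]` gives an honest picture of out-of-sample performance.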

Conclusions
• Model selection should use Cp and R2 together as criteria; R2 or a t-statistic alone is not enough
• Appropriate transformations are needed for the model to be valid
• Including interaction terms can improve the R2 of the model, which is especially useful when few variables are available (48% in the current case, compared with 41% in previous work without interactions)
• Optimal models are both beach-specific and time-varying

References
• Francy, D.S. and R.A. Darner 1998. Factors affecting Escherichia coli concentrations at Lake Erie public bathing beaches. USGS Water Resources Investigations Report 98-4241. Columbus, Ohio
• Nevers, M.B. and R.L. Whitman. Protecting visitor health in beach waters of Lake Michigan: problems and opportunities. The State of Lake Michigan: Ecology, Health and Management, Eds. T. Edsall & M. Munawar. Ecovision World Monograph Series, 2004. Aquatic Ecosystem Health and Management Society
• Ramsey, F.L. and D.W. Schafer 2002. The statistical sleuth: a course in methods of data analysis, second edition. Duxbury Thomson Learning

Acknowledgements
