
Statistical basis for dynamic prediction of beach bacteria concentrations

Statistical basis for dynamic prediction of beach bacteria concentrations. Zhongfu Ge National Research Council Walter E. Frick Ecosystems Research Div., NERL, USEPA, Athens, GA. NOAA’s Ocean and Human Health Initiative All PI’s 2006 Annual Meeting January 18-20, 2006 Charleston, SC.


Presentation Transcript


  1. Statistical basis for dynamic prediction of beach bacteria concentrations. Zhongfu Ge National Research Council Walter E. Frick Ecosystems Research Div., NERL, USEPA, Athens, GA. NOAA’s Ocean and Human Health Initiative All PI’s 2006 Annual Meeting January 18-20, 2006 Charleston, SC.

  2. Statistical basis for the empirical model of Virtual Beach

  3. Objectives • To demonstrate multiple linear regression (MLR) modeling of E. coli concentrations • To clarify a few misunderstandings and pitfalls in practice • To promote the idea of dynamic modeling, based on a growing database

  4. Example of modeling at a Lake Erie beach • Huntington Beach, OH: data from 247 days in 2001; only four explanatory variables available

  5. Cross-correlation with time delays • Do the data records need to be synchronized? Not for this case: the highest correlation is at zero time delay • Figure: correlation coefficient with time delay; correlations at nonzero delays are insignificant
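
The time-delay check on this slide can be sketched as follows. This is a minimal illustration on synthetic data, not the Huntington Beach record; the function names `pearson` and `lagged_correlation` and the choice of a 3-day maximum lag are assumptions made for the example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(x, y, max_lag):
    """Correlate y[t] with x[t - lag] for lag = 0..max_lag (in days)."""
    return {lag: pearson(x[: len(x) - lag], y[lag:])
            for lag in range(max_lag + 1)}

# Synthetic stand-in data (assumption): y responds to x on the same day.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(247)]
y = [0.8 * xi + rng.gauss(0, 0.5) for xi in x]

corr = lagged_correlation(x, y, max_lag=3)
best_lag = max(corr, key=lambda k: abs(corr[k]))
print(best_lag, {k: round(v, 2) for k, v in corr.items()})
```

If the largest correlation sat at a nonzero lag, the explanatory record would have to be shifted by that lag before fitting.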

  6. Transformation is necessary • Inspect scatter plots to see whether a transformation is needed to equalize the spread
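
A numerical counterpart to inspecting the scatter plots is to compare the residual spread in the lower and upper halves of an explanatory variable, before and after transforming the response. The sketch below assumes a base-10 log transformation, which the slide does not name, and uses synthetic data with multiplicative noise; `spread_ratio` is a hypothetical helper.

```python
import math
import random
import statistics

def fit_line(x, y):
    """Least-squares slope and intercept of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def spread_ratio(x, y):
    """Residual std. dev. for x above the median divided by that below it."""
    slope, intercept = fit_line(x, y)
    cut = statistics.median(x)
    low = [b - (intercept + slope * a) for a, b in zip(x, y) if a <= cut]
    high = [b - (intercept + slope * a) for a, b in zip(x, y) if a > cut]
    return statistics.pstdev(high) / statistics.pstdev(low)

# Synthetic data (assumption): the noise is multiplicative, so the raw
# spread grows with the level of the response.
rng = random.Random(1)
x = [rng.uniform(0, 3) for _ in range(400)]
y = [10 ** (1 + 0.5 * xi + rng.gauss(0, 0.3)) for xi in x]

raw_ratio = spread_ratio(x, y)
log_ratio = spread_ratio(x, [math.log10(v) for v in y])
print(round(raw_ratio, 2), round(log_ratio, 2))
```

A ratio near 1 indicates roughly equal spread; a ratio well above 1 on the raw scale suggests that a transformation is worth trying.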

  7. Interaction terms • Including interaction terms can greatly improve fitting performance • Transformations for equal spread still apply
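
The effect of an interaction term on the fit can be illustrated with a small NumPy sketch. The variable names (`turbidity`, `wave`, `log_ecoli`) and the coefficients are hypothetical and the data are synthetic; only the comparison of R² with and without the product term reflects the point of the slide.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Synthetic data (assumption): the response depends on the product of the
# two explanatory variables as well as on each one separately.
rng = np.random.default_rng(0)
n = 247
turbidity = rng.normal(size=n)
wave = rng.normal(size=n)
log_ecoli = (1.0 + 0.5 * turbidity + 0.3 * wave
             + 0.6 * turbidity * wave + rng.normal(scale=0.5, size=n))

main_only = np.column_stack([turbidity, wave])
with_interaction = np.column_stack([turbidity, wave, turbidity * wave])

r2_main = r_squared(main_only, log_ecoli)
r2_inter = r_squared(with_interaction, log_ecoli)
print(round(r2_main, 3), round(r2_inter, 3))
```

Because the two models are nested, R² cannot decrease when the product term is added; whether the gain justifies the extra term is a model-selection question.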

  8. Categorization of wind direction • Wind direction was categorized into northerly (WD=0) and southerly (WD=1) winds • Histograms for the two categories (panels: northerly winds, southerly winds) show equal spreads; the categorized data are nearly normally distributed
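
The coding WD=0 / WD=1 could be produced by a helper like the one below. The slide does not state the angular boundaries, so this sketch assumes directions in meteorological degrees (the direction the wind blows from, 0 = north) and treats anything within 90 degrees of north as northerly; both the boundaries and the name `wind_category` are assumptions.

```python
def wind_category(degrees):
    """Return 0 for northerly and 1 for southerly wind.

    Assumption: `degrees` is the direction the wind blows from, measured
    clockwise from north. Exactly 90 and 270 are assigned to southerly here,
    which is an arbitrary tie-breaking choice.
    """
    d = degrees % 360.0
    return 0 if (d < 90.0 or d > 270.0) else 1

observations = [0, 45, 180, 200, 350, -10]
codes = [wind_category(d) for d in observations]
print(codes)
```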

  9. Other issues with data inspection • Multicollinearity: variance inflation factor (VIF) for each explanatory variable (table: correlation coefficients) • Adjustment for “time-series effect”
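
The VIF of an explanatory variable is 1 / (1 - R_j²), where R_j² comes from regressing that variable on all the others. A minimal NumPy sketch on synthetic data follows; the function name `vif` and the three made-up columns are assumptions for illustration.

```python
import numpy as np

def vif(X):
    """VIF for each column of X: 1 / (1 - R_j^2), column j regressed on the rest."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(target)), others])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic data (assumption): column c is nearly a copy of column a,
# column b is unrelated to both.
rng = np.random.default_rng(0)
a = rng.normal(size=247)
b = rng.normal(size=247)
c = a + 0.1 * rng.normal(size=247)

vifs = vif(np.column_stack([a, b, c]))
print(np.round(vifs, 1))
```

A common rule of thumb treats VIF values above about 10 as a sign of serious multicollinearity, although the cutoff is a convention rather than a test.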

  10. MLR fitting of the full model and the normality of the residuals • The residuals closely follow a normal distribution

  11. Outlier identification • Adjustment for serial correlation • Table 3: Influential cases and outliers from the full model; total number of cases: 247; numbers in red are influential outliers (appearing in both rows) • Dealing with outliers is not simple; if there is no evidence of measurement error, they should be kept
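
The slide does not say which diagnostics were used to build Table 3. One common choice, assumed here, is to flag outliers with studentized residuals and influential cases with leverage and Cook's distance. The sketch below computes all three for a synthetic data set with one planted case; the index 100 and the function name `influence_diagnostics` are hypothetical.

```python
import numpy as np

def influence_diagnostics(X, y):
    """Return leverage, internally studentized residuals, and Cook's distance."""
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape
    H = A @ np.linalg.pinv(A)                 # hat matrix
    h = np.diag(H)                            # leverage of each case
    resid = y - H @ y
    s2 = resid @ resid / (n - p)              # residual variance estimate
    student = resid / np.sqrt(s2 * (1.0 - h))
    cooks = student ** 2 * h / (p * (1.0 - h))
    return h, student, cooks

# Synthetic data (assumption) with one planted outlier at index 100 that has
# both an unusual x value and a large error.
rng = np.random.default_rng(0)
x = rng.normal(size=247)
y = 2.0 + x + rng.normal(scale=0.5, size=247)
x[100] = 4.0
y[100] = 2.0 + 4.0 + 5.0

h, student, cooks = influence_diagnostics(x, y)
worst = int(np.argmax(cooks))
print(worst, round(float(student[worst]), 2), round(float(cooks[worst]), 2))
```

Flagging a case this way only identifies it; as the slide notes, a flagged case should be kept unless there is evidence of a measurement error.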

  12. Model selection • Backward elimination: Cp + R² or BIC + R² • Figure and table: sequence of elimination; best models (legend defines the number of variables, the sample size, and the standard deviation)
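
Backward elimination guided by an information criterion can be sketched as below. The formulas are the textbook ones, Cp = SSE_p / σ²_full + 2p - n and BIC = n ln(SSE_p / n) + p ln(n), with p counting the intercept; the slide's own definitions were lost in extraction, so these forms, the BIC-based stopping rule, and the synthetic data are assumptions.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for an OLS fit on the chosen columns plus intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def criteria(X, y, cols, sigma2_full):
    """Return (Cp, BIC, R^2) for the model that uses `cols`."""
    n, p = len(y), len(cols) + 1              # p counts the intercept
    s = sse(X, y, cols)
    cp = s / sigma2_full + 2 * p - n
    bic = n * np.log(s / n) + p * np.log(n)
    r2 = 1.0 - s / float(np.sum((y - y.mean()) ** 2))
    return float(cp), float(bic), float(r2)

def backward_eliminate(X, y):
    """Drop one variable at a time while doing so lowers BIC."""
    n, k = X.shape
    cols = list(range(k))
    sigma2_full = sse(X, y, cols) / (n - k - 1)
    path = [(tuple(cols), *criteria(X, y, cols, sigma2_full))]
    while cols:
        current_bic = path[-1][2]
        trials = [(criteria(X, y, [c for c in cols if c != d], sigma2_full)[1], d)
                  for d in cols]
        best_bic, drop = min(trials)
        if best_bic >= current_bic:
            break
        cols.remove(drop)
        path.append((tuple(cols), *criteria(X, y, cols, sigma2_full)))
    return cols, path

# Synthetic data (assumption): only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(247, 4))
y = 1.0 + 1.0 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(scale=0.5, size=247)

selected, path = backward_eliminate(X, y)
for cols, cp, bic, r2 in path:
    print(cols, round(cp, 2), round(bic, 1), round(r2, 3))
```

Each row of the printed path corresponds to one step in a sequence of elimination, with R² reported next to the criterion so that the two can be read together.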

  13. Model selection • What if we didn’t have interaction terms? All models are biased, and R² means nothing

  14. Dynamic modeling • Models are updated when new observations are added to the database • Figure: predictions with (left) and without (right) outliers (days #1 and #135)

  15. Dynamic modeling • Table: how the models change with time; a mark indicates that the variable is in the model • R² is consistently around 48%
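
The updating scheme can be sketched as an expanding-window refit: each day's prediction comes from a model fitted to all earlier days. This simplified version only re-estimates the coefficients of a fixed set of variables, whereas the slides also re-select variables over time; the function name, the 30-day minimum training length, and the synthetic data are assumptions.

```python
import numpy as np

def expanding_window_forecasts(X, y, min_train=30):
    """Predict day t from an OLS model fitted to days 0..t-1, for t >= min_train."""
    A = np.column_stack([np.ones(len(y)), X])
    preds = {}
    for t in range(min_train, len(y)):
        beta, *_ = np.linalg.lstsq(A[:t], y[:t], rcond=None)
        preds[t] = float(A[t] @ beta)
    return preds

# Synthetic data (assumption): two explanatory variables, 247 days.
rng = np.random.default_rng(0)
X = rng.normal(size=(247, 2))
y = 1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=247)

preds = expanding_window_forecasts(X, y, min_train=30)
errors = np.array([y[t] - p for t, p in preds.items()])
rmse = float(np.sqrt(np.mean(errors ** 2)))
print(len(preds), round(rmse, 3))
```

Because every prediction uses only earlier observations, the error computed this way is an out-of-sample estimate rather than a goodness-of-fit figure.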

  16. Conclusions • Model selection should use Cp and R² together as criteria; R² or a t-statistic alone doesn’t mean anything • Transformations are needed for a correctly specified model • Including interaction terms can improve the R² of the model, which is especially useful when few explanatory variables are available (48% in the current case, compared with 41% in previous work without interactions) • Optimal models are both beach-specific and time-varying

  17. References • Francy, D.S. and R.A. Darner. 1998. Factors affecting Escherichia coli concentrations at Lake Erie public bathing beaches. USGS Water Resources Investigations Report 98-4241. Columbus, Ohio. • Nevers, M.B. and R.L. Whitman. 2004. Protecting visitor health in beach waters of Lake Michigan: problems and opportunities. The State of Lake Michigan: Ecology, Health and Management, Eds. T. Edsall & M. Munawar. Ecovision World Monograph Series, Aquatic Ecosystem Health and Management Society. • Ramsey, F.L. and D.W. Schafer. 2002. The statistical sleuth: a course in methods of data analysis, second edition. Duxbury Thomson Learning. Acknowledgements
