Statistical trends and time series: a recap. July 2012. Marian Scott and Adrian Bowman
Measurement and assessment of change • Two topics to consider • Regression modelling in general • Time series • Leading to • Trends (combining time series and regression ideas) • Meet some examples • Cover some of the ideas • Apply them
Trends and change • In time (SNIFFER, 2006) • A linear regression equation was calculated for each dataset and then the trend was calculated from the gradient parameter (i.e. the rate of change) multiplied by the length of the data period to provide a clear change value since the start of the period. • “the significance of trends was tested using the non-parametric Mann-Kendall tau test (Sneyers, 1990). Linear trends with the Mann-Kendall significance test are widely used in the analysis of climate trends”
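The SNIFFER calculation described above (gradient multiplied by the length of the data period) can be sketched as follows. This is a minimal illustration in Python, not the report's own code; the function name `least_squares_line` and the data values are made up for the example.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up annual values, for illustration only.
years = [2000, 2001, 2002, 2003, 2004, 2005]
values = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]

intercept, slope = least_squares_line(years, values)
period_length = years[-1] - years[0]   # length of the data period, in years
total_change = slope * period_length   # rate of change x length of period
print(f"slope = {slope:.3f} per year, change over period = {total_change:.2f}")
```

The change value is only as good as the straight-line assumption: if the series is not roughly linear, the product of slope and period length can misrepresent what happened.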
Joint Nature Conservation Committee definition of trend • A trend is a measurement of change derived from a comparison of the results of two or more statistics. • A trend relates to a range of dates spanning the statistics from which it is derived, e.g. 1996-2000. A trend will generally be expressed as a percentage change (+ for an increase, - for a decrease) or as an index.
Statistical definition of trend • What is a statistical trend? • A long-term change in the mean level (Chatfield, 1996) • Long-term movement (Kendall and Ord, 1990) • The non-random function $\mu(t) = E(Y(t))$ (Diggle, 1990) • Trend is the long-term behaviour of the process; trends in mean, variance and extremes may be of interest (Chandler, 2002) • Environmental change often, but not always, means a statistical trend • Not restricted to linear (or even monotonic) trends
Statistical tools for exploring and quantifying trend • Exploratory tools • Scatterplots, time series plots, smoothed trends over time (are the series equally spaced, with no missing data?) • More formal tools • Can you assume monotonicity? Is the trend linear? • Non-parametric estimation and testing (classic tests) • Semi-parametric and non-parametric additive models (for irregularly spaced data) • What is monotonic? Steadily increasing or steadily decreasing
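One classic non-parametric test for a monotonic trend is the Mann-Kendall test named in the SNIFFER quote earlier. The sketch below is a simplified version under stated assumptions: no tied values, independent observations, and a normal approximation for the p-value. The function name `mann_kendall` and the data are illustrative only.

```python
import math

def mann_kendall(y):
    """Return (S, tau, z, two-sided p) for a series with no tied values."""
    n = len(y)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = y[j] - y[i]
            s += (diff > 0) - (diff < 0)      # +1 if later value is larger, -1 if smaller
    tau = s / (n * (n - 1) / 2)               # Kendall's tau (no tie correction)
    var_s = n * (n - 1) * (2 * n + 5) / 18    # variance of S when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)        # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal p-value
    return s, tau, z, p

# Made-up series: mostly increasing, with two small dips.
series = [1.0, 1.4, 1.2, 1.9, 2.3, 2.1, 2.8, 3.0]
s, tau, z, p = mann_kendall(series)
print(f"S = {s}, tau = {tau:.3f}, z = {z:.2f}, p = {p:.4f}")
```

The test uses only the signs of pairwise differences, so it detects steady increase or decrease without assuming the trend is linear. Serial correlation in the data would make this p-value too optimistic.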
Simple Regression Model • The basic regression model assumes: • The average value of the response y is linearly related to the explanatory x • The spread of the response y about the average is the SAME for all values of x • The VARIABILITY of the response y about the average follows a NORMAL distribution for each value of x
Simple Regression Model • Model is typically fitted using least squares • Goodness of fit of the model is assessed using the residual sum of squares and $R^2$ • Assumptions checked using residual plots • Inference about model parameters • For water quality data, the response would be TOC, the explanatory would be year
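As a rough illustration of the least squares fit, the residual sum of squares and R-squared mentioned above, here is a short sketch in plain Python. The function name `fit_line` and the TOC values are invented for the example; they are not data from the course.

```python
def fit_line(x, y):
    """Least-squares straight line; returns intercept, slope, RSS and R-squared."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    return intercept, slope, rss, 1 - rss / tss

# Made-up data: year index as explanatory, TOC as response.
year_index = [1, 2, 3, 4, 5]
toc = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, rss, r_squared = fit_line(year_index, toc)
print(f"TOC = {intercept:.2f} + {slope:.2f} * year, RSS = {rss:.3f}, R-sq = {r_squared:.1%}")
```

R-squared is the proportion of the total variation in the response that the fitted line accounts for, so it always lies between 0 and 1 for a least squares line with an intercept.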
Regression Output

The regression equation is
chloro = - 1.7 + 28.8 N

Predictor     Coef    StDev       T       P
Constant     -1.69    10.14   -0.17   0.869
N           28.808    4.171    6.91   0.000

S = 15.19   R-Sq = 67.5%   R-Sq(adj) = 66.1%
Conclusions • The equation for the best-fit straight line has an intercept of -1.7 and a slope of 28.8. Thus for every unit increase in N, the chloro measure increases by 28.8 on average. • The $R^2$(adj) value is 66.1%, so we have explained about 66% of the variation in chloro by its relationship to N. • The S value is 15.19; this is the residual standard deviation, which describes the variation in the points around the fitted line.
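The T column in the regression output is each coefficient divided by its standard deviation, which can be checked directly from the printed numbers. The sketch below uses only values from the output; the prediction at N = 2 is a hypothetical input chosen for illustration.

```python
# Coefficients and standard deviations as printed in the regression output.
coef = {"Constant": -1.69, "N": 28.808}
stdev = {"Constant": 10.14, "N": 4.171}

# T ratio = estimate / standard deviation of the estimate.
t_ratio = {name: coef[name] / stdev[name] for name in coef}
for name, t in t_ratio.items():
    print(f"{name}: T = {t:.2f}")

# Fitted value at a hypothetical N of 2.0 (not a value from the data).
n_value = 2.0
fitted_chloro = coef["Constant"] + coef["N"] * n_value
print(f"fitted chloro at N = {n_value}: {fitted_chloro:.1f}")
```

The large T ratio for N matches the tiny p-value in the output, while the intercept is not distinguishable from zero.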
Checking assumptions • Usually based on residuals • Residuals are the differences between each observation and the corresponding model fitted value • They can be positive or negative but should be zero on average • Residual plots are common model assessment tools (scatterplot of residuals vs fitted values)
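To make the residual check concrete, the sketch below fits a straight line to deliberately curved, made-up data and lists residuals against fitted values. The helper name `line_residuals` is an assumption for this example. For a least squares line with an intercept the residuals sum to zero, so it is the pattern, not the average, that reveals a poor model.

```python
def line_residuals(x, y):
    """Fit a least-squares line and return (fitted values, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]   # observation minus fitted value
    return fitted, residuals

# Made-up curved data (y = x squared), so a straight line is the wrong model.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fitted, residuals = line_residuals(x, y)
for f, r in zip(fitted, residuals):
    print(f"fitted = {f:5.1f}   residual = {r:+.1f}")
```

Here the residuals are positive at both ends and negative in the middle, the U-shaped pattern that suggests a curved term is missing.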
A straight line model for the Nile • Annual river flow from ~1870 • Straight line is a relatively poor fit, lots of variation.
A straight line model for the Nile • Relatively poor fit, lots of variation • Any pattern in the residuals?
A quadratic model for the Nile • better fit, still lots of variation • Gives a smooth change, not abrupt • Any pattern in the residuals?
a non-parametric model for the Nile • a smooth function (LOESS) or non-parametric regression model • OK? • In later sessions, you will see some more flexible modelling tools
Regression examples? • In practical3final.txt, some R commands to complete some analyses • Example 1: Loch Lomond, plots and simple regression
what is a time series? • A time series is a sequence of measurements made over time. • Notationally, this would commonly be written as $y_1, y_2, \ldots, y_i, \ldots, y_T$ • The index $i$ denotes the position in the sequence of observations • Often we will assume that the data are equally spaced, so that $i$ is truly an index, but for many environmental time series observations are not equally spaced.
how to plot the data: a time series plot • Choice of the x-axis scale • Occasionally, each observation is indexed by its position in the sequence (OK if equally spaced) • Alternatively, we may use the actual timescale (e.g. years for an annual series, or days 1-365 for a daily series) • Or we may regard time on a continuous scale (time might be recorded in decimal form, e.g. 1986.5, which would be the middle of 1986, i.e. the end of June); this latter is often the preferred form for statistical modelling (time is then a continuous variable)
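Converting calendar dates to decimal years, as described above, might look like the following. This is one reasonable convention (fraction of the year elapsed at the start of the day), not necessarily the one used in the course materials; the function name `decimal_year` is an assumption.

```python
from datetime import date

def decimal_year(d):
    """Year plus the fraction of that year elapsed at the start of day d."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    days_in_year = (end - start).days        # 365 or 366, so leap years are handled
    return d.year + (d - start).days / days_in_year

print(decimal_year(date(1986, 1, 1)))             # start of the year
print(round(decimal_year(date(1986, 7, 2)), 3))   # close to the mid-year point
```

With time on this scale, irregularly spaced observations can be used directly as a continuous explanatory variable in a regression.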
How is biodiversity changing (EEA CSI 009) • Populations of common and widespread farmland bird species in 2003 are only 71% of their 1980 levels. • an annual indicator
How is biodiversity changing (kittiwakes) (JNCC DEFRA) • The UK index of kittiwake abundance has declined rapidly since the early 1990s, such that by 2009 the index was just 50% of that in 1986, the lowest value in the 24 years of monitoring. • Notice the uncertainty bands
Water quality- freshwater • Concentrations of P generally decreased • Nitrate concentrations decreasing • What are the rates of change and are they significant?
Example: a time series plot (daily values) • The x-axis shows the actual date
Example: air quality, monitored through time (from the EMEP programme) • Note the gaps and the rather extreme values; one strategy is to take logs • These are daily data
Observed temperature anomalies in Europe. • Change in different periods of the year may have different effects, • start of the growing season determined by spring and autumn temps, • changes in winter important for species survival. • note that the presentation shows winter and summer separately
River Clyde • Nitrate in the Clyde sea area in different seasons
Environmental time series data features • Patterns over time (both short and long term) • Often missing data, which may cause problems for statistical analysis • Variation, which may not be constant over time, so we may need to consider transformations (e.g. log)
Seasonal patterns (cycles) • In many environmental time series, we could imagine some periodicity (e.g. a monthly pattern in temperature) • So it is common to produce a “seasonality plot”. The index (x-axis scale) depends on the period over which the cycle repeats itself (monthly, daily) • We will need to include a term in any model to describe these features
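The numbers behind a seasonality plot can be sketched by grouping observations by their position in the cycle and averaging. The example below assumes a monthly index; the function name `monthly_means` and the temperature values are made up for illustration.

```python
from collections import defaultdict

def monthly_means(months, values):
    """Average the values observed in each calendar month (1-12)."""
    groups = defaultdict(list)
    for month, value in zip(months, values):
        groups[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(groups.items())}

# Made-up temperatures for four months observed in two different years.
months = [1, 4, 7, 10, 1, 4, 7, 10]
temps = [4.0, 9.0, 16.0, 10.0, 5.0, 8.0, 17.0, 11.0]

profile = monthly_means(months, temps)
for month, mean in profile.items():
    print(f"month {month:2d}: mean = {mean:.1f}")
```

Plotting these means (or the raw values) against month gives the seasonal profile; a smooth curve through them plays the same role as the smooth on a seasonality plot.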
Example: Loch Leven, monthly data • Data are plotted over the months of the year (Lowess smooth included)
what are the questions of interest? • We want to know about trends, where a trend is defined to be the long-term sweep of the data. • We want to know about possible seasonality (or cycles) • The seasonal component of a time series describes a regular fluctuation which has a period. (The period is the time interval between consecutive peaks or troughs.)
Regression examples? • In practical3final.txt, some R commands to complete some analyses • Qn 1b): Loch Lomond, plots and simple regression, with an investigation of seasonality • Qn 2: dissolved oxygen in the Clyde, simple and multiple regression; year, temperature and salinity are explanatory variables
a descriptive model • A useful descriptive model for a time series consists of 3 components: • X = Trend + Seasonal Component + Irregular Component or X = T+S+I • I is the irregular component, which is left over when the trend, and seasonal components are all accounted for. It is an irregular or random fluctuation (like residuals in regression).
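A minimal sketch of the additive decomposition X = T + S + I follows. It makes two simplifying assumptions that the slides do not prescribe: the trend is a least-squares straight line, and the seasonal component is the average detrended value at each position in the cycle. The function names and the quarterly data are invented for the example.

```python
def fit_line(t, x):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(t)
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    slope = (sum((a - t_bar) * (b - x_bar) for a, b in zip(t, x))
             / sum((a - t_bar) ** 2 for a in t))
    return x_bar - slope * t_bar, slope

def decompose(x, period):
    """Split x into trend, seasonal and irregular parts that add back to x."""
    times = list(range(len(x)))
    intercept, slope = fit_line(times, x)
    trend = [intercept + slope * t for t in times]
    detrended = [xi - tr for xi, tr in zip(x, trend)]
    # Seasonal effect: mean detrended value at each position in the cycle.
    effects = []
    for k in range(period):
        vals = detrended[k::period]
        effects.append(sum(vals) / len(vals))
    seasonal = [effects[t % period] for t in times]
    irregular = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, irregular

# Made-up quarterly series (3 years x 4 quarters), for illustration only.
x = [5.1, 7.9, 9.2, 6.0, 6.2, 9.1, 10.0, 7.1, 7.0, 10.2, 11.1, 8.0]
trend, seasonal, irregular = decompose(x, period=4)
for xi, tr, s, i in zip(x, trend, seasonal, irregular):
    print(f"{xi:5.1f} = {tr:6.2f} + {s:+6.2f} + {i:+6.2f}")
```

The irregular column plays the same role as residuals in regression: if it still shows structure, the trend or seasonal description is too simple.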
smoothing a time series • In many time series, the seasonal variation can be so strong that it obscures any trend or cyclical component. However, for understanding the process being observed (and forecasting future values of the series), trends and cycles are of prime importance. Smoothing is a process designed to remove seasonality so that the long-term movements in a time series can be seen more clearly
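One simple smoother that removes a seasonal pattern of known period is a centred moving average. It is used here as an illustrative stand-in, since the slides do not say which smoother was applied. The function name `centred_moving_average` and the series are assumptions for this sketch.

```python
def centred_moving_average(y, period):
    """Centred moving average spanning one full seasonal cycle.

    For an even period the two end points of each window get half weight,
    so every position in the cycle contributes equally.
    """
    half = period // 2
    smoothed = []
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 0:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        smoothed.append(total / period)
    return smoothed

# Made-up series: linear trend plus a period-4 seasonal pattern that sums to zero.
seasonal_pattern = [2, -1, -3, 2]
series = [t + seasonal_pattern[t % 4] for t in range(12)]

smooth = centred_moving_average(series, period=4)
print(series)
print(smooth)   # the seasonal swings are averaged out, leaving the trend
```

The smoothed series is shorter than the original because the first and last `period // 2` points have no complete window.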
Example: different smoothing techniques applied to air quality data (that have been logged)
Example: water quality in the River Clyde • A very complex regression model is of the form • $y_i = \beta_0(x_i) + \beta_1(x_i)\cos(2\pi x_i - \theta(x_i)) + \varepsilon_i, \quad i = 1, \ldots, n$ • It includes a mean trend term and seasonal variation; $x_i$ is the year in decimal form • $\beta_0$ and $\beta_1$ are smooth terms, and the seasonal term has varying coefficients (modelled parametrically) using cosines • This can be simplified by setting some parameters to be constant
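The simplification in the last bullet can be worked through as follows. The symbols are assumptions made for this sketch: $\beta_0$ and $\beta_1$ stand for the two smooth terms, $\theta$ for the phase and $\varepsilon_i$ for the error. If the amplitude $\beta_1$ and the phase $\theta$ are held constant, the cosine term expands into a part that is linear in two new coefficients $a$ and $b$:

```latex
\begin{align*}
\beta_1 \cos(2\pi x_i - \theta)
  &= \beta_1 \cos\theta \,\cos(2\pi x_i) + \beta_1 \sin\theta \,\sin(2\pi x_i) \\
  &= a \cos(2\pi x_i) + b \sin(2\pi x_i),
  \qquad a = \beta_1 \cos\theta, \quad b = \beta_1 \sin\theta, \\[4pt]
y_i &= \beta_0(x_i) + a \cos(2\pi x_i) + b \sin(2\pi x_i) + \varepsilon_i,
  \qquad i = 1, \ldots, n, \\[4pt]
\beta_1 &= \sqrt{a^2 + b^2}, \qquad \tan\theta = b / a .
\end{align*}
```

In this constant-coefficient form the seasonal part can be fitted by ordinary regression on $\cos(2\pi x_i)$ and $\sin(2\pi x_i)$, and the amplitude and phase recovered afterwards from $a$ and $b$.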
Example: Loch Leven, trends correcting for covariates • Loch Leven is a key loch for the Water Framework Directive; the environmental effect of interest is eutrophication • The measurement series covers 30 years, including a variety of biological, chemical and hydrological indicators, but is irregular in time • Substantial improvement in the loch water quality
other examples to try • Qn 3 in practical3final.txt • Qn 3 asks whether DO is different before and after an upgrade to the Shieldhall sewage works. To do this in a regression framework we need to introduce a FACTOR (a variable that takes only two values, identifying before and after 1985).
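A two-level factor can be coded as a 0/1 indicator and used as the explanatory variable in an ordinary regression. The sketch below is not the practical's R code: the dissolved oxygen values are made up, and treating 1985 onwards as "after" is an assumption.

```python
def fit_line(x, y):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

# Made-up dissolved oxygen values, for illustration only.
years = [1980, 1982, 1984, 1986, 1988, 1990]
dissolved_oxygen = [4.0, 4.4, 4.2, 6.0, 6.3, 6.0]

# FACTOR coded 0 = before, 1 = after (assumption: 1985 onwards counts as "after").
after = [1 if yr >= 1985 else 0 for yr in years]

before_mean, upgrade_effect = fit_line(after, dissolved_oxygen)
print(f"mean DO before = {before_mean:.2f}, estimated change after = {upgrade_effect:+.2f}")
```

With only the factor in the model, the intercept is the mean before the upgrade and the coefficient is the difference between the two group means; adding year or temperature as further explanatory variables turns this into a multiple regression.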
When time is the explanatory variable • In many situations, we expect successive observations to show correlation at adjacent time points (most likely stronger the closer the time points are); the strength of dependence usually depends on the time separation or lag • For regularly spaced data, we typically make use of the autocorrelation function (ACF) to assess how strong this correlation is • We have not considered this in the earlier examples but.....
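For a regularly spaced series, the sample autocorrelation at lag k compares each observation with the one k steps later. A minimal sketch using the usual sample estimator is given below; the function name `acf` and the series are assumptions for this example.

```python
def acf(y, max_lag):
    """Sample autocorrelation r_0..r_max_lag of a regularly spaced series."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)   # lag-0 sum of squares, used to scale every lag
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Made-up series with an upward drift, so nearby values resemble each other.
series = [2.0, 3.0, 5.0, 4.0, 6.0, 7.0, 6.0, 8.0, 9.0, 8.0]
r = acf(series, max_lag=3)
for lag, value in enumerate(r):
    print(f"lag {lag}: r = {value:+.3f}")
```

Strong autocorrelation in the residuals of a trend regression means the usual standard errors and p-values are too optimistic, which is why the dependence needs to be considered.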