Statistical Weather Forecasting Independent Study Daria Kluver From Statistical Methods in the Atmospheric Sciences by Daniel Wilks
Outline • Review of Least-Squares Regression • Simple Linear Regression • Distribution of the Residuals • The Analysis-of-variance table • Goodness-of-fit Measures • Sampling Distributions of the Regression Coefficients • Examining Residuals • Prediction Intervals • Multiple Linear Regression • Derived predictor variables in multiple regression • Objective Forecasts- Without NWP • Stratification and Compositing • When the Predictand is a Probability • Predictor Selection • Screening regression • Stopping rules • Cross-validation • Analog Forecasting
Background • Statistical methods are necessary because the atmosphere is a nonlinear dynamical system and is not perfectly predictable in a deterministic sense. • Classical statistical forecast methods do not use information from fluid-dynamical NWP models. They remain useful at very short lead times (hours in advance) or very long lead times (weeks or more).
Review of Least-squares regression • Least-squares regression describes the linear relationship between 2 variables. • Usually more than 1 predictor (independent) variable is needed in practical forecasts, but the ideas behind multiple linear regression are most easily introduced in the simple case.
Simple Linear Regression • Basically, simple linear regression seeks to summarize the relationship between 2 variables, by a straight line. • The regression procedure chooses the line that produces the least error for predictions of y based on x. • Why least-squares? • Line-fitting is fairly tolerant of small discrepancies between line and points, but is not resistant to outliers.
Example 15.0 miles (24.1 km) NW of Goldendale, Klickitat, WA, USA. Approx. altitude: 1089 m (3572 ft)
The linear regression as calculated by SPSS: • The equation is: ŷ = 40.88x − 76956.87
The line is ŷ = a + bx • The errors, or residuals, are e_i = y_i − ŷ(x_i) • Combining the line and the residuals gives the regression equation: y_i = ŷ(x_i) + e_i = a + b·x_i + e_i • This means that the true value of the predictand is the sum of the predicted value and the residual.
In order to minimize the sum of squared residuals, Σe_i² = Σ[y_i − ŷ(x_i)]² = Σ(y_i − [a + b·x_i])² • Set the derivatives with respect to the parameters a and b to zero and solve. Rearranging gives the normal equations: Σy_i = n·a + b·Σx_i and Σx_iy_i = a·Σx_i + b·Σx_i² • Solving the normal equations for the regression parameters: b = [n·Σx_iy_i − Σx_i·Σy_i] / [n·Σx_i² − (Σx_i)²] and a = ȳ − b·x̄
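A minimal sketch of this computation in Python/NumPy (the year and snowfall values are made up purely for illustration; they are not the data on the slides):

```python
import numpy as np

# Hypothetical illustrative data: predictor x (year) and predictand y (seasonal snowfall)
x = np.array([1950., 1960., 1970., 1980., 1990., 2000.])
y = np.array([2500., 3100., 2800., 3600., 3300., 3900.])
n = len(x)

# Solve the normal equations for the slope b and intercept a
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a = np.mean(y) - b * np.mean(x)
print(f"y_hat = {a:.2f} + {b:.2f} x")
```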
Distribution of the Residuals • It is conventional to assume the e_i are independent random variables with zero mean and constant variance. It is also sometimes assumed that these residuals follow a Gaussian distribution. • The constant-variance assumption really means that the variance of the residuals is constant in x, so a given residual is equally likely to occur at any part of the regression line. Fig. 6.2: the distributions are the same, but their means shift with the level of the regression line. • In order to make statistical inferences in the regression setting, this constant residual variance must be estimated from the sample of residuals.
Sample average of the squared residuals • s_e² = [1/(n−2)]·Σe_i² = 1,847,651.6 • Remember e_i = y_i − ŷ(x_i) • So s_e² = [1/(n−2)]·Σ[y_i − ŷ(x_i)]² is used to compute the estimated residual variance. • The related decomposition is commonly written as SST = SSR + SSE
SST = SSR + SSE • SST: total sum of squares. SST = Σ(y_i − ȳ)² = Σy_i² − n·ȳ². Proportional (by a factor of n−1) to the sample variance of y; measures the overall variability of the predictand. • SSR: regression sum of squares. SSR = Σ[ŷ(x_i) − ȳ]², the sum of squared differences between the regression predictions and the sample mean of y. Related to the regression equation by SSR = b²·Σ(x_i − x̄)² = b²·[Σx_i² − n·x̄²]. • SSE: sum of squared errors, the sum of squared differences between the residuals and their mean (which is zero). SSE = Σe_i². • So s_e² = [1/(n−2)]·{SST − SSR} = [1/(n−2)]·{Σy_i² − n·ȳ² − b²·[Σx_i² − n·x̄²]} is the computational form.
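Continuing the hypothetical sketch above, the decomposition can be checked numerically (a sketch, not the slide's actual SPSS computation):

```python
# Continuing the sketch: fitted values, residuals, and the SST = SSR + SSE partition
y_hat = a + b * x
e = y - y_hat

SST = np.sum((y - y.mean())**2)       # total sum of squares
SSR = np.sum((y_hat - y.mean())**2)   # regression sum of squares
SSE = np.sum(e**2)                    # sum of squared errors
s2_e = SSE / (n - 2)                  # estimated residual variance

assert np.isclose(SST, SSR + SSE)     # the partition holds up to round-off
```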
Analysis-of-variance table • Output from regression analysis is often given in an ANOVA table.
Goodness-of-Fit Measures • The ANOVA table allows computation of 3 related measures of the "fit" of a regression • Fit = correspondence between the regression line and a scatterplot of the data • MSE is fundamental because it indicates the variability of the observed (predictand) y values around the forecast regression line • Reflects the accuracy of the resulting forecasts. • MSE = s_e² and shows how tightly the residuals cluster around the regression line. • R², the coefficient of determination: R² = SSR/SST = 1 − SSE/SST, the proportion of the variation of the predictand "described" by the regression. • √R² equals the absolute value of the Pearson correlation between x and y • F ratio = MSR/MSE; a strong x–y relationship increases MSR and decreases MSE • In multiple regression, problems of multiplicity invalidate the nominal F test. [SPSS Model Summary(b) table shown on slide]
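These measures follow directly from the quantities computed above (again a sketch with the hypothetical data, not the SPSS output on the slide):

```python
# Goodness-of-fit measures from the ANOVA quantities
MSE = s2_e                      # mean squared error = estimated residual variance
R2 = SSR / SST                  # coefficient of determination (= 1 - SSE/SST)
r = np.sqrt(R2)                 # magnitude of the Pearson correlation between x and y
MSR = SSR / 1.0                 # one predictor, so one regression degree of freedom
F = MSR / MSE                   # F ratio
```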
Sampling Distributions of the Regression Coefficients • Estimating the sampling distributions of a and b allows you to construct confidence intervals around the parameter values obtained (and is also used for hypothesis testing about the population values). • We assumed Gaussian distributions. • The equations for the standard errors of the intercept and slope show that the precision with which b and a can be estimated depends on the estimated standard deviation of the residuals, s_e. • Some packages output a "t ratio" = the ratio of an estimated parameter to its standard error, so a one-sample t test is implied, with the null hypothesis that the underlying population value of the parameter is zero. The null hypothesis is rejected at the 5% level roughly when the estimated slope is at least about twice as large as its standard error.
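A sketch of the textbook formulas for the standard errors and the package-style "t ratios" (continuing the hypothetical example; this is not SPSS itself):

```python
# Standard errors of the slope and intercept, and their "t ratios"
s_e = np.sqrt(s2_e)
Sxx = np.sum((x - x.mean())**2)
se_b = s_e / np.sqrt(Sxx)                          # standard error of the slope
se_a = s_e * np.sqrt(1.0/n + x.mean()**2 / Sxx)    # standard error of the intercept
t_b, t_a = b / se_b, a / se_a                      # compare |t| ~ 2 at the 5% level
```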
Examining Residuals • It is important to check that the assumptions made about the residuals actually hold. • Plot the residuals as a function of the predicted value ŷ.
Heteroscedasticity: non-constant residual variance • Solution: apply a logarithmic transformation to y, which reduces the larger values more strongly. • Or y², which stretches the small values? • **Remember to recover the predictand: y = e^(ln y) = e^(a + bx) • To determine graphically whether the residuals follow a Gaussian distribution, make a quantile–quantile (Q–Q) plot. • Investigate the degree to which the residuals are mutually independent. • Plot the regression residuals as a function of time. If positive or negative residuals cluster together, temporal correlation is suspected.
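A sketch of the transform-and-check workflow (assumes SciPy and matplotlib are available; the variables continue the hypothetical example):

```python
from scipy import stats
import matplotlib.pyplot as plt

# If residual variance grows with y, regress ln(y) on x, then recover the predictand
ln_y = np.log(y)
b_ln, a_ln = np.polyfit(x, ln_y, 1)      # slope, intercept for ln(y)
y_hat_ln = np.exp(a_ln + b_ln * x)       # recovered predictand: y = e^(a + b x)

# Graphical Gaussian check of the residuals with a quantile-quantile (Q-Q) plot
e_ln = ln_y - (a_ln + b_ln * x)
stats.probplot(e_ln, dist="norm", plot=plt)
plt.show()
```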
Formal test: the Durbin–Watson test (d = 1.33 for this example, so reject the null hypothesis). • H0: the residuals are serially independent • HA: the residuals are consistent with a first-order autoregressive process • d = Σ(e_i − e_{i−1})² / Σe_i². If the residuals are positively correlated, d is small; if the residuals are randomly distributed, d is large and H0 is not rejected. • Fig. 6.7 shows critical d values at the 5% level. • **A trend in the residuals may indicate that the nature of the relationship is changing with time and that another predictor variable may be needed.
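The statistic itself is one line (a sketch; the critical values still come from the tables):

```python
# Durbin-Watson statistic from the regression residuals e
d = np.sum(np.diff(e)**2) / np.sum(e**2)
# Small d suggests positively correlated residuals (reject H0);
# compare d against the tabulated 5% critical values (cf. Fig. 6.7).
```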
Prediction Intervals • To calculate confidence intervals around forecast values, the residuals must follow a Gaussian distribution. • MSE = s_e², the estimated residual variance. • So the 95% prediction interval is approximately bounded by ŷ ± 2·s_e (Table B.1); good with a large sample size. • The square root of s_e² is the standard deviation of the residuals = 1359.3, so the interval is ±2718.6. • But the predictand and slope are themselves subject to sampling variations. • For a forecast of y using the predictor value x_0, the prediction variance is s_ŷ² = s_e²·[1 + 1/n + (x_0 − x̄)²/Σ(x_i − x̄)²], valid for simple linear regression.
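A sketch of the prediction-variance formula for a new predictor value (the value of x_0 is hypothetical; continues the earlier example):

```python
# Approximate 95% prediction interval for a forecast at a new predictor value x_0
x_0 = 2010.0
y_0 = a + b * x_0
s2_yhat = s2_e * (1.0 + 1.0/n + (x_0 - x.mean())**2 / Sxx)   # prediction variance (simple LR)
lower, upper = y_0 - 2*np.sqrt(s2_yhat), y_0 + 2*np.sqrt(s2_yhat)
```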
Multiple Linear Regression • Prediction equation (where K = number of predictors): ŷ = b_0 + b_1x_1 + b_2x_2 + … + b_Kx_K • Several things are the same as in simple linear regression, but with vectors and matrix algebra. • MSE = SSE/(n − K − 1) • R² is defined the same way, but is NOT the square of the Pearson correlation coefficient between y and any of the x variables.
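A minimal multiple-regression sketch using a design matrix (the predictors and predictand are hypothetical; this is ordinary least squares via NumPy, not the SPSS run shown on the next slide):

```python
import numpy as np

# Hypothetical developmental data: two predictors (e.g. season index, station count)
x1 = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
x2 = np.array([10., 12., 11., 15., 14., 16., 18., 17.])
y  = np.array([2.1, 2.9, 3.2, 4.4, 4.1, 5.0, 5.8, 5.5])

n, K = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])        # design matrix with an intercept column
coefs = np.linalg.lstsq(X, y, rcond=None)[0]     # b_0, b_1, b_2 by least squares
y_hat = X @ coefs

MSE = np.sum((y - y_hat)**2) / (n - K - 1)
R2 = 1.0 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)
```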
[SPSS Model Summary(c) and ANOVA(c) tables. a Predictors: (Constant), season. b Predictors: (Constant), season, stations. c Dependent Variable: total]
[SPSS Coefficients(a) table. a Dependent Variable: total]
Derived Predictor Variables in MR • "Derived" predictors are mathematical transformations of predictor variables. • Examples: a power, x_2 = x_1²; trigonometric functions, sin and cos; binary or "dummy" variables (1 or 0 based on a threshold).
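For instance, derived predictors of each kind could be built from a single raw predictor like this (a sketch; the day-of-year variable t is hypothetical):

```python
import numpy as np

t = np.arange(1., 366.)                      # hypothetical raw predictor: day of year
x_power = t**2                               # power transformation
x_sin = np.sin(2*np.pi*t/365.25)             # trigonometric (harmonic) predictors
x_cos = np.cos(2*np.pi*t/365.25)
x_dummy = (t > 180).astype(float)            # binary "dummy" predictor from a threshold
```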
Objective Forecasts Without NWP • Classical statistical forecasting = construction of weather forecasts through purely statistical means, without information from NWP models. • Mostly based on multiple regression equations. • Objective: a particular set of inputs will always produce the same forecast, once the equations are developed. • **However, several subjective decisions go into the development of the forecast equations. • You need developmental data: past values of the predictand and a matching collection of potential predictors, whose values will be known prior to the forecast time. • The regression equation is fit to this historical "training" data. • *Predictors that are not available in time for operational production of forecasts are of NO VALUE.
Stratification and Compositing • Physical and statistical relationships may change according to some known condition (e.g., the season). • One approach is a predictor that is a function of the day of year. • Forecasts are sometimes better when the data are stratified according to time of year and separate forecast equations are produced for each month or season. • Data can also be stratified by meteorological or climatological criteria. • Examples: separate temperature forecasts for snow-covered vs. non-snow-covered ground; long-range forecasts for El Niño vs. non-El Niño conditions. • You can also composite datasets to increase sample size, e.g., data from nearby (climatologically similar) stations.
When the Predictand is a Probability • *An advantage of statistical over (deterministic) dynamical forecasting is the capacity to produce probability forecasts. • Explicit expression of the inherent uncertainty • Allows users to extract more value in decision making. • To produce probability information about a predictand: 1st, transform the predictand to a binary variable; 2nd, do the regression. Two regression approaches (see the sketch below): • (simplest) Regression Estimation of Event Probabilities (REEP): MLR is used to derive a forecast equation for the binary predictand; values between 0 and 1 are treated as specifications of the probability that y = 1. No more computationally demanding than any other linear regression problem. • Logistic regression (more theoretically satisfying): also uses a binary predictand, but fits the regression parameters to a nonlinear equation that is guaranteed to produce properly bounded probability estimates. The parameters must be estimated using iterative techniques, which is computationally more intensive.
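A sketch of the REEP idea with hypothetical data (the binary predictand and predictors are invented for illustration; the logistic alternative needs an iterative fit, e.g. via scipy.optimize or scikit-learn, and is only indicated in the comment):

```python
import numpy as np

# Hypothetical binary predictand (1 = snowfall above the long-term mean) and predictors
x1 = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
x2 = np.array([0.3, -0.1, 0.5, 0.8, -0.4, 0.2, 0.9, -0.2])
y  = np.array([0., 0., 1., 1., 0., 1., 1., 0.])

# REEP: ordinary multiple linear regression on the binary predictand
X = np.column_stack([np.ones(len(y)), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
p_reep = np.clip(X @ b, 0.0, 1.0)    # regression output read as event probabilities

# Logistic regression instead uses p = 1 / (1 + exp(-(b_0 + b_1*x1 + b_2*x2))),
# which is bounded in (0, 1) by construction but requires iterative parameter estimation.
```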
[SPSS ANOVA(b), Model Summary(b), and Coefficients(a) tables for the binary predictand: y = 1 if total snow exceeds the 100-year mean. a Predictors: (Constant), EA, stations, season. b Dependent Variable: VAR00017]
Predictor Selection • How to find a good subset of potential predictors • There are dangers in including too many predictor variables • Example with nonsense predictors and a perfect fit. • Development sample, dependent sample, or training sample: the portion of the available data used to produce the forecast equation. • Any K = n − 1 predictors will produce a perfect regression fit to any predictand for which there are n observations; this is called "overfitting." • Such an equation will not work for independent, or verification, data: data not used in the development of the equation.
Only the first 50 years: [SPSS Model Summary and ANOVA(b) tables. a Predictors: (Constant), stations1, year. b Dependent Variable: total1]
[SPSS Coefficients(a) table. a Dependent Variable: total1]
Lessons learned: • Choose only physically reasonable or meaningful predictors. • A tentative regression equation needs to be tested on a sample of data not involved in its development (a very large difference in performance between the dependent and independent samples would lead to suspicion that the equation is overfit). • You need a reasonably large developmental sample if the resulting regression equation is to be stable, i.e., the regression coefficients are also applicable to independent (future) data and do not result from an overfit. *In forecasting, little is gained from more than about 12 predictors.
Screening Regression • Relevant predictor variables are almost always mutually correlated, so there is redundant information. • Screening regression addresses the problem of selecting a good set of predictors from a pool of potential predictors. • Common types of screening: • Forward selection, AKA stepwise regression (add the candidate with the highest R², lowest MSE, or largest F ratio) • Backward elimination: remove the predictor with the smallest t ratio.
Stopping Rules • Both forward selection and backward elimination require a stopping criterion. • A complication: every time the regression is recalculated, the F ratios change. • In practice, other less rigorous stopping criteria are used: • Stop adding predictors when none of the remaining predictors would increase R² by more than some small amount (e.g., 0.05%). • Use MSE: √MSE directly reflects the anticipated forecast accuracy of the regression. For example, if MSE = 0.01 °F², why add more? That already indicates a ±2-standard-deviation (~95%) confidence interval of about ±2·√0.01 = ±0.2 °F. The stopping criterion is the point where MSE does not decline appreciably with the addition of more predictors (see the sketch below). • Artificial skill: an underestimate of the operational MSE provided by the performance of a forecast equation on a developmental data set. • The value of K that minimizes MSE is often smaller for independent data than for the developmental data. • For a black-box forecast you just minimize MSE and do not worry about the identities of the predictors. • For scientific understanding it is not the reduction of MSE that is most important, but the relationships among the variables suggested by the regression procedure.
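A sketch of forward selection with an MSE-based stopping rule (hypothetical helper functions, not the book's algorithm verbatim): at each step the candidate predictor that most reduces MSE is added, and selection stops when MSE no longer declines.

```python
import numpy as np

def fit_mse(X, y):
    """Least-squares fit; return MSE with an (n - K - 1) denominator."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    K = X.shape[1] - 1                       # predictors, excluding the intercept column
    return np.sum(resid**2) / (len(y) - K - 1)

def forward_selection(predictors, y, max_predictors=12):
    """Greedy screening regression: add predictors while MSE keeps declining."""
    n = len(y)
    chosen, remaining = [], list(range(predictors.shape[1]))
    X = np.ones((n, 1))                      # start from the intercept-only model
    best_mse = np.inf
    while remaining and len(chosen) < max_predictors:
        mse, j = min((fit_mse(np.column_stack([X, predictors[:, k]]), y), k)
                     for k in remaining)
        if mse >= best_mse:                  # stopping rule: no appreciable MSE decline
            break
        best_mse, X = mse, np.column_stack([X, predictors[:, j]])
        chosen.append(j)
        remaining.remove(j)
    return chosen, best_mse
```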
Cross-Validation • A resampling technique. • The available data are repeatedly divided into developmental and verification data subsets. • Often the developmental data sets are of size n − 1 and the verification data sets contain the remaining observation of the predictand, giving n partitions of the data. • The cross-validation estimate of the prediction MSE is computed by forecasting each omitted observation, computing the squared difference between the prediction and the predictand, and averaging the n squared differences (see the sketch below). • The same could be done for any m withheld observations and developmental data sets of size n − m.
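A leave-one-out sketch of the cross-validation estimate described above (hypothetical helper; X is assumed to be a design matrix with an intercept column):

```python
import numpy as np

def loocv_mse(X, y):
    """Cross-validation estimate of prediction MSE with n leave-one-out partitions."""
    n = len(y)
    sq_err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]   # fit on the n-1 cases
        sq_err[i] = (y[i] - X[i] @ coef)**2                       # forecast the omitted case
    return sq_err.mean()
```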
Analog Forecasting • Search archives of climatological synoptic data for maps closely resembling the current observations, and assume that the future evolution of the atmosphere will be similar to the flows that followed the historical analog. • "The atmosphere is a perfect model of itself." • But the atmosphere apparently never exactly repeats itself. • You must match all relevant fields (not just heights and temperature, but SST etc.). • Statistics can be used to search for analogs and rank historical cases by their closeness to the current state. • Take the average of the outcomes of all good analogs.
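A sketch of the statistical analog search (hypothetical setup: each row of `archive` is a flattened historical map, ranked by RMS distance to the current map):

```python
import numpy as np

def rank_analogs(current_map, archive, k=5):
    """Return the indices and RMS distances of the k closest historical analogs."""
    rms = np.sqrt(np.mean((archive - current_map)**2, axis=1))
    closest = np.argsort(rms)[:k]
    return closest, rms[closest]

# The analog forecast is then the average of the observed outcomes that followed
# these k historical cases.
```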