A Brief Introduction to Statistical Forecasting Kevin Werner
Outline • Principal Component Theory • Applications • Z Score • VIPER
Basic Forecast Methods • Simulation modeling • Statistical regression [Figure labels: Snow, Rainfall, Heat, Snowpack, Soil water, Runoff; S Fork Rio Grande, Colo.: Apr-Jul streamflow % avg vs. May 1 snowpack % avg] Credit: Tom Pagano
The General Linear Regression Model $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$ where: $Y$ = dependent variable, $X_i$ = independent variables, $b_i$ = regression coefficients, $n$ = number of independent variables Credit: Dave Garen
The Problem If X’s are intercorrelated, they contain redundant information, and the b’s cannot be meaningfully estimated. However, we don’t want to have to throw out most of the X’s but prefer to retain them for robustness. Credit: Dave Garen
Example Streamflow = b0 + b1 * (Snotel A) + b2 * (Snotel B) -> The two Snotel sites are very highly correlated -> Because the correlation is so strong, optimal values of b1 and b2 are difficult to determine
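A minimal sketch of why b1 and b2 are hard to pin down. It uses the variance inflation factor (VIF), a common diagnostic that the slides do not name: with exactly two predictors correlated at r, the variance of each estimated coefficient is inflated by 1 / (1 - r^2) relative to uncorrelated predictors. The station values below are hypothetical, not real Snotel data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def variance_inflation(r):
    """VIF for a regression with exactly two predictors correlated at r."""
    return 1.0 / (1.0 - r * r)

# Hypothetical snow water equivalent (inches) at two nearby sites, 8 years.
snotel_a = [10, 14, 18, 22, 26, 30, 34, 38]
snotel_b = [11, 13, 19, 21, 27, 29, 35, 37]

r = pearson(snotel_a, snotel_b)
vif = variance_inflation(r)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

For these made-up series the correlation is above 0.99, so the coefficient variances are inflated by well over an order of magnitude.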
The Solution Possibilities: 1) Pre-combine X’s into composite index(es), e.g., Z-score method 2) Principal components regression These are similar in concept but differ in the mathematics. Credit: Dave Garen
Principal Components Analysis Principal components regression is just like standard regression except the independent variables are principal components rather than the original X variables. Principal components are linear combinations of the X’s. Credit: Dave Garen
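Principal Components Analysis Each principal component is a weighted sum of all the X’s: $PC_1 = e_{11} X_1 + e_{12} X_2 + \dots + e_{1n} X_n$, $PC_2 = e_{21} X_1 + e_{22} X_2 + \dots + e_{2n} X_n$, ..., $PC_n = e_{n1} X_1 + e_{n2} X_2 + \dots + e_{nn} X_n$ Credit: Dave Garen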
Principal Components Analysis The e’s are called eigenvectors, derived from a matrix equation whose input is the correlation matrix of all the X’s with each other. Principal components are new variables that are not correlated with each other. The principal components transformation is equivalent to a rotation of axes. Credit: Dave Garen
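A small sketch of the rotation idea for the special case of two predictors, where the eigenvectors of the 2x2 correlation matrix [[1, r], [r, 1]] are known in closed form: (1, 1)/sqrt(2) and (1, -1)/sqrt(2) for positive r, with eigenvalues 1 + r and 1 - r. The data are hypothetical; with more than two predictors a numerical eigen-solver would be needed.

```python
import math

def standardize(xs):
    """Return xs as z-scores (zero mean, unit sample standard deviation)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

def sample_cov(a, b):
    """Sample covariance of two series that already have zero mean."""
    return sum(p * q for p, q in zip(a, b)) / (len(a) - 1)

# Hypothetical, strongly correlated predictors.
x1 = [10, 14, 18, 22, 26, 30, 34, 38]
x2 = [11, 13, 19, 21, 27, 29, 35, 37]

z1, z2 = standardize(x1), standardize(x2)
r = sample_cov(z1, z2)  # equals the correlation because z1, z2 are standardized

# Closed-form eigenvectors of [[1, r], [r, 1]]: a 45-degree rotation of axes.
w = 1.0 / math.sqrt(2.0)
sign = 1.0 if r >= 0 else -1.0
e1 = (w, sign * w)    # eigenvalue 1 + |r|
e2 = (w, -sign * w)   # eigenvalue 1 - |r|

pc1 = [e1[0] * a + e1[1] * b for a, b in zip(z1, z2)]
pc2 = [e2[0] * a + e2[1] * b for a, b in zip(z1, z2)]

var_pc1 = sample_cov(pc1, pc1)
var_pc2 = sample_cov(pc2, pc2)
cov_12 = sample_cov(pc1, pc2)
print(f"r = {r:.4f}, var(PC1) = {var_pc1:.4f}, var(PC2) = {var_pc2:.4f}")
print(f"cov(PC1, PC2) = {cov_12:.2e}")
```

The variances of the two components equal the eigenvalues, and their covariance is zero up to rounding, which is the sense in which the principal components are not correlated with each other.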
Principal Components Analysis Credit: Dave Garen
Principal Components Analysis The eigenvectors (weights) are based solely on the intercorrelations among the X’s and have no knowledge of Y (in contrast to Z-score, for which the opposite is true). Principal components can be used for purely descriptive purposes, but we want to use them as independent variables in a regression. Credit: Dave Garen
Principal Components Analysis -- Example Independent Variables: • X1 – X5: Snow water equivalent at 5 stations • X6 – X10: Water year to date precipitation at 5 stations • X11: Antecedent streamflow • X12: Climate teleconnection index Credit: Dave Garen
Correlation Matrix Credit: Dave Garen
First Five Eigenvectors Credit: Dave Garen
Principal Components Regression Procedure • Try the PC’s in order • Test for regression coefficient significance (t-test) • Stop at first insignificant component • Transform regression coefficients to be in terms of original variables • Sign test – coefficient signs must be same as correlation with Y Credit: Dave Garen
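The procedure above can be sketched for two predictors as follows. This is an illustration under stated assumptions, not the VIPER implementation: the eigenvectors use the closed form for a 2x2 correlation matrix, the fixed critical value `t_crit=2.0` stands in for a proper Student's t lookup by degrees of freedom, and the data are hypothetical.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def stdev(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pcr_two_predictors(x1, x2, y, t_crit=2.0):
    """Principal components regression for two predictors.

    Assumes x1 and x2 are not perfectly correlated and len(y) >= 4.
    t_crit is an assumed fixed threshold, not a t-table lookup.
    """
    n = len(y)
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s1, s2 = stdev(x1), stdev(x2)
    z1 = [(v - m1) / s1 for v in x1]
    z2 = [(v - m2) / s2 for v in x2]
    yc = [v - my for v in y]

    # Eigenvectors of [[1, r], [r, 1]], ordered by decreasing eigenvalue.
    sign = 1.0 if pearson(x1, x2) >= 0 else -1.0
    w = 1.0 / math.sqrt(2.0)
    eigvecs = [(w, sign * w), (w, -sign * w)]
    pcs = [[e[0] * a + e[1] * b for a, b in zip(z1, z2)] for e in eigvecs]

    # Try the PCs in order; stop at the first insignificant one.
    rss = sum(v * v for v in yc)
    kept = []
    for j, pc in enumerate(pcs):
        ss_pc = sum(p * p for p in pc)
        coef = sum(p * v for p, v in zip(pc, yc)) / ss_pc
        rss_new = max(rss - coef * coef * ss_pc, 0.0)
        dof = n - (j + 1) - 1  # observations minus components minus intercept
        se = math.sqrt(rss_new / dof / ss_pc)
        t_stat = math.inf if se == 0 else abs(coef) / se
        if t_stat < t_crit:
            break
        kept.append(coef)
        rss = rss_new

    # Transform PC coefficients back to the original variables.
    beta = [sum(c * eigvecs[j][k] for j, c in enumerate(kept)) for k in range(2)]
    b1, b2 = beta[0] / s1, beta[1] / s2
    b0 = my - b1 * m1 - b2 * m2

    # Sign test: each coefficient must share the sign of its correlation with y.
    signs_ok = b1 * pearson(x1, y) >= 0 and b2 * pearson(x2, y) >= 0
    return {"b0": b0, "b1": b1, "b2": b2,
            "n_components": len(kept), "signs_ok": signs_ok}

# Hypothetical predictors and seasonal streamflow volume.
x1 = [10, 14, 18, 22, 26, 30, 34, 38]
x2 = [11, 13, 19, 21, 27, 29, 35, 37]
flow = [54.5, 68.5, 90.5, 106.5, 133.5, 149.5, 171.5, 185.5]

result = pcr_two_predictors(x1, x2, flow)
print(result)
```

Because the components are mutually uncorrelated, adding a later component does not change the coefficients of the earlier ones, so each one can be fitted and tested on its own in sequence.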
Summary • Principal components analysis is a standard multivariate statistical procedure • Can be used for descriptive purposes to reduce the dimensionality of correlated variables • Can be taken a step further to provide new, non-correlated independent variables for regression • PC’s taken in order, subject to t-test and sign test • Final model is expressed in terms of original X variables Credit: Dave Garen
Soil Moisture at the interannual timescale • Another example demonstrating the importance of land surface processes in the climate system (Werner, 1999): • GCM run with and without an active land surface model in South America to explore the importance of land surface processes for climate variability in the Nordeste region. • Both simulations include a full atmospheric model, a slab ocean model (no ocean dynamics), and a dynamic land surface model; in the Data Land simulation, the land surface model is not dynamic over tropical South America.
Soil Moisture at the interannual timescale • Modeled variability • Full dynamic land surface model simulation contains variability resembling observed variability with connection between NH and SH SSTs. • Fixed land surface model shows no connected variability between NH and SH SSTs
Resources • Dave Garen VIPER slides • Dennis Hartmann lecture notes (http://www.atmos.washington.edu/~dennis/)
What does z-score regression do? 1. Combines predictors into weighted indices, emphasizing good stations, minimizing bad ones. 2. Compensates for missing data with remaining data. 3. Regresses index against target predictand Credit: Tom Pagano
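The three steps above can be sketched as follows. The weighting scheme is an assumption made for illustration (each station weighted by its squared correlation with the predictand); the slides do not say how VIPER computes its weights. Missing values are represented as `None`, and all data are hypothetical.

```python
import math

def mean_sd(values):
    """Mean and sample standard deviation, ignoring missing (None) entries."""
    vals = [v for v in values if v is not None]
    m = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - m) ** 2 for v in vals) / (len(vals) - 1))
    return m, sd

def pearson_pairs(xs, ys):
    """Pearson correlation over the years where both series are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def z_index(stations, target):
    """Weighted average of station z-scores, using whatever data exist each year."""
    stats = [mean_sd(s) for s in stations]
    # Assumed weighting: squared correlation with the predictand.
    weights = [pearson_pairs(s, target) ** 2 for s in stations]
    index = []
    for t in range(len(target)):
        num = den = 0.0
        for s, (m, sd), w in zip(stations, stats, weights):
            if s[t] is not None:  # remaining stations compensate for gaps
                num += w * (s[t] - m) / sd
                den += w
        index.append(num / den if den > 0 else None)
    return index, weights

def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical data; station_b is missing one year.
station_a = [10, 14, 18, 22, 26, 30, 34, 38]
station_b = [11, 13, None, 21, 27, 29, 35, 37]
flow = [54.5, 68.5, 90.5, 106.5, 133.5, 149.5, 171.5, 185.5]

index, weights = z_index([station_a, station_b], flow)
intercept, slope = fit_line(index, flow)
print("weights:", [round(w, 3) for w in weights])
print("index:", [round(v, 2) for v in index])
```

In the year where station_b is missing, the index is built from station_a alone, so the regression still has a value for every year.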
What is a z-score? A z-score is a “normalized anomaly”: Z = (value - average) / (standard deviation). Example station statistics (avg, stdev): (135, 30) and (60, 15). With avg = 60, stdev = 15 and a value of 90: Z = (90 – 60)/15 = +2 Credit: Tom Pagano
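The same arithmetic as a minimal function, using the average of 60, standard deviation of 15 and value of 90 from the slide. The function name is illustrative only.

```python
def z_score(value, average, stdev):
    """Normalized anomaly: distance from the average in standard deviations."""
    return (value - average) / stdev

z = z_score(90, 60, 15)
print(z)  # 2.0
```

A z-score of +2 means the observation sits two standard deviations above its historical average, which puts stations with different typical magnitudes on a common scale.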
How good are the results? Under conditions of serially complete data and relatively “normal” conditions, PCA and Z-Score are effectively indistinguishable.* Skill and behavior are similar to the official published outlooks.** However… Any tool is a weapon if you hold it right. (aka “A fool with a tool is still a tool”) Credit: Tom Pagano * Viper technical note - 1 basin ** Pagano dissertation – 29 basins
The Viper Main Interface: Layout and interpretation • Selecting predictors and predictands • Forecast vs observed time series • Station availability, weights • Global month changes • Predictor quality, availability • Fcst vs obs scatterplot • Helper variable scatterplot / forecast progression • Settings • Probability bounds • Historical statistics • There’s more if you scroll right: relate any variable to another Credit: Tom Pagano