A Brief Introduction to Statistical Forecasting

Kevin Werner

Outline • Principal Component Theory • Applications • Z Score • VIPER

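Basic Forecast Methods: simulation modeling and statistical regression. [Figure: simulation modeling schematic (snow, rainfall, heat; snowpack, soil water, runoff) next to a statistical regression plot for S Fork Rio Grande, Colo: Apr-Jul streamflow % avg vs. May 1 snowpack % avg] Credit: Tom Pagano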

The General Linear Regression Model: $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$ where: Y = dependent variable; Xi = independent variables; bi = regression coefficients; n = number of independent variables. Credit: Dave Garen

The Problem If X’s are intercorrelated, they contain redundant information, and the b’s cannot be meaningfully estimated. However, we don’t want to have to throw out most of the X’s but prefer to retain them for robustness. Credit: Dave Garen

Example Streamflow = b0 + b1 * (SNOTEL A) + b2 * (SNOTEL B) -> The SNOTEL sites are highly correlated with each other -> An optimal b1 and b2 will be difficult to determine because the correlation is so strong
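The instability described in this example can be illustrated with a small sketch. All data here are synthetic and hypothetical (the site values, noise levels, and the bootstrap check are assumptions made for illustration, not part of the original slides). It fits the two-site regression on resampled years and compares how much b1 wanders with how much the sum b1 + b2 wanders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 30

# Hypothetical synthetic data: two snow sites that track each other closely.
snotel_a = rng.normal(100.0, 25.0, n_years)
snotel_b = snotel_a + rng.normal(0.0, 2.0, n_years)  # nearly a copy of site A
streamflow = 0.5 * snotel_a + 0.5 * snotel_b + rng.normal(0.0, 10.0, n_years)


def fit(y, *predictors):
    """Ordinary least squares fit; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs


# Correlation between the two predictors.
r_ab = np.corrcoef(snotel_a, snotel_b)[0, 1]

# Refit on bootstrap resamples of the years to see how stable b1 and b2 are.
b1_samples, b2_samples = [], []
for _ in range(200):
    idx = rng.integers(0, n_years, n_years)
    b = fit(streamflow[idx], snotel_a[idx], snotel_b[idx])
    b1_samples.append(b[1])
    b2_samples.append(b[2])

b1_samples = np.array(b1_samples)
b2_samples = np.array(b2_samples)
spread_b1 = b1_samples.std()
spread_sum = (b1_samples + b2_samples).std()

print(f"correlation between sites: {r_ab:.3f}")
print(f"spread of b1 across resamples:      {spread_b1:.3f}")
print(f"spread of b1 + b2 across resamples: {spread_sum:.3f}")
```

In this sketch the combined effect b1 + b2 is pinned down far better than either coefficient alone, which is the sense in which the individual b's "cannot be meaningfully estimated".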

The Solution Possibilities: 1) Pre-combine X’s into composite index(es), e.g., Z-score method 2) Principal components regression These are similar in concept but differ in the mathematics. Credit: Dave Garen

Principal Components Analysis Principal components regression is just like standard regression except the independent variables are principal components rather than the original X variables. Principal components are linear combinations of the X’s. Credit: Dave Garen

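Principal Components Analysis Each principal component is a weighted sum of all the X’s: $PC_1 = e_{11} X_1 + e_{12} X_2 + \dots + e_{1n} X_n$, $PC_2 = e_{21} X_1 + e_{22} X_2 + \dots + e_{2n} X_n$, ..., $PC_n = e_{n1} X_1 + e_{n2} X_2 + \dots + e_{nn} X_n$ Credit: Dave Garen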

Principal Components Analysis The e’s are called eigenvectors, derived from a matrix equation whose input is the correlation matrix of all the X’s with each other. Principal components are new variables that are not correlated with each other. The principal components transformation is equivalent to a rotation of axes. Credit: Dave Garen
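A minimal sketch of this construction, assuming NumPy and a hypothetical synthetic data set (the function name `principal_components` and the data are illustrative, not from the slides). It takes the eigenvectors of the correlation matrix, forms the principal components as weighted sums of the standardized X's, and checks that the resulting components are uncorrelated.

```python
import numpy as np


def principal_components(X):
    """Principal components from the correlation matrix of X.

    X has shape (n_years, n_variables). Returns eigenvalues (largest first),
    the matching eigenvectors as columns, and the component time series.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each X
    R = np.corrcoef(X, rowvar=False)                  # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)              # R is symmetric
    order = np.argsort(eigvals)[::-1]                 # sort largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pcs = Z @ eigvecs                                 # weighted sums of the X's
    return eigvals, eigvecs, pcs


# Hypothetical data: three predictors that share a common signal.
rng = np.random.default_rng(1)
signal = rng.normal(size=40)
X = np.column_stack([signal + 0.3 * rng.normal(size=40) for _ in range(3)])

eigvals, eigvecs, pcs = principal_components(X)
pc_corr = np.corrcoef(pcs, rowvar=False)

print("eigenvalues:", np.round(eigvals, 3))
print("largest off-diagonal PC correlation:",
      np.abs(pc_corr - np.eye(3)).max())
```

Because the eigenvector matrix is orthonormal, multiplying by it is a rotation (possibly with a reflection) of the standardized axes, and the eigenvalues sum to the number of variables.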

Principal Components Analysis Credit: Dave Garen

Principal Components Analysis The eigenvectors (weights) are based solely on the intercorrelations among the X’s and have no knowledge of Y (in contrast to the Z-score method, whose weights do take Y into account). Principal components can be used for purely descriptive purposes, but we want to use them as independent variables in a regression. Credit: Dave Garen

Credit: Dennis Hartmann

Principal Components Analysis -- Example Independent Variables: X1 – X5: Snow water equivalent at 5 stations; X6 – X10: Water year to date precipitation at 5 stations; X11: Antecedent streamflow; X12: Climate teleconnection index. Credit: Dave Garen

Correlation Matrix Credit: Dave Garen

First Five Eigenvectors Credit: Dave Garen

Principal Components Regression Procedure • Try the PC’s in order • Test for regression coefficient significance (t-test) • Stop at first insignificant component • Transform regression coefficients to be in terms of original variables • Sign test – coefficient signs must be same as correlation with Y Credit: Dave Garen
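The procedure above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the VIPER implementation: the function name `pc_regression` is hypothetical, the data are synthetic, and a fixed critical value of 2.0 stands in for a proper t-table lookup.

```python
import numpy as np

T_CRIT = 2.0  # assumed cutoff, roughly a 5% two-sided t-test for ~30+ d.o.f.


def pc_regression(X, y, t_crit=T_CRIT):
    """Principal components regression with an in-order t-test and a sign test."""
    n, p = X.shape
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]   # largest eigenvalue first
    pcs = Z @ eigvecs

    yc = y - y.mean()
    gammas = np.zeros(0)                  # coefficients on the retained PCs
    for k in range(p):                    # try the PCs in order
        trial = pcs[:, :k + 1]
        # PCs are mutually orthogonal, so each coefficient is a simple ratio.
        g = (trial.T @ yc) / np.sum(trial ** 2, axis=0)
        resid = yc - trial @ g
        dof = n - (k + 2)                 # intercept plus k + 1 components
        if dof <= 0:
            break
        s2 = resid @ resid / dof
        t_stat = g[k] / np.sqrt(s2 / np.sum(trial[:, k] ** 2))
        if abs(t_stat) < t_crit:          # stop at first insignificant PC
            break
        gammas = g

    k_kept = len(gammas)
    # Transform back to coefficients on the original X variables.
    b = (eigvecs[:, :k_kept] @ gammas) / std
    b0 = y.mean() - b @ mean
    # Sign test: each coefficient should match the sign of corr(X_j, Y).
    r_xy = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    signs_ok = bool(np.all(np.sign(b) == np.sign(r_xy)))
    return b0, b, k_kept, signs_ok


# Hypothetical data: four correlated predictors driven by one common signal.
rng = np.random.default_rng(2)
n = 40
s = rng.normal(size=n)
X = np.column_stack([50 + 10 * s + 3 * rng.normal(size=n) for _ in range(4)])
y = 200 + 40 * s + 10 * rng.normal(size=n)

b0, b, k_kept, signs_ok = pc_regression(X, y)
prediction = b0 + X @ b
print("components kept:", k_kept)
print("coefficients on original X's:", np.round(b, 3))
print("sign test passed:", signs_ok)
```

What to do when the sign test fails is not spelled out on the slide; this sketch only reports the result.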

Summary • Principal components analysis is a standard multivariate statistical procedure • Can be used for descriptive purposes to reduce the dimensionality of correlated variables • Can be taken a step further to provide new, non-correlated independent variables for regression • PC’s taken in order, subject to t-test and sign test • Final model is expressed in terms of original X variables Credit: Dave Garen

Soil Moisture at the interannual timescale • Another example demonstrating importance of land surface processes in the climate system: Werner, 1999: • GCM run with and without active land surface model in South America to explore the importance of land surface processes in the climate system variability in the Nordeste region. • Both simulations include full atmospheric model, slab ocean model (no ocean dynamics), and dynamic land surface model everywhere except tropical South America in the Data Land simulation.

Soil Moisture at the interannual timescale • Modeled variability • Full dynamic land surface model simulation contains variability resembling observed variability with connection between NH and SH SSTs. • Fixed land surface model shows no connected variability between NH and SH SSTs

Resources • Dave Garen VIPER slides • Dennis Hartmann lecture notes (http://www.atmos.washington.edu/~dennis/)

What does z-score regression do? 1. Combines predictors into weighted indices, emphasizing good stations, minimizing bad ones. 2. Compensates for missing data with remaining data. 3. Regresses index against target predictand Credit: Tom Pagano
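The three steps can be sketched as below. The weighting scheme is an assumption made for illustration (each station is weighted by its squared correlation with the predictand, keeping the sign of the correlation); the slides do not state the exact weights, and the function name `zscore_index` and the synthetic data are hypothetical. Missing values are marked with NaN.

```python
import numpy as np


def zscore_index(X, y):
    """Combine predictors into one weighted z-score index per year.

    X has shape (n_years, n_stations) and may contain NaN for missing data.
    Assumes every year has at least one station available.
    """
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0, ddof=1)
    Z = (X - mean) / std                      # z-score = normalized anomaly

    # Assumed weighting: signed r^2 of each station with the predictand.
    weights = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        ok = ~np.isnan(X[:, j])
        r = np.corrcoef(X[ok, j], y[ok])[0, 1]
        weights[j] = np.sign(r) * r ** 2

    # Missing stations drop out; the remaining weights are renormalized.
    available = ~np.isnan(Z)
    weighted = np.where(available, Z * weights, 0.0)
    total_weight = (available * np.abs(weights)).sum(axis=1)
    return weighted.sum(axis=1) / total_weight, weights


# Hypothetical data: a good, a fair, and a poor station, with a few gaps.
rng = np.random.default_rng(3)
n = 25
s = rng.normal(size=n)
X = 100 + 20 * np.column_stack([
    s + 0.2 * rng.normal(size=n),
    s + 0.5 * rng.normal(size=n),
    s + 3.0 * rng.normal(size=n),
])
y = 100 + 30 * s + 5 * rng.normal(size=n)
X[3, 0] = np.nan
X[10, 1] = np.nan
X[17, 2] = np.nan

index, weights = zscore_index(X, y)
slope, intercept = np.polyfit(index, y, 1)    # regress predictand on the index
print("station weights:", np.round(weights, 3))
print(f"forecast = {intercept:.1f} + {slope:.1f} * index")
```

In this sketch the noisy third station receives a much smaller weight than the first, and years with a missing station still get an index value from the stations that remain.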

What is a z-score? A z-score is a “normalized anomaly”: Z = (value - average) / standard deviation
avg    stdev
135    30
60     15
Example, using avg = 60 and stdev = 15: Z = (90 – 60)/15 = +2 Credit: Tom Pagano
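The arithmetic in this example can be checked with a few lines of Python. The helper names and the three-value record below are hypothetical; the record was chosen only because its average is 60 and its sample standard deviation is 15, matching the slide.

```python
import statistics


def z_score(value, average, standard_deviation):
    """Normalized anomaly: how many standard deviations from the average."""
    return (value - average) / standard_deviation


def z_score_from_record(value, record):
    """Same calculation, with the statistics taken from a historical record."""
    return z_score(value, statistics.mean(record), statistics.stdev(record))


# Slide example: value 90, average 60, standard deviation 15.
print(z_score(90, 60, 15))

# Hypothetical record with average 60 and sample standard deviation 15.
print(z_score_from_record(90, [45, 60, 75]))
```

Both calls give a z-score of +2, meaning the value sits two standard deviations above average.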

How good are the results? With serially complete data and relatively “normal” conditions, PCA and Z-Score are effectively indistinguishable.* Skill and behavior are similar to the official published outlooks.** However… Any tool is a weapon if you hold it right. (aka “A fool with a tool is still a fool”) Credit: Tom Pagano * Viper technical note – 1 basin ** Pagano dissertation – 29 basins

Super Quick Primer on VIPER

The Viper Main Interface: Layout and interpretation. Labeled regions: selecting predictors and predictands; forecast vs observed time series; station availability, weights; global month changes; predictor quality, availability; fcst vs obs scatterplot; helper variable scatterplot / forecast progression; settings; probability bounds; historical statistics. Credit: Tom Pagano

The Viper Main Interface: Layout and interpretation (continued). There’s more if you scroll right: relate any variable to another. Credit: Tom Pagano
