The Z-Score Regression Method and You Tom Pagano tom.pagano@porda 503-414-3010

Presentation Transcript


  1. The Z-Score Regression Method and You Tom Pagano tom.pagano@por.usda.gov 503-414-3010

  2. Why do we need something new? What is a z-score? How does the regression work? How good are the results? How to stay out of trouble?

  3. Why do we need something new or different? Challenges forecasters face: data-rich stations mixed with data-poor stations; missing realtime data; high cross-correlation of variables (“co-linearity”).

  4. Uneven record lengths: some stations have many years. [Figure: Mt. Rose Apr 1 Snowpack (1910-2006)]

  5. Uneven record lengths: some stations have many years, others have fewer. Typical regression requires completeness, so it is limited to the overlapping record. [Figures: Mt. Rose Water Year Precipitation (1981-2005); Mt. Rose Apr 1 Snowpack (1910-2006); overlapping record marked]

  6. Uneven record lengths: some stations have many years, others have fewer. Typical regression requires completeness, so it is limited to the overlapping record. The choice in this situation has been: use fewer stations or use fewer years. [Figures: Mt. Rose Water Year Precipitation (1981-2005); Mt. Rose Apr 1 Snowpack (1910-2006); overlapping record marked]

  7. Why this is a problem: to use new, younger stations, older information has to be “forgotten”. Otherwise, a station must exist for a long time before becoming usable.

  8. Why this is a problem: to use new, younger stations, older information has to be “forgotten”. Otherwise, a station must exist for a long time before becoming usable. If one piece of data is missing in realtime, then no forecast at all is available, even if 95% of the “information” is there.

  9. What does z-score regression do? 1. Combines predictors into weighted indices, emphasizing good stations, minimizing bad ones.

  10. What does z-score regression do? 1. Combines predictors into weighted indices, emphasizing good stations, minimizing bad ones. 2. Compensates for missing data with remaining data.

  11. What does z-score regression do? 1. Combines predictors into weighted indices, emphasizing good stations, minimizing bad ones. 2. Compensates for missing data with remaining data. 3. Regresses index against target predictand

  12. What is a z-score? A z-score is a “normalized anomaly”: $Z = \frac{\text{value} - \text{average}}{\text{standard deviation}}$

  14. What is a z-score? A z-score is a “normalized anomaly”: $Z = \frac{\text{value} - \text{average}}{\text{standard deviation}}$. Two example stations: avg 135, stdev 30; avg 60, stdev 15.

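  15. What is a z-score? A z-score is a “normalized anomaly”: $Z = \frac{\text{value} - \text{average}}{\text{standard deviation}}$. Two example stations: avg 135, stdev 30; avg 60, stdev 15. For a value of 90 at the second station, Z = (90 - 60)/15 = +2.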
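
The slide's arithmetic can be checked with a minimal Python sketch. The helper name `z_score` is hypothetical and is not part of any forecasting tool described in the presentation.

```python
def z_score(value: float, average: float, stdev: float) -> float:
    """Normalized anomaly: how many standard deviations a value lies from its average."""
    if stdev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - average) / stdev


# Second example station: average 60, standard deviation 15, observed value 90.
print(z_score(90, 60, 15))  # 2.0
```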

  16. What is a z-score? Stations are now on an “even footing”. [Figure: both station records plotted as z-scores; 0 = average, 1 = one standard deviation, wetter above, drier below; +2 marked]

  17. What is a z-score? If one station is partially missing, the other station hints at what it might have been. [Figure: the same z-score axes; wetter above, drier below]

  18. How does z-score regression work? 1. Normalize input time series: $(x - \bar{x})/\sigma_x$ [Figure: April 1st SWE, inches]

  19. How does z-score regression work? 1. Normalize input time series: $(x - \bar{x})/\sigma_x$ [Figure: standardized anomalies (“z-scores”)]
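
Step 1 can be sketched in Python, assuming missing years are stored as `None` and that each station's mean and sample (n - 1) standard deviation are taken over whichever years it actually reported. The presentation does not state which standard-deviation convention is used; `standardize` and the record values are hypothetical.

```python
import statistics
from typing import Optional, Sequence


def standardize(series: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Convert one station's record to z-scores using only the years it has data.

    Missing years (None) stay missing; the mean and sample standard deviation
    come from the years the station actually reported.
    """
    present = [x for x in series if x is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # assumption: sample (n - 1) standard deviation
    return [None if x is None else (x - mean) / sd for x in series]


# Hypothetical April 1 SWE record (inches) with one missing year.
print(standardize([10.0, None, 14.0, 12.0]))  # [-1.0, None, 1.0, 0.0]
```

Because each station is scaled against its own record, a short-record station still yields z-scores, which is what lets younger stations enter the index at all.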

  20. How does z-score regression work? 2. Correlate each index with the target (flow) to get weights. [Figure: standardized anomalies (“z-scores”); r² with Apr-Jul flow: 0.48, 0.52, 0.61]
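
A sketch of step 2, assuming each station's weight is the squared Pearson correlation between its z-scores and flow, computed over the years in which both exist. The function `r_squared` and the sample numbers are hypothetical.

```python
from typing import Optional, Sequence


def r_squared(z: Sequence[Optional[float]], flow: Sequence[float]) -> float:
    """Squared Pearson correlation between a station's z-scores and flow,
    using only the years in which the station reported."""
    pairs = [(x, y) for x, y in zip(z, flow) if x is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two overlapping years")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy * sxy / (sxx * syy)


# Hypothetical z-scores (last year missing) against Apr-Jul flow.
print(round(r_squared([-1.0, 1.0, 0.0, None], [55.0, 65.0, 70.0, 80.0]), 3))  # 0.429
```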

  21. How does z-score regression work? 3. Develop a weighted average of the available sites, e.g. $(A x_1 + B x_2)/(A + B)$, where the relative weightings A and B are the sites' r² with Apr-Jul flow. [Figure: standardized anomalies (“z-scores”); r² with Apr-Jul flow: 0.48, 0.52, 0.61]

  22. How does z-score regression work? 3. Develop a weighted average of the available sites, e.g. $(A x_1 + B x_2)/(A + B)$, where the relative weightings A and B are the sites' r² with Apr-Jul flow. [Figure: standardized anomalies (“z-scores”) and their weighted average; r² with Apr-Jul flow: 0.48, 0.52, 0.61]
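
Step 3 can be sketched as a year-by-year weighted average in which a missing station drops out of both the numerator and the denominator, generalizing (A*x1 + B*x2)/(A + B) to any number of sites. The function name, the `None` convention for missing data, and the sample values are assumptions for illustration.

```python
from typing import Optional, Sequence


def weighted_index(z_by_station: Sequence[Sequence[Optional[float]]],
                   weights: Sequence[float]) -> list[Optional[float]]:
    """Year-by-year weighted average of station z-scores.

    Only stations with data in a given year contribute, and the divisor is the
    sum of *their* weights. A year with no data at all gets None.
    """
    n_years = len(z_by_station[0])
    index: list[Optional[float]] = []
    for year in range(n_years):
        num = 0.0
        den = 0.0
        for z, w in zip(z_by_station, weights):
            if z[year] is not None:
                num += w * z[year]
                den += w
        index.append(num / den if den > 0 else None)
    return index


# Two hypothetical stations with weights (r^2) 0.5 and 0.25; each misses one year.
print(weighted_index([[1.0, None, -1.0], [-0.5, 0.5, None]], [0.5, 0.25]))
# [0.5, 0.5, -1.0]
```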

  23. How does z-score regression work? 4. Regress the multi-station weighted index against flow. [Figure: observed flow vs. multi-station z-score index]
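
Step 4 is sketched here as an ordinary least-squares line, flow = a + b * index, fitted over the years that have an index value. The presentation does not give the regression details, so plain OLS is an assumption, and `fit_line` and the numbers are hypothetical.

```python
from typing import Optional, Sequence


def fit_line(index: Sequence[Optional[float]],
             flow: Sequence[float]) -> tuple[float, float]:
    """Least-squares intercept a and slope b for flow = a + b * index."""
    pairs = [(x, y) for x, y in zip(index, flow) if x is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical calibration years (one year has no index value).
a, b = fit_line([-1.0, 0.0, 1.0, None], [40.0, 60.0, 80.0, 55.0])
forecast = a + b * -1.7  # apply to a realtime index of -1.7
print(f"{forecast:.1f}")  # 26.0
```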

  24. The use of “groups” (aka components): In the case of multiple signals, the user combines stations with a similar signal (e.g. fall precipitation) into their own “group index”, weighted by their correlation with flow.

  25. The use of “groups” (aka components): In the case of multiple signals, the user combines stations with a similar signal (e.g. fall precipitation) into their own “group index”, weighted by their correlation with flow. All the group indices are then combined into a “master index”, weighted, again, by their correlation with flow. The master index is regressed against flow.
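
The nesting described above can be written compactly. The notation is ours, not the presentation's: $z_i$ is station $i$'s realtime z-score, $r_i^2$ its squared correlation with flow, $G_k$ the index for group $k$, $R_k^2$ that group index's squared correlation with flow, $M$ the master index, and every sum runs only over stations (or groups) that have realtime data.

$$
G_k = \frac{\sum_{i \in k} r_i^2 \, z_i}{\sum_{i \in k} r_i^2},
\qquad
M = \frac{\sum_k R_k^2 \, G_k}{\sum_k R_k^2},
\qquad
\hat{Q} = a + b\,M
$$

Here $a$ and $b$ come from regressing the master index against observed flow over the calibration years, and $\hat{Q}$ is the resulting flow forecast.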

  26. Steps to z-score regression

  31. A realtime numerical example (1 group, 2 sites)
      Site                      Fry Lk            Mary
      Group                     Snow              Snow
      Avg                       4”                5”
      Stdev                     1”                2”
      Realtime data             2”                2.5”
      Z-score                   (2-4)/1 = -2.00   (2.5-5)/2 = -1.25
      Correlation² with flow    0.75              0.50
      Snow group index = (-2.00*0.75 + -1.25*0.50) / (0.75 + 0.50) = -1.7
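
The slide's numbers can be reproduced with a few lines of Python; the tuple layout and variable names are only an illustration.

```python
# Realtime example from the slide: one "Snow" group, two sites.
sites = {
    # name: (average, stdev, realtime value, r^2 with flow)
    "Fry Lk": (4.0, 1.0, 2.0, 0.75),
    "Mary": (5.0, 2.0, 2.5, 0.50),
}

num = 0.0
den = 0.0
for name, (avg, sd, value, r2) in sites.items():
    z = (value - avg) / sd
    print(f"{name}: z = {z:+.2f}")  # Fry Lk: -2.00, Mary: -1.25
    num += r2 * z
    den += r2

group_index = num / den
print(f"Snow group index = {group_index:.2f}")  # -1.70
```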

  34. A realtime numerical example (3 sites)
      Site                      Fry Lk            Mary                Newman
      Group                     Snow              Snow                Snow
      Avg                       4”                5”                  12”
      Stdev                     1”                2”                  4”
      Realtime data             2”                2.5”                6”
      Z-score                   (2-4)/1 = -2.00   (2.5-5)/2 = -1.25   (6-12)/4 = -1.50
      Correlation² with flow    0.75              0.50                0.65
      Snow group index = (-2.00*0.75 + -1.25*0.50 + -1.50*0.65) / (0.75 + 0.50 + 0.65) = -1.63

  35. A realtime numerical example (3 sites, 1 missing)
      Site                      Fry Lk            Mary                Newman
      Group                     Snow              Snow                Snow
      Avg                       4”                5”                  12”
      Stdev                     1”                2”                  4”
      Realtime data             2”                missing             6”
      Z-score                   (2-4)/1 = -2.00   missing             (6-12)/4 = -1.50
      Correlation² with flow    0.75              0.50                0.65
      Snow group index = (-2.00*0.75 + -1.50*0.65) / (0.75 + 0.65) = -1.77
      (Mary's term drops out of both the numerator and the denominator.)
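
Slides 34 and 35 differ only in whether Mary reports, so one small function covers both. The `None`-for-missing convention and the names are illustrative assumptions.

```python
from typing import Optional

# Each site: (average, stdev, realtime value or None if missing, r^2 with flow)
Site = tuple[float, float, Optional[float], float]


def group_index(sites: dict[str, Site]) -> float:
    """r^2-weighted mean of the z-scores of the sites that reported."""
    num = 0.0
    den = 0.0
    for avg, sd, value, r2 in sites.values():
        if value is None:
            continue  # a missing site leaves both the numerator and the denominator
        num += r2 * (value - avg) / sd
        den += r2
    return num / den


all_present = {
    "Fry Lk": (4.0, 1.0, 2.0, 0.75),
    "Mary": (5.0, 2.0, 2.5, 0.50),
    "Newman": (12.0, 4.0, 6.0, 0.65),
}
mary_missing = dict(all_present, Mary=(5.0, 2.0, None, 0.50))

print(f"{group_index(all_present):.2f}")   # -1.63
print(f"{group_index(mary_missing):.2f}")  # -1.77
```

The index shifts only modestly when Mary drops out, because the remaining sites carry a similar signal; that is the sense in which remaining data compensate for missing data.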

  36. A realtime numerical example (2 groups, 3 sites)
      Site                      Fry Lk            Mary                Fry
      Group                     Snow              Snow                Precip
      Avg                       4”                5”                  6”
      Stdev                     1”                2”                  2”
      Realtime data             2”                2.5”                3”
      Z-score                   (2-4)/1 = -2.00   (2.5-5)/2 = -1.25   (3-6)/2 = -1.50
      Correlation² with flow    0.75              0.50                0.25
      Group correlation² with flow: Snow 0.6, Precip 0.25
      Snow group index = (-2.00*0.75 + -1.25*0.50) / (0.75 + 0.50) = -1.7
      Precip group index = (-1.50*0.25) / 0.25 = -1.5
      Master index = (-1.7*0.6 + -1.5*0.25) / (0.6 + 0.25) = -1.64
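
The two-level weighting on this slide reuses the same weighted mean twice: once within each group and once across groups. The helper name `weighted_mean` is hypothetical.

```python
from typing import Iterable


def weighted_mean(pairs: Iterable[tuple[float, float]]) -> float:
    """pairs: (value, weight); returns sum(w * v) / sum(w)."""
    pairs = list(pairs)
    return sum(w * v for v, w in pairs) / sum(w for _, w in pairs)


snow = weighted_mean([((2.0 - 4.0) / 1.0, 0.75),    # Fry Lk snow, z = -2.00
                      ((2.5 - 5.0) / 2.0, 0.50)])   # Mary snow, z = -1.25
precip = weighted_mean([((3.0 - 6.0) / 2.0, 0.25)])  # Fry precip, z = -1.50

# Group indices weighted by each group's correlation^2 with flow.
master = weighted_mean([(snow, 0.6), (precip, 0.25)])
print(f"snow = {snow:.2f}, precip = {precip:.2f}, master = {master:.2f}")
# snow = -1.70, precip = -1.50, master = -1.64
```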

  37. How good are the results? Under conditions of serially complete data and relatively “normal” conditions, PCA and z-score regression are effectively indistinguishable.* Skill and behavior are similar to the official published outlooks.** (*Viper technical note, 1 basin. **Pagano dissertation, 29 basins.)

  38. How good are the results? Under conditions of serially complete data and relatively “normal” conditions, PCA and z-score regression are effectively indistinguishable.* Skill and behavior are similar to the official published outlooks.** However… any tool is a weapon if you hold it right (aka “A fool with a tool is still a tool”). (*Viper technical note, 1 basin. **Pagano dissertation, 29 basins.)

  39. Abuse of the z-score method: If the main driver of skill is absent from certain years, those years will have overconfident forecasts, and the set as a whole will not be as skillful as it could be. [Figure: forecast vs. observed; r² = 0.18 and r² = 0.95]

  40. Abuse of the z-score method • If the main driver of skill is absent from certain years, those years will have overconfident forecasts. • The set as a whole will not be as skillful as it could be. • Solutions: • Remove poor-skill years from the calibration set. [Figure: forecast vs. observed; r² = 0.95]

  41. Abuse of the z-score method • If the main driver of skill is absent from certain years, those years will have overconfident forecasts. • The set as a whole will not be as skillful as it could be. • Solutions: • Remove poor-skill years from the calibration set. • Remove the poor-skill station entirely. [Figure: forecast vs. observed, some points marked x; r² = 0.95]

  42. Abuse of the z-score method • If the main driver of skill is absent from certain years, those years will have overconfident forecasts. • The set as a whole will not be as skillful as it could be. • Solutions: • Remove poor-skill years from the calibration set. • Remove the poor-skill station entirely. • If data for the high-skill station are not available in realtime, remove the high-skill station. [Figure: forecast vs. observed, some points marked x]

  43. More z-score method atrocities: Stations’ periods of record should be representative. [Figure: station1 and station2 records]

  44. More z-score method atrocities: Stations’ periods of record should be representative. Blue station’s “wet” years are actually normal over the longer term. [Figure: station1 and station2 records]

  45. More z-score method atrocities: Stations’ periods of record should be representative. Blue station’s “wet” years are actually normal over the longer term. [Figure: Z-Score Rescaling]

  46. More z-score method atrocities: Stations’ periods of record should be representative. Blue station’s “wet” years are actually normal over the longer term. [Figure: Z-Score Rescaling] • Solutions: • Use consistent years • Eliminate one station • Estimate missing data ahead of time
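
A small illustration of why an unrepresentative period of record matters: the same value can look wet against a short, dry-period record and near normal against a longer one. All numbers below are hypothetical and do not come from the presentation; sample standard deviation is assumed.

```python
import statistics

# Hypothetical 8-year record: the first four years were wet, the last four dry.
long_record = [14.0, 12.0, 13.0, 11.0, 8.0, 9.0, 7.0, 10.0]
short_record = long_record[4:]  # what a "young" station would have seen


def z(value: float, record: list[float]) -> float:
    """z-score of value relative to the given record (sample stdev)."""
    return (value - statistics.mean(record)) / statistics.stdev(record)


value = 10.0  # the most recent year
print(f"z vs. short record: {z(value, short_record):+.2f}")  # +1.16, looks wet
print(f"z vs. long record:  {z(value, long_record):+.2f}")   # -0.20, near normal
```

Computing every station's statistics over the same years (the "use consistent years" option) removes this particular distortion, at the cost of discarding some record.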

  47. Summary: Z-score regression is a regression methodology that, within reason, can handle uneven record lengths and missing data. It groups stations into indices, emphasizing good stations and minimizing the effect of poor stations. Multiple signals can be managed (e.g. snow, fall precip, baseflow). It can be abused, especially if the input data set is highly uneven.
