Factors Affecting Quality of Forecasts in Environmental Modeling Center

FACTORS AFFECTING QUALITY OF FORECASTS
Zoltan Toth
Environmental Modeling Center, NOAA/NWS/NCEP
Ackn.: Joo-Hyung Son, Bo Cui, David Unger, Richard Verret, Dingchen Hou, Yuejian Zhu
http://wwwt.emc.ncep.noaa.gov/gmb/ens/index.html

OUTLINE / SUMMARY
• WHAT AFFECTS FORECAST QUALITY?
  • RESOLUTION
  • RELIABILITY
• WHAT AFFECTS QUALITY OF BIAS CORRECTION?
  • BIAS CORRECTION METHOD
  • SAMPLE SIZE
• GENERATION OF HIND-CASTS WITH FROZEN DA/MODEL
  • INCREASES SAMPLE SIZE
  • REDUCES RESOLUTION/RELIABILITY OF RAW FORECAST
• REAL-TIME GENERATION OF HIND-CASTS
  • LATEST DA/MODEL VERSION USED
  • LARGE SAMPLE FOR CORRECTING BIAS IN 1ST MOMENT

QUALITY & UTILITY OF FORECASTS
• Quality of forecast process depends on its
  • Statistical resolution
    • Ability to distinguish (provide unique signals before) future events
    • Temporal sequence foreseen - inherent value of forecast process
    • NWP methods used in 6 hr – 15 days range; statistical methods are not viable
    • Can be improved by NWP DA & model development
  • Statistical reliability
    • Ability to simulate (not predict) nature faithfully
    • Realism, fidelity - but no info on temporal sequence
    • Can be improved by
      • NWP model development
      • Use of statistical bias correction methods
        • Better statistical methods
        • Larger data sample - can be perfectly corrected with large enough sample
• Utility of forecasts depends on both resolution and reliability
• Dual requirement of
  • Continuously improved DA & model - routinely done a couple of times per year
  • Large hind-cast data sample - tried in US once with 8-10 year old system
  • May conflict
• What is the best way to proceed?
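The distinction between resolution and reliability on this slide can be illustrated with synthetic numbers. The sketch below is not from the talk: it assumes a made-up forecast with a constant bias of 1.5 and uses the forecast/truth correlation as a rough stand-in for resolution. Removing the mean error lowers the rms error but leaves the correlation unchanged, which is the sense in which statistical correction helps reliability and not resolution. The bias is estimated in-sample here (dependent data) to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical verifying analysis and a noisy forecast with a constant bias.
truth = rng.normal(0.0, 1.0, n)
forecast = truth + rng.normal(0.0, 0.5, n) + 1.5


def rmse(f, a):
    """Root-mean-square difference between forecast f and analysis a."""
    return float(np.sqrt(np.mean((f - a) ** 2)))


# First-moment bias correction: subtract the mean forecast error.
bias = float(np.mean(forecast - truth))
corrected = forecast - bias

corr_raw = float(np.corrcoef(forecast, truth)[0, 1])
corr_cor = float(np.corrcoef(corrected, truth)[0, 1])
rmse_raw = rmse(forecast, truth)
rmse_cor = rmse(corrected, truth)

print(f"correlation raw/corrected: {corr_raw:.3f} / {corr_cor:.3f}")
print(f"rmse raw/corrected:        {rmse_raw:.3f} / {rmse_cor:.3f}")
```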

QUALITY OF BIAS CORRECTION
ASSUME BOTH NATURE & FORECAST ARE STATIONARY
• Quality of bias correction depends on quality of bias estimation
• Bias estimation
  • Estimating difference between expected values of two processes
    • Nature (or its proxy) and
    • Forecast of nature
  • Statistical filters can be used
• Desirable characteristics of bias estimation
  • Convergence to true value with increasing sample size
    • Unbiased estimate - influenced by choice of method
  • Minimal random noise in estimate
    • Rate of convergence to unbiased estimate with increasing N must be high
• State of convergence depends on
  • Choice of method
  • Sample size
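As a minimal sketch of the sample-size dependence, assume stationary and statistically independent forecast errors with a hypothetical true bias of 1.0 and noise standard deviation of 2.0 (none of these values come from the talk). The sample-mean bias estimate is then unbiased, and its random noise shrinks roughly as noise_std / sqrt(N). Real forecast errors are correlated in time, so the effective sample size is smaller than N.

```python
import numpy as np


def bias_estimate_stats(n_samples, true_bias=1.0, noise_std=2.0,
                        n_trials=2000, seed=0):
    """Mean and spread of the sample-mean bias estimate over many
    synthetic experiments, each using n_samples independent errors."""
    rng = np.random.default_rng(seed)
    errors = true_bias + rng.normal(0.0, noise_std, size=(n_trials, n_samples))
    estimates = errors.mean(axis=1)
    return float(estimates.mean()), float(estimates.std())


for n in (25, 100, 400):
    mean_est, spread = bias_estimate_stats(n)
    expected = 2.0 / np.sqrt(n)  # theoretical standard error for independent errors
    print(f"N={n:4d}  mean estimate={mean_est:.3f}  "
          f"spread={spread:.3f}  theory={expected:.3f}")
```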

QUALITY OF BIAS CORRECTION
REAL WORLD APPLICATIONS
• More complex problem, processes not stationary but influenced by
  • Seasonal cycle
    • Bias estimate as function of seasonal cycle
    • Can be done but increases sample size requirement
  • Regime changes
    • Bias estimate as function of regime
    • Can be done
      • Use of recent data works well for short lead
      • Increases sample size requirement for long lead
  • Changes in forecast process (periodic upgrades of NWP DA & modeling)
    • Estimation can be done but may require regeneration of large hind-cast dataset
• How to balance influence of different factors?
  • Update model continuously
    • Best statistical resolution
    • Lack of large hind-cast dataset corrupts ability to bias correct
  • Generate large hind-cast dataset
    • Best reliability
    • Corrupts statistical resolution for not using best available NWP system
• Can we have the good aspects of both worlds?

ISSUES
• How much skill do we lose by using an 8-10 year old forecast system?
  • 2 days skill at D3, 3 days at D7.5, 4+ days at D9
  • Estimate 2% rms error reduction per year in recent yrs
• How much skill do we gain due to use of larger sample?
  • Reduction depends on level of bias in forecast
  • CDC hind-casts have much larger bias
    • Large bias reduction observed
    • Still does not compensate for loss of skill until 10+ day lead time
  • Operational forecast system
    • Not known
    • Estimate via synthetic data
• Use synthetic data to estimate bias error reduction with more data & current fcst
  • What is current level of NHE time mean error (estimated bias)?
    • ~6-14% of climate standard deviation (2m temp)
  • Gain from expanding sample
    • From 100 days (Kalman Filter method) to 5,000 days (50 yrs hind-cast)
    • 5-10% (depending on method) of random rms error
    • This is equivalent to 2.5-5 yrs of model development
• If DA/model has to be frozen for 3-5 yrs
  • Gain in reliability due to larger sample &
  • Loss in resolution and reliability due to use of frozen system
  • Are of similar magnitude
• Is it worth it?
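The "2.5-5 yrs of model development" equivalence follows from the 2% per year figure. A short arithmetic check, assuming the annual reduction compounds (the slide's round numbers correspond to simple division, 5%/2% and 10%/2%); the function names are illustrative only:

```python
import math

ANNUAL_RMS_REDUCTION = 0.02  # from the slide: ~2% rms error reduction per year


def years_equivalent(gain, annual=ANNUAL_RMS_REDUCTION):
    """Years of model development needed to match a fractional rms error
    reduction `gain`, assuming the annual reduction compounds."""
    return math.log(1.0 - gain) / math.log(1.0 - annual)


def cumulative_reduction(years, annual=ANNUAL_RMS_REDUCTION):
    """Fractional rms error reduction accumulated over `years`."""
    return 1.0 - (1.0 - annual) ** years


for gain in (0.05, 0.10):
    print(f"{gain:.0%} rms gain ~ {years_equivalent(gain):.1f} yrs of development")
print(f"5 yrs of development ~ {cumulative_reduction(5):.1%} rms reduction")
```

With compounding, the 5-10% gain corresponds to about 2.5 to 5.2 years, and five years of development to just under 10%, consistent with the slide's estimates.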

Raw, Optimal & Actual Bias Corrected Ensembles
Annual Mean RPSS (20040301 – 20050228), 500 mb Height over Northern Hemisphere
[Figure: RPSS vs. lead time; curve groups labeled "3 OPR ENS." and "3 RFC ENS."]
• Decaying average: bias correction improves RPSS at all lead times vs. raw oper. ens.
• Climate error removed: bias-corrected reforecast shows significant improvement at all lead times vs. raw reforecast
• Operational vs. reforecast ens.: oper. fcst is better than the bias-corrected reforecast out to 9-10 days. Beyond 10 days, bias-corrected reforecast becomes competitive with or better than oper. fcst
• Improvement is larger for CDC reforecast
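For readers unfamiliar with the score in these figures, the sketch below computes the ranked probability skill score in its standard form, RPSS = 1 - RPS_forecast / RPS_reference. The three equally likely categories and the climatological reference are assumptions for illustration; the category definition used for the plots is not stated on the slides. Some definitions divide RPS by (number of categories - 1), which does not change the ratio.

```python
import numpy as np


def rps(prob, obs_cat):
    """Ranked probability score per case.

    prob:    (n_cases, n_cat) forecast probabilities for ordered categories
    obs_cat: (n_cases,) index of the observed category
    """
    prob = np.asarray(prob, dtype=float)
    n_cases = prob.shape[0]
    obs = np.zeros_like(prob)
    obs[np.arange(n_cases), obs_cat] = 1.0
    # Squared differences of cumulative forecast and observed distributions.
    cum_diff = np.cumsum(prob, axis=1) - np.cumsum(obs, axis=1)
    return np.sum(cum_diff ** 2, axis=1)


def rpss(prob, ref_prob, obs_cat):
    """Skill relative to a reference: 1 is perfect, 0 matches the reference."""
    return 1.0 - rps(prob, obs_cat).mean() / rps(ref_prob, obs_cat).mean()


obs = np.array([0, 2])                      # observed categories for two cases
clim = np.full((2, 3), 1.0 / 3.0)           # climatological reference forecast
perfect = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])       # all probability on the observed category

print(rpss(perfect, clim, obs))
print(rpss(clim, clim, obs))
```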

[Figure annotations: gain per year = 2% rms error reduction; gain in 5 yrs = 15 hrs or 6 m (10%); gain in 5 yrs = 0.06 or ~15 hrs]

ISSUES - 2
• Regime dependent vs. climate mean bias correction
  • Regime dependent (with small sample) - better at short lead
  • Climate mean (with much larger sample) - better at long lead
  • Could try regime dependent correction using hind-cast sample
  • Combined method may be best?
• Is it worth the effort to generate hind-cast dataset?
  • Questionable
  • Are there additional factors/gains to consider?
    • Bias correction for highly non-linear downstream applications
      • River flow forecasting?
• Can we have the good aspects of both worlds?
  • Latest model AND large hind-cast dataset?
• Real time generation of hind-cast data
  • Assume biggest problem is bias in first moment
    • Can be estimated using large sample of single forecasts
    • Use recent data only to correct bias in higher moments of ensemble
  • Focus on long lead bias estimation problem
    • Regime dependent short lead estimation ok with most recent small sample
    • Assume long lead bias depends only on model (and not quality of initial condition)

Raw, Optimal & Actual Bias Corrected Ensembles
RPSS of 500 mb Height, Northern Hemisphere, 2004 Summer
Decaying average applied to CDC reforecast
• Decaying method gives better results than climate mean bias estimation for short range (~day 5): value of regime dependent correction
• Some gain from climate mean bias correction after ~5 days

REAL-TIME GENERATION OF HIND-CAST DATASET?
• When computing facilities are upgraded (2007 at NCEP)
  • Forgo increasing resolution/membership
  • Instead generate hind-cast “ensemble”
• Arrangement
  • Generate control forecast from re-analysis initial conditions
    • Initialized from 30 days ahead of current date
    • For each year in re-analysis dataset (30-50 yrs)
  • Additional cost is 2-3 times current ensemble configuration - DOABLE
• Bias correction methodology
  • Same as currently, except
    • Estimate climate mean bias as mean of difference between re-analysis & hind-cast forecast (+/-30 days x 30 yrs = 1,800 sample)
      • Use centrally weighted recursive filter to save on disc storage requirements
    • Combine regime dependent Kalman filter + climate mean bias estimates
      • More weight on Kalman filter at short leads
      • More weight on climate mean estimate at long leads
    • Explore possible regime dependent corrections at long lead
      • Save & use entire sample of 1,800 hind-casts in research
  • Discard 30+ day old hind-cast data
• Implement DA/model upgrades at will
  • Generate regime dependent & climate mean bias statistics during ~30-day parallel testing prior to operational implementation
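The bias correction methodology above can be sketched roughly as follows, using synthetic data. The climate mean bias is the mean hind-cast minus re-analysis difference over a +/-30 day window and 30 years (61 x 30 = 1,830 pairs, which the slide rounds to 1,800). The lead-dependent weighting function and its `crossover_day` parameter are hypothetical: the slide only says that the recent (Kalman filter) estimate should get more weight at short leads and the climate mean estimate more at long leads. A plain mean is used instead of the centrally weighted recursive filter.

```python
import numpy as np

N_YEARS = 30       # years of re-analysis used (slide: 30-50 yrs)
HALF_WINDOW = 30   # +/-30 days around the target Julian date


def climate_mean_bias(hindcast, reanalysis):
    """Mean hind-cast minus re-analysis difference over all years and days."""
    return float(np.mean(np.asarray(hindcast) - np.asarray(reanalysis)))


def blended_bias(recent_bias, climate_bias, lead_day, crossover_day=5.0):
    """Blend a recent-data bias estimate with the climate mean estimate.

    The weight on the recent estimate decreases with lead time; the
    functional form and crossover_day are illustrative assumptions.
    """
    w_recent = 1.0 / (1.0 + (lead_day / crossover_day) ** 2)
    return w_recent * recent_bias + (1.0 - w_recent) * climate_bias


# Synthetic stand-ins for re-analysis and hind-casts with a true bias of 0.8.
rng = np.random.default_rng(1)
shape = (N_YEARS, 2 * HALF_WINDOW + 1)
reanalysis = rng.normal(0.0, 1.0, shape)
hindcast = reanalysis + 0.8 + rng.normal(0.0, 1.0, shape)

sample_size = reanalysis.size
clim_bias = climate_mean_bias(hindcast, reanalysis)
print(f"sample size = {sample_size}, climate mean bias = {clim_bias:.3f}")

for lead in (1, 5, 15):
    # 0.3 is a made-up recent (regime dependent) bias estimate.
    print(f"lead {lead:2d} d: blended bias = {blended_bias(0.3, clim_bias, lead):.3f}")
```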

REAL-TIME GENERATION OF HIND-CAST DATASET?
[Schematic: Julian date axis (TJD - 30, today's Julian date TJD, TJD + 30) against year (1967, 1968, ..., 2003, 2004, 2005, 2006). Actual ensemble generated today (2006); hind-casts for TJD + 30 generated today; hind-casts (or their statistics) for TJD +/- 30 saved on disc.]

BACKGROUND

[Figure panels: Current method, real data; Current method, simulated data; Current & new methods, simulated data; New method, simulated data. Curves: estimated and actual values before and after bias correction (bc).]

Statistical Post-processing
• Bias correction methods
  • Decaying averaging bias correction (~46 day independent training data)
    • Operational NCEP ensemble
    • CDC reforecast data
  • 25-yr climatological mean forecast error (Hamill & Whitaker)
    • CDC reforecast
  • 31-day centered running mean forecast error (dependent data), used as “optimal” benchmark
    • Operational NCEP ensemble
    • CDC reforecast data
  • “Hybrid” system, 2 step adjustment
    $\mathrm{FCST}_{\mathrm{calibrated}} = \mathrm{FCST}_{\mathrm{OPR}} - \mathrm{BIAS}^{\mathrm{Decaying}}_{\mathrm{OPR-RFC}} - \mathrm{BIAS}^{25\text{-}\mathrm{YRS}}_{\mathrm{RFC-REA}}$
• Data sets
  • NCEP operational global ens. data (2004/05 analysis/modeling system)
  • CDC reforecast data set (1998 modeling system, 1978 – 2005)
  • 500 mb height, 850 mb temp, 2m temp, 10m U and V, Mar. 2004 - Feb. 2005
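A minimal sketch of the decaying average and the two step "hybrid" adjustment listed above. The recursive form and the weight of 0.02 are assumptions made for illustration; the slide gives only the name of the method and the approximate training length. The hybrid function reads the slide's formula as: subtract a decaying average estimate of the operational minus reforecast difference, then subtract the 25-yr mean reforecast minus re-analysis error.

```python
import numpy as np


def decaying_average_bias(errors, weight=0.02, initial=0.0):
    """Recursive bias estimate b_t = (1 - w) * b_{t-1} + w * e_t.

    Returns the estimate available before each day's error is seen
    (independent data) and the final estimate. `weight` is an assumed value.
    """
    bias = initial
    available = []
    for err in errors:
        available.append(bias)  # estimate usable for today's correction
        bias = (1.0 - weight) * bias + weight * err
    return np.array(available), bias


def hybrid_calibrated(fcst_opr, bias_decaying_opr_rfc, bias_rfc_rea_25yr):
    """Two step adjustment following the slide's formula."""
    return fcst_opr - bias_decaying_opr_rfc - bias_rfc_rea_25yr


# With a constant forecast error of 2.0 the estimate converges toward 2.0.
history, final_bias = decaying_average_bias(np.full(500, 2.0))
print(f"first estimate = {history[0]:.2f}, final estimate = {final_bias:.4f}")

# Hypothetical numbers: forecast 10.0, step-1 bias 0.5, step-2 bias 1.2.
print(hybrid_calibrated(10.0, 0.5, 1.2))
```

The recursive form needs only the previous estimate, not the full error history, which is one reason a filter of this kind is cheap to run operationally.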


Raw, Optimal & Actual Bias Corrected Ensembles
RPSS of 850 mb Temperature, Northern Hemisphere, 2004 Summer
[Figure: RPSS vs. lead time; curve groups labeled "3 OPR ENS." and "3 RFC ENS."]
• Decaying average: oper. ens. with bias correction performs better than the raw fcst.
• Climate error removed: bias-corrected reforecast shows significant improvement at most lead times vs. raw reforecast
• Operational vs. reforecast ens.: both the raw and post-processed oper. fcst. are better than the bias-corrected reforecast

Raw, Optimal & Actual Bias Corrected Ensembles
• Decaying average
  • RMS error of OPR_DAV reduced by 2% for first week
• Climate error removed
  • RMS error of RFC_COR improved at all lead times vs. RFC_RAW

Bias-correction algorithms
For a particular, given forecast model
• Bias estimation
  • Equal weight
  • Kalman Filter
• Bias correction
  • Current method (method 1): Kalman Filter weight
  • New method (method 2): bias ~ weighted average of …
