An analysis of different bias-correction algorithms in a synthetic environment
Joo-Hyung Son (1), Zoltan Toth (2) and Dingchen Hou (3)
1) Numerical Weather Prediction Division, KMA
2) Environmental Modeling Center, NCEP/NWS/NOAA
3) EMC/NCEP/NWS/NOAA and SAIC
OUTLINE
• Introduction
• Generation of a Synthetic Data Set
• Effects of Sample Size on the Bias Estimation
• Bias Estimation Based on Bayesian Approach
• Effect of Bias Correction on Probabilistic Forecast
• Summary
Introduction

Background
• NWP products are subject to systematic and random errors.
• Estimating the bias from historical data and then subtracting it from the forecast is an effective way of reducing systematic errors.

Existing Questions
• How should the bias be estimated? Various bias-correction methods exist, e.g. the equal weight method and the Kalman Filter type algorithm (Cui et al., 2005).
• How long a historical data set is required for reasonably accurate bias estimation? This has not been investigated systematically.

This Study: A Simplified Approach
• Single forecast of a single variable at a single grid point.
• Simulated forecast (synthetic data), with no dynamic evolution.
• Simulated forecasts of various skill (lead time) and bias levels.
• The simulation can be extended to represent more realistic forecasts.
Generation of synthetic data - analysis
• Assumptions
  • Annual cycle removed (daily climate data minus climate mean)
  • Standardized (divided by climate standard deviation)
  • Stationary process
• Analysis
  • General ARMA(p,q) model
    • p: order of autoregressive
    • q: order of moving average
    • White noise
    • Autocorrelation parameter
    • Moving average parameter
• Parameters estimated from
  • 40 years of climate data at 37.5N, 117.5W
  • 2 m temperature
  • p = 20, q = 1
[Figure: autocorrelation as a function of lag (0 to 400); vertical axis from -0.2 to 1.2.]
Generation of synthetic data - analysis
[Figure: time series of analysis; climate generated by ARMA(20,1); x-axis 0 to 730, y-axis -3 to 4.]
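The ARMA-based generation of the analysis can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `simulate_arma`, the burn-in length, and the low-order coefficients in the usage line are hypothetical stand-ins, since the fitted ARMA(20,1) parameters are not given in the slides.

```python
import random
import statistics


def simulate_arma(phi, theta, n, burn_in=500, seed=0):
    """Simulate n values of an ARMA(p, q) process and standardize them.

    phi   -- autoregressive coefficients (length p); hypothetical values
    theta -- moving-average coefficients (length q); hypothetical values
    """
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    x = [0.0] * p      # past values of the process
    eps = [0.0] * q    # past white-noise values
    out = []
    for _ in range(n + burn_in):
        e = rng.gauss(0.0, 1.0)  # white noise
        ar = sum(phi[i] * x[-1 - i] for i in range(p))
        ma = sum(theta[j] * eps[-1 - j] for j in range(q))
        value = ar + e + ma
        x.append(value)
        eps.append(e)
        out.append(value)
    series = out[burn_in:]  # drop the spin-up part
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # Standardize so the sample has zero mean and unit standard deviation.
    return [(v - mean) / std for v in series]


# Two years of daily "analysis" from an assumed ARMA(2,1).
analysis = simulate_arma([0.6, 0.2], [0.3], 730)
```

Standardizing the sample at the end mirrors the "Standardized" assumption above; a fitted ARMA(20,1) would simply use 20 values in `phi` and one in `theta`.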
Generation of synthetic data - forecast
Requirements:
• The time series of analysis and forecast are similar stationary stochastic processes.
• The forecast is correlated with the analysis, with a coefficient $\alpha$ reflecting the skill of the forecast: $\alpha = 1$ for perfect correlation and $\alpha = 0$ for an uncorrelated forecast (simulating lead times of 1 to 16 days).
• The forecast is subject to random error, independent of the analysis, with variance $\beta^2$ ($\beta = 1$: no skill; $\beta = 0$: no noise).
• The forecast is statistically the same as the analysis, N(0,1). This is satisfied by setting $\beta = \sqrt{1 - \alpha^2}$.
• A constant (time-independent) bias $b$ is added to the forecast.
Model: $f = \alpha a + \beta \varepsilon + b$
• $a$: analysis generated by the ARMA model, N(0,1)
• $f$: forecast, N(0,1)
• $\varepsilon$: forecast error, N(0,1)
• $b$: bias, constant
• $\alpha$: correlation between forecast and analysis
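A minimal sketch of this forecast model, assuming the form f = alpha * a + sqrt(1 - alpha^2) * e + b implied by the requirements. The function name `synthetic_forecast` is hypothetical, and the demo analysis is plain white noise used as a stand-in for the ARMA series.

```python
import math
import random


def synthetic_forecast(analysis, alpha, bias, seed=1):
    """Build a forecast series correlated with the analysis.

    alpha -- correlation between forecast and analysis (skill)
    bias  -- constant bias added to every forecast
    """
    # beta keeps the unbiased forecast at unit variance: alpha^2 + beta^2 = 1.
    beta = math.sqrt(1.0 - alpha ** 2)
    rng = random.Random(seed)
    return [alpha * a + beta * rng.gauss(0.0, 1.0) + bias for a in analysis]


# Stand-in analysis: N(0, 1) white noise (an assumption for this demo only).
demo_rng = random.Random(0)
demo_analysis = [demo_rng.gauss(0.0, 1.0) for _ in range(20000)]
demo_forecast = synthetic_forecast(demo_analysis, alpha=0.8, bias=0.1)
```

With alpha = 1 the noise term vanishes and the forecast is the analysis plus the bias; with alpha = 0 the forecast is pure noise plus the bias.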
Comparison between Real data & Synthetic data
Testing synthetic forecast model against real forecast data
Purple line:
• “Prediction” of how the forecast would look.
• Normal forecast distribution centered on $\alpha$ times $a$, where
  • $\alpha$: correlation estimated over the whole observation period
  • $a$: mean of all analysis values falling between 3 and 4
  • the spread is the standard deviation of the forecast when the corresponding analysis is between 3 and 4
Histogram:
• Forecast after removing the bias
Testing synthetic forecast model against real forecast data
[Figure: histogram panels by lead time (panel labels include day 3, day 10 and day 16), each with x-axis from -20 to 20 and y-axis from 0 to 0.5.]
Bias-correction algorithms
• Traditional method (method 1)
  • Bias ~ weighted average of forecast minus analysis
• Bias estimation
  • Equal weight
  • Kalman Filter (with a Kalman Filter weight)
• Bias correction
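The two weighting schemes for method 1 might look like the sketch below. It assumes the bias is estimated from past forecast-minus-analysis errors and that the Kalman Filter type update is a decaying average with weight `w`; the function names and the default weight are hypothetical, not values from the slides.

```python
def equal_weight_bias(forecasts, analyses):
    """Equal weight: plain mean of the forecast-minus-analysis errors."""
    errors = [f - a for f, a in zip(forecasts, analyses)]
    return sum(errors) / len(errors)


def decaying_average_bias(forecasts, analyses, w=0.02, initial=0.0):
    """Kalman Filter type estimate, assumed here to be a decaying average.

    Each new error gets weight w; the previous estimate keeps weight 1 - w.
    """
    b = initial
    for f, a in zip(forecasts, analyses):
        b = (1.0 - w) * b + w * (f - a)
    return b


def correct(forecast, bias_estimate):
    """Bias correction: subtract the estimated bias from the forecast."""
    return forecast - bias_estimate


analyses = [0.0, 1.0, -2.0]
forecasts = [0.5, 1.5, -1.5]  # every forecast is 0.5 too high
print(equal_weight_bias(forecasts, analyses))
print(decaying_average_bias(forecasts, analyses, w=0.5))
```

A small `w` gives a long effective memory and a smoother estimate; a large `w` follows recent errors more closely.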
Absolute bias error of Method 1
[Figure: Kalman Filter absolute bias error for 100 cases.]
Red points: the equal weight bias error that corresponds to the average of the KF bias error from 1001 to 10000, based on the correlation (~120).
Bias-correction algorithms
• New method (method 2)
  • Based on Bayesian approach
  • Given the forecast model, the bias can be estimated for a particular analysis value
  • Bias ~ weighted average of forecast minus $\alpha$ times analysis
  • Note: does not require sampling the whole distribution of the analysis, so a shorter time series suffices
• Traditional method (method 1)
  • Needs a longer time series to sample the whole distribution of the analysis
• Bias estimation
  • Equal weight
  • Kalman Filter (with a Kalman Filter weight)
• Bias correction
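To illustrate why method 2 can work with a shorter time series, the sketch below compares both equal-weight estimates on a tiny noise-free sample whose analysis values do not average to zero. It assumes that method 2 averages forecast minus alpha times analysis, with alpha taken as known; the function names are hypothetical.

```python
def method1_bias(forecasts, analyses):
    """Traditional estimate: mean of forecast minus analysis."""
    return sum(f - a for f, a in zip(forecasts, analyses)) / len(forecasts)


def method2_bias(forecasts, analyses, alpha):
    """Estimate that uses the forecast model: mean of forecast minus alpha * analysis."""
    return sum(f - alpha * a for f, a in zip(forecasts, analyses)) / len(forecasts)


alpha, true_bias = 0.5, 0.25
analyses = [1.0, 2.0, 3.0]  # short sample, mean far from zero
forecasts = [alpha * a + true_bias for a in analyses]  # noise-free for clarity

print(method1_bias(forecasts, analyses))         # -0.75
print(method2_bias(forecasts, analyses, alpha))  # 0.25
```

Method 1 carries an extra term, (alpha - 1) times the sample mean of the analysis, which only averages out once the sample covers the whole analysis distribution; method 2 removes that term explicitly.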
Absolute bias error of Method 2
[Figure: Kalman Filter absolute bias error for 100 cases.]
Red points: the equal weight bias error that corresponds to the average of the KF bias error from 1001 to 10000, based on the correlation (~90).
Comparison of Methods 1 & 2
[Figure: equal weight method; sample size required for the bias error to be less than a specific percentage of the real bias, for method 1 (m1) and method 2 (m2).]
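A comparison of this kind can be approximated with a small Monte Carlo experiment. The sketch below is only an illustration under simplifying assumptions: the analysis is white noise rather than the ARMA series, alpha is treated as known, both methods use equal weights, and the function name `mean_abs_bias_error` is hypothetical.

```python
import math
import random


def mean_abs_bias_error(n, alpha, bias, cases=100, seed=0):
    """Mean absolute bias error of methods 1 and 2 over `cases` samples of size n."""
    rng = random.Random(seed)
    beta = math.sqrt(1.0 - alpha ** 2)
    err1 = err2 = 0.0
    for _ in range(cases):
        sum1 = sum2 = 0.0
        for _ in range(n):
            a = rng.gauss(0.0, 1.0)  # stand-in analysis (white noise, not ARMA)
            f = alpha * a + beta * rng.gauss(0.0, 1.0) + bias
            sum1 += f - a            # method 1 error term
            sum2 += f - alpha * a    # method 2 error term
        err1 += abs(sum1 / n - bias)
        err2 += abs(sum2 / n - bias)
    return err1 / cases, err2 / cases


for n in (10, 50, 250):
    m1, m2 = mean_abs_bias_error(n, alpha=0.3, bias=0.1)
    print(n, round(m1, 3), round(m2, 3))
```

Rerunning with a different `bias` and the same seed should leave the errors essentially unchanged, since the constant bias cancels when the true value is subtracted.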
Effects on ensemble based probabilistic forecast
Continuous Ranked Probability Score (CRPS) test
• Assumption
  • Uncertainty is perfectly known (no bias in the 2nd moment)
• Forecast
  • Bias increases with lead time (decreases with correlation)
• Modified bias
  • Bias is standardized by the climate standard deviation
[Figure: BIAS (Kalman Filter, method 1); bias (0 to 0.25) against correlation (0.95 down to 0.00) and lead time (days 1 to 16); marked correlations: 0.95, 0.75, 0.20.]
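Effects on ensemble based probabilistic forecast
Continuous Ranked Probability Score (CRPS) test
• Ensemble distribution = forecast uncertainty
• PDF of forecast; CDF of forecast $F_f$; CDF of analysis $F_a$
• CRPS: $\mathrm{CRPS} = \int_{-\infty}^{\infty} \left[ F_f(x) - F_a(x) \right]^2 dx$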
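For a single verifying analysis value, the analysis CDF is a step function, and for a Gaussian forecast distribution the CRPS integral has a well-known closed form. The sketch below assumes the forecast PDF is normal with mean `mu` and standard deviation `sigma`; that Gaussian assumption and the function name `crps_gaussian` are illustrative choices, not taken from the slides.

```python
import math


def crps_gaussian(mu, sigma, y):
    """CRPS of a N(mu, sigma^2) forecast verified against the value y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


# A biased forecast mean scores worse (higher CRPS) than an unbiased one.
print(crps_gaussian(0.0, 1.0, 0.0))
print(crps_gaussian(0.5, 1.0, 0.0))
```

Averaging this score over many forecast and analysis pairs, before and after subtracting the estimated bias, gives one way to measure the effect of bias correction on the probabilistic forecast.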
Effects on ensemble based probabilistic forecast
Continuous Ranked Probability Score (CRPS) test
[Figure: CRPS for the raw forecast and for bias correction with warming periods of 100 and 5000; panels for synthetic forecasts with error levels similar to, and larger than, those in the real forecast.]
Summary
• Working with synthetic analysis/forecast data sets is useful for investigating the performance of various statistical bias-correction methods (quick assessment and comparison).
• A Bayesian-type bias estimation method may offer additional benefits in terms of bias error.
• Bias error is independent of the bias level, but the reduction in probabilistic forecast error is greater when the bias is larger.
• Realistic ensemble forecasts and more complex bias estimation algorithms still need to be considered (comparing frequentist and Bayesian approaches).