An analysis of different bias-correction algorithms in a synthetic environment

Joo-Hyung Son (1), Zoltan Toth (2), and Dingchen Hou (3)
1) Numerical Weather Prediction Division, KMA
2) Environmental Modeling Center, NCEP/NWS/NOAA
3) EMC/NCEP/NWS/NOAA and SAIC

OUTLINE
• Introduction
• Generation of a Synthetic Data Set
• Effects of Sample Size on the Bias Estimation
• Bias Estimation Based on Bayesian Approach
• Effect of Bias Correction on Probabilistic Forecast
• Summary

Introduction
Background
• NWP products are subject to both systematic and random errors.
• Estimating the bias from historical data and then subtracting it from the forecast is an effective way of reducing systematic errors.
Existing Questions
• How should the bias be estimated? Various bias-correction methods exist, e.g. the equal weight method and a Kalman Filter type algorithm (Cui et al., 2005).
• How long a historical data set is required for reasonably accurate bias estimation? No systematic investigation has been made.
This Study: A Simplified Approach
• Single forecast of a single variable at a single grid point.
• Simulated forecast (synthetic data), with no dynamic evolution.
• Simulated forecasts of various skill (lead time) and bias levels.
• The simulation can be extended to represent more realistic forecasts.

Generation of synthetic data - analysis
• Assumptions
  • Remove annual cycle
  • Standardized: (daily climate data - climate mean) / climate standard deviation
  • Stationary process
• Analysis
  • General ARMA(p,q) model, defined by the order of the autoregressive part (p), the order of the moving average part (q), white noise, the autocorrelation parameters, and the moving average parameters
• Estimate parameters based on
  • 40 years of climate data at 37.5N, 117.5W
  • 2m temperature
  • p = 20, q = 1
[Figure: autocorrelation as a function of lag (0 to 400), values from -0.2 to 1.2]

Generation of synthetic data - analysis
[Figure: time series of analysis, climate generated by ARMA(20,1); x-axis 0 to 730, y-axis -3 to 4]
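The ARMA(p,q) generation step can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `simulate_arma`, the burn-in length, and the coefficients in the usage line are assumptions chosen for readability, since the fitted ARMA(20,1) parameters are not given in the slides. The output is rescaled so the sample has zero mean and unit variance, matching the standardized N(0,1) analysis described above.

```python
import numpy as np

def simulate_arma(phi, theta, n, burn_in=500, seed=0):
    """Simulate a zero-mean ARMA(p, q) series and rescale it to unit variance.

    phi   : autoregressive coefficients (length p), hypothetical values
    theta : moving average coefficients (length q), hypothetical values
    """
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    total = n + burn_in
    eps = rng.standard_normal(total + q)  # white noise, q extra leading values
    x = np.zeros(total + p)               # p leading zeros act as the initial state
    for t in range(total):
        ar = sum(phi[i] * x[p + t - 1 - i] for i in range(p))
        ma = sum(theta[j] * eps[q + t - 1 - j] for j in range(q))
        x[p + t] = ar + eps[q + t] + ma
    series = x[p + burn_in:]              # drop the burn-in so the start-up transient is gone
    return (series - series.mean()) / series.std()

# Two years of daily synthetic "analysis" values with illustrative ARMA(1,1) coefficients.
analysis = simulate_arma(phi=[0.8], theta=[0.3], n=730)
```

Passing 20 autoregressive coefficients and one moving average coefficient would give the ARMA(20,1) form named on the slide.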

Generation of synthetic data - forecast
Quantities in the model:
• analysis: generated by the ARMA model, N(0,1)
• forecast: N(0,1)
• forecast error: N(0,1)
• bias: constant
• correlation between forecast and analysis
Requirements:
• The time series of analysis and forecast are similar stationary stochastic processes.
• The forecast is correlated with the analysis, with a correlation coefficient reflecting the skill of the forecast, ranging from perfect correlation to a non-correlated forecast (simulating lead times of 1 to 16 days).
• The forecast is subject to random error (independent of the analysis) of varying magnitude (1 = no skill, 0 = no noise).
• The forecast is statistically the same as the analysis (N(0,1)). This is satisfied by setting the standard deviation of the random error to sqrt(1 - correlation**2).
• A constant (time-independent) bias is added to the forecast.
Model: forecast = correlation * analysis + sqrt(1 - correlation**2) * (forecast error) + bias
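The requirements above can be checked numerically with a short sketch. The function name `synthetic_forecast` and the values `corr=0.75` and `bias=0.2` are illustrative assumptions, and the analysis here is plain white noise standing in for the ARMA output, which is enough to verify the variance and correlation properties.

```python
import numpy as np

def synthetic_forecast(analysis, corr, bias, seed=1):
    """Build a forecast correlated with the analysis, then add a constant bias.

    The random error is scaled by sqrt(1 - corr**2) so that the unbiased part
    of the forecast keeps unit variance, like the analysis.
    """
    rng = np.random.default_rng(seed)
    noise_std = np.sqrt(1.0 - corr**2)
    error = rng.standard_normal(len(analysis))  # independent of the analysis
    return corr * analysis + noise_std * error + bias

# Stand-in analysis: white noise N(0,1) instead of ARMA output (an assumption).
analysis = np.random.default_rng(0).standard_normal(20000)
forecast = synthetic_forecast(analysis, corr=0.75, bias=0.2)

sample_corr = np.corrcoef(analysis, forecast)[0, 1]  # close to the requested corr
sample_std = forecast.std()                          # close to 1
sample_mean = forecast.mean()                        # close to the added bias
```

With `corr=1` the random error vanishes and the forecast equals the analysis plus the bias; with `corr=0` the forecast is pure noise plus the bias.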

Generation of synthetic data - forecast

Testing Synthetic forecast model against real forecast data
Comparison between Real data & Synthetic data
Purple line:
• "Prediction" of how the forecast would look.
• Normal forecast distribution centered on alpha times a, where
  • alpha: correlation estimated over the whole observation period
  • a: mean of all analysis values falling between 3 and 4
  • standard deviation: that of the forecast when the corresponding analysis is between 3 and 4
Histogram:
• Forecast after removing bias

Testing Synthetic forecast model against real forecast data
[Figure: four panels of forecast distributions, with panel labels including day 3, day 10, and day 16; x-axis -20 to 20, y-axis 0 to 0.5]

Bias-correction algorithms
• Traditional method (method 1)
  • Bias ~ weighted average of the differences between forecast and analysis
• Bias Estimation
  • Equal weight
  • Kalman Filter (with a Kalman Filter weight)
• Bias Correction
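The two estimation options can be sketched as follows. The equal weight estimator is a plain mean of forecast minus analysis. For the Kalman Filter type estimator, the recursive "decaying average" update used here is an assumption about the form of the algorithm, and the weight of 0.02 and the zero initial estimate are illustrative values not taken from the slides.

```python
import numpy as np

def equal_weight_bias(forecast, analysis):
    """Equal weight estimate: mean of forecast minus analysis over the sample."""
    return float(np.mean(np.asarray(forecast) - np.asarray(analysis)))

def decaying_average_bias(forecast, analysis, weight=0.02, initial=0.0):
    """Recursive estimate b_t = (1 - w) * b_{t-1} + w * (f_t - a_t).

    Returns the estimate after each time step, so its convergence can be inspected.
    """
    estimate = initial
    history = []
    for f, a in zip(forecast, analysis):
        estimate = (1.0 - weight) * estimate + weight * (f - a)
        history.append(estimate)
    return np.array(history)

def correct(forecast, bias_estimate):
    """Bias correction: subtract the estimated bias from the forecast."""
    return np.asarray(forecast) - bias_estimate

# Toy check with a noise-free forecast that is exactly 0.3 too warm.
analysis = np.linspace(-1.0, 1.0, 1000)
forecast = analysis + 0.3
ew = equal_weight_bias(forecast, analysis)
kf = decaying_average_bias(forecast, analysis)
```

A larger weight makes the recursive estimate respond faster but leaves it noisier; the equal weight estimate uses every sample equally and so depends directly on the sample size.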

Absolute bias error of Method 1
[Figure: Kalman filter absolute bias error for 100 cases]
Red points: the point at which the equal weight bias error matches the average of the KF bias error from 1001 to 10000, based on the correlation (~120)

Bias-correction algorithms
• Traditional method (method 1)
  • Bias ~ weighted average of the differences between forecast and analysis
  • Needs a longer time series, so that the whole distribution is sampled
• New method (method 2)
  • Based on Bayesian Approach
  • Given the forecast model, for a particular value: [expression missing in the source]
  • Bias ~ weighted average of [expression missing in the source]
  • Note: the whole distribution need not be sampled, so a shorter time series can be used
• Bias Estimation
  • Equal weight
  • Kalman Filter (with a Kalman Filter weight)
• Bias correction
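One possible reading of the contrast between the two methods is sketched below. This is an assumption, not something the slides confirm: method 2 is taken to average forecast minus (correlation times analysis), which follows from a forecast model of the form forecast = correlation * analysis + noise + bias, so that the sample mean of the analysis drops out of the estimate. The function names, the true correlation being known exactly, and the white-noise analysis are all simplifications made for the illustration.

```python
import numpy as np

def bias_method1(forecast, analysis):
    """Traditional estimate: mean of forecast minus analysis."""
    return float(np.mean(forecast - analysis))

def bias_method2(forecast, analysis, corr):
    """Assumed model-based estimate: mean of forecast minus corr * analysis."""
    return float(np.mean(forecast - corr * analysis))

rng = np.random.default_rng(42)
corr, true_bias, n = 0.2, 0.2, 100     # illustrative values
errors1, errors2 = [], []
for _ in range(500):                   # repeat over many short samples
    analysis = rng.standard_normal(n)  # stand-in for the ARMA analysis
    noise = np.sqrt(1.0 - corr**2) * rng.standard_normal(n)
    forecast = corr * analysis + noise + true_bias
    errors1.append(abs(bias_method1(forecast, analysis) - true_bias))
    errors2.append(abs(bias_method2(forecast, analysis, corr) - true_bias))

mean_err1 = float(np.mean(errors1))    # average absolute bias error, method 1
mean_err2 = float(np.mean(errors2))    # average absolute bias error, method 2
```

Under these assumptions the method 1 estimate carries an extra term, (corr - 1) times the sample mean of the analysis, which only averages out over a long sample. The method 2 estimate has no such term, which is consistent with the note about shorter time series.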

Absolute bias error of Method 2
[Figure: Kalman Filter absolute bias error of 100 cases]
Red points: the point at which the equal weight bias error matches the average of the KF bias error from 1001 to 10000, based on the correlation (~90)

Comparison of Methods 1 & 2
• Equal weight method
• Sample size required for the error to be less than a specific percentage of the real bias
[Figure: panels comparing m1 (method 1) and m2 (method 2)]

Effects on ensemble based probabilistic forecast
Continuous Ranked Probability Score (CRPS) test
• Assumption
  • Uncertainty is perfectly known (no bias in the second moment)
• Forecast
  • Bias increases with lead time (decreases with correlation)
• Modified bias
  • Bias is standardized by the climate standard deviation
[Figure: BIAS (Kalman Filter, method 1); bias (0 to 0.25) plotted against correlation (0.95 down to 0.00) and lead time (days 1 to 16), with labels 0.95, 0.75, and 0.20]

Effects on ensemble based probabilistic forecast
Continuous Ranked Probability Score (CRPS) test
• Ensemble distribution = forecast uncertainty
• PDF of forecast
• CDF of analysis
• CRPS
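The CRPS formula itself did not survive in the transcript. The standard definition integrates the squared difference between the forecast CDF and the step-function CDF of the analysis. For a Gaussian forecast distribution this integral has a well-known closed form, sketched below. Treating the ensemble distribution as Gaussian with known spread is an assumption made here to match the N(0,1) framework of the synthetic data; the function name and the numbers in the example are illustrative.

```python
import math

def crps_gaussian(mu, sigma, obs):
    """CRPS of a Gaussian forecast N(mu, sigma**2) against a single verifying value.

    Closed form: sigma * (z * (2 * Phi(z) - 1) + 2 * phi(z) - 1 / sqrt(pi)),
    where z = (obs - mu) / sigma, phi is the standard normal PDF and Phi its CDF.
    """
    z = (obs - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

# A forecast centered on the verifying analysis versus one shifted by a bias of 0.5.
crps_unbiased = crps_gaussian(mu=0.0, sigma=1.0, obs=0.0)
crps_biased = crps_gaussian(mu=0.5, sigma=1.0, obs=0.0)
```

Lower values are better, so averaging this score over many cases for the raw and the bias-corrected forecast gives one way to measure the effect of the correction on the probabilistic forecast.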

Effects on ensemble based probabilistic forecast
Continuous Ranked Probability Score (CRPS) test
• For synthetic forecast with error levels larger than that in real forecast
• For synthetic forecast with error levels similar to that in real forecast
[Figure legend: Raw fcst; 100 warming period; 5000 warming period]

Summary
• Working with synthetic analysis/forecast data sets is useful for investigating the performance of various statistical bias correction methods (quick assessment and comparison).
• A Bayesian type bias estimation method may offer additional benefits in terms of bias error.
• Bias error is independent of the bias level, but the reduction in probabilistic forecast error is greater when the bias is larger.
• Realistic ensemble forecasts and more complex bias estimation algorithms need to be considered (comparing frequentist and Bayesian approaches).
