Incorporating Multi-model Ensemble Techniques into a Probabilistic Hydrologic Forecasting System: Relative Merits of Ensemble vs. Bias-Corrected Models

Mergia Y. Sonessa, Theodore J. Bohn, and Dennis P. Lettenmaier
Department of Civil and Environmental Engineering, Box 352700, University of Washington, Seattle, WA 98195
American Geophysical Union Fall Meeting, December 2008

ABSTRACT

Multi-model ensemble techniques have been shown to reduce bias and to aid in quantifying the effects of model uncertainty in hydrologic modeling. These techniques are only beginning to be applied in operational hydrologic forecast systems. Most of the analyses performed to date (e.g., Ajami et al., 2006) have focused on daily data over short durations, with constant ensemble model weights and constant bias correction parameters (if bias correction is applied at all). However, typical hydrologic forecasts can involve monthly flow volumes, lead times of several months, various probabilistic forecast techniques, and monthly bias corrections. Under these conditions the question arises as to whether a multi-model ensemble is as effective at improving forecast skill as under the conditions that have been investigated so far.

To investigate the performance of a multi-model ensemble in the context of probabilistic hydrologic forecasting, we extended the University of Washington's West-wide Seasonal Hydrologic Forecasting System to use an ensemble of three models: the Variable Infiltration Capacity (VIC) model version 4.0.6, the NCEP NOAH model version 2.7.1, and the NWS grid-based Sacramento/Snow-17 model (SAC). The objective of this presentation is to assess the performance of the ensemble of the three models against the performance of the models individually, with and without various forms of bias correction, and in both retrospective and probabilistic forecast modes. Three forecast points within the West-wide forecast system domain were used: the Feather River at Oroville, CA; the Salmon River at Whitehorse, ID; and the Colorado River at Grand Junction, CO. The forcing and observed streamflow data span 1951-2005 for the Feather and Salmon Rivers and 1951-2003 for the Colorado.

The models were first run for the retrospective period, then bias-corrected, and model weights were then determined using unconstrained multiple linear regression as a best-case scenario for the ensemble. We assessed the performance of the ensemble relative to the individual models in terms of correlation with observed flows (R), root mean square error (RMSE), and coefficient of prediction (Cp). To test forecast skill, we performed Ensemble Streamflow Prediction (ESP) forecasts for each year of the retrospective period, using forcings from all other years, for the individual models and for the multi-model ensemble. To form the ensemble for the ESP runs, we used the model weights from the retrospective simulations.

In both the retrospective and probabilistic forecast cases, we found that a monthly bias correction applied to an individual model generally makes that model competitive with the multi-model ensemble. For ensemble methods other than unconstrained multiple least squares regression, ensemble performance tends to be worse, with individual bias-corrected models sometimes outperforming the ensemble. It should be noted that the entire time series was used for determining bias correction parameters, ensemble model weights, and assessing forecasts, i.e., the calibration and validation periods were the same. The relative benefits of monthly bias correction vs. multi-model ensemble in a validation period outside the training period remain to be investigated.
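For reference, a minimal sketch (Python/NumPy; our illustration, not the poster's code) of the three skill scores used throughout, with the coefficient of prediction Cp computed as the Nash-Sutcliffe efficiency, as the poster defines it:

```python
import numpy as np

def skill_scores(sim, obs):
    """R, RMSE, and Cp for simulated vs. observed (monthly) flows.

    sim, obs: 1-D arrays of equal length.
    Cp (coefficient of prediction) is the Nash-Sutcliffe efficiency.
    """
    r = np.corrcoef(sim, obs)[0, 1]               # correlation with observations
    rmse = np.sqrt(np.mean((sim - obs) ** 2))     # root mean square error
    cp = 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
    return r, rmse, cp
```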
4 Retrospective Run – Whole-Timeseries Metrics

Scenarios Considered for Comparing the Multi-model Ensemble (ENS) with Individual Models

We first examine performance of the models and ensemble over the entire timeseries, as is typical of other studies such as Ajami et al. (2006). Previous studies have employed model weights and bias correction parameters (if any) that were constant in time. Here we investigate the performance of both constant and monthly-varying model weights and bias correction parameters, in the following four cases:

• Case 1: Raw model output; constant ensemble model weights
• Case 2: Constant bias correction; constant ensemble model weights
• Case 3: Raw model output; monthly ensemble model weights
• Case 4: Monthly bias correction; monthly ensemble model weights

In all cases, the bias correction is derived from quantile mapping between simulated and observed flows, and the ensemble is formed from an unconstrained multiple linear regression between the observed flows and the simulated flows (see the sketches after this panel).

[Figs. 4.1-4.2: RMSE of Models and Ensemble vs. Observations, and Correlation of Models and Ensemble with Observations, raw and bias corrected, for cases 1-4 in the Colorado, Feather, and Salmon basins.]

Assessment: Bias Correction vs. Ensemble

Constant parameters:
• 1. No bias correction: The multi-model ensemble showed a small improvement over the best individual model in terms of RMSE and correlation with observed flow.
• 2. Bias correction: The constant bias correction gave a moderate reduction in RMSE for the individual models and for the ensemble, but little improvement in correlation with observed flow. Ensemble performance was only slightly better than that of the best model.

Monthly parameters:
• 3. No bias correction: Allowing model weights to vary monthly improved the ensemble performance substantially over the best individual model, in terms of both RMSE and correlation with observed flow, especially in the Salmon River.
• 4. Bias correction: Applying a monthly bias correction to the individual models resulted in dramatic reductions in the models' RMSE and substantial improvements in their correlations with observed flows. Despite these improvements in individual model performance, the ensemble average improved only slightly; the improvements made even the worst individual model competitive with the ensemble.
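As referenced above, a sketch of how the unconstrained multiple linear regression weights could be derived, including the monthly-varying weights of cases 3 and 4. This is our illustration, not the authors' code; the intercept term is an assumption, since the poster does not say whether one was included:

```python
import numpy as np

def ensemble_weights(sims, obs):
    """Unconstrained multiple linear regression of observed flow on the
    individual models' simulated flows (intercept included by assumption).

    sims: (n_times, n_models) simulated flows; obs: (n_times,) observed flows.
    Weights are not constrained to be positive or to sum to 1.
    """
    X = np.column_stack([np.ones(len(obs)), sims])    # design matrix
    coeffs, *_ = np.linalg.lstsq(X, obs, rcond=None)  # least-squares fit
    return coeffs                                     # [intercept, w_1, ..., w_n]

def monthly_ensemble(sims, obs, months):
    """Cases 3 and 4: a separate set of weights for each calendar month."""
    ens = np.empty(len(obs))
    for m in range(1, 13):
        idx = months == m                     # months: array of 1..12 labels
        c = ensemble_weights(sims[idx], obs[idx])
        ens[idx] = c[0] + sims[idx] @ c[1:]   # weighted multi-model flow
    return ens
```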
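And a sketch of the quantile-mapping bias correction, here implemented as empirical quantile mapping between training-period simulated and observed flows. The poster does not give implementation details, so this is one common realization (panel 5 describes a parametric 3-parameter lognormal variant, sketched later); for the monthly correction, the mapping would be fit and applied separately for each calendar month:

```python
import numpy as np

def quantile_map(sim, sim_train, obs_train):
    """Empirical quantile mapping: replace each simulated flow with the
    observed flow having the same non-exceedance probability in training.
    """
    sim_sorted = np.sort(sim_train)
    # empirical CDF value of each simulated flow under the training simulations
    p = np.searchsorted(sim_sorted, sim) / float(len(sim_sorted))
    p = np.clip(p, 0.0, 1.0)
    # invert the observed empirical CDF at the same probabilities
    return np.quantile(obs_train, p)
```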
6 ESP Run – Monthly Forecast R and Cp for All Start Months

Here we investigate bias correction and multi-model ensembles in probabilistic forecasts, using the ensemble streamflow prediction (ESP) method. For each year of the same 52-year retrospective period (1951-2002) analyzed in sections 4 and 5, we created an ensemble of forcings (the ESP ensemble) composed of the forcings from the other 51 years. Models were started from each of the 12 possible starting months within the forecast year and run for 1 year. Thus, each month of the 52-year period was forecast with lead times ranging from 1 to 12 months.

For each combination of forecast month and lead time, the forecast skill of the ESP ensemble mean was assessed across all 52 years in terms of the coefficient of prediction (Cp, also called the Nash-Sutcliffe efficiency) and correlation with observations (R). This was done for the individual models' ESP means and for the multi-model ensemble of the individual model ESP means. The bias correction and multi-model ensemble parameters were the same ones derived in the retrospective simulations discussed in section 5. We show results for the Colorado basin; the other basins exhibited similar behavior.

[Fig. 6: Cp and R of the ESP ensemble mean by forecast start month and lead time, Colorado basin, raw and bias corrected; circles 1-4 mark the months discussed below.]

Assessment: Bias Correction vs. Ensemble

No bias correction, monthly ensemble weights:
• The multi-model ensemble generally performed better than the individual models in terms of Cp, which resulted in turn from a reduction of RMSE.
• The multi-model ensemble generally did not perform any better than the best model in terms of correlation with observations (R).

Monthly bias correction and ensemble weights:
• Applying a monthly bias correction to the individual models yielded substantial improvements in Cp, due to reduction of RMSE. Little change was seen in R.
• Individual models with a monthly bias correction outperformed both the uncorrected models and the ensemble of uncorrected models.
• Forming a multi-model ensemble from bias-corrected models yielded little or no improvement in Cp or R for most forecast month/lead time combinations.
• Months for which error correlation among the models was low (circles 1-3) corresponded to large improvements of the multi-model ensemble over the individual models. An exception was forecasts in the summer months (circle 4).

Models at a Glance

The models in our ensemble all share the same basic structure, consisting of grid cells containing a multi-layer soil column overlain by one or more "tiles" of different land covers, including vegetation with and without canopy and bare soil. Water and/or energy fluxes are tracked vertically throughout the column from the atmosphere through the land cover to the bottom soil layer. The figure below illustrates these features as implemented in the VIC (Variable Infiltration Capacity) macroscale land surface model (Liang et al., 1994).

[Schematic: grid cell structure of the VIC model (Liang et al., 1994).]

• VIC: physically based horizontal soil layers; energy balance; 2-layer snow pack; elevation bands
• NOAH: physically based horizontal soil layers; energy balance; single-layer snow pack; no elevation bands
• Sacramento/SNOW-17 (SAC): conceptually based soil storages; no energy balance; elevation bands; degree-day snow melt scheme; no explicit vegetation; potential evapotranspiration computed by NOAH is an input

4.4 Comparison of Hydrographs

Fig. 4.4 shows a sample of the hydrographs for the cases of monthly-varying model weights, with (b) and without (a) monthly bias correction. The monthly bias correction removes differences in the models' timing of peak flows (e.g., circle "1") arising from differences in the models' snow pack formulations. In addition, it removes systematic bias shared by all models (e.g., circle "2").

[Fig. 4.4: Sample monthly hydrographs (flow, cms), (a) raw and (b) monthly bias corrected, for the individual models, the ensemble, and observations.]

4.3 Error Correlation and Covariance, and Model Cross-Correlation

Ensemble Performance, Bias, and Model Collinearity

These results can be explained by examining model collinearity and its sensitivity to bias correction. A multi-model ensemble benefits from having some degree of model independence: the more independent the model errors are, the more likely they are to cancel out in the ensemble (see the sketch after this panel).
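A sketch of the collinearity diagnostics just described, as we read Fig. 4.3: norms of the model error cross-correlation, scaled error covariance, and model cross-correlation matrices. The choice of the Frobenius norm is our assumption; the poster does not specify which matrix norm was used:

```python
import numpy as np

def collinearity_norms(sims, obs):
    """Norms of the three matrices plotted in Fig. 4.3 (Frobenius norm assumed).

    sims: (n_times, n_models) simulated flows; obs: (n_times,) observed flows.
    """
    err = sims - obs[:, None]                        # each model's errors
    err_corr = np.corrcoef(err, rowvar=False)        # error cross-correlation
    err_cov = np.cov(err, rowvar=False) / obs.var()  # error covariance, scaled
                                                     # by observed flow variance
    model_corr = np.corrcoef(sims, rowvar=False)     # model cross-correlation
    f = np.linalg.norm                               # Frobenius norm by default
    return f(err_corr), f(err_cov), f(model_corr)
```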
Fig. 4.3 shows the norms of the matrices of model error cross-correlation, model error covariance (scaled by the observed flow variance), and model cross-correlation, for all four cases outlined above. Applying a constant bias correction has little effect on the correlation and covariance of model errors or on the overall cross-correlation of the models. However, applying the monthly bias correction tends to increase error and model correlations, making the models more collinear. This leaves less random error for the ensemble to operate on.

[Fig. 4.3: Matrix norms for cases 1-4, constant and monthly parameters, in the Colorado, Feather, and Salmon basins.]

2 Basins used in this Study

[Map of the three forecast points: Feather River at Oroville, CA; Salmon River at Whitehorse, ID; Colorado River at Grand Junction, CO.]

3 Model Forcings and Parameters

Forcing:
• 0.125-degree LDAS forcings (Maurer et al., 2002)
• Disaggregated to a 3-hourly time step

Land surface parameters:
• VIC: LDAS parameters (Maurer et al., 2002), followed by iterative calibration of runoff and baseflow parameters by the UA-SCEM method
• NOAH: NLDAS parameters (Mitchell et al., 2004), but with maximum snow albedo replaced by 0.85 to better match snow melt signatures
• SAC: NLDAS parameters (Mitchell et al., 2004), followed by iterative calibration of LZTWM, LZFPM, PFREE, LZSK, ASIMP, and UZTWM by the UA-SCEM method

5 Retrospective Run – Monthly Metrics

Ensemble Performance vs. Collinearity

We can gain further insight by examining the monthly statistics of the models and ensemble with and without monthly bias correction. Fig. 5 shows mean flow, correlation with observed flow (R), RMSE, and the norms of the matrices of error correlation, error covariance, and model cross-correlation with (b) and without (a) monthly bias correction for the Colorado basin, as well as the difference between the two cases (c). Bias correction in this case was obtained by fitting 3-parameter lognormal distributions to the observed and simulated flows, transforming them all to log space, adjusting the means and variances of the simulations to match the observed flows, and transforming them all back to flow space (see the sketch after this panel).

[Fig. 5: Monthly statistics for the Colorado River at Grand Junction; columns: (a) raw, (b) bias corrected, (c) bias corrected minus raw; rows: mean flow, R, RMSE, error correlation, error covariance, model cross-correlation; lines: VIC, SAC, NOAH, ENS, OBS.]

Ensemble performance varies from month to month, sometimes being worse than that of the best individual model. The months for which ensemble performance is better than the best models are those having relatively low error correlation and model cross-correlation. From the plots in the difference column, we can see that the monthly bias correction tends to increase the cross-correlation of model errors and of model flows, particularly in the winter months in the Colorado basin. Results are similar in the other two basins.
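A sketch of the 3-parameter lognormal bias correction just described, under stated assumptions: scipy's maximum-likelihood lognorm.fit supplies the shape (log-space standard deviation), threshold, and scale (exponential of the log-space mean) parameters, and the correction would be fit separately per calendar month. The poster's actual fitting procedure may differ:

```python
import numpy as np
from scipy import stats

def lognormal_bias_correct(sim, obs):
    """3-parameter lognormal bias correction for one calendar month.

    Fit lognormal distributions to simulated and observed flows, adjust the
    simulations' log-space mean and variance to match the observations,
    and transform back to flow space.
    """
    s_sim, loc_sim, scale_sim = stats.lognorm.fit(sim)  # shape, threshold, scale
    s_obs, loc_obs, scale_obs = stats.lognorm.fit(obs)
    # to log space: y = ln(x - loc); log-space mean = ln(scale), std = shape
    y = np.log(np.maximum(sim - loc_sim, 1e-6))
    # standardize under the simulated fit, rescale under the observed fit
    y_adj = (y - np.log(scale_sim)) / s_sim * s_obs + np.log(scale_obs)
    return np.exp(y_adj) + loc_obs                      # back to flow space
```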
7 CONCLUDING REMARKS

• Performance improvement due to a monthly bias correction can be much larger than that due to a multi-model ensemble when dealing with monthly data from hydrologic models.
• Forming a multi-model ensemble from models to which a monthly bias correction has been applied yields little improvement over the individual models.
• The reason is that a monthly bias correction, performed on monthly data, tends to increase model collinearity.
• The months in which the multi-model ensemble yielded the largest improvements over individual models were those for which model errors were least correlated.
• These relationships held for both retrospective simulations and probabilistic forecasts.

References

• Ajami, N. K., Q. Duan, X. Gao, and S. Sorooshian (2006), Multi-model combination techniques for hydrological forecasting: Application to Distributed Model Intercomparison Project results, J. Hydrometeorol., 7(4), 755-768.
• Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges (1994), A simple hydrologically based model of land surface water and energy fluxes for general circulation models, J. Geophys. Res., 99(D7), 14,415-14,428.
• Maurer, E. P., A. W. Wood, J. C. Adam, D. P. Lettenmaier, and B. Nijssen (2002), A long-term hydrologically based dataset of land surface fluxes and states for the conterminous United States, J. Climate, 15, 3237-3251.
• Mitchell, K. E., et al. (2004), The multi-institution North American Land Data Assimilation System (NLDAS): Utilizing multiple GCIP products and partners in a continental distributed hydrological modeling system, J. Geophys. Res., 109, doi:10.1029/2003JD003823.