NOAA Earth System Research Laboratory
Ensemble weather prediction: verification, numerical issues, and use
Tom Hamill, NOAA Earth System Research Lab, Boulder, Colorado, USA
tom.hamill@noaa.gov
Outline • (1) Verification methods for evaluating ensemble (and probabilistic) forecasts. • (2) Numerical issues and principles: what more is there to running an ensemble than slapping various forecasts together? • (3) Examples of how ensembles can and will be used operationally to improve decisions.
(1) Verification: what constitutes a “good” ensemble forecast? Here, the observation is outside the range of the ensemble, which was sampled from the pdf shown. Is this, by itself, a sign of a poor ensemble forecast?
[Figure: four example forecast cases; the observation verifies at ranks 1, 14, 5, and 3 of 21 in the respective ensembles.]
One way of evaluating ensembles: “rank histograms” or “Talagrand diagrams.” We need many samples from many situations to evaluate the characteristics of the ensemble. • A flat histogram happens when the observation is statistically indistinguishable from any other member of the ensemble; the ensemble is “reliable.” • A histogram skewed toward the low ranks happens when the observation is too commonly lower than the ensemble members. • A U-shaped histogram happens when there are either some low and some high biases, or when the ensemble doesn’t spread out enough. Ref: Hamill, MWR, March 2001
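As a minimal sketch (not from the talk), the rank tally behind such a histogram takes only a few lines of NumPy; array names and shapes here are illustrative:

```python
import numpy as np

def rank_histogram(ensemble, obs):
    """Tally the rank of each observation within its ensemble.

    ensemble: (n_cases, n_members) array of forecasts
    obs:      (n_cases,) array of verifying observations
    Returns counts over the n_members + 1 possible ranks.
    """
    n_cases, n_members = ensemble.shape
    # Rank = number of members below the observation (ties are broken
    # randomly in practice; ignored here for simplicity).
    ranks = (ensemble < obs[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=n_members + 1)

# A reliable ensemble: obs drawn from the same distribution as the
# members, so all 21 ranks should be roughly equally populated.
rng = np.random.default_rng(0)
ens = rng.normal(size=(10000, 20))
obs = rng.normal(size=10000)
print(rank_histogram(ens, obs))
```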
Rank histograms of Z500, T850, and T2m (from the 1998 reforecast version of the NCEP GFS). Solid lines indicate ranks after bias correction. The rank histograms are particularly U-shaped for T2m, which is probably the most relevant of the three fields plotted here.
Reliability diagrams. The curve tells you what the observed frequency was each time you forecast a given probability; it ought to lie along the y = x line. Here it shows that the ensemble-forecast system over-forecasts the probability of light rain. Ref: Wilks text, Statistical Methods in the Atmospheric Sciences
Reliability diagrams typically provide extra information: an inset histogram tells you how frequently each probability was issued. A perfectly sharp forecast system populates only the 0% and 100% bins. Ref: Wilks text, Statistical Methods in the Atmospheric Sciences
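A hedged sketch of how the reliability curve and the inset usage histogram might be computed from forecast probabilities and binary outcomes (bin edges and function names are assumptions, not from the slides):

```python
import numpy as np

def reliability_curve(prob_fcst, obs_event, bins=np.linspace(0.0, 1.0, 11)):
    """For each probability bin: observed relative frequency of the
    event, and how often forecasts fell in that bin."""
    idx = np.clip(np.digitize(prob_fcst, bins) - 1, 0, len(bins) - 2)
    freq_obs, n_used = [], []
    for b in range(len(bins) - 1):
        mask = idx == b
        n_used.append(int(mask.sum()))
        freq_obs.append(obs_event[mask].mean() if mask.any() else np.nan)
    return np.array(freq_obs), np.array(n_used)
```

Plotting freq_obs against the bin centers, with the y = x diagonal for reference, reproduces the diagram described above.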
Reliability diagrams. BSS = Brier Skill Score: BSS = 1 − BS(forecast) / BS(climatology), where BS(•) is the Brier Score, which you can think of as the squared error of a probabilistic forecast. Perfect: BSS = 1.0; climatology: BSS = 0.0. A perfect BSS occurs only when you forecast probabilities of 0 and 1 (and are perfectly reliable). Ref: Wilks text, Statistical Methods in the Atmospheric Sciences
Brier score. Define an event, e.g., obs. precip > 2.5 mm. Let pᵢ be the forecast probability for the ith forecast case, and let oᵢ be the observed probability (1 or 0). Then BS = (1/N) Σᵢ (pᵢ − oᵢ)². (So the Brier score is the averaged squared error of the probabilistic forecast.)
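To make the formula concrete, here is a minimal sketch of the Brier score, plus the skill score from the previous slide computed against the sample climatology (function names are illustrative):

```python
import numpy as np

def brier_score(p, o):
    """Averaged squared error of probability forecasts p against
    binary outcomes o (1 if the event occurred, else 0)."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

def brier_skill_score(p, o):
    """BSS = 1 - BS(forecast) / BS(climatology), with climatology
    taken as the sample base rate of the event."""
    o = np.asarray(o, float)
    clim = np.full_like(o, o.mean())
    return 1.0 - brier_score(p, o) / brier_score(clim, o)
```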
Reliability after post-processing. Statistical correction of forecasts using a long, stable set of prior forecasts from the same model (as in MOS). More on this in my second talk. Ref: Hamill et al., MWR, Nov 2006
What other characteristics of ensemble forecasts are important? “Sharpness” measures the specificity of the probabilistic forecast. Given two reliable forecast systems, the one producing the sharper forecasts is preferable. But you don’t want sharpness without reliability; that implies unrealistic confidence.
“Spread-skill” relationships are important, too. Small-spread ensemble forecasts should have less ensemble-mean error than large-spread forecasts: on average, the ensemble-mean error from a sample of a narrow pdf should be low, from a moderate-spread pdf moderate, and from a wide pdf large.
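One toy way to quantify such a relationship, assuming per-case ensemble arrays as before (a sketch, not a standard verification-package call):

```python
import numpy as np

def spread_skill_correlation(ensemble, obs):
    """Correlate per-case ensemble spread (std dev about the ensemble
    mean) with per-case ensemble-mean absolute error."""
    spread = ensemble.std(axis=1, ddof=1)
    error = np.abs(ensemble.mean(axis=1) - obs)
    return np.corrcoef(spread, error)[0, 1]
```

In a well-calibrated system this correlation should be positive: large spread flags cases where the ensemble mean is likely to be less accurate.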
More verification topics (see backup slides) • Continuous ranked probability score (CRPS); a sample-based estimator is sketched below. • Relative operating characteristic (ROC). • Potential economic value. • Issues with calculating skill: exaggerated skill when combining samples from regions with different climatologies. • Evaluating multi-dimensional characteristics of ensembles.
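For the first item, a widely used sample-based CRPS estimator is E|X − y| − ½ E|X − X′|, with X, X′ drawn from the ensemble; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def crps_ensemble(members, y):
    """Sample-based CRPS for one forecast case: mean absolute error of
    the members about the observation, minus half the mean absolute
    difference between all member pairs."""
    m = np.asarray(members, float)
    term1 = np.abs(m - y).mean()
    term2 = np.abs(m[:, None] - m[None, :]).mean()
    return term1 - 0.5 * term2
```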
(2) Ensemble prediction: numerical issues & principles • Initial conditions should be consistent with analysis errors, and their perturbations should grow rapidly. • Given that the forecast model is imperfect, the ensemble should include methods for representing the uncertainty of the models themselves. • For regional ensembles, careful consideration of domain size and the treatment of lateral boundary conditions is essential.
Why run forecasts from many initial conditions? Consider the data-assimilation cycle (Observations + First Guess → Data Assimilation → Analysis → Forecast Model → next First Guess): • Observations have errors, and observations aren’t available everywhere. • The first guess will inevitably have some errors, else why assimilate new observations? • Hence the “initial condition” (the analysis) will inevitably have some error; it will inherit some characteristics of the forecast error and the observation error. • And of course errors tend to grow with time, so it’d be helpful to have a sense of the diversity of possible outcomes.
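To see initial-condition error growth in miniature, here is a toy ensemble of Lorenz-63 integrations from slightly perturbed states (not the NWP systems discussed here; forward-Euler stepping and all parameter values are illustrative):

```python
import numpy as np

def lorenz63(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def run_ensemble(x0, n_members=20, pert=1e-3, dt=0.01, n_steps=1500):
    """Integrate perturbed copies of x0; the growing envelope of
    trajectories mimics growing initial-condition uncertainty."""
    rng = np.random.default_rng(1)
    states = x0 + pert * rng.standard_normal((n_members, 3))
    history = [states.copy()]
    for _ in range(n_steps):
        states = states + dt * np.array([lorenz63(s) for s in states])
        history.append(states.copy())
    return np.array(history)  # shape (n_steps + 1, n_members, 3)

traj = run_ensemble(np.array([1.0, 1.0, 1.0]))
print("spread in x, start vs. end:", traj[0, :, 0].std(), traj[-1, :, 0].std())
```

Tiny (0.001-sized) perturbations grow to the size of the attractor itself, which is exactly why a single deterministic forecast is eventually uninformative.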
Preferred characteristics of ensemble initial conditions • Differences between initial conditions should be largest: in regions with few observations; in storm tracks, where past forecast differences have grown quickly; and where model dynamics are not accurate. • Aspects of the model state should be perturbed where: the state is not estimated accurately; and model forecasts are sensitive to small changes in the state.
Perturb the land surface? The land state can be thought of as part of the initial condition, so why not perturb it? Perturbing the soil moisture (here, WRF initialized with two different soil-moisture analyses) increased warm-season precipitation forecast spread and modulated the details of thunderstorm activity. Likely to have the biggest impact in the warm season, when insolation is large; in winter, perhaps perturb snow cover/depth instead? Ref: Sutton et al., MWR, Nov 2006
Model errors at the mesoscale: higher resolution is important, but it’s more than just that. • Land-surface parameterization • Boundary-layer parameterization • Convective parameterization • Microphysical parameterization • etc.
Model error at the mesoscale. Example: cloud microphysical processes. • Conversion processes, like snow-to-graupel conversion by riming, are very difficult to parameterize but very important in convective clouds. • Especially for snow and graupel, particle properties like particle density and fall speed are important parameters; the assumption of a constant particle density is questionable. • Aggregation processes assume certain collision and sticking efficiencies, which are not well known. • Most schemes do not include hail processes like wet growth, partial melting, or shedding (or include only very simple parameterizations of them). • So-called ice multiplication (the Hallett-Mossop process) may be very important, but is still not well understood. (from Axel Seifert’s presentation to the NCAR ASP summer colloquium)
Model error at the mesoscale: summary of microphysical issues in convection-resolving NWP • Many fundamental problems in cloud microphysics are still unsolved. • The lack of in-situ observations makes any progress very slow and difficult. • Most current parameterizations have been designed, operationally applied, and tested for stratiform precipitation only. • Most of the empirical relations used in the parameterizations are based on surface observations or measurements in stratiform clouds (or storm anvils and stratiform regions). • Many basic parameterization assumptions, like N₀ = const., are at least questionable in convective clouds. • Many processes that are currently neglected, or not well represented, may become important in deep convection (shedding, collisional breakup, ...). • One-moment schemes may be insufficient to describe the variability of the size distributions in convective clouds. • Two-moment schemes haven’t been used long enough to draw any conclusions. • Spectral methods are overwhelmingly complicated and computationally expensive; nevertheless, they suffer from our lack of understanding of the fundamental processes. (from Axel Seifert’s presentation to the NCAR ASP summer colloquium)
Sensitivity of deep convective storms to graupel properties in a microphysical parameterization. Effect of assumed graupel density and particle size distribution (i.e., size and fall speed) in a storm split spawning supercells. Contours: rain isohyets; shading: hail/graupel depths greater than 0.01, 0.1, 1, and 10 mm; markers indicate the locations of maximum graupel and maximum hail accumulation. Plausible changes in microphysical parameterizations can cause large changes in precipitation amount, type, and location. Ref: Gilmore et al., MWR, Nov 2004
What is commonly done to deal with model error • Increase resolution. • Use multiple forecast models. • Use multiple parameterizations. • Use multiple constants in parameterizations. • Introduce stochastic elements into the deterministic forecast model (a toy sketch follows below). There isn’t a clean, unifying theory guiding us on model error.
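As a toy illustration of the last item, loosely in the spirit of stochastically perturbed physics tendencies (SPPT); the AR(1) formulation and every constant below are assumptions for illustration, not any operational scheme:

```python
import numpy as np

def perturb_tendency(tendency, r, dt, tau=6.0, sigma=0.3, rng=None):
    """Multiply a deterministic physics tendency by (1 + r), where r
    evolves as an AR(1) random field with decorrelation time tau (in
    the same time units as dt) and standard deviation sigma."""
    rng = rng if rng is not None else np.random.default_rng()
    phi = np.exp(-dt / tau)  # AR(1) autocorrelation over one step
    r = phi * r + sigma * np.sqrt(1.0 - phi**2) * rng.standard_normal(r.shape)
    return (1.0 + r) * tendency, r
```

Each ensemble member carries its own r field, so members that share a model still diverge through the physics.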
Lateral boundary conditions (it is now universally accepted that perturbed LBCs are necessary in limited-area ensembles). Example: SREF Z500 spread for a 19 May 1998 case from a 5-member, 32-km Eta model ensemble, shown at 0, 12, 24, and 36 h for three configurations: perturbing both IC & LBC, perturbing LBC only, and perturbing IC only. (Only a small impact on the precipitation field.) Ref: Du and Tracton, 1999, WMO report for WGNE.
Lateral boundary condition issues for LAMs (and LAEFs) • With 1-way LBCs, small scales in the domain cannot interact with scales larger than some limit defined by the domain size. • LBCs are generally provided by coarser-resolution forecast models; this “sweeps in” low-resolution information and sweeps out developing high-resolution information. • Physical process parameterizations for the model driving the LBCs may differ from those used in the interior, which can cause spurious gradients. • LBC info may introduce erroneous information for other reasons, e.g., model numerics. • LBC initialization can produce transient gravity-inertia modes. Ref: Warner et al. review article, BAMS, November 1997
Influence of domain size. A T-126 global model drives lateral boundary conditions for nests with 80-km and 40-km grid spacing of a limited-area model. From Warner et al., BAMS, Nov 1997, and Treadon and Peterson (1993), Preprints, 13th Conf. on Weather Analysis and Forecasting.
Influence of domain size, continued. The 40-km nest in the global model had a thin, realistic jet streak when driven through the large nested domain (left) and a smeared-out, unrealistic jet streak when driven through the small nested domain (right). The high resolution of the interior domain was not useful here because of the sweeping in of low-resolution information. Ref: ibid.
Use of ensemble for severe-weather outlook. SREF forecasts valid 21 UTC 7 April 2006, combining: Prob(MLCAPE > 1000 J kg⁻¹) × Prob(6-km shear > 40 kt) × Prob(0-1 km SRH > 100 m² s⁻²) × Prob(MLLCL < 1000 m) × Prob(3-h conv. pcpn > 0.01 in). Shaded area: Prob > 5%. The maximum combined probability increased as the event approached: 40% at the 48-h lead, 50% at 36 h and 24 h, and more than 50% at 12 h. (Example from David Bright, SPC, using Jun Du’s NCEP SREF system; MLCAPE = CAPE computed over the lowest 100 hPa.) Tornadoes are related to large-scale patterns of instability and shear, which are often predictable several days in advance. A sketch of this combined-probability calculation follows below.
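A sketch of how such a combined probability could be formed from raw member fields, using the thresholds on this slide (field names, array shapes, and unit conventions are assumptions):

```python
import numpy as np

def combined_severe_probability(mlcape, shear6km, srh01, mllcl, pcpn3h):
    """Each input: (n_members, ny, nx) ensemble fields. Each .mean(axis=0)
    over a boolean array is the fraction of members exceeding the
    threshold; the product is the combined probability mapped above."""
    return ((mlcape > 1000.0).mean(axis=0)      # J/kg
            * (shear6km > 40.0).mean(axis=0)    # kt
            * (srh01 > 100.0).mean(axis=0)      # m^2/s^2
            * (mllcl < 1000.0).mean(axis=0)     # m
            * (pcpn3h > 0.01).mean(axis=0))     # in
```

The same pattern, with the thresholds swapped, gives the fire-weather joint probability shown later in the talk.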
Severe event of April 7, 2006. The first-ever day-2 outlook “high risk” of severe weather was issued by the NOAA Storm Prediction Center; in the past, forecasters had been more cautious. More than 800 total severe reports, 3 killer tornadoes, 10 deaths. Diagnostics from the SREF, and good past SREF performance, aided forecaster confidence.
Example of predicting an extreme event from an ensemble: US fire-weather forecasting. Ingredients from large-scale conditions: • High wind speeds • Hot temperatures • Low relative humidity near the surface • Little rainfall
SREF 500 hPa mean height, wind, temperature. The following plots are courtesy of David Bright, NOAA/NCEP/SPC, using Jun Du’s NCEP SREF system.
SREF mean precipitation, vertical velocity, thickness. Over the desert southwest US, little model-forecast mean precipitation, and very warm conditions (purple is a mean 1000-500 hPa thickness of 5790 m).
SREF Pr[P12I > 0.01"] and mean P12I = 0.01" (dashed). Some members forecast precipitation over Colorado and New Mexico, but southern Utah and Arizona are forecast dry.
SREF Pr[RH < 15%] and mean RH = 15% (dashed). Very low near-surface relative humidity over Arizona and southern Utah.
SREF Pr[WSPD > 20 mph] and mean WSPD = 20 mph (dashed). Many of the members are forecasting gusty winds.
SREF combined or joint probability: Pr[P12I < 0.01"] × Pr[RH < 15%] × Pr[WSPD > 20 mph] × Pr[TMPF > 60 °F]. Joint probability of fire-weather ingredients.
NOAA SPC Operational Outlook (uncertainty communicated in accompanying text)
Example of improving methods for initializing the ensemble: 72-h T255 GFS track forecasts from an EnKF ensemble for Hurricane Rita, with intense vortices in several members; the observed position is marked.
NHC official track forecast (note: the official track was far south of the actual track, and south of the EnKF ensemble track forecast)
Take-home messages as we move to ensemble prediction • The new probabilistic forecast paradigm requires verifying in new ways, measuring “reliability” and “sharpness.” • Ensemble prediction requires more than slapping some control forecasts together; there are numerical principles to be followed. • Better decisions, or greater lead time for decisions, are possible by utilizing ensembles.