IMPACT OF TAMDAR ON THE RUC MODEL: A LOOK INTO SOME OF THE STATISTICS WITH CASE STUDIES

Ed Szoke*, Stan Benjamin, Randy Collander*, Brian Jamison*, Bill Moninger, Tom Schlatter, and Tracy Smith*
NOAA/ESRL Global Systems Division, Boulder, CO, USA
*Joint collaboration with the Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, CO

Overview

The main issue: objective evaluation (statistics) of relative humidity (RH) has occasionally shown poorer performance for RUC runs with TAMDAR. The statistics are calculated by comparing RUC forecasts with and without TAMDAR to RAOBs at the standard pressure levels (850, 700, and 500 mb). Is this really worse performance with TAMDAR, or are there other reasons for the poorer scores?

Procedure:
• Find days that stand out with poorer scores.
• Examine individual RAOBs alongside the forecast soundings to see where the errors occur.
• Concentrate on the Great Lakes subset (13 RAOBs).
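A minimal sketch of the RMS error behind these statistics, assuming each of the 13 RAOB sites contributes one forecast/RAOB RH pair at the chosen mandatory level and every site is weighted equally. The function name and the sample values are illustrative only; the actual verification code is not described in these slides.

```python
import math
from typing import Sequence


def rms_error(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """RMS error of forecast RH against RAOB RH (both in percent) at one level.

    Each index is one RAOB site (13 sites for the Great Lakes subset).
    """
    if len(forecast) != len(observed) or len(forecast) == 0:
        raise ValueError("forecast and observed must be non-empty and equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical 700-mb RH values (%) for three sites, for illustration only.
    raob = [74.0, 60.0, 50.0]
    dev = [94.0, 62.0, 48.0]
    print(round(rms_error(dev, raob), 2))
```

Running the same function on the dev and dev2 forecasts for a given day gives the pair of scores compared in the time series that follow.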

Verification areas: for this study we used the inner (blue) box, which contains 13 RAOBs.

3-h RMS error statistics for RH at 700 mb, June-October 2006, Great Lakes area (13 RAOBs)
dev = control (RUC without TAMDAR, red line); dev2 = RUC with TAMDAR added (blue line). The bottom plot shows the difference, positive when dev2 is a better forecast than dev. Starred days highlight poorer scores for dev2.
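The bottom-panel difference and the starred days can be sketched as below. This assumes the plotted difference is RMS(dev) minus RMS(dev2), which is positive when dev2 is better, as the caption states. The `threshold` used to star a day, the function names, and all RMS values in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def score_difference(rms_dev: float, rms_dev2: float) -> float:
    """Bottom-panel difference: positive when dev2 (with TAMDAR) has the
    smaller RMS error, i.e. is the better forecast."""
    return rms_dev - rms_dev2


def starred_days(daily: Dict[str, Tuple[float, float]], threshold: float) -> List[str]:
    """Days on which dev2 scores worse than dev by more than `threshold`
    (RH percentage points). `daily` maps a date string to (rms_dev, rms_dev2)."""
    return sorted(
        day
        for day, (rms_dev, rms_dev2) in daily.items()
        if score_difference(rms_dev, rms_dev2) < -threshold
    )


if __name__ == "__main__":
    # Hypothetical RMS values; only the sizes of the differences echo the case studies.
    daily = {
        "2006-06-23": (15.0, 22.0),  # dev2 7 points worse
        "2006-06-28": (20.0, 10.0),  # dev2 10 points better
        "2006-07-01": (12.0, 13.0),  # small difference
    }
    print(starred_days(daily, threshold=3.0))  # ['2006-06-23']
```

With a 3-point threshold only the hypothetical 23 June entry is flagged; a lower threshold would star more days.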

6-h RMS error statistics for June-October 2006 at 700 mb
Statistics (RMS error) for RH for 6-h RUC forecasts valid at 0000 UTC at 700 mb for the Great Lakes area (dev = control; dev2 = with TAMDAR added). Starred days highlight poorer scores for dev2.

Case 1: 23 June 06, 0000 UTC 3-h forecasts
The RMS score for dev2 is 7% worse than for dev for 3-h forecasts valid at 0000 UTC 23 June. The RAOB comparison showed that 2 sites account for most of this error; the Peoria, Illinois (ILX) comparison is shown here. In all sounding plots the RAOB is green, dev (RUC without TAMDAR) is blue, and dev2 (RUC with TAMDAR) is black. The shape (character) of the dev2 RH profile appears to be a better match to the RAOB, but it is displaced by ~50 mb, so it scores poorly at 700 mb.
RH at 700 mb: RAOB = 74%, dev = 94%, dev2 = 34% (40% drier than the RAOB and 60% drier than dev).
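To see which sites "account for most of this error", one can rank each site's share of the total squared error at 700 mb. In this sketch the ILX values come from Case 1; the PIT RAOB value and all BUF/GRB values are made up for illustration, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def error_shares(forecast: Dict[str, float], raob: Dict[str, float]) -> List[Tuple[str, float]]:
    """Each site's fraction of the total squared RH error, largest first.

    Sites missing from either dict are skipped.
    """
    squared = {s: (forecast[s] - raob[s]) ** 2 for s in forecast if s in raob}
    total = sum(squared.values())
    if total == 0:
        return [(s, 0.0) for s in sorted(squared)]
    return sorted(((s, e / total) for s, e in squared.items()),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # ILX 700-mb values are from Case 1 (RAOB 74%, dev2 34%).
    # PIT's RAOB value and all BUF/GRB values are hypothetical.
    raob = {"ILX": 74.0, "PIT": 70.0, "BUF": 50.0, "GRB": 50.0}
    dev2 = {"ILX": 34.0, "PIT": 31.0, "BUF": 55.0, "GRB": 47.0}
    for site, share in error_shares(dev2, raob):
        print(f"{site}: {share:.1%}")
```

Because the errors are squared, the two large misses (ILX and PIT) together contribute about 99% of the sum in this example, which is how one or two bad comparisons can dominate the RMS score.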

Case 1: 23 June 06
Pittsburgh (PIT) was the other RAOB where the RH is significantly worse for dev2 than for dev. Here dev2 is 39% drier than the RAOB, while dev is only 12% too moist. One could argue that the shape of the dev2 RH profile better matches the vertical changes shown by the RAOB, but the excessive drying in dev2 probably makes it simply a poorer forecast in this case.

Case 2: 14 July 06, 3-h 700 mb forecasts
RMS scores for dev2 were again ~7% worse than for dev, for 3-h forecasts valid at 0000 UTC 14 July. The RAOB comparison showed that 4 sites account for most of this error; the Buffalo, NY (BUF) comparison is shown here. In this case dev2 follows the RAOB RH profile nicely until the RAOB shifts to moister air exactly at 700 mb. This yields what appears to be an unrepresentative error at 700 mb for dev2, while dev happens to get a nearly perfect match.
RH differences at 700 mb: dev2 ~20%; dev almost no error.

Case 2: 14 July 06
Aberdeen, South Dakota (ABR) comparison. The dev2 RH forecast more closely matches the RAOB up to ~770 mb; above that both forecasts dry out, while the RAOB does not. Both forecasts dry at about the same rate in the vertical, but the dev forecast happens to cross the RAOB at 700 mb, and only because it is erroneously too moist below 750 mb. So the better dev score at 700 mb is not representative (with dev2 being 23% drier than the RAOB).

Case 2a: 14 July 06, 6-h 700 mb forecasts
RMS scores for dev2 were ~5% worse than for dev, for 6-h forecasts valid at 0000 UTC 14 July. The RAOB comparison showed smaller errors spread over about half the sites. The Buffalo, NY (BUF) 6-h comparison is shown because it illustrates the error produced by a sharp but vertically shallow moist layer in the RAOB just at 700 mb. Nothing in the other observations indicates whether this layer is real. Without this layer, the dev2 forecast follows the RAOB moisture more closely than dev.

Case 3: 12 October 06, 6-h forecasts valid 0000 UTC
Green Bay, Wisconsin (GRB) comparison. The RMS error at 700 mb for dev2 on 12 Oct was 7% worse than for dev, and almost all of this error comes from the GRB comparison.
RH and differences at 700 mb: RAOB 88%; dev 83% (-5%); dev2 22% (-66%).
This 700 mb difference is the largest found during the study period. It occurs because the dev2 forecast dries out a deep portion of the troposphere in northwest flow. Is the forecast as bad as it looks?

Case 3: 12 October 06, 0000 UTC 700 mb plot
The forecast from dev2 may not be as bad as it appeared. There is significant drying to the west and northwest of GRB behind the deep 700 mb upper-level low (dewpoint is the number below the temperature on the station plots). So the main issue may be that the dev2 forecast is just off by a small amount in timing.

Case 3: 12 October 06, 0000 UTC RAOB and dev2 comparison illustrating drying
Another way to show this drying is an overlay of the GRB RAOB and 2 upstream RAOBs (MPX and INL), along with the dev2 6-h forecast. MPX, farther to the west of GRB, is drier above 700 mb. INL, farther to the northwest, shows the dry layer reaching all the way down past 700 mb. Note that the dev2 forecast compares rather well with the INL RAOB, verifying nearly exactly at 700 mb.

Case 4: 20 October 06, 0000 UTC 3-h forecasts at 700 mb
The RMS score for dev2 is 4.5% worse than for dev for 3-h forecasts valid at 0000 UTC 20 October. The RAOB comparison showed that 2 sites account for most of this error (INL and MPX); the International Falls (INL) comparison is shown. The deep dry layer in the RAOB is better captured by the dev2 RH forecast, while dev mainly misses this dry layer but happens to verify better at 700 mb.
RH and differences at 700 mb: RAOB 12%; dev 42% (+30%); dev2 70% (+58%).
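The signed differences quoted for Cases 3 and 4 are forecast minus RAOB, in RH percentage points (negative means the forecast is too dry). This short sketch reproduces them from the listed 700 mb values; the helper name is hypothetical.

```python
from typing import Dict


def signed_differences(raob_rh: float, forecasts: Dict[str, float]) -> Dict[str, float]:
    """Forecast minus RAOB RH in percentage points; negative means too dry."""
    return {name: rh - raob_rh for name, rh in forecasts.items()}


case3 = signed_differences(88.0, {"dev": 83.0, "dev2": 22.0})  # GRB, 12 Oct 2006
case4 = signed_differences(12.0, {"dev": 42.0, "dev2": 70.0})  # INL, 20 Oct 2006
print(case3)  # {'dev': -5.0, 'dev2': -66.0}
print(case4)  # {'dev': 30.0, 'dev2': 58.0}
```

In both cases dev2 has the larger magnitude at 700 mb, even though the soundings suggest its overall character is closer to what was observed.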

Case 4: 20 October 06, 0000 UTC 500 mb plot with IR imagery
As in the previous case, drying is occurring behind a trough axis passing INL, so one could argue that the character of the dev2 forecast is more representative of what is really happening than the dev forecast, even though it scores worse at 700 mb.

Case 5: 28 June 06, 0000 UTC 3-h forecasts at 700 mb. Better verification for dev2 (RUC with TAMDAR)
This time the RMS score for dev2 is 10% better than for dev for 3-h forecasts valid at 0000 UTC 28 June. The RAOB comparisons showed a lot of variability, with some large errors for dev. The Wilmington, Ohio (ILN) comparison is shown. Both forecasts begin drying lower than observed, but because the drying in dev2 does not start until just above 700 mb, it scores much better than dev.
RH and differences at 700 mb: RAOB 78%; dev 13% (-55%); dev2 61% (-17%).

Case 6: 18 Oct 06, 0000 UTC 3-h forecasts at 700 mb. Better verification for dev2 (RUC with TAMDAR)
In this case dev2 is 4% better than dev (3-h forecasts). The Minneapolis, Minnesota (MSP) comparison illustrates the effect of a very sharp dry layer in the RAOB (which may or may not be real). The shape of both RUC forecasts is similar, but the dev2 moisture profile is shifted ~30 mb lower and happens to closely match the RAOB at 700 mb, yielding an RH value 31% better than dev at 700 mb.

What this all means...
• RH often varies strongly in the vertical, as shown in the RAOB profiles.
• Calculating error statistics only at the mandatory levels makes them more vulnerable to unrepresentativeness.
• It can take only 1 or 2 bad RAOB comparisons (out of 13 in the Great Lakes area) to yield a large RMS error.
• With only mandatory levels being used, slight shifts of the RH in the vertical can be severely penalized.
• The RAOBs often have very sharp RH variations in the vertical that may or may not be real, and these can produce huge errors if they fall at a mandatory level.
• Additionally, it is unrealistic to expect the RUC model to resolve some of these fluctuations (if they are indeed real).
• Given the above, we changed the verification to a layer method, with calculations made at 10-mb intervals.
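The layer method can be sketched as an RMS over 10-mb levels from 900 to 650 mb. This is one plausible reading only: the slides do not say whether errors are averaged per site first or pooled over all site/level pairs. The idealized step profile below (moist beneath a dry layer, with a 20-mb linear transition) and all function names are hypothetical. The example shifts the same profile 30 mb lower, similar to the displacement seen in Case 6.

```python
import math
from typing import Callable, List


def layer_levels(bottom_mb: float = 900.0, top_mb: float = 650.0,
                 step_mb: float = 10.0) -> List[float]:
    """Pressure levels from the bottom to the top of the layer, inclusive."""
    n = int(round((bottom_mb - top_mb) / step_mb))
    return [bottom_mb - i * step_mb for i in range(n + 1)]


def layer_rms(forecast: Callable[[float], float], observed: Callable[[float], float],
              levels: List[float]) -> float:
    """RMS of forecast-minus-observed RH over all levels in the layer."""
    diffs = [forecast(p) - observed(p) for p in levels]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def step_profile(dry_base_mb: float, moist: float = 80.0, dry: float = 20.0,
                 depth_mb: float = 20.0) -> Callable[[float], float]:
    """Hypothetical RH profile: `moist` at pressures >= dry_base_mb, `dry` at
    pressures <= dry_base_mb - depth_mb, linear in between."""
    def rh(p_mb: float) -> float:
        if p_mb >= dry_base_mb:
            return moist
        if p_mb <= dry_base_mb - depth_mb:
            return dry
        frac = (dry_base_mb - p_mb) / depth_mb
        return moist + frac * (dry - moist)
    return rh


if __name__ == "__main__":
    raob = step_profile(690.0)      # drying begins just above 700 mb
    shifted = step_profile(720.0)   # same shape, displaced 30 mb lower
    single = shifted(700.0) - raob(700.0)
    layer = layer_rms(shifted, raob, layer_levels())
    print(f"700-mb error: {single:.1f}, 900-650 mb layer RMS: {layer:.1f}")
```

With these made-up profiles the single-level error at 700 mb is -60 points, while the layer RMS is about 18.6 points, so the same 30-mb displacement is penalized much less once the whole layer is considered.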

Comparison of the new scoring method with the old for October 2006
New method: averaging over 900 to 650 mb. Old method: 700 mb single level. Days of interest are highlighted. For 12 Oct and 20 Oct dev still scores better, but the RMS difference is much reduced (on both days, ~2% RMS for the layer vs. ~5% at 700 mb only). For 18 Oct, when dev2 scored better at 700 mb, the difference is also reduced by more than half. These results appear to be consistent with the sounding comparisons shown earlier.

Summary
• We began this as a forensic pathology study to better understand why the RMS RH scores were substantially worse on some days for the RUC runs with TAMDAR.
• We used the Great Lakes area, with 13 RAOB/forecast comparisons.
• We focused on 3-h and 6-h forecasts valid at 0000 UTC, since TAMDAR data are abundant for these initialization times.
• We discovered issues with the mandatory-level-only scoring method.
• The change to a layer-average method has produced more representative results.
• We found no characteristic problems with the TAMDAR data or with the RUC (no misdesign in the RUC assimilation or model).
• Additional TAMDAR data at upstream airports would decrease aliasing.
