
Status of NCAR Statistical Evaluation and Prototype CIT Algorithm Development


Presentation Transcript


  1. Status of NCAR Statistical Evaluation and Prototype CIT Algorithm Development
  John Williams
  CIT Mini-Workshop 4, August 10, 2007

  2. FAA-NASA DCIT algorithm role in GTGN
  DCIT inputs: NWP fields, lightning, NEXRAD reflectivity, satellite data (IR, winds, CTCR, CType, OT mask, …)
  GTGN combines DCIT with GTG diagnostics, NTDA grids, in-situ reports, PIREPs, MDCRS winds and temperatures, and METARs
  GTGN output goes to cockpit displays or alerts and to ADDS (dispatch, ATC, etc.)
  Graphic courtesy of virtualskies.arc.nasa.gov

  3. DCIT domain of applicability
  DCIT provides turbulence diagnosis within some radius (100 nmi?) around convection, with diminishing "confidence" for increasing distance
  Diagram regions: NTDA, DCIT, clear air

  4. DCIT/GTGN schedule
  • DCIT version 0.1 (August 1, 2007): very basic FL algorithm using RUC, GOES IR and NLDN lightning on the 13-km RUC grid
  • DCIT version 1.0 (September 30, 2007): additional predictors, altitude regimes, 4-km grid
  • DCIT version 2.0 (September 30, 2008): near-final version, described on the next slide
  • DCIT version 3.0, final (September 30, 2009)
  • GTGN receives Aviation Weather Tech Transfer (AWTT) "experimental" approval: November 2009
  • GTGN receives AWTT "operational" approval: November 2010

  5. DCIT 2.0 high-level overview
  • NWP 1-hr forecast: regrid to 4 km and advect forward based on winds (NWP-derived fields)
  • NWP 2-hr forecast: regrid to 4 km and advect backward based on winds (NWP-derived fields)
  • Temporal interpolation between the two advected forecasts (sketched below)
  • Satellite IR and satellite products: advect forward based on winds/storm tracks
  • Lightning: lightning density, advected forward based on storm tracks
  • NTDA EDR mosaic and NSSL DBZ mosaic: advect forward based on winds/storm tracks
  • Object ID, distance calculations, conceptual models
  • FL or RF synthesis: deterministic and probabilistic turbulence and confidence outputs on a 4-km Lambert conformal grid
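
  The temporal interpolation step can be thought of as a time-weighted blend of the forward-advected 1-hr forecast and the backward-advected 2-hr forecast. The sketch below is an illustrative assumption (linear weighting, synthetic stand-in grids), not the DCIT implementation.

  % Minimal sketch (assumption): temporal interpolation as a time-weighted blend
  % of the advected 1-hr and 2-hr NWP forecasts on the 4-km grid.
  advectedFcst1hr = rand(100);          % stand-in for the forward-advected 1-hr forecast
  advectedFcst2hr = rand(100);          % stand-in for the backward-advected 2-hr forecast
  t1 = 1; t2 = 2;                       % forecast valid times (hours)
  t  = 1.25;                            % desired valid time (hypothetical)
  w  = (t2 - t) / (t2 - t1);            % linear weight toward the earlier forecast
  w  = min(max(w, 0), 1);               % clamp to [0, 1]
  blended = w * advectedFcst1hr + (1 - w) * advectedFcst2hr;   % element-wise blend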

  6. Evaluating candidate diagnostics
  • Collect diagnostic data associated with in situ EDR reports from commercial aircraft
  • Sample data into training and testing subsets (each drawn from distinct days)
  • Train a Random Forest (machine learning "black box") and evaluate first-guess diagnostic importance (see the sketch below)
  • Add or remove diagnostics and retrain the Random Forest to produce a reduced set of strong candidates
  • Analyze different "regimes" (e.g., altitude bands) separately
  • Analyze the diagnostics' relationship to turbulence and develop a fuzzy logic algorithm
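
  A minimal sketch of the training step, assuming MATLAB's Statistics Toolbox TreeBagger class (the importance option is named 'OOBPredictorImportance' in newer releases). The arrays X, y, and dayIndex are synthetic stand-ins for the diagnostics, EDR categories, and report days; this is not the DCIT code.

  % Sketch: day-based train/test split, then a bagged-tree (random forest) classifier.
  nReports = 2000; nDiagnostics = 10;
  X = randn(nReports, nDiagnostics);                 % stand-in diagnostic values
  y = double(X(:,1) + 0.5*randn(nReports,1) > 1);    % stand-in turbulence category (0/1)
  dayIndex = randi(30, nReports, 1);                 % stand-in day of each report
  trainDays = 1:2:30;                                % training days distinct from testing days
  isTrain = ismember(dayIndex, trainDays);
  rf = TreeBagger(100, X(isTrain,:), y(isTrain), ...
      'Method', 'classification', 'OOBVarImp', 'on');
  [predLabels, scores] = predict(rf, X(~isTrain,:)); % per-class vote fractions on test days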

  7. Previous results (all altitudes)
  • Vary the threshold on the average of random forest "votes" to generate ROC curves (PoD vs. FAR); see the sketch below
  • ROC curves compare performance using RUC model data only against the top 38 variables (model, radar, satellite) → real-time T-storm data adds skill
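
  A sketch of how such a curve can be traced by sweeping a threshold on the random-forest vote fraction. The votes and truth arrays are hypothetical stand-ins; with the TreeBagger sketch above, votes would be the "turbulence" column of the predicted scores, and FAR is taken here as false alarms divided by observed nulls.

  % Sketch: PoD vs. FAR curve by varying the threshold on RF vote fractions.
  truth = rand(1000,1) < 0.1;                        % stand-in observed turbulence flags
  votes = 0.7*truth + 0.3*rand(1000,1);              % stand-in vote fractions
  thresholds = 0:0.02:1;
  pod = zeros(size(thresholds)); far = zeros(size(thresholds));
  for k = 1:numel(thresholds)
      warn = votes >= thresholds(k);
      pod(k) = sum(warn & truth)  / max(sum(truth), 1);     % probability of detection
      far(k) = sum(warn & ~truth) / max(sum(~truth), 1);    % false alarm rate
  end
  plot(far, pod); xlabel('FAR'); ylabel('PoD'); title('ROC curve');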

  8. Variable importance ranking • Automatically produced by random forest software
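
  One way such a ranking could be produced, assuming the TreeBagger forest from the slide-6 sketch was trained with out-of-bag permutation importance enabled; the property and option names shown are from older MATLAB releases (newer ones use 'OOBPredictorImportance' and OOBPermutedPredictorDeltaError), and varNames is a hypothetical list of diagnostic names.

  % Sketch: rank predictors by out-of-bag permutation importance.
  imp = rf.OOBPermutedVarDeltaError;                 % one importance score per predictor
  [~, order] = sort(imp, 'descend');
  varNames = arrayfun(@(k) sprintf('diagnostic_%02d', k), 1:numel(imp), 'UniformOutput', false);
  for k = 1:numel(order)
      fprintf('%2d. %-20s %.3f\n', k, varNames{order(k)}, imp(order(k)));
  end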

  9. CTCR data analysis
  • Used 283 days of CTCR data provided by CIMSS for summers 2005 and 2006 over an upper Midwest domain
  • 1.6 million EDR reports above 15,000 ft
  • Computed mean, max, standard deviation, and number of CTCR data points within 5, 10, 20 and 40 km, as well as wind speed and direction at the nearest pixel (see the sketch below)
  • Repeated the above for 15-minute and 30-minute lags
  • Included GOES IR, RUC, RUC-derived diagnostics, lightning strike data, and some radar data
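
  A sketch of the kind of neighborhood statistics described here, for one EDR report and one radius. The flat-earth distance approximation and the synthetic CTCR pixels are illustrative assumptions, not the processing actually used.

  % Sketch: mean, max, std dev, and count of CTCR values within 20 km of a report.
  ctcrLat = 40 + randn(5000,1); ctcrLon = -95 + randn(5000,1);   % stand-in pixel locations
  ctcrVal = rand(5000,1);                                        % stand-in CTCR values
  rptLat = 40.2; rptLon = -94.8;                                 % one EDR report location
  radiusKm = 20;
  kmPerDeg = 111.2;
  dx = (ctcrLon - rptLon) * kmPerDeg * cosd(rptLat);             % flat-earth approximation
  dy = (ctcrLat - rptLat) * kmPerDeg;
  inRange = dx.^2 + dy.^2 <= radiusKm^2;
  ctcrCount = sum(inRange);                                      % number of points within 20 km
  ctcrMean  = mean(ctcrVal(inRange));
  ctcrMax   = max(ctcrVal(inRange));
  ctcrStd   = std(ctcrVal(inRange));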

  10. Do CTCR data contain useful info?
  Comparing Random Forest performance without and with CTCR data → CTCR data appear to provide a slight performance improvement for LOG (light-or-greater) and MOG (moderate-or-greater) EDR discrimination

  11. What variables are most important? • Ranking produced by random forest software

  12. GOES wind direction, exists at 86,000 pts (conditional histograms with mean, median, and 90th percentile)

  13. CTCR: Number of points within 40 km (conditional histograms with mean, median, and 90th percentile)

  14. CTCR: Mean within 20 km (conditional histograms with mean, median, and 90th percentile)

  15. CTCR: Mean within 20 km where |Temp – IR| < 20C (conditional histograms with mean, median, and 90th percentile)

  16. CTCR: Standard deviation within 20 km (conditional histograms with mean, median, and 90th percentile)

  17. CTCR: GOES wind speed (conditional histograms with mean, median, and 90th percentile)

  18. For comparison: RUC wind speed (conditional histograms with mean, median, and 90th percentile)

  19. CTCR: GOES – RUC wind speed (conditional histograms with mean, median, and 90th percentile)

  20. GOES IR – aircraft temperature (conditional histograms with mean, median, and 90th percentile)

  21. Constructing fuzzy-logic interest maps
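
  The interest maps used in the slide-24 algorithm are piecewise-linear lookups built with interp1. The sketch below shows one such map with illustrative breakpoints rather than the tuned DCIT values.

  % Sketch: a fuzzy-logic interest map as a piecewise-linear lookup (interp1 style).
  iMap = [-999 0.0; 0 0.0; 10 0.5; 30 1.0; 999 1.0];   % rows are [predictor value, interest]
  predictorValue = 18;                                 % hypothetical predictor sample
  interest = interp1(iMap(:,1), iMap(:,2), predictorValue)   % -> 0.7 for this example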

  22. "Consensus" combination
  Diagram of interest values f1, f2, f3 or f4 being combined; caption: "These factors suppress turbulence diagnosis"

  23. Fuzzy logic tuning to optimize weights
  Given inputs x_i and desired outputs y_i, evaluate a function f with parameters w to produce outputs f(x_i), then adjust w to reduce the error (see the sketch below)
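
  Because the slide-24 combination is linear in the weights, one simple way to "adjust w to reduce error" is linear least squares on a training set; this sketch is an assumption about the tuning approach, not the method actually used, and F and edrObs are synthetic stand-ins for the interest values f(x_i) and the observed in situ EDR.

  % Sketch: fit combination weights (plus a constant offset) by linear least squares.
  nReports = 500; nInterests = 11;
  F = rand(nReports, nInterests);                             % stand-in interest values
  edrObs = F * rand(nInterests,1) + 0.05*randn(nReports,1);   % stand-in observed EDR
  A = [F, ones(nReports,1)];                                  % extra column for the offset
  w = A \ edrObs;                                             % least-squares weights
  rmse = sqrt(mean((A*w - edrObs).^2))                        % training error after tuning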

  24. Initial DCIT algorithm
  % Each iMap row is [predictor value, interest]; interp1 applies the map as a
  % piecewise-linear interest function of the named predictor.
  % Lightning strikes
  iMap = [-999 0.02; 0.5 0.02; 1 0.1; 200 2; 999 2];
  Ltg_Interest = interp1(iMap(:,1), iMap(:,2), Ltg_Strikes);
  % Channel 4 IR minus temperature (cloud-top distance proxy)
  iMap = [-999 0.45; -105 0.45; 20 0.08; 40 0.04; 80 0.02; 999 0.02];
  CTDist_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Channel_4_infrared - Linear_middle_temperature);
  % Channel 4 IR (cloud-top temperature)
  iMap = [-999 0.15; -75 0.15; -25 0.08; 75 0.05; 999 0.02];
  CT_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Channel_4_infrared);
  % Structure-function-derived EDR (cube root)
  iMap = [-999 0.02; 0.001 0.02; 2.0 2.0; 999 2.0];
  SF_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Structure_function_derived_eddy_dissipation_rate.^(1/3));
  % RUC convective precipitation
  iMap = [-999999 0.02; 0.0 0.02; 1.0 0.08; 25 0.45; 999999 0.45];
  Precip_Interest = interp1(iMap(:,1), iMap(:,2), RUC_Convective_precipitation);
  % Mean sea level pressure minus pressure
  iMap = [-999 0.15; 1.25e4 0.35; 5.5e4 0.05; 8.65e4 0.02; 999e4 0.02];
  Pressure_Interest = interp1(iMap(:,1), iMap(:,2), RUC_Mean_sea_level_pressure - RUC_Pressure);
  % Lapse rate
  iMap = [-999 0.25; -0.05 0.25; 0.004 0.15; 0.008 0.02; 0.05 0.05; 0.08 0.02; 999 0.02];
  Lapse_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Lapse_rate);
  % Richardson number
  iMap = [-999999 0.02; -2 0.02; 4 0.25; 600 0.40; 999999 0.40];
  Ri_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Richardson_number);
  % Vorticity
  iMap = [-999 0.75; -7e-4 0.75; -1e-4 0.07; -0.5e-4 0.02; 2.3e-4 0.07; 8e-4 0.75; 999 0.75];
  Vorticity_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Vorticity);
  % K index
  iMap = [-999 0; -50 0; -10 0.15; 12 0.10; 40 0.15; 60 0; 999 0];
  K_index_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_K_index);
  % Humidity mixing ratio
  iMap = [-999 0.02; 0 0.02; 0.0025 0.10; 0.0045 0.30; 0.01 0.15; 0.03 0.15];
  Humidity_Interest = interp1(iMap(:,1), iMap(:,2), RUC_Humidity_mixing_ratio);
  % Weighted-sum combination of the interest values into an EDR estimate
  EDR = 0.58 * Ltg_Interest + 0.25 * CTDist_Interest + 0.01 * CT_Interest + ...
        0.23 * SF_Interest + 0.064 * Precip_Interest + 0.41 * Pressure_Interest + ...
        0.01 * Lapse_Interest + 0.12 * Ri_Interest + 0.08 * Vorticity_Interest + ...
        0.12 * K_index_Interest + 0.30 * Humidity_Interest - 0.03;

  25. Future work
  • Build "conceptual models" that relate observations to expected turbulence generation/propagation
  • Identify regimes where different combination logic is appropriate (use clustering techniques to refine)
  • Analyze multi-variable dependence (joint PDFs, principal components, etc.) and create multi-dimensional interest maps
  • Separate the diagnostics that directly predict CIT from those that identify conditions conducive to CIT
  • Construct a modified Takagi-Sugeno style algorithm and tune it to optimize performance: identify regime memberships, use RF or FL prediction logic for each regime, and perform a regime-weighted combination

  26. Challenges for this approach
  • Empirical techniques may inadvertently exploit idiosyncrasies of the data, not fundamental physics
    • may be sensitive to the precise method used for producing diagnostics
    • important to get large datasets with mature diagnostic algorithms
  • Training data may not be truly representative of the atmosphere
    • pilots attempt to avoid storms and regions of reported turbulence
    • may need to weight data based on density/sparseness
  • How does one judge the usefulness of sparsely available diagnostic fields?

  27. Extra Slides

  28. Available RUC and derived fields: Convective Parameters and Turbulence Indices • CAPE • CIN • Showalter Index • Totals Indices • Lifted Index • Precipitable Water • SWEAT (Severe Wx Threat Index) • K-Index • Bulk Richardson Number • Richardson Number • Lapse Rate • DTF3 (Diagnostic TKE Formulations) • Vertical Shear • Horizontal Shear • 1/Stability • EDR (Structure Function derived Eddy Dissipation Rates) • SIGW (Structure Function derived Sigma Vertical Velocity) • Divergence • Vorticity • Dutton • NCSUI (N.C. State U. Index) • Colson-Panofsky • Ellrod1 • Saturated Richardson Number • Frontogenesis Function • LAZ (Laikhman-Alter-Zalik) • NGM1 and NGM2 • ABSIA • UBF (Unbalanced Flow) • NVA (Negative Vorticity Advection) • Tropopause Height • Wind Speed

  29. Learning a Predictive Algorithm: Random Forests
  • Basic idea: "grow" multiple decision trees to predict turbulence based on "dartboard" values, each using a random subset of data ("bagging") and random splitting variables
  • Trees function as "ensembles of experts"
  • Trees "vote" to determine the consensus categorization; they also create a "probability distribution" over classes
  • Example: votes of 4, 2, 4, 4, 1 from five trees give consensus vote 4 ("confidence" 3/5)
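
  The five-tree voting example on this slide can be reproduced in a few lines; the "confidence" is taken here, as on the slide, to be the winning vote fraction.

  % Sketch: consensus categorization and "confidence" from individual tree votes.
  votes = [4 2 4 4 1];                                     % example votes from five trees
  consensus = mode(votes);                                 % most common category -> 4
  confidence = sum(votes == consensus) / numel(votes);     % winning vote fraction -> 3/5
  fprintf('consensus vote: %d ("confidence" %d/%d)\n', consensus, sum(votes == consensus), numel(votes));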
