Status of NCAR Statistical Evaluation and Prototype CIT Algorithm Development
John Williams
CIT Mini-Workshop 4, August 10, 2007
FAA-NASA DCIT algorithm role in GTGN
[Diagram: NWP fields, lightning, NEXRAD reflectivity, and satellite data (IR, winds, CTCR, CType, OT mask, …) feed DCIT. DCIT output, GTG diagnostics, NTDA grids, in-situ reports, PIREPs, MDCRS winds and temperatures, and METARs feed GTGN, whose products go to ADDS (dispatch, ATC, etc.) and to cockpit display or alert. Graphic courtesy of virtualskies.arc.nasa.gov.]
DCIT domain of applicability
DCIT provides turbulence diagnosis within some radius (100 nmi?) of convection, with “confidence” diminishing as distance increases.
[Figure: regions labeled DCIT, Clear air, and NTDA]
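The slide does not say how confidence should fall off with distance from convection. A minimal sketch, assuming a linear taper that reaches zero at the tentative 100 nmi radius; the function name `dcit_confidence` and the linear form are hypothetical.

```python
def dcit_confidence(distance_nmi: float, radius_nmi: float = 100.0) -> float:
    """Hypothetical confidence taper for a DCIT diagnosis.

    Returns 1.0 at the convection and falls linearly to 0.0 at radius_nmi
    (the slide's tentative 100 nmi) and beyond. The linear form is an ASSUMPTION.
    """
    if distance_nmi < 0 or radius_nmi <= 0:
        raise ValueError("distance must be >= 0 and radius > 0")
    return max(0.0, 1.0 - distance_nmi / radius_nmi)


for d in (0, 50, 150):
    print(d, dcit_confidence(d))  # 1.0, 0.5, 0.0
```

Any monotonically decreasing function would fit the slide's description equally well; beyond the radius, the clear-air NTDA and GTG products would take over.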
DCIT/GTGN schedule
• DCIT version 0.1 – August 1, 2007
  • very basic fuzzy logic (FL) algorithm using RUC, GOES IR, and NLDN lightning on the 13-km RUC grid
• DCIT version 1.0 – September 30, 2007
  • additional predictors, altitude regimes, 4-km grid
• DCIT version 2.0 – September 30, 2008
  • near-final version, described on the next slide
• DCIT version 3.0 (final) – September 30, 2009
• GTGN receives Aviation Weather Tech Transfer (AWTT) “experimental” approval – November 2009
• GTGN receives Aviation Weather Tech Transfer (AWTT) “operational” approval – November 2010
DCIT 2.0 high-level overview
• NWP 1-hr forecast: regrid to 4 km and advect forward based on winds (NWP-derived fields)
• NWP 2-hr forecast: regrid to 4 km and advect backward based on winds (NWP-derived fields)
• Temporal interpolation of the NWP-derived fields
• Satellite IR and satellite products: advect forward based on winds/storm tracks
• Lightning: advect forward based on storm tracks; lightning density: advect forward based on winds/storm tracks
• NTDA EDR mosaic: advect forward based on winds/storm tracks
• NSSL DBZ mosaic: advect forward based on winds/storm tracks
• Object ID, distance calculations, conceptual models
• FL or RF synthesis, producing deterministic and probabilistic turbulence and confidence outputs on a 4-km Lambert conformal grid
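In the DCIT 2.0 overview, the 1-hr forecast is advected forward and the 2-hr forecast backward so that both describe the same valid time, and the two are then temporally interpolated. A minimal sketch of that last step, assuming simple linear time weighting and that both grids have already been regridded and advected; the name `blend_forecasts` and the weighting scheme are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def blend_forecasts(f1_adv: Grid, f2_adv: Grid, lead_hr: float) -> Grid:
    """Linearly blend two advected NWP-derived grids valid at the same time.

    f1_adv:  1-hr forecast field, advected forward to the valid time.
    f2_adv:  2-hr forecast field, advected backward to the valid time.
    lead_hr: valid time relative to model initialization, between 1 and 2.
    """
    if not 1.0 <= lead_hr <= 2.0:
        raise ValueError("lead_hr must lie between the 1-hr and 2-hr forecasts")
    w2 = lead_hr - 1.0  # weight on the 2-hr forecast grows with lead time
    w1 = 1.0 - w2
    return [[w1 * a + w2 * b for a, b in zip(row1, row2)]
            for row1, row2 in zip(f1_adv, f2_adv)]


print(blend_forecasts([[0.0, 2.0]], [[4.0, 6.0]], 1.25))  # [[1.0, 3.0]]
```

Advecting both forecasts toward the valid time before blending is what keeps features from being smeared into two displaced copies.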
Evaluating candidate diagnostics
• Collect diagnostic data associated with in situ EDR reports from commercial aircraft
• Split the data into training and testing subsets, each drawn from distinct days
• Train a Random Forest (machine learning “black box”) and evaluate first-guess diagnostic importance
• Add or remove diagnostics and retrain the Random Forest to produce a reduced set of strong candidates
• Analyze different “regimes” (e.g., altitude bands) separately
• Analyze the diagnostics’ relationship to turbulence and develop a fuzzy logic algorithm
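Drawing training and testing subsets from distinct days keeps reports from the same weather situation out of both sets, which would otherwise inflate test skill. A minimal sketch of such a day-based split, assuming each sample is tagged with its date; the name `split_by_day` and the 30% test fraction are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def split_by_day(records: List[Tuple[str, dict]], test_fraction: float = 0.3,
                 seed: int = 0) -> Tuple[list, list]:
    """Assign whole days to either training or testing.

    records are (day, sample) pairs; no day contributes samples to both subsets.
    """
    by_day: Dict[str, list] = defaultdict(list)
    for day, sample in records:
        by_day[day].append(sample)
    days = sorted(by_day)
    random.Random(seed).shuffle(days)  # reproducible random choice of test days
    n_test = max(1, round(test_fraction * len(days)))
    test_days = set(days[:n_test])
    train = [s for d in days if d not in test_days for s in by_day[d]]
    test = [s for d in days if d in test_days for s in by_day[d]]
    return train, test


records = [(d, {"day": d, "i": i})
           for d in ["2006-07-01", "2006-07-02", "2006-07-03", "2006-07-04"]
           for i in range(2)]
train, test = split_by_day(records, test_fraction=0.25)
print(len(train), len(test))  # 6 2
```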
Previous results (all altitudes)
• Varied the threshold on the average of random forest “votes” to generate ROC curves (PoD vs. FAR)
[Figure: ROC curves for performance using RUC model data only and using the top 38 variables (model, radar, satellite)]
• Real-time thunderstorm data add skill
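Each point on these ROC curves comes from one threshold on the fraction of trees voting “turbulent.” A minimal sketch of that sweep, assuming binary observed outcomes and assuming FAR here means the false alarm rate (false alarms divided by all non-events), the usual ROC x-axis; the name `roc_points` is hypothetical.

```python
from typing import List, Tuple


def roc_points(vote_frac: List[float], observed: List[bool],
               thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, PoD, false alarm rate) for each threshold.

    A case is forecast "turbulent" when its vote fraction >= threshold.
    PoD = hits / (hits + misses); false alarm rate = false alarms /
    (false alarms + correct nulls).
    """
    pts = []
    for t in thresholds:
        hits = misses = false_alarms = correct_nulls = 0
        for v, obs in zip(vote_frac, observed):
            fcst = v >= t
            if obs and fcst:
                hits += 1
            elif obs:
                misses += 1
            elif fcst:
                false_alarms += 1
            else:
                correct_nulls += 1
        pod = hits / (hits + misses) if hits + misses else 0.0
        far = false_alarms / (false_alarms + correct_nulls) if false_alarms + correct_nulls else 0.0
        pts.append((t, pod, far))
    return pts


votes = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
obs = [True, True, True, False, False, False]
print(roc_points(votes, obs, [0.0, 0.5, 1.0]))
```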
Variable importance ranking
• Automatically produced by random forest software
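The slide does not say which importance measure the random forest software reports. One common choice, used in Breiman's random forests, is permutation importance: shuffle one input column and measure how much accuracy drops. A minimal sketch on the full sample rather than the out-of-bag data; the function name and the toy predictor are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(predict: Callable[[Sequence[float]], int],
                           X: List[List[float]], y: List[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each column of X is randomly permuted."""
    rng = random.Random(seed)

    def accuracy(rows: List[List[float]]) -> float:
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between column j and the outcome
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


def predict(row: Sequence[float]) -> int:
    """Toy classifier that uses only column 0."""
    return int(row[0] > 0.5)


X = [[i / 10, (i * 7 % 10) / 10] for i in range(10)]
y = [int(i / 10 > 0.5) for i in range(10)]
imps = permutation_importance(predict, X, y)
print(imps)  # column 0 matters, column 1 does not
```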
CTCR data analysis
• Used 283 days of CTCR data provided by CIMSS for summers 2005 and 2006 over an upper Midwest domain
• 1.6 million EDR reports above 15,000 ft
• Computed the mean, max, standard deviation, and number of CTCR data points within 5, 10, 20, and 40 km, as well as wind speed and direction at the nearest pixel
• Repeated the above for 15-minute and 30-minute lags
• Also included GOES IR, RUC, RUC-derived diagnostics, lightning strike data, and some radar data
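The neighborhood statistics can be sketched as follows, assuming report and pixel locations have already been projected to a common kilometre grid and that the population standard deviation is used (the slide specifies neither). The name `neighborhood_stats` is hypothetical. The 15- and 30-minute lag versions would call the same function with pixels from the earlier scans.

```python
import math
from typing import Dict, List, Optional, Tuple


def neighborhood_stats(report_xy: Tuple[float, float],
                       pixels: List[Tuple[float, float, float]],
                       radii_km: Tuple[int, ...] = (5, 10, 20, 40)
                       ) -> Dict[int, Dict[str, Optional[float]]]:
    """Summarize CTCR pixel values within each radius of an EDR report.

    report_xy: (x, y) of the report in km on a projected grid (ASSUMED).
    pixels:    (x, y, value) for each CTCR pixel, in the same coordinates.
    """
    out: Dict[int, Dict[str, Optional[float]]] = {}
    for r in radii_km:
        vals = [v for x, y, v in pixels
                if math.hypot(x - report_xy[0], y - report_xy[1]) <= r]
        n = len(vals)
        if n == 0:
            out[r] = {"n": 0, "mean": None, "max": None, "std": None}
            continue
        mean = sum(vals) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)  # population std
        out[r] = {"n": n, "mean": mean, "max": max(vals), "std": std}
    return out


pixels = [(3, 0, 1.0), (0, 8, 3.0), (15, 0, 5.0), (30, 30, 7.0)]
stats = neighborhood_stats((0, 0), pixels)
print(stats[10])  # {'n': 2, 'mean': 2.0, 'max': 3.0, 'std': 1.0}
```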
Do CTCR data contain useful info?
[Figure panels: random forest performance without CTCR data; random forest performance with CTCR data]
CTCR data appear to provide a slight performance improvement for discriminating light-or-greater (LOG) and moderate-or-greater (MOG) EDR.
What variables are most important?
• Ranking produced by random forest software
GOES wind direction (available at 86,000 points)
[Figure: conditional histograms with mean, median, and 90th percentile]
CTCR: number of points within 40 km
[Figure: conditional histograms with mean, median, and 90th percentile]
CTCR: mean within 20 km
[Figure: conditional histograms with mean, median, and 90th percentile]
CTCR: mean within 20 km, |temperature − IR| < 20 °C
[Figure: conditional histograms with mean, median, and 90th percentile]
CTCR: standard deviation within 20 km
[Figure: conditional histograms with mean, median, and 90th percentile]
CTCR: GOES wind speed
[Figure: conditional histograms with mean, median, and 90th percentile]
For comparison: RUC wind speed
[Figure: conditional histograms with mean, median, and 90th percentile]
CTCR: GOES − RUC wind speed
[Figure: conditional histograms with mean, median, and 90th percentile]
GOES IR − aircraft temperature
[Figure: conditional histograms with mean, median, and 90th percentile]
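The summary statistics on these plots (mean, median, 90th percentile of a diagnostic, conditioned on the observed turbulence) can be sketched as below. The category labels ("null", "LOG") and the use of Python's `statistics.quantiles` with the inclusive method are assumptions; the slides do not say how the percentiles were computed.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def conditional_stats(pairs: List[Tuple[float, str]]) -> Dict[str, Tuple[float, float, float]]:
    """Mean, median, and 90th percentile of a diagnostic for each EDR category."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for value, category in pairs:
        groups[category].append(value)
    result = {}
    for cat, vals in groups.items():
        if len(vals) < 2:
            p90 = vals[0]  # quantiles() needs at least two points
        else:
            # n=10 gives 9 cut points (deciles); the last is the 90th percentile
            p90 = statistics.quantiles(vals, n=10, method="inclusive")[-1]
        result[cat] = (statistics.mean(vals), statistics.median(vals), p90)
    return result


pairs = [(float(v), "LOG") for v in range(1, 11)] + [(0.5, "null"), (1.5, "null")]
print(conditional_stats(pairs))
```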
“Consensus” combination
[Diagram: interest factors f1, f2, f3 or f4 are combined; annotation: “These factors suppress turbulence diagnosis”]
Fuzzy logic tuning to optimize weights
[Diagram: input x_i → function f with parameters w → output f(x_i); the output is compared with the desired output y_i, and w is adjusted to reduce the error]
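The tuning loop in the diagram can be sketched with plain gradient descent, assuming f is a weighted sum of interests plus an offset (the same form as the final EDR line of the initial DCIT algorithm) and assuming mean squared error as the error measure. The slide does not name the optimizer; `tune_weights` and its learning rate and epoch count are hypothetical.

```python
from typing import List


def tune_weights(X: List[List[float]], y: List[float], lr: float = 0.1,
                 epochs: int = 2000) -> List[float]:
    """Fit w so that f(x) = sum_k w[k] * x[k] + w[-1] approximates y.

    Uses gradient descent on the mean squared error; w[-1] is the offset.
    """
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            # zip(w, xi) stops after n_feat terms, so the offset is added separately
            err = sum(wk * xk for wk, xk in zip(w, xi)) + w[-1] - yi
            for k in range(n_feat):
                grad[k] += 2 * err * xi[k] / n
            grad[-1] += 2 * err / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]  # step downhill
    return w


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [0.5 * a + 0.25 * b + 0.1 for a, b in X]
print(tune_weights(X, y))  # close to [0.5, 0.25, 0.1]
```

With nonlinear interest maps inside f, the same loop applies, but the gradient would have to include the map parameters or be estimated numerically.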
Initial DCIT algorithm
% Interest maps: each iMap row is [input value, interest]. interp1 interpolates
% linearly between rows; the +/-999 (or +/-999999) sentinel rows hold the end
% values constant, since interp1 returns NaN outside the breakpoint range.
iMap = [-999 0.02; 0.5 0.02; 1 0.1; 200 2; 999 2];
Ltg_Interest = interp1(iMap(:,1), iMap(:,2), Ltg_Strikes);
iMap = [-999 0.45; -105 0.45; 20 0.08; 40 0.04; 80 0.02; 999 0.02];
CTDist_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Channel_4_infrared - Linear_middle_temperature);
iMap = [-999 0.15; -75 0.15; -25 0.08; 75 0.05; 999 0.02];
CT_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Channel_4_infrared);
% Structure-function EDR is mapped via its cube root
iMap = [-999 0.02; 0.001 0.02; 2.0 2.0; 999 2.0];
SF_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Structure_function_derived_eddy_dissipation_rate.^(1/3));
iMap = [-999999 0.02; 0.0 0.02; 1.0 0.08; 25 0.45; 999999 0.45];
Precip_Interest = interp1(iMap(:,1), iMap(:,2), RUC_Convective_precipitation);
iMap = [-999 0.15; 1.25e4 0.35; 5.5e4 0.05; 8.65e4 0.02; 999e4 0.02];
Pressure_Interest = interp1(iMap(:,1), iMap(:,2), RUC_Mean_sea_level_pressure - RUC_Pressure);
iMap = [-999 0.25; -0.05 0.25; 0.004 0.15; 0.008 0.02; 0.05 0.05; 0.08 0.02; 999 0.02];
Lapse_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Lapse_rate);
iMap = [-999999 0.02; -2 0.02; 4 0.25; 600 0.40; 999999 0.40];
Ri_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Richardson_number);
iMap = [-999 0.75; -7e-4 0.75; -1e-4 0.07; -0.5e-4 0.02; 2.3e-4 0.07; 8e-4 0.75; 999 0.75];
Vorticity_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Vorticity);
iMap = [-999 0; -50 0; -10 0.15; 12 0.10; 40 0.15; 60 0; 999 0];
K_index_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_K_index);
% Humidity map: closing sentinel row (999 0.15) added so that mixing ratios
% above 0.03 keep the last interest value instead of returning NaN
iMap = [-999 0.02; 0 0.02; 0.0025 0.10; 0.0045 0.30; 0.01 0.15; 0.03 0.15; 999 0.15];
Humidity_Interest = interp1(iMap(:,1), iMap(:,2), RUC_Humidity_mixing_ratio);
% EDR estimate: weighted sum of the interests minus a constant offset
EDR = 0.58 * Ltg_Interest + 0.25 * CTDist_Interest + 0.01 * CT_Interest ...
    + 0.23 * SF_Interest + 0.064 * Precip_Interest + 0.41 * Pressure_Interest ...
    + 0.01 * Lapse_Interest + 0.12 * Ri_Interest + 0.08 * Vorticity_Interest ...
    + 0.12 * K_index_Interest + 0.30 * Humidity_Interest - 0.03;
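Each interest map in the initial algorithm is a piecewise-linear lookup. A minimal Python equivalent of one `interp1` call, shown with two of the maps copied from the algorithm; the names `interest`, `LTG_MAP`, and `K_INDEX_MAP` are hypothetical. One design difference: this sketch clamps inputs outside the breakpoints to the end values, whereas MATLAB's `interp1` returns NaN there by default, which is why the MATLAB maps carry sentinel rows.

```python
from bisect import bisect_right
from typing import List, Tuple

# Two maps copied from the initial DCIT algorithm: (input value, interest)
LTG_MAP = [(-999, 0.02), (0.5, 0.02), (1, 0.1), (200, 2), (999, 2)]
K_INDEX_MAP = [(-999, 0), (-50, 0), (-10, 0.15), (12, 0.10), (40, 0.15), (60, 0), (999, 0)]


def interest(imap: List[Tuple[float, float]], x: float) -> float:
    """Piecewise-linear interest lookup, like interp1 with the default 'linear'.

    Inputs beyond the first/last breakpoint are clamped to the end values.
    """
    xs = [p[0] for p in imap]
    ys = [p[1] for p in imap]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1  # segment with xs[i] <= x < xs[i + 1]
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])


print(interest(LTG_MAP, 0), interest(LTG_MAP, 1))  # 0.02 0.1
print(interest(K_INDEX_MAP, 1))                    # about 0.125
```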
Future work
• Build “conceptual models” that relate observations to expected turbulence generation/propagation
• Identify regimes where different combination logic is appropriate (use clustering techniques to refine)
• Analyze multi-variable dependence (joint PDFs, principal components, etc.) and create multi-dimensional interest maps
• Separate the diagnostics that directly predict CIT from those that identify conditions conducive to CIT
• Construct a modified Takagi-Sugeno style algorithm and tune it to optimize performance
  • Identify regime memberships
  • Use RF or FL prediction logic for each regime
  • Perform a regime-weighted combination
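The Takagi-Sugeno style combination proposed above can be sketched as a membership-weighted average of per-regime predictions. In this illustration the regimes are two overlapping altitude bands and each regime predictor is a constant; in the proposed algorithm each would be an RF or FL predictor. All names, the 20,000 to 30,000 ft ramp, and the output values are hypothetical.

```python
from typing import Callable, Dict, Sequence

Func = Callable[[Sequence[float]], float]


def regime_weighted_prediction(x: Sequence[float],
                               memberships: Dict[str, Func],
                               predictors: Dict[str, Func]) -> float:
    """Weight each regime's prediction by x's membership (0..1) in that regime."""
    weights = {r: memberships[r](x) for r in memberships}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("x has zero membership in every regime")
    return sum(weights[r] * predictors[r](x) for r in weights) / total


def upper(x: Sequence[float]) -> float:
    """Membership in the upper-altitude regime: ramps from 0 at 20 kft to 1 at 30 kft."""
    return min(1.0, max(0.0, (x[0] - 20000.0) / 10000.0))


def lower(x: Sequence[float]) -> float:
    return 1.0 - upper(x)


memberships = {"upper": upper, "lower": lower}
predictors = {"upper": lambda x: 0.3, "lower": lambda x: 0.1}
print(regime_weighted_prediction([25000.0], memberships, predictors))  # about 0.2
```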
Challenges for this approach
• Empirical techniques may inadvertently exploit idiosyncrasies of the data rather than fundamental physics
  • results may be sensitive to the precise method used to produce the diagnostics
  • it is important to obtain large datasets with mature diagnostic algorithms
• Training data may not be truly representative of the atmosphere
  • pilots attempt to avoid storms and regions of reported turbulence
  • data may need to be weighted based on density/sparseness
• How does one judge the usefulness of sparsely available diagnostic fields?
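One simple way to weight data by density is to give each sample a weight inversely proportional to the number of samples in its cell, so that densely flown regions do not dominate training. A minimal sketch; the cell definition (for example a latitude/longitude/altitude bin) and the name `inverse_density_weights` are assumptions, not the author's method.

```python
from collections import Counter
from typing import Hashable, List


def inverse_density_weights(cells: List[Hashable]) -> List[float]:
    """Weight each sample by 1 / (samples in its cell), normalized to mean 1.

    Sparsely sampled cells are up-weighted; densely sampled ones down-weighted.
    """
    counts = Counter(cells)
    raw = [1.0 / counts[c] for c in cells]
    scale = len(raw) / sum(raw)  # keep the average weight at 1
    return [w * scale for w in raw]


weights = inverse_density_weights(["A", "A", "A", "B"])
print(weights)  # each "A" sample gets about 0.667, the lone "B" sample gets 2.0
```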
Available RUC and derived fields
Convective parameters: CAPE, CIN, Showalter Index, Totals Indices, Lifted Index, Precipitable Water, SWEAT (Severe Wx Threat Index), K-Index, Bulk Richardson Number
Turbulence indices: Richardson Number, Lapse Rate, DTF3 (Diagnostic TKE Formulations), Vertical Shear, Horizontal Shear, 1/Stability, EDR (structure-function-derived eddy dissipation rate), SIGW (structure-function-derived sigma vertical velocity), Divergence, Vorticity, Dutton, NCSUI (N.C. State U. Index), Colson-Panofsky, Ellrod1, Saturated Richardson Number, Frontogenesis Function, LAZ (Laikhman-Alter-Zalik), NGM1 and NGM2, ABSIA, UBF (Unbalanced Flow), NVA (Negative Vorticity Advection), Tropopause Height, Wind Speed
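The Richardson number (the input to `Ri_Interest` in the initial algorithm) compares static stability with vertical wind shear: Ri = N² / S², with N² = (g/θ) ∂θ/∂z and S² = (∂u/∂z)² + (∂v/∂z)². A minimal two-level finite-difference sketch using dry potential temperature; exactly how the RUC-derived field is computed (which levels, dry or moist θ) is not stated here, so those choices are assumptions.

```python
import math

G = 9.80665  # standard gravity, m s^-2


def richardson_number(theta1: float, theta2: float, u1: float, u2: float,
                      v1: float, v2: float, z1: float, z2: float) -> float:
    """Gradient Richardson number between two levels (finite differences).

    theta in K, winds in m/s, heights in m.
    """
    dz = z2 - z1
    if dz == 0:
        raise ValueError("levels must differ in height")
    theta_mean = 0.5 * (theta1 + theta2)
    n2 = (G / theta_mean) * (theta2 - theta1) / dz  # Brunt-Vaisala frequency squared
    s2 = ((u2 - u1) / dz) ** 2 + ((v2 - v1) / dz) ** 2  # squared vertical shear
    if s2 == 0:
        # No shear: Ri is unbounded, with the sign of the stability term
        return math.copysign(math.inf, n2) if n2 != 0 else math.nan
    return n2 / s2


# 3 K of warming and 10 m/s of shear over 1 km
print(richardson_number(300.0, 303.0, 0.0, 10.0, 0.0, 0.0, 0.0, 1000.0))  # about 0.98
```

Small Ri (commonly below about 0.25) indicates that shear can overcome stability, which is why low Ri values are associated with turbulence.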
Learning a Predictive Algorithm: Random Forests
• Basic idea
  • “grow” multiple decision trees to predict turbulence based on “dartboard” values, each using a random subset of the data (“bagging”) and random splitting variables
  • trees function as “ensembles of experts”
  • trees “vote” to determine the consensus categorization; the votes also form a “probability distribution” over classes
[Illustration: five trees vote 4, 2, 4, 4, 1 => consensus vote: 4 (“confidence” 3/5)]
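The voting example can be reproduced in a few lines: count each tree's class vote, take the most common class as the consensus, and report its share of the votes as the "confidence." A minimal sketch; the function name `consensus` is hypothetical, and ties are broken in favor of the class seen first.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consensus(votes: List[int]) -> Tuple[int, float, Dict[int, float]]:
    """Majority class, its vote fraction ("confidence"), and the vote distribution."""
    counts = Counter(votes)
    # most_common orders equal counts by first occurrence, which breaks ties
    winner, n_win = counts.most_common(1)[0]
    dist = {cls: n / len(votes) for cls, n in sorted(counts.items())}
    return winner, n_win / len(votes), dist


print(consensus([4, 2, 4, 4, 1]))  # (4, 0.6, {1: 0.2, 2: 0.2, 4: 0.6})
```

Averaging these vote distributions across cases, or thresholding the vote fraction for one class, is what produces the probabilistic output and the ROC curves described earlier.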