
Status of NCAR Statistical Evaluation and Prototype CIT Algorithm Development


Presentation Transcript


  1. Status of NCAR Statistical Evaluation and Prototype CIT Algorithm Development
  John Williams
  CIT Mini-Workshop 4, August 10, 2007

  2. FAA-NASA DCIT algorithm role in GTGN
  DCIT inputs: NWP fields, lightning, NEXRAD reflectivity, satellite data (IR, winds, CTCR, CType, OT mask, …)
  GTGN combines DCIT with GTG diagnostics, NTDA grids, in-situ reports, PIREPs, MDCRS winds and temperatures, and METARs
  GTGN output goes to cockpit displays or alerts and to ADDS (dispatch, ATC, etc.)
  Graphic courtesy of virtualskies.arc.nasa.gov

  3. DCIT domain of applicability
  DCIT provides turbulence diagnosis within some radius (100 nmi?) around convection, with diminishing "confidence" for increasing distance
  Diagram regions: NTDA, DCIT, clear air

  4. DCIT/GTGN schedule
  • DCIT version 0.1 (August 1, 2007): very basic FL algorithm using RUC, GOES IR and NLDN lightning on the 13-km RUC grid
  • DCIT version 1.0 (September 30, 2007): additional predictors, altitude regimes, 4-km grid
  • DCIT version 2.0 (September 30, 2008): near-final version, described on the next slide
  • DCIT version 3.0, final (September 30, 2009)
  • GTGN receives Aviation Weather Tech Transfer (AWTT) "experimental" approval: November 2009
  • GTGN receives AWTT "operational" approval: November 2010

  5. DCIT 2.0 high-level overview
  • NWP 1-hr forecast: regrid to 4 km and advect forward based on winds (NWP-derived fields)
  • NWP 2-hr forecast: regrid to 4 km and advect backward based on winds (NWP-derived fields)
  • Temporal interpolation between the two advected forecasts (sketched below)
  • Satellite IR and satellite products: advect forward based on winds/storm tracks
  • Lightning: lightning density, advected forward based on storm tracks
  • NTDA EDR mosaic and NSSL DBZ mosaic: advect forward based on winds/storm tracks
  • Object ID, distance calculations, conceptual models
  • FL or RF synthesis: deterministic and probabilistic turbulence and confidence outputs on a 4-km Lambert conformal grid
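
  The temporal interpolation step can be thought of as a time-weighted blend of the forward-advected 1-hr forecast and the backward-advected 2-hr forecast. The sketch below is an illustrative assumption (linear weighting, synthetic stand-in grids), not the DCIT implementation.

  % Minimal sketch (assumption): temporal interpolation as a time-weighted blend
  % of the advected 1-hr and 2-hr NWP forecasts on the 4-km grid.
  advectedFcst1hr = rand(100);          % stand-in for the forward-advected 1-hr forecast
  advectedFcst2hr = rand(100);          % stand-in for the backward-advected 2-hr forecast
  t1 = 1; t2 = 2;                       % forecast valid times (hours)
  t  = 1.25;                            % desired valid time (hypothetical)
  w  = (t2 - t) / (t2 - t1);            % linear weight toward the earlier forecast
  w  = min(max(w, 0), 1);               % clamp to [0, 1]
  blended = w * advectedFcst1hr + (1 - w) * advectedFcst2hr;   % element-wise blend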

  6. Evaluating candidate diagnostics
  • Collect diagnostic data associated with in situ EDR reports from commercial aircraft
  • Sample data into training and testing subsets (each drawn from distinct days)
  • Train a Random Forest (machine learning "black box") and evaluate first-guess diagnostic importance (see the sketch below)
  • Add or remove diagnostics and retrain the Random Forest to produce a reduced set of strong candidates
  • Analyze different "regimes" (e.g., altitude bands) separately
  • Analyze the diagnostics' relationship to turbulence and develop a fuzzy logic algorithm
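
  A minimal sketch of the training step, assuming MATLAB's Statistics Toolbox TreeBagger class (the importance option is named 'OOBPredictorImportance' in newer releases). The arrays X, y, and dayIndex are synthetic stand-ins for the diagnostics, EDR categories, and report days; this is not the DCIT code.

  % Sketch: day-based train/test split, then a bagged-tree (random forest) classifier.
  nReports = 2000; nDiagnostics = 10;
  X = randn(nReports, nDiagnostics);                 % stand-in diagnostic values
  y = double(X(:,1) + 0.5*randn(nReports,1) > 1);    % stand-in turbulence category (0/1)
  dayIndex = randi(30, nReports, 1);                 % stand-in day of each report
  trainDays = 1:2:30;                                % training days distinct from testing days
  isTrain = ismember(dayIndex, trainDays);
  rf = TreeBagger(100, X(isTrain,:), y(isTrain), ...
      'Method', 'classification', 'OOBVarImp', 'on');
  [predLabels, scores] = predict(rf, X(~isTrain,:)); % per-class vote fractions on test days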

  7. Previous results (all altitudes)
  • Vary the threshold on the average of random forest "votes" to generate ROC curves (PoD vs. FAR); see the sketch below
  • ROC curves compare performance using RUC model data only against the top 38 variables (model, radar, satellite) → real-time T-storm data adds skill
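
  A sketch of how such a curve can be traced by sweeping a threshold on the random-forest vote fraction. The votes and truth arrays are hypothetical stand-ins; with the TreeBagger sketch above, votes would be the "turbulence" column of the predicted scores, and FAR is taken here as false alarms divided by observed nulls.

  % Sketch: PoD vs. FAR curve by varying the threshold on RF vote fractions.
  truth = rand(1000,1) < 0.1;                        % stand-in observed turbulence flags
  votes = 0.7*truth + 0.3*rand(1000,1);              % stand-in vote fractions
  thresholds = 0:0.02:1;
  pod = zeros(size(thresholds)); far = zeros(size(thresholds));
  for k = 1:numel(thresholds)
      warn = votes >= thresholds(k);
      pod(k) = sum(warn & truth)  / max(sum(truth), 1);     % probability of detection
      far(k) = sum(warn & ~truth) / max(sum(~truth), 1);    % false alarm rate
  end
  plot(far, pod); xlabel('FAR'); ylabel('PoD'); title('ROC curve');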

  8. Variable importance ranking • Automatically produced by random forest software
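
  One way such a ranking could be produced, assuming the TreeBagger forest from the slide-6 sketch was trained with out-of-bag permutation importance enabled; the property and option names shown are from older MATLAB releases (newer ones use 'OOBPredictorImportance' and OOBPermutedPredictorDeltaError), and varNames is a hypothetical list of diagnostic names.

  % Sketch: rank predictors by out-of-bag permutation importance.
  imp = rf.OOBPermutedVarDeltaError;                 % one importance score per predictor
  [~, order] = sort(imp, 'descend');
  varNames = arrayfun(@(k) sprintf('diagnostic_%02d', k), 1:numel(imp), 'UniformOutput', false);
  for k = 1:numel(order)
      fprintf('%2d. %-20s %.3f\n', k, varNames{order(k)}, imp(order(k)));
  end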

  9. CTCR data analysis
  • Used 283 days of CTCR data provided by CIMSS for summers 2005 and 2006 over an upper Midwest domain
  • 1.6 million EDR reports above 15,000 ft
  • Computed mean, max, standard deviation, and number of CTCR data points within 5, 10, 20 and 40 km, as well as wind speed and direction at the nearest pixel (see the sketch below)
  • Repeated the above for 15-minute and 30-minute lags
  • Included GOES IR, RUC, RUC-derived diagnostics, lightning strike data, and some radar data
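
  A sketch of the kind of neighborhood statistics described here, for one EDR report and one radius. The flat-earth distance approximation and the synthetic CTCR pixels are illustrative assumptions, not the processing actually used.

  % Sketch: mean, max, std dev, and count of CTCR values within 20 km of a report.
  ctcrLat = 40 + randn(5000,1); ctcrLon = -95 + randn(5000,1);   % stand-in pixel locations
  ctcrVal = rand(5000,1);                                        % stand-in CTCR values
  rptLat = 40.2; rptLon = -94.8;                                 % one EDR report location
  radiusKm = 20;
  kmPerDeg = 111.2;
  dx = (ctcrLon - rptLon) * kmPerDeg * cosd(rptLat);             % flat-earth approximation
  dy = (ctcrLat - rptLat) * kmPerDeg;
  inRange = dx.^2 + dy.^2 <= radiusKm^2;
  ctcrCount = sum(inRange);                                      % number of points within 20 km
  ctcrMean  = mean(ctcrVal(inRange));
  ctcrMax   = max(ctcrVal(inRange));
  ctcrStd   = std(ctcrVal(inRange));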

  10. Do CTCR data contain useful info?
  Comparing Random Forest performance without and with CTCR data → CTCR data appear to provide a slight performance improvement for LOG (light-or-greater) and MOG (moderate-or-greater) EDR discrimination

  11. What variables are most important? • Ranking produced by random forest software

  12. GOES wind direction, exists at 86,000 pts (conditional histograms with mean, median, and 90th percentile)

  13. CTCR: Number of points within 40 km (conditional histograms with mean, median, and 90th percentile)

  14. CTCR: Mean within 20 km (conditional histograms with mean, median, and 90th percentile)

  15. CTCR: Mean within 20 km where |Temp – IR| < 20C (conditional histograms with mean, median, and 90th percentile)

  16. CTCR: Standard deviation within 20 km (conditional histograms with mean, median, and 90th percentile)

  17. CTCR: GOES wind speed (conditional histograms with mean, median, and 90th percentile)

  18. For comparison: RUC wind speed (conditional histograms with mean, median, and 90th percentile)

  19. CTCR: GOES – RUC wind speed (conditional histograms with mean, median, and 90th percentile)

  20. GOES IR – aircraft temperature (conditional histograms with mean, median, and 90th percentile)

  21. Constructing fuzzy-logic interest maps
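
  The interest maps used in the slide-24 algorithm are piecewise-linear lookups built with interp1. The sketch below shows one such map with illustrative breakpoints rather than the tuned DCIT values.

  % Sketch: a fuzzy-logic interest map as a piecewise-linear lookup (interp1 style).
  iMap = [-999 0.0; 0 0.0; 10 0.5; 30 1.0; 999 1.0];   % rows are [predictor value, interest]
  predictorValue = 18;                                 % hypothetical predictor sample
  interest = interp1(iMap(:,1), iMap(:,2), predictorValue)   % -> 0.7 for this example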

  22. "Consensus" combination
  Diagram of interest values f1, f2, f3 or f4 being combined; caption: "These factors suppress turbulence diagnosis"

  23. Fuzzy logic tuning to optimize weights
  Given inputs x_i and desired outputs y_i, evaluate a function f with parameters w to produce outputs f(x_i), then adjust w to reduce the error (see the sketch below)
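
  Because the slide-24 combination is linear in the weights, one simple way to "adjust w to reduce error" is linear least squares on a training set; this sketch is an assumption about the tuning approach, not the method actually used, and F and edrObs are synthetic stand-ins for the interest values f(x_i) and the observed in situ EDR.

  % Sketch: fit combination weights (plus a constant offset) by linear least squares.
  nReports = 500; nInterests = 11;
  F = rand(nReports, nInterests);                             % stand-in interest values
  edrObs = F * rand(nInterests,1) + 0.05*randn(nReports,1);   % stand-in observed EDR
  A = [F, ones(nReports,1)];                                  % extra column for the offset
  w = A \ edrObs;                                             % least-squares weights
  rmse = sqrt(mean((A*w - edrObs).^2))                        % training error after tuning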

  24. Initial DCIT algorithm
  % Each iMap row is [predictor value, interest]; interp1 applies the map as a
  % piecewise-linear interest function of the named predictor.
  % Lightning strikes
  iMap = [-999 0.02; 0.5 0.02; 1 0.1; 200 2; 999 2];
  Ltg_Interest = interp1(iMap(:,1), iMap(:,2), Ltg_Strikes);
  % Channel 4 IR minus temperature (cloud-top distance proxy)
  iMap = [-999 0.45; -105 0.45; 20 0.08; 40 0.04; 80 0.02; 999 0.02];
  CTDist_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Channel_4_infrared - Linear_middle_temperature);
  % Channel 4 IR (cloud-top temperature)
  iMap = [-999 0.15; -75 0.15; -25 0.08; 75 0.05; 999 0.02];
  CT_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Channel_4_infrared);
  % Structure-function-derived EDR (cube root)
  iMap = [-999 0.02; 0.001 0.02; 2.0 2.0; 999 2.0];
  SF_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Structure_function_derived_eddy_dissipation_rate.^(1/3));
  % RUC convective precipitation
  iMap = [-999999 0.02; 0.0 0.02; 1.0 0.08; 25 0.45; 999999 0.45];
  Precip_Interest = interp1(iMap(:,1), iMap(:,2), RUC_Convective_precipitation);
  % Mean sea level pressure minus pressure
  iMap = [-999 0.15; 1.25e4 0.35; 5.5e4 0.05; 8.65e4 0.02; 999e4 0.02];
  Pressure_Interest = interp1(iMap(:,1), iMap(:,2), RUC_Mean_sea_level_pressure - RUC_Pressure);
  % Lapse rate
  iMap = [-999 0.25; -0.05 0.25; 0.004 0.15; 0.008 0.02; 0.05 0.05; 0.08 0.02; 999 0.02];
  Lapse_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Lapse_rate);
  % Richardson number
  iMap = [-999999 0.02; -2 0.02; 4 0.25; 600 0.40; 999999 0.40];
  Ri_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Richardson_number);
  % Vorticity
  iMap = [-999 0.75; -7e-4 0.75; -1e-4 0.07; -0.5e-4 0.02; 2.3e-4 0.07; 8e-4 0.75; 999 0.75];
  Vorticity_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_Vorticity);
  % K index
  iMap = [-999 0; -50 0; -10 0.15; 12 0.10; 40 0.15; 60 0; 999 0];
  K_index_Interest = interp1(iMap(:,1), iMap(:,2), RUC_derived_K_index);
  % Humidity mixing ratio
  iMap = [-999 0.02; 0 0.02; 0.0025 0.10; 0.0045 0.30; 0.01 0.15; 0.03 0.15];
  Humidity_Interest = interp1(iMap(:,1), iMap(:,2), RUC_Humidity_mixing_ratio);
  % Weighted-sum combination of the interest values into an EDR estimate
  EDR = 0.58 * Ltg_Interest + 0.25 * CTDist_Interest + 0.01 * CT_Interest + ...
        0.23 * SF_Interest + 0.064 * Precip_Interest + 0.41 * Pressure_Interest + ...
        0.01 * Lapse_Interest + 0.12 * Ri_Interest + 0.08 * Vorticity_Interest + ...
        0.12 * K_index_Interest + 0.30 * Humidity_Interest - 0.03;

  25. Future work
  • Build "conceptual models" that relate observations to expected turbulence generation/propagation
  • Identify regimes where different combination logic is appropriate (use clustering techniques to refine)
  • Analyze multi-variable dependence (joint PDFs, principal components, etc.) and create multi-dimensional interest maps
  • Separate the diagnostics that directly predict CIT from those that identify conditions conducive to CIT
  • Construct a modified Takagi-Sugeno style algorithm and tune it to optimize performance: identify regime memberships, use RF or FL prediction logic for each regime, and perform a regime-weighted combination

  26. Challenges for this approach
  • Empirical techniques may inadvertently exploit idiosyncrasies of the data, not fundamental physics
    • may be sensitive to the precise method used for producing diagnostics
    • important to get large datasets with mature diagnostic algorithms
  • Training data may not be truly representative of the atmosphere
    • pilots attempt to avoid storms and regions of reported turbulence
    • may need to weight data based on density/sparseness
  • How does one judge the usefulness of sparsely available diagnostic fields?

  27. Extra Slides

  28. Available RUC and derived fields: Convective Parameters and Turbulence Indices • CAPE • CIN • Showalter Index • Totals Indices • Lifted Index • Precipitable Water • SWEAT (Severe Wx Threat Index) • K-Index • Bulk Richardson Number • Richardson Number • Lapse Rate • DTF3 (Diagnostic TKE Formulations) • Vertical Shear • Horizontal Shear • 1/Stability • EDR (Structure Function derived Eddy Dissipation Rates) • SIGW (Structure Function derived Sigma Vertical Velocity) • Divergence • Vorticity • Dutton • NCSUI (N.C. State U. Index) • Colson-Panofsky • Ellrod1 • Saturated Richardson Number • Frontogenesis Function • LAZ (Laikhman-Alter-Zalik) • NGM1 and NGM2 • ABSIA • UBF (Unbalanced Flow) • NVA (Negative Vorticity Advection) • Tropopause Height • Wind Speed

  29. Learning a Predictive Algorithm: Random Forests
  • Basic idea: "grow" multiple decision trees to predict turbulence based on "dartboard" values, each using a random subset of data ("bagging") and random splitting variables
  • Trees function as "ensembles of experts"
  • Trees "vote" to determine the consensus categorization; they also create a "probability distribution" over classes
  • Example: votes of 4, 2, 4, 4, 1 from five trees give consensus vote 4 ("confidence" 3/5)
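
  The five-tree voting example on this slide can be reproduced in a few lines; the "confidence" is taken here, as on the slide, to be the winning vote fraction.

  % Sketch: consensus categorization and "confidence" from individual tree votes.
  votes = [4 2 4 4 1];                                     % example votes from five trees
  consensus = mode(votes);                                 % most common category -> 4
  confidence = sum(votes == consensus) / numel(votes);     % winning vote fraction -> 3/5
  fprintf('consensus vote: %d ("confidence" %d/%d)\n', consensus, sum(votes == consensus), numel(votes));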
