Improving HYSPLIT Forecasts with Data Assimilation*. Kostas Kalpakis Associate Professor Computer Science and Electrical Engineering Department University of Maryland Baltimore County April 5, 2011 Joint work with Shiming Yang and Yaacov Yesha. * Supported in part by an IBM grant.
Outline • Introduction • Motivation and Goal • Data Assimilation • Our approach • State-Space Models • The NOAA HYSPLIT Model • The LETKF Algorithm • Experiments and Evaluation • CAPTEX • California wildfires, August 2009 • Summary
Motivation • High-volume, real-time sensor data streams for monitoring and forecasting applications are becoming ubiquitous • Model predictions need to be reconciled with real-time observations • Demand for environmental monitoring and hazard prediction is pressing • Measurements from the thousands of sensors that underlie IBM's "Smarter Planet" initiative need to be incorporated into models of various geophysical processes
Goals • Our goals are to • incorporate a data assimilation capability into HYSPLIT • HYSPLIT is used routinely to produce many data products • use in-situ and remotely sensed observations to improve forecasts • apply the approach to wildfire smoke prediction and monitoring • develop an efficient data assimilation system for distributed high-performance platforms using the SPADE framework of IBM InfoSphere Streams
Data assimilation • Data assimilation is a set of techniques that • incorporate real-world observations into the model's analysis and forecast cycle • help limit model error growth (through small corrections over short-range forecasts) • improve the estimate of the model's initial conditions for the next forecast cycle
The state-space model • Model a system by • Where
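A common way to write such a discrete-time state-space model is shown below. The notation is assumed here and is not necessarily the authors': x_k is the system state at time k, M_k the (possibly nonlinear) model operator, H_k the observation operator, y_k the observations, and η_k, ε_k are Gaussian model and observation noise with covariances Q_k and R_k.

```latex
\begin{aligned}
x_k &= M_k(x_{k-1}) + \eta_k, & \eta_k &\sim \mathcal{N}(0, Q_k),\\
y_k &= H_k(x_k) + \varepsilon_k, & \varepsilon_k &\sim \mathcal{N}(0, R_k).
\end{aligned}
```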
Data assimilation in state-space • Data assimilation becomes an estimation problem • Find a maximum likelihood estimate of the trajectory of the system states given a set of observations • The problem reduces to minimizing a cost function • Kalman filters, a recursive method, minimize this cost function efficiently when the state space is low-dimensional, the model and observation operators are linear, and the noise processes are Gaussian • Otherwise, the problem is often computationally difficult
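Under the Gaussian assumptions, a standard form of this cost function is shown below (a weak-constraint formulation). The notation is assumed: x_0^b is a background estimate of the initial state with error covariance B, M_k and H_k are the model and observation operators, and Q_k and R_k are the model-error and observation-error covariances. Minimizing J is equivalent to maximizing the likelihood of the trajectory x_0, …, x_T given the observations y_0, …, y_T.

```latex
\begin{aligned}
J(x_0,\dots,x_T) ={}& \tfrac{1}{2}\,(x_0 - x_0^b)^\top B^{-1}(x_0 - x_0^b) \\
&+ \tfrac{1}{2}\sum_{k=1}^{T}\big(x_k - M_k(x_{k-1})\big)^\top Q_k^{-1}\big(x_k - M_k(x_{k-1})\big) \\
&+ \tfrac{1}{2}\sum_{k=0}^{T}\big(y_k - H_k(x_k)\big)^\top R_k^{-1}\big(y_k - H_k(x_k)\big)
\end{aligned}
```

For linear M_k and H_k, the Kalman filter's estimate at the final time T coincides with the final state of this minimizer.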
Data assimilation via Kalman filters • [Figure: graphical view of data assimilation using Kalman filters, showing background states, observations, and analysis states along the time axis]
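The forecast/analysis cycle of a Kalman filter can be sketched for the simplest possible case: a scalar linear model with Gaussian noise. Each forecast produces a background state, and each observation then produces an analysis state. The function name and the parameters a (model), h (observation operator), q and r (noise variances) are illustrative assumptions. HYSPLIT's state is neither scalar nor linear.

```python
def kalman_cycle(x0, p0, observations, a=1.0, q=0.0, h=1.0, r=1.0):
    """Run forecast/analysis cycles for a scalar linear Gaussian model.

    Model:       x_k = a * x_{k-1} + eta_k,  eta_k ~ N(0, q)
    Observation: y_k = h * x_k + eps_k,      eps_k ~ N(0, r)
    Returns a list of (background, analysis, analysis_variance) per cycle.
    """
    x, p = x0, p0
    history = []
    for y in observations:
        # Forecast: propagate the previous analysis to get the background state.
        xb = a * x
        pb = a * p * a + q
        # Analysis: correct the background with the observation, weighted by the gain.
        gain = pb * h / (h * pb * h + r)
        x = xb + gain * (y - h * xb)
        p = (1.0 - gain * h) * pb
        history.append((xb, x, p))
    return history


if __name__ == "__main__":
    for background, analysis, variance in kalman_cycle(0.0, 1.0, [2.0, 2.0]):
        print(f"background={background:.3f} analysis={analysis:.3f} var={variance:.3f}")
```

With equal background and observation variances, the first analysis lands halfway between background and observation. The shrinking variance then gives later observations progressively less weight.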
The NOAA HYSPLIT Model • HYSPLIT • Hybrid Single Particle Lagrangian Integrated Trajectory Model • A model system that computes air parcel trajectories and the dispersion and deposition of pollutants • Computes dispersion with either a puff model or a particle model • Requires meteorological data and emission source information • Has been validated against ground-truth observations* • Used routinely to produce various data products • Air Quality Index (AQI) • Smoke Forecast System (SFS) *R. R. Draxler, J. L. Heffter, and G. D. Rolph, "DATEM: Data Archive of Tracer Experiments and Meteorology", August 2001, http://www.arl.noaa.gov/DATEM.php, last checked July 2010.
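As a rough picture of what a Lagrangian particle model does, the sketch below moves mass-carrying particles with a uniform wind plus a Gaussian random-walk displacement standing in for turbulence. This is a generic textbook random-walk scheme under an assumed constant wind (u, v) and horizontal diffusivity kh. It is not HYSPLIT's turbulence parameterization, and the function name is hypothetical.

```python
import math
import random


def advect_particles(particles, u, v, kh, dt, nsteps, seed=None):
    """Advance (x, y, mass) particles by nsteps time steps of length dt.

    Each step adds the mean-wind displacement (u*dt, v*dt) plus an
    independent Gaussian displacement with variance 2*kh*dt per axis.
    Masses are carried unchanged (no deposition or decay is modeled).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * kh * dt)
    moved = []
    for x, y, mass in particles:
        for _ in range(nsteps):
            x += u * dt + sigma * rng.gauss(0.0, 1.0)
            y += v * dt + sigma * rng.gauss(0.0, 1.0)
        moved.append((x, y, mass))
    return moved


if __name__ == "__main__":
    # Hypothetical release: 3 particles, 5 m/s westerly wind, kh = 50 m^2/s, 60 s steps.
    cloud = advect_particles([(0.0, 0.0, 1.0)] * 3, 5.0, 0.0, 50.0, 60.0, 10, seed=42)
    print(cloud)
```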
Data assimilation for HYSPLIT • Use HYSPLIT as the model operator in a state-space model and assimilate observations into HYSPLIT • First, we need to carefully define the system state, so that we can extract it, modify it, and restart HYSPLIT • Second, since the model operator is nonlinear and the system state is very large, a standard extended Kalman filter is expensive: it requires a linearized model and must propagate a full state-by-state error covariance matrix • We use the Local Ensemble Transform Kalman Filter (LETKF) algorithm
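To see why the state size matters, compare two storage costs. An extended Kalman filter stores a dense n × n error covariance, while an ensemble filter stores only K states of dimension n. The sizes below (n = 10^6 state variables, K = 40 members, 8-byte doubles) are hypothetical and chosen only for illustration.

```python
def covariance_bytes(n: int, bytes_per_value: int = 8) -> int:
    """Memory for a dense n x n error covariance matrix (extended Kalman filter)."""
    return n * n * bytes_per_value


def ensemble_bytes(n: int, k: int, bytes_per_value: int = 8) -> int:
    """Memory for k ensemble members, each an n-dimensional state vector."""
    return n * k * bytes_per_value


if __name__ == "__main__":
    n, k = 1_000_000, 40  # hypothetical state size and ensemble size
    print(f"full covariance: {covariance_bytes(n) / 1e12:.1f} TB")
    print(f"ensemble:        {ensemble_bytes(n, k) / 1e6:.0f} MB")
```

Under these assumptions the full covariance needs about 8 TB, while the ensemble needs about 320 MB.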
Data assimilation for HYSPLIT • Use • the masses of the particles in HYSPLIT as the system state • the mapping from particle masses to grid concentrations as the default observation operator
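A particle-to-grid observation operator of this kind can be sketched as follows. Particle masses are summed into the grid cell containing each particle and divided by the cell volume. Each station then reads the concentration of the cell it falls in. The single vertical layer, the nearest-cell sampling, and the function names are simplifying assumptions. HYSPLIT's concentration grid also handles multiple levels and averaging periods.

```python
from collections import defaultdict


def grid_concentration(particles, x0, y0, dx, dy, dz):
    """Sum (x, y, mass) particles into grid cells of one layer of depth dz.

    Returns a dict mapping cell index (i, j) to concentration (mass / volume).
    """
    volume = dx * dy * dz
    conc = defaultdict(float)
    for x, y, mass in particles:
        i = int((x - x0) // dx)
        j = int((y - y0) // dy)
        conc[(i, j)] += mass / volume
    return conc


def observe(conc, stations, x0, y0, dx, dy):
    """Observation operator H: concentration of the cell containing each station."""
    return [conc.get((int((sx - x0) // dx), int((sy - y0) // dy)), 0.0)
            for sx, sy in stations]


if __name__ == "__main__":
    parts = [(0.5, 0.5, 2.0), (0.7, 0.2, 4.0), (1.5, 0.5, 1.0)]
    grid = grid_concentration(parts, 0.0, 0.0, 1.0, 1.0, 2.0)
    print(observe(grid, [(0.1, 0.9), (1.2, 0.3), (5.0, 5.0)], 0.0, 0.0, 1.0, 1.0))
```

Because each concentration is a sum of particle masses divided by a fixed volume, this operator is linear in the mass state. That matches the linear observation operators LETKF assumes here.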
LETKF Algorithm • LETKF (Local Ensemble Transform Kalman Filter)* • supports nonlinear model operators and linear observation operators • assumes Gaussian state and observation noise processes • reduces implementation costs, since it does not need adjoints • performs the analysis locally, in the ensemble space • which is typically of low dimension (< 100) • avoids inverting large matrices • is embarrassingly parallel • We have implemented LETKF in C with MPI, and in IBM InfoSphere Streams *Brian Hunt, Eric Kostelich, and Istvan Szunyogh, "Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter", Physica D 230, pp. 112-126, 2007.
The LETKF Algorithm • Global steps: maintain an ensemble of K system states • Forward system state: • Analysis: construct background analysis ensemble , background observation ensemble , and their mean and covariance matrices • Local steps: for each grid point, choose local observation and background system state. Then calculate: • Analysis error covariance: • Perturbation: • Analysis ensemble in ensemble space: • Analysis ensemble in state space:
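The local analysis steps on this slide follow Hunt et al. (2007). Below is a minimal NumPy sketch of one local analysis. It assumes the following have already been selected:

* the local background ensemble Xb (n × K);
* its image in observation space Yb (p × K);
* the local observations yo and their error covariance R.

The function name and the multiplicative inflation factor rho are our own choices. X^b and Y^b denote the background perturbations (members minus their mean), and C = (Y^b)^T R^{-1}. The steps are:

* analysis error covariance: P̃^a = [(K−1)I/ρ + C Y^b]^{-1};
* perturbation weights: W^a = [(K−1)P̃^a]^{1/2} (symmetric square root);
* mean weights: w̄^a = P̃^a C (y^o − ȳ^b);
* analysis ensemble in ensemble space: w^a(i) = W^a(i) + w̄^a;
* analysis ensemble in state space: x^a(i) = x̄^b + X^b w^a(i).

```python
import numpy as np


def letkf_local_analysis(Xb, Yb, yo, R, rho=1.0):
    """One LETKF analysis for a single local region (after Hunt et al., 2007).

    Xb : (n, K) local background ensemble in state space
    Yb : (p, K) background ensemble in observation space, Yb[:, i] = H(Xb[:, i])
    yo : (p,)   local observations
    R  : (p, p) observation-error covariance (symmetric positive definite)
    rho: multiplicative covariance inflation (1.0 means none)
    Returns the (n, K) analysis ensemble.
    """
    n, K = Xb.shape
    xb_mean = Xb.mean(axis=1)
    yb_mean = Yb.mean(axis=1)
    Xp = Xb - xb_mean[:, None]  # background perturbations X^b
    Yp = Yb - yb_mean[:, None]  # observation-space perturbations Y^b

    C = np.linalg.solve(R, Yp).T  # C = (Y^b)^T R^{-1}, shape (K, p)

    # Analysis error covariance in ensemble space: only a K x K matrix.
    A = (K - 1) / rho * np.eye(K) + C @ Yp
    evals, evecs = np.linalg.eigh(A)  # A is symmetric positive definite
    Pa = evecs @ np.diag(1.0 / evals) @ evecs.T

    # Perturbation weights: symmetric square root of (K - 1) * Pa.
    Wa = evecs @ np.diag(np.sqrt((K - 1) / evals)) @ evecs.T

    # Mean weight vector, then the analysis ensemble in ensemble space.
    wa_mean = Pa @ C @ (yo - yb_mean)
    Wa_full = Wa + wa_mean[:, None]

    # Analysis ensemble in state space.
    return xb_mean[:, None] + Xp @ Wa_full
```

Each grid point's local analysis uses only K × K matrices and its own local data. The loop over grid points can therefore be distributed freely, which is why LETKF is embarrassingly parallel. With a linear observation operator and rho = 1, the analysis mean and covariance match a Kalman update that uses the ensemble sample covariance.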
Implementation using IBM InfoSphere Streams • InfoSphere Streams is an IBM platform for fast processing of large, high-rate data streams. It supports • parallel, high-performance stream processing • continuous ingestion and analysis • scaling over a range of hardware capabilities • adapting to changing user objectives, available data, and computing resources • the bursty nature of real-time observations of rapidly evolving physical phenomena • Stream operators are described in SPADE
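SPADE code is not reproduced here. As a language-neutral stand-in, the Python generator below shows one piece of logic such a stream operator might implement. It groups a time-ordered stream of observations into assimilation windows, so that a burst of observations lands in the batch for its cycle. The tuple layout (time in hours, station, value) and the function name are assumptions. The input must already be sorted by time.

```python
from itertools import groupby


def assimilation_windows(observations, window_hours):
    """Group a time-ordered stream of (time_h, station, value) tuples into windows.

    Yields (window_index, batch) pairs lazily, so the input can be an
    unbounded generator; windows with no observations are simply skipped.
    """
    for index, batch in groupby(observations, key=lambda ob: int(ob[0] // window_hours)):
        yield index, list(batch)


if __name__ == "__main__":
    stream = iter([(0.2, "A", 1.0), (0.9, "B", 2.0), (1.1, "A", 3.0), (3.5, "C", 4.0)])
    for index, batch in assimilation_windows(stream, 1.0):
        print(index, batch)  # each batch would feed one analysis cycle
```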
Experiments and evaluation • Experimentally evaluate our approach using the controlled tracer releases in the DATEM datasets • Demonstrate our approach using in-situ and remotely sensed real data from a California fire in August 2009 • Observations are taken from EPA AQS (and MODIS AOD when available), and emission rates from GBBEP
Evaluation metrics • We use HYSPLIT's statmain program to compute evaluation metrics for a HYSPLIT forecast against the ground truth • We report the following metrics • The Normalized Mean Squared Error (NMSE; smaller values are better) • The model rank, an overall measure of model quality (larger values are better; the maximum value is 4)
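For reference, the sketch below computes NMSE and the model rank as we understand their DATEM/statmain definitions:

* NMSE = mean((P − M)²) / (P̄ M̄), for predictions P and measurements M;
* RANK = R² + (1 − |FB|/2) + FMS/100 + (1 − KSP/100).

In the rank formula, FB is the fractional bias, FMS the figure of merit in space (%), and KSP the Kolmogorov-Smirnov parameter (%). Each term lies in [0, 1], which gives the maximum of 4. FMS and KSP need the full spatial and distributional data, so here they are passed in as inputs. Treat this as an illustration, not a substitute for statmain.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def nmse(pred, obs):
    """Normalized mean squared error: mean((P - M)^2) / (mean(P) * mean(M))."""
    sq_err = mean([(p - m) ** 2 for p, m in zip(pred, obs)])
    return sq_err / (mean(pred) * mean(obs))


def fractional_bias(pred, obs):
    """FB = 2 (mean(P) - mean(M)) / (mean(P) + mean(M)), which lies in [-2, 2]."""
    return 2.0 * (mean(pred) - mean(obs)) / (mean(pred) + mean(obs))


def pearson_r(pred, obs):
    """Pearson correlation coefficient between predictions and measurements."""
    mp, mo = mean(pred), mean(obs)
    cov = sum((p - mp) * (m - mo) for p, m in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((m - mo) ** 2 for m in obs)
    return cov / math.sqrt(var_p * var_o)


def model_rank(r, fb, fms_percent, ksp_percent):
    """Sum of four terms, each in [0, 1], so the rank lies in [0, 4]."""
    return (r ** 2 + (1.0 - abs(fb) / 2.0)
            + fms_percent / 100.0 + (1.0 - ksp_percent / 100.0))


if __name__ == "__main__":
    pred, obs = [2.0, 4.0], [1.0, 3.0]
    print(nmse(pred, obs), fractional_bias(pred, obs))
```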
CAPTEX • CAPTEX (Cross-Appalachian Tracer Experiment) • Time: 2100 UTC Sep 18 to 2100 UTC Oct 29, 1983 • Area: U.S. and Canada • 6 releases (3 h each) of a perfluorocarbon tracer (PFT) • emission sources and rates are those in DATEM • use DATEM CAPTEX observations as the ground truth • observations at 84 stations every 3 h for 48 h after each release • run 160 iterations, each simulating a 3 h period
CAPTEX • [Figure: forecasts with data assimilation after 3 h, 6 h, 9 h, and 12 h]
CAPTEX • [Figures: CAPTEX forecasts with and without data assimilation]
Modified CAPTEX • To assess whether our approach improves forecasts when emission rates are inaccurate, we do the following • Use the CAPTEX concentrations as ground truth • Run HYSPLIT with a modified CAPTEX emission rate in two modes (with and without data assimilation) • For the 2nd release, which begins at 1700 UTC 25 Sep. 1983, use an emission rate of 33.5 kg/h instead of the 67 kg/h given in DATEM (i.e., half the actual rate) • Compare with a run using the unmodified CAPTEX emissions without data assimilation
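The intuition behind this experiment can be seen in a toy scalar example. A "true" concentration is driven by a source of 67 units per step, while the model uses half that source. A Kalman analysis then pulls the model toward noise-free observations of the truth. The decay factor, the noise variances q and r, the step count, and the function name are all arbitrary assumptions. The example illustrates the mechanism and does not reproduce the CAPTEX results.

```python
def run_toy(nsteps, assimilate, decay=0.5, true_source=67.0, model_source=33.5,
            q=100.0, r=25.0):
    """Return |truth - estimate| after nsteps of a scalar source/decay model.

    The model underestimates the source; with assimilate=True, each step is
    followed by a scalar Kalman analysis (H = 1) using y = truth as the observation.
    """
    truth, x, p = 0.0, 0.0, 0.0
    for _ in range(nsteps):
        truth = decay * truth + true_source      # the "real" atmosphere
        x = decay * x + model_source             # forecast with the wrong source
        if assimilate:
            pb = decay * p * decay + q           # background error variance
            gain = pb / (pb + r)                 # Kalman gain
            x = x + gain * (truth - x)           # analysis toward the observation
            p = (1.0 - gain) * pb                # analysis error variance
    return abs(truth - x)


if __name__ == "__main__":
    print("without DA:", run_toy(10, assimilate=False))
    print("with DA:   ", run_toy(10, assimilate=True))
```

Without assimilation, the error approaches the steady-state gap between truth and model (67 units here). With assimilation, it settles well below that gap, even though the model's source term is never corrected.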
California wildfire, August 2009 • Experiments to forecast PM2.5 particulate matter concentrations from a wildfire in California in August 2009 • Data used • Ground observations from EPA's Air Quality System (AQS) (hourly obs) • Satellite observations from • Terra/Aqua MODIS Aerosol Optical Depth (AOD) (daily obs) • Geostationary Operational Environmental Satellite (GOES) East/West AOD (hourly obs) • Emission rates from GBBEP (GOES-E/W Biomass Burning Emission Product) (hourly) • Data for SO2, NOx, CO, CO2, and relative humidity are also available from these sources but were not used
California wildfire, August 2009 • Experiment using AQS observations and GBBEP emission rates • Time: 2100 UTC Aug 9 to 2100 UTC Aug 20, 2009 • Area: California and Nevada • use hourly AQS data as ground-truth observations • use GBBEP hourly PM2.5 emissions from 2019 source points • emission rates range from 200 g/h to 10 kg/h • each iteration simulates a 1 h period
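As a quick check of the experiment's size, hourly cycling over this window amounts to 11 days × 24 h = 264 assimilation cycles. The helper below (its name is ours) computes this from the stated start and end times.

```python
from datetime import datetime, timedelta, timezone


def cycle_count(start: datetime, end: datetime, step: timedelta) -> int:
    """Number of whole assimilation cycles of length `step` in [start, end)."""
    return (end - start) // step


if __name__ == "__main__":
    start = datetime(2009, 8, 9, 21, tzinfo=timezone.utc)
    end = datetime(2009, 8, 20, 21, tzinfo=timezone.utc)
    print(cycle_count(start, end, timedelta(hours=1)))
```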
California wildfire, August 2009 • [Figures: AQS+GBBEP experiment results]
Summary • Our data assimilation system: • improves statistical metrics, e.g., an average 16.0% improvement in NMSE on DATEM/CAPTEX • uses a state-of-the-art prediction model and assimilation algorithm • shows that LETKF offers good algorithmic efficiency • can easily incorporate other models and multiple data sources • uses data from ground sites and satellites for pollutant concentrations and emission rates • can be extended to other domains, e.g., volcanic ash • Demo website: • http://bluegrit.cs.umbc.edu/~shiming1/demo/
Acknowledgments • We would like to thank • IBM for its generous support, and the InfoSphere Streams team for its indispensable help • Drs. Ben Kyger and Roland Draxler for providing the HYSPLIT model and answering many of our questions • Dr. Milt Halem for his encouragement and support, and the Multicore Computing Center at UMBC for providing the computing environment • Dr. Hai Zhang of the UMBC Atmospheric Lidar Group, for his help with MODIS AOD • NASA for the MODIS data, NOAA for the GOES, GBBEP, and DATEM data, and EPA for the AQS data