Integrated, scalable framework for dynamic weather analysis, forecast models, and data tools. Enables instant response to weather changes and user inputs, optimizing data collection and operation. Key components include CS/IT research, workflow orchestration, data streaming, monitoring, data management, and mining tools.
LEAD Motivation • Each year, mesoscale weather – floods, tornadoes, hail, strong winds, lightning, and winter storms – causes hundreds of deaths, routinely disrupts transportation and commerce, and results in annual economic losses exceeding $13B.
Traditional Methodology • STATIC OBSERVATIONS • Radar Data • Mobile Mesonets • Surface Observations • Upper-Air Balloons • Commercial Aircraft • Geostationary and Polar Orbiting Satellite • Wind Profilers • GPS Satellites • Analysis/Assimilation • Quality Control • Retrieval of Unobserved Quantities • Creation of Gridded Fields • Prediction/Detection • PCs to Teraflop Systems • Product Generation, Display, Dissemination • End Users • NWS • Private Companies • Students • The Process is Entirely Serial and Static (Pre-Scheduled): No Response to the Weather!
The LEAD Goal • To create an integrated, scalable framework that allows analysis tools, forecast models, and data repositories to be used as dynamically adaptive, on-demand systems that can • change configuration rapidly and automatically in response to weather • continually be steered by new data (i.e., the weather) • respond to decision-driven inputs from users • initiate other processes automatically • steer remote observing technologies to optimize data collection for the problem at hand • operate independent of data formats and the physical location of data or computing resources
LEAD CS/IT Research • Workflow orchestration – the construction and scheduling of execution task graphs with data sources drawn from real-time sensor streams and outputs. • Data streaming – to support robust, high-bandwidth transmission of multi-sensor data. • Distributed monitoring and performance evaluation – to enable soft real-time performance guarantees by estimating resource behavior. • Data management – for storage and cataloging of observational data, model output, and results from data mining. • Data mining tools – that detect faults, allow incremental processing (interrupt / resume), and estimate run time and memory requirements based on properties of the data (e.g., number of samples, dimensionality). • Semantic and data interchange technologies – to enable use of heterogeneous data by diverse tools and applications.
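The workflow orchestration item above describes building and scheduling execution task graphs. A minimal sketch of that idea, assuming a hypothetical six-task graph (the task names are illustrative and not taken from LEAD) and using a plain topological ordering in place of whatever scheduler LEAD actually employed:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASK_GRAPH = {
    "ingest_radar_stream": set(),
    "ingest_surface_obs": set(),
    "decode": {"ingest_radar_stream", "ingest_surface_obs"},
    "analysis": {"decode"},
    "forecast": {"analysis"},
    "visualize": {"forecast"},
}


def schedule(graph):
    """Return batches of tasks; every task in a batch has all dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so successors unlock
    return batches


BATCHES = schedule(TASK_GRAPH)
for number, batch in enumerate(BATCHES, start=1):
    print(number, batch)
```

Tasks in the same batch could run concurrently; a real orchestrator would also have to account for resource availability and for data arriving from live sensor streams, which this sketch ignores.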
The Five Canonical Problems • #1. Create a 10-year detailed climatology of thunderstorm characteristics across the U.S. using historical and streaming NEXRAD radar data. This could be expanded to a fine-scale hourly re-analysis using ADAS. • #2. Run a broad parameter suite of convective storm simulations to relate storm characteristics to the environments in which they form/move • #3. Produce high-resolution nested WRF forecasts that respond dynamically to prevailing and predicted weather conditions and compare with single static forecasts • #4. Dynamically re-task a Doppler radar to optimally sense atmospheric targets based upon a continuous interrogation of streaming data • #5. Produce weather analyses and ensemble forecasts on demand – in response to the evolving weather and to the forecasts themselves
The Relevance of Canonical Problem Three: March 28, 2000 Fort Worth Tornadic Storms
NWS 12-hr Forecast Valid at 6 pm CDT: No Explicit Evidence of Precipitation in North Texas
5 ENSEMBLES [Figure panels: Actual Radar, Control Forecast, Ensemble Member #1, Ensemble Member #2, Ensemble Member #3, Ensemble Member #4]
Canonical Problem Three [Workflow diagram] • Data: Surface Observations, Upper-Air Observations, Commercial Aircraft Data, NEXRAD Radar Data, Satellite Data, Wind Profiler Data, Land Surface Data, Terrain Data, Background Model Fields and Previous Forecasts • Components shown: Define Data Requirements and Query for Desired Data; ESML and Decoding; Remapping, Gridding, Conversion; ADAS Quality Control; ADAS Analysis Processing; ADAS Analysis (3D Gridded Fields) + Background Fields; ADAS-to-WRF Converter; 3D Gridded Fields in WRF Mass Coordinate + Suite of Ensemble Initial Conditions; WRF Factory; WRF Gridded Output; Visualize; Adjust Forecast Configuration and Schedule Resources • Supporting components: Data Mover and Computer Resource Allocator; Metadata Catalog; Meta Data Creation and Cataloging; myLEAD Storage
LEAD: Everything is a Service • Service A (ADAS) • Service B (WRF) • Service C (NEXRAD Stream) • Service D (MyLEAD) • Service E (VO Catalog) • Service F (IDV) • Service G (Monitoring) • Service H (Scheduling) • Service I (ESML) • Service J (Repository) • Service K (Ontology) • Service L (Decoder) • Many others…
ESML Service • Purpose: • Use ESML to incorporate observational data into the numerical models for simulation • Scientists can: • Select remote files across the network • Select different observational data to increase the model prediction accuracy [Diagram: Surface Observations; Soundings, Others; NEXRAD Radar; an ESML file for each; Network; ESML Library; Numerical Models and Assimilation Systems (ADAS, WRF); Prediction]
LEAD Data Foundation “Initial 7 Data Sets” • ETA model gridded analysis • METAR surface observations • Rawinsondes – upper air balloon observations • ACARS – commercial aircraft temperature/wind observations • NEXRAD Level II data • GOES visible satellite data [Diagram: OU, UAH, Unidata, and NCSA testbeds, each with the IDD data distribution system and a THREDDS catalog]
Datamining Tools IDV LEAD Data Services Clients THREDDS Browser Map GUI OPeNDAP Client Workflow • THREDDS • Thematic Real-time Environmental Distributed Data Services Meta services and tools VO Catalog Ontology Service gridFTP ESML Interchange Decoder (shim) services IU Stream Service THREDDS myLEAD Personal Catalog Catalog services LDM ADDE OPeNDAP RLS Data servers
Workflow Issues • BPEL4WS • Business Process Execution Language for Web Services • OGCE • Open Grid Computing Environment (aka portal) • www.collab-ogce.org/nmi/index.jsp • OGRE • Open GCE Runtime Engine • http://corvo.ncsa.uiuc.edu/ogre/ • leverages Java Commodity Grid (CoG) • expansion of Apache Ant • broader conditional execution • a general “loop” structure for task execution • data communication among tasks
The Portal “Front Door” [Diagram components: User’s Desktop; Browser (https); Firewall; Portal Server; Security Service; Rendering Service; Resource Monitoring Services; Metadata Catalog Service; Service Registry; User’s local MyLEAD server; WRF Application Factory Service; OGRE Execution; BPEL4WS workflow; WRF]
Portal Architecture (OGCE) • Building on standard technologies • portlet design (JSR-168): IBM, Oracle, Sun, BEA, Apache • grid standards: Java CoG, web/grid services/WSRP • User configurable, service oriented • Based on portlet design • a portlet is a component within the portal that provides the interface between the user and some service • portlets can be exchanged and can interoperate [Diagram: Client’s Browser; Portal container; Portlets; Java CoG API and Java CoG Kit; Grid Protocols (GRAM, MDS - LDAD, MyProxy); Local Grid Services; Grid Service; SOAP ws call; Web Services]
Component Composer • Interactive workflow composer • component database and workflow compiler • provided by the grid service • Composer allows • component selection • drag/drop placement • save/load graph functions
User Interaction with Workflows • In the portal server, save • the workflow script • an XML description of workflow parameters • Users invoke workflows from the portal • select workflow from list • Portal presents parameter form to the user • Portal sends parameters and script to a “workflow factory” that starts execution [Diagram: Browser, https, Portal Server, Workflow Factory]
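The interaction above (a stored script plus an XML parameter description, a form presented to the user, and a hand-off to a workflow factory) can be sketched as follows. The XML layout, the `WorkflowFactory` class, and the script name `wrf_forecast.xml` are all hypothetical, chosen only to illustrate the flow; the slides do not specify the actual schema or factory interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML description of workflow parameters (schema is an assumption).
PARAM_XML = """
<workflow name="wrf_forecast">
  <param name="domain" type="str" default="CONUS"/>
  <param name="forecast_hours" type="int" default="12"/>
</workflow>
"""

CASTS = {"str": str, "int": int, "float": float}


def parameter_form(xml_text):
    """Build a form description: parameter name -> (cast function, default text)."""
    root = ET.fromstring(xml_text)
    return {
        p.get("name"): (CASTS[p.get("type")], p.get("default"))
        for p in root.iter("param")
    }


def fill_form(form, user_input):
    """Merge user input with defaults and convert each value to its declared type."""
    unknown = set(user_input) - set(form)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {
        name: cast(user_input.get(name, default))
        for name, (cast, default) in form.items()
    }


class WorkflowFactory:
    """Stand-in for the workflow factory: records each started instance."""

    def __init__(self):
        self.instances = []

    def start(self, script, params):
        instance = {
            "id": len(self.instances) + 1,
            "script": script,
            "params": params,
            "state": "RUNNING",
        }
        self.instances.append(instance)
        return instance


FORM = parameter_form(PARAM_XML)
PARAMS = fill_form(FORM, {"forecast_hours": "24"})  # form values arrive as text
INSTANCE = WorkflowFactory().start("wrf_forecast.xml", PARAMS)
print(INSTANCE)
```

Keeping the parameter description separate from the script lets the portal render a form without understanding the workflow language itself.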
LEAD Dynamic Workflow • A transforming element of LEAD is dynamic workflow orchestration and data management • Allows the use of analysis tools, forecast models, and data repositories not in fixed configurations or as static recipients of data, as is now the case, but rather as dynamically adaptive, on-demand, Grid-enabled systems that can • change configuration rapidly and automatically in response to weather • continually be steered by new data • respond to decision-driven inputs from users • initiate other processes automatically • steer remote observing technologies to optimize data collection
The OGRE Approach • The Open Grid Runtime Environment (OGRE) • An execution environment based on the Java CoG and Globus for task and data management in the grid environment • Extends Apache Ant as the input language • OGRE programs can publish events into a notification system such as WS-Notification • Interoperates with Web and Grid services • A complete set of graphical tools for monitoring and managing the execution of the workflows [Figure: OGRE managing WRF job execution]
Conditional Behavior • Two types • structure is constant • structure is dynamic • Workflow structure is constant, but • iterations are determined at runtime • the number of dynamically spawned tasks depends upon the weather! • Workflow structure varies • set of dynamically selected patterns • selection based on triggers [Diagram: Data Miner; Pattern Matcher; WRF Factory spawning multiple WRF runs; Simulation; Analysis; branches labeled Condition 1 and Condition 2]
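Both kinds of conditional behavior described above can be illustrated in a few lines. This is a sketch under stated assumptions: the reflectivity threshold, the trigger rules, and the pattern names are hypothetical and stand in for whatever the data miner and pattern matcher would actually provide.

```python
def detect_storm_cells(reflectivity_dbz, threshold=50.0):
    """Stand-in data miner: indices of grid points at or above the threshold."""
    return [i for i, value in enumerate(reflectivity_dbz) if value >= threshold]


def spawn_forecasts(cells):
    """Constant structure, dynamic fan-out: one forecast task per detected cell."""
    return [f"wrf_nested_run(cell={cell})" for cell in cells]


# Dynamic structure: ordered (condition, pattern) triggers; the first match wins.
TRIGGERS = [
    (lambda cells: len(cells) >= 3, "ensemble_forecast"),
    (lambda cells: len(cells) >= 1, "single_nested_forecast"),
    (lambda cells: True, "routine_forecast"),  # fallback when nothing is detected
]


def select_pattern(cells):
    """Pick the workflow pattern whose trigger condition matches first."""
    for condition, pattern in TRIGGERS:
        if condition(cells):
            return pattern
    raise RuntimeError("no trigger matched")  # unreachable with the fallback above


CELLS = detect_storm_cells([10.0, 55.0, 20.0, 60.0])
TASKS = spawn_forecasts(CELLS)
print(CELLS, TASKS, select_pattern(CELLS))
```

In the first case the graph shape is fixed and only the number of spawned tasks varies with the weather; in the second the triggers choose which sub-workflow runs at all.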
LEAD Characterization • Modules via SvPablo • building blocks • manually instrumented, most likely • WRF, the initial target • initial instrumentation by SC04 • Workflow via Autopilot • component overheads • service negotiation costs • timeline dependent on LEAD prototyping cycle • fault tolerance for workflow orchestration • Note • complete Autopilot reimplementation underway • web services and (later) GT4/WSRF compatibility
Prototype 1a (Finish 9/1/04) [Diagram: IDD Data Stream, Decoders, GWSTBs, IDV; steps labeled Automatic or User driven] • Capability: Ingest a single data type (NCEP gridded Eta analysis), decode (from GRIB to NetCDF), visualize using IDV • Orchestration performed by control scripts only • Prototype runs on all 6 GWSTB sites (first instantiation at Unidata) • No Globus, portal, metadata, global file names • Visualization by user starting IDV via web start • Decoder is run by the script as a utility; not yet a web service • Issues • S/W compatibility across sites; all required files must reside on all sites (e.g., Java start, etc.); packaging for installation • Use CVS now (hosted by NCSA) for version control and managing “releases” for the test beds • Each site does its own version mgmt • Introduce error tracking (e.g., BUGZILLA) • Must have installation README docs • What scripting language? Jython for sure • Begin work on global naming conventions, ontologies • Look-ahead ADAS and mining work
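The script-only orchestration in Prototype 1a, where the decoder runs as a utility and visualization follows, might look like the sketch below. The `fake_decoder` and `fake_viewer` functions are placeholders (assumptions) for the real GRIB-to-NetCDF decoder and for launching IDV; no actual decoding or file I/O happens here.

```python
from pathlib import Path


def run_control_script(grib_file, decoder, viewer):
    """Run decode then visualize in a fixed order; return a log of finished steps."""
    grib_path = Path(grib_file)
    netcdf_path = grib_path.with_suffix(".nc")  # decoded output sits beside the input
    log = []
    decoder(grib_path, netcdf_path)  # decoder invoked as a utility, not a service
    log.append(("decode", netcdf_path.name))
    viewer(netcdf_path)  # in the prototype the user starts IDV via web start
    log.append(("visualize", netcdf_path.name))
    return log


def fake_decoder(source, destination):
    """Placeholder for the GRIB-to-NetCDF decoder utility."""
    print(f"decoding {source.name} -> {destination.name}")


def fake_viewer(path):
    """Placeholder for handing the decoded file to a viewer."""
    print(f"visualizing {path.name}")


LOG = run_control_script("eta_analysis.grib", fake_decoder, fake_viewer)
```

Because the steps are passed in as plain callables, the same script shape would still work once the decoder and viewer become services, as later prototypes plan.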
Prototype 1b (Finish 10/15/04) [Diagram: IDD Data Stream, Decoders, GWSTBs, IDV; steps labeled Automatic or User driven] • Capability: Add THREDDS catalog capability + MyLEAD • Create THREDDS catalog (meta data description must be portable) via the catalog generator operating locally at each site • THREDDS catalog points to the local file • Push (publish) derived catalog entry to MyLEAD (XML) along with “investigation history” • Still using a script for workflow, and it invokes MyLEAD • This prototype runs on all 6 GWSTB sites • Issues • How does XML meta data from THREDDS get to MyLEAD? Is it a reference or the data? Don’t want to decode the XML file • Need to decide upon meta data language for the long haul • We have decided to use the NMI version of Globus.
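One way to picture the publish step in Prototype 1b is a catalog entry that points at the local file and is pushed, unparsed, into a personal catalog together with its investigation history. The element and attribute names below are illustrative assumptions, not the real THREDDS catalog schema, and `PersonalCatalog` is a hypothetical stand-in for MyLEAD.

```python
import xml.etree.ElementTree as ET


def make_catalog_entry(name, local_path, history):
    """Build a simplified catalog entry as XML text (hypothetical layout)."""
    entry = ET.Element("dataset", {"name": name, "urlPath": local_path})
    history_element = ET.SubElement(entry, "investigationHistory")
    for step in history:
        ET.SubElement(history_element, "step").text = step
    return ET.tostring(entry, encoding="unicode")


class PersonalCatalog:
    """Stand-in for MyLEAD: stores published XML as an opaque payload."""

    def __init__(self):
        self.entries = {}

    def publish(self, key, xml_text):
        self.entries[key] = xml_text  # stored as-is, never decoded on publish
        return key


ENTRY_XML = make_catalog_entry(
    "eta_analysis",
    "/data/eta/eta_analysis.nc",  # hypothetical local path
    ["ingested from IDD", "decoded GRIB to NetCDF"],
)
CATALOG = PersonalCatalog()
CATALOG.publish("eta_analysis", ENTRY_XML)
print(CATALOG.entries["eta_analysis"])
```

Storing the entry as opaque text reflects the stated wish not to decode the XML on the receiving side; whether the entry should carry a reference or the data itself is left open, as on the slide.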
Prototype 1c (Finish 11/1/04) [Diagram: IDD Data Stream, Decoders, GWSTBs, IDV; steps labeled Automatic or User driven] • Capability: Add Globus and Portal for inter-site operability on at least 2 GWSTB sites; DEMO FOR SC2004 • Move a file (GRIB, decoded GRIB written to NetCDF, or IDV bundle) from one GWSTB to another • OPeNDAP used for IDV access to decoded data on a GWSTB as well as for remote access on multiple servers • IDV, Eta decoder become services • Bring in portal to remotely invoke services (IDV) • If time permits, bring in a second data source to demonstrate data fusion • In general, inter-site functions will first be tested at 2 sites and then expanded to all as appropriate • Comment: As more grid functionality is added, fewer capabilities will be duplicated among all of the sites. For example, Oklahoma may focus on Level II data and run ADAS. • NCSA hosts primary (24/7) portal on their new Sun system and IU will help develop/implement/support it. IU will support the development version.
Prototype 1d (Finish 11/1/04) [Diagram: IDD Data Stream, Decoders, GWSTBs, IDV, NCL; steps labeled Automatic or User driven] • Capability: Add raster graphics via NCL service to be developed by Don Middleton at NCAR • Supplement IDV with an NCL (NCAR Command Language) script-based service • Will allow for the creation of basic raster graphics • Ideal if this is available by SC2004 • To be added
Prototype 1e (Finish 12/15/04) [Diagram: IDD Data Stream, Decoders, GWSTBs, IDV, NCL; steps labeled Automatic or User driven] • Capability: Prototype 1c/1d functional on all 6 GWSTBs
Prototype 1f (Finish 2/1/05) [Diagram: IDD Data Stream, Decoders, GWSTBs, IDV, NCL; steps labeled Automatic or User driven] • Capability: On at least 2 GWSTBs, bring in workflow composition (using OGRE) as a replacement for previous script and invoke from the portal; add monitoring and high-level querying • Use graphical composition tool (with only a few service “boxes,” i.e., the Eta decoder, IDV, NCL, GridFTP, MyLEAD) to write OGRE script • Bring in monitoring via OGRE scripts and use graphical composition tool to view monitoring data • Using the portal, a user asks for the NetCDF file that IDV and/or NCL will visualize via high-level attributes which are tracked by the VO Catalog Service (registry) and Ontology Service (first occurrence) • Standardize global naming conventions • Re-evaluate the role of the GWSTBs (see previous slide) and consider expanding 1f capability as appropriate
Prototype 1g (Finish 4/15/05) [Diagram: IDD Data Stream, Decoders, GWSTBs, IDV, NCL; steps labeled Automatic or User driven] • Capability: Add other 5 data streams (METAR, NEXRAD Level II and III, Upper-Air, GOES Satellite) and their decoders • All decoders are available now and provide for NetCDF formatting • All decoders must be invoked as services • Add monitoring of streaming NEXRAD Level II data • ESML comes into play to provide OPeNDAP access to Level II data, with ESML being invoked as a service • Can do a lot of data fusion in IDV • ADDE needs to be added to provide transport and subsetting for these new data types • Issue • Evaluate the role of ESML service for the future, where we will have many data sets and not necessarily decoders for all of them
Prototype 2a (Finish 4/15/05) [Diagram: IDD Data Stream, Decoders, GWSTBs, ADAS, ADAS Output, IDV, NCL] • Capability: Add ADAS as a service • Virtually everything within ADAS is hidden; it is used as a black box, with options for data types only. • Capabilities like EXT2ARPS will be made services later • ADAS needs to produce NetCDF output directly, not via ADAS2WRF • ADAS will be run over a fixed CONUS domain • Things to monitor: computational performance, data stream & service integrity (fault tolerance issues). Users need to give more guidance on this to DR. • Sequence is to get ADAS running as a bundled application that might be useful to broader Unidata community; perhaps make this a separate (preceding) prototype step • Once this is working, convert ADAS to a service • Workflow needs to recognize ADAS as a single service • Not clear how many sites will run ADAS (at least must run at Oklahoma) • How is monitoring output used for fault tolerance? Must dig into the issue of fault tolerance more broadly.
Prototype 2b (Finish 5/1/05) [Diagram: IDD Data Stream, Decoders, GWSTBs, ADAS, ADAS Output, IDV, NCL] • Capability: Add geo-reference GUI to subset domains of interest • This tool will operate at a high level within the workflow • Must now be able to visualize any observational data set individually or in combination using IDV, or individually using NCL • The GUI will need to use bounding boxes as well as be able to specify individual states • We want the subsetting to be done on the data prior to being used, as they’re decoded • Issues • How is the bounding box information incorporated into the workflow? As a query to MyLEAD or THREDDS? • The bounding box also must send domain configuration information to the ADAS input file, as is now done in Metrocast at WDT. Need to determine how WDT subsets the data: by observations based upon the bounding box, or simply by subsetting the analysis
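Subsetting observations by a bounding box before they are used, as the slide above asks for, can be sketched as a simple filter. The box limits, station identifiers, and coordinates below are approximate, illustrative values (assumptions), and the sketch does not handle boxes that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Latitude/longitude box in decimal degrees (west and east are negative in the U.S.)."""

    south: float
    north: float
    west: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def subset_observations(observations, box):
    """Keep only the observations whose location falls inside the box."""
    return [ob for ob in observations if box.contains(ob["lat"], ob["lon"])]


# Rough box around Oklahoma; the limits are illustrative, not authoritative.
OKLAHOMA_BOX = BoundingBox(south=33.6, north=37.0, west=-103.0, east=-94.4)

# Hypothetical surface observations with approximate coordinates.
OBSERVATIONS = [
    {"station": "OUN", "lat": 35.22, "lon": -97.44, "temp_c": 21.0},
    {"station": "FTW", "lat": 32.75, "lon": -97.33, "temp_c": 24.5},
    {"station": "CMI", "lat": 40.11, "lon": -88.24, "temp_c": 15.0},
]

SUBSET = subset_observations(OBSERVATIONS, OKLAHOMA_BOX)
print([ob["station"] for ob in SUBSET])
```

Selecting by individual state, as the GUI requirement mentions, would need polygon boundaries rather than a single box; the filter interface could stay the same with a different `contains` test.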
Prototype 3 (Finish 5/1/05) [Diagram: IDD Data Stream, Decoders, GWSTBs, ADAS, ADAS Output, WRF Model, WRF Output, IDV, NCL] • Capability: Add WRF model as a service to make a single, single-grid forecast using the ADAS analysis as initial conditions • This tool will operate at a high level within the workflow • Must now be able to visualize any observational data set individually or in combination using IDV, or individually using NCL (probably comes earlier) • The sequencing is to bundle WRF as an application and then convert it to a service. • We must monitor those things for which the information can be used in a meaningful way (e.g., failed simulation, ADAS, data drop-out, data delays).
Prototype 4 (Finish 5/1/05) [Diagram: IDD Data Stream, Decoders, GWSTBs, ADAS, ADAS Output, WRF Model, WRF Output, ADaM, IDV, NCL] • Capability: Add selected components of ADaM data mining as a series of services • This tool will operate at a high level within the workflow • The sequencing is to bundle ADaM as an application and then convert it to a service. • Individual ADaM components can be separate services (e.g., MDA), and ADaM can have its own set of workflows. • We will start with a single “black box” version of ADaM that does one thing (in the spirit of how ADAS is being used initially) • ADaM will take ADAS and WRF output and search for features, say at specific times and in specific regions. The output from ADaM needs to feed back to MyLEAD. Meteorologists need to provide the information needed to train ADaM. • Keep in mind that output from ADaM eventually will be triggering other tasks, so the meta data and MyLEAD need to take this into account. • Streaming data mining is not in the critical path for these prototypes but can be done as background research, e.g., mining of Level II data using 88D algorithms