Towards Personalized and Active Information Management for Meteorological Investigations Beth Plale Indiana University USA
Problem Statement
• Mesoscale meteorology research is highly data-driven.
  • A large percentage of the data streams in from observational platforms and is available on OPeNDAP servers.
  • Data more than 10 minutes old is too old.
  • Researchers are currently working on increasing real-time responsiveness to developing weather conditions.
• Mesoscale meteorology is a vast information space.
  • Forecasting models assimilate data from a growing number of sources.
Solution Statement
• The Internet has proven the utility of a user-oriented view of information space management
  • Browser and bookmarks to organize
  • Blogs and web page tools (FrontPage, Dreamweaver) to publish
• We apply the concept of a user-oriented view to management of the mesoscale meteorology information space.
• myLEAD: a tool to help an investigator make sense of, and operate in, the vast information space that is mesoscale meteorology.
Motivation for LEAD • Each year, mesoscale weather – floods, tornadoes, hail, strong winds, lightning, and winter storms – causes hundreds of deaths, routinely disrupts transportation and commerce, and results in annual economic losses > $13B.
Conventional Numerical Weather Prediction
• OBSERVATIONS: Radar Data, Mobile Mesonets, Surface Observations, Upper-Air Balloons, Commercial Aircraft, Geostationary and Polar Orbiting Satellite, Wind Profilers, GPS Satellites
• Analysis/Assimilation: Quality Control, Retrieval of Unobserved Quantities, Creation of Gridded Fields
• Prediction: PCs to Teraflop Systems
• Product Generation, Display, Dissemination
• End Users: NWS, Private Companies, Students
The process is entirely serial and pre-scheduled: no response to weather!
The LEAD Vision: No Longer Serial or Static
• OBSERVATIONS: Radar Data, Mobile Mesonets, Surface Observations, Upper-Air Balloons, Commercial Aircraft, Geostationary and Polar Orbiting Satellite, Wind Profilers, GPS Satellites
• Analysis/Assimilation: Quality Control, Retrieval of Unobserved Quantities, Creation of Gridded Fields
• Prediction: PCs to Teraflop Systems
• Product Generation, Display, Dissemination
• End Users: NWS, Private Companies, Students
LEAD data: initial working data set
• ETA model gridded analysis
• METAR surface observations
• Rawinsondes – upper air balloon observations
• ACARS – commercial aircraft temperature and wind observations
• NEXRAD Level II data
• GOES visible satellite data
Returning to Solution Statement
• We apply the concept of a user-oriented view to management of the mesoscale meteorology information space.
• myLEAD: a tool to help an investigator make sense of, and operate in, the vast information space that is mesoscale meteorology.
Information space management tool
• At its core is a metadata catalog
  • Why? Observational products are already being stored elsewhere.
  • Files are public and could be large, so we do not want to copy them into the user’s file system. Instead, maintain a “bookmark”.
• Scale to support thousands of distributed users, including individual investigators, pre-college classroom investigators, and casual observers.
Technical Challenges
• Querying must be efficient
  • Over data products described by rich domain-specific metadata
  • Over data products whose description can be augmented over time
• Obtaining metadata is hard
  • Automate as much as possible
• Privacy must be fully enforced
  • Any data product that user designates as private must remain private
• Publishing
  • Publish product to larger community: data file, model output, full experiment
  • Must be under user control
  • Discovery of information that has been made public
• Build trust
  • User may work within myLEAD space for 5 years of graduate work, for instance
  • User must be convinced of privacy, reliability, longevity, etc.
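The privacy and publishing requirements can be illustrated with a minimal Python sketch. This is not the myLEAD implementation: the `Product` class, its `owner` and `public` fields, and the `publish` and `visible_to` helpers are hypothetical names chosen for illustration. It assumes products are private by default and that only the owner may publish.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical catalog entry; field names are illustrative only."""
    name: str
    owner: str
    public: bool = False  # assumed default: private until the owner publishes


def publish(product: Product, requester: str) -> None:
    """Make a product public; only its owner may do so (user control)."""
    if requester != product.owner:
        raise PermissionError(f"{requester} does not own {product.name}")
    product.public = True


def visible_to(products: list[Product], user: str) -> list[str]:
    """Names of products a user may discover: their own plus public ones."""
    return [p.name for p in products if p.public or p.owner == user]


catalog = [
    Product("wrf-run-1", owner="abe"),
    Product("radar-subset", owner="bing"),
]
publish(catalog[1], requester="bing")
print(visible_to(catalog, "abe"))   # abe's own product plus the published one
print(visible_to(catalog, "caru"))  # only the published product
```

Keeping the visibility rule in one function makes it easier to audit that a private product never leaks into another user's query results.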
Rundown on Implementation Specs
• Building on top of MCS and OGSA-DAI
  • MCS for extensible db schema, general db schema, and security infrastructure already in place
  • OGSA-DAI for grid/web service architecture
• Database used is MySQL 5.0
  • Supports stored procedures
  • OGSA-DAI connects to MySQL through JDBC
• Data product descriptions in and out of the database conform to a LEAD-specific XML schema.
• myLEAD server and myLEAD agent are written in Java.
Related Work
• mySpace – AstroGrid, UK
  • Similar to myLEAD in reining in the information space
  • Creates swatches in large federation of data archives for the cache and persistent data for a “community”
  • Provides common query access over cache space and persistent space
• RDF (Resource Description Framework)
  • Basic building block is the subject-predicate-object triple:
    [S] – P -> [O]
    [Dickens] – hasWritten -> [Pickwick Papers]
  • Good for storing detailed relationship information (good for understanding the relationship between two terms)
• NEESgrid – NCSA
  • Uses RDF
  • Little available in public literature
• myGrid Information Repository (MIR) – myGRID, Manchester
  • Most similar to myLEAD
  • Supports text search of scientific papers; uses Life Sciences Identifier (LSID)
  • myLEAD has a stronger personal orientation (guarantees, publishing, automatic metadata generation)
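The subject-predicate-object triple can be sketched in a few lines of Python. This is a toy in-memory store, not an RDF library: the `match` function is a hypothetical helper, and the triples beyond the Dickens example from the slide are added only for illustration.

```python
# A triple is (subject, predicate, object).
triples = [
    ("Dickens", "hasWritten", "Pickwick Papers"),   # example from the slide
    ("Dickens", "hasWritten", "Oliver Twist"),      # illustrative addition
    ("Pickwick Papers", "isA", "Novel"),            # illustrative addition
]


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# What has Dickens written?
print(match(triples, s="Dickens", p="hasWritten"))
# How are "Dickens" and "Pickwick Papers" related?
print(match(triples, s="Dickens", o="Pickwick Papers"))
```

The second query shows why triples suit relationship questions: the predicate is left open and the store returns every relationship between the two terms.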
myLEAD Architecture
[Diagram]
• Client side services: User interface (portal access to myLEAD), myLEAD agent, MCS client
• Server side services (myLEAD service): MCS, myLEAD, OGSA-DAI, JDBC, myLEAD stored procedures, data model, relational DB
myLEAD use scenario
[Diagram: myLEAD portlet as component of LEAD portal; Factory; myLEAD service; myLEAD “agent” instance; workflow (Data mining task, WRF model); Storage Repository Service (RLS); scratch space /var/tmp/wrf_tmp; sites NCSA and IU]
• Workflow confers with myLEAD “agent” to determine location of scratch space
Metadata Catalog Data Model
• Users
  • Abe, Bing, Caru
• Investigations
  • Tornado April 20 Chicago Illinois
• Experiments
  • Ensemble: run of 100 simultaneous forecast models parameterized slightly differently
• Collections
• Logical files
  • Input observational files, input parameters, derived files, analysis results, images, model results, workflows, execution status messages
Data Model
[Diagram: Investigation, Collection, Logical file, User – Dublin Core]
• Attributes stored in “type” tables: i.e., string, float, temporal, int.
• Great extensibility, but need to carefully control naming; efficient querying could be an issue as well.
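The per-type attribute tables can be sketched with Python's built-in `sqlite3` module. This is an assumption-laden illustration, not the MCS or myLEAD schema: the table names (`logical_file`, `attr_string`, `attr_float`) and attribute names (`model`, `grid_spacing_km`) are hypothetical, and SQLite stands in for the MySQL database named in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logical_file (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One table per value type; a new attribute name needs no schema change.
CREATE TABLE attr_string (file_id INTEGER, name TEXT, value TEXT);
CREATE TABLE attr_float  (file_id INTEGER, name TEXT, value REAL);
""")
conn.executemany("INSERT INTO logical_file VALUES (?, ?)",
                 [(1, "wrf-out1"), (2, "wrf-out2")])
conn.executemany("INSERT INTO attr_string VALUES (?, ?, ?)",
                 [(1, "model", "WRF"), (2, "model", "WRF")])
conn.executemany("INSERT INTO attr_float VALUES (?, ?, ?)",
                 [(1, "grid_spacing_km", 3.0), (2, "grid_spacing_km", 12.0)])

# Each attribute in the predicate costs one join, which is why query
# efficiency is a concern with this layout.
rows = conn.execute("""
SELECT f.name
FROM logical_file f
JOIN attr_string s ON s.file_id = f.id AND s.name = 'model'
JOIN attr_float  g ON g.file_id = f.id AND g.name = 'grid_spacing_km'
WHERE s.value = 'WRF' AND g.value < 5.0
ORDER BY f.name
""").fetchall()
print(rows)
```

The sketch also shows the naming concern: a query for `grid_spacing_km` silently misses a file whose attribute was stored under a different spelling.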
Data Model
[Browser view of a workspace, reconstructed from diagram labels; nesting is approximate]
myWorkspace: J. Kowaleski
  preferences
  Workflow template vizEta 03Aug04:13:35:40
  Workflow template WRF 15May04:05:25:59
  Favorite spaces
    Home disk space
    Thor cluster scratch space
  Experiment 1: Norman, OK 21Oct04:23:11:45 (collection level)
    Input observational
      NEXRAD 26Oct04:13:45:40 (logical file level)
      GOES-infrared 26Oct04:12:00:00
      METAR 26Oct04:09:10:05
    Input parameters
    WRF-out
      Wrf-out1-26Oct04:13:35:40
      Wrf-out2-26Oct04:13:37:25
      Wrf-out3-26Oct04:13:43:15
    workflow instance
• Browser provides user a hierarchical view of space that is essentially flat. Users like hierarchy.
• Have associated a set of attributes that describe this data product
myLEAD agent
• Separate transient grid/web service
• Has state about user, current investigation and experiment
• Embeds myLEAD client API
• Purpose:
  • Controls naming
  • Helps use database structure in repeatable, meaningful way
  • Maintains FSM of current state of execution; stores into new collection based on state
    • States: input, model run, analysis, final results
  • Derives metadata attributes for new data product object when created during course of workflow by means of:
    • Case-based reasoning
    • Internal state
    • Consulting ontology
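The agent's finite state machine can be sketched in Python. The four states come from the slide; everything else is an assumption for illustration: the strictly linear ordering, the `Agent` class, and its `store` and `advance` methods are hypothetical and do not reflect the actual myLEAD agent API (which the talk says is written in Java).

```python
from enum import Enum


class State(Enum):
    INPUT = "input"
    MODEL_RUN = "model run"
    ANALYSIS = "analysis"
    FINAL_RESULTS = "final results"


# Assumed linear ordering of execution states.
NEXT = {
    State.INPUT: State.MODEL_RUN,
    State.MODEL_RUN: State.ANALYSIS,
    State.ANALYSIS: State.FINAL_RESULTS,
}


class Agent:
    """Hypothetical agent: files new products under the current state."""

    def __init__(self):
        self.state = State.INPUT
        # One collection per state, so naming stays repeatable.
        self.collections = {s: [] for s in State}

    def store(self, product_name):
        """Place a product in the collection for the current state."""
        self.collections[self.state].append(product_name)

    def advance(self):
        """Move to the next execution state."""
        if self.state not in NEXT:
            raise RuntimeError("workflow already in final state")
        self.state = NEXT[self.state]


agent = Agent()
agent.store("METAR observations")
agent.advance()
agent.store("wrf-out1")
print({s.value: names for s, names in agent.collections.items()})
```

Because the workflow never names a collection itself, the agent alone decides where a product lands, which is one way to keep the catalog structure consistent across runs.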
Data Product Metadata
• Resources: “things that need describing (i.e., metadata)”
[Figure label: Data mining]
Current Research Challenges
• Publishing
  • Publishing data product to larger community: data file, model output, full experiment
  • Discovery of information that has been made public
• Guarantees
  • Any data product that user designates as private must remain private
  • When request for product is issued, product must exist
• Flexible yet efficient schema
  • Inherited from MCS, supports evolved understanding of data product over time by means of extended attributes
• Immutable investigations
  • Collections, views, and logical files can be reused from earlier investigations without destroying integrity of earlier investigation
• Proactive agent
  • Infers metadata attributes from context of active experiment using case-based reasoning.
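One way to picture immutable investigations is with frozen Python dataclasses. This is a sketch of the idea only, under the assumption that reuse means sharing a reference to an unchangeable object; the `Collection` and `Investigation` classes and the sample names are hypothetical and say nothing about how myLEAD stores investigations.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Collection:
    name: str
    files: tuple[str, ...]


@dataclass(frozen=True)
class Investigation:
    title: str
    collections: tuple[Collection, ...]


obs = Collection("input observational", ("NEXRAD", "METAR"))
first = Investigation("Norman, OK", (obs,))

# Reuse the earlier collection and add a new one; `first` is not modified.
extra = Collection("input parameters", ("wrf-params",))
second = replace(
    first,
    title="Norman, OK (rerun)",
    collections=first.collections + (extra,),
)

print(first)
print(second)
```

The later investigation shares the earlier collection object rather than copying it, and any attempt to assign to a field of `first` raises an error, so the earlier investigation keeps its integrity.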
4 days away from our national elections … wish us well. Beth Plale plale@cs.indiana.edu