DataFed: View-Based Mediated Web Service Architecture
Rudolf B. Husar and Kari Hoijarvi, Washington University, St. Louis
Presented at IGARSS07, Barcelona, ES, July 22, 2007
Figure: a user query goes to the Mediator (Data View), which reaches Source 1 and Source 2 through Wrappers.
Information Landscape: Providers (Geography, Content, Agency, Form)
Figure: provider datasets on the Internet, each labeled by Content | Agency | Form. Content: Emission, Ambient, Satellite, Model. Agency: EPA, NOAA, NASA, Other. Examples shown include EPA AIRNow, EPA AQS, EPA NEI, NOAA ASOS, NOAA GASP, NOAA Forecast, NASA DAACs, NASA IDEA, RPO VIEWS, FS FireInv and State/Local.
• Data are distributed geographically by autonomous providers
• Data include emissions, ambient data, satellite data and model output
• Data are provided by multiple agencies: EPA, NOAA, NASA and others
• Furthermore, data are provided in varied formats and access protocols
• Data on the Internet are geography-independent and can be ‘linearized’
Information Landscape: Users (Types, Agency, Info Needs)
Figure: users on the Internet, each labeled by Stakeholder | Agency | Form. Stakeholder: Policy, Public, Manager, Scientist. Agency: EPA, NOAA, NASA, Other.
• Users are distributed geographically
• Users include policy makers, the public, AQ managers and scientists
• Users are affiliated with multiple agencies: EPA, NOAA, NASA, as well as others
• Furthermore, users need various types of information provided in multiple formats
• Since the users are also on the Internet, their geographic location is irrelevant
Let's agree on a Space-Time-Parameter Data Access Query Protocol
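Loose Coupling of Services
Figure: a Client (Front End) and a Server (Back End) communicate only through standard interfaces.
• GetCapabilities: the server returns its capabilities (‘profile’)
• GetData: the client asks Where? When? What? Which format? and the server returns the data
• OGC WCS/WMS protocols carry the space-time-parameter queries (T1, T2)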
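As a rough illustration of the GetData step, the sketch below encodes a where/when/what/format request as key-value pairs in the style of an OGC WCS 1.0.0 GetCoverage call. The endpoint, the coverage name and the T1/T2 interval notation are assumptions made for the example, not values from the slides, and a real server usually requires further parameters such as the output grid size.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_getcoverage_url(endpoint, coverage, bbox, time_range, fmt):
    """Encode a space-time-parameter query as WCS-style key-value pairs."""
    lon_min, lat_min, lon_max, lat_max = bbox
    t1, t2 = time_range
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,                                 # What?
        "CRS": "EPSG:4326",
        "BBOX": f"{lon_min},{lat_min},{lon_max},{lat_max}",   # Where?
        "TIME": f"{t1}/{t2}",                                 # When? (assumed T1/T2 form)
        "FORMAT": fmt,                                        # Which format?
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and coverage name, for illustration only.
url = build_getcoverage_url(
    "http://example.org/wcs",
    "example_pm25",
    (-125, 24, -66, 50),
    ("2007-07-01", "2007-07-22"),
    "CSV",
)
decoded = parse_qs(urlparse(url).query)
print(decoded["BBOX"], decoded["TIME"])
```

Because the client only fills in these four answers, it stays loosely coupled to the server: any back end that honors the same query form can be swapped in.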
Jeff Ullman, 1998: Answering Queries Using Views
Figure: User query → View → Mediator (Global Schema) → Wrappers → Source 1, Source 2; queries flow down and results flow back up.
• Mediator answers queries; it collects data from wrappers or other mediators
• Wrapper (adapter) translates between the local and global language and model
• Sources are heterogeneous
Jeff Ullman, Stanford, 1998
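The mediator-wrapper pattern described above can be sketched in a few lines. This is a minimal illustration, not the DataFed implementation: the `Observation` schema, both wrapper classes and the local field names of the two sources are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One record in the (hypothetical) global schema."""
    lat: float
    lon: float
    time: str        # ISO date, so plain string comparison orders correctly
    parameter: str
    value: float


class TupleSourceWrapper:
    """Adapts a source that stores PM2.5 rows as (date, lat, lon, value)."""

    def __init__(self, rows):
        self.rows = rows

    def query(self, parameter, t1, t2):
        if parameter != "pm25":
            return []
        return [Observation(lat, lon, d, "pm25", v)
                for d, lat, lon, v in self.rows if t1 <= d <= t2]


class DictSourceWrapper:
    """Adapts a source that stores dicts with its own local field names."""

    def __init__(self, records):
        self.records = records

    def query(self, parameter, t1, t2):
        return [Observation(r["y"], r["x"], r["when"], r["what"], r["val"])
                for r in self.records
                if r["what"] == parameter and t1 <= r["when"] <= t2]


class Mediator:
    """Answers a global-schema query by collecting results from all wrappers."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, parameter, t1, t2):
        results = []
        for wrapper in self.wrappers:
            results.extend(wrapper.query(parameter, t1, t2))
        return sorted(results, key=lambda obs: obs.time)


source1 = TupleSourceWrapper([("2007-07-21", 38.6, -90.2, 12.5),
                              ("2007-08-01", 38.6, -90.2, 20.0)])
source2 = DictSourceWrapper([{"x": -105.1, "y": 40.6, "when": "2007-07-20",
                              "what": "pm25", "val": 7.0}])
mediator = Mediator([source1, source2])
july = mediator.query("pm25", "2007-07-01", "2007-07-31")
print([(obs.time, obs.value) for obs in july])
```

The user sees one query interface and one record type; each wrapper hides the local layout of its source.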
Wrapper Classes: Point, Image, Grid
5-Dim Data Model
Common Views
Anatomy of a Wrapper Service: TOMS Satellite Image Data
Daily TOMS images on FTP archive:
ftp://toms.gsfc.nasa.gov/pub/eptoms/images/aerosol/y2000/ea000820.gif
Template:
ftp://toms.gsfc.nasa.gov/pub/eptoms/images/aerosol/y[yyyy]/ea[yy][mm][dd].gif
• Wrapper Service can access and spatially subset the image for any day (WMS)
• Wrapper Service and mediation are performed by a third party
• This makes a non-intrusive, adaptive system for agile networking
Image Description for Data Access:
image_width=502
image_height=329
margin_bottom=105
margin_left=69
margin_right=69
margin_top=46
lat_min=-70
lat_max=70
lon_min=-180
lon_max=180
Transparent colors for overlays:
RGB(89,140,255)
RGB(41,117,41)
RGB(23,23,23)
RGB(0,0,0)
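The two pieces of the wrapper description can be exercised with a short sketch: expanding the URL template for a given day, and converting a longitude/latitude to a pixel position using the image description. The function names are hypothetical, and the linear (equirectangular) mapping between the margins is an assumption; the slides do not state the projection of the TOMS images.

```python
from datetime import date

TEMPLATE = ("ftp://toms.gsfc.nasa.gov/pub/eptoms/images/aerosol/"
            "y[yyyy]/ea[yy][mm][dd].gif")

# Image description copied from the slide.
IMAGE = {
    "image_width": 502, "image_height": 329,
    "margin_bottom": 105, "margin_left": 69,
    "margin_right": 69, "margin_top": 46,
    "lat_min": -70, "lat_max": 70,
    "lon_min": -180, "lon_max": 180,
}


def expand_template(template, day):
    """Fill the [yyyy], [yy], [mm] and [dd] placeholders for one day."""
    return (template
            .replace("[yyyy]", f"{day.year:04d}")
            .replace("[yy]", f"{day.year % 100:02d}")
            .replace("[mm]", f"{day.month:02d}")
            .replace("[dd]", f"{day.day:02d}"))


def lonlat_to_pixel(lon, lat, img=IMAGE):
    """Map lon/lat to (x, y) pixels, assuming a linear map inside the margins."""
    map_w = img["image_width"] - img["margin_left"] - img["margin_right"]
    map_h = img["image_height"] - img["margin_top"] - img["margin_bottom"]
    fx = (lon - img["lon_min"]) / (img["lon_max"] - img["lon_min"])
    fy = (img["lat_max"] - lat) / (img["lat_max"] - img["lat_min"])
    return (round(img["margin_left"] + fx * map_w),
            round(img["margin_top"] + fy * map_h))


def bbox_to_pixel_box(lon_min, lat_min, lon_max, lat_max, img=IMAGE):
    """Return (left, top, right, bottom) pixels for a spatial subset."""
    left, top = lonlat_to_pixel(lon_min, lat_max, img)
    right, bottom = lonlat_to_pixel(lon_max, lat_min, img)
    return left, top, right, bottom


url = expand_template(TEMPLATE, date(2000, 8, 20))
print(url)
print(bbox_to_pixel_box(-180, -70, 180, 70))
```

A wrapper built this way needs nothing from the provider beyond the existing FTP archive, which is what makes the approach non-intrusive.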
Integrated Data System for Air Quality (IDAQ)
ESIP AQ Cluster, 050510 Draft, rhusar@me.wustl.edu
Figure: Data Providers (Emission, Surface, Satellite, Model; single datasets) are connected to Data Users through non-intrusive linking and mediation. Stages: Federate Data (Wrappers), Integrate (Structuring by What? When? Where?), Explore Data (Slice & Dice Viewers), Understand (Programs, Reports). Info needs: AQ Compliance, Nowcast/Forecast, Status & Trends, Find Data Gaps, ID New Problems.
• The info system infrastructure needs to facilitate the creation of info products
• Providers supply the ‘raw material’ (data and models) for ‘refined’ info products
• The challenge is to design a general supportive infrastructure
• Simply connecting the relevant providers and users for each info product is messy
• Structuring the heterogeneous data into where-when-what ‘cubes’ simplifies the mess
• The ‘cubed’ data can be accessed and explored by slicing-dicing tools
• More elaborate data integration and fusion can be done by web service chaining
• This infrastructure support for IDAQ can be provided by the ESIP Federation
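To make the where-when-what ‘cube’ idea concrete, here is a toy sketch for a single parameter, with the cube indexed as time, latitude, longitude. The axis values and data are invented for illustration. Fixing the time gives a map view; fixing the location gives a time series.

```python
# Axes of a toy cube for one parameter; all values are made up.
times = ["2007-07-20", "2007-07-21"]
lats = [30.0, 40.0]
lons = [-100.0, -90.0, -80.0]

# cube[t][i][j] is the value at times[t], lats[i], lons[j].
cube = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]


def map_view(cube, times, when):
    """Slice at a fixed time: a lat-by-lon grid."""
    return cube[times.index(when)]


def time_series(cube, lats, lons, lat, lon):
    """Slice at a fixed location: one value per time step."""
    i, j = lats.index(lat), lons.index(lon)
    return [layer[i][j] for layer in cube]


grid = map_view(cube, times, "2007-07-21")
series = time_series(cube, lats, lons, 40.0, -90.0)
print(grid)
print(series)
```

Once every dataset is exposed through the same three axes, the same slicing tools work on all of them.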
Model-Data Comparison Workflow
Figure: VIEWS chemical data (Ft. Collins, CO) and AeroCom chemical models (Paris, FR) are each exposed through standard I/O as OGC services (WMS, WCS) and feed the model-data comparison workflow.
• Workflow software: flow program
• Lego-like assembly of components
DataFed: 100+ Datasets Non-intrusively Federated
Near Real Time Data Integration; Delayed Data Integration
Surface Air Quality
AIRNOW: O3, PM25
ASOS_STI: Visibility, 300 sites
METAR: Visibility, 1200 sites
VIEWS_OL: 40+ Aerosol Parameters
Satellite
MODIS_AOT: AOT, Idea Project
GASP: Reflectance, AOT
TOMS: Absorption Index, Reflectance
SEAW_US: Reflectance, AOT
Model Output
NAAPS: Dust, Smoke, Sulfate, AOT
WRF: Sulfate
Fire Data
HMS_Fire: Fire Pixels
MODIS_Fire: Fire Pixels
Surface Meteorology
RADAR: NEXRAD
SURF_MET: Temp, Dewp, Humidity…
SURF_WIND: Wind vectors
ATAD: Trajectory, VIEWS locs.
• Data are accessed from autonomous, distributed providers
• DataFed ‘wrappers’ provide uniform geo-time referencing
• Tools allow space/time overlay, comparisons and fusion
Summary
• Third-party mediation can homogenize distributed ES data
• An agile SOA-based IS can deliver diverse info products to users
• Since 2005, one such IS, DataFed, has been used by EPA and in research
• For networking, more data and services need to be federated
Parting thoughts
Think outside the stovepipe – Think networking
Divide and Conquer, NO! Connect and Enable, YES!
Thank you