
Web Services and Water Markup Language for Distributed Hydrologic Data Access

Web Services and Water Markup Language for Distributed Hydrologic Data Access. Ilya Zaslavsky San Diego Supercomputer Center, UCSD CUAHSI = Consortium of Universities for the Advancement of Hydrologic Sciences, Inc.; HIS = Hydrologic Information System


Presentation Transcript


  1. Web Services and Water Markup Language for Distributed Hydrologic Data Access. Ilya Zaslavsky, San Diego Supercomputer Center, UCSD. CUAHSI = Consortium of Universities for the Advancement of Hydrologic Sciences, Inc.; HIS = Hydrologic Information System. NSF-supported collaborative project: UT Austin + SDSC + Drexel + Duke + Utah State. www.cuahsi.org/his/

  2. The Grid is becoming the backbone for collaborative science and data sharing. CI is about RE-USING data and research resources!

  3. Cyberinfrastructure for hydrology (in the U.S.)
     • Hydrologic observations:
       • Reliance on federally organized data collection (NWIS, STORET, NCDC, etc.) with huge and complex nomenclatures
         → simplifying access to federal repositories
         → relatively lower emphasis on data ownership
       • Handling time in both UTC and local
       • Various spatial offsets
       • Multiple data types: time series, fields, spatial data
     • Integrative discipline:
       • Interoperation with atmospheric, ocean, soils, geomorphology, social datasets and services…
     • Community:
       • Organized by “natural boundaries”
         → networks of relatively autonomous self-managed data nodes
       • Partnership with public sector water management
       • 96% use Windows for research; Excel, ArcGIS and Matlab are the most popular
     Mix of standards, software licensing models, vocabularies; leveraging tools developed in other CI projects.

  4. Hydrologic Information System: Service-Oriented Architecture
     [Architecture diagram; recoverable labels:]
     • DASH (Data Access System for Hydrology): information input, display, query and output services. Preliminary data exploration and discovery: see what is available and perform exploratory analyses.
     • WaterOneFlow Web Services (WSDL, SOAP): data access through web services; data storage through web services
     • Servers: 3rd-party servers with a web services interface (e.g. USGS, NCDC), observatory servers, workgroup HIS, SDSC HIS servers
     • Clients: GIS, Matlab, IDL, Splus, R, D2K, I2K, programming (Fortran, C, VB)
     • Downloads and uploads: HTML, XML

  5. The CUAHSI Community, HIS and WATERS
     • Government: USGS, EPA, NCDC, USDA
     • Industry: ESRI, Kisters, OpenMI
     • Domain sciences: Unidata, NCAR, LTER, GEON
     • Supercomputer centers: NCSA, TACC
     • HIS Team: Texas, SDSC, Utah, Drexel, Duke
     • CUAHSI: 116 universities (Nov. 2006)
     [Diagram labels: CUAHSI HIS, WATERS Network Information System, HIS Team, WATERS Testbed]

  6. CUAHSI HIS as a mediator across multiple agency and PI data
     • Keeps identifiers for sites, variables, etc. across observation networks
     • Manages and publishes controlled vocabularies, and provides vocabulary/ontology management and update tools
     • Provides common structural definitions for data interchange
     • Provides a sample protocol implementation
     • Governance framework: a consortium of universities, MOUs with federal agencies, collaboration with key commercial partners, led by renowned hydrologists, and NSF support for core development and test beds

  7. Main Components
     • Hydrologic Observations Data Model, ODM databases and site catalogs
     • Web services for accessing hydrologic repositories and data in ODMs
     • Clients: Online Data Access System + multiple desktop application add-ons
     • Network of CUAHSI HIS servers, deployed at hydrologic observatories and integrated with other observing systems and sensor data collection
     [Diagram; recoverable labels:]
     • Data sources: NASA, Storet, Ameriflux, Unidata, NCDC, NCAR, NWIS
     • Remote CUAHSI HIS nodes (Windows), each with HODM, HDAS, web services, IIS Web Server, ASP.Net, ArcGIS, SQL Server
     • CUAHSI Web Services, reached through web service proxies from Excel, Visual Basic, ArcGIS, C/C++, Matlab, Fortran, Access, SAS

  8. Point Observations Information Model
     [Diagram: hierarchy levels, each with an example and the matching service response]
     • Data Source (e.g. USGS)
     • Network (e.g. streamflow gages): return network information, and variable information within the network
     • Sites (e.g. Neuse River near Clayton, NC): return site information, including a series catalog of variables measured at a site with their periods of record
     • ObservationSeries (e.g. discharge, stage, start, end; daily or instantaneous)
     • Values (e.g. 206 cfs, 13 August 2006; {Value, Time, Qualifier}): return time series of values
     Definitions:
     • A data source operates an observation network
     • A network is a set of observation sites
     • A site is a point location where one or more variables are measured
     • A variable is a property describing the flow or quality of water
     • An observation series is an array of observations at a given site, for a given variable, with start time and end time
     • A value is an observation of a variable at a particular time
     • A qualifier is a symbol that provides additional information about the value
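
The definitions on this slide can be read as a small containment hierarchy. The following is a minimal sketch of that reading in Python, not the ODM schema or any CUAHSI code: the class and field names are assumptions chosen to mirror the slide's terms, and the only data used is the slide's own example (206 cfs on 13 August 2006).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Value:
    """One observation of a variable at a particular time."""
    value: float
    time: datetime
    qualifier: Optional[str] = None  # symbol with extra information, e.g. "<"


@dataclass
class ObservationSeries:
    """Observations at one site for one variable, with start and end time."""
    variable: str
    units: str
    values: List[Value] = field(default_factory=list)

    @property
    def start(self) -> Optional[datetime]:
        # Earliest observation time, or None for an empty series.
        return min((v.time for v in self.values), default=None)

    @property
    def end(self) -> Optional[datetime]:
        # Latest observation time, or None for an empty series.
        return max((v.time for v in self.values), default=None)


@dataclass
class Site:
    """A point location where one or more variables are measured."""
    name: str
    series: List[ObservationSeries] = field(default_factory=list)


@dataclass
class Network:
    """A set of observation sites."""
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    """An organization that operates observation networks."""
    name: str
    networks: List[Network] = field(default_factory=list)


# The slide's example: USGS -> streamflow gages -> Neuse River -> discharge.
discharge = ObservationSeries(
    variable="Discharge",
    units="cfs",
    values=[Value(206.0, datetime(2006, 8, 13))],
)
site = Site("Neuse River near Clayton, NC", [discharge])
usgs = DataSource("USGS", [Network("Streamflow gages", [site])])
```

Deriving `start` and `end` from the stored values is a simplification made for this sketch; a series catalog would more likely store the period of record explicitly so that it can be returned without reading the values.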

  9. Challenges… (1/2)
     • Sites
       • STORET has stations, and measurement points, at various offsets…
       • Site metadata lacking and inconsistent (e.g. 2/3 no HUC info, 1/3 no state/county info); agency site files need to be upgraded to ODM…
       • A groundwater site is different from a stream gauge…
     • Censored values
       • Values have qualifiers, such as “less than”, “censored”, etc. (per value). Sometimes mixed data types.
     • Units
       • There are multiple renditions of the same units, even within one repository
       • There may be several units for the same parameter code (STORET)
       • If no value is recorded, are there no units?
     • Unit multipliers
       • E.g. NCDC ASOS keeps measurements as integers, and provides a multiplier for each variable
     • Sources
       • STORET requires organization IDs (which collected data for STORET) in addition to site IDs
     • Time stamps: ISO 8601
       • A service to determine UTC offsets given lat/lon and date?
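
Three of the challenges above (unit multipliers, per-value qualifiers, and ISO 8601 time stamps in UTC) come down to small normalization steps. The sketch below illustrates them with the Python standard library only. The function names, the 0.1 multiplier, the "<0.05" reading and the UTC-5 offset are all hypothetical values picked for illustration; none of them comes from NCDC, STORET or the HIS code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def apply_multiplier(raw: int, multiplier: float) -> float:
    """Turn an integer-encoded reading into a physical value."""
    return raw * multiplier


def parse_censored(text: str) -> Tuple[Optional[str], float]:
    """Split a reading such as '<0.05' into (qualifier, value)."""
    text = text.strip()
    if text and text[0] in "<>":
        return text[0], float(text[1:])
    return None, float(text)


def to_utc_iso(local_naive: datetime, utc_offset_hours: float) -> str:
    """Attach a known UTC offset to a local time; return ISO 8601 in UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc).isoformat()


# Hypothetical inputs, for illustration only.
scaled = apply_multiplier(215, 0.1)
qualifier, limit = parse_censored("<0.05")
stamp = to_utc_iso(datetime(2006, 8, 13, 12, 0), -5)
```

The offset is passed in explicitly because, as the slide notes, deriving it from latitude, longitude and date is itself an open question that would need a separate service.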

  10. Challenges… (2/2)
      • Values retrieval
        • USGS: by site, variable, time range
        • EPA: by organization-site, variable, medium, units, time range
        • NCDC: fewer variables, period of record applies to site, not to seriesCatalog
      • Variable semantics
        • Variable names and measurement methods don’t match
          • E.g. NWIS parameter # 625 is labeled ‘ammonia + organic nitrogen’; the Kjeldahl method is used for determination but is not mentioned in the parameter description. In STORET this parameter is referred to as Kjeldahl Nitrogen.
        • One-to-one mapping not always possible
          • E.g. NWIS: ‘bed sediment’ and ‘suspended sediment’ medium types vs. STORET’s ‘sediment’.
      • Ontology tagging, semantic mediation
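
The two semantic examples on this slide can be pictured as lookup tables: one that tags differently named parameters with a shared concept, and one that shows why the medium mapping cannot be inverted one-to-one. This is a minimal sketch of that idea, not the HIS ontology or its tooling. The concept identifier `nitrogen_kjeldahl` and both function names are hypothetical; the source terms are the ones quoted on the slide.

```python
from typing import Dict, List, Tuple

# (repository, native term) -> shared concept tag. The tag is hypothetical.
CONCEPT_BY_SOURCE_TERM: Dict[Tuple[str, str], str] = {
    ("NWIS", "625"): "nitrogen_kjeldahl",
    ("STORET", "Kjeldahl Nitrogen"): "nitrogen_kjeldahl",
}

# NWIS medium type -> STORET medium type (many-to-one).
NWIS_MEDIUM_TO_STORET: Dict[str, str] = {
    "bed sediment": "sediment",
    "suspended sediment": "sediment",
}


def same_concept(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """True when both source terms are tagged with the same concept."""
    concept_a = CONCEPT_BY_SOURCE_TERM.get(a)
    concept_b = CONCEPT_BY_SOURCE_TERM.get(b)
    return concept_a is not None and concept_a == concept_b


def storet_to_nwis_media(storet_medium: str) -> List[str]:
    """Reverse lookup: one STORET medium may match several NWIS media."""
    return sorted(
        nwis for nwis, storet in NWIS_MEDIUM_TO_STORET.items()
        if storet == storet_medium
    )


match = same_concept(("NWIS", "625"), ("STORET", "Kjeldahl Nitrogen"))
candidates = storet_to_nwis_media("sediment")
```

The reverse lookup returns a list rather than a single value, which is the practical consequence of the one-to-many case: a client asking for STORET ‘sediment’ data in NWIS terms has to query both medium types or ask the user to choose.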

  11. NWIS Daily Values (discharge), NWIS Ground Water, NWIS Unit Values (real time), NWIS Instantaneous Irregular Data, EPA STORET, NCDC ASOS, DAYMET, MODIS, NAM12K, ODM
      • From different database structures, data collection procedures, quality control, access mechanisms → to uniform signatures … Water Markup Language
      • Tested in different environments
      • Standards-based
      • Can support advanced interfaces via harvested catalogs
      • Accessible to community
      • Templates for development of new services
      • Optimized, error handling, memory management, versioning, run from fast servers
      • Working with agencies on setting up services and updating site files

  12. WaterOneFlow API, v. 1.0
      • GetValues
        • Returns a TimeSeries
      • GetSiteInfo
        • Station information, including a period of record
      • GetVariableInfo
        • Returns variable/parameter information
      • Also: GetSites, GetVariables
      • Object and string output
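
To make the roles of the five methods concrete, here is a toy in-memory stand-in that reuses the method names from the slide. It is a sketch only: the real services are SOAP endpoints described by WSDL, and the argument lists, return shapes, the `NEUSE_CLAYTON` site code and the second data point below are all assumptions invented for this illustration.

```python
from datetime import date
from typing import Dict, List, Tuple

SeriesKey = Tuple[str, str]            # (site code, variable code)
Observation = Tuple[date, float]       # (day, value)


class InMemoryWaterOneFlow:
    """Toy stand-in exposing the method names listed on the slide."""

    def __init__(self, series: Dict[SeriesKey, List[Observation]]):
        self._series = series

    def GetSites(self) -> List[str]:
        return sorted({site for site, _ in self._series})

    def GetVariables(self) -> List[str]:
        return sorted({variable for _, variable in self._series})

    def GetSiteInfo(self, site: str) -> Dict[str, Tuple[date, date]]:
        """Series catalog for a site: variable -> period of record."""
        catalog = {}
        for (s, variable), observations in self._series.items():
            if s == site and observations:
                days = [day for day, _ in observations]
                catalog[variable] = (min(days), max(days))
        return catalog

    def GetVariableInfo(self, variable: str) -> Dict[str, object]:
        """Minimal variable description: which sites measure it."""
        sites = sorted(s for s, v in self._series if v == variable)
        return {"variable": variable, "sites": sites}

    def GetValues(self, site: str, variable: str,
                  start: date, end: date) -> List[Observation]:
        """Time series for one site and variable within [start, end]."""
        observations = self._series.get((site, variable), [])
        return [(d, v) for d, v in observations if start <= d <= end]


# Illustrative data: the first point is the slide's example value,
# the second is made up so that the period of record spans two days.
service = InMemoryWaterOneFlow({
    ("NEUSE_CLAYTON", "discharge"): [
        (date(2006, 8, 13), 206.0),
        (date(2006, 8, 14), 198.0),
    ],
})
one_day = service.GetValues("NEUSE_CLAYTON", "discharge",
                            date(2006, 8, 13), date(2006, 8, 13))
```

A typical client flow under these assumptions would be `GetSites` to discover sites, `GetSiteInfo` to see which variables and periods exist, and then `GetValues` for the series of interest.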

  13. WaterML design principles
      • Driven largely by hydrologists; the goal is to capture semantics of hydrologic observations discovery and retrieval
      • Relies to a large extent on the information model as in ODM (Observations Data Model), and terms are aligned as much as possible
      • Several community reviews since 2005
      • Driven by data served by USGS NWIS, EPA STORET, multiple individual PI-collected observations
      • Is no more than an exchange schema for CUAHSI web services
      • The least barrier for adoption by hydrologists
      • A fairly simple and rigid schema tuned to the current implementation
      • Conformance with OGC specs not in the initial scope

  14. WaterML key elements
      • Methods: GetSiteInfo, GetVariableInfo, GetValues
      • Response types: SiteInfo, Variables, TimeSeries
      • Key elements: site, sourceInfo, seriesCatalog, variable, timeSeries, values, queryInfo

  15. Structure of responses
      • sitesResponse
        • queryInfo: criteria, queryURL
        • site: siteInfo, seriesCatalog
          • seriesCatalog → series (1 to many): variable, variableTimeInterval
      • variablesResponse
        • variables → variable (1 to many)
      • timeSeriesResponse
        • queryInfo: criteria, queryURL
        • timeSeries: sourceInfo, variable, values
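
As an illustration of how a client might walk a timeSeriesResponse, the sketch below parses a deliberately simplified document with Python's standard `xml.etree.ElementTree`. The element names come from the slide, but everything else is an assumption made for this example: the sample omits XML namespaces, the `value` child elements and their `dateTime` attribute are assumed, the text content of each element is invented, and the URL is a placeholder. It should not be read as a valid WaterML instance.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Simplified, namespace-free sample; not a schema-valid WaterML document.
SAMPLE = """\
<timeSeriesResponse>
  <queryInfo>
    <criteria>site=EXAMPLE-SITE; variable=discharge</criteria>
    <queryURL>http://example.org/placeholder</queryURL>
  </queryInfo>
  <timeSeries>
    <sourceInfo>Neuse River near Clayton, NC</sourceInfo>
    <variable>Discharge</variable>
    <values>
      <value dateTime="2006-08-13T00:00:00">206</value>
    </values>
  </timeSeries>
</timeSeriesResponse>
"""


def parse_time_series(xml_text: str) -> dict:
    """Extract site, variable and (time, value) pairs from the sample layout."""
    root = ET.fromstring(xml_text)
    series = root.find("timeSeries")
    return {
        "query": root.findtext("queryInfo/criteria"),
        "site": series.findtext("sourceInfo"),
        "variable": series.findtext("variable"),
        "values": [
            (datetime.fromisoformat(v.get("dateTime")), float(v.text))
            for v in series.find("values").findall("value")
        ],
    }


parsed = parse_time_series(SAMPLE)
```

Because queryInfo echoes the request back, a client can keep the criteria alongside the parsed values and know later how the series was obtained.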

  16. SiteInfo response [annotated example; callouts: queryInfo, site, name, code, location, seriesCatalog, variables, what, how many, when, TimePeriodType]

  17. TimeSeries response [annotated example; callouts: queryInfo, location, variable, values]

  18. Clients
      • Tested with .Net and Java
      • Desktop clients: Excel, Matlab, ArcGIS, VB.NET, more being written
      • Web client: DASH (Data Access System for Hydrology): http://river.sdsc.edu/DASH (beta)

  19. Current Deployment Architecture
      [Diagram; recoverable labels:]
      • Components: DASH, WaterOneFlow Web Services, ODM tools, ODM Loader, ODM, GIS data, Mxd, Service, direct DB connection
      • Software: VS 2005, AGS Server, SQL Server, ArcGIS 9.2, IIS
      • Platform: Windows 2003 Server, 4 GB RAM, 1 TB disk, quad-core CPU

  20. Workgroup HIS server organization: steps for registering observation data
      [Diagram with numbered steps 1 to 6; recoverable components:]
      • DASH web application
      • Web configuration file: stores information about registered networks (WSDLs, web service URLs, connection strings)
      • MXD: stores information about layers (layer info, symbology, etc.)
      • Spatial store (geodatabase, collection of shapefiles, or both): point layers for NWIS-IID, NWIS-DV, NCDC ASOS, STORET, TCEQ, BearRiver, “My new points”, … more synced layers. Background layers can be in the same or a separate spatial store.
      • WOF services (web services from a common template): NWIS-IID WS, NWIS-DV WS, ASOS WS, STORET WS, TCEQ WS, BearRiver WS, “My new WS”, … more WS from the ODM-WS template
      • SQL Server (ODMs and catalogs): NWIS-IID, NWIS-DV, ASOS, STORET, TCEQ, BearRiver, “My new ODM”, … more databases. All instances are exposed as ODM (i.e. have standard ODM tables or views: Sites, Variables, SeriesCatalog, etc.)
      • ODMDataLoader
      • Agencies shown: USGS, NCDC, EPA, TCEQ

  21. HIS Scalability
      • Adding data types and datasets; processing models and services; servers; users and roles shall not create unmanageable bottlenecks that require system re-engineering
      • Designing for scalability:
        • Distilling a generic set of web service signatures; resolving semantic and structural heterogeneities
        • Using ODM as a common generic format for time series data, for ease of coding and uniform search interfaces
        • DASH GUI design to abstract specifics of disparate repositories
        • Leveraging common CI components developed in GEON
        • Working with agencies to remove web service bottlenecks

  22. Near future
      • Deployment at the 11 WATERS test beds, and beyond
        • And documenting experience
      • Organizing HIS support
      • Working with federal and state agencies on web services
        • NCDC, USGS, EPA, state agencies (e.g. TCEQ)
      • Analysis services for site catalogs and ODMs (see next slide)
      • OGC connections: WaterML is an OGC Discussion Paper (approved at the April 2007 TC Meeting)
        • Needs to be reviewed further, based on initial implementation
        • Internationalization (with CSIRO WRON, European WISE, H2OML)
        • Carry CUAHSI WaterML messages over O&M, as an O&M profile
      • Towards WaterML and web services 1.1

  23. US Map of USGS Observations (map insets: Alaska, Puerto Rico, Hawaii, Antarctica)

  24. US Map of USGS Observations – by Mean Period of Record

  25. Different types of nutrients by decade: Available Data Total

  26. Some physical properties by decade: Available Data Total

  27. Same without discharge, gage height, temperature and precipitation (the four most common, in that order): Available Data Total


  29. SDSC Spatial Information Systems Lab, http://scirad.sdsc.edu/datatech/si.html
      Research and system development:
      • Services-based spatial information integration infrastructure
      • Mediation services for spatial data, query processing, map assembly services
      • Long-term spatial data preservation
      • Spatial data standards and technologies for online mapping (SVG, WMS/WFS)
      • Support of spatial data projects at SDSC and beyond
        • In Geosciences (GEON, CUAHSI, CBEO, …) services
        • In Neurosciences (BIRN, CCDB)
        • In regional development (NIEHS SBRP, Katrina)
      Contact: zaslavsk@sdsc.edu

  30. Links and Acknowledgments
      • The CUAHSI HIS project:
        • http://www.cuahsi.org/his/ (main site)
        • http://water.sdsc.edu (central development server)
      • Many thanks to Microsoft Research for partly sponsoring this trip
