CUAHSI HIS Service Oriented Architecture, Web Services, WaterML
Hydrologic Information System Service Oriented Architecture
[Architecture diagram; component labels follow]
• Test bed HIS Servers • HIS Lite Servers • Central HIS servers • External data providers
• Global search (Hydroseek)
• Deployment to test beds
• Customizable web interface (DASH) • Other popular online clients • HTML - XML
• Desktop clients: GIS, Matlab, IDL, Splus, R, Excel, Programming (Fortran, C, VB), Modeling (OpenMI)
• Data publishing: ODM DataLoader, ODMTools, Streaming Data Loading, Ontology tagging (Hydrotagger), Server config tools, WSDL and ODM registration
• WaterOneFlow Web Services, WaterML • WSDL - SOAP
• Ontology • ETL services • Controlled vocabularies • Metadata catalogs
Point Observations Information Model
Levels of the model, with a USGS example and what the services return at each level:
• Data Source: USGS
• Network: Streamflow gages. Return network information, and variable information within the network
• Sites: Neuse River near Clayton, NC. Return site information, including a series catalog of variables measured at a site with their periods of record
• ObservationSeries: Discharge, stage, start, end (Daily or instantaneous)
• Values: 206 cfs, 13 August 2006 {Value, Time, Qualifier}. Return time series of values
Definitions:
• A data source operates an observation network
• A network is a set of observation sites
• A site is a point location where one or more variables are measured
• A variable is a property describing the flow or quality of water
• An observation series is an array of observations at a given site, for a given variable, with start time and end time
• A value is an observation of a variable at a particular time
• A qualifier is a symbol that provides additional information about the value
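The hierarchy above can be sketched as nested data classes. This is only an illustration of the information model, not the ODM schema or any CUAHSI code: the class and field names are assumptions, and the sample numbers are the ones on the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Value:
    """One observation of a variable at a particular time."""
    value: float
    time: datetime
    qualifier: Optional[str] = None  # symbol with extra information about the value


@dataclass
class Variable:
    """A property describing the flow or quality of water."""
    name: str
    units: str


@dataclass
class ObservationSeries:
    """Observations at one site for one variable."""
    variable: Variable
    values: List[Value] = field(default_factory=list)

    def period_of_record(self):
        """Return (start, end) of the series, or None if it is empty."""
        times = [v.time for v in self.values]
        return (min(times), max(times)) if times else None


@dataclass
class Site:
    """A point location where one or more variables are measured."""
    name: str
    series: List[ObservationSeries] = field(default_factory=list)


@dataclass
class Network:
    """A set of observation sites."""
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    """An organization that operates observation networks."""
    name: str
    networks: List[Network] = field(default_factory=list)


# The example from the slide: 206 cfs on 13 August 2006.
discharge = ObservationSeries(
    Variable("Discharge", "cfs"), [Value(206.0, datetime(2006, 8, 13))]
)
site = Site("Neuse River near Clayton, NC", [discharge])
usgs = DataSource("USGS", [Network("Streamflow gages", [site])])
```

Walking `usgs.networks[0].sites[0].series[0]` follows the same path a client takes through the services: network, then site, then series, then values.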
WaterML design principles
• Driven largely by hydrologists; the goal is to capture the semantics of hydrologic observations discovery and retrieval
• Relies to a large extent on the information model as in ODM (Observations Data Model), and terms are aligned as much as possible
• Several community reviews since 2005
• Driven by data served by USGS NWIS, EPA STORET, and multiple individual PI-collected observations
• Is no more than an exchange schema for CUAHSI web services
• The least barrier for adoption by hydrologists
• A fairly simple and rigid schema tuned to the current implementation
• Conformance with OGC specs not in the initial scope, but working with OGC on this
Water Markup Language
• From different database structures, data collection procedures, quality control, and access mechanisms to uniform signatures
• Tested in different environments
• Standards-based
• Can support advanced interfaces via harvested catalogs
• Accessible to community
• Templates for development of new services
• Optimized, error handling, memory management, versioning, run from fast servers
• Working with agencies on setting up services and updating site files
WaterML and WaterOneFlow
[Diagram: a client sends locations, variable codes, and date ranges to a WaterOneFlow web service (GetSiteInfo, GetVariableInfo, GetValues); the service extracts, transforms, and loads data from the data repositories (STORET, NAM, NWIS) and returns WaterML]
• WaterML is an XML language for communicating water data
• WaterOneFlow is a set of web services based on WaterML
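To make the three calls concrete, here is a toy in-memory stand-in with the same three entry points. It is an assumption-laden sketch: real WaterOneFlow services are SOAP services that return WaterML, and the Python method names, argument names, site code, and the second sample value below are all hypothetical.

```python
from datetime import date


class InMemoryWaterOneFlow:
    """Toy stand-in for a WaterOneFlow service (not the real SOAP interface)."""

    def __init__(self, catalog, observations):
        self._catalog = catalog            # site code -> {variable code: units}
        self._observations = observations  # (site, variable) -> [(date, value)]

    def get_site_info(self, site_code):
        """Which variables are measured at this location?"""
        return {"site": site_code, "variables": sorted(self._catalog[site_code])}

    def get_variable_info(self, variable_code):
        """What does this variable code mean (here: just its units)?"""
        for variables in self._catalog.values():
            if variable_code in variables:
                return {"variable": variable_code, "units": variables[variable_code]}
        raise KeyError(variable_code)

    def get_values(self, site_code, variable_code, start, end):
        """Values for one site and variable within an inclusive date range."""
        series = self._observations.get((site_code, variable_code), [])
        return [(day, value) for day, value in series if start <= day <= end]


# Hypothetical site code and data; only the 206 cfs value comes from the slides.
service = InMemoryWaterOneFlow(
    catalog={"EXAMPLE:SITE1": {"discharge": "cfs"}},
    observations={
        ("EXAMPLE:SITE1", "discharge"): [
            (date(2006, 8, 13), 206.0),
            (date(2006, 8, 14), 198.0),
        ]
    },
)
august_13 = service.get_values(
    "EXAMPLE:SITE1", "discharge", date(2006, 8, 13), date(2006, 8, 13)
)
```

The point of the sketch is the request shape: a location, a variable code, and a date range are enough to identify a time series.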
WaterOneFlow
• Set of query functions
• Returns data in WaterML
Services: NWIS Daily Values (discharge), NWIS Ground Water, NWIS Unit Values (real time), NWIS Instantaneous Irregular Data, EPA STORET, NCDC ASOS, DAYMET, MODIS, NAM12K, USGS SNOTEL, ODM (multiple sites)
WaterML key elements
• Response types: SiteInfo (GetSiteInfo), Variables (GetVariableInfo), TimeSeries (GetValues)
• Key elements: site, sourceInfo, seriesCatalog, variable, timeSeries, values, queryInfo
Structure of responses
• variablesResponse: variables (1 to many variable)
• timeSeriesResponse: queryInfo (criteria, queryURL); timeSeries (sourceInfo, variable, values)
• sitesResponse: queryInfo (criteria, queryURL); site (1 to many), each with siteInfo and seriesCatalog; seriesCatalog holds many series, each with variable and variableTimeInterval
More Information about WaterML…next 20 slides… Or check the specification online at http://www.opengeospatial.org/standards/dp
Elements Defining Spatial Location
SourceInfoType
• SiteInfoType, for observation sites: child elements (other site information); GeogLocationType: LatLonPointType
• DatasetInfoType, for continuous surfaces: child elements (other dataset information); GeogLocationType: LatLonPointType, LatLonBoxType
SiteInfoResponseType • Namespaces • queryInfo • site Network Sites Variables
queryInfo example
• User parameters: parameters sent to service
• Query URL: URLs called (if external resource)
siteInfo • Name • Site Code • Location
geoLocation • geogLocation – geographic coordinates • LatLon point • LatLon box • localSiteXY – projected coordinates
series • variable – what is measured • valueCount – how many measurements • variableTimeInterval – when is it measured TimePeriodType
variable • variableCode – global identifier • variableName • units Sites Variables Values TimePeriodType
Compare with… variableTimeInterval • TimePeriodType – date range (including “last n days”) • TimeInstantType – single measurement
queryInfo name code location site seriesCatalog Series how many variables when SiteInfo response TimePeriodType
VariablesResponseType • variable – same as in series element • Code, name, units Sites Variables Values
TimeSeriesResponseType • queryInfo • timeSeries • sourceInfo – “where” • variable – “what” • values Sites Variables Values
sourceInfo • SiteInfoType • Same as siteInfo element • code, name, location • DataSetInfoType • For data continuous in space • LatLonPointType • LatLonBoxType
values • Each time series value recorded in value element • Timestamp, plus metadata for the value, recorded in element’s attributes qualifier ISO Time value
value metadata examples • qualifiers • censorCode (lt, gt, nc) • qualityControlLevel (Raw, QC’d, etc.) • methodID • offset • offsetValue • offsetUnitsAbbreviation • offsetDescription • offsetUnitsCode
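A minimal parsing sketch for the value element described above, assuming the timestamp and metadata sit in attributes and the measurement is the element text. The XML fragment is a simplified, namespace-free illustration built from the attribute names on these slides; it is not copied from a real service response and the second value is made up.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Illustrative fragment only; real WaterML responses carry namespaces
# and more metadata than shown here.
SAMPLE = """<values>
  <value dateTime="2006-08-13T00:00:00" censorCode="nc" qualityControlLevel="Raw" qualifiers="A">206</value>
  <value dateTime="2006-08-14T00:00:00" censorCode="lt" qualityControlLevel="Raw">5</value>
</values>"""


def parse_values(xml_text):
    """Turn each <value> element into a dict of time, value, and metadata."""
    root = ET.fromstring(xml_text)
    parsed = []
    for element in root.iter("value"):
        metadata = dict(element.attrib)
        # The ISO 8601 timestamp is pulled out; the rest stays as metadata.
        timestamp = datetime.fromisoformat(metadata.pop("dateTime"))
        parsed.append(
            {"time": timestamp, "value": float(element.text), "metadata": metadata}
        )
    return parsed


observations = parse_values(SAMPLE)
censored = [obs for obs in observations if obs["metadata"].get("censorCode") != "nc"]
```

Keeping the metadata next to each value matters because qualifiers and censor codes apply per value, not per series.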
TimeSeries response queryInfo location variable values
OGC Harmonization Best Practices WaterML text includes steps for harmonizing with GML/O&M Align spatial feature descriptions (e.g. using gml:Point, gml:Envelope) Align service signatures (getCapabilities) Align terminology with O&M
Newest Developments USGS NWIS Daily Values web service EPA WQX services NCDC prototyping WaterML as format for data delivery
USGS Values • SDSC hosts a database catalog of USGS sites and series information • GetValues method now hosted at USGS • Follows the CUAHSI Webservices, and returns WaterML TimeSeriesResponse • Our service now proxies the USGS service instead of scraping the web site • More services to be developed (Real Time is next)
EPA Web Services • EPA now provides web services http://www.epa.gov/storet/web_services.html • The web services use WQX, an implementation of the Environmental Sampling, Analysis and Results data standard. • Using the EPA web services instead of scraping gives over an order of magnitude improvement. • Issues: WQX is based on the EPA data model (org-(Analysis-Location-(Result))) whereas WaterML is time-series oriented (Site-Variable-(Result)). • We are working on mapping WQX to WaterML.
Mapping WQX results to WaterML TimeSeries
Call to StationWebService. In WQX, each activity produces one (or more) WaterML DataValue.
WaterML: • TimeSeries • Site • SiteInfo • Variable • VariableName • Units • Values • DataValue (repeated) • DateTime • Value • Qualifiers • Method • Qualifier • Methods
WQX: • Organization • Activity (repeated) • ActivityDescription • ActivityStartDate • Details • MonitoringLocation • StationID and Name only • Result • Result Details • CharacteristicName (variable) • [ResultMeasureValue, Unit] ResultMeasure (DataValue) • Qualifier • BiologicalResultDescription • Details • ResultLabInformation • AnalysisStartDate • ResultAnalyticalMethod
StoretResultService GetResults(Organization, MonitoringLocation, … CharacteristicName)
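The reshaping from activity-oriented results to site-variable time series can be sketched as a grouping step. This is not the actual CUAHSI mapping code: the flat record layout is an assumption (field names borrowed from the WQX elements listed above), and the station IDs and numbers are invented sample data.

```python
from collections import defaultdict


def wqx_results_to_time_series(results):
    """Group flat WQX-style result records into one series per
    (station, characteristic, unit), ordered by activity start date."""
    series = defaultdict(list)
    for record in results:
        key = (record["StationID"], record["CharacteristicName"], record["Unit"])
        series[key].append((record["ActivityStartDate"], record["ResultMeasureValue"]))
    # ISO 8601 date strings sort chronologically as plain text.
    return {key: sorted(points) for key, points in series.items()}


# Hypothetical records; none of these values come from STORET.
sample_results = [
    {"StationID": "ST-1", "CharacteristicName": "Kjeldahl Nitrogen", "Unit": "mg/l",
     "ActivityStartDate": "2006-08-14", "ResultMeasureValue": 0.9},
    {"StationID": "ST-1", "CharacteristicName": "Kjeldahl Nitrogen", "Unit": "mg/l",
     "ActivityStartDate": "2006-08-13", "ResultMeasureValue": 1.1},
    {"StationID": "ST-2", "CharacteristicName": "Kjeldahl Nitrogen", "Unit": "mg/l",
     "ActivityStartDate": "2006-08-13", "ResultMeasureValue": 0.4},
]
time_series = wqx_results_to_time_series(sample_results)
```

The unit is part of the key in this sketch because one parameter can be reported in several units, which would otherwise mix incompatible values in a single series.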
Hydrologic Information System Server: Software Stack, Deployment, Operation
Hydrologic Information Server
Supports data discovery, delivery and publication
• Data discovery: how do I find the data I want? Map interface and observations catalogs; metadata-based search
• Data delivery: how do I acquire the data I want? Use web services or retrieve from local database
• Data publication: how do I publish my observation data? Use Observations Data Model
GetSites GetSiteInfo GetVariables GetVariableInfo GetValues Hydrologic Information Server WaterOneFlow services DASH – data access system for hydrology ArcGISServer Geospatial Data Observations Data & Catalogs Microsoft SQLServer Relational Database
Deployment Overview
• HIS Server machines are staged at SDSC
• Base software components installed (Microsoft, ESRI)
• All HIS components installed: ODM, Web Services and templates, DASH (Data Access System for Hydrology), plus additional tools
• Servers can be accessed remotely before being shipped to test beds
COTS Software HIS Applications Data Windows 2003 Server 4 GB Ram 500 GB Disk Quad Core CPU SQLServer 2005 IIS (Internet Information Server) Visual Studio 2005 NWIS DV ArcGIS Server DASH ODM tools NWIS IID ODDataLoader GIS Data Mxd Services WaterOneFlow Web Services Your ODM ArcGIS 9.2
System Disk HIS software Data Disk • Operating System • Program Files • - SQL Server • IIS • Visual Studio • - ArcGIS • - ArcGIS Server • -WaterOneFlow • Web Services • - DASH • - ODM Data Loader • ODM Tools • WSTestPage GIS Data SQL Data (ODM) P: 180Gb C: 50Gb O: 230Gb
WORKGROUP HIS SERVER ORGANIZATION: STEPS FOR REGISTERING OBSERVATION DATA
[Diagram: numbered steps 1 to 6 linking the components below]
• DASH Web Application. Web configuration file: stores information about registered networks (WSDLs, web service URLs, connection strings). MXD: stores information about layers (layer info, symbology, etc.)
• Spatial store: point layers for NWIS-IID, NWIS-DV, NCDC ASOS, STORET, TCEQ, BearRiver, My new points, ... (more synced layers). Background layers can be in the same or a separate spatial store. Geodatabase or collection of shapefiles, or both
• WOF services: NWIS-IID WS, NWIS-DV WS, ASOS WS, STORET WS, TCEQ WS, BearRiver WS, My new WS, ... (more WS from the ODM-WS template). Web services from a common template
• SQL Server: NWIS-IID, NWIS-DV (USGS), ASOS (NCDC), STORET (EPA), TCEQ, BearRiver, My new ODM, ... (more databases). ODMs and catalogs. All instances exposed as ODM (i.e. have standard ODM tables or views: Sites, Variables, SeriesCatalog, etc.)
• ODMDataLoader
New network registration steps
1. Using the ODM DataLoader or another tool, load your data into a blank ODM instance (this will create all ODM tables that HIS relies on)
2. Copy the Web Services template to a new folder, edit the template web.config file to point to the new ODM, and test to make sure the new service works as expected
3. Create a point layer (a feature class in GDB, or a shapefile) from the new ODM’s Sites table using the GetSitesTool
4. Add the point layer to the MXD document, specify symbology, scale-dependent rendering, etc.
5. Add information about the new ODM, the associated web service, and the associated point layer to the HIS configuration file (see the first slide for the exact content)
6. Restart the HIS service
7. Register and test the new service at the HIS Central: http://water.sdsc.edu/centralhis/
Administration and Updates
• Admin accounts: local + remote (for SDSC, troubleshooting)
• Updating software:
  • DASH and ODM Tools: new versions on web site, with installation instructions
  • ODM Data Loader: ClickOnce deployment
  • Regular software updates and patches for COTS: need to first try at SDSC; post on wiki
• Updating databases: regularly updated at SDSC; available to workgroups via web services and direct connection to disrupter.sdsc.edu
• Updating web services: new templates posted on web site, with instructions
• Information for developers: river.sdsc.edu/wiki/
Challenges: information model (1/3)
Sites
• STORET has stations, and measurement points, at various offsets
• Site metadata is lacking and inconsistent (e.g. 2/3 have no HUC info, 1/3 have no state/county info); agency site files need to be upgraded to ODM
• A groundwater site is different from a stream gauge
Censored values
• Values have qualifiers, such as “less than”, “censored”, etc., per value. Sometimes mixed data types
Units
• There are multiple renditions of the same units, even within one repository
• There may be several units for the same parameter code (STORET)
• If no value is recorded, are there no units?
Unit multipliers
• E.g. NCDC ASOS keeps measurements as integers, and provides a multiplier for each variable
Sources
• STORET requires organization IDs (for the organization that collected the data for STORET) in addition to site IDs
Time stamps: ISO 8601
• Many are in local times; how to convert? (UTC offsets given lat/lon and date?)
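Two of the challenges above, unit multipliers and local time stamps, come down to small conversions. The sketch below assumes the multiplier and the UTC offset are already known; finding the right offset from lat/lon and date (including daylight saving) is the open question raised on the slide and is not solved here. The function names and sample numbers are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def apply_multiplier(raw_value, multiplier):
    """Scale an integer-encoded measurement by its per-variable multiplier."""
    return raw_value * multiplier


def local_to_utc(local_time, utc_offset_hours):
    """Attach a fixed UTC offset to a naive local time and convert to UTC.

    A fixed offset is an assumption; a real conversion would need the
    site's time zone rules for that date.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return local_time.replace(tzinfo=local_zone).astimezone(timezone.utc)


# Hypothetical reading: stored as the integer 215 with a multiplier of 0.1.
scaled = apply_multiplier(215, 0.1)

# Hypothetical local noon at a site five hours behind UTC.
utc_time = local_to_utc(datetime(2006, 8, 13, 12, 0), -5)
iso_stamp = utc_time.isoformat()
```

Storing the result as an offset-aware ISO 8601 string keeps the time stamp unambiguous once values from several agencies are combined.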
Challenges: information model (2/3)
Values retrieval
• USGS: by site, variable, time range
• EPA: by organization-site, variable, medium, units, time range
• NCDC: fewer variables; period of record applies to site, not to seriesCatalog
• Now: web services have started to appear at agencies (more later)
Variable semantics
• Variable names and measurement methods don’t match. E.g. NWIS parameter # 625 is labeled ‘ammonia + organic nitrogen’; the Kjeldahl method is used for determination but is not mentioned in the parameter description. In STORET this parameter is referred to as Kjeldahl Nitrogen.
• One-to-one mapping is not always possible. E.g. NWIS: ‘bed sediment’ and ‘suspended sediment’ medium types vs. STORET’s ‘sediment’.
• Ontology tagging, semantic mediation (to be presented later)
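The variable-semantics problem can be illustrated with a small lookup table that tags agency-specific names with a shared concept. This is a toy sketch of the idea behind ontology tagging, not the Hydrotagger implementation; the concept labels are made up, and only the NWIS and STORET names come from the slide.

```python
# (source, local name or code) -> shared concept label (labels are hypothetical)
VARIABLE_CONCEPTS = {
    ("NWIS", "625"): "kjeldahl_nitrogen",
    ("STORET", "Kjeldahl Nitrogen"): "kjeldahl_nitrogen",
}

# Medium types: two NWIS terms map onto one STORET term, so the
# mapping is many-to-one and cannot be inverted without losing detail.
MEDIUM_CONCEPTS = {
    ("NWIS", "bed sediment"): "sediment",
    ("NWIS", "suspended sediment"): "sediment",
    ("STORET", "sediment"): "sediment",
}


def same_concept(first, second, table=VARIABLE_CONCEPTS):
    """True when both (source, name) pairs are tagged with the same concept."""
    concept = table.get(first)
    return concept is not None and concept == table.get(second)


def terms_for(concept, source, table=MEDIUM_CONCEPTS):
    """All local terms that a source uses for a shared concept."""
    return sorted(
        name for (src, name), tagged in table.items()
        if src == source and tagged == concept
    )


nitrogen_match = same_concept(("NWIS", "625"), ("STORET", "Kjeldahl Nitrogen"))
nwis_sediment_terms = terms_for("sediment", "NWIS")
```

A search for the shared concept can then fan out to each agency's own terms, which is what makes cross-repository discovery possible despite the naming mismatch.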
Challenges: data publication (3/3)
• Several modes of ODM publication. What are your preferences?
  • Set up your own HIS Server (or a virtual server at SDSC), register data to Central HIS
  • Prepare ODM and submit it to Central HIS for publication
  • Submit raw data to Central HIS for publication
• Security of the published data, and user agreements
• Simple tools for loading and massaging data in ODM
  • From static files
  • Streaming data, from different sensors
• Who is responsible for data quality and re-usability?
  • Data shall be tagged with ontologies, conform with controlled vocabularies, and be discoverable
  • Data shall be curated and annotated
• How do we extend the model to other data types?
Additional materials not covered earlier… • On integration with RBNB • On CUAHSI HIS role as the community mediator • On data cubes • Work with us…
RBNB DataTurbine (Ring Buffered Network Bus) • Scalable, secure, programmable, versatile for different data types and vendor interfaces, developer community, with many applications written (e.g. data viewers and plug-ins), open source, high performance streaming (10mb/s, 1000 frames/s) • Typical scenarios: CS loggers DBMS Loggernet CS loggers Monitoring and management apps NI loggers Other proprietary or in-house