HYDROLOGIC METADATA CATALOG AND SEMANTIC SEARCH SERVICES IN CUAHSI HIS. Thomas Whitenack, David Valentine, Ilya Zaslavsky, Michael Piasecki, David G. Tarboton, Jeffery S. Horsburgh, Timothy Whiteaker, Daniel Ames, David R. Maidment. http://his.cuahsi.org/. CUAHSI HIS
CUAHSI HIS • The CUAHSI Hydrologic Information System (HIS) is an Internet-based system that supports the sharing of hydrologic data. It comprises hydrologic databases and servers connected through web services, as well as software for data publication, discovery, and access. • [Diagram: HIS Central, the data discovery and integration platform (like the search portals Google, Yahoo, Bing); HydroServer, the data publication platform; HydroDesktop, the data synthesis and research platform. Connection labels: Metadata Search, Metadata Services, Service and data theme metadata, Service registration, Data carts, Catalog harvesting, Data Services, Water Data Services, Spatial Data Services.]
What is the Hydrologic Metadata Catalog? • Database for the HIS Central registry and Search Services. • Stores Site, Variable, and Series information, plus general metadata for each registered service. Data Values are not in the Catalog. • Its purpose is to enable searching across federated services, returning the information that leads client applications to the data values.
HIS Central • HIS Central is a web application where you can register Water Data Services into the Hydrologic Metadata Catalog.
Hydrologic Metadata Catalog Harvesting • Each registered water data service is harvested using the standard Water Data Service methods: • GetSites: returns a list of the site records available from the service. • GetSiteInfo: requested once for each site; returns all variables monitored at the site, the period of record for each variable, and the number of values available.
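The harvesting loop described above can be sketched in a few lines. This is only an illustration, not the HIS Central harvester: the `harvest` function, the `SeriesSummary` type, the dictionary keys, and the in-memory `FAKE_SERVICE` are all assumptions standing in for real GetSites and GetSiteInfo calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class SeriesSummary:
    """One catalog row: a variable at a site, with no data values."""
    site_code: str
    variable_code: str
    begin_date: str
    end_date: str
    value_count: int


def harvest(get_sites: Callable[[], Iterable[str]],
            get_site_info: Callable[[str], Iterable[Dict]]) -> List[SeriesSummary]:
    """Call GetSites once, then GetSiteInfo once for each site returned."""
    catalog = []
    for site_code in get_sites():
        for series in get_site_info(site_code):
            catalog.append(SeriesSummary(
                site_code=site_code,
                variable_code=series["variable_code"],
                begin_date=series["begin_date"],
                end_date=series["end_date"],
                value_count=series["value_count"],
            ))
    return catalog


# Hypothetical in-memory service standing in for a real Water Data Service.
FAKE_SERVICE = {
    "site-1": [
        {"variable_code": "temp", "begin_date": "2001-01-01",
         "end_date": "2005-12-31", "value_count": 1826},
        {"variable_code": "flow", "begin_date": "2003-06-01",
         "end_date": "2004-06-01", "value_count": 500},
    ],
    "site-2": [
        {"variable_code": "temp", "begin_date": "2008-01-01",
         "end_date": "2008-02-11", "value_count": 42},
    ],
}

catalog = harvest(lambda: FAKE_SERVICE.keys(), lambda code: FAKE_SERVICE[code])
```

Only the summary of each series is stored; the values themselves stay with the registered service.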
Ontology • Keyword Hierarchy used to categorize and assist in the discovery of monitored variables. • Each Variable is “tagged” to a keyword concept.
Storing the Ontology in the database • Concepts • Hierarchy • ConceptPaths
Ontology Service Methods • getSearchableTerms • Simply returns a list of all searchable Keyword Concepts. Searchable concepts include “Branch” concepts as well as “Leaf” concepts. Higher-level branches are not included, as they are too broad. • getOntologyTree • Given a “Branch” concept, it returns the ontology terms below it in a tree structure. (Passing “HydroSphere” returns the entire ontology.) • getWordList • Given a substring, such as “temp”, it returns all keywords that contain that sequence of characters. This is intended as a usability feature for client applications.
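A minimal sketch of how getOntologyTree and getWordList might behave, using a tiny invented ontology. The concept names other than “HydroSphere”, the nested-dict tree shape, and the case-insensitive matching are assumptions for illustration; the real service's response format is not shown in these slides.

```python
from typing import Dict, List

# Hypothetical miniature ontology: parent concept -> child concepts.
ONTOLOGY: Dict[str, List[str]] = {
    "HydroSphere": ["Physical", "Chemical"],
    "Physical": ["Temperature", "Discharge"],
    "Chemical": ["Nutrient"],
    "Nutrient": ["Nitrogen", "Phosphorus"],
}


def get_ontology_tree(concept: str) -> dict:
    """Return the concept and everything below it as a nested dict."""
    children = ONTOLOGY.get(concept, [])
    return {"keyword": concept,
            "children": [get_ontology_tree(child) for child in children]}


def all_concepts() -> List[str]:
    """List every concept once, in first-seen order."""
    seen: List[str] = []
    for parent, children in ONTOLOGY.items():
        for name in [parent, *children]:
            if name not in seen:
                seen.append(name)
    return seen


def get_word_list(substring: str) -> List[str]:
    """Return keywords containing the substring (case-insensitive here)."""
    needle = substring.lower()
    return [name for name in all_concepts() if needle in name.lower()]
```

A client could call `get_word_list("temp")` while the user types to suggest matching keywords, then pass the chosen keyword to a search method.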
Search Service Methods (1/3) • GetWaterOneFlowServiceInfo • Returns a list of all the services registered with HIS Central. • GetServicesInBox • Same as the GetWaterOneFlowServiceInfo method, but restricted by a geographic envelope. Both methods return the following information: WSDL endpoint for the Water Data service, title, name, organization, contact info, estimated number of values, number of sites, number of variables, and geographic extent.
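Restricting services by a geographic envelope comes down to a box test against each service's geographic extent. The sketch below assumes an intersection test; the slides do not say whether the real GetServicesInBox uses intersection or containment, and the `Box`, `ServiceInfo`, and `services_in_box` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Box:
    xmin: float  # west longitude
    ymin: float  # south latitude
    xmax: float  # east longitude
    ymax: float  # north latitude

    def intersects(self, other: "Box") -> bool:
        """True when the two envelopes share at least one point."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


@dataclass(frozen=True)
class ServiceInfo:
    title: str
    extent: Box  # geographic extent recorded for the service


def services_in_box(services: Iterable[ServiceInfo], query: Box) -> List[ServiceInfo]:
    """Keep only services whose extent intersects the query envelope."""
    return [s for s in services if s.extent.intersects(query)]


# Invented example services and query envelope.
services = [
    ServiceInfo("A", Box(-120.0, 30.0, -110.0, 40.0)),
    ServiceInfo("B", Box(-80.0, 25.0, -70.0, 35.0)),
]
hits = services_in_box(services, Box(-115.0, 35.0, -100.0, 45.0))
```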
Search Service Methods (2/3) • GetSitesInBox • Requires • Geographic extent (box) • Concept Keyword (can be empty) • NetworkIDs (used to restrict returned values, can be empty). • Returns information necessary to display sites on a map and request more information about series.
Search Service Methods (3/3) • GetSeriesCatalogForBox • The primary method for searching the catalog. Returns series record information. Client application uses this information to request the data values from the registered service. • You provide: • Geographic extent (box) • Temporal extent (begin/end dates) • Concept Keyword (can be empty) • NetworkIDs (used to restrict returned values, can be empty).
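The four inputs listed above can be read as four filters applied to each series record. The sketch below is an assumption about the logic, not the service's implementation: `Series` carries only the fields needed for filtering, the keyword is matched exactly (the real service may also match concepts below the keyword in the ontology), and an empty keyword or empty network list means "no restriction".

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class Series:
    network_id: int
    latitude: float
    longitude: float
    begin: date
    end: date
    concept_keyword: str


def search_series(catalog: Iterable[Series],
                  xmin: float, ymin: float, xmax: float, ymax: float,
                  begin: date, end: date,
                  keyword: str = "",
                  network_ids: Sequence[int] = ()) -> List[Series]:
    """Return series inside the box that overlap the requested period."""
    hits = []
    for s in catalog:
        in_box = xmin <= s.longitude <= xmax and ymin <= s.latitude <= ymax
        overlaps = s.begin <= end and begin <= s.end
        keyword_ok = not keyword or s.concept_keyword == keyword
        network_ok = not network_ids or s.network_id in network_ids
        if in_box and overlaps and keyword_ok and network_ok:
            hits.append(s)
    return hits


# Invented catalog rows for illustration.
CATALOG = [
    Series(1, 40.0, -111.0, date(2000, 1, 1), date(2005, 1, 1), "Temperature"),
    Series(2, 40.5, -111.5, date(2006, 1, 1), date(2008, 1, 1), "Temperature"),
    Series(1, 10.0, -50.0, date(2000, 1, 1), date(2009, 1, 1), "Discharge"),
]

found = search_series(CATALOG, -112.0, 39.0, -110.0, 41.0,
                      date(2004, 1, 1), date(2007, 1, 1))
```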
What info is in a Series Record, you ask? Everything required to create a data cart. • SeriesRecord • ServCode – (string) service's unique code, e.g. “nwis” • ServURL – (string) WSDL address of service • Location – (string) site code • VarCode – (string) variable code associated with the series • Varname – (string) variable name • beginDate – (string) start date of series • endDate – (string) end date of series (as of last harvest) • Authtoken – (string) unimplemented • ValueCount – (int) number of values in series • Sitename – (string) site name • Latitude – (double) • Longitude – (double) • datatype – (string) • valuetype – (string) • samplemedium – (string) • timeunits – (string) • conceptKeyword – (string) Ontology keyword to which this variable is tagged • genCategory – (string) • TimeSupport – (string)
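One way a client might hold a series record and turn it into a data-cart entry is sketched below. The field names are snake_case renderings of the fields listed above; the `values_request` helper, its dictionary keys, the placeholder URL, and all example values other than the “nwis” service code are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesRecord:
    serv_code: str
    serv_url: str
    location: str
    var_code: str
    var_name: str
    begin_date: str
    end_date: str
    auth_token: str
    value_count: int
    site_name: str
    latitude: float
    longitude: float
    data_type: str
    value_type: str
    sample_medium: str
    time_units: str
    concept_keyword: str
    gen_category: str
    time_support: str

    def values_request(self) -> dict:
        """Collect what a client needs to ask the service for data values."""
        return {
            "endpoint": self.serv_url,
            "location": self.location,
            "variable": self.var_code,
            "start_date": self.begin_date,
            "end_date": self.end_date,
        }


# Invented record; only the "nwis" service code comes from the slide.
record = SeriesRecord(
    serv_code="nwis", serv_url="http://example.org/service?WSDL",
    location="site-1", var_code="temp", var_name="Water temperature",
    begin_date="2001-01-01", end_date="2005-12-31", auth_token="",
    value_count=1826, site_name="Example site", latitude=40.0,
    longitude=-111.0, data_type="Average", value_type="Field Observation",
    sample_medium="Surface Water", time_units="day",
    concept_keyword="Temperature", gen_category="Hydrology",
    time_support="1",
)
request = record.values_request()
```

The record points at the data rather than containing it, which is why the service address, site code, variable code, and date range are enough to fetch the values later.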
Hydrologic Metadata Catalog Stats • Services: 47 • Variables: 4,812 • Sites: 1,889,199 • Series: 8,516,440 • Values referenced: 4,622,778,988
Future Development • Need to standardize the services to use the WaterML data exchange format. • Need to harvest data directly from HydroServer capabilities services. • Need to extend the search to support geometries other than an envelope (HUCs, counties, etc.).
Conclusions • Searching across multiple, federated services is made possible by harvesting and indexing metadata from registered services. • Metadata is data. The catalog pushes the limits of what counts as metadata.
Questions? • twhitenack@sdsc.edu • http://hiscentral.cuahsi.org • http://hiscentral.cuahsi.org/webservices/hiscentral.asmx