eXtended Metadata Registry (XMDR) for Ecoinformatics Test Bed Interagency/International Cooperation on Ecoinformatics Copenhagen, Denmark June 20, 2006 Bruce Bargmeyer Lawrence Berkeley National Laboratory and Berkeley Water Center University of California, Berkeley Tel: +1 510-495-2905 bebargmeyer@lbl.gov
XMDR Purpose • Improve data management through use of stronger semantics management • Databases • XML data • Enable new wave of semantic computing • Take meaning of data into account • Process across relations as well as properties • May use reasoning engines, e.g., to draw inferences
Vocabulary Management • Vocabulary Management is the first step for use of semantic technologies • Define concepts and relationships • Harmonize terminology, resolve conflicts • Collaborate with stakeholders • An approach • Select a domain of interest • Enter core concepts and relationships • Enter metadata describing enterprise data • Engage community in vocabulary review • Harmonize, validate and vet the vocabulary
Use XMDR • For vocabulary repository • Register, harmonize, validate, and vet definitions and relations • To register mappings between multiple vocabularies • To register mappings of concepts to data • To provide semantics services • To register and manage the provenance of data XMDR is part of the infrastructure for semantics and data management.
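Registering mappings between vocabularies can be sketched as a small in-memory store. This is only an illustration, not the XMDR implementation: the class `MappingRegistry`, the concept identifiers (`vocabA:…`, `vocabB:…`), and the relation names (borrowed from SKOS naming for convenience) are all assumptions.

```python
from collections import defaultdict

# Relation names borrowed from SKOS; each maps to its inverse relation.
INVERSE = {
    "exactMatch": "exactMatch",
    "broadMatch": "narrowMatch",
    "narrowMatch": "broadMatch",
}


class MappingRegistry:
    """Toy store for mappings between concepts in different vocabularies."""

    def __init__(self):
        self._mappings = defaultdict(list)

    def register(self, source, target, relation="exactMatch"):
        """Record 'source <relation> target' and the inverse mapping."""
        self._mappings[source].append((relation, target))
        self._mappings[target].append((INVERSE[relation], source))

    def lookup(self, concept):
        """Return all (relation, other_concept) pairs known for a concept."""
        return list(self._mappings.get(concept, []))


# Hypothetical identifiers: vocabA and vocabB stand for two registered vocabularies.
registry = MappingRegistry()
registry.register("vocabA:stream", "vocabB:watercourse", "broadMatch")
registry.register("vocabA:aquifer", "vocabB:aquifer")

print(registry.lookup("vocabA:stream"))
print(registry.lookup("vocabB:watercourse"))
```

Storing the inverse at registration time keeps lookups symmetric, so a client can start from either vocabulary.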
XMDR Use • Upside • Collaborative • Supports interaction with community of interest • Shared evolution and dissemination • Enables Review Cycle • Standards-based – don’t lock semantics into proprietary technology • Foundation for strategic data-centric applications • Lays the foundation for Ontology-based Information Management • Content is reusable for many purposes • Downside • Managing semantics is HARD WORK, no matter how friendly the tools • Needs integration with other components
XMDR Project Participants • Collaborative, interagency effort • EPA, USGS, NCI, Mayo Clinic, DOD, LBNL … and others • Draws on and contributes to Interagency/International Cooperation on Ecoinformatics • Involves Ecoterm, international, national, state, local government agencies, other organizations as content providers and potential users • Interacts with many organizations around the world through ISO/IEC standards committees • Expected to interact with R&D under EU 7th Framework Program
XMDR Update • Extended the capabilities to register more difficult kinds of metadata and concept systems • Linguistic ontologies (OMEGA) • Axiomatized ontologies (OpenCyc) • Created new draft of ISO/IEC 11179. Working Draft 4 out for Comment, Committee Draft 1 to go out in June. • Includes UML packages to make it easier to understand and easier to align with other standards • Looking at alignment with OASIS ebXML Registry • Worked on mapping existing 11179 MDR (E2) extended content to proposed Edition 3, particularly Cancer Data Standards Repository (caDSR).
XMDR Update • Created new version of XMDR prototype software keyed to ISO/IEC 11179 Working Draft 4. • Revised ontology • Revised software • Reloaded previous content • Loading new content (ongoing) • OMEGA linguistic ontology • Cancer Data Standards Repository (caDSR) • OpenCyc ontology • SIC – NAICS codes • Mapping of NAICS to SIC codes • Improved interface
XMDR for Ecoinformatics Test Bed • Demonstrate the use of the eXtended Metadata Registry (XMDR) to unite concept systems (such as ontologies) and metadata (which describes data) to support semantic services that help to answer tough questions. • Load selected concept systems and metadata into the XMDR and then utilize semantics technologies, including semantics services to make use of data to demonstrate the results. • The demonstration is intended to help answer questions that are swirling around emerging semantics technologies. • Do these open new doors? Answer new questions? • How does this fit into the rest of what EPA is doing? • How can EPA lead in the use of these new technologies? • Why and how should EPA invest in the infrastructure that is necessary to make effective use of semantic technologies? • How is EPA aligning? What is the EPA strategy?
XMDR in Ecoinformatics Test Bed Think of XMDR as “embedded”: an essential part of an infrastructure upon which applications are built. • Embed XMDR in EU FP7 project technology • Embed XMDR in traditional database application environment • Embed XMDR in new semantic computing environment
XMDR in Ecoinformatics Test Bed • Include XMDR (ISO/IEC 11179 Edition 3) in architectures: DoD, EPA, Federal Enterprise Architecture • Include XMDR as key enabling capability for Ecoinformatics • Looking for a collaborator who has the “rest of the story” and can demonstrate the utility of XMDR
XMDR Demonstration using Water Information: Potential collaboration with the following: • USGS Terminology Web Services • EU FP7 EcoSemantics project • GEOSS data integration • Water Information System for Europe • Water Data Infrastructure (WADI) • Berkeley Water Center (BWC) Microsoft Technical Computing Initiative (TCI) • BWC Digital Watershed Research Thrust Area • Estuarine and Great Lakes Program (EAGLES) • LBNL Environmental Modeling projects
XMDR in Ecoinformatics Test Bed Demonstrate capabilities: • Register existing and formative water-related concept systems, based on their underlying structures, such as graphs of varying complexity. • Register water ontologies as they are developed. • Interrelate concept systems with each other. • Support efforts to converge on consistency through harmonization and vetting activities. • Interrelate concepts in concept systems with concepts in metadata and concepts in databases, knowledge bases, and text. • Provide semantic services needed to support traditional computing as well as semantic computing. • E.g., dereferencing the URIs used in creating RDF statements, by providing relevant information describing the referenced concept and its authoritative standing within some community of interest.
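The URI dereferencing service mentioned in the last bullet can be sketched as a lookup that returns a description of the concept and its standing. Everything concrete here is an assumption for illustration: the `example.org` URIs, the record fields, the status values, and the in-memory dictionary standing in for the registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConceptRecord:
    uri: str
    label: str
    definition: str
    status: str      # standing of the concept, e.g. a registration status
    community: str   # community of interest that vouches for it


# Hypothetical registry content; URIs, definitions, and statuses are made up.
REGISTRY = {
    record.uri: record
    for record in [
        ConceptRecord("http://example.org/concept/river", "river",
                      "a natural flowing watercourse",
                      "Standard", "example water community"),
        ConceptRecord("http://example.org/concept/estuary", "estuary",
                      "the tidal mouth of a river",
                      "Recorded", "example water community"),
    ]
}


def dereference(uri: str) -> Optional[ConceptRecord]:
    """Return the registered description of a concept URI, or None."""
    return REGISTRY.get(uri)


def describe_statement(triple):
    """Dereference every URI term of a (subject, predicate, object) triple."""
    return {term: dereference(term)
            for term in triple if term.startswith("http://")}


statement = (
    "http://example.org/concept/river",
    "http://example.org/relation/flowsInto",
    "http://example.org/concept/estuary",
)
described = describe_statement(statement)
for term, record in described.items():
    print(term, "->", record.status if record else "not registered")
```

A consumer of RDF statements could use such a lookup to decide how much weight to give a statement whose terms are unregistered or only provisionally recorded.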
Collaborate with USGS Terminology Web Services • Already working with Mike Frame • Capability to use web service to access terms in multiple concept systems • Developed XMDR REST API to support this • More from Mike Frame
XMDR Prototype Modular Architecture: primary functional components [Architecture diagram. Components: Metadata Sources (concept systems, data elements); Users (web browsers, client software); Content Loading & Transformation; Application Program Interface; Human User Interface; Authentication Service; Validation; Mapping Engine; Search & Content Serving; Metamodel specs (UML & Editing); XMDR data model & exchange format (XML, RDF, OWL); Logic Indexer; Text Indexer; Registry Store (standard XMDR files); XMDR metamodel (OWL & XML Schema); Text Index; Logic Index]
XMDR Prototype open source software components [Architecture diagram. Components and the software used for each: Metadata Sources (concept systems, data elements); Users (web browsers, client software); Content Loading & Transformation (LexGrid & custom); Application Program Interface (REST); Human User Interface (HTML from JSP and JavaScript; Exhibit); Authentication Service; Validation (XML Schema); Mapping Engine; Search & Content Serving (Jena, Lucene); Metamodel specs, UML & Editing (Poseidon, Protege); XMDR data model & exchange format (XML, RDF, OWL); Logic Indexer (Jena & Pellet); Text Indexer (Lucene); Registry Store (standard XMDR files); XMDR metamodel (OWL & XML Schema); Text Index; Logic Index; Postgres Database]
New REST-style API facilitates interface for Web Services [Architecture diagram repeating the components and open source software of the previous slide, with a Third Party Software box added next to Users (web browsers, client software) and the Application Program Interface (REST)]
Collaborate with GEOSS (with EPA and Others) • Global Earth Observation System of Systems (GEOSS) ten-year implementation plan. • GEOSS is envisioned as a large national and international cooperative effort to bring together existing and new hardware and software, making it all compatible in order to supply data and information at no cost. The U.S. and developed nations have a unique role in developing and maintaining the system, collecting data, enhancing data distribution, and providing models to help all of the world's nations. Outcomes and benefits of a global informational system will include: • disaster reduction • integrated water resource management • ocean and marine resource monitoring and management • weather and air quality monitoring, forecasting and advisories • biodiversity conservation • sustainable land use and management • public understanding of environmental factors affecting human health and well being • better development of energy resources • adaptation to climate variability and change • Demonstrate data integration
GEOSS Interoperability [Process diagram. Labels: Request for help with interoperability between two GEOSS components; GEOSS Societal Benefit Activity; GEOSS Standards and Interoperability Forum; Experts, SDOs, Community; Recommendation; Register the recommendations, if “accepted”; Register the issue as “under review”; Study for possible existing solutions; GEOSS Interoperability Registry; GEOSS Components Registry; GEOSS Standards Registry; Base GEOSS Standards; References] ADC Co-Chair Meeting 27 Nov 2006 From: S.J.S. Khalsa, IEEE Geoscience and Remote Sensing Society
Collaborate with Water Information System for Europe (WISE) • Register metadata about WISE data elements • Register concept systems with concepts used in WISE data (glossary … ontology) • Support data harmonization • Initially shows support for traditional database computing • Helps to enable introduction of semantic computing for WISE • Are there any people working on WISE metadata and concept systems?
Collaboration with EPA Estuarine and Great Lakes Program (EAGLES) EAGLES Program is designed to: • Develop indicators and/or procedures useful for evaluating the ‘health’ or condition of important coastal natural resources (e.g., lakes, streams, coral reefs, coastal wetlands, inland wetlands, rivers, estuaries) at multiple scales, ranging from individual communities to coastal drainage areas to entire biogeographical regions. • Develop indicators, indices, and/or procedures useful for evaluating the integrated condition of multiple resource/ecosystem types within a defined watershed, drainage basin, or larger biogeographical region of the U.S. • Develop landscape measures that characterize landscape attributes and that concomitantly serve as quantitative indicators of a range of environmental endpoints, including water quality, watershed quality, freshwater/estuarine/marine biological condition, and habitat suitability. • Develop nested suites of indicators that can both quantify the health or condition of a resource or system and identify its primary stressors at local to regional scales. • XMDR as extension to Environmental Information Management System (EIMS)
Collaborate with Water Data Infrastructure (WADI) • WADI is a Semantic Computing application. • WADI goes from data collection to indicator display • XMDR could support concept management for WADI • WADI still needs some R&D and Demonstration • E.g., work on "integration" between a "data-layer" (real data of RWS, all in XML and some basic low level RDF) and some higher layer of vocabularies/thesauri/ontologies
Potential Collaboration with Berkeley Water Center Digital Watershed Research Thrust Area • Understanding hydrological processes with sufficient accuracy, in the face of anthropogenic and global changes, is a prerequisite to successful water management. • Progress in this area requires research in engineering and IT: data, technologies, modeling, analysis tools (Theme 1), and cyberinfrastructure (Theme 2). • Developing an understanding requires synthesis of theory, concepts and engineering/IT tools
Digital Watershed Theme 1: TOOLS Development of novel sensors, technologies, and modeling/analysis approaches is needed to provide information about complex water systems and to ensure cost-effective and sustainable delivery of clean water. Examples: • SENSORS to autonomously measure important components of the water cycle and water quality at sufficient resolution and coverage. • TECHNOLOGIES that promote, for example, point-of-use clean water use or cost-efficient desalinization. • NUMERICAL APPROACHES that represent the coupling between atmosphere, vegetation, vadose and groundwater processes that are important for accurately predicting watershed behavior and sustainability.
Theme 2: Water CyberInfrastructure • This theme focuses on the development of cyber-infrastructure that will enable researchers and water managers to: • Curate, assimilate, and clean complex, multi-scale datasets collected from networked micro sensors to global satellite platforms; • Connect datasets to analysis, modeling, and visualization tools • to facilitate hypotheses testing and eventually decision making.
Microsoft Technical Computing Initiative Approach • Demonstrate an advanced cyber-infrastructure approach for tackling 21st century challenges by leveraging web service concepts, technologies, and information technology expertise; • Early focus will integrate the most critical components needed to address relevant science questions, rather than creating a fully developed problem solving environment. • Demonstrate prototypes with end-to-end scenarios, and use feedback from water scientists to refine and augment • Work on two different, yet scientifically related projects that will : • Permit us to understand what is common and what is distinct between different water research approaches; • Allow us to work with a wide range of water datasets and analysis techniques; • Provide demonstration vehicles to two different water research communities.
Technical Computing Initiative The Microsoft TCI will focus on development based on the needs of different water research communities • CARBON-CLIMATE • Protocols for AmeriFlux data acquisition and reporting are well defined; • Data are small and fairly clean; • Will permit development and testing of a portal that will be rapidly useful for water scientists. • Advances developed during this project will be applied to the development of the more challenging Central Valley portal. • CA WATER RESOURCES • Extremely diverse datasets from many data providers; • Datasets typically ‘dirtier’ and larger than AmeriFlux; • Project offers significant potential for transferability to other basins; • Will build on advances developed under Carbon-Climate portal.
Carbon-Climate Workbench [Workflow diagram. Hosted data: AmeriFlux climate data, STATSGO soils data, MODIS products. Access: web service interface to data and tools; web-based workbench access; compute resources. Ecology toolbox steps: choose AmeriFlux area/transect, time range, data type; design workflow; data harvest (sites 1-16); import other datasets. Data cleaning tools: gap fill (technique A, technique B). Knowledge generation tools: statistical and graphical analysis of Climate, STATSGO, and MODIS data. Data mining and analysis tools; version control. Modeling tools: Canoak model (site 1, site 9). Data products shown: LAI, Temp, Fpar, Veg Index, Surf Refl, NPP, Albedo. Visualization tools: network display; LAI statistical and graphical analysis.]
California Water CyberInfrastructure • BWC is in discussion with several groups to determine optimal project/place to develop and demonstrate Water TCI. • Criteria: • Agency involvement and interest; • Problem Characteristics (Science and socioeconomic importance; reward/risk); • Leveraging opportunity (projects / datasets); • Transferability to other basins; • Visibility • Springboard for Digital CAL synthesis • Ideal: Work with two different basins to explore what is similar and different in terms of water data IT and science challenges; • Long Term: Scalability between water agency / basin datasets and supply/demand estimates and DWR State components. State Water Plan.
Example Water TCI focus: Central Valley Water Resources and Quality • Across the US, groundwater supplies roughly 40 percent of drinking water; • The State of California alone uses about 16 million acre-feet of ground water each year, more than any other State in the Nation, and 80% of that goes toward crop irrigation; • The 400-mile-long Central Valley supplies ¼ of the food in the US. • California groundwater quantity and quality are critical to the economic viability of the state; • Recognizing this importance, USGS has developed a $50 million program focusing on CA water quality monitoring. • PROBLEM: Disparate datasets and tools hinder the ability to assess water resources and quality in the Central Valley (and most basins in the world). Central Valley (Ken Belitz, USGS)
USGS and State Water Resources Control Board GAMA* and RASA** Projects • The importance of California groundwater quality and resources has prompted the USGS and SWRCB to develop a project to model flow pathways in the Central Valley (Central Valley RASA) and a $50M project to monitor ground water quality (GAMA); • As the GAMA project focuses on intensive data collection, no plans have been made to curate these data or to federate them with the other water datasets critical for understanding water balance and quality over time in the Central Valley. * Ground Water Ambient Monitoring and Assessment Program; ** Regional Aquifer Systems Analysis (Ref: Ken Belitz, USGS)
Example of GAMA Water Quality Data Ken Belitz (USGS)
California Water Portal [Architecture diagram. Labels: Distributed California Water Resource Datasets; Data Harvesting and Transformations; BWC Data Gateway; Data Cleaning, Models, Analysis Tools; BWC Analysis Gateway; Computational Resources; BWC Water Portal; Digital CAL; Knowledge discovery, Hypothesis testing, Water Synthesis; Dissemination and Archiving]
FYI: Special Edition of IJMSO • Editing special edition of International Journal of Metadata, Semantics and Ontologies • Open Forum on Metadata Registries • Topics related to metadata registries • Inviting people to write articles • Contact Bruce Bargmeyer
In Response to Mike Frame’s Question Describe the API for Terminology Web Services.
Initial XMDR REST-style Application Programming Interface (API) • Search Methods (GET) • Text Search • SPARQL Search • XMDR Search (not documented yet) • Registry Information Methods • Summary information • registered models • Identified Items • Method Parameters • can be included as part of any method • as part of URL • Accept_type (what xml components to expect) • Stylesheet (how to display results)
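A minimal sketch of how a client might compose GET requests for the search methods listed above. The base URL, the path segments (`search/text`, `search/sparql`), and the query-string spellings `query`, `accept_type`, and `stylesheet` are assumptions for illustration; the slide names the Accept_type and Stylesheet parameters but does not show how they appear in a URL.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the slides do not give the real endpoint.
BASE_URL = "http://example.org/xmdr"


def build_search_url(method, query, accept_type=None, stylesheet=None):
    """Compose a GET URL for a search method with optional method parameters."""
    params = {"query": query}
    if accept_type is not None:
        params["accept_type"] = accept_type   # what XML components to expect
    if stylesheet is not None:
        params["stylesheet"] = stylesheet     # how to display results
    return f"{BASE_URL}/search/{method}?{urlencode(params)}"


text_url = build_search_url("text", "ground water",
                            accept_type="application/xml")
sparql_url = build_search_url("sparql",
                              "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
print(text_url)
print(sparql_url)
```

Passing the method parameters in the URL keeps every search a plain GET, which is what lets a browser or a simple web-service client use the same interface.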
*REST API (Search Results)
searchResult (application/xml)
<searchResult>
  <queryID>jfs934js</queryID>
</searchResult>
textResultSet (application/xml)
<resultSet>
  <itemSet>
    <item>
      <!-- element names will be names of fields in the Lucene document and element values will be their string values -->
    </item>
    …
    <item>
    </item>
  </itemSet>
  <locallyAvailable>0</locallyAvailable>
</resultSet>
sparqlResultSet (application/xml)
<resultSet>
  <itemSet>
    <item>
      <!-- SPARQL result set, in XML format; fill in from SPARQL protocol spec -->
    </item>
  </itemSet>
  <locallyAvailable>0</locallyAvailable>
</resultSet>
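A client could read a textResultSet response along these lines. This is a sketch under assumptions: the sample document follows the resultSet layout shown on the slide, but the field names (`name`, `definition`) and their values are made-up stand-ins for whatever fields the Lucene documents actually carry.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the textResultSet layout; field names are hypothetical.
SAMPLE = """\
<resultSet>
  <itemSet>
    <item><name>aquifer</name><definition>water-bearing rock layer</definition></item>
    <item><name>estuary</name><definition>tidal mouth of a river</definition></item>
  </itemSet>
  <locallyAvailable>0</locallyAvailable>
</resultSet>
"""


def parse_text_result_set(xml_text):
    """Return (items, locally_available) from a textResultSet document."""
    root = ET.fromstring(xml_text)
    # Each child element of <item> is one field: tag = field name, text = value.
    items = [
        {field.tag: (field.text or "") for field in item}
        for item in root.findall("./itemSet/item")
    ]
    locally_available = int(root.findtext("locallyAvailable", default="0"))
    return items, locally_available


items, locally_available = parse_text_result_set(SAMPLE)
print(items)
print(locally_available)
```

Because the element names are the field names, the parser does not need to know the fields in advance; it simply turns each item into a dictionary.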
*REST -- Registry (content) methods (* indicates that feature is not yet implemented)
*REST API (Registry Results)
contentList (application/xml)
<contentList>
  <item>nameOfItem</item>
  …
  <item>nameOfItemN</item>
</contentList>
Acknowledgements • Susan Hubbard, BWC • John McCarthy, LBNL • Karlo Berket, LBNL This material is based upon work supported by the National Science Foundation under Grant No. 0637122, USEPA and USDOD. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, USEPA or USDOD.