Linked Data use in the Global Change Information System Stephan Zednik, Rensselaer Polytechnic Institute Peter Fox, Rensselaer Polytechnic Institute Marshall X Ma, Rensselaer Polytechnic Institute
The Global Change Research Act and USGCRP
• USGCRP was mandated by Congress in the Global Change Research Act (GCRA) of 1990 (P.L. 101-606): “To provide for development and coordination of a comprehensive and integrated United States Research Program which will assist the Nation and the world to understand, assess, predict, and respond to human-induced and natural processes of global change.”
U.S. Global Change Research Program
• Coordinates Federal research to better understand and prepare the nation for global change
• Prioritizes and supports cutting-edge scientific work in global change
• Assesses the state of scientific knowledge and the Nation’s readiness to respond to global change
• Communicates research findings to inform, educate, and engage the global community
National Climate Assessment
• Major product of the USGCRP
• Mandate to release every 4 years
• Integrates and summarizes current research
• Highlights key knowledge to improve policy choices
• Hundreds of expert contributors
• 60-member federal advisory committee
• 3rd National Climate Assessment (2014)
• Web-first delivery
• http://nca2014.globalchange.gov
…in the aftermath of this… CLIMATEGATE
Global Change Information System (GCIS)
Long-Term Vision: The Global Change Information System (GCIS) is intended to eventually become a unified web-based source of authoritative, accessible, usable, and timely information about climate and global change for use by scientists, decision makers, and the public.
Initial Prototype: Coincident with the release of the Third National Climate Assessment (NCA) on May 6, 2014, the GCIS supports the distribution, presentation, and documentation needs of the NCA, integrating that content into the USGCRP web site and demonstrating the potential for GCIS to support the long-term vision.
Complete Traceability for NCA Content
[Diagram: a spectrum from Transparency to Reproducibility, running from easier to harder: Traceable Sources, Traceable Data, Traceable Processes, Traceable Tools]
• References
• Image sources
• Data sources
• Link to datasets
• Complete metadata
• Description of methods
• Access to process info & review
• Access to computer code
• Description of systems and platforms
Data and the National Climate Assessment: The Challenge
• More than 250 named authors (>1000 contributing!)
• 827 pages
• 43 Chapters and Appendices
• 284 Figures
• More than 600 Images
• 3395 References
• Approximately 83 data sources used across as many as 235 instances
GCIS Information Model and Semantic Application Prototypes (GCIS-IMSAP)
• Vocabulary and ontology development within the context of the overall development of semantic prototypes for the National Climate Assessment portals
• Search and browse options that inspire confidence
• Data providers will be citable with detailed provenance
GCIS Data Mining
Structured information with relationships allows integrated data mining, searching, and metrics.
• What projects provided data used to produce figures that were referenced in the 2013 NCA section about coastal sea level rise impacts?
• Which data centers hold data referenced by papers related to forests in the Midwest?
• Which agencies have people working on projects related to societal impacts of extreme weather events?
• Show me the latest papers about health impacts of air quality in California. Which datasets were used in the analysis of air quality in California?
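To illustrate why explicit relationships make questions like these answerable, here is a minimal Python sketch that walks a chain of links from a chapter to the projects behind its figures. The triples, identifiers, and predicate names (`appearsIn`, `usesDataset`, `producedBy`) are invented for illustration and are not taken from the GCIS ontology or its data.

```python
# Toy (subject, predicate, object) triples. Every identifier and predicate
# name here is hypothetical and only stands in for real structured metadata.
TRIPLES = [
    ("figure/sea-level-rise", "appearsIn", "chapter/coastal"),
    ("figure/sea-level-rise", "usesDataset", "dataset/tide-gauges"),
    ("dataset/tide-gauges", "producedBy", "project/coastal-monitoring"),
    ("figure/forest-cover", "appearsIn", "chapter/forests"),
    ("figure/forest-cover", "usesDataset", "dataset/land-cover"),
    ("dataset/land-cover", "producedBy", "project/land-survey"),
]


def objects(subject, predicate):
    """Return all objects linked from `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return all subjects linked to `obj` by `predicate`."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def projects_behind_chapter(chapter):
    """Follow chapter <- figure -> dataset -> project and collect the projects."""
    found = []
    for figure in subjects("appearsIn", chapter):
        for dataset in objects(figure, "usesDataset"):
            for project in objects(dataset, "producedBy"):
                if project not in found:
                    found.append(project)
    return found


print(projects_behind_chapter("chapter/coastal"))
```

In a real deployment the same join would more likely be expressed as a query against the published linked data; the nested loops above only show the shape of the traversal.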
GCIS Linked Data
• Create an entity from the structured metadata about each thing, referencing related entities.
• Identify it on the web with a persistent, controlled, resolvable identifier.
• Present it with a human-readable web page and a machine interface.
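The three steps above can be sketched as a small Python data structure: an entity built from metadata, identified by a URI under the site root, and rendered both for people and for machines. The class, its field names, the `partOf` relation, and the JSON layout are assumptions made for this sketch; they do not reproduce the actual GCIS schema or API output.

```python
import json
from dataclasses import dataclass, field

# Site root as given elsewhere in this deck.
BASE = "https://data.globalchange.gov"


@dataclass
class Entity:
    """Hypothetical in-memory record for one identified thing."""
    path: str                                    # e.g. "/report/nca3"
    kind: str                                    # e.g. "Figure"
    title: str
    related: dict = field(default_factory=dict)  # relation name -> list of paths

    @property
    def uri(self):
        """Resolvable identifier: site root plus the entity's path."""
        return BASE + self.path

    def to_text(self):
        """Human-readable one-line rendering."""
        return f"{self.kind}: {self.title} <{self.uri}>"

    def to_json(self):
        """Machine-readable rendering; related entities are given by URI."""
        return json.dumps(
            {
                "uri": self.uri,
                "type": self.kind,
                "title": self.title,
                "related": {
                    rel: [BASE + p for p in paths]
                    for rel, paths in self.related.items()
                },
            },
            sort_keys=True,
        )


fig = Entity(
    path="/report/nca3/figure/overview-observed-change-in-very-heavy-precipitation-2",
    kind="Figure",
    title="Example figure title (placeholder)",
    related={"partOf": ["/report/nca3"]},
)
print(fig.to_text())
print(fig.to_json())
```

The point of the design is that the same record drives both views, and that relationships are stored as identifiers rather than as embedded copies.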
GCIS Ontology
• Reports, Figures, Images, Research Papers, Journals, Measurements, Datasets, Instruments, Agencies, Projects, People, Models, Algorithms, …
• Findings: “Climate is changing”, “Sea Level is Rising”, …
• Concepts: “Impacts of Climate Change on Human Health”, “Adaptation”, …
• Define entity classes, properties, and interrelationships based on elements from the NCA report
• Focus on provenance of report elements and “traceable accounts” for report findings
• Reuse existing ontologies where possible
Use Case 1: Visit the data center website of the dataset used to generate a report figure
Data Provenance of Report Figure
• Structured representation of report elements
• Figure cites source publication
• Publication data citation
• Dataset landing page
Use Case 2: Find people involved in the generation of a chapter in the NCA3 draft report and their roles
• Chapter authors
• Chapter editors
• Chapter key finding authors
• Cited authors
• Contributors to cited datasets
• more…
W3C PROV Ontology
• Attribution (with role information)
• Derivation relationships for contributing inputs
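As a rough illustration of these two PROV patterns, the Python sketch below follows `prov:wasDerivedFrom` links from a figure back to its inputs and collects agents with their roles along the way. The property names `prov:wasDerivedFrom`, `prov:agent`, and `prov:hadRole` come from PROV-O; the `ex:` identifiers, the roles, and the flat list-and-dict layout are invented for this example and are not GCIS data.

```python
# Derivation statements as (subject, predicate, object) triples.
# All "ex:" identifiers are hypothetical.
PROV_TRIPLES = [
    ("ex:figure", "prov:wasDerivedFrom", "ex:publication"),
    ("ex:publication", "prov:wasDerivedFrom", "ex:dataset"),
]

# Flattened stand-in for PROV qualified attributions: each record ties an
# entity to an agent and the role that agent played.
ATTRIBUTIONS = [
    {"entity": "ex:figure", "prov:agent": "ex:alice", "prov:hadRole": "ex:author"},
    {"entity": "ex:dataset", "prov:agent": "ex:data-center", "prov:hadRole": "ex:distributor"},
]


def trace_sources(entity):
    """Return every entity reachable via prov:wasDerivedFrom, nearest first."""
    seen = []
    queue = [entity]
    while queue:
        current = queue.pop(0)
        for s, p, o in PROV_TRIPLES:
            if s == current and p == "prov:wasDerivedFrom":
                if o != entity and o not in seen:
                    seen.append(o)
                    queue.append(o)
    return seen


def contributors(entity):
    """List (agent, role) pairs for an entity and everything it derives from."""
    chain = [entity] + trace_sources(entity)
    return [
        (a["prov:agent"], a["prov:hadRole"])
        for e in chain
        for a in ATTRIBUTIONS
        if a["entity"] == e
    ]


print(trace_sources("ex:figure"))
print(contributors("ex:figure"))
```

Keeping the role on the attribution record, rather than on the agent, is what lets the same person appear as an author of one element and an editor of another.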
Use Case 3: Provenance tracing of NASA contributions to Figure 1.2 in the NCA3 draft report
Report Finding Traceable Account … prepare a summary “traceable account” (a few sentences to a paragraph) that describes the main factors that contributed to the conclusion and level of confidence
GCIS Linked Data API Full API at: https://data.globalchange.gov/api_reference
GCIS Linked Data API - example
curl http://data.globalchange.gov/report/nca3/figure/overview-observed-change-in-very-heavy-precipitation-2.ttl
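For readers who prefer a script to curl, the request above might be issued from Python roughly as follows. The URL pattern is copied from the curl example; the helper names `figure_url` and `fetch`, the `User-Agent` value, and the timeout are assumptions for this sketch, and whether the endpoint still responds in this form is not verified here.

```python
from urllib.request import Request, urlopen

# Host and path layout taken from the curl example in this deck.
BASE = "http://data.globalchange.gov"


def figure_url(report, figure, suffix="ttl"):
    """Build the URL of a report figure; "ttl" requests the Turtle form."""
    return f"{BASE}/report/{report}/figure/{figure}.{suffix}"


def fetch(url, timeout=30):
    """Download the resource body as text (performs a network request)."""
    request = Request(url, headers={"User-Agent": "gcis-example"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = figure_url("nca3", "overview-observed-change-in-very-heavy-precipitation-2")
print(url)
# To retrieve the Turtle document, call fetch(url); this needs network access.
```

The fetch is kept in its own function so that the URL construction can be checked offline.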
GCIS Web Site https://data.globalchange.gov
GCIS Web Site – figure example (HTML) Content Formats
Next Steps
• Encourage community use of the Linked Data API
• “Structured-content first” delivery
• Evolve the ontology for future assessments
• Tooling to capture report metadata during report construction
Links
• Ontology documentation: https://data.globalchange.gov/gcis.owl
• Concept map: http://bit.ly/1EypztG
• Ontology RDF serialized in Turtle format: https://raw.githubusercontent.com/USGCRP/gcis-ontology/gcis-ontology-1.2/gcis.ttl
Special Thanks
• GCIS Visionary: Curt Tilmes
• System Engineer: Brian Duggan
Also thanks to
• Andrew Buddenberg: Client development
• Steve Aulenbach: Data Curator
• Justin Goldstein: Researcher
• Robert Wolfe: Project Manager
• Amanda McQueen, Brent Newman: GCIS Interns
• Tania Sizer: Web Designer
• Xiaogang (Marshall) Ma, Peter Fox and the team at the Tetherless World Constellation: Ontology Engineering