10 likes | 127 Views
Deliberations of Reusing the PROV Ontology for Provenance Capture in Earth and Environmental Sciences
E N D
Deliberations of Reusing the PROV Ontology for Provenance Capture in Earth and Environmental Sciences Xiaogang (Marshall) Ma(max7@rpi.edu)1, Peter Fox (pfox@cs.rpi.edu)1, Curt Tilmes (ctilmes@usgcrp.gov)2,3, Stace Beaulieu (sbeaulieu@whoi.edu)4, Patrick West (westp@rpi.edu)1, Linyun Fu (ful2@rpi.edu)1, Jin Guang Zheng (zhengj3@rpi.edu)1 1 Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, USA; 2NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA; 3U.S. Global Change Research Program, Suite 250, 1717 Pennsylvania Avenue, NW, Washington, DC 20006, USA;4Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA Background The W3C PROV Ontology (PROV-O) [1] provides an excellent model for representing, capturing and sharing provenance information on the Web. Meanwhile, the Earth and environmental science domain faces increasing demand for documenting and publishing provenance information of scientific data, methods and workflows. Currently we are still short of experience and best practices of deploying the PROV-O for Earth and environmental sciences. This presentation will discuss our experience of reusing the PROV-O for two recent research projects. The PROV-O provides three starting-points classes, Entity, Agent and Activity, and properties that relate them, as well as a number other classes and properties enriching the provenance representation (Fig. 1). In practices one can design domain ontologies including specific classes and properties and map those classes and properties into the PROV-O. Deliberations happen in the domain ontology design and the mapping. In our first project we faced the provenance reconstruction of a report so we took definition of entities and relationships among them as the main stream of provenance capture. For the second project we shifted the main stream to provenance capture of activities because that work was designed to be associated with the workflow of data processing and information of activities is available. We hope this discussion will benefit colleagues who are also working on provenance capture and presentation. Fig. 1. The three starting-point classes Entity, Agent and Activity in PROV-O and the properties that relate them. Figure is from [1]. Provenance capture with a report and associated documents The first example is with an ontology (the GCIS ontology) [2] developed for the Global Change Information System (GCIS), which is a part of the U.S. Global Change Research Program (USGCRP). The initial focus of GCIS is to support the third National Climate Assessment (NCA3) in the United States. It will present the NCA3 report and also incorporate integrated access to inter-linked resources supporting that report. When we started to develop the GCIS ontology in 2012 summer , the NCA3 draft report was about to finish (version for public comment released in 2013 January). The first-hand materials we had are the draft report and the figures and tables inside it, as well as the references cited by the report and the internal structure of it (i.e., chapters, sections, etc.). We developed the GCIS ontology accordingly (Fig. 2). The Instances of those classes and the relationships among them can be captured from the documents we have, but without details about the activities that generated those documents. Fig. 2. Basic classes and properties in the current GCIS ontology (v. 1.1). Classes and properties represent a brief structure of the draft NCA3. There is no subclass of prov:Activity in this figure. Provenance capture in a workflow The second example is with an ontology for provenance capture in the project Employing Cyber Infrastructure Data Technologies to Facilitate IEA for Climate Impacts in NE & CA LME's (#3 & #7) (ECO-OP). In the ECO-OP project the iPython Notebook is applied as an environment for scientific data processing, figure and table generation and report writing. This allows provenance information to be captured in a workflow. We are currently developing an ontology serving that purpose (Fig. 3). In Fig.3 two classes, ecoop:IPythonNotebookRun and ecoop:CodeCellRun, constitute the trunk of the ontology. They both are subclasses of prov:Activity. The resources used and generated in the activities, such as program libraries and datasets, are subclasses of prov:Entity, and information about them can be captured when iPython Notebook runs. In this way the second example is different from the first example. Fig. 3. A part of the ECO-OP provenance ontology under development. In this ontology, subclasses of prov:Activity are the trunk. References: [1] Lebo, T., Sahoo, S. , McGuinness, D. L. (eds.) , 2013. PROV-O: The PROV Ontology. Available via: http://www.w3.org/TR/prov-o [2] http://tw.rpi.edu/web/project/gcis-imsap/GCISOntology Concluding remarks From the deliberations of reusing the PROV Ontology we see three core elements in the development of domain ontologies: expressivity , implementabilityand maintainability. Expressivity is about level of details and precision in semantic representation of ontologies; implementability is about that the developed ontologies can be used efficiently in practices; and maintainability is about to keep the ontologies updating and evolving in a long-term perspective. The balance is application-dependent, but the three elements are often encountered together in domain ontology works. It is recommended that future domain ontology works can seek approaches to address the challenge of balancing these three elements. Deliberations may cause confusion in practices of ontology development, but those approaches can provide some guidelines. Acknowledgments: Get poster at: