AGUFM13 - IN11A-1509 (MS Hall A-C). Integrate Semantic Applications into a Data Science Platform. VIVO represents academic research communities. Every person, organization, or other data entity in VIVO has a unique identifier.
AGUFM13 - IN11A-1509 (MS Hall A-C)

Integrate Semantic Applications into a Data Science Platform
• VIVO represents academic research communities
• Every person, organization, or other data entity in VIVO has a unique identifier
• VIVO enables the discovery of research and scholarship across disciplines at one institution or across many
• Records are both human-readable and machine-readable
• We’ve extended (yes, ontologies) VIVO to the science network – datasets, instruments, sites, etc.
• Feeding this back to VIVO

Progress in Open-World, Integrative, Collaborative Science Data Platforms
Peter Fox (pfox@cs.rpi.edu), Rensselaer Polytechnic Institute, 110 8th St., Troy, NY, 12180, United States – see Acknowledgments

• Identify Everything
• DCO-ID
• Content negotiation
• Respect other locations, names

CKAN, VIVO

ABSTRACT
As collaborative, or network, science spreads into more Earth and space science fields, both the participants and their funders have expressed a very strong desire for highly functional data and information capabilities that a) are easy to use, b) are integrated in a variety of ways, c) leverage prior investments and keep pace with rapid technical change, and d) are not expensive or time-consuming to build or maintain. At a conceptual level, science networks (even small ones) deal with people and the many intellectual artifacts produced or consumed in research, organizational and/or outreach activities, as well as the relations among them. Increasingly these networks are modeled as knowledge networks, i.e. graphs with named and typed relations among the 'nodes'. Nodes can be people, organizations, datasets, events, presentations, publications, videos, meetings, reports, groups, and more. In this heterogeneous ecosystem, it is also important to use a set of common informatics approaches to co-design and co-evolve the needed science data platforms based on what real people want to use them for.
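The abstract's notion of a knowledge network (a graph with named and typed relations among nodes) can be illustrated with a minimal sketch. It uses plain Python tuples rather than an RDF library, and every identifier and relation name in it (`person:alice`, `ex:memberOf`, and so on) is a made-up placeholder, not an actual DCO-ID or a term from the platform's ontology.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One named, typed relation: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


# Hypothetical nodes and relations, for illustration only.
TRIPLES = [
    Triple("person:alice", "rdf:type", "foaf:Person"),
    Triple("org:rpi", "rdf:type", "foaf:Organization"),
    Triple("dataset:d1", "rdf:type", "ex:Dataset"),
    Triple("person:alice", "ex:memberOf", "org:rpi"),
    Triple("person:alice", "ex:authorOf", "dataset:d1"),
    Triple("dataset:d1", "ex:presentedAt", "event:meeting1"),
]


def outgoing(triples, node):
    """Group the relations that leave `node` by predicate name."""
    grouped = defaultdict(list)
    for t in triples:
        if t.subject == node:
            grouped[t.predicate].append(t.obj)
    return dict(grouped)


def nodes_of_type(triples, type_name):
    """List every node declared to have the given type."""
    return sorted(
        t.subject
        for t in triples
        if t.predicate == "rdf:type" and t.obj == type_name
    )


if __name__ == "__main__":
    print(outgoing(TRIPLES, "person:alice"))
    print(nodes_of_type(TRIPLES, "ex:Dataset"))
```

Because people, organizations, datasets and events all live in one list of triples, a new node type needs no schema change, only new triples.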
GHS – Handle.net

• Solution: Leverage Open-Source Semantic Technologies
• Triples tie it together

Information Models and Leveraged Ontology
• USE: W3C Provenance Ontology (PROV-O)
• Developed by W3C Provenance Working Group
• W3C 2013 Recommendation - http://www.w3.org/TR/prov-o/
• Information models provide the domain-level view; logical models implemented in the ontology leverage a wide variety of vocabularies and encoding schemes
• Alignment performed using …
  • RDFS subClassOf assertions
  • SKOS broadMatch statements
• Integrate terms from VIVO, FOAF, BIBO, O&M, VSTO, BCO-DMO, TWC and others
  • Starting with their information models where possible
• Progress = integration at the application level to bring several key functions together in a Web-based environment covering most research needs
• HIDE ALL OF “IT” FROM THE USERS!!

Community Data and Groups
• Key integrating concepts
• Groups
  • Formal Community Groups
  • Ad-hoc special purpose/interest groups
  • Fine-grained access control and membership
• Linked
  • All content can be tagged at a variety of levels of detail
• Incentives to contribute content
  • Content is visible in key reports – accepted by sponsors

Community Dashboard and Reporting Progress
• Knowledge network – implements both the collaboration and the integration
• Many means of population
  • User generation
  • Machine generation
• Substantially contributing these enhancements back to open-source community (CKAN, VIVO, GHS)

Sponsors:
AP Sloan Foundation
National Science Foundation
Tetherless World Constellation

Glossary:
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute
CKAN – Comprehensive Knowledge Archive Network
VIVO – Research Collaboration Network model and software
GHS – Global Handle System
PROV – W3C Provenance Model and Ontology

Acknowledgments:
DCO Data Science Team
W3C Provenance Working Group
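The alignment step named in the poster (RDFS subClassOf assertions and SKOS broadMatch statements) can be sketched as follows. This is an illustration under assumptions, not the platform's actual ontology: the `ex:` and `other:` terms are hypothetical, `prov:Entity` is the only real class name used, and plain dictionaries stand in for a triple store.

```python
# Hypothetical alignment assertions, for illustration only.
# rdfs:subClassOf: each class maps to its directly asserted superclasses.
SUBCLASS_OF = {
    "ex:Dataset": {"prov:Entity"},
    "ex:FieldDataset": {"ex:Dataset"},
}

# skos:broadMatch: each concept maps to broader concepts in another vocabulary.
BROAD_MATCH = {
    "ex:Seismometer": {"other:Instrument"},
}


def superclasses(cls, subclass_of):
    """Return every superclass of `cls`, following rdfs:subClassOf
    transitively (subClassOf is a transitive property in RDFS)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_kind_of(cls, target, subclass_of):
    """True if `cls` is `target` or one of its subclasses."""
    return cls == target or target in superclasses(cls, subclass_of)


def broader_matches(concept, broad_match):
    """Return only the directly asserted broader concepts.
    skos:broadMatch is not defined as transitive, so no chaining here."""
    return set(broad_match.get(concept, ()))


if __name__ == "__main__":
    print(superclasses("ex:FieldDataset", SUBCLASS_OF))
    print(is_kind_of("ex:FieldDataset", "prov:Entity", SUBCLASS_OF))
    print(broader_matches("ex:Seismometer", BROAD_MATCH))
```

The two mechanisms are kept in separate functions on purpose: a subClassOf assertion lets a query for a general class also return instances of the aligned local class, while broadMatch only records that one vocabulary's concept is narrower than another's, without merging them.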