This presentation explores the importance of data stewardship in virtual observatories and the need for integration, transparency, and collaboration. Prof. Peter Fox discusses the challenges and solutions in accessing and sharing scientific data worldwide.
Why We Need To Get Smart About Data To Be Better Stewards: Making Smarter Virtual Observatories • Prof. Peter Fox (pfox@cs.rpi.edu, @taswegian, #twcrpi) • Tetherless World Constellation Chair, Earth and Environmental Science / Computer Science / Cognitive Science / IT and Web Science, Rensselaer Polytechnic Institute, Troy, NY, USA • and the Deep Carbon Observatory Data Science Team • IGARSS 2015, Milan, Italy, July 28, 2015 • http://tw.rpi.edu/web/doc/IGARSS2015_Milan_Fox20150728_TU3.Y2 or http://bit.ly/1D50rQE
Data Science Team + Hao, Kaleo, Stephen, Anusha, Jun, Mengyu, Chengcong, Harsha, Dan, …
What to expect… • The Virtual Observatory – in brief • Ecosystems and stewarding • Systems v. Frameworks -> Platforms • Mediation • Deep Carbon Observatory • Integration, Transparency and Collaboration • The smarts… • Where we are headed
Working premise :== Mission Statement • Scientists – actually ANYONE – should be able to access a global, distributed knowledge base of scientific data and information that: • appears to be integrated • appears to be locally available • is in a language (written, programming, or science) that is understandable and can be shared • Data intensive – volume, complexity, mode, scale, heterogeneity, … in an OPEN WORLD
Experience • Ecosystem metaphor – how to steward? [Figure: ecosystem diagram spanning Data, Information, and Knowledge; labels: Creation, Gathering, Presentation, Organization, Integration, Conversation, Context]
Experience • Not just curator, i.e., producer to consumer [Figure: the same ecosystem diagram with Producers and Consumers added; labels: Data, Information, Knowledge, Creation, Gathering, Presentation, Organization, Integration, Conversation, Context]
Stewardship in the ecosystem! • Many elements, and we still do not have sufficient information models (and meaning) of how they interrelate – a massive stewardship challenge • Elements: Accountability, Collaboration, Identity, Explanation, Justification, Verifiability, Proof, Trust, Citability, Integratability, ‘Transparency’ -> Translucency, ‘Provenance’
Framework v. systems v. platforms • Rough definitions • Systems have very well-defined entry and exit points. A user tends to know when they are using one. Options for extension are limited and usually require engineering • Frameworks have many entry and use points. A user often does not know when they are using one. Extension points are part of the design • Platforms ~ arise from frameworks
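The system/framework distinction on this slide is architectural, so a tiny contrast may help. The sketch below is illustrative only: the "export" task, the record fields, and every name in it (`system_export`, `ExportFramework`, `register`) are hypothetical and are not taken from any DCO or TWC code. The system version has one fixed entry point, so a new output format means editing it; the framework version makes the extension point (a registration decorator) part of its design.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Exporter = Callable[[List[Record]], str]


# "System" style: one well-defined entry point. Supporting a new output
# format means editing (re-engineering) this function.
def system_export(records: List[Record]) -> str:
    return "\n".join(f"{r['id']},{r['title']}" for r in records)


# "Framework" style: extension points are part of the design. Callers
# plug in new exporters without touching the framework code.
class ExportFramework:
    def __init__(self) -> None:
        self._exporters: Dict[str, Exporter] = {}

    def register(self, fmt: str) -> Callable[[Exporter], Exporter]:
        """Decorator that registers an exporter under a format name."""
        def decorator(fn: Exporter) -> Exporter:
            self._exporters[fmt] = fn
            return fn
        return decorator

    def export(self, fmt: str, records: List[Record]) -> str:
        if fmt not in self._exporters:
            raise KeyError(f"no exporter registered for {fmt!r}")
        return self._exporters[fmt](records)


framework = ExportFramework()


@framework.register("csv")
def to_csv(records: List[Record]) -> str:
    return "\n".join(f"{r['id']},{r['title']}" for r in records)


# A third party extends the framework through the same registration point.
@framework.register("titles")
def to_titles(records: List[Record]) -> str:
    return "; ".join(r["title"] for r in records)
```

A user calling `framework.export("titles", records)` need not know which module supplied that exporter. That matches the slide's point that framework users often do not notice the framework.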
High-level framework architectures [Figure: high-level framework architecture diagram]
Core and Framework Semantics – Multi-tiered interoperability • Mediation! Mediation! Mediation!
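Mediation between heterogeneous sources can take many forms. One simple form renames each source's local fields to shared terms and converts units, so the data "appears to be integrated". The sketch below assumes two hypothetical labs (`lab_a`, `lab_b`) with invented field names, and uses plain strings as stand-ins for ontology concepts. It is not the DCO mediation layer.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical per-source mappings: local field -> (shared term, converter).
# The shared terms stand in for concepts in a common ontology.
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]

MAPPINGS: Dict[str, Mapping] = {
    "lab_a": {
        "T_celsius": ("temperature_K", lambda v: float(v) + 273.15),
        "P_kbar": ("pressure_GPa", lambda v: float(v) / 10.0),  # 10 kbar = 1 GPa
        "id": ("sample_id", str),
    },
    "lab_b": {
        "temp_K": ("temperature_K", float),
        "pressure_GPa": ("pressure_GPa", float),
        "sampleName": ("sample_id", str),
    },
}


def mediate(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename a source record's fields and convert units into shared terms.

    Fields with no mapping are dropped rather than guessed at.
    """
    mapping = MAPPINGS[source]
    out: Dict[str, Any] = {}
    for name, value in record.items():
        if name in mapping:
            term, convert = mapping[name]
            out[term] = convert(value)
    return out


def integrated_view(
    batches: Iterable[Tuple[str, List[Dict[str, Any]]]]
) -> List[Dict[str, Any]]:
    """Present records from several sources as one uniform list."""
    return [mediate(source, rec) for source, recs in batches for rec in recs]
```

The mappings are held as data rather than code, so adding a source means adding a mapping entry. This loosely mirrors how ontology-based mediation keeps the vocabulary separate from the software.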
Mediation • [Figure: generations of mediation, ending in a "6th Generation Guess": Smart Text Agents, Smart Data Agents, Relationship/Association Rules, Cognitive Collaboration] • All these generations of mediation are in effect as we collaborate • From: C. Borgman, 2008, NSF Cyberlearning Report; illustration by Roy Pea and Jillian C. Wallis
Deep Carbon Observatory (DCO) … • “We are dedicated to achieving transformational understanding of carbon’s chemical and biological roles in Earth.” www.deepcarbon.net
Collaboration and Integration needs … • “Enable DCO team leaders to create new groups and associate a number of content types --- documents, discussions, blog posts, tasks, links, and bibliographic entries --- with the group, as well as simple event management (a private event calendar for the group) and embedding of external services (e.g. and esp. Google Calendar)” … more… (data, publications, projects)… stewarding a Knowledge Network … and a Virtual Organization (1000+ people = more)
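The quoted requirement describes a small data model: a group, created by a team leader, that owns typed content items and a private event calendar. A minimal sketch of that model follows. The class and method names (`Group`, `ContentType`, `listing`, `upcoming`) are hypothetical, and embedding external services such as Google Calendar is not modeled.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List


class ContentType(Enum):
    # The content types named in the DCO requirement.
    DOCUMENT = "document"
    DISCUSSION = "discussion"
    BLOG_POST = "blog post"
    TASK = "task"
    LINK = "link"
    BIB_ENTRY = "bibliographic entry"


@dataclass
class ContentItem:
    title: str
    kind: ContentType


@dataclass
class Event:
    title: str
    day: date


@dataclass
class Group:
    name: str
    leader: str
    members: List[str] = field(default_factory=list)
    content: List[ContentItem] = field(default_factory=list)
    calendar: List[Event] = field(default_factory=list)  # private event calendar

    def __post_init__(self) -> None:
        # The team leader who creates the group is its first member.
        if self.leader not in self.members:
            self.members.append(self.leader)

    def add(self, title: str, kind: ContentType) -> None:
        self.content.append(ContentItem(title, kind))

    def listing(self, kind: ContentType) -> List[str]:
        """Titles of all group content of one type (e.g. the group bibliography)."""
        return [c.title for c in self.content if c.kind is kind]

    def schedule(self, title: str, day: date) -> None:
        self.calendar.append(Event(title, day))

    def upcoming(self, today: date) -> List[str]:
        """Event titles on or after `today`, earliest first."""
        events = sorted(self.calendar, key=lambda e: e.day)
        return [e.title for e in events if e.day >= today]
```

Keeping content as one typed list makes per-type listings (documents, bibliography, tasks) simple filters rather than separate stores.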
Decadal goals = Discovery science • Global community of ‘Carbon scientists’ contributing to the Deep Earth Computer (data legacy), comprising: • Global Earth Mineral Laboratory • Inventory of Deep Fluids • Global Volcano Gas Emissions • Census of Deep Microbial Life • State of High Pressure and Temperature Carbon and Related Materials • Global Inventory of Diamonds with Inclusions • 7 others…
TW-SPARQL Application: Dynamic, Stylized Menu Generation (using Drupal host) • Menus based on parameterization of page • See “Recent Findings” and “Projects” below • Note also expanded view “>”
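Menus "based on parameterization of page" come down to two steps: build a SPARQL query from a page parameter, then render the result bindings as a menu. The sketch below shows only that general pattern and is not the TW-SPARQL module's code or API. The `ex:` vocabulary, `menu_query`, and `render_menu` are hypothetical, and no endpoint is contacted. The rendering step assumes results arrive in the W3C SPARQL 1.1 JSON results format (`results` → `bindings` → `{var: {"type", "value"}}`).

```python
import html
import json


def menu_query(page_type: str, limit: int = 5) -> str:
    """Build a SPARQL query for menu items matching a page parameter.

    The ex: namespace and ex:pageType property are hypothetical.
    """
    # Accept only simple names so the parameter cannot inject query text.
    if not page_type.isidentifier():
        raise ValueError("page_type must be a simple identifier")
    if limit < 1:
        raise ValueError("limit must be positive")
    return f"""PREFIX ex: <http://example.org/ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item ?title WHERE {{
  ?item ex:pageType ex:{page_type} ;
        dcterms:title ?title .
}}
ORDER BY ?title
LIMIT {limit}"""


def render_menu(results_json: str) -> str:
    """Render SPARQL JSON results (?item, ?title) as an HTML list."""
    data = json.loads(results_json)
    items = []
    for binding in data["results"]["bindings"]:
        href = html.escape(binding["item"]["value"])
        label = html.escape(binding["title"]["value"])
        items.append(f'<li><a href="{href}">{label}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

In a real deployment the query string would be sent to the site's SPARQL endpoint, and the host CMS would apply the styling.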
DCO Data Science Platform = DCVO • CKAN • VIVO • GHS – Handle.net
Stewardship of data-information-knowledge • deepcarbon.net • info.deepcarbon.net • data.deepcarbon.net • dx.deepcarbon.net
VIVO Extension: Dataset deposit in attached data repository • Includes multi-level metadata collection • Includes persistent identifier (DCO-ID generation) • Includes interaction with dedicated repository OR accepts third-party deposit details [Flowchart: Begin → decisions "Need DCO-ID?", "Data deposit?", "External data?"; steps: revise metadata; generate & register DCO-ID (unique suffix, blank URL); collect, review, and revise CKAN metadata; deposit in CKAN & generate URL to the downloadable data, or add URL to data in an external repository; update DCO-ID (map the DCO-ID to the CKAN URL) and update the DCO-ID record, or record an object without data URL → End. Outputs: DCO-ID & DCO-ID metadata; deposited DCO data or URL to external data]
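The deposit flow can be read as a short algorithm. First register an identifier with a blank URL. Then either store the data in the attached repository or accept a URL to external data. Finally, map the identifier to wherever the data lives. The sketch below follows that reading under loud assumptions. `IdRegistry` and `InMemoryRepository` are hypothetical stand-ins, not the Handle.net or CKAN APIs. The prefix, URL scheme, and 12-character suffix are invented, and "a title is required" stands in for the real multi-level metadata checks.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DcoIdRecord:
    dco_id: str
    metadata: Dict[str, str]
    url: Optional[str] = None  # registered with a blank URL, mapped later


class IdRegistry:
    """Hypothetical stand-in for the persistent-identifier service."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.records: Dict[str, DcoIdRecord] = {}

    def generate(self, metadata: Dict[str, str]) -> DcoIdRecord:
        # Unique suffix: retry on the (unlikely) chance of a collision.
        while True:
            dco_id = f"{self.prefix}/{uuid.uuid4().hex[:12]}"
            if dco_id not in self.records:
                break
        rec = DcoIdRecord(dco_id, dict(metadata))
        self.records[dco_id] = rec
        return rec

    def update(self, dco_id: str, url: str) -> None:
        self.records[dco_id].url = url


class InMemoryRepository:
    """Hypothetical stand-in for the attached (CKAN) data repository."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
        self.files: Dict[str, bytes] = {}

    def store(self, dco_id: str, data: bytes) -> str:
        key = dco_id.replace("/", "-")
        self.files[key] = data
        return f"{self.base_url}/{key}"  # URL to the downloadable data


def deposit_dataset(
    registry: IdRegistry,
    repo: InMemoryRepository,
    metadata: Dict[str, str],
    data: Optional[bytes] = None,
    external_url: Optional[str] = None,
) -> DcoIdRecord:
    if not metadata.get("title"):
        raise ValueError("revise metadata: a title is required")
    rec = registry.generate(metadata)          # DCO-ID with blank URL
    if data is not None:
        url = repo.store(rec.dco_id, data)     # deposit in the repository
    elif external_url is not None:
        url = external_url                     # third-party deposit
    else:
        return rec                             # object without data URL
    registry.update(rec.dco_id, url)           # map DCO-ID to data URL
    return rec
```

Registering the identifier before any data URL exists lets objects without data (or with data held elsewhere) still be cited.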
DCO Ontology • http://deepcarbon.net/dco_datasets – click on the title “DCO Ontology”, or go to https://deepcarbon.net//dco_dataset_summary?uri=http://info.deepcarbon.net/individual/n5989
State to date… • Knowledge network – implements both collaboration and integration; reporting implements transparency • It’s being USED • Many means of population: • User generation • Machine generation • Contributing these enhancements back to open-source communities (CKAN, VIVO) July 2014
Thus… progress in VOs • Integrative – semantics • Transparent – semantics • Collaborative – semantics • Stewardship! • Yep – semantics • And cognition • This is where we are headed
Thank you • pfox@cs.rpi.edu and the DCO Data Science Team • @taswegian #twcrpi • http://tw.rpi.edu • http://tw.rpi.edu/web/project/DCO-DS • http://deepcarbon.net
Modern informatics enables a new scale-free framework approach • Use cases • Stakeholders • Distributed authority • Access control • Ontologies • Maintaining Identity
vivoweb.org • VIVO - represents academic research communities • Every person, organization, or other data entity in VIVO has a unique identifier • VIVO enables the discovery of research and scholarship across disciplines at one institution or across many • Records are both human-readable and machine-readable • VIVO Extension - we’ve extended (yes, ontologies) VIVO to the science network – datasets, instruments, sites, etc.
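VIVO gives every entity a unique identifier and exposes records to both people and machines. The slide ties the DCO extension to ontologies, so one way to picture a "both readable" record is a single URI-keyed entity rendered twice: once as RDF N-Triples and once as plain text. In the sketch below, `rdf:type` and `rdfs:label` are the standard W3C URIs. The `example.org` class, property, and individual URIs are hypothetical, not the VIVO or DCO ontology terms.

```python
from typing import Dict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(uri: str, rdf_type: str, label: str, links: Dict[str, str]) -> str:
    """Machine-readable view: one triple per line, subject is the entity URI."""
    lines = [
        f"<{uri}> <{RDF_TYPE}> <{rdf_type}> .",
        f'<{uri}> <{RDFS_LABEL}> "{escape_literal(label)}" .',
    ]
    for predicate, obj in sorted(links.items()):
        lines.append(f"<{uri}> <{predicate}> <{obj}> .")
    return "\n".join(lines)


def to_text(uri: str, label: str, links: Dict[str, str]) -> str:
    """Human-readable view of the same record."""
    lines = [f"{label} <{uri}>"]
    for predicate, obj in sorted(links.items()):
        name = predicate.rsplit("#", 1)[-1]  # local name of the property
        lines.append(f"  {name}: {obj}")
    return "\n".join(lines)
```

Both views derive from the same URI and property map, so the identifier stays the single point of reference whether a person or a harvester reads the record.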
Collaboration tools • Group-based collaboration: • Group data deposit and reporting • Listings of group content • Group management and messaging • Listings of group documents • Group bibliography • Group shared calendar • Group task management • Group membership • Group event management