Data Fabric Interest Group Plenary 9 Core Session Barcelona

jhawes

Presentation Transcript


  1. Data Fabric Interest Group Plenary 9 Core Session Barcelona

  2. Agenda
     • Welcome, Introduction (5 minutes)
     • Election of new Co-Chair (5 minutes)
     • Review of Activities (30 minutes)
     • Global Digital Object Cloud Update (15 minutes)
     • Discussion/Questions (20 minutes)
     • Gaps/Opportunities/Next Steps (15 minutes)

  3. Co-Chair Election

  4. DFIG Current Activities
     • Ecosystems and Core Components
       • Recommendations (https://hdl.handle.net/11304/a3d012ca-4e23-425e-9e2a-1e6a195b966f)
       • Aggregate slide deck (DF IG Documents RDA wiki page)
     • Common Governance and Operating Procedures
       • GEDE group
     • Metadata and PIDs
       • PID Kernel Subgroup -> PID Kernel WG proposal
     • Brokering Services
       • DFIG/Brokering Workflows
     • Training and Education
       • DFIG/ETHRD workshop planning session

  5. Types of Data Fabrics
     • We can differentiate between
       • user data fabrics, which support discovery of and access to published data
       • collaboration data fabrics, which support processing of shared collections
       • repository data fabrics, which focus on preserving data
     • Supported virtualized entities in these DFs are
       • data collections that include the context of DOs
       • workflows encapsulating analyses
       • data flows managing data transport
     • Essential capabilities are interoperability, federation, and interaction control
     * Source: Reagan Moore

  6. Nature of a Data Fabric
     • Data Fabrics in the above sense are blueprints for creating generic infrastructures that support virtualisation of collections, workflows and data flows.
     • Instantiations of Data Fabrics will offer a set of services, some of which are core and others optional.
     • Data Fabrics are NOT instantiations of a specific collection, workflow or data flow.

  7. Defining Core Components
     • Task to solve:
       • Identify and specify Common Components (CoCo)
       • Recommend CoCo
       • Put CoCo in place
     • Not ONE architecture: identify CoCos that could cooperate in specific configurations to solve a function (infra, VRE, etc.)
     • [Diagram labels: configuration A; configuration B; Common Components & Services; Specific Components & Services]

  8. Identifying Core Components
     • Core Data Type Definitions, Metadata Standards and Vocabularies
     • Trustworthy Data Repositories
     • Trustworthy, Machine-Actionable Registries of
       • Repositories, Data Types, Metadata Standards, Vocabularies, Authorization Records, Licenses
     • PID Services
     • Collection Services
     • Brokering Services
     • Common Governance and Operating Procedures
     • Training and Education

  9. From Core Components to Data Fabrics
     • Configurations must be driven by workflows and use cases.
     • Increasing scale requires moving away from Human Controlled Processing to Type-Triggered Automatic Processing.
     • Component configurations should enable an ecosystem of tools and services.

  10. Human Controlled Processing (HCP)
     • Inputs: observations, experiments, simulations, etc.
     • The cycle can be controlled manually or semi-automatically via pre-set pipelines.
     • Even with semi-automatic pipelines, humans stay closely involved as "designers".

  11. Type-Triggered Automatic Processing (T-TAP)
     • New feature: cycles run highly autonomously; the precise steps depend on the types of data entering the workflow.
     • Researchers are not in direct control.
     • [Diagram labels: Data Events; Structured Data Markets; Data Federation; Agents; Data Type Registry; Processing services; Brokering & Mediation services; scripts; result; adding new data; exposing new DOs; some kind of profile matching]
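
The type-triggered idea on this slide can be illustrated with a minimal sketch. It is not taken from the presentation: the `TypeRegistry` class, the `on_data_event` function, and the type names `text/raw` and `text/normalized` are all hypothetical. An in-memory dictionary stands in for a Data Type Registry, and an incoming object's type alone decides which processing services run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DigitalObject:
    pid: str        # persistent identifier (made-up value in this sketch)
    data_type: str  # type name as it might be registered in a Data Type Registry
    payload: str


Handler = Callable[[DigitalObject], DigitalObject]


class TypeRegistry:
    """Hypothetical in-memory stand-in for a Data Type Registry."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def register(self, data_type: str, handler: Handler) -> None:
        self._handlers.setdefault(data_type, []).append(handler)

    def handlers_for(self, data_type: str) -> List[Handler]:
        return self._handlers.get(data_type, [])


def on_data_event(obj: DigitalObject, registry: TypeRegistry) -> List[DigitalObject]:
    """Run every service registered for the object's type and return the new DOs."""
    return [handler(obj) for handler in registry.handlers_for(obj.data_type)]


def to_upper(obj: DigitalObject) -> DigitalObject:
    # Toy "processing service": derives a new DO from the incoming one.
    return DigitalObject(
        pid=obj.pid + "-derived",
        data_type="text/normalized",
        payload=obj.payload.upper(),
    )


registry = TypeRegistry()
registry.register("text/raw", to_upper)
results = on_data_event(DigitalObject("example/1", "text/raw", "hello"), registry)
```

No person chooses the processing step here; registering a handler for a type is the only human decision, which is the shift away from HCP that the slide describes.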

  12. Use Cases
     • A neurologist wants to research the causal relation between Alzheimer phenomena and specific genes, proteins, neural activity, etc., using machine learning algorithms on confidential data from a federation of hospitals and labs.
     • A linguist researches theories about the "economy of languages", looking for objective patterns that make languages more or less easy to process and learn, by applying machine learning algorithms to open data from a variety of sources, filtered by languages and features.
     • The data manager of a large data centre must continuously and asynchronously check the quality of new data of specific types, transform it according to certain rules, and create n replications in a federation.
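
The data manager use case (check quality, transform, create n replications) can be sketched as a small pipeline. This is an illustration under stated assumptions, not the presenters' design: `process_new_data`, the site names, and the sample record are hypothetical, and the sketch runs synchronously, whereas the use case calls for continuous, asynchronous operation.

```python
from typing import Callable, Dict, List, Optional


def process_new_data(
    record: dict,
    is_valid: Callable[[dict], bool],
    transform: Callable[[dict], dict],
    sites: List[str],
    n: int,
) -> Optional[Dict[str, dict]]:
    """Check quality, transform, then place copies at n federation sites."""
    if n > len(sites):
        raise ValueError("not enough sites for the requested number of replicas")
    if not is_valid(record):
        return None  # rejected by the quality check
    transformed = transform(record)
    # Each site receives its own copy so a later change at one site stays local.
    return {site: dict(transformed) for site in sites[:n]}


replicas = process_new_data(
    {"type": "temperature", "value_c": 21.5},
    is_valid=lambda r: "value_c" in r,
    transform=lambda r: {**r, "value_k": r["value_c"] + 273.15},
    sites=["site-a", "site-b", "site-c"],
    n=2,
)
```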

  13. Recommendations Update
     • PID Focus Area work is progressing
     • GEDE Europe (https://www.rd-alliance.org/groups/gede-group-european-data-experts-rda) was highly active, with f2f and virtual meetings
     • Result is a new report: Grouped List of Assertions (also uploaded to DFIG pages)
       • consultation of 25 reports and papers in total, suggested by participants
       • extraction of <60 assertions from all documents
       • classification of these assertions into sections: (1) nature of PIDs and PID systems, (2) their relevance, (3) assigning PIDs, (4) using PIDs, (5) Handles and DOIs, (6) others
       • much agreement on core assertions
       • some variety in the way PIDs are assigned and used

  14. Areas of discussion
     • PID in its binding role: which types of attributes to add to the PID record or to the landing page
     • attribute types need to be machine readable and specified
     • how to indicate versions
     • time of assignment of PIDs
     • granularity of PID assignment
     • role of (trustworthy) repositories in assigning PIDs
     • use of fragment indicators
     • how to add life cycle statements (deletion, splitting, merging, etc.)
     • when to use Handles and when DOIs

  15. Next Steps
     • broad commenting on summary assertions by RDA/DFIG and GEDE people by April 17, via web pages and P9 sessions
     • virtual meeting in May (DFIG and GEDE groups)
     • f2f meeting in June/July to finish the main summary assertions
     • afterwards, a final report on agreements that also identifies areas of disagreement
     • start interacting about the next topic area
     • primary areas of interest could be "repositories" (tasks, interfaces, data organisation, etc.) and "data processing" (workflows, type triggered, etc.)

  16. PIDs remain central
     • [Diagram labels: PID Record; PID; CKSM; paths; data copies; Metadata; Rights; Relations; Provenance]
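
One way to read the labels on this slide is as the fields of a PID record. The sketch below is an interpretation, not the PID Kernel profile: the field names, the choice of SHA-256 for "CKSM", and all identifiers and URLs are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PIDRecord:
    pid: str                  # the identifier itself
    checksum: str             # "CKSM": a SHA-256 hex digest in this sketch (assumption)
    paths: List[str]          # locations of the data copies
    metadata_pid: str = ""    # PID of the metadata description
    rights_pid: str = ""      # PID of the rights statement
    provenance_pid: str = ""  # PID of the provenance record
    relations: Dict[str, str] = field(default_factory=dict)  # relation name -> PID

    def matches(self, data: bytes) -> bool:
        """Return True if bytes fetched from any path match the recorded checksum."""
        return hashlib.sha256(data).hexdigest() == self.checksum


content = b"example bytes"
record = PIDRecord(
    pid="example-prefix/0001",  # hypothetical identifier
    checksum=hashlib.sha256(content).hexdigest(),
    paths=[
        "https://repo-a.example/objects/0001",
        "https://repo-b.example/objects/0001",
    ],
    metadata_pid="example-prefix/0001-md",
)
```

Keeping the checksum in the record lets a client verify any of the copies without trusting the repository that served it.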

  17. PID Kernel Update
     • Work started in Denver at P8
     • Working groups met over the last 6 months
     • Draft profile created
     • PID Kernel Working Group Case Statement submitted
     • Work completes at P11

  18. Global Digital Object Cloud (GDOC)
     • End users, developers, and automated processes deal with persistently identified, virtually aggregated digital objects. These include collections, which are overlays on multiple network services, which in turn are overlays on existing or future information storage systems.
     • [Diagram labels: Identifier Service (×3); Repo/Registry (×5); ID: HGY…, ID: XZY…, ID: 876…, ID: 843…, ID: 123…, ID: 987/…; (object:dataset), (object:publication), (object:collection)]

  19. GDOC – Is it Real?
     • Storage – not our problem, but
       • Latency is an issue
       • Changing interfaces can be a problem
     • Services
       • Identifier
         • Common resolution systems
         • PID Kernel, Profiles
       • Repo/Registry
         • Common APIs
         • Confusion: Repository not equal to Storage
         • Confusion: Registry is a Repository of metadata objects
     • Object Level
       • Common Object Interface must be provided by Repo/Registry
       • Collections ARE Objects
     • Clients
       • Good News / Bad News – web browser remains universal client
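
The service layering on this slide can be sketched in a few lines, assuming a deliberately simplified model. `RepoRegistry`, `IdentifierService`, and the `demo/...` identifiers are hypothetical and not part of any RDA specification. The sketch shows two of the slide's points: every repo/registry exposes the same object interface, and a collection is itself a digital object.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DigitalObject:
    pid: str
    object_type: str  # e.g. "dataset", "publication", "collection"


@dataclass
class Collection(DigitalObject):
    # A collection is itself a digital object; its content is a list of member PIDs.
    members: List[str] = field(default_factory=list)


class RepoRegistry:
    """Hypothetical common object interface offered by every repo or registry."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, DigitalObject] = {}

    def put(self, obj: DigitalObject) -> None:
        self._objects[obj.pid] = obj

    def get(self, pid: str) -> Optional[DigitalObject]:
        return self._objects.get(pid)


class IdentifierService:
    """Maps a PID to the repo/registry that holds the object."""

    def __init__(self) -> None:
        self._home: Dict[str, RepoRegistry] = {}

    def register(self, obj: DigitalObject, repo: RepoRegistry) -> None:
        repo.put(obj)
        self._home[obj.pid] = repo

    def resolve(self, pid: str) -> Optional[DigitalObject]:
        repo = self._home.get(pid)
        return repo.get(pid) if repo is not None else None


ids = IdentifierService()
repo_a, repo_b = RepoRegistry("repo-a"), RepoRegistry("repo-b")
ids.register(DigitalObject("demo/ds-1", "dataset"), repo_a)
ids.register(DigitalObject("demo/pub-1", "publication"), repo_b)
ids.register(Collection("demo/col-1", "collection", ["demo/ds-1", "demo/pub-1"]), repo_b)

# The client only handles PIDs; where each member is stored is hidden.
collection = ids.resolve("demo/col-1")
member_types = [ids.resolve(pid).object_type for pid in collection.members]
```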

  20. GDOC – Is it Real?
     • CONCLUSION: Evolution needed & inevitable; RDA can help drive it
     • DFIG, Brokering, PID Kernel, Collections, DTR, ….

  21. Gaps/Opportunities
     • Further progress on Machine-Actionable Registries
       • DFT for vocabulary - needs population and use
       • Have DTR for data types - needs testing and iteration
       • re3data for Repositories - need a machine-actionable equivalent
       • Metadata Catalog - machine-actionable catalog is a pending RDA WG
       • Not sure if anyone is working on Authorization and License registries
     • Governance and Operating Procedures
       • Need for this will become critical as soon as test beds and functional ecosystems are available
     • PIDs
       • Linked Open Data community needs
       • Recommendations for workflow vs publication
