Data Fabric IG Introduction

Observations I from Recent Overview
• Based on about 50 interviews and about 75 community interactions.
• Data management (DM) and data processing (DP) are too time-consuming and costly due to organizational heterogeneity.
• Federating data, including logical-layer information (tracing provenance, understanding creation context, checking identity and integrity, etc.), is too costly.
• DM and DP are not ready for Big Data because automated procedures that incorporate proper data organization mechanisms are rarely used.

Observations II from Recent Overview
• Because software that supports proper data organization is lacking, we continue to create legacy data.
• Example: a key biologist spends 75% of his time on data management, a waste of money and human capital.
• To a large extent, results are not reproducible.
• Senior domain people agree:
  • data organization and procedures need to change
  • but the path is risky and data professionals are lacking
  • people hesitate since they lack a clear perspective

Data Fabric Sketch
A very rough sketch of the data production and processing machinery of data-driven science.
• One big question for RDA:
  • How can we maximally support this machinery?
    • unload researchers,
    • make science reproducible,
    • etc.

“Data Fabric” based on Recent Overview
[Diagram. Labels: often all in file system; data one can work with; organized data; sharable & re-usable data; often a lot of copying in file system]

One concrete conclusion
• Current practice: a data collection comes with its own data organization, management and access solutions.
• Future: there is no need for this heterogeneity, since data objects (DOs) can be treated independently of their content to a certain extent.

Until now in RDA
• started with a number of WG activities in P1
  • DTR, PP, PIT, MD, DFT (the old ones)
  • some people found these topics urgent and interesting
  • almost at the end, and some questions remain: what now, how does it all fit into the landscape, etc.?
  • they started this DF brainstorming
• even more groups started
• are there general themes in the data landscape?
  • the whole issue of data publishing, citation, etc.
  • the whole issue of scientific culture and legal & ethical aspects
  • our daily data work in the departments: the Data Fabric
  • maybe more

A few Questions arise
• What is the scope of RDA’s Data Fabric?
• What are the characteristics of RDA’s Data Fabric? (The term is already used in industry: efficient computational machine.)
• What are the components of RDA’s Data Fabric?
• What should the DF IG do within RDA, and what not?

Scope and Characteristics of RDA’s DF
• DF is about
  • making departments’ data science reproducible
  • creating the conditions for trust in the anonymous data domain
  • identifying mechanisms, components and interfaces that make data science efficient and cost-effective
  • discussing cross-disciplinary approaches
  • defining a framework that allows new components or component variants to be included in a flexible way
• Example:
  • DF will state the necessity of worldwide available machinery to register and resolve DOs; we will say something about registered attribute types and specify an API
  • but we will not say how to implement and use such a system
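To make the register-and-resolve example a little more concrete, here is a minimal Python sketch of what such an interface could look like. Everything in it is an assumption for illustration only: the names DORecord and InMemoryDORegistry, the "example/" identifier prefix, the attribute names and the SHA-256 integrity check do not come from the slides or from any RDA specification, and the in-memory dictionary is only a stand-in for the worldwide machinery the slide refers to.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class DORecord:
    """Hypothetical record kept for one registered data object (DO)."""
    pid: str                      # persistent identifier assigned at registration
    location: str                 # where the bit stream can be fetched
    checksum: str                 # SHA-256 of the content, for integrity checks
    attributes: dict = field(default_factory=dict)


class InMemoryDORegistry:
    """Toy registry (assumption: a real one would be a distributed service)."""

    def __init__(self, allowed_attribute_types):
        # Only attribute types registered up front may be attached to a DO.
        self._allowed = set(allowed_attribute_types)
        self._records = {}

    def register(self, content: bytes, location: str, **attributes) -> str:
        """Register a DO and return its newly minted identifier."""
        unknown = set(attributes) - self._allowed
        if unknown:
            raise ValueError(f"unregistered attribute types: {sorted(unknown)}")
        pid = f"example/{uuid.uuid4()}"   # made-up prefix, not a real PID scheme
        checksum = hashlib.sha256(content).hexdigest()
        self._records[pid] = DORecord(pid, location, checksum, dict(attributes))
        return pid

    def resolve(self, pid: str) -> DORecord:
        """Return the record for a PID; raises KeyError if it is unknown."""
        return self._records[pid]

    def verify(self, pid: str, content: bytes) -> bool:
        """Check that the given content matches the registered checksum."""
        return self.resolve(pid).checksum == hashlib.sha256(content).hexdigest()


# Usage: register a DO, resolve it again, and check its integrity.
registry = InMemoryDORegistry({"creator", "created"})
pid = registry.register(b"raw measurements", "file:///data/run1.csv",
                        creator="hypothetical-actor-id")
record = registry.resolve(pid)
print(record.location, registry.verify(pid, b"raw measurements"))
```

The sketch only illustrates the shape of the interface (register, resolve, typed attributes); how such a system is implemented and operated is deliberately left open, in line with the slide.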

Scope and Characteristics of RDA’s DF
• DF is NOT about
  • prescribing an overarching architecture we need to follow
  • specifying an implementation of such an architecture
  • discussing specific technologies and tools
  • anything beyond the processing machinery (not publication, citation, legal & ethical aspects, etc.)
• DF is about highly automated procedures, or at least guidance on following such procedures.

Components of RDA’s DF (just first ideas)
• domain of registered data objects (DOs), incl. basic organization principles (data, code, knowledge)
• domain of registered actors (ORCID, etc.)
• domain of trusted repositories for DOs
• accepted policy principles (proper organizing mechanisms, self-documenting, certified, etc.)
• set of trusted registries (types & concepts, metadata and provenance schemas, metadata instances, repositories, PIDs, policies, etc.)
• what about semantics? So important!
• much is already out there; we need to see how it can all fit together and how we can foster software development

DF IG way of acting
• DF IG must be an inclusive, open platform for interaction
• DF IG needs to place the various WGs/IGs on the landscape
• DF IG needs to identify barriers across groups
• DF IG can work as an umbrella to maintain WG results
• open position papers will summarize the state of discussions and provoke convergence debates
• it will NOT take over the Council’s or TAB’s role

Task of today’s DF IG BoF
• What are DF’s scope and characteristics?
• What are DF’s components, interfaces and mechanisms?
• How should DF act?
• Who will chair the DF IG?
