
EUDAT

This workshop discusses the challenges and approaches in building a Collaborative Data Infrastructure, focusing on user-focused functionality, data capture and transfer, trust, data curation, data discovery and navigation, workflow creation, and more.

ksheehan

Presentation Transcript


  1. EUDAT AAI for a Collaborative Data Infrastructure - Challenges and Approaches - Johannes Reetz, EUDAT VAMP workshop Helsinki, 30 Sep 2013

  2. The CDI concept: Collaborative Data Infrastructure
     • Users and Data Generators: user-focused functionality, data capture & transfer, VREs
     • Community Support Services: data discovery & navigation, workflow creation, annotation, interpretability
     • Common Data Services: persistent storage, identification, authenticity, workflow execution, mining
     • Across the layers: Trust, Data Curation

  3. Initially six research communities on board
     • EPOS: European Plate Observing System
     • CLARIN: Common Language Resources and Technology Infrastructure
     • ENES: Service for Climate Modelling in Europe
     • LifeWatch: Biodiversity Data and Observatories
     • VPH: The Virtual Physiological Human
     • INCF: International Neuroinformatics Coordinating Facility
     All share common challenges:
     • Reference models and architectures
     • Persistent data identifiers
     • Metadata management
     • Distributed data sources
     • Data interoperability

  4. Communities and Data Centers: identifying basic requirements; identifying commonalities and common data services

  5. What community users see … Today
     • Community portal, single credential type
     • Community Layer: community-specific authentication, authorization & single sign-on
     • Community data

  6. What community users see … Tomorrow
     • EUDAT portal for non-affiliated users, many credential types
     • Various community portals, different credential types
     • Common metadata exploration
     • Common data stage-in and stage-out services
     • Data services for long-tail data, also from citizen scientists
     • Common replication services with access to distributed storage
     • Unified Authentication, Authorization & Single Sign-On
     • Data: community data and other very useful data

  7. From: Analysis of the FIM doc (v0.7, L. Florio et al. 2013). EUDAT supports these requirements, but emphasizes #3, #4 and #9.
     #1 User friendliness (high)
     #2 Browser & non-browser federated access (high)
     #3 Multiple technologies with translators, including dynamic issue of credentials (medium)
     #4 Bridging communities (medium)
     #5 Implementations based on open standards and sustainable with compatible licenses (high)
     #6 Different Levels of Assurance with provenance (high)
     #7 Authorisation under community and/or facility control (high)
     #8 Attributes must be able to cross national borders (high)
     #9 Well-defined, semantically harmonised attributes (medium)
     #10 Flexible and scalable IdP attribute release policy (medium)

  8. EUDAT Sites: community centres (repositories); general data centres ((replica) storages)

  9. Safe Replication Service
     • Robust, safe and highly available data replication service for small- and medium-sized repositories
     • To guard against data loss in long-term archiving and preservation
     • To optimize access for users from different regions
     • To bring data strategically closer to systems for powerful compute-intensive analysis
     • PIDs are used to keep track of locations and can provide attributes
     [Diagram: community repository; PIDs and policy rules; three data centers, each with a store; EUDAT CDI domain of registered data]

  10. Use Case: CLARIN – Safe Replication; EPIC PID registry

  11. Safe Replication “islands”: INCF; EPOS / Orpheus; diXa; ENES / CMIP5, IPCC-AR5; CLARIN / Replix; CLARIN / CUNI; VPH / VIP; CLARIN / CUNI; NeuGrid; EPOS / PP WG7
      [Diagram layers: community centres (repositories); general data centres (replica storages)]

  12. Data Staging Service
      • Support researchers in transferring large data collections from EUDAT storage to HPC facilities
      • Reliable, efficient, and easy-to-use tools to manage data transfers
      • Provide the means to ingest computational results into the repository via the EUDAT infrastructure
      [Diagram: data centers with stores in the EUDAT CDI domain of registered data; PRACE and HPC systems]

  13. EUDAT Services (1)
      • Safe Replication Service: replicating Data Objects (DO) from a Repository to Replica Storages; Repository & Replica Storage belong to separate administrative zones; registration of original DO and replica
      • PID / object identifier Service: create DO handles; manage/maintain DO handles; resolve DO handles
      • Data Staging Service: replication of data from the domain of registered data (Stage-Out); replication of data objects into the domain of registered data (Stage-In); replication of not-registered Data Objects between scratch storages

  14. Service-specific actors/actions (1)
      • Safe Replication Service: Repository Data Manager replicates; Replica Storage Manager registers DOs; 1) (community) users access data via the repository; 2) users access data via the replica storage
      • PID (Handle) Service: Repository Data Manager creates/manages the primary object handle; Replica Storage Manager creates/manages secondary object handles; users and others resolve the location of the physical storage via the handles (PIDs)
      • Data Staging: users access and fetch data from either the repository or the replica storage; users ingest new data into the repository
      [Diagram: same replication layout as on slide 9]
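The handle roles on slides 13 and 14 (create, manage, resolve; primary and secondary handles) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyHandleRegistry`, its method names, the prefix and the URLs are all invented for illustration and are not the EPIC or Handle System API.

```python
import itertools


class ToyHandleRegistry:
    """In-memory stand-in for a PID (handle) service; not a real EPIC/Handle API."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._suffixes = itertools.count(1)
        self._records = {}  # pid -> {"url": str, "parent": str or None}

    def _mint(self, url, parent):
        # Assign the next free suffix and store the record.
        pid = f"{self.prefix}/{next(self._suffixes):04d}"
        self._records[pid] = {"url": url, "parent": parent}
        return pid

    def create_primary(self, url):
        """Repository Data Manager: register the original data object."""
        return self._mint(url, parent=None)

    def create_secondary(self, primary_pid, replica_url):
        """Replica Storage Manager: register a replica of an existing object."""
        if primary_pid not in self._records:
            raise KeyError(f"unknown primary handle: {primary_pid}")
        return self._mint(replica_url, parent=primary_pid)

    def resolve(self, pid):
        """Any user: look up the physical storage location behind a handle."""
        return self._records[pid]["url"]

    def replicas_of(self, primary_pid):
        """List the secondary handles that point back to a primary handle."""
        return [pid for pid, rec in self._records.items()
                if rec["parent"] == primary_pid]


registry = ToyHandleRegistry(prefix="12345")  # made-up prefix
original = registry.create_primary("https://repository.example.org/objects/a")
replica = registry.create_secondary(original, "https://replica.example.org/store/a")
print(registry.resolve(original))
print(registry.resolve(replica))
print(registry.replicas_of(original))
```

Keeping a parent pointer on each secondary handle is one possible way to let a resolver find every copy of an object; the slides do not say how EUDAT links primary and secondary handles.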

  15. Simple Store for “long-tail” data and citizen scientists
      • Allow registered users to upload “long-tail” data into the EUDAT store
      • Enable sharing objects and collections with other researchers
      • Utilise other EUDAT services to provide reliability and data retention
      • PIDs are assigned to uploaded DOs
      [Diagram: Simple Store portal (simple upload, simple metadata); PID registration; data centers with stores; EUDAT CDI domain of registered data]

  16. Joint Metadata Service
      • Find and define collections of scientific data, generated either by various communities or via EUDAT services (e.g. faceted search)
      • Access those data collections through the given references in the metadata to the relevant data stores
      • Definition of the data sets as objects for entitlement
      [Diagram: metadata portal; community repositories; data centers with stores; EUDAT CDI domain of registered data]

  17. EUDAT Services (2)
      • Simple Store Service: repository for registered data with metadata for sharing; digital objects are registered (handles are assigned); fragmented user group: many communities & “citizen scientists” are contributing and retrieving data
      • EUDATbox Service: temporary shareable storage space for data, not necessarily registered; user deposits data – not necessarily with metadata; not a homogeneous user group: many communities, “citizen scientists”
      • (Joint) Metadata Service: metadata from various repositories are harvested and collected; metadata exploration, faceted search: result sets define the data set for entitlement
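The faceted search mentioned for the (Joint) Metadata Service can be sketched on a handful of harvested records. Everything here is an assumption made for illustration: the record fields, the sample values and the helper names `facet_counts` and `facet_filter` are invented, and the slides do not describe the actual metadata schema or search engine.

```python
from collections import Counter

# Made-up records standing in for metadata harvested from several repositories.
harvested = [
    {"id": "rec-1", "community": "CLARIN", "format": "text"},
    {"id": "rec-2", "community": "CLARIN", "format": "audio"},
    {"id": "rec-3", "community": "ENES", "format": "netcdf"},
    {"id": "rec-4", "community": "EPOS", "format": "text"},
]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(rec[facet] for rec in records if facet in rec)


def facet_filter(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [rec for rec in records
            if all(rec.get(key) == value for key, value in selected.items())]


print(facet_counts(harvested, "community"))

# The filtered result set is the kind of collection that could then be
# treated as one object for entitlement decisions.
result_set = facet_filter(harvested, community="CLARIN", format="text")
print([rec["id"] for rec in result_set])
```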

  18. Service-specific actors/actions (2)
      • Simple Store (Repository): users deposit data and metadata; users search for and access data; Repository Storage Manager (needs to create the handle service)
      • EUDAT box: user deposits data; user shares data by inviting other users; user accesses data
      • (Joint) Metadata Service: manager harvests metadata from (many) repositories, also via the replica site
      [Diagram: metadata portal; community repositories; data centers with stores; EUDAT CDI domain of registered data]

  19. Different types of Identity Providers
      • zoned credential conversion service
      • unique user IDs, project-wise mapped to
      • attribute-based access control information
      [Diagram: AuthN: Identity Providers IdP A, IdP B, IdP C, IdP D (X.509, shib, eID, OpenID) feed an identity credential conversion that yields consolidated credentials. AuthZ: Attribute Providers AtP 1, AtP 2, AtP 3 (communities); attributes are either community-managed or (*) attributes provided by the user’s home IdP are reused]
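The credential conversion idea on slide 19 (several credential types mapped to one unique user ID, with attributes attached per project) can be sketched as a pair of lookup tables. This is a minimal sketch, not the EUDAT service: `ToyCredentialConverter`, its methods, the identity strings and the attribute names are all hypothetical.

```python
import uuid


class ToyCredentialConverter:
    """Maps external identities to one internal user ID plus per-project attributes."""

    def __init__(self):
        self._user_ids = {}    # (idp_type, subject) -> internal user ID
        self._attributes = {}  # (user ID, project) -> set of attribute strings

    def register(self, idp_type, subject, user_id=None):
        """Map an external identity to an internal ID.

        Pass an existing user_id to link a second credential to the same user.
        """
        user_id = user_id or uuid.uuid4().hex
        self._user_ids[(idp_type, subject)] = user_id
        return user_id

    def grant(self, user_id, project, *attributes):
        """Attach access-control attributes to a user within one project."""
        self._attributes.setdefault((user_id, project), set()).update(attributes)

    def convert(self, idp_type, subject, project):
        """Return the consolidated credential for an authenticated identity."""
        user_id = self._user_ids[(idp_type, subject)]  # KeyError if unknown
        attrs = self._attributes.get((user_id, project), set())
        return {"user_id": user_id, "project": project, "attributes": sorted(attrs)}


converter = ToyCredentialConverter()
uid = converter.register("x509", "CN=Jane Doe,O=Example")
converter.register("openid", "https://openid.example.org/jane", user_id=uid)
converter.grant(uid, "project-a", "read", "stage-in")

via_cert = converter.convert("x509", "CN=Jane Doe,O=Example", "project-a")
via_openid = converter.convert("openid", "https://openid.example.org/jane", "project-a")
print(via_cert == via_openid)
```

In this sketch the same person reaches the same consolidated credential whichever credential type was used to authenticate, and the attributes differ by project rather than by Identity Provider.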

  20. EUDAT AAI-TF approach: ConSec (Contrail Security code)

  21. The figure shows the high-level view: SAML is used for authentication (possibly translated from OpenID, not shown); OAuth (version 2) is used for delegation (internally, within the federation); and XACML is used for access control policies. Control (in the workflow sense) roughly goes from left to right and from top to bottom. Internally, an X.509 certificate with authorisation attributes is generated; this certificate is also managed internally and thus not usually exposed to (or accessible by) the user. Its purpose is threefold: (a) to ensure that non-HTTP services such as GridFTP and iRODS can be accessed (i.e., outside the OAuth delegation workflow); (b) to allow fine-grained authorisation; and (c) to allow command-line access to services for expert users. In OAuth, the authorisation server remains the central hub where access is delegated. However, since EUDAT needs finer-grained access, the generated X.509 certificate also carries authorisation attributes (see below), which are checked against pre-defined access policies. The system deployed and used by EUDAT was built by the Contrail project, so we are reusing the Contrail Security (ConSec) code and tools developed within this pilot project. This decision was based on an evaluation of options, in which ConSec promised most of the features required by the EUDAT communities. EUDAT is currently running a ConSec authentication infrastructure for integration at FZJ. EUDAT is currently not running an authorisation infrastructure.
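The step in which authorisation attributes from the certificate are checked against pre-defined access policies can be illustrated with a deny-by-default rule check. This is a sketch under stated assumptions, not XACML and not the ConSec code: the `Rule` type, the `is_permitted` function, and the resource, action and attribute names are invented, and the attributes are passed as a plain dictionary rather than parsed from a real X.509 certificate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical access policy entry."""
    resource: str
    action: str
    required: frozenset  # (name, value) attribute pairs that must all be held


def is_permitted(attributes, resource, action, rules):
    """Permit only if some rule for this resource and action is fully satisfied."""
    held = set(attributes.items())
    return any(
        rule.resource == resource
        and rule.action == action
        and rule.required <= held
        for rule in rules
    )


rules = [
    Rule("clarin/corpus", "read", frozenset({("community", "clarin")})),
    Rule("clarin/corpus", "write",
         frozenset({("community", "clarin"), ("role", "data-manager")})),
]

# Attributes as they might be carried in the internally generated certificate.
member = {"community": "clarin", "role": "member"}

print(is_permitted(member, "clarin/corpus", "read", rules))
print(is_permitted(member, "clarin/corpus", "write", rules))
```

A request that matches no rule is refused, which is the conservative default for this kind of check; whether EUDAT's policies behave that way is not stated on the slide.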
