EUDAT: Towards a European Collaborative Data Infrastructure
Damien Lecarpentier – CSC, IT Center for Science, Finland
ISC’11, Hamburg, 20 June 2011

Outline of the talk
• EUDAT concept
• EUDAT consortium
• EUDAT service approach
• Expected benefits and challenges of a CDI

EUDAT Key facts and objectives
• Initiative funded through FP7 e-Infrastructure Call 9 (WP11): INFRA-2011-1.2.2: Data infrastructure for e-Science (November 2010)
  • Call 9 objective: “Establish a persistent and robust service infrastructure for scientific data in Europe that responds to the need of data-intensive Science of 2020”
  • Budget: 43 M€
• EUDAT selected for funding (three-year project)
  • Official starting date: 1 October 2011
  • Biggest budget of the call: 9.3 M€ EC grant
  • Total budget: 16.3 M€
• Consortium
  • 23 partners representing 13 countries
  • 15 user communities from a wide range of disciplines (Biomed, Earth Science, Climate, SSH, etc.)
• Targets
  • EUDAT objective: “To deliver a Collaborative Data Infrastructure (CDI) with the capacity and capability for meeting researchers’ needs in a flexible and sustainable way, across geographical and disciplinary boundaries.”
  • The infrastructure must be collaborative
  • The infrastructure must be driven by researchers’ needs
  • The infrastructure must be sustainable yet flexible
  • The infrastructure must be pan-European
  • The infrastructure must be multi-disciplinary

The current data infrastructure landscape: challenges and opportunities
• Long history of data management in Europe: several existing data infrastructures dealing with established and growing user communities (e.g., ESO, ESA, EBI, CERN)
• New Research Infrastructures are emerging and are also trying to build data infrastructure solutions to meet their needs (CLARIN, EPOS, ELIXIR, ESS, etc.)
• A large number of projects providing excellent data services (EURO-VO, GENESI-DR, Geo-Seas, HELIO, IMPACT, METAFOR, PESI, SEALS, etc.)
• However, most of these infrastructures and initiatives address primarily the needs of a specific discipline and user community
• Challenges
  • Compatibility, interoperability, and cross-disciplinary research
  • Data growth in volume and complexity (the so-called “data tsunami”)
    • Strong impact on costs, threatening the sustainability of the infrastructure
• Opportunities
  • Potential synergies do exist: although disciplines have different ambitions, they have common basic needs and requirements that could be matched with generic pan-European services supporting multiple communities and ensuring greater interoperability.
  • Strategy needed at pan-European level

Towards a Collaborative Data Infrastructure
Source: HLEG report, p. 31
• EUDAT will focus on building this generic data infrastructure layer and offer a trusted domain for long-term data preservation, accompanied by related services to store, identify, authenticate and mine these data.
• This needs to be done in close collaboration with the communities
  • Core services must match the requirements of the communities
  • Community services can also be incorporated into the common data service infrastructure when they are of use to other communities.

The EUDAT Communities (by field)
• EUDAT targets all scientific disciplines (discipline neutral):
  • To capture and identify cross-discipline requirements
  • To involve the scientists of all the communities in shaping the infrastructure and its services

EUDAT Services Activities – Iterative Design
• EUDAT’s Services activity is concerned with identifying the types of data services needed by the European research communities, delivering them through a federated data infrastructure, and supporting their users
• 1. Capturing communities’ requirements (WP4)
  • Services to be deployed must be based on user communities’ needs
  • Strong engagement and collaboration with user communities (EUDAT communities and beyond) to capture requirements
• 2. Building the services (WP5)
  • User requirements must be matched with available technologies
  • Need to identify:
    • available technologies and tools to develop the required services (technology appraisal)
    • gaps and market failures that should be addressed by EUDAT research activities
  • Services must be designed, built and tested in a pre-production test bed environment and made available to WP4 for evaluation by their users
• 3. Deploying the services and operating the federated infrastructure (WP6)
  • Services must be deployed on the EUDAT infrastructure and made available to users, with interfaces for cross-site, cross-community operation
  • Reliability, 24h/7d availability and accessibility of the shared services, with operational security, data integrity and compliance with stakeholder requirements and policies.

EUDAT core services
Core services are the building blocks of EUDAT’s Common Data Infrastructure, mainly located in the bottom layer of data services.
• Fundamental Core Services
  • Long-term preservation
  • Persistent identifier service
  • Data access and upload
  • Workspaces
  • Web execution and workflow services
  • Single Sign On (federated AAI)
  • Monitoring and accounting services
  • Network services
• Extended Core Services (community-supported)
  • Joint metadata service
  • Joint data mining service
There is no need to match the needs of all communities at the same time; addressing a group of communities can be very valuable, too.

Service Model Approach and Generic Collaboration
Generic Service Model
• Fundamental Core Services meet strongly overlapping service requirements
• Extended Core Services are mainly community-supported; community requirements typically overlap between some disciplines
Collaboration between Teams
• Fundamental Core Services are operated and supported by an Operations Team which collaborates across the participating centres.
• Extended Core Services and other joint multi-disciplinary services must be community-supported; their requirements overlap between a specific subset of disciplines

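The distinction between the two service tiers can be illustrated with a small sketch. It is not part of the talk: the community names, the service names other than those on the previous slide, and the 75% threshold are all assumptions chosen for illustration. The idea is simply that a service requested by nearly every community is a candidate Fundamental Core Service, while one shared by a subset is a candidate Extended Core Service.

```python
from collections import Counter


def classify_services(requirements, core_share=0.75):
    """Group requested services by how widely they are shared.

    requirements: dict mapping a community name to the set of services it needs.
    core_share:   hypothetical threshold (fraction of communities) above which
                  a service is treated as a fundamental core service.
    """
    total = len(requirements)
    # Count how many communities ask for each service.
    counts = Counter(s for needs in requirements.values() for s in needs)
    tiers = {"fundamental": [], "extended": [], "community": []}
    for service, n in sorted(counts.items()):
        if n / total >= core_share:
            tiers["fundamental"].append(service)   # strongly overlapping need
        elif n >= 2:
            tiers["extended"].append(service)      # shared by a subset only
        else:
            tiers["community"].append(service)     # specific to one community
    return tiers


# Hypothetical requirement sets; the community names are placeholders.
requirements = {
    "community_a": {"long-term preservation", "persistent identifiers", "joint metadata"},
    "community_b": {"long-term preservation", "persistent identifiers", "joint metadata"},
    "community_c": {"long-term preservation", "persistent identifiers", "domain viewer"},
    "community_d": {"long-term preservation", "persistent identifiers"},
}
tiers = classify_services(requirements)
print(tiers)
```

In this toy input, preservation and identifiers are needed by all four communities, the joint metadata service by two, and the viewer by one, so they land in the three tiers respectively. In practice the boundary would be set through the requirements-capture work in WP4 rather than by a fixed threshold.
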
EUDAT Timeline (figure), 2012 to 2015
• Phases: user requirements, service design, service deployment
• Milestones shown: EUDAT kick-off, first services available, cross-community services, full core services deployed, sustainability plan
• User Forums: 1st, 2nd, 3rd and 4th

Expected benefits of a Collaborative Data Infrastructure
• Enabling multi-disciplinary data-intensive research and collaboration
  • Development of common services supporting research communities
  • Support to existing scientific communities’ infrastructures
  • Support to smaller communities through access to sophisticated services
• Inter-disciplinary collaboration and exploitation of synergies between communities
  • Communities from different disciplines working together to build services
  • Data sharing between disciplines
• Collaboration with other large-scale infrastructure
  • European e-Infrastructures: Géant, PRACE, EGI, etc.
  • Global initiatives in the US, Japan, Australia, etc.
• Ensuring wide access to and preservation of data in a sustainable way
  • A robust generic infrastructure capable of handling the scale and complexity of data that will be generated over the next 10-20 years
  • Greater access to existing data and better management of data for the future
  • Increased security by managing multiple copies in geographically distant locations
  • Put Europe in a competitive position for important data repositories of world-wide relevance
• Economies of scale and cost-efficiency
  • Shared resources and work are less costly

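Keeping multiple copies in distant locations only increases security if the copies can be checked against each other. The sketch below shows one common way to do that, comparing each replica against a recorded SHA-256 checksum. It is an illustration under stated assumptions, not a description of how EUDAT implements replication: the site names and the sample data are hypothetical, and the replicas are held in memory rather than fetched from remote storage.

```python
import hashlib


def sha256_of(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(replicas, expected_digest):
    """Return the sites whose copy does not match the recorded checksum.

    replicas: dict mapping a (hypothetical) site name to the bytes stored there.
    """
    return sorted(
        site for site, data in replicas.items()
        if sha256_of(data) != expected_digest
    )


# Checksum recorded when the data set was first deposited.
original = b"observation,value\n1,0.42\n"
expected = sha256_of(original)

# Three copies at placeholder sites; one has silently changed.
replicas = {
    "site_north": original,
    "site_south": original,
    "site_east": b"observation,value\n1,0.43\n",
}
corrupt = find_corrupt_replicas(replicas, expected)
print(corrupt)
```

A damaged copy found this way can be restored from any replica that still matches the checksum, which is the practical benefit of geographically separate copies.
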
Challenges and Opportunities
• Delivering high-level multi-disciplinary data services
  • Achieving a high level of interoperability in the context of diversity of data, research disciplines and practices
  • Need to strongly involve the different communities in the design and evaluation of services
  • EUDAT as a platform to discuss interoperability issues (along with other initiatives, e.g., DAITF)
• Building trust among stakeholders
  • Trust between service providers and users but also between the researchers and disciplines themselves
  • Trust in the EUDAT infrastructure, the data deposited and collected, data integrity
• Ensuring the sustainability of the infrastructure
  • Providing a framework and a plan to ensure the continuity of services beyond the immediate funding window, through the setting up of a sustainable entity
  • Funding and business models
  • Partnerships (new communities, industry, etc.) and governance models

The beginning of a long journey… “Do the difficult things while they are easy and do the great things while they are small. A journey of a thousand miles must begin with a single step.” Lao Tzu

How to get in touch with EUDAT?
• Kimmo Koski, CSC - IT Center for Science, EUDAT Project Coordinator, Kimmo.Koski@csc.fi
• Peter Wittenburg, Max Planck Institute for Psycholinguistics at Nijmegen (MPI-PL), EUDAT Scientific Coordinator, Peter.Wittenburg@mpi.nl
• Damien Lecarpentier, CSC - IT Center for Science, EUDAT Project Manager, Damien.Lecarpentier@csc.fi
EUDAT@ISC’11
• BoF session on “e-Infrastructure for science in Europe”, on Tuesday 21 June, 14:30-15:15, Hall B
• Partners’ booths at ISC:
  • CSC #146
  • BSC #114
  • DKRZ #140
  • EPCC #152
THANK YOU!