Work Package II - DIOS

Work Package II - DIOS. Simone Campana, CERN, Geneva, Switzerland. Annecy, 8 February 2019. WP2 objective: create a cloud of data services, often referred to as a “Data Lake”, by building on and integrating existing work from a variety of areas: Research Infrastructures

Presentation Transcript


  1. Work Package II - DIOS. Simone Campana, CERN, Geneva, Switzerland. Annecy, 8 February 2019.

  2. WP2 objective. Create a cloud of data services, often referred to as a “Data Lake”, by building on and integrating existing work from a variety of areas:
     • Research Infrastructures
     • previous EU projects
     • state-of-the-art solutions in the appropriate areas
     Collaborates with ongoing work from GEANT, PRACE, and other proposed H2020 projects specifically addressing the European Open Science Cloud.
     Simone.Campana@cern.ch

  3. WP2 funded effort
     • Other partners plan to participate with unfunded effort

  4. Data Lake strawman
     [Figure: Data Lake strawman diagram. Labels under Data (Lake) Infrastructure: Storage Orchestration Services, Asynchronous Data Transfer Services, Distributed Storage (Storage nodes in Data Centers), Volatile Storage, Content Delivering and Caching Services. Labels under Compute Infrastructure: HPC Compute, Grid Compute, Cloud Compute, @HOME, Compute Provisioning.]

  5. Task 2.2: Orchestration service. Implement a system managing scientific user policies while optimizing the service provider costs.
     • Replication policies, access policies
     • QoS: optimization between redundancy, performance and cost
     • Data lifetimes and lifecycles: dynamic replication, deletion, change of QoS
     [Figure: Data Lake strawman diagram, as on slide 4.]
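
The QoS trade-off named in Task 2.2 can be illustrated with a minimal sketch. The storage classes, their numbers, and the policy fields below are hypothetical and do not come from the project; the sketch only shows the idea of picking the cheapest storage class that still satisfies a user's redundancy and performance requirements.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StorageClass:
    name: str
    replicas: int        # redundancy: number of copies kept
    latency_ms: float    # performance: typical time to first byte
    cost_per_tb: float   # provider cost, in arbitrary units


@dataclass(frozen=True)
class Policy:
    min_replicas: int
    max_latency_ms: float


# Hypothetical catalogue of storage classes (illustrative values only).
CLASSES = (
    StorageClass("tape", replicas=1, latency_ms=60000.0, cost_per_tb=1.0),
    StorageClass("disk", replicas=1, latency_ms=20.0, cost_per_tb=4.0),
    StorageClass("disk-replicated", replicas=2, latency_ms=20.0, cost_per_tb=8.0),
)


def cheapest_class(policy: Policy,
                   classes: Sequence[StorageClass] = CLASSES) -> Optional[StorageClass]:
    """Return the lowest-cost class meeting the policy, or None if none does."""
    candidates = [
        c for c in classes
        if c.replicas >= policy.min_replicas and c.latency_ms <= policy.max_latency_ms
    ]
    return min(candidates, key=lambda c: c.cost_per_tb, default=None)
```

A change of QoS over the data lifecycle would then amount to re-evaluating `cheapest_class` with a new `Policy` and moving the data if the answer differs.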

  6. Task 2.3: Integration with Compute. Does not focus on provisioning compute resources; focuses instead on serving data to large-scale processing centers. Processing capacity might not be co-located with data.
     • Data Transfer Services
     • Caching and latency hiding services (Content Delivery Network)
     • Compute services will be heterogeneous: Grid, HPC, Cloud (including commercial)
     [Figure: Data Lake strawman diagram, as on slide 4.]
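
As an illustration of the caching idea in Task 2.3, here is a minimal sketch of a cache placed next to a compute site. The slides do not specify an eviction strategy; least-recently-used (LRU) is an assumption made here, and `fetch` is a hypothetical stand-in for a remote read from the data lake.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Keeps the most recently used files close to the compute site."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity = capacity   # maximum number of cached files
        self.fetch = fetch         # hypothetical remote read, called on a miss
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, name: str) -> bytes:
        if name in self._items:
            # Cache hit: mark as most recently used, no remote access needed.
            self._items.move_to_end(name)
            self.hits += 1
            return self._items[name]
        # Cache miss: read remotely, store locally, evict the oldest entry if full.
        self.misses += 1
        data = self.fetch(name)
        self._items[name] = data
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)
        return data
```

Repeated reads of the same file are then served locally, which hides wide-area latency from the processing job.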

  7. Task 2.4: Networking. The Wide Area Network is a key component in the Data Lake model. Task 2.4 develops the capability to provide high-capacity networking between data centers to enable traffic management. Leverages work done in WLCG and GEANT. Applies to all scenarios in Task 2.3:
     • Asynchronous Data Transfer of large data volumes
     • Content Delivery Network for processing
     • Integration of commercial compute resources
     [Figure: Data Lake strawman diagram, as on slide 4.]

  8. Task 2.5: Authentication and Authorization. Integrates solutions from different projects/activities to build a federated storage infrastructure:
     • provide the appropriate level of granularity of authentication and access control to manage and protect data
     • provide the means by which to enable open access once data is released to the broader community
     Heterogeneous authentication mechanisms, management of memberships and policies, controlled delegation, leverage of off-the-shelf libraries and components.
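
The two access-control goals of Task 2.5 (protect data, then open it once released) can be sketched as a single authorization check. The `Dataset` and `User` types and the group-based rule are hypothetical illustrations, not the project's design; real deployments would rely on the federated identity services the task integrates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    owner_group: str        # group whose members may read before release
    released: bool = False  # True once the data is open to the community


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset = frozenset()  # group memberships asserted for this user


def can_read(user: User, dataset: Dataset) -> bool:
    """Released data is open to everyone; otherwise group membership is required."""
    return dataset.released or dataset.owner_group in user.groups
```

For example, a user outside the owning group is denied access to a dataset until its `released` flag is set.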

  9. Task 2.1 puts all this together. Builds the prototype which federates the storage of several of the data centers that support the main science communities in the project.
     • Store scientific data of the Research Infrastructures under the policies they define
     • Provide the needed monitoring and analytics tools
     • Certify the data centers for bit preservation
     [Figure: Data Lake strawman diagram, as on slide 4.]
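
Bit preservation is typically checked by comparing a stored file against a checksum recorded at ingest time. The sketch below assumes SHA-256 and the helper names `sha256_of` and `is_intact`; the slides do not say which checksum or procedure the project uses.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_intact(stream: BinaryIO, recorded_checksum: str) -> bool:
    """True if the stream's current content still matches the recorded checksum."""
    return sha256_of(stream) == recorded_checksum


# Usage: record a checksum at ingest, verify it later.
original = b"event data"
recorded = sha256_of(io.BytesIO(original))
```

A periodic scan calling `is_intact` on every stored replica is one simple way a data center could demonstrate that no bits have changed.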

  10. WP2 tasks
     • Task 2.1 Data Lake Infrastructure and Federation Services. CERN (Xavier Espinal)
     • Task 2.2 Data Lake orchestration service. DESY (Patrick Fuhrmann)
     • Task 2.3 Integration with Compute Services. NOW-I-ASTRON
     • Task 2.4 Networking. SKAO (Rosie Bolton)
     • Task 2.5 Authentication and Authorization. INFN (Andrea Ceccanti)
     Simone Campana (CERN) as WP leader, Rosie Bolton (SKAO) as deputy.

  11. Milestones and Deliverables

  12. Next Steps. Focus on our first milestone (and deliverable): discuss and design the first implementation of the data lake.
     • Evolve the strawman into an architecture
     • Decide which components to focus on in the initial phase
     • Sciences drive the needs (and needs drive the design)
     This will be an initial design. Flexibility (both in design and components) is one key aspect.

  13. Next steps. Practically…
     • Start with an overview of the tools and services we have available and those we are developing or considering
     • One phone call dedicated to each task, in the next 2 months
     • Receive input from the Research Infrastructures and projects: which use cases are covered and which ones are not, or need attention
     • One-day face-to-face meeting to go through the input. Task Leaders and one person for each RI/project should be there
     • WP2 workshop in early July to produce the design of the system (based on the digested input) and define the work program
     • Locations volunteering?
