1 / 12

Predrag Buncic CERN

Learn about ALICE's computing strategy for Run3 by Predrag Buncic at CERN, focusing on event processing, memory matrix, and cost efficiency.

tgoldsmith
Download Presentation

Predrag Buncic CERN

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ALICE Computing Model in Run3 Predrag Buncic CERN

  2. A L I C E ALICE @ Run 3 MULTIPLEXER Computer Farm Complete Events Continuous detector reading: replace events with time windows (20 ms, 1000 events). Memory Matrix 2 Predrag Buncic The ALICE Online-Offline Project

  3. A L I C E O2 TDR https://cds.cern.ch/record/2011297 3 Predrag Buncic The ALICE Online-Offline Project

  4. A L I C E ALICE @ Run 3 6 TB/s 2.4 MW computing facility 100 k CPU cores, 5000 GPUs, 500 FPGAs 60 PB local disk buffer at P2 90 GB/s 4 Predrag Buncic The ALICE Online-Offline Project

  5. A L I C E How to keep the cost down? x100 Event rate x10 Storage Software performance Use of hardware accelerators Parameterized MC Minimize remote I/O Specialize sites for given workflow 5 Predrag Buncic The ALICE Online-Offline Project

  6. A L I C E What will change in Run 3? T0/T1 CTF -> ESD -> AOD 1..n O2 RAW -> CTF -> ESD -> AOD 1 CTF Reconstruction Calibration Archiving Analysis Reconstruction Calibration AOD AOD AOD T2/HPC MC -> CTF -> ESD -> AOD 1..n AF AOD -> HISTO, TREE 1..3 • Grid Tiers will be mostly specialized for given role • O2 facility (2/3 of reconstruction and calibration), T1s (1/3 of reconstruction and calibration, archiving to tape), T2s (simulation) • All AODs will be collected on the specialized Analysis Facilities (AF) capable of processing ~5 PB of data within ½ day timescale • The goal is to minimize data movement and optimize processing efficiency Analysis Simulation 6 Predrag Buncic The ALICE Online-Offline Project

  7. A L I C E Evolution of T2s Disk (PB) CPU (cores) • Expecting Grid resources (CPU, storage) to continue growing at 20% rate per year • Since T2s will be used almost exclusively for simulation jobs (no input) and resulting AODs will be exported to T1s/AFs, we expect to significantly lower needs for storage on T2s • Simulation will remain the main consumer of CPU cycles 7 Predrag Buncic The ALICE Online-Offline Project

  8. A L I C E Computing shares in Run 3 2/3 1/3 • Relative importance of O2 vs Grid may change with time if Grid resources continue to grow as expected 8 Predrag Buncic The ALICE Online-Offline Project

  9. A L I C E Analysis Facilities • Motivation • Analysis is the least efficient of all workloads that we run on the Grid • I/O bound in spite of attempts to make it more efficient by using the analysis trains • Requirements • 20-30’000 cores and 5-10 PB of disk on very performant file system • In order to run the organized analysis on local data like we do today on the Grid efficiently using available resources we need to adapt and develop our analysis software 9 Predrag Buncic The ALICE Online-Offline Project

  10. A L I C E Reducing complexity: Data Management • Federating storage • Each cloud/region provides reliable data management and sufficient processing capability • Whatever happened in a region, stays in that region 10 Predrag Buncic The ALICE Online-Offline Project

  11. A L I C E Reducing complexity: Workload Management HPC MC -> CTF T1 Simulation Facility AOD ALICE GRID No need to track every job AOD HPC AOD -> HISTO, TREE Analysis Facility • Localizing execution of complex workflows • For this to work, batch systems would have to be able to execute the complex workflows (many of them can but we do not use them like that in the grid context) 11 Predrag Buncic The ALICE Online-Offline Project

  12. A L I C E Top 5 concerns 12 Predrag Buncic The ALICE Online-Offline Project

More Related