User experience with the ALICE Tier2/3 @GSI
A. Andronic, A. Kalweit, A. Manafov, A. Kreshuk, C. Preuss, D. Miskowiec, J. Otwinowski, K. Schwarz, M. Ivanov, A. Marin, M. Zynovyev, P. Braun-Munzinger, P. Malzacher, S. Radomski, S. Masciocchi, T. Roth, V. Penso, W. Schoen (ALICE-GSI)
Outline • Introduction: about GSI, about ALICE • The GSI Tier2/3 • The GSIAF • The GSI Lustre cluster • The Grid@GSI • Conclusions
GSI: Gesellschaft für Schwerionenforschung (German Institute for Heavy Ion Research) • ~1000 employees • ~1000 guest scientists • Budget: ~95 million Euro
GSI as of today and the future FAIR facility (site overview)
ALICE Collaboration: > 1000 members, ~30 countries, ~100 institutes. ALICE@GSI: large participation in TPC and TRD • detector calibration • physics analysis
The ALICE Grid Map: sites in Europe, Africa, Asia and North America
ALICE Tier2/3 @GSI: size and ramp-up plans. The listed capacity is for the Tier 2 (fixed via the WLCG MoU), plus 1/3 on top for the Tier 3. http://lcg.web.cern.ch/LCG/C-RRB/MoU/WLCGMoU.pdf
What we want to provide: a mixture of
• a Tier 2
• a Tier 3
• a PROOF farm with local storage: GSIAF
integrated in the standard GSI batch farm (GSI, FAIR). We want to be able to readjust the relative size of the different parts on request.
Investment plans at GSI: ALICE Tier 2 (chart of GSI investment and FAIR ALICE T2 contributions over time)
GSI – current setup (schematic):
• 1 Gbps links to CERN and GridKa (Grid third-party copy)
• test cluster (10 TB) and ALICE::GSI::SE::xrootd (80 TB)
• vobox, Grid CE, LCG RB/CE
• GSI batch farm, ALICE cluster: 160 nodes / 1500 cores for batch, 20 nodes for GSIAF (PROOF/batch), with directly attached disk storage (81 TB)
• GSI batch farm: common batch queue
• Lustre cluster, 150 TB
Present Status
• ALICE::GSI::SE::xrootd
• 75 TB disk on file servers (16 FS with 4–5 TB each)
• 3U chassis, 12 × 500 GB disks, RAID 5
• 6 TB user space per server
• Batch Farm/GSIAF
• gave up the concept of ALICE::GSI::SE_tactical::xrootd: not good to mix local and Grid access; cryptic file names make non-Grid access difficult
• nodes dedicated to ALICE (Grid + local), used by FAIR/Theory if free
• ~1500 cores in total:
• 15 × 4 = 60 cores, 8 GB RAM, 2 TB disk + system (D-Grid)
• 25 × 8 = 200 cores, 16 GB RAM, 2 TB disk in a RAID 5 (ALICE)
• 40 × 8 = 320 cores, 32 GB RAM, 2 TB disk in a RAID 5 (D-Grid)
• 7 × 16 × 8 = 896 cores, 16 GB RAM, 2 × 128 GB disks in a RAID mirror (blades) (ALICE)
• on all machines: Debian Etch, 64-bit
PROOF – user experience
• PROOF cluster: 20 × 8 = 160 workers
• used heavily for code development and debugging, as it provides fast response on large statistics
• for example, ~1.4 TB of data is processed in ~20 minutes for a very CPU-intensive analysis
• overall, the users are very happy with it
• (almost) everything is allowed – we can still handle it with 6–8 active users
• all machines see an NFS-mounted disk, so users can use their own libraries
• large disk space (Lustre + local disks), so intermediate results can be studied at many points
(a minimal session sketch follows below)
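To illustrate the workflow behind these numbers, here is a minimal sketch of what a GSIAF-style PROOF session could look like from the ROOT prompt. The master host name, data paths and selector name are hypothetical placeholders, not the actual GSI settings.

```cpp
// proof_example.C -- minimal PROOF session sketch
// (host name, data paths and selector are hypothetical placeholders)
#include "TProof.h"
#include "TChain.h"

void proof_example()
{
   // Connect to the PROOF master node
   TProof *p = TProof::Open("proofmaster.gsi.de");
   if (!p) return;

   // Build a chain of ESD trees from files the workers can see
   // (e.g. on the mounted Lustre cluster)
   TChain *chain = new TChain("esdTree");
   chain->Add("/lustre/alice/sim/run001/AliESDs.root");
   chain->Add("/lustre/alice/sim/run002/AliESDs.root");

   // Run a user-supplied TSelector on all workers in parallel
   chain->SetProof();
   chain->Process("MyAnalysisSelector.C+");
}
```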
Installation
• shared NFS dir, visible by all nodes:
• xrootd (version 2.9.0, build 20080621-0000)
• ROOT (521-01-alice and 519-04)
• AliRoot (head)
• all compiled for 64-bit
• reason: fast software change cycle; disadvantage: possible NFS stales
• started to build Debian packages of the used software to install locally
(a sketch of how a session picks up the shared installation follows below)
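As a sketch of how an interactive session can pick up the shared installation described above, the snippet below loads the core AliRoot analysis libraries from an NFS path. The mount point is a placeholder, not the actual GSI directory; the library names are the standard AliRoot ones.

```cpp
// load_aliroot.C -- pick up the shared 64-bit AliRoot build from NFS
// (the NFS path below is a hypothetical placeholder)
#include "TSystem.h"

void load_aliroot()
{
   // Make the shared library directory known to the dynamic loader
   gSystem->AddDynamicPath("/nfs/alice/aliroot/head/lib/tgt_linuxx8664gcc");

   // Load the standard AliRoot analysis libraries
   gSystem->Load("libSTEERBase");
   gSystem->Load("libESD");
   gSystem->Load("libAOD");
   gSystem->Load("libANALYSIS");
   gSystem->Load("libANALYSISalice");
}
```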
Configuration
• setup: one standalone, high-end 32 GB machine as xrd redirector and PROOF master; the cluster nodes act as xrd data servers and PROOF workers; AliEn SE; Lustre
• so far no authentication/authorization
• managed via Cfengine, a platform-independent computer administration system (main functionality: automatic configuration)
• xrootd.cf, proof.conf, TkAuthz.Authorization, access control, Debian-specific init scripts for start/stop of the daemons (for the latter also Capistrano and LSF methods for fast prototyping)
• all configuration files are under version control (Subversion)
PROOF cluster – issues
• Still, there are some problems:
• transparency for users: “It runs fine locally, but crashes on PROOF – how do I find where the problem is?”
• fault tolerance: much progress in the last year, but still our problem #1
• the worst case is that misbehaviour of one user session can kill the whole cluster
• this happens rarely, but needs manual administrator intervention
The upgraded (alpha) GSI Lustre cluster
• running Lustre 1.6.4.3, Debian 2.6.22 kernel
• 27 (17) object storage servers, in “fail-out” mode
• roughly 135 (80) TB volume (RAID 5)
• Ethernet connections (27 (17) × 1 Gbit/s); bonding tested (2 × 1 Gbit/s per OSS), but the hardware is not available
• ~1500 (400) ALICE client CPUs
Other talks: W. Schoen, St. Louis (2007) and CERN (2008, HEPiX); S. Masciocchi (CERN, 2008)
The ALICE Analysis Train
The concept (ROOT, ALICE):
• experimental data have a large volume (~200 kB/event)
• all data are stored in ROOT format
• the data analysis is dominated by input/output latencies
• idea: load the data once and run many analyses on it (the “train”, sketched below)
• the ALICE Analysis Framework (A. Morsch, A. Gheata, et al.)
The GSI analysis train:
• 12 physics analyses (CPU/total time ~0.75)
• reads simulated events from Lustre
• runs as batch jobs on the local farm
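A minimal, self-contained sketch of the train idea (not the actual AliAnalysisManager framework code): each event is read from disk once, and every “wagon” analysis works on the same in-memory data, which is why the CPU/total-time ratio improves when analyses are combined. File paths and task names are placeholders.

```cpp
// train_sketch.C -- "load data once, run many analyses" (conceptual sketch;
// the real GSI train is built with the AliRoot analysis framework)
#include <vector>
#include "TChain.h"

// One "wagon" = one analysis acting on the event already in memory
typedef void (*Wagon)(TChain &chain, Long64_t entry);

void ptSpectrumTask(TChain &, Long64_t) { /* placeholder analysis 1 */ }
void flowTask      (TChain &, Long64_t) { /* placeholder analysis 2 */ }

void train_sketch()
{
   TChain chain("esdTree");
   chain.Add("/lustre/alice/sim/run001/AliESDs.root");  // placeholder paths
   chain.Add("/lustre/alice/sim/run002/AliESDs.root");

   std::vector<Wagon> wagons;
   wagons.push_back(ptSpectrumTask);
   wagons.push_back(flowTask);             // ... up to the 12 GSI analyses

   for (Long64_t i = 0; i < chain.GetEntries(); ++i) {
      chain.GetEntry(i);                   // I/O happens only once per event
      for (size_t w = 0; w < wagons.size(); ++w)
         wagons[w](chain, i);              // all analyses share the loaded event
   }
}
```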
Speed results: because the analysis is I/O dominated, the CPU/total-time ratio improves with the train.
Performance (with 17 file servers)
• total number of events/s versus number of parallel jobs, for data on Lustre and for data on one local disk (each node filled with MC jobs)
• saturation due to network limitation!
• at 4000 events/s, analysing 10^9 events (one year @ LHC) takes ~2.5 × 10^5 s, i.e. about 3 days
Network traffic-1 (with 17 FS) • 10 Gbit connection • switch giffwsx41 (the best one) • 20 nodes. No problems on the 10 Gbit links.
Network traffic-2 (with 17 FS) • file server lxfsd011 • 1 Gbit connection, for each of the current 17 file servers. Very close to saturation on the 1 Gbit links!
Network traffic again (with 27 FS) • 10 Gbit connection, switch giffwsx41 • now the data traffic is better distributed
Next Generation Cluster
• soon available: running Lustre 1.6.5 (move to version 1.8.x when available)
• 35 object storage servers
• initially 160 TB volume, later 600 TB
• MDS: 2 servers in a high-availability configuration
• Ethernet connections (100 × 1 Gbit/s)
• ~1400 ALICE client CPUs
• ~4000 GSI client CPUs in total
• quotas will be enabled
(Walter Schoen, Thomas Roth)
ALICE Grid jobs computed at GSI: > 50000 (GSI: 1%). Job efficiency at GSI: 80.6%.
Conclusions
• Coexistence of interactive and batch processes (PROOF analysis on staged data and Grid user/production jobs) on the same machines can be handled!
• LSF batch processes are “re-niced” to give PROOF processes a higher priority (LSF parameter)
• the number of jobs per queue can be increased/decreased
• queues can be enabled/disabled
• jobs can be moved from one queue to other queues
• currently at GSI each PROOF worker is also an LSF batch node
• Optimised I/O: various methods of data access (local disk, file servers via xrd, mounted Lustre cluster) have been investigated systematically; see the sketch below. Method of choice: Lustre and eventually an xrd-based SE. Local disks are no longer used for PROOF at GSIAF.
• PROOF nodes can be added/removed easily
• the administrative overhead with local disks is larger compared to a file cluster
• Extend the GSI T2 and GSIAF according to the promised ramp-up plan
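For reference, the three data-access methods compared above look roughly like this from a ROOT macro; host names and paths are hypothetical placeholders, not the real GSI endpoints.

```cpp
// access_methods.C -- the three access paths investigated at GSIAF, as seen
// from ROOT (host and path names are hypothetical placeholders)
#include "TFile.h"

void access_methods()
{
   // 1) Local disk on the worker node
   TFile *fLocal  = TFile::Open("/data/local/AliESDs.root");

   // 2) xrootd storage element: the redirector forwards to a data server
   TFile *fXrootd = TFile::Open("root://xrd-redirector.gsi.de//alice/sim/AliESDs.root");

   // 3) Lustre mounted on every node: a plain POSIX path, no extra protocol
   TFile *fLustre = TFile::Open("/lustre/alice/sim/AliESDs.root");
}
```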
Acknowledgements: The Team
A. Andronic, A. Kalweit, A. Manafov, A. Kreshuk, C. Preuss, D. Miskowiec, J. Otwinowski, K. Schwarz, M. Ivanov, A. Marin, M. Zynovyev, P. Braun-Munzinger, P. Malzacher, S. Radomski, S. Masciocchi, T. Roth, V. Penso, W. Schoen (ALICE-GSI, IT-GSI)
The ALICE computing model (1/2) • pp • Quasi-online data distribution and first reconstruction at T0 • Further reconstructions at the T1’s • AA • Calibration, alignment and pilot reconstructions during data taking • Data distribution and first reconstruction at T0 during the four months after the AA run • Further reconstructions at the T1’s • One copy of RAW at T0 and one distributed over the T1’s
The ALICE computing model (2/2) • T0 • First pass reconstruction, storage of one copy of RAW, calibration data and first-pass ESD’s • T1 • Reconstructions and scheduled analysis, storage of the second collective copy of RAW and one copy of all data to be kept, disk replicas of ESD’s and AOD’s • T2 • Simulation and end-user analysis, disk replicas of ESD’s and AOD’s
The Transition Radiation Detector (electron identification)
• 18 supermodules
• 6 radial layers
• 5 longitudinal stacks
• 540 chambers
• 750 m² active area
• 28 m³ of gas
Each chamber: ≈ 1.45 × 1.20 m², ≈ 12 cm thick (incl. radiator and electronics)
In total: 1.18 million readout channels
TRD assembly and installation: 4 supermodules are installed
Present Status
• ALICE::GSI::SE::xrootd
• 75 TB disk on file servers (16 FS with 4–5 TB each)
• 3U chassis, 12 × 500 GB disks, RAID 5
• 6 TB user space per server
• Batch Farm/GSIAF
• gave up the concept of ALICE::GSI::SE_tactical::xrootd: not good to mix local and Grid access; cryptic file names make non-Grid access difficult
• nodes dedicated to ALICE (Grid + local), used by FAIR/Theory if free
• 1500 cores in total:
• 160 boxes with 1200 cores (to a large extent funded by D-Grid), each with 2 × 2-core 2.67 GHz Xeon, 8 GB RAM, 2.1 TB local disk space on 3 disks plus a system disk
• additionally 24 new boxes, each with 2 × 4-core 2.67 GHz Xeon, 16 GB RAM, 2.0 TB local disk space on 4 disks including system
• further machines with up to 2 × 4 cores and 32 GB RAM, and Dell blade centres
• on all machines: Debian Etch, 64-bit