ALICE T1/T2 workshop
4-6 June 2013, CCIN2P3 Lyon
Famous last words
Some stats
• 46 registered participants, we have counted 45 in the room. Still looking for the mystery No. 46…
• Good attendance; clearly these venues are still popular and needed
• 24 presentations over 5 sessions
  • 9 general, on operations, software and procedures
  • 15 site-specific
• The 'KIT' session format worked well this time too
• An appropriate number of coffee and lunch breaks, plus social events
• Ample time for questions (numerous) and discussion (lively), true workshop style
• With one notable exception (LB), all presenters respected the allotted time
Themes
• Operations summary
• WLCG middleware/services
• Monitoring
• Networking: LHCONE and IPv6
• Storage: xrootd v4 and EOS
• CVMFS and AliRoot
• Site operations, upgrades, (new) projects and gripes (actually none…)
Messages digest from the talks (Renaud* and Latchezar's take)
• We have tried not to trivialize the message of the various presentations; it is still better to look at the original slides
• Operations
  • A successful year for ALICE and Grid operations: smooth and generally problem-free, with mature and fast incident handling
  • No changes foreseen to the operations principles and communication channels
  • 2013/2014 (LHC LS1) will be years of data reprocessing and infrastructure upgrades
  • The focus is on analysis and how to make it more efficient

* Renaud has graciously accepted to be blamed for all incorrect statements
Messages (2)
• WLCG middleware
  • CVMFS is installed on many sites; ALICE deployment and tuning leverage the existing TF
  • The WLCG VO-box is available and everyone should update
  • All EMI-3 products can be used
  • SHA-2 is on the horizon; services must be made compatible
  • glExec – hey, it is still alive!
• Agile Infrastructure – IaaS, SaaS (for now)
  • OpenStack (Cinder, Keystone, Nova, Horizon, Glance)
  • Management through Puppet (Foreman, MPM, PuppetDB, Hiera, git) … and Facter
  • Storage with Ceph
  • All of the above: prototyping and tests, ramping up
Messages (3)
• Site dashboard
  • http://alimonitor.cern.ch/siteinfo/issues.jsp
  • Go to the link above and start fixing, if your site is on the list
• LHCONE
  • The figure (in the original presentation) speaks for itself
  • All T2s should get involved
  • Instructions and expert lists are in the presentation
Messages (4)
• IPv6 and ALICE
  • The IPv4 address space is almost depleted; IPv6 is being deployed (at CERN and 3 ALICE sites already)
  • Not all services are IPv6-ready; testing and adjustment are needed (see the probe sketched after this list)
  • A cool history of the network bandwidth evolution
• Xrootd 4.0.0
  • Complete client rewrite: new caching, non-blocking requests (client call-backs), new user classes for metadata and data operations, IPv6-ready
  • Impressive speedup for large operations
  • API redesigned, no backward compatibility, some CLI commands change names
  • ROOT plugin ready and being tested
  • Mid-July release target
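As an illustration of the kind of IPv6-readiness testing mentioned above, here is a minimal probe a site could run against its own services. It only checks DNS (AAAA) resolution and a TCP connect over IPv6; the host/port pairs listed are placeholders, not an official ALICE test list.

#!/usr/bin/env python
# Minimal IPv6-readiness probe: checks whether a service host publishes an
# AAAA record and accepts TCP connections over IPv6.
# The SERVICES list below is a placeholder, not an official ALICE test set.
import socket

SERVICES = [
    ("voalice11.cern.ch", 8084),   # hypothetical VO-box / proxy endpoint
    ("alimonitor.cern.ch", 80),    # MonALISA web front-end
]

def ipv6_ready(host, port, timeout=5.0):
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return False, "no AAAA record"
    family, socktype, proto, _, sockaddr = infos[0]
    s = socket.socket(family, socktype, proto)
    s.settimeout(timeout)
    try:
        s.connect(sockaddr)
        return True, "connected via %s" % sockaddr[0]
    except (socket.timeout, socket.error) as exc:
        return False, str(exc)
    finally:
        s.close()

if __name__ == "__main__":
    for host, port in SERVICES:
        ok, detail = ipv6_ready(host, port)
        print("%-25s:%-5d %-4s %s" % (host, port, "OK" if ok else "FAIL", detail))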
Messages (5)
• EOS
  • Main disk storage manager at CERN: 45 PB deployed, 32 PB used (9.9/8/3 ALICE)
  • Designed to work with cheap storage servers; uses software RAID (RAIN), ppm probability of file loss
  • Impressive array of control and service tools (designed with operations in mind)
  • Even more impressive benchmarks…
  • Site installation: read the pros and cons carefully to decide whether it is a good fit for you
  • Support: best effort, xrootd-style
Messages (6)
• ALICE production and analysis software
  • AliRoot is "one software to rule them all" in ALICE offline
  • >150 developers; analysis: ~1M SLOC; reconstruction, simulation, calibration, alignment and visualization: ~1.4M SLOC; supported on many platforms and flavors
  • In development since 1(8)998
  • Sophisticated MC framework with (multiple) embedded generators, using G3 and G4
  • Incorporates the full calibration code, which is also run online and in the HLT (code sharing)
  • Fully encapsulates the analysis; a lot of work on improving it, more quality and control checks needed
  • Efforts to reduce memory consumption in reconstruction
  • G4 and Fluka in MC
Messages (7)
• CVMFS – timeline and procedures
  • Mature, scalable and supported product
  • Used by all the other LHC experiments (and beyond)
  • Based on the proven CernVM family
  • Enabling technology for clouds, CernVM as a user interface, Virtual Analysis Facilities, opportunistic resources and volunteer computing; part of Long-Term Data Preservation
  • April 2014: CVMFS on all sites, the only method of software distribution for ALICE (a minimal readiness check is sketched below)
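To make the April 2014 "CVMFS on all sites" milestone concrete, here is a minimal sketch of the kind of check a site admin might run: it verifies that the standard ALICE repository (alice.cern.ch) is mounted and that the stock cvmfs_config probe succeeds. Treat it as an illustration, not an official validation script.

#!/usr/bin/env python
# Minimal sketch of a CVMFS readiness check for the ALICE repository.
# The repository name and 'cvmfs_config probe' call follow standard CVMFS
# conventions; this is an illustration, not an official site test.
import os
import subprocess

REPO = "alice.cern.ch"
MOUNTPOINT = os.path.join("/cvmfs", REPO)

def repo_mounted():
    """True if the autofs/CVMFS mount responds and is non-empty."""
    try:
        return len(os.listdir(MOUNTPOINT)) > 0
    except OSError:
        return False

def probe_repo():
    """Run the standard 'cvmfs_config probe' for the ALICE repository."""
    try:
        return subprocess.call(["cvmfs_config", "probe", REPO]) == 0
    except OSError:          # cvmfs_config not installed
        return False

if __name__ == "__main__":
    print("mount check : %s" % ("OK" if repo_mounted() else "FAIL"))
    print("probe check : %s" % ("OK" if probe_repo() else "FAIL"))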
Sites Messages (1)
• UK
  • GridPP T1 + 19; RAL, Oxford and Birmingham for ALICE
  • Smooth operation; ALICE can (and does) run beyond its pledge; occasional problems with job memory
  • Small-scale test of cloud resources
• RMKI_KFKI
  • Shared CMS/ALICE (170 cores, 72 TB disk)
  • Good resources delivery
  • Fast turnaround of experts; good documentation on operations is a must (done)
Sites Messages (2)
• KISTI
  • Extended support team of 8 people
  • Tape system tested with RAW data from CERN
  • Network still to be debugged, but not a showstopper
  • CPU to be ramped up x2 in 2013
  • Well on its way to becoming the first T1 since the big T1 bang
• NDGF
  • Lose some cores (PDC), gain some more (CSC)
  • Smooth going; dCache will stay and will get location information to improve efficiency
  • The 0.0009 efficiency at DCSC/KU is still a mystery, and it hurts NDGF as a whole
Sites Messages (3)
• Italy
  • New head honcho: Domenico Elia (grazie Massimo!)
  • Funding is tough; National Research Projects help a lot with manpower, PON helps with hardware in the south
  • 6 T2s and the T1: smooth delivery and generally no issues
  • Torino is a hotbed of new technology: clouds (OpenNebula, GlusterFS, OpenWRT)
  • The TAF is open for business, completely virtual (surprise!)
• Prague
  • The city is (partially) under water
  • Currently 3.7k cores and 2 PB of disk, shared LHC/D0; contributes ~1.5% of the ALICE+ATLAS Grid resources
  • Stable operation, distributed storage
  • The funding situation is degrading
Sites Messages (4)
• US
  • LLNL+LBL resource purchasing is complementary and fits well to cover changing requirements
  • CPU pledges fulfilled; the SE is a bit underused but on the rise
  • Infestation of 'zombie grass' jobs – this is California…
  • Possibility of tape storage at LBL (potential T1)
• France
  • 8 T2s and 1 T1, providing 10% of WLCG power; steady operation
  • Emphasis on common solutions for services and support
  • All centres are in LHCONE (7+7 PB have already passed through it)
  • Flat resource provisioning for the next 4 years
Sites Messages (5)
• India (Kolkata)
  • Provides about 1.2% of the ALICE resources
  • Innovative cooling solution; all issues of the past solved, stable operation
  • Plans for steady resource expansion
• Germany
  • 2 T2s and 1 T1 – the largest T1 in WLCG, providing ~50% of the ALICE T1 resources
  • Good centre names: Hessisches Hochleistungsrechenzentrum Goethe Universität (requires a 180 IQ to say it)
  • The T2s have heterogeneous installations (both batch and storage), support many non-LHC groups and are well integrated in the ALICE Grid; smooth delivery
Sites Messages (6)
• Slovakia
  • In ALICE since 2006
  • Serves ALICE/ATLAS/HONE
  • Upgrades planned for air-conditioning and power, later for CPU and disk; expert support is a concern
  • Reliable and steady resource provision
• RDIG
  • RRC-KI (toward T1): hardware (CPU/storage) rollout, service installation and validation, personnel in place, pilot testing with ATLAS payloads
  • 8 T2s + JRAF + PoD@SPbSU deliver ~5% of the ALICE Grid resources and historically support all LHC VOs
  • Plans for steady growth and site consolidation
  • Like all the others, reliable and smooth operation
Victory! I work at a T1! How are you so cool under pressure?
At the end
• On behalf of all participants:
  • Thank you, Renaud, and thanks to the CCIN2P3 crew for the flawless organization
…and the future
• There is still a visit of the computing centre this afternoon for those who signed up
• The next workshop (in one year's time) needs a host
• And now, to lunch…