ALICE Grid Status David Evans The University of Birmingham GridPP 16th Collaboration Meeting QMUL 27-29 June 2006
Outline of Talk • The ALICE Experiment • ALICE computing requirements • ALICE Grid – AliEn • Analysis using AliEn • Status of ALICE Data Challenge 2006 • Summary and Outlook
The ALICE Experiment • ALICE is one of the four main LHC experiments at CERN. • The only one dedicated to heavy-ion physics. • Study of QCD under extreme conditions • ~1000 collaborators • ~100 institutions • Birmingham is the only UK institute involved
ALICE Requirements • Data taking (each year) • 1 month of Pb-Pb data ~ 1 PByte • Also p-p for the rest of the year ~ 1 PByte • Large-scale simulation effort • 1 Pb-Pb event: ~8 hrs (3 GHz CPU) • Data reconstruction • Data analysis • Smaller collaboration than ATLAS or CMS, but similar computing requirements.
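As a rough cross-check of these volumes, the sketch below converts them into the average rate needed to record them. It assumes 1 PByte = 10^15 bytes, a 30-day month and data taking spread evenly over the period; these are simplifications, not figures from the talk.

```python
# Naive average recording rates implied by the yearly volumes above.
# Assumptions (not from the talk): 1 PByte = 10**15 bytes, 30-day month,
# data taking spread evenly over the whole period.
PBYTE = 10**15
SECONDS_PER_DAY = 86_400


def average_rate_mb_per_s(volume_bytes: float, days: float) -> float:
    """Average rate in MB/s (10**6 bytes/s) needed to collect volume_bytes in `days` days."""
    return volume_bytes / (days * SECONDS_PER_DAY) / 10**6


pbpb_rate = average_rate_mb_per_s(1 * PBYTE, 30)   # ~1 PByte of Pb-Pb in one month
pp_rate = average_rate_mb_per_s(1 * PBYTE, 335)    # ~1 PByte of p-p over the remaining 335 days

print(f"Pb-Pb: ~{pbpb_rate:.0f} MB/s, p-p: ~{pp_rate:.0f} MB/s")
# Pb-Pb: ~386 MB/s, p-p: ~35 MB/s
```

Beam time is shorter than the calendar period, so real rates during data taking would be higher than these averages.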
Profile of CPU requirements [Figure: CPU requirements in MSI2K (35 MSI2K marked) over time from Jan 07 through Sept 08 to Nov 09, shown as Total, Ext Tier 1, Ext Tier 2, CERN T0 and CERN T1]
Tier Hierarchy • MONARC Model • ‘Cloud Model’ (tier-free) used in ALICE data challenges (native AliEn sites; for LCG sites we comply with the Tier model) • Tier 0: RAW data master copy, data reconstruction (1st pass), prompt analysis • Tier 1: copy of RAW, reconstruction, scheduled analysis • Tier 2: MC production, partial copy of ESD, data analysis
ALICE Grid – AliEn • AliEn (ALICE Environment): Grid framework developed by ALICE, used in production for ~5 years. • Based on Web services and standard protocols. • Built around open-source code • Less than 5% is native AliEn code (mainly Perl). • To date, > 500,000 ALICE jobs have been run under AliEn control worldwide.
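AliEn ‘Pull’ Protocol • One of the major differences between the AliEn and LCG grids is that AliEn uses a ‘pull’ rather than a ‘push’ protocol. • EDG/Globus model (push): the user submits to a Resource Broker, which sends jobs out to the servers. • AliEn model (pull): the user's jobs go into a central job list, and the servers at the sites fetch jobs from it when they have free resources.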
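A minimal Python sketch of the pull model follows. The class names (`TaskQueue`, `SiteAgent`, `Job`) and the site-matching rule are hypothetical illustrations, not AliEn's actual components or API. Users only add jobs to a central list, and each site's agent asks for work only while it has free slots, so nothing is pushed to a site that cannot take it.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Job:
    job_id: int
    required_site: Optional[str] = None  # e.g. the site holding the input data


@dataclass
class TaskQueue:
    """Hypothetical central job list: users submit here, nothing is pushed to sites."""
    jobs: Deque[Job] = field(default_factory=deque)

    def submit(self, job: Job) -> None:
        self.jobs.append(job)

    def pull(self, site: str) -> Optional[Job]:
        """Give the requesting site the first waiting job it is allowed to run."""
        for job in list(self.jobs):
            if job.required_site in (None, site):
                self.jobs.remove(job)
                return job
        return None


@dataclass
class SiteAgent:
    """Hypothetical per-site agent that asks for work only when it has free CPU slots."""
    site: str
    free_slots: int
    running: List[int] = field(default_factory=list)

    def poll(self, queue: TaskQueue) -> None:
        while self.free_slots > 0:
            job = queue.pull(self.site)
            if job is None:
                break  # nothing suitable waiting; try again on the next poll
            self.running.append(job.job_id)
            self.free_slots -= 1


queue = TaskQueue()
for i in range(5):
    # job 3 is (hypothetically) tied to data stored at Birmingham
    queue.submit(Job(i, required_site="Birmingham" if i == 3 else None))

ral = SiteAgent("RAL", free_slots=2)
bham = SiteAgent("Birmingham", free_slots=2)
ral.poll(queue)    # RAL pulls jobs 0 and 1
bham.poll(queue)   # Birmingham pulls jobs 2 and 3
print(ral.running, bham.running, [j.job_id for j in queue.jobs])
# [0, 1] [2, 3] [4]
```

In a push model the broker would need an up-to-date view of every site's free slots; here each site reports its own capacity implicitly by asking.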
LCG / gLite • ALICE is committed to using as many common Grid applications as possible. • Changes have been made to make AliEn work with LCG • E.g. changes to the File Catalogue (FC) → LFC (Local File Catalogue or LCG File Catalogue) • VO-box at each Tier 1 and Tier 2 • Globus/GSI-compatible authentication • AliEn-gLite interface in development
Analysis • The core of the ALICE computing model is AliRoot • Uses the ROOT framework • Couple AliEn with ROOT for Grid-based analysis. • Use PROOF (Parallel ROOT Facility) • To the user it’s like using ROOT • 4-tier architecture: • ROOT client session, API server (AliEn + PROOF), site PROOF master servers, PROOF slave servers. • Data from DC2006 only accessible via the Grid
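[Diagram: the client API talks to the API server, which gets the list of sites with data from the AliEn FC and passes the work to the PROOF master server at each of those sites; each master drives the PROOF slaves on its nodes] • Each site has a PROOF master server • Each node has a PROOF slave • Uses the ‘pull’ protocol, i.e. the slaves ask the master for work packets. Slower slaves get smaller work packets, etc.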
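The work-packet idea can be sketched as below. The rule used here (packet size scaled by the slave's measured event rate relative to the mean) is an assumed stand-in, not PROOF's actual packetizer; the base packet size, node names and rates are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Master:
    """Hypothetical PROOF-style master handing out event ranges on request."""
    total_events: int
    base_packet: int = 100  # assumed packet size for a slave of average speed
    next_event: int = 0

    def request_packet(self, slave_rate: float, mean_rate: float) -> Optional[Tuple[int, int]]:
        """Return a [start, end) event range sized in proportion to the slave's speed."""
        if self.next_event >= self.total_events:
            return None
        size = max(1, round(self.base_packet * slave_rate / mean_rate))
        start = self.next_event
        end = min(start + size, self.total_events)
        self.next_event = end
        return start, end


def simulate(master: Master, rates: Dict[str, float]) -> Dict[str, int]:
    """Each slave asks for a new packet as soon as it finishes the previous one."""
    mean_rate = sum(rates.values()) / len(rates)
    processed = {name: 0 for name in rates}
    free_at = [(0.0, name) for name in rates]  # (time the slave becomes free, slave)
    heapq.heapify(free_at)
    while free_at:
        t, name = heapq.heappop(free_at)
        packet = master.request_packet(rates[name], mean_rate)
        if packet is None:
            continue  # no work left, so this slave stops asking
        start, end = packet
        processed[name] += end - start
        heapq.heappush(free_at, (t + (end - start) / rates[name], name))
    return processed


rates = {"node-a": 200.0, "node-b": 50.0}  # hypothetical events per second
print(simulate(Master(total_events=1000), rates))
# {'node-a': 800, 'node-b': 200}
```

With these rates both nodes spend about 0.8 s per packet, so they finish together; with fixed-size packets the slow node's last packet could hold up the end of the query.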
ALICE Data Challenge 2006 (PDC’06) • Last ‘challenge’ before the start of data taking • Test of all Grid components • AliEn as the ALICE interface to the Grid, and much, much more • LCG/gLite baseline services (WMS, DMS) • Test of the computing centres’ infrastructure • Major test of the stability of all of the above
Grid software deployment and running • LCG sites are operated through the VO-box framework • All ALICE sites need one • The deployment cycle was relatively long; many configuration and version-update issues had to be solved • The situation is now fairly routine • Data management • This year: xrootd as the disk pool manager on all sites • The installation/configuration procedures have just been released • Integration of xrootd into other storage management solutions (CASTOR, DPM, dCache) is under development • Data replication (FTS) • We use it for scheduled replication of data between the computing centres (RAW from T0->T1, MC production T2->T1, etc.) • Fully incorporated in the AliEn FTD, to be extensively tested in July
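The scheduled replication flows listed above can be written as a small routing table. This is only a sketch: the data-type labels, the `Transfer` record and the LFNs are hypothetical, and real transfers would be queued through FTS via the AliEn FTD rather than by this code.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Transfer(NamedTuple):
    lfn: str          # logical file name (hypothetical examples below)
    source: str       # source tier
    destination: str  # destination tier


# Scheduled replication routes named on the slide: data type -> (source, destination).
ROUTES = {
    "RAW": ("T0", "T1"),  # RAW data from the Tier 0 to the Tier 1s
    "MC": ("T2", "T1"),   # MC production output from the Tier 2s to the Tier 1s
}


def plan_transfers(files: Iterable[Tuple[str, str]]) -> List[Transfer]:
    """Turn (lfn, data_type) pairs into the scheduled transfers to request, in order."""
    plan = []
    for lfn, data_type in files:
        route = ROUTES.get(data_type)
        if route is None:
            continue  # no scheduled replication rule for this data type
        plan.append(Transfer(lfn, *route))
    return plan


files = [
    ("/alice/raw/run1/chunk01", "RAW"),
    ("/alice/sim/job7/out", "MC"),
    ("/alice/user/notes.txt", "USER"),
]
for transfer in plan_transfers(files):
    print(transfer)
```

Keeping the routes in one table means a new flow is a one-line change rather than new transfer logic.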
VO-box support and operation • In addition to the standard LCG components, the VO-box runs ALICE-specific software components • VO-boxes now at the RAL Tier 1 and the Birmingham Tier 2 • Birmingham ALICE students are testing AliEn for analysis purposes through the Birmingham Tier 2. • The installation and maintenance of these is entirely our responsibility: • Support for the UK VO-boxes is supplied by CERN (no UK manpower available) • Site-related problems are handled by the site admins • LCG service problems are reported to GGUS
Operation status • Running in continuous mode since 24/05 • VO-boxes: • monthly releases of AliEn (currently v.2-10), LCG 2.7.0 and soon tests of gLite 3.0 • Central ALICE services: • AliEn machinery and API Service are developed, deployed and maintained by the AliEn team • Site services: • Stability testing of both AliEn and LCG components • The AliEn-LCG/gLite interfaces are still in development • A gLite VO-box has already been provided at CERN and first tests performed.
Site contributions in the past 2 months • 60% T1, 40% T2 (almost half from 2 T2 sites!) • RAL: 0.7%
Running status – site averages • Pledged resources: 4000 CPUs • Our average is at the 12% level of the pledge • Partly due to central and site service malfunctions • Mostly due to sites providing fewer CPUs than pledged
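The gap between pledged and used resources can be made concrete with the talk's own numbers. The sketch assumes, for illustration only, that every CPU runs Pb-Pb simulation around the clock and matches the 3 GHz reference CPU behind the ~8 hours per event figure.

```python
PLEDGED_CPUS = 4000        # pledged resources (this slide)
AVERAGE_PERCENT = 12       # average use, ~12% of the pledge (this slide)
HOURS_PER_PBPB_EVENT = 8   # ~8 h per simulated Pb-Pb event on a 3 GHz CPU (requirements slide)


def pbpb_events_per_day(cpus: float) -> float:
    """Pb-Pb events simulated per day if all `cpus` run simulation non-stop (an idealisation)."""
    return cpus * 24 / HOURS_PER_PBPB_EVENT


average_cpus = PLEDGED_CPUS * AVERAGE_PERCENT // 100
print(average_cpus)                       # 480
print(pbpb_events_per_day(PLEDGED_CPUS))  # 12000.0
print(pbpb_events_per_day(average_cpus))  # 1440.0
```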
Stability improvements • This is a data challenge, so there is always room for improvement: • AliEn is undergoing gradual fixes and new features are being added • The LCG software will undergo a quantum leap: the move from LCG to gLite • Site infrastructure (VO-box, etc.) also needs solidification, especially at the T2s • Monitoring and control: new features are continuously being added
Outlook • PDC’06 has started as planned • This is the last exercise before the beam! • It is a test of all Grid tools/services we will use in 2007 • If they are not in PDC’06, there is a good chance they will not be ready • It is also a large-scale test of the computing infrastructure: computing, storage and network performance
Outlook (2) • We have all the pieces needed to run production on the Grid (some untested). • The exercise started 2 months ago and will continue until the end of the year • At the moment, we are optimising the use of resources, trying to obtain the promised resources from the sites • The next phase of the plan is a test of the LCG file transfer utilities (FTS) and their integration with AliEn FTD • In parallel, we will run event production as usual
Summary • AliEn is a Grid framework developed by ALICE using 95% open-source code (e.g. SOAP) and 5% AliEn-specific (Perl) code. • AliEn is evolving to take the EGEE/gLite framework into account and to work with LCG. • New user interfaces developed • PROOF for analysis developed • Better authentication/authorisation developed • Data Challenge 2006: running since April and going well • VO-boxes at RAL T1 and B’ham T2 • Lack of computing resources a worry.