Status of the ALICE Experiment
Patricia Méndez Lorenzo, CERN (IT-GD) / INFN (CNAF)
FZK T1-T2 Workshop, Forschungszentrum Karlsruhe, 19-20 October 2005
Outlook
◘ PDC04 Results
◘ Introduction: Some Generalities
  ➸ Production on different Grids, available resources, interfacing AliEn and LCG
◘ Scope and Planning for PDC05
◘ Goals of the new DC
◘ Baseline Services
◘ ALICE Requirements
◘ Use of LCG and SC3 infrastructure
◘ Next Steps
◘ Support
◘ Summary
Thanks to S. Bagnasco, P. Buncic, L. Betev, F. Carminati, P-G. Cerello and P. Saiz
FZK 20th October Patricia Mendez Lorenzo
Some Results of the last PDC04
◘ Phase 1: Production of RAW data at CERN (Mar-May 2004)
  ➸ Large output files
  ➸ 1a: Central events (long jobs, large files)
  ➸ 1b: Peripheral events (short jobs, smaller files)
(S. Bagnasco, SC3 Detailed Planning Workshop, CERN, 13 June 2005)
Some Results of the last PDC04
◘ Statistics after phase 1 (ended April 4, 2004):
  ➸ ALICE::CERN::LCG is the interface to LCG-2
  ➸ ALICE::Torino::LCG is the interface to GRID.IT
  ➸ ~1.3 million files, 26 TB data volume
(S. Bagnasco, SC3 Detailed Planning Workshop, CERN, 13 June 2005)
Generalities about ALICE Production
◘ ALICE has its own Task Queue and related services, and uses the LCG RB to submit the job agents
  ➸ Pull model: a server holds a master queue of jobs, and it is up to the CE that provides the CPU cycles to ask for jobs
  ➸ No Information System is included
  ➸ It offers ALICE users a single interface to the complex, heterogeneous (multiple Grids) and fast-evolving Grid reality
◘ Several Grid infrastructures are available during the Data Challenges
  ➸ LCG, INFNGRID, possibly others in the US
  ➸ Lots of resources, but different middleware
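The pull model described on this slide can be sketched as follows. This is a minimal illustration, not the AliEn implementation: the `TaskQueue` class, the `run_agent` function, and the site and job names are all hypothetical, and a real agent would execute the job rather than just record it.

```python
from collections import deque
from typing import List, Optional


class TaskQueue:
    """Central master queue of jobs (hypothetical stand-in for the ALICE Task Queue)."""

    def __init__(self, jobs: List[str]) -> None:
        self._jobs = deque(jobs)

    def pull(self, site: str) -> Optional[str]:
        """Give the next waiting job to whichever agent asks; None when the queue is empty."""
        return self._jobs.popleft() if self._jobs else None


def run_agent(queue: TaskQueue, site: str, slots: int) -> List[str]:
    """A job agent on a CE: it asks the queue for work; nothing is pushed to it."""
    taken = []
    for _ in range(slots):
        job = queue.pull(site)
        if job is None:  # queue drained, the agent simply stops asking
            break
        taken.append(job)
    return taken


queue = TaskQueue(["job-1", "job-2", "job-3"])
first = run_agent(queue, "site-A", slots=2)   # site-A has two free slots
second = run_agent(queue, "site-B", slots=2)  # site-B gets whatever is left
```

The point of the design is that the central server never needs to know the state of a site: a site with free CPU cycles asks, and a busy or broken site simply does not.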
Production on different Grids
Design Strategy:
◘ Use AliEn as a general front-end
  ➸ The resource is used transparently and independently of the middleware system behind it
◘ Minimize points of contact between the systems
  ➸ No need to re-implement services
  ➸ No special requirements to run on remote CE/WNs
◘ Make full use of the provided services
  ➸ Let the Grids do their work
◘ Use high-level tools and APIs to access Grid resources
  ➸ Developers put a lot of abstraction effort into hiding the complexity and shielding the user from implementation changes
Available Resources
◘ LCG-2 core sites
  ➸ CERN, CNAF, FZK, NIKHEF, RAL, Lyon, Taiwan (more than 1000 CPUs)
  ➸ Each LCG site with a VO-BOX is seen as an independent site
  ➸ AliEn services (CE+SE sited in each VO-BOX)
◘ INFNGRID
  ➸ LNL.INFN, PD.INFN and several smaller ones (400 CPUs not including CNAF)
Interfacing AliEn and LCG
[Figure: ALICE SC3 layout in terms of LCG/SC3. Diagram labels: User (Production Manager), Task Queue, File Catalogue, RB (external), LCG site (T1) with VO-BOX (UI), LCG CE, WN, SE (local) and CE list; T2 sites, each with a VO-BOX (UI) and CE list.]
Submission to LCG+AliEn sites
“Double access” for selected sites (CNAF and CT.INFN)
[Figure: job submission paths. Diagram labels: a User submits jobs, Server, AliEn CE, LCG UI, LCG RB, LCG CE/SE, WNs.]
(S. Bagnasco, SC3 Detailed Planning Workshop, CERN, 13 June 2005)
New PDC05
Scope and Planning of DC05
Physics Data Challenge: Phases
◘ First Phase: Simulation of Monte Carlo events on all available resources
  ➸ flow (production over), p+p (current production), Hijing+ (next)
  ➸ Registration of all the outputs in the ALICE File Catalog (central catalog) and storage at CERN-CASTOR (for SC3)
  ➸ Production using LCG resources is foreseen as soon as all the VO-BOXES are configured
◘ Second Phase: Reconstruction of the raw events stored at CERN
  ➸ Test of file transfer utilities (FTS)
  ➸ Use of the local catalog at each site (LFC)
◘ Third Phase: Analysis phase
◘ From the LCG resources, ALICE will use only those sites involved in SC3
Timeline of PDC05/SC3
[Timeline chart covering August to December 2005; the bar positions are not recoverable. Items shown:]
◘ SC3: start of service phase
◘ Event production (Phase 1)
◘ Job submission through LCG interface
◘ ALICE data ‘push’ (Phase 2): reserved/shared bandwidth, test of FTS
◘ Prototype data analysis (Phase 3)
(L. Betev, F. Carminati, GDB Meeting in Bologna, October 2005)
Primary and Secondary Goals
◘ Fundamental Goals:
  ➸ Use of the deployed LCG SC3 infrastructure for the ALICE DC05
  ➸ Test of the data transfer and storage services (SC3)
  ➸ Test of distributed reconstruction and calibration model (ALICE)
  ➸ Integrate the use of LCG resources with other resources available to ALICE within one single VO interface for different Grids
  ➸ Analysis of reconstructed data
Baseline Services (I)
Services provided during SC3 (I. Bird, LCG PEB, 7th 2005)
◘ Storage Management services
  ➸ Based on SRM as the interface
◘ Basic transfer services
  ➸ gridftp, srmcopy
◘ Reliable file transfer service
◘ Grid catalogue services
◘ Catalogue and data management tools
◘ Database services
  ➸ Required at T1 and T2
◘ Compute Resource Services
◘ Workload management
Baseline Services (II)
◘ Clear need for VOMS: roles, groups, subgroups
◘ POSIX-like I/O service
  ➸ Local files, including links to catalogues
◘ Grid monitoring tools and services
  ➸ Focused on job monitoring
◘ VO agent framework
◘ Applications software installation service
◘ Reliable messaging service
◘ Information system
ALICE Requirements
VO-BOXES: Deployed in all T1 and T2
• PIII 2GHz, 1024 MB RAM. Any Linux flavour, kernel 2.4+.
• User accounts for SGMs, via gsissh
• UI functionality (including FTS and access to local catalog)
• Access to the experiment software installation area
Agents and services
• Site service interfaces and monitoring agents:
  • Storage Element Service (SES), File Transfer Daemon (interface to FTS)
  • Cluster Monitor (CM), MonALISA, Agents Monitoring
  • AliEn Computing Element (interface to LCG RB)
  • PackMan (PM), xrootd
Connectivity
• Outbound connectivity + access to local storage (direct or SRM)
• Inbound connectivity on some fixed network ports
  • From CERN, for CM and PM (e.g.: 8084 and 9991)
  • From World, for SES and xrootd (e.g.: 8082 and 51234)
Local data buffer for intermediate input/output of jobs (SES service)
• Size: at least the number of job slots on the site * 3GB
• Not necessary if xrootd is running on the site SE (may be included in DPM)
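The buffer sizing rule on this slide is simple arithmetic; a short sketch makes it concrete. The function name and the 200-slot site are illustrative assumptions; only the 3 GB per job slot figure and the xrootd exemption come from the slide.

```python
GB_PER_JOB_SLOT = 3  # from the slide: at least (number of job slots) * 3 GB


def min_buffer_gb(job_slots: int, xrootd_on_site_se: bool = False) -> int:
    """Minimum local data buffer in GB for the SES service (illustrative helper)."""
    if job_slots < 0:
        raise ValueError("job_slots must be non-negative")
    if xrootd_on_site_se:
        # Per the slide, the buffer is not necessary when xrootd runs on the site SE.
        return 0
    return job_slots * GB_PER_JOB_SLOT


# Hypothetical site offering 200 job slots to ALICE.
needed = min_buffer_gb(200)
needed_with_xrootd = min_buffer_gb(200, xrootd_on_site_se=True)
```

For the hypothetical 200-slot site this gives a lower bound of 600 GB, which drops to zero if xrootd is available on the site SE.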
ALICE Requirements
VO-BOXES: Current Status
◘ Deployed at:
  ➸ CERN, CNAF, NIKHEF/SARA, IN2P3, Catania, Torino, Bari, GSI, FZK, RAL
  ➸ AliEn specific services and software deployed in all VO-BOXES
  ➸ Submissions through VO-BOX
    - Job submission to RB is possible
    - Still some infrastructure missing (env variables, lcg-infosites...)
    - To be completed in the next release (LCG 2.7.0 in October)
ALICE Requirements
◘ The configuration of the currently deployed VO-BOX allows job submission, with some caveats:
  ➸ Some job submissions were performed last week from the VO-BOX at CERN
  ➸ An RB is defined by default
    - Not sure whether this is the case in the other VO-BOXES
  ➸ A VO configuration file is nevertheless mandatory in the submission command line
  ➸ /tmp/jobOutput is not available: the output directory must be specified
  ➸ lcg-infosites has been installed by hand
    - Probably missing at the other sites
Use of the SC3 and LCG infrastructure
ALICE is beginning the production in LCG without FTS
◘ FTS: Deployed at all sites
  ➸ Perl API implemented in the ALICE framework
  ➸ Tests among T0-T1 performed this summer
  ➸ FTS will be used as an FTD plug-in
  ➸ FTD was tested between native AliEn sites
    - Already used in DC04
◘ Current Status:
  ➸ FTS through FTD not yet ready
  ➸ ALICE is testing FTS standalone
Use of the SC3 and LCG infrastructure
◘ LFC: Deployed at all sites
  ➸ Considered the local ALICE catalog
  ➸ Central AliEn storage index
  ➸ Perl API implemented in the AliEn framework
    - More than 10000 entries (LFC as unique catalog)
    - Too many authentications slow the process
◘ Current Status of the LFC interface:
  ➸ OK but not used by current production
  ➸ Will exercise it with special jobs
◘ SRM: Deployed at all sites
Next Steps for SC3
◘ Get the SC3 Production Started
  ➸ Integration of the latest version of AliRoot v.2.3
  ➸ Build the distribution
  ➸ Deployment at T1 (CC-IN2P3, CNAF, FZK, RAL)
  ➸ Waiting for NIKHEF
  ➸ Run massively on T1
  ➸ Timescale: Already running at CERN, then running continuously
  ➸ Issues: FTS not yet included
“Next Steps” reported by P-G. Cerello during the TF Meeting, 06/10/05
Next Steps for SC3
◘ Extend to T2s
  ➸ Foreseen T2s: Bari, Catania, GSI and Torino
  ➸ Deployment on T2
    - LCG VO-BOX: site managers
    - ALICE services on VO-BOXES
  ➸ Operation:
    - Monitoring and error reporting by ALICE Task Force support on sites
  ➸ Timescale: 1-2 weeks
Next Steps for SC3
◘ FTS Tests:
  ➸ Massively test all the T0 <---> T1 and T1 <---> T2 connections/endpoints involved in SC3
    - Configure/test the script execution on VO-BOXES
    - Run several threads of the testing script, so as to reach the highest throughput
    - Issue: with or without LFC registration? ALICE has decided to run FTS in the simpler mode: NO automatic update of the catalog
  ➸ Timescale: 2 weeks
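The idea of running several threads of a testing script to push up the aggregate throughput can be sketched as below. This is only an illustration under stated assumptions: `fake_transfer` is a placeholder that sleeps instead of moving data (a real test would call the FTS client at that point), and the channel labels and file size are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def fake_transfer(channel: str, size_mb: int) -> int:
    """Placeholder for one file transfer on a channel; returns the MB 'moved'."""
    time.sleep(0.01)  # stands in for the time a real transfer would take
    return size_mb


def run_test(channels: List[str], size_mb: int, threads: int) -> Tuple[int, float]:
    """Run one transfer per channel entry on a thread pool; return (total MB, MB/s)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        moved = list(pool.map(lambda ch: fake_transfer(ch, size_mb), channels))
    elapsed = time.monotonic() - start
    return sum(moved), sum(moved) / elapsed


# Hypothetical channel labels: four transfers on each of two connections.
channels = ["T0-T1", "T1-T2"] * 4
total_mb, rate_mb_s = run_test(channels, size_mb=100, threads=4)
```

Repeating the run with different `threads` values and comparing `rate_mb_s` is one simple way to look for the point where adding threads stops improving throughput.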
Next Steps for SC3
◘ Steps:
  ➸ While FTS is being tested, complete the Perl API integration in AliEn
  ➸ Test on a T1 <---> T2 connection
  ➸ Update the AliEn distribution
  ➸ Deploy on VO-BOXES
  ➸ Start jobs with FTS transfers included
  ➸ Timescale: 1 month?
In terms of support
◘ SC3 Weekly Meeting with site and experiment representatives
◘ AliEn central services: ALICE responsibility
◘ Task Force Weekly Meeting: LCG, sites, ALICE and ARDA
◘ Periodic Action List update to cover the ALICE needs
(L. Betev, F. Carminati, GDB Meeting in Bologna, October 2005)
Action List
◘ Updated every week during the TF Meeting
Entry format: Task number: Task Label: Description. (Assigned to). Entry Date-Outing Date: Current Status
1st Group: VO BOXES (VOBOX)
1.1 VOBOX-conf: Full configuration of the LCG VO-BOXES. (P. Cerello and P. Mendez). 06/10/05-(00/00/00): Work going on. Writing a brief document (How-To) to describe the configuration of the UI inside the VO-BOX. The document will then be passed to all sites.
1.2 VOBOX-test: Configuration tests of the LCG Tools. (F. Donno). 06/10/05-(00/00/00): Work is ongoing.
1.3 VOBOX-lcg-is: Installation of lcg-infosites in all VO-BOXES. (P. Mendez). 06/10/05-(00/00/00): Work almost finished: all the sites have been contacted.
  Done: CERN, Torino, FZK, RAL, Bari
  To be done shortly: IN2P3, CNAF and GSI
  Waiting for answers from: NIKHEF and Catania
1.4 VOBOX-AliEn: Deploy AliEn 2.3. (P. Buncic). 06/10/05-(00/00/00): The new version is ready. Deployed at CERN; deployment at the rest of the sites is going on.
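The entry format of the Action List maps naturally onto a small record type. The sketch below is an assumption-laden illustration, not a tool used by the Task Force: the `ActionItem` class and its field names are hypothetical, and treating the `00/00/00` placeholder as "task still open" is an interpretation of the slide rather than something it states.

```python
from dataclasses import dataclass

OPEN_PLACEHOLDER = "00/00/00"  # second date as shown on the slide for every listed task


@dataclass
class ActionItem:
    number: str
    label: str
    description: str
    assigned_to: str
    entry_date: str
    outing_date: str = OPEN_PLACEHOLDER
    status: str = ""

    def is_open(self) -> bool:
        """Assumption: a task is open while its second date is still the placeholder."""
        return self.outing_date == OPEN_PLACEHOLDER

    def render(self) -> str:
        """Format the item in the layout used on the slide."""
        return (
            f"{self.number} {self.label}: {self.description}. "
            f"({self.assigned_to}). {self.entry_date}-({self.outing_date}): {self.status}"
        )


item = ActionItem(
    number="1.2",
    label="VOBOX-test",
    description="Configuration tests of the LCG Tools",
    assigned_to="F. Donno",
    entry_date="06/10/05",
    status="Work is ongoing.",
)
line = item.render()
```

Keeping the entries as structured records would make the weekly update a matter of editing `status` and `outing_date` and re-rendering the list.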
Summary
◘ PDC05 is on a good track:
  ➸ Testing the baseline services provided by LCG (Workload management system, SRM, FTS)
  ➸ Development and testing of interfaces of AliEn to LCG and beyond: ARC (Nordic), OSG (US)
◘ Coordination of activities:
  ➸ Fully integrated with LCG SC3
  ➸ Operation of the DC is managed by the ALICE-LCG TF
    - LCG, ARDA, site experts, ALICE
◘ PDC05 tasks:
  ➸ Flow events (completed), starting with p+p
  ➸ Test of FTS: file replications (in 2 weeks)
  ➸ Prototype of analysis: end 2005/beginning 2006