ALICE DC Status P. Cerello March 19th, 2004
Summary • Status of AliRoot • Status of AliEn • Physics Data Challenge • Conclusions
AliRoot layout (architecture diagram): on top of ROOT and AliEn sits the AliRoot STEER core, with the AliSimulation and AliReconstruction drivers feeding the ESD and AliAnalysis; the Virtual MC interfaces to G3, G4 and FLUKA; the event generators (PYTHIA6, HIJING, ISAJET, MEVSIM, PDF) plug in through EVGEN; analysis packages (HBTAN, HBTP, RALICE) and the detector and structure modules (ITS, TPC, TRD, TOF, PHOS, EMCAL, MUON, PMD, RICH, ZDC, FMD, START, CRT, STRUCT) complete the picture.
AliRoot Current status • Major changes in the last year • New multi-file I/O finally in full production • New coordinate system (and we survived!) • New reconstruction and simulation “drivers” (see the sketch below) • First attempt at the ESD and analysis framework • Improvements in reconstruction and simulation • The system clearly works well, but many changes are still to come • ESD: the philosophy is still evolving • Introduction of FLUKA and the new geometrical modeller • Development of the analysis framework • Raw data for all the detectors • Introduction of the condition database infrastructure
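As an illustration of how the new simulation and reconstruction “drivers” are steered, here is a minimal ROOT macro sketch; it assumes a standard Config.C in the working directory, and the exact options available in this AliRoot version may differ.

```cpp
// runSimRec.C: minimal sketch of driving AliRoot through the driver classes
// (assumes a Config.C describing generator, VMC and detectors is present).
void runSimRec(Int_t nEvents = 1)
{
  // Simulation driver: event generation, transport, (s)digitization
  AliSimulation sim;
  sim.Run(nEvents);

  // Reconstruction driver: local reconstruction, tracking, ESD filling
  AliReconstruction rec;
  rec.Run();
}
```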
Software Development Process • ALICE opted for a light core CERN offline team… • Concentrate on framework, software distribution and maintenance • …plus some people from the collaboration • GRID coordination (Torino), World Computing Model (Nantes), Detector Construction Database (Warsaw), Web and VMC (La Habana) • Close integration with physics! • The ALICE Physics Coordinator is also a member of the offline team • A development cycle adapted to ALICE • Developers work on the most important feature at any moment • A stable production version exists • Collective ownership of the code • Flexible release cycle and simple packaging and installation • Micro-cycles happen continuously, macro-cycles 2-3 times per year • Discussed & implemented at Off-line meetings and Code Reviews
The ALICE Approach (AliEn) • AliEn architecture (diagram): external software (RDBMS/MySQL, LDAP, Perl core and modules, SOAP/XML, V.O. packages & commands) underneath the AliEn core components & services (database proxy, authentication, file & metadata catalogue, RB, CE, SE, package manager, logger), with low- and high-level interfaces (C/C++/Perl API, CLI, GUI, web portal) • Standards are now emerging for the basic building blocks of a GRID • There are millions of lines of code in the open-source domain dealing with these issues • Why not use these to build the minimal GRID that does the job? • Fast development of a prototype, no problem in exploring new roads, restarting from scratch, etc. • Hundreds of users and developers • Immediate adoption of emerging standards • An example: AliEn by ALICE (5% of code developed, 95% imported)
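As a client-side illustration (not the AliEn-native Perl services shown in the diagram), the sketch below uses ROOT's TGrid abstraction to connect to AliEn and query the file catalogue; the catalogue path and file pattern are invented for the example, and the exact TGrid/TGridResult methods may differ between ROOT versions.

```cpp
// catalogueQuery.C: sketch of accessing the AliEn file catalogue from ROOT.
void catalogueQuery()
{
  // Authenticate and connect to the AliEn services
  TGrid::Connect("alien://");
  if (!gGrid) { printf("no Grid connection\n"); return; }

  // Query the catalogue: logical directory plus file-name pattern (placeholders)
  TGridResult *res = gGrid->Query("/alice/production/example", "*AliESDs.root");

  // Print the transport URL (turl) of every logical file found
  for (Int_t i = 0; i < res->GetEntries(); i++)
    printf("%s\n", res->GetKey(i, "turl"));
}
```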
AliEn Timeline (2001-2005): started in 2001; first production (distributed simulation); Physics Performance Report (mixing & reconstruction); 10% Data Challenge (analysis). The emphasis evolves from Functionality + Simulation, to Interoperability + Reconstruction, to Performance, Scalability, Standards + Analysis.
AliEn + ROOT (A) (workflow diagram): the user provides an analysis macro and a query for the input data, and creates a new TAliEnAnalysis object; AliEn returns the list of input data with their locations; the job is split into I/O and job objects for each site (A, B, C); after submission and execution, the results are collected through tree chaining and histogram merging.
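To make the last two steps of this workflow concrete, here is a hedged sketch of the user-side chaining and histogramming; it does not use the TAliEnAnalysis object named on the slide, and the alien:// URLs, tree name and drawn expression are placeholders.

```cpp
// chainAndMerge.C: sketch of the "tree chaining + histogramming" end of the
// analysis workflow; all file URLs and ESD variable names are placeholders.
void chainAndMerge()
{
  // Connect to AliEn so that alien:// URLs can be resolved
  TGrid::Connect("alien://");

  // Chain the ESD trees produced at the different sites
  TChain chain("esdTree");
  chain.Add("alien:///alice/siteA/output1/AliESDs.root");
  chain.Add("alien:///alice/siteB/output1/AliESDs.root");
  chain.Add("alien:///alice/siteC/output1/AliESDs.root");

  printf("chained %lld events\n", chain.GetEntries());

  // Fill one histogram across the whole chain; the expression stands in for
  // whatever ESD quantity the analysis macro actually needs
  TH1F *h = new TH1F("hVz", "Primary vertex z; z (cm)", 100, -20, 20);
  chain.Draw("fPrimaryVertex.fPosition[2]>>hVz");
  h->Draw();
}
```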
PROOF of AliEn (B) • The PROOF system allows: • parallel analysis of objects in a set of files • parallel execution of scripts on clusters of heterogeneous machines • PROOF uses the AliEn Grid File Catalogue and Data Management to map LFNs to a chain of PFNs, and the Workload Management to detect which nodes in a cluster can be used in a parallel session • “Nice! Now I can finally analyze my datasets on the Grid and produce a histogram. And it is fast too!”
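A hedged sketch of what such a parallel session could look like from the ROOT prompt is shown below; it uses the generic TProof/TDSet API of current ROOT rather than the AliEn-specific mapping described above, and the master host, file URLs and selector are placeholders.

```cpp
// proofSketch.C: sketch of a PROOF parallel analysis session.
void proofSketch()
{
  // Open a session on a (hypothetical) PROOF master
  TProof *proof = TProof::Open("proofmaster.example.org");
  if (!proof) return;

  // Describe the data set: a list of files all holding the same tree
  TDSet *dset = new TDSet("TTree", "esdTree");
  dset->Add("alien:///alice/siteA/output1/AliESDs.root");
  dset->Add("alien:///alice/siteB/output1/AliESDs.root");

  // Run a TSelector over the data set in parallel on the cluster
  dset->Process("MySelector.C+");
}
```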
PDC 3 schema (diagram): AliEn job control plus data transfer between CERN, the Tier1s and the Tier2s; production of RAW, shipment of RAW to CERN, reconstruction of RAW in all T1s, then analysis.
Event mixing (diagram): a signal-free underlying event is merged with the signal to produce a mixed event.
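A sketch of how such signal/background merging can be steered from the simulation driver is shown below; the MergeWith call follows the usage documented for AliSimulation, but the background file path and the number of signal events reusing each background event are assumptions.

```cpp
// runMerging.C: sketch of simulating signal events and merging them with
// previously produced signal-free (background) events; the path and the
// signal-per-background ratio are placeholders.
void runMerging(Int_t nEvents = 10)
{
  AliSimulation sim;

  // Reuse each signal-free background event for 10 signal events
  sim.MergeWith("../backgr/galice.root", 10);

  sim.Run(nEvents);
}
```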
AliEn, Genius & EDG/LCG as seen by ALICE (diagram): the user submits jobs to the AliEn server, which schedules them either on native AliEn CEs/SEs or, through an AliEn CE acting as an LCG UI and the LCG RB, on LCG CEs/SEs; the two catalogues are connected so that the LCG LFN/PFN entries map onto the AliEn catalogue, with LCG PFN = AliEn LFN.
AliEn – EDG Interface status report (diagram): job submission flows from the AliEn server through an interface site, where an AliEn CE acts as an EDG UI towards the EDG RB, which dispatches to the EDG CE and its worker nodes; output data are stored on the EDG SE, seen as an AliEn SE (AliEn PFN, LFN=PFN), and registered in both the AliEn Data Catalogue and the EDG Replica Catalogue. Mar 11th, 2003: first AliRoot job, driven by AliEn, run on EDG.
ALICE PDC-3 & LCG • All the production will be started via AliEn; the analysis will be done via ROOT/PROOF/AliEn • LCG-2 will be one CE of AliEn, which will seamlessly integrate LCG and non-LCG resources • If LCG-2 works well, it will absorb a large number of jobs and will be used heavily • If LCG-2 does not work well, AliEn will favour other resources and LCG-2 will be used less • In all cases we will use LCG-2 as much as possible • We will not need to take any decision: the performance of the system will decide for us • The figure of merit will be
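The figure of merit is left unstated on the slide; purely as an assumption, and consistent with the later remark that the AliEn – LCG load split is about 50-50, it could be the fraction of successfully completed work that went through LCG-2:

\[
f_{\rm LCG} = \frac{N^{\rm done}_{\rm LCG\mbox{-}2}}{N^{\rm done}_{\rm LCG\mbox{-}2} + N^{\rm done}_{\rm AliEn}}
\]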
AliEn & LCG: Data Challenge (diagram): the user submits jobs to the AliEn server, which dispatches them either directly to AliEn CE/SEs or, through an AliEn CE acting as an LCG UI and the LCG RB, to LCG CE/SEs; both paths are registered in the catalogues.
AliEn – LCG Interface • Remote AliEn and AliRoot installation OK on all LCG-2 sites • Job management interface works with no major problems • No reliable SE available on the LCG production infrastructure • Generated data are always moved to CERN CASTOR as soon as the job finishes, using AliEn tools (AIOd) • An interface to LCG storage is nevertheless available, and it will be tested as soon as LCG provides storage support on the EIS testbed
Software Installation on LCG (diagram) • Via LCG jobs: installAlice.jdl/installAlice.sh and installAliEn.jdl/installAliEn.sh are submitted from an LCG-UI to every LCG site • Packages are installed under $VO_ALICE_SW_DIR/root/v3-10-02/… geant3/v0-6/… aliroot/v4-01-Rev-00/… alien/… AliEn/…
First Event Round on LCG • OK: as reported by AliEn. Output transferred to CERN CASTOR and registered in the AliEn Data Catalogue • Aborted by LCG: reported as “Aborted” by the LB • Zombie: lost contact between AliEn and the job. All due to server and gateway restarts; many probably finished correctly on LCG • Aborted by AliEn: failed. Many due to server and gateway problems that have since been fixed • Still running: as reported by AliEn on Sunday, Feb 29th, 5 p.m.
Short history • Jan 03: Requirements for ALICE PDC04 presented to the PEB • End Dec 03: Announcement that LCG-2 will be available by mid February 2004 • Beg Jan 04: Decision to delay PDC04 by one month waiting for LCG-2 • End Jan 04: LCG announces that there will be no SE in LCG-2 • Beg Feb 04: The WAN resources allocated by LCG for data storage are insufficient/inadequate • Mid Feb 04: Development of an ALICE solution, in haste and working against all odds! • End Feb 04: IT also comes up with a solution, responding to a CMS requirement • End Feb 04: Production started, new sites being added • Confusingly, during all this time LCG-2 has been declared “ready for ALICE” on a day-by-day basis! • Beg Mar 04: the CASTOR database has to be reinstalled (it was running on Linux 6.2!) • Beg Mar 04: the CASTOR servers have to be reinstalled for security • Beg Mar 04: the LCG RB behaves differently at the different centres. CNAF has to be switched on and off by hand, otherwise it “swallows” all the jobs! • Beg Mar 04: we are now getting close to 10 TB; 30 TB were promised by LCG on 1/1/04 • Mid Mar 04: files on the IT-provided pool are erased before being copied to tape(!) • 18 Mar 04: production restarted & Grid.it added
Snapshot on Mar 16th • file:///C:/Documents%20and%20Settings/Piergiorgio%20Cerello/My%20Documents/Alice/AlienControls.htm
Data Challenge Statistics • First round, closed on Mar 16th
DC Monitoring: http://alien.cern.ch • MonALISA: http://aliens3.cern.ch:8080
Snapshot on Mar 18th • file:///C:/Documents%20and%20Settings/Piergiorgio%20Cerello/My%20Documents/Alice/AlienControls2.htm
Data Challenge Statistics • First+Second round, started on Mar 18th: +1713 jobs
Data Challenge Statistics • First+Second round, started on Mar 18th: +1051, +680
Data Challenge Statistics • First+Second round, started on Mar 18th: +592, +476
Present Status • AliEn native sites • CERN, CNAF, Cyfronet, Catania, FZK, JINR, LBL, Lyon, OSC, Prague, Torino • LCG-2 sites • CERN, CNAF, RAL OK (up to 400 concurrent jobs) • FZK: problems with installation, solved as of Mar 18th • NIKHEF: old version of AliRoot in $PATH, solved as of Mar 18th • TAIWAN: intermittent problems (network?) • Fermilab: “not an Alice site” • Grid.it sites • Installation (AliRoot & AliEn) OK everywhere but Bo • In production as of Mar 18th • Ba, Ct, Fe, LNL, Pd, To OK • Bo-INGV, Pi not seen by the RB • Bo, Rm: minor installation problems • Mar 19th, 00:30: Ba 1, Ct 7, Fe 7, LNL 97, Pd 70, To 17 = 199 running jobs
Double access @ CNAF (diagram): the user submits jobs to the AliEn server; the same CNAF worker nodes are reached both through the native AliEn/CNAF CE/SE and, via an AliEn CE acting as an LCG UI and the LCG RB, through the LCG/CNAF CE/SE.
Remarks • First GRID production with fully transparent common access to different middlewares (AliEn & LCG) • Significant improvement in LCG stability (450 jobs in 12 hours compared with 450 jobs in 2 months) • The AliEn – LCG load split is about 50-50 • Optimal situation: with respect to either choice alone (AliEn only or LCG only), the availability of resources is doubled • There is room for improvement (on both sides), but • The Data Challenge started well, although it is just at the beginning • We hope for continued support from LCG • And the centres should provide us with the promised resources • AliEn already provides functionality for distributed analysis • LCG/ARDA will improve it
Conclusions • ALICE has solutions that are evolving into a solid computing infrastructure • Major decisions have been taken and users have adopted them • Collaboration between physicists and computer scientists is excellent • The tight integration with ROOT allows a fast prototyping and development cycle • AliEn goes a long way toward providing a GRID solution adapted to HEP needs • It allowed us to do large productions with very few people “in charge” • Many ALICE-developed solutions have a high potential to be adopted by other experiments and indeed are becoming “common solutions”
AliEn services (UML-style diagram): V.O. directory with Registry/Lookup/Config and DBD/RDBMS; User Interface and Factory; Authentication; Auditing; Grid Monitoring; DB Proxy; Gatekeeper; CE with Job Manager, Job Broker and Job Optimizer; Process Monitor; Catalogue and Catalogue Optimiser; Storage Element; File Transfer with Transfer Manager, Transfer Broker and Transfer Optimizer. The numbered interactions on the diagram are 1. lookup, 2. authenticate, 3. register, 4. bind (through the API), with 1, 0..n and 1..n multiplicities on the links.
“Long they laboured in the regions of Eä, which are vast beyond the thought of Elves and Men, until in the time appointed was made Arda...” - J.R.R. Tolkien, Valaquenta ARDA in a nutshell • ARDA RTAG • Found AliEn “the most complete system among all considered” in Sep ‘03 • Suggested a “fast prototype” in 6 months • Six months went into calming the turmoil spurred by this report! • ARDA has now started as suggested by the report • At least so we hope! • ARDA, if successful, will form the basis for the EGEE MW
ROOT, ALICE & LCG • LCG has brought support for ROOT and FLUKA • We will continue to develop our system • Providing basic technology, e.g. the VMC and the geometrical modeller (see the sketch below) • … and we will try to collaborate with LCG wherever possible • Possible convergence in the simulation area, collaboration on simple benchmarks • We have proposed to base LCG on ROOT and AliEn • LCG established a client-provider relationship with ROOT, which is rapidly evolving • It is now adopting AliEn via ARDA/EGEE • LCG decided to develop alternatives for some ROOT elements, or to hide them behind interfaces • We expressed our worries • No time to develop and deploy a new system • Duplication and dispersion of efforts • Divergence with the rest of HEP • We will keep looking for opportunities to collaborate
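As an illustration of the VMC idea (the transport engine is chosen in the configuration while generators and detector code stay untouched), here is a fragment of a hypothetical Config.C; the class names follow common AliRoot usage, but the library names, constructor arguments and the FLUKA/Geant4 setup details are assumptions.

```cpp
// Config.C fragment: sketch of selecting the transport engine through the
// Virtual MC interface; user code is unchanged whichever engine is chosen.
void ConfigVMC(const char *engine = "G3")
{
  if (strcmp(engine, "G3") == 0) {
    gSystem->Load("libgeant321");
    new TGeant3TGeo("C++ Interface to Geant3");   // GEANT3 through the VMC
  } else if (strcmp(engine, "FLUKA") == 0) {
    gSystem->Load("libTFluka");                   // library name assumed
    new TFluka("C++ Interface to FLUKA");         // FLUKA through the VMC
  } else {
    // Geant4 through the VMC would need a Geant4 run configuration,
    // omitted in this sketch
  }
}
```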