Presentation Transcript


  1. Monitoring of a distributed computing system: the Grid AliEn@CERN – Marco MEONI, Master Degree, 19/12/2005

  2. Content • Grid Concepts and Grid Monitoring • MonALISA Adaptations and Extensions • PDC’04 Monitoring and Results • Conclusions and Outlook http://cern.ch/mmeoni/thesis/eng.pdf

  3. Section I Grid Concepts and Grid Monitoring

  4. ALICE experiment at CERN LHC 1) Heavy nuclei and protons collide in the LHC 2) Secondary particles are produced in the collision 3) These particles are recorded by the ALICE detector 4) Particle properties (trajectories, momentum, type) are reconstructed by the AliRoot software 5) ALICE physicists analyse the data and search for physics signals of interest

  5. Grid Computing • Grid Computing definition: “coordinated use of large sets of heterogeneous, geographically distributed resources to allow high-performance computation” • The AliEn system: - pull rather than push architecture: the scheduling service does not need to know the status of all resources in the Grid – the resources advertise themselves (see the sketch below); - robust and fault tolerant, where resources can come and go at any point in time; - interfaces to other Grid flavours allowing for rapid expansion of the size of the computing resources, transparently for the end user.
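The pull model can be illustrated with a minimal sketch. All names here (TaskQueue, pullJob, the CE names) are made up for illustration and are not AliEn's actual API; the point is only that the broker stays passive and the computing elements ask for work when they have free capacity.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy illustration of pull-based scheduling: the task queue is passive and
// computing elements (CEs) request work, so the broker never needs a global,
// up-to-date view of all resources. Names are hypothetical.
class TaskQueue {
    private final Queue<String> waitingJobs = new ArrayDeque<>();

    synchronized void submit(String jobId) {
        waitingJobs.add(jobId);
    }

    // Called by a CE when it advertises free capacity ("pull").
    synchronized String pullJob(String ceName) {
        String job = waitingJobs.poll();
        if (job != null) {
            System.out.println(ceName + " pulled " + job);
        }
        return job; // null means: nothing to do, come back later
    }
}

public class PullSchedulingDemo {
    public static void main(String[] args) {
        TaskQueue tq = new TaskQueue();
        tq.submit("job-001");
        tq.submit("job-002");
        // CEs may appear or disappear at any time; they simply stop pulling.
        tq.pullJob("CERN-LCG");
        tq.pullJob("Torino");
        tq.pullJob("Bari"); // returns null, the queue is empty
    }
}
```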

  6. Grid Monitoring • GMA architecture: the Producer stores its location in the Registry, the Consumer looks up that location and then transfers the data directly from the Producer • R-GMA: an example of implementation • Jini (Sun): provides the technical basis
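A minimal sketch of the GMA interaction just described, with hypothetical types (this is not the R-GMA API): the producer registers its location, the consumer resolves it through the registry, and the data transfer then happens directly between consumer and producer.

```java
import java.util.HashMap;
import java.util.Map;

// Toy Grid Monitoring Architecture (GMA): a Producer stores its location in a
// Registry, a Consumer looks the location up and then pulls data directly from
// the Producer. All names are illustrative.
class Registry {
    private final Map<String, Producer> locations = new HashMap<>();
    void storeLocation(String topic, Producer p) { locations.put(topic, p); }
    Producer lookupLocation(String topic)        { return locations.get(topic); }
}

class Producer {
    double readCpuLoad() { return 0.42; } // stand-in for a real measurement
}

class Consumer {
    void run(Registry registry) {
        Producer p = registry.lookupLocation("cpu_load");        // lookup location
        if (p != null) {
            System.out.println("cpu_load = " + p.readCpuLoad()); // transfer data
        }
    }
}

public class GmaDemo {
    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.storeLocation("cpu_load", new Producer());      // store location
        new Consumer().run(registry);
    }
}
```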

  7. MonALISA framework • Distributed monitoring service system using JINI/JAVA and WSDL/SOAP technologies • Each MonALISA server acts as a dynamic service system and provides the functionality to be discovered and used by any other services or clients that require such information

  8. Section II MonALISA Adaptations and Extensions

  9. MonALISA Adaptations • A Web Repository as a front-end for production monitoring • Stores a history view of the monitored data • Displays the data in a variety of predefined histograms and other visualisation formats • Simple interfaces to user code: custom consumers, configuration modules, user-defined charts, distributions • Farms monitoring: a user Java class interfaces MonALISA with a bash script that monitors the site (see the sketch below) • [Diagram: monitored data flows from the WNs and Grid resources of the remote farm, through the monitoring script and the Java interface class on the CE, to the MonALISA agent and on to the Web Repository]
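A rough sketch of that farm-monitoring glue, assuming a hypothetical script path and a simple "name value" output format; in the real setup the parsed values are handed to the MonALISA service through its module interface rather than just collected and printed.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

// Sketch of a Java interface class that runs a bash monitoring script on the
// CE and parses its "name value" output lines. Script path and output format
// are assumptions for illustration.
public class FarmMonitor {
    private final String script; // hypothetical path, e.g. /opt/monitor/site_mon.sh

    public FarmMonitor(String script) { this.script = script; }

    public Map<String, Double> collect() throws Exception {
        Map<String, Double> values = new HashMap<>();
        Process p = new ProcessBuilder("bash", script).start();
        try (BufferedReader r =
                 new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length == 2) {
                    values.put(parts[0], Double.parseDouble(parts[1]));
                }
            }
        }
        p.waitFor();
        return values; // e.g. {runningJobs=120.0, freeCpus=35.0}
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new FarmMonitor("/opt/monitor/site_mon.sh").collect());
    }
}
```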

  10. Repository Setup • A Web Repository as a front-end for monitoring • Keeps the full history of monitored data • Shows data in a multitude of histograms • Added new presentation formats to provide a full set (gauges, distributions) • Simple interfaces to user code: custom consumers, custom tasks • Installation and maintenance: packages installation (Tomcat, MySQL), configuration of the main servlets for the ALICE VO, setup of scripts for startup/shutdown/backup • All the produced plots are built and customised from corresponding configuration files: SQL, parameters, colours, type; cumulative or averaged behaviour; smoothing of fluctuations; user time intervals; many others

  11. AliEn Jobs Monitoring • Centralized or distributed? • AliEn native APIs to retrieve job status snapshots • Added a Java thread (DirectInsert) to feed the Repository directly, without passing through the MonALISA agents (see the sketch below) • [Diagram: AliEn job state machine – a submitted job goes through INSERTING, WAITING (AliEn TQ), ASSIGNED, QUEUED and STARTED at the CE, RUNNING on the WN, then VALIDATION, SAVING and DONE; error states along the way include Error_I, Error_A, Error_S, Error_E, Error_R, ZOMBIE (>1 h), Error_V/VT/VN, FAILED and Error_SV (>3 h); job information flows through the DirectInsert thread into the Repository and is displayed via Tomcat JSP/servlets]
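A condensed sketch of what a DirectInsert-style thread does. The JDBC URL, table and column names, and the status-snapshot source are placeholders, not the actual Repository schema or AliEn API: periodically take a snapshot of job states and write it straight into the Repository database, bypassing the MonALISA agents.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;

// Sketch of a DirectInsert-style thread: poll for a snapshot of AliEn job
// states and insert it directly into the Repository DB, bypassing the
// MonALISA agents. Schema and snapshot source are illustrative.
public class DirectInsert extends Thread {
    private final String jdbcUrl; // e.g. "jdbc:mysql://localhost/repository"

    public DirectInsert(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }

    // Stand-in for the AliEn API call returning e.g. {RUNNING=430, WAITING=1200, ...}
    private Map<String, Integer> fetchJobStates() {
        return Map.of("RUNNING", 430, "WAITING", 1200, "DONE", 56000);
    }

    @Override public void run() {
        String sql = "INSERT INTO job_states (ts, state, cnt) VALUES (?, ?, ?)";
        while (!isInterrupted()) {
            try (Connection c = DriverManager.getConnection(jdbcUrl);
                 PreparedStatement ps = c.prepareStatement(sql)) {
                long now = System.currentTimeMillis() / 1000;
                for (Map.Entry<String, Integer> e : fetchJobStates().entrySet()) {
                    ps.setLong(1, now);
                    ps.setString(2, e.getKey());
                    ps.setInt(3, e.getValue());
                    ps.executeUpdate();
                }
            } catch (Exception e) {
                e.printStackTrace(); // keep polling even if one cycle fails
            }
            try {
                Thread.sleep(120_000); // take a new snapshot every ~2 minutes
            } catch (InterruptedException ie) {
                return;
            }
        }
    }
}
```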

  12. Repository Database(s) • Data replication: online replication between the master DB and the replica DB (aliweb01.cern.ch, alimonitor.cern.ch) • Data collecting: 7+ GB of performance information, 24.5M records • During the DC, data from ~2K monitored parameters arrive every 2-3 minutes • Averaging process: 60 bins for each basic piece of information, kept at 1 min, 10 min and 100 min granularity (FIFO) • Data collecting and Grid monitoring sources: MonALISA agents, Repository Web Services, AliEn API, LCG interface, WNs monitoring (UDP), Web Repository • Grid analysis: ROOT, CARROT
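The 60-bin averaging scheme can be sketched as follows. This is a simplified illustration under the stated assumptions (one FIFO of 60 bins per series, ten fine bins averaged into one coarser bin), not the Repository's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified illustration of the Repository averaging: each series keeps the
// last 60 bins (FIFO); every 10 filled bins of a fine series are averaged
// into one bin of the next, 10x coarser series (1 min -> 10 min -> 100 min).
class BinSeries {
    private final Deque<Double> bins = new ArrayDeque<>(); // newest at the tail
    private final BinSeries coarser;                        // null for the 100-min series
    private double sum = 0;
    private int pending = 0;

    BinSeries(BinSeries coarser) { this.coarser = coarser; }

    void add(double value) {
        bins.addLast(value);
        if (bins.size() > 60) bins.removeFirst(); // keep only the last 60 bins

        if (coarser != null) {
            sum += value;
            if (++pending == 10) {                // 10 fine bins -> 1 coarse bin
                coarser.add(sum / 10.0);
                sum = 0;
                pending = 0;
            }
        }
    }

    double latest() { return bins.isEmpty() ? Double.NaN : bins.peekLast(); }
}

public class AveragingDemo {
    public static void main(String[] args) {
        BinSeries hundredMin = new BinSeries(null);
        BinSeries tenMin = new BinSeries(hundredMin);
        BinSeries oneMin = new BinSeries(tenMin);
        for (int minute = 0; minute < 120; minute++) {
            oneMin.add(minute % 7); // fake 1-minute samples
        }
        System.out.println("last 10-min average: " + tenMin.latest());
    }
}
```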

  13. Web Repository • Storage and monitoring of the Data Challenge running parameters, task completion and resource status

  14. Visualisation Formats • [Screenshots: menu; CE load factors and task completion; statistics and real-time tabulated views; stacked bars; running history; snapshots and pie charts]

  15. Monitored parameters • 2K parameters and 24.5M records with 1-minute granularity • Analysis of the collected data allows for improvement of the Grid performance • Derived classes

  16. MonALISA Extensions • Job monitoring of Grid users: uses AliEn commands (ps –a, jobinfo #jobid, ps –X -st) plus output parsing and scanning of the job’s JDL; results are presented in the same web front end • Application Monitoring (ApMon) at the WNs: ApMon is a set of flexible APIs that can be used by any application to send monitoring information to MonALISA services via UDP datagrams; it allows for data aggregation and scaling of the monitoring system; a light monitoring C++ class was developed for inclusion in the Process Monitor payload (see the sketch below) • Repository Web Services: an alternative to ApMon for Web Repository purposes – no MonALISA agents needed, data are stored directly into the Repository DB; used to monitor network traffic through the FTP servers of ALICE at CERN
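The idea behind ApMon at the WNs can be illustrated with this minimal UDP sender. This is an illustration of the transport pattern only: the plain-text payload and the destination host/port are assumptions, not the real ApMon wire format, and the production component was a C++ class inside the Process Monitor payload.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Minimal illustration of ApMon's idea: fire-and-forget UDP datagrams carrying
// monitoring parameters towards a MonALISA service. The payload format here is
// NOT the real ApMon encoding; it only shows the transport pattern.
public class WorkerNodeMonitor {
    public static void send(String host, int port, String cluster, String node,
                            String param, double value) throws Exception {
        String payload = cluster + "\t" + node + "\t" + param + "\t" + value;
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length,
                                           InetAddress.getByName(host), port));
        } // UDP: no acknowledgement, losing one sample is acceptable
    }

    public static void main(String[] args) throws Exception {
        // hypothetical destination service and parameter
        send("monalisa.example.org", 8884, "ALICE_WNs", "wn042.cern.ch",
             "cpu_load", 0.73);
    }
}
```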

  17. MonALISA Extensions • Distributions for analysis • First attempt at Grid performance tuning, based on real monitored data • Use of ROOT and Carrot features • Cache system to optimise the requests • [Diagram: ROOT/Carrot histogram clients ask Apache over HTTP for a histogram (1); the central ROOT histogram server process (cache) queries the MonALISA Repository for NEW data only (2), receives it (3) and sends the resulting object/file back to the client (4)]
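The caching behaviour in that diagram, fetching only rows newer than the last request and merging them into the cached result, can be sketched like this. Names and the fake query are illustrative assumptions; the real server is a ROOT process behind Apache, not Java code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the incremental cache used by the histogram server: on every
// request only the data newer than the last fetch is pulled from the
// Repository and merged into the cached result. Names are illustrative.
class HistogramCache {
    private final List<Double> cachedValues = new ArrayList<>();
    private long lastFetchTime = 0; // epoch seconds of the newest cached row

    // Stand-in for an SQL query like: SELECT value FROM monitoring WHERE ts > ?
    private List<Double> queryNewData(long since) {
        return since == 0 ? List.of(0.4, 0.7, 0.9) : List.of(); // fake rows
    }

    synchronized List<Double> getHistogramData() {
        List<Double> fresh = queryNewData(lastFetchTime);
        cachedValues.addAll(fresh);                  // merge new rows only
        lastFetchTime = System.currentTimeMillis() / 1000;
        return cachedValues;                         // full data set for the client
    }
}

public class HistogramServerDemo {
    public static void main(String[] args) {
        HistogramCache cache = new HistogramCache();
        System.out.println("first request:  " + cache.getHistogramData().size());
        System.out.println("second request: " + cache.getHistogramData().size());
    }
}
```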

  18. Section III PDC’04 Monitoring and Results

  19. PDC’04 • Purpose: test and validate the ALICE Offline computing model • Produce and analyse ~10% of the data sample collected in a standard data-taking year • Use the complete set of off-line software: AliEn, AliROOT, LCG, PROOF and, in Phase 3, the ARDA user analysis prototype • Structure: logically divided into three phases • Phase 1 – production of underlying Pb+Pb events with different centralities (impact parameters) + production of p+p events • Phase 2 – mixing of signal events with different physics content into the underlying Pb+Pb events • Phase 3 – distributed analysis

  20. PDC’04 Phase 1 • Task – simulate the data flow in reverse: events are produced at remote centres and stored in the CERN MSS • [Diagram: central servers (master job submission, Job Optimizer, RB, File Catalogue, processes control, SE, …) dispatch sub-jobs to the AliEn CEs and, through the AliEn-LCG interface, to LCG, which acts as one AliEn CE; jobs are processed at the CEs and the output files are stored in CERN CASTOR (disk servers, tape)]

  21. Total CPU profile • Aiming for continuous running, not always possible due to resource constraints • 18 computing centres participating • Start 10/03, end 29/05 (58 days active) • Maximum number of jobs running in parallel: 1450 • Average during the active period: 430 • [Chart: total number of jobs running in parallel]

  22. Efficiency • Calculation principle: jobs are submitted only once • Successfully done jobs / all submitted jobs • Error (CE) free jobs / all submitted jobs • Error (AliROOT) free jobs / all submitted jobs

  23. Phase 1 of PDC’04 Statistics

  24. PDC’04 Phase 2 • Task – simulate the event reconstruction and remote event storage • [Diagram: central servers (master job submission, Job Optimizer producing N sub-jobs, RB, File Catalogue, processes monitoring and control, SE, …) dispatch sub-jobs to the CEs and, via the AliEn-LCG interface, to LCG; underlying event input files are read from CERN CASTOR; output is registered in the AliEn File Catalogue (SE: LCG, LCG LFN = AliEn PFN); zip archives of the output files are stored as primary copies on the local SEs using edg(lcg) copy&register, with a backup copy in CERN CASTOR]

  25. Individual sites: CPU contribution • Start 01/07, end 26/09 (88 days active) • As in the 1st phase, general equilibrium in CPU contribution • AliEn direct control: 17 CEs, each with an SE • CERN-LCG encompasses the LCG resources worldwide (also with local/close SEs)

  26. Sites occupancy • Outside CERN, sites such as Bari, Catania and JINR have generally run at maximum capacity

  27. Phase 2: Statistics and Failures

  28. PDC’04 Phase 3 • Task – user data analysis • [Diagram: a user job (many events) issues a File Catalogue query returning the data set (ESDs, other); the Job Optimizer splits it into sub-jobs 1…n grouped by SE file location; the Job Broker submits each sub-job to the CE with the closest SE; the sub-jobs are processed at the CEs/SEs, and their output files 1…n are combined by a file merging job into the job output]
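The splitting step in that diagram, grouping the queried files by the SE that holds them and building one sub-job per group, can be sketched as below. The types and file/SE names are illustrative; the real Job Optimizer works on the AliEn File Catalogue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the Phase 3 splitting: files returned by the File Catalogue query
// are grouped by the storage element (SE) holding them, and one sub-job per
// group is submitted to the CE closest to that SE. Types are illustrative.
record EsdFile(String lfn, String se) {}
record SubJob(String targetSe, List<String> inputFiles) {}

public class AnalysisSplitter {
    static List<SubJob> split(List<EsdFile> dataSet) {
        Map<String, List<String>> bySe = new HashMap<>();
        for (EsdFile f : dataSet) {
            bySe.computeIfAbsent(f.se(), k -> new ArrayList<>()).add(f.lfn());
        }
        List<SubJob> subJobs = new ArrayList<>();
        bySe.forEach((se, files) -> subJobs.add(new SubJob(se, files)));
        return subJobs; // each sub-job is then brokered to the CE with the closest SE
    }

    public static void main(String[] args) {
        List<EsdFile> dataSet = List.of(
            new EsdFile("/alice/run1/event01.root", "Bari::SE"),
            new EsdFile("/alice/run1/event02.root", "Bari::SE"),
            new EsdFile("/alice/run2/event01.root", "Torino::SE"));
        split(dataSet).forEach(System.out::println);
    }
}
```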

  29. Analysis • Start September 2004, end January 2005 • Distribution charts built on top of the ROOT environment using the Carrot web interface • Distribution of the number of running jobs: mainly depends on the number of waiting jobs in the TQ and on the availability of free CPUs at the remote CEs • Occupancy versus the number of queued jobs: occupancy increases as more jobs wait in the local batch queue, with saturation reached at around 60 queued jobs

  30. Section IV Conclusions and Outlook

  31. Lessons from PDC’04 • User jobs have been running for 9 months using AliEn • MonALISA has provided a flexible and complete monitoring framework, successfully adapted to the needs of the Data Challenge • MonALISA has given the expected results for performance tuning and workload balancing • Step-by-step approach: from resource tuning to resource optimisation • MonALISA has been able to gather, store, plot, sort and group a large variety of monitored parameters, either basic or derived, in a rich set of presentation formats • The Repository has been the only source of historical information, and its modular architecture has made possible the development of a variety of custom modules (~800 lines of core source code and ~3K lines for service tasks) • PDC’04 has been a real example of successful Grid interoperability, interfacing AliEn and LCG and proving the scalability of the AliEn design • The usage of MonALISA in ALICE has been documented in an article for the Computing in High Energy and Nuclear Physics (CHEP) ’04 conference, Interlaken, Switzerland • Unprecedented experience in developing and improving a monitoring framework on top of a real functioning Grid, massively testing the software technologies involved • Easy to extend the framework and replace components with equivalent ones following technical needs or strategic choices

  32. Credits • Dott. F. Carminati, L. Betev, P. Buncic and all colleagues in ALICE for the enthusiasm they transmitted during this work • The MonALISA team, collaborative any time I needed them
