Prototype of a Parallel Analysis System for CMS using PROOF
CHEP 2006 - Mumbai, India - February 2006
I. González, D. Cano, R. Marco
Instituto de Física de Cantabria (CSIC – U.C.), Santander – Spain
J. Cuevas
Dpto. de Física, Universidad de Oviedo, Oviedo - Spain
Outline
• Ideal and usual HEP Analysis
• A parallel system: PROOF
• Prototype:
  • Objectives
  • Implementation
  • User point of view
  • Running it…
• Performance studies:
  • Hardware setup
  • Results
• Experience with CMS & PROOF:
  • User point of view
  • Developer point of view
• Conclusions
Prototype of a Parallel Analysis System for CMS using PROOF - I. González
Ideal HEP Analysis - A Kind of HPC
• A typical analysis needs a continuous algorithm-refining cycle:
  1. Implement the algorithm (cuts, particle identification, high-level reconstruction)
  2. Run on data and build histograms, tables,…
  3. Look at the results, think about improvements, and go back to step 1
• To achieve interactivity:
  • Huge amounts of CPU are needed during short periods to process large amounts of data → an HPC model
  • Time is spent coding and thinking… not waiting!
Usual HEP Analysis – Data Processing
• A typical sample (final ROOT files) in CMS:
  • Signal may be of the order of 1M events (150 GB)
  • Backgrounds are much, much bigger
  • Together they might be of the order of 1 TB
• Algorithms may include:
  • Loops inside loops inside loops…
  • Construction of new objects, collections, etc.
  • Kinematic cuts
  • Histograms, summaries and various other mathematical objects
• On a single CPU: unavoidable, but
  • Too long (hours for signal, days for everything)
• On a distributed batch system: access to the data, but uncomfortable:
  • Your code must take care of splitting the samples
  • Interactivity is completely lost
  • Debugging becomes complicated
  • You must take care of merging the final results
• A mixture of both:
  • Develop and debug on a single CPU
  • Run the production selection on a batch system
• … parallelisation may be better
A Parallel System - PROOF
• Important aspects of a parallel system:
  • Support for various authentication mechanisms
  • Possibility to upload code
  • Local and remote data access
  • Efficient master/slave communication
  • Clever load balancing
  • Easy splitting of data and merging of final objects
• PROOF is the parallel facility in ROOT:
  • Provides a simple model to process TTrees in parallel:
    • Disperse your data among your slaves…
    • … build your libraries following the Selector model…
    • … and PROOF takes care of the CPU load for each slave
  • Supports several ways of authentication: SSH, GSI, Kerberos…
    • Profits from GRID technology
  • Easy mechanism to upload code:
    • Package code into PAR (tar.gz) files and tell ROOT how to load it
    • … and it is easy to share these files
  • Remote data may be accessed via rfio, (x)rootd, …
  • Implements dynamic load balancing…
    • … based on local availability of data…
    • … and individual CPU performance
  • Master/slave communication is done through special lightweight daemons (proofd)
  • Automatic sample splitting and support for object merging is provided:
    • Automatically handled by PROOF for ROOT objects
    • Other objects need to inherit from TObject and implement merging code
    • Your histograms are automatically recovered into your normal ROOT session
• More information in the talk from G. Ganis and at http://root.cern.ch/root/PROOF.html
CMS & PROOF Prototype - Objectives
• Hide PROOF details as much as possible:
  • The physicist should concentrate on the analysis development
  • Forget about the internals of PROOF
  • Make all the operations related to PROOF (compilation, packaging, uploading, etc.) invisible to the user
• Easy code migration from current analysis applications based on CMS tools (ExRootAnalysisReader):
  • Integrate and reuse code from those tools (do not reinvent the wheel)
  • Provide the same level of functionality
• Load balancing handled and optimised automatically by PROOF:
  • Base the design on the Selector model in ROOT
• Favour a modular analysis development:
  • Facilitate code sharing between different physicists
  • Provide a mechanism to (de-)activate parts of the analysis at will
• Profit from local GRID infrastructures:
  • Clusters ready to use
  • Authentication and certification mechanisms in place
CMS & PROOF Prototype – Implementation
• One class to encapsulate the interaction with PROOF (compilation, packaging, uploading,…)
• Modularity achieved by inheriting from the AnalysisModule base class:
  • Related algorithms to process the data (IsoMuonFinder, TTbarSelection,…)
• An Analysis Modules Manager
• One class to specialise TSelector for CMS data:
  • Also integrates already existing CMS analysis tools
• One class to encapsulate Counters:
  • Very simple, but not existing in ROOT
• One class to handle an input file
• A main macro to run PROOF
• A main macro to run sequentially:
  • Useful for debugging
• Some scripts:
  • To generate the skeleton of a new Analysis Module
  • Internally used by the tool itself
• Class structure (simplified):
  • AnalysisModule: void Init(), void Loop(), void Summary()
  • MyAnalysisMod1 … MyAnalysisModN inherit from AnalysisModule
  • Analyser: void AddModule(AnalysisModule*), void Init(), void Loop(), void Summary(ostream& os); holds vector<AnalysisModule*> theModules
CMS & PROOF Prototype – User Point of View
• In the input file several things need to be set:
  • Data files: location and name
  • PROOF master IP name
  • Analysis Modules to use
• Analysis Modules:
  • Where the actual code goes; the place to concentrate on
  • Only two mandatory methods
  • The skeleton may be created with a script:
    • The code produced is well commented, with hints
  • They can be easily shared between developers
• Extra features:
  • PROOF running statistics may be activated
  • A mechanism to pass parameters to the modules has been developed:
    • Avoids recompilation if a cut is changed
    • Define them in the input file
  • Utility packages are supported:
    • Need not be executed on each event
  • Each module may implement a summary method to be printed at the end of each job
  • The number of events to run is configurable
  • …
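The slide lists what the input file must contain but not its syntax. A hypothetical example, with invented key names and paths, purely to show how the pieces (master, data files, modules, per-module parameters) might fit together:

```
# Hypothetical input file (key names and values invented for illustration)
ProofMaster     proofmaster.example.org
DataFiles       /data/ttbar/signal_*.root
Modules         IsoMuonFinder TTbarSelection
MaxEvents       800000

# Parameters passed to the modules, so changing a cut
# does not require recompilation:
TTbarSelection.MuonPtCut    20.0
```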
CMS & PROOF Prototype – Running It…
• Set the environment (once)
• Authenticate (once)
• Start ROOT
• Execute RunProof.C
• Draw histograms

[#] export ORCA_SRCPATH=/path/to/ORCA/src
[#] export ORCA_LIBPATH=/path/to/ORCA/libs
[#] grid-proxy-init
Your identity: /C=ES…
[#] root -l
root[0] .x RunProof.C
>> Creating CMSProofLoader...
Info in <TUnixSystem::ACLiC>: creating shared library ~/CMSProof/./CMSProofLoader_C.so
…
>> Checking if PAR files need to be redone...
…
>> Initialising PROOF...
…
SUMMARY
=======
Number of events processed: 794920
>> PROOF Done!!!
root[1] fMyHistogram->Draw();
root[2] .q
Performance Studies – Hardware Setup
• Hardware description:
  • 90 nodes: IBM xSeries 336, 2 Xeon 3.2 GHz processors, 2 GB memory, 2 SATA hard disks (80+400 GB)
  • Network: Gigabit Ethernet
    • Stack of 4 units: 3COM SuperStack3 3870 switch, 48 ports
    • Each node has one Gigabit Ethernet connection to a Gigabit port
• Setup used: 1 slave per node
  • 1 master
  • 80 slaves
• Data distributed in blocks of ~10K events (~1.5 GB):
  • 1 block on each node
  • 800K events (~120 GB) in total
Performance Studies – Results
• We used a real analysis:
  • Selection of top-quark pair production events with a tau, a lepton and two b quarks in the final state
  • Tau reconstruction from tracks, jets and clusters
• Results:
  • Total time = processing + initialisation times
  • Run (only the loop over events):
    • On 1 CPU: ~4 hours
    • On 80 CPUs: ~4 minutes
  • Initialisation takes ~3 minutes, including:
    • Authentication:
      • Done on all slaves, even if unused
      • Therefore not dependent on the number of slaves used
    • Remote environment setting
    • Code uploading and compilation:
      • Smart: only done for newer code
      • The first time it takes a while (not in the plots)
    • TChain initialisation:
      • Very long for very distributed chains (the normal case)
  • Run time scales close to the ideal 1/Ncpu
Experience with the CMS & PROOF Prototype
• Good things:
  • Code was easily and quickly migrated from the previous “framework”:
    • One morning of basically copy/paste
    • It is done only once and forever
  • The old sequential mode is still available, which is very useful for debugging
  • Common code is being shared between developers located in different places
  • At our sites, quick analysis development has been possible thanks to the interactivity provided by the analysis parallelisation:
    • We have now developed ~20 new modules
    • More than 200 histograms are produced in a few minutes
    • Time is spent thinking and programming new cuts and algorithms, not waiting for results
  • Physicists concentrate on the physics and computer managers concentrate on the computers
• Problems and improvements:
  • Debugging the code is still a difficult task:
    • If an error happens on a remote node, the PROOF master hangs
    • We need to put more effort into this issue
  • Histograms are only recovered at the end:
    • PROOF allows drawing plots at given intervals while the data is being processed
  • A GUI: to set the master name, number of events, data files, analysis modules, packages, parameters…
    • Based on the existing and evolving one in PROOF?
  • Data is specified by file name, and its location on the slaves needs to be known:
    • Use “named” datasets to, for example, specify data by physics process
    • Might be supported by PROOF
  • More will certainly come as we use it…
Experience with PROOF – Developer Point of View
• Some problems were found with PROOF, but we got a lot of good support from the PROOF development team
• Installing and setting up PROOF is not a straightforward task:
  • Need to deal with several different, interconnected applications: ROOT, Globus, xinetd,…
  • Documentation is still incomplete… but improving!
• PROOF is very sensitive to time synchronisation (authentication), node and network status, …
  • Unable to recover from or skip incorrect nodes
  • Finding what is wrong is not easy:
    • Errors are not always issued… sometimes it just hangs
    • Events are sometimes skipped with no warning message
• We could not properly test the configuration with two slaves per node:
  • Sometimes it would hang (even though there were two CPUs)
  • No gain was observed when it worked
• Performance is strongly dependent on the “locality” of the data:
  • Best performance is achieved when data is equally distributed among the slaves, i.e. no network data transfer is needed
  • Currently done manually…
  • PROOF should handle this automatically:
    • Currently being developed (we learnt this just yesterday!)
Conclusions
• We wanted to implement a tool to quickly, easily and interactively develop a HEP analysis…
  • … that was modular, usable now and easy
  • … that did not force us to rewrite our already existing code
  • … that allowed us to concentrate on physics
  • … that fully exploited local CPU farms
• We built a light tool which profits from existing CMS libraries and PROOF…
  • … that fulfils all the requirements
  • … that has brought “interactivity” back into the analysis cycle
  • … that took us one morning to migrate to
  • … that allows code sharing
• We gained a lot of experience using PROOF and developing tools for PROOF…
  • … that will allow new functionality in the tool
  • … that will ease the integration of PROOF with the new CMS event data model and framework
• More information at: http://grid.ifca.unican.es/cms/proof