Analysis of CMS Heavy Ion Simulation Data Using ROOT/PROOF/Grid
Jinghua Liu for Pablo Yepes, Jinghua Liu, Rice University, Houston, TX
Maarten Ballintijn, Gunther Roland, Bolek Wyslouch, Jinlong Zhang, MIT, Cambridge, MA
Supported by NSF grants #0218603, #0219063
Outline
From a data analysis user’s point of view:
• Why: ROOT/PROOF/Grid
• How: Step by Step
• What: Test Results
• Summary
Other PROOF talks at this conference: Fons Rademakers, Maarten Ballintijn
ROOT/PROOF
• ROOT as a data analysis tool
• PROOF: Parallel ROOT Facility, based on and part of ROOT
  • runs on clusters of heterogeneous machines
  • parallel analysis of objects in a set of files
  • parallel execution of scripts
• Transparency, Scalability, Adaptability, Error handling, Authentication
• “Bring the KB to the PB, not the PB to the KB” (KB: code --> CPU; PB: data): use distributed CPUs to analyze distributed data
PROOF/Grid Interface • Use a Grid Resource Broker to detect which nodes in a cluster can be used in the parallel session • Use Grid File Catalogue and Replication Manager • Utilize Grid Monitoring Services • Support Globus Authentication • Abstract Grid interface
Step by Step
• Set up PC cluster(s) (for PROOF/Grid)
• Prepare the data files
• Write the analysis code (algorithm)
• Compile a data set for PROOF
• Run a PROOF job
• Get the results
PC Clusters
• Client machine (desktop): P4 @ 1.8 GHz / 512 MB / 40 GB
• Cluster 1:
  2 Dual Xeon @ 2.4 GHz / 1 GB / 360 GB
  1 Dual Athlon @ 1.73 GHz / 1 GB / 240 GB
  8 Dual PIII @ 400 MHz / 512 MB / 60 GB
• Cluster 2: 3 Dual Athlon @ 1.67 GHz / 2 GB / 200 GB
• Operating systems: RedHat 6.1, RedHat 7.3, Slackware 8.1
• Globus version: 2.2
CMS Heavy Ion Simulation • Jet & high-pT particle angular correlation • Use Calorimeters only
CMS Heavy Ion Simulation
• Pythia (jet event generator): 10,000 jet events
• Hijing (heavy ion event generator): 1,000 events
• Each Hijing event (dN/dy ~ 5000) was divided into ~500 sub-events
• 500 sub-events (from different events) were randomly recombined to form a new Hijing event, a cheap way to obtain more Monte Carlo events
• CMSIM (GEANT 3 based simulation program for CMS)
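The sub-event mixing described above can be sketched as follows. This is an illustrative reconstruction, not the production code: the struct and function names are hypothetical, and only the event/sub-event counts are taken from the slide.

```cpp
#include <random>
#include <vector>

// Each Hijing event is split into ~500 sub-events. Mixing builds a new
// event by filling each of the 500 "slots" with a sub-event drawn from a
// randomly chosen parent event, so a handful of expensive full events
// yields many statistically independent combinations.
struct SubEvent {
    int parentEvent;  // which full Hijing event this piece came from
    int slot;         // which of the ~500 slices of that event
};

std::vector<SubEvent> mixEvent(int nEvents, int nSlots, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, nEvents - 1);
    std::vector<SubEvent> mixed;
    mixed.reserve(nSlots);
    for (int slot = 0; slot < nSlots; ++slot)
        mixed.push_back({pick(rng), slot});  // slot filled from a random parent
    return mixed;
}
```

With 1,000 parent events and 500 slots per mixed event, the number of distinct combinations is astronomically larger than the original sample, which is why the slide calls this a cheap source of extra Monte Carlo statistics.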
Data Production: Globus Jobs • Globus used to submit & manage the jobs • No data replication (files were intentionally stored locally)
Build ROOT Tree
• Superimpose jet events on top of Hijing events and generate a ROOT Tree
• Standalone code linked with the ROOT libraries
• CMS calorimetry:
  ECal (Electromagnetic Calorimeter): barrel 61200 cells, endcap 14648 cells
  HCal (Hadronic Calorimeter): 14616 cells (multi-layer), 4032 towers
• calotree holds ECal cells (energy, position) and HCal towers (energy, position)
• 10,000 events were split into 100 files of 100 events each; file size ~160 MB, total data 16 GB
• Data distributed: each node got some local files
TSelector – The Algorithms
• Create a TSelector skeleton from the TTree:
  $ root
  root[0] TFile f("heavyion001.root")
  root[1] calotree->MakeSelector("myselector")
  root[2] .q
  $ ls
  myselector.C myselector.h
• Add the analysis code (algorithm) to the TSelector:
  $ vi myselector.h
  $ vi myselector.C
TSelector – The Algorithms
• myselector.h
  class myselector : public TSelector {
  public:
      TTree *fChain;
      // . . .
  private:
      TH1F *hist1d;
      TH2F *hist2d;
      // . . .
  };
TSelector – The Algorithms
• myselector.C
  void myselector::Begin(TTree *tree)
  {
      hist1d = new TH1F("DeltaPhi", "DeltaPhi", 100, -180., 180.);
      hist2d = new TH2F("EtaPhi", "EtaPhi", 100, -5., 5., 100, -4., 4.);
      fOutput->Add(hist1d);
      fOutput->Add(hist2d);
  }
  Bool_t myselector::Process(Int_t entry)
  {
      // user's analysis code goes here!
      for (i = 0; i < nclusters; i++) {
          if (Et1 > 5) {
              for (j = i + 1; j < nclusters; j++) {
                  if (Et2 > 5) {
                      DeltaPhi = …;
                      hist1d->Fill(DeltaPhi);
                  }
              }
          }
      }
      return kTRUE;
  }
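The pair loop elided in the slide can be sketched in plain, ROOT-free C++. This is a simplified illustration, not the talk's actual algorithm: the Cluster struct and function names are hypothetical, and only the Et > 5 cut and the DeltaPhi observable come from the slide.

```cpp
#include <cmath>
#include <vector>

struct Cluster { double Et; double phi; };  // Et in GeV, phi in degrees

// Azimuthal separation of two clusters, folded into [0, 180] degrees.
double deltaPhi(double phi1, double phi2) {
    double d = std::fabs(phi1 - phi2);
    while (d > 360.0) d -= 360.0;
    if (d > 180.0) d = 360.0 - d;
    return d;
}

// DeltaPhi for every unique pair of clusters passing the Et cut
// (5 GeV in the slide); in the real selector each value would be
// filled into the "DeltaPhi" histogram instead of a vector.
std::vector<double> pairDeltaPhi(const std::vector<Cluster>& c,
                                 double etCut = 5.0) {
    std::vector<double> out;
    for (std::size_t i = 0; i < c.size(); ++i) {
        if (c[i].Et <= etCut) continue;
        for (std::size_t j = i + 1; j < c.size(); ++j) {
            if (c[j].Et <= etCut) continue;
            out.push_back(deltaPhi(c[i].phi, c[j].phi));
        }
    }
    return out;
}
```

The j = i + 1 inner loop matches the slide's code and ensures each pair is counted once; a back-to-back jet signal would show up as an excess near 180 degrees.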
TDSet – Data Location
• Specify a collection of TTrees or files:
  [] TDSet *ds = new TDSet("TTree", "calotree");
  [] ds->Add("/data1/cms/cmsim/heavyion001.root");
  [] ds->Add("/data1/cms/cmsim/heavyion002.root");
  …
  [] ds->Add("lfn://pcs21.rice.edu/data5/heavyion110.root");
  [] ds->Add("lfn://pcs11.rice.edu/cms/cmsim/heavyion230.root");
  …
  [] ds->Print();
• It’s better to put these commands into a macro
• The file list can also be returned by a database or File Catalog query
Running a PROOF Job
  $ root
  [] gROOT->Proof("proofmaster.rice.edu");
  [] TDSet *ds = new TDSet("TTree", "calotree");
  [] ds->Add(". . .");
  …
  [] ds->Process("myselector.C+", "options", nentries, first);
     (note: options must be pre-coded in myselector.C)
  [] TH1F *h1 = (TH1F *)gProof->GetOutput("DeltaPhi");
  [] h1->Draw();
Scale Plot
• Analysis speed vs. number of CPUs (PIII 1 GHz equivalent)
• CPU power and data size balanced across nodes
• CPU-intensive calculations
Summary
• CMS heavy ion analysis implemented and tested with PROOF
• Scales well with the number of CPUs
• PROOF/Grid can provide data analysis power that is otherwise unavailable, and without much extra effort
• The PROOF/Grid interface is under rapid development; the plan is to extend the presented study to use the Grid interface