PROOF - Parallel ROOT Facility
Maarten Ballintijn
http://root.cern.ch
Bring the KB to the PB, not the PB to the KB
PROOF Intro
• Collaboration between core ROOT group at CERN and MIT Heavy Ion Group
  • Rene Brun
  • Fons Rademakers
  • Gunther Roland
  • Maarten Ballintijn
• Part of and based on ROOT framework
• ROOT since 1995, PROOF started 2001
• A wealth of info at http://root.cern.ch
• In ROOT CVS tree, beta tests ongoing
PROOF Intro
• A collection of servers processes the data
• Parallel I/O and parallel CPU
• CPU allocation and data access strategies
• Dynamic resource allocation
• Local data first, also rootd, SAN/NAS
• Transparency
• Single-source analysis code
• Input objects copied from client
• Output objects merged, returned to client
• Scalability and adaptability
• Dynamic packet size
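The "dynamic packet size" bullet can be illustrated with a small sketch. The slides do not describe how PROOF actually sizes its packets, so everything here is an assumption: the names `Packet`, `Packetizer` and `NextPacket` are hypothetical, and the policy (give each slave enough entries to stay busy for a fixed target time, based on its last measured rate) is just one plausible way to adapt to slaves of different speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical packet descriptor: a contiguous range of entries.
struct Packet {
    std::int64_t first;  // first entry in the packet
    std::int64_t num;    // number of entries in the packet
};

// Hypothetical master-side bookkeeping for one data set.
// Not the PROOF implementation; a sketch of rate-adaptive packet sizing.
class Packetizer {
public:
    Packetizer(std::int64_t totalEntries, double targetSeconds,
               std::int64_t minSize, std::int64_t maxSize)
        : fNext(0), fTotal(totalEntries), fTarget(targetSeconds),
          fMin(minSize), fMax(maxSize) {
        assert(totalEntries >= 0 && targetSeconds > 0.0);
        assert(minSize > 0 && minSize <= maxSize);
    }

    // True once every entry has been handed out.
    bool Done() const { return fNext >= fTotal; }

    // Called when an idle slave asks for work. entriesPerSecond is the
    // rate measured for that slave on its previous packet (0 if unknown).
    Packet NextPacket(double entriesPerSecond) {
        std::int64_t size = fMin;  // unknown rate: start with a small packet
        if (entriesPerSecond > 0.0)
            size = static_cast<std::int64_t>(entriesPerSecond * fTarget);
        size = std::max(fMin, std::min(size, fMax));  // keep within bounds
        size = std::min(size, fTotal - fNext);        // do not run past the end
        Packet p{fNext, size};
        fNext += size;
        return p;
    }

private:
    std::int64_t fNext;   // next entry not yet assigned
    std::int64_t fTotal;  // total number of entries
    double fTarget;       // desired processing time per packet, in seconds
    std::int64_t fMin;    // smallest packet handed out
    std::int64_t fMax;    // largest packet handed out
};
```

Because slaves pull work when idle, a fast slave simply comes back more often and receives larger packets, while a slow or loaded slave receives smaller ones; no static partitioning of the data is needed.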
PROOF Intro
[Diagram: a ROOT client session connects over the Internet to a master, which drives many slaves]
Phobos Event and AnT Tree
[Diagram: 1 TTree; classes Event, Paddle, Track, Vertex and Hit, linked with multiplicities 0..n, 0..n, 0..n and 1..n]
PROOF Packages
• Provide a collection of files in the sandbox
• Binary or source packages
• PAR files: Proof ARchive, like a Java jar
• Tar file with a PROOF-INF directory
• BUILD.sh
• SETUP.C, per-slave setting
• API to manage and activate packages
AnT Package
    ant:
    PROOF-INF/  Makefile  LinkDef.h
    TPhAnTEventInfo.cxx  TPhAnTEventInfo.h
    TPhAnTHit.cxx  TPhAnTHit.h
    TPhAnTPdlInfo.cxx  TPhAnTPdlInfo.h
    TPhAnTTrack.cxx  TPhAnTTrack.h
    TPhAnTVertex.cxx  TPhAnTVertex.h

    ant/PROOF-INF:
    BUILD.sh  SETUP.C

    #!/bin/sh
    # BUILD.sh -- Build libant.so
    exec make

    // SETUP.C -- Load AnT library
    {
       gSystem->Load("libPhysics.so");
       gSystem->Load("libant.so");
    }
Analysis using TSelector
• Extend framework by inheritance
    // Abbreviated version
    class TSelector : public TObject {
    protected:
       TList *fInput;
       TList *fOutput;
    public:
       void   Init(TTree *);
       void   Begin(TTree *);
       Bool_t Process(int entry);
       void   Terminate();
    };
Analysis using TSelector
• Create a class inheriting from TSelector
• Implement member functions
  • Begin(): called once at the beginning of an analysis job, in each of the slave servers. Used e.g. to create histograms, initialize data
  • Process(): called for each entry to be processed (by that slave)
  • Terminate(): called once at the end of an analysis job, in each of the slave servers. Used e.g. for post-processing data, cleanup
  • Init(): called for each new file
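The call order described above can be made concrete with a ROOT-free mock. This is only a sketch under stated assumptions: `MiniSelector`, `RunOnSlave` and `TraceSelector` are hypothetical names, the signatures are simplified (no TTree argument), and a "file" is reduced to its number of entries. It shows Begin once, Init per file, Process per entry, Terminate once, as one slave would see them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the selector interface (not the ROOT class).
struct MiniSelector {
    virtual ~MiniSelector() = default;
    virtual void Begin() {}                  // once, at the start of the job
    virtual void Init(int /*fileIndex*/) {}  // once per new file
    virtual bool Process(long entry) = 0;    // once per entry
    virtual void Terminate() {}              // once, at the end of the job
};

// Hypothetical driver: what a single slave does with its share of the data.
// entriesPerFile[i] is the number of entries this slave reads from file i.
inline void RunOnSlave(MiniSelector& sel,
                       const std::vector<long>& entriesPerFile) {
    sel.Begin();
    for (std::size_t f = 0; f < entriesPerFile.size(); ++f) {
        sel.Init(static_cast<int>(f));
        for (long e = 0; e < entriesPerFile[f]; ++e)
            sel.Process(e);
    }
    sel.Terminate();
}

// Example selector that only records the order of the calls it receives.
struct TraceSelector : MiniSelector {
    std::string trace;
    void Begin() override { trace += "B"; }
    void Init(int) override { trace += "I"; }
    bool Process(long) override { trace += "p"; return true; }
    void Terminate() override { trace += "T"; }
};
```

Running a `TraceSelector` over two files with 2 and 1 entries yields the trace `BIppIpT`, which is the lifecycle listed on the slide.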
Example Selector antsel.C
    void Antsel::Begin(TTree *)
    {
       // Book the histogram once per slave
       fVtx_x = new TH1F("vtxx", "Vertex X", 100, -10., 10.);
    }

    Bool_t Antsel::Process(int entry)
    {
       fChain->GetTree()->GetEntry(entry);
       // Skip events with fPdlMean below 1500
       if ( evtInfo->fPdlInfo->fPdlMean < 1500 ) return kFALSE;
       TPhAnTVertex *v = evtInfo->fRMSSelvtx->GetObject();
       fVtx_x->Fill( v->fPos.X() );
       return kTRUE;
    }

    void Antsel::Terminate()
    {
       // Put the histogram in the output list
       fOutput->Add(fVtx_x);
    }
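Each slave fills its own copy of the "vtxx" histogram, and the output objects are merged before they reach the client. The slides do not show how the merge is done, so the following is a minimal sketch assuming a bin-by-bin sum of identically binned histograms. `MiniHist` and `MergeOutputs` are hypothetical names, not ROOT classes; the binning (100 bins from -10 to 10) is taken from the example selector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-binning 1D histogram (a stand-in for TH1F).
struct MiniHist {
    double lo, hi;             // axis range [lo, hi)
    std::vector<double> bins;  // bin contents
    double underflow = 0.0;    // entries below lo
    double overflow = 0.0;     // entries at or above hi

    MiniHist(std::size_t n, double lo_, double hi_)
        : lo(lo_), hi(hi_), bins(n, 0.0) {
        assert(n > 0 && lo_ < hi_);
    }

    void Fill(double x) {
        if (x < lo) { underflow += 1.0; return; }
        if (x >= hi) { overflow += 1.0; return; }
        std::size_t i =
            static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding
        bins[i] += 1.0;
    }

    // Add another histogram with identical binning, bin by bin.
    void Add(const MiniHist& other) {
        assert(other.bins.size() == bins.size());
        assert(other.lo == lo && other.hi == hi);
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
        underflow += other.underflow;
        overflow += other.overflow;
    }

    double Entries() const {
        double sum = underflow + overflow;
        for (double b : bins) sum += b;
        return sum;
    }
};

// Hypothetical master-side merge: sum the per-slave histograms into one.
inline MiniHist MergeOutputs(const std::vector<MiniHist>& perSlave) {
    assert(!perSlave.empty());
    MiniHist merged = perSlave.front();
    for (std::size_t i = 1; i < perSlave.size(); ++i) merged.Add(perSlave[i]);
    return merged;
}
```

Summing works here because filling a histogram is additive: the merged result is the same as if one process had filled a single histogram with all entries, which is what lets the same selector code run locally and in parallel.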
Running locally
• Develop and debug selector locally on small event sample.
    % root
    Root[0] TFile *f = TFile::Open("ant_sample.root")
    Root[1] TTree *t = (TTree*) f->Get("trkTree")
    Root[3] t->Process("antsel.C","",2000)
    Real time 0:00:06, CP time 5.940
    Root[4] vtxx->Draw()
    Root[5] .! vi antsel.C
• About 8 MB of data (~5x compression)
• Develop until ready for large sample.
Running Locally • Ready to run on a large sample
TDSet: Specify the data
• Specify a collection of TTrees or TFiles with objects
    [] TDSet *d = new TDSet("TTree", "tracks", "/");
    [] TDSet *d = new TDSet("TEvent", "", "/objs");
    [] d->Add("root://rcrs4001/a.root", "tracks", "dir", first, num);
    …
    [] d->Print("a");
• To be returned by DB or File Catalog query etc.
• Use logical filenames ("lfn:…")
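A data set of this kind is essentially a list of (file, object, directory, first entry, number of entries) records. The sketch below mirrors the arguments of the `Add` call above in plain C++ and groups the elements by host, which is one way a scheduler could honor "local data first". It is an assumption-laden illustration: `DSetElement`, `HostOf`, `GroupByHost` and `TotalEntries` are hypothetical names, not part of the TDSet API, and only `root://host/path` URLs are recognized.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record mirroring the arguments of d->Add(...) on the slide.
struct DSetElement {
    std::string url;      // e.g. "root://rcrs4001/a.root"
    std::string objName;  // e.g. "tracks"
    std::string dir;      // directory inside the file
    long first;           // first entry to process
    long num;             // number of entries to process
};

// Extract the host from "root://host/path"; returns "" for anything else.
inline std::string HostOf(const std::string& url) {
    const std::string scheme = "root://";
    if (url.compare(0, scheme.size(), scheme) != 0) return "";
    const std::size_t end = url.find('/', scheme.size());
    if (end == std::string::npos) return url.substr(scheme.size());
    return url.substr(scheme.size(), end - scheme.size());
}

// Group elements by the host that serves them, so that a slave running
// on that host can be offered its local files first.
inline std::map<std::string, std::vector<DSetElement>>
GroupByHost(const std::vector<DSetElement>& elems) {
    std::map<std::string, std::vector<DSetElement>> byHost;
    for (const DSetElement& e : elems) byHost[HostOf(e.url)].push_back(e);
    return byHost;
}

// Total number of entries requested across all elements.
inline long TotalEntries(const std::vector<DSetElement>& elems) {
    long total = 0;
    for (const DSetElement& e : elems) total += e.num;
    return total;
}
```

Elements whose URL has no recognizable host (for example a plain local path) end up under the empty key and would have to be served by whatever access method is available.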
Running with PROOF
• Ready to run on large event sample
    % root
    Root[0] gROOT->Proof("pgate.lns.mit.edu")
    … login details …
    Root[1] TDSet *ds = make_dset()
    Root[2] gProof->UploadPackage("ant.par")
    Root[3] gProof->EnablePackage("ant")
    …
    Root[4] gProof->Process("antsel.C","",60000)
    Real time 0:00:12, CP Time 0.050
    Root[5] ((TH1F*)gProof->GetOutput("vtxx"))->Draw()
• Use same session to look at other histograms, change cuts etc.
The PROOF advantage
• Processed 240 MB in 12 s.
PROOF Scalability
• Data set: 8.8 GB in 128 files, 9 million events
• 1 node: 5:25 min
• 32 nodes in parallel: 12 s
• 32 nodes: dual Itanium II 1 GHz CPUs, 2 GB RAM, 2x75 GB 15K SCSI disk, 1 Fast Ethernet, 1 Gb Ethernet NIC (not used)
• Each node has one copy of the data set (4 files, total of 277 MB)
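The two timings can be turned into a speedup and a parallel efficiency. The sketch below assumes that both timings refer to the same 8.8 GB data set, which the slide suggests but does not state outright; `Scaling`, `ParallelScaling` and `SlideScaling` are hypothetical helper names.

```cpp
#include <cassert>

struct Scaling {
    double speedup;     // serial time divided by parallel time
    double efficiency;  // speedup divided by the number of nodes
};

// Standard definitions of speedup and parallel efficiency.
inline Scaling ParallelScaling(double serialSeconds, double parallelSeconds,
                               int nodes) {
    assert(serialSeconds > 0.0 && parallelSeconds > 0.0 && nodes > 0);
    const double speedup = serialSeconds / parallelSeconds;
    return Scaling{speedup, speedup / nodes};
}

// Timings quoted on the slide: 5:25 min on 1 node, 12 s on 32 nodes.
inline Scaling SlideScaling() {
    return ParallelScaling(5 * 60 + 25, 12.0, 32);
}
```

With 325 s against 12 s this gives a speedup of about 27 on 32 nodes, i.e. a parallel efficiency of roughly 85%.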
Future Work
• Ongoing development
• Improvements and defect fixes
• Event lists
• Friend Tree
• Multi-site PROOF sessions
• Continued development of GRID-based PROOF cluster
Other PROOF Talks
• Fons Rademakers:
  • Distributed Parallel Analysis Framework with PROOF (15:00, session 2)
• Jinghua Liu:
  • Analysis of CMS Heavy Ion Simulation Data Using ROOT/PROOF/Grid