
PROOF - Parallel ROOT Facility

PROOF - Parallel ROOT Facility. Maarten Ballintijn http://root.cern.ch. Bring the KB to the PB not the PB to the KB. PROOF Intro. Collaboration between core ROOT group at CERN and MIT Heavy Ion Group. Rene Brun Fons Rademakers. Gunther Roland Maarten Ballintijn.


Presentation Transcript


  1. PROOF - Parallel ROOT Facility Maarten Ballintijn http://root.cern.ch Bring the KB to the PB not the PB to the KB

  2. PROOF Intro
     • Collaboration between the core ROOT group at CERN and the MIT Heavy Ion Group
     • Rene Brun
     • Fons Rademakers
     • Gunther Roland
     • Maarten Ballintijn
     • Part of and based on the ROOT framework
     • ROOT since 1995, PROOF started in 2001
     • A wealth of info at http://root.cern.ch
     • In the ROOT CVS tree, beta tests ongoing

  3. PROOF Intro
     • A collection of servers processes the data
     • Parallel I/O and parallel CPU
     • CPU allocation and data access strategies
     • Dynamic resource allocation
     • Local data first, also rootd, SAN/NAS
     • Transparency
     • Single-source analysis code
     • Input objects copied from the client
     • Output objects merged and returned to the client
     • Scalability and adaptability
     • Dynamic packet size
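The "dynamic packet size" bullet can be illustrated with a pull-based scheduler: each slave asks the master for its next packet of entries, and the master sizes the packet from that slave's measured processing rate, so fast and slow slaves finish at roughly the same time. This is a minimal sketch in plain C++ without ROOT. The names `Packet`, `Packetizer`, and `NextPacket`, the target packet duration, and the size bounds are all hypothetical choices for illustration and are not taken from PROOF's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical illustration only: not PROOF's real packetizer.
struct Packet {
    std::int64_t first;  // index of the first entry in the packet
    std::int64_t num;    // number of entries (0 means no work is left)
};

class Packetizer {
public:
    Packetizer(std::int64_t totalEntries, double targetSeconds,
               std::int64_t minSize, std::int64_t maxSize)
        : fNext(0), fTotal(totalEntries), fTarget(targetSeconds),
          fMin(minSize), fMax(maxSize) {
        assert(totalEntries >= 0 && targetSeconds > 0.0);
        assert(minSize > 0 && minSize <= maxSize);
    }

    // A slave reports its measured rate (entries per second) and gets
    // a packet sized to take roughly fTarget seconds on that slave.
    Packet NextPacket(double entriesPerSecond) {
        std::int64_t want =
            static_cast<std::int64_t>(entriesPerSecond * fTarget);
        want = std::max(fMin, std::min(want, fMax));  // keep within bounds
        const std::int64_t remaining = fTotal - fNext;
        const std::int64_t num = std::min(want, remaining);
        Packet p{fNext, num};
        fNext += num;
        return p;
    }

private:
    std::int64_t fNext;   // next unassigned entry
    std::int64_t fTotal;  // total number of entries in the data set
    double fTarget;       // desired processing time per packet, in seconds
    std::int64_t fMin;    // smallest packet handed out
    std::int64_t fMax;    // largest packet handed out
};
```

In this sketch a slow slave receives small packets and a fast one large packets, and the final packet shrinks to whatever is left, which limits the time the job spends waiting for the slowest slave.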

  4. PROOF Intro [Architecture diagram; labels: ROOT client session, Internet, master, slave (many slaves)]

  5. Phobos Event and AnT Tree [Data-model diagram; labels: 1 TTree, Event, Paddle, Track, Vertex, Hit; multiplicity labels: 0..n, 0..n, 0..n, 1..n]

  6. PROOF Packages
     • Provide a collection of files in the sandbox
     • Binary or source packages
     • PAR files: Proof ARchive, like a Java jar
     • Tar file with a PROOF-INF directory
     • BUILD.sh
     • SETUP.C, per-slave setting
     • API to manage and activate packages

  7. AnT Package

     ant:
         PROOF-INF/           Makefile           LinkDef.h
         TPhAnTEventInfo.cxx  TPhAnTEventInfo.h
         TPhAnTHit.cxx        TPhAnTHit.h
         TPhAnTPdlInfo.cxx    TPhAnTPdlInfo.h
         TPhAnTTrack.cxx      TPhAnTTrack.h
         TPhAnTVertex.cxx     TPhAnTVertex.h

     ant/PROOF-INF:
         BUILD.sh  SETUP.C

     #!/bin/sh
     # BUILD.sh -- Build libant.so
     exec make

     // SETUP.C -- Load AnT library
     {
        gSystem->Load("libPhysics.so");
        gSystem->Load("libant.so");
     }

  8. Analysis using TSelector
     • Extend the framework by inheritance

     // Abbreviated version
     class TSelector : public TObject {
     protected:
        TList *fInput;
        TList *fOutput;
     public:
        void   Init(TTree *);
        void   Begin(TTree *);
        Bool_t Process(int entry);
        void   Terminate();
     };

  9. Analysis using TSelector
     • Create a class inheriting from TSelector
     • Implement the member functions
     • Begin(): called once at the beginning of an analysis job, in each of the slave servers. Used, e.g., to create histograms and initialize data
     • Process(): called for each entry to be processed (by that slave)
     • Terminate(): called once at the end of an analysis job, in each of the slave servers. Used, e.g., for post-processing data and cleanup
     • Init(): called for each new file
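The call order described on this slide can be sketched with a small mock that records each call. The code is plain C++ and does not use ROOT. `MockSelector`, `MockFile`, and `RunOnSlave` are hypothetical names, and `Init` takes a file name here for simplicity, whereas the abbreviated TSelector declaration on the previous slide takes a `TTree*`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a selector. It mirrors the call order
// described on the slide but is not ROOT's TSelector.
class MockSelector {
public:
    std::vector<std::string> calls;  // record of calls, in order

    void Begin() { calls.push_back("Begin"); }
    void Init(const std::string &file) { calls.push_back("Init:" + file); }
    bool Process(int entry) {
        calls.push_back("Process:" + std::to_string(entry));
        return true;
    }
    void Terminate() { calls.push_back("Terminate"); }
};

// Hypothetical description of one input file and its entry count.
struct MockFile {
    std::string name;
    int entries;
};

// What one slave does with the files assigned to it:
// Begin once, Init per file, Process per entry, Terminate once.
inline void RunOnSlave(MockSelector &sel, const std::vector<MockFile> &files) {
    assert(sel.calls.empty());  // this sketch expects a fresh selector
    sel.Begin();
    for (const MockFile &f : files) {
        sel.Init(f.name);
        for (int i = 0; i < f.entries; ++i) {
            sel.Process(i);
        }
    }
    sel.Terminate();
}
```

Running the driver over two files with two and one entries records Begin, Init, two Process calls, Init, one Process call, and Terminate, in that order.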

  10. Example Selector

      // antsel.C
      void Antsel::Begin(TTree *)
      {
         // Book the output histogram once per job
         fVtx_x = new TH1F("vtxx", "Vertex X", 100, -10., 10.);
      }

      Bool_t Antsel::Process(int entry)
      {
         fChain->GetTree()->GetEntry(entry);
         // Skip events below the paddle cut
         if (evtInfo->fPdlInfo->fPdlMean < 1500) return kFALSE;
         TPhAnTVertex *v = evtInfo->fRMSSelvtx->GetObject();
         fVtx_x->Fill(v->fPos.X());
         return kTRUE;
      }

      void Antsel::Terminate()
      {
         // Hand the histogram to the output list
         fOutput->Add(fVtx_x);
      }

  11. Running locally
      • Develop and debug the selector locally on a small event sample.

      % root
      root [0] TFile *f = TFile::Open("ant_sample.root")
      root [1] TTree *t = (TTree*) f->Get("trkTree")
      root [3] t->Process("antsel.C", "", 2000)
      Real time 0:00:06, CP time 5.940
      root [4] vtxx->Draw()
      root [5] .! vi antsel.C

      • About 8 MB of data (~5x compression)
      • Develop until ready for a large sample.

  12. Running Locally • Ready to run on a large sample

  13. TDSet - Specify the data
      • Specify a collection of TTrees or TFiles with objects

      [] TDSet *d = new TDSet("TTree", "tracks", "/");
      [] TDSet *d = new TDSet("TEvent", "", "/objs");
      [] d->Add("root://rcrs4001/a.root", "tracks", "dir", first, num);
      …
      [] d->Print("a");

      • To be returned by a DB or file catalog query, etc.
      • Use logical filenames ("lfn:…")

  14. Running with PROOF
      • Ready to run on a large event sample

      % root
      root [0] gROOT->Proof("pgate.lns.mit.edu")
      … login details …
      root [1] TDSet *ds = make_dset()
      root [2] gProof->UploadPackage("ant.par")
      root [3] gProof->EnablePackage("ant")
      …
      root [4] gProof->Process("antsel.C", "", 60000)
      Real time 0:00:12, CP Time 0.050
      root [5] ((TH1F*)gProof->GetOutput("vtxx"))->Draw()

      • Use the same session to look at other histograms, change cuts, etc.
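The histogram fetched with `GetOutput("vtxx")` is the merged result of the per-slave output objects. One way to picture that merge is a bin-by-bin sum of identically binned histograms, sketched here in plain C++ without ROOT. `Hist1D` and `MergeOutputs` are hypothetical stand-ins and not ROOT classes or PROOF's merging code, and dropping out-of-range values is a simplification made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-binning histogram: a stand-in for TH1F, not ROOT code.
struct Hist1D {
    double lo, hi;
    std::vector<double> bins;

    Hist1D(std::size_t nbins, double lo_, double hi_)
        : lo(lo_), hi(hi_), bins(nbins, 0.0) {
        assert(nbins > 0 && lo_ < hi_);
    }

    // Values outside [lo, hi) are dropped in this sketch.
    void Fill(double x) {
        if (x < lo || x >= hi) return;
        std::size_t i =
            static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding
        bins[i] += 1.0;
    }

    double Entries() const {
        double n = 0.0;
        for (double b : bins) n += b;
        return n;
    }
};

// Master-side merge: sum the per-slave histograms bin by bin.
inline Hist1D MergeOutputs(const std::vector<Hist1D> &parts) {
    assert(!parts.empty());
    Hist1D total(parts[0].bins.size(), parts[0].lo, parts[0].hi);
    for (const Hist1D &p : parts) {
        assert(p.bins.size() == total.bins.size());  // same binning required
        for (std::size_t i = 0; i < p.bins.size(); ++i) {
            total.bins[i] += p.bins[i];
        }
    }
    return total;
}
```

Because the merge is a plain sum, the result in this sketch does not depend on which slave processed which entries.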

  15. The PROOF advantage • Processed 240 MB in 12 s.

  16. PROOF Scalability
      • 8.8 GB, 128 files
      • 1 node: 5 min 25 s
      • 32 nodes in parallel: 12 s
      • 32 nodes: dual Itanium II 1 GHz CPUs, 2 GB RAM, 2x75 GB 15K SCSI disks, 1 Fast Ethernet NIC, 1 Gigabit Ethernet NIC (not used)
      • Each node has one copy of the data set (4 files, 277 MB in total); across 32 nodes: 8.8 GB in 128 files, 9 million events
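The timings on this slide can be turned into a speedup and a parallel efficiency. The sketch below assumes that both timings (5 min 25 s on 1 node, 12 s on 32 nodes) refer to the same 8.8 GB sample, which the slide suggests but does not state outright. The function names are hypothetical helpers written for this example.

```cpp
#include <cassert>

// Speedup of a parallel run relative to a single-node run.
inline double Speedup(double serialSeconds, double parallelSeconds) {
    assert(serialSeconds > 0.0 && parallelSeconds > 0.0);
    return serialSeconds / parallelSeconds;
}

// Parallel efficiency: speedup divided by the number of nodes.
inline double Efficiency(double serialSeconds, double parallelSeconds,
                         int nodes) {
    assert(nodes > 0);
    return Speedup(serialSeconds, parallelSeconds) / nodes;
}

// Aggregate throughput in MB/s for a given data volume and run time.
inline double ThroughputMBps(double megabytes, double seconds) {
    assert(megabytes >= 0.0 && seconds > 0.0);
    return megabytes / seconds;
}
```

With the slide's figures, `Speedup(325.0, 12.0)` is about 27 and `Efficiency(325.0, 12.0, 32)` is about 0.85. Taking 8.8 GB as 8800 MB, `ThroughputMBps(8800.0, 12.0)` gives an aggregate rate of roughly 730 MB/s across the 32 nodes.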

  17. Future Work • Ongoing development • Improvements and defect fixes • Event lists • Friend Tree • Multi site PROOF sessions • Continued development of GRID based PROOF cluster

  18. Other PROOF Talks • Fons Rademakers: • Distributed Parallel Analysis Framework with PROOF (15:00, session 2) • Jinghua Liu: • Analysis of CMS Heavy Ion Simulation Data Using ROOT/PROOF/Grid
