
Super Scaling PROOF to very large clusters


Presentation Transcript


  1. Super Scaling PROOF to very large clusters. Maarten Ballintijn, Kris Gulbrandsen, Gunther Roland / MIT; Rene Brun, Fons Rademakers / CERN; Philippe Canal / FNAL. CHEP 2004

  2. Outline • PROOF Overview • Benchmark Package • Benchmark results • Other developments • Future plans

  3. Outline • PROOF Overview • Benchmark Package • Benchmark results • Other developments • Future plans

  4. PROOF – Parallel ROOT Facility • Interactive analysis of very large sets of ROOT data files on a cluster of computers • Exploits the inherent parallelism in event data • The main design goals: transparency, scalability, adaptability • On the GRID, extends from a local cluster to a wide-area virtual cluster or a cluster of clusters • A collaboration between the ROOT group at CERN and the MIT Heavy Ion Group
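To make the usage model concrete, a rough sketch of a PROOF session of that era from the ROOT prompt follows; the cluster name, file URLs and tree name are placeholders chosen for illustration, not taken from the talk.

    // Illustrative only: connect to a cluster and process a dataset in parallel.
    // The host name, file URLs and tree name "T" are placeholders.
    gROOT->Proof("proofmaster.example.org");      // start the master and slaves
    TDSet *dset = new TDSet("TTree", "T");        // describe the dataset to analyze
    dset->Add("root://node1//data/run01.root");   // files may live on the slave nodes
    dset->Add("root://node2//data/run02.root");
    dset->Process("ana.C");                       // run the selector on all slaves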

  5. PROOF, continued • Multi-tier architecture • Optimize for data locality • WAN ready and GRID compatible [Diagram: a user connects over the Internet to the master server, which drives several slave servers]

  6. PROOF - Architecture • Data Access Strategies • Local data first; also rootd, rfio, SAN/NAS • Transparency • Input objects copied from the client • Output objects merged and returned to the client • Scalability and Adaptability • Vary packet size (according to the specific workload, slave performance and dynamic load) • Heterogeneous servers • Migrate to multi-site configurations

  7. Outline • PROOF Overview • Benchmark Package • Dataset generation • Benchmark TSelector • Statistics and Event Trace • Benchmark results • Other developments • Future plans

  8. Dataset generation • Use the ROOT “Event” example class • A script for creating the PAR file is provided • Generate data on all nodes with slaves • Slaves generate data files in parallel • Specify location, size and number of files
  % make_event_par.sh
  % root
  root[0] gROOT->Proof()
  root[1] .X make_event_trees.C("/tmp/data",100000,4)
  root[2] .L make_tdset.C
  root[3] TDSet *d = make_tdset()
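For orientation, a minimal sketch of what a make_tdset.C helper could look like; the file layout (/tmp/data/event_<n>.root) and the tree name "T" are assumptions based on the ROOT “Event” example and the arguments used above, not the benchmark package's actual macro.

    // make_tdset.C (illustrative sketch, not the actual benchmark macro).
    // Assumes the slaves wrote /tmp/data/event_1.root ... event_N.root,
    // each containing the "Event" tree named "T".
    #include "TDSet.h"
    #include "TString.h"

    TDSet *make_tdset(const char *dir = "/tmp/data", Int_t nfiles = 4)
    {
       TDSet *dset = new TDSet("TTree", "T");
       for (Int_t i = 1; i <= nfiles; i++)
          dset->Add(Form("%s/event_%d.root", dir, i));
       return dset;
    }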

  9. Benchmark TSelector • Three selectors are used • EventTree_NoProc.C – Empty Process() function, reads no data • EventTree_Proc.C – Reads all the data and fills a histogram (in this test only about 35% of the data is actually read) • EventTree_ProcOpt.C – Reads only a fraction of the data (20%) and fills a histogram
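As an illustration, the core of a selector like EventTree_Proc.C could look roughly as follows; the member names (fChain, fEvent, fHist) and the use of Event::GetNtrack() are assumptions based on the ROOT “Event” example and the usual MakeSelector layout, not the actual benchmark source.

    // Illustrative Process() in the spirit of EventTree_Proc.C.
    // Assumes Init() has attached the event branch to fEvent and that
    // SlaveBegin() created fHist and added it to fOutput.
    Bool_t EventTree_Proc::Process(Long64_t entry)
    {
       fChain->GetTree()->GetEntry(entry);   // read the data for this entry
       fHist->Fill(fEvent->GetNtrack());     // fill the monitored quantity
       return kTRUE;
    }

Reading only selected branches instead of the whole entry would correspond to the EventTree_ProcOpt.C case.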

  10. Statistics and Event Trace • Global histograms to monitor the master • Number of packets, number of events, processing time and get-packet latency, per slave • Can be viewed using the standard feedback mechanism • Trace tree: a detailed log of events during the query • Master only, or master and slaves • A detailed list of the recorded events follows • Implemented using standard ROOT classes and PROOF facilities
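A sketch of how such feedback histograms might be followed from the client, using the AddFeedback()/TDrawFeedback facilities mentioned later under “Other developments”; it assumes an already open session (gProof) and a previously built TDSet (dset), and the feedback object name is made up.

    // Illustrative only: request periodic feedback and draw it as it arrives.
    // "PROOF_ProcTimeHist" is a hypothetical feedback object name.
    gProof->AddFeedback("PROOF_ProcTimeHist");
    TDrawFeedback fb(gProof);               // redraws feedback objects as they come in
    dset->Process("EventTree_Proc.C");      // run the benchmark selector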

  11. Events recorded in Trace • Each event contains a timestamp and the recording slave or master • Begin and End of Query • Begin and End of File • Packet details and processing time • File Open statistics (slaves) • File Read statistics (slaves) • Easy to add new events
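Purely as an illustration of how such a trace could be browsed with standard ROOT tools, assuming it is saved as a TTree in a ROOT file on the master; the file, tree and branch names below are hypothetical.

    // Hypothetical names throughout; the real benchmark package defines its own.
    TFile f("proof_trace.root");
    TTree *trace = (TTree *) f.Get("SlaveTrace");
    trace->Print();              // list the recorded fields
    trace->Draw("timestamp");    // e.g. histogram when events were recorded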

  12. Outline • PROOF Overview • Benchmark Package • Benchmark results • Other developments • Future plans

  13. Benchmark Results • CDF cluster at Fermilab • 160 nodes, initial tests • Pharm, the Phobos private cluster, 24 nodes • 6 dual 730 MHz P3 • 6 dual 930 MHz P3 • 12 dual 1.8 GHz P4 • Dataset: • 1 file per slave, 60000 events, 100 MB

  14. Results on Pharm

  15. Results on Pharm, continued

  16. Local and remote file open [Plots: file-open times for local versus remote access]

  17. Slave I/O Performance

  18. Benchmark Results • Phobos-RCF, the central facility at BNL, 370 nodes total • 75 dual 3.05 GHz P4, IDE • 99 dual 2.4 GHz P4, IDE • 18 dual 1.4 GHz P3, IDE • Dataset: • 1 file per slave, 60000 events, 100 MB

  19. PHOBOS RCF LAN Layout

  20. Results on Phobos-RCF

  21. Looking at the problem

  22. Processing time distributions

  23. Processing time, detailed

  24. Request packet from Master

  25. Benchmark Conclusions • The benchmark and measurement facility has proven to be a very useful tool • Don’t use NFS-based home directories • LAN topology is important • LAN speed is important • More testing is required to pinpoint the sporadic long latencies

  26. Outline • PROOF Overview • Benchmark Package • Benchmark results • Other developments • Future plans

  27. Other developments • Packetizer fixes and new dev version • PROOF Parallel startup • TDrawFeedback • TParameter utility class • TCondor improvements • Authentication improvements • Long64_t introduction
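As an example of the new TParameter utility class, a simple named value can be handed to the slaves through the input list; the parameter name below is made up and an open session (gProof) is assumed.

    // Illustrative: pass a named scalar to the slaves via the PROOF input list
    // (TParameter is declared in TParameter.h).
    gProof->AddInput(new TParameter<Long64_t>("MaxEventsPerSlave", 60000));
    // The selector's SlaveBegin() can then retrieve it from fInput by name.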

  28. Outline • PROOF Overview • Benchmark Package • Benchmark results • Other developments • Future plans

  29. Future plans • Understand and solve the LAN latency problem • In the prototype stage: • TProof::Draw() • Multi-level master configuration • Documentation: • HowTo • Benchmarking • PEAC, the PROOF Grid scheduler

  30. The End • Questions?

  31. Parallel Script Execution [Diagram: a local PC running root with ana.C connects to a remote PROOF cluster; the proof master server drives slave servers on node1 ... node4, each reading its local *.root files through TFile/TNetFile, and stdout/objects are returned to the client; the cluster layout comes from proof.conf, which lists "slave node1" ... "slave node4"] Example sessions shown on the slide:
  $ root
  root [0] .x ana.C

  $ root
  root [0] .x ana.C
  root [1] gROOT->Proof("remote")

  $ root
  root [0] tree->Process("ana.C")
  root [1] gROOT->Proof("remote")
  root [2] dset->Process("ana.C")

  32. Simplified message flow [Diagram: the Client sends SendFile and Process(dset,sel,inp,num,first) to the Master; the Master sends SendFile, GetEntries and Process(dset,sel,inp,num,first) to the Slave(s); the slaves request work from the master via GetPacket; ReturnResults(out,log) flows from the slaves to the master and from the master back to the client]

  33. TSelector control flow [Diagram: on the client, TProof calls Begin() on the TSelector and sends the input objects to the slaves; on each slave, the TSelector runs SlaveBegin(), then Process() for every entry, then SlaveTerminate(); the output objects are returned to the client, where Terminate() is called]
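To make this flow concrete, here is a skeleton of a selector as PROOF drives it; the method names are the standard TSelector interface, while the class and member names other than fOutput are made up.

    // Skeleton selector (illustrative). Begin()/Terminate() run on the client,
    // SlaveBegin()/Process()/SlaveTerminate() on each slave.
    #include "TSelector.h"
    #include "TH1F.h"

    class MySelector : public TSelector {
    public:
       TH1F *fHist;   // created per slave, merged and returned by PROOF

       MySelector() : fHist(0) { }

       void Begin(TTree *) { /* client-side setup before the query */ }

       void SlaveBegin(TTree *) {
          // runs on each slave after the input objects have arrived
          fHist = new TH1F("h_example", "example", 100, 0., 100.);
          fOutput->Add(fHist);             // register for merging
       }

       Bool_t Process(Long64_t /*entry*/) {
          // called for every entry of every packet assigned to this slave
          // ... read the needed branches and fill fHist ...
          return kTRUE;
       }

       void SlaveTerminate() { /* per-slave cleanup */ }

       void Terminate() {
          // back on the client: the merged output objects are in fOutput
       }

       ClassDef(MySelector, 0)
    };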

  34. PEAC System Overview

  35. Active Files during Query

  36. Pharm Slave I/O

  37. Active Files during Query
