Super Scaling PROOF to very large clusters • Maarten Ballintijn, Kris Gulbrandsen, Gunther Roland / MIT • Rene Brun, Fons Rademakers / CERN • Philippe Canal / FNAL • CHEP 2004
Outline • PROOF Overview • Benchmark Package • Benchmark results • Other developments • Future plans
PROOF – Parallel ROOT Facility • Interactive analysis of very large sets of ROOT data files on a cluster of computers • Employ inherent parallelism in event data • The main design goals are: • Transparency, scalability, adaptability • On the GRID, extended from local cluster to wide area virtual cluster or cluster of clusters • Collaboration between ROOT group at CERN and MIT Heavy Ion Group
PROOF, continued • Multi-tier architecture • Optimize for data locality • WAN ready and GRID compatible • [Diagram labels: User, Internet, Master, Slave (four)]
PROOF - Architecture • Data Access Strategies • Local data first, also rootd, rfio, SAN/NAS • Transparency • Input objects copied from client • Output objects merged, returned to client • Scalability and Adaptability • Vary packet size (specific workload, slave performance, dynamic load) • Heterogeneous Servers • Migrate to multi site configurations
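The "vary packet size" bullet can be illustrated with a small standalone sketch. It is not PROOF's packetizer: the function name `nextPacketSize`, its parameters, and the policy (aim for a fixed processing time per packet, clamped to a size range) are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of PROOF: pick how many events to hand to
// a slave so that one packet takes roughly targetSeconds to process.
// eventsPerSecond is the slave's measured rate; minSize/maxSize bound the
// packet; remaining is the number of events still unassigned.
std::int64_t nextPacketSize(double eventsPerSecond, double targetSeconds,
                            std::int64_t remaining,
                            std::int64_t minSize, std::int64_t maxSize) {
    assert(minSize > 0 && minSize <= maxSize && remaining >= 0);
    // Events this slave is expected to process in the target time.
    std::int64_t wanted =
        static_cast<std::int64_t>(eventsPerSecond * targetSeconds);
    // Clamp to the allowed packet range, then to what is left.
    std::int64_t size = std::max(minSize, std::min(wanted, maxSize));
    return std::min(size, remaining);
}
```

Under these assumptions a slow slave receives small packets and a fast one receives large packets, which is one way a heterogeneous cluster could be kept evenly loaded.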
Outline • PROOF Overview • Benchmark Package • Dataset generation • Benchmark TSelector • Statistics and Event Trace • Benchmark results • Other developments • Future plans
Dataset generation • Use the ROOT “Event” example class • Script for creating PAR file is provided • Generate data on all nodes with slaves • Slaves generate data files in parallel • Specify location, size and number of files
    % make_event_par.sh
    % root
    root[0] gROOT->Proof()
    root[1] .X make_event_trees.C("/tmp/data",100000,4)
    root[2] .L make_tdset.C
    root[3] TDSet *d = make_tdset()
Benchmark TSelector • Three selectors are used • EventTree_NoProc.C – Empty Process() function, reads no data • EventTree_Proc.C – Reads all data and fills histogram (actually only 35% read in this test) • EventTree_ProcOpt.C – Reads a fraction of the data (20%) and fills histogram
Statistics and Event Trace • Global Histograms to monitor master • Number of packets, number of events, processing time, get packet latency; per slave • Can be viewed using standard feedback • Trace Tree, detailed log of events during query • Master only or Master and Slave • Detailed List of recorded events follows • Implemented using standard ROOT classes and PROOF facilities
Events recorded in Trace • Each event contains a timestamp and the recording slave or master • Begin and End of Query • Begin and End of File • Packet details and processing time • File Open statistics (slaves) • File Read statistics (slaves) • Easy to add new events
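A trace of this kind could be analysed along the following lines. This is a standalone sketch, not the benchmark package's Trace Tree: the `TraceRecord` layout, the `TraceKind` names, and both helper functions are hypothetical and only mirror the event types listed above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical event kinds, mirroring the list on the slide.
enum class TraceKind { QueryBegin, QueryEnd, FileBegin, FileEnd,
                       Packet, FileOpen, FileRead };

// Hypothetical record: every event carries a timestamp and the node
// (master or slave) that recorded it; procTime is used for Packet events.
struct TraceRecord {
    double timestamp;   // seconds since an arbitrary origin
    std::string node;   // recording slave or master
    TraceKind kind;
    double procTime;    // processing time in seconds, 0 if not applicable
};

// Wall-clock length of the query: last QueryEnd minus first QueryBegin.
// Returns -1 if either marker is missing from the trace.
double queryDuration(const std::vector<TraceRecord>& trace) {
    bool haveBegin = false, haveEnd = false;
    double begin = 0.0, end = 0.0;
    for (const TraceRecord& r : trace) {
        if (r.kind == TraceKind::QueryBegin && !haveBegin) {
            begin = r.timestamp;
            haveBegin = true;
        } else if (r.kind == TraceKind::QueryEnd) {
            end = r.timestamp;
            haveEnd = true;
        }
    }
    return (haveBegin && haveEnd) ? end - begin : -1.0;
}

// Total packet processing time per node, summed over Packet events.
std::map<std::string, double>
packetTimePerNode(const std::vector<TraceRecord>& trace) {
    std::map<std::string, double> total;
    for (const TraceRecord& r : trace) {
        assert(r.procTime >= 0.0);
        if (r.kind == TraceKind::Packet) total[r.node] += r.procTime;
    }
    return total;
}
```

Comparing the per-node totals with the query duration gives a rough picture of how much of the wall-clock time each slave spent processing packets.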
Outline • PROOF Overview • Benchmark Package • Benchmark results • Other developments • Future plans
Benchmark Results • CDF cluster at Fermilab • 160 nodes, initial tests • Pharm, Phobos private cluster, 24 nodes • 6, 730 MHz P3 dual • 6, 930 MHz P3 dual • 12, 1.8 GHz P4 dual • Dataset: • 1 file per slave, 60000 events, 100 MB
Local and remote File open • [Figure: plots labelled "local" and "remote"; image not reproduced]
Benchmark Results • Phobos-RCF, central facility at BNL, 370 nodes total • 75, 3.05 GHz P4 dual, IDE • 99, 2.4 GHz P4 dual, IDE • 18, 1.4 GHz P3 dual, IDE • Dataset: • 1 file per slave, 60000 events, 100 MB
Benchmark Conclusions • The benchmark and measurement facility has proven to be a very useful tool • Don’t use NFS based home directories • LAN topology is important • LAN speed is important • More testing is required to pinpoint sporadic long latency
Outline • PROOF Overview • Benchmark Package • Benchmark results • Other developments • Future plans
Other developments • Packetizer fixes and new dev version • PROOF Parallel startup • TDrawFeedback • TParameter utility class • TCondor improvements • Authentication improvements • Long64_t introduction
Outline • PROOF Overview • Benchmark Package • Benchmark results • Other developments • Future plans
Future plans • Understand and Solve LAN latency problem • In prototype stage • TProof::Draw() • Multi level master configuration • Documentation • HowTo • Benchmarking • PEAC PROOF Grid scheduler
The End • Questions?
Parallel Script Execution • Local PC: root session with ana.C; stdout/obj returned here • Remote PROOF Cluster: proof = master server, proof = slave server on node1 to node4, each node holding *.root files (accessed via TFile, or TNetFile) • Cluster configuration:
    #proof.conf
    slave node1
    slave node2
    slave node3
    slave node4
• Sessions shown, in order:
    $ root
    root [0] .x ana.C

    $ root
    root [0] .x ana.C
    root [1] gROOT->Proof("remote")

    $ root
    root [0] tree->Process("ana.C")
    root [1] gROOT->Proof("remote")
    root [2] dset->Process("ana.C")
Simplified message flow • Participants: Client, Master, Slave(s) • Messages, in order of appearance: SendFile • SendFile • Process(dset,sel,inp,num,first) • GetEntries • Process(dset,sel,inp,num,first) • GetPacket • ReturnResults(out,log) • ReturnResults(out,log)
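The GetPacket step suggests a pull model in which slaves ask the master for work until the dataset is exhausted. The sketch below simulates that idea in standalone C++. `MockPacketizer`, `Packet`, and `runMockFlow` are hypothetical names, and the round-robin loop stands in for slaves that would really request packets concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A packet is a contiguous range of entries: [first, first + num).
struct Packet {
    std::int64_t first;
    std::int64_t num;
};

// Hypothetical master-side bookkeeping: hands out entry ranges on request.
class MockPacketizer {
public:
    explicit MockPacketizer(std::int64_t total) : total_(total), next_(0) {
        assert(total >= 0);
    }
    // Answer a GetPacket request for up to `wanted` entries.
    Packet getPacket(std::int64_t wanted) {
        assert(wanted > 0);
        std::int64_t num = std::min(wanted, total_ - next_);
        Packet p{next_, num};
        next_ += num;
        return p;
    }
    bool done() const { return next_ >= total_; }
private:
    std::int64_t total_;
    std::int64_t next_;
};

// Simulate slaves pulling packets in turn; packetSizes[s] is the packet
// size slave s asks for. Returns the number of entries each slave processed.
std::vector<std::int64_t>
runMockFlow(std::int64_t totalEvents,
            const std::vector<std::int64_t>& packetSizes) {
    assert(!packetSizes.empty());
    MockPacketizer master(totalEvents);
    std::vector<std::int64_t> processed(packetSizes.size(), 0);
    while (!master.done()) {
        for (std::size_t s = 0; s < packetSizes.size() && !master.done(); ++s) {
            Packet p = master.getPacket(packetSizes[s]);
            processed[s] += p.num;  // slave s "processes" this range
        }
    }
    return processed;
}
```

For example, `runMockFlow(1000, {100, 300})` assigns 300 entries to the first slave and 700 to the second, and every entry is handed out exactly once.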
TSelector control flow • Client (TSelector, TProof): Begin() • Send Input Objects • Slave(s) (TSelector): SlaveBegin() • Process() ... Process() • SlaveTerminate() • Return Output Objects • Client: Terminate()
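The control flow above can be mimicked without ROOT to show where each step runs and how per-slave output is merged. This is a mock under stated assumptions: `MockSelector` borrows the method names from the slide, but its signatures, the bin-count "input object", and the vector used as output are simplifications and not ROOT's TSelector API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mock selector: method names follow the slide, signatures are simplified.
struct MockSelector {
    int bins;                          // stands in for an input object
    std::vector<std::int64_t> output;  // stands in for the output list

    // Runs on each slave before any entry is processed.
    void SlaveBegin(int nbins) {
        bins = nbins;
        output.assign(static_cast<std::size_t>(nbins), 0);
    }
    // Runs once per entry; here it fills a toy histogram.
    void Process(std::int64_t entry) {
        ++output[static_cast<std::size_t>(entry % bins)];
    }
    // Runs on each slave after its last entry.
    void SlaveTerminate() {}
};

// Drive the flow: client-side setup, per-slave processing, merge, return.
std::vector<std::int64_t> runMockQuery(std::int64_t entries, int slaves,
                                       int nbins) {
    assert(entries >= 0 && slaves > 0 && nbins > 0);
    // Begin(): the client prepares the input (only the bin count here).
    std::vector<std::int64_t> merged(static_cast<std::size_t>(nbins), 0);
    for (int s = 0; s < slaves; ++s) {
        MockSelector sel;
        sel.SlaveBegin(nbins);  // input objects have been sent to the slave
        // Each slave takes every slaves-th entry, a stand-in for packets.
        for (std::int64_t e = s; e < entries; e += slaves) sel.Process(e);
        sel.SlaveTerminate();
        // Return output objects: merge this slave's histogram.
        for (std::size_t b = 0; b < merged.size(); ++b)
            merged[b] += sel.output[b];
    }
    // Terminate(): the client would now inspect or draw the merged result.
    return merged;
}
```

The merged result is independent of the number of slaves, which is the transparency property the earlier slides describe: the client sees the same output objects it would get from a local run.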