
A prototype for an extended PROOF

A prototype for an extended PROOF. Topics: what is PROOF?; the ROOT analysis model on a multi-tier architecture; status; new development; a prototype based on XRD; demo. G. Ganis / CERN PH-SFT, June 2005.


Presentation Transcript


  1. A prototype for an extended PROOF
     • What is PROOF?
     • ROOT analysis model …
     • … on a multi-tier architecture
     • Status
     • New development
     • Prototype based on XRD
     • Demo
     G. Ganis / CERN PH-SFT, June 2005

  2. The ROOT analysis model: Trees
     • Main data structure in ROOT, extending the concept of the PAW ntuple
     • Collection of independent entries
     • Organized in
        • Leaves (basic type, array, C++ object)
        • Branches (collection of leaves / branches)

  3. The ROOT analysis model: Trees (cont'd)
     • Efficient access to portions of entry data
     • Several facilities to work with trees
        • Tree friends (TTree::AddFriend): extend an existing tree without touching it, e.g. add user-specific branches / leaves to an experiment's read-only tree
        • Tree chains (TChain): a list of trees that makes the total tree size virtually unbounded (a single tree is typically < 2 GB)
     • In all cases the result behaves exactly like a single tree
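The chain idea can be illustrated without ROOT. The sketch below is a hypothetical, simplified stand-in (MiniTree, MiniChain and Locate are illustrative names, not ROOT's TChain API): it keeps only the entry count of each tree and maps a global entry number of the chain to a tree and a local entry inside it, which is what lets several files behave like one tree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ROOT tree: only its file and entry count matter here.
struct MiniTree {
    std::string file;
    std::int64_t entries;
};

// Hypothetical TChain-like index: trees are concatenated virtually, so the chain
// exposes one global entry number while the data stays in separate files.
class MiniChain {
public:
    void Add(const MiniTree& t) {
        offsets_.push_back(total_);  // first global entry belonging to this tree
        trees_.push_back(t);
        total_ += t.entries;
    }
    std::int64_t GetEntries() const { return total_; }

    // Map a global entry to (tree index, local entry inside that tree).
    std::pair<std::size_t, std::int64_t> Locate(std::int64_t entry) const {
        assert(entry >= 0 && entry < total_);
        // Linear scan keeps the sketch short; a real index would binary-search offsets_.
        std::size_t i = trees_.size() - 1;
        while (offsets_[i] > entry) --i;
        return {i, entry - offsets_[i]};
    }

private:
    std::vector<MiniTree> trees_;
    std::vector<std::int64_t> offsets_;
    std::int64_t total_ = 0;
};
```

For example, with trees of 100 and 50 entries, global entry 120 resolves to entry 20 of the second tree.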

  4. The ROOT analysis model: Selector
     • TSelector: main tool to define the data processing strategy
     • Simple structure
     • Framework automatically generated for a tree: tree->MakeSelector("MySelector")

       void MySelector::Begin(TTree *tree)
       {
          // called before starting the event loop
          fPtBranch = tree->GetBranch("pt");
          fPtBranch->SetAddress(&fPt);
          fPtHist = new TH1F("Pt", "Pt", 100, 0., 400.);
       }

       Bool_t MySelector::Process(Long64_t entry)
       {
          // called for each entry in the tree
          fPtBranch->GetEntry(entry);   // read only the "pt" branch
          fPtHist->Fill(fPt);
          return kTRUE;
       }

       void MySelector::Terminate()
       {
          // called when the event loop is over
          fPtHist->Draw();
       }

     • Read only what is needed by the algorithm
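To make the division of labour explicit, here is a ROOT-free sketch of the selector contract, assuming a plain std::vector<double> in place of the "pt" branch. Selector, RunEventLoop and PtSelector are hypothetical names: the framework owns the loop and calls Begin, Process and Terminate, while the user code only fills them in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical imitation of the TSelector contract (not the ROOT class).
class Selector {
public:
    virtual ~Selector() = default;
    virtual void Begin() {}
    virtual bool Process(std::int64_t entry) = 0;
    virtual void Terminate() {}
};

// Hypothetical event loop standing in for TTree::Process / TChain::Process.
inline void RunEventLoop(Selector& sel, std::int64_t nEntries) {
    sel.Begin();
    for (std::int64_t i = 0; i < nEntries; ++i) sel.Process(i);
    sel.Terminate();
}

// Example user selector: histograms a "pt" column with 100 bins over [0, 400).
class PtSelector : public Selector {
public:
    explicit PtSelector(const std::vector<double>& pt) : pt_(pt) {}
    void Begin() override { bins_.assign(100, 0); }
    bool Process(std::int64_t entry) override {
        assert(entry >= 0 && static_cast<std::size_t>(entry) < pt_.size());
        // Read only the single value the algorithm needs, like GetEntry on one branch.
        double v = pt_[static_cast<std::size_t>(entry)];
        if (v >= 0. && v < 400.) ++bins_[static_cast<std::size_t>(v / 4.)];  // bin width 4
        return true;
    }
    const std::vector<int>& Bins() const { return bins_; }

private:
    const std::vector<double>& pt_;
    std::vector<int> bins_;
};
```

Values outside [0, 400) are simply dropped here; TH1F would instead count them in its under/overflow bins.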

  5. The ROOT analysis model: h1 analysis example

       {
          // localProcessing.C
          // Define the data set
          TChain a("h42");
          a.Add("/home/ganis/rootdata/dstarmb.root");
          a.Add("/home/ganis/rootdata/dstarp1a.root");
          a.Add("/home/ganis/rootdata/dstarp1b.root");
          a.Add("/home/ganis/rootdata/dstarp2.root");
          // Process the selector
          a.Process("h1analysis.C");
       }

       root [0] .x localProcessing.C
       Starting h1analysis with process option:
       Starting h1analysis with process option:
       Processing file: /home/ganis/rootdata/dstarmb.root
       Processing file: /home/ganis/rootdata/dstarp1a.root
       Processing file: /home/ganis/rootdata/dstarp1b.root
       Processing file: /home/ganis/rootdata/dstarp2.root
       FCN=70.4023 FROM MIGRAD    STATUS=CONVERGED     220 CALLS      221 TOTAL
                           EDM=1.37834e-08    STRATEGY= 1      ERROR MATRIX ACCURATE
        EXT PARAMETER                                   STEP         FIRST
        NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE
         1  p0           9.59988e+05   9.07051e+04   7.92857e+01  -2.69331e-09
         2  p1           3.51130e-01   2.32881e-02   4.69706e-05   5.29292e-03
         3  p2           1.18502e+03   5.95938e+01   6.72112e-01   2.29626e-06
         4  p3           1.45569e-01   5.93851e-05   8.69320e-07  -1.75027e+00
         5  p4           1.24388e-03   6.63103e-05   7.86533e-07  -6.72432e-01
       Real time 0:00:17.563133, CP time 5.880

  6. PROOF
     • Why?
        • Data to be analyzed can only rarely all be local
        • Transferring full data sets takes time
     • Goal: provide a tool for interactive analysis on a heterogeneous cluster
        • exploit the mutual independence of the entries in a tree
        • basic parallelism achieved by splitting the data into packets of variable size, distributed to the participating nodes
     • Focus on:
        • Transparency: same selectors, … on PROOF as in a local session
        • Scalability: linear scaling up to a large number of workers (tested up to 1000)
        • Adaptability: cope automatically with different cluster configurations and varying running conditions / performance
     Motto: Bring the KiloBytes to the PetaBytes and not the PetaBytes to the KiloBytes
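The packet-based splitting can be sketched as a pull model. This is a hypothetical illustration, not PROOF's actual packetizer: Packetizer, Packet and the sizing rule (packet size = measured rate times a target time, clamped to [minSize, maxSize]) are assumptions chosen to show how variable-size packets let faster nodes take more work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Packet {
    std::int64_t first;  // first entry of the packet
    std::int64_t size;   // number of entries (0 means no work left)
};

// Hypothetical pull-based packetizer: an idle worker asks for work and gets a
// packet sized from its measured rate, so fast nodes take bigger chunks and
// slow nodes are not left holding a large tail at the end of the query.
class Packetizer {
public:
    Packetizer(std::int64_t total, std::int64_t minSize, std::int64_t maxSize,
               double targetSeconds)
        : total_(total), min_(minSize), max_(maxSize), target_(targetSeconds) {
        assert(minSize > 0 && minSize <= maxSize);
    }

    // rate: entries/second measured on the requesting worker (0 if not known yet).
    Packet Next(double rate) {
        std::int64_t want = rate > 0 ? static_cast<std::int64_t>(rate * target_) : min_;
        want = std::max(min_, std::min(want, max_));      // clamp to [min_, max_]
        std::int64_t size = std::min(want, total_ - next_);  // never past the end
        Packet p{next_, size};
        next_ += size;
        return p;
    }

private:
    std::int64_t total_, min_, max_;
    double target_;
    std::int64_t next_ = 0;
};
```

A worker with no rate measurement yet starts with the minimum packet, which keeps the first round cheap while rates are being learned.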

  7. PROOF: architecture

  8. PROOF: connection layer
     [Diagram: client, master, slave 1 … slave n. On each node a parent proofd (always running) fork()s and execv()s a child proofd, which transforms into proofserv (master) or proofslave (slaves). proofserv / proofslave: TProofServ instances.]

  9. PROOF: simplified message flow

  10. PROOF: workflow

  11. PROOF: data access strategies
     • Each slave is assigned, as far as possible, packets representing data in local files
     • If there is no (more) local data, remote data are read via (x)rootd, rfiod or dCache (requires a good LAN, e.g. Gigabit Ethernet)
     • With a SAN/NAS, a simple round-robin strategy is used
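The locality-first assignment can be illustrated with a small, hypothetical policy function (FilePacket and AssignPacket are illustrative names): local packets are handed out first, and only when none remain does a slave receive a remote packet. The round-robin case for SAN/NAS storage is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A packet of work tied to the host that stores its file (hypothetical layout).
struct FilePacket {
    std::string host;
    std::string file;
    bool taken;  // value-initialized to false when omitted in brace initialization
};

// Hypothetical locality-first policy: prefer a packet whose data lives on the
// slave's own host; otherwise fall back to the first free remote packet, which
// the slave would then read through (x)rootd, rfiod or dCache.
// Returns the index of the assigned packet, or -1 when everything is taken.
inline int AssignPacket(std::vector<FilePacket>& packets, const std::string& slaveHost) {
    int remote = -1;
    for (std::size_t i = 0; i < packets.size(); ++i) {
        if (packets[i].taken) continue;
        if (packets[i].host == slaveHost) {
            packets[i].taken = true;
            return static_cast<int>(i);
        }
        if (remote < 0) remote = static_cast<int>(i);  // remember first remote candidate
    }
    if (remote >= 0) packets[static_cast<std::size_t>(remote)].taken = true;
    return remote;
}
```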

  12. PROOF: processing algorithms
     • TSelector adapted to PROOF
     • Natural additions
        • Input list: code to be run, …
        • Output list: results
        • Methods to initialize and finalize processing within a slave
        • Method to initialize a tree

       void MySelector::Begin(TTree *tree)
       {
          // called in the client for local initializations
       }

       void MySelector::SlaveBegin(TTree *tree)
       {
          // called in each slave before processing
          fPtHist = new TH1F("Pt", "Pt", 100, 0., 400.);
          fOutput->Add(fPtHist);   // defines the list of objects wanted back
       }

       void MySelector::Init(TTree *tree)
       {
          // called at each tree change
          fPtBranch = tree->GetBranch("pt");
          fPtBranch->SetAddress(&fPt);
       }

       Bool_t MySelector::Process(Long64_t entry)
       {
          // called for each entry in the tree
          fPtBranch->GetEntry(entry);
          fPtHist->Fill(fPt);
          return kTRUE;
       }

       void MySelector::SlaveTerminate()
       {
          // called in each slave after processing
       }

       void MySelector::Terminate()
       {
          // called in the client after processing: the histogram was filled in
          // the slaves and arrives, merged, through the output list
          fPtHist = dynamic_cast<TH1F *>(fOutput->FindObject("Pt"));
          if (fPtHist) fPtHist->Draw();
       }

     • Objects with a Merge() method are merged automatically and are available in Terminate()
     • The modified TSelector also works in non-PROOF sessions
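The slave/client split of the selector, and the automatic merging of output objects, can be mimicked in plain C++. In this sketch Hist stands in for TH1F, SlaveSelector for the per-slave part of MySelector, and RunOnSlaves for the PROOF master; all three are hypothetical, and the slaves run sequentially rather than in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical histogram with a Merge() method, standing in for TH1F (100 bins on [0, 400)).
struct Hist {
    std::vector<long> bins = std::vector<long>(100, 0);
    void Fill(double v) {
        if (v >= 0. && v < 400.) ++bins[static_cast<std::size_t>(v / 4.)];
    }
    void Merge(const Hist& o) {
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += o.bins[i];
    }
};

// Per-slave part of the selector: SlaveBegin creates the output object,
// Process fills it, and 'out' plays the role of the slave's output list.
struct SlaveSelector {
    Hist out;
    void SlaveBegin() { out = Hist{}; }
    bool Process(double pt) { out.Fill(pt); return true; }
};

// Hypothetical driver: each slave processes its own packet, then the outputs
// are merged into a single object, which is what Terminate() would receive.
inline Hist RunOnSlaves(const std::vector<std::vector<double>>& packets) {
    Hist merged;
    for (const auto& packet : packets) {  // sequential here; parallel in PROOF
        SlaveSelector s;
        s.SlaveBegin();
        for (double pt : packet) s.Process(pt);
        merged.Merge(s.out);
    }
    return merged;
}
```

Because each slave only produces a mergeable object, the same selector code gives the same histogram whether one process or many fill it.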

  13. PROOF: the data
     • Data set: dedicated class TDSet
        • Specifies a collection of files with objects
        • Understands logical file names
        • Could be returned by a query to a database, a file catalog or …
        • API very close to TChain

       {
          // proofProcessing.C
          // Define the data set
          TDSet a("TTree", "h42");
          a.Add("root://oplapro62.cern.ch//tmp/dstarmb.root");
          a.Add("root://oplapro62.cern.ch//tmp/dstarp1a.root");
          a.Add("root://oplapro62.cern.ch//tmp/dstarp1b.root");
          a.Add("root://oplapro62.cern.ch//tmp/dstarp2.root");
          // Process the selector
          a.Process("h1analysis.C");
       }

  14. PROOF: running the query
     Executing …

       root[0] gROOT->Proof("pcepsft43.cern.ch")
       PROOF set to parallel mode (10 slaves)
       root[1] .x proofProcessing.C
       Starting h1analysis with process option:
       Starting h1analysis with process option:
       Processing file: /tmp/ganis/rootdata/dstarp1a.root
       Processing file: /tmp/ganis/rootdata/dstarp2.root
       Starting h1analysis with process option:
       Processing file: //tmp/ganis/rootdata/dstarmb.root
       Processing file: //tmp/ganis/rootdata/dstarp1b.root
       Processing file: //tmp/ganis/rootdata/dstarp2.root
       FCN=70.4023 FROM MIGRAD    STATUS=CONVERGED     220 CALLS      221 TOTAL
                           EDM=1.37834e-08    STRATEGY= 1      ERROR MATRIX ACCURATE
        EXT PARAMETER                                   STEP         FIRST
        NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE
         1  p0           9.59988e+05   9.07051e+04   7.92857e+01  -2.69331e-09
         2  p1           3.51130e-01   2.32881e-02   4.69706e-05   5.29292e-03
         3  p2           1.18502e+03   5.95938e+01   6.72112e-01   2.29626e-06
         4  p3           1.45569e-01   5.93851e-05   8.69320e-07  -1.75027e+00
         5  p4           1.24388e-03   6.63103e-05   7.86533e-07  -6.72432e-01
       root[2]

  15. PROOF: additional features
     • Possibility to upload and / or build additional packages
        • packed as PAR files (Proof ARchive, like a Java JAR …)
        • gProof->UploadPackage("MyPackage.par")
        • gProof->EnablePackage("MyPackage")
     • Cache system to minimize the number of file transfers
     • File identity and integrity checked using message digests
     • Feedback information at configurable time intervals
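The cache idea (transfer a package only when the worker's copy differs) can be sketched with a digest map. Assumptions: FileCache is a hypothetical class, and FNV-1a is used only as a short, non-cryptographic stand-in for a real message digest such as MD5.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// FNV-1a 64-bit hash: a non-cryptographic stand-in for a real message digest.
inline std::uint64_t Digest(const std::string& data) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ull;  // FNV prime
    }
    return h;
}

// Hypothetical worker-side cache: remembers the digest of every file it holds,
// so a transfer is needed only when the incoming content's digest differs.
class FileCache {
public:
    // Returns true if the file had to be (re)transferred.
    bool Upload(const std::string& name, const std::string& content) {
        std::uint64_t d = Digest(content);
        auto it = digests_.find(name);
        if (it != digests_.end() && it->second == d) return false;  // cached copy identical
        digests_[name] = d;  // a real system would also store the bytes here
        ++transfers_;
        return true;
    }
    int Transfers() const { return transfers_; }

private:
    std::map<std::string, std::uint64_t> digests_;
    int transfers_ = 0;
};
```

In practice the client would send only the digest first and ship the bytes when the worker reports a mismatch; the sketch folds both steps into Upload() for brevity.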

  16. PROOF: realtime feedback
     • Chain definition (header) is fetched from the PROOF master
     • Feedback histogram, updated every (e.g.) 1 second

  17. PROOF on clusters
     • PROOF can use "resource brokers" to find out where to start the slaves
     • PROOF can use file catalogs to locate the files to be analysed
     • Concrete examples:
        • Interface with the Condor Computing-On-Demand system: the master starts the slaves as COD jobs
        • PEAC: PROOF-Enabled Analysis Cluster, a complete event analysis solution (data catalog, resource broker, PROOF)
        • TGrid: abstract Grid interface for all Grid services, with a concrete implementation for AliEn

       // Connect
       TGrid *alien = TGrid::Connect("alien");
       // Query
       TGridResult *res = alien->Query("lfn:///alice/simulation/2001-04/V0.6*.root");
       // Data set
       TDSet *treeset = new TDSet("TTree", "AOD");
       treeset->Add(res);
       // Use the files in the result set to find the remote nodes
       gROOT->Proof(res);
       treeset->Process("myselector.C");

  18. PROOF: current limitations
     • Originally intended for short queries
        • TDSet::Process blocks until it is done
     • Stateful connection
        • everything is lost if the connection is lost or cut
     • Originally designed for a local cluster
        • static configuration
     • Robustness of some components
     • Interrupt control flow based on Out-Of-Band messages
     • Authentication when different protocols are required at different steps
     • Sandbox when a user account is not available
     • Documentation

  19. PROOF: team for new developments • Maarten Ballintijn • Marek Biskup • Rene Brun • Derek Feichtinger (ARDA) • G.G. • Guenter Kickinger • Andreas Peters (ARDA) • Fons Rademakers

  20. PROOF: new development fields
     • Interactive batch: stateless connection, non-blocking queries
     • Robustness: get rid of OOB messages
     • Setup / configuration issues: zero-config setup, allow slaves to come and go
     • Grid interfacing: efficient use of grid information (catalogs, resource brokers, …)
     • Performance issues: targeted read-ahead, improved caching, query estimators
     • Authentication: adopt the XROOTD framework
     • Analysis issues: tree friends, event lists, indices
     • GUI, browsing

  21. Typical query-time distribution

  22. XPD: communication layer for PROOF based on XROOTD
     • Transferring state from the client to the PROOF cluster requires a manager on the cluster side that keeps track of existing sessions and submitted queries
     • XROOTD (in ROOT since v4.01.02) provides a generic main component (xrd) for handling networking and protocol scheduling, plus utility tools (forking, error handling, security, …) on which the manager can be based
     • Candidate to introduce
        • interactive-batch mode: possibility to leave a session if a query takes too long and reconnect later to pick up the results
        • non-blocking query submission: possibility to detach from a query while it is being processed (even for potentially short queries)
        • a more robust authentication system
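Interactive-batch mode boils down to keeping session and query state on the server rather than in the client connection. The registry below is a hypothetical sketch (SessionManager, Submit, Complete and TryResult are illustrative names, not the XPD interface): submission returns immediately, and any later connection that names the session can collect finished results.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class QueryState { kSubmitted, kDone };

struct Query {
    std::string selector;
    QueryState state;
    std::string result;
};

// Hypothetical server-side registry: state lives here, not in the client
// connection, so a client can submit, disconnect, and later reattach.
class SessionManager {
public:
    int OpenSession(const std::string& user) {
        sessions_[++lastId_].user = user;
        return lastId_;
    }
    // Non-blocking submission: returns a query number immediately.
    int Submit(int session, const std::string& selector) {
        std::vector<Query>& qs = sessions_.at(session).queries;
        qs.push_back({selector, QueryState::kSubmitted, ""});
        return static_cast<int>(qs.size()) - 1;
    }
    // Called by the processing side when a query finishes, whether or not
    // the client is still connected.
    void Complete(int session, int query, const std::string& result) {
        Query& q = sessions_.at(session).queries.at(static_cast<std::size_t>(query));
        q.state = QueryState::kDone;
        q.result = result;
    }
    // Reattach: any later connection naming the session can fetch a finished result.
    bool TryResult(int session, int query, std::string& out) const {
        const Query& q = sessions_.at(session).queries.at(static_cast<std::size_t>(query));
        if (q.state != QueryState::kDone) return false;
        out = q.result;
        return true;
    }

private:
    struct Session {
        std::string user;
        std::vector<Query> queries;
    };
    std::map<int, Session> sessions_;
    int lastId_ = 0;
};
```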

  23. How does XROOTD work
     • Multi-component server based on a multi-threaded architecture
     • xrd component: provides networking, thread management, protocol scheduling
     • Minimal set of threads:
        • Acceptor: opens connections; matches the protocol; submits jobs to the scheduler
        • Pollers: react to any activity on open links; submit jobs to the scheduler
        • Scheduler: schedules the work to be done (jobs)
        • Worker(s): wait for jobs to execute
        • Buffer manager: dynamically optimizes the use of memory buffers
     • Workers are created / destroyed as needed
     • Links are not attached to a specific worker: the first free worker takes the job
     • Job ≡ data/information to be processed for a given link
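The scheduler/worker split can be sketched as a classic thread pool: producers (in xrd, the acceptor and pollers) enqueue jobs, and whichever worker is free first takes the next one. This is a simplified, hypothetical Scheduler class with a fixed number of workers; xrd's own classes and its dynamic creation/destruction of workers are not reproduced.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical job scheduler: jobs are not tied to a particular worker thread.
class Scheduler {
public:
    explicit Scheduler(unsigned nWorkers) {
        assert(nWorkers > 0);
        for (unsigned i = 0; i < nWorkers; ++i)
            workers_.emplace_back([this] { WorkerLoop(); });
    }
    ~Scheduler() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Called by acceptor/poller-like producers with the work for one link.
    void Schedule(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty()) return;  // drain pending jobs before exiting
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can pick up jobs
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```

Running each job outside the lock is what allows the first free worker to take the next job while others are still busy.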

  24. How does XROOTD work
     [Diagram labels: accept, poller, scheduler, WN, BM, links, XrdJob, XrdXrootdProtocol, XROOTD, files]
     • one XrdXrootdProtocol instance per physical connection (i.e. per client session)
     • client gateway to the files: used to communicate with all the files the client wants to access on that specific server

  25. How does XPROOFD work
     [Diagram labels: accept, poller, scheduler, WN, static area, links, XrdJob, XrdProotdProtocol, XPROOFD, proofserv]
     • one XrdProotdProtocol instance per physical connection (i.e. per client session)
     • client gateway to proofserv
     • static area keeps all the relevant information about a user and their activities on the cluster

  26. XPROOFD: communication layer
     [Diagram: client, master and slave 1 … slave n; each node runs an XrdProofd that fork()s proofserv (master) or proofslave (slaves). Other labels: TXPSocket, xc, PO, XRD pollers.]

  27. Basic ingredients
     • Client side:
        • new class TXPSocket: TSocket interface understanding the new communication protocol
        • new class TXProofMgr: reflects the status of a client vis-à-vis a given cluster; starts / attaches to sessions, described by TProof instances (no longer unique)
     • Server side:
        • new implementation of XrdProtocol, XrdProofdProtocol: client gateway to the cluster, in one-to-one relation with TXProofMgr; static area describing the persistent information (server lifetime)
        • new class XrdProofSrv: proxy to the external processor (proofserv), submitted queries, results, …; one per external processor

  28. TXPSocket
     • Separate thread for receiving messages
     • Intensive use of unsolicited messages
        • normal asynchronous messages (i.e. in Collect)
        • interrupts (no OOB)
        • ping functionality
     • Synchronous and asynchronous messages posted in separate queues
     • Interrupt handler woken up with an internal SIGURG (from the reader to the main thread)
     • Ping treated as a special interrupt (level 0)
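The reader-thread design can be sketched with two queues and an interrupt callback. Assumptions: Msg, MsgQueue and ReaderSketch are hypothetical; the TCP connection is replaced by a vector of messages, and the interrupt path calls a handler directly instead of raising an internal SIGURG as TXPSocket does.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical message model: the kind tells the reader thread where to route it.
enum class Kind { kSync, kAsync, kInterrupt };
struct Msg {
    Kind kind;
    int level;    // interrupt level; 0 is used for ping
    int payload;
};

// Thread-safe FIFO with a blocking Pop(); one instance per message class.
class MsgQueue {
public:
    void Post(const Msg& m) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push_back(m);
        }
        cv_.notify_one();
    }
    Msg Pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        Msg m = q_.front();
        q_.pop_front();
        return m;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Msg> q_;
};

// A dedicated reader thread drains the "connection" and routes each message:
// sync and async messages go to separate queues, interrupts go to a handler.
// Start() must be called only once.
class ReaderSketch {
public:
    MsgQueue syncQ, asyncQ;
    std::function<void(const Msg&)> onInterrupt;

    void Start(std::vector<Msg> incoming) {
        reader_ = std::thread([this, incoming] {
            for (const Msg& m : incoming) {  // stands in for repeated recv() calls
                if (m.kind == Kind::kInterrupt) {
                    if (onInterrupt) onInterrupt(m);  // TXPSocket raises SIGURG instead
                } else if (m.kind == Kind::kSync) {
                    syncQ.Post(m);
                } else {
                    asyncQ.Post(m);
                }
            }
        });
    }
    void Join() { if (reader_.joinable()) reader_.join(); }
    ~ReaderSketch() { Join(); }

private:
    std::thread reader_;
};
```

Keeping synchronous replies in their own queue means a caller waiting for an answer never has to skip over unsolicited messages.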

  29. TXPSocket: reader thread
     [Diagram: the reader thread recv()s from the TCP connection; interrupts raise SIGURG, while sync and async messages are posted (Post event) to their respective queues]

  30. XPD: Demo! • Results achieved with the realistic prototype • Multi-sessions • Disconnect / Reconnect • Process: blocking query • Submit: non-blocking query • Finalize results from different sessions • Archive results to /afs using same daemon as file server

  31. XPD: what next
     • Deep test of the communication layer: latencies, synchronization problems
     • Test with a large, realistic number of slaves
     • Alternatives for internal connections
     • Enable authentication
     • XROOTD load balancing?

  32. Other studies
     • Advanced prototype using a communication layer based on memory-mapped message queue technology (A. Peters, D. Feichtinger):
        • full state in message queues: nice recovery features
        • multi-threaded master: queue insertion, configuration, scheduler, packetizer
        • client frontend
        • slave split into supervisor and processors: not attached to a specific user, better use of resources

  33. Summary
     • A lot of activity is going on to improve the PROOF system
     • A working prototype with a communication layer based on XROOTD exists: interactive batch, multi-session, reconnect
     • Alternative studies may provide good solutions for some issues
     • Goal: have the new system in good shape for ROOT05
