ROOT Tutorials – Session 10. PROOF, GRID, AliEn. Fons Rademakers. Bring the KB to the PB not the PB to the KB. PROOF. Collaboration between core ROOT group at CERN and MIT Heavy Ion Group. Fons Rademakers. Maarten Ballintijn. Part of and based on ROOT framework
ROOT Tutorials – Session 10
PROOF, GRID, AliEn
Fons Rademakers
Bring the KB to the PB not the PB to the KB
ROOT Tutorials - Session 10
PROOF
• Collaboration between core ROOT group at CERN and MIT Heavy Ion Group
  • Fons Rademakers
  • Maarten Ballintijn
• Part of and based on ROOT framework
  • Makes heavy use of ROOT networking and other infrastructure classes
  • No external technologies
Parallel ROOT Facility
• The PROOF system allows:
  • Parallel analysis of trees in a set of files
  • Parallel analysis of objects in a set of files
  • Parallel execution of scripts on clusters of heterogeneous machines
• Its design goals are:
  • Transparency, scalability, adaptability
• Prototype developed in 1997 as proof of concept, full version nearing completion now
Parallel Script Execution
[Diagram: a local PC running root is connected to a remote PROOF cluster. One proof process is the master server, the others are slave servers on node1 to node4. Each node holds *.root files read through TFile; TNetFile is used for remote files. ana.C goes to the cluster and stdout/objects come back to the client.]
#proof.conf
slave node1
slave node2
slave node3
slave node4
Local session:
$ root
root [0] .x ana.C
Remote session with a script:
$ root
root [0] .x ana.C
root [1] gROOT->Proof("remote")
Remote session with a tree or chain:
$ root
root [0] tree->Process("ana.C")
root [1] gROOT->Proof("remote")
root [2] chain->Process("ana.C")
Data Access Strategies
• Each slave is assigned, as much as possible, packets representing data in local files
• If there is no (more) local data, it gets remote data via rootd and rfio (needs a good LAN, such as Gigabit Ethernet)
• In case of SAN/NAS, just use a round-robin strategy
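The local-first rule above can be sketched in a few lines. This is a minimal standalone illustration, not PROOF code: the `Packet` struct, the `NextPacketFor` function and the idea of tagging each packet with the host that stores its file are all assumptions made for the example.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical packet description: a range of entries in a file on a host.
struct Packet {
    std::string host;  // node whose local disk holds the file
    std::string file;
    long first;
    long count;
};

// Returns true and fills `out` if a packet could be assigned to `slaveHost`.
// A packet stored on the slave's own host is preferred; otherwise the oldest
// remaining packet is handed out and would have to be read remotely.
bool NextPacketFor(const std::string& slaveHost, std::deque<Packet>& pending,
                   Packet& out, bool& isLocal) {
    assert(!slaveHost.empty());
    for (auto it = pending.begin(); it != pending.end(); ++it) {
        if (it->host == slaveHost) {
            out = *it;
            isLocal = true;
            pending.erase(it);
            return true;
        }
    }
    if (pending.empty()) return false;
    out = pending.front();
    isLocal = false;
    pending.pop_front();
    return true;
}
```

A slave on node2 would first drain the packets whose files live on node2 and only then start pulling packets that belong to other nodes.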
PROOF Transparency
• Make working on PROOF as similar as possible to working on your local machine
• Return to the client all objects created on the PROOF slaves
• The master server will try to add “partial” objects coming from the different slaves before sending them to the client
PROOF Scalability
• Scalability in parallel systems is determined by the amount of communication overhead (Amdahl’s law)
• Varying the packet size allows one to tune the system. The larger the packets, the less communication is needed and the better the scalability
• Disadvantage: less adaptive to varying conditions on slaves
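As a rough illustration of Amdahl's law, the function below computes the ideal speedup on n slaves when a fraction p of the work runs in parallel. Treating all communication overhead as the serial fraction (1 - p) is a simplifying assumption of this sketch, and the name `AmdahlSpeedup` is made up for the example.

```cpp
#include <cassert>

// Amdahl's law: speedup on n workers when a fraction p of the work
// parallelizes perfectly and the remaining (1 - p) stays serial.
double AmdahlSpeedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}
```

With p = 0.95, four slaves give a speedup of about 3.5, and no number of slaves can exceed 1 / (1 - p) = 20. Larger packets reduce the serial share and move that limit up.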
PROOF Adaptability
• Adaptability means being able to adapt to varying conditions (load, disk activity) on slaves
• A “pull” architecture lets the slaves determine their own processing rate and lets the master control the amount of work to hand out
• Disadvantage: too fine-grained packet size tuning hurts scalability
Workflow For Tree Analysis – Pull Architecture
[Diagram: message sequence between Slave 1, Master and Slave N]
• Master sends Process("ana.C") to each slave; the slaves initialize and the master starts its packet generator
• Each slave repeatedly calls GetNextPacket() and processes the packet (first entry, number of entries) it receives
  • Slave 1: 0,100 then 200,100 then 340,100 then 490,100
  • Slave N: 100,100 then 300,40 then 440,50 then 590,60
• Each slave returns its result with SendObject(histo) and waits for the next command
• Master: add histograms, display histograms
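The packet hand-out in this workflow can be sketched as follows. This is a standalone illustration under stated assumptions: `PacketGenerator` and `Packet` are hypothetical names, and the packet size is simply passed in by the caller, whereas the real master chooses it per slave.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical packet: `count` consecutive entries starting at `first`.
struct Packet {
    long first;
    long count;
};

// Minimal stand-in for the master's packet generator.
class PacketGenerator {
public:
    explicit PacketGenerator(long totalEntries) : fNext(0), fTotal(totalEntries) {}

    // Called on behalf of a slave that has finished its previous packet.
    // `wanted` is the packet size chosen for that slave.
    // Returns a packet with count == 0 when no entries are left.
    Packet GetNextPacket(long wanted) {
        assert(wanted > 0);
        long count = std::min(wanted, fTotal - fNext);
        Packet p{fNext, count};
        fNext += count;
        return p;
    }

private:
    long fNext;   // first entry not yet handed out
    long fTotal;  // total number of entries in the data set
};
```

Because a slave only asks for more work when it is done, a slow slave naturally receives fewer entries. Requesting sizes 100, 100, 100, 40, 100, 50, 100, 100 from a 650-entry generator reproduces the packet sequence on the slide.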
PROOF Error Handling
• Handling death of PROOF servers
  • Death of master
    • Fatal, need to reconnect
  • Death of slave
    • Master can resubmit packets of the dead slave to other slaves
• Handling of ctrl-c
  • OOB message is sent to master, and forwarded to slaves, causing soft/hard interrupt
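Resubmitting the packets of a dead slave requires the master to remember what each slave was given. The sketch below shows one possible bookkeeping scheme. `PacketLedger` and its methods are hypothetical, and the sketch assumes that every packet a slave received must be redone if the slave dies before returning its results.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <vector>

struct Packet {
    long first;
    long count;
};

// Hypothetical bookkeeping kept by the master.
class PacketLedger {
public:
    void AddPending(Packet p) { fPending.push_back(p); }

    // Hand the next pending packet to `slave` and record it as in flight.
    bool Assign(const std::string& slave, Packet& out) {
        if (fPending.empty()) return false;
        out = fPending.front();
        fPending.pop_front();
        fInFlight[slave].push_back(out);
        return true;
    }

    // The slave returned results for everything it was given.
    void MarkDone(const std::string& slave) { fInFlight.erase(slave); }

    // The slave died: put its packets back at the front of the queue,
    // preserving their original order, so other slaves pick them up.
    void SlaveDied(const std::string& slave) {
        auto it = fInFlight.find(slave);
        if (it == fInFlight.end()) return;
        for (auto p = it->second.rbegin(); p != it->second.rend(); ++p)
            fPending.push_front(*p);
        fInFlight.erase(it);
    }

    std::size_t PendingCount() const { return fPending.size(); }

private:
    std::deque<Packet> fPending;
    std::map<std::string, std::vector<Packet>> fInFlight;
};
```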
PROOF Authentication
• PROOF supports secure and un-secure authentication mechanisms
• Un-secure
  • Mangled password sent over network
• Secure
  • SRP, Secure Remote Password protocol (Stanford Univ.), public key technology
  • Kerberos5
  • Globus
Architecture and Implementation
TSelector – The algorithms
• Basic ROOT TSelector
// Abbreviated version
class TSelector : public TObject {
protected:
   TList *fInput;
   TList *fOutput;
public:
   void   Init(TTree *);
   void   Begin(TTree *);
   Bool_t Process(int entry);
   void   Terminate();
};
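To show how the Begin / Process / Terminate hooks fit together, here is a standalone mock that does not depend on ROOT. `SelectorSketch`, `CountAbove` and `RunSelector` are hypothetical names for this example only; a real selector derives from TSelector and is driven by the framework.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the selector interface on the slide; NOT the ROOT class.
class SelectorSketch {
public:
    virtual ~SelectorSketch() {}
    virtual void Begin() {}
    virtual bool Process(long entry) = 0;
    virtual void Terminate() {}
};

// Hypothetical user selector: counts entries whose value passes a cut.
class CountAbove : public SelectorSketch {
public:
    CountAbove(const std::vector<double>& data, double cut)
        : fData(data), fCut(cut), fPassed(0) {}
    void Begin() override { fPassed = 0; }
    bool Process(long entry) override {
        assert(entry >= 0 && entry < static_cast<long>(fData.size()));
        if (fData[entry] > fCut) ++fPassed;
        return true;
    }
    long Passed() const { return fPassed; }

private:
    const std::vector<double>& fData;  // must outlive the selector
    double fCut;
    long fPassed;
};

// Event loop a framework would run: Begin, Process per entry, Terminate.
void RunSelector(SelectorSketch& sel, long first, long count) {
    sel.Begin();
    for (long i = first; i < first + count; ++i)
        if (!sel.Process(i)) break;
    sel.Terminate();
}
```

In a parallel run each slave would execute this loop over its own packets, and the per-slave results would be combined afterwards.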
TDSet – The data
• Specify a collection of TTrees or files with objects
root[0] TDSet *d = new TDSet("TTree", "tracks", "/");
OR
root[0] TDSet *d = new TDSet("TEvent", "", "/objs");
root[1] d->Add("root://rcrs4001/a.root");
…
root[10] d->Print("a");
root[11] d->Process("mySelector.C", nentries, first);
• Returned by DB or File Catalog query etc.
• Use logical filenames ("lfn:…")
Sandbox – The Environment
• Each slave runs in its own sandbox
  • Identical, but independent
• Multiple file spaces in a PROOF setup
  • Shared via NFS, AFS, shared nothing
• File transfers are minimized
  • Cache
  • Packages
Sandbox – The Cache
• Minimize the number of file transfers
• One cache per file space
• Locking to guarantee consistency
• File identity and integrity ensured using
  • MD5 digest
  • Time stamps
• Transparent via TProof::SendFile()
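The decision "does this file need to be sent again?" can be sketched as a lookup keyed by file name. This is an illustration only: `FileCache` and `FileStamp` are hypothetical, the digest is treated as an opaque string (computing a real MD5 is outside the sketch), and comparing both digest and time stamp on every check is an assumption about how the two are combined.

```cpp
#include <cassert>
#include <map>
#include <string>

// Identity of a file as the cache sees it.
struct FileStamp {
    std::string digest;  // e.g. hex MD5 of the contents, computed elsewhere
    long modTime;        // modification time stamp
};

class FileCache {
public:
    // True if the file must be (re)sent: the cache has no copy, or the
    // cached copy differs in digest or modification time.
    bool NeedsTransfer(const std::string& name, const FileStamp& s) const {
        auto it = fEntries.find(name);
        if (it == fEntries.end()) return true;
        return it->second.digest != s.digest || it->second.modTime != s.modTime;
    }

    // Record that the cache now holds this version of the file.
    void Store(const std::string& name, const FileStamp& s) { fEntries[name] = s; }

private:
    std::map<std::string, FileStamp> fEntries;
};
```

An unchanged ana.C is then transferred once per file space and skipped on every later query.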
Sandbox – Package Manager
• Provide a collection of files in the sandbox
• Binary or source packages
• PAR files: PROOF ARchive. Like Java jar
  • Tar file, PROOF-INF directory
  • BUILD.C or BUILD.sh
  • SETUP.C, per slave setting
• API to manage and activate packages
Implementation Highlights
• TProofPlayer class hierarchy
  • Basic API to process events in PROOF
  • Implement event loop
  • Implement proxy for remote execution
• TEventIter
  • Access to TTree or TObject derived collection
  • Cache file, directory, tree
TProofPlayer
[Diagram: the Client holds a TProof with a TPPRemote player. The Master runs a TProofServ, which holds its own TProof with a TPPRemote player. Each Slave runs a TProofServ with a TPPSlave player.]
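Simplified Message Flow
[Diagram: messages between Client, Master and Slave(s)]
Client -> Master: SendFile
Master -> Slave(s): SendFile
Client -> Master: Process(dset,sel,inp,num,first)
Master -> Slave(s): GetEntries
Master -> Slave(s): Process(dset,sel,inp,num,first)
Slave(s) -> Master: GetPacket
Slave(s) -> Master: ReturnResults(out,log)
Master -> Client: ReturnResults(out,log)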
Dynamic Histogram Binning
• Implemented using THLimitsFinder class
• Avoid synchronization between slaves
• Keep score-board in master
  • Use histogram name as key
  • First slave posts limits
  • Master determines best bin size
  • Others use these values
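The score-board idea can be sketched with a map keyed by histogram name. This is not the THLimitsFinder implementation: `LimitsBoard` and `AxisLimits` are hypothetical, and the sketch stores the first proposal unchanged instead of letting the master compute a best bin size.

```cpp
#include <cassert>
#include <map>
#include <string>

struct AxisLimits {
    int nbins;
    double xmin;
    double xmax;
};

// Hypothetical score-board held by the master, keyed by histogram name.
class LimitsBoard {
public:
    // The first caller for a given name fixes the limits; later callers
    // get the stored values back and their own proposal is ignored.
    AxisLimits PostOrGet(const std::string& name, const AxisLimits& proposed) {
        assert(proposed.nbins > 0 && proposed.xmin < proposed.xmax);
        auto res = fBoard.insert(std::make_pair(name, proposed));
        return res.first->second;
    }

private:
    std::map<std::string, AxisLimits> fBoard;
};
```

Since every slave ends up with identical binning for a given name, the partial histograms can later be added bin by bin without any slave-to-slave synchronization.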
Merge API
• Collect output lists in master server
• Objects are identified by name
• Combine partial results
• Member function: Merge(TCollection *)
  • Executed via CINT, no inheritance required
• Standard implementation for histograms
• Otherwise return the individual objects
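Merging by name can be sketched as follows. The `Histo` struct, the `OutputList` alias and `MergeOutputs` are hypothetical stand-ins; the real mechanism calls a Merge(TCollection *) member through the interpreter, which this standalone example does not attempt to reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical fixed-binning histogram: just the bin contents.
struct Histo {
    std::vector<double> bins;
    void Merge(const Histo& other) {
        assert(bins.size() == other.bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
};

using OutputList = std::map<std::string, Histo>;

// Combine the output lists of all slaves; objects are matched by name.
OutputList MergeOutputs(const std::vector<OutputList>& perSlave) {
    OutputList merged;
    for (const OutputList& out : perSlave) {
        for (const auto& kv : out) {
            auto it = merged.find(kv.first);
            if (it == merged.end())
                merged.insert(kv);           // first partial result with this name
            else
                it->second.Merge(kv.second); // add to the running total
        }
    }
    return merged;
}
```

The client then receives one object per name instead of one per slave.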
Setting Up PROOF
Setting Up PROOF
• Install the ROOT system
• For automatic execution of daemons, add proofd and rootd to /etc/inetd.conf (or in /etc/xinetd.d) and /etc/services (not mandatory, servers can be started by users)
  • The rootd (1094) and proofd (1093) port numbers have been officially assigned by IANA
• Set up a proof.conf file describing the cluster
• Set up authentication files (globally, users can override)
PROOF Configuration File
# PROOF config file. It has a very simple format:
#
# node <hostname> [image=<imagename>]
# slave <hostname> [perf=<perfindex>]
#       [image=<imagename>] [port=<portnumber>]
#       [srp | krb5]
# user <username> on <hostname>

node  csc02 image=nfs
slave csc03 image=nfs
slave csc04 image=nfs
slave csc05 image=nfs
slave csc06 image=nfs
slave csc07 image=nfs
slave csc08 image=nfs
slave csc09 image=nfs
slave csc10 image=nfs
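To make the format concrete, here is a small standalone parser for the node and slave lines shown above. It is a sketch, not the parser used by PROOF: `ConfEntry` and `ParseProofConf` are hypothetical names, only the image= and perf= options are read, and port=, srp, krb5 and user lines are ignored.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct ConfEntry {
    std::string kind;   // "node" or "slave"
    std::string host;
    std::string image;  // empty if not given
    int perf = -1;      // -1 if not given
};

// Parse proof.conf text; '#' comments, blank lines and unknown lines are skipped.
std::vector<ConfEntry> ParseProofConf(const std::string& text) {
    std::vector<ConfEntry> entries;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream words(line);
        ConfEntry e;
        if (!(words >> e.kind >> e.host)) continue;
        if (e.kind != "node" && e.kind != "slave") continue;
        std::string opt;
        while (words >> opt) {
            if (opt.compare(0, 6, "image=") == 0)
                e.image = opt.substr(6);
            else if (opt.compare(0, 5, "perf=") == 0)
                e.perf = std::stoi(opt.substr(5));  // throws on a non-numeric value
        }
        entries.push_back(e);
    }
    return entries;
}
```

Fed the file on the slide, the parser would return one node entry for csc02 and eight slave entries, all with image set to nfs.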
The AliEn GRID
AliEn, a Lightweight GRID
• AliEn (http://alien.cern.ch) is a lightweight alternative to a full-blown GRID, based on standard components (SOAP, Web services)
  • Distributed file catalogue as a global file system on an RDBMS
  • TAG catalogue, as extension
  • Secure authentication
  • Central queue manager ("pull" vs "push" model)
  • Monitoring infrastructure
  • C/C++/perl API
  • Automatic software installation with AliKit
  The Core GRID Functionality !!
• AliEn is routinely used in production for Alice PPR
AliEn Components – Architecture
[Diagram components: Client (alien shell, Web; user application in C/C++/Java/Perl), SOAP client, Perl5 AliEn SOAP server, authorisation service, file transport service, sync service, DBI proxy server, DB driver, databases (ADMIN, PROCESSES, file catalogue), DISK]
• API
• Services (file transport, sync)
• Secure authentication service independent of underlying database
• File catalogue: global file system on top of relational database
• Central task queue
AliEn Components – File catalogue
[Diagram labels: Tier1, ALICE USERS, ALICE SIM, ALICE LOCAL]
|--./
|  |--cern.ch/
|  |  |--user/
|  |  |  |--a/
|  |  |  |  |--admin/
|  |  |  |  |--aliprod/
|  |  |  |--f/
|  |  |  |  |--fca/
|  |  |  |--p/
|  |  |  |  |--psaiz/
|  |  |  |  |  |--as/
|  |  |  |  |  |--dos/
|  |  |  |  |  |--local/
|  |  |  |--b/
|  |  |  |  |--barbera/

|  |--36/
|  |  |--stderr
|  |  |--stdin
|  |  |--stdout
|  |--37/
|  |  |--stderr
|  |  |--stdin
|  |  |--stdout
|  |--38/
|  |  |--stderr
|  |  |--stdin
|  |  |--stdout
|--simulation/
|  |--2001-01/
|  |  |--V3.05/
|  |  |  |--Config.C
|  |  |  |--grun.C
Files, commands (job specification) as well as job input and output, tags and even binary package tar files are stored in the catalogue
AliEn Components – Bookkeeping
--./
|  |--r3418_01-01.root
|  |--r3418_02-02.root
|  |--r3418_03-03.root
|  |--r3418_04-04.root
|  |--r3418_05-05.root
|  |--r3418_06-06.root
|  |--r3418_07-07.root
|  |--r3418_08-08.root
|  |--r3418_09-09.root
|  |--r3418_10-10.root
|  |--r3418_11-11.root
|  |--r3418_12-12.root
|  |--r3418_13-13.root
|  |--r3418_14-14.root
|  |--r3418_15-15.root
Table D0: path char(255), dir integer(11), hostIndex integer(11) <fk>, entryId integer(11) <pk>
Table T2526: type char(4), dir integer(8), name char(64), owner char(8), ctime char(16), comment char(80), content char(255), method char(20), methodArg char(255), gowner char(8), size integer(11)
Table T2527: type char(4), dir integer(8), name char(64), owner char(8), ctime char(16), comment char(80), content char(255), method char(20), methodArg char(255), gowner char(8), size integer(11)
AliEn Components – Data access
PROOF and the GRID
PROOF Grid Interface
• PROOF can use a Grid Resource Broker to detect which nodes in a cluster can be used in the parallel session
• PROOF can use a Grid File Catalogue and Replication Manager to map LFNs to a chain of PFNs
• PROOF can use Grid Monitoring Services
• Access will be via an abstract Grid interface
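Mapping logical file names (LFNs) to a chain of physical file names (PFNs) can be sketched with an in-memory catalogue. Everything here is hypothetical: `FileCatalogue` and `ResolveChain` are example names, the URLs in the usage are made up, and "prefer a replica whose name contains the site string" is only one possible selection rule.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory file catalogue: one logical name, many replicas.
class FileCatalogue {
public:
    void AddFile(const std::string& lfn, const std::string& pfn) {
        assert(!lfn.empty() && !pfn.empty());
        fReplicas[lfn].push_back(pfn);
    }

    // Physical file names registered for `lfn` (empty if unknown).
    std::vector<std::string> GetPhysicalFileNames(const std::string& lfn) const {
        auto it = fReplicas.find(lfn);
        if (it == fReplicas.end()) return std::vector<std::string>();
        return it->second;
    }

private:
    std::map<std::string, std::vector<std::string>> fReplicas;
};

// Resolve a list of LFNs to one PFN each, preferring a replica on `site`.
std::vector<std::string> ResolveChain(const FileCatalogue& cat,
                                      const std::vector<std::string>& lfns,
                                      const std::string& site) {
    std::vector<std::string> chain;
    for (const std::string& lfn : lfns) {
        std::vector<std::string> pfns = cat.GetPhysicalFileNames(lfn);
        if (pfns.empty()) continue;  // unknown LFN: skipped in this sketch
        std::string chosen = pfns.front();
        for (const std::string& pfn : pfns) {
            if (pfn.find(site) != std::string::npos) {
                chosen = pfn;
                break;
            }
        }
        chain.push_back(chosen);
    }
    return chain;
}
```

The resulting list of PFNs is what an analysis chain would actually open, while the user only ever handles LFNs.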
Different PROOF Scenarios – Static, stand-alone
• This scheme assumes:
  • no third party grid tools
  • remote cluster containing data files of interest
  • PROOF binaries and libs installed on cluster
  • PROOF daemon startup via (x)inetd
  • per user or group authentication set up by cluster owner
  • static basic PROOF config file
• In this scheme the user knows his data sets are on the specified cluster. From his client he initiates a PROOF session on the cluster. The master server reads the config file and starts as many slaves as described in the config file. The user issues queries to analyse data in parallel and enjoys near real-time response on large queries.
• Pros: easy to set up
• Cons: not flexible under changing cluster configurations, resource availability, authentication, etc.
Different PROOF Scenarios – Dynamic, PROOF in Control
• This scheme assumes:
  • grid resource broker, file catalog, meta data catalog, possibly a replication manager
  • PROOF binaries and libraries installed on cluster
  • PROOF daemon startup via (x)inetd
  • grid authentication
• In this scheme the user queries a metadata catalog to obtain the set of required files (LFNs). The system then asks the resource broker where best to run depending on the set of LFNs, and initiates a PROOF session on the designated cluster. On the cluster the slaves are created by querying the (local) resource broker and the LFNs are converted to PFNs. The query is then performed.
• Pros: uses grid tools for resource and data discovery. Grid authentication.
• Cons: requires preinstalled PROOF daemons. User must be authorized to access resources.
Different PROOF Scenarios – Dynamic, AliEn in Control
• This scheme assumes:
  • AliEn as resource broker and grid environment (taking care of authentication, possibly via Globus)
  • AliEn file catalog, meta data catalog, and replication manager
• In this scheme the user queries a metadata catalog to obtain the set of required files (LFNs), then hands over the PROOF master/slave creation to AliEn via an AliEn job. AliEn will find the best resources, copy the PROOF executables and start the PROOF master. The master will then connect back to the ROOT client on a specified port (the callback port was passed as argument to the AliEn job). In turn the slave servers are started via the same mechanism. Once connections have been set up the system proceeds as in the previous (PROOF in Control) scenario.
• Pros: uses AliEn for resource and data discovery. No pre-installation of PROOF binaries. Can run on any AliEn supported cluster. Fully dynamic.
• Cons: no guaranteed direct response due to the absence of dedicated "interactive" queues.
Different PROOF Scenarios – Dynamic, Condor in Control
• This scheme assumes:
  • Condor as resource broker and grid environment (taking care of authentication, possibly via Globus)
  • Grid file catalog, meta data catalog, and replication manager
• This scheme is basically the same as the previous AliEn-based scheme, except that in the Condor environment Condor manages free resources: as soon as a slave node is reclaimed by its owner, Condor will kill or suspend the slave job. Before either of those events Condor will send a signal to the master so that it can restart the slave somewhere else and/or reschedule the work of that slave on the other slaves.
• Pros: uses grid tools for resource and data discovery. No pre-installation of PROOF binaries. Can run on any Condor pool. No specific authentication. Fully dynamic.
• Cons: no guaranteed direct response due to the absence of dedicated "interactive" queues. Slaves can come and go.
TGrid Class – Abstract Interface to AliEn
class TGrid : public TObject {
public:
   virtual Int_t        AddFile(const char *lfn, const char *pfn) = 0;
   virtual Int_t        DeleteFile(const char *lfn) = 0;
   virtual TGridResult *GetPhysicalFileNames(const char *lfn) = 0;
   virtual Int_t        AddAttribute(const char *lfn,
                                     const char *attrname,
                                     const char *attrval) = 0;
   virtual Int_t        DeleteAttribute(const char *lfn,
                                        const char *attrname) = 0;
   virtual TGridResult *GetAttributes(const char *lfn) = 0;
   virtual void         Close(Option_t *option="") = 0;
   virtual TGridResult *Query(const char *query) = 0;

   static TGrid *Connect(const char *grid, const char *uid = 0,
                         const char *pw = 0);

   ClassDef(TGrid,0)  // ABC defining interface to GRID services
};
Running PROOF Using AliEn
TGrid *alien = TGrid::Connect("alien");
TGridResult *res;
res = alien->Query("lfn:///alice/simulation/2001-04/V0.6*.root");
TDSet *treeset = new TDSet("TTree", "AOD");
treeset->Add(res);
gROOT->Proof(res);   // use files in result set to find remote nodes
treeset->Process("myselector.C");
// plot/save objects produced in myselector.C
. . .
Future
• Ongoing development
  • Event lists
  • Friend Trees
  • Scalability to O(100) nodes
  • Multi site PROOF sessions
  • The GRID
Demo!
• The H1 example analysis code
  • Use output list for histograms
  • Move fitting to client
• 8 fold H1 example dataset
  • 2.1 Gbyte
  • 2.3 Million Events
Demo!
• Client machine
  • PIII 700 MHz laptop
  • Standard IDE disk
• Cluster with 4 nodes
  • Dual Itanium 2 @ 900 MHz / 4 GB
  • 15K SCSI disk