
Extension of DIRAC to enable distributed computing using Windows resources



  1. Extension of DIRAC to enable distributed computing using Windows resources 3rd EGEE User Forum 11-14 February 2008, Clermont-Ferrand J. Coles, Y. Y. Li, K. Harrison, A. Tsaregorodtsev, M. A. Parker, V. Lyutsarev

  2. Overview • Why port to Windows and who is involved? • DIRAC overview • Porting process • Client (job creation/submission) • Agents (job processing) • Resources • Successes/usage • Deployment • Summary University of Cambridge

  3. Motivation • Aim: • Enable Windows computing resources in DIRAC, the LHCb workload and data management system • Allow what can be done under Linux to be possible under Windows • Motivation: • Increase the number of CPU resources available to LHCb for production and analysis • Offer a service to Windows users • Allow transparent job submission and execution on Linux and Windows • Who’s involved: • Cambridge, Cavendish Laboratory – Ying Ying Li, Karl Harrison, Andy Parker • Marseille, CPPM – Andrei Tsaregorodtsev (DIRAC architect) • Microsoft Research – Vassily Lyutsarev

  4. DIRAC Overview • Distributed Infrastructure with Remote Agent Control • LHCb’s distributed production and analysis workload and data management system • Written in Python • Four sections • Client • User interface • Services • DIRAC Workload Management System, based on the main Linux server • Agents • Lightweight processes that fetch and run jobs on free resources • Resources • CPU resources and data storage

  5. DISET security module • DIRAC Security Transport module – the underlying security module of DIRAC • Provides grid authentication and encryption (using X509 certificates and grid proxies) between the DIRAC components • Uses OpenSSL with pyOpenSSL (DIRAC’s modified version) wrapped around it • Standard: implements Secure Sockets Layer and Transport Layer Security, and contains the cryptographic algorithms • Additional: grid proxy support • Pre-built OpenSSL and pyOpenSSL libraries are shipped with DIRAC • Windows libraries are provided alongside Linux libraries, allowing the appropriate libraries to be loaded at run time • Proxy generation under Windows • Multi-platform command: dirac-proxy-init • Validity of the generated proxy is checked under both Windows and Linux
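The run-time selection of platform-specific libraries described above can be sketched in a few lines of Python. This is an illustration only: the directory names and helper functions below are hypothetical, not DIRAC's actual layout.

```python
import platform
import sys

def select_library_dir(base="pexternals"):
    """Pick the prebuilt-library directory matching the running platform.

    Directory names here are illustrative, not DIRAC's real layout.
    """
    system = platform.system()  # 'Windows', 'Linux', ...
    if system == "Windows":
        return f"{base}/win32"
    if system == "Linux":
        return f"{base}/linux-{platform.machine()}"
    raise RuntimeError(f"Unsupported platform: {system}")

def load_platform_libraries():
    """Prepend the platform-specific directory to sys.path so the
    matching pyOpenSSL/OpenSSL build is imported rather than a system copy."""
    libdir = select_library_dir()
    if libdir not in sys.path:
        sys.path.insert(0, libdir)
    return libdir
```

With the directory on `sys.path` before the first `import OpenSSL`, the same client code runs unchanged on either platform.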

  6. Client – job submissions • Submissions made with a valid grid proxy • Three ways • JDL (Job Description Language) • DIRAC API • Ganga • Built on DIRAC API commands • Currently being ported to Windows • Successful job submission returns a job ID, provided by the Job Monitoring Service

JDL (submitted under Windows with > dirac-job-submit.py myjob.jdl):

SoftwarePackages = { "DaVinci.v12r15" };
InputSandbox = { "DaVinci.opts" };
InputData = { "LFN:/lhcb/production/DC04/v2/00980000/DST/Presel_00980000_00001212.dst" };
JobName = "DaVinci_1";
Owner = "yingying";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = { "std.out", "std.err", "DaVinci_v12r15.log", "DVhbook.root" };
JobType = "user";

DIRAC API (> myjob.py, or entered directly in Python under Windows):

import DIRAC
from DIRAC.Client.Dirac import *
dirac = Dirac()
job = Job()
job.setApplication('DaVinci', 'v12r15')
job.setInputSandbox(['DaVinci.opts'])
job.setInputData(['LFN:/lhcb/production/DC04/v2/00980000/DST/Presel_00980000_00001212.dst'])
job.setOutputSandbox(['DaVinci_v12r15.log', 'DVhbook.root'])
dirac.submit(job)
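The JDL and the API script describe the same job, so JDL text can in principle be generated from a plain attribute dictionary. The serialization helper below is my own illustration of that idea, not part of DIRAC; it covers only the attribute shapes seen in the example (strings and lists of strings).

```python
def to_jdl(params):
    """Serialize a job description dict into JDL-style text.

    Lists become brace-enclosed, comma-separated blocks;
    scalar values are quoted strings.
    """
    lines = []
    for key, value in params.items():
        if isinstance(value, list):
            body = ",\n    ".join(f'"{v}"' for v in value)
            lines.append(f"{key} = {{\n    {body}\n}};")
        else:
            lines.append(f'{key} = "{value}";')
    return "\n".join(lines)

jdl = to_jdl({
    "SoftwarePackages": ["DaVinci.v12r15"],
    "InputSandbox": ["DaVinci.opts"],
    "JobName": "DaVinci_1",
    "JobType": "user",
})
```

The resulting text could then be written to `myjob.jdl` and submitted with the same `dirac-job-submit.py` command on either platform.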

  7. DIRAC Agent under Windows • Python installation script • Downloads and installs the DIRAC software, and sets up the DIRAC Agent • Agents are initiated on free resources • Agent job retrieval: • Run the DIRAC Agent to see if there are any suitable jobs on the server • Agent retrieves any matched jobs • Agent reports the job status to the Job Monitoring Service • Agent downloads and installs the applications required to run the job • Agent retrieves any required data (see next slide) • Agent creates a Job Wrapper to run the job (wrapper is platform aware) • Output uploaded to storage if requested • [Diagram: web monitoring of Linux and Windows sites]
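The retrieval steps above amount to a polling loop. A minimal sketch, with the Matcher, Monitoring, and installer services stubbed out as injected objects (all names here are hypothetical, not DIRAC's actual interfaces):

```python
import time

def agent_cycle(matcher, monitor, installer, wrapper_factory,
                interval=60, max_cycles=None):
    """Run a DIRAC-Agent-style polling loop against stubbed services.

    matcher.request_job()        -> job dict or None if nothing matches
    monitor.report(job_id, s)    -> report job status to monitoring
    installer.ensure(app)        -> install a required application
    wrapper_factory(job)         -> callable that runs the job
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        job = matcher.request_job()               # ask server for a matched job
        if job is not None:
            monitor.report(job["id"], "Matched")  # report status
            for app in job.get("apps", []):       # install required applications
                installer.ensure(app)
            wrapper = wrapper_factory(job)        # platform-aware job wrapper
            wrapper()
            monitor.report(job["id"], "Done")
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval)                  # preset polling interval
    return cycles
```

On a standalone Windows box this loop would run with a local wrapper factory; on a cluster the head node would hand the wrapper to the scheduler instead.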

  8. Data access • Access to LHCb’s distributed data storage system requires: • Access to the LFC (LCG File Catalogue, which maps LFNs (Logical File Names) to PFNs (Physical File Names)) • Access to the Storage Element • On Windows a catalogue client is provided via the DIRAC portal service • Uses DIRAC’s security module DISET and a valid user grid proxy • Authenticates to the Proxy server, which contacts the File Catalogue on the user’s behalf with its own credentials • Uses the .NetGridFTP client 1.5.0 provided by the University of Virginia • Based on GridFTP v1; from tests it appears compatible with the GridFTP server used by LHCb (edg uses GridFTP client 1.2.5-1 and Globus GT2) • Client contains the functions needed for file transfers • get, put, mkdir • And a batch tool that mimics the command flags of globus-url-copy • Requirements: • .Net v2.0 • .NetGridFTP binaries are shipped with DIRAC • Allows full data registration and transfer to any Storage Element supporting GridFTP
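A batch tool that mimics globus-url-copy needs to accept that tool's flag style. The sketch below shows how such a command line could be parsed; it covers only a small subset of flags, modeled loosely on globus-url-copy for illustration, and is not the actual .NetGridFTP tool.

```python
import argparse

def parse_copy_args(argv):
    """Parse a globus-url-copy-style command line (illustrative subset)."""
    parser = argparse.ArgumentParser(prog="netgridftp-copy")
    parser.add_argument("-p", type=int, default=1,
                        help="number of parallel streams")
    parser.add_argument("-vb", action="store_true",
                        help="print transfer progress")
    parser.add_argument("src", help="source URL, e.g. file:///... or gsiftp://...")
    parser.add_argument("dst", help="destination URL")
    return parser.parse_args(argv)

args = parse_copy_args(
    ["-p", "4", "file:///tmp/a.dst", "gsiftp://se.example.org/lhcb/a.dst"])
```

Keeping the flag names compatible means existing scripts written against globus-url-copy need little change when pointed at the Windows client.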

  9. DIRAC CE backends • DIRAC provides a variety of Compute Element backends under Linux: • InProcess (standalone machine), LCG, Condor, etc. • Windows: • InProcess • Agent loops at preset intervals, assessing the status of the resource • Microsoft Windows Compute Cluster • Additional Windows-specific CE backend • Requires one shared installation of DIRAC and the applications on the Head node of the cluster • Agents are initiated from the Head node and communicate with the Compute Cluster Services • Job outputs are uploaded to the sandboxes directly from the worker nodes
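Supporting several backends suggests a common CE interface with per-platform subclasses. A minimal sketch under that assumption (class and method names are hypothetical, and the cluster scheduler is stubbed as a queue):

```python
class ComputeElement:
    """Minimal compute-element backend interface (hypothetical names)."""
    def available_slots(self):
        raise NotImplementedError
    def submit(self, wrapper):
        raise NotImplementedError

class InProcessCE(ComputeElement):
    """Standalone machine: run the job wrapper directly, one job at a time."""
    def __init__(self):
        self.busy = False
    def available_slots(self):
        return 0 if self.busy else 1
    def submit(self, wrapper):
        self.busy = True
        try:
            return wrapper()
        finally:
            self.busy = False

class WindowsComputeClusterCE(ComputeElement):
    """Head-node backend: hand the wrapper to the Compute Cluster
    scheduler (stubbed here as an in-memory queue)."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.queue = []
    def available_slots(self):
        return max(0, self.nodes - len(self.queue))
    def submit(self, wrapper):
        self.queue.append(wrapper)   # real backend would call the CCS API
        return len(self.queue) - 1   # queue index as a stand-in for a job id
```

The agent only ever talks to the `ComputeElement` interface, so adding the Windows Compute Cluster backend does not touch the rest of the agent code.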

  10. LHCb applications • Five main LHCb applications (C++: Gauss, Boole, Brunel, DaVinci; Python: Bender) • Gauss – event generation and detector simulation • Boole – digitalisation • Brunel – reconstruction • DaVinci, Bender – analysis • Data formats: Sim – simulation data format • RAWmc – RAW Monte Carlo, equivalent to the RAW data format from the detector • DST – Data Storage Tape • [Diagram: MC production job runs Gauss → Boole → Brunel; analysis jobs run DaVinci or Bender on DST data; data flow from the detector enters at the RAW stage]

  11. Gauss • Most LHCb applications are compiled for both Linux and Windows • For historical reasons, Microsoft Visual Studio .Net 2003 is used • Gauss – the only application previously not compiled under Windows • Gauss relies on three major pieces of software not developed by LHCb • Pythia6: simulation of particle production – legacy Fortran code • EvtGen: simulation of particle decays – C++ • Geant4: simulation of the detector – C++ • Gauss needs each of the above to run under Windows • Work strongly supported by the LHCb and LCG software teams • All third-party software now successfully built under Windows • Most build errors resulted from the Windows compiler being less tolerant of “risky coding” than gcc • Insists on arguments passed to a function being of the correct type • More strict about memory management • Good for forcing code improvements! • Able to fully build Gauss under Windows with both Generator and Simulation parts • Full Gauss jobs of BBbar events produced, with distributions comparable to those produced under Linux • Gauss v30r4 installed and tested on the Cambridge cluster • Latest release, Gauss v30r5 • First fully Windows-compatible release • Contains pre-built Geant4 and Generator Windows binaries

  12. Cross-platform job submissions • The job creation and submission process is the same under Linux and Windows (i.e. the same DIRAC API commands and the same steps) • Two current types of main LHCb grid jobs • MC production jobs – CPU intensive, no input required; potentially ideal for ‘CPU scavenging’ • Recent efforts (Y. Y. Li, K. Harrison) allowed Gauss to compile under Windows (see previous slide) • A full MC production chain is still to be demonstrated on Windows • Analysis jobs – require input (data, private algorithms, etc.) • DaVinci, Brunel, Boole • Note: require a C++ compiler for customised user algorithms • Jobs submitted with libraries are bound to the same platform for processing • Platform requirements can be added during job submission • Bender (Python) • Note: no compiler, linker or private library required • Allows cross-platform analysis jobs to be performed • Results retrieved to the local computer via: > dirac_job_get_output.py 1234 (results in the output sandbox) • > dirac-rm-get (LFN) (uses GridFTP to retrieve output data from a Grid SE)
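The note that jobs carrying compiled libraries are bound to one platform, while platform requirements can be added at submission, can be sketched as a small helper over a job-description dict. The `Platform` attribute name and the helper itself are my own illustration, not the DIRAC API.

```python
def add_platform_requirement(jdl_params, platform_name):
    """Attach a platform requirement to a job description dict.

    Jobs shipping compiled user libraries must be pinned to the
    platform they were built on; pure-Python (Bender-style) jobs
    can stay unpinned with "any".
    """
    allowed = {"windows", "linux", "any"}
    if platform_name not in allowed:
        raise ValueError(f"unknown platform: {platform_name}")
    params = dict(jdl_params)          # leave the caller's dict untouched
    if platform_name != "any":
        params["Platform"] = platform_name
    return params
```

A DaVinci job with Windows-built user libraries would be submitted pinned to "windows"; a Bender job would use "any" and run wherever a slot is free.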

  13. DIRAC Windows usage • DIRAC is supported on two Windows platforms • Windows XP • Windows Server 2003 • Use of DIRAC to run LHCb physics analysis under Windows • Comparison between DC04 and DC06 data on the B±→D0(Ksπ+π-)K± channel • 917,000 DC04 events processed under Windows per selection run • ~48 hours total CPU time on 4 nodes • A further ~200 jobs (totalling ~4.7 million events) submitted from Windows to DIRAC, processed on LCG, and retrieved on Windows • Further selection background studies are currently being carried out with the system • Processing speed comparisons between Linux and Windows • Difficult, as the Windows binaries are currently built in debug mode by default

  14. DIRAC deployment • [Map: current and future deployment sites]

  15. Windows wrapping • The bulk of the DIRAC Python code was already platform independent • However, not all Python modules are platform independent • Three types of code modifications/additions: • Platform-specific libraries and binaries (e.g. OpenSSL, pyOpenSSL, .NetGridFTP) • Additional Windows-specific code (e.g. Windows Compute Cluster CE backend, .bat files to match Linux shell scripts) • Minor Python code modifications (e.g. changing process forks to threads) • DIRAC installation ~ 60 MB • Per LHCb application ~ 7 GB • Windows port modifications by file size of used DIRAC code: unmodified 60%, modified for cross-platform compatibility 34%, Windows specific 6%
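One of the minor modifications mentioned above, replacing process forks with threads, matters because `os.fork()` does not exist on Windows. A minimal sketch of the portable pattern (the helper name is illustrative):

```python
import threading

def run_job_async(job_fn, *args):
    """Run a job function concurrently using a thread instead of os.fork.

    os.fork() is unavailable on Windows, so a thread (or a
    subprocess.Popen call) is the cross-platform replacement.
    Returns the thread and a dict that will hold the result.
    """
    result = {}
    def target():
        result["value"] = job_fn(*args)
    t = threading.Thread(target=target)
    t.start()
    return t, result

# The caller joins the thread when it needs the result.
t, result = run_job_async(lambda x: x * 2, 21)
t.join()
```

Unlike a fork, the thread shares the parent's memory, so code that relied on copy-on-write isolation needs its state passed in explicitly, which is part of why these changes stayed "minor" rather than mechanical.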

  16. Summary • Working DIRAC v2r11, able to integrate both Windows standalone and cluster CPUs into the existing Linux system • Porting – replacement of Linux-specific Python code and provision of Windows equivalents where platform independence was not possible (e.g. pre-compiled libraries, secure file transfers) • Windows platforms tested: • Windows XP • Windows Server 2003 • Cross-platform job submissions and retrievals • Little change to syntax for the user • Full analysis job cycle on Windows, from algorithm development to analysis of results (Bender → running (Linux) → getting results) • Continued use for further physics studies • All applications for MC production jobs tested • Deployment extended to three sites so far, totalling 100+ Windows CPUs • Two Windows Compute Cluster sites • Future plans: • Test the full production chain • Deploy on further systems/sites, e.g. Birmingham • Larger scale tests • Continued usage for physics studies • Provide a useful tool when LHC data arrives

  17. Backup slides

  18. Cross-platform compatibility

  19. Job submission by user • [Architecture diagram: WMS services (Job Management Service, Job Matcher, Job Monitoring Service, Sandbox Service, LFC Service, Proxy Server) communicating over DISET with a Head Node running the DIRAC Agent, DIRAC Wrapper and Job Watch-dog; a Software Repository provides DaVinci; output goes to the Local SE]
