Experience with Globus Online at Fermilab

Presentation Transcript


  1. Computing Sector, Fermi National Accelerator Laboratory • Experience with Globus Online at Fermilab • GlobusWorld 2012: Experience with GO@Fermilab

  2. Overview • Integration of Workload Management and Data Movement Systems with GO • Center for Enabling Distributed Petascale Science (CEDPS): GO integration with glideinWMS • Data Handling prototype for Dark Energy Survey (DES) • Performance tests of GO over 100 Gbps networks • GO on the Advanced Network Initiative (ANI) testbed • Data Movement on OSG for end users • Network for Earthquake Engineering Simulation (NEES)

  3. Fermilab’s interest in GO • Data Movement service for end users • Supporting user communities on the Grid • Evaluating GO services in the workflows of our stakeholders • Data Movement service integration • Evaluate GO as a component of middleware systems, e.g. the glidein Workload Management System (glideinWMS) • Evaluate performance of GO for exa-scale networks (100 GE)

  4. 1. CEDPS • CEDPS: a five-year project (2006-2011) funded by the Department of Energy (DOE) • Goals • Produce technical innovations for rapid and dependable data placement within a distributed high-performance environment and for the construction of scalable science services for data and computing from many clients. • Address performance and functionality troubleshooting of these and other related distributed activities. • Collaborative Research • Mathematics & Computer Science Division, Argonne National Laboratory • Computing Division, Fermi National Accelerator Laboratory • Lawrence Berkeley National Laboratory • Information Sciences Institute, University of Southern California • Dept of Computer Science, University of Wisconsin-Madison • Collaborative work done by Fermi National Lab, Argonne National Lab, University of Wisconsin • Supporting the integration of data movement mechanisms with the scientific glidein workload management system • Integration of asynchronous data stage-out mechanisms in overlay workload management systems

  5. glideinWMS • Pilot-based WMS that creates, on demand, a dynamically sized overlay Condor batch system on Grid resources to address the complex needs of VOs in running application workflows • User Communities • CMS • Communities at Fermilab • CDF • DZero • Intensity Frontier Experiments (MINOS, MINERvA, NOvA …) • OSG Factory at UCSD & Indiana Univ. • Serves OSG VO Frontends, including IceCube, Engage, LSST, … • CorralWMS - Frontend for TeraGrid community • ATLAS - Evaluating glideinWMS interfaced with the PanDA framework for their analysis framework • User community growing rapidly

  6. glideinWMS Scale of Operations • CMS Factory@CERN serving ~400K jobs • OSG Factory@UCSD serving ~200K jobs • CMS Analysis Frontend@UCSD serving pool with ~25K jobs • CMS Frontend@CERN serving pool with ~50K jobs • [Figures: CMS Production Factory (up) & Frontend at CERN; OSG Factory & CMS Analysis at UCSD]

  7. Integrating glideinWMS with GO • Goals: • Middleware handles data movement, rather than the application • Middleware optimizes use of computing resources (CPUs do not block on data movement) • Users provide data movement directives in the Job Description File (JDF), e.g. storage services for IO • glideinWMS procures resources on the Grid and runs jobs using Condor • Data movement is delegated to the underlying Condor system • globusconnect is instantiated and the GO plug-in is invoked using the directives in the JDF • Condor optimizes resources • [Diagram: glideinWMS (Glidein Factory, WMS Pool); VO Infrastructure (VO Frontend, Condor Central Manager, Condor Scheduler, Job); Grid Site Worker Node (glidein, Condor Startd); globusonline.org]
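
The hand-off on this slide, where Condor invokes a transfer plug-in driven by JDF directives, can be sketched roughly as follows. This is a minimal illustration, not the CEDPS plug-in itself. It assumes the Condor file-transfer plug-in convention in which the plug-in prints a small ClassAd when called with `-classad` and is otherwise called with a source and a destination. The `globusonline://user:endpoint:path` URL layout and the names `GoDestination`, `parse_go_url` and `run_plugin` are hypothetical, and no call to GO or globusconnect is made.

```python
from typing import NamedTuple


class GoDestination(NamedTuple):
    """Pieces of a (hypothetical) globusonline:// destination URL."""
    user: str
    endpoint: str
    path: str


def capabilities_classad(methods=("globusonline",)):
    """Text a file-transfer plug-in prints when Condor calls it with -classad."""
    return (
        'PluginVersion = "0.1"\n'
        'PluginType = "FileTransfer"\n'
        'SupportedMethods = "' + ",".join(methods) + '"\n'
    )


def parse_go_url(url):
    """Split an assumed globusonline://user:endpoint:path URL into its parts."""
    scheme, sep, rest = url.partition("://")
    if not sep or scheme != "globusonline":
        raise ValueError("not a globusonline:// URL: " + url)
    parts = rest.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected user:endpoint:path in " + url)
    return GoDestination(*parts)


def run_plugin(argv):
    """Return the exit code Condor would see (0 means success)."""
    if argv[1:] == ["-classad"]:
        print(capabilities_classad(), end="")
        return 0
    if len(argv) != 3:
        return 1
    source, destination = argv[1], argv[2]
    try:
        dest = parse_go_url(destination)
    except ValueError:
        return 1
    # A real plug-in would start globusconnect here and ask GO to move
    # `source` to dest.endpoint:dest.path; this sketch only reports the plan.
    print("would transfer {} -> {}:{} as {}".format(
        source, dest.endpoint, dest.path, dest.user))
    return 0
```

In an installed plug-in script the last line would be something like `sys.exit(run_plugin(sys.argv))`, so that Condor sees the exit code; the job itself never has to wait on the transfer logic.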

  8. Validation Test Results • Tests – Modified Intensity Frontier experiment (MINERvA) jobs to transfer the output sandbox to a GO endpoint using the transfer plugin • Jobs: 2636, with 500 running at a time • Total files transferred: 16359 • Up to 500 dynamically created GO endpoints at a given time. • Lessons Learned • Integration tests successful with 95% transfer success rate -- stressing scalability of GO in an unintended way • GO team working on the scalability issues identified • Efficiency and scalability can be increased by modifying the plugin to reuse GO endpoints and by transferring multiple files at the same time.
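
The last lesson, transferring several files per request instead of one, can be illustrated with a small sketch. The job and file counts come from the slide; the helper names `batch` and `requests_needed` and the batch size of 6 are illustrative assumptions, not part of the plug-in.

```python
JOBS = 2636    # jobs in the validation test (from the slide)
FILES = 16359  # total files transferred (from the slide)

# Average number of output files each job had to ship.
files_per_job = FILES / JOBS


def batch(items, size):
    """Group items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def requests_needed(n_files, batch_size):
    """Number of transfer requests if each request carries `batch_size` files."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return -(-n_files // batch_size)  # ceiling division


print("files per job: {:.1f}".format(files_per_job))
print("one file per request:", requests_needed(FILES, 1))
print("six files per request (hypothetical):", requests_needed(FILES, 6))
```

With roughly six files per job, sending each job's output as one request would cut the number of requests by about the same factor, under the assumption that per-request overhead dominates.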

  9. 2. Prototype integration of GO with DES Data Access Framework • Motivation • Support Dark Energy Survey preparation for data taking • See Don Petravick’s talk on Wed • DES Data Access Framework (DAF) uses a network of GridFTP servers to reliably move data across sites. • In Mar 2011, we investigated the integration of DAF with GO to address 2 issues: • DAF data transfer parameters were not optimal for both small and large files. • Reliability was implemented inefficiently by sequentially verifying real file size with DB catalogue.

  10. Results and improvements • Tested DAF moving 31,000 files (184 GB) with GO vs. UberFTP • Results • Time for Transfer + Verification is the same (~100 min) • Transfer time is 27% faster with GO than with UberFTP • Verification time is 50% slower with GO than sequentially with UberFTP • Proposed Improvements: • Allow specification of src / dest transfer reliability semantics (e.g. same size, same CRC, etc.) – Implemented for size • Allow finer-grain failure model (e.g. specify number of transfer retries instead of time deadline) • Provide interface for efficient (pipelined) ls of src / dest files.
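
The three results on this slide are enough to estimate how the ~100 minutes split between transfer and verification. The sketch below is a back-of-the-envelope reading, not a reported measurement: it assumes "27% faster" means the GO transfer time is 27% shorter and "50% slower" means the GO verification time is 50% longer. Under a different reading (for example, 27% higher transfer rate) the numbers change. The function name `split_times` is illustrative.

```python
def split_times(total_min, transfer_gain, verify_penalty):
    """Estimate transfer and verification minutes for UberFTP and GO.

    With T and V the UberFTP transfer and verification times:
        T + V = total_min
        (1 - transfer_gain) * T + (1 + verify_penalty) * V = total_min
    Subtracting the two gives transfer_gain * T = verify_penalty * V.
    """
    ratio = transfer_gain / verify_penalty  # V / T
    t_uber = total_min / (1.0 + ratio)
    v_uber = total_min - t_uber
    return {
        "uberftp": (t_uber, v_uber),
        "go": ((1.0 - transfer_gain) * t_uber, (1.0 + verify_penalty) * v_uber),
    }


estimate = split_times(total_min=100.0, transfer_gain=0.27, verify_penalty=0.50)
for tool, (transfer, verify) in estimate.items():
    print("{}: transfer {:.1f} min, verification {:.1f} min".format(
        tool, transfer, verify))

# Mean file size, taking 184 GB as 184e9 bytes (an assumption about units).
mean_file_mb = 184e9 / 31000 / 1e6
print("mean file size: {:.1f} MB".format(mean_file_mb))
```

Under these assumptions verification takes about as long as the transfer itself with GO, which is consistent with the proposed improvement of a pipelined listing interface.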

  11. 3. GO on the ANI Testbed • Motivation: Testing Grid middleware readiness to interface with 100 Gbit/s links on the Advanced Network Initiative (ANI) Testbed. • Characteristics: • GridFTP data transfers (small, medium, large, all sizes) • 300 GB of data split into 42432 files (8 KB – 8 GB) • Network: aggregate 3 x 10 Gbit/s to bnl-1 test machine • Local tests (reference) initiated on bnl-1 • FNAL and GO tests: initiated on “FNAL initiator”; GridFTP control forwarded through “VPN gateway” • Work by Dave Dykstra w/ contrib. by Raman Verma & Gabriele Garzoglio
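
For scale, the test set and link figures on this slide imply the following rough numbers. This is simple arithmetic under the assumption that 300 GB means 300e9 bytes and that the full 3 x 10 Gbit/s aggregate could be used with no protocol overhead, so the time is a lower bound rather than an expected result.

```python
DATASET_BYTES = 300 * 10**9   # 300 GB, decimal units assumed
FILE_COUNT = 42432            # from the slide
LINK_BPS = 3 * 10 * 10**9     # aggregate 3 x 10 Gbit/s

# Mean file size across the whole test set (sizes range from 8 KB to 8 GB).
mean_file_bytes = DATASET_BYTES / FILE_COUNT

# Best-case time to move the whole set at full line rate.
ideal_seconds = DATASET_BYTES * 8 / LINK_BPS

print("mean file size: {:.2f} MB".format(mean_file_bytes / 1e6))
print("ideal transfer time: {:.0f} s".format(ideal_seconds))
```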

  12. Test results • GO (yellow) does almost as well as practical max (red) for medium-size files. • Working with GO to improve transfer parameters for big and small files. • Small files have very high overhead over wide area control channels • GO auto-tuning works better for medium files than for the large files • Counterintuitively, increasing concurrency and pipelining on small files reduced the transfer throughput. • Work by Dave Dykstra w/ contrib. by Raman Verma & Gabriele Garzoglio
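
Why small files suffer over wide-area control channels can be seen with a deliberately simple model: each file costs a fixed number of control-channel round trips plus its size divided by the link rate. The 50 ms round-trip time, the single round trip per file and the 10 Gbit/s link are hypothetical values chosen for illustration, not testbed measurements, and the model does not attempt to explain the counterintuitive concurrency and pipelining result.

```python
def transfer_seconds(size_bytes, rtt_s, link_bps, round_trips=1):
    """Time for one file: control-channel waits plus time on the wire."""
    return round_trips * rtt_s + size_bytes * 8 / link_bps


def link_efficiency(size_bytes, rtt_s, link_bps, round_trips=1):
    """Fraction of the transfer time actually spent moving data."""
    wire = size_bytes * 8 / link_bps
    return wire / transfer_seconds(size_bytes, rtt_s, link_bps, round_trips)


RTT_S = 0.05          # hypothetical wide-area round-trip time
LINK_BPS = 10 * 10**9  # hypothetical single 10 Gbit/s path

for label, size in (("8 KB", 8 * 10**3), ("8 MB", 8 * 10**6), ("8 GB", 8 * 10**9)):
    eff = link_efficiency(size, RTT_S, LINK_BPS)
    print("{}: {:.4%} of the time spent on data".format(label, eff))
```

In this model an 8 KB file keeps the link busy for a tiny fraction of its transfer time, while an 8 GB file is almost entirely wire time, which matches the direction of the observation on the slide.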

  13. 4. Data Movement on OSG for NEES • A. R. Barbosa, J. P. Conte, J. I. Restrepo, UCSD • Motivation • Supporting the NEES group at UCSD to run computations on the Open Science Grid (OSG) • Goal • Perform parametric studies that involve large-scale nonlinear models of structure or soil-structure systems with a large number of parameters and OpenSees runs. • Application example • Nonlinear time-history (NLTH) analyses of an advanced nonlinear finite element (FE) model of a building • Probabilistic seismic demand hazard analysis making use of the “cloud method”: 90 bi-directional historical earthquake records • Sensitivity of probabilistic seismic demand to FE model parameters • 30 days on OSG vs. 12 yrs on Desktop

  14. Success and challenges • Jobs submitted from RENCI (NC) to ~20 OSG sites. Output collected at RENCI. • NEES scientist moved 12 TB from the RENCI server to the user’s desktop at UCSD using GO • Operations: every day, set up the data transfer update for the day: fire and forget …almost… • …there is still no substitute for a good network administrator • Initially, we had 5 Mbps → eventually 200 Mbps (over a 600 Mbps link). Improvements: • Upgrade eth card on user desktop • Migrate from Windows to Linux • Work with the user to use GO • Find a good net admin to find and fix broken fiber at RENCI, when nothing else worked. • Better use of GO on OSG: Integrate GO with the Storage Resource Manager (SRM)
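
The practical weight of the jump from 5 Mbps to 200 Mbps is clearer when converted into transfer time for the 12 TB data set. The sketch assumes 12 TB means 12e12 bytes and a sustained rate with no retries or idle periods, so the figures are idealised.

```python
SECONDS_PER_DAY = 86400


def transfer_days(size_bytes, rate_bps):
    """Days needed to move size_bytes at a sustained rate in bits per second."""
    return size_bytes * 8 / rate_bps / SECONDS_PER_DAY


DATASET_BYTES = 12 * 10**12  # 12 TB, decimal units assumed

for label, rate in (("5 Mbps", 5 * 10**6),
                    ("200 Mbps", 200 * 10**6),
                    ("600 Mbps (link capacity)", 600 * 10**6)):
    print("{}: {:.1f} days".format(label, transfer_days(DATASET_BYTES, rate)))
```

At the initial rate the transfer would have taken most of a year; at the final rate it fits in under a week, while still using only about a third of the 600 Mbps link.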

  15. Conclusions • Fermilab has worked with the GO team to improve the system for several use cases: • Integration with glidein Workload Management – Stress the “many-globusconnect” dimension • Integration with Data Handling for DES – New requirements on reliability semantics • Evaluation of performance over 100 Gbps networks – Verify transfer parameters auto-tuning at extreme scale • Integrate GO with NEES for regular operations on OSG – Usability for GO’s intended usage

  16. Acknowledgments • Globus Online team for their support in all of these activities. • Integration of glideinWMS and globusonline.org was done as a part of the CEDPS project • glideinWMS infrastructure is developed at Fermilab in collaboration with the Condor team from Wisconsin and High Energy Physics experiments. • Most of the glideinWMS development work is funded by the USCMS (part of CMS) experiment. • Currently used in production by CMS, CDF, DZero, MINOS and IceCube, with several other VOs evaluating it for their use case. • The Open Science Grid (OSG) • Fermilab is operated by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the United States Department of Energy.

  17. References • CEDPS Report: GO Stress Test Analysis: https://cd-docdb.fnal.gov:440/cgi-bin/RetrieveFile?docid=4474;filename=GlobusOnline%20PluginAnalysisReport.pdf;version=1 • DES DAF Integration with GO: https://www.opensciencegrid.org/bin/view/Engagement/DESIntegrationWithGlobusonline • GridFTP & GO on the ANI Testbed: https://docs.google.com/document/d/1tFBg7QVVFu8AkUt5ico01vXcFsgyIGZH5pqbbGeI7t8/edit?hl=en_US&pli=1 • OSG User Support of NEES: https://www.opensciencegrid.org/bin/view/Engagement/EngageOpenSeesProductionDemo
