Globus Online + Panda: a brief summary. Maxim Potekhin for the BNL PAS team Brookhaven National Laboratory ATLAS S&C Week, March 13th 2012. Overview. For more info, please see my Globus presentation this coming Thursday How does Globus Online work and why we are considering it?
Globus Online + Panda: a brief summary
Maxim Potekhin for the BNL PAS team
Brookhaven National Laboratory
ATLAS S&C Week, March 13th 2012
Overview
• For more info, please see my Globus presentation this coming Thursday
• How does Globus Online work and why are we considering it?
• Globus Online in the context of Panda: what functionality has been implemented and what we can offer in the near term
• Credits: Carlos Contreras, Wensheng Deng, Shuwei Ye, Horst Severini
How does it work?
From the Globus Online Web Site: “Globus Online is a hosted service that automates the tasks associated with moving files between sites, or “endpoints.” In a nutshell, here’s how to transfer files using the Web interface:
• Log in to Globus Online
• Select “Start Transfer” on the “Go To” drop-down menu
• Specify the source and destination endpoints, and the files to move
• Click an arrow button to initiate the transfer”
https://www.globusonline.org/howitworks/

Why did we become interested?
In addition to forming the basis of ATLAS computing, Panda has occasionally been used by a number of “non-ATLAS” VOs within the Open Science Grid. One of the challenges in enabling these organizations to use Panda was the absence of a sufficiently lightweight and generic data movement subsystem, independent of DDM/DQ2. This was partially resolved by using a variety of one-off solutions.
In 2011, Globus Online was presented at the OSG All-Hands meeting, and its feature set was a near-perfect match to the functionality we had been missing in serving the needs of small organizations and research teams.
How does it work? (diagram: Source and Destination endpoints)
1. User initiates transfer request
2. Data transfer from Source to Destination
3. Globus Online notifies user
Features
API
In addition to the Web interface, Globus Online offers other APIs:
• HTTP
• Full-fledged gsissh interface
• Python binding

Globus Connect
Globus Connect is a portable client, available in Linux, Windows and Mac versions, which can be used to create Globus endpoints on demand on the machine of the user’s choice (such as their laptop or their analysis workstation). In the Linux environment, it can be dynamically downloaded and installed if needed.
Why might it be useful for ATLAS?
Hypothetical use case
During analysis, a user would like to run a series of filters on data that resides at a few different sites. In addition to having the resulting data written to the “usual” locations at the processing sites, the user wants the resulting body of data conveniently placed in one location, such as their workstation or laptop, or any GridFTP site. The transfers can be monitored on the Web page or from a Python client if needed.
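The consolidation step in this use case can be pictured as a list of per-file transfer requests that all point at one destination. The sketch below only builds that list in memory; it does not call Globus Online. The names TransferRequest and plan_consolidation, and the endpoint names in the usage note, are hypothetical and not part of Panda or Globus Online.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRequest:
    """One file to move from a processing site to the user's chosen location."""
    source_endpoint: str
    source_path: str
    dest_endpoint: str
    dest_path: str


def plan_consolidation(outputs, dest_endpoint, dest_dir):
    """Build one request per output file, all targeting a single directory.

    outputs maps a source endpoint name to the list of file paths it holds.
    File names are kept as-is, so two sites producing the same file name
    would collide at the destination; a real tool would have to handle that.
    """
    base = dest_dir.rstrip("/")
    requests = []
    for source_endpoint, paths in sorted(outputs.items()):
        for path in paths:
            filename = path.rsplit("/", 1)[-1]
            requests.append(
                TransferRequest(source_endpoint, path,
                                dest_endpoint, base + "/" + filename)
            )
    return requests
```

For example, plan_consolidation({"user#SITE_A": ["/out/a1.root"]}, "user#laptop", "/data/results/") yields a single request whose destination path is /data/results/a1.root.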
Running with Globus-aware ATLAS Pilot
• Pilot submission to Panda follows the usual technique (Pilot Scheduler in the past, APF going forward). The pilot type must point to a Globus-enabled version of the Pilot.
• The user submits their job using prun, with entries in the “metadata” section specifying the parameters of the data transfer.
• The job runs and the output data is delivered to the destination.

In our testing, we had to re-use existing command line options in prun; we will request changes to make them more user-friendly, so the exact prun option syntax is a work in progress.

Example:
  --exec "/usatlas/u/wdeng/prod_test/test/hadd_wrap.py 900b1a06-5d44-4a88-8394-bbf1ca090ccd_0.HIST.root `echo %IN | sed 's/,/ /g'`"
  --site ANALY_BNL_T3
  "--athenaTag=AtlasPhysics,16.6.4.1.1"
  --noBuild
  --outDS user.wdeng.0a5c4f41-facf-4af6-b21a-b3230efb7c43.HIST
  --outputs 900b1a06-5d44-4a88-8394-bbf1ca090ccd_0.HIST.root
  --inDS data10_7TeV.00165954.physics_CosmicCalo.recon.HIST.r1608_tid176557_00
  "--nFiles=2"
  "--nFilesPerJob=2"
  --useChirpServer userendpoint:wdeng#mytest111:/tmp/wdeng/test_go/

Note that Wensheng specified his own endpoint, as defined in Globus Online, as the destination for the data.
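The destination string at the end of the example appears to pack three pieces of information: the Globus Online user, the endpoint name, and a path on that endpoint. A minimal sketch of how such a string could be taken apart follows. The layout "userendpoint:<user>#<endpoint>:<path>" is inferred from this single example, and parse_destination is a hypothetical helper, not the Pilot's actual parser.

```python
from typing import NamedTuple


class GlobusDestination(NamedTuple):
    """Pieces of a 'userendpoint:<user>#<endpoint>:<path>' string."""
    user: str
    endpoint: str
    path: str


def parse_destination(spec: str) -> GlobusDestination:
    """Split a destination string into user, endpoint name, and path.

    Assumption: the format is exactly the one seen in the slide example.
    """
    prefix = "userendpoint:"
    if not spec.startswith(prefix):
        raise ValueError(f"expected a string starting with {prefix!r}: {spec!r}")
    # What follows the prefix is '<user>#<endpoint>:<path>'.
    qualified_name, colon, path = spec[len(prefix):].partition(":")
    if not colon or not path:
        raise ValueError(f"missing ':<path>' part in {spec!r}")
    user, hash_sign, endpoint = qualified_name.partition("#")
    if not hash_sign or not user or not endpoint:
        raise ValueError(f"expected '<user>#<endpoint>' in {spec!r}")
    return GlobusDestination(user, endpoint, path)


if __name__ == "__main__":
    print(parse_destination("userendpoint:wdeng#mytest111:/tmp/wdeng/test_go/"))
```

Rejecting malformed strings early, before a job is submitted, would avoid discovering a bad destination only after the payload has already run.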
Use Case with the Generic Pilot
To better illustrate which options and semantics may be used, let’s consider the “Generic Pilot” use case, where processing is done for a non-ATLAS user.

  ./sendJob.py --njobs 1 --computingSite TEST3 --transformation http://www.usatlas.bnl.gov/~mxp/panda/transformations/maxim_test.sh --prodSourceLabel user --cloud OSG --jobParameters '\
  globus-user=mxp \
  in-mode=server \
  out-mode=server \
  files-in=xs.sh dir-in=/direct/usatlas+u/mxp/ \
  files-out=xs.sh dir-out=/home/usatlas1/ \
  globus-endpoint-in=mxp#MXP_BNL_TEST \
  globus-endpoint-local=mxp#MXP_BNL_TEST \
  globus-endpoint-out=mxp#MXP_OU_TEST'

The “modes” can be any of the following: server (GridFTP endpoint), local (local file copy), gc (Globus Connect).
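The jobParameters value is a flat series of key=value pairs, which makes it easy to check before submission. The sketch below turns such a string into a dictionary and verifies that in-mode and out-mode use one of the three modes listed above. parse_job_parameters is a hypothetical helper written for illustration; it is not part of sendJob.py or the Pilot, and the way it treats backslashes is an assumption.

```python
VALID_MODES = {"server", "local", "gc"}  # the three modes named in the slides


def parse_job_parameters(text: str) -> dict:
    """Turn 'key=value key=value ...' into a dict and validate the modes.

    Assumption: backslashes only serve as line continuations, so they are
    treated as whitespace, and values never contain spaces.
    """
    params = {}
    for token in text.replace("\\", " ").split():
        key, equals, value = token.partition("=")
        if not equals or not key:
            raise ValueError(f"not a key=value pair: {token!r}")
        params[key] = value
    for mode_key in ("in-mode", "out-mode"):
        mode = params.get(mode_key)
        if mode is not None and mode not in VALID_MODES:
            raise ValueError(
                f"{mode_key}={mode!r} is not one of {sorted(VALID_MODES)}"
            )
    return params


if __name__ == "__main__":
    example = (
        "globus-user=mxp \\\n in-mode=server \\\n out-mode=server \\\n"
        " files-in=xs.sh dir-in=/direct/usatlas+u/mxp/ \\\n"
        " globus-endpoint-out=mxp#MXP_OU_TEST"
    )
    for key, value in parse_job_parameters(example).items():
        print(f"{key} = {value}")
```

Only the first "=" in each token is used as the separator, so values that themselves contain "=" are preserved.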
List of Endpoints
Maintenance of endpoints on the Globus Online portal (screenshot)
Conclusions
New functionality
• On-demand, optional data transfer to any GridFTP server or workstation, with error handling, retries and extensive monitoring
Status
• Prototype tested with actual payload
Issues
• Better understanding of use cases
• Scalability of GridFTP instances