BaBar WEB job submission with Globus authentication and AFS access
T. Adye, R. Barlow, A. Forti, A. McNab, S. Salih, D. H. Smith, on behalf of the BaBar computing group
Introduction
• BaBar computing is evolving towards a distributed model rather than a centralized one. The main goal is to allow all the physicists of the collaboration to have access to all the resources.
• As an exercise to highlight what we need from more sophisticated middleware, we have tried to solve some of these problems with existing technology in two ways.
Introduction
• The first way, which resulted in the BaBarGrid demonstrator, is run through a WEB browser on the user's laptop or desktop and doesn't require any supplementary software on that platform.
• The second way is to use Globus as an extended batch-system command line on a system with AFS access; the aim is to simplify the input/output sandbox problem through a shared file system. The AFS tokens are maintained using gsiklog.
Components
• Common to both:
  • BaBar VO
  • Generic accounts
  • Globus authentication and authorization
  • globus command line tools
  • Data location according to user specifications, done with the BaBar metadata catalog
• Different:
  • WEB browser and http server
  • AFS
BaBar VO (Virtual Organization)
• Any BaBar grid user has by definition a Grid certificate from an accepted authority, and an account on the central SLAC system with BaBar authorisation in the AFS ACL list.
• Users can register for BaBarGrid use just by copying their DN (Distinguished Name) into a file in their home area at SLAC.
• A cron job then picks this up and sends it to the central BaBar VO machine after checking the AFS ACL lists.
• With another cron job, all participating sites pick up the list of authorized BaBar users and insert it into their gridmap files under the generic userid .babar (see the sketch below).
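A minimal sketch of the site-side step, assuming the VO list pulled from the central machine is a plain text file with one DN per line; the file paths and the exact update policy are illustrative, not those of the actual BaBarGrid cron job:

```python
#!/usr/bin/env python
# Illustrative sketch: refresh the BaBar entries in a site's grid-mapfile.
# Paths and the update policy are assumptions, not the actual BaBarGrid scripts.

VO_LIST = "/etc/grid-security/babar-vo-dns.txt"   # DNs pulled from the central VO machine
GRIDMAP = "/etc/grid-security/grid-mapfile"
GENERIC_ACCOUNT = ".babar"                        # generic userid used at each site

def main():
    with open(VO_LIST) as f:
        dns = [line.strip() for line in f if line.strip()]

    # Keep existing non-BaBar mappings, replace the BaBar ones.
    with open(GRIDMAP) as f:
        kept = [l for l in f if not l.rstrip().endswith(GENERIC_ACCOUNT)]

    with open(GRIDMAP, "w") as f:
        f.writelines(kept)
        for dn in dns:
            f.write('"%s" %s\n' % (dn, GENERIC_ACCOUNT))

if __name__ == "__main__":
    main()
```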
VO maintenance (2)
• The local system manager retains the power to modify the cron job that pulls the gridmap file.
• With the generic userids there is no need to create accounts for each user at each site.
• It is straightforward to ensure that these generic accounts have low levels of privilege, and local users are given priority over ones from outside.
• This system has proved easy to operate and reliable.
input sandbox (1)
• For each job one requires:
  • the binary and the data files
  • a set of .tcl files:
    • a .tcl file specifying all the data files for this job
    • a small .tcl file that pulls in the others
    • a large .tcl file containing standard procedural stuff
  • various other .dat files
  • the calibration (conditions) database
  • the setting of appropriate environment variables
  • the presence of some dynamic (shared) libraries
input sandbox (2)
• For BaBar this is a particular problem because the job is assumed to run in a 'test release directory' in which all these files are made available through pointers to a parent release.
• Alternatives for this problem are:
  • to run only at sites where the desired parent release is available: too restrictive;
  • to provide these files and ship them (demonstrator);
  • to run from within an AFS directory, using gsiklog to gain access to the test and parent releases and cd'ing to the test release as the very first step of each job (job submission within AFS).
data location (1)
• Data location is done through a metadata catalog.
• Each site has a slightly modified replica of the central catalog in which collections (root files or Objectivity collections) on local disk are flagged. Each catalog allows read access from outside.
• Users can make their own specification for the data.
• They then provide an ordered list of sites. The system locates the matching data available at the first site by querying its catalog. It then queries the second site for the matching data that wasn't at the first one, and this is repeated through the site list (see the sketch below).
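A minimal sketch of this ordered-site lookup, assuming a hypothetical query_catalog(site, spec) helper that returns the names of the matching collections flagged as on disk at that site:

```python
def locate_data(sites, spec, query_catalog):
    """Walk the user's ordered list of sites, assigning each matching collection
    to the first site that holds it on disk. query_catalog(site, spec) is a
    hypothetical helper returning a set of collection names."""
    located = {}          # collection name -> site
    for site in sites:
        found = query_catalog(site, spec)
        for collection in found:
            if collection not in located:   # keep the earliest site in the user's order
                located[collection] = site
    return located
```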
data location (2)
• The previous method has been improved by adding to the metadata catalog the list of sites and the indexes that uniquely identify in the catalog each collection on disk at each site.
• This has improved the speed of the query, because the data selection is done only once, on a local database.
• A user no longer has to give a list of sites, but can manually exclude sites if needed.
• Jobs are split according to the sites holding the data and to user specifications such as the number of events to be processed in each job (see the sketch below).
• If some data exist at more than one site, this is reported in an index file that maps the tcl files to site names.
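A minimal sketch of the job-splitting step, under the assumption that the catalog query returns, for each collection, its site and event count; the data structures and the events-per-job parameter are illustrative:

```python
from collections import defaultdict

def split_jobs(collections, max_events_per_job):
    """collections: list of (name, site, n_events) tuples from the catalog query.
    Returns {site: [job, job, ...]}, each job being a list of collection names
    whose summed event count respects the user's per-job limit."""
    by_site = defaultdict(list)
    for name, site, n_events in collections:
        by_site[site].append((name, n_events))

    jobs = defaultdict(list)
    for site, colls in by_site.items():
        current, count = [], 0
        for name, n_events in colls:
            if current and count + n_events > max_events_per_job:
                jobs[site].append(current)          # close the current job
                current, count = [], 0
            current.append(name)
            count += n_events
        if current:
            jobs[site].append(current)
    return jobs
```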
demonstrator job submission
• The user creates a grid proxy and then uploads it to the server.
• This provides a single authorisation entry point: the server then uses this certificate to authenticate the globus job submission.
• The server can then submit jobs to the remote sites on behalf of the user using globus-job-submit.
• Job submission is done by a cgi perl script running in a web server (a sketch of the submission step follows).
• There is no resource matching other than on the data.
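The actual implementation is a CGI Perl script; the following is only a Python sketch of the server-side submission step, assuming the uploaded proxy has been saved to a local file, and with illustrative gatekeeper contact strings and wrapper paths:

```python
import os
import subprocess

def submit(site_contact, wrapper_script, proxy_path, args=()):
    """Submit one job to a remote Globus gatekeeper with globus-job-submit,
    authenticating with the proxy uploaded by the user.
    site_contact, wrapper_script and proxy_path are illustrative values."""
    env = dict(os.environ, X509_USER_PROXY=proxy_path)
    cmd = ["globus-job-submit", site_contact, wrapper_script] + list(args)
    out = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return out.stdout.strip()   # globus-job-submit prints the job contact URL

# e.g. (hypothetical values)
# job_id = submit("gridgate.example.ac.uk/jobmanager-pbs",
#                 "/path/to/babar_wrapper.sh",
#                 "/tmp/proxies/hyperjob42.pem")
```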
demonstrator job submission
• Data selection in this case is done by querying all the sites.
• Jobs are grouped according to the sites where they will be submitted, for convenience when collecting the output.
• Each of these groups is called a superjob and is assigned a superjobid.
• The totality of the superjobs is called a hyperjob and is assigned a unique id, the hyperjobid.
• For each superjob there is a Job0 in which the input sandbox is copied to the remote site; the other jobs in the group just follow (see the sketch below).
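A minimal sketch of this grouping, assuming jobs have already been split per site as above; the id scheme and the modelling of Job0 as a dedicated sandbox-staging job are assumptions:

```python
import uuid

def build_hyperjob(jobs_by_site):
    """jobs_by_site: {site: [job_spec, ...]} from the splitting step.
    Returns (hyperjobid, superjobs); in each superjob, Job0 stages the
    input sandbox (modelled here as a dedicated job, an assumption) and
    the analysis jobs follow."""
    hyperjobid = uuid.uuid4().hex[:8]            # illustrative id scheme
    superjobs = []
    for n, (site, jobs) in enumerate(sorted(jobs_by_site.items())):
        numbered = [("Job0", {"stage_input_sandbox": True})]
        numbered += [("Job%d" % (i + 1), spec) for i, spec in enumerate(jobs)]
        superjobs.append({"superjobid": "%s-%d" % (hyperjobid, n),
                          "site": site,
                          "jobs": numbered})
    return hyperjobid, superjobs
```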
demonstrator output sandbox
• As each job finishes, it moves its output file to a directory /path/<superjobid> and then tars together all the files there.
• The user can then request that the outputs be collected on a machine local to the http server, which has spare disk space and can run grid-ftp. This machine copies all the superjob outputs into one directory, /path/<hyperjobid> (a sketch follows).
• A link is provided and a specific MIME type given. When the link is clicked, the hyperjob directory is downloaded to the desktop machine, where the application associated with this MIME type has been arranged to unpack the directory and run a standard analysis job on it to draw the desired histograms.
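A minimal sketch of the collection step on the output-collector machine, assuming globus-url-copy is used for the grid-ftp transfers; the host names and directory layout are illustrative:

```python
import os
import subprocess

def collect_outputs(hyperjobid, superjobs, dest_root="/path"):
    """Copy each superjob's tarred output back with globus-url-copy (grid-ftp)
    into one hyperjob directory. Host names and paths are illustrative."""
    dest_dir = os.path.join(dest_root, hyperjobid)
    os.makedirs(dest_dir, exist_ok=True)
    for sj in superjobs:
        src = "gsiftp://%s/path/%s.tar" % (sj["gatekeeper_host"], sj["superjobid"])
        dst = "file://%s/%s.tar" % (dest_dir, sj["superjobid"])
        subprocess.run(["globus-url-copy", src, dst], check=True)
    return dest_dir
```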
[Diagram: demonstrator architecture — the user's WEB browser talks to the http server, which performs data location against the metadata catalog, transfers the input sandbox, submits jobs to the remote sites with globus_job_submit, and retrieves the output through the output collector.]
job submission with afs
• All BaBar software and the user's test release (working directory) are in AFS.
• There is no need to provide and ship data or tcl files, because these can be accessed through links to the parent releases.
• The user locates the data with the previously described method, and the data tcl files are stored in the test release.
• The user creates a proxy.
• Jobs are submitted to different sites according to how the data have been split.
• There is no need to categorize jobs into hyper, super and effective jobs for collecting the output.
job submission with afs
• Job submission requires that:
  • gsiklog is copied and executed to gain access to the working directory and all the other software;
  • some environment variables, like LD_LIBRARY_PATH, are redefined to override the remote batch nodes' setup (this happens also in the demonstrator).
• The output sandbox is simply written back to the working directory and doesn't require any special treatment (a sketch of such a job wrapper follows).
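A minimal sketch of what such a job wrapper might do on the remote batch node, assuming gsiklog has been shipped alongside the job; the AFS cell name, the -cell option, the release layout and the executable name are assumptions:

```python
import os
import subprocess

def run_afs_job(test_release, executable, tcl_file,
                gsiklog="./gsiklog", afs_cell="slac.stanford.edu"):
    """Sketch of a wrapper for one job on a remote batch node.
    gsiklog obtains an AFS token from the grid proxy, then the job runs
    inside the AFS test release; names, paths and options are illustrative."""
    # Obtain an AFS token for the cell holding the releases
    # (the -cell option is assumed here).
    subprocess.run([gsiklog, "-cell", afs_cell], check=True)

    # Override the batch node's library path so the BaBar shared
    # libraries in AFS are picked up first (assumed layout).
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = os.path.join(test_release, "shlib") + \
        os.pathsep + env.get("LD_LIBRARY_PATH", "")

    # cd to the test release as the very first step, then run the binary;
    # the output lands directly in the AFS working directory.
    subprocess.run([executable, tcl_file], cwd=test_release, env=env, check=True)
```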
[Diagram: BaBar job submission with AFS — users' desktops and the local site farm issue globus_job_submit to the remote site farms after data location against the metadata catalog; each remote farm runs gsiklog to access the AFS cell, and the output goes straight back to the user area in the AFS cell.]
Conclusions
• The use of a shared file system such as AFS has resulted in a great simplification of the input/output sandbox, especially in a complicated case like user analysis.
• There might be concerns about performance, but the comparison here should be between running on an overloaded local system and running on a non-overloaded shared system.
• The experience with the demonstrator has resulted in a nice GUI, but it lacks flexibility because the http server has to be set up on purpose and a three-step data transfer is required to bring the output back to the user's desktop.