Physics with SAM-Grid • Stefan Stonjek, University of Oxford • CHEP 2003, 25th March 2003, San Diego
Outline • Components of SAM-Grid • Job submission • Example • Problems • Outlook • Summary
Components of SAM-Grid (global) • JIM (Job and Information Management system) • Frontend and glue • Condor-G for global submission and brokering • GRAM protocol to transfer job to execution site • Authentication via GSI (Grid Security Infrastructure)
Components of SAM-Grid (local) • Data handling with SAM (Sequential data Access via Meta-data) • Local job submission • CDF: CAF (Central Analysis Facility), FBS (Farm Batch System), kerberized tools to write data back to FNAL • DØ: OpenPBS, output data handling via SAM
Submission to the Grid • User must provide: • Grid proxy • Job description file • “tar” file with executable and configuration files • GUI exists (generates the “jdf” file and the “tar” file, then submits) • Submission via JIM to Condor • Job goes to the site with the most files of the required dataset already present • New Condor-MMS feature: execution of external code when negotiating the matches • Here: calls SAM to check for the presence of input data at different locations
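The brokering rule on this slide (prefer the site that already holds the most files of the requested dataset) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the site names and the in-memory cache listing are hypothetical, and the real system performs this check through an external call to SAM during Condor matchmaking rather than with local dictionaries.

```python
from typing import Dict, Set


def rank_sites(dataset_files: Set[str], site_caches: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, per site, how many files of the dataset are already present there."""
    return {site: len(dataset_files & cached) for site, cached in site_caches.items()}


def choose_site(dataset_files: Set[str], site_caches: Dict[str, Set[str]]) -> str:
    """Pick the site with the most dataset files present (ties: alphabetical order)."""
    ranks = rank_sites(dataset_files, site_caches)
    if not ranks:
        raise ValueError("no candidate sites")
    # max() keeps the first maximum it sees, so sorting makes ties deterministic.
    return max(sorted(ranks), key=lambda site: ranks[site])


if __name__ == "__main__":
    # Hypothetical dataset content and site caches, for illustration only.
    dataset = {"file_a", "file_b", "file_c"}
    caches = {
        "site_one": {"file_a"},
        "site_two": {"file_a", "file_b", "unrelated_file"},
        "site_three": set(),
    }
    print(choose_site(dataset, caches))
```

Ranking by file count alone ignores file sizes and site load; a production broker would presumably weigh those as well.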
Local Submission • Job is transferred as a “tar” file via the GRAM protocol and then submitted to the local batch system • Different local batch systems are possible • Need an adaptor for submission and job status information • Supported at the moment: FBS, PBS, LSF, Condor • Queues have to be the same at all sites • Problem: job should not stop in the middle of an input file • User has to limit the amount of input relative to the CPU time of the queue (needs to know the queue CPU time) or provide the CPU time per event (difficult)
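The last point is simple arithmetic: the number of whole input files a job can process is the queue CPU limit divided by the CPU time needed per file. A minimal Python sketch follows; the function name, the safety factor and all numbers are assumptions made for illustration and do not come from the talk.

```python
import math


def max_input_files(queue_cpu_seconds: float,
                    cpu_seconds_per_event: float,
                    events_per_file: int,
                    safety_factor: float = 0.8) -> int:
    """Number of whole input files that fit into the queue CPU limit.

    safety_factor (an assumption) reserves headroom so the job does not
    hit the limit in the middle of a file.
    """
    if queue_cpu_seconds <= 0 or cpu_seconds_per_event <= 0 or events_per_file <= 0:
        raise ValueError("all inputs must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    usable_seconds = queue_cpu_seconds * safety_factor
    cpu_seconds_per_file = cpu_seconds_per_event * events_per_file
    return math.floor(usable_seconds / cpu_seconds_per_file)


if __name__ == "__main__":
    # Hypothetical numbers: 24 h queue, 0.5 s per event, 20000 events per file.
    print(max_input_files(24 * 3600, 0.5, 20000))
```

With these made-up numbers one file costs 10000 CPU seconds, so a 24-hour queue with 20% headroom holds six files.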
SAM-Grid Architecture
Job Description File • executable = ./run-job.sh • sam_dataset = jbot0g • input_sandbox_tgz = inbox.tgz • output_sandbox = sam@sam.fnal.gov:/www/output.tgz • email = stonjek@fnal.gov • job_type = caf • caf_job_type = sam • caf_initial_section = 1 • caf_final_section = 1
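The job description file above is a flat list of `key = value` entries. As a rough illustration, such a file could be read into a dictionary as sketched below. The parser is not part of JIM; its name, the one-entry-per-line layout and the handling of `#` comment lines are assumptions, while the keys and values are the ones shown on the slide.

```python
from typing import Dict


def parse_jdf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines into a dictionary (hypothetical helper)."""
    entries: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):  # comment syntax is an assumption
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected 'key = value', got {raw!r}")
        entries[key.strip()] = value.strip()
    return entries


# Entries copied from the slide, one per line.
EXAMPLE_JDF = """\
executable = ./run-job.sh
sam_dataset = jbot0g
input_sandbox_tgz = inbox.tgz
output_sandbox = sam@sam.fnal.gov:/www/output.tgz
email = stonjek@fnal.gov
job_type = caf
caf_job_type = sam
caf_initial_section = 1
caf_final_section = 1
"""

if __name__ == "__main__":
    for key, value in parse_jdf(EXAMPLE_JDF).items():
        print(f"{key}: {value}")
```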
Data Handling (SAM) • If needed, SAM transfers the files to the local site • SAM translates the dataset name into a list of files • Selection can be based on physics meta-data • File transfer and delivery are transparent to the user
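The two steps on this slide (resolve a dataset to a list of files by meta-data, then transfer only what is missing locally) can be sketched as follows. This is a toy model under stated assumptions: the `FileRecord` fields, the function names and the in-memory catalogue are hypothetical and do not reflect the real SAM interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical catalogue entry with two example meta-data fields."""
    name: str
    run: int
    trigger: str


def resolve_dataset(catalogue: Iterable[FileRecord],
                    selection: Callable[[FileRecord], bool]) -> List[str]:
    """Translate a meta-data selection into a list of file names."""
    return [record.name for record in catalogue if selection(record)]


def files_to_transfer(wanted: Iterable[str], local_cache: Set[str]) -> List[str]:
    """Files that must be fetched because they are not yet at the local site."""
    return [name for name in wanted if name not in local_cache]


if __name__ == "__main__":
    catalogue = [
        FileRecord("file_001", run=100, trigger="dimuon"),
        FileRecord("file_002", run=101, trigger="dimuon"),
        FileRecord("file_003", run=101, trigger="electron"),
    ]
    wanted = resolve_dataset(catalogue, lambda r: r.trigger == "dimuon")
    print(files_to_transfer(wanted, local_cache={"file_001"}))
```

The user only supplies the selection; which files are moved, and from where, stays hidden behind the two functions, which mirrors the transparency claimed on the slide.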
Job Monitoring • Monitoring via a Web page • Job is identified by a global job ID • Decentralized approach • Several independent web-servers possible
Layout of the Example (shown at Supercomputing 2002, November 2002) • Submission via command line • Broker to one site • Transfer via GRAM protocol • Local job submission by CAF • Job monitoring via Web • Transfer of results via kerberized rcp to FNAL web server
Grid Map • CDF: Kyungpook National University, Korea; Rutgers State University, New Jersey, US; Rutherford Appleton Laboratory, UK; Texas Tech, Texas, US; University of Toronto, Canada • DØ: Imperial College, London, UK; Michigan State University, Michigan, US; University of Michigan, Michigan, US; University of Texas at Arlington, Texas, US
Physics • Standard CDF analysis job submitted via SAM-Grid and executed somewhere • J/ψ → µ+ µ- • [Plot with axes z0(µ1) and z0(µ2)]
Problems (Security) • Firewalls (different settings at different sites) • Block all incoming connections to unprivileged ports • Cancel idle TCP/IP connections • Communication problems, in particular for remote execution • Authentication (ssh, kerberos, GSI, ...) • FNAL allows only kerberized access • Different local policies • Problem: How to write back the data? • Grid Security Infrastructure (GSI) might help
Problems (Private Networks) • Problems already occur with SAM • Worker node can contact the outside world • Outside world cannot call back • Problem if a long time passes between the call from the worker and the response from the outside • IPv6 might be a solution
Outlook • Deploy SAM-Grid to further locations • Develop SAM-Grid towards production readiness
Summary • Several new tools and protocols were used to form a Grid-enabled environment to do physics • SAM-Grid is able to use Grid technology to perform real-world physics analysis