ATLAS Data Challenge on NorduGrid CHEP2003 – UCSD Anders Wäänänen waananen@nbi.dk
NorduGrid project • Launched in spring 2001 with the aim of creating a Grid infrastructure in the Nordic countries • Idea to follow the MONARC architecture with a common Tier-1 center • Partners from Denmark, Norway, Sweden, and Finland • Initially meant to be the Nordic branch of the EU DataGrid (EDG) project • 3 full-time researchers plus a few externally funded people
Motivations • NorduGrid was initially meant to be a pure deployment project • One goal was to have the ATLAS Data Challenge running by May 2002 • Should be based on the Globus Toolkit™ • Available Grid middleware: • The Globus Toolkit™ • A toolbox – not a complete solution • European DataGrid software • Not mature for production at the beginning of 2002 • Architecture problems
A Job Submission Example • [Diagram: a job travels from the User Interface (JDL, input/output sandboxes) via the Resource Broker and Job Submission Service to a Compute Element, with the Information Service, Replica Catalogue, Authorization & Authentication, Logging & Book-keeping, and a Storage Element involved along the way]
Architecture requirements • No single point of failure • Should be scalable • Resource owners should have full control over their resources • As few site requirements as possible: • Local cluster installation details should not be dictated • Method, OS version, configuration, etc… • Compute nodes should not be required to be on the public network • Clusters need not be dedicated to the Grid
User interface • The NorduGrid user interface provides a set of commands for interacting with the Grid: • ngsub – submit jobs • ngstat – show the status of jobs and clusters • ngcat – see stdout/stderr of running jobs • ngget – retrieve the results of finished jobs • ngkill – kill running jobs • ngclean – delete finished jobs from the system • ngcopy – copy files to, from and between file servers and replica catalogs • ngremove – delete files from file servers and RCs
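A minimal sketch of a typical job lifecycle with these commands, assuming a valid proxy and access to an authorized cluster; the job identifier is a placeholder for whatever ngsub actually prints at submission time:

grid-proxy-init
ngsub '&(executable=/bin/echo)(arguments="hello")'   # prints the job identifier
JOBID="<identifier printed by ngsub>"                # placeholder, not a real ID
ngstat  "$JOBID"    # query the job state
ngcat   "$JOBID"    # inspect stdout while the job is running
ngkill  "$JOBID"    # abort the job if something looks wrong
ngclean "$JOBID"    # remove the finished or killed job from the system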
ATLAS Data Challenges • A series of computing challenges within ATLAS of increasing size and complexity • Preparing for data-taking and analysis at the LHC • Thorough validation of the complete ATLAS software suite • Introduction and use of Grid middleware as early and as widely as possible
Data Challenge 1 • Main goals: • Need to produce data for the High Level Trigger & Physics groups • Study performance of the Athena framework and algorithms for use in the HLT • High statistics needed • A few samples of up to 10⁷ events in 10-20 days, O(1000) CPUs • Simulation & pile-up • Reconstruction & analysis on a large scale • Learn about the data model and I/O performance; identify bottlenecks, etc. • Data management • Use/evaluate persistency technology (AthenaRoot I/O) • Learn about distributed analysis • Involvement of sites outside CERN • Use of the Grid as and when possible and appropriate
DC1, phase 1: Task Flow • Example: one sample of di-jet events • PYTHIA event generation: 1.5 × 10⁷ events split into partitions (read: ROOT files) • Detector simulation: 20 jobs per partition, ZEBRA output • [Diagram: event generation (Pythia6 di-jets → HepMC via Athena-Root I/O) feeding parallel detector simulation jobs (Atlsim/Geant3 + Filter → ZEBRA files with hits/digits and MCTruth)]
DC1, phase 1: Summary • July-August 2002 • 39 institutes in 18 countries • 3200 CPUs, approx. 110 kSI95 – 71 000 CPU-days • 5 × 10⁷ events generated • 1 × 10⁷ events simulated • 30 Tbytes produced • 35 000 files of output
DC1, phase 1 for NorduGrid • Simulation • Datasets 2000 & 2003 (different event generation) assigned to NorduGrid • Total number of fully simulated events: • 287 296 (from 1.15 × 10⁷ input events) • Total output size: 762 GB • All files uploaded to a Storage Element (University of Oslo) and registered in the Replica Catalog
Job xRSL script

&
(executable="ds2000.sh")
(arguments="1244")
(stdout="dc1.002000.simul.01244.hlt.pythia_jet_17.log")
(join="yes")
(inputfiles=
  ("ds2000.sh" "http://www.nordugrid.org/applications/dc1/2000/dc1.002000.simul.NG.sh"))
(outputfiles=
  ("atlas.01244.zebra" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.zebra")
  ("atlas.01244.his" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.his")
  ("dc1.002000.simul.01244.hlt.pythia_jet_17.log" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.log")
  ("dc1.002000.simul.01244.hlt.pythia_jet_17.AMI" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.AMI")
  ("dc1.002000.simul.01244.hlt.pythia_jet_17.MAG" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.MAG"))
(jobname="dc1.002000.simul.01244.hlt.pythia_jet_17")
(runtimeEnvironment="DC1-ATLAS")
(replicacollection="ldap://grid.uio.no:389/lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org")
(maxCPUTime=2000)
(maxDisk=1200)
(notify="e waananen@nbi.dk")
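To submit this job, the xRSL is handed to ngsub. A minimal sketch, assuming the script above has been saved locally as dc1.002000.simul.01244.xrsl (the file name is illustrative; the inline-string form of ngsub is the one shown on the quick-start slide):

grid-proxy-init                              # a valid Grid proxy is needed first
ngsub "$(cat dc1.002000.simul.01244.xrsl)"   # pass the whole xRSL string to ngsub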
NorduGrid job submission • The user submits an xRSL file specifying the job options. • The xRSL file is processed by the User Interface. • The User Interface queries the NorduGrid Information System for resources and the NorduGrid Replica Catalog for the location of input files, and submits the job to the selected resource. • There the job is handled by the Grid Manager, which downloads or links the input files into the local session directory. • The Grid Manager submits the job to the local resource management system. • After the simulation finishes, the Grid Manager moves the requested output to Storage Elements and registers it in the NorduGrid Replica Catalog.
NorduGrid job submission • [Diagram: the User Interface sends the RSL to the cluster Gatekeeper; the Grid Manager stages files via GridFTP to/from Storage Elements (SE), consulting the MDS information system and the Replica Catalog (RC)]
NorduGrid Pileup • DC1, pile-up: • Low-luminosity pile-up for the phase 1 events • Number of jobs: 1300 • dataset 2000: 300 • dataset 2003: 1000 • Total output size: 1083 GB • dataset 2000: 463 GB • dataset 2003: 620 GB
Pileup procedure • Each job downloaded one zebra file from dc1.uio.no of approximately: • 900 MB for dataset 2000 • 400 MB for dataset 2003 • Use locally present minimum-bias zebra files to pile up events on top of the original simulated ones in the downloaded file. The output of each job was about 50% bigger than the original downloaded file, i.e.: • 1.5 GB for dataset 2000 • 600 MB for dataset 2003 • Upload the output files to the dc1.uio.no and dc2.uio.no SEs • Register them in the RC.
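For moving the same kind of files by hand, the ngcopy command listed earlier could be used. This is a rough sketch only, assuming ngcopy takes a source and a destination URL; the rc:// paths and local file names are illustrative, and in the actual production the staging was performed automatically by the Grid Manager from each job's xRSL input/output file lists:

# Fetch one simulated zebra file registered in the Replica Catalog (illustrative path)
ngcopy rc://dc1.uio.no/2003/dc1.002003.simul.0001.zebra /scratch/input.zebra

# ... run the pile-up simulation locally on /scratch/input.zebra ...

# Upload the pile-up output to an SE and register it in the RC (illustrative path)
ngcopy /scratch/output.zebra rc://dc2.uio.no/2003/dc1.002003.pileup.0001.zebra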
Other details • At peak production, up to 200 jobs were managed by NorduGrid at the same time • NorduGrid has most of the Scandinavian production clusters under its belt (2 of them are in the Top 500) • However, not all of them allow installation of the ATLAS software • The ATLAS job manager Atlas Commander supports the NorduGrid toolkit • Issues • Replica Catalog scalability problems • MDS / OpenLDAP hangs – solved • Software threading problems – partly solved • Problems partly in Globus libraries
NorduGrid DC1 timeline • April 5th 2002 • First ATLAS job submitted (Athena Hello World) • May 10th 2002 • First pre-DC1 validation job submitted (ATLSIM test using ATLAS release 3.0.1) • End of May 2002 • By now it was clear that NorduGrid was mature enough to handle real production • Spring 2003 (now) • Keep running data challenges and improving the toolkit
Quick client installation/job run • As a normal user (no system privileges required): • Retrieve nordugrid-standalone-0.3.17.rh72.i386.tgz
tar xfz nordugrid-standalone-0.3.17.rh72.i386.tgz
cd nordugrid-standalone-0.3.17
source ./setup.sh
• Get a personal certificate:
grid-cert-request
• Install the certificate per the instructions • Get authorized on a cluster • Run a job:
grid-proxy-init
ngsub '&(executable=/bin/echo)(arguments="Hello World")'
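To complete the quick start, the submitted job can be followed and its output fetched with the user-interface commands; a minimal sketch, where the identifier is whatever ngsub printed on submission (shown here as a placeholder):

ngstat "<identifier printed by ngsub>"   # repeat until the job reports it has finished
ngget  "<identifier printed by ngsub>"   # download the results, including the job's stdout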
Resources • Documentation and source code are available for download • Main Web site: • http://www.nordugrid.org/ • ATLAS DC1 with NorduGrid • http://www.nordugrid.org/applications/dc1/ • Software repository • ftp://ftp.nordugrid.org/pub/nordugrid/
The NorduGrid core group • Александр Константинов • Balázs Kónya • Mattias Ellert • Оксана Смирнова • Jakob Langgaard Nielsen • Trond Myklebust • Anders Wäänänen