Distributed Analysis using GANGA
Dietrich Liko
Overview
• What is GANGA?
• Why a User Interface?
• How does GANGA work?
• User Interface
  • Command line
  • GUI
  • Scripts
• GANGA Usage
  • ATLAS
  • LHCb
What is GANGA?
• Ganga is an easy-to-use frontend for job definition and management
• Allows simple switching between testing on a local batch system and large-scale data processing on distributed resources (Grid)
• Developed in the context of ATLAS and LHCb
  • For ATLAS, built-in support for applications based on the Athena framework, for JobTransforms, and for the DQ2 data-management system
  • For LHCb, built-in support for applications built on the Gaudi framework and the DIRAC middleware
• Component architecture readily allows extension
• Implemented in Python
Who is GANGA?
• Ganga is a joint ATLAS-LHCb project
• Support for development work from the UK (PPARC/GridPP), Germany (D-Grid) and the EU (EGEE/ARDA)
• Core team: U. Egede (Imperial), K. Harrison (Cambridge), D. Liko (CERN), A. Maier (CERN), J.T. Moscicki (CERN), A. Soroko (Oxford), C.L. Tan (Birmingham), A. Muraru (CERN), J. Elmshäuser (Munich)
GANGA Job
• Application: what to run
• Backend: where to run
• Input Dataset: data read by the application
• Output Dataset: data written by the application
• Splitter: rule for dividing the job into subjobs
• Merger: rule for combining the outputs
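To make the building blocks concrete, here is a minimal CLIP sketch of a job that is split into subjobs and merged afterwards; it runs inside a Ganga session. ArgSplitter and TextMerger are standard Ganga plugin names, but treat the exact fields shown here as an assumption rather than a definitive recipe.

# Minimal split/merge job sketch (run inside a Ganga session).
j = Job()
j.application = Executable()                   # what to run
j.application.exe = '/bin/echo'
j.splitter = ArgSplitter(args=[['subjob 0'],
                               ['subjob 1'],
                               ['subjob 2']])  # one subjob per argument list
j.merger = TextMerger(files=['stdout'])        # combine the subjobs' stdout on completion
j.backend = LCG()                              # where to run; swap for a local batch backend to test
j.submit()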
GANGA Building Blocks
• Ganga provides the user interface for job definition and management
• Plugins connect it to:
  • Applications: ATLAS applications, LHCb applications, other applications
  • Tools for data management: file catalogues, metadata catalogues, data storage and retrieval
  • Processing systems (backends): local batch systems, distributed (Grid) systems, experiment-specific workload-management systems
• Local and remote repositories serve as Ganga job archives, kept up to date by the Ganga monitoring loop
• Ganga has built-in support for ATLAS and LHCb
• Component architecture allows customization for other user groups
Backends and Applications
• Applications: Executable; Athena (Simulation/Digitisation/Reconstruction/Analysis); AthenaMC (Production); Gauss/Boole/Brunel/DaVinci (Simulation/Digitisation/Reconstruction/Analysis)
• Backends: PBS, LSF, OSG, PANDA, LHCb WMS, US-ATLAS WMS
• [Original slide shows a matrix pairing applications with backends, each combination marked as implemented or coming soon]
GANGA Activities
• Main users: ATLAS and LHCb
• Other activities: Garfield, HARP
Different working styles
• Command Line Interface in Python (CLIP) provides interactive job definition and submission from an enhanced Python shell (IPython)
  • Especially good for trying things out and seeing how the system works
• Scripts, which may contain any Python/IPython or CLIP commands
  • Allow automation of repetitive tasks (see the sketch below)
  • Scripts included in the distribution enable the kind of approach traditionally used when submitting jobs to a local batch system
• Graphical User Interface (GUI) allows job management based on mouse selections and field completion
• Lots of configuration possibilities
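As an illustration of the scripting style, the sketch below automates a repetitive submission task. It uses only the CLIP commands shown on the next slide; the file name and parameter values are invented for the example (it would be run as "ganga submit_series.py").

# submit_series.py (hypothetical name): submit a series of similar jobs.
for seed in [1001, 1002, 1003]:                # invented parameter values
    j = Job()
    j.application = Executable()
    j.application.exe = '/bin/echo'
    j.application.args = ['random seed', str(seed)]
    j.backend = LCG()
    j.submit()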
Command Line Interface
• IPython: a comfortable Python shell with many useful extensions (http://ipython.scipy.org/)
• CLIP: the GANGA command-line interface; jobs are Python objects
• How to define a job?

j = Job()
j.application = Executable()
j.application.exe = '/bin/echo'
j.application.args = ['Hello World']
j.backend = LCG()
j.submit()
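Since jobs persist as Python objects in the Ganga repository, they can also be inspected and managed from the same prompt. A short sketch, assuming the standard CLIP jobs registry; the method names are era-typical and may differ in detail:

jobs              # list all jobs with id, status, application and backend
j = jobs(0)       # retrieve a job by id from the repository
j.status          # e.g. 'submitted', 'running', 'completed'
j.resubmit()      # resubmit a job that has failed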
Scripts
• Example from ATLAS:

ganga athena \
    --inDS trig1_misal1_csc11.005033.Jimmy_jetsJ4.recon.AOD.v12000601 \
    --outputdata AnalysisSkeleton.aan.root \
    --split 3 \
    --maxevt 100 \
    --lcg \
    --ce ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas \
    AnalysisSkeleton_topOptions.py
ATLAS Computing Model
• Event Filter Farm at CERN
  • Located near the experiment; assembles data into a stream to the Tier 0 Center
• Tier 0 Center at CERN
  • Mass storage of raw data at CERN and distribution to Tier 1 centers
  • Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD)
  • Ships ESD and AOD to Tier 1 centers, with mass storage at CERN
• Tier 1 Centers distributed worldwide (10 centers)
  • Re-reconstruction of raw data, producing new ESD and AOD
  • Scheduled, group access to full ESD and AOD
• Tier 2 Centers distributed worldwide (approximately 30 centers)
  • Monte Carlo simulation, producing ESD and AOD that are shipped to Tier 1 centers
  • On-demand user physics analysis
• CERN Analysis Facility
  • Analysis
  • Heightened access to ESD and RAW/calibration data on demand
• Tier 3 Centers distributed worldwide
  • Physics analysis
ATLAS Distributed Analysis
• Data is being distributed to the sites
  • AOD: to all Tier-1 sites, with further distribution to the Tier-2 sites
  • ESD: to two Tier-1 sites, only small subsets to the Tier-2 sites
• Jobs are sent to the data
  • Tier-2 sites are the main resource for user analysis
• TAG-based analysis will reduce the need for I/O
  • Some important event characteristics will be stored in a database
  • POOL/ROOT file based (Tier-2)
  • Database based (Tier-1?)
• Two main activities
  • GANGA on LCG
  • pathena/PANDA on OSG
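For the GANGA-on-LCG activity, an AOD analysis job can be assembled from the Athena application and DQ2 dataset plugins. A hedged sketch, reusing the dataset and job options from the scripts example earlier; the field names follow the Ganga ATLAS plugins of the time and may differ in detail:

# Sketch of an ATLAS AOD analysis job on LCG (plugin fields are assumptions).
j = Job()
j.application = Athena()
j.application.option_file = 'AnalysisSkeleton_topOptions.py'
j.application.prepare()            # pack the user's Athena setup for shipping to the Grid
j.inputdata = DQ2Dataset()
j.inputdata.dataset = 'trig1_misal1_csc11.005033.Jimmy_jetsJ4.recon.AOD.v12000601'
j.outputdata = DQ2OutputDataset()  # register the outputs back into DQ2
j.backend = LCG()
j.submit()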
ATLAS Infrastructure
• ATLAS uses several Grid infrastructures
  • EGEE/LCG
  • OSG
  • Nordugrid
• Workload management
  • LCG Resource Broker
  • PANDA
  • ARC middleware
• Data management
  • Don Quijote (DQ2)
  • Uses FTS and Grid-specific file catalogs
Workload Management
• EGEE Resource Broker
  • Push model
  • New gLite RB not yet available for analysis
• New players have entered the field …
• PANDA
  • Pull model
  • Similar to AliEn and DIRAC
  • Integrated with DDM
• ARC
  • Push model
  • Gatekeeper regulates the access to data
Data Management with DQ2
• Central dataset catalog
• Local file catalogs at the sites (LFC, MySQL, RLS), serving the different Grid infrastructures (OSG, EGEE, Nordugrid)
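The two-level lookup can be pictured with a purely illustrative sketch; none of these names are the real DQ2 API. The central catalog resolves a dataset to its files and hosting sites, and each site's local file catalog resolves a logical file name to a physical replica:

# Conceptual two-level catalog lookup; hypothetical data, not the DQ2 client.
central_catalog = {
    # dataset name -> (logical file names, sites holding replicas)
    'csc11.005033.AOD': (['AOD._00001.pool.root', 'AOD._00002.pool.root'],
                         ['CERN', 'FZK']),
}

local_file_catalogs = {
    # one LFC/MySQL/RLS instance per site: LFN -> physical file name
    'CERN': {'AOD._00001.pool.root': 'srm://srm.cern.ch/atlas/AOD._00001.pool.root',
             'AOD._00002.pool.root': 'srm://srm.cern.ch/atlas/AOD._00002.pool.root'},
    'FZK':  {'AOD._00001.pool.root': 'srm://srm.gridka.de/atlas/AOD._00001.pool.root'},
}

def resolve(dataset, site):
    """Resolve a dataset to physical replicas at one site."""
    lfns, sites = central_catalog[dataset]
    if site not in sites:
        raise LookupError('no replica of %s at %s' % (dataset, site))
    lfc = local_file_catalogs[site]
    return [lfc[lfn] for lfn in lfns if lfn in lfc]

print(resolve('csc11.005033.AOD', 'CERN'))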
Who is ATLAS GANGA?
• GANGA Core: U. Egede, K. Harrison, J. Moscicki, A. Soroko, V. Romanovsky, A. Muraru
• GANGA GUI: C.L. Tan
• Athena AOD analysis: J. Elmshäuser
• Tag Navigator: M. Kenyon, C. Nicholson
• User production: F. Brochu
• EGEE/LCG: H.-C. Lee, D. Liko
• Nordugrid: P. Katarina, B. Hallvard
• PANDA: D. Liko, with support from the PANDA team
• AMI integration: F. Fassi, C.L. Tan, with support from the AMI team
• MonALISA monitoring: B. Gaidioz, J. Yu, T. Reddy
GANGA features
• Current situation
  • GANGA on EGEE/LCG
  • pathena/PANDA on OSG
• In the upcoming release, GANGA 4.3
  • Support for the gLite RB
  • Support for PANDA (OSG ATLAS)
  • Support for ARC (Nordugrid)
• Additional features
  • AMI metadata
  • MonALISA-based application monitoring
  • Tag Navigator for event selection in the database
Analysis with DIRAC
• The user sends a job to the DIRAC WMS
• DIRAC sends a pilot job to an LCG site
• Only if the pilot is running well is the job pulled from the task queue
• Small files are returned via the sandbox; large files are registered in the LFC
• The user interacts only with DIRAC and is shielded from RB problems
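The pilot mechanism can be summarized in a conceptual sketch with hypothetical names (this is not DIRAC code): the pilot first verifies that the worker node is healthy, and only then pulls user jobs from the central task queue.

# Conceptual pilot-job loop; illustrative only, not the DIRAC API.
class TaskQueue(object):
    def __init__(self, jobs):
        self.jobs = list(jobs)
    def pull(self):
        # Pull model: work is fetched by the pilot, not pushed to the site.
        return self.jobs.pop(0) if self.jobs else None

def worker_node_sane():
    # In reality: check the environment, installed software and data access.
    return True

def run_pilot(queue):
    if not worker_node_sane():
        return                      # a broken site never receives a user job
    while True:
        job = queue.pull()
        if job is None:
            break                   # queue drained; pilot exits
        print('running %s' % job)
        # Small outputs return via the sandbox; large files go to the LFC.

run_pilot(TaskQueue(['analysis job 1', 'analysis job 2']))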
Throughput
• 90% of the results within 3 hours
• 95% of the results after 4 hours
• 100% after 10 hours
• The delay was caused by a problem accessing the data at a Tier-1 site
Reliability
• DIRAC
  • Problems related to file registration
• Resource Broker
  • Submission to all sites gave a bad success rate
  • Submission to a well-working Tier-1 site gave results close to DIRAC submissions
Summary
• User analysis based on GANGA is progressing well
  • More than 300 people have tried GANGA this year
  • Up to 50 users on a daily basis
• ATLAS and LHCb use the same framework, but different plugins
  • The flexibility of the framework is insurance against future middleware developments
• The GANGA model has fostered good collaboration between the various development teams
• Try it yourself
  • Experiment-specific tutorials on a regular basis
  • ATLAS Distributed Analysis tutorial in Lyon
• Interest in including GANGA in EGEE dissemination
  • First tutorial in Taipei