Use of Condor and GLOW for CMS Simulation Production • What are Condor & GLOW? • What is special about Condor & GLOW environment? • What is Jug? • Why is Jug needed? • What did we achieve for CMS? • What did it take to get there? • Summary • What is relevant for you? D. Bradley, S. Dasu, M. Livny, V. Puttabuddhi, S. Rader, W. H. Smith, University of Wisconsin - Madison
Condor • Most of you know what Condor is from Condor-G • This talk is about using Condor without Grid tools • It is more than a simple batch queuing system • Condor in its full glory on the UW campus Grid • Job scheduling • Job-resource matchmaking • Job chaining (DAGMan) • Job tracking to completion • Job flocking from one Condor pool to another • Cannot assume availability of the same resources in all pools • Resource allocation priorities • Foreign pools may give you idle resources but will want to preempt whenever they have work to do • Condor makes another match for your job • You will be more efficient if you rerun from where you left off • Can be achieved automatically by checkpointing the image or the work status • Resource usage monitoring
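For readers new to Condor, here is a rough sketch of how a job might be handed to the batch system from a script; the executable name, file names and job count are invented, but the keywords (universe, executable, output, error, log, queue) are standard Condor submit-file syntax. Flocking requires no change on the user side: the same submitted jobs can be matched in a collaborating pool.

#!/usr/bin/env python
# Illustrative sketch: write a minimal Condor submit description and hand
# it to condor_submit.  The executable, file names and job count are
# invented for this example.
import subprocess

submit_text = """\
universe   = vanilla
executable = run_cmkin.sh
arguments  = $(Process)
output     = cmkin_$(Process).out
error      = cmkin_$(Process).err
log        = cmkin.log
queue 10
"""

with open("cmkin_demo.sub", "w") as f:
    f.write(submit_text)

# condor_submit places ten jobs in the local schedd's queue;
# check=True raises an error if submission fails.
subprocess.run(["condor_submit", "cmkin_demo.sub"], check=True)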
Condor vs Condor-G • Condor: condor_submit → schedd (job caretaker) → matchmaker → startd (runs job) • Condor-G: condor_submit → schedd (job caretaker) → gridmanager → gahp → Globus gatekeeper (Diagram from A. Roy)
Condor Job Flocking • Diagram: three campus pools (HEP, GLOW and CS machines), each with its own matchmaker • Jobs submitted via condor_submit to the HEP schedds (job caretakers) can flock to machines in the GLOW and CS pools
Condor Universes • Jobs can live in one of several universes • Standard Universe • Specially compiled jobs that can checkpoint their images • Restricted system library access in the Standard Universe • Jobs see “submit machine” resources • IO is redirected • A job can be preempted when its CPU receives a higher-priority match • Another free CPU picks up the task using the checkpoint image • Vanilla Universe • Job is scheduled and matched • No checkpointing of images • Users must checkpoint their work to be efficient • Condor issues signals that can be trapped to save work status (see the sketch below) • When the job is resumed elsewhere, you continue from where you left off • Suitable for HEP applications
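To make the Vanilla Universe point concrete, here is a minimal application-level checkpointing sketch in Python, assuming the pool delivers SIGTERM (Condor's default soft-kill signal) before evicting a job; the checkpoint file name, event count and do-nothing event loop are invented.

# Minimal application-level checkpointing sketch for a Vanilla Universe job.
# Assumes SIGTERM is delivered before eviction; everything else is invented.
import json
import os
import signal
import sys

CHECKPOINT = "checkpoint.json"
TOTAL_EVENTS = 250

def save_checkpoint(next_event):
    with open(CHECKPOINT, "w") as f:
        json.dump({"next_event": next_event}, f)

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_event"]
    return 0

def simulate_event(i):
    pass                        # stand-in for the real per-event work

current = load_checkpoint()     # resume from where we left off

def on_evict(signum, frame):
    # Preemption: persist progress and exit; when Condor reruns the job
    # elsewhere, it continues from the last completed event.
    save_checkpoint(current)
    sys.exit(0)

signal.signal(signal.SIGTERM, on_evict)

while current < TOTAL_EVENTS:
    simulate_event(current)
    current += 1
    if current % 10 == 0:       # periodic saves limit rework after eviction
        save_checkpoint(current)

save_checkpoint(TOTAL_EVENTS)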
Condor Usage at UW • Several collaborating Condor pools • Jobs from one pool of Condor machines can flock to another • Important for sharing resources with other compute-intensive researchers on campus • A job submitted in the hep.wisc.edu domain can run in all collaborating pools on campus • Opportunistic use of idle resources • Everyone gains because all pools stay busy at all times • Buy resources for steady-state operation rather than for peak needs
Grid Laboratory Of Wisconsin • GLOW - Inter-disciplinary collaboration • Astrophysics, Biochemistry, Chemical Engineering, Computer Science, High-Energy Physics & Medical Physics • Resources distributed at 6 GLOW sites • Approximately 1/3 built • Operated collaboratively • Common hardware and software platform • Intel Xeons running RH9 • It was easy to agree on a common platform! • Some customization for host sites • For instance, more storage for HEP, MPI for the medical physics group, larger memory for the biochemistry site
GLOW Deployment • First phase deployed in Spring 2004 • Second phase in October 2004 • When done, 800 Xeon CPUs + 100 TB disk (Chart: GLOW CPU @ HEP)
Resource Sharing • Six GLOW sites • Equal priority: 16.67% average share each • One can get more work done • Chemical Engineering took 33% • Others scavenge idle resources • Yet, they got 39% • The message is that efficient users can realize much more than they put in, on average
CMS Jobs and Condor • CMSIM - Simulation using Geant3 • Can run in Standard Universe • Adapting to Condor was simple • OSCAR - Simulation using Geant4 • Uses multi-threaded & dynamically loaded libraries • Cannot checkpoint images • Runs only in Vanilla Universe • ORCA - Digitization (and DST production) • Vanilla Universe • IO intensive - especially reading • Efficient shared file system needed for pileup
CMS Work Breakdown • CMS work is done in multiple sequential steps • Dataset: a collection of events of a particular physics event type • A dataset is too large for a single job • Requires multiple programs to process the data • Assignment: a chunk of the work for a dataset • Split into several stages: cmkin, cmsim + hit formatting or OSCAR, and ORCA • Split into several chunks of events • Job: a particular processing step for a particular chunk of events • Several jobs make up an assignment • The CMS production manager hands out assignments • A database keeps track of which regional center got what assignment and tracks progress • Data is published for physicist use only upon completion of processing and verification of returned job output
Juggling Jobs with Jug • Jug is a Python-based job management system developed at UW-HEP that runs on top of a lower-level batch system (e.g. Condor or Condor-G) • A chained bunch of jobs is tracked persistently to ensure completion even with unavoidable resource failures • New stages of processing can be added dynamically • Workers can be added or removed dynamically • Successful jobs move on to the next stage of processing • Failed jobs get back in the system - however, at the tail end of the queue • Recurring failures do not waste resources
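The talk does not show Jug's internals, so the following is only a toy Python sketch of the bookkeeping just described: each job carries a stage and an attempt counter, successes advance to the next stage, failures go back to the tail of the queue, and recurring failures are retired. Stage names are taken from the CMS workflow in this talk; the run_job() stub, job count and retry limit are invented.

# Toy sketch of Jug-style bookkeeping -- not Jug's actual API or schema.
from collections import deque

STAGES = ["cmkin", "oscar", "orca"]   # stage names taken from the talk
MAX_ATTEMPTS = 3                      # invented retry limit

def run_job(job):
    """Stand-in for handing the job to Condor and waiting for the result."""
    return True                       # pretend every job succeeds

queue = deque({"id": i, "stage": 0, "attempts": 0} for i in range(4))
done, failed = [], []

while queue:
    job = queue.popleft()
    if run_job(job):
        job["stage"] += 1
        if job["stage"] == len(STAGES):
            done.append(job)          # finished the whole processing chain
        else:
            job["attempts"] = 0
            queue.append(job)         # move on to the next stage
    else:
        job["attempts"] += 1
        if job["attempts"] < MAX_ATTEMPTS:
            queue.append(job)         # retry, but at the tail end of the queue
        else:
            failed.append(job)        # recurring failure: stop wasting resources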
Filling the Jug Database • MCRunjob “configurator” inserts a batch of job entries into Jug from a general workflow description • May be driven by RefDB, the CERN assignment database • Or native Jug syntax for stand-alone use:

Batch  # event generation
  name = "edde.cmkin"
  seed_low = 120000
  seed_high = seed_low + 400
  software = "/cms/sw/cmkin_edde"
  environment = EVENTS_PER_JOB = 250

Batch  # event simulation
  name = "edde.oscar"
  parent name = "edde.cmkin"
  input_files = "*.ntpl"
  software = "/cms/sw/oscar_3_3_2" "/cms/pool"
  environment = DATASET = "edde" OWNER = "edde_oscar332"
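As a hypothetical illustration of what the configurator step produces, the snippet below expands the cmkin batch above into individual job records, assuming one random seed per job and EVENTS_PER_JOB events each; the field names are invented and Jug's real expansion logic is not shown in the talk.

# Hypothetical expansion of the cmkin batch into per-job records,
# assuming one seed per job; field names are invented.
SEED_LOW = 120000
SEED_HIGH = SEED_LOW + 400
EVENTS_PER_JOB = 250

jobs = [
    {
        "name": "edde.cmkin",
        "seed": seed,
        "events": EVENTS_PER_JOB,
        "software": "/cms/sw/cmkin_edde",
    }
    for seed in range(SEED_LOW, SEED_HIGH)
]

# 400 jobs x 250 events = 100,000 generated events for this batch.
print(len(jobs), "job records to insert into the Jug database")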
Batch Management • The “DAG in a database” may be monitored and extended at any time • Users may drill down from the aggregate view to inspect details
Juggling with N>1 • High level of redundancy • Any number of SOAP RPC handlers • Multiple points of submission to the batch system • Essential for scaling up, especially in the Standard Universe (remote IO burden) • Any number of storage handlers • Even instances of the same job may be automatically mirrored • Useful at the tail end of a rush job when better machines become idle • When a job is likely to be stuck but a hard timeout is not appropriate • When paranoid about preemption
CMSIM Production on Condor • CMSIM - Simulation using Geant3 • Largest contribution during PCP04 of any single CMS institution • We could exploit idle cycles on UW campus Condor pools efficiently • Standard Universe helps • Many submit machines cooperated by feeding on jobs from the same database, balancing I/O load • 8.8M of 40M events produced world-wide during this period (Plot annotation: “Waiting for data transfer”)
OSCAR Simulation on Condor/GLOW • OSCAR - Simulation using Geant4 • Runs in the Vanilla Universe only • Poor efficiency because of the lack of checkpointing • Application-level checkpointing not in production (yet) (Plot annotation: “No Assignments”)
CMS Reconstruction on Condor/GLOW • ORCA - Digitization • Vanilla Universe only • IO intensive • Used the Fermilab/DESY dCache system • Automatic replication of frequently accessed “pileup” events helps scalability
CMS Work Done on Condor/GLOW • Shared resources at UW Condor/GLOW turned out to be a top source for CMS • Largest single institution excluding DC04 DST production at CERN • (Table: per-site production totals from http://cmsdoc.cern.ch/cms/production/www/cgi/SQL/RCFarmStatus.php on 22 Sep 04; footnotes: * Includes all INFN sites, * Includes Wisconsin Grid3 site, * Includes DC04 DST production)
Data Movement to/from FNAL & CERN • Stork was used to move large datasets between Wisconsin and Fermilab • Works in combination with DAGMan to provide reliable data transfer • Supports gridftp and other protocols • All data in Wisconsin was stored on a cluster of RAID arrays managed by dCache • Full handshake before files are removed at UW • Datasets were moved after an assignment was complete • Helps keep related files in the same tape cartridge • A large cache (a few TB) was needed • The system was reliable after an initial learning period
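The Stork job language itself is not reproduced in the talk; the following is only a rough Python sketch of the policy described above (retry a gridftp transfer, and free the local dCache copy only after the remote copy has been verified). The hostnames, paths and the confirm_remote_copy() stub are invented.

# Rough sketch of the transfer policy, not the actual Stork/DAGMan setup.
import os
import subprocess
import time

def transfer(src_url, dest_url, retries=3):
    """Retry a gridftp transfer a few times with a growing back-off."""
    for attempt in range(retries):
        result = subprocess.run(["globus-url-copy", src_url, dest_url])
        if result.returncode == 0:
            return True
        time.sleep(60 * (attempt + 1))
    return False

def confirm_remote_copy(dest_url):
    """Stand-in for the 'full handshake': e.g. compare the size or checksum
    reported by the remote storage element before trusting the copy."""
    return True

# Invented paths and hostnames, for illustration only.
local_path = "/dcache/pool/edde.oscar.0001.root"
src = "gsiftp://se.hep.wisc.edu" + local_path
dst = "gsiftp://cmssrv.fnal.gov/pnfs/edde.oscar.0001.root"

if transfer(src, dst) and confirm_remote_copy(dst):
    os.remove(local_path)   # handshake passed: safe to free the local cache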
Summary • UW Campus Grid (Condor/GLOW) • Successful concept • Embraced by widely differing science groups • Opportunistic use of idle resources • Everyone gains by keeping the iron hot at all times • Gains due to efficient use of systems • Deploy for steady-state use • Realize much higher peak performance • Robust, checkpointable software is the key • CMS Usage of Condor/GLOW • Successful use of shared resources for CMS work • Top producer of CMS data in 2003-2004 • Message • Get together with colleagues on campus and build shared grids • Join world-wide shared grids with your campus grid • Open Science Grid (Ruth’s talk) and EGEE are the future