The STAR Unified Meta-Scheduler (SUMS): A front end around evolving technologies for user analysis and data production. Jérôme Lauret, Gabriele Carcassi, Levente Hajdu, Efstratios Efstathiadis, Lidia Didenko, Valeri Fine, Iwona Sakrejda, Doug Olson
Outline • Project overview • STAR Experiment • Problematic • Solution • Design and architecture • Basic principles • Building blocks • Add-on (usage tracking) • Usage • Grid experience • Schedulers • Key features • MonaLISA policy • Contributions • GUI, dispatchers • Future work & Conclusion Jérôme LAURET, RHIC-STAR/BNL
Project overview
The STAR Experiment • The Solenoidal Tracker At RHIC • http://www.star.bnl.gov/ is an experiment located at BNL (USA) • A collaboration of 546 people, spanning 12 countries • A PByte-scale experiment overall (raw, reconstructed events, simulation) with a large number of files (several million) • Run4 alone (2003-2004) has produced 200 TB of raw data • Rich set of data analysis and simulation problems • Expecting 200 TB of reconstructed data • 40 TB of MuDST (1 pass) • Files copied to Tier1 using SRM tools (see Track 4, 344)
Problematic • Ongoing analysis • Past and new sets of data are constantly analyzed • Data spread over many locations • Sites and storage types; some data sits on distributed disk local to each machine and is not easily accessible • Evolving technologies • Distributed computing (re)shapes itself as we make progress: Condor-G, portals, Meta-Schedulers, Web Services, Grid Services, … • Batch technologies themselves evolve • Users have to adapt within a productive environment and an ever-growing scientific program • This may be fine for new experiments, not for running ones
Solution • Allow users to pursue their scientific endeavor without disruption • Make use of current/available resources • Ensure the same productivity (subjective without a metric) • Develop a front end shielding the user from technology details and changes: job concept abstraction • Attract users to migrate to the new framework & Grid => data management, file relocation => Catalog • Design a tool/framework allowing for evolution • Changing the underlying technology should NOT mean a change in the user’s daily routine • The framework should allow for testing ideas, plugging in new components (Dispatcher for Local Resource Managers = LRMS), and moving users to distributed computing with no extraneous knowledge
And so SUMS was born … • Project started in 2002 • Light developer team (average ~1.0 FTE) • Surrounding activities have enriched the project and spawned activities and collaborations (Monitoring, U-JDL, Resource Brokering studies, …) • Historically • STAR project; design and prototype responsibility taken by WSU • Project enhanced and brought to the user community (Gabriele Carcassi) • Current development & design (Levente Hajdu) • Entirely written in Java • Portable, modular class-based design • Project management, auto-documentation, …
Design / Architecture - Open
Basic principles • Users do NOT write • shell scripts to submit • series of tag=value pairs • Instead, they write an XML document: the U-JDL • Describing their “intent” to work on files, a DataSet, collections, etc … • They do not have to know where those files are located (LFN or collections may convert to PFN) • They do not have to handle the gory details of resource management (bsub -R …) • They do not need to think about where their job will best fit; their input to SUMS is an indication of rates or ranges • Following a prescribed schema and … % star-submit MyJob.xml % star-submit-template -template MyTemplateJob.xml -entities jobname=test,year=2004
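The template submission above passes `jobname=test,year=2004` as entities. As a rough illustration only, the sketch below parses such a list and fills placeholders in a template. The `&name;` placeholder syntax and the helper names `parse_entities` and `fill_template` are assumptions made for this example, not the documented SUMS behavior.

```python
import re
from typing import Dict


def parse_entities(spec: str) -> Dict[str, str]:
    """Turn 'jobname=test,year=2004' into {'jobname': 'test', 'year': '2004'}."""
    return dict(item.split("=", 1) for item in spec.split(",") if item)


def fill_template(template: str, entities: Dict[str, str]) -> str:
    """Replace &name; placeholders (assumed syntax) with entity values.

    Unknown names such as &lt; are left untouched.
    """
    def replace(match: "re.Match[str]") -> str:
        return entities.get(match.group(1), match.group(0))

    return re.sub(r"&(\w+);", replace, template)


entities = parse_entities("jobname=test,year=2004")
filled = fill_template("&jobname;_&year; &lt;", entities)
```

Keeping the substitution separate from the parsing means the same template can be reused with different entity lists.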
Query/Wildcard resolution
Job description (test.xml):
    <?xml version="1.0" encoding="utf-8" ?>
    <job maxFilesPerProcess="500">
      <command>root4star -q -b rootMacros/numberOfEventsList.C\(\"$FILELIST\"\)</command>
      <stdout URL="file:/star/u/xxx/scheduler/out/$JOBID.out" />
      <input URL="catalog:star.bnl.gov?production=P02gd,filetype=daq_reco_mudst" preferStorage="local" nFiles="all"/>
      <output fromScratch="*.root" toURL="file:/star/u/xxx/scheduler/out/" />
    </job>
What it does: User Input … () … Policy … dispatcher. The query is resolved into file lists and scripts (sched1043250413862_0.list / .csh, sched1043250413862_1.list / .csh, sched1043250413862_2.list / .csh), each list holding physical paths of the form /star/data09/reco/productionCentral/FullFie...
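The resolution step can be pictured as chunking one resolved file list into several lists of at most `maxFilesPerProcess` entries. The sketch below is a simplification under that assumption; the real policy may balance or group files differently. The file paths and the function names are hypothetical.

```python
from typing import List


def split_file_list(files: List[str], max_files_per_process: int) -> List[List[str]]:
    """Split a resolved file list into chunks of at most max_files_per_process."""
    if max_files_per_process < 1:
        raise ValueError("max_files_per_process must be >= 1")
    return [files[i:i + max_files_per_process]
            for i in range(0, len(files), max_files_per_process)]


def list_names(request_id: str, n_chunks: int) -> List[str]:
    """Name one list per sub-job, following the sched<ID>_<n>.list pattern."""
    return [f"sched{request_id}_{n}.list" for n in range(n_chunks)]


# Hypothetical input: 1200 resolved paths, split with maxFilesPerProcess="500".
files = [f"/hypothetical/path/file_{i}.MuDst.root" for i in range(1200)]
chunks = split_file_list(files, 500)
names = list_names("1043250413862", len(chunks))
```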
Architecture / building blocks • Main boxes are Java classes • The framework chooses the blocks to use depending on user options (% … -policy XXX) • Interfaces between blocks are identical • Implementations of the Policy class = the heart of SUMS (decision making, planning, resource brokering, …) • Extendable, adaptable
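SUMS itself is written in Java. Purely to illustrate the plug-in idea (identical interfaces, implementation chosen by a user option such as `-policy XXX`), here is a small Python sketch. The class names, the registry and the queue names are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Policy(ABC):
    """Common interface: every policy maps jobs to queue names."""

    @abstractmethod
    def assign(self, jobs: List[str]) -> Dict[str, str]:
        ...


class LocalPolicy(Policy):
    def assign(self, jobs: List[str]) -> Dict[str, str]:
        return {job: "local_queue" for job in jobs}


class RoundRobinPolicy(Policy):
    def __init__(self) -> None:
        self.queues = ["queue_a", "queue_b"]

    def assign(self, jobs: List[str]) -> Dict[str, str]:
        return {job: self.queues[i % len(self.queues)] for i, job in enumerate(jobs)}


# Hypothetical registry keyed by the value of a "-policy" style option.
POLICIES: Dict[str, Type[Policy]] = {"local": LocalPolicy, "roundrobin": RoundRobinPolicy}


def make_policy(name: str) -> Policy:
    try:
        return POLICIES[name]()
    except KeyError:
        raise ValueError(f"unknown policy: {name}") from None


plan = make_policy("roundrobin").assign(["job_0", "job_1", "job_2"])
```

Because callers only see the `Policy` interface, a new implementation can be registered without touching the rest of the chain.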
Job Initializer • XML is validated, request objects created …
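A minimal sketch of this step, assuming a job description shaped like the example shown earlier. The `JobRequest` class and the two checks are stand-ins for real schema validation, and the paths in `EXAMPLE` are made up.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRequest:
    command: str
    input_urls: List[str]
    stdout_url: Optional[str]
    max_files_per_process: Optional[int]


def parse_job(xml_text: str) -> JobRequest:
    """Check the basic shape of a <job> document and build a request object."""
    root = ET.fromstring(xml_text)
    if root.tag != "job":
        raise ValueError("root element must be <job>")
    command = root.findtext("command")
    if not command or not command.strip():
        raise ValueError("<command> is required")
    stdout = root.find("stdout")
    max_files = root.get("maxFilesPerProcess")
    return JobRequest(
        command=command.strip(),
        input_urls=[el.attrib["URL"] for el in root.findall("input") if "URL" in el.attrib],
        stdout_url=stdout.get("URL") if stdout is not None else None,
        max_files_per_process=int(max_files) if max_files is not None else None,
    )


EXAMPLE = """<job maxFilesPerProcess="500">
  <command>root4star -q -b macro.C</command>
  <stdout URL="file:/tmp/out/$JOBID.out" />
  <input URL="filelist:/tmp/mylist.list" />
</job>"""

request = parse_job(EXAMPLE)
```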
Queues • Queue concept is “open” • Queue can be an LRMS queue (PBS, LSF, SGE, …) • Queue can be a Pool or a DRMS (Condor, Condor-G, …) • A Web or Grid Service • … anything for which a dispatcher can be written • The queue object container • Is defined by a name (may be logical) • Is associated with a dispatcher (has a pointer to a dispatcher object); LSFDispatcher uses logical name = queue name • Has resource requirements • CPU time limits, memory limits, the type of storage it can access, storage limits • Base rule: any of them can be undefined (-1), which a Policy must expect
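The queue container described above might be modeled as follows. This is a sketch with hypothetical field names, showing only two of the resource limits and the "-1 means undefined" rule.

```python
from dataclasses import dataclass
from typing import Callable

UNDEFINED = -1  # base rule from the slide: a limit may be left undefined


@dataclass
class Queue:
    name: str                               # may be a logical name
    dispatcher: Callable[[str, str], str]   # (queue name, script) -> result
    cpu_time_limit_min: int = UNDEFINED
    memory_limit_mb: int = UNDEFINED

    def accepts(self, cpu_time_min: int, memory_mb: int) -> bool:
        """An undefined (-1) limit never rejects a job."""
        def within(limit: int, need: int) -> bool:
            return limit == UNDEFINED or need <= limit

        return (within(self.cpu_time_limit_min, cpu_time_min)
                and within(self.memory_limit_mb, memory_mb))


def fake_dispatcher(queue_name: str, script: str) -> str:
    # Placeholder: a real dispatcher would talk to the batch system.
    return f"would submit {script} to {queue_name}"


short = Queue("short", fake_dispatcher, cpu_time_limit_min=60)
```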
Policies • Policies integrate pre-defined queues • Serialized XML as local configuration • A policy can make use of as many queues as necessary • Queues may have • a type (LSF, PBS, Condor, …) • a scope (Local, Distributed, …) • Allows SUMS to decide which one to take depending on the RB decision • Queues can be given an initial weight (for example, used for ordering if weight = priority) • Queues have a weight increment • Complex policies may order queues as necessary (your choice); default order is by weight (priority)
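The default ordering could look like the sketch below. It assumes that a larger weight means a higher priority and it does not model the weight increment; both are guesses for illustration, and the queue names are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueueEntry:
    name: str
    kind: str    # e.g. "LSF", "PBS", "Condor"
    scope: str   # e.g. "Local", "Distributed"
    weight: int  # assumption: higher weight is tried first


def order_queues(queues: List[QueueEntry], scope: Optional[str] = None) -> List[QueueEntry]:
    """Keep queues of the requested scope, ordered by descending weight."""
    candidates = [q for q in queues if scope is None or q.scope == scope]
    return sorted(candidates, key=lambda q: q.weight, reverse=True)


queues = [
    QueueEntry("lsf_short", "LSF", "Local", 10),
    QueueEntry("condor_g", "Condor-G", "Distributed", 5),
    QueueEntry("lsf_long", "LSF", "Local", 20),
]
ordered = [q.name for q in order_queues(queues)]
```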
Policy note - job splitting • The <input> element can take several forms • Transition formats: PFN, PFN (wildcard) • <input URL="file:/star/data15/reco/productionCentral/FullField/P02ge/2001/322/st_physics_2322006_raw_0016.MuDst.root" /> • <input URL="file:/star/data15/reco/productionCentral/FullField/P02ge/2001/*/*.MuDst.root" /> • Locally distributed PFN support • <input URL="file://rcas6078.rcf.bnl.gov/home/starreco/reco/productionCentral/FullField/P02gd/2001/279/st_physics_2279005_raw_0285.MuDst.root" /> • List support • <input URL="filelist:/star/u/user/username/filelists/mylist.list" /> • Dataset, MetaData support • <input URL="catalog:star.bnl.gov?production=P02gd,filetype=daq_reco_mudst,storage=local" nFiles="2000" /> • … LFN support on the way … • Preferred STAR usage: map MetaData/Collections or LFN to PFN, dispatch jobs • BUT THERE ARE TWO WAYS • PFN converted (URL syntax does not end up in the final lists, APPS work as usual) • Lists are formatted and passed to APPS as URLs, APPS need to sort out the URLs. Example: rootd-syntax-like URLs are passed as-is
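To make the input forms concrete, the sketch below classifies an `<input>` URL by its prefix. The function name and the returned labels are invented for this example and say nothing about how SUMS parses these URLs internally.

```python
from typing import Dict, Tuple


def classify_input(url: str) -> Tuple[str, Dict[str, str]]:
    """Return a (kind, details) pair for one <input> URL."""
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError(f"no scheme in {url!r}")
    if scheme == "catalog":
        # catalog:<site>?key=value,key=value,...
        site, _, query = rest.partition("?")
        conditions = dict(item.split("=", 1) for item in query.split(",") if item)
        return "catalog", {"site": site, **conditions}
    if scheme == "filelist":
        return "filelist", {"path": rest}
    if scheme == "file":
        if rest.startswith("//"):
            # file://<host>/<path>: file on a disk local to one machine
            host, slash, path = rest[2:].partition("/")
            return "pfn-remote", {"host": host, "path": slash + path}
        kind = "pfn-wildcard" if "*" in rest else "pfn"
        return kind, {"path": rest}
    raise ValueError(f"unsupported scheme: {scheme}")


kind, details = classify_input(
    "catalog:star.bnl.gov?production=P02gd,filetype=daq_reco_mudst,storage=local"
)
```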
Dispatchers • High-level dispatcher • redirects to • PBS • LSF • SGE • Condor • Condor-G • BOSS • …
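The redirect can be pictured as a lookup from queue type to a function that knows the submit command of that system. The sketch below only builds the command line and does not run it. `bsub -q` and `qsub -q` are the usual LSF and PBS ways to select a queue; the registry, function names and queue names are hypothetical.

```python
from typing import Callable, Dict, List


def lsf_command(queue: str, script: str) -> List[str]:
    return ["bsub", "-q", queue, script]


def pbs_command(queue: str, script: str) -> List[str]:
    return ["qsub", "-q", queue, script]


# Hypothetical registry: one command builder per supported system.
DISPATCHERS: Dict[str, Callable[[str, str], List[str]]] = {
    "LSF": lsf_command,
    "PBS": pbs_command,
}


def dispatch(kind: str, queue: str, script: str) -> List[str]:
    """Build (but do not execute) the submit command for one job script."""
    try:
        build = DISPATCHERS[kind]
    except KeyError:
        raise ValueError(f"no dispatcher for {kind}") from None
    return build(queue, script)


command = dispatch("LSF", "example_queue", "sched_0.csh")
```

Supporting another system then amounts to adding one entry to the registry.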
Add-On - Usage monitoring • Needed usage feedback: monitoring users’ usage to • Allow for a better targeted tool • Focus can be put on the most used/preferred features • CS fantasy trimmed down • Better serve the user community • Eliminate divergence and re-focus • Practicality first, SciFi later … • Ensure equity of usage • Help re-focus tutorials & documentation • JSP based (Tomcat) with MySQL back-end • All options and usage are recorded
Example of useful information … • Which storage type is most used … may very well be a $$ / accessibility question • Implemented two ways of accessing locally distributed files. Are they used? • Added SGE dispatcher a few weeks ago …
Example II-a • Usage plot for PDSF and BNL: 4500 jobs/day, peaks at 20k
Example II-b • The pessimistic graph is an integral count over time. It shows that after first usage, users keep using SUMS … • NB: The drop from the beginning of the summer indicates • Vacation time • Conference time • Lack of new data • (this is not the best period for a SUMS commercial, but informative nonetheless) • See more statistics at http://www.star.bnl.gov/STAR/comp/Grid/scheduler/
Physicist usage • As far as we know, 85% of active users are using SUMS • Selection of publications confirmed as 100% based on SUMS analysis • J. Gonzales - Nuclear Experiment, abstract nucl-ex/0408016, Pseudorapidity Asymmetry and Centrality Dependence of Charged Hadron Spectra in d+Au Collisions at sqrt(sNN)=200 GeV (submitted to PRC) • L. S. Barnby - QM Proceedings - 2004 J. Phys. G: Nucl. Part. Phys. 30 S1121-S1124 • T. Henry - Full jet reconstruction in d+Au and p+p collisions at RHIC, Journal of Physics G: Nuclear Physics (volume 30, issue 8) S1287 • J.S. Lange - Proceedings 19th Winter Workshop on Nuclear Dynamics (2003), nucl-ex/0306005 - Review of search for heavy flavor (c,b quarks) production in leptonic decay channels in Au+Au collisions at sqrt(sNN)=200 GeV at the STAR Experiment at RHIC • A. Tang - Anisotropy at RHIC: the first and the fourth harmonic • … • http://www.star.bnl.gov/central/publications/ (7 papers / analyses submitted in the past 3 months)
Grid experience • Use of SUMS for Grid job submissions is possible • Modulo RSL extensions • <input> and <output> tags MUST specify paths as relative paths (“bla.root”, “blop/test.dat”, …) • <output> attributes fromScratch / toURL are designed to bring the files back (globus-url-copy) • Grid experience has been a challenge • Cryptic messages: we had a problem with a globus error 74 and no clue of what it was for months; no Grid help-desk, no knowledge base index. It turned out to be a firewall issue, with bursts of massive job death • Nonetheless • ¼ of Run4 simulation production made on the grid • 100,000 events generated, analysis ongoing • Success rate • 85% when all goes well • 60% when lots of jobs are submitted (above issue) • Planning to run on a larger scale platform, Grid3+ and/or OSG-0, with (hopefully) better ways to track errors/problems
Schedulers
Schedulers • Can a user front end to other LRMS/DRMS be called a “scheduler”? • Is using local resources within the same paradigm as using globally distributed resources?
Schedulers • Key features for a scheduler • Keep global accounting • Scheduling decisions may be based on • Resource availability, respect of local policies, fairshare (cluster autonomy) • Advance reservation, best use of resources • Network and data cache, data availability • … • Job migration, moving jobs to/from a trusted cluster • Spanning and workflow • Human-readable messages • … • Scheduling algorithms can be complex • Attempts to predict (Weather Services) have proven difficult • Dedicated global accounting and standard messages possible • A mix of LRMS and DRMS capabilities (user autonomy) is not common • A complex algorithm takes into account so many parameters … • Empirical approach • Inspect queue behavior, send jobs, see how the queue reacts … re-adjust • Self-sustained system • Adapts to network/resource/load changes?
Empirical approach (?) (diagram labels: Monitoring, Policy, LSF) • Information is fed by agents to ML (MonaLISA) • Information is recovered by a SUMS module • Scheduling decisions are made based on load and “queue” or “pool” response time • Self-sustained system (no need for percentage-based submission branching) • Hopefully no need for a complex algorithm • Responds as resources, priorities and bandwidth adjust • Results / details in Efstratios Efstathiadis’ presentation, Track 4 - 393
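One way to picture such an empirical, self-adjusting choice is to keep a running estimate of each queue's response time and send the next job to the queue with the lowest estimate. The exponential moving average below is an assumption chosen for illustration; it is not taken from the SUMS monitoring policy, and the queue names and timings are invented.

```python
from typing import Dict


class ResponseTimeTracker:
    """Running estimate of response time per queue (exponential moving average)."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # weight given to the newest observation
        self.estimate: Dict[str, float] = {}

    def observe(self, queue: str, response_time_s: float) -> None:
        old = self.estimate.get(queue)
        if old is None:
            self.estimate[queue] = response_time_s
        else:
            self.estimate[queue] = self.alpha * response_time_s + (1 - self.alpha) * old

    def best_queue(self) -> str:
        if not self.estimate:
            raise ValueError("no observations yet")
        return min(self.estimate, key=self.estimate.get)


tracker = ResponseTimeTracker(alpha=0.5)
tracker.observe("local_lsf", 100.0)
tracker.observe("grid_pool", 40.0)
tracker.observe("grid_pool", 400.0)  # the pool slows down; its estimate rises
choice = tracker.best_queue()
```

Because each new measurement shifts the estimate, the choice follows load changes without fixed submission percentages.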
Contributions • The RHIC/Phenix collaboration has tested and is using SUMS • Contributions included the addition of dispatchers (PBS, BOSS) - Andrey Shevel • Development includes the creation of a GUI front end for end-users - Mike Reuter • Job tracking and monitoring • SUMS allows for dispatching to ANY queue • BOSS (from CMS) is a possible solution as “a” dispatcher • Implemented / contributed by Andrey Shevel (Phenix/SUNY-SB) - Track 5, 86 BODE tracking
Future work • High-level user JDL work • Started with a document on RDL (PPDG-39) • Motivation • Current U-JDL is simple enough but has its limitations • Extension to new resource requirements is possible but inelegant • U-JDL considers most (but not all) data sets • Lacks the concept of tasks and sandboxes • Workflow diagrams: only AND (sequential) is implemented (need OR, conditional branching, etc …) • SBIR with Tech-X (David Alexander) • Deliverables • Enhanced and complete U-JDL (AJHDL) • A WSDL for creating a Grid Service • Reviewed most available high-level JDLs • Job Submission Description Language (JSDL) (GGF) • Analysis Job Description Language (AJDL) (Atlas) • User Request Description Language (URDL) (PPDG-39 / JLab/STAR) • Job Description Language (JDL) (DataGrid) • Job Description Language (JDL) (JLab) • …
Future work • We promised our users the U-JDL will not change • As far as they know, it won’t (XSLT, schema transformation) • But the ones using AJHDL will have access to more features • We are working on job tracking • We are working on the concept of Meta-Log (application-level monitoring) • Seems to be forgotten • Valeri Fine - Poster, 480
Conclusions • SUMS is NOT • A batch system • A toy (real needs, real use, real Physics) • SUMS is • A front end to local and distributed RMS, acting like a client to multiple, heterogeneous RMS • A flexible, open-architecture, object-oriented framework with plug-and-play features • A good environment for further developing • Standards (such as high-level JDL) • Scalability of other components (ML work, immediate use) • Used in STAR for real Physics (usage and publication list) • Used for Distributed / Grid simulation job submission • Used successfully by other experiments • A means to make active users transition to distributed computing and recover under-used resources … • …