Torque/Maui PIC and NIKHEF experience C. Acosta-Silva, J. Flix, A. Pérez-Calero (PIC) J. Templon (NIKHEF)
Outline
• System overview
• Successful experience (NIKHEF and PIC)
• Torque/Maui current situation
• Torque overview
• Maui overview
• Outlook
System overview
• TORQUE is a community and commercial effort based on the OpenPBS project. It improves scalability, enables fault tolerance, and adds many other features
• http://www.adaptivecomputing.com/products/open-source/torque/
• Maui Cluster Scheduler is a job scheduler capable of supporting multiple scheduling policies. It is free and open-source software
• http://www.adaptivecomputing.com/products/open-source/maui/
System overview
• The TORQUE/Maui system has the usual batch system capabilities:
• Queue definition (routing queues)
• Accounting
• Reservation/QOS/Partition
• FairShare
• Backfilling
• Handling of SMP and MPI jobs
• Multicore allocation and job backfilling ensure that Torque/Maui is capable of supporting multicore jobs
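As a sketch of what the multicore allocation mentioned above looks like from the user side (the queue name and script name are hypothetical; this assumes a standard Torque qsub installation):

```
# Request 8 cores on a single node for one multicore job
# ("mcore" is a hypothetical queue name)
qsub -q mcore -l nodes=1:ppn=8 -l walltime=48:00:00 job_script.sh
```

With such requests, Maui's backfill can slot the multicore job into gaps left by draining single-core slots.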
Successful experience
• NIKHEF and PIC are multi-VO sites with local & Grid users
• Successful experience during the first LHC run with the Torque/Maui system
• Currently, both are running Torque-2.5.13 + Maui-3.3.4
• NIKHEF: 30% non-HEP, 55% WLCG, rest non-WLCG HEP or local jobs. Highly non-uniform workload
• 3800 job slots
• 97.5% utilization (last 12 months)
• 2000 waiting jobs (average)
Successful experience. NIKHEF: running jobs (last year); NIKHEF: queued jobs (last year) [plots]
Successful experience
• PIC: 3% non-HEP, 83% Tier-1 WLCG, 12% ATLAS Tier-2, rest local jobs (ATLAS Tier-3, T2K, MAGIC, …)
• 3500 job slots
• approx. 95% utilization (last 12 months)
• 2500 waiting jobs (average)
Successful experience. PIC: running jobs (last year) [plot]
Successful experience. PIC: queued jobs (last year) [plot]
Torque overview
• Torque has a very active community:
• Mailing list: torqueusers@supercluster.org
• Full free support from Adaptive Computing
• New releases roughly every year, with frequent new patches
• 2.5.13 is the last release of the 2.5.X branch
Torque overview
• Torque is well integrated with the EMI middleware
• Widely used in WLCG Grid sites (~75% of sites in the BDII publish pbs)
• Not complex to install, configure, and manage:
• via the qmgr tool
• plain-text accounting
• …
• Torque scalability issues
• Reported for the 2.5.X branch
• Not detected at our scale
• The 4.2.X branch presents significant enhancements in scalability for large environments, responsiveness, reliability, …
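To illustrate the qmgr-based configuration mentioned above, a routing queue setup might look like the following fragment (queue names are hypothetical, not the PIC/NIKHEF production configuration):

```
# qmgr commands: a routing queue that dispatches jobs
# to two execution queues
create queue route_q
set queue route_q queue_type = Route
set queue route_q route_destinations = short_q
set queue route_q route_destinations += long_q
set queue route_q enabled = True
set queue route_q started = True
```

Each line can also be issued from the shell as `qmgr -c "…"`, which makes the configuration easy to script and version.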
Maui overview
• Support: Maui is no longer supported by Adaptive Computing
• Documentation:
• Poor documentation makes the initial installation complex
• Things do not always work as the documentation suggests
• Scalability issues:
• At ~8000 queued jobs, Maui hangs
• The MAXIJOBS parameter can be adjusted to limit the number of jobs considered for scheduling
• This solves the issue (currently in production at NIKHEF)
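The workaround described above might look like the following maui.cfg fragment (the values shown are illustrative, not the NIKHEF production settings; the parameter name is as reported in this review):

```
# maui.cfg: cap the number of idle jobs considered per
# scheduling iteration so a large backlog cannot hang the scheduler
MAXIJOBS        4000
RMPOLLINTERVAL  00:02:00   # interval between scheduling iterations
```

Jobs beyond the cap simply stay queued until earlier jobs start, so throughput is unaffected at steady state.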
Maui overview
• Moab is the non-free scheduler supported by Adaptive Computing and based on Maui
• Aims to increase scalability
• It has continued commercial support
• Configuration files are very similar to the ones in Maui: http://docs.adaptivecomputing.com/mwm/help.htm#a.kmauimigrate.html
• Feedback from sites running Torque/Moab would be a good complement to this review
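As an illustration of that similarity, a typical Maui fairshare/backfill fragment (values are illustrative) is accepted essentially unchanged by Moab:

```
# maui.cfg: fairshare and backfill settings that carry over to Moab
FSPOLICY         DEDICATEDPS   # charge fairshare by dedicated proc-seconds
FSDEPTH          7             # keep 7 fairshare windows
FSINTERVAL       24:00:00      # each window covers one day
FSWEIGHT         1
BACKFILLPOLICY   FIRSTFIT      # backfill short jobs into scheduling gaps
USERCFG[DEFAULT] FSTARGET=5.0  # default per-user fairshare target (%)
```

This keeps a Maui-to-Moab migration mostly a matter of licensing and packaging rather than configuration rewrites.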
Outlook
• Torque/Maui scalability issues
• Only relevant for larger sites
• A feasible option for small- to medium-size sites
• Might well be solved by the 4.2.X branch and by tuning Maui options
• Actually, multicore jobs reduce the number of jobs to be handled by the system
• For sites that are predominantly WLCG (e.g. PIC at 95%), switching to a pure multicore load would further reduce scheduling issues at the site level
• For sites that are much less WLCG dominated (e.g. NIKHEF at 55%), a switch to a pure multicore load might actually increase scheduling issues at the site level, as this move would remove much of the entropy which allows reaching 97% utilization
• Another concern is the support for the systems, Maui being the weakest link in the Torque/Maui combination
Outlook
• Some future options
• Change from Maui to Moab (but it is not free!)
• Setting up a kind of "OpenMaui" project within WLCG sites as a community effort to provide support and improvements to Maui
• Integrate Torque with another scheduler. Which one?
• Complete change to another system (SLURM, HTCondor, …)
• "Do nothing" until a real problem arrives
• Currently just a worry; no real problem detected so far at PIC/NIKHEF
• Improvements from migrating to another system are unclear
Outlook
• Questions:
• If WLCG sites decided to move away from Torque/Maui, would it be feasible before LHC Run 2?
• Migration to a new batch system requires time and effort, thus manpower and expertise, in order to reach an adequate performance for a Grid site
• Not clear if needed before Run 2
• What happens with sites shared with non-WLCG VOs?
• Impact on other users (NIKHEF 45%)
• For PIC, several disciplines rely on local job submission. A change of the batch system affects many users, and requires re-education, changes, and tests of their submission tools to adapt to an eventual new system