Learn about the Unix tool philosophy, how small tools compose, and how the same idea applies to Condor daemons, with a case study on running MPI on demand. Discover the benefits of dividing work across Unix processes, such as restartability, better security, and scalability on multi-core systems.
Unix Tool Philosophy • 1) Individual tools do one thing well • 2) They communicate via ASCII text streams • 3) They are composable
The Paradox • Universal agreement that it's good • Yet no one uses it • (Except for shell one-liners) • grep '^abc' | sort | uniq -c | sort -n
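The one-liner on this slide is the philosophy in miniature: four single-purpose tools joined by text streams. The runnable sketch below feeds it a small made-up input file (sample.txt is hypothetical). It counts how often each line beginning with "abc" appears and lists the results from least to most frequent:

```shell
# Hypothetical sample input; any line-oriented text file works.
printf '%s\n' abcx abcy abcx other abcx abcy > sample.txt

# grep keeps only lines that begin with "abc",
# sort groups identical lines next to each other,
# uniq -c collapses each group and prefixes it with its count,
# sort -n orders the result by that count, smallest first.
grep '^abc' sample.txt | sort | uniq -c | sort -n
```

Each stage can be tested, swapped, or reused on its own. For instance, replacing the final sort -n with sort -rn lists the most frequent lines first.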
More than just shell scripts • Dividing work across Unix processes provides: • Restartability • Better security • Scalability across multi-core machines
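Restartability is easiest to see with a supervisor. Because the worker is a separate process, a crash takes down only that process, and a parent can simply start it again. The sketch below is illustrative only: supervise and flaky_worker are hypothetical functions, and this is not how condor_master is implemented. It reruns a command until the command succeeds or a restart limit is reached:

```shell
# supervise MAX_RESTARTS CMD [ARGS...]
# Runs CMD; if it exits non-zero, runs it again, up to MAX_RESTARTS times.
# Returns 0 once CMD succeeds, 1 if the restart limit is exceeded.
supervise() {
    max_restarts=$1
    shift
    restarts=0
    until "$@"; do
        restarts=$((restarts + 1))
        if [ "$restarts" -gt "$max_restarts" ]; then
            echo "supervise: giving up after $max_restarts restarts" >&2
            return 1
        fi
        echo "supervise: '$1' failed, restart #$restarts" >&2
        sleep "${SUPERVISE_DELAY:-1}"   # back off before restarting
    done
    return 0
}

# Demo: a hypothetical worker that fails on its first two runs, then succeeds.
# It keeps its run count in a file so the count survives across restarts.
flaky_worker() {
    n=$(cat flaky.count 2>/dev/null || echo 0)
    n=$((n + 1))
    echo "$n" > flaky.count
    [ "$n" -ge 3 ]
}

SUPERVISE_DELAY=0
rm -f flaky.count
supervise 5 flaky_worker
```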
For example… • qmail: • Secure, stable • Implemented as about a dozen separate processes
Getting back to Condor… • Condor uses this in some places: • The x-GAHPs • condor_master • Replaceable shadow/starter pairs • One multi-shadow vs. many shadows • But not everywhere: • the schedd
Condor Daemons as Components • A very successful strategy: • Glide-in • Personal Condor • “Hoffman” and schedds as jobs • Condor-C
Case Study: MPI on Demand • The problem: • A pool with lots of machines • Very long-running (weeks-long) vanilla jobs • Need to run a big but short MPI job • Can't reboot the startds • MPI needs the dedicated scheduler • Which requires dedicated machines
Possible Solutions • Add a “suspension slot” • Requires a reboot • Submit the MPI job normally • Preempts the vanilla jobs
COD refresher • COD: Computing On Demand • No scheduling • No file transfer • When a COD job runs, the vanilla job on that machine suspends • “Checkpoint to swap” • Security must be enabled for COD to work • COD users must be explicitly allowed
Startd as COD job • Overview: • Launch a Personal Condor • Run startds as COD jobs on the base pool • The COD startds report to the Personal Condor • Base-pool jobs suspend • Submit the parallel job to the Personal Condor • Remove the COD startds
Startd under COD: Details • Two condor_config files (base pool and COD startd): be careful which one each daemon reads! • COD provides no file transfer, so: • The existing startd binary can be reused • The config file must be pre-staged or placed on NFS • Don't lose the claim IDs!
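With two condor_config files in play, it can help to generate the COD startd's file once and pre-stage it on NFS. The sketch below makes several assumptions. The personal Condor host name is hypothetical. The output path and the p-condor directory match the names used in run-startd.sh. The helper write_cod_startd_config is made up for illustration. The knobs follow the usual Condor configuration conventions, including the documented DedicatedScheduler/STARTD_ATTRS setup for dedicated resources. The set is a minimal starting point, not a complete or tested configuration:

```shell
# write_cod_startd_config OUT_FILE PERSONAL_CONDOR_HOST   (hypothetical helper)
# Writes the second condor_config, read only by the master/startd that run
# as COD jobs. The base pool's own condor_config is left untouched.
write_cod_startd_config() {
    out=$1
    personal_host=$2
    # Unquoted EOF: $personal_host is expanded now, while \$(...) is written
    # literally so that Condor expands it when it reads the file.
    cat > "$out" <<EOF
# Run only a master and a startd.
DAEMON_LIST = MASTER, STARTD
# Report to the personal Condor, not to the base pool's collector.
CONDOR_HOST = $personal_host
COLLECTOR_HOST = \$(CONDOR_HOST)
# Private scratch space under the COD job's working directory (IWD = /tmp).
LOCAL_DIR = /tmp/p-condor
LOG = \$(LOCAL_DIR)/log
SPOOL = \$(LOCAL_DIR)/spool
EXECUTE = \$(LOCAL_DIR)/execute
# Always willing to run the personal Condor's jobs.
START = TRUE
SUSPEND = FALSE
PREEMPT = FALSE
# Advertise the personal Condor's dedicated scheduler so it can run MPI.
DedicatedScheduler = "DedicatedScheduler@$personal_host"
STARTD_ATTRS = \$(STARTD_ATTRS), DedicatedScheduler
EOF
}

# Example (hypothetical host name):
# write_cod_startd_config /nfs/new_config pc.example.org
```

This sketch does not cover the security settings on the personal Condor side, such as which hosts are allowed to advertise to it.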
Example code • HOSTS="a b c" • for h in $HOSTS; do condor_cod request -name "$h" > "claimid.$h"; done • for n in claimid.*; do condor_cod activate -id "$(cat "$n")" -jobad ja; done
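The overview's last step, removing the COD startds, mirrors the activation loop. The minimal sketch below assumes each claimid.<host> file holds only the claim ID, which is the same assumption the slide's cat makes. The helper release_cod_claims is hypothetical. The COD variable exists only so the loop can be dry-run with echo. Releasing a claim is expected to stop the startd still running under it, which lets the suspended vanilla jobs resume:

```shell
# release_cod_claims   (hypothetical helper)
# Releases every claim recorded in claimid.* files in the current directory.
# The files are kept, so a failed release can be retried later.
release_cod_claims() {
    cod=${COD:-condor_cod}             # set COD=echo for a dry run
    for n in claimid.*; do
        [ -e "$n" ] || continue        # glob matched nothing: no claims
        $cod release -id "$(cat "$n")" ||
            echo "release_cod_claims: failed to release claim in $n" >&2
    done
}
```

For example, COD=echo release_cod_claims prints the release commands without contacting any startd.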
COD job ad (the file passed to -jobad) • CMD = "/nfs/path/run-startd.sh" • IWD = "/tmp" • Out = "startd.out" • Err = "startd.err" • JobUniverse = 5 • (5 = vanilla universe)
run-startd.sh • #!/bin/bash • mkdir -p p-condor/{spool,log,execute} • export CONDOR_CONFIG=/nfs/new_config • exec /usr/sbin/condor_master -f -t
Summary • Use Condor daemons as components • Mix and match as needed
Questions? • Thank You!