Advanced Condor Mechanisms
CERN, Feb 14 2011
A better title… “Condor Potpourri”
• Igor feedback: “Could be useful to people, but not Monday”
• If not of interest, new topic in 1 minute
Central Manager Failover
• The Condor Central Manager runs two services:
  • condor_collector
    • A list of collectors is now supported
  • condor_negotiator (the matchmaker)
    • If it fails, an election process lets another negotiator take over
• Contributed technology from the Technion (config sketch below)
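A minimal sketch of a two-node failover setup via the condor_had mechanism; the hostnames (cm1/cm2) and port numbers are illustrative, and the full recipe is in the Condor manual’s High Availability section:
  # On both central manager machines
  COLLECTOR_HOST = cm1.example.com, cm2.example.com
  DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, HAD, REPLICATION
  # The HAD instances that hold the election for the negotiator
  HAD_LIST = cm1.example.com:51450, cm2.example.com:51450
  # Replicate state between the two managers
  HAD_USE_REPLICATION = TRUE
  REPLICATION_LIST = cm1.example.com:41450, cm2.example.com:41450
  # Let HAD tell the master when to start/stop the negotiator
  MASTER_NEGOTIATOR_CONTROLLER = HAD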
Submit Node Robustness: Job Progress Continues if the Connection Is Interrupted
• Condor supports reestablishment of the connection between the submitting and executing machines:
  • If there is a network outage between the execute and submit machines
  • If the submit machine restarts
• To take advantage of this feature, put the following line into the job’s submit description file:
  job_lease_duration = <N seconds>
  For example: job_lease_duration = 1200
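For context, a hypothetical complete submit file; everything except job_lease_duration is a placeholder:
  universe           = vanilla
  executable         = cosmos
  output             = cosmos.out
  log                = cosmos.log
  # Let the job keep running for up to 20 minutes while the
  # connection to the submit machine is re-established
  job_lease_duration = 1200
  queue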
Submit Node Robustness: Job Progress Continues if the Submit Machine Fails
• Automatic schedd failover: Condor can support a submit machine “hot spare”
  • If your submit machine A is down for longer than N minutes, a second machine B can take over
  • Requires a shared filesystem (or just DRBD*?) between machines A and B (sketch below)
*Distributed Replicated Block Device: www.drbd.org
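A sketch of the config shared by machines A and B, assuming /share/spool lives on the shared filesystem (knob names per the Condor manual’s high-availability schedd recipe):
  # The master starts the schedd on whichever machine holds the lock
  MASTER_HA_LIST = SCHEDD
  SPOOL = /share/spool
  HA_LOCK_URL = file:/share/spool
  VALID_SPOOL_FILES = $(VALID_SPOOL_FILES) SCHEDD.lock
  # Both machines must advertise the same schedd name
  SCHEDD_NAME = ha-schedd@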
Interactive Debugging
• Why is my job still running? Is it stuck accessing a file? Is it in an infinite loop?
• condor_ssh_to_job
  • Interactive debugging on UNIX
  • Use ps, top, gdb, strace, lsof, …
  • Forward ports, forward X, transfer files, etc.
condor_ssh_to_job Example
  % condor_q

  -- Submitter: perdita.cs.wisc.edu : <128.105.165.34:1027> :
   ID      OWNER     SUBMITTED     RUN_TIME ST PRI SIZE  CMD
    1.0   einstein   4/15 06:52  1+12:10:05 R  0   10.0  cosmos

  1 jobs; 0 idle, 1 running, 0 held

  % condor_ssh_to_job 1.0
  Welcome to slot4@c025.chtc.wisc.edu!
  Your condor job is running with pid(s) 15603.
  $ gdb -p 15603
  …
How It Works
• ssh keys are created for each invocation
• ssh
  • Uses the OpenSSH ProxyCommand option to reuse the connection created by condor_ssh_to_job
• sshd
  • Runs under the same user id as the job
  • Receives its connection in inetd mode, so nothing new is listening on the network
• Works with CCB and shared_port
What?? Ssh to my worker nodes??
• Why would any sysadmin allow this?
• Because the process tree is managed:
  • Cleanup at end of job
  • Cleanup at logout
• Can be disabled by nonbelievers (see below)
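Assuming the usual knob name, turning it off is one line of configuration on the execute machines:
  # Refuse all condor_ssh_to_job requests
  ENABLE_SSH_TO_JOB = FALSE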
Concurrency Limits
• Limit job execution based on admin-defined consumable resources
  • E.g. licenses
• Can have many different limits
• Jobs say what resources they need
• The negotiator enforces the limits pool-wide
Concurrency Example
• Negotiator config file:
  MATLAB_LIMIT = 5
  NFS_LIMIT = 20
• Job submit file:
  concurrency_limits = matlab,nfs:3
  • This requests 1 Matlab token and 3 NFS tokens
Green Computing
• The startd can place a machine into a low-power state (Standby, Hibernate, Soft-Off, etc.)
  • Controlled by HIBERNATE and HIBERNATE_CHECK_INTERVAL
  • If all slots return a non-zero power level, the machine can be powered down via the condor_power hook
  • A final acked classad, containing wake-up information, is sent to the collector
• Machine ads enter the “Offline State”
  • Stored persistently to disk
  • Ad updated with “demand” information: if this machine were around, would it be matched?
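A minimal sketch of the startd side; the idle threshold is illustrative, and real policies usually combine several expressions:
  # Evaluate the HIBERNATE expression every 5 minutes
  HIBERNATE_CHECK_INTERVAL = 300
  # Each slot votes: suspend to RAM after an hour of keyboard idleness,
  # otherwise return "NONE" (i.e. stay up)
  HIBERNATE = ifThenElse( (KeyboardIdle > 3600), "RAM", "NONE" )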
condor_rooster
• Periodically wakes machines based on a ClassAd expression (ROOSTER_UNHIBERNATE)
• Throttling controls
• Hook callouts make for interesting possibilities…
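A sketch of enabling condor_rooster on the central manager; the expression and throttle values are illustrative:
  # Run condor_rooster under the master
  DAEMON_LIST = $(DAEMON_LIST) ROOSTER
  # Wake an offline machine when its ad shows demand for it
  ROOSTER_UNHIBERNATE = Offline && Unhibernate
  # Throttle: wake at most 4 machines per cycle
  ROOSTER_MAX_UNHIBERNATE = 4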
Dynamic Slot Partitioning
• Divide slots into chunks sized for matched jobs
• Readvertise the remaining resources
• Partitionable resources are cpus, memory, and disk
• See Matt Farrellee’s talk
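A minimal sketch: the startd advertises one whole-machine partitionable slot, and jobs state how big a chunk they need (values illustrative):
  # condor_config on the execute machine
  NUM_SLOTS = 1
  NUM_SLOTS_TYPE_1 = 1
  SLOT_TYPE_1 = 100%
  SLOT_TYPE_1_PARTITIONABLE = TRUE

  # job submit file: carve off 1 cpu and 1 GB of memory
  request_cpus   = 1
  request_memory = 1024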
Dynamic Partitioning Caveats
• Cannot preempt the original slot or the group of sub-slots
  • Potential starvation of jobs with large resource requirements
• Partitioning happens once per slot each negotiation cycle
  • Scheduling of large slots may be slow
High Throughput Parallel Computing
• Parallel jobs that run on a single machine
  • Today 8-16 cores, tomorrow 32+ cores
• Use whatever parallel software you want
  • It ships with the job: MPI, OpenMP, your own scripts
• Optimize for on-board memory access
Configuring Condor for HTPC
• Two strategies:
  • Suspend/drain jobs to open HTPC slots
  • Hold empty cores until an HTPC slot is open
• We have a recipe for the former on the Condor Wiki: http://condor-wiki.cs.wisc.edu
• User accounting enabled by Condor’s notion of “slot weights” (sketch below)
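Not the full wiki recipe, but a sketch of the accounting piece: give the whole machine to one slot and weight it by its cores, so an 8-core HTPC job is charged as 8 rather than 1:
  SLOT_TYPE_1 = cpus=100%, memory=100%, disk=100%
  NUM_SLOTS_TYPE_1 = 1
  SLOT_WEIGHT = Cpus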
CPU Affinity: Four-Core Machine Running Four Jobs w/o Affinity
[Diagram: jobs j1, j2, j3, j4 on cores core1–core4; without affinity, job j3’s threads j3a–j3d wander across all four cores]
CPU Affinity to the Rescue
  SLOT1_CPU_AFFINITY = 0
  SLOT2_CPU_AFFINITY = 1
  SLOT3_CPU_AFFINITY = 2
  SLOT4_CPU_AFFINITY = 3
Four-Core Machine Running Four Jobs w/ Affinity
[Diagram: jobs j1, j2, j3, j4 now pinned to cores core1–core4; job j3’s threads j3a–j3d all stay on j3’s core]
Condor + Hadoop FS (HDFS)
• Condor + HDFS = 2 + 2 = 5 !!!
• A synergy exists (next slide):
  • Hadoop as the distributed storage system
  • Condor as the cluster management system
• A large number of distributed disks in a compute cluster: managing disk as a resource
condor_hdfs Daemon
• Main integration point of HDFS within Condor
• Configures the HDFS cluster based on existing condor_config files
• Runs under condor_master and can be controlled by existing Condor utilities
• Publishes interesting parameters to the collector, e.g. IP address, node type, disk activity
• Currently deployed at UW-Madison
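A sketch of what enabling the daemon looked like in this era; the knob names are from memory of the 7.5-series manual and worth double-checking, and the namenode URL is a placeholder:
  # Run condor_hdfs under the master
  DAEMON_LIST = $(DAEMON_LIST), HDFS
  # Where the name node lives, and what role this machine plays
  HDFS_NAMENODE = hdfs://namenode.example.com:9000
  HDFS_NODETYPE = HDFS_DATANODE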
Condor + HDFS: Next Steps?
• Integrate with the file transfer mechanism
• NameNode failover
• Management of HDFS
• What about HDFS in a GlideIn environment??
• Online transparent access to HDFS??
Remote I/O Socket
• A job can request that the condor_starter process on the execute machine create a Remote I/O Socket
• Used for online access to files on the submit machine, without the Standard Universe
  • Use in Vanilla, Java, …
• Libraries provided for Java and for C, e.g.:
  Java: FileInputStream -> ChirpInputStream
  C:    open() -> chirp_open()
• Or use Parrot!
Secure Remote I/O
[Diagram: Submission site: the shadow runs an I/O server over the job’s home file system. Execution site: the starter forks the job; the job’s I/O library talks to a local I/O proxy via Chirp, which carries remote I/O securely back to the shadow, while local system calls go to the local system.]
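Assuming the usual attribute name, a job opts in through its submit file and then uses the Chirp library calls shown above:
  universe     = vanilla
  # placeholder executable name
  executable   = my_analysis
  # Ask the starter to create the I/O proxy (Remote I/O Socket)
  +WantIOProxy = True
  queue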
DMTCP
• Written at Northeastern U. and MIT
• User-level process checkpoint/restart library
• Fewer restrictions than Condor’s Standard Universe
  • Handles threads and multiple processes
  • No re-linking of the executable
• DMTCP / Condor Vanilla Universe integration exists via a job wrapper script