Tips For Deploying Large Pools
Overview
When supporting pools of hundreds or thousands of machines, some potentially tricky issues can come up. Here I'll address a few of them and talk about some solutions and workarounds.
Scalability Questions
• How many jobs can I submit at once?
• How many machines can I have in my pool?
• Does it matter how long my jobs run?
• What other factors impact scalability?
Job Queue
• The condor_schedd can be one of the major bottlenecks in a Condor system.
• One schedd *can* hold 50,000 jobs (or perhaps more), but it becomes painful to use and can bring the throughput of your pool way down.
Why?
• Besides consuming an enormous amount of memory and disk, having lots of jobs in the queue increases the time it takes to match jobs.
• The condor_schedd is single-threaded.
• So, while it is answering a condor_q query that lists 10,000 jobs, the schedd can't be doing other things, like actually starting jobs (spawning shadows).
• It also cannot talk to the negotiator to match new jobs, which may cause the negotiator to time out while waiting!
Job Queue
• One option is to use DAGMan to throttle the number of submitted jobs.
• Add all your jobs to a DAG (even if there are no dependencies) and then run (see the sketch below):
  condor_submit_dag -maxjobs 200
• DAGMan will then never allow more than 200 jobs from this batch to be submitted at once.
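A minimal sketch of such a dependency-free DAG file (the names throttle.dag and job.sub are just illustrations; any submit description file works):

  # throttle.dag -- no PARENT/CHILD lines, so no dependencies
  JOB job0 job.sub
  JOB job1 job.sub
  JOB job2 job.sub
  # ...one JOB line per job in the batch...

Then submit with: condor_submit_dag -maxjobs 200 throttle.dag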
DAGMan
• DAGMan also provides the nice ability to retry jobs that fail.
• Each DAGMan batch of jobs is independent of any others, i.e. the -maxjobs limit applies only to that particular batch of jobs.
• You can also add a delay between job submissions so the condor_schedd isn't swamped, using:
  DAGMAN_SUBMIT_DELAY = 5
• Both features are sketched below.
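For example, in the DAG file itself (the retry count of 3 is arbitrary, and the job names match the sketch above):

  # retry each of these jobs up to 3 times if it fails
  RETRY job0 3
  RETRY job1 3

and in the condor_config file:

  # wait 5 seconds between consecutive job submissions
  DAGMAN_SUBMIT_DELAY = 5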
Other Small Time Savers When Submitting
• In the submit file:
  COPY_TO_SPOOL = FALSE
• In the condor_config file:
  SUBMIT_SKIP_FILECHECK = TRUE
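A quick sketch of where each setting lives (the executable name and queue count are hypothetical):

  # in the submit file:
  Executable    = my_job
  Copy_to_spool = FALSE
  Queue 500

  # in the condor_config file:
  SUBMIT_SKIP_FILECHECK = TRUE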
File Transfer
• If you are using Condor's file transfer mechanism and you are also using encryption, the overhead can be significant.
• Condor 6.7 allows per-file specification of whether or not to use encryption.
Per-File Encryption
. . .
Transfer_input_files = big_tarball.tgz, sec.key
Encrypt_input_files = sec.key
Dont_encrypt_input_files = big_tarball.tgz
. . .
Per-File Encryption
• Wildcards work too:
. . .
Transfer_input_files = big_tarball.tgz, sec.key
Encrypt_input_files = *.key
Dont_encrypt_input_files = *.tgz
. . .
Job Queue
• My machine is running 800 jobs and the load is too high! How can I throttle this?
• Use MAX_JOBS_RUNNING in the condor_config file. By default, this is set to 300. (You may actually wish to increase this if your submit machine can handle it.)
• This controls how many shadows the schedd will spawn, as in the example below.
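For example, in the condor_config file on the submit machine (600 is just a sketch value; tune it to what your hardware can handle):

  MAX_JOBS_RUNNING = 600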
Pool Size
• Some of the largest known Condor pools are over 4,000 nodes.
• Some have one VM per actual CPU, and some have multiple VMs per CPU.
Central Manager
• If you have a lot of machines sending updates to your central manager, it is possible you are losing some of the periodic updates. You can determine whether this is the case using the COLLECTOR_DAEMON_STATS feature:
Keeping Update Stats
COLLECTOR_DAEMON_STATS = True
COLLECTOR_DAEMON_HISTORY_SIZE = 128

% condor_status -l | grep Updates
UpdatesTotal = 57200
UpdatesSequenced = 57199
UpdatesLost = 2
UpdatesHistory = "0x00000000800000000000000000000000"
If Your Network Is Swamped
• You can make many different intervals longer:
  UPDATE_INTERVAL = 300
  SCHEDD_INTERVAL = 300
  MASTER_UPDATE_INTERVAL = 300
  ALIVE_INTERVAL = 300
Negotiation
• The negotiator normally considers each job separately:
• Can I run this job? No…
• Can I run this job? No…
• Can I run this job? No…
• Etc…
Negotiation
10/12 09:08:45   Request 00463.00000:
10/12 09:08:45   Rejected 463.0 zmiller@cs.wisc.edu <128.105.166.24:37845>: no match found
10/12 09:08:45   Request 00464.00000:
10/12 09:08:45   Rejected 464.0 zmiller@cs.wisc.edu <128.105.166.24:37845>: no match found
10/12 09:08:45   Request 00465.00000:
10/12 09:08:45   Rejected 465.0 zmiller@cs.wisc.edu <128.105.166.24:37845>: no match found
Negotiation
• This process can be greatly sped up if you know which attributes are important to each job:
  SIGNIFICANT_ATTRIBUTES = Owner,Cmd
• Then, once a job is rejected, any further jobs of the same "class" (i.e. with the same values for those attributes) can be skipped immediately.
• The less time the schedd spends talking to the negotiator, the better.
Job Length
• The length of your jobs matters!
• There is overhead in scheduling a job, moving the data, starting shadows and starters, etc.
• Jobs that run for just a few seconds spend more time on overhead than on useful work!
Job Length
• So if your jobs are too short, the schedd simply cannot keep the pool busy.
• There is of course no exact formula for how long jobs should run, but longer-running jobs usually achieve better overall throughput (assuming no evictions!).
Other Factors
• If you have many jobs running on a single submit host, you may want to increase some of your resource limits.
• On Linux (and other Unix-like systems), there are system-wide limits and per-process limits on the number of file descriptors (FDs).
Resource Limits
• System-wide:
• Edit /etc/sysctl.conf:
  # increase system fd limit
  fs.file-max = 32768
• Or:
  echo 32768 > /proc/sys/fs/file-max
Resource Limits
• Per-process:
• Edit /etc/security/limits.conf
• Or:
  su - root
  ulimit -n 16384          # for sh
  limit descriptors 16384  # for csh
  su - your_user_name
  <run job here>
Port Ranges
• The default range is 1024 to 4999.
• Again, in /etc/sysctl.conf:
  # increase system IP port limits
  net.ipv4.ip_local_port_range = 1024 65535
• Or:
  echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range
Complex Problem
• Exactly how much work a system can do is a fairly complex problem, since you are dealing with many types of resources (CPU, disk, network I/O).
• Some experimentation is necessary.
Questions? Thank You!