Processor Co-Allocation in Multicluster Systems
DAS-2 Workshop, Amsterdam, June 6, 2002
Anca Bucur and Dick Epema
Parallel and Distributed Systems Group, Delft University of Technology
D.H.J. Epema/PDS/TUD
Introduction (1)
• In multicluster systems (like the DAS, or in grids), jobs may use co-allocation (i.e., span multiple clusters):
  • to use available capacity
  • to process geographically spread data
• Single-application performance issues:
  • application restructuring
  • wide-area runtime systems (e.g., to optimize collective communication operations)
• Multiple-application performance issues:
  • design/analyze scheduling policies
  • minimize response time, maximize maximal utilization
Introduction (2): Example
• In April 2001, the Cactus Computational Toolkit was used for four-hour astrophysics simulations involving Einstein’s General Relativity equations
• Equipment:
  • At NCSA: 480 CPUs of three SGI Origin2000 systems
  • At SDSC: 1020 CPUs of Blue Horizon
  • OC-12 622-Mbit/s network
Introduction (3): Problems
[Figure: processors (pattern: idle) versus time for clusters 1, 2 and 3, with a job of three components (1, 2, 3); annotations indicate where the job fits if the request is flexible and where it fits if it is unordered.]
System Model
• Multicluster system consisting of clusters of processors of equal speed
• Communication speed ratio: the ratio of the wide-area and local message transfer times
Job Components
• A job consists of job components that each go to a single cluster, one task per processor
• Distributions of job-component sizes:
  • Uniform: U[a,b]
  • Truncated and adapted geometric (favors small sizes and powers of 2): D(q) on [1,b]
[Figure labels: system, job.]
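The two component-size distributions can be sketched as follows. The slide only states that D(q) is a truncated geometric distribution adapted to favor small sizes and powers of 2; the concrete weighting used here (weight q^i, multiplied by a hypothetical `boost` factor of 3 when i is a power of 2, then normalized) is an assumption made for illustration, as are all function names.

```python
import random


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def d_pmf(q, b, boost=3.0):
    """Probability mass function on 1..b.

    Assumed form: weight q**i, multiplied by `boost` when i is a power
    of 2, then normalized. The boost factor is NOT given on the slide.
    """
    weights = {
        i: (boost if is_power_of_two(i) else 1.0) * q ** i
        for i in range(1, b + 1)
    }
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def sample_component_size(pmf, rng):
    """Draw one component size from a PMF built by d_pmf."""
    sizes = list(pmf)
    return rng.choices(sizes, weights=[pmf[s] for s in sizes], k=1)[0]


def sample_uniform_size(a, b, rng):
    """Draw one component size from U[a,b] (integers, both ends included)."""
    return rng.randint(a, b)
```

With `d_pmf(0.9, 8)`, size 4 is more likely than size 3 even though it is larger, which is the "favors powers of 2" behavior the slide describes.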
Job Request Types (1)
• Ordered and unordered requests specify their job-component sizes
[Figure: panels "Ordered" and "Unordered"; the unordered panel carries a "?".]
Job Request Types (2)
• Flexible and total requests only specify the total number of processors needed
[Figure: panels "flexible" and "total", with a "?" marking.]
Fitting a Job (1)
• It is clear when an ordered or a total request fits
• For an unordered request:
  • order components according to decreasing sizes
  • use First-Fit (FF) or Worst-Fit (WF)
[Figure: WF placement of a job on the system; legend: idle, in use.]
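A minimal sketch of fitting an unordered request with FF or WF, following the two steps on the slide. It assumes that each component must go to a different cluster and that WF means "the cluster with the most idle processors"; the function name and the list-of-idle-counts representation are illustrative only.

```python
def place_unordered(components, idle, policy="WF"):
    """Try to place an unordered request.

    components: list of job-component sizes
    idle:       list with the number of idle processors per cluster
    policy:     "FF" (first cluster that fits) or
                "WF" (fitting cluster with the most idle processors)

    Returns a list of (component_size, cluster_index) pairs, or None if
    the request does not fit. Assumes one component per cluster.
    """
    free = list(idle)      # work on a copy, the caller's state is untouched
    used = set()           # clusters that already received a component
    placement = []
    # Step 1: order components according to decreasing sizes.
    for size in sorted(components, reverse=True):
        candidates = [
            c for c in range(len(free))
            if c not in used and free[c] >= size
        ]
        if not candidates:
            return None
        # Step 2: pick a cluster with First-Fit or Worst-Fit.
        if policy == "FF":
            chosen = candidates[0]
        else:
            chosen = max(candidates, key=lambda c: free[c])
        used.add(chosen)
        free[chosen] -= size
        placement.append((size, chosen))
    return placement
```

For components `[2, 5, 3]` and idle counts `[4, 8, 6]`, FF yields `[(5, 1), (3, 0), (2, 2)]` and WF yields `[(5, 1), (3, 2), (2, 0)]`: WF leaves the remaining idle processors spread more evenly.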
Fitting a Job (2)
• For a flexible request:
  • determine minimal number of clusters needed
  • fill least-loaded clusters (CF) completely, or balance load (LB) (variation: LB-A)
[Figure: CF and LB placement of a job; legend: idle, in use.]
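The CF and LB placements for a flexible request can be sketched as below. This assumes equal-size clusters, so that "least loaded" means "most idle processors", and it interprets LB as leaving the chosen clusters with idle counts as equal as possible. LB-A is not covered because the slide does not define it. All names are illustrative.

```python
def place_flexible(total, idle, policy="CF"):
    """Split a flexible request of `total` processors over clusters.

    idle:   list with the number of idle processors per cluster
    policy: "CF" fills the chosen clusters completely (the last one
            receives the remainder); "LB" balances the idle processors
            left over in the chosen clusters.

    Returns {cluster_index: processors_allocated}, or None if the
    request does not fit.
    """
    # Minimal number of clusters: take the least-loaded (most idle) first.
    order = sorted(range(len(idle)), key=lambda c: idle[c], reverse=True)
    chosen, capacity = [], 0
    for c in order:
        if capacity >= total:
            break
        chosen.append(c)
        capacity += idle[c]
    if capacity < total:
        return None

    alloc = {c: 0 for c in chosen}
    if policy == "CF":
        remaining = total
        for c in chosen:
            alloc[c] = min(idle[c], remaining)
            remaining -= alloc[c]
    else:
        # LB: hand out processors one at a time, always from the chosen
        # cluster that currently has the most idle processors left.
        for _ in range(total):
            c = max(chosen, key=lambda k: idle[k] - alloc[k])
            alloc[c] += 1
    return alloc
```

With 25 processors requested and idle counts `[10, 20, 5]`, both policies use clusters 1 and 0; CF allocates 20 and 5, while LB allocates 18 and 7.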
Scheduling Policies
• First Come First Served (FCFS)
• Fit Processors First Served (FPFS): search queue for jobs that fit
[Figure: job queue and system.]
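The difference between the two policies can be sketched in one scheduling pass over the queue. For simplicity the jobs here are ordered requests (component i goes to cluster i); `fits_ordered` and `schedule_pass` are illustrative names, and any refinements the actual policies may have are not modeled.

```python
def fits_ordered(job, idle):
    """An ordered request fits if every component fits in its own cluster."""
    return len(job) <= len(idle) and all(
        size <= free for size, free in zip(job, idle)
    )


def schedule_pass(queue, idle, policy="FCFS"):
    """Run one scheduling pass over the queue (jobs in arrival order).

    FCFS: start jobs from the head of the queue and stop at the first
          job that does not fit.
    FPFS: keep searching the queue and start every job that fits.

    Returns (started_jobs, waiting_jobs, idle_after).
    """
    idle = list(idle)
    started, waiting = [], []
    blocked = False
    for job in queue:
        if not blocked and fits_ordered(job, idle):
            for cluster, size in enumerate(job):
                idle[cluster] -= size
            started.append(job)
        else:
            waiting.append(job)
            if policy == "FCFS":
                blocked = True   # the head of the queue blocks all later jobs
    return started, waiting, idle
```

With queue `[(3, 3), (4, 1), (1, 1)]` and idle counts `[4, 4]`, FCFS starts only `(3, 3)`, whereas FPFS also starts `(1, 1)` past the blocked `(4, 1)`.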
Interarrival/Service Times
• Poisson arrival process in simulations
• All tasks in a job have the same service time
• Service-time distributions used:
  • Deterministic (mean 1)
  • Exponential (mean 1)
  • Hyperexponential (mean 1, coeff. of var. 3)
  • Derived from the DAS
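A sketch of sampling the three synthetic service-time distributions. The slide gives only the mean and coefficient of variation of the hyperexponential distribution; the two-phase form with balanced means used here is one common way to match those two moments and is an assumption, not necessarily the parameterization used in the study.

```python
import math
import random


def h2_params(mean, cv):
    """Two-phase hyperexponential with balanced means (assumed form).

    Returns (p1, mu1, mu2): with probability p1 the sample is
    exponential with rate mu1, otherwise exponential with rate mu2.
    Requires cv >= 1.
    """
    c2 = cv * cv
    p1 = 0.5 * (1.0 + math.sqrt((c2 - 1.0) / (c2 + 1.0)))
    p2 = 1.0 - p1
    return p1, 2.0 * p1 / mean, 2.0 * p2 / mean


def sample_service_time(kind, rng, mean=1.0, cv=3.0):
    """Draw one service time; all tasks of a job would share this value."""
    if kind == "deterministic":
        return mean
    if kind == "exponential":
        return rng.expovariate(1.0 / mean)
    if kind == "hyperexponential":
        p1, mu1, mu2 = h2_params(mean, cv)
        return rng.expovariate(mu1 if rng.random() < p1 else mu2)
    raise ValueError("unknown distribution: " + kind)


def poisson_arrival_times(rate, count, rng):
    """Arrival instants of a Poisson process: exponential interarrival gaps."""
    now, times = 0.0, []
    for _ in range(count):
        now += rng.expovariate(rate)
        times.append(now)
    return times
```

The parameters returned by `h2_params(1.0, 3.0)` give a mean of 1 and a variance of 9, i.e. a coefficient of variation of 3.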
Communication
• We model jobs without and with communication
• With communication:
  • tasks alternate between compute and communication phases
  • communication phase: all-to-all personalized communication
  • time for a single local synchronous message send operation: 0.001
  • communication speed ratios considered: 1-100
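To see how the communication speed ratio enters, here is a rough cost estimate for one communication phase. It assumes that each task sends one message to every other task of the job, that a task's sends are performed one after another, that a wide-area send costs the ratio times a local send, and that the phase lasts as long as its slowest task. These are simplifying assumptions for illustration; the slides do not spell out the cost model.

```python
LOCAL_SEND_TIME = 0.001   # time for a single local synchronous send (from the slide)


def comm_phase_time(components, ratio, local=LOCAL_SEND_TIME):
    """Estimated duration of one all-to-all personalized exchange.

    components: non-empty list of job-component sizes (one per cluster)
    ratio:      communication speed ratio (wide-area / local transfer time)

    A task in a component of size s sends s - 1 local messages and
    n - s wide-area messages, where n is the total job size.
    """
    n = sum(components)
    return max(local * ((s - 1) + ratio * (n - s)) for s in components)
```

Under these assumptions a 16-task job on a single cluster needs 0.015 per communication phase, while the same job split as (4, 4, 4, 4) with ratio 10 needs 0.123.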
Single-cluster DAS Statistics
[Figure: two histograms of number of jobs, one against nodes requested (mean: 23.34, coeff. of var.: 1.11) and one against service time (mean: 356.45 (62.66), coeff. of var.: 5.37).]
Performance Evaluation
• Parameters we vary:
  • job request structure
  • job-component-size distribution
  • service-time distribution
  • number and sizes of clusters (base case: 4x32)
  • placement of unordered and flexible jobs
  • scheduling policy
  • communication speed ratio
  • co-allocation versus no co-allocation
  • queueing structure (global/local)
• Performance metrics:
  • mean response time (only simulation)
  • maximal utilization (analysis and simulation)
Influence of Structure and Size
[Figure: response time versus utilization for ordered, unordered and total requests.]
Influence of Communication Speed Ratio
[Figure: response time versus utilization for communication speed ratios 10 and 100. Curves, right to left: total, flexible, unordered, ordered.]
Co-Allocation versus no Co-Alloc. (1)
• no communication
• unordered jobs
• job size: 4xD(0.9) on [1,8] (fits on a single cluster)
[Figure: response time versus utilization; curves: flexible, 1 component, 2 components, 4 components.]
Co-Allocation versus no Co-Alloc. (2)
• communication
• flexible jobs
• job size: 4xD(0.9) on [1,8]
[Figure: response time versus utilization; curves: LB-A with ratio 5, LB-A with ratio 50, no co-allocation with FF.]
An Application on the DAS (1)
• Solves the Poisson equation with a red-black Gauss-Seidel scheme
• Measurements on the DAS (times in ms):
• Time for diffusing local errors and computing the global error: 14 ms
An Application on the DAS (2)
[Figure: response time versus utilization for total and ordered requests. Equal mix of jobs of sizes (2,2,2,2) and (4,4,4,4).]
Maximal Utilization (1)
• Assume: constant backlog, ordered jobs, exponential service (no communication)
• Consider: the joint probability distribution of the sizes of jobs in the system
• Result: this distribution is the same
  • when the system runs for a long time
  • when the system is filled from the empty state
• Use the convolution of the job-size distribution to determine the distribution of the numbers of jobs in the system
• Compute the maximal utilization
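The quantity analyzed above can also be estimated by simulation. The sketch below is not the convolution-based analysis; it is a small Monte Carlo estimate under the same assumptions (constant backlog, ordered jobs, exponential service with mean 1, no communication), with FCFS scheduling and uniform component sizes assumed in addition. The function name and default parameters are illustrative.

```python
import heapq
import random


def simulate_max_utilization(clusters=4, size=32, a=1, b=16,
                             departures=20000, seed=1):
    """Estimate the maximal utilization under a constant backlog.

    Ordered jobs with one component per cluster, component sizes drawn
    from U[a,b] (requires b <= size), FCFS, exponential service times
    with mean 1. Returns the time-averaged fraction of busy processors.
    """
    rng = random.Random(seed)

    def new_job():
        return tuple(rng.randint(a, b) for _ in range(clusters))

    idle = [size] * clusters
    running = []                 # min-heap of (finish_time, job)
    now, busy_area = 0.0, 0.0
    head = new_job()             # constant backlog: a job is always waiting
    for _ in range(departures):
        # FCFS: start the head of the queue for as long as it fits.
        while all(s <= f for s, f in zip(head, idle)):
            for c, s in enumerate(head):
                idle[c] -= s
            heapq.heappush(running, (now + rng.expovariate(1.0), head))
            head = new_job()
        # Advance to the next departure and release its processors.
        finish, job = heapq.heappop(running)
        busy = clusters * size - sum(idle)
        busy_area += busy * (finish - now)
        now = finish
        for c, s in enumerate(job):
            idle[c] += s
    return busy_area / (now * clusters * size)
```

One minus the returned value is an estimate of the capacity loss for the chosen configuration. The estimate depends on the seed and on the number of simulated departures.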
Maximal Utilization (2)
• We have an approximation for the maximal utilization for unordered jobs with WF
• We use simulations to validate this approximation
• Capacity loss (1-max. util.) for 4 clusters of size 32, uniform job-component sizes: