Instant-access cycle stealing for parallel applications requiring interactive response
Paul Kelly (Imperial College)
Susanna Pelagatti (University of Pisa)
Mark Rossiter (ex-Imperial, now with Telcordia)
Application scenario…
• Workplace with fast LAN and many PCs
• Some users occasionally need high computing power to accelerate interactive tasks
• Example: CAD
  • Interactive design of components/structures
  • Analyse structural properties
  • Simulate fluid flow
  • Compute high-resolution rendering
• Most PCs are under-utilised most of the time
• Can we use spare CPU cycles to improve responsiveness?
The challenge…
• Cycle stealing the easy way…
  • Maintain a batch queue
  • Maximise throughput for multiple, long-running jobs
  • Wait until desktop users leave their desks
• This paper is about doing it the hard way:
  • Using spare cycles to accelerate short, parallel tasks (5-60 seconds)
  • In order to reduce interactive response time
  • While desktop users are at their desks
• This means:
  • No batch queue: execute immediately, using whatever resources are instantaneously available
  • No time to migrate or checkpoint tasks
  • No time to ship data across a wide-area network
A challenging environment…
• For our experiments, we used a group of 32 Linux PCs in a very busy CS student lab
• Graph shows hourly-average percentage utilisation (on a log scale) over a typical day
• Although not 100% busy, the machines are in continuous use
Scenario
• Host PCs service interactive desktop users
• Requests to execute parallel guest jobs arrive intermittently
• The system allocates a group of idle PCs to execute each guest job
• Objectives:
  • Minimise average response time for guest jobs
  • Keep the interference suffered by hosts within reasonable limits
• We show that this can really work, even in our extremely challenging environment
• Next: characterise patterns of idleness
• Then: design software to assign guest tasks
• Then: evaluate alternative strategies by simulation
Earlier work
• Batch queue, multiple long-running jobs
• Parallel jobs
  • "60-workstation cluster can handle job arrival trace taken from a dedicated 32-node CM-5"
• Wide-area networks
• Our goal: improve response time for individual tasks
• Litzkow, Livny, Mutka, "Condor - a hunter of idle workstations". ICDCS'88.
• Atallah, Black, et al., "Models and algorithms for co-scheduling compute-intensive tasks on networks of workstations". JPDC, 1992.
• Arpaci, Dusseau, et al., "The interaction of parallel and sequential workloads on a network of workstations". SIGMETRICS'95.
• Acharya, Edjlali, Saltz, "The utility of exploiting idle workstations for parallel computing". SIGMETRICS'97.
• Petrini, Feng, "Buffered coscheduling: a new methodology for multitasking parallel jobs on distributed systems". IPDPS 2000.
• Subhlok, Lieu, Lowekamp, "Automatic node selection for high performance applications on networks". PPoPP 1999.
• United Devices, SETI@home, Entropia
Characterise patterns of idleness
• Idle periods occur frequently
  • 90% of idle periods occur within 5s of the previous one
• Idle periods don't last long
  • Only 50% last more than 3.3s
• Idle = over a one-second period, less than 10% of CPU time is spent executing user processes, and at least 90% of CPU time could be devoted to a new process (sketched below)
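This idleness test is easy to express against Linux's /proc/stat counters. Below is a minimal Python sketch, assuming the standard aggregate "cpu" line layout and approximating "CPU time that could be devoted to a new process" by the idle share; the slides don't show mpidled's actual implementation, so treat this as illustrative.

```python
# Minimal sketch: classify a PC as "idle" per the definition above,
# by sampling Linux's aggregate /proc/stat counters over one second.
import time

def cpu_times():
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]       # drop the "cpu" label
    user, nice = int(fields[0]), int(fields[1])
    idle = int(fields[3])
    total = sum(int(x) for x in fields)
    return user + nice, idle, total

def is_idle(max_user=0.10, min_free=0.90):
    u0, i0, t0 = cpu_times()
    time.sleep(1.0)                             # one-second observation window
    u1, i1, t1 = cpu_times()
    dt = t1 - t0
    user_share = (u1 - u0) / dt                 # time spent in user processes
    free_share = (i1 - i0) / dt                 # approximation of claimable time
    return user_share < max_user and free_share >= min_free
```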
Distribution of idleness: 32 PCs in busy student lab
• It's very likely that we'll have up to 15 idle machines at any time
• It's unlikely that the same 15 machines will stay idle for long
So how much can we hope to get?
• With our 32-PC cluster, an idle group of 5 processors has about a 50% chance of remaining idle for more than 5 seconds
• This is our parallel computing resource!
The mpidled software
• mpidled is a Linux daemon process which runs on every participating PC:
  • Monitors system utilisation and determines whether the system is idle
  • Uses this and past measurements to predict short-term future utilisation
• mpidle is a client application which lists the participating PCs currently predicted to be idle
  • Produces a list of machine names, for use as an MPI machinefile (see the client sketch below)
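Since mpidle just prints host names, the client side can be very small. The sketch below assumes a UDP request/reply protocol on a hypothetical port 9999 with a whitespace-separated host list as the reply; the real wire format isn't described in the slides. The output could be redirected to a file and handed to mpirun as its machinefile.

```python
# Sketch of an mpidle-style client. The port and the "IDLE?" message are
# hypothetical; only the behaviour (broadcast a query, list idle hosts,
# get an answer within ~0.15 s) follows the slides.
import socket

PORT = 9999  # assumed mpidled service port

def idle_hosts(timeout=0.15):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.settimeout(timeout)
    s.sendto(b"IDLE?", ("<broadcast>", PORT))   # reaches the elected leader
    try:
        reply, _ = s.recvfrom(65535)
        return reply.decode().split()           # one host name per entry
    except socket.timeout:
        return []                               # no leader answered in time

if __name__ == "__main__":
    for host in idle_hosts():
        print(host)                             # redirect to an MPI machinefile
```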
Zero administration by leader election
• Participating PCs are regularly unplugged and rebooted
• Vital to minimise systems-administration overheads…
• mpidled daemons autonomously elect a "leader" to handle client requests (the current implementation relies on LAN broadcast, confined to one subnet); one possible scheme is sketched below
• mpidle usually responds in less than 0.15s
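The slides only say that the daemons elect a leader over LAN broadcast. One minimal scheme that fits that description, sketched here purely as an assumption, is for each daemon to broadcast its identity and for the lowest identity heard within a short window to act as leader; re-running the election periodically tolerates the leader being unplugged or rebooted.

```python
# Assumed election scheme (lowest ID heard on the subnet wins); the actual
# mpidled protocol may differ.
import socket, time

PORT, WINDOW = 9998, 2.0   # hypothetical election port and listen window

def i_am_leader(my_id):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", PORT))
    s.settimeout(0.2)
    s.sendto(my_id.encode(), ("<broadcast>", PORT))  # announce ourselves
    lowest, deadline = my_id, time.time() + WINDOW
    while time.time() < deadline:
        try:
            data, _ = s.recvfrom(1024)
            lowest = min(lowest, data.decode())      # track smallest ID seen
        except socket.timeout:
            pass
    return lowest == my_id
```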
Load prediction
• We use recent measurements of idleness to predict how idle each PC will be in the near future
• Good prediction leads to
  • Shorter execution time for guest jobs
  • Less interference with host processes, i.e. the desktop user
• We're interested in short-running guest jobs, so we don't consider migrating tasks if the prediction turns out wrong
How good is load prediction?
• Previous studies (Dinda and O'Hallaron, Wolski et al.) have shown that taking a weighted mean of the last few samples works as well as anything
[Graph: prediction error against forecast length (seconds), for a 10-second prediction]
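A weighted mean of the last few samples can be kept as an exponentially weighted moving average, which needs only O(1) state per host. The smoothing factor below is an illustrative choice, not a value from the paper or from the cited studies.

```python
# Sketch of a "weighted mean of recent samples" load predictor.
# alpha controls how quickly old samples are forgotten (assumed value).
def make_predictor(alpha=0.5):
    estimate = None
    def update(sample):
        nonlocal estimate
        estimate = sample if estimate is None else \
            alpha * sample + (1 - alpha) * estimate
        return estimate                      # forecast of near-future load
    return update

predict = make_predictor()
for load in (0.05, 0.10, 0.80, 0.20):        # one-second utilisation samples
    forecast = predict(load)
```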
How well does it work?
• Simulation, driven by traces from 32 machines gathered over one week, during busy working hours
  • Uses the application's speedup curve to predict execution time given the number of processors available
  • Also uses the traced load data to compute the CPU share available on each processor
• For this study, we simulated execution of a ray-tracing task
  • Sequential execution takes 42 seconds
  • Speedup is more-or-less linear, with 50-60% efficiency
• Requests to execute a guest task arrive with exponentially distributed inter-arrival times, with a mean of 20 seconds (see the timing-model sketch below)
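The slides give enough numbers to sketch the simulator's timing model: a 42-second sequential ray-trace, roughly linear speedup at 50-60% efficiency (an illustrative 55% midpoint is assumed below), each processor contributing only the CPU share the trace says is free, and Poisson guest arrivals with a 20-second mean gap.

```python
# Sketch of the simulated timing model under the stated workload numbers.
# EFFICIENCY = 0.55 is an assumed midpoint of the quoted 50-60% range.
import random

T_SEQ, EFFICIENCY = 42.0, 0.55

def exec_time(cpu_shares):
    # Roughly linear speedup: S(p) ~ EFFICIENCY * p on dedicated nodes,
    # scaled down by the free CPU share the trace predicts on each node.
    effective_procs = EFFICIENCY * sum(cpu_shares)
    return T_SEQ / max(effective_procs, 1e-9)

def next_arrival_gap():
    return random.expovariate(1.0 / 20.0)    # mean inter-arrival time: 20 s
```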
How well does it work: baseline
• Disruption to desktop users is dramatically reduced compared to assigning work at random (but is not zero)
• Although many processors are used, speedup is low
• Quite often, a guest task is rejected because no processor is idle
  • Usually because an earlier guest task is still running
Allocation policy matters…
• The simplest policy is to allocate all available (idle) processors to each guest job
• This leads to a bimodal distribution: a substantial proportion of guest jobs get little or no benefit
A better strategy: holdback
• The problem:
  • If a second guest task arrives before the first has finished, very few processors are available to run it
• Idea: "holdback"
  • Hold back a fraction r of the processors in reserve
  • Each guest task is allocated (1-r) of the available (idle) processors, as sketched below
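The policy itself is one line of arithmetic. A minimal sketch, with r as the held-back fraction:

```python
# Holdback allocation: grant each arriving guest job (1 - r) of the idle
# pool, keeping the remaining fraction r in reserve for the next arrival.
import math

def allocate(idle_hosts, r):
    n = math.floor((1 - r) * len(idle_hosts))
    return idle_hosts[:n], idle_hosts[n:]    # (granted, held back)

granted, reserved = allocate([f"pc{i:02d}" for i in range(10)], r=0.4)
# granted  -> 6 hosts for this guest job
# reserved -> 4 hosts kept for a guest job arriving before this one finishes
```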
Holdback improves fairness
• By holding back some resources at each allocation, guest tasks get a more predictable and consistent share
• How much to hold back depends on the rate of arrival of guest tasks
[Histograms: frequency (%) against group size, for three holdback settings]
How much to hold back
• Mean speedup is maximised with the right holdback
• Parallel efficiency is lower than it would be on a dedicated parallel system, due to interference
• A larger group size doesn't imply higher speedup
• Details depend on the speedup characteristics of the guest application workload
Conclusions & further work
• Simple, effective tool, to be made freely available
• Even extremely busy environments can host a substantial parallel workload
• Short interactive jobs can be accelerated, if
  • Startup cost and data size are relatively small
  • Parallel execution time lies within the scope of load prediction: 10 seconds or so
  • Desktop users are prepared to tolerate some interference
• Plenty of scope for further study…
  • Memory contention
  • Adaptive holdback
  • Integrate with queuing to handle longer-running jobs
  • How to reduce startup delay?