Computer Science 320 Load Balancing for Hybrid SMP/Clusters
Load Balancing Strategies
• For SMP, use a dynamic schedule that breaks the work into small chunks, keeping the threads continually busy
• For a cluster, use the master/worker pattern with a dynamic schedule to keep the nodes continually busy
• For a hybrid, put several worker threads in each node, and schedule them as in the cluster program
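The SMP strategy above can be sketched with plain Java threads. This is a minimal illustration, not the Parallel Java API used in these slides: the class name DynamicScheduleSketch, the method run, and the per-row placeholder are all assumptions made for the example. Each thread claims the next chunk of rows from a shared counter whenever it runs out of work, which is the essence of a dynamic schedule.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not part of the Parallel Java library.
class DynamicScheduleSketch {

    // Runs `threads` workers over rows 0..rows-1, handing out chunks of
    // `chunk` rows on demand. Returns how many rows each thread processed.
    static int[] run(int rows, int chunk, int threads) {
        AtomicInteger nextRow = new AtomicInteger(0);   // shared cursor
        int[] rowsDone = new int[threads];              // one slot per thread
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            final int id = t;
            pool[t] = new Thread(() -> {
                for (;;) {
                    int lb = nextRow.getAndAdd(chunk);  // claim the next chunk
                    if (lb >= rows) break;              // no more work
                    int ub = Math.min(lb + chunk, rows) - 1;
                    for (int r = lb; r <= ub; ++r) {
                        rowsDone[id]++;                 // placeholder for real row work
                    }
                }
            });
            pool[t].start();
        }
        for (Thread th : pool) {
            try {
                th.join();                              // wait for every worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return rowsDone;
    }

    public static void main(String[] args) {
        int total = 0;
        for (int d : run(1000, 10, 4)) total += d;
        System.out.println("rows processed: " + total);
    }
}
```

The per-thread counts vary from run to run, but they always sum to the number of rows, since each chunk is claimed exactly once.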
One-Level Scheduling Strategy
[Figure: one-level scheduling diagrams for the cluster program and the hybrid program]
Hybrid Mandelbrot Set Program
• Each of Kp nodes has Kt worker threads
• Node 0 has one extra thread (the master)
• Each worker thread is numbered, from 0 to Kt * Kp - 1
• The master thread communicates with all worker threads; message tags identify them
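The numbering scheme above can be illustrated with a small helper. This is only a sketch: the class WorkerIds and its method names are hypothetical, and the inverse mappings are included purely to show that a worker number determines its node and thread.

```java
// Hypothetical helper; the names are illustrative and not from the program.
class WorkerIds {

    // Global worker number for thread `thread` of the process with rank `rank`,
    // when every node runs Kt worker threads.
    static int workerId(int rank, int thread, int Kt) {
        return rank * Kt + thread;
    }

    // Rank of the process that hosts the given worker.
    static int processOf(int worker, int Kt) {
        return worker / Kt;
    }

    // Thread index of the given worker within its process.
    static int threadOf(int worker, int Kt) {
        return worker % Kt;
    }

    public static void main(String[] args) {
        int Kp = 3, Kt = 4;
        for (int rank = 0; rank < Kp; ++rank) {
            for (int thread = 0; thread < Kt; ++thread) {
                System.out.println("rank " + rank + ", thread " + thread
                        + " -> worker " + workerId(rank, thread, Kt));
            }
        }
    }
}
```

With Kp = 3 and Kt = 4, the workers are numbered 0 through 11, matching the range 0 to Kt * Kp - 1.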
Set Up and Run the Threads

    ParallelTeam team = new ParallelTeam(rank == 0 ? Kt + 1 : Kt);

    // Every parallel team thread runs the worker section, except thread Kt
    // (which exists only in process 0) runs the master section.
    team.execute(new ParallelRegion() {
        public void run() throws Exception {
            if (getThreadIndex() == Kt)
                masterSection();
            else
                workerSection(rank * Kt + getThreadIndex());
        }
    });

The workerSection method takes a parameter to identify the thread for messages to and from the master thread.
Scheduling the Threads in the Master

    private static void masterSection() throws IOException {
        int process, thread, worker;
        Range range;

        // Set up a schedule object to divide the row range into chunks.
        IntegerSchedule schedule = IntegerSchedule.runtime();
        schedule.start(K, new Range(0, height - 1));

        // Send initial chunk range to each worker. If range is null, no more
        // work for that worker. Keep count of active workers.
        int activeWorkers = K; // (Kp * Kt)
        for (process = 0; process < Kp; ++process) {
            for (thread = 0; thread < Kt; ++thread) {
                worker = process * Kt + thread;
                range = schedule.next(worker);
                world.send(process, worker, ObjectBuf.buffer(range));
                if (range == null) --activeWorkers;
            }
        }
        . . .
    }
Scheduling the Threads in the Master

    private static void masterSection() throws IOException {
        int process, thread, worker;
        Range range;
        . . .
        // Repeat until all workers have finished.
        while (activeWorkers > 0) {
            // Receive an empty message from any worker.
            CommStatus status =
                world.receive(null, null, IntegerBuf.emptyBuffer());
            process = status.fromRank;
            worker = status.tag;

            // Send next chunk range to that specific worker.
            // If null, no more work.
            range = schedule.next(worker);
            world.send(process, worker, ObjectBuf.buffer(range));
            if (range == null) --activeWorkers;
        }
    }
Worker Thread Activity: Receive

    private static void workerSection(int worker) throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        for (;;) {
            // Receive chunk range from master. If null, no more work.
            ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
            world.receive(0, worker, rangeBuf);
            Range range = rangeBuf.item;
            if (range == null) break;
            int lb = range.lb();
            int ub = range.ub();
            int len = range.length();

            // Allocate storage for matrix row slice if necessary.
            if (slice == null || slice.length < len)
                slice = new int[len][width];

            // Code to compute rows and columns of slice goes here.
Worker Thread Activity: Send

    private static void workerSection(int worker) throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        for (;;) {
            // Receive chunk range from master. If null, no more work.
            ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
            world.receive(0, worker, rangeBuf);
            Range range = rangeBuf.item;
            if (range == null) break;
            . . .
            . . .
            // Report completion of slice to master.
            world.send(0, worker, IntegerBuf.emptyBuffer());

            // Set full pixel matrix rows to refer to slice rows.
            System.arraycopy(slice, 0, matrix, lb, len);

            // Write row slice of full pixel matrix to image file.
            writer.writeRowSlice(range);
        }
    }
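The exchange between the master and the workers can be tried out in a single JVM. The sketch below is an assumption-laden stand-in, not the program's code: blocking queues replace world.send and world.receive, one inbox per worker plays the role of the message tag, a DONE marker plays the role of the null range, and the names MasterWorkerSketch, nextRange, and run are hypothetical. A fixed chunk size is assumed in place of the runtime schedule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-process stand-in for the message-passing protocol.
class MasterWorkerSketch {
    // End-of-work marker, playing the role of the null range.
    static final int[] DONE = new int[0];

    // Returns the next chunk {lb, ub}, or DONE when all rows are assigned.
    static int[] nextRange(int[] cursor, int rows, int chunk) {
        if (cursor[0] >= rows) return DONE;
        int lb = cursor[0];
        int ub = Math.min(lb + chunk, rows) - 1;
        cursor[0] = ub + 1;
        return new int[] {lb, ub};
    }

    // Returns, for each row, how many times it was computed.
    static int[] run(int rows, int chunk, int workers) {
        List<BlockingQueue<int[]>> inbox = new ArrayList<>();
        BlockingQueue<Integer> completions = new LinkedBlockingQueue<>();
        int[] rowCount = new int[rows];
        Thread[] pool = new Thread[workers];
        for (int w = 0; w < workers; ++w) {
            inbox.add(new LinkedBlockingQueue<>());
        }
        for (int w = 0; w < workers; ++w) {
            final int id = w;
            pool[w] = new Thread(() -> {
                try {
                    for (;;) {
                        int[] range = inbox.get(id).take();   // receive chunk
                        if (range == DONE) break;             // no more work
                        for (int r = range[0]; r <= range[1]; ++r) {
                            rowCount[r]++;                    // placeholder row work
                        }
                        completions.put(id);                  // report completion
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[w].start();
        }
        try {
            int[] cursor = {0};
            int active = workers;
            // Send an initial chunk (or DONE) to every worker.
            for (int w = 0; w < workers; ++w) {
                int[] range = nextRange(cursor, rows, chunk);
                inbox.get(w).put(range);
                if (range == DONE) --active;
            }
            // Answer each completion report with the next chunk (or DONE).
            while (active > 0) {
                int w = completions.take();
                int[] range = nextRange(cursor, rows, chunk);
                inbox.get(w).put(range);
                if (range == DONE) --active;
            }
            for (Thread t : pool) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rowCount;
    }
}
```

Every worker receives exactly one DONE marker, either as its first message or as the reply to a completion report, so the master's count of active workers reaches zero exactly when all rows have been handed out and reported.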
One-Level Scheduling Performance
• With one master and Kt * Kp workers, lots of messages are needed just to schedule them all
• Two-level scheduling:
    • One worker per node, but each worker uses multiple threads
    • Two schedules: one from the master for the workers, and one from each worker for its threads
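A rough message count makes the difference concrete. In the master protocol shown earlier, the master sends one initial message per worker, and every chunk then costs one completion report plus one reply, giving K + 2N messages for K workers and N chunks. The sketch below applies that count under assumptions that are not from the slides: a fixed chunk size, a chunk size of 1 for the one-level case, and an image of 1200 rows on 4 nodes with 4 threads each. The class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope count; the sizes in main are assumptions.
class ScheduleMessageCount {

    // Messages exchanged between the master and `workers` workers when
    // `rows` rows are handed out in fixed chunks of `chunk` rows:
    // one initial send per worker, then one report and one reply per chunk.
    static long messages(int rows, int chunk, int workers) {
        long chunks = (rows + chunk - 1) / chunk;   // ceil(rows / chunk)
        return workers + 2 * chunks;
    }

    public static void main(String[] args) {
        int rows = 1200, Kp = 4, Kt = 4;
        // One-level: every thread is a worker (chunk size 1 assumed).
        System.out.println("one-level: " + messages(rows, 1, Kp * Kt));
        // Two-level: one worker per node, master chunk size 100;
        // the thread-level schedule inside a node needs no messages.
        System.out.println("two-level: " + messages(rows, 100, Kp));
    }
}
```

Under these assumptions the one-level program exchanges 2416 scheduling messages and the two-level program 28.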
Changes to Program
• The master uses a schedule with a chunk size of 100; each worker uses a schedule with a chunk size of 1
• The master node runs two parallel sections (master and worker) as well as a worker team
• No worker tags are needed
• The master section is otherwise unchanged
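The two schedules can be sketched in one JVM with plain Java threads. This is an illustration under stated assumptions, not the program itself: one thread per simulated node claims coarse chunks from a shared counter (standing in for the master's schedule), and a team of threads inside each node takes rows of the current slice one at a time. The names TwoLevelScheduleSketch, run, computeSlice, and joinAll are hypothetical, and the sketch creates new threads for every slice for brevity, whereas the program reuses one parallel team.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical single-JVM illustration of two-level scheduling.
class TwoLevelScheduleSketch {

    // Returns, for each row, how many times it was computed.
    static int[] run(int rows, int nodes, int threadsPerNode, int nodeChunk) {
        AtomicInteger nextChunk = new AtomicInteger(0);  // stands in for the master
        int[] rowCount = new int[rows];
        Thread[] nodeThreads = new Thread[nodes];
        for (int n = 0; n < nodes; ++n) {
            nodeThreads[n] = new Thread(() -> {
                for (;;) {
                    // First level: claim a coarse chunk of rows for this node.
                    int lb = nextChunk.getAndAdd(nodeChunk);
                    if (lb >= rows) break;
                    int ub = Math.min(lb + nodeChunk, rows) - 1;
                    computeSlice(lb, ub, threadsPerNode, rowCount);
                }
            });
            nodeThreads[n].start();
        }
        joinAll(nodeThreads);
        return rowCount;
    }

    // Second level: the node's threads take rows lb..ub one at a time.
    static void computeSlice(int lb, int ub, int threads, int[] rowCount) {
        AtomicInteger nextRow = new AtomicInteger(lb);
        Thread[] team = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            team[t] = new Thread(() -> {
                for (;;) {
                    int r = nextRow.getAndIncrement();   // chunk size of 1
                    if (r > ub) break;
                    rowCount[r]++;                       // placeholder row work
                }
            });
            team[t].start();
        }
        joinAll(team);
    }

    static void joinAll(Thread[] threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

A call such as TwoLevelScheduleSketch.run(1000, 3, 4, 100) mirrors the chunk sizes above: coarse chunks of 100 rows per node and single rows per thread.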
Set Up and Run the Threads

    // In master process, run master section and worker section in parallel.
    if (rank == 0)
        new ParallelTeam(2).execute(new ParallelRegion() {
            public void run() throws Exception {
                execute(new ParallelSection() {
                    public void run() throws Exception {
                        masterSection();
                    }
                },
                new ParallelSection() {
                    public void run() throws Exception {
                        workerSection();
                    }
                });
            }
        });
    // In worker process, run only worker section.
    else
        workerSection();
Worker Thread Activity

    private static void workerSection() throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        // Parallel team to calculate each slice in multiple threads.
        ParallelTeam team = new ParallelTeam();
        for (;;) {
            // Receive chunk range from master. If null, no more work.
            ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
            world.receive(0, rangeBuf);
            Range range = rangeBuf.item;
            if (range == null) break;
            final int lb = range.lb();
            final int ub = range.ub();
            final int len = range.length();

            // Allocate storage for matrix row slice if necessary.
            if (slice == null || slice.length < len)
                slice = new int[len][width];
Worker Thread Activity

    private static void workerSection() throws IOException {
        // Image, writer, matrix, and row slice variables are now local here.
        . . .
        // Parallel team to calculate each slice in multiple threads.
        ParallelTeam team = new ParallelTeam();
        for (;;) {
            . . .
            // Compute rows of slice in parallel threads.
            team.execute(new ParallelRegion() {
                public void run() throws Exception {
                    execute(lb, ub, new IntegerForLoop() {
                        // Use the thread-level loop schedule.
                        public IntegerSchedule schedule() {
                            return thrschedule;
                        }
                        // Compute all rows and columns in slice.
                        public void run(int first, int last) {
                            for (int r = first; r <= last; ++r) {
                                // Yadah, yadah, yadah