
Computer Science 320


jalena

Presentation Transcript


  1. Computer Science 320: Load Balancing for Hybrid SMP/Clusters

  2. Load Balancing Strategies
     • For SMP, use a dynamic schedule to break the work into smaller chunks and keep the threads continually busy
     • For a cluster, use the master/worker pattern with a dynamic schedule to keep the nodes continually busy
     • For a hybrid, put several worker threads in each node, and schedule them as in the cluster program
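The SMP strategy can be illustrated with a minimal sketch in plain Java. It does not use the Parallel Java classes from the slides; the class name `DynamicScheduleSketch`, the shared `AtomicInteger` counter, and the placeholder "work" are all assumptions made for illustration. Each thread repeatedly claims the next unclaimed chunk of rows, so a thread that finishes early simply claims more.

```java
import java.util.concurrent.atomic.AtomicInteger;

class DynamicScheduleSketch {
    // Runs `threads` threads over rows [0, rows). Each thread repeatedly claims
    // the next unclaimed chunk of `chunk` rows (chunk must be > 0).
    // Returns the number of rows each thread ended up processing.
    static int[] run(int rows, int threads, int chunk) {
        AtomicInteger next = new AtomicInteger(0); // first unclaimed row
        int[] done = new int[threads];             // done[i] is written only by thread i
        Thread[] team = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            final int id = t;
            team[t] = new Thread(() -> {
                for (;;) {
                    int lb = next.getAndAdd(chunk); // claim a chunk atomically
                    if (lb >= rows) break;          // nothing left to claim
                    int ub = Math.min(lb + chunk, rows) - 1;
                    done[id] += ub - lb + 1;        // placeholder for the real row computation
                }
            });
            team[t].start();
        }
        for (Thread th : team) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return done;
    }

    public static void main(String[] args) {
        int total = 0;
        for (int d : run(1000, 4, 10)) total += d;
        System.out.println("rows processed: " + total);
    }
}
```

How the rows are split among the threads varies from run to run, but every row is claimed exactly once.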

  3. One-Level Scheduling Strategy
     [Figure: two diagrams, labeled Cluster and Hybrid]

  4. Hybrid Mandelbrot Set Program
     • Each of the Kp nodes has Kt worker threads
     • Node 0 has one extra thread (the master)
     • The worker threads are numbered from 0 to Kt * Kp - 1
     • The master thread communicates with all worker threads; message tags identify them
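The numbering scheme can be made concrete with a small sketch. The forward mapping `rank * Kt + threadIndex` is the one the program uses when it starts the worker sections; the inverse helpers `processOf` and `threadOf` and the class name `WorkerIds` are hypothetical additions for illustration (the master in the slides reads the process from the message status rather than computing it).

```java
class WorkerIds {
    // Global worker number for a thread, as in workerSection(rank * Kt + getThreadIndex()).
    static int workerId(int rank, int threadIndex, int kt) {
        return rank * kt + threadIndex;
    }

    // Hypothetical inverse mapping: which process hosts a given worker.
    static int processOf(int worker, int kt) {
        return worker / kt;
    }

    // Hypothetical inverse mapping: the worker's thread index inside its process.
    static int threadOf(int worker, int kt) {
        return worker % kt;
    }

    public static void main(String[] args) {
        int kp = 3, kt = 4; // example sizes, chosen only for illustration
        for (int rank = 0; rank < kp; ++rank)
            for (int thread = 0; thread < kt; ++thread)
                System.out.println("process " + rank + ", thread " + thread
                        + " -> worker " + workerId(rank, thread, kt));
    }
}
```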

  5. Set Up and Run the Threads

         ParallelTeam team = new ParallelTeam(rank == 0 ? Kt + 1 : Kt);

         // Every parallel team thread runs the worker section, except thread Kt
         // (which exists only in process 0), which runs the master section.
         team.execute(new ParallelRegion() {
             public void run() throws Exception {
                 if (getThreadIndex() == Kt)
                     masterSection();
                 else
                     workerSection(rank * Kt + getThreadIndex());
             }
         });

     The workerSection method takes a parameter that identifies the thread in messages to and from the master thread.

  6. Scheduling the Threads in the Master

         private static void masterSection() throws IOException {
             int process, thread, worker;
             Range range;

             // Set up a schedule object to divide the row range into chunks.
             IntegerSchedule schedule = IntegerSchedule.runtime();
             schedule.start(K, new Range(0, height - 1));

             // Send initial chunk range to each worker. If range is null, no more
             // work for that worker. Keep count of active workers.
             int activeWorkers = K; // K = Kp * Kt
             for (process = 0; process < Kp; ++process) {
                 for (thread = 0; thread < Kt; ++thread) {
                     worker = process * Kt + thread;
                     range = schedule.next(worker);
                     world.send(process, worker, ObjectBuf.buffer(range));
                     if (range == null) --activeWorkers;
                 }
             }
             // The method continues on the next slide.

  7. Scheduling the Threads in the Master (continued)

         private static void masterSection() throws IOException {
             int process, thread, worker;
             Range range;
             . . .
             // Repeat until all workers have finished.
             while (activeWorkers > 0) {
                 // Receive an empty message from any worker.
                 CommStatus status = world.receive(null, null, IntegerBuf.emptyBuffer());
                 process = status.fromRank;
                 worker = status.tag;

                 // Send next chunk range to that specific worker.
                 // If null, no more work.
                 range = schedule.next(worker);
                 world.send(process, worker, ObjectBuf.buffer(range));
                 if (range == null) --activeWorkers;
             }
         }
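The master's two phases (hand every worker an initial chunk, then answer each completion report with the next chunk) can be tried out without a cluster. The following is a minimal single-process sketch in plain Java, not the Parallel Java program itself: in-memory queues stand in for messages, a fixed chunk size stands in for the runtime schedule, and the sentinel `NO_MORE_WORK` stands in for the null range. All names in it are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class MasterWorkerSketch {
    // Stand-in for the null range that tells a worker to stop.
    static final int[] NO_MORE_WORK = new int[0];

    // Next {lb, ub} chunk from the schedule, or the sentinel when it is exhausted.
    static int[] nextChunk(Deque<int[]> schedule) {
        int[] range = schedule.poll();
        return range == null ? NO_MORE_WORK : range;
    }

    // Distributes rows [0, rows) in chunks of `chunk` rows (chunk must be > 0)
    // to `workers` worker threads; returns the total number of rows computed.
    static int run(int rows, int workers, int chunk) {
        Deque<int[]> schedule = new ArrayDeque<>();
        for (int lb = 0; lb < rows; lb += chunk)
            schedule.add(new int[] { lb, Math.min(lb + chunk, rows) - 1 });

        BlockingQueue<Integer> toMaster = new LinkedBlockingQueue<>(); // completion reports
        List<BlockingQueue<int[]>> toWorker = new ArrayList<>();       // one inbox per worker
        List<Thread> team = new ArrayList<>();
        AtomicInteger rowsDone = new AtomicInteger();

        for (int w = 0; w < workers; ++w) {
            BlockingQueue<int[]> inbox = new LinkedBlockingQueue<>();
            toWorker.add(inbox);
            final int id = w;
            Thread t = new Thread(() -> {
                try {
                    for (;;) {
                        int[] range = inbox.take();                  // receive chunk range
                        if (range == NO_MORE_WORK) break;            // no more work
                        rowsDone.addAndGet(range[1] - range[0] + 1); // placeholder work
                        toMaster.put(id);                            // report completion
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            team.add(t);
            t.start();
        }

        try {
            // Phase 1: send an initial chunk to every worker.
            int active = workers;
            for (int w = 0; w < workers; ++w) {
                int[] range = nextChunk(schedule);
                toWorker.get(w).put(range);
                if (range == NO_MORE_WORK) --active;
            }
            // Phase 2: answer each completion report with the next chunk.
            while (active > 0) {
                int w = toMaster.take();
                int[] range = nextChunk(schedule);
                toWorker.get(w).put(range);
                if (range == NO_MORE_WORK) --active;
            }
            for (Thread t : team) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rowsDone.get();
    }

    public static void main(String[] args) {
        System.out.println("rows computed: " + run(1000, 6, 7));
    }
}
```

As in the slides, a worker is counted as inactive only once it has been sent the "no more work" reply, so the master loop ends exactly when every worker has been told to stop.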

  8. Worker Thread Activity: Receive

         private static void workerSection(int worker) throws IOException {
             // Image, writer, matrix, and row slice variables are now local here.
             . . .
             for (;;) {
                 // Receive chunk range from master. If null, no more work.
                 ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
                 world.receive(0, worker, rangeBuf);
                 Range range = rangeBuf.item;
                 if (range == null) break;
                 int lb = range.lb();
                 int ub = range.ub();
                 int len = range.length();

                 // Allocate storage for matrix row slice if necessary.
                 if (slice == null || slice.length < len)
                     slice = new int[len][width];

                 // Code to compute rows and columns of slice goes here.

  9. Worker Thread Activity: Send

         private static void workerSection(int worker) throws IOException {
             // Image, writer, matrix, and row slice variables are now local here.
             . . .
             for (;;) {
                 // Receive chunk range from master. If null, no more work.
                 ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
                 world.receive(0, worker, rangeBuf);
                 Range range = rangeBuf.item;
                 if (range == null) break;
                 . . .
                 . . .
                 // Report completion of slice to master.
                 world.send(0, worker, IntegerBuf.emptyBuffer());

                 // Set full pixel matrix rows to refer to slice rows.
                 System.arraycopy(slice, 0, matrix, lb, len);

                 // Write row slice of full pixel matrix to image file.
                 writer.writeRowSlice(range);
             }

  10. One-Level Scheduling Performance
     • With one master and Kt * Kp workers, it takes a lot of messages just to schedule them all
     • Two-level scheduling:
        • One worker per node, but each worker uses multiple threads
        • Two schedules: one in the master that hands chunks to the workers, and one in each worker that hands pieces of a chunk to its threads
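A rough count shows how much scheduling traffic is at stake. The sketch below follows the message pattern of the master code: one range message and one completion message per chunk, plus one final "no more work" message per worker. The image height of 1200 rows, Kp = 4, Kt = 4, a fixed chunk size, and a one-level chunk size of 1 are assumptions chosen for illustration; the slides do not state them.

```java
class MessageCount {
    // Number of chunks when `rows` rows are split into fixed chunks of `chunkSize` rows.
    static long chunks(int rows, int chunkSize) {
        return (rows + chunkSize - 1L) / chunkSize; // ceiling division
    }

    // Scheduling messages: a range message and a completion message for each
    // chunk, plus one final null-range message for each worker.
    static long messages(int rows, int chunkSize, int workers) {
        return 2 * chunks(rows, chunkSize) + workers;
    }

    public static void main(String[] args) {
        int rows = 1200, kp = 4, kt = 4; // assumed sizes, for illustration only
        System.out.println("one-level, chunk size 1:   " + messages(rows, 1, kp * kt));
        System.out.println("two-level, chunk size 100: " + messages(rows, 100, kp));
    }
}
```

Under these assumptions the one-level scheme exchanges 2416 scheduling messages and the two-level scheme 28, because the fine-grained chunks are handed out inside each node without any messages.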

  11. Two-Level Scheduling

  12. Changes to Program
     • The master uses a schedule with a chunk size of 100; each worker uses a schedule with a chunk size of 1
     • The master node has two parallel sections as well as a worker team
     • No worker tags are needed
     • The master section has no changes otherwise
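The two chunk sizes can be seen working together in a small single-process sketch. It is plain Java rather than Parallel Java, it models only one worker node that receives every slice in turn, and the names `TwoLevelSketch`, `computeSlice`, and `run` are hypothetical. The outer loop plays the role of the master's schedule (slices of 100 rows); inside each slice the threads claim one row at a time.

```java
import java.util.concurrent.atomic.AtomicInteger;

class TwoLevelSketch {
    static final int NODE_CHUNK = 100; // master-level chunk size, as on the slide

    // Thread level: `threads` threads share rows lb..ub, claiming one row at a
    // time (chunk size 1). Returns the number of rows computed.
    static int computeSlice(int lb, int ub, int threads) {
        AtomicInteger nextRow = new AtomicInteger(lb);
        AtomicInteger rowsDone = new AtomicInteger();
        Thread[] team = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            team[t] = new Thread(() -> {
                for (;;) {
                    int r = nextRow.getAndIncrement(); // claim the next row
                    if (r > ub) break;                 // slice exhausted
                    rowsDone.incrementAndGet();        // placeholder for computing row r
                }
            });
            team[t].start();
        }
        for (Thread th : team) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rowsDone.get();
    }

    // Node level: walk the image in slices of NODE_CHUNK rows, as a single
    // worker node would if it received every slice from the master.
    static int run(int height, int threads) {
        int total = 0;
        for (int lb = 0; lb < height; lb += NODE_CHUNK) {
            int ub = Math.min(lb + NODE_CHUNK, height) - 1;
            total += computeSlice(lb, ub, threads);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("rows computed: " + run(1050, 4));
    }
}
```

The coarse outer chunks keep the number of master/worker exchanges small, while the fine inner chunks keep the threads inside a node evenly loaded.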

  13. Set Up and Run the Threads

         // In master process, run master section and worker section in parallel.
         if (rank == 0)
             new ParallelTeam(2).execute(new ParallelRegion() {
                 public void run() throws Exception {
                     execute(new ParallelSection() {
                         public void run() throws Exception {
                             masterSection();
                         }
                     },
                     new ParallelSection() {
                         public void run() throws Exception {
                             workerSection();
                         }
                     });
                 }
             });
         // In worker process, run only worker section.
         else
             workerSection();

  14. Worker Thread Activity

         private static void workerSection() throws IOException {
             // Image, writer, matrix, and row slice variables are now local here.
             . . .
             // Parallel team to calculate each slice in multiple threads.
             ParallelTeam team = new ParallelTeam();
             for (;;) {
                 // Receive chunk range from master. If null, no more work.
                 ObjectItemBuf<Range> rangeBuf = ObjectBuf.buffer();
                 world.receive(0, rangeBuf);
                 Range range = rangeBuf.item;
                 if (range == null) break;
                 final int lb = range.lb();
                 final int ub = range.ub();
                 final int len = range.length();

                 // Allocate storage for matrix row slice if necessary.
                 if (slice == null || slice.length < len)
                     slice = new int[len][width];

  15. Worker Thread Activity

         private static void workerSection() throws IOException {
             // Image, writer, matrix, and row slice variables are now local here.
             . . .
             // Parallel team to calculate each slice in multiple threads.
             ParallelTeam team = new ParallelTeam();
             for (;;) {
                 . . .
                 // Compute rows of slice in parallel threads.
                 team.execute(new ParallelRegion() {
                     public void run() throws Exception {
                         execute(lb, ub, new IntegerForLoop() {
                             // Use the thread-level loop schedule.
                             public IntegerSchedule schedule() {
                                 return thrschedule;
                             }

                             // Compute all rows and columns in slice.
                             public void run(int first, int last) {
                                 for (int r = first; r <= last; ++r) {
                                     // Yadah, yadah, yadah
