Applying Scheduling and Tuning to On-line Parallel Tomography

Shava Smallen (Indiana University) • Henri Casanova, Francine Berman (University of California at San Diego, San Diego Supercomputer Center)

Outline • Introduction to On-line Parallel Tomography • Tunable On-line Parallel Tomography • User-directed application-level scheduler • Experiments • Summary

What is tomography? • Tomography: a method for reconstructing the interior of an object from its projections • National Center for Microscopy and Imaging Research (NCMIR) • Electron microscopy (figure: electron microscope)

Example • Compute- and data-intensive • E.g. 2k x 2k dataset (pixels) • 2k units of work (slices) • Total input data size: 976 MB • Total output data size: 9.6 GB • Compute time: ~ 16 days on a standard workstation • Off-line • Data collection • Data processing • Data viewing • Figure: tomogram of spiny dendrite (images courtesy of Steve Lamont)

On-line Parallel Tomography • Provide interactive soft real-time feedback on quality of data acquisition • High tomogram resolution and frequent refreshes • Efficiency benefits for users and microscope

NCMIR Compute Platform • Distributed multi-user, heterogeneous Grid • Meteor cluster (SDSC) • Pentium III dual procs (Linux) • Blue Horizon (SDSC) • 1152 procs (AIX, LoadLeveler, Maui Scheduler) • NCMIR cluster • SGI Indigo2, SGI Octane (IRIX) • SUN ULTRA, SUN Enterprise (Solaris) • Diagram label: network

Application Tunability • On-line parallel tomography is a tunable application • [Chang et al.] Availability of alternate configurations • Resource utilization • Output • On-line parallel tomography output • Tomogram resolution • Refresh frequency • Tunability controlled by configuration pair (f, r) where • f is the reduction factor (tomogram resolution) • r is the number of projections per refresh (refresh frequency) • E.g. (2,3) • Diagram labels: on-line parallel tomography, reduce(f)

Tunability/Scheduling • At run-time, we need to find out which configuration pairs are feasible • Flexibility to allow for trade-offs between f and r • e.g., (2, 3) or (3, 2) • Resource availability • User bounds • E.g., • Refresh at least once every 10 minutes • Minimum image resolution 256 x 256 pixels • A configuration pair is feasible if we can find a corresponding schedule • We choose an adaptive-scheduling approach
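The user bounds on this slide can be illustrated with a minimal Python sketch that filters candidate (f, r) pairs before any scheduling is attempted. Several things here are assumptions and not taken from the slides: that a reduction factor f shrinks each image side by f, that one refresh takes r acquisition periods, the 60-second acquisition period, and the function and parameter names. Passing this filter does not make a pair feasible; a schedule must still be found for it.

```python
from itertools import product

def candidate_pairs(n_pixels, acq_period_s, min_side, max_refresh_s,
                    f_range=range(1, 5), r_range=range(1, 14)):
    """Return the (f, r) pairs that satisfy the user bounds only."""
    pairs = []
    for f, r in product(f_range, r_range):
        side = n_pixels // f           # assumed: f shrinks each image side by f
        refresh_s = r * acq_period_s   # assumed: one refresh every r projections
        if side >= min_side and refresh_s <= max_refresh_s:
            pairs.append((f, r))
    return pairs

# Bounds from the slide: at least 256 x 256 pixels, refresh at least every 10 minutes.
# The 2048-pixel side and 60 s acquisition period are hypothetical inputs.
pairs = candidate_pairs(n_pixels=2048, acq_period_s=60,
                        min_side=256, max_refresh_s=600)
```

With these hypothetical inputs, both (2, 3) and (3, 2) pass the bounds, while any pair with r above 10 is rejected because its refresh period would exceed 10 minutes.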

Application-Level Scheduler (AppLeS) • Enable an application to adaptively schedule its execution on distributed, heterogeneous resources in order to improve performance • Type of information used: • static • e.g. application model, network topology, … • dynamic • e.g. Network Weather Service (NWS) - available CPU, bandwidth, … • AppLeS + application = self-scheduling application

User-directed AppLeS • Involves user in scheduling process • Flexible process • Flow (from diagram): user generates request → AppLeS finds schedule → request infeasible: user adjusts request; request feasible: AppLeS displays pairs → user reviews → rejects all pairs: adjust request; accepts one: execute on-line parallel tomography
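The request and review loop on this slide can be sketched as a small Python function. This is an illustration of the control flow only, under assumptions: the callback names (find_feasible_pairs, choose, adjust), the max_rounds cutoff, and the toy request format are all hypothetical and not part of the AppLeS described in the talk.

```python
def user_directed_schedule(request, find_feasible_pairs, choose, adjust,
                           max_rounds=10):
    """Loop until the user accepts a feasible (f, r) pair or gives up."""
    for _ in range(max_rounds):
        pairs = find_feasible_pairs(request)   # scheduler side
        if pairs:
            chosen = choose(pairs)             # user side; None rejects all pairs
            if chosen is not None:
                return chosen                  # caller would now run the application
        request = adjust(request)              # infeasible or rejected: relax request
    return None

# Toy callbacks (hypothetical): only (2, 2) and (3, 1) can be scheduled,
# and the user starts by asking for full resolution (f <= 1).
schedulable = [(2, 2), (3, 1)]
result = user_directed_schedule(
    request={"max_f": 1},
    find_feasible_pairs=lambda req: [p for p in schedulable if p[0] <= req["max_f"]],
    choose=lambda pairs: pairs[0],
    adjust=lambda req: {"max_f": req["max_f"] + 1},
)
```

In this toy run the first request is infeasible, the user relaxes it once, and the pair (2, 2) is accepted.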

On-line Parallel Tomography Architecture worker Update tomogram slices worker worker scanlines worker projection worker writer preprocessor

Scheduling Approach • Constrained optimization problem based on soft real-time execution • compute constraint • static benchmark, dynamic CPU availability (NWS) • transfer constraint • topology info (ENV), dynamic bandwidth (NWS) • Problem is a nonlinear program • Exploit the small range of f to reduce it to multiple mixed integer programs, which are solved via lp_solve • approximate solution
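The idea of fixing f and testing the compute and transfer constraints can be illustrated with a simplified Python sketch. It is not the mixed integer formulation solved with lp_solve in the talk: it swaps in a plain capacity check that sums how many slices each worker could handle within one refresh period. The cost model (work and data per slice shrink by f squared, slice count shrinks by f, refresh period equals r acquisition periods), the Worker fields, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    cpu_avail: float      # fraction of CPU available, e.g. a dynamic forecast
    bandwidth_bps: float  # bytes per second to the writer, e.g. a dynamic forecast
    slice_time_s: float   # static benchmark: seconds per slice per projection at f = 1

def max_slices(worker, f, r, refresh_s, slice_bytes):
    """Slices one worker can both compute and transfer within one refresh period."""
    if worker.cpu_avail <= 0 or worker.bandwidth_bps <= 0:
        return 0
    # Assumed cost model: work and data per slice both shrink by f * f.
    compute_per_slice = r * worker.slice_time_s / (f * f) / worker.cpu_avail
    transfer_per_slice = slice_bytes / (f * f) / worker.bandwidth_bps
    return int(min(refresh_s / compute_per_slice, refresh_s / transfer_per_slice))

def feasible_pairs(workers, n_slices, acq_period_s, slice_bytes, f_values, r_values):
    """Enumerate the small range of f, then test each r by total capacity."""
    result = []
    for f in f_values:
        slices = n_slices // f             # assumed: slice count shrinks by f
        for r in r_values:
            refresh_s = r * acq_period_s   # assumed: refresh every r projections
            capacity = sum(max_slices(w, f, r, refresh_s, slice_bytes)
                           for w in workers)
            if capacity >= slices:
                result.append((f, r))
    return result

# Hypothetical platform: three identical workers.
workers = [Worker(f"w{i}", cpu_avail=1.0, bandwidth_bps=1e6, slice_time_s=1.0)
           for i in range(3)]
pairs = feasible_pairs(workers, n_slices=1024, acq_period_s=60,
                       slice_bytes=4e6, f_values=(1, 2, 4), r_values=(1, 2, 3))
```

Under this model a larger r relaxes the transfer constraint (slices are sent less often) while leaving the compute constraint unchanged, which is one way to see the trade-off between f and r. A real scheduler would also assign specific slices to specific workers, which this capacity check does not do.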

Experiments • Goals: • Set 1 – Scheduler Results • Evaluate scheduler efficacy • Evaluate impact of dynamic resource availability on scheduler efficacy • Set 2 – Tunability Results • Evaluate usefulness of tunability • Simulation • Number of experiments • Repeatability

NCMIR Grid • Case Study: • week of traces: May 19 – 26, 2001 • CPU availability (NWS) • Bandwidth (NWS) • Node availability (Maui scheduler showbf)

Scheduling Strategies • 4 scheduling strategies

Simtomo • Simulates an execution of on-line parallel tomography • Uses Simgrid - Casanova [CCGrid’2001] • toolkit for evaluating scheduling algorithms • tasks • resources modeled using traces • E.g. Parameter sweep applications [HCW’00] • 2 types of simulations • Executed at 10 minute intervals • 1004 simulations x 4 schedulers

Simulation Types • 1. Partially trace-driven (perfect load predictions) • 2. Completely trace-driven (imperfect load predictions) • Diagram label: real trace

Performance Metric • Relative refresh lateness • Diagram labels: expected refresh period (based on r), actual refresh period, relative refresh lateness
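The slide names the metric but the formula did not survive in the transcript. As a hedged illustration, the Python sketch below assumes one plausible reading: lateness is the amount by which the actual refresh period exceeds the expected one, divided by the expected period, and an early refresh counts as zero. This definition and the function name are assumptions, not the authors' stated formula.

```python
def relative_refresh_lateness(actual_s, expected_s):
    """Assumed definition: overshoot of the refresh period, relative to the expected period."""
    if expected_s <= 0:
        raise ValueError("expected refresh period must be positive")
    return max(0.0, actual_s - expected_s) / expected_s

# Hypothetical refresh: expected every 120 s, actually took 150 s.
late = relative_refresh_lateness(actual_s=150, expected_s=120)
on_time = relative_refresh_lateness(actual_s=100, expected_s=120)
```

Under this assumed definition, a value of 0 means the refresh arrived on time or early, and 0.25 means it arrived a quarter of a period late.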

Scheduling Results (1) (partially trace-driven) • May 19-26, 2001 • 98% • Importance of dynamic bandwidth info

Scheduling Results (2) (completely trace-driven) • May 19-26, 2001 • 57.1%

Tunability Results • How often does the pair change (i.e., tune)? • Assume a single user model where user always chooses pair with lowest f • Find the best pairs throughout simulated week • Snapshot of Monday May 21st • On average, pair changed 25% of the time • Timeline labels: pairs (2,2), (3,2), (2,2), (3,1); times 8:00, 9:00, 10:00, 11:00
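The single-user model on this slide (always pick the pair with the lowest f, then count how often the pick changes) can be sketched in Python as follows. The tie-break on lowest r, the function names, and the sample feasible sets are assumptions made for illustration; they are not the simulation data behind the 25% figure.

```python
def best_pair(pairs):
    """Pick the pair with the lowest f; ties go to the lowest r (tie-break assumed)."""
    return min(pairs) if pairs else None

def change_rate(feasible_sets):
    """Return the chosen pair per interval and the fraction of intervals where it changed."""
    chosen = [best_pair(p) for p in feasible_sets]
    if len(chosen) < 2:
        return chosen, 0.0
    changes = sum(1 for prev, cur in zip(chosen, chosen[1:]) if prev != cur)
    return chosen, changes / (len(chosen) - 1)

# Hypothetical feasible sets for five consecutive scheduling intervals.
feasible_sets = [
    [(2, 2), (3, 1)],
    [(2, 2), (3, 2)],
    [(3, 2)],
    [(3, 2), (4, 1)],
    [(2, 2)],
]
chosen, rate = change_rate(feasible_sets)
```

For these made-up sets, the user's pick moves from (2, 2) to (3, 2) and back, so the pair changes in 2 of the 4 transitions.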

Summary • Tunable on-line parallel tomography at NCMIR • Dynamic resource information improves scheduler efficacy • Dynamic bandwidth information is key • Case for tunability in a Grid environment

Future Work • Introduce cost • another tunable parameter: (f, r, $) • More Grid simulations • Traces from various sites across US and Europe • Generalizing to other applications • Rescheduling • Production use at NCMIR

Parallel Tomography at NCMIR • Embarrassingly parallel • Diagram labels: specimen, X/Y/Z axes, projection, scanline, slice

Scheduling Latency • Time to search for feasible triples • Plot series: 1k x 1k, 2k x 2k
