Assisting technologies for program parallelization Chikayama/Taura Lab. Masakazu HAYATSU hayatsu@logos.t.u-tokyo.ac.jp
Agenda • Introduction • Difficulty of Program Parallelization • Assistant Tools for Program Parallelization • SUIF Explorer • S-Check • Ursa Minor • Conclusion
Introduction • Popularization of parallel computers • Commercial computers with a very large number of processors • Low-end PCs with 2-4 processors • Performance • Uniprocessor speedup is slowing down ⇒ Parallel programs are becoming even more important
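Difficulty of Program Parallelization • Dependencies • Deadlock • Data race • These problems must be avoided (Figure: A and B both access a shared variable X, one writing 100 and the other 1; whether X ends up as 100 or 1 depends on execution order)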
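A data race can be made concrete by enumerating every possible interleaving of two threads instead of running real threads. The sketch below is a hypothetical variant of the figure's example: each of threads A and B increments a shared X with a separate load and store (rather than storing 100 and 1), and all schedules are simulated deterministically.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of step lists a and b that preserves each list's own order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in a_slots else next(ib) for i in range(n)]

def increment(thread):
    """Non-atomic X += 1, split into a load and a store."""
    def load(mem):
        mem[thread] = mem["X"]      # copy shared X into this thread's private register
    def store(mem):
        mem["X"] = mem[thread] + 1  # write register + 1 back to shared X
    return [load, store]

def possible_results():
    """Final values of X over all schedules of threads A and B, starting from X = 0."""
    results = set()
    for schedule in interleavings(increment("A"), increment("B")):
        mem = {"X": 0}
        for step in schedule:
            step(mem)
        results.add(mem["X"])
    return results

print(sorted(possible_results()))  # [1, 2]: 1 means one update was lost
```

Of the six schedules, only the two fully serialized ones give 2; the other four lose an update, which is exactly why such races are hard to find by testing.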
Automatic Parallelization • Low performance • Parallelization techniques are fragile • Knowledge beyond the code itself is often required: whether the iterations of the loop below are independent depends on f and g
    …
    for(i=0; i<N; i++){
        a[f(i)] = 0; // A
        a[g(i)] = 1; // B
    }
    …
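Whether that loop can run in parallel hinges on f and g: it is safe only if no two different iterations write the same element of a. When f and g are known (for example from run-time knowledge), that condition can be checked by brute force, as in the sketch below. The function name and the example f and g are hypothetical; a compiler would have to prove the same property statically, for every N, which is where it usually fails.

```python
def conflicting_iterations(f, g, n):
    """Pairs (i, j), i < j, of iterations of
        for (i = 0; i < n; i++) { a[f(i)] = 0; a[g(i)] = 1; }
    that write a common element of a (cross-iteration output dependences)."""
    writes = [{f(i), g(i)} for i in range(n)]
    return [(i, j) for i in range(n) for j in range(i + 1, n) if writes[i] & writes[j]]

# Even and odd slots: no iterations collide, so the loop could run in parallel.
print(conflicting_iterations(lambda i: 2 * i, lambda i: 2 * i + 1, 4))  # []
# g(i) == f(i + 1): neighbouring iterations overwrite each other's element.
print(conflicting_iterations(lambda i: i, lambda i: i + 1, 4))  # [(0, 1), (1, 2), (2, 3)]
```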
Development Process (figure) • Design & improve model ⇒ Manually optimize program ⇒ Run ⇒ Validity check (data race, deadlock, …) ⇒ Speedup evaluation • ○: Done • ×: Find problems and return to design
Problem of Manual Parallelization • The user must fully understand many lines of code, such as this ray-tracing excerpt • This is error-prone
    (define (RayTracing ViewPoint Vscan nref energy rgb)
      (if (<= nref 4)
          (let ((crashed? (tracer ViewPoint Vscan))) ;crashed?
            (if (and (not crashed?) (!= nref 0))
                (let* ((hl0 (fcsyn (f+ (f* (vector-ref Vscan 0) (vector-ref Light 0))
                                       (f* (vector-ref Vscan 1) (vector-ref Light 1))
                                       (f* (vector-ref Vscan 2) (vector-ref Light 2)))))
                       (hl (if (f< hl0 0.0) 0.0 hl0))
                       (ihl (f* hl hl hl energy (car beam))))
                  (begin
                    (vector-set! rgb 0 (f+ (vector-ref rgb 0) ihl))
                    (vector-set! rgb 1 (f+ (vector-ref rgb 1) ihl))
                    (vector-set! rgb 2 (f+ (vector-ref rgb 2) ihl)))))
            (if crashed?
                (let* ((P (cdr crashed?))   ;intersection point
                       (m (car crashed?))   ;crashed object
                       (NV (Get-NVector m Vscan P)))
                  (let* ((br (fcsyn (f+ (f* (vector-ref NV 0) (vector-ref Light 0))
                                        (f* (vector-ref NV 1) (vector-ref Light 1))
                                        (f* (vector-ref NV 2) (vector-ref Light 2)))))
                         (br1 (if (f< br 0.0) 0.0 br))
                         (bright (if (and (car sh) (Shadow-Check-One-Or-Matrix (car or-Net) P))
                                     0.0
                                     (f* (f+ br1 0.2) energy (vector-ref m 11)))))
                    (begin (utexture m P)
                           (vector-set! rgb 0 (f+ (vector-ref rgb 0) (f* bright (vector-ref m 13))))
                           (vector-set! rgb 1 (f+ (vector-ref rgb 1) (f* bright (vector-ref m 14))))
                           (vector-set! rgb 2 (f+ (vector-ref rgb 2) (f* bright (vector-ref m 15))))
                           …
Important Factors for Assistant Tools • Assist program parallelization • Combine the benefits of the automatic and manual approaches • Automatic: can extract information mechanically, at scale • Manual: can use high-level information • Extract information and highlight what is important
Extraction of Parallelism • Candidates for parallelization: (0R-05-01, 0R-05-02, 0R-05-03), (0R-0e-01, 0R-0e-02), (0R-0t-02, 0R-0t-03), (0R-0w-01, 0R-0w-02)
    ;; swap: exchange v[i] and v[j] (used but not shown on the slide; this definition is an assumption)
    (define (swap v i j)
      (let ((tmp (vector-ref v i)))
        (vector-set! v i (vector-ref v j))
        (vector-set! v j tmp)))

    ;; quick: sort v in place (v: vector to be sorted; left, right: range to sort)
    (define (quick v left right)
      (if (>= left right)
          v
          (let ((new-left left)
                (new-right right)
                (pivot (vector-ref v (floor (/ (+ left right) 2)))))
            (do () ((> new-left new-right))
              (do () ((>= (vector-ref v new-left) pivot))
                (set! new-left (+ new-left 1)))
              (do () ((<= (vector-ref v new-right) pivot))
                (set! new-right (- new-right 1)))
              (if (<= new-left new-right)
                  (begin (swap v new-left new-right)
                         (set! new-left (+ new-left 1))
                         (set! new-right (- new-right 1)))))
            (begin (quick v left new-right)
                   (quick v new-left right)))))

    ;; use a mutable vector: literal constants such as #(...) may not be modified
    (quick (vector 4 5 3 1 4 0 5 6) 0 7)
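The obvious parallelism in `quick` is its two recursive calls: after partitioning, they work on the disjoint ranges [left, new-right] and [new-left, right]. The Python sketch below mirrors the Scheme partition loop and runs the first call in a separate thread, down to a hypothetical `depth` limit. It only illustrates that the calls are independent (CPython threads will not speed up this CPU-bound work), and it is not the output of the authors' tool.

```python
import threading

def partition(v, left, right):
    """Partition loop mirroring the Scheme `quick`; returns (new_left, new_right)."""
    pivot = v[(left + right) // 2]
    new_left, new_right = left, right
    while new_left <= new_right:
        while v[new_left] < pivot:
            new_left += 1
        while v[new_right] > pivot:
            new_right -= 1
        if new_left <= new_right:
            v[new_left], v[new_right] = v[new_right], v[new_left]
            new_left += 1
            new_right -= 1
    return new_left, new_right

def parallel_quick(v, left, right, depth=2):
    """Sort v[left..right] in place; spawn a thread per split while depth > 0."""
    if left >= right:
        return v
    new_left, new_right = partition(v, left, right)
    # new_right < new_left, so the two ranges share no element and the
    # recursive calls can safely run at the same time.
    if depth > 0:
        t = threading.Thread(target=parallel_quick, args=(v, left, new_right, depth - 1))
        t.start()
        parallel_quick(v, new_left, right, depth - 1)
        t.join()
    else:
        parallel_quick(v, left, new_right, 0)
        parallel_quick(v, new_left, right, 0)
    return v

print(parallel_quick([4, 5, 3, 1, 4, 0, 5, 6], 0, 7))
```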
Note • Different approaches • Our work: based on dependency analysis • Today's survey: based on profile data • Why profile data? • Isn't it enough just to know the execution time?
Difficulty in Tuning a Parallel Program (1/2) • (Figure: of a total execution time of 100, the parallel region accounts for 10%) • Coverage • Percentage of total execution time spent in the parallel regions • Amdahl's law • Granularity • Average length of computation between synchronizations • Overhead of communication and synchronization
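Amdahl's law makes the coverage point quantitative. The sketch below assumes the parallel region scales perfectly and the rest stays serial; with the figure's 10% coverage, even unlimited processors give at most 1/0.9, about 1.11 times speedup.

```python
def amdahl_speedup(coverage, processors):
    """Best-case speedup when a fraction `coverage` of the sequential run time is
    perfectly parallelized over `processors` processors and the rest stays serial."""
    return 1.0 / ((1.0 - coverage) + coverage / processors)

# With 10% coverage the bound approaches 1 / 0.9 however many processors are used.
for p in (2, 4, 1000):
    print(p, round(amdahl_speedup(0.10, p), 3))
```

Granularity matters on top of this bound: each synchronization adds overhead, so many short parallel regions can fall well below it.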
Difficulty in Tuning a Parallel Program (2/2) • (Figure: the top resource-using code segment vs. the critical path) • Critical path • Heavy resource consumption by a code segment does not by itself mean there is a corresponding potential for improvement
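The gap between "uses the most resources" and "limits the run time" can be shown on a small task graph. In the hypothetical graph below, `heavy` is the single largest consumer of time but is not on the critical path, so halving it gains nothing, while shortening the cheaper task `b` does. The task names and durations are made up for illustration.

```python
# Hypothetical task graph for one run: name -> (duration, predecessors).
TASKS = {
    "init":  (1, []),
    "heavy": (8, ["init"]),      # the single largest consumer of CPU time
    "a":     (5, ["init"]),
    "b":     (5, ["a"]),
    "join":  (1, ["heavy", "b"]),
}

def completion_time(tasks):
    """Length of the critical (longest-duration) path through an acyclic task graph."""
    finish = {}
    def finish_time(name):
        if name not in finish:
            duration, preds = tasks[name]
            finish[name] = duration + max((finish_time(p) for p in preds), default=0)
        return finish[name]
    return max(finish_time(n) for n in tasks)

def with_duration(tasks, name, duration):
    """Copy of `tasks` with one task's duration changed."""
    changed = dict(tasks)
    changed[name] = (duration, tasks[name][1])
    return changed

print(completion_time(TASKS))                              # 12: init -> a -> b -> join
print(completion_time(with_duration(TASKS, "heavy", 4)))   # still 12
print(completion_time(with_duration(TASKS, "b", 2)))       # 10
```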
Assistant Tools for Program Parallelization • SUIF Explorer • Coverage and granularity • S-Check • Effect of changes on overall performance • Ursa Minor • Experienced programmers' knowledge
SUIF Explorer [Liao et al. 1999] • Objective • Identify the important loops • Rules of thumb • Most of a program's execution time is spent in a small percentage of the code • Most of a program's execution time is spent in loops
The SUIF Explorer System (figure) • 1. Automatic parallelization: the parallelizing compiler processes the sequential program • 2. Collecting profile & dynamic dependences: execution analyzers • 3. Guidance for improving program performance: the Parallelization Guru and the Rivet visualizer, presented to the user
The Parallelization Guru (1/2) • Parallelization guidance • Reports coverage and granularity • Updates this information as new loops are parallelized • A list of loops to parallelize • Sorted by execution time • Only loops that have no I/O and are not nested inside a parallel loop • Dependence information for each loop
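The guidance described above can be sketched from per-loop profile records. Everything here is an assumption for illustration: the `Loop` fields, the sample numbers, and the choice to count one synchronization per parallel-loop invocation when computing granularity. It is not SUIF Explorer's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Loop:
    name: str
    time: float                 # seconds spent in the loop, from the profile
    invocations: int            # how many times the loop was entered
    parallel: bool = False
    has_io: bool = False
    enclosing: list = field(default_factory=list)   # names of enclosing loops

def coverage(loops, total_time):
    """Fraction of total run time spent in parallel loops (assumes they are not nested)."""
    return sum(l.time for l in loops if l.parallel) / total_time

def granularity(loops):
    """Average time per parallel-loop invocation, assuming one synchronization each."""
    par = [l for l in loops if l.parallel]
    calls = sum(l.invocations for l in par)
    return sum(l.time for l in par) / calls if calls else 0.0

def candidates(loops):
    """Sequential loops without I/O and not inside a parallel loop, most expensive first."""
    parallel = {l.name for l in loops if l.parallel}
    eligible = [l for l in loops
                if not l.parallel and not l.has_io and not parallel & set(l.enclosing)]
    return sorted(eligible, key=lambda l: l.time, reverse=True)

loops = [
    Loop("L1", 50.0, 10),
    Loop("L2", 30.0, 100, has_io=True),
    Loop("L3", 15.0, 5, parallel=True),
    Loop("L4", 5.0, 500, enclosing=["L3"]),
]
print(coverage(loops, 100.0), granularity(loops), [l.name for l in candidates(loops)])
# Parallelizing the top candidate updates the guidance:
loops[0].parallel = True
print(coverage(loops, 100.0), [l.name for l in candidates(loops)])
```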
The Parallelization Guru (2/2) • User interaction • The user starts with the loop at the top of the list • If the loop has many dependences, the user may choose not to attempt it • Otherwise, the user determines, using program slices, • whether a static dependence can be ignored • whether an array can be privatized, etc.
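Array privatization can be shown on a small loop. Every iteration fully writes the scratch array `t` before reading it, so iterations conflict only because they share storage; giving each iteration its own copy removes that dependence. The functions below are hypothetical and only illustrate the kind of transformation a user might confirm with the Guru.

```python
from concurrent.futures import ThreadPoolExecutor

def iteration(i, b, t):
    """One loop iteration; it writes every element of scratch array t before reading it."""
    for k in range(len(t)):
        t[k] = b[i] * k
    return sum(t)

def run_shared(b, width=4):
    """Original sequential loop: all iterations reuse one shared t."""
    t = [0] * width
    return [iteration(i, b, t) for i in range(len(b))]

def run_privatized(b, width=4):
    """Privatized loop: each iteration gets its own t, so iterations may run concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: iteration(i, b, [0] * width), range(len(b))))

print(run_shared([1, 2, 3]), run_privatized([1, 2, 3]))  # both [6, 12, 18]
```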
Program slice • The set of statements that contribute to the value of a variable at a given program point
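A backward slice can be sketched for straight-line code, where only data dependences matter (a real slicer also follows control dependences through branches and loops). Statements are modeled as hypothetical (defined variable, set of used variables) pairs; walking backwards, a statement is kept if it defines a variable that is still needed.

```python
def backward_slice(stmts, var):
    """Indices of statements in straight-line code that can affect `var` at the end.
    Each statement is a (defined_variable, set_of_used_variables) pair."""
    needed = {var}
    kept = []
    for i in range(len(stmts) - 1, -1, -1):
        target, uses = stmts[i]
        if target in needed:
            kept.append(i)
            needed.discard(target)   # this definition satisfies the need...
            needed |= uses           # ...but now its inputs are needed
    return sorted(kept)

# d depends on b, which depends on the first a; the later redefinition of a does not matter.
program = [
    ("a", set()),      # 0: a = input()
    ("b", {"a"}),      # 1: b = 2 * a
    ("c", set()),      # 2: c = 5
    ("a", {"c"}),      # 3: a = c + 1
    ("d", {"b"}),      # 4: d = b + 1
]
print(backward_slice(program, "d"))  # [0, 1, 4]
```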
The Parallelization Guru • Comment • Performance data and dependence information are closely linked ⇒ this cuts development cost • It is applicable only to loops
Assistant Tools for Program Parallelization • SUIF Explorer • Coverage and granularity • S-Check • Effect of changes on overall performance • Ursa Minor • Experienced programmers' knowledge
S-Check [Snelick 1997] • Objective • Identify the parts of the program where changes will significantly improve overall performance • Effect prediction • Determine the effect of changes to the code without actually making them
Sensitive Checker • Insert "delays" into segments of a parallel program and calculate the sensitivity of execution time to these perturbations • Assumption • If a code segment is highly sensitive to slight perturbations ⇒ comparable improvements to that segment will boost performance correspondingly
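Program Model • The program is treated as a transfer function from the delays $x_j$ inserted into segments $j$ to the execution time $T$ • Taylor expansion: $T(x_1, \dots, x_k) = \beta_0 + \sum_{j} \beta_j x_j + \sum_{i<j} \beta_{i,j} x_i x_j + \cdots$ • $\beta_j$: how sensitive execution time is to segment $j$ • $\beta_{i,j}$: interaction between segments $i$ and $j$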
S-Check workflow (figure)
• Original parallel program:
    while(x>y){ }                  // A
    send(…);                       // B
    …
    do_computation{…};             // C
• Mark possible bottlenecks (// A, // B, // C)
• Insert delays (1: ON / 0: OFF):
    while(x>y){ delay(a); }        // A
    delay(b); send(…);             // B
    …
    do_computation{delay(c); …};   // C
• Generate & run numerous versions of the program, one per delay pattern, e.g. (A, B, C) = (0,0,0), (1,0,0), (0,1,0), (1,1,0), …
• Analyze results: solve for effects
    Source   Effect
    A        0.44
    B        4.54
    AB       0.07
    C        1.21
    BC       0.02
    AC       0.34
    ABC      0.00
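The "solve for effects" step can be sketched with the standard analysis of a full 2^k factorial experiment. Assumptions: A, B, C are the three delay sites, each version's execution time is the response, levels are coded -1 (OFF) and +1 (ON), and `run_program` is a made-up timing model standing in for actually running each instrumented version. The resulting numbers are not the ones in the table, and S-Check's exact estimator may differ.

```python
from itertools import product
from math import prod

FACTORS = "ABC"   # the three marked delay sites

def run_program(pattern):
    """Hypothetical stand-in for timing one instrumented version (1 = delay ON)."""
    a, b, c = pattern
    return 10.0 + 2.0 * a + 5.0 * b + 1.0 * a * b   # C has no influence here

def effects(runs, times):
    """Main and interaction effects of a full 2**k factorial experiment.

    With levels coded -1 (OFF) / +1 (ON), effect(S) = sum(sign_S * y) / (N / 2),
    where sign_S is the product of the coded levels of the factors in S.
    """
    k, n = len(runs[0]), len(runs)
    result = {}
    for mask in product((0, 1), repeat=k):
        if not any(mask):
            continue  # the empty set is the grand mean, not an effect
        name = "".join(f for f, m in zip(FACTORS, mask) if m)
        contrast = sum(
            prod(2 * level - 1 for level, m in zip(run, mask) if m) * y
            for run, y in zip(runs, times)
        )
        result[name] = contrast / (n / 2)
    return result

runs = list(product((0, 1), repeat=len(FACTORS)))   # all 8 ON/OFF patterns
times = [run_program(r) for r in runs]
for source, effect in sorted(effects(runs, times).items(), key=lambda kv: -abs(kv[1])):
    print(source, effect)
```

Sorting by magnitude mirrors how the table is read: the largest effects point at the segments most likely to be bottlenecks.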
User Interaction (1/3) • Test code locations are selected manually or automatically, based on • information provided by the profiler • programming constructs (e.g., while, for) • certain library function calls (e.g., barrier(), send())
User Interaction (2/3) • Set the parameters • Delay perturbation patterns • Delay values • Trade-off: information gained vs. number of runs
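The trade-off grows quickly with the number of delay sites, since a full factorial design needs 2^k runs. One standard remedy, sketched below on the assumption that it fits this setting (the slide does not say which designs S-Check supports), is a half-fraction design with 2^(k-1) runs. The saved runs are paid for by confounding: with k = 3, the C effect cannot be told apart from the AB interaction.

```python
from itertools import product
from math import prod

def full_factorial(k):
    """All 2**k ON/OFF delay patterns."""
    return list(product((0, 1), repeat=k))

def half_fraction(k):
    """2**(k-1) patterns: the last factor's +/-1 level equals the product of the others,
    so the product of all k coded levels is always +1 (defining relation I = AB...K)."""
    runs = []
    for levels in product((0, 1), repeat=k - 1):
        last_sign = prod(2 * l - 1 for l in levels)
        runs.append(levels + ((last_sign + 1) // 2,))
    return runs

for k in (3, 10):
    print(k, len(full_factorial(k)), len(half_fraction(k)))   # 3: 8 vs 4, 10: 1024 vs 512
print(half_fraction(3))   # [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
```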
User Interaction (3/3) • Code with a higher effect is more likely to be a bottleneck • Dependences are not taken into account
S-Check • Comment • Identifies the program segments that directly affect performance • Knowledge of the program is required to mark possible bottlenecks • As the code grows, sensitivity tests take longer • Dependence information is not available
Assistant Tools for Program Parallelization • SUIF Explorer • Coverage and granularity • S-Check • Effect of changes on overall performance • Ursa Minor • Experienced programmers' knowledge
Ursa Minor [Kim et al. 2000] • Objective • Instead of stopping at pointing out problematic code (×), present possible causes and solutions (○) • Transfer knowledge from experienced programmers to novice programmers
Ursa Minor System (figure) • Import/export: static and dynamic data of the parallel program, as data files from Polaris or other sources • Database Manager: stores analyzed data, map files, etc. in the database • Merlin Performance Advisor: analyzes problems and suggests solutions • GUI Manager: Table View and Structure View for the user
Merlin Performance Advisor • Knowledge database • Knowledge of diagnoses and solutions • Transfers programming experience from experts to new users (via the "MAP" file) • Performance model • Architecture, etc.
Merlin • Symptoms ⇒ Diagnostics ⇒ Suggestions
Advisor Map (1/2) • Advisor Map • Problem Domain • General performance problems from the viewpoint of programmers • Diagnostics Domain • Possible causes of these problems • Solution Group • Possible remedies
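The three domains of an advisor map can be pictured as plain data: each problem lists candidate diagnostics, and each diagnostic has a condition over per-loop data plus a group of remedies. Everything below (the entries, the field names `parallel` and `time_per_call`, the threshold) is hypothetical and not taken from an actual Ursa Minor map file.

```python
# Hypothetical advisor map: problem -> [(diagnosis, condition, solutions)].
ADVISOR_MAP = {
    "poor speedup": [
        ("loop was not parallelized",
         lambda loop: not loop["parallel"],
         ["inspect the blocking dependences", "try array privatization"]),
        ("parallel loop is too fine-grained",
         lambda loop: loop["parallel"] and loop["time_per_call"] < 1e-4,
         ["parallelize an enclosing loop instead"]),
    ],
}

def advise(problem, loop):
    """Diagnoses (with suggested remedies) whose condition holds for this loop's data."""
    return [(diagnosis, solutions)
            for diagnosis, condition, solutions in ADVISOR_MAP.get(problem, [])
            if condition(loop)]

print(advise("poor speedup", {"parallel": True, "time_per_call": 2e-5}))
```

Keeping the map as data rather than code is what lets experts write new entries that novices then benefit from.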
Expression Evaluator • Basic Spreadsheet Operations • Numeric Functions: NEG, ADD, SPDUP, PERCO, ARVG, etc. • Relational Functions: EQ, NE, etc. • Query Functions: PARALLEL, HASIO, HASCALL, HASDEP, etc. • Logical Functions: AND, OR, etc.
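The spreadsheet-style expressions can be pictured as a small evaluator over table rows. The sketch below implements only a hypothetical subset (NEG, ADD, EQ, NE, AND, OR and the PARALLEL/HASIO/HASCALL/HASDEP queries) with assumed semantics and an assumed nested-tuple syntax; the slide does not describe the real syntax or the exact meaning of functions such as SPDUP or PERCO.

```python
import operator

# Hypothetical subset of the operators listed above; semantics are assumptions.
FUNCTIONS = {
    "NEG": operator.neg,
    "ADD": operator.add,
    "EQ": operator.eq,
    "NE": operator.ne,
    "AND": lambda *xs: all(xs),
    "OR": lambda *xs: any(xs),
}

QUERIES = {"PARALLEL", "HASIO", "HASCALL", "HASDEP"}  # read a flag from the current row

def evaluate(expr, row):
    """Evaluate a nested-tuple expression against one table row (a dict of columns).
    Tuples are calls, strings are column references, anything else is a literal."""
    if isinstance(expr, tuple):
        op, *args = expr
        if op in QUERIES:
            return bool(row.get(op.lower(), False))
        return FUNCTIONS[op](*(evaluate(a, row) for a in args))
    if isinstance(expr, str):
        return row[expr]
    return expr

row = {"parallel": True, "hasio": False, "time": 3.0}
print(evaluate(("AND", ("PARALLEL",), ("NE", ("HASIO",), True)), row))   # True
```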
Merlin • Comment • Goes a step further than merely indicating a bottleneck • Who writes the "MAP"? • The effectiveness of this technology depends on the quality of the MAP
Comparison • SUIF Explorer vs. S-Check • No configuration needed; dependence information available • Efficiency? • These two vs. Ursa Minor • Practical • Not beginner-friendly
Conclusion • Several approaches to guiding the user with relevant information • Future work • Integration • Profiler and dependence analyzer • Portability • Different architectures, OSes, and performance characteristics