Sound and Precise Analysis of Parallel Programs through Schedule Specialization
Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang
Columbia University
Motivation
• Analyzing parallel programs is difficult.
[Figure: precision (y-axis) vs. soundness (x-axis, # of analyzed schedules / # of total schedules). Dynamic analysis covers only the analyzed schedules, with high precision but low soundness; static analysis covers all schedules, with high soundness but low precision. A "?" marks the missing point that is both precise and sound.]
Schedule Specialization
• Precision: Analyze the program over a small set of schedules.
• Soundness: Enforce these schedules at runtime.
[Figure: on the same precision/soundness axes, schedule specialization reaches high precision over its enforced schedules and remains sound, because only those schedules can occur at runtime.]
Enforcing Schedules Using Peregrine
• Deterministic multithreading
  • e.g. DMP (ASPLOS '09), Kendo (ASPLOS '09), CoreDet (ASPLOS '10), Tern (OSDI '10), Peregrine (SOSP '11), DTHREADS (SOSP '11)
• Performance overhead
  • e.g. Kendo: 16%, Tern & Peregrine: 39.1%
• Peregrine
  • Records schedules, and reuses them on a wide range of inputs.
  • Represents schedules explicitly.
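To make "enforcing a schedule" concrete, here is a minimal sketch of how a recorded total order could be imposed at runtime. All names (`wait_turn`, `pass_turn`, `run_enforce_demo`) are illustrative, not Peregrine's API: a turn counter indexed into the recorded order gates each thread's serialized region.

```c
#include <pthread.h>

/* Recorded schedule: thread ids in the order they may enter the
 * serialized region (an assumption for this sketch). */
static const int schedule[] = { 0, 1 };
static int turn = 0;

static pthread_mutex_t turn_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t turn_cond = PTHREAD_COND_INITIALIZER;

/* Block until the schedule says it is `self`'s turn. */
static void wait_turn(int self) {
    pthread_mutex_lock(&turn_lock);
    while (schedule[turn] != self)
        pthread_cond_wait(&turn_cond, &turn_lock);
    pthread_mutex_unlock(&turn_lock);
}

/* Hand the turn to the next entry in the schedule. */
static void pass_turn(void) {
    pthread_mutex_lock(&turn_lock);
    turn++;
    pthread_cond_broadcast(&turn_cond);
    pthread_mutex_unlock(&turn_lock);
}

static int order[2];
static int next_slot = 0;

static void *demo_worker(void *arg) {
    int self = *(int *)arg;
    wait_turn(self);            /* serialization point from the schedule */
    order[next_slot++] = self;  /* stands in for the real critical section */
    pass_turn();
    return 0;
}

/* Returns 1 iff the two threads ran their "critical sections" in the
 * recorded order, regardless of how the OS scheduled them. */
int run_enforce_demo(void) {
    pthread_t t[2];
    int ids[2] = { 0, 1 };
    for (int i = 0; i < 2; ++i)
        pthread_create(&t[i], 0, demo_worker, &ids[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(t[i], 0);
    return order[0] == 0 && order[1] == 1;
}
```

Whichever thread the OS runs first, `wait_turn` forces thread 0's region to complete before thread 1's begins, which is exactly the property the analysis gets to rely on.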
Framework
• Extracts the control flow and data flow enforced by a set of schedules.
• Input: a program (C/C++ program with Pthreads) and a schedule (a total order of synchronizations).
• Output: a specialized program with extra def-use chains.
[Diagram: Program + Schedule → Schedule Specialization → Specialized Program]
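A schedule here is just a total order of synchronization operations. One possible encoding (the struct and enum names below are illustrative, not the framework's actual format) for the running example's schedule:

```c
/* Hypothetical encoding of a schedule as a total order of
 * synchronizations; the real on-disk format is Peregrine's. */
typedef enum { OP_CREATE, OP_LOCK, OP_UNLOCK, OP_JOIN } sync_op;

typedef struct {
    int     thread;   /* which thread performs the operation */
    sync_op op;       /* which synchronization it performs */
} sync_event;

/* The running example's schedule: main (thread 0) creates both
 * workers, thread 1 enters the critical section before thread 2,
 * then main joins both. */
static const sync_event schedule[] = {
    { 0, OP_CREATE }, { 0, OP_CREATE },
    { 1, OP_LOCK   }, { 1, OP_UNLOCK },
    { 2, OP_LOCK   }, { 2, OP_UNLOCK },
    { 0, OP_JOIN   }, { 0, OP_JOIN   },
};

int schedule_len(void) {
    return (int)(sizeof schedule / sizeof schedule[0]);
}
```

Specialization walks this sequence and pins down which program paths and which inter-thread data flows are consistent with it.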
Outline
• Example
• Control-Flow Specialization
• Data-Flow Specialization
• Results
• Conclusion
Running Example

int results[p_max];
int global_id = 0;

int main(int argc, char *argv[]) {
  int i;
  int p = atoi(argv[1]);
  for (i = 0; i < p; ++i)
    pthread_create(&child[i], 0, worker, 0);
  for (i = 0; i < p; ++i)
    pthread_join(child[i], 0);
  return 0;
}

void *worker(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  int my_id = global_id++;
  pthread_mutex_unlock(&global_id_lock);
  results[my_id] = compute(my_id);
  return 0;
}

[Diagram: thread 0 creates threads 1 and 2; each worker does lock/unlock; thread 0 joins both. Race-free?]
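The slide omits several declarations (`child`, `global_id_lock`, `compute`). A self-contained, compilable version of the example, with assumed stubs filled in, looks like this:

```c
#include <pthread.h>

#define p_max 16

/* Declarations the slide omits, filled in so the example compiles. */
static int results[p_max];
static int global_id = 0;
static pthread_mutex_t global_id_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t child[p_max];

/* compute()'s body is not shown on the slide; a stub stands in. */
static int compute(int id) { return id; }

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&global_id_lock);
    int my_id = global_id++;          /* each thread gets a unique id */
    pthread_mutex_unlock(&global_id_lock);
    results[my_id] = compute(my_id);  /* distinct my_id, distinct slot */
    return 0;
}

/* Runs the example with p workers; returns 1 if every slot was filled
 * exactly once, which holds on any interleaving of this code. */
int run_example(int p) {
    for (int i = 0; i < p; ++i)
        pthread_create(&child[i], 0, worker, 0);
    for (int i = 0; i < p; ++i)
        pthread_join(child[i], 0);
    int ok = (global_id == p);
    for (int i = 0; i < p; ++i)
        ok = ok && (results[i] == compute(i));
    return ok;
}
```

The interesting question the slides pose is whether a static analyzer can prove the `results[my_id]` writes race-free without knowing which thread gets which `my_id`; the schedule supplies exactly that missing fact.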
Control-Flow Specialization

int main(int argc, char *argv[]) {
  int i;
  int p = atoi(argv[1]);
  for (i = 0; i < p; ++i)
    pthread_create(&child[i], 0, worker, 0);
  for (i = 0; i < p; ++i)
    pthread_join(child[i], 0);
  return 0;
}

[Diagram, built up across four slides: starting from atoi, main's control-flow graph is specialized against the schedule. Each "i < p" branch outcome is pinned by the schedule's synchronization order, so the create loop unrolls into two pthread_create nodes and the join loop into two pthread_join nodes, ending at return.]
Control-Flow Specialized Program

int main(int argc, char *argv[]) {
  int i;
  int p = atoi(argv[1]);
  i = 0;  // i < p == true
  pthread_create(&child[i], 0, worker.clone1, 0);
  ++i;    // i < p == true
  pthread_create(&child[i], 0, worker.clone2, 0);
  ++i;    // i < p == false
  i = 0;  // i < p == true
  pthread_join(child[i], 0);
  ++i;    // i < p == true
  pthread_join(child[i], 0);
  ++i;    // i < p == false
  return 0;
}

[Diagram: the corresponding straight-line control-flow graph.]
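The unrolling step can be checked in miniature. This sketch (my own, not from the talk) replaces `pthread_create` with an array write so the two forms are directly comparable: the generic loop and its specialization for the schedule's two iterations compute the same thing.

```c
/* Original loop: the body stands in for the pthread_create call. */
int generic(int *out, int p) {
    int n = 0;
    for (int i = 0; i < p; ++i)
        out[n++] = i;
    return n;
}

/* Specialized for the schedule's two create operations: every
 * "i < p" branch has a known outcome, so the loop becomes
 * straight-line code. */
int specialized(int *out) {
    int n = 0, i;
    i = 0;          /* i < p == true  */
    out[n++] = i;
    ++i;            /* i < p == true  */
    out[n++] = i;
    ++i;            /* i < p == false: loop exits */
    return n;
}
```

Because every branch in the specialized version is resolved, a later analysis sees exactly two create sites and can clone the worker once per site, which is what enables the per-thread data-flow facts on the next slides.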
More Challenges on Control-Flow Specialization
• Ambiguity: between two adjacent synchronizations S1 and S2, more than one call/return path through a callee may be consistent with the schedule. [Diagram: a caller with S1 and S2 around a call into and return from a callee]
• A schedule has too many synchronizations.
Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  int my_id = global_id++;
  pthread_mutex_unlock(&global_id_lock);
  results[my_id] = compute(my_id);
  return 0;
}

void *worker.clone2(void *arg) {
  (same body as worker.clone1)
}

[Trace, built up across four slides: under the enforced schedule, thread 1 enters the critical section first, so in worker.clone1 my_id = 0 and global_id becomes 1; thread 2 enters second, so in worker.clone2 my_id = 1 and global_id becomes 2.]
Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  global_id = 1;
  pthread_mutex_unlock(&global_id_lock);
  results[0] = compute(0);
  return 0;
}

void *worker.clone2(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  global_id = 2;
  pthread_mutex_unlock(&global_id_lock);
  results[1] = compute(1);
  return 0;
}

[Trace: the folded constants match the schedule-enforced execution of the previous slides.]
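A compilable rendering of the data-flow specialized clones, under the assumptions that `compute` is an identity stub and that `worker.clone1`'s critical section runs first (here we model the schedule's enforcement by simply calling the clones in order). The underscored names replace the dotted clone names, which are not legal C identifiers.

```c
#include <pthread.h>

static int results[2];
static int global_id = 0;
static pthread_mutex_t global_id_lock = PTHREAD_MUTEX_INITIALIZER;

static int compute(int id) { return id; }  /* stub, body not on the slide */

static void *worker_clone1(void *arg) {
    (void)arg;
    pthread_mutex_lock(&global_id_lock);
    global_id = 1;        /* my_id = global_id++ folded: my_id == 0 */
    pthread_mutex_unlock(&global_id_lock);
    results[0] = compute(0);
    return 0;
}

static void *worker_clone2(void *arg) {
    (void)arg;
    pthread_mutex_lock(&global_id_lock);
    global_id = 2;        /* my_id = global_id++ folded: my_id == 1 */
    pthread_mutex_unlock(&global_id_lock);
    results[1] = compute(1);
    return 0;
}

/* The enforced schedule runs clone1's critical section before clone2's;
 * calling them sequentially models that enforcement. */
int run_specialized(void) {
    worker_clone1(0);
    worker_clone2(0);
    return global_id == 2 && results[0] == 0 && results[1] == 1;
}
```

With the indices now constant, proving that the two `results` writes never touch the same slot no longer needs any reasoning about `global_id++` interleavings.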
More Challenges on Data-Flow Specialization
• Must/may alias analysis (e.g. for global_id)
• Reasoning about integers
  • results[0] = compute(0)
  • results[1] = compute(1)
• Many def-use chains
Evaluation
• Applications
  • Static race detector
  • Alias analyzer
  • Path slicer
• Programs
  • PBZip2 1.1.5
  • aget 0.4.1
  • 8 programs in SPLASH2
  • 7 programs in PARSEC
Static Race Detector
[Chart: # of false positives per benchmark; the data did not survive extraction.]
Static Race Detector: Harmful Races Detected
• 4 in aget
• 2 in radix
• 1 in fft
Conclusion and Future Work
• Designed and implemented a schedule specialization framework
  • Analyzes the program over a small set of schedules
  • Enforces these schedules at runtime
• Built and evaluated three applications
  • Easy to use
  • Precise
• Future work
  • More applications
  • Similar specialization ideas for sequential programs
Related Work
• Program analysis for parallel programs
  • Chord (PLDI '06), RADAR (PLDI '08), FastTrack (PLDI '09)
• Slicing
  • Horgan (PLDI '90), Bouncer (SOSP '07), Jhala (PLDI '05), Weiser (PhD thesis), Zhang (PLDI '04)
• Deterministic multithreading
  • DMP (ASPLOS '09), Kendo (ASPLOS '09), CoreDet (ASPLOS '10), Tern (OSDI '10), Peregrine (SOSP '11), DTHREADS (SOSP '11)
• Program specialization
  • Consel (POPL '93), Gluck (ISPL '95), Jørgensen (POPL '92), Nirkhe (POPL '92), Reps (PDSPE '96)
Handling Races
• We do not assume data-race freedom.
• We could assume it if our only goal were optimization.
Input Coverage
• Use runtime verification for the inputs not covered.
• A small set of schedules can cover a wide range of inputs.
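The runtime check can be sketched simply: record each synchronization as it executes and compare it against the enforced schedule. The names and the fallback policy below are assumptions for illustration, not taken from the slides.

```c
/* Enforced schedule for this sketch: the thread id performing each
 * successive synchronization operation. */
static const int sched_threads[] = { 0, 0, 1, 1, 2, 2, 0, 0 };
static int sched_pos = 0;

/* Returns 1 while execution still matches the schedule; 0 signals
 * divergence, at which point a runtime would fall back to the
 * unspecialized program (assumed policy). */
int check_sync(int thread) {
    int n = (int)(sizeof sched_threads / sizeof sched_threads[0]);
    if (sched_pos >= n)
        return 0;   /* more synchronizations than the schedule recorded */
    return sched_threads[sched_pos++] == thread;
}
```

As long as every `check_sync` call returns 1, the input is covered by the analyzed schedule and the specialized results remain valid for this run.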