Kendo: Efficient Deterministic Multithreading in Software
Marek Olszewski, Jason Ansel, Saman Amarasinghe
Commit Group, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Example
• Simple OpenMP parallel code:
    #include <stdio.h>

    int main(void) {
        double inv_sum = 0.0;
        #pragma omp parallel for reduction(+: inv_sum)
        for (int i = 1; i < 10000000; i++)
            inv_sum += 1.0 / i;
        printf("inv_sum: %.64g\n", inv_sum);
        return 0;
    }
• Each thread sums its own share of the iterations, and the partial sums are then combined in a reduction step whose order is not fixed. Because floating-point addition is not associative, the printed result can differ between runs:
    Run 1: inv_sum: 16.69531126586006308798459940589964389801025390625
    Run 2: inv_sum: 16.695311265860066640698278206400573253631591796875
Another Example
• Common parallel programming paradigm: threads perform repeated, well-synchronized updates to global state
• Used in, e.g.:
  • Radiosity (Singh et al. 1994)
  • LocusRoute (Rose 1988)
  • Delaunay Triangulation (Kulkarni et al. 2008)
[Figure: threads apply non-commutative updates to a global data structure; each update runs in a critical section protected by a lock]
Another Example (continued)
• Non-deterministic internal states and output: the threads win the locks in a different order on each run, so the non-commutative updates are applied in a different order
• Difficult to eliminate using today's programming idioms
Non-Determinism
• Hard to create programs with repeatable results
• Determinism is often part of program specifications, e.g.:
  • You don't want a Verilog compiler to generate different circuits every time
  • Multi-threaded replicas in fault-tolerant systems must be deterministic
• Debugging becomes more difficult:
  • Heisenbugs
  • Cyclic debugging, a common debugging method for sequential programs, becomes difficult
• Testing offers weak guarantees:
  • Will the code pass the test again?
Deterministic Execution Model
• Non-determinism causes many problems
  • Why do we put up with it?
• Present the parallel programmer with a deterministic execution model:
  • Interleave critical sections deterministically
  • Only allow one interleaving
  • Find a good interleaving that preserves parallel performance
Token Algorithm
• Thread 1 and Thread 2 race to acquire Lock A; a single token moves between the threads
• det_lock(A) is implemented as:
    wait_for_token()
    lock(A)
    pass_token()
• Both threads call det_lock(A), but a thread may call lock(A) only while it holds the token, and it passes the token on as soon as it has the lock
• Thread 1 then releases the lock with det_unlock(A)
• Guarantees that Thread 1 will always acquire the lock before Thread 2
Token Algorithm
• Load imbalance!
  • A thread waiting for the token stalls until the holder reaches its next det_lock and passes the token on
  • Remedy: allow threads to pass the token outside of critical sections
• High overhead!
• Too much serialization!
What Do We Need?
• A method of tracking thread progress that:
  • Is deterministic
  • Matches the true progress of the thread in physical time as closely as possible
  • Is cheap to compute
• The ability to pass the token in advance (before it is received)
  • Decouples threads
Logical Time Algorithm
• Each thread keeps a low-overhead counter called its "logical clock"
  • Incremented often, in a way that tracks the thread's progress as closely as possible
• Together, the clocks define a notion of "logical time"
  • An abstract counterpart to physical time
• Threads take turns holding a "virtual token"
  • It is a thread's turn when its clock is the global minimum
• A thread passes the virtual token by incrementing its clock
  • It does not have to wait for its turn to do so
  • Threads can therefore execute asynchronously while outside of critical sections
Logical Time Algorithm
• Thread 1 and Thread 2 race to acquire Lock A; both logical clocks start at t=3 and advance at different rates as the threads run
• Thread 1 calls det_lock(A) at t=8, while Thread 2's clock reads t=18:
    det_lock(A):
        wait_for_turn();
        lock(A);
• Thread 1's clock is the global minimum, so it is Thread 1's turn and it acquires Lock A
• Thread 2 calls det_lock(A) at t=22; once it is Thread 2's turn, it still blocks in lock(A) until Thread 1 releases the lock with det_unlock(A) at t=26
Logical Time Algorithm
• Same race, but this time Thread 2 reaches det_lock(A) first in physical time: it calls det_lock(A) at t=22 while Thread 1's clock reads only t=16
• Thread 2's clock is not the global minimum, so it waits in wait_for_turn()
• Thread 1 calls det_lock(A) at t=18; its clock is the global minimum, so it acquires Lock A first, even though it arrived later in physical time
• Once Thread 1's clock passes 22, it is Thread 2's turn, but Thread 2 still blocks in lock(A) until Thread 1 calls det_unlock(A) at t=26
• Guarantees that Thread 1 will always acquire the lock before Thread 2, whatever the physical timing