Fast Paths in Concurrent Programs
Wen Xu, Princeton University
Sanjeev Kumar, Intel Labs
Kai Li, Princeton University

Concurrent Programs
• Message-passing style
  • Processes & channels
  • E.g., streaming languages
[Figure: processes P1 to P4 connected by channels C1 to C3; panel labels: Processor 1, Processor 2]
• Uniprocessors
  • Programming convenience
  • Embedded devices
  • Network software stack
  • Media processing
• Multiprocessors
  • Exploit parallelism
  • Partition processes
Problem: compile a concurrent program to run efficiently on a uniprocessor

Compiling Concurrent Programs
• Process-based approach
  • Keep processes separate
  • Context switch between the processes
  • Small executable: sum of processes
  • Significant overhead
• Automata-based approach
  • Treat each process as a state machine
  • Combine the state machines
  • Small overhead
  • Large executables: potentially exponential
• One study compared the two approaches and found that, compared to the process-based approach, the automata-based approach generates code that is
  • Twice as fast
  • 2-3 orders of magnitude larger in executable size
• Neither approach is satisfactory

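The size trade-off between the two approaches can be made concrete with a small count. This is a minimal sketch, assuming each process is modeled as an independent state machine with hypothetical state names; it is not the methodology of the study cited on the slide. Separate processes cost roughly the sum of their states, while a naively combined automaton needs one state per combination.

```python
from itertools import product

def process_based_size(machines):
    """Size proxy when processes stay separate: the sum of their states."""
    return sum(len(states) for states in machines)

def automata_based_size(machines):
    """Size proxy for a combined automaton: one state per tuple of process states."""
    return sum(1 for _ in product(*machines))

# Hypothetical example: four processes with five states each.
machines = [[f"p{i}_s{j}" for j in range(5)] for i in range(4)]

print(process_based_size(machines))   # 20
print(automata_based_size(machines))  # 625
```

The product is an upper bound: a real compiler only emits reachable combinations, so the actual growth depends on how tightly the processes synchronize.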
Our Work
• Our goal: compile concurrent programs
  • Automated using a compiler
  • Low overhead
  • Small executable size
• Our approach: combine the two approaches
  • Use the process-based approach to handle all cases
  • Use the automata-based approach to speed up the common cases

Outline
• Motivation
• Fast Paths
• Fast Paths in Concurrent Programs
• Experimental Evaluation
• Conclusions

Fast Paths
• Path: a dynamic execution path in the program
• Fast path or hot path: a well-known technique
  • Commonly executed paths (hot path)
  • Specialize and optimize (fast path)
• Two components
  • Predicate that specifies the fast path
  • Optimized code to execute the fast path
• Compilers can be used to automate it
  • Mostly in sequential programs

Manually Implementing Fast Paths
• To achieve good performance in concurrent programs
  • Start: insert code that identifies the common case and transfers control to the fast path code
  • Extract and optimize the fast path code manually
  • Finish: patch up state and return control at the end of the fast path
• Obvious drawbacks
  • Difficult to implement correctly
  • Difficult to maintain

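The three manual steps (start, optimized body, finish) might look like the following. This is a minimal sketch, assuming a hypothetical `Receiver` class and packet layout invented for illustration; none of these names come from the talk.

```python
class Receiver:
    """Hypothetical message receiver used only to illustrate the three manual steps."""

    def __init__(self):
        self.state = "idle"      # protocol state kept by the general code
        self.delivered = []      # payloads handed to the application
        self.packets_seen = 0    # bookkeeping the general code maintains

    def handle_general(self, packet):
        """Baseline: handles every case, one step at a time."""
        self.state = "receiving"
        self.packets_seen += 1
        payload = b"".join(bytes(chunk) for chunk in packet["chunks"])
        self.state = "delivering"
        self.delivered.append(payload)
        self.state = "idle"

    def handle(self, packet):
        # Start: identify the common case and transfer control to the fast path.
        if self.state == "idle" and len(packet["chunks"]) == 1:
            # Fast path: code specialized by hand for a single-chunk packet.
            self.delivered.append(bytes(packet["chunks"][0]))
            # Finish: patch up the state the general code would have produced.
            self.packets_seen += 1
            self.state = "idle"
            return "fast"
        self.handle_general(packet)
        return "general"

receiver = Receiver()
print(receiver.handle({"chunks": [b"ab"]}))
print(receiver.handle({"chunks": [b"ab", b"cd"]}))
```

The patch-up step is where the maintenance cost shows: if the general code later starts tracking another field, the hand-written fast path silently goes stale unless someone remembers to update it too.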
Our Approach
[Figure: Baseline (process-based) code shown next to Fast Path (automata-based) code, with three numbered steps; labels: Test, Optimized Code, Abort?]
Code fragments shown in the figure:
  a = b; b = c * d; d = 0; if (c > 0) c++;
  a = c; b = c * d; d = 3; if (c > 0) c++;

Specifying Fast Paths
• Multiple processes
  • Concurrent program
• Regular expressions
  • Statements
  • Conditions (optional)
  • Synchronization (optional)
• Support early abort
• Advantages
  • Powerful
  • Compact
  • Hint

  fastpath example {
    process first {
      statement A, B, C, D, #1;
      start A ? (size<100);
      follows B ( C D )*;
      exit #1;
    }
    process second { ... }
    process third { ... }
  }

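To see what a regular expression over statements selects, the specification for process `first` can be mimicked with an ordinary regex over a trace of statement labels. This is a minimal sketch under an assumed reading of the annotation (start at `A` when `size < 100`, then `B`, then any number of `C D` pairs, then exit at `#1`); the slide does not define the semantics precisely, and `on_fast_path` is a hypothetical helper.

```python
import re

# Assumed reading of the annotation: A, then B, then zero or more C D pairs,
# then the exit label #1. Labels are joined with single spaces.
FAST_PATH = re.compile(r"A B( C D)* #1")

def on_fast_path(trace, size):
    """Return True if a trace of statement labels stays on the specified path."""
    if not trace or trace[0] != "A" or not size < 100:
        return False  # start condition failed: stay in baseline code
    return FAST_PATH.fullmatch(" ".join(trace)) is not None

print(on_fast_path(["A", "B", "#1"], size=64))                      # True
print(on_fast_path(["A", "B", "C", "D", "C", "D", "#1"], size=64))  # True
print(on_fast_path(["A", "B", "C", "#1"], size=64))                 # False
print(on_fast_path(["A", "B", "#1"], size=4096))                    # False
```

The third trace corresponds to an early abort: execution leaves the specified sequence after `C`, so control would have to return to the baseline code at that point.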
Extracting Fast Paths
• Automata-based approach to extract fast paths
  • A fast path involves a group of processes
  • Compiler keeps track of the execution point for each of the involved processes
  • On exit, control is returned to the appropriate location in each of the processes
  • Baseline: concurrent code. Fast path: sequential code
• Fairness on the fast path
  • Embed scheduling decisions in the fast path
  • Avoid scheduling/fairness overhead on the fast path
  • Rely on baseline code for fairness
  • Always taken a fraction of the time

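The contrast between concurrent baseline code and a sequential fast path can be sketched with generators standing in for processes. This is an illustration only, assuming a hypothetical two-process pipeline (`producer`, `doubler`) and a round-robin scheduler; it is not the output of the ESP compiler.

```python
from collections import deque

def producer(chan, items):
    """Baseline process: send each item on the channel, then yield to the scheduler."""
    for item in items:
        chan.append(item)
        yield  # context switch

def doubler(chan, out, count):
    """Baseline process: receive `count` items, double them, and record the results."""
    received = 0
    while received < count:
        if chan:
            out.append(2 * chan.popleft())
            received += 1
        yield  # context switch (also taken when the channel is empty)

def run_baseline(items):
    """Process-based execution: round-robin scheduling of separate processes."""
    chan, out = deque(), []
    ready = deque([producer(chan, items), doubler(chan, out, len(items))])
    switches = 0
    while ready:
        proc = ready.popleft()
        try:
            next(proc)
            ready.append(proc)
            switches += 1
        except StopIteration:
            pass  # process finished
    return out, switches

def run_fast_path(items):
    """Fused sequential code: the send/receive pair collapses into a local value,
    and the scheduling order (producer, then doubler) is fixed in the code."""
    out = []
    for item in items:
        out.append(2 * item)
    return out

baseline_out, switches = run_baseline([1, 2, 3])
print(baseline_out, switches)
print(run_fast_path([1, 2, 3]))
```

Both versions produce the same results, but the fused version performs no channel operations and no context switches, which is the overhead the fast path is meant to avoid.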
Optimization on Fast Path
• Enabling Traditional Fast Paths
  • Generate and optimize baseline code
  • Generate fast path code
  • Fast paths have exit/entry points to baseline code
  • Use data-flow information from baseline code at the exit/entry point to start analysis and optimize the fast path code
• Speeding up the fast path using lazy execution
  • Delay operations that are not needed when fast paths are executed to the end
  • Such operations can be performed if the fast path is aborted

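Lazy execution on a fast path can be sketched as a list of deferred operations that only run on abort. This is a minimal sketch, assuming a hypothetical packet loop in which the baseline needs a `resume_log` to pick up where the fast path stopped; the names and the size threshold are illustrative.

```python
def fast_path(packets, resume_log, max_size=100):
    """Process packets on a fast path, deferring bookkeeping that is only
    needed if control returns to baseline code midway."""
    deferred = []  # operations delayed while on the fast path
    total = 0
    for index, packet in enumerate(packets):
        if len(packet) >= max_size:
            # Abort: perform the delayed operations so the baseline
            # sees the state it expects, then hand over control.
            for op in deferred:
                op()
            return total, index  # index of the first packet left for the baseline
        total += len(packet)
        deferred.append(lambda p=packet: resume_log.append(("seen", len(p))))
    # Ran to the end: the delayed operations were never needed.
    return total, len(packets)

log = []
print(fast_path([b"ab", b"cde"], log), log)
log = []
print(fast_path([b"ab", b"x" * 200, b"c"], log), log)
```

In the first call the fast path completes and the log stays empty; in the second, the oversized packet triggers an abort and the deferred log entry is written before the baseline takes over.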
Experimental Evaluation
• Implemented the techniques in the paper
  • In the ESP compiler
  • Supports concurrent programs
• Two classes of programs
  • Filter programs
  • VMMC firmware
• Answer three questions
  • Programming effort (annotation complexity) needed
  • Size of the executable
  • Performance

Filter Programs
[Figure: pipeline of processes P1 to P4 linked by channels C1 to C3]
• Well-defined structure
  • Streaming applications
• Use filter programs by Proebsting et al.
  • Good to evaluate our technique
  • Concurrency overheads dominate
• Experimental setup
  • 2.66 GHz Pentium 4, 1 GB memory, Linux 2.4
  • 4 versions of the code
• Annotation complexity
  • Program sizes: 153, 125, 190, 196 lines
  • Annotation sizes: 7, 7, 10, 10 lines

Filter Programs Cont'd
[Figure: Performance, Programs 1 to 4. Takeaway: better performance than both]
[Figure: Executable Size, Programs 1 to 4. Takeaway: relatively small executable]
Data labels surviving from the charts (bar assignment not recoverable): 5.53, 5.15, 28.33, 23.52, 4.17, 9.47

VMMC Firmware
• Firmware for a gigabit network (Myrinet)
• Experimental setup
  • Measure network performance between two machines connected with Myrinet
  • Latency & bandwidth
• 3 versions of the firmware
  • Concurrent C version with manual fast paths
  • Process-based code without fast paths
  • Process-based code with compiler-extracted fast paths
• Annotation complexity (3 fast paths)
  • Fast path specification: 20, 14, and 18 lines
  • Manual fast paths in C: 1100 lines total

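The annotation-complexity figures for the firmware can be put side by side with a short calculation. The line counts are taken from the slide; the variable names are illustrative.

```python
spec_lines = [20, 14, 18]   # fast path specifications, from the slide
manual_lines = 1100         # manual fast paths in C, total, from the slide

total_spec = sum(spec_lines)
ratio = manual_lines / total_spec

print(total_spec)        # 52
print(round(ratio, 1))   # 21.2
```

By this count, the three specifications together are roughly a twentieth of the size of the hand-written fast paths.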
VMMC Firmware Cont'd
[Figure: Performance: Latency, plotted against message size (in bytes)]
[Figure: Generated Code Size, in assembly instructions]

Conclusions
• Fast paths in concurrent programs
  • Evaluated using filter programs and VMMC firmware
• Process-based approach to handle all cases
  • Keeps executable size reasonable
• Automata-based approach to handle only the common cases (fast path)
  • Avoids the high overhead of the process-based approach
  • Often outperforms the automata-based code