Framework for Profile-Analysis Data-Layout Optimizations
Shai Rubin (University of Wisconsin), Ras Bodik (University of Wisconsin), Trishul Chilimbi (Microsoft Research)
Data Layout Optimization (What)
• Reference sequence: A.x, B, A.z.
• [Figure: the reference sequence replayed over time against the original and the modified data layout, at two levels: cache blocks (CPU and cache) and memory pages (memory and disk). Access costs shown: cache 1 cycle, memory 10^2 cycles, disk 10^6 cycles.]
• DL optimization: increase the spatial locality of data to prevent memory faults.
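Data Layout Optimization (How)
• Pipeline: Program → Reference Summary → Data Layout Optimizer → Data Layout → Enforce layout → Program′.
• 1. Compile time, for scientific (array-based) programs: the reference summary is an array dependence analysis (static), and the optimizer finds the optimal layout, but only for simple loops.
• 2. Runtime, for general-purpose (pointer-based) programs: the reference summary is a reference trace (dynamic), and the optimizer is a heuristic that picks a “good” layout from the layout space.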
Problems with Current Data-Layout Optimization
• Computationally hard to find the optimal layout [Petrank].
• Computationally hard to approximate the optimal layout [Petrank].
• Implication: heuristics are not robust; they will not work for all programs.
• From our experience with heuristics:
  • Field Reordering [Chilimbi PLDI’99]: no improvement (on perl).
  • Custom Memory Allocator [Seidl ASPLOS’98]: degrades performance (on espresso).
• Our approach: replace the heuristic with a feedback-driven search.
Searching For a Data Layout
• Problem: perform a search in the data layout space.
• Look for:
  • a “good” layout.
  • an “easy” to enforce layout.
• Search advantage: robust; for each program it finds a “good” layout.
• [Figure: the data layout space, marking the current program data layout, the optimal data layout, the region of “good” layouts, and the “good” + “easy” to enforce layouts.]
Is Search Practical?
• Not clear.
• [Figure: search loop over the possible layouts. Reference Trace → Optimizer (Heuristic) → Data Layout → Enforce layout (Edit, Compile, Execute) → Evaluate → Continue? → End.]
Outline
• Background and Problem Definition
• Search is a solution, but may not be practical
• Making the search practical
• Applications
• Summary
Framework for Data Layout Optimization: Making the Search Practical
• [Figure: framework diagram, summarized in the bullets that follow.]
• Inputs: reference trace T, layout space LS, search strategy SS.
• Compress(T) → CST (compressed symbolic trace).
• Data Object Analysis: DOA(CST, LS) → NLS (narrowed layout space).
• Data Layout Search Engine:
  • Layout Selector: LS(NLS, B, CST, SS) → DL (data layout).
  • Enforce Layout: AL(DL, CST) → NT (new trace), via Edit, Compile, Execute.
  • Evaluate: Simulate(NT) → B (benefit).
  • Continue(B): continue the search or end.
• Labels on the layout space: Field Reordering, Linearization, Class Splitting. Output: “good” and enforceable layouts.
Trace Representation
• Problem: the reference trace cannot be easily manipulated because it is too large (>10GB, >100M references).
• Solution: a compressed trace (using modified SEQUITUR).
• Example:
  • Trace: acbcbcbcbdbdbdbde
  • SEQUITUR representation:
      S → a c B B B A A e
      B → b c
      A → C C
      C → b d
• Representation advantages:
  • Compact; fits into main memory [Chilimbi PLDI’01].
  • Exposes repetitions (we use this later).
  • Produces a symbolic trace (i.e., a terminal is a data object).
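As a small check of the example above, the following Python sketch expands the slide's grammar back into the trace. It only decodes a grammar; it does not implement SEQUITUR (or the modified variant the slide mentions), and the names `GRAMMAR` and `expand` are illustrative assumptions.

```python
# Grammar from the slide: uppercase symbols are rules, lowercase symbols are
# terminals (each terminal stands for a data object in the symbolic trace).
GRAMMAR = {
    "S": ["a", "c", "B", "B", "B", "A", "A", "e"],
    "B": ["b", "c"],
    "A": ["C", "C"],
    "C": ["b", "d"],
}


def expand(symbol, grammar=GRAMMAR):
    """Recursively expand a grammar symbol into the string of terminals it derives."""
    if symbol not in grammar:  # terminal: emit as-is
        return symbol
    return "".join(expand(child, grammar) for child in grammar[symbol])


trace = expand("S")
# Total number of symbols on all right-hand sides, a rough size measure.
grammar_size = sum(len(rhs) for rhs in GRAMMAR.values())
print(trace, len(trace), grammar_size)
```

On this toy trace the saving is small (14 grammar symbols for 17 references); the benefit grows with the amount of repetition in the trace.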
Framework for Data-Layout Optimization
• [Figure: the framework diagram again, with the Enforce Layout step now operating on the compressed symbolic trace: EL(DL, CST) → CST′. Evaluation is still shown as Simulate(NT) → B, followed by Continue(B).]
Avoid Re-compilation
• Problem: evaluating a data layout requires editing the program, compiling it, and simulating it.
• Solution: “pretend” that the program was edited and compiled.
• Symbolic trace + data layout → concrete address trace.
• Example: a single symbolic trace A.x, B, A.z, B. One layout places A.x at 10, A.z at 14, and B at 20; another places A.x at 30, A.z at 34, and B at 20. For the second layout, both paths (the user edits the program, compiles, and runs it; or the optimizer enforces the layout on the symbolic trace and simulates) yield the same new concrete trace: 30, 20, 34, 20.
• Simple, but crucial for an efficient search.
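The mapping from a symbolic trace plus a layout to a concrete address trace can be sketched in a few lines of Python. This is a minimal illustration using the addresses from the slide's example; the function name `enforce_layout` and the dictionary representation of a layout are assumptions, not the framework's actual interface.

```python
def enforce_layout(symbolic_trace, layout):
    """Turn a symbolic trace into a concrete address trace under a given layout.

    `layout` maps each data object (or field) to an address. No editing or
    recompilation of the program is involved.
    """
    return [layout[obj] for obj in symbolic_trace]


# Values taken from the slide's example.
symbolic_trace = ["A.x", "B", "A.z", "B"]
layout_1 = {"A.x": 10, "A.z": 14, "B": 20}
layout_2 = {"A.x": 30, "A.z": 34, "B": 20}

trace_1 = enforce_layout(symbolic_trace, layout_1)
trace_2 = enforce_layout(symbolic_trace, layout_2)
print(trace_1)
print(trace_2)
```

The same symbolic trace is reused for every candidate layout, which is what makes repeated evaluation cheap.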
Framework for Data-Layout Optimization
• [Figure: the framework diagram again, with evaluation now operating directly on the enforced trace: EL(DL, CST) → CST′, Simulate(CST′) → B, Continue(B).]
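The select, enforce, simulate, continue cycle of the diagram can be sketched as a plain feedback loop. Everything below is a hypothetical illustration: the layout selector is reduced to simple enumeration, and the benefit function (fewer distinct 32-byte blocks touched is better) is an assumed stand-in for a real cache or page simulator.

```python
def search_layout(cst, narrowed_space, enforce_layout, simulate, should_continue):
    """Feedback-driven search: try layouts, keep the one with the highest benefit."""
    best_layout, best_benefit = None, None
    for layout in narrowed_space:                  # layout selector (plain enumeration here)
        new_trace = enforce_layout(layout, cst)    # EL(DL, CST) -> CST'
        benefit = simulate(new_trace)              # Simulate(CST') -> B
        if best_benefit is None or benefit > best_benefit:
            best_layout, best_benefit = layout, benefit
        if not should_continue(benefit):           # Continue(B)
            break
    return best_layout, best_benefit


# Toy instantiation (all values hypothetical).
cst = ["A.x", "B", "A.z", "B"]
candidates = [
    {"A.x": 0, "A.z": 64, "B": 128},   # objects spread over three blocks
    {"A.x": 0, "A.z": 4, "B": 8},      # objects packed into one block
]

best, benefit = search_layout(
    cst,
    candidates,
    enforce_layout=lambda layout, trace: [layout[obj] for obj in trace],
    simulate=lambda trace: -len({addr // 32 for addr in trace}),
    should_continue=lambda b: b < -1,  # stop once a single block suffices
)
print(best, benefit)
```

A real search strategy would use the benefit to decide which layout to try next rather than walking a fixed list.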
Memoization: Efficient Trace Simulation
• Evaluation using simulation: MissRate_T = Simulate(T).
• Problem: simulating the whole trace (T) is too expensive.
• Solution: avoid re-simulating repeated sub-traces.
• Memoization:
  • Simulate each “low level” rule and compute its memoization value.
  • For cache simulation: memoization value = CacheState [CS].
  • Recursively compose memoization values for “higher” rules.
• Example:
  • T: bcbcbcbdbdbdbd
  • SEQUITUR representation: S → B B B A A; B → b c; A → C C; C → b d
  • CS_C = Simulate′(C)
  • CS_B = Simulate′(B)
  • CS_A = CS_C ∘ CS_C
  • CS_S = CS_B ∘ CS_B ∘ CS_B ∘ CS_A ∘ CS_A
  • MissRate_T = …
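One simple way to avoid re-simulating repeated sub-traces is to cache the result of simulating a rule for a given entry cache state. The sketch below does exactly that for the slide's grammar; it is a simplification and not necessarily the composition scheme the authors use. The direct-mapped cache with two sets and the block addresses in `ADDRESS` are assumptions made for illustration.

```python
from functools import lru_cache

GRAMMAR = {
    "S": ("B", "B", "B", "A", "A"),
    "B": ("b", "c"),
    "A": ("C", "C"),
    "C": ("b", "d"),
}
ADDRESS = {"b": 0, "c": 1, "d": 3}   # hypothetical block addresses
NUM_SETS = 2                          # hypothetical direct-mapped cache
EMPTY = (None,) * NUM_SETS


def access(state, addr):
    """Access one block; return (new cache state, 1 on a miss or 0 on a hit)."""
    index = addr % NUM_SETS
    if state[index] == addr:
        return state, 0
    updated = list(state)
    updated[index] = addr
    return tuple(updated), 1


@lru_cache(maxsize=None)
def simulate(symbol, state):
    """Simulate a grammar symbol from `state`, memoized on (symbol, state)."""
    if symbol not in GRAMMAR:                       # terminal: a single reference
        return access(state, ADDRESS[symbol])
    misses = 0
    for child in GRAMMAR[symbol]:                   # compose the children's results
        state, child_misses = simulate(child, state)
        misses += child_misses
    return state, misses


def simulate_flat(trace, state=EMPTY):
    """Reference implementation: simulate the uncompressed trace."""
    misses = 0
    for obj in trace:
        state, miss = access(state, ADDRESS[obj])
        misses += miss
    return state, misses


final_state, total_misses = simulate("S", EMPTY)
print(total_misses, simulate.cache_info().hits)
```

The memoized and the flat simulation agree on the final state and miss count, while repeated occurrences of a rule that start from an already-seen cache state are answered from the memo table.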
Outline
• Background and Problem Definition
• Search is a solution, but may not be feasible
• Making the search practical:
  • Trace representation
  • Avoid recompilation
  • Efficient simulation
• Applications
• Summary
Framework Application (1)
• Application: an implementation of the framework that searches in a sub-space of the layout space.
• Field Reordering:
  • Objective: reduce the number of cache misses.
  • Sub-space: all possible (legal) orders of fields in (heap) objects.
  • Our search strategy: (almost) exhaustive search.
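A toy version of exhaustive search over field orders might look like the following. It is only a sketch under strong assumptions that are not from the slides: four equally sized fields, a cache that holds a single block, and a made-up field access trace. It enumerates every permutation, whereas the slide describes an "(almost)" exhaustive strategy.

```python
from itertools import permutations

FIELD_SIZE = 8     # bytes per field (assumed, all fields equal)
BLOCK_SIZE = 16    # bytes per cache block (assumed)


def block_of(order):
    """Map each field to the cache block it lands in under the given field order."""
    return {field: (i * FIELD_SIZE) // BLOCK_SIZE for i, field in enumerate(order)}


def count_misses(order, trace):
    """Count misses for a cache that holds exactly one block."""
    blocks = block_of(order)
    cached, misses = None, 0
    for field in trace:
        if blocks[field] != cached:
            cached = blocks[field]
            misses += 1
    return misses


def best_field_order(fields, trace):
    """Exhaustively evaluate every field order and return the first with fewest misses."""
    return min(permutations(fields), key=lambda order: count_misses(order, trace))


fields = ("a", "b", "c", "d")
trace = ["a", "c", "a", "c", "b", "d", "b", "d"]   # hypothetical field access trace
best = best_field_order(fields, trace)
print(best, count_misses(fields, trace), count_misses(best, trace))
```

With these inputs the original order misses on all 8 accesses, while placing the fields that are accessed together in the same block leaves 2 misses.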
Field Reordering: Exhaustive Search
• We compared:
  • The best field order found by our iterative search.
  • Field orders produced by existing heuristics:
    • Fields Temporal Affinity [Chilimbi PLDI’99].
    • Fields Access Frequency [Truong PACT’98].
• Runtime improvement: 0%-4.5%.
Custom Memory Allocator (CMA)
• Objective: reduce the number of page faults.
• [Figure: two allocators place objects A and B on memory pages (Page 1, Page 2); address-over-time plots for the reference trace ABABA contrast poor locality with good locality.]
• A CMA can work well if it has a good placement function: one that assigns dynamically allocated heap objects to memory pages (heaps).
CMA Placement Function (PF)
• malloc(size s) { } uses PF to map objects to heaps: PF(heap object) → int.
• How can we find a placement function using our framework?
  • A placement function defines a data layout.
  • Learn it by measuring the benefits of its data layout.
  • How: use a learning algorithm.
• [Figure: Profile(Heap objects) → runtime attributes (profiling information) → Decision Tree Learner → PF(Attributes) → int, with the framework used to evaluate PF. Example decision tree: a test on Size with branches size ≥ 24 and size < 24 and leaves 1 and 2.]
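To make "a placement function defines a data layout" concrete, the sketch below evaluates two candidate placement functions by the number of distinct pages a trace touches. It does not implement the decision tree learner. The bump allocator, the 64-byte toy page size, the allocation sequence, and the assignment of the size ≥ 24 branch to heap 1 are all assumptions for illustration.

```python
PAGE_SIZE = 64          # toy page size (assumed)
HEAP_BYTES = 1 << 20    # address range reserved per heap (assumed)


def allocate(allocs, pf):
    """Bump-allocate each (name, size) request in the heap chosen by pf(size)."""
    next_free, address = {}, {}
    for name, size in allocs:
        heap = pf(size)
        offset = next_free.get(heap, 0)
        address[name] = heap * HEAP_BYTES + offset
        next_free[heap] = offset + size
    return address


def working_set_pages(trace, address):
    """Number of distinct pages touched by the reference trace."""
    return len({address[obj] // PAGE_SIZE for obj in trace})


def pf_single(size):
    """Baseline: every object goes to the same heap."""
    return 0


def pf_by_size(size):
    """Size-based placement, loosely following the slide's example tree."""
    return 1 if size >= 24 else 2


# Hypothetical program: small hot objects interleaved with large cold ones.
allocs = [("h0", 16), ("c0", 48), ("h1", 16), ("c1", 48),
          ("h2", 16), ("c2", 48), ("h3", 16), ("c3", 48)]
trace = ["h0", "h1", "h2", "h3"] * 3

pages_single = working_set_pages(trace, allocate(allocs, pf_single))
pages_split = working_set_pages(trace, allocate(allocs, pf_by_size))
print(pages_single, pages_split)
```

A learner would generate many such placement functions from profiled attributes and keep the one whose layout scores best under this kind of evaluation.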
CMA Results
• [Table: CMA results.]
• Footnote 1: Relative to original working set size.
Contributions and Future Work
• Formulate data layout optimization as a search process.
• Build a framework for an efficient search process.
• Improve existing optimizations; enable new optimizations.
• Framework limitations:
  • Difficult to handle very large traces (>0.5B references).
  • Requires some guidance from the programmer (search strategy).
• Future work:
  • Advanced search strategies that combine several optimizations.
  • Other, non-data-layout optimizations, e.g. prefetching.