Stephen Curial - Xymbiant Systems Inc. Peng Zhao - Intel Corporation J. Nelson Amaral - University of Alberta Yaoqing Gao, Shimin Cui, Raul Silvera, Roch Archambault - IBM Toronto Software Laboratory MPADS: Memory-Pooling-Assisted Data Splitting FROM SUN MICROSYSTEMS José Nelson Amaral
ISMM 2008 Goal • What: • Improve spatial locality • Where: • Linked data structures • How: • Pooling similar structures together • Grouping the same fields from multiple objects together
Goal (cont.) • Why: • Because we can • Allow easy-to-write, easy-to-read, easy-to-maintain code to improve performance • What compiler: • IBM XL compiler suite • Limitation: • Needs more precise pointer analysis to benefit from more opportunities
Most Relevant Earlier Work • Pool Allocation • Lattner and Adve (CGO 04, PLDI 05) • Reference Affinity • Zhong, Orlovich, Shen, Ding (PLDI 04) • Rabbah and Palem (TECS 03) • Array Reshaping • Zhao, Cui, Gao, Silvera, Amaral (TOPLAS 07)
A refreshing outcome “MPADS is not the first implementation of the combination of memory pools and splitting of pointer-based data structures.” “MPADS is still not delivering its full potential on standard benchmarks in the IBM XL compiler.” Reviewer’s Comment: “The technique only worked for Olden, and did nothing for SPECcpu2000 (but the authors get bonus points for being honest about that.)”
The Cost of Programming Productivity • Easy-to-read and easy-to-maintain code often results in lower runtime performance. (Figure: Student, University, and Class objects.)
The Cost of Programming Productivity • Abstraction • Inheritance (Figure: class hierarchy with Person, Student, Support Staff, and Professor.)
The Cost of Programming Productivity • Data Encapsulation • Person: Name, Address, Date of Birth, Driver Lic., Gender, Citizenship • Student: Univ. ID, Date of Adm, Faculty, Department, Program, Classes Enr., Grades
A possible data layout • Person (88 bytes): Name 32 bytes, Address 32 bytes, Date of Birth 4 bytes, Driver Lic. 3 bytes, Gender 1 byte, Citizenship 16 bytes • Student (24 bytes): Univ. ID 4 bytes, Date of Adm 4 bytes, Faculty 1 byte, Department 1 byte, Program 2 bytes, Classes Enr. 4 bytes, Grades 4 bytes, plus a 4-byte pointer
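The layout on this slide can be written down as C structures. The following is a minimal sketch, not the authors' code: the field types are assumptions chosen only to reproduce the byte counts (88 bytes for Person, 24 bytes for Student), the trailing 4-byte `next` field stands in for the link between list nodes, and the sizes hold on common ABIs where `uint32_t` is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed field types; only the sizes come from the slide. */
typedef struct {
    char     name[32];        /* offset  0 */
    char     address[32];     /* offset 32 */
    uint32_t date_of_birth;   /* offset 64 */
    char     driver_lic[3];   /* offset 68 */
    uint8_t  gender;          /* offset 71 */
    char     citizenship[16]; /* offset 72 */
} Person;

typedef struct {
    uint32_t univ_id;
    uint32_t date_of_adm;
    uint8_t  faculty;
    uint8_t  department;
    uint16_t program;
    uint32_t classes_enr;
    uint32_t grades;
    uint32_t next;            /* hypothetical 32-bit link to the next node */
} Student;

/* Runtime check that the compiler produced the layout described above. */
void check_layout(void) {
    assert(sizeof(Person) == 88);
    assert(offsetof(Person, date_of_birth) == 64);
    assert(offsetof(Person, citizenship) == 72);
    assert(sizeof(Student) == 24);
}
```

Calling `check_layout()` once at startup is enough to confirm that the target ABI matches these assumptions.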
Data in Memory (Figure: byte-level memory map of Person and Student objects, eight bytes per row, with each row labeled by its memory address.)
Assume a Cache Organization • POWER5 Cache Organization • L1 Data Cache: 32 Kbytes, 128-byte cache lines • L2 Cache: 1.44 Mbytes, 128-byte cache lines • L3 Cache: 32 Mbytes, 512-byte cache lines
Cache Organization (Figure: the L1 data cache drawn as 256 cache lines, numbered 0 to 255, each holding bytes 0 to 127.)
Example: A search through the data structures • How many Computing Science students are younger than 23 years old? (Figure: Student objects, each with Univ.ID, Adm., F., D., Prg, Class., Grades, and a link, packed into 128-byte cache lines.)
Example: A search through the data structures • Student structure: for every 24 bytes loaded, the search reads either 1 or 5 bytes.
Example: A search through the data structures (Figure: a Person object in a cache line, with Name at byte 0, Address at byte 32, DofB at byte 64, DL. at byte 68, and Citizens. at byte 72.)
Example: A search through the data structures • Person structure: for every 88 bytes loaded, the search reads 4 bytes.
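To make the cost of the search concrete, the arithmetic can be sketched as below. This is an idealized model rather than a measurement: it assumes the objects are contiguous, the region starts on a cache-line boundary, and the search makes one sequential pass. The function name `lines_touched` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cache lines touched by one sequential pass over n objects when each
   object contributes bytes_per_object contiguous bytes to the region
   that is traversed. Rounds up to whole cache lines. */
size_t lines_touched(size_t n, size_t bytes_per_object, size_t line_size) {
    assert(line_size > 0);
    size_t total = n * bytes_per_object;
    return (total + line_size - 1) / line_size;
}
```

With 128-byte lines, scanning 1024 whole Student objects (24 bytes each) touches `lines_touched(1024, 24, 128)` lines, while scanning only a contiguous run of their 1-byte fields would touch `lines_touched(1024, 1, 128)`. The same comparison for Person uses 88 bytes against 4.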
Data Reshaping for Arrays of Structures • Student *ListOfStudents; …. ListOfStudents = (Student*)malloc(….); (Figure: the array of Student structures is regrouped so that all Univ. ID fields are contiguous, followed by all Date of Adm. fields, then Fa., De, Progr., Classes Enr., and Grades.)
Maximal Structure Splitting (Figure: three linked nodes, each holding ID, Adm, Fac, Dep, Clas, and Grad, are split so that memory holds ID1 ID2 ID3, then Adm1 Adm2 Adm3, Fac1 Fac2 Fac3, Dep1 Dep2 Dep3, Clas1 Clas2 Clas3, and Grad1 Grad2 Grad3.)
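Maximal splitting places each field in its own contiguous region. The sketch below illustrates the resulting layout by hand for three records; it is not the compiler transformation itself, and the type names, field types, and helper functions are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_STUDENTS = 3 };

/* Original layout: one record per student. */
typedef struct {
    uint32_t id, adm;
    uint8_t  fac, dep;
    uint16_t prog;
    uint32_t clas, grad;
} StudentRec;

/* Maximally split layout: one array per field. */
typedef struct {
    uint32_t id[NUM_STUDENTS];
    uint32_t adm[NUM_STUDENTS];
    uint8_t  fac[NUM_STUDENTS];
    uint8_t  dep[NUM_STUDENTS];
    uint16_t prog[NUM_STUDENTS];
    uint32_t clas[NUM_STUDENTS];
    uint32_t grad[NUM_STUDENTS];
} StudentSplit;

/* Copy every field of record i into slot i of that field's array. */
void split_students(const StudentRec *in, StudentSplit *out) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < NUM_STUDENTS; i++) {
        out->id[i]   = in[i].id;
        out->adm[i]  = in[i].adm;
        out->fac[i]  = in[i].fac;
        out->dep[i]  = in[i].dep;
        out->prog[i] = in[i].prog;
        out->clas[i] = in[i].clas;
        out->grad[i] = in[i].grad;
    }
}

/* A search that reads one field now walks one dense array. */
size_t count_in_department(const StudentSplit *s, uint8_t dep) {
    size_t count = 0;
    for (size_t i = 0; i < NUM_STUDENTS; i++)
        if (s->dep[i] == dep) count++;
    return count;
}
```

After splitting, `count_in_department` reads only the `dep` array, so every byte brought into the cache by that loop is a byte the search needs.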
Implementation of Pool Allocation • Intercept mallocs and replace them with pool allocation: each structure layout gets its own pool. • If a pool is full, another pool can be allocated. (Figure: one pool holding the split fields of several objects, and a second pool allocated after the first one filled up.)
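A minimal sketch of the pooling half of the idea follows. It is an illustration under assumptions, not the XL compiler's runtime: the names `Pool`, `pool_create`, `pool_alloc`, and `pool_destroy` are hypothetical, objects are stored whole rather than split by field, and the pool alignment that the address calculation relies on is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Pool {
    size_t obj_size;       /* size of one object of this layout */
    size_t capacity;       /* objects per pool */
    size_t used;           /* objects handed out so far */
    struct Pool *next;     /* previously filled pool, or NULL */
    unsigned char *slots;  /* backing storage */
} Pool;

/* Create a pool for one structure layout, chained in front of `next`. */
Pool *pool_create(size_t obj_size, size_t capacity, Pool *next) {
    assert(obj_size > 0 && capacity > 0);
    Pool *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->slots = malloc(obj_size * capacity);
    if (p->slots == NULL) { free(p); return NULL; }
    p->obj_size = obj_size;
    p->capacity = capacity;
    p->used = 0;
    p->next = next;
    return p;
}

/* Stand-in for malloc at an allocation site. When the current pool is
   full, a new pool is allocated and becomes the head of the chain. */
void *pool_alloc(Pool **head) {
    Pool *p = *head;
    if (p->used == p->capacity) {
        Pool *fresh = pool_create(p->obj_size, p->capacity, p);
        if (fresh == NULL) return NULL;
        *head = p = fresh;
    }
    return p->slots + p->obj_size * p->used++;
}

/* Release every pool in the chain. */
void pool_destroy(Pool *p) {
    while (p != NULL) {
        Pool *older = p->next;
        free(p->slots);
        free(p);
        p = older;
    }
}
```

Consecutive allocations from the same pool are adjacent in memory, which is what gives objects of one layout their spatial locality.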
Implementing Pool Allocation • The following types of statements need to be transformed: • Memory allocation statements • Memory reference statements
Transforming Memory Allocation Statements • Extended pointer analysis to maintain a set of allocation sites associated with each alias set. • When an alias set is selected for transformation: • Replace each associated allocation with a call to the pool allocation function.
Transforming Memory References • Update address calculation for loads and stores: • Uniform splitting --- all fields are the same size • Address calculation is simpler • Restricts application of technique or • Requires memory padding • Non-uniform splitting --- fields of different size • Address calculation is more involved • Can be applied more generally
Non-Uniform Example • struct example { type_3 a; /* 3 bytes */ type_7 b; /* 7 bytes */ type_5 c; /* 5 bytes */ }; • How can the compiler find the address to access s->c? • pool_base = s & 0xF…F000 • index = (s - pool_base) / 3 • field_base = (3+7)*num_structs_per_pool • s->c = *(s + field_base - 3*index + 5*index) • Simplified: s->c = *(s + field_base + (5-3)*index) (Figure: a pool in which s points into the region of a fields, pool_base marks the start of the pool, and field_base is the offset of the region of c fields.)
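The address calculation on this slide can be checked with plain integer arithmetic. The sketch below assumes that pools are aligned to a power-of-two `pool_size`, so that masking the low bits of `s` yields `pool_base`; the slide elides the actual mask, so the mask here is derived from that assumed parameter. The names `PoolShape`, `addr_of_c`, and `addr_of_c_direct` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field sizes from the slide's struct example. */
enum { SIZE_A = 3, SIZE_B = 7, SIZE_C = 5 };

typedef struct {
    uintptr_t pool_size;    /* assumed power of two; pools aligned to it */
    uintptr_t num_structs;  /* num_structs_per_pool on the slide */
} PoolShape;

/* Address of s->c computed from s alone, as on the slide:
   s + field_base + (5-3)*index. */
uintptr_t addr_of_c(uintptr_t s, PoolShape p) {
    assert((p.pool_size & (p.pool_size - 1)) == 0);
    uintptr_t pool_base  = s & ~(p.pool_size - 1);
    uintptr_t index      = (s - pool_base) / SIZE_A;
    uintptr_t field_base = (SIZE_A + SIZE_B) * p.num_structs;
    return s + field_base + (SIZE_C - SIZE_A) * index;
}

/* Reference answer: the c region starts after all a and b fields, and
   element `index` sits SIZE_C * index bytes into that region. */
uintptr_t addr_of_c_direct(uintptr_t pool_base, PoolShape p, uintptr_t index) {
    return pool_base + (SIZE_A + SIZE_B) * p.num_structs + SIZE_C * index;
}
```

The two functions agree because `s = pool_base + 3*index`, so subtracting `3*index` and adding `5*index` moves from the a region's element to the same element of the c region.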
Data Transformation Safety • How does the compiler decide whether it is safe to transform a given structure? • Based on the results of the pointer analysis.
Is it safe to transform a given data structure? • Structure layout: two structures have the same layout if each field has the same offset and the same length. • Build the alias set. • If a pointer P may point to the structure, then all the objects in the points-to set of the alias set of P must have the same layout. (Figure: pointers P and Q form an alias set whose points-to set contains Data Struct 1 and Data Struct 2.)
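The layout rule above translates directly into a comparison of field offsets and lengths. The following sketch assumes a simplified representation in which each layout is an ordered list of (offset, length) pairs; the types `Field` and `Layout` and both function names are hypothetical and do not reflect the compiler's internal data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { size_t offset, length; } Field;
typedef struct { size_t num_fields; const Field *fields; } Layout;

/* Two structures have the same layout if each field has the same
   offset and the same length. */
bool same_layout(const Layout *x, const Layout *y) {
    assert(x != NULL && y != NULL);
    if (x->num_fields != y->num_fields) return false;
    for (size_t i = 0; i < x->num_fields; i++) {
        if (x->fields[i].offset != y->fields[i].offset) return false;
        if (x->fields[i].length != y->fields[i].length) return false;
    }
    return true;
}

/* An alias set is a candidate for transformation only if every object
   in its points-to set shares one layout. */
bool alias_set_is_transformable(const Layout *const *objects, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (!same_layout(objects[0], objects[i])) return false;
    return true;
}
```

A single object with a different layout in the points-to set is enough to reject the whole alias set, which is why imprecise pointer analysis limits the opportunities for the transformation.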
Experimental Results - Micro Benchmarks (Speedup) (Figure: bar charts for POWER4 and POWER5; benchmarks: Linked List 1A, Linked List 1B, Linked List 2, Linked List 2 w/ alloc, Binary Tree, Binary Tree w/ alloc.)
Experimental Results - Micro Benchmarks (Instruction Count) (Figure: bar charts for POWER4 and POWER5; benchmarks: Linked List 1A, Linked List 1B, Linked List 2, Linked List 2 w/ alloc, Binary Tree, Binary Tree w/ alloc.)
Experimental Results - Micro Benchmarks (L2 Cache Misses) (Figure: bar charts for POWER4 and POWER5; benchmarks: Linked List 1A, Linked List 1B, Linked List 2, Linked List 2 w/ alloc, Binary Tree, Binary Tree w/ alloc.)
Experimental Study - Olden & LLU (Speedup) (Figure: bar charts for POWER4 and POWER5; benchmarks: bh, em3d, health, power, tsp, llu.)
Active Hardware Prefetch Streams (Figure: active prefetching streams from memory to L2 in POWER4.)
Related Work • Pool Allocation • Lattner & Adve - PLDI 2005 • Data Structure Analysis • Array Based Structure Splitting • Zhong et al. - PLDI 2004 • Reference affinity / affinity based splitting • Memory Trace • Safe Pointer Based Structure Splitting • Jeon, Shin and Han - CC 2007 • Similar to non-uniform splitting • Affinity based splitting uses static analysis • Regular expression framework • Guarantee Safety with regular expressions
Final Remarks • Our Compiler-Research Guiding Principles • Programming productivity • Enables programmers to be efficient • Enables easy-to-write/easy-to-maintain programs • Execution Time Performance • Recover runtime efficiency (time, storage or energy) through • Code analysis • Improved code generation • Knowledge of computer architecture and memory hierarchy
Pointer Analysis Primer • The following statement: int *a = malloc(…); • Creates: • a memory object (A), • a pointer (a), • and a points-to relation (a,A). (Figure: an edge from a to A.)
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97) • Program: a = &b; • Steensgaard (unification-based): S = {(a,b)} • Andersen: S = {(a,b)}
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97) • Program: a = &b; b = &c; • Steensgaard (unification-based): S = {(a,b); (b,c)} • Andersen: S = {(a,b); (b,c)}
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97) • Program: a = &b; b = &c; a = &d; • Steensgaard (unification-based): S = {(a,b); (b,c)} • What should happen in the Steensgaard analysis? • Andersen: S = {(a,b); (b,c); (a,d)}
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97) • Program: a = &b; b = &c; a = &d; • Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c)}, with b and d merged into one node (b,d) • Andersen: S = {(a,b); (b,c); (a,d)}
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97) • Program: a = &b; b = &c; a = &d; d = &e; • Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c)}, with b and d merged into one node (b,d) • And now? • Andersen: S = {(a,b); (b,c); (a,d)}
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97) • Program: a = &b; b = &c; a = &d; d = &e; • Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c); (d,e); (b,e)}, with merged nodes (b,d) and (c,e) • Andersen: S = {(a,b); (b,c); (a,d); (d,e)}
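The Steensgaard side of this example can be reproduced with a small union-find structure. The sketch below handles only address-of statements of the form `lhs = &rhs`, which is all the slides use; it is a simplified illustration of unification-based analysis, not a full implementation, and the names `steens_init`, `rep`, `unify`, and `address_of` are hypothetical. The Andersen side needs no unification for these statements: it simply records one pair per statement.

```c
#include <assert.h>

enum { VAR_A, VAR_B, VAR_C, VAR_D, VAR_E, NVARS };

static int parent[NVARS];   /* union-find parent links */
static int pointee[NVARS];  /* per class: a member of the class it points to, or -1 */

void steens_init(void) {
    for (int i = 0; i < NVARS; i++) { parent[i] = i; pointee[i] = -1; }
}

/* Representative of x's equivalence class, with path halving. */
int rep(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Merge the classes of x and y. Each class may point to only one class,
   so the pointees of the merged classes are merged as well. */
void unify(int x, int y) {
    x = rep(x);
    y = rep(y);
    if (x == y) return;
    int px = pointee[x], py = pointee[y];
    parent[y] = x;
    if (px < 0) pointee[x] = py;
    else if (py >= 0) unify(px, py);
}

/* Process the statement lhs = &rhs. */
void address_of(int lhs, int rhs) {
    int l = rep(lhs);
    if (pointee[l] < 0) pointee[l] = rep(rhs);
    else unify(pointee[l], rhs);
}
```

Feeding the four statements from the slides through `address_of` merges b with d at `a = &d`, and then merges c with e at `d = &e`, matching the merged nodes (b,d) and (c,e). The loss of precision is visible in the extra pairs (d,c) and (b,e) that Andersen's analysis does not report.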