Data Access Profiling & Improved Structure Field Regrouping in Pegasus
Vas Chellappa & Matt Moore
May 2, 2005 / Optimizing Compilers / Project Poster Session
Introduction
• Structure definitions group fields by semantics, not by access contemporaneity
• Data access profiling can be used to improve cache performance by reordering fields for contemporaneity
In this context, contemporaneity is a measure of how close together in time two accesses to fields of a structure occur.
Problem Statement
• Obtaining contemporaneity information for structure fields
• Exploiting this information to improve the ordering of the fields
• Doing this within the CASH/Pegasus environment
Approach
• Pegasus Implementation
  • Data access profiling tracks contemporaneous field accesses to build the Field Affinity Graphs
  • The Simulator interface to SimpleScalar (a third-party cache simulator) is modified to achieve this
• Regrouping Algorithm
  • The Field Affinity Graphs built by the modified Simulator are used to recommend reorderings, based on a new regrouping algorithm
Design Overview
• Build stage: Tag structure field accesses in the Pegasus IR
• Simulation stage: Propagate tag information through SimpleScalar to the new regroup library
• Final stage: Invoke the regrouping algorithm to calculate reordering recommendations
Build Stage: Tagging Accesses
• Objective: Identify and tag structure field accesses in the Pegasus IR
• Not trivial, since SUIF/C2DIL do not preserve the required type information during transformation to the IR
• Patterns that indicate structure field accesses therefore have to be identified in the IR
Actual Pegasus Illustration
Structure passed by value:
    int foo(struct my_t stestfoo) {
        int retval = stestfoo.f2;
        return(retval);
    }
Which wire here should have struct type?
Structure passed by pointer:
    int foo(struct my_t* stestfoo) {
        return(stestfoo->f2);
    }
Which wire here has struct type?
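One way to picture the pointer case: an access such as stestfoo->f2 is a load from the pointer plus a constant byte offset, so a tag can in principle be recovered by mapping that offset back to a field. The sketch below is only an illustration of that idea, not the tagging code used in Pegasus. The two-field struct my_t, the field_info table, and field_for_offset are all hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical definition: the slide only shows that my_t has a field f2. */
struct my_t { int f1; int f2; };

/* Hypothetical per-field record: name, byte offset, and size. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;
} field_info;

const field_info my_t_fields[] = {
    { "f1", offsetof(struct my_t, f1), sizeof(int) },
    { "f2", offsetof(struct my_t, f2), sizeof(int) },
};

/* Returns the index of the field that contains byte `offset`,
   or -1 if the offset does not fall inside any listed field. */
int field_for_offset(const field_info *fields, int nfields, size_t offset)
{
    assert(fields != NULL && nfields >= 0);
    for (int i = 0; i < nfields; i++) {
        if (offset >= fields[i].offset &&
            offset < fields[i].offset + fields[i].size)
            return i;
    }
    return -1;
}
```

A load at base + offsetof(struct my_t, f2) would map to index 1 here, which is the kind of tag the later stages consume.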
Simulation Process
• Tag information on loads and stores is propagated through SimpleScalar to the regrouping library, which builds the field affinity graph online, during simulation
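A minimal sketch of how a field affinity graph could be built online from a stream of tagged accesses. The source does not say how contemporaneity is measured, so this sketch assumes a simple rule: two accesses are contemporaneous if they fall within the last WINDOW tagged accesses. The names affinity_graph, ag_init, ag_record, MAX_FIELDS, and WINDOW are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_FIELDS 8  /* hypothetical cap on fields per structure */
#define WINDOW 4      /* assumption: accesses this close together count as contemporaneous */

typedef struct {
    int nfields;
    unsigned long weight[MAX_FIELDS][MAX_FIELDS]; /* symmetric edge weights */
    int recent[WINDOW];  /* ring buffer holding the last WINDOW field ids */
    int count;           /* number of accesses recorded so far */
} affinity_graph;

void ag_init(affinity_graph *g, int nfields)
{
    assert(nfields > 0 && nfields <= MAX_FIELDS);
    memset(g, 0, sizeof *g);
    g->nfields = nfields;
}

/* Call once per tagged load or store, in simulation order. */
void ag_record(affinity_graph *g, int field)
{
    assert(field >= 0 && field < g->nfields);
    int live = g->count < WINDOW ? g->count : WINDOW;
    for (int i = 0; i < live; i++) {
        int other = g->recent[i];
        if (other != field) {
            /* one increment per recent access to a different field */
            g->weight[field][other]++;
            g->weight[other][field]++;
        }
    }
    g->recent[g->count % WINDOW] = field;
    g->count++;
}
```

With this rule, fields that alternate in a tight loop accumulate a large edge weight, while a field touched once far from the others accumulates almost none.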
Regrouping Stage
• After simulation, the collected profiling data is analyzed to produce a reordering recommendation
• This can be done better than the greedy approach used in previous work
• It cannot be done optimally in general, because the problem is NP-hard
• Field Affinity Graph (one per structure):
  • Vertices: the fields of the structure
  • Edge weights: the degree of contemporaneity of accesses between the two fields
Matching Heuristic
• Find a maximum weight matching in the field affinity graph
• Fields that will not fit into a cache line together anyway are identified and ignored
• The structure is reordered by placing matched fields together
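The matching step can be sketched for small structures as follows. This is not the weighted matching package listed in the references: it is an exact bitmask dynamic program that is only practical for a handful of fields, used here to show what a maximum weight matching returns. The names max_weight_matching, drop_oversized_pairs, and MAX_FIELDS are hypothetical, and the size filter ignores alignment padding.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FIELDS 12  /* hypothetical cap; the table below has 2^MAX_FIELDS entries */

/* Zero the edge between any two fields whose combined size exceeds one
   cache line, so the matching ignores pairs that cannot share a line. */
void drop_oversized_pairs(int n, const size_t size[], size_t line,
                          unsigned long w[][MAX_FIELDS])
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (size[i] + size[j] > line)
                w[i][j] = 0;
}

/* Exact maximum weight matching over fields 0..n-1.
   w is symmetric; weight 0 means "no usable edge".
   partner[i] receives the field matched to i, or -1. Returns the total weight. */
unsigned long max_weight_matching(int n, unsigned long w[][MAX_FIELDS],
                                  int partner[])
{
    static unsigned long best[1 << MAX_FIELDS];
    static int choice[1 << MAX_FIELDS];
    assert(n >= 0 && n <= MAX_FIELDS);

    best[0] = 0;
    for (int mask = 1; mask < (1 << n); mask++) {
        int i = 0;
        while (!(mask & (1 << i))) i++;   /* lowest field still in the set */
        int rest = mask & ~(1 << i);
        best[mask] = best[rest];          /* option: leave field i unmatched */
        choice[mask] = -1;
        for (int j = i + 1; j < n; j++) {
            if ((rest & (1 << j)) && w[i][j] > 0) {
                unsigned long cand = w[i][j] + best[rest & ~(1 << j)];
                if (cand > best[mask]) {
                    best[mask] = cand;
                    choice[mask] = j;
                }
            }
        }
    }

    /* Walk the recorded choices back from the full set to recover the pairs. */
    for (int i = 0; i < n; i++) partner[i] = -1;
    int mask = (1 << n) - 1;
    while (mask) {
        int i = 0;
        while (!(mask & (1 << i))) i++;
        int j = choice[mask];
        mask &= ~(1 << i);
        if (j >= 0) {
            partner[i] = j;
            partner[j] = i;
            mask &= ~(1 << j);
        }
    }
    return best[(1 << n) - 1];
}
```

On a path of four fields with edge weights 5, 6, 5, picking the heaviest edge first yields a total of 6, while the matching above pairs the outer edges for a total of 10. That gap is the motivation for preferring a matching over a greedy choice.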
NP-Hardness
• NP-hardness is shown by reducing the graph coloring problem to the regrouping problem
Results
• Implemented successfully for structure field accesses made through pointers (ptr->fld)
• So far, only small programs have been tested
• The reordering is applied manually and the program is fed into the simulator again to obtain the number of cycles for comparison
Results - Example
Original (750 cycles per call):
    struct my_t {
        int f1;
        int f2;
        char nu[4096];
        int f3;
        int f4;
    };
    int foo(struct my_t *elt) {
        int i;
        elt->f1 = 2;
        elt->f4 = 100;
        for(i=0; i < 50; i++) {
            elt->f1++;
            elt->f4--;
        }
        return elt->f1+elt->f4;
    }
Modified (745 cycles per call, one less cache miss):
    struct my_t {
        int f1;
        int f4;
        int f2;
        char nu[4096];
        int f3;
    };
    int foo(struct my_t *elt) {
        int i;
        elt->f1 = 2;
        elt->f4 = 100;
        for(i=0; i < 50; i++) {
            elt->f1++;
            elt->f4--;
        }
        return elt->f1+elt->f4;
    }
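Why the reordering saves a miss can be checked with a small layout calculation. The sketch below assumes a 64-byte cache line and a structure that starts on a line boundary. The source does not state the simulated cache configuration, so both are assumptions, as are the names LINE_SIZE, line_of, same_line, and the renamed struct tags.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_SIZE 64  /* assumption: the simulated line size is not given in the source */

/* The two layouts from the example, renamed so both can coexist in one file. */
struct my_t_original  { int f1; int f2; char nu[4096]; int f3; int f4; };
struct my_t_reordered { int f1; int f4; int f2; char nu[4096]; int f3; };

/* Cache line index of a byte offset, assuming the struct starts on a line boundary. */
size_t line_of(size_t offset) { return offset / LINE_SIZE; }

/* Nonzero if the two byte offsets fall in the same cache line. */
int same_line(size_t a, size_t b) { return line_of(a) == line_of(b); }

/* Checks the claim behind the example: f1 and f4 are separated by the
   4096-byte array in the original layout and adjacent after reordering. */
void check_example_layouts(void)
{
    assert(!same_line(offsetof(struct my_t_original, f1),
                      offsetof(struct my_t_original, f4)));
    assert(same_line(offsetof(struct my_t_reordered, f1),
                     offsetof(struct my_t_reordered, f4)));
}
```

In the original layout the loop touches two distinct lines, one holding f1 and one holding f4. After reordering, both fields sit in the first line of the structure, which is consistent with the single cache miss saved per call.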
Conclusion
• Performance improvements are achievable even on simple programs by following the reordering recommendations
• Propagating full type information from the source through SUIF/C2DIL would be required to optimize non-pointer accesses
• Languages that expose less of the memory layout would allow the reordering recommendations to be implemented easily and quickly
References
• Trishul M. Chilimbi, Bob Davidson, and James R. Larus, "Cache-Conscious Structure Definition," in Proceedings of the ACM SIGPLAN '99 Conference on Programming Language Design and Implementation, pages 13-24, May 1999.
• Mathprog (Weighted Matching Algorithm): http://elib.zib.de/pub/Packages/mathprog/matching/weighted/
• Pegasus: http://www-2.cs.cmu.edu/~phoenix/
• SUIF: http://suif.stanford.edu/
• SimpleScalar Tool Set: http://www.cs.wisc.edu/~mscalar/simplescalar.html