Temporal Distribution Based Software Cache Partition To Reduce I-Cache Misses
Xiaomi An, Jiqiang Song, Wendong Wang
SimpLight Nanoelectronics Ltd
2008/03/24

SimpLight Confidential Patent pending

Outline
• Traditional code layout optimizations
• Code layout optimizations in Open64 compiler
• Temporal distribution based software cache partition to reduce I-Cache misses
• Future work

Traditional code layout optimizations
• Code layout optimization changes how code is organized in memory.
• Main benefits of code layout:
  • Improve branch prediction through basic block placement
  • Reduce I-cache misses by changing how code maps onto the cache (mainly compulsory and conflict misses)
  • Fit code into a complex memory hierarchy (e.g. scratch-pad memory and cache)

Traditional code layout optimizations
• Representations of temporal relationships:
  • Control flow graph with edge frequencies
  • Weighted call graph
  • Temporal relation graph
• Treatment of cache architecture:
  • Linearize code without considering the cache architecture (Pettis and Hansen)
  • Distribute temporally interleaved code onto different cache lines (Hashemi, Gloy, etc.)

Code layout optimizations in Open64 compiler
• Profile-based basic block reordering and procedure splitting in CG
  • Based on the control flow graph with edge frequencies
  • Uses a Pettis and Hansen based algorithm
• Procedure reordering in IPA
  • Based on the weighted call graph with call-edge frequencies
  • Uses a variant of the Pettis and Hansen algorithm

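To illustrate the idea behind call-graph-driven procedure reordering, here is a minimal sketch of a greedy heaviest-edge-first chain merge in the spirit of Pettis and Hansen. It is a simplification and not the Open64 implementation: it neither re-sums edge weights after a merge nor chooses chain orientation, and the function name and the sample call graph are hypothetical.

```python
def greedy_procedure_order(call_edges):
    """call_edges maps (caller, callee) -> call frequency (assumed input form)."""
    # Every procedure starts in its own chain.
    chain_of = {}
    for caller, callee in call_edges:
        chain_of.setdefault(caller, [caller])
        chain_of.setdefault(callee, [callee])

    # Visit edges from heaviest to lightest and join the two chains they connect.
    for (caller, callee), _weight in sorted(call_edges.items(), key=lambda kv: -kv[1]):
        first, second = chain_of[caller], chain_of[callee]
        if first is second:
            continue  # already placed in the same chain
        first.extend(second)
        for proc in second:
            chain_of[proc] = first

    # Emit each distinct chain once, in first-seen order.
    seen, order = set(), []
    for chain in chain_of.values():
        if id(chain) not in seen:
            seen.add(id(chain))
            order.extend(chain)
    return order


# Hypothetical weighted call graph.
edges = {("main", "f"): 10, ("main", "g"): 1, ("f", "h"): 8}
order = greedy_procedure_order(edges)
print(order)
```

Procedures connected by frequent calls end up adjacent in the resulting order, which is the property a linearizing layout aims for.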
Software cache partition
• What is software cache partition?
  • Through code layout optimization, different code blocks are mapped to different regions of the I-cache.
• Benefits of software cache partition
  • Reduce cache misses
  • Remove interference between multiple programs and avoid additional hardware support (embedded systems)
  • Soft implementation of scratch-pad memory on top of the I-cache

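The mapping from code placement to cache region can be sketched as follows, assuming a direct-mapped I-cache. The line size, the cache size and the helper names are assumptions chosen for illustration; the slides do not state the actual cache configuration.

```python
LINE_SIZE = 32    # bytes per cache line (assumed)
NUM_LINES = 256   # 256 lines * 32 bytes = 8 KiB direct-mapped cache (assumed)


def cache_lines_of(start_addr, size):
    """Return the set of cache line indices occupied by a code block."""
    first = start_addr // LINE_SIZE
    last = (start_addr + size - 1) // LINE_SIZE
    return {line % NUM_LINES for line in range(first, last + 1)}


def in_region(start_addr, size, region):
    """True if the block only touches cache lines inside the given region."""
    return cache_lines_of(start_addr, size) <= set(region)


# A 64-byte block placed at 0x3000 lands in the upper half of the cache.
upper_half = range(128, 256)
print(cache_lines_of(0x3000, 64), in_region(0x3000, 64, upper_half))
```

Under these assumptions, choosing a block's address in memory is what decides which cache region it occupies, so a layout pass can steer blocks into separate partitions without hardware support.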
Benefits of software cache partition (1)
• Remove interference between multiple programs and avoid additional hardware support
The I-cache is partitioned according to the performance demand and code locality of the video application and the audio application.
[Figure: I-cache divided into a region for the video app and a region for the audio app]

Benefits of software cache partition (2)
• Soft implementation of scratch-pad memory on top of the I-cache
The I-cache is partitioned to guarantee that code with real-time requirements is not replaced once it has been brought into the cache.
[Figure: I-cache divided into a region for code with real-time requirements and a region for other code]

Benefits of software cache partition (3)
• Reduce I-cache misses
Runtime trace of code blocks: ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF

Cache line   Layout 1   Layout 2
1            A/E        A
2            B/F        B
3            C          C
4            D          D
5            U/P/X      E/U/P/X
6            V/Q/Y      F/V/Q/Y

Layout 1: 24 misses
Layout 2: 18 misses

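The two miss counts can be checked with a small simulation. This is a sketch that assumes a direct-mapped cache of six lines in which every code block fits in exactly one line; the function and variable names are illustrative.

```python
# Runtime trace from the slide, with the repeated pairs expanded.
TRACE = "ABCDEF" + "UV" * 5 + "ABCDEF" + "PQ" * 5 + "ABCDEF" + "XY" * 5 + "ABCDEF"

# Each string lists the blocks that map to one cache line.
LAYOUT_1 = ["AE", "BF", "C", "D", "UPX", "VQY"]
LAYOUT_2 = ["A", "B", "C", "D", "EUPX", "FVQY"]


def count_misses(trace, layout):
    """Count misses when each cache line holds one block at a time."""
    line_of = {blk: i for i, blocks in enumerate(layout) for blk in blocks}
    resident = {}  # cache line index -> block currently held
    misses = 0
    for blk in trace:
        line = line_of[blk]
        if resident.get(line) != blk:
            misses += 1
            resident[line] = blk
    return misses


print(count_misses(TRACE, LAYOUT_1), count_misses(TRACE, LAYOUT_2))
```

In Layout 1, A/E and B/F evict each other on every pass through ABCDEF. Layout 2 moves that conflict onto the lines already shared with the loop blocks, so A and B stay resident.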
Temporal distribution based layout of code blocks in the partitioned cache
• Selection of good candidates holding cache lines exclusively
  • Hot, dense and temporal distribution
Mapping into I-cache:
[Figure: code blocks labeled "Hot, dense and good regularity", "Hot and good locality" (three blocks) and "Cold" (two blocks) mapped into the I-cache; two groups of blocks are marked "Share cache lines"]

Temporal distribution
• Temporal locality and temporal regularity
Trace: ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF
A, B, C, D, E, F have good temporal regularity, since their references are uniformly distributed along the trace.
U, V, P, Q, X, Y have good temporal locality, since their references are heavily skewed toward one part of the trace.
Our mapping: 18 misses in total
[Figure: mapping of blocks A, B, C, D, P, Q, X, Y, U, V, E, F onto the cache; two groups of blocks are marked "Share cache lines"]

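One way to see the difference between regularity and locality is to count each block's references per fixed-size window of the trace. This is only an illustrative sketch: the windowing and the window length of 16 are assumptions, not the measure used in the slides.

```python
from collections import Counter

# Trace from the slide, with the repeated pairs expanded.
TRACE = "ABCDEF" + "UV" * 5 + "ABCDEF" + "PQ" * 5 + "ABCDEF" + "XY" * 5 + "ABCDEF"


def window_histogram(trace, block, window):
    """Number of references to `block` in each consecutive window of the trace."""
    counts = Counter(i // window for i, b in enumerate(trace) if b == block)
    n_windows = -(-len(trace) // window)  # ceiling division
    return [counts.get(w, 0) for w in range(n_windows)]


for blk in "AEUPX":
    print(blk, window_histogram(TRACE, blk, 16))
```

A and E appear once in every window, while U, P and X each concentrate all five references in a single window.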
Quantification of temporal distribution
• Variance of reuse distance
• Temporal distribution
• Weighted temporal distribution

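The formulas for these measures are not present in this transcript. As a hedged illustration of the first item only, the sketch below takes reuse distance to mean the gap in trace positions between consecutive references to the same block, which is an assumption; other definitions, such as counting distinct intervening blocks, exist.

```python
from statistics import mean, pvariance

# Trace from the earlier slides, with the repeated pairs expanded.
TRACE = "ABCDEF" + "UV" * 5 + "ABCDEF" + "PQ" * 5 + "ABCDEF" + "XY" * 5 + "ABCDEF"


def reuse_distances(trace, block):
    """Gaps between consecutive references to `block` (assumed definition)."""
    positions = [i for i, b in enumerate(trace) if b == block]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


for blk in "AU":
    d = reuse_distances(TRACE, blk)
    print(blk, d, "mean:", mean(d), "variance:", pvariance(d))
```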
Iterative partition and layout
Func Partition(RB, IRB)
    Sort nodes in RB by instruction density   // highest instruction density first
    RB_SIZE = Calc_rb_size(RB)
    IRB_SIZE = Calc_irb_size(IRB)
    While (RB_SIZE + IRB_SIZE > CACHE_SIZE) {
        Adjust(RB, IRB)
        RB_SIZE = Calc_rb_size(RB)
        IRB_SIZE = Calc_irb_size(IRB)
    }

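The loop structure above can be made runnable as a sketch. The slides do not define Calc_rb_size, Calc_irb_size or Adjust, so the three helpers below are hypothetical stand-ins: RB nodes are assumed to hold lines exclusively (sizes add up), IRB nodes are assumed to share one region (the largest node sets its size), and adjusting is assumed to move the least dense RB node into IRB. The `rb and` guard is also an addition, so that the loop ends when RB is empty.

```python
def partition(rb, irb, cache_size, calc_rb_size, calc_irb_size, adjust):
    """Shrink RB until RB and IRB together fit into the cache."""
    rb.sort(key=lambda node: node["density"], reverse=True)  # densest first
    while rb and calc_rb_size(rb) + calc_irb_size(irb) > cache_size:
        adjust(rb, irb)
    return rb, irb


# Hypothetical helpers; none of these definitions come from the slides.
def exclusive_size(rb):
    return sum(node["size"] for node in rb)


def shared_size(irb):
    return max((node["size"] for node in irb), default=0)


def demote_least_dense(rb, irb):
    irb.append(rb.pop())  # rb is sorted, so the last node is the least dense


# Made-up nodes, with sizes in cache lines.
rb = [
    {"name": "b", "size": 4, "density": 0.5},
    {"name": "a", "size": 4, "density": 0.9},
    {"name": "c", "size": 4, "density": 0.2},
]
irb = [{"name": "d", "size": 2, "density": 0.1}]
rb, irb = partition(rb, irb, 8, exclusive_size, shared_size, demote_least_dense)
print([n["name"] for n in rb], [n["name"] for n in irb])
```

Passing the helpers as parameters keeps the loop itself identical in shape to the pseudocode, while making it clear which parts are guesses.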
Experiments and results (1)
Cumulative effect of optimizations on I-cache miss reduction
[Figure: bar chart, vertical axis 0% to 100%. Series: BB reorder; BB reorder + layout; BB reorder + pu split + layout. Benchmarks: H264 enc, H264 dec, AVSM dec, MPEG4 dec, G729.A]

Experiments and results (2)
Reduction of I-cache misses by TD, PH and TRG
[Figure: bar chart, vertical axis 0% to 100%. Series: TD, PH, TRG. Benchmarks: H264 enc, H264 dec, AVSM dec, MPEG4 dec, G729.A]

Experiments and results (3)
H264 codec I-cache miss reduction by TD, PH and TRG with various inputs
[Figure: bar chart, vertical axis 0% to 80%. Series: TD, PH, TRG. Inputs: HE:stenfan, HE:akiyo, HE:football, HD:stenfan, HD:akiyo, HD:football]

Future work
• Improve the current iterative partition algorithm
• Incorporate more cache configuration parameters into the layout algorithm, e.g. cache line size, L2 cache …
• Develop an effective software cache partition method for multithreaded programs on our memory hierarchy

Thank You!