Paper Presentation

Tag Overflow Buffering: Reducing Total Memory Energy by Reduced-Tag Matching
Yun-Chung Yang
Mirko Loghi, Paolo Azzoni, and Massimo Poncino
IEEE Transactions on VLSI Systems, May 2009

Abstract
We propose a novel energy-efficient cache architecture based on a matching mechanism that uses a reduced number of tag bits. The idea behind the proposed architecture is based on moving a large subset of the tag bits from the cache into an external register (called the Tag Overflow Buffer) that serves as an identifier of the current locality of the memory references. Dynamic energy efficiency is achieved by accessing, for most of the memory references, a reduced-tag cache; furthermore, because of the reduced number of tag bits, leakage energy is also reduced as a by-product. We achieve average energy savings ranging from 16% to 40% (depending on different cache structural parameters) on total (i.e., static and dynamic) cache energy, as measured on a standard suite of embedded applications.

Related Work
• Reducing the energy cost
  – Configuring cache parameters: reconfigurable at runtime, but may decrease performance and increase the miss rate [3]-[7].
  – Reducing the number of tag bits:
      – Use in branch prediction [8]-[9].
      – False detection [10]-[14].
      – Set-associative cache [11]
      – Partial tag comparison [12]
      – Way-predicting cache architecture [13]
      – Data cache energy minimization through programmable tag size [15]
      – This paper

Introduction (I)
• Application-specific system optimization exploits highly predictable memory access patterns.
• These patterns are predictable because the target application is well defined.
• Previous work shows that caches are able to exploit the high locality of an application's memory references.
• Miss rate versus the number of tag bits (MiBench benchmarks).

Introduction (II)
• Reducing the number of tag bits must be done carefully: a partial tag match may classify a miss as a hit.
• Such a false hit returns the wrong data and may cause the program to fail.
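The false-hit hazard can be shown with a small sketch. It assumes a cache line that stores only the k least significant bits of its tag; the value of K and the tag values are made up for illustration.

```python
# Hypothetical illustration of a false hit in a reduced-tag cache.
# K and the tag values below are made up for the example.
K = 4                               # tag bits kept in the cache (assumed)
LOW_MASK = (1 << K) - 1


def partial_tag_match(stored_low_tag: int, requested_tag: int) -> bool:
    """Compare only the k least significant bits of the tag."""
    return stored_low_tag == (requested_tag & LOW_MASK)


resident_tag = 0x2A5                        # full tag of the block actually cached
stored_low_tag = resident_tag & LOW_MASK    # the cache keeps only 0x5

requested_tag = 0x1F5                       # a different block in the same set
print(partial_tag_match(stored_low_tag, requested_tag))  # True: false hit
print(requested_tag == resident_tag)                     # False: a full compare misses
```

Both tags end in 0x5, so the partial comparison cannot tell the two blocks apart.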

Proposed Method
• Move a subset of the tag bits out of the cache into a register that identifies the current locality.
• On a memory access, first look up the register to check whether the address falls within the current locality.
  – On a hit: access the cache using the partial tag.
  – On a miss: follow the normal miss procedure, with minor modifications.
• Three strengths of the proposed method:
  – Fixed number of tag bits
  – Small hardware overhead
  – Energy reduction
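The lookup flow above can be sketched in Python. This is a minimal model under stated assumptions: a direct-mapped cache with 8-byte lines and 1024 sets (the 8 KB configuration from the experiments), a hypothetical K = 4 tag bits kept in the cache, and illustrative names (`ReducedTagCache`, `access`). The TOB value is held fixed; reloading it is a separate decision. Following the architecture description, a TOB miss is served from memory without allocating a line.

```python
from dataclasses import dataclass, field

OFFSET_BITS = 3   # 8-byte lines (assumed)
INDEX_BITS = 10   # 1024 sets -> 8 KB direct-mapped (assumed)
K = 4             # tag bits kept in the cache (hypothetical choice)


@dataclass
class ReducedTagCache:
    tob: int                                    # t-k MSBs of the current locality
    lines: dict = field(default_factory=dict)   # set index -> stored k-bit tag

    def access(self, addr: int) -> str:
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        low_tag = tag & ((1 << K) - 1)   # part kept in the cache
        high_tag = tag >> K              # part compared against the TOB
        if high_tag != self.tob:
            # Outside the current locality: serve from memory and
            # leave the cache contents untouched.
            return "tob_miss"
        if self.lines.get(index) == low_tag:
            return "hit"                 # safe: the TOB matched the high bits
        self.lines[index] = low_tag      # normal miss: fill the line
        return "miss"


cache = ReducedTagCache(tob=0)
print(cache.access(0x1000))    # miss
print(cache.access(0x1000))    # hit
print(cache.access(0x41000))   # tob_miss (same set, different locality)
print(cache.access(0x1000))    # hit: the line was not replaced
```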

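Reduced-Tag Cache Architecture
• Of the t tag bits, the cache keeps only k; the remaining t-k bits are stored in an external register and used as the locality identifier. We call this register the Tag Overflow Buffer (TOB).
• The t-k MSBs of the address tag are compared with the TOB to determine whether the cache access is safe.
• On a TOB miss, the corresponding memory access is still required, but unlike a normal miss, the fetched line does not replace any line in the cache.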
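To make the t and t-k widths concrete, the sketch below derives them for the 8 KB direct-mapped cache with 8-byte lines used in the experiments. It assumes 32-bit addresses and a hypothetical k = 4; neither value comes from the slides. The helper names are illustrative.

```python
def split_fields(cache_bytes: int, line_bytes: int, ways: int,
                 addr_bits: int, k: int) -> dict:
    """Bit widths of a reduced-tag cache address (sizes must be powers of two)."""
    offset = line_bytes.bit_length() - 1
    sets = cache_bytes // (line_bytes * ways)
    index = sets.bit_length() - 1
    t = addr_bits - index - offset          # full tag width
    return {"offset": offset, "index": index, "t": t,
            "k_in_cache": k, "tob": t - k}


def tob_and_low_tag(addr: int, fields: dict) -> tuple:
    """Split an address tag into the TOB part (t-k MSBs) and the k cached bits."""
    tag = addr >> (fields["offset"] + fields["index"])
    k = fields["k_in_cache"]
    return tag >> k, tag & ((1 << k) - 1)


f = split_fields(8192, 8, 1, 32, k=4)
print(f)                              # {'offset': 3, 'index': 10, 't': 19, 'k_in_cache': 4, 'tob': 15}
print(tob_and_low_tag(0x12345678, f))  # (2330, 2)
```

Under these assumptions the full tag is 19 bits, so with k = 4 the TOB holds 15 bits and each cache line stores only 4.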

Locality Change Detection
• The address and the TOB miss signal together determine whether a new locality value should be loaded into the TOB.
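The slide does not specify the decision rule, so the following is only a hypothetical policy to illustrate the idea. It reloads the TOB after `threshold` consecutive TOB misses that all fall in the same new locality, and it resets on any TOB hit. The class name and threshold are assumptions. The sketch does not model what happens to existing cache lines when the TOB is reloaded.

```python
class LocalityDetector:
    """Hypothetical policy: signal a TOB reload after `threshold`
    consecutive TOB misses whose high tag bits all agree."""

    def __init__(self, threshold: int = 4):
        self.threshold = threshold
        self.candidate = None   # high tag bits of the suspected new locality
        self.count = 0

    def observe(self, high_tag: int, tob_miss: bool) -> bool:
        """Return True when the TOB should be loaded with `high_tag`."""
        if not tob_miss:
            # Access stayed in the current locality: forget the candidate.
            self.candidate, self.count = None, 0
            return False
        if high_tag == self.candidate:
            self.count += 1
        else:
            self.candidate, self.count = high_tag, 1
        if self.count >= self.threshold:
            self.candidate, self.count = None, 0
            return True
        return False


d = LocalityDetector(threshold=3)
print([d.observe(7, True) for _ in range(3)])  # [False, False, True]
```

A threshold above 1 avoids reloading the TOB on a single stray access outside the current locality.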

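Choosing the Optimal Tag Size
• The choice of k is the most critical design decision for the effectiveness of the TOB.
• Etot(k) = Ecache(k) + MR(k) · Emiss
• Ecache(k) = Edata + Etag(k)
• MR(k) is the miss rate with k tag bits in the cache and Emiss is the energy cost of a miss; the optimal k minimizes Etot(k).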
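The energy model above translates directly into a search over candidate k values. The sketch follows the slide's formulas. The input numbers are placeholders in arbitrary units that only show the trade-off: tag energy grows with k and the miss rate falls with k. They are not measured data.

```python
def total_energy(k, e_data, e_tag, miss_rate, e_miss):
    """Etot(k) = Ecache(k) + MR(k) * Emiss, where Ecache(k) = Edata + Etag(k)."""
    e_cache = e_data + e_tag[k]
    return e_cache + miss_rate[k] * e_miss


def best_k(e_data, e_tag, miss_rate, e_miss):
    """Return the k (a key of e_tag) that minimizes Etot(k)."""
    return min(e_tag, key=lambda k: total_energy(k, e_data, e_tag, miss_rate, e_miss))


# Placeholder inputs in arbitrary energy units; NOT measured values.
e_tag = {2: 1.0, 4: 2.0, 6: 3.0}          # tag energy grows with k
miss_rate = {2: 0.20, 4: 0.05, 6: 0.04}   # miss rate falls with k
for k in sorted(e_tag):
    print(k, total_energy(k, 10.0, e_tag, miss_rate, 40.0))
print("best k:", best_k(10.0, e_tag, miss_rate, 40.0))  # best k: 4
```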

Experimental Results
• Configuration: 8 KB, direct-mapped, 8-byte lines, temperature 50 °C.
• Instruction cache: k from 2 to 8, savings between 14% and 18%.
• Data cache: k from 5 to 10, savings between 12% and 16%.

Impact on Performance
• Impact of the miss-rate increase on the total execution time of each benchmark.

Sensitivity to Cache Parameters
• Increasing the line size reduces the savings, since with larger lines the tag comparison accounts for a smaller share of the cache energy.
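A rough way to see why larger lines shrink the savings is to compute the tag's share of per-line storage as the line size grows. The sketch assumes a direct-mapped 8 KB cache with 32-bit addresses and ignores valid and dirty bits. It treats storage share as a proxy for the tag's energy weight, which is a simplification and not the paper's energy model.

```python
def tag_fraction(cache_bytes: int, line_bytes: int, addr_bits: int = 32) -> float:
    """Share of per-line storage taken by the tag in a direct-mapped cache.
    Ignores valid/dirty bits; sizes must be powers of two."""
    lines = cache_bytes // line_bytes
    index_bits = lines.bit_length() - 1
    offset_bits = line_bytes.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    data_bits = 8 * line_bytes
    return tag_bits / (tag_bits + data_bits)


for line in (8, 16, 32):
    print(f"{line:2d}-byte lines: tag share = {tag_fraction(8192, line):.1%}")
```

The tag stays 19 bits wide in every case while the data portion doubles with each step. The share therefore falls from about 22.9% to 12.9% to 6.9%, leaving less tag energy for the TOB to save.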

Relaxing the Application Dependence
• If the system executes a mix of applications, find the best k for each application and choose the largest of these values for the system.
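The selection rule for a mix of applications can be sketched as follows. It reuses the slide's energy model to find each application's best k and then takes the maximum. The application names and all numbers are hypothetical placeholders.

```python
def best_k(e_data, e_tag, miss_rate, e_miss):
    """k minimizing Etot(k) = Edata + Etag(k) + MR(k) * Emiss."""
    return min(e_tag, key=lambda k: e_data + e_tag[k] + miss_rate[k] * e_miss)


def system_k(app_miss_rates, e_data, e_tag, e_miss):
    """Largest of the per-application optimal k values."""
    return max(best_k(e_data, e_tag, mr, e_miss) for mr in app_miss_rates.values())


# Hypothetical applications and placeholder numbers (arbitrary units).
e_tag = {2: 1.0, 4: 2.0, 6: 3.0}
apps = {
    "app_a": {2: 0.20, 4: 0.05, 6: 0.04},   # best k = 4
    "app_b": {2: 0.30, 4: 0.15, 6: 0.02},   # best k = 6
}
print(system_k(apps, 10.0, e_tag, 40.0))  # 6
```

Because a larger k keeps more tag bits in the cache, taking the maximum avoids forcing a high miss rate on any application in the mix. The cost is some extra tag energy for the others.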

Conclusion
• Because embedded applications exhibit high locality, most tag bits are rarely needed to distinguish cache lines and can be moved from the cache into the TOB.
• The scheme saves 16% to 40% of total cache energy, depending on the cache parameters.
