
Presentation Transcript


  1. Paper Presentation Energy-Efficient Trace Reuse Cache for Embedded Processors 2013/01/14 Yun-Chung Yang Yi-Ying Tsai and Chung-Ho Chen IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 19, No. 9, September 2011

  2. Outline • Abstract • Related Work • Introduction • Proposed Method • Experiment Result • Conclusion • My Comment

  3. Abstract For an embedded processor, the efficiency of instruction delivery has attracted much attention, since instruction cache accesses consume a large portion of the whole processor's power dissipation. In this paper, we propose a memory structure called the Trace Reuse (TR) cache to serve as an alternative source for instruction delivery. Through an effective scheme to reuse the retired instructions from the pipeline back-end of a processor, the TR cache presents improvement in both performance and power efficiency. Experimental results show that a 2048-entry TR cache is able to provide 75% energy saving for an instruction cache of 16 kB, while at the same time boosting the IPC by up to 21%. The scalability of the TR cache is also demonstrated with the estimated area usage and energy-delay product. The results of our evaluation indicate that the TR cache outperforms the traditional filter cache under all configurations of the reduced cache sizes. The TR cache exhibits strong tolerance to the IPC degradation induced by smaller instruction caches, thus making it an ideal design option for cases of trading cache size for better energy and area efficiency.

  4. Related Work • Branch prediction [1]-[6] • Instruction cache restructuring [7]-[9] • Trace cache [10]-[13] • Filter cache [15], [16] • This paper addresses both performance and energy.

  5. Introduction • Goal: improvement in both performance and power efficiency. • Performance: improve instruction delivery to boost processor performance. • Power efficiency: the upper levels of the memory hierarchy use less power per access, e.g., a filter cache, but give up performance because of the extra cache accesses on misses. • Prior work focuses on the front-end of instruction delivery; those approaches need the right program trace to reduce execution latency and energy consumption.

  6. Add D flip-flops and the History Trace Buffer (HTB)

  7. Hit ratio of the HTB • Indicates how often the opportunity occurs to fetch the same instruction from the HTB. • HTB hit rate = H_k / F_k, where H_k is the hit count in the HTB and F_k is the total number of instructions fetched by the program.
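To make the metric concrete, here is a minimal Python sketch (my own illustration, not the paper's simulator) that replays a stream of fetched PCs against an idealized FIFO HTB and reports the resulting hit rate:

```python
from collections import deque

def htb_hit_rate(fetch_trace, htb_size):
    """Estimate H_k / F_k for a stream of fetched instruction PCs."""
    htb = deque(maxlen=htb_size)  # FIFO: the oldest retired entry falls out
    hits = 0
    fetches = 0
    for pc in fetch_trace:
        fetches += 1
        if pc in htb:             # the instruction could be reused from the HTB
            hits += 1
        htb.append(pc)            # idealized model: every fetch later retires
    return hits / fetches if fetches else 0.0

# A tight 8-instruction loop iterated 1000 times hits almost every time:
trace = [0, 4, 8, 12, 16, 20, 24, 28] * 1000
print(htb_hit_rate(trace, htb_size=2048))  # -> 0.999
```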

  8. Proposed Architecture

  9. Proposed Method • Trace Reuse cache architecture • HTB (History Trace Buffer) – a FIFO buffer that stores the instructions retired from the pipeline back-end. • TET (Trace Entry Table) – stores the PC value of each control-transfer instruction (for instance, a branch) and the corresponding HTB index. • Updating the HTB and TET (sketched below): • When an instruction retires from the pipeline, it is buffered in the HTB together with its PC value. • If the incoming instruction is a branch instruction, the TET is updated as well.
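A behavioral sketch of this update rule, under assumed data structures (a plain dict stands in for the real TET, and the staleness handling added on slide 13 is omitted):

```python
HTB_SIZE = 2048  # the entry count evaluated in the abstract

htb = [None] * HTB_SIZE   # circular FIFO of (pc, instruction) entries
htb_tail = 0              # next write position in the FIFO
tet = {}                  # branch PC -> HTB index of the buffered trace

def retire(pc, insn, is_control_transfer):
    """Called once for each instruction leaving the pipeline back-end."""
    global htb_tail
    htb[htb_tail] = (pc, insn)            # buffer the retired instruction
    if is_control_transfer:               # e.g., a branch
        tet[pc] = htb_tail                # remember where its trace starts
    htb_tail = (htb_tail + 1) % HTB_SIZE  # wrap around, overwriting the oldest
```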

  10. Operation

  11. TET Implementation • The TET is checked every cycle, so its size and structure are important. • A replaced-by-invalidation policy is used: when the TET is full, instead of replacing any TET entry, the newly generated trace entry is discarded (see the sketch below).
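A minimal sketch of that policy, assuming a fixed-capacity TET (the capacity and the dict-based lookup are illustration choices, not the paper's circuit):

```python
TET_SIZE = 64  # assumed capacity for illustration

class TraceEntryTable:
    def __init__(self):
        self.entries = {}                  # branch PC -> HTB index

    def insert(self, pc, htb_index):
        """A new trace entry never evicts a live one."""
        if pc in self.entries or len(self.entries) < TET_SIZE:
            self.entries[pc] = htb_index
        # else: table full, so the newly generated entry is discarded

    def invalidate(self, pc):
        """Free an entry once its HTB trace has been overwritten."""
        self.entries.pop(pc, None)

    def lookup(self, pc):
        """Probed every fetch cycle with the current PC."""
        return self.entries.get(pc)        # None on a TET miss
```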

  12. TET Implementation • Fully associative • 4-way set associative • Direct-mapped
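For instance, a direct-mapped TET would split the PC into an index and a tag roughly as follows (the table size and word-aligned PCs are assumptions for illustration); the fully associative and 4-way variants differ only in how many entries a PC may map to and how many tags are compared per cycle:

```python
TET_ENTRIES = 64   # assumed number of entries
INSN_BYTES = 4     # word-aligned PCs on a 32-bit embedded ISA

def tet_index(pc):
    """Low-order PC bits select the single candidate entry."""
    return (pc // INSN_BYTES) % TET_ENTRIES

def tet_tag(pc):
    """The remaining high-order bits are stored and compared as the tag."""
    return (pc // INSN_BYTES) // TET_ENTRIES
```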

  13. Adjustment of TET and HTB • Add a busy bit to each TET entry (a). • Add an invalidate flag and a taken/not-taken direction bit to each HTB entry (b).
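The adjusted entry formats might look like this (field names are mine; only the added bits come from the slide):

```python
from dataclasses import dataclass

@dataclass
class TETEntry:             # (a) Trace Entry Table entry
    tag: int                # PC of the control-transfer instruction
    htb_index: int          # where the corresponding trace starts
    busy: bool = False      # added busy bit

@dataclass
class HTBEntry:             # (b) History Trace Buffer entry
    pc: int
    insn: int
    invalid: bool = False   # added invalidate flag: entry is stale
    taken: bool = False     # added taken/not-taken direction bit
```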

  14. Replaced-by-invalidation

  15. Experiment Result • Impact on instruction cache access. • Energy efficiency. • MiBench programs are used as the input workload.

  16. Impact on Instruction Cache Access • Total number of instruction cache accesses under different TR cache sizes.

  17. Energy Efficiency • Energy is calculated using the CACTI tool. • T_program-execution is the elapsed program execution time. • The energy-delay product (EDP) is calculated by multiplying the normalized E_total by T_program-execution.
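As a back-of-the-envelope illustration (combining the headline figures from the abstract rather than numbers reported on this slide):

```python
# Both factors are normalized to the baseline instruction cache.
e_total_norm = 0.25                   # 75% energy saving -> 0.25x energy
t_exec_norm = 1 / 1.21                # 21% higher IPC -> ~0.83x execution time
edp = e_total_norm * t_exec_norm
print(f"normalized EDP = {edp:.2f}")  # 0.21, well below the baseline's 1.0
```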

  18. Conclusion • For an embedded system with a not-taken prediction scheme, the TR cache can boost the prediction rate up to 92%, together with a 21% performance gain. • The TR cache virtually expands the capacity of the conventional instruction cache. • This is done without the support of trace-prediction and trace-construction hardware. • The TR cache can deliver instructions at a lower energy cost than the conventional instruction cache.

  19. My Comment • This is my first presentation of a journal paper. • The proposed idea is simple, but it brings a great improvement to the whole system. • It makes me think about what data we should put in our own tag architecture.
