
Reducing Cache Traffic and Energy with Macro Data Load


Presentation Transcript


  1. Reducing Cache Traffic and Energy with Macro Data Load
  Lei Jin and Sangyeun Cho*
  Dept. of Computer Science, University of Pittsburgh

  2. Motivation
  • Data cache access is a frequent event
    • 20~40% of all instructions access the data cache
  • Data cache energy can be significant (~16% in the StrongARM chip [Montanaro et al. 1997])
  • Reducing cache traffic leads to energy savings
  • Existing approaches (a forwarding sketch follows this slide)
    • Store-to-load forwarding
    • Load-to-load forwarding
    • Use available resources to keep data for reuse
      • LSQ [Nicolaescu et al. 2003]
      • Reorder buffer [Önder and Gupta 2001]
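To make the baseline concrete, here is a minimal C sketch of conventional store-to-load forwarding from an LSQ. All names (lsq_entry_t, forward_store_to_load, LSQ_SIZE) and the encoding of the exact-match rule are hypothetical illustrations, not the authors' hardware:

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical LSQ entry: a store waiting to retire. Conventional
     * forwarding needs an EXACT match: same address, same access size. */
    typedef struct {
        uint64_t addr;   /* effective address of the store */
        uint8_t  size;   /* access size in bytes (1, 2, 4, or 8) */
        uint64_t data;   /* store data, right-aligned */
        bool     valid;
    } lsq_entry_t;

    #define LSQ_SIZE 16
    static lsq_entry_t lsq[LSQ_SIZE];

    /* Scan the LSQ for an exactly matching older store; on a hit the
     * load is served from the queue and the data cache is not accessed. */
    bool forward_store_to_load(uint64_t addr, uint8_t size, uint64_t *out)
    {
        for (int i = LSQ_SIZE - 1; i >= 0; i--) {
            if (lsq[i].valid && lsq[i].addr == addr && lsq[i].size == size) {
                *out = lsq[i].data;
                return true;   /* forwarded: cache access avoided */
            }
        }
        return false;          /* no match: must read the data cache */
    }

The exact-match condition in the if statement is precisely the limitation the next slide attacks: a 1-byte load adjacent to a buffered 8-byte access finds no match here.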

  3. Macro Data Load (ML)
  • Previous works are limited by exact data matching
    • Same address and same data type
  • ML exploits spatial locality in cache-port-wide data (sketched below)
    • Accessing port-wide data is free
    • Naturally fits the datapath and LSQ width
    • Recent processors support 64 bits
    • Many accesses are narrower than 64 bits
  [Figure: load reuse w/o ML vs. w/ ML]
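A minimal sketch of the ML matching idea, assuming a 64-bit cache port and little-endian byte order; ml_entry_t and ml_forward are hypothetical names. The entry buffers the whole port-wide block, so any later load whose bytes fall inside it can be served, with alignment and extraction performed after the match:

    #include <stdint.h>
    #include <stdbool.h>

    #define PORT_BYTES 8   /* 64-bit cache port */

    /* Hypothetical ML-style entry: keeps the full port-wide block an
     * earlier access touched, tagged by its block-aligned address. */
    typedef struct {
        uint64_t block_addr;  /* address aligned to the port width */
        uint64_t block;       /* full 64-bit port-wide data */
        bool     valid;
    } ml_entry_t;

    /* A later load of ANY size hits if its bytes lie inside the buffered
     * block; the alignment logic then shifts and masks the field out. */
    bool ml_forward(const ml_entry_t *e, uint64_t addr, uint8_t size,
                    uint64_t *out)
    {
        uint64_t base = addr & ~(uint64_t)(PORT_BYTES - 1);
        if (!e->valid || e->block_addr != base)
            return false;
        unsigned shift = (unsigned)(addr - base) * 8;     /* little-endian */
        uint64_t mask  = (size == 8) ? ~0ULL : ((1ULL << (size * 8)) - 1);
        *out = (e->block >> shift) & mask;                /* align + extract */
        return true;
    }

Where the exact-match sketch above rejects a narrow load next to a wide access, this predicate accepts it, which is where the additional reuse shown on the next slide comes from.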

  4. ML Potential
  • ML uncovers more reuse opportunities
  • ML is especially effective with limited resources
  [Charts: ML potential for CINT2k, CFP2k, and MiBench]

  5. ML Implementation
  • Architectural changes
    • Relocated data alignment logic
    • Sequential LSQ-cache access
  • Net impact (sketched below)
    • The LSQ becomes a small fully associative cache with FIFO replacement
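A sketch of that net impact, reusing ml_entry_t, ml_forward, and PORT_BYTES from the earlier sketch; the size and the names ML_LSQ_SIZE, ml_lsq_insert, and ml_lsq_lookup are hypothetical:

    #define ML_LSQ_SIZE 16
    static ml_entry_t ml_lsq[ML_LSQ_SIZE];
    static unsigned   fifo_head;   /* next entry to overwrite */

    /* Each completed access deposits its port-wide block; entries are
     * overwritten oldest-first, i.e., FIFO replacement in a tiny cache. */
    void ml_lsq_insert(uint64_t addr, uint64_t block)
    {
        ml_entry_t *e = &ml_lsq[fifo_head];
        e->block_addr = addr & ~(uint64_t)(PORT_BYTES - 1);
        e->block      = block;
        e->valid      = true;
        fifo_head     = (fifo_head + 1) % ML_LSQ_SIZE;
    }

    /* In hardware all entries would be probed in parallel (fully
     * associative); only on a miss does the load access the data cache,
     * which is where the traffic and energy savings come from. */
    bool ml_lsq_lookup(uint64_t addr, uint8_t size, uint64_t *out)
    {
        for (int i = 0; i < ML_LSQ_SIZE; i++)
            if (ml_forward(&ml_lsq[i], addr, size, out))
                return true;
        return false;
    }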

  6. Result: Energy Reduction
  • Up to 35% energy reduction (MiBench)!
  • More effective than previous techniques
  [Charts: energy reduction for CINT, CFP, and MiBench]
