
Techniques for Reducing Read Latency of Core Bus Wrappers






Presentation Transcript


  1. Techniques for Reducing Read Latency of Core Bus Wrappers
  Roman L. Lysecky, Frank Vahid, & Tony D. Givargis
  Department of Computer Science, University of California, Riverside, CA 92521
  {rlysecky, vahid, givargis}@cs.ucr.edu
  This work was supported in part by the NSF and a DAC scholarship.

  2. Introduction
  [Figure: a core library containing cores such as MIPS, MEM, Cache, DMA, DSP, Core X, and Core Y]
  • Core-based designs are becoming common
  • Cores are available in both soft and hard form
  • Problem: how can interfacing be simplified to ease integration?

  3. Introduction
  • One solution: a single standard on-chip bus
    • All cores have the same interface
    • Appears to be unlikely (VSIA)
  • Another solution: divide each core into a bus wrapper and internal parts
    • Rowson and Sangiovanni-Vincentelli '97: Interface-Based Design
    • VSIA is developing a standard for the interface between wrapper and internals
    • Far simpler than a standard on-chip bus
  • We refer to the bus wrapper as an interface module (IM)

  4. Previous Work: Pre-fetching
  • Analogous to caching: store local copies of core registers inside the interface module
    • Enables quick response time, eliminating extra cycles for register reads
  • Transparent to the system bus and core internals
    • Easily integrates with different buses
  • No performance overhead; acceptable increases in size and power
  • Pre-fetching was previously added manually to each core

  5. Previous Work: Architecture of the IM
  [Figure: IM containing a controller and a pre-fetch unit with pre-fetch registers]
  • Controller: interfaces the pre-fetch registers to the system bus
  • Pre-fetch Unit (PFU): implements the pre-fetching heuristic
    • Goal: maximize the number of hits
  • How can we automate the design of the PFU?

  6. Outline
  • "Real-time" pre-fetching
  • Mapping to real-time scheduling
  • Update dependency model
  • General register attributes
  • Petri net model construction
  • Petri net model refinement
  • Pre-fetch scheduling
  • Experiments
  • Conclusions

  7. Real-time Pre-fetching
  • Age constraint: the number of cycles old the data may be when it is read
  • Access-time constraint: the maximum number of cycles a read access may take
  [Figure: naïve vs. more efficient pre-fetch schedules for registers A (age constraint = 4) and B (age constraint = 6), each with access-time constraint = 2]

  8. Real-time Pre-fetching
  • Mapping to real-time scheduling:
    • Register -> process
    • Internal bus -> processor
    • Pre-fetch -> process execution
    • Register age constraint -> process period
    • Register access-time constraint -> process deadline
    • Pre-fetch time -> process computation time
  • Assume a pre-fetch requires 2 cycles
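The mapping above can be sketched as a small data type; this is a minimal illustration, with class and field names invented here rather than taken from the paper.

```python
from dataclasses import dataclass

# Minimal sketch of the slide's mapping from registers to real-time
# processes; the class and field names are illustrative.
@dataclass
class RegisterTask:
    name: str
    age_constraint: int          # -> process period (cycles)
    access_time_constraint: int  # -> process deadline (cycles)
    prefetch_time: int = 2       # -> computation time (slide assumes 2 cycles)

# Pre-fetching register A over the internal bus corresponds to executing
# process A on a single processor.
reg_a = RegisterTask("A", age_constraint=4, access_time_constraint=2)
```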

  9. Real-time Pre-fetching
  • Cyclic executive
    • Major cycle: the time required to pre-fetch all registers
    • Minor cycle: the rate at which the highest-priority process is executed
  • Problems:
    • Sporadic writes
    • All process periods must be multiples of the minor cycle
    • Computationally infeasible for large register sets

  10. Real-time Pre-fetching
  • Rate-monotonic priority assignment
    • The register with the smallest age constraint has the highest priority
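Rate-monotonic assignment is just a sort by age constraint (the period under the slide-8 mapping); a minimal sketch, with registers represented as illustrative (name, age_constraint) pairs:

```python
# Sketch of rate-monotonic priority assignment for the pre-fetch unit:
# a smaller age constraint (shorter period) means higher priority.
def assign_priorities(registers):
    """registers: list of (name, age_constraint) pairs.
    Returns register names ordered from highest to lowest priority."""
    return [name for name, age in sorted(registers, key=lambda r: r[1])]
```

For example, with registers A (age constraint 4) and B (age constraint 6), A receives the highest priority.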

  11. Real-time Pre-fetching
  • Utilization-based schedulability test:
    Σ (Ci / Ai) ≤ N × (2^(1/N) − 1)
    • Ci = computation (pre-fetch) time for register i
    • Ai = age constraint (period) for register i
    • N = number of registers
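Under the slide-8 mapping (age constraint as period, pre-fetch time as computation time), this is the classic sufficient rate-monotonic utilization test; a minimal sketch:

```python
def utilization_schedulable(registers):
    """registers: list of (C_i, A_i) pairs, where C_i is the pre-fetch
    (computation) time and A_i the age constraint (period) of register i.
    Returns True if the set passes the sufficient rate-monotonic test:
    sum(C_i / A_i) <= N * (2**(1/N) - 1)."""
    n = len(registers)
    utilization = sum(c / a for c, a in registers)
    return utilization <= n * (2 ** (1.0 / n) - 1)

# Registers A (C=2, A=4) and B (C=2, A=6): U = 0.833 exceeds the two-task
# bound of about 0.828, so this sufficient test fails here even though
# response-time analysis can still show the set schedulable.
```

Note that the test is sufficient but not necessary, which is one motivation for the response-time analysis on the next slide.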

  12. Real-time Pre-fetching
  • Response-time analysis
    • The response time of register i is defined as:
      Ri = Ci + Ii
      • Ri = response time for register i
      • Ci = computation time for register i
      • Ii = maximum interference from higher-priority registers in the interval [t, t + Ri)
    • The register set is schedulable if each register's response time is less than or equal to its age constraint
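Since Ri appears on both sides (the interference Ii depends on Ri), the standard way to evaluate it is fixed-point iteration; a minimal sketch under the same (Ci, Ai) register parameters as before:

```python
import math

def response_time(c_i, higher_priority):
    """Fixed-point iteration for R_i = C_i + I_i, where the interference
    I_i = sum(ceil(R_i / A_j) * C_j) over higher-priority registers j
    (C_j = pre-fetch time, A_j = age constraint of register j).
    Assumes total utilization < 1 so the iteration converges."""
    r = c_i
    while True:
        r_next = c_i + sum(math.ceil(r / a_j) * c_j
                           for c_j, a_j in higher_priority)
        if r_next == r:
            return r
        r = r_next

# Register B (C=2) with higher-priority register A (C=2, A=4):
# R_B converges to 4, which meets B's age constraint of 6.
```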

  13. Real-time Pre-fetching
  • Sporadic register writes
    • Writes to registers are sporadic
    • They take control of the internal bus, delaying the pre-fetching of registers
  • Deadline-monotonic priority
    • The register with the smallest access-time constraint has the highest priority
  • Add a write register WR to the register set
    • Access-time constraint = deadline
    • Age constraint = maximum rate at which writes can occur (i.e., the minimum interval between writes)

  14. Experiments: Area (gates)
  • Average increase of IM w/ RTPF over IM w/ BW: 1.4K gates
  • Note: to better evaluate the effects of IMs, our cores were kept simple, resulting in smaller-than-normal sizes

  15. Experiments: Performance (ns)
  • Average performance improvement of IM w/ RTPF over IM w/ BW: 11%

  16. Experiments: Energy (nJ)
  • Average energy increase of IM w/ RTPF over IM w/ BW: 10%

  17. Register Attributes
  • Register attributes
    • Update type, access type, notification type, and structure type
  • Update dependencies
    • Internal dependencies: dependencies between registers
    • External dependencies: updates to registers via reads and writes from the on-chip bus, and updates from external ports to internal core registers
  • Petri nets
    • We determined that Petri nets can model our update dependencies

  18. Petri Net Based Dependency Model
  [Figure: Petri net with a bus place, register places, update-dependency arcs, and a random transition]

  19. Refined Petri Net Model
  [Figure: refined Petri net adding refined transitions and data dependencies]

  20. Pre-fetch Schedule
  • Create a heap of registers to be pre-fetched
  • Create a list for update arcs
  • Repeat:
    • if a write request is detected then
      • add its outgoing arcs to the heap
      • set the write register's access-time to 0 and add it to the heap
    • if a read request is detected then
      • add its outgoing arcs to the update arc list
    • for the register at the top of the heap do
      • if access-time = 0 then pre-fetch the register and remove it from the heap
      • if current age = 0 then pre-fetch the register, reset its current age, and re-add it to the heap
    • while the update arc list is not empty do
      • if a transition fires then set the register's access-time to 0 and add it to the heap
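Part of the loop above can be sketched in software as follows; this is a simplified, illustrative model (the real PFU is hardware), covering only the access-time check and the update-arc drain, with all data-structure choices invented here.

```python
import heapq

def prefetch_step(heap, update_arcs, fired_transitions, prefetched):
    """One simplified pass of the scheduling loop.
    heap: min-heap of (access_time, register) entries awaiting pre-fetch
    update_arcs: list of (transition, register) arcs queued by reads
    fired_transitions: set of Petri net transitions that fired this pass
    prefetched: output list of registers pre-fetched this pass
    """
    # Registers whose access-time has reached 0 are pre-fetched now.
    while heap and heap[0][0] == 0:
        _, reg = heapq.heappop(heap)
        prefetched.append(reg)
    # Drain the update-arc list: a fired transition sets the target
    # register's access-time to 0 and moves it onto the heap.
    still_pending = []
    for transition, reg in update_arcs:
        if transition in fired_transitions:
            heapq.heappush(heap, (0, reg))
        else:
            still_pending.append((transition, reg))
    update_arcs[:] = still_pending

heap = [(0, "A"), (3, "B")]
heapq.heapify(heap)
arcs = [("t1", "C")]
done = []
prefetch_step(heap, arcs, {"t1"}, done)
# "A" is fetched immediately; "C" is now queued with access-time 0.
```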

  21. Experiments: Area (gates)
  • Average increase of IM w/ PF over IM w/ BW: 1.5K gates
  • Average increase of IM w/ PF over IM w/ RTPF: 0.1K gates
  • Note: to better evaluate the effects of IMs, our cores were kept simple, resulting in smaller-than-normal sizes

  22. Experiments: Performance (ns)
  • Average performance improvement of IM w/ PF over IM w/ BW: 26%
  • Average performance improvement of IM w/ RTPF over IM w/ BW: 16%

  23. Experiments: Energy (nJ)
  • Average energy decrease of IM w/ PF over IM w/ BW: 11%
  • Average energy decrease of IM w/ PF over IM w/ RTPF: 20%

  24. Conclusions
  • Real-time pre-fetching and update-dependency pre-fetching both produce good results
  • The update dependency model pre-fetches registers more efficiently
  • The two approaches are complementary
  • Both enable automatic generation of the pre-fetching unit
