
Using Packet Information for Efficient Communication in NoCs

Explore dynamic multicast tree construction, VC as cache, and packet concatenation for efficient communication in Networks-on-Chip, covering motivation, proposals, caveats, solutions, and results.


Presentation Transcript


  1. Using Packet Information for Efficient Communication in NoCs Prasanna Venkatesh R, Madhu Mutyam PACE Lab, IIT Madras

  2. Agenda • Motivation • Existing techniques to handle multicasts at NoC • Dynamic Multicast Tree • VC as Cache • Packet Concatenation • IPC Results • Energy Analysis • Conclusion

  3. Motivation SPLASH and PARSEC benchmarks have up to 87% of nodes participating in a multicast, but the average is only 7.5%.

  4. Motivation SPLASH and PARSEC benchmarks have up to 87% of nodes participating in a multicast, but this peak communication exists for less than 4% of the time.

  5. Multicasts: Solutions in the literature • Separate injections flood the network with redundant copies • Multicasts: a single copy travels along the common path, then forks into multiple copies • Simplifies routing logic. Dynamic multicast routing can make use of idle paths to avoid congestion, but can it still meet timing constraints?

  6. Our Proposals to achieve multicast efficiency • Dynamic multicast tree construction using redundant route computation units • Will it penalize unicasts and cause starvation? • Three unicast optimizations to enhance dynamic multicasting • VC as cache • Packet Concatenation • Critical word first

  7. Critical Word First • Borrowed from the critical-word-first cache data-transfer optimization • Makes efficient use of the flit-level split of a packet carrying a cache block • Send the requested word with the header flit
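A minimal sketch of one way the flit-level split could realize this, assuming a 64-byte block and 16-byte body flits (giving the 5-flit cache-block packet used in the simulation setup); the names (Flit, pack_block) are illustrative and not from the slides. The body flits are simply rotated so the flit holding the requested word leaves right behind the header flit.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Flit {
    bool head;                     // true only for the header flit
    std::vector<uint8_t> payload;  // flit payload bytes
};

std::vector<Flit> pack_block(const std::vector<uint8_t>& block,
                             size_t critical_offset, size_t flit_bytes = 16) {
    // Split the cache block into body-flit payloads.
    std::vector<std::vector<uint8_t>> chunks;
    for (size_t off = 0; off < block.size(); off += flit_bytes) {
        size_t end = std::min(off + flit_bytes, block.size());
        chunks.emplace_back(block.begin() + off, block.begin() + end);
    }
    // Rotate so the flit containing the requested word follows the header
    // flit immediately; the requesting core can resume before the tail arrives.
    size_t first = critical_offset / flit_bytes;
    std::rotate(chunks.begin(), chunks.begin() + first, chunks.end());

    std::vector<Flit> flits;
    flits.push_back({true, {}});   // header flit: routing info only (not modelled)
    for (auto& c : chunks) flits.push_back({false, std::move(c)});
    return flits;
}

int main() {
    std::vector<uint8_t> block(64);
    for (size_t i = 0; i < block.size(); ++i) block[i] = static_cast<uint8_t>(i);
    auto flits = pack_block(block, /*critical_offset=*/40);
    // 1 header + 4 body flits, matching the 5-flit cache-block packet.
    std::printf("flits=%zu, first body flit starts at byte %d\n",
                flits.size(), static_cast<int>(flits[1].payload[0]));
    return 0;
}
```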

  8. Dynamic Multicast Tree Method • Compute the Odd-Even route at each router for all multicast destinations • Takes one RC cycle per destination • Add a redundant RC unit to speed up this process • No extra chip area because of its simplicity Caveats • Bottlenecks unicasts • Slow when there is no congestion
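A rough sketch of the per-hop route computation and forking, under stated simplifications: plain XY routing on an 8x8 mesh stands in for the Odd-Even algorithm the slides use, and the cycle counts only illustrate how a second, redundant RC unit roughly halves the per-destination computation time. All names are illustrative.

```cpp
#include <cstdio>
#include <map>
#include <vector>

enum Port { LOCAL, NORTH, SOUTH, EAST, WEST };

// One route computation per multicast destination (XY here for brevity).
Port route_xy(int cur, int dst, int width = 8) {
    int cx = cur % width, cy = cur / width;
    int dx = dst % width, dy = dst / width;
    if (dx > cx) return EAST;
    if (dx < cx) return WEST;
    if (dy > cy) return SOUTH;
    if (dy < cy) return NORTH;
    return LOCAL;
}

int main() {
    int current_router = 27;                       // arbitrary example node
    std::vector<int> dests = {3, 12, 45, 60, 33};

    // Group destinations by output port: each group is one branch of the tree.
    std::map<Port, std::vector<int>> fork;
    for (int d : dests) fork[route_xy(current_router, d)].push_back(d);

    // A single RC unit needs one cycle per destination; a redundant second
    // unit processes half of the destinations in parallel.
    int cycles_one_rc = static_cast<int>(dests.size());
    int cycles_two_rc = (cycles_one_rc + 1) / 2;
    std::printf("branches at this router: %zu, RC cycles: %d (1 unit) vs %d (2 units)\n",
                fork.size(), cycles_one_rc, cycles_two_rc);
    return 0;
}
```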

  9. VC as Cache: Scenario • A shared cache block is requested by more than one node within a given time window • The owner sends a multicast of the block to all requestors • Another request arrives after this multicast • The owner resends the block after processing this request

  10. Solution – Add the new requestor to the already-processed multicast midway! • Compare up to five multicast packets with an incoming request packet at the router • If matched, • Forward the request to the owner for coherence and bookkeeping, with the time stamp of the previous message • Add this requestor to the multicast destinations
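A minimal sketch of the match-and-join step, assuming an illustrative packet layout (block address plus a destination list); forwarding the matched request to the owner for coherence bookkeeping is only noted in a comment, not modelled.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct MulticastPacket {
    uint64_t block_addr;            // cache block this multicast carries
    std::vector<int> destinations;  // requesting nodes
};

struct Request {
    uint64_t block_addr;
    int requestor;
};

// Compare the incoming request against up to five multicast packets buffered
// in the virtual channels; on a match, the new requestor joins the multicast.
bool try_join_multicast(std::vector<MulticastPacket>& vc_buffer, const Request& req) {
    const std::size_t kMaxCompares = 5;
    std::size_t limit = std::min(kMaxCompares, vc_buffer.size());
    for (std::size_t i = 0; i < limit; ++i) {
        if (vc_buffer[i].block_addr == req.block_addr) {
            vc_buffer[i].destinations.push_back(req.requestor);
            // The request is still forwarded to the owner for coherence and
            // bookkeeping (with the previous message's time stamp) -- not shown.
            return true;
        }
    }
    return false;                   // no match: request proceeds normally
}

int main() {
    std::vector<MulticastPacket> vcs = {{0x40ULL, {3, 9}}, {0x80ULL, {5}}};
    Request late = {0x40ULL, 14};
    bool joined = try_join_multicast(vcs, late);
    std::printf("joined=%d, destinations in first multicast now %zu\n",
                joined, vcs[0].destinations.size());
    return 0;
}
```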

  11. Packet Concatenation • A request is a single-flit packet • When the RC units are busy, single-flit packets to the same destination can be combined into a “super-packet” • From that point on, one RC cycle computes the route for multiple packets
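A small sketch of the concatenation step, with hypothetical structure names: pending single-flit packets are grouped by destination into super-packets so that one RC computation later covers all of them.

```cpp
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

struct SingleFlitPacket { int dest; int payload; };
struct SuperPacket      { int dest; std::vector<int> payloads; };

// Merge pending single-flit packets with the same destination into one
// super-packet, so a single RC result serves all of them downstream.
std::vector<SuperPacket> concatenate(const std::vector<SingleFlitPacket>& pending) {
    std::map<int, SuperPacket> by_dest;
    for (const auto& p : pending) {
        auto& sp = by_dest[p.dest];
        sp.dest = p.dest;
        sp.payloads.push_back(p.payload);
    }
    std::vector<SuperPacket> out;
    for (auto& kv : by_dest) out.push_back(std::move(kv.second));
    return out;
}

int main() {
    std::vector<SingleFlitPacket> pending = {{7, 1}, {3, 2}, {7, 3}, {7, 4}};
    auto supers = concatenate(pending);
    // Four pending requests now need only two RC computations instead of four.
    std::printf("super-packets: %zu (RC computations saved: %zu)\n",
                supers.size(), pending.size() - supers.size());
    return 0;
}
```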

  12. Configuration for simulations • Simulators: Multi2Sim 4.0.1, Booksim 2.0, Orion 2.0 • Real-time simulation • 64 nodes: 32 core + L1 nodes and 32 shared distributed L2 cache banks • 1 flit for request and coherence packets, 5 flits for a cache-block packet • Benchmarks: • SPLASH-2 and PARSEC workloads with 32 threads • High-injection workloads are selected after an initial study of their injection rates

  13. IPC Results • Abbreviations: • C – Critical Word first • V – VC as cache • D – Dynamic Multicast Tree • P – Packet Concatenation

  14. IPC Results • Abbreviations: • C – Critical Word first • V – VC as cache • D – Dynamic Multicast Tree • P – Packet Concatenation

  15. Scaling to 512 Nodes: IPC Results • Abbreviations: • C – Critical Word first • V – VC as cache • D – Dynamic Multicast Tree • P – Packet Concatenation

  16. Fine Grained Energy Footprint of Barnes • Abbreviations: • C – Critical Word first • V – VC as cache • D – Dynamic Multicast Tree • P – Packet Concatenation

  17. Conclusion and future extensions • Scalable solution for multicasts • Can fit with existing techniques • Easy to implement • Energy efficient • Packet Concatenation can be switched on selectively depending on the load • Other architecture-level inputs can also be used for further performance improvements • Examples: number of instructions waiting, memory-level parallelism

  18. Thank you
