Explore dynamic multicast tree construction, VC as cache, packet concatenation for efficient communication in Network-on-Chips with motivation, proposals, caveats, solutions, and results.
Using Packet Information for Efficient Communication in NoCs Prasanna Venkatesh R, Madhu Mutyam PACE Lab, IIT Madras
Agenda • Motivation • Existing techniques to handle multicasts at NoC • Dynamic Multicast Tree • VC as Cache • Packet Concatenation • IPC Results • Energy Analysis • Conclusion
Motivation SPLASH and PARSEC benchmarks have up to 87% of nodes participating in a multicast. But the average is only 7.5%.
Motivation SPLASH and PARSEC benchmarks have up to 87% of nodes participating in a multicast. But the maximum communication exists for < 4% of the time.
Multicasts: Solutions in the literature • Separate injections: flood the network with redundant copies • Multicasts: a single copy travels along the common path, then forks into multiple copies • Simplifies routing logic Dynamic multicast routing can make use of idle paths to avoid congestion. But can it meet timing constraints?
Our Proposals to achieve multicast efficiency • Dynamic multicast tree construction using redundant route computation units • Will it penalize unicasts and create starvation? • Three optimizations on unicasts to enhance dynamic multicasting • VC as cache • Packet Concatenation • Critical word first
Critical Word First • Borrowed from Cache data transfer optimization technique • Make efficient use of the flit level split of a packet containing cache block • Send the requested word with the header flit
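A minimal sketch of the reordering described on this slide. The wrap-around order of the remaining words, the flit layout, and the names `critical_word_first` and `to_flits` are assumptions made for illustration; the slide only states that the requested word travels with the header flit.

```python
def critical_word_first(words, requested):
    """Rotate the block so the requested word comes first.

    The wrap-around order of the remaining words is an assumption;
    the slide only says the requested word goes with the header flit.
    """
    if not 0 <= requested < len(words):
        raise ValueError("requested index is outside the cache block")
    return words[requested:] + words[:requested]


def to_flits(words, requested, words_per_body_flit):
    """Split a cache block into flits, critical word in the head flit."""
    ordered = critical_word_first(words, requested)
    head = ("HEAD", [ordered[0]])
    rest = ordered[1:]
    body = [
        ("BODY", rest[i:i + words_per_body_flit])
        for i in range(0, len(rest), words_per_body_flit)
    ]
    return [head] + body


# Hypothetical 8-word block, word 5 requested, 2 words per body flit.
flits = to_flits(list(range(8)), requested=5, words_per_body_flit=2)
```

With these illustrative sizes the block occupies one head flit and four body flits, so the requesting core can resume as soon as the head flit arrives.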
Dynamic Multicast Tree Method • Compute Odd-Even route at each router for all multicast destinations • Takes one RC cycle per destination • Add a redundant RC unit to speed this process • No extra chip area because of the simplicity Caveats • Bottlenecks unicasts • Slow when there is no congestion
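The cost model on this slide (one RC cycle per destination, shortened by redundant RC units) can be sketched as follows. The even split of destinations across RC units, the function names, and the `route` callback are assumptions; the toy route in the usage lines is a stand-in and is not the Odd-Even algorithm.

```python
import math
from collections import defaultdict


def rc_cycles(num_destinations, rc_units=1):
    """RC cycles to route one multicast, at one destination per unit per cycle.

    Assumes destinations are spread evenly over the RC units.
    """
    if num_destinations < 0 or rc_units < 1:
        raise ValueError("need num_destinations >= 0 and rc_units >= 1")
    return math.ceil(num_destinations / rc_units)


def fork_by_port(destinations, route):
    """Group destinations by chosen output port; each group is one branch.

    `route` is a caller-supplied function (hypothetical here) that maps a
    destination to an output port, e.g. an Odd-Even route computation.
    """
    branches = defaultdict(list)
    for dst in destinations:
        branches[route(dst)].append(dst)
    return dict(branches)


# Toy stand-in route: odd node ids go east, even ids go west.
branches = fork_by_port([1, 2, 3, 4], lambda dst: "E" if dst % 2 else "W")
```

Under this model a five-destination multicast needs five RC cycles with one unit and three with two, which illustrates why the redundant unit matters mainly for packets with many destinations.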
VC as Cache: Scenario • A shared cache block is requested by more than one node at a given time frame • The owner sends a multicast of the block to all the requestors • A request arrives after this multicast • The owner resends the block after processing this request
Solution – Add the new requestor to the processed multicast midway! • Compare up to five multicast packets with an incoming request packet at the router • If matched: • Forward the request to the owner for coherence and bookkeeping, with the time stamp of the previous message • Add this requestor to the multicast destinations
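The matching step can be sketched as below. The packet fields, the `snoop_request` name, and the shape of the note sent to the owner are hypothetical; only the limit of five compared packets and the two actions on a match come from the slide.

```python
from dataclasses import dataclass

MAX_COMPARED = 5  # the slide compares up to five multicast packets


@dataclass
class Multicast:
    block_addr: int
    owner: int
    destinations: set
    timestamp: int


@dataclass
class Request:
    block_addr: int
    requestor: int


def snoop_request(buffered, request):
    """Try to serve `request` from a multicast still held in the VCs.

    Returns (matched multicast, note for the owner), or (None, None)
    when nothing matches and the request simply continues to the owner.
    """
    for mc in buffered[:MAX_COMPARED]:
        if mc.block_addr == request.block_addr:
            # Add the late requestor to the in-flight multicast.
            mc.destinations.add(request.requestor)
            # Still inform the owner so coherence state stays correct.
            note = {
                "to": mc.owner,
                "block_addr": request.block_addr,
                "requestor": request.requestor,
                "served_at": mc.timestamp,
            }
            return mc, note
    return None, None


in_flight = [Multicast(block_addr=0x40, owner=7, destinations={1, 2}, timestamp=100)]
hit, note = snoop_request(in_flight, Request(block_addr=0x40, requestor=9))
miss, no_note = snoop_request(in_flight, Request(block_addr=0x80, requestor=3))
```

On a hit the owner receives only the bookkeeping note and does not resend the block; on a miss the request is handled as in the baseline scenario.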
Packet Concatenation • A request is a single-flit packet • When RC units are busy, we can combine single-flit packets headed to the same destination into a “super-packet” • From that router onward, one RC cycle computes the route for all the packets in the super-packet.
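A minimal sketch of the grouping step, assuming waiting packets are represented as `(destination, payload)` pairs and that concatenation is attempted only while the RC units are busy. The `concatenate` name and the data layout are illustrative.

```python
def concatenate(waiting, rc_busy):
    """Merge waiting single-flit packets that share a destination.

    `waiting` is a list of (destination, payload) pairs in arrival order.
    Each returned (destination, payloads) entry needs one route computation.
    """
    if not rc_busy:
        # RC is free: forward every packet on its own.
        return [(dst, [payload]) for dst, payload in waiting]
    groups = {}  # dicts keep insertion order, so arrival order is preserved
    for dst, payload in waiting:
        groups.setdefault(dst, []).append(payload)
    return list(groups.items())


queue = [(3, "req-a"), (5, "req-b"), (3, "req-c")]
merged = concatenate(queue, rc_busy=True)
separate = concatenate(queue, rc_busy=False)
```

Here the two requests to node 3 share a single route computation when the RC units are busy, while an idle router forwards all three packets separately.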
Configuration for simulations • Simulators Multi2sim 4.0.1, Booksim 2.0, Orion 2.0 • Real time simulation • 64 Nodes with 32 cores + L1 nodes and 32 shared distributed L2 cache banks • 1 Flit for request and coherence packets, 5 flits for cache block • Benchmarks: • SPLASH2 and PARSEC workloads with 32 threads • All high injection workloads are picked after an initial study on their injection rates
IPC Results • Abbreviations: • C – Critical Word first • V – VC as cache • D – Dynamic Multicast Tree • P – Packet Concatenation
Scaling to 512 Nodes: IPC Results • Abbreviations: • C – Critical Word first • V – VC as cache • D – Dynamic Multicast Tree • P – Packet Concatenation
Fine Grained Energy Footprint of Barnes • Abbreviations: • C – Critical Word first • V – VC as cache • D – Dynamic Multicast Tree • P – Packet Concatenation
Conclusion and future extensions • Scalable solution for multicasts • Can fit with existing techniques • Easy to implement • Energy efficient • Packet Concatenation can be switched on selectively depending on the load requirements • Other architecture-level inputs can also be used to improve performance further. • Example: number of instructions waiting, memory-level parallelism