1 / 16

Implementation of Cache Coherency in NoC Mike Cika 2/29/2012

Implementation of Cache Coherency in NoC Mike Cika 2/29/2012. Brief Overview. What is Cache Coherency Problems with Cache Coherency Some Solutions to These Problems. What is Cache Coherency?. Shared Memory Multiprocessors every processor has its own cache

joella
Download Presentation

Implementation of Cache Coherency in NoC Mike Cika 2/29/2012

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Implementation of Cache Coherency in NoCMike Cika2/29/2012

  2. Brief Overview • What is Cache Coherency • Problems with Cache Coherency • Some Solutions to These Problems

  3. What is Cache Coherency? • Shared Memory Multiprocessors • every processor has its own cache • DRAM Slices Split Among Processors • faster access to required data • read/write requests require communication • must maintain data integrity Culler, Singh “Parallel Computer Architecture: A Hardware/Software Approach” 1999

  4. What is Cache Coherency? • Provide a Set of Locations That Hold Data • Data at Location Must be Up-to-Date Culler, Singh “Parallel Computer Architecture: A Hardware/Software Approach” 1999

  5. Problems in NoC • Directory Based Lookup Required for Scalable Designs • Maintaining Coherency is Much More Difficult than Before Requestor Directory Node Write Miss Invalidations Received P P P P L2 L2 L2 L2 L1 L1 L1 L1 Respond With Sharers ID Invalidations Sent Sharer Sharer

  6. Problems in NoC • With Large Number of Nodes Problems Arise • 1-to-M Communication • duplicate unicast messages • how/where to duplicate • avoid double duplications

  7. Towards the Ideal On-chip Fabric for 1-to-Many and Many-to-1 Communication • Motivation for Work • Overview Tushar Krishna, Li-ShiuanPeh, Bradford M. Beckmann, and Steven K. Reinhardt Dept. of EECS, MIT, Cambridge, MA AMD Research, Bellevue, WA {tushar, peh}@csail.mit.edu, {Brad.Beckmann, Steve.Reinhardt}@amd.com

  8. Ideal 1-to-M Motivation • Traffic Patterns • research shows high number of 1-to-M traffic • performance of 1-to-M needs to rival 1-to-1 Krishna, T. Peh, L. Beckmann, B. Reinhardt, S. Towards the Ideal On-Chip Fabric for 1-to-Manh and Many-to-1 Communication. Micro’11..

  9. FANOUT: Flow Across Network Over Uncongested Trees • Load Balancing • links may go unused • Whirl Algorithm • virtual multicast trees • need to be load balanced

  10. FANOUT: Flow Across Network Over Uncongested Trees • Algorithm Constraints • load balanced for broadcasts and dense 1-to-M • ensure non-duplicate reception • not table based • deadlock free Krishna, T. Peh, L. Beckmann, B. Reinhardt, S. Towards the Ideal On-Chip Fabric for 1-to-Manh and Many-to-1 Communication. Micro’11..

  11. FANOUT: Flow Across Network Over Uncongested Trees • Routing for Multicast Messages • left turn bit (LTB), right turn bit (RTB) • each direction carries these two bits • each LTB is randomly chosen for a given direction • ensures load balanced approach RTBs = ~LTBwRTBe = ~LTBs RTBw = ~LTBnRTBn = ~LTBe Krishna, T. Peh, L. Beckmann, B. Reinhardt, S. Towards the Ideal On-Chip Fabric for 1-to-Manh and Many-to-1 Communication. Micro’11..

  12. FANOUT: Flow Across Network Over Uncongested Trees • All Turns Except U-turns Allowed • deadlock avoidance required • Conventional Virtual Channel Management • VC-a in south direction • VC-b in in all other directions Krishna, T. Peh, L. Beckmann, B. Reinhardt, S. Towards the Ideal On-Chip Fabric for 1-to-Manh and Many-to-1 Communication. Micro’11..

  13. mXbar: Forking Packets • Duplicating Packets in Router • reading same flit multiple times • serialization delay, increased buffer occupancy • reading flit once and forking it in the crossbar Krishna, T. Peh, L. Beckmann, B. Reinhardt, S. Towards the Ideal On-Chip Fabric for 1-to-Manh and Many-to-1 Communication. Micro’11..

  14. Single-cycle Fanout Router • 2-Stage Pipeline Realized Using Lookahead Routing • multiple flits can have multiple next routers • 1-Stage Realized by Sending k+4 Bit Vector, Bufferless Kumar, A. Peh, L. Kundu, P. Jha, N. Express Virtual Channels: Toward the Ideal Interconnection Fabric. ISCA, June 2007.

  15. Fanin Performance Evaluation Krishna, T. Peh, L. Beckmann, B. Reinhardt, S. Towards the Ideal On-Chip Fabric for 1-to-Manh and Many-to-1 Communication. Micro’11..

  16. Further Discussion/Questions • Other Topologies • router forking scalable for N-dimension cubes • non cube/mesh topology require more overhead • Utilizing Optics • broadcast by nature • used more in high core numbers (1024+) • Changing Router Design • rotary based routers • flit travel around circle twice for a duplicate

More Related