Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections
M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol
Computer Engineering Department, Sharif University of Technology, Tehran, Iran
modarressi@ce.sharif.edu
Outline
• Introduction and Motivations
• Virtual Point-to-Point (VIP) Connections
• Static VIP Construction Scheme
• Dynamic VIP Construction Scheme
• Setup Network
• Evaluation Results
• Conclusions and Future Work
On-Chip Communication Mechanisms
• Packet-Switched NoCs
  • Good Resource Utilization
  • Modest Design Effort/Time Due to Structured and Predictable Links
  • Some Power and Performance Overheads Due to Multi-Stage Pipelined Routers
• Dedicated Point-to-Point Links
  • Ideal Power and Performance
  • Poor Scalability: Significant Area Overhead for Large Systems
  • Significant Design Effort/Time Due to Non-Predictable Link Properties
• Virtual Point-to-Point Connections in a Packet-Switched NoC
VIP Connections
• VIP: VIrtual Point-to-point Connections
• Over One VC (Virtual Channel) of Each Physical Channel
• Bypass Some Router Pipeline Stages
• Inexpensive Extensions to a Traditional Wormhole Router
  • Router Control Unit, Arbiter, Buffer of the VIP Virtual Channels
Router Architecture
• Buffer at the VIP Virtual Channels Is Replaced by a Register (1-Flit Buffer)
• VIP Paths Are Kept by VIP Allocator Units at Output Ports
  • Each Allocator Determines Which Input Is Connected to Its Port Along the VIP
  • It Allocates the Output Port to the VIP When Control Signals Indicate That the VIP Has an Incoming Flit to Forward
• A Flow-Control Mechanism Prevents Starvation of Packet-Switched Flits
VIP Connections
• A VIP Is Constructed by Chaining the VIP Registers in the Routers Between the Source and Destination Nodes of a Communication Flow
• Provides a Virtual Dedicated Pipelined Link With 1-Flit VIP Buffers as Staging Registers
• Flits Only Travel Over the Crossbars and Links Which Cover the Actual Physical Distance Between Their Source and Destination Nodes
  • Skip the Buffer Read, Buffer Write, and Allocation Operations
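
To make the effect of bypassing pipeline stages concrete, here is a rough zero-load latency model. It is only a sketch: the function name and the per-hop cycle counts are assumptions for illustration (the evaluation slide mentions a 5-stage pipelined wormhole switch and 8-flit packets, but the one-cycle-per-hop figure for a VIP is assumed, not taken from the slides), and contention is ignored entirely.

```python
def zero_load_latency(hops: int, flits: int, cycles_per_hop: int) -> int:
    """Zero-load wormhole latency in cycles.

    The head flit needs `cycles_per_hop` cycles at every hop; the remaining
    flits follow in a pipeline, one per cycle (serialization delay).
    """
    if hops < 0 or flits < 1 or cycles_per_hop < 1:
        raise ValueError("hops >= 0, flits >= 1, cycles_per_hop >= 1 required")
    return hops * cycles_per_hop + (flits - 1)


# Hypothetical comparison for a 4-hop route and an 8-flit packet.
# ASSUMPTION: 5 cycles per hop for the conventional router, 1 for a VIP hop.
conventional = zero_load_latency(hops=4, flits=8, cycles_per_hop=5)
over_vip = zero_load_latency(hops=4, flits=8, cycles_per_hop=1)
```

The model only shows why the per-hop term dominates for short packets; it does not reproduce the simulation results reported later in the deck.
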
VIP Connections
• VIPs Are Not Allowed to Share a Common Link
  • To Remove Buffering, Arbitration,…
  • A Limited Number of VIPs in a Network
• But VIPs Cover a Significant Portion of On-Chip Traffic Due to Communication Locality
  • In Most Multi-Core SoC Applications Each Core Communicates With a Few Other Cores
  • In CMP Workloads Each Node Tends to Have a Small Number of Favored Destinations for Its Messages
VIP Construction Algorithm - Static
• Based on Application Traffic Pattern
• Input Applications Are Described by a Task-Graph (TG)
• A Heuristic Algorithm
  • Map the TG Cores onto the Nodes of a Mesh-Based NoC
  • Construct VIPs for TG Edges in Order of Their Communication Volumes
  • Route a TG Edge Through the Packet-Switched Network If There Are Insufficient Free Resources to Build a VIP for It
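
The greedy part of the heuristic above can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes the core-to-node mapping is already given, tries only a single dimension-ordered (XY) route per edge, and treats links as directed. All names (`xy_path`, `build_static_vips`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]      # (x, y) coordinate in the mesh
Link = Tuple[Node, Node]    # directed link between two adjacent nodes


def xy_path(src: Node, dst: Node) -> List[Link]:
    """Links on the dimension-ordered (X first, then Y) route from src to dst."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def build_static_vips(edges: Dict[Tuple[str, str], float],
                      mapping: Dict[str, Node]):
    """Give VIPs to the heaviest task-graph edges first.

    Because VIPs may not share a link, an edge whose route touches an
    already-claimed link falls back to the packet-switched network.
    """
    used: Set[Link] = set()
    vips: Dict[Tuple[str, str], List[Link]] = {}
    packet_switched: List[Tuple[str, str]] = []
    # Visit edges in decreasing order of communication volume.
    for (s, d), _volume in sorted(edges.items(), key=lambda kv: -kv[1]):
        path = xy_path(mapping[s], mapping[d])
        if any(link in used for link in path):
            packet_switched.append((s, d))
        else:
            used.update(path)
            vips[(s, d)] = path
    return vips, packet_switched
```

A fuller version would also search alternative routes before giving up on a VIP and would fold the mapping step into the same heuristic.
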
VIPs for the VOPD Application
• VIPs Cover 100% of the On-Chip Traffic for This Application
• Static VIP Construction Scheme:
  • Benchmarks: VOPD, MWD, MPEG, MP3+H263
  • Up to 58% Reduction in Message Latency (39% on Average)
  • Up to 65% Reduction in Power Consumption (49% on Average)
VIPs vs. Physical Point-to-Point Connections
• VIPs Offer:
  • Power and Performance Close to Dedicated Physical Point-to-Point Connections
  • More Flexibility
    • Dynamically Reconfigurable Based on the Traffic Pattern of the Running Application
  • Less Design Effort
    • Customized Dedicated Connections Over Regular Components
Dynamic VIP Construction
• An Alternative VIP Construction Scheme
• Dynamically Changes the VIP Connections in Response to Communication Requirements Imposed by the Running Application
  • Monitoring the NoC Traffic
  • Detecting High-Volume Communications and Constructing a VIP for Them
  • Selecting the Best Route for a VIP Using a Simple Setup Network
Setup Network
• Setup Network Structure
  • A Light-Weight Control Network
  • Simple Node Structure and Small Bit-Width
  • The Same Topology as the Main Data Network
• Setup Network Operation
  • Keep Track of the Number and Destination of Packets Sent by Each Node
  • Select Traffic Flows Whose Weight Exceeds a Threshold (bit/sec.)
  • Find a Route Along One of the Shortest Paths Between the Source and Destination Nodes of the Traffic Flow on Which to Construct a VIP
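
The monitoring step above could look roughly like the per-node counter below. It is a minimal sketch under assumptions: the class name `FlowMonitor`, the fixed measurement window, and the software form itself are illustrative, since the slides describe a hardware setup network and do not specify how rates are measured.

```python
from collections import defaultdict
from typing import Dict


class FlowMonitor:
    """Per-source-node traffic counter (hypothetical software model)."""

    def __init__(self, threshold_bps: float, window_s: float) -> None:
        self.threshold_bps = threshold_bps   # flows above this are VIP candidates
        self.window_s = window_s             # ASSUMED fixed measurement window
        self.bits: Dict[int, int] = defaultdict(int)

    def record(self, dst: int, packet_bits: int) -> None:
        """Account for one packet sent to node `dst`."""
        self.bits[dst] += packet_bits

    def heavy_flows(self) -> Dict[int, float]:
        """Destinations whose rate over the window exceeds the threshold."""
        rates = {dst: b / self.window_s for dst, b in self.bits.items()}
        return {dst: r for dst, r in rates.items() if r > self.threshold_bps}

    def reset(self) -> None:
        """Start a new measurement window."""
        self.bits.clear()
```

Each destination returned by `heavy_flows()` would then trigger a VIP setup request for the corresponding flow.
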
Dynamic VIP Construction
• Establishing a New VIP May Tear Down Some Existing VIPs
• Cost of a VIP: The Cumulative Weight (bit/sec.) of the VIPs That Will Be Torn Down by This New VIP
• Setup Network:
  • Finds the Path With Minimum Cost
  • Sends the Cost to the Source Node to Decide on Establishing the New VIP
• A New VIP Is Established If the Cumulative Weight of the Torn-Down VIPs Is Less Than the Weight of the Requesting Traffic Flow
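
The admission rule on this slide can be written down directly. The sketch below is illustrative only; the function names and the dictionary-based bookkeeping (`link_owner`, `vip_weight`) are assumptions rather than the authors' data structures.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def vip_cost(path_links: Iterable[Hashable],
             link_owner: Dict[Hashable, str],
             vip_weight: Dict[str, float]) -> Tuple[float, Set[str]]:
    """Cumulative weight of the existing VIPs a new VIP would tear down.

    `link_owner` maps each occupied link to the VIP using it; a VIP that
    occupies several links of the path is counted only once.
    """
    victims = {link_owner[link] for link in path_links if link in link_owner}
    return sum(vip_weight[v] for v in victims), victims


def should_establish(flow_weight: float, cost: float) -> bool:
    """Establish the new VIP only if the torn-down weight is strictly smaller."""
    return cost < flow_weight
```

For example, a flow of weight 8 whose best path displaces a single VIP of weight 5 would be admitted, while a flow of weight 5 on the same path would not.
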
Setup Network
• VIP Setup Procedure:
  • Arbitrating Among VIP Setup Requests
  • Running the Distributed VIP Setup Algorithm
  • Setting Up a VIP in the Data Network by Configuring the VIP Allocators of the Nodes Along the VIP Path
  • Tearing Down Conflicting VIPs
• Each Setup Network Node Contains the Configuration Information of Its Corresponding Data Network Node
• Short Reconfiguration Time Due to the Distributed Nature of the Algorithm
[Figure: cost propagation in the setup network between source node S and destination node D; the numeric port costs shown in the mesh are not reproduced here]
• Port Cost: Weight of the VIP Using the Port
• 1. Add the Received Cost (4) to the Weight of Ports Along the Shortest Path (the W and N Ports) Toward the Destination Node
• 2. Send the New Costs (9 and 12) to the Neighboring Nodes Toward the Destination Node
• Select the Minimum Cost and Keep the Port from Which the Smaller Cost Is Received
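
The propagation shown in the figure amounts to a minimum-cost search restricted to the shortest paths of the mesh. The sketch below emulates it centrally with dynamic programming; the real mechanism is distributed across setup-network nodes, and the function name, the coordinate scheme, and the `port_cost` dictionary are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]


def min_cost_shortest_path(src: Node, dst: Node,
                           port_cost: Dict[Tuple[Node, Node], float]
                           ) -> Tuple[float, List[Node]]:
    """Cheapest of the shortest mesh paths from src to dst.

    `port_cost[(a, b)]` is the weight of the VIP currently using the port
    from node a to its neighbour b (missing entries mean a free port).
    Each node adds the port weight to the cost it received and remembers
    the neighbour that delivered the smaller total.
    """
    sx = 1 if dst[0] >= src[0] else -1
    sy = 1 if dst[1] >= src[1] else -1
    best: Dict[Node, float] = {src: 0}
    came_from: Dict[Node, Node] = {}
    for x in range(src[0], dst[0] + sx, sx):
        for y in range(src[1], dst[1] + sy, sy):
            node = (x, y)
            if node == src:
                continue
            candidates = []
            if x != src[0]:                      # cost arriving along X
                prev = (x - sx, y)
                candidates.append((best[prev] + port_cost.get((prev, node), 0), prev))
            if y != src[1]:                      # cost arriving along Y
                prev = (x, y - sy)
                candidates.append((best[prev] + port_cost.get((prev, node), 0), prev))
            best[node], came_from[node] = min(candidates)
    # Walk the remembered ports back from the destination to the source.
    path = [dst]
    while path[-1] != src:
        path.append(came_from[path[-1]])
    path.reverse()
    return best[dst], path
```

Like the figure, this sums port weights hop by hop, so a VIP that occupies several ports on the candidate path contributes once per port.
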
Dynamic VIP Construction
• The Setup Network Operates in Parallel with Packet Transmission in the Packet-Switched Network
  • Hides the Setup Time
• The Setup Network Has a Small Bit-Width and Operates Infrequently (Only When a High-Volume Flow Is Detected)
  • Negligible Power and Area Overhead
Evaluation Results
• XMulator NoC Simulator (www.xmulator.org)
  • A C#-Based Simulator
  • Orion Power Library
• Comparison with a Conventional NoC (5-Stage Pipelined Wormhole Switch)
• Multi-Core SoC Traffic:
  • H.263 Decoder + MP3 Decoder, H.263 Decoder + MP3 Encoder, MP3 Decoder + MP3 Encoder
  • 38% Reduction in Message Latency, 46% Reduction in Power Consumption
Evaluation Results
• Synthetic Traffic: N-Hot Traffic
  • 80% of Messages Go to Exactly N Destinations, 20% to Randomly Chosen Nodes
[Figures: Message Latency (cycles for 8-flit packets); Power (nJ/Cycle)]
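
An N-hot destination generator in the spirit of the description above might look like this. It is a sketch under stated assumptions: the slides do not say how the N hot destinations are chosen or whether the random 20% may also land on a hot node, so here the hot set is drawn uniformly at random and the remaining messages go to any other node. The name `make_n_hot_picker` is hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple


def make_n_hot_picker(node: int, num_nodes: int, n: int,
                      hot_fraction: float = 0.8,
                      rng: Optional[random.Random] = None
                      ) -> Tuple[List[int], Callable[[], int]]:
    """Build a destination picker for one source node.

    Returns the node's hot destinations and a function that picks the
    destination of the next message.
    """
    rng = rng or random.Random()
    others = [d for d in range(num_nodes) if d != node]
    hot = rng.sample(others, n)          # ASSUMPTION: hot set chosen at random

    def pick() -> int:
        if rng.random() < hot_fraction:  # e.g. 80% of messages
            return rng.choice(hot)
        return rng.choice(others)        # remaining messages: any other node

    return hot, pick
```

Passing a seeded `random.Random` makes a simulation run repeatable.
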
Summary and Future Work
• Adaptable Virtual Point-to-Point Connections in a Packet-Switched NoC
  • Benefit from the Advantages of Both Communication Methods
  • Two VIP Construction Schemes: Static and Dynamic
  • Significant Power/Latency Reduction
• Future Work
  • Comparing the Method with Related Work: Express Virtual Channels, Single-Cycle Routers, …
  • Precise Area/Power Results by Implementing the NoC in Hardware
    • Analytical Models Show Small Area Overhead
Thank You
Questions?
modarressi@ce.sharif.edu