Run-time Adaptive on-chip Communication Scheme • 林孟諭, Dept. of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
Outline • Abstract • Introduction • Motivation Case Study • AdNoC Concept • Definitions • Algorithm • Hardware Implementation • Conclusion
Abstract • In embedded systems, workloads and/or constraints that vary at run time require run-time adaptivity to maintain a high degree of efficiency in every operation mode/scenario. • We present the first approach to an adaptive on-chip communication scheme. • It provides an adaptive routing/path allocation algorithm that meets a required level of Quality of Service (QoS), namely guaranteed bandwidth.
Introduction (1/2) • We propose a run-time adaptive network on chip that adapts the underlying interconnection infrastructure on demand in response to changing communication requirements imposed by an application. • To provide on-demand interconnections, we present a novel adaptive routing/path allocation algorithm that meets QoS requirements (bandwidth).
Introduction (2/2) • The scheme makes decisions locally at each router depending on the available bandwidth in each direction to the neighboring router. • Dynamic connections are realized by re-assigning a certain number of buffer blocks to different output ports of a router on demand. • It also increases resource utilization, especially buffer utilization, through on-demand buffer block configuration.
Motivation Case Study (1/4) • We motivate the need for an adaptive NoC by means of a very simple scenario: an MPEG decoder [1] and an Image Processing Line (IPL) [18] application. • The task graphs are shown in Figures 1a and 1b. • Assume that at time t0 the NoC is running the MPEG video decoder (Fig. 1c). • At time t1, the IPL needs to be executed, so it is mapped onto the processing elements alongside the MPEG decoder. Once a mapping is performed, the routers attempt to set up meaningful routes (Fig. 1d).
Fig. 1. Motivation to use an adaptive communication architecture
Motivation Case Study (2/4) • In this example, the Gauss task Gauss1 first establishes a route to its neighboring filter task Filter1. It then applies a deterministic XY routing algorithm, as used in QNoC, to reach Filter2. • However, this attempt fails because of limited bandwidth availability. • Consequently, the router at Gauss1 is forced to try another route, which succeeds (Fig. 1e).
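To make the failure case concrete, here is a minimal sketch of deterministic XY routing with a per-link bandwidth check. The function names, the `(x, y)` tile encoding, the convention that "N" increases `y`, and the `free_bw` dictionary are all assumptions made for illustration; they are not taken from the slides or from QNoC.

```python
def xy_next_hop(cur, dst):
    """Deterministic XY: correct the x offset first, then the y offset.

    Returns 'E', 'W', 'N', 'S', or 'P' (local processing element).
    Assumption: 'E' increases x and 'N' increases y.
    """
    x, y = cur
    xd, yd = dst
    if x != xd:
        return "E" if xd > x else "W"
    if y != yd:
        return "N" if yd > y else "S"
    return "P"


def xy_route(src, dst, free_bw, required_bw):
    """Follow XY hops from src to dst.

    free_bw maps (tile, port) to the bandwidth still available on that link
    (hypothetical bookkeeping). Returns the list of visited tiles, or None
    as soon as one link on the fixed XY path lacks the required bandwidth.
    """
    step = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    path, cur = [src], src
    while cur != dst:
        port = xy_next_hop(cur, dst)
        if free_bw.get((cur, port), 0) < required_bw:
            return None  # deterministic routing has no alternative to try
        dx, dy = step[port]
        cur = (cur[0] + dx, cur[1] + dy)
        path.append(cur)
    return path
```

Because the XY path is fixed, a single congested link makes the whole request fail, which is the situation that forces the router at Gauss1 to look for another route.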
Motivation Case Study (3/4) • Once the routes are set, the routers supply the corresponding buffer blocks, allocating them to output ports on demand. • The second Gauss task, Gauss2, attempts the same procedure. • However, it fails to find a route to Filter1 and Filter2, so a re-mapping must be invoked (Fig. 1f).
Motivation Case Study (4/4) • Routing needs to be implemented through an algorithm that can identify feasible routes. • After path selection, appropriate buffer blocks need to be assigned on demand to that path. • If a path and buffer blocks are not available, the mapping function sends appropriate feedback to the upper layer. • Therefore, a dynamic run-time application scenario requires an adaptive on-chip communication infrastructure that can build connections on demand to provide QoS.
AdNoC Concept • The AdNoC architecture is proposed to provide QoS-supported on-chip communication for a network exposed to varying system constraints. • Like most NoCs, it uses packet-based communication. The architecture is pipelined and deploys wormhole routing because of its low latency in practice and its small buffer space requirements.
Definitions (1/4) • Definition 1: An application task graph (TG) is a directed graph $G_k = (T, F)$. • $T$ is the set of all tasks $t_i$ used by an application. • $f_{i,j} \in F$ represents the connection from task $t_i$ to $t_j$. • Definition 2: The Physical Network (PN) is a directed graph $P = (N, V, B_t, r)$. • $N$ is a set of tiles $n_i$. • $v_{i,j} \in V$ represents an edge, the physical channel between $n_i$ and $n_j$. • Each tile has a current buffer configuration at time $t$; $b_{i,t} \in B_t$ represents the state of the buffer assignment to individual output ports. • $r$ is a routing function which determines the paths taken.
Definitions (2/4) • Definition 3: The Logical Network (LN) at time $t$ is a directed graph $L_t = (M, W)$. • $M$ is a set of task groups $m_i$. • $w_{i,j} \in W$ represents the set of connections between two task groups $m_i$ and $m_j$. • Definition 4: The Task Mapping Function is a function $l_t : T' \subseteq T \to L_t$ which maps a subset $T'$ of the tasks $T$ of each task graph onto the logical network LN.
Definitions (3/4) • Definition 5: The Network Mapping Function is a function $p_t : L_t \to S \subseteq P$ which maps a logical network onto a subset of the physical network. • Definition 6: A Routing Function $r : N \times N \to V$, $r : (n_i, n_k) \mapsto v_{i,j}$, returns a path $v_{i,j}$ away from the current PE ($n_i$), given the input port for each transaction and the destination $n_k$.
Definitions (4/4) • Definition 7: • The Buffer Configuration $b_{i,t}$ is the current buffer configuration of tile $n_i \in N$. • A Virtual Channel (VC) is a unidirectional logical or virtual connection between the tiles $n_i$ and $n_j$. • Each VC is realized by an independently managed pair of message buffers referred to as the Virtual Channel Buffer (VCB).
Definitions (4/4) • Definition 8: The System Monitor M is an infrastructure which is used to collect, aggregate, and process system statistics. • Definition 9: Our Adaptive Network on Chip AdNoC is defined as the tuple $\mathit{AdNoC} = (P, M, L_t, G_i, p_t, l_t, r)$ with the parameters as given above.
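One way to read Definitions 1 to 5 is as plain data structures plus two mappings. The sketch below is only an illustration under assumed encodings: tasks and task groups are strings, tiles are `(x, y)` tuples, the task mapping and the network mapping are dictionaries, and `mapping_is_consistent` is a hypothetical helper that does not appear in the slides.

```python
from dataclasses import dataclass, field

Tile = tuple[int, int]  # assumed (x, y) encoding of a tile n_i


@dataclass
class TaskGraph:
    """Definition 1: tasks T and connections F between them."""
    tasks: set[str]
    flows: set[tuple[str, str]]


@dataclass
class PhysicalNetwork:
    """Definition 2: tiles N, channels V, and per-tile buffer configuration."""
    tiles: set[Tile]
    channels: set[tuple[Tile, Tile]]
    # buffers[tile][port] = number of buffer blocks currently on that port
    buffers: dict[Tile, dict[str, int]] = field(default_factory=dict)


def mapping_is_consistent(tg, pn, task_to_group, group_to_tile):
    """Check the two mappings against the definitions.

    task_to_group plays the role of the task mapping function (only tasks
    from T may be mapped); group_to_tile plays the role of the network
    mapping function (every used task group must land on a tile in N).
    """
    if not set(task_to_group) <= tg.tasks:
        return False
    return all(group_to_tile.get(g) in pn.tiles for g in task_to_group.values())
```

For example, mapping Gauss1 to a group placed on tile `(0, 0)` is consistent only if `(0, 0)` is part of the physical network.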
Algorithm (1/11) • To provide a bandwidth guarantee in an adaptive NoC, the underlying communication infrastructure needs an adaptive path allocation strategy. • Finding a path/routing for a given logical network and physical mapping of the application is therefore a major challenge. The run-time path allocation algorithm is given in Alg. 1.
Algorithm (3/11) • For a requesting transaction, the path is checked in every possible direction and the VCB is assigned accordingly on demand. • The weighted XY algorithm wXY presented in Alg. 2 assigns each output port a weight based on the available bandwidth and on $d_x$ or $d_y$ between the current and the destination nodes. • This ideally gives the packet a maximum number of sensible routing choices along its path. The weight is also proportional to the available bandwidth.
Algorithm (5/11) • The wXY route allocation strategy is described as follows: given is the tuple $\rho = \{N, E, S, W, P\}$. • Each $i \in \rho$ has a weight $w_i$ and an available bandwidth $b_i$ with $b_i \le b_{max}$, $b_{max}$ being the maximum line bandwidth.
Algorithm (6/11) • The current router coordinates are $x, y$. Each packet $p$ has destination coordinates $x_d, y_d$ and a required bandwidth $b_p$. The weights are assigned as follows:
Algorithm (7/11) • The route r chosen is then: • The router distributes the VCBs to any route as needed by assigning them to the corresponding output port.
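The idea of weighting each output port by remaining distance and available bandwidth can be sketched as follows. The exact weight formula and selection rule of Alg. 2 are not shown here, so the rule below (weight = distance covered toward the destination times available bandwidth, zero if the port cannot carry the required bandwidth, highest weight wins) is an assumption chosen only to match the stated properties. The function names, the dictionary layout, and the "N increases y" convention are hypothetical as well.

```python
PORTS = ("N", "E", "S", "W", "P")  # the tuple rho; P is the local port


def wxy_weights(cur, dst, avail_bw, required_bw):
    """Assumed weight rule: progress toward dst times available bandwidth.

    cur and dst are (x, y) tuples; avail_bw maps a port to its free
    bandwidth b_i. A port that leads away from the destination, or that
    cannot carry required_bw, gets weight 0.
    """
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    progress = {
        "N": max(dy, 0), "S": max(-dy, 0),
        "E": max(dx, 0), "W": max(-dx, 0),
        "P": 1 if (dx, dy) == (0, 0) else 0,
    }
    weights = {}
    for port in PORTS:
        bw = avail_bw.get(port, 0)
        weights[port] = progress[port] * bw if bw >= required_bw else 0
    return weights


def wxy_choose(cur, dst, avail_bw, required_bw):
    """Pick the port with the largest weight, or None if no port is feasible."""
    weights = wxy_weights(cur, dst, avail_bw, required_bw)
    best = max(PORTS, key=lambda p: weights[p])
    return best if weights[best] > 0 else None
```

Unlike deterministic XY routing, this rule falls back to the other productive direction when the preferred link lacks bandwidth, and it reports failure only when no productive port can carry the request.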
Algorithm (8/11) • Our scheme to assign buffers on demand (at run time) is given in Alg. 3. • The benefit of such on-demand assignment is evident: buffers are allocated only when needed, so virtual channels can be reused by different ports.
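The reuse of buffers across ports can be illustrated with a small shared pool. This is not Alg. 3 itself, only a sketch of the idea under assumptions: the class name `VCBPool`, its methods, and the first-free allocation policy are all hypothetical.

```python
class VCBPool:
    """A router-wide pool of virtual channel buffers shared by all ports."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # ids of unassigned blocks
        self.owner = {}                      # block id -> output port

    def assign(self, port):
        """Attach a free block to the given output port; None if exhausted."""
        if not self.free:
            return None
        block = self.free.pop(0)
        self.owner[block] = port
        return block

    def release(self, block):
        """Return a block to the pool so any port can claim it later."""
        del self.owner[block]
        self.free.append(block)
```

With a pool of two blocks, a third request fails until one block is released, after which the same block can serve a different port. That is the sense in which buffers are allocated only when needed.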
Algorithm (9/11) • Fig. 3 shows an exemplary scenario to showcase the run-time behavior using different transactions in one router.
Algorithm (10/11) • t0: All four directions are occupied by four different transactions; buffers are also assigned. • t1: Transaction T5 requests a path, and the weights are calculated until $t_\delta$, taking 4 hardware cycles. A buffer is also assigned to the calculated direction before $t_\delta$. • t2: Transactions T1, T2, and T4 free their corresponding channels and assigned buffers.
Algorithm (11/11) • t3: Four new transactions T1, T2, T4, and T6 request processing and are granted resources. • t4: Transaction T7 requests a path and a buffer, but because no buffer resources are available, the request cannot be granted. The requesting transaction therefore has to wait or inform the upper layer through the system monitor.
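The timeline from t0 to t4 can be replayed with a simple counter model. Two assumptions are made here that the slides do not state: each transaction holds exactly one VCB, and the router has six VCBs in total. Six is the only pool size under which T6 is granted at t3 and T7 is refused at t4 in this model. The `replay` function and the event encoding are hypothetical.

```python
def replay(events, num_vcbs):
    """Replay request/free events against a pool of num_vcbs buffers.

    Each transaction is assumed to hold exactly one buffer. Returns a log of
    (time, transaction, outcome) entries for every request.
    """
    active, log = set(), []
    for time, action, txns in events:
        for txn in txns:
            if action == "free":
                active.discard(txn)
            elif len(active) < num_vcbs:
                active.add(txn)
                log.append((time, txn, "granted"))
            else:
                log.append((time, txn, "blocked"))
    return log


events = [
    ("t0", "request", ["T1", "T2", "T3", "T4"]),  # four directions occupied
    ("t1", "request", ["T5"]),                    # fifth transaction arrives
    ("t2", "free", ["T1", "T2", "T4"]),           # three channels released
    ("t3", "request", ["T1", "T2", "T4", "T6"]),  # four new transactions
    ("t4", "request", ["T7"]),                    # no buffer left
]
log = replay(events, num_vcbs=6)
```

In this model every request up to t3 is granted and only T7 at t4 is blocked, which is the point where the transaction must wait or notify the upper layer through the system monitor.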
Hardware Implementation • Our hardware platform for the AdNoC is illustrated in Fig. 4. • It consists mainly of two parts: • the run-time path allocation part • the on-demand VCB assignment part. • The path allocation part decides either by lookup table or by calculation, depending on the type of the flit.
Conclusion • We have introduced the first approach to an adaptive on-chip communication architecture. It provides an adaptive path allocation algorithm to meet varying bandwidth guarantees. • Run-time connections are realized by re-assigning a number of buffer blocks on demand. • Our buffer allocation scheme increases the buffer utilization and decreases the overall buffer use.