Hidden slide. ADAM: Run-time Agent-based Distributed Application Mapping for on-chip Communication. 林鼎原, Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
Abstract(1/5) • Design-time decisions can often cover only certain scenarios and lose efficiency when hard-to-predict system scenarios occur. • This drives the development of run-time adaptive systems. • To the best of our knowledge, we present the first scheme for run-time application mapping in a distributed manner using agents, targeting adaptive NoC-based heterogeneous multi-processor systems.
Abstract(2/5) • Events that may require a re-mapping at run-time in an adaptive system, and for which design-time mapping algorithms fail, include: • On-line detection of hardware faults. • Minimizing run-time system costs (e.g. saving energy because of a low battery status). • Changing user requirements, e.g. the user wants to switch video playback to a higher resolution. • The system is analyzed during run-time and self-adapts in terms of when and how a mapping algorithm should be invoked.
Abstract(3/5) • Our novel contributions are as follows: • (1) We provide a run-time agent-based distributed mapping algorithm for next-generation self-adaptive heterogeneous MPSoCs. • Our mapping algorithm is composed of two main parts: • (a) virtual cluster selection and cluster reorganization at run-time • (b) a mapping algorithm inside a cluster at run-time. • (2) We propose a run-time cluster negotiation algorithm that generates virtual clusters to solve the problems of a centralized mapping algorithm (e.g. a single point of failure). • (3) We present a heuristic-based mapping algorithm that minimizes the communication-related energy consumption and is low-cost in terms of execution cycles on any instruction set processor.
Abstract(4/5) • Small system with few tiles: low traffic, low computational effort • But when scaling to hundreds or thousands of cores, problems arise.
Abstract(5/5) • With hundreds or thousands of cores • Scalability issues • Single point of failure for the whole chip • High computational complexity • Hence the proposed method (figure on the right) • Hierarchical approach
Some Definitions(1/4) • In the following we introduce our run-time Agent-based Distributed Application Mapping (ADAM) for a heterogeneous MPSoC with a NoC. • Definition 1: An application communication task graph (CTG) is a directed graph $G_k = (T, F)$. • $T$ is the set of all tasks of an application. • $F$ is the set of all flows $f_{i,j}$ between connected tasks $t_i$ and $t_j$, each annotated with the inter-task bandwidth requirement. • Definition 2: A heterogeneous MPSoC architecture in a NoC platform, $HMPSoC_{NoC}$, is a directed graph $P = (N, V)$. • The vertex set $N$ is the set of tiles $n_i$. • Each edge $v_{i,j} \in V$ represents the physical channel between two tiles $n_i$ and $n_j$. • A tile $n_i \in N$ is composed of a heterogeneous PE, a network interface, a router, local memory and a cache.
Some Definitions(2/4) • Definition 3: A cluster is a subset $C_i \subseteq N$, • where $N$ is the set of tiles $n_j$ that belong to the $HMPSoC_{NoC}$. • A virtual cluster $C^{v}_{i}$ is a cluster with no fixed boundaries deciding which tiles are included and which are not. • It can be created, resized and destroyed at run-time. • Definition 4: An agent $Ag$ is a computational entity that acts on behalf of others. • The properties of an agent in our scheme are: • An agent is a small task close to the system. • It must do resource management. • It may need memory to store state information for the resources. • It must be executable on any processing element. • It must be migratable. • It must be recoverable. • It may be destroyed if the cluster no longer exists.
Some Definitions(3/4) • Definition 5: A cluster agent $CA \in Ag$ is an agent that is responsible for mapping operations within the cluster $C_i$. • The cluster agent is located on a processing element $p_j$, • where the index $j$ denotes that the cluster agent can be mapped to any PE of the cluster. • Definition 6: A global agent $GA$ is an agent that stores the information needed to perform the mapping operations to a selected cluster. • It stores information regarding the current usage of communication and computation resources for each cluster; this information is used for the selection and re-organization of the clusters. • The GA is movable, and the information it stores is light-weight and easy to move with it.
Some Definitions(4/4) • Definition 7: The application mapping function is given by $m : T \ni t_i \mapsto n_j \in N$. • Definition 8: A binding is a function $b : T \to T_{ps}$, • where $T$ is the set of all tasks of an application and $T_{ps}$ is the set of the PE types that are used on the $HMPSoC_{NoC}$. • The function assigns each task $t_i$ of the CTG to a favorable type of PE. • After the binding operation is completed, the tasks are allowed to be mapped only to PEs of the type given by the binding function $b$.
The ADAM Flow(1/3) • An overview of our ADAM system is presented in Fig. 1. • The run-time mapping in our scheme is achieved through a negotiation policy among the Cluster Agents (CAs) and Global Agents (GAs) that, at a given instant of time, are distributed over the whole chip. • In Fig. 1 an application mapping request is sent to the CA of the requesting cluster, which receives all mapping requests and negotiates with the GAs. • The GAs have global information about all the clusters of the NoC, which lets them decide onto which cluster the application should be mapped.
The ADAM Flow(2/3) • Possible replies to this mapping request are: • 1. When a suitable cluster for the application exists, the GAs inform the requesting source CA, which then asks the suitable destination CA to perform the actual mapping of the application. • 2. When the GAs find no suitable cluster, they report the next most promising cluster, onto which the application can be mapped after task migration; the migration is negotiated between the GA and the CA to make this cluster suitable for the mapping. • 3. When neither a suitable cluster nor a candidate cluster for task migration is found, the re-clustering concept is used. • If none of the above options leads to a successful mapping (the application and system constraints are not met), the mapping request is refused and the refusal is reported to the requester.
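The order in which the replies above are tried can be sketched as a simple cascade. This is only an illustration under assumptions: the names `Reply` and `negotiate` are hypothetical, and the three arguments stand in for the outcomes of the GA/CA negotiation, which the slides do not specify in detail.

```python
from enum import Enum
from typing import Optional, Tuple


class Reply(Enum):
    MAP_TO_CLUSTER = 1    # reply 1: a suitable cluster exists
    MIGRATE_THEN_MAP = 2  # reply 2: map after task migration
    RECLUSTER = 3         # reply 3: re-clustering is used
    REFUSED = 4           # no option met the constraints


def negotiate(suitable: Optional[str],
              migration_candidate: Optional[str],
              reclustering_possible: bool) -> Tuple[Reply, Optional[str]]:
    """Pick the reply for one mapping request, trying the options in slide order.

    The arguments are assumed to be the results of the GA lookups; how the GAs
    compute them is outside this sketch.
    """
    if suitable is not None:
        return Reply.MAP_TO_CLUSTER, suitable
    if migration_candidate is not None:
        return Reply.MIGRATE_THEN_MAP, migration_candidate
    if reclustering_possible:
        return Reply.RECLUSTER, None
    return Reply.REFUSED, None
```

For example, `negotiate(None, "c3", False)` selects the migration path for the hypothetical cluster `"c3"`, while `negotiate(None, None, False)` refuses the request.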
Cluster Negotiation Algorithm(1/5) • The algorithms have the following important input and output data objects: • The application CTG, G, with the required computational resource profiles for each task. • G is given by a set of entries, one for each flow: entry = (idsrc, iddst, bwreq, lat, RRtp). • idsrc and iddst are the ids of the source and destination tasks of the flow. • bwreq is the required bandwidth of the flow. • lat is the communication latency. • RRtp is the resource requirement on each PE type that is needed for a task to ensure a successful execution. • The state information about all clusters is stored in a summarized format by the GAs (Table 1 and data object nhistc).
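One possible in-memory form of the CTG entry described above is shown below. It is a sketch, not the authors' data layout: the field names mirror (idsrc, iddst, bwreq, lat, RRtp), while the Python types, the units, the helper `tasks_of`, and every number in the example are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class FlowEntry:
    """One CTG entry: (idsrc, iddst, bwreq, lat, RRtp)."""
    id_src: int              # id of the source task of the flow
    id_dst: int              # id of the destination task of the flow
    bw_req: float            # required bandwidth of the flow (unit assumed)
    lat: float               # communication latency (unit assumed)
    rr_tp: Dict[str, float]  # resource requirement per PE type, as a fraction


def tasks_of(ctg: List[FlowEntry]) -> Set[int]:
    """Collect the ids of all tasks that appear as a source or destination."""
    return {t for e in ctg for t in (e.id_src, e.id_dst)}


# Hypothetical three-flow CTG; all values are made up.
ctg = [
    FlowEntry(1, 2, 30.0, 5.0, {"tp1": 0.6, "tp2": 0.4}),
    FlowEntry(1, 3, 80.0, 5.0, {"tp1": 0.6, "tp2": 0.4}),
    FlowEntry(3, 4, 50.0, 8.0, {"tp1": 0.9, "tp2": 0.5}),
]
```

With this layout, `tasks_of(ctg)` recovers the task set T of Definition 1 from the flow list alone.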
Cluster Negotiation Algorithm(2/5) • Energy model: to make a binding decision, the energy consumption of the different PE types at different resource requirement levels is needed. • Take the example in Fig. 2(b): for the PE type tp2 the energy consumption is specified by two values, tp2 : (4X, 12X). • This means that each PE of type tp2 consumes 4 units of energy (static energy consumption) in a fixed time when it uses no processing resources, • and 12 units of energy when it uses the complete PE resources. • For a utilization $u$ in between, $E = u \cdot (E_{100\%} - E_{0\%}) + E_{0\%}$.
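The linear energy model above can be checked with a few lines of code. The function `pe_energy` implements the interpolation from the slide; `bind_task` is a hypothetical helper that assumes "favorable" means lowest energy, which the slides suggest but do not state. The values for `tp1` are invented; only tp2 : (4, 12) comes from the example.

```python
from typing import Dict, Tuple


def pe_energy(u: float, e_idle: float, e_full: float) -> float:
    """E = u * (E[100%] - E[0%]) + E[0%], for a utilization u in [0, 1]."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return u * (e_full - e_idle) + e_idle


def bind_task(rr_tp: Dict[str, float],
              energy_table: Dict[str, Tuple[float, float]]) -> str:
    """Return the PE type with the lowest energy for this task.

    Assumption: the binding prefers the minimum-energy PE type.
    """
    return min(rr_tp, key=lambda tp: pe_energy(rr_tp[tp], *energy_table[tp]))


# tp2 is taken from the slide; tp1 is a made-up second PE type.
energy_table = {"tp1": (2.0, 20.0), "tp2": (4.0, 12.0)}
```

A PE of type tp2 at 50% utilization then costs `pe_energy(0.5, 4.0, 12.0)`, i.e. 8 units, halfway between the static and the full-load value.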
Cluster Negotiation Algorithm(3/5) • thist[] and nhistc[] are two data objects that store the resource requirement histograms within the local memory of the CAs and GAs: • thist for the resources required by the tasks • nhistc for the actual PE resource usage status of the cluster c (see Fig. 2(e), (f)). • Tasks are classified by their computation resource requirements. • The two data objects nhistc and thist are matched according to equation (1).
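As a rough illustration of how such histograms could be built, the sketch below counts resource levels into equal-width classes. The number of classes, the function name `histogram`, and the sample values are all assumptions; the matching rule of equation (1) is not reproduced here because the slides do not show it.

```python
from typing import Iterable, List


def histogram(levels: Iterable[float], n_bins: int = 4) -> List[int]:
    """Count resource levels in [0, 1] into n_bins equal-width classes.

    A level of exactly 1.0 is placed in the last class.
    """
    counts = [0] * n_bins
    for u in levels:
        if not 0.0 <= u <= 1.0:
            raise ValueError("resource level must lie in [0, 1]")
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    return counts


# Hypothetical data: requirements of four tasks, usage of three PEs.
thist = histogram([0.1, 0.3, 0.3, 0.9])
nhistc = histogram([0.0, 0.5, 1.0])
```

Here `thist` summarizes what the tasks need and `nhistc` what the cluster's PEs currently use, so both fit in a few integers per cluster.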
Cluster Negotiation Algorithm(4/5) • In Fig. 2 we present an example of the cluster searching procedure. • The task graph of an application that is requested to be mapped is shown in Fig. 2(a). • The energy consumed by the various PE types at different resource requirement levels is given in Fig. 2(b). • The resource requirements of the tasks are given in Fig. 2(c). • They are used to calculate the actual required energy consumption for every task on the different types of PEs (Fig. 2(d)). • Fig. 2(e) shows the resource requirement profile used to create the histogram corresponding to the data object thist[]. • Fig. 2(f) presents the histogram nhistc[] for a cluster. • Fig. 2(g) presents the new binding and the selection of the cluster.
The Mapping Algorithm(1/5) • To decide to which tile of a particular PE type a task should be mapped, a heuristic is used. It is described by the cost function $c(t, n)$ for the selection of a tile $n_j$ for a given task $t_i$. • $D(n)$ is the average distance of a tile to all other tiles of the cluster. • $d(k)$ is the Manhattan distance between the mapped tasks. • $vol(k)$ is the communication volume between the connected tasks. • $RR(n_j)$ is the resource requirement of the PE that will be assigned for the task. • $bwt(n_j)$ is the total bandwidth requirement of the tasks on the tile.
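Two of the terms listed above, the Manhattan distance d(k) and the average distance D(n), can be computed as sketched below. The sketch assumes tiles are addressed by (row, column) coordinates on a 2D mesh, which the slides imply but do not state; how the terms are combined into c(t, n) is not shown because the slide text does not give the formula.

```python
from typing import Iterable, Tuple

Tile = Tuple[int, int]  # assumed (row, column) position on a 2D mesh


def manhattan(a: Tile, b: Tile) -> int:
    """Manhattan distance between two tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_distance(n: Tile, cluster: Iterable[Tile]) -> float:
    """D(n): average distance from tile n to all other tiles of the cluster."""
    others = [m for m in cluster if m != n]
    if not others:
        return 0.0
    return sum(manhattan(n, m) for m in others) / len(others)


# Hypothetical 3x3 cluster.
cluster = [(r, c) for r in range(3) for c in range(3)]
```

In this 3x3 example the centre tile has a smaller D(n) than a corner tile, which is why a term like D(n) favours centrally located tiles.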
The Mapping Algorithm(2/5) • In the following, Alg. 2 is explained using an example (see Fig. 5).
The Mapping Algorithm(3/5) • In Fig. 5(a) we present a task graph whose tasks were grouped by the binding function (shown in different colors) in the earlier negotiation stage. • In Fig. 5(b) a part of the tiles of the current cluster is presented. • Fig. 5(f) presents the computational resource requirements for each task of the task graph. • Fig. 5(g) shows the resources currently in use on some of these tiles. • The availability of the resources is presented by the ordered column in a table (Fig. 5(d)). • In Fig. 5(e) we see the first set of flows ftp2 that connect PEs of PE type 2: {f12, f13, f34}. • The flows are sorted in decreasing order of their bandwidth requirements. • The result of a successful mapping is illustrated in Fig. 5(c).
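The sorting step for the flow set {f12, f13, f34} can be illustrated as follows. The bandwidth values are hypothetical, since Fig. 5 is not reproduced in the slides' text, and `sort_flows` is an illustrative name.

```python
from typing import Dict, List


def sort_flows(bandwidth: Dict[str, float]) -> List[str]:
    """Return flow names in decreasing order of bandwidth requirement."""
    return sorted(bandwidth, key=lambda f: bandwidth[f], reverse=True)


# Hypothetical bandwidth requirements for the flows between type-2 PEs.
f_tp2 = {"f12": 30.0, "f13": 80.0, "f34": 50.0}
ordered = sort_flows(f_tp2)
```

Handling the most bandwidth-hungry flows first gives them the widest choice of free tiles, which is a common motivation for this ordering in greedy mapping heuristics.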
The Mapping Algorithm(4/5) • The pseudo code of the run-time mapping algorithm inside each cluster is presented in Algorithm 2. • The input data are the CTG of the application and the tile look-up table of the cluster. • The CTG contains the communication costs for each flow $f_{i,j}$ between the tasks $t_i$ and $t_j$. • The model tileLUT,clu of the $HMPSoC_{NoC}$ stores the current state of the used computation and communication resources of that particular cluster. • The tile-LUT tileLUT,clu contains each tile's current computation resource usage, the type of the PE of this tile tpPE, and the current bandwidth usage for each link. • The output (mpng) is the mapping of tasks to tiles of the network, which is used to allocate the tiles physically on the network.
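The tile look-up table described above could be modelled as sketched here. This is an assumed layout, not the data structure of Algorithm 2: the class `TileEntry`, the helper `candidate_tiles`, the capacity limit of 1.0, and all sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TileEntry:
    pe_type: str   # type of the PE on this tile (tpPE)
    usage: float   # current computation resource usage, as a fraction
    link_bw: Dict[str, float] = field(default_factory=dict)  # usage per link


def candidate_tiles(lut: Dict[int, TileEntry], pe_type: str, rr: float) -> List[int]:
    """Tiles of the bound PE type that can still take a task needing rr.

    Assumption: a tile is full at usage 1.0. Results are ordered by
    availability, most free resources first.
    """
    fits = [t for t, e in lut.items()
            if e.pe_type == pe_type and e.usage + rr <= 1.0]
    return sorted(fits, key=lambda t: lut[t].usage)


# Hypothetical cluster state.
tile_lut = {
    0: TileEntry("tp2", 0.5),
    1: TileEntry("tp2", 0.2),
    2: TileEntry("tp1", 0.0),
    3: TileEntry("tp2", 0.9),
}
```

A mapping result in the spirit of mpng would then be a plain dictionary from task id to tile id, filled by choosing among `candidate_tiles(...)` for each task.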
Result(1/2) • Result comparison on a system with 2048 tiles • 7 times lower computational effort compared to Nearest Neighbor
Result(2/2) • 64x64 NoC: 2551 / 238.9 ≈ 10.7 times lower traffic in ADAM compared to a centralized scheme
Conclusion • We have introduced the first scheme for run-time application mapping in a distributed manner using an agent-based approach. • We target adaptive NoC-based heterogeneous multi-processor systems. • The scheme requires 7 times lower computational effort compared to the Nearest Neighbor (NN) heuristic. • The mapping functionality produces 10.7 times lower traffic compared to a centralized scheme.