Energy- and Performance-Aware Mapping for Regular NoC Architectures
Jingcao Hu and Radu Marculescu
Carnegie Mellon University
Introduction
• Integration levels allow system-on-chip design
• Network-on-chip offers design automation and high performance:
  • Structured network wiring
  • Modularity (floorplan)
  • Standard network interfaces
• Assume a regular, tile-based approach
• Assume a task-graph architecture where #nodes = #tiles and communication paths are annotated with traffic load
Introduction
• This paper examines two problems:
  • Routing
    • Use standard routing policies to avoid deadlock
    • Encode a deterministic, predefined route for each source/destination (S/D) pair into each router's own routing table
  • Mapping
    • Find a good policy for mapping IP cores to tiles
    • Objective: minimize energy, maximize performance
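As a rough illustration of storing one predefined route per S/D pair, the Python sketch below fills a table at every router with the output port each S/D pair should take. It assumes XY routes, tiles written as (x, y) with north as +y, and the port names E/W/N/S/LOCAL; `xy_route`, `output_port`, and `build_routing_tables` are hypothetical helpers, not the paper's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # (column x, row y); north is +y (assumed convention)


def xy_route(src: Tile, dst: Tile) -> List[Tile]:
    """Tiles visited by deterministic XY routing: move along x first, then y."""
    x, y = src
    tiles = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        tiles.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        tiles.append((x, y))
    return tiles


def output_port(here: Tile, nxt: Tile) -> str:
    """Crossbar output port leading from one tile to a neighboring tile."""
    step = (nxt[0] - here[0], nxt[1] - here[1])
    return {(1, 0): "E", (-1, 0): "W", (0, 1): "N", (0, -1): "S"}[step]


def build_routing_tables(routes):
    """routes[(src, dst)] -> tile sequence; returns tables[router][(src, dst)] -> port."""
    tables: Dict[Tile, Dict[Tuple[Tile, Tile], str]] = defaultdict(dict)
    for (src, dst), path in routes.items():
        for here, nxt in zip(path, path[1:]):
            tables[here][(src, dst)] = output_port(here, nxt)
        tables[dst][(src, dst)] = "LOCAL"  # eject to the local IP core
    return tables


n = 3
tiles = [(x, y) for x in range(n) for y in range(n)]
routes = {(s, d): xy_route(s, d) for s in tiles for d in tiles if s != d}
tables = build_routing_tables(routes)
print(tables[(0, 0)][((0, 0), (2, 1))])  # E
```

Indexing by the full S/D pair (rather than by destination only) keeps the table valid even when routes from different sources to the same destination diverge, as they can under the adaptive-style policies discussed later.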
Routers
• Buffer space
  • Use small registers (1-2 flits per input port)
  • Low area requirement, low decoding latency
• Routing type
  • Wormhole routing
  • Well suited to the small buffer space
• Routing policy
  • Deterministic (vs. adaptive)
  • Simple routing logic
  • In-order packet arrival
  • Minimal routes and predictable traffic allow optimization by placement
  • Built-in deadlock avoidance
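Platform
• n x n grid of tiles
• Each router has a routing table and a 5x5 crossbar
• Energy model (energy per bit, from switch, buffer, internal wire, and link contributions):
  • $E_{bit} = E_{S_{bit}} + E_{B_{bit}} + E_{W_{bit}} + E_{L_{bit}}$
  • $E_{B_{bit}}$ and $E_{W_{bit}}$ are small compared to the other terms, so $E_{bit} \approx E_{S_{bit}} + E_{L_{bit}}$
• Sending one bit from tile $t_i$ to tile $t_j$ ($n_{hops}$ = number of routers traversed):
  • $E_{bit}^{t_i,t_j} = n_{hops} \times E_{S_{bit}} + (n_{hops} - 1) \times E_{L_{bit}}$
  • With minimal routing, $n_{hops} - 1$ is the Manhattan distance between $t_i$ and $t_j$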
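A minimal Python sketch of the simplified per-bit energy model, assuming minimal routing on a mesh so that the number of links crossed equals the Manhattan distance. The per-bit energies passed in are hypothetical placeholders, not measured values.

```python
def manhattan(a, b):
    """Manhattan distance between two tiles given as (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def bit_energy(src, dst, e_s_bit, e_l_bit):
    """E_bit from src to dst: n_hops switches plus (n_hops - 1) links."""
    n_hops = manhattan(src, dst) + 1  # routers traversed on a minimal path
    return n_hops * e_s_bit + (n_hops - 1) * e_l_bit


# hypothetical per-bit energies in arbitrary units
print(bit_energy((0, 0), (2, 1), e_s_bit=1.0, e_l_bit=0.5))  # 5.5
```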
Obligatory Theory Stuff
• Application Characterization Graph (APCG)
  • Vertices are IPs
  • Arcs define communication between IPs and carry the data volume and required bandwidth
• Architecture Characterization Graph (ARCG)
  • Vertices are tiles and are fully connected
  • Arcs contain routing information:
    • Candidate minimal paths (set of links)
    • Energy requirement
    • Routes chosen according to XY routing
• Problem: find a mapping of IPs to tiles such that energy is minimized and link bandwidth constraints are met
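To make the mapping problem concrete, the sketch below stores a hypothetical APCG as a dictionary of (volume, bandwidth) per arc, scores a mapping with the simplified bit-energy model, and checks link loads under XY routing against a capacity. The energy constants, the capacity value, and all helper names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical APCG: (src IP, dst IP) -> (volume in bits, required bandwidth)
APCG = {("A", "B"): (100, 10), ("B", "C"): (50, 20), ("A", "C"): (10, 5)}


def bit_energy(a, b, e_s=1.0, e_l=0.5):
    """Per-bit energy over a minimal path (hypothetical unit energies)."""
    links = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return (links + 1) * e_s + links * e_l


def xy_links(a, b):
    """Directed links used by XY routing from tile a to tile b."""
    x, y = a
    links = []
    while x != b[0]:
        nx = x + (1 if b[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != b[1]:
        ny = y + (1 if b[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def mapping_energy(apcg, mapping):
    """Total communication energy: volume x per-bit energy, summed over arcs."""
    return sum(vol * bit_energy(mapping[u], mapping[v])
               for (u, v), (vol, _) in apcg.items())


def bandwidth_ok(apcg, mapping, capacity):
    """True if no XY-routed link carries more bandwidth than `capacity`."""
    load = defaultdict(float)
    for (u, v), (_, bw) in apcg.items():
        for link in xy_links(mapping[u], mapping[v]):
            load[link] += bw
    return all(v <= capacity for v in load.values())


mapping = {"A": (0, 0), "B": (1, 0), "C": (1, 1)}
print(mapping_energy(APCG, mapping))      # 415.0
print(bandwidth_ok(APCG, mapping, 25.0))  # True
```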
Sanity Check
• TGFF is used to generate a series of task graphs for 3 x 3 to 13 x 13 grids
• There are n! possible mappings for n tiles
• Finding the optimal mapping is a constrained quadratic assignment problem (NP-hard)
• Generate 3000 random mappings of IPs
  • From these, record the best and the median energy requirement
• Use simulated annealing (SA) to search for the best mapping
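The two baselines can be sketched as follows: random sampling that reports the best and median energy, and a plain simulated annealing search whose moves swap the tiles of two IPs. The cooling schedule (`t0`, `alpha`, `steps`), the unit energies, and the swap neighborhood are assumptions for illustration; the SA settings used in the study are not given here.

```python
import math
import random
import statistics


def energy(apcg, mapping, e_s=1.0, e_l=1.0):
    """Total bit energy of a mapping (minimal routing, hypothetical unit energies)."""
    total = 0.0
    for (u, v), vol in apcg.items():
        (x1, y1), (x2, y2) = mapping[u], mapping[v]
        links = abs(x1 - x2) + abs(y1 - y2)
        total += vol * ((links + 1) * e_s + links * e_l)
    return total


def random_baseline(apcg, ips, tiles, samples=3000, seed=0):
    """Best and median energy over `samples` random mappings."""
    rng = random.Random(seed)
    costs = [energy(apcg, dict(zip(ips, rng.sample(tiles, len(ips)))))
             for _ in range(samples)]
    return min(costs), statistics.median(costs)


def anneal(apcg, ips, tiles, steps=20000, t0=100.0, alpha=0.9995, seed=0):
    """Simulated annealing over full mappings (#IPs = #tiles) with swap moves."""
    rng = random.Random(seed)
    current = dict(zip(ips, rng.sample(tiles, len(ips))))
    cur_cost = energy(apcg, current)
    best, best_cost = dict(current), cur_cost
    temp = t0
    for _ in range(steps):
        a, b = rng.sample(ips, 2)
        current[a], current[b] = current[b], current[a]  # swap two IPs' tiles
        new_cost = energy(apcg, current)
        delta = new_cost - cur_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost  # accept the move
            if cur_cost < best_cost:
                best, best_cost = dict(current), cur_cost
        else:
            current[a], current[b] = current[b], current[a]  # reject: undo swap
        temp *= alpha  # geometric cooling
    return best, best_cost


ips = ["A", "B", "C", "D"]
tiles = [(0, 0), (1, 0), (0, 1), (1, 1)]
apcg = {("A", "B"): 100, ("B", "C"): 50, ("C", "D"): 80, ("A", "D"): 10}
print(random_baseline(apcg, ips, tiles))
print(anneal(apcg, ips, tiles)[1])
```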
Mapping Search
• Classic search tree; needs:
  • State representation
  • Operators
  • Utility function
  • Queuing function
  • Trim function (paths required by the task graph)
Search Heuristic
• Cost of a node = energy consumed by communication among all IPs that have been mapped
• Upper bound cost (UBC) = no less than the minimum cost of the descendant leaf nodes
  • i.e. from this map state, we can do AT LEAST this well
• Lower bound cost (LBC) = no greater than the minimum cost of the descendant leaf nodes
  • i.e. from this map state, we can do AT BEST this well
• Algorithm:
  • An unexpanded node is selected
  • The next unassigned IP is assigned to each open tile, creating one child per tile
  • PAT is computed for each child node
  • Trim nodes whose cost or LBC > the lowest UBC found so far
• How to compute routing paths and UBC/LBC for each node?
  • Better routing path allocation leads to better results
  • Tighter UBC/LBC bounds help trim away bad nodes but take more time to compute
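The branching and trimming steps can be sketched on partial mappings (dicts from IP to tile). `expand` and `trim` are hypothetical names; the cost and LBC functions are passed in, and the toy usage below reuses the node cost as its LBC, which is valid but the loosest possible bound.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Tile = Tuple[int, int]
Node = Dict[str, Tile]  # partial mapping: IP -> tile


def expand(node: Node, ip_order: Sequence[str], tiles: Sequence[Tile]) -> List[Node]:
    """Assign the next unassigned IP (in a fixed order) to each open tile."""
    if len(node) == len(ip_order):
        return []  # leaf: every IP is mapped
    ip = ip_order[len(node)]
    used = set(node.values())
    return [{**node, ip: t} for t in tiles if t not in used]


def trim(children: List[Node], cost: Callable[[Node], float],
         lbc: Callable[[Node], float], best_ubc: float) -> List[Node]:
    """Drop children whose cost or LBC exceeds the lowest UBC found so far."""
    return [c for c in children if cost(c) <= best_ubc and lbc(c) <= best_ubc]


APCG = {("A", "B"): 100}


def comm_cost(node: Node) -> float:
    """Energy among mapped IPs, assuming unit switch and link energies."""
    total = 0.0
    for (u, v), vol in APCG.items():
        if u in node and v in node:
            d = abs(node[u][0] - node[v][0]) + abs(node[u][1] - node[v][1])
            total += vol * ((d + 1) + d)
    return total


tiles = [(0, 0), (1, 0), (0, 1), (1, 1)]
children = expand({"A": (0, 0)}, ["A", "B"], tiles)       # B on each open tile
kept = trim(children, comm_cost, comm_cost, best_ubc=300.0)  # drops B at (1, 1)
```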
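Routing Path Allocation
• Turn models (a turn is named by its incoming, then outgoing direction, e.g. EN = traveling east, then turning north):
  • Fully adaptive: all 8 turns allowed
  • XY routing: 4 turns (EN, ES, WN, WS)
  • West-first: 6 turns (no NW or SW turns)
  • Odd-even: 6 turns per column
    • Even column: no EN or ES turn
    • Odd column: no NW or SW turn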
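Assuming turns are named by incoming then outgoing direction and that north is +y, this sketch enumerates the minimal paths each turn model permits between two tiles; only 90-degree turns need checking because a minimal path never reverses. The length of the returned list is the path count that serves as a communication load's flexibility. Function and policy names are hypothetical.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (column x, row y); north is +y (assumed)
Link = Tuple[Tile, Tile]
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def turn_allowed(policy: str, turn: str, column: int) -> bool:
    """turn = incoming + outgoing direction, e.g. "EN" = traveling east, then north."""
    if policy == "fully_adaptive":
        return True
    if policy == "xy":
        return turn in {"EN", "ES", "WN", "WS"}  # finish X before Y
    if policy == "west_first":
        return turn not in {"NW", "SW"}  # never turn into west
    if policy == "odd_even":
        if column % 2 == 0:
            return turn not in {"EN", "ES"}
        return turn not in {"NW", "SW"}
    raise ValueError(f"unknown policy: {policy}")


def minimal_paths(src: Tile, dst: Tile, policy: str) -> List[List[Link]]:
    """All minimal src->dst paths whose turns are legal under `policy`."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    moves = {("E" if dx > 0 else "W"): abs(dx), ("N" if dy > 0 else "S"): abs(dy)}
    paths: List[List[Link]] = []

    def walk(pos: Tile, last, remaining, links):
        if not any(remaining.values()):
            paths.append(links)
            return
        for d, left in remaining.items():
            if left == 0:
                continue
            # the turn happens at `pos`, so its column decides the odd-even rule
            if last is not None and last != d and not turn_allowed(policy, last + d, pos[0]):
                continue
            nxt = (pos[0] + STEP[d][0], pos[1] + STEP[d][1])
            walk(nxt, d, {**remaining, d: left - 1}, links + [(pos, nxt)])

    walk(src, None, moves, [])
    return paths


for policy in ("xy", "west_first", "odd_even", "fully_adaptive"):
    print(policy, len(minimal_paths((0, 0), (2, 1), policy)))
```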
Routing Path Allocation
• Build the list of communication loads (LCL)
  • The LCL is the list of task-graph communications exposed by assigning an IP to a tile
• The flexibility of a communication load (CL) is the number of possible minimal paths through the network (based on the routing policy)
• The LCL is sorted from least flexible to most flexible
• choose_link() returns the least loaded link allowed by the routing policy
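A hedged sketch of the allocation loop: CLs are routed from least to most flexible, and at each hop a stand-in for `choose_link()` picks the least-loaded next link among the candidate minimal paths that remain possible. The candidate paths are taken as input (at least one per CL, all of equal length), and the data layout and tie-breaking rule are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tile = Tuple[int, int]
Link = Tuple[Tile, Tile]


def allocate(candidates: Dict[str, List[List[Link]]], bandwidth: Dict[str, float]):
    """candidates[cl] = policy-legal minimal paths for that CL (non-empty)."""
    load: Dict[Link, float] = defaultdict(float)
    chosen: Dict[str, List[Link]] = {}
    # LCL sorted from least to most flexible (flexibility = number of paths)
    for cl in sorted(candidates, key=lambda c: len(candidates[c])):
        alive = candidates[cl]
        path: List[Link] = []
        for hop in range(len(alive[0])):
            # choose_link() stand-in: least-loaded next link, ties by link order
            link = min({p[hop] for p in alive}, key=lambda lk: (load.get(lk, 0.0), lk))
            path.append(link)
            alive = [p for p in alive if p[hop] == link]
        for link in path:
            load[link] += bandwidth[cl]
        chosen[cl] = path
    return chosen, dict(load)


a_paths = [[((0, 0), (1, 0)), ((1, 0), (1, 1))],
           [((0, 0), (0, 1)), ((0, 1), (1, 1))]]
b_paths = [[((0, 0), (1, 0))]]
chosen, load = allocate({"a": a_paths, "b": b_paths}, {"a": 5.0, "b": 10.0})
print(chosen["a"])  # routed around the link already used by "b"
```

Routing the inflexible CL "b" first leaves the flexible CL "a" free to detour around the busy link, which is the point of sorting the LCL by flexibility.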
UBC Calculation
• Compute the UBC for each node by greedily mapping the remaining unmapped IPs:
  • The unmapped IP with the highest communication demand is selected next
  • Its ideal location (x, y) is calculated
  • The IP is mapped to the open tile closest to (x, y) (Manhattan distance)
  • Repeat until all IPs are mapped
• The cost of the resulting leaf node is the UBC
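The greedy UBC step might look like the sketch below. The formula for the ideal location is not reproduced in the slide, so this version assumes the traffic-weighted centroid of the IP's already-mapped partners; that choice, the fallback when no partner is mapped yet, and the unit energies are all assumptions.

```python
from typing import Dict, Tuple

Tile = Tuple[int, int]


def energy(apcg, mapping, e_s=1.0, e_l=1.0):
    """Total bit energy of a full mapping (hypothetical unit energies)."""
    total = 0.0
    for (u, v), vol in apcg.items():
        d = abs(mapping[u][0] - mapping[v][0]) + abs(mapping[u][1] - mapping[v][1])
        total += vol * ((d + 1) * e_s + d * e_l)
    return total


def greedy_ubc(node: Dict[str, Tile], apcg, tiles):
    """Complete a partial mapping greedily; the leaf's energy is the UBC."""
    mapping = dict(node)
    used = set(mapping.values())
    free = [t for t in tiles if t not in used]
    demand: Dict[str, float] = {}
    for (u, v), vol in apcg.items():
        demand[u] = demand.get(u, 0) + vol
        demand[v] = demand.get(v, 0) + vol
    for ip in sorted(demand, key=lambda i: -demand[i]):  # highest demand first
        if ip in mapping:
            continue
        # ASSUMPTION: ideal (x, y) = traffic-weighted centroid of mapped partners
        wx = wy = w = 0.0
        for (u, v), vol in apcg.items():
            if ip in (u, v):
                other = v if u == ip else u
                if other in mapping:
                    wx += vol * mapping[other][0]
                    wy += vol * mapping[other][1]
                    w += vol
        ideal = (wx / w, wy / w) if w > 0 else free[0]  # hypothetical fallback
        tile = min(free, key=lambda t: (abs(t[0] - ideal[0]) + abs(t[1] - ideal[1]), t))
        free.remove(tile)
        mapping[ip] = tile
    return mapping, energy(apcg, mapping)


APCG = {("A", "B"): 100, ("B", "C"): 50, ("C", "D"): 80, ("A", "D"): 10}
TILES = [(0, 0), (1, 0), (0, 1), (1, 1)]
leaf, ubc = greedy_ubc({"A": (0, 0)}, APCG, TILES)
```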
LBC Calculation
• LBC = cost among mapped IPs + lower-bound cost between mapped and unmapped IPs + lower-bound cost among unmapped IPs
• [Figure: mapped IPs, unmapped IPs, and unmapped tiles, with the cost of each route]
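One simple way to realize the three LBC terms is sketched below: the mapped-to-mapped term is exact, while the other two let each arc independently use the cheapest open tile or pair of open tiles. This relaxation ignores that unmapped IPs need distinct tiles, so it is a valid but loose lower bound, not necessarily the bound used in the paper; unit energies are assumed.

```python
def hop_energy(a, b, e_s=1.0, e_l=1.0):
    """Per-bit energy over a minimal path (hypothetical unit energies)."""
    d = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return (d + 1) * e_s + d * e_l


def lbc(node, apcg, tiles):
    """Lower bound on the cost of any leaf below the partial mapping `node`."""
    used = set(node.values())
    free = [t for t in tiles if t not in used]
    # cheapest possible route between any two distinct open tiles
    min_free_pair = min((hop_energy(a, b) for a in free for b in free if a != b),
                        default=0.0)
    total = 0.0
    for (u, v), vol in apcg.items():
        if u in node and v in node:  # mapped <-> mapped: exact cost
            total += vol * hop_energy(node[u], node[v])
        elif u in node or v in node:  # mapped <-> unmapped: best open tile
            anchor = node[u] if u in node else node[v]
            total += vol * min(hop_energy(anchor, t) for t in free)
        else:  # unmapped <-> unmapped: best open tile pair
            total += vol * min_free_pair
    return total


APCG = {("A", "B"): 100, ("B", "C"): 50, ("C", "D"): 80, ("A", "D"): 10}
TILES = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(lbc({"A": (0, 0)}, APCG, TILES))
print(lbc({"A": (0, 0), "B": (1, 1)}, APCG, TILES))
```

Placing B diagonally from A pushes the LBC well above the cost of the best ring placement, so a node like the second one would be trimmed once a good UBC is known.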
Pseudocode
• IPs are sorted by communication demand (in descending order)
• A priority queue (PQ) sorts the nodes to be branched by cost (in ascending order)
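A compact branch-and-bound search following these two rules (IPs ordered by descending demand, a PQ ordered by ascending cost) could look like the sketch below. The lower bound and the greedy upper bound are simplified stand-ins (the greedy step places each IP where it adds the least energy instead of using an ideal location), unit energies are assumed, and routing-path allocation is omitted, so this shows the search structure rather than EPAM itself.

```python
import heapq
import itertools


def hop_energy(a, b, e_s=1.0, e_l=1.0):
    d = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return (d + 1) * e_s + d * e_l


def cost(node, apcg):
    """Energy of communication among the IPs mapped so far."""
    return sum(vol * hop_energy(node[u], node[v])
               for (u, v), vol in apcg.items() if u in node and v in node)


def lower_bound(node, apcg, free):
    """Relaxed LBC: exact mapped cost plus cheapest-possible remaining arcs."""
    pair = min((hop_energy(a, b) for a in free for b in free if a != b), default=0.0)
    total = 0.0
    for (u, v), vol in apcg.items():
        if u in node and v in node:
            total += vol * hop_energy(node[u], node[v])
        elif u in node or v in node:
            anchor = node[u] if u in node else node[v]
            total += vol * min(hop_energy(anchor, t) for t in free)
        else:
            total += vol * pair
    return total


def greedy_leaf(node, apcg, order, free):
    """Simplified UBC: place each remaining IP where it adds the least energy."""
    leaf, free = dict(node), list(free)
    for ip in order:
        if ip not in leaf:
            tile = min(free, key=lambda t: (cost({**leaf, ip: t}, apcg), t))
            free.remove(tile)
            leaf[ip] = tile
    return leaf


def branch_and_bound(apcg, ips, tiles):
    demand = {ip: sum(vol for arc, vol in apcg.items() if ip in arc) for ip in ips}
    order = sorted(ips, key=lambda ip: -demand[ip])  # descending demand
    best_leaf = greedy_leaf({}, apcg, order, tiles)
    best_ubc = cost(best_leaf, apcg)
    tie = itertools.count()  # breaks cost ties so dicts are never compared
    pq = [(0.0, next(tie), {})]  # ascending cost
    while pq:
        node_cost, _, node = heapq.heappop(pq)
        if node_cost > best_ubc:
            continue  # trimmed after a better UBC was found
        ip = order[len(node)]
        free = [t for t in tiles if t not in node.values()]
        for t in free:
            child = {**node, ip: t}
            rest = [f for f in free if f != t]
            child_cost = cost(child, apcg)
            if child_cost > best_ubc or lower_bound(child, apcg, rest) > best_ubc:
                continue  # trim
            leaf = greedy_leaf(child, apcg, order, rest)
            leaf_cost = cost(leaf, apcg)
            if leaf_cost < best_ubc:
                best_ubc, best_leaf = leaf_cost, leaf
            if len(child) < len(order):
                heapq.heappush(pq, (child_cost, next(tie), child))
    return best_leaf, best_ubc


ring = {("A", "B"): 100, ("B", "C"): 50, ("C", "D"): 80, ("A", "D"): 10}
print(branch_and_bound(ring, ["A", "B", "C", "D"], [(0, 0), (1, 0), (0, 1), (1, 1)]))
```

Because every trim compares a valid lower bound against the cost of a real leaf, the search only discards subtrees that cannot beat the best mapping already found.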
Results (relative to link bandwidth)
Multimedia Application
• Multimedia system (MMS): H263 encoder/decoder, MP3 encoder/decoder
• 40 tasks, assigned to 16 IPs from Mentor
• Audio/video clips are used to derive the communication patterns
Results
• Mapping run time:
  • 10 x 10 tiles:
    • EPAM took a few minutes to map
    • SA didn't finish after 40 hours
• MMS:
  • EPAM-XY can't find a solution for link bandwidths <= 324 Mb/s
  • EPAM-OE and EPAM-WF can find solutions down to 307 Mb/s
• Energy requirement:
  • OE = WF < XY
Irregular Regions
• Irregular region sizes (IPs that occupy more than one tile)
  • Divide each region into dummy IPs
  • Assign high weights to the arcs between the dummy IPs in the APCG
  • The mapping will then place these dummy IPs in adjacent tiles
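A sketch of the dummy-IP transformation, assuming each large IP is described by the number of tiles it occupies. The default glue weight (10x the total real traffic), connecting every pair of dummy IPs with a heavy arc, and attaching all real traffic to the first dummy IP are illustrative assumptions, and `split_large_ips` is a hypothetical name.

```python
from itertools import combinations


def split_large_ips(apcg, sizes, glue_weight=None):
    """Replace each k-tile IP by k dummy IPs joined by heavy APCG arcs.

    sizes[ip] = number of tiles the IP occupies (every IP must be listed).
    """
    if glue_weight is None:
        glue_weight = 10 * sum(apcg.values())  # ASSUMPTION: dwarfs real traffic
    new_apcg = {}
    port = {}  # ASSUMPTION: the first dummy IP carries the IP's real traffic
    for ip, k in sizes.items():
        parts = [ip] if k == 1 else [f"{ip}#{i}" for i in range(k)]
        port[ip] = parts[0]
        for a, b in combinations(parts, 2):  # heavy arcs keep the parts together
            new_apcg[(a, b)] = glue_weight
    for (u, v), vol in apcg.items():
        key = (port[u], port[v])
        new_apcg[key] = new_apcg.get(key, 0) + vol
    return new_apcg


print(split_large_ips({("CPU", "DSP"): 100}, {"CPU": 1, "DSP": 2}))
```

The transformed APCG can be fed to any of the mapping searches unchanged; the heavy arcs make separating the dummy IPs so expensive that they end up on neighboring tiles.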