HAsim On-Chip Network Model Configuration • Michael Adler
The Front End Multiplexed

[Figure: CPU 1 and CPU 2 share one multiplexed front end. The FET stage drives the line predictor, branch predictor, ITLB, and IMEM/I$ lookup; PC Resolve selects among the predicted path and the redirect, fault, misprediction, and training inputs from the back end; fetched instructions (or faults) are enqueued into the Inst Q.]
Problem: On-Chip Network

[Figure: CPUs 0..2 with L1/L2 caches and a memory controller, each attached to a router (r); msg and credit wires run to and from every router.]

• Problem: routing wires to/from each router
• Similar to the "global controller" scheme
• Also, utilization is low
Multiplexing On-Chip Network Routers

[Figure: routers 0..3 collapsed into a single multiplexed Router 0..3. Reorder buffers between simulation passes apply the permutations σ(x) = (x + 1) mod 4, σ(x) = (x + 2) mod 4, and σ(x) = (x + 3) mod 4 so each instance's messages reach the correct neighbor instance.]

• Simulate the network without a network
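The reorder step can be checked in plain software. Below is a minimal C++ sketch (illustrative, not HAsim source) of why a message produced by multiplexed instance x must be delivered to instance σ(x) = (x + k) mod n on the next simulation pass:

    // Simulate n routers on one physical router by time-multiplexing.
    // Between passes, the buffer must permute messages so that the message
    // sent by instance x is seen by its neighbor sigma(x) = (x + k) mod n.
    #include <cstdio>
    #include <vector>

    int main()
    {
        const int n = 4;   // number of multiplexed router instances
        const int k = 1;   // neighbor offset for this port direction

        // Messages produced in one pass, one per instance, in instance order.
        std::vector<int> sent(n);
        for (int x = 0; x < n; x++)
            sent[x] = x;                  // payload = sender's instance ID

        // Apply the permutation: instance y = (x + k) mod n pops x's message.
        std::vector<int> recv(n);
        for (int x = 0; x < n; x++)
            recv[(x + k) % n] = sent[x];

        for (int y = 0; y < n; y++)
            printf("instance %d receives the message from instance %d\n", y, recv[y]);
        return 0;
    }

With n = 4 and k = 1 this prints exactly the rotated columns shown in the figure.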
HAsim’s Network Model is Abstract • In a software model the target network can be built at run-time • Dynamism is expensive in FPGAs and recompilation is slow • Solution: Constrained dynamism • Fixed parameters: Max nodes, max edges per node, max VCs • Dynamic: • Number of active contexts (nodes) • Endpoints of each edge (indirection table) • Routing table • Address mapping of distributed LLC
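As a rough picture of constrained dynamism, the C++ sketch below (names and sizes are hypothetical, not the HAsim API) separates what is frozen at FPGA compile time from what the topology manager fills in at startup:

    #include <array>
    #include <cstdint>

    // Fixed at FPGA compile time: capacities only.
    constexpr int MAX_NODES          = 64;
    constexpr int MAX_EDGES_PER_NODE = 5;    // N, S, E, W, local
    constexpr int MAX_VCS            = 4;

    struct NetworkConfig
    {
        // Dynamic: number of active contexts (nodes), <= MAX_NODES.
        int numActiveNodes;

        // Dynamic: endpoint indirection table.  Edge e of node n connects
        // to node edgeDst[n][e] (-1 for a NULL connection).
        std::array<std::array<int16_t, MAX_EDGES_PER_NODE>, MAX_NODES> edgeDst;

        // Dynamic: routing table, next-hop edge for each (node, dest) pair.
        std::array<std::array<int8_t, MAX_NODES>, MAX_NODES> nextHopEdge;
    };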
Topology Manager • Software – runs once at startup so no need to optimize • HASIM_CHIP_TOPOLOGY_CLASS: • Manages streaming of parameters to the FPGA • Iterates over all software topology mapping classes until convergence • Namespace defined by dictionaries • .dic files are preprocessed by LEAP tools • Hierarchy of enumerated types
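A hedged sketch of the convergence loop (the class and method names below are illustrative stand-ins for the HASIM_CHIP_TOPOLOGY_CLASS interface): mapping classes may depend on each other's results, so the manager keeps re-running them until no class reports a change:

    #include <vector>

    class TopologyMapper
    {
      public:
        // One mapping pass; returns true if any mapping changed.
        virtual bool MapTopology() = 0;
        virtual ~TopologyMapper() {}
    };

    void RunTopologyManager(std::vector<TopologyMapper*>& mappers)
    {
        // Runs once at startup, so speed is not critical.
        bool changed = true;
        while (changed)
        {
            changed = false;
            for (auto* m : mappers)
                changed |= m->MapTopology();
        }
        // After convergence, the final tables are streamed to the FPGA.
    }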
How do I… • Map address ranges to LLC segments? • Map target cores to nodes? • Pick a number of memory controllers and map them to nodes? • Define a target machine network topology? • Manage interleaving for multiplexing the network and cores?
Map Address Ranges to LLC Segments (SW)

• Build a table of n_llc_map_entries, where each entry is an index to a portion of the distributed LLC.
• icn-mesh.cpp:

    for (int addr_idx = 0; addr_idx < n_llc_map_entries; addr_idx++)
    {
        bool is_last = (addr_idx + 1 == n_llc_map_entries);
        topology->SendParam(TOPOLOGY_NET_LLC_ADDR_MAP,
                            &cores_net_pos[addr_idx % num_cores],
                            sizeof(TOPOLOGY_VALUE),
                            is_last);
    }
Map Address Ranges to LLC Segments (FPGA)

• Consume the table that was streamed in from SW
• last-level-cache-no-coherence.bsv:

    // Define a node that will stream in the topology.  This builds a node
    // on a ring.  The node looks for messages tagged TOPOLOGY_NET_MEM_CTRL_MAP
    // and emits associated payloads.
    let ctrlAddrMapInit <- mkTopologyParamStream(`TOPOLOGY_NET_MEM_CTRL_MAP);

    // Allocate a local memory and initialize it with the streamed-in entries.
    LUTRAM#(Bit#(TLog#(TMul#(8, MAX_NUM_MEM_CTRLS))),
            STATION_ID) memCtrlDstForAddr <- mkLUTRAMWithGet(ctrlAddrMapInit);

    // Map an address to a node ID using the table.
    function STATION_ID getMemCtrlDstForAddr(LINE_ADDRESS addr);
        // Use the low bits of the address as the index (resize does this).
        return memCtrlDstForAddr.sub(resize(addr));
    endfunction
Map Address Ranges to LLC Segments (LLC Hub)

    rule ...
        // Incoming request from core
        if (m_reqFromCore matches tagged Valid .req) begin
            // Which instance of the distributed cache is responsible?
            let dst = getLLCDstForAddr(req.physicalAddress);

            if (dst == local_station_id) begin
                // Local cache handles the address.
                if (can_enq_reqToLocalLLC && ! isValid(m_new_reqToLocalLLC)) begin
                    // Port to LLC is available.  Send the local request.
                    did_deq_reqFromCore = True;
                    m_new_reqToLocalLLC = tagged Valid LLC_MEMORY_REQ { src: tagged Invalid,
                                                                        mreq: req };
                    debugLog.record(cpu_iid, $format("1: Core REQ to local LLC, ") + fshow(req));
                end
            end
            else if (can_enq_reqToRemoteLLC && ! isValid(m_new_reqToRemoteLLC)) begin
                // Remote cache instance handles the address and the OCN request
                // port is available.
                //
                // These requests share the OCN request port since only one type
                // of request goes to a given remote station.  Memory stations get
                // memory requests above.  LLC stations get core requests here.
                did_deq_reqFromCore = True;
                m_new_reqToRemoteLLC = tagged Valid tuple2(dst, req);
                debugLog.record(cpu_iid, $format("1: Core REQ to LLC %0d, ", dst) + fshow(req));
            end
        end
        ...
    endrule
Map Cores and Memory Controllers to Nodes

• All computed (currently) in icn-mesh.cpp
• Given number of target cores and number of memory controllers:
• Builds a rectangle of cores as close to square as possible
• Adds a row of memory controllers at the top and bottom
• Topology streamed to FPGA using same mechanism as address mapping
• E.g., 15 cores and 3 memory controllers (C = core, M = memory controller, x = unused node):

    x M M x
    C C C C
    C C C C
    C C C C
    C C C x
    x M x x
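The placement rule reduces to a few lines; the sketch below is illustrative C++, not the actual icn-mesh.cpp code:

    #include <cmath>
    #include <cstdio>

    int main()
    {
        const int nCores = 15;

        // Core rectangle as close to square as possible.
        int width  = (int)ceil(sqrt((double)nCores));
        int height = (nCores + width - 1) / width;   // rows of cores

        // One extra row above and below for memory controllers.
        printf("core block: %d x %d, mesh: %d x %d stations\n",
               width, height, width, height + 2);
        return 0;
    }

For 15 cores this yields a 4 x 4 core block inside a 4 x 6 mesh, matching the diagram above.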
Network Topology: Map Cores/Memory Controllers to Nodes • Multiplexed order of nodes is the same as order of cores • No permutations required for local port • Nodes are connected to: • Core • Memory controller • Nothing • The node doesn’t care what is connected! • Hide indirection in ports
Network Topology: Map Cores/Memory Controllers to Nodes

• In icn-mesh.bsv:

    //
    // Local ports are a dynamic combination of CPUs, memory controllers, and
    // NULL connections.
    //
    // localPortMap indicates, for each multiplexed port instance ID, the type
    // of local port attached (CPU, memory controller, NULL).
    //
    let localPortInit <- mkTopologyParamStream(`TOPOLOGY_NET_LOCAL_PORT_TYPE_MAP);
    LUTRAM#(Bit#(TLog#(TAdd#(TAdd#(MAX_NUM_CPUS, 1), NUM_STATIONS))),
            Bit#(2)) localPortMap <- mkLUTRAMWithGet(localPortInit);

    PORT_SEND_MULTIPLEXED#(MAX_NUM_CPUS, OCN_MSG) enqToCores <-
        mkPortSend_Multiplexed("Core_OCN_Connection_InQ_enq");
    PORT_SEND_MULTIPLEXED#(MAX_NUM_MEM_CTRLS, OCN_MSG) enqToMemCtrl <-
        mkPortSend_Multiplexed("ocn_to_memctrl_enq");
    PORT_SEND_MULTIPLEXED#(NUM_STATIONS, OCN_MSG) enqToNull <-
        mkPortSend_Multiplexed_NULL();

    let enqToLocal <- mkPortSend_Multiplexed_Split3(enqToCores, enqToMemCtrl, enqToNull,
                                                    localPortMap);
Network Topology: Defining Inter-Node Edges

• Each network node has five ports: North, East, South, West, and Local.

[Figure: a node with N, E, S, and W edge ports plus a Local port.]
Network Multiplexing • Logically, there are n nodes in the network. • Each has a local port connected to a core, to a memory controller, or to nothing. • Network connection mapping and routing determine the topology. • The topology manager defines the routing table. • Note: Dateline not yet implemented
Network Topology and Routing • Torus:
Network Topology and Routing • Mesh (connections identical, routing table ignores some edges):
Network Topology and Routing • Bi-directional ring:
Network Topology and Routing • Uni-directional ring:
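Since the physical edges are the same in all four cases, the routing table alone selects the target topology. A minimal sketch (hypothetical helper, not HAsim source) of next-hop selection on an n-node ring: the bi-directional version routes the shorter way around, while the uni-directional version always routes in one direction:

    #include <cstdio>

    enum Dir { EAST, WEST };

    // Next hop from 'node' toward 'dest' on an n-node ring.
    Dir NextHop(int node, int dest, int n, bool biDirectional)
    {
        if (!biDirectional)
            return EAST;                                  // uni-directional ring
        int eastHops = (dest - node + n) % n;
        return (eastHops <= n - eastHops) ? EAST : WEST;  // shorter way around
    }

    int main()
    {
        const int n = 8;
        printf("bi-dir  0 -> 6 goes %s\n", NextHop(0, 6, n, true)  == EAST ? "E" : "W");
        printf("uni-dir 0 -> 6 goes %s\n", NextHop(0, 6, n, false) == EAST ? "E" : "W");
        return 0;
    }

A mesh works the same way: the edges still form a torus, but its routing table simply never uses the wraparound links.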
Final Problem: Multiplexing On-Chip Network Routers

[Figure repeated from "Multiplexing On-Chip Network Routers" above: routers 0..3 multiplexed onto one router, with reorder permutations σ(x) = (x + 1) mod 4, σ(x) = (x + 2) mod 4, and σ(x) = (x + 3) mod 4 between passes.]
Network Topology: Communication Across Multiplexed Nodes • Each node talks to a different multiplexed node instance • Naïve port binding would have each node talk only to itself • A-Ports are already buffered • Bury transformation in A-Ports • Retain simple read next / write next port semantics within models
Network Topology: Communication Across Multiplexed Nodes

• icn-mesh.bsv:

    // Initialization from topology manager
    ReadOnly#(STATION_IID) meshWidth  <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_WIDTH);
    ReadOnly#(STATION_IID) meshHeight <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_HEIGHT);

    // Outbound and inbound ports are loopbacks to the same multiplexed module.
    // Ports connect to logically different nodes but physically to the same
    // simulator object.
    Vector#(NUM_PORTS, PORT_SEND_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqTo   = newVector();
    Vector#(NUM_PORTS, PORT_RECV_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqFrom = newVector();

    // Outbound port is a normal A-Port.  It has no buffering.
    enqTo[portEast] <- mkPortSend_Multiplexed("mesh_interconnect_enq_E");

    // Inbound port provides buffering for multiplexing.  Instead of forwarding
    // messages FIFO it must transform the messages so they cross to the correct
    // multiplexed instance when instances (nodes) are traversed sequentially.
    enqFrom[portWest] <- mkPortRecv_Multiplexed_ReorderLastToFirstEveryN(
        "mesh_interconnect_enq_E", 1, meshWidth, meshHeight);
    . . .
    enqFrom[portEast] <- mkPortRecv_Multiplexed_ReorderFirstToLastEveryN(
        "mesh_interconnect_enq_W", 1, meshWidth, meshHeight);
    enqFrom[portSouth] <- mkPortRecv_Multiplexed_ReorderFirstNToLastN(
        "mesh_interconnect_enq_N", 1, meshWidth);
    enqFrom[portNorth] <- mkPortRecv_Multiplexed_ReorderLastNToFirstN(
        "mesh_interconnect_enq_S", 1, meshWidth);