Technion – Israel Institute of Technology, Department of Electrical Engineering, High Speed Digital Systems Lab. Winter 2009. Implementing a NoMC on the Gidel platform: end-semester presentation. Instructor: Evgeny Fiksman. Students: Meir Cohen, Daniel Marcovitch.
Project goals
• Implementing a parallel processing system that contains several NoCs, each chip containing several sub-networks of processors.
• Converting the existing router to support the Altera platform.
• Expanding the router to enable communication between similar sub-networks.
• Implementing a processor network that supports communication with the PC, enabling:
  • Use of the PC's CPU as part of the processing network.
  • Simple I/O between the PC and the rest of the processing network.
Top-level structure of the expanded network
• Each white square represents a single FPGA on the Gidel board.
• FPGA-FPGA and FPGA-PC routes go via designated routers (GW).
• The GWs' design and protocols are the same as those of the internal routers.
Router from previous project
• Two main units:
  • Permission Unit
  • Port FSM
• Time limited
• Round-robin arbiter
• Port-to-port & broadcasting
• Smart connectivity:
  • R – R
  • R – Core
• Modular design
Permission process
• Round-robin arbiter: service order follows a loop counter.
• Check that DEST is not busy.
• Permit for a 'time slot'.
• If a port is not requesting, service the next requesting port.
• BUSY and LAST writing ports are saved.
• Check each message's COMM and direct it to the relevant port according to the table.
• Broadcast priority allows only one broadcast at a time.
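The permission process above can be sketched as a small behavioral model. This is only an illustration, not the router's HDL: the class name `RoundRobinArbiter`, the method `grant`, and the default of four ports are assumptions, and the time-slot expiry, the saved BUSY/LAST ports and the broadcast priority are left out.

```python
from typing import Optional, Sequence, Set


class RoundRobinArbiter:
    """Behavioral model of a round-robin permission unit (illustrative only)."""

    def __init__(self, num_ports: int = 4) -> None:
        self.num_ports = num_ports  # assumed port count
        self.counter = 0            # loop counter: port that is first in line

    def grant(self, requests: Sequence[Optional[int]],
              busy: Set[int]) -> Optional[int]:
        """Return the port that gets the next time slot, or None.

        requests[p] is the destination port requested by source port p,
        or None when port p is not requesting.
        busy holds the destination ports that are currently in use.
        """
        for offset in range(self.num_ports):
            port = (self.counter + offset) % self.num_ports
            dest = requests[port]
            if dest is None:
                continue  # not requesting: try the next port
            if dest in busy:
                continue  # destination busy: this request has to wait
            # Grant one time slot; the following port is first in line next time.
            self.counter = (port + 1) % self.num_ports
            return port
        return None
```

Calling `grant` repeatedly with the same request vector serves the requesting ports in turn, so no single port can hold the router.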
Our changes to the router
New router types:
• Local router (LR)
• Fabric router (FR)
• Primary/secondary interchip router (P/S-ICR)
• PC router (PCR)
Changes:
• Fifth port
• Routing table
• Broadcast table
Fifth port
Just adding another port module to the ring…
Routing
Address fields (from the diagram): chip, fabric, local; comm, rank.
• Local router:
  • Same comm: routing by rank.
  • Other comms: to the 5th port.
• Other routers:
  • Routing by comm only.
• Result: smaller routing tables.
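The two routing rules can be sketched as follows. The slide does not spell out how the address fields map onto comm and rank, so this sketch assumes that the comm is the (chip, fabric) pair and the rank is the local field. The names `Address`, `FIFTH_PORT`, `local_router_port` and `other_router_port` are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class Address(NamedTuple):
    chip: int
    fabric: int
    local: int  # assumed to be the rank inside the sub-network

    @property
    def comm(self) -> Tuple[int, int]:
        # Assumption: a comm is identified by the (chip, fabric) pair.
        return (self.chip, self.fabric)


FIFTH_PORT = 4  # hypothetical index of the fifth port


def local_router_port(dest: Address, my_comm: Tuple[int, int],
                      rank_table: Dict[int, int]) -> int:
    """Local router: route by rank inside its own comm, otherwise go up."""
    if dest.comm == my_comm:
        return rank_table[dest.local]
    return FIFTH_PORT


def other_router_port(dest: Address,
                      comm_table: Dict[Tuple[int, int], int]) -> int:
    """Other routers: route by comm only, so the table has one entry per comm."""
    return comm_table[dest.comm]
```

Under these assumptions a local router's table needs one entry per rank in its own sub-network, and every other router needs one entry per comm, which is where the smaller tables come from.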
Routing Non-existing components to be added.
Broadcast table
• Broadcasting only to spanning-tree branches.
• The table tags branch ports with the value '1'.
• The table is connected to the "Port FSM" unit of each port.
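A minimal sketch of how such a table could drive forwarding. The function name `broadcast_ports` is hypothetical, and skipping the port the broadcast arrived on is an assumption (the usual way to flood along a spanning tree), not something the slide states.

```python
from typing import List


def broadcast_ports(table: List[int], in_port: int) -> List[int]:
    """Return the ports a broadcast is forwarded to.

    table[p] is 1 when port p lies on a spanning-tree branch, else 0.
    The arrival port is skipped (assumption) so the message never goes back.
    """
    return [port for port, tag in enumerate(table)
            if tag == 1 and port != in_port]
```

Because only tree branches are tagged, each router on the tree receives a given broadcast once.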
Software design
Software layers:
• Application layer: MPI functions interface.
• Network layer: hardware-independent implementation of these functions.
• Data layer: relies on command bit fields.
• Physical layer: designed for FSL bus.
Changes:
• Add async. functions.
• Adjusted for new comm size.
• Adjusted to conform with the Altera i/f.
• Using DMA transfers.
Message Passing Flow
Sending:
• MPI_Isend only adds a send request to the sending list.
• Each send request holds: destination, tag, buffer address, size.
• DMA sends the data from the source buffer to the network asynchronously.
Receiving:
• MPI_Irecv only adds a receive request to the receiving list.
• Each receive request holds: source, tag, buffer address, size.
• DMA receives data from the network asynchronously into the auxiliary receive buffer (constant).
• Data is transferred into the destination buffer in the background by a DMA transfer.
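The sending side of this flow can be sketched as a request list that a background step drains. This is a model of the idea only, not the project's code: `SendRequest`, `SendQueue`, `isend` and `dma_step` are hypothetical names, a `bytes` object stands in for the buffer address, and a Python list stands in for the network.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class SendRequest:
    destination: int
    tag: int
    buffer: bytes      # stands in for the buffer address
    size: int
    done: bool = False


class SendQueue:
    def __init__(self) -> None:
        self.pending: Deque[SendRequest] = deque()  # the sending list
        self.wire: List[bytes] = []                 # stands in for the network

    def isend(self, destination: int, tag: int, buffer: bytes) -> SendRequest:
        """Non-blocking send: only records the request and returns."""
        request = SendRequest(destination, tag, buffer, len(buffer))
        self.pending.append(request)
        return request

    def dma_step(self) -> None:
        """One background step: move the oldest pending request to the wire."""
        if not self.pending:
            return
        request = self.pending.popleft()
        self.wire.append(request.buffer[:request.size])
        request.done = True
```

The caller keeps the returned request and checks `done` later, which is the role a wait or test call plays for a non-blocking send.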
Obstacle 1 - Memory bottleneck
• Each Nios uses ~13Kb of on-chip memory.
• The FPGA has only ~70Kb of on-chip memory, so only 5 processors fit.
Solutions:
• Off-chip memory: slow.
• Reducing the program footprint.
• Using a bigger FPGA for the whole network.
Obstacle 2 - Cache coherency
[Diagram: a DMA buffer in memory overlapping cache lines in the cache]
• A cache flush is necessary but not enough!
• Incoherency occurs in cache lines that the buffer only partly covers (unaligned buffers).
Solutions:
• Not using the cache: the asynchronous system would not be effective.
• Disabling the cache in the buffer area: the cache cannot be used after a DMA transfer.
• Aligning DMA buffers to cache lines (using memalign).
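The alignment argument is plain address arithmetic, sketched below. The 32-byte line size is an assumption for illustration, and `align_up` and `shares_lines` are hypothetical helpers. Rounding the buffer size up to a whole number of lines is also an assumption that follows from the same reasoning, since the slide only mentions aligning the buffers.

```python
LINE = 32  # assumed cache line size in bytes


def align_up(value: int, line: int = LINE) -> int:
    """Round value up to the next multiple of the cache line size."""
    return (value + line - 1) // line * line


def shares_lines(addr: int, size: int, line: int = LINE) -> bool:
    """True if the buffer's first or last cache line also holds other data.

    Such a shared line can be written back by the CPU on behalf of the
    neighbouring data and overwrite what the DMA placed in memory.
    """
    return addr % line != 0 or (addr + size) % line != 0


# An unaligned 100-byte buffer shares its edge lines with its neighbours.
unaligned = shares_lines(0x1004, 100)
# Aligning the start and padding the size gives the buffer whole lines.
aligned = shares_lines(align_up(0x1004), align_up(100))
```

In the C code this corresponds to allocating with memalign at the line size and requesting a padded size.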
Local router testing
[Diagram: four NiosII processors, each with a PIO, a local router and a PC; legend: PIO-to-FIFO connector, simple FIFO]
Testing program:
• The PIOs output debug information: data sent/received and results.
• The test program prints the PIO data on screen.
• In simulation, the PIO values can be read directly from the waveform.
Application
Multiple matrix multiplication.
[Diagram: four MUL blocks]
Questions