Implementing a NoMC on the Gidel platform end-semester presentation

Technion – Israel Institute of Technology
Department of Electrical Engineering
High Speed Digital Systems Lab
Winter 2009
Instructor: Evgeny Fiksman
Students: Meir Cohen, Daniel Marcovitch

Project goals
• Implementing a parallel processing system that contains several NoCs, each chip containing several sub-networks of processors.
• Converting the existing router to support the Altera platform.
• Expanding the router to enable communication between similar sub-networks.
• Implementing a processor network that supports communication with the PC, enabling:
  • Use of the PC’s CPU as part of the processing network.
  • Simple I/O between the PC and the rest of the processing network.

Top-level structure of the expanded network
• Each white square represents a single FPGA on the Gidel board.
• FPGA-FPGA and FPGA-PC routes go via designated routers (GW).
• The GW design and protocols are the same as those of the internal routers.

Router from previous project
• Two main units:
  • Permission Unit
  • Port FSM
• Time limited
• Round Robin arbiter
• Port to Port & broadcasting
• Smart Connectivity:
  • R – R
  • R – Core
• Modular design

Permission process
• Round Robin arbiter: service order follows a loop counter.
• Check that DEST is not busy.
• Permit for a ‘time slot’.
• If a port is not requesting, service the next requesting port.
• BUSY and LAST writing ports are saved.
• Check the message’s COMM and direct it to the relevant port according to the table.
• Broadcast priority ensures that only one broadcast is active at a time.
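The permission steps above can be sketched in software as follows. This is only an illustrative model, not the router's actual HDL: the port count of five, the type `arbiter_t`, and the function `arbiter_grant` are hypothetical names chosen for the example, and the caller is assumed to clear a destination's busy flag when its time slot ends.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS 5    /* assumption: four ring ports plus the fifth port */
#define NO_GRANT  (-1)

/* Hypothetical software model of the permission unit's state. */
typedef struct {
    bool requesting[NUM_PORTS]; /* port i wants to send */
    int  dest[NUM_PORTS];       /* destination port requested by port i */
    bool busy[NUM_PORTS];       /* destination port is currently in use */
    int  counter;               /* loop counter: first port to consider */
} arbiter_t;

/* Grant one time slot. Returns the granted source port, or NO_GRANT. */
int arbiter_grant(arbiter_t *a)
{
    assert(a->counter >= 0 && a->counter < NUM_PORTS);
    for (int i = 0; i < NUM_PORTS; i++) {
        int src = (a->counter + i) % NUM_PORTS;
        if (!a->requesting[src])
            continue;                    /* not requesting: try next port */
        int d = a->dest[src];
        assert(d >= 0 && d < NUM_PORTS);
        if (a->busy[d])
            continue;                    /* DEST is busy: try next port */
        a->busy[d] = true;               /* reserve DEST for this slot */
        a->counter = (src + 1) % NUM_PORTS; /* next search starts after src */
        return src;
    }
    return NO_GRANT;
}
```

Advancing the counter past the granted port is what keeps the service order fair: a port that was just served is considered last on the next call.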

Our changes for the router
New router types:
• Local router (LR)
• Fabric router (FR)
• Primary/secondary interchip router (P/S-ICR)
• PC router (PCR)
Changes:
• Fifth port
• Routing table
• Broadcast table

Fifth port
• Just adding another port module to the ring…

Routing
(Diagram labels: address fields chip, fabric, local; comm; rank.)
• Local router:
  • Similar comm – routing by rank.
  • Other comms – to 5th port.
• Other routers:
  • Routing by comm only.
• Result: smaller routing tables.
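The routing rule above can be illustrated with a small sketch. The table sizes, the port numbering, and the names `address_t`, `local_route`, and `comm_route` are assumptions made for the example; the slides do not give the real field widths or table layout.

```c
#include <assert.h>

#define FIFTH_PORT 4   /* assumption: ports are numbered 0..4 */
#define MAX_RANKS  4   /* assumption: ranks served by one local router */
#define MAX_COMMS  8   /* assumption: comms known to a non-local router */

typedef struct {
    int comm;  /* communicator (sub-network) identifier */
    int rank;  /* rank inside that communicator */
} address_t;

/* Local router: routes by rank inside its own comm only. */
typedef struct {
    int own_comm;
    int port_by_rank[MAX_RANKS];
} local_router_t;

int local_route(const local_router_t *r, address_t dst)
{
    if (dst.comm != r->own_comm)
        return FIFTH_PORT;            /* other comms leave via the 5th port */
    assert(dst.rank >= 0 && dst.rank < MAX_RANKS);
    return r->port_by_rank[dst.rank]; /* same comm: route by rank */
}

/* Any other router type: routes by comm only, ignoring the rank. */
typedef struct {
    int port_by_comm[MAX_COMMS];
} comm_router_t;

int comm_route(const comm_router_t *r, address_t dst)
{
    assert(dst.comm >= 0 && dst.comm < MAX_COMMS);
    return r->port_by_comm[dst.comm];
}
```

Each table is indexed by only one field of the address, which is why the tables stay small: a local router needs one entry per local rank, and every other router needs one entry per comm.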

Routing Non-existing components to be added.

Broadcast table
• Broadcasting only to spanning tree branches.
• Table tags branch ports with a ‘1’ value.
• Connected to the “Port FSM” unit of each port.
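One way to picture the broadcast table is as a bit mask with one bit per port. The sketch below assumes five ports and, as an additional assumption not stated on the slide, that a broadcast is never sent back out of the port it arrived on; `broadcast_targets` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 5  /* assumption: four ring ports plus the fifth port */

/*
 * table:   bit i is 1 if port i is a spanning tree branch.
 * in_port: port on which the broadcast arrived.
 * Returns the mask of ports the broadcast should be forwarded to.
 */
uint8_t broadcast_targets(uint8_t table, int in_port)
{
    assert(in_port >= 0 && in_port < NUM_PORTS);
    uint8_t valid = (uint8_t)((1u << NUM_PORTS) - 1u); /* keep 5 low bits */
    return (uint8_t)(table & valid & ~(1u << in_port));
}
```

For example, with branch ports 1, 2 and 4 tagged (`table = 0x16`), a broadcast arriving on port 2 is forwarded to ports 1 and 4 only (`0x12`). Because forwarding follows spanning tree branches, every router receives each broadcast exactly once.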

Software design
Software layers:
• Application layer: MPI functions interface.
• Network layer: hardware-independent implementation of these functions.
• Data layer: relies on command bit fields.
• Physical layer: designed for FSL bus.
Changes:
• Add async. functions.
• Adjusted for new comm size.
• Adjust to conform with the Altera i/f.
• Using DMA transfers.

Message Passing Flow
Sending
• MPI_Isend: only adds a send request to the sending list.
• Each sending-list entry holds: Destination, Tag, Buffer address, Size.
• DMA sends data asynchronously (Source Buffer → DMA transfer → Network).
Receiving
• MPI_Irecv: only adds a receive request to the receiving list.
• Each receiving-list entry holds: Source, Tag, Buffer address, Size.
• DMA receives data asynchronously; data is transferred into the buffer in the background (Network → DMA transfer → Auxiliary Receive Buffer (Constant) → DMA transfer → Destination Buffer).
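The sending side of this flow can be sketched as a fixed-size request list. This is a simplified illustration rather than the project's code: the list capacity and the names `send_req_t`, `send_list_t`, `isend_enqueue`, and `send_list_next` are hypothetical, and the actual DMA programming is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8  /* assumption: capacity of the sending list */

/* One entry of the sending list, with the fields named on the slide. */
typedef struct {
    int         dest;  /* destination rank */
    int         tag;   /* message tag */
    const void *buf;   /* buffer address */
    size_t      size;  /* size in bytes */
} send_req_t;

typedef struct {
    send_req_t items[MAX_PENDING]; /* circular FIFO of pending requests */
    size_t     head;               /* index of the oldest request */
    size_t     count;              /* number of pending requests */
} send_list_t;

/* Non-blocking send: only records the request. Returns 0, or -1 if full. */
int isend_enqueue(send_list_t *l, int dest, int tag,
                  const void *buf, size_t size)
{
    assert(l != NULL && buf != NULL);
    if (l->count == MAX_PENDING)
        return -1;
    size_t tail = (l->head + l->count) % MAX_PENDING;
    l->items[tail].dest = dest;
    l->items[tail].tag  = tag;
    l->items[tail].buf  = buf;
    l->items[tail].size = size;
    l->count++;
    return 0;
}

/* Called when the DMA engine is idle: pops the oldest request.
 * Returns 1 and fills *out if a request was pending, 0 otherwise. */
int send_list_next(send_list_t *l, send_req_t *out)
{
    assert(l != NULL && out != NULL);
    if (l->count == 0)
        return 0;
    *out = l->items[l->head];
    l->head = (l->head + 1) % MAX_PENDING;
    l->count--;
    return 1;
}
```

The caller returns immediately after `isend_enqueue`; a background step (for instance a DMA-completion handler) would call `send_list_next` and start the next transfer. A receiving list could follow the same pattern with a source field in place of the destination.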

Obstacle 1 - Memory bottleneck
• Each Nios uses ~13Kb of onchip memory.
• The FPGA has only ~70Kb of onchip memory.
• Only 5 processors fit.
Solutions:
• Offchip memory – slow.
• Reducing program footprint.
• Using a bigger FPGA for the whole network.

Obstacle 2 - Cache coherency
(Diagram: a DMA buffer in memory overlapping several cache lines, with the partially covered lines marked.)
• Cache flush is necessary but not enough!
• Incoherency in unaligned cache lines.
Solutions:
• Not using the cache – the asynchronous system is then not effective.
• Disabling the cache in the buffer area – the cache cannot be used after a DMA transfer.
• Aligning DMA buffers to cache lines (using memalign).
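The alignment solution can be sketched as below. The slide names `memalign`; this example swaps in the C11 function `aligned_alloc` so that it compiles on a standard hosted toolchain. The 32-byte line size is an assumption (the real value depends on the Nios II cache configuration), and rounding the size up to whole lines is an addition of this sketch, so that no other data shares the buffer's last cache line. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 32u  /* assumption: data cache line size in bytes */

/* Round n up to a whole number of cache lines. */
size_t round_up_to_line(size_t n)
{
    return (n + CACHE_LINE - 1u) / CACHE_LINE * CACHE_LINE;
}

/* Allocate a DMA buffer that starts on a cache line boundary and
 * occupies only whole cache lines. Release it with free(). */
void *dma_buffer_alloc(size_t n)
{
    assert(n > 0);
    return aligned_alloc(CACHE_LINE, round_up_to_line(n));
}
```

With a buffer laid out this way, flushing or invalidating its cache lines cannot touch neighbouring variables, and a later write-back of neighbouring data cannot overwrite bytes that the DMA engine placed in memory.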

Local router testing
(Diagram: four NiosII cores with PIOs, PIO to FIFO connectors, simple FIFOs, the local router, and the PC.)
Testing program:
• PIOs output debug information, data sent/received, and results.
• The test program prints the PIO data on screen.
• In simulation, the PIOs can be read directly from the waveform.

Application
• Multiple matrix multiplication.
(Diagram: four MUL units.)

Questions
