
Implementing a NoMC on the Gidel platform - end-semester presentation


Presentation Transcript


  1. Technion – Israel Institute of Technology
  Department of Electrical Engineering
  High Speed Digital Systems Lab, Winter 2009
  Implementing a NoMC on the Gidel platform - end-semester presentation
  Instructor: Evgeny Fiksman
  Students: Meir Cohen, Daniel Marcovitch

  2. Table of Contents

  3. Project goals
  • Implementing a parallel processing system that contains several NoCs, each chip containing several sub-networks of processors.
  • Converting the existing router to support the Altera platform.
  • Expanding the router to enable communication between similar sub-networks.
  • Implementing a processor network that supports communication with the PC, enabling:
    • Use of the PC's CPU as part of the processing network.
    • Simple I/O between the PC and the rest of the processing network.

  4. Top-level structure of the expanded network
  • Each white square represents a single FPGA on the Gidel board.
  • FPGA-FPGA and FPGA-PC routes go via designated gateway routers (GW).
  • The GW design and protocols are the same as those of the internal routers.

  5. Router from previous project
  • Two main units:
    • Permission Unit
    • Port FSM
  • Time-limited, round-robin arbiter
  • Port-to-port & broadcasting
  • Smart connectivity:
    • Router - Router (R-R)
    • Router - Core (R-Core)
  • Modular design

  6. Permission process
  • Round-robin arbiter: service order according to the loop counter.
  • Check that the DEST port is not busy.
  • Permit for a 'time slot'.
  • If a port is not requesting, service the next requesting port.
  • The BUSY and LAST writing ports are saved.
  • Check the message COMM and direct to the relevant port according to the table.
  • Broadcast priority enables only one broadcast at a time.
  (A software-level sketch of this arbitration follows below.)
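The arbitration above can be pictured in software terms. A minimal C sketch under assumed names (NUM_PORTS, TIME_SLOT, dest_busy, permission_unit are illustrative; the real unit is FPGA logic, not C):

    #include <stdbool.h>

    #define NUM_PORTS 5
    #define TIME_SLOT 16                 /* cycles granted per permit (assumed) */

    static int  loop_counter = 0;        /* round-robin service pointer */
    static bool dest_busy[NUM_PORTS];    /* BUSY flag per destination port */

    /* Grant the next requesting port whose destination is free; the grant
       lasts one TIME_SLOT. Returns the granted port, or -1 if none. */
    int permission_unit(const bool request[NUM_PORTS], const int dest[NUM_PORTS])
    {
        for (int i = 0; i < NUM_PORTS; i++) {
            int port = (loop_counter + i) % NUM_PORTS;
            if (!request[port])              /* not requesting: try next port */
                continue;
            if (dest_busy[dest[port]])       /* destination busy: skip */
                continue;
            dest_busy[dest[port]] = true;    /* mark destination BUSY */
            loop_counter = (port + 1) % NUM_PORTS;
            return port;
        }
        return -1;
    }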

  7. Our changes to the router
  New router types:
  • Local router (LR)
  • Fabric router (FR)
  • Primary/secondary inter-chip router (P/S-ICR)
  • PC router (PCR)
  Changes:
  • Fifth port
  • Routing table
  • Broadcast table

  8. Fifth port
  Just adding another port module to the ring…

  9. Routing
  [Figure: the node address is split into chip, fabric, and local fields; each message carries a comm and a rank.]
  • Local router:
    • Same comm: routing by rank.
    • Other comms: to the 5th port.
  • Other routers:
    • Routing by comm only.
  • Result: smaller routing tables (a decode sketch follows below).
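A hedged C sketch of this address split and the local-router decision; the field widths, node_addr_t, and the count of four local ports are illustrative assumptions, not values taken from the design:

    /* Illustrative address split; the real field widths are not given. */
    typedef struct {
        unsigned local  : 4;   /* rank inside the local sub-network */
        unsigned fabric : 4;   /* sub-network (fabric) within the chip */
        unsigned chip   : 4;   /* FPGA on the Gidel board */
    } node_addr_t;

    /* Local-router decision: same comm -> route by rank, else -> 5th port. */
    int route(unsigned my_comm, unsigned pkt_comm, unsigned pkt_rank)
    {
        if (pkt_comm == my_comm)
            return pkt_rank % 4;   /* one of the four local ports (assumed) */
        return 4;                  /* 5th port, toward the next router level */
    }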

  10. Routing
  [Diagram of the routing hierarchy; non-existing components still to be added.]

  11. Broadcast table
  • Broadcasting only to spanning-tree branches.
  • The table tags branch ports with a '1' value.
  • It is connected to the "Port FSM" unit of each port.
  (A minimal sketch of the table lookup follows below.)
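A minimal C sketch of the per-port tag lookup, assuming one bit per port; the table value and the send_on_port helper are hypothetical:

    #include <stdint.h>

    #define NUM_PORTS 5

    void send_on_port(int port, const void *pkt);   /* hypothetical helper */

    /* One bit per port; bit p == 1 tags port p as a spanning-tree branch. */
    static uint8_t bcast_table = 0x0B;              /* example: ports 0, 1, 3 */

    void broadcast(const void *pkt, int in_port)
    {
        for (int p = 0; p < NUM_PORTS; p++) {
            if (p == in_port)                /* never echo back to the sender */
                continue;
            if (bcast_table & (1u << p))     /* forward on tagged branches only */
                send_on_port(p, pkt);
        }
    }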

  12. Software design
  Software layers, with the adjustments made in this project:
  • Application layer: MPI functions interface. Asynchronous functions added.
  • Network layer: hardware-independent implementation of these functions. Adjusted for the new comm size.
  • Data layer: relies on the command bit fields. Adjusted to conform with the Altera interface.
  • Physical layer: designed for the FSL bus. Uses DMA transfers.
  (A prototype-level sketch of the layering follows below.)
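To make the layering concrete, here is one way the interfaces might stack in C; everything below the two MPI calls is an illustrative assumption, since the slides only name the layers:

    #include <stdint.h>
    #include <stddef.h>

    /* Application layer: MPI-style interface, incl. the added async calls. */
    int MPI_Isend(const void *buf, size_t size, int dest, int tag, int *req);
    int MPI_Irecv(void *buf, size_t size, int src, int tag, int *req);

    /* Network layer: hardware-independent delivery of a message to a rank. */
    int net_send(int comm, int rank, const void *payload, size_t size);

    /* Data layer: packs the command bit fields (comm, rank, tag, size). */
    uint32_t data_build_header(int comm, int rank, int tag, size_t size);

    /* Physical layer: moves words over the bus, using DMA transfers. */
    int phy_write_dma(const uint32_t *words, size_t n);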

  13. Message passing flow
  Sending:
  • MPI_Isend only adds a send request (destination, tag, buffer address, size) to the sending list.
  • The DMA then sends the data from the source buffer asynchronously into the network.
  Receiving:
  • MPI_Irecv only adds a receive request (source, tag, buffer address, size) to the receiving list.
  • Incoming data is transferred in the background into a constant auxiliary receive buffer, and from there DMA-copied to the destination buffer.
  (A sketch of the request record follows below.)
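A minimal C sketch of the request record and the posting step; the concrete names (request_t, send_list, MPI_Isend_sketch) are assumptions about the mechanism described above:

    #include <stddef.h>

    /* One pending transfer, as kept on the sending/receiving lists. */
    typedef struct request {
        int             peer;     /* destination (send) or source (receive) */
        int             tag;
        void           *buffer;   /* user buffer address */
        size_t          size;
        struct request *next;
    } request_t;

    static request_t *send_list;  /* drained asynchronously by the DMA */

    /* MPI_Isend only enqueues the request; the data moves later via DMA. */
    int MPI_Isend_sketch(void *buf, size_t size, int dest, int tag, request_t *r)
    {
        r->peer   = dest;
        r->tag    = tag;
        r->buffer = buf;
        r->size   = size;
        r->next   = send_list;
        send_list = r;
        return 0;
    }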

  14. Obstacle 1 - Memory bottleneck
  • Each Nios II uses ~13 KB of on-chip memory.
  • The FPGA has only ~70 KB of on-chip memory, so only 5 processors fit (⌊70/13⌋ = 5).
  Solutions:
  • Off-chip memory - slow.
  • Reducing the program footprint.
  • Using a bigger FPGA for the whole network.

  15. Obstacle 2 - Cache coherency
  [Figure: a DMA buffer whose first and last cache lines are shared with unrelated data in memory.]
  • A cache flush is necessary but not sufficient!
  • Incoherency remains in unaligned cache lines (the lines shared with other data).
  Solutions:
  • Not using the cache - the asynchronous system becomes ineffective.
  • Disabling the cache in the buffer area - the cache cannot be used after the DMA transfer.
  • Aligning DMA buffers to cache lines (using memalign) - the chosen fix, sketched below.
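The memalign fix might look as follows on Nios II; CACHE_LINE = 32 is an assumed line size, memalign comes from the C library, and alt_dcache_flush is the Nios II HAL data-cache flush:

    #include <malloc.h>          /* memalign() */
    #include <sys/alt_cache.h>   /* alt_dcache_flush() - Nios II HAL */

    #define CACHE_LINE 32        /* assumed data-cache line size in bytes */

    /* Allocate a DMA buffer that starts and ends on cache-line boundaries,
       so no cache line is shared between the buffer and unrelated data. */
    void *dma_buf_alloc(size_t size)
    {
        size_t padded = (size + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
        return memalign(CACHE_LINE, padded);
    }

    /* Flush dirty lines to memory before the DMA engine touches the buffer. */
    void dma_buf_sync(void *buf, size_t len)
    {
        alt_dcache_flush(buf, len);
    }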

  16. Local router testing
  [Test setup: four Nios II cores, each with a PIO and a PIO-to-FIFO connector, attached through simple FIFOs to the local router; the PC is attached through another simple FIFO.]
  Testing program:
  • The PIOs output debug information, the data sent/received, and the results.
  • The test program prints the PIO data on screen (a read sketch follows below).
  • In simulation the PIO can be read directly from the waveform.
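Reading a debug word back through a PIO could look like this, using the Altera Avalon PIO register macros; PIO_BASE is a placeholder for the base address that SOPC Builder generates in system.h:

    #include <stdio.h>
    #include "altera_avalon_pio_regs.h"   /* IORD_ALTERA_AVALON_PIO_DATA */

    #define PIO_BASE 0x0   /* placeholder; in practice taken from system.h */

    /* Print one PIO word (debug info, data sent/received, or a result). */
    void dump_pio(void)
    {
        unsigned word = IORD_ALTERA_AVALON_PIO_DATA(PIO_BASE);
        printf("PIO: 0x%08x\n", word);
    }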

  17. Application
  Multiple matrix multiplication.
  [Figure: four MUL nodes in the network, each performing one multiplication.]
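For a feel of the per-node work, a plain C multiply kernel; the slides do not detail the distribution scheme, so running this on each node's share of the operands is an assumption:

    /* C = A * B for square n x n row-major matrices; each MUL node would
       run a kernel like this on the operands it receives. */
    void matmul(int n, const float *A, const float *B, float *C)
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                float acc = 0.0f;
                for (int k = 0; k < n; k++)
                    acc += A[i * n + k] * B[k * n + j];
                C[i * n + j] = acc;
            }
    }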

  18. Questions
