ESC499 – A TMD-MPI/MPE Based Heterogeneous Video System

Tony Zhou, Prof. Paul Chow
April 6th, 2010

Background
• Message Passing Interface (MPI) is a specification for an API that allows many computers to communicate with one another.
  • An API is an abstraction that defines and describes an interface for interacting with a set of functions.
  • MPI has become a de facto standard for communication among processes that model a parallel program running on a distributed-memory system.
• Prof. Paul Chow's research
  • Hardware systems are better suited to parallel processing, and the reconfigurable nature of FPGAs makes designing hardware computing engines (CEs) easier.
  • Just as MPI does for software developers, TMD-MPI provides software and hardware middleware layers that abstract communication. This enables portable interaction between embedded processors, CEs, and x86 processors.
ESC499 – EngSci Thesis

Filling the Gap and Defining the Scope
• TMD-MPI research is still in its infancy compared to the MPI standard, and implementations and characterizations of designs are lacking.
• This project attempts to fill this gap by investigating alternative approaches to presenting hardware and software elements.
• Once a simple, feasible heterogeneous system has been demonstrated, the thesis will focus on expanding the network of software elements to exploit more parallelism.
[Figure: a network of hardware elements and software elements]

Objectives
• Create a heterogeneous video processing system that demonstrates TMD-MPI's capabilities as the interface between CEs and software processes.
  • The system is called heterogeneous because it combines hardware engines and software processes.
• Implement and characterize different configurations of the system.
Research and Groundwork
• Manuel Saldana's paper, "A Parallel Programming Model for a Multi-FPGA Multiprocessor Machine"
  • TMD-MPI Library v1.0: a software MPI interface designed for the Xilinx MicroBlaze.
  • TMD-MPE v1.0: a hardware implementation of the send and receive commands of the TMD-MPI library.
• Jeff Goeder's project
  • Video system groundwork: streams video from the VGA port to external memory and then to the DVI output, using MPE-to-MPE message passing.

System Block Diagram

High-Level Implementation
• The primary goal is functionality rather than performance.
• Setting speed and performance aside, two high-level approaches can be adopted:
  • Distributed memory
    • Each node has its own memory.
    • The entire video is passed as continuous messages.
    • Network traffic: (640 × 480 px) × (32 bits/px) = 1200 KB per frame
  • Shared memory
    • All nodes share one common memory.
    • Only a pointer to the video in memory is passed.
    • Network traffic: 32-bit (4 B) memory addresses
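The per-frame traffic figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only. The function names are hypothetical, "KB" is taken as 1024 bytes, and the shared-memory case assumes one 32-bit address message per frame.

```c
#include <assert.h>
#include <stdint.h>

/* Frame geometry and pixel depth as given on the slide. */
#define FRAME_WIDTH  640u
#define FRAME_HEIGHT 480u
#define BYTES_PER_PX 4u   /* 32 bits per pixel */

/* Distributed memory: every pixel of the frame travels as message data. */
uint32_t distributed_bytes_per_frame(void) {
    return FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PX;   /* 1,228,800 B = 1200 KB (1024-byte KB) */
}

/* Shared memory: only one 32-bit address is sent per frame
   (assumption: a single pointer message per frame). */
uint32_t shared_bytes_per_frame(void) {
    return (uint32_t)sizeof(uint32_t);                  /* 4 B */
}

/* Factor by which the shared-memory approach reduces message traffic. */
uint32_t traffic_reduction_factor(void) {
    uint32_t shared = shared_bytes_per_frame();
    assert(shared > 0);
    return distributed_bytes_per_frame() / shared;      /* 307,200 */
}
```

Under these assumptions, passing an address instead of the pixels cuts message traffic by a factor of 307,200 per frame. The cost is that every node must be able to reach the same memory.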

Distributed Memory
• A distributed-memory, video-streaming approach.
• The MicroBlaze cannot pull data off the FIFO fast enough, for several reasons.
[Figure: single-frame example and multi-frame speed issue. Video decoder (V-Dec, 100 MHz) → Xilinx FSL (FIFO) → Xilinx MicroBlaze (effectively 1–10 MHz) → Xilinx FSL (FIFO) → DVI output (100 MHz)]
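The speed issue in the figure can be modelled roughly as follows. In a chain of FIFO-connected stages, sustained throughput is bounded by the slowest stage, so a MicroBlaze moving data at an effective 1–10 MHz throttles the 100 MHz decoder and DVI stages. The upstream FIFO then backs up. This is a hypothetical model, not the thesis's measurement method. It assumes each stage moves one 32-bit word (one pixel) per effective cycle, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_FRAME (640u * 480u)   /* one 32-bit word per pixel */

/* A chain of FIFO-connected stages can sustain no more than its slowest stage
   (assumption: each stage moves one word per effective cycle). */
double pipeline_words_per_second(const double *stage_hz, int nstages) {
    assert(nstages > 0);
    double slowest = stage_hz[0];
    for (int i = 1; i < nstages; ++i) {
        if (stage_hz[i] < slowest) {
            slowest = stage_hz[i];
        }
    }
    return slowest;
}

/* Seconds needed to move one full frame at the given word rate. */
double seconds_per_frame(double words_per_second) {
    assert(words_per_second > 0.0);
    return (double)WORDS_PER_FRAME / words_per_second;
}
```

With stages {100 MHz, 10 MHz, 100 MHz}, a frame takes about 30.7 ms (roughly 33 frames/s). At an effective 1 MHz it takes about 307 ms (roughly 3 frames/s). The all-hardware 100 MHz path would need only about 3.1 ms.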

MicroBlaze PLB Bus Traffic
• The MicroBlaze runs at 100 MHz, but its effective speed is limited by other factors:
  • first, the access time of the Xilinx FSL (FIFO) interface;
  • second, memory access time and bus arbitration;
  • third, the inherently sequential execution of instructions in a conventional processor.
[Figure: video decoder (100 MHz) → FIFO → MicroBlaze (effectively 1–10 MHz) → FIFO → DVI output (100 MHz)]
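One way to see how a 100 MHz processor ends up at an effective 1–10 MHz is shown below. Because the three costs listed above are paid one after another for every word, they add up, and the clock is divided by their sum. The slide gives no cycle counts, so the struct fields and the numbers in the usage note are hypothetical placeholders.

```c
#include <assert.h>

/* Hypothetical per-word cycle costs on the MicroBlaze; the slide names the
   three factors but does not quantify them. */
struct word_cost {
    unsigned fsl_access_cycles;    /* reading from / writing to the FSL FIFO */
    unsigned memory_cycles;        /* memory access including bus arbitration */
    unsigned instruction_cycles;   /* loop and bookkeeping instructions, executed in sequence */
};

/* Sequential execution means the per-word costs add up, so the effective
   word rate is the clock frequency divided by the total. */
double effective_word_rate_hz(double clock_hz, struct word_cost c) {
    unsigned total = c.fsl_access_cycles + c.memory_cycles + c.instruction_cycles;
    assert(total > 0);
    return clock_hz / (double)total;
}
```

At 100 MHz, a total of 10 cycles per word gives an effective 10 MHz, and 100 cycles per word gives 1 MHz. These are the two ends of the range shown in the figure.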

Shared Memory
• A shared-memory approach with address-mapped tasks.
• Only 32-bit memory addresses are passed as messages between ranks.
  • This significantly reduces network traffic (previously 640 × 480 × 32 bits per frame).
• Multiple MicroBlaze processors run in parallel.
  • Each MicroBlaze is assigned a different region of the common memory space.
  • Each MicroBlaze can have its own codec (e.g., on the left of the figure), or all can use the same one.
  • Each MicroBlaze then puts its section of the frame into the corresponding place in the DVI-out memory space.
[Figure: single-frame example; layout inside the memory]
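The region assignment described above can be sketched as address arithmetic. Each rank is given the base address and size of its band of rows, so the only thing that needs to be sent to it is a 32-bit address. This is a hypothetical sketch, not the thesis's implementation. It assumes a row-major 640 × 480, 32-bit frame split into horizontal bands, with any leftover rows going to the lowest-numbered ranks. The struct and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_WIDTH  640u
#define FRAME_HEIGHT 480u
#define BYTES_PER_PX 4u

/* The band of rows one MicroBlaze owns, as a byte range in shared memory. */
struct region {
    uint32_t base;     /* absolute address of the first pixel in the band */
    uint32_t nbytes;   /* size of the band in bytes */
};

/* Split a frame stored at frame_base into nranks horizontal bands.
   Assumption: row-major layout; the first (FRAME_HEIGHT % nranks) ranks
   each take one extra row so every row is covered exactly once. */
struct region region_for_rank(uint32_t frame_base, uint32_t rank, uint32_t nranks) {
    assert(nranks > 0 && rank < nranks);
    const uint32_t row_bytes = FRAME_WIDTH * BYTES_PER_PX;
    const uint32_t rows = FRAME_HEIGHT / nranks;
    const uint32_t extra = FRAME_HEIGHT % nranks;
    /* Rows owned by earlier ranks: rank * rows, plus one per earlier rank with an extra row. */
    uint32_t first_row = rank * rows + (rank < extra ? rank : extra);
    uint32_t my_rows = rows + (rank < extra ? 1u : 0u);
    struct region r = { frame_base + first_row * row_bytes, my_rows * row_bytes };
    return r;
}
```

With four ranks, each band is 120 rows (307,200 bytes), and rank 2 starts 614,400 bytes into the frame. Calling the same function with the DVI-out buffer's base address gives the destination band where each MicroBlaze writes its section. In MPI terms, handing a rank its work would then be a one-word send of `r.base`.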

Results and Conclusion
• Why software and why hardware
• The TMD-MPI approach to heterogeneous systems proves easy and efficient to develop with.
• The shared-memory approach significantly improves speed and scales linearly.
• Suggestion: follow a software-to-hardware development flow, since TMD-MPI/MPE abstracts interface complexities away from the developer.

Thanks and Q&A
Acknowledgements: Professor Paul Chow, Sami Sadaka, Kevin Lam, Kam Pui Tang, Manuel Saldana
