Multiprocessor Architecture for Image Processing

Multiprocessor Architecture for Image Processing • Under the guidance of Dr. Anshul Kumar • Mayank Kumar (2006EE10331) • Pushpendre Rastogi (2006EE50412)

Introduction • Signal processing on embedded platforms, particularly image/video processing, requires high-end processors to run complex algorithms within real-time deadlines. • Power consumption and cost are the major obstacles to large-scale deployment of embedded processing nodes. • Examples: surveillance camera networks, traffic monitoring and control.

Introduction • FPGAs and reconfigurable ASICs offer a promising solution to the above problem: application-specific hardware that exploits the parallelism in the algorithm. • However, they have shortcomings: • Gates get used up when complex algorithms are implemented. • Implementing sequential algorithms directly on an FPGA is highly inefficient.

Our approach • Design a multiprocessor architecture that facilitates the processing of high-resolution image/video frames. • Design a processing element (PE), or node processor, customized to handle pixel- and region-level operations efficiently. • Given the PE, design the architecture that interconnects these processors, along with the input/output hardware.

Novelty • With an array of processors, we exploit the parallelism available from processing different regions of a frame on different processors. • Within each processor, sequential algorithms are implemented efficiently by providing an application-specific instruction set. • Locally sequential and globally parallel.

Locally Sequential Globally Parallel • Suits any class of algorithms that are window based and operate on regions of the image rather than on the image as a whole: • Image change detection for surveillance applications • Optic flow, motion estimation, filtering, etc. • We chose “Image change detection using Background Modeling” as the test algorithm.
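The "locally sequential, globally parallel" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame is a list of pixel rows, regions are horizontal bands, and the per-region kernel is plain frame differencing against a fixed background, not the background-modeling algorithm used in the project. The names `split_into_bands`, `detect_change` and `process_frame` are hypothetical.

```python
def split_into_bands(frame, rows_per_band):
    """Cut a frame (list of pixel rows) into horizontal bands of rows_per_band rows."""
    return [frame[r:r + rows_per_band] for r in range(0, len(frame), rows_per_band)]


def detect_change(band, background_band, threshold):
    """Sequential kernel for one region: mark pixels that differ from the background."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(row, bg_row)]
        for row, bg_row in zip(band, background_band)
    ]


def process_frame(frame, background, rows_per_band, threshold):
    """Run the kernel band by band; each loop iteration is independent of the others,
    so each band could be handed to a different processing element."""
    mask = []
    frame_bands = split_into_bands(frame, rows_per_band)
    background_bands = split_into_bands(background, rows_per_band)
    for frame_band, background_band in zip(frame_bands, background_bands):
        mask.extend(detect_change(frame_band, background_band, threshold))
    return mask
```

Because the kernel only looks at pixels inside its own band, the banded result matches the result of running the kernel over the whole frame at once.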

Work Done • Hardware Part • Initial Architecture • Drawbacks • Change of platform • New Architecture • Implementation • Software Part • Algorithm Analysis and implementation • Fixed point Matlab Simulation • C Implementation

Initial Architecture • [Block diagram on Virtex II Pro. Labels: Camera, Video ADC, RGB Conversion, Power PC, MPMC, Memory, nine M1 processor nodes (array topology), Video DAC, Monitor]

Architectural Drawbacks • The Multi-Port Memory Controller (MPMC) could handle only a small number (2-4) of parallel accesses from different processors. • Solution: use BRAM for parallel access. • The video input on the XUPV30 is interlaced, so the whole frame would have to be stored, which would use up all available BRAMs. • Solution: use a board that provides progressive data. Moreover, digital cameras these days generally provide progressive image data.

Change of Platform • We switched to the Xilinx ML401 Virtex Video Starter Kit. • Provides progressive video input • Much more BRAM • Matlab/Simulink as a design platform for designing at a higher abstraction level • However, switching platforms cost time because of the associated learning curve.

New Architecture • [Block diagram. Labels: Camera, Video ADC, VIO_in, Custom Memory Controller (Verilog module), Array of Block RAM, Array of Processor Network, VIO_out, Video DAC, Monitor]

Description and Implementation • The ML401 VSK provides two FPGAs: • Xilinx XUP2V7 for image input/output • Xilinx ML401 for developing the application • VIO_in and VIO_out are reference designs that sandwich the user-level design. They provide progressive image data. • We designed a custom memory controller suited to our needs. It writes data to FIFOs implemented using BRAMs.

Custom Memory Controller • Takes H_sync, v_sync, rst and Pixel_clk as inputs and selects a target FIFO for the incoming data. • Each BRAM stores the image data for 4 lines. • It first empties the queue by reading out the result computed in the previous iteration. • The other end of each FIFO is read by a Microblaze processor over FSL links.
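The FIFO selection described above can be modeled behaviorally in Python. This is a rough sketch, not the Verilog module: it assumes a line counter that advances on every horizontal sync and resets on vertical sync, and it assumes the FIFOs are reused in rotation once every one of them has received its 4 lines. Only the figure of 4 lines per BRAM comes from the slide; the class `FifoSelector`, the number of FIFOs and the rotation policy are hypothetical.

```python
LINES_PER_BRAM = 4  # from the slide: each BRAM holds the data of 4 image lines


class FifoSelector:
    """Behavioral model of the target-FIFO choice (sync polarity is not modeled)."""

    def __init__(self, num_fifos):
        self.num_fifos = num_fifos  # assumed parameter, not given in the slides
        self.line = 0               # index of the line currently being received

    def on_vsync(self):
        """A new frame starts: go back to the first line."""
        self.line = 0

    def on_hsync(self):
        """A line has ended: move on to the next one."""
        self.line += 1

    def target_fifo(self):
        """FIFO that receives the pixels of the current line."""
        return (self.line // LINES_PER_BRAM) % self.num_fifos
```

With three FIFOs, lines 0-3 go to FIFO 0, lines 4-7 to FIFO 1, lines 8-11 to FIFO 2, and line 12 wraps back to FIFO 0.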

Processor Network • Each processor network comprises one master processor and 1-7 slave processors. • The master processor reads data from the FIFO and distributes the work among the slave processors. • We demonstrated this with 3 processors: 1 master and 2 slaves.

Processor Network Basic Design • We connected the master processor to a UART to establish a serial link for input/output. • The master processor is connected to the slave processors, all of which run the same algorithm. • It takes input from the UART and passes it to the different slaves. • The master processor distributes work by sending different regions of the image to different processors.
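A small Python model may help picture the master/slave work distribution. It is a sketch under assumptions: the slides do not state the scheduling policy, so round-robin assignment is assumed here, queues stand in for the links between master and slaves, and every slave runs the same kernel. The functions `distribute`, `slave_run` and `master_collect` are hypothetical names.

```python
from collections import deque


def distribute(regions, num_slaves):
    """Master side: send region i to slave i % num_slaves (assumed round-robin policy).
    Each deque models the link from the master to one slave."""
    links = [deque() for _ in range(num_slaves)]
    for index, region in enumerate(regions):
        links[index % num_slaves].append((index, region))
    return links


def slave_run(link, kernel):
    """Slave side: apply the same kernel to every region received on the link."""
    return [(index, kernel(region)) for index, region in link]


def master_collect(results_per_slave, num_regions):
    """Master side: put the per-region results back into image order."""
    ordered = [None] * num_regions
    for results in results_per_slave:
        for index, value in results:
            ordered[index] = value
    return ordered
```

Tagging each region with its index lets the master reassemble the results regardless of which slave finishes first.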

Software Architecture • Studied the Adaptive Background Mixture Model [1], [2]. • Analysed the algorithm for: • Parallelism exploitation • Length of code for implementation • Memory requirements to store data • Feasibility

The Algorithm • Models each region of the image frame as a weighted sum of N Gaussians. • Updates the model when a new frame arrives. • Makes the foreground/background decision depending on which Gaussian distribution (k) the current pixel data belongs to. • Effectively models repetitive changes in the background. • Resistant to noise and slow illumination variations.
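The update-and-decide step can be illustrated for a single grayscale pixel. The sketch below follows the general shape of the adaptive mixture model in [1] but simplifies it: one learning rate is used for weights, means and variances, the first matching component in list order is taken, and a variance floor is added. All constants (`ALPHA`, `BG_WEIGHT`, `INIT_VAR`, `INIT_WEIGHT`, `MIN_VAR`) are assumed values, and this floating-point version is not the project's fixed-point code.

```python
import math

ALPHA = 0.05        # learning rate (assumed value)
MATCH_SIGMAS = 2.5  # a pixel matches a Gaussian within 2.5 standard deviations
BG_WEIGHT = 0.7     # share of total weight treated as background (assumed value)
INIT_VAR = 900.0    # variance given to a newly created Gaussian (assumed value)
INIT_WEIGHT = 0.05  # weight given to a newly created Gaussian (assumed value)
MIN_VAR = 4.0       # variance floor, keeps the standard deviation away from zero


def update_pixel(model, x):
    """model: list of dicts with keys 'w', 'mean', 'var'. Updates the model in place
    with pixel value x and returns True if x is classified as foreground."""
    matched = None
    for k, g in enumerate(model):
        if abs(x - g["mean"]) <= MATCH_SIGMAS * math.sqrt(g["var"]):
            matched = k
            break
    # Decay every weight; reinforce the matched Gaussian.
    for k, g in enumerate(model):
        g["w"] = (1 - ALPHA) * g["w"] + (ALPHA if k == matched else 0.0)
    if matched is None:
        # No Gaussian explains x: replace the weakest one with a new Gaussian at x.
        weakest = min(range(len(model)), key=lambda k: model[k]["w"])
        model[weakest] = {"w": INIT_WEIGHT, "mean": float(x), "var": INIT_VAR}
    else:
        g = model[matched]
        d = x - g["mean"]
        g["mean"] += ALPHA * d
        g["var"] = max(MIN_VAR, (1 - ALPHA) * g["var"] + ALPHA * d * d)
    total = sum(g["w"] for g in model)
    for g in model:
        g["w"] /= total
    if matched is None:
        return True
    # Background = the highest-ranked Gaussians (by w / sigma) covering BG_WEIGHT.
    order = sorted(range(len(model)),
                   key=lambda k: model[k]["w"] / math.sqrt(model[k]["var"]),
                   reverse=True)
    background, covered = set(), 0.0
    for k in order:
        background.add(k)
        covered += model[k]["w"]
        if covered > BG_WEIGHT:
            break
    return matched not in background
```

A value that keeps reappearing gains weight frame after frame and eventually joins the background set, which is how repetitive background changes are absorbed.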

Fixed Point Matlab Simulation • Using the Fixed-Point Toolbox, we redefined our variables and constants in Q format. • Data types: • Weights/other constants: DataTypeMode: Fixed-point: binary point scaling, Signed: true, WordLength: 32, FractionLength: 31 • Pixel data: DataTypeMode: Fixed-point: binary point scaling, Signed: true, WordLength: 32, FractionLength: 23
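To make the two formats concrete, the Python sketch below converts real values to signed 32-bit fixed point with 31 fraction bits (weights and constants) or 23 fraction bits (pixel data). It is an illustration, not the Matlab code: ties are assumed to round upward, overflow wraps in two's complement, and the helpers `wrap_signed`, `to_fixed` and `to_float` are hypothetical names.

```python
import math


def wrap_signed(raw, word_length=32):
    """Keep the low word_length bits and reinterpret them as a two's-complement value."""
    raw &= (1 << word_length) - 1
    if raw >> (word_length - 1):
        raw -= 1 << word_length
    return raw


def to_fixed(value, frac_bits):
    """Quantize a real value to a signed 32-bit integer with frac_bits fraction bits.
    Rounds to nearest (ties assumed to go upward) and wraps on overflow."""
    return wrap_signed(math.floor(value * (1 << frac_bits) + 0.5))


def to_float(raw, frac_bits):
    """Real value represented by the stored integer."""
    return raw / (1 << frac_bits)
```

With 31 fraction bits the representable range is [-1, 1), which suits weights, while 23 fraction bits leave 8 integer bits plus sign, enough for pixel values up to 255. For instance, `to_fixed(1.0, 31)` wraps around to the most negative value, so a weight of exactly 1 is not representable in the first format.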

Fixed Point Calculations • RoundMode: nearest • OverflowMode: wrap • ProductMode: SpecifyPrecision • ProductWordLength: 32 • ProductFractionLength: 23 • SumMode: SpecifyPrecision • SumWordLength: 32 • SumFractionLength: 23 • CastBeforeSum: true
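The effect of these settings on arithmetic can be sketched in Python on raw integers. The example assumes that products and sums are both reduced to 32 bits with 23 fraction bits, rounded to nearest with ties going upward, and wrapped on overflow. The functions `wrap32`, `fixed_mul` and `fixed_add` are hypothetical and do not reproduce the toolbox bit for bit.

```python
def wrap32(raw):
    """Wrap an integer into the signed 32-bit range (OverflowMode: wrap)."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def fixed_mul(a_raw, a_frac, b_raw, b_frac, out_frac=23):
    """Multiply two fixed-point numbers and return the product with out_frac
    fraction bits. Assumes a_frac + b_frac > out_frac."""
    shift = a_frac + b_frac - out_frac
    full = a_raw * b_raw                            # exact product, a_frac + b_frac fraction bits
    rounded = (full + (1 << (shift - 1))) >> shift  # round to nearest, ties upward
    return wrap32(rounded)


def fixed_add(a_raw, b_raw):
    """Add two numbers that already share the same fraction length; wrap on overflow."""
    return wrap32(a_raw + b_raw)
```

For example, a weight of 0.5 with 31 fraction bits times a pixel value of 200 with 23 fraction bits gives 100 with 23 fraction bits. Adding 200 and 100 in the 23-fraction-bit format exceeds the 8 integer bits and wraps to a negative number, which is why intermediate ranges need checking when overflow wraps instead of saturating.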

Matlab simulation

C Implementation • The code was ported to Xilinx Platform Studio for deployment on the Microblaze processors. • Simulations show equivalent results. • All PEs contain the same code; they operate on different data coming from different regions of the image.

Pitfalls • The Xilinx VSK design suite promises high-level design of image/video processing using Simulink. • We tried using it, but it does not provide enough granularity for our design needs. • Designs become very complex to debug. • The sample designs are very tough to tweak. • Xilinx EDK should be used for these kinds of designs.

Conclusions • We designed the different parts of our proposed architecture: • Input/output • Custom memory controller • Basic processor network • We simulated and implemented the test algorithm on a network of processors as a proof of concept. • We learnt the FPGA design flow and hardware/software co-design.

Future Work • In this work, we used Microblaze processors: • Instruction set not optimized for pixel/region-based image processing • Many extra features that can be trimmed • Design a custom processor suited to these applications: • Less FPGA area needed • More efficient

References [1] Chris Stauffer, W.E.L. Grimson: Adaptive Background Mixture Models for Real-Time Tracking, AI Lab, MIT, 1999. [2] P. Wayne Power, Johann A. Schoonees: Understanding Background Mixture Models, Image and Vision Computing New Zealand, 2002. [3] P. Huerta, J. Castillo, J.I. Martínez: A Microblaze Based Multiprocessor SoC, 2007. [4] Xilinx Microblaze Processor Reference, v7.0, UG081. [5] Xilinx Virtex II Pro User Guide. [6] Xilinx Video Starter Kit (VSK) User Guide. [7] Xilinx XAPP529: Connecting Customized IP to the Microblaze Soft Processor Core Using FSL Link. [8] EDK 9.1i Microblaze Tutorial: A Getting Started Guide. [9] Xilinx White Paper: Multiprocessor on XPS.
