Multiprocessor Architecture for Image Processing. Under the guidance of Dr. Anshul Kumar. Mayank Kumar (2006EE10331), Pushpendre Rastogi (2006EE50412).
Introduction
• Signal processing on embedded platforms, particularly image/video processing, needs high-end processors to run complex algorithms within real-time deadlines.
• Power consumption and cost are the major obstacles to mass deployment of embedded processing nodes.
• Examples: surveillance camera networks, traffic monitoring and control.
Introduction
• FPGAs and reconfigurable ASICs offer a promising solution: application-specific hardware that exploits the parallelism in the algorithm.
• However, they have shortcomings:
  • Complex algorithms use up the available gates.
  • Implementing sequential algorithms directly on an FPGA is highly inefficient.
Our approach
• Design a multiprocessor architecture for processing high-resolution image/video frames.
• Design a processing element (PE), or node processor, customized to handle pixel- and region-level operations efficiently.
• Given the PE, design the architecture that interconnects these processors, and design the input/output hardware.
Novelty
• An array of processors exploits the parallelism available from processing different regions of the frame on different processors.
• Within each processor, sequential algorithms are implemented efficiently by providing an application-specific instruction set.
• In short: locally sequential, globally parallel.
Locally Sequential, Globally Parallel
• Applies to any class of window-based algorithms that operate on regions of the image rather than on the image as a whole:
  • Image change detection for surveillance applications
  • Optic flow, motion estimation, filtering, etc.
• We chose “Image change detection using Background Modeling” as the test algorithm.
Work Done
• Hardware part
  • Initial architecture
  • Drawbacks
  • Change of platform
  • New architecture
  • Implementation
• Software part
  • Algorithm analysis and implementation
  • Fixed-point Matlab simulation
  • C implementation
Initial Architecture
[Figure: block diagram of the initial architecture on the Virtex II Pro. Block labels: Camera, Video ADC, RGB Conversion, Power PC, MPMC, Memory, nine M1 processors (array topology), Video DAC, Monitor.]
Architectural Drawbacks
• The multiprocessor memory controller (MPMC) can handle only a small number (2-4) of parallel accesses from different processors.
  • Solution: use BRAM for parallel access.
• Because the image format on the XUPV30 is interlaced, the whole frame has to be stored, which would use up all available BRAMs.
  • Solution: use a board that provides progressive data. Moreover, digital cameras these days all provide progressive image data.
Change of Platform
• We switched to the Xilinx ML401 Virtex Video Starter Kit, which provides:
  • Progressive video input
  • Much more BRAM
  • Matlab/Simulink as a design platform for designing at a higher abstraction level
• However, switching platforms cost time because of the associated learning curve.
New Architecture
[Figure: block diagram of the new architecture. Block labels: Camera, Video ADC, VIO_in, Custom Memory Controller (Verilog module), Array of Block RAM, Array of Processor Networks, VIO_out, Video DAC, Monitor.]
Description and Implementation
• The ML401 VSK provides two FPGAs:
  • Xilinx XUP2V7 for image input/output
  • Xilinx ML401 for developing the application
• VIO_in and VIO_out are reference designs that sandwich the user-level design. They provide progressive image data.
• We designed a custom memory controller suited to our needs. It writes data to FIFOs implemented using BRAMs.
Custom Memory Controller
• Takes H_sync, v_sync, rst and Pixel_clk as inputs and selects the target FIFO for the incoming data.
• Each BRAM stores the image data for 4 lines.
• Before writing, it first empties the queue by reading out the result computed in the previous iteration.
• The other end of each FIFO is read by a Microblaze processor over FSL links.
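The FIFO selection described above can be modelled behaviourally. This is a minimal sketch, not the Verilog module from the project: it assumes a line counter that resets on v_sync and advances on each H_sync, and it assumes FIFOs are reused round-robin once all of them have been filled. The names `LineFifoSelector` and `simulate_lines` are hypothetical; only the figure of 4 lines per BRAM comes from the slide.

```python
LINES_PER_BRAM = 4  # from the slide: each BRAM holds 4 image lines


class LineFifoSelector:
    """Behavioural model of choosing a target FIFO from sync pulses."""

    def __init__(self, num_fifos):
        self.num_fifos = num_fifos
        self.line = 0  # index of the line currently being received

    def v_sync(self):
        """Start of a new frame: restart the line counter (assumed)."""
        self.line = 0

    def h_sync(self):
        """End of a line: advance the line counter (assumed)."""
        self.line += 1

    def target_fifo(self):
        """FIFO that receives pixels of the current line.

        Round-robin reuse of FIFOs is an assumption of this sketch.
        """
        return (self.line // LINES_PER_BRAM) % self.num_fifos


def simulate_lines(num_lines, num_fifos):
    """Return the FIFO index selected for each of the first num_lines lines."""
    selector = LineFifoSelector(num_fifos)
    selector.v_sync()
    targets = []
    for _ in range(num_lines):
        targets.append(selector.target_fifo())
        selector.h_sync()
    return targets


if __name__ == "__main__":
    print(simulate_lines(14, 3))
```

With three FIFOs, lines 0-3 go to FIFO 0, lines 4-7 to FIFO 1, lines 8-11 to FIFO 2, and line 12 wraps back to FIFO 0.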
Processor Network
• Each processor network comprises one master processor and 1-7 slave processors.
• The master processor reads data from the FIFO and distributes the work among the slave processors.
• We demonstrated this using 3 processors: 1 master and 2 slaves.
Processor Network Basic Design
• We connected the master processor to a UART to establish a serial link for input/output.
• The master processor is connected to the slave processors, which all run the same algorithm.
• The master takes input from the UART and passes it to the different slaves.
• The master distributes work by sending different regions of the image to different processors.
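The master's work distribution can be sketched as follows. This is an illustrative model only: it assumes the image is split into contiguous bands of rows, one band per slave, and it runs the "slaves" one after another in a single process rather than over FSL links. The function names `split_rows` and `master_distribute` are hypothetical.

```python
def split_rows(height, num_slaves):
    """Return one (start, stop) row range per slave, covering all rows.

    Splitting by contiguous row bands is an assumption of this sketch.
    """
    base, extra = divmod(height, num_slaves)
    ranges = []
    start = 0
    for i in range(num_slaves):
        # The first `extra` slaves take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def master_distribute(frame, num_slaves, kernel):
    """Send each region to a 'slave' and collect the results in order.

    frame is a list of rows; every slave runs the same kernel on its
    own region, mirroring "same code, different data".
    """
    results = []
    for start, stop in split_rows(len(frame), num_slaves):
        results.extend(kernel(frame[start:stop]))
    return results


def invert(region):
    """Example per-region kernel: invert 8-bit pixel values."""
    return [[255 - p for p in row] for row in region]


if __name__ == "__main__":
    frame = [[r * 10 + c for c in range(4)] for r in range(7)]
    print(split_rows(len(frame), 2))
    print(master_distribute(frame, 2, invert) == invert(frame))
```

For a pointwise kernel such as `invert`, the distributed result equals the result of processing the whole frame at once. Window-based kernels would additionally need overlapping rows at the region boundaries, which this sketch does not handle.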
Software Architecture
• We studied the Adaptive Background Mixture Model [1], [2].
• We analysed the algorithm for:
  • Exploitable parallelism
  • Length of the code needed to implement it
  • Memory required to store its data
  • Feasibility
The Algorithm
• Models each region of the image frame as a weighted sum of N Gaussians.
• Updates the model when a new frame arrives.
• Makes the foreground/background decision depending on which Gaussian distribution (k) the current pixel data belongs to.
• Effectively models repetitive changes in the background.
• Is resistant to noise and slow illumination variations.
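The steps above can be sketched for a single grayscale pixel. This is a simplified floating-point illustration in the spirit of [1], not the project's code: it uses the learning rate directly for the mean and variance updates (a common simplification of the per-Gaussian rate in [1]), and the constants `ALPHA`, `BG_THRESHOLD`, `INIT_VAR` and `INIT_WEIGHT` are assumed values chosen for the example. The function name `update_pixel` is hypothetical.

```python
import math

ALPHA = 0.01        # learning rate (assumed value)
MATCH_SIGMAS = 2.5  # match threshold in standard deviations, as in [1]
BG_THRESHOLD = 0.7  # share of total weight treated as background (assumed)
INIT_VAR = 900.0    # variance given to a newly created Gaussian (assumed)
INIT_WEIGHT = 0.05  # weight given to a newly created Gaussian (assumed)


def update_pixel(model, x):
    """Update one pixel's mixture in place; return True if x is foreground.

    model is a list of [weight, mean, variance] lists.
    """
    # 1. Find the first Gaussian within MATCH_SIGMAS deviations of x.
    matched = None
    for g in model:
        if abs(x - g[1]) <= MATCH_SIGMAS * math.sqrt(g[2]):
            matched = g
            break

    # 2. Weight update: only the matched Gaussian is reinforced.
    for g in model:
        g[0] = (1 - ALPHA) * g[0] + (ALPHA if g is matched else 0.0)

    if matched is None:
        # 3a. No match: replace the least probable Gaussian with a new one.
        worst = min(range(len(model)),
                    key=lambda k: model[k][0] / math.sqrt(model[k][2]))
        model[worst] = [INIT_WEIGHT, float(x), INIT_VAR]
    else:
        # 3b. Match: pull the mean and variance toward the new sample.
        matched[1] += ALPHA * (x - matched[1])
        d = x - matched[1]
        matched[2] = (1 - ALPHA) * matched[2] + ALPHA * d * d

    # 4. Renormalise weights and rank Gaussians by weight / sigma.
    total = sum(g[0] for g in model)
    for g in model:
        g[0] /= total
    model.sort(key=lambda g: g[0] / math.sqrt(g[2]), reverse=True)

    # 5. Background is the top-ranked set whose weights first exceed the threshold.
    if matched is None:
        return True
    cumulative = 0.0
    for g in model:
        if g is matched:
            return False
        cumulative += g[0]
        if cumulative > BG_THRESHOLD:
            break
    return True


if __name__ == "__main__":
    model = [[0.8, 100.0, 100.0], [0.1, 50.0, 100.0], [0.1, 200.0, 100.0]]
    for value in (102, 250, 250):
        print(value, "foreground" if update_pixel(model, value) else "background")
```

In the demo, a value near the dominant Gaussian is classified as background, while a new value of 250 stays foreground until its Gaussian has accumulated enough weight.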
Fixed-Point Matlab Simulation
• Using the Fixed-Point Toolbox, we redefined our variables and constants in Q format.
• Data types:
  • Weights and other constants: DataTypeMode: Fixed-point: binary point scaling; Signed: true; WordLength: 32; FractionLength: 31
  • Pixel data: DataTypeMode: Fixed-point: binary point scaling; Signed: true; WordLength: 32; FractionLength: 23
Fixed-Point Calculations
• RoundMode: nearest
• OverflowMode: wrap
• ProductMode: SpecifyPrecision
• ProductWordLength: 32
• ProductFractionLength: 23
• SumMode: SpecifyPrecision
• SumWordLength: 32
• SumFractionLength: 23
• CastBeforeSum: true
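To make the settings above concrete, here is a small integer-only sketch of the two formats and of a product kept at 32 bits with 23 fraction bits. It is an illustration under stated assumptions, not the toolbox's implementation: ties are rounded toward positive infinity, overflow wraps in two's complement, and the helper names (`wrap`, `to_fixed`, `to_float`, `fx_mul`) are hypothetical.

```python
import math

WORD = 32  # WordLength used for both data types and for products


def wrap(raw, word=WORD):
    """Wrap an integer into the signed two's-complement range."""
    raw &= (1 << word) - 1
    return raw - (1 << word) if raw >= 1 << (word - 1) else raw


def to_fixed(x, frac):
    """Quantise a real number to `frac` fraction bits, rounding to nearest."""
    return wrap(math.floor(x * (1 << frac) + 0.5))


def to_float(raw, frac):
    """Convert a raw fixed-point integer back to a real number."""
    return raw / (1 << frac)


def fx_mul(a, a_frac, b, b_frac, out_frac=23):
    """Multiply two raw values and rescale the product to `out_frac` bits.

    Assumes a_frac + b_frac > out_frac, so the rescale is a right shift.
    """
    shift = a_frac + b_frac - out_frac
    product = a * b                      # exact, a_frac + b_frac fraction bits
    rounded = (product + (1 << (shift - 1))) >> shift
    return wrap(rounded)


if __name__ == "__main__":
    weight = to_fixed(0.5, 31)    # weight/constant format, range [-1, 1)
    pixel = to_fixed(100.0, 23)   # pixel format, range [-256, 256)
    print(to_float(fx_mul(weight, 31, pixel, 23), 23))
    print(to_float(to_fixed(1.0, 31), 31))
```

The last line shows the consequence of wrap-on-overflow: 1.0 is just outside the range of the 31-fraction-bit format, so it wraps around to -1.0. Weights therefore have to stay strictly below 1.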
C Implementation
• The code was ported to Xilinx Platform Studio to run on the Microblaze processors.
• Simulations show equivalent results.
• All PEs contain the same code. Each operates on different data, coming from a different region of the image.
Pitfalls
• The Xilinx VSK design suite promises high-level design of image/video processing using Simulink.
• We tried it, but it does not provide enough granularity for our design needs.
  • Designs become very complex to debug.
  • The sample design is very hard to tweak.
• Xilinx EDK should be used for this kind of design.
Conclusions
• We designed the different parts of our proposed architecture:
  • Input/output
  • Custom memory controller
  • Basic processor network
• We simulated and implemented the test algorithm on a network of processors as a proof of concept.
• We learnt the FPGA design flow and hardware/software co-design.
Future Work
• In this work we used Microblaze processors:
  • Their instruction set is not optimized for pixel/region-based image processing.
  • They carry many extra features that can be trimmed.
• Design a custom processor suited to these applications:
  • Needs less FPGA area
  • More efficient
References
[1] Chris Stauffer, W. E. L. Grimson: Adaptive Background Mixture Model for Real-time Tracking. AI, MIT, 1999
[2] P. Wayne Power, Johann A. Schoonees: Understanding Background Mixture Model. Image and Vision Computing NZ, 2002
[3] P. Huerta, J. Castillo, J. I. Martínez: A Microblaze based Multiprocessor SoC. 2007
[4] Xilinx Microblaze Processor Reference V7.0, UG081
[5] Xilinx Virtex II Pro User Guide
[6] Xilinx Video Starter Kit (VSK) User Guide
[7] Xilinx XAPP529: Connecting Customized IP to the Microblaze Soft Processor Core Using FSL Link
[8] EDK 9.1i Microblaze Tutorial: A Getting Started Guide
[9] Xilinx White Paper: Multiprocessor on XPS