Parallel Processing – Final Project. Image Processing Using Cilk Tomer Y & Tuval A (pp25). Project Goals. Global Motion Estimation Using Full Search Block Matching Algorithm for motion vector detection Multithreaded parallel programming with Cilk. Cilk Description.
Parallel Processing – Final Project Image Processing Using Cilk Tomer Y & Tuval A (pp25) Image Processing Using Cilk
Project Goals
• Global Motion Estimation
• Using the Full Search Block Matching Algorithm for motion vector detection
• Multithreaded parallel programming with Cilk
Cilk Description
• Cilk is a language for multithreaded parallel programming based on ANSI C.
• Cilk is designed for general-purpose parallel programming, but it is especially effective for exploiting dynamic, highly asynchronous parallelism, which can be difficult to write in data-parallel or message-passing style.
• Unlike many other multithreaded programming systems, Cilk is algorithmic, in that the runtime system employs a scheduler that allows the performance of programs to be estimated accurately based on abstract complexity measures.
Introduction to Cilk
• The philosophy behind Cilk is that a programmer should concentrate on structuring the program to expose parallelism and exploit locality, leaving Cilk's runtime system with the responsibility of scheduling the computation to run efficiently on a given platform.
• Thus, the Cilk runtime system takes care of details like load balancing, paging, and communication protocols.
• Unlike other multithreaded languages, however, Cilk is algorithmic in that the runtime system guarantees efficient and predictable performance.
A serial C program to compute the nth Fibonacci number
A parallel Cilk program to compute the nth Fibonacci number
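The two listings named in the slide titles above are not part of this text. A minimal sketch of what the parallel version could look like follows, assuming the Cilk-5 keywords `cilk`, `spawn` and `sync`. The three macros are an assumption added here so that the sketch also compiles with an ordinary C compiler: with the keywords elided, the function is the serial Fibonacci program. With a real Cilk compiler the macros would be removed.

```c
#include <assert.h>

/* Assumption: elide the Cilk keywords so this sketch builds as plain C.
   Remove these three lines when compiling with a Cilk-5 compiler. */
#define cilk
#define spawn
#define sync

/* Computes the nth Fibonacci number recursively.
   In Cilk, the two spawned calls may run in parallel. */
cilk int fib(int n)
{
    assert(n >= 0);
    if (n < 2) {
        return n;
    } else {
        int x, y;
        x = spawn fib(n - 1);   /* child may run in parallel with the parent */
        y = spawn fib(n - 2);
        sync;                   /* wait for both children before using x and y */
        return x + y;
    }
}
```

Calling `fib(30)` returns 832040, the value shown in the sample run in these slides.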
Compiling and running Cilk programs
To produce the fib executable, type the command:
> cilk -O2 fib.cilk -o fib
To run the program, type:
> fib --nproc 4 30
This starts fib on 4 processors to compute the 30th Fibonacci number. At the end of the execution, you should see a printout similar to the following:
Result: 832040
Compiling and running Cilk programs – collect performance information
The Cilk runtime system collects performance information when a program is compiled with the flags -cilk-profile and -cilk-critical-path:
> cilk -cilk-profile -cilk-critical-path -O2 fib.cilk -o fib
A Cilk program compiled with profiling support can be instructed to print performance information by using the --stats option.
Compiling and running Cilk programs – collect performance information (cont.)
The command line
> fib --nproc 4 --stats 1 30
yields an output similar to the following:
Result: 832040
RUNTIME SYSTEM STATISTICS:
Wall-clock running time on 4 processors: 2.593270 s
Total work = 10.341069 s
Total work (accumulated) = 7.746886 s
Critical path = 779.588000 us
Parallelism = 9937.154003
For more information: http://supertech.lcs.mit.edu/cilk/
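The figures in this printout are related by simple arithmetic. The reported parallelism equals the accumulated work divided by the critical path (7.746886 s / 779.588 us ≈ 9937.15). The Cilk documentation also gives the performance model T_P ≈ T_1/P + T_∞, where T_1 is the work and T_∞ the critical path. The sketch below just evaluates both expressions. The function names are hypothetical, and applying the model to the "Total work" figure is an assumption made here for illustration.

```c
#include <assert.h>

/* Parallelism = work / critical path (both in seconds). */
double parallelism(double work_s, double critical_path_s)
{
    assert(critical_path_s > 0.0);
    return work_s / critical_path_s;
}

/* Cilk performance model: T_P ~ T_1 / P + T_inf (all times in seconds). */
double predicted_time(double work_s, double critical_path_s, int nproc)
{
    assert(nproc > 0);
    return work_s / nproc + critical_path_s;
}
```

With the numbers above, `parallelism(7.746886, 779.588e-6)` gives about 9937.15, and `predicted_time(10.341069, 779.588e-6, 4)` gives about 2.586 s, close to the measured 2.593 s wall-clock time.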
Motion Estimation
The importance of motion estimation:
• Effective and quick video signal transmission and storage depend on the video compression process.
• To get a high compression ratio while preserving high image quality, motion vectors are transmitted instead of the image itself.
Local Motion Estimation
FSA - Full Search Block Matching Algorithm
In the two pictures (256*256 pixels) below we can see the movement of the camera (in this case 30 pixels right and 18 pixels down), which gives the motion vector (30,18).
[Figures: previous frame, current frame]
Our algorithm
The goal of the program is to detect the movement and return the motion vector.
The steps of the program are:
1. Read the two BMP files into two matrices containing the value of every pixel in the frame.
2. Divide the 256*256 image into Macro Blocks (MBs), each containing 16*16 pixels.
Dividing the image into 16*16 MBs (MB = 16*16 pixels)
[Figure: MB (i, j) starts at pixel position (i*16, j*16)]
The next steps of the program are:
3. Send each MB (i = 0,1,…,15, j = 0,1,…,15) for local motion vector estimation, generating 16*16 processes.
4. Local motion estimation: assume movements in the x and y directions (-15 <= x_move, y_move <= 15) and calculate, for each of these movements, the Mean Absolute Error (MAE) between the MB in the current frame R and the correspondingly shifted block in the previous frame S.
5. Find the lowest MAE and choose its movement offsets (x_move and y_move) as the local motion vector.
6. Calculate the global motion vector by summing all the local motion vectors and dividing the result by the number of MBs.
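Steps 4 to 6 can be sketched in plain C as follows. This is an illustration under stated assumptions, not the project's actual code. The names `MotionVec`, `block_mae`, `local_motion` and `global_motion` are hypothetical. Frames are assumed to be stored as flat row-major arrays of 8-bit pixels. The block at (x, y) in the current frame is compared with the block at (x + x_move, y + y_move) in the previous frame. Candidate positions that fall outside the frame are skipped. The global average uses integer division.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

enum { FRAME = 256, MB = 16, RANGE = 15, NBLK = FRAME / MB };

typedef struct { int x, y; } MotionVec;   /* hypothetical helper type */

/* Step 4: MAE between MB (i, j) of the current frame and the block of the
   previous frame displaced by (dx, dy). The caller keeps the block in-frame. */
static double block_mae(const unsigned char *prev, const unsigned char *cur,
                        int i, int j, int dx, int dy)
{
    long sum = 0;
    for (int l = 0; l < MB; l++) {
        for (int k = 0; k < MB; k++) {
            int x = i * MB + k, y = j * MB + l;
            sum += abs((int)cur[y * FRAME + x]
                       - (int)prev[(y + dy) * FRAME + (x + dx)]);
        }
    }
    return (double)sum / (MB * MB);
}

/* Steps 4-5: full search over all offsets in [-RANGE, RANGE]; the offset
   with the lowest MAE becomes the local motion vector. */
MotionVec local_motion(const unsigned char *prev, const unsigned char *cur,
                       int i, int j)
{
    assert(i >= 0 && i < NBLK && j >= 0 && j < NBLK);
    MotionVec best = {0, 0};
    double best_mae = DBL_MAX;
    for (int dy = -RANGE; dy <= RANGE; dy++) {
        for (int dx = -RANGE; dx <= RANGE; dx++) {
            int x0 = i * MB + dx, y0 = j * MB + dy;
            if (x0 < 0 || y0 < 0 || x0 + MB > FRAME || y0 + MB > FRAME)
                continue;   /* assumption: ignore out-of-frame candidates */
            double mae = block_mae(prev, cur, i, j, dx, dy);
            if (mae < best_mae) {
                best_mae = mae;
                best.x = dx;
                best.y = dy;
            }
        }
    }
    return best;
}

/* Step 6: average the local vectors (integer division, an assumption). */
MotionVec global_motion(const MotionVec *local, int count)
{
    assert(count > 0);
    long sx = 0, sy = 0;
    for (int n = 0; n < count; n++) {
        sx += local[n].x;
        sy += local[n].y;
    }
    MotionVec g = { (int)(sx / count), (int)(sy / count) };
    return g;
}
```

Calling `local_motion` once per MB fills an array of 256 local vectors, which `global_motion` then averages. Because every block_mae call divides by the same 256 pixels, minimizing the sum of absolute differences would select the same offset.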
Parallel computing
The parallel algorithm is based on a Master-Slaves method. By dividing the frame into 16*16 MBs we can assign one MB to each process. The processes are independent, and no communication between them is needed. Given these facts, we should achieve a significant speedup.
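The one-process-per-MB structure could be expressed in Cilk roughly as below. This is a hedged sketch, not the project's code. `estimate_mb` is a hypothetical stand-in that returns a fixed placeholder vector instead of running the full search, and `estimate_global` is likewise a hypothetical name. The three macros elide the Cilk-5 keywords so that the sketch compiles as plain C, and they would be removed for a real Cilk build.

```c
#include <assert.h>

/* Assumption: elide the Cilk keywords so this sketch builds as plain C. */
#define cilk
#define spawn
#define sync

enum { NBLK = 16 };                       /* 16*16 macro blocks per frame */

typedef struct { int x, y; } MotionVec;   /* hypothetical helper type */

/* Hypothetical stand-in for the per-MB full search: writes a fixed
   placeholder vector so that the sketch stays self-contained. */
cilk void estimate_mb(int i, int j, MotionVec *out)
{
    assert(i >= 0 && i < NBLK && j >= 0 && j < NBLK);
    out->x = 3;
    out->y = 2;
}

/* Master: spawns one independent task per MB, waits for all of them,
   then averages the local vectors into the global motion vector. */
cilk void estimate_global(MotionVec *global)
{
    MotionVec local[NBLK * NBLK];
    long sx = 0, sy = 0;
    int i, j, n;

    for (i = 0; i < NBLK; i++)
        for (j = 0; j < NBLK; j++)
            spawn estimate_mb(i, j, &local[i * NBLK + j]);
    sync;   /* all 256 tasks have finished past this point */

    for (n = 0; n < NBLK * NBLK; n++) {
        sx += local[n].x;
        sy += local[n].y;
    }
    global->x = (int)(sx / (NBLK * NBLK));
    global->y = (int)(sy / (NBLK * NBLK));
}
```

Each task writes only to its own slot of `local`, so the tasks need no communication until the `sync`, which matches the independence argument above.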