A Parallel Implementation of MSER detection GPGPU Final Project Lin Cao
Review Invariant to affine transformations such as rotation, translation, and scale change; denotes a set of stable connected components detected in a grayscale image.
Review • An MSER is a stable connected component of a thresholded image • All pixels inside an MSER have higher or lower intensities than those in the surrounding region • Regions are selected to be stable over a range of intensity thresholds
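The idea above can be illustrated with a minimal, pure-Python sketch (not the project's GPU code): threshold a toy image at several levels and observe that the dark blob's connected-component area stays constant over a wide threshold range, which is exactly what makes it "stable".

```python
from collections import deque

def connected_components(mask, w, h):
    """Return the areas of 4-connected components in a boolean mask (BFS flood fill)."""
    seen = [[False] * w for _ in range(h)]
    areas = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                area, q = 0, deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    area += 1
                    for ny, nx in ((cy-1,cx),(cy+1,cx),(cy,cx-1),(cy,cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                areas.append(area)
    return areas

# Toy 4x4 grayscale image: a dark blob on a bright background.
img = [[200, 200, 200, 200],
       [200,  10,  10, 200],
       [200,  10,  10, 200],
       [200, 200, 200, 200]]

# Threshold at increasing levels; the blob's area stays 4 across all of
# them, so it is a stable (MSER-like) region.
for t in (50, 100, 150):
    mask = [[img[y][x] <= t for x in range(4)] for y in range(4)]
    print(t, connected_components(mask, 4, 4))
```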
Sequential and Parallel Approach

Sequential {
  bucketSort( );
  Find( );
  Union( );
  Update( );
  computeVariation( );
  findRoot( );
  leastVariation( );
}

Parallel {
  buildDirectedGraph( );
  blockReduction( );
  parentCompression( );
  // already get regions
  GetRegion( );
  computeVariation( );
  leastVariation( );
}
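The sequential Find/Union steps above are the classic union-find (disjoint-set) pattern; a minimal sketch of that pattern is below. Names and the merge order are illustrative, not the project's actual code.

```python
# Minimal union-find sketch of the sequential Find()/Union() steps.
parent = {}

def find(x):
    # Walk parent links to the root, compressing the path on the way back.
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

def union(a, b):
    # Merge the components containing a and b.
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

# Pixels visited in intensity order (bucketSort) are merged with already
# processed neighbours, growing connected components level by level.
for p in range(6):
    parent[p] = p
union(0, 1); union(1, 2); union(4, 5)
assert find(0) == find(2) and find(4) == find(5) and find(0) != find(4)
```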
buildDirectedGraph The parent of each pixel must have a value no less than the pixel's own value. Memory usage: local memory (visited, members) and shared memory. Edges are also processed here for the next step.
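A hedged CPU-side sketch of the parent rule above: point each pixel at a 4-neighbour whose intensity is at least its own, so every parent's value is no less than the child's. The grid layout and neighbour order are assumptions, not the project's kernel.

```python
# Sketch: build the directed graph by choosing, for each pixel, a
# 4-neighbour with intensity >= its own as its parent (pixels with no
# such neighbour become their own root).
def build_directed_graph(img, w, h):
    parent = {}
    for y in range(h):
        for x in range(w):
            parent[(y, x)] = (y, x)  # default: self-rooted
            for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                if 0 <= ny < h and 0 <= nx < w and img[ny][nx] >= img[y][x]:
                    parent[(y, x)] = (ny, nx)
                    break
    return parent

img = [[1, 2],
       [3, 4]]
g = build_directed_graph(img, 2, 2)
# Invariant from the slide: a parent's value is no less than its child's.
assert all(img[p[0]][p[1]] >= img[c[0]][c[1]] for c, p in g.items())
```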
Block Reduction (16×16 and 8×8 blocks)
Block Reduction The number of reduction iterations is logarithmic (log₂) in the block width; in total, 3 iterations are needed.
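A sketch of the tree-style reduction behind that iteration count (assuming the 3 iterations correspond to an 8-wide block, since log₂ 8 = 3): each iteration halves the number of active elements, as parallel threads would in a GPU block.

```python
# Tree reduction: combine pairs at halving strides, counting iterations.
def block_reduce(vals):
    vals = list(vals)
    iterations = 0
    stride = len(vals) // 2
    while stride >= 1:
        for i in range(stride):
            vals[i] += vals[i + stride]  # pairwise combine, like parallel threads
        stride //= 2
        iterations += 1
    return vals[0], iterations

total, iters = block_reduce(range(8))   # sum of 0..7 in log2(8) = 3 steps
assert total == 28 and iters == 3
```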
Block Reduction Load edge information into each pixel; update when the horizontal-edge condition holds: if (horizontal_pixelUpdate)
Block Reduction History buffer
Parent Compression Uses shared memory, exploiting parent locality.
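A minimal sketch of what parent compression does (the shared-memory tiling is omitted): rewrite every pixel's parent pointer to point directly at its root, so later lookups follow one link instead of a chain, which is also what makes parent locality pay off.

```python
# Path compression over a parent-pointer table: after compress(), every
# entry points directly at its root.
def compress(parent):
    for p in list(parent):
        root = p
        while parent[root] != root:
            root = parent[root]
        parent[p] = root
    return parent

chain = {0: 1, 1: 2, 2: 3, 3: 3}   # 0 -> 1 -> 2 -> 3
compress(chain)
assert chain == {0: 3, 1: 3, 2: 3, 3: 3}
```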
FindRegion • FindRoot, so that each region's tree can be processed independently • Find each region's parent and child based on the delta, so that the variation can be computed: var = (area(parent) − area(child)) / area(current region) • Send the region information to the CPU • Scan every region's tree and find the minimal variation, which gives the MSER regions • Filter the regions
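The variation rule above can be checked on a toy branch of the component tree (the areas and candidate regions here are made-up examples, not measured data): compute var for nested regions, then keep the candidate with the least variation.

```python
# var = (area(parent) - area(child)) / area(current region), per the slide.
def variation(area_parent, area_child, area_current):
    return (area_parent - area_child) / area_current

# Areas of three nested regions along one branch of the tree.
areas = {'child': 40, 'current': 50, 'parent': 60}
v = variation(areas['parent'], areas['child'], areas['current'])
assert abs(v - 0.4) < 1e-9

# Among candidate regions, the MSER is the one with minimal variation.
candidates = {'A': 0.4, 'B': 0.15, 'C': 0.9}
assert min(candidates, key=candidates.get) == 'B'
```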
Performance Analysis • For a 256×256 image (timing chart)
Performance Analysis • For a 1024×768 image (timing chart)
Performance Analysis Why is 8×8 better than 16×16? • local memory usage • number of recursions • block execution • number of block-reduction iterations • parent locality
Performance Analysis GPU vs. CPU timing • intermediate values • synchronization • recording information • memory transfer
Conclusion • The data dependency is very large, but it can still be resolved. • The approach should also suit multicore microprocessors, whose individual cores are stronger than a single GPU thread. • The bottleneck is still memory.
Future Work • More efficient block reduction (decoder and encoder) • Memory random access • GPU code efficiency