Real-time Signal Processing on Embedded Systems Advanced Cutting-edge Research Seminar I&III
Practical Applications • Pedestrian Detection • FPGA-based system • Pedestrian Tracking • GPU-based system
Hardware Architecture for High-Accuracy Real-Time Pedestrian Detection with CoHOG Features
Outline • Introduction • Pedestrian detection using CoHOG features • Proposed hardware architecture • Parallel execution • Merging histogram calculation and SVM prediction • FPGA implementation • Conclusion
Pedestrian detection on automotive systems • Challenges: • Various appearances of pedestrians: clothing shape and color, pose, etc. • Template-based or simple gradient-based methods do not achieve high-accuracy recognition • Viewpoint movement: all objects in an image are moving • Background subtraction or frame subtraction cannot be used • A robust recognition method suitable for pedestrians is required
Pedestrian detection algorithms • Recent trend: • Combination of gradients and histograms • Gradient: robust to illumination and color changes • Histogram: robust to deformation • Examples • Histograms of oriented gradients (HOG) • Co-occurrence histograms of oriented gradients (CoHOG)* • HOG-based method • Uses pairs of oriented gradients • One of today's best algorithms for pedestrian detection • However, real-time execution is difficult to achieve in software (e.g. a few seconds are required to process a 320x240 image) • Specialized hardware is needed for real-time processing * T. Watanabe, S. Ito, and K. Yokoi, “Co-occurrence histograms of oriented gradients for pedestrian detection,” PSIVT 2009
Pedestrian detection using CoHOG • Calculate gradient orientations • Divide into small regions (BLOCKS) • Pick up pairwise pixels • Calculate co-occurrence histograms of oriented gradients • Repeat for various positions of pixel pairs (called OFFSETS; 31 variations, e.g. Offset 1, Offset 2) • The histograms form the CoHOG feature vector • Classified by SVM
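The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the block is assumed to already hold quantized orientation labels (8 bins, matching the 8x8 matrix mentioned later in the deck), and pixel pairs are restricted to the inside of the block for simplicity.

```python
def cooccurrence_histogram(orient, offset, n_bins=8):
    """Count pairs of quantized orientations separated by `offset` = (dx, dy).

    `orient` is a 2-D list of integers in [0, n_bins). The result is an
    n_bins x n_bins matrix stored as nested lists.
    Simplification (assumption): pairs that leave the block are skipped.
    """
    dx, dy = offset
    height, width = len(orient), len(orient[0])
    hist = [[0] * n_bins for _ in range(n_bins)]
    for y in range(height):
        for x in range(width):
            px, py = x + dx, y + dy
            if 0 <= px < width and 0 <= py < height:
                hist[orient[y][x]][orient[py][px]] += 1
    return hist


def cohog_block_feature(orient, offsets, n_bins=8):
    """Concatenate the flattened histograms of all offsets for one block."""
    feature = []
    for offset in offsets:
        hist = cooccurrence_histogram(orient, offset, n_bins)
        feature.extend(value for row in hist for value in row)
    return feature


# Toy 3x3 block of orientation labels and two illustrative offsets.
block = [[0, 1, 1],
         [2, 1, 0],
         [2, 2, 3]]
feature = cohog_block_feature(block, offsets=[(1, 0), (0, 1)])
```

With two offsets and 8 orientation bins, `feature` has 2 x 64 entries. The full descriptor would repeat this for all 31 offsets and every block.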
Sliding window approach • Detection procedure • Feature vectors are extracted in scan-line order. • The image size or window size is scaled to detect pedestrians at other scales.
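A rough sketch of the scan order follows. The window size (64x128), stride (8 pixels) and scale step (1.2) are illustrative assumptions; the slides do not state the values used in the actual system.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield the top-left corner of every sub-window in scan-line order."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield x, y


def scaled_sizes(image_w, image_h, win_w, win_h, scale_step=1.2):
    """Yield shrinking image sizes until the window no longer fits."""
    w, h = float(image_w), float(image_h)
    while int(w) >= win_w and int(h) >= win_h:
        yield int(w), int(h)
        w, h = w / scale_step, h / scale_step


# Assumed parameters, chosen only for illustration.
corners = list(sliding_windows(320, 240, win_w=64, win_h=128, stride=8))
sizes = list(scaled_sizes(320, 240, win_w=64, win_h=128))
```

Shrinking the image while keeping the window fixed is equivalent to growing the window, which is why the slide mentions scaling either one.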
Parallel execution of CoHOG feature calculation • A large number of co-occurrence histograms must be calculated → All histograms can be calculated in parallel • Offsets (31 variations) • 31 parallel threads • Blocks (6x12=72 blocks) • Horizontal: 6 parallel threads • Vertical: 12 parallel threads • Large parallelism • We execute 31 parallel offsets and 6 horizontal block-threads = 186 parallel threads • Processing performance is drastically improved!
Merging histogram calculation and SVM prediction • The dimension of the CoHOG feature vector is very high • Matrix size: 8x8=64 • 64 × 31 offsets × 72 blocks (6x12) = about 140k dimensions • Large memory is required to store the feature vector • Many multiplications must be executed during SVM prediction: f(x) = sign(w · x + b) • Our proposal: Execute histogram calculation and SVM prediction simultaneously
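The arithmetic behind "about 140k dimensions" and the decision function f(x) = sign(w · x + b) can be checked with a short sketch. The constant names and `svm_predict` are hypothetical, and mapping a score of exactly zero to +1 is an assumption.

```python
BINS = 8          # gradient orientations, giving an 8x8 co-occurrence matrix
OFFSETS = 31      # offset variations
BLOCKS = 6 * 12   # blocks per sub-window

# One weight (and one histogram bin) per matrix entry, offset and block.
dims = BINS * BINS * OFFSETS * BLOCKS


def svm_predict(w, x, b):
    """Linear SVM decision f(x) = sign(w . x + b), returning +1 or -1.

    Assumption: a score of exactly 0 is reported as +1.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

`dims` evaluates to 142,848, so a straightforward implementation would need that many histogram bins and as many multiplications per sub-window.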
Merging histogram calculation and SVM prediction • Straightforward approach • Histogram calculation: scan the image and add +1 to the corresponding bin (i, j); the histogram is generated • SVM prediction: multiply each bin by its weighting vector value wi,j and sum the products; the inner product is calculated for SVM prediction
Merging histogram calculation and SVM prediction • Proposed method • Histogram calculation and SVM prediction in one pass: scan the image and directly accumulate the weighting vector value wi,j for each pair (i, j) • Large memory to store histograms and many multipliers for SVM prediction are unnecessary • Circuit size can be drastically reduced!
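The two approaches give the same score because each +1 added to bin (i, j) later contributes exactly wi,j to the inner product. The sketch below checks this on random data for a single 8x8 histogram; the function names are hypothetical and the integer weights are an assumption made so that the comparison is exact.

```python
import random


def score_via_histogram(pairs, weights, n_bins=8):
    """Straightforward approach: build the histogram, then multiply by weights."""
    hist = [[0] * n_bins for _ in range(n_bins)]
    for i, j in pairs:
        hist[i][j] += 1
    return sum(hist[i][j] * weights[i][j]
               for i in range(n_bins) for j in range(n_bins))


def score_merged(pairs, weights):
    """Merged approach: add the weight of each observed pair directly."""
    total = 0
    for i, j in pairs:
        total += weights[i][j]   # no histogram storage, no multiplication
    return total


rng = random.Random(0)
weights = [[rng.randint(-50, 50) for _ in range(8)] for _ in range(8)]
pairs = [(rng.randrange(8), rng.randrange(8)) for _ in range(200)]
same = score_via_histogram(pairs, weights) == score_merged(pairs, weights)
```

The merged version needs only a table lookup and an adder per pixel pair, which corresponds to the weighting vector ROMs and accumulator named in the proposed architecture.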
Proposed architecture • Modules: Gradient orientation image generator; Combined module for histogram calculation and SVM prediction • Block diagram components: Input image, Frame buffer (WxH), Line buffers, Shift registers, Sobel filter (horizontal), Sobel filter (vertical), Orientation classifier, Sub-window data, Weighting vector ROMs (31 offsets, 6 blocks), Accumulator, Controller, Results
Proposed architecture • Parallel execution • 31 offsets × 6 blocks = 186 parallel threads • Merging histogram calculation and SVM prediction • No histogram memory and no multipliers • Only weighting vector ROMs and an accumulator • An efficient hardware architecture is successfully designed by using the proposed methods
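What the Sobel filters and the orientation classifier compute can be illustrated in software. This is only a sketch under assumptions: the slides do not give the bin boundaries, so the angle-to-label mapping here is hypothetical, zero-gradient pixels simply fall into bin 0, and a hardware classifier would more plausibly compare gradient components than evaluate `atan2`.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal gradient
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical gradient


def sobel_at(img, x, y, kernel):
    """Apply a 3x3 kernel centered on pixel (x, y)."""
    return sum(kernel[ky][kx] * img[y + ky - 1][x + kx - 1]
               for ky in range(3) for kx in range(3))


def orientation_image(img, n_bins=8):
    """Quantize the gradient direction of every interior pixel into n_bins labels."""
    h, w = len(img), len(img[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sobel_at(img, x, y, SOBEL_X)
            gy = sobel_at(img, x, y, SOBEL_Y)
            angle = math.atan2(gy, gx) % (2 * math.pi)   # in [0, 2*pi)
            out[y - 1][x - 1] = int(angle / (2 * math.pi) * n_bins) % n_bins
    return out


# Brightness increasing to the right, and brightness increasing downwards.
ramp_x = [[x for x in range(4)] for _ in range(4)]
ramp_y = [[y for _ in range(4)] for y in range(4)]
```

The 3x3 neighbourhood explains why line buffers appear in the block diagram: three image rows must be available at once while pixels stream in.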
FPGA implementation • Implementation result • Target FPGA: Xilinx Virtex-5 XC5VLX330T-2 • Max delay: 5.997 ns (max frequency: 167 MHz) • Capable of real-time processing of a 320x240 video sequence at 38 fps • Our system can process 139,166 sub-windows / second • Intel Core i7 3.2 GHz: about 1,100 sub-windows / second • More than 100 times faster!
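The reported figures can be cross-checked with simple arithmetic. The variable names are hypothetical, and the sub-windows-per-frame value is derived here from the two reported rates; it is not a number stated on the slide.

```python
max_delay_ns = 5.997
fpga_windows_per_s = 139_166
cpu_windows_per_s = 1_100
fps = 38

max_freq_mhz = 1_000 / max_delay_ns               # 1 / 5.997 ns, expressed in MHz
speedup = fpga_windows_per_s / cpu_windows_per_s  # FPGA versus CPU throughput
windows_per_frame = fpga_windows_per_s / fps      # implied, not stated on the slide
```

The clock estimate rounds to 167 MHz and the throughput ratio is roughly 126, consistent with the "more than 100 times faster" claim.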
Pedestrian detection system • FPGA board • Receives input images from the host PC and returns the pedestrian detection results • Xilinx Virtex-5 FPGA LX330T • PCI Express endpoint • DDR2 memory • Host PC • Transfers images captured by a camera and displays the detection results • CPU: Intel Core i7 3.2 GHz • Camera: USB webcam (640x480 resolution) • Connection: PCI Express (input images to the board, detection results back)
Conclusion • A high-performance and efficient hardware architecture for CoHOG-based pedestrian detection is proposed • Effectively exploits parallelism in the CoHOG algorithm → 186-way parallel processing is realized • Drastically reduces circuit area (memory and multipliers) by executing histogram calculation and SVM prediction simultaneously • The FPGA implementation is more than 100 times faster than a CPU → Capable of real-time processing of a 320x240 video sequence at 38 fps
Parallel Implementation of Pedestrian Tracking Using Multiple Cues on GPGPU
Outline • Introduction • Pedestrian Tracking using Multiple Cues • Parallel Implementation on NVIDIA GPU • Conclusion
Introduction • Pedestrian recognition • Combination of 2 steps • Detection: scan the entire input image • Tracking: track the pedestrians over the frames
Introduction • Pedestrian Tracking • Particle Filter • HSV color histogram within the rectangle (K. Okuma et al., ECCV 2004) • Simple background: tracking succeeds • Complex background: tracking fails
Introduction • Color information • Example: a red shirt on gray ground and a red car on gray ground, each with its HSV histogram • Shape information • Combining both color and shape information
Introduction • The contributions of this paper • New pedestrian tracking algorithm using both color and shape information based on particle filters • Parallel implementation on GPGPU for real-time processing
Particle Filter (pedestrian tracking) • Current frame (time t-1) → Prediction → Measurement → Re-sampling (time t) • Prediction: scatter particles • Measurement: measure the pedestrian likelihood of each particle • Re-sampling: eliminate low-likelihood particles and replicate high-likelihood particles
Particle Filter (pedestrian tracking) • To define the pedestrian likelihood, we use • Shape information: HOG feature • Color information: HSV histogram
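One prediction, measurement and re-sampling cycle can be sketched as follows. This is a generic particle filter under stated assumptions, not the authors' code: particles are 2-D positions only, the motion model is a Gaussian random walk, and `toy_likelihood` is a stand-in for the HOG and HSV based pedestrian likelihood.

```python
import math
import random


def predict(particles, rng, sigma=5.0):
    """Scatter particles with Gaussian noise (assumed random-walk motion model)."""
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
            for x, y in particles]


def resample(particles, likelihoods, rng):
    """Draw particles in proportion to likelihood, so unlikely ones die out."""
    return rng.choices(particles, weights=likelihoods, k=len(particles))


def track_step(particles, likelihood_fn, rng):
    """Run prediction, measurement and re-sampling once."""
    particles = predict(particles, rng)
    likelihoods = [likelihood_fn(p) for p in particles]
    total = sum(likelihoods)
    # Likelihood-weighted mean position as the tracking estimate.
    estimate = (sum(l * x for l, (x, y) in zip(likelihoods, particles)) / total,
                sum(l * y for l, (x, y) in zip(likelihoods, particles)) / total)
    return resample(particles, likelihoods, rng), estimate


def toy_likelihood(p, target=(50.0, 80.0)):
    """Hypothetical likelihood that peaks at a fixed target position."""
    d2 = (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
    return math.exp(-d2 / 200.0)


rng = random.Random(0)
particles = [(rng.uniform(0, 100), rng.uniform(0, 160)) for _ in range(200)]
for _ in range(10):
    particles, estimate = track_step(particles, toy_likelihood, rng)
```

The measurement step is the only part that touches image data, which is why it dominates the computation time in the GPU section of the deck.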
Histograms of Oriented Gradients • Represent object shape information • Calculate gradient orientation • Aggregate the gradient orientations of each block • Map the vector onto the HOG feature space • The discriminant border between pedestrian and non-pedestrian is learned beforehand by SVM
HSV Histogram • Represent object color information • Convert an input image into an HSV image (HSV color space: hue, saturation, value) • Calculate an HSV histogram • Calculate the Bhattacharyya distance to the reference HSV histogram in the HSV feature space
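The three steps can be sketched with the standard library. The bin counts (8x4x4) are an assumption, and the distance uses the common form d = sqrt(1 - sum_i sqrt(p_i q_i)) for normalized histograms; the slides do not state which variant the authors use.

```python
import colorsys
import math


def hsv_histogram(pixels, h_bins=8, s_bins=4, v_bins=4):
    """Normalized joint HSV histogram of a non-empty list of (r, g, b) in 0..255."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        count += 1
    return [c / count for c in hist]


def bhattacharyya_distance(p, q):
    """Distance in [0, 1]: 0 for identical histograms, 1 for disjoint ones."""
    coeff = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - coeff))


red = hsv_histogram([(255, 0, 0)] * 10)
blue = hsv_histogram([(0, 0, 255)] * 10)
```

A small distance to the reference histogram means the rectangle under a particle has colors similar to the tracked pedestrian.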
Pedestrian tracking using multiple cues • The pedestrian likelihood used in the measurement step combines • HOG feature space: pedestrian / non-pedestrian • HSV feature space: reference HSV histogram (existing algorithm) • Weighted coefficient in [0,1]
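The exact combination formula is not recoverable from the slide text, so the following is only one plausible reading: a convex combination of the two cues with a coefficient `alpha` in [0, 1]. The sigmoid mapping of the SVM score, the exponential mapping of the Bhattacharyya distance, and the parameter `lam` are all assumptions introduced for illustration.

```python
import math


def hog_likelihood(svm_score):
    """Map a signed distance to the discriminant border into (0, 1) (assumed sigmoid)."""
    if svm_score >= 0:
        return 1.0 / (1.0 + math.exp(-svm_score))
    e = math.exp(svm_score)          # avoids overflow for very negative scores
    return e / (1.0 + e)


def hsv_likelihood(bhattacharyya_dist, lam=20.0):
    """Map a Bhattacharyya distance in [0, 1] to a likelihood in (0, 1] (assumed form)."""
    return math.exp(-lam * bhattacharyya_dist ** 2)


def combined_likelihood(svm_score, bhattacharyya_dist, alpha=0.5):
    """Convex combination of shape and color cues with weight alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * hog_likelihood(svm_score)
            + (1.0 - alpha) * hsv_likelihood(bhattacharyya_dist))
```

Setting `alpha` to 0 reduces this to a color-only tracker and setting it to 1 gives a shape-only tracker, which matches the three configurations compared on the tracking results slide.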
Tracking results • HOG+HSV (our proposed algorithm) • HSV only (K. Okuma et al., ECCV 2004) • HOG only
NVIDIA GPU architecture • Streaming multiprocessors (SM) • 32-bit scalar processors (SP) • Shared memory • Read-only cache • Device memory • In the case of the Tesla C1060: • 4 GB device memory • 30 streaming multiprocessors (240 SPs in total) • 1.3 GHz processor clock
Implementation strategy • Run the measurement process on the GPU • It accounts for almost 99% of the computation time
Implementation strategy • Allocate each particle to an SM • Each particle is processed independently
Implementation strategy • Exploit pixel-level parallelism on SPs • Synchronization among SPs is fast
HSV likelihood calculation • Allocate each particle calculation to an SM • Calculate the HSV histogram on SPs, one image line per SP • Sum all the histograms • Calculate the Bhattacharyya distance to the reference HSV histogram • Transfer the results to the CPU memory
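The per-line histogram and the summation step can be emulated sequentially in Python. This is a stand-in for the GPU kernel rather than CUDA code: each call to `line_histogram` plays the role of one SP working on one line of a particle's rectangle, and the function names are hypothetical.

```python
def line_histogram(line, n_bins):
    """Partial histogram of one image line (the work assigned to one SP)."""
    hist = [0] * n_bins
    for bin_index in line:
        hist[bin_index] += 1
    return hist


def particle_histogram(region, n_bins):
    """Sum the per-line partial histograms of one particle's rectangle."""
    partials = [line_histogram(line, n_bins) for line in region]
    return [sum(column) for column in zip(*partials)]


# Toy region of already quantized HSV bin indices (2 lines, 3 pixels each).
region = [[0, 1, 1],
          [2, 2, 2]]
total = particle_histogram(region, n_bins=4)
```

Because each line produces its own partial histogram, the lines never write to the same counters, and the only synchronization needed is before the final sum.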
HOG likelihood calculation • Allocate each particle calculation to an SM • Calculate the gradient and angle on SPs • Calculate the HOG histogram on SPs, one group of pixels per SP • Sum the histograms • Calculate the distance to the discriminant border in the HOG feature space • Transfer the results to the CPU memory
Processing time • GPU: NVIDIA Tesla C1060 • Number of multiprocessors: 30 • Total number of scalar processors: 240 • Compared with an Intel Core i7 965 @ 3.2 GHz • 13.9 times faster • 113.6 fps
Conclusion • A pedestrian tracking algorithm using HSV and HOG features is proposed • Real-time processing can be achieved by the parallel implementation on an NVIDIA GPU
Report subject (not mandatory) • What do you think about the future advance of signal processing on embedded systems? Write freely (applications, things you would like to do, anything is OK). • Please submit the report by email to miya@is.naist.jp. • Please write your student ID and name in the body of the email. • Deadline: Feb 3rd 17:00