Object Detection using Deep Neural Network

Object Detection using Deep Neural Network Wan-Ru, Lin 2016/10/27

Outline • Introduction • Background • R-CNN (2014) • SPPnet (2014) – speeds up R-CNN • Fast R-CNN (2015) • Faster R-CNN (2015) • YOLO (2015)

Introduction • Object detection has long been an interesting task in computer vision • Location (x,y,w,h) • Classification

Introduction • Before Fast R-CNN (2015): Region proposal → Feature extraction → Classifier → "cat" • After Fast R-CNN: Region proposal → Feature extraction → Classifier → "cat" [R. Girshick, "Fast R-CNN," in IEEE International Conference on Computer Vision (ICCV), 2015]

Introduction • R-CNN (2014) • Fast R-CNN (2015) • Faster R-CNN (2015) • YOLO (2015)

Background • Convolutional Neural Network (CNN) • Convolution • Nonlinearity (sigmoid, ReLU) • Pooling • Structure: a feature extractor followed by a classifier
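The convolution and nonlinearity steps can be illustrated with a minimal pure-Python sketch. The function names `conv2d` and `relu`, the toy image, and the edge kernel are assumptions made for illustration; real CNN layers learn their kernels and use optimized libraries.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation, the operation CNN
    layers usually call "convolution"."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # hypothetical hand-made kernel: fires on a dark-to-bright step
features = relu(conv2d(image, edge_kernel))
```

Each row of `features` is nonzero only at the column where the brightness step occurs, which is the sense in which a convolutional layer acts as a feature extractor.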

Background • Pooling • Reduces the spatial size • Gives invariance to small translations • Loss function • Error backpropagation
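A small sketch of max pooling, assuming non-overlapping 2 × 2 windows (the helper name `max_pool2d` and the toy feature maps are illustrative only). It shows both effects listed on the slide: the output is smaller, and a shift that stays inside one pooling window does not change the result.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping size x size max pooling (stride equals the window size)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, w - size + 1, size)
        ]
        for i in range(0, h - size + 1, size)
    ]


original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
    [0, 0, 0, 0],
]
# Same activations, each moved by one pixel but staying inside its 2 x 2 window.
shifted = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 0],
]

pooled_original = max_pool2d(original)
pooled_shifted = max_pool2d(shifted)
```

Both inputs pool to the same 2 × 2 map. The invariance is only approximate: a shift that crosses a window boundary does change the output.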

Background • PASCAL VOC • Location • Class • 20 classes: Person: person; Animal: bird, cat, cow, dog, horse, sheep; Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train; Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

Background • Pre-training • ILSVRC dataset: ~1.2 million images • Fine-tuning • PASCAL VOC 2012

R-CNN • Multi-stage pipeline: Selective Search region proposals, CNN feature extraction, SVM classification

R-CNN • Selective Search • Generate possible object locations

R-CNN • Training • Supervised pre-training: ILSVRC 2012 • Domain-specific fine-tuning: • warp each input region to a fixed size • number of outputs: 1000 -> 20 + 1 (background) • SVM • Separates the data with a hyperplane
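The "warp input" step can be sketched as a crop followed by a resize that ignores the aspect ratio. This is a minimal illustration, not the R-CNN implementation: the function name `warp_region`, the top-left (x, y, w, h) box convention, and nearest-neighbor sampling are all assumptions, and the output size here is tiny for readability (the R-CNN paper warps to the CNN's fixed input size, 227 × 227).

```python
def warp_region(image, box, out_size):
    """Crop box = (x, y, w, h), with (x, y) the top-left corner, and resize the
    crop to out_size x out_size by nearest-neighbor sampling. The aspect ratio
    is not preserved, which is the source of the warping distortion."""
    x, y, w, h = box
    warped = []
    for i in range(out_size):
        src_row = y + (i * h) // out_size
        row = []
        for j in range(out_size):
            src_col = x + (j * w) // out_size
            row.append(image[src_row][src_col])
        warped.append(row)
    return warped


# Toy 4 x 6 image whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(6)] for r in range(4)]

# A wide 4 x 2 region is squeezed into a 2 x 2 square.
warped = warp_region(image, box=(1, 0, 4, 2), out_size=2)
```

Because the wide region is forced into a square, horizontal detail is sampled more coarsely than vertical detail.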

R-CNN • Disadvantages of R-CNN • Distortion due to warping • Training is a multi-stage pipeline • Training is expensive in space and time • Object detection is slow • 47 s/image with VGG16 (on a GPU)

R-CNN

SPPnet • Shares the feature map across all regions • Produces a fixed-length feature • Assume n × n bins • ROI size: a × a • Pooling window size = ⌈a/n⌉, stride = ⌊a/n⌋ • Avoids image warping
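A minimal sketch of spatial pyramid pooling on a single-channel square feature map, assuming the window and stride rule from the SPPnet paper (window = ⌈a/n⌉, stride = ⌊a/n⌋ for an a × a map and n × n bins). The function names and the pyramid levels (1, 2) are illustrative choices, not the configuration used in the paper.

```python
import math


def spp_window(a, n):
    """Window size and stride that pool an a x a map into n x n bins."""
    return math.ceil(a / n), a // n


def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a square map into n x n bins for every n in levels and
    concatenate the results. The output length is sum(n * n for n in levels),
    independent of the input size (requires a >= max(levels))."""
    a = len(feature_map)
    pooled = []
    for n in levels:
        window, stride = spp_window(a, n)
        for bi in range(n):
            for bj in range(n):
                r0, c0 = bi * stride, bj * stride
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(r0, min(r0 + window, a))
                    for c in range(c0, min(c0 + window, a))
                ))
    return pooled


small = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]  # 4 x 4 map
large = [[r * 6 + c + 1 for c in range(6)] for r in range(6)]  # 6 x 6 map

vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
```

Both inputs yield a vector of length 1 + 4 = 5, which is why regions of any size can feed the same fully connected layers without warping.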

SPPnet • Sharing feature maps speeds up R-CNN • Achieves mAP comparable to R-CNN

Fast R-CNN • Selective Search: ~2K proposals • RoI pooling: a 1-scale SPP layer (7×7) • Single-stage training • Training can update all network layers

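Fast R-CNN • Multi-task loss • Output: class probabilities p and bounding-box offsets t • Ground truth: class u and bounding-box regression target v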
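For reference, the multi-task loss can be written out as follows, using the notation of the Fast R-CNN paper [R. Girshick, 2015]: p is the predicted class distribution, u the ground-truth class (u = 0 is background), t^u the predicted box offsets for class u, v the regression target, and λ a balancing weight. The symbols come from the paper rather than from the slide text.

```latex
\begin{aligned}
L(p, u, t^{u}, v) &= L_{\text{cls}}(p, u) + \lambda \, [u \ge 1] \, L_{\text{loc}}(t^{u}, v) \\
L_{\text{cls}}(p, u) &= -\log p_{u} \\
L_{\text{loc}}(t^{u}, v) &= \sum_{i \in \{x, y, w, h\}} \operatorname{smooth}_{L_1}\!\left(t^{u}_{i} - v_{i}\right) \\
\operatorname{smooth}_{L_1}(x) &=
\begin{cases}
0.5\,x^{2} & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise}
\end{cases}
\end{aligned}
```

The indicator [u ≥ 1] switches the localization term off for background regions, which have no ground-truth box.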

Fast R-CNN • Contributions • Higher mAP than R-CNN and SPPnet • Training is single-stage, using multi-task loss • Training can update all network layers • No disk storage is required for feature caching

Fast R-CNN

Faster R-CNN • Selective search consumes much of the running time • Faster R-CNN = Fast R-CNN + Region proposal network (RPN)

Faster R-CNN • Region proposal network (RPN) • Pick the top-ranked 100 proposals at test time
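Selecting the top-ranked proposals amounts to sorting by the RPN's objectness score and keeping the first N. A minimal sketch, assuming proposals arrive as hypothetical (score, box) pairs; the function name and the sample data are made up for illustration, and the usual non-maximum suppression step is omitted.

```python
import heapq


def top_proposals(proposals, n=100):
    """Return the n highest-scoring proposals, best first.
    Each proposal is a (score, (x, y, w, h)) pair."""
    return heapq.nlargest(n, proposals, key=lambda p: p[0])


# Hypothetical scored proposals.
proposals = [
    (0.20, (10, 10, 50, 50)),
    (0.95, (30, 40, 120, 80)),
    (0.60, (5, 60, 40, 90)),
    (0.85, (100, 20, 60, 60)),
]

kept = top_proposals(proposals, n=2)
```

Only the kept boxes are passed on to the detection head, so N trades detection cost against recall.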

Faster R-CNN • Timing(ms)

Faster R-CNN • Contributions • Presents RPNs for efficient and accurate region proposal generation • Shares convolutional features between region proposal and object detection

YOLO • Uses features from the entire image to predict each bounding box • A single neural network performs: • Region proposal • Feature extraction • Classification • Bounding box regression

YOLO • Divide the input image into an S × S grid • Each grid cell • predicts 2 bounding boxes (x, y, w, h) • predicts a confidence score for each bounding box • predicts class probabilities: Pr(Class_i | Object)
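The grid scheme can be sketched with two small helpers. The values S = 7, B = 2, and C = 20 are the PASCAL VOC settings reported in the YOLO paper, and the rule that the cell containing an object's center is responsible for it also comes from that paper; the function names and the 448 × 448 image size in the usage lines are illustrative assumptions.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell containing the box center (cx, cy)."""
    col = min(S - 1, int(cx / img_w * S))
    row = min(S - 1, int(cy / img_h * S))
    return row, col


def yolo_output_size(S=7, B=2, C=20):
    """Each cell predicts B boxes with 5 values each (x, y, w, h, confidence)
    plus C class probabilities shared by the cell."""
    return S * S * (B * 5 + C)


center_cell = responsible_cell(224, 224, 448, 448)
corner_cell = responsible_cell(448, 0, 448, 448)
n_outputs = yolo_output_size()
```

With these settings the network emits 7 × 7 × 30 values per image in a single forward pass.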

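YOLO • Example overlaps: IOU = 0.8, IOU = 0.3 • Output number = S × S × (B × 5 + C) = 7 × 7 × (2 × 5 + 20) = 1470, for an S × S grid with S = 7, B = 2 boxes per cell (x, y, w, h, confidence), and C = 20 classes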
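Intersection over union (IOU) measures how well a predicted box matches a ground-truth box. A minimal sketch, assuming boxes are given as (x, y, w, h) with (x, y) the top-left corner; YOLO itself parameterizes boxes by their center, so this convention is a simplification for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes, (x, y) = top-left."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


same = iou((0, 0, 2, 2), (0, 0, 2, 2))      # identical boxes
partial = iou((0, 0, 2, 2), (1, 1, 2, 2))   # overlap area 1, union area 7
disjoint = iou((0, 0, 1, 1), (5, 5, 1, 1))  # no overlap
```

A value near 1 means the boxes nearly coincide, while a value near 0 means they barely overlap.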

YOLO • VOC 2007

YOLO

YOLO • Limitations • Struggles with small objects that appear in groups • Struggles to generalize to objects in new or unusual aspect ratios or configurations

References
[1] R. Girshick et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[2] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders. "Selective search for object recognition." International Journal of Computer Vision, 104(2):154–171, 2013.
[3] R. B. Girshick. "Fast R-CNN." CoRR, abs/1504.08083, 2015.
[4] S. Ren, K. He, R. Girshick, and J. Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." arXiv preprint arXiv:1506.01497, 2015.
[5] J. Redmon et al. "You only look once: Unified, real-time object detection." arXiv preprint arXiv:1506.02640, 2015.
[6] K. He et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014.
