What, Where & How Many? Combining Object Detectors and CRFs

What, Where & How Many? Combining Object Detectors and CRFs • L’ubor Ladický, Paul Sturgess, Karteek Alahari, Chris Russell, and Philip H.S. Torr • Lecturer: Zhiguo Ma

Outline • Authors • Abstract • Background • Hierarchical CRF • Object detector potential in CRF • Experiments & Conclusion

About the Authors • L’ubor Ladický • 8 papers at CVPR, ICCV, BMVC, ECCV, etc. • Best paper awards at BMVC 2010 and ECCV 2010 • Website: http://sots.brookes.ac.uk/lubor/ • No information available for Paul Sturgess and Chris Russell

Karteek Alahari • 10+ papers at ACCV, ICPR, CVPR, BMVC, PAMI, ECCV, etc. • Website: http://www.di.ens.fr/~alahari/

Philip H.S. Torr • PhD at the Robotics Research Group of the University of Oxford • Stayed at Oxford as a research fellow, and is currently a Visiting Fellow in Engineering Science at the University of Oxford • Research scientist at Microsoft Research • Many journal and conference papers in the fields of CV, ML, and PR

Abstract • Computer vision algorithms for individual tasks such as object recognition, detection and segmentation have shown impressive results in the recent past. • The next challenge is to integrate all these algorithms and address the problem of scene understanding. This paper is a step towards this goal. • We present a probabilistic framework for reasoning about regions, objects, and their attributes such as object class, location, and spatial extent. • Our model is a Conditional Random Field defined on pixels, segments and objects. We define a global energy function for the model, which combines results from sliding window detectors, and low-level pixel-based unary and pairwise relations. • One of our primary contributions is to show that this energy function can be solved efficiently. • Experimental results show that our model achieves significant improvement over the baseline methods on CamVid and PASCAL VOC datasets.

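Abstract (translated from the Chinese slide) • Computer vision algorithms for individual tasks (such as object recognition, detection, and segmentation) have made great progress in recent years. The next challenge is to integrate these algorithms to solve the problem of scene understanding; this paper is a step toward that goal. • We propose a probabilistic framework for reasoning about regions, objects, and their attributes (such as object class, location, and spatial extent). • Our model is a conditional random field defined on pixels, regions, and objects. The model defines a global energy function that integrates information from sliding-window object detectors with low-level pixel-based unary and pairwise information. • One of our main contributions is to show that this energy function can be solved efficiently. • Results on the CamVid and PASCAL VOC datasets show that our model achieves a large performance improvement over the baseline algorithms.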

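Background • (a) Original image (b) Object class segmentation (c) Object detection (d) Detection combined with segmentation (this paper) • Object class segmentation misses some objects and gives no information about the number of objects. • Object detection finds such objects but does not provide a foreground/background segmentation. • Integrating segmentation and detection addresses both problems.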

Related Work • Stuff and Things • Stuff: a homogeneous or recurring pattern of fine-scale properties, with no specific spatial extent or shape • Things: objects with a distinct size and shape • Object class segmentation • Succeeds on stuff, but fails on things • Foreground (thing) object detection • Good at things, but fails on stuff, which is amorphous

CRF • Label set: object classes (such as car, airplane, bicycle, etc.) • Random variables: image pixels • Clique c: a set of pixels conditionally dependent on each other • Labeling x: any possible assignment of labels to pixels

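Posterior distribution & energy of CRF • $\Pr(\mathbf{x} \mid \mathbf{D}) = \frac{1}{Z} \exp\left(-\sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)\right)$ • $E(\mathbf{x}) = -\log \Pr(\mathbf{x} \mid \mathbf{D}) - \log Z = \sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)$ • $Z$: normalizing factor • $\mathbf{x}$: labeling • $\mathbf{D}$: data • $\mathcal{C}$: set of cliques • $\psi_c$: potential function of clique $c$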

Potentials in energy function • Unary potential • Local feature responses: the likelihood of a pixel taking a certain label • Pairwise potential • Encourages neighboring pixels to take the same label • Higher-order potential between segments • Models relationships between segments, objects, etc. • Color potential for object instances • Foreground and background estimation
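The unary and pairwise terms above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a 4-connected pixel grid and a Potts pairwise penalty with a constant weight, and it omits the higher-order and color potentials. The function name `crf_energy` and the parameter `pairwise_weight` are hypothetical.

```python
import numpy as np

def crf_energy(labels, unary, pairwise_weight=1.0):
    """Energy of a labeling on a 4-connected pixel grid (illustrative only).

    labels: (H, W) integer array of label indices.
    unary:  (H, W, L) array; unary[i, j, l] is the cost of giving pixel (i, j) label l.
    pairwise_weight: assumed constant Potts penalty for each pair of
                     neighboring pixels that take different labels.
    """
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    # Unary term: cost of the label chosen at every pixel.
    unary_term = unary[rows, cols, labels].sum()
    # Pairwise term: count label changes between horizontal and vertical neighbors.
    horizontal = (labels[:, 1:] != labels[:, :-1]).sum()
    vertical = (labels[1:, :] != labels[:-1, :]).sum()
    return float(unary_term + pairwise_weight * (horizontal + vertical))

# Toy 2 x 2 image with two labels; pixel (0, 0) prefers label 1, the rest prefer label 0.
unary = np.zeros((2, 2, 2))
unary[:, :, 1] = 0.5
unary[0, 0] = [1.0, 0.0]

smooth = np.zeros((2, 2), dtype=int)   # every pixel takes label 0
mixed = np.array([[1, 0], [0, 0]])     # pixel (0, 0) follows its unary preference

energy_smooth = crf_energy(smooth, unary)
energy_mixed = crf_energy(mixed, unary)
energy_mixed_weak = crf_energy(mixed, unary, pairwise_weight=0.25)
```

With the default weight the smooth labeling has the lower energy; with a weaker pairwise weight the labeling that follows the unary preference wins, which is the trade-off the pairwise potential controls.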

Object detector potential in CRF • The set of pixels in a detection $d_1$ is denoted by $\mathbf{x}_{d_1}$ • The binary variable $y_{d_1}$ indicates whether the detection is valid

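Energy function with detector potential • $E(\mathbf{x}) = E_{pix}(\mathbf{x}) + \sum_{d \in \mathcal{D}} \psi_d(\mathbf{x}_d, H_d, l_d)$ • $E_{pix}$: pixel-based energy • $\mathbf{x}_d$: pixels in detection $d$ • $l_d$: detected label • $\mathcal{D}$: set of detections • $H_d$: detection score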

Inference for detector potentials • Rewrite detector potential:
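As a rough illustration of how a detector potential with a validity variable can behave, the sketch below minimizes over a binary $y_d$: accepting a detection earns a reward that grows with its score and is offset by a penalty for every pixel in the box whose label disagrees with the detected label. The specific strength and penalty functions, and the parameters `score_threshold`, `weight`, and `tolerated_fraction`, are assumptions chosen for illustration and are not taken from the slides.

```python
import numpy as np

def detector_potential(labels_in_box, detected_label, score,
                       score_threshold=0.0, weight=1.0, tolerated_fraction=0.25):
    """Return (potential, accepted) for one detection (illustrative sketch).

    labels_in_box:  array of current pixel labels inside the bounding box.
    detected_label: label proposed by the detector.
    score:          detector response for this box.
    The remaining parameters are hypothetical tuning constants.
    """
    n_pixels = labels_in_box.size
    if n_pixels == 0:
        return 0.0, False
    # Number of pixels in the box that disagree with the detected label.
    n_inconsistent = int((labels_in_box != detected_label).sum())
    # Reward for accepting the detection (assumed linear in score above threshold).
    strength = weight * n_pixels * max(0.0, score - score_threshold)
    # Penalty for inconsistent pixels (assumed linear in their count); it cancels
    # the reward once tolerated_fraction of the box disagrees.
    penalty = strength * n_inconsistent / (tolerated_fraction * n_pixels)
    cost_if_accepted = -strength + penalty
    # Minimize over y_d in {0, 1}: rejecting the detection (y_d = 0) costs 0.
    accepted = cost_if_accepted < 0.0
    return (cost_if_accepted if accepted else 0.0), accepted

# A box whose pixels all agree with the detected label 3.
consistent_box = np.full((2, 2), 3)
potential_ok, accepted_ok = detector_potential(consistent_box, 3, 1.5, score_threshold=0.5)

# A box where half of the pixels disagree: the detection is rejected.
conflicting_box = np.array([[3, 3], [0, 0]])
potential_bad, accepted_bad = detector_potential(conflicting_box, 3, 1.5, score_threshold=0.5)
```

Because a rejected detection contributes zero, a false detection that conflicts with the pixel labeling does not raise the overall energy in this sketch.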

Experimental Results • Datasets • CamVid • 10 minutes of high-quality 30 Hz footage • 960 × 720 resolution • Three of the four sequences were shot in daylight, one at dusk • 32 classes in total, 11 classes used in this paper • PASCAL VOC 2009 • 14743 images, 20 foreground classes and 1 background class • 749 training, 750 validation and 750 test images

Details of CRF framework • Two-level hierarchical CRF based on pixels and segments • Pixel-based potentials • Use TextonBoost to estimate the probability of a certain label by boosting weak classifiers based on a set of shape filter responses • Segment-based potentials • Segments or super-segments obtained with mean shift • Joint Boosting algorithm

Detection-based potentials • Detectors • Histogram-based detector • Multiple features (bag of words, self-similarity, SIFT, and oriented edge descriptors) • Cascaded classifier composed of SVMs • Parts-based detector • HOG descriptors • Deformable parts and global template • Latent SVM • Output of detectors • Bounding boxes with response scores • Foreground and background color model • GMM

Results on CamVid dataset

Results on PASCAL VOC dataset

Summary • Integration of detectors with CRF • Can handle occluded objects and false detections • Efficient and tractable with graph cuts

Thank you! Any questions?
