What, Where & How Many? Combining Object Detectors and CRFs L’ubor Ladický, Paul Sturgess, Karteek Alahari, Chris Russell, and Philip H.S. Torr Lecturer: Zhiguo Ma
Outline • Authors • Abstract • Background • Hierarchical CRF • Object detector potential in CRF • Experiments & Conclusion
About the Authors • L’ubor Ladický • 8 papers at CVPR, ICCV, BMVC, ECCV, etc. • Best paper awards at BMVC 2010 and ECCV 2010 • Website: http://sots.brookes.ac.uk/lubor/ • No information available for Paul Sturgess and Chris Russell
Karteek Alahari • 10+ papers in venues including ACCV, ICPR, CVPR, BMVC, ECCV, and PAMI • Website: http://www.di.ens.fr/~alahari/
Philip H.S. Torr • PhD at the Robotics Research Group of the University of Oxford • Stayed at Oxford as a research fellow, and is currently a Visiting Fellow in Engineering Science at the University of Oxford • Research scientist at Microsoft Research • Many journal and conference papers in the fields of CV, ML, and PR
Abstract • Computer vision algorithms for individual tasks such as object recognition, detection and segmentation have shown impressive results in the recent past. • The next challenge is to integrate all these algorithms and address the problem of scene understanding. This paper is a step towards this goal. • We present a probabilistic framework for reasoning about regions, objects, and their attributes such as object class, location, and spatial extent. • Our model is a Conditional Random Field defined on pixels, segments and objects. We define a global energy function for the model, which combines results from sliding window detectors, and low-level pixel-based unary and pairwise relations. • One of our primary contributions is to show that this energy function can be solved efficiently. • Experimental results show that our model achieves significant improvement over the baseline methods on CamVid and PASCAL VOC datasets.
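Abstract (translation) • Computer vision algorithms for individual tasks (such as object recognition, detection, and segmentation) have made great progress in recent years. The next challenge is to integrate these algorithms to address scene understanding; this paper is a step toward that goal. • We propose a probabilistic framework for inferring regions, objects, and their attributes (such as object class, location, and spatial extent). • Our model is a conditional random field defined over pixels, regions, and objects. It defines a global energy function that integrates sliding-window object detectors with low-level, pixel-level unary and pairwise information. • One of our main contributions is showing that this energy function can be minimized efficiently. • Results on the CamVid and PASCAL VOC datasets show that our model improves substantially over the baseline algorithms.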
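Background (a) Original image (b) Object class segmentation (c) Object detection (d) Detection combined with segmentation (this paper) • Object class segmentation misses some objects and gives no information about how many objects are present; object detection finds such objects but provides no foreground/background segmentation. Integrating segmentation and detection addresses both problems.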
Related Work • Stuff and Things • Stuff: a homogeneous or recurring pattern of fine-scale properties, with no specific spatial extent or shape • Things: objects with a distinct size and shape • Object class segmentation • Successful on stuff, but fails on things • Foreground (thing) object detection • Good at things, but fails on stuff, which is amorphous
CRF • Label set: object classes (such as car, airplane, bicycle, etc.) • Random variables: one per image pixel • Clique c: a set of pixels that are conditionally dependent on each other • Labeling x: any possible assignment of labels to pixels
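Posterior distribution & energy of CRF • $\Pr(\mathbf{x} \mid \mathbf{D}) = \frac{1}{Z} \exp\left(-\sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)\right)$ • $E(\mathbf{x}) = -\log \Pr(\mathbf{x} \mid \mathbf{D}) - \log Z = \sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)$ • $Z$: normalizing constant • $\mathbf{x}$: labeling • $\mathbf{D}$: data • $\mathcal{C}$: set of cliques • $\psi_c$: potential function of clique $c$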
Potentials in energy function • Unary potential • Local feature responses: the likelihood of a pixel taking a certain label • Pairwise potential • Encourages neighboring pixels to take the same label • Higher-order potential between segments • Models relationships between segments, objects, etc. • Color potential for object instances • Foreground and background estimation
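The unary and pairwise terms listed above can be made concrete with a small sketch. It assumes a 4-connected pixel grid and a Potts pairwise term (a constant penalty whenever two neighbors disagree); the function name `crf_energy`, the `pairwise_weight` parameter, and the toy costs are illustrative assumptions, not taken from the paper, and the higher-order and color terms are omitted.

```python
from typing import List

def crf_energy(labels: List[List[int]],
               unary: List[List[List[float]]],
               pairwise_weight: float = 1.0) -> float:
    """Energy of a labeling: unary costs plus a Potts penalty on a 4-connected grid.

    unary[r][c][l] is the (hypothetical) cost of giving pixel (r, c) label l.
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for r in range(height):
        for c in range(width):
            # Unary term: how poorly the local features support this label.
            energy += unary[r][c][labels[r][c]]
            # Pairwise term: penalize disagreement with right and bottom neighbors.
            if c + 1 < width and labels[r][c] != labels[r][c + 1]:
                energy += pairwise_weight
            if r + 1 < height and labels[r][c] != labels[r + 1][c]:
                energy += pairwise_weight
    return energy

# Toy 2x2 image with two labels; the bottom-right pixel weakly prefers label 1.
toy_unary = [[[0.0, 2.0], [0.0, 2.0]],
             [[0.0, 2.0], [1.5, 1.0]]]
smooth_labels = [[0, 0], [0, 0]]
noisy_labels = [[0, 0], [0, 1]]
smooth_energy = crf_energy(smooth_labels, toy_unary)
noisy_energy = crf_energy(noisy_labels, toy_unary)
```

In this toy case the smooth labeling has the lower energy even though one pixel's unary cost favors the other label, which is the smoothing effect the pairwise potential is meant to provide.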
Object detector potential in CRF • The set of pixels in a detection d1 is denoted by x_d1 • The variable y_d1 represents the validity of that detection
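Energy function with detector potential • $E(\mathbf{x}) = E_{pix}(\mathbf{x}) + \sum_{d \in \mathcal{D}} \psi_d(\mathbf{x}_d, H_d, l_d)$ • $E_{pix}$: pixel-based energy • $\mathbf{x}_d$: pixels in detection $d$ • $l_d$: detected label • $\mathcal{D}$: set of detections • $H_d$: detection score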
Inference for detector potentials • Rewrite detector potential:
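The rewritten formula is not preserved in these slides, so the following is only a hedged sketch of the general idea: a detection with validity variable y_d is accepted (y_d = 1) when the reward from its score outweighs the penalty for pixels inside the box that disagree with the detected label, and is otherwise rejected (y_d = 0) at zero cost. The names `detector_potential`, `strength`, and `penalty_per_pixel`, and the linear penalty itself, are assumptions made for illustration, not the paper's exact definition.

```python
from typing import Sequence, Tuple

def detector_potential(pixel_labels: Sequence[int],
                       detected_label: int,
                       strength: float,
                       penalty_per_pixel: float) -> Tuple[float, int]:
    """Return (potential value, chosen y_d) by minimizing over y_d in {0, 1}.

    strength          -- hypothetical reward derived from the detection score
    penalty_per_pixel -- hypothetical cost per pixel disagreeing with the detection
    """
    # Count pixels in the detection whose label differs from the detected label.
    n_inconsistent = sum(1 for label in pixel_labels if label != detected_label)
    # y_d = 1: take the reward, pay for every inconsistent pixel.
    cost_if_valid = -strength + penalty_per_pixel * n_inconsistent
    # y_d = 0: the detection is treated as false and contributes nothing.
    cost_if_invalid = 0.0
    if cost_if_valid < cost_if_invalid:
        return cost_if_valid, 1
    return cost_if_invalid, 0

# Most pixels agree with the detected label 1: the detection is kept.
kept = detector_potential([1, 1, 1, 0], detected_label=1,
                          strength=2.0, penalty_per_pixel=1.0)
# Most pixels disagree: the detection is rejected as a false positive.
rejected = detector_potential([0, 0, 0, 1], detected_label=1,
                              strength=2.0, penalty_per_pixel=1.0)
```

Taking the minimum over y_d is what lets the segmentation evidence overrule a false detection instead of forcing the pixels to follow every bounding box.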
Experimental Results • Dataset • CamVid • 10 minutes of high-quality 30 Hz video • 960 × 720 resolution • Three of the four sequences shot in daylight, one at dusk • 32 classes in total, 11 used in this paper • PASCAL VOC 2009 • 14,743 images, 20 foreground classes and 1 background class • Segmentation subset: 749 training, 750 validation, and 750 test images
Details of CRF framework • Two-level hierarchical CRF based on pixels and segments • Pixel-based potentials • TextonBoost estimates the probability of each label by boosting weak classifiers built on a set of shape filter responses • Segment-based potentials • Segments or super-segments obtained with mean shift • Joint Boosting algorithm
Detection-based potentials • Detectors • Histogram-based detector • Multiple features (bag of words, self-similarity, SIFT, and oriented-edge descriptors) • Cascaded classifier composed of SVMs • Parts-based detector • HOG descriptors • Deformable parts and global template • Latent SVM • Output of detectors • Bounding boxes with response scores • Foreground and background color model • GMM
Summary • Integration of detectors with CRF. • Can handle occluded objects and false detections • Efficient and tractable with graph cut.
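To illustrate why graph cuts make this kind of energy tractable, here is a minimal sketch for the two-label case only, assuming non-negative unary costs and a Potts pairwise term. It builds an s-t graph and finds a minimum cut with a plain Edmonds-Karp max-flow; the function names and the toy chain of four pixels are hypothetical, and the multi-label inference used in the paper is not reproduced here.

```python
from collections import defaultdict, deque
from itertools import product

def binary_energy(labels, unary, edges, weight):
    """Unary costs plus a Potts penalty for each edge whose endpoints disagree."""
    cost = sum(unary[i][label] for i, label in enumerate(labels))
    return cost + sum(weight for i, j in edges if labels[i] != labels[j])

def graph_cut_labels(unary, edges, weight):
    """Minimize binary_energy exactly with an s-t min cut (Edmonds-Karp)."""
    n = len(unary)
    source, sink = n, n + 1
    cap = defaultdict(lambda: defaultdict(float))
    for i, (cost0, cost1) in enumerate(unary):
        cap[source][i] += cost1   # cut when pixel i ends on the sink side (label 1)
        cap[i][sink] += cost0     # cut when pixel i stays on the source side (label 0)
    for i, j in edges:
        cap[i][j] += weight       # cut when neighbors take different labels
        cap[j][i] += weight
    flow = 0.0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, residual in cap[u].items():
                if residual > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Pixels still reachable from the source take label 0, the rest label 1.
    labels = [0 if i in parent else 1 for i in range(n)]
    return labels, flow

def brute_force_minimum(unary, edges, weight):
    """Exhaustive check, feasible only for tiny problems."""
    return min(binary_energy(labels, unary, edges, weight)
               for labels in product((0, 1), repeat=len(unary)))

# Toy chain of four pixels; each tuple is (cost of label 0, cost of label 1).
chain_unary = [(0.0, 3.0), (2.0, 1.0), (0.0, 3.0), (3.0, 0.0)]
chain_edges = [(0, 1), (1, 2), (2, 3)]
cut_labels, cut_value = graph_cut_labels(chain_unary, chain_edges, weight=1.5)
```

The exhaustive search visits every one of the 2^n labelings, while the cut is found in polynomial time, which is the practical meaning of "efficient and tractable" for energies of this form.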
Thank you! Any questions?