Detecting Cartoons a Case Study in Automatic Video-Genre Classification Tzvetanka Ianeva Arjen de Vries Hein Röhrig
Outline • Goal: remove cartoons from search results in TREC-2002 video track • Our Approach: extract Image Descriptors & SVM Machine Learning • Related work • Novel Descriptors from Granulometry • SVM Learning • Experimental Results
TREC-2002 video track • TREC: workshops for large-scale evaluation of information retrieval technology • CWI participation: Probabilistic Multimedia Retrieval Model • The model does not sufficiently distinguish cartoons
Example of an undesirable ‘cartoon’ result [Figure: query image and the best matches returned]
Related work • M. Roach et al., Motion-based classification of cartoons (2001) • B. T. Truong et al., Automatic genre identification for content-based video categorization (2000) • J. R. Smith et al., Searching for images and videos on the World Wide Web • N. C. Rowe et al., Automatic caption localization for photographs on WWW pages • V. Athitsos et al. [ASF], Distinguishing photographs and graphics on the WWW
Cartoons • What is a cartoon? • Cartoons do not contain any photographic material • Photos: taken with a photographic camera • Finding cartoons appears easy: few, simple, strong colors; patches of uniform color; strong black edges; text
Image Descriptors [Figure: an input image (240x352x3) is mapped to a vector of 148 image descriptors; e.g. average saturation 0.6231 and threshold brightness 0.9266 for a cartoon vs. 0.2880 and 0.4125 for a photo] • greater correlation • normalized
Overview of all our image descriptors (descriptor: dimension) • average saturation: 1 • threshold brightness: 1 • color histogram: 45 • edge-direction histogram: 40 • compression ratio: 1 • multi-scale pattern spectrum: 60 • total: 148
Brightness and Saturation • HSV color model • Cartoons brighter => use % pixels with Value > 0.4 • Cartoons have strong colors => use average Saturation
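A minimal sketch of the two scalar descriptors, assuming the image is given as a flat list of (r, g, b) pixels with components in [0, 1]; the function name and the 4-pixel test image are hypothetical. Python's standard `colorsys.rgb_to_hsv` supplies S and V.

```python
import colorsys

def brightness_and_saturation(pixels, v_threshold=0.4):
    """Return (average saturation, fraction of pixels with V > v_threshold).

    `pixels` is a non-empty list of (r, g, b) tuples with components in [0, 1]
    (an assumed input format, not taken from the slides).
    """
    total_s = 0.0
    bright = 0
    for r, g, b in pixels:
        _, s, v = colorsys.rgb_to_hsv(r, g, b)
        total_s += s
        if v > v_threshold:          # "% pixels with Value > 0.4"
            bright += 1
    n = len(pixels)
    return total_s / n, bright / n

# Hypothetical 4-pixel image: pure red, pure blue, mid gray, dark gray.
image = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.5), (0.2, 0.2, 0.2)]
avg_s, frac_bright = brightness_and_saturation(image)
print(avg_s, frac_bright)  # 0.5 0.75
```

The two saturated primaries push the average saturation up, while the grays contribute S = 0; three of the four pixels exceed V = 0.4.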
Saturation in cartoon and photo images [Figure: RGB image and S channel (HSV) for a cartoon and a photo; average saturation 0.6231 (cartoon) vs. 0.2880 (photo)]
Brightness in cartoon and photo images [Figure: RGB image and V channel (HSV) for a cartoon and a photo; threshold brightness 0.9266 (cartoon) vs. 0.4125 (photo)]
Histograms • Image $I: X \times Y \to \mathbb{R}^c$ • Filter $F: I \mapsto I'$ • Bins $B_k$ form a partition of $\mathbb{R}^c$ • $h_k = \#\{(x,y) : I'(x,y) \in B_k\}$ • E.g. brightness metric: $I$ grayscale, $c = 1$, $B_1 = [0, 0.4]$, $B_2 = (0.4, 1]$, return $h_2$
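The generic histogram definition can be sketched directly in Python. Here the filter $F$ is any function from image to image (identity by default), bins are membership tests assumed to partition the value range, and the half-open $B_2 = (0.4, 1]$ matches "Value > 0.4" on the brightness slide; all names are hypothetical.

```python
def histogram(image, bins, filt=lambda img: img):
    """h_k = #{(x, y) : I'(x, y) in B_k}, where I' = filt(image).

    image: 2-D list (rows) of pixel values; bins: list of membership tests,
    assumed to partition the value range so each pixel lands in one bin.
    """
    filtered = filt(image)
    h = [0] * len(bins)
    for row in filtered:
        for value in row:
            for k, in_bin in enumerate(bins):
                if in_bin(value):
                    h[k] += 1
                    break
    return h

# Brightness metric: grayscale I (c = 1), B1 = [0, 0.4], B2 = (0.4, 1], return h2.
gray = [[0.1, 0.9], [0.5, 0.3]]
bins = [lambda v: 0.0 <= v <= 0.4, lambda v: 0.4 < v <= 1.0]
h = histogram(gray, bins)
brightness = h[1] / sum(h)   # h2 as a fraction of all pixels
print(h, brightness)  # [2, 2] 0.5
```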
Color Histogram • More general than brightness & saturation • Again HSV color space • Partition HSV into 3x3x5 = 45 bins • Cartoons have fewer colors => color histogram descriptor
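The slide does not say which HSV channel gets 5 bins or how the bins are spaced; the sketch below assumes 3 hue x 3 saturation x 5 value bins, each of uniform width on [0, 1], and normalizes counts by the number of pixels. Function names are hypothetical.

```python
import colorsys

# Assumed split: 3 hue x 3 saturation x 5 value bins, uniform on [0, 1].
N_H, N_S, N_V = 3, 3, 5

def bin_index(x, n):
    """Map x in [0, 1] to one of n equal-width bins (x = 1 goes to the last bin)."""
    return min(int(x * n), n - 1)

def color_histogram(pixels):
    """Normalized 45-bin HSV histogram of a list of (r, g, b) pixels in [0, 1]."""
    hist = [0.0] * (N_H * N_S * N_V)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        k = (bin_index(h, N_H) * N_S + bin_index(s, N_S)) * N_V + bin_index(v, N_V)
        hist[k] += 1
    n = len(pixels)
    return [c / n for c in hist]

# A cartoon-like palette: only three distinct colors, so only three bins are used.
hist = color_histogram([(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)])
print(len(hist), sum(1 for c in hist if c > 0))  # 45 3
```

A photo typically spreads its mass over many more of the 45 bins, which is the contrast this descriptor is meant to pick up.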
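Edge detection • Cartoons have strong black edges => edge-direction histogram • Approximate the total derivative (gradient) of the intensity: $\nabla I(x,y) = \left(\frac{\partial I}{\partial x}(x,y), \frac{\partial I}{\partial y}(x,y)\right)$ • Approximate its magnitude $\|\nabla I\|$ and direction $\theta$ • Histogram of $(\theta, \|\nabla I\|)$ • 5 intervals for $\|\nabla I\|$ in $[0, \sqrt{20}]$ • 8 intervals for $\theta$ in $[0, 2\pi]$ (5 x 8 = 40 bins)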
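A sketch of the 5 x 8 edge-direction histogram. The slides only say "approx."; using the Sobel operator for the derivatives, evaluating interior pixels only, and clamping magnitudes above $\sqrt{20}$ into the last interval are all assumptions here, as are the function names.

```python
import math

N_MAG, N_DIR = 5, 8
MAX_MAG = math.sqrt(20)

def sobel(img, x, y):
    """Sobel approximation of (dI/dx, dI/dy) at interior pixel (x, y); img[y][x]."""
    gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
    gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return gx, gy

def edge_direction_histogram(img):
    """Normalized 40-bin histogram over (magnitude interval, direction interval)."""
    hist = [0] * (N_MAG * N_DIR)
    height, width = len(img), len(img[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx, gy = sobel(img, x, y)
            mag = math.hypot(gx, gy)
            theta = math.atan2(gy, gx) % (2 * math.pi)          # direction in [0, 2*pi)
            m = min(int(mag / MAX_MAG * N_MAG), N_MAG - 1)      # clamp above sqrt(20)
            d = min(int(theta / (2 * math.pi) * N_DIR), N_DIR - 1)
            hist[m * N_DIR + d] += 1
    total = sum(hist)
    return [c / total for c in hist]

# Vertical step edge (dark left, bright right): every interior pixel sees
# gradient (4, 0), i.e. the top magnitude interval and direction 0.
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
hist = edge_direction_histogram(step)
print(hist.index(1.0))  # 32
```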
Compressibility [Figure: example images with compression ratios 0.23365 and 0.13548] • Cartoons: simpler composition • Detect complexity by measuring the compression ratio • Theory: “Kolmogorov complexity” • Our application: use lossless PNG compression • Lossy JPEG not useful
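PNG's lossless compression is DEFLATE, the same algorithm Python's `zlib` module exposes, applied after per-row filtering. The sketch below uses plain `zlib` on raw pixel bytes as a stand-in (no PNG filters) and assumes the ratio is compressed size over raw size, so simpler images score lower.

```python
import os
import zlib

def compression_ratio(raw: bytes) -> float:
    """Compressed size / raw size (assumed definition); lower = more compressible."""
    return len(zlib.compress(raw, 9)) / len(raw)

flat = bytes([200, 30, 30] * (64 * 64))   # 64x64 image of one uniform color
noisy = os.urandom(3 * 64 * 64)            # random pixels: an extreme of photo-like texture
print(compression_ratio(flat) < compression_ratio(noisy))  # True
```

A lossy codec such as JPEG would discard detail according to its quality setting, so its output size says less about intrinsic image complexity, which is consistent with the slide's preference for a lossless format.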
Granulometries • Idea: measure the size distribution of objects • How? Openings by structuring elements (SE) of growing scale • Normalized size distribution • Derivative = pattern spectrum
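A common way to formalize these bullets (the standard granulometry definitions, assumed here rather than taken from the slides) uses the opening $\gamma_r$ by the SE at scale $r$, with $\gamma_0 f = f$, and the image volume $\operatorname{vol}(f) = \sum_{x} f(x)$:

$$F(r) = 1 - \frac{\operatorname{vol}(\gamma_r f)}{\operatorname{vol}(f)}, \qquad PS(r) = F(r) - F(r-1) = \frac{\operatorname{vol}(\gamma_{r-1} f) - \operatorname{vol}(\gamma_r f)}{\operatorname{vol}(f)}, \quad r = 1, 2, \dots$$

$F$ is the normalized size distribution and $PS(r)$, its discrete derivative, is the fraction of the image volume removed between scales $r-1$ and $r$, i.e. the amount of bright structure of size about $r$.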
Openings • Opening = erosion followed by dilation with the same structuring element (SE)
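For a grayscale image $f$ and a flat, symmetric SE $B$ (for example a disk), the standard definitions are:

$$\varepsilon_B(f)(x) = \min_{b \in B} f(x+b), \qquad \delta_B(f)(x) = \max_{b \in B} f(x-b), \qquad \gamma_B(f) = \delta_B\big(\varepsilon_B(f)\big)$$

The opening never increases a pixel value ($\gamma_B f \le f$) and removes bright structures into which $B$ does not fit, which is what lets a family of growing SEs sieve an image by size.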
Structuring Elements • Non-flat parabola better(?) than flat disk • Parabola: efficient computation, symmetry
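One reason parabolas compute efficiently: the non-flat erosion with the parabolic SE $g_t(y) = -\|y\|^2/(2t)$ (this scale parameterization by $t$ is our assumption) is $\varepsilon_{g_t}(f)(x) = \min_y \left[f(x+y) + \|y\|^2/(2t)\right]$, and since $\|y\|^2 = y_1^2 + y_2^2$ it splits into a 1-D erosion along rows followed by one along columns. This Python sketch checks the separable result against brute force; the function names are hypothetical and the 1-D pass is a plain $O(n^2)$ loop, not an optimized algorithm.

```python
import random

def erode_1d(line, t):
    """min_j [line[j] + (j - i)^2 / (2t)] for each i (naive O(n^2) 1-D pass)."""
    n = len(line)
    return [min(line[j] + (j - i) ** 2 / (2 * t) for j in range(n)) for i in range(n)]

def erode_parabolic_separable(img, t):
    """2-D parabolic erosion as a row pass followed by a column pass; img[y][x]."""
    rows = [erode_1d(row, t) for row in img]                 # along x
    cols = [erode_1d(list(col), t) for col in zip(*rows)]    # along y
    return [list(r) for r in zip(*cols)]

def erode_parabolic_brute(img, t):
    """Direct 2-D minimum over all pixels, for comparison."""
    h, w = len(img), len(img[0])
    return [[min(img[v][u] + ((u - x) ** 2 + (v - y) ** 2) / (2 * t)
                 for v in range(h) for u in range(w))
             for x in range(w)] for y in range(h)]

random.seed(0)
img = [[random.random() for _ in range(6)] for _ in range(5)]
a = erode_parabolic_separable(img, t=2.0)
b = erode_parabolic_brute(img, t=2.0)
print(all(abs(p - q) < 1e-12 for ra, rb in zip(a, b) for p, q in zip(ra, rb)))  # True
```

The parabola is also rotation-symmetric, so the separable passes introduce no preferred direction.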
Small-scale pattern spectrum descriptors • SE: disks of radius $r_i = i$, $i = 1, \dots, 20$
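A minimal pure-Python sketch of a pattern spectrum over flat disks, assuming a grayscale image as a 2-D list, discrete disks $\{(dy, dx) : dy^2 + dx^2 \le r^2\}$, and an SE truncated at the image border; names are hypothetical. For the slide's descriptors one would call `pattern_spectrum(gray, range(1, 21))`, though pure Python is slow at 240x352.

```python
def disk(r):
    """Offsets (dy, dx) of a flat discrete disk of radius r."""
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= r * r]

def _filter(img, se, pick):
    """Apply min (erosion) or max (dilation) over the SE, truncated at borders."""
    h, w = len(img), len(img[0])
    return [[pick(img[y + dy][x + dx] for dy, dx in se
                  if 0 <= y + dy < h and 0 <= x + dx < w)
             for x in range(w)] for y in range(h)]

def opening(img, r):
    se = disk(r)  # symmetric, so the reflected SE used by dilation is the same set
    return _filter(_filter(img, se, min), se, max)

def pattern_spectrum(img, radii):
    """PS(r_i) = [vol(opening at r_{i-1}) - vol(opening at r_i)] / vol(img)."""
    vol = lambda f: sum(map(sum, f))
    total = vol(img)
    prev = total                      # the opening at scale 0 is the image itself
    spectrum = []
    for r in radii:
        cur = vol(opening(img, r))
        spectrum.append((prev - cur) / total)
        prev = cur
    return spectrum

# A bright 3x3 square on a dark 9x9 background.
img = [[1.0 if 3 <= y <= 5 and 3 <= x <= 5 else 0.0 for x in range(9)] for y in range(9)]
ps = pattern_spectrum(img, [1, 2])
print(ps)
```

The radius-1 disk (a plus shape) keeps only a plus-shaped 5 of the 9 bright pixels, and the radius-2 disk removes the rest, so the spectrum is 4/9 at r = 1 and 5/9 at r = 2.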
SVM Learning • Simplest case: linear separator • SVM finds hyperplane with largest margin • Closest points = Support Vectors
SVM Learning: nonseparable • Noisy data: no separating hyperplane at all! • Solution: penalty C for points inside the margin • C-SVM
SVM = quadratic programming SVM task: Equivalent dual problem:
SVM with kernels SVM task: Equivalent dual problem:
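For reference, the standard soft-margin C-SVM quadratic program and its dual, which we assume is the formulation these slides refer to, read as follows for training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, $i = 1, \dots, n$, a feature map $\phi$ and kernel $K(x_i, x_j) = \phi(x_i)\cdot\phi(x_j)$ (the linear case is $\phi(x) = x$):

$$\min_{w,\, b,\, \xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad\text{s.t.}\quad y_i\big(w\cdot\phi(x_i) + b\big) \ge 1 - \xi_i,\quad \xi_i \ge 0$$

$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \quad\text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i=1}^{n}\alpha_i y_i = 0$$

The margin width is $2/\|w\|$, the solution satisfies $w = \sum_i \alpha_i y_i \phi(x_i)$, and the training points with $\alpha_i > 0$ are the support vectors.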
SVM kernels • RBF kernels • Polynomial kernels
SVM with kernels: decision function SVM task: Equivalent dual problem: Decision function:
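A minimal Python sketch of the two kernel families and the resulting decision function $f(x) = \operatorname{sign}\left(\sum_i \alpha_i y_i K(x_i, x) + b\right)$. The RBF form $\exp(-\|x - z\|^2 / (2\sigma^2))$ is an assumption (tools differ in where they put $\sigma^2$ or $\gamma$), as is the polynomial offset $c = 1$. The two "support vectors" reuse the example (saturation, brightness) values of the cartoon and the photo with made-up $\alpha_i$ and $b$, so this is not a trained model.

```python
import math

def rbf_kernel(x, z, sigma2):
    """Assumed RBF form: exp(-||x - z||^2 / (2 * sigma2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma2))

def poly_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel (x . z + c)^degree."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree

def decision(x, support_vectors, labels, alphas, b, kernel):
    """sign(sum_i alpha_i y_i K(x_i, x) + b) with labels in {+1, -1}."""
    s = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1

# Hypothetical model: +1 = cartoon, -1 = photo; alphas satisfy sum(alpha_i * y_i) = 0.
svs = [(0.6231, 0.9266), (0.2880, 0.4125)]
labels = [+1, -1]
alphas = [1.0, 1.0]
k = lambda x, z: rbf_kernel(x, z, sigma2=0.1)
print(decision((0.6, 0.9), svs, labels, alphas, 0.0, k))   # 1
print(decision((0.3, 0.45), svs, labels, alphas, 0.0, k))  # -1
```

A small $\sigma^2$ makes each support vector's influence local, which is why the per-descriptor experiments tune it.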
Experimental Data • Key frames from TREC 2002 Video Track • 13,026 photographic images • 1,620 cartoons • Manually classified • Experiments 1-3: train on (random) 3908 photos and 486 cartoons
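The training counts are 30% of each class (3908 / 13026 ≈ 0.300 and 486 / 1620 = 0.300). A stratified random split reproduces them; the sketch below assumes that is how the sample was drawn, and the function name and seed are hypothetical.

```python
import random

def stratified_split(photos, cartoons, fraction=0.3, seed=None):
    """Randomly draw `fraction` of each class for training; the rest is for testing."""
    rng = random.Random(seed)
    split = {}
    for name, items in (("photo", photos), ("cartoon", cartoons)):
        shuffled = list(items)
        rng.shuffle(shuffled)
        k = round(fraction * len(shuffled))
        split[name] = (shuffled[:k], shuffled[k:])   # (train, test)
    return split

split = stratified_split(range(13026), range(1620), seed=1)
print(len(split["photo"][0]), len(split["cartoon"][0]))  # 3908 486
```

Sampling each class separately keeps the roughly 8:1 photo-to-cartoon ratio identical in the training and held-out sets.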
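Experiment 1: individual performance • Total error: $E_t = \frac{|p|}{|p|+|c|}E_p + \frac{|c|}{|p|+|c|}E_c$, with $E_p$, $E_c$ the error rates on photos and cartoons and $|p|$, $|c|$ the numbers of photos and cartoons • [Table: per-descriptor results; RBF widths used: σ² = 0.1; 0.05 < σ² < 0.5; σ² = 0.07; 0.05 < σ² < 0.5; 0.05 < σ² < 0.5; σ² = 0.07]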
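The total error is the class-size-weighted average of the per-class errors, i.e. the overall fraction of misclassified images. A one-line Python sketch (hypothetical function name) also shows why the per-class errors matter here: with the data set's 13,026 photos and 1,620 cartoons, a trivial classifier that answers "photo" for everything already reaches $E_t \approx 0.11$.

```python
def total_error(err_photo, err_cartoon, n_photo, n_cartoon):
    """E_t = |p|/(|p|+|c|) * E_p + |c|/(|p|+|c|) * E_c."""
    n = n_photo + n_cartoon
    return n_photo / n * err_photo + n_cartoon / n * err_cartoon

# Always answering "photo": E_p = 0, E_c = 1.
print(round(total_error(0.0, 1.0, 13026, 1620), 4))  # 0.1106
```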
Experiment 2: “convergence” of SVM learning (Pattern spectrum)
Experiment 3: combined performance • σ² = 0.06
Experiment 4: web-image classifier on our data • Test set: random 1,000 photos and 1,000 cartoons
Experiment 5: Performance on web images • Comparison with 14,039 photographic and 9,512 graphical images harvested from the WWW • Train on (random) 4,239 photographic and 2,826 graphical images, with dimension and file-type features added
Conclusions • Hard task: good classifier • Use dynamics/spatio-temporal relations ? • Semantic Gap? • Combine classifiers? • Granulometry not enough