This paper presents KLBoosting, a variation of RealBoost that uses the Kullback-Leibler divergence to select optimal features. The talk also details how feature selection and parameter learning in KLBoosting differ from AdaBoost.
Kullback-Leibler Boosting Ce Liu, Heung-Yeung Shum Microsoft Research Asia CVPR 2003 Presented by Derek Hoiem
RealBoost Review
• Start with some candidate feature set
• Initialize training sample weights
• Loop:
  • Add the feature that minimizes the error bound
  • Reweight training examples, giving more weight to misclassified examples
  • Assign a weight to the weak classifier according to the weighted error on the training samples
• Exit the loop after N features have been added
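A minimal sketch of this loop, assuming a pool of linear-projection features and histogram-based real-valued weak learners (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def realboost(X, y, candidate_features, n_rounds, n_bins=32, eps=1e-6):
    """Sketch of the RealBoost loop. X is (n, d), y in {-1, +1},
    candidate_features is a list of projection vectors (one weak learner each)."""
    n = len(y)
    w = np.full(n, 1.0 / n)          # initialize training-sample weights
    strong = []                      # selected (feature, bin_edges, half_log_ratio)

    for _ in range(n_rounds):
        best = None
        for f in candidate_features:
            r = X @ f                                 # feature responses
            edges = np.linspace(r.min(), r.max(), n_bins + 1)
            idx = np.clip(np.digitize(r, edges) - 1, 0, n_bins - 1)
            Wp = np.bincount(idx[y > 0], weights=w[y > 0], minlength=n_bins) + eps
            Wm = np.bincount(idx[y < 0], weights=w[y < 0], minlength=n_bins) + eps
            Z = 2.0 * np.sum(np.sqrt(Wp * Wm))        # upper bound on error: pick feature minimizing Z
            if best is None or Z < best[0]:
                best = (Z, f, edges, 0.5 * np.log(Wp / Wm), idx)

        _, f, edges, h, idx = best
        strong.append((f, edges, h))
        w *= np.exp(-y * h[idx])                      # reweight: misclassified samples gain weight
        w /= w.sum()
    return strong
```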
The Basic Idea of KLBoosting
• Similar to RealBoost except:
  • Features are general linear projections
  • Generates optimal features
  • Uses KL divergence to select features
  • Finer tuning of the coefficients
Linear Features
• KLBoosting: features are arbitrary linear projections f(x) = wᵀx, with w learned
• VJ AdaBoost: Haar-like rectangle features, i.e. linear projections whose weights are constrained to rectangular blocks of ±1
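A toy contrast between the two feature types (the patch size and the particular two-rectangle Haar template are made up for illustration):

```python
import numpy as np

patch = np.random.rand(24, 24)        # a 24x24 image patch

# KLBoosting feature: any linear projection w (dense, real-valued, unconstrained)
w_general = np.random.randn(24 * 24)
f_kl = w_general @ patch.ravel()

# Viola-Jones AdaBoost feature: a Haar-like rectangle filter, i.e. a linear
# projection whose weights form +1 / -1 rectangular blocks
w_haar = np.zeros((24, 24))
w_haar[:, :12] = +1.0                 # left half minus right half (two-rectangle feature)
w_haar[:, 12:] = -1.0
f_vj = w_haar.ravel() @ patch.ravel()
```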
What makes a feature good?
• KLBoosting: maximize the KL divergence between the feature's response histograms on the two classes
• RealBoost: minimize an upper bound on the classification error
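A sketch of the KLBoosting criterion: histogram a candidate feature's responses on the two classes and score it by the symmetric KL divergence between the histograms (binning details are assumptions):

```python
import numpy as np

def kl_score(responses_pos, responses_neg, n_bins=32, eps=1e-6):
    """Symmetric KL divergence between the feature-response histograms
    of the positive and negative classes (larger = more discriminative)."""
    lo = min(responses_pos.min(), responses_neg.min())
    hi = max(responses_pos.max(), responses_neg.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(responses_pos, bins=edges)
    q, _ = np.histogram(responses_neg, bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))
```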
Creating the feature set
• Sequential 1-D Optimization:
  • Begin with a large initial set of features (linear projections)
  • Choose the top L features according to KL divergence
  • Initial feature = weighted sum of the L features
  • Search for the optimal feature along the directions of the L features
  (illustrated in the example slides below; a code sketch follows the example)
Example
• Initial feature set
[figure: 2-D scatter of training samples]
Example
• Top two features (by KL divergence): w1 and w2
[figure: scatter with the two projection directions w1, w2]
Example
• Initial feature f0 (combination of w1 and w2 weighted by their KL scores)
[figure: scatter with w1, w2 and the initial direction f0]
Example
• Optimize over w1: f1 = f0 + β·w1, searching β over [-a1, a1]
[figure: scatter with the updated direction f1]
Example
• Optimize over w2: f2 = f1 + β·w2, searching β over [-a2, a2] (and repeat…)
[figure: scatter with the updated direction f2]
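A sketch of the sequential 1-D optimization walked through above, reusing the kl_score helper from the earlier sketch: f0 is the KL-weighted combination of the top-L directions, and each sweep line-searches a step β along every direction (the β grid and the number of sweeps are assumptions):

```python
import numpy as np

def optimize_feature(X_pos, X_neg, top_dirs, kl_weights, a=1.0, n_steps=21, n_sweeps=3):
    """Sequential 1-D optimization of a linear feature.
    top_dirs: (L, d) array of the top-L candidate directions (by KL divergence).
    kl_weights: their KL scores, used to form the initial feature f0."""
    f = kl_weights @ top_dirs                     # f0 = KL-weighted sum of the L directions
    f /= np.linalg.norm(f)
    for _ in range(n_sweeps):                     # "(and repeat...)"
        for w in top_dirs:                        # optimize over w1, then w2, ...
            best_beta, best_score = 0.0, -np.inf
            for beta in np.linspace(-a, a, n_steps):
                cand = f + beta * w
                cand /= np.linalg.norm(cand)
                s = kl_score(X_pos @ cand, X_neg @ cand)
                if s > best_score:
                    best_score, best_beta = s, beta
            f = f + best_beta * w                 # f_i = f_{i-1} + beta* . w_i
            f /= np.linalg.norm(f)
    return f
```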
Creating the feature set
[figures: selecting the first feature; the first three features learned]
Classification
• Strong classifier: F(x) = sign( Σk αk log[ p(fk(x) | +1) / p(fk(x) | −1) ] )
• The coefficients αk are all fixed at ½ in RealBoost; KLBoosting learns them
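A minimal sketch of evaluating this strong classifier, assuming each selected feature stores its projection vector, histogram bin edges, and per-bin log-likelihood ratio (those data-structure details and the threshold are assumptions):

```python
import numpy as np

def classify(x, features, alphas, threshold=0.0):
    """features: list of (w, bin_edges, log_ratio) where log_ratio[b] =
    log p(f | face) / p(f | non-face) for bin b; alphas: one weight per feature."""
    score = 0.0
    for (w, edges, log_ratio), alpha in zip(features, alphas):
        r = w @ x
        b = np.clip(np.digitize(r, edges) - 1, 0, len(log_ratio) - 1)
        score += alpha * log_ratio[b]             # alpha = 0.5 for every k in RealBoost
    return 1 if score > threshold else -1
```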
Parameter Learning
• With each added feature k:
  • Fix α1…αk−1 at their current optimal values
  • Initialize αk to 0
  • Minimize recognition error on the training set
  • Solve using a greedy algorithm
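A sketch of the coefficient update when feature k is added: α1…αk−1 stay fixed, αk starts at 0, and we search for the value that minimizes training error. A simple grid search stands in here for the paper's greedy solver; the grid range is an assumption.

```python
import numpy as np

def learn_alpha_k(scores_prev, h_k, y, grid=np.linspace(0.0, 2.0, 201)):
    """scores_prev: strong-classifier scores using alpha_1..alpha_{k-1} (fixed).
    h_k: log-likelihood-ratio responses of the new feature on the training set.
    Returns the alpha_k with the lowest recognition error on the training set."""
    best_alpha, best_err = 0.0, np.inf
    for a in grid:
        err = np.mean(np.sign(scores_prev + a * h_k) != y)   # training error at this alpha_k
        if err < best_err:
            best_err, best_alpha = err, a
    return best_alpha
```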
KLBoost vs AdaBoost
• 1024 candidate features for AdaBoost
[figure: performance comparison]
Face detection: candidate features
[figure: candidate feature pool, with counts 52,400 / 2,800 / 450]
Face detection: training samples
• 8760 faces + mirror images
• 2484 non-face images → 1.34B patches
• Cascaded classifier allows bootstrapping
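A sketch of the bootstrapping the cascade makes possible: after each stage, scan the non-face images and keep only the patches the current classifier still accepts, and use them as negatives for the next stage (helpers such as extract_patches and current_classifier are hypothetical):

```python
def bootstrap_negatives(nonface_images, current_classifier, extract_patches, max_keep=10000):
    """Collect hard negatives: patches from non-face images that the
    current (partial) cascade still classifies as faces."""
    hard_negatives = []
    for img in nonface_images:
        for patch in extract_patches(img):
            if current_classifier(patch) == 1:      # false positive -> keep for next stage
                hard_negatives.append(patch)
                if len(hard_negatives) >= max_keep:
                    return hard_negatives
    return hard_negatives
```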
Face detection: final features
[figures: top ten features; global semantic; global not semantic; local]
Results
• Test time: 0.4 sec per 320x240 image
[figure: detection results compared with Schneiderman (2003)]
Comments
• Training time?
• Which improves performance:
  • Generating optimal features?
  • KL feature selection?
  • Optimizing the alpha coefficients?