Classification.NET: Efficient and Accurate Classification in C# • Jamie Shotton, Toshiba Corporate Research & Development Center, Japan • http://jamie.shotton.org/work/code/Classification.NET.zip
Introduction • This tutorial gives • a brief introduction to classification theory • ideas for the practical design and implementation of classifiers • examples of classifiers in computer vision • Main technical reference • [Bishop, 2006] • Programming references • [Murach, 2005] • [Liberty, 2005]
Structure • Introduction to classification • Library Design • Implementing classifiers • Real vision research
Classification • Infer a discrete class label y from a set of measurements x • Mapping f : x ↦ y • from a data point x in a D-dimensional feature space • to a class label y • This tutorial considers • a D-dimensional feature space • binary labels y ∈ {−1, +1} • Supervised learning • a labeled training set {(x_i, y_i)} • N training examples • Example: vending machine • measurements x: material, diameter, weight • class label y: coin value • [Bishop, 2006]
Probabilistic Interpretation • Discriminative models • model the conditional probability p(y | x) directly • Generative models are an alternative • model p(x | y) and p(y), then use Bayes’ theorem to infer the conditional probability p(y | x) ∝ p(x | y) p(y) • Decision theory is used to choose a single y • e.g. maximum a-posteriori (MAP): choose the y with the largest p(y | x) • [Bishop, 2006]
Example Classifiers • Nearest neighbour • [Shakhnarovich, 2006] • Linear discriminants • decision stumps • Fisher’s linear discriminant • perceptrons • Decision trees • [Bishop, 2006]
Example Classifiers • Boosting • http://www.boosting.org/ • Support Vector Machines (SVMs) • a ‘kernel’ method • http://www.kernel-machines.org/ • And many more! Demo Time • [Bishop, 2006]
Structure • Introduction to classification • Library Design • Implementing classifiers • Real vision research
Classification.NET • Framework for classification in C# • general purpose • extensible • A few example classifiers • Download library, demo, and slides from • http://jamie.shotton.org/work/code/Classification.NET.zip • many thanks to Matthew Johnson for the demo
Why C#? • Modern, object-oriented language • combines best of C++ and Java • pointers, interpreted or compiled, garbage collected • .NET libraries • rapid development and re-use of code • Freely available IDEs • http://msdn.microsoft.com/vstudio/express/visualcsharp/ • http://www.mono-project.com/ [Scientific C#]
Representing Data • double: accurate; float: fast, memory efficient • generics <T>: flexible, no performance hit • double[] or float[] (i.e. T[]): fast, easy, but inflexible • IDataPoint<T>: flexible, very low performance hit
Representing Data • class labels as int: fast, easy • extensible to multi-class
Representing Data Sets • So just use T[,] or T[][] arrays? • Not flexible • e.g. on-the-fly computation • [Figure: data matrix — each row vector is an example i, each column a dimension d]
Representing Data Sets • Custom class DataSet<T> • no interface changes needed for: • on-the-fly computation • sparse arrays • sparse data points

void Increment(DataSet<double> dataSet)
{
    for (int i = 0; i < dataSet.Count; i++)
        for (int d = 0; d < dataSet.Dimensionality; d++)
            dataSet.Data[i][d]++;
}
Representing Data – Summary • data point: IDataPoint<T> • class label: int • data set: DataSet<T> • labeled data set: LabeledDataSet<T>
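To make the later code concrete, here is a rough sketch of how these types could fit together. It is illustrative only, not the library’s actual definitions: only the members used elsewhere in these slides (the indexer, Count, Dimensionality, Data, Labels) are grounded in the tutorial; everything else is an assumption.

public interface IDataPoint<T>
{
    T this[int d] { get; set; }     // value in feature dimension d
    int Dimensionality { get; }     // number of dimensions D (assumed member)
}

public class DataSet<T>
{
    public int Count { get; }             // number of examples N
    public int Dimensionality { get; }    // number of dimensions D
    public IDataPoint<T>[] Data { get; }  // data points, indexable as Data[i][d]
}

public class LabeledDataSet<T> : DataSet<T>
{
    public int[] Labels { get; }   // class label y_i (+1 or -1) for each example
}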
Classifier<T> – Classifier Base Class

public abstract class Classifier<T>
{
    // Train the classifier
    public abstract void Train(LabeledTrainingSet<T> trainingSet);

    // Return the classification for the data point
    public abstract int Classify(IDataPoint<T> dataPoint);
    …
}
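A hedged usage sketch of this contract; the loading helpers are hypothetical placeholders, and DecisionStump is the concrete classifier implemented later in the tutorial.

// Train any concrete Classifier<double>, then classify a new point
LabeledTrainingSet<double> trainingSet = LoadTrainingSet();  // hypothetical loading helper
IDataPoint<double> testPoint = LoadTestPoint();              // hypothetical test example

Classifier<double> classifier = new DecisionStump();
classifier.Train(trainingSet);
int label = classifier.Classify(testPoint);                  // returns +1 or -1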
Structure • Introduction to classification • Library Design • Implementing classifiers • Real vision research
Nearest-Neighbour Classification (NN) • Find the nearest point in the training set • under a distance metric (e.g. Euclidean) • Classify the point with the label of that nearest neighbour • [Shakhnarovich, 2006]
Nearest-Neighbour Classification (NN) • [Figure: the NN ‘decision boundary’ between the two classes in the (x1, x2) feature space] • [Shakhnarovich, 2006]
Nearest-Neighbour Classification (NN) • [Figure: ‘Voronoi’ diagram of the training points in the (x1, x2) feature space] • [Shakhnarovich, 2006]
Implementing NN Classification • So let’s implement NN in Classification.NET • Naïve implementation • very memory hungry • training is instantaneous • testing is very slow
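A minimal sketch of this naïve implementation. It assumes LabeledTrainingSet<double> exposes Count, Dimensionality, Data[i][d] and Labels[i] as used elsewhere in these slides; the class name NearestNeighbour and those members are assumptions where the tutorial’s own code does not show them.

public class NearestNeighbour : Classifier<double>
{
    private LabeledTrainingSet<double> _trainingSet;

    // Training just stores the data set - instantaneous, but memory hungry
    public override void Train(LabeledTrainingSet<double> trainingSet)
    {
        _trainingSet = trainingSet;
    }

    // Classification scans every training example - very slow
    public override int Classify(IDataPoint<double> dataPoint)
    {
        double bestDistance = double.MaxValue;
        int bestLabel = 0;
        for (int i = 0; i < _trainingSet.Count; i++)
        {
            double distance = 0.0;
            for (int d = 0; d < _trainingSet.Dimensionality; d++)
            {
                double diff = dataPoint[d] - _trainingSet.Data[i][d];
                distance += diff * diff;   // squared Euclidean distance
            }
            if (distance < bestDistance)
            {
                bestDistance = distance;
                bestLabel = _trainingSet.Labels[i];
            }
        }
        return bestLabel;   // label of the nearest training point
    }
}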
Improvements to NN Classification • Distance computation ‘trick’ • Distances.Euclidean(IDataPoint<double> a, IDataPoint<double> b, double minDistance) • exact • kd-trees • [Beis, 1997] • class NearestNeighbourFast { … } • exact or approximate • Parameter-sensitive hashing • [Shakhnarovich, 2006] • approximate
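The minDistance parameter in the signature above suggests an early-exit ‘trick’: abandon a distance computation as soon as the partial sum already exceeds the best distance found so far. A sketch under that assumption; a Dimensionality member on IDataPoint<double> is also assumed.

public static class Distances
{
    // Returns the Euclidean distance between a and b, or double.MaxValue as soon as
    // the partial sum of squares exceeds minDistance (the best distance found so far)
    public static double Euclidean(IDataPoint<double> a, IDataPoint<double> b, double minDistance)
    {
        double minSquared = minDistance * minDistance;
        double sum = 0.0;
        for (int d = 0; d < a.Dimensionality; d++)   // Dimensionality on IDataPoint is assumed
        {
            double diff = a[d] - b[d];
            sum += diff * diff;
            if (sum > minSquared)
                return double.MaxValue;   // cannot be the new nearest neighbour - stop early
        }
        return System.Math.Sqrt(sum);
    }
}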
Decision Stumps (DS) • Divide space into two halves • division is axis-aligned • classify each half differently • Examples • 2D • 3D
Decision Stumps (DS) • Classifier compares • the value x_d in dimension d with • a threshold µ • Returns +1 or -1 based on the sign s • [Figure: axis-aligned thresholds µ shown on the x1, x2 and x3 axes]
Training Decision Stumps (DS) • [Figure: two candidate stumps on the same data — a threshold on the x1-value or a threshold on the x2-value]
Training DS • But not always this easy! • data is not usually linearly separable • D dimensions to choose from • Search for the best decision stump H over • dimensions d • thresholds µ • signs s • to minimise the training set error ε = (1/N) Σ_i [H(x_i) ≠ y_i]
Training DS Efficiently • Project onto each dimension successively • [Figure: training points projected onto the x1 axis and onto the x2 axis]
Which Thresholds To Try? • Fixed discrete set • perhaps wasteful • does not adapt to the data • Adaptive discrete set • calculate mid-points between pairs of projected points • Efficient calculation of the training set error ε • algorithm on the next slides
Efficient computation of error ε • Recall the stump: for sign s = +1, classify +1 if x_d > µ and -1 otherwise • Consider a decision stump with sign s = +1 • Trivially, the training set error ε(µ) is the count of misclassified examples, and it changes by ±1 each time µ moves past a training point
Efficient computation of error ε • Linear search over µ with incremental update • [Figure (animation): as µ sweeps past each training point the error count is updated by ±1, e.g. 4, 5, 6, 5, 4, 3, 2, 3 for one sign and the complementary counts 5, 4, 3, 4, 5, 6, 7, 6 for the other]
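A sketch of this sweep for one dimension d and sign s = +1: sort the examples by their value in dimension d, start with µ below all the points, and update the error by ±1 as µ passes each point; the candidate thresholds are the mid-points described above. Member names follow the earlier slides; repeating this for every dimension and both signs gives the best stump.

// Best (threshold, error) for dimension d with sign +1:
// classify +1 when x_d > threshold, -1 otherwise (illustrative sketch; ties ignored)
void BestThresholdForDimension(LabeledTrainingSet<double> set, int d,
                               out double bestThreshold, out int bestError)
{
    int n = set.Count;
    var values = new double[n];
    var labels = new int[n];
    for (int i = 0; i < n; i++)
    {
        values[i] = set.Data[i][d];
        labels[i] = set.Labels[i];
    }
    System.Array.Sort(values, labels);   // sort the labels along with their values

    // Threshold below all points: everything is classified +1,
    // so the initial error is the number of -1 examples
    int error = 0;
    for (int i = 0; i < n; i++)
        if (labels[i] == -1) error++;

    bestError = error;
    bestThreshold = values[0] - 1.0;

    // Sweep the threshold past each point in turn, updating the error by +/-1
    for (int i = 0; i < n; i++)
    {
        // point i is now classified -1 instead of +1
        error += (labels[i] == +1) ? +1 : -1;

        double threshold = (i + 1 < n) ? 0.5 * (values[i] + values[i + 1])  // mid-point
                                       : values[i] + 1.0;
        if (error < bestError)
        {
            bestError = error;
            bestThreshold = threshold;
        }
    }
}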
DS Implementation • Demo Time

public class DecisionStump : Classifier<double>
{
    private int _d;            // The data dimension
    private double _threshold; // The threshold
    private int _sign;         // The sign (+1 or -1)

    // Train the classifier
    public override void Train(LabeledTrainingSet<double> trainingSet) { … }

    // Return the classification for the data point
    public override int Classify(IDataPoint<double> dataPoint)
    {
        return dataPoint[_d] > _threshold ? _sign : -_sign;
    }
    …
DS Summary • Complexity • reasonable training time • very low memory • instantaneous classification time • Classification accuracy • individually, not very powerful • but in combination, much more powerful…
Boosting • Many variants, e.g. • AdaBoost [Freund, 1999] • LogitBoost & GentleBoost [Friedman, 1998] • Cascade [Viola, 2001] • super-fast • JointBoost [Torralba, 2007] • multi-class with shared features • Core ideas • combine many simple classifiers • weight the training data points
Core Idea 1 – Classifier • Combine many simple classifiers (‘weak’ or ‘base’ learners) • compute the classification score h_t(x) of each weak learner • multiply by a learned confidence value α_t • sum over T rounds • compare the sum to zero • H(x) = sign( Σ_t α_t h_t(x) ) gives the discrete classification value, +1 or -1
Core Idea 2 – Training • Weight the training data points • a normalised distribution w_t over the examples • emphasise poorly classified examples • Learning is greedy iteration • At round (iteration) t • choose the optimal weak learner h_t under the distribution w_t • calculate the confidence α_t to reflect its accuracy • update the weights w_{t+1}
Weak Learners • Can use almost any type of classifier • must adapt to the weights distribution w • must give some classification advantage (better than chance) • Simple change allows DS to learn with weights: minimise the weighted error ε = Σ_i w_i [h(x_i) ≠ y_i] instead of counting mistakes (sketched below)
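A small self-contained sketch of that weighted error, the quantity boosting asks each weak learner to minimise; member names follow the earlier slides and the function itself is illustrative, not the library’s API.

// Weighted training-set error of a fixed stump (dimension d, threshold, sign)
// under a weights distribution w: each mistake costs w[i] rather than 1
static double WeightedError(LabeledTrainingSet<double> set, double[] weights,
                            int d, double threshold, int sign)
{
    double error = 0.0;
    for (int i = 0; i < set.Count; i++)
    {
        int h = set.Data[i][d] > threshold ? sign : -sign;
        if (h != set.Labels[i])
            error += weights[i];
    }
    return error;
}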
AdaBoost Learning Algorithm • Initialise weights w_1(i) = 1/N • For t = 1 … T • train weak learner h_t using distribution w_t • compute its weighted training set error ε_t = Σ_i w_t(i) [h_t(x_i) ≠ y_i] • calculate confidence α_t = ½ ln( (1 − ε_t) / ε_t ) • update weights w_{t+1}(i) ∝ w_t(i) exp( −α_t y_i h_t(x_i) ), normalised to sum to 1 • [Freund, 1999]
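A sketch of the corresponding Train loop, matching the Classify code on the AdaBoost implementation slide. The _rounds field, the new() construction of the weak learner, the weighted Train overload on IWeightedLearner, and Data[i] being an IDataPoint<double> are all assumptions, not the library’s actual members.

// Illustrative AdaBoost.Train (assumes 0 < epsilon_t < 1)
public override void Train(LabeledTrainingSet<double> trainingSet)
{
    int n = trainingSet.Count;
    double[] w = new double[n];
    for (int i = 0; i < n; i++)
        w[i] = 1.0 / n;                        // initialise weights uniformly

    for (int t = 0; t < _rounds; t++)          // _rounds = T, assumed set in the constructor
    {
        var h = new WeakLearner();             // assumes a new() constraint on WeakLearner
        h.Train(trainingSet, w);               // assumed weighted Train on IWeightedLearner

        // Weighted training-set error epsilon_t
        double epsilon = 0.0;
        for (int i = 0; i < n; i++)
            if (h.Classify(trainingSet.Data[i]) != trainingSet.Labels[i])
                epsilon += w[i];

        // Confidence alpha_t = 0.5 * ln((1 - epsilon) / epsilon)
        double alpha = 0.5 * System.Math.Log((1.0 - epsilon) / epsilon);

        // Re-weight: emphasise misclassified examples, then normalise
        double sum = 0.0;
        for (int i = 0; i < n; i++)
        {
            w[i] *= System.Math.Exp(-alpha * trainingSet.Labels[i] * h.Classify(trainingSet.Data[i]));
            sum += w[i];
        }
        for (int i = 0; i < n; i++)
            w[i] /= sum;

        _h.Add(h);
        _alpha.Add(alpha);
    }
}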
AdaBoost with DS Example • [Figure (animation): decision boundary after 1, 2, 3, 4, 5 and 50 rounds of boosting decision stumps]
AdaBoost Implementation • Demo Time

public class AdaBoost<WeakLearner> : Classifier<double>
    where WeakLearner : Classifier<double>, IWeightedLearner
{
    private List<WeakLearner> _h = new List<WeakLearner>();  // The learned weak learners
    private List<double> _alpha = new List<double>();        // The learned alpha values

    // Return the classification for the data point
    public override int Classify(IDataPoint<double> dataPoint)
    {
        double classification = 0.0;

        // Call the weak learner Classify() method and combine results
        for (int t = 0; t < _h.Count; t++)
            classification += _alpha[t] * _h[t].Classify(dataPoint);

        // Return the thresholded classification
        return classification > 0.0 ? +1 : -1;
    }
    …
AdaBoost Summary • Complexity • complexity of the weak learners × T • Weak Learners • stumps, trees, even AdaBoost classifiers • e.g. AdaBoost<AdaBoost<DecisionStump>> (usage sketch below) • Classification accuracy • very flexible decision boundary • good generalization
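A hedged usage sketch; the constructor argument (number of rounds) is an assumption, not necessarily the library’s actual signature.

// Boost decision stumps for 50 rounds (constructor signature assumed)
var boost = new AdaBoost<DecisionStump>(50);
boost.Train(trainingSet);
int y = boost.Classify(testPoint);

// Weak learners can themselves be boosted classifiers, e.g.
// var nested = new AdaBoost<AdaBoost<DecisionStump>>(50);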
Support Vector Machines (SVMs) • Maximize the margin • good generalization • Kernels allow complex decision boundaries • linear, Gaussian, etc. • Classification.NET • class SVM • wrapper for the [SVM.NET] library • [Figure: two linear separators — one with a smaller margin, one with a larger margin] • Demo Time • [Bishop, 2006], [Burges, 1998]
Structure • Introduction to classification • Library Design • Implementing classifiers • Real vision research
Contour Fragments for Object Detection • We can recognise objects based on fragments of contour: • Can a computer? • [Shotton, 2007a]
Contour Fragments for Object Detection • Demo Time • Clustering learns fragments • Labeled training data • object bounding boxes • Boosted classifier learns • is the object centre here? • [Figure: feature matrix — rows are examples i (sparse image locations), columns are dimensions d (contour fragments such as head, ears, torso, belly, hind legs, rear)] • [Shotton, 2007a]
TextonBoost • Goal: semantically segment an image using • texture (via ‘textons’) • layout • context • [Figure: example segmentation with class labels building, bicycle, road]