Learning with AdaBoost Fall 2007
Outline • Introduction and background of Boosting and AdaBoost • AdaBoost algorithm example • AdaBoost algorithm in the current project • Experiment results • Discussion and conclusion
Boosting Algorithm • Definition of Boosting [1]: Boosting refers to a general method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb. • Boosting procedure [2] • Given a set of labeled training examples (x1, y1), …, (xN, yN), where yi is the label associated with instance xi • On each round t = 1, …, T: • The booster devises a distribution (importance weighting) Dt over the example set • The booster requests a weak hypothesis (rule of thumb) ht with low error with respect to Dt • After T rounds, the booster combines the weak hypotheses into a single prediction rule.
Boosting Algorithm (cont'd) • The intuitive idea: alter the distribution over the domain so as to increase the probability of the "harder" parts of the space, forcing the weak learner to generate new hypotheses that make fewer mistakes on those parts. • Disadvantages • Requires prior knowledge of the accuracies of the weak hypotheses • The performance bound depends only on the accuracy of the least accurate weak hypothesis
Background of AdaBoost [2]
AdaBoost Algorithm [2]
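The AdaBoost pseudocode figure from [2] does not carry over into this text version. As a stand-in, here is a minimal sketch of one common binary formulation (labels in {-1, +1}); weak_learn is a hypothetical handle for the WeakLearn routine referred to on the following slides, not code from the project.

% Minimal sketch of one common binary AdaBoost formulation (labels y in {-1,+1}).
% "weak_learn" is a hypothetical handle that trains a weak classifier on the
% weighted data and returns it as a handle h, so that h(X) predicts labels.
function [h, alpha] = adaboost_train(X, y, T, weak_learn)
N = size(X, 1);
D = ones(N, 1) / N;                         % start from a uniform distribution
h = cell(T, 1);
alpha = zeros(T, 1);
for t = 1:T
    h{t}  = weak_learn(X, y, D);            % weak hypothesis for round t
    pred  = h{t}(X);                        % its predictions in {-1,+1}
    eps_t = sum(D .* (pred ~= y));          % weighted training error
    alpha(t) = 0.5 * log((1 - eps_t) / eps_t);
    D = D .* exp(-alpha(t) * (y .* pred));  % shrink weights of correctly
    D = D / sum(D);                         % classified examples, grow the rest
end
end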
Advantages of AdaBoost • AdaBoost adapts to the errors of the weak hypotheses returned by WeakLearn. • Unlike the conventional boosting algorithm, the prior error need not be known ahead of time. • The update rule reduces the probability assigned to those examples on which the hypothesis makes good predictions and increases the probability of the examples on which the prediction is poor.
The error bound [3] • Suppose the weak learning algorithm WeakLearn, when called by AdaBoost, generates hypotheses with errors ε1, …, εT. Then the error of the final hypothesis output by AdaBoost is bounded above by 2^T · ∏_{t=1..T} √(εt(1 − εt)). Note that the errors generated by WeakLearn need not be uniform, and the final error depends on the errors of all of the weak hypotheses. Recall that the errors of the previous boosting algorithms depend only on the maximal error of the weakest hypothesis and ignore the advantage that can be gained from hypotheses whose errors are smaller.
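To make the bound concrete, a small numerical illustration with made-up round errors (these numbers are not from the slides):

% Hypothetical check of the bound: ten weak hypotheses, each with weighted error 0.2.
eps = 0.2 * ones(1, 10);
bound = 2^numel(eps) * prod(sqrt(eps .* (1 - eps)));
% Each round multiplies the bound by 2*sqrt(0.2*0.8) = 0.8, so bound = 0.8^10,
% roughly 0.107: well below the 0.2 error of any single weak hypothesis.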
Outline • Introduction and background of Boosting and AdaBoost • AdaBoost algorithm example • AdaBoost algorithm in the current project • Experiment results • Discussion and conclusion
A toy example [2] Training set: 10 points (represented by plus or minus). Original status: equal weights for all training samples.
A toy example (cont'd) Round 1: three "plus" points are not correctly classified; they are given higher weights.
A toy example (cont'd) Round 2: three "minus" points are not correctly classified; they are given higher weights.
A toy example (cont'd) Round 3: one "minus" and two "plus" points are not correctly classified; they are given higher weights.
A toy example (cont'd) Final classifier: the three "weak" classifiers are combined into a single final strong classifier.
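The combined classifier is the sign of the alpha-weighted vote of the weak hypotheses, H(x) = sign(Σt αt ht(x)). A sketch of this step, pairing with the adaboost_train sketch above (hypothetical helper names, not project code):

% Strong classifier: alpha-weighted vote over the T weak hypotheses.
function pred = adaboost_predict(X, h, alpha)
score = zeros(size(X, 1), 1);
for t = 1:numel(h)
    score = score + alpha(t) * h{t}(X);     % accumulate the weighted vote
end
pred = sign(score);                         % final labels in {-1,+1}
end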
Outline • Introduction and background of Boosting and AdaBoost • AdaBoost algorithm example • AdaBoost algorithm in the current project • Experiment results • Discussion and conclusion
Look at AdaBoost [3] again
AdaBoost (cont'd): Multi-class Extensions • The discussion so far is restricted to binary classification problems. In general the label set Y can contain any number of labels, which gives a multi-class problem. • The multi-class case (AdaBoost.M1) requires each weak hypothesis to have accuracy greater than ½ (error below ½). This condition is stronger in the multi-class setting than in the binary case, where it only means doing slightly better than random guessing.
AdaBoost.M1
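The AdaBoost.M1 pseudocode figure is likewise not reproduced here. As an illustration of its decision rule only (hypothetical names; weak hypotheses returning labels 1..K, and βt = εt/(1 − εt) from training):

% AdaBoost.M1 final hypothesis: each weak hypothesis votes for the label it
% predicts, and the vote is weighted by log(1/beta_t).
function label = m1_predict(x, h, beta, K)
votes = zeros(K, 1);
for t = 1:numel(h)
    y_t = h{t}(x);                          % label predicted by the round-t hypothesis
    votes(y_t) = votes(y_t) + log(1 / beta(t));
end
[~, label] = max(votes);                    % class with the largest weighted vote
end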
Error Upper Bound of AdaBoost.M1 [3] • As in the binary classification case, the error of the final hypothesis is bounded: provided each εt < ½, it is at most 2^T · ∏_{t=1..T} √(εt(1 − εt)).
How does AdaBoost.M1 work? [4]
AdaBoost in our project
AdaBoost in our project • 1) The initialization gives the target class the same total weight as all other samples: bird[1,…,10] = ½ × 1/10; otherstaff[1,…,690] = ½ × 1/690. • 2) A history record is preserved to strengthen the updating of the weights. • 3) The unified model obtained from CPM alignment is used for the training process.
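A minimal sketch of the initialization in 1), assuming the 10 target ("bird") samples come first in the weight vector; the variable names are illustrative, not the project's:

% Half of the total weight is spread over the 10 bird samples, half over the 690 others.
n_bird  = 10;
n_other = 690;
D0 = [repmat(0.5 / n_bird,  n_bird,  1);    % each bird sample starts at 1/20
      repmat(0.5 / n_other, n_other, 1)];   % each other sample starts at 1/1380
% sum(D0) is 1, as required for a distribution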
AdaBoost in our project • 2) The history record: weight histograms with and without the history record (figures not reproduced).
AdaBoost in our project 3) The unified model obtained from CPM alignment is used for the training process; this has reduced the overfitting problem. 3.1) The overfitting problem. 3.2) The CPM model.
AdaBoost in our project 3.1) The overfitting problem. Why does the trained AdaBoost not work for birds 11–20? I compared: I) the rank of the alpha value of each of the 60 classifiers; II) how each classifier actually detected birds in the training process; III) how each classifier actually detected birds in the test process. The covariance was also computed for comparison:
cov(c(:,1),c(:,2)) = [305.0000 6.4746; 6.4746 305.0000]
cov(c(:,1),c(:,3)) = [305.0000 92.8644; 92.8644 305.0000]
cov(c(:,2),c(:,3)) = [305.0000 -46.1186; -46.1186 305.0000]
Overfitted! The training data are different from the test data, which is very common.
AdaBoost in our project Training result (covariance: 6.4746)
AdaBoost in our project Comparison: training & test results (covariance: 92.8644)
AdaBoost in our project 3.2) CPM: the continuous profile model, proposed by Jennifer Listgarten. It is very useful for data alignment.
AdaBoost in our project • The alignment results from the CPM model:
AdaBoost in our project • The unified model from CPM alignment: without resampling vs. after upsampling and downsampling
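The corresponding figures are not reproduced. Purely as an illustration of what building a unified model by up/down-sampling could look like (a hypothetical sketch; the project's actual code, which uses the CPM alignment, may differ):

% Resample every aligned trace to a common length L, then average into one template.
function template = unified_model(traces, L)    % traces: cell array of row vectors
M = numel(traces);
resampled = zeros(M, L);
for m = 1:M
    x = traces{m};
    resampled(m, :) = interp1(linspace(0, 1, numel(x)), x, linspace(0, 1, L));
end
template = mean(resampled, 1);                  % one unified model of length L
end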
AdaBoost in our project • The influence of CPM on the history record
Outline • Introduction and background of Boosting and AdaBoost • AdaBoost algorithm example • AdaBoost algorithm in the current project • Experiment results • Discussion and conclusion
Browse all birds
Curvature Descriptor
Distance Descriptor
AdaBoost without CPM
AdaBoost without CPM (cont'd)
Good parts selected (AdaBoost without CPM, cont'd)
AdaBoost without CPM (cont'd) • The alpha values • Other statistics: zero rate 0.5333; covariance 0.0074; median 0.0874
AdaBoost with CPM
AdaBoost with CPM (cont'd)
AdaBoost with CPM (cont'd)
Good parts selected (AdaBoost with CPM, cont'd)
AdaBoost with CPM (cont'd) • The alpha values • Other statistics: zero rate 0.6167; covariance 0.9488; median 1.6468
Outline • Introduction and background of Boosting and AdaBoost • AdaBoost algorithm example • AdaBoost algorithm in the current project • Experiment results • Discussion and conclusion
Conclusion and discussion 1) AdaBoost works with the CPM unified model; this model smooths the training data set and decreases the influence of overfitting. 2) The influence of the history record is very interesting: it suppresses noise and strengthens the boosting direction of WeakLearn. 3) The step length of the KNN selected by AdaBoost is not discussed here; it is also useful for suppressing noise.
Conclusion and discussion (cont'd) 4) AdaBoost does not depend on the training order: the obtained alpha values have a very similar distribution across all the classifiers. Two examples follow. Example 1: four different training orders (6 birds each) gave the following alpha values:
1) Alpha_All1 = 0.4480 0.1387 0.2074 0.5949 0.5868 0.3947 0.3874 0.5634 0.6694 0.7447
2) Alpha_All2 = 0.3998 0.0635 0.2479 0.6873 0.5868 0.2998 0.4320 0.5581 0.6946 0.7652
3) Alpha_All3 = 0.4191 0.1301 0.2513 0.5988 0.5868 0.2920 0.4286 0.5503 0.6968 0.7134
4) Alpha_All4 = 0.4506 0.0618 0.2750 0.5777 0.5701 0.3289 0.5948 0.5857 0.7016 0.6212
Conclusion and discussion (cont'd)
Conclusion and discussion (cont'd) Example 2: 60 parts from the Curvature Descriptor and 60 from the Distance Descriptor. 1) They are first trained independently; 2) then they are combined and trained together. The results are as follows: