2D1431 Machine Learning




Presentation Transcript


  1. 2D1431 Machine Learning: Boosting

  2. Classification Problem • Assume a set S of N instances x_i ∈ X, each belonging to one of M classes {c_1,…,c_M}. • The training set consists of pairs (x_i, c_i). • A classifier C assigns a classification C(x) ∈ {c_1,…,c_M} to an instance x. • The classifier learned in trial t is denoted C_t, while C* is the composite boosted classifier.
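
A minimal Python sketch of this setup (the feature values and class names below are invented purely for illustration):

```python
# A training set of N instances x_i, each labeled with one of M classes.
X = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.0, 2.7]]   # instances x_i
c = ["c1", "c1", "c2", "c2"]                            # classes c_i
training_set = list(zip(X, c))                          # pairs (x_i, c_i)

# A classifier C assigns a class C(x) to an instance x, e.g. a trivial one:
def C(x):
    return "c1" if x[0] < 5.5 else "c2"
```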

  3. Bagging & Boosting • Bagging and boosting aggregate multiple hypotheses generated by the same learning algorithm invoked over different distributions of the training data [Breiman 1996, Freund & Schapire 1996]. • Bagging and boosting produce a classifier with a smaller error on the training data because they combine multiple hypotheses which individually have a larger error.

  4. Boosting • Boosting maintains a weight w_i for each instance <x_i, c_i> in the training set. • The higher the weight w_i, the more the instance x_i influences the next hypothesis learned. • At each trial, the weights are adjusted to reflect the performance of the previously learned hypothesis: the weights of correctly classified instances are decreased and the weights of incorrectly classified instances are increased.

  5. Boosting • Construct a hypothesis C_t from the current distribution of instances described by the weights w^t. • Adjust the weights according to the classification error e_t of classifier C_t. • The strength a_t of a hypothesis depends on its training error e_t: a_t = ½ ln((1 − e_t)/e_t) [Diagram: set of weighted instances x_i with weights w_i^t → learn hypothesis → hypothesis C_t with strength a_t → adjust weights]
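
As a sketch, the strength formula in Python (the function name is illustrative):

```python
import math

def strength(e_t):
    """a_t = 1/2 * ln((1 - e_t) / e_t); assumes 0 < e_t < 1."""
    return 0.5 * math.log((1.0 - e_t) / e_t)

print(strength(0.1))   # accurate hypothesis -> large positive strength
print(strength(0.5))   # chance-level hypothesis -> strength 0.0
print(strength(0.9))   # worse than chance -> negative strength
```

Note that a_t is zero at chance level and grows without bound as e_t approaches zero, which is why a weak learner only needs to beat chance to contribute a useful vote.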

  6. Boosting • The final hypothesis C_BO aggregates the individual hypotheses C_t by weighted voting: c_BO(x) = argmax_{c_j ∈ C} Σ_{t=1..T} a_t δ(c_j, C_t(x)) • Each hypothesis's vote is a function of its accuracy. • Let w_i^t denote the weight of an instance x_i at trial t; for every x_i, w_i^1 = 1/N. The weight w_i^t reflects the importance (e.g. the probability of occurrence) of the instance x_i in the sample set S_t. • At each trial t = 1,…,T a hypothesis C_t is constructed from the given instances under the distribution w^t. This requires that the learning algorithm can deal with fractional (weighted) examples.
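
The weighted vote can be sketched in Python as follows (names are illustrative; each hypothesis is assumed to be a callable that returns a class):

```python
def boosted_classify(x, hypotheses, strengths, classes):
    """c_BO(x) = argmax over c_j of sum_t a_t * delta(c_j, C_t(x))."""
    votes = {c_j: 0.0 for c_j in classes}
    for C_t, a_t in zip(hypotheses, strengths):
        votes[C_t(x)] += a_t          # delta picks out the predicted class
    return max(votes, key=votes.get)  # argmax over the classes
```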

  7. Boosting • The error of the hypothesis C_t is measured with respect to the weights: e_t = Σ_{i: C_t(x_i) ≠ c_i} w_i^t / Σ_i w_i^t and a_t = ½ ln((1 − e_t)/e_t) • Update the weights w_i^t of correctly and incorrectly classified instances by w_i^{t+1} = w_i^t e^{−a_t} if C_t(x_i) = c_i, and w_i^{t+1} = w_i^t e^{a_t} if C_t(x_i) ≠ c_i. • Afterwards normalize the w_i^{t+1} such that they form a proper distribution: Σ_i w_i^{t+1} = 1.
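
A sketch of the error computation and one reweighting step with NumPy (array and function names are assumptions, not from the slides):

```python
import numpy as np

def weighted_error(w, pred, labels):
    """e_t: total weight of misclassified instances, normalized."""
    return w[pred != labels].sum() / w.sum()

def reweight(w, pred, labels, a_t):
    """Decrease weights of correctly classified instances, increase
    the rest, then renormalize so that sum_i w_i^{t+1} = 1."""
    w = np.where(pred == labels, w * np.exp(-a_t), w * np.exp(a_t))
    return w / w.sum()
```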

  8. Boosting [Diagram: the training set S is reweighted at each trial (S,w^1 → S,w^2 → … → S,w^T); training on each weighted set yields the hypotheses C_1, C_2,…, C_T with classifications c_1(x), c_2(x),…, c_T(x)] • The classification c_BO(x) of the boosted hypothesis is obtained by summing the votes of the hypotheses C_1, C_2,…, C_T, where the vote of each hypothesis C_t is weighted by its strength a_t: c_BO(x) = argmax_{c_j ∈ C} Σ_{t=1..T} a_t δ(c_j, C_t(x))

  9. Boosting Given {<x_1,c_1>,…,<x_m,c_m>}, initialize w_i^1 = 1/m. For t = 1,…,T: • train the weak learner using distribution w^t • get a weak hypothesis C_t : X → C with error e_t = Σ_{i: C_t(x_i) ≠ c_i} w_i^t / Σ_i w_i^t • choose a_t = ½ ln((1 − e_t)/e_t) • update: w_i^{t+1} = w_i^t e^{−a_t} if C_t(x_i) = c_i, w_i^{t+1} = w_i^t e^{a_t} if C_t(x_i) ≠ c_i Output the final hypothesis: c_BO(x) = argmax_{c_j ∈ C} Σ_{t=1..T} a_t δ(c_j, C_t(x))
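
Putting the steps together, a compact sketch of this loop in Python. The slides do not fix a weak learner, so scikit-learn decision stumps are assumed here; their sample_weight argument lets the learner handle the fractional (weighted) examples mentioned above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, T=10):
    m = len(y)
    w = np.full(m, 1.0 / m)                  # w_i^1 = 1/m
    hypotheses, strengths = [], []
    for t in range(T):
        # Train the weak learner under the current distribution w^t.
        C_t = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = C_t.predict(X)
        e_t = w[pred != y].sum() / w.sum()   # weighted error
        if e_t == 0.0 or e_t >= 0.5:         # degenerate weak learner: stop
            break
        a_t = 0.5 * np.log((1.0 - e_t) / e_t)
        w = np.where(pred == y, w * np.exp(-a_t), w * np.exp(a_t))
        w /= w.sum()                          # keep w^t a distribution
        hypotheses.append(C_t)
        strengths.append(a_t)
    return hypotheses, strengths

def c_BO(X, hypotheses, strengths, classes):
    """Final hypothesis: strength-weighted vote over the C_t."""
    votes = np.zeros((len(X), len(classes)))
    for C_t, a_t in zip(hypotheses, strengths):
        pred = C_t.predict(X)
        for j, c_j in enumerate(classes):
            votes[:, j] += a_t * (pred == c_j)
    return np.asarray(classes)[votes.argmax(axis=1)]
```

The early stop when e_t ≥ 0.5 reflects the requirement that each weak hypothesis beat chance; beyond that point a_t would become non-positive and the vote would stop helping.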

  10. Bayes MAP Hypothesis • Bayes MAP hypothesis for two classes, x and o • red: incorrectly classified instances

  11. Boosted Bayes MAP Hypothesis • The boosted Bayes MAP hypothesis has a more complex decision surface than the individual hypotheses alone
