
Boosting




  1. Boosting. Shai Raffaeli. Seminar in Mathematical Biology. http://www1.cs.columbia.edu/~freund/

  2. Toy Example • Computer receives telephone call • Measures pitch of voice • Decides gender of caller [figure: male and female human voice pitch]

  3. Generative modeling [figure: probability vs. voice pitch; two class-conditional densities with parameters mean1/var1 and mean2/var2]

  4. Discriminative approach [figure: number of mistakes vs. voice pitch threshold]

  5. Ill-behaved data [figure: probability (mean1, mean2) and number of mistakes vs. voice pitch]

  6. Traditional Statistics vs. Machine Learning [diagram: Statistics maps data to an estimated world state and Decision Theory maps that state to actions; Machine Learning maps data directly to predictions]

  7. Boosting

  8. A weighted training set: feature vectors, binary labels {-1,+1}, positive weights

  9. A weighted training set (x1,y1,w1), (x2,y2,w2), …, (xn,yn,wn): instances x1,x2,…,xn, binary labels y1,y2,…,yn, and non-negative weights w1,…,wn that sum to 1. A weak learner maps a weighted training set to a weak rule h. The weak requirement: h must do slightly better than random guessing on the weighted set (weighted error at most 1/2 - g for some advantage g > 0).

  10. The boosting process: start from uniform weights (x1,y1,1/n), …, (xn,yn,1/n); at each round t, run the weak learner on the current weighted set (x1,y1,w1), …, (xn,yn,wn) to get ht (h1, h2, …, hT) and reweight the examples; after T rounds, output the final rule: Sign[a1 h1 + a2 h2 + … + aT hT]

  11. Adaboost • Binary labels y = -1,+1 • margin(x,y) = y [Σt at ht(x)] • P(x,y) = (1/Z) exp(-margin(x,y)) • Given ht, we choose at to minimize Σ(x,y) exp(-margin(x,y))

  12. Adaboost (Freund & Schapire, 1997)

  13. Main property of adaboost • If the advantages of the weak rules over random guessing are g1, g2, …, gT, then the in-sample error of the final rule is at most exp(-2 Σt gt²)
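The Adaboost loop on the preceding slides can be sketched as follows (a minimal illustration with decision stumps as the weak learner; the function names and the toy data are my own, not from the slides):

```python
import numpy as np

def adaboost(X, y, T=20):
    """Minimal AdaBoost sketch with decision-stump weak learners.

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    Returns a list of (alpha, feature, threshold, sign) weak rules.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # start from uniform weights
    rules = []
    for _ in range(T):
        best = None
        # exhaustive search for the stump with minimal weighted error
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for s in (+1, -1):
                    pred = s * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, s)
        err, j, thr, s = best
        err = max(err, 1e-12)            # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.where(X[:, j] > thr, 1, -1)
        # reweight: mistakes get heavier, correct examples lighter
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        rules.append((alpha, j, thr, s))
    return rules

def predict(rules, X):
    """Final rule: sign of the weighted vote of the weak rules."""
    score = np.zeros(X.shape[0])
    for alpha, j, thr, s in rules:
        score += alpha * s * np.where(X[:, j] > thr, 1, -1)
    return np.sign(score)
```

On a 1-D toy set that no single stump can fit, such as labels -1,-1,+1,-1,+1,+1 on points 1..6, a handful of boosting rounds drives the training error to zero, matching the exponential bound above.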

  14. Adaboost as gradient descent • Discriminator class: a linear discriminator in the space of “weak hypotheses” • Original goal: find the hyperplane with the smallest number of mistakes • Known to be an NP-hard problem (no algorithm that runs in time polynomial in d, where d is the dimension of the space) • Computational method: use exponential loss as a surrogate and perform gradient descent.

  15. Margins view [figure: prediction and margin; cumulative number of examples plotted against margin, with mistakes at negative margins and correct predictions at positive margins]

  16. Adaboost et al. [figure: loss vs. margin for the 0-1 loss and the Adaboost (exponential), Logitboost, and Brownboost losses; mistakes at negative margin, correct at positive]

  17. One coordinate at a time • Adaboost performs gradient descent on exponential loss • Adds one coordinate (“weak learner”) at each iteration. • Weak learning in binary classification = slightly better than random guessing. • Weak learning in regression – unclear. • Uses example-weights to communicate the gradient direction to the weak learner • Solves a computational problem

  18. What is a good weak learner? • The set of weak rules (features) should be flexible enough to be (weakly) correlated with most conceivable relations between feature vector and label. • Small enough to allow exhaustive search for the minimal weighted training error. • Small enough to avoid over-fitting. • Should be able to calculate predicted label very efficiently. • Rules can be “specialists” – predict only on a small subset of the input space and abstain from predicting on the rest (output 0).
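The “specialist” rules mentioned in the last bullet can be sketched like this (a minimal illustration; the rule, its condition, and its labels are hypothetical, not from the slides):

```python
import numpy as np

def specialist_rule(condition, label):
    """A "specialist" weak rule: predicts `label` on the part of the
    input space where `condition` holds and abstains (outputs 0)
    everywhere else."""
    def h(X):
        return np.where(condition(X), label, 0)
    return h

# Hypothetical rule: only fires when feature 0 exceeds 3, else abstains
h = specialist_rule(lambda X: X[:, 0] > 3, +1)
X = np.array([[1.0], [2.0], [4.0], [5.0]])
print(h(X))  # [0 0 1 1]
```

Because an abstaining rule outputs 0, it contributes nothing to the weighted vote outside its region, which is exactly what the slide means by "abstain from predicting on the rest".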

  19. Decision Trees [figure: tree testing X>3 and then Y>5, partitioning the (X,Y) plane into rectangular regions labeled -1 and +1]

  20. Decision tree as a sum [figure: the same tree rewritten so that each node contributes a score (values such as +0.2, -0.1, -0.3, +0.1, -0.2); the prediction is the sign of the sum of scores along the path]
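The tree-as-a-sum idea on this slide can be sketched as follows (the tests X>3 and Y>5 follow the figure, but the node scores are illustrative placeholders):

```python
def tree_as_sum(x, y):
    """Evaluate a small decision tree written as a sum of node scores;
    the prediction is the sign of the total. Tests (X>3, Y>5) follow
    the slide's figure; the scores are illustrative placeholders."""
    score = 0.0
    if x > 3:
        score += 0.2                      # score of the X>3 branch
        score += 0.1 if y > 5 else -0.3   # score of the Y>5 sub-split
    else:
        score += -0.1                     # score of the X<=3 leaf
    return 1 if score > 0 else -1

print(tree_as_sum(4, 6))  # 1
print(tree_as_sum(2, 0))  # -1
```

Writing the tree this way is what lets boosting treat each node as a separate additive weak rule, the step the alternating decision tree on the next slide builds on.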

  21. An alternating decision tree [figure: ADT with root score +0.7 and splitter nodes X>3, Y>5, Y<1, each child contributing a score; the prediction is the sign of the sum of scores over all active paths]

  22. Example: Medical Diagnostics • Cleve dataset from UC Irvine database • Heart disease diagnostics (+1 = healthy, -1 = sick) • 13 features from tests (real valued and discrete) • 303 instances

  23. ADTree for the Cleveland heart-disease diagnostics problem

  24. Cross-validated accuracy

  25. Boosting and over-fitting

  26. Curious phenomenon Boosting decision trees Using <10,000 training examples we fit >2,000,000 parameters

  27. Explanation using margins [figure: 0-1 loss vs. margin]

  28. Explanation using margins • No examples with small margins!! [figure: 0-1 loss and margin distribution]

  29. Experimental Evidence

  30. Theorem (Schapire, Freund, Bartlett & Lee, Annals of Statistics ’98): for any convex combination of weak rules and any threshold, the probability of a mistake is bounded by the fraction of training examples with small margin plus a term depending on the size of the training sample and the VC dimension of the weak rules. No dependence on the number of weak rules that are combined!!!
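The theorem on this slide is usually written as follows (a reconstruction of the standard Schapire-Freund-Bartlett-Lee margin bound; the slide itself carries only the labels of the terms):

```latex
\Pr_{(x,y)\sim D}\!\left[\, y f(x) \le 0 \,\right]
\;\le\;
\Pr_{(x,y)\sim S}\!\left[\, y f(x) \le \theta \,\right]
\;+\; \tilde{O}\!\left( \sqrt{\frac{d}{m\,\theta^{2}}} \right)
```

Here f is any convex combination of the weak rules, θ > 0 is the threshold, S is the training sample of size m, and d is the VC dimension of the weak rules; note that the right-hand side does not depend on the number of weak rules combined.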

  31. Suggested optimization problem [figure: margin]

  32. Idea of Proof

  33. Applications

  34. Applications of Boosting • Academic research • Applied research • Commercial deployment

  35. Academic research [table: % test error rates]

  36. Applied research (Schapire, Singer, Gorin ’98) • “AT&T, How may I help you?” • Classify voice requests • Voice -> text -> category • Fourteen categories: area code, AT&T service, billing credit, calling card, collect, competitor, dial assistance, directory, how to dial, person to person, rate, third party, time charge, time

  37. Examples • “Yes I’d like to place a collect call long distance please” → collect • “Operator I need to make a call but I need to bill it to my office” → third party • “Yes I’d like to place a call on my master card please” → calling card • “I just called a number in Sioux city and I musta rang the wrong number because I got the wrong party and I would like to have that taken off my bill” → billing credit

  38. Weak rules generated by “boostexter” [table: for each category (collect call, calling card, third party), a weak rule keyed on whether a given word occurs or does not occur]
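A word-occurrence weak rule of the kind shown in the table might look like this (a sketch; the actual BoosTexter rules, words, and scores are not given on the slide):

```python
def word_rule(word, score_if_present, score_if_absent):
    """Word-occurrence weak rule: output one score when `word` occurs
    in the utterance and another when it does not. The word and the
    scores below are illustrative."""
    def h(text):
        return score_if_present if word in text.lower().split() else score_if_absent
    return h

# Hypothetical rule for the "collect" category
h = word_rule("collect", +1, -1)
print(h("yes i'd like to place a collect call"))  # 1
print(h("bill it to my office"))                  # -1
```

Boosting then combines many such single-word rules, one per round, into the final per-category classifier.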

  39. Results • 7844 training examples • hand transcribed • 1000 test examples • hand / machine transcribed • Accuracy with 20% rejected • Machine transcribed: 75% • Hand transcribed: 90%

  40. Commercial deployment Freund, Mason, Rogers, Pregibon, Cortes 2000 • Distinguish business/residence customers • Using statistics from call-detail records • Alternating decision trees • Similar to boosting decision trees, more flexible • Combines very simple rules • Can over-fit, cross validation used to stop

  41. Summary • Boosting is a computational method for learning accurate classifiers • Resistance to over-fit explained by margins • Underlying explanation – large “neighborhoods” of good classifiers • Boosting has been applied successfully to a variety of classification problems

  42. Gene Regulation • Regulatory proteins bind to non-coding regulatory sequence of a gene to control the rate of transcription [figure: regulators at binding sites on DNA; the mRNA transcript is the measurable quantity]

  43. From mRNA to Protein [figure: the mRNA transcript leaves the nucleus wall and is translated by a ribosome into a protein sequence, which undergoes protein folding]

  44. Transcription Factors [figure: a regulator protein]

  45. Genome-wide Expression Data

  46. Microarrays measure mRNA transcript expression levels for all of the ~6000 yeast genes at once. • Very noisy data • Rough time slice over all compartments of many cells. • Protein expression not observed

  47. Partial “Parts List” for Yeast. Many known and putative: • Transcription factors (TF) • Signaling molecules (SM) that activate transcription factors • Known and putative binding site “motifs” (MTF) • In yeast, regulatory sequence = 500 bp upstream region

  48. GeneClass: Problem Formulation (M. Middendorf, A. Kundaje, C. Wiggins, Y. Freund, C. Leslie. Predicting Genetic Regulatory Response Using Classification. ISMB 2004.) • Predict target gene regulatory response from regulator activity and binding site data [figure: microarray image; “parent” gene expression R1, R2, …, Rp; target gene expression G1, G2, …, Gt; binding sites (motifs) in upstream region]

  49. Role of quantization • By quantizing expression into three classes {-1, 0, +1} we reduce noise but maintain most of the signal • Weighting +1/-1 examples linearly with expression level performs slightly better
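The quantization step might be sketched like this (the slide does not specify the thresholds, so the cutoffs below are illustrative assumptions):

```python
import numpy as np

def quantize(expr, low=-0.5, high=0.5):
    """Quantize expression values into down (-1) / baseline (0) / up (+1).
    The cutoffs `low` and `high` are illustrative, not from the slide."""
    return np.where(expr > high, 1, np.where(expr < low, -1, 0))

print(quantize(np.array([-1.2, 0.1, 0.9])).tolist())  # [-1, 0, 1]
```

Mapping noisy real-valued measurements onto three classes is what turns the regression-like expression data into the {-1, 0, +1} features used in the classification setup on the next slide.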

  50. Problem setup • Data point = target gene × microarray • Input features: parent state {-1, 0, +1}, motif presence {0, 1} • Predicted output: target gene state {-1, +1}
