Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
The Hebrew University, Jerusalem, Israel

Problem Setting

• Online learning:
  • Choose an initial hypothesis f_0
  • For t = 1, 2, …
    • Receive an instance x_t and predict sign(f_t(x_t))
    • If y_t f_t(x_t) <= 0, update the hypothesis
• Goal: minimize the number of prediction mistakes
• Kernel-based hypotheses: f_t(x) = sum over i in I_t of a_i y_i K(x_i, x), where I_t is the set of "active" examples
• Example: the dual Perceptron
  • Initial hypothesis: f_0 = 0 (the active set I_0 is empty)
  • Update rule: on a mistake, add the current example to the active set; its coefficient a_t is always 1
• Competitive analysis: compete against any fixed hypothesis g, charging g the hinge loss max{0, 1 - y_t g(x_t)} on each round
• Goal: a provably correct algorithm on a fixed budget: |I_t| <= B

The Forgetron

• Initialize: f_0 = 0 (I_0 empty), M_0 = 0
• For t = 1, 2, …
  • Receive an instance x_t, predict sign(f_t(x_t)), and then receive y_t
  • If y_t f_t(x_t) <= 0, set M_t = M_{t-1} + 1 and update in three steps:
    • Step (1) - Perceptron: add the current example to the active set, defining I'_t = I_t U {t}, with coefficient 1
    • If |I'_t| <= B, skip the next two steps
    • Step (2) - Shrinking: multiply all coefficients by a factor phi_t in (0, 1]
    • Step (3) - Removal: define r_t = min I'_t (the oldest active example) and remove it from the active set

Proof Technique

• Progress measure: ||f_t - g||^2 - ||f_{t+1} - g||^2, which can be rewritten as the sum of the contributions of the three update steps
• The Perceptron update leads to positive progress
• The shrinking and removal steps might lead to negative progress. Can we quantify the deviation they might cause?
• Deviation due to shrinking:
  • Case I: the shrinking is a projection onto the ball of radius U = ||g||
  • Case II: aggressive shrinking, where phi_t shrinks the hypothesis further than the projection would
• Deviation due to removal: if example (x, y) is activated on round r and deactivated on round t, the deviation it causes is controlled by its repeatedly shrunk coefficient at removal time

Mistake Bound

• Given any sequence {(x_1, y_1), …, (x_T, y_T)} such that K(x_t, x_t) <= 1
• Given a budget B >= 84
• For any competitor g whose norm is at most of order sqrt(B / log B)
• Then the number of mistakes made by the Forgetron is bounded in terms of the cumulative hinge loss of g

Experiments

• Compare to Crammer, Kandola & Singer (NIPS'03)
• Display online error vs. budget constraint
• Datasets: MNIST, USPS, synthetic (10% label noise), census-income (adult)

Hardness Result

• For any kernel-based algorithm that satisfies |I_t| <= B, we can find a competitor g and an arbitrarily long sequence such that the algorithm errs on every single round whereas g suffers no loss.
• The construction: for each hypothesis f_t based on B examples, there exists an x in X such that f_t(x) = 0, so the adversary can force a mistake on every round
• The norm of the competitor g in this construction is of order sqrt(B + 1)
• Hence we cannot compete with hypotheses whose norm is larger than order sqrt(B), which is why the mistake bound restricts the norm of g
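Putting the pieces together, here is a minimal, runnable sketch of the three-step Forgetron update described above. The class name, the fixed shrinking factor phi, and the Gaussian kernel are illustrative choices of mine, not the paper's; in particular, the real algorithm chooses phi_t adaptively on each round to keep the total shrinking deviation controlled, which this sketch does not attempt.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel; K(x, x) = 1, matching the K(x_t, x_t) <= 1 assumption."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


class Forgetron:
    """Sketch of the budgeted kernel Perceptron described above.

    The active set holds at most B (example, label, coefficient) triples.
    NOTE: phi is a fixed illustrative shrinking factor; the actual
    algorithm tunes phi_t per round.
    """

    def __init__(self, budget, kernel=rbf_kernel, phi=0.9):
        self.B = budget
        self.kernel = kernel
        self.phi = phi
        self.active = []      # list of (x, y, alpha), oldest first
        self.mistakes = 0

    def score(self, x):
        # f_t(x) = sum over active examples of alpha_i * y_i * K(x_i, x)
        return sum(a * y * self.kernel(xi, x) for xi, y, a in self.active)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def step(self, x, y):
        """One online round: predict, and on a mistake run the 3-step update."""
        if y * self.score(x) <= 0:            # prediction mistake
            self.mistakes += 1
            self.active.append((x, y, 1.0))   # Step (1) Perceptron: alpha_t = 1
            if len(self.active) > self.B:
                # Step (2) Shrinking: scale every coefficient by phi
                self.active = [(xi, yi, self.phi * a)
                               for xi, yi, a in self.active]
                # Step (3) Removal: discard the oldest example (r_t = min I'_t)
                self.active.pop(0)
```

With a budget as small as 1 the active set churns constantly: each new mistake shrinks and then evicts the single stored example, which is exactly the deviation the proof technique has to control.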