Smooth ε-Insensitive Regression by Loss Symmetrization Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer School of Computer Science and Engineering The Hebrew University {oferd,shais,singer}@cs.huji.ac.il COLT 2003: The Sixteenth Annual Conference on Learning Theory
Before We Begin …
Linear Regression: given a training set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{m}$ with $\mathbf{x}_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$, find $\boldsymbol{\lambda} \in \mathbb{R}^n$ such that $\boldsymbol{\lambda} \cdot \mathbf{x}_i \approx y_i$
Least Squares: minimize $\sum_{i=1}^{m} \left(\boldsymbol{\lambda} \cdot \mathbf{x}_i - y_i\right)^2$
Support Vector Regression: minimize $\tfrac{1}{2}\|\boldsymbol{\lambda}\|^2$ s.t. $\left|\boldsymbol{\lambda} \cdot \mathbf{x}_i - y_i\right| \le \epsilon$ for all $i$
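The least-squares problem above is easy to check numerically. Below is a minimal NumPy sketch on synthetic data; the data, dimensions, and variable names are illustrative and not taken from the talk.

```python
import numpy as np

# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
m, n = 100, 3
X = rng.normal(size=(m, n))                    # instances x_i in R^n
true_lam = np.array([1.5, -2.0, 0.5])
y = X @ true_lam + 0.1 * rng.normal(size=m)    # targets y_i

# Least squares: minimize sum_i (lam . x_i - y_i)^2.
lam_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print("least-squares estimate:", lam_ls)
```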
Loss Symmetrization
Loss functions used in classification boosting:
• Exp-loss: $e^{-y(\boldsymbol{\lambda} \cdot \mathbf{x})}$
• Log-loss: $\log\left(1 + e^{-y(\boldsymbol{\lambda} \cdot \mathbf{x})}\right)$
Symmetric versions of these losses can be used for regression, with discrepancy $\delta = \boldsymbol{\lambda} \cdot \mathbf{x} - y$:
• Symmetric Exp-loss: $e^{\delta - \epsilon} + e^{-\delta - \epsilon}$
• Symmetric Log-loss: $\log\left(1 + e^{\delta - \epsilon}\right) + \log\left(1 + e^{-\delta - \epsilon}\right)$
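As a quick numerical companion, here is a minimal sketch of these symmetric losses in NumPy; the ε-insensitive form follows the formulas above, and the function names are illustrative.

```python
import numpy as np

def symmetric_log_loss(delta, eps=0.0):
    # Symmetric eps-insensitive log-loss of a discrepancy delta = lam.x - y.
    # np.logaddexp(0, z) computes log(1 + e^z) in a numerically stable way.
    return np.logaddexp(0.0, delta - eps) + np.logaddexp(0.0, -delta - eps)

def symmetric_exp_loss(delta, eps=0.0):
    # Symmetric eps-insensitive exp-loss of a discrepancy delta.
    return np.exp(delta - eps) + np.exp(-delta - eps)

# Both losses are symmetric in delta and are smallest when delta is near zero.
for d in (-2.0, 0.0, 2.0):
    print(d, symmetric_log_loss(d, eps=0.5), symmetric_exp_loss(d, eps=0.5))
```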
A General Reduction
• Begin with a regression training set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{m}$ where $\mathbf{x}_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$
• Generate 2m classification training examples of dimension n+1: for each i, the example $\left((\mathbf{x}_i, -y_i + \epsilon),\; +1\right)$ and the example $\left((\mathbf{x}_i, -y_i - \epsilon),\; -1\right)$
• Learn $\boldsymbol{\lambda} \in \mathbb{R}^{n+1}$ while maintaining $\lambda_{n+1} = 1$ by minimizing a margin-based classification loss
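A minimal sketch of this reduction in NumPy, assuming the augmented-example construction given above (append $-y_i \pm \epsilon$ as coordinate n+1 and label the two copies ±1); the function and variable names are illustrative.

```python
import numpy as np

def regression_to_classification(X, y, eps):
    # Turn m regression examples (x_i, y_i) into 2m labeled examples in R^{n+1}:
    # each x_i yields ((x_i, -y_i + eps), +1) and ((x_i, -y_i - eps), -1).
    m = X.shape[0]
    Z_pos = np.hstack([X, (-y + eps).reshape(-1, 1)])   # labeled +1
    Z_neg = np.hstack([X, (-y - eps).reshape(-1, 1)])   # labeled -1
    Z = np.vstack([Z_pos, Z_neg])
    labels = np.concatenate([np.ones(m), -np.ones(m)])
    return Z, labels

# With the augmented weight vector w = (lam, 1) (last coordinate held at 1),
# the two margins per example recover the symmetric eps-insensitive loss of
# delta_i = lam . x_i - y_i.
```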
A Batch Algorithm
An illustration of a single batch iteration
Simplifying assumptions (just for the demo):
• Instances lie in a low-dimensional space, so everything can be plotted
• Set $\epsilon = 0$
• Use the Symmetric Log-loss
A Batch Algorithm
Calculate discrepancies and weights:
[Figure: the training points and the current regressor, annotated with each example's discrepancy and weights]
A Batch Algorithm
Cumulative weights:
[Figure: the same data, with the per-example weights accumulated into cumulative weights]
Two Batch Algorithms
Update the regressor, using either the Log-Additive update or the Additive update:
[Figure: the regressor after each of the two updates]
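Putting the last three slides together, here is a minimal NumPy sketch of one batch iteration with the symmetric log-loss and a log-additive update. The weight formulas come from differentiating the symmetric log-loss; the feature normalization $\sum_j |x_{i,j}| \le 1$ and the exact form of the cumulative weights and update step are assumptions of this sketch rather than a transcription of the slides.

```python
import numpy as np

def batch_iteration_log_additive(X, y, lam, eps=0.0):
    # One sketched batch iteration for the symmetric eps-insensitive log-loss.
    # Assumes features are scaled so that sum_j |x_{i,j}| <= 1 (boosting-style).

    # Discrepancies delta_i = lam . x_i - y_i.
    delta = X @ lam - y

    # Per-example weights (derivatives of the two symmetric loss terms).
    q_pos = 1.0 / (1.0 + np.exp(eps - delta))   # wants the prediction lower
    q_neg = 1.0 / (1.0 + np.exp(eps + delta))   # wants the prediction higher

    # Cumulative per-coordinate weights, split by feature sign.
    Xp, Xn = np.maximum(X, 0.0), np.maximum(-X, 0.0)
    W_pos = Xp.T @ q_neg + Xn.T @ q_pos         # mass pushing lam_j up
    W_neg = Xp.T @ q_pos + Xn.T @ q_neg         # mass pushing lam_j down

    # Log-additive update: half the log-ratio of the cumulative weights.
    return lam + 0.5 * np.log((W_pos + 1e-12) / (W_neg + 1e-12))
```

The small constant 1e-12 is only a numerical guard against taking the logarithm of zero.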
Progress Bounds
• Theorem (Log-Additive update): every iteration decreases the loss by at least a guaranteed amount
• Theorem (Additive update): an analogous per-iteration progress bound holds
• Lemma: both bounds are non-negative and equal zero only at the optimum
Boosting Regularization
A new form of regularization for regression and classification boosting
Can be implemented by adding pseudo-examples*
* Communicated by Rob Schapire
Regularization Contd.
• Regularization ⇒ compactness of the feasible set for $\boldsymbol{\lambda}$
• Regularization ⇒ a unique attainable optimizer of the loss function
Proof of Convergence
Progress + compactness + uniqueness = asymptotic convergence to the optimum
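One plausible way to realize "regularization by adding pseudo-examples" is sketched below: each pseudo-example is a scaled standard basis vector with target 0, so every coordinate of $\boldsymbol{\lambda}$ incurs its own symmetric loss term and stays bounded. The scaling parameter nu and the zero targets are assumptions of this sketch, not details taken from the slides.

```python
import numpy as np

def add_regularization_pseudo_examples(X, y, nu=0.1):
    # Append one pseudo-example per coordinate: nu * e_j with target 0.
    # Each pseudo-example adds a symmetric loss term in nu * lam_j alone,
    # which keeps lam in a compact sub-level set and makes the optimum attainable.
    n = X.shape[1]
    X_reg = np.vstack([X, nu * np.eye(n)])
    y_reg = np.concatenate([y, np.zeros(n)])
    return X_reg, y_reg
```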
Exp-loss vs. Log-loss
• Two synthetic datasets
[Figures: the regressors obtained on each dataset, one panel for the Log-loss and one for the Exp-loss]
Extensions
• Parallel vs. Sequential updates
  • Parallel – update all elements of $\boldsymbol{\lambda}$ in parallel
  • Sequential – update the weight of a single weak regressor on each round (like classic boosting)
• Another loss function – the "Combined Loss"
[Figure: the Log-loss, Exp-loss, and Comb-loss curves]
On-line Algorithms
• GD and EG online algorithms for the Log-loss
• Relative loss bounds
Future Directions
• Regression tree learning
• Solving one-class and various ranking problems using similar constructions
• Regression generalization bounds based on natural regularization