5.3 Algorithmic Stability Bounds



  1. 5.3 Algorithmic Stability Bounds Summarized by: Sang Kyun Lee

  2. Robustness of a learning algorithm • Instead of compression and reconstruction functions, we now consider the “robustness of a learning algorithm A” • Robustness • a measure of the influence of an additional training example (x, y) ∈ Z on the learned hypothesis A(z) ∈ H • quantified in terms of the loss achieved at any test object x ∈ X • A robust learning algorithm guarantees |expected risk − empirical risk| < M even if we replace one training example by its worst counterpart • This is of great help when using McDiarmid’s inequality (A.119) – a large deviation result perfectly suited for the current purpose (c) 2005 SNU CSE Biointelligence Lab

  3. McDiarmid’s Inequality (A.119)
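The inequality itself was an image that did not survive extraction; its standard statement (a reconstruction of the usual form, not a quote of the slide) is:

```latex
% McDiarmid's bounded-difference inequality: if X_1, \dots, X_m are
% independent and g changes by at most c_i when its i-th argument is
% replaced by any other value, then for every \varepsilon > 0
\[
  \mathbf{P}\bigl( g(X_1,\dots,X_m) - \mathbf{E}\,[g(X_1,\dots,X_m)]
    \ge \varepsilon \bigr)
  \;\le\; \exp\!\left( -\frac{2\varepsilon^2}{\sum_{i=1}^{m} c_i^2} \right).
\]
```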

  4. 5.3.1 Algorithmic Stability for Regression • Framework • Training sample: drawn iid from an unknown distribution • Hypothesis: a real-valued function • Loss function: l : ℝ × ℝ → ℝ, a function of the predicted value and the observed value t

  5. Notations

  6. βm-stability (1/2) • βm-stability implies robustness in the more usual sense of measuring the influence of an extra training example. This is formally expressed in the following theorem.
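The slide's defining formula was lost; one common formulation of βm-stability (the uniform-stability form, reconstructed here rather than quoted) is:

```latex
% beta_m-stability: replacing the i-th training example by any (x, y)
% changes the loss at any test pair (x~, y~) by at most beta_m.
\[
  \forall z \in \mathcal{Z}^m \;\; \forall (x,y) \in \mathcal{Z} \;\;
  \forall i \;\; \forall (\tilde{x},\tilde{y}) \in \mathcal{Z}: \quad
  \bigl|\, l\bigl(f_{z}(\tilde{x}),\tilde{y}\bigr)
         - l\bigl(f_{z_{i \leftrightarrow (x,y)}}(\tilde{x}),\tilde{y}\bigr) \,\bigr|
  \;\le\; \beta_m .
\]
```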

  7. βm-stability (2/2) • Proof (Theorem 5.27)

  8. Lipschitz Loss Function (1/3) • Thus, given a Lipschitz continuous loss function l, • we can use the difference of the two functions to bound the losses they incur at any test object x.
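As a concrete sanity check (a sketch added here, not from the slides), the linear soft margin (hinge) loss is Lipschitz continuous in the predicted value with constant 1, so the difference of two predictions bounds the difference of their losses:

```python
# The hinge loss l(f, y) = max(0, 1 - y*f) is 1-Lipschitz in f:
# |l(a, y) - l(b, y)| <= |a - b| for any predictions a, b and y in {-1, +1}.
def hinge(f, y):
    """Linear soft margin loss for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * f)

# Verify the Lipschitz bound on a grid of prediction pairs.
preds = [x / 10.0 for x in range(-30, 31)]
lipschitz_holds = all(
    abs(hinge(a, y) - hinge(b, y)) <= abs(a - b) + 1e-12
    for a in preds for b in preds for y in (-1, 1)
)
```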

  9. Lipschitz Loss Function (2/3) • Examples of Lipschitz continuous loss functions

  10. Lipschitz Loss Function (3/3) • Using the concept of Lipschitz continuous loss functions, we can upper bound the value of βm for a large class of learning algorithms, using the following theorem (proof in Appendix C9.1). Using this, we are able to cast most of the learning algorithms presented in Part I of this book into this framework.
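The theorem itself was lost with the slide image. As an illustration of the idea only (a numerical sketch, with ridge regression standing in for the regularized algorithms such a theorem covers; the function names and the experiment are assumptions, not the book's proof), one can measure how much replacing a single training example moves the loss at test points, and observe that stronger regularization yields a smaller, i.e. more stable, change:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/m)*||Xw - y||^2 + lam*||w||^2 in closed form."""
    m, d = X.shape
    return np.linalg.solve(X.T @ X / m + lam * np.eye(d), X.T @ y / m)

def stability_estimate(X, y, lam, x_new, y_new, X_test):
    """Max change in squared loss over X_test when example 0 is replaced."""
    w1 = ridge_fit(X, y, lam)
    X2, y2 = X.copy(), y.copy()
    X2[0], y2[0] = x_new, y_new
    w2 = ridge_fit(X2, y2, lam)
    # Squared loss at the test points against a dummy target of 0.
    return np.max(np.abs((X_test @ w1) ** 2 - (X_test @ w2) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(50, 5))
# Replace one clean example by an outlier and compare two regularization levels.
beta_small_lam = stability_estimate(X, y, 0.01, np.ones(5), 5.0, X_test)
beta_large_lam = stability_estimate(X, y, 10.0, np.ones(5), 5.0, X_test)
```

The larger regularization parameter damps the influence of the swapped-in outlier, which is the qualitative content of the stability theorem.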

  11. Algorithmic Stability Bound for Regression Estimation • In order to obtain generalization error bounds for βm-stable learning algorithms A, we proceed as follows: • To use McDiarmid’s inequality, define a random variable g(Z) which measures |R[fz] – Remp[fz,z]| or |R[fz] – Rloo[A,z]|, e.g. g(Z) = R[fz] – Remp[fz,z] • Then we need to upper bound E[g] over the random draw of training samples z ∈ Zm, because we are only interested in the probability that g(Z) will be larger than some prespecified ε • We also need an upper bound on the bounded differences of g, which should preferably not depend on i ∈ {1,…,m}

  12. Algorithmic Stability Bound for Regression Estimation (C9.2 – 1/8) • Expectation over the random draw of training samples z ∈ Zm • Quick proof:

  13. Algorithmic Stability Bound for Regression Estimation (C9.2 – 2/8) • Quick proof:

  14. Algorithmic Stability Bound for Regression Estimation (C9.2 – 3/8)

  15. Algorithmic Stability Bound for Regression Estimation (C9.2 – 4/8) • Proof by Lemma C.21

  16. Algorithmic Stability Bound for Regression Estimation (C9.2 – 5/8) • Summary: • The two bounds are essentially the same • the additive correction is ≈ βm • the decay of the probability is O(exp(−ε²/(m·βm²))) • This result is slightly surprising, because • VC theory indicates that the training error Remp is only a good indicator of the generalization error when the hypothesis space has a small VC dimension (Thm. 4.7) • In contrast, the loo error disregards the VC dimension and is an almost unbiased estimator of the expected generalization error of an algorithm (Thm. 2.36)
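For reference (an addition, written in the Bousquet and Elisseeff form of the uniform-stability bound; the book's Theorem 5.32 may use slightly different constants), the bound can be evaluated numerically to see how it tightens with the sample size when βm decays like 1/m:

```python
import math

def stability_bound(r_emp, beta, m, M, delta):
    """Generalization bound for a beta-stable algorithm with loss in [0, M]:
    with prob. >= 1 - delta,
        R <= R_emp + 2*beta + (4*m*beta + M) * sqrt(ln(1/delta) / (2*m)).
    (Bousquet & Elisseeff form; constants are an assumption here.)"""
    return r_emp + 2 * beta + (4 * m * beta + M) * math.sqrt(
        math.log(1 / delta) / (2 * m)
    )

# With beta_m = kappa/m the additive gap shrinks as the sample grows.
kappa, M, delta = 1.0, 1.0, 0.05
gap_small_m = stability_bound(0.0, kappa / 100, 100, M, delta)
gap_large_m = stability_bound(0.0, kappa / 10000, 10000, M, delta)
```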

  17. Algorithmic Stability Bound for Regression Estimation (C9.2 – 6/8) • However, recall that • VC theory is used for empirical risk minimization algorithms, which consider only the training error as the cost function to be minimized • In contrast, in the current formulation we have to guarantee a certain stability of the learning algorithm: in the case of λ → 0 (the learning algorithm minimizes the empirical risk only), we can no longer guarantee a finite stability.

  18. Algorithmic Stability Bound for Regression Estimation (C9.2 – 7/8) • Let us consider a βm-stable algorithm A s.t. βm ≤ κ·m⁻¹ • From Thm. 5.32, with probability at least 1 − δ, … • This is an amazingly tight generalization error bound whenever κ ≪ √m, because the expression is then dominated by the second term • Moreover, this provides practical guidance on the possible values of the trade-off parameter λ: from (5.19), this holds regardless of the empirical term Remp[A(z),z]
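Substituting βm = κ/m into the stability bound (using the Bousquet and Elisseeff form assumed above; the book's constants may differ) makes the decay explicit:

```latex
% With beta_m <= kappa/m and a loss bounded by M, with probability
% at least 1 - delta over the draw of the training sample z:
\[
  R[f_z] \;\le\; R_{\mathrm{emp}}[f_z, z] \;+\; \frac{2\kappa}{m}
  \;+\; (4\kappa + M)\,\sqrt{\frac{\ln(1/\delta)}{2m}} \, .
\]
```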

  19.

  20. 5.3.2 Algorithmic Stability for Classification • Framework • Training sample: • Hypothesis: • Loss function: • We confine ourselves to the zero-one loss, although the following also applies to any loss that takes a finite set of values.

  21. βm-stability • For a given classification algorithm, here we have βm ∈ {0,1} only. • βm = 0 occurs if the loss is unchanged for all training samples z ∈ Zm and all test examples (x,y) ∈ Z – which is only possible if H contains only one hypothesis. • If we exclude this trivial case, then Thm. 5.32 gives a trivial result

  22. Refined Loss Function (1/2) • In order to circumvent this problem, we consider the real-valued output f(x) and classifiers of the form h(·) = sign(f(·)). • As our ultimate interest is the generalization error, • consider a loss function which is an upper bound of the zero-one loss • Advantages of this loss function setting:

  23. Refined Loss Function (2/2) • Another useful requirement on the refined loss function lτ is Lipschitz continuity with a small Lipschitz constant • This can be achieved by adjusting the linear soft margin loss, where y ∈ {−1,+1} • Modify this function to require an output of at least τ on the correct side • The loss function has to pass through 1 for f(x) = 0 • Thus the steepness of the function is 1/τ • Therefore the Lipschitz constant is also 1/τ • The function should be in the interval [0,1] because the zero-one loss never exceeds 1.
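The formula for the refined loss was lost with the slide image. Under the assumption that it takes the clipped-hinge form lτ(f, y) = min(1, max(0, 1 − y·f/τ)) (a reconstruction consistent with the properties listed above), a short check confirms it passes through 1 at f(x) = 0, vanishes once the output is at least τ on the correct side, stays in [0, 1], has Lipschitz constant 1/τ, and upper-bounds the zero-one loss:

```python
def l_tau(f, y, tau):
    """Clipped margin loss: slope 1/tau, range [0, 1]."""
    return min(1.0, max(0.0, 1.0 - y * f / tau))

def zero_one(f, y):
    """Zero-one loss of the classifier h(x) = sign(f(x))."""
    return 1.0 if y * f <= 0 else 0.0

tau = 0.5
points = [x / 20.0 for x in range(-40, 41)]
# l_tau upper-bounds the zero-one loss everywhere ...
upper_bounds_01 = all(
    l_tau(f, y, tau) >= zero_one(f, y) for f in points for y in (-1, 1)
)
# ... and is Lipschitz with constant 1/tau.
lipschitz_ok = all(
    abs(l_tau(a, 1, tau) - l_tau(b, 1, tau)) <= abs(a - b) / tau + 1e-12
    for a in points for b in points
)
```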

  24.

  25. Algorithmic Stability for Classification (1/3) • For τ → ∞, the first term is provably non-increasing whereas the second term is always decreasing

  26. Algorithmic Stability for Classification (2/3) • Consider this theorem for the special case of the linear soft margin SVM for classification (see Subsection 2.4.2) • W.l.o.g., assume … = 1
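The stability constant in the slide's lost formula is, in the standard analysis of this case (the Bousquet and Elisseeff result, stated here as an assumption since the book's constants may differ):

```latex
% Uniform stability of the linear soft margin SVM with regularization
% parameter lambda and a kernel satisfying k(x, x) <= r^2:
\[
  \beta_m \;\le\; \frac{r^2}{2 \lambda m} \, ,
\]
% i.e. the stability improves linearly in both lambda and m.
```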

  27. Algorithmic Stability for Classification (3/3) • This bound provides an interesting model selection criterion, by which we select the value of λ (the assumed noise level). • In contrast to the result of Subsection 4.4.3, this bound only holds for the linear soft margin SVM • The results in this section are so recent that no empirical studies have yet been carried out

  28. Algorithmic Stability for Classification (4/4)
