CMS 165 Lecture 8: Approximation and Generalization in Neural Networks
Recall from previous lecture:
Hypothesis class: $\mathcal{H}$
A loss function: $\ell(h(x), y)$
Expected risk: $R(h) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\,[\ell(h(x), y)]$
Expected risk minimizer: $h^\star = \arg\min_{h \in \mathcal{H}} R(h)$
Given a set of samples: $S = \{(x_i, y_i)\}_{i=1}^{n}$ drawn i.i.d. from $\mathcal{D}$
Empirical risk: $\hat{R}_n(h) = \frac{1}{n}\sum_{i=1}^{n} \ell(h(x_i), y_i)$
Empirical risk minimizer: $\hat{h}_n = \arg\min_{h \in \mathcal{H}} \hat{R}_n(h)$
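To make the definitions concrete, here is a minimal sketch (my own, not from the slides) of empirical risk minimization over a finite class of threshold classifiers under 0-1 loss; the data-generating process and the grid of thresholds are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not from the slides): ERM over a small, finite hypothesis
# class of threshold classifiers h_t(x) = 1[x > t], with 0-1 loss.
rng = np.random.default_rng(0)

# Synthetic data: label is 1 when x > 0.3, with 10% label noise (assumed setup).
n = 200
x = rng.uniform(0, 1, size=n)
y = (x > 0.3).astype(int)
flip = rng.random(n) < 0.1
y[flip] = 1 - y[flip]

def zero_one_loss(pred, y):
    """Average 0-1 loss over the sample = empirical risk."""
    return np.mean(pred != y)

# Finite hypothesis class: thresholds on a grid.
thresholds = np.linspace(0, 1, 101)
emp_risks = [zero_one_loss((x > t).astype(int), y) for t in thresholds]

# Empirical risk minimizer within the class.
t_hat = thresholds[int(np.argmin(emp_risks))]
print(f"ERM threshold: {t_hat:.2f}, empirical risk: {min(emp_risks):.3f}")
```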
Measures of Complexity
With probability at least $1-\delta$ over the sample, for every $h \in \mathcal{H}$: $R(h) \le \hat{R}_n(h) + 2\,\mathfrak{R}_n(\ell \circ \mathcal{H}) + \sqrt{\log(1/\delta)/(2n)}$
VC-Dimension: linear classifiers (halfspaces) in $\mathbb{R}^d$ have VC-dimension $d+1$
Bounded linear class $\{x \mapsto \langle w, x \rangle : \|w\|_2 \le B\}$: Rademacher complexity at most $B \max_i \|x_i\|_2 / \sqrt{n}$
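As a concrete illustration of Rademacher complexity (my own sketch, not from the slides), the norm-bounded linear class admits a closed-form supremum, so its empirical Rademacher complexity can be estimated by Monte Carlo and compared against the $B \max_i\|x_i\|_2/\sqrt{n}$ bound above. The dimensions, norm bound, and data distribution are arbitrary choices for illustration.

```python
import numpy as np

# Sketch: Monte Carlo estimate of the empirical Rademacher complexity of
#   H = { x -> <w, x> : ||w||_2 <= B },
# using the closed form sup_{||w||<=B} (1/n) <w, sum_i sigma_i x_i>
#   = (B/n) * || sum_i sigma_i x_i ||_2.
rng = np.random.default_rng(0)

n, d, B = 100, 20, 1.0
X = rng.normal(size=(n, d))

def empirical_rademacher(X, B, num_draws=2000):
    n = X.shape[0]
    vals = []
    for _ in range(num_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)      # Rademacher signs
        vals.append(B / n * np.linalg.norm(sigma @ X))
    return np.mean(vals)

est = empirical_rademacher(X, B)
bound = B * np.max(np.linalg.norm(X, axis=1)) / np.sqrt(n)
print(f"Monte Carlo estimate: {est:.4f}, bound B*max||x||/sqrt(n): {bound:.4f}")
```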
Rademacher Complexity of Neural Networks (from the notes of Percy Liang)
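The specific bound covered in lecture is not recoverable from the slide text; as a representative example of the kind of norm-based result in Liang's notes, here is a bound for two-layer networks, stated from memory, so treat the exact constants as approximate.

```latex
% Representative norm-based bound (assumed form; constants approximate).
% Two-layer networks with 1-Lipschitz activation \sigma, \sigma(0) = 0:
%   F = { x -> \sum_{j=1}^{m} w_j \sigma(\langle u_j, x \rangle) :
%         \|w\|_1 \le B_w, \ \|u_j\|_2 \le B_u }.
% If \|x_i\|_2 \le C for all samples, the empirical Rademacher complexity satisfies
\[
  \hat{\mathfrak{R}}_S(\mathcal{F}) \;\le\; \frac{2\, B_w B_u\, C}{\sqrt{n}},
\]
% which is independent of the number of hidden units m: capacity is controlled
% by the weight norms rather than by the parameter count.
```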
Decomposition of Errors: derivation for linear regression
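The slide text does not preserve the derivation itself; as a sketch of the standard decomposition it refers to (standard results, not transcribed from the lecture), the excess risk of the empirical risk minimizer splits into estimation and approximation terms, and for linear regression with squared loss the expected error further splits into noise, squared bias, and variance.

```latex
% Standard error decomposition (sketch; not transcribed from the lecture).
% Let R^* = \inf_h R(h) over all measurable predictors, h^\star the best
% hypothesis in the class H, and \hat{h}_n the empirical risk minimizer.
\[
  R(\hat{h}_n) - R^*
  = \underbrace{R(\hat{h}_n) - R(h^\star)}_{\text{estimation error}}
  + \underbrace{R(h^\star) - R^*}_{\text{approximation error}} .
\]
% For regression with squared loss and y = f(x) + \varepsilon,
% \mathbb{E}[\varepsilon] = 0, \operatorname{Var}(\varepsilon) = \sigma^2,
% the expected prediction error at a point x decomposes as
\[
  \mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \sigma^2
  + \big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2
  + \operatorname{Var}\big(\hat{f}(x)\big).
\]
```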
Approximation in Shallow Neural Networks. The universality proof is loose: it may require an exponential number of hidden units. Can we get a better bound? A better basis? How does the bound improve for various classes of functions?
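A small numerical sketch (my own, not from the slides) of shallow-network approximation: fit a smooth 1-D target with a one-hidden-layer ReLU network of increasing width, training only the output weights over fixed random features. The target function and sampling scheme are arbitrary assumptions for illustration.

```python
import numpy as np

# Sketch: approximation error of a one-hidden-layer ReLU network of width m,
# fitting only the output layer (random hidden features).
rng = np.random.default_rng(0)

target = lambda x: np.sin(3 * x) + 0.5 * np.cos(7 * x)   # assumed smooth target
x = np.linspace(-1, 1, 500)
y = target(x)

def relu_features(x, m):
    """Random hidden layer: ReLU(a*x + b) with fixed random a, b."""
    a = rng.normal(size=m)
    b = rng.uniform(-1, 1, size=m)
    return np.maximum(0.0, np.outer(x, a) + b)

for m in [4, 16, 64, 256]:
    Phi = relu_features(x, m)                       # n x m feature matrix
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)     # least-squares output weights
    err = np.sqrt(np.mean((Phi @ w - y) ** 2))
    print(f"width m = {m:4d}, RMS approximation error = {err:.4f}")
```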
Deep vs. Shallow Networks. What is the advantage of deep networks? Compositionality: a compositional target that a deep network represents compactly can require an exponential number of units in a shallow network.
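One concrete illustration, in the spirit of depth-separation results such as Telgarsky's (my choice of example, not necessarily the one used in lecture): composing a two-ReLU "tent" map with itself $k$ times gives a sawtooth with $2^k$ linear pieces using only $O(k)$ units in a depth-$k$ network, while a one-hidden-layer ReLU network with $m$ units produces at most $m+1$ pieces and so needs on the order of $2^k$ units.

```python
import numpy as np

# Sketch (illustrative): depth buys exponentially many linear regions.
# The tent map t(x) = 2x on [0, 1/2], 2(1-x) on [1/2, 1] needs two ReLUs:
#   t(x) = 2*relu(x) - 4*relu(x - 0.5).
# Composing it k times gives a sawtooth with 2^k linear pieces.
relu = lambda z: np.maximum(0.0, z)
tent = lambda x: 2 * relu(x) - 4 * relu(x - 0.5)

def deep_sawtooth(x, k):
    """k-fold composition of the tent map: a depth-k ReLU net with 2k units."""
    for _ in range(k):
        x = tent(x)
    return x

def count_linear_pieces(f, k, pts_per_piece=8):
    """Count linear pieces on a grid aligned with the breakpoints j / 2^k."""
    x = np.linspace(0.0, 1.0, 2**k * pts_per_piece + 1)
    slopes = np.round(np.diff(f(x, k)) / np.diff(x), 3)
    return 1 + int(np.sum(slopes[1:] != slopes[:-1]))

for k in range(1, 7):
    pieces = count_linear_pieces(deep_sawtooth, k)
    print(f"depth k = {k}: units ~ {2*k:2d}, linear pieces = {pieces}")
```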
Modern Neural Networks. From Belkin et al., "Reconciling modern machine learning practice and the bias-variance trade-off".
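A small sketch (my own, not the paper's experiments) of the double-descent curve Belkin et al. describe, using random ReLU features with minimum-norm least squares: test error typically peaks as the number of features approaches the interpolation threshold ($m \approx n$) and falls again beyond it. The target function, noise level, and seeds are assumed; exact numbers vary with the random seed.

```python
import numpy as np

# Sketch: double descent with random ReLU features and min-norm least squares.
rng = np.random.default_rng(0)

def target(x):
    return np.sin(4 * x)

n_train, n_test, noise_std = 40, 500, 0.1
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + noise_std * rng.normal(size=n_train)
x_test = np.linspace(-1, 1, n_test)
y_test = target(x_test)

def features(x, W, b):
    """Random ReLU features: relu(w*x + b) with fixed random w, b."""
    return np.maximum(0.0, np.outer(x, W) + b)

for m in [5, 10, 20, 40, 80, 160, 640]:
    W = rng.normal(size=m)
    b = rng.uniform(-1, 1, size=m)
    Phi_tr, Phi_te = features(x_train, W, b), features(x_test, W, b)
    # lstsq returns the minimum-norm solution in the overparameterized regime.
    w, *_ = np.linalg.lstsq(Phi_tr, y_train, rcond=None)
    mse = np.mean((Phi_te @ w - y_test) ** 2)
    print(f"m = {m:4d} features (n = {n_train}): test MSE = {mse:.3f}")
```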
Seems to be true in practice (slides from Ben Recht)
Is it really true? (slides from Ben Recht)
Look closely at the data… (slides from Ben Recht)
Solution? Better test sets… (slides from Ben Recht)
Accuracy on a harder test set (slides from Ben Recht)
True even on ImageNet (slides from Ben Recht)
Is this a good summary? (slides from Ben Recht)