Regression: Variance-Bias Trade-off
Regression
• We need a regression function h(x)
• We need a loss function L(h(x), y)
• We have a true distribution p(x, y)
• Assume a quadratic loss; then the expected loss decomposes as
  E[L] = ∫ (h(x) − E[y|x])^2 p(x) dx + ∫∫ (y − E[y|x])^2 p(x, y) dx dy
  where the first term is the estimation error and the second is the noise error.
  (Note on notation: some texts write t for the target y and y(x) for the regressor h(x).)
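A quick way to see this decomposition is a Monte Carlo check. The sketch below is a minimal illustration under assumed choices not given on the slides: y = sin(x) plus Gaussian noise (so E[y|x] = sin(x)) and an arbitrary regressor h(x) = 0.8x.

```python
import numpy as np

# Minimal Monte Carlo check of
#   E[(h(x) - y)^2] = E_x[(h(x) - E[y|x])^2] + E[(y - E[y|x])^2]
# Assumed setup (illustrative, not from the slides):
#   y = sin(x) + Gaussian noise, so E[y|x] = sin(x); h(x) = 0.8*x.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.uniform(-1.0, 1.0, n)            # samples from p(x)
y = np.sin(x) + rng.normal(0.0, 0.3, n)  # samples from p(y|x)

h = 0.8 * x                # a fixed regression function h(x)
cond_mean = np.sin(x)      # E[y|x], known here by construction

total = np.mean((h - y) ** 2)               # expected quadratic loss
estimation = np.mean((h - cond_mean) ** 2)  # estimation error term
noise = np.mean((y - cond_mean) ** 2)       # noise error term

print(total, estimation + noise)  # the two numbers should agree closely
```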
Regression: Learning
• Assume h(x) is a parametric curve, e.g. h(x) = a f(x) + b.
• Minimize the loss over the parameters (e.g. a, b), where the expectation over p(x, y) is replaced with a sum over data-cases (called a "Monte Carlo sum"):
  E[L] ≈ (1/N) sum_n (h(x_n) − y_n)^2
• That is, we solve:
  (a*, b*) = argmin over a, b of sum_n (a f(x_n) + b − y_n)^2
• The same result follows from posing a Gaussian model q(y|x) for p(y|x) with mean h(x) and maximizing the probability of the data over the parameters. (This approach is taken in 274; probabilistic learning.)
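As a concrete sketch of this fit, the snippet below minimizes the Monte Carlo sum for h(x) = a f(x) + b by linear least squares; f(x) = sin(x) and the synthetic data are illustrative assumptions.

```python
import numpy as np

# Least-squares fit of h(x) = a*f(x) + b by minimizing
#   sum_n (a*f(x_n) + b - y_n)^2
# Assumed basis f(x) = sin(x) and synthetic data (illustrative only).
rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, 200)
y = 2.0 * np.sin(x) + 0.5 + rng.normal(0.0, 0.2, 200)  # true a=2.0, b=0.5

f = np.sin
A = np.column_stack([f(x), np.ones_like(x)])  # design matrix: columns f(x), 1
(a_hat, b_hat), *_ = np.linalg.lstsq(A, y, rcond=None)

print(a_hat, b_hat)  # should recover roughly a=2.0, b=0.5
```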
Back to overfitting
• More parameters lead to more flexible functions, which may lead to over-fitting.
• Formalize this by imagining very many datasets D, all of size N. Call h(x; D) the regression function estimated from a dataset D of size N, i.e. h(x; D) = a(D) f(x) + b(D); then write
  h(x; D) − E[y|x] = (h(x; D) − E_D[h(x; D)]) + (E_D[h(x; D)] − E[y|x])
• Next, average the squared error over p(D) = p(x_1) p(x_2) … p(x_N). Only the first bracket depends on D, so the cross-term averages to 0, leaving
  E_D[(h(x; D) − E[y|x])^2] = E_D[(h(x; D) − E_D[h(x; D)])^2] + (E_D[h(x; D)] − E[y|x])^2 = variance + bias^2
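This averaging over datasets can be done empirically: draw many datasets D, fit h(x; D) on each, and compare bias^2 + variance at a test point with the average squared error there. The generative model and the deliberately rigid fit h(x; D) = a(D) x + b(D) below are assumptions for illustration.

```python
import numpy as np

# Empirical bias^2/variance decomposition over many datasets D of size N.
# Assumed (illustrative) generative model: y = sin(x) + noise, and a
# rigid regression function h(x; D) = a(D)*x + b(D).
rng = np.random.default_rng(2)
N, n_datasets = 30, 2000
x_test = 1.0                 # point at which we evaluate bias/variance
true_mean = np.sin(x_test)   # E[y|x] at the test point

h_at_x = np.empty(n_datasets)
for d in range(n_datasets):
    x = rng.uniform(-3.0, 3.0, N)
    y = np.sin(x) + rng.normal(0.0, 0.3, N)
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    h_at_x[d] = a * x_test + b          # h(x_test; D)

bias_sq = (h_at_x.mean() - true_mean) ** 2   # (E_D[h] - E[y|x])^2
variance = h_at_x.var()                      # E_D[(h - E_D[h])^2]
avg_sq_err = np.mean((h_at_x - true_mean) ** 2)

print(bias_sq + variance, avg_sq_err)  # should agree, per the decomposition
```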
Bias/Variance Tradeoff
[Figure: three panels, A, B, C]
A: The label y fluctuates (label variance). B: The estimate of h fluctuates across different datasets (estimation variance). C: The average estimate of h does not fit well to the true curve (squared estimation bias).
Bias/Variance Illustration
[Figure: illustration of variance and bias]
Relation to Over-fitting
Training error measures bias but ignores variance. Testing error / cross-validation error measures both bias and variance.
[Plot: error vs. model flexibility, with increasing regularization (less flexible models) at one end and decreasing regularization (more flexible models) at the other]
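The trade-off is easy to reproduce numerically. In the sketch below, polynomial degree stands in for model flexibility (higher degree, less regularization); the data-generating process is an illustrative assumption. Training error falls with degree, while test error eventually rises again, which is the over-fitting regime.

```python
import numpy as np

# Train vs. test error as model flexibility grows. Polynomial degree is
# used as a stand-in for decreasing regularization (illustrative setup).
rng = np.random.default_rng(3)

def make_data(n):
    x = rng.uniform(-1.0, 1.0, n)
    return x, np.sin(3.0 * x) + rng.normal(0.0, 0.2, n)

x_tr, y_tr = make_data(30)      # small training set, prone to over-fitting
x_te, y_te = make_data(1000)    # large test set

for degree in (1, 3, 5, 9, 15):
    coeffs = np.polyfit(x_tr, y_tr, degree)  # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: train {train_err:.3f}  test {test_err:.3f}")
```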