Linear Regression Fall 2014 The University of Iowa Tianbao Yang
Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Model Selection
Linear Regression with One Variable • Example: predict house price • Training Data: a set of examples (x_i, y_i), i = 1, ..., n • input (feature) x: size of house • output (target) y: house price • first-order linear regression model: h(x) = w_0 + w_1 x [Figure: house price vs. size, with a fitted line]
Linear Regression with One Variable • How to estimate the model parameters w_0 and w_1? [Figure: price vs. size data with candidate regression lines]
Linear Regression with One Variable • How to estimate the model parameters w_0 and w_1? • Criterion: minimize the error on training data • the loss function ℓ(h(x_i), y_i) measures the error on example i; it is a function of the parameters [Figure: vertical residuals between the data points and the fitted line]
Linear Regression with One Variable • To estimate the model parameters • Criterion: minimize the error on training data • the loss function measures the error • square loss: ℓ(h(x), y) = (h(x) − y)^2 • minimize the sum of all losses: min over (w_0, w_1) of Σ_i (w_0 + w_1 x_i − y_i)^2 • this is Least Square Regression
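A minimal numpy sketch of this least-squares fit; the sizes and prices below are made-up illustrative numbers, not data from the lecture.

```python
# Least-squares fit of h(x) = w_0 + w_1 x, closed form for one variable.
import numpy as np

sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])  # sq. ft. (made up)
prices = np.array([200.0, 280.0, 370.0, 450.0, 540.0])      # $1000s (made up)

# Minimizer of sum_i (w_0 + w_1 x_i - y_i)^2
w1 = np.sum((sizes - sizes.mean()) * (prices - prices.mean())) \
     / np.sum((sizes - sizes.mean()) ** 2)
w0 = prices.mean() - w1 * sizes.mean()
print(f"price ~ {w0:.2f} + {w1:.4f} * size")
```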
Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Model Selection
Supervised Learning • Training examples: (x_i, y_i), i = 1, ..., n • independent and identically distributed (i.i.d.) assumption • a critical assumption for machine learning theory
Probability Interpretation • Training Data: a set of examples • input (feature) x: size of house • output (target) y: house price • both x and y are random variables • model: y = w_0 + w_1 x + ε, where ε is Gaussian noise, ε ~ N(0, σ^2) [Figure: price vs. size with the noise distribution around the regression line]
Data Likelihood • Training Data: a set of examples (x_i, y_i), i = 1, ..., n • under the Gaussian noise model, each y_i is Gaussian with mean w_0 + w_1 x_i and variance σ^2: p(y_i | x_i) = N(y_i; w_0 + w_1 x_i, σ^2) • by the i.i.d. assumption, the data likelihood factorizes: p(y_1, ..., y_n | x_1, ..., x_n) = Π_i p(y_i | x_i)
Maximum Likelihood Estimation (MLE) • Estimate the model parameters by maximizing the data likelihood • Maximum Likelihood Estimation: (w_0, w_1) = argmax Π_i p(y_i | x_i) = argmax Σ_i log p(y_i | x_i)
MLE is Equivalent to Least Square Regression • the log-likelihood is Σ_i log p(y_i | x_i) = −(n/2) log(2π σ^2) − (1/(2σ^2)) Σ_i (y_i − w_0 − w_1 x_i)^2 • maximizing the likelihood therefore minimizes Σ_i (w_0 + w_1 x_i − y_i)^2 • Least Square Regression IS Maximum Likelihood Estimation (under the Gaussian noise model)
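A small numeric check of the equivalence, on synthetic data with an assumed noise variance σ^2 = 1: the log-likelihood is an affine, decreasing function of the sum of squared errors, so the least-squares fit attains the highest likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(size=50)  # assumed noise std = 1

def log_likelihood(w0, w1, sigma2=1.0):
    # log prod_i N(y_i; w0 + w1 x_i, sigma2)
    sse = np.sum((y - (w0 + w1 * x)) ** 2)
    return -len(x) / 2 * np.log(2 * np.pi * sigma2) - sse / (2 * sigma2)

# Least-squares fit
w1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0_hat = y.mean() - w1_hat * x.mean()

print(log_likelihood(w0_hat, w1_hat))        # highest value
print(log_likelihood(w0_hat + 0.5, w1_hat))  # strictly lower
```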
Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Model Selection
Linear Basis Function Models • Example: Polynomial Curve Fitting • fit y(x, w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M, which is linear in the parameters w even though it is nonlinear in x
Linear Basis Function Models • generally y(x, w) = Σ_{j=0..M−1} w_j φ_j(x) = w^T φ(x) • where the φ_j(x) are known as basis functions • typically φ_0(x) = 1, so that w_0 acts as a bias
Linear Basis Function Models • Polynomial basis functions: φ_j(x) = x^j • these are global; a small change in x affects all basis functions
Linear Basis Function Models • Gaussian basis functions: φ_j(x) = exp(−(x − μ_j)^2 / (2 s^2)) • these are local; a small change in x affects only nearby basis functions • μ_j controls the location and s the scale (width)
Linear Basis Function Models • Sigmoidal basis functions: φ_j(x) = σ((x − μ_j) / s), where σ(a) = 1 / (1 + exp(−a)) • these are local; a small change in x affects only nearby basis functions
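Hypothetical helper functions building the design matrix Φ for the three basis families above; the centers mu and scale s used in the demo are arbitrary illustrative choices, not values from the lecture.

```python
import numpy as np

def polynomial_features(x, degree):
    # Columns x^0, ..., x^degree; the x^0 = 1 column is the bias phi_0.
    return np.vander(x, degree + 1, increasing=True)

def gaussian_features(x, mu, s):
    # One local bump exp(-(x - mu_j)^2 / (2 s^2)) per center, plus a bias column.
    bumps = np.exp(-((x[:, None] - mu[None, :]) ** 2) / (2 * s ** 2))
    return np.hstack([np.ones((len(x), 1)), bumps])

def sigmoidal_features(x, mu, s):
    # One local step sigma((x - mu_j) / s) per center, plus a bias column.
    sig = 1.0 / (1.0 + np.exp(-(x[:, None] - mu[None, :]) / s))
    return np.hstack([np.ones((len(x), 1)), sig])

x = np.linspace(0, 1, 5)
print(polynomial_features(x, 3).shape)                        # (5, 4)
print(gaussian_features(x, np.linspace(0, 1, 4), 0.2).shape)  # (5, 5)
```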
Linear Regression with Multi-Variables • Example: predict house price • Training Data: a set of examples (x_i, y_i) • input (features) x: size of house, year of house, etc. • output (target) y: house price • model: h(x) = w^T x, with a constant feature x_0 = 1 so that w_0 acts as the bias
Least Square Regression • Minimize Sum of Square Loss: min_w Σ_i (w^T x_i − y_i)^2 = min_w ‖Xw − y‖^2, where the rows of X are the x_i^T and y = (y_1, ..., y_n)^T
Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Model Selection
Procedures of Machine Learning • A three-step view of machine learning • data collection (and pre-processing) • model building (and analysis) • optimization [Diagram: Data → Model → Optimization]
Optimization • Minimize Sum of Square Loss L(w) = ‖Xw − y‖^2 • Unconstrained Convex Optimization • 1. compute the gradient with respect to (w.r.t.) w: ∇L(w) = 2 X^T (Xw − y)
Optimization • Unconstrained Convex Optimization • 2. set the gradient to zero: X^T X w = X^T y, giving w = (X^T X)^{−1} X^T y (the normal equations)
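A minimal sketch of the closed form on synthetic data; in practice one solves the linear system directly rather than forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 3))])  # bias column + 3 features
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=100)

# Solve X^T X w = X^T y (normal equations)
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to w_true
```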
Geometry of Least Square • Minimize Sum of Square Loss ‖Xw − y‖^2 • Xw lies in the subspace spanned by the columns of X • the minimizer makes Xw the orthogonal projection of y onto that subspace, i.e. it minimizes the distance between y and the subspace
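A quick numeric check of the projection view, on synthetic data: at the least-squares solution the residual y − Xw is orthogonal to every column of X.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residual = y - X @ w_hat
print(X.T @ residual)  # ~0 in every coordinate: residual is orthogonal to col(X)
```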
Large-scale Regression • expensive computation: evaluating w = (X^T X)^{−1} X^T y is costly when the number of training data points n and the dimensionality d are both large (roughly O(n d^2) to form X^T X and O(d^3) to solve) • too many features, too many data points
Gradient Descent • Gradient Descent: repeat w ← w − η ∇L(w) = w − 2η X^T (Xw − y) until convergence • η is the step size • each iteration costs O(nd)
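A minimal gradient-descent sketch for the sum-of-squares loss; the step size eta and the iteration count are illustrative, not tuned values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 2.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=200)

w = np.zeros(5)
eta = 0.01
for _ in range(1000):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # average gradient of the square loss
    w -= eta * grad
print(w)  # approaches the least-squares solution
```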
Stochastic Gradient Descent • Stochastic Gradient Descent: repeat, pick an example i uniformly at random and update w ← w − η_t ∇ℓ_i(w) = w − 2 η_t (w^T x_i − y_i) x_i • η_t is the step size, typically decreasing with the iteration t • each iteration costs O(d)
Stochastic Gradient Descent • Stochastic Gradient Descent VS Gradient Descent • GD uses all n examples per update: exact gradient, O(nd) per iteration • SGD uses a single example per update: noisy gradient, O(d) per iteration, so it scales to large data sets but typically needs more iterations and a decaying step size
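A matching SGD sketch: each update touches one random example, so a step costs O(d) instead of O(nd). The 0.1/sqrt(t) step-size schedule is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 1000, 5
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, 2.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
for t in range(1, 20001):
    i = rng.integers(n)                     # pick one example uniformly at random
    grad_i = 2 * (X[i] @ w - y[i]) * X[i]   # gradient of the i-th square loss
    w -= (0.1 / np.sqrt(t)) * grad_i        # decaying step size
print(w)  # noisy but close to the least-squares solution
```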
Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Model Selection
Multi-task Learning • Predict multiple outputs • Example: predict the current house price and the house price two years from now
Multi-task Learning • predict multiple outputs from the same features • stack the K targets into a matrix Y (one column per task) and the weights into a matrix W: minimize ‖XW − Y‖_F^2 • the problem decouples across tasks, with solution W = (X^T X)^{−1} X^T Y
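A minimal multi-output sketch on synthetic data: np.linalg.lstsq accepts a matrix of targets, so all tasks are solved in one call.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4))       # shared features
W_true = rng.normal(size=(4, 2))    # 2 tasks, e.g. price now and price in 2 years
Y = X @ W_true + 0.1 * rng.normal(size=(100, 2))

W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # one column of W per task
print(np.allclose(W_hat, W_true, atol=0.1))    # True
```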
Content • Linear Regression with one variable • Probability Interpretation • Linear Basis Function Models • Optimization • Multiple Outputs • Regularization and Lasso • Bias and Variance Tradeoff • Model Selection
Over-fitting • Root-Mean-Square (RMS) Error: E_RMS = sqrt((1/N) Σ_i (h(x_i) − y_i)^2) • a flexible model can drive the training error toward zero while the test error grows [Figure: training vs. test RMS error as the polynomial degree increases]
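A sketch of the over-fitting pattern on synthetic data (a noisy sine curve, an assumption for illustration): training RMS error keeps falling with the polynomial degree, while test RMS error eventually rises.

```python
import numpy as np

rng = np.random.default_rng(6)
f = lambda x: np.sin(2 * np.pi * x)
x_train = rng.uniform(0, 1, 10)
y_train = f(x_train) + 0.2 * rng.normal(size=10)
x_test = rng.uniform(0, 1, 100)
y_test = f(x_test) + 0.2 * rng.normal(size=100)

def rms(x, y, coeffs):
    return np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    print(degree, rms(x_train, y_train, coeffs), rms(x_test, y_test, coeffs))
```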
Avoid Over-fitting: Regularization • Consider the error function: loss term + regularization term, with regularization parameter λ • with the sum-of-squares error function and a quadratic regularizer, we get E(w) = (1/2) Σ_i (w^T x_i − y_i)^2 + (λ/2) ‖w‖^2 • which is minimized by w = (λI + X^T X)^{−1} X^T y • this is Ridge Regression
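A minimal ridge-regression sketch on synthetic data; adding λI shrinks the weights and also keeps the system solvable even when X^T X is ill-conditioned.

```python
import numpy as np

def ridge_fit(X, y, lam):
    d = X.shape[1]
    # Closed form w = (lam*I + X^T X)^{-1} X^T y, via a linear solve
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 10))
y = X[:, 0] + 0.1 * rng.normal(size=30)  # only the first feature matters

print(np.linalg.norm(ridge_fit(X, y, 0.0)))   # unregularized weight norm
print(np.linalg.norm(ridge_fit(X, y, 10.0)))  # shrunk toward zero
```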
Analytical Explanation • See homework
Probability Interpretation • Maximum a Posteriori (MAP) Estimation • Bayes' Theorem: posterior of model ∝ data likelihood × prior of model, i.e. p(w | D) ∝ p(D | w) p(w) • Ridge regression maximizes a posterior distribution
Probability Interpretation • Maximum a Posteriori (MAP) Estimation • the prior distribution is a Gaussian distribution, p(w) = N(0, τ^2 I) • maximizing log p(D | w) + log p(w) then amounts to minimizing Σ_i (w^T x_i − y_i)^2 + (σ^2/τ^2) ‖w‖^2, i.e. ridge regression with λ = σ^2/τ^2
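A numeric check of the MAP-ridge correspondence under the stated Gaussian assumptions: numerically maximizing the log posterior recovers the ridge closed form with λ = σ^2/τ^2. The variance constants and the use of scipy's general-purpose optimizer are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

sigma2, tau2 = 0.25, 1.0   # noise and prior variances (assumed)
lam = sigma2 / tau2        # implied ridge parameter

rng = np.random.default_rng(8)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + np.sqrt(sigma2) * rng.normal(size=40)

# Negative log posterior (up to constants): SSE/(2 sigma^2) + ||w||^2/(2 tau^2)
neg_log_post = lambda w: (np.sum((y - X @ w) ** 2) / (2 * sigma2)
                          + np.sum(w ** 2) / (2 * tau2))
w_map = minimize(neg_log_post, np.zeros(3)).x

# Ridge closed form with lambda = sigma^2 / tau^2
w_ridge = np.linalg.solve(lam * np.eye(3) + X.T @ X, X.T @ y)
print(np.allclose(w_map, w_ridge, atol=1e-3))  # True
```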