Regression
AMCS/CS 340: Data Mining
Xiangliang Zhang
King Abdullah University of Science and Technology
Outline
• What is regression?
• Minimizing sum-of-squares error
• Linear regression
• Nonlinear regression
• Statistical models for regression
• Overfitting and cross validation for regression
Classification (reminder): X → Y
• X can be anything:
  • continuous (ℝ, ℝ^d, …)
  • discrete ({0,1}, {1,…,k}, …)
  • structured (tree, string, …)
  • …
• Y is discrete:
  • {0,1}: binary
  • {1,…,k}: multi-class
  • tree, etc.: structured
Regression: X → Y
• X can be anything:
  • continuous (ℝ, ℝ^d, …)
  • discrete ({0,1}, {1,…,k}, …)
  • structured (tree, string, …)
  • …
• Y is continuous: ℝ, ℝ^d
Examples
• Data → Prediction or forecast:
  • Processes, memory → Power consumption
  • Protein structure → Energy
  • Heart-beat rate, age, speed, duration → Fat
  • Oil supply, consumption, etc. → Oil price
  • …
Linear regression
[3-D scatter plot: temperature as a function of two input variables]
Given examples (x_i, y_i), i = 1, …, n, and a new point x, predict its value y.
Linear regression
[3-D plot: plane fitted to the temperature data, with predictions shown for new points]
Outline
• What is regression?
• Minimizing sum-of-squares error
• Linear regression
• Nonlinear regression
• Statistical models for regression
• Overfitting and cross validation for regression
Sum-of-Squares Error Function
Error or "residual": the difference between the observation y_n and the prediction f(x_n, w).
Fit the model by minimizing the sum-of-squares error
E(w) = (1/2) Σ_{n=1}^{N} ( f(x_n, w) − y_n )²
Sum-of-Squares Error Function
Minimize E(w) to find w*:
• E(w) is a quadratic function of w
• so the derivative of E(w) w.r.t. w is linear in w
• setting the derivative to zero therefore gives a unique solution minimizing E(w)
Least Squares
[plot: data points, fitted line, and the vertical residuals between them]
Error or "residual": the gap between observation y_n and prediction f(x_n, w).
Estimate the unknown parameter w by minimizing the sum squared error E(w) = (1/2) Σ_{n=1}^{N} ( f(x_n, w) − y_n )².
Minimize the sum squared error
For the linear model f(x, w) = w^T x, setting the derivative of the sum squared error to zero,
∂E/∂w = Σ_{n=1}^{N} (w^T x_n − y_n) x_n = 0,
gives the closed-form solution w* = (X^T X)^{−1} X^T y, where X stacks the x_n as rows.
Predict: ŷ = w*^T x.
http://www.lri.fr/~xlzhang/KAUST/CS340_slides/linear_regression_demo1.m
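The linked file is a MATLAB demo; as a rough illustration of the same closed-form fit, here is a minimal Python sketch (the synthetic data and variable names are my own, not taken from the demo):

```python
import numpy as np

# Synthetic 1-D data: y = 2x + 1 plus noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=50)

# Design matrix with a constant column so w[0] acts as the bias
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution w* = (X^T X)^{-1} X^T y
# (np.linalg.lstsq solves the same normal equations, but more stably)
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict at a new point
x_new = 4.0
y_hat = w[0] + w[1] * x_new
print(f"w = {w}, prediction at x = {x_new}: {y_hat:.2f}")
```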
Linear regression models
Generally,
f(x, w) = Σ_{j=0}^{M−1} w_j φ_j(x) = w^T φ(x),
where the φ_j(x) are known as basis functions. Typically φ_0(x) = 1, so that w_0 acts as a bias.
e.g. φ(x) = [1, x, x², x³]^T
Linear regression models
Example: polynomial curve fitting,
f(x, w) = w_0 + w_1 x + w_2 x² + … + w_M x^M
Basis Functions
• Polynomial basis functions: φ_j(x) = x^j. These are global; a small change in x affects all basis functions.
• Gaussian basis functions: φ_j(x) = exp( −(x − μ_j)² / (2s²) )
• Sigmoidal basis functions: φ_j(x) = σ( (x − μ_j) / s ), where σ(a) = 1 / (1 + e^{−a})
Gaussian and sigmoidal basis functions are local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (width or slope).
Minimize the sum squared error
With basis functions, f(x, w) = w^T φ(x), and setting ∂E/∂w = 0 gives
w* = (Φ^T Φ)^{−1} Φ^T y,
where Φ is the design matrix whose n-th row is φ(x_n)^T.
Predict: ŷ = w*^T φ(x).
http://www.lri.fr/~xlzhang/KAUST/CS340_slides/linear_regression_demo2.m
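Again the linked demo is MATLAB; a minimal Python sketch of a basis-function fit follows (the polynomial basis, degree, and noisy-sine data are illustrative choices, not necessarily what the demo uses):

```python
import numpy as np

def design_matrix(x, degree):
    """Polynomial basis: columns are x^0, x^1, ..., x^degree."""
    return np.column_stack([x**j for j in range(degree + 1)])

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, size=30))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=30)  # noisy sine

Phi = design_matrix(x, degree=3)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # w* = (Phi^T Phi)^{-1} Phi^T y

# Predict at a new point by evaluating the same basis functions there
y_hat = design_matrix(np.array([0.25]), 3) @ w
print("prediction at x = 0.25:", y_hat[0])
```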
Batch gradient descent algorithm (1)
If the derivative of E(w) w.r.t. w is not linear in w, there is no closed-form solution; instead, minimize E(w) iteratively with batch gradient descent.
Batch gradient descent algorithm (2)
Batch gradient descent example: update the value of w according to the gradient, stepping downhill,
w ← w − η ∇E(w),
where η is the learning rate.
Batch gradient descent algorithm (3)
Batch gradient descent example: the updates iteratively approach the minimum of the error function.
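A minimal sketch of batch gradient descent on the sum-of-squares error (the learning rate, iteration count, and data are illustrative choices, not values from the lecture):

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.01, n_iters=5000):
    """Minimize E(w) = 0.5 * sum((X @ w - y)^2) by batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)  # gradient of E over the whole batch
        w -= lr * grad            # step against the gradient
    return w

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.uniform(0, 1, 100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=100)
print(batch_gradient_descent(X, y))  # converges to roughly [1.0, 2.0]
```

For a linear model this of course recovers the same answer as the closed form; gradient descent earns its keep when ∇E(w) is nonlinear in w.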
Statistical models for regression
• Machine learning paradigms
  • Regression tree (CART)
  • Neural networks
  • Support vector machine
• Strength
  • flexibility within a nonparametric, assumption-free framework that accommodates big data
• Weakness
  • difficulty in interpreting the effects of each covariate
Regression Tree
[figure: a decision tree that splits on input attributes; each leaf predicts a target value]
http://chem-eng.utoronto.ca/~datamining/dmc/decision_tree_reg.htm
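As an illustration (not from the slides), a regression tree can be fit with scikit-learn's DecisionTreeRegressor; the data and depth here are arbitrary:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

tree = DecisionTreeRegressor(max_depth=3)  # shallow tree: piecewise-constant fit
tree.fit(X, y)
print(tree.predict([[2.5]]))  # mean target value of the leaf containing x = 2.5
```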
SVM Regression
Given training data (x_1, y_1), …, (x_n, y_n).
Find f(x) = w^T x + b that optimally describes the data: minimize
(1/2) ‖w‖² + C Σ_{i=1}^{n} (ξ_i + ξ_i*)   (flatness + sum of errors)
Subject to:
y_i − (w^T x_i + b) ≤ ε + ξ_i
(w^T x_i + b) − y_i ≤ ε + ξ_i*
ξ_i, ξ_i* ≥ 0
[figure: ε-tube around the fitted function; the points on or outside the tube are the "support vectors"]
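A hedged sketch of SVM regression using scikit-learn's SVR (kernel, C, and ε here are illustrative defaults, not the lecture's settings):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=100)

# epsilon sets the width of the error-free tube; C trades flatness vs. errors
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X, y)

print("number of support vectors:", len(svr.support_))
print("prediction at x = 2.5:", svr.predict([[2.5]]))
```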
NN Regression
Train the network weights w to minimize the sum-of-squares error
E(w) = (1/2) Σ_{n=1}^{N} ( f(x_n, w) − y_n )²,
where f(x_n, w) is the network's output for input x_n.
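A minimal sketch with scikit-learn's MLPRegressor, whose default loss is the squared error above (architecture and data are my own illustrative choices):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
X = rng.uniform(0, 5, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# One small hidden layer, trained by gradient-based minimization of E(w)
nn = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
nn.fit(X, y)
print("prediction at x = 2.5:", nn.predict([[2.5]]))
```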
Outline
• What is regression?
• Minimizing sum-of-squares error
• Overfitting and cross validation for regression
Regression overfitting
[figure: several candidate fits of increasing complexity to the same data]
• Which one is the best?
• The one with the best fit to the data?
• How well is it going to predict future data drawn from the same distribution?
Outline
• What is regression?
• Minimizing sum-of-squares error
• Overfitting and cross validation for regression
  • Test set method
  • Leave-one-out
  • K-fold cross validation
The test set method
1. Randomly choose about 30% of the data to be the test set.
2. The remainder is the training set.
3. Fit the regression model on the training set only.
4. Estimate future performance by the error (e.g. MSE) on the test set.
The test set method
• Pros
  • very simple
  • can then simply choose the method with the best test-set score
• Cons
  • wastes data: we get an estimate of the best method to apply to 30% less data
  • if we don't have much data, our test set might just be lucky or unlucky ("the test-set estimator of performance has high variance")
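A minimal Python sketch of the test set method (the 70/30 split follows the slide; the data and linear model are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 1, size=100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=100)

# Randomly hold out 30% of the points as a test set
idx = rng.permutation(100)
train, test = idx[:70], idx[70:]

# Fit a line on the training set only
Phi = lambda x: np.column_stack([np.ones_like(x), x])
w, *_ = np.linalg.lstsq(Phi(x[train]), y[train], rcond=None)

# Estimate future performance by MSE on the held-out test set
mse_test = np.mean((Phi(x[test]) @ w - y[test]) ** 2)
print(f"test-set MSE: {mse_test:.3f}")
```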
Outline
• What is regression?
• Minimizing sum-of-squares error
• Overfitting and cross validation for regression
  • Test set method
  • Leave-one-out
  • K-fold cross validation
LOOCV (Leave-one-out Cross Validation)
[figure, built up over the next slides: for each data point in turn, hold that point out, fit on the remaining points, and measure the error at the held-out point]
LOOCV (Leave-one-out Cross Validation)
When you have done all points, report the mean error.
LOOCV for Linear Regression
For k = 1 to N:
1. Let (x_k, y_k) be the k-th record.
2. Temporarily remove (x_k, y_k) from the dataset.
3. Train on the remaining N−1 data points.
4. Note your error on (x_k, y_k).
When you've done all points, report the mean error.
MSE_LOOCV = 2.12
LOOCV for Quadratic Regression
For k = 1 to N:
1. Let (x_k, y_k) be the k-th record.
2. Temporarily remove (x_k, y_k) from the dataset.
3. Train on the remaining N−1 data points.
4. Note your error on (x_k, y_k).
When you've done all points, report the mean error.
MSE_LOOCV = 0.962
LOOCV for Join the Dots
For k = 1 to N:
1. Let (x_k, y_k) be the k-th record.
2. Temporarily remove (x_k, y_k) from the dataset.
3. Train on the remaining N−1 data points.
4. Note your error on (x_k, y_k).
When you've done all points, report the mean error.
MSE_LOOCV = 3.33
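A sketch of the LOOCV loop above for linear regression (the data here are illustrative; the MSE values on the slides come from the lecture's own dataset):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 1, size=20)
y = 2 * x + rng.normal(scale=0.3, size=20)
N = len(x)

Phi = lambda x: np.column_stack([np.ones_like(x), x])

errors = []
for k in range(N):
    mask = np.arange(N) != k                      # hold out the k-th record
    w, *_ = np.linalg.lstsq(Phi(x[mask]), y[mask], rcond=None)
    y_hat = w[0] + w[1] * x[k]                    # predict the held-out point
    errors.append((y_hat - y[k]) ** 2)

print(f"MSE_LOOCV = {np.mean(errors):.3f}")       # mean error over all N folds
```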
Which validation method should we use?
k-fold cross validation gets the best of both worlds.
Outline
• What is regression?
• Minimizing sum-of-squares error
• Overfitting and cross validation for regression
  • Test set method
  • Leave-one-out
  • K-fold cross validation
k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k = 3 partitions, colored Red, Green and Blue).
For the red partition: train on all the points not in the red partition; find the test-set sum of errors on the red points.
For the green partition: train on all the points not in the green partition; find the test-set sum of errors on the green points.
For the blue partition: train on all the points not in the blue partition; find the test-set sum of errors on the blue points.
k-fold Cross Validation
Then report the mean error:
• Linear Regression: MSE_3FOLD = 2.05
• Quadratic Regression: MSE_3FOLD = 1.11
• Join the Dots: MSE_3FOLD = 2.93
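A minimal k-fold sketch matching the procedure above, with k = 3 as in the example (data and model are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(0, 1, size=30)
y = 2 * x + rng.normal(scale=0.3, size=30)

Phi = lambda x: np.column_stack([np.ones_like(x), x])

k = 3
folds = np.array_split(rng.permutation(len(x)), k)  # random partition into k folds

errors = []
for test_idx in folds:
    train_mask = ~np.isin(np.arange(len(x)), test_idx)  # train on the other folds
    w, *_ = np.linalg.lstsq(Phi(x[train_mask]), y[train_mask], rcond=None)
    errors.extend((Phi(x[test_idx]) @ w - y[test_idx]) ** 2)

print(f"MSE_3FOLD = {np.mean(errors):.3f}")  # mean error over all held-out points
```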
CV-based Regression Algorithm Choice
• Choosing which regression algorithm to use:
• Step 1: compute the 10-fold-CV error for six different models.
• Step 2: whichever algorithm gave the best CV score: train it with all the data, and that's the predictive model you'll use.
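A sketch of this selection procedure with scikit-learn (the three candidate models here are my own illustrative picks, not the six from the lecture):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=3),
    "svr": SVR(kernel='rbf'),
}

# Step 1: 10-fold-CV error for each candidate model
scores = {name: -cross_val_score(m, X, y, cv=10,
                                 scoring='neg_mean_squared_error').mean()
          for name, m in candidates.items()}
print(scores)

# Step 2: refit the best-scoring model on ALL the data
best = min(scores, key=scores.get)
final_model = candidates[best].fit(X, y)
print("chosen model:", best)
```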