Machine Learning and Data Mining: Regression (adapted from Prof. Alexander Ihler)
Overview
• Regression problem definition and parameters ϴ
• Prediction using ϴ
• Measuring the error
• Finding good parameters ϴ (direct minimization)
• Non-linear regression
Example of Regression
• Vehicle price estimation problem
• Features x: fuel type, number of doors, engine size
• Targets y: price of the vehicle
• Training data: used to fit the model
• Testing data: used to evaluate the model
Example of Regression
• Vehicle price estimation with ϴ = [-300, -200, 700, 130]
• Ex #1: -300 + -200*1 + 700*4 + 130*70 = 11400
• Ex #2: -300 + -200*1 + 700*2 + 130*120 = 16500
• Ex #3: -300 + -200*2 + 700*2 + 130*310 = 41000
• Mean absolute training error = 1/3 * (400 + 500 + 0) = 300
• Test example: -300 + -200*2 + 700*4 + 130*150 = 21600
• Mean absolute testing error = 1/1 * (600) = 600
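A minimal Matlab sketch of the arithmetic above. The feature values are read off the sums on this slide; the actual prices y come from the slide's data table, which is not reproduced here, so the error computation is left as comments.

(Matlab)
th = [-300 -200 700 130];        % parameters [th0 th1 th2 th3]
Xtrain = [1 1 4  70;             % Ex #1: x0 = 1 plus three features
          1 1 2 120;             % Ex #2
          1 2 2 310];            % Ex #3
yhat = Xtrain * th';             % predictions: 11400, 16500, 41000
% y = ...;                       % actual prices (data table not shown)
% mae = mean(abs(y - yhat));     % would reproduce the 300 above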
Supervised learning
• Notation:
• Features x (input variables)
• Targets y (output variables)
• Predictions ŷ
• Parameters ϴ
• Error = distance between y and ŷ
[Diagram: training data (features plus target values) feed the learning algorithm, which changes the parameters ϴ of the program (the "learner"); the learner's procedure outputs a prediction, and evaluating the model (measuring the error) feeds back to improve performance.]
Overview
• Regression problem definition and parameters ϴ
• Prediction using ϴ
• Measuring the error
• Finding good parameters ϴ (direct minimization)
• Non-linear regression
Linear regression
• Define the form of the function f(x) explicitly
• Find a good f(x) within that family
• "Predictor": evaluate the line ŷ = ϴ0 + ϴ1*x1 and return ŷ
• ŷ = predicted target value (the black line in the plot)
[Plot: target y versus feature x1 with the fitted line y = 5 + 1.5*x1; a new instance with x1 = 8 gets predicted value 17.]
More dimensions?
[3-D plots: target y over two features x1 and x2; the fitted model is a plane.]
Notation
• ŷ = ϴ0 + ϴ1*x1 + ... + ϴn*xn = ϴ · x
• Define "feature" x0 = 1 (constant), so the prediction is a single dot product ϴ · x
• ŷ is then a (hyper)plane in (n+1)-dimensional space
• n = the number of features in the dataset
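A small sketch of the x0 = 1 convention, assuming Xraw holds the m raw feature rows and th is the 1 x (n+1) parameter row vector:

(Matlab)
m = size(Xraw, 1);
X = [ones(m,1) Xraw];    % m x (n+1): first column is the constant x0 = 1
yhat = X * th';          % yhat(i) = th0 + th1*x1 + ... + thn*xn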
Overview
• Regression problem definition and parameters ϴ
• Prediction using ϴ
• Measuring the error
• Finding good parameters ϴ (direct minimization)
• Non-linear regression
Measuring error
• Red points = real target values y
• Black line = predicted value ŷ = ϴ0 + ϴ1*x
• Blue lines = error, or "residual": the difference between the observation y and the prediction ŷ
[Plot: observations scattered around the fitted line, with vertical residual segments between each point and the line.]
Mean Squared Error
• How can we quantify the error?
• y = real target value in the dataset; ŷ = ϴ · x = predicted target value
• Training error: J(ϴ) = 1/m * Σ (ŷ(i) - y(i))², where m = the number of training instances
• Testing error: the same average computed on a held-out partition of the data to check the predicted values; there m = the number of testing instances
MSE cost function
• Rewrite using matrix form: J(ϴ) = 1/m * (y' - ϴX')(y' - ϴX')'
• X = input variables in the dataset (m x (n+1), with x0 = 1)
• y = output variable in the dataset (m x 1)
• m = number of data instances; n = the number of features
(Matlab) >> e = y' - th*X'; J = e*e'/m;
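A self-contained check of the one-liner above, on made-up toy data:

(Matlab)
X  = [1 2; 1 4; 1 6];    % m = 3 instances: x0 = 1 plus one feature
y  = [8; 12; 17];        % toy targets
th = [5 1.5];            % parameters [th0 th1]
m  = size(X, 1);
e  = y' - th*X';         % 1 x m residuals: [0 1 3]
J  = e*e'/m;             % mean squared error: 10/3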
Visualizing the error function
• J(ϴ) is the error function: the surface plotted is the value of J, not a plane fitted to the output values
• The dimensions are ϴ0 and ϴ1 instead of x1 and x2, and the output is J instead of the target value y
[Left: 3-D surface of J(ϴ) over (ϴ0, ϴ1). Right: the same J drawn as contours in the (ϴ0, ϴ1) plane; inner red contours have lower values of J, outer red contours have higher values.]
Overview
• Regression problem definition and parameters ϴ
• Prediction using ϴ
• Measuring the error
• Finding good parameters ϴ (direct minimization)
• Non-linear regression
Finding good parameters
• We want to find the parameters ϴ which minimize our error...
• Think of a cost "surface": the error residual for every possible ϴ
MSE Minimum (m <= n+1)
• Consider a simple problem: one feature, two data points
• Two unknowns and two equations: n+1 = 1+1 = 2 unknowns, m = 2 equations
• y(1) = ϴ0 + ϴ1*x(1) and y(2) = ϴ0 + ϴ1*x(2)
• We can solve this system directly: ϴ gives a line (or plane) that fits all the target values exactly
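A sketch of the exact solve, with two made-up points taken from the line y = 5 + 1.5*x:

(Matlab)
X  = [1 2; 1 10];        % square system: m = n+1 = 2
y  = [8; 20];
th = (X \ y)';           % solves X*th' = y exactly: th = [5 1.5]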
SSE Minimum (m > n+1)
• Most of the time m > n+1: there may be no linear function that hits all the data exactly (e.g. n+1 = 2 unknowns, m = 3 equations)
• At a minimum of the cost, the gradient is equal to zero (the tangent plane is horizontal)
• Setting the gradient of J(ϴ) to zero and reordering, we have the normal equations: ϴ' = (X'X)^-1 X'y
• We just need to know how to compute the parameters
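A sketch of the least-squares solve for m > n+1, on made-up data; the first line is the normal-equation form above, and the commented backslash form gives the same minimizer with better numerical behavior:

(Matlab)
X  = [1 2; 1 4; 1 6];            % m = 3 equations, n+1 = 2 unknowns
y  = [8; 13; 14];                % toy targets: no exact linear fit
th = (inv(X'*X) * X' * y)';      % normal equations
% th = (X \ y)';                 % same minimizer, better conditioned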
Effects of the Mean Square Error choice
• Outlier data: an outlier is an observation that lies an abnormal distance from the other values
• The squared error puts a heavy penalty on large errors: a single datum with residual 16 contributes 16² = 256 to the cost
• This drags the fitted line away from the other points
[Plot: data with one outlier; the least-squares fit is pulled toward it.]
Absolute error
[Plot: fitted lines for MSE on the original data, absolute error on the original data, and absolute error on the outlier data.]
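A small numeric illustration of the two penalties, using made-up residuals; adding one residual of 16 multiplies the MSE by roughly 35 but the MAE only by 4:

(Matlab)
e     = [1 -2 1 0];              % residuals without the outlier
e_out = [e 16];                  % add one large residual
mean(e.^2)                       % MSE = 1.5
mean(e_out.^2)                   % MSE = 52.4 (the 16^2 term dominates)
mean(abs(e))                     % MAE = 1.0
mean(abs(e_out))                 % MAE = 4.0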
Something else entirely...
• Error functions for regression:
• Mean squared error: J(ϴ) = 1/m * Σ (ϴ·x(i) - y(i))²
• Mean absolute error: J(ϴ) = 1/m * Σ |ϴ·x(i) - y(i)|
• ... or something else entirely (???)
• "Arbitrary" error functions can't be minimized in closed form, so as an alternative we use gradient descent
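A minimal gradient-descent sketch for the MSE cost, assuming X (m x (n+1), with x0 = 1), y (m x 1), and a small hand-picked step size alpha:

(Matlab)
alpha = 0.01; iters = 1000;
m  = size(X, 1);
th = zeros(1, size(X, 2));       % start at th = 0
for k = 1:iters
  e    = X*th' - y;              % m x 1 residuals
  grad = (2/m) * (e' * X);       % gradient of J(th), 1 x (n+1)
  th   = th - alpha * grad;      % step downhill
end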
Overview
• Regression problem definition and parameters ϴ
• Prediction using ϴ
• Measuring the error
• Finding good parameters ϴ (direct minimization)
• Non-linear regression
Nonlinear functions
• Single feature x, predict target y
• Sometimes useful to think of a "feature transform": convert a non-linear function of x into a linear function of new features, then solve as before
• Add features x², x³, ... and do linear regression in the new features
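A sketch of a cubic feature transform, assuming x and y are m x 1 column vectors:

(Matlab)
Phi  = [ones(size(x)) x x.^2 x.^3];   % m x 4 transformed features
th   = (Phi \ y)';                    % plain linear least squares in Phi
yhat = Phi * th';                     % cubic predictions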
Higher-order polynomials
• Are more features better?
• "Nested" hypotheses: 2nd order is more general than 1st, 3rd order more general than 2nd, ...
• Higher orders fit the observed data better
[Panels: fits of increasing order: y = ϴ0 (constant), y = ϴ0 + ϴ1*x (1st order), y = ϴ0 + ϴ1*x + ϴ2*x² + ϴ3*x³ (3rd order), and an 18th-order fit.]
Test data
• After training the model, go out and get more data from the world: new observations (x, y)
• How well does our model perform on them?
[Plot: training data alongside new, "test" data.]
Training versus test error
• Plot the MSE as a function of model complexity (polynomial order)
• Training error decreases: a more complex function fits the training data better
• What about new data?
• 0th to 2nd order: test error decreases (underfitting)
• Higher order: test error increases (overfitting)
[Plot: mean squared error versus polynomial order for the training data and new "test" data; the test curve is U-shaped, underfitting on the left and overfitting on the right.]
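A sketch that traces both curves, assuming train/test column vectors xtr, ytr, xte, yte (the x .^ (0:p) broadcasting needs Matlab R2016b+ or Octave):

(Matlab)
for p = 0:10
  Phitr = xtr .^ (0:p);          % polynomial features, order 0..p
  Phite = xte .^ (0:p);
  th = Phitr \ ytr;              % fit on the training split only
  Jtr(p+1) = mean((Phitr*th - ytr).^2);
  Jte(p+1) = mean((Phite*th - yte).^2);
end
plot(0:10, Jtr, 0:10, Jte);      % test error is typically U-shaped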
Summary
• Regression problem definition: e.g. vehicle price estimation
• Prediction using ϴ
• Measuring the error: the difference between y and ŷ, e.g. absolute error or MSE
• Direct minimization: the two cases m <= n+1 and m > n+1
• Non-linear regression: finding the best nth-order polynomial for each problem (neither overfitting nor underfitting)