Probability & Statistics for Engineers & Scientists, by Walpole, Myers, Myers & Ye ~ Chapter 11 Notes. Class notes for ISE 201, San Jose State University, Industrial & Systems Engineering Dept. Steve Kennedy
Simple Linear Regression • If there is a linear relationship between an independent variable x and a dependent variable Y, then Y = α + βx + ε, where α is the intercept and β is the slope of the linear relationship, and ε is the random error, assumed to be normally distributed with mean 0 and variance σ². • Residuals: Given regression data points [(xi, yi), i = 1, 2, ..., n], if ŷi = a + bxi is the estimate of yi using the linear model, then the residual ei is given by ei = yi − ŷi. • The residual for each data point is the distance of the point from the line in the y direction. • We will use the "least squares" technique to minimize the sum of the squares of the residuals.
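As a quick illustration, here is a minimal Python/NumPy sketch that computes the residuals ei = yi − ŷi for a candidate line; the data points and the values of a and b are made up for illustration:

```python
import numpy as np

# Made-up data points (x_i, y_i) and an assumed candidate line y-hat = a + b*x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
a, b = 0.1, 2.0  # illustrative intercept and slope, not a fitted result

y_hat = a + b * x            # estimates of y on the line
residuals = y - y_hat        # vertical distance of each point from the line
sse = np.sum(residuals**2)   # sum of squared residuals, the quantity least squares minimizes
print(residuals, sse)
```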
Least Squares Method • We wish to find a and b to minimize the sum of the squares of the errors (residuals), SSE = Σ ei² = Σ (yi − a − bxi)². To minimize, differentiate with respect to a and b, and set each result to 0. This generates two simultaneous equations (called the normal equations) in the two unknowns. • Solving for a and b, we get b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and a = ȳ − bx̄. • a & b are the coefficients of the "best fit" straight line through the data points, the line that minimizes SSE.
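The closed-form estimates above translate directly into code. A minimal sketch, reusing the made-up data from before:

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form least-squares estimates a, b for the line y-hat = a + b*x."""
    x_bar, y_bar = x.mean(), y.mean()
    b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    a = y_bar - b * x_bar
    return a, b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
a, b = least_squares_fit(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")
```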
Coefficient of Determination (R²) • The coefficient of determination R² is a measure of the proportion of variability explained by the fitted model, and thus a measure of the quality of the linear fit. • Recall from the previous slide that SSE is the sum of the squares of the errors (residuals), the amount of variation left unexplained by the straight line. • SST = Σ(yi − ȳ)², the total sum of squares, is the total variability in the data. • Then R² (the square of the correlation coefficient) is defined as R² = 1 − SSE / SST. • R² tells us the percent of the total variation in the data explained by the straight-line relationship. • If R² ≈ 1, all points are very close to the line.
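Continuing the sketch, R² follows directly from SSE and SST; the fit uses the formulas from the previous slide, and the data is again made up:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit from the previous slide's formulas.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

y_hat = a + b * x
sse = np.sum((y - y_hat) ** 2)     # variation unexplained by the line
sst = np.sum((y - y.mean()) ** 2)  # total variation in the data
r2 = 1.0 - sse / sst
print(f"R^2 = {r2:.4f}")
```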
Data Transformations for Regression • If the relationship between the variables is other than linear, we can first transform the dependent variable, the independent variable, or both, and then perform a linear regression on the transformed variables. If, for example, we have: • Exponential: If y = αe^(βx), use y* = ln y, and regress y* against x. • Power: If y = αx^β, use y* = ln y and x* = ln x, and regress y* against x*. • Reciprocal: If y = α + β(1/x), use x* = 1/x, and regress y against x*. • Hyperbolic: If y = x/(α + βx), use y* = 1/y and x* = 1/x, and regress y* against x*.
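As an example of the transformation idea, here is a sketch fitting the exponential model y = αe^(βx) by regressing y* = ln y on x; the data values are invented, chosen to lie near y = e^x:

```python
import numpy as np

# Made-up data, roughly following y = e^x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.7, 7.4, 20.1, 54.6])

# Transform: ln y = ln(alpha) + beta*x, which is linear in x.
y_star = np.log(y)
b = np.sum((x - x.mean()) * (y_star - y_star.mean())) / np.sum((x - x.mean()) ** 2)
a = y_star.mean() - b * x.mean()

alpha, beta = np.exp(a), b  # back-transform the intercept: a = ln(alpha)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```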
Multiple Linear Regression • In a multiple linear regression model, we have k independent variables, x1, x2, ..., xk. The model is Y = β0 + β1x1 + β2x2 + ··· + βkxk + ε. • The least-squares estimates of the coefficients can be calculated as with simple linear regression, except that there are k + 1 simultaneous normal equations to solve (use matrix inversion). • R² still describes the goodness of the linear relationship. • Multiple linear regression can also be used to calculate the least-squares coefficients for a polynomial model of the form y = β0 + β1x + β2x² + ··· + βrx^r by first calculating the square, cube, etc., of the independent variable and then doing a multiple linear regression.
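A sketch of the matrix approach, solving the normal equations (XᵀX)β = Xᵀy with NumPy for made-up data; the same trick handles the polynomial model by adding x² as an extra column:

```python
import numpy as np

# Made-up data with k = 2 independent variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([3.0, 1.0, 4.0, 1.5, 5.0])
y  = np.array([3.2, 5.1, 7.3, 8.9, 11.2])

# Design matrix X: a leading column of 1s for the intercept, then each regressor.
X = np.column_stack([np.ones_like(x1), x1, x2])
beta = np.linalg.solve(X.T @ X, X.T @ y)  # solve the k+1 normal equations

# Polynomial fit y = b0 + b1*x + b2*x^2: treat x^2 as just another "variable".
Xp = np.column_stack([np.ones_like(x1), x1, x1**2])
beta_poly = np.linalg.solve(Xp.T @ Xp, Xp.T @ y)
print(beta, beta_poly)
```

Using np.linalg.solve rather than explicitly inverting XᵀX is numerically safer while computing the same least-squares coefficients.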