Chapter 2 Simple Linear Regression

…Don’t be afraid of others, because they are bigger than you. The real size could be measured in the wisdom. ST3131, Lecture 2

In this chapter, we consider the simplest regression model, the Simple Linear Regression (SLR) model, which describes the linear relationship between Y and X.
Tasks:
1. Review some basic statistics.
2. Define measures of the direction and strength of the linear relationship between Y and X.
3. Find the formulas for the estimators $\hat\beta_0$ and $\hat\beta_1$.

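Review of Some Basic Statistics
Let Y and X have n observations $(y_1, x_1), \dots, (y_n, x_n)$.
Summary statistics:
• Mean: $\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$, the average of the observations of Y; a measure of the sample center of Y.
• Deviation: $y_i - \bar y$, the difference of an observation from $\bar y$.
• Variance: $\mathrm{Var}(Y) = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar y)^2$, the sum of squared deviations of Y divided by (n-1), i.e. roughly their average.
• Standard deviation: $s_y = \sqrt{\mathrm{Var}(Y)}$, a measure of the spread of Y.
• Standardization of Y: $z_i = (y_i - \bar y)/s_y$, $i = 1, \dots, n$.
• Similarly, we can define all these terms for X.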
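The summary statistics above can be computed directly. The sketch below is a minimal Python example, not part of the course material: it uses the standard `statistics` module, whose `variance` and `stdev` use the (n-1) divisor defined here, a hypothetical helper `summary`, and a hypothetical data list `y`. It also checks that the standardized values have mean 0 and variance 1.

```python
from statistics import mean, variance, stdev

def summary(values):
    """Return the mean, deviations, variance (divisor n-1), standard deviation
    and standardized values of one variable."""
    m = mean(values)
    deviations = [v - m for v in values]      # y_i - ybar
    var = variance(values)                    # sum of squared deviations / (n - 1)
    sd = stdev(values)                        # square root of the variance
    standardized = [d / sd for d in deviations]
    return m, deviations, var, sd, standardized

if __name__ == "__main__":
    y = [2.0, 4.0, 4.0, 5.0, 7.0]             # hypothetical observations
    m, dev, var, sd, z = summary(y)
    print("mean:", m, "variance:", var, "sd:", round(sd, 4))
    # Standardized values should have mean 0 and variance 1 (up to rounding).
    print("mean(z):", mean(z), "var(z):", variance(z))
```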

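Properties of Standardized Variables
Properties: for the standardized values $z_i = (y_i - \bar y)/s_y$, $\bar z = 0$ and $\mathrm{Var}(Z) = 1$.
Proof: $\bar z = \frac{1}{n s_y}\sum_{i=1}^{n}(y_i - \bar y) = 0$, since the deviations sum to zero; and
$$\mathrm{Var}(Z) = \frac{1}{n-1}\sum_{i=1}^{n}(z_i - \bar z)^2 = \frac{1}{(n-1)s_y^2}\sum_{i=1}^{n}(y_i - \bar y)^2 = \frac{\mathrm{Var}(Y)}{s_y^2} = 1.$$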

Exact Linear Relationship between Y and X
Given $Y = \beta_0 + \beta_1 X$, consider the linear relationship between Y and X:
Case 1) Positive: as X increases, Y increases, when $\beta_1 > 0$;
2) Negative: as X increases, Y decreases, when $\beta_1 < 0$;
3) Linearly uncorrelated: as X changes, Y does NOT change, when $\beta_1 = 0$.
(Figure: three panels with $\beta_0 = 1$ and $\beta_1 = -1$, $0$, $1$.)
Conclusion: the sign of $\beta_1$ is an indicator of the direction in which Y changes with X.

Distorted Linear Relationship between Y and X
Given $Y = \beta_0 + \beta_1 X + \varepsilon$, consider the linear relationship between Y and X:
Case 1) Positive: as X increases, Y tends to increase, when $\beta_1 > 0$;
2) Negative: as X increases, Y tends to decrease, when $\beta_1 < 0$;
3) Linearly uncorrelated: as X changes, Y shows no systematic change, when $\beta_1 = 0$.
(Figure: three scatter plots illustrating these cases, each with $\beta_0 = 1$.)
Conclusion: the sign of $\beta_1$ is also an indicator of the direction in which Y changes with X.

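Intuitive Derivation of the LS-estimators
Sample covariance of Y and X: the sum of cross-products of the deviations of Y and X, divided by (n-1),
$$\mathrm{Cov}(Y,X) = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar y)(x_i - \bar x).$$
Intuitive derivation: we have $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, thus $\bar y = \beta_0 + \beta_1 \bar x + \bar\varepsilon$ and
$$y_i - \bar y = \beta_1(x_i - \bar x) + (\varepsilon_i - \bar\varepsilon).$$
Multiplying both sides by $(x_i - \bar x)$, summing over i and dividing by (n-1) gives $\mathrm{Cov}(Y,X) = \beta_1 \mathrm{Var}(X) + \mathrm{Cov}(\varepsilon, X)$. Thus, if $\mathrm{Cov}(\varepsilon, X) \approx 0$ and $\bar\varepsilon \approx 0$,
$$\hat\beta_1 = \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x.$$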

Formulas for the LS-estimators
Assume $\mathrm{Cov}(\varepsilon, X) = 0$ and $\bar\varepsilon = 0$. The LS-estimators are
$$\hat\beta_1 = \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x.$$
Since $\mathrm{Var}(X) > 0$, Cov(Y,X) has the same sign as $\hat\beta_1$. Thus Cov(Y,X) is also an indicator of the direction of the linear relationship between Y and X:
Case 1) Positive when Cov(Y,X) > 0; 2) Negative when Cov(Y,X) < 0; 3) Uncorrelated when Cov(Y,X) = 0.
Summary: indicators of the direction of the linear relationship between Y and X:
1) the slope $\beta_1$; 2) the slope estimator $\hat\beta_1$; 3) Cov(Y,X).

Properties of Cov(Y,X)
• 1. Symmetric, i.e., Cov(Y,X) = Cov(X,Y).
• 2. Scale-dependent, i.e., when the scale of Y or X changes, so does their covariance. Let $Y_1 = a + bY$, $X_1 = c + dX$. Then $\mathrm{Cov}(Y_1, X_1) = bd\,\mathrm{Cov}(Y, X)$.
• 3. Takes values from $-\infty$ to $+\infty$, since b and d can take any values. Thus Cov(Y,X) does not measure the strength of the linear relationship between Y and X.

Correlation Coefficient between Y and X
The correlation coefficient of Y and X is defined as the covariance of the standardized Y and X, i.e., the covariance of Y and X divided by their standard deviations $s_y$ and $s_x$:
$$\mathrm{Cor}(Y,X) = \frac{1}{n-1}\sum_{i=1}^{n}\frac{y_i - \bar y}{s_y}\cdot\frac{x_i - \bar x}{s_x} = \frac{\mathrm{Cov}(Y,X)}{s_y s_x}.$$
Clearly Cor(Y,X) and Cov(Y,X) have the same sign, so Cor(Y,X) is also an indicator of the direction of the linear relationship between Y and X:
1) Positive when Cor(Y,X) > 0; 2) Negative when Cor(Y,X) < 0; 3) Linearly uncorrelated when Cor(Y,X) = 0.

Properties of Cor(Y,X)
• 1. Symmetric, i.e., Cor(Y,X) = Cor(X,Y).
• 2. Scale-invariant, i.e., it does not change when the scales of Y and X change. Let $Y_1 = a + bY$, $X_1 = c + dX$ with $b > 0$, $d > 0$. Then $\mathrm{Cor}(Y_1, X_1) = \mathrm{Cor}(Y, X)$.
• 3. Takes values between -1 and 1.
The strength of the linear relationship:
1) strong when |Cor(Y,X)| is close to 1;
2) weak when |Cor(Y,X)| is close to 0;
3) linearly uncorrelated when Cor(Y,X) = 0, but Y and X can still have some (nonlinear) relationship. Counterexample: the top-right picture, where $Y = 2 - \cos(6.28X)$ (a perfect nonlinear relationship) while Cor(Y,X) = 0.

Examples of Correlation Coefficients
Cor(Y,X) = .71: strong linearity; Cor(Y,X) = -.09: nearly uncorrelated; Cor(Y,X) = .98: very strong linearity.
Robustness: both Cov(Y,X) and Cor(Y,X) are NOT robust statistics, since their values can be strongly affected by a few outliers. Example: Anscombe's quartet (see next slide) consists of four data sets with the same summary statistics but quite different pictures: (a) can be described by a linear model; (b) can be described by a quadratic model; (c) and (d) each contain an outlier.

Anscombe's Quartet (figure): (a) strong linearity; (b) strong nonlinearity; (c) an outlier appears; (d) an outlier appears.

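Table for Computing Variance and Covariance
Note that
• Var(Y) = sum of squared deviations of Y divided by (n-1): $\mathrm{Var}(Y) = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar y)^2$;
• Var(X) = sum of squared deviations of X divided by (n-1): $\mathrm{Var}(X) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x)^2$;
• Cov(Y,X) = sum of products of deviations of Y and X divided by (n-1): $\mathrm{Cov}(Y,X) = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar y)(x_i - \bar x)$.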
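The table itself did not survive in this transcript. The sketch below builds the usual layout, assuming the columns are the observations, their deviations, the squared deviations and the cross-products (the column order and the helper name `deviation_table` are our assumptions), and then applies the three formulas above to the column sums.

```python
def deviation_table(y, x):
    """Rows (y, x, y-ybar, x-xbar, (y-ybar)^2, (x-xbar)^2, (y-ybar)(x-xbar)),
    plus Var(Y), Var(X) and Cov(Y,X) from the column sums divided by (n - 1)."""
    n = len(y)
    ybar, xbar = sum(y) / n, sum(x) / n
    rows = []
    for yi, xi in zip(y, x):
        dy, dx = yi - ybar, xi - xbar
        rows.append((yi, xi, dy, dx, dy * dy, dx * dx, dy * dx))
    var_y = sum(r[4] for r in rows) / (n - 1)
    var_x = sum(r[5] for r in rows) / (n - 1)
    cov_yx = sum(r[6] for r in rows) / (n - 1)
    return rows, var_y, var_x, cov_yx

if __name__ == "__main__":
    rows, var_y, var_x, cov_yx = deviation_table(
        [1.0, 3.0, 2.0, 5.0, 4.0], [2.0, 4.0, 3.0, 7.0, 6.0])  # hypothetical data
    print("y, x, dy, dx, dy^2, dx^2, dy*dx")
    for r in rows:
        print(", ".join(f"{v:.2f}" for v in r))
    print("Var(Y) =", var_y, " Var(X) =", var_x, " Cov(Y,X) =", cov_yx)
```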

Example: Computer Repair Data (Table 2.5, page 27; see Table 2.6, page 28, for the detailed computation).
Conclusion: Y and X are strongly linearly related.
Drawback: Cor(Y,X) cannot be used to predict Y values for given X values. This can be done with simple linear regression analysis.

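Strict Derivation of the LS-estimators
SLR model: $Y = \beta_0 + \beta_1 X + \varepsilon$.
• Intercept $\beta_0$ = the predicted value of Y when X = 0.
• Slope $\beta_1$ = the change in Y for a unit change in X.
Least squares method: find $\hat\beta_0$ and $\hat\beta_1$ to minimize the sum of squared errors (SSE):
$$\mathrm{SSE}(\beta_0, \beta_1) = \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2.$$
The minimizers are $\hat\beta_1 = \mathrm{Cov}(Y,X)/\mathrm{Var}(X)$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, the same as those obtained in the intuitive derivation.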

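Proof
Write $y_i - \beta_0 - \beta_1 x_i = [(y_i - \bar y) - \beta_1(x_i - \bar x)] + (\bar y - \beta_0 - \beta_1 \bar x)$. Because $\sum_{i}(y_i - \bar y) = \sum_{i}(x_i - \bar x) = 0$, the cross term vanishes and
$$\mathrm{SSE} = (n-1)\left[\mathrm{Var}(Y) - 2\beta_1 \mathrm{Cov}(Y,X) + \beta_1^2 \mathrm{Var}(X)\right] + n(\bar y - \beta_0 - \beta_1 \bar x)^2.$$
Completing the square in $\beta_1$,
$$\mathrm{SSE} = (n-1)\mathrm{Var}(X)\left[\beta_1 - \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)}\right]^2 + (n-1)\mathrm{Var}(Y)\left[1 - \mathrm{Cor}^2(Y,X)\right] + n(\bar y - \beta_0 - \beta_1 \bar x)^2 \ge (n-1)\mathrm{Var}(Y)\left[1 - \mathrm{Cor}^2(Y,X)\right].$$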

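Proof (continued)
Equality in $\mathrm{SSE} \ge (n-1)\mathrm{Var}(Y)[1 - \mathrm{Cor}^2(Y,X)]$ holds when
$$\beta_1 = \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)}, \qquad \beta_0 = \bar y - \beta_1 \bar x,$$
which are the least squares estimators $\hat\beta_1$ and $\hat\beta_0$ of the parameters $\beta_1$ and $\beta_0$. Since $\mathrm{SSE} \ge 0$, the minimum value $(n-1)\mathrm{Var}(Y)[1 - \mathrm{Cor}^2(Y,X)]$ is also nonnegative, and we obtain an important property of Cor(Y,X): $-1 \le \mathrm{Cor}(Y,X) \le 1$. Moreover, we have another important equation, where $s_y$ and $s_x$ are the standard deviations of Y and X:
$$\hat\beta_1 = \mathrm{Cor}(Y,X)\,\frac{s_y}{s_x}.$$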

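Example: Computer Repair Data (continued)
We have $\bar y \approx 97.21$, $\bar x = 6$, Cov(Y,X) = 136 and Var(X) ≈ 8.769 (so the standard deviation of X is about 2.96). Thus $\hat\beta_1 = \mathrm{Cov}(Y,X)/\mathrm{Var}(X) \approx 15.509$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x \approx 4.162$, and the LS regression line is $\hat y = 4.162 + 15.509x$.
The fitted values and residuals are then $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ and $e_i = y_i - \hat y_i$, $i = 1, \dots, n$.
In other words: Minutes = 4.162 + 15.509 × Units.
Using this formula, we can compute fitted (predicted) values, e.g. X = 4 gives 4.162 + 15.509 × 4 ≈ 66.20, and X = 11 gives 4.162 + 15.509 × 11 ≈ 174.76.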

Exercise
(1) Fill in the following table, then compute the mean, variance and standard deviation of Y and X.
(2) Compute the covariance and correlation of Y and X.
(3) Compute the simple linear regression coefficients.

Reading Assignment
• Review Sections 2.1-2.5 of Chapter 2.
• Read Sections 2.6-2.9 of Chapter 2, and consider the following problems:
• a) How do we perform significance tests of the parameters?
• b) How do we construct confidence intervals for the parameters?
• c) How do we make inferences about predictions?
