Don’t be afraid of others just because they are bigger than you. Real size is measured in wisdom. ST3131, Lecture 2
Chapter 2 Simple Linear Regression
In this chapter, we consider the simplest regression model, the Simple Linear Regression (SLR) model $Y = \beta_0 + \beta_1 X + \varepsilon$, which describes the linear relationship between Y and X. Tasks: 1. Review some basic statistics. 2. Define measures of the direction and strength of the linear relationship between Y and X. 3. Find the formulas for the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$.
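Review of Some Basic Statistics
Let Y and X have n observations $(y_1, x_1), (y_2, x_2), \dots, (y_n, x_n)$.
• Summary statistics:
• Mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, the average of the observations of Y; a measure of the sample center of Y.
• Deviation: $y_i - \bar{y}$, the difference between an observation and $\bar{y}$.
• Variance: $\mathrm{Var}(Y) = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$, the sum of squared deviations of Y divided by (n-1).
• Standard deviation: $s_y = \sqrt{\mathrm{Var}(Y)}$, a measure of the spread of Y.
• Standardization of Y: $z_i = \frac{y_i - \bar{y}}{s_y}$.
• All of these terms are defined similarly for X.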
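A minimal Python sketch of these summary statistics, using the (n-1) divisor for the variance as elsewhere in this lecture. The function name `summary_stats` and the sample values are hypothetical, chosen only for illustration.

```python
import math

def summary_stats(values):
    """Return the mean, deviations, variance (divisor n - 1) and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance = sum(d * d for d in deviations) / (n - 1)
    return mean, deviations, variance, math.sqrt(variance)

# Hypothetical toy sample, used only for illustration.
y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
mean_y, dev_y, var_y, sd_y = summary_stats(y)
print(mean_y, var_y, round(sd_y, 4))  # 5.0 4.8 2.1909
```

Note that the deviations always sum to zero, which is the fact used in the proof on the next slide.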
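Properties of Standardized Variables
Properties: $\bar{z} = 0$ and $\mathrm{Var}(Z) = 1$.
Proof: Since $\sum_{i=1}^{n}(y_i - \bar{y}) = 0$, we have $\bar{z} = \frac{1}{n s_y}\sum_{i=1}^{n}(y_i - \bar{y}) = 0$. Therefore $\mathrm{Var}(Z) = \frac{1}{n-1}\sum_{i=1}^{n} z_i^2 = \frac{1}{s_y^2}\cdot\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{s_y^2}{s_y^2} = 1$.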
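A quick numerical check of these two properties, as a sketch on a hypothetical toy sample. It uses Python's standard `statistics` module, whose `stdev` and `variance` use the (n-1) divisor, matching the definitions above.

```python
import statistics

def standardize(values):
    """Return z_i = (y_i - mean) / s_y, where s_y uses the (n - 1) divisor."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - mean) / s for v in values]

z = standardize([2.0, 4.0, 4.0, 5.0, 7.0, 8.0])  # hypothetical toy sample
# Up to floating-point rounding, the mean is 0 and the variance is 1.
print(statistics.mean(z), statistics.variance(z))
```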
Exact Linear Relationship between Y and X
Given $Y = \beta_0 + \beta_1 X$, consider the linear relationship between Y and X. Case 1) Positive: as X increases, Y increases, when $\beta_1 > 0$; 2) Negative: as X increases, Y decreases, when $\beta_1 < 0$; 3) Linearly uncorrelated: as X changes, Y does NOT change, when $\beta_1 = 0$.
(Figure: plots of $Y = \beta_0 + \beta_1 X$ with $\beta_0 = 1$ and $\beta_1 = -1$, $0$, $1$.)
Conclusion: $\beta_1$ is an indicator of the direction in which Y changes with X.
Distorted Linear Relationship between Y and X
Given $Y = \beta_0 + \beta_1 X + \varepsilon$, consider the linear relationship between Y and X. Case 1) Positive: as X increases, Y tends to increase, when $\beta_1 > 0$; 2) Negative: as X increases, Y tends to decrease, when $\beta_1 < 0$; 3) Linearly uncorrelated: as X changes, Y shows no systematic change, when $\beta_1 = 0$.
(Figure: scatter plots of $Y = \beta_0 + \beta_1 X + \varepsilon$ with $\beta_0 = 1$ and different values of $\beta_1$.)
Conclusion: $\beta_1$ is also an indicator of the direction in which Y changes with X.
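To see that the sign of $\beta_1$ still shows through the noise, the sketch below simulates $Y = 1 + \beta_1 X + \varepsilon$ for $\beta_1 = -1, 0, 1$ and compares the mean of Y over the larger half of the X values with the mean over the smaller half. The uniform X on [0, 10], the normal noise with standard deviation 0.5, the seed, and the helper name `upper_minus_lower` are all assumptions made for illustration.

```python
import random
import statistics

def upper_minus_lower(y, x):
    """Mean of Y over the larger half of the X values minus mean of Y over the smaller half."""
    pairs = sorted(zip(x, y))
    half = len(pairs) // 2
    lower = [yi for _, yi in pairs[:half]]
    upper = [yi for _, yi in pairs[half:]]
    return statistics.mean(upper) - statistics.mean(lower)

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
x = [rng.uniform(0, 10) for _ in range(200)]
gaps = {}
for beta1 in (-1.0, 0.0, 1.0):
    # Y = 1 + beta1 * X + eps, with eps ~ Normal(0, 0.5) (an assumed noise level)
    y = [1.0 + beta1 * xi + rng.gauss(0, 0.5) for xi in x]
    gaps[beta1] = upper_minus_lower(y, x)
print(gaps)  # the sign of each gap follows beta1; the beta1 = 0 gap is close to 0
```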
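Intuitive Derivation of the LS-estimators
Sample covariance of Y and X: the sum of cross-products of the deviations of Y and X divided by (n-1), $\mathrm{Cov}(Y,X) = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})$.
Intuitive derivation: We have $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ and, averaging over i, $\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{\varepsilon}$, so $y_i - \bar{y} = \beta_1(x_i - \bar{x}) + (\varepsilon_i - \bar{\varepsilon})$. Multiplying by $(x_i - \bar{x})$, summing over i and dividing by (n-1), we have $\mathrm{Cov}(Y,X) = \beta_1 \mathrm{Var}(X) + \mathrm{Cov}(\varepsilon, X)$.
Thus, if $\mathrm{Cov}(\varepsilon, X) = 0$ and $\bar{\varepsilon} = 0$, then $\beta_1 = \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)}$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$.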
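The key identity in this derivation, $\mathrm{Cov}(Y,X) = \beta_1 \mathrm{Var}(X) + \mathrm{Cov}(\varepsilon, X)$, holds exactly for sample moments, which the sketch below checks on simulated data. The "true" values $\beta_0 = 2$, $\beta_1 = 3$, the uniform X, the standard normal errors and the seed are hypothetical choices.

```python
import random

def sample_cov(u, v):
    """Sum of cross-products of deviations divided by (n - 1)."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / (n - 1)

rng = random.Random(7)
beta0, beta1 = 2.0, 3.0  # hypothetical "true" parameters for the simulation
x = [rng.uniform(0, 5) for _ in range(50)]
eps = [rng.gauss(0, 1) for _ in range(50)]
y = [beta0 + beta1 * xi + e for xi, e in zip(x, eps)]

# The identity holds exactly (up to floating-point rounding) for sample moments.
lhs = sample_cov(y, x)
rhs = beta1 * sample_cov(x, x) + sample_cov(eps, x)  # Var(X) = Cov(X, X)
print(lhs, rhs)
# Treating Cov(eps, X) as 0 gives the intuitive estimate of beta1.
print(lhs / sample_cov(x, x))
```

In a real data set the errors are unobserved, so the last line is exactly the estimator one can compute; the simulation only makes the neglected term visible.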
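Formulas for the LS-estimators
Assume $\mathrm{Cov}(\varepsilon, X) = 0$ and $\bar{\varepsilon} = 0$. The LS-estimators are
$\hat{\beta}_1 = \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$, $\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
Since $\mathrm{Var}(X) > 0$ (the $x_i$ are not all equal), Cov(Y,X) has the same sign as $\hat{\beta}_1$. Thus Cov(Y,X) is also an indicator of the direction of the linear relationship between Y and X: Case 1) Positive when Cov(Y,X) > 0; 2) Negative when Cov(Y,X) < 0; 3) Uncorrelated when Cov(Y,X) = 0.
Summary: indicators of the direction of the linear relationship between Y and X: 1) the slope $\beta_1$; 2) the slope estimator $\hat{\beta}_1$; 3) Cov(Y,X).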
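A direct transcription of these formulas into Python, offered as a sketch; the function name `ls_estimators` and the test points are hypothetical. Points lying exactly on a line should be recovered exactly.

```python
def ls_estimators(y, x):
    """Return (b0, b1) with b1 = Cov(Y, X) / Var(X) and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov_yx = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    if var_x == 0:
        raise ValueError("Var(X) must be positive: the x values cannot all be equal")
    b1 = cov_yx / var_x
    return y_bar - b1 * x_bar, b1

# Points lying exactly on Y = 1 + 2X are recovered (up to rounding).
b0, b1 = ls_estimators([3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0])
print(b0, b1)
```

The (n-1) divisors cancel in the ratio, so dividing the raw sums directly gives the same slope.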
Properties of Cov(Y,X)
• 1. Symmetric, i.e., Cov(Y,X) = Cov(X,Y).
• 2. Scale-dependent, i.e., when the scale of Y or X changes, so does their covariance. Let $Y_1 = a + bY$, $X_1 = c + dX$. Then $\mathrm{Cov}(Y_1, X_1) = bd\,\mathrm{Cov}(Y,X)$.
• 3. Takes values from $-\infty$ to $+\infty$, since b and d can take any values. Thus Cov(Y,X) does not measure the strength of the linear relationship between Y and X.
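A small sketch of property 2 on hypothetical toy data: shifting by a and c leaves the covariance unchanged, while rescaling by b and d multiplies it by bd. Here b = 60 stands in for, say, converting Y from minutes to seconds; all constants are illustrative choices.

```python
def sample_cov(u, v):
    """Sum of cross-products of deviations divided by (n - 1)."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / (n - 1)

y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]   # hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

a, b, c, d = 10.0, 60.0, -3.0, 0.5   # hypothetical changes of location and scale
y1 = [a + b * v for v in y]
x1 = [c + d * v for v in x]
# The second value equals b * d times the first, up to rounding.
print(sample_cov(y, x), sample_cov(y1, x1))
```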
Correlation Coefficient between Y and X
The correlation coefficient of Y and X is defined as the covariance of the standardized Y and X, i.e., the covariance of Y and X divided by their standard deviations: $\mathrm{Cor}(Y,X) = \frac{1}{n-1}\sum_{i=1}^{n}\frac{y_i - \bar{y}}{s_y}\cdot\frac{x_i - \bar{x}}{s_x} = \frac{\mathrm{Cov}(Y,X)}{s_y s_x}$.
Since $s_y > 0$ and $s_x > 0$, Cor(Y,X) and Cov(Y,X) have the same sign, so Cor(Y,X) is also an indicator of the direction of the linear relationship between Y and X: 1) Positive when Cor(Y,X) > 0; 2) Negative when Cor(Y,X) < 0; 3) Linearly uncorrelated when Cor(Y,X) = 0.
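The two descriptions of the correlation coefficient (covariance of the standardized variables, and covariance divided by the two standard deviations) can be checked against each other. This is a sketch on hypothetical toy data for which Cor(Y,X) = 0.8 exactly; the helper names are illustrative.

```python
import math

def sample_cov(u, v):
    """Sum of cross-products of deviations divided by (n - 1)."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / (n - 1)

def standardize(v):
    """(v_i - mean) / s, with s the sample standard deviation (n - 1 divisor)."""
    mean = sum(v) / len(v)
    s = math.sqrt(sample_cov(v, v))  # Var(V) = Cov(V, V)
    return [(vi - mean) / s for vi in v]

def cor(y, x):
    """Cor(Y, X) = Cov(Y, X) / (s_y * s_x)."""
    return sample_cov(y, x) / math.sqrt(sample_cov(y, y) * sample_cov(x, x))

y = [1.0, 3.0, 2.0, 5.0, 4.0]   # hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_direct = cor(y, x)
r_standardized = sample_cov(standardize(y), standardize(x))
print(r_direct, r_standardized)  # both approximately 0.8
```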
Properties of Cor(Y,X)
• 1. Symmetric, i.e., Cor(Y,X) = Cor(X,Y).
• 2. Scale-invariant, i.e., it does not change when the scales of Y and X change. Let $Y_1 = a + bY$, $X_1 = c + dX$ with $b > 0$, $d > 0$. Then $\mathrm{Cor}(Y_1, X_1) = \frac{bd\,\mathrm{Cov}(Y,X)}{(b s_y)(d s_x)} = \mathrm{Cor}(Y,X)$.
• 3. Takes values between -1 and 1.
The strength of the linear relationship: 1) strong when |Cor(Y,X)| is close to 1; 2) weak when |Cor(Y,X)| is close to 0; 3) linearly uncorrelated when Cor(Y,X) = 0, but Y and X can still have some (nonlinear) relationship. Counterexample: the top-right picture, where $Y = 2 - \cos(6.28X)$ (a perfect nonlinear relationship) while Cor(Y,X) = 0.
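The sketch below illustrates scale invariance (and the sign flip when b < 0) on hypothetical toy data, then evaluates the cosine counterexample. The slide does not give its X values, so an evenly spaced grid of 101 points on [0, 1] is assumed here; on that grid the correlation comes out very close to 0 even though Y is a deterministic function of X.

```python
import math

def cor(y, x):
    """Sample correlation: S_xy / sqrt(S_xx * S_yy); the (n - 1) divisors cancel."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Scale invariance (b > 0, d > 0) on hypothetical toy data with Cor = 0.8.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r = cor(y, x)
r_rescaled = cor([5 + 3 * v for v in y], [-1 + 0.2 * v for v in x])
r_flipped = cor([5 - 3 * v for v in y], x)  # b < 0 flips the sign

# Counterexample: a perfect nonlinear relationship with correlation near 0.
# The evenly spaced grid on [0, 1] is an assumption; the slide's X values are not given.
grid = [i / 100 for i in range(101)]
r_cos = cor([2 - math.cos(6.28 * g) for g in grid], grid)
print(r, r_rescaled, r_flipped, r_cos)
```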
Examples of Correlation Coefficients
(Figure: three scatter plots with Cor(Y,X) = .71 (strong linearity), Cor(Y,X) = -.09 (nearly uncorrelated) and Cor(Y,X) = .98 (very strong linearity).)
Robustness: both Cov(Y,X) and Cor(Y,X) are NOT robust statistics, since a few outliers can change their values considerably. Example: Anscombe's quartet (see next slide) consists of four data sets with the same summary statistics but quite different pictures: (a) can be described by a linear model; (b) can be described by a quadratic model; (c) and (d) each contain an outlier.
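To make the point concrete, the sketch below uses Anscombe's quartet as it is commonly reproduced (for example, R's built-in `anscombe` data set) and prints the mean of X, the mean of Y and Cor(Y,X) for each set. Mapping the slide's panels (a) to (d) onto Anscombe's sets I to IV is our assumption. All four sets give a correlation of about 0.816 despite their very different shapes.

```python
import math
import statistics

def cor(y, x):
    """Sample correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared X for sets I-III
quartet = {
    "a": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "b": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "c": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "d": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
results = {}
for label, (x, y) in quartet.items():
    results[label] = (statistics.mean(x), statistics.mean(y), cor(y, x))
    mx, my, r = results[label]
    print(label, round(mx, 2), round(my, 2), round(r, 3))
```

Plotting each set, rather than trusting these numbers, is what reveals the curvature in (b) and the outliers in (c) and (d).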
Anscombe Quartets
(Figure: (a) strong linearity; (b) strong nonlinearity; (c) an outlier appears; (d) an outlier appears.)
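Table for Computing Variance and Covariance
Note that
$\mathrm{Var}(Y) = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$, the sum of squared deviations of Y divided by (n-1);
$\mathrm{Var}(X) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, the sum of squared deviations of X divided by (n-1);
$\mathrm{Cov}(Y,X) = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})$, the sum of products of deviations of Y and X divided by (n-1).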
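A sketch of the same hand-computation table in Python: one row per observation with the deviations, their squares and their cross-product, followed by the column totals divided by (n-1). The function name `deviation_table`, the column labels and the data are hypothetical.

```python
def deviation_table(y, x):
    """Build rows (y_i, x_i, y_i - y_bar, x_i - x_bar, squares, cross-product)
    and return them with Var(Y), Var(X) and Cov(Y, X), all with divisor n - 1."""
    n = len(y)
    y_bar, x_bar = sum(y) / n, sum(x) / n
    rows = []
    for yi, xi in zip(y, x):
        dy, dx = yi - y_bar, xi - x_bar
        rows.append((yi, xi, dy, dx, dy * dy, dx * dx, dy * dx))
    totals = [sum(col) for col in zip(*rows)]
    return rows, totals[4] / (n - 1), totals[5] / (n - 1), totals[6] / (n - 1)

# Hypothetical toy data, laid out like a hand-computation table.
rows, var_y, var_x, cov_yx = deviation_table([2.0, 4.0, 4.0, 5.0, 7.0, 8.0],
                                             [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
header = ("y", "x", "y-ybar", "x-xbar", "(y-ybar)^2", "(x-xbar)^2", "(y-ybar)(x-xbar)")
print(" | ".join(header))
for row in rows:
    print(" | ".join(f"{v:.2f}" for v in row))
print(var_y, var_x, cov_yx)
```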
Example: Computer Repair Data (Table 2.5, page 27; see Table 2.6, page 28 for the detailed computation)
Conclusion: Y and X are strongly linearly related.
Drawback: Cor(Y,X) cannot be used to predict Y values from given X values. This can be done with simple linear regression analysis.
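A sketch of the correlation computation for this example. The data values below are our transcription of the computer repair data (X = Units, Y = Minutes); they reproduce Cov(Y,X) = 136 and the fitted line reported later in this lecture, but please check them against Table 2.5 before relying on them.

```python
import math

# Our transcription of the computer repair data (Table 2.5): X = Units, Y = Minutes.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
sxx = sum((x - x_bar) ** 2 for x in units)
syy = sum((y - y_bar) ** 2 for y in minutes)

cov_yx = sxy / (n - 1)
cor_yx = sxy / math.sqrt(sxx * syy)
print(round(cov_yx, 2), round(cor_yx, 3))  # 136.0 0.994
```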
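Strict Derivation of the LS-estimators
SLR model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \dots, n$. Intercept $\beta_0$ = the predicted value of Y when X = 0; slope $\beta_1$ = the change in Y for a unit change in X.
Least squares method: find $\beta_0$ and $\beta_1$ that minimize the sum of squared errors (SSE): $\mathrm{SSE}(\beta_0, \beta_1) = \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$.
The minimizers are $\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, the same as those in Slide 7 (the intuitive derivation).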
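A numerical sanity check of the minimization, as a sketch on hypothetical toy data: compute the closed-form minimizers, then confirm that moving away from them in several directions increases the SSE. This is only a spot check; the proof that follows is what establishes the minimum.

```python
def sse(b0, b1, y, x):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def ls_fit(y, x):
    """Closed-form minimizers: b1 = S_xy / S_xx, b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]   # hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b0, b1 = ls_fit(y, x)
best = sse(b0, b1, y, x)

# Perturbing the estimates in any of these directions should increase the SSE.
steps = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]
worse = [sse(b0 + d0, b1 + d1, y, x) > best for d0, d1 in steps]
print(b0, b1, best, all(worse))
```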
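Proof
Write $y_i - \beta_0 - \beta_1 x_i = (y_i - \bar{y}) - \beta_1(x_i - \bar{x}) + (\bar{y} - \beta_0 - \beta_1 \bar{x})$. Since the deviations $y_i - \bar{y}$ and $x_i - \bar{x}$ each sum to zero, the cross-terms involving the constant $\bar{y} - \beta_0 - \beta_1 \bar{x}$ vanish, so
$\mathrm{SSE}(\beta_0, \beta_1) = (n-1)\left[\mathrm{Var}(Y) - 2\beta_1 \mathrm{Cov}(Y,X) + \beta_1^2 \mathrm{Var}(X)\right] + n(\bar{y} - \beta_0 - \beta_1 \bar{x})^2$
$= (n-1)\mathrm{Var}(X)\left[\beta_1 - \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)}\right]^2 + (n-1)\left[\mathrm{Var}(Y) - \frac{\mathrm{Cov}(Y,X)^2}{\mathrm{Var}(X)}\right] + n(\bar{y} - \beta_0 - \beta_1 \bar{x})^2$
$\geq (n-1)\mathrm{Var}(Y)\left[1 - \mathrm{Cor}(Y,X)^2\right]$.
Proof (continued)
Equality holds when $\beta_1 = \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)}$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$, which are the least squares estimators $\hat{\beta}_1$ and $\hat{\beta}_0$ of the parameters $\beta_1$ and $\beta_0$. Since $\mathrm{SSE}(\hat{\beta}_0, \hat{\beta}_1) = (n-1)\mathrm{Var}(Y)\left[1 - \mathrm{Cor}(Y,X)^2\right] \geq 0$, we have an important property of Cor(Y,X): $-1 \leq \mathrm{Cor}(Y,X) \leq 1$. Moreover, we have another important equation: $\hat{\beta}_1 = \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)} = \mathrm{Cor}(Y,X)\,\frac{s_y}{s_x}$.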
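Both consequences of the proof can be checked numerically: the minimum SSE equals $(n-1)\mathrm{Var}(Y)[1 - \mathrm{Cor}(Y,X)^2]$, and the slope equals $\mathrm{Cor}(Y,X)\, s_y / s_x$. This sketch uses hypothetical toy data.

```python
import math

y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]   # hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n = len(y)
x_bar, y_bar = sum(x) / n, sum(y) / n
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
cov_yx = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r = cov_yx / math.sqrt(var_x * var_y)

b1 = cov_yx / var_x
b0 = y_bar - b1 * x_bar
sse_min = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Minimum SSE equals (n - 1) Var(Y) (1 - Cor^2); since SSE >= 0, |Cor| <= 1.
print(sse_min, (n - 1) * var_y * (1 - r ** 2))
# Slope written through the correlation: b1 = Cor * s_y / s_x.
print(b1, r * math.sqrt(var_y / var_x))
```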
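Example: Computer Repair Data (continued)
We have $\bar{y} = 97.21$, $\bar{x} = 6$, Cov(Y,X) = 136 and Var(X) = 8.769 (so $s_x = 2.96$). Thus $\hat{\beta}_1 = \mathrm{Cov}(Y,X)/\mathrm{Var}(X) = 15.509$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 4.162$, and the LS regression line is $\hat{Y} = 4.162 + 15.509X$.
The fitted values and residuals are then $\hat{y}_i = 4.162 + 15.509 x_i$ and $e_i = y_i - \hat{y}_i$. In other words: Minutes = 4.162 + 15.509 × Units.
Using this formula, we can compute fitted (predicted) values, e.g., X = 4: fitted value = 4.162 + 15.509 × 4 = 66.20; X = 11: predicted value = 4.162 + 15.509 × 11 = 174.76.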
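A sketch reproducing the fitted line, fitted values, residuals and the two predictions above. As before, the data values are our transcription of Table 2.5 and should be checked against the textbook.

```python
# Our transcription of the computer repair data (Table 2.5): X = Units, Y = Minutes.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
      / sum((x - x_bar) ** 2 for x in units))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in units]                 # y_hat_i = b0 + b1 * x_i
residuals = [y - f for y, f in zip(minutes, fitted)]  # e_i = y_i - y_hat_i

print(round(b0, 3), round(b1, 3))                     # 4.162 15.509
print(round(b0 + b1 * 4, 2), round(b0 + b1 * 11, 2))  # 66.2 174.76
```

The residuals of an LS fit with an intercept sum to zero. Also note that X = 11 lies outside the observed range of Units (1 to 10 in these values), so 174.76 is an extrapolation.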
Exercise (1) Fill in the following table, then compute the mean, variance and standard deviation of Y and X. (2) Compute the covariance and correlation of Y and X. (3) Compute the simple linear regression coefficients.
Reading Assignment • Review Sections 2.1-2.5 of Chapter 2. • Read Sections 2.6-2.9 of Chapter 2. Consider the following problems: • a) How do we perform significance tests for the parameters? • b) How do we construct confidence intervals for the parameters? • c) How do we make inferences about predictions?