Simple Linear Regression: Lecture for Statistics 509, November-December 2000
Correlation and Regression
• Study of association and/or relationship between variables.
• Useful for determining the effect of changes in one variable (called the independent or control variable) on another variable (called the dependent or response variable).
• Regression models can be used to determine optimal operating conditions (specified by the control variables) that achieve a specified value or yield of the response variable.
• Regression models can also be used to predict the value of the response given a value of the independent variable, or to “calibrate” the value of the independent variable to achieve a certain response.
Stat 509 - Regression Lecture
Some Examples
• Control variable is X = average speed of a car, and the response variable is Y = fuel efficiency of the car. The goal is to determine the speed that optimizes the efficiency of the car.
• Control variable is X = temperature, while the response variable is Y = yield in a chemical reaction.
• Control variable is X = amount of fertilizer applied to a plant, while the response variable is Y = yield of this plant.
• Control variable is X = thickness of a stack of bond paper, while the response variable is Y = number of sheets in this stack.
• Control variable is X = average time spent studying, while the response variable is Y = GPA.
Population Model
• Each member of the population has a value for the independent variable X and the response variable Y, usually represented by the vector (X, Y).
• For a given value X = x, the variable Y has a certain distribution whose conditional mean is μ(x) and whose conditional variance is σ²(x).
• This can be visualized as follows: consider the subpopulation consisting of units whose values of X equal x. Their Y-values have a certain distribution whose mean is μ(x) and whose variance is σ²(x). When you pick a unit from this subpopulation, the Y-value that you observe is governed by this particular distribution. In particular, this observation can be expressed as
  Y = μ(x) + ε, where ε is some “error term.”
Assumptions for Simple Linear Regression
• μ(x) = E(Y|X=x) = α + βx. This means that the mean of Y, given X = x, is a linear function of x.
• β is called the regression coefficient or the slope of the regression line; α is the y-intercept.
• σ²(x) = σ² does not depend on x. This is the assumption of “equal variances” or homoscedasticity.
• Furthermore, for the sample data (x₁, Y₁), (x₂, Y₂), …, (xₙ, Yₙ): Y₁, Y₂, …, Yₙ are independent observations, and their conditional distributions are all normal.
• In shorthand notation: Yᵢ = μ(xᵢ) + εᵢ = α + βxᵢ + εᵢ, i = 1, 2, …, n, where ε₁, ε₂, …, εₙ are independent and identically distributed (IID) N(0, σ²).
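One way to get a feel for these assumptions is to simulate data that satisfy them. The sketch below is not part of the original lecture: the function name `simulate_slr`, the design points, and the parameter values (intercept 2.0, slope 0.5, error standard deviation 1.0) are all hypothetical choices made only for illustration.

```python
import random
import statistics


def simulate_slr(xs, intercept, slope, sigma, seed=None):
    """Draw Y_i = intercept + slope * x_i + e_i, with e_i IID N(0, sigma^2)."""
    rng = random.Random(seed)
    return [intercept + slope * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical design and parameter values, chosen only for illustration.
levels = [1.0, 2.0, 3.0, 4.0, 5.0]
xs = [x for x in levels for _ in range(200)]  # 200 units at each value of x
ys = simulate_slr(xs, intercept=2.0, slope=0.5, sigma=1.0, seed=509)

# Within each subpopulation (fixed x), the sample mean should be close to
# intercept + slope * x, and the sample variance close to sigma^2.
for level in levels:
    group = [y for x, y in zip(xs, ys) if x == level]
    print(level, round(statistics.mean(group), 2), round(statistics.variance(group), 2))
```

Under the model, the printed group means should track the straight line while the group variances stay roughly constant across x, which is what homoscedasticity asserts.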
Regression Problem
Given the sample (bivariate) data (x₁, Y₁), (x₂, Y₂), …, (xₙ, Yₙ), satisfying the linear regression model
  Yᵢ = α + βxᵢ + εᵢ with ε₁, ε₂, …, εₙ IID N(0, σ²),
we would like to address the following questions:
• How should the data be summarized graphically?
• What are the estimators of the parameters α, β, and σ²?
• What will be an estimate of the prediction line?
• What are the properties of the estimators of the model parameters?
• How do we test whether the fitted regression model is a significant model?
• How do we construct CIs or test hypotheses concerning parameters?
• How do we perform prediction using the prediction model?
Illustrative Example: On Plasma Etching
• Plasma etching is essential to the fine-line pattern transfer in current semiconductor processes. The paper “Ion Beam-Assisted Etching of Aluminum with Chlorine” in J. Electrochem. Soc. (1985) gives the data below on chlorine flow (x, in SCCM) through a nozzle used in the etching mechanism, and etch rate (y, in 100A/min).
The Scatterplot
Least-Squares Prediction Line
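The content of this slide did not survive extraction, so the following is only a minimal sketch of the standard least-squares computation, not a reproduction of the lecture's worksheet. The helper name `fit_line` and the five data points are made up for illustration; they are not the plasma etching data.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Corrected sums of squares and cross-products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept_hat, slope_hat = fit_line(xs, ys)


def predict(x):
    """Value of the fitted prediction line at x."""
    return intercept_hat + slope_hat * x


print(f"fitted line: y = {intercept_hat:.3f} + {slope_hat:.3f} x")
print(f"prediction at x = 3.5: {predict(3.5):.3f}")
```

The slope is the ratio of the corrected cross-product sum to the corrected sum of squares of x, and the intercept is chosen so that the line passes through the point of means.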
Analysis of Variance Table
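The table itself is missing from this transcript. As a hedged sketch of how such a table is usually assembled for simple linear regression, the code below splits the total sum of squares into regression and error parts and forms the F statistic. The function name `anova_table` and the data are hypothetical, and no p-value is computed because that would need an F distribution routine from outside the standard library.

```python
def anova_table(xs, ys):
    """Return the ANOVA quantities for a simple linear regression fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    sst = sum((y - y_bar) ** 2 for y in ys)  # total, n - 1 df
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # error, n - 2 df
    ssr = sst - sse  # regression, 1 df

    msr = ssr / 1
    mse = sse / (n - 2)  # also the usual estimate of the error variance
    return {"ssr": ssr, "sse": sse, "sst": sst,
            "df_reg": 1, "df_err": n - 2, "df_tot": n - 1,
            "msr": msr, "mse": mse, "f": msr / mse}


# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
t = anova_table(xs, ys)

print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}{'F':>12}")
print(f"{'Regression':<12}{t['df_reg']:>4}{t['ssr']:>12.4f}{t['msr']:>12.4f}{t['f']:>12.2f}")
print(f"{'Error':<12}{t['df_err']:>4}{t['sse']:>12.4f}{t['mse']:>12.4f}")
print(f"{'Total':<12}{t['df_tot']:>4}{t['sst']:>12.4f}")
```

A large F relative to the F distribution with 1 and n - 2 degrees of freedom indicates that the slope differs significantly from zero.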
Excel Worksheet for Regression Computations
Regression Analysis from Minitab
Fitted Line in Scatterplot with Bands
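The plot is not recoverable here, and the transcript does not say which bands it showed. Assuming they were the usual pointwise confidence band for the mean response and prediction band for a new observation, a sketch of the computation follows. The name `make_bands` and the data are hypothetical, and the t critical value is passed in by the caller (3.182 is the tabled two-sided 95% value for 3 degrees of freedom, matching n = 5 here).

```python
import math


def make_bands(xs, ys, t_crit):
    """Fit the least-squares line and return a function giving, at any x0,
    (fitted value, CI half-width for the mean, PI half-width for a new Y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation

    def bands(x0):
        fit = intercept + slope * x0
        h = 1.0 / n + (x0 - x_bar) ** 2 / sxx
        ci_half = t_crit * s * math.sqrt(h)        # mean response at x0
        pi_half = t_crit * s * math.sqrt(1.0 + h)  # new observation at x0
        return fit, ci_half, pi_half

    return bands


# Made-up data, for illustration only; t_crit taken from a t table (df = 3).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
bands = make_bands(xs, ys, t_crit=3.182)

for x0 in [1.0, 3.0, 5.0]:
    fit, ci_half, pi_half = bands(x0)
    print(f"x0={x0}: fit={fit:.3f}  CI=+/-{ci_half:.3f}  PI=+/-{pi_half:.3f}")
```

Both bands are narrowest at the mean of the x values and widen as x0 moves away from it, and the prediction band is always wider than the confidence band because it also accounts for the error in a single new observation.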