Simple Linear Regression With Thanks to My Students in AMS572 – Data Analysis
1. Introduction
Example:
Brad Pitt: 1.83m, Angelina Jolie: 1.70m
George Bush: 1.81m, Laura Bush: ?
David Beckham: 1.83m, Victoria Beckham: 1.68m
● Goal: to predict the height of the wife in a couple, based on the husband's height.
Response (outcome or dependent) variable (Y): height of the wife
Predictor (explanatory or independent) variable (X): height of the husband
Regression analysis:
● Regression analysis is a statistical methodology to estimate the relationship of a response variable to a set of predictor variables.
● When there is just one predictor variable, we use simple linear regression; when there are two or more predictor variables, we use multiple linear regression.
● When it is not clear which variable represents a response and which a predictor, correlation analysis is used to study the strength of the relationship.
History:
● The earliest form of linear regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809.
● The method was extended by Francis Galton in the 19th century to describe a biological phenomenon.
● This work was extended by Karl Pearson and Udny Yule to a more general statistical context around the turn of the 20th century.
A probabilistic model
● We denote the n observed values of the predictor variable x as $x_1, x_2, \ldots, x_n$.
● We denote the corresponding observed values of the response variable Y as $Y_1, Y_2, \ldots, Y_n$.
Notation for the simple linear regression model:
$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \ldots, n,$$
where
- the observed value of the random variable $Y_i$ depends on $x_i$;
- $\epsilon_i$ is the random error, with $E(\epsilon_i) = 0$, so the unknown mean of $Y_i$ is $E(Y_i) = \beta_0 + \beta_1 x_i$ (the true regression line), with unknown slope $\beta_1$ and unknown intercept $\beta_0$.
4 BASIC ASSUMPTIONS (for statistical inference):
1. $E(Y_i)$ is a linear function of the predictor variable $x_i$.
2. The $Y_i$'s have a common variance $\sigma^2$, the same for all values of x.
3. The $Y_i$'s are normally distributed.
4. The $Y_i$'s are independent.
Comments:
1. "Linear" means linear not in x but in the parameters $\beta_0$ and $\beta_1$. Example: $Y = \beta_0 + \beta_1 \log x$ is still linear, with $x^* = \log x$.
2. The predictor variable need not be set at predetermined fixed values; it can be random along with Y. The model can then be considered a conditional model, based on the conditional expectation of Y given X = x:
$$E(Y \mid X = x) = \beta_0 + \beta_1 x.$$
Example: height and weight of children. Height (X) is given; weight (Y) is predicted.
2. Fitting the Simple Linear Regression Model 2.1 Least Squares (LS) Fit
Example 10.1 (Tire Tread Wear vs. Mileage: Scatter Plot. From: Statistics and Data Analysis; Tamhane and Dunlop; Prentice Hall.)
The "best" fitting straight line minimizes the sum of squared vertical deviations
$$Q = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2;$$
the minimizing values $\hat\beta_0$ and $\hat\beta_1$ are the LS estimates. One way to find the LS estimates $\hat\beta_0$ and $\hat\beta_1$ is via the partial derivatives
$$\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n}\left[ y_i - (\beta_0 + \beta_1 x_i) \right], \qquad \frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i\left[ y_i - (\beta_0 + \beta_1 x_i) \right].$$
Setting these partial derivatives equal to zero and simplifying, we get the normal equations
$$n\beta_0 + \beta_1\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad \beta_0\sum_{i=1}^{n} x_i + \beta_1\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i.$$
To simplify, we introduce
$$\bar x = \frac{1}{n}\sum x_i, \quad \bar y = \frac{1}{n}\sum y_i, \quad S_{xx} = \sum (x_i - \bar x)^2, \quad S_{yy} = \sum (y_i - \bar y)^2, \quad S_{xy} = \sum (x_i - \bar x)(y_i - \bar y).$$
Solving the normal equations then gives
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x.$$
● The resulting equation $\hat y = \hat\beta_0 + \hat\beta_1 x$ is known as the least squares line, which is an estimate of the true regression line.
Example 10.2 (Tire Tread Wear vs. Mileage: LS Line Fit)
Find the equation of the LS line for the tire tread wear data from Table 10.1. With mileages x = 0, 4, 8, …, 32 (in 1000 miles) we have n = 9 and $\bar x = 16$; from these data we calculate $\bar y$, $S_{xx} = 960$, and $S_{xy}$.
The slope and intercept estimates are
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = -7.281, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x = 360.64.$$
Therefore, the equation of the LS line is
$$\hat y = 360.64 - 7.281\,x.$$
Conclusion: there is a loss of 7.281 mils in the tire groove depth for every 1000 miles of driving. Given a particular mileage, say x = 25 (i.e., 25,000 miles), we can find
$$\hat y = 360.64 - 7.281 \times 25 = 178.62,$$
which means the mean groove depth for all tires driven for 25,000 miles is estimated to be 178.62 mils.
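This fit can be reproduced in SAS. A minimal sketch follows; note that the slides list only three of the nine datalines, so the middle groove-depth values below are filled in from the Tamhane and Dunlop tire example and should be checked against Table 10.1:

data tire;
   input x y;   /* x = mileage in 1000 miles, y = groove depth in mils */
   datalines;
0  394.33
4  329.50
8  291.00
12 255.17
16 229.33
20 204.83
24 179.00
28 163.83
32 150.33
;
run;

proc reg data=tire;
   model y = x;   /* estimates should reproduce intercept 360.64 and slope -7.281 */
run;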
2.2 Goodness of Fit of the LS Line
● Coefficient of Determination and Correlation
● The residuals $e_i = y_i - \hat y_i$ (i = 1, …, n) are used to evaluate the goodness of fit of the LS line.
We define:
$$SST = \sum_{i=1}^{n} (y_i - \bar y)^2 \ \text{(total sum of squares)}, \quad SSR = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 \ \text{(regression sum of squares)}, \quad SSE = \sum_{i=1}^{n} (y_i - \hat y_i)^2 \ \text{(error sum of squares)}.$$
Note: SST = SSR + SSE, and
$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
is called the coefficient of determination.
Example 10.3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and Correlation)
● For the tire tread wear data, calculate $r^2$ using the results from Example 10.2. We have $SST = S_{yy}$.
● Next calculate $SSE = \sum_{i=1}^{n} (y_i - \hat y_i)^2$ and $SSR = SST - SSE$.
● Therefore $r^2 = \frac{SSR}{SST} = 0.953$.
● The Pearson correlation is $r = -\sqrt{r^2} = -0.976$, where the sign of r follows from the sign of $\hat\beta_1$. Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.
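In SAS, $r$ and $r^2$ can be read off directly; a minimal sketch, reusing the tire dataset defined above:

proc corr data=tire pearson;
   var x y;   /* prints r, which should agree with -0.976 up to rounding */
run;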
The Maximum Likelihood Estimators (MLE)
● Consider the linear model $Y_i = a + b x_i + \epsilon_i$, where $\epsilon_i$ is drawn from a normal population with mean 0 and standard deviation σ. The likelihood function for Y is
$$L = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(y_i - a - b x_i)^2}{2\sigma^2} \right\}.$$
● Thus, the log-likelihood for the data is
$$\ln L = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - a - b x_i)^2.$$
The MLE Estimators
● Solving
$$\frac{\partial \ln L}{\partial a} = 0, \qquad \frac{\partial \ln L}{\partial b} = 0, \qquad \frac{\partial \ln L}{\partial \sigma} = 0,$$
● we obtain the MLEs of the three unknown model parameters $a$, $b$, and $\sigma^2$.
● The MLEs of the model parameters a and b are the same as the LSEs – both unbiased.
● The MLE of the error variance, however, is biased:
$$\hat\sigma^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat a - \hat b x_i\right)^2 = \frac{SSE}{n}, \qquad E\left(\hat\sigma^2_{MLE}\right) = \frac{n-2}{n}\,\sigma^2 \ne \sigma^2.$$
2.3 An Unbiased Estimator of σ²
An unbiased estimate of σ² is given by
$$s^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{n-2}.$$
Example 10.4 (Tire Tread Wear vs. Mileage: Estimate of σ²)
Find the estimate of σ² for the tread wear data using the results from Example 10.3. We have SSE = 2351.3 and n − 2 = 7; therefore
$$s^2 = \frac{2351.3}{7} = 335.9,$$
which has 7 d.f. The estimate of σ is $s = \sqrt{335.9} = 18.33$ mils.
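Why divide by n − 2 rather than n? A one-line justification, using the standard result that two d.f. are lost because β0 and β1 are estimated from the data:
$$E(SSE) = (n-2)\sigma^2 \;\Longrightarrow\; E(s^2) = E\!\left(\frac{SSE}{n-2}\right) = \sigma^2, \qquad E\!\left(\frac{SSE}{n}\right) = \frac{n-2}{n}\sigma^2 < \sigma^2,$$
which is exactly the bias of the MLE noted on the previous slide.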
3. Statistical Inference on β0 and β1
Under the normal error assumption:
* Point estimators: $\hat\beta_0$ and $\hat\beta_1$, the LS (= ML) estimators.
* Sampling distributions of $\hat\beta_0$ and $\hat\beta_1$:
$$\hat\beta_1 \sim N\!\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right), \qquad \hat\beta_0 \sim N\!\left(\beta_0,\ \sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]\right).$$
For mathematical derivations, please refer to the Tamhane and Dunlop textbook, p. 331.
Statistical Inference on β0 and β1, Cont'd
* Pivotal Quantities (P.Q.'s):
$$\frac{\hat\beta_0 - \beta_0}{SE(\hat\beta_0)} \sim t_{n-2}, \qquad \frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)} \sim t_{n-2},$$
where $SE(\hat\beta_0) = s\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}$ and $SE(\hat\beta_1) = \frac{s}{\sqrt{S_{xx}}}$.
* Confidence Intervals (CI's), at level $1-\alpha$:
$$\hat\beta_0 \pm t_{n-2,\alpha/2}\, SE(\hat\beta_0), \qquad \hat\beta_1 \pm t_{n-2,\alpha/2}\, SE(\hat\beta_1).$$
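As a worked illustration with the tire data (a sketch using the slides' rounded values: s = 18.33 from Example 10.4 and $S_{xx} = 960$ computed from the nine mileages), a 95% CI for the slope is
$$SE(\hat\beta_1) = \frac{s}{\sqrt{S_{xx}}} = \frac{18.33}{\sqrt{960}} \approx 0.5916, \qquad -7.281 \pm 2.365 \times 0.5916 = (-8.68,\ -5.88).$$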
Statistical Inference on β0 and β1, Cont'd
* Hypothesis tests: $H_0: \beta_1 = \beta_1^0$ vs. $H_1: \beta_1 \ne \beta_1^0$, and $H_0: \beta_0 = \beta_0^0$ vs. $H_1: \beta_0 \ne \beta_0^0$.
-- Test statistics:
$$t_1 = \frac{\hat\beta_1 - \beta_1^0}{SE(\hat\beta_1)}, \qquad t_0 = \frac{\hat\beta_0 - \beta_0^0}{SE(\hat\beta_0)}.$$
-- At the significance level α, we reject $H_0$ in favor of $H_1$ if and only if (iff) $|t| > t_{n-2,\alpha/2}$.
-- The first test, with $\beta_1^0 = 0$, is used to show whether there is a linear relationship between x and y.
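Continuing the same worked sketch with the slides' rounded tire-data values ($\hat\beta_1 = -7.281$, $SE(\hat\beta_1) \approx 0.5916$), the test of $H_0: \beta_1 = 0$ gives
$$t = \frac{\hat\beta_1 - 0}{SE(\hat\beta_1)} = \frac{-7.281}{0.5916} \approx -12.31, \qquad |t| > t_{7,.025} = 2.365,$$
so $H_0$ is rejected: mileage has a statistically significant linear effect on groove depth.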
Analysis of Variance (ANOVA)
Mean Square: a sum of squares divided by its d.f.
$$MSR = \frac{SSR}{1}, \qquad MSE = \frac{SSE}{n-2}.$$
Analysis of Variance (ANOVA), Cont'd
ANOVA Table:

Source of Variation | SS  | d.f. | MS              | F
Regression          | SSR | 1    | MSR = SSR/1     | F = MSR/MSE
Error               | SSE | n-2  | MSE = SSE/(n-2) |
Total               | SST | n-1  |                 |

Example: for the tire data, see the check below.
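In simple linear regression, the ANOVA F-statistic tests $H_0: \beta_1 = 0$ and equals the square of the t-statistic from the previous slide; using the rounded tire-data values from the sketch above,
$$F = \frac{MSR}{MSE} = t^2 \approx (-12.31)^2 \approx 151.5 > F_{1,7,.05} = 5.59,$$
so the regression is highly significant, consistent with the t-test.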
4. Regression Diagnostics
4.1 Checking for Model Assumptions
• Checking for Linearity
• Checking for Constant Variance
• Checking for Normality
• Checking for Independence
Checking for Linearity
$x_i$ = Mileage, $Y_i$ = Groove Depth
True regression line: $Y = \beta_0 + \beta_1 x$; fitted line: $\hat Y = \hat\beta_0 + \hat\beta_1 x$
$\hat Y_i$ = fitted value; $e_i$ = residual, where the residual is $e_i = Y_i - \hat Y_i$
Plot the residuals $e_i$ against $x_i$: random scatter around zero supports linearity, while a systematic pattern (e.g., curvature) indicates nonlinearity.
Checking for Constant Variance
Plot the residuals against the fitted values.
[Figures: a residual plot where Var(Y) is not constant (funnel shape), and a sample residual plot where Var(Y) is constant (uniform horizontal band).]
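Both the linearity check and the constant-variance check are usually carried out with a single residual plot. A minimal SAS sketch, reusing the tire dataset (PROC SGPLOT is just one way to draw it):

proc reg data=tire;
   model y = x;
   output out=diag p=fitted r=resid;   /* save fitted values and raw residuals */
run;

proc sgplot data=diag;
   scatter x=fitted y=resid;   /* curvature suggests nonlinearity; a funnel suggests nonconstant variance */
   refline 0 / axis=y;
run;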
Checking for Independence
● This check does not generally apply to the simple linear regression model;
● it applies only to time series data.
4.2 Checking for Outliers & Influential Observations
• What is an outlier?
• Why checking for outliers is important
• Mathematical definition
• How to deal with them
4.2-A. Intro
Recall the Box-and-Whiskers Plot (Chapter 4 of T&D):
• A (mild) OUTLIER is defined as any observation that lies outside of [Q1 − 1.5·IQR, Q3 + 1.5·IQR] (interquartile range, IQR = Q3 − Q1);
• an (extreme) OUTLIER is one that lies outside of [Q1 − 3·IQR, Q3 + 3·IQR];
• informally, an observation "far away" from the rest of the data.
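The quartiles needed for this rule are easy to obtain in SAS; a sketch, applied to the tire groove depths purely as an illustration:

proc means data=tire q1 q3 qrange;
   var y;   /* flag observations outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as mild outliers */
run;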
4.2-B. Why are outliers a problem?
• May indicate a sample peculiarity, a data entry error, or another problem;
• Regression coefficients estimated by minimizing the Sum of Squares for Error (SSE) are very sensitive to outliers >> bias or distortion of estimates;
• Any statistical test based on sample means and variances can be distorted in the presence of outliers >> distortion of p-values;
• Faulty conclusions.
(Note: estimators not sensitive to outliers are said to be robust.)
4.2-C. Mathematical Definition
• Outlier: the standardized residual is given by
$$e_i^* = \frac{e_i}{s\sqrt{1 - h_{ii}}},$$
where $h_{ii}$ is the i-th leverage (defined on the next slide).
• If $|e_i^*| > 2$, then the corresponding observation may be regarded as an outlier.
• Example: (Tire Tread Wear vs. Mileage)
• STUDENTIZED RESIDUAL: a type of standardized residual calculated with the current observation deleted from the analysis.
• The LS fit can be excessively influenced by an observation that is not necessarily an outlier as defined above.
4.2-C. Mathematical Definition, Cont'd
[Figures: e.g. 1: fits with and without the extreme observation; e.g. 2: scatter plot and residual plot.]
• Influential Observation: an observation with an extreme x-value, y-value, or both.
• On average $h_{ii}$ is (k+1)/n (k = number of predictors); regard any $h_{ii} > 2(k+1)/n$ as high leverage.
• If $x_i$ deviates greatly from the mean $\bar x$, then $h_{ii}$ is large; in simple linear regression, $h_{ii} = \frac{1}{n} + \frac{(x_i - \bar x)^2}{S_{xx}}$.
• The standardized residual will be large for a high-leverage observation.
• Influence can be thought of as the product of leverage and outlierness.
• Example: an observation can be influential/high leverage, but not an outlier.
4.2-C. SAS code for the tire example

data tire;
   input x y;
   datalines;
0 394.33
4 329.50
…
32 150.33
;
run;

proc reg data=tire;
   model y = x;
   /* save studentized residuals, leverages, Cook's distances, and DFFITS */
   output out=resid rstudent=r h=lev cookd=cd dffits=dffit;
run;

proc print data=resid;
   /* cutoffs: |r| >= 2; 4/9 matches both 2(k+1)/n for leverage and 4/n for Cook's d, with k = 1, n = 9 */
   where abs(r)>=2 or lev>(4/9) or cd>(4/9) or abs(dffit)>(2*sqrt(1/9));
run;
4.2-C. SAS output for the tire example
[Output listing not reproduced in these notes.]
4.2-D. How to deal with Outliers & Influential Observations
• Investigate (data errors? rare events? can they be corrected?)
• Ways to accommodate outliers:
• Nonparametric methods (robust to outliers)
• Data transformations
• Deletion (or report model results both with and without the outliers or influential observations, to see how much they change)
4.3 Data Transformations
Reasons:
• To achieve linearity
• To achieve homogeneity of variance
• To achieve normality or symmetry about the regression equation
Types of Transformation
• Linearizing Transformation: a transformation of the response variable, the predictor variable, or both, which produces an approximately linear relationship between the variables.
• Variance-Stabilizing Transformation: a transformation applied when the constant variance assumption is violated.
Linearizing Transformation
• Use a mathematical operation, e.g. square root, power, log, exponential, etc.
• Only one variable needs to be transformed in simple linear regression. Which one, predictor or response? Why?
e.g. For $Y = a\,e^{-bx}$, we take a (natural) log transformation on Y: $Y = a\,e^{-bx} \Leftrightarrow \log Y = \log a - b x$, which is linear in x.
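A sketch of this linearization in SAS, assuming a hypothetical input dataset decay with variables x and y generated by the exponential model:

data decay_log;
   set decay;        /* hypothetical dataset following Y = a*exp(-b*x) */
   logy = log(y);    /* natural log, so log Y = log a - b*x is linear in x */
run;

proc reg data=decay_log;
   model logy = x;   /* the slope estimates -b; the intercept estimates log a */
run;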
Variance-Stabilizing Transformation
Delta method: a two-term Taylor-series approximation gives
$$\mathrm{Var}(h(Y)) \approx [h'(\mu)]^2\, g^2(\mu), \qquad \text{where } \mathrm{Var}(Y) = g^2(\mu),\ E(Y) = \mu.$$
• Set $[h'(\mu)]^2 g^2(\mu) = 1$
• $h'(\mu) = \dfrac{1}{g(\mu)}$
• $h(\mu) = \displaystyle\int \frac{d\mu}{g(\mu)}$, i.e., $h(y) = \int \frac{dy}{g(y)}$
e.g. $\mathrm{Var}(Y) = c^2\mu^2$, where $c > 0$: $g(\mu) = c\mu \leftrightarrow g(y) = cy$, so
$$h(y) = \int \frac{dy}{cy} = \frac{1}{c}\ln y \propto \ln y.$$
Therefore it is the logarithmic transformation.
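The same recipe yields the familiar square-root transformation when the variance is proportional to the mean (as for count-like data):
$$\mathrm{Var}(Y) = c^2\mu \;\Rightarrow\; g(y) = c\sqrt{y}, \qquad h(y) = \int \frac{dy}{c\sqrt{y}} = \frac{2\sqrt{y}}{c} \propto \sqrt{y},$$
so $\sqrt{Y}$ approximately stabilizes the variance.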
5. Correlation Analysis
• Pearson Product Moment Correlation: a measure of how closely two variables share a linear relationship, $r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$.
• Useful when it is not possible to determine which variable is the predictor and which is the response.
• Health vs. wealth: which is the predictor? Which is the response?
Statistical Inference on the Correlation Coefficient ρ
• We can derive a test on the correlation coefficient in the same way that we have been doing in class.
• Assumptions: (X, Y) are from the bivariate normal distribution.
• Start with a point estimator: r, the sample correlation coefficient, estimates the population correlation coefficient ρ.
• Get the pivotal quantity: the distribution of r itself is quite complicated, so we use
• $T_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, the test statistic for ρ = 0.
• Do we know everything about the p.q.? Yes: $T_0 \sim t_{n-2}$ under H0: ρ = 0.
Bivariate Normal Distribution
• pdf:
$$f(x,y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2 \right] \right\}$$
• Properties:
• μ1, μ2: means of X, Y
• σ1², σ2²: variances of X, Y
• ρ: the correlation coefficient between X and Y
Derivation of T0
From $\hat\beta_1 = r\sqrt{S_{yy}/S_{xx}}$ and $s^2 = \frac{SSE}{n-2} = \frac{S_{yy}(1-r^2)}{n-2}$, the t-statistic for the slope can be rewritten as
$$t = \frac{\hat\beta_1}{s/\sqrt{S_{xx}}} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = T_0.$$
• Therefore, we can use t as a statistic for testing against the null hypothesis H0: β1 = 0.
• Equivalently, we can test against H0: ρ = 0.
Exact Statistical Inference on ρ
• Test: H0: ρ = 0, Ha: ρ ≠ 0
• Test statistic: $t_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
• Reject H0 iff $|t_0| > t_{n-2,\alpha/2}$
• Example: A researcher wants to determine if two test instruments give similar results. The two test instruments are administered to a sample of 15 students. The correlation coefficient between the two sets of scores is found to be 0.7. Is this correlation statistically significant at the .01 level?
• H0: ρ = 0, Ha: ρ ≠ 0
• $t_0 = \frac{0.7\sqrt{15-2}}{\sqrt{1-0.7^2}} = 3.534$; for α = .01, $t_0 = 3.534 > t_{13,.005} = 3.012$
• ▲ Reject H0: the correlation is statistically significant at the .01 level.
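Given the raw scores (which the slides do not show), the same test is produced directly by PROC CORR; a sketch with a hypothetical dataset scores containing variables test1 and test2:

proc corr data=scores pearson;
   var test1 test2;   /* prints r and the p-value for H0: rho = 0 (t-test with n-2 d.f.) */
run;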