AMS 572 Presentation CH 10 Simple Linear Regression
Introduction Example: Brad Pitt: 1.83m, Angelina Jolie: 1.70m; George Bush: 1.81m, Laura Bush: ?; David Beckham: 1.83m, Victoria Beckham: 1.68m ● Goal: to predict the height of the wife in a couple, based on the husband's height. Response (outcome or dependent) variable (Y): height of the wife. Predictor (explanatory or independent) variable (X): height of the husband.
Regression analysis: ● Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables. ● When there is just one predictor variable, we use simple linear regression; when there are two or more predictor variables, we use multiple linear regression. ● When it is not clear which variable represents the response and which the predictor, correlation analysis is used to study the strength of the relationship. History: ● The earliest form of linear regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809. ● The method was extended by Francis Galton in the 19th century to describe a biological phenomenon. ● This work was extended by Karl Pearson and Udny Yule to a more general statistical context around the turn of the 20th century.
A probabilistic model • x1, x2, …, xn: specific settings of the predictor variable • Y1, Y2, …, Yn: the corresponding values of the response variable
ASSUME: the observed value of the random variable Yi depends on xi through
Yi = β0 + β1·xi + εi,
where εi is a random error with E(εi) = 0, so the unknown mean of Yi is
E(Yi) = β0 + β1·xi (the true regression line),
with unknown intercept β0 and unknown slope β1.
4 BASIC ASSUMPTIONS
1. The mean of Yi is a linear function of xi: E(Yi) = β0 + β1·xi
2. The Yi have a common variance σ², the same for all values of x
3. The Yi are normally distributed
4. The Yi are independent
Comments:
1. "Linear" means linear in the parameters β0 and β1, not in x. Example: Y = β0 + β1·log x is still a linear model; set x′ = log x.
2. The predictor variable need not be set at predetermined fixed values; it can be random along with Y. Example: height and weight of children. Given height (X), predict weight (Y). The regression function is then the conditional expectation of Y given X = x: E(Y | X = x) = β0 + β1·x.
10.2 Fitting the Simple Linear Regression Model
10.2.1 Least Squares (LS) Fit
The “best” fitting straight line minimizes the sum of squared vertical deviations
Q = Σ [yi − (β0 + β1·xi)]².
The minimizing values β̂0 and β̂1 are the LS estimates. One way to find the LS estimates is to take the partial derivatives
∂Q/∂β0 = −2 Σ [yi − (β0 + β1·xi)] and ∂Q/∂β1 = −2 Σ xi·[yi − (β0 + β1·xi)].
Setting these partial derivatives equal to zero and simplifying, we get the normal equations
Σ yi = n·β0 + β1·Σ xi and Σ xi·yi = β0·Σ xi + β1·Σ xi².
To simplify, we introduce
Sxx = Σ(xi − x̄)² and Sxy = Σ(xi − x̄)(yi − ȳ).
We get
β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1·x̄.
The equation ŷ = β̂0 + β̂1·x is known as the least squares line, which is an estimate of the true regression line. A small computational sketch follows.
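The following is a minimal sketch, not from the slides, of computing these formulas in a SAS data step; the dataset DEMO and its four (x, y) points are made up for illustration.
data demo;
  input x y @@;
  datalines;
1 2  2 4  3 5  4 8
;
data _null_;
  set demo end=last;
  n + 1; sx + x; sy + y; sxx + x*x; sxy + x*y;   /* accumulate the sums */
  if last then do;
    xbar = sx/n; ybar = sy/n;
    Sxx_c = sxx - n*xbar**2;     /* Sxx = sum(x^2) - n*xbar^2 */
    Sxy_c = sxy - n*xbar*ybar;   /* Sxy = sum(x*y) - n*xbar*ybar */
    b1 = Sxy_c/Sxx_c;            /* slope estimate */
    b0 = ybar - b1*xbar;         /* intercept estimate */
    put b0= b1=;                 /* prints b0=0 b1=1.9 for these made-up data */
  end;
run;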
Example 10.2 (Tire Tread Wear vs. Mileage: LS Line Fit) Find the equation of the LS line for the tire tread wear data from Table 10.1, where x = mileage (in 1000s of miles), y = groove depth (in mils), and n = 9. From these data we calculate x̄ = 16, ȳ = 244.15, Sxx = 960, and Sxy = −6989.4.
The slope and intercept estimates are
β̂1 = Sxy/Sxx = −6989.4/960 = −7.281 and β̂0 = ȳ − β̂1·x̄ = 360.64.
Therefore, the equation of the LS line is
ŷ = 360.64 − 7.281·x.
Conclusion: there is a loss of 7.281 mils in the tire groove depth for every 1000 miles of driving. Given a particular x, e.g. x = 25 (i.e., 25,000 miles), we can find ŷ = 360.64 − 7.281 × 25 = 178.62, which means the mean groove depth for all tires driven for 25,000 miles is estimated to be 178.62 mils.
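In practice the fit comes from PROC REG. A hedged sketch, assuming the tire data are stored in a dataset TIRE with variables X (mileage, in 1000s of miles) and Y (groove depth, in mils):
proc reg data=tire;
  model y = x;   /* prints the intercept/slope estimates, their SEs,
                    the ANOVA table, R-square, and Root MSE */
run;
With the Table 10.1 data this should reproduce β̂0 = 360.64 and β̂1 = −7.281.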
10.2.2 Goodness of Fit of the LS Line • Coefficient of Determination and Correlation • The residuals ei = yi − ŷi (i = 1, …, n) are used to evaluate the goodness of fit of the LS line.
We define:
Total sum of squares: SST = Σ(yi − ȳ)²
Regression sum of squares: SSR = Σ(ŷi − ȳ)²
Error sum of squares: SSE = Σ(yi − ŷi)²
Note: SST = SSR + SSE.
r² = SSR/SST = 1 − SSE/SST is called the coefficient of determination, with 0 ≤ r² ≤ 1. The sample correlation coefficient is r = ±√r², with the sign taken from the sign of the slope β̂1.
Example 10.3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and Correlation) For the tire tread wear data, calculate r² and r using the results from Example 10.2. Using the sums of squares from the fitted line, we have r² = SSR/SST = 0.953. Next calculate r = −√0.953 = −0.976, where the sign of r follows from the sign of β̂1. Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.
10.2.3 Estimation of σ²
An unbiased estimate of σ² is given by s² = SSE/(n − 2).
Example 10.4 (Tire Tread Wear vs. Mileage: Estimate of σ²) Find the estimate of σ² for the tread wear data using the results from Example 10.3. We have SSE = 2351.3 and n − 2 = 7, therefore s² = 2351.3/7 = 335.9, which has 7 d.f. The estimate of σ is s = √335.9 = 18.33 mils.
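A one-line check of this arithmetic in a SAS data step, using only the numbers quoted above:
data sigma_hat;
  SSE = 2351.3; n = 9;
  s2 = SSE/(n - 2);   /* unbiased estimate of sigma^2, with 7 d.f. */
  s  = sqrt(s2);      /* about 18.33 mils */
  put s2= s=;
run;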
Statistical Inference on β0 and β1
Point estimators: β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1·x̄.
Sampling distributions of β̂0 and β̂1:
β̂1 ~ N(β1, σ²/Sxx) and β̂0 ~ N(β0, σ²·(1/n + x̄²/Sxx)); both estimators are unbiased.
For the mathematical derivations, please refer to the textbook, p. 331.
Statistical Inference on β0 and β1 (cont'd)
Pivotal quantities (p.q.'s):
(β̂0 − β0)/SE(β̂0) ~ t(n−2) and (β̂1 − β1)/SE(β̂1) ~ t(n−2),
where SE(β̂0) = s·√(1/n + x̄²/Sxx) and SE(β̂1) = s/√Sxx.
100(1−α)% CIs:
β̂0 ± t(n−2, α/2)·SE(β̂0) and β̂1 ± t(n−2, α/2)·SE(β̂1).
Statistical Inference on β0 and β1 (cont'd)
Hypothesis test: H0: β1 = β1^0 vs. H1: β1 ≠ β1^0
-- Test statistic: t0 = (β̂1 − β1^0)/SE(β̂1)
-- At the significance level α, we reject H0 in favor of H1 iff |t0| > t(n−2, α/2)
-- With β1^0 = 0, this can be used to show whether there is a linear relationship between x and Y
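A sketch of this test and CI for the tire example; the inputs β̂1 and Sxx come from Example 10.2 and s from Example 10.4, so the printed t0, p-value, and CI limits are illustrative rather than quoted from the slides.
data slope_inference;
  b1 = -7.281; Sxx = 960; s = 18.33; n = 9; alpha = 0.05;
  se_b1 = s/sqrt(Sxx);                  /* SE of the slope estimate */
  t0 = b1/se_b1;                        /* t-statistic for H0: beta1 = 0 */
  p  = 2*(1 - probt(abs(t0), n - 2));   /* two-sided p-value */
  tcrit = tinv(1 - alpha/2, n - 2);     /* t(n-2, alpha/2) */
  lo = b1 - tcrit*se_b1;
  hi = b1 + tcrit*se_b1;                /* 95% CI for beta1 */
  put t0= p= lo= hi=;
run;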
Analysis of Variance (ANOVA)
Mean square: a sum of squares divided by its degrees of freedom (d.f.). For simple linear regression, MSR = SSR/1 and MSE = SSE/(n−2).
ANOVA Table
Source of Variation | SS  | d.f. | MS              | F
Regression          | SSR | 1    | MSR = SSR/1     | F = MSR/MSE
Error               | SSE | n−2  | MSE = SSE/(n−2) |
Total               | SST | n−1  |                 |
The F-test of H0: β1 = 0 rejects at level α if F > f(1, n−2, α); for simple linear regression it is equivalent to the two-sided t-test, since F = t0².
Example: the ANOVA table for the tire tread wear data (printed automatically by PROC REG; output not reproduced here).
10.4 Regression Diagnostics
10.4.1 Checking for Model Assumptions • Checking for Linearity • Checking for Constant Variance • Checking for Normality • Checking for Independence
Checking for Linearity
xi = mileage, Yi = groove depth.
True regression line: Y = β0 + β1·x; fitted line: Ŷ = β̂0 + β̂1·x; Ŷi = fitted value.
Residual: ei = Yi − Ŷi.
Plot the residuals ei against xi; a systematic pattern (e.g., curvature) indicates that the linearity assumption fails.
Checking for Constant Variance Plot the residuals against the fitted values: a funnel-shaped pattern indicates that Var(Y) is not constant, while a random horizontal band around zero is what a sample residual plot looks like when Var(Y) is constant. (The slide's two example plots are not reproduced here.)
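A hedged sketch of how these residual plots can be produced, again assuming the TIRE dataset described earlier:
proc reg data=tire;
  model y = x;
  output out=resid p=yhat r=e;   /* fitted values and residuals */
run;
proc sgplot data=resid;
  scatter x=yhat y=e;            /* a funnel shape suggests non-constant variance */
  refline 0 / axis=y;            /* residuals should scatter evenly around zero */
run;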
Checking for Independence • Applies only to time series data • Does not apply to the simple linear regression model with independently sampled observations
10.4.2 Checking for Outliers & Influential Observations • What is an outlier? • Why checking for outliers is important • Mathematical definition • How to deal with them
10.4.2-A. Intro Recall the box-and-whiskers plot (Chapter 4): • A (mild) outlier is any observation that lies outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR], where IQR = Q3 − Q1 is the interquartile range • An (extreme) outlier lies outside [Q1 − 3·IQR, Q3 + 3·IQR] • Informally: an observation "far away" from the rest of the data
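A small numeric sketch of these fences, with made-up quartiles Q1 = 10 and Q3 = 20:
data fences;
  q1 = 10; q3 = 20;                                  /* hypothetical quartiles */
  iqr = q3 - q1;
  mild_lo = q1 - 1.5*iqr;  mild_hi = q3 + 1.5*iqr;   /* mild-outlier fences */
  extr_lo = q1 - 3*iqr;    extr_hi = q3 + 3*iqr;     /* extreme-outlier fences */
  put mild_lo= mild_hi= extr_lo= extr_hi=;
run;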
10.4.2-B. Why are outliers a problem? • They may indicate a sample peculiarity, a data entry error, or another problem • The regression coefficient estimates that minimize the sum of squares for error (SSE) are very sensitive to outliers >> bias or distortion of the estimates • Any statistical test based on sample means and variances can be distorted in the presence of outliers >> distortion of p-values • The result: faulty conclusions. (Estimators not sensitive to outliers are said to be robust.)
10.4.2-C. Mathematical Definition • Outlier: the standardized residual is given by e*i = ei / (s·√(1 − hii)) • If |e*i| > 2, then the corresponding observation may be regarded as an outlier • Example (Tire Tread Wear vs. Mileage): flagged observations are printed by the SAS code below • STUDENTIZED RESIDUAL: a type of standardized residual calculated with the current observation deleted from the analysis • The LS fit can also be excessively influenced by an observation that is not necessarily an outlier as defined above
10.4.2-C. Mathematical Definition • Influential observation: an observation with an extreme x-value, y-value, or both • Leverage: on average hii is (k+1)/n (k = number of predictors; k = 1 here); regard any hii > 2(k+1)/n as high leverage • If xi deviates greatly from x̄, then hii is large • The standardized residual will be large for a high-leverage observation • Influence can be thought of as the product of leverage and outlierness • Example: an observation can be influential/high leverage but not an outlier. (The slide's scatter plot and residual plot, shown with and without the point in eg. 1 and eg. 2, are not reproduced here.)
10.4.2-C. SAS code of the examples
SAS code:
proc reg data=tire;
  model y = x;
  output out=resid rstudent=r h=lev cookd=cd dffits=dffit;
proc print data=resid;
  where abs(r) >= 2 or lev > (4/9) or cd > (4/9) or abs(dffit) > (2*sqrt(1/9));
run;
(SAS output not reproduced here.)
10.4.2-D. How to deal with Outliers & Influential Observations • Investigate (Data errors? Rare events? Can they be corrected?) • Ways to accommodate outliers • Nonparametric methods (robust to outliers) • Data transformations • Deletion (or report model results both with and without the outliers or influential observations to see how much they change)
10.4.3 Data Transformations Reasons: • To achieve linearity • To achieve homogeneity of variance • To achieve normality or symmetry about the regression equation
Types of Transformation • Linearizing transformation: a transformation of the response variable, the predictor variable, or both, which produces an approximate linear relationship between the variables • Variance-stabilizing transformation: a transformation made when the constant-variance assumption is violated
Method of Linearizing Transformation • Use a mathematical operation, e.g. square root, power, log, exponential, etc. • Only one variable needs to be transformed in simple linear regression. Which one, predictor or response? Why?
e.g. For the model Y = a·exp(−bx), we take a logarithmic transformation of Y:
Y = a·exp(−bx) ⟺ log Y = log a − b·x,
which is linear in x. A fitting sketch follows.
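A hedged sketch of fitting the linearized model, assuming the same hypothetical TIRE-style dataset as before:
data tire_log;
  set tire;
  logy = log(y);    /* natural log of the response */
run;
proc reg data=tire_log;
  model logy = x;   /* intercept estimates log a; slope estimates -b */
run;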
Method of Variance-Stabilizing Transformation
Delta method: a two-term Taylor-series approximation gives
Var(h(Y)) ≈ [h′(μ)]²·g²(μ), where E(Y) = μ and Var(Y) = g²(μ).
• Set [h′(μ)]²·g²(μ) = 1 (a constant)
• Then h′(μ) = 1/g(μ)
• So h(μ) = ∫ dμ/g(μ), i.e., h(y) = ∫ dy/g(y)
e.g. Var(Y) = c²·μ², where c > 0: g(μ) = c·μ ↔ g(y) = c·y, so
h(y) = ∫ dy/(c·y) = (1/c)·log y.
Therefore it is the logarithmic transformation that stabilizes the variance.
Correlation Analysis • Correlation: a measure of how closely two variables share a linear relationship • Useful when it is not possible to determine which variable is the predictor and which is the response • Health vs. wealth: which is the predictor? Which is the response?
Statistical Inference on the Correlation Coefficient ρ • We can derive a test on the correlation coefficient in the same way we have been doing in class • Assumptions: (X, Y) follow the bivariate normal distribution • Start with a point estimator: R, the sample estimate of the population correlation coefficient ρ • Get the pivotal quantity: the distribution of R itself is quite complicated, so transform the point estimator into T = R·√(n−2)/√(1−R²) • Do we know everything about the p.q.? Yes: T ~ t(n−2) under H0: ρ = 0
Bivariate Normal Distribution
• pdf:
f(x, y) = 1/(2π·σ1·σ2·√(1−ρ²)) · exp{ −1/(2(1−ρ²)) · [ (x−μ1)²/σ1² − 2ρ(x−μ1)(y−μ2)/(σ1·σ2) + (y−μ2)²/σ2² ] }
• Properties:
μ1, μ2: means of X, Y
σ1², σ2²: variances of X, Y
ρ: the correlation coefficient between X and Y
Derivation of T
Starting from the t-statistic for the slope, t = β̂1/SE(β̂1) with SE(β̂1) = s/√Sxx, and using r = β̂1·√(Sxx/Syy) and s² = SSE/(n−2) = (1−r²)·Syy/(n−2), we get
t = r·√(n−2)/√(1−r²).
• Therefore, we can use t as a statistic for testing the null hypothesis H0: β1 = 0
• Equivalently, we can test H0: ρ = 0
Exact Statistical Inference on ρ • Test: H0: ρ = 0 vs. Ha: ρ ≠ 0 • Test statistic: t0 = r·√(n−2)/√(1−r²) ~ t(n−2) under H0 • Reject H0 if |t0| > t(n−2, α/2) • Example (from the textbook): A researcher wants to determine if two test instruments give similar results. The two test instruments are administered to a sample of 15 students. The correlation coefficient between the two sets of scores is found to be 0.7. Is this correlation statistically significant at the .01 level? H0: ρ = 0, Ha: ρ ≠ 0. For α = .01: t0 = 0.7·√13/√(1 − 0.7²) = 3.534 > t(13, .005) = 3.012 ▲ Reject H0
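A sketch of this exact test computed in a SAS data step, using the example's r = 0.7 and n = 15:
data exact_rho_test;
  r = 0.7; n = 15; alpha = 0.01;
  t0 = r*sqrt(n - 2)/sqrt(1 - r**2);   /* T = R*sqrt(n-2)/sqrt(1-R^2) */
  tcrit = tinv(1 - alpha/2, n - 2);    /* t(13, .005) = 3.012 */
  p = 2*(1 - probt(abs(t0), n - 2));   /* two-sided p-value */
  put t0= tcrit= p=;                   /* t0 = 3.534 > 3.012, so reject H0 */
run;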
Approximate Statistical Inference on ρ • There is no exact method of testing ρ against an arbitrary ρ0: the distribution of R is very complicated, and T ~ t(n−2) only when ρ = 0 • To test ρ against an arbitrary ρ0, use Fisher's normal approximation • Transform the sample estimate r to ψ̂ = (1/2)·ln[(1+r)/(1−r)]; then ψ̂ is approximately N(ψ, 1/(n−3)), where ψ = (1/2)·ln[(1+ρ)/(1−ρ)]
Approximate Statistical Inference on ρ • Test: H0: ρ = ρ0 vs. Ha: ρ ≠ ρ0 • Sample estimate: ψ̂ = (1/2)·ln[(1+r)/(1−r)], with ψ0 = (1/2)·ln[(1+ρ0)/(1−ρ0)] • Z statistic: z0 = (ψ̂ − ψ0)·√(n−3), approximately N(0, 1) under H0 • CI: ψ̂ ± z(α/2)/√(n−3) for ψ, back-transformed via ρ = (e^(2ψ) − 1)/(e^(2ψ) + 1) = tanh(ψ) to give a CI for ρ • Reject H0 if |z0| > z(α/2)
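A sketch of Fisher's approximate test and CI, reusing the r = 0.7, n = 15 example with ρ0 = 0; the α = .05 level here is an assumption for illustration.
data fisher_rho;
  r = 0.7; n = 15; rho0 = 0; alpha = 0.05;
  psi_hat = 0.5*log((1 + r)/(1 - r));       /* Fisher z-transform of r */
  psi0    = 0.5*log((1 + rho0)/(1 - rho0));
  z0 = (psi_hat - psi0)*sqrt(n - 3);        /* approx N(0,1) under H0 */
  zcrit = probit(1 - alpha/2);
  lo = tanh(psi_hat - zcrit/sqrt(n - 3));   /* back-transform the CI */
  hi = tanh(psi_hat + zcrit/sqrt(n - 3));   /* endpoints via rho = tanh(psi) */
  put z0= lo= hi=;
run;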
Approximate Statistical Inference on ρ using SAS • PROC CORR with the FISHER option reports the Fisher-transform confidence limits and test for ρ (the slide's code and output are not reproduced here; a hedged sketch follows)
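A hedged sketch of the PROC CORR route; the dataset SCORES and variables TEST1, TEST2 are assumed for illustration, not taken from the slide:
proc corr data=scores fisher(alpha=0.05 rho0=0);
  var test1 test2;   /* FISHER prints the z-transform CI and test for rho */
run;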