Every day is a new beginning in life. Every moment is a time for self-vigilance.
Simple Linear Regression • Regression model • Goodness of fit • Model diagnosis
Goal: to predict the length of the armspan for a given height • Hmm… • How long is my armspan?
Armspan Data
HEIGHT   ARMSPAN
68.75    64.25
75.75    70.25
45.75    43.00
66.75    66.25
66.50    66.75
72.25    71.25
48.25    47.25
…        …
75.50    70.00
75.00    77.25
64.00    65.25
68.50    67.50
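Review: Math Equation for a Line • Y: the response variable • X: the explanatory variable • $Y = a + bX$ • a: the intercept, the value of Y when X = 0 • b: the slope, the change in Y for a 1-unit increase in X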
Regression Model • The regression line models how the average value of Y changes with X. • Population regression line: the true (unknown) line for the whole population • Least-squares regression line: the estimate of the population line computed from the sample • The equation of a regression line is called the regression equation.
The Predicted Y Value • We use the regression line to estimate the average Y value at a specified X value, and we use this value to predict the Y value we might observe at that X value in the near future. • This predicted Y value, denoted $\hat{y}$ and pronounced “y hat,” is the Y value on the regression line. So the regression equation is $\hat{y} = a + bx$.
The Usage of the Regression Equation • Predict the value of Y for a given X value. E.g., we wish to predict a lady’s weight from her height. ** What is X? What is Y? ** Suppose a and b are estimated as -205 and 5, so $\hat{y} = -205 + 5x$. ** For ladies with a HT of 60”, the predicted WT is -205 + 5 × 60 = 95 pounds, the (estimated) average WT of all ladies with a HT of 60”.
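The prediction step is just evaluating the regression equation at the chosen X. A minimal sketch, using the illustrative estimates a = -205 and b = 5 from the example above; the function name `predict_weight` is hypothetical.

```python
def predict_weight(height_in: float, a: float = -205.0, b: float = 5.0) -> float:
    """Predicted (average) weight in pounds at a height in inches,
    using the example's illustrative estimates a = -205 and b = 5."""
    return a + b * height_in


print(predict_weight(60))  # → 95.0
```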
Examples of the Predicted Y • The predicted WT for a given HT • The predicted armspan for a given height
The Limitation of the Regression Equation • The regression equation should not be used to predict Y for X values (far) beyond the range in which the data were observed (extrapolation). E.g., given a HT of 40”, the regression equation gives a WT of -205 + 5 × 40 = -5 pounds!!
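One way to guard against extrapolation in practice is to flag any X outside the observed range before trusting the prediction. This is a sketch under an assumption: the slides do not give the observed height range for the weight example, so the range of 58 to 72 inches below is hypothetical, as is the helper name `predict_with_range_check`.

```python
import warnings


def predict_with_range_check(x, a, b, x_min, x_max):
    """Return a + b*x, warning if x lies outside [x_min, x_max],
    the range of X values actually observed in the data."""
    if not (x_min <= x <= x_max):
        warnings.warn(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "this prediction is an extrapolation and may be meaningless.",
            stacklevel=2,
        )
    return a + b * x


# Hypothetical observed height range of 58 to 72 inches (not given in the slides).
print(predict_with_range_check(40, -205, 5, 58, 72))  # warns, then prints -5
```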
The Unpredicted Part • The value $e = y - \hat{y}$ is the part of y that the regression equation (model) cannot capture. It is called the “residual,” denoted e, and it is an estimate of the “error” at this observation.
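As a worked example, the residual for the first row of the armspan data (HEIGHT 68.75, ARMSPAN 64.25) can be computed from the fitted coefficients reported in the Minitab output for this data set (Constant -3.728, HEIGHT 1.03655). The helper name `residual` is illustrative.

```python
def residual(x, y, a=-3.728, b=1.03655):
    """Residual e = y - y_hat for one observation; a and b default to the
    ARMSPAN-on-HEIGHT coefficients from the Minitab output."""
    y_hat = a + b * x  # the point on the regression line
    return y - y_hat


# First row of the armspan data: HEIGHT = 68.75, ARMSPAN = 64.25
print(round(residual(68.75, 64.25), 3))  # → -3.285
```

A negative residual means this person's armspan is shorter than the model's average for their height.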
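Least Squares Method • The regression line is the line that minimizes the sum of squared residuals, $SSE = \sum (y - \hat{y})^2$, so the intercept and slope of the regression line are: $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = r\,\frac{s_y}{s_x}$ and $a = \bar{y} - b\,\bar{x}$.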
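The least-squares formulas translate directly into code. This sketch fits the line to the 11 rows visible on the Armspan Data slide only; the full data set has 40 observations (Total DF = 39), so the estimates will not match the Minitab output exactly. The function names are illustrative.

```python
from statistics import fmean

# The 11 (HEIGHT, ARMSPAN) rows visible on the data slide (a subset of the 40).
HEIGHT = [68.75, 75.75, 45.75, 66.75, 66.50, 72.25, 48.25, 75.50, 75.00, 64.00, 68.50]
ARMSPAN = [64.25, 70.25, 43.00, 66.25, 66.75, 71.25, 47.25, 70.00, 77.25, 65.25, 67.50]


def least_squares(xs, ys):
    """Return (a, b) minimizing SSE = sum((y - (a + b*x))**2)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = least_squares(HEIGHT, ARMSPAN)
print(f"ARMSPAN_hat = {a:.3f} + {b:.5f} HEIGHT, SSE = {sse(HEIGHT, ARMSPAN, a, b):.2f}")
```

Any other choice of intercept or slope gives a larger SSE, which is exactly what "least squares" means.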
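Inference for Regression Slope b • Standard error of b: $SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$, where $s = \sqrt{SSE/(n-2)}$ • Confidence interval for the population slope $\beta$: $b \pm t^{*}\,SE_b$, with $t^{*}$ from the t distribution with $n - 2$ degrees of freedom • Hypothesis test of $H_0: \beta = 0$: $t = b / SE_b$, with $n - 2$ degrees of freedom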
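Applying these formulas to the armspan fit uses b = 1.03655 and SE(b) = 0.04082 from the Minitab output, with n = 40. One value is an assumption not found in the slides: the critical value t* = 2.024 for a 95% interval with 38 degrees of freedom, read from a standard t table.

```python
# Slope estimate and its standard error, as reported in the Minitab output.
b, se_b = 1.03655, 0.04082
n = 40  # Total DF = 39 = n - 1

# Assumption: t* = 2.024 is the 97.5th percentile of the t distribution with
# n - 2 = 38 degrees of freedom, taken from a t table (not from the slides).
t_star = 2.024

ci_low, ci_high = b - t_star * se_b, b + t_star * se_b
t_stat = b / se_b  # test statistic for H0: beta = 0 versus Ha: beta != 0

print(f"95% CI for beta: ({ci_low:.3f}, {ci_high:.3f})")
print(f"t = {t_stat:.2f} on {n - 2} df")
```

The computed t agrees with the T column (25.39) for HEIGHT in the Minitab output, and the interval excludes 0, consistent with P = 0.000.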
Goodness of Fit • For each observation: the residual • For the whole data set: the coefficient of determination $R^2 = 1 - SSE/SST$, where $SST = \sum (y - \bar{y})^2$, which measures the proportion of the variability in Y explained by the model (the linear regression of Y on X) • For simple linear regression (only one predictor), $R^2 = r^2$, the square of the correlation between X and Y
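The identity $R^2 = r^2$ for simple linear regression can be checked numerically. This sketch again uses only the 11 rows visible on the data slide, so its R² is not the 94.4% Minitab reports for all 40 observations; the point is that the two quantities agree.

```python
from math import sqrt
from statistics import fmean

# The 11 (HEIGHT, ARMSPAN) rows visible on the data slide (a subset of the 40).
HEIGHT = [68.75, 75.75, 45.75, 66.75, 66.50, 72.25, 48.25, 75.50, 75.00, 64.00, 68.50]
ARMSPAN = [64.25, 70.25, 43.00, 66.25, 66.75, 71.25, 47.25, 70.00, 77.25, 65.25, 67.50]


def r_squared_and_r(xs, ys):
    """Return (R^2 = 1 - SSE/SST from the least-squares fit, correlation r)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * sst)
    return 1 - sse / sst, r


r2, r = r_squared_and_r(HEIGHT, ARMSPAN)
print(f"R^2 = {r2:.4f}, r^2 = {r * r:.4f}")  # the two values agree
```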
Model Assumptions and Diagnosis • Independent observations • Y | X = x follows a normal distribution whose mean lies on the population regression line, with a common standard deviation σ (estimated by s) that does not depend on the value of x • Diagnosis: residual plot, residuals vs. fitted values
Residual Plot: Is the spread of the residuals roughly the same across all fitted values?
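The residual plot is judged by eye, but a rough numeric companion can make "same spread" concrete. The sketch below is a heuristic of our own, not a formal test and not a Minitab feature: it divides the SD of residuals with above-median fitted values by the SD of the rest, so a value far from 1 hints that the spread changes with the fitted value. The name `spread_ratio` and the synthetic data are hypothetical.

```python
from statistics import median, pstdev


def spread_ratio(fitted, residuals):
    """Heuristic: SD of residuals whose fitted value is above the median,
    divided by SD of those at or below it (both groups must be non-empty
    and the lower group must not have zero spread)."""
    cut = median(fitted)
    low = [e for f, e in zip(fitted, residuals) if f <= cut]
    high = [e for f, e in zip(fitted, residuals) if f > cut]
    return pstdev(high) / pstdev(low)


# Synthetic "funnel" pattern: residuals grow with the fitted value.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.1, -0.1, 2.0, -2.0, 2.0, -2.0]
print(round(spread_ratio(fitted, residuals), 1))  # → 20.0
```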
Minitab: Stat >> Regression >> Regression … • Select the response and predictors accordingly • Click “Graphs” for residual plots
Residual Plots • Click “Residuals versus fits”
Minitab Output
Regression Analysis: ARMSPAN versus HEIGHT
The regression equation is
ARMSPAN = -3.73 + 1.04 HEIGHT

Predictor   Coef      SE Coef   T       P
Constant    -3.728    2.660     -1.40   0.169
HEIGHT      1.03655   0.04082   25.39   0.000

S = 2.12905   R-Sq = 94.4%   R-Sq(adj) = 94.3%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   2922.8   2922.8   644.81   0.000
Residual Error   38    172.2      4.5
Total            39   3095.1
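The summary statistics in this output can be reproduced from the Analysis of Variance table. A small sketch, using only the printed (rounded) sums of squares, so S and F come out slightly different from Minitab's values, which are computed before rounding.

```python
from math import sqrt

# Values copied from the Minitab Analysis of Variance table.
ss_reg, ss_err, ss_tot = 2922.8, 172.2, 3095.1
df_reg, df_err, df_tot = 1, 38, 39

ms_err = ss_err / df_err                    # MSE
r_sq = ss_reg / ss_tot                      # R-Sq = SSR / SST
r_sq_adj = 1 - ms_err / (ss_tot / df_tot)   # R-Sq(adj)
s = sqrt(ms_err)                            # S, the estimate of sigma
f_stat = (ss_reg / df_reg) / ms_err         # F = MSR / MSE

print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, S = {s:.3f}, F = {f_stat:.1f}")
```

With one predictor, F equals the square of the slope's t statistic (25.39² ≈ 645), so the F test and the t test for the slope reach the same conclusion.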