Simple Linear Regression

Simple Linear Regression If it’s simple, how bad can it be?

Linear Functions • Does this look familiar? • We use something very similar: $y = a + bx$ • Where: b is the slope; a is the y-intercept; x is the independent (predictor) variable; y is the dependent (response) variable

Regression Line • Use the regression line to make a prediction about the dependent variable: $\hat{y} = a + bx$ • $\hat{y}$, which is typically called “y-hat,” means an estimate of y • For example, x might be the ACT score of entering freshmen

Regression Example • Made-up example comparing cigarette smoking to health issues • y = number of health problems experienced by people between the ages of 65 and 70 • x = number of packs of cigarettes they smoked per day between the ages of 20 and 50 • Our goal is to create an equation that will help us predict the value of y from the value of x • Source: http://www.hippocampus.org/course_locator?skinPath=http://www.hippocampus.org/hippocampus.skins/default&course=Statistics%20for%20Social%20Sciences&lesson=18&topic=3&topicTitle=Regression%20Examples

Cigarette Study

Scatterplot

Coefficient of Correlation

Correlation

Coefficient of Correlation
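As a rough illustration of how a coefficient of correlation can be computed, the sketch below applies the standard Pearson formula (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The data values and the names `packs_per_day`, `health_problems`, and `pearson_r` are hypothetical; they are not the figures from the cigarette study.

```python
import math

# Hypothetical data for illustration only; NOT the cigarette study values.
packs_per_day = [0, 1, 1, 2, 2, 3, 4]       # x
health_problems = [3, 5, 4, 9, 8, 12, 14]   # y


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(packs_per_day, health_problems)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.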

Correlations • http://www.duxbury.com/authors/mcclellandg/tiein/johnson/correlation.htm • http://www.rossmanchance.com/applets/guesscorrelation/GuessCorrelation.html

Least Squares Method • Once again, we’re back to variances • Recall: variance squares the deviation from the mean • The line with the smallest sum of squared distances from the data points is our least squares line • How do we do that? • We have values for y and x, we need a and b: $b = r \frac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$ • Where r is the correlation coefficient for the line, $s_x$ and $s_y$ are the standard deviations of x and y, and $\bar{x}$ and $\bar{y}$ are their means
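One common way to get a and b from the correlation coefficient is slope = r times (standard deviation of y / standard deviation of x), and intercept = mean of y minus slope times mean of x. The sketch below follows that route. The data and the function name `least_squares_line` are hypothetical, chosen only for illustration; they are not the cigarette study values.

```python
import math
import statistics

# Hypothetical data for illustration only; NOT the cigarette study values.
x = [0, 1, 1, 2, 2, 3, 4]
y = [3, 5, 4, 9, 8, 12, 14]


def least_squares_line(x, y):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    # Correlation coefficient r from deviations about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    # Slope: r scaled by the ratio of sample standard deviations
    b = r * statistics.stdev(y) / statistics.stdev(x)
    # Intercept: forces the line through the point (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b


a, b = least_squares_line(x, y)
print(f"y-hat = {a:.3f} + {b:.3f} * x")
```

Because the intercept is computed from the means, the fitted line always passes through the point of means.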

Solve for Slope (b)

Solve for Intercept (a)

Regression Equation • Substituting the values we just calculated, our regression equation is: • If we enter a value for x into this equation, we will get a predicted value for y
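Entering a value for x into a regression equation can be sketched as a one-line function. The coefficients below are placeholders chosen for illustration (assumptions, not the values calculated for the cigarette study), and `predict` is a hypothetical helper name.

```python
def predict(x, a, b):
    """Return the predicted value y-hat = a + b*x."""
    return a + b * x


# Placeholder coefficients; NOT the values from the cigarette study.
a = 2.0  # intercept
b = 3.0  # slope

for packs in (0, 1, 2.5):
    print(f"x = {packs}: y-hat = {predict(packs, a, b)}")
```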

Scatterplot

Least Square Error • The formula gives us an approximation for the value of the dependent variable • The actual value of y is obtained by measurement • If the regression line were perfect, every predicted value would equal the actual value: $\hat{y} = y$

Least Square Error • The deviation of the actual score from the predicted score is the error term: $e = y - \hat{y}$

Cigarette Study • Sum of errors = 0
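The errors of a least squares line (with an intercept) always sum to zero, apart from floating-point rounding. The sketch below checks this on made-up numbers. The data are hypothetical and are not the cigarette study values, so the sum of squared errors printed here will not match the figure for the study.

```python
# Hypothetical data for illustration only; NOT the cigarette study values.
x = [0, 1, 1, 2, 2, 3, 4]
y = [3, 5, 4, 9, 8, 12, 14]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept (deviation-sums form of the fit)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x

# Error term for each observation: actual value minus predicted value
errors = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sum_of_errors = sum(errors)
sum_of_squared_errors = sum(e ** 2 for e in errors)

print(f"sum of errors = {sum_of_errors:.10f}")
print(f"sum of squared errors = {sum_of_squared_errors:.3f}")
```

Since positive and negative errors cancel, the sum of errors says nothing about fit quality; the squared errors are what measure it.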

Error Variance • Sum of squared errors = 108.571 • Error variance = sum of squares/df = 108.571/5 = 21.7
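The arithmetic on this slide can be checked with a short sketch. In simple linear regression the degrees of freedom are n - 2, because two parameters (a and b) are estimated. The sample size of 7 is an inference from df = 5, not a number stated on the slide, and `error_variance` is a hypothetical helper name.

```python
def error_variance(sum_of_squared_errors, n_observations):
    """Error variance for simple linear regression: SSE / (n - 2)."""
    df = n_observations - 2  # two estimated parameters: a and b
    return sum_of_squared_errors / df


sse = 108.571  # sum of squared errors, taken from the slide
n = 7          # assumption: implied by df = 5 on the slide

variance = error_variance(sse, n)
print(round(variance, 1))  # 21.7
```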
