Haroon Alam, Mitchell Sanders, Chuck McAllister-Ashley, and Arjun Patel Chapter 3 Review: Linear Regression
The Big Idea • Plot data on a scatterplot. • Interpret what you see: direction, form, strength, and outliers. • Numerical summary: mean of x and y, standard deviation of x and y, and r. • Find the least-squares regression line. • Assess how well it fits: r and r²
Vocabulary • Response Variable: output, dependent variable, y value • Explanatory Variable: input, independent variable, x value • Scatterplot: a mathematical diagram that shows values for two variables as points on a Cartesian plane; best used for quantitative data • Outlier: an observation that has a large residual
Vocabulary • Influential Point: a point that has a large effect on the slope of a regression line but has a small residual • Correlation: a measure of the direction and strength of the linear relationship between the explanatory and response variables • Residual: the difference between the observed value of the response variable and the value predicted by the regression line
Vocabulary • Least-squares Regression Line: the line that makes the sum of the squared vertical distances of the data points from the line as small as possible • Sum of Squared Errors: a measure of the difference between the estimated values based on the linear regression and the actual observations • Total Sum of Squares: a measure of the difference between the estimated values on the line y = ȳ and the actual observed values.
Vocabulary • Coefficient of Determination: the fraction of the variation in the values of the response variable that can be explained by the LSRL of y on x • Residual Plots: a plot of the residuals against the explanatory variable • Extrapolation: the use of a regression line for prediction outside the range of values of the explanatory variable • Lurking Variable: a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables
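The residual definition above can be made concrete with a short sketch. The fitted line and the observed point here are hypothetical numbers chosen for illustration, not from the chapter:

```python
# Hypothetical fitted line y-hat = 2 + 3x and one observed point (x = 2, y = 9)
a, b = 2.0, 3.0          # intercept and slope of the (assumed) LSRL
x_obs, y_obs = 2.0, 9.0  # one observed data point

y_hat = a + b * x_obs            # predicted value from the line
residual = y_obs - y_hat         # residual = observed - predicted

print(y_hat, residual)           # 8.0 1.0
```

A positive residual means the point lies above the regression line; a negative residual means it lies below.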
Key Topics • Data: categorical and quantitative • Scatterplots and descriptions • Strong/weak, positive/negative, linear/not linear • Outliers and Influential Points • Creating the least-squares regression line • Calculating correlation and coefficient of determination
Formulas • To calculate the correlation r: r = [1/(n − 1)] Σ [(xᵢ − x̄)/sₓ][(yᵢ − ȳ)/s_y] • To calculate the slope, b, of the least-squares regression line: b = r(s_y/sₓ) • To calculate the y-intercept: a = ȳ − b x̄ • To calculate the sum of squared errors, SSE: SSE = Σ (yᵢ − ŷᵢ)²
Formulas • To calculate the total sum of squares, SSM: SSM = Σ (yᵢ − ȳ)² • To calculate the coefficient of determination: r² = (SSM − SSE)/SSM = 1 − SSE/SSM • Or the correlation r could be squared • To calculate the residual: residual = y − ŷ
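The formulas on these two slides can be checked with a short from-scratch sketch. The function below follows the chapter's formulas directly (standardized-values correlation, b = r·s_y/sₓ, a = ȳ − b·x̄); the sample data is made up for illustration:

```python
from math import sqrt

def lsrl_stats(x, y):
    """Compute r, slope b, intercept a, SSE, SSM, and r^2 by the chapter's formulas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # sample standard deviations (divide by n - 1)
    s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # correlation: average product of standardized x and y values
    r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
            for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # y-intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ssm = sum((yi - y_bar) ** 2 for yi in y)
    r_sq = 1 - sse / ssm     # coefficient of determination
    return r, b, a, sse, ssm, r_sq

# Perfectly linear toy data (y = 2x): r and r^2 should both be 1, SSE should be 0
r, b, a, sse, ssm, r_sq = lsrl_stats([1, 2, 3], [2, 4, 6])
```

For this toy data the function returns r = 1, b = 2, a = 0, SSE = 0, and r² = 1, which matches what the formulas predict for a perfect linear fit.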
Calculator Key Strokes • To make a scatterplot with the calculator, first enter the explanatory variable data in L1. Then enter the corresponding response variable data in L2. Then push “2nd” “Y=” “ENTER” “ENTER” to turn on Plot1. Next push “ZoomStat” to view the scatterplot. • To overlay the least-squares regression line on the scatterplot, follow the steps above, then press “STAT”, choose “CALC”, and push “8”. After pushing “8”, store the regression equation by selecting RegEQ:, then pushing “VARS”, scrolling over to “Y-VARS”, and pushing “ENTER” twice. Push “ENTER” twice again to calculate the least-squares regression line. Next, push “ZoomStat” to view the scatterplot with the least-squares regression line overlaid.
Calculator Key Strokes • To calculate the least-squares regression line, r, and r², first push “MODE”. Scroll down to “Stat Diagnostics” and select “ON”. Hit “ENTER”. Enter the explanatory variable data in L1. Then enter the corresponding response variable data in L2. Press “STAT”, choose “CALC”, then push “8”. Hit “ENTER” five times. The y-intercept, slope, r, and r² will be calculated.
Calculator Key Strokes • To create a residual plot in the calculator, first enter the explanatory variable data in L1. Then enter the corresponding response variable data in L2. Next, calculate the least-squares regression line. Then, push “2nd” “Y=” “ENTER”. Turn on Plot1, make sure the scatterplot form is selected, and Xlist should be L1. Ylist should be changed to Resid. This is done by selecting Ylist, then pushing “2nd” “Stat” “Resid”. Next push “ZoomStat” to view the residual plot.
Example Problem • Using the shoe size (explanatory) and height (response) data from the slide, find the LSRL • Start by entering the shoe sizes into list 1 and the heights into list 2
Example Problem • Results of the Regression • a = 53.24 • b = 1.65 • r² = .8422 • r = .9177
Example Problem • Interpreting the intercept • When the shoe size is 0, the predicted height is about 53.24 inches • Of course this does not make much sense in the context of the problem • Interpreting the slope • For each increase of 1 in shoe size, we would expect height to increase by 1.65 inches • Making predictions • How tall might you expect someone to be who has a shoe size of 12.5? • Plug in 12.5 • Height = 53.24 + 1.65(12.5) = 73.865 inches
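The prediction step above is just substitution into the fitted line. A minimal sketch, using the a and b reported on the results slide (the function name is ours, for illustration):

```python
# Fitted line from the slide's regression output: height-hat = a + b * shoe_size
a, b = 53.24, 1.65

def predict_height(shoe_size):
    """Predicted height in inches from the LSRL."""
    return a + b * shoe_size

predicted = predict_height(12.5)  # matches the slide: 73.865 inches
```

Remember that this is a prediction (ŷ), not a guaranteed height, and that 12.5 should fall within the range of shoe sizes used to fit the line; otherwise this would be extrapolation.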
Helpful Hints • Our eyes are not good judges of how strong a linear relationship is. • Correlation requires that both variables be quantitative. • Correlation makes no distinction between explanatory and response variables. • r does not change when the units of measurement of x or y change. • The correlation r is always a number between -1 and 1.
Helpful Hints • Correlation measures the strength of linear relationships only. • The correlation is strongly affected by outliers. • Regression, unlike correlation, requires that we have an explanatory variable and a response variable. • The size of the LSRL slope does not determine how important a relationship is. • There is a close connection between correlation and the slope of the LSRL.
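The hint that correlation is strongly affected by outliers is easy to demonstrate. A sketch with made-up data: five points on a perfect line give r = 1, and adding a single outlier drags r down dramatically.

```python
from math import sqrt

def corr(x, y):
    """Correlation r, computed from sums of products of deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]                  # perfectly linear, so r = 1
r_clean = corr(x, y)
r_outlier = corr(x + [6], y + [0])   # one outlier collapses the correlation
```

Here r_clean is 1, while the single outlier point (6, 0) drops the correlation to roughly 0.14; this is why a scatterplot should always accompany a reported r.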
Helpful Hints • Do not forget to use y-hat in the equations. • Write the equation in the form ŷ = a + bx. • Extrapolation produces unreliable predictions. • Lurking variables can make correlation misleading. • Correlations based on averages are usually too high when applied to individuals. • Association does not imply causation.