MSP 5410 Statistics, Lecture 11, Dr. Chappell

About ANOVA
When a scenario calls for a comparison of 3 or more groups, ANOVA is the statistical test that you should select. The previous note slides gave more detailed information about ANOVA, for example, what ANOVA stands for.
Introduction to regression analysis – Interpreting printouts Chapter 18

How to determine the relationship between two interval-level variables (p. 323)
Simple linear regression
• Simple regression
• Ordinary least squares (p. 333)
• Linear regression (p. 333)
• Involves two variables that are measured at the interval level

Simple Regression (involving the statistical/prediction relationship, not the functional relationship)
The first variable, the one you will use to predict the other variable, can be referred to as the:
• Independent variable (IV)
• Predictor variable
• Regressor variable
• X-variable
The second variable is the one you are predicting. It is sometimes referred to as the:
• Dependent variable (DV)
• Response variable
• Predicted variable
• Y-variable

Independent vs. dependent variable exercise
• Page 324: you might review this exercise.

When to Use Regression Analysis
• For prediction (*It is important to know that prediction is the purpose for which regression should be indicated as the appropriate statistical test for hypothesis testing.)
• Our discussion focuses on simple regression, involving only 2 variables
– Variable X
– Variable Y
• Want to predict Y, given X

Eyeballing relationship in a scatterplot/scattergram
Examining the co-relation of two variables (There’s something called correlational analysis that is related to regression.)
• Shape
• Direct/positive (data points move lower left to upper right)
– Low values on the x-axis correspond to low values on the y-axis, and high values on the x-axis correspond to high values on the y-axis
• Inverse/negative (data points move upper left to lower right)
– Low values on the x-axis correspond to high values on the y-axis, and high values on the x-axis correspond to low values on the y-axis
*See example scatterplots/scattergrams on the next two slides
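The eyeballing rule above can also be checked numerically. The following is only a rough sketch: the function name `direction` and the data points are hypothetical, invented for illustration, and are not taken from the textbook or the handout. It sums the products of each point's deviations from the X and Y means; a positive sum matches the lower-left to upper-right pattern and a negative sum matches the upper-left to lower-right pattern.

```python
def direction(xs, ys):
    """Classify a relationship as direct or inverse from how X and Y vary together."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when high X goes with high Y.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_variation > 0:
        return "direct/positive"
    if co_variation < 0:
        return "inverse/negative"
    return "no linear trend"


# Hypothetical data, invented only for illustration (NOT from the handout).
police_cars = [0, 1, 2, 3, 4]
speeds = [72, 70, 67, 64, 62]
print(direction(police_cars, speeds))  # inverse/negative
```

A scatterplot remains the first thing to look at; the sign check only summarizes the overall direction and says nothing about how closely the points follow a line.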
Scatterplot – Motorist speed and police car presence (Refer to plot at top of page 327) – Shows inverse relationship
Scatterplot – IQ and pulse rate – Shows positive relationship

Properties of regression line
• The relationship between any two variables can be defined (summarized) by a line. (p. 328)
• There are two important values associated with a “line”. (p. 330)
– Slope
– Intercept

Equation for Simple Regression (Concepts found on p. 469. Again, the assumption is a linear relationship.)
Ŷ = α + βX (p. 330): formula that describes the “line” (Ŷ = statistician’s symbol for the predicted value of Y, pronounced “Y-hat”, p. 331)
• α = y-intercept or intercept (point where the line crosses the y-axis; also the value of Ŷ when X = 0)
• β = slope of the line (slant of the line, which represents the average change in Y for each one-unit change in X, the independent variable)
For this course, you will only need to know the formula for the regression equation, NOT the formula for α or β as shown in the text.

Format for regression equations involving sample data (p. 335)
Ŷ = a + bX (p. 335)
• Formula that describes the “line” when using sample data
• a = y-intercept or intercept (same definition as on the previous slide)
• b = slope of the line (same definition as on the previous slide)

How to write regression equations using SPSS output
Ŷ = a + bX (Again, this is the general formula when using sample data.)
Use the table titled “Coefficients”.
• The value for a (y-intercept) is located in the (Constant) row under “B” in the Unstandardized Coefficients section of the table.
• The value for b (slope of the line) is located in the 2nd (bottom) row under “B” in the Unstandardized Coefficients section of the table.

Practice writing regression equations using SPSS output – based on 4/21/2012 in-class handout
Ŷ = a + bX (This is the general formula when using sample data, but substitute the actual values for a and b when writing the equations. Actual values are located in the SPSS output as discussed in the previous slide.)
• The regression equation for predicting motorist speed from number of police cars is: Ŷ = 72.2 + (-2.55) X (Used 4th table on the handout)
• The regression equation for predicting pulse rate from IQ scores is: Ŷ = 7.714 + 0.897 X (Used 1st table on the handout)
*Note that the form of these equations is the same as that of the general formula above.
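The substitution step can be sketched in a few lines of Python. This is an illustration only: `regression_equation` is a hypothetical helper name, not something from SPSS or the textbook, and the values of a and b are typed in by hand from the equations on this slide rather than read from an SPSS file.

```python
def regression_equation(a, b):
    """Write Y-hat = a + bX with the sample intercept a and slope b filled in."""
    # Wrap a negative slope in parentheses, as the slide does.
    slope = f"({b})" if b < 0 else f"{b}"
    return f"Ŷ = {a} + {slope} X"


# a and b copied from the slide (the "B" column of the Coefficients table).
print(regression_equation(72.2, -2.55))   # Ŷ = 72.2 + (-2.55) X
print(regression_equation(7.714, 0.897))  # Ŷ = 7.714 + 0.897 X
```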

Practice predicting Ŷ when value of X is given – based on 4/21/2012 in-class handout
What is the predicted motorist speed when the number of police cars = 6? (X = 6)
Ŷ = 72.2 + (-2.55) X
= 72.2 + (-2.55)(6)
= 72.2 - 15.3
= 56.9
What is the predicted pulse rate when IQ = 91? (X = 91)
Ŷ = 7.714 + 0.897 X
= 7.714 + 0.897 (91)
= 7.714 + 81.627
= 89.341
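The two hand calculations can be double-checked with a short script. This is a minimal sketch: the function name `predict` is an assumption made for illustration, and the intercepts and slopes are the ones quoted on this slide.

```python
def predict(a, b, x):
    """Return the predicted value Y-hat = a + b * x."""
    return a + b * x


# Motorist speed predicted from number of police cars (X = 6).
speed = predict(72.2, -2.55, 6)
# Pulse rate predicted from IQ (X = 91).
pulse = predict(7.714, 0.897, 91)

# Round to three decimals to hide floating-point noise.
print(round(speed, 3))  # 56.9
print(round(pulse, 3))  # 89.341
```

Keeping a, b, and X as separate arguments mirrors the general form Ŷ = a + bX, so the same function works for any simple regression equation written from the Coefficients table.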

OTHER IMPORTANT REGRESSION CONCEPTS
• COEFFICIENT OF DETERMINATION
– R² (For 2 variables, actually r² for sample data.) Also written R Square in SPSS results.
– Defined as: the amount of variation in Y that is accounted for (EXPLAINED) by X.
• COEFFICIENT OF NONDETERMINATION
– Complement of R² (that is, 1 - R²)
– Defined as: the amount of variation in Y that is NOT accounted for (UNEXPLAINED) by X.

Practice 1: computing coefficients of determination and nondetermination using SPSS output – based on 4/21/2012 in-class handout
• To identify the amount of variance in motorist speed that is EXPLAINED by the number of police cars, use the table titled “Model Summary” and locate the value associated with R Square. The appropriate data involving these variables is found in the 2nd table. The value for R Square is 0.942, or 94.2% when converted to a percent. This value is the coefficient of determination.
• To identify the amount of variance in motorist speed that is NOT EXPLAINED by the number of police cars, first you need to know that total variance (EXPLAINED + UNEXPLAINED) is 1.000 or 100%. Since the UNEXPLAINED variance is the complement of R Square, you subtract the coefficient of determination from 1.000 (if using decimal numbers) or 100% (if using percent). Thus, the coefficient of nondetermination (UNEXPLAINED variance/variance NOT EXPLAINED) is calculated as follows:
1.000 - 0.942 = 0.058
100.0% - 94.2% = 5.8%
(Compare results in the Model Summary table (2nd table for these variables) to the hand calculation in Step 6 on p. 343.)

Practice 2: computing coefficients of determination and nondetermination using SPSS output – based on 4/21/2012 in-class handout
• To identify the amount of variance in pulse rate that is EXPLAINED by IQ, use the table titled “Model Summary” and locate the value associated with R Square. The appropriate data involving these variables is found in the 5th table. The value for R Square is 0.614, or 61.4% when converted to a percent. This value is the coefficient of determination.
• To identify the amount of variance in pulse rate that is NOT EXPLAINED by IQ, again you need to know that total variance (EXPLAINED + UNEXPLAINED) is 1.000 or 100%. Since the UNEXPLAINED variance is the complement of R Square, you subtract the coefficient of determination from 1.000 (if using decimal numbers) or 100% (if using percent). Thus, the coefficient of nondetermination (UNEXPLAINED variance/variance NOT EXPLAINED) is calculated as follows:
1.000 - 0.614 = 0.386
100.0% - 61.4% = 38.6%
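Both practice calculations follow the same subtraction, so they can be checked together. The sketch below assumes the R Square values quoted in the two practice slides (0.942 and 0.614); the function name `coefficient_of_nondetermination` is a hypothetical helper chosen for illustration, not an SPSS feature.

```python
def coefficient_of_nondetermination(r_square):
    """Return the unexplained share of variance, 1 - R Square."""
    if not 0.0 <= r_square <= 1.0:
        raise ValueError("R Square must be between 0 and 1")
    return 1.0 - r_square


# R Square values as quoted from the Model Summary tables in the practice slides.
examples = [
    ("motorist speed from police cars", 0.942),
    ("pulse rate from IQ", 0.614),
]
for label, r_square in examples:
    unexplained = coefficient_of_nondetermination(r_square)
    print(f"{label}: explained {r_square:.1%}, unexplained {unexplained:.1%}")
# motorist speed from police cars: explained 94.2%, unexplained 5.8%
# pulse rate from IQ: explained 61.4%, unexplained 38.6%
```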