Midterm Review Ch 7-8: Requests for Help by Chapter
Chapter 7 • Describe the characteristics of the relationship between two variables. • Discuss the null and research hypotheses for correlation. • Discuss using the r table for determining significance.
Chapter 8 • Discuss the steps in making raw data predictions from raw data values. • Discuss the situations when you cannot use regression. • Discuss the inappropriateness of predicting outside of the sample range. • Discuss the null hypothesis in regression. • Discuss alpha levels and critical values with respect to statistical significance. • Discuss residual error when predicting from regression.
Describe the characteristics of the relationship between two variables.
Describe the characteristics of the relationship between two variables • Three dimensions characterize the relationship between two variables: • linearity, • direction, • and strength.
Linearity • The relationship is either linear or curvilinear. • In a linear relationship, as scores on one variable increase, scores on the other variable either generally increase or generally decrease. • In a curvilinear relationship, as scores on one variable increase, scores on the other variable move first in one direction, then in another direction.
Direction • The direction of a relationship is either positive or negative. • In a positive relationship, as scores on one variable increase, scores on the other variable increase. So a best fitting line rises from left to right on a graph and therefore has a positive slope. • In a negative relationship, as scores on one variable increase, scores on the other variable decrease. So a best fitting line falls from left to right on a graph and therefore has a negative slope.
Strength • The strength of a correlation indicates how predictable one variable is from the other. • In a strong relationship, tX and tY scores are consistently similar or consistently dissimilar. So, you are able to accurately predict one score from the other. • In a weak relationship, tX and tY scores are inconsistent in their similarity or dissimilarity. So, you are only able to somewhat predict one score from the other. • In an independent relationship, there is no consistency in the relationship of the tX and tY scores. So, it is impossible to predict one score from the other.
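To make direction and strength concrete, here is a minimal Python sketch that computes r from paired t scores. It assumes t scores are computed with the sample standard deviation (SS divided by n - 1), which makes r the sum of the tX × tY cross-products divided by nP - 1. The function names and the small data sets are hypothetical, chosen only for illustration.

```python
import math

def t_scores(values):
    """Convert raw scores to t scores using the sample SD (SS divided by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

def pearson_r(xs, ys):
    """r = sum of paired tX * tY cross-products divided by nP - 1."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of pairs")
    tx, ty = t_scores(xs), t_scores(ys)
    return sum(a * b for a, b in zip(tx, ty)) / (len(xs) - 1)

# Hypothetical data: the sign of r gives the direction, |r| the strength.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(round(r, 3), direction)
```

When tX and tY are consistently similar, the cross-products are mostly positive and r approaches +1. When they are consistently dissimilar, r approaches -1. When there is no consistency, the cross-products cancel out and r sits near 0.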
Discuss alpha levels and critical values with respect to statistical significance. Discuss the null and research hypotheses for correlation. Discuss the null hypothesis in regression. Discuss using the r table for determining significance.
Alpha levels and significance • Scientists are a careful bunch. They are very careful not to make a Type 1 error. • A Type 1 error is when you mistakenly say that you have found a relationship in a population when the relationship you found in your random sample doesn’t exist in the population as a whole. • To be careful, scientists will only say there is a relationship if the probability of a Type 1 error is very low (5 in 100 or lower).
. . . Alpha levels and significance • These probabilities are called alpha levels. • The typical alpha levels are p ≤ .05 or p ≤ .01. • These represent 5 out of 100 or 1 out of 100. • A sample r that is far enough from 0.000 to occur 5 or fewer times in 100 when rho actually equals zero is called significant.
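The meaning of an alpha level of .05 can be checked with a Monte Carlo simulation. This technique is swapped in here purely for illustration; the course itself uses the r table. The sketch below draws many random samples in which X and Y are truly independent (rho = 0) and counts how often |r| reaches a critical value. It assumes nP = 12 pairs (df = 10) and uses .576, the .05 value listed for df = 10 in the r table later in this review. The function names, trial count, and seed are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

def type1_rate(n_pairs, critical_r, trials=20000, seed=1):
    """Share of samples with |r| >= critical_r when rho = 0 (X, Y independent)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_pairs)]
        ys = [rng.gauss(0, 1) for _ in range(n_pairs)]
        if abs(pearson_r(xs, ys)) >= critical_r:
            hits += 1  # a "significant" r even though rho is 0: a Type 1 error
    return hits / trials

# Should come out close to .05, i.e. about 5 Type 1 errors per 100 samples.
print(type1_rate(12, 0.576))
```

The simulated rate lands near .05 because .576 is exactly the distance from 0.000 that a sample r reaches only about 5 times in 100 when rho is zero.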
The Null Hypothesis • The null hypothesis (H0) states that a non-zero correlation in a sample between two variables is the result of random sampling fluctuation. • Therefore, there is no underlying relationship in the population as a whole. • In mathematical terms, rho = 0.000.
The Alternative Hypothesis • The alternative hypothesis is the opposite of the null hypothesis. • It states that there is an underlying relationship in the population as a whole and that it is reflected by a non-zero correlation in your random sample.
Rejecting the Null Hypothesis • The purpose of research is to reject the null hypothesis. • We reject the null hypothesis when the correlation is significant. • The correlation is significant when the probability of getting an r this far from 0.000 through random sampling fluctuation alone (that is, if rho = 0.000) is at or below the .05 or .01 alpha level.
Using the r Table to Determine Significance • First, calculate r. • Then, determine the degrees of freedom (nP - 2, where nP is the number of pairs of scores). • In the row of the r table for your degrees of freedom, check whether r falls outside the CI.95 range in Column 2. If it does, r is significant.
df       Non-significant (CI.95)     .05      .01
1        -.996 to .996               .997     .9999
2        -.949 to .949               .950     .990
3        -.877 to .877               .878     .959
4        -.810 to .810               .811     .917
5        -.753 to .753               .754     .874
6        -.706 to .706               .707     .834
7        -.665 to .665               .666     .798
8        -.631 to .631               .632     .765
9        -.601 to .601               .602     .735
10       -.575 to .575               .576     .708
11       -.552 to .552               .553     .684
12       -.531 to .531               .532     .661
. . .    . . .                       . . .    . . .
100      -.194 to .194               .195     .254
200      -.137 to .137               .138     .181
300      -.112 to .112               .113     .148
500      -.087 to .087               .088     .115
1000     -.061 to .061               .062     .081
2000     -.043 to .043               .044     .058
10000    -.019 to .019               .020     .026
Look in the df column for the row that has your degrees of freedom. Does r fall inside the non-significant range? Then it is non-significant. Does r fall outside it, at or beyond the .05 (or .01) value? Then it is significant.
Significant Correlation • If r is non-significant, we continue to accept the null hypothesis and say that rho = 0.000 in the population. • If r is significant, we reject the null hypothesis at the .05 or .01 alpha level. We assume that rho is best estimated by r, the correlation found in the random sample.
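The table lookup can be sketched in Python. The dictionary below copies only the rows shown in the r table on the previous slide. Rows the slide elides (df 13 to 99, and others) are not reproduced, so the helper raises an error for them instead of guessing. The names `R_TABLE` and `significance` are hypothetical.

```python
# Rows copied from the r table: df -> (.05 value, .01 value).
R_TABLE = {
    1: (0.997, 0.9999), 2: (0.950, 0.990), 3: (0.878, 0.959), 4: (0.811, 0.917),
    5: (0.754, 0.874), 6: (0.707, 0.834), 7: (0.666, 0.798), 8: (0.632, 0.765),
    9: (0.602, 0.735), 10: (0.576, 0.708), 11: (0.553, 0.684), 12: (0.532, 0.661),
    100: (0.195, 0.254), 200: (0.138, 0.181), 300: (0.113, 0.148),
    500: (0.088, 0.115), 1000: (0.062, 0.081), 2000: (0.044, 0.058),
    10000: (0.020, 0.026),
}

def significance(r, n_pairs):
    """Return ".01", ".05", or "non-significant" for a sample r with nP pairs."""
    df = n_pairs - 2  # degrees of freedom for r
    if df not in R_TABLE:
        raise KeyError(f"df = {df} is not one of the rows reproduced here")
    crit05, crit01 = R_TABLE[df]
    if abs(r) >= crit01:   # r falls outside the .01 range: reject H0 at .01
        return ".01"
    if abs(r) >= crit05:   # r falls outside the CI.95 range: reject H0 at .05
        return ".05"
    return "non-significant"  # keep treating rho as 0.000

print(significance(0.60, 12))   # df = 10
```

Using `abs(r)` reflects that a negative r is judged against the same distance from 0.000 as a positive one.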
Discuss the situations when you cannot use regression. Discuss the inappropriateness of predicting outside of the sample range.
Use Regression Carefully • When we have the entire population and compute rho, • We know all of the values of X and Y. • We know the direction and strength of the relationship between X and Y variables. • Therefore, we can safely use the regression equation to predict Y from X. • Even when rho is exactly zero, the regression equation is still right. It tells us to predict that everyone will score at the mean of Y.
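The claim that rho = 0 predicts the mean of Y follows directly from the regression equation in t-score form, predicted tY = rho × tX. The sketch below assumes the population means and standard deviations are already known. The function name and the example numbers (mean X = 100, SD 15, mean Y = 50, SD 10) are hypothetical.

```python
def predict_from_population(x, mean_x, sd_x, mean_y, sd_y, rho):
    """Predict Y from X when the whole population (and rho) is known."""
    t_x = (x - mean_x) / sd_x         # raw X -> t score
    t_y_pred = rho * t_x              # regression equation in t-score form
    return mean_y + t_y_pred * sd_y   # predicted t score -> raw Y

# With rho = 0 every X gets the same prediction: the mean of Y.
print(predict_from_population(130, 100, 15, 50, 10, 0.0))
print(predict_from_population(130, 100, 15, 50, 10, 0.5))
```

When rho is 0, the predicted t score is 0 for everyone, and a t score of 0 is the mean.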
Samples and Regression • When we have a random sample from a population, we can only predict when • r is significant, otherwise we assume that rho is 0, • the relationship between the variables is linear, otherwise it is inappropriate to use correlation at all, • the X score is within the range of X scores in the sample, because for values outside of the range, you do not know if the linear relationship holds.
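The three conditions on this slide can be written as a simple guard. Linearity is assumed to be judged beforehand (for example, from a scatterplot), and significance is assumed to come from the r table, so both are passed in as flags. Only the sample-range check is computed. The function `can_predict` and its parameters are hypothetical.

```python
def can_predict(x, sample_xs, r_is_significant, looks_linear):
    """Return (ok, reason) for the three conditions for predicting from a sample."""
    if not r_is_significant:
        return False, "r is not significant; assume rho = 0 and predict the mean of Y"
    if not looks_linear:
        return False, "relationship is not linear; correlation is inappropriate"
    if not (min(sample_xs) <= x <= max(sample_xs)):
        return False, "X is outside the sample range; linearity there is unknown"
    return True, "ok to use the regression equation"

print(can_predict(3, [1, 2, 5], r_is_significant=True, looks_linear=True))
print(can_predict(6, [1, 2, 5], r_is_significant=True, looks_linear=True))
```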
Discuss the steps in making raw data predictions from raw data values.
Describe the steps in making predictions from raw data. Scientists are often interested in taking a score on one variable and predicting the score on another variable. Before you predict, you must first ensure that there is a linear relationship between the two variables. Then, you must calculate the correlation coefficient and check that it is significant. You also must check that the score you are predicting from is within the range of scores in the original sample. If these conditions are met, you can use the regression equation to predict. First, convert the predictor score (X) to a t score, tX. Then, plug tX and the correlation (r) into the regression equation, which in t-score form says that the predicted t score for Y equals r times tX. Solve the equation for the predicted t score. Finally, convert the predicted t score back into a raw score by multiplying it by the standard deviation of Y and adding the mean of Y.
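The raw-score prediction steps can be traced in a short Python sketch. It assumes t scores use the sample standard deviation (SS divided by n - 1). It also assumes the caller has already confirmed that the relationship is linear and r is significant, so only the sample-range check is done inside. The function names and data are hypothetical.

```python
import math

def mean_and_s(values):
    """Mean and sample standard deviation (SS divided by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, s

def predict_raw(x, xs, ys):
    """Predict a raw Y score from a raw X score via t scores."""
    if not (min(xs) <= x <= max(xs)):
        raise ValueError("X is outside the range of X scores in the sample")
    mean_x, s_x = mean_and_s(xs)
    mean_y, s_y = mean_and_s(ys)
    # r as the sum of paired t-score cross-products divided by nP - 1
    r = sum(((a - mean_x) / s_x) * ((b - mean_y) / s_y)
            for a, b in zip(xs, ys)) / (len(xs) - 1)
    t_x = (x - mean_x) / s_x          # step 1: convert X to a t score
    t_y_pred = r * t_x                # step 2: predicted tY = r * tX
    return mean_y + t_y_pred * s_y    # step 3: convert back to a raw score

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 7]
print(predict_raw(4, xs, ys))
```

For these six hypothetical pairs (df = 4), r ≈ .943. That is beyond .917, the .01 value for df = 4 in the r table, so the significance condition holds. X = 4 is also inside the sample range of 1 to 6.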
Discuss residual error when predicting from regression. • The average squared error when we predict from the mean is the variance, also called the mean square error. • The average squared error when we predict from the regression equation is called the residual mean square.
Residual Square Error • A significant correlation will always yield a better prediction than the mean. • Therefore, the residual mean square is always better, that is, smaller than the variance.
Steps in calculating Residual Square Error • To calculate the variance, take the deviations of Y from the mean of Y, square them, add them up, and divide by the degrees of freedom (n - 1). • To calculate the residual mean square, take the deviations of each Y from its predicted value, square them, add them up, and divide by the degrees of freedom (nP - 2).
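The "long way" on this slide can be written directly in Python. The predictions use the raw-score slope SP / SSX, which is algebraically the same as r × (sY / sX) and so gives the same predicted values as the t-score method. The function name and the six data pairs are hypothetical.

```python
def variance_and_ms_resid(xs, ys):
    """Return (variance of Y, residual mean square), both computed the long way."""
    n = len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sp / ss_x  # equals r * (sY / sX)
    predicted = [mean_y + slope * (x - mean_x) for x in xs]
    ss_y = sum((y - mean_y) ** 2 for y in ys)                          # error predicting from the mean
    ss_resid = sum((y - yp) ** 2 for y, yp in zip(ys, predicted))      # error predicting from regression
    return ss_y / (n - 1), ss_resid / (n - 2)

variance, ms_resid = variance_and_ms_resid([1, 2, 3, 4, 5, 6], [2, 3, 5, 4, 6, 7])
print(variance, ms_resid)
```

With this significant correlation, the residual mean square comes out well below the variance, as the previous slide says it should.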
Short way to do that • $r^2$ equals the proportion of squared error (sum of squares) that is gotten rid of when you use the regression equation rather than the mean as your prediction. • So the amount of error you get rid of equals the original sum of squares for Y times $r^2$, that is, $r^2 SS_Y$. • So the remaining error, $SS_{RESID}$, equals the amount of error you get by using the mean as your predictor ($SS_Y$) minus the amount you get rid of by using the regression equation ($r^2 SS_Y$).
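$$SS_{RESID} = SS_Y - r^2 SS_Y$$
To get the average squared error when using the regression equation, divide by $df_{REG}$:
$$MS_{RESID} = \frac{SS_{RESID}}{df_{REG}} = \frac{SS_{RESID}}{n_P - 2}$$
The standard error of the estimate is simply the square root of $MS_{RESID}$:
$$\text{standard error of the estimate} = \sqrt{MS_{RESID}}$$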
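The short way can be sketched in a few lines of Python. The example assumes the same hypothetical six pairs used for illustration elsewhere (X = 1 to 6, Y = 2, 3, 5, 4, 6, 7). For those pairs SSY = 17.5 and r = 16.5 / 17.5. The function name `residual_summary` is hypothetical.

```python
import math

def residual_summary(ss_y, r, n_pairs):
    """Short way: SS_RESID = SS_Y - r^2 * SS_Y, then MS_RESID and its square root."""
    ss_resid = ss_y - r ** 2 * ss_y            # error left after using regression
    ms_resid = ss_resid / (n_pairs - 2)        # divide by df_REG = nP - 2
    std_error_of_estimate = math.sqrt(ms_resid)
    return ss_resid, ms_resid, std_error_of_estimate

print(residual_summary(17.5, 16.5 / 17.5, 6))
```

The short way needs only SSY and r. It gives the same SSRESID as squaring and summing each Y's deviation from its predicted value.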