1 / 15

Prediction

Prediction. Prediction. Prediction is a scientific guess of unknown by using the known data. Statistical prediction is based on correlation. If the correlation among variables is high, the success in prediction will be high.

Download Presentation

Prediction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Prediction

  2. Prediction • Prediction is a scientific guess of unknown by using the known data. • Statistical prediction is based on correlation. If the correlation among variables is high, the success in prediction will be high. • So, if there is a perfect correlation among variables, then the prediction will be perfect. • Today, we will study on the bivariate linear correlation and regression. That is, linear relation between two variables.

  3. Prediction • Today, we will study on the bivariate linear correlation and regression. That is, linear relation between two variables. • In a graduate level of statistic class, you will learn curvilinear correlation/regression for U-shaped relations and Cannonical Correlation for the relations among three or more variables • By using statistical predictions (regression), we can answer such questions: • What is the height of a person whose weight is 80 kg? • If a students’ OSS score is 168, what will be his/her GPA in university?

  4. Prediction • As you remember, correlation coefficient provides us a measure of how strongly two variables are related. • Pearson r is one of the correlation coefficients and it is calculated on the least squared deviations. • By using that method, we can draw a hypothetic linebetween the scores on a scatter diagram. • This line represents the best fit of the predicted scores

  5. In this example, the correlation between weight and height is perfect (r= +1). So, all the dots in the diagram are on the hypothetic line (the line of best fit). • The formula of the line is Y=aX+c. For these distributions, it is Y=1x + 100.

  6. Terminology: • The line in this diagram is regression line. • The equation of the line is regression equation (Y=aX+c). • For correlation, it is not important which axis represents which variable, since there is no direction in correlation. • For regression, Y axis is always shows the predicted (dependent) variable, and X axis represents the predictor (independent variable) • Be careful about the language of regression. The equation is always read as regression of Y on X.

  7. Now let’s focus on the other examples in which the relation between two variables is less perfect. • As you can see, when the correlation become less perfect, dots in the diagram starts to diverge from the regression line

  8. In this example, the correlation is far from being perfect. • So, the dots are far away from the regression line.

  9. The Criterion of Best Fit • By using the regression equation, we can calculate predicted values of Y. • Let’s say these predicted values areYl • As you can see in the scatter diagram, these predicted values of Yl are not the same of Y. • The discrepancy with Y and Yl is the error of prediction. • These discrepancies are presented as vertical lines in the figure. • As you can see, the higher the r, the lower the error is.

  10. The Regression Equation • The regression of Y on X (Standart-Score Formula) • zly = rzx • zly = the predicted standart-score value of Y • r = correlation coefficient • rzx= the standart-score value of X from which zly is predicted • As you can see, • 1. zly =rzxwhen r=1 • 2. zlyis 0 when zxis 0. That is mean of X overlaps with mean of Yl • To calculate the predicted values of Y from raw scores of X, we can use a simpler formula

  11. The Regression Equation • In this formula • Yl= the predicted raw score in Y • Sx and Sy= the two standart deviations • r= correlation coefficient • Now, lets try to calculate Yl for the distribution presented in Table 1. Note that this form of the formula is similar to Y=aX+c

  12. Error of PredictionThe Standard Error of Estimate • The regression line is a kind of a mean of the score in the scatter plot • The discrepancy from regression line is then a kind of a deviation form mean (line) • So, we can use a similar formula to SD in order to calculate standard discrepancy

  13. Error of PredictionThe Standard Error of Estimate • A preferred formula for Syx is

  14. Limits of Error in Estimating Y from X • Assume that actual scores of Y are normally distributed about predicted scores • So, we can use normal curve to calculate the limits of error in prediction • In a Normal Distribuion • 68% of the Y scores fall within the limits Y(mean) +/- 1 SD • 95 % of the Y scores fall within the limits Y(mean) +/- 1.96 SD • 99 % of the Y scores fall within the limits Y(mean) +/- 2.58 SD

  15. Limits of Error in Estimating Y from X • Given that regression line is a kind of a mean and standard error is a kind of a SD • We can see that • 68% of actual Y values fall within the limits Y(mean) +/- 1 Syx • 95 % of actual Y values fall within the limits Y(mean) +/- 1.96 Syx • 99 % of actual Y values fall within the limits Y(mean) +/- 2.58 Syx

More Related