
Experimental Statistics - week 10



Presentation Transcript


  1. Experimental Statistics - week 10 Chapter 11: Linear Regression and Correlation Note: Homework Due Thursday

  2. 2-Factor with Repeated Measure -- Model
Model terms: type, subject within type, time, type by time interaction.
NOTES: type and time are both fixed effects in the current example - we say “subject is nested within type” - Expected Mean Squares given on page 1032

  3. 2-Factor Repeated Measures – ANOVA Output
The GLM Procedure
Dependent Variable: conc

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model             17       57720.50000     3395.32353     110.87    <.0001
Error             32         980.00000       30.62500
Corrected Total   49       58700.50000

R-Square    Coeff Var    Root MSE    conc Mean
0.983305    6.978545     5.533986    79.30000

Source           DF    Type III SS    Mean Square    F Value    Pr > F
type              1       40.50000       40.50000       1.32    0.2587
subject(type)     8     3920.00000      490.00000      16.00    <.0001
time              4    34288.00000     8572.00000     279.90    <.0001
type*time         4    19472.00000     4868.00000     158.96    <.0001

  4. 2-factor Repeated Measures
Source           Type III Expected Mean Square
type             Var(Error) + 5 Var(subject(type)) + Q(type,type*time)
subject(type)    Var(Error) + 5 Var(subject(type))
time             Var(Error) + Q(time,type*time)
type*time        Var(Error) + Q(type*time)

The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: conc

Source    DF    Type III SS     Mean Square    F Value    Pr > F
* type     1      40.500000       40.500000       0.08    0.7810
Error      8    3920.000000      490.000000
Error: MS(subject(type))
* This test assumes one or more other fixed effects are zero.

Source           DF    Type III SS     Mean Square    F Value    Pr > F
subject(type)     8    3920.000000      490.000000      16.00    <.0001
* time            4    34288           8572.000000     279.90    <.0001
type*time         4    19472           4868.000000     158.96    <.0001
Error: MS(Error) 32     980.000000       30.625000
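
The key point in the output above is that the fixed effect type is tested against MS(subject(type)) rather than MS(Error). A minimal Python sketch, using only the mean squares printed in the SAS output, reproduces the F ratios:

```python
# F ratios for the 2-factor repeated-measures model, computed from the
# mean squares printed in the SAS output above.  In the mixed-model
# analysis, "type" is tested against MS(subject(type)); the remaining
# effects are tested against MS(Error).
ms = {
    "type": 40.5,
    "subject(type)": 490.0,
    "time": 8572.0,
    "type*time": 4868.0,
    "error": 30.625,
}

f_type = ms["type"] / ms["subject(type)"]   # correct denominator for "type"
f_subj = ms["subject(type)"] / ms["error"]
f_time = ms["time"] / ms["error"]
f_inter = ms["type*time"] / ms["error"]

print(round(f_type, 2))    # ≈ 0.08, matching the mixed-model test of type
print(round(f_subj, 2))    # ≈ 16.00
print(round(f_time, 2))    # ≈ 279.90
print(round(f_inter, 2))   # ≈ 158.96
```

Note that the naive F for type in the first ANOVA table (40.5 / 30.625 = 1.32) differs from the correct mixed-model F of 0.08.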

  5. NOTE: Since the time x type interaction is significant, and since these are fixed effects, we DO NOT test main effects – we compare cell means (using MSE).

Cell Means
Time:   .5    1    2    3    4
C:      37   63   85  140   76
T:      55   81  134   80   42

  6. The write-up related to the SAS output should be something like the following. Note that even though we get a significant variance component due to subject (within type), I did not estimate the variance component itself. (I did not give this particular variance-component estimation formula.) Note also that since there is a significant interaction between the fixed effects type and time, we do not test the main effects.

  7. Dealing with Normality/Equal Variance Issues
Normalizing Transformations:
- log
- square root
- Box-Cox transformations
Note: the normalizing transformations sometimes also produce variance stabilization
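
As a rough illustration of why these transformations help, a Python sketch (with hypothetical right-skewed data) shows the log transform pulling sample skewness toward zero:

```python
import math

def skewness(xs):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum of cubed z-scores."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return sum(((x - m) / s) ** 3 for x in xs) * n / ((n - 1) * (n - 2))

# Hypothetical right-skewed sample (a few large values inflate the tail).
data = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]
logged = [math.log(x) for x in data]   # log transform compresses the tail

print(skewness(data), skewness(logged))  # skewness drops sharply after log
```

The raw data are strongly right-skewed; after the log transform the skewness is much closer to 0, which is the sense in which the transform "normalizes" the sample.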

  8. Nonparametric “ANOVA”
Mann-Whitney U – for comparing 2 samples
Kruskal-Wallis Test – for comparing >2 samples
Friedman’s Test – nonparametric alternative to the randomized complete block / 1-factor repeated measures design

  9. Histogram • displays distribution of 1 variable Scatter Diagram (Scatterplot) • displays joint distribution of 2 variables • plots data as “points” in the “x-y plane.”

  10. Association Between Two Variables • indicates that knowing one helps in predicting the other Linear Association • our interest in this course • points “swarm” about a line Correlation Analysis • measures the strength of linear association

  11. (association)

  12. Regression Analysis
We want to predict the dependent variable (also called the response variable) using the independent variable (also called the explanatory variable or predictor variable).
Dependent Variable (Y); Independent Variable (X).
More than one independent variable – Multiple Regression

  13. 11.7 Correlation Analysis

  14. Correlation Coefficient – measures linear association
r ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).

  15. Positive Correlation – high values of one variable are associated with high values of the other
Examples: father’s height, son’s height; daily grade, final grade
r = 0.93 for plot on the left [scatterplot omitted]

  16. EXAMS I and II

  17. Negative Correlation – high with low, low with high
Examples: car age, selling price; days absent, final grade
r = -0.89 for plot shown here [scatterplot omitted]

  18. Zero Correlation – no linear relationship
Example: height, IQ score
r = 0.0 for plot here [scatterplot omitted]

  19. Example scatterplots with r = -.75, 0, .5, .99 [plots omitted]

  20. Calculating the Correlation Coefficient

  21. Notation:
S_xy = Σ(x_i - x̄)(y_i - ȳ)
S_xx = Σ(x_i - x̄)²
S_yy = Σ(y_i - ȳ)²
So --
r = S_xy / √(S_xx · S_yy)

  22. The data below are the study times and the test scores on an exam given over the material covered during the two weeks. Find r.

Study Time (hours) (X)    Exam Score (Y)
10                        92
15                        81
12                        84
20                        74
 8                        85
16                        80
14                        84
22                        80
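
The course computes r with SAS, but a minimal Python sketch over the same data shows the arithmetic behind r = S_xy / √(S_xx · S_yy):

```python
import math

# Correlation coefficient for the study-time data on the slide above.
time = [10, 15, 12, 20, 8, 16, 14, 22]
score = [92, 81, 84, 74, 85, 80, 84, 80]

n = len(time)
xbar = sum(time) / n            # 14.625
ybar = sum(score) / n           # 82.5
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(time, score))
sxx = sum((x - xbar) ** 2 for x in time)
syy = sum((y - ybar) ** 2 for y in score)

r = sxy / math.sqrt(sxx * syy)
print(round(r, 5))  # ≈ -0.7749, matching the PROC CORR output below
```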

  23.
DATA one;
INPUT time score;
DATALINES;
10 92
15 81
12 84
20 74
8 85
16 80
14 84
22 80
;
PROC CORR;
VAR score time;
TITLE 'Study Time by Score';
RUN;
PROC PLOT;
PLOT time*score;
RUN;
PROC GPLOT;
PLOT time*score;
RUN;

  24. Study Time by Score
The CORR Procedure
2 Variables: score time

Simple Statistics
Variable    N    Mean        Std Dev    Sum          Minimum     Maximum
score       8    82.50000    5.18239    660.00000    74.00000    92.00000
time        8    14.62500    4.74906    117.00000     8.00000    22.00000

Pearson Correlation Coefficients, N = 8
Prob > |r| under H0: Rho=0
            score       time
score       1.00000    -0.77490
                        0.0239
time       -0.77490     1.00000
            0.0239

  25. Plot of score*time. Legend: A = 1 obs, B = 2 obs, etc. [ASCII scatterplot omitted: score (74–92) on the vertical axis versus time (8–22) on the horizontal axis; scores generally decrease as study time increases]

  26. Testing Statistical Significance of Correlation Coefficient
Test Statistic: t = r √(n - 2) / √(1 - r²)
Rejection Region: t > t_{α/2} or t < -t_{α/2}, with df = n - 2
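
A quick Python sketch applies this test statistic to the study-time example (r and n taken from the PROC CORR output above):

```python
import math

# t statistic for H0: rho = 0, using r = -0.77490 and n = 8 from the
# study-time example.  Compare |t| with t_{alpha/2} on n - 2 = 6 df.
r, n = -0.77490, 8
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t, 2))  # ≈ -3.00; |t| > t_.025,6 = 2.447, so reject H0
```

This agrees with the SAS p-value of 0.0239: at α = .05 we conclude there is a (negative) correlation between study time and score.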

  27. Correlation Between Study Time and Score
H0: There is No Correlation Between Study Time and Score
Ha: There is a Correlation Between Study Time and Score
Rejection Region - Test Statistic - Conclusion - P-value

  28. Properties of Correlations
• Correlation measures the strength of the linear relationship between two variables.
• Correlation requires that both variables be quantitative.
• r does not change when we change the units of measurement of x, y, or both.
• Correlation makes no distinction between explanatory and response variables.
• The correlation coefficient is not resistant to outliers.
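
The unit-invariance property can be checked directly; a Python sketch rescales the study-time data (e.g., hours to minutes) and recomputes r:

```python
import math

# Unit invariance of r: rescaling x or y by a positive linear change of
# units leaves the correlation coefficient unchanged.
def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [10, 15, 12, 20, 8, 16, 14, 22]    # study time, hours
y = [92, 81, 84, 74, 85, 80, 84, 80]   # exam score

r1 = corr(x, y)
r2 = corr([60 * xi for xi in x], y)    # hours -> minutes: r unchanged
r3 = corr(x, [yi / 100 for yi in y])   # score -> proportion: r unchanged
```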

  29. Math vs Reading Scores
The CORR Procedure
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0
            math        reading
math        1.00000     0.87207
                        <.0001
reading     0.87207     1.00000
            <.0001

  30. Math vs Reading Scores with Outlier
The CORR Procedure
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0
            math        reading
math        1.00000     0.27198
                        0.2460
reading     0.27198     1.00000
            0.2460

  31. Pearson Correlation Coefficients, N = 14
Prob > |r| under H0: Rho=0
            math        reading
math        1.00000    -0.1973
                        0.5194
reading    -0.1973      1.00000
            0.5194

  32. Pearson Correlation Coefficients, N = 14
Prob > |r| under H0: Rho=0
            math        reading
math        1.00000     0.53211
                        0.0502
reading     0.53211     1.00000
            0.0502

  33. [Scatterplot omitted: Divorce Rate (per 1000) vs. % in prison on Drug Offenses]

  34. IMPORTANT NOTE: Correlation DOES NOT Imply Causation
• A strong association between 2 variables is not enough to justify conclusions about cause and effect.
• The best way to get evidence that X causes Y is through a controlled experiment.

  35. 11.1-5 Regression Analysis

  36. Goal of Regression Analysis: Predict Y from knowledge of X. For data such as the Father-Son data, it seems reasonable to assume a model of the form E(Y | x) = β0 + β1x, i.e. the conditional means of Y given x follow a straight line.

  37. Alternative mathematical expression for the “regression model”: Y = β0 + β1x + ε, where ε is a random error term. In practice, we want to estimate this line from the data.
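
As a preview of that estimation step, a Python sketch fits the least-squares line to the study-time data from the correlation example, using the standard formulas b1 = S_xy / S_xx and b0 = ȳ - b1 x̄:

```python
# Least-squares estimates of the regression line for the study-time data:
# slope b1 = S_xy / S_xx, intercept b0 = ybar - b1 * xbar.
x = [10, 15, 12, 20, 8, 16, 14, 22]    # study time, hours
y = [92, 81, 84, 74, 85, 80, 84, 80]   # exam score

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)

b1 = sxy / sxx                  # slope (negative: scores fall with time)
b0 = ybar - b1 * xbar           # intercept
print(round(b1, 4), round(b0, 2))
```

The fitted slope is about -0.85 points per hour with intercept about 94.9, consistent with the negative correlation (r = -0.77490) found earlier.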
