Statistical Analysis Regression & Correlation. Psyc 250 Winter, 2013. Review: Types of Variables & Steps in Analysis. Variables & Statistical Tests. Evaluating a hypothesis. Step 1: What is the relationship in the sample?
Statistical Analysis Regression & Correlation Psyc 250 Winter, 2013
Evaluating a hypothesis • Step 1: What is the relationship in the sample? • Step 2: How confidently can one generalize from the sample to the universe from which it comes? (Criterion: p < .05)
Relationships between Scale Variables: Regression, Correlation
Regression • Amount that a dependent variable increases (or decreases) for each unit increase in an independent variable. • Expressed as equation for a line – y = m(x) + b – the “regression line” • Interpret by slope of the line: m (Or: interpret by “odds ratio” in “logistic regression”)
Correlation • Strength of association of scale measures • r ranges from -1 through 0 to +1: +1 = perfect positive correlation; -1 = perfect negative correlation; 0 = no correlation • Interpret r in terms of variance
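As a minimal sketch of how r behaves at its extremes, the following computes Pearson's r in plain Python. The function name pearson_r and the small data lists are hypothetical, chosen only for illustration; they are not taken from the class survey.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (not from the survey)
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y rises in lockstep with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y falls in lockstep with x
r_none = pearson_r(x, [2, 1, 3, 1, 2])   # no linear trend in y

print(round(r_pos, 2), round(r_neg, 2), round(r_none, 2))
```

The three cases correspond to the three reference points on the slide: a perfect positive, a perfect negative, and a zero correlation.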
Example: Weight & Height. Survey of Class, n = 42. Variables: Height, Weight, Mother’s height, Mother’s education, Father’s education, Family income, SAT, G.P.A., Estimate IQ, Well-being (7 pt. Likert), Health (7 pt. Likert)
Frequency Table for: HEIGHT

Value    Frequency   Percent   Valid Percent   Cum Percent
59.00        1         2.4          2.4             2.4
61.00        2         4.8          4.8             7.1
62.00        3         7.1          7.1            14.3
63.00        3         7.1          7.1            21.4
65.00        5        11.9         11.9            33.3
66.00        3         7.1          7.1            40.5
67.00        4         9.5          9.5            50.0
68.00        5        11.9         11.9            61.9
69.00        1         2.4          2.4            64.3
70.00        6        14.3         14.3            78.6
71.00        1         2.4          2.4            81.0
72.00        4         9.5          9.5            90.5
73.00        3         7.1          7.1            97.6
74.00        1         2.4          2.4           100.0
Total       42       100.0        100.0

Valid cases 42   Missing cases 0

Descriptive Statistics for: HEIGHT

Variable    Mean    Std Dev   Variance   Range   Minimum   Maximum   Valid N
HEIGHT      67.33     3.87      14.96    15.00     59.00     74.00      42
Variance
$$s^2 = \frac{\sum_i (x_i - \text{Mean})^2}{N - 1}$$
Standard Deviation
$$s = \sqrt{s^2} = \sqrt{\text{variance}}$$
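The variance formula can be checked against the HEIGHT output. The sketch below recomputes the mean, variance, and standard deviation directly from the HEIGHT frequency table; the variable names are illustrative, and only the value/frequency pairs come from the survey output.

```python
import math

# HEIGHT frequency table from the class survey: value -> frequency
height_freq = {
    59: 1, 61: 2, 62: 3, 63: 3, 65: 5, 66: 3, 67: 4,
    68: 5, 69: 1, 70: 6, 71: 1, 72: 4, 73: 3, 74: 1,
}

n = sum(height_freq.values())
mean = sum(value * count for value, count in height_freq.items()) / n
# Sum of squared deviations from the mean, weighted by frequency
ss = sum(count * (value - mean) ** 2 for value, count in height_freq.items())
variance = ss / (n - 1)          # divide by N - 1, as in the formula
std_dev = math.sqrt(variance)

print(n, round(mean, 2), round(variance, 2), round(std_dev, 2))
# prints: 42 67.33 14.96 3.87
```

These match the N, Mean, Variance, and Std Dev reported in the descriptive statistics for HEIGHT.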
Frequency Table for: WEIGHT

Value     Frequency   Percent   Valid Percent   Cum Percent
115.00        1         2.4          2.4             2.4
120.00        1         2.4          2.4             4.8
124.00        1         2.4          2.4             7.1
125.00        4         9.5          9.5            16.7
128.00        1         2.4          2.4            19.0
130.00        6        14.3         14.3            33.3
135.00        4         9.5          9.5            42.9
136.00        1         2.4          2.4            45.2
140.00        3         7.1          7.1            52.4
145.00        2         4.8          4.8            57.1
150.00        3         7.1          7.1            64.3
155.00        2         4.8          4.8            69.0
160.00        6        14.3         14.3            83.3
165.00        2         4.8          4.8            88.1
170.00        1         2.4          2.4            90.5
185.00        1         2.4          2.4            92.9
190.00        2         4.8          4.8            97.6
210.00        1         2.4          2.4           100.0
Total        42       100.0        100.0

Valid cases 42   Missing cases 0

Descriptive Statistics for: WEIGHT

Variable    Mean     Std Dev   Variance   Range   Minimum   Maximum   Valid N
WEIGHT      146.38    21.30     453.80    95.00    115.00    210.00      42
“Least Squares” Regression Line
Dependent = (B)(Independent) + constant
weight = (B)(height) + constant
Regression: WEIGHT on HEIGHT

Multiple R            .59254
R Square              .35110
Adjusted R Square     .33488
Standard Error      17.37332

Analysis of Variance
              DF    Sum of Squares    Mean Square
Regression     1       6532.61322     6532.61322
Residual      40      12073.29154      301.83229

F = 21.64319    Signif F = .0000

------------------ Variables in the Equation ------------------
Variable               B         SE B       Beta        T    Sig T
HEIGHT          3.263587      .701511    .592541    4.652    .0000
(Constant)    -73.367236    47.311093              -1.551

[ Equation: Weight = 3.3 ( height ) - 73 ]
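The figures in the regression output are related to one another by simple arithmetic. As a sketch, the following recomputes R Square, F, and T for HEIGHT from the sums of squares, degrees of freedom, B, and SE B printed in the output. The variable names are illustrative; every number is copied from the output.

```python
# Values copied from the regression output (WEIGHT on HEIGHT)
ss_regression = 6532.61322
ss_residual = 12073.29154
df_regression = 1
df_residual = 40
b_height = 3.263587
se_b_height = 0.701511

# R Square: share of the total sum of squares due to the regression
ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total

# F: regression mean square divided by residual mean square
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

# T for HEIGHT: coefficient divided by its standard error
t_height = b_height / se_b_height

print(round(r_square, 4), round(f_ratio, 2), round(t_height, 3))
# prints: 0.3511 21.64 4.652
```

With a single independent variable, T squared is approximately equal to F, which is why both tests report the same significance level here.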
Regression line W = 3.3 H - 73
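For a single independent variable, the least-squares line can also be recovered from summary statistics alone: the slope is r times the ratio of the standard deviations, and the line passes through the point of means. The sketch below uses the rounded means, variances, and correlation reported in the survey output; the function predict_weight is a hypothetical helper added for illustration.

```python
import math

# Summary statistics reported in the survey output (rounded)
mean_height, var_height = 67.33, 14.96
mean_weight, var_weight = 146.38, 453.80
r = 0.5925  # correlation of HEIGHT with WEIGHT (Multiple R in the output)

# Least-squares slope: B = r * (s_weight / s_height)
slope = r * math.sqrt(var_weight) / math.sqrt(var_height)
# The fitted line passes through (mean_height, mean_weight)
intercept = mean_weight - slope * mean_height

def predict_weight(height):
    """Predicted weight for a given height from the fitted line."""
    return slope * height + intercept

print(round(slope, 1), round(intercept))
# prints: 3.3 -73
```

Because the inputs are rounded, the slope and constant agree with the output's B values only to about two decimal places.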
Strength of Relationship “Goodness of Fit”: Correlation How well does the regression line “fit” the data?
[Figure: mean line for WEIGHT; variance = 454]
[Figure: regression line and mean line]
Correlation: “Goodness of Fit” • Variance (average squared distance from the mean) = 454 • “Least squares” (average squared distance from the regression line) = 295
[Figure: regression line (l.s. = 295) and mean line (s² = 454)]
Correlation: “Goodness of Fit” How much is variance reduced by calculating from the regression line?
454 - 295 = 159
159 / 454 = .35
Variance is reduced 35% by calculating “least squares” from the regression line: r² = .35
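The same arithmetic as a short sketch, using the slide's rounded figures of 454 and 295. The variable names are illustrative.

```python
variance_about_mean = 454   # average squared distance from the mean
variance_about_line = 295   # average squared distance from the regression line

# Reduction in variance achieved by measuring from the regression line
reduction = variance_about_mean - variance_about_line
# Proportion of the original variance that the line accounts for
r_squared = reduction / variance_about_mean

print(reduction, round(r_squared, 2))
# prints: 159 0.35
```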
Correlation coefficient = r; r² = % of variance in WEIGHT “explained” by HEIGHT
Correlation: HEIGHT with WEIGHT

            HEIGHT      WEIGHT
HEIGHT      1.0000       .5925
            (  42)      (  42)
            P= .        P= .000

WEIGHT       .5925      1.0000
            (  42)      (  42)
            P= .000     P= .
r = .59; r² = .35. HEIGHT “explains” 35% of variance in WEIGHT
Sentence & G.P.A. • Regression: form of relationship • Correlation: strength of relationship • p value: statistical significance
Legal Attitudes Study: • Relationship of sentence length to G.P.A.? • Relationship of sentence length to Liberal-Conservative views?
Regression Coefficients Sentence = -3.5 G.P.A. + 18
“Least Squares” Regression Line Sent = -3.5 GPA + 18
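To see what the slope means in practice, the sketch below applies the fitted line to a few GPA values. The function name predicted_sentence and the chosen GPA inputs are hypothetical; the coefficients -3.5 and 18 come from the slide, with sentence length in months.

```python
def predicted_sentence(gpa):
    """Predicted sentence length (months) from Sent = -3.5 GPA + 18."""
    return -3.5 * gpa + 18

# Hypothetical GPA values, one unit apart, to show the effect of the slope
for gpa in (2.0, 3.0, 4.0):
    print(gpa, predicted_sentence(gpa))
# prints:
# 2.0 11.0
# 3.0 7.5
# 4.0 4.0
```

Each one-unit increase in GPA lowers the predicted sentence by 3.5 months, which is the slope of the line.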
Statistical Significance Regression and Correlation: p = .31
Interpreting Correlations • r = -.22 • r² = .05 • p = .31 G.P.A. “explains” 5% of the variance in length of sentence
Write Results “A regression analysis found that each one-unit increase in GPA was associated with a 3.5-month decrease in sentence length, but this correlation was low (r = -.22) and not statistically significant (p = .31).”
Multiple Regression • Problem: relationship of weight and calorie consumption • Both weight and calorie consumption related to height • Need to “control for” height or assess relative effects of height and calorie consumption
Multiple Regression [Figure: regression line and mean line]
Multiple Regression [Figure: regression line, residuals, and mean line]
Multiple Regression • Regress weight residuals (dependent variable) on caloric intake (independent variable) • Statistically “controls” for height: removes the effect or “confound” of height. • How much variance in weight does caloric intake account for over and above height?
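The two-step procedure described above can be sketched in plain Python. The data here are simulated purely for illustration and are NOT from the class survey; the helper names fit_line and r_squared, and every coefficient used to generate the data, are assumptions.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Proportion of variance in ys accounted for by a line on xs."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total

# Simulated data (illustration only): calories and weight both depend on height
rng = random.Random(0)
height = [rng.gauss(67, 4) for _ in range(200)]
calories = [30 * h + rng.gauss(0, 150) for h in height]
weight = [3 * h + 0.02 * c + rng.gauss(0, 10) for h, c in zip(height, calories)]

# Step 1: regress weight on height and keep the residuals
slope_h, intercept_h = fit_line(height, weight)
weight_resid = [w - (slope_h * h + intercept_h) for h, w in zip(height, weight)]

# Step 2: regress the weight residuals on caloric intake
slope_c, intercept_c = fit_line(calories, weight_resid)
resid_r2 = r_squared(calories, weight_resid)

print(round(slope_c, 4), round(resid_r2, 3))
```

By construction the residuals from step 1 are uncorrelated with height, so whatever caloric intake explains in step 2 is variance that height left unexplained. Fitting both predictors in one model can give a somewhat different calorie coefficient, because this two-step version does not also remove height from caloric intake.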
Multiple Regression • How much variance in the dependent measure (weight, length of sentence) do all independent variables combined account for? Multiple R² • What is the best “model” for predicting the dependent variable?