
Correlation & Regression


Presentation Transcript


  1. Correlation & Regression

  2. Correlation
  • Measures the strength of the linear relation between 2 random variables (X & Y)
  • ρ = Corr(X,Y) = Cov(X,Y)/(σx σy)
  •   = E[(X − μx)(Y − μy)] / {E[(X − μx)²] E[(Y − μy)²]}^(1/2)
  • Standardized Cov(X,Y), so −1 ≤ ρ ≤ 1

  3. Strength of ρ
  • ρ = −1 → perfect negative linear relation
  • ρ = 1 → perfect positive linear relation
  • ρ = 0 → no linear relation
  • As |ρ| increases, so does the strength of the relationship

  4. Sample Covariance & Correlation
  • Cov(X,Y) = 1/(n−1) Σ(xi − x̄)(yi − ȳ)
  • Corr(X,Y) = r = Σ(xi − x̄)(yi − ȳ) / [Σ(xi − x̄)² Σ(yi − ȳ)²]^(1/2)
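In SAS, Proc Corr computes both quantities; adding the COV option prints the sample covariance matrix alongside the Pearson correlations, which makes the standardization visible. A minimal sketch, assuming a small made-up data set named Example with numeric variables x and y (the values are purely illustrative):

* Hypothetical illustration data; any two numeric columns work the same way. ;
Data Example;
  Input x y;
  Datalines;
1 2.1
2 3.9
3 6.2
4 7.8
5 10.1
;
Run;

* COV prints the sample covariance matrix in addition to the correlations. ;
Proc Corr Data=Example Cov;
  Var x y;
Run;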

  5. Hypothesis Test
  • Null: H0: ρ = 0
  • Test statistic: t = r √(n−2) / √(1 − r²), which follows a t distribution with n−2 degrees of freedom under H0
  • Alternative: HA: ρ ≠ 0; reject H0 if |t| > t(n−2, α/2)
  • Alternative: HA: ρ > 0; reject H0 if t > t(n−2, α)
  • Alternative: HA: ρ < 0; reject H0 if t < −t(n−2, α)
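The mechanics can be checked directly from a sample correlation. A sketch in SAS with made-up values of r and n (Proc Corr reports the same two-sided p-value for H0: ρ = 0 automatically):

* Hypothetical sample correlation and sample size. ;
Data CorrTest;
  r = 0.62;
  n = 25;
  t = r * sqrt(n - 2) / sqrt(1 - r**2);           * test statistic with n-2 df ;
  p_two_sided = 2 * (1 - probt(abs(t), n - 2));   * two-sided p-value ;
Run;

Proc Print Data=CorrTest;
Run;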

  6. Rank Correlation (Spearman’s)
  • The sample correlation (r) can be affected by extreme observations
  • Spearman’s rank correlation: first rank the xi and the yi, then calculate the sample correlation of these ranks
  • rs = 1 − 6 Σ di² / [n(n² − 1)]
  • where di = the difference between the ranks of the i-th pair (xi, yi)
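Proc Corr computes both coefficients in one pass, which makes them easy to compare when outliers are suspected. A sketch, reusing the hypothetical Example data set from above:

* PEARSON and SPEARMAN request both correlation matrices. ;
Proc Corr Data=Example Pearson Spearman;
  Var x y;
Run;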

  7. Linear Regression
  • Find/define the relationship between a dependent variable and an independent variable
  • Use the independent variable to explain the behavior of the dependent variable
  • Separate the variation in the data into explained variation and unexplained variation (noise)
  • Predict the value of the dependent variable given a value of the independent variable

  8. Linear Regression Model
  • Predict Y given X
  • E(Y | X = x) = β0 + β1x
  • yi = β0 + β1xi + εi
  • Assumptions:
  • the εi are random variables
  • E[εi] = 0
  • E[εi²] = σ²
  • E[εi εk] = 0 for i ≠ k; the errors are uncorrelated
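The later slides use b for the estimated slope and Sb for its standard error without writing the estimator out. For reference (not stated on the original slides), the ordinary least-squares estimates in this simple model are:

  b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²   (slope)
  a = ȳ − b x̄                           (intercept)

These are the estimates that Proc Reg reports in its parameter-estimates table.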

  9. Sum of Squares
  • Total sum of squares = regression sum of squares + error sum of squares
  • SST = SSR + SSE
  • Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ e²i

  10. Coefficient of Determination (R²)
  • Measures how well x explains the variation in Y
  • R² = SSR/SST = 1 − SSE/SST = r²
  • R² measures the proportion of the variation in the data explained by the regression
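Proc Reg prints SST, SSR, SSE, and R-Square in its ANOVA and fit-statistics tables, but the decomposition can also be verified by hand from the fitted values. A sketch, assuming the Height data set from slide 13 has already been created:

* Save predicted values and residuals from the fit. ;
Proc Reg Data=Height Noprint;
  Model inches = centimeter;
  Output Out=Pred P=yhat R=resid;
Run;
Quit;

* CSS(inches) is the corrected sum of squares (SST); USS(resid) is the sum of squared residuals (SSE). ;
Proc Sql;
  Select css(inches)                          as SST,
         uss(resid)                           as SSE,
         calculated SST - calculated SSE      as SSR,
         1 - calculated SSE / calculated SST  as RSquare
  From Pred;
Quit;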

  11. Confidence Interval
  • Error variance: S²e = Σ e²i / (n−2) = SSE/(n−2)
  • Unbiased estimate of σ²b: S²b = S²e / Σ(xi − x̄)²
  • t = (b − β)/Sb
  • C.I. for the regression slope: b − t(n−2, α/2) Sb < β < b + t(n−2, α/2) Sb
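Proc Reg prints this interval for each coefficient when the CLB option is added to the MODEL statement (ALPHA= sets the confidence level). A sketch against the Height data from slide 13:

* CLB adds 95% confidence limits for the intercept and slope estimates. ;
Proc Reg Data=Height;
  Model inches = centimeter / clb alpha=0.05;
Run;
Quit;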

  12. Regression Slope Tests
  • H0: β = 0 or H0: β ≤ 0 vs. H1: β > 0
  • Reject H0 if (b − β)/Sb > t(n−2, α)
  • H0: β = 0 or H0: β ≥ 0 vs. H1: β < 0
  • Reject H0 if (b − β)/Sb < −t(n−2, α)
  • H0: β = 0 vs. H1: β ≠ 0
  • Reject H0 if (b − β)/Sb > t(n−2, α/2) or (b − β)/Sb < −t(n−2, α/2)
  • Here β denotes the value hypothesized under H0, as in slide 11
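Proc Reg's parameter-estimates table already reports the two-sided t statistic and p-value for H0: β = 0 for each coefficient; a test against a nonzero hypothesized slope can be written with a TEST statement. A sketch against the Height data (the hypothesized value 0.3937 is only illustrative):

Proc Reg Data=Height;
  Model inches = centimeter;
  * F test of the single linear hypothesis beta(centimeter) = 0.3937,
    equivalent to the two-sided t test of that value. ;
  SlopeTest: Test centimeter = 0.3937;
Run;
Quit;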

  13. SAS: Inches-Centimeter

* Inches paired with their centimeter equivalents. ;
Data Height;
  Input inches centimeter;
  Datalines;
1 2.54
2 5.08
24 60.96
4 10.16
5 12.7
16 40.64
7 17.78
8 20.32
19 48.26
10 25.4
20 50.8
25 63.5
;
Run;

* Scatter plot of the raw data. ;
Proc Plot Data=Height;
  Plot inches*centimeter;
Run;

* Pearson correlation matrix. ;
Proc Corr Data=Height;
  Title 'Correlation Matrix of Inches vs. Centimeter';
  Var inches centimeter;
Run;

* Simple linear regression of inches on centimeter, with fit and residual plots. ;
Proc Reg Data=Height;
  Title 'Regression Line for Inches-Centimeter Data';
  Model inches=centimeter;
  Plot Predicted.*centimeter = 'P'
       U95M.*centimeter = '-' L95M.*centimeter = '_'
       inches*centimeter = '*' / overlay;
  Plot Residual.*centimeter = 'o';
Run;
Quit;

  14. SAS: GRE – GPA Data

* GRE scores paired with GPAs. ;
Data GRE_GPA;
  Input GRE GPA;
  Datalines;
2100 4
1920 3.8
2290 3.8
1580 3.9
1400 3.77
1300 3.95
2020 3.8
1060 3.54
1500 3
1900 4
1900 3.7
1800 3.5
2200 4
1990 3.51
2000 4
1650 3.8
1640 3.75
1800 3.9
2300 3.91
2000 3.75
2000 3.9
;
Run;

* Scatter plot of GRE against GPA. ;
Proc Plot Data=GRE_GPA;
  Plot GRE*GPA;
Run;

* Pearson correlation matrix. ;
Proc Corr Data=GRE_GPA;
  Title 'Correlation Matrix of GRE vs. GPA';
  Var GRE GPA;
Run;

* Simple linear regression of GPA on GRE, with fit and residual plots. ;
Proc Reg Data=GRE_GPA;
  Title 'Regression Line for GRE-GPA Data';
  Model GPA=GRE;
  Plot Predicted.*GRE = 'P'
       U95M.*GRE = '-' L95M.*GRE = '_'
       GPA*GRE = '*' / overlay;
  Plot Residual.*GRE = 'o';
Run;
Quit;
