Correlation and regression analysis
Week 8
Research Methods & Data Analysis
Lecture outline
• Correlation
• Regression analysis
• The least squares estimation method
• SPSS and regression output
• Task overview
Correlation
• Correlation measures the extent to which two (or more) variables are related
• The relationship it describes need not be exact (e.g. height and weight: taller people tend to be heavier, but not always)
• Positive correlation indicates that the two variables move in the same direction
• Negative correlation indicates that they move in opposite directions
Covariance
• Covariance measures the "joint variability" of two variables:
  $\mathrm{Cov}(X,Y) = E[(X - E(X))(Y - E(Y))]$
  where $E(\cdot)$ denotes the expected value (i.e. the average value)
• If two variables are independent, their covariance is zero (however, Cov = 0 does not imply that the two variables are independent)
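The following is a minimal Python sketch of the sample covariance. It assumes plain lists of numbers, and the names `mean` and `covariance` are illustrative helpers, not SPSS features. The toy data illustrate the caveat above: y = x² is completely determined by x, yet its covariance with an x that is symmetric around 0 is zero.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def covariance(x, y, sample=True):
    """Average product of deviations from the means.

    sample=True divides by n - 1 (sample estimate); sample=False divides by n.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    total = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return total / (len(x) - 1 if sample else len(x))

# Hypothetical toy data: y depends exactly on x, but not linearly
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]
print(covariance(x, y))                      # 0.0
print(covariance([1, 2, 3], [2, 4, 6]))      # 2.0 (variables moving together)
```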
Correlation coefficient
• The correlation coefficient r measures the strength of the linear relationship between two variables, on a scale from –1 to +1
• r = 0 means no (linear) correlation
• r = +1 means perfect positive correlation
• r = –1 means perfect negative correlation
• Perfect correlation means that all observations lie exactly on a straight line, i.e. y is an exact linear function of x
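Correlation coefficient and covariance
• Pearson correlation coefficient, population: $\rho_{XY} = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$
• Pearson correlation coefficient, sample: $r = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$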
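The sample formula translates directly into a few lines of Python. This is a sketch: `pearson_r` is a hypothetical helper, and the toy data are made up so that y is an exact linear function of x. That choice should give r = +1 and r = -1.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient.

    Sum of cross-deviations divided by the square root of the product of the
    two sums of squared deviations (the n - 1 factors cancel out).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0  (y = 10x)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))      # -1.0 (y = 10 - 2x)
```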
Bivariate and multivariate correlation
• Bivariate correlation: 2 variables, measured by the Pearson correlation coefficient
• Partial correlation: the correlation between two variables after allowing for the effect of other "control" variables
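With a single control variable z, the partial correlation can be computed from the three pairwise Pearson correlations. The standard first-order formula is $r_{xy\cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$. The sketch below uses made-up input correlations. With several control variables (as in the SPSS output controlling for SIZE and STYLE), the formula has to be applied recursively, which this helper does not do.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for one variable z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("undefined when z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom

# Hypothetical correlations: x and y look related, but mostly through z
print(round(partial_corr(0.5, 0.7, 0.7), 4))  # 0.0196
# If z is unrelated to both, controlling for it changes nothing
print(partial_corr(0.3, 0.0, 0.0))            # 0.3
```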
Significance level in correlation
• Level of correlation (value of the correlation coefficient): indicates to what extent the two variables "move together"
• Significance of correlation (p value): since the correlation coefficient is computed on a sample, indicates whether the relationship is statistically significant, i.e. whether the population correlation can be considered different from 0
• Examples
  • Correlation is 0.50 but not significant: the sampling error is so large that the actual correlation could even be 0
  • Correlation is 0.10 and highly significant: the level of correlation is very low, but we can be confident that this weak correlation is real (different from 0)
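The two examples above depend on sample size. The sketch below approximates the two-tailed p value for H0: ρ = 0 using Fisher's z-transformation, $z = \operatorname{atanh}(r)\sqrt{n - 3}$, which is roughly standard normal under H0. This is an approximation chosen to stay within the Python standard library. The test SPSS reports is based on the t distribution, so its p values will differ slightly. The sample sizes 10 and 1000 are hypothetical.

```python
import math
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-tailed p value for H0: rho = 0 (Fisher z approximation)."""
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# r = 0.50 on a small sample: large sampling error, not significant at 5%
print(round(approx_p_value(0.50, 10), 3))
# r = 0.10 on a large sample: weak but highly significant
print(round(approx_p_value(0.10, 1000), 4))
```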
Correlation and covariance in SPSS
• Choose between bivariate & partial
Bivariate correlation
• Select the variables you want to analyse
• Request the significance level (two-tailed)
• Ask for additional statistics (if necessary)
Bivariate correlation output
Partial correlations
• List of variables to be analysed
• Control variables
Partial correlation output
Partial correlation coefficients, controlling for SIZE and STYLE
Each cell: coefficient (D.F.) 2-tailed significance
            AMTSPENT              USECOUP               ORG
AMTSPENT    1.0000 (0) P= .       .2677 (775) P= .000   -.0116 (775) P= .746
USECOUP     .2677 (775) P= .000   1.0000 (0) P= .       .0500 (775) P= .164
ORG         -.0116 (775) P= .746  .0500 (775) P= .164   1.0000 (0) P= .
" . " is printed if a coefficient cannot be computed.
Partial correlations still measure the correlation between two variables, but remove the effect of the control variables: the correlations are computed as if all consumers shopped in stores of identical size and with the same shopping style.
Bivariate and partial correlations
• Correlation between amount spent and coupon use
  • Bivariate correlation: 0.291 (p value .000, i.e. p < 0.001)
  • Partial correlation: 0.268 (p value .000, i.e. p < 0.001)
• The amount spent is positively correlated with coupon use (0 = no use, 1 = from newspaper, 2 = from mailing, 3 = both)
• The level of correlation changes little after accounting for different shop sizes and shopping styles
Linear regression analysis
$y_i = a + b\,x_i + e_i$
where $y_i$ is the dependent variable, $x_i$ the independent variable (explanatory variable, regressor…), $a$ the intercept, $b$ the regression coefficient and $e_i$ the error
Regression analysis
[Figure: y plotted against x]
Example
• We want to investigate whether there is a relationship between cholesterol level and age in a sample of 18 people
• The dependent variable is the cholesterol level
• The explanatory variable is age
What regression analysis does
• Determines whether a relationship exists between the dependent and explanatory variables
• Determines how much of the variation in the dependent variable is explained by the independent variable (goodness of fit)
• Allows us to predict the values of the dependent variable
Regression and correlation
• Correlation: no causal relationship is assumed
• Regression: we assume that the explanatory variables "cause" the dependent variable
• Bivariate regression: one explanatory variable
• Multivariate regression: two or more explanatory variables
How to estimate the regression coefficients
• The objective is to estimate the population parameters (intercept and regression coefficient) from our data sample, obtaining the fitted line $\hat{y}_i = a + b\,x_i$
• A good way to do this is to minimise the errors $e_i = y_i - \hat{y}_i$, each representing the difference between the actual observation and the estimated (predicted) one
The objective is to identify the line (i.e. the a and b coefficients) that minimises the distance between the actual points and the fitted line
The least squares method
• This is based on minimising the squared distances (errors) rather than the distances themselves, i.e. choosing $a$ and $b$ to minimise $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b\,x_i)^2$
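Setting the derivatives of the sum of squared errors to zero gives the usual closed-form estimates: $b = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $a = \bar{y} - b\,\bar{x}$. The sketch below uses hypothetical helper names and made-up toy data. It computes these estimates and then checks that moving either coefficient away from them increases the sum of squared errors.

```python
def least_squares(x, y):
    """Bivariate OLS: returns (a, b) minimising sum((y_i - a - b*x_i)**2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def sse(x, y, a, b):
    """Sum of squared errors e_i = y_i - (a + b*x_i)."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
print(round(a, 6), round(b, 6))  # 2.2 0.6
# Nudging either coefficient away from the estimate increases the SSE
print(sse(x, y, a, b) < sse(x, y, a + 0.1, b), sse(x, y, a, b) < sse(x, y, a, b - 0.1))
```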
Bivariate regression in SPSS
Regression dialog box
• Dependent variable
• Explanatory variable
• Leave this unchanged!
Regression output
• Value of the coefficients
• Statistical significance: is the coefficient different from 0?
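The significance reported for each coefficient comes from a t test of that coefficient against 0. In the bivariate case, under the standard OLS assumptions, $SE(b) = \sqrt{\dfrac{SSE/(n-2)}{\sum (x_i - \bar{x})^2}}$ and $t = b / SE(b)$, which is compared with a t distribution with n - 2 degrees of freedom. The sketch below computes the statistic only; SPSS reports the corresponding p value directly. The function name and toy data are hypothetical.

```python
import math

def slope_t_stat(x, y):
    """Bivariate OLS slope b, its standard error SE(b), and t = b / SE(b)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # residual variance uses n - 2 d.f.
    if se_b == 0:
        raise ValueError("perfect fit: standard error is zero")
    return b, se_b, b / se_b

b, se_b, t = slope_t_stat([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b, 4), round(se_b, 4), round(t, 4))
```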
Model diagnostics: goodness of fit
• R-square lies between 0 and 1 and represents the proportion of the total variation in the dependent variable that is explained by the regression model
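R-square
Total variation = variation explained by regression + residual variation:
$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
$R^2 = \dfrac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$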
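A short Python check of this decomposition on hypothetical toy data. It fits a least-squares line, splits the total variation into its explained and residual parts, and takes their ratio as R². The decomposition holds for a least-squares fit that includes an intercept. The helper names are illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def variation_decomposition(x, y):
    """Returns (SST, SSR, SSE, R^2) for the least-squares line."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    y_hat = [a + b * xi for xi in x]
    sst = sum((yi - my) ** 2 for yi in y)                  # total variation
    ssr = sum((yh - my) ** 2 for yh in y_hat)              # explained by regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual variation
    return sst, ssr, sse, ssr / sst

sst, ssr, sse, r2 = variation_decomposition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(sst, 6), round(ssr + sse, 6), round(r2, 6))  # 6.0 6.0 0.6
```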
Multivariate regression
• The principle is the same as in bivariate regression, but there are more explanatory variables
• The goodness of fit can be measured with the adjusted R-square, which takes into account the number of explanatory variables
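The usual adjusted R-square formula is $\bar{R}^2 = 1 - (1 - R^2)\dfrac{n - 1}{n - k - 1}$, with n observations and k explanatory variables. It penalises each extra regressor, so two models with the same R² are ranked in favour of the one with fewer variables. The sketch below uses hypothetical values for R², n and k.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than explanatory variables + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: same R^2 = 0.19 on 100 observations, 1 vs 6 explanatory variables
print(round(adjusted_r_squared(0.19, 100, 1), 4))
print(round(adjusted_r_squared(0.19, 100, 6), 4))
```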
Multivariate regression in SPSS
• Analyze / Regression / Linear
• Simply select more than one explanatory variable
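SPSS solves the multivariate least squares problem for you. The idea behind it can be sketched in Python by solving the normal equations $(X^\top X)\beta = X^\top y$. The helper `ols` is hypothetical and uses Gauss-Jordan elimination, which is fine for small teaching examples but is not how statistical packages compute estimates in a numerically robust way. The toy data are generated exactly from y = 1 + 2·x1 - 3·x2, so the fit should recover those coefficients.

```python
def ols(X, y):
    """Multiple OLS via the normal equations (X'X) beta = X'y.

    X: list of rows of explanatory variables (an intercept column is added).
    Returns [intercept, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    m = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        # Partial pivoting: move the row with the largest pivot into place
        pivot = max(range(col, p), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity?)")
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        # Eliminate this column from every other row
        for i in range(p):
            if i != col:
                factor = m[i][col]
                m[i] = [vi - factor * vc for vi, vc in zip(m[i], m[col])]
    return [m[i][p] for i in range(p)]

# Hypothetical noise-free data from y = 1 + 2*x1 - 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
print([round(c, 6) for c in ols(X, y)])  # [1.0, 2.0, -3.0]
```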
Output
Coefficient interpretation
• The constant (£296.5) is the predicted amount spent when all the explanatory variables are 0
• The coefficients for health food stores, size of store and being vegetarian are not significantly different from 0
• Gender coefficient = -69.6: on average, being a woman (G = 1) implies spending £69.6 less
• Shopping style coefficient = +22.8 per unit of S
  • S = 1 (shops for himself): +22.8
  • S = 2 (shops for himself & spouse): +45.6
  • S = 3 (shops for himself & family): +68.4
• Coupon use coefficient = +30.4 per unit of C
  • C = 1 (does not use coupons): +30.4
  • C = 2 (coupons from newspapers): +60.8
  • C = 3 (coupons from mailings): +91.2
  • C = 4 (coupons from both): +121.6
• Categorization problems? Entering category codes as numbers forces every one-step change in S or C to have the same effect on the amount spent
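One common way around the categorization problem is dummy (indicator) coding. Each category other than a chosen reference category gets its own 0/1 variable, so each can receive its own coefficient instead of a fixed +22.8 per step. A minimal sketch follows; `dummy_code` is a hypothetical helper, and the choice of reference category is an assumption.

```python
def dummy_code(values, categories, reference):
    """Turn a categorical code into 0/1 indicator columns,
    one per category except the reference category."""
    levels = [c for c in categories if c != reference]
    return [[1 if v == c else 0 for c in levels] for v in values]

# Shopping style S: 1 = for himself (reference), 2 = himself & spouse, 3 = himself & family
print(dummy_code([1, 2, 3], categories=[1, 2, 3], reference=1))
# [[0, 0], [1, 0], [0, 1]]
```

Each indicator column would then enter the regression as a separate explanatory variable.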
Prediction
• On average, how much will someone with the following characteristics spend?
  • Male (G = 0)
  • Shopping for family (S = 3)
  • Not using coupons (C = 1)
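The prediction is a matter of plugging the codes into the estimated equation. The sketch below uses only the coefficients quoted on the interpretation slide. It assumes that the non-significant terms (health food store, store size, vegetarian) can be left out, since their values are not shown. The constant names and `predict_amount` are hypothetical.

```python
# Coefficients quoted on the slides; non-significant terms omitted (assumption)
CONSTANT = 296.5
B_GENDER = -69.6   # G = 1 for women, 0 for men
B_STYLE = 22.8     # S = 1, 2, 3
B_COUPON = 30.4    # C = 1, 2, 3, 4

def predict_amount(g, s, c):
    """Predicted amount spent (in £) from the gender, shopping style and coupon codes."""
    return CONSTANT + B_GENDER * g + B_STYLE * s + B_COUPON * c

# Male, shopping for family, not using coupons
print(round(predict_amount(g=0, s=3, c=1), 1))  # 395.3
```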
How good is the model?
• The regression model explains less than 19% of the total variation in the amount spent
Task A
• Examine the relationship between the amount spent and the following customer characteristics:
  • Being male/female
  • Being vegetarian
  • Shopping for himself / for himself and others
  • Shopping style (weekly, bi-weekly, etc.)
• Potential methods:
  • A battery of hypothesis tests & analysis of variance
  • Regression analysis
Task B
• Examine the relationship between the amount spent and the following customer characteristics:
  • Hypothesis: the average amount spent in health-oriented shops is higher than in other shops. True or false?
  • Test the same hypothesis accounting for different shop sizes
• Potential methods:
  • A battery of hypothesis tests & analysis of variance
  • Regression analysis
Task C
• Find a relationship between the average amount spent per store and the following store characteristics:
  • Size of store
  • Health-oriented store
  • Store organisation
• Potential methods:
  • Transform the customer data set into a store data set
  • A battery of ANOVA tests
  • Regression analysis
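Transforming the customer data set into a store data set means collapsing many customer records into one row per store. A minimal sketch follows, assuming each record is a dict with the hypothetical keys `store` and `amount`. The store characteristics (size, organisation, etc.) would be carried over in the same way.

```python
from collections import defaultdict

def store_averages(records):
    """Collapse customer-level records into the average amount spent per store."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec["store"]] += rec["amount"]
        counts[rec["store"]] += 1
    return {store: totals[store] / counts[store] for store in totals}

# Hypothetical customer records
customers = [
    {"store": "A", "amount": 120.0},
    {"store": "A", "amount": 80.0},
    {"store": "B", "amount": 200.0},
]
print(store_averages(customers))  # {'A': 100.0, 'B': 200.0}
```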
Task D
• Hypothesis: is the amount spent by those who use coupons significantly higher?
• What is the most effective way of distributing coupons?
  • By mail
  • In newspapers
  • Both
• Potential methods:
  • Recode the variable into 1 = not using coupons and 2 = using coupons
  • Hypothesis testing
  • Analysis of variance
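A sketch of the recoding step. It assumes the four-level coupon coding from the coefficient interpretation slide (1 = no coupon, 2 = newspaper, 3 = mailing, 4 = both). Note that the correlation slide lists a 0 to 3 coding instead, so check which one the data set actually uses. `recode_coupon` is a hypothetical helper.

```python
def recode_coupon(c):
    """Map the 4-level coupon code to 1 = not using coupons, 2 = using coupons.
    Assumed input coding: 1 = none, 2 = newspaper, 3 = mailing, 4 = both."""
    if c == 1:
        return 1
    if c in (2, 3, 4):
        return 2
    raise ValueError(f"unexpected coupon code: {c!r}")

print([recode_coupon(c) for c in [1, 2, 3, 4, 1]])  # [1, 2, 2, 2, 1]
```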