
Understanding Product Moment Correlation and Regression Analysis

Learn about product moment correlation, covariance, partial correlation, and regression analysis, including bivariate regression and associated statistics and steps.

middleton

Presentation Transcript


  1. CHAPTER 17: CORRELATION AND REGRESSION • PRODUCT MOMENT CORRELATION • It is a statistic summarizing the strength of association between two metric variables. Examples: • How strongly are sales related to advertising expenditures? • Is there any association between market share and size of the sales force? • The product moment correlation, r, is the most widely used statistic. It was originally proposed by Karl Pearson, so it is also known as the Pearson correlation coefficient.

  2. PRODUCT MOMENT CORRELATION • From a sample of n observations on X and Y, the product moment correlation, r, can be calculated as follows: • $r = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$ • Dividing the numerator and denominator by n - 1 gives: • $r = \dfrac{COV_{xy}}{S_x S_y}$ • In these equations, $\bar{X}$ and $\bar{Y}$ denote the sample means, and $S_x$ and $S_y$ the standard deviations.
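
The definition of r on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original presentation: the function name `pearson_r` and the advertising and sales figures are invented for the example.

```python
import math

def pearson_r(x, y):
    """Product moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squares.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson_r(advertising, sales)
print(round(r, 4))  # 0.7746
```

Because the n - 1 terms cancel, the same value results from dividing the sample covariance by the product of the two sample standard deviations.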

  3. PRODUCT MOMENT CORRELATION • Covariance: A systematic relationship between two variables in which a change in one implies a corresponding change in the other ($COV_{xy}$). The covariance may be either positive or negative. • The statistical significance of the relationship between two variables measured by r can be conveniently tested. The hypotheses are: • $H_0: \rho = 0$ • $H_1: \rho \neq 0$ • The test statistic is: $t = r \left( \dfrac{n-2}{1-r^2} \right)^{1/2}$, which has a t distribution with n - 2 degrees of freedom.
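
As a rough sketch of the significance test, the t statistic can be computed directly from r and n. The function name and the input values below are hypothetical; the critical value still has to be looked up in a t table (or obtained from a statistics package) for n - 2 degrees of freedom.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical inputs, chosen only to illustrate the arithmetic.
r_sample = 0.7746
n_sample = 5
t_value = correlation_t_statistic(r_sample, n_sample)
degrees_of_freedom = n_sample - 2
print(round(t_value, 4), degrees_of_freedom)
```

The null hypothesis is rejected when the absolute value of t exceeds the critical value at the chosen significance level.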

  4. PARTIAL CORRELATION COEFFICIENT: • A measure of association between two variables after controlling or adjusting for the effects of one or more additional variables. • The statistic is used to answer questions such as the following: • How strongly are sales related to advertising expenditures when the effect of price is controlled? • Is there any association between market share and size of the sales force after adjusting for the effects of sales promotion? • Are consumers' perceptions of quality related to their perceptions of price when the effect of brand image is controlled?

  5. PARTIAL CORRELATION COEFFICIENT: • Association between Y and X after controlling for Z: • $r_{XY.Z} = \dfrac{r_{XY} - r_{XZ} r_{YZ}}{\sqrt{1 - r_{XZ}^2} \sqrt{1 - r_{YZ}^2}}$ • PART CORRELATION COEFFICIENT: • A measure of the correlation between Y and X when the linear effects of the other independent variables have been removed from X but not from Y: • $r_{Y(X.Z)} = \dfrac{r_{XY} - r_{YZ} r_{XZ}}{\sqrt{1 - r_{XZ}^2}}$
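
Both coefficients depend only on the three simple correlations, so they are easy to sketch in code. The function names and the correlation values below are assumptions made for illustration.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z removed from both."""
    return (r_xy - r_xz * r_yz) / (
        math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2)
    )

def part_correlation(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z removed from X only."""
    return (r_xy - r_yz * r_xz) / math.sqrt(1 - r_xz ** 2)

# Hypothetical simple correlations, invented for illustration only.
r_xy, r_xz, r_yz = 0.6, 0.5, 0.4
partial = partial_correlation(r_xy, r_xz, r_yz)
part = part_correlation(r_xy, r_xz, r_yz)
print(round(partial, 4), round(part, 4))
```

The two formulas share a numerator, and the part correlation omits one factor (at most 1) from the denominator, so its absolute value never exceeds that of the partial correlation.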

  6. REGRESSION ANALYSIS: • A statistical procedure for analyzing associative relationships between a metric dependent variable and one or more independent variables. It can be used in the following ways: • Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists. • Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship. • Determine the structure or form of the relationship: the mathematical equation relating the independent and dependent variables. • Predict the values of the dependent variable. • Control for other independent variables when evaluating the contributions of a specific variable or set of variables.

  7. BIVARIATE REGRESSION: • A procedure for deriving a mathematical relationship, in the form of an equation, between a single metric dependent variable and a single metric independent variable. It can address questions such as: • Are consumers' perceptions of quality determined by their perceptions of price? • Can the variation in market share be accounted for by the size of the sales force?

  8. STATISTICS ASSOCIATED WITH BIVARIATE REGRESSION ANALYSIS: • Bivariate regression model. • Coefficient of determination. • Estimated or predicted value. • Regression coefficient. • Scatter diagram. • Standard error of estimate. • Standard error. • Standardized regression coefficient. • Sum of squared errors. • t statistic.

  9. STEPS IN CONDUCTING BIVARIATE REGRESSION ANALYSIS: • Plot the scatter diagram. • Formulate the general model. • Estimate the parameters. • Estimate the standardized regression coefficient. • Test for significance. • Determine the strength and significance of the association. • Check the prediction accuracy. • Examine the residuals. • Cross-validate the model.

  10. PLOT THE SCATTER DIAGRAM • A scatter diagram is a plot of the values of two variables for all cases or observations. It is customary to plot the dependent variable on the vertical axis and the independent variable on the horizontal axis. • The least squares procedure is a technique for fitting a straight line to a scattergram by minimizing the sum of the squared vertical distances of all points from the line. • Formulate the general model: • The general form is: • $Y = \beta_0 + \beta_1 X$ • In marketing research very few relationships are deterministic, so an error term is added. The basic model becomes: • $Y_i = \beta_0 + \beta_1 X_i + e_i$

  11. Estimate the parameters: • In most cases $\beta_0$ and $\beta_1$ are unknown and are estimated from the sample observations using the equation: • $\hat{Y}_i = a + b X_i$ • The slope b may be computed in terms of the covariance between X and Y ($COV_{xy}$) and the variance of X as: • $b = \dfrac{COV_{xy}}{S_x^2} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \dfrac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}$ • The intercept a may be calculated as follows: • $a = \bar{Y} - b \bar{X}$
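
The slope and intercept formulas translate directly into code. The sketch below uses invented data and hypothetical function names. It also checks that the deviation form and the computational (raw sums) form of the slope agree.

```python
def fit_bivariate(x, y):
    """Least squares intercept a and slope b for Y-hat = a + b X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # slope: covariance over variance of X
    a = mean_y - b * mean_x     # intercept: line passes through the means
    return a, b

def slope_from_raw_sums(x, y):
    """Same slope via (sum XY - n*Xbar*Ybar) / (sum X^2 - n*Xbar^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y
    denominator = sum(xi ** 2 for xi in x) - n * mean_x ** 2
    return numerator / denominator

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_bivariate(x, y)
print(round(a, 4), round(b, 4))  # 2.2 0.6
```

The raw-sums form needs only running totals, which is why it was popular for hand calculation. The deviation form is usually the more numerically stable choice in code.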

  12. Estimate Standardized Regression Coefficient: • Standardization is a process by which the raw data are transformed into new variables that have a mean of 0 and a variance of 1. When the data are standardized, the intercept is 0, and the slope obtained by regressing Y on X ($B_{yx}$) is the same as the slope obtained by regressing X on Y ($B_{xy}$). Moreover, each of these regression coefficients is equal to the simple correlation between X and Y: • $B_{yx} = B_{xy} = r_{xy}$ • There is a simple relationship between the standardized and nonstandardized regression coefficients: • $B_{yx} = b_{yx} (S_x / S_y)$
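
A quick numerical check of this relationship, using invented data: rescaling the unstandardized slope by $S_x / S_y$ should reproduce the simple correlation. All variable names here are illustrative.

```python
import statistics

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b_yx = sxy / sxx                      # unstandardized slope of Y on X
# statistics.stdev is the sample standard deviation (n - 1 denominator).
beta_yx = b_yx * statistics.stdev(x) / statistics.stdev(y)
r_xy = sxy / (sxx * syy) ** 0.5       # simple correlation

print(round(beta_yx, 4), round(r_xy, 4))
```

The two printed values agree, which is the bivariate identity stated on the slide. With more than one predictor the standardized coefficients generally differ from the simple correlations.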

  13. Determine the Strength and Significance of Association: • The strength of association may be calculated as follows: • $r^2 = \dfrac{SS_{reg}}{SS_y} = \dfrac{SS_y - SS_{res}}{SS_y}$ • It may be recalled from the earlier calculation of the simple correlation coefficient that: • $SS_y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

  14. The Regression of attitudes towards the city on the duration of residence • $SS_{reg} = \sum (\hat{Y}_i - \bar{Y})^2$, where $\hat{Y}_i$ is the value of attitude predicted using a and b. • $SS_{res} = \sum (Y_i - \hat{Y}_i)^2$, where $Y_i$ is the observed attitude towards the city. • The appropriate test statistic is the F statistic, with 1 and n - 2 degrees of freedom: • $F = \dfrac{SS_{reg}}{SS_{res} / (n-2)}$ • The statistical significance of the linear relationship between X (duration of residence) and Y (attitudes towards the city) may be tested by examining the hypotheses: • $H_0: \beta_1 = 0$ • $H_1: \beta_1 \neq 0$
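
The sums of squares, $r^2$, and the F statistic can be computed together once the line has been fitted. The sketch below uses invented data (not the attitude and duration figures from the original study) and a hypothetical function name.

```python
def bivariate_anova(x, y):
    """Sums of squares, r-squared and F for a fitted bivariate regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    y_hat = [a + b * xi for xi in x]

    ss_y = sum((yi - mean_y) ** 2 for yi in y)                 # total
    ss_reg = sum((fi - mean_y) ** 2 for fi in y_hat)           # explained
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual
    r2 = ss_reg / ss_y
    f_stat = ss_reg / (ss_res / (n - 2))   # 1 and n - 2 degrees of freedom
    return {"ss_y": ss_y, "ss_reg": ss_reg, "ss_res": ss_res,
            "r2": r2, "F": f_stat}

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = bivariate_anova(x, y)
print({name: round(value, 4) for name, value in result.items()})
```

For these data the total variation of 6 splits into 3.6 explained and 2.4 residual, giving $r^2 = 0.6$ and F = 4.5. In the bivariate case F equals the square of the t statistic for the correlation.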

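  15. Check prediction accuracy: • The standard error of estimate, SEE, is the standard deviation of the actual Y values about the predicted $\hat{Y}$ values: • $SEE = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$ • or, equivalently: • $SEE = \sqrt{\dfrac{SS_{res}}{n-2}}$ • If there are k independent variables, then: • $SEE = \sqrt{\dfrac{SS_{res}}{n-k-1}}$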
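
A minimal sketch of the standard error of estimate, computed from the residual sum of squares (observed minus predicted values) with n - k - 1 degrees of freedom. The function name, the data, and the fitted line are assumptions made for illustration.

```python
import math

def standard_error_of_estimate(observed, predicted, k=1):
    """SEE = sqrt(SS_res / (n - k - 1)) for k independent variables."""
    n = len(observed)
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(observed, predicted))
    return math.sqrt(ss_res / (n - k - 1))

# Hypothetical observed values and predictions from the
# illustrative line Y-hat = 2.2 + 0.6 X at X = 1..5.
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
see = standard_error_of_estimate(observed, predicted, k=1)
print(round(see, 4))  # 0.8944
```

With k = 1 the divisor is n - 2, matching the bivariate formula. A smaller SEE means the observed values lie closer to the fitted line.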

  16. ASSUMPTIONS • The error term is normally distributed. For each fixed value of X, the distribution of Y is normal. • The means of all these normal distributions of Y, given X, lie on a straight line with slope b. • The mean of the error term is 0. • The variance of the error term is constant. • The error terms are uncorrelated. Observations have been drawn independently.

  17. MULTIPLE REGRESSION • A statistical technique that simultaneously develops a mathematical relationship between two or more independent variables and an interval-scaled dependent variable. • The general form of the multiple regression model is as follows: • $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + e$ • which is estimated by the following equation: • $\hat{Y} = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$

  18. Statistics associated with multiple regression: • Adjusted R². • Coefficient of multiple determination. • F test. • Partial F test. • Partial regression coefficient. • CONDUCTING MULTIPLE REGRESSION ANALYSIS: • Partial Regression Coefficients: • $\hat{Y} = a + b_1 X_1 + b_2 X_2$

  19. Strength of association • The total variation is decomposed as follows: • $SS_y = SS_{reg} + SS_{res}$ • where: • $SS_y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ • $SS_{reg} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$ • $SS_{res} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ • The strength of association is measured as follows: • $R^2 = \dfrac{SS_{reg}}{SS_y}$
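
To make the decomposition concrete, the sketch below fits a regression with two predictors by solving the normal equations in deviation form, then splits the total variation into explained and residual parts. The closed-form solution shown applies only to the two-predictor case. The function name and data are invented for illustration, and a statistics package would normally be used instead.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares a, b1, b2 for Y-hat = a + b1*X1 + b2*X2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

# Hypothetical data, invented for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [5, 6, 11, 11, 17, 16]

a, b1, b2 = fit_two_predictors(x1, x2, y)
y_hat = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
mean_y = sum(y) / len(y)

ss_y = sum((w - mean_y) ** 2 for w in y)
ss_reg = sum((f - mean_y) ** 2 for f in y_hat)
ss_res = sum((w - f) ** 2 for w, f in zip(y, y_hat))
r_squared = ss_reg / ss_y
print(round(ss_y, 4), round(ss_reg, 4), round(ss_res, 4), round(r_squared, 4))
```

For any least squares fit that includes an intercept, the explained and residual sums of squares add up to the total, which is what makes $R^2$ interpretable as a proportion of variation explained.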

  20. Significance Testing: • The overall test can be conducted using the F statistic, with k and n - k - 1 degrees of freedom: • $F = \dfrac{SS_{reg} / k}{SS_{res} / (n-k-1)} = \dfrac{R^2 / k}{(1 - R^2) / (n-k-1)}$ • Examination of residuals: • A residual is the difference between the observed value $Y_i$ and the value $\hat{Y}_i$ predicted by the regression equation.
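
The overall F statistic can be computed either from the sums of squares or from $R^2$. The sketch below shows that the two routes give the same value. Function names and input values are hypothetical.

```python
def overall_f_from_ss(ss_reg, ss_res, n, k):
    """F = (SS_reg / k) / (SS_res / (n - k - 1))."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return (ss_reg / k) / (ss_res / (n - k - 1))

def overall_f_from_r2(r2, n, k):
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    if not 0 <= r2 < 1:
        raise ValueError("R^2 must satisfy 0 <= R^2 < 1")
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical values: 23 observations, 2 predictors,
# total variation 100 split into 75 explained and 25 residual.
n_obs, k_predictors = 23, 2
f_ss = overall_f_from_ss(75.0, 25.0, n_obs, k_predictors)
f_r2 = overall_f_from_r2(75.0 / 100.0, n_obs, k_predictors)
print(round(f_ss, 4), round(f_r2, 4))
```

Both routes give F = 30 here, because dividing the numerator and denominator of the first form by $SS_y$ yields the second.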

  21. Stepwise Regression • A regression procedure in which the predictor variables enter or leave the regression equation one at a time. There are several approaches to stepwise regression: • Forward inclusion. • Backward elimination. • Stepwise solution.

  22. MULTICOLLINEARITY: • A state of very high intercorrelations among independent variables. Multicollinearity can result in several problems: • The partial regression coefficients may not be estimated precisely. • The standard errors are likely to be high. • The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample. • It becomes difficult to assess the relative importance of the independent variables. • Predictor variables may be incorrectly included or removed in stepwise regression.

  23. RELATIVE IMPORTANCE OF PREDICTORS: • Several approaches are commonly used to assess the relative importance of predictor variables: • Statistical significance • Square of simple correlation coefficient. • Square of partial correlation coefficient. • Square of part correlation coefficient. • Measures based on standardized coefficient or beta weights. • Stepwise regression.

  24. CROSS VALIDATION: • A test of validity that examines whether a model holds on comparable data not used in the original estimation. • DOUBLE CROSS VALIDATION: • A special form of validation in which a sample is split into halves. One half serves as the estimation sample and the other as the validation sample. The roles of the estimation and validation halves are then reversed, and the cross-validation process is repeated. • REGRESSION WITH DUMMY VARIABLES: • With a categorical predictor coded as dummy variables $D_1$, $D_2$, and $D_3$, the model can be estimated as follows: • $\hat{Y}_i = a + b_1 D_1 + b_2 D_2 + b_3 D_3$
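
Double cross-validation can be sketched for a bivariate regression as follows. Several choices here are assumptions, not taken from the presentation: the data are invented, the halves are formed by alternating observations (a real study would usually split at random), and root mean squared prediction error is used as the validation measure.

```python
import math

def fit_line(x, y):
    """Least squares intercept and slope for Y-hat = a + b X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def prediction_rmse(x, y, a, b):
    """Root mean squared error of a + b X on data not used for fitting."""
    errors = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(errors) / len(errors))

def double_cross_validate(x, y):
    """Fit on each half in turn and validate on the other half."""
    x_a, y_a = x[0::2], y[0::2]   # first half: every other observation
    x_b, y_b = x[1::2], y[1::2]   # second half: the remaining ones
    a1, b1 = fit_line(x_a, y_a)
    error_on_b = prediction_rmse(x_b, y_b, a1, b1)
    a2, b2 = fit_line(x_b, y_b)   # roles reversed
    error_on_a = prediction_rmse(x_a, y_a, a2, b2)
    return error_on_b, error_on_a

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
error_on_b, error_on_a = double_cross_validate(x, y)
print(round(error_on_b, 4), round(error_on_a, 4))
```

Similar, small errors in both directions suggest that the model holds on data not used in estimation. A large gap between the two would be a warning sign.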
