BUSINESS STATISTICS, 2/E, by G C Beri
Chapter 16: Correlation
Concept and Importance of Correlation
Correlation analysis is a statistical tool used to ascertain the association between two variables. The problem of analysing this association can be broken down into three steps:
• First, we try to find out whether the two variables are related or independent of each other.
• Second, if there is a relationship, we try to establish its nature and strength, that is, whether it is positive or negative and how close it is.
• Third, we may like to know whether the relationship is causal, that is, whether variation in one variable causes variation in the other.
Correlation and Causation
A correlation between two variables does not by itself establish causation:
• The correlation may be due to chance, particularly when the data pertain to a small sample.
• Both variables may be influenced by one or more other variables.
• The two variables may be influencing each other, so that we cannot say which is the cause and which is the effect.
Types of Correlation
• Positive and negative
• Linear and non-linear
• Simple, partial and multiple
Algebraic Methods of Correlation
Karl Pearson’s Method: Direct Method
Process of Calculating Coefficient of Correlation
• Calculate the means of the two series, X and Y.
• Take deviations in the two series from their respective means, denoted x and y. In each case the deviation is the value of the individual item minus the arithmetic mean.
• Square the deviations in both series and sum each deviation-squared column. This gives Σx² and Σy².
• Multiply each deviation by the corresponding deviation in the other series and sum the products. This gives Σxy.
• Substitute the values Σxy, Σx² and Σy² in the formula for correlation, given earlier.
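The steps above can be sketched in Python. The formula itself is not reproduced on these slides, so this sketch assumes the standard direct-method form r = Σxy / √(Σx² · Σy²). The function name `pearson_r` and the sample figures are illustrative only and do not come from the book.

```python
import math

def pearson_r(xs, ys):
    """Direct method: deviations are taken from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviation = individual value minus the arithmetic mean
    x = [v - mean_x for v in xs]
    y = [v - mean_y for v in ys]
    sum_x2 = sum(d * d for d in x)              # Σx²
    sum_y2 = sum(d * d for d in y)              # Σy²
    sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy
    # Assumed formula: r = Σxy / sqrt(Σx² · Σy²)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical data, chosen only to keep the arithmetic small
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
r = pearson_r(X, Y)
print(round(r, 4))
```

For this sample, Σxy = 6, Σx² = 10 and Σy² = 6, so r = 6/√60, roughly 0.77.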
Calculating Coefficient of Correlation by Short-cut Method
The following steps are involved in calculating the coefficient of correlation by this method:
• Choose convenient values as assumed means of the two series, X and Y.
• Obtain deviations from the assumed means (now dx and dy instead of x and y) in the same manner as in the earlier example.
• Obtain the sums of the dx and dy columns, that is, Σdx and Σdy.
• Square the deviations dx and dy and obtain their totals, Σdx² and Σdy².
• Finally, obtain Σdxdy, the sum of the products of the deviations taken from the assumed means in the two series.
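A minimal sketch of the short-cut method follows. The slides do not show the formula, so the code assumes one common form: r = [NΣdxdy − Σdx·Σdy] / √([NΣdx² − (Σdx)²][NΣdy² − (Σdy)²]). The function name, the assumed means and the data are hypothetical.

```python
import math

def pearson_r_shortcut(xs, ys, assumed_x, assumed_y):
    """Short-cut method: deviations are taken from assumed means."""
    n = len(xs)
    dx = [v - assumed_x for v in xs]
    dy = [v - assumed_y for v in ys]
    s_dx = sum(dx)                                  # Σdx
    s_dy = sum(dy)                                  # Σdy
    s_dx2 = sum(d * d for d in dx)                  # Σdx²
    s_dy2 = sum(d * d for d in dy)                  # Σdy²
    s_dxdy = sum(a * b for a, b in zip(dx, dy))     # Σdxdy
    # Assumed formula; the correction terms offset the error
    # introduced by using assumed rather than actual means.
    numerator = n * s_dxdy - s_dx * s_dy
    denominator = math.sqrt((n * s_dx2 - s_dx ** 2) * (n * s_dy2 - s_dy ** 2))
    return numerator / denominator

# Hypothetical data; 2 and 5 are arbitrary assumed means
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
r_short = pearson_r_shortcut(X, Y, assumed_x=2, assumed_y=5)
print(round(r_short, 4))
```

The result does not depend on which assumed means are chosen, so any convenient round values can be used.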
Steps for Calculating Correlation Coefficient for Grouped Data
(i) Record the mid-points of the class intervals for both X and Y variables.
(ii) Choose an assumed mean in the X series and calculate the deviations from it. The same procedure is to be used for the Y series.
(iii) To simplify calculations, step deviations can be taken by dividing the deviations by a common factor.
(iv) Obtain the product of dx, dy and the corresponding frequency in each cell. Write the figure thus obtained in the right-hand corner of each cell. If this is inconvenient, an alternative is to write these values within brackets, as we have done.
(v) Add up all the values obtained in (iv) above to obtain Σfdxdy.
(vi) Multiply dx by the respective frequencies and add them up to obtain Σfdx.
(vii) Multiply fdx in each cell by the corresponding dx to obtain Σfdx².
(viii) In the same manner, multiply dy by the respective frequencies and add them up to obtain Σfdy.
(ix) In the same manner as in (vii) above, multiply dy and fdy to obtain Σfdy².
(x) Having obtained all the requisite values, viz. Σfdxdy, Σfdx, Σfdy, Σfdx² and Σfdy², substitute them in one of the formulae given above.
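The grouped-data procedure can be sketched as below. The formula is not shown on these slides, so the code assumes the usual form r = [NΣfdxdy − Σfdx·Σfdy] / √([NΣfdx² − (Σfdx)²][NΣfdy² − (Σfdy)²]), where N is the total frequency. The function name, the table layout (rows for Y classes, columns for X classes) and the figures are all hypothetical.

```python
import math

def grouped_pearson_r(x_mid, y_mid, freq, assumed_x, assumed_y,
                      step_x=1, step_y=1):
    """freq[i][j] is the frequency of the cell (y_mid[i], x_mid[j])."""
    # Step deviations: deviation from the assumed mean divided by a common factor
    dx = [(m - assumed_x) / step_x for m in x_mid]
    dy = [(m - assumed_y) / step_y for m in y_mid]
    n = s_fdx = s_fdy = s_fdx2 = s_fdy2 = s_fdxdy = 0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            n += f
            s_fdx += f * dx[j]              # Σfdx
            s_fdy += f * dy[i]              # Σfdy
            s_fdx2 += f * dx[j] ** 2        # Σfdx²
            s_fdy2 += f * dy[i] ** 2        # Σfdy²
            s_fdxdy += f * dx[j] * dy[i]    # Σfdxdy
    numerator = n * s_fdxdy - s_fdx * s_fdy
    denominator = math.sqrt((n * s_fdx2 - s_fdx ** 2) * (n * s_fdy2 - s_fdy ** 2))
    return numerator / denominator

# Hypothetical two-way frequency table
x_mid = [10, 20, 30]          # mid-points of the X classes
y_mid = [5, 15]               # mid-points of the Y classes
freq = [[2, 1, 0],            # row for Y mid-point 5
        [0, 1, 2]]            # row for Y mid-point 15
r_grouped = grouped_pearson_r(x_mid, y_mid, freq,
                              assumed_x=20, assumed_y=5,
                              step_x=10, step_y=10)
print(round(r_grouped, 4))
```

Because r is unaffected by a change of origin or scale, dividing by the common factors `step_x` and `step_y` simplifies the arithmetic without altering the answer.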
t Test for a Correlation Coefficient
The most frequently used test to examine whether the two variables X and Y are correlated is the t test. To apply this test, we first set up the two hypotheses as follows:
H0: ρ = 0 (absence of correlation)
H1: ρ ≠ 0 (presence of correlation)
where ρ is the population correlation coefficient. The test statistic t follows a t distribution with n − 2 degrees of freedom.
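As an illustration of the test, the sketch below assumes the standard statistic t = r√(n − 2) / √(1 − r²), which is not printed on this slide. The sample values r = 0.6 and n = 27 are hypothetical, and the critical value 2.060 is the tabulated two-tailed 5 per cent point for 25 degrees of freedom.

```python
import math

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than two pairs of observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    # Assumed formula: t = r * sqrt(n - 2) / sqrt(1 - r^2)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical sample result
r, n = 0.6, 27
t = t_statistic(r, n)
df = n - 2
critical = 2.060   # from a t table: two-tailed, 5 per cent level, 25 d.f.
reject_h0 = abs(t) > critical
print(round(t, 2), df, reject_h0)
```

Here t works out to about 3.75, which exceeds the critical value, so H0 would be rejected at the 5 per cent level for this hypothetical sample.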
Assumptions of the Karl Pearsonian Correlation
• The two variables X and Y are linearly related.
• The two variables are affected by several causes, which are independent, so as to form a normal distribution.
Coefficient of Determination
The strength of r is judged by the coefficient of determination, r². For r = 0.9, r² = 0.81; multiplying by 100 gives 81 per cent. This means that when r is 0.9, 81 per cent of the total variation in the Y series can be attributed to the relationship with X.
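The arithmetic above can be repeated for other values of r with a short sketch. The function name `percent_explained` is illustrative, and the values 0.5 and −0.5 are added here only for comparison with the 0.9 case.

```python
def percent_explained(r):
    """Coefficient of determination, r squared, expressed as a percentage."""
    return (r ** 2) * 100

for r_value in (0.9, 0.5, -0.5):
    print(r_value, round(percent_explained(r_value), 1))
```

Because r is squared, the sign of the correlation drops out, and a moderate r such as 0.5 accounts for only 25 per cent of the variation in Y.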
Rank Correlation
In the second case, when the ranks are not given, that is, when actual data are given, we have to assign ranks.
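A minimal sketch of assigning ranks to actual data and then computing the rank correlation is given below. It assumes the usual Spearman formula, 1 − 6Σd² / (n(n² − 1)), which does not appear on this slide. It also assumes that the highest value receives rank 1 and that there are no tied values. The function names and the data are hypothetical.

```python
def assign_ranks(values):
    """Rank 1 goes to the highest value; tied values are not handled."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch assumes no tied values")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rank_correlation(xs, ys):
    rank_x = assign_ranks(xs)
    rank_y = assign_ranks(ys)
    n = len(xs)
    # d is the difference between the two ranks of each item
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # Assumed formula: 1 - 6 * Σd² / (n * (n² - 1))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical marks of five candidates in two tests
X = [78, 36, 98, 25, 75]
Y = [84, 51, 91, 60, 68]
print(assign_ranks(X))
print(assign_ranks(Y))
rs = spearman_rank_correlation(X, Y)
print(round(rs, 2))
```

For this sample the ranks are 2, 4, 1, 5, 3 and 2, 5, 1, 4, 3, giving Σd² = 2 and a rank correlation of 0.9.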
Limitations of Spearman’s Method of Correlation
• Spearman’s rank correlation coefficient is a distribution-free, or non-parametric, measure of correlation. As such, the result may not be as dependable as in the case of ordinary correlation, where the distribution is known.
• Rank correlation cannot be applied to a grouped frequency distribution.
• When the number of observations is quite large, assigning ranks to the observations in the two series becomes tedious and time-consuming. This is a major limitation of rank correlation.
Some Limitations of Correlation Analysis
• Correlation analysis cannot determine a cause-and-effect relationship.
• A frequent mistake is to misinterpret the coefficient of correlation and the coefficient of determination.
• Another mistake in interpreting the coefficient of correlation is to conclude that a positive or negative relationship exists even though the two variables are actually unrelated.