560 likes | 1.28k Views
Regression Analysis. British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century. Meaning of Regression A nalysis.
E N D
Regression Analysis British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century.
Meaning of Regression Analysis • Regression means the estimation or prediction of the unknown value of dependent variable from the known value of independent variable. In other words, regression analysis is a mathematical measure of the average relationship between two or more variables. Definition of Regression By M.M. Blair- Regression is the measure of the average relationship between two or more variables. By Hirsch- regression analysis measures the nature and extent of the relation between two or more variables, thus enables us to make predictions.
Utility of Regression 1. Regression analysis is essential for planning and policy making:It is widely used to estimate the coefficient of economic relationship. These coefficient helps in the formulation of economic policy of govt. 2. Testing Economic Theory: Regression analysis is one of the basic tool to test the accuracy of economictheory. 3. Prediction:Byregression analysis, the value of dependent variable can be predicted on the basis of the value of an independent variable. For example, if price of a commodity rises, what will be the probable fall in demand, this can be predicted by regression. 4.Regression is used to study the functional relationship: In agriculture, it is important to study the response of fertilizer.
Types of Regression Analysis 1. Simple and Multiple Regression:In simple regression analysis, we study only two variables at a time, in which one variable is dependent and another is independent. The relationship between income and expenditure is an example of simple regression. In multiple regression analysis, we study more than two variables at a time in which one is dependent variable and others are independent variable. The study of effect of rain and irrigation on yield of wheat is an example of multiple regression. 2. Linear and Non-linear Regression: When one variable changes with other variable in fixed ratio this is called linear regression. When one variable varies with other variable in changing ratio, then it is called as non-linear regression. 3. Partial and Total Regression: When two or more variables are studied for functional relationship but at a time, relationship between two variables is studied and the other variables are constant, it is known as partial regression. When all the variables are studied simultaneously for the relationship among them is called total regression.
Regression Lines • The Regression Line shows the average relationship between two variables. This is also known as the line of Best Fit. If two variables X and Y are given, then there are two regression lines related to them: • Regression Line of X on Y:The regression line of X on Y gives the best estimate for the value of X for any given value of Y • Regression Line of Y on X: The regression line of Y on X gives the best estimate for the value of Y for any value of X.
Scatter diagram method • This is the simplest method of constructing regression lines. In this method, values of related variables are plotted on a graph. A straight line is drawn passing through the plotted points. The straight line is drawn with freehand. The shape of regression line can be linear or non-linear.
Least Square Method • Regression Lines can also be constructed by this method. Under this method a regression line is fitted through different points in such a way that the sum of squares of the deviations of the observed values from the fitted line shall be least. The line drawn by this method is called Line of Best Fit. Under this method two lines regression lines are drawn in such a way that sum of squared deviations becomes minimum.
Regression Equations • Regression equations are the algebraic formulation of regression lines. Regression Equations represent regression lines. There are two regression equations:
Regression Equation of Y on X • This equation is used to estimate the probable value of Y on the basis of the given values of X. This equation can be expressed in the following way: • Y=a+bX • Here a and b are constants • Regression equation of Y on X can also be presented in another way: • Y−Y¯=byx(X—X¯)
Regression Equation of X on Y • This equation is used to estimate the probable values of X on the basis of the given values of Y. This equation is expressed in the following: • X=a+bY • Here a and b are constants • Regression equation of X on Y can also be written in the following way: • X—X¯=bxy(Y—Y¯¯)
Regression Coefficients • There are two regression coefficients. Regression coefficients measures the average change in the value of one variable for q unit change in the value of another variables. Regression coefficient represents the slope of a regression line. There are two regression coefficients:
Regression coefficient of Y on X • This coefficient shows that with a unit change in the value of X variable, what will be the average change in the value of Y variable. This is represented byx. Its formula is as follows: • byx=r.y∕x • Regression coefficient of X on Y • This coefficient shows that with a unit change in the value of Y variable, what will be the average change in the value of X-variable. It is represented by bxy. Its formula is as follows: • bxy=r.x∕y
Properties of Regression Coefficients • Following are the properties of regression coefficients: • 1. Coefficient of correlation is the geometric mean of the regression coefficients. • r=√bxy×byx • 2. Both the regression coefficients must have same sign- This means either both regression coefficients will either be positive or negative. In other words when one regression coefficient is negative, the other would also be negative. It is never possible that one regression coefficient is negative while the other is postive.
3. The coefficient of correlation will have the same sign as that of regression coefficients- If both regression coefficients are negative, then the correlation would be negative. And if byx and bxy have positive signs, then r will also take plus sign. • 4. Both regression coefficients cannot be greater than unity- If one regression coefficient of y on x is greater than unity, then the regression coefficient of x on y must be less than unity. This is because • r=√byx.bxy=±1
5.Arithmetic mean of two regression coefficients is either equal to or greater than the correlation coefficient. In terms of the formula: byx+bxy÷2≥r 6. Shift of origin does not affect regression coefficients but shift in scale does affect regression coefficient- Regression coefficients are independent of the change of origin but not of scale.
In case of Individual Series • In Individual series, regression equations can be worked out by two methods:
Regression Equations using Normal Equations • This method is also called Least Square Method. Under this method, computation of regression equations is done by solving two normal equations. • Regression Equation of Y on X • Y=a+bX • Under least square method, the values of a and b are obtained by using the following two normal equations: • ∑Y=Na+b∑X • ∑XY=a∑X+b∑X² • Solving these equations, we get the following value of a and b: • bxy= N.∑XY−∑X.∑Y÷N.∑X²−(∑X)² • a=Y-bX • Finally a and b are put in the equation: • Y=a+bX
Regression Equation of X on Y • Regression Equation of X on Y is expressed as follows: • X=a+bY • Under least Square Method the value of a and b are obtained by using the following normal equations: • ∑X=Na+b∑Y • ∑XY=a∑Y+b∑Y² • Solving these equations we get the value of a and b The calculated value of a and b are put in the equation: X=a+bY
Example • Calculate the regression equation of X on Y from the following by least square method
Solution • Regression Equation of X on Y • X=a+bY • The two normal equations are: • ∑X=Na+b∑Y • ∑XY=a∑Y+b∑Y² • Substituting the values we get • 15=5a+25b • 88=25a+151b • Multiplying 1 and 2 • 88=25a+151b • 75=25a+125b • − − − • 13=26b • b=13÷26=0.5
Putting the value of b in equation1 • 15=5a+25×0.5 • 15=5a+12.5 • 5a=2.5 • a=0.50 • Therefore: X=0.5+0.5Y
Regression Equation using Regression Coefficients • Regression equations can also be computed with the help of regression coefficients. For this we will have to find out, X‾, Y‾, byx, bxy from the data. Regression equations can be computed from the regression coefficients by following method: • 1. Using actual values of X and Y • 2. Using deviations from Actual Means • 3. Using deviations from Assumed Means • 4. Using r, x, y, and X‾, Y‾.
Using Actual values of X and Y series • In this method , actual values of X and Y are used to determine regression equations. Regression equation are put in the following way: • Regression Equation of Y on X • Y-Y‾=byx(X-X‾) • Using the actual values, the value of byx can be calculated as: • byx=N.∑XY−∑X.∑Y÷N.∑X²−(∑X)² • Regression Equation of X on Y • X−X‾=bxy(Y−Y‾) Using the actual values bxy can be calculated as : bxy=N.∑XY−∑X.∑Y÷N.∑Y²−(Y)²
Example • Obtain two lines of equations from the following :
Regression coefficient of Y on X • byx=N.∑XY−(∑X) (∑Y)÷N.∑X²−(∑X)² • =5×266−30×40÷5×220−(30)² • =1330−1200÷1100−900 • =130÷200 • =0.65 • Regression coefficient of X on Y • bxy=N.∑XY−∑X.∑Y÷N.∑Y²−(Y)² =5×266−30×40÷5×340−(40)² =1330−1200÷1700−1600 =130÷100 =1.30 • Y−Y‾=byx(X−X‾) • Y‾=∑Y÷N • =40÷5 • =8 • X‾=∑X÷N • =30÷5 • =6
Regression equation of Y on X • Y−Y‾=byx(X−X‾) • Y−8=0.65(X− 6) • Y=4.1+0.65X • Regression equation of X on Y • X−X‾=bxy(Y−Y‾) • X−6=1.30(Y−8) • X=−4.40+1.30Y
Using Deviations take from Actual Means • When the size of the values of X and Y is very large, then the method using actual values becomes very difficult to use. In such case, deviations taken from arithmetic means are used. • Regression equation of Y on X • Y−Y‾=byx(X−X‾) • Using deviations from actual means: • byx=∑xy÷∑x² • Where, x=X−X‾, y=Y−Y‾ • Regression equation of X on Y • X−X‾=bxy(Y−Y‾) • Using deviations from actual means: • bxy=∑xy÷∑y²
Example • Obtain the two regression equations from the following:
Solution Since the actual means of X and Y are whole numbers we should take deviations from X‾and Y‾ • byx=∑xy÷∑x² • =18÷70 • =0.257 • bxy=∑xy÷∑y² • =18÷40 • =0.45 • Regression equation of Y on X • Y−Y‾=byx(X−X‾) • Y−5=0.257(X−7) • Y−5=0.257X−7 • Y=0.257X+3.201 • Regression equation of X on Y • X−X‾=bxy(Y−Y‾) • X−7=0.45(Y−5) • X−7=0.45Y−2.25 • X=0.45Y=4.75
Using Deviations taken from Assumed Means • When actual means turn out to be in fractions rather than the whole number like 24.14, 56.89 etc, then deviations from assumed means rather than actual means are used. Following are the regression equations: • Regression equation of Y on X • Y−Y‾=byx(X−X‾) • Using deviations from the assumed means, value of byx can be calculated as: • byx=N×∑dxdy−∑dx.∑dy÷N.∑dx²−(∑dx)²
Regression equation of X on Y • X−X¯=bxy(Y−Y¯) • Using deviations from assumed means the value of bxy can be calculated: • bxy=N.∑dxdy−∑dx.∑dy÷N.∑dy²−(∑dy)² • Where dx=X−Ax, dy=Y−Ay
Example • Obtain the two regression equations for the following data:
Solution • byx=N.∑dxdy−∑dx.∑dy÷N.∑dx²−(dx)² • =8×2159−(48)(109)÷8×1530−(48)² • =17272−5232÷12240−2304 • =12040÷9936 • =1.212 • Regression equation of Y on X • Y−Y¯=byx(X−X¯) • Y−125.625=1.212(X−75) • Y−125.625=1.212X−90.9 • Y=1.212X+34.725
bxy=N.∑dxdy−∑dx.∑dy÷N.∑dy²−(∑ dy)² • =8×2159−(48)(109)÷8×3491−(1005)² • =17272−5232÷27928−1010025 • =12040÷−9827097 • =−0.001 • Regression equation of X on Y • X−X¯=bxy(Y−Y¯) • X−75=−0.001(Y−125.625) • X−75=−0.001Y+0.125 • X=−0.001Y+9.375
Obtain Regression Equations from Coefficient of Correlation, Standard Deviations and Arithmetic Means of X and Y • When the values of X¯, Y, x, y and r of X and Y series are given then regression equations are expressed in the following way: • 1. Regression Equation Of Y on X • Y−Y¯=byx(X−X¯) • Where, byx=r.y÷x • 2.Regression equation of X on Y • X−X¯=bxy(Y−Y¯) • Where, bxy=r.x÷y
Example • You are given the following information: • Obtain two regression equations:
Solution • 1. Regression Equation Of X on Y: • X−X¯=r.x÷y(Y−Y¯) • Putting the values in the equation we get: • X−5=0.7×2.6÷3.6(Y−12) • X−5=0.51(Y−12) • X−5=0.51Y−6.12 • X=0.51Y−1.12
2. Regression equation of Y on X Y−Y¯=r.y÷x(X−X¯) Putting the values in the equation, we get Y−12=0.7×3.6÷2.6(X−5) Y−12=0.97(X−5) Y−12=0.97X−4.85 Y=0.97X+7.15
Obtain Regression Equations in case of Grouped Data For obtaining regression equations from grouped data, first of all we have to construct a correlation table. Special adjustments must be made while calculating the value of regression coefficients because regression coefficients are independent of change of origin but not of scale. In grouped data the regression coefficients are computed by using the formula:1.bxy=N×∑fdxdy−∑fdx.∑fdy÷N×∑fdy²−(∑fdy)²×ix÷iy 2.byx=N×∑fdxdy−∑fdx.∑fdy÷N×∑fdx²−(∑fdx)²×iy÷ix
Obtain the Mean values and Correlation Coefficient from the Regression Equations • 1. To find the Mean Values from the Regression Equations: Two regression lines intersect each other at mean value(X¯ and Y‾ points. • 2. To find the Coefficient of Correlation from two Regression Equations: Correlation coefficient can be worked out from the regression coefficients bxy and byx. • r=√byx.bxy
Example • From the following two regression equations, identify which one is X on Y and which one is Y on X • 2X+3Y=42 • X+2Y=26 • Solution • In the absence of any clear cut indication, Let us assume that equation first to be Y on X and equation second to be X on Y • Let equation 1 be regression equation of Y on X • 2X+3Y=42 • 3Y=42−2X • Y=42÷3−2X÷3
From this it follows that • byx=Coefficient of X in 1=−2÷3 Now equation 2 be regression equation on X on Y X+2Y=26 X=26−2Y From this it follows that bxy=coefficient of Y in 2=−2 Now, we calculate ‘r’ on the basis of the above values of two regression coefficients we get: r²=byx.bxy =−2÷3×−2 = 4÷3>1
Here, r²>1 which is impossible as r²≤1. So our assumption is wrong. We now choose equation 1 as regression of X on Y and 2 as regression equation of Y on X: • Assuming the first equation as on X on Y, we have • 2X+3Y=42 • 2X=42−3Y • X=42÷2−3Y÷2 • From this, it follows that • bxy=Coefficient of Y in above equation=−3÷2 • Now, assuming the second equation as Y on X, we have, • X+2Y=26 • 2Y=26−X • Y=26÷2−1÷2X • From this it follows that • byx=Coefficient of X in above equation=−1÷2
Now, r²=bxy.byx • =−3÷2×−1÷2 • =3÷4 • =0.75 • Here, r²<1 which is possible. r² is within the limit , r²≤1. • Hence, it is proved that the first equation is of X on Y and the second equation is of Y on X