
BUSINESS STATISTICS, 2/E

BUSINESS STATISTICS, 2/E by G C Beri. Chapter 15: Regression Analysis.


Presentation Transcript


  1. BUSINESS STATISTICS, 2/E by G C Beri. Chapter 15: Regression Analysis

  2. What is Regression? It was Sir Francis Galton who first used the term regression as a statistical concept in 1877. He made a statistical study that showed that the height of children born to tall parents tends to ‘regress’ towards the mean height of the population. Galton used the term regression for a statistical technique to predict one variable (the height of children) from another variable (the height of parents). This is called ‘regression’, or ‘simple regression’ when it is confined to bivariate data. The variable that forms the basis for predicting another variable is known as the independent or predictor variable, and the variable that is predicted is known as the dependent variable.

  3. Regression Model A statistical model is a set of mathematical formulas and assumptions which describe a real-world situation. In this sense, simple linear regression as well as multiple regression are statistical models. A statistical model tries to capture the systematic behaviour of the given data, leaving out those factors that cannot be foreseen or predicted. These factors are the errors. A good statistical model is one which provides as large a systematic component as possible, minimising the errors.

  4. Regression Model (Contd…) As a first step, we choose a particular model, say a linear regression model, for describing the relationship between the two variables. As a second step, we work out the estimates of the model parameters on the basis of random sample data. The third step is to consider the errors, called residuals, that arise when the model is fitted to the data. When we are convinced that the residuals contain only pure randomness, we consider our model quite appropriate for its intended purpose, which is invariably to make predictions.

  5. Estimation Using the Regression Line A scatter diagram can give us a broad idea of the type of relationship (or even the absence of any relationship) between the two variables under study. The equation for a straight line is Y = a + bX, where Y is the dependent variable, X is the independent variable, a is the Y-intercept, which is the point at which the regression line crosses the Y-axis (the vertical axis), and b is the slope of the regression line. It should be noted that the values of both a and b remain constant for any given straight line.

  6. The Method of Least Squares • In order to explain the method of least squares, it is necessary to introduce a new symbol. • A new symbol, $\hat{Y}$ (the computed or estimated value of Y), is used to represent individual values of the estimated points, that is, those points that actually lie on the estimating line. In view of this, the equation for the estimating line becomes $\hat{Y} = a + bX$.

  7. The Method of Least Squares (Contd…) The two normal equations are: $\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$, where $\sum Y$ = the total of the Y series, n = the number of observations, $\sum X$ = the total of the X series, $\sum XY$ = the sum of the XY column, and $\sum X^2$ = the total of the squares of the individual items in the X series. a and b are the Y-intercept and the slope of the regression line, respectively.
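The two normal equations can be solved for a and b directly from the column totals. The following is a minimal sketch, not the book's worked example: the function name `fit_line` and the five data points are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for a (Y-intercept) and b (slope)."""
    n = len(xs)
    sum_x = sum(xs)                               # total of the X series
    sum_y = sum(ys)                               # total of the Y series
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of the XY column
    sum_x2 = sum(x * x for x in xs)               # total of squares of X
    # Eliminating a from the two equations gives b; back-substitute for a.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"Estimating line: Y-hat = {a:.2f} + {b:.2f}X")
```

For these figures the totals are n = 5, ΣX = 15, ΣY = 20, ΣXY = 66 and ΣX² = 55, which give b = 0.6 and a = 2.2.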

  8. Alternative Approach

  9. Use of Deviations from Means of X & Y

  10. Use of Deviations from the Assumed Means

  11. Regression in Case of Bivariate Grouped Frequency Distributions

  12. Regression Coefficient

  13. Properties of Regression Coefficients

  14. The Standard Error of Estimate The standard error of estimate measures the spread of the observed values around the estimated values given by the regression equation. This concept is similar to the standard deviation, which measures the variation of individual items about the arithmetic mean.

  15. The Standard Error of Estimate (Short-cut method)

  16. Interpreting Standard Error of Estimate The higher the magnitude of the standard error of estimate, the greater is the dispersion or variability of points around the regression line. In contrast, if the standard error of estimate is zero, then we may take it that the estimating equation is the best estimator of the dependent variable. In such a case, all the points would lie on the regression line, and no point would be scattered around it.
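The transcript does not reproduce the formula for the standard error of estimate, so the sketch below assumes the common textbook form, the square root of Σ(Y − Ŷ)² / (n − 2); some texts divide by n instead. The function names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b, using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


def standard_error_of_estimate(xs, ys):
    """Spread of observed Y around estimated Y (assumed divisor: n - 2)."""
    a, b = fit_line(xs, ys)
    squared_residuals = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared_residuals / (len(xs) - 2))


# Hypothetical scattered data: the standard error is positive.
se_scattered = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Hypothetical points lying exactly on Y = 1 + 2X: the standard error is zero.
se_on_line = standard_error_of_estimate([1, 2, 3, 4], [3, 5, 7, 9])

print(se_scattered, se_on_line)
```

The second data set illustrates the limiting case described above: when every point lies on the regression line, the residuals vanish and so does the standard error of estimate.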

  17. Hypothesis Tests about Regression Relationship

  18. Interval Estimate of B We recall that Y = a + bX is really a sample regression line and, as such, is only one of several possible sample regression lines. The population regression line is Y = A + BX, where A is the population equivalent of the sample a. Similarly, B is the parameter analogous to b, which is the slope of the sample regression line. The interval estimate of B is given by the formula $b \pm t\,S_b$, where $S_b$ is the standard error of the regression coefficient b and t is the tabulated value of t for the chosen confidence level.
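A minimal sketch of the interval b ± t·S_b follows. It assumes the usual simple-regression form S_b = S_e / √Σ(X − X̄)², with S_e computed using n − 2 in the divisor; the transcript does not show the book's own expression. The t value is passed in by the caller, as it would be read from a t table (here 3.182, the two-tailed 5% value for 3 degrees of freedom). The function name and data are hypothetical.

```python
import math


def slope_interval(xs, ys, t_value):
    """Return (lower, upper, s_b) for the interval estimate b +/- t * S_b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    # Standard error of estimate (assumed divisor: n - 2).
    s_e = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    # Standard error of the regression coefficient b.
    s_b = s_e / math.sqrt(sxx)
    margin = t_value * s_b
    return b - margin, b + margin, s_b


# Hypothetical data; t taken from a t table for n - 2 = 3 degrees of freedom.
lower, upper, s_b = slope_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_value=3.182)
print(f"S_b = {s_b:.4f}, interval for B: {lower:.2f} to {upper:.2f}")
```

With only five observations the interval is wide, which is the point of reporting an interval rather than the single sample value b.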

  19. How Good is the Regression

  20. Strength of the Association To measure the strength of the association, we have to calculate the coefficient of determination, i.e. $r^2 = \mathrm{SSR}/\mathrm{SST}$, which shows the variation in Y explained by the regression compared to the total variation. It should be obvious that the greater $r^2$ is, the higher is the degree of association. The range of $r^2$ is 0 to 1, while r varies from –1 to +1.
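The ratio SSR/SST can be computed as sketched below, taking SSR as Σ(Ŷ − Ȳ)² (the variation explained by the regression) and SST as Σ(Y − Ȳ)² (the total variation). The function name and data are hypothetical, and r is given the sign of the slope b.

```python
import math


def coefficient_of_determination(xs, ys):
    """Return (r_squared, r) for a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    ssr = sum(((a + b * x) - mean_y) ** 2 for x in xs)   # explained variation
    sst = sum((y - mean_y) ** 2 for y in ys)             # total variation
    r_squared = ssr / sst
    # r carries the sign of the slope, so it ranges from -1 to +1.
    r = math.copysign(math.sqrt(r_squared), b)
    return r_squared, r


# Hypothetical data, made up for illustration only.
r_squared, r = coefficient_of_determination([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r^2 = {r_squared:.2f}, r = {r:.3f}")
```

For these figures SSR = 3.6 and SST = 6, so r² = 0.6: the regression accounts for 60 per cent of the total variation in Y.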

  21. Cautions in the Use of Regression Analysis • The inclusion of one or two extreme items can completely change a given relationship between the variables. As such, extreme values should be excluded from the data. • It is advisable to first draw a scatter diagram so that one can have an idea of the possible relationship between X and Y. In the absence of a scatter diagram, one may attempt a linear regression model when the given set of data actually shows a non-linear relationship. • When predictions based on regression analysis are made, one should be sure that the nature and extent of the relationship between X and Y will remain the same. This assumption is at times completely overlooked, which may lead to errors in prediction. • In many cases the regression line computed is a sample regression line. This implies that the constant a and the regression coefficient b are for the sample. It is advisable to make some refinement by providing an interval within which the true population regression line lies.
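The first caution can be illustrated numerically. The sketch below fits a line to five hypothetical points and then to the same points plus one made-up extreme item; the data and the function name are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b, using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data without any extreme item.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a_without, b_without = fit_line(xs, ys)

# The same data with one extreme item (X = 6, Y = 20) appended.
a_with, b_with = fit_line(xs + [6], ys + [20])

print(f"slope without extreme item: {b_without:.2f}")
print(f"slope with extreme item:    {b_with:.2f}")
```

A single added point raises the slope from 0.6 to roughly 2.6, which is why a scatter diagram should be inspected before the fitted line is relied upon.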
