BUSINESS FORECASTING: FORECASTING WITH REGRESSION MODELS, TREND ANALYSIS. Prof. Dr. Burç Ülengin, ITU MANAGEMENT ENGINEERING FACULTY, FALL 2011
OVERVIEW • The bivariate regression model • Data inspection • Regression forecast process • Forecasting with a simple linear trend • Causal regression model • Statistical evaluation of the regression model • Examples...
The Bivariate Regression Model • The bivariate regression model is also known as the simple regression model. • It is a statistical tool that estimates the relationship between a dependent variable (Y) and a single independent variable (X). • The dependent variable is the variable we want to forecast.
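The Bivariate Regression Model • General form: $Y = f(X)$, where Y is the dependent variable and X is the independent variable • Specific form (linear regression model): $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ is the random disturbance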
The Bivariate Regression Model • The regression model is a line equation: $\beta_0$ is the intercept and $\beta_1$ is the slope coefficient, which tells us the rate of change in Y per unit change in X • If $\beta_1 = 5$, a one-unit increase in X is associated with a 5-unit increase in Y • $\varepsilon$ is the random disturbance; because of it, Y can take different values for a given X • The objective is to estimate $\beta_0$ and $\beta_1$ so that the fitted values are as close as possible to the observed values
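A minimal sketch of how the best-fit $\hat\beta_0$ and $\hat\beta_1$ can be computed, assuming ordinary least squares (minimizing the sum of squared residuals), which is the method named in the regression outputs in these slides. The helpers `ols_fit` and `fitted_and_residuals` are hypothetical names, not part of any library.

```python
def ols_fit(x, y):
    """Hypothetical helper: OLS estimates (b0, b1) for y = b0 + b1 * x + e."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = co-variation of (x, y) divided by the variation of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    b1 = sxy / sxx
    # The OLS line always passes through the point of means (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1


def fitted_and_residuals(x, y, b0, b1):
    """Fitted values y_hat = b0 + b1 * x and residuals e = y - y_hat."""
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    y = [3, 5, 7, 9]          # exactly y = 1 + 2x
    print(ols_fit(x, y))      # (1.0, 2.0)
```

With an intercept in the model, OLS residuals always sum to zero, so a good fit is judged by how small and patternless they are, not by their sum.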
The Bivariate Regression Model: Geometrical Representation [Figure: scatter plot of Y against X with a good-fit line (red) and a poor-fit line (blue)] The red line is closer to the data points than the blue one.
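Best Fit Estimates • Population: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ • Sample: $Y_i = \hat\beta_0 + \hat\beta_1 X_i + e_i$, where $e_i = Y_i - \hat Y_i$ is the residual; the best-fit estimates minimize $\sum e_i^2$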
Misleading Best Fits [Figure: four scatter plots of Y against X, each with a fitted line and the same sum of squared residuals, $\sum e^2 = 100$]
THE CLASSICAL ASSUMPTIONS • 1. The regression model is linear in the coefficients, correctly specified, and has an additive error term. • 2. The error term has a zero population mean: $E(\varepsilon) = 0$. • 3. All explanatory variables are uncorrelated with the error term. • 4. Errors corresponding to different observations are uncorrelated with each other (no serial correlation). • 5. The error term has a constant variance (no heteroskedasticity). • 6. No explanatory variable is an exact linear function of any other explanatory variable(s) (no perfect multicollinearity). • 7. The error term is normally distributed: $\varepsilon \sim N(0, \sigma^2)$.
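Assumption 4 (errors of different observations are uncorrelated) is commonly checked with the Durbin-Watson statistic, which every regression output in these slides reports. A minimal sketch of the standard formula $DW = \sum_{t=2}^{T}(e_t - e_{t-1})^2 / \sum_{t=1}^{T} e_t^2$ applied to a list of residuals; `durbin_watson` is a hypothetical helper name.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic of a residual series (hypothetical helper).

    Roughly DW ~ 2 * (1 - r1), where r1 is the first-order autocorrelation
    of the residuals: values near 2 suggest no first-order autocorrelation,
    values well below 2 suggest positive autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("all residuals are zero")
    return num / den


if __name__ == "__main__":
    print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (sign flips every period)
    print(durbin_watson([2.0, 1.0, -1.0, -2.0]))  # 0.6 (slowly drifting residuals)
```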
Regression Forecasting Process • Data inspection: plot each variable over time and draw a scatter plot of Y against X. Look for • Trend • Seasonal fluctuation • Outliers • To forecast Y, we need forecasted values of X • Reserve a holdout period for evaluation, and test the estimated equation over the holdout period
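The holdout step can be sketched as follows, assuming the series is an ordered Python list and that forecast accuracy is summarized by root mean squared error (RMSE) and mean absolute percentage error (MAPE). The slides do not name an error measure, so these two are an illustrative choice, and the helper names are hypothetical.

```python
import math


def split_holdout(series, holdout):
    """Split an ordered series into (estimation part, last `holdout` points)."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    return series[:-holdout], series[-holdout:]


def _check(actual, forecast):
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")


def rmse(actual, forecast):
    """Root mean squared forecast error, in the units of the series."""
    _check(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    _check(actual, forecast)
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)
```

The equation is estimated on the first part only; its forecasts for the holdout period are then compared with the held-out actual values.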
An Example: Retail Car Sales • The main explanatory variables: • Income • Price of a car • Interest rates (credit usage) • General price level • Population • Car park (number of cars sold up to time t; replacement purchases) • Expectations about the future • For the simple (bivariate) regression, income is chosen as the explanatory variable
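Bivariate Regression Model • Population regression model: $RCS_t = \beta_0 + \beta_1 DPI_t + \varepsilon_t$ • Our expectation is $\beta_1 > 0$ • But we do not have all the data at hand; the data set covers only the 1990s. • We therefore have to estimate the model over the sample period • The sample regression model is $RCS_t = \hat\beta_0 + \hat\beta_1 DPI_t + e_t$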
Retail Car Sales and Disposable Personal Income Figures [Figure: quarterly car sales (000 cars) and disposable income (dollars)]
OLS Estimate
Dependent Variable: RCS
Method: Least Squares
Sample: 1990:1 1998:4
Included observations: 36
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          541010.9      746347.9     0.724878      0.4735
DPI        62.39428      40.00793     1.559548      0.1281
R-squared            0.066759    Mean dependent var      1704222.
Adjusted R-squared   0.039311    S.D. dependent var      164399.9
S.E. of regression   161136.1    Akaike info criterion   26.87184
Sum squared resid    8.83E+11    Schwarz criterion       26.95981
Log likelihood       -481.6931   F-statistic             2.432189
Durbin-Watson stat   1.596908    Prob(F-statistic)       0.128128
Basic Statistical Evaluation • $\beta_1$ is the slope coefficient that tells us the rate of change in Y per unit change in X • The estimate $\hat\beta_1 = 62.39$ means that when DPI increases by one dollar, the number of cars sold increases by about 62. • Hypothesis test on $\beta_1$ • H0: $\beta_1 = 0$ • H1: $\beta_1 \neq 0$ • A t test is used to test the validity of H0 • $t = \hat\beta_1 / se(\hat\beta_1)$ • If |t| > t table, or p-value < α (e.g., α = 0.05), reject H0 • If |t| < t table, or p-value > α, do not reject H0 • t = 1.56 < t table (about 2.03 for 36 − 2 = 34 degrees of freedom), and p-value = 0.1281 > 0.05: do not reject H0 • There is no statistically significant evidence that DPI affects RCS
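The t-test decision rule above can be sketched in a few lines. This is a minimal illustration, assuming a two-sided test at the 5% level and a hard-coded, approximate critical value for 34 degrees of freedom instead of a p-value computed from the t distribution; `slope_t_test` is a hypothetical helper.

```python
# Approximate two-sided 5% critical value of Student's t with 34 degrees of
# freedom (n - 2 = 36 - 2), taken from standard t tables; hard-coded so the
# sketch needs no statistics library.
T_CRIT_34DF = 2.032


def slope_t_test(coef, std_err, t_crit=T_CRIT_34DF):
    """Return (t statistic, reject H0: beta1 = 0?) for a two-sided t test."""
    t_stat = coef / std_err
    return t_stat, abs(t_stat) > t_crit


if __name__ == "__main__":
    # Coefficient and standard error of DPI from the RCS regression output
    t_stat, reject = slope_t_test(62.39428, 40.00793)
    print(round(t_stat, 4), reject)  # 1.5595 False
```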
Basic Statistical Evaluation • $R^2$ is the coefficient of determination, which tells us the fraction of the variation in Y explained by X • $0 \le R^2 \le 1$ • $R^2 = 0$ indicates that X (the equation) has no explanatory power • $R^2 = 1$ indicates that X (the equation) explains Y perfectly • $R^2 = 0.0668$ indicates very weak explanatory power • Hypothesis test on $R^2$ • H0: $R^2 = 0$ • H1: $R^2 \neq 0$ • The F test checks this hypothesis • If F statistic > F table, or p-value < α (e.g., α = 0.05), reject H0 • If F statistic < F table, or p-value > α, do not reject H0 • F statistic = 2.43 < F table, and p-value = 0.1281 > 0.05: do not reject H0 • The estimated equation has no statistically significant power to explain RCS
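The F statistic in the output can be recovered from $R^2$ with the standard formula $F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$, where k is the number of regressors besides the constant. In a bivariate regression (k = 1) this F equals the square of the slope's t statistic, which is why the t test and the F test give the same p-value (0.1281) here. A minimal sketch; `f_from_r2` is a hypothetical helper, and the critical value is an approximate table value.

```python
# Approximate 5% critical value of F(1, 34), from standard F tables
# (it equals the square of the t critical value for 34 degrees of freedom).
F_CRIT_1_34 = 4.13


def f_from_r2(r2, n, k=1):
    """Overall-significance F statistic from R-squared, n observations, k regressors."""
    if not 0 <= r2 < 1 or n <= k + 1:
        raise ValueError("need 0 <= R2 < 1 and n > k + 1")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


if __name__ == "__main__":
    f_stat = f_from_r2(0.066759, n=36)             # R-squared of the RCS regression
    print(round(f_stat, 2), f_stat > F_CRIT_1_34)  # 2.43 False
    print(round(1.559548 ** 2, 2))                 # 2.43: F equals t squared when k = 1
```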
Graphical Evaluation of Fit and Error Terms [Figure: actual and fitted values and residuals] The residuals show a clear seasonal pattern.
Model Improvement • The graphs of the series show that RCS exhibits clear seasonal fluctuations, but DPI does not. • Remove the seasonality with a seasonal adjustment method. • Then use the seasonally adjusted RCS as the dependent variable.
Seasonal Adjustment • Sample: 1990:1 1998:4 • Included observations: 36 • Method: Ratio to Moving Average • Original series: RCS • Adjusted series: RCSSA • Scaling factors: • Quarter 1: 0.941503 • Quarter 2: 1.119916 • Quarter 3: 1.016419 • Quarter 4: 0.933083
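A minimal sketch of a ratio-to-moving-average adjustment for quarterly data: divide each observation by a centered 2×4 moving average, average the ratios by quarter, and rescale. Assumptions: this is a textbook variant, and the software that produced the output above may differ in details such as how the ratios are averaged; the rescaling (factors multiply to 1) is chosen because the reported factors do. Function names are hypothetical.

```python
import math


def ratio_to_moving_average(y, first_quarter=1):
    """Multiplicative quarterly seasonal factors (textbook ratio-to-moving-average).

    y: quarterly observations in time order (all positive, at least 8 values).
    first_quarter: quarter (1-4) of y[0]. Returns factors for quarters 1..4.
    """
    n = len(y)
    if n < 8:
        raise ValueError("need at least two years of quarterly data")
    ratios = [[] for _ in range(4)]
    for t in range(2, n - 2):
        # Centered 2x4 moving average: half weight on the two end points
        ma = (0.5 * y[t - 2] + y[t - 1] + y[t] + y[t + 1] + 0.5 * y[t + 2]) / 4
        ratios[(first_quarter - 1 + t) % 4].append(y[t] / ma)
    raw = [sum(r) / len(r) for r in ratios]
    # Rescale so the four factors multiply to 1
    scale = math.prod(raw) ** 0.25
    return [f / scale for f in raw]


def seasonally_adjust(y, factors, first_quarter=1):
    """Divide each observation by the factor of its quarter."""
    return [v / factors[(first_quarter - 1 + t) % 4] for t, v in enumerate(y)]


if __name__ == "__main__":
    # The factors reported for RCS multiply to (almost exactly) 1
    print(round(math.prod([0.941503, 1.119916, 1.016419, 0.933083]), 4))  # 1.0
```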
OLS Estimate
Dependent Variable: RCSSA
Method: Least Squares
Sample: 1990:1 1998:4
Included observations: 36
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          481394.3      464812.8     1.035674      0.3077
DPI        65.36559      24.91626     2.623411      0.0129
R-squared            0.168344    Mean dependent var      1700000.
Adjusted R-squared   0.143883    S.D. dependent var      108458.4
S.E. of regression   100352.8    Akaike info criterion   25.92472
Sum squared resid    3.42E+11    Schwarz criterion       26.01270
Log likelihood       -464.6450   F-statistic             6.882286
Durbin-Watson stat   0.693102    Prob(F-statistic)       0.012939
Basic Statistical Evaluation • $\beta_1$ is the slope coefficient that tells us the rate of change in Y per unit change in X • The estimate $\hat\beta_1 = 65.37$ means that when DPI increases by one dollar, seasonally adjusted car sales increase by about 65. • Hypothesis test on $\beta_1$ • H0: $\beta_1 = 0$ • H1: $\beta_1 \neq 0$ • A t test is used to test the validity of H0 • $t = \hat\beta_1 / se(\hat\beta_1)$ • If |t| > t table, or p-value < α (e.g., α = 0.05), reject H0 • If |t| < t table, or p-value > α, do not reject H0 • t = 2.62 > t table (about 2.03 for 34 degrees of freedom), and p-value = 0.0129 < 0.05: reject H0 • DPI has a statistically significant effect on seasonally adjusted RCS
Basic Statistical Evaluation • $R^2$ is the coefficient of determination, which tells us the fraction of the variation in Y explained by X • $0 \le R^2 \le 1$ • $R^2 = 0$ indicates that X (the equation) has no explanatory power • $R^2 = 1$ indicates that X (the equation) explains Y perfectly • $R^2 = 0.1683$ indicates very weak explanatory power • Hypothesis test on $R^2$ • H0: $R^2 = 0$ • H1: $R^2 \neq 0$ • The F test checks this hypothesis • If F statistic > F table, or p-value < α (e.g., α = 0.05), reject H0 • If F statistic < F table, or p-value > α, do not reject H0 • F statistic = 6.88 > F table, and p-value = 0.0129 < 0.05: reject H0 • The estimated equation has some power to explain seasonally adjusted RCS
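Graphical Evaluation of Fit and Error Terms [Figure: actual and fitted values and residuals for RCSSA] The seasonality is gone, but the residuals still do not look like a random disturbance (the Durbin-Watson statistic of 0.69, far below 2, also points to positive autocorrelation). Omitted variable? Business cycle?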
Simple Regression Model, Special Case: Trend Model • $Y_t = \beta_0 + \beta_1 t + \varepsilon_t$, where the independent variable is time, t = 1, 2, 3, ..., T−1, T • There is no need to forecast the independent variable • Using simple transformations, a variety of nonlinear trend equations can be estimated, so the estimated model can mimic the pattern of the data
Chapter 3, Exercise 13: College Tuition Consumers' Price Index by Quarter [Figure: quarterly FEE series with the holdout period marked]
OLS Estimates
Dependent Variable: FEE
Method: Least Squares
Sample: 1986:1 1994:4
Included observations: 36
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          115.7312      1.982166     58.38624      0.0000
@TREND     3.837580      0.097399     39.40080      0.0000
R-squared            0.978568    Mean dependent var      182.8889
Adjusted R-squared   0.977938    S.D. dependent var      40.87177
S.E. of regression   6.070829    Akaike info criterion   6.498820
Sum squared resid    1253.069    Schwarz criterion       6.586793
Log likelihood       -114.9788   F-statistic             1552.423
Durbin-Watson stat   0.284362    Prob(F-statistic)       0.000000
Basic Statistical Evaluation • $\beta_1$ is the slope coefficient that tells us the rate of change in Y per unit change in X • Each quarter, the tuition index increases by about 3.84 points. • Hypothesis test on $\beta_1$ • H0: $\beta_1 = 0$ • H1: $\beta_1 \neq 0$ • A t test is used to test the validity of H0 • $t = \hat\beta_1 / se(\hat\beta_1)$ • If |t| > t table, or p-value < α (e.g., α = 0.05), reject H0 • If |t| < t table, or p-value > α, do not reject H0 • t = 39.4 > t table, and p-value = 0.0000 < 0.05: reject H0 • The time trend is statistically significant
Basic Statistical Evaluation • $R^2$ is the coefficient of determination, which tells us the fraction of the variation in Y explained by X • $0 \le R^2 \le 1$ • $R^2 = 0$ indicates that X (the equation) has no explanatory power • $R^2 = 1$ indicates that X (the equation) explains Y perfectly • $R^2 = 0.9786$ indicates very strong explanatory power • Hypothesis test on $R^2$ • H0: $R^2 = 0$ • H1: $R^2 \neq 0$ • The F test checks this hypothesis • If F statistic > F table, or p-value < α (e.g., α = 0.05), reject H0 • If F statistic < F table, or p-value > α, do not reject H0 • F statistic = 1552 > F table, and p-value = 0.0000 < 0.05: reject H0 • The estimated equation has explanatory power
Graphical Evaluation of Fit [Figure: actual FEE and the linear trend forecast, with the holdout period marked]
Holdout period   ACTUAL   FORECAST
1995 Q1          260.00   253.88
1995 Q2          259.00   257.72
1995 Q3          266.00   261.55
1995 Q4          274.00   265.39
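The holdout forecasts in the table can be reproduced from the estimated coefficients. This sketch assumes that the trend variable (@TREND in the output) equals 0 in 1986:1, so that 1995 Q1 to Q4 correspond to t = 36 to 39; that assumption reproduces the table up to rounding of the printed coefficients. The names are hypothetical.

```python
B0, B1 = 115.7312, 3.837580  # C and @TREND from the FEE trend regression


def linear_trend_forecast(t, b0=B0, b1=B1):
    """Forecast from Y_t = b0 + b1 * t."""
    return b0 + b1 * t


# Assumption: t = 0 in 1986:1, so the 36-quarter estimation sample is t = 0..35
holdout_t = range(36, 40)                   # 1995 Q1..Q4
actual = [260.00, 259.00, 266.00, 274.00]   # ACTUAL column of the table
forecast = [linear_trend_forecast(t) for t in holdout_t]
errors = [a - f for a, f in zip(actual, forecast)]

for q, (a, f, e) in enumerate(zip(actual, forecast, errors), start=1):
    print(f"1995 Q{q}: actual {a:.2f}  forecast {f:.2f}  error {e:+.2f}")
```

Every holdout error is positive: the straight line under-forecasts all four quarters, which is consistent with the exponential-trend idea taken up in the next model improvement.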
Graphical Evaluation of Fit and Error Terms [Figure: actual and fitted values and residuals] The residuals exhibit a clear pattern; they are not random (Durbin-Watson = 0.28). The model also cannot capture the seasonal fluctuations. The regression model is misspecified.
Model Improvement • The data may exhibit an exponential trend, $Y_t = e^{\beta_0 + \beta_1 t}$ • In this case, take the logarithm of the dependent variable: $\ln Y_t = \beta_0 + \beta_1 t + \varepsilon_t$ • Estimate the trend by OLS • After OLS estimation, forecast the holdout period • Take the exponential of the forecasted log values to return to the original units
Original and Log-Transformed Data
LOG(FEE)   FEE
4.844187   127.000
4.844187   127.000
4.867534   130.000
4.912655   136.000
4.912655   136.000
4.919981   137.000
4.941642   140.000
4.976734   145.000
4.983607   146.000
OLS Estimate of the Logarithmic Trend Model
Dependent Variable: LFEE
Method: Least Squares
Sample: 1986:1 1994:4
Included observations: 36
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          4.816708      0.005806     829.5635      0.0000
@TREND     0.021034      0.000285     73.72277      0.0000
R-squared            0.993783    Mean dependent var      5.184797
Adjusted R-squared   0.993600    S.D. dependent var      0.222295
S.E. of regression   0.017783    Akaike info criterion   -5.167178
Sum squared resid    0.010752    Schwarz criterion       -5.079205
Log likelihood       95.00921    F-statistic             5435.047
Durbin-Watson stat   0.893477    Prob(F-statistic)       0.000000
Forecast Calculations
obs      FEE        LFEEF      FEELF = exp(LFEEF)
1993:1   228.0000   5.405651   222.6610
1993:2   228.0000   5.426684   227.3940
1993:3   235.0000   5.447718   232.2276
1993:4   243.0000   5.468751   237.1639
1994:1   244.0000   5.489785   242.2052
1994:2   245.0000   5.510819   247.3536
1994:3   251.0000   5.531852   252.6114
1994:4   259.0000   5.552886   257.9810
1995:1   260.0000   5.573920   263.4648
1995:2   259.0000   5.594953   269.0651
1995:3   266.0000   5.615987   274.7845
1995:4   274.0000   5.637021   280.6254
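The forecast columns can be reproduced by evaluating the estimated log-linear trend and exponentiating. As with the linear trend, this sketch assumes the trend variable is 0 in 1986:1, which matches LFEEF in the table up to rounding of the printed coefficients; the helper name is hypothetical.

```python
import math

C, B1 = 4.816708, 0.021034  # C and @TREND from the LFEE (log FEE) regression


def log_trend_forecast(t, c=C, b1=B1):
    """Return (forecast of ln Y_t, forecast of Y_t) for ln Y_t = c + b1 * t."""
    log_f = c + b1 * t
    return log_f, math.exp(log_f)


# Assumption: t = 0 in 1986:1, so 1993:1 is t = 28 and 1995:4 is t = 39
for t in range(28, 40):
    year, quarter = 1986 + t // 4, t % 4 + 1
    log_f, f = log_trend_forecast(t)
    print(f"{year}:{quarter}  LFEEF {log_f:.6f}  FEELF {f:.4f}")

# The slope of a log trend acts as a growth rate:
# exp(b1) - 1 is the implied growth per quarter, about 2.1% here.
print(f"implied quarterly growth: {math.exp(B1) - 1:.2%}")
```

In the table, FEELF is above FEE in all four 1995 quarters, so the exponential trend over-forecasts the holdout period, whereas the linear trend under-forecasted it.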
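Graphical Evaluation of Fit and Error Terms [Figure: actual and fitted values and residuals of the logarithmic trend model] The residuals exhibit a clear pattern; they are not random (Durbin-Watson = 0.89). The model also cannot capture the seasonal fluctuations. The regression model is misspecified.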
Model Improvement • To deal with seasonal variation, remove the seasonal pattern from the data • Fit the regression model to the seasonally adjusted data • Generate forecasts • Put the seasonal movements back into the forecasted values (with multiplicative factors, multiply by the factor of the corresponding quarter)
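With multiplicative factors, putting the seasonal movements back means multiplying the trend forecast of the adjusted series by the factor of the forecast quarter. A minimal sketch using the FEESA trend coefficients and the multiplicative FEE factors reported on the next two slides; it assumes the trend variable is 0 in 1986:1, so the first quarter after the 40-quarter estimation sample (1996 Q1) is t = 40. These out-of-sample values are only an illustration and do not appear in the slides; the names are hypothetical.

```python
FACTORS = {1: 1.002372, 2: 0.985197, 3: 0.996746, 4: 1.015929}  # FEE scaling factors
B0, B1 = 115.0387, 3.897488  # C and @TREND from the FEESA regression


def seasonal_trend_forecast(t, quarter, b0=B0, b1=B1, factors=FACTORS):
    """Trend forecast of the seasonally adjusted series, re-seasonalized."""
    adjusted = b0 + b1 * t               # forecast on the seasonally adjusted scale
    return adjusted * factors[quarter]   # multiplicative factor restores seasonality


# Assumption: t = 0 in 1986:1, so 1996 Q1..Q4 are t = 40..43
for t, q in zip(range(40, 44), (1, 2, 3, 4)):
    print(f"1996 Q{q}: {seasonal_trend_forecast(t, q):.2f}")
```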
Multiplicative Seasonal Adjustment • Included observations: 40 • Method: Ratio to Moving Average • Original series: FEE • Adjusted series: FEESA • Scaling factors: • Quarter 1: 1.002372 • Quarter 2: 0.985197 • Quarter 3: 0.996746 • Quarter 4: 1.015929
OLS Estimate of the Seasonally Adjusted Trend Model
Dependent Variable: FEESA
Method: Least Squares
Sample: 1986:1 1995:4
Included observations: 40
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          115.0387      1.727632     66.58749      0.0000
@TREND     3.897488      0.076240     51.12152      0.0000
R-squared            0.985668    Mean dependent var      191.0397
Adjusted R-squared   0.985291    S.D. dependent var      45.89346
S.E. of regression   5.566018    Akaike info criterion   6.319943
Sum squared resid    1177.261    Schwarz criterion       6.404387
Log likelihood       -124.3989   F-statistic             2613.410
Durbin-Watson stat   0.055041    Prob(F-statistic)       0.000000
Graphical Evaluation of Fit and Error Terms [Figure: actual and fitted values and residuals for FEESA] The residuals exhibit a clear pattern; they are not random (Durbin-Watson = 0.055). There are no seasonal fluctuations. The regression model is misspecified.
Model Improvement • Take the logarithm to remove the existing nonlinearity • Apply additive seasonal adjustment to the logarithmic data • Apply OLS to the seasonally adjusted logarithmic data • Forecast the holdout period • Add the seasonal movements back to obtain seasonal forecasts • Take the exponential to obtain the seasonal forecasts in the original units
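The six steps above can be chained into one function. A minimal sketch, assuming quarterly data that start in quarter 1, a linear trend in t = 0, 1, 2, ..., and the additive log-scale factors reported for LFEE on the next slide; the function names are hypothetical, and since the slides do not show the resulting forecasts, none are quoted here.

```python
import math


def fit_line(x, y):
    """OLS intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1


def log_additive_forecast(y, factors, horizon, first_quarter=1):
    """Forecast a positive quarterly series `horizon` quarters ahead.

    factors: additive seasonal factors on the log scale for quarters 1..4.
    """
    def quarter_index(t):
        return (first_quarter - 1 + t) % 4

    # Steps 1-2: take logs, then subtract the additive seasonal factor
    log_sa = [math.log(v) - factors[quarter_index(t)] for t, v in enumerate(y)]
    # Step 3: OLS linear trend on t = 0, 1, ..., n-1
    b0, b1 = fit_line(list(range(len(y))), log_sa)
    forecasts = []
    for t in range(len(y), len(y) + horizon):
        # Steps 4-5: trend forecast plus the seasonal movement of that quarter
        log_f = b0 + b1 * t + factors[quarter_index(t)]
        # Step 6: back to the original units
        forecasts.append(math.exp(log_f))
    return forecasts


# Additive log-scale factors reported for LFEE (quarters 1..4)
LFEE_FACTORS = [0.002216, -0.014944, -0.003099, 0.015828]
```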
Logarithmic Transformation and Additive Seasonal Adjustment • Sample: 1986:1 1995:4 • Included observations: 40 • Method: Difference from Moving Average • Original series: LFEE = log(FEE) • Adjusted series: LFEESA • Scaling factors: • Quarter 1: 0.002216 • Quarter 2: -0.014944 • Quarter 3: -0.003099 • Quarter 4: 0.015828
Original and Logarithmic Additively Seasonally Adjusted Series [Figure: original FEE series and the logarithmic, additively seasonally adjusted series]