Advanced Forecasting Methods: Predictive Classification and Regression Models

This chapter delves into advanced forecasting methods, focusing on predictive classification and regression techniques such as classification and regression trees, logistic regression, neural network methods, and vector autoregressive models.

darrylb

Presentation Transcript


  1. Chapter 10: Advanced Methods of Forecasting* 10.1 Predictive Classification 10.2 Classification and Regression Trees 10.3 Logistic Regression 10.4 Neural Network Methods 10.5 Vector Autoregressive (VAR) Models 10.6 Principles

  2. Chapter 10: Advanced Methods of Forecasting* 10.1: Predictive classification • Where we try to forecast whether new (future) observations fall into a ‘target’ class • More generally, to predict which of a set of classes future observations are expected to fall into 10.2-10.4: The methods • Classification and regression trees • Logistic regression • Neural nets [Figure: new observations to be assigned to Class 1 or Class 2]

  3. Chapter 10: Advanced Methods of Forecasting* 10.5: New(ish) econometric methods • Vector Autoregression (VAR) Models • Modeling unleaded gas prices Appendices: Nonstationary data • Unit roots • Introducing Cointegration • Working through the complexity

  4. 10.1: Predictive Classification What is predictive classification? • Target variable: In predictive classification, the sample of observations (or cases) under analysis can be classified into two or more discrete categories based on the different possible outcomes of the target variable. • Target Class: The particular category of interest in a predictive classification study, e.g., the “bads”: defaulters on a loan. [Figure: new observations to be assigned to Class 1 or Class 2] Q: Find examples of predictive classification and the targets.

  5. 10.1: Predictive Classification An Example: Predicting Ownership of Computers Cross-sectional Survey data • Ownership • Income • Education • Kids • etc

  6. 10.1: Predictive Classification • A simple predictive model: • Where Yt = 1 if the household owns a computer and Yt = 0 otherwise, • Xs are various explanatory variables, e.g. income. • Split the sample into a training (estimation) sample of 80% and a test sample of the remaining 20%. • Define: • X1 = 1 if the household income is level 2 • X2 = 1 if the household income is level 3, etc. • Use 4 dummy variables to capture income effects Q: There are 5 income categories. How many dummy variables do we need?
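The dummy-variable coding and the 80/20 split described on this slide can be sketched as follows. This is only an illustration: the function names, the `households` records, and the random seed are assumptions, not part of the original Compdata.xlsx analysis.

```python
import random

def income_dummies(level, n_levels=5):
    """Return n_levels - 1 dummies; income level 1 is the baseline (all zeros)."""
    return [1 if level == k else 0 for k in range(2, n_levels + 1)]

def train_test_split(rows, train_share=0.8, seed=0):
    """Shuffle a copy of the rows and split it into training and test samples."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_share * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical survey records, used only to exercise the functions.
households = [{"id": i, "income_level": 1 + i % 5} for i in range(100)]
train, test = train_test_split(households)

print(income_dummies(1))      # baseline income level
print(income_dummies(3))      # X2 = 1 for income level 3
print(len(train), len(test))
```

Leaving one category without its own dummy makes it the reference level against which the other income effects are measured.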

  7. 10.1: Predictive Classification The results: A model for predicting household ownership of computers

  8. 10.1.1: Evaluating the Accuracy of the Predictions: The Classification Matrix I • The total sample N = (h00 + h01 + h10 + h11), • A 50 percent threshold, or cutoff, gives • The overall proportion classified as correct is (Instructor: Complete in class.)

  9. 10.1.1: Evaluating the Accuracy of the Predictions: The Classification Matrix II Target Event – households not owning computers as shown: • sensitivity = • measures the proportion of target events that are correctly predicted (i.e., nonowners). • specificity = • measures the proportion of correctly predicted nontarget events.
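The formulas on these two slides did not survive in the transcript. The sketch below uses the standard definitions of overall accuracy, sensitivity, and specificity, with the target event coded as 1. The labels `tp`, `fn`, `fp`, `tn` and the sample data are assumptions for illustration and do not follow the slide's h00 to h11 indexing.

```python
def classification_matrix(actual, predicted):
    """Count outcomes, coding the target event as 1 and the non-target as 0."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1   # target correctly predicted
        elif a == 1 and p == 0:
            counts["fn"] += 1   # target missed
        elif a == 0 and p == 1:
            counts["fp"] += 1   # non-target flagged as target
        else:
            counts["tn"] += 1   # non-target correctly predicted
    return counts

def accuracy_measures(c):
    """Overall proportion correct, sensitivity, and specificity."""
    n = c["tp"] + c["fn"] + c["fp"] + c["tn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / n,
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),
        "specificity": c["tn"] / (c["tn"] + c["fp"]),
    }

# Hypothetical outcomes: 1 = non-owner (target), 0 = owner.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(accuracy_measures(classification_matrix(actual, predicted)))
```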

  10. 10.1.1: Evaluating the Accuracy of the Predictions Fig. 10.2 Data: Compdata.xlsx; adapted from SAS output. Graphs for helping interpret the classification accuracy.

  11. 10.1.1: Evaluating the Accuracy of a Prediction Receiver Operating Characteristic (ROC) Curve Fig 10.3 • Sensitivity • proportion of target events correctly predicted • 1-Specificity • proportion of non-target events falsely predicted Graphs for helping interpret the classification accuracy. Q: What do you observe? Data: Compdata.xlsx; adapted from SAS output.
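An ROC curve is traced by sweeping the cutoff and recording sensitivity against 1 - specificity at each value. The sketch below shows one way to do that; the scores and outcomes are hypothetical, and the area calculation uses the trapezoidal rule as a convenient summary rather than anything taken from the SAS output.

```python
def roc_points(actual, scores):
    """Return (1 - specificity, sensitivity) pairs, one per candidate cutoff."""
    n_target = sum(actual)
    n_other = len(actual) - n_target
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if a == 1 and s >= cutoff)
        fp = sum(1 for a, s in zip(actual, scores) if a == 0 and s >= cutoff)
        points.append((fp / n_other, tp / n_target))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical estimated probabilities of the target event.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, scores)
auc = area_under_curve(points)
print(points)
print(auc)
```

A curve that hugs the top-left corner (area near 1) separates the classes well; the 45-degree line (area 0.5) corresponds to guessing.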

  12. 10.1.1: Evaluating the Accuracy of the Predictions Using Profit or Loss • PG – profit (loss) from a correct classification of the target • PB – profit (loss) from misclassifying a non-target as a target • Expected Profit(Cutoff) = PG × h11 + PB × h01 Q: For a Web-based insurance provider, what costs and profits should be considered in deciding whether to offer a policy to a new customer?
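Because the counts of correct and incorrect target classifications change with the cutoff, the profit expression can be evaluated over a grid of cutoffs and the most profitable one selected. A minimal sketch, assuming hypothetical values for PG and PB and the same kind of made-up scores as above:

```python
def profit_at_cutoff(actual, scores, cutoff, pg, pb):
    """PG times correctly flagged targets plus PB times non-targets flagged as targets."""
    hits = sum(1 for a, s in zip(actual, scores) if a == 1 and s >= cutoff)
    false_alarms = sum(1 for a, s in zip(actual, scores) if a == 0 and s >= cutoff)
    return pg * hits + pb * false_alarms

# Hypothetical data and payoffs (PB is a loss, so it is negative).
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
PG, PB = 2.0, -3.0

cutoffs = [0.3, 0.5, 0.75]
profits = {c: profit_at_cutoff(actual, scores, c, PG, PB) for c in cutoffs}
best_cutoff = max(profits, key=profits.get)
print(profits)
print(best_cutoff)
```

The cutoff that maximizes profit need not be the one that maximizes the proportion correctly classified.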

  13. 10.2: Classification and Regression Trees • Two leaves recommend granting the loan: • Credit score>600 & Employed • Credit score<600 and owning home

  14. 10.2: Classification and Regression Trees: Performance Measures II • Suppose: • Granting a bad loan costs an average of $10K • Approving a good loan generates $2K • No loans: Total profit=0 • All loans approved: Total profit= ? Q: Does our classification model help? Table 10.5 Classification Matrix for Data on Bank Loans Note: different cut-offs or threshold values will give different classification matrices

  15. 10.2: Classification and Regression Trees: Performance Measures III • 1st level: Sensitivity= Specificity= Profit= • 2nd level: Sensitivity= Specificity= Profit= (Instructor to complete)

  16. 10.2: Classification and Regression Trees: Algorithms I • CHAID (Chi-squared Automatic Interaction Detection) At each step, CHAID chooses the independent (predictor) variable that has the strongest interaction [as measured by a chi-squared criterion] with the dependent variable. • Categories of each predictor are merged if they are not significantly different with respect to the dependent variable. • Different software packages have different approaches to operationalizing these steps.

  17. 10.2: Classification and Regression Trees: Algorithms II • If a split produces the same proportions of goods and bads it has zero value as a predictor. • No value in split ⇒ No Association between predicted and actual. • Expected number of {Good/Good} cases= • Observed value = 110, a difference of 6.

  18. 10.2: Classification and Regression Trees: Algorithms III • The chi-square statistic is: • If there was no value in the split this statistic would be small. • A formal chi-squared test at a 1% significance level shows this to be significant (Critical value with 1 df, 6.63).
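The chi-squared calculation for a candidate split can be sketched as below. Expected counts under "no association" are row total times column total divided by the sample size. The 2×2 table here is hypothetical (it is not Table 10.5, which did not survive in the transcript); only the 1% critical value of 6.63 with 1 degree of freedom comes from the slide.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical split: rows = actual good/bad, columns = predicted good/bad.
observed = [[90, 10],
            [30, 70]]
stat = chi_squared(observed)
CRITICAL_1PCT_1DF = 6.63   # value quoted on the slide
print(stat, stat > CRITICAL_1PCT_1DF)
```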

  19. 10.2: Classification and Regression Trees: Computer ownership revisited • NB: Always examine performance on the test data set • No deterioration here

  20. 10.2: Classification and Regression Trees: Computer ownership revisited Data: Compdata.xlsx; adapted from SPSS output.

  21. 10.3: Logistic Regression: Computer example continued • Let Pi= Probability that household does not own a computer • Model: or • Where • A linear function of explanatory variables of ownership NB: Special programs including SPSS and SAS are needed to estimate this. Unlike regression, there is no simple equation for the parameter estimates.
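The model equations on this slide were lost in the transcript. The sketch below shows the standard logistic form, in which the log-odds of the target event is a linear function of the explanatory variables. The intercept and coefficients are hypothetical numbers, not estimates from the computer-ownership data.

```python
import math

def logistic_probability(intercept, coefs, xs):
    """P = 1 / (1 + exp(-z)), where z is a linear function of the inputs."""
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Inverse transformation: ln(P / (1 - P)) recovers the linear function z."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients on two dummy variables (e.g. income level, kids).
p = logistic_probability(0.5, [-1.2, -0.8], [1, 0])
print(p)
print(log_odds(p))
```

The two functions are the two equivalent ways of writing the model: one gives the probability directly, the other states that the log-odds is linear.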

  22. 10.3: Logistic Regression: Computer example continued II Odds Ratio: the last column, Exp(B), measures the proportional increase (decrease) in the odds against ownership, e.g. having kids multiplies the odds of ownership by 1/0.516 ≈ 1.94, an increase of 94% relative to not having kids. Q: Do the parameter estimates make sense?
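The arithmetic behind the 0.516 figure can be checked directly. The sketch below treats Exp(B) = 0.516 as the multiplier on the odds against ownership, as the slide states. The baseline probability of 0.5 is a hypothetical value chosen only to show how a change in odds maps into a change in probability.

```python
exp_b = 0.516   # Exp(B) for the kids variable, from the slide

# Odds against ownership shrink by a factor of 0.516, so the odds of
# ownership grow by the reciprocal.
ownership_odds_multiplier = 1 / exp_b
pct_increase_in_odds = (ownership_odds_multiplier - 1) * 100

def apply_odds_multiplier(prob, multiplier):
    """Convert a probability to odds, scale the odds, and convert back."""
    odds = prob / (1 - prob)
    new_odds = odds * multiplier
    return new_odds / (1 + new_odds)

baseline_non_ownership = 0.5   # hypothetical baseline probability
with_kids = apply_odds_multiplier(baseline_non_ownership, exp_b)
print(round(pct_increase_in_odds), with_kids)
```

The 94% figure applies to the odds; the corresponding change in probability depends on the baseline.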

  23. 10.3: Logistic Regression Evaluating a logistic regression model • Examine the p-values of the model to simplify the model. • Don’t drop insignificant dummy variables that define a categorical variable, e.g. income. • Check residuals for outliers. • Is this a small sample of the training set? • Consider different specifications, comparing their overall fit. • Does the model work well across all values of Pi? • Does the model add value on the hold-out sample?

  24. 10.3: Logistic Regression: Two ways of defining the cut-off (threshold) for classification • The logistic model derives estimated probabilities of non-ownership for each individual in the training or test samples. • To decide whether to predict an individual as a non-owner (or owner) we must specify a cut-off. • Either choose a fixed proportion – say the 50% of sample with highest probability of being in the target (non-ownership) class. Or • Classify any individual with a prediction higher than some chosen probability, say 0.9. • These give different classification matrices.

  25. 10.3: Logistic Regression: Two ways of defining the cut-off (threshold) for classification II • 50% predicted as non-owners • Households with an estimated probability of non-ownership > 0.5 predicted as non-owners Q: What’s the difference between the two approaches here?
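The two cut-off rules can be compared on the same set of estimated probabilities. In this sketch a value of 1 means "classified into the target (non-ownership) class"; the probabilities and function names are hypothetical.

```python
def classify_top_share(probs, share):
    """Flag the given share of cases with the highest estimated probabilities."""
    n_flag = int(round(share * len(probs)))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [1 if i in flagged else 0 for i in range(len(probs))]

def classify_by_threshold(probs, threshold):
    """Flag every case whose estimated probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probs]

# Hypothetical estimated probabilities of non-ownership.
probs = [0.95, 0.80, 0.65, 0.55, 0.30, 0.10]
by_share = classify_top_share(probs, 0.5)
by_threshold = classify_by_threshold(probs, 0.5)
print(by_share)
print(by_threshold)
```

The first rule fixes how many cases are flagged; the second fixes how confident the model must be, so the number flagged varies with the data.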

  26. Take-Aways from Classification Methods • Consider a variety of different classification methods • Use a variety of error measures to evaluate the methods • In developing classification models, always retain a subset of the data with which to test the effectiveness of the proposed methods • Always compare the new method with a benchmark classification approach (The Basic Principle of Forecasting)

  27. 10.4: Neural Network Methods • Hidden layer • 3 hidden nodes Output Inputs Explanatory variables Neural Network: a non-linear transformation of inputs into output(s).
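The "non-linear transformation of inputs into output(s)" can be made concrete with a forward pass through one hidden layer of three nodes. This is a sketch only: the logistic activation and all weights are assumptions, since the slide's diagram does not specify them, and no estimation step is shown.

```python
import math

def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer: each hidden node is a squashed weighted sum of the
    inputs, and the output is a squashed weighted sum of the hidden nodes."""
    hidden = [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
              for ws, b in zip(hidden_weights, hidden_biases)]
    return sigmoid(output_bias + sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical network: 2 inputs (already scaled to 0-1), 3 hidden nodes, 1 output.
hidden_weights = [[0.8, -0.4], [-1.1, 0.6], [0.3, 0.9]]
hidden_biases = [0.1, -0.2, 0.0]
output_weights = [1.5, -0.7, 0.4]
output_bias = -0.3

print(forward([0.6, 0.2], hidden_weights, hidden_biases, output_weights, output_bias))
```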

  28. 10.4: Neural Network Methods Issues in developing Neural Network Models • Which inputs to include • Number of hidden nodes (one hidden layer) • Data pre-processing to ensure pre-processed inputs between 0 & 1 • Choice of programs (no established standard) • Choice of parameters in non-linear estimation • Final selection of forecasts (combined or selected)

  29. 10.4: Neural Network Methods • The process of building a neural network • Choose the input variables. • Divide the sample into training (estimation), validation, and test (hold-out) data sets. • Standardize and code the input variables (e.g., with dummy variables for categorical inputs). • Use different sets of starting values in estimation. • Remove unimportant variables. • Compare the results from using different numbers of hidden nodes. • Choose the neural network architecture that leads to the best performance on the validation data. • Use the performance on the test data to measure the likely future performance of the network.

  30. 10.4.1: A Cross-Sectional Neural Network Analysis Comparing different numbers of hidden nodes • Variables: • Income • Education • 1, 3, and 5 hidden nodes • Validation sample 30% • SAS output Data: Compdata.xlsx; adapted from SAS output. At lower cut-offs, 3 or 5 hidden nodes are better. Q: Why are the low cut-offs more relevant in this example?

  31. 10.4.2: A Time Series Neural Network Analysis Example I Fig 10.9: Retail Sales in the UK Source: www.ons.gov.uk/ons/rel/rsi/retail-sales/august-2011/tsd-retail-sales.html. Data shown is from file UK_retail_sales.xlsx. What characteristics can you identify?

  32. 10.4.2: A Time Series Example II: Error Measures for Different Hidden Nodes • Data pre-processing. Transform the data • 18 years for training (estimation) • 2 for validation, 2 for testing (forecasting out-of-sample) • Results compared using 1-10 hidden nodes on validation sample Data: UK_retail_sales.xlsx.

  33. 10.4.2: A Time Series Example III: Error Progression during Training Training stopped when little sign of improvements in validation data. Data: UK_retail_sales.xlsx.

  34. 10.4.2: A Time Series Example IV: The Forecast Results Data: UK_retail_sales.xlsx. Q: How do you interpret these results?

  35. Take-Aways for Neural Network Modeling • Scale the data before estimating the neural network model. • Use a variety of random starting values to avoid local minima. • Use the median forecasts from these results. • Use techniques such as validation samples to avoid overfitting.
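Two of these take-aways are simple enough to sketch without a neural network library: scaling inputs to the 0 to 1 range, and combining the forecasts from several random starts by taking the median for each period. The numbers below are hypothetical and stand in for the output of separately estimated networks.

```python
from statistics import median

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def median_forecast(forecasts_by_start):
    """Combine per-period forecasts from several estimation runs by the median."""
    return [median(period) for period in zip(*forecasts_by_start)]

scaled = min_max_scale([20, 35, 50, 80])

# Hypothetical two-period forecasts from three runs with different random
# starting values; the third run has settled in a poor local minimum.
runs = [[101.0, 103.0],
        [99.0, 104.0],
        [120.0, 102.0]]
combined = median_forecast(runs)
print(scaled)
print(combined)
```

The median is used rather than the mean so that a single run stuck in a poor local minimum does not distort the combined forecast.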

  36. 10.5: Vector Autoregressive (VAR) Models • The simple autoregressive model: • The basic idea: Vector Variables (Yt)=f(lagged variables Yt)

  37. 10.5: Vector Autoregressive (VAR) Models Example: Oil prices Data: Gas_prices.xlsx. Q: What variables do you think influence unleaded pump prices?

  38. A simple VAR of Unleaded, Crude and the CPI Source: Output adapted from EViews 7. Data: Gas_prices.xlsx. Note that for presentation purposes, the number of significant figures in the table should be much smaller.

  39. A simple VAR of Unleaded, Crude and the CPI II Source: Output adapted from EViews 7. Data: Gas_prices.xlsx. Note that for presentation purposes, the number of significant figures in the table should be much smaller.

  40. 10.5: Vector Autoregressive (VAR) Models • Various criteria available • AIC – Akaike’s Information Criterion • SC – Bayesian (Schwarz) Information Criterion Note conflicting conclusions: SC usually points to simpler models, here with 3 lags.
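The reason SC tends to choose shorter lag lengths is its heavier penalty per parameter. The sketch below uses one common single-equation form of the two criteria; software packages such as EViews scale them differently, and the multivariate VAR versions replace the error variance with the determinant of the residual covariance matrix. The sums of squared errors are hypothetical and are not taken from Gas_prices.xlsx.

```python
import math

def aic(sse, n_obs, n_params):
    """Akaike criterion: fit term plus a penalty of 2 per parameter."""
    return math.log(sse / n_obs) + 2 * n_params / n_obs

def sc(sse, n_obs, n_params):
    """Schwarz criterion: fit term plus a penalty of ln(n) per parameter."""
    return math.log(sse / n_obs) + n_params * math.log(n_obs) / n_obs

# Hypothetical sums of squared errors for lag lengths 1 to 4, with n = 100.
n_obs = 100
sse_by_lag = {1: 130.0, 2: 118.0, 3: 112.0, 4: 108.0}

# Each model has one intercept plus one coefficient per lag.
best_aic = min(sse_by_lag, key=lambda p: aic(sse_by_lag[p], n_obs, p + 1))
best_sc = min(sse_by_lag, key=lambda p: sc(sse_by_lag[p], n_obs, p + 1))
print(best_aic, best_sc)
```

With more than about 8 observations, ln(n) exceeds 2, so SC never selects a longer lag length than AIC under this form.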

  41. 10.5: Vector Autoregressive (VAR) Models Other VAR issues • Estimating the model • OLS • Producing the forecasts • With only lags they can be produced automatically (as with a simple AR model) • Exogenous variables may be added in • Avoids too many parameters • More intuitive models • But forecasting is conditional on the exogenous variables
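With only lagged variables on the right-hand side, multi-step forecasts follow by feeding each forecast back in as the next period's lag. A minimal sketch for a first-order VAR with two variables; the intercepts, coefficient matrix, and starting values are hypothetical and are not the estimates from the unleaded/crude/CPI model.

```python
def var1_forecast(last_obs, intercepts, coef_matrix, steps):
    """Iterate Y(t+1) = c + A * Y(t), using each forecast as the next lag."""
    forecasts = []
    current = list(last_obs)
    for _ in range(steps):
        current = [c + sum(a * y for a, y in zip(row, current))
                   for c, row in zip(intercepts, coef_matrix)]
        forecasts.append(current)
    return forecasts

# Hypothetical two-variable VAR(1).
A = [[0.5, 0.2],
     [0.1, 0.6]]
c = [1.0, 0.5]
last = [2.0, 1.0]

path = var1_forecast(last, c, A, steps=2)
print(path)
```

Adding exogenous variables would require separate forecasts of those variables before this recursion could run, which is the "conditional" caveat on the slide.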

  42. 10.5: Vector Autoregressive (VAR) Models Oil price example (cont.) The problem: the recession of 2008! 2nd order VAR best in sample but relatively poor outside sample Data: Gas_prices.xlsx.

  43. Take-Aways for VAR Modeling • Use a VAR model (based on lags), as opposed to single equations based upon concurrent values of the explanatory variables. • Reduce the lag length on each variable separately. • Model the data in levels, rather than differences, initially. • Examine all the model’s variables for potential nonstationarity. • Use theoretical analysis, graphs, autocorrelations, and such to decide whether the variables are nonstationary. • Beware: Relying on tests alone is dangerous because these tests have weak power. • For nonstationary data, develop an Error Correction Model (ECM) and compare it with an unrestricted model and a model in differences.

  44. Appendix 10B: The Effects of Nonstationary Data Graph of log CPI, Unleaded, and Change in Log CPI • Definition: A time series is stationary if the mean, variance, and autocorrelation function are independent of time. Data: Gas_prices.xlsx; Standardization over the period 1996-2010.

  45. Appendix 10B: The Effects of Nonstationary Data Exercise: • Generate two random variables with linear time trends: • Assume the error terms are normally distributed with zero means • Calculate the correlation between Y1 and Y2 for different choices of the error variances, the sample size, and β1 and β2 • The values of the intercepts do not matter. Why? • With common trends it is all too easy to ‘find’ that variables are related: the problem of spurious regression.
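One way to carry out this exercise is sketched below. The trend equations on the slide were lost, so the form intercept + slope × t + error is an assumption, as are the parameter values and the random seed. The two series are generated independently, yet their levels are strongly correlated, while their first differences are not.

```python
import random

def pearson(x, y):
    """Sample correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def trending_series(intercept, slope, sigma, n, rng):
    """Linear time trend plus independent normal errors with zero mean."""
    return [intercept + slope * t + rng.gauss(0, sigma) for t in range(1, n + 1)]

def differences(series):
    """First differences, which remove a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(42)
y1 = trending_series(10.0, 0.5, 5.0, 100, rng)
y2 = trending_series(-3.0, 0.3, 3.0, 100, rng)

r_levels = pearson(y1, y2)
r_diffs = pearson(differences(y1), differences(y2))
r_shifted = pearson([v + 100.0 for v in y1], y2)   # change the intercept only
print(r_levels, r_diffs, r_shifted)
```

Varying `sigma`, `n`, and the slopes shows how the strength of the spurious correlation depends on the size of the trend relative to the noise.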

  46. Appendix 10B: The Effects of Nonstationary Data An example of spurious regression: the two series are independent but both have expected values that are linear functions of time. The sample correlation is 0.91. Q: Can you identify examples of spuriously correlated series?

  47. Appendix 10B: The Effects of Nonstationary Data: The problem with modeling nonstationary data II • One response: model in differences • A model in levels has implications for a model in differences and vice versa. • But a non-stationary variable, unleaded prices, cannot be modeled by a stationary variable (CPI) alone and vice versa. The residuals would be autocorrelated.

  48. Appendix 10C: Differencing and Unit Roots: The problem with modeling nonstationary data III • We need to identify a nonstationary time series to protect ourselves from ‘spurious regression’ • Deciding whether a time series is nonstationary • Look at it • Think about the variable • Test: Augmented Dickey-Fuller test, where an autoregressive coefficient equal to 1 indicates nonstationarity • Variants: with an added constant; with an added constant and trend • Three tests, all weak, i.e. poor at discriminating.
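The idea behind the test can be sketched with the simplest Dickey-Fuller regression with a constant: regress the change in the series on its lagged level and look at the t-statistic on the lagged level. This is a simplified illustration, not the augmented test reported from EViews: it includes no lagged differences or trend, the simulated series and seed are hypothetical, and the approximate 5% critical value of -2.86 applies to the constant-only case in large samples.

```python
import random

def dickey_fuller_t(series):
    """t-statistic on g in: diff(y)_t = a + g * y_(t-1) + e_t (no augmentation lags).
    A unit root corresponds to g = 0, i.e. an autoregressive coefficient of 1."""
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series, series[1:])]
    n = len(dy)
    mean_x = sum(y_lag) / n
    mean_y = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in y_lag)
    sxy = sum((x - mean_x) * (d - mean_y) for x, d in zip(y_lag, dy))
    g = sxy / sxx
    a = mean_y - g * mean_x
    sse = sum((d - a - g * x) ** 2 for x, d in zip(y_lag, dy))
    se_g = (sse / (n - 2) / sxx) ** 0.5
    return g / se_g

rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(200)]   # stationary series

random_walk = []                                # nonstationary series
level = 0.0
for e in noise:
    level += e
    random_walk.append(level)

t_noise = dickey_fuller_t(noise)
t_walk = dickey_fuller_t(random_walk)
print(t_noise, t_walk)
```

Nonstationarity is rejected only when the statistic is more negative than the Dickey-Fuller critical value; ordinary t-tables do not apply.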

  49. Appendix 10C: Differencing and Unit Roots: The problem with modeling nonstationary data IV • Constant and trend, results from EViews Table 10C.1 The Results from an Augmented Dickey-Fuller Test for Non-Stationarity of Ln(Crude) • Null hypothesis – data series is non-stationary • Alternative – series is stationary See details in Appendix 10C
