Chapter 10: Advanced Methods of Forecasting* 10.1 Predictive Classification 10.2 Classification and Regression Trees 10.3 Logistic Regression 10.4 Neural Network Methods 10.5 Vector Autoregressive (VAR) Models 10.6 Principles
Chapter 10: Advanced Methods of Forecasting* 10.1: Predictive classification • Where we try to forecast whether new (future) observations fall into a ‘target’ class • More generally, to predict into which of a set of classes future observations are expected to fall 10.2-10.4: The methods • Classification and regression trees • Logistic regression • Neural nets [Figure: new observations to be assigned to Class 1 or Class 2]
Chapter 10: Advanced Methods of Forecasting* 10.5: New(ish) econometric methods • Vector Autoregressive (VAR) models • Modeling unleaded gas prices Appendices: Nonstationary data • Unit roots • Introducing cointegration • Working through the complexity
10.1: Predictive Classification What is predictive classification? • Target variable: In predictive classification, the sample of observations (or cases) under analysis can be classified into two or more discrete categories based on the different possible outcomes of the target variable. • Target class: The particular category of interest in a predictive classification study, e.g., the “bads”: defaulters on a loan. [Figure: new observations to be assigned to Class 1 or Class 2] Q: Find examples of predictive classification and the targets.
10.1: Predictive Classification An Example: Predicting Ownership of Computers Cross-sectional Survey data • Ownership • Income • Education • Kids • etc
10.1: Predictive Classification • A simple predictive model: Yt = β0 + β1X1 + β2X2 + … + εt, • where Yt = 1 if the household owns a computer, = 0 otherwise, • and the Xs are various explanatory variables, e.g. income. • Split the sample into a training (estimation) sample of 80%; the test sample is the remaining 20%. • Define: • X1 = 1 if household income is at level 2 • X2 = 1 if household income is at level 3, etc. • Use 4 dummy variables to capture income effects Q: There are 5 income categories. How many dummy variables do we need?
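A minimal sketch in Python of the sample split and income dummies just described. The data file is Compdata.xlsx, but the column names used here ("ownership", "income") are assumptions for illustration, not taken from the file.

```python
# A minimal sketch of the 80/20 split and the income dummies described above.
# Column names ("ownership", "income") are assumptions, not taken from Compdata.xlsx.
import pandas as pd
from sklearn.model_selection import train_test_split

compdata = pd.read_excel("Compdata.xlsx")       # ownership (0/1) and income level (1-5)

# Five income categories -> four dummy variables (level 1 acts as the base category)
income_dummies = pd.get_dummies(compdata["income"], prefix="inc", drop_first=True)
X = income_dummies
y = compdata["ownership"]                       # 1 = owns a computer, 0 = otherwise

# 80% training (estimation) sample, 20% test sample
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```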
10.1: Predictive Classification The results: A model for predicting household ownership of computers
10.1.1: Evaluating the Accuracy of the Predictions: The Classification Matrix I • The total sample N = (h00 + h01 + h10 + h11), • A 50 percent threshold, or cutoff, gives • The overall proportion classified as correct is (Instructor: Complete in class.)
10.1.1: Evaluating the Accuracy of the Predictions: The Classification Matrix II Target event – households not owning computers, as shown: • sensitivity = (target events correctly predicted) / (total target events) • measures the proportion of target events that are correctly predicted (i.e., nonowners). • specificity = (nontarget events correctly predicted) / (total nontarget events) • measures the proportion of correctly predicted nontarget events.
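As a rough illustration of these measures, the sketch below computes them from the four cell counts of a classification matrix; the labeling of h00–h11 (and the example counts) is an assumption for illustration, not the book's exact layout.

```python
# Sketch of the classification-matrix measures, using cell counts h00..h11.
# Assumed labeling: h11 = target events correctly predicted, h00 = non-target events
# correctly predicted, h10 = targets missed, h01 = non-targets falsely flagged.
def classification_measures(h00, h01, h10, h11):
    n = h00 + h01 + h10 + h11
    accuracy = (h00 + h11) / n          # overall proportion correctly classified
    sensitivity = h11 / (h10 + h11)     # proportion of target events correctly predicted
    specificity = h00 / (h00 + h01)     # proportion of non-target events correctly predicted
    return accuracy, sensitivity, specificity

print(classification_measures(h00=120, h01=20, h10=15, h11=45))   # made-up counts
```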
10.1.1: Evaluating the Accuracy of the Predictions Fig. 10.2 Data: Compdata.xlsx; adapted from SAS output. Graphs for helping interpret the classification accuracy.
10.1.1: Evaluating the Accuracy of a Prediction Receiver Operating Characteristic (ROC) Curve Fig 10.3 • Sensitivity • proportion of target events correctly predicted • 1 - Specificity • proportion of non-target events falsely predicted as targets SAS output Graphs for helping interpret the classification accuracy. Q: What do you observe? Data: Compdata.xlsx; adapted from SAS output.
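A hedged sketch of how an ROC curve like Fig 10.3 could be produced with scikit-learn; it assumes `y_test` and `prob_target` (the model's estimated probabilities of the target class for the test sample) are already available from a fitted classifier such as the logistic model later in the chapter.

```python
# Minimal ROC sketch; y_test and prob_target are assumed to exist from a fitted model.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, thresholds = roc_curve(y_test, prob_target)   # prob_target = P(target class)
auc = roc_auc_score(y_test, prob_target)

plt.plot(fpr, tpr, label=f"ROC (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="No-information line")
plt.xlabel("1 - Specificity (false positive rate)")
plt.ylabel("Sensitivity (true positive rate)")
plt.legend()
plt.show()
```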
10.1.1: Evaluating the Accuracy of the Predictions Using Profit or Loss • PG – profit (loss) from a correct classification of the target • PB – profit (loss) from misclassifying a non-target as a target • Expected Profit(cutoff) = PG·h11 + PB·h01 Q: For a Web-based insurance provider, what costs and profits should be considered in deciding whether to offer a policy to a new customer?
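A small illustrative calculation of this expected-profit formula; the values of PG, PB, and the cell counts are made up for the example, not taken from the text.

```python
# Sketch: expected profit for a given classification matrix, using the formula above.
def expected_profit(h11, h01, PG, PB):
    """Profit from correctly classified targets plus the cost of non-targets wrongly flagged."""
    return PG * h11 + PB * h01

# Illustrative values: PG = +100 per correctly identified target,
# PB = -20 per non-target misclassified as a target.
print(expected_profit(h11=45, h01=20, PG=100, PB=-20))
```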
10.2: Classification and Regression Trees • Two leaves recommend granting the loan: • Credit score > 600 and employed • Credit score < 600 and owning home
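For illustration only, a minimal classification tree on made-up loan data, fitted with scikit-learn; the feature names (credit_score, employed, owns_home) and the data are assumptions echoing the example above, not the book's data set.

```python
# A minimal decision-tree sketch for a loan-granting example like the one above.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

loans = pd.DataFrame({
    "credit_score": [720, 550, 610, 480, 650, 590],
    "employed":     [1,   0,   1,   0,   1,   1],
    "owns_home":    [0,   1,   0,   0,   1,   1],
    "good_loan":    [1,   1,   1,   0,   1,   0],     # made-up outcomes
})

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(loans[["credit_score", "employed", "owns_home"]], loans["good_loan"])

# Print the tree's splitting rules (the "leaves" of the slide's example)
print(export_text(tree, feature_names=["credit_score", "employed", "owns_home"]))
```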
10.2: Classification and Regression Trees: Performance Measures II • Suppose: • Granting a bad loan costs an average of $10K • Approving a good loan generates $2K • No loans: Total profit = 0 • All loans approved: Total profit = ? Q: Does our classification model help? Table 10.5 Classification Matrix for Data on Bank Loans Note: different cut-offs or threshold values will give different classification matrices
10.2: Classification and Regression Trees: Performance Measures III • 1st level: Sensitivity= Specificity= Profit= • 2nd level: Sensitivity= Specificity= Profit= (Instructor to complete)
10.2: Classification and Regression Trees: Algorithms I • CHAID (Chi-squared Automatic Interaction Detection) At each step, CHAID chooses the independent (predictor) variable that has the strongest interaction [as measured by a chi-squared criterion] with the dependent variable. • Categories of each predictor are merged if they are not significantly different with respect to the dependent variable. • Different software packages have different approaches to operationalizing these steps.
10.2: Classification and Regression Trees: Algorithms II • If a split produces the same proportions of goods and bads, it has zero value as a predictor. • No value in the split implies no association between predicted and actual outcomes. • Expected number of {Good/Good} cases = • Observed value = 110, a difference of 6.
10.2: Classification and Regression Trees: Algorithms III • The chi-square statistic is χ² = Σ (observed − expected)² / expected, summed over the cells of the split table. • If there were no value in the split, this statistic would be small. • A formal chi-squared test at a 1% significance level shows this to be significant (critical value with 1 df: 6.63).
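A sketch of the chi-squared calculation behind such a split, run on an illustrative 2x2 table (branches by good/bad outcome); the counts are made up, not the ones in the text.

```python
# Chi-squared test for a CHAID-style split on an illustrative 2x2 table
# (rows = split branches, columns = good/bad outcomes; counts are made up).
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[110,  40],    # branch 1: goods, bads
                  [ 90, 110]])   # branch 2: goods, bads

chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
print("Expected counts under no association:\n", expected)
# Compare the statistic with the 1% critical value (6.63 with 1 df)
# to judge whether the split has predictive value.
```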
10.2: Classification and Regression Trees: Computer ownership revisited • NB: Always examine performance on the test data set • No deterioration here
10.2: Classification and Regression Trees: Computer ownership revisited Data: Compdata.xlsx; adapted from SPSS output.
10.3: Logistic Regression: Computer example continued • Let Pi = probability that household i does not own a computer • Model: ln[Pi/(1 − Pi)] = Zi, or equivalently Pi = 1/(1 + exp(−Zi)) • where Zi = β0 + β1X1 + β2X2 + … • A linear function of explanatory variables of ownership NB: Special programs, including SPSS and SAS, are needed to estimate this. Unlike regression, there is no simple equation for the parameter estimates.
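A minimal sketch of fitting a logistic model of this kind with statsmodels, reusing the illustrative training data from the earlier sketch and coding the target as non-ownership; the variable names are assumptions.

```python
# Minimal logistic-regression sketch; X_train, y_train come from the earlier sketch.
import numpy as np
import statsmodels.api as sm

y_train_nonowner = 1 - y_train                  # 1 = non-owner (the target class Pi refers to)
X_train_const = sm.add_constant(X_train.astype(float))

logit_model = sm.Logit(y_train_nonowner, X_train_const)
result = logit_model.fit()                      # maximum likelihood; no closed-form estimates
print(result.summary())
print("Odds ratios, Exp(B):\n", np.exp(result.params))
```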
10.3: Logistic Regression: Computer example continued II Odds ratio: the last column, Exp(B), measures the proportional increase (decrease) in the odds against ownership, e.g. having kids multiplies the odds of ownership by 1/0.516, i.e., increases them by about 94%, relative to not having kids. Q: Do the parameter estimates make sense?
10.3: Logistic Regression Evaluating a logistic regression model • Examine the p-values to simplify the model. • Don't drop individual insignificant dummy variables that together define a categorical variable, e.g. income. • Check the residuals for outliers, perhaps on a small sample of the training set. • Consider different specifications, comparing their overall fit. • Does the model work well across all values of Pi? • Does the model add value on the hold-out sample?
10.3: Logistic Regression: Two ways of defining the cut-off (threshold) for classification • The logistic model gives estimated probabilities of non-ownership for each individual in the training or test samples. • To decide whether to predict an individual as a non-owner (or owner), we must specify a cut-off. • Either classify a fixed proportion – say the 50% of the sample with the highest probability of being in the target (non-ownership) class; or • Classify any individual with a predicted probability higher than some chosen value, say 0.9. • These give different classification matrices.
10.3: Logistic Regression: Two ways of defining the cut-off (threshold) for classification II • 50% of households (those with the highest estimated probabilities) predicted as non-owners • Households with an estimated probability > 0.5 predicted as non-owners Q: What's the difference between the two approaches here?
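A short sketch of the two cut-off rules, assuming `prob_nonowner` is a pandas Series of estimated non-ownership probabilities from the fitted model (the name is illustrative).

```python
# Two cut-off rules applied to estimated non-ownership probabilities (prob_nonowner assumed).
import numpy as np

# Rule 1: classify a fixed proportion (the 50% with the highest probabilities) as non-owners
cut_value = np.quantile(prob_nonowner, 0.5)
pred_fixed_proportion = (prob_nonowner >= cut_value).astype(int)

# Rule 2: classify anyone whose estimated probability exceeds a chosen threshold, say 0.5
pred_fixed_threshold = (prob_nonowner > 0.5).astype(int)

# The two rules generally give different classification matrices.
```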
Take-Aways from Classification Methods • Consider a variety of different classification methods • Use a variety of error measures to evaluate the methods • In developing classification models, always retain a subset of the data with which to test the effectiveness of the proposed methods • Always compare the new method with a benchmark classification approach (The Basic Principle of Forecasting)
10.4: Neural Network Methods [Figure: network diagram with inputs (explanatory variables), a hidden layer of 3 hidden nodes, and an output] Neural network: a non-linear transformation of inputs into output(s).
10.4: Neural Network Methods Issues in developing neural network models • Which inputs to include • Number of hidden nodes (one hidden layer) • Data pre-processing to scale the inputs to lie between 0 and 1 • Choice of programs (no established standard) • Choice of parameters in non-linear estimation • Final selection of forecasts (combined or selected)
10.4: Neural Network Methods • The process of building a neural network • Choose the input variables. • Divide the sample into training (estimation), validation, and test (hold-out) data sets. • Standardize and code the input variables (e.g., with dummy variables for categorical inputs). • Use different sets of starting values in estimation. • Remove unimportant variables. • Compare the results from using different numbers of hidden nodes. • Choose the neural network architecture that leads to the best performance on the validation data. • Use the performance on the test data to measure the likely future performance of the network.
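A rough Python sketch of several of these steps (splitting, scaling, multiple starting values, comparing hidden-node counts, validating) using scikit-learn's MLPClassifier; the 60/20/20 split, node counts, and reuse of the earlier illustrative X and y are assumptions, not the chapter's exact settings.

```python
# Rough sketch of the neural-network build process above (X, y from the earlier sketch).
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_hold, y_val, y_hold = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)

scaler = MinMaxScaler()                       # scale inputs to [0, 1]
X_tr_s, X_val_s = scaler.fit_transform(X_tr), scaler.transform(X_val)

best_net, best_score = None, -1.0
for nodes in (1, 3, 5):                       # compare different numbers of hidden nodes
    for seed in range(5):                     # several random starting values
        net = MLPClassifier(hidden_layer_sizes=(nodes,), max_iter=2000, random_state=seed)
        net.fit(X_tr_s, y_tr)
        score = net.score(X_val_s, y_val)     # pick the architecture by validation performance
        if score > best_score:
            best_net, best_score = net, score

print("Best validation accuracy:", round(best_score, 3))
print("Hold-out (test) accuracy:", round(best_net.score(scaler.transform(X_hold), y_hold), 3))
```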
10.4.1: A Cross-Sectional Neural Network Analysis Comparing different numbers of hidden nodes • Variables: • Income • Education • 1, 3, and 5 hidden nodes • Validation sample 30% • SAS output Data: Compdata.xlsx; adapted from SAS output. At lower cut-offs, 3 or 5 hidden nodes perform better. Q: Why are the low cut-offs more relevant in this example?
10.4.2: A Time Series Neural Network Analysis Example I Fig 10.9: Retail Sales in the UK Source: www.ons.gov.uk/ons/rel/rsi/retail-sales/august-2011/tsd-retail-sales.html. Data shown is from file UK_retail_sales.xlsx. What characteristics can you identify?
10.4.2: A Time Series Example II: Error Measures for Different Hidden Nodes • Data pre-processing: transform the data • 18 years for training (estimation) • 2 years for validation, 2 for testing (forecasting out-of-sample) • Results compared using 1-10 hidden nodes on the validation sample Data: UK_retail_sales.xlsx.
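A rough sketch of this kind of time-series neural-network comparison, assuming monthly data in UK_retail_sales.xlsx with a column named "sales"; the column name, the log transform, and the lag length are assumptions, and the held-back window mirrors the 2 + 2 years described above.

```python
# Sketch: lagged-input neural networks for a monthly series, compared on a validation window.
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

sales = pd.read_excel("UK_retail_sales.xlsx")["sales"]    # column name is an assumption
y_log = np.log(sales)                                     # simple transform (illustrative)

lags = 13                                                 # a year of monthly lags plus one
frame = pd.concat({f"lag{k}": y_log.shift(k) for k in range(lags + 1)}, axis=1).dropna()
X_all, y_all = frame.drop(columns="lag0"), frame["lag0"]

n_train = len(frame) - 48                                 # last 4 years held back (2 validation + 2 test)
scaler = MinMaxScaler().fit(X_all.iloc[:n_train])

for nodes in range(1, 11):                                # compare 1-10 hidden nodes
    net = MLPRegressor(hidden_layer_sizes=(nodes,), max_iter=5000, random_state=0)
    net.fit(scaler.transform(X_all.iloc[:n_train]), y_all.iloc[:n_train])
    val_pred = net.predict(scaler.transform(X_all.iloc[n_train:n_train + 24]))
    rmse = np.sqrt(np.mean((y_all.iloc[n_train:n_train + 24].values - val_pred) ** 2))
    print(f"{nodes} hidden nodes: validation RMSE = {rmse:.4f}")
```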
10.4.2: A Time Series Example III: Error Progression during Training Training is stopped when there is little sign of improvement on the validation data. Data: UK_retail_sales.xlsx.
10.4.2: A Time Series Example III: The Forecast Results Data: UK_retail_sales.xlsx. Q: How do you interpret these results?
Take-Aways for Neural Network Modeling • Scale the data before estimating the neural network model. • Use a variety of random starting values to avoid local minima. • Use the median forecasts from these results. • Use techniques such as validation samples to avoid overfitting.
10.5: Vector Autoregressive (VAR) Models • The simple autoregressive model: Yt = β0 + β1Yt−1 + … + βpYt−p + εt • The basic idea of a VAR: a vector of variables Yt is modeled as a function of its own lagged values, Yt = f(Yt−1, Yt−2, …), so each variable is regressed on lags of itself and of the other variables.
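The chapter's VAR output is produced in EViews; as an illustrative alternative, a model of this kind can be fitted with statsmodels. The column names for the three series ("unleaded", "crude", "cpi") are assumptions about Gas_prices.xlsx.

```python
# Minimal VAR sketch with statsmodels; column names are assumptions.
import pandas as pd
from statsmodels.tsa.api import VAR

prices = pd.read_excel("Gas_prices.xlsx")
data = prices[["unleaded", "crude", "cpi"]]

model = VAR(data)
results = model.fit(2)                                     # a 2nd-order VAR (2 lags of each variable)
print(results.summary())

forecasts = results.forecast(data.values[-2:], steps=12)   # 12-step-ahead forecasts
```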
10.5: Vector Autoregressive (VAR) Models Example: Oil prices Data: Gas_prices.xlsx. Q: What variables do you think influence unleaded pump prices?
A simple VAR of Unleaded, Crude and the CPI Source: Output adapted from EViews 7. Data: Gas_prices.xlsx. Note that for presentation purposes, the number of significant figures in the table should be much smaller.
A simple VAR of Unleaded, Crude and the CPI II Source: Output adapted from EViews 7. Data: Gas_prices.xlsx. Note that for presentation purposes, the number of significant figures in the table should be much smaller.
10.5: Vector Autoregressive (VAR) Models • Various criteria are available for choosing the lag length • AIC – Akaike's Information Criterion • SC – Bayesian (Schwarz) Information Criterion Note conflicting conclusions: SC usually points to simpler models, here with 3 lags.
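A sketch of comparing information criteria across lag lengths with statsmodels, reusing the illustrative `data` frame from the VAR sketch above; the maximum lag of 8 is an arbitrary choice.

```python
# Compare information criteria for different lag lengths (data frame as in the VAR sketch).
from statsmodels.tsa.api import VAR

selection = VAR(data).select_order(maxlags=8)
print(selection.summary())            # AIC, BIC (SC), FPE and HQ for each lag length
print(selection.selected_orders)      # e.g. {'aic': ..., 'bic': ...} - criteria can disagree
```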
10.5: Vector Autoregressive (VAR) Models Other VAR issues • Estimating the model • OLS, equation by equation • Producing the forecasts • With only lags, forecasts can be produced automatically (as with a simple AR model) • Exogenous variables may be added in • Avoids too many parameters • Gives more intuitive models • But forecasting is then conditional on the exogenous variables
10.5: Vector Autoregressive (VAR) Models Oil price example (cont.) The problem: the recession of 2008! The 2nd-order VAR is best in sample but relatively poor out of sample. Data: Gas_prices.xlsx.
Take-Aways for VAR Modeling • Use a VAR model (based on lags), as opposed to single equations based upon concurrent values of the explanatory variables. • Reduce the lag length on each variable separately. • Model the data in levels, rather than differences, initially. • Examine all the model's variables for potential nonstationarity. • Use theoretical analysis, graphs, autocorrelations, and such to decide whether the variables are nonstationary. • Beware: relying on tests alone is dangerous because these tests have low power. • For nonstationary data, develop an Error Correction Model (ECM) and compare it with an unrestricted model and a model in differences.
Appendix 10B: The Effects of Nonstationary Data Graph of log CPI, Unleaded, and Change in Log CPI • Definition: A time series is stationary if the mean, variance, and autocorrelation function are independent of time. Data: Gas_prices.xlsx; Standardization over the period 1996-2010.
Appendix 10B: The Effects of Nonstationary Data Exercise: • Generate two random variables with linear time trends: Y1t = α1 + β1t + ε1t and Y2t = α2 + β2t + ε2t • Assume the error terms are normally distributed with zero means • Calculate the correlation between Y1 and Y2 for different choices of the error variances, the sample size, and β1 and β2 • The values of the intercepts do not matter. Why? • With common trends it is all too easy to ‘find’ that variables are related – the problem of spurious regression.
Appendix 10B: The Effects of Nonstationary Data An example of spurious regression: the two series are independent but both have expected values that are linear functions of time. The sample correlation is 0.91 Q: Can you identify examples of spuriously correlated series?
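A small simulation of the exercise above, showing how two independent trending series can display a high sample correlation; all parameter values are arbitrary.

```python
# Two independent series with linear time trends: correlation is spuriously high.
import numpy as np

rng = np.random.default_rng(123)
n = 100
t = np.arange(n)

y1 = 2.0 + 0.5 * t + rng.normal(0, 3, n)   # Y1t = a1 + b1*t + e1t
y2 = 5.0 + 0.3 * t + rng.normal(0, 2, n)   # Y2t = a2 + b2*t + e2t, independent of Y1

print("Sample correlation:", round(np.corrcoef(y1, y2)[0, 1], 2))   # typically close to 1
```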
Appendix 10B: The Effects of Nonstationary Data: The problem with modeling nonstationary data II • One response: model in differences • A model in levels has implications for a model in differences, and vice versa. • But a nonstationary variable, unleaded prices, cannot be modeled by a stationary variable (CPI) alone, and vice versa; the residuals would be autocorrelated.
Appendix 10C: Differencing and Unit Roots: The problem with modeling nonstationary data III • We need to identify a nonstationary time series to protect ourselves from ‘spurious regression’ • Deciding whether a time series is nonstationary • Look at it • Think about the variable • Test: the Augmented Dickey-Fuller (ADF) test, based on the regression Yt = ρYt−1 + (lagged differences) + εt, where ρ = 1 corresponds to nonstationarity (a unit root) • With an added constant: Yt = α + ρYt−1 + … + εt • With an added constant and trend: Yt = α + βt + ρYt−1 + … + εt Three tests, all weak, i.e. poor at discriminating.
Appendix 10C: Differencing and Unit Roots: The problem with modelingnonstationary data IV • Constant and trend, results from EViews Table 10C.1 The Results from an Augmented Dickey-Fuller Test for Non-Stationarity of Ln(Crude) • Null hypothesis – data series is non-stationary • Alternative – series is stationary See details in Appendix 10.C
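An illustrative way to run an ADF test with a constant and trend in Python (statsmodels); the crude-price column name in Gas_prices.xlsx is an assumption.

```python
# ADF test on log crude prices; regression="ct" adds a constant and trend,
# matching the variant reported in Table 10C.1. Column name is an assumption.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

ln_crude = np.log(pd.read_excel("Gas_prices.xlsx")["crude"])

stat, p_value, used_lags, nobs, crit_values, icbest = adfuller(ln_crude, regression="ct")
print(f"ADF statistic = {stat:.3f}, p-value = {p_value:.3f}")
print("Critical values:", crit_values)
# Failing to reject the null (large p-value) is consistent with nonstationarity,
# but remember these tests have low power.
```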