Forecasting Choices
Types of Variable • Quantitative: Continuous, Discrete (counting) • Qualitative: Ordinal, Nominal
Nominal or Ordinal Dependent Variable • Indicates the “choices” of a decision maker (d.m.), say a consumer. • Response categories are mutually exclusive, collectively exhaustive, and finite in number. • Desired regression outputs: the probability that the d.m. chooses each category, and the coefficient of each independent variable.
Generalized Linear Models (GLM) • Regression model for a continuous Y: Y = b0 + b1X1 + b2X2 + e, with e following N(0, σ) • GLM formulation: • Model for Y: Y is N(μ, σ) • Link function (model for the predictors): μ = b0 + b1X1 + b2X2
Estimation of Parameters of GLM • Maximum Likelihood Estimation (MLE) • For normal Y, MLE coincides with least-squares (LS) estimation • Maximize: the sum over all observations of Li, the log of the likelihood function of the i-th observation
MLE for Regression Model • Y is N(μ, σ) • MLE: Maximize $\sum_i L_i$, where $L_i = -\ln(\sigma\sqrt{2\pi}) - (Y_i - \mu_i)^2 / (2\sigma^2)$
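The claim that MLE reduces to least squares for normal Y can be checked with a short derivation. This is a sketch that assumes σ is the standard deviation and is held fixed, and that μi is the linear predictor of the GLM slide with two covariates.

```latex
\begin{align*}
L_i &= \ln\!\left[\frac{1}{\sigma\sqrt{2\pi}}
        \exp\!\left(-\frac{(Y_i-\mu_i)^2}{2\sigma^2}\right)\right]
     = -\ln\!\left(\sigma\sqrt{2\pi}\right) - \frac{(Y_i-\mu_i)^2}{2\sigma^2},\\
\sum_{i=1}^{n} L_i &= -n\ln\!\left(\sigma\sqrt{2\pi}\right)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i-\mu_i\right)^2,
\qquad \mu_i = b_0 + b_1X_{1i} + b_2X_{2i}.
\end{align*}
```

Only the last sum depends on b0, b1, b2, and it enters with a negative sign, so maximizing the log likelihood over the coefficients is the same as minimizing the sum of squared errors.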
GLM for Binary Dependent Variable, Y • Model for response: Y is B(n, p) • Model for predictors (link function): logit(p) = ln(p / (1 − p)) = b0 + b1X1 + b2X2 + … + bKXK = g • Probability: p = exp(g) / (1 + exp(g))
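A minimal sketch of the logit link and its inverse, in Python. The function names and the coefficient and covariate values are hypothetical, chosen only to illustrate the two formulas on this slide.

```python
import math

def inverse_logit(g):
    """Map a link-function value g to a probability p = exp(g) / (1 + exp(g))."""
    # Numerically stable form: avoids overflow of exp(g) for large positive g.
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)

def logit(p):
    """Log-odds of a probability p (the inverse of inverse_logit)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and covariate values, for illustration only.
b = [-1.0, 0.5, 0.25]                    # b0, b1, b2
x = [2.0, 4.0]                           # X1, X2
g = b[0] + b[1] * x[0] + b[2] * x[1]     # g = -1 + 1 + 1 = 1
p = inverse_logit(g)                     # probability implied by g
```

Applying logit to p returns g, which is a quick way to confirm that the two formulas are inverses of each other.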
X : Covariates • Independent variables are often referred to as “covariates.” • Example: • SPSS binary logistic regression routine • SPSS multinomial logistic regression routine
A. Logistic Regression for Ungrouped Data (ni = 1) • Model of observation for the i-th observation: Yi = 1: choose category 1 with probability pi; Yi = 0: choose category 2 with probability 1 − pi • Log likelihood function for the i-th observation: $L_i = Y_i \ln p_i + (1 - Y_i)\ln(1 - p_i)$
MLE • Maximize: $\sum_i L_i$ with respect to the parameters b0, b1, …, bK
Setting Up a Worksheet for MLE • Define an array for storing the parameters of the link function, and enter an initial estimate for each parameter. • Then, for each observation, compute: the link function gi; the parameter of the likelihood, pi; and ln(likelihood), Li. • Sum the Li and invoke the solver to maximize the sum by changing the parameters. • Multiply the maximized value by −2 for the test of significance of the regression.
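The worksheet steps can be mirrored in a few lines of Python. This is a sketch under stated assumptions: it uses a single covariate, the data are made up, and Newton's method stands in for the spreadsheet solver (the slides do not say which algorithm the solver uses). The function names are hypothetical.

```python
import math

def log_likelihood(b, xs, ys):
    """Sum of Li = Yi*ln(pi) + (1 - Yi)*ln(1 - pi) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        g = b[0] + b[1] * x                  # link function g_i
        p = 1.0 / (1.0 + math.exp(-g))       # likelihood parameter p_i
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, iterations=25):
    """Maximize the log likelihood over (b0, b1) with Newton's method."""
    b0, b1 = 0.0, 0.0                        # initial estimates
    for _ in range(iterations):
        s0 = s1 = 0.0                        # gradient of the log likelihood
        h00 = h01 = h11 = 0.0                # information matrix entries
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            s0 += y - p
            s1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + I^{-1} * gradient (2x2 inverse written out).
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    return [b0, b1]

# Made-up ungrouped data (not linearly separable), for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b_hat = fit_logistic(xs, ys)
max_ll = log_likelihood(b_hat, xs, ys)       # maximized sum of the Li
minus_two_ll = -2.0 * max_ll                 # value used in the significance test
```

The data must not be perfectly separable by the covariate; otherwise the maximum does not exist and the iterations diverge.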
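Test of Significance • Hypotheses: H0: b1 = b2 = … = bK = 0; H1: at least one bj ≠ 0 • Test statistic: the difference between −2 × (maximized log likelihood) of the model with b1 = … = bK = 0 and that of the fitted model • Distribution under H0: χ²(DF = K)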
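A sketch of the significance test as a likelihood-ratio computation, assuming the statistic is the difference in −2 × maximized log likelihood between the intercept-only model and the fitted model. The function names and the fitted log likelihood of −4.2 are hypothetical. The critical values are the standard 5% points of the chi-square distribution.

```python
import math

def null_log_likelihood(ys):
    """Maximized log likelihood when all bj = 0 (intercept only).

    Requires at least one 0 and one 1 among the observations.
    """
    n = len(ys)
    n1 = sum(ys)
    p = n1 / n                               # MLE of p under H0
    return n1 * math.log(p) + (n - n1) * math.log(1.0 - p)

def likelihood_ratio_statistic(ll_null, ll_full):
    """Difference of the -2 * log-likelihood values of the two models."""
    return -2.0 * ll_null - (-2.0 * ll_full)

# Upper 5% critical values of chi-square for DF = 1, 2, 3.
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

ys = [0, 0, 1, 0, 1, 0, 1, 1]                # made-up responses
ll_null = null_log_likelihood(ys)
ll_full = -4.2                               # hypothetical fitted-model value
g_stat = likelihood_ratio_statistic(ll_null, ll_full)
reject_h0 = g_stat > CHI2_CRITICAL_5PCT[1]   # K = 1 covariate in this example
```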
Standard Errors of Logistic Regression Coefficients (optional) • Estimate of Information Matrix, I (K=2)
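The matrix shown on the original slide is not recoverable from this transcript. As a hedged reminder, the standard form of the estimated information matrix for a logistic regression with a constant and K = 2 covariates is given below; the vector notation is an assumption introduced here.

```latex
\[
I = \sum_{i=1}^{n} \hat p_i\,(1-\hat p_i)\,\mathbf{x}_i\mathbf{x}_i^{\top},
\qquad \mathbf{x}_i = (1,\; X_{1i},\; X_{2i})^{\top},
\qquad \mathrm{SE}(b_j) = \sqrt{\left[I^{-1}\right]_{jj}} .
\]
```

Here the fitted probabilities are evaluated at the maximum likelihood estimates, and the standard errors are the square roots of the diagonal entries of the inverse matrix.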
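Deviance Residuals and Deviance for Logistic Regression (Optional) • Deviance (corresponds to SSE): $D = -2\sum_i L_i$ for ungrouped data, with Li evaluated at the fitted pi • Deviance residual: $d_i = \operatorname{sign}(Y_i - p_i)\sqrt{-2L_i}$, so that $D = \sum_i d_i^2$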
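A small sketch of the deviance computation for ungrouped binary data, assuming the usual definitions: the deviance residual is the signed square root of −2 Li, and the deviance is the sum of the squared residuals. The fitted probabilities below are hypothetical.

```python
import math

def deviance_residuals(ys, ps):
    """d_i = sign(Yi - pi) * sqrt(-2 * Li) for 0/1 responses and fitted pi."""
    residuals = []
    for y, p in zip(ys, ps):
        li = y * math.log(p) + (1 - y) * math.log(1.0 - p)
        sign = 1.0 if y > p else -1.0        # positive when Yi = 1, negative when Yi = 0
        residuals.append(sign * math.sqrt(-2.0 * li))
    return residuals

def deviance(ys, ps):
    """Sum of squared deviance residuals; plays the role of SSE."""
    return sum(d * d for d in deviance_residuals(ys, ps))

ys = [1, 0, 1]                               # made-up responses
ps = [0.8, 0.3, 0.4]                         # hypothetical fitted probabilities
d = deviance_residuals(ys, ps)
D = deviance(ys, ps)
```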
B. Logistic Regression for Grouped Data Using WLS • The observation for the i-th group: ni decision makers, of whom Yi choose category 1, giving the observed proportion Yi / ni
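WLS for Logistic Regression • Regress: the observed logit ln(p̂i / (1 − p̂i)) on X1i, …, XKi with weight ni p̂i (1 − p̂i), where p̂i = Yi / ni is the observed proportion in group i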
WLS for Unequal Variance Data [Figure: scatter plot of Y against X with observations 1 and 2 marked] • Observation 2 is subject to a larger variance than observation 1, so it makes sense to give it a lower weight. • In WLS, the weight is proportional to 1/variance.
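For grouped binary data, the weighting idea can be sketched as follows. The assumptions are loud ones: the dependent variable is taken to be the observed logit of each group, its variance is approximated by 1 / (n p (1 − p)), so the weight is n p (1 − p), and there is a single covariate. The data and function names are hypothetical.

```python
import math

def empirical_logits_and_weights(ns, ys):
    """Observed logit and WLS weight for each group (needs 0 < y < n)."""
    zs, ws = [], []
    for n, y in zip(ns, ys):
        p = y / n                            # observed proportion in the group
        zs.append(math.log(p / (1.0 - p)))   # observed logit
        ws.append(n * p * (1.0 - p))         # 1 / approximate variance of the logit
    return zs, ws

def weighted_least_squares(xs, zs, ws):
    """Closed-form WLS fit of z = b0 + b1 * x with weights ws."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sw
    sxz = sum(w * (x - x_bar) * (z - z_bar) for w, x, z in zip(ws, xs, zs))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    b1 = sxz / sxx
    b0 = z_bar - b1 * x_bar
    return b0, b1

# Made-up grouped data: covariate value, group size, count choosing category 1.
xs = [1, 2, 3, 4]
ns = [50, 50, 50, 50]
ys = [10, 20, 30, 40]
zs, ws = empirical_logits_and_weights(ns, ys)
b0, b1 = weighted_least_squares(xs, zs, ws)
```

Groups whose observed proportion is close to 0 or 1 receive small weights, which matches the remark that noisier observations should count for less.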
Modeling of Forecasting Choices - GLM • Model for observation of the dependent variable: a probability distribution • Link function (model for independent variables): a mathematical function
Forecasting Choices • # of choices = 2: Binomial distribution • # of choices > 2: Multinomial distribution, with unordered or ordered categories
Multinomial Logit Regression • Multinomial Choice (m=3) , Ungrouped Data: • Y1=1: Choose category 1 with probability p1 • Y1=0: Choose category 2 or 3 with probability 1- p1 • Y2=1: Choose category 2 with probability p2 • Y2=0: Choose category 1 or 3 with probability 1- p2 • Y3=1: Choose category 3 with probability p3 • Y3=0: Choose category 1 or 2 with probability 1- p3
Log Likelihood Function • Log likelihood function of the i-th ungrouped observation: $L_i = Y_{1i}\ln p_{1i} + Y_{2i}\ln p_{2i} + Y_{3i}\ln p_{3i}$ • MLE: Maximize $\sum_i L_i$
Y3 and p3 can be omitted, since Y3 = 1 − Y1 − Y2 and p3 = 1 − p1 − p2 • Multinomial Choice (m = 3), Ungrouped Data: • Y1=1: Choose category 1 with probability p1 • Y1=0: Choose category 2 or 3 with probability 1 − p1 • Y2=1: Choose category 2 with probability p2 • Y2=0: Choose category 1 or 3 with probability 1 − p2
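Log Likelihood Function • Log likelihood function of the i-th (ungrouped) observation: $L_i = Y_{1i}\ln p_{1i} + Y_{2i}\ln p_{2i} + (1 - Y_{1i} - Y_{2i})\ln(1 - p_{1i} - p_{2i})$ • MLE: Maximize $\sum_i L_i$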
1: Formulating “Link” Functions: Unordered Choice Categories • Category 3 as the baseline category.
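The link equations on the original slide are not recoverable from this transcript. A common formulation with category 3 as the baseline, assumed here, sets ln(p1 / p3) = g1 and ln(p2 / p3) = g2, where g1 and g2 are linear in the covariates. The sketch below turns the two link values into the three probabilities; the function name and the input values are hypothetical.

```python
import math

def multinomial_probabilities(g1, g2):
    """Probabilities of categories 1, 2, 3 with category 3 as baseline (g3 = 0)."""
    m = max(g1, g2, 0.0)                     # subtract the max for numerical stability
    e1 = math.exp(g1 - m)
    e2 = math.exp(g2 - m)
    e3 = math.exp(0.0 - m)
    total = e1 + e2 + e3
    return e1 / total, e2 / total, e3 / total

# Hypothetical link values for one decision maker.
p1, p2, p3 = multinomial_probabilities(0.7, -0.2)
```

Because all three probabilities share the same denominator, a change in g1 moves p2 and p3 as well as p1, which is why individual coefficients are hard to interpret.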
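Test of Significance • Hypotheses: H0: b11 = b21 = … = bK1 = b12 = b22 = … = bK2 = 0; H1: at least one bij ≠ 0 • Test statistic: the difference between −2 × (maximized log likelihood) of the model with all bij = 0 and that of the fitted model • Distribution under H0: χ²(DF = 2K)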
Interpreting Coefficients • Not easy, as a change of probability for one category affects probabilities for other (two) categories.
2: Formulating Link Functions: Ordered Choice Categories • [Figure: axis of the underlying variable U that defines the categories, split by the thresholds g1 and g2 into Category 1, Category 2, and Category 3]
Choices for Probability Distribution of U • a. Ordered Probit Model for the i-th d.m.: Ui follows N(μi, σ = 1) • b. Ordered Logit Model for the i-th d.m.: Ui follows the Logistic Distribution(μi) • μi = b1X1i + b2X2i (no constant term)
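A sketch of how the ordered logit model turns the underlying variable into category probabilities. It assumes that category 1 corresponds to U below the lower threshold g1, category 3 to U above the upper threshold g2, and category 2 to the interval in between; the numeric values and function names are hypothetical.

```python
import math

def logistic_cdf(t):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-t))

def ordered_logit_probabilities(mu, g1, g2):
    """P(category 1), P(category 2), P(category 3) for thresholds g1 < g2."""
    c1 = logistic_cdf(g1 - mu)               # P(U <= g1)
    c2 = logistic_cdf(g2 - mu)               # P(U <= g2)
    return c1, c2 - c1, 1.0 - c2

# Hypothetical location and thresholds for one decision maker.
p1, p2, p3 = ordered_logit_probabilities(mu=0.0, g1=-1.0, g2=1.0)
```

Replacing logistic_cdf with the standard normal CDF would give the ordered probit version of the same calculation.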
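Poisson Regression for Counting • Model of observations for Y: Yi follows a Poisson distribution with mean λi • Link function: $\ln \lambda_i = b_0 + b_1X_{1i} + \dots + b_KX_{Ki}$ • Log likelihood function: $L_i = Y_i \ln \lambda_i - \lambda_i - \ln(Y_i!)$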
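A minimal sketch of the Poisson regression log likelihood with a log link and a single covariate. The log link and the function name are assumptions made for illustration; maximizing this function over b0 and b1 would give the MLE.

```python
import math

def poisson_log_likelihood(b, xs, ys):
    """Sum of Li = Yi*ln(lambda_i) - lambda_i - ln(Yi!) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        lam = math.exp(b[0] + b[1] * x)      # log link: ln(lambda_i) = b0 + b1 * x
        # math.lgamma(y + 1) equals ln(y!) for a non-negative integer y.
        total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total

# With b0 = b1 = 0 every lambda_i is 1, which makes the value easy to check.
ll_zero_count = poisson_log_likelihood([0.0, 0.0], [1.0], [0])
ll_two_count = poisson_log_likelihood([0.0, 0.0], [1.0], [2])
```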