Chapter 3: Part c – Parameter Estimation
We will be discussing:
• Nonlinear Parameter Estimation
• Maximum Likelihood Parameter Estimation
(These topics are needed for Chapters 9, 12, 14 and 15.)
Why Do We Need Nonlinear Parameter Estimation? With the linear model, y = Xβ + e, we end up with a closed-form, algebraic solution. Sometimes there is no algebraic solution for the unknowns in a marketing model. Suppose the data depend in a nonlinear way on an unknown parameter θ, let's say y = f(θ) + e. To minimize e′e, we need to find the spot at which de′e/dθ = 0. But if there is no way to get θ by itself on one side of an equation and stuff that we know on the other, we have to search for θ numerically.
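To make that concrete, here is a hypothetical f (not from the slide): take f(θ) = e^{θx}. Setting the derivative of the squared error to zero gives

$$\frac{d\,e'e}{d\theta} = -2\sum_{i=1}^{n}\bigl(y_i - e^{\theta x_i}\bigr)\,x_i\,e^{\theta x_i} = 0,$$

and no amount of algebra isolates θ on one side; the equation has to be solved numerically.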
Steps to the Algorithm of Nonlinear Estimation
• We take a stab at the unknown θ, inventing a starting value for it.
• We assess the derivative of the objective function at the current value of θ. If the derivative is not zero, we modify θ by moving it in the direction that brings the derivative closer to 0. We keep repeating this step until the derivative arrives at zero (a code sketch of this loop follows the list).
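A minimal sketch of that loop in Python, using the hypothetical model y = e^{θx} + e from above; the data, step size, and tolerance are inventions of this sketch, not part of the slide:

```python
import math

# Nonlinear least squares by gradient descent on e'e, for the
# hypothetical model y_i = exp(theta * x_i) + e_i.
xs = [0.1, 0.2, 0.3, 0.4]
ys = [1.22, 1.49, 1.82, 2.23]    # roughly exp(2 * x), so theta should be near 2

def sse_derivative(theta):
    """d(e'e)/d(theta) = -2 * sum((y_i - f_i) * x_i * exp(theta * x_i))."""
    return -2 * sum(
        (y - math.exp(theta * x)) * x * math.exp(theta * x)
        for x, y in zip(xs, ys)
    )

theta = 0.0                      # step 1: invent a starting value
step = 0.05                      # step size -- a tuning choice
for _ in range(10_000):          # step 2: adjust until the derivative is ~0
    grad = sse_derivative(theta)
    if abs(grad) < 1e-8:
        break
    theta -= step * grad         # move opposite the sign of the derivative

print(round(theta, 3))           # close to 2
```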
A Picture of Nonlinear Estimation
[Figure: the objective function f plotted against θ, with two candidate values of θ marked]
If the derivative is positive, we should move to the left (go more negative). If the derivative is negative, we should move to the right (go more positive). This suggests the rule:
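The rule this describes, moving θ against the sign of the derivative, is presumably the standard gradient step,

$$\theta^{(t+1)} = \theta^{(t)} - \alpha\,\frac{d\,e'e}{d\theta}\bigg|_{\theta^{(t)}},$$

where α > 0 is a step size.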
A Brief Introduction to Maximum Likelihood
• ML is an alternative philosophy to Least Squares.
• If ML estimators exist, they will be consistent.
• If ML estimators exist, they will be asymptotically normally distributed.
• If ML estimators exist, they will be asymptotically efficient.
• ML leads to a chi-square test of the model.
• The covariance matrix for ML estimators can be calculated from the second-order derivatives.
• Marketing scientists really like ML estimators.
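The covariance bullet refers to a standard result: the covariance matrix of the ML estimator can be estimated by the inverse of the information matrix built from those second-order derivatives,

$$\widehat{\mathrm{Var}}(\hat{\theta}) = \left[-\,\frac{\partial^{2} \ln L}{\partial \theta\,\partial \theta'}\bigg|_{\theta = \hat{\theta}}\right]^{-1}.$$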
The Likelihood Principle
We wish to maximize the probability of the data given the model. We will start with the example of estimating the population mean, μ.
[Figure: a probability density Pr(x) plotted against x, with the vertical axis running from 0 to 1.0]
Assume we draw a sample of 3 values, x1 = 4, x2 = 5 and x3 = 6.
The Likelihood of the Sample
What would be the likelihood of observing x1, x2 and x3 given that μ = 212? How about if μ = 5? With ML we choose an estimate for μ that maximizes the likelihood of the sample. The sample that we observed was presumably more likely on average than the samples that we did not observe. We should make its probability as large as possible. (A quick numeric version of this comparison follows.)
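A minimal sketch of that comparison, assuming the observations are normal with a standard deviation of 1 (the slide names no distribution, so the normal density and σ = 1 are assumptions of this sketch):

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density of a normal distribution evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

sample = [4, 5, 6]

def likelihood(mu):
    """Likelihood of the whole sample: the product of the individual
    densities, assuming independent sampling."""
    prob = 1.0
    for x in sample:
        prob *= normal_pdf(x, mu)
    return prob

print(likelihood(212))   # underflows to 0.0 -- the sample is wildly unlikely
print(likelihood(5))     # about 0.023 -- far more plausible
```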
Steps to ML Estimation
• Derive the probability of an observation given the parameters, Pr(yi | θ).
• Derive the likelihood of the sample, which typically involves multiplication when we assume independent sampling: L(θ) = Pr(y1 | θ) · Pr(y2 | θ) ··· Pr(yn | θ).
• Derive the likelihood under the general alternative that the data are arbitrary.
• Pick the elements of the unknown parameter vector θ so that the likelihood of the sample is as large as possible; equivalently, so that −2 ln of the ratio of the model's likelihood to the alternative's likelihood is as small as possible. (A code sketch of these steps follows.)
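A minimal sketch of the estimation step, again assuming normal observations with σ = 1 (an assumption of this sketch, since the slide names no distribution). Minimizing the negative log-likelihood is equivalent to maximizing the likelihood, and it recovers the sample mean, 5, as the ML estimate of μ:

```python
import math
from scipy.optimize import minimize_scalar  # SciPy's one-dimensional minimizer

sample = [4, 5, 6]

def neg_log_likelihood(mu, sigma=1.0):
    """-ln L(mu): summing log-densities instead of multiplying densities
    avoids numerical underflow and preserves the location of the optimum."""
    return sum(
        0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))
        for x in sample
    )

result = minimize_scalar(neg_log_likelihood)
print(result.x)   # approximately 5.0, the ML estimate of mu
```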