
Fitting models to data – II (The Basics of Maximum Likelihood Estimation)


Presentation Transcript


  1. Fitting models to data – II (The Basics of Maximum Likelihood Estimation). Fish 458, Lecture 9

  2. The Principle of ML Estimation • We wish to select the values for the parameters so that the probability that the model generated (is responsible for) the data is as high as possible. • Put another way: if we have two candidate sets of parameters and the probability that one generated the data is ten times that of the other, we would naturally prefer the former. • OK, so how do we define this probability?

  3. The Likelihood Function • What we need to compute is the likelihood function: the likelihood of a parameter vector θ is the probability that the model with those parameter values generated the observed data, L(θ) = P(data | θ). • If we have a discrete set of hypotheses / set of parameter vectors θ_1, …, θ_n, then the relative support for hypothesis i is P(data | θ_i) / Σ_j P(data | θ_j).
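To make the discrete-hypothesis case concrete, here is a minimal Python sketch (not from the lecture; the binomial data and candidate parameter values are hypothetical) that computes the likelihood of each candidate value and the relative support for each:

```python
# Minimal sketch: comparing a discrete set of candidate parameter values
# by their likelihoods. Hypothetical example: 7 successes out of 10
# binomial trials, with candidate values for p of 0.3, 0.5, and 0.7.
from scipy.stats import binom

data_k, data_n = 7, 10
candidates = [0.3, 0.5, 0.7]

# L(p) = P(data | p) for each candidate parameter value
likelihoods = [binom.pmf(data_k, data_n, p) for p in candidates]

# Relative support: each likelihood divided by the sum over all candidates
total = sum(likelihoods)
for p, L in zip(candidates, likelihoods):
    print(f"p = {p}: L = {L:.4f}, relative support = {L / total:.3f}")
```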

  4. A First Example • We observe Y = 6 and know that the observation process is based on the equation Y = μ + ε, where ε is normally distributed with mean 0 and standard deviation σ. • Given Y = 6, the likelihood function is the normal density viewed as a function of μ: L(μ) = 1/(σ√(2π)) exp(-(6-μ)²/(2σ²)).

  5. A First Example - II [Figure: likelihood curves plotted as functions of μ for observations Y = 4 and Y = 6.] Note: the likelihood is a function of the parameter μ and not the data; we are given the data.
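A small Python sketch of this example, assuming the error standard deviation is σ = 1 (the slides do not preserve the actual value): it evaluates the likelihood over a grid of μ values for Y = 4 and Y = 6 and reports where each curve peaks:

```python
# Minimal sketch of the first example: the likelihood of mu given a single
# observation, assuming Y = mu + eps with eps ~ Normal(0, sigma^2) and
# (as an assumption, since the slide does not preserve it) sigma = 1.
import numpy as np
from scipy.stats import norm

sigma = 1.0
mu_grid = np.linspace(0, 12, 241)

for Y in (4.0, 6.0):
    # L(mu) = normal density of the observed Y, viewed as a function of mu
    likelihood = norm.pdf(Y, loc=mu_grid, scale=sigma)
    print(f"Y = {Y}: likelihood peaks at mu = {mu_grid[np.argmax(likelihood)]:.2f}")
```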

  6. Multiple Data Sources • If we have multiple data sources (e.g. CPUE and survey data for Cape Hake), we can establish a likelihood for each data source. The likelihood for the two data sources combined is the product of the likelihoods for each data source: L = L_1 × L_2. • Note: We often work with the logarithm of the likelihood function, i.e. ln L = ln L_1 + ln L_2.
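A minimal sketch of combining data sources, with hypothetical observations, predictions, and standard deviations; the key point is that independent sources multiply on the likelihood scale and add on the log scale:

```python
# Minimal sketch (hypothetical numbers): combining two independent data
# sources by summing their log-likelihoods (equivalent to multiplying
# the likelihoods themselves).
import numpy as np
from scipy.stats import norm

def log_lik_source(observed, predicted, sigma):
    # Sum of log normal densities over the points in one data source
    return np.sum(norm.logpdf(observed, loc=predicted, scale=sigma))

# Hypothetical CPUE and survey observations with model predictions
cpue_obs, cpue_pred = np.array([1.2, 1.0, 0.9]), np.array([1.1, 1.0, 0.95])
survey_obs, survey_pred = np.array([520.0, 480.0]), np.array([505.0, 490.0])

total_log_lik = (log_lik_source(cpue_obs, cpue_pred, sigma=0.1)
                 + log_lik_source(survey_obs, survey_pred, sigma=25.0))
print(f"combined log-likelihood = {total_log_lik:.3f}")
```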

  7. Likelihood Estimation • Identify the questions. • Identify the data sources. • Select alternative models. • Select appropriate likelihood functions for each data source. • Find the values for the parameters that maximize the likelihood function (hence Maximum Likelihood Estimation).

  8. Finding the Maximum Likelihood Estimates • The best estimate is μ = 6, because this value of μ leads to the maximum likelihood.
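The same estimate can be found numerically by minimizing the negative log-likelihood, as in this sketch (same assumptions as before: a single observation Y = 6 with normal error and σ = 1):

```python
# Minimal sketch: finding the maximum likelihood estimate numerically by
# minimizing the negative log-likelihood.
from scipy.optimize import minimize_scalar
from scipy.stats import norm

Y, sigma = 6.0, 1.0  # sigma = 1 is an assumption, as before

def neg_log_lik(mu):
    return -norm.logpdf(Y, loc=mu, scale=sigma)

result = minimize_scalar(neg_log_lik, bounds=(0.0, 12.0), method="bounded")
print(f"MLE of mu = {result.x:.3f}")  # should be 6, matching the slide
```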

  9. Therefore… we need to know which probability density functions to use for which data types. • The probability distributions encountered most commonly are: • Normal / multivariate normal • t • Log-normal • Poisson • Negative binomial • Beta • Binomial / multinomial • You need to know when to use each distribution and its functional form (up to any normalizing constants).

  10. The Normal and t-distributions • The density functions for the normal and t-distributions are: f(x) = 1/(σ√(2π)) exp(-(x-μ)²/(2σ²)) for the normal, and f(x) = [Γ((k+1)/2) / (Γ(k/2) σ√(kπ))] [1 + ((x-μ)/σ)²/k]^(-(k+1)/2) for the t. • μ is the mean. • σ is the standard deviation (the scale parameter for the t). • k is the degrees of freedom. • We use these distributions when the data are the sum of many terms. The t-distribution allows us to account for small sample sizes (<30).
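Both densities are available in scipy.stats; this sketch (with arbitrary illustrative values) evaluates the location-scale form of the t used on the slide and shows it approaching the normal as k grows:

```python
# Minimal sketch: evaluating the normal and t densities with scipy
# (the t is the location-scale form, with location mu and scale sigma).
from scipy.stats import norm, t

mu, sigma, k = 0.0, 1.0, 5  # illustrative values; k = degrees of freedom
x = 1.5

print(f"normal density at x={x}: {norm.pdf(x, loc=mu, scale=sigma):.4f}")
print(f"t density at x={x} (k={k}): {t.pdf(x, df=k, loc=mu, scale=sigma):.4f}")
# With few data points the t has heavier tails; as k grows it approaches the normal:
print(f"t density at x={x} (k=1000): {t.pdf(x, df=1000, loc=mu, scale=sigma):.4f}")
```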

  11. The Normal and t-distributions [Figure: plots of the normal and t density functions.]

  12. Key Point with Normal Likelihood • Let us say we wish to fit the model y_i = f(x_i; θ) + ε_i, assuming normally distributed errors, i.e. ε_i ~ N(0, σ²). • The likelihood function is therefore: L(θ) = Π_i 1/(σ√(2π)) exp(-(y_i - f(x_i; θ))²/(2σ²)). • Taking logarithms and multiplying by -1 gives: -ln L(θ) = n ln σ + (n/2) ln(2π) + Σ_i (y_i - f(x_i; θ))²/(2σ²). • This implies that if you assume normally-distributed errors, the answers will be identical to those from least squares: for fixed σ, minimizing -ln L(θ) is the same as minimizing the sum of squared residuals.
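This equivalence is easy to verify numerically. In the sketch below the data are hypothetical straight-line observations; minimizing the normal negative log-likelihood recovers the same intercept and slope as an ordinary least-squares fit:

```python
# Minimal sketch (hypothetical data): the parameter estimates from
# minimizing the normal negative log-likelihood match least squares.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # roughly y = 2x

def neg_log_lik(params, sigma=1.0):
    a, b = params
    return -np.sum(norm.logpdf(y, loc=a + b * x, scale=sigma))

mle = minimize(neg_log_lik, x0=[0.0, 1.0]).x
lsq = np.polyfit(x, y, 1)  # least squares: returns [slope, intercept]
print(f"MLE:           intercept = {mle[0]:.4f}, slope = {mle[1]:.4f}")
print(f"Least squares: intercept = {lsq[1]:.4f}, slope = {lsq[0]:.4f}")
```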

  13. Time for an Example! • We wish to fit the Dynamic Schaefer model to the bowhead census data. • q is assumed to be 1 here because the surveys provide absolute indices of abundance. • We have information on the trend in abundance from 1978-93 (increase of 3.2% per annum (SD 0.76%) based on 8 data points). • We have an estimate of abundance for 1993 of 7800 (SD 564).

  14. How to Deal with this Example! • The model (the dynamic Schaefer model): B_{t+1} = B_t + r B_t (1 - B_t/K) - C_t. • The likelihood function is the product of a normal likelihood (for the abundance estimate) and a t-likelihood (for the trend). Ignoring constants independent of the model parameters: -ln L(K, r) = (7800 - B_1993)² / (2 × 564²) + ((k+1)/2) ln[1 + ((0.032 - s)/0.0076)²/k], where B_1993 is the model's 1993 abundance, s is the model's 1978-93 trend, and k is the degrees of freedom. • We take logs, multiply by minus one and minimize to find the estimates for K and r. • Note that we can ignore any constants – why? • The t-distribution is chosen for the slope – why?
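A heavily simplified sketch of the estimation step, not the actual assessment: the projection function, catch series, start year, and the t degrees of freedom (taken as 6, i.e. 8 points minus 2 regression parameters) are all assumptions; only the data values (7800 with SD 564, and 3.2% per annum with SD 0.76%) come from the slides:

```python
# Minimal sketch of minimizing the combined negative log-likelihood for
# K and r. The catch series, start year, and df are hypothetical.
import numpy as np
from scipy.optimize import minimize

def project(K, r, catches):
    """Dynamic Schaefer projection: B[t+1] = B[t] + r*B[t]*(1 - B[t]/K) - C[t]."""
    B = [K]  # assume the stock starts at carrying capacity
    for C in catches:
        B.append(max(B[-1] + r * B[-1] * (1 - B[-1] / K) - C, 1e-6))
    return np.array(B)

catches = np.full(145, 30.0)         # hypothetical constant catch series
years = np.arange(1848, 1848 + 146)  # assumed projection period, 1848-1993

def neg_log_lik(params, df=6):
    K, r = params
    if K <= 0 or r <= 0:
        return 1e12  # penalise infeasible parameter values
    B = project(K, r, catches)
    B_1993 = B[years == 1993][0]
    # model trend 1978-93: log-linear slope of the projected abundance
    mask = (years >= 1978) & (years <= 1993)
    slope = np.polyfit(years[mask], np.log(B[mask]), 1)[0]
    nll_abund = (7800.0 - B_1993) ** 2 / (2 * 564.0 ** 2)  # normal part
    nll_trend = 0.5 * (df + 1) * np.log(1 + ((0.032 - slope) / 0.0076) ** 2 / df)  # t part
    return nll_abund + nll_trend

fit = minimize(neg_log_lik, x0=[12000.0, 0.03], method="Nelder-Mead")
print(f"K = {fit.x[0]:.0f}, r = {fit.x[1]:.4f}")
```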

  15. The Outcome • The fitted model gives B_1993 = 7710 and a 1978-93 slope of 2.95% (compare the observed 7800 and 3.2% per annum).

  16. The Lognormal distribution • The density function: f(x) = 1/(x σ√(2π)) exp(-(ln x - ln μ)²/(2σ²)). • μ is the median (not the mean). • σ is the standard deviation of the logarithm of x (approximately the coefficient of variation of x). • The lognormal distribution is used extensively in fisheries assessments because x is always larger than zero – this is true for most data sources (CPUE, survey indices, estimates of death rates, etc.).
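A quick check of the parameterisation (with a hypothetical median and σ): scipy's lognorm uses the SD of the logarithm as its shape parameter s and the median as its scale, so the direct formula above and the library call agree:

```python
# Minimal sketch: the lognormal density, evaluated directly and via scipy.
import numpy as np
from scipy.stats import lognorm

median, sigma = 10.0, 0.2  # hypothetical values; sigma ~ CV of x
x = 12.0

# Direct evaluation of the density on the slide
direct = (1.0 / (x * sigma * np.sqrt(2 * np.pi))
          * np.exp(-(np.log(x) - np.log(median)) ** 2 / (2 * sigma ** 2)))
print(f"direct: {direct:.5f}")
print(f"scipy:  {lognorm.pdf(x, s=sigma, scale=median):.5f}")  # should match
```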

  17. The Multivariate Normal-I • The density function: f(x) = (2π)^(-d/2) |Σ|^(-1/2) exp(-(x-μ)ᵀ Σ⁻¹ (x-μ)/2). • μ is the vector of means. • Σ is the variance-covariance matrix. • d is the length of the vector x. • This isn't nearly as bad as it looks.

  18. The Multivariate Normal-II • We use the multivariate normal when the data points are correlated (e.g. surveys with common correction factors), as is the case for the bowhead survey estimates.
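A minimal sketch with hypothetical survey estimates, predictions, and a made-up correlation of 0.4 representing a shared correction factor; the non-diagonal covariance matrix is what distinguishes this from two independent normal likelihoods:

```python
# Minimal sketch (hypothetical numbers): log-likelihood of two correlated
# survey estimates under a multivariate normal.
import numpy as np
from scipy.stats import multivariate_normal

observed = np.array([7800.0, 8200.0])   # hypothetical survey estimates
predicted = np.array([7700.0, 8100.0])  # hypothetical model predictions

# Covariance: SDs of 564 and 600, correlation 0.4 from the shared factor
sd = np.array([564.0, 600.0])
corr = 0.4
cov = np.array([[sd[0] ** 2, corr * sd[0] * sd[1]],
                [corr * sd[0] * sd[1], sd[1] ** 2]])

log_lik = multivariate_normal.logpdf(observed, mean=predicted, cov=cov)
print(f"multivariate normal log-likelihood = {log_lik:.3f}")
```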

  19. Readings • Hilborn and Mangel (1997), Chapter 7 • Haddon (2001), Chapter 4
