Model Uncertainty and Model Selection

Model Uncertainty and Model Selection Fish 458, Lecture 13

Overview • Models are hypotheses regarding “how the world could work”. There are usually several competing models ranging from very simple to very complicated. • Some important results (e.g. extinction risk – do we have environmental variation in deaths?) will be sensitive to model structure. • Complex models explain the data better but may provide poor forecasts. • Classical statistics emphasizes estimation uncertainty. However, many would argue that model uncertainty is more important in practice (e.g. which of the data series for northern cod should have been used for assessment purposes).

Complexity vs Simplicity-I We wish to approximate a function using a histogram based on 100 points. How many bins should we choose? Too many – imprecise. Too few - biased

Complexity vs Simplicity-II • Too few parameters, we can’t capture the true model adequately – error due to approximation. • Too many parameters, we can’t estimate them adequately – error due to estimation. • The “optimal” number of parameters depends on the amount of data.

Complexity vs Simplicity-III • Consider approximating N(100, 25²) using a histogram. We define a “discrepancy” between the predicted and true distributions using:

Complexity vs Simplicity-IV The optimal number of bins increases with N
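
The trade-off in the histogram example can be explored with a small simulation. The sketch below is only illustrative: the discrepancy formula from the slide is not reproduced here, so the code assumes an integrated squared error between the histogram density and the true N(100, 25²) density. The fixed range [0, 200], the candidate bin counts, the number of replicates and all function names are likewise assumptions.

```python
import math
import random

LO, HI = 0.0, 200.0      # assumed histogram range (mean +/- 4 SD)
MU, SD = 100.0, 25.0     # true distribution: N(100, 25^2)

def true_density(x):
    """Density of the true normal distribution at x."""
    z = (x - MU) / SD
    return math.exp(-0.5 * z * z) / (SD * math.sqrt(2.0 * math.pi))

def histogram_density(data, n_bins):
    """Equal-width histogram on [LO, HI), scaled to a density."""
    width = (HI - LO) / n_bins
    counts = [0] * n_bins
    for x in data:
        if LO <= x < HI:
            counts[min(int((x - LO) / width), n_bins - 1)] += 1
    return [c / (len(data) * width) for c in counts]

def discrepancy(data, n_bins, n_grid=400):
    """Assumed discrepancy: integrated squared error (midpoint rule)."""
    dens = histogram_density(data, n_bins)
    step = (HI - LO) / n_grid
    total = 0.0
    for i in range(n_grid):
        x = LO + (i + 0.5) * step
        b = min(int((x - LO) / (HI - LO) * n_bins), n_bins - 1)
        total += (dens[b] - true_density(x)) ** 2 * step
    return total

def best_bin_count(n_points, candidates=(2, 4, 6, 8, 10, 15, 20, 30, 40, 60),
                   n_reps=20, seed=1):
    """Bin count with the lowest mean discrepancy over simulated data sets."""
    rng = random.Random(seed)
    mean_disc = {b: 0.0 for b in candidates}
    for _ in range(n_reps):
        data = [rng.gauss(MU, SD) for _ in range(n_points)]
        for b in candidates:
            mean_disc[b] += discrepancy(data, b) / n_reps
    return min(mean_disc, key=mean_disc.get)

if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, best_bin_count(n))
```

Under these assumptions, few bins give a biased (over-smoothed) histogram and many bins give a noisy one, so the selected bin count should grow as the sample size increases.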

Model Selection • Model selection can be seen as evaluating the weight of evidence in favor of each hypothesis and using this to select among the hypotheses.

Model Selection (Nested Models) • A model is nested within another model if it is a special case of that model, e.g. • We can compare nested models (model B is nested within model A) using the likelihood ratio test: • R, the likelihood ratio statistic (twice the difference in log-likelihood between models A and B), is asymptotically χ² distributed with degrees of freedom equal to the difference in the number of parameters between models A and B.
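
A minimal sketch of the likelihood ratio test, assuming R is computed as 2(ln L_A − ln L_B) where L_A and L_B are the maximized likelihoods of the full model A and the nested model B. The function names and the example numbers are hypothetical; the χ² tail probability is computed with a standard recursion so that only the Python standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    # Start from the closed forms for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # ... and step up with Q(a + 1, h) = Q(a, h) + h^a e^(-h) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(loglik_full, p_full, loglik_nested, p_nested):
    """Return (R, degrees of freedom, p-value) for nested model B within model A."""
    r = 2.0 * (loglik_full - loglik_nested)
    df = p_full - p_nested
    return r, df, chi2_sf(r, df)

if __name__ == "__main__":
    # Hypothetical log-likelihoods: model A has 5 parameters, model B has 3.
    r, df, p = likelihood_ratio_test(-45.2, 5, -48.9, 3)
    print(f"R = {r:.2f}, df = {df}, p = {p:.4f}")
```

A small p-value suggests that the extra parameters of model A improve the fit by more than would be expected by chance if model B were true.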

Back to cod-I • Some alternative hypotheses: • The base-case model (1) is nested within models 2 and 3. Models 4 and 5 are nested within model 1.

Back to cod-II Log-likelihood (not the negative log-likelihood)

Model Selection (non-nested models) • The likelihood ratio test can only be applied to compare nested models. However, we often wish to compare non-nested models. We use the Akaike Information Criterion (AIC) to make such comparisons. • We compute the AIC (AICc for small sample sizes) for each model and choose the model with the lowest AIC.

Model Selection (non-nested models) • Choose the model with the lowest value of AIC. • Note that the data, Y, are the same for all models.
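
A short sketch of the AIC calculation, using the standard definitions AIC = −2 ln L + 2p and AICc = AIC + 2p(p + 1)/(n − p − 1), where ln L is the maximized log-likelihood, p the number of estimated parameters and n the number of data points. The function names and the example values are hypothetical.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion from a maximized log-likelihood."""
    return -2.0 * loglik + 2.0 * n_params

def aicc(loglik, n_params, n_obs):
    """Small-sample corrected AIC; requires n_obs > n_params + 1."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc needs n_obs > n_params + 1")
    correction = 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic(loglik, n_params) + correction

def lowest_score(scores):
    """Name of the model with the lowest criterion value."""
    return min(scores, key=scores.get)

if __name__ == "__main__":
    # Hypothetical fits of two non-nested models to the same 20 data points.
    scores = {
        "model_1": aicc(-50.0, 3, 20),
        "model_2": aicc(-48.5, 5, 20),
    }
    print(scores, lowest_score(scores))
```

The comparison is only meaningful when every model is fitted to the same data with the same form of likelihood.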

Comparing Growth Curves-I • We wish to compare the von Bertalanffy and logistic growth curves using simulated data (the true model is the von Bertalanffy curve). • We generate 100 data sets from the von Bertalanffy growth curve for each of several values of σ and count the fraction of cases in which the von Bertalanffy curve is correctly chosen.

Comparing Growth Curves-II Likelihoods (p = 4): von Bertalanffy = 20.25; logistic = 11.30; CV = 0.2

Comparing Growth Curves-III • The probability of correctly selecting the von Bertalanffy growth curve depends on σ (and the sample size). • Checking the reliability of model selection methods by simulation is often worth doing.
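
The simulation check described above can be sketched as follows. Everything specific here is an assumption made for illustration: ages 1 to 15, true parameters L∞ = 100, k = 0.2, t0 = 0, lognormal observation error with standard deviation sigma (roughly a CV), and a coarse grid search in place of a proper optimizer. Because both curves have the same number of parameters, the model with the lower AIC is simply the one with the smaller residual sum of squares on the log scale.

```python
import math
import random

AGES = list(range(1, 16))  # assumed sampling ages

def von_bert(a, k, t0):
    """von Bertalanffy shape (length relative to L-infinity)."""
    return 1.0 - math.exp(-k * (a - t0))

def logistic(a, k, t0):
    """Logistic shape (length relative to L-infinity)."""
    return 1.0 / (1.0 + math.exp(-k * (a - t0)))

def log_shape_grid(shape):
    """Log of the curve shape at every age for a grid of (k, t0) values."""
    grid = []
    for i in range(1, 21):           # k = 0.05, 0.10, ..., 1.00
        for j in range(-12, 33):     # t0 = -3.00, -2.75, ..., 8.00
            f = [shape(a, 0.05 * i, 0.25 * j) for a in AGES]
            if min(f) > 0.0:         # keep only curves positive at all ages
                grid.append([math.log(v) for v in f])
    return grid

def min_ssq(log_y, grid):
    """Smallest log-scale sum of squares; log L-infinity is profiled out."""
    n = len(log_y)
    best = float("inf")
    for log_f in grid:
        d = [y - f for y, f in zip(log_y, log_f)]
        mean_d = sum(d) / n          # best-fitting log L-infinity
        best = min(best, sum((v - mean_d) ** 2 for v in d))
    return best

def fraction_correct(sigma, n_sims=100, seed=1):
    """Fraction of simulated data sets for which von Bertalanffy is selected."""
    rng = random.Random(seed)
    vb_grid = log_shape_grid(von_bert)
    lg_grid = log_shape_grid(logistic)
    true_log = [math.log(100.0 * von_bert(a, 0.2, 0.0)) for a in AGES]
    wins = 0
    for _ in range(n_sims):
        log_y = [m + rng.gauss(0.0, sigma) for m in true_log]
        if min_ssq(log_y, vb_grid) < min_ssq(log_y, lg_grid):
            wins += 1
    return wins / n_sims

if __name__ == "__main__":
    for sigma in (0.05, 0.2, 0.5):
        print(sigma, fraction_correct(sigma))
```

Running the loop over several error levels and sample sizes gives a rough picture of how often the selection procedure can be trusted for data of a given quality.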

Model Selection – Miscellany-I • All model selection methods are based on the assumption that the likelihood function is correct. This may well not be the case. • Neither likelihood ratio nor AIC can be used to compare models that have different likelihood functions / use different data. • Check the residuals about the fits to the data for all models – it may be that none of the models are fitting the data. Model selection makes little sense if none of the models fit the data.

Model Selection – Miscellany - II • Rejecting models is not always a sensible thing to do. In some cases (e.g. examining the consequences of future management actions), consideration should be given to retaining complicated models even if they don’t provide “significant” improvements in fit. • “Model averaging” (e.g. giving a weight to each model, say proportional to exp(−AIC/2)) allows consideration of model uncertainty.
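
One common way to turn AIC values into model weights is the Akaike weight, which is proportional to exp(−ΔAIC/2), where ΔAIC is the difference from the lowest AIC in the set. The sketch below assumes that weighting; the function names and the example numbers are hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Weights proportional to exp(-0.5 * (AIC - min AIC)), summing to one."""
    best = min(aic_values)
    raw = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(raw)
    return [r / total for r in raw]

def model_average(predictions, weights):
    """Weighted average of one prediction per model."""
    return sum(p * w for p, w in zip(predictions, weights))

if __name__ == "__main__":
    # Hypothetical AIC values and forecasts from three candidate models.
    weights = akaike_weights([100.0, 102.0, 110.0])
    print(weights)
    print(model_average([0.10, 0.25, 0.40], weights))
```

Subtracting the minimum AIC before exponentiating does not change the weights but avoids numerical overflow or underflow when the AIC values are large.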

Model Selection – Miscellany - III • Always plot the fits of the different models. Even if one model is significantly better than another, the improved fit may be qualitatively “insubstantial”. • Some models that fit the data better do not provide more “realistic” results (e.g. estimating natural mortality, M, often leads to estimates of zero). • Likelihood ratio and AIC are frequentist approaches. Bayesian techniques are also available for model selection.

Readings • Hilborn and Mangel, Chapter 7 • Haddon, Chapter 3 • Linhart and Zucchini (1986) • Burnham and Anderson (1998) • Quinn and Deriso, Section 4.5
