460 likes | 596 Views
Example 1. Modelling catches of fish. The data file “catch-data.txt” contains observed catches of haddock from the North Sea. The data are measured with error, and the catches are expected to be auto-correlated. The true (unobserved) catches follow a simple auto-regressive model:
E N D
Example 1. Modelling catches of fish • The data file “catch-data.txt” contains observed catches of haddock from the North Sea. The data are measured with error, and the catches are expected to be auto-correlated. The true (unobserved) catches follow a simple auto-regressive model: • Where ε represents a process error that characterises the variability in the catches. • Assuming a log-normal error distribution for the observed catches Ŷ and a normal distribution for ε, write a WinBUGS program to estimate the fitted catches and the two error distributions. For the measurement error associated with Ŷ use a gamma prior for the precision, Gamma(.1,.1). Use the same prior for the process error. • Plot the fitted and observed catches as a time series. • Comment on the difference between process and measurement error.
model{ # set priors on model parameters tau.l~dgamma(.1,.1) # measurement error for catch tau.proc~dgamma(.1,.1) # process error for total catch log.y[y1]~dunif(10,15) # prior for initial total catch fit.l[y1]<-exp(log.y[y1]) # model fitted values for(i in y1+1:y2){ dy[i]~dnorm(0,tau.proc) # random error on total catch fit.l[i]<-fit.l[i-1]*exp(dy[i]) # multiplicative error on catch } # likelihoods for(i in y1:y2){ log.l[i]<-log(fit.l[i]) L[i]~dlnorm(log.l[i],tau.l) #lognormal errors for catches } } # end model # inital values list(tau.proc=100) Example code 1a
Example 1b Although the gamma distribution is commonly used as a non-informative prior for inverse variances Gelman(2006) suggests that it is sometimes better to use a uniform prior on the standard deviation. Re-run the model but for the process error use a uniform prior U(0,100) on the standard deviation; for example: sigma.proc~dunif(0,100) tau.proc<-1/pow(sigma.proc,2) What do you notice about the change in the estimate of the precision? Look at the history of the process error estimates. How well has the error been estimated?
Example 1c The total catches modelled above comprise two components- the landings which are recorded by regulatory authorities, and fish discarded at sea, the “discards”, that are usually estimated by observers on fishing vessels. The discards will have a different measurement error so the error estimated in part (a) or (b) above will not adequately capture this uncertainty. If we assume that the landings, D, are a proportion, p, of the total catch Y then: And the landings, L, are: Catch data for landings and discards are in the file “had-discard-data.txt”. Modify your WinBUGS code to take account of the two catch components with different error distributions for landings and discards. You should model the proportion,pt, as an annual random effect. A suitable prior for pt would be Beta(1,1). • Fit your revised model and make a note of the DIC. • Plot the fitted and observed landings as a time series. Also plot the fitted and observed proportion discarded as a time series. What do you notice? • Comment in the difference in the estimated error distribution for landings and discards. Is this what you expect to see?
model{ # set priors on model parameters tau.l~dgamma(.1,.1) # measurement error for landings tau.d~dgamma(.1,.1) # measurement error for discards tau.proc~dgamma(.1,.1) # process error for total catch log.y[y1]~dunif(10,15) # prior for initial total catch y.fit[y1]<-exp(log.y[y1]) for(i in y1:y2){ p[i]~dunif(0,1) # prior for proportion discarded each year } # model fitted values fit.l[y1]<-y.fit[y1]*(1-p[y1]) fit.d[y1]<-y.fit[y1]-fit.l[y1] for(i in y1+1:y2){ dy[i]~dnorm(0,tau.proc) # random error on total catch y.fit[i]<-y.fit[i-1]*exp(dy[i]) # multiplicative error on catch fit.l[i]<-y.fit[i]*(1-p[i]) # fitted landings fit.d[i]<-y.fit[i]-fit.l[i]# fitted discards } # likelihoods for(i in y1:y2){ log.l[i]<-log(fit.l[i]) log.d[i]<-log(fit.d[i]) L[i]~dlnorm(log.l[i],tau.l) #lognormal errors for landings and discards D[i]~dlnorm(log.d[i],tau.d) } } # end model Example code 1c
Example 1d • Modelling the proportions as random effects allows the model a lot of flexibility to fit the discard data which we know are subject to a large error. An alternative would be to use a similar time series smoothing approach as we did for the total catch. However, proportions are awkward because the scale is limited to the interval (0,1). One way around this problem is to use a logit transformation on p which has an approximately normal distribution. Hence we could write: Where ε is normally distributed. Check that this transformation does indeed approximate to a normal distribution by running the model: model{ p~dbeta(1,1) logitp<-logit(p) } And plotting the densities of p and logitp. Revise your model from section (a) to model the proportions as a time series on a logit scale. Note that logit(p)=log[p/(1-p)]. You can write logit(p) on the left hand side of a statement in WinBUGS so that you do not have to code the transformation yourself. For example: logit(p[i])<-logit(p[i-1])+eps[i] • Refit the model and plot the fitted and observed values of landings, discards and p as a time series. • Compare the DIC values of the two models. Which one is the preferred model?
model{ # set priors on model parameters tau.l~dgamma(.1,.1) # measurement error for landings tau.d~dgamma(.1,.1) # measurement error for discards tau.proc~dgamma(.1,.1) # process error for total catch tau.p~dgamma(.1,.1) # process error for proportion discarded log.y[y1]~dunif(10,15) # prior for initial total catch y.fit[y1]<-exp(log.y[y1]) p[y1]~dunif(.01,.9) # prior for proportion discarded in start year # model fitted values fit.l[y1]<-y.fit[y1]*(1-p[y1]) fit.d[y1]<-y.fit[y1]-fit.l[y1] for(i in y1+1:y2){ dy[i]~dnorm(0,tau.proc) # random error on total catch dp[i]~dnorm(0,tau.p) # random error on proportion discarded y.fit[i]<-y.fit[i-1]*exp(dy[i]) # multiplicative error on catch logit(p[i])<-logit(p[i-1])+dp[i] # error added to logit of proportion discarded fit.l[i]<-y.fit[i]*(1-p[i]) # fitted landings fit.d[i]<-y.fit[i]-fit.l[i] # fitted discards } # likelihoods for(i in y1:y2){ log.l[i]<-log(fit.l[i]) log.d[i]<-log(fit.d[i]) L[i]~dlnorm(log.l[i],tau.l) #lognormal errors for landings and discards D[i]~dlnorm(log.d[i],tau.d) } } # end model Example cod 1d
Example 1e In the data file replace the last year of landings and discard data with “NA” to simulate missing values. • Now refit the model but monitor both the fitted values and the missing data values. • Compare the standard deviations of the fitted values and the estimates of the missing values. Can you explain why these differ? • How do the predicted values compare with what was actually observed?
Missing values Missing values take into account the sampling distribution of the observed discards hence larger interval estimate.
Example 2. Analysis of multiple indices of abundance • The file “index-data-log.txt” contains indices of log abundance of the bonnethead shark from the East coast of the USA. The variable uhat is the log index of abundance. Each column represents a different survey and the rows correspond to years. Missing values are represented by “NA”. Plot the time series of uhat for each series. Do the various indices show the same trends and variability?
Example 2b The file “index-conn-full.txt” contains the WinBUGS code to fit the model proposed by Conn(2009) that extracts the common population signal in order to summarise the development of the biomass of the sharks over time. Examine the code to identify the nodes that correspond to the log population biomass and the survey process error. Fit the model and monitor the nodes you have identified, also calculating the DIC. Plot the fitted median biomass and 95% CI as a time series as well as the raw data from series 8 in the data file. Comment on the results. Which survey shows the lowest process error? • Hint: if you only need to monitor one column in a matrix you can specify it as a vector such as v[,3] to monitor column 3 only.
model { # prior for survey process error for(i in 1:s){ sigma.s[i]~dunif(0,1000) var.s[i]<-pow(sigma.s[i],2) # variance of index process error } b.mean<-mean(uhat[,sref]) # estimate of mean for biomass in the scale of the reference index # priors on catchability. for(iin 1:(sref-1)){ q[i]~dunif(-10,10) } for(iin (sref+1):s){ q[i]~dunif(-10,10) } q[sref]<-0 # set catchability on reference series to zero (log(q)) for identifiability # derive fitted index for (j in 1:year){ b[j]~dnorm(b.mean,.1) # annual populations are independent random variables B[j]<-exp(b[j]) } for (i in 1:s){ for(j in syr[i,1]:syr[i,2]){ u[j,i]<-q[i]+b[j] # fitted index values on log scale tau.s[j,i]<-1/(var.s[i]+1/u.prec[j,i]) # precision of observed values uhat[j,i]~dnorm(u[j,i],tau.s[j,i]) # likelihood U[j,i]<-exp(u[j,i]) } } # end } list(sigma.s=c(1,1,1,1,1,1,1,1,1) Multiple survey code
Example 2c The fitted biomass index in (b) reflects the variability of the observed survey index, especially in the earlier years when only one index is available. It is reasonable to assume that that the time series of abundance is auto-correlated. Modify the WinBUGS code to model the underlying biomass abundance as the simple time series model as was used in Example 1, and refit the model (remember to estimate the DIC). • Plot the new estimated biomass values and compare it to the values from the earlier model. • Is there an improvement in the model and why?
model { # priors for(i in 1:s){ sigma.s[i]~dunif(0,5) # process error for surveys var.s[i]<-pow(sigma.s[i],2) } b.mean<-mean(uhat[,sref]) sigma.proc~dunif(0,10) tau.proc<-1/pow(sigma.proc,2) # priors on catchability. for(iin 1:(sref-1)){ q[i]~dunif(-10,10) } for(iin (sref+1):s){ q[i]~dunif(-10,10) } q[sref]<-0 # set catchability on reference series to zero (log(q)) for identifiability # derive fitted index b[1]~dnorm(b.mean,.1) # initial population for (j in 1:year){ e.proc[j]~dnorm(0,tau.proc) b[j+1]<-b[j]+e.proc[j] # time series errors B[j]<-exp(b[j]) } # likelihood for (i in 1:s){ for(j in syr[i,1]:syr[i,2]){ u[j,i]<-q[i]+b[j] tau.s[j,i]<-1/(var.s[i]+1/u.prec[j,i]) uhat[j,i]~dnorm(u[j,i],tau.s[j,i]) # likelihood U[j,i]<-exp(u[j,i]) } } # end } list(sigma.proc=.5,sigma.s=c(1,1,1,1,1,1,1,1,1)) Example 2c
DIC • Modified model= 304.206 • Base model= 322.251
Example 3. Analysis of cod in seal diet The file “seals-cod-data.txt” contains data on the abundance of cod at each age in the sea for 1985 and 2002. It also contains the number of cod at each age in the diet of grey seals for the same years. The simplest model to explain the number of cod in the seal diet is that the proportion of cod in the diet is the same as the proportion of cod in the sea. WinBUGS code to fit such a model is in the file “seals-cod-base.txt”. Assuming that the observed numbers of cod in the diet are drawn from a multinomial distribution add the two lines of WinBUGS code to the model file to specify the likelihood for the data.
model{ # priors for population abundance for(j in 1:2){ for( i in 1:A){ log.n[j,i]<-log(n[j,i]) n.fit[j,i]~dlnorm(log.n[j,i],tau.n[i]) # abundance index } } # selectivity for seals for(j in 1:2){ for(i in 1:A){ s.sel[j,i]<-1+0*len[j,i] # len is dummy term to avoid data read error from file } } # calculate seal catch for(j in 1:2){ for(i in 1:A){ catch[j,i]<-n.fit[j,i]*s.sel[j,i] } } # proportion of fish at age in seal diet for(j in 1:2){ for(i in 1:A){ prop[j,i]<-catch[j,i]/sum(catch[j,1:A]) fit.comp[j,i]<-seal.sample[j]*prop[j,i] } } #Likelihoods for(j in 1:2){ seal.sample[j]<-sum(sealcomp[j,1:A]) # total number of fish in sample sealcomp[j,1:A]~dmulti(prop[j,1:A],seal.sample[j]) # seal diet from multinomial } } end model # initial value list(mode=45) Base model: new code
Example 3b and c Run the model you have completed in part 1 on the data and monitor the fitted numbers of cod in the seal diet (fit.comp). Also set WinBUGS to calculate the DIC. Plot the fitted numbers in the diet against the observed numbers. Do you consider this a good fit to the data? From the results in 2 calculate the residual (observed-fitted) for the numbers of cod in the seal diet and plot these against age. Do you notice anything about the residuals? Does this change your perception of how well the model fits the data?
Example 3d In the base model seal selectivity (s.sel) was set to one for all age groups. An alternative model is that this selectivity is related to size according to a gamma function: Where α and β are parameters and len is the length of fish at each age. Modify the WinBUGS code from the base model so that s.sel is a function of length. You will need to specify priors for α and β. A suitable prior for α is N(20,.1). Rather than specify a prior for β it is more convenient to specify the size at maximum selectivity, the mode of the selection curve, and note that β=mode/(α -1). A possible prior for the mode is N(45,.01) which represents weak knowledge of the mode of the distribution. Repeat steps 2 and 3 with your revised model. You should give an initial value to the mode of 45. Which model best explains the numbers of cod in the seal diet?
Example 3d code model{ # priors mode~dnorm(45,.01) mode.prior~dnorm(45,.01) # keep track of prior distribution alpha~dnorm(20,.1) for(j in 1:2){ for( i in 1:A){ log.n[j,i]<-log(n[j,i]) n.fit[j,i]~dlnorm(log.n[j,i],tau.n[i]) # abundance index } } # gamma selectivity curve for seals beta<-mode/(alpha-1) for(j in 1:2){ for(i in 1:A){ s.sel[j,i]<-pow(len[j,i]/((alpha-1)*beta),(alpha-1))*exp(alpha-1-len[j,i]/beta) } } # calculate seal catch for(j in 1:2){ for(i in 1:A){ catch[j,i]<-n.fit[j,i]*s.sel[j,i] } } # proportion of fish at age in seal diet for(j in 1:2){ for(i in 1:A){ prop[j,i]<-catch[j,i]/sum(catch[j,1:A]) fit.comp[j,i]<-seal.sample[j]*prop[j,i] } }
Selectivity model DIC Base: 57.844 Selectivity: 37.266
Example 4. Analysis of a fish stock The file “stock-model.txt” contains WinBUGS code to fit a standard age-structured population model to data on catch and an abundance index. The model will estimate mortality rates due to fishing and populations of fish at each age by year. Load the data in the file “hadVIa-data1.txt” and run the model and monitor the nodes for fishing mortality “F”and “n.fit[,1]” the log of recruitment. Make sure you load the initial values listed at the end of the model code. Plot the median and 95%CI for these values. • What do you notice about the width of the 95%CIs? • Can you think of an explanation for these characteristics?
model { # priors q~dunif(-10,0) # catchability of survey tau.f~dgamma(.1,.1) # process error for fishing effort f[1]~dunif(-2,0) # prior on initial value of fishing effort sel[1]~dbeta(1,1) # uniform prior in selectivity at age 1 for (i in 1:A){ tau.u[i]~dgamma(0.1,0.1) # measurement error for index tau.c[i]~dgamma(0.1,0.1) # measurement error for catch numbers } for (i in 1:Y){ n.fit[i,1]~dunif(3,20) # prior for annual recruitment at age 1 } for (i in 2:A){ n.fit[1,i]~dunif(3,20) # prior for populations in first year } M<-0.2 # set selectivity at age for(i in 2:A){ sel[i]<-1 } # fill natural mortality matrix for (i in 1:A){ for (j in 1:Y){ m[j,i]<-M # treats natural mortality as a known constant } } Stock model code
Example 4b The model assumes that the mortality due to natural causes, M, is a fixed constant equal to 0.2, the conventional value often used by fishery scientists. Modify the code so that M for each age group is treated as an unknown parameter to be estimated by the model. A suitable prior for M is a uniform distribution U(0.1,0.8). Load the data in the file “hadVIa-data1.txt”, compile the model and load the initial values for q and tau.f listed at the end of the code. Run the model for 10000 iterations and monitor M. • After fitting the model have the posterior distributions of M changed? • Is there any information in the data to estimate M?
Example 4c In a meta-analysis of fish stocks worldwide, Lorenzen (1996) established a relationship between body size and natural mortality described by the function: Where b and c are constants, and w is the weight in kilogrammes. Lorenzen estimated the mean and precision for c as N(3.69,4) and b as N(-0.305,1250). Modify your model code to describe M as a function of weight using Lorenzen’s estimates of b and c as priors for the constants in the equation. Using the data in the file “hadVIa-data2.txt”, re-run the model and monitor natural mortality at age “m”, b and c. You should also include in your code statements to store the prior distributions of b and c such as: b.prior~dnorm(-0.305,1250) c.prior~dnorm(3.69,4) If you monitor the priors you will be able to compare the priors with the posteriors of b and c. • How much have the priors been updated? • How do the values of m compare with the values you estimated in part 1? • Do you think that the analysis using this relationship is to be preferred over the assumption of M=0.2 and why?
Example 4d An extremely useful property of WinBUGS is that it makes the calculation of derived quantities and their associated credible intervals very easy. Fishery scientists and managers are often interested in the total biomass of fish of reproductive age (the spawning stock biomass or SSB) since this is a measure of the ability of the stock to replace itself. SSB is a derived quantity calculated from the model estimates of numbers at age. It can be calculated from the formula Where N is the number at age, p is the proportion mature and w is the weight at age. Modify the WinBUGS code to calculate SSB for the haddock stock. The data are in the file “hadVIa3.txt”, where the variables “mat” and “wt” corresponds to p and w. Re-run the model and monitor SSB. Plot the time series of SSB and its associated 95% CI. • How might you do this calculation using a conventional likelihood approach?
Example 4e Very often in environmental science it is necessary to establish whether a particular quantity exceeds a threshold value. For example, standards are often set for the acceptable concentration of a contaminant in river water. Measurements of such quantities inevitably are made with error, so it is important to quantify any uncertainty associated with exceeding the threshold. In fisheries management the regulatory bodies are concerned about the probability that the stock is greater than a minimum SSB and of being below a maximum fishing mortality rate (F). For the stock to be in a good state it should be above the SBB threshold and below the F threshold. WinBUGS makes it very easy to calculate these probabilities using the “step” function. The two thresholds for haddock are SSB=150000 and F=0.3. Modify your code to calculate the probability of exceeding the SSB target and being below the F target in the most recent year (year 10) and then re-run the model to monitor these nodes. Example code might be: ps<-step(SSB[10]-150000) pf<-step(0.3-F[10]) Plot the distribution of ps and pf. • What do you conclude about the status of the stock in relation to the reference values? • Instead of using the step function and the additional WinBUGS code, can you think of another way of performing the same calculation by storing the MCMC results from WinBUGS?
Reference points node mean sd MC error 2.5% median 97.5% start sample pf 0.201 0.4007 0.02376 0.0 0.0 1.0 4001 6000 ps 0.8792 0.3259 0.01461 0.0 1.0 1.0 4001 6000