Fisheries Research 222 (2020) 105411
Technical note
Parameterizations for Bayesian state-space surplus production models
John K. Best a,*, André E. Punt b
a Quantitative Ecology and Resource Management, Box 357941, University of Washington, Seattle, WA, 98195-7941, USA
b School of Aquatic and Fisheries Sciences, Box 355020, University of Washington, Seattle, WA, 98195-5020, USA
* Corresponding author. E-mail address: [email protected] (J.K. Best).
ARTICLE INFO
Handled by: George A. Rose
Keywords: Bayesian; Markov chain Monte Carlo; surplus production model; diagnostics; state-space; Stan
Received 8 April 2019; Received in revised form 8 October 2019; Accepted 9 October 2019

ABSTRACT
Bayesian state-space surplus production models are commonly applied in fisheries stock assessment when the only information available is an index of relative abundance. However, even relatively simple models such as these can be computationally expensive to fit, and diagnosing poor fits can be difficult. The Stan software package provides an advanced Markov chain Monte Carlo sampler and diagnostics that are not available in other packages for fitting Bayesian models. Here the sampler diagnostics, efficiency, and posterior inferences are compared among multiple parameterizations of a state-space biomass dynamics model, using both Pella-Tomlinson and Schaefer dynamics. Two parameterizations that prevent predictions of negative biomass are introduced, one of which allows for errors in catch. None of the parameterizations avoided diagnostic warnings when the default sampler parameter values were used. Choosing an appropriate parameterization of a model, and paying attention to these diagnostics, can increase computational efficiency and make inferences more robust.
1. Introduction
Bayesian state-space surplus production models are used extensively for assessing data-limited fisheries. These models have been fit using general-purpose Markov chain Monte Carlo [MCMC] software packages since the late 1990s (Meyer and Millar, 1999). Advances in MCMC algorithms promise both increased computational efficiency and new diagnostic tools, but the specifics of a model's implementation can affect computational cost as well as confidence in the resulting inferences.

1.1. Posterior distributions and MCMC diagnostics

State-space models are a class of hierarchical model in which the quantity of interest cannot be observed directly, but must be inferred from noisy, transformed observations. The unobserved state changes through time according to a model of the system's dynamics that includes a stochastic component. Each parameter of the observation and dynamics processes is given a prior, which completes the specification of the Bayesian model. Here the observations are measures of catch per unit effort [CPUE] or another index of relative abundance. These are assumed proportional to stock biomass, which is the unobserved quantity of interest. Biomass changes through time according to a surplus production model such as the Pella-Tomlinson (Pella and Tomlinson, 1969). Priors can be specified based on information from closely related species, domain knowledge, or with the goal of being uninformative.
Posterior distributions for these state-space models cannot be determined analytically due to the nonlinearity of the production dynamics and the large number of parameters. A common method for fitting these models is Markov chain Monte Carlo. Implementing MCMC methods from scratch can be a research project in its own right, so many practitioners use general-purpose MCMC packages, of which Bayesian Inference Using Gibbs Sampling [BUGS] (Gilks et al., 1994) was an early example. While its popularity has declined in recent years, updated versions of BUGS such as WinBUGS (Lunn et al., 2000) and OpenBUGS (Lunn et al., 2009), along with Just Another Gibbs Sampler [JAGS] (Plummer, 2003), have seen wide use across many scientific disciplines (Monnahan et al., 2017). The aim of an MCMC algorithm is to generate samples from the posterior distribution, which can then be used for inference. Posterior samples are obtained by proposing a new set of parameter values, starting from the current set. Proposals are accepted or rejected based on the ratio of the posterior density at the proposed and current parameter vectors, so the chain tends to move towards parameter values with higher posterior density.
This process repeats, moving the chain through the parameter space. If proposed parameter values are always close to the current values, they will be accepted more often; the trade-off is that many more iterations are needed to explore the whole parameter space. It is therefore important to evaluate whether the posterior samples obtained are representative of the posterior before they are used for inference. Relatedly, autocorrelation among the draws must also be considered, because higher correlation between subsequent posterior samples reduces the amount of information each sample provides.

An MCMC algorithm that can make proposals far from the current parameter values that are still likely to be accepted will sample from the posterior distribution more efficiently. One such algorithm is the No-U-Turn Sampler [NUTS] (Hoffman and Gelman, 2014); see Monnahan et al. (2017) for an introductory description of Hamiltonian Monte Carlo and NUTS, and Betancourt (2017) for an in-depth description of both the theory and implementation of these algorithms. The most popular implementation of NUTS is in the Stan software package (Carpenter et al., 2017). NUTS is a self-tuning variant of Hamiltonian Monte Carlo [HMC] (Neal, 2011), which uses an analogue of a physical system to generate MCMC proposals. A hypothetical particle starting at the current parameter values is given a momentum, and its path is found by solving the system of differential equations defined by the Hamiltonian. The gradient of the log posterior density governs the change in potential energy, and hence the momentum, of the hypothetical particle.

Because the system of differential equations is solved numerically, the path of the particle (the parameter vector) is represented as a series of discrete steps. The step size (in time) is a key tuning parameter of HMC algorithms and determines the magnitude of the discretization error, which results in a failure to conserve energy in the analogous physical system and reduces the probability of acceptance. Posteriors with large gradients require small steps to avoid large discretization errors and low acceptance rates, while posteriors with small gradients allow larger step sizes that explore the parameter space more efficiently. NUTS uses a warmup phase to self-tune the step size, accounting for the magnitudes of the gradients encountered during warmup. However, if the magnitude of the gradients changes substantially across the parameter space, the chosen step size is necessarily a compromise between discretization error and exploration efficiency. Proposal paths that enter parts of the parameter space with large gradients can accumulate so much discretization error that they will never be accepted. This is known as a divergent transition, and it indicates that some areas of the posterior may be difficult to sample. The presence of divergent transitions therefore provides a diagnostic tool for evaluating MCMC samples. The step size in NUTS can be tuned indirectly by adjusting the target acceptance rate, where higher target acceptance rates tend to result in smaller step sizes. This may eliminate divergent transitions if the increase in gradient is limited, but reparameterization is the only option if the gradient increase is unbounded.

The second tuning parameter in HMC is the number of steps to take. Additional steps allow for proposals further from the current parameter vector, but at the cost of additional evaluations of the log posterior density and its gradient. The number of steps is dynamic in NUTS: a single step is taken, then two, doubling each time until the path turns back on itself (a U-turn).
Steps in the part of the path where the U-turn occurs are not considered as potential proposals, which is where the sampler gets its name. The number of doublings is limited, and the second diagnostic available from NUTS is a maximum treedepth exceedance, which occurs when the maximum number of doublings is reached without a U-turn. This indicates that some area of the posterior is too flat to be explored efficiently using the step size chosen during the warmup phase. Although divergent transitions can only be flagged when using variants of HMC, they indicate a posterior geometry that most samplers will have difficulty sampling from. Riemannian manifold Hamiltonian Monte Carlo (Girolami and Calderhead, 2011) can account for changing gradients throughout the posterior, but general-purpose implementations are not currently available.

1.2. Bayesian state-space surplus production models

Bayesian state-space surplus production models are often used for stock assessments of fisheries where data are limited to catches and indices of abundance such as CPUE. These models are applied to pelagic species such as swordfish (e.g., ICCAT, 2017a), blue shark (e.g., ISC, 2017), shortfin mako shark (ICCAT, 2017b), and albacore (ICCAT, 2017c). They have also been used for invertebrates such as tiger prawns (Zhou et al., 2010) and marine mammals such as fin whales (Moore and Barlow, 2011). Software packages such as BSP2 (McAllister, 2014) and Just Another Bayesian Biomass Assessment [JABBA] (Winker et al., 2018) simplify the formulation, fitting, and evaluation of these models. JABBA uses general-purpose MCMC software; its model is specified in the BUGS language and fit using JAGS.

Surplus production models such as the Pella-Tomlinson (Pella and Tomlinson, 1969) used here specify a functional relationship for population change between consecutive years. These models are typically chosen for their simplicity and for their behavior in the regimes where they are expected to operate, such as when a population is at or below carrying capacity. However, these models can exhibit pathological behavior in some situations, which manifests as predictions of negative biomass in subsequent time steps. Predictions of negative biomass can occur if the current estimate of biomass is less than the observed catch, or if the current biomass is so large that the density dependence term predicts negative production larger in magnitude than the current biomass less the catch.

One obvious solution to predictions of negative biomass is to set any negative predictions to some minimum value. Practically, this is also important when assuming log-normal process and observation errors (as in the models presented here), because the natural logarithm of the predicted biomass is needed to compute the likelihood function, so the prediction must be greater than zero. The minimum value is typically chosen arbitrarily. Sometimes upper bounds are included as well, also arbitrarily (e.g., Meyer and Millar, 1999). An extension of this is the use of soft bounds that penalize estimates outside a particular range, which is an option in JABBA.

Two novel approaches to preventing predictions of negative biomass are presented here. The first approach estimates the fishing mortality rate for each year and allows for errors in observed catch. Actual catch is then automatically constrained to be less than the available biomass, preventing predictions of negative biomass. This approach can be used regardless of the surplus production model. The second approach uses the intrinsic growth rate, carrying capacity, and catch in the next time step to dynamically set bounds on the current depletion (biomass expressed relative to carrying capacity), preventing negative predictions. These bounds can be calculated analytically when a Schaefer surplus production model (Schaefer, 1954) is used. Other forms such as the Pella-Tomlinson model would require a numerical root-finding step, and are not considered here.

It is more important to deal with this issue in variants of HMC than in other MCMC samplers because the proposal paths that Stan builds up can reach areas of the parameter space that have low posterior density on the way to areas that are more likely to be accepted. If a proposal path encounters an impossible set of parameter values, it is stopped and cannot continue building the proposal. It is therefore good practice to specify models that constrain parameters to possible values; the most common case is constraining variance parameters to be positive. Techniques such as noncentering (Betancourt and Girolami, 2015; Papaspiliopoulos et al., 2007) can be used to overcome divergent transitions. This kind of reparameterization can reduce parameter correlations and changes in the gradient across the parameter space. Other parameterizations commonly used in fisheries involve marginalizing out one or more parameters, as presented in Walters and Ludwig (1994).
1.3. The aim of this paper

Five parameterizations of a Bayesian state-space model with a Pella-Tomlinson surplus production model are fit to catch and effort data for South Atlantic albacore. This is done both while estimating the Pella-Tomlinson shape parameter, and while fixing it to a value where depletion at maximum sustainable yield is 0.4. The same five parameterizations, plus a parameterization that constrains depletion, are fit when the surplus production model has the Schaefer form. We evaluate the sampler diagnostics for each parameterization and how to improve them, and we compare the computational efficiency of the parameterizations.

2. Methods

2.1. Model specification

The nonlinear Bayesian state-space surplus production model is specified as a hierarchical model. The observed catch per unit effort I_t at time t is assumed to be proportional to the true abundance through the catchability coefficient q. Further, catch per unit effort is assumed to be observed with log-normal error with variance parameter τ², so that:
I_t \mid B_t, q, \tau^2 \sim \text{log-normal}\!\left(\log[q B_t], \tau^2\right), \quad t = 1, \ldots, T. \qquad (1)
The Pella-Tomlinson surplus production model is used to model the population dynamics. The intrinsic growth rate is denoted r and the carrying capacity K. The Pella-Tomlinson shape parameter is m. T consecutive years of catch data are assumed to be available, where C_t is the catch observed at time t and B_t is the biomass at time t. Depletion (P_t, where P_t = B_t / K) is the unobserved state; this separates the estimation of the dynamics from that of the carrying capacity. Median depletion in year t is P̃_t, and the median depletion of the unfished population (i.e., P̃_1) is assumed to be 1. Independent, multiplicative (log-normal) process errors with variance parameter σ² are included for each year. The population dynamics process is then:
\tilde{P}_1 = 1 \qquad (2)

\tilde{P}_t = P_{t-1} + \frac{r}{m-1} P_{t-1}\left(1 - P_{t-1}^{\,m-1}\right) - \frac{C_{t-1}}{K}, \quad t = 2, \ldots, T \qquad (3)

P_t \mid \tilde{P}_t, \sigma^2 \sim \text{log-normal}\!\left(\log \tilde{P}_t, \sigma^2\right), \quad t = 1, \ldots, T. \qquad (4)

Priors match those used in Meyer and Millar (1999), which were based on a review of the literature for South Atlantic albacore and an attempt to match particular quantiles (as described in the appendix of Meyer and Millar (1999)). Catchability is given an improper, noninformative prior. The priors are

r \sim \text{log-normal}(-1.38, 0.51^2) \qquad (5)

K \sim \text{log-normal}(5.04, 0.5162^2) \qquad (6)

p(q) \propto 1/q \qquad (7)

\sigma^2 \sim \text{inverse-gamma}(3.79, 0.0102) \qquad (8)

\tau^2 \sim \text{inverse-gamma}(1.71, 0.0086). \qquad (9)

Models where the Pella-Tomlinson shape parameter (m in equation (3)) is estimated require an additional prior. This parameter can be difficult to estimate (Fletcher, 1978), and is often fixed by choosing a plausible relative biomass at which maximum sustainable yield is achieved, P_MSY. The relationship between m and P_MSY is given by P_MSY = m^{-1/(m-1)}. Similar species have used a fixed shape parameter value corresponding to a depletion at maximum sustainable yield of 0.4 (e.g., Winker et al., 2018). For this analysis a vaguely informative prior was placed on m by examining the implicit prior on P_MSY. It was parameterized such that the median of the implicit prior on P_MSY is 0.4, allowing a wide range of values of P_MSY while penalizing small values of m that imply implausible growth rates at low population sizes. The prior is specified as log(m) ~ skew-normal(-1/2, 1, 10), where the three parameters of the skew normal distribution are location, scale, and shape, respectively. See Supplementary Appendix B for more details.

Here the posterior density is specified in terms of m, but a prior distribution is specified for log(m). Scalar transformation of a random variable requires a correction to the posterior density equal to the absolute value of the derivative of the transformation function (Gelman et al., 2013, pg. 21); multivariate transformations use the determinant of the matrix of first partial derivatives of each transformation function with respect to each parameter, known as the Jacobian. The log-transformation here, for example, places all of the probability mass in the interval -∞ < log(m) < 0 in the interval 0 < m < 1, while the probability mass in the interval 0 < log(m) < ∞ ends up in the interval 1 < m < ∞. This is accounted for by multiplying the posterior density by a factor of |d log(m)/dm| = 1/m. Note that linear transformations have a correction factor that is constant, so no correction is required in the context of MCMC, where the posterior density is only needed up to a constant of proportionality.

The posterior distribution of the parameter values combines the information from each of these components. Observation likelihoods are assumed independent conditional on the process model, and process errors are also assumed independent. Although the statistical model is fully defined here, there are multiple parameterizations to consider, each with specific performance characteristics in terms of diagnostics and computational efficiency. The six parameterizations are summarized in Table 1.

Table 1. Features of each parameterization. The first column indicates whether process error was centered (C) or noncentered (NC). The "P > 0" column indicates the strategy used to prevent predictions of negative depletion: replacing negative predictions with a small positive value (Small val.), explicitly estimating fishing mortality (Expl. F), or truncating above and below at the values where depletion becomes negative (Pos. trunc.). The fourth column indicates whether catchability (q) was analytically marginalized out, and the final column whether the parameterization was fit only with Schaefer dynamics.

Parameterization       Process error   P > 0         Marginalized q   Schaefer only
Centered               C               Small val.
Noncentered            NC              Small val.
Marginal q             C               Small val.    ×
Explicit F             C               Expl. F
Explicit F marg q      C               Expl. F       ×
Constrained P          C               Pos. trunc.                    ×

2.1.1. Centered model

The centered model corresponds directly to the model specified above. To avoid estimates of negative depletion, and the attempt to take the logarithm of a negative number, a lower bound of 0.001 is placed on the median depletion, i.e., equation (3) is modified to:

\tilde{P}_t = \max\left\{ P_{t-1} + \frac{r}{m-1} P_{t-1}\left(1 - P_{t-1}^{\,m-1}\right) - \frac{C_{t-1}}{K},\; 0.001 \right\}, \quad t = 2, \ldots, T.
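As a concrete illustration, a minimal Stan program for the centered parameterization might look as follows. This is a sketch of equations (1)-(9) with the bounded median above, not the code released with this paper (that code is available in the repository linked in Section 2.3); the variable names, the vectorized sampling statements, and the direct handling of the improper catchability prior are assumptions made here for brevity.

```stan
// Minimal sketch of the centered parameterization (Pella-Tomlinson dynamics).
data {
  int<lower=2> T;            // number of years
  vector[T] C;               // observed catch
  vector[T] I;               // observed CPUE index
}
parameters {
  real<lower=0> r;           // intrinsic growth rate
  real<lower=0> K;           // carrying capacity
  real<lower=0> q;           // catchability
  real<lower=0> m;           // Pella-Tomlinson shape (m = 1 is a singular point)
  real<lower=0> sigma2;      // process error variance
  real<lower=0> tau2;        // observation error variance
  vector<lower=0>[T] P;      // depletion (unobserved state)
}
model {
  vector[T] P_med;           // median depletion
  // Priors (equations 5-9); improper p(q) propto 1/q added directly to the target.
  r ~ lognormal(-1.38, 0.51);
  K ~ lognormal(5.04, 0.5162);
  target += -log(q);
  sigma2 ~ inv_gamma(3.79, 0.0102);
  tau2 ~ inv_gamma(1.71, 0.0086);
  // Skew-normal prior on log(m) with the Jacobian adjustment described in the text.
  target += skew_normal_lpdf(log(m) | -0.5, 1, 10) - log(m);
  // Process model: median depletion bounded below at 0.001.
  P_med[1] = 1;
  for (t in 2:T)
    P_med[t] = fmax(P[t-1] + r / (m - 1) * P[t-1] * (1 - P[t-1]^(m - 1))
                    - C[t-1] / K, 0.001);
  P ~ lognormal(log(P_med), sqrt(sigma2));
  // Observation model: CPUE proportional to biomass B_t = K * P_t.
  I ~ lognormal(log(q * K * P), sqrt(tau2));
}
```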
2.1.2. Noncentered process noise

Noncentering (Papaspiliopoulos et al., 2007) is a technique that is commonly used in additive hierarchical models to improve MCMC sampler efficiency and reduce the potential for parameter bias (Betancourt and Girolami, 2015). In a hierarchical model, noncentering decorrelates parameters from their variance parameter. In this case the process noise is noncentered, so that equations (2), (3), and (4) are replaced by:

u_t \sim \text{normal}(0, 1), \quad t = 1, \ldots, T

P_1 = \exp(\sigma u_1)

P_t = \max\left\{ P_{t-1} + \frac{r}{m-1} P_{t-1}\left(1 - P_{t-1}^{\,m-1}\right) - \frac{C_{t-1}}{K},\; 0.001 \right\} \exp(\sigma u_t), \quad t = 2, \ldots, T.
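A sketch of how the noncentered process noise can be expressed in Stan is given below. For brevity it assumes the shape parameter m is fixed and passed as data, with the remaining priors as in the centered sketch; these simplifications, and the variable names, are assumptions rather than the released code.

```stan
// Sketch of the noncentered parameterization: the standard-normal innovations
// u_t are sampled, and depletion P_t is a deterministic transform of them.
data {
  int<lower=2> T;
  vector[T] C;               // observed catch
  vector[T] I;               // observed CPUE index
  real<lower=0> m;           // fixed Pella-Tomlinson shape (2 gives Schaefer)
}
parameters {
  real<lower=0> r;
  real<lower=0> K;
  real<lower=0> q;
  real<lower=0> sigma2;      // process error variance
  real<lower=0> tau2;        // observation error variance
  vector[T] u;               // standard-normal process innovations
}
transformed parameters {
  vector[T] P;               // depletion implied by the innovations
  P[1] = exp(sqrt(sigma2) * u[1]);
  for (t in 2:T)
    P[t] = fmax(P[t-1] + r / (m - 1) * P[t-1] * (1 - P[t-1]^(m - 1))
                - C[t-1] / K, 0.001) * exp(sqrt(sigma2) * u[t]);
}
model {
  r ~ lognormal(-1.38, 0.51);
  K ~ lognormal(5.04, 0.5162);
  target += -log(q);                       // improper p(q) propto 1/q
  sigma2 ~ inv_gamma(3.79, 0.0102);
  tau2 ~ inv_gamma(1.71, 0.0086);
  u ~ normal(0, 1);                        // replaces the log-normal state likelihood
  I ~ lognormal(log(q * K * P), sqrt(tau2));
}
```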
2.1.3. Marginalized catchability

An uninformative prior is placed on catchability in equation (7), using Jeffreys' reference prior (Millar, 2002). This prior takes a form that is amenable to analytic marginalization. Following Walters and Ludwig (1994), the observation likelihood, equation (1) above, is replaced by

Z_t = \log\!\left(\frac{I_t}{P_t K}\right), \quad t = 1, \ldots, T, \qquad (10)

\hat{q} = \frac{1}{T} \sum_{t=1}^{T} Z_t, \qquad (11)

and

Z_t \sim \text{normal}(\hat{q}, \tau^2), \quad t = 1, \ldots, T. \qquad (12)

Here q̂ acts as an estimate of the mean of q′ = log(q). With catchability marginalized out, an explicit prior is no longer necessary, and equation (7) is no longer included in the calculation of the posterior density. The rest of the structure from the centered parameterization remains the same.
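In Stan this marginalization amounts to computing the Z_t and their mean inside the model and dropping q from the parameter vector. A sketch, again assuming m is fixed as data and the centered process model, is given below; it is illustrative only.

```stan
// Sketch of the marginalized-catchability parameterization: q is removed from
// the parameters and the index likelihood is written in terms of the Z_t
// deviations and their mean (equations 10-12).
data {
  int<lower=2> T;
  vector[T] C;
  vector[T] I;
  real<lower=0> m;
}
parameters {
  real<lower=0> r;
  real<lower=0> K;
  real<lower=0> sigma2;
  real<lower=0> tau2;
  vector<lower=0>[T] P;
}
model {
  vector[T] P_med;
  vector[T] Z;
  real q_hat;
  r ~ lognormal(-1.38, 0.51);
  K ~ lognormal(5.04, 0.5162);
  sigma2 ~ inv_gamma(3.79, 0.0102);
  tau2 ~ inv_gamma(1.71, 0.0086);
  // Centered process model with the 0.001 lower bound on median depletion.
  P_med[1] = 1;
  for (t in 2:T)
    P_med[t] = fmax(P[t-1] + r / (m - 1) * P[t-1] * (1 - P[t-1]^(m - 1))
                    - C[t-1] / K, 0.001);
  P ~ lognormal(log(P_med), sqrt(sigma2));
  // Index likelihood with catchability marginalized out; no prior on q needed.
  Z = log(I) - log(P * K);
  q_hat = mean(Z);
  Z ~ normal(q_hat, sqrt(tau2));
}
```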
2.1.4. Explicit fishing mortality

Another way to eliminate predictions of negative biomass is to estimate the fishing mortality rate for each year, and to include the likelihood of the observed catch given the instantaneous fishing mortality. A similar parameterization was presented in the context of a continuous-time model by Pedersen and Berg (2017), but has not been considered in a discrete-time model. Assuming multiplicative errors, a log-normal observation likelihood is specified for reported catch. Because true catch is based on an estimated fishing mortality rate, it is important to consider the biomass that is being fished; here fishing occurs on the biomass after production. For instantaneous fishing mortality F_t, we replace equations (2) and (3) with

\tilde{P}_1 = 1 \qquad (13)

\tilde{P}_t = \left[ P_{t-1} + \frac{r}{m-1} P_{t-1}\left(1 - P_{t-1}^{\,m-1}\right) \right] \exp(-F_{t-1}), \quad t = 2, \ldots, T. \qquad (14)

Thus, predicted depletion can only become negative due to density dependence, because fishing only ever removes a proportion of the existing biomass. This parameterization also allows for errors in catch observations. This contributes an additional term to the posterior density, where

C_t^* = K \left[ P_t + \frac{r}{m-1} P_t\left(1 - P_t^{\,m-1}\right) \right] \left(1 - \exp(-F_t)\right), \quad t = 1, \ldots, T \qquad (15)

C_t \mid C_t^*, \xi^2 \sim \text{log-normal}\!\left(\log C_t^*, \xi^2\right), \quad t = 1, \ldots, T. \qquad (16)

Each F_t receives a prior corresponding to a uniform prior on the proportion of biomass removed by fishing each year,

F_t \sim \text{Exponential}(1), \quad t = 1, \ldots, T. \qquad (17)

Inspired by the penalized complexity prior framework (Simpson et al., 2017), an exponential prior is placed on the coefficient of variation of the catch error, τ*. This shrinks the model towards a simpler model with no error in catch observations. The rate of the exponential prior on τ* was chosen so that Pr(τ* < 0.05) = 0.9. Then

\tau^* \sim \text{Exponential}(46), \qquad (18)

and the variance parameter of the catch observation likelihood is ξ² = log[(τ*)² + 1]. See Supplementary Appendix B for details.
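A Stan sketch of the explicit fishing mortality parameterization is shown below. As before, the shape m is fixed as data for brevity, and the symbol and index conventions (including which year's F enters the transition) follow the reconstruction in the text above; treat these as assumptions rather than the released implementation.

```stan
// Sketch of the explicit fishing mortality parameterization: yearly F_t are
// estimated and reported catch enters through its own log-normal likelihood
// (equations 13-18).
data {
  int<lower=2> T;
  vector[T] C;                     // reported catch
  vector[T] I;                     // observed CPUE index
  real<lower=0> m;
}
parameters {
  real<lower=0> r;
  real<lower=0> K;
  real<lower=0> q;
  real<lower=0> sigma2;
  real<lower=0> tau2;
  vector<lower=0>[T] F;            // instantaneous fishing mortality
  real<lower=0> tau_star;          // CV of catch observation error
  vector<lower=0>[T] P;            // depletion
}
model {
  vector[T] P_med;
  vector[T] C_star;                              // predicted catch
  real xi = sqrt(log(square(tau_star) + 1));     // SD of log catch error
  r ~ lognormal(-1.38, 0.51);
  K ~ lognormal(5.04, 0.5162);
  target += -log(q);
  sigma2 ~ inv_gamma(3.79, 0.0102);
  tau2 ~ inv_gamma(1.71, 0.0086);
  F ~ exponential(1);              // uniform prior on the proportion removed
  tau_star ~ exponential(46);      // Pr(tau_star < 0.05) is roughly 0.9
  // Dynamics: fishing removes a proportion of post-production biomass, so the
  // median depletion can only go negative through density dependence
  // (such proposals are rejected when the log is taken).
  P_med[1] = 1;
  for (t in 2:T)
    P_med[t] = (P[t-1] + r / (m - 1) * P[t-1] * (1 - P[t-1]^(m - 1)))
               * exp(-F[t-1]);
  P ~ lognormal(log(P_med), sqrt(sigma2));
  // Catch observation likelihood.
  for (t in 1:T)
    C_star[t] = K * (P[t] + r / (m - 1) * P[t] * (1 - P[t]^(m - 1)))
                * (1 - exp(-F[t]));
  C ~ lognormal(log(C_star), xi);
  // Index observation likelihood.
  I ~ lognormal(log(q * K * P), sqrt(tau2));
}
```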
2.1.5. Explicit fishing mortality with marginalized catchability

Initial model fits indicated that the parameter limiting efficiency in the explicit fishing mortality parameterization was catchability. An additional parameterization therefore combines the previous two parameterizations, estimating instantaneous fishing mortality and marginalizing out catchability. That is, the index observation likelihood is given by equations (10), (11), and (12), and the catch observation likelihood by equations (15) and (16). Biomass dynamics are governed by equations (13) and (14). The state likelihood is given by equation (4), and priors by equations (5), (6), (8), (9), (17), and (18).

2.1.6. Constrained depletion

Continuous harvest necessarily implies that the population has not been driven extinct. This knowledge allows bounds on the depletion parameters to be calculated, and these bounds can be derived analytically for the Schaefer surplus production model, a special case of the Pella-Tomlinson with m = 2. For the Schaefer surplus production model and given values of P_{t-1}, r, K, and C_t, solving the inequality

P_{t-1} + r P_{t-1}\left(1 - P_{t-1}\right) - \frac{C_t}{K} > 0 \qquad (19)

for P_{t-1} provides constraints that ensure that the subsequent predicted median depletion will be positive. The left-hand side of equation (19) is quadratic in P_{t-1}, so the bounds can be found using the quadratic formula. This gives

\frac{1 + r - \sqrt{(1 + r)^2 - 4 r C_t / K}}{2r} < P_{t-1} < \frac{1 + r + \sqrt{(1 + r)^2 - 4 r C_t / K}}{2r}. \qquad (20)

The Pella-Tomlinson surplus production model with m ≠ 2 does not generally allow an analytic solution for these bounds, so the constrained depletion parameterization is not considered for specifications that do not use Schaefer surplus production. In this parameterization, equation (4) is modified to enforce the constraints in equation (20). Because these bounds depend on other random variables, a Jacobian correction is required. This contributes T factors of

JC_t = \frac{\sqrt{(1 + r)^2 - 4 r C_t / K}}{r}, \quad t = 1, \ldots, T,

to the posterior density.
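One way this can be expressed in Stan is sketched below: each depletion is mapped from a raw (0, 1) parameter onto the interval in equation (20), the Jacobian of that map (the width of the interval, i.e. JC_t) is added to the target, and the state distribution is truncated to the same interval. The indexing convention for which year's catch bounds which depletion is an assumption made here so the sketch is internally consistent with equation (3); the released code may differ.

```stan
// Sketch of the constrained-depletion parameterization (Schaefer dynamics, m = 2).
data {
  int<lower=2> T;
  vector<lower=0>[T] C;
  vector[T] I;
}
parameters {
  real<lower=0> r;
  real<lower=0> K;
  real<lower=0> q;
  real<lower=0> sigma2;
  real<lower=0> tau2;
  vector<lower=0, upper=1>[T] P_raw;     // depletion on the unit interval
}
transformed parameters {
  vector[T] lb;
  vector[T] ub;
  vector[T] P;
  for (t in 1:T) {
    // Bounds from equation (20); a negative discriminant rejects the proposal.
    real disc = sqrt(square(1 + r) - 4 * r * C[t] / K);
    lb[t] = (1 + r - disc) / (2 * r);
    ub[t] = (1 + r + disc) / (2 * r);
    P[t] = lb[t] + (ub[t] - lb[t]) * P_raw[t];
  }
}
model {
  vector[T] P_med;
  r ~ lognormal(-1.38, 0.51);
  K ~ lognormal(5.04, 0.5162);
  target += -log(q);
  sigma2 ~ inv_gamma(3.79, 0.0102);
  tau2 ~ inv_gamma(1.71, 0.0086);
  // Jacobian of the map from P_raw to P: the T interval widths JC_t = ub - lb.
  target += sum(log(ub - lb));
  // Schaefer process model with the state distribution truncated to (lb, ub).
  P_med[1] = 1;
  for (t in 2:T)
    P_med[t] = P[t-1] + r * P[t-1] * (1 - P[t-1]) - C[t-1] / K;
  for (t in 1:T)
    P[t] ~ lognormal(log(P_med[t]), sqrt(sigma2)) T[lb[t], ub[t]];
  I ~ lognormal(log(q * K * P), sqrt(tau2));
}
```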
2.2. Data

The data set (available in Supplementary Appendix A) has been published several times, including by Polacheck et al. (1993) and Meyer and Millar (1999). It includes 23 years of observed catch and catch per unit effort data from the South Atlantic albacore fishery between 1967 and 1989. Catch is provided in thousands of tonnes, while catch per unit effort has units of kilograms per 100 hooks. A classic "one-way trip" is exhibited, where catch per unit effort decreases over time (Fig. 1).

Fig. 1. Catch and CPUE series for the South Atlantic albacore fishery from 1967 to 1989.

2.3. Fitting the parameterizations

The first five model parameterizations (Table 1) were fit with the
Pella-Tomlinson shape parameter m estimated and then again with it fixed. Additionally, all six parameterizations were fit with m fixed at 2 to specify a Schaefer surplus production model. A range of target acceptance rates between 0.8 (the default value in Stan) and 0.999 was used to evaluate the tradeoff between the number of divergent transitions and sampling efficiency. Initial runs indicated that the two explicit F parameterizations consistently exceeded the default maximum treedepth of 10, particularly at target acceptance rates closer to 1, so the maximum treedepth was increased to 20; there was no evidence that the increased treedepth was used by the other parameterizations. Four chains of 30,000 iterations each were run, with the first 5,000 designated as warmup, leaving 100,000 posterior samples for each fitted model. Each fit was performed using Stan v2.19.2 through the rstan interface (Stan Development Team, 2019) in R v3.6.1 (R Core Team, 2019). The Stan model code and R code used to produce these results are available at https://github.com/jkbest2/tunabayes/tree/v1.0.
2.4. Comparing parameterizations

Parameterizations and target acceptance rates were compared based on the number of divergent transitions and then on computational efficiency. The number of divergent transitions is saved as part of the MCMC output in Stan. Computational efficiency is less straightforward to measure. The NUTS algorithm requires many more evaluations of the posterior density per MCMC iteration than a traditional Metropolis sampler, but tends to produce chains with less autocorrelation. Chains with more effectively independent samples reduce Monte Carlo error more quickly, and so do not need to be run for as long. To account for both the decreased within-chain autocorrelation and the additional posterior density evaluations, an appropriate measure of computational efficiency is the rate at which effectively independent samples are produced. For d independent chains of length n, the effective sample size (ESS) can be calculated as (Gelman et al., 2013, p. 286)

\mathrm{ESS} = \frac{d\,n}{1 + 2\sum_{t=1}^{\infty} \rho_t}, \qquad (21)

where ρ_t is the lag-t autocorrelation. In practice, empirical autocorrelations and rules for truncating the infinite sum in the denominator of equation (21) are applied; further details are available in Section 11.5 of Gelman et al. (2013). This comparison uses the minimum ESS over all of the sampled parameters, because a low effective sample size for any parameter indicates that inferences for all parameters should not be trusted. The iterations and time elapsed during the warmup phase are not considered.

To compare posterior distributions, the model fit for each parameterization with the fewest divergent transitions was considered. If more than one fit had zero divergent transitions, the fit with the most efficient sampling was chosen for each combination of surplus production model (Pella-Tomlinson with estimated m, Pella-Tomlinson with fixed m, or Schaefer) and parameterization.

3. Results

Five parameterizations of the state-space model with a Pella-Tomlinson surplus production model were fit, once with an estimated shape parameter and once with a fixed shape parameter. Six parameterizations were fit with a Schaefer surplus production model: the five used with the Pella-Tomlinson surplus production model plus the constrained P parameterization. Each model was fit using a range of target acceptance rates. Sampler diagnostics, computational efficiency, and posterior distributions are compared among these fits.

3.1. Sampler diagnostics

Fig. 2 shows the number of divergent transitions for each model fit. Increasing the target acceptance rate tends to decrease the number of divergent transitions, as expected. The centered and marginal q parameterizations were fit with zero divergent transitions for all three surplus production models, as was the constrained P parameterization for the Schaefer surplus production model. All three of these parameterizations generated divergent transitions at the default target acceptance rate. The noncentered parameterization generated divergent transitions for the Pella-Tomlinson model with an estimated shape parameter and for the Schaefer surplus production model. Both explicit F parameterizations generated divergent transitions for all three surplus production models at all target acceptance rates. The number of divergent transitions does not generally appear to be affected by the type of surplus production model.

Fig. 2. Number of divergent transitions out of 100,000 total posterior samples. Note the different scales in each row. Fits with zero divergent transitions are plotted with filled circles.

3.2. Parameterization efficiency

The primary measure of efficiency is the rate at which effectively independent samples are produced. Fig. 3 shows the general decrease in efficiency as the target acceptance rate increases. The ranking of the models by efficiency is generally stable, with the marginal q parameterization the most efficient. The centered and noncentered parameterizations are comparable, with the centered consistently more efficient when the Schaefer surplus production model is used. However, the noncentered parameterization suffers from divergent transitions at higher target acceptance rates. The constrained P parameterization behaves similarly, but is consistently around an order of magnitude slower than the marginal q parameterization. The two explicit F parameterizations are very slow; neither is consistently more efficient than the other, and divergent transitions are present even at the highest target acceptance rate.

Fig. 3. Sampling efficiency. Note the logarithmic scale. Model fits with zero divergent transitions are plotted as filled circles.
3.3. Posterior distributions

The fit with the fewest divergent transitions and most efficient sampling was chosen to compare posterior distributions. The maximum potential scale reduction factor (R̂; Gelman et al., 2013) over all parameters was less than 1.02. Parameterizations with no divergent transitions have strikingly similar posterior distributions of both the biomass series (Fig. 4) and the quantities of management interest (Fig. 5). Small variations in the locations of the quantiles for these parameterizations, particularly the 10% and 90% quantiles, are apparent, but these variations are small in comparison to the width of the associated credible intervals, and extreme quantiles are also subject to higher Monte Carlo estimation error than central quantiles or expectations. This similarity is expected, as each parameterization fits the same statistical model. The posterior distributions of parameterizations with divergent transitions are sometimes very similar to those without divergent transitions, but sometimes show substantial differences.

The models with Pella-Tomlinson surplus production generally estimate lower biomass values. This is not surprising, as the prior on the shape parameter has a median less than 2, and the fixed value is also less than 2, which results in higher surplus production at lower biomass relative to the Schaefer model. Interestingly, including uncertainty in the Pella-Tomlinson shape parameter does not appear to result in wider credible intervals; the credible intervals of the Schaefer model fits appear slightly larger than those from the Pella-Tomlinson models. FMSY and MSY show a similar trend across the surplus production models, highest when the Pella-Tomlinson shape parameter is estimated and lowest when the Schaefer surplus production model is used (Fig. 5). Credible intervals are also larger when the Pella-Tomlinson shape parameter is estimated. Depletion in 1990, on the other hand, is comparable across the surplus production models, as is its uncertainty.

The posterior distributions of biomass and management quantities from the two explicit F parameterizations are substantially different from those of the other four parameterizations (Fig. 4, Fig. 5). Carrying capacity is decreased and maximum population growth rate is increased (Supplementary Appendix C, Fig. C.1). The Pella-Tomlinson shape parameter is also shifted to lower values (Supplementary Appendix C, Fig. C.2) in models where it is estimated. Taken together, these result in management quantities that are substantially different from the other parameterizations, including larger values of maximum sustainable yield and fishing mortality at maximum sustainable yield (Fig. 5). Posterior distributions for additional select parameters are presented in Supplementary Appendix C.

Fig. 4. Posterior biomass series, using fits with the fewest divergent transitions. Fits with divergent transitions are faded out. The median and 50% and 90% credible intervals are plotted for each year of each included fit.

Fig. 5. Posterior distributions of depletion in 1990, FMSY, and MSY, based on the samples from the fit with the fewest divergent transitions and most efficient sampling. Fits with divergent transitions are faded out. The median and 50% and 90% credible intervals are shown.
4. Discussion

Divergent transitions were eliminated only in the centered, marginal q, and constrained P parameterizations, for all of the surplus production models where they were used. This is not surprising given the underlying similarity of these parameterizations. The noncentered parameterization only eliminated divergent transitions under the Pella-Tomlinson surplus production model when the shape parameter was fixed, and demonstrated no efficiency advantage over the centered parameterization. Neither explicit F parameterization eliminated all divergent transitions, and some of the posterior distributions under these parameterizations were much different from those without divergent transitions. This is concerning given the acceptable R̂ values associated with these fits and the fact that multiple parameterizations are rarely fit for comparison.

Efficiency varied over multiple orders of magnitude among the parameterizations. Unsurprisingly, efficiency was closely tied to the number of parameters to be estimated. Marginalizing out catchability increased efficiency, while estimating fishing mortality rates for each year was substantially less efficient. Estimating the Pella-Tomlinson shape parameter also reduced efficiency, but by less than moving to a different parameterization. Analytically marginalizing out parameters is expected to increase efficiency. The difficulty is that this marginalization is specific to the combination of likelihood and prior; in this case, an uninformative prior with a log-normal observation likelihood was amenable to marginalization, but if more informative priors were specified there is no guarantee that the marginalization would be analytic. On the other hand, the explicit F parameterizations showed no difference whether catchability was marginalized out or not. This may be due to the additional T fishing mortality rate parameters that were fit in these models.

The ability to include errors in catch observations is a major advantage of the explicit F parameterizations, but a reparameterization that eliminates divergent transitions when fitting these parameterizations will be required before they can be recommended. The differences between the posterior distributions of the two explicit F parameterizations and the other four parameterizations are likely due to the prior on the instantaneous fishing mortality rate (F). The exponential prior used here was not as uninformative as intended: it induced a uniform prior on the fraction of biomass removed by fishing, but shrinks each fishing mortality rate towards zero. This results in a larger fraction of the biomass removed each year while the observed catch remains the same. The marginal posterior distributions of a number of other parameters demonstrate a cascade of effects, resulting in very different population dynamics and management quantities. Prior distributions for these parameters must be specified carefully to avoid this cascade of effects.

The constrained P parameterization has the advantage of biologically motivated constraints to prevent predictions of negative depletion. Unfortunately, this complicates the code and substantially decreases sampler efficiency, and it is limited to the Schaefer surplus production model in its current analytic form. The choice of how to prevent negative biomass predictions is likely data set dependent. For example, upper constraints are probably only required where populations may reach sizes much larger than carrying capacity as a result of process error deviations. Using only the lower constraint from the constrained P parameterization may reduce the computational burden enough to make it more appropriate.

In general, if a practitioner encounters warnings about divergent transitions, the target acceptance rate should be increased in an attempt to eliminate them. If the divergent transitions cannot be eliminated by increasing the target acceptance rate, alternative parameterizations should be considered. Increasing the target acceptance rate will generally decrease sampler efficiency, but it avoids the potential for erroneous posterior distributions that arrive without other warnings, as in the posteriors for the explicit F marg q parameterization under the Pella-Tomlinson surplus production model in Fig. 5. Because Stan models are compiled to executables, multiple short fits can be run quickly to tune the sampler parameters.

Model-fitting techniques that utilize gradient information have been used in fisheries for decades, starting with AD Model Builder [ADMB] (Fournier et al., 2012) and more recently Template Model Builder [TMB] (Kristensen et al., 2016). These software packages have focused on maximum likelihood estimation. Stan provides both a model specification language and the NUTS sampler. Recent work allows models coded for TMB to be fit using the NUTS sampler in Stan via the tmbstan R package, and a NUTS sampler for models coded for ADMB is provided by the adnuts R package (Monnahan and Kristensen, 2018). Given the common use of gradient information, these parameterizations may also be useful when fitting models with maximum likelihood, especially because ADMB and TMB are more restrictive than Stan of models that include branches based on parameter values, such as if statements.

Fisheries science can benefit greatly from advances in MCMC algorithms, but it is important to understand how to get the most out of them. Increasing the efficiency of fitting these models is particularly relevant in situations such as management strategy evaluation, where surplus production models may be fit thousands of times. Understanding how to fit models efficiently using these MCMC algorithms is only becoming more important. This paper highlights diagnostics and techniques that can, and should, be regularly used when applying MCMC algorithms to fisheries modelling.
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements

This publication is funded by the Joint Institute for the Study of the Atmosphere and Ocean (JISAO) under NOAA Cooperative Agreement NA15OAR4320063, Contribution No. 2019-1031. Discussions with Jim Thorson and participants in the Fisheries Think Tank seminars contributed to the development of this research. An earlier draft of this paper was substantially improved thanks to comments from two anonymous reviewers.

Appendix A. Supplementary data

Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.fishres.2019.105411.

References

Betancourt, M., 2017. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434 [stat]. http://arxiv.org/abs/1701.02434.
Betancourt, M.J., Girolami, M., 2015. Hamiltonian Monte Carlo for hierarchical models. In: Upadhyay, S.K., Singh, U., Dey, D.K., Loganathan, A. (Eds.), Current Trends in Bayesian Methodology with Applications. CRC Press, Boca Raton, FL, pp. 79-101. https://doi.org/10.1201/b18502-5.
Carpenter, B., Gelman, A., Hoffman, M.D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., Riddell, A., 2017. Stan: a probabilistic programming language. J. Stat. Softw. https://doi.org/10.18637/jss.v076.i01.
Fletcher, R.I., 1978. Time-dependent solutions and efficient parameters for stock-production models. Fish. Bull. 76, 377-388.
Fournier, D.A., Skaug, H.J., Ancheta, J., Ianelli, J., Magnusson, A., Maunder, M.N., Nielsen, A., Sibert, J., 2012. AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optim. Methods Softw. 27, 233-249. https://doi.org/10.1080/10556788.2011.597854.
Gelman, A., Stern, H.S., Carlin, J.B., Dunson, D.B., Vehtari, A., Rubin, D.B., 2013. Bayesian Data Analysis, 3rd ed. Chapman and Hall/CRC.
Gilks, W.R., Thomas, A., Spiegelhalter, D.J., 1994. A language and program for complex Bayesian modelling. J. R. Stat. Soc. D 43, 169-177. https://doi.org/10.2307/2348941.
Girolami, M., Calderhead, B., 2011. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. B 73, 123-214. https://doi.org/10.1111/j.1467-9868.2010.00765.x.
Hoffman, M.D., Gelman, A., 2014. The No-U-Turn Sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1593-1623.
ICCAT, 2017a. Report of the 2017 ICCAT Atlantic swordfish stock assessment session. Collect. Vol. Sci. Pap. ICCAT 74, 841-967.
ICCAT, 2017b. Report of the 2017 ICCAT shortfin mako stock assessment meeting. Collect. Vol. Sci. Pap. ICCAT 74, 1465-1561.
ICCAT, 2017c. Report of the 2017 ICCAT albacore species group intersessional meeting (including assessment of Mediterranean albacore). Collect. Vol. Sci. Pap. ICCAT 74, 508-583.
ISC, 2017. Stock assessment and future projections of blue shark in the North Pacific Ocean through 2015. International Scientific Committee for Tuna and Tuna-like Species in the North Pacific Ocean, pp. 96.
Kristensen, K., Nielsen, A., Berg, C.W., Skaug, H., Bell, B., 2016. TMB: automatic differentiation and Laplace approximation. J. Stat. Softw. 70. https://doi.org/10.18637/jss.v070.i05.
Lunn, D.J., Thomas, A., Best, N., Spiegelhalter, D., 2000. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Stat. Comput. 10, 325-337. https://doi.org/10.1023/A:1008929526011.
Lunn, D., Spiegelhalter, D., Thomas, A., Best, N., 2009. The BUGS project: evolution, critique and future directions. Stat. Med. 28, 3049-3067. https://doi.org/10.1002/sim.3680.
McAllister, M.K., 2014. A generalized Bayesian surplus production stock assessment software (BSP2). Collect. Vol. Sci. Pap. ICCAT 70, 1725-1757.
Meyer, R., Millar, R.B., 1999. BUGS in Bayesian stock assessments. Can. J. Fish. Aquat. Sci. 56, 1078-1087. https://doi.org/10.1139/f99-043.
Millar, R.B., 2002. Reference priors for Bayesian fisheries models. Can. J. Fish. Aquat. Sci. 59, 1492-1502. https://doi.org/10.1139/f02-108.
Monnahan, C.C., Kristensen, K., 2018. No-U-turn sampling for fast Bayesian inference in ADMB and TMB: introducing the adnuts and tmbstan R packages. PLOS ONE 13 (5), e0197954. https://doi.org/10.1371/journal.pone.0197954.
Monnahan, C.C., Thorson, J.T., Branch, T.A., 2017. Faster estimation of Bayesian models in ecology using Hamiltonian Monte Carlo. Methods Ecol. Evol. 8, 339-348. https://doi.org/10.1111/2041-210X.12681.
Moore, J.E., Barlow, J., 2011. Bayesian state-space model of fin whale abundance trends from a 1991-2008 time series of line-transect surveys in the California Current. J. Appl. Ecol. 48, 1195-1205. https://doi.org/10.1111/j.1365-2664.2011.02018.x.
Neal, R.M., 2011. MCMC using Hamiltonian dynamics. In: Brooks, S., Gelman, A., Jones, G.L., Meng, X.-L. (Eds.), Handbook of Markov Chain Monte Carlo, Handbooks of Modern Statistical Methods. Taylor & Francis, Boca Raton, pp. 113-162.
Papaspiliopoulos, O., Roberts, G.O., Sköld, M., 2007. A general framework for the parametrization of hierarchical models. Stat. Sci. 22, 59-73. https://doi.org/10.1214/088342307000000014.
Pedersen, M.W., Berg, C.W., 2017. A stochastic surplus production model in continuous time. Fish Fish. 18, 226-243. https://doi.org/10.1111/faf.12174.
Pella, J.J., Tomlinson, P.K., 1969. A generalized stock production model. Inter-Am. Trop. Tuna Comm. Bull. 13, 421-458.
Plummer, M., 2003. JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd International Workshop on Distributed Statistical Computing, Vienna, Austria. https://www.r-project.org/conferences/DSC-2003/Proceedings/Plummer.pdf.
Polacheck, T., Hilborn, R., Punt, A.E., 1993. Fitting surplus production models: comparing methods and measuring uncertainty. Can. J. Fish. Aquat. Sci. 50, 2597-2607. https://doi.org/10.1139/f93-284.
R Core Team, 2019. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.r-project.org/.
Schaefer, M.B., 1954. Some aspects of the dynamics of populations important to the management of the commercial marine fisheries. Inter-Am. Trop. Tuna Comm. Bull. 1, 27-56.
Simpson, D., Rue, H., Riebler, A., Martins, T.G., Sørbye, S.H., 2017. Penalising model component complexity: a principled, practical approach to constructing priors. Stat. Sci. 32, 1-28. https://doi.org/10.1214/16-STS576.
Stan Development Team, 2019. RStan: the R interface to Stan. R package version 2.19.2. http://mc-stan.org/.
Walters, C., Ludwig, D., 1994. Calculation of Bayes posterior probability distributions for key population parameters. Can. J. Fish. Aquat. Sci. 51, 713-722. https://doi.org/10.1139/f94-071.
Winker, H., Carvalho, F., Kapur, M., 2018. JABBA: Just Another Bayesian Biomass Assessment. Fish. Res. 204, 275-288. https://doi.org/10.1016/j.fishres.2018.03.010.
Zhou, S., Punt, A.E., Deng, R., Dichmont, C.M., Ye, Y., Bishop, J., 2010. Modified hierarchical Bayesian biomass dynamics models for assessment of short-lived invertebrates: a comparison for tropical tiger prawns. Mar. Freshw. Res. 60, 1298-1308. https://doi.org/10.1071/MF09022.