Identifying growth morphs from mixtures of size-at-age data

Fisheries Research 185 (2017) 83–89 Contents lists available at ScienceDirect Fisheries Research journal homepage: www.elsevier.com/locate/fishres ...

Download PDF

1MB Sizes 0 Downloads 42 Views

Report

PDF Reader
Full Text

Fisheries Research 185 (2017) 83–89

Contents lists available at ScienceDirect

Fisheries Research journal homepage: www.elsevier.com/locate/fishres

Full length article

Identifying growth morphs from mixtures of size-at-age data Kyle W. Shertzer a,∗ , John Fieberg b , Jennifer C. Potts a , Michael L. Burton a a b

National Marine Fisheries Service, Southeast Fisheries Science Center, Beaufort, NC, United States Department of Fisheries, Wildlife, and Conservation Biology, University of Minnesota, St. Paul, MN, United States

a r t i c l e

i n f o

Article history: Received 30 June 2016 Received in revised form 28 September 2016 Accepted 30 September 2016 Handled by A.E. Punt Keywords: Bayesian Cubera snapper Mixture models Somatic growth Unsupervised classiﬁcation von Bertalanffy

a b s t r a c t Somatic growth is critical to the biology of individuals and to population dynamics. Variability in size at age can often be attributed to the existence of distinct groups, or growth morphs, that differ in their growth trajectories. We develop a framework for identifying multiple growth morphs from mixture data, with utility for describing somatic growth at the population level as well as for classifying individuals into their most likely groups. For illustration, growth trajectories are modeled using the von Bertalanffy function, but the framework is general enough to accommodate any suitable growth function. After describing the framework, we demonstrate proof of concept using a simulation study, and then apply the proposed method to size-at-age data for Cubera snapper Lutjanus cyanopterus. In addition, we compare several Bayesian model selection criteria for inferring the unknown, underlying number of morphs. Published by Elsevier B.V.

1. Introduction Understanding somatic growth is fundamental to ﬁshery science. Growth is linked through life-history evolution to reproductive output and natural mortality (Charnov et al., 2013), and these processes are primary drivers of population dynamics. In ﬁsh stock assessments, growth models are used to convert between numbers and biomass (in the population and in the catches), interacting with estimation of recruitment, natural mortality, ﬁshing mortality, and selectivity of the ﬁshing or survey gear. Because of their importance to ﬁshery science and consequently resource management, growth models have received much attention in the primary literature (e.g., Maunder et al., 2016). Numerous models have been proposed to describe ﬁsh growth (Quinn and Deriso, 1999; Katsanevakis and Maravelias, 2008). By far, the most widely applied model is attributed to von Bertalanffy (Von Bertalanffy, 1938). The von Bertalanffy model gives rise to an increasing length-at-age, but decelerating growth rate with age, and it ﬁts most populations well, especially when the youngest ages are excluded (Chen et al., 1992; Lester et al., 2004). Alternative models generally lead to a similar growth pattern, though variations exist to account for such inﬂuences as life-history stages,

∗ Corresponding author. E-mail address: [email protected] (K.W. Shertzer). http://dx.doi.org/10.1016/j.ﬁshres.2016.09.032 0165-7836/Published by Elsevier B.V.

seasonal variation, environmental forcing, and density dependence (Lorenzen, 2016; Matthias et al., 2016). Somatic growth varies among individuals because of genetic, behavioral, ecological, and environmental diversity. In some cases, sizes of individuals may deviate around a single growth trajectory, as is commonly assumed. In other cases, variability in size or length data may be described best using multiple growth phenotypes, or, morphs. Throughout this paper, we use the term “growth morph” to refer to a group of individuals that share an expected growth trajectory. When ﬁtting growth models to length-at-age data, the typical assumption is that individual ﬁsh (observations) originate from the same statistical population. If multiple growth curves are considered, the data are usually ﬁrst classiﬁed into different groups and distinct curves are then ﬁt to each of these subpopulations. For example, if a biological population were expected to have sexually dimorphic growth, separate growth curves would be ﬁt to data from males and females. Then, the two curves could be compared to make inference about dimorphic growth. Unfortunately, it is not always possible to classify the data a priori into distinct subsets. In the preceding example, the sex of some or all individuals may be unknown when ﬁsh are gutted at sea (gonads removed), a common occurrence with ﬁshery dependent data. In some cases, assigning labels to groups a priori may not be desirable or even possible, but we are nonetheless interested in whether there is evidence for multiple morphs. If so, we may want to estimate the relative abundance of each group in the

84

K.W. Shertzer et al. / Fisheries Research 185 (2017) 83–89

data set or population, as well as estimate group-speciﬁc growth trajectories for use in stock assessment models (Goodyear, 1984; Punt et al., 2002; Taylor and Methot, 2013). In other applications, we may be more interested in classifying individual ﬁsh into their respective groups. Whether focused at the population or individual level, mixture modeling can be useful for analyzing data composed of multiple groups, particularly when the group identity of some or all individuals is unknown. Mixture modeling is well established in the ﬁelds of statistics (McLachlan and Peel, 2000; Gelman et al., 2013) and machine learning, where it is called “unsupervised classiﬁcation” (Hastie et al., 2009). Mixture models have also proven useful in a range of ecological applications − examples include modeling heterogeneity in mark-recapture data (Pledger et al., 2010), temporal variability in animal movement dynamics (Morales et al., 2004), and variation in ungulate life-history parameters (Hamel et al., 2016), as well as decomposing length frequencies into age or stage classes (Sweeney et al., 2015). The latter can be accomplished in a maximum likelihood framework using publically available software such as the R package mixdist (Macdonald and Du, 2012). To our knowledge, mixture models have not previously been applied to somatic growth data collected from ﬁsh populations. The goals of this paper are twofold. First, we show how mixture models can be adapted to account for heterogeneity in grouprelated growth. Second, we examine ways to identify objectively how many morphs contribute to the population. The method we propose has utility for quantifying growth trajectories of each of the underlying morphs, i.e., the mixture components. It has further utility for classifying individual observations (ﬁsh) into their most likely components (e.g., unclassiﬁed individuals of a particular length as either male or female). Uncertainty in these classiﬁcations can be quantiﬁed, allowing for propagation of error to any subsequent analyses. After describing the method, we demonstrate its use on simulated data and through a case study on Cubera snapper Lutjanus cyanopterus. 2. Methods 2.1. Mixture model framework For ﬁtting length-at-age data, we took a mixture modeling approach implemented in a Bayesian framework (Gelman et al., 2013). Here, the observed lengths (y = y1 , . . ., yn ) were considered to be generated by a mixture of M components (m = 1, . . ., M), the growth morphs. We modeled the expected length (L¯ a,m ) at age (a) of each morph using the von Bertalanffy equation, L¯ a,m = L∞,m (1 − e−km (a−a0,m ) )

(1)

where L∞,m is the asymptotic expected length of morph m, km is the growth coefﬁcient, and a0,m is the theoretical age at which length is zero. We chose the von Bertalanffy equation because of its widespread popularity in ﬁshery science, but note that the general modeling approach can accommodate any suitable growth function. We used a latent (unobserved) variable, zi , assumed to follow a multinomial distribution, to capture the group (i.e., morph) membership of ﬁsh i, zi ∼multinomial (1 , . . ., M )

(2)

Values of ␭ represent the probabilities of membership in each group and, for Eq. (2) to be a valid probability distribution funcM tion, they must satisfy the condition m=1 m = 1. We assumed that the m ’s follow a Dirichlet distribution. This was accomplished implicitly by specifying a set of M hyperprior random variables,

M

˛m ∼gamma (1, 1), followed by the relationship, m = ˛m /

˛j

j=1

(Kéry and Schaub, 2012). More generally, log(˛m ) could be modeled as a function of explanatory variables in a multinomial regression framework, thus allowing the classiﬁcations to be informed by additional covariate data, if available. We assumed that variability of length-at-age is described by a morph-speciﬁc normal distribution,

2 yi | m , zi ∼N L¯ ai ,m , m

(3)

where m is the that fully describe each set of four parameters morph, m = L∞,m , km , a0,m , m . The assumption of normality is common (Francis, 2016), and we chose it here for the purpose of illustration. However, the approach could easily be adapted to other statistical distributions. We applied uniform prior distributions on each parameter. For any given application, meaningful bounds on prior distributions will depend on the data set in hand. However, one caveat regarding mixture models is that the zi might not be identiﬁable without informative priors or constraints. Indeed, all mixture models are nonidentiﬁable in the sense that the mixture distribution is unaffected by permutations in the group labels (Gelman et al., 2013). To avoid this ambiguity, we applied the following priors on asymptotic expected length,

L∞,1 ∼U (min, max) , L∞,2 ∼U 0, L∞,1 , ..., L∞,M ∼U 0, L∞,M−1 (4) which is equivalent to the restriction that L∞,1 > L∞,2 > ... > L∞,M . We deﬁne application-speciﬁc values of min and max in the Simulated data and Cubera snapper case study sections below, as well as specify prior distributions for other parameters of m . To implement the model, we used JAGS version 4.2.0 (Plummer, 2003), a program for Bayesian analysis utilizing Markov Chain Monte Carlo (MCMC) simulation. It was run in R version 3.2.5 (R Development Core and Team, 2016) via the R package R2jags (Su and Yajima, 2015). We ran three independent Markov chains, each for 500,000 iterations, and each initialized with over-dispersed starting values as judged by viewing the trace plots. Posterior distributions were computed after a burn-in period of 100,000 iterations to avoid any inﬂuence of starting values. We thinned the resulting chains by keeping every tenth iteration, because of data storage limits (Link and Eaton, 2012). Convergence was assessed through visual inspection of trace, density, and autocorrelation plots, and by examining the Brooks-Gelman-Rubin statistic for convergence toward 1 (Brooks and Gelman, 1998). 2.2. Classiﬁcation of observations During each MCMC iteration, each ﬁsh is assigned to a single morph. Across iterations, however, morph assignments may differ. We used the mode of the posterior distribution (i.e., the morph assigned most frequently) to classify each ﬁsh. 2.3. Number of morphs and model selection In some applications, M would be known a priori, for example in the case of investigating sexually dimorphic growth where M = 2. In other applications, the number of morphs may be unknown. In general, determining the number of components (morphs) in mixture modeling is a difﬁcult challenge (Gelman et al., 2013), and one method for doing so is model selection. For our applications, we ﬁtted the model for multiple values of M, and then compared results graphically and through model selection criteria. Numerous model selection criteria have been proposed for Bayesian applications, with no general consensus view on which

K.W. Shertzer et al. / Fisheries Research 185 (2017) 83–89

85

Fig. 1. Example of ﬁts to simulated data where the true number of growth morphs is 3. The number of morphs in the estimation model is M = 1 (top left panel), M = 2 (top right panel), M = 3 (bottom left panel), or M = 4 (bottom right panel). Solid lines represent expected curves based on the medians from the posterior distributions; dashed lines represent 95% credible intervals. Colors indicate estimated classiﬁcation of data points into their respective morphs (i.e., the morph selected most often among MCMC iterations).

approach is best (Kéry and Schaub, 2012; Hooten and Hobbs, 2015). Thus, rather than rely on a single criterion, we considered three in concert: the deviance information criterion (DIC), leave-oneout cross validation (LOO), and the widely applicable information criterion (WAIC, also referred to as Watanabe-Akaike information criterion). In principle, each of these metrics attempts to measure the ideal level of parsimony, by quantifying the tradeoff between model complexity and goodness-of-ﬁt so as to minimize prediction error. DIC (Spiegelhalter et al., 2002) is perhaps the most widely applied criterion because of its availability in popular Bayesian software packages, such as JAGS and WinBUGS. It has been found to work well in practice. However, its theoretical justiﬁcation has been questioned, as well as its application for models of mixture distributions or with latent variables (Plummer, 2008; Hooten and Hobbs, 2015). The other criteria considered, LOO (Vehtari and Lampinen, 2002) and WAIC (Watanabe, 2010), have been proposed by Gelman et al. (2014) as reasonable alternatives to DIC. Both LOO and WAIC attempt to quantify the accuracy of the posterior predictive distribution, p(˜yi |y) = p(˜yi |␪)p(␪|y)d␪, where y˜ i is a new observation (i.e., one not used in the estimation process), p(˜yi |␪) is the likelihood for observation y˜ i given parameters ␪, and p(␪ | y) is the posterior distribution of the model parameters given the data y. More speciﬁcally, LOO and WAIC estimate the expected log pointwise predictive density (elpd) for a new data set of the same size, n, elpd =

n

pt (˜yi ) log p(˜yi |y)d˜yi where pt (˜yi ) represents the data-

i=1

generating process for y˜ i (Vehtari et al., 2016a). These statistics and their standard errors were calculated with the R package loo (Vehtari et al., 2016b) using the likelihood of the data evaluated at

parameter values drawn from the full posterior distribution, p(yi | ␪s ), where ␪s represents one such sample (Vehtari et al., 2016b). 2.4. Simulated data We tested the method using four simulated length-at-age data sets that differed in the number of underlying growth morphs. For each, we used a sample size of n = 200 ﬁsh and ages 1–15. In actual length-at-age data, the probability of observing a ﬁsh of age a tends to decrease as ﬁsh get older because of mortality, but increase with selectivity of the sampling gear. We attempted to mimic these opposing forces, the ﬁrst by applying exponential decay of abundance at age Na , and the second by applying a logistic model to describe relative sampling at age Sa . Exponential decay followed the form Na+1 = Na e−Z , at the rate Z = 0.15 yr−1 ; sampling was modeled as Sa = 1/ (1 + e−ˇ(a−a50 ) ), with a slope of ˇ = 0.5 and age at 50% sampling selection of a50 = 2. Thus, the probability Pa of observing a ﬁsh of age a is,

15

Pa = Na Sa /

j=1

Nj Sj

(5)

The values of Pa are invariant to the initial abundance because of the normalization (denominator in Eq. (5)); we used N1 = 1. Given Pa , an age was assigned at random to each ﬁsh. After assigning an age, we derived the observed length in a three-step process. First, the ﬁsh was assigned at random to a growth morph, with each morph receiving equal probability. Second, the expected length was calculated (Eq. (1)). Finally, the observed length was drawn from a normal distribution, 2 . N L¯ a,m , m

86

K.W. Shertzer et al. / Fisheries Research 185 (2017) 83–89

Fig. 2. Model selection criteria applied to the simulation study, where the true number of growth morphs is 1 (top left panel), 2 (top right panel), 3 (bottom left panel), or 4 (bottom right panel). The number of morphs (NMorphs) in the estimation model is indicated on each panel. Selection metrics are DIC, LOO, and WAIC. Filled circles represent point estimates, and solid lines represent ±2 standard errors (for LOO and WAIC only).

Each of the simulated data sets contained a different number of underlying growth morphs, M = (1, 2, 3, or 4). In all cases, the morphs shared the same values for parameters km (= 0.25) , a0,m (= −0.75) , and m (= 60), but they had different values of L∞,m (units of these simulation parameters are arbitrary). For the simulated data with M = 1, L∞,1 = 1000; for M = 2, L∞,12 = (1200, 800); for M = 3, L∞,123 = (1300, 1000, 700); and for M = 4, L∞,1234 = (1600, 1200, 800, 500). We then applied the estimation model using M = (1, 2, 3, and 4) to each of the four simulated data sets, for a total of 16 model ﬁts. In these applications, we speciﬁed uniform prior distributions for each parameter of each morph, km ∼U (0.01, 0.5), a0,m ∼U (−5.00, −0.01), m ∼U (5, 200), and L∞,1 ∼U (700, 1800). Our primary goal was to test model performance in an idealistic setting when underlying conditions are known, before any application to real data.

2.5. Cubera snapper case study Cubera snapper is the largest member of the snapper family (Lutjanidae) in the tropical western Atlantic, capable of attaining lengths in excess of one meter. Adults typically inhabit subtropical and tropical rocky ledge and coral reef areas to 40 m depth (Allen, 1985). Juveniles have not been recorded on reefs, but instead are found on soft-bottom or vegetated habitats, such as mangrove areas (Lindeman and DeMaria, 2005). Cubera snapper are of moderate importance to the southeastern United States (North Carolina to Florida Keys) reef ﬁsh ﬁshery. While caught infrequently by anglers, their large size makes them a prized trophy species.

Cubera snapper (n = 128 ﬁsh) were opportunistically sampled during 2000–2015 by trained port agents who monitor recreational headboat and commercial ﬁsheries off the southeast United States coast. Sagittal otoliths were collected along with meristic measurements. The samples were processed and aged by experienced biologists using standard protocols (Potts and Manooch, 1995). Increment counts and margin types were recorded for each sample. Based on timing of annulus formation, the increment counts were converted to calendar ages, and fractional ages were calculated using July as the month of peak spawning. Here, growth trajectories were estimated using fractional ages and fork length (mm). Similar to the simulation analysis, we applied the estimation model to Cubera snapper using M = (1, 2, 3, and 4) morphs. We then considered which value of M seems most likely and discuss implications for this data set. As in the simulation study, we speciﬁed uniform prior distributions on model parameters, although with different bounds more relevant to this data set, km ∼U (0.01, 0.30), a0,m ∼U (−15.00, −0.01), m ∼U (10, 200), and L∞,1 ∼U (700, 2000). The model code and data are provided as Supplementary material. 3. Results 3.1. Application to simulated data The estimation model ﬁt the simulated data well, particularly when the number of growth morphs (M) in the estimation model equaled that of the simulation (ﬁts shown in Fig. 1 for simulation M = 3). Furthermore, in those cases, the estimation model recov-

K.W. Shertzer et al. / Fisheries Research 185 (2017) 83–89

87

Fig. 3. Fits to Cubera snapper data (solid circles, fork length in millimeters), where the number of growth morphs is M = 1 (top left panel), M = 2 (top right panel), M = 3 (bottom left panel), or M = 4 (bottom right panel). Solid lines represent expected curves based on the medians from the posterior distributions; dashed lines represent 95% credible intervals. Colors indicate estimated classiﬁcation of data points into their respective morphs (i.e., the morph selected most often among MCMC iterations).

ered the true parameter values, in the sense that true values were well approximated by medians of posterior distributions and were contained within the 95% credible intervals (results not shown). A notable pattern across all simulations was that credible intervals of estimated curves widened substantially when the number of morphs in the estimation model exceeded that of the simulated population. The proportions of each morph in the population were also well estimated by the multi-morph models. For M = 2 (estimation and simulation), the posterior median proportions (95% credible intervals) were 1 = 0.47 (0.40, 0.55) and 2 = 0.53 (0.45, 0.60), compared to true values of 0.5 for both morphs. Classiﬁcation accuracy, i.e. correct morph assignment, was 97% (94%, 99%). For M = 3, the posterior median proportions were 1 = 0.31 (0.23, 0.40), 2 = 0.31 (0.22, 0.41), and 3 = 0.37 (0.30, 0.45), compared to true values of 0.35, 0.30, and 0.36. (Note that estimated and true vectors may not sum exactly to one because of rounding.) Classiﬁcation accuracy was 90% (83%, 95%). For M = 4, the posterior median proportions were 1 = 0.28 (0.22, 0.35), 2 = 0.21 (0.15, 0.28), 3 = 0.26 (0.19, 0.35), and 4 = 0.24 (0.17, 0.32), compared to true values of 0.29, 0.22, 0.24, and 0.27. Classiﬁcation accuracy was 93% (87%, 97%). Model selection criteria proved useful for uncovering the true number of growth morphs (Fig. 2). For all four simulated data sets, DIC was lowest (best) when M in the estimation model equaled that of the simulation. The metrics LOO and WAIC behaved differently from DIC, but similarly to each other. Their values were lowest when M in the estimation model equaled or exceeded that of the simulation model. That is, they indicated the rejection of overly simplistic models (too few morphs), but did not distinguish between true and overly complicated models. Thus, LOO and WAIC

Fig. 4. Model selection criteria applied to the Cubera snapper ﬁts. The number of morphs (NMorphs) in the estimation model is indicated. Selection metrics are DIC, LOO, and WAIC. Filled circles represent point estimates, and solid lines represent ±2 standard errors (for LOO and WAIC only).

agreed with DIC (and the truth), if we used them to select the simplest model that they failed to reject. 3.2. Application to cubera snapper data For model ﬁts to the Cubera snapper data (Fig. 3), qualitative and quantitative diagnostics indicated convergence of the Markov chains. The only exception was that, for the two-morph model, the Brooks-Gelman-Rubin statistic exceeded 1.1 (but <1.3) for several

88

K.W. Shertzer et al. / Fisheries Research 185 (2017) 83–89

Table 1 Parameter estimates from the two-morph model applied to Cubera snapper data. Values shown are 50th (2.5th, 97.5th) percentiles from the posterior distributions. Parameter

Units

Morph 1

Morph 2

L∞ k a0 ␭

mm yr−1 yr mm unitless

1365 (1206, 1622) 0.05 (0.03, 0.09) −8.2 (−13.7, −4.2) 77 (65, 95) 0.60 (0.51, 0.69)

753 (700, 880) 0.11 (0.05, 0.17) −4.0 (−13.9, −0.19) 55 (43, 73) 0.40 (0.31, 0.49)

(14 out of 128) estimates of zi , indicating that those individuals were difﬁcult to assign to a particular growth morph. When comparing models with different numbers of morphs, we considered the three selection criteria in concert (Fig. 4). The DIC statistic provided strongest support for the two-morph model. The LOO statistic indicated that the one-morph model performed worst, but provided similar support for the other three models. The WAIC statistic suggested that the three- and four-morph models performed best, but with some overlap in standard errors between the two- and three-morph models. We would argue that these criteria, taken together and in light of the simulation study, indicate that the two-morph model is optimal in terms of balancing performance and parsimony (Table 1). In addition, the wider credible intervals of the three- and four-morph models would also support selection of the two-morph model (Fig. 3). 4. Discussion A primary goal of this paper was to develop methodology for ﬁtting growth models to size-at-age data that contain mixtures of multiple growth morphs. Although our description focused on the von Bertalanffy growth model and the normal distribution of size at age, the method is general, and could accommodate any suitable modiﬁcation to those assumptions. For example, expected size might be described by a multi-phasic growth model (e.g., Matthias et al., 2016) or a nonparametric function, and variability might be modeled such that it changes with age (e.g., Zhu et al., 2016). In addition, the model could be simpliﬁed in cases where group membership (zi ) is known for some or all individuals. Another goal was to compare model selection criteria for choosing the optimal number of morphs. When comparing models, it is important to recognize that information criteria are statistics, and as such, they are also subjected to sampling uncertainty. The three criteria we applied—DIC, LOO, and WAIC—proved useful for selecting the appropriate number of morphs, particularly when considered in concert, and also when we considered measures of variability (standard errors) along with point estimates. Still, Bayesian model selection is a developing ﬁeld (Hooten and Hobbs, 2015), and we do not consider our analysis to be the ﬁnal word on applications to size-at-age data. We conducted the simulation analysis as proof of concept, that is, to test whether the model could work under ideal conditions. We were interested in model performance, including convergence diagnostics and the ability to estimate parameters accurately, as well as in whether model selection criteria could help uncover the true number of morphs. Results were positive, indicating that the methodology holds promise. Although a primary goal of this paper was to develop methodology, we emphasize that more comprehensive simulation testing is needed to provide guidance on conditions when the methods work and when they break down. Further simulation testing could investigate such factors as 1) various combinations of growth parameters, including the standard deviation of size at age (m ), 2) various proportions of morphs in the data set (our simulations assumed equal proportions), 3) sample size, i.e., number of ﬁsh observed, 4) different error distributions, 5) other methods to deﬁne morph labels (we focused on

L∞ ), and 6) reasonable ways to inﬂuence estimation with informative prior distributions. Although we did not test all of these factors in a formal experimental design, we did conduct preliminary investigations with several of them (Supplementary material), including variation in simulated km and a0,m , variation in m ranging from 30 to 100, and various expected proportions of morphs in the data (1:2 and 1:3 for two morphs, 1:2:1 and 1:3:1 for three morphs, 1:2:2:1 and 1:3:3:1 for four morphs). We found that the model and selection criteria continued to perform well under various proportions of morphs and combinations of growth parameters. However, performance began to deteriorate at the higher values of m . These ﬁndings support the intuitive result that the power to distinguish among morphs depends on the between- versus within-morph variance in the data. In addition to further simulation study, future work could focus on the model structure itself. For example, one could adopt a hierarchical framework, by assuming that growth parameters vary normally among morphs. This approach of information sharing could prove useful, particularly if there exist a large number of morphs. Similarly, a hierarchical or random effects approach could be adopted to account for ageing error (Cope and Punt, 2007). Ageing error has the potential to blur the distinction among morphs and thus make their identiﬁcation more difﬁcult. In the case study of Cubera snapper, we found the two-morph model to be most consistent with the data. A reasonable next step might be to consider additional information to understand these results better. Our data set contained information on location and date of capture, and limited information on gender (21 individuals), but we found no support for hypotheses regarding latitudinal effects, temporal trends, or sexually dimorphic growth. Other hypotheses that we could not test using the current data set included the effects of behavior (e.g., bold/shy dichotomy), habitat quality (e.g., good/bad reefs), and depth gradient. We also pose the possibility that the apparent dimorphism is a sampling artifact; for example, collecting more specimens could result in a unimodal distribution of size at age, or the smaller morph may be due to species misidentiﬁcation. Although port agents who collected the data are trained in identiﬁcation, the smaller morph was found to have growth trajectories similar to those of Gray snapper Lutjanus griseus (Burton, 2000), a species that is quite similar in appearance to Cubera snapper. This possibility highlights a potential utility of our methodology: cleaning a contaminated data set. Because the method classiﬁes individual data points into groups (morphs), those points associated with an unsought group can be removed before proceeding with the analysis. In additional to testing hypotheses about growth dynamics (or decontaminating data), quantifying various morphs may also have subsequent utility for population-level applications (Francis, 2016). Both CASAL (Bull et al., 2012) and Stock Synthesis (SS, Methot and Wetzel, 2013), popular software package for stock assessment, have the capability to include growth morphs. This feature helps account for the fact that size-selective ﬁshing practices preferentially remove the faster growing individuals, without the computational cost of using a fully individual-based model. Its use can result in more accurate stock assessments (Taylor and Methot, 2013), yet it may not be readily apparent how many morphs to include and how to deﬁne them. Our methods might help such efforts. Our focus was on somatic growth, however the mixture modeling framework described here is more general and, with little modiﬁcation, could be applied to other measurable characteristics (e.g., fecundity). When a data set represents a mixture of multiple groups, the modeling framework could help elucidate patterns within the data, providing a means to classify individuals and describe population-level processes. Such applications might prove

K.W. Shertzer et al. / Fisheries Research 185 (2017) 83–89

useful when testing hypotheses about the underlying structure of populations. Acknowledgments We thank Trevor Heﬂey, Nikolai Klibanski, Andre Punt, Ian Taylor, and Erik Williams for constructive comments on the manuscript. Opinions expressed are those of the authors and do not necessarily represent views of any government agency. Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ﬁshres.2016.09. 032. References Allen, G.R. 1985. Snappers of the world. An annotated and illustrated catalogue of lutjanid species known to date. FAO Species Catalogue, vol. 6. FAO Fish. Synop. 125, 1–208. Brooks, S., Gelman, A., 1998. General methods for monitoring convergence of iterative simulations. J. Comp. Graph. Stat. 7, 434–455. Bull, B., Francis, R.I.C.C., Dunn, A., McKenzie, A., Gilbert, D.J., Smith, M.H., Bian, R., Fu, D., 2012. CASAL (C++ algorithm stock assessment laboratory): CASAL User Manual v2.30-2012/03/21. NIWA Technical Report 135, pp. 280. Burton, M.L., 2000. Age, growth, and mortality of gray snapper Lutjanus griseus, from the east coast of Florida. Fish. Bull. 99, 254–265. Charnov, E.L., Gislason, H., Pope, J.G., 2013. Evolutionary assembly rules for ﬁsh life histories. Fish Fish. 14, 213–224. Chen, Y., Jackson, D.A., Harvey, H.H., 1992. A comparison of von Bertalanffy and polynomial functions in modelling ﬁsh growth data. Can. J. Fish. Aquat. Sci. 49, 1228–1235. Cope, J.M., Punt, A.E., 2007. Admitting ageing error when ﬁtting growth curves: an example using the von Bertalanffy growth function with random effects. Can. J. Aquat. Sci. 64, 205–218. Francis, R.I.C.C., 2016. Growth in age-structured stock assessment models. Fish. Res. 180, 77–86. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B., 2013. Bayesian Data Analysis, 3rd edition. CRC Press, Boca Raton, FL. Gelman, A., Hwang, J., Vehtari, A., 2014. Understanding predictive information criteria for Bayesian models. Stat. Comput. 24, 997–1016. Goodyear, C.P., 1984. Analysis of potential yield per recruit for striped bass produced in Chesapeake Bay. North Am. J. Fish. Manage. 4, 488–496. Hamel, S., Yoccoz, N.G., Gaillard, J., 2016. Assessing variation in life-history tactics within a population using mixture regression models: a practical guide for evolutionary ecologists. Biol. Rev., http://dx.doi.org/10.1111/brv.12254. Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition. Springer-Verlag, New York, pp. 745. Hooten, M.B., Hobbs, N.T., 2015. A guide to Bayesian model selection for ecologists. Ecol. Monogr. 85, 3–28. Kéry, M., Schaub, M., 2012. Bayesian Population Analysis Using WinBUGS: A Hierarchical Perspective. Academic Press, Waltham, MA. Katsanevakis, S., Maravelias, C.D., 2008. Modelling ﬁsh growth: multi-model inference as a better alternative to a prior using von Bertalanffy equation. Fish Fish. 9, 178–187. Lester, N.P., Shuter, B.J., Abrams, P.A., 2004. Interpreting the von Bertalanffy model of somatic growth in ﬁshes: the cost of reproduction. Proc. R. Soc. Lond. B: Biol. Sci. 271, 1625–1631.

89

Lindeman, K.C., DeMaria, D., 2005. Juveniles of the Caribbean’s largest coral reef snapper do not use reefs. Coral Reefs 24, 359. Link, W.A., Eaton, M.J., 2012. On thinning of chains in MCMC. Methods Ecol. Evol. 3, 112–115. Lorenzen, K., 2016. Toward a new paradigm for growth modeling in ﬁsheries stock assessments: embracing plasticity and its consequences. Fish. Res. 180, 4–22. Macdonald, P., Du, J. 2012. mixdist: Finite Mixture Distribution Models. R package version 0. 5-4. URL: https://CRAN.R-project.org/package=mixdist. Matthias, B.G., Ahrens, R.N.M., Allen, M.S., Lombardi-Carlson, L.A., Fitzhugh, G.R., 2016. Comparison of growth models for sequential hermaphrodites by considering multi-phasic growth. Fish. Res. 179, 67–75. Maunder, M.N., Crone, P.R., Punt, A.E., Valero, J.L., Semmens, B.X., 2016. Growth Theory, estimation, and application in ﬁshery stock assessment models. Fish. Res. 180, 1–3. McLachlan, G., Peel, D., 2000. Finite Mixture Models. John Wiley & Sons, Inc, New York, NY. Methot Jr., R.D., Wetzel, C.R., 2013. Stock synthesis: a biological and statistical framework for ﬁsh stock assessment and ﬁshery management. Fish. Res. 142, 86–99. Morales, J.M., Haydon, D.T., Frair, J., Holsinger, K.E., Fryxell, J.M., 2004. Extracting more out of relocation data: building movement models as mixtures of random walks. Ecology 85, 2436–2445. Pledger, S., Pollock, K.H., Norris, J., 2010. Open capture-recapture models with heterogeneity: II. Jolly–Seber model. Biometrics 66, 883–890. Plummer, M., 2003. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd International Workshop on Distributed Statistical Computing. Ed. By Hornik, K., Leisch, F., Zeileis, A., Vienna, Austria. Plummer, M., 2008. Penalized loss functions for Bayesian model comparison. Biostatistics 9, 523–539. Potts, J.C., Manooch III, C.S., 1995. Age and growth of red hind and rock hind collected from North Carolina through the Dry Tortugas, Florida. Bull. Mar. Sci. 56, 784–794. Punt, A.E., Smith, A.D.M., Cui, G., 2002. Evaluation of management tools for Australia’s south east ﬁshery 1: modelling the south east ﬁshery taking account of technical interactions. Mar. Freshw. Res. 53, 615–629. Quinn II, T.J., Deriso, R.B., 1999. Quantitative Fish Dynamics. Oxford University Press, New York, New York. R Development Core Team, 2016. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http:// www.R-project.org/. Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A., 2002. Bayesian measures of model complexity and ﬁt (with discussion). J. R. Stat. Soc. B 64, 583–639. Su, Y.S., Yajima, M., 2015. R2jags: Using R to Run ‘JAGS’. R package version 0. 5–7. URL: https://cran.r-project.org/package=R2jags. Sweeney, K.L., Shertzer, K.W., Fritz, L.W., Read, A.J., 2015. A novel approach to compare pinniped populations across a broad geographic range. Can. J. Fish. Aquat. Sci. 72, 175–185. Taylor, I.G., Methot Jr., R.D., 2013. Hiding or dead? A computationally efﬁcient model of selective ﬁsheries mortality. Fish. Res. 142, 75–85. Vehtari, A., Lampinen, J., 2002. Bayesian model assessment and comparison using cross-validation predictive densities. Neural Comput. 14, 2439–2468. Vehtari, A., Gelman, A., Gabry, J., 2016a. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. arXiv preprint arXiv:1507.04544. Vehtari, A., Gelman, A., Gabry, J., 2016b. loo: Efﬁcient leave-one-out cross-validation and WAIC for Bayesian models. R package version 0.1.6. URL: https://github.com/jgabry/loo. Von Bertalanffy, L., 1938. A quantitative theory of organic growth (Inquiries on growth laws II). Hum. Biol. 10, 181–213. Watanabe, S., 2010. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 11, 3571–3594. Zhu, J., Maunder, M.N., Aires-da-Silva, A.M., Chen, Y., 2016. Estimation of growth within Stock Synthesis models: management implications when using length-composition data. Fish. Res. 180, 87–91.

Identifying growth morphs from mixtures of size-at-age data

Identifying growth morphs from mixtures of size-at-age data

Recommend Documents