Using cross validation model selection to determine the shape of nonparametric selectivity curves in fisheries stock assessment models

Fisheries Research 110 (2011) 283–288

Journal homepage: www.elsevier.com/locate/fishres

Mark N. Maunder a,∗, Shelton J. Harley b,1

a Inter-American Tropical Tuna Commission, 8604 La Jolla Shores Drive, La Jolla, CA 92037-1508, USA
b Secretariat of the Pacific Community, BP D5, 98848 Noumea CEDEX, New Caledonia

Article info

Article history: Received 26 September 2010; received in revised form 21 April 2011; accepted 21 April 2011

Keywords: AD Model Builder; Cross validation; Model selection; Nonparametric; Selectivity; Stock assessment

Abstract

We used hold-out cross validation model selection to determine the most appropriate form of the selectivity curve, using a nonparametric approach to represent selectivity. The cross-validation method is based on setting aside a portion of the catch-at-age (or catch-at-length) data to use as a test data set. The remaining catch-at-age data, along with other data (e.g. relative indices of abundance), are used to estimate the parameters of the stock assessment model, including the selectivity parameters. These parameter estimates are then used to predict the catch at age for the test data set. The selectivity model that produces the closest predictions to the test data set is chosen as the selectivity model to use in the assessment. The selectivity model we use is nonparametric, based on estimating an individual selectivity parameter for each age and then applying smoothness penalties to constrain how much the selectivity can change from age to age. The smoothness penalties we consider are the first, second, and third differences, a length-based penalty, and a monotonic penalty. The penalties are applied on the logarithm of selectivity to avoid scale-related problems and improve stability. The method was applied to the assessment of bigeye tuna in the eastern Pacific Ocean. We found that the estimated management quantities were relatively robust within the set of smoothness penalties that gave low cross-validation scores. We also found that poor choices for the smoothness penalties could give very different results. Poor choices include both under-smoothing (e.g. no penalties) and over-smoothing (penalties that are too large). The most influential factor was the inclusion of a monotonic penalty.

© 2011 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +1 858 546 7027; fax: +1 858 546 7133. E-mail addresses: [email protected] (M.N. Maunder), [email protected] (S.J. Harley).
1 Tel.: +687 260192.
0165-7836/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.fishres.2011.04.017

1. Introduction

Age-specific selectivity is one of the most influential components of modern stock assessment methods. Not only can it influence estimates of population size, it can also have a substantial influence on management quantities. For example, maximum sustainable yield (MSY) and associated management quantities (e.g. the biomass that supports MSY) are not defined without specifying the selectivity of the gear (Maunder, 2002). These management quantities can differ substantially for different age-specific selectivity patterns (Sinclair, 1993; Maunder, 2002). Therefore, it is important that appropriate methods are used to estimate age-specific selectivity.

Numerous assumptions have been used to constrain age-specific selectivity. It is usually not possible to adequately estimate a single selectivity parameter for each age. Therefore, the number of parameters estimated is limited. The simplest approach is to assume that all individuals above a given age are selected by the fishery and all individuals below that age are not selected (knife-edge selectivity). However, in most situations the selectivity pattern is not so clear cut, and individuals become more or less selected by the fishery as they age. The logistic curve is frequently used to represent a gradual increase in selectivity as individuals age (e.g. Smith and Punt, 1998). The logistic equation assumes that selectivity increases monotonically to an asymptote. Many fishing gears may not fully select for older individuals. For example, large individuals may not get caught in the meshes of gill nets. To include the possibility of decreasing selectivity for the older individuals, other selectivity curves have been used, such as the double normal (Hilborn et al., 2000), gamma (Punt and Walker, 1998), double logistic (Helu et al., 2000), and exponential-logistic (Thompson, 1994).

Haist et al. (1999) suggest that these functional forms are too restrictive and may be inappropriate for a particular application, leading to biased results. They suggest using separate parameters to represent selectivity for each age, but constraining the amount that selectivity can change from age to age using smoothness penalties. These penalties avoid overparameterization of the model. The method of Haist et al. is commonly used in complex statistical catch-at-age or catch-at-length analyses (e.g. Fournier et al., 1998; Ianelli, 2002; Maunder and Watters, 2003). However, the smoothness penalties, which are usually specified arbitrarily, can influence the results.

Likelihood-based model selection criteria are often used to determine which parameters should be estimated in a stock assessment model (e.g. Watters and Maunder, 2001). For example, the likelihood ratio test can be used to determine if the selection curve should be asymptotic or dome shaped by fixing or estimating a single parameter. The Akaike Information Criterion (AIC; Akaike, 1973) and the Bayes Information Criterion (BIC; Schwarz, 1978), which do not require the models to be nested (Hilborn and Mangel, 1997), can be used to determine which selectivity functional form to use and/or what parameters to estimate (Helu et al., 2000). These criteria penalize the fit to the data (the likelihood value) by the number of parameters estimated. Therefore, it would be unlikely that a model with an estimated selectivity parameter for each age would be selected over a functional form with two or three parameters. The smoothness penalties used by Haist et al. (1999) are one way to limit the degrees of freedom of the selectivity curve. However, it is difficult or impossible to determine the effective number of parameters to use in the AIC or BIC criteria. Therefore, an alternative method is needed to determine the appropriate values for the smoothness penalties.

We develop a method using hold-out cross validation model selection (Arlot and Celisse, 2010) to determine the appropriate penalties to smooth selectivity curves. The method is related to commonly used methods in standard nonparametric statistics (Wood, 2006). We apply this method to the stock assessment of bigeye tuna, Thunnus obesus, in the eastern Pacific Ocean (EPO), which is used for management advice.

2. Methods

2.1. Selectivity model

We implement selectivity at age (or length) following the method of Fournier et al. (1998; see Haist et al., 1999 for details). Separate parameters are estimated to represent selectivity for each age, but the amount that selectivity can change from age to age is constrained using smoothness penalties (Table 1). The constraints are implemented using the difference-equation approximations to the first, second, and third derivatives of the selectivity curve (Eqs. (1)–(3) in Table 1). A weighting factor (λ) is added to allow for an increase or decrease in the influence of the selectivity smoothness penalties. The first difference constrains the selectivity curve to be (penalizes it toward being) constant, the second difference constrains the selectivity curve to be linear, and the third difference constrains the selectivity curve to be quadratic.

It is likely that selectivity is partly length-based, so an additional weighting factor is added to the first difference to put a greater penalty on ages for which the growth rate is lower and the length distributions are similar between consecutive ages (see Fournier et al., 1998 for an alternative method to include length in the selectivity curvature penalty). The length-based weighting parameter (γ) can be modified to determine how the weighting differs as a function of mean length at age. This parameter should be a negative number to provide an inverse relationship between the difference in average length and the smoothness of the selectivity curve. There is no length effect if this parameter is set to zero, but the constraint of the first difference is still used.

James Ianelli (pers. comm., NMFS, Seattle, USA) suggests applying the penalties to the logarithm of the selectivity parameters to avoid scale-related problems and improve the stability of the estimation procedure. However, the resulting selectivity curve can be influenced by this choice, and it may be desirable to remove the logarithms from the equations.

In addition, the selectivity parameters for each fishery are penalized to average one (Eq. (4) in Table 1). This removes excess correlation among the selectivity parameters and between the selectivity parameters and other model parameters. A penalty can also be used to constrain the selectivity curve to be monotonically increasing (Eq. (5) in Table 1).

It may be beneficial to fix the first few and last few selectivities at a small value or make them equal to a single estimated value. This reduces the number of parameters to be estimated when the catch-at-age or -at-length data indicate that individuals of these ages are not caught by that gear or that these ages make up only a small fraction of the total population, and avoids estimating parameters for which there is little information.

2.2. Cross validation

We use hold-out cross-validation model selection (Arlot and Celisse, 2010) to select the appropriate smoothness penalties for the selectivity curves. Selection of smoothness parameters for standard nonparametric statistical procedures is also based on cross-validation (Wood, 2006), but the approaches used are often more sophisticated than that presented here. Cross-validation involves using a subset of the data as a test data set. First, the remaining data (the training data set) are used to estimate the model parameters, and then these parameters are used to predict the test data set. The smoothness penalties that provide the predictions that are closest to the test data set are chosen as the best penalties. Due to the computational demands of the stock assessment model, we use hold-out cross-validation (Devroye and Wagner, 1979) rather than K-fold cross-validation (Geisser, 1975), Ordinary Cross-Validation (OCV; Stone, 1974), or Generalized Cross-Validation (GCV; Craven and Wahba, 1979), and test the sensitivity of the results to a different random selection of data for the test data set.
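The hold-out procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `holdout_split` and `select_by_holdout` are hypothetical helper names, and `toy_fit` and `toy_score` are a deliberately trivial stand-in (one value per age, shrunk toward the overall mean) for the stock assessment model and its catch-at-length likelihood, which are far more complex.

```python
import random

def holdout_split(n_items, test_frac=0.2, seed=1):
    """Randomly partition indices 0..n_items-1 into (train, test) lists."""
    rng = random.Random(seed)
    idx = list(range(n_items))
    rng.shuffle(idx)
    n_test = max(1, round(test_frac * n_items))
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def select_by_holdout(data, candidates, fit, score, test_frac=0.2, seed=1):
    """Fit each candidate penalty weight on the training data and
    return the one whose predictions are closest to the test data."""
    train_idx, test_idx = holdout_split(len(data), test_frac, seed)
    train = [data[i] for i in train_idx]
    test = [data[i] for i in test_idx]
    scores = {c: score(fit(train, c), test) for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Toy stand-in for the assessment model (an assumption for illustration only).
def toy_fit(train, weight):
    """Estimate one value per age, shrunk toward the overall mean by `weight`."""
    overall = sum(y for _, y in train) / len(train)
    sums, counts = {}, {}
    for age, y in train:
        sums[age] = sums.get(age, 0.0) + y
        counts[age] = counts.get(age, 0) + 1
    est = {a: (sums[a] + weight * overall) / (counts[a] + weight) for a in sums}
    return est, overall

def toy_score(model, test):
    """Sum of squared prediction errors on the test data (lower is better)."""
    est, overall = model
    return sum((y - est.get(age, overall)) ** 2 for age, y in test)

# Synthetic (age, observation) pairs; the values are invented for the example.
rng = random.Random(42)
data = [(a, 1.0 + 0.1 * a + rng.gauss(0.0, 0.5)) for a in range(8) for _ in range(5)]
best, scores = select_by_holdout(data, [0.01, 0.1, 1, 10], toy_fit, toy_score)
```

Rerunning `select_by_holdout` with a different `seed` mirrors the sensitivity test to a different random selection of test data.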
The catch-at-age (or -at-length) data are predicted when choosing the smoothness penalties. If selectivity is too flexible, the model may be fitting to noise (due to random sampling error or changes in growth or selectivity) in the training data set, and therefore will not be able to adequately predict the test data set. Intuitively, since catch-at-age data provide the majority of information about selectivity, using catch-at-age data rather than other data (e.g. indices of abundance) is appropriate.

A decision needs to be made about which data to leave out. The data could be left out for certain years, completely at random, or balanced by gear. Leaving out complete years for all gears may cause too much loss of information about some model parameters (e.g. recruitment strength). Leaving out too much data for one gear will cause too much loss of information about the selectivity parameters for that gear. Leaving out the data sets completely at random should provide, on average, a balance between maintaining data for all years and gears.

2.3. Bigeye tuna application

The stock assessment method used for the bigeye tuna application follows that used by Maunder and Harley (2002), with updated maturity-at-age data, and is based on the A-SCALA method of Maunder and Watters (2003). We randomly choose 20% of the catch-at-length data to be the test data set and use the catch-at-length likelihood function as the measure of closeness of the predicted values to the test data set. The likelihood function is a normal approximation to the multinomial (Fournier et al., 1990), with sample sizes fixed based on assumptions about the data. The EPO bigeye tuna assessment models nine fisheries for which selectivity curves are estimated. Seven of these fisheries are


Table 1
Equations defining the penalties applied to the selectivity curves.

Description | Equation
First difference | $\lambda^{1}_{g} \sum_{a=1}^{A-1} [\ell_{a+1} - \ell_{a} + 0.01]^{\gamma_{g}} [-\ln(s_{g,a}) + \ln(s_{g,a+1})]^{2}$ (1)
Second difference | $\lambda^{2}_{g} \sum_{a=1}^{A-2} [\ln(s_{g,a}) - 2\ln(s_{g,a+1}) + \ln(s_{g,a+2})]^{2}$ (2)
Third difference | $\lambda^{3}_{g} \sum_{a=1}^{A-3} [-\ln(s_{g,a}) + 3\ln(s_{g,a+1}) - 3\ln(s_{g,a+2}) + \ln(s_{g,a+3})]^{2}$ (3)
Scaling | $\left[\ln\left(\frac{\sum_{a=1}^{A} s_{g,a}}{A}\right)\right]^{2}$ (4)
Monotonic increasing | $\lambda^{mon}_{g} \sum_{a=1}^{A-1} [\ln(s_{g,a}) - \ln(s_{g,a+1})]^{2}$, summed only over ages for which $s_{g,a} > s_{g,a+1}$ (5)

Where $\lambda^{1}_{g}$, $\lambda^{2}_{g}$, and $\lambda^{3}_{g}$ are the weighting factors for the first, second, and third difference, respectively, for gear type $g$; $\gamma_{g}$ is the length-based weighting factor for gear type $g$; $s_{g,a}$ is the selectivity of gear $g$ for an individual of age $a$; $\ell_{a}$ is the mean length of an individual of age $a$; and $A$ is the maximum age in the model.
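The penalties in Table 1 are simple sums over ages, and a minimal Python sketch may make their behaviour easier to see. The function names, the example selectivities, and the mean lengths at age are hypothetical; the scaling and monotonic terms follow our reading of Eqs. (4) and (5) as penalties on the log of the mean selectivity and on decreasing steps only.

```python
import math

def difference_penalty(sel, lam, order):
    """lam times the sum of squared order-th differences of ln(selectivity)."""
    diffs = [math.log(s) for s in sel]
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return lam * sum(d * d for d in diffs)

def length_weighted_first_difference(sel, mean_len, lam, gamma):
    """First-difference penalty with the length-based weight of Eq. (1)."""
    total = 0.0
    for a in range(len(sel) - 1):
        weight = (mean_len[a + 1] - mean_len[a] + 0.01) ** gamma
        total += weight * (math.log(sel[a + 1]) - math.log(sel[a])) ** 2
    return lam * total

def scaling_penalty(sel):
    """Squared log of the mean selectivity; zero when the mean is one."""
    return math.log(sum(sel) / len(sel)) ** 2

def monotonic_penalty(sel, lam_mon):
    """Penalize only the age steps at which selectivity decreases."""
    return lam_mon * sum(
        (math.log(s0) - math.log(s1)) ** 2
        for s0, s1 in zip(sel, sel[1:])
        if s0 > s1
    )

# Hypothetical dome-shaped selectivity and mean lengths at age (illustrative values).
sel = [0.2, 0.8, 1.6, 1.8, 1.4, 0.2]
mean_len = [40.0, 70.0, 95.0, 115.0, 130.0, 140.0]

penalties = {
    "first": difference_penalty(sel, 0.1, 1),
    "second": difference_penalty(sel, 0.1, 2),
    "third": difference_penalty(sel, 0.1, 3),
    "first_length": length_weighted_first_difference(sel, mean_len, 0.1, -1.0),
    "scaling": scaling_penalty(sel),
    "monotonic": monotonic_penalty(sel, 1000.0),
}
```

With a negative `gamma`, ages whose mean lengths are close together receive a larger weight, so selectivity is smoothed more strongly where growth has slowed; with `gamma = 0` the length-weighted form reduces to the plain first difference.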

purse-seine fisheries and two are longline fisheries. It would be an arduous task to determine the appropriate smoothness parameters for each fishery separately. This is particularly true for a complex, computer-intensive assessment such as the EPO bigeye tuna assessment, which takes about 1 h to run on a modern desktop computer. However, it is reasonable to assume that one set of penalties may apply to all purse-seine fisheries and another set of penalties to all longline fisheries. There are still many possible combinations of weights on the first, second, and third differences, weights for monotonic increases and length-based weighting, and the two gear types. Therefore, some simplification is needed. The first simplification is to apply cross validation separately for each gear type by calculating the cross-validation score for each gear type separately, but still fitting to catch-at-length data for all fisheries. This may cause a sub-optimal choice of weights because the smoothness penalties for one gear type affect the fit to the catch-at-length data of the other gear type. The monotonic penalty is appropriate only for the southern longline fishery, as it catches the largest fish. The selectivities for the first few ages for both types of fisheries and the last few ages for the purse-seine fisheries were set to zero.

The weights are selected as follows:
(a) the model is applied with the weights for each gear type (longline and purse seine) at 0.01, 0.05, 0.1, 0.5, 1, 2, and 10 independently for the first, second, and third differences and for the first difference with length-based weighting (set at −1 to give an inverse relationship), with the monotonic penalty on the southern longline fishery (set at 1000);
(b) the model is then run for the best penalty for the longline gear with monotonic weighting factors of 0, 1, 10, and 100;
(c) the best weights for the first, second, and third differences are chosen and set; and
(d) the best single penalty for longline and the best single penalty for purse seine are selected.
We also investigate the effect of a different randomly selected test data set.

For each run of the model, we calculated the cross-validation score for each gear type as well as some management quantities. We also present a scaled cross-validation score. The average sample size differs among gear types because there are more purse-seine fisheries than longline fisheries. A simple total will therefore favor purse-seine length-frequency data. Therefore, we calculate an alternative total that scales the purse-seine and longline cross-validation scores, respectively, by the minimum cross-validation score by gear type calculated over all runs. The management quantities we present are MSY, spawning biomass at MSY as a ratio of the unexploited spawning biomass (SMSY/S0), current spawning biomass (Scur), current spawning biomass as a ratio of the spawning biomass at MSY (Scur/SMSY), and the current fishing mortality rate as a ratio of the fishing mortality rate at MSY (Fscale).

Table 2
Results from the bigeye tuna application with different weighting factors for the first difference, second difference, third difference, and first difference with a length-based penalty. PS = purse seine; LL = longline. Asterisks mark the minimum values of the cross-validation (C-V) scores.

Penalty | Weight | PS C-V | LL C-V | Total C-V | Scaled C-V | MSY | SMSY/S0 | Scur | Scur/SMSY | Fscale
First difference | 0.01 | 108.28 | 0.64 | 108.91 | 2.81 | 69808 | 0.18 | 60931 | 2.44 | 1.61
First difference | 0.05 | 105.77 | 2.04 | 107.81 | 6.6 | 67923 | 0.18 | 68775 | 2.5 | 1.63
First difference | 0.1 | 105.82 | 2.19 | 108.02 | 7.01 | 62809 | 0.19 | 53548 | 2.08 | 1.38
First difference | 0.5 | 110.01 | 4.25 | 114.26 | 12.64 | 58845 | 0.22 | 75102 | 2.15 | 1.33
First difference | 1 | 114.38 | 4.05 | 118.43 | 12.14 | 58043 | 0.22 | 62398 | 1.86 | 1.17
First difference | 2 | 123.81 | 5.18 | 128.98 | 15.3 | 63046 | 0.28 | 37475 | 0.97 | 0.7
First difference | 10 | 150.04 | 14.44 | 164.49 | 40.74 | 68083 | 0.27 | 31342 | 0.83 | 0.64
Second difference | 0.01 | 107.25 | 3.08 | 110.33 | 9.44 | 68315 | 0.18 | 58864 | 2.32 | 1.56
Second difference | 0.05 | 103.97 | 1.79 | 105.76 | 5.89 | 65261 | 0.19 | 55908 | 2.16 | 1.44
Second difference | 0.1 | 104.02 | 1.86 | 105.88 | 6.1 | 64456 | 0.19 | 55857 | 2.14 | 1.43
Second difference | 0.5 | 102.54 | 2.66 | 105.2 | 8.24 | 59158 | 0.21 | 71953 | 2.15 | 1.36
Second difference | 1 | 105.81 | 3.22 | 109.03 | 9.79 | 59571 | 0.22 | 85358 | 2.32 | 1.43
Second difference | 2 | 110.54 | 3.54 | 114.08 | 10.71 | 60404 | 0.21 | 89723 | 2.4 | 1.48
Second difference | 10 | 119.99 | 5.26 | 125.26 | 15.5 | 60261 | 0.21 | 87318 | 2.34 | 1.42
Third difference | 0.01 | 104.25 | 2.88 | 107.13 | 8.86 | 67623 | 0.18 | 57823 | 2.3 | 1.56
Third difference | 0.05 | 106.26 | 2.67 | 108.94 | 8.32 | 68615 | 0.18 | 63935 | 2.5 | 1.68
Third difference | 0.1 | 103.74 | 2.68 | 106.42 | 8.31 | 67011 | 0.18 | 58909 | 2.33 | 1.58
Third difference | 0.5 | 101.08 | 1.79 | 102.87* | 5.87 | 61610 | 0.19 | 59288 | 2.09 | 1.37
Third difference | 1 | 100.87* | 2.37 | 103.23 | 7.43 | 60370 | 0.2 | 66909 | 2.13 | 1.37
Third difference | 2 | 101.88 | 2.43 | 104.31 | 7.61 | 60612 | 0.2 | 71113 | 2.2 | 1.41
Third difference | 10 | 108.1 | 3.1 | 111.2 | 9.51 | 60862 | 0.21 | 84293 | 2.39 | 1.49
First difference with length-based penalty | 0.01 | 108.1 | 4.32 | 112.42 | 12.81 | 70579 | 0.17 | 53721 | 2.33 | 1.6
First difference with length-based penalty | 0.05 | 107.31 | 0.39 | 107.71 | 2.13 | 70070 | 0.17 | 63905 | 2.54 | 1.66
First difference with length-based penalty | 0.1 | 105.58 | 0.37* | 105.95 | 2.05* | 68628 | 0.17 | 61016 | 2.47 | 1.6
First difference with length-based penalty | 0.5 | 107.91 | 2.79 | 110.71 | 8.66 | 67222 | 0.17 | 64810 | 2.55 | 1.66
First difference with length-based penalty | 1 | 108.04 | 2.91 | 110.95 | 8.98 | 66481 | 0.17 | 62616 | 2.52 | 1.64
First difference with length-based penalty | 2 | 109.25 | 3.16 | 112.41 | 9.67 | 65830 | 0.17 | 59367 | 2.49 | 1.61
First difference with length-based penalty | 10 | 128.9 | 4.75 | 133.65 | 14.18 | 63047 | 0.23 | 33239 | 1.16 | 0.87

Table 3
Results from the bigeye tuna application with the weighting factor for the first difference set at 0.1, the length-based penalty, and different weighting factors for the southern longline monotonic penalty. Asterisks mark the minimum values of the cross-validation (C-V) scores.

Monotonic weight | PS C-V | LL C-V | Total C-V | Scaled C-V | MSY | SMSY/S0 | Scur | Scur/SMSY | Fscale
0 | 111.00 | 2.45 | 113.45 | 7.75 | 115697 | 0.17 | 217048 | 4.75 | 4.10
1 | 111.89 | 2.55 | 114.43 | 8.04 | 80619 | 0.17 | 112435 | 3.57 | 2.44
10 | 110.99 | 1.74 | 112.73 | 5.84 | 79215 | 0.18 | 116597 | 3.43 | 2.26
100 | 106.34 | 3.00 | 109.35 | 9.22 | 63939 | 0.18 | 47127 | 1.97 | 1.36
1000 | 105.58* | 0.37* | 105.95* | 2.05* | 68628 | 0.17 | 61016 | 2.47 | 1.60

3. Results

In summary, the weighting factors chosen differed depending on the gear (purse seine or longline), the cross-validation

measure (scaled or unscaled), or whether a length-based penalty was applied. There were also substantial differences in the estimates of the management parameters for the different weighting factors. The differences in the estimates of the management parameters were greater for the first differences than for the second and third differences (Table 2). The cross-validation scores were substantially worse for the weighting factors that gave estimates of management quantities substantially different from those estimated from the best weighting factors.

The smallest cross-validation score for the purse-seine gear occurred for the third difference with a weighting factor of 1 (Table 2). The smallest cross-validation score for the longline gear occurred for the first difference with a weighting factor of 0.1 and a length-based penalty (Table 2). The smallest cross-validation score for both gears combined occurred for the third difference with a weighting factor of 0.5 (Table 2). The smallest scaled cross-validation score for both gears combined occurred for the first difference with a weighting factor of 0.1 with a length-based penalty (Table 2). The unscaled cross-validation score is highly weighted toward purse-seine catch-at-length data. For the first difference with a weighting factor of 0.1 and a length-based penalty, the smallest purse-seine cross-validation, longline cross-validation, and scaled cross-validation scores occurred for a monotonic penalty weighting factor of 1000 (Table 3). There is a substantial difference in all the estimates of management quantities, except SMSY/S0, between the monotonic (weighting factor of 1000) and the nonmonotonic (weighting factor of 0) selectivities for the southern longline fishery.

Table 4
Results from the bigeye tuna application with no selectivity smoothness penalties (None); with the best weights for each difference penalty for each gear simultaneously (Combined1: for purse seine, second difference with a weighting factor of 0.5, third difference with a weighting factor of 1.0, and first difference with a weighting factor of 0.1 and a length-based penalty; for longline, second difference with a weighting factor of 0.05, third difference with a weighting factor of 0.5, first difference with a weighting factor of 0.1 and a length-based penalty, and a monotonic penalty weighting factor of 1000); and with the best penalties for purse seine and longline simultaneously (Combined2: third difference with a weighting factor of 1.0 for purse seine; first difference with a weighting factor of 0.1 and a length-based penalty, and a monotonic penalty weighting factor of 1000, for longline).

Run | PS C-V | LL C-V | Total C-V | Scaled C-V | MSY | SMSY/S0 | Scur | Scur/SMSY | Fscale
None | 109.56 | 3.36 | 112.92 | 10.21 | 65726 | 0.21 | 31153 | 1.23 | 0.95
Combined1 | 103.33 | 2.57 | 105.90 | 8.02 | 59113 | 0.21 | 68163 | 2.09 | 1.33
Combined2 | 105.12 | 2.62 | 107.73 | 8.16 | 67424 | 0.19 | 78230 | 2.63 | 1.70

Fig. 1. Selectivity curves for a purse-seine fishery with penalties based on (A) the smallest cross-validation score for purse seine (third difference with a weighting factor of 1), (B) the smallest cross-validation score for longline, which is also the smallest scaled cross-validation score (first difference with a weighting factor of 0.1 and a length-based penalty), (C) the smallest cross-validation score for both gears combined (third difference with a weighting factor of 0.5), (D) no penalties, (E) the highest cross-validation score (first difference with a weighting factor of 10), and (F) the best penalties for purse seine and longline simultaneously (third difference with a weighting factor of 1.0 for purse seine and the first difference with a weighting factor of 0.1 and a length-based penalty, and a monotonic penalty weighting factor of 1000, for longline). Each panel plots selectivity against age.

Fig. 2. Selectivity curves for a longline fishery with penalties based on (A) the smallest cross-validation score for purse seine (third difference with a weighting factor of 1), (B) the smallest cross-validation score for longline, which is also the smallest scaled cross-validation score (first difference with a weighting factor of 0.1 and a length-based penalty), (C) the smallest cross-validation score for both gears combined (third difference with a weighting factor of 0.5), (D) no penalties, (E) the highest cross-validation score (first difference with a weighting factor of 10), and (F) the best penalties for purse seine and longline simultaneously (third difference with a weighting factor of 1.0 for purse seine and the first difference with a weighting factor of 0.1 and a length-based penalty, and a monotonic penalty weighting factor of 1000, for longline). Each panel plots selectivity against age.

When the best of each of the first, second, and third differences are combined, the resulting cross-validation scores are worse (Table 4). This is also true when the best single penalty for purse seine is used for purse seine at the same time as the best single penalty for longline is used for longline (Table 4). This indicates

that the weights used for purse seine influence the resulting cross-validation score for longline and vice versa. The estimates of management parameters for these two sets of penalties are, however, only moderately different from the estimates obtained when using the penalties that give the lowest cross-validation scores.

The estimates of selectivity differ substantially depending on the weights (Figs. 1 and 2). Penalties that are too small can cause the selectivity curve to be rough (Fig. 1D), with unrealistic changes in selectivity from one age to the next. Penalties that are too large can cause the selectivity curve to be too limited in range (Fig. 1E) and not select for some ages seen in the catch-at-length data. The effects are somewhat different for the longline fishery, which has the additional monotonic penalty, and for which most of the differences in estimated selectivity among penalties occur at the older ages (Fig. 2).

The runs using a different random set of catch-at-length data for the test data set show that there can be a switch in the order of the best weighting factors for weights for which the cross-validation scores are similar. However, these weighting factors give similar estimates for the management quantities.

4. Discussion

We used hold-out cross validation model selection to determine the most appropriate form of selectivity curve, using a nonparametric approach to represent selectivity, and applied this method to data for bigeye tuna in the EPO. We found that the estimated management quantities were relatively robust within the set of smoothness penalties that gave low cross-validation scores. We also found that poor choices of the smoothness penalties could give very different results. Poor choices include both under-smoothing (e.g. no penalties) and over-smoothing. The most influential factor was the inclusion of a monotonic penalty for the southern longline fishery.

The best method for the longline fisheries was the first difference penalty with a length-based penalty and a monotonic penalty. The length-based penalty for the longline gear makes sense, because longlines catch a wider range of sizes of fish, including the larger fish, which have a lower growth rate, and thus consecutive ages should have similar selectivities. The monotonic penalty was also expected, because the southern longline gear catches the largest individuals. The best method for purse seine was the third difference, which makes sense, because a dome-shaped curve is expected for these gears, which do not catch the largest fish.

We used hold-out cross-validation rather than other forms of cross-validation due to the computational demands of the stock assessment model. A different randomly selected data set produced a switch in the order of the best weighting factors for weights for which the cross-validation scores are similar. Despite the management parameters being reasonably robust to the prediction data set, scaling of the cross-validation score, and other factors, our application of cross-validation should be considered an initial investigation into the use of cross-validation for estimating weighting factors for smoothness penalties of nonparametric selectivity curves. Alternative approaches based on commonly used methods in nonparametric statistics (Wood, 2006) and other fields of research should be investigated.
Ordinary cross-validation (OCV), in which one data set is left out at a time, may be too computationally intensive for some applications, and it is not clear if generalized cross-validation (GCV) is applicable. K-fold cross-validation, which repeatedly retains a percentage of the data set for prediction, may be appropriate to balance the variance-bias tradeoff (Arlot and Celisse, 2010). The performance of cross-validation differs among applications, and the methods used will be a tradeoff among bias, variance, and computational complexity (Arlot and Celisse, 2010). Case-specific analyses may be needed to determine the best cross-validation method to use.

Future work should involve comprehensive simulation studies to test the method and to determine if general recommendations for weighting factors and the form of smoothness penalties can be developed. Some questions that remain to be answered include (1) what form of cross-validation should be used; (2) should penalties be placed on selectivity or the logarithm of the selectivity; (3) are there general guidelines on what smoothness penalties and weighting factors should be used; and (4) how should multiple smoothness penalties be tested simultaneously.

Cross validation could also be used to select among functional forms or to select between the nonparametric selectivity method we present here and functional forms. It might be found that simple parametric forms are appropriate for many applications. The implementation of cross validation is very simple, and it can therefore be a standard procedure for selecting selectivity or other components of the stock assessment model. For example, we have also used it to select among different methods to model temporal variability in growth, using penalized likelihood versions of random effects.
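As a small worked illustration of the selection step, the scaled cross-validation score defined in Section 2.3 can be recomputed from Table 2. The sketch below assumes that the scaled total is the sum of each gear's score divided by that gear's minimum over all runs; this reading is ours, but it reproduces the tabulated scaled scores to within rounding. The function name and run labels are hypothetical, and only four of the runs are included (among them the runs holding the overall purse-seine and longline minima, 100.87 and 0.37).

```python
def scaled_cv_scores(ps_scores, ll_scores):
    """Sum of each gear's score divided by that gear's minimum over the runs."""
    ps_min = min(ps_scores.values())
    ll_min = min(ll_scores.values())
    return {run: ps_scores[run] / ps_min + ll_scores[run] / ll_min
            for run in ps_scores}

# Purse-seine (PS) and longline (LL) scores for four runs, copied from Table 2.
ps = {
    "first difference, 0.01": 108.28,
    "third difference, 0.5": 101.08,
    "third difference, 1": 100.87,
    "first difference + length, 0.1": 105.58,
}
ll = {
    "first difference, 0.01": 0.64,
    "third difference, 0.5": 1.79,
    "third difference, 1": 2.37,
    "first difference + length, 0.1": 0.37,
}

scaled = scaled_cv_scores(ps, ll)
best_total = min(ps, key=lambda run: ps[run] + ll[run])   # unscaled total
best_scaled = min(scaled, key=scaled.get)                 # scaled total
```

The unscaled total is dominated by the much larger purse-seine scores and picks the third difference with a weight of 0.5, whereas the scaled total picks the length-weighted first difference with a weight of 0.1, in agreement with the Results.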

Acknowledgements

William Bayliff, Cleridy Lennert-Cody, Richard Deriso, Andre Punt, and two anonymous reviewers provided comments that improved the manuscript.

References

Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (Eds.), 2nd International Symposium on Information Theory. Publishing House of the Hungarian Academy of Sciences, Budapest, pp. 268–281. Reprinted in 1992 in: Kotz, S., Johnson, N. (Eds.), Breakthroughs in Statistics, vol. 1. Springer Verlag, New York, pp. 610–624.
Arlot, S., Celisse, A., 2010. A survey of cross-validation procedures for model selection. Stat. Surveys 4, 40–79.
Craven, P., Wahba, G., 1979. Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377–403.
Devroye, L., Wagner, T.J., 1979. Distribution-free performance bounds for potential function rules. IEEE Trans. Inform. Theory 25, 601–604.
Fournier, D.A., Sibert, J.R., Majkowski, J., Hampton, J., 1990. MULTIFAN a likelihood-based method for estimating growth parameters and age-composition from multiple length frequency data sets illustrated using data for southern bluefin tuna (Thunnus maccoyii). Can. J. Fish. Aquat. Sci. 47, 301–317.
Fournier, D.A., Hampton, J., Sibert, J.R., 1998. MULTIFAN-CL: a length-based, age-structured model for fisheries stock assessment, with application to South Pacific albacore, Thunnus alalunga. Can. J. Fish. Aquat. Sci. 55, 2105–2116.
Geisser, S., 1975. The predictive sample reuse method with applications. J. Am. Stat. Assoc. 70, 320–328.
Haist, V., Fournier, D., Saunders, M.W., 1999. Reconstruction of B.C. sablefish stocks, 1966–1998, and catch projections for 1999, using an integrated catch-age mark-recapture model with area and depth movement. Canadian Stock Assessment Secretariat Research Document 99/79, http://www.dfompo.gc.ca/csas/Csas/DocREC/1999/pdf/99 079e.pdf.
Helu, S.L., Sampson, D.B., Yin, Y., 2000. Application of statistical model selection criteria to the stock synthesis assessment program. Can. J. Fish. Aquat. Sci. 57, 1784–1793.
Hilborn, R., Mangel, M., 1997. The Ecological Detective: Confronting Models with Data. Princeton University Press, Princeton.
Hilborn, R., Maunder, M.N., Parma, A., Ernst, B., Paynes, J., Starr, P.J., 2000. Documentation for a General Age-Structured Bayesian Stock Assessment Model: Code Named Coleraine. Fisheries Research Institute, University of Washington, FRI/UW 00/01, http://www.fish.washington.edu/research/coleraine/pdf/coleraine.pdf.
Ianelli, J.N., 2002. Simulation analysis testing the robustness of productivity determinations from west coast Pacific Ocean perch stock assessment data. North Am. J. Fish. Man. 22, 301–310.
Maunder, M.N., 2002. Allocation of effort among fishing methods: its relation to MSY and other considerations. Fish Fish. 3, 251–260.
Maunder, M.N., Harley, S.J., 2002. Status of bigeye tuna in the eastern Pacific Ocean. Inter-American Tropical Tuna Commission Stock Assessment Report 3, 201–311.
Maunder, M.N., Watters, G.M., 2003. A-SCALA: an age-structured statistical catch-at-length analysis for assessing tuna stocks in the eastern Pacific Ocean. Inter-Am. Trop. Tuna Comm. Bull. 22, 433–582.
Punt, A.E., Walker, T.I., 1998. Stock assessment and risk analysis for the school shark (Galeorhinus galeus) off southern Australia. Mar. Freshw. Res. 49, 719–731.
Schwarz, G., 1978. Estimating the dimension of a model. Ann. Stat. 6, 461–464.
Sinclair, A.F., 1993. Partial recruitment considerations in setting catch quotas. Can. J. Fish. Aquat. Sci. 50, 734–742.
Smith, A.D.M., Punt, A.E., 1998. Stock assessment of gemfish (Rexea solandri) in eastern Australia using maximum likelihood and Bayesian methods. In: Funk, F., Quinn II, T.J., Heifetz, J., Ianelli, J.N., Powers, J.E., Schweigert, J.J., Sullivan, P.J., Zhang, C.I. (Eds.), Fishery Stock Assessment Models (Proceedings of the International Symposium on Fishery Stock Assessment Models for the 21st Century, October 8–11, 1997, Anchorage, Alaska). Alaska Sea Grant College Program Report No. AK-SG-98-01. University of Alaska Fairbanks, pp. 245–286.
Stone, M., 1974. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B 36, 111–147.
Thompson, G.G., 1994. Confounding of gear selectivity and the natural mortality rate in cases where the former is a nonmonotone function of age. Can. J. Fish. Aquat. Sci. 51, 2654–2664.
Watters, G.M., Maunder, M.N., 2001. Status of bigeye tuna in the eastern Pacific Ocean. In: Inter-American Tropical Tuna Commission Stock Assessment Report 1, pp. 109–211.
Wood, S.N., 2006. Generalized Additive Models: An Introduction with R. Chapman and Hall.