Fisheries Research 106 (2010) 112–114
Short communication
Inverse prediction for fish length–length conversion
Yuk W. Cheng ∗
Washington Department of Fish and Wildlife, 1111 Washington St. S, Olympia, WA 98504, USA
Article info
Article history: Received 20 October 2009; received in revised form 12 July 2010; accepted 14 July 2010
Keywords: Confidence intervals; Bias; Inverse prediction; Sea cucumber
Abstract
Inverse prediction is a common method used in ecology, marine fish stock assessment, forest research, and many other biological fields. It is unlikely, however, that inverse prediction is unbiased if data are not available for refitting. We propose an inverse prediction method to estimate the linear regression coefficient in the absence of an intercept, along with 95% confidence intervals. The proposed method uses the linear regression estimate, its standard deviation, and basic data statistics. We compare an existing inverse prediction method (Seber, 1977, p. 192), with the proposed method using a sea cucumber dataset. The proposed method provides results closer to actual known values and can also estimate the variance of the slope of inverse regression. © 2010 Elsevier B.V. All rights reserved.
1. Introduction
Conversion between two types of measurement is often useful when, for example, results from a biological analysis must be translated into a management control. Total length (TL) is often used as a management control, while fork length (FL) is generally a more accurate measure for biological analyses of marine species. The need for such conversion is not unique to fisheries. Other examples include: stem diameter to crown diameter in tropical ecology research (Silman and Krisel, 2006); beat-sampling mite density to total mite density in entomology (Shrewsbury and Hardin, 2004); sex ratio to temperature in turtles (Valenzuela, 2001); visual fraction of an insect population to the total insect population (Yoo et al., 2003); and total length to fork length in fisheries stock assessment (Kahn et al., 2004).
It is not uncommon to want the value of X given Y when the available relationship is a regression of Y on X through the origin. The inverse of the slope of the regression of Y on X will typically be a biased estimator, since it is usual to ascribe error only to the dependent variable in such regressions. If the original data are available, they should be used; when they are not, some sensible approximate correction is needed. Since the original data are seldom available, a method is needed that relies only on statistics that are likely to be reported. Estimates of the means and variances of X and Y are obvious candidates. The objective of this note is to propose a pragmatic correction based upon these statistics and to demonstrate its use for giant red sea cucumber, Parastichopus californicus, body length to body diameter conversion.
∗ Tel.: +1 360 9022689. 0165-7836/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.fishres.2010.07.003
2. Materials and methods
2.1. The existing method
Assume there are n pairs of observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, where X and Y follow a linear relationship and Y is zero when X is zero. It is straightforward to regress Y on X and obtain the estimated slope and its standard deviation. With these results and data statistics (means, variances, and number of observations), we can predict a new Y from an additional X measurement. If there is a need to convert a Y measurement back to an X and data are available, we can simply regress X on Y. But data may not always be available, and the inverse prediction method is commonly used to predict a value of X from a measured Y without data (Seber, 1977, p. 192). We can write the original relationship as follows:
$$Y_i = bX_i + \varepsilon_i,$$
where $i = 1, 2, \ldots, n$ and $\varepsilon_i \sim \mathrm{NID}(0, \sigma_\varepsilon^2)$; NID means normally and independently distributed. The inverse prediction of X from a measured $Y_p$ is (Seber, 1977, p. 192):
$$\hat{X}_p = \frac{Y_p}{\hat{b}}.$$
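As a minimal sketch of this existing method, the snippet below divides a measured Y by the reported slope. The function name is hypothetical, and the slope 4.84 and the three body lengths are the rounded values reported for the sea cucumber example in Section 2.3 and Table 1, so the output need not match the table to the last digit.

```python
def classical_inverse_prediction(y_p, b_hat):
    """Predict X from a measured Y as Y_p / b_hat (regression of Y on X through the origin)."""
    if b_hat == 0:
        raise ValueError("b_hat must be non-zero")
    return y_p / b_hat


# Rounded slope from Section 2.3; body lengths (mm) are the three points used in Table 1.
B_HAT = 4.84
for y_p in (190, 300, 385):
    print(y_p, round(classical_inverse_prediction(y_p, B_HAT), 3))
```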
2.2. The proposed method
A prediction, $\hat{X}_p$, from $Y_p$ can be obtained using the model:
$$X_i = \beta Y_i + \delta_i,$$
where $\delta_i \sim \mathrm{NID}(0, \sigma_\delta^2)$.
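When the paired observations are available, the slope of this no-intercept model has the usual closed-form least-squares estimate, the sum of $X_iY_i$ divided by the sum of $Y_i^2$. The sketch below illustrates that direct fit; the function name and the five data pairs are made up for illustration and are not the sea cucumber data.

```python
def fit_slope_through_origin(y, x):
    """Least-squares slope beta for x = beta * y with no intercept."""
    if len(x) != len(y) or not y:
        raise ValueError("x and y must be non-empty and of equal length")
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_yy = sum(yi * yi for yi in y)
    return sum_xy / sum_yy


# Hypothetical body lengths (y, mm) and diameters (x, mm), for illustration only.
y_obs = [190, 250, 300, 340, 385]
x_obs = [38, 49, 61, 67, 77]
beta_hat = fit_slope_through_origin(y_obs, x_obs)
print(beta_hat, beta_hat * 300)  # slope and predicted diameter at y = 300 mm
```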
Table 1
Inverse prediction of body diameter from body length for sea cucumber.

| Body length $Y_i$ (mm) | Method | Body diameter $\hat{X}_i$ (mm) [bias] | Lower 95% CI [bias] | Upper 95% CI [bias] |
|---|---|---|---|---|
| Minimum = 190 | I | 37.581 | 12.185 | 62.977 |
| | II | 39.222 [4.368%] | 12.368 [9.708%] | 65.361 [3.785%] |
| | III | 37.591 [0.025%] | 12.184 [−0.013%] | 62.998 [0.032%] |
| Median = 300 | I | 59.339 | 33.794 | 84.884 |
| | II | 61.931 [4.368%] | 35.998 [6.524%] | 88.311 [4.037%] |
| | III | 59.354 [0.025%] | 33.798 [0.011%] | 84.910 [0.031%] |
| Maximum = 385 | I | 76.151 | 50.446 | 101.856 |
| | II | 79.478 [4.368%] | 53.437 [5.929%] | 106.093 [4.159%] |
| | III | 76.710 [0.025%] | 50.454 [0.016%] | 101.886 [0.030%] |
Given the statistics $\hat{b}$, $\sigma_b^2$, $\bar{X}$, $\bar{Y}$, $\sigma_X^2$ and $\sigma_Y^2$ from the original model, we can derive an estimator for the coefficient $\beta$:
$$\hat{\beta} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} Y_i^2}, \qquad (1)$$
and approximate Eq. (1) as follows:
$$\hat{\beta} = \hat{b}\,\frac{\bar{X}^2 + \frac{n-1}{n}\sigma_X^2}{\bar{Y}^2 + \frac{n-1}{n}\sigma_Y^2} \approx \hat{b}C, \qquad (2)$$
where $C = (\bar{X}^2 + \sigma_X^2)/(\bar{Y}^2 + \sigma_Y^2)$. The first equality in Eq. (2) holds because $\hat{b} = \sum_{i=1}^{n} X_i Y_i / \sum_{i=1}^{n} X_i^2$ and $\sum_{i=1}^{n} X_i^2 = n\bar{X}^2 + (n-1)\sigma_X^2$ when $\sigma_X^2$ is the sample variance, and likewise for Y. Eq. (1) is equivalent to equation (6) in Lwin and Maritz (1980). If the original data are available, the term $\sum_{i=1}^{n} X_i Y_i$ of Eq. (1) can be calculated directly. It is unlikely, however, that $\sum_{i=1}^{n} X_i Y_i$ can be obtained from reports or studies, so in practice it is very difficult to implement the method of Lwin and Maritz (1980). By contrast, $\hat{b}$, $\bar{X}$, $\bar{Y}$, $\sigma_X^2$ and $\sigma_Y^2$ are much more likely to be found in reports or studies. Therefore, it is feasible to estimate $\hat{\beta}$ using Eq. (2) instead of Eq. (1). Similarly, we can derive the variance of $\hat{\beta}$,
$$\sigma_\beta^2 = \sigma_b^2 C^2.$$
The predicted $100(1-\alpha)\%$ CI for $\hat{X}_p$ is:
$$\hat{\beta}Y_p \pm t_{1-\alpha/2,\,n-1}\sqrt{S_{XY}^2\left(1 + \frac{Y_p^2}{\sum_{i=1}^{n} Y_i^2}\right)} \approx \hat{b}CY_p \pm t_{1-\alpha/2,\,n-1}\,\sigma_b C\sqrt{n(\bar{Y}^2 + \sigma_Y^2) + Y_p^2},$$
where $S_{XY}^2$ is the residual variance of the regression of X on Y.
2.3. Example: sea cucumber
We compared estimates and 95% CIs obtained by fitting $X_i = \beta Y_i + \delta_i$ to the original data (Method I), by an existing inverse prediction method using summary statistics (Seber, 1977, p. 192; Method II), and by the proposed method using summary statistics (Method III), using data on sea cucumber body length and width. We measured the relative bias of $Z_1$ to $Z_0$ as follows:
$$\frac{Z_1 - Z_0}{Z_0} \times 100\%.$$
Fifty sea cucumbers were collected monthly in Puget Sound, USA, between 2005 and 2006. During lab analysis, each sea cucumber was removed from seawater in a contracted and turgid state and allowed to rest on a flat surface for 20 min so it could straighten itself and expel most of the sand it contained (Desurmont, 2003). Total body length and widest diameter measurements were taken to the nearest millimeter. The body diameter (X) and body length (Y) relationship $\hat{Y} = \hat{b}X$ was obtained, with the estimated regression coefficient $\hat{b}\,(\sigma_b) = 4.84\,(0.15)$. Basic data statistics were $n = 50$, $\bar{X} = 59.98$ mm, $\bar{Y} = 298.70$ mm, $\sigma_X^2 = 114.72$ mm² and $\sigma_Y^2 = 1672.23$ mm². The estimated coefficient of determination ($R^2$) was 0.95.
3. Results
Table 1 shows the inverse prediction results for sea cucumber widest diameter (X) from body length (Y) at three points in the dataset: minimum, median and maximum body length. Method II produced a prediction at the minimum that was biased by 4.37% compared with the model based on the original data (Method I). In contrast, the bias from Method III was less than 0.03%. For the lower 95% CI at the minimum body length, Method II gave a limit with 9.71% bias, while the bias of Method III was within ±0.03%. Method III clearly outperformed Method II, and its estimates are very close to the inverse prediction obtained with the original data.
4. Discussion
Inverse prediction is not only used in ecology and marine research; it is also commonly used in chemistry (Tellinghuisen, 2000). Theoretically, the proposed method is equivalent to the actual inverse prediction with known data when the sample size approaches infinity. The proposed method is simple and easy to implement, and does not require the term $\sum_{i=1}^{n} X_i Y_i$. Both the estimated slope of the inverse regression and the variance of that slope can be estimated using the proposed constant C ($\hat{b}C$ and $\sigma_b^2 C^2$, respectively). The variance of the estimated slope and the statistics of the predictor and response variables are often not reported in publications, in which case the proposed method cannot be used. This study demonstrates how important it is to report these summary statistics. If an estimated linear regression is available but data statistics are not, they might be collected independently from the field or gleaned from reports and studies. Both linear regression and the proposed inverse prediction method are based on the assumption that X and Y are normally distributed. If this is not the case, Methods II and III may lead to biased results, and it will be very difficult to assess the bias of the proposed inverse prediction method if the bias from the linear regression model is unknown.
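To illustrate how little input the proposed method needs, the following sketch applies Eq. (2) and the approximate interval of Section 2.2 to the summary statistics reported in Section 2.3. The function names are hypothetical, and the Student-t critical value is passed in by the caller rather than computed (about 2.0096 for a 95% interval with 49 degrees of freedom). Because the published slope and its standard deviation are rounded, the output should only roughly track the Method III rows of Table 1.

```python
import math


def correction_constant(mean_x, mean_y, var_x, var_y):
    """C = (mean_x^2 + var_x) / (mean_y^2 + var_y), as defined after Eq. (2)."""
    return (mean_x ** 2 + var_x) / (mean_y ** 2 + var_y)


def proposed_inverse_prediction(y_p, b_hat, sd_b, n, mean_x, mean_y,
                                var_x, var_y, t_crit):
    """Return (x_hat, lower, upper) from summary statistics only.

    x_hat uses beta_hat ~= b_hat * C; the half-width is
    t_crit * sd_b * C * sqrt(n * (mean_y^2 + var_y) + y_p^2).
    """
    c = correction_constant(mean_x, mean_y, var_x, var_y)
    x_hat = b_hat * c * y_p
    half_width = t_crit * sd_b * c * math.sqrt(n * (mean_y ** 2 + var_y) + y_p ** 2)
    return x_hat, x_hat - half_width, x_hat + half_width


def relative_bias(z1, z0):
    """Relative bias of z1 with respect to z0, in percent."""
    return (z1 - z0) / z0 * 100.0


# Rounded summary statistics from Section 2.3; T_CRIT is an assumed, caller-supplied value.
STATS = dict(b_hat=4.84, sd_b=0.15, n=50, mean_x=59.98, mean_y=298.70,
             var_x=114.72, var_y=1672.23)
T_CRIT = 2.0096

for y_p in (190, 300, 385):
    x_hat, lower, upper = proposed_inverse_prediction(y_p, t_crit=T_CRIT, **STATS)
    print(y_p, round(x_hat, 2), round(lower, 2), round(upper, 2))
```

Passing the critical value as an argument keeps the sketch free of third-party dependencies; a statistics library could supply it instead.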
Acknowledgements
The author thanks Michael Ulrich for providing the sea cucumber data, and Aaron Ellison, Kurt Reidinger, Andre Punt and an anonymous reviewer for reading the manuscript and offering many useful suggestions.
References
Desurmont, A., 2003. Papua New Guinea sea cucumber and beche-de-mer identification cards. SPC Beche-de-mer Information Bulletin 18, 8–14.
Kahn, R.G., Pearson, D.E., Dick, E.J., 2004. Comparison of standard length, fork length and total length for measuring west coast marine fishes. Marine Fisheries Review 66 (1), 31–33.
Lwin, T., Maritz, J.S., 1980. A note on the problem of statistical calibration. Applied Statistics 29 (2), 135–141.
Seber, G.A.F., 1977. Linear Regression Analysis. John Wiley & Sons, New York, USA.
Shrewsbury, P.M., Hardin, M.R., 2004. Beat sampling accuracy in estimating spruce spider mite (Acari: Tetranychidae) populations and injury on Juniper. Journal of Economic Entomology 97 (4), 1444–1449.
Silman, M.R., Krisel, C., 2006. Getting to the root of tree neighbourhoods: hectare-scale root zones of a neotropical fig. Journal of Tropical Ecology 22, 727–730.
Tellinghuisen, J., 2000. Inverse vs. classical calibration for small data sets. Fresenius Journal of Analytical Chemistry 368, 585–588.
Valenzuela, N., 2001. Constant, shift, and natural temperature effects on sex determination in Podocnemis expansa turtles. Ecology 82 (11), 3010–3024.
Yoo, H.J.S., Stewart-Oaten, A., Murdoch, W.W., 2003. Converting visual census data into absolute abundance estimates: a method for calibrating timed counts of a sedentary insect population. Ecological Entomology 28, 490–499.