Journal of Food Engineering 131 (2014) 7–17
Modelling the relationship between peel colour and the quality of fresh mango fruit using Random Forests

Shinji Fukuda a, Eriko Yasunaga b,⇑, Marcus Nagle c, Kozue Yuge a, Vicha Sardsud d, Wolfram Spreer c,e, Joachim Müller c

a Faculty of Agriculture, Kyushu University, Hakozaki 6-10-1, Fukuoka 812-8581, Japan
b Institute for Sustainable Agro-ecosystem Services, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Midori Machi 1-1-1, Nishi Tokyo, Tokyo 188-0002, Japan
c Institute of Agricultural Engineering, Universität Hohenheim, Garbenstrasse 9, Stuttgart 70599, Germany
d Postharvest Technology Research Institute, Chiang Mai University, Chiang Mai 50200, Thailand
e Science and Technology Research Institute, Chiang Mai University, 239 Huay Kaew Road, Chiang Mai 50200, Thailand
Article info

Article history:
Received 26 August 2013
Received in revised form 2 December 2013
Accepted 10 January 2014
Available online 21 January 2014

Keywords: Data mining; Non-destructive assessment; Postharvest ripening; Fruit quality; Peel colour; Mangifera indica L.
Abstract

Mango (Mangifera indica L.) is one of the major tropical fruits shipped through long supply chains to export markets. Production of high quality fruits and monitoring of postharvest changes during storage and transport are thus primary concerns for exporters to ensure the premium value of fresh mango fruit after distribution. This study aims to demonstrate the applicability of Random Forests (RF) for estimating the internal qualities of mango based on peel colour. Two cultivars, namely Nam Dokmai and Irwin, having different fruit properties and grown in intensively managed orchards in Thailand and Japan, respectively, were used in this study. Postharvest changes in peel colour and fruit quality were observed under three storage conditions with respect to temperature. RF models were applied to establish a relationship between peel colour and fruit quality, and their applicability was then tested based on model accuracy and the variable importance computed by the RF. Specifically, this work demonstrates how the variable importance can be used to interpret the model results. The high accuracy and the information retrieved by the RF models suggest the applicability and practicality of RF as a non-destructive assessment method for the quality of fresh mango fruit. © 2014 Elsevier Ltd. All rights reserved.
⇑ Corresponding author. Tel.: +81 42 463 1687; fax: +81 42 464 4391.
E-mail addresses: [email protected] (S. Fukuda), [email protected].u-tokyo.ac.jp (E. Yasunaga), [email protected] (M. Nagle), yuge@bpes.kyushu-u.ac.jp (K. Yuge), [email protected] (V. Sardsud), wolfram.[email protected] (W. Spreer), [email protected] (J. Müller).
http://dx.doi.org/10.1016/j.jfoodeng.2014.01.007
0260-8774/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Mango (Mangifera indica L.) is one of the major tropical fruits, favoured for its taste, colour, texture and nutritional value. These fruit quality parameters depend strongly on both production (González et al., 2004; Spreer et al., 2007, 2009; Nagle et al., 2010) and postharvest management (Mahayothee et al., 2007; Kienzle et al., 2011, 2012). Mango exhibits climacteric behaviour, characterised by decreasing fruit respiration during development leading to a minimum, followed by a rise in respiration levels until full ripeness. Climacteric fruits are commonly harvested directly after the pre-climacteric minimum, meaning mature but not ripe for consumption, after which the fruit undergoes postharvest ripening during the climacteric rise (Grierson, 2002). The time of harvest influences the magnitude of the climacteric curve and, therefore, the final product quality (Seymour et al., 1990). For example, fruits harvested too early do not fully realise the desired ripening changes, and a late harvest will lead to reduced shelf life and off-flavour (Medlicott et al., 1988; Lalel et al., 2003). In this regard, harvest maturity should be a primary consideration for better distribution management.

Several methods are known for determining the optimum harvest time of mango that require a set of maturity-related physiological or quality attributes. The mango industry commonly uses destructive methods such as determination of flesh firmness and soluble solids content for assessing harvest maturity. However, harvest decisions based on fruit sampling are flawed because of high inconsistency between mango varieties and high variability within trees and orchards (Herold et al., 2005). Furthermore, access to technological resources, specialised equipment and expertise is needed. Most producers are unable to accurately determine the best time for harvest and continue to use unreliable methods with no real standardisation. Thus, a large potential exists for rapid, accurate, non-destructive sensor technology for predicting interior fruit qualities for determination of
optimal harvest time. Such sensors are useful because they allow measurement of every fruit, since they are non-destructive, and can be repeated while leaving the fruit on the tree until it is mature. Postharvest ripening is a key process affecting the fruit quality during distribution, which is important especially for long supply chains. Changes in fruit properties during ripening are strongly affected by storage and transport conditions, such as temperature, humidity and atmospheric composition. The ability to monitor fruit quality changes during the postharvest handling chain, especially by non-destructive assessment methods, can help to ensure a premium product after distribution. Peel colour, paramount with respect to consumer acceptance, has been found to be one of the major fruit quality indicators for mangoes (Saranwong et al., 2004; Vásquez-Caicedo et al., 2005). However, Kienzle et al. (2011, 2012) reported that postharvest maturity stage is difficult to specify with respect to peel colour based on a 7-day monitoring period of comprehensive fruit quality parameters. Thus, it is worth investigating the relationships between fruit quality and peel colour parameters based on longer monitoring periods. For this purpose, predictive modelling methods, such as machine learning, can be applied. Such predictive models allow for establishing peel-colour-based systems for harvest maturity specification as well as automatic sorting systems that can, for instance, be used for sorting of mango fruit suitable either for domestic or export markets. Quality prediction is one of the major interests in postharvest engineering (Hertog et al., 2011), in which machine learning methods, including artificial neural networks (ANNs), support vector machines (SVMs) and classification and regression trees (CARTs), can be useful as data-driven modelling approaches. 
For instance, ANNs have been used to detect chilling injury in apple based on hyperspectral imaging (ElMasry et al., 2009) and to predict the vase life of cut roses (In et al., 2009). Gomez-Sanchis et al. (2012) applied ANNs and CARTs for rottenness detection of citrus fruits. SVMs have been increasingly applied, for instance, to detect the browning degree of mango fruit (Zheng and Lu, 2012), or to detect bruising of red bayberry (Lu et al., 2011) and apples (Baranowski et al., 2012). Mollazade et al. (2012) compared the model accuracy of several machine learning methods for grading raisins based on visual images.

Among various machine learning methods, the Random Forests (RF: Breiman, 2001) algorithm has been regarded as one of the most precise prediction methods, having advantages such as the ability to determine variable importance, the ability to model complex interactions among predictor variables, and the flexibility to perform several types of statistical data analysis including regression, classification and unsupervised learning (Cutler et al., 2007). The use of RF allows for a new way of modelling and extracting information from observation data, and thus contributes to a better understanding of a target system and mechanism that are, in general, complex and nonlinear. Its high predictive capability has been supported by previous comparative studies with other machine learning methods (Benito Garzón et al., 2006; Peters et al., 2007; Slabbinck et al., 2009; Kampichler et al., 2010; Pino-Mejías et al., 2010; Bisrat et al., 2012; Fukuda et al., 2013a). It has also been successfully implemented for yield estimation of agricultural products (Vincenzi et al., 2011; Fukuda et al., 2013b). However, to the best of our knowledge, no study has applied RF in food engineering.

This study aims to demonstrate the applicability of RF as a tool for estimating the internal quality of fresh mango fruit based on peel colour.
The results are presented with a specific focus on the relationship between peel colour and fruit quality changes, considering the postharvest ripening process of mango fruit. Specifically, variable importance computed by RF was investigated to interpret the model results. In this way, the study illustrates how the model results can be interpreted (i.e. by specifying the important colour parameters for modelling the quality of fresh mango fruit).

2. Materials and methods

Two commercial mango cultivars, having different fruit properties and grown in intensively managed orchards in Thailand and Japan, were used to test the applicability of the proposed approach and to compare the difference between the cultivars. In fruit quality monitoring, the changes in peel colour and fruit quality were observed before and after distribution under three postharvest conditions with respect to temperature, while the distribution condition was the same across the postharvest conditions. Based on the observation data of changes in peel colour and fruit quality, a set of RF models was developed and evaluated with respect to model accuracy and the variable importance computed by the RF models.

2.1. Fruit sample collection

Fresh fruit samples of two mango cultivars, namely ‘Nam Dokmai’ (Fig. 1a) and ‘Irwin’ (Fig. 1b), were collected from an intensively managed orchard in Phitsanulok, Thailand (16°33′ N, 100°37′ E, 64 m a.s.l.) and in Okinawa, Japan (26°10′ N, 127°41′ E, 18 m a.s.l.), respectively. These orchards were selected because they produce high quality mangoes for export to major markets in Japan. The cultivars used differed considerably with respect to phenotypic attributes. Each represented one of the two races into which mangoes are classified, namely subtropical (monoembryonic) and tropical (polyembryonic). Subtropical varieties usually exhibit the ‘bicolor’ characteristic of green peel with a red shoulder, while the tropical varieties are normally considered to be ‘all yellow’, although some mango varieties can even exhibit unusual hues of red and purple. Irwin is described as subtropical, exhibiting development of a red shoulder, and Nam Dokmai is included in the tropical yellow varieties (Litz and Gomez-Lim, 2005).
Fig. 1. Photographs showing the (a) Nam Dokmai and (b) Irwin samples after distribution.
Fruit samples were harvested at the same time as the commercial harvest for each orchard. Because of the different distance to the final destination (i.e. Fukuoka, Japan in this study), the maturity stage of the fruit samples at harvest was different between the cultivars. That is, the Nam Dokmai samples were harvested at the premature stage (while they were greenish-yellow), whereas the Irwin samples were harvested at the fully mature stage. Despite the difference in maturity stage, samples from the same orchard were uniform at harvest (Yasunaga et al., 2013). In the following experiments, only undamaged disease-free fruit samples were used.

2.2. Fruit quality assessment

A series of experiments using three temperature treatments of 15, 25 and 35 °C was conducted before and after distribution in order to monitor the changes of postharvest fruit quality and peel colour of mango under different postharvest ripening conditions. The distribution condition from orchard to the destination was the same across the three temperature treatments. Specifically, the fruit quality was measured for five fruit samples for each of the three temperature treatments. For Nam Dokmai, 85 fruit samples were used, including two initial measurements (i.e. one each before and after distribution) and five measurements (i.e. three at a one-day interval before distribution and two at a two-day interval after distribution) for each of the three temperature treatments. For Irwin, 55 fruit samples were taken, including two initial measurements (i.e. one each before and after distribution) and three measurements (at a two-day interval after distribution) for each of the three temperature treatments. The experimental setting and the number of fruit samples were different (Table 1) because of the different distribution times from the orchard to the destination.
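As a quick check of the sample counts stated above, the totals can be reproduced with simple arithmetic. This is a sketch under one assumption that the text does not state explicitly: that each initial measurement, like each treatment measurement, used five fruits.

```latex
\begin{align*}
N_{\text{Nam Dokmai}} &= (2 + 3 \times 5) \times 5 = 85,\\
N_{\text{Irwin}}      &= (2 + 3 \times 3) \times 5 = 55,\\
N_{\text{pooled}}     &= 85 + 55 = 140.
\end{align*}
```

Here 2 is the number of initial measurements, 3 the number of temperature treatments, 5 (Nam Dokmai) or 3 (Irwin) the number of measurement occasions per treatment, and the final factor of 5 the number of fruits per occasion.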
Whereas transport took about 19 days for Nam Dokmai samples (Phitsanulok to Bangkok [4 days, 28.7 ± 2.0 °C], Bangkok to Osaka [14 days, 13.0 ± 0.5 °C], and Osaka to Fukuoka [1 day, 14.7 ± 1.1 °C]), it took only two days for Irwin samples (Okinawa to Fukuoka [29.0 ± 2.6 °C]).

Peel colour was measured based on the CIE L*a*b* tri-stimulus colour space, which is an international standard for colour measurement. The CIE L*a*b* colour space comprises a luminance component L* (from black to white) and two chromatic components, a* (from green to red) and b* (from blue to yellow). Colour measurements were taken at three longitudinal points along the fruit surface using a hand-held colorimeter (CR-13, Konica Minolta, Japan), and the mean value was used for the subsequent analyses. Two additional colour parameters were calculated based on the a* and b* values, namely metric chroma (C*) and metric hue-angle (h):
$$C^{*} = \sqrt{(a^{*})^{2} + (b^{*})^{2}} \tag{1}$$

$$h = \tan^{-1}\!\left(\frac{b^{*}}{a^{*}}\right) \tag{2}$$
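The two derived colour parameters can be computed as in the following minimal sketch. The function name `chroma_hue` is hypothetical, and the use of `atan2` (which keeps the correct quadrant when a* is zero or negative) is an assumption, since Eq. (2) is written as a plain arctangent; the hue angle is returned in degrees.

```python
import math

def chroma_hue(a_star, b_star):
    """Return (C*, h in degrees) from CIE a* and b* values."""
    chroma = math.hypot(a_star, b_star)  # Eq. (1): sqrt(a*^2 + b*^2)
    # Assumption: atan2 is used instead of a plain arctangent so that
    # the quadrant is preserved when a* is zero or negative.
    hue = math.degrees(math.atan2(b_star, a_star))  # Eq. (2)
    return chroma, hue

# Illustrative values only (not measurements from the study).
c, h = chroma_hue(3.0, 4.0)
print(round(c, 2), round(h, 2))  # 5.0 53.13
```

For positive a* and b*, `atan2` and the plain arctangent of b*/a* give the same angle, so the choice only matters for greenish samples with a* at or below zero.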
Hardness score (N) was determined at three longitudinal points along the fruit surface using a hand-held penetrometer (Fujiwara, KM-5) equipped with a core-shaped probe. The mean value of the three measurements was used in the subsequent analyses. L-ascorbic acid (L-AsA) content (mg/100 g fresh weight (FW)), which is a key nutritional component and an indicator of freshness, was determined according to the method of Yasunaga et al. (2009). A 10-g fresh sample was homogenised and extracted in 20 ml of 5% metaphosphoric acid solution. Extracts were analysed using a high-performance liquid chromatography (HPLC) system (LC 10A, Shimadzu, Japan) equipped with a UV–VIS detector (SPD-20AV) at 300 nm. Total soluble solids (TSS) content of mango was measured in degrees Brix with a digital hand-held refractometer (IPR101a, Atago, Japan).

2.3. Random forests

RF fits many classification and regression trees to a data set, and then combines the predictions from all the trees. The algorithm begins with the selection of many bootstrap samples from the data set. A classification and regression tree is fit to each of the bootstrap samples, but at each node, only a small number of randomly selected variables are available for binary partitioning. The trees are fully grown and each is used to predict the out-of-bag (OOB) observations. The predicted class (or value for regression) of an observation is calculated by majority vote (or averaging in regression), with ties split randomly. The bagging procedure (Breiman, 1996) embedded in RF alleviates overfitting problems in modelling from observation data. One important feature of RF is its ability to assess the importance of each input variable (Cutler et al., 2007).
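The bootstrap and out-of-bag bookkeeping described above can be sketched in plain Python. This illustrates the sampling step only and is not the randomForest implementation; the function name `bootstrap_split` and the seed values are hypothetical, and the count of 500 samples is an assumption chosen to mirror a forest of 500 trees.

```python
import random

def bootstrap_split(n_obs, seed):
    """Draw one bootstrap sample of size n_obs; return (in_bag, oob) index lists."""
    rng = random.Random(seed)
    # Sampling with replacement: some indices repeat, others are never drawn.
    in_bag = [rng.randrange(n_obs) for _ in range(n_obs)]
    drawn = set(in_bag)
    # Observations never drawn are "out-of-bag" for the tree grown on this sample.
    oob = [i for i in range(n_obs) if i not in drawn]
    return in_bag, oob

n_obs = 140  # size of the pooled data set (Table 1)
fractions = []
for seed in range(500):  # assumption: one bootstrap sample per tree, 500 trees
    in_bag, oob = bootstrap_split(n_obs, seed)
    fractions.append(len(oob) / n_obs)

mean_oob = sum(fractions) / len(fractions)
print(f"mean OOB fraction: {mean_oob:.3f}")
```

On average about (1 − 1/n)^n ≈ 0.37 of the observations are out-of-bag for each tree, which is why every observation can be predicted by a sizeable subset of trees that never saw it during fitting.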
In this study, the randomForest package (Liaw and Wiener, 2002) was used in the R software (R Development Core Team, 2011), in which the default setting was applied because default RF could present high predictive performance compared to other data-driven methods (Fukuda et al., 2013a). The modelling process was repeated ten times using ten different sets of initial conditions in order to evaluate the variability of the model structures resulting from the initial conditions. The percent increment of mean squared error (MSE), which is a variable importance measure evaluated based on the prediction errors on the OOB data, was employed.

2.4. Fruit quality modelling

Six RF models (i.e. three data sets × two sets of different input variables; see below) were developed for estimating each of the three quality parameters of fresh mango fruits (i.e. hardness,
Table 1
Descriptive statistics for peel colour and fruit quality parameters of mango fruit: (a) Nam Dokmai, (b) Irwin and (c) pooled data sets.

Statistic | L* | a* | b* | C* | h | Hardness (N) | L-AsA (mg/100 g FW) | TSS (°Brix)
(a) Nam Dokmai (n = 85)
Minimum | 65.3 | 1.8 | 23.1 | 23.8 | 70.2 | 8.40 | 20.6 | 9.4
Mean ± SD | 73.1 ± 3.4 | 8.5 ± 2.6 | 35.5 ± 3.1 | 36.5 ± 3.5 | 76.7 ± 3.2 | 22.17 ± 10.00 | 27.3 ± 3.41 | 16.1 ± 3.13
Maximum | 78.5 | 13.9 | 41.6 | 43.9 | 87.2 | 35.77 | 37.5 | 21.6
(b) Irwin (n = 55)
Minimum | 29.5 | 8.7 | 6.3 | 19.1 | 13.0 | 9.97 | 14.3 | 9.5
Mean ± SD | 41.4 ± 5.7 | 26.7 ± 7.5 | 30.4 ± 12.2 | 42.0 ± 8.5 | 46.4 ± 16.7 | 13.18 ± 2.38 | 33.3 ± 6.01 | 13.6 ± 1.68
Maximum | 52.9 | 40.1 | 50.6 | 54.7 | 80.2 | 20.32 | 51.3 | 16.4
(c) Pooled data (n = 140)
Minimum | 29.5 | 1.8 | 6.3 | 19.1 | 13.0 | 8.40 | 14.3 | 9.4
Mean ± SD | 60.6 ± 16.1 | 15.7 ± 10.3 | 33.5 ± 8.3 | 38.7 ± 6.5 | 64.7 ± 18.3 | 18.64 ± 9.05 | 29.7 ± 5.44 | 15.1 ± 2.92
Maximum | 78.5 | 40.1 | 50.6 | 54.7 | 87.2 | 35.77 | 51.3 | 21.6

h, hue angle; L-AsA, L-ascorbic acid; TSS, total soluble solids; SD, standard deviation.
Table 2
Correlation matrix of peel colour and fruit quality parameters, of which the right top part is for Nam Dokmai and the left bottom part is for Irwin.

 | L* | a* | b* | C* | h | Hardness | L-AsA | TSS
L* | – | 0.894 | 0.625 | 0.702 | 0.873 | 0.884 | 0.213 | 0.717
a* | 0.693 | – | 0.706 | 0.785 | 0.969 | 0.835 | 0.183 | 0.740
b* | 0.594 | 0.316 | – | 0.993 | 0.519 | 0.645 | 0.028 | 0.638
C* | 0.294 | 0.142 | 0.883 | – | 0.616 | 0.706 | 0.058 | 0.682
h | 0.693 | 0.583 | 0.940 | 0.686 | – | 0.797 | 0.216 | 0.692
Hardness | 0.098 | 0.214 | 0.566 | 0.654 | 0.449 | – | 0.095 | 0.827
L-AsA | 0.451 | 0.425 | 0.096 | 0.061 | 0.190 | 0.225 | – | 0.067
TSS | 0.565 | 0.553 | 0.579 | 0.340 | 0.658 | 0.262 | 0.142 | –

h, hue angle; L-AsA, L-ascorbic acid; TSS, total soluble solids. Significant at the 5% level.
Fig. 2. Hardness score of the Nam Dokmai and Irwin samples in relation to the correlation between each pair of peel colour parameters, namely L*, a*, b*, C* and h. White and black marks indicate the minimum and maximum observed hardness score, respectively, with grey marks showing the hardness score in between by the intensity of colour. [Panels (a)–(j): pairwise scatter plots of L*, a*, b*, C* and h; series: Nam Dokmai (before distribution), Nam Dokmai (after distribution) and Irwin (after distribution).]

L-AsA
and TSS) and for assessing the importance of peel colour parameters (i.e. L*, a*, b*, C* and h). The three data sets correspond to Nam Dokmai, Irwin and pooled data sets (Table 1), and the two sets of input variables correspond to three colour parameters, namely L*, a* and b* (LAB model), and five colour parameters, namely L*, a*, b*, C* and h (LCH model), respectively. Based on
the data set and input variables used, the six RF models developed in this study were labelled as LABN, LABI, LABP, LCHN, LCHI, and LCHP, in which subscripts N, I and P indicate Nam Dokmai, Irwin and pooled data set, respectively. The model performance was assessed on the basis of Pearson’s correlation coefficients (COR), Nash–Sutcliffe efficiency (NSE; Nash and Sutcliffe, 1970), and root mean squared error (RMSE) between the observed and modelled mango fruit quality.
$$\mathrm{COR} = \frac{N\sum_{i=1}^{N} Y_{o,i}\,Y_{m,i} - \sum_{i=1}^{N} Y_{o,i}\sum_{i=1}^{N} Y_{m,i}}{\sqrt{N\sum_{i=1}^{N} Y_{o,i}^{2} - \left(\sum_{i=1}^{N} Y_{o,i}\right)^{2}}\;\sqrt{N\sum_{i=1}^{N} Y_{m,i}^{2} - \left(\sum_{i=1}^{N} Y_{m,i}\right)^{2}}} \tag{3}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(Y_{o,i} - Y_{m,i}\right)^{2}}{\sum_{i=1}^{N}\left(Y_{o,i} - \bar{Y}_{o}\right)^{2}} \tag{4}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_{o,i} - Y_{m,i}\right)^{2}} \tag{5}$$

where $Y_{o,i}$ and $Y_{m,i}$ are the observed and modelled fruit quality for the data point i (= 1, 2, ..., N), $\bar{Y}_{o}$ is the mean observed fruit quality, and N is the size of the data set. Specifically, the NSE takes a value in the range (−∞, 1], where an NSE value of 1 means a perfect fit and a value of 0 means that the model is no more accurate than the mean of the observations. For each of these measures, the mean and standard deviation from ten replications were used as the measure of the accuracy and the variability of model structures, respectively. The variable importance evaluated in the RF computation was used to weight the contribution of peel colour parameters (i.e. L*, a*, b*, C* and h) to the estimation of mango fruit quality. The model performance and variable importance were compared between the models to illustrate the relationship between the peel colour and the quality of fresh mango fruit during postharvest ripening.

Fig. 3. L-ascorbic acid (L-AsA) content of the Nam Dokmai and Irwin samples in relation to the correlation between each pair of peel colour parameters, namely L*, a*, b*, C* and h. White and black marks indicate the minimum and maximum observed L-AsA content, respectively, with grey marks showing the L-AsA content in between by the intensity of colour. [Panels (a)–(j): pairwise scatter plots of L*, a*, b*, C* and h; series: Nam Dokmai (before distribution), Nam Dokmai (after distribution) and Irwin (after distribution).]
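The three performance measures in Eqs. (3)–(5) can be written directly from their definitions, as in the sketch below. The function names and the small observed/modelled lists are hypothetical and serve only to exercise the formulas; they are not data from this study.

```python
import math

def cor(obs, mod):
    """Pearson's correlation coefficient, Eq. (3)."""
    n = len(obs)
    s_o, s_m = sum(obs), sum(mod)
    s_om = sum(o * m for o, m in zip(obs, mod))
    s_oo = sum(o * o for o in obs)
    s_mm = sum(m * m for m in mod)
    num = n * s_om - s_o * s_m
    den = math.sqrt(n * s_oo - s_o ** 2) * math.sqrt(n * s_mm - s_m ** 2)
    return num / den

def nse(obs, mod):
    """Nash-Sutcliffe efficiency, Eq. (4)."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - m) ** 2 for o, m in zip(obs, mod))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst

def rmse(obs, mod):
    """Root mean squared error, Eq. (5)."""
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(obs, mod)) / len(obs))

# Hypothetical values for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
modelled = [12.0, 18.0, 33.0, 37.0]
print(cor(observed, modelled), nse(observed, modelled), rmse(observed, modelled))
```

A model that always predicts the observed mean gives NSE = 0, and a model worse than that gives a negative NSE, whereas COR is insensitive to a constant bias; this is one reason to report the measures together.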
3. Results

3.1. Mango fruit quality

The peel colour (i.e. L*, a*, b*, C* and h) and fruit quality (i.e. hardness, L-AsA and TSS) values of fresh mango differed between Nam Dokmai and Irwin samples (Table 1). The Irwin samples had lower L* values and higher a* values than Nam Dokmai, which reflects the darker reddish peel colour. The range of peel colour of the Nam Dokmai samples was narrower than and within that of the Irwin samples, except for L* and h. The fruit quality of the two mango cultivars differed according to the fruit quality parameters (Table 1). For instance, the Nam Dokmai samples had a wider range of hardness and TSS, and a narrower range of L-AsA content than Irwin. Some quality parameters overlap irrespective of cultivar.

The correlation matrix (Table 2) shows differences in the relationships between the peel colour and fruit quality parameters for each cultivar. For Nam Dokmai, correlations were observed between peel colour and fruit quality parameters, except for L-AsA. Regarding Irwin samples, several peel colour parameters such as b* and h were strongly correlated. The correlation between peel colour and fruit quality parameters (except L-AsA) was weaker compared to Nam Dokmai. The three fruit quality parameters in relation to each pair of the five colour parameters are shown in Figs. 2–4, in which a darker colour indicates a higher value of the respective fruit quality parameter (see Table 1c for the properties of each fruit quality parameter). Although some colour parameters overlap, the two cultivars can be classified based on colour parameters such as L*. Despite the correlations between colour and quality parameters
Fig. 4. Total soluble solids (TSS) content of the Nam Dokmai and Irwin samples in relation to the correlation between each pair of peel colour parameters, namely L*, a*, b*, C* and h. White and black marks indicate the minimum and maximum observed TSS content, respectively, with grey marks showing the TSS content in between by the intensity of colour. [Panels (a)–(j): pairwise scatter plots of L*, a*, b*, C* and h; series: Nam Dokmai (before distribution), Nam Dokmai (after distribution) and Irwin (after distribution).]
Fig. 5. Postharvest quality change of fresh mango fruits before and after storage experiment: (a) Nam Dokmai before distribution, (b) Nam Dokmai after distribution, and (c) Irwin after distribution. [Panel groups: (i) hardness score (N), (ii) L-ascorbic acid content (mg/100 g FW) and (iii) total soluble solids content (°Brix); x-axis: experiment (Initial, 15 °C, 25 °C, 35 °C).]
Table 3
Model performance of the Random Forests developed for estimating fruit quality parameters, namely hardness score, L-ascorbic acid (L-AsA) content and total soluble solids (TSS) content, based on each of the Nam Dokmai, Irwin and pooled data sets.

Performance measure | Data | Hardness, LAB model | Hardness, LCH model | L-AsA, LAB model | L-AsA, LCH model | TSS, LAB model | TSS, LCH model
COR | Nam Dokmai | 0.980 ± 0.001 | 0.978 ± 0.000 | 0.941 ± 0.002 | 0.939 ± 0.002 | 0.958 ± 0.001 | 0.957 ± 0.001
COR | Irwin | 0.962 ± 0.001 | 0.954 ± 0.001 | 0.906 ± 0.003 | 0.917 ± 0.002 | 0.922 ± 0.002 | 0.931 ± 0.002
COR | Pooled data | 0.983 ± 0.000 | 0.982 ± 0.001 | 0.925 ± 0.001 | 0.934 ± 0.001 | 0.957 ± 0.001 | 0.956 ± 0.001
NSE | Nam Dokmai | 0.958 ± 0.001 | 0.953 ± 0.001 | 0.739 ± 0.004 | 0.758 ± 0.003 | 0.908 ± 0.001 | 0.906 ± 0.002
NSE | Irwin | 0.881 ± 0.003 | 0.885 ± 0.002 | 0.757 ± 0.003 | 0.780 ± 0.005 | 0.816 ± 0.003 | 0.835 ± 0.003
NSE | Pooled data | 0.965 ± 0.000 | 0.962 ± 0.001 | 0.826 ± 0.002 | 0.841 ± 0.003 | 0.906 ± 0.001 | 0.902 ± 0.001
RMSE | Nam Dokmai | 2.048 ± 0.020 | 2.166 ± 0.020 | 1.734 ± 0.012 | 1.666 ± 0.012 | 0.943 ± 0.007 | 0.956 ± 0.008
RMSE | Irwin | 0.813 ± 0.010 | 0.794 ± 0.010 | 2.936 ± 0.019 | 2.790 ± 0.034 | 0.712 ± 0.005 | 0.674 ± 0.006
RMSE | Pooled data | 1.676 ± 0.010 | 1.764 ± 0.029 | 2.263 ± 0.015 | 2.163 ± 0.020 | 0.893 ± 0.006 | 0.909 ± 0.005

COR: correlation coefficient. NSE: Nash–Sutcliffe efficiency. RMSE: root mean squared error.
observed (Table 2), it is difficult to find clear patterns or trends between the fruit quality and any pair of peel colour parameters (Figs. 2–4).

The postharvest changes of fruit quality parameters differed according to cultivar and experimental conditions, as well as before and after distribution (Fig. 5). A large variation in Fig. 5 indicates a large change of fruit quality during the experiment. Nam Dokmai samples exhibited large changes in two fruit quality parameters, namely hardness and TSS, during the before-distribution experiments (Fig. 5(i-a) and (iii-a), respectively). In contrast with Nam Dokmai samples, Irwin samples showed moderate temperature-dependent changes for all fruit quality parameters (Fig. 5c).

3.2. Model performance

The RF models were able to link the peel colour parameters to each of the fruit quality parameters during postharvest ripening. Because the model accuracy of LAB models and LCH models is almost the same and no dominance of either model was observed (Table 3), the focus hereafter is on the results of LAB models only. In this study, the variability in the RF models, which may result from ten different sets of initial conditions, can be observed in the vertical spread of the plots with the same observed value (Fig. 6) and in the standard deviations of the performance measures in Table 3. The variability in model structure was very small for all RF models developed in this study (see Fig. S1 for LCH models). The three LAB models, namely LABN, LABI and LABP, achieved very high performances for all fruit quality parameters (Fig. 6; Table 3), except for the LABI model, which showed moderate accuracy with respect to NSE. The LABI model slightly overestimated the minimum L-AsA content (Fig. 6e), which coincides with the moderate NSE values for this fruit quality parameter (Table 3). The overestimation can also be observed for the pooled data set (Fig. 6h), although the NSE value is better than for both the LABN and LABI models (Table 3).

3.3. Variable importance

The variable importance evaluated from the RF computation differed according to the fruit quality parameters (Fig. 7 for LAB models; Fig. S2 for LCH models). The variable importance for Nam Dokmai (Fig. 7a–c) and Irwin (Fig. 7d–f) contrasted for each of the fruit quality parameters. Regarding Nam Dokmai samples, the important variables were L* and a* for hardness score and TSS, and b* for L-AsA content. Regarding the Irwin samples, the important variables were b* for hardness and TSS, and L* and a* for L-AsA content. In the pooled data set, the variable importance values were similar to the case of Nam Dokmai, except for L-AsA content, which exhibited the same trend as for Irwin. Variable importance of L*, a* and b* remained almost the same in the LCH models, while in some cases C* and/or h appeared to be more important compared to the tri-stimulus colour values (Fig. S2).

Fig. 6. Scatter diagrams between observed and modelled mango fruit quality (LAB models): hardness score (a, d, g), L-ascorbic acid content (b, e, h) and total soluble solids content (c, f, i). The data shown are for Nam Dokmai (a–c), Irwin (d–f) and pooled data sets (g–i). [Axes: observed value (x) against modelled value (y) for hardness score (N), L-ascorbic acid content (mg/100 g FW) and total soluble solids content (°Brix).]
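Eqs. (4) and (5) together imply that NSE and RMSE are linked through the spread of the observations. As a rough consistency check, and assuming that the SD values in Table 1 are sample standard deviations with divisor N − 1 (which the text does not state), the Irwin L-AsA entries for the LAB model (RMSE = 2.936, SD = 6.01, N = 55) give:

```latex
\begin{align*}
\mathrm{NSE}
  &= 1 - \frac{\sum_{i=1}^{N}\left(Y_{o,i} - Y_{m,i}\right)^{2}}
              {\sum_{i=1}^{N}\left(Y_{o,i} - \bar{Y}_{o}\right)^{2}}
   = 1 - \frac{N\,\mathrm{RMSE}^{2}}{(N-1)\,\mathrm{SD}^{2}} \\
  &= 1 - \frac{55 \times 2.936^{2}}{54 \times 6.01^{2}}
   \approx 1 - 0.243 = 0.757
\end{align*}
```

This is close to the NSE of 0.757 ± 0.003 listed for the same model in Table 3. The agreement is only approximate in principle, because the tabulated values are means over ten replications.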
15
S. Fukuda et al. / Journal of Food Engineering 131 (2014) 7–17
[Fig. 7 plot area: bar charts, panels (a)–(i); x-axis: Variable (L*, a*, b*); y-axis: Percent increment of MSE.]
Fig. 7. Variable importance evaluated in the Random Forests computation for estimating each fruit quality parameter (LAB models): hardness score (a, d, g), L-ascorbic acid content (b, e, h) and total soluble solids content (c, f, i). The data shown are for Nam Dokmai (a–c), Irwin (d–f) and pooled data sets (g–i).
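The importance measure plotted in Fig. 7 is a percent increment of MSE. A simplified sketch of a permutation-based version of such a measure follows. It rests on several assumptions: a fixed, hypothetical prediction function stands in for a trained forest, the error is computed on the full data set rather than on the out-of-bag samples normally used by RF, and all colour values are invented for illustration. It is therefore not the computation performed in this study.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def percent_increase_in_mse(predict, rows, targets, column, n_repeats=20, seed=0):
    """Average percent rise in MSE after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    total = 0.0
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)  # break the link between this column and the target
        permuted_rows = [
            r[:column] + (value,) + r[column + 1:]
            for r, value in zip(rows, shuffled)
        ]
        permuted = mean_squared_error(targets, [predict(r) for r in permuted_rows])
        total += 100.0 * (permuted - baseline) / baseline
    return total / n_repeats


def hypothetical_model(row):
    """Stand-in predictor that uses L* and a* but ignores b* (assumption)."""
    l_star, a_star, _b_star = row
    return 0.2 * l_star + 0.5 * a_star


# Invented (L*, a*, b*) rows and targets with a small residual error.
rows = [
    (40.0, -8.0, 20.0), (45.0, -4.0, 24.0), (50.0, 0.0, 28.0), (55.0, 4.0, 31.0),
    (60.0, 8.0, 35.0), (65.0, 12.0, 38.0), (70.0, 16.0, 41.0), (75.0, 20.0, 45.0),
]
residuals = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
targets = [hypothetical_model(r) + e for r, e in zip(rows, residuals)]

importance = {
    name: percent_increase_in_mse(hypothetical_model, rows, targets, column)
    for column, name in enumerate(["L*", "a*", "b*"])
}
for name, value in importance.items():
    print(f"{name}: {value:.1f} % increase in MSE")
```

In this sketch, shuffling a variable that the predictor ignores leaves the error unchanged, whereas shuffling a variable the predictor relies on inflates it; the bar heights in Fig. 7 can be read in the same way.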
4. Discussion
RF models were used to establish the relationships between the peel colour parameters (i.e. L, a and b) and each of the fruit quality parameters (i.e. hardness, L-AsA and TSS) of two mango cultivars, namely Nam Dokmai from Thailand and Irwin from Japan. The results are discussed below with respect to model performance and variable importance obtained from the RF models.
The model results are compared with results reported in the literature on the postharvest quality changes of fresh mango fruit. In this study, the LAB models could precisely estimate the quality of fresh mango fruit, with performance comparable to that of the LCH models. That is, once RF models have been developed, fruit quality can be predicted accurately from the three colour parameters (i.e. L, a and b). This result can be useful for practical applications such as determining harvest maturity and monitoring postharvest quality. For
instance, a small number of simple target variables is preferable for long-term or real-time monitoring, including automatic harvest decision and postharvest sorting systems (Cubero et al., 2011; Nagle et al., 2012). In practice, it is difficult to decide which variable to remove when two (or more) variables correlate with each other. As demonstrated by Cutler et al. (2007), RF is not sensitive to collinearity, and correlated variables tend to be assigned similar variable importance. The high accuracy, the insensitivity to collinearity and the measures of variable importance are advantages of RF modelling, especially for complex and nonlinear systems.
Despite the high accuracy, there were slight over- and underestimations of fruit quality parameters such as L-AsA content. This implies that fruit quality changes may have taken place without any change in peel colour, so that the RF models could not estimate the fruit quality precisely. This could explain the results of Kienzle et al. (2011, 2012), in which the postharvest maturity stage could not be specified for two tropical cultivars during a 7-day monitoring period of fruit quality parameters. Indeed, lower accuracy of the RF models was observed for the Irwin samples, whose monitoring period was shorter than that of the Nam Dokmai samples. In addition, all Nam Dokmai samples used for the experiments were wrapped in carbon bags during the field growth stages. This production practice produces pale yellow (instead of green) fruits and increases the percentage of the skin area with golden yellow colour after ripening (Hofman et al., 1997). Fruits with golden yellow colour have a comparatively better appearance, which is preferred by consumers. This explains the more uniform colour of the Nam Dokmai samples, which produced better results with respect to estimating the fruit quality.
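The collinearity issue discussed above also applies to the colour inputs themselves, because chroma (C) and hue angle (h) are computed from a and b. A minimal sketch of that conversion is given below, assuming the standard CIE definitions C = sqrt(a² + b²) and h = atan2(b, a) expressed in degrees. The function name and the sample colour values are hypothetical, and the exact conversion used for the LCH models is not restated here.

```python
import math


def lab_to_lch(l_star, a_star, b_star):
    """Convert CIE L*a*b* to L*C*h, with the hue angle in degrees [0, 360)."""
    chroma = math.hypot(a_star, b_star)                       # distance from the neutral axis
    hue = math.degrees(math.atan2(b_star, a_star)) % 360.0    # angle in the a*-b* plane
    return l_star, chroma, hue


# Invented peel colour readings: a greenish-yellow and a yellowish-orange sample.
samples = {
    "greenish yellow": (70.0, -10.0, 10.0),
    "yellowish orange": (65.0, 15.0, 45.0),
}
for label, lab in samples.items():
    lightness, chroma, hue = lab_to_lch(*lab)
    print(f"{label}: L = {lightness:.1f}, C = {chroma:.1f}, h = {hue:.1f} deg")
```

Since C and h carry the same information as a and b in a different coordinate system, similar performance of LAB and LCH inputs would be expected from a method that is insensitive to such dependencies.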
During maturation and postharvest ripening, fruit quality changed along with peel colour: from greenish yellow to yellowish orange for the Nam Dokmai samples, and from a yellow/red mixture to an orange/red mixture for the Irwin samples. The variability of fruit quality parameters observed in our experiment can be ascribed to the differences in temperature (i.e. 15, 25 and 35 °C) and maturity (i.e. premature and fully mature). Specifically, the Nam Dokmai samples showed higher variability because of earlier (premature) harvest and a longer monitoring period. In general, complex interactions between temperature and maturity have a significant influence on the ecophysiological responses of mango fruit such as respiration and ethylene production, and can thus result in variable postharvest ripening of the fruit. Further studies considering the temporal changes of fruit quality parameters in relation to the ecophysiological responses of fresh mango fruit are needed for more precise modelling of postharvest quality changes.
A better understanding of the relationship between peel colour and fruit quality parameters, as well as of the influence of cultivar, can be beneficial in two respects: the establishment of better maturity specification methods and the development of improved systems for sorting variable mango fruit for either domestic markets or export. In both cases, predictive machine learning methods such as RF can be applied as a tool for quantitative assessment of fruit quality.
5. Conclusion
This study has shown that RF models could accurately estimate the quality of fresh mango fruit from peel colour as expressed by the CIE L*a*b* colour system. Good correlations between peel colour and fruit quality parameters of fresh mango fruit, specifically for the Nam Dokmai samples, suggest the practicality of peel colour as an indicator for harvest maturity as well as postharvest quality of fresh mango fruit. Despite the distinct differences between the two cultivars, namely one tropical and one subtropical, RF models could precisely estimate the quality of fresh mango fruit. Because the results cover a good scope of the variability in mango phenotypes, this study can contribute to the establishment of an improved distribution system and management strategies, specifically for a long supply chain, on the basis of non-destructive monitoring and precise modelling methods. Including more cultivars with similar peel colour properties is challenging, but it could further illustrate the applicability of RF models as a tool for better non-destructive monitoring systems for the postharvest quality changes of fresh mango fruit. Further studies should consider the temporal changes of fruit quality parameters in relation to ecophysiological responses of fresh mango fruit, based on which existing distribution techniques such as controlled-atmosphere storage and transport can be improved for a long supply chain.
Acknowledgements
This study was supported in part by the JSPS Grant-in-Aid for Scientific Research (B) Grant Number 25304036, the Kyushu University interdisciplinary programs in education and projects in research development (Kyushu University P&P), the Deutsche Forschungsgemeinschaft (DFG) in the framework of the SFB 564 “The Uplands Program” and the JSPS Core-to-Core Program (B. Asia-Africa Science Platforms) “Collaborative Project for Soil and Water Conservation in Southeast Asian Watersheds”. We thank Wanwarang Pattanapo, Daisuke Hamanaka, Yusuke Hanada and Ana Carolina Towata for their technical support for this work.
Appendix A. Supplementary material
Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jfoodeng.2014.01.007.
References
Baranowski, P., Mazurek, W., Wozniak, J., Majewska, U., 2012. Detection of early bruises in apples using hyperspectral data and thermal imaging. J. Food Eng. 110, 345–355.
Benito Garzón, M., Blazek, R., Neteler, M., Sánchez de Dios, R., Sainz-Ollero, H., Furlanello, C., 2006. Predicting habitat suitability with machine learning models: the potential area of Pinus sylvestris L. in the Iberian Peninsula. Ecol. Model. 197, 383–393.
Bisrat, S.A., White, M.A., Beard, K.H., Cutler, R.D., 2012. Predicting the distribution potential of an invasive frog using remotely sensed data in Hawaii. Divers. Distrib. 18, 648–660.
Breiman, L., 1996. Bagging predictors. Mach. Learn. 24, 123–140.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Cubero, S., Aleixos, N., Moltó, E., Gómez-Sanchis, J., Blasco, J., 2011. Advances in machine vision applications for automatic inspection and quality evaluation of fruits and vegetables. Food Bioprocess Technol. 4 (4), 487–504.
Cutler, R.D., Edwards, T.C., Beard, K.H., Cutler, K.T., Gibson, H.J., Lawler, J.J., 2007. Random forests for classification in ecology. Ecology 88, 2783–2792.
ElMasry, G., Wang, N., Vigneault, C., 2009. Detecting chilling injury in red delicious apple using hyperspectral imaging and neural networks. Postharvest Biol. Technol. 52, 1–8.
Fukuda, S., De Baets, B., Waegeman, W., Verwaeren, J., Mouton, A.M., 2013a. Habitat prediction and knowledge extraction for spawning European grayling (Thymallus thymallus L.) using a broad range of species distribution models. Environ. Model. Softw. 47, 1–6.
Fukuda, S., Spreer, W., Yasunaga, E., Yuge, K., Sardsud, V., Müller, J., 2013b. Random forests modelling for the estimation of mango (Mangifera indica L. cv. Chok Anan) fruit yields under different irrigation regimes. Agric. Water Manage. 116, 142–150.
Gomez-Sanchis, J., Martin-Guerrero, J.D., Soria-Olivas, E., Martinez-Sober, M., Magdalena-Benedito, R., Blasco, J., 2012. Detecting rottenness caused by Penicillium genus fungi in citrus fruits using machine learning techniques. Expert Syst. Appl. 39, 780–785.
González, A., Lu, P., Müller, W., 2004. Effect of pre-flowering irrigation on leaf photosynthesis, whole-tree water use and fruit yield of mango trees receiving two flowering treatments. Sci. Hortic. 112, 189–211.
Grierson, W., 2002. Fruit development, maturation and ripening. In: Handbook of Plant and Crop Physiology. Marcel Dekker, New York, pp. 143–159.
Herold, B., Truppel, I., Zude, M., Geyer, M., 2005. Spectral measurements on ‘Elstar’ apples during fruit development on the tree. Biosyst. Eng. 91 (2), 173–182.
Hertog, M.L.A.T.M., Rudell, D.R., Pedreschi, R., Schaffer, R.J., Geeraerd, A.H., Nicolai, B.M., Ferguson, I., 2011. Where systems biology meets postharvest. Postharvest Biol. Technol. 62, 223–237.
Hofman, P.J., Smith, L.G., Joyce, D.C., Johnson, G.I., Meiburg, G.F., 1997. Bagging of mango (Mangifera indica cv. ‘Keitt’) fruit influences fruit quality and mineral composition. Postharvest Biol. Technol. 12 (1), 83–91.
In, B.-C., Inamoto, K., Doi, M., 2009. A neural network technique to develop a vase life prediction model of cut roses. Postharvest Biol. Technol. 52, 273–278.
Kampichler, C., Wieland, R., Calmé, S., Weissenberger, H., Arriaga-Weiss, S., 2010. Classification in conservation biology: a comparison of five machine-learning methods. Ecol. Inform. 5, 441–450.
Kienzle, S., Sruamsiri, P., Carle, R., Sirisakulwat, S., Spreer, W., Neidhart, S., 2011. Harvest maturity specification for mango fruit (Mangifera indica L. ‘Chok Anan’) in regard to long supply chains. Postharvest Biol. Technol. 61, 41–55.
Kienzle, S., Sruamsiri, S., Carle, R., Sirisakulwat, S., Spreer, W., Neidhart, S., 2012. Harvest maturity detection for ‘Nam Dokmai #4’ mango fruit (Mangifera indica L.) in consideration of long supply chains. Postharvest Biol. Technol. 72, 64–75.
Lalel, H.J.D., Singh, Z., Tan, S.C., 2003. Maturity stage at harvest affects fruit ripening, quality and biosynthesis of aroma volatile compounds in ‘Kensington Pride’ mango. J. Hortic. Sci. Biotechnol. 78 (2), 225–233.
Liaw, A., Wiener, M., 2002. Classification and regression by random forest. R News 2 (3), 18–22.
Litz, R.E., Gomez-Lim, M.I.A., 2005. Mangifera indica Mango. In: Litz, R.E. (Ed.), Biotechnology of Fruit and Nut Crops. CABI: Cambridge, MA USA, pp. 40–61.
Lu, H., Zheng, H., Hu, Y., Lou, H., Kong, X., 2011. Bruise detection on red bayberry (Myrica rubra Sieb. & Zucc.) using fractal analysis and support vector machine. J. Food Eng. 104, 149–153.
Mahayothee, B., Neidhart, S., Carle, R., Muehlbauer, W., 2007. Effects of variety, ripening condition and ripening stage on the quality of sulphite-free dried mango slices. Eur. Food Res. Technol. 225, 723–732.
Medlicott, A.P., Reynolds, S.B., New, S.W., Thompson, A.K., 1988. Harvest maturity effects on mango fruit ripening. Trop. Agric. 65 (2), 153–157.
Mollazade, K., Omid, M., Arefi, A., 2012. Comparing data mining classifiers for grading raisins based on visual features. Comput. Electron. Agric. 84, 124–131.
Nagle, M., Mahayothee, B., Rungpichayapichet, P., Janjai, S., Müller, J., 2010. Effect of irrigation on near-infrared (NIR) based prediction of mango maturity. Sci. Hortic. 125, 771–774.
Nagle, M., Romano, G., Intani, K., Spreer, W., Mahayothee, B., Sardsud, V., Müller, J., 2012. A novel optical approach to monitor color changes of Mango peel for machine-vision applications. In: Proceedings of the International Conference of Agricultural Engineering CIGR-AgEng2012.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models. Part I: a discussion of principles. J. Hydrol. 10, 282–290.
Peters, J., De Baets, B., Verhoest, N.E.C., Samson, R., Degroeve, S., Becker, P.D., Huybrechts, W., 2007. Random forests as a tool for ecohydrological distribution modelling. Ecol. Model. 207, 304–318.
Pino-Mejías, R., Cubiles-de-la-Vega, M.D., Anaya-Romero, M., Pascual-Acosta, A., Jordán-López, A., Bellinfante-Crocci, N., 2010. Predicting the potential habitat of oaks with data mining models and the R system. Environ. Model. Softw. 25, 826–836.
R Development Core Team, 2011. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Saranwong, S., Sornsrivichai, J., Kawano, S., 2004. Prediction of ripe-stage eating quality of mango fruit from its harvest quality measured nondestructively by near infrared spectroscopy. Postharvest Biol. Technol. 31, 137–145.
Seymour, G.B., N’Diaye, M., Wainwright, H., Tucker, G.A., 1990. Effects of cultivar and harvest maturity on ripening of mangoes during storage. J. Hortic. Sci. 65 (4), 479–483.
Slabbinck, B., De Baets, B., Dawyndt, P., De Vos, P., 2009. Towards large-scale FAME-based bacterial species identification using machine learning techniques. Syst. Appl. Microbiol. 32, 163–176.
Spreer, W., Nagle, M., Neidhart, S., Carle, R., Ongprasert, S., Müller, J., 2007. Effect of regulated deficit irrigation and partial rootzone drying on the quality of mango fruits (Mangifera indica L., cv. ‘Chok Anan’). Agric. Water Manage. 88 (1–3), 173–180.
Spreer, W., Ongprasert, S., Hegele, M., Wünsche, J.N., Müller, J., 2009. Yield and fruit development in mango (Mangifera indica, L., cv. Chok Anan) under different irrigation regimes. Agric. Water Manage. 96, 574–584.
Vásquez-Caicedo, A.L., Sruamsiri, P., Carle, R., Neidhart, S., 2005. Accumulation of all-trans-β-carotene and its 9-cis and 13-cis stereoisomers during postharvest ripening of nine Thai mango cultivars. J. Agric. Food Chem. 53, 4827–4835.
Vincenzi, S., Zucchetta, M., Franzoi, P., Pellizzato, M., Pranovi, F., De Leo, G.A., Torricelli, P., 2011. Application of a Random Forest algorithm to predict spatial distribution of the potential yield of Ruditapes philippinarum in the Venice lagoon, Italy. Ecol. Model. 222, 1471–1478.
Yasunaga, E., Fukuda, S., Yuge, K., Sardsud, V., Spreer, W., Wanwarang, P., 2013. Comparison of changes in post-harvest quality deterioration of mango fruits between Thailand-Fukuoka and Okinawa-Fukuoka transportations. Acta Hortic. 989, 221–224.
Yasunaga, E., Uchino, T., Yoshida, S., Tanaka, F., Chikushi, J., 2009. A proposed model to predict change in nutrient contents of Garland Chrysanthemum (Chrysanthemum coronarium) under distribution conditions. Shokubutsu Kankyo Kogaku 21, 154–161 (In Japanese with English abstract).
Zheng, H., Lu, H., 2012. A least-squares support vector machine (LS-SVM) based on fractal analysis and CIELab parameters for the detection of browning degree on mango (Mangifera indica L.). Comput. Electron. Agric. 83, 47–51.