High-performance prediction of macauba fruit biomass for agricultural and industrial purposes using Artificial Neural Networks


Industrial Crops & Products 108 (2017) 806–813

Contents lists available at ScienceDirect

Industrial Crops & Products journal homepage: www.elsevier.com/locate/indcrop

Research Paper




Carla Aparecida de O. Castro (a), Rafael T. Resende (a,*), Kacilda N. Kuki (c), Vinícius Q. Carneiro (b), Gustavo E. Marcatti (d), Cosme Damião Cruz (b), Sérgio Y. Motoike (c)

(a) Department of Forestry, Universidade Federal de Viçosa/UFV, Av. Peter Henry Rolfs, s/n, 36570-000, Viçosa, MG, Brazil
(b) Department of Statistics, Universidade Federal de Viçosa/UFV, Av. Peter Henry Rolfs, s/n, 36570-000, Viçosa, MG, Brazil
(c) Department of Plant Science, Universidade Federal de Viçosa/UFV, Av. Peter Henry Rolfs, s/n, 36570-000, Viçosa, MG, Brazil
(d) Department of Agricultural Sciences, Universidade Federal de São João Del-Rei/UFSJ, MG424 road, KM 47, n 202, 35701-970, Sete Lagoas, MG, Brazil

ARTICLE INFO

Keywords: ANN; Prediction method; Dry biomass; Oil contents; Yield; Macaw palm

ABSTRACT

Biomass estimation plays a crucial role in agriculture and agro-based industries. The macauba, Acrocomia aculeata (Jacq.) Lodd. ex Mart., is a palm species that has been a focal point for research and development of an alternative biomass-bioenergy crop for the tropics. The macauba fruit components (exocarp, mesocarp, endocarp and seed/kernel) present different constitutional characteristics, and determining their biomass by traditional methods is labor-intensive. Therefore, the validation of procedures that can streamline this process is relevant, since it can reduce costs and time for both breeding programs and industries. This study tested the efficacy of Artificial Neural Networks (ANN) for biomass prediction of the macauba fruit components by comparing them with the multiple linear regression method. The data came from fruits collected in 18 localities distributed throughout the state of Minas Gerais, Brazil. According to their provenance, the matrices were clustered into two groups with the k-means method for subsequent ANN cross-validation. Each group was used interchangeably for both training and validation. The ANN was more efficient than the multivariate linear model in predicting the dry weight of the fruit's four components and the oil content of the mesocarp and seed. For variables related to dry weight, the ANN reached 98% predictive accuracy (i.e., 98% accuracy of the value predicted by the network), and for variables related to oil content, accuracy was around 90%. Additionally, non-invasive measurements of the fruit (i.e., low-cost and quick-to-measure variables) were adequate to predict most of the variables of interest. These results show the ANN's prediction potential, which saves time and effort in the consolidation of macauba as a crop.

1. Introduction

Countries that are signatories of the COP 21 (21st United Nations Conference on Climate Change) face the great challenge of changing their energy matrix by replacing non-renewable sources with renewable ones. This paradigm shift will inevitably increase the worldwide demand for biofuels, including biodiesel and bio-kerosene from biomass-bioenergy crops. As a global agreement, biofuel production from plant materials should avoid the deleterious effects of direct and indirect land use change (Lapola et al., 2010; Van der Laan et al., 2016). Therefore, raw materials with high energy density, which adapt to production systems with low environmental impact, are fundamental to the sustainable production of biofuels (Lapola et al., 2010). The macauba palm, Acrocomia aculeata, fulfills these requirements (Montoya et al., 2016). Macauba, an oleiferous species of Tropical America and one of



Corresponding author. E-mail address: [email protected] (R.T. Resende).

http://dx.doi.org/10.1016/j.indcrop.2017.07.031 Received 29 March 2017; Received in revised form 14 July 2017; Accepted 16 July 2017 0926-6690/ © 2017 Elsevier B.V. All rights reserved.

the most conspicuous palms in Brazil, shows great biomass yield potential (Evaristo et al., 2016a). As macauba is a drupaceous species, its fruit structure, composition and oil content resemble those of the African oil palm fruit, Elaeis guineensis Jacq. (Del Río et al., 2016; Montoya et al., 2016). The macauba mesocarp oil has diverse industrial purposes, including the production of biodiesel, bio-kerosene, oleochemicals and cosmetics (Montoya et al., 2016; Pires et al., 2013). But unlike the African palm, the macauba displays high rusticity to drought, acid soils and fire (Bicalho et al., 2016; Pires et al., 2013), so much so that it populates tropical areas with pronounced rainfall seasonality, like the Brazilian savannah ecoregions (Cerrado) (Lanes et al., 2014). Hence, macauba cultivation in degraded areas, such as abandoned pastures, or in integrated systems, such as inter-cropping and silvipastoral systems, is feasible (Motoike and Kuki, 2009). Both are sensible farming practices with low environmental impact (Lanes et al., 2016). Another


advantage is the likelihood of using all fruit parts, namely the exocarp (husk), mesocarp (pulp), endocarp (nut) and seed (kernel), to produce coproducts with great economic value and high energy density. The solid residues left after oil extraction can provide protein and fiber cakes, charcoal and briquettes (Evaristo et al., 2016c; Pires et al., 2013). For these reasons, macauba is a workable and sustainable non-food crop alternative.

Most of the exploitation of macauba is by means of extractivism in natural populations (Evaristo et al., 2016a; Pires et al., 2013). However, as investments started, commercial plantations have been established in the southeast region of Brazil. To maximize the economic value of future commercial crops, the development and use of elite macauba plants is required. Cultivars are usually obtained through breeding programs, which often need information on the genetic parameters of the species (Lynch and Walsh, 1998). Therefore, the prospect of acquiring estimates of biometric variables of the fruit parts, related to oil content and dry weight, is of relevance for macauba. Biomass predictions would help to set up the best breeding strategies. However, the productivity characteristics of fruits, such as those of E. guineensis (Legros et al., 2009) and Jatropha curcas L. (Singh et al., 2013), are controlled by climatic variations and soil properties. In addition, for species with broad distribution, such as macauba, the centers of domestication and origin may not coincide (Lanes et al., 2016). Since Brazil is a country of continental dimensions, with marked climatic gradation, the influence of the environment should be considered when assessing the biomass of macauba fruit.

A dataset of fruit metrics is useful in evaluating the productivity of commercial plantations and in securing the bases to market their goods (Ciconini et al., 2013). Attaining these values, however, is time- and labor-demanding. It often requires separating the fruit fractions for later drying and/or physico-chemical analysis (Coimbra and Jorge, 2011a; Mazzottini-dos-Santos et al., 2015). Thus, analytical strategies that reduce time and cost expenditures, such as indirect prognostics, may increase the efficiency of obtaining biometric values of macauba fruits.

The approach through Computational Intelligence, using Artificial Neural Networks (ANN) as tools, is an efficient form of prediction. The reliability of the method is supported by many studies that sought to minimize the handling and capture of allometric variables while providing accurate measurements (Baruah et al., 2017; Cheok et al., 2012; Hajar and Vahabzadeh, 2014; Silva et al., 2014). ANN have a non-linear approach inspired by the operation and structure of the human brain and, as such, are capable of learning data patterns, including those with interferences and those consisting of incomplete or contradictory events (Silva et al., 2014).

In view of the above, this study evaluated: (i) the influence of the macauba matrices' provenance on the quantitative traits of the fruit and (ii) the efficiency of ANN for predicting fruit biomass as an alternative to multiple linear regression analysis. This knowledge will help in understanding the morphological diversity of the species and, equally important, support the prognosis of the values of the fruit constituents of industrial interest.

2. Materials and methods

2.1. Plant material and provenances

The work used the biometric data of 543 fruits from 172 macauba mother trees (matrices, 7–25 years old) collected in 18 sites throughout the state of Minas Gerais/Brazil. The fruits were collected between November 2013 and February 2014, directly from the bunches, as soon as natural abscission started. At this stage, the fruits are considered matured but not ripened (Montoya et al., 2016). On average, 9.6 matrices were tested per locality (Table 1). For each matrix, fruit variables were measured in different numbers of repetitions (2–4). Using the cartographic coordinates of the sites of origin, it was possible to obtain information on their meteorological conditions (Table 1), available in WorldClim global climate records (Hijmans et al., 2005). The average temperature was 21.7 °C and the average precipitation was 1306.1 mm throughout the year.

2.2. Evaluation of macauba fruit physico-chemical variables

Eleven variables obtained from the whole fruit and its individual parts were used in the study (Table 2). Based on them, two groups of variables were established:

(a) Predictive: variables that were easily measured in a minimum of time (from 10 s to 1 min), viz. Fruit fresh weight (FFW), Fruit radial diameter (FRD), Fruit axial diameter (FAD), Endocarp width 01 (EW1) and Endocarp width 02 (EW2).
(b) Predicted: variables that required more time (4–5 days) and resources to be measured, viz. Husk or exocarp dry weight (HDW), Pulp or mesocarp dry weight (PDW), Endocarp dry weight (EDW), Kernel or seed dry weight (KDW), Oil content in the kernel (KOC) and Oil content in the pulp (POC).

The predictive variables were classified according to the way in which the set of measurements was obtained: (i) Non-destructive procedure: external measures of the fruit, using only three variables from intact fruits, i.e., FFW, FAD and FRD; and (ii) Semi-destructive procedure: external and internal measures of the fruit, using five variables, three from intact fruits (FFW, FAD and FRD) and two from sectioned fruits (EW1, EW2). These two procedures were intended to reveal how relevant the input variables are for achieving better accuracy with multiple regression and ANN. They also allow the choice of the most effective strategy for fruit handling.

The non-destructive measurements were carried out as follows: FFW was obtained by individually weighing clean fresh fruits on an electronic scale (0.01 g precision). The FAD and FRD were measured with a digital caliper (0.01 mm precision). For the destructive measurements, fresh fruits were cut in half (radial diameter) with a stainless-steel saw, and EW1 and EW2 were assessed at random positions using the precision digital caliper (Fig. 1). To obtain the dry weight of the fruit's parts (HDW, PDW, EDW and KDW), whole fresh fruits were kept in a ventilated kiln at 120 °C for 48 h, as established in previous assays. Afterwards, the fruits were dismantled using a manual press and a stainless-steel knife, and each fraction was conditioned at 65 °C, to allow additional drying without tissue combustion, and later weighed. The POC and KOC were estimated after extraction of the oils in a Soxhlet extractor, using hexane (P.A.) as solvent (Montoya et al., 2016).

2.3. Statistical analysis and procedures description

2.3.1. Evaluation of fruit provenance effect over measured variables

The fruits used in this study were collected on sites distributed over a wide range, from north to south of Minas Gerais state (Fig. 2). Therefore, we first sought to evaluate the effect of provenance on fruit morphology and, in case of statistical significance, to establish groups. In addition, repeatability coefficients (ρ) of fruit observations from a single matrix and the experimental coefficient of variation (CV %) were calculated by fitting the mixed-effects model presented in Eq. (1):

$$y = Xb + Zp + \varepsilon, \tag{1}$$

where y are the observed values of the morphological variables of the macauba fruits (i.e., FFW, FRD, FAD, EW1 and EW2); b are the fixed effects of localities added to the overall mean; p are the random effects of matrices (or permanent environment effect), with variance structure $p \mid \sigma_p^2 \sim N(0, I\sigma_p^2)$; ε is the residual component, with variance structure $\varepsilon \mid \sigma_\varepsilon^2 \sim N(0, I\sigma_\varepsilon^2)$, wherein I is an identity matrix; X and Z are incidence matrices for the fixed and random effects, respectively. The


Table 1. Cartographic coordinates of macauba fruit sites of origin (provenances) in the state of Minas Gerais/Brazil and their annual temperature and rainfall averages (Hijmans et al., 2005).

| Origin | Latitude | Longitude | No. of tree-matrices | No. of observations | Temperature average (°C) | Temperature max. (°C) | Temperature min. (°C) | Precipitation (mm) |
|---|---|---|---|---|---|---|---|---|
| 1 | 16° 41′ 07″ | 44° 21′ 54″ | 20 | 80 | 21.80 | 30.60 | 10.70 | 1293.00 |
| 2 | 16° 44′ 13″ | 43° 51′ 53″ | 14 | 56 | 22.60 | 30.40 | 12.30 | 1036.00 |
| 3 | 16° 13′ 38″ | 44° 15′ 43″ | 6 | 24 | 21.30 | 30.00 | 10.80 | 1130.00 |
| 4 | 16° 15′ 46″ | 44° 09′ 52″ | 5 | 20 | 21.90 | 30.50 | 11.40 | 1067.00 |
| 5 | 16° 12′ 30″ | 44° 25′ 35″ | 8 | 31 | 22.30 | 31.40 | 11.00 | 1099.00 |
| 6 | 17° 35′ 41″ | 44° 43′ 28″ | 3 | 10 | 23.60 | 31.90 | 11.90 | 1096.00 |
| 7 | 19° 39′ 49″ | 43° 41′ 29″ | 17 | 68 | 21.30 | 29.30 | 11.50 | 1395.00 |
| 8 | 19° 30′ 50″ | 43° 44′ 46″ | 9 | 36 | 21.20 | 29.20 | 11.20 | 1399.00 |
| 9 | 19° 13′ 42″ | 44° 02′ 17″ | 5 | 20 | 22.10 | 30.20 | 11.30 | 1232.00 |
| 10 | 19° 57′ 07″ | 44° 20′ 34″ | 3 | 12 | 21.70 | 29.90 | 10.60 | 1386.00 |
| 11 | 19° 53′ 12″ | 44° 25′ 56″ | 2 | 8 | 21.50 | 29.60 | 10.40 | 1434.00 |
| 12 | 19° 47′ 00″ | 45° 40′ 53″ | 9 | 36 | 21.80 | 30.10 | 9.70 | 1391.00 |
| 13 | 17° 53′ 14″ | 44° 34′ 53″ | 7 | 14 | 23.10 | 31.00 | 11.70 | 1168.00 |
| 14 | 19° 27′ 51″ | 45° 36′ 03″ | 23 | 46 | 22.10 | 30.20 | 10.70 | 1412.00 |
| 15 | 19° 02′ 02″ | 46° 09′ 20″ | 15 | 30 | 21.20 | 28.80 | 11.30 | 1502.00 |
| 16 | 21° 08′ 08″ | 44° 15′ 42″ | 9 | 18 | 20.10 | 28.40 | 9.60 | 1468.00 |
| 17 | 21° 15′ 57″ | 44° 50′ 49″ | 13 | 26 | 20.00 | 28.10 | 9.90 | 1539.00 |
| 18 | 21° 14′ 45″ | 44° 59′ 59″ | 4 | 8 | 20.10 | 28.40 | 10.10 | 1463.00 |

Table 2. Physico-chemical variables of intact macauba fruit and its parts.

| Fruit variable | Unit | Measurement time (a) | Type of variable |
|---|---|---|---|
| FFW | g | 10 sec | Predictive |
| FRD | mm | 10 sec | Predictive |
| FAD | mm | 10 sec | Predictive |
| EW1 | mm | 1 min | Predictive |
| EW2 | mm | 1 min | Predictive |
| KDW | g | 4–5 days | Predicted |
| EDW | g | 4–5 days | Predicted |
| PDW | g | 4–5 days | Predicted |
| HDW | g | 4–5 days | Predicted |
| KOC | g/kg | 4–5 days | Predicted |
| POC | g/kg | 4–5 days | Predicted |

(a) Average time per fruit needed to perform each measurement. Time intervals for drying and oil extraction were also accounted for. Fruit variable acronyms, FFW: Fruit fresh weight; FRD: Fruit radial diameter; FAD: Fruit axial diameter; EW1: Endocarp width 01; EW2: Endocarp width 02; KDW: Kernel or seed dry weight; EDW: Endocarp dry weight; PDW: Pulp or mesocarp dry weight; HDW: Husk or exocarp dry weight; KOC: Oil content in the kernel; POC: Oil content in the pulp. Units, g: grams, kg: kilograms and mm: millimeters.

Fig. 2. Mapping of the 18 macauba matrices – fruits donors – in the territory of Minas Gerais state. The fruits were clustered in two groups according to their sites of origin, by the k-means analysis: Group 1 (G1) – circles; Group 2 (G2) squares.
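The two-group split shown in Fig. 2 came from a k-means analysis with k = 2 on locality averages. A minimal sketch of that idea follows. It uses plain Lloyd iterations, not the Hartigan and Wong (1979) algorithm that the study ran in R, and the function name `kmeans`, the first-k initialisation and the five rows of site means are hypothetical values chosen only for illustration.

```python
import math

def kmeans(points, k=2, iters=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Deterministic start (an assumption): the first k points are the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids

# Hypothetical locality means (FFW in g, FRD in mm, FAD in mm); not data from the study.
site_means = [
    (41.0, 44.5, 43.5),
    (42.5, 44.8, 43.9),
    (40.8, 44.2, 43.2),
    (50.5, 47.0, 46.4),
    (51.2, 47.4, 46.8),
]
labels, centroids = kmeans(site_means, k=2)
print(labels)
```

With real data the variables are on different scales, so standardising them before clustering may be worth considering; the source does not say whether this was done.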

repeatability coefficients (ρ) and the residual coefficient of variation, CV (%), were obtained by Eqs. (2) and (3):

$$\rho = \frac{\sigma_p^2}{\sigma_p^2 + \sigma_\varepsilon^2}, \tag{2}$$

$$\mathrm{CV}\,(\%) = \frac{\sqrt{\sigma_\varepsilon^2}}{\bar{x}} \times 100, \tag{3}$$

where $\bar{x}$ is the arithmetic mean of the variable; $\sigma_p^2$ is the variance of the permanent environment (i.e., tree); and $\sigma_\varepsilon^2$ is the residual variance.

2.3.2. Environment stratification

Based on the previous analysis, environmental stratification was performed, clustering the fruit data into two groups (which also supports the later application of hold-out cross-validation). This procedure was adopted to reduce the interaction of the environment with fruit morphology, besides validating the prediction process for different scenarios. The k-means method (Hartigan and Wong, 1979) was applied using the average values of each locality and the morphological fruit

Fig. 1. Illustration of the macauba fruit's variables used in the prediction of desirable variables. A) Intact whole fruit: Fruit axial and radial diameters (FAD, FRD). B) Sectioned fruit: Endocarp widths (EW1, EW2), randomly taken. Acronyms are: exocarp (ec), mesocarp (mc), endocarp (ec) and seed (sd).


Fig. 3. Macauba fruit biomass. Fruit mill residues: A – husk (exocarp), B – fiber pulp cake (mesocarp), C – nut (endocarp shell), D – kernel cake (endosperm). Oils from: E – pulp/mesocarp and F – kernel/seed or endosperm.

variables. This step was completed using the k-means function of R software (Team R, 2015) adopting k = 2.

2.3.3. Prediction of macauba fruit variables

For the prediction of the fruit variables of interest (Table 2), the two procedures used to obtain the predictive variables (i.e., non-destructive and semi-destructive procedures) were considered. Therefore, the prediction processes of these two approaches were performed independently. The data were divided according to the environmental stratification (groups of Fig. 2). The hold-out cross-validation results were evaluated independently for each prediction scenario (Kohavi, 1995). This whole process was repeated for both the multiple regression analysis and the ANN approaches, which are described as follows:

2.3.3.1. Prediction by multiple linear regression

Eqs. (4) and (5) were used to predict the variables KDW, EDW, HDW, PDW, KOC and POC (respectively presented in Fig. 3) using the Multivariate Linear Model method (MLM). These six measures are considered of commercial importance, since they refer to the biomass production of the different fruit fractions and to the oil contents.

Fig. 4. Feed-forward diagram of the Artificial Neural Network (ANN) used in the present study. X1, X2 and Xt are the predictive variables (where t is equal to 3 in the non-destructive procedure and equal to 5 in the semi-destructive procedure), Y is the network predicted variable, the circular boxes are the neurons, which varied between 7–12 in each of the 1–3 layers.

performing of ANN. The topologies tested ranged from 1 to 3 layers, with the possibility of using 1–12 neurons per layer (see Fig. 4). In this way, it was possible to evaluate the number of layers and neurons sufficient to achieve good predictive accuracy. The logistic and linear activation functions, named Logsig and Purelin in MATLAB, were used in the neurons of the hidden layers (Demuth and Beale, 1993). The training algorithm that gave the best result was Bayesian regularization backpropagation (backpropagation of errors).
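A forward pass through such a multilayer perceptron can be sketched as follows. This is an illustration under stated assumptions, not the MATLAB routine used in the study: it assumes logistic (logsig) hidden neurons and a single linear (purelin) output neuron, and the function names, the layer layout and every weight are hypothetical. In practice the weights come from training, and the inputs are normally rescaled first.

```python
import math

def logsig(z):
    """Logistic sigmoid, the function MATLAB calls logsig."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_layers, output_layer):
    """Propagate one input vector through the network.

    hidden_layers: list of (weights, biases); weights[j] is the weight row of neuron j.
    output_layer: (weights, bias) of a single linear (purelin) neuron.
    """
    a = list(x)
    for weights, biases in hidden_layers:
        # Each hidden neuron: weighted sum of the previous layer plus bias, then logsig.
        a = [logsig(sum(w * v for w, v in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    w_out, b_out = output_layer
    # Linear output: weighted sum of the last hidden activations plus bias.
    return sum(w * v for w, v in zip(w_out, a)) + b_out

# Non-destructive inputs (FFW, FRD, FAD); the numbers are the means reported in Table 3.
x = [43.07, 45.1, 44.08]
# Hypothetical one-hidden-layer network with two neurons and all-zero hidden weights,
# chosen so the result is easy to check by hand: each hidden activation is logsig(0) = 0.5.
hidden = [([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0])]
output = ([2.0, 4.0], 1.0)
y_hat = forward(x, hidden, output)
print(y_hat)
```

Adding a second or third hidden layer only means appending another `(weights, biases)` pair to `hidden`, which mirrors the 1 to 3 layer topologies the text describes.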

• Non-destructive procedure

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + e, \tag{4}$$

• Semi-destructive procedure

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 + e, \tag{5}$$

where y is the observed value of the variable to be predicted; $b_0$ to $b_5$ are the multivariate regression coefficients; $x_1$ to $x_5$ are the predictive variables FFW, FRD, FAD, EW1 and EW2, respectively, according to the acronyms in Table 2; and e is the MLM residual.
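Eqs. (4) and (5) are ordinary multiple linear regressions, so their coefficients can be estimated by least squares. The sketch below shows one way to do this for the three-predictor case of Eq. (4). The helper names `fit_mlm` and `predict_mlm`, the six rows of measurements and the "true" coefficients are all hypothetical; the source does not state which software routine fitted the MLM.

```python
def fit_mlm(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bp*xp + e.

    Solves the normal equations (A'A) b = A'y by Gauss-Jordan elimination
    with partial pivoting, where A is X with a leading column of ones.
    """
    A = [[1.0] + list(row) for row in X]
    n = len(A[0])
    # Augmented matrix [A'A | A'y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def predict_mlm(coefs, row):
    """Evaluate b0 + b1*x1 + ... for one observation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

# Hypothetical non-destructive measurements (FFW, FRD, FAD); not data from the study.
X = [(40, 44, 47), (45, 48, 42), (50, 43, 45),
     (38, 46, 44), (55, 45, 48), (47, 49, 46)]
# Noise-free responses generated from made-up coefficients, so the fit should recover them.
true_coefs = [1.0, 0.2, 0.05, -0.03]
y = [predict_mlm(true_coefs, row) for row in X]

coefs = fit_mlm(X, y)
print([round(c, 4) for c in coefs])
```

The semi-destructive model of Eq. (5) uses the same function with five-element rows (adding EW1 and EW2).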

2.3.3.3. Relative effectiveness of prediction methods

To compare the effectiveness of the prediction approaches while accounting for fruit morphological differences and environmental influences, the data were partitioned into training and validation populations in three different ways. In the first, random sampling (RS) of the matrices' provenances was carried out to form the training and validation groups. In the second, the observations belonging to G1, as identified by the k-means method, were used for training, and the G2 fruits were used to validate the learning of the prediction process (G1 → G2). The third was the exact opposite of the second, i.e., both the semi-destructive and non-destructive approaches were trained on the data in G2 and validated with G1 (G2 → G1). Based on the validation results, the values of accuracy (r), which represents the square root of the coefficient of determination (R²), were compared, together with the Bias (in percentage), as given by Eq. (6), wherein MSE is the mean square error and x̄ is the mean of the predicted variable. These values were calculated both for the ANN and for multiple linear regression. From these indicators, it was possible to conclude which method has the greatest accuracy in the prediction of interest and whether the environmental factors interfere

2.3.3.2. Prediction by artificial neural networks

The ANN analysis was performed using the routines available in the software MATLAB R2011a® (Demuth and Beale, 1993), integrated with the software GENES® (Cruz, 2016) through the toolbox "nntool". This type of processing requires an output function Y; a set of variables that can be used as inputs (Xj) for training; function weights Wj; and pre-defined activation functions (Silva et al., 2014). The input variables were presented to the ANN and their weights were adjusted successively until the network output approached the desired prediction results. Thus, several topologies were tested to find the one that minimized the prediction error and maximized the prediction accuracy. The ANN used was the multilayer perceptron network, with feed-forward training (Rumelhart et al., 1985). The learning paradigm used is classified as supervised. In this type of learning, each example in the training set was accompanied by a desired value, with the aim of determining the weights that minimize the discrepancy between the desired value and the network response, expressed by the Mean Square Error (MSE) value, which is obtained as a result at the end of processing


significantly in the prediction results obtained.

$$\mathrm{Bias}\,(\%) = \frac{\sqrt{\mathrm{MSE}}}{\bar{x}} \times 100 \tag{6}$$

3. Results

3.1. Characteristics of the fruits predictive variables and provenance

The predictive variables FFW, FAD, FRD and EW2 were significantly affected by the sites of origin of the fruits (Table 3). For them, the p-values showed a high significance level (α = 0.001 or 99.9% probability). In contrast, EW1 was the only predictive variable for which the fruit's provenance was not significant (p-value = 0.13). The repeatability coefficient (ρ) showed that fruits from the same matrix displayed morphological similarities (Table 3). The variables FFW, FRD and FAD had high ρ values: 0.93, 0.93 and 0.90, respectively. However, smaller ρ values were found for EW1 and EW2: 0.43 and 0.48, respectively. The CV (%) was below 5% for FFW, FAD and FRD (4.87, 1.90 and 1.58, respectively), while for EW1 and EW2 the CV (%) was about 10% (Table 3).

To validate the results obtained by the predictions (presented below), clustering based on the fruit's provenance was performed. The k-means analysis revealed that macauba fruits exhibited phenotypic diversity according to the origin of their matrices (see coordinates in Table 1). Consequently, two groups were formed and a clear geographical separation emerged (Fig. 2). The first group (G1) comprised data from 447 fruits of 135 matrices, distributed across 13 sites, mostly in the northern region of Minas Gerais state. The second group (G2) included data from 96 fruits of 37 matrices, from the remaining five sites, located in the central-south region of the state (Table 4). Fruits of G2 presented the highest mean values of FFW, FAD, FRD, EW1 and EW2, the variables used for the ANN and MLM prediction methods. Thus, the prevailing climatic conditions of the region may have positively affected the morphological features of those fruits (Table 3).

Table 3. Estimated parameters of the Eq. (1) model for the five morphological variables.

| Component | Parameter | FFW | FRD | FAD | EW1 | EW2 |
|---|---|---|---|---|---|---|
| Fixed effect | x̄ | 43.07 | 45.1 | 44.08 | 5.73 | 4.09 |
| Fixed effect | p-value of provenances | 6.7e-5*** | 8.5e-4*** | 1.6e-6*** | 0.13 | 2.0e-4*** |
| Random effect | σp² | 57.27 | 6.72 | 6.18 | 0.28 | 0.14 |
| Random effect | σε² | 4.39 | 0.51 | 0.7 | 0.38 | 0.15 |
| Random effect | ρ | 0.93 | 0.93 | 0.9 | 0.43 | 0.48 |
| Random effect | CV (%) | 4.87 | 1.58 | 1.9 | 10.74 | 9.6 |

FV: Factor of variation; ***: significant p-value at 99.9% probability (with 17 degrees of freedom); x̄: phenotypic mean; σp²: between-plants variance; σε²: residual variance; ρ: repeatability coefficient; CV (%): residual coefficient of variation in percentage. Fruit variable acronyms, FFW: Fruit Fresh Weight; FRD: Fruit Radial Diameter; FAD: Fruit Axial Diameter; EW1: Endocarp Width 01; EW2: Endocarp Width 02.

3.2. Fruits biomass prediction

The biomass predictions by the ANN and MLM methods, and the results for the non-destructive and semi-destructive procedures, are shown in Table 5. In all tested scenarios, the predictive accuracy of the ANN was greater than that of the MLM method. The three types of validation (RS, G1 → G2 and G2 → G1) showed highly similar results, even though the samples used had distinct backgrounds, such as the climatic characteristics of the provenance, fruit morphology and number of fruits per group (Table 4). In effect, comparing the three validation methods, the MLM showed accuracy of ∼0.82 and ∼0.81 for the non-destructive and semi-destructive procedures, respectively (Table 5). On the other hand, the ANN method presented values equal to 0.96 for the non-destructive procedure and 0.97 for the semi-destructive one. The variables best predicted by the ANN were PDW, HDW and EDW, with predictive accuracy of 0.97–0.98 for the non-destructive procedure and 0.97–0.98 for the semi-destructive one. The oil contents of the kernel and the pulp (KOC and POC) varied the most, presenting values equal to 0.94–0.73 for the non-destructive procedure and 0.96–0.89 for the semi-destructive one.

The bias for the MLM was greater than 20% in all tested settings, while for the ANN it ranged from 0.74 to 11.18% (4.7% on average) (Table 5). This represents an almost 10% advantage of the network method over the regression. The non-destructive and semi-destructive procedures had average biases of 5.82% and 3.56%, respectively.

4. Discussion

4.1. The matrices provenance affects the macauba fruit morphology

Brazil is a country with large dimensions and significant climatic and environmental variations. As a result, species with broad distribution, like the macauba palm, can present morphological and quantitative variations (clinal variation) in both vegetative and reproductive structures (Assogbadjo et al., 2011). Among the five predictive morphological variables analyzed, only the endocarp width EW1 showed no difference regarding the fruit's site of origin. The state of Minas Gerais is the fourth largest in territorial area and hosts different biomes (Coutinho, 2006). The climate in Minas Gerais is predominantly tropical; however, meso- and microclimates vary from colder and wetter in the south, through wet and dry (tropical savanna) in the center, to semi-arid in the northern part of the state (Alvares et al., 2013; Walter, 1986). Therefore, quantitative variations in the macauba fruits are expected, because the matrices were from several provenances, i.e., morpho-climatic domains, with significant thermal and precipitation amplitudes (Table 1). Both local climate and soil properties are selection factors and can have a profound impact on the plant's phenotype (growth, architecture, biochemical constitution, physiology, etc.), often leading to the establishment of ecotypes. In fact, Machado et al. (2015) reported biometric and chemical variation between the fruits of two macauba ecotypes (A. aculeata 'totai' and 'sclerocarpa') that inhabit different localities in Paraná state. Besides the variance in fruit biometrics according to origin, significant variations also occurred in fruits from the same locality,

Table 4. K-means values using the predictive variables and information about the groups (G1 and G2) formed based on the matrices' provenances and the means of the analyzed variables.

| Group | Provenances | Number of tree-matrices | Number of fruits | FFW | FRD | FAD | EW1 | EW2 | Temperature (°C) | Precipitation (mm) |
|---|---|---|---|---|---|---|---|---|---|---|
| G1 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 15 | 135 | 447 | 41.62 | 44.59 | 43.58 | 5.74 | 4.07 | 22.02 | 1247.31 |
| G2 | 11, 12, 16, 17, 18 | 37 | 96 | 50.8 | 47.19 | 46.56 | 5.94 | 4.32 | 20.7 | 1459 |
| Total | 1–18 | 172 | 543 | 43.07 | 45.1 | 44.08 | 5.79 | 4.11 | 21.65 | 1306.11 |

The FFW to EW2 columns are fruit variable averages; temperature and precipitation are climatic averages. Fruit variable acronyms, FFW: Fruit Fresh Weight; FRD: Fruit Radial Diameter; FAD: Fruit Axial Diameter; EW1: Endocarp Width 01; EW2: Endocarp Width 02.
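The climatic averages in Table 4 can be recomputed from the per-site values in Table 1, which is a quick consistency check on both tables. The short sketch below does this as an unweighted mean over sites; the variable and function names are illustrative, and the unweighted mean is an assumption that happens to match the published figures.

```python
# Annual average temperature (°C) and precipitation (mm) for origins 1..18, from Table 1.
temperature = [21.8, 22.6, 21.3, 21.9, 22.3, 23.6, 21.3, 21.2, 22.1,
               21.7, 21.5, 21.8, 23.1, 22.1, 21.2, 20.1, 20.0, 20.1]
precipitation = [1293, 1036, 1130, 1067, 1099, 1096, 1395, 1399, 1232,
                 1386, 1434, 1391, 1168, 1412, 1502, 1468, 1539, 1463]

# Group membership as listed in Table 4.
all_sites = set(range(1, 19))
g2_sites = {11, 12, 16, 17, 18}
g1_sites = all_sites - g2_sites

def group_mean(values, sites):
    """Unweighted mean over the given 1-based site numbers."""
    picked = [values[s - 1] for s in sites]
    return sum(picked) / len(picked)

for name, sites in (("G1", g1_sites), ("G2", g2_sites), ("Total", all_sites)):
    print(name,
          round(group_mean(temperature, sites), 2),
          round(group_mean(precipitation, sites), 2))
```

The printed values agree with the temperature and precipitation columns of Table 4 after rounding to two decimals.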


Table 5. Results of prediction accuracy (r) and bias estimation (Bias) of the semi-destructive and non-destructive procedures.

| Procedure | Fruit variable | RS: ANN r | RS: ANN Bias (%) | RS: MLM r | G1 → G2: ANN r | G1 → G2: ANN Bias (%) | G1 → G2: MLM r | G2 → G1: ANN r | G2 → G1: ANN Bias (%) | G2 → G1: MLM r |
|---|---|---|---|---|---|---|---|---|---|---|
| Semi-destructive | KDW | 0.93 | 8.73 | 0.56 | 0.97 | 6.56 | 0.50 | 0.92 | 10.08 | 0.28 |
| Semi-destructive | PDW | 0.98 | 4.46 | 0.73 | 0.98 | 4.64 | 0.80 | 0.97 | 5.05 | 0.77 |
| Semi-destructive | HDW | 0.97 | 6.00 | 0.81 | 0.98 | 4.82 | 0.79 | 0.96 | 5.12 | 0.67 |
| Semi-destructive | EDW | 0.95 | 5.80 | 0.66 | 0.98 | 5.07 | 0.62 | 0.96 | 5.34 | 0.56 |
| Semi-destructive | KOC | 0.96 | 0.94 | 0.16 | 0.96 | 0.74 | 0.02 | 0.95 | 1.05 | 0.00 |
| Semi-destructive | POC | 0.89 | 7.23 | 0.41 | 0.90 | 3.97 | 0.59 | 0.89 | 6.67 | 0.19 |
| Non-destructive | KDW | 0.96 | 8.17 | 0.50 | 0.97 | 6.53 | 0.50 | 0.90 | 11.12 | 0.25 |
| Non-destructive | PDW | 0.97 | 5.63 | 0.78 | 0.94 | 7.58 | 0.76 | 0.94 | 6.88 | 0.76 |
| Non-destructive | HDW | 0.96 | 6.37 | 0.82 | 0.95 | 5.79 | 0.78 | 0.94 | 6.52 | 0.68 |
| Non-destructive | EDW | 0.95 | 6.56 | 0.63 | 0.97 | 5.38 | 0.59 | 0.90 | 8.64 | 0.56 |
| Non-destructive | KOC | 0.86 | 1.83 | 0.19 | 0.94 | 1.43 | 0.02 | 0.79 | 2.10 | 0.06 |
| Non-destructive | POC | 0.75 | 10.70 | 0.45 | 0.76 | 7.95 | 0.62 | 0.73 | 10.02 | 0.30 |

RS: Random Sampling; G1 → G2: Training on G1 group and validation on G2 group; G2 → G1: Training on G2 group and validation on G1 group; ANN: Artificial Neural Network; MLM: Multivariate Linear Model. Fruit variable acronyms, KDW: Kernel or seed Dry Weight; EDW: Endocarp Dry Weight; PDW: Pulp or mesocarp Dry Weight; HDW: Husk or exocarp Dry Weight; KOC: Oil Content in the Kernel; POC: Oil Content in the Pulp.
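The two indicators reported in Table 5 can be computed from a validation set as sketched below. Two assumptions are made and should be checked against the original article: r is taken as the Pearson correlation between observed and predicted values, and Bias (%) is taken as the square root of the MSE divided by the observed mean, times 100. The function names and the four observed and predicted values are hypothetical.

```python
import math

def accuracy_r(observed, predicted):
    """Pearson correlation between observed and predicted values (assumed definition of r)."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)

def bias_percent(observed, predicted):
    """sqrt(MSE) relative to the observed mean, in percent (assumed form of Eq. 6)."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return math.sqrt(mse) / (sum(observed) / n) * 100.0

# Hypothetical validation data: every prediction is exactly one unit too high.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 13.0, 15.0, 17.0]

r = accuracy_r(observed, predicted)
bias = bias_percent(observed, predicted)
print(round(r, 4), round(bias, 2))
```

The toy data show why both indicators are reported together: a constant offset leaves r at its maximum while the bias is clearly non-zero.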

two geographical groups (Fig. 2). Notably, the south group (G2) showed the highest values for most of the fruit variables. This result is probably because the plants of G2 are from regions that had, on average, higher precipitation (1459 mm) and milder temperatures (20.7 °C), typical features of the Atlantic Rainforest, whereas most of the matrices in the north group (G1) are from the Cerrado biome, where lower rainfall (∼1247.31 mm) with marked seasonality and higher temperatures (∼22.02 °C) are common (Coutinho, 2006; Walter, 1986). In fact, for Arecaceae representatives, the best plant performance and greater fruit growth are associated with tropical forest environments, with temperatures between 15 and 35 °C and rainfall of 1500–2000 mm/year (Costa and Marchi, 2008; Couvreur and Baker, 2013; Prates et al., 1986).

even from the same tree (Table 2). The statistical concept of repeatability (ρ) was used to test the identity of fruits produced by the same matrix. Similar ρ values for FFW, fruit diameters and endocarp widths were found by Manfio et al. (2011) in a study with macauba fruits from several regions of Minas Gerais state. Although our results showed higher efficacy in the prediction analyses, the estimated repeatability values for EW1 and EW2 were as low as those found by Manfio et al. (2011). The low repeatability values (ρ ≪ 0.6) for EW1 and EW2, hence of low magnitude (Resende, 2007), are possibly related to the random way the data were collected. As can be seen in Fig. 1-B, the endocarp thickness is heterogeneous along its perimeter and more than one kernel can be found inside the nut (endocarp shell) (Manfio et al., 2011). These factors can affect the values of the variable. Therefore, for greater confidence in quantifying the endocarp width, it is advisable to take a larger number of readings per endocarp or even to standardize the portion of this fraction of the macauba fruit to be measured. It is noteworthy that the fruit's developmental stage, as well as the matrices' age, can affect both the physical and the chemical characteristics of the fruit; however, those variables were already incorporated in the data that feed the ANN. As seen in Table 5, the network was able to quantify the morphological variations with great accuracy. The low CV (%) values for FFW, FAD and FRD indicate that there is little interference of the residues in assessing the variables (Amaral et al., 1997), whereas EW1 and EW2 presented higher CV (%) values than the other variables, 10.74 and 9.60, respectively. This means that both widths suffer greater variation during measurement, which could systematically interfere in the prediction.
Data from macauba fruits from other Brazilian states (Ciconini et al., 2013; Manfio et al., 2011; Trentini et al., 2016) showed mean and CV (%) values comparable to those obtained in our study, such as the endocarp width CV (%) = 12.10, which supports the adequacy of the method. The evaluation parameters discussed above (CV and ρ) support the quality of the data acquisition in this study. For instance, it was verified that the measurements of the variables in the fruits of a matrix showed average-to-high repeatability (Table 3). In addition, the low CV (%) obtained in the analysis suggests that the environmental stratification and prediction results, presented below, can be successful under different circumstances.
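As a rough illustration of the two quality parameters discussed above, the sketch below estimates repeatability and CV (%) from a balanced one-way analysis of variance (matrices as groups, fruits as repeated measurements). The estimators are an assumption: ρ is taken as the between-matrix variance divided by the total variance, and CV as the square root of the within-matrix mean square relative to the grand mean. The exact estimators used in the study may differ, and the weights below are hypothetical.

```python
from statistics import mean


def repeatability_and_cv(groups):
    """Return (rho, cv_percent) for equal-sized lists of repeated measurements.

    groups: one list of fruit measurements per matrix (balanced design assumed).
    """
    t = len(groups)       # number of matrices
    n = len(groups[0])    # fruits measured per matrix
    grand = mean(x for g in groups for x in g)
    ms_between = n * sum((mean(g) - grand) ** 2 for g in groups) / (t - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (t * (n - 1))
    # Between-matrix variance component, truncated at zero if negative.
    var_between = max((ms_between - ms_within) / n, 0.0)
    rho = var_between / (var_between + ms_within)
    cv_percent = 100 * ms_within ** 0.5 / grand
    return rho, cv_percent


# Hypothetical fresh fruit weights (g): three matrices, three fruits each.
ffw = [[40, 42, 41], [50, 52, 51], [45, 47, 46]]
rho, cv = repeatability_and_cv(ffw)
print(round(rho, 3), round(cv, 2))
```

In this made-up example the fruits of a matrix resemble each other far more than fruits of different matrices do, so ρ is high and the CV is low, which is the pattern reported above for FFW, FAD and FRD.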

4.3. The ANN reliably predicts the biomass of macauba fruit regardless of its provenance

The ANN showed higher predictive efficacy than the MLM, owing to its better predictive values (r, or accuracy) for both the semi-destructive and non-destructive procedures and its lower bias values (Table 5). The superiority of the ANN is unmistakable for the KOC variable in the semi-destructive procedure. In that case, for all data-partition schemes used in train-and-test validation, the network showed greater r values than the MLM. For example, r = 0.98 when the ANN was trained with the data from G1 and validated with G2, whilst for the MLM it was much lower (r = 0.02). Similar results occurred in the opposite scenario, i.e., G2 → G1, suggesting that the ANN can predict data from fruits of distinct origins without loss of prediction capacity. In the present study, a simple ANN was sufficient to predict the desired data (Fig. 4). The algorithm ran 1000 iterations for the stopping condition to be reached. The fixed topology, containing only 3 intermediate layers with 7–12 neurons in each (the best predictive ability), together with the algorithms cited above, supplied excellent results (Table 5). The number of neurons in each layer at the end of the ANN process can be verified in the supplementary material (Table S1). This result is probably due to the ANN's flexible dynamics, which rely on interconnected neurons that create a “global” intelligence and facilitate the prediction process (Silva et al., 2014). The use of the prediction methods (ANN and MLM) was based on validation tests (G1 → G2 and G2 → G1), as previously discussed (Table 4). Regardless of the group used for training, the network performed the prediction with higher r values. The semi-destructive procedure presented results superior to the non-destructive one,

4.2. Clustering analysis of the measured variables reveals the existence of environmental stratification for macauba fruits

The fruits from the 18 localities in the Minas Gerais state made up


technical employee of UFV, Francisco de Assis Lopes; the research group Geotechnology Applied to Global Environment (GAGEN); and the Brazilian funding agencies FAPEMIG, CNPq and CAPES for granting the scholarships. We also thank Professor Leonardo L. Bhering, DSc., for making available the Biometrics Lab/UFV and the computational server used to process the analyses.

reaching r = 0.98 in the prediction for some of the evaluated variables, which indicates its practical potential and effectiveness. Bias represents how much the prediction can go wrong, in terms of percentage. Thus, since most of the bias results were low, they showed the effectiveness and strength of the ANN in performing this technique (Haykin, 2009) and in predicting values comparable to the actual ones (Table 5). However, the ANN prediction of the oil content of the pulp (POC) presented high bias for the non-destructive procedure and moderate bias for the semi-destructive procedure. This can be explained by the difference in density caused by the amount of oil in the pulp (Evaristo et al., 2016b) and by the fact that POC does not have a direct relationship with fruit size. Indeed, the correlations between POC and FFW, FAD, FRD, EW1, and EW2 were −0.01, 0.12, 0.22, 0.07 and −0.01, respectively. Supplementary Material Fig. S1 shows the relationships between the predictive and predicted variables via LOESS (local regression) curve fitting. Besides, the oil contents of the macauba fruit, especially in the mesocarp, may vary according to the genetic material and ripeness status (Montoya et al., 2016). Different works (Bora and Rocha, 2004; Ciconini et al., 2013; Hiane et al., 2006; Machado et al., 2016, 2015) demonstrated that the oil content in the macauba mesocarp varied greatly with the fruit's population of origin, largely owing to genetic diversity. Therefore, small fruits from one particular population can have a much higher oil content than larger fruits from another population. Also, the palm's provenance influences the total amount of other fruit compounds, including carbohydrates, proteins, starch and fibers (Bora and Rocha, 2004; Coimbra and Jorge, 2011a, 2011b; Montoya et al., 2016; Wandeck and Justo, 1982). The macauba fruit oil contents predicted by the ANN presented high r values.
Indeed, KOC showed r = 0.96 and bias = 0.74%, while for POC, r ranged from 0.89 to 0.90 in the semi-destructive procedure (Table 5). Because the traditional way to determine these two variables is the costliest, predicting macauba oil biomass by ANN is an asset. The technique can considerably reduce the time, labor and reagents required to measure pulp and kernel oil contents. The ANN can also minimize other undesirable issues. Mature macauba fruits, in a typically climacteric behavior, accumulate extra amounts of oil (up to 15%) during post-harvest storage (Evaristo et al., 2016b). Therefore, the network and the non-destructive procedure allow intact fruits to be stored and, by preserving the pulp from damage, the oil content and quality are retained. According to the results, the non-destructive procedure suits the dry weight prediction of the macauba fruit fractions, whereas the ANN based on the semi-destructive procedure predicted KOC and POC with higher accuracy. Thus, in industry, the non-destructive procedure should be considered, as it saves time and facilitates storage of the fruits. The good results achieved in this work allow agility in the industrial handling of macauba fruits. Moreover, the gathered information can help speed up the palm's breeding programs because, to add genetic gains to any species under domestication, many genotypes and countless repetitions are needed.
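The swap validation (G1 → G2 and G2 → G1) and the two reported criteria can be sketched as follows. This is only an illustration under stated assumptions: a single-predictor least-squares line stands in for the prediction model (it is neither the study's ANN nor its full MLM), bias is assumed to be the mean absolute deviation of the predictions expressed as a percentage of the observed values, and the fresh and dry weights are hypothetical. The names `fit_line`, `pearson_r`, `percent_bias` and `swap_validate` are illustrative.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def pearson_r(a, b):
    """Pearson correlation between observed and predicted values."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def percent_bias(observed, predicted):
    """Assumed definition: mean absolute relative deviation, in percent."""
    return 100 * mean(abs(p - o) / o for o, p in zip(observed, predicted))


def swap_validate(g1, g2):
    """Train on one group, test on the other, then swap the roles."""
    results = {}
    for name, (train, test) in {"G1->G2": (g1, g2), "G2->G1": (g2, g1)}.items():
        slope, intercept = fit_line(*train)
        predicted = [slope * x + intercept for x in test[0]]
        results[name] = (pearson_r(test[1], predicted), percent_bias(test[1], predicted))
    return results


# Hypothetical data: (fresh fruit weight in g, dry weight in g) for each group.
g1 = ([30, 35, 40, 45], [15.2, 17.4, 20.1, 22.3])
g2 = ([42, 48, 55, 60], [21.0, 24.2, 27.3, 30.1])
results = swap_validate(g1, g2)
for scheme, (r, bias) in results.items():
    print(scheme, round(r, 3), round(bias, 2))
```

Reporting r and bias for both directions, as in Table 5, shows whether a model trained on fruits from one environment still predicts well for fruits from the other.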

Appendix A. Supplementary material

Supplementary information associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.indcrop.2017.07.031.

References

Alvares, C.A., Stape, J.L., Sentelhas, P.C., de Moraes, G., Leonardo, J., Sparovek, G., 2013. Köppen’s climate classification map for Brazil. Meteorol. Zeitschrift 22, 711–728.
Amaral, A.M., do Muniz, J.A., de Souza, M., 1997. Avaliação do coeficiente de variação como medida da precisão na experimentação com citros. Pesqui. Agropecuária Bras. 32, 1221–1225.
Assogbadjo, A.E., Kakaï, R.G., Edon, S., Kyndt, T., Sinsin, B., 2011. Natural variation in fruit characteristics: seed germination and seedling growth of Adansonia digitata L. in Benin. New For. 41, 113–125.
Baruah, D., Baruah, D.C., Hazarika, M.K., 2017. Artificial neural network based modeling of biomass gasification in fixed bed downdraft gasifiers. Biomass Bioenergy 98, 264–271.
Bicalho, E.M., Rosa, B.L., Souza, A.E., de Rios, C.O., Pereira, E.G., 2016. Do the structures of macaw palm fruit protect seeds in a fire-prone environment? Acta Bot. Brasilica 30 (4), 540–548.
Bora, P.S., Rocha, R.V.M., 2004. Macaiba palm: fatty and amino acids composition of fruits. CYTA-J. Food 4, 158–162.
Cheok, C.Y., Chin, N.L., Yusof, Y.A., Talib, R.A., Law, C.L., 2012. Optimization of total phenolic content extracted from Garcinia mangostana Linn hull using response surface methodology versus artificial neural network. Ind. Crops Prod. 40, 247–253.
Ciconini, G., Favaro, S.P., Roscoe, R., Miranda, C.H.B., Tapeti, C.F., Miyahira, M.A.M., Bearari, L., Galvani, F., Borsato, A.V., Colnago, L.A., 2013. Biometry and oil contents of Acrocomia aculeata fruits from the Cerrados and Pantanal biomes in Mato Grosso do Sul, Brazil. Ind. Crops Prod. 45, 208–214.
Coimbra, M.C., Jorge, N., 2011a. Characterization of the pulp and kernel oils from Syagrus oleracea, Syagrus romanzoffiana, and Acrocomia aculeata. J. Food Sci. 76, C1156–C1161.
Coimbra, M.C., Jorge, N., 2011b. Proximate composition of guariroba (Syagrus oleracea), jerivá (Syagrus romanzoffiana) and macaúba (Acrocomia aculeata) palm fruits. Food Res. Int. 44, 2139–2142.
Costa, C.J., Marchi, E.C.S., 2008. Germinação de sementes de palmeiras com potencial para produção de agroenergia. Rev. Biodieselbr 18, 39–50.
Coutinho, L.M., 2006. O conceito de bioma. Acta Bot. Brasilica 20, 13–23.
Couvreur, T.L.P., Baker, W.J., 2013. Tropical rain forest evolution: palms as a model group. BMC Biol. 11, 48.
Cruz, C.D., 2016. Genes software-extended and integrated with the R, Matlab and Selegen. Acta Sci. Agron. 38, 547–552.
Del Río, J.C., Evaristo, A.B., Marques, G., Martín-Ramos, P., Martín-Gil, J., Gutiérrez, A., 2016. Chemical composition and thermal behavior of the pulp and kernel oils from macauba palm (Acrocomia aculeata) fruit. Ind. Crops Prod. 84, 294–304. http://dx.doi.org/10.1016/j.indcrop.2016.02.018.
Demuth, H., Beale, M., 1993. Neural Network Toolbox for Use with MATLAB.
Evaristo, A.B., Grossi, J.A.S., de C.O. Carneiro, A., Pimentel, L.D., Motoike, S.Y., Kuki, K.N., 2016a. Actual and putative potentials of macauba palm as feedstock for solid biofuel production from residues. Biomass Bioenergy 85, 18–24.
Evaristo, A.B., Grossi, J.A.S., Pimentel, L.D., de Melo Goulart, S., Martins, A.D., dos Santos, V.L., Motoike, S., 2016b. Harvest and post-harvest conditions influencing macauba (Acrocomia aculeata) oil quality attributes. Ind. Crops Prod. 85, 63–73.
Evaristo, A.B., Martino, D.C., Ferrarez, A.H., Donato, D.B., de C.O. Carneiro, A., Grossi, J.A.S., 2016c. Energy potential of the macaw palm fruit residues and their use in charcoal production. Ciência Florest. 26, 571–577.
Hajar, M., Vahabzadeh, F., 2014. Artificial neural network modeling of biolubricant production using Novozym 435 and castor oil substrate. Ind. Crops Prod. 52, 430–438.
Hartigan, J.A., Wong, M.A., 1979. Algorithm AS 136: a K-Means clustering algorithm. J. R. Stat. Soc. C 28, 100–108. http://dx.doi.org/10.2307/2346830.
Haykin, S.O., 2009. Neural Networks and Learning Machines, 3rd ed. Pearson.
Hiane, P.A., Baldasso, P.A., Marangoni, S., Macedo, M.L.R., 2006. Chemical and nutritional evaluation of kernels of bocaiuva: Acrocomia aculeata (Jacq.) Lodd. Food Sci. Technol. 26, 683–689.
Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G., Jarvis, A., 2005. Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol. 25, 1965–1978.
Kohavi, R., 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI, pp. 1137–1145.

5. Conclusion

The present study demonstrated the success of the ANN in predicting the macauba fruit components of interest. This forecasting approach was more effective than multiple linear regression. Even though the analyzed variables showed proven quantitative differences depending on the fruit's origin, the ANN prediction was highly efficient. Thus, the network can account for the interference of external factors during the prediction process. The results also suggest that the same topology used here will work for prediction on fruits collected in other regions, with different environmental conditions.

Acknowledgements

We thank: Acrotech Co. for providing the macauba fruit data; the


Lanes, É.C.M., Motoike, S.Y., Kuki, K.N., Nick, C., Freitas, R.D., 2014. Molecular characterization and population structure of the macaw palm, Acrocomia aculeata (Arecaceae), ex situ germplasm collection using microsatellites markers. J. Hered. 106, 102–112.
Lanes, É.C.M., Motoike, S.Y., Kuki, K.N., Resende, M.D.V., Caixeta, E.T., 2016. Mating system and genetic composition of the macaw palm (Acrocomia aculeata): Implications for breeding and genetic conservation programs. J. Hered. 1, 1–38.
Lapola, D.M., Schaldach, R., Alcamo, J., Bondeau, A., Koch, J., Koelking, C., Priess, J.A., 2010. Indirect land-use changes can overcome carbon savings from biofuels in Brazil. Proc. Natl. Acad. Sci. 107, 3388–3393.
Legros, S., Mialet-Serra, I., Caliman, J.-P., Siregar, F.A., Clement-Vidal, A., Dingkuhn, M., 2009. Phenology and growth adjustments of oil palm (Elaeis guineensis) to photoperiod and climate variability. Ann. Bot. 1, 1–12. http://dx.doi.org/10.1093/aob/mcp214.
Lynch, M., Walsh, B., 1998. Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.
Machado, W., Guimarães, M.F., Lira, F.F., Santos, J.V.F., Takahashi, L.S.A., Leal, A.C., Coelho, G., 2015. Evaluation of two fruit ecotypes (totai and sclerocarpa) of macaúba (Acrocomia aculeata). Ind. Crops Prod. 63, 287–293.
Machado, W., Figueiredo, A., Guimarães, M.F., 2016. Initial development of seedlings of macauba palm (Acrocomia aculeata). Ind. Crops Prod. 87, 14–19.
Manfio, C.E., Motoike, S.Y., dos Santos, C.E.M., Pimentel, L.D., de Queiroz, V., Sato, A.Y., 2011. Repeatability in biometric characteristics of macaw palm fruit. Ciência Rural 41, 70–76.
Mazzottini-dos-Santos, H.C., Ribeiro, L.M., Mercadante-Simões, M.O., Sant’Anna-Santos, B.F., 2015. Ontogenesis of the pseudomonomerous fruits of Acrocomia aculeata (Arecaceae): a new approach to the development of pyrenarium fruits. Trees 29, 199–214.
Montoya, S.G., Motoike, S.Y., Kuki, K.N., Couto, A.D., 2016. Fruit development, growth, and stored reserves in macauba palm (Acrocomia aculeata), an alternative bioenergy crop. Planta 244, 927–938.
Motoike, S.Y., Kuki, K.N., 2009. The potential of macaw palm (Acrocomia aculeata) as source of biodiesel in Brazil. Int. Rev. Chem. Eng. 1, 632–635.
Pires, T.P., dos Santos Souza, E., Kuki, K.N., Motoike, S.Y., 2013. Ecophysiological traits of the macaw palm: a contribution towards the domestication of a novel oil crop. Ind. Crops Prod. 44, 200–210.
Prates, J.E., Sediyama, G.C., Vieira, H.A., 1986. Clima e producao agricola. Inf. Agropecu. – Empres. Pesqui. Agropecu. Minas Gerais 12, 18–22.
Resende, M.D.V., 2007. Matemática e estatística na análise de experimentos e no melhoramento genético, 1st ed. Embrapa Florestas, Colombo, PR, Brazil.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1985. Learning internal representations by error propagation. DTIC Document. pp. 318–362.
Silva, G.N., Tomaz, R.S., Sant’Anna, I. de C., Nascimento, M., Bhering, L.L., Cruz, C.D., 2014. Neural networks for predicting breeding values and genetic gains. Sci. Agric. 71, 494–498.
Singh, B., Singh, K., Rao, G.R., Chikara, J., Kumar, D., Mishra, D.K., Saikia, S.P., Pathre, U.V., Raghuvanshi, N., Rahi, T.S., 2013. Agro-technology of Jatropha curcas for diverse environmental conditions in India. Biomass Bioenergy 48, 191–202.
Team R, 2015. R Development Core Team. R A Lang. Environ. Stat. Comput. 55, 275–286.
Trentini, C.P., Oliveira, D.M., Zanette, C.M., da Silva, C., 2016. Low-pressure solvent extraction of oil from macauba (Acrocomia aculeata) pulp: characterization of oil and defatted meal. Ciência Rural 46, 725–731.
Van der Laan, C., Wicke, B., Verweij, P.A., Faaij, A.P.C., 2016. Mitigation of unwanted direct and indirect land-use change–an integrated approach illustrated for palm oil, pulpwood, rubber and rice production in North and East Kalimantan, Indonesia. GCB Bioenergy.
Walter, H., 1986. Vegetação e zonas climáticas: tratado de ecologia global. EPU.
Wandeck, F.A., Justo, P.G., 1982. A macauba, fonte energetica e insumo industrial: sua significacao economica no Brasil. In: 6. Simposio Sobre o Cerrado, Brasilia, DF (Brazil), 4–8 Oct. 1982.
