Energy 188 (2019) 116077
A comprehensive study on estimating higher heating value of biomass from proximate and ultimate analysis with machine learning approaches

Jiangkuan Xing, Kun Luo*, Haiou Wang, Zhengwei Gao, Jianren Fan

State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
Article history: Received 22 April 2019; Received in revised form 29 August 2019; Accepted 5 September 2019; Available online 10 September 2019

Abstract
Higher heating value (HHV) is an important parameter for the design and operation of biomass-fueled energy systems. The experimental approach to determining this property is time-consuming and expensive compared with mathematical models. In this paper, three machine learning approaches, artificial neural network (ANN), support vector machine (SVM) and random forest (RF), are employed to accurately estimate biomass HHV from ultimate or proximate analysis. Linear and nonlinear empirical correlations are also developed for comparison. The results show that the machine learning approaches give better predictions (R² > 0.90) than the empirical correlations (R² < 0.70), especially for extreme values. The RF model shows the best performance for both the ultimate and proximate analyses, with determination coefficient R² > 0.94. The SVM and ANN approaches show similar performances, with R² ≈ 0.90. The ultimate-based models outperform the proximate-based models even with far fewer samples. Relative importance analysis shows that, for the proximate analysis, the ash, volatile matter and fixed carbon fractions have the largest, intermediate and smallest effects, respectively. For the ultimate analysis, the carbon and hydrogen fractions hold the first two places, with the carbon fraction having the most significant influence, while the oxygen and nitrogen fractions have limited effects. © 2019 Elsevier Ltd. All rights reserved.
Keywords: Biomass; Higher heating value (HHV); Proximate analysis; Artificial neural network (ANN); Support vector machine (SVM); Random forest (RF)
1. Introduction

Renewable energy plays an increasingly important role in the world energy structure, given the growing global energy demand, the limited reserves of fossil fuels and the urgent need for environmentally-friendly energy sources [1]. Among renewable sources, biomass is the largest, with a 13% share of the global energy mix in 2016 [2]. It can be utilized directly via combustion to produce heat and electricity [3], or converted into gaseous/liquid biofuels through different conversion methods [4]. The utilization of biomass requires an extensive study of its physical, chemical and thermodynamic properties. Higher heating value (HHV) is one of the most important parameters for the design and operation of biomass-fueled energy systems [5]. HHV represents the maximum amount of energy potentially recoverable from the biomass feedstock, reflects the biomass quality [6], and must be known for determining the conversion efficiency of a given biomass conversion process [7]. HHV is also an important parameter for optimizing biomass conversion processes under different conditions. For instance, Prins et al. [8] found that biomass with different levels of heating value reached the highest gasification efficiency at different gasification temperatures. In addition, mathematical modelling, simulation and optimization are key tools for analyzing and developing complex biomass conversion technologies, in which the biomass heating value must be available for quantifying the potential energy of the fuel [9] and for estimating the devolatilization product compositions [10] of unknown fuels without experimental data. This property is usually determined experimentally via a bomb calorimeter [11]. The experiment is simple but time-consuming and expensive compared with convenient mathematical models, and may not be accessible to everyone, especially engineers. Thus, an accurate model for estimating the HHV of various biomass types is needed for the development of bioenergy systems, avoiding the time-consuming experiment.

* Corresponding author. E-mail address: [email protected] (K. Luo). https://doi.org/10.1016/j.energy.2019.116077 0360-5442/© 2019 Elsevier Ltd. All rights reserved.

During the past decades, there have been many previous
attempts to develop models for predicting the HHV of biomass from proximate analysis with nonlinear [12] and linear correlations [11], an adaptive neuro-fuzzy inference system [13], iterative neural network-adapted partial least squares [6], and artificial neural networks [14], or from ultimate analysis with linear [15] and nonlinear correlations [16], and the Euclidean distance method [17]. Among them, proximate-based models are more popular than ultimate-based models due to the low cost of proximate analysis. In early attempts, many empirical correlations (EC) were developed based on the proximate and ultimate analyses. Yin et al. [15] proposed two linear correlations from the ultimate and proximate analysis results of 44 biomass samples ranging from agricultural by-products (e.g. rice husk and corn straw) to wood (e.g. willow and oak) using Microsoft Excel. Nhuchhen et al. [12] used Microsoft Excel to develop linear and nonlinear correlations from the proximate analysis of 250 biomass samples, including by-products of fruits, agri-wastes, wood chips/tree species, grasses/leaves/fibrous materials, briquettes/charcoals/pellets, and other waste materials; the nonlinear correlation gave better performance. Garcia et al. developed linear [16] and non-linear correlations [18] with MATLAB from the ultimate and proximate analyses of 100 Spanish biomass samples including commercial fuels, industry wastes and forest wastes. These correlations have proved able to approximately estimate the HHV of biomass, but they are not universal across biomass types and sometimes produce large deviations due to the complex chemical and physical properties of biomass [12]. This indicates that the complex nonlinear relation between HHV and the ultimate and proximate analysis results cannot be well characterized by empirical linear or non-linear correlations.
Thus, more advanced approaches are desirable to characterize this complex nonlinear relation and to develop a universal model for accurately estimating the HHV of various biomass. With the development of artificial intelligence, advanced machine learning approaches, such as artificial neural networks (ANN) [19], support vector machines (SVM) [20], and random forests (RF) [21], have been developed and proved effective for complex nonlinear problems. These approaches have been successfully applied to solid fuels, for example coal [22] and biomass [23] devolatilization, biomass gasification [24], control of biomass pyrolysis systems [25] and combined heat and power plants [26], with desirable performance. Recently, several attempts have been made to estimate the HHV of biomass from proximate analysis using the ANN algorithm. Estiati et al. [14] and Uzun et al. [27] estimated the HHV of biomass with the ANN approach, and the model performances were better than those of traditional empirical correlations, but still leave room for improvement with more advanced approaches. In this context, the objective of the present work comprises three parts. First, making the first attempt with multiple machine learning approaches (SVM, ANN and RF) to estimate biomass HHV from ultimate or proximate analysis, and determining the optimal approach; empirical linear and non-linear correlations are also developed for comparison. Second, making a direct comparison between the performances of the proximate-based and ultimate-based models. Third, carrying out the first exploration of the relative importance of the proximate and ultimate data for the biomass HHV, which offers a possible way to directionally alter fuel composition to meet an HHV demand. To the best of our knowledge, this is the first work that uses the SVM and RF methods, compares multiple machine learning approaches, and measures the relative importance of each input in estimating the HHV of various biomass. The rest of this paper is organized as follows. The samples and approaches used for developing the HHV models are introduced in Section 2. Results are presented and discussed in Section 3. The last section draws conclusions.
2. Samples and approaches

A brief introduction to the present modelling procedure is provided in Fig. 1. The starting point of the present study is the samples collected from the published literature. The ultimate and proximate analyses of these samples are normalized and divided into training and application samples, respectively. The training samples are then used to develop models for predicting biomass HHV with four different approaches, namely linear and nonlinear correlations, artificial neural networks, support vector machines and random forests. After the training process is completed, the performances of the trained models are tested on an application database and compared with each other to determine the optimal one. Finally, a relative importance analysis is made to quantitatively measure the effect of each input on the biomass HHV. The samples and approaches are detailed in the following.

Fig. 1. Computational flow diagram of the present study.

2.1. Samples

In the present study, since the number of biomass types whose proximate and ultimate analyses are both available in the literature is limited, two training databases, consisting of ultimate and proximate analyses respectively, are constructed from the available literature. In addition, an application database is composed of biomass samples whose ultimate and proximate analyses are both available, for convenience of performance comparison. It is worth noting that the effects of biomass type, source and composition are not considered in the present study, for two reasons. First, there are no suitable mathematical variables to characterize these effects for modelling. Second, introducing more input parameters would decrease the learning performance of the data-driven models and increase the possibility of overfitting, given the limited number of collected samples. These effects will be considered in future work by collecting more samples and exploring approximate mathematical variables to characterize them. It is assumed here that different biomass types share similar chemical bond compositions, which are closely related to the proximate and ultimate analysis data. This assumption has proved valid not only in the present study but also in many previous studies [6,13,14,18]. The details of the training and application databases are introduced in the following.

2.1.1. Training samples

Two training databases, consisting of ultimate and proximate analyses, are collected from the available literature. For the proximate analysis database, 495 biomass samples are collected from previous literature [12,16,27-32]. For the ultimate analysis database, 190 biomass samples are collected from previous literature [15,18,28,30,33]. Since biomass samples in different publications may overlap, a pre-screening process is
used here to ensure that overlapping samples appear only once in the constructed databases. The collected biomass samples cover a wide range, including by-products of fruits, agri-wastes, briquettes/pellets, industry wastes, forest wastes and other waste materials. The training samples were collected from the published literature, in which most authors state that their data are on a dry basis. Only a very few samples are on another basis, such as the as-received basis, and we have transformed those samples to a dry basis for consistency, since their moisture fractions are small. Fig. 2a shows the data distributions of the training database for the proximate and ultimate analyses. In the box-plot sub-graphics, the box spans the 25% and 75% quantiles of the sample distribution; the lines at the top and bottom represent the maximum and minimum values; the crosses represent the 1% and 99% quantiles; the square represents the average value; and the bar chart represents the probability distribution of the sample values. In the proximate analysis database, the dry-basis mass fractions of fixed carbon (XFC), volatile matter (XVM) and ash (XASH) are within the ranges 1%-86.42%, 5.90%-92.00% and 0.10%-77.7%, respectively, and the HHVs are within the range 5.63-25.10 MJ/kg. In the ultimate analysis database, the dry-basis mass fractions of the carbon (XC), hydrogen (XH), oxygen (XO) and nitrogen (XN) elements are within the ranges 17.98%-69.8%, 0.03%-11.55%, 6.8%-60.01% and 0.00%-8.72%, respectively, and the HHVs are within the range 4.67-25.00 MJ/kg. Since the number of biomass samples whose sulfur content is available is limited, and the sulfur content is very low, the sulfur element is not considered here.
It is worth noting that, since moisture in biomass absorbs heat to vaporize and does not release any extra heat during combustion, this dry-basis analysis can be extended to wet biomass through HHV_d·m_d = HHV_w·m_w when the moisture content is known, so that the HHV of a wet sample is HHV_w = HHV_d·m_d/m_w, where HHV_d and HHV_w are the higher heating values of the dry and wet samples, respectively, and m_d and m_w are the masses of the dry and wet samples, respectively. The details of the two training databases can be found in the Supplementary materials. Another note is that machine learning approaches are data-driven; the developed models usually work well within the range of the training database but extrapolate poorly [23]. Thus, the scope of the proposed models is the range of the training samples. Increasing the number of training samples would improve the model performance, so we will monitor newly published experimental data to update the database in future studies.

Fig. 2. Data distributions of proximate and ultimate analysis results for the (a) training database and (b) application database. All analysis results are on a dry basis, and the mass fractions of the H and N elements are multiplied by 5 and 10, respectively, for clearer observation.

2.1.2. Application samples

To verify and compare the performances of the models developed from proximate and ultimate analyses with different approaches, an application database composed of 52 biomass samples, whose ultimate and proximate analyses are both available, is constructed from the literature [28,30]. These application samples are chosen for two reasons. First, comparing the performances of models developed from ultimate and proximate analyses requires samples for which both analyses are available; of the samples we collected, only these 52 satisfy this condition. Second, the proportions of training and validation samples in machine learning studies are usually about 75% and 25%, respectively [23]. In the application database, the dry-basis mass fractions of fixed carbon (FC), volatile matter (VM) and ash are within the ranges 3.60%-33.90%, 28.8%-88.20% and 0.10%-60.00%, respectively. The dry-basis mass fractions of the carbon (C), hydrogen (H), oxygen (O) and nitrogen (N) elements are within the ranges 19.12%-56.30%, 2.00%-7.50%, 25.96%-61.7% and 0.00%-5.10%, respectively, and the HHVs are within the range 8.89-22.10 MJ/kg. Fig. 3 shows the training and application samples for the proximate and ultimate analyses. The application samples are distributed randomly inside the range of the training database, which supports the reliability of the application results. The details of the application database can be found in the Supplementary materials. It is worth noting that ultimate analysis and heating value can be measured with similar workload using an elemental analyzer and a calorimeter, respectively.
Regarding the utility of the models based on ultimate analysis: when an elemental analyzer is available, they save the time of measuring HHV with a calorimeter. When an elemental analyzer is not available, there are open-source biomass databases, such as Phyllis and BIODAT, constructed by the Energy Research Centre of the Netherlands Organisation for Applied Scientific Research [34], and BIOBIB, constructed by the Vienna University of Technology [35], in which the proximate and ultimate analyses of a wide range of biomass feedstocks are available and can be approximately used as inputs to the proposed models.
2.2. Approaches

In the present study, three machine learning approaches are used to develop models for accurately predicting HHV from ultimate and proximate analyses, using the open-source package scikit-learn for Python [36], which can be downloaded at https://scikit-learn.org/stable/. The hyper-parameter optimization of these approaches is carried out through trial-and-error tests with the help of the scikit-learn GridSearchCV tool. Nonlinear and linear correlations are also developed for comparison with the help of the 1stOpt software [37]. These approaches are introduced in the following.

2.2.1. Empirical correlations

In the present study, empirical linear and non-linear correlations are also fitted for comparison with the machine learning results. For the proximate and ultimate analysis databases, the correlations are expressed as

HHV = a0 + a1·XFC + a2·XVM + a3·XASH    (1)

HHV = a0,n + a1,n·XFC + a2,n·XVM + a3,n·XASH + a4,n·XFC² + a5,n·XVM² + a6,n·XASH²    (2)

HHV = a0 + a1·XC + a2·XH + a3·XO + a4·XN    (3)

HHV = a0,n + a1,n·XC + a2,n·XH + a3,n·XO + a4,n·XN + a5,n·XC² + a6,n·XH² + a7,n·XO² + a8,n·XN²    (4)

where HHV is the higher heating value of the biomass; XFC, XVM and XASH denote the mass fractions of fixed carbon, volatile matter and ash, respectively; XC, XH, XO and XN denote the mass fractions of the carbon, hydrogen, oxygen and nitrogen elements, respectively; and ai and ai,n are the coefficients to be determined for the linear and non-linear correlations, respectively. With the help of the 1stOpt software [37], the coefficients of the correlations are determined with the Levenberg-Marquardt (LM) algorithm.
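As a small illustration of this fitting step, the sketch below fits the linear form of Eq. (1) with SciPy's curve_fit, which uses the Levenberg-Marquardt algorithm by default for unconstrained problems. The paper itself used 1stOpt; the synthetic proximate data and the "true" coefficients here are invented purely for demonstration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Linear correlation of Eq. (1): HHV = a0 + a1*XFC + a2*XVM + a3*XASH.
def linear_corr(X, a0, a1, a2, a3):
    xfc, xvm, xash = X
    return a0 + a1 * xfc + a2 * xvm + a3 * xash

# Illustrative synthetic proximate data (mass fractions in %), not the paper's database.
rng = np.random.default_rng(0)
xfc = rng.uniform(5, 80, 200)
xvm = rng.uniform(10, 90, 200)
xash = rng.uniform(0.1, 60, 200)
hhv = 19.0 + 0.12 * xfc - 0.02 * xvm - 0.17 * xash + rng.normal(0, 0.3, 200)

# method="lm" selects the Levenberg-Marquardt solver.
coeffs, _ = curve_fit(linear_corr, (xfc, xvm, xash), hhv, method="lm")
print(coeffs)  # fitted [a0, a1, a2, a3]
```

The quadratic form of Eq. (2) can be fitted the same way by extending the model function with squared terms.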
Fig. 3. Data distributions of proximate and ultimate analysis results for the application database: (a) proximate analysis and (b) ultimate analysis.

2.2.2. Artificial neural network

Fig. 4 shows the schematic topological architecture of a three-layer ANN model, which contains three types of layers: the input, hidden and output layers. The input signals enter the input layer and are transferred to the output layer through the fully-connected neurons in the hidden layer, the weight and bias vectors between neurons in different layers, and the nonlinear activation functions in the neurons. The output signals can be expressed as

y = h{g[f(x)·W1 + B1]·W2 + B2}    (5)

where x and y are the input and output vectors, respectively, and f, g and h are the activation functions; here the sigmoid function is used for the hidden and output layers, and no activation function is applied in the input layer. Wi and Bi are the weight and bias vectors between the layers. The weight and bias vectors are randomly initialized, and a prediction is then obtained from the input signals by forward propagation. An error function, the mean squared error (MSE), is introduced to measure the deviation between the predicted and actual values. If the error reaches the target criterion, the training process ends; if not, back-propagation modifies the weight and bias vectors using the algorithm proposed by Rumelhart et al. [38], and this process is repeated until the criterion is met. In addition, an L2 regularization term [39] is added to the error function to avoid over-fitting. It is worth noting that the number of neurons in the hidden layer has a significant effect on the ANN performance; the optimal number is determined with a trial-and-error test, and the results are presented in the next section.

Fig. 4. The schematic topological architecture of the three-layer ANN model.

2.2.3. Support vector machine

The support vector machine (SVM) method was developed by Vapnik et al. [40] based on statistical learning theory and the structural risk minimization principle. It maps the training samples into a higher-dimensional feature space using nonlinear kernel functions, thereby converting the nonlinear problem into a linear one whose optimal solution can be obtained [41]. Given a training database T = {(x1, y1), ..., (xn, yn)}, where xi and yi are the input and output vectors, respectively, the SVM regression function f(x) can be expressed as

f(x) = w·φ(x) + b    (6)

where w and b are the weight and bias vectors, respectively, and φ(x) is the nonlinear function used to map the initial input vectors into the higher-dimensional feature space. By introducing Lagrange multipliers and optimality constraints, the regression function can be determined as

f(x) = Σ_{i=1..n} (αi* − αi)·K(x, xi) + b    (7)

where αi* and αi are the Lagrange multipliers and K(x, xi) is the kernel function, here set to the common Gaussian radial basis function (RBF), K(xi, xj) = exp(−γ‖xi − xj‖²), in which γ is the width parameter. A more detailed introduction to the SVM approach can be found in Ref. [41].

2.2.4. Random forest

Random forest is an ensemble learning approach for classification and regression that constructs a multitude of decision trees on the training samples; the output is the majority class of the trees for classification and the average prediction of the trees for regression. Fig. 5 shows the schematic topological architecture of the random forest approach. The starting point is the training database consisting of Ntotal samples with M features. The bootstrap sampling method is employed to randomly generate n sample sets from the original training database. For each sample set, the samples are randomly divided into in-bag and out-of-bag (OOB) samples in proportions of 2/3 and 1/3, respectively. The out-of-bag samples are not involved in the training process, but are used to determine the optimal number of trees by a trial-and-error method; the test results are presented in the next section. The prediction of the trained random forest model is the average of the predictions of all trees. Details about this method can be found in the study of Breiman et al. [42].

2.3. Evaluation indicators

Here, we aim to develop models that accurately predict the HHV of various biomass, and our main concern is the generality of the proposed models. Thus, we compare model performance with statistical indicators instead of presenting each individual prediction. Three indicators, the determination coefficient (R²) [43], the root mean square error (RMSE) [23], and the mean absolute percentage error (MAPE) [6], which have been widely used to quantitatively compare model performance in previous machine learning studies, are defined as

R² = 1 − Σ_{i=1..Ntotal} (y_{i,cal} − y_{i,exp})² / Σ_{i=1..Ntotal} (y_{i,exp} − ȳ_exp)²    (8)

RMSE = sqrt[ (1/Ntotal) Σ_{i=1..Ntotal} (y_{i,cal} − y_{i,exp})² ]    (9)

MAPE = (1/Ntotal) Σ_{i=1..Ntotal} |y_{i,cal} − y_{i,exp}| / y_{i,exp} × 100%    (10)

where R², RMSE and MAPE are the determination coefficient, root mean square error and mean absolute percentage error, respectively; Ntotal is the total number of samples; i is the sample index; y_{i,cal} and y_{i,exp} are the predicted and measured HHVs of the i-th sample; and ȳ_exp is the average measured HHV of the entire set of training or application samples.
Fig. 5. The schematic topological architecture of the random forest approach.
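The three indicators of Eqs. (8)-(10) can be implemented directly; the sketch below is a minimal version, where the measured and predicted HHV arrays are hypothetical values for illustration only.

```python
import numpy as np

def r2(y_cal, y_exp):
    """Determination coefficient, Eq. (8)."""
    y_cal, y_exp = np.asarray(y_cal), np.asarray(y_exp)
    ss_res = np.sum((y_cal - y_exp) ** 2)
    ss_tot = np.sum((y_exp - y_exp.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_cal, y_exp):
    """Root mean square error, Eq. (9)."""
    y_cal, y_exp = np.asarray(y_cal), np.asarray(y_exp)
    return np.sqrt(np.mean((y_cal - y_exp) ** 2))

def mape(y_cal, y_exp):
    """Mean absolute percentage error in %, Eq. (10)."""
    y_cal, y_exp = np.asarray(y_cal), np.asarray(y_exp)
    return np.mean(np.abs(y_cal - y_exp) / y_exp) * 100.0

# Hypothetical HHVs (MJ/kg), for illustration only.
measured = np.array([18.0, 20.0, 16.0, 19.5])
predicted = np.array([17.5, 20.5, 15.8, 19.9])
print(r2(predicted, measured), rmse(predicted, measured), mape(predicted, measured))
```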
3. Results and discussion

In this section, the hyper-parameter optimization results of the different machine learning methods are presented. The modelling performances of the EC, ANN, SVM and RF approaches on the training and validation databases are then presented and compared for the proximate and ultimate analyses, respectively. Finally, the relative importance of each input for the biomass HHV is explored.

3.1. Proximate analysis

3.1.1. Hyper-parameter optimization

When the proximate data are used to develop models, the coefficients of Eqs. (1) and (2) are obtained with the 1stOpt software using the LM algorithm. The determined correlations are expressed as
HHV = 17.797 + 0.031·XFC + 0.010·XVM − 0.155·XASH    (11)

HHV = 19.050 + 0.124·XFC − 0.021·XVM − 0.167·XASH − 0.001·XFC² + 0.000018·XVM² − 0.000055·XASH²    (12)

It should be mentioned that the proximate analysis values are in percent. For the ANN approach, the trial-and-error method with 10-fold cross-validation is employed to determine the optimal number of neurons in the hidden layer, with the tested number ranging from 1 to 30. The test results are presented in Fig. 6: the mean squared error over the entire database first decreases and then stabilizes as the number of hidden neurons increases, indicating that the model performance is no longer improved once the number of neurons exceeds a critical value. Thus, the optimal number of neurons is chosen as 21, as shown in Fig. 6.

Fig. 6. Trial-and-error test results for the number of neurons in the hidden layer for the proximate analysis. The dashed red line marks the optimal number of neurons. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

For the SVM approach, the hyper-parameters to be specified are the penalty parameter C and the RBF width parameter γ. Here, the optimal values of C and γ are determined by testing a series of combinations with 10-fold cross-validation; the tested values of both C and γ range from 0.01 to 100 with a resolution of 0.1, i.e. 1000 × 1000 tests are carried out. The test results show that the mean training error decreases as γ increases and then stabilizes once γ exceeds 17, while the mean test error is small for γ below 17 and deteriorates beyond it. The determined optimal values of C and γ are 84.41 and 0.01, respectively. For the random forest model, the parameter to be determined is the number of decision trees. Here, the performance of the random forest model is tested with the number of trees ranging from 1 to 500. For both the training data and the OOB data, the mean squared error first decreases and then stabilizes as the number of trees increases, as shown in Fig. 7; thus, the optimal number is set to the critical value, 67.
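The grid search described above can be sketched with scikit-learn's GridSearchCV. The data below are a generic placeholder standing in for the proximate database, and the grid is far coarser than the one used in the study; only the mechanism (cross-validated search over C and γ) is meant to carry over.

```python
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression  # placeholder data only

# Placeholder regression data standing in for the proximate-analysis database.
X, y = make_regression(n_samples=200, n_features=3, noise=0.1, random_state=0)

# Coarse grid over penalty C and RBF width gamma, scored with 10-fold
# cross-validation, mirroring the procedure described above.
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=10,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```

The ANN neuron-count test can be run the same way by searching over MLPRegressor's hidden_layer_sizes instead.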
3.1.2. Model performances for training samples

Fig. 8 shows comparisons of the training performances of the EC, ANN, SVM and RF approaches on the proximate analysis database. For the EC method, the predictions of both the linear and nonlinear correlations are plotted for comparison.

Fig. 7. Normalized mean squared errors of the OOB data and training data of the random forest with different numbers of decision trees for the proximate analysis. The dashed red line marks the optimal number of trees. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

To quantitatively compare the performances of the proposed models, the evaluation indicators are calculated and compared. The nonlinear correlation shows a slightly better performance than the linear correlation, with R² increasing from 0.678 to 0.707, MAPE decreasing from 1.524 to 1.454, and RMSE decreasing from 6.556% to 6.282%, but the performance is still far from acceptable. This indicates that the complex nonlinear relation between proximate analysis and HHV cannot be well characterized by a simple quadratic correlation. The training performance improves when machine learning approaches are employed. Among the three ML approaches, RF gives the best training performance, with R², MAPE and RMSE of 0.962, 0.274 and 2.020%, respectively. The ANN and SVM approaches perform similarly, with R² of 0.859 and 0.845, MAPE of 1.040 and 1.119, and RMSE of 4.669% and 4.662%, respectively. The reason why the RF model outperforms the other two is that RF is an ensemble learning approach, in which the bootstrap sampling method generates Ntotal sub-samples for each decision tree and the predictions are obtained by averaging over the trees. This makes the learning process more robust and reduces the effects of noisy data and the possibility of over-fitting, whereas the SVM and ANN rely on a single learning process over the training database without bootstrap sampling and statistical averaging. Therefore, the RF model performs better than the other two models, as has also been found in many previous studies [23,45,46].
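The bootstrap/OOB behaviour described above can be reproduced with scikit-learn's RandomForestRegressor. The synthetic data below are a stand-in for the proximate database; only the mechanism (bootstrap sampling per tree, OOB scoring, ensemble averaging) is meant to carry over.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression  # placeholder data only

X, y = make_regression(n_samples=300, n_features=3, noise=0.5, random_state=1)

# Each tree is grown on a bootstrap sample; oob_score=True evaluates the
# ensemble on the out-of-bag samples, matching the OOB test described above.
rf = RandomForestRegressor(n_estimators=67, oob_score=True, random_state=1)
rf.fit(X, y)
print(rf.oob_score_)      # OOB R^2 of the ensemble
print(rf.predict(X[:1]))  # prediction = average over all 67 trees
```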
3.1.3. Model performances for application samples

To comprehensively verify the performances of the developed models, all of them are employed to predict the biomass HHV in the application database. Fig. 9 shows comparisons of the model predictions on the application database. To quantitatively compare the performances, the evaluation indicators are calculated and compared. The linear and nonlinear correlations give very poor predictions, with R² of 0.057 and 0.104, MAPE of 2.327 and 2.268, and RMSE of 8.836% and 8.413%, respectively. Especially for biomass with low or high HHV, the EC models give large deviations, with RMSE larger than 80%. Consistent with the training results, the three ML approaches again perform better than the EC models. The RF model gives the best prediction, with R², MAPE and RMSE of 0.944, 0.567 and 2.564%, respectively. The SVM and ANN models share a similar performance, with R² of 0.900 and 0.909, MAPE of 0.759 and 0.725, and RMSE of 3.407% and 3.533%, respectively. It is worth noting that the R² of the application performance is slightly higher than that of the training performance. This is because the mean values and data distributions of the training and validation databases differ: the range of the validation samples is smaller and is mainly concentrated in the region where the training samples are abundant.
3.2. Ultimate analysis
Fig. 8. Comparisons of the training performances of the EC, ANN, SVM and RF models for the proximate analysis. Here, the red solid line represents the best performance, y ¼ x. For the EC method, both predictions of the linear and nonlinear correlations are plotted. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
3.2.1. Hyper-parameter optimization

For the ultimate analysis data, the coefficients of the empirical linear and nonlinear correlations are determined using the same method as for the proximate analysis, and the resulting correlations are expressed as follows:
J. Xing et al. / Energy 188 (2019) 116077
Fig. 9. Comparisons of the application performances of the EC, ANN, SVM and RF models for the proximate analysis. Here, the red solid line represents the best performance, y = x. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
HHV = 3.918 + 0.276 X_C + 0.016 X_H + 0.020 X_O + 0.245 X_N    (13)

HHV = 4.437 + 0.879 X_C + 0.278 X_H − 0.144 X_O − 0.195 X_N − 0.007 X_C^2 − 0.015 X_H^2 − 0.001 X_O^2 + 0.057 X_N^2    (14)

Note that the ultimate analysis values are also on a percentage basis. For the three machine learning approaches, several hyper-parameters again need to be optimized to obtain desirable model performances. The detailed methods are the same as those for the proximate analysis, and similar trends are observed. The error of the ANN model decreases and then stabilizes as the number of neurons in the hidden layer increases, similar to Fig. 6, so the critical number, 18, is selected as the optimal value. The mean training error of the SVM model decreases with increasing C and g, but the mean test error increases when g is larger than 8. The optimal values of C and g are 78.51 and 0.31, respectively. For the random forest model, the mean sum errors on the OOB and training data first decrease and then stabilize as the number of trees increases, similar to Fig. 7, and the optimal tree number is set to 58.
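The two optimization strategies described above (cross-validated search over the SVM penalty C and RBF width g, i.e. gamma in scikit-learn, and OOB-error monitoring for the forest size) can be sketched with scikit-learn. The toy ultimate-analysis data and the parameter grid below are invented for demonstration and are much coarser than the optima reported in the text (C = 78.51, g = 0.31, 58 trees).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Toy stand-in for an ultimate-analysis database:
# columns are (C, H, O, N) fractions; target is a synthetic HHV.
X = rng.uniform(0.0, 60.0, size=(150, 4))
y = 0.35 * X[:, 0] + 1.1 * X[:, 1] - 0.1 * X[:, 2] + rng.normal(0.0, 0.5, 150)

# Cross-validated grid search over the RBF-kernel penalty C and width gamma:
# training error keeps falling as both grow, so the test (CV) error decides.
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
    cv=5,
)
grid.fit(X, y)

# For the random forest, the out-of-bag (OOB) error provides a built-in
# validation signal for judging whether the tree count has saturated.
rf = RandomForestRegressor(n_estimators=58, oob_score=True, random_state=0)
rf.fit(X, y)
```

In practice one would sweep `n_estimators` and plot the OOB error against tree count, selecting the point where the curve flattens, as the text describes for Fig. 7.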
Fig. 10. Comparisons of the training performances of the EC, ANN, SVM and RF models for the ultimate analysis. Here, the red solid line represents the best performance, y = x. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
3.2.2. Model performances for training samples

Fig. 10 compares the training performances of the EC, ANN, SVM and RF approaches on the ultimate analysis database. For the EC method, the predictions of both the linear and nonlinear correlations are plotted, and the evaluation indicators are calculated for a quantitative comparison. The nonlinear correlation again performs slightly better than the linear correlation, with R2 increasing from 0.643 to 0.691, MAPE decreasing from 1.568 to 1.459, and RMSE decreasing from 6.276% to 5.789%, but the performance remains far from acceptable. This again indicates that the complex nonlinear relation between ultimate analysis and HHV cannot be well characterized by a simple quadratic correlation. The training performance improves when machine learning approaches are employed. Among the three ML approaches, the SVM model gives the best training performance, with R2, MAPE and RMSE of 0.953, 0.571 and 2.534%, respectively. The RF model gives intermediate predictions, with R2, MAPE and RMSE of 0.938, 0.651 and 2.395%, respectively. The ANN model performs worst, with R2, MAPE and RMSE of 0.897, 0.842 and 3.865%, respectively.

3.2.3. Model performances for application samples

The performances of the four developed models, including EC,
ANN, SVM and RF, are verified on the application database. Fig. 11 compares the performances of the various models on this database, and the evaluation indicators are calculated for a quantitative comparison. Compared with the EC models based on the proximate analysis, the EC models based on the ultimate analysis give better predictions, with R2 increasing from about 0.1 to nearly 0.6. This indicates that ultimate analysis data are more suitable for developing empirical correlation models, so the ultimate analysis is recommended for predicting biomass HHV when both analyses are available. The three ML approaches again show better performances. The RF model gives the best performance, with R2, MAPE and RMSE of 0.944, 0.567 and 2.564%, respectively, and the ANN and SVM approaches show similar, slightly lower performances.

Fig. 11. Comparisons of the application performances of the EC, ANN, SVM and RF models for the ultimate analysis. Here, the red solid line represents the best performance, y = x. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

It should be noted that measurement deviations in the training samples affect model training and development: the accuracy of the training samples first influences the training process, then the developed model, and finally its predictions. If the data are unreliable, this is not reflected in the model predictions, because common machine learning studies assume that the training data are accurate and do not consider measurement deviations [14,43,47]. To the best of the authors' knowledge, the mean and standard deviation of the experimental values and model predictions should be the key indicators of the measurement deviation in a sample. In the present study, sample accuracy is not considered, since measurement deviations are unavailable in the literature from which the samples were collected. Moreover, this is not the topic of the present study, and quantifying the uncertainty of machine learning models remains challenging and needs further exploration [48]. Here we outline a method to account for the experimental deviation of samples in machine learning models using the Monte Carlo method [48]. The inputs and targets in the training database have means (xmean and ymean) and standard deviations (xstd and ystd) that can be obtained through repeated measurements. Training samples are randomly generated for each input and target within the range determined by these means and standard deviations, and the generated samples are used for training. This random generation and training process is repeated NMCM times for Monte Carlo statistics, yielding NMCM machine learning models, each of which gives a prediction for a sample. Finally, the mean and standard deviation of the predictions can be assessed. For model application, this method is also applicable without retraining.

3.3. Relative importance analysis
This paper reports the first quantitative analysis of the relative importance of ultimate and proximate analysis data for the HHV. The above discussion shows that the RF model performs best, which means that the trained RF model can well characterize these complex correlations; it is therefore analyzed to explore the relative importance of each input for the biomass HHV. In the random forest model, each node in a decision tree is split on the variable m for which the Gini impurity of the two descendant nodes is less than that of the parent node. The relative importance of each input on the output can be assessed by summing the Gini decreases for each variable over all trees in the forest and then normalizing the sums over all inputs [49]. Fig. 12 shows the measured relative importance of the inputs on the biomass HHV for the proximate and ultimate analyses. For the proximate analysis data, the ash, volatile matter and fixed carbon fractions show the maximum, medium and minimum effects on the biomass HHV, with relative importances of 73.7%, 15.9% and 10.4%, respectively. This indicates that the ash content has the most significant effect on biomass HHV, and biomass with a high ash content usually has a low HHV. For the ultimate analysis data, the carbon and hydrogen fractions hold the first two places, with the carbon fraction having the most significant effect. This is because the heat released by biomass mainly originates from the breaking of bridge bonds in the combustible C–H components. The oxygen and nitrogen fractions have limited effects on the biomass HHV. This analysis could provide a way to directionally alter fuel composition to meet HHV demands.

4. Conclusions

In this work, we report the first comprehensive study on estimating biomass HHV from proximate and ultimate analyses with multiple machine learning methods. Empirical linear and nonlinear correlations are also developed for comparison.
The results show that machine learning approaches give better predictions (R2 > 0.90) than the linear and nonlinear empirical correlations (R2 < 0.70), especially for the extreme values. The RF model shows the best performance for both the ultimate and proximate analyses, with the determination coefficient R2 > 0.94. The SVM and ANN approaches show similar performances, with determination coefficients R2 ≈ 0.90. It is worth noting that the ultimate-based models perform better than the proximate-based models, even with far fewer samples. The relative importance analysis shows that, for the proximate analysis, the ash, volatile matter and fixed carbon fractions have the maximum, medium and minimum effects, respectively. For the ultimate analysis, the carbon and hydrogen fractions hold the first two places, with the carbon fraction having the most significant effect, while the oxygen and nitrogen fractions have limited effects. The performance of other machine learning approaches will be investigated in future work.

Fig. 12. Measured relative importance of inputs on outputs for proximate analysis (left) and ultimate analysis (right).
Conflict of interest

The authors have no conflicts of interest to declare.

Acknowledgment

The authors are grateful for support from the National Key Research and Development Program of China (Grant No. 2017YFB0601805) and the National Natural Science Foundation of China (Grant No. 91741203). JX especially thanks his parents, sister and girlfriend for their constant encouragement and companionship.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.energy.2019.116077.

References

[1] Taarning E, Osmundsen CM, Yang XB, Voss B, Andersen SI. Zeolite-catalyzed biomass conversion to fuels and chemicals. Energy Environ Sci 2011;4:793-804.
[2] World Bioenergy Association. Global bioenergy statistics. Available at: https://worldbioenergy.org/uploads/181203%20WBA%20GBS%202018_hq.pdf; 2016.
[3] Goyal HB, Seal D, Saxena RC. Bio-fuels from thermochemical conversion of renewable resources: a review. Renew Sustain Energy Rev 2008;122:504-17.
[4] Toor SS, Rosendahl L, Rudolf A. Hydrothermal liquefaction of biomass: a review of subcritical water technologies. Energy 2015;36:2328-42.
[5] Khodaei H, Al-Abdeli YM, Guzzomi F, Yeoh GH. An overview of processes and considerations in the modelling of fixed-bed biomass combustion. Energy 2015;88:946-72.
[6] Hosseinpour S, Aghbashlo M, Tabatabaei M, Mehrpooya M. Estimation of biomass higher heating value (HHV) based on the proximate analysis by using iterative neural network-adapted partial least squares (INNPLS). Energy 2017;138:473-9.
[7] McKendry P. Energy production from biomass (part 1): overview of biomass. Bioresour Technol 2002;83:37-46.
[8] Prins MJ, Ptasinski KJ, Janssen F. From coal to biomass gasification: comparison of thermodynamic efficiency. Energy 2007;32:1248-59.
[9] Puig-Arnavat M, Bruno JC, Coronas A. Review and analysis of biomass gasification models. Renew Sustain Energy Rev 2010;14:2841-51.
[10] Wen X, Luo K, Jin HH, Fan JR. Large eddy simulation of piloted pulverized coal combustion using extended flamelet/progress variable model. Combust Theor Model 2017;21:925-53.
[11] Majumder AK, Jain R, Banerjee P, Barnwal JP. Development of a new proximate analysis based correlation to predict calorific value of coal. Fuel 2008;87:3077-81.
[12] Nhuchhen DR, Salam PA. Estimation of higher heating value of biomass from proximate analysis: a new approach. Fuel 2012;99:55-63.
[13] Akkaya E. ANFIS based prediction model for biomass heating value using proximate analysis components. Fuel 2016;180:687-93.
[14] Estiati I, Freire FB, Freire JT, Aguado R, Olazar M. Fitting performance of artificial neural networks and empirical correlations to estimate higher heating values of biomass. Fuel 2016;180:377-83.
[15] Yin C. Prediction of higher heating values of biomass from proximate and ultimate analyses. Fuel 2011;90:1128-32.
[16] García R, Pizarro C, Lavín AG, Bueno JL. Spanish biofuels heating value estimation. Part II: proximate analysis data. Fuel 2014;117:1139-47.
[17] Santosa RG, Bordado JC, Mateusa MM. Estimation of HHV of lignocellulosic biomass towards hierarchical cluster analysis by Euclidean's distance method. Fuel 2018;221:72-7.
[18] García R, Pizarro C, Lavín AG, Bueno JL. Spanish biofuels heating value estimation. Part I: ultimate analysis data. Fuel 2014;117:1130-8.
[19] Lecun Y, Boser BE, Denker JS. Backpropagation applied to handwritten zip code recognition. Neural Comput 1989;1:541-51.
[20] Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273-97.
[21] Ho TK. Random decision forests. In: Proceedings of the 3rd international conference on document analysis and recognition; 1995. p. 278-82.
[22] Xing JK, Luo K, Pitsch H, Wang HO, Bai Y, Zhao CG, Fan JR. Predicting kinetic parameters for coal devolatilization by means of artificial neural networks. Proc Combust Inst 2019;37:2943-50.
[23] Xing JK, Luo K, Wang HO, Wang S, Bai Y, Fan JR. Predictive single-step kinetic model of biomass devolatilization for CFD applications: a comparison study of empirical correlations (EC), artificial neural networks (ANN) and random forest (RF). Renew Energy 2019;136:104-14.
[24] Mutlu AY, Yucel O. An artificial intelligence based approach to predicting syngas composition for downdraft biomass gasification. Energy 2018;165:895-901.
[25] Kasmuri NH, Kamarudin SK, Abdullah SRS, Hasan HA, Md Som A. Integrated advanced nonlinear neural network-simulink control system for production of bio-methanol from sugar cane bagasse via pyrolysis. Energy 2019;168:261-72.
[26] De S, Kaiadi M, Fast M, Assadi M. Development of an artificial neural network model for the steam process of a coal biomass cofired combined heat and power (CHP) plant in Sweden. Energy 2007;32:2099-109.
[27] Uzun H, Yıldız Z, Goldfarb JL, Ceylan S. Improved prediction of higher heating value of biomass using an artificial neural network model based on proximate analysis. Bioresour Technol 2017;234:122-30.
[28] Parikha J, Channiwala SA, Ghosal GK. A correlation for calculating HHV from proximate analysis of solid fuels. Fuel 2005;84:487-94.
[29] Erol M, Haykiri-Acma H, Kuçukbayrak S. Calorific value estimation of biomass from their proximate analyses data. Renew Energy 2010;35:170-3.
[30] Dhyani V, Bhaskar T. A comprehensive review on the pyrolysis of lignocellulosic biomass. Renew Energy 2018;129:695-716.
[31] Ozveren U. An artificial intelligence approach to predict gross heating value of lignocellulosic fuels. J Energy Inst 2017;90:397-407.
[32] Ozyuguran A, Yaman S. Prediction of calorific value of biomass from proximate analysis. Energy Procedia 2017;107:130-6.
[33] Han J, Yao X, Zhan Y, Oh SY, Kim LH, Kim HJ. A method for estimating higher heating value of biomass-plastic fuel. J Energy Inst 2017;90:331-5.
[34] Reisinger K, Haslinger C, Herger M. BIOBIB - a database for biofuels. Available at: http://cdmaster2.vt.tuwien.ac.at/biobib/. [Accessed 12 May 2019].
[35] ECN.TNO. Phyllis2, database for biomass and waste. Available at: https://phyllis.nl/. [Accessed 12 May 2019].
[36] Python. Scikit-learn packages. Available at: https://scikit-learn.org/stable/. [Accessed 15 July 2019].
[37] 7D-Soft High Technology Inc. 1st Opt Manual, Release 6.0. Available at: http://www.7dsoft.com/. [Accessed 15 July 2019].
[38] Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323:533.
[39] Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press; 2016.
[40] Vapnik V. The nature of statistical learning theory. New York: Springer; 1995. ISBN 0-387-94559-8.
[41] Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput 2004;14(3):199-222.
[42] Breiman L. Random forests. Mach Learn 2001;45(1):5-32.
[43] Xing JK, Luo K, Wang HO, Fan JR. Estimating biomass major chemical constituents from ultimate analysis using a random forest model. Bioresour Technol 2019;288:121541.
[45] Lei CK, Deng J, Cao K, Xiao Y, Ma L, Wang WF, Ma T, Shu CM. A comparison of random forest and support vector machine approaches to predict coal spontaneous combustion in gob. Fuel 2019;239:297-311.
[46] You HH, Ma ZY, Tang YJ, Wang YL, Ni MJ, Cen KF, Huang QX. Comparison of ANN (MLP), ANFIS, SVM, and RF models for the online classification of heating value of burning municipal solid waste in circulating fluidized bed incinerators. Waste Manag 2017;68:186-97.
[47] Luo K, Xing JK, Bai Y, Fan JR. Prediction of product distribution in coal devolatilization by an artificial neural network model. Combust Flame 2018;193:283-94.
[48] Coral R, Flesch CA, Penz CA, Roisenberg M, Pacheco ALS. A Monte Carlo-based method for assessing the measurement uncertainty in the training and use of artificial neural networks. Metrol Meas Syst 2016;23:281-94.
[49] Breiman L. Manual on setting up, using, and understanding random forests. V3.1. Available at: http://www.stat.berkeley.edu/~breiman/RandomForests/cc-manual.htm; 2003.
Nomenclature

Abbreviations
HHV: Higher heating value
ANN: Artificial neural network
SVM: Support vector machine
RF: Random forest
EC: Empirical correlation
RBF: Gaussian radial basis function
OOB: Out-of-bag
LM: Levenberg-Marquardt

Variables
R2: Determination coefficient
MSE: Mean sum error
RMSE: Root mean sum error
MAPE: Mean absolute percentage error
X_FC: Fixed carbon fraction
X_VM: Volatile matter fraction
X_ASH: Ash fraction
X_C: Carbon fraction
X_H: Hydrogen fraction
X_O: Oxygen fraction
X_N: Nitrogen fraction
a_i: Linear coefficients
a_{i,n}: Nonlinear coefficients
x, x_i: Input vectors
x_mean, x_std: Mean and standard deviation of the inputs
y_mean, y_std: Mean and standard deviation of the output
y, y_i: Output vectors
h, g, f: Activation functions in ANN
W_i, B_i: Weight and bias vectors in ANN
w, b: Weight and bias vectors in SVM
φ(x): Nonlinear mapping function in SVM
a, a_i: Lagrange multipliers
K(x, x_i): Kernel function
C: Penalty parameter of RBF
g: Width parameter of RBF
N_total: Total sample number
N_MCM: Random sampling times in the Monte Carlo method
M: Number of features
i: Sample index
y_{i,cal}: Predicted HHV of biomass (MJ/kg)
y_{i,exp}: Measured HHV of biomass (MJ/kg)
m_w, m_d: Masses of wet and dry biomass samples (kg)
HHV_w, HHV_d: Higher heating values of wet and dry biomass samples (MJ/kg)