Soil & Tillage Research 197 (2020) 104513
Soil & Tillage Research journal homepage: www.elsevier.com/locate/still
Developing novel hybrid models for estimation of daily soil temperature at various depths
Saeid Mehdizadeh a,*, Farshad Fathian b, Mir Jafar Sadegh Safari c, Ali Khosravi d
a Department of Water Engineering, Urmia University, Urmia, Iran
b Department of Water Science & Engineering, Faculty of Agriculture, Vali-e-Asr University of Rafsanjan, P.O.Box 77188-97111, Rafsanjan, Iran
c Department of Civil Engineering, Yaşar University, Izmir, Turkey
d Department of Mechanical Engineering, School of Engineering, Aalto University, Helsinki, Finland
ARTICLE INFO
Keywords: Estimation; Daily soil temperature; Fractionally autoregressive integrated moving average; Feed-forward back propagation neural networks; Gene expression programming
ABSTRACT
Soil temperature (ST) is one of the vital soil parameters and affects many chemical and physical characteristics of soil; its estimation is therefore of great importance in soil science. This study applies a time series-based model, namely the fractionally autoregressive integrated moving average (FARIMA), as well as two machine learning-based models, feed-forward back propagation neural networks (FFBPNN) and gene expression programming (GEP), for daily ST estimation. To this end, daily ST data from three stations in Iran at four depths (5, 10, 50, and 100 cm) were used for the period 1998 to 2017. The stations were selected from different climates, namely arid (Isfahan station), semi-arid (Urmia station), and very humid (Rasht station), to evaluate the performance of the models and generalize the outcomes across climate classes. The performance of the developed models was evaluated via three statistical metrics: the root mean square error (RMSE), mean absolute error (MAE), and relative RMSE (RRMSE). The results demonstrated that the machine learning-based FFBPNN and GEP models performed better than the time series-based FARIMA approach at all depths, while negligible differences were observed between the accuracies of FFBPNN and GEP. In addition, this study developed novel hybrid models by combining the FFBPNN and GEP techniques with the FARIMA to enhance the accuracy of the traditional FARIMA, FFBPNN, and GEP. The developed hybrid models, named GEP-FARIMA and FFBPNN-FARIMA, achieved better estimates of daily ST at different depths than the classical models.
The daily ST estimates with the highest accuracy were observed at a depth of 50 cm via the GEP-FARIMA at Isfahan station (RMSE = 0.05 °C, MAE = 0.03 °C, RRMSE = 0.25% for the testing phase), the GEP-FARIMA at Urmia station (RMSE = 0.04 °C, MAE = 0.03 °C, RRMSE = 0.26% for the testing phase), and the FFBPNN-FARIMA at Rasht station (RMSE = 0.07 °C, MAE = 0.05 °C, RRMSE = 0.35% for the testing phase).
1. Introduction
Soil temperature (ST) is one of the crucial soil parameters: it controls the equilibrium of heat energy between the atmosphere and the ground surface (Sanikhani et al., 2018), underground physical processes, and the carbon budget (Samadianfard et al., 2018a). The soil thermal regime determines the directions and rates of physical processes in soil such as moisture gradients and thermal fluxes (Araghi et al., 2017). It also affects mass transfer in soil, soil structure and nutrient uptake (Børresen et al., 2007; Li et al., 2008; Wu et al., 2010; Xing et al., 2018), plant growth (Brar et al., 1992; Liu and Huang, 2005), seed germination (Nabi and Muillins, 2008), accumulation of organic matter in soil, soil respiration, organic matter destruction (Seyfried et al., 2001; Schimel et al., 2004; Rube, 2005; Xing et al., 2018), root development, and the emergence and growth of seedlings (Hillel, 1998). Despite the critical need for ST values in various fields of engineering, particularly in agriculture, access to ST data is very limited in many areas (e.g., developing countries). ST must be measured with thermometers installed at different soil depths, which is a time-consuming and costly task (Hu et al., 2016; Feng et al., 2019). The measurement error of the ST thermometers is 0.1 °C. Alternative approaches have therefore attracted much attention recently. In this regard, some traditional approaches including the soil heat flow, energy balance, empirical correlations with
⁎ Corresponding author.
E-mail addresses: [email protected] (S. Mehdizadeh), [email protected] (F. Fathian), [email protected] (M.J.S. Safari), ali.khosravi@aalto.fi (A. Khosravi).
https://doi.org/10.1016/j.still.2019.104513
Received 8 June 2019; Received in revised form 10 October 2019; Accepted 16 November 2019
0167-1987/ © 2019 Elsevier B.V. All rights reserved.
S. Mehdizadeh, et al.
easily obtainable parameters (Kang et al., 2000) as well as numerical, analytical and experimental models (Behmanesh and Mehdizadeh, 2017) can be utilized to estimate ST. However, applying these approaches is complicated and time-consuming. Alternatively, data-driven methods, including machine learning-based and time series-based models, can be applied; of the two, machine learning models are more commonly used for estimating ST. Examples of studies that applied machine learning techniques to ST estimation are briefly presented below.

Bilgili (2011) implemented an artificial neural network (ANN) approach for estimating monthly ST at various depths at Adana, Turkey, and concluded that the ANN is an appropriate tool for the estimation of monthly ST. Wu et al. (2013) predicted the mean monthly ST of 83 regions located in southwestern China by applying the ANN and reported that it is a promising technique for estimating the mean monthly ST time series of the studied areas. Kim and Singh (2014) modeled the daily ST of Champaign and Springfield stations, USA, utilizing an adaptive neuro-fuzzy inference system (ANFIS) and a multi-layer perceptron (MLP). The MLP gave better estimates of ST than the ANFIS. Kisi et al. (2015) studied the applicability of different types of ANN, consisting of radial basis neural networks (RBNN), generalized regression neural networks (GRNN), and MLP, for modeling the monthly ST values at different depths for Mersin, Turkey. The performance of these techniques was also compared with simple multiple linear regression (MLR). They reported that RBNN at surface soil layers (i.e., 5 and 10 cm), as well as GRNN and MLR at deep layers (i.e., 50 and 100 cm), performed better than the other models. Tabari et al. (2015) employed the ANN to forecast the ST time series of two stations in Iran with different climates (i.e., Zahedan and Sari) and reported a reliable capability of the ANN technique to estimate ST at various depths in the studied areas. Mehdizadeh et al. (2017a) reported a better accuracy of the ANFIS in comparison with the gene expression programming (GEP) and ANN models when estimating the mean monthly ST of 31 stations in Iran. Samadianfard et al. (2018b) coupled wavelet analysis with the ANN and GEP models for prediction of the daily ST of Tabriz station, northwest of Iran. The WANN models developed at different soil depths presented superior results compared with the WGEP. Mehdizadeh et al. (2018a) employed two machine learning models, namely support vector machine (SVM) and multivariate adaptive regression splines (MARS), to estimate the mean monthly ST of 30 stations in Iran. The MARS was found to achieve better efficiency than the SVM at different soil depths. Kazemi et al. (2018) trained a genetic-based neural network ensemble by applying a sequential genetic-based negative correlation learning algorithm for estimating the daily ST of two stations in Iran. They found that the developed model performed the best. Sihag et al. (2019) applied MLP, random forest (RF), Gaussian process (GP), and M5P models for modeling the daily ST, and the MLP was found to perform better than the other methods at two studied stations in Iran. Feng et al. (2019) developed back propagation neural networks (BPNN), extreme learning machine (ELM), RF, and GRNN approaches for half-hourly ST estimation at various depths in a rainfed maize field situated in the north of China and concluded that the ELM was the best technique.

The literature review reveals that time series models have received less attention than machine learning models in estimating ST. In addition, hybrid models that combine machine learning and time series models have not been reported in previous works for the estimation of ST.

The objective of the current study is to estimate the daily time series of ST for various depths (i.e., 5, 10, 50, and 100 cm) at three stations with various climate classes in Iran. To this end, a time series-based model, namely the fractionally autoregressive integrated moving average (FARIMA), and two machine learning-based models, the GEP and the feed-forward BPNN (FFBPNN), were applied. Afterwards, hybrid models were developed by combining the GEP and FFBPNN approaches with the FARIMA to improve the accuracy of the classical models. Accordingly, the hybrid models named GEP-FARIMA and FFBPNN-FARIMA are proposed. It is worth mentioning that the performance of the FARIMA, as well as that of the proposed hybrid models, is evaluated for the first time in this study for daily ST estimation at various soil depths.
2. Materials and methods
2.1. Study sites and dataset description
In the present work, ST datasets of three stations located in Iran at four soil depths (i.e., 5, 10, 50, and 100 cm) were used at a daily time scale. The daily ST at Iranian stations is measured at three times (i.e., 03:00, 09:00, and 15:00), and the average of these three readings is used as the mean daily ST. The studied stations were selected from different climates to investigate the performance of the classical and proposed hybrid models for ST estimation in various climatic zones. The considered stations are Isfahan (arid), Urmia (semi-arid), and Rasht (very humid); their climates were classified using the aridity index suggested by de Martonne (1925). Geographic locations of the studied areas are depicted in Fig. 1, and the geographic attributes of the studied stations are given in Table 1. The daily ST data of the stations at 5, 10, 50, and 100 cm soil depths were collected from the Iran Meteorological Organization (IMO) for the period from 1 January 1998 to 31 December 2017. It should be noted that the ST is measured manually by an IMO operator using mercury thermometers installed at different soil depths. For the Urmia and Isfahan stations, there were no missing ST data at the depths of 5 and 10 cm; however, 25 (Urmia) and 11 (Isfahan) values were missing at 50 cm, and 50 (Urmia) and 2 (Isfahan) at 100 cm. For the Rasht station, there were 11, 11, 29, and 19 missing ST data at the depths of 5, 10, 50, and 100 cm, respectively. The number of missing data is negligible relative to the entire daily dataset of the 20-year period. The missing ST data were filled by the average values.

The entire ST data were split into training and testing phases so that the first fifteen years (from 1998 to 2012; 75% of the entire data) and the remaining five years (between 2013 and 2017; 25% of the entire data) were respectively used as the training and testing datasets. Time series graphs of the daily ST at four different depths are illustrated in Fig. 2 for the period 1998 to 2017. Table 2 tabulates the statistical parameters of the daily ST data at different soil depths for the training and testing phases separately. The statistical parameters of the ST data at different depths for the studied stations are generally similar for the training and testing periods. With increasing soil depth, xmin of the ST datasets increases and xmax decreases. Moreover, the ST datasets show a low skewness. Comparing the values of xsd and xcv at various soil depths shows that surface soil layers (5 and 10 cm) present higher standard deviations and variation coefficients than the bottom layers (50 and 100 cm). The entire ST datasets were standardized to avoid the influence of time scale in the process of ST estimation.

2.2. Fractionally autoregressive integrated moving average
The fractionally ARIMA (i.e., FARIMA) model is a general type of the linear autoregressive integrated moving average (ARIMA) time series model that uses non-integer values for the differencing parameter of the ARIMA instead of integer values. This model is applicable to time series having long-term memory. When a time series has a long-term memory, its sample autocorrelation values decay to zero at a slower rate than those of the AR(1) time series
model (Mehdizadeh et al., 2019). The general equation of a FARIMA(p,d,q) model can be expressed as follows (Yang and Bowling, 2014):

$$\phi_p(B)\,(1 - B)^d\, y_t = \psi_q(B)\,\varepsilon_t \qquad (1)$$

where $y_t$ is the observational time series in the standardized form; $\phi_p(B)$ is an autoregressive polynomial of order p; $\psi_q(B)$ is a moving average polynomial of order q; $B$ is the backward operator, i.e., $B x_t = x_{t-1}$; d is the fractional difference parameter, and $\varepsilon_t \sim (0, \sigma_\varepsilon^2)$ is an independent identically distributed (i.i.d.) normal error time series. A non-integer value of d represents a long-term memory process, and the fractionally differenced series is obtained from $y_t$ according to the following equations:

$$z_t = (1 - B)^d\, y_t = [\phi_p(B)]^{-1}\,\psi_q(B)\,\varepsilon_t$$

$$(1 - B)^d = 1 - dB + \frac{d(d-1)}{2!}B^2 - \frac{d(d-1)(d-2)}{3!}B^3 + \dots \qquad (2)$$

where $(1 - B)^d$, written as a formal binomial expansion, is the fractional differencing operator and $z_t$ is the fractionally differenced time series. The expansion in Eq. (2) is truncated at a suitably large lag (L), which is the final optimum lag time applied to the $y_t$ time series (Metcalfe and Cowpertwait, 2009). In practice, this lag can be set to L = 30 (the number of expanded terms in Eq. (2)). For example, if d = 0.45 then $z_t = y_t - 0.45y_{t-1} - 0.12375y_{t-2} - \dots - 0.00203y_{t-30}$. In general, three basic steps are required to implement a linear FARIMA: model identification, model estimation, and model diagnostic checking. Readers can refer to Hipel and McLeod (1996) for more details on the modeling procedure of the FARIMA model.

Fig. 1. The geographic locations of the studied stations in Iran.

Table 1
The geographic information of the studied stations in Iran.
Station   Longitude (E)   Latitude (N)   Altitude (m)   Climate
Isfahan   51° 40′         32° 37′        1550.4         Arid
Urmia     45° 03′         37° 40′        1328.0         Semi-arid
Rasht     49° 37′         37° 19′        −8.6           Very humid
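The truncated binomial expansion of Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `frac_diff_weights` and `frac_diff` are hypothetical, and the only inputs taken from the text are the lag limit L = 30 and the example value d = 0.45. Each coefficient follows from the previous one through the recursion $w_k = w_{k-1}(k - 1 - d)/k$ with $w_0 = 1$, which reproduces the terms $-d$, $d(d-1)/2!$, $-d(d-1)(d-2)/3!$, and so on.

```python
def frac_diff_weights(d, n_lags=30):
    """Coefficients of B^0 .. B^n_lags in the binomial expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, n_lags + 1):
        # w_k = w_{k-1} * (k - 1 - d) / k gives -d, d(d-1)/2!, -d(d-1)(d-2)/3!, ...
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d, n_lags=30):
    """Fractionally differenced series z_t = sum_k w_k * y_{t-k}, for t >= n_lags."""
    w = frac_diff_weights(d, n_lags)
    return [
        sum(w[k] * series[t - k] for k in range(n_lags + 1))
        for t in range(n_lags, len(series))
    ]


if __name__ == "__main__":
    w = frac_diff_weights(0.45)
    # First, second, and thirtieth lag coefficients for d = 0.45
    print(round(w[1], 5), round(w[2], 5), round(w[30], 5))
```

With d = 0.45 the first two lag coefficients are −0.45 and −0.12375, as in the worked example in the text. The first `n_lags` values of the series are consumed as history, so the output is shorter than the input by `n_lags` elements.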
Fig. 2. Time series graphs of the measured daily ST data at various depths for the studied stations.

2.3. Gene expression programming
Gene expression programming (GEP), suggested by Ferreira (2001), is an advanced form of the genetic algorithm (GA), an optimization algorithm (Holland, 1975; Goldberg, 1989), and of genetic programming (GP), a biologically-inspired technique (Koza, 1992). The first step in GEP is generating an initial population. Next, expression trees are created to solve the defined problem. Indeed, expression trees comprise chromosomes and each chromosome consists of genes. The results obtained are then evaluated via a fitness function to assess the suitability of the GEP's solution. If the best solution is found by the GEP, the program stops. Otherwise, the best solution of the current generation is retained and the evolution process is repeated to find the best solution. The GeneXproTools program was utilized in this study for ST estimation. Implementing the GEP program to solve any specific problem includes the following steps:
1. A fitness function is determined to assess the solutions. Root mean square error (RMSE) was applied in this study.
2. The terminal set and function set are defined for the model. The terminal set comprises the input and output parameters. Additionally, (+, −, ×, ÷, ln x, e^x, x², x³, √x, ∛x, sin x, cos x, arctan x) were used as the function set.
3. The chromosomal structure is determined, consisting of the number of chromosomes (i.e., 30), the number of genes per chromosome (i.e., 3), and the head size (i.e., 7).
4. A linking function is defined to connect the sub-expression trees; an addition function was employed in this work.
5. Finally, the rates of genetic operators are introduced to the GEP. The rates of the genetic operators used when implementing the GEP are given below: mutation = 0.044; inversion = 0.1; IS transposition = 0.1; RIS transposition = 0.1; one-point recombination = 0.3; two-point recombination = 0.3; gene recombination = 0.1; gene transposition = 0.1.
More details about the GEP can be found in Mehdizadeh et al. (2017b).

2.4. Feed-forward back propagation neural network
The feed-forward back propagation neural network (FFBPNN) is known as the most popular artificial neural network (ANN) technique utilized in many engineering applications. It includes three layers, namely the input, hidden, and output layers, and provides a general framework for finding a nonlinear functional mapping between the input and output parameters. Inputs are combined linearly and then passed through a non-linear activation function. The nodes of the input layer supply the input signals to the second (hidden) layer, whose outputs are in turn used as inputs to the third (output) layer of the network. Computation nodes, or hidden neurons, in the hidden layers intervene between the input and output of the model. At the output layer of the FFBPNN, the Levenberg-Marquardt (LM) optimization technique is used for the computation of output signals from input information. A larger number of hidden layers allows the FFBPNN to capture higher-order statistics; however, one hidden layer is reported to be adequate for engineering problems (Hornik et al., 1989; Safari et al., 2016). The output of an FFBPNN model can be expressed as (Kim and Valdés, 2003; Nourani, 2017):

$$\hat{y}_k = f_0\left[\sum_{j=1}^{M} w_{kj}\, f_h\left(\sum_{i=1}^{N} w_{ji}\, x_i + b_{j0}\right) + b_{k0}\right] \qquad (3)$$

where $f_0$ and $f_h$ indicate the activation functions of the output and hidden neurons, respectively; $w_{kj}$ is the output layer weight corresponding to the kth and jth neurons in the output and hidden layers, respectively; $w_{ji}$ is the hidden layer weight corresponding to the ith and jth neurons in the input and hidden layers, respectively; $b_{j0}$ is the bias of the jth hidden neuron and $b_{k0}$ is the bias of the kth output neuron; $x_i$ is the ith input, and N and M are the numbers of input and hidden neurons, respectively. To calibrate the FFBPNN structure, it is essential to adjust the number of neurons in the hidden layer to obtain the best performance of the model. Accordingly, the number of neurons in the hidden layer was fixed through a trial and error method. The performances of the developed FFBPNN models were evaluated for all cases by considering different numbers of neurons in the hidden layer within the range of 1–15. The FFBPNN codes were written in the MATLAB programming language.
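The forward computation of a one-hidden-layer FFBPNN, as written in Eq. (3) of Section 2.4, can be sketched as follows. This is an illustrative sketch only: the function name `ffbpnn_output` is hypothetical, the hyperbolic tangent hidden activation and identity output activation are assumptions (the text does not state which activation functions were used), and the weights in the usage lines are made-up numbers rather than fitted values. Training with the Levenberg-Marquardt algorithm is not shown.

```python
import math


def ffbpnn_output(x, w_hidden, b_hidden, w_out, b_out,
                  f_h=math.tanh, f_0=lambda s: s):
    """Forward pass of a network with one hidden layer.

    x        : list of N inputs x_i
    w_hidden : M rows of N hidden-layer weights w_ji
    b_hidden : M hidden biases b_j0
    w_out    : K rows of M output-layer weights w_kj
    b_out    : K output biases b_k0
    f_h, f_0 : hidden and output activations (tanh and identity are assumptions)
    """
    # Inner sum of Eq. (3): weighted inputs plus bias, passed through f_h
    hidden = [
        f_h(sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j0)
        for row, b_j0 in zip(w_hidden, b_hidden)
    ]
    # Outer sum of Eq. (3): weighted hidden outputs plus bias, passed through f_0
    return [
        f_0(sum(w_kj * h_j for w_kj, h_j in zip(row, hidden)) + b_k0)
        for row, b_k0 in zip(w_out, b_out)
    ]


if __name__ == "__main__":
    # One input (e.g., a standardized one-day lagged ST value), two hidden neurons
    estimate = ffbpnn_output(
        x=[0.5],
        w_hidden=[[1.2], [-0.7]],   # illustrative weights, not from the paper
        b_hidden=[0.1, 0.0],
        w_out=[[0.8, -0.3]],
        b_out=[0.05],
    )
    print(estimate)
```

Keeping the activations as arguments makes it easy to try other choices; the trial and error over 1–15 hidden neurons described in the text would correspond to varying the number of rows in `w_hidden`.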
2.5. Classical and hybrid models development
Time series-based models (e.g., the FARIMA used in this study) work on standardized observation values. They yield mathematical equations that are used for estimating the measured dataset. Time series-based models are also known as stochastic models because they can estimate the stochastic component of the measured data. On the other hand, machine learning-based techniques (e.g., the GEP and FFBPNN applied in the present study) require a series of input data for estimating the target variable. Indeed, these methods have the ability to simulate the deterministic part of the measured data. Here, historical records of ST data (i.e., one-day lagged data) in the standardized form were used as inputs to construct the GEP and FFBPNN models. In addition, the following relationship was used for developing the hybrid models (i.e., GEP-FARIMA and FFBPNN-FARIMA):

$$ST_h = ST_s + ST_d \qquad (4)$$
Table 2
The daily statistical parameters of the measured ST data at various depths for the considered stations.
Station   Dataset    Soil depth (cm)   xmin (°C)   xmax (°C)   xmean (°C)   xsd (°C)   xs      xcv
Isfahan   Training   5                 −3.33       45.53       21.08        11.56      −0.04   0.55
Isfahan   Training   10                −2.73       39.60       20.75        11.34      −0.10   0.55
Isfahan   Training   50                4.73        34.53       20.36        8.36       −0.09   0.41
Isfahan   Training   100               7.47        31.97       20.58        6.27       −0.07   0.30
Isfahan   Testing    5                 −3.00       42.13       21.74        11.76      −0.05   0.54
Isfahan   Testing    10                −1.20       39.80       20.91        10.92      −0.09   0.52
Isfahan   Testing    50                6.40        33.53       20.73        7.95       −0.06   0.38
Isfahan   Testing    100               10.40       30.60       21.01        6.06       −0.04   0.29
Urmia     Training   5                 −7.73       39.00       15.24        11.40      0.03    0.75
Urmia     Training   10                −7.47       35.47       14.70        10.66      0.01    0.73
Urmia     Training   50                −1.30       28.70       14.65        8.45       0.01    0.58
Urmia     Training   100               3.20        26.70       14.71        6.60       0.08    0.45
Urmia     Testing    5                 −4.87       38.30       15.23        11.63      0.02    0.76
Urmia     Testing    10                −4.80       35.10       14.52        10.86      −0.01   0.75
Urmia     Testing    50                0.17        27.93       14.27        8.53       −0.01   0.60
Urmia     Testing    100               4.07        27.00       14.03        6.38       0.03    0.45
Rasht     Training   5                 0.00        37.80       18.23        8.02       0.01    0.44
Rasht     Training   10                0.67        34.67       18.05        7.65       −0.04   0.42
Rasht     Training   50                3.47        31.40       17.98        6.27       −0.07   0.35
Rasht     Training   100               5.17        27.73       17.72        5.08       −0.04   0.29
Rasht     Testing    5                 0.33        40.20       19.79        9.79       0.22    0.49
Rasht     Testing    10                0.73        37.40       19.30        9.19       0.16    0.48
Rasht     Testing    50                4.07        32.20       19.00        6.91       0.09    0.36
Rasht     Testing    100               7.13        29.87       18.84        5.63       0.07    0.30
xmin, xmax, xmean, xsd, xs, and xcv denote the minimum, maximum, mean, standard deviation, skewness coefficient, and coefficient of variation of the measured ST data, respectively.
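The descriptive statistics reported in Table 2 can be computed with a short Python sketch such as the one below. The function name `describe` is hypothetical, and two choices are assumptions because the text does not specify them: the standard deviation uses the sample (n − 1) form, and the skewness coefficient is the moment-based ratio m3 / m2^1.5. The coefficient of variation is taken as xsd / xmean, which is consistent with the tabulated values (for example, 11.56 / 21.08 ≈ 0.55 for Isfahan at 5 cm in the training period).

```python
import math


def describe(values):
    """Minimum, maximum, mean, standard deviation, skewness, and CV of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Second and third central moments (population form)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    # Sample standard deviation with n - 1 in the denominator (assumption)
    sd = math.sqrt(m2 * n / (n - 1))
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": sd,
        "skewness": m3 / m2 ** 1.5,   # moment-based skewness (assumption)
        "cv": sd / mean,
    }


if __name__ == "__main__":
    # Illustrative values only, not measured ST data
    print(describe([12.1, 14.3, 15.0, 17.8, 21.4, 19.6, 16.2]))
```

For a 20-year daily record the difference between the n and n − 1 forms of the standard deviation is negligible, so either convention would reproduce the table to the reported precision.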
In Eq. (4), $ST_h$ denotes the ST estimated by the hybrid models; $ST_s$ is the stochastic component of the ST data estimated by the FARIMA, and $ST_d$ is the deterministic component of the ST data estimated by the GEP and FFBPNN models. In other words, the classical models are only capable of estimating either a stochastic or a deterministic component, while the hybrid models proposed in this study consider both components in estimating the ST.

3. Results
The entire daily ST datasets of the studied stations at different soil depths were standardized using the following equation:

$$ST_{st} = \frac{ST_m - \overline{ST}_m}{\sigma_{ST_m}} \qquad (5)$$

where $ST_{st}$, $ST_m$, $\overline{ST}_m$, and $\sigma_{ST_m}$ denote the standardized ST data, the measured ST data, the mean of the measured ST data, and the standard deviation of the measured ST data, respectively. The standardized ST data were then used to develop the classical FARIMA, GEP, and FFBPNN models, as well as the hybrid GEP-FARIMA and FFBPNN-FARIMA models. The accuracies of the classical and proposed hybrid models were compared for the estimation of daily ST at surface (5 and 10 cm) and deep soil layers (50 and 100 cm) with respect to three statistical measures, namely the root mean square error (RMSE), mean absolute error (MAE), and relative root mean square error (RRMSE):

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N}\left(ST_{m,i} - ST_{e,i}\right)^2}{N}} \qquad (6)$$

$$MAE = \frac{\sum_{i=1}^{N}\left|ST_{m,i} - ST_{e,i}\right|}{N} \qquad (7)$$

$$RRMSE = \frac{RMSE}{\overline{ST}_m} \times 100\% \qquad (8)$$

where $ST_{m,i}$ and $ST_{e,i}$ respectively indicate the measured and estimated soil temperature for the ith day, and $\overline{ST}_m$ and N, respectively, denote the mean of the observed soil temperature data and the total number of observations for each of the training and testing datasets. The RRMSE criterion can be used to describe the accuracy of models as: excellent (RRMSE < 10%); good (10% < RRMSE < 20%); fair (20% < RRMSE < 30%), and poor (RRMSE > 30%) (Mehdizadeh et al., 2017c).

At first, the optimal time series-based FARIMA models were fitted to the standardized ST data of the selected stations at various soil depths. The relationships obtained for the optimal FARIMA models are given in Tables 3a–3c. Moreover, the error metrics of RMSE, MAE, and RRMSE for these models at different depths during the training and testing phases are tabulated in Table 4. Based on the RRMSE values, the performance of the FARIMA models developed at various depths of the Isfahan station is excellent. For the Urmia and Rasht stations, the accuracy of the FARIMA models at surface layers (i.e., 5 and 10 cm) belongs to the good class, while it is classified as excellent at deep layers (i.e., 50 and 100 cm). Moreover, the accuracy of the FARIMA models improves with increasing soil depth, so that the 5 and 100 cm depths, respectively, represent the worst and best performances in the studied regions. The error statistics obtained at a depth of 100 cm are RMSE = 0.69 °C, MAE = 0.50 °C, RRMSE = 3.34% for the training phase and RMSE = 0.68 °C, MAE = 0.53 °C, RRMSE = 3.22% for the testing phase at the Isfahan station; RMSE = 1.45 °C, MAE = 1.18 °C, RRMSE = 9.86% for the training phase and RMSE = 0.82 °C, MAE = 0.68 °C, RRMSE = 5.85% for the testing phase at the Urmia station; and RMSE = 1.12 °C, MAE = 0.88 °C, RRMSE = 6.33% for the training phase and RMSE = 1.43 °C, MAE = 1.12 °C, RRMSE = 7.58% for the testing phase at the Rasht station.

As already mentioned, for the machine learning-based GEP and FFBPNN models, the ST of each day at the studied stations was estimated utilizing the one-day lagged ST data. For the GEP, 1000 generations was used as the stopping criterion. The results showed that the accuracy of the GEP models developed at various depths did not generally change beyond 600–700 generations. Besides, the number of neurons in the hidden layer was varied from 1 to 15 by trial and error to achieve the best performance of the FFBPNN. These trials revealed only slight differences between the accuracies of the FFBPNN models with 1–15 neurons in the hidden layer. Table 5 represents the optimal structures of the FFBPNN
Table 3a
Equations of FARIMA models fitted to the daily ST data at various depths (Isfahan station).
Soil depth (cm)   Model
5                 FARIMA(1,0.254,1)
10                FARIMA(1,0.235,1)
50                FARIMA(1,0.465,3)
100               FARIMA(2,0.541,4)
Equations (in order of soil depth: 5, 10, 50, and 100 cm):
z t = STt − 0.25408*STt − 1 − 0.09476*STt − 2 − 0.05514*STt − 3 − 0.03785*STt − 4 − 0.02836*STt − 5 − 0.02243*STt − 6 − 0.01841*STt − 7 − 0.01552*STt − 8 − 0.01364*STt − 9 − 0.01168*STt − 10 − 0.01035*STt − 11 − 0.00927*STt − 12 − 0.00837*STt − 13 − 0.00762*STt − 14 − 0.00699*STt − 15 − 0.00644*STt − 16 − 0.00967*STt − 17 − 0.00555*STt − 18 − 0.00518*STt − 19 − 0.00486*STt − 20 − 0.00457*STt − 21 − 0.00430*STt − 22 − 0.00407*STt − 23 − 0.00386*STt − 24 − 0.00366*STt − 25 − 0.00349*STt − 26 − 0.00332*STt − 27 − 0.00332*STt − 28 − 0.00304*STt − 29 − 0.00291*STt − 30 z t = 0.6617*z t − 1 − 0.2826*εt − 1 z t = STt − 0.23585*STt − 1 − 0.09011*STt − 2 − 0.05299*STt − 3 − 0.03661*STt − 4 − 0.02756*STt − 5 − 0.02188*STt − 6 − 0.01802*STt − 7 − 0.01524*STt − 8 − 0.01314*STt − 9 − 0.01152*STt − 10 − 0.01022*STt − 11 − 0.00917*STt − 12 − 0.00830*STt − 13 − 0.00756*STt − 14 − 0.00694*STt − 15 − 0.00640*STt − 16 − 0.00594*STt − 17 − 0.00553*STt − 18 − 0.00517*STt − 19 − 0.00485*STt − 20 − 0.00456*STt − 21 − 0.00431*STt − 22 − 0.00408*STt − 23 − 0.00387*STt − 24 − 0.00367*STt − 25 − 0.00350*STt − 26 − 0.00334*STt − 27 − 0.00319*STt − 28 − 0.00306*STt − 29 − 0.00293*STt − 30 z t = 0.6859*z t − 1 − 0.1991*εt − 1 z t = STt − 0.46525*STt − 1 − 0.12439*STt − 2 − 0.06363*STt − 3 − 0.04032*STt − 4 − 0.02850*STt − 5 −0.02154*STt − 6 − 0.01703*STt − 7 − 0.01391*STt − 8 − 0.01165*STt − 9 − 0.00994*STt − 10 −0.00861*STt − 11 − 0.00756*STt − 12 − 0.00671*STt − 13 − 0.00601*STt − 14 − 0.00542*STt − 15 −0.00492*STt − 16 − 0.00450*STt − 17 − 0.00413*STt − 18 − 0.00381*STt − 19 − 0.00353*STt − 20 −0.00329*STt − 21 − 0.00307*STt − 22 − 0.00287*STt − 23 − 0.00270*STt − 24 − 0.00254*STt − 25 −0.00239*STt − 26 − 0.00226*STt − 27 − 0.00214*STt − 28 − 0.00204*STt − 29 − 0.00194*STt − 30 z t = 0.7305*z t − 1 − 0.3729*εt − 1 + 0.0807*εt − 2 + 0.0515*εt − 3 z t = STt − 0.54105*STt − 1 − 0.12415*STt − 2 − 0.06037*STt − 3 − 0.03711*STt − 4 − 0.02567*STt − 5 −0.01908*STt − 6 − 0.01488*STt − 7 − 0.01201*STt 
− 8 − 0.00995*STt − 9 − 0.00842*STt − 10 −0.00724*STt − 11 − 0.00631*STt − 12 − 0.00556*STt − 13 − 0.00495*STt − 14 − 0.00444*STt − 15 −0.00401*STt − 16 − 0.00365*STt − 17 − 0.00333*STt − 18 − 0.00306*STt − 19 − 0.00283*STt − 20 −0.00262*STt − 21 − 0.00243*STt − 22 − 0.00227*STt − 23 − 0.00213*STt − 24 − 0.00199*STt − 25 −0.00188*STt − 26 − 0.00177*STt − 27 − 0.00167*STt − 28 − 0.00158*STt − 29 − 0.00150*STt − 30 z t = 0.0169*z t − 1 + 0.7256*z t − 2 − 0.1660*εt − 1 − 0.6551*εt − 2 + 0.1867*εt − 3 + 0.0862*εt − 4
models developed at different depths of the considered regions. The statistical errors computed for the classical GEP and FFBPNN models at different soil depths are presented in Tables 6–8 for the Isfahan, Urmia, and Rasht stations, respectively. The performances of the machine learning-based GEP and FFBPNN models at all depths of the stations are in the excellent class based on the RRMSE index. Similar to the outcomes observed for the FARIMA models, the GEP and FFBPNN illustrate the best performance at 100 cm soil depth. The performance of these models gets worse with decreasing depth. The most important feature of the GEP technique is the provision of algebraic relationships between the input and output parameters, which makes it superior to other machine learning models. The algebraic equations obtained for the different soil depths at the studied areas are given in Table 9. These equations can be applied to estimate soil
Table 3b
Equations of FARIMA models fitted to the daily ST data at various depths (Urmia station).
Soil depth (cm)   Model
5                 FARIMA(1,0.230,1)
10                FARIMA(1,0.320,1)
50                FARIMA(4,0.753,1)
100               FARIMA(5,0.863,2)
Equations (in order of soil depth: 5, 10, 50, and 100 cm):
z_t = ST_t − 0.23088*ST_{t-1} − 0.08878*ST_{t-2} − 0.05235*ST_{t-3} − 0.03624*ST_{t-4} − 0.02732*ST_{t-5} − 0.02171*ST_{t-6} − 0.01789*ST_{t-7} − 0.01514*ST_{t-8} − 0.01307*ST_{t-9} − 0.01146*ST_{t-10} − 0.01018*ST_{t-11} − 0.00913*ST_{t-12} − 0.00827*ST_{t-13} − 0.00754*ST_{t-14} − 0.00692*ST_{t-15} − 0.00639*ST_{t-16} − 0.00593*ST_{t-17} − 0.00552*ST_{t-18} − 0.00516*ST_{t-19} − 0.00484*ST_{t-20} − 0.00456*ST_{t-21} − 0.00430*ST_{t-22} − 0.00407*ST_{t-23} − 0.00386*ST_{t-24} − 0.00367*ST_{t-25} − 0.00350*ST_{t-26} − 0.00334*ST_{t-27} − 0.00319*ST_{t-28} − 0.00306*ST_{t-29} − 0.00293*ST_{t-30} = 0.6468*z_{t-1} − 0.1381*ε_{t-1}

z_t = ST_t − 0.32064*ST_{t-1} − 0.10819*ST_{t-2} − 0.06096*ST_{t-3} − 0.04083*ST_{t-4} − 0.03005*ST_{t-5} − 0.02343*ST_{t-6} − 0.01901*ST_{t-7} − 0.01587*ST_{t-8} − 0.01354*ST_{t-9} − 0.01175*ST_{t-10} − 0.01034*ST_{t-11} − 0.00920*ST_{t-12} − 0.00827*ST_{t-13} − 0.00749*ST_{t-14} − 0.00683*ST_{t-15} − 0.00626*ST_{t-16} − 0.00578*ST_{t-17} − 0.00535*ST_{t-18} − 0.00498*ST_{t-19} − 0.00465*ST_{t-20} − 0.00436*ST_{t-21} − 0.00410*ST_{t-22} − 0.00386*ST_{t-23} − 0.00365*ST_{t-24} − 0.00345*ST_{t-25} − 0.00328*ST_{t-26} − 0.00312*ST_{t-27} − 0.00297*ST_{t-28} − 0.00284*ST_{t-29} − 0.00271*ST_{t-30} = 0.6078*z_{t-1} − 0.1092*ε_{t-1}

z_t = ST_t − 0.75337*ST_{t-1} − 0.09290*ST_{t-2} − 0.03860*ST_{t-3} − 0.02168*ST_{t-4} − 0.01407*ST_{t-5} − 0.00996*ST_{t-6} − 0.00746*ST_{t-7} − 0.00583*ST_{t-8} − 0.00469*ST_{t-9} − 0.00387*ST_{t-10} − 0.00325*ST_{t-11} − 0.00277*ST_{t-12} − 0.00240*ST_{t-13} − 0.00210*ST_{t-14} − 0.00185*ST_{t-15} − 0.00165*ST_{t-16} − 0.00148*ST_{t-17} − 0.00133*ST_{t-18} − 0.00121*ST_{t-19} − 0.00110*ST_{t-20} − 0.00101*ST_{t-21} − 0.00093*ST_{t-22} − 0.00086*ST_{t-23} − 0.00080*ST_{t-24} − 0.00074*ST_{t-25} − 0.00069*ST_{t-26} − 0.00064*ST_{t-27} − 0.00060*ST_{t-28} − 0.00057*ST_{t-29} − 0.00053*ST_{t-30} = 1.1807*z_{t-1} − 0.0835*z_{t-2} − 0.0943*z_{t-3} − 0.0352*z_{t-4} − 0.9669*ε_{t-1}

z_t = ST_t − 0.86328*ST_{t-1} − 0.05901*ST_{t-2} − 0.02236*ST_{t-3} − 0.01194*ST_{t-4} − 0.00749*ST_{t-5} − 0.00516*ST_{t-6} − 0.00379*ST_{t-7} − 0.00290*ST_{t-8} − 0.00230*ST_{t-9} − 0.00187*ST_{t-10} − 0.00155*ST_{t-11} − 0.00131*ST_{t-12} − 0.00112*ST_{t-13} − 0.00097*ST_{t-14} − 0.00085*ST_{t-15} − 0.00075*ST_{t-16} − 0.00067*ST_{t-17} − 0.00060*ST_{t-18} − 0.00054*ST_{t-19} − 0.00049*ST_{t-20} − 0.00045*ST_{t-21} − 0.00041*ST_{t-22} − 0.00037*ST_{t-23} − 0.00034*ST_{t-24} − 0.00032*ST_{t-25} − 0.00030*ST_{t-26} − 0.00027*ST_{t-27} − 0.00026*ST_{t-28} − 0.00024*ST_{t-29} − 0.00022*ST_{t-30} = 1.4746*z_{t-1} − 0.3125*z_{t-2} − 0.0805*z_{t-3} − 0.0946*z_{t-4} + 0.0013*z_{t-5} − 1.6880*ε_{t-1} + 0.6992*ε_{t-2}
Table 3c Equations of FARIMA models fitted to the daily ST data at various depths (Rasht station).

Soil depth (cm): 5; Model: FARIMA(3,0.581,2)
z_t = ST_t − 0.58184*ST_{t-1} − 0.12165*ST_{t-2} − 0.05750*ST_{t-3} − 0.03476*ST_{t-4} − 0.02376*ST_{t-5} − 0.01750*ST_{t-6} − 0.01354*ST_{t-7} − 0.01086*ST_{t-8} − 0.00895*ST_{t-9} − 0.00754*ST_{t-10} − 0.00645*ST_{t-11} − 0.00560*ST_{t-12} − 0.00492*ST_{t-13} − 0.00436*ST_{t-14} − 0.00390*ST_{t-15} − 0.00352*ST_{t-16} − 0.00319*ST_{t-17} − 0.00291*ST_{t-18} − 0.00266*ST_{t-19} − 0.00245*ST_{t-20} − 0.00227*ST_{t-21} − 0.00210*ST_{t-22} − 0.00196*ST_{t-23} − 0.00183*ST_{t-24} − 0.00171*ST_{t-25} − 0.00161*ST_{t-26} − 0.00151*ST_{t-27} − 0.00143*ST_{t-28} − 0.00135*ST_{t-29} − 0.00128*ST_{t-30}
z_t = 0.4524*z_{t-1} + 0.7598*z_{t-2} − 0.2637*z_{t-3} − 0.1678*ε_{t-1} − 0.8001*ε_{t-2}

Soil depth (cm): 10; Model: FARIMA(5,0.465,2)
z_t = ST_t − 0.46539*ST_{t-1} − 0.12440*ST_{t-2} − 0.06363*ST_{t-3} − 0.04032*ST_{t-4} − 0.02850*ST_{t-5} − 0.02154*ST_{t-6} − 0.01703*ST_{t-7} − 0.01391*ST_{t-8} − 0.01164*ST_{t-9} − 0.00994*ST_{t-10} − 0.00861*ST_{t-11} − 0.00756*ST_{t-12} − 0.00671*ST_{t-13} − 0.00600*ST_{t-14} − 0.00542*ST_{t-15} − 0.00492*ST_{t-16} − 0.00450*ST_{t-17} − 0.00413*ST_{t-18} − 0.00381*ST_{t-19} − 0.00353*ST_{t-20} − 0.00328*ST_{t-21} − 0.00307*ST_{t-22} − 0.00287*ST_{t-23} − 0.00269*ST_{t-24} − 0.00254*ST_{t-25} − 0.00239*ST_{t-26} − 0.00226*ST_{t-27} − 0.00214*ST_{t-28} − 0.00204*ST_{t-29} − 0.00194*ST_{t-30}
z_t = −0.4340*z_{t-1} − 0.4052*z_{t-2} + 0.4701*z_{t-3} − 0.0372*z_{t-4} − 0.0396*z_{t-5} + 0.9761*ε_{t-1} + 0.9289*ε_{t-2}

Soil depth (cm): 50; Model: FARIMA(4,0.399,1)
z_t = ST_t − 0.39966*ST_{t-1} − 0.11996*ST_{t-2} − 0.06399*ST_{t-3} − 0.04160*ST_{t-4} − 0.02995*ST_{t-5} − 0.02296*ST_{t-6} − 0.01837*ST_{t-7} − 0.01516*ST_{t-8} − 0.01280*ST_{t-9} − 0.01101*ST_{t-10} − 0.00960*ST_{t-11} − 0.00848*ST_{t-12} − 0.00757*ST_{t-13} − 0.00681*ST_{t-14} − 0.00618*ST_{t-15} − 0.00564*ST_{t-16} − 0.00517*ST_{t-17} − 0.00477*ST_{t-18} − 0.00442*ST_{t-19} − 0.00411*ST_{t-20} − 0.00383*ST_{t-21} − 0.00359*ST_{t-22} − 0.00337*ST_{t-23} − 0.00317*ST_{t-24} − 0.00300*ST_{t-25} − 0.00283*ST_{t-26} − 0.00269*ST_{t-27} − 0.00255*ST_{t-28} − 0.00243*ST_{t-29} − 0.00232*ST_{t-30}
z_t = 1.3578*z_{t-1} − 0.3473*z_{t-2} − 0.0238*z_{t-3} − 0.0282*z_{t-4} − 0.8188*ε_{t-1}

Soil depth (cm): 100; Model: FARIMA(1,0.461,1)
z_t = ST_t − 0.46111*ST_{t-1} − 0.12424*ST_{t-2} − 0.06373*ST_{t-3} − 0.04045*ST_{t-4} − 0.02863*ST_{t-5} − 0.02165*ST_{t-6} − 0.01713*ST_{t-7} − 0.01400*ST_{t-8} − 0.01173*ST_{t-9} − 0.01001*ST_{t-10} − 0.00868*ST_{t-11} − 0.00763*ST_{t-12} − 0.00677*ST_{t-13} − 0.00606*ST_{t-14} − 0.00547*ST_{t-15} − 0.00497*ST_{t-16} − 0.00454*ST_{t-17} − 0.00417*ST_{t-18} − 0.00385*ST_{t-19} − 0.00357*ST_{t-20} − 0.00332*ST_{t-21} − 0.00310*ST_{t-22} − 0.00290*ST_{t-23} − 0.00273*ST_{t-24} − 0.00257*ST_{t-25} − 0.00242*ST_{t-26} − 0.00229*ST_{t-27} − 0.00217*ST_{t-28} − 0.00206*ST_{t-29} − 0.00196*ST_{t-30}
z_t = 0.9217*z_{t-1} − 0.7074*ε_{t-1}
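The lagged-ST coefficients in Table 3c appear consistent with a truncated binomial expansion of the fractional differencing operator (1 − B)^d, where B is the backshift operator; for instance, the first coefficient of each equation matches the fitted d. The following is a minimal Python sketch under that assumption and is not the authors' implementation. The function names `frac_diff_weights` and `frac_diff`, and the default truncation at 30 lags, are illustrative choices.

```python
import numpy as np


def frac_diff_weights(d, n_lags=30):
    """Return c_1..c_n such that (1 - B)^d ~ 1 - c_1*B - c_2*B^2 - ... - c_n*B^n.

    Recursion: c_1 = d, c_k = c_{k-1} * (k - 1 - d) / k.
    Written for 0 < d < 1, where every c_k is positive.
    """
    c = np.empty(n_lags)
    c[0] = d
    for k in range(2, n_lags + 1):
        c[k - 1] = c[k - 2] * (k - 1 - d) / k
    return c


def frac_diff(st, d, n_lags=30):
    """Compute z_t = ST_t - sum_k c_k * ST_{t-k} for every t with n_lags past values."""
    st = np.asarray(st, dtype=float)
    c = frac_diff_weights(d, n_lags)
    # st[t - n_lags:t][::-1] is (ST_{t-1}, ST_{t-2}, ..., ST_{t-n_lags})
    return np.array([st[t] - c @ st[t - n_lags:t][::-1]
                     for t in range(n_lags, len(st))])


# Assumed example: d taken from the first coefficient of the 50 cm equation
weights = frac_diff_weights(0.39966)
print(np.round(weights[:3], 4))
```

For d = 0.39966 the first three weights come out at about 0.3997, 0.1200, and 0.0640, close to the leading coefficients listed for the 50 cm depth.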
temperature of each day (i.e., ST_t) at various depths by using the soil temperature of the previous day (i.e., ST_{t-1}). As noted, one of the main goals of this study is to improve the estimation of daily ST at various depths through novel hybrid models. For this purpose, the time series-based FARIMA was combined with the machine learning-based GEP and FFBPNN models. The results obtained for the hybrid models, named GEP-FARIMA and FFBPNN-FARIMA, are reported in Tables 6–8. As seen, the proposed hybrid models outperform the classical FARIMA, GEP, and FFBPNN models. The greatest improvements over the classical models at the considered regions were observed at a depth of 5 cm. For example, during the testing phase at the 5 cm depth, the RMSE, MAE, and RRMSE of the GEP are 1.47 °C, 1.05 °C, and 6.76% (Isfahan station), 1.18 °C, 0.83 °C, and 7.75% (Urmia station), and 1.60 °C, 1.20 °C, and 8.09% (Rasht station); with the proposed GEP-FARIMA hybrid model, these decrease to 0.24 °C, 0.18 °C, and 1.12% (Isfahan station), 0.11 °C, 0.08 °C, and 0.71% (Urmia station), and 0.25 °C, 0.18 °C, and 1.25% (Rasht station). In addition, similar to the classical FARIMA, GEP, and FFBPNN models, the hybrid models proposed in this study offer better results at deep soil layers in comparison
Table 5 The optimal structures of the FFBPNN models developed at various depths for the studied stations.
Station | Soil depth 5 cm | Soil depth 10 cm | Soil depth 50 cm | Soil depth 100 cm
Isfahan | (1,2,1)a | (1,1,1) | (1,1,1) | (1,3,1)
Urmia | (1,3,1) | (1,1,1) | (1,1,1) | (1,2,1)
Rasht | (1,1,1) | (1,1,1) | (1,1,1) | (1,1,1)
a (Input layer neuron, hidden layer neuron, output layer neuron).
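To make the (input, hidden, output) notation of Table 5 concrete, the sketch below counts the trainable parameters of such a structure and evaluates a one-input, one-output feed-forward network. It is illustrative only: the tanh hidden activation, the linear output neuron, the bias terms, and all weight values are assumptions, since this section does not state them.

```python
import numpy as np


def n_parameters(structure):
    """Weights plus biases of a fully connected (input, hidden, output) network."""
    n_in, n_hidden, n_out = structure
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def forward(st_prev, w_hidden, b_hidden, w_out, b_out):
    """Estimate ST_t from ST_{t-1} with one hidden layer (assumed tanh) and a linear output."""
    hidden = np.tanh(w_hidden * st_prev + b_hidden)
    return float(w_out @ hidden + b_out)


# Hypothetical weights for a (1,2,1) structure; not values from the paper
w_hidden = np.array([0.05, -0.03])
b_hidden = np.array([0.10, 0.20])
w_out = np.array([12.0, -8.0])
b_out = 3.0

print(n_parameters((1, 2, 1)))
print(forward(15.0, w_hidden, b_hidden, w_out, b_out))
```

Under these assumptions a (1,1,1) network has 4 trainable parameters, a (1,2,1) network has 7, and a (1,3,1) network has 10, so the structures in Table 5 are all very small models.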
with the surface layers. However, the proposed hybrid models performed best at a depth of 50 cm for all the studied stations. To illustrate the performance of the hybrid models relative to the classical models, the depth of 5 cm was considered, since this depth showed the greatest improvements in ST estimation through the hybrid models. Time series graphs of the measured ST data and the ST data estimated by the proposed GEP-FARIMA hybrid model, as well as by the classical FARIMA and GEP models, are presented in Figs. 3–5 for the Isfahan, Urmia, and Rasht stations,
Table 4 The statistical results of FARIMA models at various depths during the training and testing phases.
Station | Soil depth (cm) | Training RMSE (°C) | Training MAE (°C) | Training RRMSE (%) | Testing RMSE (°C) | Testing MAE (°C) | Testing RRMSE (%)
Isfahan | 5 | 1.68 | 1.29 | 7.97 | 1.79 | 1.40 | 8.24
Isfahan | 10 | 1.57 | 1.21 | 7.58 | 1.64 | 1.27 | 7.83
Isfahan | 50 | 1.04 | 0.79 | 5.13 | 1.03 | 0.79 | 4.98
Isfahan | 100 | 0.69 | 0.50 | 3.34 | 0.68 | 0.53 | 3.22
Urmia | 5 | 1.76 | 1.36 | 11.54 | 1.72 | 1.38 | 11.27
Urmia | 10 | 1.72 | 1.34 | 11.68 | 1.65 | 1.32 | 11.39
Urmia | 50 | 1.49 | 1.15 | 10.18 | 1.19 | 0.96 | 8.32
Urmia | 100 | 1.45 | 1.18 | 9.86 | 0.82 | 0.68 | 5.85
Rasht | 5 | 2.07 | 1.65 | 11.35 | 3.04 | 2.34 | 15.35
Rasht | 10 | 1.92 | 1.54 | 10.61 | 2.77 | 2.15 | 14.33
Rasht | 50 | 1.53 | 1.20 | 8.51 | 1.77 | 1.37 | 9.32
Rasht | 100 | 1.12 | 0.88 | 6.33 | 1.43 | 1.12 | 7.58
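The three error metrics reported in Table 4 can be computed as in the sketch below. The RMSE and MAE definitions are standard; the RRMSE is assumed here to be the RMSE divided by the mean of the measured values, expressed in percent, which should be checked against the paper's own definition. The function names and the sample values are hypothetical.

```python
import numpy as np


def rmse(measured, estimated):
    """Root mean square error, in the units of the data (here °C)."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((measured - estimated) ** 2)))


def mae(measured, estimated):
    """Mean absolute error, in the units of the data (here °C)."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.mean(np.abs(measured - estimated)))


def rrmse(measured, estimated):
    """Relative RMSE in percent: assumed to be 100 * RMSE / mean(measured)."""
    return 100.0 * rmse(measured, estimated) / float(np.mean(np.asarray(measured, dtype=float)))


# Made-up soil temperatures (°C), for illustration only
measured = [10.0, 20.0, 30.0]
estimated = [11.0, 19.0, 33.0]
print(rmse(measured, estimated), mae(measured, estimated), rrmse(measured, estimated))
```

Because the RRMSE is scaled by the mean measured temperature, it allows stations with different temperature regimes to be compared on a common basis, which is how the index is used later in the station comparison.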
Table 6 The statistical results of classical GEP, FFBPNN, and hybrid GEP-FARIMA and FFBPNN-FARIMA models at various depths during the training and testing phases (Isfahan station).
Model | Soil depth (cm) | Training RMSE (°C) | Training MAE (°C) | Training RRMSE (%) | Testing RMSE (°C) | Testing MAE (°C) | Testing RRMSE (%)
GEP | 5 | 1.40 | 1.00 | 6.62 | 1.47 | 1.05 | 6.76
GEP | 10 | 1.12 | 0.83 | 5.40 | 1.20 | 0.86 | 5.73
GEP | 50 | 0.34 | 0.23 | 1.68 | 0.34 | 0.21 | 1.65
GEP | 100 | 0.34 | 0.18 | 1.63 | 0.35 | 0.19 | 1.65
GEP-FARIMA | 5 | 0.23 | 0.17 | 1.10 | 0.24 | 0.18 | 1.12
GEP-FARIMA | 10 | 0.13 | 0.10 | 0.62 | 0.13 | 0.10 | 0.61
GEP-FARIMA | 50 | 0.06 | 0.04 | 0.29 | 0.05 | 0.03 | 0.25
GEP-FARIMA | 100 | 0.16 | 0.10 | 0.76 | 0.16 | 0.11 | 0.77
FFBPNN | 5 | 1.39 | 1.00 | 6.62 | 1.48 | 1.06 | 6.82
FFBPNN | 10 | 1.12 | 0.82 | 5.39 | 1.19 | 0.85 | 5.71
FFBPNN | 50 | 0.34 | 0.23 | 1.69 | 0.34 | 0.21 | 1.64
FFBPNN | 100 | 0.34 | 0.18 | 1.63 | 0.35 | 0.18 | 1.66
FFBPNN-FARIMA | 5 | 0.26 | 0.18 | 1.21 | 0.29 | 0.21 | 1.35
FFBPNN-FARIMA | 10 | 0.15 | 0.11 | 0.70 | 0.14 | 0.11 | 0.68
FFBPNN-FARIMA | 50 | 0.07 | 0.05 | 0.35 | 0.06 | 0.04 | 0.31
FFBPNN-FARIMA | 100 | 0.15 | 0.10 | 0.71 | 0.16 | 0.10 | 0.77
Table 7 The statistical results of classical GEP, FFBPNN, and hybrid GEP-FARIMA and FFBPNN-FARIMA models at various depths during the training and testing phases (Urmia station).
Model | Soil depth (cm) | Training RMSE (°C) | Training MAE (°C) | Training RRMSE (%) | Testing RMSE (°C) | Testing MAE (°C) | Testing RRMSE (%)
GEP | 5 | 1.27 | 0.92 | 8.32 | 1.18 | 0.83 | 7.75
GEP | 10 | 0.97 | 0.71 | 6.60 | 0.96 | 0.65 | 6.61
GEP | 50 | 0.32 | 0.22 | 2.20 | 0.36 | 0.23 | 2.55
GEP | 100 | 0.24 | 0.13 | 1.65 | 0.25 | 0.13 | 1.80
GEP-FARIMA | 5 | 0.12 | 0.09 | 0.76 | 0.11 | 0.08 | 0.71
GEP-FARIMA | 10 | 0.10 | 0.07 | 0.68 | 0.09 | 0.07 | 0.65
GEP-FARIMA | 50 | 0.04 | 0.02 | 0.28 | 0.04 | 0.03 | 0.26
GEP-FARIMA | 100 | 0.09 | 0.05 | 0.58 | 0.09 | 0.05 | 0.67
FFBPNN | 5 | 1.27 | 0.92 | 8.32 | 1.17 | 0.83 | 7.68
FFBPNN | 10 | 0.97 | 0.72 | 6.61 | 0.96 | 0.65 | 6.60
FFBPNN | 50 | 0.32 | 0.22 | 2.22 | 0.37 | 0.24 | 2.57
FFBPNN | 100 | 0.25 | 0.14 | 1.68 | 0.25 | 0.14 | 1.77
FFBPNN-FARIMA | 5 | 0.15 | 0.11 | 0.99 | 0.14 | 0.10 | 0.93
FFBPNN-FARIMA | 10 | 0.12 | 0.09 | 0.83 | 0.11 | 0.09 | 0.77
FFBPNN-FARIMA | 50 | 0.06 | 0.04 | 0.40 | 0.06 | 0.04 | 0.39
FFBPNN-FARIMA | 100 | 0.09 | 0.06 | 0.62 | 0.09 | 0.06 | 0.64
Table 8 The statistical results of classical GEP, FFBPNN, and hybrid GEP-FARIMA and FFBPNN-FARIMA models at various depths during the training and testing phases (Rasht station).
Model | Soil depth (cm) | Training RMSE (°C) | Training MAE (°C) | Training RRMSE (%) | Testing RMSE (°C) | Testing MAE (°C) | Testing RRMSE (%)
GEP | 5 | 1.20 | 0.88 | 6.59 | 1.60 | 1.20 | 8.09
GEP | 10 | 0.89 | 0.66 | 4.95 | 1.19 | 0.90 | 6.27
GEP | 50 | 0.48 | 0.28 | 2.67 | 0.45 | 0.32 | 2.39
GEP | 100 | 0.38 | 0.20 | 2.16 | 0.42 | 0.26 | 2.24
GEP-FARIMA | 5 | 0.14 | 0.10 | 0.76 | 0.25 | 0.18 | 1.25
GEP-FARIMA | 10 | 0.18 | 0.13 | 1.00 | 0.19 | 0.14 | 0.98
GEP-FARIMA | 50 | 0.10 | 0.05 | 0.53 | 0.08 | 0.06 | 0.42
GEP-FARIMA | 100 | 0.11 | 0.07 | 0.59 | 0.12 | 0.08 | 0.63
FFBPNN | 5 | 1.20 | 0.89 | 6.60 | 1.63 | 1.24 | 8.26
FFBPNN | 10 | 0.89 | 0.66 | 4.95 | 1.21 | 0.91 | 6.29
FFBPNN | 50 | 0.48 | 0.29 | 2.67 | 0.46 | 0.33 | 2.41
FFBPNN | 100 | 0.38 | 0.20 | 2.17 | 0.44 | 0.28 | 2.31
FFBPNN-FARIMA | 5 | 0.16 | 0.12 | 0.88 | 0.37 | 0.24 | 1.89
FFBPNN-FARIMA | 10 | 0.18 | 0.13 | 1.01 | 0.30 | 0.21 | 1.54
FFBPNN-FARIMA | 50 | 0.07 | 0.05 | 0.41 | 0.07 | 0.05 | 0.35
FFBPNN-FARIMA | 100 | 0.11 | 0.07 | 0.61 | 0.15 | 0.10 | 0.81
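The size of the improvement from GEP to GEP-FARIMA at the 5 cm depth can be expressed as a relative reduction in testing RMSE. The sketch below uses the testing RMSE values quoted in the text and listed in Tables 6–8; the helper name `percent_reduction` is illustrative, and the percentages are derived here rather than reported by the authors.

```python
# Testing RMSE (°C) at 5 cm: (classical GEP, hybrid GEP-FARIMA), from Tables 6-8
testing_rmse_5cm = {
    "Isfahan": (1.47, 0.24),
    "Urmia": (1.18, 0.11),
    "Rasht": (1.60, 0.25),
}


def percent_reduction(classical, hybrid):
    """Relative error reduction of the hybrid model with respect to the classical one."""
    return 100.0 * (classical - hybrid) / classical


reductions = {station: percent_reduction(*pair)
              for station, pair in testing_rmse_5cm.items()}

for station, value in reductions.items():
    print(f"{station}: {value:.1f}% lower testing RMSE")
```

With these inputs the reductions are roughly 84% (Isfahan), 91% (Urmia), and 84% (Rasht).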
Table 9 The algebraic equations presented by the GEP at various depths.
Station: Isfahan, Urmia, Rasht
Soil depth (cm): 5, 10, 50, 100 (for each station)
Equations:
ST_t = exp((−9.306061 − cos((((−9.306061 + ST_{t-1}) − 9.306061)^3)))) + (ST_{t-1} * cos(sin(sin(4.892487)))) + (cos(atan((cos(ST_{t-1}) − 7.712769))) * ST_{t-1})
ST_t = ST_{t-1} + ((atan(−4.004577) * (−4.004577 * ST_{t-1})) * (atan(−4.004577) * exp(−4.004577))) + ((atan(ST_{t-1}) * (−4.004577 − ST_{t-1})) * (exp(−4.004577) * (−2.363617 + 2.363617)))
ST_t = ST_{t-1} + (((−0.053833 * ST_{t-1}) * (0.053833 * 9.348816)) + ((ST_{t-1} * ST_{t-1}) * (−0.053833^3))) + ((cos(ST_{t-1}) / (2.736023 * 2.736023)) / ((ST_{t-1} + ST_{t-1}) + (2.736023^3)))
ST_t = (exp(((ST_{t-1} − 7.403839) + ST_{t-1})) / ((7.403839 − 6.657135) + (ST_{t-1} + ST_{t-1}))) + (ST_{t-1} / (((ST_{t-1} * ST_{t-1}) + exp(ST_{t-1})) + (−5.015502^3))) + ST_{t-1}
ST_t = (((atan(−0.445679)^3) * atan(ST_{t-1})) + ST_{t-1}) + sin(−0.814331) + cos(atan(((sin(7.650086) − atan(ST_{t-1})) + ST_{t-1})))
ST_t = log(atan((((sin(1.61438) + 1.61438) − ST_{t-1})^(1.0/3.0)))) + log(atan(((5.782898 − ST_{t-1})^(1.0/3.0)))) + ((((ST_{t-1} − ST_{t-1}) * ST_{t-1}) / (5.782898 − ST_{t-1})) + ST_{t-1})
ST_t = exp((−7.712769 − ((2.036682 − ST_{t-1}) * (ST_{t-1} + 2.036682)))) + exp((((ST_{t-1} / ST_{t-1}) − (ST_{t-1} + 7.712769)) − ST_{t-1})) + ST_{t-1}
ST_t = sqrt(sin((−5.015502 − (((ST_{t-1} − ST_{t-1}) * −3.400207)^(1.0/3.0))))) + (ST_{t-1} * cos((sin(−2.363617)^3))) + sin(atan(((−7.732025 − (ST_{t-1} + ST_{t-1})) + sin(ST_{t-1}))))
ST_t = (((exp(ST_{t-1}) − ST_{t-1}) − cos(6.224151)) * sin(sin(6.224151))) + (ST_{t-1} − ((exp(ST_{t-1}) + ST_{t-1}) / (6.540314 * 6.540314))) + exp((((ST_{t-1} * ST_{t-1})^(1.0/3.0)) − ((6.224151 + ST_{t-1}) + atan(ST_{t-1}))))
ST_t = sin(sin(sin((((ST_{t-1} − ST_{t-1}) / ST_{t-1}) + ST_{t-1})))) + ST_{t-1} + ((((ST_{t-1} − ST_{t-1})^3)^2) − atan((exp(0.255005) * ST_{t-1})))
ST_t = ST_{t-1} + ST_{t-1} + (((2.736023 − ST_{t-1}) + (ST_{t-1} − 2.736023)) − (ST_{t-1} * atan(2.736023)))
ST_t = sin(−2.363617) + (ST_{t-1} * cos(log((sin((ST_{t-1} / ST_{t-1}))^2)))) + cos(cos(((sin(−7.712769) + cos(2.036682)) * cos(2.036682))))
Fig. 3. Measured vs. estimated daily ST values by the hybrid GEP-FARIMA and classical FARIMA and GEP models at a depth of 5 cm during the testing period (Isfahan station).
respectively. It can be observed that the hybrid models accurately estimate and track the daily ST time series of the stations. Comparing the GEP and FFBPNN models shows negligible differences between their accuracies at different depths. Furthermore, the GEP and FFBPNN models demonstrate superior results compared with the FARIMA model. As already mentioned and according to Eq. (4), the time series of ST data, similar to other hydrological and meteorological parameters, comprises stochastic and deterministic components. The stochastic component of ST data is obtained via time series models such as the FARIMA used in this study, while the GEP and FFBPNN are applied to estimate the deterministic component of ST data. The better performance of the GEP and FFBPNN models relative to the FARIMA can be explained by the nature of these models: the machine learning-based models estimate the deterministic component more accurately than the time series-based FARIMA model captures the stochastic component of ST data. The hybrid models perform much better than the classical models for estimating daily ST. Moreover, the GEP-FARIMA models developed at various depths of the studied sites present a better relative performance in comparison with the FFBPNN-FARIMA models. The results of the present research work are in agreement with the outcomes of Tabari et al. (2015) and Zeynoddin et al. (2019), implying that the best performance of the machine learning-based MLP approach and
Fig. 4. Measured vs. estimated daily ST values by the hybrid GEP-FARIMA and classical FARIMA and GEP models at a depth of 5 cm during the testing period (Urmia station).
Fig. 5. Measured vs. estimated daily ST values by the hybrid GEP-FARIMA and classical FARIMA and GEP models at a depth of 5 cm during the testing period (Rasht station).
ARIMA time series model was obtained at deep soil layers utilizing the antecedent ST data in the estimation process. On the contrary, the outcomes obtained are inconsistent with the results of other studies such as Hosseinzadeh Talee (2014), Nahvi et al. (2016), and Behmanesh and Mehdizadeh (2017). These authors reported that machine learning models were more accurate at surface layers when other meteorological parameters were used as model inputs to estimate ST. Additionally, hybrid models have been developed in the literature (Mehdizadeh, 2018; Mehdizadeh and Kozekalani Sales, 2018; Mehdizadeh et al., 2017c, 2018b, 2019; Fathian et al., 2019) for estimating other parameters (e.g., hydrologic parameters) by combining time series and machine learning models. These authors reported a much better efficiency of the hybrid models than the classical models, which is similar to the results achieved in this study.
The performance of all the models developed at the studied stations with various climates is compared herein. To this end, the comparison focuses on the values of the RRMSE index. The best performance of FARIMA at all depths is observed at Isfahan station, which has an arid climate (see Table 4). At Urmia and Rasht stations, mixed results were achieved. The accuracy of the FARIMA models implemented at all depths of Rasht station is better than that of the corresponding models at Urmia station during the training stage. On the contrary, the FARIMA models of Urmia station at all depths presented superior results in comparison to
the relevant models of Rasht station during the testing period. Regarding the single GEP and FFBPNN models, it can be concluded from Tables 6–8 that, at the depth of 10 cm, the best performance of these models was observed at the Rasht station during the training period and at the Isfahan station during the testing period. For the other depths (i.e., 5, 50, and 100 cm), the single machine learning-based models showed superior results at the Isfahan station in terms of having the lowest RRMSE values. In general, the weakest performance of the single models at deep soil layers (i.e., 50 and 100 cm) among the studied stations was obtained at Rasht station. For the hybrid models, the highest accuracy at the depths of 5, 10, 50, and 100 cm was generally obtained at Urmia, Isfahan, Isfahan, and Urmia stations, respectively.

4. Conclusions

The present study estimated the daily ST, one of the crucial parameters of soil, at 5, 10, 50, and 100 cm depths for three stations in Iran. To this end, the classical FARIMA (a time series-based model), GEP and FFBPNN (machine learning-based models), and the hybrid GEP-FARIMA and FFBPNN-FARIMA models were developed. All applied models were trained and then tested by using antecedent ST data (i.e., one-day lagged ST values). The performance of the models was investigated with respect to three statistical metrics, namely RMSE, MAE,
and RRMSE. Assessing the accuracy of the classical GEP and FFBPNN models showed negligible differences between their performances at all depths. On the other hand, the FARIMA showed lower accuracy than the GEP and FFBPNN models for daily ST estimation at the studied stations for all depths. This study also proposed hybrid models by combining FARIMA with the GEP and FFBPNN models. It was concluded that the proposed hybrid models performed better than the classical models. Additionally, the classical and proposed hybrid models presented the best performance at large soil depths (i.e., 50 cm for the hybrid models and 100 cm for the classical models), so that the accuracy of the models decreased from deep to surface soil layers. The literature review showed that previously published works have mainly used machine learning models to estimate ST, while only a few studies have used time series-based models for ST estimation. Therefore, time series-based models are recommended for further consideration in future studies when estimating ST. In addition, as seen, the machine learning models (i.e., GEP and FFBPNN) demonstrated better results than the time series model applied here (i.e., FARIMA) over a daily time scale. This point cannot be considered a general conclusion, since machine learning and time series models can display different performances at different time scales. Thus, future studies can compare the performance of these models for estimating ST at different time horizons, including daily, monthly, seasonal, and annual. Moreover, this study developed hybrid models by combining the GEP and FFBPNN models with the FARIMA. Hence, more hybrid models are recommended to be developed to improve ST estimation by combining other machine learning models, such as RF, MARS, and ANFIS, with linear time series models, such as autoregressive (AR), autoregressive moving average (ARMA), and ARIMA, and with non-linear ones, such as autoregressive conditional heteroscedasticity (ARCH), generalized ARCH (GARCH), and self-exciting threshold autoregressive (SETAR) models.
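The hybrid idea summarized in these conclusions can be sketched as an additive two-stage scheme: a first model estimates ST_t from ST_{t-1}, and a time series model fitted to its residuals supplies a correction that is added back. The sketch below is only illustrative and is not the authors' implementation: a least-squares line stands in for GEP/FFBPNN, an AR(1) fit stands in for FARIMA, and the synthetic series, the function names, and the additive combination itself are assumptions.

```python
import numpy as np


def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


def fit_lag1_line(st):
    """Least-squares fit ST_t ~ a * ST_{t-1} + b (stand-in for GEP/FFBPNN)."""
    design = np.column_stack([st[:-1], np.ones(len(st) - 1)])
    (a, b), *_ = np.linalg.lstsq(design, st[1:], rcond=None)
    return a, b


def hybrid_estimates(st):
    """Return (measured, single-model estimate, hybrid estimate), all in-sample."""
    st = np.asarray(st, dtype=float)
    a, b = fit_lag1_line(st)
    deterministic = a * st[:-1] + b            # estimates of ST_1 .. ST_{n-1}
    residual = st[1:] - deterministic
    # AR(1) coefficient of the residuals by least squares (stand-in for FARIMA)
    phi = float(residual[:-1] @ residual[1:] / (residual[:-1] @ residual[:-1]))
    stochastic = phi * residual[:-1]           # correction for ST_2 .. ST_{n-1}
    return st[2:], deterministic[1:], deterministic[1:] + stochastic


# Synthetic daily series: annual cycle plus autocorrelated noise (made-up data)
rng = np.random.default_rng(0)
days = np.arange(730)
noise = np.zeros(days.size)
for t in range(1, days.size):
    noise[t] = 0.6 * noise[t - 1] + rng.normal(scale=0.5)
series = 15.0 + 10.0 * np.sin(2.0 * np.pi * days / 365.0) + noise

measured, single, hybrid = hybrid_estimates(series)
print(rmse(measured, single), rmse(measured, hybrid))
```

In this in-sample setting the hybrid RMSE cannot exceed that of the single model, because the AR(1) coefficient is chosen by least squares on the same residuals; a fair comparison would fit both stages on a training period and evaluate on a separate testing period, as done in the study.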
Declaration of Competing Interest
The authors declare that there is no conflict of interest.

References
Araghi, A., Mousavi-Baygi, M., Adamowski, J., 2017. Detecting soil temperature trends in Northeast Iran from 1993 to 2016. Soil Till. Res. 174, 177–192.
Behmanesh, J., Mehdizadeh, S., 2017. Estimation of soil temperature using gene expression programming and artificial neural networks in a semiarid region. Environ. Earth Sci. https://doi.org/10.1007/s12665-017-6395-1.
Bilgili, M., 2011. The use of artificial neural networks for forecasting the monthly mean soil temperatures in Adana, Turkey. Turk. J. Agric. For. 35, 83–93.
Børresen, M.H., Barnes, D.L., Rike, A.G., 2007. Repeated freeze–thaw cycles and their effects on mineralization of hexadecane and phenanthrene in cold climate soils. Cold Reg. Sci. Technol. 49 (3), 215–225.
Brar, G.S., Steiner, J.L., Unger, P.W., Prihar, S.S., 1992. Modeling sorghum seedling establishment from soil wetness and temperature of drying seed zones. Agron. J. 84, 905–910.
de Martonne, E., 1925. Traité de géographie physique. 3 tomes. Paris.
Fathian, F., Mehdizadeh, S., Kozekalani Sales, A., Safari, M.J.S., 2019. Hybrid models to improve the monthly river flow prediction: integrating artificial intelligence and nonlinear time series model. J. Hydrol. 575, 1200–1213.
Feng, Y., Cui, N., Hao, W., Gao, L., Gong, D., 2019. Estimation of soil temperature from meteorological data using different machine learning models. Geoderma 338, 67–77.
Ferreira, C., 2001. Gene expression programming: a new adaptive algorithm for solving problems. Complex Syst. 13 (2), 87–129.
Goldberg, D.E., 1989. Genetic Algorithm in Search, Optimization & Machine Learning. Addison-Wesley, New York.
Hillel, D., 1998. Environmental Soil Physics. Academic Press, 771 pp.
Hipel, K.W., McLeod, A.E., 1996. Time Series Modeling of Water Resources and Environmental Systems. Elsevier, Amsterdam.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Cambridge, Massachusetts, USA.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366.
Hosseinzadeh Talee, P., 2014. Daily soil temperature modeling using neuro-fuzzy approach. Theor. Appl. Climatol. 118 (3), 481–489.
Hu, G., Lin, Z., Wu, X., Ren, L., Wu, T., Xie, C., Qiao, Y., Shi, J., Cheng, G., 2016. An analytical model for estimating soil temperature profiles on the Qinghai-Tibet plateau of China. J. Arid Land 8 (2), 232–240.
Kang, S., Kim, S., Oh, S., Lee, D., 2000. Predicting spatial and temporal patterns of soil temperature based on topography, surface cover and air temperature. For. Ecol. Manage. 136, 173–184.
Kazemi, S.M.R., Minaei Bidgoli, B., Shamshirband, S., Karimi, S.M., Ghorbani, M.A., Chau, K.W., Kazem Pour, R., 2018. Novel genetic-based negative correlation learning for estimating soil temperature. Eng. Appl. Comput. Fluid Mech. 12 (1), 506–516.
Kim, T.W., Valdés, J.B., 2003. Nonlinear model for drought forecasting based on a conjunction of wavelet transforms and neural networks. J. Hydrol. Eng. 8 (6), 319–328.
Kim, S., Singh, V.P., 2014. Modeling daily soil temperature using data-driven models and spatial distribution. Theor. Appl. Climatol. 118 (3), 465–479.
Kisi, O., Tombul, M., Zounemat Kermani, M., 2015. Modeling soil temperatures at different depths by using three different neural computing techniques. Theor. Appl. Climatol. 121 (1–2), 377–387.
Koza, J.R., 1992. Genetic Programming: on the Programming of Computers by Means of Natural Selection, vol. 1. MIT Press.
Li, H.J., Yan, J.X., Yue, X.F., Wang, M.B., 2008. Significance of soil temperature and moisture for soil respiration in a Chinese mountain area. Agric. For. Meteorol. 148 (3), 490–503.
Liu, H., Huang, B., 2005. Root physiological factors involved in cool-season grass response to high soil temperature. Environ. Exper. Bot. 53 (3), 233–245.
Mehdizadeh, S., 2018. Estimation of daily reference evapotranspiration (ETo) using artificial intelligence methods: offering a new approach for lagged ETo data-based modeling. J. Hydrol. 559, 794–812.
Mehdizadeh, S., Kozekalani Sales, A., 2018. A comparative study of autoregressive, autoregressive moving average, gene expression programming and Bayesian networks for estimating monthly streamflow. Water Resour. Manage. 32 (9), 3001–3022.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2017a. Evaluating the performance of artificial intelligence methods for estimation of monthly mean soil temperature without using meteorological data. Environ. Earth Sci. 76. https://doi.org/10.1007/s12665-017-6607-8.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2017b. Application of gene expression programming to predict daily dew point temperature. Appl. Therm. Eng. 112, 1097–1107.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2017c. A comparison of monthly precipitation point estimates using integration of soft computing methods and GARCH time series model. J. Hydrol. 554, 721–742.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2018a. Comprehensive modeling of monthly mean soil temperature using multivariate adaptive regression splines and support vector machine. Theor. Appl. Climatol. 133 (3–4), 911–924.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2018b. New approaches for estimation of monthly rainfall based on GEP-ARCH and ANN-ARCH hybrid models. Water Resour. Manage. 32 (2), 527–545.
Mehdizadeh, S., Fathian, F., Adamowski, J.F., 2019. Hybrid artificial intelligence-time series models for monthly streamflow modeling. Appl. Soft Comput. 80, 873–887.
Metcalfe, A.V., Cowpertwait, P.S., 2009. Introductory Time Series With R. Springer-Verlag, New York, USA.
Nabi, G., Muillins, C.E., 2008. Soil temperature dependent growth of cotton seedlings before emergence. Pedosphere 18 (1), 54–59.
Nahvi, B., Habibi, J., Mohammadi, K., Shamshirband, S., Al Razgan, O.S., 2016. Using self-adaptive evolutionary algorithm to improve the performance of an extreme learning machine for estimating soil temperature. Comput. Electron. Agric. 124, 150–160.
Nourani, V., 2017. An emotional ANN (EANN) approach to modeling rainfall-runoff process. J. Hydrol. 544, 267–277.
Rube, W., 2005. Carbon limitation of soil respiration under winter snowpacks: potential feedbacks between growing season and winter carbon fluxes. Global Change Biol. 11 (2), 231–238.
Safari, M.J.S., Aksoy, H., Mohammadi, M., 2016. Artificial neural network and regression models for flow velocity at sediment incipient deposition. J. Hydrol. 541 (B), 1420–1429.
Samadianfard, S., Ghorbani, M.A., Mohammadi, B., 2018a. Forecasting soil temperature at multiple-depth with a hybrid artificial neural network model coupled-hybrid firefly optimizer algorithm. Inform. Proc. Agric. 5 (4), 465–476.
Samadianfard, S., Asadi, E., Jarhan, S., Kazemi, H., Keshtgar, S., Kisi, O., Sajjadi, S., Manaf, A.A., 2018b. Wavelet neural networks and gene expression programming models to predict short-term soil temperature at different depths. Soil Till. Res. 175, 37–50.
Sanikhani, H., Deo, R.C., Yaseen, Z.M., Eray, O., Kisi, O., 2018. Non-tuned data intelligent model for soil temperature estimation: a new approach. Geoderma 330, 52–64.
Schimel, J.P., Bilbrough, C., Welker, J.M., 2004. Increased snow depth affects microbial activity and nitrogen mineralization in two Arctic tundra communities. Soil Biol. Biochem. 36 (3), 217–227.
Seyfried, M.S., Murdock, M.D., Hanson, C.L., Flerchinger, G.N., Van Vactor, S., 2001. Long-term climate database, Reynolds creek experimental watershed, Idaho, United States. Water Resour. Res. 37 (11), 2825–2830.
Sihag, P., Esmaeilbeiki, F., Singh, B., Pandhiani, S.M., 2019. Model-based soil temperature estimation using climatic parameters: the case of Azerbaijan Province, Iran. https://doi.org/10.1080/24749508.2019.1610841.
Tabari, H., Hosseinzadeh Talaee, P., Willems, P., 2015. Short-term forecasting of soil temperature using artificial neural network. Meteorol. Appl. 22 (3), 576–585.
Wu, X., Yao, Z., Brüggemann, N., Shen, Z.Y., Wolf, B., Dannenmann, M., 2010. Effects of soil moisture and temperature on CO2 and CH4 soil–atmosphere exchange of various land use/cover types in a semi-arid grassland in Inner Mongolia, China. Soil Biol. Biochem. 42 (5), 773–787.
Wu, W., Tang, X.P., Guo, N.J., Yang, C., Liu, H.B., Shang, Y.F., 2013. Spatiotemporal modeling of monthly soil temperature using artificial neural networks. Theor. Appl. Climatol. 113 (3–4), 481–494.
Xing, L., Li, L., Gong, J., Ren, C., Liu, J., Chen, H., 2018. Daily soil temperatures predictions for various climates in United States using data-driven model. Energy 160, 430–440.
Yang, G., Bowling, L.C., 2014. Detection of changes in hydrologic system memory associated with urbanization in the Great Lakes region. Water Resour. Res. 50 (5), 3750–3763.
Zeynoddin, M., Bonakdari, H., Ebtehaj, I., Esmaeilbeiki, F., Gharabaghi, B., Haghi, D.Z., 2019. A reliable linear stochastic daily soil temperature forecast model. Soil Till. Res. 189, 73–87.