Developing novel hybrid models for estimation of daily soil temperature at various depths


Soil & Tillage Research 197 (2020) 104513

Contents lists available at ScienceDirect

Soil & Tillage Research journal homepage: www.elsevier.com/locate/still


Saeid Mehdizadeh a,*, Farshad Fathian b, Mir Jafar Sadegh Safari c, Ali Khosravi d

a Department of Water Engineering, Urmia University, Urmia, Iran
b Department of Water Science & Engineering, Faculty of Agriculture, Vali-e-Asr University of Rafsanjan, P.O.Box 77188-97111, Rafsanjan, Iran
c Department of Civil Engineering, Yaşar University, Izmir, Turkey
d Department of Mechanical Engineering, School of Engineering, Aalto University, Helsinki, Finland

ARTICLE INFO

Keywords: Estimation; Daily soil temperature; Fractionally autoregressive integrated moving average; Feed-forward back propagation neural networks; Gene expression programming

ABSTRACT

Soil temperature (ST) is a vital soil parameter that affects many chemical and physical characteristics of soil, so its estimation is of great importance in soil science. This study applies a time series-based model, namely the fractionally autoregressive integrated moving average (FARIMA), and two machine learning-based models, feed-forward back propagation neural networks (FFBPNN) and gene expression programming (GEP), for daily ST estimation. To this end, daily ST data from three stations in Iran at four depths (5, 10, 50, and 100 cm) were used for the period 1998 to 2017. The stations were selected from different climates, namely arid (Isfahan station), semi-arid (Urmia station), and very humid (Rasht station), to evaluate the models and generalize the outcomes across climate classes. The performance of the developed models was evaluated with three statistical metrics: the root mean square error (RMSE), mean absolute error (MAE), and relative RMSE (RRMSE). The machine learning-based FFBPNN and GEP models performed better than the time series-based FARIMA approach at all depths, with negligible differences between the accuracies of FFBPNN and GEP. In addition, this study developed novel hybrid models by combining the FFBPNN and GEP techniques with FARIMA to enhance the accuracy of the classical FARIMA, FFBPNN, and GEP models. The developed hybrid models, named GEP-FARIMA and FFBPNN-FARIMA, achieved better estimates of daily ST at different depths than the classical models.
The daily ST estimates with the highest accuracy were observed at a depth of 50 cm via the GEP-FARIMA at Isfahan station (RMSE = 0.05 °C, MAE = 0.03 °C, RRMSE = 0.25% for the testing phase), the GEP-FARIMA at Urmia station (RMSE = 0.04 °C, MAE = 0.03 °C, RRMSE = 0.26% for the testing phase), and the FFBPNN-FARIMA at Rasht station (RMSE = 0.07 °C, MAE = 0.05 °C, RRMSE = 0.35% for the testing phase).

* Corresponding author. E-mail addresses: [email protected] (S. Mehdizadeh), [email protected] (F. Fathian), [email protected] (M.J.S. Safari), ali.khosravi@aalto.fi (A. Khosravi).
https://doi.org/10.1016/j.still.2019.104513
Received 8 June 2019; Received in revised form 10 October 2019; Accepted 16 November 2019
0167-1987/ © 2019 Elsevier B.V. All rights reserved.

1. Introduction

Soil temperature (ST) is one of the crucial soil parameters: it controls the balance of heat energy between the atmosphere and the ground surface (Sanikhani et al., 2018), as well as underground physical processes and the carbon budget (Samadianfard et al., 2018a). The soil thermal regime determines the directions and rates of physical processes in soil, such as moisture gradients and thermal fluxes (Araghi et al., 2017). ST also affects mass transfer in soil, soil structure and nutrient uptake (Børresen et al., 2007; Li et al., 2008; Wu et al., 2010; Xing et al., 2018), plant growth (Brar et al., 1992; Liu and Huang, 2005), seed germination (Nabi and Muillins, 2008), the accumulation of organic matter in soil, soil respiration, organic matter decomposition (Seyfried et al., 2001; Schimel et al., 2004; Rube, 2005; Xing et al., 2018), root development, and seedling emergence and growth (Hillel, 1998). Despite the importance of ST in various fields of engineering, particularly agriculture, ST data are very limited in many areas (e.g., in developing countries). ST is measured with thermometers installed at different soil depths (measurement error of 0.1 °C), which is a time-consuming and costly task (Hu et al., 2016; Feng et al., 2019). Alternative approaches have therefore recently attracted much attention. Traditional approaches, including soil heat flow, energy balance, and empirical correlations with easily obtainable parameters (Kang et al., 2000), as well as numerical, analytical, and experimental models (Behmanesh and Mehdizadeh, 2017), can be used to estimate ST. However, these approaches can be complex and time-consuming to apply. Data-driven methods, including machine learning-based and time series-based models, can also be applied, although machine learning models have been used more commonly than time series models for estimating ST. Representative machine learning studies are briefly summarized below.

An artificial neural network (ANN) approach was implemented by Bilgili (2011) to estimate the monthly ST of various depths at Adana, Turkey, and the ANN was found to be an appropriate tool for monthly ST estimation. Wu et al. (2013) predicted the mean monthly ST of 83 regions in southwestern China with an ANN and reported it to be a promising technique for the studied areas. Kim and Singh (2014) modeled the daily ST of the Champaign and Springfield stations, USA, using an adaptive neuro-fuzzy inference system (ANFIS) and a multi-layer perceptron (MLP); the MLP gave better ST estimates than the ANFIS. Kisi et al. (2015) studied the applicability of different types of ANN, namely radial basis neural networks (RBNN), generalized regression neural networks (GRNN), and MLP, for modeling monthly ST at different depths for Mersin, Turkey, and compared them with simple multiple linear regression (MLR). They reported that RBNN at the surface layers (i.e., 5 and 10 cm) and GRNN and MLR at the deep layers (i.e., 50 and 100 cm) performed better than the other models. Tabari et al. (2015) employed the ANN to forecast the ST time series of two stations in Iran with different climates (i.e., Zahedan and Sari) and reported a reliable capability for estimating ST at various depths. Mehdizadeh et al. (2017a) reported better accuracy of the ANFIS compared with the gene expression programming (GEP) and ANN models when estimating the mean monthly ST of 31 stations in Iran. Samadianfard et al. (2018b) coupled wavelet analysis with the ANN and GEP models to predict the daily ST of Tabriz station, northwest Iran; the wavelet-ANN (WANN) models developed at different soil depths gave superior results compared with the wavelet-GEP (WGEP) models. Two machine learning models, support vector machine (SVM) and multivariate adaptive regression splines (MARS), were employed by Mehdizadeh et al. (2018a) to estimate the mean monthly ST of 30 stations in Iran; MARS was more efficient than SVM at different soil depths. Kazemi et al. (2018) trained a genetic-based neural network ensemble using a sequential genetic-based negative correlation learning algorithm to estimate the daily ST of two stations in Iran and found that the developed model performed best. Sihag et al. (2019) applied MLP, random forest (RF), Gaussian process (GP), and M5P models to daily ST, and the MLP performed better than the other methods at the two studied stations in Iran. Feng et al. (2019) developed back propagation neural networks (BPNN), extreme learning machine (ELM), RF, and GRNN approaches for half-hourly ST estimation at various depths in a rainfed maize field in northern China and concluded that ELM was the best technique.

This review shows that time series models have received less attention than machine learning models for ST estimation. Moreover, hybrid models combining machine learning and time series models have not been reported in previous works for ST estimation. The objective of the current study is to estimate the daily ST time series at various depths (i.e., 5, 10, 50, and 100 cm) at three stations with different climate classes in Iran. To this end, a time series-based model, namely the fractionally autoregressive integrated moving average (FARIMA), and two machine learning-based models, GEP and feed-forward BPNN (FFBPNN), were applied. Hybrid models were then developed by combining the GEP and FFBPNN approaches with FARIMA to improve the accuracy of the classical models; the resulting hybrid models are named GEP-FARIMA and FFBPNN-FARIMA. Notably, FARIMA and the proposed hybrid models are evaluated for the first time in this study for daily ST estimation at various soil depths.

2. Materials and methods

2.1. Study sites and dataset description

In the present work, daily ST datasets from three stations in Iran at four soil depths (i.e., 5, 10, 50, and 100 cm) were used. At Iranian stations, ST is measured three times a day (i.e., at 03:00, 09:00, and 15:00), and the average of these three readings is used as the mean daily ST. The stations were selected from different climates to investigate the performance of the classical and proposed hybrid models for ST estimation in various climatic zones: Isfahan (arid), Urmia (semi-arid), and Rasht (very humid), with the climates determined using the aridity index of de Martonne (1925). The geographic locations of the studied stations are depicted in Fig. 1, and their geographic attributes are given in Table 1. The daily ST data at 5, 10, 50, and 100 cm were collected from the Iran Meteorological Organization (IMO) for the period from 1 January 1998 to 31 December 2017. ST is measured manually by an IMO operator using mercury thermometers installed at the different soil depths. For the Urmia and Isfahan stations, no ST data were missing at 5 and 10 cm; the numbers of missing values were 25 (Urmia) and 11 (Isfahan) at 50 cm, and 50 (Urmia) and 2 (Isfahan) at 100 cm. For the Rasht station, 11, 11, 29, and 19 ST values were missing at 5, 10, 50, and 100 cm, respectively. These numbers are negligible compared with the full 20-year daily record (7305 days per depth). The missing ST data were filled with average values.
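The preprocessing described in this section can be sketched in Python with NumPy: averaging the three daily readings, filling the few missing days, splitting the 1 January 1998 to 31 December 2017 record chronologically into the 1998–2012 training and 2013–2017 testing periods used in this study, and forming the one-day-lagged input used for GEP and FFBPNN (Section 2.5). This is a minimal sketch, not the authors' code: filling gaps with the mean of the available values is an assumption (the text says only that average values were used), the function names are hypothetical, and a synthetic sinusoid stands in for the IMO data.

```python
from datetime import date, timedelta

import numpy as np


def daily_mean(readings: np.ndarray) -> np.ndarray:
    """Mean daily ST from the 03:00, 09:00 and 15:00 readings (one row per day).

    A day with any missing reading (NaN) stays NaN so that it can be gap-filled later.
    """
    return np.asarray(readings, dtype=float).mean(axis=1)


def fill_gaps(st: np.ndarray) -> np.ndarray:
    """Replace missing daily values with the mean of the available values (assumed rule)."""
    filled = np.asarray(st, dtype=float).copy()
    filled[np.isnan(filled)] = np.nanmean(filled)
    return filled


def chronological_split(days, st, first_test_day=date(2013, 1, 1)):
    """Training: days before first_test_day (1998-2012); testing: the remaining days (2013-2017)."""
    is_train = np.array([d < first_test_day for d in days])
    return st[is_train], st[~is_train]


def lag1_pairs(st: np.ndarray):
    """Model input ST_{t-1} and target ST_t for the one-day-lagged GEP/FFBPNN models."""
    return st[:-1], st[1:]


# Synthetic stand-in for one station and depth: 1 Jan 1998 to 31 Dec 2017 (7305 days).
days = [date(1998, 1, 1) + timedelta(days=i) for i in range(7305)]
st = 15.0 + 10.0 * np.sin(2 * np.pi * np.arange(len(days)) / 365.25)
st[[100, 2000]] = np.nan                 # two hypothetical missing days
st = fill_gaps(st)
train, test = chronological_split(days, st)
x_train, y_train = lag1_pairs(train)
print(len(train), len(test))             # 5479 1826, i.e. 75 % / 25 % of the record
```

A date-based split keeps the testing years strictly after the training years, which matches the 1998–2012 / 2013–2017 division rather than a random shuffle.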
The entire ST dataset was split chronologically into training and testing phases: the first fifteen years of data (1998–2012; 75% of the data) were used for training and the remaining five years (2013–2017; 25% of the data) for testing. Time series graphs of the daily ST at the four depths for 1998–2017 are illustrated in Fig. 2. Table 2 tabulates the statistical parameters of the daily ST data at the different soil depths for the training and testing phases separately. The statistical parameters at each depth are generally similar in the training and testing periods at all studied stations. With increasing soil depth, xmin increases and xmax decreases. The ST datasets also show low skewness. Comparing xsd and xcv across depths shows that the surface layers (5 and 10 cm) have higher standard deviations and coefficients of variation than the deeper layers (50 and 100 cm). All ST datasets were standardized to avoid the influence of scale in the ST estimation process.

Fig. 1. The geographic locations of the studied stations in Iran.

Table 1 The geographic information of the studied stations in Iran.
Station | Longitude (E) | Latitude (N) | Altitude (m) | Climate
Isfahan | 51° 40′ | 32° 37′ | 1550.4 | Arid
Urmia | 45° 03′ | 37° 40′ | 1328.0 | Semi-arid
Rasht | 49° 37′ | 37° 19′ | −8.6 | Very humid

2.2. Fractionally autoregressive integrated moving average

The fractionally ARIMA (FARIMA) model is a generalization of the linear autoregressive integrated moving average (ARIMA) time series model that uses non-integer values of the differencing parameter instead of integer values. It is applicable to time series with long-term memory, whose sample autocorrelations decay to zero more slowly than those of an AR(1) model (Mehdizadeh et al., 2019). The general equation of a FARIMA(p,d,q) model can be expressed as follows (Yang and Bowling, 2014):

$$\phi_p(B)\,(1-B)^d\,y_t = \psi_q(B)\,\varepsilon_t \qquad (1)$$

where $y_t$ is the observed time series in standardized form; $\phi_p(B)$ is an autoregressive polynomial of order p; $\psi_q(B)$ is a moving average polynomial of order q; B is the backward shift operator, i.e., $Bx_t = x_{t-1}$; d is the fractional differencing parameter; and $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$ is an independent identically distributed (i.i.d.) normal error series. A non-integer d describes a long-memory process; the ARMA part of the model is then applied to the fractionally differenced series $z_t$, obtained from $y_t$ as follows:

$$z_t = (1-B)^d\,y_t = [\phi_p(B)]^{-1}\,\psi_q(B)\,\varepsilon_t$$

$$(1-B)^d = 1 - dB + \frac{d(d-1)}{2!}B^2 - \frac{d(d-1)(d-2)}{3!}B^3 + \cdots \qquad (2)$$

where the formal binomial expansion $(1-B)^d$ is the fractional differencing operator and $z_t$ is the fractionally differenced series. In practice, the expansion in Eq. (2) is truncated at a suitably large lag L, which is the final lag applied to the $y_t$ series (Metcalfe and Cowpertwait, 2009); L = 30 expanded terms can be used. For example, if d = 0.45, then $z_t = y_t - 0.45\,y_{t-1} - 0.12375\,y_{t-2} - \cdots - 0.00203\,y_{t-30}$. In general, three basic steps are required to implement a linear FARIMA model: model identification, model estimation, and model diagnostic checking. Readers can refer to Hipel and McLeod (1996) for more details on the FARIMA modeling procedure.
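The truncated expansion of Eq. (2) can be generated with the standard recursion for binomial coefficients, π_0 = 1 and π_k = π_{k−1}(k − 1 − d)/k, so that z_t = Σ_{k=0}^{L} π_k y_{t−k}. The NumPy sketch below assumes L = 30 as in the text; the function names are hypothetical, and the sketch illustrates the expansion rather than reproducing the software used by the authors. It recovers the d = 0.45 example above.

```python
import numpy as np


def frac_diff_weights(d: float, n_lags: int = 30) -> np.ndarray:
    """Weights pi_0..pi_L of the truncated binomial expansion of (1 - B)^d (Eq. 2)."""
    w = np.empty(n_lags + 1)
    w[0] = 1.0
    for k in range(1, n_lags + 1):
        # Consecutive binomial terms satisfy pi_k = pi_{k-1} * (k - 1 - d) / k.
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w


def frac_diff(y, d: float, n_lags: int = 30) -> np.ndarray:
    """z_t = sum_{k=0}^{L} pi_k * y_{t-k}; the first n_lags values are left as NaN."""
    y = np.asarray(y, dtype=float)
    w = frac_diff_weights(d, n_lags)
    z = np.full(y.shape, np.nan)
    for t in range(n_lags, len(y)):
        window = y[t - n_lags : t + 1][::-1]  # y_t, y_{t-1}, ..., y_{t-L}
        z[t] = np.dot(w, window)
    return z


# Lag coefficients as written in z_t = y_t - c_1*y_{t-1} - c_2*y_{t-2} - ... (c_k = -pi_k).
c = -frac_diff_weights(0.45)[1:]
print(round(c[0], 5), round(c[1], 5), round(c[29], 5))  # d = 0.45 example: 0.45, 0.12375, 0.00203
```

For 0 < d < 1 every c_k is positive and decays slowly with k, which is why a long lag such as L = 30 is kept; with d = 0 the operator leaves the series unchanged and with d = 1 it reduces to ordinary first differencing.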

Fig. 2. Time series graphs of the measured daily ST data at various depths for the studied stations.

2.3. Gene expression programming

Gene expression programming (GEP), suggested by Ferreira (2001), is an advanced form of the genetic algorithm (GA), an optimization algorithm (Holland, 1975; Goldberg, 1989), and of genetic programming (GP), a biologically inspired technique (Koza, 1992). The first step in GEP is generating an initial population. Next, the chromosomes, each consisting of genes, are expressed as expression trees that represent candidate solutions to the defined problem. These solutions are evaluated with a fitness function to determine their suitability. If the best solution is found, the program stops; otherwise, the best solution of the current generation is retained and the evolution process is repeated. The GeneXproTools program was used in this study for ST estimation. Implementing GEP for a specific problem includes the following steps:

1. A fitness function is determined to assess the solutions; the root mean square error (RMSE) was applied in this study.
2. The terminal set and function set are defined. The terminal set consists of the input variables. The function set used here was +, −, ×, ÷, ln x, e^x, x^2, x^3, √x, ∛x, sin x, cos x, and arctan x.
3. The chromosomal structure is determined: the number of chromosomes (30), the number of genes per chromosome (3), and the head size (7).
4. A linking function is defined to connect the sub-expression trees; addition was used in this work.
5. Finally, the rates of the genetic operators are specified. The rates used in this study were: mutation = 0.044; inversion = 0.1; IS transposition = 0.1; RIS transposition = 0.1; one-point recombination = 0.3; two-point recombination = 0.3; gene recombination = 0.1; gene transposition = 0.1.

More details about GEP can be found in Mehdizadeh et al. (2017b).

2.4. Feed-forward back propagation neural network

The feed-forward back propagation neural network (FFBPNN) is the most popular artificial neural network (ANN) technique in engineering applications. It includes three layers, namely the input, hidden, and output layers, and provides a general framework for finding a nonlinear functional mapping between input and output parameters. Inputs are combined linearly and then transformed nonlinearly by an activation function. In the FFBPNN structure, the nodes of the input layer pass their elements as input signals to the second (hidden) layer, whose outputs serve as inputs to the third (output) layer. The computation nodes, or hidden neurons, in the hidden layer intervene between the input and output of the model. The network was trained with the Levenberg-Marquardt (LM) optimization algorithm. Additional hidden layers allow the network to extract higher-order statistics; however, one hidden layer has been reported to be adequate for engineering problems (Hornik et al., 1989; Safari et al., 2016). The output of an FFBPNN model can be expressed as (Kim and Valdés, 2003; Nourani, 2017):

$$\hat{y}_k = f_0\left[\sum_{j=1}^{M} w_{kj}\, f_h\left(\sum_{i=1}^{N} w_{ji}\,x_i + b_{j0}\right) + b_{k0}\right] \qquad (3)$$

where $f_0$ and $f_h$ are the activation functions of the output and hidden neurons, respectively; N and M are the numbers of input and hidden neurons, respectively; $w_{kj}$ is the output-layer weight connecting the jth hidden neuron to the kth output neuron; $w_{ji}$ is the hidden-layer weight connecting the ith input neuron to the jth hidden neuron; $b_{j0}$ is the bias of the jth hidden neuron; and $b_{k0}$ is the bias of the kth output neuron. To calibrate the FFBPNN, the number of hidden neurons must be adjusted to obtain the best performance; this was done by trial and error, evaluating FFBPNN models with 1–15 hidden neurons for all cases. The FFBPNN codes were written in MATLAB.
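A minimal NumPy sketch of the forward pass in Eq. (3) for a single hidden layer is given below. The activation functions are assumptions, since the section does not state them: a hyperbolic tangent for f_h and a linear (identity) f_0. The weights are random placeholders rather than LM-trained values, the sizes are hypothetical, and the single input stands for the one-day-lagged standardized ST.

```python
import numpy as np


def ffbpnn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Output of a one-hidden-layer FFBPNN following Eq. (3).

    x: (N,) inputs x_i; w_hidden: (M, N) weights w_ji; b_hidden: (M,) biases b_j0;
    w_out: (K, M) weights w_kj; b_out: (K,) biases b_k0.
    Assumed activations: f_h = tanh (hidden) and f_0 = identity (output).
    """
    hidden = np.tanh(w_hidden @ x + b_hidden)  # f_h(sum_i w_ji x_i + b_j0) for each hidden neuron j
    return w_out @ hidden + b_out              # sum_j w_kj * hidden_j + b_k0, with f_0 = identity


# Hypothetical sizes: one input (standardized ST on day t-1), 5 hidden neurons, one output.
rng = np.random.default_rng(seed=0)
N, M, K = 1, 5, 1
w_hidden, b_hidden = rng.normal(size=(M, N)), rng.normal(size=M)
w_out, b_out = rng.normal(size=(K, M)), rng.normal(size=K)

st_lag1 = np.array([0.3])  # hypothetical standardized ST on day t-1
print(ffbpnn_forward(st_lag1, w_hidden, b_hidden, w_out, b_out))  # untrained estimate for day t
```

In practice the weights would be fitted on the training period (with LM in this study) and M would be chosen by the 1–15 trial-and-error search; neither step is shown here.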

Table 2 The daily statistical parameters of the measured ST data at various depths for the considered stations.
Station | Dataset | Soil depth (cm) | xmin (°C) | xmax (°C) | xmean (°C) | xsd (°C) | xs | xcv
Isfahan | Training | 5 | −3.33 | 45.53 | 21.08 | 11.56 | −0.04 | 0.55
Isfahan | Training | 10 | −2.73 | 39.60 | 20.75 | 11.34 | −0.10 | 0.55
Isfahan | Training | 50 | 4.73 | 34.53 | 20.36 | 8.36 | −0.09 | 0.41
Isfahan | Training | 100 | 7.47 | 31.97 | 20.58 | 6.27 | −0.07 | 0.30
Isfahan | Testing | 5 | −3.00 | 42.13 | 21.74 | 11.76 | −0.05 | 0.54
Isfahan | Testing | 10 | −1.20 | 39.80 | 20.91 | 10.92 | −0.09 | 0.52
Isfahan | Testing | 50 | 6.40 | 33.53 | 20.73 | 7.95 | −0.06 | 0.38
Isfahan | Testing | 100 | 10.40 | 30.60 | 21.01 | 6.06 | −0.04 | 0.29
Urmia | Training | 5 | −7.73 | 39.00 | 15.24 | 11.40 | 0.03 | 0.75
Urmia | Training | 10 | −7.47 | 35.47 | 14.70 | 10.66 | 0.01 | 0.73
Urmia | Training | 50 | −1.30 | 28.70 | 14.65 | 8.45 | 0.01 | 0.58
Urmia | Training | 100 | 3.20 | 26.70 | 14.71 | 6.60 | 0.08 | 0.45
Urmia | Testing | 5 | −4.87 | 38.30 | 15.23 | 11.63 | 0.02 | 0.76
Urmia | Testing | 10 | −4.80 | 35.10 | 14.52 | 10.86 | −0.01 | 0.75
Urmia | Testing | 50 | 0.17 | 27.93 | 14.27 | 8.53 | −0.01 | 0.60
Urmia | Testing | 100 | 4.07 | 27.00 | 14.03 | 6.38 | 0.03 | 0.45
Rasht | Training | 5 | 0.00 | 37.80 | 18.23 | 8.02 | 0.01 | 0.44
Rasht | Training | 10 | 0.67 | 34.67 | 18.05 | 7.65 | −0.04 | 0.42
Rasht | Training | 50 | 3.47 | 31.40 | 17.98 | 6.27 | −0.07 | 0.35
Rasht | Training | 100 | 5.17 | 27.73 | 17.72 | 5.08 | −0.04 | 0.29
Rasht | Testing | 5 | 0.33 | 40.20 | 19.79 | 9.79 | 0.22 | 0.49
Rasht | Testing | 10 | 0.73 | 37.40 | 19.30 | 9.19 | 0.16 | 0.48
Rasht | Testing | 50 | 4.07 | 32.20 | 19.00 | 6.91 | 0.09 | 0.36
Rasht | Testing | 100 | 7.13 | 29.87 | 18.84 | 5.63 | 0.07 | 0.30
xmin, xmax, xmean, xsd, xs, and xcv denote the minimum, maximum, mean, standard deviation, skewness coefficient, and coefficient of variation of the measured ST data, respectively.

2.5. Classical and hybrid models development

Time series-based models (e.g., FARIMA in this study) work on standardized observations and provide mathematical equations for estimating the measured data. They are also known as stochastic models because they estimate the stochastic component of the measured data. Machine learning-based techniques (e.g., GEP and FFBPNN in this study), in contrast, require a set of input data to estimate the target variable and are able to simulate the deterministic part of the measured data. Here, historical ST records (i.e., one-day lagged data) in standardized form were used as inputs to construct the GEP and FFBPNN models. The hybrid models (i.e., GEP-FARIMA and FFBPNN-FARIMA) were developed using the following relationship:

$$ST_h = ST_s + ST_d \qquad (4)$$

where $ST_h$ is the ST estimated by the hybrid models, $ST_s$ is the stochastic component of the ST data estimated by FARIMA, and $ST_d$ is the deterministic component of the ST data estimated by the GEP or FFBPNN model. In other words, each classical model estimates only a stochastic or a deterministic component, whereas the hybrid models proposed in this study consider both components in estimating ST.

3. Results

The entire daily ST datasets of the studied stations at the different soil depths were standardized using the following equation:

$$ST_{st} = \frac{ST_m - \overline{ST}_m}{\sigma_{ST_m}} \qquad (5)$$

where $ST_{st}$, $ST_m$, $\overline{ST}_m$, and $\sigma_{ST_m}$ are the standardized ST data, the measured ST data, the mean of the measured ST data, and the standard deviation of the measured ST data, respectively. The standardized ST data were then used to construct the classical FARIMA, GEP, and FFBPNN models, as well as the hybrid GEP-FARIMA and FFBPNN-FARIMA models. The accuracies of the classical and proposed hybrid models for estimating daily ST at the surface (5 and 10 cm) and deep (50 and 100 cm) soil layers were compared using three statistical measures, namely the root mean square error (RMSE), mean absolute error (MAE), and relative root mean square error (RRMSE):

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(ST_{m,i} - ST_{e,i}\right)^2} \qquad (6)$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|ST_{m,i} - ST_{e,i}\right| \qquad (7)$$

$$RRMSE = \frac{RMSE}{\overline{ST}_m} \times 100\% \qquad (8)$$

where $ST_{m,i}$ and $ST_{e,i}$ are the measured and estimated soil temperature on the ith day, and $\overline{ST}_m$ and N are the mean of the observed soil temperature data and the total number of observations in each of the training and testing datasets, respectively. Based on RRMSE, model accuracy can be described as excellent (RRMSE < 10%), good (10% < RRMSE < 20%), fair (20% < RRMSE < 30%), or poor (RRMSE > 30%) (Mehdizadeh et al., 2017c).

First, the optimal time series-based FARIMA models were fitted to the standardized ST data of the selected stations at the various soil depths. The equations of the optimal FARIMA models are given in Tables 3a–3c, and the RMSE, MAE, and RRMSE of these models at different depths during the training and testing phases are given in Table 4. Based on RRMSE, the FARIMA models at all depths of the Isfahan station are excellent. For the Urmia and Rasht stations, the FARIMA models at the surface layers (i.e., 5 and 10 cm) fall in the good class, whereas those at the deep layers (i.e., 50 and 100 cm) are excellent. In addition, the accuracy of the FARIMA models improves with increasing soil depth, so that the 5 and 100 cm depths show the worst and best performances, respectively, at the studied regions. The error statistics at 100 cm are RMSE = 0.69 °C, MAE = 0.50 °C, RRMSE = 3.34% (training) and RMSE = 0.68 °C, MAE = 0.53 °C, RRMSE = 3.22% (testing) at the Isfahan station; RMSE = 1.45 °C, MAE = 1.18 °C, RRMSE = 9.86% (training) and RMSE = 0.82 °C, MAE = 0.68 °C, RRMSE = 5.85% (testing) at the Urmia station; and RMSE = 1.12 °C, MAE = 0.88 °C, RRMSE = 6.33% (training) and RMSE = 1.43 °C, MAE = 1.12 °C, RRMSE = 7.58% (testing) at the Rasht station.

As already mentioned, for the machine learning-based GEP and FFBPNN models, the ST of each day at the studied stations was estimated from the one-day lagged ST data. For GEP, 1000 generations were used as the stopping criterion; the accuracy of the GEP models at the various depths generally did not change beyond 600–700 generations. For FFBPNN, trial-and-error tests with 1–15 hidden neurons were performed to achieve the best performance, and they revealed only slight differences in accuracy across these configurations. Table 5 presents the optimal structures of the FFBPNN models developed at the different depths of the considered regions. The statistical errors of the classical GEP and FFBPNN models at different soil depths are presented in Tables 6–8 for the Isfahan, Urmia, and Rasht stations, respectively. Based on RRMSE, the GEP and FFBPNN models at all depths of all stations fall in the excellent class. Similar to the FARIMA models, the GEP and FFBPNN models show the best performance at 100 cm, and their performance deteriorates with decreasing depth. The most important feature of the GEP technique is that it provides explicit algebraic relationships between the input and output parameters, which gives it an advantage over machine learning models that do not yield explicit equations. The algebraic equations obtained for the different soil depths at the studied areas are given in Table 9. These equations can be applied to estimate soil

Table 3a Equations of FARIMA models fitted to the daily ST data at various depths (Isfahan station).

Soil depth 5 cm: FARIMA(1,0.254,1)
$z_t = ST_t - 0.25408 ST_{t-1} - 0.09476 ST_{t-2} - 0.05514 ST_{t-3} - 0.03785 ST_{t-4} - 0.02836 ST_{t-5} - 0.02243 ST_{t-6} - 0.01841 ST_{t-7} - 0.01552 ST_{t-8} - 0.01364 ST_{t-9} - 0.01168 ST_{t-10} - 0.01035 ST_{t-11} - 0.00927 ST_{t-12} - 0.00837 ST_{t-13} - 0.00762 ST_{t-14} - 0.00699 ST_{t-15} - 0.00644 ST_{t-16} - 0.00967 ST_{t-17} - 0.00555 ST_{t-18} - 0.00518 ST_{t-19} - 0.00486 ST_{t-20} - 0.00457 ST_{t-21} - 0.00430 ST_{t-22} - 0.00407 ST_{t-23} - 0.00386 ST_{t-24} - 0.00366 ST_{t-25} - 0.00349 ST_{t-26} - 0.00332 ST_{t-27} - 0.00332 ST_{t-28} - 0.00304 ST_{t-29} - 0.00291 ST_{t-30}$
$z_t = 0.6617 z_{t-1} - 0.2826 \varepsilon_{t-1}$

Soil depth 10 cm: FARIMA(1,0.235,1)
$z_t = ST_t - 0.23585 ST_{t-1} - 0.09011 ST_{t-2} - 0.05299 ST_{t-3} - 0.03661 ST_{t-4} - 0.02756 ST_{t-5} - 0.02188 ST_{t-6} - 0.01802 ST_{t-7} - 0.01524 ST_{t-8} - 0.01314 ST_{t-9} - 0.01152 ST_{t-10} - 0.01022 ST_{t-11} - 0.00917 ST_{t-12} - 0.00830 ST_{t-13} - 0.00756 ST_{t-14} - 0.00694 ST_{t-15} - 0.00640 ST_{t-16} - 0.00594 ST_{t-17} - 0.00553 ST_{t-18} - 0.00517 ST_{t-19} - 0.00485 ST_{t-20} - 0.00456 ST_{t-21} - 0.00431 ST_{t-22} - 0.00408 ST_{t-23} - 0.00387 ST_{t-24} - 0.00367 ST_{t-25} - 0.00350 ST_{t-26} - 0.00334 ST_{t-27} - 0.00319 ST_{t-28} - 0.00306 ST_{t-29} - 0.00293 ST_{t-30}$
$z_t = 0.6859 z_{t-1} - 0.1991 \varepsilon_{t-1}$

Soil depth 50 cm: FARIMA(1,0.465,3)
$z_t = ST_t - 0.46525 ST_{t-1} - 0.12439 ST_{t-2} - 0.06363 ST_{t-3} - 0.04032 ST_{t-4} - 0.02850 ST_{t-5} - 0.02154 ST_{t-6} - 0.01703 ST_{t-7} - 0.01391 ST_{t-8} - 0.01165 ST_{t-9} - 0.00994 ST_{t-10} - 0.00861 ST_{t-11} - 0.00756 ST_{t-12} - 0.00671 ST_{t-13} - 0.00601 ST_{t-14} - 0.00542 ST_{t-15} - 0.00492 ST_{t-16} - 0.00450 ST_{t-17} - 0.00413 ST_{t-18} - 0.00381 ST_{t-19} - 0.00353 ST_{t-20} - 0.00329 ST_{t-21} - 0.00307 ST_{t-22} - 0.00287 ST_{t-23} - 0.00270 ST_{t-24} - 0.00254 ST_{t-25} - 0.00239 ST_{t-26} - 0.00226 ST_{t-27} - 0.00214 ST_{t-28} - 0.00204 ST_{t-29} - 0.00194 ST_{t-30}$
$z_t = 0.7305 z_{t-1} - 0.3729 \varepsilon_{t-1} + 0.0807 \varepsilon_{t-2} + 0.0515 \varepsilon_{t-3}$

Soil depth 100 cm: FARIMA(2,0.541,4)
$z_t = ST_t - 0.54105 ST_{t-1} - 0.12415 ST_{t-2} - 0.06037 ST_{t-3} - 0.03711 ST_{t-4} - 0.02567 ST_{t-5} - 0.01908 ST_{t-6} - 0.01488 ST_{t-7} - 0.01201 ST_{t-8} - 0.00995 ST_{t-9} - 0.00842 ST_{t-10} - 0.00724 ST_{t-11} - 0.00631 ST_{t-12} - 0.00556 ST_{t-13} - 0.00495 ST_{t-14} - 0.00444 ST_{t-15} - 0.00401 ST_{t-16} - 0.00365 ST_{t-17} - 0.00333 ST_{t-18} - 0.00306 ST_{t-19} - 0.00283 ST_{t-20} - 0.00262 ST_{t-21} - 0.00243 ST_{t-22} - 0.00227 ST_{t-23} - 0.00213 ST_{t-24} - 0.00199 ST_{t-25} - 0.00188 ST_{t-26} - 0.00177 ST_{t-27} - 0.00167 ST_{t-28} - 0.00158 ST_{t-29} - 0.00150 ST_{t-30}$
$z_t = 0.0169 z_{t-1} + 0.7256 z_{t-2} - 0.1660 \varepsilon_{t-1} - 0.6551 \varepsilon_{t-2} + 0.1867 \varepsilon_{t-3} + 0.0862 \varepsilon_{t-4}$
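To show how the two equations in each row of Table 3a combine, the sketch below produces one-step estimates of standardized ST from the Isfahan 10 cm FARIMA(1,0.235,1) model. The ARMA equation gives ẑ_t from the previous z and residual, and inverting the fractional differencing gives ŷ_t = ẑ_t + Σ_{k=1}^{30} c_k y_{t−k}. Recomputing the c_k from d = 0.23585 (the lag-1 coefficient in the table) instead of copying them, the residual recursion ε_{t−1} = z_{t−1} − ẑ_{t−1}, and the zero initial residual are assumptions of this hypothetical sketch, not procedures stated in the paper.

```python
import numpy as np

# Isfahan station, 10 cm, FARIMA(1,0.235,1) from Table 3a:
# z_t = ST_t - 0.23585 ST_{t-1} - ...   and   z_t = 0.6859 z_{t-1} - 0.1991 eps_{t-1}
D, PHI, THETA, N_LAGS = 0.23585, 0.6859, -0.1991, 30


def lag_coefficients(d: float, n_lags: int) -> np.ndarray:
    """c_1..c_L in z_t = y_t - sum_k c_k y_{t-k}, from the truncated expansion of (1 - B)^d."""
    pi = [1.0]
    for k in range(1, n_lags + 1):
        pi.append(pi[-1] * (k - 1 - d) / k)
    return -np.array(pi[1:])


def farima_one_step(y, d=D, phi=PHI, theta=THETA, n_lags=N_LAGS) -> np.ndarray:
    """One-step estimates of a standardized series from a FARIMA(1,d,1) model.

    z_hat_t = phi * z_{t-1} + theta * eps_{t-1}, with eps_{t-1} = z_{t-1} - z_hat_{t-1};
    y_hat_t = z_hat_t + sum_k c_k y_{t-k}. Entries before index n_lags + 1 are NaN.
    """
    y = np.asarray(y, dtype=float)
    c = lag_coefficients(d, n_lags)
    y_hat = np.full(y.shape, np.nan)
    z_prev, eps_prev = None, 0.0  # assumption: the first residual is taken as zero
    for t in range(n_lags, len(y)):
        memory = np.dot(c, y[t - n_lags : t][::-1])  # sum_k c_k y_{t-k}
        z_t = y[t] - memory                           # observed fractionally differenced value
        if z_prev is not None:
            z_hat = phi * z_prev + theta * eps_prev
            y_hat[t] = z_hat + memory
            eps_prev = z_t - z_hat
        z_prev = z_t
    return y_hat
```

Applied to a standardized ST series, the estimates can be returned to °C by inverting Eq. (5); the same function covers the other FARIMA(1,d,1) rows by changing D, PHI and THETA.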

Table 3b Equations of FARIMA models ﬁtted to the daily ST data at various depths (Urmia station). Soil depth (cm)

Models

5

FARIMA(1,0.230,1)

10

FARIMA(1,0.320,1)

50

FARIMA(4,0.753,1)

100

FARIMA(5,0.863,2)

Equations

zt

zt

zt

zt

$$
\begin{aligned}
z_t = ST_t &- 0.23088\,ST_{t-1} - 0.08878\,ST_{t-2} - 0.05235\,ST_{t-3} - 0.03624\,ST_{t-4} - 0.02732\,ST_{t-5} \\
&- 0.02171\,ST_{t-6} - 0.01789\,ST_{t-7} - 0.01514\,ST_{t-8} - 0.01307\,ST_{t-9} - 0.01146\,ST_{t-10} \\
&- 0.01018\,ST_{t-11} - 0.00913\,ST_{t-12} - 0.00827\,ST_{t-13} - 0.00754\,ST_{t-14} - 0.00692\,ST_{t-15} \\
&- 0.00639\,ST_{t-16} - 0.00593\,ST_{t-17} - 0.00552\,ST_{t-18} - 0.00516\,ST_{t-19} - 0.00484\,ST_{t-20} \\
&- 0.00456\,ST_{t-21} - 0.00430\,ST_{t-22} - 0.00407\,ST_{t-23} - 0.00386\,ST_{t-24} - 0.00367\,ST_{t-25} \\
&- 0.00350\,ST_{t-26} - 0.00334\,ST_{t-27} - 0.00319\,ST_{t-28} - 0.00306\,ST_{t-29} - 0.00293\,ST_{t-30}
\end{aligned}
$$

$$z_t = 0.6468\,z_{t-1} - 0.1381\,\varepsilon_{t-1}$$

$$
\begin{aligned}
z_t = ST_t &- 0.32064\,ST_{t-1} - 0.10891\,ST_{t-2} - 0.06096\,ST_{t-3} - 0.04083\,ST_{t-4} - 0.03005\,ST_{t-5} \\
&- 0.02343\,ST_{t-6} - 0.01901\,ST_{t-7} - 0.01587\,ST_{t-8} - 0.01354\,ST_{t-9} - 0.01175\,ST_{t-10} \\
&- 0.01034\,ST_{t-11} - 0.00920\,ST_{t-12} - 0.00827\,ST_{t-13} - 0.00749\,ST_{t-14} - 0.00683\,ST_{t-15} \\
&- 0.00626\,ST_{t-16} - 0.00578\,ST_{t-17} - 0.00535\,ST_{t-18} - 0.00498\,ST_{t-19} - 0.00465\,ST_{t-20} \\
&- 0.00436\,ST_{t-21} - 0.00410\,ST_{t-22} - 0.00386\,ST_{t-23} - 0.00365\,ST_{t-24} - 0.00345\,ST_{t-25} \\
&- 0.00328\,ST_{t-26} - 0.00312\,ST_{t-27} - 0.00297\,ST_{t-28} - 0.00284\,ST_{t-29} - 0.00271\,ST_{t-30}
\end{aligned}
$$

$$z_t = 0.6078\,z_{t-1} - 0.1092\,\varepsilon_{t-1}$$

$$
\begin{aligned}
z_t = ST_t &- 0.75337\,ST_{t-1} - 0.09290\,ST_{t-2} - 0.03860\,ST_{t-3} - 0.02168\,ST_{t-4} - 0.01407\,ST_{t-5} \\
&- 0.00996\,ST_{t-6} - 0.00746\,ST_{t-7} - 0.00583\,ST_{t-8} - 0.00469\,ST_{t-9} - 0.00387\,ST_{t-10} \\
&- 0.00325\,ST_{t-11} - 0.00277\,ST_{t-12} - 0.00240\,ST_{t-13} - 0.00210\,ST_{t-14} - 0.00185\,ST_{t-15} \\
&- 0.00165\,ST_{t-16} - 0.00148\,ST_{t-17} - 0.00133\,ST_{t-18} - 0.00121\,ST_{t-19} - 0.00110\,ST_{t-20} \\
&- 0.00101\,ST_{t-21} - 0.00093\,ST_{t-22} - 0.00086\,ST_{t-23} - 0.00080\,ST_{t-24} - 0.00074\,ST_{t-25} \\
&- 0.00069\,ST_{t-26} - 0.00064\,ST_{t-27} - 0.00060\,ST_{t-28} - 0.00057\,ST_{t-29} - 0.00053\,ST_{t-30}
\end{aligned}
$$

$$z_t = 1.1807\,z_{t-1} - 0.0835\,z_{t-2} - 0.0943\,z_{t-3} - 0.0352\,z_{t-4} - 0.9669\,\varepsilon_{t-1}$$

$$
\begin{aligned}
z_t = ST_t &- 0.86328\,ST_{t-1} - 0.05901\,ST_{t-2} - 0.02236\,ST_{t-3} - 0.01194\,ST_{t-4} - 0.00749\,ST_{t-5} \\
&- 0.00516\,ST_{t-6} - 0.00379\,ST_{t-7} - 0.00290\,ST_{t-8} - 0.00230\,ST_{t-9} - 0.00187\,ST_{t-10} \\
&- 0.00155\,ST_{t-11} - 0.00131\,ST_{t-12} - 0.00112\,ST_{t-13} - 0.00097\,ST_{t-14} - 0.00085\,ST_{t-15} \\
&- 0.00075\,ST_{t-16} - 0.00067\,ST_{t-17} - 0.00060\,ST_{t-18} - 0.00054\,ST_{t-19} - 0.00049\,ST_{t-20} \\
&- 0.00045\,ST_{t-21} - 0.00041\,ST_{t-22} - 0.00037\,ST_{t-23} - 0.00034\,ST_{t-24} - 0.00032\,ST_{t-25} \\
&- 0.00030\,ST_{t-26} - 0.00027\,ST_{t-27} - 0.00026\,ST_{t-28} - 0.00024\,ST_{t-29} - 0.00022\,ST_{t-30}
\end{aligned}
$$

$$z_t = 1.4746\,z_{t-1} - 0.3125\,z_{t-2} - 0.0805\,z_{t-3} - 0.0946\,z_{t-4} + 0.0013\,z_{t-5} - 1.6880\,\varepsilon_{t-1} + 0.6992\,\varepsilon_{t-2}$$
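The coefficients of $ST_{t-1}$ to $ST_{t-30}$ in these FARIMA equations follow the binomial expansion of the fractional difference operator $(1 - B)^d$ (with $B$ the backshift operator), truncated after 30 lags, and each ARMA equation then acts on the fractionally differenced series $z_t$. The Python sketch below is an illustration, not the authors' code: it regenerates the weights from $d$ with the recursion $\pi_1 = d$, $\pi_k = \pi_{k-1}(k - 1 - d)/k$, and turns the Rasht 100 cm model of Table 3c ($d = 0.46111$, $z_t = 0.9217\,z_{t-1} - 0.7074\,\varepsilon_{t-1}$) into a one-step estimate of $ST_t$. The helper names, setting the unknown current innovation $\varepsilon_t$ to zero, and using the previous day's estimation error as $\varepsilon_{t-1}$ are assumptions. For $d = 0.86328$ the first four weights come out as approximately 0.86328, 0.05901, 0.02236 and 0.01194, matching the leading coefficients of the equation above that starts with 0.86328.

```python
"""Sketch: FARIMA fractional-difference weights and a one-step ST estimate.

Hypothetical helper names. Default coefficients are those printed for the
Rasht 100 cm model, FARIMA(1,0.461,1), truncated at 30 lags as printed.
"""


def frac_diff_weights(d, n_lags=30):
    """Return [pi_1, ..., pi_n] with z_t = ST_t - sum_k pi_k * ST_{t-k}.

    From the binomial expansion of (1 - B)^d:
    pi_1 = d and pi_k = pi_{k-1} * (k - 1 - d) / k.
    """
    weights = [d]
    for k in range(2, n_lags + 1):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def farima_one_step(st_hist, eps_prev, d=0.46111, phi=0.9217, theta=-0.7074,
                    n_lags=30):
    """One-step estimate of ST_t from a FARIMA(1,d,1) model.

    st_hist  -- past ST values, oldest first, ending with ST_{t-1};
                at least n_lags + 1 values are needed to form z_{t-1}.
    eps_prev -- ARMA residual at t-1 (assumed: observed minus estimated ST).
    """
    if len(st_hist) < n_lags + 1:
        raise ValueError("need at least n_lags + 1 past values")
    w = frac_diff_weights(d, n_lags)
    # Long-memory term for day t: sum_k pi_k * ST_{t-k}
    memory_t = sum(w[k - 1] * st_hist[-k] for k in range(1, n_lags + 1))
    # Same term for day t-1, used to recover z_{t-1}
    memory_prev = sum(w[k - 1] * st_hist[-k - 1] for k in range(1, n_lags + 1))
    z_prev = st_hist[-1] - memory_prev
    # ARMA(1,1) part; the unknown current innovation eps_t is set to zero
    z_hat = phi * z_prev + theta * eps_prev
    # Undo the fractional difference: ST_t = z_t + memory term
    return z_hat + memory_t


if __name__ == "__main__":
    # Leading weights for d = 0.86328 (compare with the equations above)
    print([round(x, 5) for x in frac_diff_weights(0.86328)[:4]])
    # Illustrative call on a constant 31-day history of 10 degC
    print(farima_one_step([10.0] * 31, eps_prev=0.0))
```

In a rolling evaluation, `eps_prev` would be updated each day as the observed ST minus the estimate returned for that day; because the memory term is known at that point, this equals the error in $z_t$.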


Table 3c. Equations of FARIMA models fitted to the daily ST data at various depths (Rasht station).

Soil depth 5 cm, FARIMA(3,0.581,2):

$$
\begin{aligned}
z_t = ST_t &- 0.58184\,ST_{t-1} - 0.12165\,ST_{t-2} - 0.05750\,ST_{t-3} - 0.03476\,ST_{t-4} - 0.02376\,ST_{t-5} \\
&- 0.01750\,ST_{t-6} - 0.01354\,ST_{t-7} - 0.01086\,ST_{t-8} - 0.00895\,ST_{t-9} - 0.00754\,ST_{t-10} \\
&- 0.00645\,ST_{t-11} - 0.00560\,ST_{t-12} - 0.00492\,ST_{t-13} - 0.00436\,ST_{t-14} - 0.00390\,ST_{t-15} \\
&- 0.00352\,ST_{t-16} - 0.00319\,ST_{t-17} - 0.00291\,ST_{t-18} - 0.00266\,ST_{t-19} - 0.00245\,ST_{t-20} \\
&- 0.00227\,ST_{t-21} - 0.00210\,ST_{t-22} - 0.00196\,ST_{t-23} - 0.00183\,ST_{t-24} - 0.00171\,ST_{t-25} \\
&- 0.00161\,ST_{t-26} - 0.00151\,ST_{t-27} - 0.00143\,ST_{t-28} - 0.00135\,ST_{t-29} - 0.00128\,ST_{t-30}
\end{aligned}
$$

$$z_t = 0.4524\,z_{t-1} + 0.7598\,z_{t-2} - 0.2637\,z_{t-3} - 0.1678\,\varepsilon_{t-1} - 0.8001\,\varepsilon_{t-2}$$

Soil depth 10 cm, FARIMA(5,0.465,2):

$$
\begin{aligned}
z_t = ST_t &- 0.46539\,ST_{t-1} - 0.12440\,ST_{t-2} - 0.06363\,ST_{t-3} - 0.04032\,ST_{t-4} - 0.02850\,ST_{t-5} \\
&- 0.02154\,ST_{t-6} - 0.01703\,ST_{t-7} - 0.01391\,ST_{t-8} - 0.01164\,ST_{t-9} - 0.00994\,ST_{t-10} \\
&- 0.00861\,ST_{t-11} - 0.00756\,ST_{t-12} - 0.00671\,ST_{t-13} - 0.00600\,ST_{t-14} - 0.00542\,ST_{t-15} \\
&- 0.00492\,ST_{t-16} - 0.00450\,ST_{t-17} - 0.00413\,ST_{t-18} - 0.00381\,ST_{t-19} - 0.00353\,ST_{t-20} \\
&- 0.00328\,ST_{t-21} - 0.00307\,ST_{t-22} - 0.00287\,ST_{t-23} - 0.00269\,ST_{t-24} - 0.00254\,ST_{t-25} \\
&- 0.00239\,ST_{t-26} - 0.00226\,ST_{t-27} - 0.00214\,ST_{t-28} - 0.00204\,ST_{t-29} - 0.00194\,ST_{t-30}
\end{aligned}
$$

$$z_t = -0.4340\,z_{t-1} - 0.4052\,z_{t-2} + 0.4701\,z_{t-3} - 0.0372\,z_{t-4} - 0.0396\,z_{t-5} + 0.9761\,\varepsilon_{t-1} + 0.9289\,\varepsilon_{t-2}$$

Soil depth 50 cm, FARIMA(4,0.399,1):

$$
\begin{aligned}
z_t = ST_t &- 0.39966\,ST_{t-1} - 0.11996\,ST_{t-2} - 0.06399\,ST_{t-3} - 0.04160\,ST_{t-4} - 0.02995\,ST_{t-5} \\
&- 0.02296\,ST_{t-6} - 0.01837\,ST_{t-7} - 0.01516\,ST_{t-8} - 0.01280\,ST_{t-9} - 0.01101\,ST_{t-10} \\
&- 0.00960\,ST_{t-11} - 0.00848\,ST_{t-12} - 0.00757\,ST_{t-13} - 0.00681\,ST_{t-14} - 0.00618\,ST_{t-15} \\
&- 0.00564\,ST_{t-16} - 0.00517\,ST_{t-17} - 0.00477\,ST_{t-18} - 0.00442\,ST_{t-19} - 0.00411\,ST_{t-20} \\
&- 0.00383\,ST_{t-21} - 0.00359\,ST_{t-22} - 0.00337\,ST_{t-23} - 0.00317\,ST_{t-24} - 0.00300\,ST_{t-25} \\
&- 0.00283\,ST_{t-26} - 0.00269\,ST_{t-27} - 0.00255\,ST_{t-28} - 0.00243\,ST_{t-29} - 0.00232\,ST_{t-30}
\end{aligned}
$$

$$z_t = 1.3578\,z_{t-1} - 0.3473\,z_{t-2} - 0.0238\,z_{t-3} - 0.0282\,z_{t-4} - 0.8188\,\varepsilon_{t-1}$$

Soil depth 100 cm, FARIMA(1,0.461,1):

$$
\begin{aligned}
z_t = ST_t &- 0.46111\,ST_{t-1} - 0.12424\,ST_{t-2} - 0.06373\,ST_{t-3} - 0.04045\,ST_{t-4} - 0.02863\,ST_{t-5} \\
&- 0.02165\,ST_{t-6} - 0.01713\,ST_{t-7} - 0.01400\,ST_{t-8} - 0.01173\,ST_{t-9} - 0.01001\,ST_{t-10} \\
&- 0.00868\,ST_{t-11} - 0.00763\,ST_{t-12} - 0.00677\,ST_{t-13} - 0.00606\,ST_{t-14} - 0.00547\,ST_{t-15} \\
&- 0.00497\,ST_{t-16} - 0.00454\,ST_{t-17} - 0.00417\,ST_{t-18} - 0.00385\,ST_{t-19} - 0.00357\,ST_{t-20} \\
&- 0.00332\,ST_{t-21} - 0.00310\,ST_{t-22} - 0.00290\,ST_{t-23} - 0.00273\,ST_{t-24} - 0.00257\,ST_{t-25} \\
&- 0.00242\,ST_{t-26} - 0.00229\,ST_{t-27} - 0.00217\,ST_{t-28} - 0.00206\,ST_{t-29} - 0.00196\,ST_{t-30}
\end{aligned}
$$

$$z_t = 0.9217\,z_{t-1} - 0.7074\,\varepsilon_{t-1}$$

temperature of each day (i.e., $ST_t$) at various depths using the soil temperature of the previous day (i.e., $ST_{t-1}$). As noted, one of the main goals of this study is to improve the estimation of daily ST at various depths with novel hybrid models. For this purpose, the time series-based FARIMA was combined with the machine learning-based GEP and FFBPNN models. The results obtained for the resulting hybrid models, named GEP-FARIMA and FFBPNN-FARIMA, are given in Tables 6–8. As seen, the proposed hybrid models outperform the classical FARIMA, GEP, and FFBPNN models. At all of the studied stations, the largest improvements over the classical models were obtained at a depth of 5 cm. For example, at the 5 cm depth during the testing phase, the RMSE, MAE, and RRMSE of the GEP model decrease from 1.47 °C, 1.05 °C, and 6.76% to 0.24 °C, 0.18 °C, and 1.12% at Isfahan station, from 1.18 °C, 0.83 °C, and 7.75% to 0.11 °C, 0.08 °C, and 0.71% at Urmia station, and from 1.60 °C, 1.20 °C, and 8.09% to 0.25 °C, 0.18 °C, and 1.25% at Rasht station when the proposed GEP-FARIMA hybrid model is used. In addition, like the classical FARIMA, GEP, and FFBPNN models, the proposed hybrid models give better results in deep soil layers than in the surface layers.

Table 5. The optimal structures of the FFBPNN models developed at various depths for the studied stations.

| Station | 5 cm | 10 cm | 50 cm | 100 cm |
|---|---|---|---|---|
| Isfahan | (1,2,1) | (1,1,1) | (1,1,1) | (1,3,1) |
| Urmia | (1,3,1) | (1,1,1) | (1,1,1) | (1,2,1) |
| Rasht | (1,1,1) | (1,1,1) | (1,1,1) | (1,1,1) |

Structures are given as (input layer neurons, hidden layer neurons, output layer neurons).

However, the proposed hybrid models performed best at a depth of 50 cm rather than 100 cm at all of the studied stations. To illustrate the performance of the hybrid models relative to the classical models, the 5 cm depth was selected, since it showed the greatest improvements in ST estimation with the hybrid models. Time series plots of the measured ST and of the ST estimated by the proposed GEP-FARIMA hybrid model and the classical FARIMA and GEP models are presented in Figs. 3–5 for the Isfahan, Urmia and Rasht stations,

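Table 4. The statistical results of FARIMA models at various depths during the training and testing phases.

| Station | Soil depth (cm) | Training RMSE (°C) | Training MAE (°C) | Training RRMSE (%) | Testing RMSE (°C) | Testing MAE (°C) | Testing RRMSE (%) |
|---|---|---|---|---|---|---|---|
| Isfahan | 5 | 1.68 | 1.29 | 7.97 | 1.79 | 1.40 | 8.24 |
| Isfahan | 10 | 1.57 | 1.21 | 7.58 | 1.64 | 1.27 | 7.83 |
| Isfahan | 50 | 1.04 | 0.79 | 5.13 | 1.03 | 0.79 | 4.98 |
| Isfahan | 100 | 0.69 | 0.50 | 3.34 | 0.68 | 0.53 | 3.22 |
| Urmia | 5 | 1.76 | 1.36 | 11.54 | 1.72 | 1.38 | 11.27 |
| Urmia | 10 | 1.72 | 1.34 | 11.68 | 1.65 | 1.32 | 11.39 |
| Urmia | 50 | 1.49 | 1.15 | 10.18 | 1.19 | 0.96 | 8.32 |
| Urmia | 100 | 1.45 | 1.18 | 9.86 | 0.82 | 0.68 | 5.85 |
| Rasht | 5 | 2.07 | 1.65 | 11.35 | 3.04 | 2.34 | 15.35 |
| Rasht | 10 | 1.92 | 1.54 | 10.61 | 2.77 | 2.15 | 14.33 |
| Rasht | 50 | 1.53 | 1.20 | 8.51 | 1.77 | 1.37 | 9.32 |
| Rasht | 100 | 1.12 | 0.88 | 6.33 | 1.43 | 1.12 | 7.58 |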


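Table 6. The statistical results of classical GEP, FFBPNN, and hybrid GEP-FARIMA and FFBPNN-FARIMA models at various depths during the training and testing phases (Isfahan station).

| Model | Soil depth (cm) | Training RMSE (°C) | Training MAE (°C) | Training RRMSE (%) | Testing RMSE (°C) | Testing MAE (°C) | Testing RRMSE (%) |
|---|---|---|---|---|---|---|---|
| GEP | 5 | 1.40 | 1.00 | 6.62 | 1.47 | 1.05 | 6.76 |
| GEP | 10 | 1.12 | 0.83 | 5.40 | 1.20 | 0.86 | 5.73 |
| GEP | 50 | 0.34 | 0.23 | 1.68 | 0.34 | 0.21 | 1.65 |
| GEP | 100 | 0.34 | 0.18 | 1.63 | 0.35 | 0.19 | 1.65 |
| GEP-FARIMA | 5 | 0.23 | 0.17 | 1.10 | 0.24 | 0.18 | 1.12 |
| GEP-FARIMA | 10 | 0.13 | 0.10 | 0.62 | 0.13 | 0.10 | 0.61 |
| GEP-FARIMA | 50 | 0.06 | 0.04 | 0.29 | 0.05 | 0.03 | 0.25 |
| GEP-FARIMA | 100 | 0.16 | 0.10 | 0.76 | 0.16 | 0.11 | 0.77 |
| FFBPNN | 5 | 1.39 | 1.00 | 6.62 | 1.48 | 1.06 | 6.82 |
| FFBPNN | 10 | 1.12 | 0.82 | 5.39 | 1.19 | 0.85 | 5.71 |
| FFBPNN | 50 | 0.34 | 0.23 | 1.69 | 0.34 | 0.21 | 1.64 |
| FFBPNN | 100 | 0.34 | 0.18 | 1.63 | 0.35 | 0.18 | 1.66 |
| FFBPNN-FARIMA | 5 | 0.26 | 0.18 | 1.21 | 0.29 | 0.21 | 1.35 |
| FFBPNN-FARIMA | 10 | 0.15 | 0.11 | 0.70 | 0.14 | 0.11 | 0.68 |
| FFBPNN-FARIMA | 50 | 0.07 | 0.05 | 0.35 | 0.06 | 0.04 | 0.31 |
| FFBPNN-FARIMA | 100 | 0.15 | 0.10 | 0.71 | 0.16 | 0.10 | 0.77 |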

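Table 7. The statistical results of classical GEP, FFBPNN, and hybrid GEP-FARIMA and FFBPNN-FARIMA models at various depths during the training and testing phases (Urmia station).

| Model | Soil depth (cm) | Training RMSE (°C) | Training MAE (°C) | Training RRMSE (%) | Testing RMSE (°C) | Testing MAE (°C) | Testing RRMSE (%) |
|---|---|---|---|---|---|---|---|
| GEP | 5 | 1.27 | 0.92 | 8.32 | 1.18 | 0.83 | 7.75 |
| GEP | 10 | 0.97 | 0.71 | 6.60 | 0.96 | 0.65 | 6.61 |
| GEP | 50 | 0.32 | 0.22 | 2.20 | 0.36 | 0.23 | 2.55 |
| GEP | 100 | 0.24 | 0.13 | 1.65 | 0.25 | 0.13 | 1.80 |
| GEP-FARIMA | 5 | 0.12 | 0.09 | 0.76 | 0.11 | 0.08 | 0.71 |
| GEP-FARIMA | 10 | 0.10 | 0.07 | 0.68 | 0.09 | 0.07 | 0.65 |
| GEP-FARIMA | 50 | 0.04 | 0.02 | 0.28 | 0.04 | 0.03 | 0.26 |
| GEP-FARIMA | 100 | 0.09 | 0.05 | 0.58 | 0.09 | 0.05 | 0.67 |
| FFBPNN | 5 | 1.27 | 0.92 | 8.32 | 1.17 | 0.83 | 7.68 |
| FFBPNN | 10 | 0.97 | 0.72 | 6.61 | 0.96 | 0.65 | 6.60 |
| FFBPNN | 50 | 0.32 | 0.22 | 2.22 | 0.37 | 0.24 | 2.57 |
| FFBPNN | 100 | 0.25 | 0.14 | 1.68 | 0.25 | 0.14 | 1.77 |
| FFBPNN-FARIMA | 5 | 0.15 | 0.11 | 0.99 | 0.14 | 0.10 | 0.93 |
| FFBPNN-FARIMA | 10 | 0.12 | 0.09 | 0.83 | 0.11 | 0.09 | 0.77 |
| FFBPNN-FARIMA | 50 | 0.06 | 0.04 | 0.40 | 0.06 | 0.04 | 0.39 |
| FFBPNN-FARIMA | 100 | 0.09 | 0.06 | 0.62 | 0.09 | 0.06 | 0.64 |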

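Table 8. The statistical results of classical GEP, FFBPNN, and hybrid GEP-FARIMA and FFBPNN-FARIMA models at various depths during the training and testing phases (Rasht station).

| Model | Soil depth (cm) | Training RMSE (°C) | Training MAE (°C) | Training RRMSE (%) | Testing RMSE (°C) | Testing MAE (°C) | Testing RRMSE (%) |
|---|---|---|---|---|---|---|---|
| GEP | 5 | 1.20 | 0.88 | 6.59 | 1.60 | 1.20 | 8.09 |
| GEP | 10 | 0.89 | 0.66 | 4.95 | 1.19 | 0.90 | 6.27 |
| GEP | 50 | 0.48 | 0.28 | 2.67 | 0.45 | 0.32 | 2.39 |
| GEP | 100 | 0.38 | 0.20 | 2.16 | 0.42 | 0.26 | 2.24 |
| GEP-FARIMA | 5 | 0.14 | 0.10 | 0.76 | 0.25 | 0.18 | 1.25 |
| GEP-FARIMA | 10 | 0.18 | 0.13 | 1.00 | 0.19 | 0.14 | 0.98 |
| GEP-FARIMA | 50 | 0.10 | 0.05 | 0.53 | 0.08 | 0.06 | 0.42 |
| GEP-FARIMA | 100 | 0.11 | 0.07 | 0.59 | 0.12 | 0.08 | 0.63 |
| FFBPNN | 5 | 1.20 | 0.89 | 6.60 | 1.63 | 1.24 | 8.26 |
| FFBPNN | 10 | 0.89 | 0.66 | 4.95 | 1.21 | 0.91 | 6.29 |
| FFBPNN | 50 | 0.48 | 0.29 | 2.67 | 0.46 | 0.33 | 2.41 |
| FFBPNN | 100 | 0.38 | 0.20 | 2.17 | 0.44 | 0.28 | 2.31 |
| FFBPNN-FARIMA | 5 | 0.16 | 0.12 | 0.88 | 0.37 | 0.24 | 1.89 |
| FFBPNN-FARIMA | 10 | 0.18 | 0.13 | 1.01 | 0.30 | 0.21 | 1.54 |
| FFBPNN-FARIMA | 50 | 0.07 | 0.05 | 0.41 | 0.07 | 0.05 | 0.35 |
| FFBPNN-FARIMA | 100 | 0.11 | 0.07 | 0.61 | 0.15 | 0.10 | 0.81 |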


Table 9. The algebraic equations presented by the GEP at various depths (stations: Isfahan, Urmia and Rasht; soil depths: 5, 10, 50 and 100 cm).

$$ST_t = \big((\exp(ST_{t-1}) - ST_{t-1}) - \cos(6.224151)\big)\sin(\sin(6.224151)) + \left(ST_{t-1} - \frac{\exp(ST_{t-1}) + ST_{t-1}}{6.540314 \times 6.540314}\right) + \exp\!\big((ST_{t-1}\,ST_{t-1})^{1/3} - ((6.224151 + ST_{t-1}) + \arctan(ST_{t-1}))\big)$$

$$ST_t = \exp\!\big(-9.306061 - \cos(((-9.306061 + ST_{t-1}) - 9.306061)^3)\big) + ST_{t-1}\cos(\sin(\sin(4.892487))) + \cos(\arctan(\cos(ST_{t-1}) - 7.712769))\,ST_{t-1}$$

$$ST_t = ST_{t-1} + \arctan(-4.004577)\,(-4.004577\,ST_{t-1})\,\arctan(-4.004577)\exp(-4.004577) + \arctan(ST_{t-1})\,(-4.004577 - ST_{t-1})\exp(-4.004577)\,(-2.363617 + 2.363617)$$

$$ST_t = ST_{t-1} + \big((-0.053833\,ST_{t-1})(0.053833 \times 9.348816) + (ST_{t-1}\,ST_{t-1})(-0.053833)^3\big) + \frac{\cos(ST_{t-1})/(2.736023 \times 2.736023)}{(ST_{t-1} + ST_{t-1}) + 2.736023^3}$$

$$ST_t = \frac{\exp((ST_{t-1} - 7.403839) + ST_{t-1})}{(7.403839 - 6.657135) + (ST_{t-1} + ST_{t-1})} + \frac{ST_{t-1}}{(ST_{t-1}\,ST_{t-1} + \exp(ST_{t-1})) + (-5.015502)^3} + ST_{t-1}$$

$$ST_t = \big((\arctan(-0.445679))^3\arctan(ST_{t-1}) + ST_{t-1}\big) + \sin(-0.814331) + \cos\!\big(\arctan((\sin(7.650086) - \arctan(ST_{t-1})) + ST_{t-1})\big)$$

$$ST_t = \log\!\big(\arctan(((\sin(1.61438) + 1.61438) - ST_{t-1})^{1/3})\big) + \log\!\big(\arctan((5.782898 - ST_{t-1})^{1/3})\big) + \left(\frac{(ST_{t-1} - ST_{t-1})\,ST_{t-1}}{5.782898 - ST_{t-1}} + ST_{t-1}\right)$$

$$ST_t = \exp\!\big(-7.712769 - (2.036682 - ST_{t-1})(ST_{t-1} + 2.036682)\big) + \exp\!\big((ST_{t-1}/ST_{t-1} - (ST_{t-1} + 7.712769)) - ST_{t-1}\big) + ST_{t-1}$$

$$ST_t = \sqrt{\sin\!\big(-5.015502 - ((ST_{t-1} - ST_{t-1}) \times (-3.400207))^{1/3}\big)} + ST_{t-1}\cos\!\big((\sin(-2.363617))^3\big) + \sin\!\big(\arctan((-7.732025 - (ST_{t-1} + ST_{t-1})) + \sin(ST_{t-1}))\big)$$

$$ST_t = \sin(\sin(\sin((ST_{t-1} - ST_{t-1})/ST_{t-1} + ST_{t-1}))) + ST_{t-1} + \big(((ST_{t-1} - ST_{t-1})^3)^2 - \arctan(\exp(0.255005)\,ST_{t-1})\big)$$

$$ST_t = ST_{t-1} + ST_{t-1} + \big(((2.736023 - ST_{t-1}) + (ST_{t-1} - 2.736023)) - ST_{t-1}\arctan(2.736023)\big)$$

$$ST_t = \sin(-2.363617) + ST_{t-1}\cos\!\big(\log((\sin(ST_{t-1}/ST_{t-1}))^2)\big) + \cos\!\big(\cos((\sin(-7.712769) + \cos(2.036682))\cos(2.036682))\big)$$


Fig. 3. Measured vs. estimated daily ST values by the hybrid GEP-FARIMA and classical FARIMA and GEP models at a depth of 5 cm during the testing period (Isfahan station).

respectively. It can be observed that the hybrid models accurately estimate and track the daily ST time series of the stations. Comparing the GEP and FFBPNN models shows negligible differences between their accuracies at the different depths. Furthermore, the GEP and FFBPNN models outperform the FARIMA model. As already mentioned and according to Eq. (4), the ST time series, like other hydrological and meteorological series, comprises stochastic and deterministic components. The stochastic component of the ST data is captured by time series models such as the FARIMA used in this study, while the GEP and FFBPNN models are applied to estimate the deterministic component. The better performance of GEP and FFBPNN relative to FARIMA can therefore be explained by the nature of these models: the machine learning-based models estimate the deterministic component of the ST data more effectively than the time series-based FARIMA model captures the stochastic component. The hybrid models perform much better than the classical models for estimating daily ST. Moreover, the GEP-FARIMA models developed at the various depths of the studied sites generally perform better than the FFBPNN-FARIMA models. The results of the present work agree with those of Tabari et al. (2015) and Zeynoddin et al. (2019), implying that the best performance of the machine learning-based MLP approach and


Fig. 4. Measured vs. estimated daily ST values by the hybrid GEP-FARIMA and classical FARIMA and GEP models at a depth of 5 cm during the testing period (Urmia station).

Fig. 5. Measured vs. estimated daily ST values by the hybrid GEP-FARIMA and classical FARIMA and GEP models at a depth of 5 cm during the testing period (Rasht station).

the ARIMA time series model was obtained at deep soil layers when antecedent ST data were used in the estimation process. In contrast, the present results are inconsistent with those of other studies, such as Hosseinzadeh Talee (2014), Nahvi et al. (2016), and Behmanesh and Mehdizadeh (2017), who reported higher accuracy of machine learning models at surface layers when other meteorological parameters were used as model inputs to estimate ST. Additionally, hybrid models combining time series and machine learning models have been developed in the literature (Mehdizadeh, 2018; Mehdizadeh and Kozekalani Sales, 2018; Mehdizadeh et al., 2017c, 2018b, 2019; Fathian et al., 2019) to estimate other parameters (e.g., hydrologic parameters). These studies reported that the hybrid models were much more efficient than the classical models, which is similar to the results achieved in this study.

The performance of all the models developed at the studied stations, which have different climates, is also compared here on the basis of the RRMSE values. FARIMA performs best at all depths at Isfahan station, which has an arid climate (see Table 4). The results for the Urmia and Rasht stations are mixed: during the training stage, the FARIMA models at all depths of Rasht station are more accurate than the corresponding models at Urmia station, whereas during the testing period the FARIMA models of Urmia station outperform those of Rasht station at all depths. Regarding the single GEP and FFBPNN models, Tables 6–8 show that at the 10 cm depth these models performed best at Rasht station during the training period and at Isfahan station during the testing period. At the other depths (i.e., 5, 50, and 100 cm), the single machine learning-based models performed best at Isfahan station, with the lowest RRMSE values. In general, the weakest performance of the single models at the deep soil layers (i.e., 50 and 100 cm) was obtained at Rasht station. For the hybrid models, the highest accuracies at the depths of 5, 10, 50, and 100 cm were generally obtained at the Urmia, Isfahan, Isfahan, and Urmia stations, respectively.

4. Conclusions

The present study estimated daily ST, one of the crucial soil parameters, at depths of 5, 10, 50, and 100 cm at three stations in Iran. To this end, the classical FARIMA (a time series-based model), GEP and FFBPNN (machine learning-based models), and the hybrid GEP-FARIMA and FFBPNN-FARIMA models were developed. All models were trained and then tested using antecedent ST data (i.e., one-day lagged ST values). Model performance was evaluated with three statistical metrics, namely RMSE, MAE,


and RRMSE. The classical GEP and FFBPNN models showed negligible differences in accuracy at all depths, whereas FARIMA was less accurate than both GEP and FFBPNN for daily ST estimation at all depths of the studied stations. This study also proposed hybrid models that combine FARIMA with the GEP and FFBPNN models, and these hybrid models outperformed the classical models. Additionally, both the classical and the proposed hybrid models performed best at large soil depths (i.e., 50 cm for the hybrid models and 100 cm for the classical models), so that model accuracy generally decreased from deep to surface soil layers. The literature review showed that previously published work has mainly used machine learning models to estimate ST, while only a few studies have used time series-based models; these models are therefore recommended for further consideration in future ST studies. In addition, the machine learning models applied here (i.e., GEP and FFBPNN) performed better than the time series model (i.e., FARIMA) at the daily time scale. This cannot be taken as a general conclusion, since machine learning and time series models can perform differently at different time scales. Future studies could therefore compare these models for estimating ST at different time horizons, including daily, monthly, seasonal, and annual scales. Moreover, this study developed hybrid models by combining the GEP and FFBPNN models with FARIMA. Further hybrid models could be developed to improve ST estimation by combining other machine learning models, such as RF, MARS, and ANFIS, with linear time series models, such as autoregressive (AR), autoregressive moving average (ARMA), and ARIMA models, and with non-linear ones, such as autoregressive conditional heteroscedasticity (ARCH), generalized ARCH (GARCH), and self-exciting threshold autoregressive (SETAR) models.
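The recommendation above, combining further machine learning models with AR-type time series models, can be sketched in Python as a residual-based hybrid scored with RMSE, MAE and RRMSE. This is not the procedure used in this study; it follows the deterministic/stochastic decomposition discussed earlier under loudly stated assumptions: an ordinary least-squares line on $ST_{t-1}$ stands in for GEP or FFBPNN, an AR(1) model of that line's residuals stands in for FARIMA, the input series is synthetic, and RRMSE is taken as RMSE divided by the mean observed value, in percent. The function names are hypothetical.

```python
import numpy as np


def rmse(obs, est):
    return float(np.sqrt(np.mean((obs - est) ** 2)))


def mae(obs, est):
    return float(np.mean(np.abs(obs - est)))


def rrmse(obs, est):
    # Assumed definition: RMSE relative to the mean observation, in percent
    return 100.0 * rmse(obs, est) / float(np.mean(obs))


def hybrid_one_step(st):
    """Residual-based hybrid using only the one-day lag ST_{t-1}.

    Stand-ins (assumptions): a least-squares line replaces GEP/FFBPNN for the
    deterministic part, and an AR(1) fit to its residuals replaces FARIMA.
    Returns (observed, hybrid estimate, deterministic-only estimate),
    aligned on the days where both lags are available.
    """
    x, y = st[:-1], st[1:]                      # ST_{t-1} -> ST_t
    design = np.column_stack([np.ones_like(x), x])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    det = a + b * x                             # deterministic estimate
    resid = y - det                             # remaining stochastic part
    # AR(1) coefficient of the residuals (least squares, no intercept)
    rho = float(resid[:-1] @ resid[1:] / (resid[:-1] @ resid[:-1]))
    hyb = det[1:] + rho * resid[:-1]            # add the predicted residual
    return y[1:], hyb, det[1:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    days = np.arange(3 * 365)
    noise = np.zeros(days.size)
    for i in range(1, days.size):               # synthetic AR(1) noise
        noise[i] = 0.8 * noise[i - 1] + rng.normal(0.0, 0.5)
    st = 15.0 + 10.0 * np.sin(2 * np.pi * days / 365.25) + noise
    obs, hyb, det = hybrid_one_step(st)
    for name, est in (("deterministic only", det), ("hybrid", hyb)):
        print(f"{name}: RMSE={rmse(obs, est):.3f} degC, "
              f"MAE={mae(obs, est):.3f} degC, RRMSE={rrmse(obs, est):.2f}%")
```

Because the AR(1) coefficient is fitted by least squares on the same residuals, the in-sample hybrid RMSE cannot exceed the deterministic-only RMSE in this sketch; a fair comparison, like the training and testing phases used in this study, would fit the line and the residual model on one period and score a separate one.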

Declaration of Competing Interest

The authors declare that there is no conflict of interest.

References

Araghi, A., Mousavi-Baygi, M., Adamowski, J., 2017. Detecting soil temperature trends in Northeast Iran from 1993 to 2016. Soil Till. Res. 174, 177–192.
Behmanesh, J., Mehdizadeh, S., 2017. Estimation of soil temperature using gene expression programming and artificial neural networks in a semiarid region. Environ. Earth Sci. https://doi.org/10.1007/s12665-017-6395-1.
Bilgili, M., 2011. The use of artificial neural networks for forecasting the monthly mean soil temperatures in Adana, Turkey. Turk. J. Agric. For. 35, 83–93.
Børresen, M.H., Barnes, D.L., Rike, A.G., 2007. Repeated freeze–thaw cycles and their effects on mineralization of hexadecane and phenanthrene in cold climate soils. Cold Reg. Sci. Technol. 49 (3), 215–225.
Brar, G.S., Steiner, J.L., Unger, P.W., Prihar, S.S., 1992. Modeling sorghum seedling establishment from soil wetness and temperature of drying seed zones. Agron. J. 84, 905–910.
de Martonne, E., 1925. Traité de géographie physique. 3 tomes. Paris.
Fathian, F., Mehdizadeh, S., Kozekalani Sales, A., Safari, M.J.S., 2019. Hybrid models to improve the monthly river flow prediction: integrating artificial intelligence and nonlinear time series model. J. Hydrol. 575, 1200–1213.
Feng, Y., Cui, N., Hao, W., Gao, L., Gong, D., 2019. Estimation of soil temperature from meteorological data using different machine learning models. Geoderma 338, 67–77.
Ferreira, C., 2001. Gene expression programming: a new adaptive algorithm for solving problems. Complex Syst. 13 (2), 87–129.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, New York.
Hillel, D., 1998. Environmental Soil Physics. Academic Press, 771 pp.
Hipel, K.W., McLeod, A.E., 1996. Time Series Modeling of Water Resources and Environmental Systems. Elsevier, Amsterdam.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Cambridge, Massachusetts, USA.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366.
Hosseinzadeh Talee, P., 2014. Daily soil temperature modeling using neuro-fuzzy approach. Theor. Appl. Climatol. 118 (3), 481–489.
Hu, G., Lin, Z., Wu, X., Ren, L., Wu, T., Xie, C., Qiao, Y., Shi, J., Cheng, G., 2016. An analytical model for estimating soil temperature profiles on the Qinghai-Tibet plateau of China. J. Arid Land 8 (2), 232–240.
Kang, S., Kim, S., Oh, S., Lee, D., 2000. Predicting spatial and temporal patterns of soil temperature based on topography, surface cover and air temperature. For. Ecol. Manage. 136, 173–184.
Kazemi, S.M.R., Minaei Bidgoli, B., Shamshirband, S., Karimi, S.M., Ghorbani, M.A., Chau, K.W., Kazem Pour, R., 2018. Novel genetic-based negative correlation learning for estimating soil temperature. Eng. Appl. Comput. Fluid Mech. 12 (1), 506–516.
Kim, T.W., Valdés, J.B., 2003. Nonlinear model for drought forecasting based on a conjunction of wavelet transforms and neural networks. J. Hydrol. Eng. 8 (6), 319–328.
Kim, S., Singh, V.P., 2014. Modeling daily soil temperature using data-driven models and spatial distribution. Theor. Appl. Climatol. 118 (3), 465–479.
Kisi, O., Tombul, M., Zounemat Kermani, M., 2015. Modeling soil temperatures at different depths by using three different neural computing techniques. Theor. Appl. Climatol. 121 (1–2), 377–387.
Koza, J.R., 1992. Genetic Programming: on the Programming of Computers by Means of Natural Selection, vol. 1. MIT Press.
Li, H.J., Yan, J.X., Yue, X.F., Wang, M.B., 2008. Significance of soil temperature and moisture for soil respiration in a Chinese mountain area. Agric. For. Meteorol. 148 (3), 490–503.
Liu, H., Huang, B., 2005. Root physiological factors involved in cool-season grass response to high soil temperature. Environ. Exper. Bot. 53 (3), 233–245.
Mehdizadeh, S., 2018. Estimation of daily reference evapotranspiration (ETo) using artificial intelligence methods: offering a new approach for lagged ETo data-based modeling. J. Hydrol. 559, 794–812.
Mehdizadeh, S., Kozekalani Sales, A., 2018. A comparative study of autoregressive, autoregressive moving average, gene expression programming and Bayesian networks for estimating monthly streamflow. Water Resour. Manage. 32 (9), 3001–3022.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2017a. Evaluating the performance of artificial intelligence methods for estimation of monthly mean soil temperature without using meteorological data. Environ. Earth Sci. 76. https://doi.org/10.1007/s12665-017-6607-8.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2017b. Application of gene expression programming to predict daily dew point temperature. Appl. Therm. Eng. 112, 1097–1107.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2017c. A comparison of monthly precipitation point estimates using integration of soft computing methods and GARCH time series model. J. Hydrol. 554, 721–742.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2018a. Comprehensive modeling of monthly mean soil temperature using multivariate adaptive regression splines and support vector machine. Theor. Appl. Climatol. 133 (3–4), 911–924.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2018b. New approaches for estimation of monthly rainfall based on GEP-ARCH and ANN-ARCH hybrid models. Water Resour. Manage. 32 (2), 527–545.
Mehdizadeh, S., Fathian, F., Adamowski, J.F., 2019. Hybrid artificial intelligence-time series models for monthly streamflow modeling. Appl. Soft Comput. 80, 873–887.
Metcalfe, A.V., Cowpertwait, P.S., 2009. Introductory Time Series With R. Springer-Verlag, New York, USA.
Nabi, G., Muillins, C.E., 2008. Soil temperature dependent growth of cotton seedlings before emergence. Pedosphere 18 (1), 54–59.
Nahvi, B., Habibi, J., Mohammadi, K., Shamshirband, S., Al Razgan, O.S., 2016. Using self-adaptive evolutionary algorithm to improve the performance of an extreme learning machine for estimating soil temperature. Comput. Electron. Agric. 124, 150–160.
Nourani, V., 2017. An emotional ANN (EANN) approach to modeling rainfall-runoff process. J. Hydrol. 544, 267–277.
Rube, W., 2005. Carbon limitation of soil respiration under winter snowpacks: potential feedbacks between growing season and winter carbon fluxes. Global Change Biol. 11 (2), 231–238.
Safari, M.J.S., Aksoy, H., Mohammadi, M., 2016. Artificial neural network and regression models for flow velocity at sediment incipient deposition. J. Hydrol. 541 (B), 1420–1429.
Samadianfard, S., Ghorbani, M.A., Mohammadi, B., 2018a. Forecasting soil temperature at multiple-depth with a hybrid artificial neural network model coupled-hybrid firefly optimizer algorithm. Inform. Proc. Agric. 5 (4), 465–476.
Samadianfard, S., Asadi, E., Jarhan, S., Kazemi, H., Keshtgar, S., Kisi, O., Sajjadi, S., Manaf, A.A., 2018b. Wavelet neural networks and gene expression programming models to predict short-term soil temperature at different depths. Soil Till. Res. 175, 37–50.
Sanikhani, H., Deo, R.C., Yaseen, Z.M., Eray, O., Kisi, O., 2018. Non-tuned data intelligent model for soil temperature estimation: a new approach. Geoderma 330, 52–64.
Schimel, J.P., Bilbrough, C., Welker, J.M., 2004. Increased snow depth affects microbial activity and nitrogen mineralization in two Arctic tundra communities. Soil Biol. Biochem. 36 (3), 217–227.
Seyfried, M.S., Murdock, M.D., Hanson, C.L., Flerchinger, G.N., Van Vactor, S., 2001. Long-term climate database, Reynolds creek experimental watershed, Idaho, United States. Water Resour. Res. 37 (11), 2825–2830.
Sihag, P., Esmaeilbeiki, F., Singh, B., Pandhiani, S.M., 2019. Model-based Soil Temperature Estimation Using Climatic Parameters: the Case of Azerbaijan Province, Iran. https://doi.org/10.1080/24749508.2019.1610841.
Tabari, H., Hosseinzadeh Talaee, P., Willems, P., 2015. Short-term forecasting of soil temperature using artificial neural network. Meteorol. Appl. 22 (3), 576–585.
Wu, W., Tang, X.P., Guo, N.J., Yang, C., Liu, H.B., Shang, Y.F., 2013. Spatiotemporal modeling of monthly soil temperature using artificial neural networks. Theor. Appl. Climatol. 113 (3–4), 481–494.
Wu, X., Yao, Z., Brüggemann, N., Shen, Z.Y., Wolf, B., Dannenmann, M., 2010. Effects of soil moisture and temperature on CO2 and CH4 soil–atmosphere exchange of various land use/cover types in a semi-arid grassland in Inner Mongolia, China. Soil Biol. Biochem. 42 (5), 773–787.
Xing, L., Li, L., Gong, J., Ren, C., Liu, J., Chen, H., 2018. Daily soil temperatures predictions for various climates in United States using data-driven model. Energy 160, 430–440.
Yang, G., Bowling, L.C., 2014. Detection of changes in hydrologic system memory associated with urbanization in the Great Lakes region. Water Resour. Res. 50 (5), 3750–3763.
Zeynoddin, M., Bonakdari, H., Ebtehaj, I., Esmaeilbeiki, F., Gharabaghi, B., Haghi, D.Z., 2019. A reliable linear stochastic daily soil temperature forecast model. Soil Till. Res. 189, 73–87.
