Prediction for hog prices based on similar sub-series search and support vector regression

Computers and Electronics in Agriculture 157 (2019) 581–588



Journal homepage: www.elsevier.com/locate/compag

Original papers


Yiran Liu (a), Qingling Duan (a,⁎), Dongjie Wang (b), Zhentao Zhang (c), Chunhong Liu (a)

(a) College of Information and Electrical Engineering in China Agricultural University, Beijing 100083, China
(b) Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
(c) Animal Health Supervision Institute of Shandong Province, Jinan 250022, China

ARTICLE INFO

Keywords: Time series; Dynamic time warping distance; Support vector regression; Hog price; Prediction

ABSTRACT

Predicting hog prices is important for decision-making by administrative departments and pig-breeding enterprises. Hog prices follow a time series that is non-stationary, non-linear and pseudo-periodic, and that changes as a result of potential growth, cyclical fluctuation and errors. Considering the different characteristics of the trend component and the cyclical component in prediction, we propose a hog price prediction method that addresses the pseudo-cycle problem caused by varying cycle length. We begin by separating the cyclical component and the trend component of the hog price series. We then predict the cyclical component using a most similar sub-series search method, and predict the trend component using support vector regression. Finally, we combine the two predicted series. Our main contributions are a method that predicts the cyclical and trend components of hog prices separately, and a most similar sub-series search method for predicting the cyclical component. In experiments on real datasets, our method achieves lower errors than existing methods. It is suitable for predicting the price series of hogs and of other agricultural products with similar characteristics.

1. Introduction

Pork production topped the list of global meat production in 2016 according to the Food and Agriculture Organization's food outlook report. Grasping changes in hog prices in a timely and accurate manner is important for the stable and healthy development of the pig market. However, there are two main problems in hog price prediction. First, hog prices follow a pseudo-periodic time series that is non-linear and non-stationary. Pseudo-periodic time series are a widespread data type (Horvatic et al., 2011). The "cycle" of hog prices repeats, but the cycles are not identical: they differ in length and shape, which makes prediction difficult (Xu et al., 2014; Ernst and Ray, 2015). Second, hog prices are influenced by factors such as policy, capital and disease, which are difficult to quantify and predict. This not only makes the length of the hog price cycle uncertain, but also makes it difficult to select the influencing factors for prediction. Fortunately, the effects of all these factors are reflected in historical prices. Therefore, the historical hog price is used as the only basis for prediction in this paper.

Existing studies have shown that changes in a hog price series are a combination of potential growth and short-term fluctuations (Wang et al., 2016). That is, multiple factors such as the trend component, cyclical component and errors lead to changes. According to the different characteristics of the cyclical and trend components of the hog price series, we propose a prediction model for hog prices based on similar sub-series search and support vector regression to address the problem of varying cycle length.

Current research into hog prices focuses on two main aspects: hog price prediction and hog price cycle identification. The prediction methods can be divided into one-factor and multiple-factor methods depending on the data, or into single and combined methods depending on the structure. One-factor prediction methods use the hog price itself or some other single factor as the explanatory variable. Li et al. (2013) used historical hog prices for short-term prediction based on a chaotic neural network optimized using a genetic algorithm, but a 20-day prediction horizon is too short to be useful for production decisions. Sun (2013) separated the hog price into a trend component, a cyclical component and a seasonal component, and considered the breeding sow stocks or the hog-grain price ratio as a factor when predicting the cyclical component. However, the sine function used does not fit the cyclical component well when the cycle length is changing. Multiple-factor methods use factors such as corn prices, piglet prices and the hog-grain ratio together to predict hog prices. The experiments conducted by Ole (1997) demonstrated that whether an autoregressive model is better than the naive model, and whether each factor contributes to the prediction, depends on the dataset. In terms of single prediction methods, Ding and Meng (2012) predicted the annual hog price using support vector regression (SVR), but the time lag was not considered when the explanatory variables were selected. Combined methods mostly combine time series analysis and machine learning algorithms in series, in parallel, or nested. Ping et al. (2010) combined a third-order exponential smoothing model, a neural network model and the gray system in series, but the accuracy of the combined model needs to be improved.

Hog price cycle identification attempts to separate the cyclical component of hog prices and study its statistical characteristics. The Hodrick-Prescott (H-P) filter has been applied to analyze the pork price cycle, and the results indicate that the cycles are not static (Holst and Cramon-Taubadel, 2012). High-frequency cobweb theory has also been used to decompose the hog cycle, suggesting that there may be several short cycles within a longer cycle (Talpaz, 1974).

From the analysis above, there is little research that separates the cyclical and trend components, and none that considers the characteristics of pseudo-cycles when predicting hog prices. To this end, this paper makes the following contributions to the literature. We propose a prediction method that decomposes the hog price series into a trend component and a cyclical component and then predicts each of them, taking full account of the characteristics of each component. We also design a most similar sub-series search method based on the dynamic time warping (DTW) distance to predict the cyclical component of hog prices, which accounts for the pseudo-cycle caused by the varying cycle length.

This paper is organized as follows: Section 2 describes the data source, the characteristics of the hog price series and the proposed prediction model. Section 3 describes the settings, results and analysis of the experiments. Conclusions are drawn in Section 4.

⁎ Corresponding author. E-mail address: [email protected] (Q. Duan).

https://doi.org/10.1016/j.compag.2019.01.027
Received 18 January 2018; Received in revised form 26 June 2018; Accepted 20 January 2019; Available online 29 January 2019
0168-1699/© 2019 Elsevier B.V. All rights reserved.

2. Materials and methods

2.1. Data source

We collect data from the livestock website for Shandong Province (footnote 1) and the agricultural products price website for Beijing Xinfadi (footnote 2) from January 2011 to March 2017. Our datasets include two hog price series, a pork price series and a piglet price series, which are described in Table 1. The maximum, minimum, mean and standard deviation of each series are calculated. In general, the standard deviations of the hog price series, pork price series and piglet price series are large, which indicates that the price series are highly dispersed. In addition, the dispersion of the hog prices in dataset 2 is greater than that of the hog prices in dataset 1.

Footnote 1: http://www.sdxm.gov.cn/col/col782/index.html.
Footnote 2: http://www.xinfadi.com.cn/marketanalysis/3/list/1.shtml.

2.2. Characteristics of hog price series

In addition to obvious non-linearity, hog price series are also pseudo-periodic and non-stationary. To demonstrate these characteristics, in this section we describe the pseudo-periodicity of the hog price series and show the stationarity test results. We also attempt to give a formal definition of pseudo-periodicity.

2.2.1. Pseudo-periodicity

When a time series presents oscillatory changes, we consider that the series has periodicity. For example, a network traffic series completes an oscillation once a day, so its cycle is 1 day. A pseudo-periodic time series is a time series that appears highly periodic but never exactly repeats itself (Yin et al., 2014). This is why pseudo-periodic series are difficult to segment according to natural time. There are many pseudo-cyclical series in daily life, such as electrocardiograms and sunspot activity sequences. However, there is no universally accepted definition of a pseudo-periodic time series. We attempt to give one as follows.

Let $S = \{e_1, e_2, e_3, \ldots, e_N\}$ be a time series. Cyclical segments $\{S_1, S_2, S_3, \ldots, S_m\}$ are obtained by segmenting $S$ at its feature points. For any $i \neq j$ with $i, j \in [1, m]$, formulation (1) is satisfied.

$$\mathrm{length}(S_i) \neq \mathrm{length}(S_j) \tag{1}$$

where $N$ is the number of elements in the series and $m$ is the number of segments. A series is composed of elements, and its length is the number of those elements, so the function length(S) counts the elements of series S.

The feature point is defined as follows. If $e_i$ is a feature point ($1 \le i \le N$), it must meet two conditions. ① $e_i$ is a local maximum (minimum) of $S$. ② $e_i$ satisfies formulation (2).

$$\begin{cases} \dfrac{e_i - e_{i-1}}{e_{i-1}} > \Psi, & e_i > e_{i-1} \\[2ex] \dfrac{e_{i-1} - e_i}{e_i} > \Psi, & e_i < e_{i-1} \end{cases} \tag{2}$$

where $\Psi$ is a given threshold, usually 0.1 (Liu et al., 2014). The first and last points of the series are feature points by default. As shown in Fig. 1, the hog price series repeats from time to time, but no repetition is exactly the same as another. This is especially obvious when the cyclical and trend components of the hog price series are separated.

2.2.2. Non-stationarity

For a stationary time series, the mean, variance and covariance are time-independent constants. The stationarity of a time series can be checked with a unit root test; common methods are the ADF test, the KPSS test and the PP test. In this paper, the ADF test is used. At a significance level of 0.05, the two hog price series, the pork price series and the piglet price series all fail to reject the unit-root null hypothesis, which indicates that they are all non-stationary time series.

2.3. Overall process of our method

Considering the different characteristics of each component of the price series, we propose a prediction model for hog prices. Fig. 2 shows the overall process of the prediction method. The method includes the following steps.

Step 1: Separate the cyclical component and the trend component of the hog price series using the Hodrick-Prescott filter.
Step 2: Calculate the average cycle length and time delay of the cyclical component and then construct a sub-series matrix. The cyclical component is predicted according to the most similar sub-series in the template matrix.
Step 3: Predict the trend component using the SVR model.
Step 4: Add the two components.

2.4. Separating the cyclical component and the trend component

Changes in hog prices are a combination of potential growth and short-term fluctuation. That is, the trend component, the cyclical component and errors lead to variations in hog prices. On the one hand, the trend component is non-stationary and non-linear, whereas the cyclical component is stationary and pseudo-periodic. On the other hand, the trend component is more predictable than the cyclical component when making long-term forecasts. To avoid interaction between the two components, we separate them using the H-P filter and then predict each component.


Table 1 Description of dataset.

Dataset No.  Dataset name  Data source        Number of samples  Maximum (yuan/kg)  Minimum (yuan/kg)  Mean (yuan/kg)  Standard deviation (yuan/kg)
1            Hog price 1   Shandong Province  320                20.92              10.04              15.32           2.31
2            Hog price 2   Beijing Xinfadi    312                26.08              12.8               19.12           2.95
3            Pork price    Shandong Province  320                32.62              18.30              25.54           3.26
4            Piglet price  Shandong Province  320                45.76              13.54              23.46           7.27

Fig. 1. Pseudo-periodicity of hog price series and its cyclical component. Panels: (a) Original hog price series; (b) Cyclical component of hog price series.
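The feature-point rule of Section 2.2.1 (conditions ① and ② with formulation (2)), which underlies the pseudo-periodic segments illustrated in Fig. 1, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `feature_points` and `segment` are hypothetical, a local extremum is assumed to mean a strict extremum relative to its two immediate neighbours, and prices are assumed to be strictly positive.

```python
def feature_points(series, psi=0.1):
    """Return indices of feature points (assumed reading of formulation (2)).

    The first and last points are feature points by default. An interior
    point qualifies if it is a strict local extremum (assumption: compared
    with its two immediate neighbours) and its relative change from the
    previous point exceeds the threshold psi.
    """
    n = len(series)
    points = [0]
    for i in range(1, n - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        is_max = cur > prev and cur > nxt
        is_min = cur < prev and cur < nxt
        if not (is_max or is_min):
            continue
        if cur > prev:
            change = (cur - prev) / prev   # first case of formulation (2)
        else:
            change = (prev - cur) / cur    # second case of formulation (2)
        if change > psi:
            points.append(i)
    points.append(n - 1)
    return points


def segment(series, points):
    """Cut the series into segments between consecutive feature points."""
    return [series[a:b + 1] for a, b in zip(points, points[1:])]


prices = [10, 12, 9, 9.5, 14, 13.9, 10]   # made-up values, not data from Table 1
pts = feature_points(prices, psi=0.1)
segments = segment(prices, pts)
```

With the default threshold of 0.1, small wiggles are ignored and only swings of more than 10% relative to the previous point start a new segment, which is why the resulting segments generally differ in length.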

Fig. 2. Flow chart of the prediction method for hog prices. Boxes in the chart: separate the cyclical component and the trend component of the hog price series; for the cyclical component, calculate the average cycle and time delay, construct the sub-series matrix, then search the most similar sub-series and predict; for the trend component, train an SVR model, then predict with the SVR model; finally, add up the two components.

The H-P filter is a common method for separating trend and cyclical components. The trend component is the solution of the following optimization problem:

$$\min_{\{y_t\}_{t=-1}^{T}} \left\{ \sum_{t=1}^{T} (x_t - y_t)^2 + \lambda \sum_{t=1}^{T} \left[ (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) \right]^2 \right\} \tag{3}$$

where $\{x_t\}_{t=1}^{T}$ is the original series, $\{y_t\}_{t=1}^{T}$ is the trend component and $\lambda$ is the smoothing parameter. The cyclical component of a time series is the difference between the original time series and the trend component (Alessandra and Alain, 2005).

$$C_t = P_t - T_t \tag{4}$$

$$P_t = \{x_t\}_{t=1}^{T}, \qquad T_t = \{y_t\}_{t=1}^{T} \tag{5}$$
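The decomposition in Eqs. (3)–(5) can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the code used in the paper: it uses the common finite-sample form of the H-P problem, in which the penalty covers the T − 2 second differences of the trend, so the trend solves the linear system (I + λKᵀK)y = x with K the second-difference matrix. The function names and the value of `lam` are hypothetical; the paper does not report the λ it used.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            if f != 0.0:
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def hp_filter(x, lam):
    """Split x into (trend, cycle) with the H-P filter, cycle = x - trend."""
    n = len(x)
    # Build A = I + lam * K^T K, where each row of K is (1, -2, 1).
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for r in range(n - 2):
        coeffs = ((r, 1.0), (r + 1, -2.0), (r + 2, 1.0))
        for i, ci in coeffs:
            for j, cj in coeffs:
                a[i][j] += lam * ci * cj
    trend = solve_linear(a, [float(v) for v in x])
    cycle = [xi - ti for xi, ti in zip(x, trend)]   # Eq. (4)
    return trend, cycle


# A perfectly linear series has zero second differences, so its trend is
# the series itself and its cyclical component vanishes.
line = [2.0 * t + 1.0 for t in range(12)]
trend, cycle = hp_filter(line, lam=1600.0)   # lam is a hypothetical choice
```

The dense solver keeps the sketch dependency-free; for long series a banded solver or an existing library routine would be the more practical choice.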

Here, $C_t$ in Eq. (4) is the cyclical component. After the trend component and the cyclical component are predicted, the final prediction results are obtained by merging the two components.

2.5. Prediction of cyclical component

The cyclical component of the hog price series does not repeat exactly because of time delays and bending points. That is to say, any sub-series has appeared previously, but the time at which it appeared is unknown. Furthermore, there are small differences each time it appears. Therefore, the DTW distance is used to measure the similarity between the cyclical component of the prediction series and the previous series.

2.5.1. Sub-series matrix construction

Segmentation of time series is commonly achieved using natural time points and feature points. We segment the cyclical component of the hog price series in an overlapping way, which avoids the bending and lagging problems of the cyclical component to some extent. In this way, not every segment contains a complete cycle, but all the information in the cycles is completely retained. The construction method of the sub-series matrix is described below.

We segment the cyclical component of the price series in an overlapping way, and each sub-series has length s. Each time a section is separated, the window slides ahead one step. Assume that the length of the cyclical component is l after removing the last k values; we then obtain (l − s + 1) sub-series. The sub-series are used to construct a matrix, which is a template for matching the follow-up series to the historical data.

$$\mathrm{Pattern} = \begin{bmatrix} c_1 & c_2 & \cdots & c_s \\ c_2 & c_3 & \cdots & c_{s+1} \\ \vdots & \vdots & \ddots & \vdots \\ c_{l-s+1} & c_{l-s+2} & \cdots & c_l \end{bmatrix} \tag{6}$$

The matrix above is a matrix of sub-series, where c is an element of the cyclical component of the hog price series. The segment length s is a variable that affects the prediction accuracy. When s is small, the tolerance for delays and bending along the time axis is small; when s is large, the accumulation of errors is large. In this paper, the value range of s is expressed as follows:

$$s \in [AP - \tau,\; AP + \tau] \tag{7}$$

where AP represents the average period length and τ is the time delay. The value of s is selected by experiment, and the optimal value is chosen to establish the model. The values of s for the 4 datasets are listed in Table 2.

Table 2 Mean periods, time delay and s value of the 4 datasets.

Dataset No.  Average period  τ  s
1            17              3  18
2            17              4  17
3            17              4  16
4            17              3  18

2.5.2. Most similar sub-series search

The DTW distance is a distance measure that considers the overall characteristics of time series. It is the minimum distance between two time series that can be obtained by bending the time axis, which to some extent avoids the effects of data offsetting, stretching and omission relative to the axis. This distance measure has been widely applied in speech recognition and pattern matching (Cai et al., 2017; Zhu et al., 2018; Lv et al., 2017).

The prediction of the cyclical component can be expressed as $P = \{x_{p1}, x_{p2}, \ldots, x_{pk}\}$, where k is the number of prediction steps. The phase of the series P in the cycle is uncertain when making predictions. So P is initialized with the last k values in the historical cyclical component series, and this initial series is used to find the most similar sub-series in the matrix Pattern. The value adjacent to the most similar sub-series is used as the first value of the prediction series P. The similarity measure used when matching P to the row vectors in Pattern is the DTW distance, which is defined in Eqs. (8)–(10) (Berndt and Clifford, 1994; Liu et al., 2014).

$$\mathrm{Dist}(X_S, Y_S) = f(n, m) \tag{8}$$

$$f(i, j) = \mathrm{dist}(i, j) + \min \begin{cases} f(i, j-1) \\ f(i-1, j) \\ f(i-1, j-1) \end{cases} \tag{9}$$

$$f(0, 0) = 0, \quad f(i, 0) = f(0, j) = \infty \quad (i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, m) \tag{10}$$

where $X_S$ and $Y_S$ are sub-series of the cyclical component of the hog price series, $X_S = \{a_1, a_2, \ldots, a_n\}$, $Y_S = \{b_1, b_2, \ldots, b_m\}$, n and m are the lengths of $X_S$ and $Y_S$, respectively, and dist(i, j) denotes the point-wise distance between $a_i$ and $b_j$. We then search again for the most similar sub-series, and continue this process until k predictions have been obtained. A flow chart of the most similar sub-series search and prediction method is shown in Fig. 3. According to the definition of the DTW distance, the number of prediction steps k and the segment length s can be different.

2.6. Prediction of trend component

Support vector regression (SVR) can be applied to non-linear time-series prediction problems and is also applicable to problems with small sample sets (Gu et al., 2016; Kromanis and Kripakaran, 2013; Leksakul et al., 2015). The trend component of the hog price series is clearly non-linear, so SVR is used to predict it.

2.6.1. Training set construction

The trend component of the hog price series is a one-dimensional time series, so it is predicted with an autoregressive model, which uses the series itself as the regression variable and uses the previous p values to predict the next value (Jonathan and Kung, 2008).

Fig. 3. Prediction method based on most similar sub-series search.
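The search procedure of Section 2.5 summarized in Fig. 3 can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the point-wise distance dist(i, j) is assumed to be the absolute difference, and the way the query is updated after each step (drop the oldest value, append the newest prediction) is an assumption, since the text only says that the search is repeated until k predictions are obtained.

```python
import math


def dtw_distance(xs, ys):
    """DTW distance following the recursion of Eqs. (8)-(10)."""
    n, m = len(xs), len(ys)
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(xs[i - 1] - ys[j - 1])   # assumed point-wise distance
            f[i][j] = d + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    return f[n][m]


def build_pattern(cycle, s, k):
    """Overlapping sub-series of length s from the history without the
    last k values, one row per window position (the matrix of Eq. (6))."""
    history = cycle[:len(cycle) - k]
    return [history[r:r + s] for r in range(len(history) - s + 1)]


def predict_cycle(cycle, s, k):
    """Predict k future values of the cyclical component (k >= 1)."""
    pattern = build_pattern(cycle, s, k)
    query = list(cycle[-k:])            # P initialized with the last k values
    predictions = []
    for _ in range(k):
        best = min(range(len(pattern)),
                   key=lambda r: dtw_distance(query, pattern[r]))
        nxt = cycle[best + s]           # value adjacent to the matched row
        predictions.append(nxt)
        query = query[1:] + [nxt]       # assumed update rule for the query
    return predictions


# Made-up, exactly periodic series used only to exercise the sketch.
toy_cycle = [0, 1, 0, -1] * 6
toy_forecast = predict_cycle(toy_cycle, s=4, k=4)
```

Because DTW compares series of different lengths, the query length k and the row length s do not have to match, which mirrors the remark at the end of Section 2.5.2.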

The autoregressive model of order p is written as

$$x_t = \theta_1 x_{t-1} + \theta_2 x_{t-2} + \cdots + \theta_p x_{t-p} + e_t \tag{11}$$

where $x_t$ is the current value of the time series, which is defined as a linear combination of its p most recent lagged values, and $e_t$ contains all the new information that cannot be explained by the past values. The trend component of the hog price series is a non-stationary time series and therefore has to be made stationary first. After taking the logarithm and the first-order difference, the series passes the ADF test and becomes stationary. The training set is constructed from the p lagged values corresponding to each element of the label set.

$$T = \begin{bmatrix} x_1 & x_2 & \cdots & x_p \\ x_2 & x_3 & \cdots & x_{p+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n-p} & x_{n-p+1} & \cdots & x_{n-1} \end{bmatrix}, \qquad L = \begin{bmatrix} x_{p+1} \\ x_{p+2} \\ \vdots \\ x_n \end{bmatrix} \tag{12}$$

where T is the training set, L is the label set, and $x_i$ represents an element of the trend component of the hog price series.

2.6.2. Prediction based on SVR

If the number of samples is n, then the trend component can be segmented into (n − p + 1) sub-series as described above. The p lagged values are used as known quantities to construct the prediction function, and the optimization model is as follows.

$$\min L(w) = \frac{1}{2}\left(w^{T} w\right) + C \sum_{T=P+1}^{N} \left| x_T - f(x_{T-1}, \ldots, x_{T-P}) \right|_{\varepsilon} \tag{13}$$

where P is the number of past values, T is the current time index, C is a positive constant that controls how heavily training errors are penalized, and ε is the insensitive loss parameter. When formulation (13) is infeasible, slack variables are introduced and we obtain formulation (14).

$$\begin{aligned} \min \;& \frac{1}{2}\left(w^{T} w\right) + C \sum_{T=P+1}^{N} \left(\xi_T + \xi_T^{*}\right) \\ \text{s.t.} \;& \begin{cases} x_T - f(x_{T-1}, \ldots, x_{T-P}) \le \varepsilon + \xi_T \\ f(x_{T-1}, \ldots, x_{T-P}) - x_T \le \varepsilon + \xi_T^{*} \\ \xi_T,\; \xi_T^{*} \ge 0 \end{cases} \end{aligned} \tag{14}$$

where $\xi_T$ and $\xi_T^{*}$ are slack variables. With the slack variables, deviations larger than ε are tolerated at a cost, which corresponds to the so-called ε-insensitive loss function (Smola and Schölkopf, 2004).

The trend component of the hog price series is non-linear, and many non-linear kernel functions can be used to create the required high-dimensional feature space. The most commonly used kernel functions are the polynomial inner-product functions, the sigmoid function and the radial basis function (RBF) (Zhou, 2016; Duan et al., 2017a). The RBF kernel is employed after experiments. The decision function of the problem is as follows.

$$x_T = f(x_{T-1}, \ldots, x_{T-P}) = \sum_{t=1}^{P} w_t \cdot \exp\left(-\gamma \left\| x_t - x \right\|^{2}\right) + b \tag{15}$$

where γ is a kernel-specific parameter, $w_t$ is the coefficient of the prediction function and b is a bias term.

3. Experimental results and discussion

3.1. Performance criteria

In this paper, the models were established using MATLAB 2014a and the libsvm-3.11 toolbox, and the experiments were carried out on 4 datasets. The prediction performance of our method was examined using the mean absolute error (MAE), root mean square error (RMSE), mean absolute percent error (MAPE), and coefficient of determination (R²). They are calculated as shown in Eqs. (16)–(19).

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \tilde{y}_i \right| \tag{16}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \tilde{y}_i \right)^2} \tag{17}$$

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \tilde{y}_i}{y_i} \right| \times 100\% \tag{18}$$

$$R^2 = \frac{\left( \sum_{i=1}^{N} (y_i - \bar{y})(\tilde{y}_i - \bar{\tilde{y}}) \right)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2 \, \sum_{i=1}^{N} (\tilde{y}_i - \bar{\tilde{y}})^2} \tag{19}$$

where $y_i$ denotes the true values, $\tilde{y}_i$ denotes the predicted values, $\bar{y}$ denotes the mean of the true values, $\bar{\tilde{y}}$ denotes the mean of the predicted values, and N denotes the number of predicted values.

3.2. Results and analysis of the predictions

It takes about 4 months to raise a pig from piglet to slaughter weight. To allow farmers and pig-breeding enterprises to estimate the slaughter prices of a batch of piglets, we predict hog prices 18 weeks ahead. The last 18 samples of each series are used as test samples and the other samples are used as training samples to validate the effectiveness of our hog price prediction method. The parameter settings for the 4 datasets are shown in Table 3; they are all obtained by a genetic algorithm (Beenstock and Szpiro, 2002). The line chart of the predicted and real price series is shown in Fig. 4, where the horizontal axis is time, with a total of 18 weeks, and the vertical axis shows the prices. On each dataset, the predicted prices fit the true values well over this relatively long horizon. Therefore, the method based on similar sub-series search and support vector regression is a suitable and effective method for predicting hog prices.

3.3. Result and discussion

Our method is compared with several common methods of time series prediction, including single and combined models. Moreover, the prediction results for the cyclical components and the trend components are each compared with other methods. The comparisons indicate that the model established in this paper is effective in predicting the hog price and other price series with similar characteristics.

3.3.1. Predictions of hog price series

We compared the results of our method, the SVR model, the Wavelet-SVR model, and the BPNN (back-propagation neural network) model on 4 datasets (Duan et al., 2017b; Liu et al., 2015; Haviluddin and Alfred, 2016). The SVR and BPNN models are commonly used non-linear time series prediction models, especially for complex non-linear regression problems. Wavelet-SVR is a combined model used for single-variable chaotic time series prediction. The wavelet function decomposes the time series into sub-series with different frequency domains, and then considers each sub-series with the resolution corresponding to its frequency (Maryam and Ozgur, 2016). The main parameters of each model are shown in Table 4. Table 5 and Fig. 5 show the results of the comparison experiments.

Table 3 Parameter values of our method in the 4 datasets.

Dataset No.  C         γ          ε             p   S
1            0.036     99.96      0.01          12  18
2            38.99396  39.007545  2.155344e−06  13  16
3            28.2680   0.1551     0.01          14  17
4            74.4612   0.7528     0.0265        14  16
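For readers who want to reproduce the setup, the training-set layout of Eq. (12), with the lag order p as listed in Table 3, and the criteria of Eqs. (16)–(19) can be sketched in plain Python. The function names are hypothetical and this is not the MATLAB/libsvm code used in the paper; in particular, R² is computed here as the squared correlation form of Eq. (19), and `percent_reduction` is an assumed reading of how the percentage improvements in Section 3.3 are derived.

```python
import math


def lag_matrix(series, p):
    """Training set T and label set L of Eq. (12): each row holds p
    consecutive values and its label is the value that follows them."""
    rows = [series[i:i + p] for i in range(len(series) - p)]
    labels = [series[i + p] for i in range(len(series) - p)]
    return rows, labels


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y, y_hat):
    """Mean absolute percent error, in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def r_squared(y, y_hat):
    """Squared correlation between true and predicted values, Eq. (19)."""
    y_bar = sum(y) / len(y)
    h_bar = sum(y_hat) / len(y_hat)
    cov = sum((a - y_bar) * (b - h_bar) for a, b in zip(y, y_hat))
    var_y = sum((a - y_bar) ** 2 for a in y)
    var_h = sum((b - h_bar) ** 2 for b in y_hat)
    return cov ** 2 / (var_y * var_h)


def percent_reduction(baseline, ours):
    """Relative error reduction of `ours` against `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


rows, labels = lag_matrix([1, 2, 3, 4, 5], p=2)
# rows   -> [[1, 2], [2, 3], [3, 4]]
# labels -> [3, 4, 5]
```

As a consistency check, `percent_reduction(0.3824, 0.2883)` evaluates to about 24.61, which agrees with the first MAE reduction quoted against the SVR model in Section 3.3.1.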

Computers and Electronics in Agriculture 157 (2019) 581–588

Y. Liu et al.

(b) Dataset 2

(a) Dataset 1

(d) Dataset 4

(c) Dataset 3 Fig. 4. Result of hog price prediction. Table 4 Parameter values of the models in the 4 datasets. Dataset No.

Table 5 Evaluations of diﬀerent methods.

1

2

3

4

Dataset No. Our method

SVR

C γ ε

0.05996 99.56241 0.03006

95.1357 0.271607 0.00771046

44 10.1102 0.0121923

40.22 99.9999 0.0465844

Wavelet-SVR

C γ ε WN

12.7489 33.7216 0.0395119 haar

1 21.45 0.1181 haar

99 10.1102 0.0121923 db4

10.75 4.7190 0.0851 db4

NIN NON HL IN LR

12 1 1 6 0.1

13 1 1 7 0.1

14 1 1 7 0.1

14 1 1 7 0.1

BPNN

Note: WN denotes wavelet name; NIN denotes the number of input nodes; NON denotes the number of output nodes; HL denotes the number of hidden layers; IN denotes initial nodes; LR denotes learning rate.

Table 5 shows that on the 4 datasets, the performance of our method is better than that of other models. Compared with the SVR model, the MAE of our method is reduced by 24.61%, 45.57%, 23.08%, and 42.09%, respectively. The RMSE is reduced by 28.75%, 52.29%, 2.40%, and 45.13%, respectively. The MAPE is reduced by 14.36%, 56.94%, 41.32% and 48.28%, respectively. And the R2 is raised by a minimum of 16.25%. Compared with the Wavelet-SVR model, the MAE of our method is reduced by 58.58%, 42.08%, 59.94%, and 5.88%, respectively. The RMSE is reduced by 60.31%, 43.66%, 58.87%, and 0.59%, respectively. The MAPE is reduced by 66.02%, 60.09%, 7.38% and 41.66%, respectively. The R2 is raised by a minimum of 6.10%. But on the dataset 4, the R2 is slightly decreased by 1.25%. Compared with the BPNN model, the MAE of our method is reduced by 24.77%, 45.87%, 29.78%, and 28.92%, respectively. The RMSE is reduced by 24.08%, 44.39%, 26.45%, and 25.50%, respectively. The MAPE is reduced by 24.77%, 30.07%, 28.82% and 46.79%, respectively. And the R2 is raised by 1.16% at least. It can be seen from Fig. 5 that the series predicted by the BPNN model and the SVR model are so smooth that can not ﬁt the ﬂuctuation in the actual price series well, while the

1

2

3

4

MAE RMSE MAPE R2

0.2883 0.3578 1.6882 0.93

0.3382 0.4882 1.1456 0.94

1.08882 1.33025 3.2787 0.79

0.5688 0.6945 2.6343 0.87

SVR

MAE RMSE MAPE R2

0.3824 0.5022 1.9712 0.80

0.439674 0.500223 2.6606 0.77

1.8802 2.4245 5.5871 0.62

1.0450 1.4556 5.0938 < 0.5

Wavelet-SVR

MAE RMSE MAPE R2

0.6715 0.9156 4.9680 < 0.5

0.8443 1.1870 2.8704 < 0.5

1.1568 1.3382 3.5401 0.80

0.9821 1.2327 4.5152 0.82

BPNN

MAE RMSE MAPE R2

0.3832 0.4713 2.2441 0.83

0.4816 0.6638 1.6383 0.69

1.5318 1.7857 4.6065 0.64

1.0509 1.2489 4.9505 0.86

wavelet-SVR model has too much ﬂuctuation, which leads to a large deviation from the real value. It should be pointed out in particular that the coeﬃcient of determination can reach more than 90% when our method is used, while other methods can’t. If the results of trend components prediction can reach a high R2, the prediction results of the price series can also. When the interference of cyclical components is eliminated, the ﬁtting eﬀect of SVR model is very good in trend components prediction. 3.3.2. Predictions of cyclical components In order to prove the eﬀectiveness of the most similar sub-series search method in predicting the cyclical components of hog price series, we compared it with the SVR model in cyclical components prediction. The prediction results are shown in Table 6. As is shown in Table 6, overall, the results of the most similar subseries search method are better than the SVR model in predicting the cyclical components of hog price series. Compared with the SVR model, the MAE of our method is reduced by 5.28%, 8.79%, −0.07%, and 586

Computers and Electronics in Agriculture 157 (2019) 581–588

Y. Liu et al.

(b) Dataset 2

(a) Dataset 1

(d) Dataset 4

(c) Dataset 3 Fig. 5. Predicted series of diﬀerent methods.

respectively. And the R2 is raised by 4.30%, 0.00%, 13.79%, and 3.23%, respectively. Only on dataset 2, the RMSE of the SVR model is slightly larger than that of BPNN model by 2.72%, but the MAE of the SVR model is much smaller than that of BPNN model. From the analysis above, we ﬁnd the following conclusions. In the ﬁrst place, the prediction method for hog prices proposed in this paper has high accuracy. Compared with existing methods, our method performs better with respect to the evaluation methods. Secondly, the most similar sub-series search method is suitable for the prediction of the cyclical component of hog price series, while in terms of trend component prediction, the SVR model is more accurate than the BPNN model.

Table 6 Comparison of cycle component predicted by most similar series search method and SVR model. Dataset No.

1 2 3 4

Most similar sub-series search

SVR

MAE

RMSE

MAE

RMSE

0.2690 0.4163 0.7268 0.2762

0.3002 0.5193 0.8699 0.3455

0.2840 0.4564 0.7263 0.3398

0.3708 0.5535 0.8785 0.4112

18.72%, respectively. The RMSE is reduced by 19.04%, 6.18%, 0.98%, and 15.97%, respectively. Only on dataset 3, the average absolute error of the most similar sub-series search method is slightly higher than that of SVR model by 0.07%.

4. Conclusion In this paper, we proposed a prediction method for hog prices. We ﬁrst separate the cyclical component and the trend component from the hog price series, and then predict these components using the most similar sub-series search method and the SVR model according to the characteristics of the two parts. Finally, we merge the cyclical and trend components to obtain the prediction results. The method proposed achieves high accuracy in the hog price prediction task by eliminating the interference between the cyclical component and the trend component. The advantages of the proposed method are reinforced by experiments on real datasets. The DTW distance is used to solve the problem of the pseudo-cycle, and the most similar sub-series search method is designed to predict the cyclical component of the hog price series. Our method can locate a cyclical phase in the predicted series, and allows bending and oﬀsetting of the cyclical component along the time axis, which guarantees good prediction performance.

3.3.3. Predictions of trend components

The prediction of the trend component is an important part of our method. To verify the effectiveness of the SVR model in predicting the trend components of the hog price series, we compared the SVR model with the BPNN model. The prediction results are shown in Table 7. Overall, the SVR model outperforms the BPNN model in predicting the trend components of the hog price series. Compared with the BPNN model, the MAE of the SVR model is reduced by 6.20%, 8.94%, 21.27%, and 15.42% on datasets 1 to 4, respectively. The RMSE is reduced by 17.67%, −2.72%, 20.23%, and 8.65%, respectively; the negative value means that on dataset 2 the RMSE of the SVR model is slightly higher than that of the BPNN model.

Table 7
Comparison of trend component predicted by SVR and BPNN.

Dataset No.   SVR                         BPNN
              MAE      RMSE     R²        MAE      RMSE     R²
1             0.2420   0.3098   0.97      0.2580   0.3763   0.93
2             0.2851   0.3394   0.99      0.3131   0.3304   0.99
3             0.1795   0.2109   0.99      0.2280   0.2644   0.87
4             0.5401   0.7204   0.96      0.6386   0.7886   0.93

Acknowledgements

This work is supported by the National High Technology Research and Development Program (863 Plan) (2013AA102306). This work is also supported by the National Key Research and Development Plan (13th Five-year Plan) (2016YFD0700200).


