Quantile forecasting and data-driven inventory management under nonstationary demand

Operations Research Letters 47 (2019) 465–472

Ying Cao a,∗, Zuo-Jun Max Shen a,b

a Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA 94720, USA
b Department of Civil and Environmental Engineering, University of California, Berkeley, CA 94720, USA

Article history: Received 26 March 2018; Received in revised form 20 July 2019; Accepted 27 August 2019; Available online 4 September 2019.

Keywords: Newsvendor model; Data-driven decision making; Nonstationary time series; Neural networks; Quantile forecasting

Abstract: In this paper, a single-step framework for predicting quantiles of time series is presented. We then propose this technique as a data-driven approach for setting stock levels in the newsvendor problem and its multi-period extension. Theoretical and empirical findings suggest that our method is effective at modeling both weakly stationary and some nonstationary time series. On both simulated and real-world datasets, the proposed approach outperforms existing statistical methods and yields good newsvendor solutions.

1. Introduction

In various fields, such as production and inventory management, economics, and engineering, predicting quantiles of a random process provides essential information for decision making that is ignored by conventional point estimation of the conditional expectation. Moreover, many of these applications emphasize short-term forecasting, where time series models, which exploit the internal structure of a process over time, are often preferable to explanatory approaches [38,41]. Accordingly, a time series model is adopted in this paper. We start with its application to the newsvendor problem, one of the most fundamental stochastic inventory models, because it is well known that the optimal stock level is the so-called critical quantile. In practice, when the distribution of demand is unknown, managers must determine the inventory level based on historical demand observations, and the problem essentially boils down to predicting the critical quantile of future demand. Thus, this paper focuses on solving the newsvendor problem, while our results can easily be extended to predict quantiles of random processes in other fields.

Major existing approaches for solving the newsvendor problem (quantile prediction) with historical data can be classified into three groups. The first category rests on the strong assumption that all observations are an independent and identically distributed (i.i.d.) sample drawn from a true underlying distribution. A standard treatment is then to estimate the distribution first and to plug the estimate into the decision-making step (see [15,24,44] for a review). Policies derived in this way are very susceptible to the parametric assumption on the demand distribution, and non-parametric methods have recently been proposed to reduce this limitation [4,6,8,16,23,29,30,32,34]. Although the i.i.d. assumption facilitates establishing the asymptotic optimality of these methods, it suffers from a major practical limitation: demand in real life changes over time and is generally time-correlated.

This concern leads to the second stream of approaches, which consider time-correlated demand [10]. Most of these papers assume perfect knowledge of the demand evolution function, with innovations that are i.i.d. normal with zero mean and unknown variance [15,18,35]. Quantile estimates are then obtained via a two-step procedure: the mean squared error (MSE) criterion is used to estimate the parameters of a predefined model, and the variance of the innovation is subsequently estimated from the forecast errors [2,5,10,11,14,17]. Besides the fact that the normality of errors is often violated, simple models such as AR(1), which capture only linear relationships, are also unrealistic in practice, and an improper evolution model may generate drastic forecast errors.

Inspired by the idea of quantile regression proposed by [26], the third type of approach performs quantile estimation in a single step and relaxes the normality assumption on the innovations. Introduced in 1978, quantile regression was developed to estimate quantiles in a linear regression model; it was later extended in [27] to the autoregressive case with AR(p) models.


Thereafter, researchers have also proposed various nonlinear quantile autoregressive models [3,13,19,31]. However, most of the aforementioned methods only work with stationary, or at least locally stationary, time series. To extend them to the nonstationary context, additional assumptions regarding the form of trend and seasonality must be made, as in [20,33], which further limits their practicality.

Thus, one aim of this paper is to provide a single-step non-parametric solution for quantile forecasting. We then argue that it serves as a data-driven approach that works with time-correlated, or even nonstationary, demand data to make inventory decisions. Note that "stationarity" has two common interpretations: the formal mathematical definition of a stationary process used in statistics, and the inventory-literature usage in which the demand distributions do not change over time (i.i.d.). In this study, the mathematical definition is used. To the best of our knowledge, this is the first study in the data-driven inventory management field that deals with a general autoregressive demand process of unknown form. In addition, we show that the myopic policy remains optimal under autoregressive demand, without requiring the demand process to be stochastically increasing as in the previous literature. Moreover, compared with the existing neural network-based methods for quantile prediction by [36] and [43], our method is capable of dealing with nonstationary time series, and it does not require previous quantile values as input, which are not observable in practice. In fact, this is also the first time that a nonstationary time series can be handled without prespecified forms of seasonality or trend.

The rest of this paper is organized as follows. In Section 2, the details of our proposed method, denoted DPFNN-Based Quantile Autoregression (DPFNN-QAR), are presented, together with its application to stationary time series. In Section 3, the application of quantile autoregression as a data-driven approach to the newsvendor problem is discussed, and its efficiency is illustrated with nonstationary real-world case studies. In Section 4, the extension to the multi-period inventory control problem is considered, in which excess inventory and backlogged demand can be carried over to the next period. Section 5 contains concluding remarks and directions for future investigation.

2. DPFNN-based quantile autoregression

2.1. The time series model

We extend the restrictive linear AR(p) model for time series and consider a more general autoregressive model to define a process with a potentially complicated nonlinear structure:

Y_t = g(Y_{t-1}, ..., Y_{t-p}) + ε_t,    (1)

where g(·) can be any continuous function of unknown form, and the random innovations {ε_t} are i.i.d. with mean 0 and unknown variance σ², but are not necessarily normally distributed. Note that this is a stronger assumption than the white-noise innovations assumed in traditional linear regression and autoregression models. While the independence assumption can be relaxed to uncorrelatedness, the identical distribution of ε_t is critical in the quantile case, allowing us to consider the effect of a covariate on the entire distribution of Y, not merely its conditional mean. Let Z_τ denote the τth quantile of the common cumulative distribution function of {ε_t}, i.e., Pr(ε_t ≤ Z_τ) = τ. The τth conditional quantile of Y_t is then

Q_{Y_t}(τ | y_{t-1}, ..., y_{t-p}) = g(y_{t-1}, ..., y_{t-p}) + Z_τ.    (2)

However, neither g(·) nor Z_τ is known in practice, and they are conventionally estimated separately in two steps. For instance, given a predefined parametric form of g(·), ordinary least squares (OLS) can be used to estimate its parameters, and Z_τ can then be calculated from the forecast errors by assuming normality. With the quantile regression technique, however, we can avoid the normality assumption on ε_t. Moreover, by taking advantage of the universal approximation capability of neural networks [21,22], the approach proposed below is nonparametric in spirit.

2.2. Structure of DPFNN

In this paper, we propose to use a special neural network structure, the double parallel feedforward network (DPFNN), to approximate (2). A DPFNN is essentially a standard three-layer feedforward network (FNN) with shortcuts directly from the input nodes to the output node. The Universal Approximation Theorem for standard FNNs [21,22] extends easily to DPFNNs, since each FNN is a special case of a corresponding DPFNN in which all weights on the shortcuts from inputs to output are set to zero. Thus, the approximation to Eq. (2) can be made arbitrarily close. This structure is better at capturing linear mappings than the classical FNN configuration, while remaining sensitive to nonlinear relationships [40]. Moreover, the analysis of [28] and [37] reveals that while FNN-based autoregressive models are asymptotically stationary, adding the shortcuts allows them to model integrated time series. The results of our numerical experiments also suggest that a DPFNN learns the seasonality in a time series. We therefore consider the DPFNN suitable for modeling potentially nonstationary time series. From now on, we assume that Eq. (2) can be represented by a DPFNN denoted H(·; θ_0):

Q_{Y_t}(τ | y_{t-1}, ..., y_{t-p}) = g(y_{t-1}, ..., y_{t-p}) + Z_τ = H(y_{t-1}, ..., y_{t-p}; θ_0) = H(x_t; θ_0),

where X_t = (Y_{t-1}, ..., Y_{t-p}) and x_t denotes its observed realization.

Fig. 1. A general DPFNN model.

A general DPFNN structure considered in this paper is shown in Fig. 1: a three-layer network consisting of p input nodes, m hidden nodes and one output. The weight matrix connecting the input layer and the hidden layer is denoted W_{p×m}, where w_{ij} is the weight on the link from input node i to hidden node j. Likewise, we denote the weight vector from the hidden layer to the output node by u = (u_1, u_2, ..., u_m)^T and that from the input layer to the output node by v = (v_1, v_2, ..., v_p)^T. Finally, we use


b^h = (b^h_1, b^h_2, ..., b^h_m)^T and b^o to denote the biases at the hidden layer and the output node, respectively. For notational simplicity, we stack all these parameters together as θ. Furthermore, an activation function f(·), the sigmoid function in this paper, is used at all hidden nodes. The output of this network is

H(x_t; θ) = Σ_{i=1}^{p} y_{t-i} v_i + Σ_{j=1}^{m} f( Σ_{i=1}^{p} y_{t-i} w_{ij} + b^h_j ) u_j + b^o.

According to this structure, many widely used time series models are special cases of the DPFNN. For instance, an AR(p) model is simply a DPFNN with p input nodes and no hidden nodes; moreover, a DPFNN whose only nonzero weight is on the shortcut from node y_{t-s} (s < p) to the output node is a seasonal AR(1)_s model. On the whole, a DPFNN can be viewed as the combination of a simple stationary FNN and a seasonal AR(p) model, which is the intuition behind using the DPFNN for certain nonstationary time series.
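To make the architecture concrete, the following is a minimal NumPy sketch of the DPFNN output defined above; the function and variable names are ours, and the weights are random placeholders rather than trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpfnn_output(x_lags, W, u, v, b_h, b_o):
    """DPFNN forward pass:
    linear shortcuts   sum_i x_i * v_i
    + hidden layer     sum_j f(sum_i x_i * W[i, j] + b_h[j]) * u_j
    + output bias      b_o,
    where x_lags = (y_{t-1}, ..., y_{t-p})."""
    hidden = sigmoid(x_lags @ W + b_h)        # m hidden activations
    return float(x_lags @ v + hidden @ u + b_o)

# Tiny usage example with p = 2 lags and m = 3 hidden nodes.
rng = np.random.default_rng(0)
p, m = 2, 3
W, u = rng.normal(size=(p, m)), rng.normal(size=m)
v, b_h, b_o = rng.normal(size=p), np.zeros(m), 0.0
print(dpfnn_output(np.array([101.3, 99.8]), W, u, v, b_h, b_o))
```

With the hidden-to-output weights u set to zero, the output reduces to the linear autoregressive part carried by the shortcuts, matching the special cases discussed above.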

2.3. DPFNN-based quantile autoregression

Once the structure of a DPFNN is given (p and m can be selected by cross-validation, as is common in the neural network literature), our goal is to determine the value of θ_0 based on the historical realizations of the time series. From the definition of a quantile,

θ_0 ∈ argmin_θ E_{Y_t | x_t}[ρ_τ(Y_t − H(x_t; θ))]  for all x_t,    (3)

where ρ_τ(·) is the usual check function, given as

ρ_τ(u) = τ u if u > 0,  (τ − 1) u if u ≤ 0.    (4)

Following the standard treatment in [26], an estimator θ̂_N can be chosen such that the following empirical analogue of the expected loss in Eq. (3) is minimized:

TC(θ) = (1 / (N − p)) Σ_{t=p+1}^{N} ρ_τ(y_t − H(x_t; θ)).    (5)

Subsequently, an estimator of the conditional quantile is Q̂_{Y_t}(τ | y_{t-1}, ..., y_{t-p}) = H(x_t; θ̂_N).

2.4. Huber approximation

As in all neural network training, the loss function (5) is not convex. Stochastic gradient-based optimization methods (e.g., Adam), though not guaranteeing convergence to a global optimum, have been empirically found to outperform other methods in such settings [25]. However, an obstacle to applying these widely used gradient-based methods to (5) is that it is not differentiable everywhere. We therefore follow the treatment in [9] and approximate the error function using the finite smoothing method of [12]. Accordingly, the following cost function is employed in our experiments instead of (5):

TĈ(θ) = Σ_{t=p+1}^{N} ρ̂_τ(y_t − H(x_t; θ)),

where the following Huber function is used to approximate the check function (4):

h(u) = u²/(2ϵ) if 0 ≤ |u| ≤ ϵ,  |u| − ϵ/2 if |u| > ϵ,    (6)

for some small constant ϵ, and

ρ̂_τ(u) = τ h(u) if u > 0,  (1 − τ) h(u) if u ≤ 0.

Training cycles are repeated with decreasing values of ϵ. Moreover, [12] showed that as ϵ approaches zero, the algorithm converges to the minimum of the original error function.

2.5. Quantile prediction for weakly stationary time series

When the time series is weakly stationary and the general regularity conditions hold (i.e., Y_t is ergodic, the search space for θ is compact, etc.), the uniform convergence of H(·, θ̂_N) to H(·, θ_0) can be verified by an argument similar to [13] and Theorem 2.2 of [42]. A simulated stationary time series is used to numerically demonstrate the efficiency of the DPFNN-QAR estimators, as presented below. All algorithms in this paper are coded in Python 3.5, using TensorFlow 1.0 to model the neural network structure. The Adam optimizer [25] is used with learning rate (N − p) × 10^{-4} to tune the weights, where N denotes the number of samples in the training set. The algorithm terminates when a maximum of 20,000 epochs is reached or the relative change of the loss function falls below 10^{-9}. The constant ϵ in (6) is initially set to 2^{-5} and halved every 500 training epochs.

We now verify the efficiency of our method numerically. We first generate data from the following AR(2) model:

a_t = 1.2 a_{t-1} − 0.32 a_{t-2} + ε_t

with ε_t i.i.d. N(0, 7²), and then set y_t = a_t + 100. With this complete information, the true quantiles are easy to compute. We initialize the generation with a_1 = ε_1 and a_2 = 1.2 a_1 + ε_2, and the remaining sequence follows the formula above. We discard a warm-up period of 500 points and keep the following 500 points. This process is ergodic and stationary, since the roots of its characteristic equation, 2.5 and 1.25, are both greater than one. The first 400 points were used for model selection and training, and on the remaining testing set of 100 data points we compare DPFNN-QAR forecasts with the true quantiles.

The DPFNN structure is selected via Monte Carlo cross-validation (MCCV) in the following manner:

1. For each value of (p, m), the data are arranged into 500 − p records of the form (x_t, y_t) for t = p + 1, ..., 500, and the last 100 records are reserved as the testing set;
2. 80% of the records are randomly selected from the training set, and the DPFNN is trained with τ = 0.5;
3. The x_t from the remaining 20% validation set are fed to the trained network, and the total quantile loss is calculated on the validation set;
4. Steps 2 and 3 are repeated 10 times, and the average cost is calculated for each (p, m) combination;
5. The (p, m) pair with the lowest average cost is selected.

The configuration with the smallest average quantile loss, p = 4 and m = 1, is selected and used. Three random sample paths are generated and different quantiles are targeted. Table 1 summarizes the percentage gaps between the DPFNN-QAR estimators and the true quantiles. The gaps are negligible in many cases, and the average quantile costs of the predictions are even lower in a few cases, indicating good estimation accuracy. Fig. 2 tells the same story: the predicted quantiles (the blue curve) almost overlap the true values (the orange curve), suggesting good prediction accuracy.
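As an illustration of this setup, the sketch below generates one sample path of the AR(2) process and computes the true conditional τ-quantile, which is available in closed form here because the innovations are N(0, 7²); the variable names are ours and the seed is arbitrary.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
sigma, warmup, keep = 7.0, 500, 500

# Simulate a_t = 1.2 a_{t-1} - 0.32 a_{t-2} + eps_t, then y_t = a_t + 100.
eps = rng.normal(0.0, sigma, size=warmup + keep)
a = np.zeros(warmup + keep)
a[0] = eps[0]
a[1] = 1.2 * a[0] + eps[1]
for t in range(2, warmup + keep):
    a[t] = 1.2 * a[t - 1] - 0.32 * a[t - 2] + eps[t]
y = a[warmup:] + 100.0                      # the 500 retained observations

def true_quantile(y_prev1, y_prev2, tau):
    """tau-quantile of y_t given the last two observations:
    conditional mean of the AR(2) part plus the Gaussian innovation quantile."""
    mean = 1.2 * (y_prev1 - 100.0) - 0.32 * (y_prev2 - 100.0) + 100.0
    return mean + sigma * norm.ppf(tau)

def check_loss(y_true, q_pred, tau):
    """Average quantile (check) loss used to score the forecasts."""
    u = y_true - q_pred
    return np.mean(np.where(u > 0, tau * u, (tau - 1) * u))

q70 = np.array([true_quantile(y[t - 1], y[t - 2], 0.7) for t in range(2, keep)])
print(check_loss(y[2:], q70, 0.7))          # loss attained by the true quantiles
```

A forecaster whose average check loss is close to that of the true quantiles, as in Table 1, is effectively recovering the conditional quantile.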

Fig. 2. Real and predicted quantiles of simulated weakly stationary data.

Table 1. Average quantile cost differences between DPFNN-QAR predictions and real quantiles.

τ      Sample 1   Sample 2   Sample 3
0.3     1.1%       0.9%      -0.4%
0.5    -0.3%       0.0%      -0.3%
0.7     1.7%       1.4%       0.1%

3. Data-driven newsvendor problem

In this section, the application of DPFNN-based quantile forecasting (DPFNN-QAR) to inventory management is considered, starting with the newsvendor problem, one of the most fundamental stochastic inventory models.

3.1. Problem statement and a data-driven solution

The key element of this model is that the decision maker has a single opportunity to place an order before the random demand is observed; no excess inventory can be carried to the next period, and all unmet demand is lost [35]. It has broad applications in stock-level management for perishable products and related settings (e.g., newspapers, fresh produce, hotel and airline overbooking, and fashion goods). Specifically, we consider the newsvendor problem solved repeatedly in successive periods. At each iteration, the manager sets the inventory level based on previous sales (we assume sales are uncensored demand). The following elements are taken into consideration:

• Decisions: S_t, the order-up-to inventory level at period t, assuming immediate delivery.
• Variables: D_t, the nonnegative random demand in period t; d_t denotes its realized value.
• Parameters:
  H_t: history of the process up to the beginning of period t, based on which the manager determines S_t, e.g., H_t = (d_1, d_2, ..., d_{t-1}, S_1, ..., S_{t-1});
  c: constant ordering cost per unit;
  h: holding cost per unit of excess inventory at hand at the end of each period after demand is met;
  b: per-unit understock cost (e.g., lost sales plus penalty for unmet demand).

To achieve the minimal expected total cost, it is well known that the optimal order-up-to level is given by the critical quantile (see [35], Section 4.4.2)

S*_t = F^{-1}_{D_t | H_t}((b − c) / (h + b)),

where F_{D_t | H_t}(·) denotes the cumulative distribution function (cdf) of D_t given H_t. While most of the newsvendor literature assumes that this cdf is unchanged and independent over time, we propose to further exploit the internal structure of the demand process by assuming that it follows the autoregressive model defined in (1). Hence, determining the ordering quantities boils down to finding the quantile (2) of the demand process with τ = (b − c)/(h + b), and it is thus natural to use the DPFNN-QAR estimators to make the inventory decisions. Note that since the network is selected and trained with historical demand observations and its output is used directly as the decision, this is essentially a data-driven approach to the newsvendor problem. Our method also differs from the recently introduced method of [7] in that it does not require a restrictive linear relationship between demand and other exogenous features.

3.2. Numerical experiments with real-world time series

In Section 2, we showed that the DPFNN-QAR estimator is accurate when the underlying process is stationary. However, most real-world processes are nonstationary; in particular, there can be seasonality or trend. The existing literature either preprocesses the data by deseasonalizing and detrending or adds terms to capture these patterns. With DPFNN-QAR, in contrast, we can learn a nonstationary time series without either of these treatments. Although we sacrifice some theoretical guarantees by not specifying the seasonality and trend, we can demonstrate the practical potential of the new method on real-world data against some popular existing algorithms.

Two time series are used for demonstration. The first comes from the University of Wisconsin Dairy Marketing and Risk Management Program maintained by Prof. Brian W. Gould of the Dept. of Agricultural and Applied Economics. The series contains the monthly regular ice cream production (measured in thousand gallons) in the US. Data from January 1983 to January 2017 are selected, containing 409 observations, of which the first 360 were used for model selection and training and the remainder for testing. The other dataset contains the monthly gasoline demand (measured in million gallons) in Ontario, Canada, from January 1960 to December 1975. We obtained these 192 values from Datamarket https://datamarket.com/data/set/22of/monthly-gasoline-demand-ontario-gallon-millions-1960-1975#!ds=22of&display=line. The first 143 points are used for training and validation, and the selected model is tested on the remaining 49 months of data. As shown in Figs. 3 and 4, both time series exhibit annual seasonality, and there is an obvious rising trend in the gasoline demand. Although newsvendor decisions are generally made at the store or warehouse level, these nationwide time series should still be representative of some demand patterns and provide useful insights.

Fig. 3. Ice-cream demands.

Fig. 4. Gasoline demands.

It is under these nonstationary real-world time series that we can observe the major difference between our method and the quantile regression neural networks (QRNN) proposed by [36] and [9]. While DPFNN captures the trend and seasonality, the original QRNN fails and produces estimates that are almost constant over time at the sample quantile of the training data. The details of the QRNN results are omitted as they are far from accurate. Instead, we compare against a closely related and intuitive method: first remove trend and seasonality by differencing, train a QRNN model on the stationarized series, and finally convert back to the original scale. We denote this method QRNN with differencing (QRNN-D). The orders of both the simple and the seasonal differences are set to one in our case, and the stationarity of the two transformed time series is validated by the Augmented Dickey–Fuller test. Likewise, the structure of the regression neural network is chosen via cross-validation, where p = 18 and m = 3 are selected for both time series.

Due to the cyclical pattern of these two time series, we further select the widely used Holt-Winters' triple exponential smoothing methods, which are suitable for forecasting time series that exhibit both trend and seasonality [33,35], to illustrate the prediction power of our model. Depending on the assumed process structure, there are two versions of the Holt-Winters' formulation, Additive Holt-Winters' (HWA) and Multiplicative Holt-Winters' (HWM). HWA assumes a linear trend and an additive seasonal component:

Level:        L_t = α (X_t − S_{t-k}) + (1 − α)(L_{t-1} + T_{t-1}),
Trend:        T_t = β (L_t − L_{t-1}) + (1 − β) T_{t-1},
Seasonality:  S_t = γ (X_t − L_t) + (1 − γ) S_{t-k},    (7)

where k denotes the length of the cyclical period and must be specified in advance. The one-step-ahead forecast of the time series and the forecasting error are then given by

X̂_t = L_{t-1} + T_{t-1} + S_{t-k}    (8)

and

e_t = X_t − X̂_t.    (9)

For HWM, while Eqs. (7) and (9) remain unchanged, the other equations are modified to

Level:        L_t = α (X_t / S_{t-k}) + (1 − α)(L_{t-1} + T_{t-1}),
Seasonality:  S_t = γ (X_t / L_t) + (1 − γ) S_{t-k},

and X̂_t = (L_{t-1} + T_{t-1}) S_{t-k}. A common practice in time series analysis is to find the parameters α, β, γ that minimize the sum of squared forecast errors (SSE), that is,

min_{α,β,γ} Σ_{t=2}^{N} e_t²    s.t.  0 ≤ α, β, γ < 1.
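For concreteness, here is a minimal sketch of the additive Holt-Winters recursion (7)–(9) and its SSE objective; the initialization of the level, trend and seasonal terms is a simple assumption of ours (the paper defers those details to [33]), and the function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def hwa_sse(params, x, k):
    """Sum of squared one-step-ahead errors of additive Holt-Winters
    for smoothing parameters (alpha, beta, gamma) and cycle length k."""
    alpha, beta, gamma = params
    # Naive initialization (assumption): first-cycle averages.
    level = x[:k].mean()
    trend = (x[k:2 * k].mean() - x[:k].mean()) / k
    season = list(x[:k] - level)
    sse = 0.0
    for t in range(k, len(x)):
        forecast = level + trend + season[t - k]            # Eq. (8)
        sse += (x[t] - forecast) ** 2                       # Eq. (9)
        new_level = alpha * (x[t] - season[t - k]) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season.append(gamma * (x[t] - new_level) + (1 - gamma) * season[t - k])
        level = new_level                                   # Eq. (7)
    return sse

# Example: fit (alpha, beta, gamma) on a toy monthly series (k = 12)
# with the bounded optimizer mentioned in the text (L-BFGS-B).
x = 100 + 10 * np.sin(np.arange(120) * 2 * np.pi / 12) + 0.3 * np.arange(120)
res = minimize(hwa_sse, x0=[0.3, 0.1, 0.1], args=(x, 12),
               method="L-BFGS-B", bounds=[(0.0, 0.99)] * 3)
print(res.x)
```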

Then, a quantile is obtained by assuming that the time series has i.i.d. normal innovations, with the unknown variance estimated from the observed residuals. We refer the reader to [33] for details and for the initialization of these two algorithms. Later, [1] proposed quantile versions of the Holt-Winters methods, denoted QHWA and QHWM respectively, replacing the MSE criterion by the quantile loss just as we do in our method. These methods are also implemented and compared with DPFNN-QAR. All of the benchmark algorithms are implemented in Python 3.5, and their objective functions are optimized by L-BFGS-B.

3.3. Results

Based on the results from cross-validation, a DPFNN with p = 24 and m = 4 is employed to model the ice cream time series; similarly, p = 23 and m = 1 are taken for the gasoline series. We then train the selected models and test them on the reserved testing sets. Here, the predictions of the 0.8th quantile of the ice cream demand and the 0.4th quantile of the gasoline demand are shown as examples in Figs. 5 and 6, respectively.

For the first dataset, the real time series together with the predictions of the different methods are shown in Fig. 5. The performance of QRNN-D is significantly worse than all other methods, with much higher costs as given in Table 2. Two reasons may have contributed to this failure: as shown in Fig. 3, there is no steady linear trend in the time series, so the order-one differencing transformation may have overfitted the training data; and, as discussed earlier, quantile estimation requires i.i.d. innovations, an assumption rejected by the runs test as listed in Table 4.
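To make the link between the forecasting metric and the inventory decision explicit, the sketch below turns a predicted quantile into an order-up-to level and evaluates the realized single-period newsvendor cost; the cost parameters and the placeholder order level are hypothetical examples of ours, not values used in the paper.

```python
def newsvendor_tau(b, h, c):
    """Critical fractile tau = (b - c) / (h + b) from Section 3.1."""
    return (b - c) / (h + b)

def newsvendor_cost(order_up_to, demand, b, h, c):
    """Ordering + holding + understock cost of one period (no carry-over).
    Algebraically this equals c*demand + (h + b) * check_loss(demand, order, tau),
    so the decision-dependent part is exactly a scaled quantile loss."""
    over = max(order_up_to - demand, 0.0)
    under = max(demand - order_up_to, 0.0)
    return c * order_up_to + h * over + b * under

# Hypothetical cost parameters: understock is three times as costly as holding.
b, h, c = 6.0, 2.0, 1.0
tau = newsvendor_tau(b, h, c)            # 0.625 here
# The order-up-to level is the DPFNN-QAR prediction of the tau-quantile,
# S_t = H(x_t; theta_hat); a placeholder number is used below.
S_t, realized_demand = 108.4, 112.0
print(tau, newsvendor_cost(S_t, realized_demand, b, h, c))
```

Since the term c·D_t does not depend on the ordering decision, comparing methods by their average check loss at τ = (b − c)/(h + b), as in Tables 2 and 3, ranks them exactly by the average newsvendor cost they would incur, which is the sense in which lower quantile loss translates into lower inventory cost.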

Table 3. Relative changes of quantile loss for the gasoline dataset.

Benchmark algorithm   τ = 0.2    τ = 0.4    τ = 0.5    τ = 0.6    τ = 0.8
HWA                    -2.23%     -8.60%     -9.26%     -6.27%      4.16%
HWM                     7.78%     -3.32%     -6.65%     -6.66%      7.69%
QHWA                  -10.85%     -9.87%     -9.06%    -10.06%    -32.56%
QHWM                  -13.22%    -11.09%    -10.71%    -11.04%    -28.77%
QRNN-D                -73.18%    -66.99%    -66.76%    -67.58%    -73.20%

Fig. 5. Ice-cream demand and predictions.

Fig. 6. Gasoline demand and predictions.

Table 2. Relative changes of quantile loss for the ice cream dataset.

Benchmark algorithm   τ = 0.2    τ = 0.4    τ = 0.5    τ = 0.6    τ = 0.8
HWA                    -5.53%    -13.20%    -12.57%    -11.60%    -19.37%
HWM                    -4.62%    -12.34%    -13.01%    -12.90%    -24.06%
QHWA                  -32.07%    -15.38%    -13.83%    -17.57%    -44.58%
QHWM                  -34.55%    -15.20%    -12.67%    -15.41%    -41.97%
QRNN-D                -78.72%    -73.47%    -71.97%    -73.53%    -83.18%

All five of the other prediction curves lie above the real demand most of the time, as desired, since we penalize underestimation more than overestimation by setting τ greater than 0.5. The two curves corresponding to the traditional Holt-Winters' methods tend to overpredict compared with DPFNN-QAR, while the two quantile versions are inclined to underpredict. Overall, the DPFNN-QAR curve is the closest to the real process, indicating better prediction. In fact, the average quantile loss of the DPFNN-QAR predictions over the 49 test months is significantly lower than that of all benchmark algorithms. To further illustrate the efficiency of our method, we ran experiments for different quantiles; the relative changes of the average out-of-sample total costs, calculated as (average loss of DPFNN − average loss of benchmark) / (average loss of benchmark), are summarized in Table 2. The results show that DPFNN-QAR significantly beats the four benchmark methods. Note that the average quantile costs correspond to the average total costs under the newsvendor setting, which means that using the DPFNN-QAR estimators as the inventory decision can reduce cost significantly.

A similar experiment was conducted to predict the 0.4th quantile of the gasoline demand for 49 months, with the results shown in Fig. 6. Again, although the QRNN-D curve looks much better than on the first dataset, its performance is still not promising judging from the cost evaluation. One possible reason is that the differencing transformation does not capture the nonstationarity as effectively as the other models.

Table 4. p-values of statistical tests for i.i.d. residuals.

Dataset     Test                  DPFNN    QRNN-D    HWA      HWM
Ice-cream   Turning point test    0.818    0.818     0.818    0.645
Ice-cream   Runs test             0.183    0.002     0.311    0.663
Gasoline    Turning point test    0.565    0.206     0.908    0.908
Gasoline    Runs test             0.311    0.311     0.826    0.936

Once the orders of differencing are determined, the transformation assumes a fixed linear trend and seasonality, whereas these elements are updated with each new data point in the Holt-Winters' methods, and an even more flexible structure is allowed in the DPFNN. Even though other methods might stationarize these datasets better, the best choice of such a method is unknown in practice, and this is indeed the major shortcoming of such two-step methods. It is also seen that, although we now penalize overestimation more, all Holt-Winters predictions tend to overestimate the process more than our method does. Again, the out-of-sample average cost of DPFNN-QAR is much lower in most cases than that of the Holt-Winters methods, as listed in Table 3. Thus, DPFNN-QAR estimators are also good newsvendor solutions when the demand process is seasonal and increasing over time.

To check whether the i.i.d. assumption on the random innovations is satisfied, we first train all models using the MSE criterion. We then use the residuals as a proxy for the sample path of {ε_t} and perform the turning point test and the runs test in R. Both tests are frequently used for the null hypothesis that the residuals are i.i.d. Table 4 suggests that we fail to reject the null hypothesis under the DPFNN and Holt-Winters models, so the i.i.d. assumption is tenable.
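As an illustration of these residual diagnostics, the following is a simple sketch of the runs test (normal approximation) applied to a residual series; it is our own implementation for illustration, not the R routine used in the paper, and a two-sided p-value convention is assumed.

```python
import numpy as np
from scipy.stats import norm

def runs_test_pvalue(residuals):
    """Two-sided runs test on the signs of the residuals.
    Under H0 (i.i.d. residuals) the number of sign runs is approximately
    normal with the mean and variance below."""
    signs = np.sign(residuals[residuals != 0])
    n1 = np.sum(signs > 0)
    n2 = np.sum(signs < 0)
    runs = 1 + np.sum(signs[1:] != signs[:-1])
    mean = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    var = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / np.sqrt(var)
    return 2.0 * (1.0 - norm.cdf(abs(z)))

# Example: residuals with leftover seasonality produce fewer sign runs than expected.
rng = np.random.default_rng(1)
white = rng.normal(size=200)
seasonal_leftover = white + 1.5 * np.sin(np.arange(200) * 2 * np.pi / 12)
print(runs_test_pvalue(white), runs_test_pvalue(seasonal_leftover))
```

A small p-value, such as the 0.002 reported for the QRNN-D residuals on the ice cream data, indicates that the residuals are not exchangeable and that the i.i.d. innovation assumption behind the quantile extraction is violated.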

4. Multiperiod safety stock

We now consider the extension to the multi-period newsvendor scenario, where excess inventory is carried to the next period and unmet demand is backlogged. We introduce the following new notation:

• x_t: initial inventory at the beginning of period t; a negative x_t means backlogged demand.

• T: number of periods in the planning horizon.
• γ: discount factor used to calculate the present value of future costs.

Furthermore, x_1 = 0 is given. To facilitate the derivation of a closed-form ordering policy, it is further assumed that all remaining inventory at the end of the planning horizon can be returned to the supplier at the original price c, and that all backlogged demand is likewise satisfied at that cost. Our goal is to find a sequence of ordering quantities S̄ = (S_1, ..., S_T) over a planning horizon of length T that minimizes the expected discounted total cost

f_T(S̄) = E{ Σ_{t=1}^{T} γ^{t-1} [c(S_t − x_t) + g(S_t, D_t)] − γ^T c x_{T+1} },

where x_1 = 0 is known and, for t = 1, 2, ..., T,

x_{t+1} = S_t − D_t,
g(S_t, D_t) = h max(S_t − D_t, 0) + b max(D_t − S_t, 0).

A base-stock policy is proven optimal in [35]. However, the standard proof of optimality of the myopic policy no longer applies, since our demand process is not necessarily stochastically increasing. Instead, the optimal ordering policy must be re-established for our autoregressive demand process, based on the following result of [39].

Lemma 1 (Theorem 6.1 from [39]). If (a) for any fixed H_t, S̃_t = arg min_{S_t} E_{D_t | H_t}[W(S_t, D_t)]; and (b) under this policy it is always true that x_t ≤ S̃_t for t = 1, 2, ..., T − 1; then S̄* = (S̃_1, S̃_2, ..., S̃_T).

Theorem 1. If the cost at the end of the planning horizon is −c x_{T+1}, then the myopic base-stock quantity given by

S*_t = F^{-1}_{D_t | H_t}((b − (1 − γ)c) / (h + b)) = g(d_{t-1}, ..., d_{t-p}) + Z_τ = H(x_t; θ_0)

is optimal in each period.

Proof. Let W(S_t, D_t) = c S_t + g(S_t, D_t) − γ c (S_t − D_t). Then

f_T(S̄) = E{ Σ_{t=1}^{T} γ^{t-1} W(S_t, D_t) } = Σ_{t=1}^{T} γ^{t-1} E{ E_{D_t | H_t}[W(S_t, D_t)] }.

Our goal is to find S̄* = arg min_{S̄} f_T(S̄). We first solve for S̃_t as follows:

E_{D_t | H_t}[W(S_t, D_t)] = E_{D_t | H_t}[c S_t + g(S_t, D_t) − γ c (S_t − D_t)]
= E_{D_t | H_t}[(h + (1 − γ)c) max(S_t − D_t, 0) + (b − (1 − γ)c) max(D_t − S_t, 0)] + E_{D_t | H_t}[c D_t]
= E_{D_t | H_t}[(h + (1 − γ)c) max(S_t − D_t, 0) + (b − (1 − γ)c) max(D_t − S_t, 0)] + c g(d_{t-1}, ..., d_{t-p}),

which is similar to the single-period model plus a constant term. The minimizer is

S̃_t = F^{-1}_{D_t | H_t}((b − (1 − γ)c) / (h + b)) = g(d_{t-1}, ..., d_{t-p}) + Z_τ

with τ = (b − (1 − γ)c)/(h + b). Lastly, it remains to verify condition (b) in Lemma 1 to show that this myopic policy is optimal. Since demand is always nonnegative, the condition obviously holds when t = 1, and for t = 2, 3, ..., T − 1:

x_t = S̃_{t-1} − D_{t-1} = g(d_{t-2}, ..., d_{t-p-1}) + Z_τ − g(d_{t-2}, ..., d_{t-p-1}) − ε_{t-1} = Z_τ − ε_{t-1} ≤ Z_τ + g(d_{t-1}, ..., d_{t-p}) = S̃_t.

The inequality holds since demand is always nonnegative and the ε_t are i.i.d. The theorem also holds when T = ∞. Thus, the myopic policy is optimal in the multi-period scenario. □

Thus, we propose to use the DPFNN-QAR prediction of the ((b − (1 − γ)c)/(h + b))th quantile, Ŝ_t = H(x_t; θ̂) for all t, as the ordering quantities. Although there is some prediction error in practice, this policy remains implementable as long as the prediction is sufficiently accurate. For example, if |H(·; θ̂) − H(·; θ_0)| ≤ δ and demand is strictly bounded away from zero, i.e., D_t ≥ 2δ, then it is always possible to order up to H(x_t; θ̂) in every period, since

H(x_t; θ̂) − D_t ≤ H(x_{t+1}; θ̂)
⇔ H(x_t; θ̂) − (H(x_t; θ_0) − Z_τ + ε_t) ≤ H(x_{t+1}; θ̂) − H(x_{t+1}; θ_0) + H(x_{t+1}; θ_0)
⇔ (H(x_t; θ̂) − H(x_t; θ_0)) − (H(x_{t+1}; θ̂) − H(x_{t+1}; θ_0)) ≤ g(x_{t+1}) + ε_t,

and the left-hand side is at most 2δ.
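To illustrate how the myopic rule would be used operationally, the sketch below rolls the order-up-to policy forward with a generic quantile forecaster and accumulates the discounted cost of Section 4; the forecaster, the demand path, and the parameter values are placeholders of ours, not the trained DPFNN, and the max(x_t, Ŝ_t) guard simply keeps the order nonnegative when a forecast dips below the current inventory.

```python
import numpy as np

def simulate_myopic_policy(quantile_forecast, demands, b, h, c, gamma):
    """Multi-period newsvendor with the myopic base-stock rule.
    quantile_forecast(history, tau) returns the predicted tau-quantile of the
    next demand, playing the role of H(x_t; theta_hat) in the paper."""
    tau = (b - (1.0 - gamma) * c) / (h + b)        # critical fractile of Theorem 1
    x, total_cost, history = 0.0, 0.0, []
    for t, d in enumerate(demands):
        s = max(x, quantile_forecast(history, tau))   # order up to the forecast
        cost = c * (s - x) + h * max(s - d, 0.0) + b * max(d - s, 0.0)
        total_cost += gamma ** t * cost
        x = s - d                                      # carry-over or backlog
        history.append(d)
    # Terminal adjustment: leftover inventory returned (or backlog filled) at price c.
    total_cost -= gamma ** len(demands) * c * x
    return total_cost

# Placeholder forecaster: empirical quantile of the recent demand history.
def naive_quantile(history, tau):
    return float(np.quantile(history[-24:], tau)) if history else 100.0

demo = 100 + 10 * np.sin(np.arange(36) * 2 * np.pi / 12) \
       + np.random.default_rng(3).normal(0, 5, 36)
print(simulate_myopic_policy(naive_quantile, demo, b=6.0, h=2.0, c=1.0, gamma=0.99))
```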

5. Conclusion

The contribution of this paper is twofold. First, a novel neural network model (DPFNN-QAR) is proposed to forecast quantiles of a random process. The proposed model exploits the universal approximation capability of neural networks and the idea of quantile regression. Furthermore, by adding linear shortcuts to the neural network, the proposed method is, to the best of our knowledge, the first that captures both stationary and nonstationary time series without specifying the form of seasonality or trend in advance. The model is tested on simulated data and on two real time series. The results show that our method efficiently predicts quantiles of weakly stationary time series, and that it beats QRNN with differencing and improves on the popular Holt-Winters methods when working with nonstationary time series exhibiting trend and seasonality. Second, we show that the myopic policy is optimal under an autoregressive demand process, extending the result in [39] for stochastically increasing demand. We therefore propose the DPFNN-QAR estimators as a data-driven approach for solving the newsvendor problem and its multi-period extension.

References [1] A. Alexandre Trindade, Y. Xu, Quantile versions of Holt–Winters forecasting algorithms, J. Statist. Adv. Theory Appl. 5 (1) (2011) 15–35. [2] L.C. Alwan, M. Xu, D.-Q. Yao, X. Yue, The dynamic newsvendor model with correlated demand, Decis. Sci. 47 (1) (2016) 11–30. [3] A. Aue, R.C. Cheung, T.C. Lee, M. Zhong, et al., Piecewise quantile autoregressive modeling for nonstationary time series, Bernoulli 23 (1) (2017) 1–22. [4] G.-Y. Ban, The data-driven (s, S) policy: Why you can have confidence in censored demand data, 2015. [5] A. Bensoussan, M. Çakanyıldırım, S.P. Sethi, A multiperiod newsvendor problem with partially observed demand, Math. Oper. Res. 32 (2) (2007) 322–344. [6] D. Bertsimas, A. Thiele, A Data-Driven Approach to Newsvendor Problems, Technical Report, Massechusetts Institute of Technology, Cambridge, MA, 2005. [7] A.-L. Beutel, S. Minner, Safety stock planning under causal demand forecasting, Int. J. Prod. Econ. 140 (2) (2012) 637–645. [8] A. Bisi, K. Kalsi, G. Abdollahian, A non-parametric adaptive algorithm for the censored newsvendor problem, IIE Trans. 47 (1) (2015) 15–34. [9] A.J. Cannon, Quantile regression neural networks: Implementation in R and application to precipitation downscaling, Comput. Geosci. 37 (9) (2011) 1277–1284. [10] E. Carrizosa, A.V. Olivares-Nadal, P. Ramírez-Cobo, Robust newsvendor problem with autoregressive demand, Comput. Oper. Res. 68 (2016) 123–133. [11] C. Chandra, J. Grabis, Application of multi-steps forecasting for restraining the bullwhip effect and improving inventory performance under autoregressive demand, Eur. J. Oper. Res. 166 (2) (2005) 337–350.


[12] C. Chen, A finite smoothing algorithm for quantile regression, J. Comput. Graph. Statist. 16 (1) (2007) 136–164. [13] X. Chen, R. Koenker, Z. Xiao, Copula-based nonlinear quantile autoregression, Econom. J. 12 (s1) (2009) S50–S67. [14] L. Dong, H.L. Lee, Optimal policies and approximations for a serial multiechelon inventory system with time-correlated demand, Oper. Res. 51 (6) (2003) 969–980. [15] R. Gélinas, P. Lefrançois, On the estimation of time-series quantiles using smoothed order statistics, Int. J. Forecast. 9 (2) (1993) 227–243. [16] G.A. Godfrey, W.B. Powell, An adaptive, distribution-free algorithm for the newsvendor problem with censored demands, with applications to inventory and distribution, Manage. Sci. 47 (8) (2001) 1101–1112. [17] S.C. Graves, A single-item inventory model for a nonstationary demand process, Manuf. Serv. Oper. Manage. 1 (1) (1999) 50–61. [18] D.N. Gujarati, Basic Econometrics, Tata McGraw-Hill Education, 2009. [19] M. Guo, Z. Bai, H.Z. An, Multi-step prediction for nonlinear autoregressive models based on empirical distributions, Statist. Sinica (1999) 559–570. [20] J.C. Hoff, A Practical Guide to Box-Jenkins Forecasting, Lifetime Learning Publications, 1983. [21] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Netw. 4 (2) (1991) 251–257. [22] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Netw. 2 (5) (1989) 359–366. [23] W.T. Huh, R. Levi, P. Rusmevichientong, J.B. Orlin, Adaptive data-driven inventory control with censored demand based on Kaplan–Meier estimator, Oper. Res. 59 (4) (2011) 929–941. [24] M. Khouja, The single-period (news-vendor) problem: literature review and suggestions for future research, Omega 27 (5) (1999) 537–553. [25] D. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014, arXiv preprint arXiv:1412.6980. [26] R. Koenker, G. Bassett Jr., Regression quantiles, Econometrica (1978) 33–50. [27] R. Koenker, Z. Xiao, Quantile autoregression, J. Amer. Statist. Assoc. 101 (475) (2006) 980–990. [28] F. Leisch, A. Trapletti, K. Hornik, On the stationarity of autoregressive neural network models, in: SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, 1998. [29] R. Levi, G. Perakis, J. Uichanco, The data-driven newsvendor problem: new bounds and insights, Oper. Res. 63 (6) (2015) 1294–1306.

[30] R. Levi, R.O. Roundy, D.B. Shmoys, Provably near-optimal sampling-based policies for stochastic inventory control models, Math. Oper. Res. 32 (4) (2007) 821–839. [31] X. Liu, Markov switching quantile autoregression, Stat. Neerl. 70 (4) (2016) 356–395. [32] L.H. Liyanage, J. Shanthikumar, A practical inventory control policy using operational statistics, Oper. Res. Lett. 33 (4) (2005) 341–348. [33] R. Pan, Holt–winters exponential smoothing, in: Wiley Encyclopedia of Operations Research and Management Science, Wiley Online Library, 2010. [34] C. Shi, W. Chen, I. Duenyas, Nonparametric data-driven algorithms for multiproduct inventory systems with censored demand, Oper. Res. 64 (2) (2016) 362–370. [35] L.V. Snyder, Z.-J.M. Shen, Fundamentals of Supply Chain Theory, John Wiley & Sons, 2011. [36] J.W. Taylor, A quantile regression neural network approach to estimating the conditional density of multiperiod returns, J. Forecast. 19 (4) (2000) 299–311. [37] A. Trapletti, F. Leisch, K. Hornik, Stationary and integrated autoregressive neural network processes, Neural Comput. 12 (10) (2000) 2427–2450. [38] F.-M. Tseng, H.-C. Yu, G.-H. Tzeng, Combining neural network model with seasonal time series ARIMA model, Technol. Forecast. Soc. Change 69 (1) (2002) 71–87. [39] A.F. Veinott Jr., Optimal policy for a multi-product, dynamic, nonstationary inventory problem, Manage. Sci. 12 (3) (1965) 206–222. [40] J. Wang, W. Wu, Z. Li, L. Li, Convergence of gradient method for double parallel feedforward neural network, Int. J. Numer. Anal. Model. 8 (2011) 484–495. [41] S.C. Wheelwright, S.G. Makridakis, et al., Forecasting Methods for Management, Wiley, 1973. [42] H. White, Nonparametric estimation of conditional quantiles using neural networks, in: Proceedings of 23rd Symposium on the Interface, in: Computer Science and Statistics, 1992, p. 190. [43] Q. Xu, X. Liu, C. Jiang, K. Yu, Quantile autoregression neural network model with applications to evaluating value at risk, Appl. Soft Comput. 49 (2016) 1–12. [44] Y. Zhang, V. Vovk, W. Zhang, Probability-free solutions to the nonstationary newsvendor problem, Ann. Oper. Res. 223 (1) (2014) 433–449.