Copyright © IFAC Adaptation and Learning in Control and Signal Processing, Cernobbio-Como, Italy, 2001
APPROXIMATING NETWORKS FOR THE SOLUTION OF T-STAGE STOCHASTIC OPTIMAL CONTROL PROBLEMS 1

Marco Baglietto*, Cristiano Cervellera*, Thomas Parisini**, Marcello Sanguineti*, Riccardo Zoppoli*
* Dept. of Communications, Computer and System Sciences, DIST-University of Genova, Via Opera Pia 13, 16145 Genova, Italy
** Dept. of Electronic Engineering and Information Sciences, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy
Abstract: The approximate solution of a general T-stage stochastic optimal control problem is considered. It is known that discretizing the state components uniformly may lead dynamic programming to incur the "curse of dimensionality". Approximating networks, i.e., linear combinations of parametrized basis functions provided with density properties in suitable normed spaces, are then defined and used in two approximate methods (examples of such networks are feedforward neural networks with one hidden layer, radial basis functions, etc.). The first method consists in approximating the optimal cost-to-go functions in dynamic programming. The second method reduces the original functional optimization problem to a nonlinear programming one that is solved by means of stochastic approximation. Approximating networks of suitable types enjoy the property that, for the approximation of members of some classes of smooth functions, the number of parameters to be optimized and the number of samples to be used grow moderately with the dimensions of the arguments of the functions. The two methods are tested and compared on a test problem involving a 10-dimensional state vector. Copyright © 2001 IFAC

Keywords: Neural Networks, Dynamic Programming, Stochastic Approximation, Optimal Control, Applied Neural Control
1. INTRODUCTION

A general form of T-stage stochastic optimal control problem is addressed. The discrete-time dynamic system is given by

x_{t+1} = f(x_t, u_t, \xi_t),  t = 0, 1, \ldots, T-1,   (1)

where x_t \in \mathbb{R}^n is the state vector (x_0 = x is a given initial state), u_t \in U_t \subseteq \mathbb{R}^m is the control vector (U_t is the set of admissible controls), and \xi_t \in \mathbb{R}^q is a random noise. The state is assumed to be perfectly measurable and the stochastic vectors \xi_t, t = 0, \ldots, T-1, to be characterized by a known probability density function. The cost function is given by

J = \sum_{t=0}^{T-1} h(x_t, u_t, \xi_t) + h_T(x_T).   (2)

1 The work was supported in part by the Italian Ministry for the University and Research (MURST) Project "Identification and control of industrial systems".
The following optimal control problem is stated:

Problem P. Find the sequence of optimal control functions u_t^\circ = \mu_t^\circ(x_t), t = 0, 1, \ldots, T-1, that minimize the expected cost J(x_0) = E_{\xi_0, \ldots, \xi_{T-1}}[J].

It is common knowledge that Problem P can be solved analytically by dynamic programming (DP) only if suitable hypotheses are verified (typically, if Problem P is stated in the LQ framework and if the random vectors are mutually independent). In the general case, one has to look for approximate solutions. This is usually done by discretizing the state space; in such a way, the functional equation that defines the DP procedure has to be solved only at a finite number of state values. Given a fixed number d of discretization levels for each component of the state vector, the use of a full uniform grid makes the total number of discretization points grow as d^n. This exponential growth leads to the "curse of dimensionality" (Bellman, 1957) and limits the applicability of DP to small dimensions.

The efforts to cope with the curse of dimensionality have followed two main directions: reduction of the complexity of the problem by using simpler models, and use of smart approximators for the cost-to-go functions in order to reduce the number of grid points. As for the first direction, one can cite the aggregation/disaggregation of grid points to diminish the dimension of the state space (see, for example, (Turgeon, 1981) in the reservoir literature) and the partition of the original problem into smaller separable problems (see (Archibald et al., 1997) and the references therein). The major drawback of these methods lies in their lack of generality, since they are highly dependent on the particular problem they are applied to (in the aforementioned works, reservoir operation). As for the second direction, it is possible to trace an evolution starting from polynomial approximation (see, for example, (Bellman et al., 1963)) and continuing with cubic Hermite polynomials (Foufoula-Georgiou and Kitanidis, 1988) and spline interpolation (Johnson et al., 1993). In (Johnson et al., 1993), the use of cubic splines, instead of tensor-product linear interpolants, has made it possible to solve a water supply problem up to dimension five. Although these strategies permit the use of fewer grid points to obtain a given precision in the approximation of the optimal cost-to-go functions, they still need a uniform discretization of the state space; hence the curse of dimensionality is ultimately not avoided, and their use remains impracticable for large values of n.

Recently, an approximating architecture using multivariate adaptive regression splines and based on a statistical perspective has been proposed. This approach leads to a discretization of the state space different from the uniform grid: only the values of the cost-to-go functions corresponding to the points belonging to a subset of the full uniform grid, with complexity growing as O(n^3), have to be computed. This polynomial growth makes it possible to deal (for the first time) with high values of n without any need of introducing a simpler model. Such an efficient choice of the discretization points has enabled the solution of an inventory forecasting problem with up to nine state variables (Chen et al., 1999).

In this paper two approaches to solve Problem P in its general form and in high-dimensional settings are proposed. Both of them are based on the use of approximating networks, defined in Section 2. In Sections 3 and 4 the two methods of approximate solution are introduced. Some results from approximation theory and statistical learning, like those discussed in Section 5, justify the advantageous performances of the two proposed approaches in complex applications, such as the example presented in Section 6.
2. APPROXIMATING NETWORKS
Let us consider functions \gamma(x): \mathbb{R}^n \mapsto \mathbb{R}^s. The proposed methods rely on approximating the unknown functions by means of fixed-structure functions \gamma_\nu(x, w_\nu) of the form

\gamma_\nu(x, w_\nu) = col[ \sum_{i=1}^{\nu} c_{ij} \varphi(x, \kappa_i), j = 1, \ldots, s ]   (3)

where \varphi(\cdot, \cdot) is a given basis function, and c_{ij} \in \mathbb{R} and the components of the vectors \kappa_i \in \mathbb{R}^k are parameters to be determined. w_\nu is defined as w_\nu \triangleq col(c_{ij}, \kappa_i, i = 1, \ldots, \nu, j = 1, \ldots, s). Then w_\nu \in \mathbb{R}^{N(\nu)}, where N(\nu) = \nu(k + s). Following (Zoppoli et al., 2001), let the set of functions to be approximated be a linear space H. A_\nu \subset H is defined as the parametrized set of all functions of the form (3), that is, A_\nu \triangleq \{\gamma_\nu(x, w_\nu) \in H, w_\nu \in \mathbb{R}^{N(\nu)}\}. The sequence \{A_\nu\}_{\nu=1}^{\infty} has the infinite nested structure A_1 \subset A_2 \subset \cdots \subset A_\nu \subset \cdots. As the distance between the functions \gamma and \gamma_\nu has to be measured, the space H is equipped with a norm \|\cdot\|. Thus we let \mathcal{H} \triangleq (H, \|\cdot\|) (Zoppoli et al., 2001). In the approximation methods that are going to be presented, the following assumption plays a basic role.

A1. The functions \gamma_\nu(x, w_\nu) are such that the set \bigcup_{\nu=1}^{\infty} A_\nu is dense in \mathcal{H}.
Functions of the form (3) that verify Assumption A1 benefit by the density property in the space \mathcal{H} and are defined as \mathcal{H}-approximating networks (\mathcal{H}-ANs). Feedforward neural networks with one hidden layer and linear output activation functions, linear combinations of sinusoidal functions with variable frequencies, and linear combinations of radial basis functions of the Gaussian type are examples of \mathcal{H}-ANs in the space C(K, \mathbb{R}^s) of continuous functions on a compact set K with the supremum norm and in the space L_2(K, \mathbb{R}^s) of square-integrable functions on K with the L_2 norm. Hence, such approximators are both C- and L_2-ANs. The proofs of the related density properties can be found in several papers (see, for example, (Leshno et al., 1993) and (Park and Sandberg, 1991)). Linear \mathcal{H}-approximating networks are those in which the basis functions \varphi_i do not contain the parameter vectors \kappa_i (in the case of linear ANs, the basis functions have to be specified by the subscript i; this is unnecessary for the nonlinear networks). Examples of C- and L_2-linear ANs are classical linear approximators such as algebraic and trigonometric polynomials.
The dimension N(\nu) of the parameter vector of an AN (or simply \nu, as N(\nu) is linear in \nu) will be considered as a "measure" of the complexity of the network. The term "network" is borrowed from the parlance of neural networks and has been chosen since functions of the form (3) have the same structure as feedforward neural networks with one hidden layer and linear output activation units. As the \mathcal{H}-ANs (or, briefly, the ANs) that will be presented in the next sections (in practice feedforward neural networks, but possibly even other networks) benefit by both the C- and the L_2-density properties (and also by other density properties), in the following simply the term "approximating networks" will be used. A minimal sketch of such a network is given below.
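To make the structure (3) concrete, the following minimal Python sketch builds an AN with \nu parametrized basis functions, assuming Gaussian radial basis functions as the basis \varphi; all names and numerical choices are illustrative, not from the paper.

```python
import numpy as np

# A sketch of an approximating network of the form (3): a linear combination
# of nu parametrized Gaussian basis functions.  The parameter vector w_nu
# collects the centers and widths (the kappa_i) and the coefficients c_ij.
class GaussianAN:
    def __init__(self, n, nu, s=1, rng=np.random.default_rng(0)):
        self.centers = rng.uniform(-1.0, 1.0, size=(nu, n))  # part of kappa_i
        self.widths = np.full(nu, 0.5)                        # part of kappa_i
        self.c = rng.normal(0.0, 0.1, size=(nu, s))           # coefficients c_ij

    def basis(self, x):
        # phi(x, kappa_i) for all i = 1, ..., nu
        d2 = ((x - self.centers) ** 2).sum(axis=1)
        return np.exp(-d2 / (2.0 * self.widths ** 2))

    def __call__(self, x):
        # gamma_nu(x, w_nu) = col[ sum_i c_ij * phi(x, kappa_i), j = 1..s ]
        return self.basis(x) @ self.c

net = GaussianAN(n=10, nu=40)
print(net(np.zeros(10)))   # value of the network at the origin
```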
3. APPROXIMATE MINIMIZATION OF THE OPTIMAL COST-TO-GO FUNCTIONS IN THE DP PROCEDURE (METHOD A)

The first approximate method proposed here to solve Problem P (which will be called Method A) is based on the DP procedure. What follows details the approach presented in (Baglietto et al., 2000). Let us ignore for the moment how the state space has been discretized and just assume that, at each stage t, L discretized points x_t^{(l)}, l = 1, \ldots, L, have been selected (in order to simplify the notation, we suppose the number L to be the same at all stages). Let X_{tL} \triangleq \{x_t^{(l)}, l = 1, \ldots, L\}. Then it is possible to write

\tilde{J}_{T-1}(x_{T-1}^{(l)}) = \min_{u_{T-1} \in U_{T-1}} E_{\xi_{T-1}} \{ h(x_{T-1}^{(l)}, u_{T-1}, \xi_{T-1}) + h_T[f(x_{T-1}^{(l)}, u_{T-1}, \xi_{T-1})] \},  x_{T-1}^{(l)} \in X_{T-1,L},   (4a)

\tilde{J}_t(x_t^{(l)}) = \min_{u_t \in U_t} E_{\xi_t} \{ h(x_t^{(l)}, u_t, \xi_t) + \hat{J}[f(x_t^{(l)}, u_t, \xi_t), w_{t+1}^\circ] \},  t = T-2, \ldots, 0;  x_t^{(l)} \in X_{t,L}   (4b)

(the \xi_0, \ldots, \xi_{T-1} are assumed to be mutually independent). \tilde{J}_t(x_t) is an approximation of the optimal cost-to-go J_t^\circ(x_t) (at stage T-1, this approximation is not needed; see (4a)). \hat{J}(x_t, w_t^\circ) is an approximation of \tilde{J}_t(x_t) derived by constraining this cost to take on the structure of an AN, that is,

\hat{J}(x_t, w_t) = \sum_{i=1}^{\nu} c_i \varphi(x_t, \kappa_i),  t = T-1, \ldots, 1.

The ANs are assumed to maintain the same structure stage after stage. To simplify the notation, in (4b) and in the following the subscript \nu_t has been dropped. The parameter vector w_t^\circ is obtained on the basis of the values of \tilde{J}_t(x_t^{(l)}) by using some approximation criterion like the least-squares one, i.e.,

w_t^\circ = \arg\min_{w_t} \sum_{l=1}^{L} \| \tilde{J}_t(x_t^{(l)}) - \hat{J}(x_t^{(l)}, w_t) \|^2,  t = T-1, \ldots, 1.   (5)

Of course, there is the need to assume that the cost functions to be approximated are sufficiently "smooth" in the sets of the admissible states, so that such functions belong to suitable subsets of linear spaces with the norm \|\cdot\|, hence, the spaces \mathcal{H}_t \triangleq (H_t, \|\cdot\|).

There are two ways of determining the optimal controls on-line: reoptimization and approximation of the optimal control functions by ANs.

1. Reoptimization. Using this technique, the computation of the optimal control \hat{u}_t^\circ is performed on line by exploiting the values of the cost-to-go \hat{J}(x_t, w_t^\circ) stored in memory, that is,

\hat{u}_t^\circ = \arg\min_{u_t \in U_t} E_{\xi_t} \{ h(x_t, u_t, \xi_t) + \hat{J}[f(x_t, u_t, \xi_t), w_{t+1}^\circ] \},  t = 0, 1, \ldots, T-1.

2. Approximation of the optimal control functions by ANs. The optimal control functions \mu_t^\circ(x_t) are approximated by another family of ANs, denoted
by \hat{\mu}(x_t, \tilde{w}_t), t = 0, 1, \ldots, T-1. Such a second family of ANs is derived as follows. While applying Eqs. (4), the L pairs [x_t^{(l)}, u_t^{\circ(l)}] are obtained at each stage, u_t^{\circ(l)} being the minimizing control at x_t^{(l)}. Then the optimal vectors \tilde{w}_t^\circ can be computed by minimizing (as in (5)) the approximation error \sum_{l=1}^{L} \| u_t^{\circ(l)} - \hat{\mu}(x_t^{(l)}, \tilde{w}_t) \|^2, for t = T-1, \ldots, 1. Again, the optimal control functions \mu_t^\circ, t = 0, 1, \ldots, T-1, are assumed to belong to normed linear spaces in which the ANs are dense.
By using the second technique instead of reoptimization, it is possible to obtain the important advantage of performing all computations off line. Indeed, reoptimization requires on-line computations, whereas the use of ANs permits the on-line generation of the control vectors "almost instantaneously". The idea of approximating the optimal cost-to-go functions by approximating networks having the structure of neural approximators is not new. Bertsekas and Tsitsiklis (Bertsekas and Tsitsiklis, 1996) have presented, under the name of "neuro-dynamic programming", several methods that involve the use of neural networks in order to deal with high-dimensional DP, though the attention (as for most of the "neuro-dynamic" literature) is by far more focused on infinite-horizon problems than on T-stage optimal control problems. In Section 5, the use of ANs (either within DP or within Method B presented in Section 4) is justified by recent results in the field of nonlinear approximation. A minimal sketch of the backward phase of Method A follows.
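The Python sketch below illustrates the backward phase of Method A on a toy problem of our own choosing (the dynamics, costs, and sampling region are assumptions, not the paper's example): a crude finite control grid replaces the exact minimization in (4), a Monte Carlo average replaces the expectation, and a linear-in-parameters network is fitted by least squares as in (5).

```python
import numpy as np

rng = np.random.default_rng(1)
T, L, n, nu, M = 5, 200, 2, 30, 20           # stages, samples, dims, units, MC size

def h(x, u): return (x**2).sum() + (u**2).sum()   # toy stage cost (assumption)
def h_T(x): return (x**2).sum()                   # toy terminal cost (assumption)

centers = rng.uniform(-2, 2, size=(nu, n))   # fixed basis parameters kappa_i
def phi(x):                                  # Gaussian basis phi(x, kappa_i)
    return np.exp(-((x - centers)**2).sum(axis=1))

controls = [np.full(n, c) for c in np.linspace(-1, 1, 9)]  # crude grid over U_t
w = [None] * (T + 1)                         # AN weights, stage by stage

for t in range(T - 1, 0, -1):                # backward phase, Eqs. (4)-(5)
    X = rng.uniform(-2, 2, size=(L, n))      # sample set X_tL
    J_tilde = np.empty(L)
    for l, x in enumerate(X):
        best = np.inf
        for u in controls:                   # approximate min over u in (4)
            xis = rng.normal(0, 0.1, size=(M, n))
            nxt = x + u + xis                # toy dynamics f(x, u, xi) = x + u + xi
            tail = (np.array([phi(z) for z in nxt]) @ w[t + 1]
                    if w[t + 1] is not None else np.array([h_T(z) for z in nxt]))
            best = min(best, (h(x, u) + tail).mean())
        J_tilde[l] = best
    Phi = np.array([phi(x) for x in X])      # design matrix of basis values
    w[t] = np.linalg.lstsq(Phi, J_tilde, rcond=None)[0]   # least squares, Eq. (5)
```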
3.1 On the choice of the sample sets X_{tL}
In order for Method A to work, a uniformly good approximation of the optimal cost-to-go function over the sets of the feasible states is needed. In fact, an even slightly bad approximation of the (t+1)-th optimal cost-to-go function in some regions of its domain could lead the minimization algorithm to find a wrong minimum, at stage t, during the "exploration" of the (t+1)-th cost-to-go function. In other words, for the choice of a good training set, the various feasible sets should be represented without privileging some regions over others, which would be unjustified in the absence of prior information about the cost-to-go function. There are many possibilities of uniform sampling of the feasible sets; for example, the following ones can be cited.

(i) A "full uniform grid". Each component of the state space is discretized by a fixed number of values. This is the simplest choice, and the first used since the introduction of the DP algorithm. As said before, its main drawback lies in the curse of dimensionality.

(ii) A random sequence of independent and identically distributed (i.i.d.) points drawn with a uniform probability distribution. In this way the dimension of the training set does not depend "structurally" on n (as in the full uniform grid case); such a dependence is instead related to the accuracy of the approximation (in fact, it can be expected that functions with higher dimension n require larger training sets in order to guarantee the same level of accuracy). The main drawback lies in the fact that the original "deterministic" problem (approximation of a fixed unknown function) is transformed into a "statistical" one. In other words, a probability measure on the generation of the samples is arbitrarily introduced (even if it is the most "reasonable" probability). In this way all the available theoretical results (see, for example, (Barron, 1994; Vapnik, 1995)) are applied to a problem that is not statistical in nature. Anyway, by using a uniform distribution such results can be helpful to give reasonable hints about the required complexity of the ANs and the sizes of the training sets.

(iii) A deterministic sequence of points, which tries to "spread" them in the most uniform way. Examples are Orthogonal Arrays, Latin Hypercubes, and "quasi-Monte Carlo" sequences (see, for example, (Niederreiter, 1992; Chen et al., 1999)). In this case the sequences are deterministic, so they enable one to obtain "deterministic" results. Furthermore, the resulting training sets are generally more "uniformly spread" than the ones generated by a random uniform distribution (see (Niederreiter, 1992)). However, there are very few available theoretical results that can help find the complexity of the ANs and the required number of points in order to achieve a good accuracy of the approximation. A small sketch contrasting samplings (ii) and (iii) is given after this list.
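As a concrete illustration of options (ii) and (iii), the following sketch draws i.i.d. uniform points and, for comparison, builds a Halton sequence, one classical low-discrepancy (quasi-Monte Carlo) construction; the paper cites Niederreiter (1992) for the general theory, and this particular construction is our choice, not necessarily the one used by the authors.

```python
import numpy as np

def halton(P, n):
    """First P points of the Halton low-discrepancy sequence in [0, 1]^n."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:n]   # one base per dimension
    pts = np.empty((P, n))
    for j, b in enumerate(primes):
        for k in range(1, P + 1):        # radical inverse of k in base b
            f, r, kk = 1.0, 0.0, k
            while kk > 0:
                f /= b
                r += f * (kk % b)
                kk //= b
            pts[k - 1, j] = r
    return pts

P, n = 512, 10
qmc_points = halton(P, n)                                    # deterministic, well spread
iid_points = np.random.default_rng(0).uniform(size=(P, n))   # random uniform, case (ii)
```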
4. REDUCTION OF THE FUNCTIONAL PROBLEM P TO A NONLINEAR PROGRAMMING ONE (METHOD B)

Method A may not be an easy tool to handle, for it may require cumbersome numerical procedures. Therefore, another kind of approximate technique to solve Problem P (which will be called Method B) is considered. Such a technique consists in assigning the control functions u_t = \mu_t(x_t) a fixed structure u_t = \hat{\mu}(x_t, w_t), t = 0, 1, \ldots, T-1, where \hat{\mu}(\cdot, \cdot) is an AN and w_t is the vector of parameters to be determined. Even if the ANs introduced in this section differ from the ones used for Method A, the same notation is kept. Clearly, the control scheme gives rise to a chain of T approximating networks, each followed by the dynamic system. Now, by substituting the control functions \hat{\mu}(x_t, w_t) into the state equation (1) and into (2), the cost function assumes the form J(w, \xi_0, \ldots, \xi_{T-1}), where w \triangleq col(w_t, t = 0, 1, \ldots, T-1). Then the vector w^\circ that minimizes the expected cost E_\xi[J(w, \xi)] has to be found, where \xi = col(\xi_0, \ldots, \xi_{T-1}). It follows that the functional optimization Problem P has been approximated by a nonlinear programming problem that can be solved by using some gradient descent method. Provided that the numbers \nu_t of basis functions in each AN are sufficiently large, the new problem can approximate Problem P to any degree of accuracy. The concept of epi-convergence is useful to state correctly the meaning of the term "approximation" of Problem P by a sequence of nonlinear programming problems parametrized by \nu_t (see (Zoppoli et al., 2001)). Unfortunately, due to the general statement of Problem P, it is generally impossible to express the average cost E_\xi[J(w, \xi)] in explicit form. This leads to the computation of the "realization" \nabla_w J[w(k), \xi(k)] instead of \nabla_w E_\xi[J(w, \xi)], thus giving rise to the stochastic approximation algorithm

w(k+1) = w(k) - \alpha(k) \nabla_w J[w(k), \xi(k)],  k = 0, 1, \ldots   (6)

The sequence \{\xi(k), k = 0, 1, \ldots\} is generated by randomly selecting the vectors \xi_0(k), \ldots, \xi_{T-1}(k) according to their probability density function. As is well known, the step size \alpha(k) must suitably decrease to ensure convergence; in the example given in Section 6, \alpha(k) = c_1/(c_2 + k), c_1, c_2 > 0. A minimal sketch of iteration (6) is given below.
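The following sketch runs iteration (6) on a toy one-dimensional instance of Problem P of our own choosing (T = 5, linear dynamics, quadratic costs; each AN is reduced to a single linear feedback gain for brevity). The gradient of the sampled cost is approximated here by central finite differences; the exact adjoint-based gradient is sketched after Eq. (7) below.

```python
import numpy as np

rng = np.random.default_rng(0)
T, c1, c2 = 5, 0.5, 50.0

def J(w, xi, x0=1.0):
    """Sampled cost of the chain of controllers u_t = w[t] * x_t (toy model)."""
    x, cost = x0, 0.0
    for t in range(T):
        u = w[t] * x                 # AN reduced to a linear feedback gain
        cost += x**2 + 0.1 * u**2    # toy stage cost h
        x = x + u + xi[t]            # toy state equation (1)
    return cost + x**2               # toy terminal cost h_T

def grad_J(w, xi, eps=1e-5):
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (J(w + e, xi) - J(w - e, xi)) / (2 * eps)
    return g

w = np.zeros(T)
for k in range(5000):
    xi = rng.normal(0.0, 0.1, size=T)        # realization xi(k)
    w -= (c1 / (c2 + k)) * grad_J(w, xi)     # iteration (6), decreasing step size
print(w)   # feedback gains for this toy problem
```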
To implement algorithm (6), at each iteration step k the components of the gradient \nabla_w J[w(k), \xi(k)] have to be computed, i.e., the partial derivatives \partial J[w(k), \xi(k)] / \partial w_t^i, for t = 0, 1, \ldots, T-1 and for i = 1, \ldots, N(\nu_t), where w_t^i denotes the i-th component of the vector w_t. It is possible to write

\partial J / \partial w_t^i = (\partial J / \partial u_t)^T \partial \hat{\mu}(x_t, w_t) / \partial w_t^i,

and then, by using some algebra, it is easy to see that \partial J / \partial u_t is given by

\partial J / \partial u_t = \partial h / \partial u_t + (\partial f / \partial u_t)^T \lambda_{t+1},

where \lambda_t \triangleq \partial J / \partial x_t. Moreover, \lambda_t can be computed by means of the backward recursion

\lambda_t = \partial h / \partial x_t + (\partial f / \partial x_t)^T \lambda_{t+1} + (\partial \hat{\mu}(y_t, w_t) / \partial y_t)^T \partial J / \partial u_t,  t = T-1, \ldots, 0,   (7)

with the boundary condition \lambda_T = \partial h_T(x_T) / \partial x_T. In (7), y_t is the actual input to the AN at stage t (see, for example, (Parisini and Zoppoli, 1994) for a more detailed description of the algorithm). It is worth noting that (7) is the classical adjoint equation of T-stage optimal control theory, with the addition of one term (the third) to take into account the introduction of the parametrized feedback control law. Note also that the structure of (7) does not depend on the type of AN used to implement the control law. Furthermore, notice that Method B, unlike Method A, does not require the random vectors \xi_0, \ldots, \xi_{T-1} to be mutually independent.

Method B has turned out to be simple (it does not require the construction of any grid in the state space) and very effective in solving optimal control problems with a large number of state components (see, for example, (Baglietto et al., 2001) and (Parisini and Zoppoli, 1998)). Although the authors are not yet able to motivate analytically such powerful capabilities, there are good reasons to believe that the properties discussed in Section 5 can be in force not only to approximate functions belonging to suitable spaces, but also to solve approximately functional optimization problems by using a "moderate" number of free parameters and samples, provided that the optimal solutions are characterized by appropriate smoothness properties. A sketch of the gradient computation based on (7) is given below.
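The sketch below applies the adjoint recursion (7) to the same toy instance used above (scalar linear dynamics f = x + u + xi, stage cost h = x^2 + 0.1 u^2, terminal cost h_T = x^2, linear feedback u_t = w[t] x_t, with y_t = x_t); the resulting gradient should match the finite-difference one computed earlier.

```python
import numpy as np

T = 5

def grad_J_adjoint(w, xi, x0=1.0):
    x = [x0]
    for t in range(T):                         # forward pass: store the trajectory
        x.append(x[t] + w[t] * x[t] + xi[t])
    g = np.zeros(T)
    lam = 2.0 * x[T]                           # lambda_T = dh_T/dx_T
    for t in reversed(range(T)):               # backward pass: recursion (7)
        u = w[t] * x[t]
        dJ_du = 0.2 * u + lam                  # dh/du + (df/du) * lambda_{t+1}
        g[t] = x[t] * dJ_du                    # dJ/dw_t = (dmu/dw_t) * dJ/du_t
        # dh/dx + (df/dx) * lambda_{t+1} + (dmu/dx) * dJ/du_t  (three terms of (7))
        lam = 2.0 * x[t] + lam + w[t] * dJ_du
    return g

rng = np.random.default_rng(0)
w, xi = rng.normal(size=T), rng.normal(0, 0.1, size=T)
print(grad_J_adjoint(w, xi))   # matches the finite-difference gradient above
```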
5. ON THE CURSE OF DIMENSIONALITY IN APPROXIMATE OPTIMIZATION

If Method A is used, an effective methodology for deriving an approximate solution of Problem P should require a limited number of parametrized basis functions and a limited number of samples ([state, optimal cost-to-go] and possibly [state, optimal control] pairs) to obtain a good approximation of the optimal cost-to-go and control functions.

If Method B is used, a limited number of basis functions and of samples of the stochastic variables should be required to obtain good approximations of the control functions. In particular, there is the need to avoid an unacceptably fast growth of the number of basis functions and samples with the dimension n of the state vector, in order not to incur, if possible, the curse of dimensionality.

Method A and Method B share the following aspects: (i) the approximation of unknown functions by approximating networks, and (ii) the use of sample sets in order to determine the values of the parameters. In this section, the target function \gamma(x), defined on a compact set B \subset \mathbb{R}^n, will be used as a unified notation representing, each time, the optimal t-th cost-to-go or the optimal t-th control function.
Now suppose to have a set L_P of P samples [x(p), y(p) = \gamma(x(p))], p = 1, \ldots, P. The number P is called sample complexity. The quantity that can be minimized by using an approximating network with complexity \nu and a set L_P of P samples is the so-called empirical risk (1/P) \sum_{p=1}^{P} \| \gamma(x(p)) - \gamma_\nu(x(p), w_\nu) \|^2. Let \hat{\gamma}_{\nu P} \triangleq \gamma_\nu(x, \hat{w}_\nu, L_P), where

\hat{w}_\nu \triangleq \arg\min_{w_\nu \in \mathbb{R}^{N(\nu)}} (1/P) \sum_{p=1}^{P} \| \gamma(x(p)) - \gamma_\nu(x(p), w_\nu) \|^2.

If, ideally, there was an infinite number of samples, it would be possible to obtain \gamma_\nu^\circ \triangleq \gamma_\nu(x, w_\nu^\circ), where

w_\nu^\circ \triangleq \arg\min_{w_\nu \in \mathbb{R}^{N(\nu)}} E(\| \gamma(x) - \gamma_\nu(x, w_\nu) \|^2) = \arg\min_{w_\nu \in \mathbb{R}^{N(\nu)}} \int_B \| \gamma(x) - \gamma_\nu(x, w_\nu) \|^2 p(x) dx,

where p is a probability measure on B. The quantity of interest to evaluate the effectiveness of the approximation is the distance between \gamma and \hat{\gamma}_{\nu P}, called generalization error and defined as \mathcal{E}_{\nu P}(\gamma, \hat{w}_\nu, L_P) \triangleq E(\| \gamma - \hat{\gamma}_{\nu P} \|^2). By using the triangle inequality, the following can be written: \| \gamma - \hat{\gamma}_{\nu P} \| \le \| \gamma - \gamma_\nu^\circ \| + \| \gamma_\nu^\circ - \hat{\gamma}_{\nu P} \|, which evidences the two factors contributing to the generalization error. The term \| \gamma - \gamma_\nu^\circ \|^2, called approximation error, is due to the impossibility of perfectly representing a function belonging to an infinite-dimensional space by a function from a finite-dimensional space. The term E(\| \gamma_\nu^\circ - \hat{\gamma}_{\nu P} \|^2), known as estimation error, is due to the fact that a limited number of samples contains insufficient information about the target function.

Now some analytical results are presented concerning the relationship among generalization error, computational complexity \nu, sample complexity P, and dimension n, for a random extraction of the set L_P according to the probability density p (see Section 3.1, case (ii)) and for approximating networks corresponding to feedforward sigmoidal neural networks. Such a network structure has been chosen for its wide and successful diffusion in applications. However, the results that are going to be presented hold in a similar form for wider classes of nonlinear approximating networks (see, for example, (Niyogi and Girosi, 1996) for radial basis function networks). To avoid burdening the notation and without losing generality, the case of a scalar function, i.e., s = 1 in (3), is considered. A feedforward neural network with one hidden layer is expressed as

\gamma_\nu(x, w_\nu) = \sum_{i=1}^{\nu} c_i \sigma(x^T \alpha_i + \beta_i),   (9)

where \sigma(\cdot) is a sigmoidal function, i.e., a bounded measurable function on the real line such that \lim_{z \to +\infty} \sigma(z) = 1, \lim_{z \to -\infty} \sigma(z) = 0. For functions \gamma: \mathbb{R}^n \to \mathbb{R} with a Fourier transform \gamma_F, let

C_\gamma \triangleq \int_{\mathbb{R}^n} \|\omega\|_1 |\gamma_F(\omega)| d\omega (see (Barron, 1993)),

where \|\omega\|_1 \triangleq \sum_{i=1}^{n} |\omega_i|, with \omega_i denoting the i-th component of the vector \omega, and \Gamma \triangleq \{ \gamma: \mathbb{R}^n \to \mathbb{R} such that C_\gamma < \infty \}. The following theorem is stated for an integral squared error on the set B \triangleq [-1, 1]^n, but this is not a major limitation, since other bounded domains may be considered (see (Barron, 1993) for details).

Theorem 1 (Barron, 1994). Let A_\nu be the set of sigmoidal feedforward neural networks (9) with \nu hidden units, and let \hat{\gamma}_{\nu P}, \hat{w}_\nu, and \gamma_\nu^\circ be defined as above. Moreover, let \gamma \in \Gamma, B \triangleq [-1, 1]^n, and suitable assumptions on the parameters vector be satisfied (see (Barron, 1994)). Then

E(\| \gamma - \hat{\gamma}_{\nu P} \|^2) \le O(C_\gamma^2 / \nu) + O((\nu n / P) \ln P).

A number of important remarks can be made about this result.

- To apply Theorem 1 to the proposed methods for the approximate solution of Problem P, the optimal cost-to-go functions or each component of the optimal controls must belong to the class \Gamma. However, this is not a major limitation, since \Gamma contains a large variety of functions of interest in applications (see (Barron, 1993)).

- For a fixed dimension n, the approximation error, as expected, goes to zero when \nu \to \infty. The estimation error goes to zero when \nu, P \to \infty only if (\nu \ln P)/P \to 0, i.e., only when the complexity \nu grows more slowly than P / \ln P.

- For functions with C_\gamma showing a moderate growth with n, the bound O(C_\gamma^2 / \nu) allows one to avoid the curse of dimensionality with respect to the computational complexity.

- If we let \nu \sim C_\gamma (P / (n \ln P))^{1/2}, the upper bound on the generalization error is

O( C_\gamma ((n / P) \ln P)^{1/2} ).   (10)

This implies that, for functions with C_\gamma showing a moderate growth with n, also the curse of dimensionality with respect to the sample complexity is avoided.

- With respect to the choice in the previous point, it must be pointed out that the number of units \nu cannot generally be selected as a function of C_\gamma, which is not known a priori. However, the same bound (10) can be obtained when the computational complexity is optimized by using one of the techniques described in (Barron, 1994). A small numerical illustration of the trade-off in Theorem 1 is given below.
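The following small computation (not from the paper; the constants hidden in the O(.) notation are set to 1 purely for illustration) evaluates the two terms of the bound in Theorem 1 and the optimized choice of \nu leading to the bound (10).

```python
import numpy as np

# Approximation term C^2/nu versus estimation term (nu*n/P)*ln(P), and the
# optimized complexity nu ~ C * (P / (n * ln P))**0.5 of bound (10).
def bound(nu, C, n, P):
    return C**2 / nu + (nu * n / P) * np.log(P)

C, n, P = 5.0, 10, 100_000
nu_star = C * np.sqrt(P / (n * np.log(P)))     # optimized computational complexity
print(nu_star)                                 # a few hundred hidden units
print(bound(nu_star, C, n, P))                 # equals 2*C*sqrt(n*ln(P)/P): bound (10)
print(2 * C * np.sqrt(n * np.log(P) / P))      # same value, in closed form
```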
It is interesting to compare the nonlinear ANs with the linear ones. By ignoring the logarithmic factor, for a given C_\gamma the bound (10) provides a rate of convergence O((1/P)^{1/2}) for the generalization error. In contrast, the rate of convergence by using linear approximating networks (e.g., fixed-knots splines and polynomials) for functions belonging to classical smoothness classes, such as Sobolev spaces of order s, is O((1/P)^{2s/(2s+n)}) (see, for example, (Nussbaum, 1986)). The dependence of the exponent on the dimension n implies that, in order to keep the generalization error below a fixed threshold, the smoothness s has to grow with the dimension n. In other words, to avoid the curse of dimensionality with respect to the sample complexity, the class of functions to be approximated has to be constrained more and more as the dimension increases. However, it must be considered that also the condition of finiteness of C_\gamma, used to define the class \Gamma, represents a smoothness constraint, although in a less evident form. As noticed in (Girosi et al., 1995), functions in \Gamma become more and more constrained as the dimension n grows. In other words, the effect of the dimension reveals itself in a different way on \Gamma than on Sobolev spaces.

Moreover, in the bound (10) the dimension n is somehow "hidden" in C_\gamma. Such a bound allows one to avoid the curse of dimensionality with respect to the sample complexity only for classes of functions with C_\gamma not exponentially large in n. The same problem arises if one remembers that the bound (10) corresponds to a computational complexity \nu \sim C_\gamma (P / (n \ln P))^{1/2}, which is characterized by the curse of dimensionality for exponentially large values of C_\gamma. However, a wide variety of functions of interest for which C_\gamma shows a moderate growth with the dimension is reported in (Barron, 1993).

6. A NUMERICAL COMPARISON BETWEEN METHOD A AND METHOD B

In order to test and compare Method A and Method B, a classic example of application is considered: the optimal control of a system of reservoirs over a finite time horizon. This problem has been addressed many times in the literature, very often by using techniques related to DP ((Foufoula-Georgiou and Kitanidis, 1988), (Johnson et al., 1993), (Yakowitz, 1982)), for which it has become a sort of benchmark. For the simulations in this paper, the 10-reservoir system with T = 5 depicted in Fig. 1 has been chosen.2 The system is the same presented (for the deterministic case) in (Yakowitz, 1982). The state equation has the form

x_{t+1}^j = \min( x_t^j - u_t^j + \sum_{h \in I^j} u_t^h + \xi_t^j , x_{max}^j ),  t = 0, 1, \ldots, 4;  j = 1, 2, \ldots, 10,

0 \le u_t^j \le x_t^j + \sum_{h \in I^j} u_t^h + \xi_{min}^j,

where x_t^j is the amount of water in the j-th reservoir at the beginning of the t-th period, u_t^j is the amount of water released from the j-th reservoir during the t-th period (planned at the beginning of the period), I^j is the set of indexes relative to the reservoirs which release water directly into the j-th one, and \xi_t^j is the natural inflow of water into the j-th reservoir during the t-th period. The inflows are stochastic, all with probability density functions that are uniform between \xi_{min} = col(1, 1, 1, 0, 0.7, 0.7, 0, 1.5, 0, 0) and \xi_{max} = col(1.5, 1.5, 1.5, 0, 1.2, 1.2, 0, 2, 0, 0): they simulate the sum of deterministic minimum river inflows and stochastic rain inflows. The random variables \xi_t are mutually independent. The presence of a flood way for each reservoir is assumed, so that the state vector is never larger than x_{max} = col(10, 10, 10, 10, 10, 10, 10, 10, 18, 25). A minimal simulation sketch of these dynamics is given below.

2 The numerical comparison has been drawn from (Baglietto et al., 2000).
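The following sketch simulates the reservoir dynamics above. The network topology of Fig. 1 (the sets I^j) is not recoverable from the text, so the UPSTREAM entries below are hypothetical placeholders to be filled in from the figure, and the test policy is arbitrary.

```python
import numpy as np

XI_MIN = np.array([1, 1, 1, 0, 0.7, 0.7, 0, 1.5, 0, 0])
XI_MAX = np.array([1.5, 1.5, 1.5, 0, 1.2, 1.2, 0, 2, 0, 0])
X_MAX = np.array([10, 10, 10, 10, 10, 10, 10, 10, 18, 25.0])
UPSTREAM = {j: [] for j in range(10)}      # the sets I^j: fill in from Fig. 1
UPSTREAM[8] = [0, 1, 2, 3]                 # hypothetical example entries
UPSTREAM[9] = [4, 5, 6, 7, 8]

def step(x, u, rng):
    xi = rng.uniform(XI_MIN, XI_MAX)                    # stochastic inflows xi_t
    inflow = np.array([u[UPSTREAM[j]].sum() for j in range(10)])
    u = np.clip(u, 0.0, x + inflow + XI_MIN)            # keep releases admissible
    return np.minimum(x - u + inflow + xi, X_MAX)       # state equation with flood way

rng = np.random.default_rng(0)
x = np.full(10, 3.0)
for t in range(5):                                      # T = 5 periods
    x = step(x, u=0.5 * x, rng=rng)                     # a crude test policy
print(x)
```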
The benefit function to be maximized is given by

J = \sum_{t=0}^{4} ( c_p^T u_t + c_I u_t^{10} ) - 100 \sum_{j=1}^{10} \max(0, x_{min}^j - x_5^j)^2,   (11)

where c_p = col(1.1, 1, 1, 1.2, 1.1, 1, 1.2, 1, 1.2, 2.5) describes the benefits for the release of water from each reservoir (for example, power generation) and c_I = 1.9 is the coefficient corresponding to an irrigation benefit, supposed to come only from the 10th reservoir. J contains a penalty function to induce the condition x_5 \ge x_{min}, where x_{min} = col(3, 3, 3, 3, 3, 3, 3, 3, 5, 6).

As far as Method A is concerned, ANs having the structure of feedforward neural networks have been used to approximate the optimal cost-to-go functions \tilde{J}_t(x_t), t = 1, 2, \ldots, 4. Each network has \nu = 40 neural units, and it has been trained by using L = 1800 sample pairs [state vector, optimal cost-to-go value].
As for Method B, feedforward neural networks have been used again, this time to approximate the optimal control functions \mu_t^\circ(x_t), t = 0, 1, \ldots, 4. Each network has \nu = 40 neural units. Since Method B utilizes unconstrained optimization techniques, suitable differentiable penalty functions have been added to the benefit function (11) in order to implement the various constraints; a sketch of such a penalty term is given below.
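The exact penalty functions used by the authors are not reported; the quadratic form below is one standard differentiable choice and is shown only as an assumption-laden example for the release constraints.

```python
import numpy as np

# A smooth penalty, zero inside the admissible set and quadratic outside,
# discouraging violations of 0 <= u <= u_up (an illustrative choice).
def release_penalty(u, u_up, k=100.0):
    return k * (np.maximum(0.0, -u) ** 2 + np.maximum(0.0, u - u_up) ** 2).sum()

print(release_penalty(np.array([-0.5, 0.2, 1.5]), u_up=np.ones(3)))  # 50.0
```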
The results obtained from both methods have been tested by computing a mean benefit over different sequences of the inflows, randomly extracted according to their probability density functions, each one starting from x_0 = x_{min}. The mean benefit for Method A has turned out to be 282, while Method B has given a mean benefit of 285.
Fig. 1. Ten-reservoir system configuration (in the original figure, dotted arrows denote natural inflows and solid arrows water releases).
7. REFERENCES
Archibald, T.W., K.I.M. McKinnon and L.C. Thomas (1997). An aggregate stochastic dynamic programming model of multireservoir systems. Water Resour. Res. 33, 333-340.
Baglietto, M., C. Cervellera, T. Parisini, M. Sanguineti and R. Zoppoli (2000). Neural approximators, dynamic programming and stochastic approximation. Proc. 19th Amer. Contr. Conf., pp. 3304-3308.
Baglietto, M., T. Parisini and R. Zoppoli (2001). Distributed-information neural control: the case of dynamic routing in traffic networks. IEEE Trans. on Neural Networks (to appear).
Barron, A.R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. on Information Theory 39, 930-945.
Barron, A.R. (1994). Approximation and estimation bounds for artificial neural networks. Machine Learning 14, 115-133.
Bellman, R. (1957). Dynamic Programming. Princeton University Press. Princeton.
Bellman, R., R. Kalaba and B. Kotkin (1963). Polynomial approximation - a new computational technique in dynamic programming. Math. Comp. 17, 155-161.
Bertsekas, D.P. and J.N. Tsitsiklis (1996). Neuro-Dynamic Programming. Athena Scientific. Belmont.
Chen, V.C.P., D. Ruppert and C.A. Shoemaker (1999). Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38-53.
Foufoula-Georgiou, E. and P.K. Kitanidis (1988). Gradient dynamic programming for stochastic optimal control of multidimensional water resources systems. Water Resour. Res. 24, 1345-1359.
Girosi, F., M. Jones and T. Poggio (1995). Regularization theory and neural networks architectures. Neural Computation 7, 219-269.
Johnson, S.A., J.R. Stedinger, C. Shoemaker, Y. Li and J.A. Tejada-Guibert (1993). Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41, 484-500.
Leshno, M., V.Ya. Lin, A. Pinkus and S. Schocken (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks 6, 861-867.
Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. SIAM. Philadelphia.
Niyogi, P. and F. Girosi (1996). On the relationship between generalization error, hypothesis complexity, and sample complexity for radial basis functions. Neural Computation 8, 819-842.
Nussbaum, M. (1986). On nonparametric estimation of a regression function that is smooth in a domain on R^k. Theory of Probability and its Applications 31, 118-125.
Parisini, T. and R. Zoppoli (1994). Neural networks for feedback feedforward nonlinear control systems. IEEE Trans. on Neural Networks 5, 436-449.
Parisini, T. and R. Zoppoli (1998). Neural approximations for infinite-horizon optimal control of nonlinear stochastic systems. IEEE Trans. on Neural Networks 9, 1388-1408.
Park, J. and I.W. Sandberg (1991). Universal approximation using radial-basis-function networks. Neural Computation 3, 246-257.
Turgeon, A. (1981). A decomposition method for the long-term scheduling of reservoirs in series. Water Resour. Res. 17, 1565-1570.
Vapnik, V.N. (1995). The Nature of Statistical Learning Theory. Springer-Verlag. New York.
Yakowitz, S. (1982). Dynamic programming applications in water resources. Water Resour. Res. 18, 673-696.
Zoppoli, R., T. Parisini and M. Sanguineti (2001). Approximating networks and extended Ritz method for the solution of functional optimization problems. J. of Optim. Theory and Appl. (to appear).