
Neural network operators: constructive interpolation of multivariate functions

Danilo Costarelli

Department of Mathematics and Computer Science, University of Perugia, Via Vanvitelli 1, 06123 Perugia, Italy

[email protected]

Abstract

In this paper, the interpolation of multivariate data by operators of the neural network type is proved. These operators can also be used to approximate continuous functions defined on a box-domain of R^d. In order to show this fact, a uniform approximation theorem with order is proved. The rate of approximation is expressed in terms of the modulus of continuity of the functions being approximated. The above interpolation neural network operators are activated by suitable linear combinations of sigmoidal functions constructed by a procedure involving the well-known central B-spline. The implications of the present theory for the classical theories of neural networks and sampling operators are analyzed. Finally, several examples with graphical representations are provided.

AMS 2010 Mathematics Subject Classification: 41A25, 41A05, 41A30, 47A58

Key words and phrases: sigmoidal functions; neural network operators; multivariate interpolation; multivariate approximation; order of approximation; irregular sampling scheme.

1 Introduction

The function h(x) implementing a neural network (NN) can be represented by:

$$h(x) \;=\; \sum_{j=0}^{n} c_j \, \sigma(w_j \cdot x + \theta_j), \qquad x \in \mathbb{R}^d, \quad d \in \mathbb{N}^+, \tag{I}$$

where $c_j \in \mathbb{R}$ are the coefficients, $w_j \in \mathbb{R}^d$ are the weights and $\theta_j \in \mathbb{R}$ are the thresholds of the NN, for every $j = 0, 1, \dots, n$. The term $w_j \cdot x$ denotes the inner product in $\mathbb{R}^d$ between the two vectors $w_j$ and $x$, while the function $\sigma : \mathbb{R} \to \mathbb{R}$ is the activation function of the NN, see, e.g., [54]. Typically, σ(x) is a sigmoidal function, i.e., a measurable function satisfying the properties:

$$\lim_{x \to -\infty} \sigma(x) = 0 \qquad \text{and} \qquad \lim_{x \to +\infty} \sigma(x) = 1.$$
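As a small illustration of the representation (I), the following Python sketch (the weights, coefficients and thresholds below are arbitrary toy values of mine) evaluates a one-hidden-layer network of that form, using the logistic function recalled just below as activation.

```python
import numpy as np

def logistic(x):
    # the logistic sigmoidal function sigma_l(x) = (1 + e^{-x})^{-1}
    return 1.0 / (1.0 + np.exp(-x))

def nn(x, c, W, theta, sigma=logistic):
    """Evaluate h(x) = sum_j c_j * sigma(w_j . x + theta_j), cf. (I)."""
    x = np.asarray(x, dtype=float)
    return sum(c_j * sigma(np.dot(w_j, x) + th_j)
               for c_j, w_j, th_j in zip(c, W, theta))

# toy network with n + 1 = 3 neurons in R^2 (illustrative values only)
c = [0.5, -1.0, 2.0]
W = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
theta = [0.0, 0.5, -1.0]
print(nn([0.3, -0.2], c, W, theta))
```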

Examples of sigmoidal functions are σ_ℓ(x) := (1 + e^{-x})^{-1} and σ_h(x) := (1/2)(tanh x + 1), x ∈ R, i.e., the well-known logistic and hyperbolic tangent functions, see, e.g., [38]. In the last thirty years, NNs of the form (I), activated by sigmoidal functions, have been successfully applied in Approximation Theory, in order to approximate functions of one or several variables, see, e.g., [38, 13, 41, 49, 26, 24].

The most common approach used to study approximation by NNs was the non-constructive one, see, e.g., [38]. Recently, constructive approximation results have been proved in both univariate and multivariate settings, see, e.g., [24], where some of them are summarized. For instance, in [20, 21] the idea of a convolution kernel built from a sigmoidal function is used. This approach is quite difficult due to the nature of the problem and is based on some results related to the theory of ridge functions. Another possibility to obtain constructive NNs activated by sigmoidal functions was described in [48], where a convolution approach is considered again, but for Lebesgue-Stieltjes integrals. Moreover, we mention the approach proposed by A.R. Barron [13], where multivariate functions satisfying a suitable condition involving the Fourier transform of f were approximated in the L^2 norm. Further, in the paper [26] a constructive L^p-version of Cybenko's approximation theorem ([38]) is provided, in both univariate and multivariate settings. Finally, in [32] the exponential convergence of certain NNs constructed from sigmoidal functions is proved, by an approach based on multiresolution approximation and the corresponding wavelet scaling functions. For other results concerning NNs and their applications to approximation problems, see, e.g., [46, 47, 52, 53, 40, 51, 27, 30, 42].

The approaches used to obtain the results quoted and described above present some difficulties and are not straightforward. The theory of NN operators has been introduced in order to study a constructive approximation process by NNs which is more intuitive than those proposed in previous papers. Moreover, results for NN operators can be proved by using techniques typical of Operator Theory.

The NN operators N_n^σ were introduced by G.A. Anastassiou in [1], where the results originally proved in [18] by P. Cardaliaguet and G. Euvrard have been extended. The above NN operators were defined by:

$$N_n^\sigma(f, x) := \frac{\displaystyle\sum_{k_1=\lceil n a_1 \rceil}^{\lfloor n b_1 \rfloor} \cdots \sum_{k_d=\lceil n a_d \rceil}^{\lfloor n b_d \rfloor} f(k/n)\, \Psi_\sigma(nx - k)}{\displaystyle\sum_{k_1=\lceil n a_1 \rceil}^{\lfloor n b_1 \rfloor} \cdots \sum_{k_d=\lceil n a_d \rceil}^{\lfloor n b_d \rfloor} \Psi_\sigma(nx - k)}, \qquad x \in R, \tag{II}$$

where R := [a_1, b_1] × ... × [a_d, b_d] ⊂ R^d, n ∈ N^+, f : R → R is a given bounded function, and Ψ_σ(x) := φ_σ(x_1) ··· φ_σ(x_d), x ∈ R^d, is the multivariate density function defined through the product of d one-dimensional density functions φ_σ(x) := (1/2)[σ(x + 1) − σ(x − 1)], x ∈ R.

In [2, 3, 4, 6], Anastassiou studied the neural network operators N_n^σ in both univariate and multivariate settings, for the special cases of the logistic and hyperbolic tangent sigmoidal activation functions, i.e., with σ(x) = σ_ℓ(x) and σ(x) = σ_h(x). Approximation results involving continuous functions defined on bounded domains were proved therein for the family (N_n^σ)_{n ∈ N^+}, together with estimates concerning the order of approximation. Subsequently, other results concerning the NN operators N_n^σ have been obtained in [16, 17, 28, 29, 31]. In particular, in [28, 29] the approximation results proved in [2, 3, 4, 6] have been extended, in order to consider NN operators activated by any sigmoidal function σ(x) belonging to a suitable class which also contains σ_ℓ(x) and σ_h(x). Moreover, the results concerning the order of approximation have been improved therein. Further, in [31] NN operators of the Kantorovich type have been introduced in order to study the problem of approximating L^p functions, for 1 ≤ p < +∞.

An important task for NNs is the capability to interpolate any given data. This problem is strictly related to the theory of training neural networks. Indeed, NNs which are able to interpolate the data belonging to a suitable training set can be used to reproduce certain values exactly, without errors. The above problem has already been studied by many authors (see, e.g., [39, 56, 50]) by means of analytical or algebraic approaches. By the word analytical we refer to results proved by non-constructive arguments, while by the word algebraic we refer to proofs in which the coefficients of the interpolating NNs are obtained by solving suitable linear algebraic systems.

Concerning the theory of NN operators, in general the N_n^σ do not interpolate, i.e., N_n^σ(f, k/n) ≠ f(k/n), for any given bounded function f : R → R, n ∈ N^+ and k ∈ Z^d. In [25], interpolation NN operators have been introduced in the one-dimensional setting, by a substantial modification of the definition of N_n^σ when d = 1. The changes made in the one-dimensional frame in order to introduce interpolating NN operators focused

on the univariate density functions, the nodes where the sample values (i.e., the coefficients) of the NN are computed, and other important elements such as the weights and the threshold values.

It is well known that the theory of NNs is mainly multivariate, since applications to neurocomputing processes usually involve high-dimensional data; hence a multivariate extension of the results proved in [25] is needed. In this paper, the interpolation of functions of several variables, defined on a box-domain of R^d, by means of multivariate operators F_n^s (introduced in Section 2) of the NN type is proved. For the sake of simplicity, the points where a given function is interpolated are in general taken on a uniformly spaced grid. However, even if the grid is not uniformly spaced, or, more generally, the points do not lie on a grid, NN operators which interpolate a given function at such nodes can be constructed (see Subsection 2.1).

In order to obtain such results, as happens in the one-dimensional case, the definition of the operators N_n^σ must be substantially modified. Here, for instance, the multivariate density functions Ψ_σ(x) must be replaced by Ψ_s(x), which are defined through sigmoidal functions constructed by a procedure involving the well-known central B-spline, see, e.g., [32]. In this way, we are able to consider a general family of activation functions, including for instance some known examples, such as the ramp function, see, e.g., [19, 17, 24]. In order to describe the behavior of the operators F_n^s at points of R where continuous functions f are in general not interpolated, a uniform approximation theorem with order is also obtained. The rate of approximation is expressed in terms of the modulus of continuity of the function being approximated. Both the interpolation and approximation results (see Theorem 2.5 and Theorem 2.6 in Section 2) proved in this paper are the multivariate versions of theorems first proved in [25] in the one-dimensional setting.

In Section 3 some concrete examples of approximations and interpolations are presented, in both one and two space dimensions. Finally, in Section 4 the main results of this paper are discussed in relation to the theory of NNs, with particular attention to applications, such as the training of NNs. Moreover, a detailed comparison between N_n^σ and F_n^s is made, together with a discussion of the results proved here in relation to existing results concerning interpolation by NNs. In addition, the relations between NNs and sampling operators are pointed out.

2 The main results

We first introduce some notation and preliminary concepts. In this paper, we will denote by M_s(x) the well-known one-dimensional central B-spline of order s ∈ N^+ (see, e.g., [14, 10]), defined as follows:

$$M_s(x) := \frac{1}{(s-1)!} \sum_{i=0}^{s} (-1)^i \binom{s}{i} \left(\frac{s}{2} + x - i\right)_+^{s-1}, \qquad x \in \mathbb{R},$$

where the function $(x)_+ := \max\{x, 0\}$ denotes the positive part of x ∈ R. In [32], a procedure to construct sigmoidal functions by using the central B-spline of order s has been described. More in detail, for any given positive integer s, we will denote by σ_s(x) the sigmoidal function:

$$\sigma_s(x) := \int_{-\infty}^{x} M_s(t)\, dt, \qquad x \in \mathbb{R}. \tag{1}$$

Note that σ_s(x) is non-decreasing and 0 ≤ σ_s(x) ≤ 1, for every x ∈ R and s ∈ N^+. We are now able to introduce the non-negative one-dimensional density functions by the following finite linear combination of σ_s:

$$\phi_s(x) := \sigma_s(x + 1/2) - \sigma_s(x - 1/2), \qquad x \in \mathbb{R}. \tag{2}$$

We will use the density functions defined above as activation functions of the neural network operators studied in this paper. It is easy to see that the functions φ_s(x) satisfy the following useful properties:

(Φ1) φ_s(x) is an even function;

(Φ2) φ_s(x) is non-decreasing for x < 0 and non-increasing for x ≥ 0;

(Φ3) supp(φ_s) ⊆ [−K_s, K_s], where K_s := (s + 1)/2, i.e., φ_s(x) = 0 for every x ∈ R with |x| ≥ K_s;

(Φ4) 0 ≤ φ_s(x) ≤ 1 for every x ∈ R; in particular, φ_s(0) > 0 and φ_s(K_s/2) > 0.

Conditions (Φi), i = 1, ..., 4, follow from the well-known properties of the central B-spline; for more details, see [14, 32, 24, 25].

Remark 2.1. Note that the functions φ_s(x) are approximate identities, and they satisfy some important conditions, such as:

$$\sum_{k \in \mathbb{Z}} \phi_s(x - k) \;=\; 1, \qquad x \in \mathbb{R},$$

where the above series is convergent on the compact subsets of R. The proof of this claim can be made by following the line drawn in [28, 29].
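The following Python sketch (helper names are my own; the closed form used for σ_s comes from integrating each truncated power of the B-spline formula term by term, an implementation choice rather than something stated in the paper) evaluates M_s, σ_s and φ_s, and checks the partition-of-unity property of Remark 2.1 numerically.

```python
import math
import numpy as np

def M(s, x):
    # central B-spline of order s (formula above)
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for i in range(s + 1):
        out += (-1) ** i * math.comb(s, i) * np.where(
            s / 2 + x - i > 0, (s / 2 + x - i) ** (s - 1), 0.0)
    return out / math.factorial(s - 1)

def sigma(s, x):
    # sigma_s(x) = int_{-inf}^x M_s(t) dt, cf. (1); each truncated power
    # (s/2 + t - i)_+^{s-1} integrates to (s/2 + x - i)_+^s / s
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for i in range(s + 1):
        out += (-1) ** i * math.comb(s, i) * np.maximum(s / 2 + x - i, 0.0) ** s
    return out / math.factorial(s)

def phi(s, x):
    # density function phi_s(x) = sigma_s(x + 1/2) - sigma_s(x - 1/2), cf. (2)
    return sigma(s, x + 0.5) - sigma(s, x - 0.5)

# Remark 2.1: sum_k phi_s(x - k) = 1 (the sum is finite, by (Phi3))
s = 3
x = np.linspace(-2.0, 2.0, 9)
print(np.allclose(sum(phi(s, x - k) for k in range(-10, 11)), 1.0))  # True
```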

Remark 2.2. The functions of the form φ_s(x), s ∈ N^+, are "centered bell-shaped functions", according to the definition given in [18, 1].

We now introduce the following multivariate density functions:

$$\psi_s(x) := \phi_s(x_1) \cdot \phi_s(x_2) \cdots \phi_s(x_d), \qquad x := (x_1, \dots, x_d) \in \mathbb{R}^d. \tag{3}$$

In order to introduce the multivariate neural network interpolation operators, we will consider from now on a normalized version of ψ_s(x), suitable to study the above operators in the case of functions defined on box-type domains, i.e., domains of the form R := [a_1, b_1] × ··· × [a_d, b_d]. We denote by:

$$\Psi_s(x) := \psi_s\!\left(\frac{x_1}{b_1 - a_1}, \frac{x_2}{b_2 - a_2}, \dots, \frac{x_d}{b_d - a_d}\right), \qquad x \in R, \tag{4}$$

the multivariate normalized density functions defined on R. From condition (Φ4), it is easy to see that ‖ψ_s‖_∞ ≤ 1 and ‖Ψ_s‖_∞ ≤ 1 for every s ∈ N^+, where the symbol ‖·‖_∞ denotes the usual sup-norm of ψ_s and Ψ_s on their respective domains, i.e., ‖ψ_s‖_∞ := sup_{x ∈ R^d} |ψ_s(x)| and ‖Ψ_s‖_∞ := sup_{x ∈ R} |Ψ_s(x)|. By using Ψ_s(x), we introduce the following definition of the multivariate neural network interpolation operators.

Definition 2.3. Let f : R → R be a bounded and measurable function and n ∈ N^+. The multivariate neural network (NN) interpolation operators activated by σ_s and acting on f are defined by:

$$F_n^s(f, x) := \frac{\displaystyle\sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} f(t_k)\, \Psi_s\big(n K_s (x - t_k)\big)}{\displaystyle\sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \Psi_s\big(n K_s (x - t_k)\big)}, \qquad x \in R,$$

where the t_k's are defined by $t_k := (t^1_{k_1}, \dots, t^d_{k_d})$, $k = (k_1, \dots, k_d)$, with $t^i_{k_i} = a_i + k_i\, h_i$, $k_i = 0, 1, \dots, n$, $h_i = (b_i - a_i)/n$, $i = 1, 2, \dots, d$, uniformly spaced points, and K_s > 0 is the constant introduced in condition (Φ3).
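A minimal numerical sketch of Definition 2.3 follows (function and variable names are my own; the closed form used for σ_s is the same term-by-term antiderivative as in the previous sketch). It builds the uniformly spaced nodes t_k on R, evaluates Ψ_s through the normalization (4), forms the quotient defining F_n^s, and checks at one node the interpolation property stated in Theorem 2.5 below.

```python
import math
import itertools
import numpy as np

def sigma(s, x):
    # sigma_s from (1), via term-by-term integration of the B-spline formula
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for i in range(s + 1):
        out += (-1) ** i * math.comb(s, i) * np.maximum(s / 2 + x - i, 0.0) ** s
    return out / math.factorial(s)

def phi(s, x):
    # density phi_s from (2)
    return sigma(s, x + 0.5) - sigma(s, x - 0.5)

def F(s, n, f, x, a, b):
    """F_n^s(f, x) of Definition 2.3 on R = [a_1, b_1] x ... x [a_d, b_d]."""
    a, b, x = (np.asarray(v, dtype=float) for v in (a, b, x))
    d = len(a)
    Ks = (s + 1) / 2
    h = (b - a) / n
    num = den = 0.0
    for k in itertools.product(range(n + 1), repeat=d):
        t_k = a + np.array(k) * h                         # node t_k
        # Psi_s(n K_s (x - t_k)), using the normalization (4)
        w = np.prod(phi(s, n * Ks * (x - t_k) / (b - a)))
        num += f(t_k) * w
        den += w
    return num / den

# illustrative run on R = [0,1] x [0,2] with a hypothetical target function
f = lambda p: p[0] ** 2 + np.sin(p[1])
a, b, n, s = (0.0, 0.0), (1.0, 2.0), 4, 2
t_node = np.array([0.5, 1.0])                             # the node with k = (2, 2)
print(F(s, n, f, t_node, a, b), f(t_node))                # the two values coincide
```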

The operators F_n^s here introduced are well-defined for every n ∈ N^+. This claim can be proved as follows. For any fixed x ∈ R we have:

$$\sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \Psi_s\big(n K_s (x - t_k)\big) \;=\; \sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \Psi_s\big(n K_s |x_1 - t^1_{k_1}|, \dots, n K_s |x_d - t^d_{k_d}|\big),$$

since condition (Φ1) holds. Now, it is easy to see that:

$$\sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \Psi_s\big(n K_s |x_1 - t^1_{k_1}|, \dots, n K_s |x_d - t^d_{k_d}|\big) \;\geq\; \Psi_s\big(n K_s |x_1 - t^1_{k_1^*}|, \dots, n K_s |x_d - t^d_{k_d^*}|\big), \tag{5}$$

where $t_{k^*} := (t^1_{k_1^*}, \dots, t^d_{k_d^*})$ is a suitable node such that every component satisfies the following property:

$$|x_i - t^i_{k_i^*}| \;\leq\; h_i/2, \qquad i = 1, 2, \dots, d. \tag{6}$$

From (6) it turns out that $n K_s |x_i - t^i_{k_i^*}| \leq (b_i - a_i)(K_s/2)$, $i = 1, 2, \dots, d$, hence by the inequality in (5), (Φ2) and (Φ4) we obtain:

$$\Psi_s\big(n K_s |x_1 - t^1_{k_1^*}|, \dots, n K_s |x_d - t^d_{k_d^*}|\big) \;\geq\; \Psi_s\!\left(\frac{K_s (b_1 - a_1)}{2}, \dots, \frac{K_s (b_d - a_d)}{2}\right) \;=\; \phi_s(K_s/2) \cdots \phi_s(K_s/2) \;=\; [\phi_s(K_s/2)]^d > 0. \tag{7}$$

Moreover, for every bounded and measurable function f : R → R we have:

$$|F_n^s(f, x)| \;\leq\; \|f\|_\infty\, \frac{\displaystyle\sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \Psi_s\big(n K_s (x - t_k)\big)}{\displaystyle\sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \Psi_s\big(n K_s (x - t_k)\big)} \;=\; \|f\|_\infty \;<\; +\infty.$$

Remark 2.4. We can observe that, in the NN interpolation operators F_n^s, the same number, n, of equally spaced samples (the nodes $t^i_{k_i}$) has been considered in each one-dimensional interval [a_i, b_i] of the box-domain R. From the point of view of applications it can occur that, for every [a_i, b_i], one should consider different numbers, n_i ∈ N^+, i = 1, 2, ..., d, of sample values. Taking this fact into account, Definition 2.3 can easily be reformulated as follows:

$$\widetilde{F}_n^s(f, x) := \frac{\displaystyle\sum_{k_1=0}^{n_1} \cdots \sum_{k_d=0}^{n_d} f(t_k)\, \Psi_s\big(n_1 K_s (x_1 - t^1_{k_1}), \dots, n_d K_s (x_d - t^d_{k_d})\big)}{\displaystyle\sum_{k_1=0}^{n_1} \cdots \sum_{k_d=0}^{n_d} \Psi_s\big(n_1 K_s (x_1 - t^1_{k_1}), \dots, n_d K_s (x_d - t^d_{k_d})\big)},$$

for every x ∈ R, n_i ∈ N^+, i = 1, 2, ..., d, where f : R → R is any bounded and measurable function. In this paper, we do not treat the operators $\widetilde{F}_n^s$ since their notation is more cumbersome than that used for F_n^s. However, the proofs of the results below, both for F_n^s and $\widetilde{F}_n^s$, follow by the same arguments.

We are now able to prove the interpolation properties of F_n^s.

Theorem 2.5. Let f : R → R be a bounded and measurable function and n ∈ N^+. Then

$$F_n^s(f, t_j) = f(t_j), \qquad \text{for every } j := (j_1, j_2, \dots, j_d),$$

with j_i = 0, 1, ..., n, i = 1, 2, ..., d.

Proof. Let j := (j_1, j_2, ..., j_d), with j_i = 0, 1, ..., n, i = 1, 2, ..., d, be fixed. For every index k = (k_1, ..., k_d), k_i = 0, 1, ..., n, i = 1, 2, ..., d, if k ≠ j, i.e., there exists $k_{i^*} \neq j_{i^*}$ for some $i^* = 1, 2, \dots, d$, we have:

$$n K_s\, \big|t^{i^*}_{j_{i^*}} - t^{i^*}_{k_{i^*}}\big| \;\geq\; n K_s\, h_{i^*} \;=\; K_s (b_{i^*} - a_{i^*}),$$

hence, by (Φ1), (Φ2) and (Φ3) we obtain:

$$0 \;\leq\; \phi_s\!\left(n K_s\, \frac{\big|t^{i^*}_{j_{i^*}} - t^{i^*}_{k_{i^*}}\big|}{b_{i^*} - a_{i^*}}\right) \;\leq\; \phi_s(K_s) \;=\; 0,$$

then,

$$\Psi_s\big(n K_s (t_j - t_k)\big) = \Psi_s\big(n K_s |t^1_{j_1} - t^1_{k_1}|, \dots, n K_s |t^{i^*}_{j_{i^*}} - t^{i^*}_{k_{i^*}}|, \dots, n K_s |t^d_{j_d} - t^d_{k_d}|\big)$$

$$= \phi_s\!\left(n K_s\, \frac{|t^1_{j_1} - t^1_{k_1}|}{b_1 - a_1}\right) \cdots \phi_s\!\left(n K_s\, \frac{|t^{i^*}_{j_{i^*}} - t^{i^*}_{k_{i^*}}|}{b_{i^*} - a_{i^*}}\right) \cdots \phi_s\!\left(n K_s\, \frac{|t^d_{j_d} - t^d_{k_d}|}{b_d - a_d}\right) = 0.$$

Moreover, in case of k = j, we have:

$$\Psi_s\big(n K_s (t_j - t_k)\big) = \Psi_s(0) = \Psi_s(0, \dots, 0) = [\phi_s(0)]^d > 0.$$

Finally, we can summarize the above considerations as follows:

$$\Psi_s\big(n K_s (t_j - t_k)\big) = \begin{cases} [\phi_s(0)]^d, & \text{if } k = j, \\ 0, & \text{if } k \neq j. \end{cases} \tag{8}$$

By (8) it is easy to see what follows:

$$F_n^s(f, t_j) \;=\; \frac{f(t_j)\, \Psi_s\big(n K_s (t_j - t_j)\big)}{\Psi_s\big(n K_s (t_j - t_j)\big)} \;=\; f(t_j),$$

for every index j = (j_1, ..., j_d), with j_i = 0, 1, ..., n, i = 1, 2, ..., d.

for every index j = (j1 , ..., jd ), with ji = 0, 1, ..., n, i = 1, 2, ..., d. In case of NN interpolation operators Fns acting on continuous functions on R, the following uniform approximation theorem with order can be proved. In the estimate below, the well-known concept of modulus of continuity for multivariate functions dened on bounded domain is used. In what follows, we denote by C 0 (R) the set of all continuous functions f : R → R. For any f ∈ C 0 (R) its modulus of continuity is dened by: ω(f, δ) :=

sup x, y ∈R

kx−yk2 ≤δ

f (x) − f (y) ,

for δ > 0 and where k · k2 denotes the usual Euclidean norm of Rd . We can now prove the following. 8

(9)

= 0.

Theorem 2.6. Let f ∈ C^0(R) be fixed. Then

$$\|F_n^s(f, \cdot) - f(\cdot)\|_\infty \;\leq\; 2^d\, [\phi_s(K_s/2)]^{-d}\, \omega\!\big(f,\; M \sqrt{d}\, n^{-1}\big),$$

where M := max{b_i − a_i : i = 1, 2, ..., d}, for every n ∈ N^+.

Proof. Let x = (x_1, ..., x_d) ∈ R and n ∈ N^+ be fixed. We suppose, without any loss of generality, that x is different from the nodes t_k, since this case is covered by Theorem 2.5. By using the inequalities in (5) and (7) we can write what follows:

$$|F_n^s(f, x) - f(x)| \;=\; \frac{\left|\displaystyle\sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} f(t_k)\, \Psi_s\big(n K_s (x - t_k)\big) \;-\; f(x) \sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \Psi_s\big(n K_s (x - t_k)\big)\right|}{\displaystyle\sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \Psi_s\big(n K_s (x - t_k)\big)}$$

$$\leq\; \frac{1}{[\phi_s(K_s/2)]^d} \sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \big|f(t_k) - f(x)\big|\, \Psi_s\big(n K_s (x - t_k)\big).$$

Let now K be the set of all indices k = (k_1, ..., k_d), k_i = 0, 1, ..., n, i = 1, ..., d; moreover, we denote by S the subset of K of indices satisfying the following condition:

$$|x_i - t^i_{k_i}| \;<\; h_i, \qquad \text{for every } i = 1, 2, \dots, d. \tag{10}$$

Note that the cardinality of the set S, namely |S|, is such that |S| ≤ 2^d. Now, we can write:

$$|F_n^s(f, x) - f(x)| \;\leq\; \frac{1}{[\phi_s(K_s/2)]^d} \sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \big|f(t_k) - f(x)\big|\, \Psi_s\big(n K_s (x - t_k)\big)$$

$$=\; [\phi_s(K_s/2)]^{-d} \left\{\; \sum_{k \in K \setminus S} \big|f(t_k) - f(x)\big|\, \Psi_s\big(n K_s (x - t_k)\big) \;+\; \sum_{k \in S} \big|f(t_k) - f(x)\big|\, \Psi_s\big(n K_s (x - t_k)\big) \right\}$$

$$=:\; [\phi_s(K_s/2)]^{-d}\, \{I_1 + I_2\}.$$

 0 ≤ Ψs n Ks (x − tk )

  ∗ = Ψs n Ks |x1 − t1k1 |, . . . , n Ks |xi∗ − tiki∗ | . . . , n Ks |xd − tdkd |

9

≤ φs

! n Ks |x1 − t1k1 | · · ·φs (Ks )· · ·φs b1 − a1

n Ks |xd − tdkd | bd − ad

!

= 0.

(11)

From (11) it turns out that I1 = 0. Finally, we estimate the term I2 . Denoting by k · k the innity norm in Rd , where kuk := max {|ui | : i = 1, ..., d}, u = (u1 , ..., ud ) ∈ Rd , and by using (10), we can observe for every k ∈ S : kx − tk k ≤ max {hi : i = 1, ..., d} =:

M , n

where M := max {bi − ai : i = 1, ..., d}. Since in the Euclidean space Rd all norms are equivalent, we obtain: kx − tk k2

√ M d ≤ , n

for every

k ∈ S,

and consequently, recalling that kΨs k∞ ≤ 1 for every s ∈ N+ , the corresponding values: √ !  M d f (tk ) − f (x) Ψs n Ks (x − tk ) ≤ ω f, . n

In conclusion, since S contains at most 2d elements, we can nally deduce: |Fns (f,

−d

x) − f (x)| ≤ φs (Ks /2)

−d

I2 ≤ 2 φs (Ks /2) d

√ ! M d ω f, , n

for every x ∈ R, and so the assertion follows. 2.1

The case of sampling values assumed on an irregular grid of points

In general, in applications of NNs may occur that the values of a given training set are not uniformly spaced. In this case, the results of the previous section can not be applied. From the mathematical point of view, when sample values are assumed on an irregular grid of nodes, some diculties arise. Indeed, in the previous papers the theory of NN operators has been always treated in the case of uniform spaced sample values. In this subsection, we show how the results proved in Theorem 2.5 and Theorem 2.6 can be extended to the case of sampling values assumed on an irregular grid of nodes. Let f : R → R, R ⊂ Rd , be a bounded function. For any i = 1, ..., d, we consider the n + 1 nodes ti0 = ai < ti1 < ... < tin−1 < tin = bi , for every xed n ∈ N+ . Suppose that, in correspondence to n, there exist 0 < δ1(n) ≤ δ2(n) , with δ2(n) < 2 δ1(n) , and such that: (n)

δ1

(n)

≤ tiki +1 − tiki ≤ δ2 ,

for every 10

ki = 0, . . . , n − 1,

(12)

and for every i = 1, ..., d. Note that, the nodes tiki are not necessarily equally spaced. For the sake of simplicity, in what follows we denote by tk := (t1k1 , . . . , tdkd ), where k := (k1 , . . . , kd ). We construct the NN operators which are able to interpolate irregular sample values by:

Gsn (f,

x) :=

n X

k1 =0

···

n X

k1 =0

n X

(n)

n Ks δ2

f (tk ) Ψs

···

n X

(x − tk )

(n)

δ1

kd =0

!

(n)

Ψs

n Ks δ2 (n)

δ1

kd =0

!

x ∈ R,

,

(x − tk )

where the t_k's are the "irregular" vectors defined above, while Ψ_s and K_s > 0 are the same introduced in Section 2. Obviously, if $\delta_1^{(n)} = \delta_2^{(n)}$, the operators G_n^s reduce to the F_n^s studied in Section 2.

Now, in order to show that the operators G_n^s are well-defined for any bounded function f : R → R, we proceed similarly to what has been done in (5), (6), and (7); then, for every x ∈ R, we can write:

$$\sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n} \Psi_s\!\left(\frac{n K_s \delta_2^{(n)}}{\delta_1^{(n)}}\, (x - t_k)\right) \;\geq\; \left[\phi_s\!\left(\frac{K_s\, \delta_2^{(n)}}{2\, \delta_1^{(n)}}\right)\right]^d \;>\; 0,$$

where $0 < K_s \delta_2^{(n)}/(2\delta_1^{(n)}) < K_s$ since $\delta_2^{(n)} < 2\, \delta_1^{(n)}$.

Moreover, a property analogous to that given in (8) can be proved when the parameters $\delta_1^{(n)}$ and $\delta_2^{(n)}$ are introduced in the argument of the density function Ψ_s. Thus the following holds:

$$\Psi_s\!\left(\frac{n K_s \delta_2^{(n)}}{\delta_1^{(n)}}\, (t_j - t_k)\right) = \begin{cases} [\phi_s(0)]^d, & \text{if } k = j, \\ 0, & \text{if } k \neq j, \end{cases} \tag{13}$$

where the t_k are the vectors of irregular nodes defined above. In order to prove (13), we consider only the non-trivial case k ≠ j. Now, proceeding as in the proof of Theorem 2.5, we have that there exists $k_{i^*} \neq j_{i^*}$ for some $i^* = 1, \dots, d$, such that:

$$\frac{n K_s \delta_2^{(n)}}{\delta_1^{(n)}}\, \big|t^{i^*}_{k_{i^*}} - t^{i^*}_{j_{i^*}}\big| \;\geq\; \frac{n K_s \delta_2^{(n)}}{\delta_1^{(n)}}\, \delta_1^{(n)} \;=\; n K_s \delta_2^{(n)}. \tag{14}$$

Observing now that, by the above construction, we must have:

$$\delta_1^{(n)} \;\leq\; \frac{b_{i^*} - a_{i^*}}{n} \;\leq\; \delta_2^{(n)}, \tag{15}$$

we can deduce that $n/(b_{i^*} - a_{i^*}) \geq 1/\delta_2^{(n)}$, and then by (14) we obtain:

$$0 \;\leq\; \phi_s\!\left(\frac{n K_s \delta_2^{(n)}\, \big|t^{i^*}_{j_{i^*}} - t^{i^*}_{k_{i^*}}\big|}{\delta_1^{(n)}\, (b_{i^*} - a_{i^*})}\right) \;\leq\; \phi_s(K_s) \;=\; 0,$$

and this implies that $\Psi_s\!\left(\dfrac{n K_s \delta_2^{(n)}}{\delta_1^{(n)}}\, (t_j - t_k)\right) = 0$.

Now, as a consequence of (13), it is easy to prove that:

$$G_n^s(f, t_k) \;=\; f(t_k), \qquad \text{for every } k = (k_1, \dots, k_d),$$

with k_i = 0, 1, ..., n, and i = 1, ..., d.

Finally, we can also observe that Theorem 2.6 can be rewritten for the operators G_n^s. It is quite easy to see that the following estimate can be proved:

$$\|G_n^s(f, \cdot) - f(\cdot)\|_\infty \;\leq\; 2^d \left[\phi_s\!\left(\frac{K_s \delta_2^{(n)}}{2\, \delta_1^{(n)}}\right)\right]^{-d} \cdot\, \omega\!\big(f,\; \sqrt{d}\, \delta_1^{(n)}\big),$$

for any fixed f ∈ C^0(R). In fact, it is sufficient to repeat the same proof made for Theorem 2.6, replacing h_i with $\delta_1^{(n)}$. Finally, noting that (15) holds for every $i^* = 1, \dots, d$, we can deduce that $\delta_1^{(n)} \to 0$ as $n \to +\infty$, and moreover:

$$\|G_n^s(f, \cdot) - f(\cdot)\|_\infty \;\leq\; 2^d \left[\phi_s\!\left(\frac{K_s \delta_2^{(n)}}{2\, \delta_1^{(n)}}\right)\right]^{-d} \cdot\, \omega\!\big(f,\; \sqrt{d}\, M\, n^{-1}\big),$$

where M := max{b_i − a_i : i = 1, 2, ..., d}.
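A small numerical sketch of the irregular-grid operator G_n^s follows (one space dimension; function and variable names are my own, and the closed form for σ_s is the same term-by-term antiderivative used in the earlier sketches). It draws nodes whose consecutive gaps satisfy (12) and checks the interpolation property G_n^s(f, t_k) = f(t_k).

```python
import math
import numpy as np

def sigma(s, x):
    # sigma_s from (1), via term-by-term integration of the B-spline formula
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for i in range(s + 1):
        out += (-1) ** i * math.comb(s, i) * np.maximum(s / 2 + x - i, 0.0) ** s
    return out / math.factorial(s)

def phi(s, x):
    # density phi_s from (2)
    return sigma(s, x + 0.5) - sigma(s, x - 0.5)

def G(s, t, f, x, a, b, d1, d2):
    # G_n^s(f, x) on [a, b] (d = 1) for irregular nodes t_0 = a < ... < t_n = b
    n, Ks = len(t) - 1, (s + 1) / 2
    w = phi(s, n * Ks * (d2 / d1) * (x - t) / (b - a))
    return np.dot(f(t), w) / np.sum(w)

# irregular nodes on [0, 1] whose gaps lie between d1 and d2 < 2*d1
a, b, n, s = 0.0, 1.0, 5, 2
gaps = np.array([0.18, 0.22, 0.17, 0.20, 0.23])
t = a + np.concatenate(([0.0], np.cumsum(gaps / gaps.sum() * (b - a))))
d1, d2 = np.min(np.diff(t)), np.max(np.diff(t))
f = lambda x: np.cos(3 * x) + x ** 2                       # hypothetical target
print(all(np.isclose(G(s, t, f, tk, a, b, d1, d2), f(tk)) for tk in t))  # True
```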

Remark 2.7. In general, the problem of interpolation can also be studied for a generic (training) set of values, not necessarily related to a fixed grid of points in a multivariate domain R, see, e.g., [56, 43, 50]. At first sight, the above problem cannot be solved by the theory developed in Section 2. However, one can observe that any finite set of data can be mapped onto a suitable grid (with the same cardinality) in a box-domain of R^d, by means of a discrete invertible function. By this procedure, and using the inverse of the above ad-hoc function, we become able to construct NN operators which interpolate the values belonging to any finite set. We recall that, as noted in Remark 2.4, the number of sample values does not have to be the same for each spatial dimension of the considered domain.

3 Examples with graphical representations

The sigmoidal functions σ_s(x) used in this paper to construct the NN interpolation operators, defined by the theoretical procedure in (1), include some well-known examples of sigmoidal functions. For instance, in case of the central B-spline of order 1, namely M_1(x), the corresponding sigmoidal function σ_1(x) coincides with the well-known ramp function defined by:

$$\sigma_1(x) = \sigma_R(x) := \begin{cases} 0, & x \leq -1/2, \\ x + 1/2, & -1/2 < x < 1/2, \\ 1, & x \geq 1/2, \end{cases}$$

studied in relation to NN approximation, e.g., in [19, 17, 28, 29, 24, 25, 31]. Moreover, the sigmoidal functions corresponding to M_2(x) and M_3(x) are:

$$\sigma_2(x) := \begin{cases} 0, & x < -1, \\ x^2/2 + x + 1/2, & -1 \leq x < 0, \\ x - x^2/2 + 1/2, & 0 \leq x < 1, \\ 1, & x \geq 1, \end{cases}$$

and

$$\sigma_3(x) := \begin{cases} 0, & x < -3/2, \\ \frac{1}{6}\left(x + \frac{3}{2}\right)^3, & -3/2 \leq x < -1/2, \\ \frac{3}{4}x - \frac{x^3}{3} + \frac{1}{2}, & -1/2 \leq x \leq 1/2, \\ 1 - \frac{1}{6}\left(\frac{3}{2} - x\right)^3, & 1/2 < x \leq 3/2, \\ 1, & x \geq 3/2, \end{cases}$$

and so on for s > 3.
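The piecewise expressions above can be checked numerically against the integral definition (1); the following sketch (helper names are my own; it assumes scipy is available for the quadrature) compares σ_2 and σ_3 with a numerical integration of M_2 and M_3.

```python
import math
import numpy as np
from scipy.integrate import quad

def M(s, x):
    # central B-spline of order s
    return sum((-1) ** i * math.comb(s, i) * max(s / 2 + x - i, 0.0) ** (s - 1)
               for i in range(s + 1)) / math.factorial(s - 1)

def sigma_quad(s, x):
    # sigma_s(x) via numerical quadrature of (1); supp(M_s) = [-s/2, s/2]
    return quad(lambda t: M(s, t), -s / 2, min(x, s / 2))[0] if x > -s / 2 else 0.0

def sigma2(x):
    # piecewise closed form of sigma_2 given above
    if x < -1: return 0.0
    if x < 0:  return x ** 2 / 2 + x + 0.5
    if x < 1:  return x - x ** 2 / 2 + 0.5
    return 1.0

def sigma3(x):
    # piecewise closed form of sigma_3 given above
    if x < -1.5: return 0.0
    if x < -0.5: return (x + 1.5) ** 3 / 6
    if x <= 0.5: return 0.75 * x - x ** 3 / 3 + 0.5
    if x <= 1.5: return 1 - (1.5 - x) ** 3 / 6
    return 1.0

xs = np.linspace(-2, 2, 9)
print(np.allclose([sigma_quad(2, x) for x in xs], [sigma2(x) for x in xs]))  # True
print(np.allclose([sigma_quad(3, x) for x in xs], [sigma3(x) for x in xs]))  # True
```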

Now, we show some concrete examples of approximations and interpolations by means of the operators F_n^s, both in one and two space dimensions.

Example 3.1. Consider the continuous positive function f : [−1, 1] → R^+_0 defined by:

$$f(x) := \begin{cases} x^2 + 1, & -1 \leq x < 0, \\ e^{2x}, & 0 \leq x \leq 1. \end{cases}$$

We consider approximations and interpolations of f on [−1, 1] by the operators F_n^s, for s = 1, 2, 3 and n = 3, 5. The plots can be found in Fig. 1. We can observe that, in both plots of Fig. 1, the nodes where the function f is interpolated can be easily identified as the intersection points of all the operators F_n^1, F_n^2, and F_n^3 with the plot of the original function f.

Figure 1: Approximations and interpolations of the function f (x) of Example 3.1 obtained by the NN operators Fns , with n = 3 (left) and n = 5 (right) respectively, and s = 1, 2, 3, i.e., with σ1 (x), σ2 (x), and σ3 (x).

Moreover, we can also observe that the regularity of the operators increases as the parameter s increases. This latter property is due to the fact that, in general, the regularity of the central B-spline of order s increases in the same way. Further, we can also note that, by increasing the index n, the quality of the approximations improves, according to what has been proved in Theorem 2.6.
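The following short script (a sketch of mine using matplotlib; for s = 1 the density φ_1 reduces to the triangular hat function, which is used directly) reproduces the s = 1 curves of Example 3.1 for n = 3 and n = 5.

```python
import numpy as np
import matplotlib.pyplot as plt

phi1 = lambda x: np.maximum(1.0 - np.abs(x), 0.0)          # phi_1 = hat function
f = lambda x: np.where(x < 0, x ** 2 + 1, np.exp(2 * x))   # function of Example 3.1

def F1(n, x, a=-1.0, b=1.0):
    # F_n^1(f, x) on [a, b]: K_1 = 1, nodes t_k = a + k (b - a)/n
    t = a + np.arange(n + 1) * (b - a) / n
    w = phi1(n * (x[:, None] - t[None, :]) / (b - a))       # Psi_1(n K_1 (x - t_k))
    return (w @ f(t)) / w.sum(axis=1)

x = np.linspace(-1, 1, 400)
for n in (3, 5):
    plt.plot(x, F1(n, x), label=f"F_{n}^1")
plt.plot(x, f(x), "k--", label="f")
plt.legend()
plt.show()
```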

Example 3.2. Finally, let F : [0, 2] × [0, 2] → R be defined by:

$$F(x, y) := 5x\, \frac{\cos(xy)}{x^2 + y + 1}.$$

We consider approximations and interpolations of F on the square [0, 2] × [0, 2] by the bivariate operators F_n^s, for s = 1, 2, 3, and n = 3, 5, 10. The plots related to Example 3.2 can be found in Figs. 2, 3, and 4. The same observations made for the approximations and interpolations of the one-dimensional function of Example 3.1 apply also to this example in two space dimensions.

4 Discussion of the results and final conclusions

The problem of data interpolation and approximation of functions of several variables is of central interest in the general theory of neural networks, for instance, for applications concerning the training of NNs. The multivariate operators F_n^s allow us to reproduce data belonging to a suitable set (called the training set) without errors. This fact is shown in Theorem 2.5, where the interpolation properties of the above operators are proved. The uniform approximation theorem with order (see Theorem 2.6) shows that we can also reproduce sample values not belonging to the training set, and the reproduction errors can be estimated.

Figure 2: Approximations and interpolations of the function F (x, y) of Example 3.2 obtained by the two-dimensional NN operators Fns , with n = 3, and s = 1, 2, 3, i.e., with σ1 (x), σ2 (x), and σ3 (x).

Figure 3: Approximations and interpolations of the function F (x, y) of Example 3.2 obtained by the two-dimensional NN operators Fns , with n = 5, and s = 1, 2, 3, i.e., with σ1 (x), σ2 (x), and σ3 (x).

Figure 4: Approximations and interpolations of the function F (x, y) of Example 3.2 obtained by the two-dimensional NN operators Fns , with n = 10, and s = 1, 2, 3, i.e., with σ1 (x), σ2 (x), and σ3 (x).

In recent years, many authors have studied the problem of exact interpolation of n + 1 distinct samples by neural networks with n + 1 neurons, and several proofs have been proposed. Analytical-type proofs were given in [56, 43, 45, 54, 44]. In particular, in [45] the activation functions of the NNs are continuous and non-decreasing sigmoidal functions, and the interpolation is obtained by means of weights belonging to the unit ball of R^d. Moreover, Pinkus in [54] proved the interpolation assuming more general activation functions, i.e., requiring only that they are continuous on the whole of R and are not polynomials. Constructive (or, more precisely, algebraic) proofs of the above result can also be found in [39, 7, 57, 8]. For instance, in [39] an interpolation theorem has been proved for σ(x) = σ_ℓ(x) being the logistic function, while in [7] any nonlinear activation function was considered. The latter approach was criticized by Babri and Huang in [8] for its limited generality. In all the above constructive papers, the problem of finding suitable coefficients for the interpolating NNs was solved by determining the solutions of certain (n + 1) × (n + 1) linear algebraic systems. Finally, in [50] a constructive procedure is provided in order to construct one-dimensional interpolation NNs activated by sigmoidal functions.

The approach proposed in the present paper is very different from those quoted above, since it is analytical and constructive, but non-algebraic, in the

sense that the solution of algebraic linear systems is not required. In fact, the coefficients and all the other terms composing the operators F_n^s (or G_n^s) are always known. An approach that can be considered similar to that introduced in the present paper is that given in [50]. However, it is mainly related to the problem of approximate (and not exact) interpolation. For other results concerning approximate interpolation see, e.g., [56]. Moreover, we can also observe that in the papers quoted above, in general, the values that are interpolated are generic and not related to a fixed grid of nodes. Here, instead, the sample values at which target functions are interpolated by F_n^s are considered on a grid of suitable uniformly spaced nodes in the domain of f, while, by means of the operators G_n^s introduced in Subsection 2.1, we are able to interpolate data on grids of points which are not necessarily equally spaced; moreover, we can also interpolate values belonging to training sets containing sample values which are not related to grids of points. Clearly, the latter case occurs more frequently in applications.

The definition of the univariate density functions φ_s(x) used in this paper, first introduced in [25], is slightly different from the definition of the density functions φ_σ(x) used in [29] to define and study the multivariate approximation NN operators N_n^σ. The φ_σ(x) were defined therein by:

$$\phi_\sigma(x) := \frac{1}{2}\big[\sigma(x + 1) - \sigma(x - 1)\big], \qquad x \in \mathbb{R},$$

for suitable non-decreasing sigmoidal functions σ(x), and the corresponding multivariate density functions were:

$$\Psi_\sigma(x) := \phi_\sigma(x_1) \cdot \phi_\sigma(x_2) \cdots \phi_\sigma(x_d), \qquad x \in \mathbb{R}^d.$$

In this case, no normalization of the φ_σ(x) composing Ψ_σ(x) was required, as instead happens for the multivariate density functions Ψ_s(x) considered in this paper. The operators N_n^σ activated by Ψ_σ(x) and studied in [29] (or in [3, 4, 5] in the particular cases of the logistic and hyperbolic tangent activation functions), in general, do not interpolate a given bounded and measurable function of several variables defined on R, i.e., N_n^σ(f, k/n) ≠ f(k/n). Other differences between the operators F_n^s and N_n^σ can be observed. For instance, the sample values used to define F_n^s and N_n^σ are respectively f(t_k) and f(k/n), where the t_k are those introduced in Definition 2.3 and are not necessarily of the form k/n, n ∈ N^+; the weights used in N_n^σ are the same for each variable and equal to n, while, in the case of F_n^s, we have different weights associated to different variables, equal to $n K_s/(b_i - a_i)$, i = 1, 2, ..., d. Finally, the threshold values are $-n K_s\, t^i_{k_i}/(b_i - a_i)$, i = 1, 2, ..., d, for each variable in the case of F_n^s, and −k for N_n^σ.
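To make the weight/threshold comparison explicit (this is my own unpacking of the product defining Ψ_s, not a formula stated in the paper), each factor of Ψ_s(nK_s(x − t_k)) can be written as

$$\phi_s\!\left(\frac{n K_s}{b_i - a_i}\, x_i \;-\; \frac{n K_s\, t^i_{k_i}}{b_i - a_i}\right) \;=\; \sigma_s\!\big(w_i x_i + \theta_i + 1/2\big) - \sigma_s\!\big(w_i x_i + \theta_i - 1/2\big), \qquad i = 1, \dots, d,$$

with weight $w_i = n K_s/(b_i - a_i)$ and threshold $\theta_i = -n K_s\, t^i_{k_i}/(b_i - a_i)$, so that, coordinate-wise, F_n^s is built from activations of the form appearing in the representation (I).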

The observation made above reveals the connections between the theory of NN operators and the theory of sampling operators, see, e.g., [15, 11, 58, 10, 12]. Indeed, discrete sampling operators are in general based upon kernel functions which are approximate identities and, usually, they are used to reconstruct continuous signals defined on the whole real line (with unbounded duration) or on the whole space R^d, by means of a discrete family of their samples, see, e.g., [55, 11, 59]. Moreover, not necessarily continuous signals can also be reconstructed. In this case, sampling operators of the Kantorovich type seem to be the most appropriate to perform this task, see, e.g., [9, 33, 34, 22, 23, 35, 36, 37]. As shown in Remark 2.1, the density functions φ_s(x) defined in this paper satisfy all the typical properties of the approximate identities and hence can be used as kernels in the above sampling operators in the univariate case. Adopting the same approach given in [29], it is possible to prove that the multivariate functions ψ_s(x) and Ψ_s(x) are also approximate identities, and can then be used for the same purpose. We can finally observe that the F_n^s can be viewed as sampling operators themselves, with special kernels constructed from sigmoidal functions and useful to reconstruct continuous signals with bounded duration, see, e.g., [14, 15, 10].

Acknowledgment

The author is a member of the Gruppo Nazionale per l'Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM). Moreover, the author would like to thank the referees for their suggestions, which have been very useful in order to improve the quality of the present paper.

References

[1] G.A. Anastassiou, Rate of convergence of some neural network operators to the unit-univariate case, J. Math. Anal. Appl. 212 (1997) 237-262.
[2] G.A. Anastassiou, Univariate hyperbolic tangent neural network approximation, Math. Comput. Modelling 53 (5-6) (2011) 1111-1132.
[3] G.A. Anastassiou, Multivariate hyperbolic tangent neural network approximation, Comput. Math. Appl. 61 (4) (2011) 809-821.
[4] G.A. Anastassiou, Multivariate sigmoidal neural network approximation, Neural Networks 24 (2011) 378-386.
[5] G.A. Anastassiou, Intelligent Systems: Approximation by Artificial Neural Networks, Intelligent Systems Reference Library 19, Springer-Verlag, Berlin, 2011.

[6] G.A. Anastassiou, Univariate sigmoidal neural network approximation, J. Comput. Anal. Appl. 14 (4) (2012) 659-690.
[7] P.J. Antsaklis, M.A. Sartori, A simple method to derive bounds on the size and to train multilayer neural networks, IEEE Trans. Neural Networks 2 (4) (1991) 467-471.
[8] H.A. Babri, G.B. Huang, Feedforward neural networks with arbitrary bounded nonlinear activation functions, IEEE Trans. Neural Networks 9 (1) (1998) 224-229.
[9] C. Bardaro, P.L. Butzer, R.L. Stens, G. Vinti, Kantorovich-type generalized sampling series in the setting of Orlicz spaces, Sampling Theory in Signal and Image Processing 6 (1) (2007) 29-52.
[10] C. Bardaro, J. Musielak, G. Vinti, Nonlinear Integral Operators and Applications, De Gruyter Series in Nonlinear Analysis and Applications 9, New York, Berlin, 2003.
[11] C. Bardaro, G. Vinti, A general approach to the convergence theorems of generalized sampling series, Applicable Analysis 64 (1997) 203-217.
[12] C. Bardaro, G. Vinti, An abstract approach to sampling type operators inspired by the work of P.L. Butzer - Part I - Linear operators, Sampl. Theory Signal Image Process. 2 (3) (2003) 271-296.
[13] A.R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inform. Theory 39 (3) (1993) 930-945.
[14] P.L. Butzer, R.J. Nessel, Fourier Analysis and Approximation, Pure and Applied Mathematics 40, Academic Press, New York-London, 1971.
[15] P.L. Butzer, S. Ries, R.L. Stens, Approximation of continuous and discontinuous functions by generalized sampling series, J. Approx. Theory 50 (1987) 25-39.
[16] F. Cao, Z. Chen, The approximation operators with sigmoidal functions, Comput. Math. Appl. 58 (4) (2009) 758-765.
[17] F. Cao, Z. Chen, The construction and approximation of a class of neural networks operators with ramp functions, J. Comput. Anal. Appl. 14 (1) (2012) 101-112.
[18] P. Cardaliaguet, G. Euvrard, Approximation of a function and its derivative with a neural network, Neural Networks 5 (2) (1992) 207-220.
[19] G.H.L. Cheang, Approximation with neural networks activated by ramp sigmoids, J. Approx. Theory 162 (2010) 1450-1465.

[20] E.W. Cheney, W.A. Light, Y. Xu, On kernels and approximation orders, in: Approximation Theory, G. Anastassiou (Ed.), Marcel Dekker, New York (1992) 227-242.
[21] E.W. Cheney, W.A. Light, Y. Xu, Constructive methods of approximation by ridge functions and radial functions, Numerical Algorithms 4 (2) (1993) 205-223.
[22] F. Cluni, D. Costarelli, A.M. Minotti, G. Vinti, Enhancement of thermographic images as tool for structural analysis in earthquake engineering, NDT & E International 70 (2015) 60-72.
[23] F. Cluni, D. Costarelli, A.M. Minotti, G. Vinti, Applications of sampling Kantorovich operators to thermographic images for seismic engineering, J. Comput. Anal. Appl. 19 (4) (2015) 602-617.
[24] D. Costarelli, Sigmoidal functions approximation and applications, Ph.D. Thesis, "Roma Tre" University, Rome, Italy (2014).
[25] D. Costarelli, Interpolation by neural network operators activated by ramp functions, J. Math. Anal. Appl. 419 (2014) 574-582.
[26] D. Costarelli, R. Spigler, Constructive approximation by superposition of sigmoidal functions, Anal. Theory Appl. 29 (2) (2013) 169-196.
[27] D. Costarelli, R. Spigler, Solving Volterra integral equations of the second kind by sigmoidal functions approximations, J. Integral Equations Appl. 25 (2) (2013) 193-222.
[28] D. Costarelli, R. Spigler, Approximation results for neural network operators activated by sigmoidal functions, Neural Networks 44 (2013) 101-106.
[29] D. Costarelli, R. Spigler, Multivariate neural network operators with sigmoidal activation functions, Neural Networks 48 (2013) 72-77.
[30] D. Costarelli, R. Spigler, A collocation method for solving nonlinear Volterra integro-differential equations of the neutral type by sigmoidal functions, J. Integral Equations Appl. 26 (1) (2014) 15-52.
[31] D. Costarelli, R. Spigler, Convergence of a family of neural network operators of the Kantorovich type, J. Approx. Theory 185 (2014) 80-90.
[32] D. Costarelli, R. Spigler, Approximation by series of sigmoidal functions with applications to neural networks, Ann. Mat. Pura Appl. 194 (1) (2015) 289-306, DOI:10.1007/s10231-013-0378-y.

[33] D. Costarelli, G. Vinti, Approximation by multivariate generalized sampling Kantorovich operators in the setting of Orlicz spaces, Bollettino U.M.I. (9) IV (2011) 445-468.
[34] D. Costarelli, G. Vinti, Approximation by nonlinear multivariate sampling-Kantorovich type operators and applications to image processing, Numerical Functional Analysis and Optimization 34 (8) (2013) 819-844.
[35] D. Costarelli, G. Vinti, Order of approximation for sampling Kantorovich operators, J. Int. Eq. Appl. 26 (3) (2014) 345-368.
[36] D. Costarelli, G. Vinti, Rate of approximation for multivariate sampling Kantorovich operators on some functions spaces, J. Int. Eq. Appl. 26 (4) (2014) 455-481.
[37] D. Costarelli, G. Vinti, Sampling Kantorovich operators and their applications to approximation problems and to digital image processing, in: Proceedings of the 8th International Conference on Applied Mathematics, Simulation, Modelling (ASM'14), Florence, Italy, November 22-24, 2014; Recent Advances in Applied Mathematics, Modelling and Simulation (2014) 256-260.
[38] G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems 2 (1989) 303-314.
[39] S. Dasgupta, Y. Shristava, Neural networks for exact matching of functions on a discrete domain, in: Proceedings of the 29th IEEE Conference on Decision and Control, Honolulu, 1990, pp. 1719-1724.
[40] G. Gripenberg, Approximation by neural network with a bounded number of nodes at each level, J. Approx. Theory 122 (2) (2003) 260-266.
[41] N. Hahm, B. Hong, Approximation order to a function in C(R) by superposition of a sigmoidal function, Applied Math. Lett. 15 (2002) 591-597.
[42] V.E. Ismailov, On the approximation by neural networks with bounded number of neurons in hidden layers, J. Math. Anal. Appl. 417 (2) (2014) 963-969.
[43] Y. Ito, Nonlinearity creates linear independence, Adv. Comput. Math. 5 (1996) 189-203.
[44] Y. Ito, Independence of unscaled basis functions and finite mappings by neural networks, Math. Sci. 26 (2001) 117-126.
[45] Y. Ito, K. Saito, Superposition of linearly independent functions and finite mappings by neural networks, Math. Sci. 21 (1996) 27-33.

[46] P.C. Kainen, V. Kurková, An integral upper bound for neural network approximation, Neural Comput. 21 (2009) 2970-2989.
[47] V. Kurková, Complexity estimates based on integral transforms induced by computational units, Neural Networks 33 (2012) 160-167.
[48] B. Lenze, Constructive multivariate approximation with sigmoidal functions and applications to neural networks, in: Numerical Methods of Approximation Theory, Birkhäuser Verlag, Basel-Boston-Berlin (1992) 155-175.
[49] G. Lewicki, G. Marino, Approximation by superpositions of a sigmoidal function, Zeitschrift für Analysis und ihre Anwendungen, J. for Analysis and its Appl. 22 (2) (2003) 463-470.
[50] B. Llanas, F.J. Sainz, Constructive approximate interpolation by neural networks, J. Comput. Appl. Math. 188 (2006) 283-308.
[51] V. Maiorov, Approximation by neural networks and learning theory, J. Complexity 22 (1) (2006) 102-117.
[52] Y. Makovoz, Random approximants and neural networks, J. Approx. Theory 85 (1996) 98-109.
[53] Y. Makovoz, Uniform approximation by neural networks, J. Approx. Theory 95 (2) (1998) 215-228.
[54] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numer. 8 (1999) 143-195.
[55] S. Ries, R.L. Stens, Approximation by generalized sampling series, in: Constructive Theory of Functions '84, Sofia, 1984, 746-756.
[56] E.D. Sontag, Feedforward nets for interpolation and classification, J. Comp. Syst. Sci. 45 (1992) 20-48.
[57] S. Tamura, M. Tateishi, Capabilities of a four-layered feedforward neural network, IEEE Trans. Neural Networks 8 (2) (1997) 251-255.
[58] G. Vinti, A general approximation result for nonlinear integral operators and applications to signal processing, Applicable Analysis 79 (2001) 217-238.
[59] G. Vinti, L. Zampogni, A unifying approach to convergence of linear sampling type operators in Orlicz spaces, Adv. Differential Equations 16 (5-6) (2011) 573-600.
