NEUROCOMPUTINC
Neurocomputing
16 (1997) 149-186
Simulating human generalization-and-approximation capability by minimal layer networks loaded with minimal samples Yoshikane Takahashi” NTT Communication Science Laboratories, Yokosuko, Kanagawa, 239, Japan Received 2 November 1995; accepted 1 April 1997
Abstract Owing to their generalization-and-approximation capability, humans can recognize an object world such as a motion picture film by perceiving an appropriate finite set of points selected from among objects such as picture frames. This paper mathematically constructs a specific minimal layer network of a continuous type loaded with minimal finite samples that can simulate the human generalization-and-approximation capability. First, it formulates a Generalization-and-Approximation Problem (GAP) mathematically rigorously. It is to find a minimal pair of (sample, structure) that can generalize-and-approximate any given smooth mapping from an object world to another. Here, a structure of the network is comprised of unit sizes, synapse weights and biases. In addition, the pair (sample, structure) is minimal if the number of samples and the unit sizes are both minimum. Next, it reduces the GAP into a Standard Restricted Version (SRV-GAP) where solution construction techniques are restricted to those through polynomials. This is because of our aim to construct a practically feasible GAP solution that is calculable by almost all computers. Finally, the paper constructs a solution to the SRV-GAP rigorously, which constitutes its main results. It argues additionally that the solution also yields a useful practical version of the current universal approximation theorem that ensures an approximator network of a continuous mapping from one Euclidean space to another. Keywords: Neural networks; Generalization; Approximation; Minimal sample; Polynomials; Linear algebra
Continuum;
*Tel.:+ 8146859-2681; fax: + 8146859-3633;e-mail:
[email protected]. 0925-2312/97/$17.00 Copyright
PII SO925-23
12(96)00033-7
0
1997 Elsevier Science B.V. All rights reserved
Minimal
structure;
Y TakahashilNeurocomputing
150
16 (1997) 149-186
1. Introduction
It is popularly known that humans can recognize an object in the real-world only by perceiving some parts or finite points of it. Usually, a real object world such as a motion picture film shows a twofold aspect of a continuum. For instance, the film (the object world) consists of numerous picture frames (objects); each frame in turn contains a few to many persons, buildings, airplanes and trees (sub-objects) (Fig. 1). Thus the object world is considered to be comprised of a continuum of objects while the object consists of one or more continuum sub-objects. Humans generalize-andapproximate the object world well with finite points appropriately perceived from the objects and the sub-objects. This human generalization-and-approximation capability is visibly exemplified by, for instance, a film projector that maps a film onto a screen. Actually, the capability is a brain power that can map a physical outside object world continuously onto another object world inside the eye on retinas. There have been many excellent papers recently published that, theoretical or practical, deal with generalization and/or approximation capability of layer neural networks ([3, 5-7, 131 in particular). It is then expected that layer networks with smooth activation functions can simulate human capability suitably and efficiently. We thus propose a Generalization-and-Approximation Problem (GAP) as the most general one among such simulation problems. That is, the GAP is to find to any given smooth mapping from one object world onto another a minimal pair of (sample, structure) of the network that can generalize-and-approximate the mapping
film (continuum)
fiti
/
:
qr
frame 1 (object)
person, building, airplane, tree, etc. (sub-objects, continua) Fig. 1. An example
of a real object world
Y Takahashi JNeurocomputing 16 (I 997) 149-I 86
151
within any given admissible error. Here the object world and the sub-object are both represented by a continuum. Moreover, the sample loaded on the network is a set of finite points on the given mapping while the structure of the network, as usual, is comprised of unit sizes, synapse weights and biases. Also, (sample, structure) is called minimal if the sample points and the unit sizes are both minimum. Note that we have here supposed the following practical real-world requirement. [REl]
One can observe and collect from the environment only finite sample points on the mapping but not all.
The present paper proposes to construct a specific solution to the GAP mathematically rigorously that, at the same time, is really calculable in practice, usually by computers. The GAP as it stands is too difficult to solve since the minimal pair (sample, structure) requires the exhaustive comparative examination of all possible GAP solving methods. Moreover, almost all computers around us can calculate polynomials but a little beyond. Hence this paper studies GAP solving methods under the following practical real-world requirement while having to leave the full GAP for future study. [RE2] A pair (sample, structure) polynomial calculation.
of the GAP solution must be obtained with the
Therefore this paper focuses on constructing a solution to a Standard Restricted Version of the GAP (SRV-GAP) where solution construction techniques are restricted to those through polynomials. The SRV-GAP is a new problem and thus has not been studied so far in its entire form. Its subproblems, however, have been well investigated from several disparate standpoints. Let us briefly look at previous results in connection with our study of the SRV-GAP. Ji, Snapp and Psaltis [7] has motivated the study of the SRV-GAP. They modified the backpropagation algorithm for 3-layer networks to generalize a continuous curve from integer points. However, no reference was made to a minimal pair of (sample, structure), essential to the GAP. Funahashi [S] and Hornik et al. [6] provided basic theoretical confirmation of the GAP solution. They developed the universal approximation theorem to the simplest case of the GAP where the object world consists of a single object comprised of only one sub-object. The theorem states that any continuous mapping from a compact set in a P-dimensional Euclidean space to a Q-dimensional one can be approximated within any given admissible error with a 3-layer network with sufficiently many intermediate units. Their results, however, do not provide any practical construction method of the sample under requirement [REl], not to mention of the structure, crucial praticularly to the SRV-GAP. Additionally, their followers such as [14] have not progressed so far from them. Baum [2], Bichsel and Seitz [4], Lee [S], Mukhopadhyay et al. [ll], Roy et al. [12], Sartori and Antsaklis [13] and Zollner et al. [16] investigated the minimal structure to achieve a given discrete mapping. Networks they investigated, however, are of a threshold type for pattern classification and their searching methods for minimal structures heavily exploit that network features. It thus is very difficult to extend their techniques directly on the smooth type network of the GAP. In fact, neither the
152
Y. TakahashilNeurocomputing
16 (1997) 149-186
combinatorial method of [2] nor the maximum information method of [4] can have any power to the GAP because of their discrete nature. Nonetheless, we employ a linear algebraic method similar to that of [ 131 to examine the structure existence of a certain discrete type network derived from the GAP. Then Baum and Haussler [3] elucidated a bound on appropriate sample versus unit size sufficient for valid generalization from samples to total spaces subject to an arbitrary probability distribution. Their investigated networks are of a threshold type and besides their method depends essentially on the probability nature of input spaces. In addition, similar studies on samples have been done to smooth types of networks as well by Barlet [l], Malinowski and Zurada [9] and Mitchison and Durbin [lo]. Their studies, however, remain at an analysis stage and thus hardly provide any specific sample construction method. Hence we have to develop a sample construction method particular to the GAP. The survey of the previous related work has suggested us that to the SRV-GAP, we need to develop anew specific construction techniques of a minimal structure that also enable us to pick out a minimal sample. In addition, we have also learned from our earlier study Cl.51 that all essential techniques necessary for the SRV-GAP can be contained in those necessary for its basic form, called here a Standard Restricted Version of a Basic GAP (SRV-BGAP), where each object is a contour on a plane and the network has 3-layers. Hence this paper in essence constructs an SRV-GAP solution pair (sample, structure) mathematically with the aid of two classic mathematical tools; i.e., the Weierstrass’ polynomial approximation theorem and the solution existence theorem for linear equation systems. The former tool is very helpful in approximating continuous functions with polynomials while the latter in enhancing the structure existence condition by Sartori and Antsaklis [13] into a minimal structure existence condition for the SRV-BGAP. The present paper is comprised of ten sections. Section 2 expounds the GAP, focusing mostly on the BGAP. Section 3 mathematically formulates the BGAP. Section 4 reduces the BGAP into the SRV-BGAP. For the paper to be self-contained, Section 5 prepares the Weierstrass’ polynomial approximation theorem and the solution existence theorem for linear equation systems. Section 6 describes a solution to the SRV-BGAP in a form of Theorem 1, which constitutes our main result. Sections 7 and 8 demonstrate Theorem 1 mathematically rigorously by a series of Propositions. Section 9 extends Theorem 1 to the SRV-GAP. Results are stated as Theorem 2 and its corollary. The latter in particular yields a mathematically sharper and practically feasible version of the current universal approximation theorem. Finally, Section 10 concludes the paper with summarizing the solutions (Theorems 1 and 2, and Corollary) to the GAP, also referring to some future work. 2. A Generalization-and-Approximation 2.1. Premises
Problem (GAP)
on the GAP
Under the real-world environment [REl] and [RE2], the present paper further puts the following premises as major prerequisites for the GAP.
Y. TakahashilNeurocomputing 16 (1997) 149-186
153
Premise:
The paper considers the GAP under the following premises. [PRl] [PR2]
[PR3]
[PR4] [PR5]
[PR6]
The network is multi-layered with C1 (continuously differentiable) activation functions. An admissible error, represented by a non-negative real number EE Rf (a set of non-negative real numbers), is given. This 8 can be arbitrarily chosen and then is fixed throughout the paper. The object world consists of a continuum set X of objects each of which is comprised of one or more sub-objects. In addition, each sub-object is a compact set in [wp(P-dimensional Euclidean space). The set X is topologically organized into a compact metric space. A mapping f from X onto another space Y is in twofold-C’, i.e., in C1 inter-object as well as intra-object. This f can be arbitrarily chosen and then is fixed throughout the paper. One can know that the mapping f is in C’; and besides, obtain a Lipschitz’s constant Ls~ [W’ for J: This L, is fixed throughout the paper.
Let us make some additional remarks on Premise. (1) More elaborate and minor prerequisites for the GAP that are involved in our own GAP solving method are embedded scattered into the text, appearing under the heading “Assumption N”. (2) Refer to Sections 3 and 4 for mathematical details of all the technical terms in Premise such as “compact metric space” and “Lipschitz’s constant”. (3) We assumed in [PR3] and [PR4] that X is continuum and compact. Let us explain this premise for the sample object world of the motion picture. The picture film consists of numerous, or rather, actually a nearly infinite number of picture frames and is thus considered to constitute a continuum. Also, one can consider the picture film compact (totally bounded and complete) in the sense that it is bounded, i.e., the film length from the first film frame and the last is finite. (4) We assumed [PR5] since neural mappings generated by the network with C’ activation functions ([PRl]) are in C’. This premist [PR5] is a little stronger than that of the previous work such as [S-7]. (5) There exist Lipschitz’s constants for any C ’ function since it is well known that it satisfies a Lipschitz’s condition (refer to Section 3 for more details). Thus we assumed [PR6], although we think it not so easy to obtain actually a Lipschitz’s constant L., even the minimum one, solely from a finite sample from the C1 function J: 2.2. Statement
of the GAP
We here give a specification of the GAP in prose (also refer to Fig. 2), which provides a basis for the mathematical argument in Section 3. 1. A Generalization-and-Approximation Problem (GAP) consists of constructing any minimal SS-pair (o”,wo) from f and 8 given such that w” generalizes a sample go as well as approximates f within E.Here, 0’ indicates a finite sample that Definition
Y. TakahashilNeurocomputing
154
16 (1997) 149-186
plane
plane
contour
contour
I’ a
0
: differentiable-
sample data
mapping
generalize and approximate
f
neural mai&
three-layer
I
’
’
network
Fig. 2. A basic generalization-and-approximation
problem
is minimum in the sample size (i.e., the number of sample points) while CO’a structure (unit sizes, synapse weights, biases) that is minimum in the unit sizes (i.e., the numbers of units in each layer). A Basic GAP (BGAP) is a specially simple form of the GAP where every object is comprised of a single sub-object that in turn consists of a single contour on the plane R2; and besides, the network is 3-layered.
3. Mathematical
formulation
of the BGAP
Solving the GAP must start from formulating the BGAP by Definition 1 mathematically rigorously. This section is devoted to this formulation by developing several mathematical entities required for the BGAP.
Y. TakahashilNeurocomputing 16 (1997) 149-186
155
3.1. Mapping f 3.1.1. Domain X and range Y of f
The object world X in the BGAP is a set of contours. Let us begin with the following definition of contours. 2. A contour x is a simple continuous closed curve on the plane [w2and represented by Definition
x = {x(t) = (x1(t), x2(t))]tEZ = [0, 11; Xi(t):continuous on I (i = 1, 2); Xj(0) = Xi(l) (i = 1, 2); X(s) #X(t) if S # t}.
(1)
As seen from (l), contour x is a bounded and closed subspace of IF!’with respect to the distance fi2 defined by 8,(x,(t), xb(t)) s maxi= i, 2 IX,i(t) - xbi(t)] for all contour pairs (x,, xb) and all t~l.
(2)
Thus, contour x is known to be compact: for any covering of x, there always exists its finite refinement covering of x. Let us denote by C(1) a set of all the contours. C(Z) = (xix: contour on W}.
(3)
The set C(I) turns out a metric space with the distance dz defined by d2(x,,xb) = max,,162(xJt), xb(t))
for all pairs (x,, X~)EC(I) x C(I).
(4)
In many cases, the same notation x can implicitly indicate either or both of the twofold meanings suitably depending upon discussion contexts: one is a contour as a continuum on [w2and the other a point of the metric space C(I). Nonetheless, we also prepare explicit notation XM’ and xM2 to indicate distinctly the former meaning and the latter one, respectively, for discussion contexts that crucially rely on their clear distinction. In general, a metric space is compact if and only if it is both totally bounded (i.e., t.he existence of a finite s-covering for any E > 0) and complete (i.e., all the Cauchy sequences converge). Obviously, the metric space C(1) is not totally bounded and thus not compact. Since we assumed [PR3] and [PR4], we formulate the domain and the range off as follows. 3. Each of the domain X and the range Y of f is a compact metric subspace of C(1) with respect to the distance d2, and expressed by
Definition
X = {x,l~~~C(l);x,
#xb if a # b;a, beA};
(Sa)
Y = {~~l~~~W);a,bE/i}, where n indicates an index set with the cardinality
(sb) # (/i) = #( [0, 11).
Y. TakahashilNeurocomputing 16 (1997) 149-186
156
We similarly by X”’ contours Pi the points Since X compact, there covering of
XM2) the exists, for
3.1.2. Twofold-C’ mapping f We formulate the twofold-C’ Definition f:X-+
4. Twofold-C’ Y;
mapping
mapping
[f(xa)](t)
X that
=y,(t)
covering
of
comprised
of
its finite
f as follows.
f is a mapping for all a6A
from X to Y, i.e., and t~l
(6)
such that it satisfies the following twofold-C’ conditions simultaneously. (a) For each a E A, mapping f is in C1 at all the R2-points on xf’ with respect to distance h2. (b) Mapping f is continuous on the space XM2 with respect to distance d2. In addition, it is assumed that f is expanded to a mapping that is in twofold-C’ on C(1). For notational simplicity, the expansion is denoted also by J: Twofold-C’ mapping f includes, tours such as rotation and smooth
for instance, differentiable expansion/retraction.
deformations
on con-
3.2. Samples and structures 3.2.1. Samples Let M and N be any natural A, = (x,,,EXId2(xl,x2) and besides, A, a partitioning
numbers.
= ... = d2(x,,x,+l)
Definition
(7)
(8)
as follows.
o- z a(M, N) is specified
CJ= {(~m(t,),~,(t,))l~m~A,,~m n = 0, 1, .
= ... = d2(x,+_1,xM)),
. . . . N-l}.
a finite sample
5. A finite sample
of X such that
of I such that
A,-{t,~Zlt,=n/N;n=O,l, Then, we formulate
Also, let A, be a partitioning
=f(x,)~
by
Y, m = 1, . . . , M; t,gdi,
, N - l}.
In addition, each element (x,(t,),y,(t,))~ number MN is called a sample size.
(9) R4 of D is called a sample
point. Also, the
The specification of the sample by Definition 5 is one of the simplest. There are many other possible alternatives to that specification. For instance, the partitionings d, and A, need not be limited to the equally-divided. Such sample specifications more advanced, however, have to be left for a future study.
Y. TakahashilNeurocomputing 16 (1997) 149-186
3.2.2. Structures The network used in the BGAP is a 3-layer one with C’ activation specify that network.
157
functions.
Let us
Definition 6. A 3-layer network with C’ activaion functions is specified as follows. (a) Inputs consist of all the points of the contour er EX for all agn. Also, outputs can consist of all points of all contoursy M1 of any subspace of C(Z), not necessarily identical with the space Y =f(X). (b) The input layer has two units. Outputs from unit i are represented by a set jxi(t)ER]tEI) (i = 1, 2)
such that x”‘(t)
= (xl(t), x2(t)).
(10)
(c) The intermediate layer has K units where K, a natural number, size. The activation function at unit k is represented by ok
= (Pk[~iC(ikXi(t) - e,](i
indicates
= 1, 2; k = 1, . . . ,K),
a unit
(11)
where (Pi is any bounded, monotone increasing Cl-function and besides, &( - co) = 0; aik a synapse weight from Unit i to unit k; ok a bias of unit k. Eq. (11) is also expressed as a brief representation z(t) = cp[u(t) - 191 where CIE (a&) denotes a InatriX, X(t) = (Xi(t)) and 8 = (&) VeCtOrS while cp = ((Pk) a mapping from RK to [WK. (d) The output layer has two units. The activation function at unit j is represented by JJj(r) =
(k = 1, .
CkPkjZk(t)
, K; j = 1, 2),
where bkj is a synapse weight from unit k to unit. Eq. (12) is also expressed as a brief representation denotes a matrix. Prior function
(12)
y(t) = /Fz(t) where fi = (p,‘j)
to the structure specification, we make the following assumption (Pi and the paramCterS (K, uik, Pkj, 6,) that appear in Definition 6.
on the
Assumption 1. A maximum unit size KM available for the network is sufficiently very large. Also, any mapping vector cp = (ql, . . . , (Pp) is given known. In addition, both of KM and q are fixed constant. On the other hand, unit size K, the matrices a = (&k) and fl = (bkj), and the vector 0 = (0,) are all variables, the ranges of which are as follows: l
< Xik < 03
- ZC
(13)
c;O
(i = 1, 2; k = 1, ... ) K),
(14)
(k=l)...)
(15)
CC (k=l,
K;j=l,2), . . ..K).
(16)
Y. TakahashiJNeurocomputing 16 (1997) 149-186
158
Now, let us specify the structure as follows. Definition 7. A 4-tuple o = u(K) = (K, a, 8, p) is called a structure of the network if
and only if it satisfies the conditions (11) and (12) simultaneously. As far as the BGAP is concerned, the network is nothing but its structure. Thus we henceforth identify the network by its structure o. Then, we proceed to prepare the following specification of neural mappings. 8. A structure w = (K, a, 8, p) produces the following mapping t, from X to C(Z) through (1 l), (12). Definition
y(t) = [&,(x)](t) = /?q[~ax(t) - O] for all XEX and te_I (yeC(Z)).
(17)
The mapping 4, is called a neural mapping of w. 3.3. Generalization
and approximation
This subsection expounds our notion “generalization-and-approximation” by formulating the terms “generalizes a sample” and “approximates f” in the GAP specification in Definition 1. 3.3.1. Generalization Let (a, co) be any SS-pair of a sample c = {(xm(t,),ym(t,))} and a structure o = (K, U, 0, /?). Generally speaking, the generalization by the network [3] is concerned with values of neural mapping 5, at points different from those ofxm(tn) where the latter points x,(t,) are fixed. In the BGAP, however, (Tand 5, are both variables. Thus, we must consider the “generalization” relational conditions between cr and 5, that the BGAP solution should satisfy. More specifically, our generalization specifies relational conditions between f(xm) and r(x) in C(I)M2, and besides, between f(x,,,)(tn) and 5(x,)(t) in C(I)“‘, which are described as follows. 9. Let (rr, w) be any SS-pair. Then, the neural mapping 5, s-generalizes (henceforth, “generalizes” simply) the sample o if and only if the following twofoldgeneralization conditions are simultaneously satisfied (see also Fig. 3). (a) 5, s-generalizes (xm,ym) on XM2, i.e., Definition
Vx z xM2E XM2, 3m = m(x)(l I m I M):
(18)
d2(5,(X)>Y,) < te.
(b) 5, s-generalizes (xm(tn), y,,,(t,,)) on each x,,, = Vx, =_$‘~X~l(l G,(C~,(x,)l(t),y,(t,))
and all n = 0, 1,
xf’ , i.e.,
for all tE CL &+I1
, N - 1.
(19)
Y. TakahashilNeurocomputing
16 (1997) 149-186
159
3.3.2. Approximation One can notice in Definition 9 that both the conditions (18) and (19) focus entirely
on the relations between rs and 4, while they do not deal with relations between j’ and <, such as d2(&(x),f(x)) and J,([&,,(x)](t), [f(x)](t)). In general, relations between f and l,, are discussed as validity criteria for the generalization [3]. The generalization validity in the BGAP can be measured by the approximation degree off by <,,. Thus, we consider the “approximation” the notion that involves the validity criteria for generalization, which is specified in terms of suitable relations between f and l, as follows. Definition 10. Let (0, u) be any SS-pair. Then, the neural mapping 5, E-approximates (henceforth, “approximates” simply) the mapping f if and only if the following twofold-approximation conditions are simultaneously satisfied (see also Fig. 3). (a) t, E-pproximates f(x) on XM2, i.e.,
&[f(x),
grJx)] <
E
for all
XeXM2.
(20)
(b) 4, E-approximates f(x(t)) on each x”‘(t), i.e., Yx E xM1 6X: h2 [f@(t)), (,(x(t))]
t
<
E
for all t El.
tm
yn=f(xn)
Approximation
Generalization
Fig. 3. Relationships among values off; P and 5
(21)
160
Y. TakahashilNeurocomputing
16 (1997) 149-186
It follows easily from the specification of the distances d2 in (2) and d2 in (4) that actually, conditions (20) and (21) are equivalent with each other. Hence we have only to examine (21) in order to ensure the fact that 5, approximates J:
3.4. Formulation
ef the BGAP
With all the mathematical entities prepared by Sections 3.1-3.3, we formulate the BGAP in a mathematically rigorous way as follows. Definition 11. A mathematical form of the BGAP is specified as follows. Assume that an admissible error E and the twofold-C’ mapping f specified by Definition 4 are given. Then, construct any SS-pair (cr’, w”) z (a(MO, No), o(KO)) an d a structure o(K”) = (KO, MO,8’, p”) of a sample o(M’, No) = {(~,(6k+&J)} such that the following conditions are simultaneously satisfied. [B-CON11 The neural mapping t,o generalizes the sample GO,i.e., the conditions (18), (19) in Definition 9 are satisfied. [B-CON21 The neural mapping 5, also approximates f, i.e., the condition (21) in Definition 10 is satisfied. [B-CON31 Among all the SS-pairs (o(M, N), o(K)) each of which satisfies conditions [B-CON11 and (B-CON2], the SS-pair (o(MO, NO), w(KO)) satisfies the following minimal conditions simultaneously. [B-CON3a] The sample a(M’, No) is minimal, i.e., sample size M”No is minimum. [B-CON3b] The structure o(K”) is minimal, i.e., unit size K” is minimum.
4. Reduction of the BGAP There can be several or more methods with which one can actually construct SS-pairs (g, w) that satisfy conditions [B-CON11 and [B-CON21 in the BGAP. It thus is extremely difficult to obtain a soution (a’, w”) that satisfies condition [BCON33 since it requires the examination of minimality exhaustively over all the construction methods of the SS-pairs. Owing to the real-world environment [REl], [RE2] assumed in Section 1, we thus limit the SS-pair construction method to a standard one that exploits polynomials. By the aid of suitable polynomials, this section reduces the BGAP into its standard restricted version. 4.1. Primitive polynomial
constructors
Let (o(M, N), o(K)) be any SS-pair of a sample o(M, N) and a structure o(K). In addition, let P E P(x) be any polynomial on C(I) = C(Z)“‘, andp, =pm(t) = (pml(t), h(t))(m = 1, . . . , M) a set of polynomial contours. Then, let us consider an SPPtriplet rcpv = rcpv(K; M, N) = (o(M, N), P, {pl, . . . ,p,}) as a first rough constructor for the BGAP solution (a’, 0’). We thus specify primitive constructors as follows.
Y. TakahashiJNeurocomputing 16 (1997) 149-186
.I61
Definition 12. The SPP-triplet
npy(K; M, N) = (o(M, N), P, (pl) . . ,pM} ) is said to be an s-primitive polynomial constructor for the SS-pair (o(M, N), o(K)) (henceforth, “primitive polynomial constructor” simply), or conversely, (a(M, N), o(K)) is said to have the primitive polynomial constructor Z&K; M, N) if and only if the following three conditions are simultaneously satisfied (also see Fig. 3 again).
[z~~-CON~] Generalization-and-approximation conditions for J: [np,-CONla] f(x,,,)(t) s-approximates f(x)(t), that is, for any XEX, there exists x, such that for all t E I, &(f(xMf(&J@))
(22)
< a/4.
[rcpy-CON1 b] f(xm)(t) s-approximates f(xm)(tn), that is, for all x, and t E [t,,, t, + ,I, s,(f(Xm)t,Xf(Xm)(t))
[n,,-CON21 [r+,-CON2a]
(2.3)
< 48.
Generalization-and-approximation conditions for P. P(p,)(t,) a-approximates [f(xm)](tn), that is, for all x, and t,,
~*([Ifhl)l(L)r lIfYPm~l(GI)) < 420. [rcpy-CON2b]
Mm&n),
P@,)(t) s-generalizes I’@,,,)(&), that is, for all Pm and t E [tn, t,+
CmiJl(~))< E/20.
[7cpy-CON2c] P@,)(t) ~2(wmN),
CR,,-CON2d] I,>
[q,-CON2e] ~,(M%lk)>
(2,4)
s-approximates
(25) P(x,)(t),
that is, for all x, and t E I,
Cm?l)l(~))< E/20. P(x,)(t)
11,
(26)
s-generalizes P(x,)(tn), that is, for all x, and t E [t,,, t,+ 1],
mGiJ(t)) P(x,)(t,) %&tn))
(2.7)
< 420.
s-approximates
5,(x,)(&), that is, for all x, and t,,
< E/20.
m
[z~~-CON~] Generalization-and-approximation conditions for <, [rc,,-CON3a] &,(x,)(t) s-generalizes &,(x,)(t,), that is, for all x, and t E [t,,, t,+ ,I. ~2(LWM,
i”&?J(t))
[rcpV-CON3b] (,(x,)(t) x, such that for all t E I, &(i;c&)(t),
L&n)(t))
(29)
< &.
s-approximates
< $6.
[,(x)(t),
that is, for any XEX, there exists
(30)
The SS-pair (o(M, N), w(K)) that has a primitive polynomial constructor n&K; M, IV) is so constructed that it is naturally sufficient for conditions [B-CONl], [B-CON21 for the BGAP. Actually, we demonstrate that in the following Proposition 1.
Y. TakahashiJNeurocomputing 16 (1997) 149-186
162
1. Let (0, o) = (o(M, N), w(K)) b e any SS-pair that has a primitive polynomial constructor X&K; M, N) = (o(M, IV), P, { pl, . . , pM} ). Then, the SSpair (0, w) satisfies conditions [B-CONl], [B-CON2], i.e., it satisfies both the generalization conditions (18), (19) in Dejnition 9 and the approximation condition (21) in Dejnition 10.
Proposition
Proof. Let us confirm each of the conditions (19), (18), (21) in this order. (1) Confirmation of(19): It follows from (24)-(29) that for all x,, tE [t,,, t,+ I] and n = 0, 1, . , N - 1,
5 &tfbn)l(L)~ Cwbl)l(tn))+ &(mhk)~ C%%Jl(t)) + ~,(%tN~)~CmhtJ1(0)+ L4mn)@n)~f-%I)(~)) + &(a&mn),Wm)M) + &mn&“)~ 5kn)(~)) <&E+&~E+&E+&~E+&E++E = gE.
(31)
Thus, the condition (19) is confirmed. (2) Conjirmation of (18): It follows from (23), (30) and (31) that for any x and for all t E I, there exists a pair of x, and t, such that MMx)(t)J,(t))
5 s,(f(x,)(t,),f(x,)(t))
+ &(5(X)(t),
<(X,)(t))
+
s,(C5,(x,)l(t),y,(t,))
< &E + iE
+ $8 = $8.
(32)
The condition (18) is a straightforward consesquence from (32) because of the specification (4) of d2 (3) Confirmation of(21): It follows from (22) and (32) that for any x and for all t E I, there exists an x,,, such that &D-(x(t)),
< +E
5uMt))lI: ~zw-w),fkn)(~)) + ~,(kilcw),Ym(~))
i- SE =
E.
Consequently, the condition (21) is confirmed.
(33) 0
4.2. Lipschitz ‘s conditions Under the real-world environment [REl], it is still quite difficult to find a primitive polynomial constructor zpy for most ordinary SS-pairs (0, w). For example, one cannot even start with finding any contour x, that satisfies [zpy-CONla] unless one knows all the values off contrary to [REl]. Thus, we are required to elaborate the primitive polynomial constructor npV further into the one that is actually constructible under [REl].
Y. TakahashilNeurocomputing
16 (1997) 149-186
163
We here exploit Lipschitz’s conditions. This is because owing to the Lipschitz’s condition, we can deal with [REl] well by reducing neighborhood properties of all the mappings f(x), t,(x) and P(x) into those of their sharing variable x itself. In general, it is known that C ’ mappings from Rp to (WQ satisfy Lipschitz’s conditions. The mapping f, any neural mappings 5, and any polynomials P are all in C ‘. Hence they satisfy some Lipschitz’s conditions and thus are applicable to the elaboration of the primitive polynomial constructor rcpv. This subsection develops Lipschitz’s conditions for the mappings f(x). &,,(x) and P(x). 4.2.1. Lipschitz ‘s conditions for J cpand P Let us begin with the following definition of Lipschitz’s conditions in general, Definition 13. Let F be any Co (continuous)-function
from a compact metric space X * of [wpto RQ where it is assumed that each point of the space X* is expressed by the parameter t~1. Then, a Lipschitz’s condition for function F is expressed by I’% > 0: for
all
~Q[F(%(t,)),
F(Xb(t2))l
5
-Gl)
=
(x.I(~I),
. . . , x,P@I)),
XbV2)
=
(Xbl@2),
...
LF6P(x&l)~Xb(~2))
1 Xb&*))EX*;
t1,
t,EI,
where Q&i))
= (Y,l@i), ...
, hQ@l))
and
@b@2))
=
(Ybl@2h
...
,
ybQ@Z))ERQ.
(34)
Here, LF is called a Lipschitz’s constant, &(k = P, Q) are defined as follows: &(%(~1),Xb(~2))
E
maxi=
I,
,kIXadtl)
and besides, k-dimensional
-
XbdtZ)I
(k
=
P,
Q).
distances (35)
Since the functions f; cpand P are in C ‘, all of them satisfy some specific Lipschitz’s conditions. We here assume that those conditions are represented as follows. Assumption
2. (a) It is assumed that a Lipschitz’s condition for f is represented by
s,[f(x~)(tl),f(Xb)(t2)1
5 LfS2(x&l),Xb@l))
for all _&,&,EX z X”‘;
tl, t,El,
(36)
where Lfe [w+is the Lipschitz’s constant described in [PR6] of Premise. (b) It is assumed that a Lipschitz’s condition for cp G (qk)(k = 1, . . . , K) is represented by G2(~[x&l)1,
q[xb(t2)l)
for all &EX”‘,
c
L,G2(xa(tl),xb(t2))
XbEC(I)M1; tl, t2E1,
where Lqe [w+is a Lipschitz’s constant that is fixed throughout
(37)
the paper.
164
Y. TakahashiJNeurocomputing
16 (1997) 149-186
(c) Let P be any polynomial from C(Z) = C(Z)“’ to C(Z) = C(Z)“‘. assumed that a Lipschitz’s condition for the polynomial P is represented
&(PC41)1, PC%Wl) 5 for allx,EX,
xbEC(Z);
where Lpe R+ is a Lipschitz’s each polynomial P.
Then by
it is
~P~ZMtl)?%(t2))
ti, t,EZ,
(38)
constant
that can be selected
variably
depending
on
4.2.2. Lipschitz’s conditions for <, This section develops a generic form of Lipschitz’s conditions for neural mappings 5, that include only the least possible elements particular to each individual structure cc). First of all, let us derive from (37) a particular form of Lipschitz’s conditions for 5, that is straightforwardly associated to each individual structure o. Lemma 1. Let o = (K, a, 8, p) be any structure. and B,,, be specified by A~~max{~~,~~i=
1,2;k
B, E max{l/?,jllk Then, the following ‘Jx,EX,
vxbEC(z);
real numbers A,
= 1, . . ..K}.
= 1, . , K;j
Lipschitz’s
Also, let constant
(3%
= 1, 2).
condition
(40)
holds for the neural mapping 5,.
vt1, vt,Ez:
~,(CL(xcJI(t~), CLWl(tz)) I 2KA,B,L,G,(x,(t,),xb(t2)).
(41)
Proof. It follows from (1 l), (12), (17) that
~2([&k)l@IhC&b)l(b)) = ~,(&C%@I) - 01,P~Cc%Vd - 01). It then follows from (40) as well as the specification (42) is evaluated by UP(PCWI(t,)
-
side of
01,PACT&) - Ql)
5 Wo&(vClwx,(t~ 1- 01,p[mb(t2) Next, we have from the Lipschitz’s is further evaluated by ZWA,t~CMt,)
(2) of & that the right-hand
(42)
- 01,
-
condition
q[ub(t2)
I
2KB,LG,(ccw,(t,)
=
2W,J&tccw,(t,), abtt2)).
-
01).
(37) for cp that the right-hand
(43) side of (43)
@)
- 0, ccw,(t,) - 0) (44)
Y. TakahashilNeurocompuiing 16 (1997) 149-186
165
Finally, we have from (39) as well as (2) that the last expression on the right-hand side of (44) is further evaluated by
Therefore, we obtain the target Lipschitz’s condition (41) for 5, from the evaluations (42)-(45) above. 0 Next, consider any SS-pair (o(M, N), w(K)) that has a primitive polynomial constructor 7tpy(K; M, IV) = (o(M, N), P, (PI, . . ,pM} ). Then, the structure o(K) = (K, CI,8, p) satisfies in particular the condition (28) in Definition 12 as well as the conditions (1 l), (12) regarding a sample a(M, N). That is, w(K) satisfies the following equation/inequality system: Zkm(tn)
=
(Pk[CiClikXi(tn)
-
O!fl
(i = 1, 2; k = 1, . . . , K; m = 1, . . . , M; n = 0, . . . , N - l),
(46)
(k = 1, . . , K; m = 1, . . . , M; n = 0, 1, . . . , N - 1; j = 1,2),
(47)
Pj(xm)(tn)
(m=
(E/2o)< .Vjm(tn)<
-
Pj(xm)(tn) +
(E/2o)
1, . ..) M; n = 0, 1, . . , N - 1; j = 1, 2).
Here, we demonstrate
(48)
the following Lemma 2.
Lemma 2. Assume that there exists a structure co*(K) = (K, c(*, fies the composite equation/inequality system (46)-(48) with y* E (yj*,(t,)) suitably selected. Then, there exists some structure w’(K) = (K, CC’,go, /?“) that with z” E (zf,,,(tn)) and u” = (yTm(tn)) appropriately selected, and satisfies the following conditions: lail
foralli=1,2andk=l,...,
(/Ikg I B
for all k = 1,
Proof. The purpose
K,
satisjies (46)-(48) in addition it also
(49)
, K and j = 1, 2.
of our proof is to construct Eqs. (46)-(48) by the transformation
P, /I*) that satisz* = (z&,(t,)) and
(50)
a solution
(a’, O”,/?“) to the
C&= &(A,
A),
(51)
Bij
B),
(52)
=
Bk*jtBw*l
166
Y TakahashiJNeurocomputing
16 (1997) 149-186
where (0”, z”,yo) with 0’ = (0:) z” = (z&,(t,)) and y” = (y&(&)) is appropriately selected subject to ZkO,(t,)
=
qk[CiMiXi(tn)
n=O , “’ >N-
-
(i
=
1, 2; k = 1,
) K; m = 1,
) M;
1).
(53) (k = 1, .
y&(t,) = C,/?tjz&(t,) j=
fl,“]
, K; m = 1, . . . , M; n = 0, 1, . . , N - 1;
1,2).
(54)
Here, we put that A,.-max{lair,IIi=1,2;k=l, B,* E max{I/?,*jIlk = 1,
. . ..K}.
(55)
. , K; j = 1, 2);
(56)
where A and B indicate positive real numbers arbitrarily chosen. It is obvious from (51), (52), (55), (56) that (a’, PO) satisfies the conditions (49), (50). Thus, the purpose of our proof is fulfilled when we demonstrate that y = y” satisfies the inequality (48). First of all, since by assumption, o*(K) satisfies (46)-(48), it follows that (a*, 6*, ,/3*,z*, y* ) satisfies the following conditions:
Zk*m(tn) = (Pk 1 &k%(t)- 0: [i 1 (i = 1, 2; k = 1, . . . , K; m = 1, . . . , M; n = 0, .VTm(tn) = 1
. , N - l),
(57)
Pk*jzk*m(tn)
k
(k = 1, . . . , K; m = 1, Pj(xm)(tn)
(m
-
=
(d20)
<
, M;n=O,l,...,
$z(tn)
<
pj(xm)(tn)
N-l;j=l,2), +
(58)
(d20)
1, . . . , M; n = 0, 1, . . . , N - 1; j = 1, 2).
(59)
It follows from (59) that there exists some positive real number sr E &r(s) such that pj(xm)(tn) -
(420)< Yjnm
<
Pj(xm)(tn)
+
(420) (j = 132)
for all j&n = (Lln, JLnn)~~Ymn,
(60)
where DYmnis specified by D Ymn
=
{y,,~~21&EY*J%lnl< cy} (m = 1, . . , M; n = 0, . , N - 1).
(61)
Then, the mapping specified by jj = C pkg.Zk(j = 1,2), k
or briefly, j = P’z,
(62)
Y. TakahashilNeurocomputing
16 (1997) 149-186
.!67
is continuous in zk, a continuous variable that is an output from the kth intermediate unit. Thus, there exists some positive real number EZ= s&I*, B) such that J&,=~~~~,,ED~~~
forallz,,E&,.
(m=l,...,M;n=O,...,N-l),
(63)
where DZmnis specified by
(m = 1, . . . , M; n = 0, . . . , N - 1).
(64)
In the meantime, we take a real value V = I’(.+, p*, B) = V(/?*, B) small enough so that the following condition can hold: max.[qk(v), (B,.B-‘)&v)]
for all v < I/.
(65)
This is possible because (Pk is bounded, monotone increasing and besides, (Pk(- co) = 0. Hence, it follows from (60), (61), (63)-(65) that we can reselect 0* = 8*(&z, cI*,p*, B) E e*(c(*, p*, B) such that (a*, O*, /I*) is maintained to satisfy (46)-(48) and besides, satisfies the following condition: max,,,C(B,*B-‘)zk*,(t,), =
max.,,.{(B,*B-‘)cp,CCi
z&&J} ai*kxi(tfz)
-
@I, Zk*m(~n)} < Ez
for all k = 1, . . , K(m = 1, . , it4 and n = 0, . . . , N - 1).
(66)
Similarly, it follows from (65) that we can select 0’ = Q”(sz, 01’,/I*, B) = 8*(a*, A, /I*, B) such that (x0, 0’, p”) satisfies the following condition. max.,,,C(B,*B-‘)zkO,(t,),
&&)I
= max.,,.[(B,*B-‘)z&(t,),qkcci
ccfLxi(t)
-
@II < Ez
for all k = 1, . . . , K(m = 1, . . . , A4 and n = 0, . . . , N - 1).
(67)
Hence, we obtain from (53), (54), (66), (67) that s,[(B,.B-‘)z*,
zi(t,)]
< sZ for all m = 1, . . . , M and n = 0, . . , N - 1. (68)
Therefore, it follows from (53), (54) and (60), (61) as well as (63), (64), (68) that y = ji” satisfies the inequality (48). This completes the proof. 0 Lemma 2 enables us to search for BGAP solution SS-pair (a’, w”) entirely within the scope of all %-pairs (GO, w”) that satisfy the conditions (49), (50) in Lemma 2. Furthermore, Lemma 1, combined with Lemma 2, has already demonstrated the following Proposition 2 where the individual-structure-dependent parts A, and B, of the Lipschitz’s constant 2KA,B,L, for 5, in (41) are replaced with the more generic individual-structure-invariant constants A and B, respectively.
Y. TakahashilNeurocomputing
168
16 (1997) 149-186
Proposition 2. Let A and B be arbitrary positive real numbers that can be chosen by network designers. Also, assume that there exists a structure co* = (K, CI*, 8*, j*) that satisfies the composite equation/inequality system (46)-(48). Then, there exists some structure coo = (K, go, go, f3’) that satisfies (46)-(48) and besides satisfies the conditions: IcciI
foralli=l,2andk=l,...,K;
(69)
Ip~jI I B
for all k = 1, . . . , K and j = 1, 2;
(70)
and then satisfies
the following
Lipschitz’s
condition for <,o:
WXOE x, VXb E C(Z), vt, E I, Vt, E I:
&(CL(xa)l(t~),CS&dl(tz)) 52KABL,G,(x,(t,),x,(t,)). Proof. A straightforward
consequence from Lemmas 1 and 2.
(71)
q
This Proposition 2 enables us to search for the BGAP solution SS-pair (a’, 0’) entirely within the scope of all SS-pairs (o’, w”) that satisfy the Lipschitz’s condition (71). Thus we henceforth focus our discussion on such structures, taking at the same time the following Assumption 3 regarding numbers A and B. 3. It is assumed that positive real numbers A and B are arbitrarily chosen but fixed throughout the paper.
Assumption
4.3. Polynomial
constructors
By the aid of Proposition 2 as well as Assumption 2 on the Lipschitz’s conditions, we elaborate the primitive polynomial constructor rcpy(K; M, N) in Definition 12 into the polynomial constructor that is specified as follows. 14. Let (a, o) = (o(M,
IV), o(K))
be any SS-pair. Then, any SPP-triplet ~~~~~(K;M,N)=(~(M,N),P,{P~, . . ..PM}> is said to be an s-polynomial constructor for the SS-pair (a, w) (henceforth, “polynomial constructor” simply), or conversely, (g, o) is said to have a polynomial constructor n if and only if the SSP-triplet (0, o; rc) satisfies the following fourfold fine conditions as well as the two conditions [+V-CON2a] and [rt pV-CON2e] in Definition 12 (also see Fig. 3 again):
Definition
[rc-CON11 The SSP-triplet X,EX = XM2 6(x, x,) < ~/~LY~,K,
is s-fine with respect to (x,x,),
that is, for all
(721
where LfV,K is specified by Lf,,k
s max.(Lf, 2KABL,).
(73)
Y. TakahashijNeurocomputing
16 (I 997) 149-186
[n-CON21 The SSP-triplet is s-fine with respect X,(C)EX = X”‘, t~[t,, t,,+i] and n = 0, 1, . . . , N - 1, &(&&), G&))
169
to x,(t),
that
is, for all
< ~/L~P~,K;
(74)
where LfPy,, K is specified by L fPI”,K = max.@Lr, 20Lp, 16KABL,).
[rc-CON33 The SSP-triplet X,EX = XM2 andp,, d2hmPd
(75)
is s-fine with respect to (x,,p,),
that is, for all
< E/20&
(76)
[rc-CON41 The SSP-triplet is s-fine with respect to p,,,(t), that is, for all p,,,(t), t~[t,,,t,,+i]andn=O,l,..., N-l, (77) Every polynomial constructor n(K; M, N) in Definition 14 is one of the primitive polynomial constructors npy(K; M, N) in Definition 12. Actually, we obtain the following Proposition 3. Proposition 3. hsume that an SPP-triplet x(K; M, N) = (o(M, _V),P, {pl, . . . ,pM} ) is a polynomial constructor for an SS-pair (o(M, N), o(K)). Then, n(K; M, N) also constitutes a primitive polynomial constructor for (o(M, N), w(K)). That is, z(K; M, N) satisjies the conditions [n,,-CONl], [7cr,,-CON21 and [n,,-CON31 in Dejinition 12. Proof. We confirm, one by one, all the conditions
[Q,-CONl], [7cpy-CON21 and [z,,-CON31 in Definition 12 except for the assumptive conditions [rr,V-CON2a] and [rcpV-CON2e] for n(K; M, N). (1) Confirmation &(f(x),f(x,)) (2) Confirmation
of [rtpy-CONla]:
5 LfdZ(X,&) < (Lf/Lr,,K)(as) of [np,-CONlb]:
s,(f(x,)(t,),f(x,)(t)) (3) Confirmation ~2(P(Pm)@n)>
(4) Confirmation ~2(%&)>
It follows from (36) and [n-CON11 that i $8.
(78)
It follows from (36) and [rcpy-CON21 that
5 L,&(&&,),&(t))
< E(L//L/P~,K) 5 6s.
(79)
of [rcpV-CON2b]: It fo 11ows from (38) and [rc-CON41 that [P(Pm)l(t))
5 Ld2(P&),Pm(~))
< LP(+&,)
= h&.
(80)
of [7rpy-CON2c]: It follows from (38) and [7c-CON4) that [P(Pm)l(t))
s Ld2(&n(t)>Pm(~))
< LP(&&)
= &I&.
(81)
Y. TakahashilNeurocompuiing
170
(5) Confirmation
16 (1997) 149-186
of [npV-CON2d]: It fo 11ows from (38) and [n-CON31 that
Cmh)l(O) 5 JwZbt(~),Pm(~)) < Jh&&4 = As-.
MP(&M (6) Confirmation M%J(L),
of [rep,-CON3a]:
(82)
It follows from [71] and [n-CON21 that
5(&)(Q) 5 2KABL,G,(x,(t,),x,(t))
< s(2KA&&P,,K)
I&. (83)
(7) Confirmation
of [npy-CON3b]:
It follows from (71) and [n-CON11 that
6,(5,(x)(r), <,(x,)(r)) I ~K.~BL,G,(x(~),x,(~))
< &(~KABL,/~L~,,,)
I $8. (84)
This completes the proof.
0
4.4. A standard restricted version of the BGAP Taking into account Propositions l-3, we reduce the BGAP in Definition 11 into its restricted version that is specified as follows. 15. A standard Restricted Version of the BGAP (SRV-BGAP) is specified as follows. Assume that an admissible error E and the twofold-c’ mapping f specified by Definition 4 are given. Then, construct any SS-pair (CT’,0’) = (o(MO, NO), w(KO)) an d a structure o(KO) = (KO, CI’,8’, PO) of a sample o(M”, No) = {(~,&),Y,&))} such that the following three conditions are simultaneously satisfied. [CON11 The SS-pair (o’, w”) has some polynomial constructor rc” E n(KO; MO, NO) ? (00, PO, {p?, . . . ,pLo} ) that satisfies conditions [rc-CONl][7r-CON43 in Definition 14 as well as conditions [npV-CON2a] and [npV-CON2e] in Definition 12. [CON21 The structure o(K”) satisfies the following conditions.
Definition
Io$l
foralli=1,2andk=l,...,
Iflfji
forallk=l,...,
K.
Kandj=1,2.
(85)
(86)
[CON31 Among all the SS-pairs (0, w) each of which satisfies conditions [CON11 and [CON2], the following minimality conditions simultaneously hold. [CON3a] The sample a(A4’, No) is minimal, i.e., sample size M”No is minimum. [CON3b] The structure o(K”) is minimal, i.e., unit size K” is minimum.
5. Mathematical
tools for solving the SRV-BGAP
For the present paper to be self-contained, this section provides a brief review of two mathematical key tools for solving the SRV-BGAP: the Weierstrass’ polynomial approximation theorem in analysis and the solution existence theorem for linear equation systems in linear algebra.
Y TakahashijNeurocomputing 16 (1997) 149-186
5. I. The Weierstrass ’ polynomial
approximation
171
theorem
Among various polynomial approximation methods, the one due to Weierstrass is the most fundamental and thus the most popularly used. His theorem, quite simple, is described as follows: The Weierstrass’ polynomial approximation theorem. Let f(x) be a C’ function of a real variable x on a closed interval [a, b] c [w. Also, let E be any given real positive number. Then, there exists a polynomial P(x) that satisfies the condition o1 [f(x),
f’(x)1< 8 for allx E[a,bl,
(87)
where o1 indicates the distance defined by
(88)
WI = If(x) - wa.
61(f(x),
The theorem is naturally extended to any C ’ mapping f(x) from a compact set in 52’ to another. Furthermore, there have been many excellent computing methods developed in numerical analysis to calculate specific polynomials P(x) in the theorem efficiently by computers. Thus, when we say later in the context of solving the SRV-BGAP that we make use of the Weierstrass’ polynomial approximation theorem, we intend to imply that we not only ensure the existence of an approximation polynomial P(x) only by the theorem itself but also construct some specific approximation polynomial E’(x) by making use of any of these practical computing methods. 5.2. The solution existence
theorem for linear equation systems
Consider the following general linear equation system: 41x1
+
...
+
ainx,
=
bi
(ail, . . . , ain, biE[W; i = 1, . . . , m).
(89)
Solving methods of (89) were established in linear algebra. Among them, one of the most fundamental solution existence theorems popularly available is the following. The solution existence theorem for Eq. (89). The linear equation solutions if and only if the following condition holds:
system
rank(A) = rank( [A, b]), where A and [A, b] indicate an m respectively.
(89) has
(90) x
n matrix
(aij) and an m
x
(n + 1) matrix
(aij, bi),
6. A solution to the SRV-BGAP
This section states our main results as Theorem 1 that provides a specific solution to the SRV-BGAP specified by Definition 15.
172
Y. TakahashijNeurocomputing
I6 (I 997) 149-186
Prior to the statement, we need to prepare some notation. First, we denote by X(M, IV) an MN x 2 constant matrix (xi,(t,)(m = 1, . , M; n = 0, 1, . , N - 1; i = 1,2); and besides, by b(zk, 13,) an MN-dimensional parametric constant vector ((~k~[z~~(t,,)] + Q,)(m = 1, . . , M; n = 0, 1, . , N - 1). Then, for admissible error e and any set { Pj(Xm)(tn)lm = 1, . . , M; n = 0, 1, . . . , N - 1; j = 1,2}, we also denote by U(m, n, j) the following open interval on R: U(m, %j) G (m =
(Pj(xm>(tn) - (c/2oh Pj(xm)(tn) + (d20))
1, ... ,
M; n = 0, 1, . . , N - 1; j = 1, 2).
(91)
From this interval set { U(m, n, j)l m = 1, . . . , M; n = 0, 1, . . . , N - 1; j = 1, 2), we furthermore specify a set { V’j(Eb)IA = 1, , Aj; Aj: a natural number) (j = 1,2). Element sets Vj( A)(3, = 1, . . , Aj) are specified as follows: vj(/z) E
fl u(m,
n,j)
for a set GA of (m, n)(,? = 1, . , Aj; j = 1,2);
(92)
WI,” subject
to the condition GAInGAs = 8
for all pairs (AI, A,), A1 # ,I2
and u GA = {(m, n)lm = 1, . . . , M; n = 0, 1, . . . , N - l}; Vj(A) # 8
for all A = 1,
Vj(Al)nVj(A2)
=
Now, we proceed
.
.
,
/lj;
for all pairs (A,, A,), A1 # &.
(b
(93)
to the theorem.
Theorem 1. There exists some solution (a’, a~“) = (a(M’, IV’), o(K’)) to the SRVBGAP that has some polynomial constructor 71’ = n(K’; MO, IV’). Here, unit size K” of the structure co(K’) = (K’, a’, go, /I”) is obtained as the minimum among all SSPtriplets (o, w, Z) that satisfy the following two conditions as well as conditions [CONl], [CON21 and [CON3a] in Definition 15. [T-CON11
a-substructure
rank(X(M, [T-CON21 min.{ #
existence
N)) = rank([X(M,
(0, P)-substructure {Vj(A)lA
=
1, . . .
{Vj(A)lA
=
1, . .
N), b(zk, I&)])@ = 1, . , K).
existence ,
Aj}>
for j = 1,2 when rank(X(M, min.{ #
condition:
,
Aj}]
for j = 1,2 when rank(X(M,
(94)
condition:
I3K N)) = 2;
(95)
I2K N)) = 1.
(96)
Y. TakahashilNeurocomputing
16 (1997) 149-186
173
Proof. We demonstrate Theorem 1 by actually constructing a specific S&P-triplet (a’, o”, no) that satisfies conditions [CONl]-[CON31 in Definition 15 while verifying that condition [CON3b] is equivalent to conditions [T-CON11 and [T-CON21 in Theorem 1. Specifically, we construct the SSP-triplet (o’, o”, 7~‘) in the following two steps: (Step 1) We construct and pick out all the SP-pairs ((T, rc) that satisfy condition [CON11 in Definition 15. (Step 2) From among all these SP-pairs (a, rr), we select and construct a specific solution (o’, 0’) together with a polynomial constructor rc” that satisfies conditions [CON21 and [CON3]. It is also verified in this process that condition [CON3b] is equivalent to conditions [T-CON11 and [T-CON21 in Theorem 1. Each of these steps is to be described in detail in the subsequent Sections 7 and 8, respectively. 7. Demonstration
of Theorem 1: constructing
We begin with demonstrating
the following
SP-pairs (a, 7~) subject to [CON11 proposition
Proposition 4. Let K be any unit size (1 I K 5 KM). Then, there is at least one solution frame y(K; M, N) of constructible that satisfies condition [CON11 in Definition 15. Here, a solution frame y(K; M, N) is defined as consisting of a triplet (a(M, N), x(K; M, N), j(K; M, N)) where o(M, N) and rc(K; M, N) indicate a sample and a polynomial constructor; and besides, j(K; M, N) denotes a point set {j,, E ?j,(x,)(t,)l m= 1, . . . . M; n = 0, 1, , N - 1) that satisfies condition [+,-CON2e]. Note. A neural mapping <,, of course, is unknown at this time. We can construct only the point set { j& 1m = 1, . . . , M; n = 0, 1, . . . , N - l} that is to be passed over by some neural mapping 5,. Proof. Let us actually construct a specific solution frame y(K; M, N) that satisfies condition [CON11 in Definition 15. First of all, we pick out any partitioning d, = {xm Ix,EX; m = 1, . . , MJ of X E XM2 that satisfies [rc-CON11 in Definition 14. This selection is possible since X is assumed compact. Then, we proceed to pick out any partitioning A:” = {t,l.v E II t,+&= nLv/N’,‘; nLv = 0, 1, . . . , NLv - l} of I that satisfies a loosened version of [7r-CON21 specified as follows. [rc-CON2]“”
For all X,EX
= X”‘,
tE[t,p,
tpfl]
and nLv = 0, 1, .
, NLv - 1,
Y. TakahashilNeurocomputing
174
16 (1997) 149-186
Next, we select any set of points {P(pm(t,+~))1m = 1, . . . , M; nLv = 0, 1, . . . , NLv - 13 t$tyEs [r+$ON2a] while choosing any set of points {pm(+) 1m = 1, . . , M; - l} near to {xm(t,,LY)lm = 1, . . , M; nLv = 0, 1, . - 11. - > ,...> Note P(pm(t,+~))j and {P~(~,,LV)) that are to be passed over by some polynomial P and some polynomial contourp,, respectively. Furthermore, we construct any polynomial P(x(t)) on C(I) z C(Z)“’ that passes over the point set { P( p,(t,~~))}. Although we cannot know all the values of f(x) which are possible because a polynomial is determined by and constructible from given finite points that it should pass over. From this polynomial P&(t)), we then obtain its Lipschitz’s constant Lp for P(x(t)). Now, back to condition [x-CON2]‘” above, we select any refinement d, = {t,,~llt, = n/N; n = 0, 1, . , N - l} of &” that satisfies condition [n-CON2]. Thus we construct a sample 04, N) = {(x&J, y&J) I m = 1, . . , M II = 0, 1, . . . ) N - l} from A,, A, and J: If the previous polynomial P(x(t)) cannot satisfy condition [rcpv-CON2a] for whatever selection of {p,(t,)lm = 1, , M; n=O,l, . ..) N - l), we have only to modify P(x(t)) so that it can satisfy condition [npv-CON2a]. In consequence, we obtain a polynomial P(x(t)) and a point set {pm(&)} that satisfies [7cpy-CON2a]. In the procedure above, we select the point set {~~(t,,)} sufficiently near to {x,Jt,,)} such that for all m = 1, . . . , M and n = 0, 1, . . . , N - 1, Mx,(r,),&&))
< s/20&.
(99)
Furthermore, we construct any specific set {p 1, . ,pw} of polynomial contours pm = p,,,(t) = ( pm1(t), pm2(t)) that satisfies [rc-CON33 by making use of the Weierstrass’ polynomial approximation theorem. Here, we construct {pl, . . . ,pM) that satisfies [x-CON41 as well whatever partitioning A, of I was previously selected. Therefore, we have constructed a polynomial constructor rc(K; M, N) E (o(M, N), P, {pl, . ,p,}). From the sample a(M, N) and the polynomial constructor rc(K; M, N), we finally select a point set j(K; M, N) = {ym,, = S,(x,)(t,)lm = 1, . . . , M; n = 0, 1, . . , N - l$ so that it can satisfy condition [npV-CON2e]. Consequently, we have constructed a solution frame y(K; M, N) = (o(M, N), x(K; M, N), j(K; M, N)) that proves to satisfy [CON11 in Definition 15. This completes the proof. 0 Let us consider all the solution frame y(K; M, N) that can be constructed Proposition 4. Thus, we put that
in
T(K) = {Y= y(K M, N)ly(K M, N) = (04,
N), n(K M, N), y(K M, N) >:
a solution frame) for all K = 1,
rr
u
K=1,
..P
WQ.
. , KM,
(100) (101)
Y. TakahashiJNeurocomputing16 (1997) 149-186
175
It follows from Proposition 4 and T(K) # 8. Furthermore, our construction procedure of the solution frame y(K; M, N)E T(K) in the proof of Proposition 4 does not put any restrictions on possible solution frames y(K; M, N) except for condition [CON l] in Definition 15. Thus, the set r must include all the solution frames that satisfy condition [CONl]. In general, the cardinality of each T(K) as well as that of r amounts to that of all the real numbers principally owing to the selection varieties of rc(K; M, N) and j$K; M, N). Therefore, we are allowed to find a solution (o’, w”) to the SRV-BGAP by examining against conditions [CON21 and [CON31 in Definition 15 all %-pairs ((T, w) constructed from all the solution frames y(K; M, N) entirely within r.
8. Demonstration of Theorem 1: constructing [CON21 and [CON31 8.1. Developing
an eficient
construction
a solution (co, w”) subject to
method of (o’, CO’) from r
Although the set r is extremely huge, a closer look at it suggests us that we can focus the examination of conditions [CON21 and [CON31 in Definition 15 on a smaller subset of r and besides, develop an efficient examination sequence of them for that subset. Let us demonstrate the following Proposition 5. Proposition 5. (A) Let y1 = y(K; Ml, N’) and y2 = y2(K; M2, N2) be any pair of solution frames of T(K)(l I K < KM) such that M’IM’
and
N’ < N2.
(102)
Assume that there exists an B-pair (o(M’, N’), o’(K)) that is constructed from ‘y2 is constructed from y2 if and only if where we say that (o(M’, N2),w2(K)) (o(M2, N2), o’(K)) satisjes the composite equation/inequality system (46)-(48) derived from y2. Then, there exists some SS-pair (a(M’, N ‘), w’(K)) that is constructed from ‘JI. (B) Let K’ and K2 be any pair of unit sizes (1 i K’, K2 5 KM) such that K’ I K2. Then, the following min.{M’N’ly(K’;
(103) inequality
regarding
M’, N’)ET(Kl)}
sample size MN holds. I min.{M2N21Y(K2;
M2, N2)Er(K2)}. (104)
Proof. (A) As we will show in Proposition 6 in Section 8.2, conditions [T-CON11 and [T-CON21 in Theorem 1 are necessary and sufficient for the existence of some SS-pair (a(M, N), w(K)) that is constructed from any solution frame y(K; M, N)ET(K). From the viewpoint of conditions [T-CON11 and [T-CON21 then, we can naturally
176
Y. TakahashilNeurocomputing 16 (1997) 149-186
regard the solution frame y2 as a refinement of y ‘. More specifically, conditions [T-CON11 and [T-CON21 are stronger for y2 than for yr. Thus, the existence of the SS-pair (c(M2, N2), 02(K)) constructed from y2 ensures that of the SS-pair (a(M’, N’), o’(K)) constructed from y’. This completes the proof. (B) Let y(K2; M& N;)E r(K2) be a structure frame that attains the minimum size M$ N& i.e., MiNg
= min. {M2N21y(K2; M2,N2)~r(K2)).
(105)
Among all the conditions included in condition [CONl], conditions that depend on unit size K prove to be reduced to only two conditions: [n-CON11 and [rc-CON2]. For any unit size K’ that satisfies (103), both conditions are satisfied by some structure frame y(K’; M’, N’)ET(K~) produced from y(K2; M& Ng) by the elimination of unnecessary sample points. Here, the numbers M’ and N’ of y(K’; M’, N’ ) satisfy the following condition: M’ I Mi
and
N’
(106)
Hence, it follows from (105), (106) that

min{M¹N¹ | γ(K¹; M¹, N¹) ∈ Γ(K¹)} ≤ M¹N¹ ≤ M₀²N₀² = min{M²N² | γ(K²; M², N²) ∈ Γ(K²)}.  (107)
Consequently, inequality (107) yields (104), completing the proof. □
Proposition 5(A) allows us to focus the examination of conditions [CON2] and [CON3] on only those solution frames γ(K; M, N) ∈ Γ(K) (K = 1, ..., K^M) whose sample size MN is the minimum, denoted here by M⁰(K)N⁰(K), within Γ(K). There can be an infinite number of solution frames with the minimum sample size M⁰(K)N⁰(K). However, this variety in general cannot have any influence upon conditions [T-CON1] and [T-CON2] in Theorem 1, that is, upon the existence of an SS-pair (σ(M, N), ω(K)) constructed from a solution frame γ(K; M⁰(K), N⁰(K)) ∈ Γ(K). This is because the admissible error ε is regarded as quite small compared to the distances between any pair of points P_j1(x_m1)(t_n1) and P_j2(x_m2)(t_n2). Hence, we choose any one solution frame with the minimum sample size M⁰(K)N⁰(K), denoted by γ(K) = γ(K; M⁰(K), N⁰(K)). We thus have only to examine conditions [CON2] and [CON3] against at most K^M solution frames γ(K) (K = 1, ..., K^M). Let us put that

Γ⁰ = {γ(K) = γ(K; M⁰(K), N⁰(K)) | M⁰(K)N⁰(K): minimum sample size; K = 1, ..., K^M}.  (108)

Regarding condition [CON3], Proposition 5(B) furthermore suggests that we take the following examination sequence on the set of solution frames γ(K) (K = 1, ..., K^M):

γ(1; M⁰(1), N⁰(1)) → ⋯ → γ(K; M⁰(K), N⁰(K)) → ⋯ → γ(K^M; M⁰(K^M), N⁰(K^M)).  (109)

That is, if an SS-pair (σ(M⁰(K⁰), N⁰(K⁰)), ω(K⁰)) constructed from γ(K⁰) is minimal in unit size K, then it is minimal in sample size MN as well. Therefore, condition [CON3a] is naturally satisfied by condition [CON3b], which alone we have to examine, as sketched below.
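Concretely, (109) is a loop over increasing unit size K. The following Python sketch is schematic only: the two callables stand in for the frame construction of Proposition 4 and the SS-pair existence test of Proposition 6 (Section 8.2), which are not spelled out here.

def minimal_unit_size(K_max, minimal_frame, admits_ss_pair):
    """Walk the examination sequence (109): gamma(1) -> gamma(2) -> ...
    For each unit size K, take a solution frame gamma(K) of minimal sample
    size M0(K)N0(K) and test whether it admits an SS-pair, i.e. whether
    conditions [T-CON1] and [T-CON2] of Theorem 1 hold for it.
    `minimal_frame` and `admits_ss_pair` are assumed placeholder callables."""
    for K in range(1, K_max + 1):
        gamma_K = minimal_frame(K)      # gamma(K; M0(K), N0(K)) in Gamma0
        if admits_ss_pair(gamma_K):
            return K, gamma_K           # minimal unit size K0 attained
    return None                         # no admissible frame up to K_max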
8.2. Solution existence conditions [T-CON1] and [T-CON2]

This section elucidates a necessary and sufficient condition under which an SS-pair (σ(M⁰(K), N⁰(K)), ω(K)) can be constructed from each γ(K) = γ(K; M⁰(K), N⁰(K)) ∈ Γ⁰. The condition is an existence condition of some SS-pair (σ(M⁰(K), N⁰(K)), ω(K)) that satisfies the composite equation/inequality system (46)-(48) derived from γ(K). This existence condition is explicitly written down by the following Proposition 6.

Proposition 6. For any solution frame γ(K) = (σ(M⁰(K), N⁰(K)), π(K; M⁰(K), N⁰(K)), ŷ(K; M⁰(K), N⁰(K))) ∈ Γ⁰ (1 ≤ K ≤ K^M), there exists an SS-pair (σ(M⁰(K), N⁰(K)), ω(K)) with ω(K) = (K, α, θ, β) that is constructed from γ(K) if and only if the structure existence conditions [T-CON1] and [T-CON2] in Theorem 1, i.e., the conditions (94), (95) (or (94), (96)), hold.

Proof. Let us explicitly write down a necessary and sufficient condition for the existence of a solution to the composite equation/inequality system (46)-(48) derived from γ(K).
(1) Solution existence condition (94) for α: Since φ_k is one-to-one (monotone increasing), the system (46) is rewritten into
Σ_{i=1,2} α_ik x_im(t_n) = φ_k^{-1}(z_km(t_n)) + θ_k
(i = 1, 2; k = 1, ..., K; m = 1, ..., M⁰(K); n = 0, ..., N⁰(K) - 1),  (110)
where φ_k^{-1} denotes the inverse function of φ_k. The system (110) is a linear equation system that takes the particular form of (89) when we consider the α_ik (i = 1, 2) as variables, and the z_km(t_n) (k = 1, ..., K; m = 1, ..., M⁰(K); n = 0, 1, ..., N⁰(K) - 1) and θ_k (k = 1, ..., K) all as parametric constants. Applying the solution existence condition (90) to this linear equation system (110), we obtain the condition (94) as a necessary and sufficient solution existence condition of (α_ik) for (110) that should be satisfied by all the parametric constants z_km(t_n) and θ_k.
(2) Solution existence condition (95) (or (96)) for (θ, β): The condition (94) is equivalently rewritten into either of the following conditions:

b(z_k, θ_k) = c_1k X_1 + c_2k X_2  (c_1k, c_2k ∈ ℝ)  when rank(X(M⁰(K), N⁰(K))) = 2,  (111)

b(z_k, θ_k) = c_1k X_1  (c_1k ∈ ℝ)  when rank(X(M⁰(K), N⁰(K))) = 1.  (112)
Here, X_1 and X_2 denote the first and the second MN-dimensional column vectors (X_1j) and (X_2j) of X(M, N), respectively, where X_1j = x_1m(t_n) and X_2j = x_2m(t_n) (j = 1, ..., MN). The conditions (111), (112) are furthermore rewritten elementwise into the following:

z_km(t_n) = φ_k[c_1k x_1m(t_n) + c_2k x_2m(t_n) - θ_k]
(k = 1, ..., K; m = 1, ..., M⁰(K); n = 0, 1, ..., N⁰(K) - 1)
when rank(X(M⁰(K), N⁰(K))) = 2.  (113)

z_km(t_n) = φ_k[c_1k x_1m(t_n) - θ_k]
(k = 1, ..., K; m = 1, ..., M⁰(K); n = 0, 1, ..., N⁰(K) - 1)
when rank(X(M⁰(K), N⁰(K))) = 1.  (114)
In the meantime, consider the inequality system (47), (48). Since all the values P_j(x_m)(t_n) are known from the given solution frame γ(K), and in addition the parameters z_km(t_n) are also determined by (113) (or (114)), the inequality system (47), (48) for the variables β_kj becomes

P_j(x_m)(t_n) - ε/(2ρ) < Σ_k β_kj z_km(t_n) < P_j(x_m)(t_n) + ε/(2ρ)
(m = 1, ..., M⁰(K); n = 0, 1, ..., N⁰(K) - 1; j = 1, 2),  (115)
which consists of 2M⁰(K)N⁰(K) inequalities. By substituting the z_km(t_n) in (115) by the right-hand side of (113) (or (114)), we have the following inequality systems for the variables (c_1k, c_2k, θ_k, β_kj) (or (c_1k, θ_k, β_kj)), each consisting of 2M⁰(K)N⁰(K) inequalities:

P_j(x_m)(t_n) - ε/(2ρ) < Σ_k β_kj φ_k[c_1k x_1m(t_n) + c_2k x_2m(t_n) - θ_k] < P_j(x_m)(t_n) + ε/(2ρ)
(k = 1, ..., K; m = 1, ..., M; n = 0, 1, ..., N - 1; j = 1, 2)
when rank(X(M⁰(K), N⁰(K))) = 2.  (116)

P_j(x_m)(t_n) - ε/(2ρ) < Σ_k β_kj φ_k[c_1k x_1m(t_n) - θ_k] < P_j(x_m)(t_n) + ε/(2ρ)
(k = 1, ..., K; m = 1, ..., M; n = 0, 1, ..., N - 1; j = 1, 2)
when rank(X(M⁰(K), N⁰(K))) = 1.  (117)
The intermediate term in the inequality system (116) includes 3K freely selectable parameters (c_1k, c_2k, θ_k), while the numbers β_k1 and β_k2, also freely selectable, can work as independent numerical modulators, i.e., they can make the value φ_k[c_1k x_1m(t_n) + c_2k x_2m(t_n) - θ_k] extend or reduce independently of each other. Hence, the condition (95) turns out to be necessary and sufficient for the existence of a solution (c_1k, c_2k, θ_k, β_kj) to the inequality system (116). We have here taken the assumption that none of the coordinates x_mi(t_n) (i = 1, 2; m = 1, ..., M⁰(K); n = 0, ..., N⁰(K) - 1) equals zero: x_mi(t_n) ≠ 0. We can take this assumption without any loss of generality by preparing beforehand an appropriate coordinate system (X_1, X_2) for ℝ². A similar argument applies to the inequality system (117) and yields the condition (96). This completes the proof. □
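As a side remark, once the z_km(t_n) are fixed by (113) (or (114)), the two-sided system (115) is linear in the output weights β_kj, so its feasibility under a bound |β_kj| ≤ B can be checked by linear programming. The sketch below is illustrative only: names and data layout are our own, and the strict inequalities of (115) are relaxed to non-strict ones.

import numpy as np
from scipy.optimize import linprog

def beta_feasible(Z, P_target, eps, rho, B):
    """Check feasibility of the two-sided system (115) for one output
    coordinate j:  P_j - eps/(2*rho) <= Z @ beta_j <= P_j + eps/(2*rho),
    with |beta_kj| <= B. Z has one row per sample point (m, n) and one
    column per hidden unit k; rho is the normalizing constant of (115)."""
    n_pts, K = Z.shape
    half = eps / (2.0 * rho)
    # Stack the upper bounds for  Z beta <= P + half  and  -Z beta <= -(P - half).
    A_ub = np.vstack([Z, -Z])
    b_ub = np.concatenate([P_target + half, -(P_target - half)])
    res = linprog(c=np.zeros(K), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(-B, B)] * K, method="highs")
    return res.success  # True iff some admissible beta_j exists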
It is not so hard to calculate by computer conditions [T-CON1] and [T-CON2] for any solution frame γ(K) = γ(K; M⁰(K), N⁰(K)) ∈ Γ⁰. Consequently, Proposition 6 finishes the demonstration regarding conditions [T-CON1] and [T-CON2] in Theorem 1 for the solution frames γ(K) = γ(K; M⁰(K), N⁰(K)) ∈ Γ⁰.
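For instance, the rank equality of condition (94) (cf. (90)) can be evaluated directly with standard linear algebra routines. A minimal sketch, with the matrix layout assumed rather than taken from the paper:

import numpy as np

def t_con1_holds(X, b_columns):
    """Check the solution existence condition (94) (cf. (90)):
    rank(X) == rank([X, b(z_k, theta_k)]) for every hidden unit k.
    X is the MN x 2 matrix of sample coordinates x_im(t_n); b_columns
    holds one MN-vector (phi_k^{-1}(z_km(t_n)) + theta_k) per unit k."""
    r = np.linalg.matrix_rank(X)
    return all(
        np.linalg.matrix_rank(np.column_stack([X, b])) == r
        for b in b_columns
    )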
8.3. Constructing a solution (σ⁰, ω⁰) subject to [CON2] and [CON3]
Let us denote by γ⁰ = γ(K⁰; M⁰, N⁰) (1 ≤ K⁰ ≤ K^M) the solution frame γ(K) = γ(K; M⁰(K), N⁰(K)) ∈ Γ⁰ of the minimum unit size K⁰ that satisfies conditions [T-CON1] and [T-CON2], found by the examination sequence (109).

Proposition 7. There is a structure ω⁰ ≡ ω(K⁰) = (K⁰, α⁰, θ⁰, β⁰) constructible from the solution frame γ⁰ = γ(K⁰; M⁰, N⁰) that satisfies condition [CON2], i.e., the conditions (85), (86).

Proof. We actually construct
a specific structure (α⁰, θ⁰, β⁰) of unit size K⁰ as a solution to the composite equation/inequality system (110), (116) (or (110), (117)), which is equivalent to that of (46)-(48) under conditions [T-CON1] and [T-CON2], i.e., the conditions (94), (95) (or (94), (96)). The discussion about the system (110), (117) under the condition (94), (96) is quite similar to that about (110), (116) under (94), (95), and thus is omitted.
(1) Constructing a part structure (c*, θ*, β*) from (116) under (95): When the condition (95) holds for γ⁰, we put that

Λ_j⁰ = #{V_j⁰(i) | i = 1, ..., Λ_j⁰} ≡ min{#{V_j(i) | i = 1, ..., Λ_j}} ≤ 3K⁰  for j = 1, 2.  (118)
To the inequality system (116) derived from γ⁰, we then construct a solution (c*, θ*, β*) ≡ ((c*_1k, c*_2k), (θ*_k), (β*_kj)) such that the intermediate term of (116) falls into the value set V_j⁰(i) for all m = 1, ..., M⁰; n = 0, ..., N⁰ - 1; i = 1, ..., Λ_j⁰ (k = 1, ..., K⁰; j = 1, 2).  (119)

This construction is
possible because the combined parameter can take 3K⁰ different values at the maximum by the appropriate selection of the parameter values (c*_1k, c*_2k, θ*_k, β*_kj). Owing to a derivation technique similar to that of (66) in the proof of Lemma 2, we here reselect only θ* so that (c*, θ*, β*) satisfies the following condition, similar to (66), as well as (119):

max_{m,n} {|φ_k[c*_1k x_1m(t_n) + c*_2k x_2m(t_n) - θ*_k] - z*_km(t_n)|} < ε₂B⁻¹
for all k = 1, ..., K⁰ (m = 1, ..., M⁰ and n = 0, ..., N⁰ - 1).  (120)
Here, β*_kj, z*_km(t_n) and ε₂ are defined as in the proof of Lemma 2. The cardinality of the set of all the parameter values (c*, θ*, β*) that satisfy (119), (120) can amount to that of all the real numbers. We are allowed to choose any one (c*, θ*, β*) of them.
(2) Constructing a part structure α* from (110) under (94): The condition (94) is equivalent to (113), from which it follows that

φ_k^{-1}[z*_km(t_n)] + θ*_k = c*_1k x_1m(t_n) + c*_2k x_2m(t_n)
(k = 1, ..., K⁰; m = 1, ..., M⁰; n = 0, 1, ..., N⁰ - 1).  (121)
By substituting the right-hand side of (121) for the right-hand side of (110), we have the following linear equation system for the variable α = (α_ik) (i = 1, 2; k = 1, ..., K) that is obviously equivalent to (110):

Σ_i α_ik x_im(t_n) = Σ_i c*_ik x_im(t_n)
(k = 1, ..., K⁰; m = 1, ..., M⁰; n = 0, 1, ..., N⁰ - 1).  (122)
The equation system (122) is further rewritten into the following one:

Σ_i x_mi(t_n)[α_ik - c*_ik] = 0
(k = 1, ..., K⁰; m = 1, ..., M⁰; n = 0, 1, ..., N⁰ - 1).  (123)
Therefore, we obtain the following solution α* = (α*_ik) (i = 1, 2) to the equation system (123) and thus to (110):

α*_ik ≡ c*_ik  (i = 1, 2; k = 1, ..., K).  (124)
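The substitution argument (121)-(124) can be checked numerically: with α_ik = c*_ik the residual of (123) vanishes identically, so (110) holds. A toy verification with arbitrary data:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))       # rows: sample points (m, n); columns: i = 1, 2
c_star = rng.normal(size=(2, 3))   # c*_{ik} for K0 = 3 hidden units

alpha = c_star.copy()              # the solution (124): alpha*_{ik} = c*_{ik}
residual = X @ (alpha - c_star)    # left-hand side of (123) for all (m, n), k
assert np.allclose(residual, 0.0)  # (123) holds, hence so does (110)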
(3) Furnishing the structure (α*, θ*, β*) with condition [CON2]: The whole discussion in the proof of Lemma 2 can be applied to the structure (α*, θ*, β*). In consequence, we obtain a structure (α⁰, θ⁰, β⁰) that satisfies condition [CON2] in Definition 15, i.e., the conditions (85), (86), where α⁰ and β⁰ are specified by (51) and (52), respectively, while θ⁰ is selected such that the condition (67) is satisfied. This completes the proof. □
Through Propositions 4-7 in Sections 7-8, we have constructed the SSP-triplet (σ⁰, ω⁰, π⁰) = (σ(M⁰, N⁰), ω(K⁰), π(K⁰; M⁰, N⁰)) that satisfies conditions [CON1]-[CON3], while we have demonstrated that the unit size minimality condition [CON3b] is specifically achieved by the examination of conditions [T-CON1] and [T-CON2]. This concludes the proof of Theorem 1.
9. Extension of the BGAP to the GAP

This section extends the BGAP formulation in Definition 11, the SRV-BGAP in Definition 15 and thus Theorem 1 to the GAP in Definition 1. This extension requires no additional substantial technical development. Hence we tersely describe only those points of the GAP that are particularly distinguished from the BGAP.

9.1. Extension of the problem formulation and reduction
The essential extension of the GAP from the BGAP is summarized in a contrastive manner as follows:
(a) Every object consists of one or more sub-objects, each sub-object being a compact set in ℝ^P; while in the BGAP, every object consists of a single contour on the plane ℝ². We point out that each sub-object is not necessarily limited to a metric space that is continuously deformable to a (P - 1)-dimensional hypersphere S^{P-1} or a P-dimensional hyperball B^P in ℝ^P. Here, each sub-object x is represented by a set in ℝ^P:

x = {(x_1, ..., x_P) ∈ ℝ^P | (x_1, ..., x_P) ∈ sub-object}.  (125)

Also, a finite sample σ = σ(M, N_1, ..., N_P) of sample size MN_1 ⋯ N_P is specified by

σ ≡ {(x_mn, y_mn) ∈ ℝ^P × ℝ^Q | x_mn ∈ sub-object; y_mn = f(x_mn);
m = 1, ..., M; n = (n_1, ..., n_P); n_1 = 1, ..., N_1; ...; n_P = 1, ..., N_P}.  (126)

(A sketch of this grid sampling appears after this list.)
(b) The unit sizes of the (L + 2)-layer network are (P, K_1, ..., K_L, Q); while in the BGAP, those of the 3-layer network are (2, K, 2).
(c) The mapping f is in C¹ from a metric space X of sub-objects in ℝ^P to another Y in ℝ^Q, where each sub-object is formulated as a compact metric space in ℝ^P or ℝ^Q. Each neural mapping ξ is also in C¹ from X to Y. It is noted that metrics between any pair of sub-objects are out of scope; from this point of view, X has no metric structure. While in the BGAP, both f and ξ are in C¹ from the compact metric space of the contours to another.
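As announced above, the sample specification (126) amounts to evaluating f on a finite grid of points drawn from each sub-object. The following sketch is schematic, not the paper's procedure; the point-selection callables and the mapping f are assumed inputs.

import itertools

def build_sample(subobjects, f, grid_sizes):
    """Construct the finite sample sigma(M, N_1, ..., N_P) of (126):
    for each sub-object x_m (m = 1, ..., M) pick one point per multi-index
    n = (n_1, ..., n_P) and pair it with y_mn = f(x_mn).
    `subobjects[m](n)` is an assumed callable returning a point of
    sub-object m for multi-index n."""
    sample = []
    for pick in subobjects:
        for n in itertools.product(*(range(1, Np + 1) for Np in grid_sizes)):
            x_mn = pick(n)
            sample.append((x_mn, f(x_mn)))
    return sample  # M * N_1 * ... * N_P pairs (x_mn, y_mn)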
With these extensions of the GAP from the BGAP, we obtain the following mathematical formulation of the GAP, which, on the surface, is quite similar to that of the BGAP in Definition 11.

Definition 16. A mathematical form of the GAP is specified as follows: Assume that an admissible error ε and the twofold-C¹ mapping f from X to Y are given. Then, construct any SS-pair (σ⁰, ω⁰) ≡ (σ(M⁰, N⁰_1, ..., N⁰_P), ω(K⁰_1, ..., K⁰_L)) with ω(K⁰_1, ..., K⁰_L) ≡ (K⁰_1, ..., K⁰_L; α⁰_1, θ⁰_1; ...; α⁰_L, θ⁰_L; β⁰) such that the following conditions are simultaneously satisfied.
[G-CON1] The neural mapping ξ_ω⁰ generalizes the sample σ⁰.
[G-CON2] The neural mapping ξ_ω⁰ also approximates f.
[G-CON3] Among all the SS-pairs each of which satisfies conditions [G-CON1] and [G-CON2], the SS-pair (σ⁰, ω⁰) satisfies the following minimality conditions simultaneously.
[G-CON3a] The sample σ(M⁰, N⁰_1, ..., N⁰_P) is minimal, i.e., the sample size M⁰N⁰_1 ⋯ N⁰_P is minimum.
[G-CON3b] The structure ω(K⁰_1, ..., K⁰_L) is minimal, i.e., the sum (K⁰_1 + ⋯ + K⁰_L) of the unit sizes is minimum.

Also, we reduce the GAP into the following standard restricted version, which, on the surface, is quite similar to the SRV-BGAP in Definition 15 as well.

Definition 17. A Standard Restricted Version of the GAP (SRV-GAP) is specified as follows: Assume that an admissible error ε and the twofold-C¹ mapping f from X to Y are given. Then, construct any SS-pair (σ⁰, ω⁰) = (σ(M⁰, N⁰_1, ..., N⁰_P), ω(K⁰_1, ..., K⁰_L)) of a sample σ(M⁰, N⁰_1, ..., N⁰_P) = {(x_mn, y_mn)} and a structure ω(K⁰_1, ..., K⁰_L) = (K⁰_1, ..., K⁰_L; α⁰_1, θ⁰_1; ...; α⁰_L, θ⁰_L; β⁰) such that the following three conditions are simultaneously satisfied:
[CON1-G] The SS-pair (σ⁰, ω⁰) has some polynomial constructor π⁰ = π(K⁰; M⁰, N⁰_1, ..., N⁰_P) ≡ (σ⁰, P⁰). Note that the third constituent {p_1, ..., p_M} of the SRV-BGAP polynomial constructor π(K⁰; M⁰, N⁰) = (σ⁰, P⁰, {p⁰_1, ..., p⁰_M}) has been dropped because it is particularly tuned to the contour and thus unnecessary.
[CON2-G] The structure ω(K⁰_1, ..., K⁰_L) satisfies the following conditions:

|α_lik| ≤ A  for all i = 1, ..., P and k = 1, ..., K_l (l = 1, ..., L),  (127)

|β_kj| < B  for all k = 1, ..., K_L and j = 1, ..., Q.  (128)

[CON3-G] Among all the SS-pairs (σ, ω) each of which satisfies conditions [CON1-G] and [CON2-G], the following minimality conditions simultaneously hold:
[CON3a-G] The sample σ(M⁰, N⁰_1, ..., N⁰_P) is minimal, i.e., the sample size M⁰N⁰_1 ⋯ N⁰_P is minimum.
[CON3b-G] The structure ω(K⁰_1, ..., K⁰_L) is minimal, i.e., the sum (K⁰_1 + ⋯ + K⁰_L) of the unit sizes is minimum.
9.2. Extension of Theorem 1

On reviewing the proof of Theorem 1 described in Sections 7-8, one can easily recognize that no demonstration techniques are used there that are substantially bound to any of the extensions of the GAP described in Section 9.1. Hence, we obtain the following theorem that provides a solution to the SRV-GAP.

Theorem 2. There exists some solution (σ⁰, ω⁰) = (σ(M⁰, N⁰_1, ..., N⁰_P), ω(K⁰_1, ..., K⁰_L)) to the SRV-GAP that has some polynomial constructor π⁰ = π(K⁰; M⁰, N⁰_1, ..., N⁰_P) ≡ (σ⁰, P⁰). Here, the sum (K⁰_1 + ⋯ + K⁰_L) of the unit sizes of the structure ω(K⁰_1, ..., K⁰_L) = (K⁰_1, ..., K⁰_L; α⁰_1, θ⁰_1; ...; α⁰_L, θ⁰_L; β⁰) is obtained as the minimum among all the SSP-triplets (σ, ω, π) that satisfy the following conditions as well as conditions [CON1-G], [CON2-G] and [CON3a-G] in Definition 17:
[T-CON1-G] (α_1, ..., α_L)-substructure existence condition.
[T-CON1a-G] α_1-substructure existence condition.

rank(X(M, N_1, ..., N_P)) = rank([X(M, N_1, ..., N_P), b_1(z_1k, θ_1k)])  (k = 1, ..., K_1).  (129)
Here, X(M, N_1, ..., N_P) indicates an M(N_1 ⋯ N_P) × P constant matrix (x_mni); and besides, b_1(z_1k, θ_1k) an MN_1 ⋯ N_P-dimensional parametric constant vector (φ_1k^{-1}(z_1kmn) + θ_1k) (k = 1, ..., K_1; m = 1, ..., M; n = (n_1, ..., n_P); n_1 = 1, ..., N_1; ...; n_P = 1, ..., N_P; i = 1, ..., P).
[T-CON1b-G] (α_2, ..., α_L)-substructure existence condition.
1W, NI, . . . , NP))= ran@LG- 1W, N1, . . . , NPMh, &)I 1 (k = 1, . . . , K1; 1 = 2, . . . , Lo).
(130)
Here, Z_{l-1}(M, N_1, ..., N_P) indicates an M(N_1 ⋯ N_P) × K_{l-1} constant matrix (z_{(l-1)mnk}); and besides, b_l(z_lk, θ_lk) an MN_1 ⋯ N_P-dimensional parametric constant vector (φ_lk^{-1}(z_lkmn) + θ_lk) (k = 1, ..., K_l; l = 2, ..., L; m = 1, ..., M; n = (n_1, ..., n_P); n_1 = 1, ..., N_1; ...; n_P = 1, ..., N_P).
[T-CON2-G] (θ_1, ..., θ_L; β)-substructure existence condition.

min{#{V_j(λ) | λ = 1, ..., Λ_j}} ≤ (K_{L-1} - i + 1)K_L  for j = 1, ..., Q,
when rank(Z_{L-1}(M, N_1, ..., N_P)) = min(MN_1 ⋯ N_P, K_{L-1}) - i = K_{L-1} - i  (0 ≤ i ≤ K_{L-1} - 1).  (131)
Here it is supposed that

K_{L-1} ≤ MN_1 ⋯ N_P.  (132)
9.3. A practical alternative to the universal approximation theorem
Let us look at the impact of Theorem 2 upon the universal approximation theorem developed by Funahashi [5] and Hornik et al. [6]. They proved mathematically rigorously that any C⁰ (not restricted to C¹) mapping f from a compact set in ℝ^P to ℝ^Q is approximated within any given admissible error by the 3-layer network with sufficiently many intermediate units. In the universal approximation theorem, however, they did not actually construct any specific neural mapping to approximate the given C⁰ mapping f. Our Theorem 2 actually constructs the specific minimal structure to approximate the given C¹ mapping f, although Theorem 2 addresses the restricted version of the GAP, i.e., the SRV-GAP, and besides, the mapping f is in C¹, not merely in C⁰. Furthermore, Theorem 2 meets the natural requirement [RE1] from the real-world environment that one can observe and collect only a finite set of sample points from the mapping f. Hence we consider that Theorem 2 is sharper and more practical than the universal approximation theorem. More specifically, Theorem 2 produces the following special corollary that we can claim is a promising alternative to the universal approximation theorem.

Corollary. Assume that an admissible error ε and a C¹ mapping f from a compact set X in ℝ^P to ℝ^Q are given. Then, there exist a minimal sample σ⁰ = σ(1, N⁰_1, ..., N⁰_P) = {(x_n, y_n) | n = (n_1, ..., n_P); n_1 = 1, ..., N_1; ...; n_P = 1, ..., N_P} and a minimal structure ω⁰ = ω(K⁰) = (K⁰, α⁰, θ⁰, β⁰) of the 3-layer network constructible such that ξ_ω⁰ generalizes σ⁰ and, moreover, ξ_ω⁰ approximates f by way of some polynomial constructor π⁰ = π(K⁰; 1, N_1, ..., N_P) ≡ (σ⁰, P⁰), where the unit size K⁰ is obtained as the minimum among all the SSP-triplets (σ, ω, π) that satisfy the following conditions simultaneously:
[C-CON1-G] α-substructure existence condition.

rank(X(1, N_1, ..., N_P)) = rank([X(1, N_1, ..., N_P), b(z_k, θ_k)])  (k = 1, ..., K).  (133)
Here, X(1, N_1, ..., N_P) indicates an (N_1 ⋯ N_P) × P constant matrix (x_ni); and besides, b(z_k, θ_k) an (N_1 ⋯ N_P)-dimensional parametric constant vector (φ_k^{-1}(z_kn) + θ_k) (k = 1, ..., K; n = (n_1, ..., n_P); n_1 = 1, ..., N_1; ...; n_P = 1, ..., N_P; i = 1, ..., P).
[C-CON2-G] (θ, β)-substructure existence condition.
min{#{V_j(λ) | λ = 1, ..., Λ_j}} ≤ (P - i + 1)K  for j = 1, ..., Q,
when rank(X(1, N_1, ..., N_P)) = min(N_1 ⋯ N_P, P) - i = P - i  (0 ≤ i ≤ P - 1),  (134)

where it is supposed that

P ≤ N_1 ⋯ N_P.  (135)
10. Conclusions
This paper first formulated the GAP (Generalization-and-Approximation Problem), the problem of simulating the human generalization-and-approximation capability by means of layer networks with C¹ activation functions. The GAP nearly includes as a special subproblem the approximation problem of C⁰ mappings from a compact set in ℝ^P to ℝ^Q resolved by the work [5, 6]. It then reduced the GAP to the Standard Restricted Version (SRV-GAP), where the given mapping can be generalized and approximated only through polynomials. Since the polynomial is one of the most efficient means of calculation by computers, the SRV-GAP turns out to be a central issue when one tries to construct a practically feasible GAP solution in the real-world environment. Finally, it constructed the minimal pair (sample, structure), a solution to the SRV-GAP, with the aid of the Weierstrass polynomial approximation theorem and the solution existence theorem for linear equation systems. The specific solution construction for the SRV-GAP was stated as theorems: Theorem 1 for the special object world, the set of contours on the plane ℝ², and for the 3-layer network; Theorem 2 for the general object world, an ensemble of sets each of which consists of compact sets in ℝ^P, and for the general multilayer network. Theorem 2, in particular, elucidates the generalization-and-approximation capability of the minimal multilayer network loaded with the minimal sample. Furthermore, it also provides a practical solution construction method that one can employ in the real construction of the solution by computers. In fact, the Corollary to Theorem 2 is expected to be a useful practical version of the current universal approximation theorem [5, 6], though it is restricted to C¹ mappings, a sub-class of the C⁰ mappings treated by [5, 6].

Unfortunately, the full GAP has not been solved completely in this paper. Solving the GAP requires exhaustive investigation of all other solution construction methods different from our polynomial constructor method developed for the SRV-GAP. Since there can exist numerous other methods, the whole GAP seems extremely difficult to solve completely. Hence we are confident that the SRV-GAP solution is practically sufficient for most real-world environments, but we also think it theoretically interesting to solve the full GAP mathematically rigorously. We thus leave it as a challenging future study.
Acknowledgements
The author is grateful to the anonymous reviewers for their valuable suggestions and comments that improved the presentation of the results in this paper.
References

[1] P.L. Bartlett, The sample size necessary for learning in multi-layer networks, Proc. ACNN, 1993, pp. 14-17.
[2] E.B. Baum, On the capabilities of multilayer perceptrons, J. Complexity 4 (1988) 193-215.
[3] E.B. Baum, D. Haussler, What size net gives valid generalization?, Neural Comput. (1989) 151-160.
[4] M. Bichsel, P. Seitz, Minimum class entropy: a maximum information approach to layered networks, Neural Networks (1989) 133-141.
[5] K. Funahashi, On the approximate realization of continuous mappings by neural networks, Neural Networks (1989) 183-192.
[6] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks (1989) 359-366.
[7] C. Ji, R. Snapp, D. Psaltis, Generalizing smoothness constraints from discrete samples, Neural Comput. (1990) 188-197.
[8] J. Lee, A novel design method for multilayer feedforward neural networks, Neural Comput. (1994) 885-901.
[9] A. Malinowski, J.M. Zurada, Minimal training set size estimation for sample-data function encoding, Proc. INNS, San Diego, CA, vol. 2, 1994, pp. 372-390.
[10] G.J. Mitchison, R.M. Durbin, Bounds on the learning capacity of multi-layer networks, Biol. Cybernet. 60 (1989) 345-356.
[11] S. Mukhopadhyay, L.S. Kim, S. Govil, A polynomial time algorithm for generating neural networks for pattern classification: its stability properties and some test results, Neural Comput. (1993) 317-330.
[12] A. Roy, L.S. Kim, S. Mukhopadhyay, A polynomial time algorithm for the construction and training of a class of multilayer perceptrons, Neural Networks (1993) 535-545.
[13] M.A. Sartori, P.J. Antsaklis, A simple method to derive bounds on the size and to train multilayer neural networks, IEEE Trans. Neural Networks 2 (4) (1991) 467-471.
[14] M. Shoam, M. Meltser, L.M. Manevitz, Constructive uniform approximation of differentiable vector-functions by neural network methods, Proc. INNS, San Diego, CA, vol. 2, 1994, pp. 372-390.
[15] Y. Takahashi, Generalization and approximation capabilities of multilayer networks, Neural Comput. 5 (1) (1993) 132-139.
[16] R. Zollner, H.J. Schmitz, F. Wünsch, U. Krey, Fast generating algorithm for a general three-layer perceptron, Neural Networks 5 (1992) 771-777.
Yoshikane Takahashi received the M.Sc. degree in mathematics from The University of Tokyo, Tokyo, Japan in 1975. He is currently with NTT Communication Science Laboratories, Kanagawa, Japan. His research fields include communications protocols, fuzzy theory, neural networks, nonmonotonic logic, genetic algorithms, and semantics information theory. Mr. Takahashi was awarded the first Moto-oka Commemorative Award in 1986. He is a member of the Japanese Institute of Electronics, Information and Communication Engineers, and the Information Processing Society of Japan.