THE CHARACTERISTICS OF A BIASED ESTIMATOR APPLIED TO THE ADAPTIVE GMDH

T. NISHIKAWA AND S. SHIMIZU
Department of Management Engineering, Tokyo Metropolitan Institute of Technology
6-6, Asahigaoka, Hino-city, Tokyo, 191, Japan

(Received September 1991)

Mathl. Comput. Modelling Vol. 17, No. 1, pp. 37-48, 1993. Printed in Great Britain. All rights reserved. Copyright (c) 1993 Pergamon Press Ltd. 0895-7177/93 $5.00 + 0.00.

Abstract. In this paper, the characteristics of the ridge biased estimator obtained in the adaptive GMDH (Group Method of Data Handling), which applies the ridge bias parameter to a system identification procedure, are discussed. Using the canonical model based on the generalized ridge estimator, it is shown that the stability and shrinkability of the ridge estimator depend on certain convergence conditions, which are described and examined in relation to the iterative procedure for obtaining an optimal estimator. Furthermore, it is shown that the principal components estimator is included in the ridge estimator of the adaptive GMDH.

1. INTRODUCTION

The mathematical modelling of a system is applied in system analysis for identification of systems, forecasting, optimization and control. The modelling problems of environmental, social, and management systems in general are always ill-posed, because such systems are complex: stochastic systems, which are generally characterized by large dimensionality, and nonlinear systems with unanticipated unknown elements. For these complex systems, many useful methods to solve the problems of system identification have been studied actively in recent years, among which the identification method of the "Group Method of Data Handling (GMDH)" developed by A. G. Ivakhnenko is well known [1-4]. The conceptual basis for the structure of the GMDH is heuristic self-organization, and the polynomial equation resulting from its identification procedures is based on the Kolmogorov-Gabor polynomial [5]. The structure of the GMDH can be considered a kind of heuristic filter which eliminates harmful information in order to increase accuracy, through a multilayered algorithm in which self-sampling thresholds of useful data are used in each layer.

As stated above, the GMDH method appears to be practical and effective for identifying complex systems; however, the GMDH method suffers from multicollinearity because of the feature of its structure, and we need an alternative method of estimation that enables us to remove the effects of the multicollinearity. Therefore, the system identification method described in this paper focuses attention on a modification and extension of the parameter estimation of the GMDH algorithm by applying ridge regression estimation. Ridge regression estimation, as a biased estimation and an alternative to the least squares estimation method that performs efficiently in the presence of multicollinearity, is discussed here. This alternative is described as the adaptive GMDH.

We will mention some of the characteristics of the ridge estimator and the generalized ridge estimator of the canonical form in the adaptive GMDH procedure, in which we state that the principal components estimator is a special case of the ridge estimator.

2. GMDH METHOD

As previously stated, the GMDH method is considered to be one of the system identification methods for nonlinear complex systems; in other words, it is an identification and forecasting method based on heuristic self-organization (the use of heuristic thresholds for the self-sampling of useful information through the multilayer), applicable to complex social phenomena, and it is a system with layer dimensionality. The GMDH method has the remarkable feature that it is possible to identify simultaneously the structure of the optimal complexity model and its coefficient parameters. The fundamental structure of the GMDH is shown in Figure 1, where it is illustrated that the GMDH is one of the control techniques which makes it possible to combine the variables in the selection layer, the optimal combination minimizing the various heuristic criteria in multilayer selection. The basic idea of the GMDH algorithm consists of the selection of partial descriptions (polynomials) that yield at the next layer new partial descriptions better than those generating them, and the determination of the optimal heuristic criteria for appropriate selection of the number of variables that are let through. The algorithm to obtain the optimal output is as follows:

(1) Determine the basic function that becomes the model to identify, after considering the plant system (forecasting and control).
(2) Determine the output variables and the input variables that influence the output, and then choose the pre-history (time lag) of the input variables, considering the time series of the data set.
(3) Divide the data into two sets called the "training set" and "checking set." The former is used to estimate coefficients of the partial descriptions, and the latter is used to calculate the selection criterion of the intermediate variables (the value of the partial description).
(4) Identify the partial descriptions with different combinations of the input variables. The training sequence of data is taken two at a time in accordance with the basic function:

    y_m^(t)(l) = Z_m(l+1) = F_m(Z_i(l), Z_j(l)) = Z_ij(l) â(l),
    i, j = 1, 2, ..., p,  i ≠ j;  m = 1, 2, ..., M;  l = 1, 2, ..., lf,     (2.1)

which is the mth partial description, and the coefficients in equation (2.1) are least squares estimates, based on the training sequence of data, of the coefficients of the assumed model:

    y^(t)(l) = Z_ij^(t)(l) a(l) + e(l),     (2.2)

    â(l) = [Z_ij^(t)T(l) Z_ij^(t)(l)]^(-1) Z_ij^(t)T(l) y^(t)(l),     (2.3)

where â(l) is the q x 1 vector of the coefficient estimators, Z_ij^(t)(l) is the n x q matrix of measurements of the input variables, y^(t)(l) is the n x 1 vector of measurements of the output variables, and n denotes the number of observations. After the partial descriptions of all combinations are identified, an arbitrary number of them is selected and filtered to the next selection layer in accordance with the selection criterion e(l), which is the value of the criterion for the partial descriptions in the lf-th (final) selection layer.

3. THE ADAPTIVE GMDH

The stability of the coefficients of the system equations, as the solution of the system identification problem, is very important when they are used for forecasting and control problems. When identifying the partial description models in the multilayer selection of the GMDH algorithm, their coefficient parameters are unstable, because the partial descriptions are repeatedly identified until they reach the threshold. Moreover, because of the essential characteristic of the GMDH structure, part of the output obtained in the previous selection layer becomes the input of the subsequent one, so the inputs are highly correlated, as shown in Figure 1. The identification thus runs into multicollinearity problems when the least squares method is used. The multicollinearity among the inputs causes [Z_ij^(t)T(l) Z_ij^(t)(l)] to be nearly singular; that is, a small eigenvalue of [Z_ij^(t)T(l) Z_ij^(t)(l)] indicates the near-singularity of [Z_ij^(t)T(l) Z_ij^(t)(l)].
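As an illustrative aside (our own sketch, not code from the paper; all variable names are hypothetical), the near-singularity caused by highly correlated layer outputs is easy to demonstrate numerically:

```python
# Sketch: correlated layer outputs make Z^T Z nearly singular,
# which is what destabilizes the least squares coefficients.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
z1 = x + 0.01 * rng.normal(size=n)      # two highly correlated "layer outputs"
z2 = x + 0.01 * rng.normal(size=n)
Z = np.column_stack([z1, z2, z1 * z2])  # inputs of a partial description

eigvals = np.linalg.eigvalsh(Z.T @ Z)   # ascending order
print(eigvals)                          # the smallest eigenvalue is near zero
cond = eigvals[-1] / eigvals[0]
print(cond)                             # very large condition number
```

The tiny smallest eigenvalue is exactly the "near-singularity" indicator mentioned above.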

Figure 1. Structure of the GMDH. (Diagram: a multilayer selection network in which the outputs of each selection layer pass through a heuristic filter using the checking data before entering the next layer.)

As one of the methods to remove multicollinearity and to stabilize the estimators of the regression coefficients when multicollinearity exists, the biased estimation method is known to be useful for parameter estimation [6,7]. We apply the ridge regression of Hoerl and Kennard [8,9]


to the estimation of the regression coefficients of the partial description in the GMDH algorithm. The ridge estimators obtained are biased, but stable in the sense that they are not affected by slight variations in the estimation data, and they tend to have a smaller mean squared error than the ordinary least squares estimators [8]. Estimations based on the correlation matrix [Z_ij^(t)T(l) Z_ij^(t)(l) + k(l)I] and the ridge bias parameter k (0 < k < 1), rather than on [Z_ij^(t)T(l) Z_ij^(t)(l)], have been found to be reasonable procedures that can be used to help circumvent many of the difficulties associated with the usual least squares estimates [10].

Here, we shall propose "the adaptive GMDH," which is an extension of the GMDH. The term "adaptive" indicates that the ridge bias parameter k is determined according to the characteristics of the data. The structure of the adaptive GMDH is shown in Figure 2. Indeed, the effectiveness of the applications becomes clear from the investigation of forecasting problems [11]. Now, to apply the foregoing to parameter estimation by the adaptive GMDH, the partial description in the lth selection layer is identified as follows:

    Z_m(l+1) = F_m^k(Z_i(l), Z_j(l)) = a_0^k(l) + a_1^k(l) Z_i(l) + a_2^k(l) Z_j(l) + a_3^k(l) Z_i(l) Z_j(l),     (3.1)

where

    i, j = 1, 2, ..., p,  i ≠ j;  m = 1, 2, ..., M (M at most the number of pairwise combinations of inputs);  l = 1, 2, ..., lf,  lf = final layer;

    â^k(l) = [Z_ij^(t)T(l) Z_ij^(t)(l) + k(l)I]^(-1) Z_ij^(t)T(l) Z_ij^(t)(l) â(l).     (3.2)

Thus, â^k(l) is the ridge estimator in the lth selection layer, and k denotes the ridge bias parameter. Here it is assumed that [Z_ij^(t)T(l) Z_ij^(t)(l)] is a matrix of correlation coefficients of Z_i(l), Z_j(l) and Z_i(l)Z_j(l); I denotes a unit matrix, and â(l) denotes the least squares estimator of the partial description. The superscript (t) denotes the training sequence of data.
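As a minimal numerical sketch (ours, not the paper's; variable names are assumptions), the ridge estimate of equation (3.2) can be computed from the least squares estimator; note that [Z^T Z + kI]^(-1) Z^T Z â equals the familiar ridge estimate [Z^T Z + kI]^(-1) Z^T y:

```python
# Sketch of the ridge estimate in eq. (3.2): a_k = (Z^T Z + k I)^(-1) Z^T Z a_ls.
import numpy as np

def ridge_from_ls(Z, y, k):
    ZtZ = Z.T @ Z
    a_ls = np.linalg.solve(ZtZ, Z.T @ y)                       # least squares estimator
    a_k = np.linalg.solve(ZtZ + k * np.eye(Z.shape[1]), ZtZ @ a_ls)
    return a_ls, a_k

rng = np.random.default_rng(1)
Z = rng.normal(size=(50, 3))
y = Z @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
a_ls, a_k = ridge_from_ls(Z, y, k=0.5)
print(np.linalg.norm(a_k) < np.linalg.norm(a_ls))              # ridge shrinks the estimate
```

The shrinkage illustrated here is the "stabilization" the adaptive GMDH exploits.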

As a result of the selection in the multilayer, the procedure to identify the partial descriptions is repeated while the value of e_m^k(l) decreases in each layer, so that the following inequality is satisfied:

    e_m^k(1) > e_m^k(2) > ... > e_m^k(l-1) > e_m^k(l) ≤ e_m^k(l+1),     (3.3)

and hence the lth layer becomes the final layer lf; that is, l = lf, where we define the threshold e^k(lf) that determines the final layer as

    e^k(lf) = min { e_m^k(l) | l = lf, m = 1, 2, ..., p_{l-1}C_2 },     (3.4)

and then e^k(lf) is the index of the selection criterion that selects the m partial descriptions in the lf-th (final) layer. The existence of this threshold e^k(lf) has been investigated by S. Shimizu and T. Nishikawa [12].

4. THE GENERALIZED RIDGE ESTIMATOR OF THE ADAPTIVE GMDH

We shall adopt the partial description in the lth layer

    y^(t)(l) = Z_ij^(t)(l) a^k(l) + e(l),     (4.1)

where it is assumed that Z_ij^(t)(l) is an n x q matrix of rank q, a^k(l) is a q x 1 vector, and e(l) is an n x 1 vector of disturbances, normally distributed with mean 0 and covariance matrix σ²I, where σ² is unknown. The superscript (t) denotes the training data. For convenience, we assume that the independent variables are scaled so that [Z_ij^(t)T(l) Z_ij^(t)(l)] is in correlation form. We consider the biased estimator that introduces the ridge bias parameter, 0 < k_i(l) < 1, into the identification process of the partial description.

Figure 2. Structure of the adaptive GMDH. (Flowchart: in each layer the values of the selection criterion e_m^k(l) of the partial descriptions are arranged in ascending order; the number of input variables passed to the next layer is decided; and the complete description is decided when min e(l) ≤ min e(l+1).)


Figure 3. Computation algorithm of the optimal ridge estimator. (Flowchart: make the matrices Z^(t)(l) and y^(t)(l) of the training data of input and output variables; make the matrix C^(t)(l) composed of the principal components of Z^(t)(l); calculate σ̂² = Var[y^(t)(l) - C^(t)(l) b̂(l)^(0)]; calculate k_i(l)^(s) = σ̂² / (b̂_i(l))², i = 1, 2, ..., q; iterate from s = 1 until convergence, then stop.)

Let Λ(l) = diag[λ_1(l), λ_2(l), ..., λ_q(l)] be the matrix of eigenvalues of [Z_ij^(t)T(l) Z_ij^(t)(l)], with the λ_i(l) ordered λ_1(l) ≥ λ_2(l) ≥ ... ≥ λ_q(l), and let V(l) denote the orthonormal matrix consisting of the corresponding eigenvectors of [Z_ij^(t)T(l) Z_ij^(t)(l)], such that

    V(l)^T [Z_ij^(t)T(l) Z_ij^(t)(l)] V(l) = Λ(l),    V(l)^T V(l) = I.

The columns of V(l) are the normalized characteristic vectors corresponding to λ_1(l), λ_2(l), ..., λ_q(l). V(l) may be used to define a new set of variables, namely

    C(l) = Z_ij^(t)(l) V(l)    and    b(l) = V(l)^T a(l).     (4.2)

C(l) is referred to as the principal components. The canonical model of equation (4.1) can be rewritten in terms of C(l) as

    y^(t)(l) = C^(t)(l) b(l) + e(l).     (4.3)

Suppose [Z_ij^(t)T(l) Z_ij^(t)(l)] is of rank r, so that the last (q - r) ordered elements of Λ(l) are zero. We write in partitioned form

    V(l) = [V(l)_r | V(l)_{q-r}],     (4.4)

    b(l)^T = [b(l)_r^T | b(l)_{q-r}^T],     (4.5)

and

    Λ(l) = diag[Λ(l)_r | Λ(l)_{q-r}].     (4.6)

The generalized ridge estimator of the canonical form is

    b̂^K(l) = [Λ(l) + K(l)]^(-1) C^(t)T(l) y^(t)(l)     (4.7)
           = [Λ(l) + K(l)]^(-1) Λ(l) b̂(l),     (4.8)

where K(l) = diag[k_1(l), k_2(l), ..., k_q(l)] is a positive definite constant matrix. Equation (4.7) is computed by the computation algorithm of the optimal ridge estimator shown in Figure 3. From equation (4.8) and the orthogonal transform V(l), we obtain the generalized ridge estimator of a(l) as

    â^K(l) = [Z_ij^(t)T(l) Z_ij^(t)(l) + V(l) K(l) V(l)^T]^(-1) Z_ij^(t)T(l) y^(t)(l)     (4.9)
           = [Z_ij^(t)T(l) Z_ij^(t)(l) + V(l) K(l) V(l)^T]^(-1) Z_ij^(t)T(l) Z_ij^(t)(l) â(l).     (4.10)

In equations (4.7)-(4.10), the generalized ridge estimator becomes the least squares estimator when K(l) = 0. A new expression for â^K(l) can now be obtained by applying a matrix inversion lemma to equation (4.10):

    â^K(l) = { I - [Z_ij^(t)T(l) Z_ij^(t)(l)]^(-1) V(l) ( V(l)^T [Z_ij^(t)T(l) Z_ij^(t)(l)]^(-1) V(l) + K(l)^(-1) )^(-1) V(l)^T } â(l)
           = { I - V(l) Λ(l)^(-1) [Λ(l)^(-1) + K(l)^(-1)]^(-1) V(l)^T } â(l).     (4.11)

By substituting V(l) = V(l)_{q-r} and [Z_ij^(t)T(l) Z_ij^(t)(l)]^(-1) = V(l) Λ(l)^(-1) V(l)^T into equation (4.11), and by letting the elements of K(l) be large, we obtain

    â^K(l) = [I - V(l)_{q-r} V(l)_{q-r}^T] â(l).     (4.12)
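As a numerical sketch (our own, with assumed variable names), letting one element of K(l) grow large reproduces the limit of equation (4.12):

```python
# Sketch: as a ridge bias parameter grows, the generalized ridge estimator of
# eq. (4.10) approaches the principal components estimator of eq. (4.12),
# here with the last q - r = 1 canonical component suppressed.
import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(40, 4))
y = rng.normal(size=40)

ZtZ = Z.T @ Z
lam, V = np.linalg.eigh(ZtZ)            # eigenvalues ascending, eigenvectors in V
a_ls = np.linalg.solve(ZtZ, Z.T @ y)    # least squares estimator

V1 = V[:, :1]                           # eigenvector of the smallest eigenvalue
K = np.diag([1e8, 0.0, 0.0, 0.0])       # huge bias only on that component
a_K = np.linalg.solve(ZtZ + V @ K @ V.T, ZtZ @ a_ls)   # eq. (4.10)

a_pc = (np.eye(4) - V1 @ V1.T) @ a_ls   # eq. (4.12) with V(l)_{q-r} = V1
print(np.allclose(a_K, a_pc, atol=1e-4))
```

The suppressed canonical coordinate is exactly the one a principal components estimator would drop.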


Thus, â^K(l) described in equation (4.12) represents the principal components estimator. That is, it follows from equation (4.12) that the ridge estimator of the partial description in the lth layer includes the principal components estimator [13].

5. THE RIDGE PARAMETER IN THE ADAPTIVE GMDH

Though various arguments have been proposed with regard to the determination of K(l), optimal values for the ridge bias parameter K(l) in each selection layer can be considered to be those K(l) that minimize the mean square error

    MSE[b̂^K(l)] = E[(b̂^K(l) - b(l))^T (b̂^K(l) - b(l))].     (5.1)

Another expression of equation (5.1) can be written as

    MSE[b̂^K(l)] = σ² Σ_{i=1}^q λ_i(l) / (λ_i(l) + k_i(l))² + Σ_{i=1}^q k_i(l)² b_i(l)² / (λ_i(l) + k_i(l))²,     (5.2)

where λ_i(l) > 0, i = 1, 2, ..., q, and therefore the condition ∂MSE[b̂^K(l)]/∂k_i(l) = 0 gives

    k_i(l) = σ² b_i(l)^(-2),    i = 1, 2, ..., q,  l = 1, 2, ..., lf.     (5.3)

Hence, optimal values for k_i(l) can be expressed in the form

    K(l) = σ² diag[b_1(l)^(-2), b_2(l)^(-2), ..., b_q(l)^(-2)],     (5.4)

which becomes the index required to obtain the stabilized ridge estimators. Here, let the minimum variance unbiased estimator of σ² and the least squares estimator of b_i(l) be σ̂² and b̂_i(l), respectively; then this procedure can be written as

    k̂_i(l) = σ̂² { b̂_i(l) }^(-2),    i = 1, 2, ..., q,  l = 1, 2, ..., lf.     (5.5)
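A sketch (ours, with hypothetical names) of the estimation rule (5.5): estimate σ² from the canonical least squares residuals and set each k_i = σ̂²/b̂_i²:

```python
# Sketch of eqs. (5.3)-(5.5): k_i = sigma_hat^2 / b_hat_i^2 in the canonical model.
import numpy as np

rng = np.random.default_rng(3)
n, q = 60, 3
Z = rng.normal(size=(n, q))
y = Z @ np.array([2.0, -1.0, 0.5]) + 0.2 * rng.normal(size=n)

lam, V = np.linalg.eigh(Z.T @ Z)
C = Z @ V                                          # principal components, eq. (4.2)
b_ls = np.linalg.solve(np.diag(lam), C.T @ y)      # canonical least squares estimator
resid = y - C @ b_ls
sigma2 = resid @ resid / (n - q)                   # unbiased estimate of sigma^2
k = sigma2 / b_ls**2                               # eq. (5.5): one k_i per component
print(k)
```

Components with small canonical coefficients (the unstable ones) receive the largest bias parameters, which is the intended adaptive behavior.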

6. CHARACTERISTICS OF THE RIDGE ESTIMATOR IN THE ITERATIVE PROCEDURE

6.1. Application of the Ridge Bias Parameter

In this section, we consider the identification process in the iterative procedure that introduces the k_i(l) of equation (5.4) into the parameter estimation of the coefficients in each layer of the multilayer selection. The characteristics of the ridge estimator of the adaptive GMDH may be expressed by the following theorem.

THEOREM 1. The elements of the q-vector b̂^K(l)^(s), which can be obtained by the sth iterate of the iterative procedure, are represented by a diagonal matrix G(l)^(s+1):

    G(l)^(s+1) = [I + σ̂² Λ(l)^(-1) (G(l)^(s))^(-2)]^(-1) G(l)^(0),     (6.1)

where

    G(l)^(s) = diag[b̂_1^k(l)^(s), b̂_2^k(l)^(s), ..., b̂_q^k(l)^(s)].     (6.2)

The superscript (s) denotes the sth iterate. Let M(l)^(s) = σ̂² Λ(l)^(-1) (G(l)^(s))^(-2) be defined as in equation (6.1); then we obtain

    M(l)^(s+1) = M(l)^(0) (I + M(l)^(s))².     (6.3)

PROOF. From equation (4.8), suppose that the q-vector with elements λ_i(l) b̂_i(l) can be expressed as a diagonal matrix

    H(l) = diag[λ_1(l) b̂_1(l), λ_2(l) b̂_2(l), ..., λ_q(l) b̂_q(l)].     (6.4)

From equations (4.8) and (6.4), we have that

    G(l)^(0) = Λ(l)^(-1) H(l).     (6.5)

As described earlier, the iterative procedure can be written by the following matrix formula:

    G(l)^(s+1) = [Λ(l) + K(l)^(s)]^(-1) Λ(l) G(l)^(0).     (6.6)

Substituting equation (6.5) into equation (6.6), we can write equation (6.6) as

    G(l)^(1) = [I + σ̂² Λ(l)^(-1) (G(l)^(0))^(-2)]^(-1) G(l)^(0),     (6.7)

and then, generally, in the sth iterate we obtain

    G(l)^(s+1) = [I + σ̂² Λ(l)^(-1) (G(l)^(s))^(-2)]^(-1) G(l)^(0).     (6.8)

Here, by raising both sides of equation (6.8) to the minus second power, an expression for (G(l)^(s+1))^(-2) may be given, considering that the matrices G(l)^(0) and [I + σ̂² Λ(l)^(-1) (G(l)^(s))^(-2)] in equation (6.8) are diagonal and commute with each other. Furthermore, pre-multiplying both sides of the equation by σ̂² Λ(l)^(-1) yields the following:

    σ̂² Λ(l)^(-1) (G(l)^(s+1))^(-2) = σ̂² Λ(l)^(-1) (G(l)^(0))^(-2) [I + σ̂² Λ(l)^(-1) (G(l)^(s))^(-2)]²,     (6.9)

so that if we let

    M(l)^(s) = σ̂² Λ(l)^(-1) (G(l)^(s))^(-2),     (6.10)

we obtain the following iterative relation:

    M(l)^(s+1) = M(l)^(0) (I + M(l)^(s))².     (6.11)

6.2. Convergent Characteristics of the Ridge Estimator in the lth Layer

One of the purposes of the iterative process is to find the unstable estimators in the model to be identified. By applying the ridge bias parameter k_i(l) in the multilayer, we clarify the convergent characteristics of the generalized ridge estimator and its convergence limit, and their behavioral patterns are shown by using the ridge trace (a plot of the ridge estimator versus k_i(l)). The ridge estimator is stable when the elements of the diagonal matrix M(l)^(s) converge to a certain constant. In this section, we investigate the convergence of M(l)^(s) [14,15].

In the scalar formula of equation (6.11), when m_i(l)^(0) > 0 for an arbitrary i, from equation (6.10) we have that

    m_i(l)^(1) = m_i(l)^(0) (1 + m_i(l)^(0))² > m_i(l)^(0),
    m_i(l)^(2) = m_i(l)^(0) (1 + m_i(l)^(1))² > m_i(l)^(1),

and then we obtain

    m_i(l)^(s+1) > m_i(l)^(s) > 0.     (6.12)

For b̂_i(l)^(0) ≠ 0 for all i, when m_i(l)^(s) converges to m_i(l)*, we have that

    lim_{s→∞} m_i(l)^(s) = m_i(l)*.     (6.13)

From the relationship that substitutes equation (6.13) into the scalar formula of equation (6.11), we obtain the explicit solution for m_i(l)*:

    m_i(l)* = (2 m_i(l)^(0))^(-1) (1 - sqrt(1 - 4 m_i(l)^(0))) - 1,    for 0 < m_i(l)^(0) ≤ 0.25,     (6.14)

where the minus sign is selected. For 0 < m_i(l)^(0) ≤ 0.25, m_i(l)* in equation (6.14) converges to a certain real number, with

    m_i(l)* ≈ 0 when m_i(l)^(0) ≈ 0,
    m_i(l)* ≈ 1 when m_i(l)^(0) ≈ 0.25,     (6.15)

and then, from equation (6.11), we have

    0 < m_i(l)^(0) < m_i(l)^(1) < ... < m_i(l)^(s) ≤ 1,     (6.16)

where 0 < m_i(l)^(0) ≤ 0.25, i = 1, 2, ..., q, l = 1, 2, ..., lf.

Next, from the above, we investigate the convergence with respect to equations (6.1) and (6.10), and we have the matrix formula

    G(l)^(s+1) = [I + M(l)^(s)]^(-1) G(l)^(0).     (6.17)

Here, by substituting equation (6.13) into the ith element of equation (6.17), and considering the conditions of equation (6.15), we can write the scalar formula of equation (6.17) as

    b̂_i^k(l)^(s+1) = (1 + m_i(l)^(s))^(-1) b̂_i(l)^(0),    i = 1, 2, ..., q.     (6.18)

Consequently, as the iterates s increase, it follows from equation (6.18) that for unstable estimators the generalized ridge estimators b̂_i^k(l)^(s+1) reduce to one half of the absolute values of the initial estimators |b̂_i(l)^(0)|, i = 1, 2, ..., q. These convergent characteristics of the generalized ridge estimators in the adaptive GMDH are shown in Figures 4-8.
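The convergence behavior of the scalar recursion in equations (6.11)-(6.16) can be checked numerically; the following is our own sketch:

```python
# Sketch of the iteration m_(s+1) = m_0 * (1 + m_s)^2 from eq. (6.11):
# for 0 < m_0 <= 0.25 it converges to the fixed point of eq. (6.14);
# for m_0 > 0.25 it diverges (the unstable case).
import math

def iterate_m(m0, steps=200):
    m = m0
    for _ in range(steps):
        m_next = m0 * (1 + m) ** 2
        if m_next > 1e6:
            return math.inf          # diverged
        m = m_next
    return m

def fixed_point(m0):                 # eq. (6.14), minus sign selected
    return (1 - math.sqrt(1 - 4 * m0)) / (2 * m0) - 1

print(iterate_m(0.1), fixed_point(0.1))   # the iterates reach the fixed point
print(iterate_m(0.3))                     # m0 > 0.25: no real fixed point, diverges
```

Note that at m0 = 0.25 the fixed point is exactly 1, matching condition (6.15), so by (6.18) the unstable estimators shrink to half of their initial values.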

7. CONCLUSIONS

In this paper, we discussed the adaptive GMDH, which applies the ridge bias parameter to the GMDH and gives reasonable estimators that remove the effects of multicollinearity. Furthermore, we showed clearly that the principal components estimator is a particular case of the ridge estimator obtained in the multilayer selection of the GMDH.

Figure 4. Plot of the generalized ridge estimator b̂(l) versus the iterate s. [Z1(l)].

Figure 5. Plot of the generalized ridge estimator b̂(l) versus the iterate s. [Z2(l)].

Figure 6. Plot of the generalized ridge estimator b̂(l) versus the iterate s. [Z3(l)].

Figure 7. Plot of the generalized ridge estimator b̂(l) versus the iterate s. [Z4(l)].

Figure 8. Plot of the generalized ridge estimator b̂(l) versus the iterate s. [Z5(l)].

REFERENCES

1. A.G. Ivakhnenko, Polynomial theory of complex systems, IEEE Trans. Syst. Man Cybern. SMC-1, 364-378 (1971).
2. A.G. Ivakhnenko, Heuristic self-organization in problems of engineering cybernetics, Automatica 6, 207-219 (1970).
3. A.G. Ivakhnenko, Yu.V. Koppa and W.S. Min, Polynomial and logical theory of dynamic systems (part 1), Sov. Automat. Contr. 15 (3), 1-13 (1970).
4. A.G. Ivakhnenko, Yu.V. Koppa and W.S. Min, Polynomial and logical theory of dynamic systems (part 2), Sov. Automat. Contr. 15 (4), 11-30 (1970).
5. R.A. Roy and J. Sherman, A learning technique for Volterra series representation, IEEE Trans. Automat. Contr. 12, 761-764 (1967).
6. S.L. Sclove, Improved estimators for coefficients in linear regression, J. Amer. Statist. Assoc. 63, 597-606 (1968).
7. R.R. Hocking, The analysis and selection of variables in linear regression, Biometrics 32, 1-49 (1976).
8. A.E. Hoerl and R.W. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12, 55-67 (1970).
9. D.W. Marquardt, Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation, Technometrics 12, 591-612 (1970).
10. T. Nishikawa and S. Shimizu, The effectiveness of application of the ridge estimator to the GMDH algorithm, J. Soc. Sys. Eng. 5, 39-48 (1981).
11. T. Nishikawa and S. Shimizu, Identification and forecasting in management systems using the GMDH, Appl. Math. Modelling 6, 7-15 (1982).
12. S. Shimizu and T. Nishikawa, An analysis of the mechanism of occurrence of the threshold in GMDH, J. Soc. Sys. Eng. 7, 31-48 (1983).
13. T. Nishikawa and S. Shimizu, The properties of a biased estimator applied to GMDH, Memoirs of Metro. C. Tech. 12, 130-145 (1983).
14. W.J. Hemmerle, An explicit solution for generalized ridge regression, Technometrics 17, 309-314 (1975).
15. T. Nishikawa and S. Shimizu, An analysis of the generalized ridge estimator in the adaptive GMDH, J. Soc. Sys. Eng. 9, 53-66 (1985).