Neural network approach for solving nonsingular multi-linear tensor systems

Neural network approach for solving nonsingular multi-linear tensor systems

Journal Pre-proof Neural network approach for solving nonsingular multi-linear tensor systems Xuezhong Wang, Maolin Che, Yimin Wei PII: DOI: Referenc...

2MB Sizes 0 Downloads 81 Views

Journal Pre-proof Neural network approach for solving nonsingular multi-linear tensor systems Xuezhong Wang, Maolin Che, Yimin Wei

PII: DOI: Reference:

S0377-0427(19)30574-6 https://doi.org/10.1016/j.cam.2019.112569 CAM 112569

To appear in:

Journal of Computational and Applied Mathematics

Received date : 24 November 2018 Revised date : 6 May 2019 Please cite this article as: X. Wang, M. Che and Y. Wei, Neural network approach for solving nonsingular multi-linear tensor systems, Journal of Computational and Applied Mathematics (2019), doi: https://doi.org/10.1016/j.cam.2019.112569. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Elsevier B.V. All rights reserved.

Journal Pre-proof

Maolin Che†

Yimin Wei‡

p ro

Xuezhong Wang∗

of

Neural Network Approach for Solving Nonsingular Multi-linear Tensor Systems

Abstract

Pr e-

The main propose of this paper is to develop two neural network models for solving nonsingular multi-linear tensor system. Theoretical analysis shows that each of the neural network models ensure the convergence performance. For possible hardware implementation of the proposed neural network models, based on digital circuits, we adopt the Euler-type difference rule to discretize the corresponding Gradient neural network (GNN) models. The computer simulation results further substantiate that the models can solve a multi-linear system with nonsingular tensors.

Key words: Multi-linear systems; Nonsingular tensors; neural networks; Activation function; Convergence.



urn

al

AMS subject classifications: 15A18, 15A69, 65F15, 65F10

Jo

E-mail: [email protected]. School of Mathematics and Statistics, Hexi University, Zhangye, 734000, P. R. of China. This author is supported by the National Natural Science Foundation of China under grant 11771099. Natural Science Foundation of Gansu Province and Innovative Ability Promotion Project in Colleges and Universities of Gansu Province 2019B-146. † E-mail: [email protected] and [email protected]. School of Economic Mathematics, Southwest University of Finance and Economics, Chengdu, 611130, P. R. of China. This author is supported by the Fundamental Research Funds for the Central Universities under grant JBK1801058. ‡ Corresponding author (Y. Wei). E-mail: [email protected] and [email protected]. School of Mathematical Sciences and Shanghai Key Laboratory of Contemporary Applied Mathematics, Fudan University, Shanghai, 200433, P. R. of China. This author is supported by the National Natural Science Foundation of China under grant 11771099 and Shanghai Municipal Education Commission.

1

Journal Pre-proof

1

Introduction

of

Let R and C be the real field and complex field, respectively. We denote the set of all morder n-dimensional real tensors by R[m,n] . For a given A ∈ R[m,n] and b ∈ Rn , consider the multi-linear system [34, 44] Axm−1 = b, (1.1) with x ∈ Cn , where the ith entry of Axm−1 ∈ R[m,n] is given as [33]: (Ax

n X

)i =

aii2 ,...,im xi2 . . . xim , i = 1, 2, . . . , n.

i2 ,...,im =1

(1.2)

p ro

m−1

Pr e-

Multi-linear systems of the form (1.1) arise in a number of applications, such as the numerical partial differential equations, the data mining, and the tensor complementarity problems [7, 21, 26, 2]. Han [14] gave a method to partially symmetrize A = (ai1 i2 ...im ) ∈ R[m,n] with respect to the indices i2 . . . im . In detail, the partially symmetrized tensor A = (ai1 i2 ...im ) as follows ai1 i2 ...im =

1 (m − 1)!

X

ai1 π(i2 ...im ) ,

π(i2 ...im )

Jo

urn

al

where the sum is over all the permutations π(i2 . . . im ). For any A ∈ R[m,n] , we can get a partially symmetrized tensor A ∈ R[m,n] such that Axm−1 = Axm−1 , by an averaging procedure. Recently, many researchers focus on the multi-linear systems with some structured tensors. For example, Ding and Wei [7] proved that the system (1.1) has a unique positive solution when A is a nonsingular M-tensor and b is a positive vector, and extended some basic iterative methods including the Jacobi and Gauss-Seidel methods from matrices to tensors. Li et al. [18] proposed a Newton-Gauss-Seidel method for the system (1.1) with A being a symmetric tensor. Liu et al. [25] proposed some tensor splitting algorithms for solving multi-linear systems when A is a strong M-tensor. Han [14] attained a homotopy method and Wang, Che and Wei proposed neural networks [41] for finding the unique positive solution of (1.1) with A being a nonsingular M-tensor. Based on the rank-one approximation of symmetric M-tensor A, Xie et al. [48] considered a new tensor method to solve the multi-linear system (1.1). He et al. [15] showed that solving the multi-linear system with a M-tensor is equivalent to solving the systems of nonlinear equations by involving P -functions, and proposed a Newton-type method to solve this system. Lv and Ma [27] derived a Levenberg-Marquardt method for solving (1.1) with A being a semi-symmetric tensor. Du et al. [8] proposed an inexact Levenberg-Marquardt method for solving the tensor absolute equation. Li et al. [22] developed a hybrid alternating projection algorithm for solving three-order tensor equations. Li et al. [20] gave some spectral radius comparisons between two different iterative tensors for solving multi-linear systems by tensor splitting iterative methods. Xu et al. [49] provided a way to get a solution of the complementarity problem via solving a system of lower dimensional tensor equations. Yan et al. [50] introduced a new class of so-named Z + -tensor and employed a Levenberg-Marquardt algorithm to find 2

Journal Pre-proof

urn

al

Pr e-

p ro

of

a solution to generalized tensor equations. Ling et al. [24] studied the solutions existence of some classes of tensor absolute equations with the help of degree theory and employed the generalized Newton method to find a solution of tensor absolute equations. The known results for solving multi-linear equations focus on the structural tensors, such as nonsingular M-tensors [7, 8, 14, 15, 25, 48], symmetric tensors [18], Z + -tensors [50] and semi-symmetric tensors [27]. However, there are few results on solving multi-linear equations with nonsingular tensors. The main difficulty and challenge is that we can not guarantee that matrix Ax(t)m−2 is positive definite matrix if we use the existing methods under the case of nonsymmetric tensors. Hence, our first motivation is how to solve multi-linear equations with nonsingular tensors. The system of linear equations can be viewed as a fundamental problem and broadly applied in many domains [47, 57, 55], such as mathematics, control systems, robotic kinematics and so on. The gradient neural network (GNN) has now been regarded as a powerful alternative for online computation [53], such as the matrix inversion [56], inner inverse and the Drazin inverse [37, 38, 43], linear and nonlinear equations solving [55, 12], in view of its high speed processing nature and its convenience of hardware implementation in practical applications [9, 36]. The interested readers can refer to [4, 35, 40, 47, 51, 54, 55] for solving the system of linear equations via gradient neural networks (GNNs). In addition, neural networks are also widely used in modeling hydrogen production [1], river forecasting [39], predict evaporation [31], singular spectrum analysis [46], estimating and optimizing the parameters that affect the yield and cost of biodiesel production [32], etc. To the best of our knowledge, there exists few research results on solving the multi-linear systems via GNNs. Thus, our second motivation is to propose different GNN models for solving multi-linear systems with some nonsingular tensors by involving different error-monitoring functions. It is theoretically proved that the state of the GNN models converges to the theoretical solution of the multi-linear systems. The convergence analysis shows that we do not need the matrix Ax(t)m−2 to be positive definite, we just need its nonsingular matrix in proposed models. Through illustrative computer-simulation examples, the efficacy and the superiority of the proposed GNN models are well-verified. At the end of this introductory section, the main contributions of this paper are listed as follows. (1) By means of different types of activation functions, we present two types of GNNs for computing the nonsingular multi-linear systems.

Jo

(2) For some nonsingular tensors A, we show that each equilibrium point of model (3.3) or model (3.9) is a solution for the system of equations (1.1), and prove that the solution of model (3.3) or model (3.9) converges to the theoretical solution for the multi-linear system. (3) We propose neural network methods for solving tensor equations. These methods reduce the coefficient tensor with symmetric tensors, semi-symmetric tensors, M-tensors, to general nonsingular tensors. (4) For digital hardware implementation and numerical algorithm development, we adopt the 3

Journal Pre-proof

Euler-type difference rule to discretize the GNN models (3.3) and (3.9) and prove that the discretization schemes are zero-stable.

of

(5) Computer simulation results illustrate that the GNN model with nonlinear activation functions are much more efficient more than the GNN model with the linear activation function for computing the multi-linear systems with nonsingular tensors.

Preliminaries

Pr e-

2

p ro

This paper is organized as follows. In Section 2, we recall some preliminary results. GNN models with different activation functions for the solutions of the multi-linear systems are presented in Section 3. By utilizing the Euler-type difference rule, we present two discretization schemes, discrete nonlinear gradient neural network (DNGNN) and discrete linear gradient neural network (DLGNN) for solving multi-linear systems in Section 4. Convergence properties of the presented neural network models will be discussed in Section 5. Illustrative numerical examples are presented in last section.

Recently, eigenvalues of tensors (hyper-matrices) have been independently introduced in [23, 33] and viewed as the generalization of the well-known definition of matrix eigenvalues. Definition 2.1. ([16]) A pair (λ, x) ∈ C × (Cn \{0}) is called an eigenvalue-eigenvector (or eigenpair) of A ∈ Rm,n , if it satisfies the system of equations Axm−1 = λx[m−1] ,

al

, . . . , xm−1 )> . Let ρ(A) denote the spectral radius of tensor A, that , xm−1 where x[m−1] = (xm−1 n 2 1 is, ρ(A) = max{|λ| : λ is an eigenvalue of A}.

urn

Definition 2.2. ([16, Definition 1.2]) The determinant of m-th order n-dimensional tensors denoted by DET, is defined as the irreducible polynomial with variables Vi1 ...im , ik = 1, 2, . . . , n; k = 1, 2, . . . , m, such that it is the resultant of the polynomial system Vxm−1 = 0,

Jo

where V ∈ Rm,n . Moreover, the value of the determinant for a given tensor A ∈ Rm,n is denoted by det(A), and is defined as the evaluation of DET at the point {Vi1 ...im = Ai1 ...im : ik = 1, 2, . . . , n; k = 1, 2, . . . , m}. For A ∈ Rm,n , if det(A) 6= 0, then A is said to be nonsingular. Lemma 2.1. ([16]) The product of all the eigenvalues of tensor A is equal to det(A), that is, n(m−1)n−1

Y

det(A) =

i=1

4

λi .

Journal Pre-proof

It is easy to prove the following lemma by Theorem 3.1 in [16]. Lemma 2.2. Let A ∈ R[m,n] . If det(A) 6= 0, then for any b ∈ Rn , A ∈ Rn×n , and B j ∈ C[j,n] for j = 3, . . . , m − 1, Axm−1 = (B m−2 )xm−2 + . . . + (B 3 )x2 + Ax + b has a solution in Cn .

of

Remark 2.1. Note that if all the entries of B j and A in Lemma 2.2 are zero, then when det(A) 6= 0, Axm−1 = b has a solution in Cn .

p ro

Definition 2.3. ([6]) Let A = (Ai1 i2 ...im ) be an m-order and n-dimensional tensor. We call another tensor hAi = (mi1 i2 ...im ) as the comparison tensor of A if  |Ai1 i2 ...im | (i1 i2 . . . im ) = (i1 i1 . . . i1 ), mi1 i2 ...im = −|Ai1 i2 ...im | (i1 i2 . . . im ) 6= (i1 i1 . . . i1 ). Definition 2.4. ([6]) We call a tensor A is an H-tensor, if its comparison tensor is an Mtensor; we call it as a nonsingular H-tensor, if its comparison tensor is a nonsingular M-tensor.

Pr e-

Lemma 2.3. ([6]) If A is a nonsingular H-tensor, then there exists a positive vector x such that hAixm−1 > 0. The following lemma shows this partial symmetrization preserves the nonsingular tensor structure and matrix Axm−2 is a nonsingular symmetric matrix. This results will be used in Scetion 5. Lemma 2.4. [14] If A ∈ R[m,n] is a nonsingular M-tensor, then there exists a nonzero vector x ∈ Cn such that Axm−1 6= 0, and then matrix Axm−2 is a nonsingular symmetric matrix.

al

Lemma 2.5. Let A ∈ R[m,n] is a nonsingular H-tensor, then there exists a nonzero vector x ∈ Cn such that Axm−1 6= 0, and then matrix Axm−2 is a nonsingular symmetric matrix.

urn

Proof. Since hAi is a nonsingular M-tensor, together with Definition 2.4 and Lemma 2.3, there exist nonzero vector x ∈ Cn such that hAixm−1 > 0, it follows from hAixm−1 = hAixm−1 . Similar to the proof of Theorem 2.1 in [14], we can obtain Axm−2 is nonsingular.

Jo

The entries of any matrix-valued activation function F(E) with E = (eij ) ∈ Rm×n , is defined as (f (eij )) with i = 1, 2, . . . , m and j = 1, 2, . . . , n, where f is a scalar-valued monotonicallyincreasing odd function. In general, any monotonically increasing odd activation function f can be used for the construction of the neural networks [42]. As shown in [19, 56], the convergence rate can be improved thoroughly by an appropriate activation function. The influence of various nonlinear activation functions was investigated for different neural network models. In this paper, we introduce the following four types of monotonically increasing odd activation functions: ♦ a linear function is defined as flin (x) = x,

5

Journal Pre-proof

♦ a bipolar-sigmoid function with q > 2 is defined as fbs (x, q) =

1 + exp(−q) 1 − exp(−qx) , 1 − exp(−q) 1 + exp(−qx)

of

♦ a power-sigmoid function with q ≥ 2 and p ≥ 3 is defined as ( xp , if |x| ≥ 1, fps (x, p, q) = 1+exp(−q) 1−exp(−qx) 1−exp(−q) 1+exp(−qx) , otherwise,

p ro

and

♦ a smooth power-sigmoid function with p ≥ 3 and q > 2 is defined as 1 1 + exp(−q) 1 − exp(−qx) fsps (x, p, q) = xp + . 2 1 − exp(−q) 1 + exp(−qx)

3

Pr e-

There are many kinds of activation functions in the neural network model. We only consider above four monotonically increasing odd activation functions which are needed in the convergence of our neural network models.

Gradient neural network models

3.1

al

The error monitoring function is designed for deriving a GNN. Specifically, by defining different error monitoring functions and activation functions, different GNNs can be obtained to solve multi-linear systems with nonsingular tensors. In this section, we construct two gradient neural network models, one is nonlinear gradient neural network models (NGNN), the other is linear gradient neural network model (LGNN-I), respectively.

LGNN-I and NGNN models

urn

Now we are ready for the design of our neural network model for computing multi-linear systems with nonsingular tensors. Let E1 (x(t)) = Ax(t)m−1 − b.

Following the gradient-descent design method [5], we define an error monitoring function as

Jo

1 kAx(t)m−1 − bk22 ε1 (t) = ε1 (x(t)) = kE1 (t)k22 = . 2 2

In order to force ε1 (t) to converge to zero, the negative of the gradient (i.e., -∂(ε1 (t))/∂x) is used as the descent direction, which leads to the so-called GNN design formula in the form of a first-order differential equations: dx ∂(ε1 (t)) = −γ , (3.1) dt ∂x 6

Journal Pre-proof

Pr e-

p ro

of

where γ is a positive scaling constant. Note that γ corresponds to the reciprocal of a capacitance parameter, of which the value should be set as large as the hardware would permit, or appropriately large for modeling and experimental purposes. By adopting GNN design formula (3.1), the differential equations can thus be established as follows: dx = −α(Ax(t)m−2 )> (Ax(t)m−1 − b), (3.2) dt with α = γ(m − 1) > 0, which should be set to be as large as the hardware permits. The dynamic equations (3.2) will be termed the linear gradient neural network model I (simplified LGNN-I). By exploiting a matrix-valued nonlinear activation function F(·), we obtain the following nonlinear GNN model: dx = −α(Ax(t)m−2 )> F(Ax(t)m−1 − b), (3.3) dt The dynamic equations (3.3) will be termed the nonlinear gradient neural network (simplified NGNN) model. Remark 3.1. Note that if we take matrix-valued nonlinear activation function as linear function, then NGNN model (3.3) reduces to LGNN-I model (3.2).

3.2

al

Note that there are only transposed terms of matrix Ax(t)m−2 in presented models. Another feasible alternative on defining an error monitoring function is to deform the equation with the same solution, for example, Li et al. [18] multiply x on both side of Ax(t)m−1 = b, which leads to the Newton-like methods. As we all know, matrix inversion requires a lot of overhead, especially for large-scale computing problems. Therefore, our model outperforms Newton-like methods in computational overhead. The block diagram of model (3.3) for online solving multi-linear systems illustrate in Figure 1.

LGNN-II model

urn

In order to monitor and control the solution process of the system of nonlinear equations (1.1), we recall the solution of large linear systems of the form Ax = b,

(3.4)

Jo

with A ∈ Rn×n and b ∈ Rn . It is widely recognized that preconditioning is the most critical ingredient in the development of efficient solvers for challenging problems in scientific computation, and that the importance of preconditioning is destined to increase even further. It is well known that the preconditioning refers to transforming the system (3.4) into another system with more favorable properties for iterative solution [3].

7

p ro

of

Journal Pre-proof

Figure 1: Block diagram of model (3.3) for online solving multi-linear systems.

Pr e-

If M ∈ Rn×n is a nonsingular matrix that approximates A (in some sense), then the linear system M −1 Ax = M −1 b, (3.5)

al

has the same solution as (3.4), would be easier to solve. Here M is called the preconditioner. For given nonzero y ∈ Rn , let D(y) = diag(y), notice that y may contain zero components, so D(y) may be a singular matrix. Hence, we let D(y) = diag(y) + λB, where B is a nonsingular matrix, since there exist λ > 0 such that D(y) is nonsingular. This replacement of a singular matrix by a well-conditioned matrix is known as the Tikhonov regularization method introduced in [10]. However, how to choose matrix B is a difficult problem, a simple way to do this is to choose B as I, i.e. let D(y) = diag(y) + λI. Hence the system of nonlinear equations (1.1) is equivalent to the following system of equations D(y)Axm−1 = D(y)b.

(3.6)

urn

In particular, we assume that y = x. On the basis of these considerations, similar to E1 (x(t)), we define E2 (t) = D(x)Ax(t)m−1 − D(x)b (3.7)

Jo

Following the gradient-descent design method [5], we define another error monitoring function as 1 kD(x)Ax(t)m−1 − D(x)bk22 ε2 (t) = ε2 (x(t)) = kE2 (t)k22 = . 2 2 Note that the minimal value ε2 (t) = ε2 (x(t)) = 0 of the residual-error function ε2 (t) is achieved in a minimizer x = x(t) if and only if x(t) is the exact solution of (3.6). A typical computational scheme is defined along a descent direction of ε2 (x(t)), until the minimum ε2 (t) is reached.

8

Journal Pre-proof

The typical descent direction of ε(x(t)) is the negative gradient −∂ε2 (x(t))/∂x of ε2 (x(t)), where ∂ε2 (x(t))/∂x is defined as

Combining

dx(t) ∂ε2 (x(t)) = −β dt ∂x

p ro

and (3.8), we have

(3.8)

of

>  ∂ε2 (x(t)) = diag(Ax(t)m−1 − b) + (m − 1)D(x)Axm−2 D(x)Ax(t)m−1 − D(x)b . ∂x

>  dx(t) D(x)Ax(t)m−1 − D(x)b , (3.9) = −β diag(Ax(t)m−1 − b) + (m − 1)D(x)Ax(t)m−2 dt

with t ∈ [0, +∞), where β is a positive scaling constant, which should be set to be as large as the hardware permits. For the convenience, let

Pr e-

B(x) = diag(Ax(t)m−1 − b) + (m − 1)D(x)Ax(t)m−2 .

(3.10)

Thus, the dynamic equations (3.9) will be rewritten as

 dx(t) = −βB(x)> D(x)Ax(t)m−1 − D(x)b . dt

(3.11)

4

al

The dynamic equations (3.11) will be termed the linear gradient neural network model II (simplified LGNN-II). The block diagram of model (3.11) for online solving multi-linear systems is similar to model (3.3), so we omit it.

Discrete neural network model

4.1

urn

For possible hardware implementation of the neural network models, based on digital circuits, in this section, we adopt the Euler-type difference rule to discretize the GNN models (3.3) and (3.11).

Basic definitions

Jo

The following definitions [11] are provided to measure the computational performance of the descretization schemes of (3.3) and (3.11) descretized by the Euler-type difference rule. Definition 4.1. ([11]) For the N-step algorithm depicted in xk+1 +

N X i=1

9

αi xk+1−i ,

Journal Pre-proof

its zero-stability can be verified by determining the roots of the characteristic polynomial P (v) = v N +

N X

αi v N −i .

i=1

of

If the roots of P (v) = 0 are such that: 1) all roots lie in the unit disk, i.e., |v| ≤ 1 and 2) any roots on the unit circle (i.e., |v| = 1) are simple; then, the N-step algorithm is zero-stable. That N P is, the algorithm is zero-stable if all solutions for xk+1 + αi xk+1−i . i=1

p ro

Definition 4.2. ([11]) An N-step algorithm is considered consistent with order p if its truncation error is O(τ p ) with p > 0 for the smooth exact solution.

Definition 4.3. ([11]) An N-step algorithm is considered convergent, i.e., x[(t−t0 )/τ ] → x∗ (t), for all t ∈ [t0 , tf inal ], as τ → 0, if and only if such an algorithm is zero-stable and consistent. That is, zero stability and consistency lead to convergence. In particular, a zero-stable and consistent algorithm converges with the order of its truncation error.

4.2

Pr e-

Note that in this section, we use the computer science interpretation of O(·) to refer to the class of functions whose growth is bounded and below up to a constant.

DNGNN and DLGNN models

 xk+1 = xk − h2 B(x)> D(x)Ax(t)m−1 − D(x)b ,

urn

and

al

To discretize the GNN models (3.3) and (3.11), the following Euler-type difference rule [28] is used: dxk dx(t = kτ ) x((k + 1)τ ) − x(kτ ) = = , (4.1) dt dt τ with k = 1, 2, . . . , kmax , where τ > 0 denotes the sampling time and kmax is the maximum iteration step. Hereafter, we denote xk = x(t = kτ ) for the convenience [29]. For the GNN models (3.3) and (3.11), via (4.1), we obtain the following Euler-type discretetime schemes: (4.2) )> F(Axm−1 − b), xk+1 = xk − h1 (Axm−2 k k (4.3)

where h1 = τ α > 0 ∈ R and h2 = τ β > 0 ∈ R, B(x) is defined by (3.10). For (4.2) and (4.3), the value of τ should be sufficiently low to satisfy the typical precise solution required in practice. The dynamic equations (4.2) and (4.3) will be termed the discrete nonlinear gradient neural network model and discrete linear gradient neural network model, respectively. For convenience, we use DNGNN and DLGNN to denote (4.2) and (4.3), respectively.

Jo

Remark 4.1. We can adopt Taylor-type difference rule to discretize the GNN models. The discretizations of Taylor-type difference rule are better than the discretization of Euler-type difference rule for time-varying problem, which have been shown in [12]. However, for the timeinvariant problem, the difference between the numerical results of the two methods is not obvious. Therefore, we choose Euler-type difference rule to discretize the GNN models in this paper. 10

Journal Pre-proof

Theorem 4.1. The proposed DNGNN (4.2) and DLGNN (4.3) algorithms are zero-stable.

P (v) = v − 1.

of

Proof. From the prove of Theorem 2 in [13], we know the characteristic polynomial is, for the proposed DNGNN (4.2) and DLGNN (4.3),

p ro

The characteristic polynomial P (v) = 0 has one root, which lie in the unit disk, thereby showing that the proposed DNGNN (4.2) and DLGNN (4.3) algorithms are zero-stable. We complete our proof. Theorem 4.2. The proposed DNGNN (4.2) and DLGNN (4.3) algorithms are consistent and convergent. Each of them converges with the order of its truncation error being O(τ 2 ) for all tk ∈ [t0 , tf inal ]. Proof. In terms of (4.1), we obtain the following equation:

Pr e-

dxk xk+1 − xk = + O(τ 2 ). dt τ

(4.4)

If we use (4.4) to discretize (3.3) and (3.11), then we further obtain the following equation: − b) + O(τ 2 ), xk+1 = xk − h1 (Axm−2 )> F(Axm−1 k k and

 − D(x)b + O(τ 2 ), xk+1 = xk − h2 B(x)> D(x)Axm−1 k

(4.5) (4.6)

5

urn

al

where B(x) is defined by (3.10). When O(τ 2 ) is dropped from the preceding equation, the proposed DNGNN (4.2) and DLGNN (4.3) algorithms for solving multi-linear system with nonsingular tensor are derived, thereby further showing that the truncation error of DNGNN (4.2) and DLGNN (4.3) are O(τ 2 ). Thus, DNGNN (4.2) and DLGNN (4.3) algorithms for solving multi-linear system with nonsingular tensor are consistent. Together with Theorem 4.1, the proposed DNGNN (4.2) and DLGNN (4.3) algorithms are both zero-stable and consistent. On the basis of Definitions 4.1 and 4.2, we further conclude that the proposed algorithms are consistent and convergent given that each of them converges with the order of O(τ 2 ) for all tk ∈ [t0 , tf inal ]. Thus, the proof is completed.

Convergence analysis

Jo

We now recall some useful definitions. An autonomous dynamical system is described by the ordinary differential equations: dx(t) = f (x(t)), dt

x(t0 ) ∈ Rn ,

(5.1)

where x ∈ Rn is the state vector. As shown in [52], if f : Rn → Rn is continuous differentiable, then the solution of (5.1) exists for t > t0 and is unique for some given initial condition x(t0 ). 11

Journal Pre-proof

Definition 5.1. ([52]) If f (x∗ ) = 0, then x∗ is called an equilibrium point or fixed point of (5.1).

of

Definition 5.2. ([52]) An equilibrium point x∗ of (5.1) is stable if for every neighborhood U of x∗ , there exists a neighborhood U1 ⊂ U such that every solution x(t) with x(t0 ) ∈ U1 lies in U for every t > t0 .

5.1

Convergence of NGNN models

p ro

Definition 5.3. ([52]) If in addition to the property of Definition 5.2, U1 can be chosen such that limt→∞ x(t) = x∗ for every x(t0 ) ∈ U1 , then x0 is said asymptotically stable.

Here we focus on a particular case that an equilibrium point of (3.3) is isolated. Let S denote the solution set of the system of equations (1.1). For any x ∈ S, we have E1 (x) = 0. Furthermore, for any matrix-valued activation function F(·), we also have F(E1 (x)) = 0.

Pr e-

Remark 5.1. Notice that the number of solutions of tensor equations is not unique, we only need to find a solution of the tensor equations. If we consider the non isolated equilibrium point of (3.3), the state variables generated by the neural network models will tend to multiple equilibrium points, it is difficult to analyze the convergence of models. Hence, we consider isolated equilibrium point of (3.3). Theorem 5.1. Every solution to the system of equations (1.1) is an equilibrium point of (3.3). Conversely; if x ∈ Rn is an equilibrium of (3.3) with Axm−2 is nonsingular, then x ∈ S.

al

Proof. We only need to address the second part of this theorem. If x is a stationary point of (3.3), since Axm−2 is nonsingular, together with F(·) is odd and monotonically increasing function, we have Axm−1 = b. Thus, a stationary point of (3.3) is a solution for (1.1).

urn

Suppose that x∗ ∈ Rn satisfies Axm−1 = b. Let Axm−2 be nonsingular, that is, the eigenval∗ ∗ m−2 are nonzero. As shown in [45], there exists a δ > 0 such that Axm−2 is nonsingular ues of Ax∗ with kx − x∗ k2 ≤ δ. For this δ, we define a neighbourhood of x∗ as B(x∗ ; δ) := {x ∈ Rn | kx − x∗ k2 ≤ δ}.

We have the following result of the convergence of the NGNN model.

Jo

Theorem 5.2. Let A ∈ R[m,n] be nonsingular. Suppose that x∗ ∈ Rn satisfies Axm−1 = b. If ∗ Axm−2 is nonsingular, then starting from the initial state x(0) ∈ B(x ; δ), the state x(t) of (3.3) ∗ ∗ converges to x∗ . Proof. We construct the following Lyapunov function: L1 (t) =

kE1 (t)k22 E1 (t)> E1 (t) = ≥ 0, 2 2 12

(5.2)

Journal Pre-proof

with its time derivative being = 12 (E1 (t)> E1 (t)) = E1 (t)> dEdt1 (t)  = (m − 1) (Axm−1 − b)> Axm−2 dx dt  = −(m − 1)α (Axm−1 − b)> Axm−2 (Axm−2 )> F(Axm−1 − b)  = −(m − 1)αTr Axm−2 (Axm−2 )> F(Axm−1 − b)(Axm−1 − b)> .

(5.3)

of

dL1 (t) dt

p ro

Note that for all x ∈ B(x∗ ; δ), Axm−2 is nonsingular. Then matrix H = Axm−2 (Axm−2 )> is symmetric positive definite, which implies that     λmin Tr F(Ax(t)m−1 − b)(Ax(t)m−1 − b)> ≤ Tr HF(Ax(t)m−1 − b)(Ax(t)m−1 − b)>   ≤ λmax Tr F(Ax(t)m−1 − b)(Ax(t)m−1 − b)> ,

Pr e-

where λmin and λmax denote the smallest and largest eigenvalues of matrix H, respectively. Since f (·) is an odd and monotonically increasing function, then we have f (−x) = −f (x) and   > 0, if x > 0, = 0, if x = 0, f (x)  < 0, if x < 0,

which implies

xf (x)

It follows that 

m−1

m−1

− b)(Ax(t)

al

Tr F(Ax(t)



> 0, if x 6= 0, = 0, if x = 0.

  > 0, if Ax(t)m−1 − b 6= 0, − b) = 0, if Ax(t)m−1 − b = 0. >

urn

Due to the fact that the design parameter satisfies α > 0, in view of (5.3), it follows that  dL1 (t) < 0, if Ax(t)m−1 − b 6= 0, = 0, if Ax(t)m−1 − b = 0. dt By the Lyapunov theory, residual error E1 (t) = Ax(t)m−1 − b can converge to zero. Equivalently speaking, the vector x(t) of NGNN model (3.3) is asymptotic stable at x∗ .

Jo

Remark 5.2. Note that we need Axm−2 is nonsingular in Theorem 5.2, in fact, this condition ∗ is satisfies for some nonsingular tensors based on Lemma 2.4 and 2.5.

13

Journal Pre-proof

5.2

Convergence of LGNN-II model

Similar to the case of NGNN models, if x ∈ S, then E2 (x) = 0. Together with f (·) is an odd and monotonically increasing function, we have F(E2 (x)) = 0.

of

Theorem 5.3. Every solution to the system of equations (1.1) is an equilibrium point of (3.9). Conversely, if x ∈ Rn is an equilibrium of (3.9) and diag(Axm−1 − b) + (m − 1)D(x)Axm−2 is nonsingular with D(x) = diag(x) + λI, then x ∈ S. Proof. Analogous to the proof of Theorem 5.1, we can prove this Theorem, so we omit it.

p ro

Suppose that x∗ ∈ Rn satisfies Axm−1 = b. Let B(x∗ ) be nonsingular in (3.10). As shown ∗ in [45], for a given λ > 0, there exists a δ > 0 such that B(x) is nonsingular with kx − x∗ k2 ≤ δ. For this δ, we define a neighbourhood of x∗ as B(x∗ ; δ) := {x ∈ Rn | kx − x∗ k2 ≤ δ}.

Pr e-

We have the following result of the convergence of the LGNN-II model. = b. For a Theorem 5.4. Let A ∈ R[m,n] be nonsingular. Suppose that x∗ ∈ Rn satisfies Axm−1 ∗ given λ > 0, if B(x∗ ) in (3.10) is nonsingular, then starting from the initial state x(0) ∈ B(x∗ ; δ), the state x(t) of (3.9) converges to x∗ . Proof. We construct the following Lyapunov function:

1 E2 (t)> E2 (t) L2 (t) = kE2 (t)k22 = ≥ 0, 2 2 with its time derivative being

1 dE2 (t) (E2 (t)> E2 (t)) = E2 (t)> 2 dt dx = (D(x)Ax(t)m−1 − D(x)b)> B(x) dt  =

al

dL2 (t) dt

(5.4)

urn

 = −β (D(x)Ax(t)m−1 − D(x)b)> B(x)B(x)> (D(x)Ax(t)m−1 − D(x)b)   = −βTr B(x)B(x)> (D(x)Ax(t)m−1 − D(x)b)(D(x)Ax(t)m−1 − D(x)b)> .

Jo

Note that for all x ∈ B(x∗ ; δ), B(x) is nonsingular in (3.10), which implies that B(x)B(x)> is symmetric positive definite. Analogous to the proof of Theorem 5.2, we have  dL2 (t) < 0, if D(x)Ax(t)m−1 − D(x)b 6= 0, = 0, if D(x)Ax(t)m−1 − D(x)b = 0. dt By the Lyapunov theory, residual error E2 (t) = D(x)Ax(t)m−1 − D(x)b converges to zero. Equivalently speaking, the vector x(t) of (3.9) is asymptotic stable at x∗ . 14

Journal Pre-proof

6

An application

of

As shown in the recent paper, one specific application of (1.1) is the following problem [2] ( − max Lλ U − ηU − 21 αγ 2 U + βγ = 0, on Ω, (γ,λ)∈(Γ,Λ) (6.1) U = g, on ∂(Ω), where Lλ U (x) is the (possibly degenerate) elliptic operator

p ro

1 Lλ U (x) = σ(x, λ)2 U 00 (x) + u(x, λ)U 0 (x), 2 and Ω = (0, 1),

Γ = [0, ∞),

and Λ is a compact metric space.

We discretize (6.1) by using an “optimize then discretize” (OD) approach. Thus, we can attain a 3-order Bellman equation min A(λ)u2 = b,

Pr e-

+1 λ∈ΛM 4x

which can be regarded as a higher order generalization of [17], where  ai,i−1,i (λ) = ai,i,i−1 (λ),    1 1 2 1  2a  i,i,i−1 (λ) = − 2 σi (λi ) (4x)2 + ui (λi ) 4x 1(−∞,0) (ui (λi )),    a (λ) = 1 σ (λ )2 2 + |u (λ )| 1 + η , i i 4x i i,i,i 2 i i (4x)2 1 1 1 2  2ai,i,i+1 (λ) = − 2 σi (λi ) (4x)2 − ui (λi ) 4x 1(0,+∞) (ui (λi )),     a (λ) = ai,i,i+1 (λ),    i,i+1,i ai,i,i (λ) = 1,

(6.2)

if if if if if if

0 < i < M, 0 < i < M, 0 < i < M, 0 < i < M, 0 < i < M, i = 0, M,

al

where 1(−∞,0) is defined as

1(−∞,0) (x) =

(

1, 0,

x ∈ (−∞, 0); x∈ / (−∞, 0).

7

urn

Lastly, one can define the vector b = (b0 , . . . , bM )> by ( 2 1 βi 2 αi , if 0 < i < M, bi = gi2 , if i = 0, M.

Numerical examples

Jo

In this section, numerical and comparative results with some illustrative examples are provided to substantiate the efficacy and superiority of the proposed neural network models. The ordinary differential equation solver engaged is ode45. All computations are carried out in Matlab Version 2014a, which has a unit roundoff 2−53 ≈ 1.1×10−16 , on a laptop with Intel Core(TM) i5-4200M CPU (2.50GHz) and 7.89GB RAM. 15

(a) NGNN with m = 3, n = 10, α = 1 × 108 .

p ro

of

Journal Pre-proof

(b) LGNN-II with m = 3, n = 10, β = 1 × 108 .

7.1

Pr e-

Figure 2: Comparison of the NGNN and LGNN-II for Example 7.1.

NGNN and LGNN-II models

We assume that λ = 1 × 10−6 and take residual error as RES = kAx(t)m−1 − bk2 .

(7.1)

Note that in all figures, blue stars, pink pluses, black triangles and green circles represent the results when the activation function is linear function, the power-sigmoid function, the bipolar-sigmoid function and the smooth power-sigmoid function, respectively.

al

Example 7.1. Let B ∈ R10×10×10 be a nonnegative tensor with

urn

bi1 i2 i3 = tan(i1 + i2 + i3 ),

then A = 1500I − B is a symmetric nonsingular H-tensor [6, 7].

Jo

Choose b such that the exact solution of Axm−1 = b is x∗ = (10, 10, . . . , 10)> ∈ R10 . Let x0 = (2, 2, . . . , 2)> ∈ R10 be an initial value. Let p = 3 and q = 5 for the power-sigmoid function, q = 5 for the bipolar-sigmoid function, and p = 5 and q = 7 for the smooth powersigmoid function. Suppose that α = 1 × 108 . By the employment of NGNN, the trajectories of residual errors (7.1) are shown in graphs included in Figure 2 (a). By employing the LGNN-II, the trajectories of residual errors (7.1) is shown in graph included in Figure 2 (b). Taking different parameter α = 1 × 104 and α = 1 × 106 , we compare the trajectories of residual errors (7.1) by using the NGNN models, the test results show in Figure 3. Figure 3 show that the larger of the parameter α, the better of the convergence for NGNN models. 16

(a) NGNN with α = 1 × 104 .

p ro

of

Journal Pre-proof

(b) NGNN with α = 1 × 106 .

urn

al

Pr e-

Figure 3: Comparison of the NGNN with different α for Example 7.1.

(a) Smooth power-sigmoid function.

(b) Power-sigmoid function.

Figure 4: Sensitivity of the NGNN with parameter for Example 7.1.

Jo

We also tested the sensitivity of the NGNN model to parameters with α = 1 × 108 . Taking Power-sigmoid function and Smoothing power-sigmoid function as examples, we test the trajectories of errors under different parameters p and q. The obtained results show in Figure 4. From the Figure 4, we can easily find that the larger of the parameter value, the better of the convergence for NGNN models. Example 7.2. Let the entries of tensor B ∈ R[m,n] be the random values drawn from the standard 17

(b) LGNN-II with m = 3, n = 10, β = 1 × 108 .

al

Pr e-

(a) NGNN with m = 3, n = 10, α = 1 × 108 .

p ro

of

Journal Pre-proof

urn

(c) NGNN with m = 3, n = 20, α = 1 × 108 .

(d) LGNN-II with m = 3, n = 20, β = 1 × 108 .

Figure 5: Comparison of the NGNN and LGNN-II for Example 7.2.

uniform distribution on (0, 1) and compute the scalar s = (1 + ) max (Be2 )i , i=1,2,...,n

 > 0,

Jo

where e = (1, 1, . . . , 1)> ∈ Rn . As shown in [7], A = −sI + B is a nonsingular H-tensor.

Choose b such that the exact solution of Axm−1 = b is x∗ = (1, 1, . . . , 1)> ∈ Rn . Take the initial vector as x0 = (2, 2, . . . , 2)> ∈ Rn . Let p = 5 and q = 7 for the power-sigmoid function, q = 7 for the bipolar-sigmoid function, and p = 5 and q = 7 for the smooth power-sigmoid function. 18

(b) LGNN-II with m = 4, n = 5, β = 1 × 108 .

al

Pr e-

(a) NGNN with m = 4, n = 5, α = 1 × 108 .

p ro

of

Journal Pre-proof

(d) LGNN-II with m = 4, n = 10, β = 1 × 108 .

urn

(c) NGNN with m = 4, n = 10, α = 1 × 108 .

Figure 6: Comparison of the NGNN and LGNN-II for Example 7.2.

Jo

Suppose that α = 1 × 108 . For different m and n, by the employment of NGNN, the trajectories of residual errors (7.1) are shown in graphs included in Figure 2 (a) and (c) and Figure 5 (a) and (c), respectively; by employing the LGNN-II, the trajectories of residual errors (7.1) are shown in graphs included in Figure 2 (b) and (d) and Figure 5 (b) and (d), respectively. Example 7.3. Suppose that the entries of B ∈ R10×10×10 are given bi1 i2 i3 = sin(i1 + i2 + i3 ).

Then, A = sI − B is a nonsingular H-tensor [48]. 19

(b) LGNN-II m = 3, n = 50, β = 1 × 108 .

al

Pr e-

(a) NGNN with m = 3, n = 50, α = 1 × 108 .

p ro

of

Journal Pre-proof

(d) LGNN-II m = 3, n = 100, β = 1 × 108 .

urn

(c) NGNN with m = 3, n = 100, α = 1 × 108 .

Figure 7: Comparison of the NGNN and LGNN-II for Example 7.3.

Jo

Choose b such that the exact solution of Axm−1 = b is x∗ = (1, 1, . . . , 1)> ∈ R10 . Take the initial vector as x0 = (2, 2, . . . , 2)> ∈ R10 . Let p = 5 and q = 7 for the power-sigmoid function, q = 7 for the bipolar-sigmoid function, and p = 5 and q = 11 for the smooth power-sigmoid function. Suppose that α = β = 1 × 108 . By the employment of NGNN, for different m and n, the trajectories of residual errors (7.1) are shown in graphs included in Figure 7 (a) and (c), respectively; by employing the LGNN-II, the trajectories of residual errors (7.1) are shown in graphs included in Figure 7 (b) and (d), respectively.

20

Journal Pre-proof

Example 7.4. We compute a numerical solution of (6.1) under the parameterization σ(x, λ) = 0.2,

α(x) = 2 − x,

u(x, λ) = 0.04λ,

η(x) = 0.04,

β(x) = 1 + x,

g(x) = 1,

of

where λ = {−1, 1}. Since Λ is finite, we take Λ4x = Λ. From the discussion in [2], we know that A(λ) is a strictly diagonal dominant Z-tensor, and then, A(λ) is a nonsingular M-tensor.

al

Pr e-

p ro

Choose x = (1, 1, . . . , 1)> ∈ RM , we would like to employ NGNN models and LGNN-II model to compute a positive solution of A(λ)u2 = b. Take the initial vector as u0 = rand(M, 1). Let p = 3 and q = 5 for the power-sigmoid function, q = 5 for the bipolar-sigmoid function, and p = 5 and q = 7 for the smooth power-sigmoid function. Let M be chosen from {100, 200}. By the employment of NGNN, the trajectories of residual errors (7.1) are shown in graphs included in Figure 8 (a) and (c), respectively; by employing the LGNN-II, the trajectories of residual errors (7.1) are shown in graphs included in Figure 8 (b) and (d), respectively.

(b) LGNN-II with M = 100, β = 1 × 108 .

urn

(a) NGNN with M = 100, α = 1 × 108 .

Jo

We compare the NGNN and LGNN-II models with Newton method introduced in [7]. Take the initial vector as u0 = rand(M, 1) and α = β = 1 × 104 . Let p = 3 and q = 3 for the power-sigmoid function, q = 5 for the bipolar-sigmoid function, and p = 5 and q = 7 for the smooth power-sigmoid function and M = 200. The comparison results show in Figure 9. From Figure 9 (a), it is easy to see that the Newton method is better than the LGNN-I (NGNN with linear function) and the NGNN with the bipolar-sigmoid function or the smooth powersigmoid function or the power-sigmoid function under the considered parameters is better than the Newton method for solving multi-linear equations with nonsingular M-tensors. From Figure 9 (b), we can see that the Newton method and the LGNN-II almost simultaneously make the residual errors less than 1 × 10−8 . 21

(c) NGNN with M = 200, α = 1 × 1010 .

p ro

of

Journal Pre-proof

(d) LGNN-II with M = 200, β = 1 × 1010 .

urn

al

Pr e-

Figure 8: Comparison of the NGNN and LGNN-II for Example 7.4.

(a) NGNN with M = 200, α = 1 × 104 .

(b) LGNN-II with M = 200, β = 1 × 104 .

Figure 9: Comparison of the NGNN, LGNN-II and Newton method [7] for Example 7.4.

Jo

Example 7.5. We randomly generate a strictly diagonal dominant tensor A ∈ Rn×n×n×n . It is easy to show A is a nonsingular tensor. Choose b such that the exact solution of Axm−1 = b is x∗ = (1, 1, . . . , 1)> ∈ Rn . Take the initial vector as x0 = rand(n, 1). Let p = 5 and q = 7 for the power-sigmoid function, q = 7 for the bipolar-sigmoid function, and p = 5 and q = 11 for the smooth power-sigmoid function. Suppose that α = β = 1×106 . For n = 10 and n = 50, the trajectories of residual errors (7.1) 22

Journal Pre-proof

(b) LGNN-II

urn

al

(a) NGNN

Pr e-

p ro

of

by the employment of NGNN are shown in Figure 10 (a) and (c), respectively; the trajectories of residual errors (7.1) by employing the LGNN-II are shown in Figure 10 (b) and (d), respectively.

(c) NGNN

(d) LGNN-II

Figure 10: Comparison of the NGNN and LGNN-II for Example 7.5.

Jo

Example 7.6. We define a 3-order mode-1 upper triangular tensor A = (aijk ) with the elements hold aijk = 0, if i > max(j, k). For randomly generate a strictly diagonal dominant 3-order mode-1 upper triangular tensor A ∈ Rn×n×n . It is easy to show A is a nonsingular tensor. 23

Journal Pre-proof

(a) NGNN

Pr e-

p ro

of

Taking n = 10 and b such that the exact solution of Axm−1 = b is x∗ = (1, 1, . . . , 1)> ∈ Rn . Take the initial vector as x0 = rand(n, 1). Let p = 5 and q = 7 for the power-sigmoid function, q = 7 for the bipolar-sigmoid function, and p = 5 and q = 11 for the smooth power-sigmoid function. Suppose that α = β = 1 × 106 . The trajectories of residual errors (7.1) by the employment of NGNN is shown in Figure 11 (a); the trajectories of residual errors (7.1) by employing the LGNN-II is shown in Figure 11 (b).

(b) LGNN-II

Figure 11: Comparison of the NGNN and LGNN-II for Example 7.6.

al

Example 7.7. We consider a randomly generate 4-order strong Mz -tensor A ∈ Rn×n×n×n . It is easy to show A is a nonsingular tensor [30].

Jo

urn

Suppose that α = β = 1 × 103 . Other conditions are same as Example 7.6, we attain the trajectories of residual errors (7.1) by the employment of NGNN, which is shown in Figure 11 (a) and the trajectories of residual errors (7.1) by employing the LGNN-II, which is shown in Figure 11 (b). From the computer simulation results derived in the illustrative examples, the following conclusions can be highlighted: 1. Both the NGNN models with different types of activation functions and LGNN-II model, presented in Section 3, exactly and efficiently ensure the convergence of solving multi-linear systems with nonsingular tensors. 2. NGNN models could achieve different performances if different activation function arrays are used. In general, the convergence performance of nonlinear activation functions is superior to that of the linear activation function. 3. The convergence performance of the LGNN-II model is better than that of the LGNN-I model under the same condition α = β. In particular, Figure 10 (a) and 10 (b) show that, for strictly diagonal dominant tensor in Example 7.5, LGNN-II model is better than those of the 24

p ro

of

Journal Pre-proof

(a) NGNN

(b) LGNN-II

Pr e-

Figure 12: Comparison of the NGNN and LGNN-II for Example 7.7. NGNN models under the condition α = β and the parameters of different activation functions take test value.

7.2

DNGNN model

al

In this subsection, some computer-simulation examples are demonstrated to verify the efficacy and the superiority of the proposed discretize neural networks. We apply the DNGNN and DLGNN algorithms to solve the equation (1.1), respectively. After the iteration step k, xk is derived by (4.2) or (4.3), and its residual error is defined as RES = kAxm−1 − bk2 . k

(7.2)

urn

In all tables, ‘ITE’ and ‘CPU’ denote the iterative steps of our methods, and CPU time used (in seconds) when the algorithm terminated at RES < 10−10 , respectively. In the following, we test the DNGNN with different activation functions and DLGNN algorithms for different examples presented in subsection 7.1, respectively. Example 7.8. We consider the tensor A given in Example 7.2.

Jo

Take the same b as example 7.2, initial vector x0 =rand(n, 1). Let h1 = 0.12 and h2 = 0.12, which are the best in experiment. We test the convergence performance of the DNGNN with different activation functions and DLGNN proposed in Section 2 for different m and n, respectively. The test results are shown in Table 1, where ‘flin ’ is linear function, ‘fbs ’ is bipolar-sigmoid function with q = 3, ‘fsps ’ is smooth power-sigmoid function with p = 2 and q = 7 and ‘fps ’ is power-sigmoid function with p = 2 and q = 5, respectively. 25

Journal Pre-proof

Table 1: Convergence performance of DNGNN model for Example 7.8.

(3,20) (3,100) (4,5)

DLGNN fsps 27 0.1011 65 0.2534 58 0.0919 65 0.2509 671 2.5710

fps 22 0.0812 56 0.2104 51 0.3558 100 0.3810 576 2.2079

75 0.4563 154 0.7845 160 1.0176 145 0.5023 1081 3.9871

Pr e-

(4,10)

ITE CPU ITE CPU ITE CPU ITE CPU ITE CPU

DNGNN fbs 59 0.2205 127 0.4694 113 0.4115 85 0.3279 525 2.0243

of

(3,10)

flin 101 0.6485 215 0.8283 196 1.3784 170 0.6573 1445 5.5604

p ro

(m, n)

Example 7.9. We consider the tensor A given in Example 7.3. Take the same b and initial vector x0 as example 7.3, h1 = 0.2 and h2 = 0.2, which are the best in experiment. We test the convergence performance of the DNGNN with different activation functions and DLGNN algorithms proposed in Section 2 for different n, respectively. The test results are shown in Table 2, where ‘flin ’ is linear function, ‘fbs ’ is bipolar-sigmoid function with q = 3, ‘fsps ’ is smooth power-sigmoid function with p = 2 and q = 3 and ‘fps ’ is power-sigmoid function with p = 2 and q = 3, respectively.

ITE CPU ITE CPU ITE CPU ITE CPU ITE CPU

flin 61 0.2274 50 0.2024 51 0.2346 51 0.3561 52 0.9994

urn

n

al

Table 2: Convergence performance of DNGNN and DLGNN models for Example 7.9.

10 20 50

Jo

100 200

DNGNN fbs 34 0.1305 25 0.1000 25 0.1075 26 0.1744 26 0.4941

26

DLGNN fsps 29 0.1075 21 0.0780 21 0.0919 22 0.1474 22 0.4255

fps 45 0.1673 35 0.1356 36 0.1617 36 0.2464 35 0.7350

50 0.1832 45 0.1709 45 0.2078 45 0.2931 45 0.8934

Journal Pre-proof

Conclusion

p ro

8

of

These illustrative numerical results in Table 1 and Table 2 substantiated that the proposed DNGNN and DLGNN models are effective for solving the nonsingular tensor multi-linear systems. In general, DNGNN models could achieve different performances if different activation function arrays are used. We can find the convergence performance of nonlinear activation functions is superior to that of the linear activation function. The convergence performance of the DLGNN model is better than that of the DNGNN model with linear activation function under h1 and h2 are the best in experiment condition.

urn

al

Pr e-

Two neural network models, denoted NGNN and LGNN-II, are developed for the purpose of solving the multi-linear systems with M-tensor. The proposed network models are based on corresponding fundamental error-monitoring functions. The selected error-monitoring functions ensure local convergence of the proposed network models. The neural networks are activated by linear, power, bipolar-sigmoid, and smooth power-sigmoid activation functions. The use of different activation functions in the network models NGNN affects different convergence behavior of the neural networks. Theoretical analysis shows that any monotonically increasing odd activation function could be used for the construction of such neural network models. The computer simulation further substantiates that a different performance of network model could be achieved with the use of different design parameters and activation functions. The limitation of this paper is that in the convergence analysis of the neural network model, we only consider the case that the solution of the multi-linear systems is an isolated point of the neural network. For the case that the solution of the multi-linear systems is another equilibrium point, we need to consider it in the future. Note that if we choose different α, β, h1 and h2 , then we will get different convergence performance corresponding to NGNN, LGNN, DNGNN and DLGNN models for different examples based on the numerical experiments. How to choose the optimal parameter value for different examples, it is an interest topic, we will consider in the future work. In addition, the convergence order of the proposed neural network models will discuss in the future work.

Acknowledgment

We would like to thank the Editor and anonymous referees for their helpful comments.

References

Jo

[1] S. F. Ardabili, B. Najafi, S. Shamshirband, B. M. Bidgoli, R. C. Deo, and K. Chau, Computational intelligence approach for modeling hydrogen production: a review, Engineering Applications of Computational Fluid Mechanics, 12 (2018), pp. 438–458. [2] P. Azimzadeh and E. Bayraktar, High order bellman equations and weakly chained diagonally dominant tensors, SIAM Journal on Matrix Analysis and Applications, 40 (2019), pp. 276–298.

27

Journal Pre-proof

[3] M. Benzi, Preconditioning techniques for large linear systems: a survey, Journal of Computational Physics, 182 (2002), pp. 418–477. [4] K. Chen, Implicit dynamic system for online simultaneous linear equations solving, Electronics Letters, 49 (2013), pp. 101–105.

of

[5] F. Ding and T. Chen, Gradient based iterative algorithms for solving a class of matrix equations, IEEE Transactions on Automatic Control, 50 (2005), pp. 1216–1221. [6] W. Ding, L. Qi, and Y. Wei, M-tensors and nonsingular M-tensors, Linear Algebra and Its Applications, 439 (2013), pp. 3264–3278.

p ro

[7] W. Ding and Y. Wei, Solving multi-linear systems with M-tensors, Journal of Scientific Computing, 68 (2016), pp. 689–715. [8] S. Du, L. Zhang, C. Chen, and L. Qi, Tensor absolute value equations, Science China Mathematics, 61 (2018), pp. 1695–1710. [9] D. Feng, S. Yang, and T. Chen, Gradient-based identification methods for hammerstein nonlinear armax models, Nonlinear Dynamics, 45 (2006), pp. 31–43.

Pr e-

[10] G. H. Golub, P. C. Hansen, and D. P. Oleary, Tikhonov regularization and total least squares, SIAM Journal on Matrix Analysis and Applications, 21 (1999), pp. 185–194. [11] D. F. Griffiths and D. J. Higham, Numerical Methods for Ordinary Differential Equations: Initial Value Problems, Springer Science and Business Media, 2010. [12] D. Guo, L. Yan, and Z. Nie, Design, analysis, and representation of novel five-step dtzd algorithm for time-varying nonlinear optimization, IEEE Transactions on Neural Networks, 29 (2018), pp. 4248–4260. [13] D. Guo, L. Yan, and Z. Nie, Design, analysis, and representation of novel five-step dtzd algorithm for time-varying nonlinear optimization., IEEE Transactions on Neural Networks and Learning Systems, 29 (2018), pp. 4248–4260.

al

[14] L. Han, A homotopy method for solving multilinear systems with M-tensors, Applied Mathematics Letters, 69 (2017), pp. 49 – 54.

urn

[15] H. He, C. Ling, L. Qi, and G. Zhou, A globally and quadratically convergent algorithm for solving ultilinear systems with M-tensors, Journal of Scientific Computing, 76 (2018), pp. 1718–1741. [16] S. Hu, Z. H. Huang, C. Ling, and L. Qi, On determinants and eigenvalue theory of tensors, Journal of Symbolic Computation, 50 (2013), pp. 508–531. [17] H. J. Kushner and P. G. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time, Springer-Verlag, 1992. [18] D. H. Li, S. Xie, and H. R. Xu, Splitting methods for tensor equations, Numerical Linear Algebra with Applications, 24 (2017), p. e2102.

Jo

[19] S. Li, S. Chen, and B. Liu, Accelerating a recurrent neural network to finite-time convergence for;solving time-varying sylvester equation by using a sign-bi-power;activation function, Neural Processing Letters, 37 (2013), pp. 189–205. [20] W. Li, D. Liu, and S.-W. Vong, Comparison results for splitting iterations for solving multi-linear systems, Applied Numerical Mathematics, 134 (2018), pp. 105–121.

28

Journal Pre-proof

[21] X. Li and M. K. Ng, Solving sparse non-negative tensor equations: algorithms and applications, Frontiers of Mathematics in China, 10 (2015), pp. 649–680. [22] Z. Li, Y. Dai, and H. Gao, Alternating projection method for a class of tensor equations, Journal of Computational and Applied Mathematics, 346 (2019), pp. 490–504.

of

[23] L. Lim, Singular values and eigenvalues of tensors: A variational approach, in IEEE CAMSAP 2005: First International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, IEEE, 2005, pp. 129–132.

p ro

[24] C. Ling, W. Yan, H. He, and L. Qi, Further study on tensor absolute value equations, arXiv:1810.05872, (2018). [25] D. Liu, W. Li, and S. W. Vong, The tensor splitting with application to solve multi-linear systems, Journal of Computational and Applied Mathematics, 330 (2018), pp. 75–94. [26] Z. Luo, L. Qi, and N. Xiu, The sparsest solutions to Z-tensor complementarity problems, Optimization Letters, 11 (2017), pp. 471–482.

Pr e-

[27] C. Q. Lv and C. F. Ma, A levenbergcmarquardt method for solving semi-symmetric tensor equations, Journal of Computational and Applied Mathematics, 332 (2018), pp. 13 – 25. [28] J. H. Mathews and K. D. Fink, Numerical Methods Using MATLAB, vol. 4, Pearson Prentice Hall Upper Saddle River, NJ, 2004. [29] S. K. Mitra and Y. Kuo, Digital Signal Processing: a Computer-based Approach, vol. 2, McGrawHill New York, 2006. [30] C. Mo, C. Li, X. Wang, and Y. Wei, Z-eigenvalue based structure tensors: Mz -tensors and strong Mz -tensors, Preprint. [31] R. Moazenzadeh, B. Mohammadi, S. Shamshirband, and K. Chau, Coupling a firefly algorithm with support vector regression to predict evaporation in northern iran, Engineering Applications of Computational Fluid Mechanics, 12 (2018), pp. 584–597.

al

[32] B. Najafi, S. F. Ardabili, S. Shamshirband, K. Chau, and T. Rabczuk, Application of anns, anfis and rsm to estimating and optimizing the parameters that affect the yield and cost of biodiesel production, Engineering Applications of Computational Fluid Mechanics, 12 (2018), pp. 611–624.

urn

[33] L. Qi, Eigenvalues of a real supersymmetric tensor, Journal of Symbolic Computation, 40 (2005), pp. 1302–1324. [34] L. Qi, H. Chen, and Y. Chen, Tensor Eigenvalues and Their Applications, vol. 39, Springer, 2018. [35] Z. Raida, Improvement of convergence properties of wang neural network, Electronics Letters, 30 (1994), pp. 1865–1866.

Jo

[36] S. Ramezani, Nonlinear vibration analysis of micro-plates based on strain gradient elasticity theory, Nonlinear Dynamics, 73 (2013), pp. 1399–1421. [37] P. S. Stanimirovi´ c, M. D. Petkovi´ c, and D. Gerontitis, Gradient neural network with nonlinear activation for computing inner inverses and the drazin inverse, Neural Processing Letters, (2017), pp. 1–25.

29

Journal Pre-proof

[38] P. S. Stanimirovi´ c, I. S. Zivkovi´ c, and Y. Wei, Recurrent neural network for computing the drazin inverse., IEEE Transactions on Neural Networks and Learning Systems, 26 (2015), pp. 2830– 2843. [39] R. Taormina, K. Chau, and B. Sivakumar, Neural network river forecasting through baseflow separation and binary-coded swarm optimization, Journal of Hydrology, 529 (2015), pp. 1788–1797.

of

[40] J. Wang, Electronic realisation of recurrent neural network for solving simultaneous linear equations, Electronics Letters, 28 (1992), pp. 493–495.

p ro

[41] X. Wang, M. Che, and Y. Wei, Neural networks based approach solving multi-linear systems with M-tensors, Neurocomputing, doi: https://doi.org/10.1016/j.neucom.2019.03.025. [42] X.-Z. Wang, H. Ma, and P. S. Stanimirovi´ c, Nonlinearly activated recurrent neural network for computing the drazin inverse, Neural Processing Letters, 46 (2017), pp. 195–217. ´, Complex neural network models for time-varying [43] X. Z. Wang, Y. Wei, and P. S. Stanimirovic Drazin inverse, Neural Computation, 28 (2016), pp. 2790–2824. [44] Y. Wei and W. Ding, Theory and Computation of Tensors: Multi-dimensional Arrays, Academic Press, 2016.

Pr e-

[45] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon, 1965. [46] C. L. Wu and K. Chau, Rainfallcrunoff modeling using artificial neural network coupled with singular spectrum analysis, Journal of Hydrology, 399 (2011), pp. 394–409. [47] L. Xiao and Y. Zhang, Solving time-varying inverse kinematics problem of wheeled mobile manipulators using zhang neural network with exponential convergence, Nonlinear Dynamics, 76 (2014), pp. 1543–1559. [48] Z.-J. Xie, X.-Q. Jin, and Y. Wei, Tensor methods for solving symmetric M-tensor systems, Journal of Scientific Computing, 74 (2018), pp. 412–425.

al

[49] H. Xu, D. Li, and S. Xie, An equivalent tensor equation to the tensor complementarity problem with positive semi-definite Z-tensor, Optimization Letters, (2018, to appear). [50] W. Yan, C. Ling, L. Ling, and H. He, Generalized tensor equations with leading structured tensors, arXiv:1810.05870, (2018).

urn

[51] C. Yi and Y. Zhang, Analogue recurrent neural network for linear algebraic equation solving, Electronics Letters, 44 (2008), pp. 1078–1079. [52] J. Zabczyk, Mathematical Control Theory:an Introduction, Birkh¨ auser, 2015. [53] Y. Zhang, A set of nonlinear equations and inequalities arising in robotics and its online solution via a primal neural network, Neurocomputing, 70 (2006), pp. 513–524. [54] Y. Zhang and K. Chen, Global exponential convergence and stability of wang neural network for solving online linear equations, Electronics Letters, 44 (2008), pp. 145–146.

Jo

[55] Y. Zhang, Z. Chen, and K. Chen, Convergence properties analysis of gradient neural network for solving online linear equations, Acta Automatica Sinica, 35 (2009), pp. 1136–1139. [56] Y. Zhang, Y. Shi, K. Chen, and C. Wang, Global exponential convergence and stability of gradient-based neural network for online matrix inversion, Applied Mathematics and Computation, 215 (2009), pp. 1301–1306.

30

Journal Pre-proof

Jo

urn

al

Pr e-

p ro

of

[57] Y. Zhang and L. Xiao, Solving time-varying nonlinear inequalities using continuous and discretetime zhang dynamics, International Journal of Computer Mathematics, 90 (2013), pp. 1114–1127.

31