Distributed fault diagnosis of networked dynamical systems with time-varying topology




Available online at www.sciencedirect.com

Journal of the Franklin Institute 356 (2019) 5754–5780 www.elsevier.com/locate/jfranklin

Chen Peng a, Yanlin Zhou b, Qing Hui a,∗

a Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0511, USA
b National Science Foundation Center for Big Learning, University of Florida, Gainesville, FL 32611, USA

Received 5 July 2018; received in revised form 18 January 2019; accepted 25 May 2019; available online 1 June 2019

Abstract

In this paper, the distributed fault diagnosis (DFD) of networked dynamical systems with time-varying connected topologies, e.g., wireless sensor networks in harsh environments, is considered. Specifically, two essential problems are addressed that arise in extending the decomposition-based adaptive DFD approach to such topology-varying systems: the deterioration of decomposition schemes and the difficulty of pre-training. The causes of both problems are detailed and addressed in this work. First, for the decomposition scheme deterioration problem, a multi-agent dynamics-based online distributed decomposition algorithm is developed, so that a decent decomposed network structure can be maintained for such topology-varying networks. Second, to alleviate the pre-training difficulties in topology-varying systems, a fault detection method is proposed that avoids the need for pre-training. The distributed decomposition algorithm is proved to converge in finitely many steps, and the proposed fault detection method is verified both theoretically and experimentally. © 2019 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.



Corresponding author. E-mail address: [email protected] (Q. Hui).

https://doi.org/10.1016/j.jfranklin.2019.05.027 0016-0032/© 2019 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.


1. Introduction

The fault diagnosis (FD) or fault-tolerant design and control of large-scale systems has been receiving growing interest in recent years [1–4]. In large-scale distributed network structures, to reduce the problem space for fault diagnosis, a distributed FD (DFD) approach is usually used, such as mobile-agent FD [5] and the decomposition-based approach [6–8]. In mobile-agent FD, each agent carries a diagnoser and performs random walks [9] in the network. In the decomposition-based approach, at the fault-tolerant design phase, the network is decomposed into different subsystems, each governed by a local fault diagnoser (LFD) [10]. Each LFD monitors the states of the local basic nodes, communicates with other LFDs (typically neighbors), and submits local fault decisions to the global fault diagnoser (GFD), while the GFD supervises the LFDs and makes global fault decisions.

In previous studies, the decomposition-based approach has been applied only to systems with fixed topology, with the network decomposition scheme prescribed. Here, the extension of the framework to distributed systems with time-varying network topology is studied, with a primary focus on wireless sensor networks (WSNs) [11–13]. In a mobile sensing network [14–16], or a WSN with random link or node failures [17] and the possible addition of new nodes [18], the underlying topology is clearly a dynamic graph. In decomposition-based DFD methods, although not usually mentioned, the DFD performance is largely determined by the decomposition structure. However, for a dynamic graph, the time-varying topology gradually undermines the quality of any existing decomposition scheme; therefore, an online decomposition method should be used, such that the decomposed network structure adapts to the variation of the network topology.
On the other hand, due to the constantly changing topology of the network, many connection functions will be new and unknown, especially on subsystem boundaries where information exchange is limited. Besides, new nodes can also be added to a subsystem when the decomposition scheme is updated by the decomposition algorithm. Therefore, for most data-driven FD methods, such as the adaptive approximation approach [19], it is infeasible to pre-train fault diagnosers with many of those new nodes or connections added to the subsystems. Currently, most data-driven adaptive methods for DFD are based on the assumption that the training of the LFDs happens in a fault-free environment [20–22]. For example, if T_0 is used to denote the unknown fault occurrence time, then it is usually assumed that T_0 > 0 [22], and the training is supposed to be finished before T_0. To satisfy the fault-free condition, the diagnosers must be trained on a correct system before being applied to a system with potential faults. However, this condition is hardly satisfied for our problem, since i) pre-training cannot cover those connections or nodes newly added to the subsystems due to topological changes and decomposition scheme updates; and ii) if training is performed while the real system is running, the fault-free condition is not satisfied, since faults can exist during run time. Therefore, as a first attempt at such problems, some special cases of fault detection are presented here in which the fault-free condition is avoided. Specifically, cases are considered where the classical batch least-squares training method is used; under this premise, a so-called matched weights fault detection (MWFD) method is proposed, which avoids the fault-free condition required by most adaptive methods.
The MWFD method is based on neural network approximation with special basis function (hidden neuron) configurations, such that the matching of the learned weights of the target node and its neighbors reflects the health conditions of those nodes. The advantage of this method is that it does not need to be pre-trained with healthy-node data, and thus it can be applied when most existing adaptive observer-based methods are not usable.

The contributions of this paper are summarized as follows.

1. The extension of the decomposition-based DFD framework to dynamic graphs is studied, and the two most important problems introduced by the time-varying topology are identified (i.e., deterioration of decomposition schemes and FD of nodes with unknown connection functions).
2. To maintain a good decomposed DFD structure for dynamic graphs, an online distributed decomposition (DD) method and its variant, the DD2 algorithm, are proposed, which are proved to converge in finitely many steps for static network topologies. The DD2 algorithm is also applied to dynamic graphs and shown by experiments to maintain a small number of subsystem interconnections (compared with using prescribed static decomposition schemes).
3. For FD of nodes with unknown connection functions due to the time-varying topology, the MWFD method is proposed, which does not require the fault-free training condition assumed in most existing DFD approaches.
4. A theoretical basis for the MWFD method is given, and two different implementations (shared-basis and B-spline neurons) of the method are investigated.

This paper extends the work of Peng and Hui [23] and Peng et al. [24], where the DD algorithm and the MWFD method were experimentally evaluated. Here, comprehensive theoretical analysis of the proposed methods is also given, with detailed proofs. Based on the DD algorithm, a variant, DD2, is also proposed and evaluated (both theoretically and experimentally), which is more suitable for online decomposition over dynamic graphs. The DD2 algorithm inherits the finite-step convergence property of the DD algorithm but allows a more general coefficient setting.
Here, the decomposition and the FD problems are integrated under the framework of DFD over dynamic graphs, with specific focus on distributed systems such as WSNs, where each node has only very limited computational power. Thus, discussions are also given on i) the connections between the proposed decomposition algorithms and existing graph partitioning algorithms, and ii) the application of these algorithms in WSN environments.

The organization of this paper is as follows. In Section 2, the problem formulation is given for the extension of the decomposition-based DFD framework to dynamic graphs. Next, Section 3 presents the overall augmented framework, with the relationships among its components detailed. Sections 4 and 5 introduce the distributed decomposition algorithm and the MWFD methodology, respectively. In Section 6, two different implementations (shared-basis and B-spline neurons) of the MWFD method are investigated. Finally, computer experiments are presented in Section 7, and Section 8 concludes the paper.

Notation and preliminaries

Following the convention in [25], the network with changing topology is referred to as a dynamic graph, denoted as an ordered pair G(t) = (V(t), E(t)), or simply G = (V, E). Specifically, V denotes the set of nodes and E is the set of directed edges, and (i, j) ∈ E if and only if there is an edge from node j to node i. Let n = |V| be the number of nodes in the network, and let N(i) (or N_i), N_in(i), and N_out(i) denote the neighbors, in-neighbors, and out-neighbors of node i, respectively. In addition, φ_i(t) ∼ U(−1, 1) indicates that φ_i(t) is a uniform random process on the interval [−1, 1]. Furthermore, R is the set of real numbers, R^n denotes the set of n-dimensional real vectors, and R^{m×n} denotes the set of m-by-n real matrices.

2. Problem formulation

Here the DFD of nonlinear discrete-time dynamical systems over a large-scale network with time-varying topology is considered, which can be represented as
$$x_i(t+1) = x_i(t) + f_i(x_i(t), u_i(t)) + \phi_i(t) + \sum_{j \in \mathring{N}(i)} c_{ij}\, \mathring{h}_{ij}(x_i(t), x_j(t)), \quad i \in V. \tag{1}$$
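For concreteness, the update (1) can be simulated directly. The sketch below is illustrative only: the three-node complete topology, the nominal dynamics f_i(x_i, u_i) = −0.1 x_i, the Kuramoto-type connection functions h_ij = sin(x_j − x_i), the uniform weights c_ij = 0.05, and the constant bias fault φ_2(t) = 0.1 on node 2 are all assumed choices, not taken from the paper.

```python
import numpy as np

# Illustrative instance of Eq. (1) (assumed parameters, see lead-in):
# f_i(x_i, u_i) = -0.1 * x_i, h_ij(x_i, x_j) = sin(x_j - x_i), c_ij = 0.05,
# and a constant bias fault phi_2(t) = 0.1 on node 2.
rng = np.random.default_rng(0)
n, T, c = 3, 50, 0.05
neigh = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
fault = {2: 0.1}                 # phi_i(t); an empty dict gives the fault-free case

x = rng.standard_normal(n)
for t in range(T):
    coupling = np.array([sum(np.sin(x[j] - x[i]) for j in neigh[i])
                         for i in range(n)])
    phi = np.array([fault.get(i, 0.0) for i in range(n)])
    x = x + (-0.1 * x) + phi + c * coupling   # one step of Eq. (1)
print(np.round(x, 3))            # final states after T steps
```

Running the fault-free variant alongside this one shows the bias φ_2 shifting the steady state of node 2, which is exactly the deviation the FD task must detect.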

Similar formulations can be found in [26,27]. It is assumed that the states x_i ∈ R^{n_i}, i ∈ V, are observable. For each node i, the nominal function f_i : R^{n_i} × R^{n_{u_i}} → R^{n_i}, the control input u_i(t) ∈ R^{n_{u_i}}, and the connection weights c_ij ∈ R^{n_i × m_ij} are known a priori. In addition, h_ij(x_i, x_j) : R^{n_i} × R^{n_j} → R^{m_ij} are the unknown connection functions. The objective of the FD task is to detect the unknown fault in each node i, where φ_i(t) represents the deviation in the dynamics of node i due to a fault. Since robustness is not considered in this paper, uncertainty in the state equation is neglected. Thus, whenever φ_i(t) ≠ 0 is detected in (1), a fault is detected, and φ_i(t) is simply referred to as the fault on node i.

The time-varying topology here means that there are time instants at which new connections are added and old connections are removed; however, the function of a connection is time-invariant during the period when it exists. Thus, to avoid confusion with the notion of time-varying functions, in Eq. (1) the ring notation ◦̊ (shorthand for ◦̊(t)) is used to mean the "latest version" up to time t. Specifically, h̊_ij denotes the latest version of the connection function between nodes i and j, and N̊(i) denotes the neighbors of node i in the latest topology. In the rest of the paper, the ring notation is omitted as long as there is no confusion. In the next section, the motivations for and an overview of our main contributions (i.e., the online DD algorithm and the MWFD method) are described in detail.

3. Decomposition-based DFD over dynamic graphs: problems and methods

In the decomposition-based DFD, nodes in the physical network layer are assigned different cyber-layer roles.
Specifically, local fault diagnosers (LFDs) are designated nodes that provide computational power for FD of the local subsystems, and basic nodes are usually smart sensors for data collection and simple computation. For large systems, there can also be a global fault diagnoser (GFD), which coordinates the overall DFD operations. In this paper, a wireless sensor network (WSN) [12,28] is mainly considered as the target distributed system for DFD, and the underlying topology is assumed to be a dynamic graph, as in mobile sensing networks or WSNs with random failures and network replenishment [16,17,29]. In the literature, dynamical systems with time-varying (or switching) topology are usually considered in network consensus problems with agent communication noises or time delays [30,31], flocking problems [32], or mobile sensor network problems [14]. For example, in [14], multiple unmanned aerial vehicles (UAVs) form an ad hoc network, which uses a clustering architecture with so-called Software Defined Networking (SDN)
technology to support various applications on the platform. The communication topology of the ad hoc network is time-varying, since the relative positions and the clustering architecture of the UAV network change over time. With various onboard sensors, the UAV fleet acts as a flying ad hoc sensor network, which has played important roles in reconnaissance, Earth science research, humanitarian observations, etc. [33].

In this section, the two most important problems for DFD resulting from the time-varying topology are identified, i.e., the deterioration of decomposition schemes and the unclean training environment (where faults can exist). Correspondingly, our proposed solutions for these problems are introduced.

3.1. Adapting the decomposition scheme

While the term "decomposition" is often used in FD research, in the network science literature the problem of decomposing a system consisting of a large number of nodes is usually referred to as graph partitioning (GP) [34–36]. Existing GP algorithms can be classified into global algorithms, heuristic approaches, the multilevel method, and metaheuristics such as evolutionary algorithms, which use multiple runs of other algorithms. Since the global/exact GP problem is NP-complete [34,37], heuristic approaches are widely used, and by now the most successful among them is the multilevel method [37]. In the conventional decomposition-based DFD methods [6–8], subsystem regions and boundaries are prescribed; and as with other distributed systems, a good decomposition structure is needed to achieve better performance. In general, good DFD decompositions/partitions are those with a balanced LFD load distribution but a small number of subsystem interconnections. From the perspective of pure GP, the most prominent objective is to minimize the total cut, i.e., the number of edges that connect different subsystems.
In network science [38], modularity is also used to measure the quality of a partition scheme: the fraction of the edges that fall within the given groups minus the expected fraction of such edges if they were distributed at random (with at least one end in the given groups). However, pure GP approaches do not consider LFD load balancing, and direct application of existing GP/PGP algorithms to WSNs is still hard, since WSNs consist of sensors with only very limited computational resources. As stated in [28], i) sensors are usually cheap and small devices with very limited computational power compared to regular personal computers; ii) WSNs are usually deployed in harsh environments, such as battlefields, forests, and special industrial fields; and iii) since WSNs are energy constrained, it is essential that transmission with the base station (the most energy-consuming activity for sensors) be minimized. A WSN should therefore operate in a self-configured and autonomous mode, such that decisions are made at the sensor level as much as possible [28]. Hence, for the DFD of network systems such as WSNs with time-varying topology, an online distributed decomposition (DD) algorithm and its variant, the DD2 algorithm, are developed, which are implementable at the basic node/sensor level. Some advantages of the proposed decomposition algorithms are summarized as follows.

1. The DD and DD2 algorithms can be viewed as diffusion-based GP [39,40], but are distributed-system-oriented, where each basic node has only very limited computational power. Therefore, the DD and DD2 algorithms are designed to be implementable at the basic node/sensor level.


Fig. 1. Fault diagnosis over dynamic graph.

2. Developed for the DFD framework, the DD and DD2 algorithms naturally support LFD load balancing, since each subsystem partition starts from the corresponding LFD and can be adjusted using a decay coefficient.
3. The DD and DD2 algorithms are based on the network system dynamics, so the decomposition scheme can evolve in real time upon any connection changes.
4. System-theoretic analysis of the proposed algorithms is given, which shows that the DD and DD2 algorithms are finite-step convergent on static graphs under various conditions.
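The two partition-quality measures discussed in Section 3.1 (total cut and modularity) are easy to compute for a given group assignment. The sketch below is a minimal illustration on an assumed toy graph (two triangles joined by one edge), not one of the paper's test networks:

```python
import numpy as np

def cut_and_modularity(A, part):
    """Total cut and Newman modularity of a partition of an undirected graph.

    A: symmetric 0/1 adjacency matrix; part: group label per node.
    """
    A = np.asarray(A, dtype=float)
    m = A.sum() / 2.0                              # number of edges
    k = A.sum(axis=1)                              # node degrees
    same = np.equal.outer(part, part)              # True where labels agree
    cut = int(A[~same].sum() / 2)                  # edges crossing groups
    # Q = (1/2m) * sum_{ij in same group} (A_ij - k_i k_j / 2m)
    Q = ((A - np.outer(k, k) / (2 * m)) * same).sum() / (2 * m)
    return cut, Q

# Assumed toy graph: two triangles joined by a single edge,
# split into the two natural groups.
A = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
print(cut_and_modularity(A, [1, 1, 1, 2, 2, 2]))   # (1, 0.357...)
```

Here the natural split crosses only one edge (cut = 1) and gives modularity 5/14 ≈ 0.357, while a bad split would drive Q toward zero or below.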

3.2. Fault detection in a volatile training environment

Note that the proposed online decomposition algorithms are implemented distributedly by each sensor; fault detection, however, being a much more complicated task, is carried out by the LFD of the corresponding subsystem and runs at a lower frequency than the decomposition algorithms. As previously mentioned, for dynamic graphs, since pre-training cannot cover new system patterns, adaptive fault diagnosers need to be trained online, which, however, can be affected by existing faults. Therefore, considering the volatile online training environment in dynamic graphs, new FD methodologies are needed. In general, a complete FD system consists of three tasks: fault detection, fault isolation, and fault identification [41]. Fault detection, the most basic task, checks whether a fault exists in the system; fault isolation locates the faulty component; and fault identification determines the type and/or other characteristics of the fault. In this paper, to deal with the topology-varying training environment, a fault detection method is proposed that avoids the fault-free training condition. The proposed method is named matched weights fault detection (MWFD), since it uses special basis functions with known and specifically designed coupling relationships, so that the matching of the learned neuron weights reflects the health conditions of certain nodes. Fault isolation and identification are not in our scope and are left for future work. Fig. 1 shows the overall relationships between the different components in this framework, summarizing the previous discussions.


It is worth pointing out that although the decomposition algorithms and the fault detection algorithm are executed at different frequencies, the same discrete-time notations t and t + 1 are still used for the current and next time instants in all algorithms without confusion.

4. Online distributed decomposition

The DD algorithm is a distributed GP algorithm designed to be implementable at the sensor level. To facilitate convergence analysis, in this section the key components of the DD algorithm are first formulated mathematically before being presented in pseudocode. Furthermore, considering the application to dynamic networks, a variant of the DD algorithm, DD2, is also proposed, which can generate an adaptive decomposition scheme over dynamic graphs with a rather stable decomposition structure and a small number of subsystem interconnections.

4.1. The DD algorithm

In the DD algorithm, each node i is assigned a scalar value p_i ≥ 0, called the node strength, and a group number g_i ∈ Q = {0, 1, 2, . . . , N}, where N is the number of LFDs and Q is the group number set. If a node does not belong to any group, its group number is 0. The nodes of the same group form a subsystem in the DFD architecture. Furthermore, let p = [p_1, p_2, . . . , p_n]^T and g = [g_1, g_2, . . . , g_n]^T.

Definition 1. The strength P_i^{(k)} of group k on node i is defined as
$$P_i^{(k)} = \sum_{j \in \bar{N}_i^{(k)}} p_j, \tag{2}$$
where $\bar{N}_i^{(k)} = \bar{N}_i \cap \{l \in V \mid g(l) = k\}$ and $\bar{N}_i = N_i \cup \{i\}$. In addition, the strength distribution P^{(k)} ∈ R^n of group k is defined as P^{(k)} = [P_1^{(k)}, P_2^{(k)}, . . . , P_n^{(k)}]^T; and P = [P^{(1)}, P^{(2)}, . . . , P^{(n_g)}] is called the group strength matrix.

In the DD algorithm, the GFD is not involved in the decomposition process. Since in the DFD architecture each LFD serves as a central node of the corresponding subsystem, in the DD algorithm it is used as the starting location for its group. When running the algorithm, the group number and the strength of each LFD are fixed, and the strengths of all LFDs are set to the same scalar a. The group selection rule for each basic node i is given as follows:
$$g_i(t+1) = \begin{cases} \arg\max_{k \in Q} P_i^{(k)}(t), & \text{if } \max_{k \in Q} P_i^{(k)}(t) > P_i^{(g_i(t))}(t), \\ g_i(t), & \text{otherwise.} \end{cases} \tag{3}$$
Thus, if the current group g_i of node i is one of the groups with the greatest group strength P_i^{(g_i)} on node i, then g_i remains the same. When g_i changes, the strength of node i is also updated (otherwise it remains the same). The update rule of the node strength is given by
$$p_i(t+1) = \frac{c_r\, P_i^{(g_i(t+1))}(t)}{\big|\bar{N}_i^{(g_i(t+1))}\big|}, \tag{4}$$


where c_r ∈ (0, 1] is called the decay coefficient, since for c_r < 1 it reduces the strength passed to a node by the influencing nodes. Since the strength of each LFD node remains unchanged, for c_r < 1 the influence of each LFD is limited to its nearby region, which can thus support balancing the LFD load distribution. The pseudocodes for each LFD and basic node are given in Algorithms 1 and 2, respectively.

Algorithm 1 Pseudocode for the Ith LFD.
1: Initialize group number gI = I;
2: Initialize strength pI = a;
3: Initialize observer models;
4: repeat
5:   Detect topology changes and update FD models;
6:   Perform local fault diagnosis;
7:   Generate and submit fault decisions to the GFD;
8: until End signal received

Algorithm 2 Pseudocode for basic node i in subsystem I.
1: Initialize node strength pi = 0;
2: Initialize group number gi = 0;
3: Set parameter cr as specified;
4: repeat
5:   Detect neighborhood topology changes;
6:   Get each neighbor's group gj and node strength pj, j ∈ Ni;
7:   Compute group strength Pi(k) for each group k on node i;
8:   Update gi(t + 1) using (3);
9:   if gi(t + 1) ≠ gi(t) then
10:    Update node strength pi with (4);
11:  Collect physical data, e.g., xi, ui;
12:  If required, report data to the LFD of subsystem I, including physical data, pi, gi, and the latest neighborhood topology (if updated);
13:  Forward time (synchronized);
14: until End signal received

For decomposition purposes, each basic node (as shown in Algorithm 2) repeatedly inquires the group and strength information from its neighbors and then updates its own variables. For DFD purposes, the tasks of a basic node are to collect data from the physical layer and report the necessary data to the local LFD. Each LFD, after initialization, simply focuses on the complicated DFD task using different FD methods. The time complexity of the DD algorithm (not considering the FD parts) on basic node i within each iteration time step t is linear in the number of its adjacent nodes, i.e., O(|N̄_i(t)|). As proved in the next subsection, the number of steps for the group numbers to converge on a static graph is finite. In fact, experiments show that for a network of one hundred nodes and four LFDs, the algorithm converges in only a few steps (three for our static graph results in Section 7.1).
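A minimal synchronous simulation of the basic-node rules (2)–(4) can illustrate the behavior of Algorithm 2. The 6-node path topology, the placement of two LFDs at the endpoints, and a = 1 below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def dd_sweep(neigh, g, p, lfds, cr=1.0):
    """One synchronous sweep of the DD rules: group selection (3), strength (4).

    neigh: dict node -> list of neighbors; lfds: LFD nodes (fixed g_i and p_i).
    """
    g_new, p_new = dict(g), dict(p)
    for i in neigh:
        if i in lfds:
            continue                                # LFD group/strength are fixed
        nbar = list(neigh[i]) + [i]                 # closed neighborhood
        strengths = {}                              # P_i^{(k)}, Eq. (2)
        for j in nbar:
            strengths[g[j]] = strengths.get(g[j], 0.0) + p[j]
        best = max(strengths, key=strengths.get)
        if strengths[best] > strengths.get(g[i], 0.0):   # strict improvement, Eq. (3)
            g_new[i] = best
            members = [j for j in nbar if g[j] == best]
            p_new[i] = cr * strengths[best] / len(members)   # Eq. (4)
    return g_new, p_new

# Assumed example: 6-node path, LFDs at the two ends (groups 1 and 2), a = 1.
neigh = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
g = {i: 0 for i in neigh}; g[0], g[5] = 1, 2
p = {i: 0.0 for i in neigh}; p[0] = p[5] = 1.0
for step in range(10):
    g2, p2 = dd_sweep(neigh, g, p, lfds={0, 5}, cr=1.0)
    if g2 == g:                                     # no group update: converged
        break
    g, p = g2, p2
print(step, [g[i] for i in range(6)])               # 2 [1, 1, 1, 2, 2, 2]
```

The group labels diffuse outward from the two LFDs and stop changing after a few sweeps, consistent with the finite-step convergence proved in Section 4.2; with c_r = 1 every assigned node ends up with strength a, as used in the proof of Theorem 2.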


4.2. Convergence analysis

In this subsection, the DD algorithm is proved to be finite-step convergent over static graphs. First, the following theorem shows that there can be an equilibrium point for the group numbers.

Theorem 1. Suppose the network is a static graph. Furthermore, suppose that at time t_M > 0, no group update is made on any of the nodes in the network, i.e.,
$$g_i(t) = g_i(t-1), \quad \forall i \in V \tag{5}$$
for t = t_M. Then (5) also holds for all t > t_M. The first such time instant t_M > 0 is called the convergence time of the decomposition.

Proof. Suppose that Eq. (5) holds for t = t_M. Then from Eq. (3) we have
$$\max_{k \in Q} P_i^{(k)}(t) = P_i^{(g_i(t))}(t), \quad \forall i \in V \tag{6}$$
for t = t_M − 1. Since p_i is updated only when g_i is changed, p_i also remains unchanged for all i at time t_M, i.e., p_i(t_M) = p_i(t_M − 1), ∀i ∈ V; hence Eq. (6) also holds for t = t_M, and Eq. (5) for t = t_M + 1. The result is now immediate. □

Assumption 1. Suppose that at time t, the group number of node i is updated, i.e., g_i(t + 1) ≠ g_i(t). Then it is assumed that the group numbers of the neighbors of i, and of the neighbors of those neighbors, remain the same, i.e., g_j(t + 1) = g_j(t) and g_l(t + 1) = g_l(t) for j ∈ N_i and l ∈ N_j. In practice, this can be realized by using certain communication techniques [42].

Next, the following theorem shows that the algorithm converges if all initial node strengths are equal.

Theorem 2. Suppose that Assumption 1 holds and the network is a static graph. Let c_r = 1 and p(0) = a1, where 1 = [1, 1, . . . , 1]^T ∈ R^{n×1} and a ≥ 0 is a scalar. Then g_f = lim_{t→∞} g(t) exists. Furthermore, there exists a convergence time t_M = t_{M1} ≥ 0 such that g(t) = g_f for all t ≥ t_{M1}.

Proof. Let
$$V(t) = \sum_{i \in V} \sum_{j \in N_i} d(g_i(t), g_j(t)),$$
where
$$d(k_1, k_2) = \begin{cases} 0, & k_1 = k_2, \\ 1, & k_1 \ne k_2. \end{cases}$$
Let us also define V_i(t) as
$$V_i(t) = \sum_{j \in N_i} d(g_i(t), g_j(t)) = \sum_{\substack{k \in Q \\ k \ne g_i(t)}} D_i^{(k)}(t), \quad \text{where } D_i^{(k)}(t) = \sum_{\substack{j \in N_i \\ k = g_j(t)}} 1. \tag{7}$$


From c_r = 1, p(0) = a1, and the node strength update rule (4), the strength p_i of each node i remains a throughout the execution of the algorithm. From Eq. (2), the following relationships hold:
$$P_i^{(g_i(t))}(t) = a D_i^{(g_i(t))}(t) + a, \tag{8}$$
and
$$P_i^{(k)}(t) = a D_i^{(k)}(t) \quad \text{for } k \ne g_i(t). \tag{9}$$
Suppose that at time instant t the group of node i changes from k_1 to k_2. Then, according to the algorithm,
$$P_i^{(k_2)}(t) = \max_{k \in Q} P_i^{(k)}(t) > P_i^{(k_1)}(t).$$
From Eqs. (8), (9) and the above relationship, $D_i^{(k_2)}(t) > D_i^{(k_1)}(t)$. Furthermore, $D_i^{(k)}(t+1) = D_i^{(k)}(t)$, $k \in Q$. The influence of g_i changing from k_1 to k_2 is as follows. From Assumption 1, for j ∈ N_i,
$$D_j^{(k_1)}(t+1) = D_j^{(k_1)}(t) - 1, \qquad D_j^{(k_2)}(t+1) = D_j^{(k_2)}(t) + 1.$$
Therefore,
$$\begin{aligned}
V_i(t+1) &= V_i(t) - D_i^{(k_2)}(t) + D_i^{(k_1)}(t), \\
V_j(t+1) &= V_j(t) - 1, && \text{for } j \in N_i \text{ with } g_j(t) = k_2, \\
V_j(t+1) &= V_j(t) + 1, && \text{for } j \in N_i \text{ with } g_j(t) = k_1, \\
V_j(t+1) &= V_j(t), && \text{for } j \in N_i \text{ with } g_j(t) \ne k_1 \text{ and } g_j(t) \ne k_2.
\end{aligned}$$
For compactness of the following equations, V is used to denote V(t) and V^+ to denote V(t+1); the notations D and D^+ follow similarly. In addition, since it is assumed that g_j(t+1) = g_j(t) for j ∈ N_i (Assumption 1), g_j(t) and g_j(t+1) are simply written as g_j. Thus,
$$\begin{aligned}
V^+ &= V_i^+ + \sum_{\substack{j \in N_i \\ g_j = k_2}} V_j^+ + \sum_{\substack{j \in N_i \\ g_j = k_1}} V_j^+ + \sum_{\substack{j \in N_i \\ g_j \ne k_1,\, g_j \ne k_2}} V_j^+ \\
&= V_i - D_i^{(k_2)} + D_i^{(k_1)} + \sum_{\substack{j \in N_i \\ g_j = k_2}} V_j - D_i^{(k_2)} + \sum_{\substack{j \in N_i \\ g_j = k_1}} V_j + D_i^{(k_1)} + \sum_{\substack{j \in N_i \\ g_j \ne k_1,\, g_j \ne k_2}} V_j \\
&= V - 2\big(D_i^{(k_2)} - D_i^{(k_1)}\big).
\end{aligned}$$
Because $D_i^{(k_2)}(t) > D_i^{(k_1)}(t)$, we have V(t+1) < V(t), which means that every group change reduces the value of V. Furthermore, from Eq. (7), $(D_i^{(k_2)}(t) - D_i^{(k_1)}(t)) \in \mathbb{Z}$. Since V ≥ 0
always holds, there can only be a finite number of updates of g. Once there is a time instant t_{M1} at which no node is updated, by Theorem 1 the convergence time in this case is t_M = t_{M1}. □

Finally, the following theorem shows that there will be a time when all node strengths become equal.

Theorem 3. Assume that Assumption 1 holds and the network is a static graph. Let c_r = 1 and p(0) = ab, where b ∈ R^{n×1} is a vector with each element being either 1 or 0, and a ≥ 0 is a scalar. In addition, suppose g_i(0) ≠ g_j(0) for any i, j with p_i(0) ≠ 0 and p_j(0) ≠ 0. Then g_f = lim_{t→∞} g(t) exists. Furthermore, there exists a convergence time t_M = t_{M2} ≥ 0 such that g(t) = g_f for all t ≥ t_{M2}.

Proof. If a node i has strength 0 and at least one of its neighbor nodes has strength a, then at the next iteration node i will have strength a. Thus, in the connected network there exists a time t' ≥ 0 at which all of the nodes in the network have strength a. By Theorem 2, there also exists t_{M1} ≥ 0 such that g(t) = g_f for all t ≥ t_{M2} = t' + t_{M1}. Thus, t_M = t_{M2} is the convergence time in this case. □

Remark 1. By Theorem 3, the DD algorithm with c_r = 1 is shown to have the finite-time/finite-step convergence property on static networks. Furthermore, since lim_{t→∞} g(t) exists, the DD algorithm with c_r = 1 over static networks also possesses a special type of semistability [43–45], i.e., semistability with finite-time convergence [46], or equivalently, finite-time semistability [47].

4.3. The DD2 algorithm

When applied to a dynamic network, the decay coefficient c_r should be strictly less than one, so that each subsystem remains centered around the corresponding LFD even after a long period of time. However, the DD algorithm is not theoretically guaranteed to be finite-step convergent over static graphs when c_r ∈ (0, 1). To secure the convergence property over static graphs when c_r ∈ (0, 1), a variant of the DD algorithm, DD2, is proposed here.
Let us define the intensity of group k on node i as
$$\bar{P}_i^{(k)}(t) = \frac{1}{\big|\bar{N}_i^{(k)}\big|} \sum_{j \in \bar{N}_i^{(k)}} p_j;$$
then a new group selection rule can be given as
$$g_i(t+1) = \begin{cases} \arg\max_{k \in Q} \bar{P}_i^{(k)}(t), & \text{if } \max_{k \in Q} c_r \bar{P}_i^{(k)}(t) > p_i(t) + \epsilon, \\ g_i(t), & \text{otherwise,} \end{cases}$$
where ε > 0. The node strength is updated as
$$p_i(t+1) = c_r \bar{P}_i^{(g_i(t+1))}(t).$$

Corollary 1. Assume that Assumption 1 holds and the network is a static graph. Let c_r ∈ (0, 1) and p(0) = ab, where b ∈ R^{n×1} is a vector with each element being either 1 or 0, and a ≥ 0 is a scalar. In addition, suppose g_i(0) ≠ g_j(0) for any i, j with p_i(0) ≠ 0 and p_j(0) ≠ 0.
Then g_f = lim_{t→∞} g(t) exists. Furthermore, there exists a convergence time t_M = t_{M2} ≥ 0 such that g(t) = g_f for all t ≥ t_{M2}.

Proof. It is easy to verify that Theorem 1 still holds. Furthermore, let $V(t) = \sum_{i \in V} p_i(t)$. Since, whenever an update occurs, $p_i(t+1) - p_i(t) = c_r \bar{P}_i^{(g_i(t+1))}(t) - p_i(t) > \epsilon > 0$, V(t) can only increase; and since V(t) ≤ a|V|, by Lyapunov-type arguments there must exist a convergence time. □

Although our ultimate goal is to perform decomposition over dynamic graphs, the finite-step convergence property over static networks still provides insight into the speed and stability of the algorithm. In Section 7.1.2, the DD2 algorithm is applied to a network with dynamic topology, and experimental results show that DD2 with c_r ∈ (0, 1) generates an adaptive decomposition scheme over the dynamic graph with a rather stable decomposition structure and a small number of subsystem interconnections.

5. Matched weights fault detection

In this section, the MWFD methodology is introduced, and some preliminary theoretical results are presented for MWFD when batch least-squares training is used. The MWFD method consists of two parts: matched basis training and matched weights detection logic. Batch least-squares training [48,49] is a classical optimization approach that minimizes the least-squares error of the neural network approximation using the pseudoinverse of the neuronal data matrix. It is simple and effective, although it can suffer from ill-conditioning problems. While further studies may use a better training method, the batch least-squares method is used here simply to facilitate the development of our matched weights detection logic. First, the connection functions h_ij(x_i, x_j), (i, j) ∈ E(t), are assumed to satisfy the following linear coupling constraint.

Assumption 2 (Coupling connections). For any i ∈ V and j ∈ N(i), the connection functions h_ij of node i and h_ji of node j satisfy
$$h_{ij}(x_i, x_j) = E_{ij}\, h_{ji}(x_j, x_i), \tag{10}$$
where $E_{ij} \in \mathbb{R}^{m_{ij} \times m_{ij}}$ is a known invertible coefficient matrix (thus $E_{ji} = E_{ij}^{-1}$ and $m_{ji} = m_{ij}$).

Examples of Eq. (10) include the Kuramoto model [50] and simple input/output relationships, e.g., h_ij(x_i, x_j) = −h_ji(x_j, x_i) in a traffic network or packet routing network. Next, let $\hat{h}_{ij}(x_i, x_j; \vartheta) = Z_{ij}(x_i, x_j)\vartheta$ be a neuronal approximation to h_ij(x_i, x_j) using the input-layer neurons (also called basis functions here) $Z_{ij}(x_i, x_j) = [\varphi_{ij}^{(1)}(x_i, x_j), \varphi_{ij}^{(2)}(x_i, x_j), \ldots, \varphi_{ij}^{(L_{ij})}(x_i, x_j)]$, with the $\varphi_{ij}^{(l)} \in \mathbb{R}^{m_{ij}}$ linearly independent of each other, and a weight vector $\vartheta = [\vartheta^{(1)}, \vartheta^{(2)}, \ldots, \vartheta^{(L_{ij})}]^T \in \mathbb{R}^{L_{ij}}$. Suppose that the sensors have collected the states of node i and its neighbors, i.e., x_i(s) and x_j(s), j ∈ N_i, s ∈ S(t), where S(t) denotes the set of time instants at and before time t for the collected data. Then we have the following theorem.

Theorem 4. Suppose that Assumption 2 is true. Let
$$\hat{\theta}_{ij}^* = \arg\min_{\vartheta} \sum_{s \in S(t-1)} \big\| h_{ij}(x_i, x_j) - \hat{h}_{ij}(x_i, x_j; \vartheta) \big\|^2, \tag{11}$$
where ‖·‖ denotes the Euclidean L2 norm. The following results hold, corresponding to two different set-up cases of the neural network.


1. Set-up 1: If $\varphi_{ij}^{(l)}(x_i, x_j) = E_{ij}\varphi_{ji}^{(l)}(x_j, x_i)$ for all $x_i \in \mathbb{R}^{n_i}$, $x_j \in \mathbb{R}^{n_j}$ and any $(i, j) \in \mathcal{E}$, $l \in \{1, 2, \ldots, L_{ij}\}$, then $\hat{\theta}_{ij}^* = \hat{\theta}_{ji}^*$.
2. Set-up 2: Suppose $E_{ij} = \alpha_{ij} \in \mathbb{R}$ for any $(i, j) \in \mathcal{E}$. If $\varphi_{ij}^{(l)}(x_i, x_j) = \varphi_{ji}^{(l)}(x_j, x_i)$ for all $x_i \in \mathbb{R}^{n_i}$, $x_j \in \mathbb{R}^{n_j}$ and any $(i, j) \in \mathcal{E}$, $l \in \{1, 2, \ldots, L_{ij}\}$, then $\hat{\theta}_{ij}^* = \alpha_{ij}\hat{\theta}_{ji}^*$.

Proof. Let $H_{ij}(t) = \mathrm{vertcat}(h_{ij}(s) : s \in S(t-1)) \in \mathbb{R}^{m_{ij}N_{S(t-1)}}$ and $Z_{ij}(t) = \mathrm{vertcat}(Z_{ij}(s) : s \in S(t-1)) \in \mathbb{R}^{m_{ij}N_{S(t-1)} \times L_{ij}}$, where $N_{S(t)}$ denotes the size of $S(t)$. Then it follows that
$$\min_{\vartheta} \sum_{s\in S(t-1)} \big\| h_{ij}(x_i, x_j) - \hat{h}_{ij}(x_i, x_j; \vartheta) \big\|^2 = \big\| H_{ij}(t) - Z_{ij}(t)\vartheta \big\|^2,$$
and thus $\hat{\theta}_{ij}^* = Z_{ij}^{\dagger}(t)H_{ij}(t)$. For Set-up 1, since $\varphi_{ij}^{(l)}(x_i, x_j) = E_{ij}\varphi_{ji}^{(l)}(x_j, x_i)$ for any $(i, j) \in \mathcal{E}$, $l \in \{1, 2, \ldots, L_{ij}\}$, we have $Z_{ij}(x_i, x_j) = E_{ij}Z_{ji}(x_j, x_i)$, $(i, j) \in \mathcal{E}$. Therefore, $\hat{\theta}_{ij}^* = Z_{ij}^{\dagger}(t)H_{ij}(t) = \mathrm{horzcat}(Z_{ji}^{\dagger}(s)E_{ij}^{\dagger} : s \in S(t-1))\,\mathrm{vertcat}(E_{ij}h_{ji}(s) : s \in S(t-1)) = Z_{ji}^{\dagger}(t)H_{ji}(t) = \hat{\theta}_{ji}^*$.

For Set-up 2, we have $Z_{ij}(x_i, x_j) = Z_{ji}(x_j, x_i)$, $(i, j) \in \mathcal{E}$. Thus, it follows that $\hat{\theta}_{ij}^* = Z_{ij}^{\dagger}(t)H_{ij}(t) = \alpha_{ij}Z_{ji}^{\dagger}(t)H_{ji}(t) = \alpha_{ij}\hat{\theta}_{ji}^*$. □

Theorem 4 gives the matching properties of the theoretical optimal weights $\hat{\theta}_{ij}^*$, $(i, j) \in \mathcal{E}$. In Section 5.1, the training of the practical optimal weights $\hat{\theta}_{ij}$, $(i, j) \in \mathcal{E}$, using the batch least-squares method is introduced; next, in Section 5.2, the matching between the theoretical optimal weights $\hat{\theta}_{ij}^*$ and the practical optimal weights $\hat{\theta}_{ij}$ is given.

5.1. The batch least-squares training

Note that, since $h_{ij}(x_i, x_j)$ is unknown a priori, the optimal solution $\hat{\theta}_{ij}^*$ in Eq. (11) is not obtainable directly. Thus, the following relaxation of Eq. (11) is adopted:
$$\min_{\hat{\theta}_{ij},\, j\in\mathcal{N}(i)} \sum_{s\in S(t)} \Big\| G_i(s) - \sum_{j\in\mathcal{N}(i)} c_{ij}\hat{h}_{ij}(x_i(s), x_j(s); \hat{\theta}_{ij}) \Big\|^2, \quad (12)$$
where
$$G_i(s) = \sum_{j\in\mathcal{N}(i)} c_{ij}h_{ij}(x_i(s), x_j(s)) + \phi_i(s).$$

Let $N_{S(t)}$ denote the size of $S(t)$. Note that if $S(t) = \{t - N_{S(t)} + 1, t - N_{S(t)} + 2, \ldots, t\}$, then $G_i(s) \in \mathbb{R}^{n_i}$ can be easily computed via $G_i(s) = x_i(s+1) - x_i(s) - f_i(x_i(s), u_i(s))$ for $s \in S(t-1)$. Thus, from here on it is assumed that each $G_i(s)$, $s \in S(t-1)$, has already been computed. Recall that $\hat{h}_{ij}(x_i(s), x_j(s); \hat{\theta}_{ij}) = Z_{ij}(x_i(s), x_j(s))\hat{\theta}_{ij}$; henceforth $Z_{ij}(x_i(s), x_j(s))$ is denoted as $Z_{ij}(s)$. Let
$$\mathcal{G}_i(t) = \mathrm{vertcat}(G_i(s) : s \in S(t-1)) \in \mathbb{R}^{n_i N_{S(t-1)}},$$
$$Z_{ij}(t) = \mathrm{vertcat}(Z_{ij}(s) : s \in S(t-1)) \in \mathbb{R}^{m_{ij}N_{S(t-1)} \times L_{ij}},$$
$$\mathcal{Z}_i(t) = \mathrm{horzcat}(c_{ij}Z_{ij}(t) : j \in \mathcal{N}_i) \in \mathbb{R}^{n_i N_{S(t-1)} \times \sum_{j\in\mathcal{N}_i} L_{ij}},$$
$$\hat{\Theta}_i = \mathrm{vertcat}(\hat{\theta}_{ij} : j \in \mathcal{N}_i) \in \mathbb{R}^{\sum_{j\in\mathcal{N}_i} L_{ij}},$$
where the operation "vertcat" denotes vertical concatenation and "horzcat" denotes horizontal concatenation. For simplicity, the parameter $t$ will be omitted. Then Eq. (12) can be written as
$$\min_{\hat{\Theta}_i} F_i(\hat{\Theta}_i), \quad F_i(\hat{\Theta}_i) = \big\| \mathcal{G}_i - \mathcal{Z}_i\hat{\Theta}_i \big\|^2, \quad (13)$$
and the optimal solution of the above problem is
$$\hat{\Theta}_i = \mathrm{vertcat}(\hat{\theta}_{ij} : j \in \mathcal{N}_i) = \mathcal{Z}_i^{\dagger}\mathcal{G}_i,$$
where $\mathcal{Z}_i^{\dagger}$ denotes the Moore–Penrose pseudoinverse of $\mathcal{Z}_i$.
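The batch least-squares step (12)–(13) can be sketched numerically as follows. This is a minimal illustration, not the paper's code: the node has two neighbors, the scalar basis functions and coupling weights $c_{ij}$ are hypothetical, and the fault-free case $\phi_i(s) = 0$ is assumed. The stacked regressor $\mathcal{Z}_i$ is formed by horizontal concatenation, and the weights are recovered with the Moore–Penrose pseudoinverse:

```python
import numpy as np

# Toy node i with two neighbors j = 1, 2; scalar states (n_i = m_ij = 1).
# Hypothetical basis Z_ij(s) = [x_i(s), x_j(s)] for each edge.
rng = np.random.default_rng(0)
S = 50                                   # number of collected samples N_S
xi, x1, x2 = rng.uniform(-1, 1, (3, S))  # sampled states x_i(s), x_j(s)
c = {1: 0.3, 2: 0.7}                     # coupling weights c_ij (hypothetical)

# True (unknown) connection functions, linear in the basis (Assumption 3):
h1 = x1 - xi                             # h_i1(x_i, x_1), true weights [-1, 1]
h2 = 2.0 * (x2 - xi)                     # h_i2(x_i, x_2), true weights [-2, 2]
Gi = c[1] * h1 + c[2] * h2               # G_i(s), fault-free (phi_i = 0)

# Z_i = horzcat(c_ij * Z_ij), then Eq. (13): Theta_i = pinv(Z_i) @ G_i.
Z1 = np.column_stack([xi, x1])
Z2 = np.column_stack([xi, x2])
Zi = np.hstack([c[1] * Z1, c[2] * Z2])
Theta = np.linalg.pinv(Zi) @ Gi
theta_i1, theta_i2 = Theta[:2], Theta[2:]

# Z_i is rank deficient (the x_i-columns of the two edges are collinear),
# so the x_i-weights are not unique; the x_j-dependent weights, however,
# are pinned down exactly (cf. Theorem 5 in Section 5.2).
print(theta_i1[1], theta_i2[1])  # close to 1.0 and 2.0
```

Note how the minimum-norm pseudoinverse solution still fits the data exactly, which is the mechanism behind the matched-weights detection logic.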

5.2. Constructing basis for MWFD

Here, some basics are laid down regarding the construction of the basis functions $Z_{ij}(x_i, x_j)$, $\langle i, j\rangle \in \mathcal{E}(t)$, and results are presented on the matching between the theoretical optimal weights $\hat{\theta}_{ij}^*$ and the practical optimal weights $\hat{\theta}_{ij}$, $(i, j) \in \mathcal{E}$. First, the definitions of two index sets of the basis functions are given as follows.

Definition 2. Let us define
$$L_{ij}^{(1)} = \{l = 1, \ldots, L_{ij} : \varphi_{ij}^{(l)}(x_i, x_j) \text{ depends on } x_i \text{ but not on } x_j\},$$
and
$$L_{ij}^{(2)} = \{l = 1, \ldots, L_{ij} : \varphi_{ij}^{(l)}(x_i, x_j) \text{ depends on } x_j\}.$$
For example, if $Z_{ij}(x_i, x_j) = [x_i, x_i^T x_j, x_j]$, then $L_{ij}^{(1)} = \{1\}$ and $L_{ij}^{(2)} = \{2, 3\}$.

Lemma 1. Suppose that Assumption 2 is true, and assume that one of the neural network set-up cases in Theorem 4 is used. Consider a $j \in \mathcal{N}(i)$, $i \in \mathcal{V}$. If $l' \in L_{ij}^{(1)}$, then $l' \in L_{ji}^{(2)}$.

Proof. If $l' \in L_{ij}^{(1)}$, then by the definition of $L_{ij}^{(1)}$, $\varphi_{ij}^{(l')}(x_i, x_j)$ depends on $x_i$ but not on $x_j$. Note that we have $\varphi_{ji}^{(l')}(x_j, x_i) = E_{ji}\varphi_{ij}^{(l')}(x_i, x_j)$ or $\varphi_{ji}^{(l')}(x_j, x_i) = \varphi_{ij}^{(l')}(x_i, x_j)$. Thus, $\varphi_{ji}^{(l')}(x_j, x_i)$ also depends on $x_i$. Therefore, $l' \in L_{ji}^{(2)}$. □

In the MWFD method, the fault detector is constructed such that its trainer keeps track of the history of all occurred connection functions $h_{ij}(x_i, x_j)$ and maintains a library of possible connection functions, or their linear components, that is as complete as possible, so that each $Z_{ij}(x_i, x_j)$ can be constructed as a basis of the linear space containing as many potential connection functions $h_{ij}(x_i, x_j)$ as possible. Constructing the trainer in this way maximally leads to the satisfaction of the following assumption, which is presented for further theoretical purposes.

Assumption 3. Each connection function $h_{ij}(x_i, x_j)$ is a linear combination of $\varphi_{ij}^{(l)}(x_i, x_j)$, $l = 1, \ldots, L_{ij}$.

It is clear that with Assumption 3, there exists a $\theta_{ij}^*$ such that $v_{ij}(x_i(s), x_j(s); \hat{\theta}_{ij}^*) = 0$ for all $s \in S(t)$ and every $\langle j, i\rangle \in \mathcal{E}$ or $\langle i, j\rangle \in \mathcal{E}$.


Therefore, for fault-free situations, i.e., when $\phi_i(s) = 0$ for all $s \in S(t)$, $F_i(\hat{\Theta}_i^*) = 0$ holds, which means that $\hat{\Theta}_i^*$ is a solution of Problem 2. However, when $\mathcal{Z}_i$ is column rank deficient, $\hat{\Theta}_i$ is not necessarily equal to $\hat{\Theta}_i^*$. Next, the following theorem gives the matching between the theoretical optimal weights $\hat{\Theta}_i^*$ and the practical optimal weights $\hat{\Theta}_i$.

Theorem 5. Suppose that Assumption 3 is true, and all $Z_{ij}$, $(i, j) \in \mathcal{E}$, are of full column rank. Furthermore, suppose that for a certain $j \in \mathcal{N}(i)$, there do not exist $\vartheta^{(l)}$, $l \in L_{ij}^{(2)}$, such that $\sum_{l\in L_{ij}^{(2)}} \varphi_{ij}^{(l)}(x_i, x_j)\vartheta^{(l)} = T_i(x_i)$ for any interval $[a, b)$ of $x_j$, where $T_i(x_i)$ is a function independent of $x_j$. Then, when $\phi_i(s) = 0$ for all $s \in S(t)$ (the system is healthy), it holds that
$$\hat{\theta}_{ij}^{*(l)} = \hat{\theta}_{ij}^{(l)} \quad (14)$$
for $l \in L_{ij}^{(2)}$.

Proof. Due to Assumption 3 and $\phi_i(s) = 0$ for all $s \in S(t)$, we have $F_i(\hat{\Theta}_i^*) = 0$; and since $\hat{\theta}_{ij}$ is a least-squares solution of Problem 2, $F_i(\hat{\Theta}_i) \leq F_i(\hat{\Theta}_i^*)$, and thus $F_i(\hat{\Theta}_i) = 0$. Recall that $\mathcal{Z}_i = \mathrm{horzcat}(c_{ij}Z_{ij} : j \in \mathcal{N}_i)$, with $Z_{ij} = Z_{ij}(t) = \mathrm{vertcat}(Z_{ij}(s) : s \in S(t))$. Thus,
$$\mathcal{G}_i - \mathcal{Z}_i\hat{\Theta}_i = \sum_{j\in\mathcal{N}(i)} c_{ij}Z_{ij}\hat{\theta}_{ij}^* - \sum_{j\in\mathcal{N}(i)} c_{ij}Z_{ij}\hat{\theta}_{ij} = 0.$$
Since it is assumed that all $Z_{ij}$, $(i, j) \in \mathcal{E}$, are of full column rank, the above numerical relationship is transformed into the following functional relationship:
$$\sum_{j\in\mathcal{N}(i)} c_{ij}Z_{ij}(x_i, x_j)\hat{\theta}_{ij}^* = \sum_{j\in\mathcal{N}(i)} c_{ij}Z_{ij}(x_i, x_j)\hat{\theta}_{ij},$$
and thus
$$\sum_{j\in\mathcal{N}(i)} c_{ij}\sum_{l=1}^{L_{ij}} \varphi_{ij}^{(l)}(x_i, x_j)\hat{\theta}_{ij}^{*(l)} = \sum_{j\in\mathcal{N}(i)} c_{ij}\sum_{l=1}^{L_{ij}} \varphi_{ij}^{(l)}(x_i, x_j)\hat{\theta}_{ij}^{(l)}. \quad (15)$$
Note that for a certain $j \in \mathcal{N}(i)$ and an $l \in L_{ij}^{(2)}$ (i.e., $\varphi_{ij}^{(l)}(x_i, x_j)$ is a function dependent on $x_j$), $\varphi_{ij}^{(l)}(x_i, x_j)$ is linearly independent of any other basis function, since only the $j$th node has variable $x_j$ and $\varphi_{ij}^{(l')}(x_i, x_j)$, $l' = 1, \ldots, L_{ij}$, are already linearly independent of each other. (Conversely, if $l \in L_{ij}^{(1)}$, e.g., $\varphi_{ij}^{(l)}(x_i, x_j) = x_i^2$, then it may be correlated with other terms if there are $j_2$ and $l' \in L_{ij_2}^{(1)}$ such that $\varphi_{ij_2}^{(l')}(x_i, x_{j_2}) = x_i^2$.) Thus, the only way to vary $\hat{\theta}_{ij}^{(l)}$ without changing $\sum_{j\in\mathcal{N}(i)} c_{ij}\sum_{l=1}^{L_{ij}} \varphi_{ij}^{(l)}(x_i, x_j)\hat{\theta}_{ij}^{(l)}$ is to add to/subtract from $\sum_{l=1}^{L_{ij}} \varphi_{ij}^{(l)}(x_i, x_j)\hat{\theta}_{ij}^{(l)}$ a multiple of $\sum_{l\in L_{ij}^{(2)}} \varphi_{ij}^{(l)}(x_i, x_j)\vartheta^{(l)} = T_i(x_i)$ which is correlated with some $\varphi_{ij_2}^{(l')}(x_i, x_{j_2})$, $l' \in L_{ij_2}^{(1)}$, or with another linear combination $T_{ij_2}(x_i)$ independent of $x_{j_2}$ (even for only the data on an $x_j$ interval $[a, b)$). However, this situation conflicts with our assumption that there do not exist such $\vartheta^{(l)}$, $l \in L_{ij}^{(2)}$. Therefore, for any $l \in L_{ij}^{(2)}$, $\hat{\theta}_{ij}^{*(l)} = \hat{\theta}_{ij}^{(l)}$. □


Remark 2. Note that although it is theoretically sufficient for Eq. (14) to hold that all $Z_{ij}$, $(i, j) \in \mathcal{E}$, are of full column rank, in practice, the more sampled data (and thus the more rows), the more accurate the least-squares solutions $\hat{\Theta}_i$, $i \in \mathcal{V}$, can be. Furthermore, making $Z_{ij}$ long (row-wise) increases the likelihood of it being of full column rank; it is therefore unnecessary to check the column rank of the matrices $Z_{ij}$, $(i, j) \in \mathcal{E}$, if a large enough set of sampled data can be obtained, not to mention that the basis functions $\varphi_{ij}^{(l)}$, $l = 1, \ldots, L_{ij}$, are already linearly independent of each other.

6. Basis functions and detection logics

In this section, two different implementations of the basis functions in MWFD are studied, i.e., shared basis and B-splines basis, corresponding to the two different set-up cases in Theorem 4. Furthermore, based on the results in Section 5, fault detection logics for both basis types are given.

6.1. MWFD with shared basis

Here, a shared basis approach is proposed for MWFD. The following configuration extends Set-up 1 in Theorem 4, providing similar and consistent basis functions to be constructed at each node.

Shared Basis Set-Up. Suppose that there is a shared basis $Z(x, y, E)$. Each node $i$ uses $Z_{ij}(x_i, x_j) = Z(x_i, x_j, E_{ij})$ for $j \in \mathcal{N}_{\mathrm{in}}(i)$, while using $Z_{ij}(x_i, x_j) = E_{ij}Z(x_j, x_i, E_{ij})$ for $j \in \mathcal{N}_{\mathrm{out}}(i)$. Specifically,
$$Z(x, y, E) = [q_1(x), q_2(x), \ldots, q_{l_q}(x), q_{l_q+1}(y, E), q_{l_q+2}(y, E), \ldots, q_{2l_q}(y, E), q_{2l_q+1}(x, y), q_{2l_q+2}(x, y), \ldots, q_L(x, y)], \quad (16)$$
where $q_{l_q+k}(y, E) = E^{-1}q_k(y)$ for $k = 1, 2, \ldots, l_q$; $q_1(x), \ldots, q_{l_q}(x)$ are linearly independent functions of $x$; and $q_{2l_q+1}(x, y), \ldots, q_L(x, y)$ are linearly independent functions of both $x$ and $y$ (note that the shared basis definition only specifies the form of the basis: the dimensions are determined only when applying the shared basis to a specific connection).

Corollary 2. Suppose that Assumptions 2 and 3 are true, the shared basis set-up (16) is used, and all $Z_{ij}$, $(i, j) \in \mathcal{E}$, are of full column rank. Consider Set-up 1 given in Theorem 4, and suppose that $\phi_i(s) = 0$ for all $s \in S(t)$ (the system is healthy). Then, for any node $i$, it holds that
$$\sum_{j\in\mathcal{N}_{\mathrm{in}}(i)} c_{ij}\hat{\theta}_{ij}^{(l)} + \sum_{j\in\mathcal{N}_{\mathrm{out}}(i)} c_{ij}\hat{\theta}_{ij}^{(l+l_q)} = \sum_{j\in\mathcal{N}_{\mathrm{in}}(i)} c_{ij}\hat{\theta}_{ji}^{(l+l_q)} + \sum_{j\in\mathcal{N}_{\mathrm{out}}(i)} c_{ij}\hat{\theta}_{ji}^{(l)} \quad (17)$$
for $l = 1, 2, \ldots, l_q$, and
$$\hat{\theta}_{ij}^{(l)} = \hat{\theta}_{ji}^{(l)}, \quad \forall j \in \mathcal{N}(i),$$
for $l = 2l_q + 1, \ldots, L$.

Proof. Note that for $j \in \mathcal{N}_{\mathrm{out}}(i)$,
$$Z_{ij}(x_i, x_j) = E_{ij}Z(x_j, x_i, E_{ij}) = [E_{ij}q_1(x_j), E_{ij}q_2(x_j), \ldots, E_{ij}q_{l_q}(x_j), q_1(x_i), q_2(x_i), \ldots, q_{l_q}(x_i), E_{ij}q_{2l_q+1}(x_j, x_i), E_{ij}q_{2l_q+2}(x_j, x_i), \ldots, E_{ij}q_L(x_j, x_i)]. \quad (18)$$
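The shared basis form (16) and the symmetry it induces can be sketched as follows. This is a hypothetical scalar instance (the particular $q_k$ functions and $l_q = 2$ are illustrative, not from the paper); it checks that, with a scalar $E_{ij} = -1$ as in the experiments of Section 7.2, the bases built by the two endpoints of one edge satisfy $Z_{ij} = E_{ij}Z_{ji}$ entry-wise, which is what Set-up 1 of Theorem 4 requires:

```python
import math

def shared_basis(x, y, E):
    """A scalar instance of the shared basis (16) with l_q = 2, L = 5.
    Entries 1..l_q depend on x only, entries l_q+k are E^{-1} q_k(y),
    and the last entry depends on both x and y (hypothetical choices)."""
    q = [x, math.sin(x)]                  # q_1(x), q_2(x)
    qy = [y / E, math.sin(y) / E]         # q_{l_q + k}(y, E) = E^{-1} q_k(y)
    qxy = [math.sin(x) * math.cos(y)]     # q_{2 l_q + 1}(x, y)
    return q + qy + qxy

# One edge with scalar E_ij = -1, so E_ji = E_ij^{-1} = -1: node i (j an
# in-neighbor) uses Z(x_i, x_j, E_ij); node j (i an out-neighbor) uses
# E_ji * Z(x_i, x_j, E_ji).
E_ij = -1.0
E_ji = 1.0 / E_ij
xi, xj = 0.7, -0.3
Z_ij = shared_basis(xi, xj, E_ij)
Z_ji = [E_ji * z for z in shared_basis(xi, xj, E_ji)]
match = all(abs(a - E_ij * b) < 1e-12 for a, b in zip(Z_ij, Z_ji))
print(match)  # True
```

Note that the entry-wise identity for the middle block relies on $E_{ij}^2 = 1$, which holds for the sign-flip coupling $E_{ij} = -1$ used later in the experiments.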


Thus, it follows that $L_{ij}^{(1)} = \{1, \ldots, l_q\}$ for $j \in \mathcal{N}_{\mathrm{in}}(i)$, and $L_{ij}^{(1)} = \{l_q+1, \ldots, 2l_q\}$ for $j \in \mathcal{N}_{\mathrm{out}}(i)$. Therefore,
$$\sum_{j\in\mathcal{N}(i)} c_{ij}\sum_{l\in L_{ij}^{(1)}} \varphi_{ij}^{(l)}(x_i, x_j)\hat{\theta}_{ij}^{(l)} = \sum_{j\in\mathcal{N}_{\mathrm{in}}(i)} c_{ij}\sum_{l=1}^{l_q} q_l(x_i)\hat{\theta}_{ij}^{(l)} + \sum_{j\in\mathcal{N}_{\mathrm{out}}(i)} c_{ij}\sum_{l=l_q+1}^{2l_q} q_{l-l_q}(x_i)\hat{\theta}_{ij}^{(l)} = \sum_{l=1}^{l_q} q_l(x_i)\Big(\sum_{j\in\mathcal{N}_{\mathrm{in}}(i)} c_{ij}\hat{\theta}_{ij}^{(l)} + \sum_{j\in\mathcal{N}_{\mathrm{out}}(i)} c_{ij}\hat{\theta}_{ij}^{(l+l_q)}\Big).$$
Since $q_1(x_i), \ldots, q_{l_q}(x_i)$ are linearly independent, we have (17).

Next, for $j \in \mathcal{N}_{\mathrm{in}}(i)$, $L_{ij}^{(2)} = \{l_q+1, l_q+2, \ldots, 2l_q, 2l_q+1, \ldots, L\}$; for $j \in \mathcal{N}_{\mathrm{out}}(i)$, $L_{ij}^{(2)} = \{1, 2, \ldots, l_q, 2l_q+1, \ldots, L\}$. Thus, for any node $i$, Eq. (14) is satisfied for $l = 2l_q+1, \ldots, L$ and for all $j \in \mathcal{N}(i)$. Therefore, $\hat{\theta}_{ij}^{(l)} = \hat{\theta}_{ij}^{*(l)} = \hat{\theta}_{ji}^{*(l)} = \hat{\theta}_{ji}^{(l)}$, $j \in \mathcal{N}(i)$, for $l = 2l_q+1, \ldots, L$. □

The shared basis approach simplifies the basis function structure and provides simple evaluation criteria for DFD of the large-scale network system. However, the shared basis needs to be prescribed and assigned to all nodes. In the following subsection, the B-splines-based MWFD method is proposed, which allows for dynamical construction of the basis functions.

6.2. MWFD with B-splines basis

In this section, B-splines are used for the scalar-valued basis functions (hidden neurons), i.e., when $m_{ij} = 1$, in the neural networks. In the shared-basis implementation, a shared basis function set-up is configured; with the B-splines basis, however, our intention is to simplify the fault detection implementation in that no shared basis needs to be provided. This challenge is overcome by utilizing the built-in properties of B-splines.

Using the same conventions as in [51], let $K, M \geq 0$ be two integers, and $u = [u_{-M}, \ldots, u_{K+M})$. Then the B-splines $B_{k,l,u}(x) : \mathbb{R} \to \mathbb{R}$ of degree $l$ with knot vector $u$ are recursively defined by
$$B_{k,0,u}(x) = \begin{cases} 1 & \text{if } u_k \leq x < u_{k+1}, \\ 0 & \text{if } x < u_k \text{ or } x \geq u_{k+1}, \end{cases}$$
for $k = -M, \ldots, K+M-1$, and
$$B_{k,l+1,u}(x) = \frac{x - u_k}{u_{k+l+1} - u_k}B_{k,l,u}(x) + \frac{u_{k+l+2} - x}{u_{k+l+2} - u_{k+1}}B_{k+1,l,u}(x)$$
for $k = -M, \ldots, K+M-l-2$, $l = 0, \ldots, M-1$. The support of a B-spline function $B_{k,l,u}(x)$ is $[u_k, u_{k+l+1})$. Furthermore, $\{B_{k,M,u} : k = -M, \ldots, K-1\}$ restricted to $[u_0, u_K)$ is a basis of the spline space $S_{u,M}([u_0, u_K))$ of degree $M$ with knot vector $u$ [51].
In our B-splines-based MWFD method, the following property of the B-splines is used (Lemma 14.4 in [51]):
$$\sum_{k=-M}^{K-1} B_{k,M,u}(x) = 1 \quad \text{for } x \in [u_0, u_K). \quad (19)$$

Here, for simplicity and without loss of generality, the case of a scalar state variable (i.e., $n_i = 1$ for $i \in \mathcal{V}$) is considered for B-splines-based MWFD. Let $M_i$ be the degree of the B-splines of node $i$, $X_i = \{x_i(s) : s \in S(t-1)\}$, and $u_{X_i} = [u_{X_i}^{(-M_i)}, \ldots, u_{X_i}^{(K_i+M_i)}] = [\min X_i - M_i T_{X_i}, \ldots, \min X_i + K_i T_{X_i}]$, where $K_i$ is the number of B-spline intervals on $[u_{X_i}^{(0)}, u_{X_i}^{(K_i)}) = [\min X_i, \max X_i)$, and $T_{X_i} = (\max X_i - \min X_i)/K_i$. Furthermore, let $B_{X_i}(x) = \mathrm{horzcat}\{B_{k,M_i,u_{X_i}}(x) : k = -M_i, \ldots, K_i - 1\} : \mathbb{R} \to \mathbb{R}^{1\times(K_i+M_i)}$. To satisfy Set-up 2 in Theorem 4, let $B_{ij}(x_i, x_j) = B_{X_j}(x_j) \otimes B_{X_i}(x_i) : \mathbb{R}\times\mathbb{R} \to \mathbb{R}^{1\times(K_i+M_i)(K_j+M_j)}$ for $j \in \mathcal{N}_{\mathrm{in}}(i)$, and $B_{ij}(x_i, x_j) = B_{X_i}(x_i) \otimes B_{X_j}(x_j) : \mathbb{R}\times\mathbb{R} \to \mathbb{R}^{1\times(K_j+M_j)(K_i+M_i)}$ for $j \in \mathcal{N}_{\mathrm{out}}(i)$, where $\otimes$ denotes the Kronecker tensor product. Then, the neural network basis functions are set as $Z_{ij}(x_i, x_j) = B_{ij}(x_i, x_j)$. By Lemma 15.1 in [51], $Z_{ij}(x_i, x_j)$ is a basis of the multivariable spline space $S_{u_{X_i},M_i,u_{X_j},M_j}([u_{X_i}^{(0)}, u_{X_i}^{(K_i)}] \times [u_{X_j}^{(0)}, u_{X_j}^{(K_j)}])$.
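The Cox–de Boor recursion above and the partition-of-unity property (19) can be sketched in pure Python. This is a minimal illustration with an arbitrary uniform knot vector (the values $M = 2$, $K = 3$, $T = 0.5$ are illustrative), not the paper's implementation:

```python
def bspline(k, l, u, x):
    """B_{k,l,u}(x) by the recursion above; u maps knot index m to the
    knot value, so negative indices -M..K+M are allowed."""
    if l == 0:
        return 1.0 if u[k] <= x < u[k + 1] else 0.0
    left = (x - u[k]) / (u[k + l] - u[k]) * bspline(k, l - 1, u, x)
    right = (u[k + l + 1] - x) / (u[k + l + 1] - u[k + 1]) * bspline(k + 1, l - 1, u, x)
    return left + right

M, K, T = 2, 3, 0.5
u = {m: m * T for m in range(-M, K + M + 1)}  # uniform knots u_{-M}..u_{K+M}

# Eq. (19): the degree-M B-splines sum to 1 anywhere on [u_0, u_K).
totals = [sum(bspline(k, M, u, x) for k in range(-M, K))
          for x in (0.0, 0.31, 0.75, 1.49)]
print(totals)  # each entry is 1.0 (up to floating-point error)
```

The same partition-of-unity identity is what lets the B-splines basis absorb constant shifts, which is the key fact exploited in Theorem 6 below.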

Theorem 6. Suppose that Assumption 3 is true, and all $Z_{ij}$, $(i, j) \in \mathcal{E}$, are of full column rank. Consider Set-up 2 given in Theorem 4. Let $Z_{ij}(x_i, x_j) = B_{ij}(x_i, x_j)$ for all $j \in \mathcal{N}(i)$. Then, when $\phi_i(s) = 0$ for all $s \in S(t)$ (the system is healthy), $1^{M_j+1} \otimes b$ is contained as a subsequence in $\hat{\theta}_{ij} - \hat{\theta}_{ij}^*$ for $j \in \mathcal{N}_{\mathrm{in}}(i)$ (or $b \otimes 1^{M_j+1}$ for $j \in \mathcal{N}_{\mathrm{out}}(i)$), where $b \in \mathbb{R}^{M_i+K_i}$ and $1^L$ denotes the one vector $[1, \ldots, 1]^T \in \mathbb{R}^L$.

Proof. Let $\delta\hat{\theta}_{ij} = \hat{\theta}_{ij} - \hat{\theta}_{ij}^*$. Then, similar to Theorem 5, it follows that
$$\sum_{j\in\mathcal{N}(i)} c_{ij}Z_{ij}\delta\hat{\theta}_{ij} = \sum_{j\in\mathcal{N}(i)} c_{ij}\sum_{l=1}^{L_{ij}} \varphi_{ij}^{(l)}(x_i, x_j)\delta\hat{\theta}_{ij}^{(l)} = 0.$$
Note that $\varphi_{ij}^{(l)}(x_i, x_j) = B_{k_1,M_i,u_{X_i}}(x_i)B_{k_2,M_j,u_{X_j}}(x_j)$ for some $k_1 \in \{-M_i, \ldots, K_i-1\}$ and $k_2 \in \{-M_j, \ldots, K_j-1\}$. Therefore, we have $L_{ij}^{(2)} = \{1, \ldots, L_{ij}\}$. Thus, for any $l = 1, \ldots, L_{ij}$, the only possibility for $\delta\hat{\theta}_{ij}^{(l)} \neq 0$ is when $Z_{ij}\delta\hat{\theta}_{ij} = \sum_{l=1}^{L_{ij}} \varphi_{ij}^{(l)}(x_i, x_j)\delta\hat{\theta}_{ij}^{(l)} = T_i(x_i)$, where $T_i(x_i)$ is a function not related to $x_j$ (cf. Theorem 5). Such a $\delta\hat{\theta}_{ij}$ exists when using the B-splines basis. For example, consider a $j \in \mathcal{N}_{\mathrm{in}}(i)$, and let $\delta\hat{\theta}_{ij} = 1^{M_j+K_j} \otimes b$, where $b \in \mathbb{R}^{M_i+K_i}$. Then, using Eq. (19), it follows that
$$B_{ij}(x_i, x_j)\delta\hat{\theta}_{ij} = \big(B_{X_j}(x_j) \otimes B_{X_i}(x_i)\big)\big(1^{M_j+K_j} \otimes b\big) = B_{X_j}(x_j)1^{M_j+K_j} \otimes B_{X_i}(x_i)b = B_{X_i}(x_i)b.$$
Conversely, since $B_{X_j}(x_j)$ is a basis on $[u_{X_j}^{(0)}, u_{X_j}^{(K_j)})$, when $K_j = 1$, the only way to remove $x_j$ is to use $B_{X_j}(x_j)(a1^{M_j+1})$, with $a \in \mathbb{R}$. When $K_j > 1$, as long as $1^{M_j+1} \otimes b$ ($b \in \mathbb{R}^{M_i+K_i}$) is contained as a subsequence in $\delta\hat{\theta}_{ij}$, $x_j$ can be removed from a region $[u_{X_j}^{(\alpha)}, u_{X_j}^{(\alpha+1)})$ using $B_{X_j}(x_j)\delta\hat{\theta}_{ij}$, thus making that part of the data generate $T(x_i)$, which is independent of $x_j$. Therefore, from $\sum_{j\in\mathcal{N}(i)} c_{ij}Z_{ij}\delta\hat{\theta}_{ij} = 0$ and $L_{ij}^{(2)} = \{1, \ldots, L_{ij}\}$, $a1^{M_j+1} \otimes b$ will be contained as a subsequence in $\delta\hat{\theta}_{ij}$ for some $a \in \mathbb{R}$, $b \in \mathbb{R}^{M_i+K_i}$. The case of $j \in \mathcal{N}_{\mathrm{out}}(i)$ is similar and immediate. □

The following is a simple way to apply the B-splines-based MWFD method.

Corollary 3. Suppose that Assumptions 2 and 3 are true, and all $Z_{ij}$, $(i, j) \in \mathcal{E}$, are of full column rank. Consider Set-up 2 given in Theorem 4. Let $Z_{ij}(x_i, x_j) = B_{ij}(x_i, x_j)$ for all $\langle i, j\rangle \in \mathcal{E}$, and let $K_i = 1$ and $M_i = M$ for all $i \in \mathcal{V}$. Then, for any node $i$, when $\phi_i(s) = 0$


Fig. 2. Static network decomposition with n = 100 and the average node degree of 6 using the DD algorithm.

for all $s \in S(t)$ (the system is healthy), it holds that
$$\hat{\theta}_{ij} - \alpha_{ij}\hat{\theta}_{ji} = 1^{M+1} \otimes b_1 + b_2 \otimes 1^{M+1}, \quad \forall j \in \mathcal{N}(i),$$
for some $b_1, b_2 \in \mathbb{R}^{M+1}$.

Proof. Consider a $j \in \mathcal{N}_{\mathrm{in}}(i)$; then, by Theorem 6, we have $\hat{\theta}_{ij} - \hat{\theta}_{ij}^* = 1^{M+1} \otimes b_1$ and $\hat{\theta}_{ji} - \hat{\theta}_{ji}^* = b_2 \otimes 1^{M+1}$ for some $b_1, b_2 \in \mathbb{R}^{M+1}$. Since the $B_{ij}(x_i, x_j)$ are constructed so that Set-up 2 in Theorem 4 is satisfied, it follows that $\hat{\theta}_{ij}^* = \alpha_{ij}\hat{\theta}_{ji}^*$. The result is immediate. □

Note that the detection of the structure $1^{M+1} \otimes b_1 + b_2 \otimes 1^{M+1}$ is easy to implement, as described in the next section.

7. Computer experiments

Our experiments are designed as follows. Since the distributed decomposition and DFD algorithms run at different time scales, separate experiments are performed for decomposition and MWFD. Specifically, distributed decomposition experiments are first performed on general networks; then MWFD is carried out on simple network examples that are assumed to be part of a subsystem determined by the decomposition algorithm. All connections in the network examples for MWFD are assumed to be unknown, and faults exist before $t = 0$, so that observer-based methods are not usable.

7.1. Distributed decomposition

Here $c_r = 1$ is considered for decomposition tasks with four LFDs and different system sizes. The high efficiency of the algorithm comes from its distributed nature, and the convergence time $t_M$ depends on the number and locations of the LFDs, as well as the size of the network.

7.1.1. DD algorithm on static network

In this subsection, the decomposition process is demonstrated on a static network of 100 nodes with an average node degree of 6. As shown in Fig. 2, the decomposition convergence time is $t_M = 3$. Specifically, at $t = 0$, the locations of the four LFDs are specified near the four corners in Fig. 2(a); the four LFDs are assigned different group numbers, each marked by a different color (1-yellow, 2-red, 3-orange, and 4-brown), while the basic nodes


Fig. 3. Dynamic network decomposition with n(0) = 300 and the average node degree of 6 using the DD2 algorithm with cr = 0.95.

share the same group number 0 and are marked in grey. The strengths of the four LFDs are $a = 100$, while the strengths of the basic nodes are 0. As the algorithm proceeds, each subsystem partition expands. Following the group selection rule (3), as different subsystems coincide, the algorithm converges to a decomposition scheme with fewer subsystem interconnections. Fig. 2(d) shows the final stage $t = t_M = 3$ of the decomposition. Let us denote the subsystem formed by the nodes of group $I$ as $S_I$; then, at the final stage, there are the top-left $S_1$, top-right $S_2$, bottom-left $S_3$, and bottom-right $S_4$. From Fig. 2(d), it can be seen that the subsystem pairs $(S_1, S_3)$, $(S_3, S_4)$, and $(S_4, S_2)$ have only a few interconnections; $(S_1, S_2)$ has more interconnections because the regional preference also needs to be satisfied, i.e., each subsystem should be centered near its own LFD, so that the workload of the LFDs is more equally distributed. Note that because $c_r = 1$, the experiment in this subsection is exactly the case in Theorems 2 and 3.

7.1.2. DD2 algorithm on dynamic networks

In this subsection, the DD2 algorithm is applied with the decay coefficient $c_r = 0.95$ on dynamic networks, where the network connections are constantly and randomly changing, and nodes are randomly added or removed. Here, a connection change is defined to be either adding a new connection between two nodes or removing an existing one. Specifically, at each time instant, five random connection changes are allowed, and up to five nodes are randomly added or removed. Fig. 3 shows the decomposition results for a network of 300 nodes with an average node degree of 6. As shown in the figure, although the network topology has changed, a similar decomposition structure is maintained. The grey nodes in the figure are nodes newly added at the corresponding time instants, since they are initialized with group number 0.
Furthermore, because the number of removed nodes is not equal to the number of nodes newly added at each time instant, the number of nodes in the network is $n(t) = 289$ at $t = 20$ and $n(t) = 287$ at $t = 70$. Next, to compare the decomposition quality of the adapting decomposition scheme generated by the DD2 algorithm with that of a fixed decomposition scheme on the dynamic graph, the network topology is first held static from $t = 0$ to $14$; then, beginning from $t = 15$, changes are allowed for both nodes and connections. Let $D(t)$ denote the decomposition scheme generated by the DD2 algorithm at time $t$. The algorithm begins at $t = 0$; then, at $t = 15$, the decomposition scheme $D_1 = D(15)$ is used as a reference for our comparison. We thus


Fig. 4. Comparisons of the number of subsystem interconnections in decomposition scheme D1 (black) and D(t ) (red) using the DD2 algorithm.

Fig. 5. A network of 8 nodes.

compare the number of subsystem interconnections in $D(t)$ with that in $D_1$ from $t = 15$ to $t = 100$. In the experiment for Fig. 4(a), only connections are allowed to change; for Fig. 4(b), both nodes and connections change. It is clear that the number of subsystem interconnections is kept at a lower level by applying the DD2 algorithm online, compared to using the static decomposition scheme $D_1$. The advantage is more obvious when both nodes and connections are changing, as shown in Fig. 4(b), where the number of subsystem interconnections of $D(t)$ is kept at a steady low level, while the number for $D_1$ continues to rise.

7.2. MWFD with shared basis

Suppose that the MWFD method is applied to all eight nodes in the network shown in Fig. 5, and that all connection functions are given in Table 1. Suppose that the shared basis set-up (16) is used, and that a shared basis has been constructed as
$$Z(x, y) = [x, x^2, x^3, \sin(x), \cos(x), -y, -y^2, -y^3, -\sin(y), -\cos(y), xy, x^2y, xy^2, \sin(x)\cos(y), \cos(x)\sin(y), \sin(x)\sin(y), \cos(x)\cos(y)].$$


Table 1. Connection functions and coefficients.

| $h_{ij}(x_i, x_j)$ | $c_{ij}$ | $c_{ji}$ |
|---|---|---|
| $h_{12}(x_1, x_2) = x_2 - x_1$ | 0.03 | 0.05 |
| $h_{23}(x_2, x_3) = x_3 x_2 - x_2$ | 0.12 | 0.05 |
| $h_{31}(x_3, x_1) = x_1 - x_3^3$ | 0.12 | 0.15 |
| $h_{15}(x_1, x_5) = x_5 - x_1$ | 0.1 | 0.05 |
| $h_{54}(x_5, x_4) = x_4 x_5 - x_5^3$ | 0.17 | 0.07 |
| $h_{41}(x_4, x_1) = \sin(x_1 - x_4)$ | 0.2 | 0.05 |
| $h_{27}(x_2, x_7) = \cos(x_7 - x_2)$ | 0.1 | 0.1 |
| $h_{76}(x_7, x_6) = \sin(x_6) - x_7$ | 0.2 | 0.1 |
| $h_{62}(x_6, x_2) = x_6 - \sin(x_6)\cos(x_2)$ | 0.1 | 0.05 |
| $h_{83}(x_8, x_3) = x_8 - \sin(x_3 - x_8)$ | 0.1 | 0.05 |

Table 2. Normed node balances in different scenarios.

| Scenario | $\|b_1\|_2$ | $\|b_2\|_2$ | $\|b_3\|_2$ | $\|b_4\|_2$ | $\|b_5\|_2$ | $\|b_6\|_2$ | $\|b_7\|_2$ | $\|b_8\|_2$ |
|---|---|---|---|---|---|---|---|---|
| $\phi_i(t) = 0$, $i \in \mathcal{V}$ | 4.889E−27 | 2.507E−24 | 2.554E−23 | 1.178E−25 | 1.601E−22 | 2.632E−26 | 2.676E−22 | 1.569E−25 |
| $\phi_2(t) = 0.03\cos(t)$ | 1.165E−02 | 2.805E+00 | 2.773E+01 | 1.156E−25 | 3.637E−22 | 1.340E+00 | 6.515E+02 | 2.120E−24 |
| $\phi_4(t) \sim U(-1, 1)$ | 5.281E−02 | 3.236E−25 | 3.145E−24 | 2.725E−02 | 4.114E−01 | 6.560E−25 | 1.510E−22 | 1.552E−25 |
| $\phi_8(t) = 0.05$ | 2.290E−23 | 2.028E−19 | 4.634E−03 | 3.070E−25 | 2.162E−19 | 3.622E−15 | 1.017E−15 | 6.712E−03 |
| $\phi_1(t) = 0.03\cos(t)$, $\phi_3(t) \sim U(-1, 1)$, $\phi_7(t) = 0.05$ | 1.148E−02 | 2.568E−01 | 1.131E+00 | 1.356E−03 | 3.539E+00 | 7.972E−07 | 5.614E−02 | 8.600E+00 |
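The node balances and edge matches reported in Tables 2 and 3 (defined in the text below) can be computed from the learned weight vectors as in the following sketch. The weight values, the single-neighbor topology, and the coupling weight are hypothetical; $l_q = 5$ and $L = 17$ match the shared basis $Z(x, y)$ above:

```python
import numpy as np

lq, L = 5, 17  # sizes matching the shared basis Z(x, y) above

def node_balance(theta, theta_nb, c, in_nb, out_nb):
    """b_i in R^{l_q}: theta[j] is the learned theta_ij at node i and
    theta_nb[j] the corresponding theta_ji learned at neighbor j."""
    b = np.zeros(lq)
    for j in in_nb:
        b += c[j] * (theta[j][:lq] - theta_nb[j][lq:2 * lq])
    for j in out_nb:
        b += c[j] * (theta[j][lq:2 * lq] - theta_nb[j][:lq])
    return b

def edge_match(theta_ij, theta_ji):
    """m_ij in R^{L - 2 l_q}: the (x, y)-coupled weights must coincide."""
    return theta_ij[2 * lq:] - theta_ji[2 * lq:]

# Hypothetical healthy edge <j, i> with j an in-neighbor of i: the weights
# are constructed to satisfy the matching conditions of Corollary 2 exactly.
rng = np.random.default_rng(0)
theta_ij = rng.standard_normal(L)
theta_ji = rng.standard_normal(L)
theta_ji[lq:2 * lq] = theta_ij[:lq]     # x_i-weights of i match y-weights of j
theta_ji[2 * lq:] = theta_ij[2 * lq:]   # coupled weights match
b_i = node_balance({"j": theta_ij}, {"j": theta_ji}, {"j": 0.03}, ["j"], [])
print(np.linalg.norm(b_i), np.linalg.norm(edge_match(theta_ij, theta_ji)))  # 0.0 0.0
```

A faulty edge would instead leave both quantities distinguishably nonzero, which is exactly the detection signal seen in the tables.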

It is clear that Assumption 3 is satisfied. To simplify the notation and the experiments, let us consider the case where $n_i = n_{u_i} = 1$ (i.e., $x_i, u_i \in \mathbb{R}$), $f_i(x_i, u_i) = -0.1x_i + u_i$ for all $i \in \mathcal{V}$, and $E_{ij} = -1$ for every $\langle i, j\rangle \in \mathcal{E}$.

Next, $\mathcal{Z}_i$ and $\hat{\Theta}_i$ are computed, and all $\hat{\theta}_{ij}$ are extracted. Note that this is the case of Corollary 2 with $l_q = 5$; thus, we should have $\hat{\theta}_{ij}^{(l)} = \hat{\theta}_{ji}^{(l)}$ for $l = 2l_q + 1, \ldots, L$ and every $\langle i, j\rangle \in \mathcal{E}$ if $\phi_i(t) = 0$. Furthermore, let $u_1(t) = \sin(t)$, and $u_i(t) = 0$ for $i = 2, \ldots, 8$. Next, let us define a vector node balance $b_i \in \mathbb{R}^{l_q}$ for each node $i$, with the $l$th element ($l = 1, \ldots, l_q$) being
$$b_i^{(l)} = \sum_{j\in\mathcal{N}_{\mathrm{in}}(i)} c_{ij}\big(\hat{\theta}_{ij}^{(l)} - \hat{\theta}_{ji}^{(l+l_q)}\big) + \sum_{j\in\mathcal{N}_{\mathrm{out}}(i)} c_{ij}\big(\hat{\theta}_{ij}^{(l+l_q)} - \hat{\theta}_{ji}^{(l)}\big).$$
Thus, Eq. (17) can be written as $b_i = 0$. Also, a vector edge match $m_{ij} \in \mathbb{R}^{L-2l_q}$ is defined for each edge $\langle j, i\rangle \in \mathcal{E}$, with $m_{ij}^{(l)} = \hat{\theta}_{ij}^{(2l_q+l)} - \hat{\theta}_{ji}^{(2l_q+l)}$. Therefore, the necessary conditions in Corollary 2 for the network to be healthy (fault-free) are $b_i = 0$ for each $i \in \mathcal{V}$ and $m_{ij} = 0$ for each $\langle i, j\rangle \in \mathcal{E}$.

Tables 2 and 3 show the results of MWFD in different fault scenarios. The results are also illustrated in Fig. 6, where the color depth and border thickness of a node $i$ indicate the magnitude of the normed node balance $\|b_i\|_2$, and the thickness of an arrow $\langle j, i\rangle$ indicates the magnitude of the edge match $\|m_{ij}\|_2$ (dashed arrows denote edges with zero edge matches). Furthermore, the faulty nodes are marked with additional red circles. As shown in the tables, when there are no faults, i.e., $\phi_i(t) = 0$, $i \in \mathcal{V}$, we have $b_i = 0$ and $m_{ij} = 0$ for all $i, j$, which verifies Corollary 2. Note that due to the computational accuracy of the simulation software, small values such as $3.236\mathrm{E}{-25} = 3.236\times 10^{-25}$ are actually zero. Next, let us consider the


Table 3. Normed edge matches in different scenarios.

| Scenario | $\|m_{12}\|_2$ | $\|m_{23}\|_2$ | $\|m_{31}\|_2$ | $\|m_{15}\|_2$ | $\|m_{54}\|_2$ | $\|m_{41}\|_2$ | $\|m_{27}\|_2$ | $\|m_{76}\|_2$ | $\|m_{62}\|_2$ | $\|m_{83}\|_2$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $\phi_i(t) = 0$, $i \in \mathcal{V}$ | 6.52E−25 | 1.36E−23 | 2.41E−24 | 3.26E−21 | 1.28E−24 | 5.82E−24 | 1.40E−22 | 1.70E−23 | 5.91E−23 | 9.57E−24 |
| $\phi_2(t) = 0.03\cos(t)$ | 3.53E+00 | 2.85E+01 | 5.10E−24 | 3.10E−20 | 6.68E−24 | 1.98E−23 | 7.48E+02 | 2.58E−23 | 4.54E+02 | 6.49E−23 |
| $\phi_4(t) \sim U(-1, 1)$ | 3.14E−25 | 8.39E−24 | 9.42E−26 | 7.07E−25 | 2.55E+01 | 1.03E+01 | 2.44E−23 | 1.34E−24 | 1.39E−23 | 7.22E−25 |
| $\phi_8(t) = 0.05$ | 1.16E−19 | 3.07E−19 | 1.83E−21 | 4.09E−19 | 1.35E−23 | 1.90E−22 | 9.89E−17 | 2.01E−16 | 6.92E−17 | 2.65E−01 |
| $\phi_1(t) = 0.03\cos(t)$, $\phi_3(t) \sim U(-1, 1)$, $\phi_7(t) = 0.05$ | 8.38E−02 | 1.55E+01 | 1.09E+00 | 4.81E+00 | 1.53E−26 | 1.58E−01 | 1.50E−05 | 9.01E−05 | 5.64E−27 | 1.07E+03 |

Fig. 6. Results of the MWFD method.

three situations where only one node has faults, i.e., when nodes 2, 4, and 8 are faulty, respectively. In Table 2, it is clear that if a fault exists on node $i$ and all other nodes are healthy, the normed node balances of node $i$ and of all $j \in \mathcal{N}(i)$ are distinguishably nonzero, while the other nodes have zero node balances. This is because the computation of $b_i$ is only affected by node $i$ itself and its neighbors $j \in \mathcal{N}(i)$, which can be observed more clearly in Fig. 6(a)–(c). Similarly, as shown in Table 3 and Fig. 6(a)–(c), if a fault exists on node $i$ and all other nodes are healthy, the normed edge matches of all adjacent edges of node $i$ are distinguishably nonzero, while the other edges have zero edge matches.

As derived from Corollary 2 and verified by the experimental results (together with the definitions of the node balances and edge matches), some useful properties of the MWFD method with shared basis, and rules for DFD, can be summarized qualitatively as follows.

1. If $b_i \neq 0$, then either node $i$ or at least one of its adjacent nodes is faulty (fault detected).
2. If $b_i = 0$, then node $i$ and its adjacent nodes are unlikely to be faulty (probability reduction if permitted).
3. If $m_{ij} \neq 0$, then at least one of its end nodes is faulty (fault detected).
4. If $m_{ij} = 0$, then nodes $i$ and $j$ are unlikely to be faulty (probability reduction if permitted).

To further demonstrate the application of MWFD, let us consider a situation where faults exist on three nodes simultaneously, i.e., when $\phi_1(t) = 0.03\cos(t)$, $\phi_3(t) \sim U(-1, 1)$, and $\phi_7(t) = 0.05$. As shown in Tables 2 and 3 as well as Fig. 6(d), in this case, all nodes have nonzero node balances. Thus, all eight nodes are marked as "suspected". At this point, further actions can be performed to mark the nodes according to the level of stringency of the DFD system. For example, the edges with zero edge matches can be used to reduce the adjacent end nodes' likelihood of being faulty. Thus, nodes 2, 4, 5, and 6


Table 4. Results of MWFD using B-splines basis.

| Scenario | $\varepsilon_{12}$ | $\varepsilon_{23}$ | $\varepsilon_{31}$ | $\varepsilon_{15}$ | $\varepsilon_{54}$ |
|---|---|---|---|---|---|
| $\phi_i(t) = 0$, $i \in \mathcal{V}$ | 1.289E−12 | 5.574E−16 | 2.961E−15 | 1.582E−14 | 4.159E−20 |
| $\phi_2(t) = 0.03\cos(t)$ | 4.756E+10 | 5.349E+09 | 2.263E−15 | 7.070E−16 | 8.436E−19 |
| $\phi_3(t) = U(-0.1, 0.1)$, $\phi_5(t) = 0.02\cos(t)$ | 1.950E−16 | 4.846E+10 | 1.581E+07 | 1.153E+09 | 6.592E+06 |
| $\phi_1(t) = 0.05\sin(t)$, $\phi_2(t) = 0.07\cos(t)\sin(t)$, $\phi_4(t) = U(-0.4, 0.4)$ | 8.740E+08 | 3.692E+08 | 1.396E+06 | 1.172E+04 | 2.453E+08 |

can be excluded from the list of suspected nodes (if at a low level of stringency), or marked as "suspected with lower fault probability", since $\|m_{54}\|_2 = 0$ and $\|m_{62}\|_2 = 0$. If there are nodes with zero node balances, similar actions can be performed.

7.3. MWFD with B-splines basis

Here the application of MWFD with the B-splines basis is demonstrated, where $K_i = 1$ and $M_i = M$ for all $i \in \mathcal{V}$. The following algorithm (Algorithm 3) implements the detection of the structure $1^{M+1} \otimes b_1 + b_2 \otimes 1^{M+1}$ ($b_1, b_2 \in \mathbb{R}^{M+1}$) in the weighted difference of the learned parameters $\hat{\theta}_{ij} - \alpha_{ij}\hat{\theta}_{ji}$.

Algorithm 3. Pseudocode for the detection of $1^{M+1} \otimes b_1 + b_2 \otimes 1^{M+1}$.
1: Input: $\hat{\theta}_{ij} - \alpha_{ij}\hat{\theta}_{ji}$
2: Output: $\varepsilon_{ij}$
3: $W \leftarrow \mathrm{reshape}(\hat{\theta}_{ij} - \alpha_{ij}\hat{\theta}_{ji},\, M+1,\, M+1)$
4: $\varepsilon_{ij} \leftarrow 0$
5: for each column $k \in \{2, \ldots, M+1\}$ of $W$ do
6: 	$\delta\omega \leftarrow W(:, k) - W(:, k-1)$
7: 	$\varepsilon_{ij} \leftarrow \varepsilon_{ij} + \|\delta\omega - \delta\omega(1)1^{M+1}\|_2$
8: end for

Note that the MATLAB function "reshape" is used, $W(:, k)$ denotes the $k$th column of matrix $W$, and $\delta\omega(k)$ denotes the $k$th element of the vector $\delta\omega$. As shown in the algorithm, $\varepsilon_{ij}$ depicts the degree to which $\hat{\theta}_{ij} - \alpha_{ij}\hat{\theta}_{ji}$ matches the structure $1^{M+1} \otimes b_1 + b_2 \otimes 1^{M+1}$ ($b_1, b_2 \in \mathbb{R}^{M+1}$), which is similar to the edge match defined in the experiment for shared-basis MWFD.

Let us consider the five-node network (Fig. 7(a)) for the B-splines-based MWFD, where the same set of connection functions as in Table 1 is used for the connections between the five nodes. Here, we let $M = 5$. Table 4 shows the computed $\varepsilon_{ij}$ for each edge under different fault scenarios. Note that, as a different implementation, the B-splines-based MWFD yields a different computational accuracy. For example, in shared-basis MWFD, 3.26E−21 is considered zero and 9.01E−05 is considered nonzero, while in B-splines-based MWFD, 1.289E−12 can also be accepted as zero, since nonzero entries mostly yield values such as 4.756E+10. The evaluation is similar to using edge matches in shared-basis MWFD. As shown in Table 4 and Fig.
7, when node i has fault, its adjacent connections will have εij distinguishably nonzero, thus verifying Corollary 3 for B-splines-based MWFD.
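The structure test in Algorithm 3 can be sketched in NumPy as follows. This is an illustrative rendering under our own naming (the function epsilon_match is hypothetical, not from the paper); MATLAB's column-major reshape is mirrored with order='F'. The key observation is that if W = 1·b1ᵀ + b2·1ᵀ (i.e., the vectorized parameters match 1 ⊗ b1 + b2 ⊗ 1), then every difference of adjacent columns of W is a constant vector, so subtracting its first entry yields zero.

```python
import numpy as np

def epsilon_match(theta_ij, theta_ji, alpha_ij, M):
    """Degree of mismatch between theta_ij - alpha_ij*theta_ji and the
    structure 1_{M+1} (x) b1 + b2 (x) 1_{M+1}; zero means a perfect match."""
    # Mirror MATLAB's column-major reshape with order='F'.
    W = np.reshape(theta_ij - alpha_ij * theta_ji, (M + 1, M + 1), order='F')
    eps = 0.0
    for k in range(1, M + 1):          # columns 2..M+1 in 1-based indexing
        d = W[:, k] - W[:, k - 1]      # difference of adjacent columns
        # For the target structure, d is a constant vector, so d - d(1)*1 = 0.
        eps += np.linalg.norm(d - d[0] * np.ones(M + 1))
    return eps

# Example: build a matrix that matches the structure exactly.
M = 3
b1 = np.array([1.0, -2.0, 0.5, 3.0])
b2 = np.array([0.2, 1.1, -0.7, 4.0])
W_struct = np.outer(np.ones(M + 1), b1) + np.outer(b2, np.ones(M + 1))
theta = W_struct.flatten(order='F')   # vectorize column-major
print(epsilon_match(theta, np.zeros_like(theta), 0.0, M))  # near-zero match
```

Perturbing any single entry of theta breaks the constant-column-difference property and makes the returned value distinguishably nonzero, which is exactly the behavior used in Table 4 to flag faulty nodes.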


C. Peng, Y. Zhou and Q. Hui / Journal of the Franklin Institute 356 (2019) 5754–5780

Fig. 7. B-Splines-based MWFD on a network of 4 nodes.

8. Conclusions and discussions

In this paper, the problem of extending the existing decomposition-based DFD framework to systems with time-varying network topology (so-called dynamic graphs) has been considered, and the two most important problems resulting from the time-varying topology have been identified and studied, namely the deterioration of decomposition schemes and the difficulty of training fault diagnosers for dynamic graphs. Accordingly, the distributed decomposition algorithms DD and DD2, and the MWFD method, have been proposed to solve these problems, respectively. The decomposition algorithm has been proven to converge in finite steps for static topologies, and shown by experiments to maintain a small number of subsystem interconnections for dynamic graphs (compared with using static decomposition schemes). In addition, two different implementations (i.e., shared-basis and B-splines neurons) of the MWFD method have been developed, and their effectiveness has been verified.

Since uncertainties are not considered in this paper, the study of the MWFD method is still at a qualitative stage. Future research directions for MWFD include studying and improving the robustness of the method with respect to uncertainties. Better distributed decomposition algorithms can also be explored, and further analysis of the decomposition algorithm can be carried out. It is also worth pointing out that, although observer-based methods are not the focus of this paper, the integration of the distributed decomposition algorithm with observer-based DFD over dynamic graphs is not trivial: an LFD observer not only needs to run its state dynamics equation, but also needs to adapt itself to the changing network. This can also be included in future studies.


Acknowledgment

This work was supported by the Defense Threat Reduction Agency under Grant HDTRA1-15-1-0070.