7th IFAC Workshop on Distributed Estimation and Control in Networked Systems
Groningen, NL, August 27-28, 2018

Available online at www.sciencedirect.com
IFAC PapersOnLine 51-23 (2018) 408–413
Space-Time Sampling for Network Observability

Hossein K. Mousavi ∗  Qiyu Sun ∗∗  Nader Motee ∗

∗ Mechanical Engineering and Mechanics Department, Lehigh University, Bethlehem, PA 18015, USA (e-mails: {mousavi,motee}@lehigh.edu)
∗∗ Department of Mathematics, University of Central Florida, Orlando, FL 32816, USA (e-mail: qiyu.sun@ucf.edu)

Abstract: We investigate under what conditions taking coarse samples from a network will contain the same information as a finer set of samples. Our goal is to estimate the initial state of a linear network of subsystems, which are distributed in a spatial domain, from noisy measurements. We develop a framework to produce feasible sets of spatio-temporal samples for the estimation problem, which essentially have a non-uniform space-time sampling pattern. If the number of sampling locations is comparable to the size of the network, the sampling pattern will have a high degree of redundancy. For these cases, using an efficient algorithm, we present a method for finding a feasible subset of samples that has a sparser space-time sampling pattern. It is shown that spatial samples can be traded for time samples: choosing to sample from a smaller set of subsystems must be compensated by taking more frequent time samples from those subsystems. Furthermore, we apply the Kadison-Singer paving solution to sparsify a feasible redundant sampling strategy with guaranteed estimation bounds. We support our theoretical findings via several numerical examples.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

1. INTRODUCTION

Estimation of the state of a linear system based on noisy observations is an important problem in control theory and dynamical systems. The notion of observability, which was introduced by Kalman, plays a significant role in this field and was key to a major advance in the field, the Kalman-Bucy filter (Kalman and Bucy (1961)), and its extensions such as the extended Kalman filter (Anderson and Moore (1979)) and the unscented Kalman filter (Julier and Uhlmann (2004)). The distributed version of the Kalman filter in the context of networks was introduced in Olfati-Saber (2007). Least-squares estimators for initial state estimation are also of paramount importance (Boyd (2008)).

On the other hand, developing parsimonious sampling strategies is one of the important problems in estimation and control of networked systems, as such strategies make network design problems more cost-effective due to their reduced sampling requirements. In fact, we may be interested in particular structures for the samples of the system in space and time. This, for instance, may be motivated by a limited number of communication channels, or by lower levels of energy consumption (which is crucial in mobile robotics). Therefore, we consider sampling from a network at certain locations and at certain times. For this purpose, we need to characterize the sampling strategies that provide us with a stable estimation of the initial state of the linear network. In this direction, the authors of Aldroubi et al. (2017) have highlighted the role of frame theory in the reconstruction of the initial state of a discrete-time linear system by space-time sampling, and the problem of sensor scheduling has been investigated in Tzoumas et al. (2016). In this paper, we derive necessary and sufficient conditions on the sampling locations and times of noisy observations from a continuous-time linear time-invariant network that ensure stable estimation of its initial state. Then, we demonstrate that under simple conditions on the sampling locations and times, we can build feasible sampling strategies equipped with their appropriate least-squares estimators. Furthermore, the estimation quality using these estimators is completely characterizable. Because in certain designs the number of space-time samples can be initially large, we look at randomized methods for reducing the number of these samples, i.e., finding feasible sparse subsets of a given sampling strategy. We have included several examples to help understand our theoretical contributions.

Notation: We use R and Z to denote the real and integer numbers, respectively. The set of standard basis vectors for R^n is denoted by {e_1, ..., e_n}. We use block capital letters to denote a matrix or a linear operator. The transpose of a matrix A is denoted by A^T. The identity matrix of appropriate size is denoted by I. For a positive semidefinite matrix X ∈ R^{n×n}, we use 0 ≤ λ_1(X) ≤ ... ≤ λ_n(X) to denote its ordered eigenvalues. Denoting j = √−1, we define 2πjZ := {2πjk | k ∈ Z}. For two vectors x, y ∈ R^n, ⟨x, y⟩ denotes their dot product. We use N(a, Σ) to denote a normal random variable with mean a and covariance matrix Σ.

2. PROBLEM STATEMENT

We consider linear dynamical networks with multiple subsystems and the state vector
$$x = [x_1, x_2, \ldots, x_n]^T,$$
where x_i ∈ R is the state variable of subsystem i for every i = 1, ..., n. These subsystems are interconnected and the overall dynamics is governed by
$$\dot{x} = \mathbf{A} x \tag{1}$$

⋆ This research is supported by NSF CAREER ECCS-1454022 and ONR YIP N00014-16-1-2645.
2405-8963 © 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2018.12.070
for a time-invariant matrix A. Let us assume that the initial state of the network x_0 ∈ R^n is unknown. Suppose that samples can only be collected from a subset of subsystems Ω = {i_1, i_2, ..., i_p} ⊂ {1, ..., n}, where Ω is called the set of sampling locations. At every spatial location i ∈ Ω, sensors are allowed to take a finite number of samples with different time stamps, where the finite set of such sampling times is denoted by Θ_i. The set
$$\mathcal{S} = \{(i, t) \mid i \in \Omega,\ t \in \Theta_i\}$$
is called a sampling strategy. For a given sampling strategy S, the corresponding vector of observations is
$$y = \big[x_i(t) + \xi_i(t)\big]_{(i,t) \in \mathcal{S}}, \tag{2}$$
where the measurement noises ξ_i(t) in all samples are assumed to be independent from each other and to have normal (Gaussian) distributions with zero mean and variance σ². For a given set of sampling locations Ω, the corresponding output matrix is denoted by
$$\mathbf{C}_\Omega = [e_{i_1} \,|\, \ldots \,|\, e_{i_p}]^T.$$
Assumption 1. The set of sampling locations Ω is chosen such that the pair (A, C_Ω) is observable.
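For illustration, the following minimal Python sketch (our own; the matrix and index set are illustrative, not from the paper) builds C_Ω for a set of sampling locations and checks Assumption 1 with the standard Kalman rank test.

```python
import numpy as np

def output_matrix(Omega, n):
    """Rows of the identity indexed by the sampling locations Omega."""
    C = np.zeros((len(Omega), n))
    for row, i in enumerate(Omega):
        C[row, i] = 1.0
    return C

def is_observable(A, C):
    """Kalman rank test: rank [C; CA; ...; CA^{n-1}] == n."""
    n = A.shape[0]
    blocks = [C @ np.linalg.matrix_power(A, k) for k in range(n)]
    return np.linalg.matrix_rank(np.vstack(blocks)) == n

# Example: sample only the first subsystem of a 2-state rotation network.
A = np.array([[0.0, -1.0], [1.0, 0.0]])
C = output_matrix([0], 2)          # Omega = {1} with 0-based indexing
print(is_observable(A, C))         # True
```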
The research problem is to characterize properties of sampling strategies that allow us to estimate the initial state of the linear network (1) using sparse sets of spatio-temporal samples. In fact, we consider the following specific objectives for this paper: (i) to characterize the conditions on the sampling strategies that enable us to estimate the initial state of the network x_0 from the noisy observations y, (ii) to provide a way of constructing sampling strategies such that we can quantify the quality of the resulting estimation, and (iii) to develop methods that ensure scalability and computational efficiency.

3. CHARACTERIZATION OF SAMPLING STRATEGIES

We apply tools from finite frame theory to reformulate the observability problem and investigate properties of estimation using these sampling strategies.

Vector Reconstruction in Frame Theory: The contents of this subsection are based on adjusted materials from the reference Casazza et al. (2013).
Definition 2. Given a family of vectors (φ_i)_{i=1}^m, the corresponding analysis operator T : R^n → R^m is
$$\mathbf{T}(x) := \big[\langle x, \phi_i \rangle\big]_{i=1,\ldots,m},$$
and its frame operator S : R^n → R^n is
$$\mathbf{S}(x) := \sum_{i=1}^{m} \langle x, \phi_i \rangle\, \phi_i.$$
The operator T has the canonical matrix representation T = [φ_1 | ... | φ_m]^T ∈ R^{m×n}. Moreover, the canonical matrix representation of the frame operator S, namely the frame matrix, is given by
$$\mathbf{S} = \mathbf{T}^T \mathbf{T} \in \mathbb{R}^{n \times n}. \tag{3}$$
Definition 3. A family of vectors (φ_i)_{i=1}^m in R^n is a frame¹ for R^n if there exist constants 0 < α ≤ β such that
$$\alpha \|x\|_2^2 \;\leq\; \sum_{i=1}^{m} |\langle x, \phi_i \rangle|^2 \;\leq\; \beta \|x\|_2^2$$
for all x ∈ R^n.

¹ When we use the notation (φ_i)_{i=1}^m for a family of vectors φ_1, ..., φ_m, we implicitly allow repeated elements in our representation.
The largest lower frame bound and the smallest upper frame bound are called the optimal frame bounds.
Proposition 4. Let us consider a family of vectors Φ = (φ_i)_{i=1,...,m} in R^n. The following statements are equivalent:
(i) The family of vectors Φ is a frame for R^n.
(ii) The family of vectors Φ spans R^n. Thus, m = |Φ| ≥ n.
(iii) The corresponding frame matrix is positive definite, i.e., S ≻ 0, with optimal bounds α = λ_1(S) and β = λ_n(S).
(iv) The corresponding analysis operator T is injective² with a pseudo-inverse
$$\mathbf{T}^\dagger := (\mathbf{T}^T \mathbf{T})^{-1} \mathbf{T}^T = \mathbf{S}^{-1} \mathbf{T}^T, \tag{4}$$
which is a left-inverse satisfying T†T = I.
Let us now consider a canonical problem where the goal is to reconstruct a vector x ∈ R^n from a vector of observations characterized as
$$y = \mathbf{T} x = \big[\langle x, \phi_i \rangle\big]_{i=1,\ldots,m} \in \mathbb{R}^m. \tag{5}$$
In the following proposition, we recall that T† is a choice for the reconstruction of vector x.
Proposition 5. If the family of vectors Φ is a frame for R^n, then any vector x ∈ R^n corresponding to the observations y ∈ R^m given by (5) can be reconstructed according to
$$x = \mathbf{T}^\dagger y, \tag{6}$$
where T is the analysis operator associated with Φ.
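As a quick numerical illustration of Definition 2 and Propositions 4-5 (a hedged sketch with arbitrary vectors of our choosing, not the authors' code), one can verify the frame bounds via the eigenvalues of S = T^T T and reconstruct x with the pseudo-inverse:

```python
import numpy as np

phi = [np.array([1.0, 0.0]),
       np.array([0.0, 1.0]),
       np.array([1.0, 1.0])]           # m = 3 illustrative vectors in R^2

T = np.vstack(phi)                     # analysis matrix, rows are phi_i^T
S = T.T @ T                            # frame matrix, as in (3)
eigs = np.linalg.eigvalsh(S)
alpha, beta = eigs[0], eigs[-1]        # optimal frame bounds, Prop. 4(iii)
assert alpha > 0, "not a frame: S is singular"

x = np.array([2.0, -1.0])
y = T @ x                              # observations, as in (5)
x_rec = np.linalg.pinv(T) @ y          # reconstruction x = T^dagger y, (6)
print(np.allclose(x, x_rec))           # True
```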
Initial State Reconstruction: The solution of the linear network (1) is given by x(t) = e^{At} x_0, where its i'th component is
$$x_i(t) = e_i^T e^{\mathbf{A}t} x_0 = \langle x_0, e^{\mathbf{A}^T t} e_i \rangle. \tag{7}$$
From the definition of a frame, (7) reveals that the following families of vectors are the only candidates for reconstructing the initial state of the network.
Theorem 6. Suppose that Ω is the set of sampling locations and Θ_i is the set of sampling times for each i ∈ Ω. Every initial state of the linear network (1) can be reconstructed from the set of samples collected according to the sampling strategy S = {(i, t) | i ∈ Ω, t ∈ Θ_i} if and only if the following family of vectors is a frame for R^n:
$$\Phi = \left\{ e^{\mathbf{A}^T t} e_i \;\middle|\; (i, t) \in \mathcal{S} \right\}. \tag{8}$$
This result asserts that the initial state of the network can be recovered from the vector of observations y = T x_0 = [x_i(t)]_{(i,t)∈S} using the equation x_0 = T† y, where T is the analysis matrix of the frame (8).
Remark 7. A frame for R^n must contain at least n vectors. Hence, the number of components in the frame (8) satisfies
$$\sum_{i \in \Omega} |\Theta_i| \geq n.$$
This inequality implies that taking fewer spatial samples must be compensated by taking more temporal samples.

² This implies that its matrix representation has full column rank.
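A short sketch of Theorem 6 and the reconstruction x_0 = T†y (the system and sampling strategy below are our illustrative choices, not from the paper):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, -1.0], [1.0, 0.0]])        # illustrative state matrix
n = A.shape[0]
strategy = [(0, 0.3), (0, 1.1)]                # S = {(i, t)}: one location

# Frame vectors phi = e^{A^T t} e_i, as in (8).
Phi = [expm(A.T * t)[:, i] for (i, t) in strategy]
T = np.vstack(Phi)

x0 = np.array([1.0, 2.0])
y = np.array([(expm(A * t) @ x0)[i] for (i, t) in strategy])   # x_i(t)
x0_rec = np.linalg.pinv(T) @ y
print(np.allclose(x0, x0_rec))                 # True when Phi is a frame
```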
This hints at an inherent tradeoff between the minimum number of samples in space and time required for successful initial state reconstruction. It turns out that observability at the sampling locations is a necessary condition for the reconstruction problem.
Theorem 8. Suppose that the family of vectors Φ in (8) is a frame for R^n. Then, the pair (A, C_Ω) must be observable.
As a result, if the sampling locations Ω create an unobservable output matrix C_Ω, then the initial state reconstruction remains infeasible for all choices of sampling times.

Initial State Estimation from Noisy Observations: Instead of the pure measurements (5), suppose that a noisy observation vector is collected,
$$\hat{y} = y + \xi \in \mathbb{R}^m, \tag{9}$$
in which ξ ∈ R^m is a zero-mean Gaussian measurement noise with independent components and covariance E{ξξ^T} = σ²I. For the linear network (1), equation (9) can be rewritten in the form
$$\hat{y} = \mathbf{T} x_0 + \xi. \tag{10}$$
Let us denote an estimate of x_0 by x̂_0 and define the corresponding estimation error as
$$\eta := \hat{x}_0 - x_0. \tag{11}$$
In order to quantify the quality of the estimation resulting from a given sampling strategy (or, equivalently, frame), we adopt the standard deviation of the estimation error as our estimation measure, which is defined by
$$\rho(\Phi) = \left( \mathbb{E}\{\|\eta\|_2^2\} \right)^{1/2}.$$
This measure has been widely used to compute an optimal estimation via least-squares approximation; we refer to Goyal et al. (2001) for detailed discussions.
Lemma 9. Suppose that a noisy observation vector ŷ as in (10) is given. Then,
$$\hat{x}_0 = \mathbf{T}^\dagger \hat{y} \tag{12}$$
is an unbiased estimator for x_0 with E{x̂_0} = x_0. Moreover, the estimation measure ρ(Φ) is monotone³ and can be characterized as
$$\rho(\Phi) = \sigma \left( \sum_{i=1}^{n} \lambda_i(\mathbf{S})^{-1} \right)^{1/2}, \tag{13}$$
where λ_1(S), ..., λ_n(S) are the eigenvalues of the corresponding frame matrix S.
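The closed form (13) can be checked by Monte-Carlo simulation; the following hedged sketch (our own construction, with an arbitrary random frame) compares the formula against the empirical error of the estimator (12):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(12, 3))          # analysis matrix of a random frame
S = T.T @ T
sigma = 0.1
rho_formula = sigma * np.sqrt(np.sum(1.0 / np.linalg.eigvalsh(S)))  # (13)

x0 = rng.normal(size=3)
T_dag = np.linalg.pinv(T)
sq_errs = []
for _ in range(20000):
    y_hat = T @ x0 + sigma * rng.normal(size=12)    # noisy samples, (10)
    sq_errs.append(np.sum((T_dag @ y_hat - x0) ** 2))
print(rho_formula, np.sqrt(np.mean(sq_errs)))       # should nearly agree
```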
Construction of Observability Frames: Let us denote the distinct eigenvalues of the state matrix A by λ_1, ..., λ_q for some q ≤ n and its corresponding minimal polynomial⁴ by
$$p_{\mathbf{A}}(\lambda) = \prod_{i=1}^{q} (\lambda - \lambda_i)^{p_i}$$
for some positive integers p_i. To state our next result, we need to define the row vector map
$$\mathbf{E}(t) := \left[ e^{\lambda_i t}\, t^k \right]_{\substack{i=1,\ldots,q \\ k=0,\ldots,p_i-1}} \in \mathbb{C}^{1 \times \tilde{n}}, \tag{14}$$
where ñ ≤ n is the degree of the polynomial p_A(λ).

³ An operator ρ : Φ → R_+ is called monotone if ρ(Φ_2) ≤ ρ(Φ_1) for every two frames that satisfy Φ_1 ⊆ Φ_2.
⁴ This should not be confused with the characteristic polynomial of a matrix, which is always of degree n and only in certain cases coincides with the minimal polynomial (Burrow (1973)).
Theorem 10. Suppose that a sampling strategy S = {(i, t) | i ∈ Ω, t ∈ Θ_i} is adopted such that:

• at every i ∈ Ω, M_i := |Θ_i| ≥ ñ samples are collected;
• E_i has full column rank, where
$$\mathbf{E}_i := [\mathbf{E}(t)]_{t \in \Theta_i} \in \mathbb{C}^{M_i \times \tilde{n}}. \tag{15}$$

Then, under Assumption 1, the family of vectors
$$\Phi = \left\{ e^{\mathbf{A}^T t} e_i \;\middle|\; (i, t) \in \mathcal{S} \right\} \tag{16}$$
forms a frame for R^n.
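A sketch of the rank check in Theorem 10, under the simplifying assumption of distinct eigenvalues (so all p_i = 1 and ñ = q); the row map follows (14), and the system below is our illustrative choice:

```python
import numpy as np

def E_row(t, eigs, mults):
    """Row map E(t) = [e^{lambda_i t} t^k], k = 0..p_i - 1, as in (14)."""
    return np.concatenate([[np.exp(lam * t) * t ** k for k in range(p)]
                           for lam, p in zip(eigs, mults)])

A = np.array([[0.0, -1.0], [1.0, 0.0]])
eigs = np.linalg.eigvals(A)        # +j and -j, distinct
mults = [1, 1]                     # exponents in the minimal polynomial
Theta_i = [0.25, 0.9]              # M_i = n_tilde = 2 sampling times

E_i = np.array([E_row(t, eigs, mults) for t in Theta_i])    # as in (15)
print(np.linalg.matrix_rank(E_i) == E_i.shape[1])           # full column rank
```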
Any frame for R^n must have at least n vectors. Hence, prior to the application of Theorem 10, a necessary condition on the total number of sampling times is
$$|\mathcal{S}| = \sum_{i \in \Omega} |\Theta_i| \geq n.$$
On the other hand, Theorem 10 requires |Θ_i| ≥ ñ. Comparing these two requirements implies that the resulting frame Φ will have many redundant elements as the number of locations |Ω| increases. This motivates our investigation in the next section, where we seek scalable algorithms to construct sparse frames (in space and time) out of highly redundant observability frames.

The set of all time instances for which E_i is not of full column rank has zero Lebesgue measure in the corresponding design space. In fact, Theorem 10 suggests that one can comfortably skip the rank verification step.
Theorem 11. For a given τ > 0, suppose that the sampling times in Theorem 10 are drawn randomly and independently from the uniform distribution over [0, τ]. Then, with probability 1, the family of vectors in (16) is a frame for R^n.
Example 12. Let us consider the two-dimensional system
$$\dot{x} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} x. \tag{17}$$
We choose to sample only from the first subsystem, i.e., Ω = {1}, C_Ω = [1 0]. Thus, (A, C_Ω) is observable and
$$e^{\mathbf{A}t} = \begin{bmatrix} \cos(t) & -\sin(t) \\ \sin(t) & \cos(t) \end{bmatrix}.$$
Let us pick sampling times t_1, t_2 ∈ [0, τ], i.e., M_1 = 2. The corresponding family of vectors is
$$\Phi = \left\{ e^{\mathbf{A}^T t_1} e_1,\; e^{\mathbf{A}^T t_2} e_1 \right\} = \left\{ \begin{bmatrix} \cos(t_1) \\ -\sin(t_1) \end{bmatrix}, \begin{bmatrix} \cos(t_2) \\ -\sin(t_2) \end{bmatrix} \right\}.$$
In this case, matrix (15) is
$$\mathbf{E}_1 = \begin{bmatrix} e^{j t_1} & e^{-j t_1} \\ e^{j t_2} & e^{-j t_2} \end{bmatrix} \;\Rightarrow\; \det(\mathbf{E}_1) = 2j \sin(t_1 - t_2).$$
Hence, according to Theorem 10, if t_1 − t_2 ≠ kπ for all k ∈ Z, then Φ is a frame. Alternatively, if we compute the frame matrix S using trigonometric identities, we get det(S) = sin²(t_1 − t_2), which gives us the same constraints on the sampling times. Since the Lebesgue measure of the points for which sin²(t_1 − t_2) = 0 is indeed zero in [0, τ] × [0, τ], any random choices for t_1 and t_2 will result in a frame with probability 1. The latter observation agrees with Theorem 11. By induction, for sampling times Θ_1 = {t_1, ..., t_M}, we have
$$\det(\mathbf{S}) = \sum_{\substack{i=1,\ldots,M \\ j=i+1,\ldots,M}} \sin^2(t_i - t_j). \tag{18}$$
Again, if t_i − t_j ≠ kπ for k ∈ Z, we get a frame out of these observations. Random sampling also results in a frame with probability 1. Moreover, for M = 2, ρ(Φ) = √2 σ/√det(S).
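The identity (18) can be verified numerically; the following sketch (our own check, which rests on the Cauchy-Binet formula) confirms it for random sampling times:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
times = rng.uniform(0.0, 5.0, size=5)          # random sampling times
Phi = np.array([[np.cos(t), -np.sin(t)] for t in times])

S = Phi.T @ Phi                                 # frame matrix
lhs = np.linalg.det(S)
rhs = sum(np.sin(ti - tj) ** 2 for ti, tj in combinations(times, 2))
print(np.isclose(lhs, rhs))                     # True
```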
Now, we briefly look at periodic sampling strategies, i.e., Θ_i = {0, δ, ..., (M−1)δ} for every i ∈ Ω, where δ > 0 is a known sampling step-size.
Theorem 13. Suppose that the sampling step-size satisfies
$$(\lambda_i - \lambda_k)\,\delta \notin 2\pi j \mathbb{Z} \tag{19}$$
for all distinct eigenvalues λ_i and λ_k of the state matrix A. If M ≥ ñ, then the family of vectors
$$\Phi = \left\{ \mathbf{B}_\delta^k\, e_i \;\middle|\; i \in \Omega,\ k = 0, \ldots, M-1 \right\}, \tag{20}$$
where B_δ := e^{A^T δ}, forms a frame for R^n.
Theorem 14. If all eigenvalues of A are real, then for every step-size δ > 0 and sampling horizon M ≥ ñ, the family of vectors (20) is a frame for R^n.
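A small hedged utility (ours, not from the paper) that checks the step-size condition (19) numerically for the eigenvalues of a given state matrix:

```python
import numpy as np

def step_size_ok(eigs, delta, tol=1e-9):
    """True if (lambda_i - lambda_k) * delta avoids 2*pi*j*Z, as in (19)."""
    for i, li in enumerate(eigs):
        for lk in eigs[i + 1:]:
            z = (li - lk) * delta
            m = abs(z.imag) % (2 * np.pi)
            if abs(z.real) < tol and min(m, 2 * np.pi - m) < tol:
                return False
    return True

eigs = np.linalg.eigvals(np.array([[0.0, -1.0], [1.0, 0.0]]))   # +j, -j
print(step_size_ok(eigs, 0.7))       # True:  2j*0.7 is not in 2*pi*j*Z
print(step_size_ok(eigs, np.pi))     # False: 2j*pi  is in 2*pi*j*Z
```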
Example 12 (Continued). A sufficient condition on the sampling step-size for the linear system (17) is
$$(j - (-j))\,\delta = 2j\delta \notin 2\pi j \mathbb{Z} \;\Rightarrow\; \delta \notin \pi \mathbb{Z}.$$
Alternatively, because t_i − t_j = (i − j)δ, using the expression for det(S) in (18), Φ is a frame if δ ∉ πZ.

4. SPARSIFICATION OF SAMPLING STRATEGIES

As discussed in the previous section, sampling strategies obtained via certain methods may result in extremely redundant frames, i.e., frames with Σ_{i∈Ω} |Θ_i| ≫ n. Thus, it is desirable to sparsify a given strategy in order to reduce sensing and computational requirements. The goal of this section is to solve the following problem: given a sampling strategy S with underlying observability frame Φ, find a new sampling strategy S_s with underlying frame Φ_s such that: (i) S_s ⊂ S, (ii) |S_s| < |S|, and (iii) Φ_s is a frame for R^n, i.e., the frame matrix S_s satisfies
$$\mathbf{S}_s := \sum_{\phi_i \in \Phi_s} \phi_i \phi_i^T \succ 0. \tag{21}$$
Sparsification by Leverage Scores: Our approach is based on sparsification via graph effective resistances, originally proposed by Spielman and Srivastava (2011), who developed a sparsification method for weighted Laplacians that depends on the concentration properties of sums of random outer products. The key similarity between a graph Laplacian and the frame matrix S is that both can be written as a sum of rank-one matrices, i.e.,
$$\mathbf{S} = \sum_{\phi_i \in \Phi} \phi_i \phi_i^T.$$
A summary of our frame sparsification method is given in Algorithm 1, where we employ the notion of leverage scores, defined by
$$R_i := \phi_i^T \mathbf{S}^{-1} \phi_i > 0 \quad \text{for all } \phi_i \in \Phi, \tag{22}$$
to calculate sampling probabilities. Algorithm 1 is well defined, as P_i := R_i/n ≥ 0 and
$$\sum_{i=1}^{|\Phi|} P_i = \frac{1}{n} \sum_{i=1}^{|\Phi|} \mathrm{Tr}(\mathbf{S}^{-1} \phi_i \phi_i^T) = 1.$$
The following theorem asserts that Algorithm 1 results in a sampling strategy whose size is almost linear in the network size, with a success rate greater than 50%.
Theorem 15. Suppose that Algorithm 1 is executed for some ε ∈ (1/√n, 1). Then, S_s is positive definite and Φ_s is a frame for R^n with probability at least 0.5.
Algorithm 1 Sparsification via Randomization
Input: sampling strategy S, parameter ε > 0
Output: sparsified strategy S_s and family of vectors Φ_s
Initialize: q = O(n log n / ε²), S_s = 0, S_s = ∅, Φ_s = ∅
for i = 1 to q do
  randomly sample φ_i with probabilities P_i = R_i/n
  if φ_i ∉ Φ_s then
    add φ_i to Φ_s and the associated (i, t) to S_s
    update the frame matrix S_s ← S_s + φ_i φ_i^T
  end if
end for
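A hedged Python sketch of Algorithm 1 follows; the constant c in q = O(n log n/ε²) is our assumption (the paper does not fix it), and the random frame in the usage example is illustrative.

```python
import numpy as np

def sparsify(frame, eps, c=4.0, rng=None):
    """Randomized frame sparsification via leverage-score sampling."""
    rng = rng or np.random.default_rng()
    m, n = frame.shape
    S_inv = np.linalg.inv(frame.T @ frame)
    R = np.einsum('ij,jk,ik->i', frame, S_inv, frame)   # leverage scores (22)
    P = R / n                                           # probabilities, sum to 1
    q = int(np.ceil(c * n * np.log(n) / eps ** 2))      # q = O(n log n / eps^2)
    idx = sorted(set(rng.choice(m, size=q, p=P)))       # keep unique draws only
    return idx, frame[idx, :]

# Usage: sparsify a redundant random frame and verify the result is a frame.
rng = np.random.default_rng(2)
frame = rng.normal(size=(500, 10))
idx, frame_s = sparsify(frame, eps=0.8, rng=rng)
print(len(idx), np.all(np.linalg.eigvalsh(frame_s.T @ frame_s) > 0))
```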
Fig. 1. The worst-case relative performance degradation based on the bound of Theorem 17.

Space-Time Tradeoff: Let us define the fraction
$$f_i := |\Theta_i(\Phi_s)| \,/\, |\Theta_i(\Phi)|, \tag{23}$$
where Φ is a highly redundant frame and Φ_s is a sparsified frame produced by feeding Φ to Algorithm 1. Suppose that |Θ_i(Φ)| = M for each sampling location i ∈ Ω in Φ. Next, let us define the average quantity
$$\bar{f} := \frac{1}{|\Omega|} \sum_{i \in \Omega} f_i. \tag{24}$$
The number of samples drawn in Algorithm 1 is q, while the number of unique samples is q_u = |Φ_s| (elements of Φ_s correspond to unique space-time combinations by construction). Since q_u ≤ q, for some c > 0,
$$\mathbb{E}\{q_u\} = \mathbb{E}\{\bar{f} M |\Omega|\} \leq q = c\, n \log n / \epsilon^2.$$
This implies that |Ω| · E{f̄} ≤ c n log n/(Mε²). If the number of elements of Φ is large enough, the probability of selecting repeated elements through the randomized method decreases. This can be observed via appropriate simulations as well (see Fig. 3 in Example 18). Therefore, in practice q − E{q_u} shrinks and the following approximation can be inferred:
$$|\Omega| \cdot \mathbb{E}\{\bar{f}\} \approx \frac{c\, n \log n}{M \epsilon^2} \quad \text{for large enough } |\Phi|. \tag{25}$$
This illustrates a tradeoff between the number of sampling locations |Ω| and the average fraction of sampling times per location f̄ in Φ_s. For a given network and fixed number of sampling times M and sparsification parameters ε and c, if the number of sampling locations |Ω| in Φ_s decreases, the average fraction of sampling times in Φ_s relative to Φ has to increase to compensate for the total number of samples.
While Algorithm 1 gives us a frame with probability more than 50% and O(n log n/ε²) vectors, its drawback is the lack of estimates on the estimation quality of the sparsified strategy. In the following subsection, we recall a method that offers spectral bounds under certain conditions.

Sparsification and the Kadison-Singer Paving Solution: Under conditions on the maximum leverage scores, we can partition the components of the frame in a balanced way.
Fig. 2. Spatio-temporal samples of frames Φ and Φ_s, where |Φ| = 1760 and |Φ_s| = 218 (see Example 18).

The next theorem is based on Corollary 1.3 of the seminal paper Marcus et al. (2013).
Theorem 16. Suppose that the leverage scores in (22) satisfy R_i ≤ r. Let us randomly partition Φ into two subfamilies Φ_1 and Φ_2 such that every element of Φ, independent of the others, belongs to either of the partitions with probability 1/2. Then, with positive probability, the resulting partition will satisfy, for j = 1 and 2,
$$\left(1 - \frac{(1 + \sqrt{2r})^2}{2}\right) \mathbf{S} \;\preceq\; \sum_{\phi_i \in \Phi_j} \phi_i \phi_i^T \;\preceq\; \left(1 + \frac{(1 + \sqrt{2r})^2}{2}\right) \mathbf{S}.$$
Let us consider a highly redundant observation frame Φ with estimation measure ρ(Φ). Theorem 16 allows us to deduce the next result.
Theorem 17. Suppose that the leverage scores in (22) satisfy R_i ≤ r < 1.5 − √2. Then, the resulting subfamilies Φ_1 and Φ_2 from Theorem 16 are both frames for R^n. Moreover, the relative performance degradation is bounded by
$$\big( \rho(\Phi_j) - \rho(\Phi) \big) / \rho(\Phi) \leq \mu(r) \tag{26}$$
for j = 1, 2, where the function μ(r) is defined as
$$\mu(r) := \left( 1 - (\sqrt{2r} + 1)^2 / 2 \right)^{-1/2} - 1.$$
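The following sketch (ours, with an arbitrary random frame) illustrates Theorems 16-17 numerically: it randomly halves a redundant frame with small leverage scores and compares the observed relative degradation of ρ against the bound μ(r):

```python
import numpy as np

rng = np.random.default_rng(3)
frame = rng.normal(size=(1000, 8))       # redundant frame, m >> n
S = frame.T @ frame
R = np.einsum('ij,jk,ik->i', frame, np.linalg.inv(S), frame)
r = R.max()                              # max leverage score, << 1.5 - sqrt(2)

mu = (1.0 - (np.sqrt(2.0 * r) + 1.0) ** 2 / 2.0) ** -0.5 - 1.0  # bound in (26)

def rho(F, sigma=1.0):                   # estimation measure, as in (13)
    return sigma * np.sqrt(np.sum(1.0 / np.linalg.eigvalsh(F.T @ F)))

mask = rng.random(1000) < 0.5            # each vector to a side w.p. 1/2
for half in (frame[mask], frame[~mask]):
    print((rho(half) - rho(frame)) / rho(frame), "<=", mu)
```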
The quantity μ(r) is a worst-case bound on the relative performance degradation of the sparse frames Φ_1 and Φ_2. This function is illustrated in Fig. 1. In simulations, we observe that considerably better bounds are achievable.

5. NUMERICAL EXAMPLE

Example 18. We consider a network whose coupling matrix A = [A_ij] is constructed as follows. Each subsystem is assigned a random position in [0, 1]² ⊂ R². We denote the distance between subsystems i and j by dist(i, j). We set
$$A_{ij} = \begin{cases} c_{ij}\, e^{-a\, \mathrm{dist}(i,j)^b}, & \mathrm{dist}(i, j) \leq d/2, \\ 0, & \mathrm{dist}(i, j) > d/2, \end{cases} \tag{27}$$
where a, d > 0 and b ∈ (0, 1) are given parameters. The coefficients c_ij are independently and randomly chosen from N(0, 1). For n = 40, a = 1, b = 0.5, and d = 0.6, we create a matrix A that has both stable and unstable modes.
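A sketch of the construction (27); the parameter values follow the text, while the code itself is our illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, a, b, d = 40, 1.0, 0.5, 0.6
pos = rng.uniform(size=(n, 2))                   # positions in [0, 1]^2

A = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dist = np.linalg.norm(pos[i] - pos[j])
        if dist <= d / 2:
            A[i, j] = rng.normal() * np.exp(-a * dist ** b)  # c_ij ~ N(0, 1)

# With Gaussian couplings, the spectrum typically straddles the imaginary axis.
eig_real = np.linalg.eigvals(A).real
print(eig_real.min() < 0 < eig_real.max())       # stable and unstable modes
```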
Random Frame: We sample from all subsystems, thus Ω = {1, ..., n}. We select the parameters τ = 0.12 and M_i = 44 and use Theorem 11 to randomly construct Θ_i and the frame Φ as in (16). This frame has nM = 1760 components, with the space-time representation illustrated by blue dots in Fig. 2. For a noise intensity of σ = 0.1, we get ρ(Φ) ≈ 0.0964.

Fig. 3. The mean fraction of time in the sparsified frame relative to the redundant frame (see Example 18).

Frame Sparsification: We find a sparsified frame Φ_s using Theorem 15 for ε = 0.8 and q = 230 samples. The size of Φ_s is q_u = 218. Thus, the mean ratio of the sampling times in the sparse strategy to the original strategy is f̄ = q_u/(nM) ≈ 0.124. The locations and times of the sampling strategy corresponding to the sparse frame Φ_s are illustrated by black circles in Fig. 2. The performance index of the estimation is ρ(Φ_s) ≈ 0.3056. Thus, the number of space-time samples has decreased by 87%, and the price is a performance degradation of about 217%.

Space-Time Tradeoff in Sparse Frames: For different numbers of sampling times M_i = M = 40, 42, ..., 80, we create dense frames. Then, we run the sparsification algorithm of Theorem 15 and find 50 different successful samples of the sparsified frame Φ_s for each frame. We compute the empirical value of the mean fraction f̄ defined in (24). In Fig. 3, we observe that as M increases, the approximation (25) becomes tighter, because the ratio of q_u = |Φ_s| to the number of random samples q in Algorithm 1 approaches 1.

Performance-Locality Tradeoff: For the fixed coupling parameters a = 1 and b = 0.5 used in (27), we study the effect of the locality diameter d for d = 0.4, 0.8, 1.2. For each value of d, we vary the number of samples from M = 62 to M = 88. Moreover, we scale up the time horizon τ proportionally to M. For each combination of d and M, we create 10 different random frames based on Theorem 11. Then, we compute the mean value of the estimation measure ρ(Φ).
Fig. 4. The change in the performance of the redundant frames vs. the localization parameter d (see Example 18).

Fig. 5. Performance degradations (see Example 18).

In Fig. 4, we observe that for more localized networks (i.e., networks with smaller connectivity diameter d), the same quality of estimation can be realized with a smaller number of samples in space and time.

Random Partitioning: Reconsider the highly redundant frame Φ that we built (with the space-time representation shown in Fig. 2). Consider the random partitioning of this dense frame into two subframes Φ_1 and Φ_2 according to Theorem 16. We repeat this random partitioning for 5 × 10⁵ different experiments, and for each case we record the minimum and maximum relative degradation of the two resulting subframes Φ_1 and Φ_2; i.e., each experiment gives us
$$\max_{j=1,2} \frac{\rho(\Phi_j) - \rho(\Phi)}{\rho(\Phi)}, \qquad \min_{j=1,2} \frac{\rho(\Phi_j) - \rho(\Phi)}{\rho(\Phi)}.$$
Fig. 6. Estimation measures of the sparsified frames by Algorithm 1 and the greedy algorithm (see Example 18).
The histogram of this data is depicted in Fig. 5. In this example, the minimum and maximum degradations are less than 0.55 with high probability, whereas the theoretical estimate from Theorem 17 is
$$r = \max_{\phi_i \in \Phi} R_i \approx 0.0331 \;\Rightarrow\; \mu(r) \approx 1.1848.$$
Thus, in practice, what we achieve can be much better than the bound in (26). In this case, for an almost 50% decrease in the number of space-time samples, we incurred a performance loss of less than 55% (in most outcomes of the experiments).

Randomized vs. Greedy: Next, we compare the estimation measure of the sparsified frames resulting from Algorithm 1 with that of a greedy sparsification method. In Algorithm 1, we vary the value of ε in (1/√n, 1] and create 25 different sparsified frames for each value of ε. Then, we record the estimation measure of the best sparsified frame. On the other hand, we use the greedy algorithm to get a frame with a similar number of observations. In Fig. 6, we compare the estimation quality of these methods against each other; it turns out that the randomized method gives us sensing strategies with values of the estimation measure close to those of the frames found using the greedy method. It took about 1.68 seconds and 26.71 seconds to conduct the sparsifications using the randomized method (25 experiments per point) and the greedy method, respectively (on a PC).

6. CONCLUSION

We can spread out the noisy observations from certain network subsystems in a flexible manner in space and time, and still be able to estimate the initial state with a known estimation quality. The role of frame theory in the formulations has been highlighted. Moreover, if the sensing and computational burden of a design is heavy, randomized sparsification is an efficient approach for increasing the computational efficiency of the estimation. Note that the suitability of different sparsification methods should be inferred based on the size of the network.

REFERENCES
Aldroubi, A., Cabrelli, C., Molter, U., and Tang, S. (2017). Dynamical sampling. Applied and Computational Harmonic Analysis, 42(3), 378–401.
Anderson, B.D. and Moore, J.B. (1979). Optimal Filtering. Prentice-Hall, Englewood Cliffs, NJ.
Boyd, S. (2008). Lecture notes for EE263: Introduction to Linear Dynamical Systems. Stanford University.
Burrow, M. (1973). The minimal polynomial of a linear transformation. The American Mathematical Monthly, 80(10), 1129–1131.
Casazza, P.G., Kutyniok, G., and Philipp, F. (2013). Introduction to finite frame theory. In Finite Frames, 1–53. Springer.
Goyal, V.K., Kovačević, J., and Kelner, J.A. (2001). Quantized frame expansions with erasures. Applied and Computational Harmonic Analysis, 10(3), 203–233.
Julier, S.J. and Uhlmann, J.K. (2004). Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3), 401–422.
Kalman, R.E. and Bucy, R.S. (1961). New results in linear filtering and prediction theory. Journal of Basic Engineering, 83(1), 95–108.
Marcus, A., Spielman, D.A., and Srivastava, N. (2013). Interlacing families II: Mixed characteristic polynomials and the Kadison-Singer problem. arXiv preprint arXiv:1306.3969.
Olfati-Saber, R. (2007). Distributed Kalman filtering for sensor networks. In Proceedings of the 46th IEEE Conference on Decision and Control, 5492–5498. IEEE.
Spielman, D.A. and Srivastava, N. (2011). Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6), 1913–1926.
Tzoumas, V., Jadbabaie, A., and Pappas, G.J. (2016). Near-optimal sensor scheduling for batch state estimation: Complexity, algorithms, and limits. In Proceedings of the 55th IEEE Conference on Decision and Control (CDC), 2695–2702. IEEE.