Signal Processing 25 (1991) 113-133, Elsevier
The SVD and reduced rank signal processing

Louis L. Scharf

Department of Electrical and Computer Engineering, University of Colorado, Boulder 80309-0425, USA

Received 13 June 1990
Revised 18 June 1991
Abstract. The basic ideas of reduced-rank signal processing are evident in the original work of Shannon, Bienvenu, Schmidt, and Tufts and Kumaresan. In this paper we extend these ideas to a number of fundamental problems in signal processing by showing that rank reduction may be applied whenever a little distortion can be exchanged for a lot of variance. We derive a number of quantitative rules for reducing the rank of signal models that are used in signal processing algorithms.
Keywords. ARMA, analysis and synthesis, bandlimited, bias and variance, block quantizer, complex exponential, detector, distortion, eigenvalues and eigenvectors, Grammian, least squares, linear constraints, linear models, low-rank, matrix approximation, order selection, orthogonal decomposition, orthogonal subspace, parsimony, projection, pseudo-inverse, quadratic minimization, rank reduction, rate-distortion, signal processing, signal subspace, singular values and singular vectors, subspace splitting, SVD, Wiener filter.

This paper is dedicated to Professor Dean W. Lytle in recognition of his thirty years of teaching at the University of Washington in Seattle.
1. Introduction

In Volume X of his Œuvres, under Rule V for Direction of the Mind, René Descartes said "Method consists entirely in properly ordering and arranging the things to which we should pay attention". With a touch of license we can claim this rule for statistical signal processing, wherein singular values (or eigenvalues) are properly ordered to determine the singular vectors to which we should pay attention. These singular vectors then assume the role of 'modes' that are used to construct reduced-rank linear transformations for signal processing.

In this review of the singular value decomposition (SVD) and its application to reduced-rank signal processing, we begin with a number of philosophical comments about modeling, in general, and the importance of the distortion-variance trade in reduced-rank signal modeling, in particular. We introduce a class of linear models and illustrate their richness with a number of familiar examples from the theory of statistical signal processing. We establish that the SVD is the natural linear algebraic tool for determining the dominant structure of a linear model and for constructing low-rank signal processing algorithms.

From here on we proceed by example to illustrate the power of the SVD (or, in some cases, the EVD) for solving problems in reduced-rank signal processing. For every example we derive a full-rank solution, based on a principle of optimality, and then show that rank can be reduced to produce a solution which is in many ways superior to the original. For example, a minimum variance unbiased (MVUB) estimator, whose mean-squared error equals its variance, can often be replaced with a reduced-rank, biased estimator whose mean-squared error is smaller than that of the MVUB estimator. All that is sacrificed is a little bias, in exchange for a lot of variance saved. This, in fact, is the key idea: a little distortion can often be introduced into a solution in exchange for a big savings in variance. The exchange pays off whenever the sum of distortion plus variance is smaller than the variance of the undistorted solution. We shall find that each problem brings its own natural definitions of distortion and variance and, consequently, its own rules for selecting the order of the best reduced-rank solution.
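The following minimal numerical sketch (not from the paper) illustrates this exchange: the full-rank least-squares estimator of the coefficients in a linear model is unbiased but variance-limited when the model matrix is ill-conditioned, while a rank-r estimator built from the dominant singular vectors accepts a little bias for a large variance reduction. The model matrix, noise level, and rank are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, sigma = 50, 10, 1.0
# Columns scale down geometrically, so the trailing modes are weak
# and the full-rank inverse amplifies noise along them.
H = rng.standard_normal((N, p)) * (2.0 ** -np.arange(p))
theta = rng.standard_normal(p)

U, s, Vt = np.linalg.svd(H, full_matrices=False)

def lsq(x, r):
    """Rank-r least-squares estimate: invert only the r largest singular values."""
    return Vt[:r].T @ ((U[:, :r].T @ x) / s[:r])

trials, mse = 2000, {p: 0.0, 3: 0.0}
for _ in range(trials):
    x = H @ theta + sigma * rng.standard_normal(N)
    for r in mse:
        mse[r] += np.sum((lsq(x, r) - theta) ** 2) / trials

print(f"full-rank (MVUB) MSE, all variance:      {mse[p]:.2f}")
print(f"rank-3 MSE, small bias + small variance: {mse[3]:.2f}")
```

Run as written, the rank-3 estimator's mean-squared error is orders of magnitude below the full-rank estimator's, precisely because the bias it introduces is tiny next to the variance it removes.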
2. General comments on modeling
The traditional approach to modeling has been to write down a mathematical model that perfectly represents the signal or system under study. Typically, the model is a linear one for which a difference equation, impulse response, transfer function, covariance sequence, or power spectrum is given. As a practical matter, the exact order of the model may be unknown and some of the parameters of the model may be unknown. If the model itself is to be used in a signal processing algorithm, then the order and the unknown parameters must be estimated (or identified). There is a generally held view that economical models (or parsimonious models, as they are called) are better than luxury models, because luxury models often produce spurious effects. These effects are artifacts of the procedure used to (imperfectly) identify the model, rather than actual characteristics of the signal or system under study.

The principle of parsimony lies at the very heart of the scientific method, where we state and test hypotheses in terms of mathematical models that are just complicated enough to model what we can measure, but no more complicated. Such a fundamental principle should have a set of quantitative rules for its application. The AIC rules of Akaike [1] and the CAT rule of Parzen [12] are notable examples of rules for choosing the order of a parsimonious autoregressive (AR) model for a time series. But my favorite example of the principle at work is Shannon's rate distortion theory [19]. Shannon established the mathematical connection between the complexity of a source (its bit-rate) and its distortion (Shannon's distortion is actually what we would call distortion, or model bias, plus variance) and derived formulas for trading one against the other. Our aim in this paper is to extend the philosophies of Akaike, Parzen and Shannon to a number of basic problems in statistical signal processing.

With these remarks as preamble, I would like to suggest that the principle of parsimony applied to signal and system modeling says that a model should be complicated enough to reproduce the most important properties of a signal or system but simple enough to resist the spurious effects that are associated with the use of the model in a signal processing application. Sometimes these spurious effects can be directly associated with coefficient quantization, parameter uncertainty or model order mismatch, but usually they show up in a measure of performance that is intimately tied to the problem under study. When we set about to apply the principle of parsimony to signal modeling, we find that rank reduction is the appropriate procedure for reducing the complexity of a model. As we proceed from the most complicated model to the simplest model we find that complicated models are distortion-free
and variance-limited, whereas simple models are distortion-limited and variance-free. The trick is to find just the right tradeoff between simplicity and complexity to produce the best compromise between distortion and variance. This requires that we determine the appropriate definitions for distortion and variance and derive order selection rules for trading one against the other.
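A hedged sketch of one such order selection rule follows: estimating a signal from noisy data with a rank-r projection incurs distortion (the signal energy outside the rank-r subspace) plus variance (the noise power, r times the noise variance, passed by the projection), and the best rank minimizes their sum. The decaying mode energies and the noise level here are illustrative assumptions, not values from the paper.

```python
import numpy as np

p, sigma = 10, 0.2
# Signal energy resolved onto the p ordered modes, decaying with order.
coeffs = 2.0 ** -np.arange(p)

for r in range(1, p + 1):
    distortion = np.sum(coeffs[r:] ** 2)  # energy a rank-r model cannot represent
    variance = r * sigma ** 2             # noise power passed by a rank-r projection
    print(f"rank {r:2d}: {distortion:7.4f} + {variance:5.2f} = {distortion + variance:7.4f}")

best = min(range(1, p + 1), key=lambda r: np.sum(coeffs[r:] ** 2) + r * sigma ** 2)
print("best rank by distortion-plus-variance:", best)
```

The printed table makes the compromise visible: distortion falls and variance rises with rank, and the minimizing rank sits between the distortion-limited and variance-limited extremes.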
3. Linear models

Throughout this paper we will be concerned with linear models for signals. These models take the form
$$x = H\theta, \tag{1}$$

where $x = [x_0 \; x_1 \; \cdots \; x_{N-1}]^T$ is a sequence of signal samples, $H$ is a model matrix and $\theta = [\theta_1 \; \theta_2 \; \cdots \; \theta_p]^T$ is a vector of coefficients. There are two ways to write out the terms in this linear model. First, if we denote the matrix $H$ by its columns,

$$H = [h_1 \; h_2 \; \cdots \; h_p], \tag{2}$$

then the signal $x$ may be written out as a linear combination of modes:

$$x = \sum_{n=1}^{p} h_n \theta_n. \tag{3}$$

That is, we think of the columns of the model matrix $H$ as modes and the entries in $\theta$ as mode weights. Second, if we denote the matrix $H$ by its rows,

$$H = \begin{bmatrix} c_0^T \\ c_1^T \\ \vdots \\ c_{N-1}^T \end{bmatrix}, \tag{4}$$

then the $n$th measurement $x_n$ is the correlation between the $n$th row $c_n^T$ and the parameter vector $\theta$:

$$x_n = c_n^T \theta. \tag{5}$$
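A small sketch of the two views of the linear model in eqs. (1)-(5): the column view builds $x$ as a weighted sum of modes $h_n$, and the row view reads each sample $x_n$ as a correlation $c_n^T \theta$. The sinusoidal modes and weights used here are illustrative choices, not examples from the paper.

```python
import numpy as np

N, p = 8, 3
n = np.arange(N)
freqs = [0.1, 0.23, 0.4]                    # illustrative mode frequencies
# Columns of H are the modes h_1, ..., h_p.
H = np.stack([np.cos(2 * np.pi * f * n) for f in freqs], axis=1)
theta = np.array([1.0, 0.5, -0.3])          # mode weights

x = H @ theta                                              # eq. (1)
x_columns = sum(H[:, k] * theta[k] for k in range(p))      # eq. (3): weighted sum of modes
x_rows = np.array([H[m] @ theta for m in range(N)])        # eq. (5): x_n = c_n^T theta

assert np.allclose(x, x_columns) and np.allclose(x, x_rows)
print(x)
```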
In the linear model we say that the model is (i) overdetermined if $N > p$ (measurements exceed parameters), (ii) determined if $N = p$, and (iii) underdetermined if $N < p$ (parameters exceed measurements). The overdetermined case describes filtering, enhancement, deconvolution and identification problems. The underdetermined case describes inverse and extrapolation problems.

If the coefficients $\theta$ are unknown but not drawn from a probability distribution, then the model for $x$ is said to be deterministic. If the coefficients are drawn from a multivariate distribution, then the signal $x$ inherits a multivariate distribution and we say that the signal model is stochastic. For example, if the coefficients $\theta$ are drawn from a normal distribution with mean zero and covariance matrix $R_{\theta\theta}$, which we denote $\theta : N[0, R_{\theta\theta}]$, then the signal $x : N[0, H R_{\theta\theta} H^T]$ is drawn from a normal distribution with mean zero and covariance matrix $H R_{\theta\theta} H^T$. The $n$th measurement $x_n$ is distributed as $x_n : N[0, c_n^T R_{\theta\theta} c_n]$.

A linear model brings an algebraic characterization for the signal, as well. In this algebraic characterization we say that the columns, or modes, of $H = [h_1 \; h_2 \; \cdots \; h_p]$ span a signal subspace $\langle H \rangle$. This signal subspace contains all measurements that can be constructed from the modes of $H$, nothing more and nothing less. We assume that the modes of $H$ are linearly independent. Then the dimension of the signal subspace is $p$, meaning that every vector that lies within it is a linear combination of just $p$ linearly independent vectors. To say that an $N$-vector lies within a subspace of dimension $p < N$ is to say that the signal is constrained by the modes that construct it to have fairly predictable characteristics.

From the linearly independent modes that span the signal subspace $\langle H \rangle$, we can construct $N - p$ modes that are orthogonal to them. These modes, which we organize into the matrix $A = [a_1 \; a_2 \; \cdots \; a_{N-p}]$, span an orthogonal subspace $\langle A \rangle$ of dimension $N - p$. Any vector constructed from the modes of $A$ lies inside the subspace $\langle A \rangle$ and outside the signal subspace $\langle H \rangle$. Taken together, the subspaces $\langle H \rangle$ and $\langle A \rangle$ span Euclidean $N$-space $\mathbb{R}^N$, meaning that any vector in $\mathbb{R}^N$ may be written as the sum of a component in $\langle H \rangle$ and a component in $\langle A \rangle$.
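A minimal sketch of this subspace geometry, under illustrative sizes and a random $H$: an orthonormal basis for the orthogonal subspace $\langle A \rangle$ may be read off the full SVD of $H$ (one convenient construction; the paper develops this in later sections), and every $N$-vector then splits into a component in $\langle H \rangle$ plus a component in $\langle A \rangle$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 8, 3
H = rng.standard_normal((N, p))          # p linearly independent modes (almost surely)

U, s, Vt = np.linalg.svd(H)              # full SVD: U is N x N orthogonal
A = U[:, p:]                             # N - p modes orthogonal to the modes of H

P_H = U[:, :p] @ U[:, :p].T              # projector onto the signal subspace <H>
P_A = A @ A.T                            # projector onto the orthogonal subspace <A>

x = rng.standard_normal(N)
assert np.allclose(H.T @ A, 0.0)         # modes of A are orthogonal to modes of H
assert np.allclose(P_H + P_A, np.eye(N)) # <H> and <A> together span R^N
assert np.allclose(P_H @ x + P_A @ x, x) # decomposition of an arbitrary vector
print(np.round(P_H @ x, 3), np.round(P_A @ x, 3))
```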