Pattern Recognition 38 (2005) 1099 – 1110 www.elsevier.com/locate/patcog
Singular value decomposition in additive, multiplicative, and logistic forms

Stan Lipovetsky∗, W. Michael Conklin

GfK Custom Research Inc., 8401 Golden Valley Road, PO Box 27900, Minneapolis, MN 55427-0900, USA

Received 28 April 2004; received in revised form 14 January 2005; accepted 14 January 2005
Abstract

Singular value decomposition (SVD) is widely used in data processing, reduction, and visualization. Applied to a positive matrix, the regular additive SVD by the first several dual vectors can yield irrelevant negative elements of the approximated matrix. We consider a multiplicative SVD modification that corresponds to minimizing the relative errors and always produces positive matrices at any approximation step. Another, logistic SVD modification can be used for decomposition of matrices of proportions: a regular SVD can yield elements beyond the zero-one range, while the modified SVD decomposition produces all the elements within the correct range at any step of approximation. Several additional modifications of matrix approximation are also considered.

© 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Singular value decomposition; Matrix approximation; Positive matrix; Proportion matrix; Multiplicative decomposition; Logistic decomposition
∗ Corresponding author. Tel.: +1 763 542 0800; fax: +1 763 542 0864. E-mail addresses: [email protected] (S. Lipovetsky), [email protected] (W.M. Conklin).

1. Introduction

Singular value decomposition (SVD) was introduced by Eckart and Young [1] and has become one of the most widely used techniques of computational algebra and multivariate statistical analysis, applied for data approximation, reduction and visualization. The SVD, also known as the matrix spectral decomposition, is closely related to principal components and to the Moore–Penrose generalized matrix inverse. SVD presents a rectangular matrix via a low-rank additive combination of the outer products of dual right and left eigenvectors [2–5]. Sequential sums of these outer products yield the matrix approximation with a needed precision defined by the cumulative share of the eigenvalues in the matrix squared Euclidean norm. SVD is applied to various problems in pattern recognition [6–12], multidimensional scaling and cluster analysis [13–17], and perceptual mapping [18,19]. It is the main tool in correspondence analysis, or dual scaling for categorical data [20–27]. Numerous SVD applications are known in practical data visualization [28–33] and in priority evaluations [34–39].

Although the SVD is extremely useful in various applications, it can produce inadequate results for some data. In scene recognition and reconstruction problems, positive pixel data matrices are used. Perceptual maps are often constructed from counts, proportions, or positive share values. Correspondence analysis utilizes the second and third pairs of dual vectors for data plotting, so a matrix approximation of the third rank is implicitly used. In all these problems, if we reconstruct the original data by the first several items of the matrix spectral decomposition, we can easily obtain an approximated matrix with irrelevant negative values (for instance, of pixel data). In the case of proportions data, the decomposition by singular vectors could yield
an approximation of lower rank with reconstructed elements beyond the needed interval (outside the 0–100 range for percent data). In this work we suggest a convenient modification of SVD that produces a lower-rank approximation of the data with the desired properties.

To obtain an always positive matrix approximation of any rank, we consider the SVD applied to the logarithmic transformation of the elements of the original data matrix. This approach corresponds to the minimization of the multiplicative relative deviations of the vectors' outer products from the original data. We obtain a multiplicative decomposition of the matrix into a product of exponents powered with the singular values and dual vectors. In another approach, using SVD for the logistically transformed proportion data and minimizing the deviations, we obtain a lower-rank approximation with all the matrix elements positive and less than one. This technique is based on minimization by the criterion of the multiplicative relative deviations from the odds of the empirical proportions. We also consider an SVD with additive components that corresponds to centering the data matrix in both directions, and an SVD for data similar to regression analysis, when besides the independent variables there is a dependent variable.

In the transformation of the data to logarithms for positive matrices, or to logarithms of the odds for proportion matrices, an element-by-element transformation is performed. In the transformation from the singular value decomposition of the transformed data back to the matrix approximation of the original data, the inverse transformation is performed elementwise as well. This procedure engages a straightforward transformation of the values of each element, not of the matrix as a whole, so no computational problem (such as taking the exponent of a matrix) occurs.

This paper is organized as follows. Section 2 describes the regular SVD technique and suggests its additive, multiplicative, and logistic extensions. Section 3 considers numerical examples, and Section 4 summarizes.
2. Matrix decomposition in additive, multiplicative, and logistic forms

Let us briefly describe the regular SVD, or matrix approximation by a cumulative sum of the outer products of eigenvectors; see, for instance, [1–5]. Let X denote a data matrix of order m × n, with elements x_ij of the ith observation (i = 1, ..., m) on the jth variable (j = 1, ..., n). A matrix approximation by r outer products of the vectors is

$$x_{ij} = \lambda_1 b_{i1} a_{j1} + \lambda_2 b_{i2} a_{j2} + \cdots + \lambda_r b_{ir} a_{jr} + \varepsilon_{ij}, \tag{1}$$
where b_ik and a_jk are elements of the kth pair of vectors b_k and a_k (of orders m and n, respectively), λ_k are the normalizing terms, and ε_ij are additive residual error terms. The least-squares criterion for finding all unknown parameters by minimizing the errors is

$$LS = \|\varepsilon\|^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}\Bigl(x_{ij} - \sum_{k=1}^{r}\lambda_k b_{ik} a_{jk}\Bigr)^{2}. \tag{2}$$
Minimization of (2) by the parameters yields a system of equations:

$$\sum_{j=1}^{n} x_{ij}\,a_{jk} = \lambda_k b_{ik}, \quad i = 1, \dots, m, \qquad
\sum_{i=1}^{m} x_{ij}\,b_{ik} = \lambda_k a_{jk}, \quad j = 1, \dots, n, \qquad k = 1, \dots, r, \tag{3}$$
where the vectors a and b are normalized by their norms. In matrix notation, relations (3) are Xa_k = λ_k b_k and X'b_k = λ_k a_k, where X is the data matrix and the prime denotes transposition. Substituting one of these equations into the other yields the eigenproblems

$$(X'X)\,a = \lambda^{2} a, \qquad (XX')\,b = \lambda^{2} b. \tag{4}$$
The rank ρ of a matrix is always less than or equal to the smaller of its dimensions, so suppose n ≤ m. The first eigenproblem in Eq. (4) yields a set of n eigenvalues λ_1^2 ≥ λ_2^2 ≥ ... ≥ λ_n^2 ≥ 0 (the first ρ values are positive and the rest equal zero) and the corresponding set of eigenvectors a_1, a_2, ..., a_n (the right vectors of the matrix). The second eigenproblem (4) has m eigenvalues (the same ρ positive values plus m − ρ zeros) and the corresponding set of eigenvectors b_1, b_2, ..., b_m (the left vectors of the matrix). The square roots λ_k of the eigenvalues (4) are called the singular values. The set of eigenvalues is called the spectrum of the matrix, so the presentation (1) is the matrix spectral decomposition, or the SVD. The eigenvectors can be found from either of the eigenproblems (4), and the dual vectors can be defined by the transformation (3). Taking the first r pairs of eigenvectors, we approximate the matrix (1) with the precision defined by the residual sum of squares (2):

$$\sum_{i=1}^{m}\sum_{j=1}^{n}\varepsilon_{ij}^{2} = \sum_{i=1}^{m}\sum_{j=1}^{n}\Bigl(x_{ij} - \sum_{k=1}^{r}\lambda_k b_{ik} a_{jk}\Bigr)^{2} = \sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}^{2} - \sum_{k=1}^{r}\lambda_k^{2}, \tag{5}$$
so each subsequent eigenvector approximation diminishes the squared residual deviations by the eigenvalue λ_k^2. Taking the number of eigenvalues equal to the rank of the matrix, r = ρ, we reduce the residual error to exactly zero. The relative root mean square error (RMSE) of the approximation can be estimated by the ratio of the Euclidean norms of the residuals and the original matrix:

$$\mathrm{RMSE} = \|\varepsilon\| / \|X\| = \sqrt{1 - \sum_{k=1}^{r}\lambda_k^{2} \Big/ \|X\|^{2}}. \tag{6}$$
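As a brief illustration of the truncated decomposition (1) and the error measures (5)–(6) (our addition, not part of the original derivation), the following Python sketch uses numpy's standard SVD routine; the random matrix X and all variable names here are illustrative assumptions rather than the paper's data.

```python
# A minimal sketch of the rank-r approximation (1) and the RMSE of Eq. (6),
# using numpy's SVD; the matrix X is arbitrary illustrative positive data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 100.0, size=(12, 7))    # positive data, as in the paper's examples

# Full SVD: X = B diag(lam) A', with left vectors b_k and right vectors a_k
B, lam, At = np.linalg.svd(X, full_matrices=False)

r = 3                                         # number of retained dual pairs
X_r = B[:, :r] @ np.diag(lam[:r]) @ At[:r]    # rank-r approximation of Eq. (1)

# Residual identity (5): ||X - X_r||^2 = ||X||^2 minus the first r squared singular values
resid = np.sum((X - X_r) ** 2)
assert np.isclose(resid, np.sum(X ** 2) - np.sum(lam[:r] ** 2))

# Relative RMSE of Eq. (6)
rmse = np.sqrt(1.0 - np.sum(lam[:r] ** 2) / np.sum(X ** 2))
print(rmse)

# Dual relations (3)-(4): X a_k = lam_k b_k and X' b_k = lam_k a_k
k = 0
assert np.allclose(X @ At[k], lam[k] * B[:, k])
assert np.allclose(X.T @ B[:, k], lam[k] * At[k])
```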
If the variables in X are centered (and normalized), then X'X is a covariance (correlation) matrix, and problems (3)–(4) correspond to principal component analysis (PCA) [40]. In PCA we construct a linear combination of the variables, y = Xa, and maximize its variance y'y = a'X'Xa. With the normalization of the vector a, the conditional objective is

$$\mathrm{var}(y) = a'X'Xa - \mu(a'a - 1), \tag{7}$$

where μ is a Lagrange multiplier. Maximizing (7) yields the first problem (4) for the vector a, with μ = λ^2. Comparison with (3) shows that y = λb, so the aggregate y in PCA coincides, up to the factor λ, with the dual SVD vector b. Similarly, in the dual PCA approach we can consider a linear combination of the observations, z = X'b, and maximize its variance, which results in the second eigenproblem (4) for the vector b and the relation z = λa between the PCA aggregate z and the SVD dual vector a.

The SVD expressions (3) show that the elements of one eigenvector are related to the elements of the dual eigenvector. This property corresponds to the derivation of SVD as the extremes of a bilinear form. Aggregating the columns of a matrix with a vector of weights a we get a vector of scores Xa, and projecting this vector onto the direction of the dual vector b we get a bilinear form b'Xa. Similarly, summing the matrix by rows with the unknown weights of the vector b we get a vector X'b, and projecting it onto the direction of the vector a yields the same bilinear form. This bilinear form with the normalized vectors can be presented as a conditional objective [38]:

$$F = 2b'Xa - \mu(a'a - 1) - \nu(b'b - 1), \tag{8}$$
and its maximization yields the system of equations (3), i.e. the same SVD solution.

The additive SVD matrix approximation can be extended to a model with a mixed effect, when in the first-vector (rank-one) approximation of (1) we have

$$x_{ij} = c_0 + b_i + a_j + \lambda b_i a_j + \varepsilon_{ij}. \tag{9}$$
For the SVD problem we can use the relations known in the analysis of variance [41–44], where such a model (9) is represented as

$$x_{ij} = c + \lambda \alpha_i \beta_j + \varepsilon_{ij}, \tag{10}$$

with the following relations between the parameters in Eqs. (9)–(10):

$$c = c_0 - \lambda^{-1}, \qquad \alpha_i = b_i + \lambda^{-1}, \qquad \beta_j = a_j + \lambda^{-1}. \tag{11}$$
The model (10) differs from the first approximation of the regular SVD (1) by the constant term c. It can be shown that the SVD solution of Eq. (10) can be presented via the dual eigenproblems

$$\Bigl(I_n - \tfrac{1}{n} e_n e_n'\Bigr) X' \Bigl(I_m - \tfrac{1}{m} e_m e_m'\Bigr) X\,\beta = \lambda^{2}\beta, \qquad
\Bigl(I_m - \tfrac{1}{m} e_m e_m'\Bigr) X \Bigl(I_n - \tfrac{1}{n} e_n e_n'\Bigr) X'\,\alpha = \lambda^{2}\alpha, \tag{12}$$
where I_n and I_m are identity matrices, and e_n and e_m are the uniform vectors of ones of orders n and m, respectively. Problems (12) belong to the family of SVD and PCA techniques (4). The regular SVD uses a matrix X and its transpose, and PCA uses a matrix centered in one direction and its transpose, while in Eq. (12) we see the data matrix centered in two different directions. From the parameters of model (10) and relations (11) we obtain the original parameters of model (9) as well.

Another SVD problem corresponds to the situation when, together with the data matrix X, we have a vector y of an additional variable. In the work [45] such data were used to obtain the residuals in the pair regressions of y by each x for the perceptual PCA mapping. We suggest an approach similar to the derivation (8), where we arrange a vector of data scores Xa of size m. Then, with a diagonal matrix of unknown weights diag(b_i), we construct a weighted bilinear form with the vector y, that is y'diag(b)Xa. We represent this bilinear form as b'diag(y)Xa, where b is a vector of unknown weights and diag(y_i) is a diagonal matrix arranged from the elements of the vector y. Now we can introduce the product z of the diagonal matrix for the dependent variable and the matrix X, that is, a data matrix with the rows weighted by the values of the dependent variable, z = diag(y)X, so the bilinear form can be rewritten as b'za. The objective of maximizing this bilinear form subject to the normalizing conditions coincides with problem (8) for the data matrix z. Its solution reduces to the SVD problems (4), which in explicit form can be represented as

$$\bigl(X'\,\mathrm{diag}(y^{2})\,X\bigr)\,a = \lambda^{2} a, \qquad \bigl(\mathrm{diag}(y)\,XX'\,\mathrm{diag}(y)\bigr)\,b = \lambda^{2} b. \tag{13}$$
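For concreteness, the sketch below (our illustration, with random data and variable names of our own choosing) verifies numerically that the doubly centered eigenproblems (12) and the y-weighted eigenproblems (13) are reproduced by a regular SVD of the correspondingly transformed matrices, so standard SVD software suffices.

```python
# A hedged sketch of the two modifications above: the doubly centered
# eigenproblems (12) and the y-weighted problem (13) built from z = diag(y) X.
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 4
X = rng.normal(size=(m, n))
y = rng.normal(size=m)

# Eq. (12): SVD of the matrix centered in both directions
Jm = np.eye(m) - np.ones((m, m)) / m      # I_m - (1/m) e_m e_m'
Jn = np.eye(n) - np.ones((n, n)) / n      # I_n - (1/n) e_n e_n'
Y = Jm @ X @ Jn                           # doubly centered data
_, lam, beta_t = np.linalg.svd(Y, full_matrices=False)
beta = beta_t[0]                          # leading right vector of the centered matrix
assert np.allclose(Jn @ X.T @ Jm @ X @ beta, lam[0] ** 2 * beta, atol=1e-8)

# Eq. (13): SVD of z = diag(y) X, i.e. rows of X weighted by the dependent variable
z = np.diag(y) @ X
_, mu, a_t = np.linalg.svd(z, full_matrices=False)
a = a_t[0]
assert np.allclose(X.T @ np.diag(y ** 2) @ X @ a, mu[0] ** 2 * a, atol=1e-8)
```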
If the data are standardized by each x_j and by the y variable, then the total in each column of the matrix z equals the correlation r_yj between these variables, i.e., z'e_m = X'diag(y)e_m = X'y, with jth element x_j'y = r_yj. So the matrix z consists of the items of the pair correlations, and the eigenproblems (13) define the SVD structure of this matrix.

Many practical problems may require the elements of the matrix approximations to be positive, or to belong to a given interval. Let us first consider positive data of counts arranged in a matrix X with all elements x_ij > 0. The regular SVD (1) with a number of items r less than the matrix rank, r < ρ, can easily produce negative elements of the reconstructed matrix of counts. If these SVD results are used for a perceptual mapping by the second and third pairs of dual vectors, it is hardly possible to interpret a bi-plot corresponding to a third-order data approximation that has negative count estimates. In such cases we suggest the following approach.
For the elements of a positive matrix we consider a multiplicative model of their presentation:

$$x_{ij} = \exp(\lambda_1 b_{i1} a_{j1})\,\exp(\lambda_2 b_{i2} a_{j2})\cdots\exp(\lambda_r b_{ir} a_{jr})\cdot \delta_{ij}, \tag{14}$$

where δ_ij are multiplicative relative error terms of approximation for each ijth element. Taking logarithms of this equation, we represent (14) as follows:

$$\ln x_{ij} = \lambda_1 b_{i1} a_{j1} + \lambda_2 b_{i2} a_{j2} + \cdots + \lambda_r b_{ir} a_{jr} + \ln \delta_{ij}. \tag{15}$$
Comparing with (1) shows that (15) is nothing more than a model of singular value decomposition for the logarithms of the original data. Minimizing the logarithms of the deviations, ln δ_ij, in objective (2) corresponds to constructing an SVD by the criterion of minimum relative deviations. All the properties (2)–(6) of the regular additive SVD hold for the SVD of the logarithms of the data matrix (15). The exact decomposition in the index form (14) contains a number of terms coinciding with the rank of the matrix; all other terms correspond to zero singular values and yield factors equal to one in the decomposition. Practically, we apply a regular SVD to the logarithms of the positive data (15), and then we return to the original data by exponentiating the decomposition (14). Thus, at any step of such a separable multiplicative eigenvector decomposition (14) we always obtain a positive matrix approximation. On the right-hand side of (14) we have the partial indices, or the exponent terms, for each element in the Hadamard (elementwise) product of the matrix restoration. In other words, we take exponents on the element-by-element basis, not the exponent of the matrix as a whole.

Let us consider an SVD approach for the logistic transformation of proportion (or percentage) data. Suppose we need a decomposition that at any step of approximation yields all the elements of a matrix positive and less than one. For the elements of such a matrix we consider the model

$$x_{ij} = \frac{\exp(\lambda_1 b_{i1} a_{j1})\cdots\exp(\lambda_r b_{ir} a_{jr})\cdot \delta_{ij}}{1 + \exp(\lambda_1 b_{i1} a_{j1})\cdots\exp(\lambda_r b_{ir} a_{jr})\cdot \delta_{ij}}, \tag{16}$$
where δ_ij denote multiplicative error terms of approximation for the odds. Taking logarithms of the odds of each element x_ij, we get the following presentation of model (16):

$$\ln\frac{x_{ij}}{1 - x_{ij}} = \lambda_1 b_{i1} a_{j1} + \lambda_2 b_{i2} a_{j2} + \cdots + \lambda_r b_{ir} a_{jr} + \ln \delta_{ij}, \tag{17}$$

that is, a logistic transformation of the data. Objective (2) in this case corresponds to an SVD by the criterion of minimum multiplicative relative deviations of the odds, defined by the logarithms of the errors ln δ_ij. Similarly to the multiplicative SVD (14)–(15), the regular SVD properties (2)–(6) are applicable to the logistic SVD (16)–(17) as well. The exact decomposition in the logistic form (16) contains a number of exponents coinciding with the rank of the matrix. Practically, we use a regular SVD for the logistic transformation of the data (17), and then we reconstruct the original data by the logistic decomposition (16), so at any step of matrix approximation we always obtain elements in the 0–1 range.
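To make the computational recipe concrete, here is a short Python sketch (our illustration, numpy-based, with made-up data standing in for an actual proportion matrix) of the multiplicative SVD (14)–(15) and the logistic SVD (16)–(17): the data are transformed elementwise, a regular SVD is truncated, and the inverse transform is applied elementwise.

```python
# A minimal sketch of the multiplicative SVD (14)-(15) and the logistic SVD (16)-(17):
# a regular SVD is applied to elementwise-transformed data, and the truncated
# reconstruction is mapped back elementwise. The data below are illustrative.
import numpy as np

def truncated_svd(M, r):
    """Rank-r additive SVD approximation (1) of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

rng = np.random.default_rng(2)
P = rng.uniform(0.001, 0.95, size=(12, 7))    # proportions in (0, 1), e.g. Table 1 / 100

r = 2                                          # low-rank approximation step

# Regular additive SVD: may leave the (0, 1) range
add_approx = truncated_svd(P, r)

# Multiplicative SVD: SVD of the logarithms, mapped back by exponentiation (14)-(15);
# the result is positive at any approximation step
mult_approx = np.exp(truncated_svd(np.log(P), r))

# Logistic SVD: SVD of the log-odds, mapped back by the logistic function (16)-(17);
# the result lies strictly inside (0, 1) at any approximation step
logit = np.log(P / (1.0 - P))
logi_approx = 1.0 / (1.0 + np.exp(-truncated_svd(logit, r)))

print("additive range:      ", add_approx.min(), add_approx.max())
print("multiplicative range:", mult_approx.min(), mult_approx.max())   # always > 0
print("logistic range:      ", logi_approx.min(), logi_approx.max())   # always in (0, 1)
```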
3. Numerical examples

The data for the numerical examples are taken from a marketing research project on a pharmaceutical product evaluated by 280 medical practitioners. The product's brands are denoted as Z, ZS, YD, YS, Y, X, and XD (where X, Y, and Z are actual brands, S denotes a syrup version, and D denotes an additional ingredient). The attributes are: a—quick relief, b—safe to use, c—safe to use with other diseases, d—listed on most formularies, e—few side effects, f—one dose for all day, g—long lasting relief, h—effective relief, i—for kid patients, j—the most symptoms relief, k—highest patient satisfaction, l—provides high quality of life. The attributes were estimated on a five-point scale, where 1 is least important and 5 is most important. The matrix of top box proportions (the percent answering "5") constructed from the original data is presented in Table 1.
Table 1
Data matrix: top box proportion (%) by brands (rows) and attributes a–l (columns)

Brand     a       b       c       d       e       f       g       h       i       j       k      l
Z          1.87   29.6    64.62    1       0.1    91.77   25.45   4.26    0.14    8.46   0.23   0.12
ZS         0.16   91.21   89.01    0.16    0.1    76.14   38.23   1.06   16.1     3.95   0.12   0.1
YD         6.36   25.94   90.3    13.75    1.59   66.11   39.1    5.03    0.1    36.78   2.76   2.41
YS         0.25   93.64   37.86    1.14   25.83   16.09   14.76   0.1     8.62    1.9    1.24   1.75
Y          0.1    93      45.94    8.42   36.92   33.03    4.41   0.1     0.25    0.18   1.06   2.17
X          1.32   93.16   55.91   21.06   40.38   15.75   10.48   0.6     0.1     1.03   4.16   6.39
XD        19.06   25.6    90.37   35.05    3.45    5.42   13.54   7.37    0.1    19.15   0.85   0.18
Table 2
(A) Regular additive SVD for 4 brands and 5 attributes: vectors and precision of approximation

Estimate                              1st             2nd             3rd             4th
Eigenvector a (brands Z, ZS, YD, YS)  0.35            0.32            0.07           −0.87
                                      0.66           −0.08            0.68            0.3
                                      0.44            0.63           −0.52            0.37
                                      0.49           −0.7            −0.51           −0.1
Eigenvector b (attributes a–e)        0.02            0.07           −0.19            0.15
                                      0.67           −0.7             0.17            0.17
                                      0.74            0.65           −0.04           −0.16
                                      0.04            0.12           −0.46            0.85
                                      0.07           −0.25           −0.85           −0.45
Singular value                      190.77           67.38           16.47            4.87
RMSE                                  0.34            0.08            0.02            0
Mean (STD)                           −0.2 (15.95)    −0.27 (3.93)     0.04 (1.12)     0 (0)
Mean by abs. error (STD)             11.35 (10.9)     2.77 (2.73)     0.71 (0.84)     0 (0)
Relative mean (STD)                   9.43 (21.93)    6.0 (23.58)    −0.83 (4.94)     0 (0)
Relative mean by abs. error (STD)     9.78 (21.77)    8.31 (22.83)    1.99 (4.58)     0 (0)
(B) Additive SVD matrix approximation
(rows: brands; columns: attributes a–e; left block: partial inputs; right block: cumulative sums)

        a       b       c       d       e    |     a       b       c       d       e
Step 1
Z      1.31   45.36   49.76    2.52    4.73  |   1.31   45.36   49.76    2.52    4.73
ZS     2.47   85.15   93.41    4.73    8.87  |   2.47   85.15   93.41    4.73    8.87
YD     1.65   56.85   62.37    3.16    5.92  |   1.65   56.85   62.37    3.16    5.92
YS     1.8    62.18   68.22    3.46    6.48  |   1.8    62.18   68.22    3.46    6.48
Step 2
Z      1.43  −15.2    14.23    2.65   −5.5   |   2.75   30.13   63.98    5.17   −0.78
ZS    −0.37    3.92   −3.67   −0.68    1.42  |   2.1    89.07   89.74    4.05   10.29
YD     2.8   −29.79   27.84    5.18  −10.77  |   4.45   27.06   90.21    8.34   −4.84
YS    −3.1    32.97  −30.82   −5.74   11.92  |  −1.3    95.15   37.4    −2.28   18.4
Step 3
Z     −0.23    0.2    −0.05   −0.55   −1.03  |   2.52   30.33   63.93    4.62   −1.8
ZS    −2.16    1.89   −0.5    −5.11   −9.55  |  −0.06   90.96   89.24   −1.06    0.74
YD     1.64   −1.43    0.38    3.88    7.24  |   6.09   25.63   90.59   12.22    2.39
YS     1.63   −1.43    0.38    3.86    7.2   |   0.33   93.73   37.78    1.57   25.6
Step 4
Z     −0.65   −0.73    0.69   −3.62    1.9   |   1.87   29.6    64.6     1       0.1
ZS     0.22    0.25   −0.23    1.22   −0.64  |   0.16   91.21   89.01    0.16    0.1
YD     0.27    0.31   −0.29    1.53   −0.8   |   6.36   25.94   90.3    13.75    1.59
YS    −0.08   −0.09    0.08   −0.43    0.23  |   0.25   93.64   37.86    1.14   25.83
For the first, more detailed example, we use a subset of five attributes (a, b, c, d, and e) and four brands (Z, ZS, YD, and YS), so we take a sub-matrix of 5 × 4 size from the upper-left corner of the total matrix in Table 1. The regular SVD solution for this matrix is presented in Table 2A.
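A hedged sketch of this first example in Python: we enter the 5 × 4 sub-matrix from Table 1 and compute its regular SVD with numpy; the printed singular values, leading vectors, and RMSE values should agree with Table 2A up to rounding and the sign indeterminacy of the vectors.

```python
# Regular SVD of the 5 x 4 sub-matrix (attributes a-e by brands Z, ZS, YD, YS)
# taken from the upper-left corner of Table 1.
import numpy as np

X = np.array([
    [ 1.87,  0.16,  6.36,  0.25],   # a
    [29.6,  91.21, 25.94, 93.64],   # b
    [64.62, 89.01, 90.30, 37.86],   # c
    [ 1.00,  0.16, 13.75,  1.14],   # d
    [ 0.10,  0.10,  1.59, 25.83],   # e
])

B, lam, At = np.linalg.svd(X, full_matrices=False)
print("singular values:", np.round(lam, 2))       # expected ~ [190.77, 67.38, 16.47, 4.87]
print("a1 (brands):    ", np.round(At[0], 2))     # expected ~ +/- [0.35, 0.66, 0.44, 0.49]
print("b1 (attributes):", np.round(B[:, 0], 2))   # expected ~ +/- [0.02, 0.67, 0.74, 0.04, 0.07]

# Relative RMSE of Eq. (6) after keeping r dual pairs
for r in range(1, 5):
    rmse = np.sqrt(1.0 - np.sum(lam[:r] ** 2) / np.sum(X ** 2))
    print(f"r = {r}: RMSE = {rmse:.2f}")           # Table 2A reports 0.34, 0.08, 0.02, 0
```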
The rank of this sub-matrix equals four, so there are four pairs of dual eigenvectors a and b in the columns of Table 2A. The first eigenvectors a1 and b1 are close to the column totals of the sub-matrix normalized by their norms (0.33, 0.61, 0.47, and 0.54) and to the similarly normalized row totals (0.02, 0.65, 0.76, 0.04, and 0.07).
Table 3
(A) Multiplicative SVD for 4 brands and 5 attributes: vectors and precision of approximation

Estimate                              1st             2nd             3rd             4th
Eigenvector a (brands Z, ZS, YD, YS)  0.47            0.34            0.36            0.73
                                      0.56            0.59           −0.32           −0.48
                                      0.47           −0.44            0.64           −0.42
                                      0.49           −0.59           −0.59           −0.25
Eigenvector b (attributes a–e)       −0.05           −0.17            0.68            0.7
                                      0.68           −0.05           −0.2            −0.31
                                      0.73            0               0.19           −0.22
                                      0.02           −0.47            0.53           −0.58
                                     −0.05           −0.87           −0.41           −0.15
Singular value                       11.55            4.92            4.14            0.31
RMSE                                  0.49            0.31            0.02            0
Mean (STD)                           −3.59 (17.44)   −2.99 (16.49)   −0.05 (1.73)     0 (0)
Mean by abs. error (STD)             11.25 (13.59)   11.19 (12.23)    1.08 (1.33)     0 (0)
Relative mean (STD)                   1.09 (2.52)     0.47 (1.33)     0 (0.07)        0 (0)
Relative mean by abs. error (STD)     1.59 (2.22)     0.97 (1.0)      0.06 (0.04)     0 (0)
(B) Multiplicative SVD matrix approximation
(rows: brands; columns: attributes a–e; left block: partial indices; right block: cumulative products)

        a       b       c       d       e    |     a       b       c       d       e
Step 1
Z      0.77   40.81   51.52    1.13    0.77  |   0.77   40.81   51.59    1.13    0.77
ZS     0.74   85.06  112.64    1.16    0.73  |   0.74   85.06  112.64    1.16    0.73
YD     0.77   41.37   52.34    1.14    0.76  |   0.77   41.37   52.34    1.14    0.76
YS     0.77   47.52   60.65    1.14    0.76  |   0.77   47.52   60.65    1.14    0.76
Step 2
Z      0.75    0.91    0.99    0.46    0.23  |   0.58   37.3    51.19    0.52    0.18
ZS     0.6     0.86    0.99    0.26    0.08  |   0.44   72.8   111.13    0.3     0.06
YD     1.45    1.12    1.01    2.72    6.39  |   1.13   46.41   52.87    3.09    4.89
YS     1.66    1.17    1.01    3.86   12.2   |   1.27   55.48   61.46    4.41    9.24
Step 3
Z      2.76    0.74    1.33    2.21    0.54  |   1.6    27.57   67.93    1.14    0.1
ZS     0.4     1.31    0.78    0.49    1.74  |   0.18   95.56   86.14    0.15    0.1
YD     6.19    0.58    1.66    4.13    0.33  |   6.96   27.01   87.77   12.76    1.62
YS     0.19    1.65    0.63    0.27    2.76  |   0.24   91.38   38.52    1.19   25.52
Step 4
Z      1.17    1.07    0.95    0.88    1.04  |   1.87   29.6    64.62    1       0.1
ZS     0.9     0.95    1.03    1.09    0.98  |   0.16   91.21   89.01    0.16    0.1
YD     0.91    0.96    1.03    1.08    0.98  |   6.36   25.94   90.3    13.75    1.59
YS     1.06    1.02    0.98    0.96    1.01  |   0.25   93.64   37.86    1.14   25.83
Table 4
(A) Logistic SVD for 4 brands and 5 attributes: vectors and precision of approximation

Estimate                              1st             2nd             3rd             4th
Eigenvector a (brands Z, ZS, YD, YS)  0.52            0.44            0.73           −0.05
                                      0.69           −0.07           −0.41            0.59
                                      0.3             0.42           −0.52           −0.68
                                      0.39           −0.79            0.17           −0.44
Eigenvector b (attributes a–e)       −0.57            0.41            0.09            0.63
                                      0.11           −0.56           −0.32           −0.67
                                      0.13            0.26           −0.9            −0.08
                                     −0.54            0.21           −0.26           −0.23
                                     −0.59           −0.63           −0.11           −0.29
Singular value                       16.98            5.52            1.86            1.41
RMSE                                  0.33            0.13            0.08            0
Mean (STD)                            1.52 (18.62)    0.84 (7.22)     0.45 (3.8)      0 (0)
Mean by abs. error (STD)             11.78 (14.25)    4.51 (5.61)     1.83 (3.33)     0 (0)
Relative mean (STD)                   0.77 (1.99)     0.08 (0.31)     0.06 (0.27)     0 (0)
Relative mean by abs. error (STD)     1.13 (1.8)      0.24 (0.2)      0.18 (0.21)     0 (0)
(B) Logistic SVD matrix approximation
(rows: brands; columns: attributes a–e; left block: partial terms; right block: cumulative results)

        a       b       c       d       e    |     a       b       c       d       e
Step 1
Z      0.64   73.03   76.27    0.84    0.54  |   0.64   73.03   76.27    0.84    0.54
ZS     0.12   79.11   82.64    0.17    0.09  |   0.12   79.11   82.64    0.17    0.09
YD     5      64.15   66.41    5.82    4.52  |   5      64.15   66.41    5.82    4.52
YS     2.12   68.07   70.82    2.61    1.86  |   2.12   68.07   70.82    2.61    1.86
Step 2
Z      0.37    3.95    0.52    0.6     4.73  |   1.74   40.65   85.97    1.39    0.11
ZS     1.16    0.82    1.1     1.08    0.8   |   0.1    82.27   81.23    0.16    0.12
YD     0.39    3.66    0.54    0.62    4.33  |  11.98   32.85   78.41    9.06    1.08
YS     6.02    0.09    3.16    2.46    0.06  |   0.36   96.11   43.47    1.07   23.15
Step 3
Z      0.89    1.55    3.37    1.42    1.16  |   1.96   30.65   64.49    0.98    0.1
ZS     1.07    0.78    0.5     0.82    0.92  |   0.09   85.61   89.61    0.19    0.13
YD     1.09    0.73    0.42    0.78    0.9   |  11.09   40.07   89.64   11.35    1.21
YS     0.97    1.11    1.32    1.08    1.04  |   0.37   95.71   36.75    0.99   22.53
Step 4
Z      1.05    1.05    0.99    0.98    0.98  |   1.87   29.6    64.62    1       0.1
ZS     0.59    0.57    1.07    1.21    1.27  |   0.16   91.21   89.01    0.16    0.1
YD     1.84    1.91    1.93    0.8     0.76  |   6.36   25.94   90.3    13.75    1.59
YS     1.48    1.52    0.95    0.87    0.84  |   0.25   93.64   37.86    1.14   25.83
Table 5
Regular additive SVD restoration of the total percentage matrix, steps 2, 4, and 6 (rows: brands; columns: attributes a–l)

Step #2     a       b       c       d       e       f       g       h       i       j       k       l
Z           6.22   27.56   80.06    9.98   −4.25   65.91   30.63    4.61    3.01   19.64    0.86    0.38
ZS          5.39   83.75   96.63   14.11   16.46   68.05   32.23    3.79    5.74   15.49    1.87    2.24
YD          6.99   28.07   88.69   10.96   −5.77   73.59   34.17    5.19    3.24   22.16    0.92    0.33
YS          0.02   90.38   41.36    8.27   30.38   16.69    8.67   −0.29    4.72   −2.24    1.7     2.88
Y           0.65   94.07   49.91    9.37   30.25   23.56   11.87    0.17    5.07   −0.27    1.8     2.95
X           0.46   98.33   49.73    9.52   32.11   22.43   11.39    0.01    5.25   −1.01    1.87    3.1
XD          3.94   34.49   58.44    7.87    3.04   44.84   21.01    2.86    2.8    12       0.87    0.78

Step #4
Z          −0.54   30      63.23   −0.87    1.39   90.37   29.11    3.07    0.18   14.79    0.91    0.92
ZS          0.87   91.6    88      −1.28    0.79   75.61   39.83    1.62   16.12    7.31    0.15   −0.1
YD          9.2    25.17   93.3    17.26   −1.03   68.57   31.75    6.08    0      25.41    1.49    1.07
YS         −0.95   92.9    39.85    3.88   24.44   17.16   11.44   −0.91    8.58   −4.66    1.1     2.02
Y          −1      92.61   44.91    9.4    38.06   32.46    8.65    0.17    0.33    0.18    2.39    3.97
X           3.54   93.92   56.02   18.78   39.83   15.93    7.52    1.31    0.04    3.81    2.78    4.28
XD         16.79   25.79   88.76   33.84    4.99    4.04   18.29    6.52    0.17   24.45    1.93    1.52

Step #6
Z           0.9    29.6    64.94    0.95   −0.11   91.72   24.84    3.52    0.13    8.74    0.14    0.1
ZS          0.86   91.21   88.78    0.2     0.25   76.18   38.67    1.59   16.11    3.75    0.19    0.12
YD          6.75   25.94   90.17   13.77    1.67   66.13   39.34    5.32    0.1    36.67    2.8     2.42
YS         −1.08   93.64   38.3     1.07   25.55   16.02   13.92   −0.91    8.6     2.28    1.12    1.72
Y           1.18   93      45.58    8.48   37.15   33.09    5.09    0.92    0.26   −0.13    1.16    2.2
X           1.17   93.16   55.96   21.05   40.35   15.74   10.38    0.49    0.1     1.07    4.15    6.39
XD         18.8    25.6    90.46   35.04    3.4     5.41   13.38    7.17    0.1    19.22    0.83    0.17
The main eigenvectors are positive (as they should be by the Perron–Frobenius theory for a positive matrix), but the subsequent eigenvectors have elements of both signs. Below the eigenvectors in Table 2A we see the singular values corresponding to the dual vectors, and then the values of RMSE (6). Using two or three items in decomposition (1) makes this error very small (0.08 and 0.02), and with all four items the matrix spectral decomposition yields exactly zero error. Several other characteristics of precision at each step of approximation are shown in the bottom part of Table 2A. First, there are the means (and standard deviations in parentheses) obtained from the residual errors ε_ij in Eq. (1), shown in the row Mean (STD). The next row presents the means and standard deviations estimated from the absolute values |ε_ij| of the residual errors. Then we see the relative mean (STD) row, defined by the relative deviations ε_ij/x_ij from the data elements. The last row of Table 2A shows the relative means and standard deviations obtained from the absolute values |ε_ij/x_ij| of the relative errors. All of these error estimates diminish quickly as the approximation step increases.

Table 2B presents the results of the matrix reconstruction by the eigenvectors' outer products. At the left-hand side of Table 2B we see the partial inputs in the sum (1) of the spectral decomposition at each step of approximation, and at the right-hand side the cumulative sums of these items are presented.
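The following sketch (our illustration, using the same sub-matrix as above) shows how the partial inputs and cumulative sums of Table 2B can be generated from the outer products of the dual vectors.

```python
# Partial inputs and cumulative sums as in Table 2B: each partial input is the
# outer-product term lam_k * b_k a_k' of Eq. (1), and the cumulative sum over
# k = 1..r is the rank-r reconstruction of the sub-matrix.
import numpy as np

X = np.array([
    [ 1.87,  0.16,  6.36,  0.25],
    [29.6,  91.21, 25.94, 93.64],
    [64.62, 89.01, 90.30, 37.86],
    [ 1.00,  0.16, 13.75,  1.14],
    [ 0.10,  0.10,  1.59, 25.83],
])
B, lam, At = np.linalg.svd(X, full_matrices=False)

cumulative = np.zeros_like(X)
for k in range(len(lam)):
    partial = lam[k] * np.outer(B[:, k], At[k])    # partial input at step k+1
    cumulative = cumulative + partial              # cumulative sum at step k+1
    print(f"step {k + 1}: min of cumulative approximation = {cumulative.min():.2f}")

# Steps 2 and 3 should show negative minima (cf. Table 2B), and step 4 restores X exactly.
assert np.allclose(cumulative, X)
```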
Table 6
Multiplicative SVD restoration of the total percentage matrix, steps 2, 4, and 6 (rows: brands; columns: attributes a–l)

Step #2     a       b       c       d       e       f       g       h       i       j       k       l
Z           1.75   34.71   78.77    1.04    0.15   53.38   33.4     4.09    0.84   14.68    0.28    0.14
ZS          1.85   29.39   71.83    0.84    0.1    52.22   33.68    4.84    0.94   16.79    0.24    0.11
YD          1.22   90.32  125.59    4.24    2.15   55.16   28.84    1.34    0.41    5.86    0.74    0.57
YS          0.88   42.41   37.52    7.23   11.3    15.75    8.91    0.56    0.33    1.74    1.62    1.83
Y           0.71   62.86   41.01   15.75   54.18   13.73    7.16    0.28    0.23    0.94    2.91    4.31
X           0.72   89.22   60.16   17.37   55.65   18.95    9.41    0.29    0.21    1.1     2.81    4.04
XD          1.2    47.96   64.8     3.34    1.78   32.21   18.47    1.34    0.48    4.76    0.74    0.59

Step #4
Z           1.53   28.98   74.21    1.45    0.09   83.77   23.58    5.33    0.17    7.52    0.23    0.12
ZS          0.18   81.06   78.03    0.12    0.11   93.19   39.77    0.81   13.47    4.94    0.12    0.1
YD          9.02   42.86  120.87   17.34    2.45   27.25   29.97    5.14    0.1    22.54    1.45    0.68
YS          0.22  100.07   42.22    1.46   22      14.21   14.42    0.13   10.3     1.6     1.29    1.94
Y           0.11  104.55   40.47    5.96   32.14   36.1     5.28    0.11    0.23    0.17    1.37    3.34
X           1.19   70.1    58.69   27.3    50.69   17.56    8.73    0.45    0.1     1.33    3.19    4.1
XD         16.12   18.56   61.96   20.13    2.2    12.52   19.94    7.41    0.09   28.55    1.81    0.74

Step #6
Z           1.56   25.11   68.47    1.3     0.1    99.96   23.6     4.31    0.16    9.18    0.22    0.12
ZS          0.17   97.63   86.91    0.14    0.1    73.5    39.44    1.05   15.37    3.82    0.12    0.1
YD          7.48   30.07   85.72   10.89    1.65   61.22   41.84    4.98    0.09   34.19    2.84    2.43
YS          0.23   85.84   39.04    1.31   25.31   16.83   14.18    0.1     9.15    1.98    1.22    1.74
Y           0.12  109.15   43.42    6.54   38.32   30.39    4.75    0.1     0.22    0.17    1.09    2.19
X           1.08   77.84   59.56   27.94   38.73   17.29    9.65    0.61    0.11    1.13    4.01    6.34
XD         19.93   26.66   89.09   32.87    3.48    5.31   13.79    7.35    0.1    18.77    0.86    0.18
The left-hand side shows that the incremental matrices become smaller at higher steps of approximation. The right-hand-side cumulative matrices show that the first approximation yields a positive matrix (because it is constructed from the positive first vectors). However, as we see in Table 2B, the cumulative matrices of the 2nd and 3rd approximations contain negative elements. These matrices correspond to two or three items in decomposition (1) and yield negative proportions, which does not make sense. In a bi-plot by the 2nd and 3rd pairs of vectors the negative values could be misleading. The last approximation, given at the bottom of the right-hand side of Table 2B, of course restores exactly the sub-matrix in the upper-left corner of Table 1.
To get a positive matrix reconstruction at any approximation step, we apply the multiplicative SVD approach (14)–(15); the results are presented in Tables 3A and B. These tables are arranged similarly to Tables 2A and B of the regular SVD. Comparing the error estimates in Tables 2A and 3A, we see that, in general, the additive errors are smaller and the relative errors are bigger in Table 2A, and vice versa for the errors in Table 3A. This is expected, because in the multiplicative SVD we minimize the relative deviations in the matrix approximation. On the left-hand side of Table 3B we have the partial indices, or the exponent terms for each element in the Hadamard product of the matrix representation (14). The cumulative products on the right-hand side of Table 3B present an all-positive matrix at any step of the approximation.
Table 7
Logistic SVD restoration of the total percentage matrix, steps 2, 4, and 6 (rows: brands; columns: attributes a–l)

Step #2     a       b       c       d       e       f       g       h       i       j       k       l
Z           1.82   37.76   88.94    1.46    0.11   71.82   29.42    3.42    0.22   13.79    0.19    0.09
ZS          0.71   60.35   83.85    1.46    0.34   58.76   19.99    1.05    0.18    4.94    0.2     0.12
YD          5.96   42.89   80.04    5.42    1.02   64.6    34.91    8.83    1.52   21.29    1.43    0.84
YS          0.35   92.94   50.07    5.75   28.96   22.84    9.95    0.24    0.96    0.83    1.71    2.69
Y           0.12   96.05   47.03    3.98   33.17   17.55    6.6     0.07    0.48    0.32    0.99    1.82
X           0.55   92.02   48.2     7.71   35.43   23.51   11.48    0.38    1.56    1.17    2.68    4.2
XD          3.87   49.46   79.19    4.63    1.11   61.26   30.66    5.51    1.13   15.08    1.13    0.71

Step #4
Z           1.26   36.57   83.67    1.19    0.09   91.53   32.45    3.54    0.13   10.96    0.27    0.18
ZS          0.19   85.75   79.95    0.12    0.12   75.76   35.05    0.8    16.7     4.85    0.11    0.08
YD          9.53   26.96   79.14   15.7     1.55   69.2    27.9    10.13    0.12   19.67    2.18    1.39
YS          0.22   96.21   51.85    2.03   21.7    18.65   12.92    0.2     9.62    0.92    1.14    1.64
Y           0.13   95.03   42.81    5.31   34.75   27.08    6.27    0.08    0.18    0.28    1.26    2.75
X           1.06   84.92   50.79   23.3    47.74   18.51    8.08    0.44    0.13    1.16    3.67    5.34
XD         18.49   27.71   91.78   32.4     3.35    4.99   15.79    6.29    0.09   23.05    0.93    0.21

Step #6
Z           1.68   26.43   66.86    1.11    0.1    92.25   24.37    4.1     0.15    8.41    0.22    0.12
ZS          0.16   91.22   89       0.16    0.1    76.13   38.24    1.06   16.09    3.95    0.12    0.1
YD          7.17   29.58   89.25   12.35    1.51   64.38   40.7     5.26    0.09   36.95    2.85    2.45
YS          0.23   93.05   39.27    1.21   26.38   16.63   14.33    0.1     9       1.89    1.22    1.73
Y           0.14   95.4    39.08    6.36   33.85   29.04    5.15    0.11    0.2     0.18    1.14    2.26
X           0.94   89.28   63.37   27.16   44      18.68    8.9     0.53    0.13    1.01    3.84    6.11
XD         19.81   26.92   89.99   34       3.38    5.28   13.84    7.49    0.1    19.19    0.86    0.18
However, they contain elements above 100% at the 1st and 2nd steps of approximation. The 3rd approximation is already very close to the exact matrix reconstruction of the last, 4th step.

For the original matrix of percents we need all the elements to belong to the interval of 0–100 percent. Applying the logistic SVD (16)–(17), to get a matrix that is positive and bounded from above at any step of approximation, we obtain the results presented in Tables 4A and B. The results look very similar to those of Tables 3A and B. The error estimates in Tables 3A and 4A are very close, and the matrix approximations in Tables 3B and 4B by the multiplicative and logistic decompositions are comparable. However, the cumulative results of the matrix restoration on the right-hand side of Table 4B are all positive and below 100% at any approximation step.

For the second example we use the total positive matrix of percents in Table 1. Reconstructing this matrix by the regular additive SVD (1), we obtain negative elements at all steps of approximation of lower rank, from r = 2 to r = 6 (the total rank equals 7). The restored matrices at approximation steps 2, 4, and 6 are presented in Table 5. We see that the negative elements appear at various locations in the matrix, mostly for the smallest original elements.
Using the multiplicative SVD (14)–(15) we obtain the approximations presented at the same steps in Table 6. All the elements are positive for any rank of the restored matrix, so the multiplicative SVD works well for a positive data matrix. However, for any approximation of lower rank from r = 2 to r = 6 there are some elements above 100%, mostly among those with a high original value. To obtain elements in the appropriate range at any approximation step we apply the logistic SVD (16)–(17). The results presented in Table 7 show that all the elements are now always in the 0–100 interval for any lower approximation rank. At the last, 7th step of approximation, the additive, multiplicative, and logistic SVD decompositions (corresponding to Tables 5–7) all restore exactly the original matrix of Table 1.
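As a closing illustration (our addition, not from the original text), the sketch below repeats the range checks of this second example on a synthetic positive percentage matrix in place of Table 1; the qualitative behaviour, though not the particular numbers, should mirror Tables 5–7.

```python
# Range checks for the additive, multiplicative, and logistic truncations on a
# synthetic percentage matrix: the logistic reconstruction stays inside (0, 100).
import numpy as np

def truncated_svd(M, r):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

rng = np.random.default_rng(3)
X = rng.uniform(0.1, 99.0, size=(12, 7))      # percentages strictly between 0 and 100

for r in range(2, 7):                         # lower-rank approximations; rank of X is 7
    additive = truncated_svd(X, r)
    multiplicative = np.exp(truncated_svd(np.log(X), r))
    P = X / 100.0
    logistic = 100.0 / (1.0 + np.exp(-truncated_svd(np.log(P / (1.0 - P)), r)))
    print(f"r = {r}:",
          f"additive min {additive.min():.2f},",
          f"multiplicative max {multiplicative.max():.2f},",
          f"logistic range ({logistic.min():.2f}, {logistic.max():.2f})")

# The logistic reconstruction is guaranteed to lie in (0, 100) at every step; whether
# the additive and multiplicative ones break their ranges depends on the particular matrix.
```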
4. Summary

We considered the singular value decomposition technique adjusted to obtain matrices with specific features at any step of approximation. It is shown that a positive matrix can be more adequately approximated by the multiplicative SVD, that is, by the product of exponentiated spectral-decomposition items obtained from the logarithms of the original matrix elements. Such an approximation is guaranteed to preserve the positivity of the original matrix in any approximating matrix of lower rank. A proportion matrix can be more adequately approximated via the logistic transformation of the elements of the original matrix; in this case we preserve the range of the entries of the original matrix in the approximating matrix of any lower rank. The suggested modifications can be easily utilized because they use element-by-element transformations of the data, not a function of the matrix as a whole, so regular SVD software is applicable. We demonstrated the usefulness of the considered techniques for obtaining a matrix with the desired properties (positive, or restricted to a given range) at any step of approximation.

Future research is needed using synthetic generation of matrices of high ranks with various distributions of the negative elements in the eigenvectors. Then the performance of the different SVD forms can be studied and the advantages of each specific approach in data approximation can be evaluated. The generalized family of SVD techniques can enrich the area of theoretical and practical applications of singular value decomposition in numerous problems of data approximation, reduction, visualization, pattern recognition, and other statistical estimations.
Acknowledgments The authors wish to thank a referee whose valuable comments and suggestions improved and clarified the paper.
References

[1] C. Eckart, G. Young, The approximation of one matrix by another of lower rank, Psychometrika 1 (1936) 211–218.
[2] G.H. Golub, C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, 1983.
[3] G.A.F. Seber, Multivariate Observations, Wiley, New York, 1984.
[4] R.A. Thisted, Elements of Statistical Computing: Numerical Computation, Chapman & Hall, New York, 1988.
[5] D. Kalman, A singular value decomposition: the SVD of a matrix, Coll. Math. J. 27 (1996) 2–23.
[6] H.C. Andrews, C.L. Patterson, Outer product expansions and their uses in digital image processing, Am. Math. Mon. 82 (1975) 1–13.
[7] V.C. Klema, The singular value decomposition: its computation and some applications, IEEE Trans. Automat. Control 25 (1980) 164–176.
[8] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, London, 1990.
[9] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
[10] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 1996.
[11] R. Duda, P. Hart, D. Stork, Pattern Classification, Wiley, New York, 2000.
[12] Y. Tian, T. Tan, Y. Wang, Y. Fang, Do singular values contain adequate information for face recognition?, Pattern Recognition 36 (2003) 649–655.
[13] H.F. Gollob, A statistical model which combines features of factor analysis and analysis of variance techniques, Psychometrika 33 (1968) 73–116.
[14] S.S. Schiffman, M.L. Reynolds, F.W. Young, Introduction to Multidimensional Scaling, Academic Press, New York, 1981.
[15] J.M. Chambers, W.S. Cleveland, B. Kleiner, P.A. Tukey, Graphical Methods for Data Analysis, Wadsworth, Belmont, California, 1983.
[16] M.S. Oh, A.E. Raftery, Bayesian multidimensional scaling and choice of dimension, J. Am. Statist. Assoc. 96 (2001) 1031–1044.
[17] P. Drineas, A. Frieze, R. Kannan, S. Vempala, V. Vinay, Clustering large graphs via the singular value decomposition, Mach. Learn. 56 (2004) 9–33.
[18] K.R. Gabriel, The biplot graphical display of matrices with applications to principal component analysis, Biometrika 58 (1971) 453–467.
[19] K.R. Gabriel, C.L. Odoroff, Biplots in biomedical research, Statist. Med. 9 (1990) 469–485.
[20] S. Nishisato, Analysis of Categorical Data: Dual Scaling and its Applications, Toronto University Press, Toronto, 1980.
[21] J. de Leeuw, J. van Rijckevorsel, Homals and principals: some generalization of principal component analysis, in: E. Diday et al. (Eds.), Data Analysis and Informatics, vol. II, North-Holland, Amsterdam, 1980, pp. 231–242.
[22] L. Lebart, A. Morineau, K.M. Warwick, Multivariate Descriptive Statistical Analysis: Correspondence Analysis and Related Techniques for Large Matrices, Wiley, New York, 1984.
[23] M.J. Greenacre, Theory and Application of Correspondence Analysis, Academic Press, London, 1984.
[24] M.J. Greenacre, T. Hastie, The geometric interpretation of correspondence analysis, J. Am. Statist. Assoc. 82 (1987) 437–447.
[25] J.P. Benzecri, Correspondence Analysis Handbook, Marcel Dekker, New York, 1992.
[26] W.J. Krzanowski, Recent Advances in Descriptive Multivariate Analysis, Clarendon Press, Oxford, 1995.
[27] H.S. Lynn, C.E. McCulloch, Using principal component analysis and correspondence analysis for estimation in latent variable models, J. Am. Statist. Assoc. 95 (2000) 561–572.
[28] J.D. Carroll, P.E. Green, C.M. Schaffer, Interpoint distance comparisons in correspondence analysis, J. Marketing Res. 23 (1986) 271–280.
[29] S.M. Shugan, Estimating brand positioning maps using supermarket scanning data, J. Marketing Res. 24 (1987) 1–18.
[30] E. Kaciak, J. Louviere, Multiple correspondence analysis of multiple choice experiment data, J. Marketing Res. 27 (1990) 455–465.
[31] J.B. Steenkamp, H. van Trijp, J. Ten Berge, Perceptual mapping based on idiosyncratic sets, J. Marketing Res. 31 (1994) 15–27.
[32] A. Carlier, P.M. Kroonenberg, Decompositions and biplots in three-way correspondence analysis, Psychometrika 61 (1996) 353–373.
[33] I. Sinha, W.S. DeSarbo, An integrated approach toward the spatial modeling of perceived customer value, J. Marketing Res. 35 (1998) 236–249.
[34] C. Moler, D. Morrison, Singular value analysis of cryptograms, Am. Math. Mon. 90 (1983) 78–87.
[35] S.I. Gass, T. Rapcsak, Singular value decomposition in AHP, Eur. J. Oper. Res. 154 (2004) 573–584.
[36] S. Lipovetsky, M. Conklin, Dual priority-antipriority Thurstone scales as AHP eigenvectors, Eng. Simulat. 18 (2001) 631–648.
[37] S. Lipovetsky, M. Conklin, Robust estimation of priorities in the AHP, Eur. J. Oper. Res. 137 (2002) 110–122.
[38] S. Lipovetsky, A. Tishler, M. Conklin, Multivariate least squares and its relation to other multivariate techniques, Appl. Stoch. Models Bus. Ind. 18 (2002) 347–356.
[39] S. Lipovetsky, M. Conklin, Nonlinear Thurstone scaling via SVD and Gower plots, Int. J. Oper. Quant. Manage. 10 (3) (2004) 1–15.
[40] S. Lipovetsky, A. Tishler, Linear methods in multimode data analysis for decision making, Comput. Oper. Res. 21 (1994) 169–183.
[41] J. Mandel, A method for fitting empirical surfaces to physical or chemical data, Technometrics 11 (1969) 411–429.
[42] J. Mandel, A new analysis of variance model for non-additive data, Technometrics 13 (1971) 1–18.
[43] J.W. Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, MA, 1977.
[44] J.D. Emerson, G.Y. Wong, Resistant nonadditive fits for two-way tables, in: D.C. Hoaglin, F. Mosteller, J.W. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes, Wiley, New York, 1985, pp. 67–124.
[45] B. Falissard, Focused principal component analysis: looking at a correlation matrix with a particular interest in a given variable, J. Comput. Graph. Statist. 8 (1999) 906–912.