Linear Algebra and its Applications 505 (2016) 361–366
On a result of J.J. Sylvester

Michael P. Drazin

Department of Mathematics, Purdue University, West Lafayette, IN 47907-2067, United States
E-mail address: [email protected]
Article info

Article history: Received 7 December 2015; accepted 5 May 2016. Submitted by R. Brualdi.

MSC: 15A09, 15A18, 15A24, 16-XX, 20Mxx
Abstract

For any algebraically closed field F and any two square matrices A, B over F, Sylvester (1884) [8] and Cecioni (1910) [1] showed that AX = XB implies X = 0 if and only if A and B have no common eigenvalue. It is proved that a third equivalent statement is that, for any given polynomials f, g in F[t], there exists h in F[t] such that f(A) = h(A) and g(B) = h(B). Corresponding results hold also for any finite set of square matrices over F, and these lead to a new property of all associative rings and algebras (even over arbitrary fields) with 1.

© 2016 Published by Elsevier Inc.
Keywords: Associative algebras; Associative rings; Core-nilpotent decomposition; Eigenvalues; Fitting's lemma; Generalized inverses; Matrix polynomials; Semigroups
1. Introduction

Our starting point here is the following result:
[email protected]. http://dx.doi.org/10.1016/j.laa.2016.05.007 0024-3795/© 2016 Published by Elsevier Inc.
Theorem 1.1. For any field F, any r, s ∈ N and any square matrices A ∈ M_r(F), B ∈ M_s(F) whose eigenvalues all lie in F, the following three properties of the pair A, B are equivalent:

(i) for any given polynomials f, g ∈ F[t] there exists h ∈ F[t] such that f(A) = h(A) and g(B) = h(B);
(ii) A and B have no eigenvalue in common;
(iii) for r × s matrices X over F, AX = XB ⇒ X = 0.

The equivalence of (ii) and (iii) is an old result due to J.J. Sylvester [8] and F. Cecioni [1, p. 43] (or see [6, p. 90, Theorem 46.2]), while that of (i) and (ii) seems to be new. Since the arguments of Sylvester and Cecioni involve the unnecessary complications of resultants and canonical forms, while more modern proofs of (ii) ⇔ (iii) use the Kronecker sum I_s ⊗ A − B^T ⊗ I_r (see e.g. [9, p. 41, Theorem 2.7]; this approach, too, was well known to Sylvester), perhaps a simpler and more economical proof of the whole of Theorem 1.1 may now be worth putting on record. We give the details in Section 2.

In Section 3 we extend Theorem 1.1 (as Theorem 3.1) to finite sets A_1, . . . , A_m of square matrices, and discuss possible applications. To explain a slight discrepancy between the forms of (iii) in Theorems 1.1 and 3.1, we note here an obvious consequence of Theorem 1.1:

Corollary 1.2. For any F, A, B as in Theorem 1.1, AX = XB ⇒ X = 0 if and only if BY = Y A ⇒ Y = 0. □
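For readers who want a concrete check, the following minimal computational sketch (an illustration added here, not part of the original argument; the sample matrices and the use of NumPy are our own choices) tests (ii) ⇔ (iii) on explicit 2 × 2 pairs by computing the rank of the Kronecker operator I_s ⊗ A − B^T ⊗ I_r mentioned above.

```python
# Minimal numerical sketch (not from the paper) of the Sylvester-Cecioni
# equivalence: AX = XB forces X = 0 exactly when spec(A) and spec(B)
# are disjoint.
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])         # spec(A) = {1, 3}
B = np.array([[4.0, 1.0],
              [0.0, 5.0]])         # spec(B) = {4, 5}, disjoint from spec(A)

# AX = XB  <=>  (I_s (x) A - B^T (x) I_r) vec(X) = 0, where vec stacks
# the columns of X; this is the Kronecker form cited above.
K = np.kron(np.eye(2), A) - np.kron(B.T, np.eye(2))
print(np.linalg.matrix_rank(K))    # 4: full rank, so only X = 0 works

B2 = np.array([[1.0, 0.0],
               [0.0, 5.0]])        # now spec(A) and spec(B2) share 1
K2 = np.kron(np.eye(2), A) - np.kron(B2.T, np.eye(2))
print(np.linalg.matrix_rank(K2))   # 3: a nonzero solution X exists
```

Of course this only illustrates the equivalence; the point of Section 2 is a proof avoiding the Kronecker machinery altogether.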
In Section 4 we show that the case r = s = n of Theorem 1.1 (and the case n_1 = . . . = n_m = n of Theorem 3.1) extends, in part, to associative algebras R more general than M_n(F). Our main result (Theorem 4.1) is that in fact (i) ⇒ (iii) holds (even for infinite subsets of R) for all associative rings and algebras R with 1, but easy examples show that, for such R, (iii) no longer implies (i) and Corollary 1.2 also fails.

In view of the scarcity of known properties of arbitrary subsets valid for the class of all associative rings and algebras with 1, any new result of this generality might ordinarily be of considerable interest. However, since the proof of Theorem 4.1 is so transparent, and since neither property (i) nor property (iii) seems at first sight to be of compelling interest in itself or connected to anything more familiar or more significant, Theorem 4.1 may be only a curiosity.

2. Proof of Theorem 1.1

(i) ⇒ (ii). If (ii) is false, then A, B have some common eigenvalue λ; let u, v be corresponding λ-eigenvectors of A, B respectively, and let f, g be the constant polynomials f = 0, g = 1. Then (i) would require h(λ)u = h(A)u = f(A)u = 0, so that h(λ) = 0, but also h(λ)v = h(B)v = g(B)v = v, so that h(λ) = 1, a contradiction. Hence (i) is false.
(ii) ⇒ (i). If (ii) holds then the characteristic polynomials φ_A, φ_B of A, B are coprime over F, and so there exist polynomials p, q ∈ F[t] such that pφ_B + qφ_A = 1. Given any f, g ∈ F[t], let h = pφ_B f + qφ_A g = (1 − qφ_A)f + qφ_A g. Since φ_A(A) = 0 this yields h(A) = f(A), and similarly h(B) = g(B), i.e. (i) holds.

(ii) ⇒ (iii). This part of the argument is well known (see e.g. [4, Section 2.4.4]) but we include it for completeness (see also Remark 2.1 below). Let X be any solution of AX = XB, so that, by an easy induction, φ(A)X = Xφ(B) for every φ ∈ F[t]: in the proof of Theorem 4.1 we shall refer to this as the "intertwining principle". Also we may write φ_B(t) = ∏_{j=1}^{s} (t − μ_j), so that φ_B(A) = ∏_{j=1}^{s} (A − μ_j I_r). If (ii) holds then every factor A − μ_j I_r is invertible, and so φ_B(A) is invertible, whence φ_B(A)X = Xφ_B(B) = 0 gives X = 0, i.e. (iii) holds.

(iii) ⇒ (ii). If (ii) is false then A and B (and hence also the transpose B^T of B) have a common eigenvalue λ in F. Now let u ∈ F^r and v ∈ F^s be any λ-eigenvectors for A and B^T respectively, and define X = uv^T. Since u and v are both nonzero, so also is X, and we have

AX = A(uv^T) = (Au)v^T = (λu)v^T = u(λv)^T = u(B^T v)^T = XB.

Thus (iii) is false, as required. □
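The step (ii) ⇒ (i) is entirely constructive, and the construction is easy to mechanize. The following sketch (our illustration; SymPy and the sample matrices and polynomials are assumptions, not part of the paper) computes p, q with pφ_B + qφ_A = 1 by the extended Euclidean algorithm and verifies that h = pφ_B f + qφ_A g satisfies h(A) = f(A) and h(B) = g(B).

```python
# Illustrative sketch of (ii) => (i); the sample data are our own choices.
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[1, 2], [0, 3]])          # spec(A) = {1, 3}
B = sp.Matrix([[4, 1], [0, 5]])          # spec(B) = {4, 5}, disjoint

phiA = A.charpoly(t).as_expr()           # characteristic polynomials
phiB = B.charpoly(t).as_expr()

# Extended Euclid: p*phiB + q*phiA = gcd(phiB, phiA) = 1 when (ii) holds.
p, q, d = sp.gcdex(phiB, phiA, t)
assert sp.simplify(d) == 1

f, g = t**2 + 1, 2*t - 7                 # arbitrary target polynomials
h = sp.expand(p*phiB*f + q*phiA*g)       # h = p*phiB*f + q*phiA*g

def at(expr, M):
    # Horner evaluation of a polynomial in t at the square matrix M.
    out = sp.zeros(M.rows, M.cols)
    for c in sp.Poly(expr, t).all_coeffs():
        out = out*M + c*sp.eye(M.rows)
    return out

assert at(h, A) == at(f, A)              # h(A) = f(A)
assert at(h, B) == at(g, B)              # h(B) = g(B)
```

The same construction, with minimal polynomials in place of characteristic polynomials, covers the situation of Remark 2.2 below.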
Remark 2.1. As will appear in Remark 4.2 below, Theorem 1.1 can be proved most simply by showing (i) ⇒ (iii) ⇒ (ii) ⇒ (i), so that the arguments above for (i) ⇒ (ii) and (ii) ⇒ (iii) are not really needed.

Remark 2.2. For any field F the property (ii) is equivalent to the minimal polynomials ψ_A, ψ_B of A, B having greatest common divisor (ψ_A, ψ_B) = 1, and this alternative version of (ii) is easily seen to be equivalent to (i) even when F is not algebraically closed and the eigenvalues of A and B are not assumed to lie in F. For, if (i) holds for given f, g, then f(A) = h(A) yields ψ_A | (f − h), and by symmetry also ψ_B | (g − h), so that (ψ_A, ψ_B) is a divisor of (f − h) − (g − h) = f − g. Hence, by choosing f = 1 and g = 0, we obtain (ψ_A, ψ_B) = 1, while also, conversely, (ψ_A, ψ_B) = 1 ⇒ (i) by using ψ_A, ψ_B instead of φ_A, φ_B in the argument above. However, the proof of (ii) ⇒ (iii) above is not valid for general F, since it requires that φ_B (or equivalently ψ_B) have no non-linear irreducible factor over F.

3. A generalization and applications

For simplicity we stated Theorem 1.1 for just two matrices A and B, but we next note the following immediate extension to finite sets of square matrices. While Theorem 1.1 does in fact remain true even when r = s and A = B (since then (i), (ii) and (iii) are all obviously false), we shall state Theorem 3.1 (and Theorem 4.1) so as to exclude this trivial
case: loosely described, each of (i), (ii) and (iii) is a different way of saying that each of a_1, . . . , a_m "differs significantly" from all the others.

Theorem 3.1. For any field F, any n_1, . . . , n_m ∈ N and any m distinct square matrices A_i ∈ M_{n_i}(F) (i = 1, . . . , m) whose eigenvalues all lie in F, the following three joint properties of A_1, . . . , A_m are equivalent:

(i) for any given polynomials f_1, . . . , f_m ∈ F[t] there exists h ∈ F[t] such that f_i(A_i) = h(A_i) (i = 1, . . . , m);
(ii) no pair of A_1, . . . , A_m have any eigenvalue in common;
(iii) for all i, j ∈ {1, . . . , m} with i ≠ j and any n_i × n_j matrix X over F, A_i X = XA_j ⇒ X = 0.

Proof. To prove that (ii) ⇒ (i), now use the fact that there exist polynomials p_1, . . . , p_m ∈ F[t] such that the characteristic polynomials φ_1, . . . , φ_m of A_1, . . . , A_m satisfy ∑_{i=1}^{m} p_i φ_1 · · · φ_{i−1} φ_{i+1} · · · φ_m = 1, and let h = ∑_{i=1}^{m} p_i φ_1 · · · φ_{i−1} φ_{i+1} · · · φ_m f_i; the rest of the proof of Theorem 1.1 extends entirely routinely. □

On writing n_1 + . . . + n_m = n, Theorem 3.1 is obviously relevant to any given n × n matrix A over any given field F ⊇ spec(A), since, e.g. by using the Jordan canonical form of A under similarity, we can write P^{−1}AP = diag(A_1, . . . , A_m), where P ∈ M_n(F) is invertible, each individual A_i has all its eigenvalues equal to some λ_i, and λ_i ≠ λ_j for i ≠ j. Then (ii) holds, and we have the standard direct decomposition F^n = V_1 ⊕ . . . ⊕ V_m of the column vector space F^n into subspaces V_1, . . . , V_m, where each V_i is spanned by the n_i columns of P whose positions correspond to those of the diagonal block A_i in P^{−1}AP, so that

A|_{V_1} = P diag(A_1, 0, . . . , 0) P^{−1} |_{V_1} : V_1 → V_1,
and similarly for i = 2, . . . , m.

Another special case of (ii) in Theorem 3.1 (or Theorem 1.1) is with m = 2 and spec(A_2) = {0}: then A_2 is nilpotent (say with A_2^k = 0), and, by (ii), A_1 (hence also A_1^k) is invertible, so that V_1 = A^k F^n and A^k V_2 = 0. Thus in this case the general decomposition F^n = V_1 ⊕ . . . ⊕ V_m reduces to F^n = V_1 ⊕ V_2 = A^k F^n ⊕ ker(A^k), a form of Fitting's lemma. Several authors (see e.g. [3,5,7]) have noted more general versions of Fitting's lemma, and Theorem 1.1 may be of use in such directions. Moreover, Fitting's lemma (in any of its forms) is closely related to core-nilpotent decompositions and certain generalized inverses (such as the author's pseudo-inverse [2] or quasi-inverse [3]), thus suggesting other settings in which Theorem 1.1 may be applicable.
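As a concrete illustration of this special case (an added sketch with our own choice of matrix, not an example from the paper), the following SymPy fragment computes the splitting F^n = A^k F^n ⊕ ker(A^k) for a 3 × 3 matrix and exhibits the resulting block-diagonal form P^{−1}AP = diag(A_1, A_2) with A_1 invertible and A_2 nilpotent.

```python
# Sketch of Fitting's lemma F^n = A^k F^n (+) ker(A^k); the matrix A is
# an arbitrary sample with spec(A) = {2, 0, 0} and nilpotent part of index 2.
import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 0, 1],
               [0, 0, 0]])
n = A.rows
k = n                            # ranks of A^j stabilize by j = n, so k = n suffices
Ak = A**k

core = Ak.columnspace()          # basis of V_1 = A^k F^n
null = Ak.nullspace()            # basis of V_2 = ker(A^k)

P = sp.Matrix.hstack(*(core + null))
assert P.rank() == n             # the sum V_1 + V_2 is direct and fills F^n

# Both summands are A-invariant, so P^{-1} A P is block diagonal:
# an invertible 1x1 core block (2) and a nilpotent 2x2 block.
print(P.inv() * A * P)
```

Inverting the core block and padding with zeros in this basis yields the pseudo-inverse in the sense of [2], which is one way the core-nilpotent connection mentioned above arises in practice.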
4. Main results for general rings and algebras

In the special case n_1 = . . . = n_m = n, Theorem 3.1 becomes a result about the algebra M_n(F) of all n × n matrices over F, and in fact the implication (i) ⇒ (iii) holds for arbitrary (i.e. not necessarily finite) subsets of any associative algebra with 1 (the 1 being needed for (i) to make sense):

Theorem 4.1. Let F be any (not necessarily algebraically closed) field, R any associative F-algebra with 1, and S any subset of R, regarded as a labeled family S = {a_γ}_{γ∈Γ}, where Γ is any given set with card(Γ) ≤ card(R) and where a_γ = a_δ only when γ = δ. Then the statement

(i) for any given (correspondingly labeled) family {f_γ}_{γ∈Γ} ⊆ F[t] of polynomials, there exists h ∈ F[t] such that f_γ(a_γ) = h(a_γ) (∀γ ∈ Γ)
implies

(iii) for any given γ, δ ∈ Γ with γ ≠ δ and any given x ∈ R, a_γ x = x a_δ ⇒ x = 0.

Proof. Suppose that (iii) is false, i.e. that there exist some pair γ, δ ∈ Γ with γ ≠ δ and some nonzero x ∈ R such that a_γ x = x a_δ. Then, if (i) were true, by two applications of the intertwining principle, we should have

f_δ(a_γ)x = x f_δ(a_δ) = x h(a_δ) = h(a_γ)x = f_γ(a_γ)x,

which, however, is falsified by choosing f_γ, f_δ in (i) to be (e.g.) the constant polynomials 0, 1. Thus (i) ⇒ (iii), as required. □

(Hence, for commutative R, if (i) holds for given a, b ∈ R then a − b is not a divisor of zero.)

There is also a corresponding result for arbitrary associative rings R with 1 and with F[t] in (i) replaced by Z[t], and more generally for rings R over any subrings (with 1) of their centers. One can even formulate a version of Theorem 4.1 for semigroups (with 0 and 1), but then the f_γ and h in (i), instead of being polynomials, need to be monomials of the form z t^k, where k ∈ {0, 1, 2, . . .} and z is any central element of the semigroup.

The converse of Theorem 4.1 is false even when R is itself a field, e.g. when F = R (the real field) and R = C (the complex field) with Γ = {1, 2}. For then (iii) holds for a_1 = i, a_2 = −i, but, for constant polynomials f_1, f_2 ∈ R[t], say f_1(t) = c_1 and f_2(t) = c_2 (with real c_1, c_2), (i) (with h ∈ R[t]) would give c_1 = f_1(a_1) = h(a_1) = h(i) and c_2 = f_2(a_2) = h(a_2) = h(−i); since h has real coefficients and h(i) = c_1 is real, h(−i) = h(i), so that c_1 = c_2. Hence (i) is false whenever c_1 ≠ c_2.
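The failure of the converse in this example can also be checked mechanically. The following fragment (our illustration; the restriction to a generic quadratic h is a simplifying assumption, since the same conjugation argument works in any degree) confirms that a real polynomial takes conjugate values at i and −i, so it cannot send i and −i to two different real constants.

```python
# Check (our illustration) that h in R[t] satisfies h(-i) = conj(h(i)),
# here for a generic quadratic h = a*t**2 + b*t + c with real a, b, c.
import sympy as sp

t = sp.symbols('t')
a, b, c = sp.symbols('a b c', real=True)
h = a*t**2 + b*t + c

hi  = sp.expand(h.subs(t,  sp.I))   # h(i)  = -a + c + b*I
hmi = sp.expand(h.subs(t, -sp.I))   # h(-i) = -a + c - b*I
assert sp.simplify(hmi - sp.conjugate(hi)) == 0

# If h(i) = c1 is real then b = 0 and h(-i) = c1 too, so (i) fails
# whenever c1 != c2, although (iii) holds for a1 = i, a2 = -i.
```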
Remark 4.2. In view of Theorem 4.1, the most efficient way to prove Theorems 1.1 and 3.1 is now via the sequence (i) ⇒ (iii) ⇒ (ii) ⇒ (i). However, because the argument for (ii) ⇒ (i) applies only to finite subsets, Theorem 3.1 cannot be extended to infinite sets of matrices, as the following explicit example at once confirms:

Example 4.3. For F = C and n_1 = n_2 = . . . = 1, let A_j = (j) ∈ M_1(C) (∀j ∈ N). Then, in Theorem 3.1, obviously (ii) and (iii) both hold, but, if we choose f_j(t) in (i) to be the constant polynomial 2^j (∀j ∈ N), then (i) would require that h(j) = 2^j (∀j ∈ N), which is impossible for any h ∈ C[t], since no polynomial can grow exponentially. Thus (i) is false.

Finally, note that Corollary 1.2 does not hold in general even for finite-dimensional R:
Example 4.4. In the subalgebra R = { ( α β ; 0 γ ) : α, β, γ ∈ F } of M_2(F), let

a = ( θ 0 ; 0 0 ),  b = ( 0 0 ; 0 φ )

for any θ, φ ∈ F satisfying θ ≠ 0, φ ≠ 0 and θ ≠ φ (available in any field other than Z_2). Then, for any x = ( α β ; 0 γ ) ∈ R, we have

ax − xb = ( θα (θ − φ)β ; 0 −φγ ),

and so ax = xb ⇒ x = 0 for all x ∈ R. However, for y = ( 0 1 ; 0 0 ) ∈ R, we have by = 0 = ya, and so by = ya does not imply y = 0.

Acknowledgement

I should like to thank the referee for helpful suggestions.

References

[1] F. Cecioni, Sopra alcune operazioni algebriche sulle matrici, Ann. Sc. Norm. Super. Pisa 11 (1910) 1–140.
[2] M.P. Drazin, Pseudo-inverses in associative rings and semigroups, Amer. Math. Monthly 65 (1958) 506–514.
[3] M.P. Drazin, Generalizations of Fitting's lemma in arbitrary associative rings, Comm. Algebra 29 (2001) 3647–3675.
[4] R. Horn, C. Johnson, Matrix Analysis, 2nd edition, Cambridge University Press, 2013.
[5] P. Körtesi, J. Szigeti, A general approach to the Fitting lemma, Mathematika 52 (2005) 155–160.
[6] C.C. MacDuffee, The Theory of Matrices, Chelsea, New York, 1946.
[7] W.K. Nicholson, Strongly clean rings and Fitting's lemma, Comm. Algebra 27 (1999) 3583–3592.
[8] J.J. Sylvester, Sur l'équation en matrices px = xq, C. R. Acad. Sci. Paris 99 (1884) 67–71 and 115–116.
[9] X. Zhan, Matrix Theory, Grad. Stud. Math., vol. 147, Amer. Math. Soc., 2013.