Ultramicroscopy 28 (1989) 315-319 North-Holland, Amsterdam
MULTIVARIATE STATISTICAL ANALYSIS OF ELECTRON PROBE MICROANALYTICAL DATA ON CELL NUCLEAR CONSTITUENTS
C. QUINTANA
U 194 Méthodologie Informatique et Statistique en Médecine, INSERM, 91 Boulevard de l'Hôpital, F-75013 Paris, France and Centre de Biologie Cellulaire CNRS, 67 Rue Maurice Günsbourg, F-94205 Ivry-sur-Seine Cedex, France *
and
A. OLLACARIZQUETA
Servicio Microscopía Electrónica, Centro Investigaciones Biológicas, CSIC, 144 Velázquez, 28006 Madrid, Spain
Received at Editorial Office 3 November 1988; presented at Workshop March 1988
A multivariate statistical method, principal component analysis, has been used to process electron probe microanalytical data from cell nuclei. Fifty-seven measurements from different areas of chromatin and nucleolus in follicular rat cells have been studied. The variables are the characteristic X-ray signals for P, S, Al, Fe, Cu and Zn. The method reveals three groups of individuals: the chromatin areas, which are associated with a higher concentration of P, and two groups of nucleolar areas, one of which is associated with a higher content of S, Al and Zn. The high degree of correlation between these three elements points to a chemical affinity of the metals for protein, S being the signature for proteins.
* Address for correspondence.
0304-3991/89/$03.50 © Elsevier Science Publishers B.V. (North-Holland Physics Publishing Division)

1. Introduction

Biological applications of electron probe microanalysis (EPMA) concern either physiological or pathological situations (see ref. [1] for a recent review). Specimen preparation is different for these two major fields of research. In physiology, one is mostly interested in the localization of free ionic elements, which diffuse between the different extra- and intracellular components; in this case, only rapid freezing, cryosectioning and freeze-drying techniques can be used to retain these elements in situ at their "in vivo" concentration level. However, these techniques offer poor morphological preservation of cellular structures, which makes the ultrastructural localization of these diffusible elements difficult. Conversely, the analysis of toxic elements (which are generally present at high concentration within stable compounds) can be achieved on conventional electron microscopy preparations, even when stained with Os, U or Pb. There is no major difficulty in localizing these toxic elements in lysosomes [2].

A third class of biological applications is relevant to both the physiological and the pathological domains: the study of metals which are naturally present in cell nuclei. Metals such as Fe, Ni, Cu and Zn enter into the composition of some nucleo-enzymes; they can also interact with nucleic acids, and some of them, such as Zn, can play a role in the expression and the regulation of genes [3]. In the present state of our knowledge, their physiological influence when they are present at low concentrations in the nuclei is very variable and poorly understood. However, they become very toxic at high concentrations [4]. It is rather likely that they are not diffusible elements and, consequently, EPMA can be performed on specimens prepared without the constraints required when diffusible elements are involved.

Up to now, Fe, Ni, Cu and Zn have only been detected in cellular nuclei of Dinoflagellates [5], in aldehyde-fixed specimens as well as in thin cryofixed cryosections. Very recently, we have shown [6] that they can also be found, together with Al, in dense chromatin, nucleolus and chromosomes of different types of animal cells. This work has been performed on material chemically fixed with aldehydes, dehydrated and embedded at low temperature in Lowicryl. This type of specimen preparation provides an excellent preservation of nuclear components and of rough endoplasmic reticulum, which can be visualized without any stain in dark field mode, as well as in CTEM or STEM. The sensitivity of the analysis is also improved owing to the absence of staining products and the low density of the embedding medium.

Furthermore, understanding the physiological activity of these metals requires the determination of their bonding with nucleic acids and/or proteins. As it does not seem possible in the near future to solve this complex molecular problem directly with the available EELS or Auger techniques, we have followed another approach. It consists of looking for a correlation between the presence of these metals and that of P and S in the different nuclear compartments. Such a correlation would be useful because P is a signature for nucleic acids (1 P atom per base) and S is a signature for some amino acids. Multivariate statistical analysis of data constitutes an appropriate tool for the study of such correlations. These methods study individuals and their characters (variables) globally, and they are suited to partially correlated variables, as in our case [7,8].
2. Principle of multivariate statistical analysis
Multivariate statistical analysis gathers two main classes of methods:
- the factorial methods, based on the properties of Euclidean vector spaces;
- the classification methods, based on algorithmic calculations.

There are three different factorial methods [9-11]: principal component analysis, correspondence analysis and discriminant factorial analysis. The choice of method is governed by the type of problem and the nature of the data (qualitative or numerical). In the present case, principal component analysis has been chosen, and this paper presents the preliminary results of this study.

Principal component analysis (PCA) is a descriptive method which graphically displays the maximum information contained within a table of n individuals and p quantitative variables (characters). On these graphs, similarity or difference between variables and individuals is appreciated through the proximity or remoteness of their projections. The principle of the technique is to transform the p original characters, which are more or less correlated, into q new characters (principal components) which are uncorrelated. These new characters are the linear combinations of the p original variables which maximize the variance of the individuals. Mathematically, the following steps are followed:
(1) Calculate either the variance-covariance matrix or the correlation matrix of the variables.
(2) Diagonalize this matrix. The eigenvalues are the variances of the individuals and of the variables on the new set of orthogonal axes. The matrix of eigenvectors contains the transformation coefficients between the initial and the new axes.
(3) Calculate and graphically display, on a circle of unit radius, the correlations between the initial variables and the principal axes. The variables with the highest correlation coefficients are those which contribute most to the creation of an axis.
(4) Calculate and graphically display the coordinates of the individuals along the principal axes. As above, the individuals which contribute most are those with the highest coordinates.

From the study of the relative positions of individuals and variables in several planes (a single plane may introduce perspective errors), it is then possible to infer the correlations between individuals (gathered into several groups), between variables, and between groups and variables.
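The four steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software used for the results reported here: the function name `pca_correlation` and the synthetic table of 57 individuals and 6 variables are hypothetical and serve only to make the example runnable.

```python
import numpy as np


def pca_correlation(X):
    """PCA of an (n individuals x p variables) table via its correlation matrix.

    Hypothetical helper written for illustration only.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Center and reduce each variable (zero mean, unit variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 1: correlation matrix of the variables.
    R = Z.T @ Z / n
    # Step 2: diagonalize; eigh suits symmetric matrices. Sort by decreasing variance.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 3: correlations between the initial variables and the principal axes.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    # Step 4: coordinates of the individuals along the principal axes.
    scores = Z @ eigvecs
    return eigvals, loadings, scores


# Synthetic data (assumption): three variables share a common factor, three are noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(57, 1))
X = np.hstack([common + 0.5 * rng.normal(size=(57, 3)), rng.normal(size=(57, 3))])

eigvals, loadings, scores = pca_correlation(X)
print("eigenvalues:", np.round(eigvals, 4))
print("contribution to total variance (%):", np.round(100 * eigvals / eigvals.sum(), 1))
print("correlations with axis 1:", np.round(loadings[:, 0], 4))
```

Because the variables are centered and reduced, the eigenvalues sum to p, and each column of `loadings` lies inside the unit circle used for the graphical display of step (3). The sign of each axis is arbitrary, so a whole column of `loadings` (and of `scores`) may be flipped without changing the interpretation.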
Fig. 1. Results of the principal component analysis on 57 different nuclear areas of follicular rat cells. Analysis of the variables P, S, Al, Fe, Cu, Zn: (a) projection on principal axes 1-2; (b) projection on principal axes 1-3; (c) projection on principal axes 1-4. One observes the opposition between phosphorus and the group of elements sulphur, aluminium and zinc. Analysis of individuals: (d), (e) and (f): group I is morphologically identified as dense chromatin; groups II and III are morphologically identified as nucleolus; group III is richer in sulphur-containing proteins and in the metals aluminium and zinc.
3. Application to EPMA results

The set of data is made of 57 individuals (EPMA analyses of different chromatin or nucleolus areas in follicular rat cells). The variables are the six characteristic X-ray signals I_x (P, S, Al, Fe, Cu, Zn). The Ca signal, always present in great amounts in all the analyzed areas [6], has been treated as a "supplementary" variable, i.e. it does not interfere with the analysis, but its correlation with the principal axes is calculated and displayed in the graphs.

The signals have first been normalized, i.e. divided by the X-ray continuum measured between 5.2 and 6.2 keV (W), which has itself been corrected for the grid contribution (W_g). These numbers I_x/(W - W_g) are proportional to the mass fraction of each element (Hall method). As the results of the PCA are influenced by the magnitude of the variables, it is necessary to transform the initial table of 57 rows and 6 columns into another one with centered and reduced variables. Consequently, all measured elements have the same weight in the statistical analysis, independently of their concentration.

The results obtained with the STATITCF software on a PC AT microcomputer are shown in tables 1 and 2 and in fig. 1. The major results are:

(a) Analysis of the characters. Fig. 1 displays the correlation of the studied variables with the principal axes 1, 2, 3 and 4. One sees a strong correlation between axis 1 and the elements S, Al, Zn and P. One can also note the opposition between P and the group S, Al and Zn. Cu and Fe are well correlated with the second and third axes respectively. There is no correlation between Ca and principal axes 1 and 2.

(b) Analysis of the individuals. One distinguishes three groups of individuals which differ in their global composition: I, II and III (plotted with distinct symbols in fig. 1). Group I, associated with a higher concentration of phosphorus, is opposed, along axis 1, to group III, characterized by a higher concentration of S, Al and Zn. Group II is intermediate between them. The morphological identification of the individuals in the different groups shows that group I corresponds to areas of dense chromatin, while groups II and III correspond to nucleolar areas.

Table 1
Results of the principal component analysis

(a) Diagonalization of the correlation matrix

                                     Axis 1   Axis 2   Axis 3   Axis 4
Eigenvalues (variance on the axis)   2.3263   1.3738   1.0324   0.7390
Contribution to total variance (%)     38.8     22.9     17.2     12.3

(b) Correlation between the variables and the principal axes

Variables    Axis 1    Axis 2    Axis 3    Axis 4
P           -0.6459    0.1465    0.3986    0.6135
S            0.8477    0.2532    0.0311    0.1540
Al           0.8026    0.0948    0.4150    0.3035
Fe          -0.1968   -0.4450    0.7839   -0.3456
Cu          -0.1785   -0.8301   -0.2930    0.3560
Zn           0.6898   -0.6262    0.0000    0.0255
Ca          -0.1615   -0.0473   -0.4328    0.6848
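The entries of Table 1 can be cross-checked against one another. For a PCA on the correlation matrix, the contribution of an axis to the total variance is its eigenvalue divided by the number of active variables (six here), the squared correlations of the active variables with an axis sum to the eigenvalue of that axis, and the correlation columns of different axes are mutually orthogonal. The short Python sketch below tests these identities; the numerical values are transcribed from Table 1 for the active variables only (Ca is supplementary and is left out), and the transcription, including the assignment of columns to axes, should be treated as an assumption.

```python
import numpy as np

# Values transcribed from Table 1 (assumed transcription; active variables only).
eigenvalues = np.array([2.3263, 1.3738, 1.0324, 0.7390])
contribution_pct = np.array([38.8, 22.9, 17.2, 12.3])
correlations = np.array([
    # Axis 1   Axis 2   Axis 3   Axis 4
    [-0.6459,  0.1465,  0.3986,  0.6135],  # P
    [ 0.8477,  0.2532,  0.0311,  0.1540],  # S
    [ 0.8026,  0.0948,  0.4150,  0.3035],  # Al
    [-0.1968, -0.4450,  0.7839, -0.3456],  # Fe
    [-0.1785, -0.8301, -0.2930,  0.3560],  # Cu
    [ 0.6898, -0.6262,  0.0000,  0.0255],  # Zn
])

n_variables = correlations.shape[0]

# Contribution of each axis: eigenvalue / number of active variables.
pct_from_eigenvalues = 100.0 * eigenvalues / n_variables

# Sum of squared correlations per axis should reproduce the eigenvalue.
sum_of_squares = (correlations ** 2).sum(axis=0)

# Columns belonging to different axes should be orthogonal.
gram = correlations.T @ correlations
off_diagonal = gram - np.diag(np.diag(gram))

print("contribution (%) from eigenvalues:", np.round(pct_from_eigenvalues, 1))
print("sum of squared correlations:", np.round(sum_of_squares, 4))
print("largest off-diagonal product:", np.abs(off_diagonal).max())
```

Agreement is expected only to within the four-decimal rounding of the published values.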
4. Discussion

In theory, with aldehyde fixation, the possibility of an artefactual redistribution of all the analysed elements cannot be rejected. However, the strong association of P with dense chromatin and of S with the nucleolus agrees with classical biochemical data. Furthermore, the statistical analysis shows the correlation between S and the metals Al and Zn. On the other hand Ca, which is a well-known diffusible element, is present in all analyzed compartments, but its poor correlation with any principal axis indicates a non-specific distribution. Thus, the high correlation between S, Al and Zn implies a chemical affinity. Although Fe and Cu are well correlated with axes 3 and 2 respectively, they do not seem to be chemically linked in stoichiometric form to the other analysed elements. Consequently, these results seem to demonstrate that a nucleolar component, group III (which is not yet identified), is richer in sulphur-containing proteins, Al and Zn.

Table 2
Means ± standard deviation of the mean for all individuals and for each group of individuals

             P          S        Al       Fe          Cu       Zn        Ca       n
Total        262 ± 17   17 ± 2   16 ± 1   5.5 ± 0.4   13 ± 1    7 ± 1    85 ± 9   57
Group I      404 ± 27    6 ± 2   10 ± 2   6 ± 1       14 ± 2    4 ± 0.4  96 ± 7   17
Group II     213 ± 18   17 ± 2   12 ± 1   6 ± 1       14 ± 2    6 ± 1    76 ± 6   19
Group III    183 ± 10   28 ± 2   25 ± 2   5 ± 1       11 ± 3   12 ± 1    83 ± 5   21
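As a simple arithmetic check on Table 2, the mean over all 57 individuals should equal the average of the three group means weighted by the group sizes (17, 19 and 21). The Python sketch below performs this comparison; the values are transcribed from Table 2 and the variable names are hypothetical. Since the published means are rounded, only approximate agreement can be expected.

```python
import numpy as np

elements = ["P", "S", "Al", "Fe", "Cu", "Zn", "Ca"]

# Group sizes and means transcribed from Table 2 (assumed transcription).
group_sizes = np.array([17, 19, 21])          # groups I, II, III
group_means = np.array([
    [404.0, 213.0, 183.0],  # P
    [  6.0,  17.0,  28.0],  # S
    [ 10.0,  12.0,  25.0],  # Al
    [  6.0,   6.0,   5.0],  # Fe
    [ 14.0,  14.0,  11.0],  # Cu
    [  4.0,   6.0,  12.0],  # Zn
    [ 96.0,  76.0,  83.0],  # Ca
])
total_means = np.array([262.0, 17.0, 16.0, 5.5, 13.0, 7.0, 85.0])

# Size-weighted average of the group means for each element.
weighted_means = group_means @ group_sizes / group_sizes.sum()
relative_gap = np.abs(weighted_means - total_means) / total_means

for name, w, t in zip(elements, weighted_means, total_means):
    print(f"{name:>2}: weighted group mean = {w:6.1f}, reported total mean = {t:6.1f}")
```

The same table also shows the contrasts discussed above: group I has the highest P mean, and group III the highest S, Al and Zn means.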
5. Conclusion

Our results show that multivariate statistical analysis, mainly principal component analysis, can be applied to distinguish the different constituents of cell nuclei by their elemental composition in phosphorus, sulphur and metals. The automatic classification methods, followed by discriminant factorial analysis, are being carried out at present, and preliminary results agree with the PCA data. A comparative study of these two approaches will be published in the near future.
References
[1] A. LeFurgey, M. Bond and P. Ingram, Ultramicroscopy 24 (1988) 185.
[2] P. Galle, J. Microscopy 19 (1974) 17.
[3] A. Klug and D. Rhodes, in: Trends in Biochemical Sciences, Vol. 12 (1987) p. 464.
[4] D.R.C. McLachlan and B.J. Farnell, in: Neurology and Neurobiology, Vol. 15, Eds. S. Gabay, J. Harris and B.T. Ho (Liss, New York, 1985) pp. 69-87.
[5] L.P. Kearns and D.C. Sigee, J. Cell Sci. 46 (1980) 113.
[6] C. Quintana, A. Olmedilla, A. Ollacarizqueta and N. Antoine, Biol. Cell 61 (1987) 115.
[7] C. Ballan-Dufrançais, A.Y. Jeantet, C. Feghali and S. Halpern, Biol. Cell 53 (1985) 283.
[8] H.B. Jones, A.J. Morgan, D.P. Lovell and D. Bellamy, Micron and Microscopica Acta 17 (1986) 307.
[9] J.P. Benzécri, in: Methodologies of Pattern Recognition, Ed. S. Watanabe (Academic Press, New York, 1969) p. 35.
[10] L. Lebart, A. Morineau and J.P. Fénelon, Traitement des Données Statistiques; Méthodes et Programmes (Dunod, Paris, 1979).
[11] T. Foucart, Analyse Factorielle; Programmation sur Microordinateur (Masson, Paris, 1982).