Clinica Chimica Acta, 52 (1974) 163-171
© Elsevier Scientific Publishing Company, Amsterdam - Printed in The Netherlands
CCA 6222
THE FREQUENCY DISTRIBUTIONS OF COMMONLY DETERMINED BLOOD CONSTITUENTS IN HEALTHY BLOOD DONORS
F.V. FLYNNa, K.A.J. PIPERa, P. GARCIA-WEBBa*, K. McPHERSONb and M.J.R. HEALYb
aDepartment of Chemical Pathology, University College Hospital, London W.C. 1 and bDivision of Computing and Statistics, MRC Clinical Research Centre, Northwick Park Hospital, Harrow, Middx., HA1 3UJ (U.K.)
(Received October 2, 1973)
Summary
The frequency distributions of 19 biochemical variables in blood have been studied on 1000 blood donors. The distributions within age/sex groups of calcium, CO2 capacity, chloride, ions difference, inorganic phosphate, potassium, sodium and total protein values did not differ significantly from Gaussian form; those of alkaline phosphatase, cholesterol, creatinine, glucose, glutamic oxaloacetic transaminase, iron, protein-bound iodine, urea and urate could be converted to close agreement with Gaussian distributions by logarithmic transformations, although the glucose distribution remained significantly long-tailed. Albumin and globulin were slightly skew in opposite directions. There was no evidence that distributional shape was affected by age or sex.
Introduction
In 1968-1969 a survey of a thousand blood donors was undertaken with the primary objective of establishing the effects of age, sex and other factors upon the levels of biochemical constituents of the blood in normal individuals. Seventeen biochemical constituents were measured and the results are thus very extensive; this paper is the first of a series in which they will be reported.
When the clinician is faced with the numerical result of a biochemical determination, he is often interested in comparing this value with a population of values (in the statistical sense) which might be obtained from measurements on a large number of healthy individuals. More specifically, he may wish to know what fraction of the normal population exhibits values more extreme than that observed. If this fraction is small, this may be taken as evidence of
* Present address: Alfred Hospital, Prahran, Victoria, Australia.
Fig. 1. Density curves for populations exhibiting skewness (A) and kurtosis (B): the superimposed Gaussian functions have the same means and standard deviations.
abnormality. Many standard statistical techniques assume that the underlying probability distribution of population values is of Gaussian form. One of these is the rule of thumb stating that 95% of the population is contained in the “normal range” defined by the mean ± two standard deviations; this may be far from the truth when the distribution is non-Gaussian in shape.
Non-Gaussian distributions may be asymmetric or skew (Fig. 1A), or they may be symmetric but possess higher (or lower) peaks than the Gaussian curve (Fig. 1B), when they are said to exhibit kurtosis. Skewness and kurtosis are measured by two quantities γ1 and γ2 which depend on the shape of the distribution function [1]; for the Gaussian curve, both γ1 and γ2 are zero. The shape of actual distributions has to be assessed from samples and this can be done to some extent by calculating from the sample values statistics g1 and g2 which estimate the population quantities γ1 and γ2.
Before trying to assess population shape, it is important to allow as far as possible for factors that may affect the population mean. Suppose for example we have male and female populations with Gaussian distributions but different means. If we mix the two together (especially in unequal proportions), we obtain a population which can be noticeably different from the Gaussian form (Fig. 2).
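As an illustration of how the sample statistics might be computed, the following sketch assumes that g1 and g2 are the ratios of k-statistics given by Fisher [1], namely g1 = k3/k2^(3/2) and g2 = k4/k2^2. The function name `shape_statistics` is ours and is not taken from the programs used in the survey.

```python
def shape_statistics(values):
    """Return (g1, g2) for a sample, using Fisher's k-statistics.

    Assumption: g1 = k3 / k2**1.5 and g2 = k4 / k2**2, where k2, k3 and k4
    are the unbiased estimates of the second, third and fourth cumulants.
    """
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are needed")
    mean = sum(values) / n
    # Sums of powers of deviations from the sample mean.
    s2 = sum((x - mean) ** 2 for x in values)
    s3 = sum((x - mean) ** 3 for x in values)
    s4 = sum((x - mean) ** 4 for x in values)
    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    k4 = (n * (n + 1) * s4 - 3 * (n - 1) * s2 ** 2) / (
        (n - 1) * (n - 2) * (n - 3)
    )
    return k3 / k2 ** 1.5, k4 / k2 ** 2


# A symmetric sample gives g1 = 0; a sample with a long upper tail gives g1 > 0.
g1_symmetric, g2_symmetric = shape_statistics([1, 2, 3, 4, 5])
g1_skew, g2_skew = shape_statistics([1, 1, 1, 2, 10])
```

Both statistics have expectation close to zero for samples from a Gaussian population, which is why values consistently away from zero across sub-samples point to a non-Gaussian shape.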
Fig. 2. Non-Gaussianity produced by mixing two Gaussian populations.
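The effect of mixing can be checked numerically. The sketch below computes the population skewness and kurtosis (γ1 and γ2) of a mixture of Gaussian components from the standard moment formulae; the weights, means and standard deviations in the example are invented for illustration and are not survey values.

```python
def mixture_shape(components):
    """Return (gamma1, gamma2) for a mixture of Gaussian components.

    components: list of (weight, mean, sd) tuples whose weights sum to 1.
    """
    total_weight = sum(w for w, _, _ in components)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mean = sum(w * m for w, m, _ in components)
    # Central moments of the mixture, built from each component's
    # offset d = m - mean and its own variance sd**2.
    var = sum(w * (sd ** 2 + (m - mean) ** 2) for w, m, sd in components)
    mu3 = sum(
        w * ((m - mean) ** 3 + 3 * (m - mean) * sd ** 2)
        for w, m, sd in components
    )
    mu4 = sum(
        w * ((m - mean) ** 4 + 6 * (m - mean) ** 2 * sd ** 2 + 3 * sd ** 4)
        for w, m, sd in components
    )
    return mu3 / var ** 1.5, mu4 / var ** 2 - 3.0


# Hypothetical populations: same spread, means three standard deviations apart.
same_mean = mixture_shape([(0.5, 0.0, 1.0), (0.5, 0.0, 1.0)])
equal_mix = mixture_shape([(0.5, 0.0, 1.0), (0.5, 3.0, 1.0)])
unequal_mix = mixture_shape([(0.8, 0.0, 1.0), (0.2, 3.0, 1.0)])
```

With equal proportions the mixture stays symmetric but becomes flatter than the Gaussian curve (γ2 below zero); with unequal proportions it also becomes skew, which is the case emphasised in the text.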
Our data contain 17 measurements (and two derived quantities) on blood donors selected to give samples of nearly fifty subjects from each of nine 5-year age groups and both sexes - more detail is given in the following section. For each measurement, we thus have 18 sub-samples and can calculate 18 values of g1 and g2. From these we can see whether there is any consistent pattern of skewness or kurtosis in the distributions from which the samples were drawn.
Material and Methods
The blood donors
Specimens of blood were collected between 9.30 a.m. and 12.30 p.m. from 1022 Caucasian donors attending routine sessions of the National Blood Transfusion Service; all volunteered to contribute a small sample of blood before the main donation. The volunteers comprised 521 males and 501 females aged between 18 and 65 years. Nine of the donors subsequently failed to satisfy the criteria laid down by the Transfusion Service for fitness for donation and these were eliminated from the survey, together with 2 more from whom an insufficient specimen was obtained and one noted later to have an icteric serum (serum bilirubin 2.4 mg/100 ml). The only selection of available donors that took place was in relation to age and sex. We aimed at obtaining 50 of each sex in each of the age groups 18-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, 51-55, 56-60 and 61-65 years; this proved possible in all groups save the last.
A number of facts were recorded about all the donors, namely the date and time of the blood sampling, sex, date of birth, ABO and Rhesus blood group, the number of previous donations, the time of last having something to eat or drink, and whether clioquinol (‘Enterovioform’) had been taken in the last 3 months or cough mixture in the last week. Women donors were also asked the date of their last menstrual period, or when appropriate, the date of the menopause or of a hysterectomy; married women were asked if they were taking a contraceptive pill.
The collection and handling of blood samples
Twenty-two-ml blood samples were collected from donors while seated. An elastic cuff was used to aid venepuncture but was rarely kept on for more than 1 min, the average duration being about 50 s. Donors were instructed not to clench the fist and during the handling of blood specimens precautions were taken to avoid haemolysis and chemical contamination. Each blood sample was divided between 3 containers:
(a) One ml was transferred into a polystyrene tube containing solid sodium fluoride and potassium oxalate; this sample was refrigerated as soon as possible at 4°C and was used to determine the blood glucose.
(b) 7 ml were transferred into a glass bottle specially cleaned so as to render it iron free; this sample was kept at room temperature until the clot retracted to allow separation of an adequate volume of serum, which was stored at 4°C until used for the determination of iron and total protein.
(c) 14 ml were transferred into polystyrene tubes containing solid lithium heparin and ‘Sep-ar-ate’ polystyrene beads to facilitate rapid separation of the
plasma; the stoppered tubes were centrifuged promptly and in all cases the plasma was separated from the blood cells and transferred into a Pyrex test tube within 20 min. The plasma samples were stored at 4°C until used for all the other determinations.
Chemical analyses
Seventeen constituents were measured on each donor’s blood. The urea, total CO2 capacity, chloride, sodium, potassium and glucose determinations were all carried out within 6 h of collection. All the other determinations, namely total protein, albumin, calcium, inorganic phosphorus, alkaline phosphatase, uric acid, creatinine, cholesterol, glutamic oxaloacetic transaminase, iron and protein-bound iodine, were completed within 30-72 h. All the specimens were batched with routine analyses being carried out for patients attending the hospital; the donor samples were inserted in the analytical runs in a single block so that carry-over effects from specimens of abnormal composition were eliminated.
The methods of chemical analysis were those in routine use in the Department of Chemical Pathology of University College Hospital. Total protein was estimated by the method of Reinhold [2], making use of an automatic diluter and a Gifford 300 spectrophotometer to simplify procedure and improve precision. All other analyses were carried out with AutoAnalyser techniques; these were operated at 40 specimens per h except in the case of albumin and iron determinations which were run at 30 cups per h and protein-bound iodine determinations which were operated at 20 cups per h.
Technicon methods N-21a and N-1c were grouped together for the simultaneous CO2 capacity, chloride, sodium, potassium and urea determinations; the only modifications were that the donors’ specimens were automatically equilibrated with 5.5% CO2 in air on the sampler plate just before sampling and that the ferric chloride-phosphoric acid reagent used in the urea method was replaced by a phosphoric acid-nitric acid reagent.
The method of Halse [3] and procedures based on Technicon methods N-4b and N-7a were grouped together for simultaneous determination of calcium, inorganic phosphate and alkaline phosphatase, respectively; the last procedure was modified by inclusion of a dialysis stage to eliminate the need for blank determinations, as suggested by Axelsson et al. [4], and by the use of human sera assayed by the method of Kind and King [5] as standards. Procedures based on Technicon methods N-13b and N-11b were grouped for simultaneous determination of urate and creatinine; the former method was modified by substitution of sodium carbonate for the sodium cyanide reagent [6]. The remaining determinations were carried out on single channel analysers, albumin by the bromo-cresol green method of Northam and Widdowson [7], glucose by the micro glucose oxidase method of Marks and Lloyd [8], cholesterol by the totally automated procedure of Van der Honing [9], glutamic oxaloacetic transaminase after the fluorometric method of Levine and Hill [10], using human serum assayed by the Karmen kinetic technique as standards, iron by the method of Young and Hicks [11] and protein-bound iodine by Technicon method N-56.
Peak heights from the AutoAnalyzer channels were automatically recorded on punched paper tape. The results were calculated by computer, using
a program which automatically validates the calibration standards and corrects for instrument or chemical drift [12]; they were expressed to two decimal places in the case of potassium, calcium, phosphate, urate, albumin and protein-bound iodine determinations and to one place for all other channels. In the case of the total protein determinations the optical density readings obtained on the digital read-out device were recorded manually and subsequently punched on paper tape.
To monitor the within-batch and between-day precision of the methods, three samples of a serum pool and a single sample of a commercial quality control serum were included in every analytical run.
Initial statistical handling of data
The personal and analytical data on all the donors and the information on the quality control data sheets were punched on 80-column cards and verified by an independent punch operator. The information was then transferred to a magnetic tape file along with three derived items of information, the donor’s age in years, the serum total globulin value and the so-called ‘ions difference’, which equals the amount by which the sum of the sodium and potassium values exceeds the sum of the CO2 capacity and chloride concentration, all concentrations being expressed in mmoles/l; this gives a measure of the minor anions. The information held on the file was subsequently printed out and all entries checked against the source documents.
Forty-eight two-way plots of the concentration of one constituent against that of another were first produced by the computer (the protein-bound iodine results of subjects who were taking oral contraceptives or who had taken clioquinol in the last 3 months were excluded from these plots). The plots were used to identify donors who appeared not to belong to the desired population because of grossly outlying results.
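The ‘ions difference’ defined above is a simple arithmetic combination of four measurements. A minimal sketch follows; the function name and the example concentrations are hypothetical and are not survey values.

```python
def ions_difference(sodium, potassium, co2_capacity, chloride):
    """Return (Na + K) - (CO2 capacity + Cl), all in mmoles/l."""
    return (sodium + potassium) - (co2_capacity + chloride)


# Illustrative concentrations only (mmoles/l); the result is about 16.2.
example = ions_difference(
    sodium=140.0, potassium=4.2, co2_capacity=26.0, chloride=102.0
)
```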
Twelve such donors were identified and excluded from all further analyses; all of these were subsequently shown to have at least one result lying outside three standard deviations from the mean for their age and sex grouping, and six of them had two or more such results.
Results
As described above, each chemical measurement provided 18 age/sex subsamples from which the shape parameter estimates g1 and g2 were calculated. The means and standard errors of these estimates are given in Table I (the standard errors are estimated directly from the variation of the 18 values among themselves, not derived from theoretical considerations).
Eight determinations (calcium, CO2 capacity, chloride, ions difference, phosphorus, potassium, sodium and total protein) gave mean values of g1 and g2 which did not differ significantly from the zero values which are expected on the Gaussian hypothesis. The other eleven determinations had skewness measures which were significant (i.e. consistent in sign across the 18 age/sex sub-groups), many to a high degree. The skewness was positive (long tail towards high values) except for albumin which was negatively skew. Five determinations had significant positive kurtosis; all of these were also significantly skew.
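The summary of the 18 sub-sample estimates can be sketched as follows. The mean and the empirical standard error follow the description above; referring the ratio of mean to standard error to Student's t with 17 degrees of freedom, and reading *, ** and *** as the 5%, 1% and 0.1% levels, are assumptions made for this illustration rather than statements of the procedure actually used.

```python
import math
import statistics

# Two-sided critical values of Student's t with 17 degrees of freedom,
# valid only when 18 estimates are summarised. The mapping to stars is an
# assumption about the conventional 5%, 1% and 0.1% levels.
T17_CRITICAL = ((3.965, "***"), (2.898, "**"), (2.110, "*"))


def summarise_estimates(estimates):
    """Return the mean of the estimates and its empirical standard error."""
    mean = statistics.mean(estimates)
    se = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return mean, se


def significance_stars(mean, se):
    """Return the star label for the hypothesis that the true mean is zero."""
    t = abs(mean) / se
    for critical, stars in T17_CRITICAL:
        if t >= critical:
            return stars
    return ""


# Artificial set of 18 estimates, used only to exercise the functions.
demo_mean, demo_se = summarise_estimates([0.1] * 9 + [0.3] * 9)
```

Applied to published pairs such as a mean of 0.314 with standard error 0.114, this rule returns a single star, which agrees with the marking of creatinine in Table I.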
TABLE I
SKEWNESS AND KURTOSIS OF UNTRANSFORMED VALUES

Constituent                          g1       S.E. (g1)   g2       S.E. (g2)
Albumin                             -0.193    0.052**     0.023    0.113
Alkaline phosphatase                 1.210    0.111***    2.263    0.373***
Calcium                              0.098    0.086       0.004    0.120
CO2 capacity                         0.144    0.074       0.242    0.146
Chloride
Cholesterol                          0.465    0.103***    0.400    0.393
Creatinine                           0.314    0.114*      0.177    0.305
Globulin                             0.250    0.112*      0.664    0.401
Glucose                              0.541    0.172**     1.45     0.408**
Glutamic oxaloacetic transaminase    1.198    0.181***    2.871    0.786**
Ions difference                      0.007    0.123       0.455    0.323
Iron                                 0.670    0.102***    0.851    0.323*
Phosphorus                          -0.116    0.077      -0.127    0.147
Potassium                            0.029    0.074       0.096    0.121
Protein-bound iodine                 2.009    0.4e1***    9.310    3.376*
Sodium                               0.004    0.081       0.217    0.131
Total protein
Urea                                 0.426    0.086***
Uric acid                            0.182    0.064*

Entries that cannot be assigned to a cell with certainty, in source order: chloride, 0.002, 0.140, 0.154, 0.098; total protein and the g2 columns for urea and uric acid, 0.114, 0.146, 0.412, 0.351, 0.166, 0.199, 0.0626, 0.139. Several minus signs are detached from their entries in this copy; only those for albumin g1 (negatively skew, as stated in the text) and for phosphorus are attached.
* P < 0.05; the levels for ** and *** are truncated in this copy.
Logarithmic Transformations
The finding of non-Gaussian distributions is in no way novel, though previous workers have not usually made due allowance for age and sex effects. It has often been shown that the logarithms of certain determinations are more nearly distributed in Gaussian form than the original values. We have examined this possibility for those of our determinations which show significant positive skewness, allowing the extra freedom of adding or subtracting a constant value before taking logarithms - this simple manoeuvre greatly extends the power of the logarithmic transformation [13]. The means and standard errors of g1 and g2 calculated from the transformed values are given in Table II. It will be seen that only three significant values remain (out of a total of 38 tested). The most important is glucose, where the logarithmic values, though symmetric, are very significantly more peaked than the Gaussian curve.
The following points also require comment. Total protein does not differ significantly from the Gaussian form; albumin is negatively skewed; globulin is positively skewed, but a simple logarithmic transformation over-corrects this. In view of the additive relationship between these three measurements, we have used all three untransformed in subsequent statistical analyses.
Protein-bound iodine levels can be drastically raised in subjects who have recently taken clioquinol (‘Enterovioform’) or certain cough mixtures, or who are taking high-oestrogen oral contraceptives. We have excluded from our analyses donors whose values were known to be liable to this kind of influence.
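The transformation with an additive constant, log (x + c), can be sketched as below. The helper names are ours, the data are artificial values constructed so that log (x + 10) is symmetric, and natural logarithms are used because the base does not affect distributional shape; none of this is taken from the survey programs.

```python
import math


def shifted_log(values, constant=0.0):
    """Return log(x + constant) for each value; x + constant must be positive."""
    shifted = [x + constant for x in values]
    if min(shifted) <= 0:
        raise ValueError("x + constant must be positive for every value")
    return [math.log(v) for v in shifted]


def moment_skewness(values):
    """Return the simple moment ratio m3 / m2**1.5 as a measure of skewness."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Artificial values built so that log(x + 10) is symmetric about 3.7.
artificial = [math.exp(3.7 + 0.05 * k) - 10 for k in range(-10, 11)]
raw_skew = moment_skewness(artificial)
log_skew = moment_skewness(shifted_log(artificial, 10))
```

The untransformed values have a long upper tail (positive skewness), while the transformed values are symmetric to within rounding error. In practice the constant would be chosen by trial so that the transformed sub-samples show no consistent skewness.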
TABLE II
SKEWNESS AND KURTOSIS OF TRANSFORMED VALUES
The units for the additive constants are: mg/100 ml for urea and uric acid; µg/100 ml for iron and protein-bound iodine; Karmen units per ml/min at 25° for GOT; King Armstrong units/100 ml at 37° for alkaline phosphatase.
Variable
S.E.
g1 phosphatase
-
2)
0.119
0.136
0.860
0.081
0.035
0.171
0.026
0.094
0.089
0.185
0.186
0.088*
0.368
0.246
0.134
0.158
1.156
0.21-l**
0.002
0.168
1.063
0.581
0.084
0.083
0.121
0.107
-
0.058
0.097
0.268
0.259
-
0.004
0.084
0.006
0.144
-
0.098
0.068
0.041
0.134
Log
creatinine
Log
globulin
-
Log
glucose
-
Log
(GOT
- 4)
-
Log
(iron
+ 30)
-
Log
(protein-bound
Log
(urea
+ 10)
Log
(uric
acid
+ 4)
0.306*
(92)
0.044
cholesterol
2)
S.E.
-
(alk.
Log
-
g2
-
Log
iodine
(gl)
* P < 0.05. ** P
Age and Sex Effects on Distributional Shape
The above analyses have tacitly assumed that a single distributional shape can be used for all age and sex groups. To test this assumption, each set of 18 values of g1 and g2 (using transformed variables where appropriate) was subjected to an analysis of variance of the form

                      Degrees of Freedom
Sex                           1
Age-linear trend              1
Age-remainder                 7
Residual                      8
Total                        17
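The partition of the 17 degrees of freedom shown above can be sketched for a table of 2 sexes by 9 age groups with one estimate per cell. The function below is an illustration only: it assumes equally spaced age groups for the linear contrast and treats the sex by age interaction as the residual, which is our reading of the layout and not a description of the program actually used.

```python
def anova_sex_by_age(table):
    """Partition the sum of squares of table[sex][age_group].

    Returns a dict mapping each source to (degrees of freedom, sum of squares).
    """
    n_sex = len(table)
    n_age = len(table[0])
    cells = [y for row in table for y in row]
    grand = sum(cells) / len(cells)
    sex_means = [sum(row) / n_age for row in table]
    age_means = [sum(row[a] for row in table) / n_sex for a in range(n_age)]

    ss_total = sum((y - grand) ** 2 for y in cells)
    ss_sex = n_age * sum((m - grand) ** 2 for m in sex_means)
    ss_age = n_sex * sum((m - grand) ** 2 for m in age_means)

    # Linear contrast across age groups, assumed equally spaced.
    centre = (n_age - 1) / 2
    coeffs = [a - centre for a in range(n_age)]
    contrast = sum(c * m for c, m in zip(coeffs, age_means))
    ss_linear = n_sex * contrast ** 2 / sum(c * c for c in coeffs)

    return {
        "Sex": (n_sex - 1, ss_sex),
        "Age-linear trend": (1, ss_linear),
        "Age-remainder": (n_age - 2, ss_age - ss_linear),
        "Residual": ((n_sex - 1) * (n_age - 1), ss_total - ss_sex - ss_age),
        "Total": (n_sex * n_age - 1, ss_total),
    }


# Artificial table with a pure sex offset and a pure linear age trend.
demo_table = [[sex + 0.5 * age for age in range(9)] for sex in range(2)]
demo_anova = anova_sex_by_age(demo_table)
```

Each mean square is the sum of squares divided by its degrees of freedom, and the three effect mean squares would be compared with the residual mean square by variance ratio tests.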
The results of the 38 analyses will not be given in detail; only three significant mean squares appeared out of 114 tested, those associated with sex differences in g, for log creatinine (P < 1%) and log cholesterol (P < 5%) and the nonlinear age effect in g2 for globulin (P < 5%). On the whole, it seems fair to disregard these as chance effects. Discussion Our finding that the frequency distributions of almost all the determinations studied can be taken to be Gaussian, possibly after preliminary logarithmic transformation, is of more than academic interest in the context of determining normal ranges or reference values. These values have to be determined from samples, and consequently will be subject to sampling variability - if two
samples are drawn from a single population, the reference values determined from the first sample will not agree exactly with those determined from the second. The likely size of the disagreement will depend, among other things, on the statistical method used. There are in practice two possible choices; either we may fit a Gaussian distribution to the sample values and calculate the reference values from the estimated mean and standard deviation, or else we may arrange the sample values in ascending order and read off the reference values from them, interpolating if necessary. This latter method makes no assumptions about the form of the underlying distribution and has been recommended (under the name of the percentile method) for this reason by Herrera [14]. The two methods differ quite markedly however in the efficiency with which they use the data. The percentile method, used to determine the usual 95% reference values, has an efficiency of no more than 50% - this means that to attain the same precision in the reference values as that achieved by the Gaussian procedure, twice as many subjects would be needed. Further information and actual examples are given in Elveback and Taylor [15] (see especially their Fig. 4).
It is also of interest that a single distributional shape (Gaussian or log-Gaussian) appears to hold for both sexes and all the age-groups studied. It is sometimes suggested [16] that the observed distributions which are skew to the right are made so by the appearance of a proportion of high values, abnormal though sub-clinically so, which increases with age. Our results make such an explanation rather unlikely. We are not able, however, to assert that our findings, especially the additive constants used in the transformations, will hold outside the age-range 18-65.
Acknowledgements
We wish to thank the Director, Dr T.E. Cleghorn, and the staff of the North London Centre of the National Blood Transfusion Service for enabling us to undertake this investigation. We also wish to thank the blood donors whose collaboration made this work possible and the Department of Health and Social Security for financial support.
References
1 R.A. Fisher, Statistical Methods for Research Workers, 10th edn., Oliver and Boyd, Edinburgh, 1946, Chapter II.
2 J.G. Reinhold, in M. Reiner (Ed.), Standard Methods of Clinical Chemistry, Vol. 1, Academic Press, New York, 1953, p. 88.
3 K. Halse, in Automation in Analytical Chemistry (Technicon Symposium 1967), Vol. 2, p. 143.
4 H. Axelsson, B. Ekman and D. Knutsson, in Automation in Analytical Chemistry (Technicon Symposium 1965), p. 603.
5 P.R.N. Kind and E.J. King, J. Clin. Pathol., 7 (1954) 322.
6 F. Eichhorn, S. Zelmanowski, E. Lew, A. Ruttenberg and B. Fanias, J. Clin. Pathol., 14 (1961) 450.
7 B.E. Northam and G.E. Widdowson, Ass. Clin. Biochem. Technical Bulletin No. 11 (1967).
8 V. Marks and K. Lloyd, Proc. Ass. Clin. Biochem., 2 (1963) 176.
9 J. Van der Honing, C.C. Saarloos and J. Stip, Clin. Chem., 14 (1968) 960.
10 J.B. Levine and J.B. Hill, in Automation in Analytical Chemistry (Technicon Symposium 1965), p. 569.
11 D.S. Young and J.M. Hicks, J. Clin. Pathol., 18 (1965) 98.
12 F.V. Flynn, Proc. Roy. Soc. Med., 59 (1966) 779.
13 J.W. Tukey, Ann. Math. Statist., 28 (1957) 602.
14 L. Herrera, J. Lab. Clin. Med., 52 (1958) 34.
15 L.R. Elveback and W.F. Taylor, Ann. N.Y. Acad. Sci., 161 (1969) 538.
16 J.B. Files, H.J. Van Peenen and D.A.B. Lindberg, J. Am. Med. Ass., 205 (1968) 684.