FERTILITY AND STERILITY
Vol. 64, No. 6, December 1995
Copyright © 1995 American Society for Reproductive Medicine
Printed on acid-free paper in U.S.A.
Sperm morphology and in vitro fertilization outcome: a direct comparison of World Health Organization and strict criteria methodologies*
Abraham Morgentaler, M.D.†
May Y. Fung, B.S.
Doria H. Harris, Ph.D.
R. Douglas Powers, Ph.D.
Michael M. Alper, M.D.
Division of Urology and Department of Obstetrics and Gynecology, Beth Israel Hospital, Harvard Medical School, Boston, Massachusetts, and Boston IVF, Brookline, Massachusetts
Objective: To perform a direct comparison of two sperm morphology methodologies with regard to IVF outcome.
Design: Blinded comparison of two methods of morphology assessment using the same morphology slides.
Patients: Data were obtained from 132 couples in a consecutive series of patients undergoing IVF.
Main Outcome Measures: Two practical end points were selected for analysis for each couple: the presence of any fertilization and the number of fertilized eggs. Normal traditional morphology was defined as ≥40% normal forms in a sample and normal strict criteria was defined as ≥4%.
Results: Traditional morphology demonstrated a higher sensitivity and negative predictive value than strict criteria (87% versus 61%, and 68% versus 36%, respectively). Positive predictive value and specificity were also numerically greater but did not reach statistical significance. Abnormal traditional morphology, but not strict criteria, was associated with reduced fertilization even among samples with normal sperm concentration and motility. Samples with normal morphology were associated with a greater number of fertilized eggs per couple than those with abnormal morphology: this difference was 3.2 fertilized eggs for traditional morphology and 1.6 for strict criteria. Overall, for samples with <40% by traditional morphology only one case yielded more than two fertilized eggs. In contrast, up to five fertilized eggs were noted for the lowest strict criteria scores.
Conclusions: Comparison of traditional morphology and strict criteria with regard to IVF outcome favored traditional morphology in several areas. In particular, low scores were more predictive of poor IVF outcome.
Fertil Steril 1995;64:1177-82
Key Words: Sperm, morphology, strict criteria, IVF, World Health Organization
Received January 10, 1995; revised and accepted May 18, 1995.
* Presented in part at the Meeting of the American Urological Association, San Antonio, Texas, May 15 to 20, 1993.
† Reprint requests: Abraham Morgentaler, M.D., 330 Brookline Avenue, Boston, Massachusetts 02215 (FAX: 617-278-7292).
In 1986 and 1988 Kruger et al. (1, 2) described a set of strict criteria for normal sperm morphology that was highly correlated with IVF outcome. Studies in large IVF populations have confirmed the utility of sperm morphology in fertility studies when performed using strict criteria (3). In contrast, sperm morphology assessment by World Health Organization (WHO) criteria (4, 5) has not been identified uniformly as a predictive variable for IVF success. Some reports have indicated a positive relationship to IVF outcome (6-8), whereas others have observed no association whatsoever (9-12). In view of these mixed results for WHO, strict criteria has gained prominence as an improved methodology for morphology assessment. However, properly determining the merits of one test over another requires a direct comparison, ideally performed under
blinded circumstances. Despite increased application of strict criteria among IVF programs in this country, there exist only limited data comparing strict criteria to the classic WHO morphology assessment (13, 14). Interpretation has been restricted by small numbers and by the potential for observer bias when samples are scored according to two criteria by the same individual or laboratory. As a result, it remains to be demonstrated whether strict criteria is superior to WHO criteria.
The purpose of this study was to compare strict criteria and WHO criteria directly in an active IVF program. This was performed by having the same morphology slides read in blinded fashion by two different technicians in different laboratories, each trained in either strict criteria or the WHO method. Two practical end points were selected for analysis: the presence of any fertilization and the number of fertilized eggs. Because the only variable in the study was the method of morphology assessment, female factors and technical aspects of IVF were identical for both groups, leaving any difference in outcome to be explained by differences in morphology assessment.
MATERIALS AND METHODS
Patients
The study population consisted of 141 consecutive couples undergoing IVF at Boston IVF in Brookline, Massachusetts, from October to December 1991. Nine cases were excluded due to incomplete data or inadequate slide material, leaving 132 cases for analysis.
Semen Analysis
An aliquot of the raw semen sample was used for semen analysis. The remainder of the specimen was processed for IVF. Strict criteria morphology assessment was performed by technicians having undergone specific and personal training by one of the original authors of this technique. At a later date, the same morphology slides were reviewed in blinded fashion in a different laboratory by a technician experienced only in WHO morphology assessment. Although the WHO has proposed ≥50% and, more recently, ≥30% as the proper threshold for normal morphology (4, 5), our own experience has suggested ≥40% as a more useful level, and this was the threshold used in this study. Data for strict criteria were studied at several thresholds, including >14%, ≥8%, and ≥4%.
Morphology Assessment
Strict criteria morphology assessment was performed with careful adherence to published protocols (2). All sperm heads failing to meet established size and shape criteria were categorized as abnormal. World Health Organization morphology differed primarily in classifying borderline head forms as normal. Tail defects, pinheads, grossly abnormal heads, or other obvious deformities were categorized as abnormal by both strict criteria and WHO criteria.
In Vitro Fertilization
Clinical and laboratory aspects of IVF as performed in this study have been described previously (15). The number of fertilized eggs was recorded, and results were categorized as IVF+ if any fertilized eggs occurred and IVF- if no eggs fertilized.
Statistical Analysis
Statistical analysis was performed using StatView 4.0 software (Abacus Concepts, Berkeley, CA). Differences of means were analyzed using the nonparametric Mann-Whitney test. Two-way tables with Fisher's exact P values were used for group comparisons. Multiple and stepwise regression was performed for identification of variables contributing independently to the number of fertilized eggs obtained.
RESULTS
Sperm Morphology
World Health Organization criteria scores ranged from 8% to 72%, with a mean of 45.5%, whereas strict criteria scores ranged from 0% to 21%, with a mean of 5.4%. Normal sperm morphology was present in 71% of men by WHO criteria at a threshold of ≥40% but only 5% of men by strict criteria at a threshold of >14% normal forms. The population above this threshold has been reported to have a favorable prognosis with regard to IVF outcome (1, 2). However, because of the small number of men qualifying as normal at ≥14%, further analyses were performed using lower thresholds of ≥8% and ≥4%, which resulted in categorization as normal for 27% and 58% of the population, respectively. For comparison purposes, ≥4% was used to represent a normal sample by strict criteria. Samples with strict criteria scores below this level have been demonstrated previously to have poor IVF success (1-3).
Correlation Between WHO and Strict Criteria
Sixty-eight percent of samples with normal WHO criteria also had normal strict criteria, and 66% of those with abnormal WHO criteria were also abnormal by strict criteria. Conversely, 83% of those categorized as normal by strict criteria were similarly categorized by WHO criteria, and 46% of those with abnormal strict criteria scores were abnormal by WHO criteria. Forty-eight percent of the total study population was normal by both strict and WHO criteria. The overall correlation between WHO and strict criteria morphology was 0.461.
Fertilization Rates and Morphology Scores
Overall, there was fertilization of one or more eggs in 90 of 132 couples, for a fertilization rate of 68%. A fertilization rate of 100% was achieved by samples with strict criteria scores >14%, although this applied to only six cases. The highest WHO morphology scores also demonstrated excellent IVF success, with 90% fertilization for scores ≥65% (n = 10). At the lower end of the scale, samples with strict criteria scores <4% demonstrated a fertilization rate of 64%, which was not statistically different from the overall fertilization rate. In contrast, fertilization rates for men with WHO scores <40% were significantly reduced at 32% (Fisher's exact, P < 0.0001).
Statistical Measures
Positive and negative predictive values, sensitivity, and specificity were determined using IVF outcome as the "true test" and morphology as the experimental test (Table 1). World Health Organization criteria demonstrated a higher sensitivity and negative predictive value than strict criteria. Numerical values for positive predictive value and specificity also were higher for WHO than for strict criteria but did not achieve statistical significance.
Table 1  Statistical Comparison of World Health Organization and Strict Criteria Methodologies with Regard to IVF Outcome
                             WHO criteria (%)   Strict criteria (%)   P
Positive predictive value    83                 71                    NS*
Negative predictive value    68                 36                    0.003
Specificity                  62                 48                    NS
Sensitivity                  87                 61                    <0.0001
* NS, not significant.
IVF+ Versus IVF-
Histograms of morphology scores for IVF+ and IVF- groups demonstrated a shift towards higher scores with positive outcome for WHO criteria, which was less apparent with strict criteria (Fig. 1). Higher morphology scores were noted for the IVF+ group compared with IVF-. Values for WHO were 49.8 ± 1.8 versus 36.3 ± 2.3 (P < 0.0001). Strict criteria scores were 6.2 ± 0.6 for IVF+ and 4.0 ± 0.5 for IVF- (P = 0.04). The IVF+ group also demonstrated higher sperm concentration (71.0 ± 3.1 versus 45.1 ± 4.5 × 10^6/mL; P < 0.0001) and motility (62.4% ± 1.5% versus 55.1% ± 2.8%; P = 0.02) than the IVF- group.
Figure 1  Morphology scores and fertilization. Samples are grouped according to the percentage of normal forms. (A), WHO methodology. (B), Strict criteria methodology. Shaded bars, cycles with fertilization; open bars, cycles without fertilization.
Samples with "Adequate" Motility and Concentration
The original reports describing strict criteria were applied to samples with "adequate" motility and concentration, defined as a concentration of ≥20 × 10^6/mL and motility of ≥30% (1, 2). In this study there was an overall IVF+ rate of 74% for those samples meeting these criteria. The positive predictive value was similar for WHO and strict criteria (83% versus 75%, respectively); however, the negative predictive value was higher for WHO (58% versus 27%, P = 0.01). The IVF+ group demonstrated higher morphology scores than IVF- by both WHO (50.3 versus 40.1, P < 0.0001) and strict criteria (6.4 versus 4.3, P < 0.05). Mean sperm concentration in this selected population was higher for the IVF+ group than IVF- (72.9 versus 54.7, P < 0.01), but no significant difference was noted for motility (63.5 versus 59.6, P > 0.05).
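The predictive values, sensitivity, and specificity reported in Table 1 follow from a two-way table of morphology category against IVF outcome. The sketch below shows that arithmetic in Python. The cell counts are not reported in the paper; they are an assumption, back-calculated from the reported percentages (90 of 132 couples with fertilization, 71% normal by WHO at ≥40%, 58% normal by strict criteria at ≥4%, and the fertilization rates above and below each threshold). The names `TwoWayTable`, `diagnostic_measures`, `WHO_TABLE`, and `STRICT_TABLE` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoWayTable:
    """Morphology category (rows) against IVF outcome (columns)."""
    normal_fertilized: int      # normal morphology, IVF+
    normal_unfertilized: int    # normal morphology, IVF-
    abnormal_fertilized: int    # abnormal morphology, IVF+
    abnormal_unfertilized: int  # abnormal morphology, IVF-


def diagnostic_measures(t: TwoWayTable) -> dict:
    """Return PPV, NPV, sensitivity, and specificity as percentages,
    treating IVF outcome as the reference ("true") test."""
    tp, fp = t.normal_fertilized, t.normal_unfertilized
    fn, tn = t.abnormal_fertilized, t.abnormal_unfertilized
    return {
        "ppv": 100 * tp / (tp + fp),          # fertilization given normal morphology
        "npv": 100 * tn / (tn + fn),          # no fertilization given abnormal morphology
        "sensitivity": 100 * tp / (tp + fn),  # normal morphology given fertilization
        "specificity": 100 * tn / (tn + fp),  # abnormal morphology given no fertilization
    }


# ASSUMPTION: hypothetical counts reconstructed from the reported percentages;
# the paper itself reports percentages only.
WHO_TABLE = TwoWayTable(78, 16, 12, 26)     # threshold >=40% normal forms
STRICT_TABLE = TwoWayTable(55, 22, 35, 20)  # threshold >=4% normal forms

if __name__ == "__main__":
    for name, table in (("WHO", WHO_TABLE), ("Strict", STRICT_TABLE)):
        rounded = {k: round(v) for k, v in diagnostic_measures(table).items()}
        print(name, rounded)
```

Under these assumed counts the rounded percentages agree with Table 1 for both methods, which suggests the reconstruction is at least consistent with the reported figures.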
Abnormal Morphology with Normal Concentration and Motility
There were 95 of 132 men with sperm concentration ≥20 × 10^6/mL and motility ≥50%, classified as normal parameters by the World Health Organization (4, 5). The overall fertilization rate for this group was 77%. Abnormal sperm morphology was identified in 17 of these men by WHO criteria and 33 men by strict criteria. Samples with an abnormality limited to morphology would be expected to demonstrate a low fertilization rate if abnormal morphology were indeed a prognostic factor independent of concentration and motility. This was true for WHO criteria, with a 41% fertilization rate (Fisher's exact, P = 0.0065), but not for strict criteria, which demonstrated a fertilization rate of 79%.
Number of Fertilized Eggs
The number of fertilized eggs per couple ranged from 0 to 18 with a mean of 3.1. The correlation between the number of fertilized eggs per couple and morphology was 0.364 for WHO criteria and 0.330 for strict criteria. World Health Organization (P = 0.036) and strict criteria (P = 0.038) morphology were both identified by multiple and stepwise regression to be independent factors at the 0.05 level of significance for number of fertilized eggs, whereas sperm concentration and motility were not. Samples with normal morphology by WHO were associated with a greater number of fertilized eggs than samples with abnormal morphology (4.04 versus 0.84, P < 0.0001). The same relationship was noted for strict criteria, although the difference in the number of eggs was less marked (3.82 versus 2.15, P = 0.02). Figure 2 depicts a scattergram of the number of fertilized eggs and morphology scores. Of particular note is that samples with <40% normal forms by WHO criteria fertilized only two or fewer eggs, with only one exception. Scores <30% were associated with fertilization of only one egg or no fertilization at all. Low strict criteria scores were not amenable to similar observations. For example, up to five fertilized eggs were achieved for the lowest possible strict criteria score of 0% normal forms.
Figure 2  Scattergram of morphology scores (percent normal forms) versus number of fertilized eggs. (A), Strict criteria. (B), WHO criteria.
DISCUSSION
In 1986 and 1988 Kruger et al. (1, 2) published reports indicating that a modification of the WHO morphology assessment of sperm, termed strict criteria, had considerable prognostic value for IVF outcome. However, no data were presented to demonstrate that this new methodology represented an improvement over traditional WHO morphology. Enginsu et al. (13, 14) reported that strict criteria was superior to WHO in a study of a variety of sperm tests, but no provision was made to address the potential bias of a single individual subjectively scoring "normal" sperm according to two techniques. The present study was undertaken to overcome this limitation and to provide comparative information on sperm morphology for those involved in the fertility assessment of men.
In this study a direct comparison of WHO and strict criteria was performed by using the same slides for review by both methodologies. Technicians from different laboratories scored the percentage of normal forms according to their own training and clinical practice in either WHO or strict criteria. Each technician was blinded to the results of the other morphology score and to the results of IVF. Specific training in strict criteria was obtained by personal instruction from the original author of this technique. To ensure that no systematic bias towards excessive "strictness" was demonstrated by the strict criteria technician, 20 of the original slides subsequently were reviewed by another similarly trained technician. No such bias was found. Further, the use of a consecutive series of couples strengthened the study by investigating the performances of WHO and strict criteria in a population with a variety of infertility etiologies, including various degrees of male and female factors. On the other hand, the potential weakness of such a study design is that results for subgroups may be lost in the results for the group as a whole.
The major finding of this study is that strict criteria failed to demonstrate any superiority over classic WHO morphology with regard to IVF results. On the contrary, WHO outperformed strict criteria in several comparisons. Fertilization rates for samples with abnormal morphology by WHO criteria were significantly lower than for the group overall, whereas samples with abnormal strict criteria scores did not differ significantly from the overall fertilization rate. The negative predictive value and the sensitivity of WHO were greater than those of strict criteria.
The specificity and positive predictive value were also numerically greater for WHO than strict criteria but did not achieve statistical significance. These results were observed for the overall group, as well as for the subpopulation with "adequate" concentration and motility, which formed the study group in the initial reports by Kruger et al. (1, 2). Finally, among samples with normal sperm concentration and motility, abnormal WHO was associated with reduced fertilization, whereas abnormal strict criteria was not. This last observation suggests that morphology as determined by WHO can serve as an independent factor with regard to IVF outcome.
A second end point for analysis was the association of morphology and the number of fertilized eggs. In this area WHO criteria also was superior to strict criteria. Comparison of samples with normal and abnormal WHO criteria revealed a difference of more than three fertilized eggs, whereas the difference for strict criteria was only half as great. Low WHO scores were associated with two or fewer fertilized eggs, whereas up to five fertilized eggs were noted for even the lowest strict criteria scores.
A question of paramount importance to patients and clinicians alike is the prognostic value of a specific morphology score for IVF. For WHO there appears to be a clear relationship between morphology score and IVF outcome, with a statistical break at 40%. Fertilization was achieved in 83% of samples with scores above or equal to this value, whereas only 32% of samples with scores <40% were able to fertilize. Scores at the highest and lowest extremes yielded higher and lower fertilization rates, respectively. For example, scores <20% had a fertilization rate of only 17%, whereas scores ≥65% had a fertilization rate of 90%. Further, the number of fertilized eggs was reduced when WHO morphology scores were <40%.
For strict criteria the prognostic value of a given score is less clear. Although high scores (>14%) were associated with good IVF outcomes, there was no breakpoint below which IVF success became remote. Using the lowest possible score as an illustration, there were nine fertilizations among 14 samples with 0% normal forms, for a fertilization rate of 64%. Additionally, there was no strict criteria score below which the number of fertilized eggs was likely to decrease. In effect, regardless of what threshold for normal was used, strict criteria categorized a large proportion of samples as abnormal despite reasonable fertilization rates. This failure of low strict criteria scores to predict diminished reproductive success has also been reported by Check et al. (16), who found that strict criteria scores ≤4 were not associated with reduced pregnancy rates in either retrospective or prospective non-IVF studies.
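The contrast at the WHO threshold of 40% (fertilization in 83% of samples at or above it versus 32% below it) is the kind of two-way comparison for which the Methods name Fisher's exact test. The following is a minimal Python sketch of a two-sided Fisher's exact test, not the StatView routine used in the study. The counts passed to it are an assumption, back-calculated from the reported percentages (about 78 of 94 versus 12 of 38), and the function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (small relative tolerance guards against floating-point ties)
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p)


# ASSUMPTION: hypothetical counts reconstructed from reported percentages.
# Rows: WHO >=40% and WHO <40%; columns: fertilization, no fertilization.
p_value = fisher_exact_two_sided(78, 16, 12, 26)
print(f"P = {p_value:.2e}")
```

With these assumed counts the P value falls well below 0.0001, in line with the significance level reported for WHO scores <40%. Because the counts are reconstructed, the exact value should not be read as a figure from the study.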
The fertilization rate of 64% for samples with 0% normal forms by strict criteria merits comment. Clearly, this particular strict criteria score need not be considered indicative of a hopeless IVF attempt. In this study, a WHO score of <40% offered a much worse prognosis than 0% by strict criteria. For clinicians responsible for interpreting results to patients it is important to emphasize that 0% by strict criteria does not necessarily signify that no normal sperm are present: it signifies only that no sperm meeting a set of criteria were observed in the sample. Clearly, some sperm with "abnormal" morphological characteristics by strict criteria must possess the capacity to fertilize eggs.
The major limitation of this study is that sperm morphology assessment is a subjective test, and the absence of uniform standards renders a study such as this vulnerable to the criticism that morphology assessment may have been performed improperly. In this study, the strong association between WHO scores and IVF outcomes argues ipso facto that WHO scoring was well done. The weaker association of strict criteria with IVF outcomes makes those results more suspect. Further, strict criteria scores in this study were lower than those reported previously (1-3), raising the concern that excessive "strictness" may have been applied during morphology assessment. However, considerable efforts were made to obtain reliable strict criteria performance, including personal training with the original author of strict criteria, careful adherence to proper technique, and independent review of a subset of slides by another technician at a later date. The lower strict criteria scores also can be explained in part by the inclusion in this study of consecutive couples regardless of semen quality, whereas other studies restricted analysis to samples with ≥20 × 10^6 sperm/mL and ≥30% motility (1, 2). Thus, a lower score distribution would be anticipated for this study. It thus seems unlikely that the observed differences between strict and WHO criteria could be attributed entirely to subjective error.
Although the determination of the relative merits of WHO and strict criteria will require further investigation, an unavoidable conclusion of this study is that WHO morphology can be a vigorous predictor of IVF outcome. Whereas some prior publications have supported this view (6-8), other studies have revealed no relationship between WHO morphology and IVF outcome (9-12). It may well be that the more recent emphasis on sperm morphology assessment has led directly to improved results due to greater quality performance of this test. Older negative studies generally compared several semen parameters, and results of morphology assessment may have suffered from a prevailing belief that this test was unreliable.
Thus, the renewed interest in morphology because of the advent of strict criteria may have contributed directly to the strong results of WHO criteria in this study. Given the strong association between WHO criteria and IVF outcome, further investigation of WHO morphology and male fertility may be clinically rewarding.
REFERENCES
1. Kruger TF, Menkveld R, Stander FSH, Lombard CJ, Van der Merwe JP, van Zyl JA, et al. Sperm morphologic features as a prognostic factor in in vitro fertilization. Fertil Steril 1986;46:1118-23.
2. Kruger TF, Acosta AA, Simmons KF, Swanson RJ, Matta JF, Oehninger S. Predictive value of abnormal sperm morphology in in vitro fertilization. Fertil Steril 1988;49:112-7.
3. Grow DR, Oehninger S, Seltman HJ, Toner JP, Swanson RJ, Kruger TF, et al. Sperm morphology as diagnosed by strict criteria: probing the impact of teratozoospermia on fertilization rate and pregnancy outcome in a large in vitro fertilization population. Fertil Steril 1994;62:559-67.
4. World Health Organization. WHO laboratory manual for the examination of human semen and sperm-cervical mucus interaction. 2nd ed. Cambridge: The Press Syndicate of the University of Cambridge, 1987.
5. World Health Organization. WHO laboratory manual for the examination of human semen and sperm-cervical mucus interaction. 3rd ed. New York: Cambridge University Press, 1993.
6. Liu DY, Du Plessis YP, Nayudu PL, Johnston WI, Baker HWG. The use of in vitro fertilization to evaluate putative tests of human sperm function. Fertil Steril 1988;49:272-7.
7. Rogers J, Bentwood BJ, van Campen H, Helmbrecht G, Soderdahl D, Hale RW. Sperm morphology assessment as an indicator of human fertilizing capacity. J Androl 1983;4:119-25.
8. Chan SYW, Chan STH, Ho PC, So WWK, Chan YF, Ma HK. Predictive value of sperm morphology and movement characteristics in the outcome of in vitro fertilization of human oocytes. J In Vitro Fert Embryo Transf 1989;6:142-7.
9. Alper MM, Lee GS, Seibel MM, Smith D, Oskowitz SP, Ransil BJ, et al. The relationship of semen parameters to fertilization in patients participating in a program of in vitro fertilization. J In Vitro Fert Embryo Transf 1985;2:217-23.
10. Rosenborg L, Gustafson O, Lunell NO, Nylund L, Pousette A, Slotte H, et al. Morphology of seminal and swim-up spermatozoa and the outcome of in vitro fertilization and embryo transfer. Andrologia 1990;22:369-75.
11. Check JH, Bollendorf A, Press M, Blue T. Standard sperm morphology as a predictor of male fertility potential. Arch Androl 1992;28:39-41.
12. Zaini A, Jennings MG, Baker HWG. Are conventional sperm morphology and motility assessments of predictive value in subfertile men? Int J Androl 1985;8:427-35.
13. Enginsu ME, Dumoulin JCM, Pieters MHEC, Bras M, Evers JLH, Geraedts JPM. Evaluation of human sperm morphology using strict criteria after Diff-Quik staining: correlation of morphology with fertilization in vitro. Hum Reprod 1991;6:854-8.
14. Enginsu ME, Dumoulin JCM, Pieters MHEC, Bergers M, Evers JLH, Geraedts JPM. Comparison between the hypoosmotic swelling test and morphology evaluation using strict criteria in predicting in vivo fertilization. J Assist Reprod Genet 1992;9:259-64.
15. Alper MM, Seibel MM, Oskowitz SP, Taymor ML. Comparison of follicular fluid hormones in patients with one or two ovaries participating in a program of in vitro fertilization. Fertil Steril 1987;48:94-7.
16. Check JH, Adelson HG, Schubert BR, Bollendorf A. Evaluation of sperm morphology using Kruger's strict criteria. Arch Androl 1992;28:15-7.