A Comparison of Models for Predicting Sperm Retrieval Before Microdissection Testicular Sperm Extraction in Men with Nonobstructive Azoospermia


Ranjith Ramasamy, Wendy O. Padilla, E. Charles Osterberg, Abhishek Srivastava, Jennifer E. Reifsnyder, Craig Niederberger* and Peter N. Schlegel†

From the Departments of Urology, New York-Presbyterian Hospital, Weill Cornell Medical College, New York, New York, and University of Illinois at Chicago (WOP, CN), Chicago, Illinois

Abbreviations and Acronyms
ANN = artificial neural network
FSH = follicle-stimulating hormone
LR = logistic regression
micro-TESE = microdissection TESE
NOA = nonobstructive azoospermia
NPV = negative predictive value
PPV = positive predictive value
TESE = testicular sperm extraction

Accepted for publication August 7, 2012.
Study received Weill Cornell Medical College institutional review board approval.
* Financial interest and/or other relationship with NexHand, Global Advanced Medical Care, American Society for Reproductive Medicine and American Urological Association.
† Correspondence: Department of Urology, 525 East 68th St., Starr 900, New York, New York 10065 (telephone: 212-746-5491; FAX: 212-746-8425; e-mail: [email protected]).

Purpose: We developed an artificial neural network and a nomogram that use readily available preoperative clinical parameters to model the chance of identifying sperm with microdissection testicular sperm extraction in men with nonobstructive azoospermia.

Materials and Methods: We reviewed the records of 1,026 men who underwent microdissection testicular sperm extraction. Patient age, follicle-stimulating hormone level, testicular volume, history of cryptorchidism, Klinefelter syndrome and presence of varicocele were included in the models. For the artificial neural network the data set was divided randomly into a training set (75%) and a test set (25%) with n1/n2 cross validation used to evaluate model accuracy, and then modeled with a neural computational system. In addition, a nomogram with calibration plots was developed to predict sperm retrieval with microdissection testicular sperm extraction. We compared these models to logistic regression.

Results: The ROC area for the neural computational system in the test set was 0.641. The neural network correctly predicted the outcome in 152 of the 256 test set patients (59.4%). The nomogram AUC was 0.59 and the nomogram was adequately calibrated. Multivariable logistic regression demonstrated patient age, history of Klinefelter syndrome and cryptorchidism to be significant predictors of sperm retrieval (p <0.05). However, follicle-stimulating hormone and testicular volume were not significant by internal validation.

Conclusions: We modeled a combination of well described preoperative clinical parameters to predict sperm retrieval using a neural computational system and nomogram with acceptable predictive values. The generalizability of these findings requires external validation.

Key Words: testis, spermatozoa, neural networks (computer), sperm retrieval, nomograms


0022-5347/13/1892-0638/0 THE JOURNAL OF UROLOGY® (www.jurology.com) © 2013 by AMERICAN UROLOGICAL ASSOCIATION EDUCATION AND RESEARCH, INC. Vol. 189, 638-642, February 2013. Printed in U.S.A. http://dx.doi.org/10.1016/j.juro.2012.09.038

MICRO-TESE has become a recognized procedure for men with NOA. Simultaneous TESE-intracytoplasmic sperm injection cycles expose a couple to an emotional and financial burden, so it would be beneficial to predict the success of sperm retrieval before attempted treatment. Various studies have focused on predicting the presence of spermatozoa in the testis using serum FSH and inhibin-B levels, previous testicular histology or fine needle aspiration.1–4 The results indicate that most of these methods have low sensitivity or specificity, or that an added invasive approach is required to achieve high predictive value. Still, it remains difficult for surgeons to accurately counsel men with NOA on the chance of successful sperm retrieval with micro-TESE. Furthermore, due to the nonlinear relation of each of these variables with successful sperm retrieval, it is difficult to reliably predict an individual patient outcome based on linear analysis.

In this study we investigated the possibility of predicting the presence of spermatozoa in the testes of men with NOA using an ANN5 and a nomogram. We developed both models to predict the chance of identifying sperm with micro-TESE using clinical features, including patient age, testis volume and FSH levels. ANNs are nonlinear, computational mathematical models for information processing with architecture inspired by neuronal organizational biology. ANNs are not novel to urological diagnosis and treatment. In particular, research interest has centered on several subspecialty areas, including male infertility6 and uro-oncology.7–9 We compared the performance of this ANN and nomogram with that of a conventional LR model. To our knowledge this study presents the first use of models to predict the likelihood of successful sperm retrieval with micro-TESE, which may prove to be a useful guide in patient counseling.

METHODS

Patients

The study group consisted of 1,026 men with NOA who had undergone their first micro-TESE. All procedures were performed by a single urologist during an 11-year period (1999 to 2010). Azoospermia was confirmed by analysis of 2 specimens according to WHO guidelines.10 An additional semen sample was confirmed to be azoospermic on the day of the planned TESE. Testicular volume was measured by physical examination using an orchidometer and the volume of the larger testis was used for analysis. Karyotype analysis and Y chromosome microdeletion testing were performed on all patients. Patients with AZFa and AZFb microdeletions did not undergo micro-TESE. The study protocol was approved by the Weill Cornell Medical College institutional review board.

Artificial Neural Network

The data set was divided randomly into a training set of 770 patients (75%) and a test set of 256 (25%) with n1/n2 cross-validation used to evaluate model accuracy, and then modeled with a neural computational system (neUROn software environment, http://www.urocomp.org/).11 The output node represented the result of micro-TESE, which was encoded as a binary variable (sperm identified vs no sperm found). Each outcome was mapped with the corresponding patient characteristics, ie age, serum FSH and testicular volume.

The proportion of successful micro-TESE outcomes was constrained to be similar in the 2 sets using a randomization algorithm that maintained equal frequencies of outcomes. The test set was excluded from training and only used for cross validation (by the n1/n2 method). Multiple random sets of initial conditions (connection weights) were derived and the training set was applied iteratively to the neural computational system. When overlearning was observed by divergence of training and test set errors, hidden nodes were removed to reduce network topology. A 1-hidden node layer with 3 hidden nodes was determined to represent an optimal topology that maintained acceptable goodness of fit without overlearning. We considered the network to be trained to completion when the error was observed to oscillate at a local error minimum.

After a model with acceptable goodness of fit was achieved, reverse regression analysis using the Wilks generalized likelihood ratio test was performed to evaluate the statistical significance of each input variable. We computed ROC curve areas and compared them as needed. Mean values of the variables in the spermatozoa positive and spermatozoa negative groups were compared using the t test. The means of variables in the training, validation and test sets were compared by ANOVA. Cross tabulation of TESE results with the outputs of the ANN and LR models, respectively, was used to determine the sensitivity, specificity, PPV, NPV and diagnostic value of each model.
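The outcome-balanced 75%/25% division described above can be illustrated with a minimal sketch. This is not the neUROn randomization algorithm, whose details are not given here. The function name `stratified_split`, the seed and the outcome counts are assumptions chosen only so that the cohort totals 1,026.

```python
import random


def stratified_split(outcomes, train_fraction=0.75, seed=0):
    """Return (train_idx, test_idx) with similar outcome proportions in both sets."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    # Split each outcome class separately so its frequency is preserved.
    for label in sorted(set(outcomes)):
        members = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical cohort: 1 = sperm identified, 0 = no sperm found.
# The counts below are illustrative only and are not study data.
outcomes = [1] * 560 + [0] * 466
train_idx, test_idx = stratified_split(outcomes)
train_rate = sum(outcomes[i] for i in train_idx) / len(train_idx)
test_rate = sum(outcomes[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(train_rate, 3), round(test_rate, 3))
```

Shuffling within each outcome class, rather than across the whole cohort, is what keeps the retrieval rate nearly identical in the training and test sets.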

Logistic Regression

The presence of spermatozoa was also predicted from clinical and laboratory data by LR analysis. LR calculations were performed using STATA®, version 11.0. Data on patients in the training set (770) and the test set (256) were combined to produce the probability equation in the LR model. The binary result in the LR model was the presence or absence of spermatozoa during the TESE procedure.
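As a hedged illustration of how an LR probability equation of this kind is evaluated, the sketch below converts the multivariable odds ratios reported in table 2 into coefficients (the natural log of each odds ratio) and applies the logistic function. The intercept is not reported in this paper, so `HYPOTHETICAL_INTERCEPT` is a placeholder. The predictor names, the per-unit scaling of age, FSH and testis volume, and the example patient are also assumptions. The resulting probabilities are therefore not clinically meaningful.

```python
import math

# Multivariable odds ratios from table 2; each LR coefficient is ln(OR).
ODDS_RATIOS = {
    "age_years": 1.02,
    "fsh": 0.99,
    "testis_volume": 0.98,
    "klinefelter": 3.07,      # 1 = yes, 0 = no
    "varicocele": 1.18,       # 1 = yes, 0 = no
    "cryptorchidism": 2.29,   # 1 = yes, 0 = no
}
COEFFICIENTS = {name: math.log(ratio) for name, ratio in ODDS_RATIOS.items()}

# Assumption: the intercept is not reported, so this value is a placeholder.
HYPOTHETICAL_INTERCEPT = -0.5


def retrieval_probability(patient, intercept=HYPOTHETICAL_INTERCEPT):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(coef * value))))."""
    linear = intercept + sum(COEFFICIENTS[k] * patient[k] for k in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical patient, shown with and without Klinefelter syndrome.
base = {"age_years": 35, "fsh": 20.0, "testis_volume": 8.0,
        "klinefelter": 0, "varicocele": 0, "cryptorchidism": 0}
with_klinefelter = dict(base, klinefelter=1)
p_base = retrieval_probability(base)
p_klinefelter = retrieval_probability(with_klinefelter)
print(round(p_base, 3), round(p_klinefelter, 3))
```

Whatever intercept is chosen, the ratio of the odds for the two example patients equals the Klinefelter odds ratio of 3.07, which is the sense in which an odds ratio summarizes a single predictor.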

Nomogram

LR coefficients of univariable predictors of sperm retrieval were then used to generate a nomogram. The predictive accuracy of all variables and the nomogram were quantified with AUC estimates derived from ROCs. The model was internally validated using 200 bootstrap resamples.12 This method produced a relatively unbiased measure of the ability of the nomogram to discriminate among patients, as quantified by the concordance index, and was accompanied by a calibration plot, which explores the performance characteristics of the nomogram through the predictive range. The concordance index is similar to measuring the area under the ROC curve. It represents the probability that the model will correctly assign a higher probability to a patient with a successful outcome compared to one with an unsuccessful outcome. All tests were 2 sided with statistical significance considered at p <0.05. Analyses were done using R, version 2.12.2 (R Foundation for Statistical Computing, Vienna, Austria).

Table 1. ROC areas in ANN test set

Method       ROC Area   Wicken p Value*
ANN          0.64       0.9454706
LR           0.59       0.9161109
Age          0.51       0.978252
Serum FSH    0.54       0.994845

* Closer to 1 implies greater significance.

Table 2. Multivariable LR analysis of sperm identification predictors at micro-TESE

Predictor                          Multivariable OR (95% CI)   p Value
Male age                           1.02 (1.00–1.04)            0.01
FSH                                0.99 (0.98–1.00)            0.66
Testis vol                         0.98 (0.96–1.01)            0.48
Klinefelter syndrome (yes vs no)   3.07 (1.84–5.03)            <0.001
Varicocele (yes vs no)             1.18 (0.85–1.65)            0.30
Cryptorchidism (yes vs no)         2.29 (1.47–3.57)            <0.001
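The concordance index defined in the Nomogram section can be computed directly from its pairwise definition, as in the minimal sketch below. The bootstrap shown here only resamples a fixed set of predictions 200 times to show how much the index varies. It does not refit the model in each resample, so it is a simpler technique than the optimism-corrected internal validation cited above. The function names and the 6 example patients are hypothetical.

```python
import random


def concordance_index(probabilities, outcomes):
    """Share of (success, failure) pairs in which the success received the
    higher predicted probability; ties count as half a concordant pair."""
    successes = [p for p, y in zip(probabilities, outcomes) if y == 1]
    failures = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not successes or not failures:
        raise ValueError("need at least one patient with each outcome")
    score = 0.0
    for p in successes:
        for q in failures:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(successes) * len(failures))


def bootstrap_c_index(probabilities, outcomes, resamples=200, seed=0):
    """Concordance index in each of `resamples` bootstrap samples of patients."""
    rng = random.Random(seed)
    n = len(outcomes)
    values = []
    while len(values) < resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one outcome
            values.append(concordance_index([probabilities[i] for i in idx], ys))
    return values


# Hypothetical predicted probabilities and observed outcomes (1 = sperm found).
probabilities = [0.8, 0.6, 0.55, 0.4, 0.7, 0.3]
outcomes = [1, 1, 0, 1, 0, 0]
c_index = concordance_index(probabilities, outcomes)
resampled = bootstrap_c_index(probabilities, outcomes)
print(round(c_index, 3), round(min(resampled), 3), round(max(resampled), 3))
```

In this example 6 of the 9 success-failure pairs are concordant, so the index is 6/9, or about 0.67. A value of 0.5 corresponds to chance discrimination.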

RESULTS

The ROC area for the neural computational system in the test set was 0.641 (table 1), a higher goodness of fit than that of the LR model. After acceptable goodness of fit was observed (the ROC area for the neural computational system in the test set was >0.6), reverse regression based on the Wilks generalized likelihood ratio test was performed. This test demonstrated that age, FSH level, history of cryptorchidism and a diagnosis of Klinefelter syndrome were significant to the model (p <0.05) but testicular volume was not (p = 0.12). The neural network correctly predicted the outcome in 152 of the 256 test set patients (59.4%).

Similar to the ANN, multivariable LR demonstrated that age, Klinefelter syndrome and history of cryptorchidism were significant (p <0.05). However, FSH, presence of varicocele and testicular volume were not (table 2).

With the addition of data (age, duration of infertility, serum hormone levels and testicular volume) from the test group, the new ANN was able to identify patients with or without spermatozoa in the testis with 67% sensitivity, 49.5% specificity, 63.9% PPV, 52% NPV and 60.8% diagnostic accuracy. We used the standard LR model to generate the linear prediction formula. When we applied our test group data, this standardized method was 85% sensitive and 21% specific, and had a PPV of 59%, an NPV of 52.4% and a diagnostic value of 57.7%. These diagnostic value results indicate that the ANN (64%) is superior to LR (59.7%) for predicting the presence of spermatozoa.

Figure 1 shows a nomogram predicting the probability of successful sperm retrieval in an individual with micro-TESE. In internal validation the nomogram accuracy was 59.6%. Calibration was adequate for the nomogram (fig. 2). The factors in the nomogram showing the greatest effect on successful sperm retrieval were history of cryptorchidism and Klinefelter syndrome.

Figure 1. Nomogram to predict successful sperm retrieval after micro-TESE. To use nomogram, plot patient value of given parameter on appropriate scale and draw vertical line up to point line at top to assign associated point score. Repeat process for each parameter and sum values to obtain total points score. Plot total points score on total point line and draw vertical line down to bottom line.
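Sensitivity, specificity, PPV, NPV and diagnostic accuracy all follow from a 2 x 2 cross tabulation of predicted vs observed micro-TESE outcome. The sketch below shows the arithmetic. The cell counts passed to `diagnostic_summary` are hypothetical, since the cross tabulations are not reported here. Only the final line uses reported figures, reproducing the 152 of 256 (59.4%) correct predictions.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Diagnostic measures from the four cells of a 2 x 2 cross tabulation.

    tp: sperm predicted and found       fp: sperm predicted, none found
    fn: no sperm predicted, but found   tn: no sperm predicted, none found
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }


# Hypothetical cell counts for a 256-patient test set (not study data).
example = diagnostic_summary(tp=90, fp=50, fn=40, tn=76)
for name, value in example.items():
    print(f"{name}: {100 * value:.1f}%")

# Reported result: 152 of the 256 test set patients were predicted correctly.
reported_accuracy = 152 / 256
print(f"reported accuracy: {100 * reported_accuracy:.1f}%")
```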

DISCUSSION

The ability to predict micro-TESE outcome as part of the diagnostic assessment of patients with NOA would allow the urologist to carefully select those patients who are suited for TESE or donor insemination procedures. In the current study we built a well conditioned neural computational model and nomogram13–16 to predict sperm identification using clinical features, ie patient age, testicular volume, serum FSH, history of cryptorchidism, Klinefelter syndrome and varicocele. The ANN was cross validated using a test set that was not used in its construction. We found it to be more accurate than the linear statistical modeling methods of LR.

A number of investigations in the literature have focused on predicting spermatozoa presence in the testis.17–20 Some groups that have experimented with prediction by noninvasive methods have reported low diagnostic value. On the other hand, the methods reported to have high diagnostic value require surgical intervention, namely testis biopsy and histopathological examination, to accurately predict the results of subsequent biopsies.4 Routine diagnostic biopsies are not recommended before undergoing micro-TESE since the data from this invasive test coupled with clinical features only predicted sperm retrieval in 74% of the patients in our series.

Based on the nomogram and ANN model, men with a history of Klinefelter syndrome and cryptorchidism seem to have a favorable chance of sperm retrieval with micro-TESE. On the other hand, the majority of the men who do not have these diagnoses have no reliable clinical parameters that can predict the chance of sperm identification. On the nomogram it is interesting to note that older age and higher FSH predict better chances of sperm retrieval. We have previously reported data demonstrating that higher FSH levels suggest a better chance of sperm retrieval.21

Although to our knowledge this is the largest series on TESE reported to date, our models were able to predict sperm retrieval accurately in only 59% to 64% of the cases. It is unclear whether more data would add to the accuracy of the prediction models. This certainly poses a challenge during preoperative counseling of couples planning to undergo micro-TESE along with intracytoplasmic sperm injection. Based on the imperfect ability of the models to predict the outcome of micro-TESE, it is difficult to convince a patient not to undergo this procedure because this is often his best option in trying to father a biological child. The predictive accuracy of the models is fair and cannot be used to definitively identify which patients will and will not have sperm during micro-TESE. As such, the models will need to be cautiously used to counsel patients regarding their chances of sperm retrieval. Noninvasive parameters, such as genetic tests and imaging modalities, may be used to enhance our ability to predict outcomes in the future.

Figure 2. Nomogram calibration by bootstrapping for model. Apparent line is calculated directly from data set. Bias corrected line is bootstrap corrected line, which uses bootstrap sampling to validate model. Line at 45-degree angle represents ideal nomogram reference line.

CONCLUSIONS

We modeled a combination of well described preoperative clinical parameters for predicting sperm retrieval using models with acceptable accuracy. These models highlight the observation that TESE results cannot be reliably predicted using only gross preoperative clinical parameters since the presence of focal areas of sperm production in the testis may not affect parameters such as testicular volume or FSH. Further investigation is warranted to look for better predictors.


REFERENCES

1. Bohring C, Schroeder-Printzen I, Weidner W et al: Serum levels of inhibin B and follicle-stimulating hormone may predict successful sperm retrieval in men with azoospermia who are undergoing testicular sperm extraction. Fertil Steril 2002; 78: 1195.
2. Brugo-Olmedo S, De Vincentiis S, Calamera JC et al: Serum inhibin B may be a reliable marker of the presence of testicular spermatozoa in patients with nonobstructive azoospermia. Fertil Steril 2001; 76: 1124.
3. Ezeh UI, Taub NA, Moore HD et al: Establishment of predictive variables associated with testicular sperm retrieval in men with non-obstructive azoospermia. Hum Reprod 1999; 14: 1005.
4. Su LM, Palermo GD, Goldstein M et al: Testicular sperm extraction with intracytoplasmic sperm injection for nonobstructive azoospermia: testicular histology can predict success of sperm retrieval. J Urol 1999; 161: 112.
5. Niederberger C: Computational tools for the modern andrologist. J Androl 1996; 17: 462.
6. Samli MM and Dogan I: An artificial neural network for predicting the presence of spermatozoa in the testes of men with nonobstructive azoospermia. J Urol 2004; 171: 2354.
7. Anagnostou T, Remzi M and Djavan B: Artificial neural networks for decision-making in urologic oncology. Rev Urol 2003; 5: 15.
8. Anagnostou T, Remzi M, Lykourinas M et al: Artificial neural networks for decision-making in urologic oncology. Eur Urol 2003; 43: 596.
9. Zlotta AR, Remzi M, Snow PB et al: An artificial neural network for prostate cancer staging when serum prostate specific antigen is 10 ng./ml. or less. J Urol 2003; 169: 1724.
10. World Health Organization: WHO Laboratory Manual for the Examination of Human Semen and Sperm-Cervical Mucus Interaction, 3rd ed. Cambridge, United Kingdom: Cambridge University Press 1992.
11. Niederberger C: Neural computation in urology: an orientation. Mol Urol 2001; 5: 133.
12. Harrell FE Jr, Lee KL and Mark DB: Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996; 15: 361.
13. el-Mekresh M, Akl A, Mosbah A et al: Prediction of survival after radical cystectomy for invasive bladder carcinoma: risk group stratification, nomograms or artificial neural networks? J Urol 2009; 182: 466.
14. Stephan C, Buker N, Cammann H et al: Artificial neural network (ANN) velocity better identifies benign prostatic hyperplasia but not prostate cancer compared with PSA velocity. BMC Urol 2008; 8: 10.
15. Stephan C, Cammann H, Meyer HA et al: An artificial neural network for five different assay systems of prostate-specific antigen in prostate cancer diagnostics. BJU Int 2008; 102: 799.
16. Cummings JM, Boullier JA, Izenberg SD et al: Prediction of spontaneous ureteral calculous passage by an artificial neural network. J Urol 2000; 164: 326.
17. Mostafa T, Amer MK, Abdel-Malak G et al: Seminal plasma anti-Mullerian hormone level correlates with semen parameters but does not predict success of testicular sperm extraction (TESE). Asian J Androl 2007; 9: 265.
18. Tunc L, Kirac M, Gurocak S et al: Can serum Inhibin B and FSH levels, testicular histology and volume predict the outcome of testicular sperm extraction in patients with non-obstructive azoospermia? Int Urol Nephrol 2006; 38: 629.
19. Vernaeve V, Staessen C, Verheyen G et al: Can biological or clinical parameters predict testicular sperm recovery in 47,XXY Klinefelter’s syndrome patients? Hum Reprod 2004; 19: 1135.
20. Vernaeve V, Tournaye H, Schiettecatte J et al: Serum inhibin B cannot predict testicular sperm retrieval in patients with non-obstructive azoospermia. Hum Reprod 2002; 17: 971.
21. Ramasamy R, Lin K, Gosden LV et al: High serum FSH levels in men with nonobstructive azoospermia does not affect success of microdissection testicular sperm extraction. Fertil Steril 2009; 92: 590.