Testing diagnostic tests: Why size matters



TESTING DIAGNOSTIC TESTS: WHY SIZE MATTERS

Editorials. The Journal of Pediatrics, February 2005 (J Pediatr 2005;146:159-62). See related article, p 183. Reprint requests: Vicky A. LeGrys, DrA, Department of Allied Health Sciences, Division of Clinical Laboratory Science, CB #7145, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7145. 0022-3476/$ - see front matter. Copyright © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.jpeds.2004.10.054

For more than 50 years, quantitative determination of sweat chloride has been the gold standard in the diagnosis of cystic fibrosis (CF).1 Currently, the sweat chloride test is used to confirm or rule out a diagnosis of CF in two populations: neonates identified by newborn screening programs and patients presenting with clinical features suggestive of the disease. When performed correctly, the sweat chloride test is accurate and reliable, but it is labor intensive. In an effort to simplify testing, sweat conductivity methods have been developed. Conductivity is a nonspecific measurement of total anion activity in a solution, so conductivity results for sweat are higher than its chloride concentration. Sweat collected in Macroduct coils and transferred to the Sweat-Chek conductivity analyzer (Wescor Inc., Logan, Utah) has been used in some settings as a screening test for CF. Individuals whose conductivity result exceeds a prescribed cut-point are then referred to a CF care center for a confirmatory quantitative chloride measurement.2-5 Some have suggested that sweat conductivity performs as well as sweat chloride in diagnosing CF and could be used alone as a confirmatory test.6-8 Recently, a new point-of-care conductivity analyzer, Nanoduct (referred to here as the "new system"), has been developed for use especially in the neonatal population. As described in the accompanying article by Barben et al,9 the new system combines sweat collection and analysis in a single disposable conductivity sensor that uses 3 µL of sample. The sensor and readout provide conductivity results within 30 minutes. The potential advantages over the traditional sweat chloride test are ease of use and rapid availability of results.

The important question that remains is whether the new system is as diagnostically accurate as the quantitative sweat chloride test in discriminating between individuals with CF and healthy individuals. The article by Barben et al is the first published evaluation of this new instrument. In their study of 20 patients with classic CF, 73 patients referred for sweat testing, and 1 patient with nonclassic or borderline CF, Barben et al reported 100% sensitivity (95% CI, 83% to 100%) and 100% specificity (95% CI, 95% to 100%) for the new system compared with sweat chloride testing.

We cannot know the true sensitivity or specificity of a diagnostic test. We can only observe the results of studies with relatively small numbers of individuals. From this imperfect information, we conclude that the chances of the true sensitivity and specificity being worse than the lower 95% confidence limits of the observed estimates are slim enough that such values are deemed implausible.

Confidence intervals are often misunderstood, so we would like to be very clear about their interpretation. What does the lower confidence limit of 83% on the perfect sensitivity (100%) mean in this study of 20 patients with classic CF? The 2-sided 95% confidence limit indicates that the true sensitivity is unlikely to be less than 83% in this population. (If the true sensitivity of the new system were 83%, it could miss 17 of every 100 true CF cases.) If the true sensitivity were as poor as 83%, the probability of observing perfect sensitivity in a study of this size (n = 20) would be only 2.5% (Figure 1). Assuming each case is classified independently, this probability is simply the true sensitivity raised to the 20th power (0.83^20 ≈ 0.02). It corresponds to a P value of .025 for the difference between the observed sensitivity of 100% and a hypothetical value of 83%. If the true sensitivity were 91% (a 9% false-negative rate), the probability of observing perfect sensitivity in a study of this size would be 15% (0.91^20 ≈ 0.15; P = .15). Using the traditional α level of 0.05 for statistical significance, we would say that the observed sensitivity of 100% is not significantly different from a hypothetical true sensitivity of 91%. Based on this study, we would not be able to rule out the possibility that the new system has a false-negative rate as high as 9%.

Larger studies with n = 40 would observe perfect sensitivity only 2.5% of the time if the true sensitivity were 91%. Studies of 100 CF cases would essentially never report a sensitivity of 100% if the true sensitivity were 91%. In this study, the lower limit of the 2-sided 95% CI (83%) marks the boundary below which the result we observed (100% sensitivity) would occur no more than 2.5% of the time. Looking at the curves in Figure 1, we would dismiss the possibility that the true sensitivity is worse than 83%, 91%, or 96% on the basis of studies with 20, 40, or 100 CF cases, respectively, that observe a sensitivity of 100%. Increasing the size of the study narrows the range of plausible values for the true sensitivity of a diagnostic test.

This single study with 20 classic CF cases gives us some assurance that, if we were to test everyone with CF in the world, the new system would miss no more than 17 of every 100 classic CF cases. On the basis of their results, the authors suggest that the new system might be used to diagnose CF in place of sweat chloride testing, but the confidence limit reveals that the new system might miss an unacceptably high number of CF-affected individuals. Larger studies would characterize the sensitivity of the new system more precisely before a decision is made about whether it can replace sweat chloride testing. How big does a study need to be to provide conclusive evidence that a new diagnostic test with apparently perfect sensitivity and specificity is as good as the gold standard?

Figure 1. Probability of observing 100% sensitivity if the true sensitivity of a diagnostic test is between 75% and 100%, for a hypothetical study with a sample size of 20, 40, or 100 cases. Brackets indicate the lower limit of a 1-sided 97.5% CI for a study with an observed sensitivity of 100%.

Figure 2. Sample size required to demonstrate the minimum acceptable sensitivity of a new diagnostic test (Nanoduct sweat conductivity) relative to the gold standard (sweat chloride), given an observed sensitivity of 100%, a 1-sided α level of 0.025, and 80% power. Note the logarithmic scale on the y-axis.
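The probabilities and confidence limits discussed above follow from a simple binomial argument. The Python sketch below illustrates that argument; it is not the authors' own calculation, and the function names are hypothetical. It assumes each true CF case is classified independently with the same probability (the true sensitivity), so the chance that all n cases test positive is sensitivity ** n, and the exact (Clopper-Pearson) lower limit when all n cases are positive is alpha ** (1 / n).

```python
def prob_all_detected(true_sensitivity: float, n_cases: int) -> float:
    """Chance that every one of n_cases true CF cases tests positive, assuming
    each case is detected independently with probability true_sensitivity."""
    return true_sensitivity ** n_cases


def exact_lower_limit_all_positive(n_cases: int, alpha: float = 0.025) -> float:
    """Exact (Clopper-Pearson) 1-sided lower confidence limit for sensitivity
    when all n_cases test positive: the p that solves p ** n_cases == alpha.
    With alpha = 0.025 this equals the lower limit of a 2-sided 95% CI."""
    return alpha ** (1.0 / n_cases)


if __name__ == "__main__":
    for n in (20, 40, 100):
        limit = exact_lower_limit_all_positive(n)
        p_91 = prob_all_detected(0.91, n)
        print(f"n={n:3d}  lower limit={limit:.3f}  P(100% observed | true 91%)={p_91:.4f}")
```

For n = 20, 40, and 100 this gives lower limits of about 0.83, 0.91, and 0.96, in line with the brackets in Figure 1. With n = 73 it gives about 0.95, which matches the reported lower limit for specificity if, as an illustrative assumption, the 73 referred patients are taken as the non-CF group.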
Diagnostic test studies are very different from analytical validation studies.10,11 In analytical validation studies, we verify the manufacturer's stated performance for parameters such as precision, accuracy, and reportable range.12 In diagnostic studies, the fundamental issue is demonstrating noninferiority. We must decide a priori which values of sensitivity and specificity need to be ruled out for the new test to be considered no worse than the existing gold standard. The required sample size for a study of the new system's performance as a diagnostic test is determined by the minimum acceptable values for its sensitivity (Figure 2) and specificity relative to the gold standard. When the disease being diagnosed is uncommon, as is the case with CF, sample size is generally driven by the need to demonstrate high sensitivity.

To show that the new system has a sensitivity "no worse" than that of sweat chloride testing, we must first decide what proportion of CF cases the new system must identify to be considered equivalent (the minimum acceptable sensitivity). This value is then used to calculate the sample size by stipulating that the lower confidence limit for the observed sensitivity can be no lower than it. Using a very strict criterion of 99.5% as the minimum acceptable sensitivity, we would require a study of 9688 individuals referred for sweat testing (approximately 775 of whom we would expect to have CF, given a prevalence of 8%) to demonstrate that the new system has fewer than 5 in 1000 false-negative results (Figure 2). This assumes that the observed sensitivity is 100%, the same as that reported by Barben et al. Enrolling 9688 individuals in such a study would be very costly and challenging. Next, consider a more liberal criterion. With a minimum acceptable sensitivity of 95%, we would require a study of 925 individuals referred for sweat testing (approximately 74 of whom we would expect to have CF) to demonstrate that the new system has fewer than 5 in 100 false-negative results (Figure 2). Although this is within the realm of feasibility, it would still be a multicenter, multiyear study.
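The sample-size logic above can be sketched the same way. This hypothetical Python helper applies a simplified criterion that ignores power: it finds the smallest number of CF cases for which an observed sensitivity of 100% gives an exact 1-sided lower limit (α = 0.025) at or above the chosen minimum acceptable sensitivity, then divides by the 8% prevalence to estimate referrals. It yields 736 cases (9200 referrals) for 99.5% and 72 cases (900 referrals) for 95%. These are somewhat fewer than the 775 and 74 cases in Figure 2, which also incorporates 80% power by a method not detailed here, so treat this as an illustration of the reasoning rather than a reproduction of the figure.

```python
import math
from fractions import Fraction


def min_cases_for_lower_limit(min_sensitivity: float, alpha: float = 0.025) -> int:
    """Smallest n such that, if all n CF cases test positive, the exact
    1-sided lower confidence limit alpha ** (1 / n) is >= min_sensitivity.
    Equivalent to requiring min_sensitivity ** n <= alpha."""
    n = math.ceil(math.log(alpha) / math.log(min_sensitivity))
    # Nudge n to the exact boundary in case floating-point rounding
    # pushed the logarithm ratio across an integer.
    while min_sensitivity ** n > alpha:
        n += 1
    while n > 1 and min_sensitivity ** (n - 1) <= alpha:
        n -= 1
    return n


def referrals_needed(n_cases: int, prevalence: Fraction) -> int:
    """Referred individuals needed to expect n_cases CF cases at this prevalence."""
    return math.ceil(n_cases / prevalence)


if __name__ == "__main__":
    prevalence = Fraction(8, 100)  # 8% of referrals have CF, as in the editorial
    for min_sens in (0.995, 0.95):
        cases = min_cases_for_lower_limit(min_sens)
        print(f"minimum sensitivity {min_sens:.1%}: {cases} CF cases, "
              f"about {referrals_needed(cases, prevalence)} referrals")
```

Keeping the prevalence as a Fraction makes the division exact, so a float artifact such as 900.0000000001 cannot round the referral count up by one.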
Even if the new system performed perfectly in such a study, the confidence interval would not allow us to exclude the possibility that it would incorrectly classify as negative or borderline up to 5 of every 100 individuals with CF. The required sample size depends heavily on the minimum acceptable sensitivity that one selects. This cut-point needs to be chosen on the basis of a thorough understanding of the costs of both tests and the risks and benefits of correctly and incorrectly diagnosing individuals who are referred for testing. Armed with this information, a cost-effectiveness analysis can help guide the decision about how well the new system would need to perform to be a viable alternative to sweat chloride testing.

Given the difficulty and expense of conducting a study of patients referred for sweat testing, we would like to learn as much as possible about the test's characteristics from individuals who have already been diagnosed with CF, as these authors have done. A test that performs poorly in known cases is unlikely to perform well in those who present for sweat testing. A study of several hundred known CF cases could establish with good precision whether or not the new system correctly identifies individuals who have already been diagnosed, but caution is warranted. We cannot conclude that the new system is a good diagnostic test on the basis of studies of known positives alone. The properties of a diagnostic test are specific to the study population in which they were estimated. Individuals who present for sweat testing because of neonatal screening results or clinical suspicion may differ in important ways from known CF cases, who are older because they have already been diagnosed. Indeed, Barben et al found that the difference between conductivity and chloride results was significantly greater in the referral population than in the classic CF cases. In addition, diagnostic studies must include a substantial number of atypical patients with sweat chloride concentrations in the borderline range. Although equivocal sweat test results account for a small percentage of all tests performed, these patients are valuable for understanding the appropriate diagnostic cut-points.

The work of Barben et al provides an important starting point for studying the diagnostic test characteristics of the new system. Their findings suggest that it would be worthwhile to initiate larger studies that can measure the new system's sensitivity and specificity more precisely. Such studies would allow us to rule out values that represent unacceptably high false-positive and false-negative rates. Given the difficulty of conducting a single study large enough to conclude that the true sensitivity of the new system is within acceptable bounds, pooling the results of smaller studies conducted in similar populations may be a more feasible strategy. With some coordination of effort across CF centers and appropriate statistical analysis, the cumulative results of small studies from around the globe over the next few years could provide solid evidence for approval of the new system for the diagnosis of CF.

Michele Jonsson Funk, PhD
Vicky A. LeGrys, DrA
Departments of Epidemiology and Allied Health Sciences
University of North Carolina at Chapel Hill
Chapel Hill, North Carolina

REFERENCES
1. Rosenstein BJ, Cutting GR. The diagnosis of cystic fibrosis: a consensus statement. J Pediatr 1998;132:589-95.
2. National Committee for Clinical Laboratory Standards. Sweat testing: sample collection and quantitative analysis; approved guideline C34-A2. Wayne, PA: NCCLS, 2000.
3. CF Center Directors Update No. 1. Bethesda, MD: Cystic Fibrosis Foundation, 1990.
4. Guidelines for the Performance of the Sweat Test for the Investigation of Cystic Fibrosis in the UK. Birmingham, UK: Association of Clinical Biochemists and the Royal College of Paediatrics and Child Health, 2003.
5. Hammond KB, Nelson L, Gibson LE. Clinical evaluation of the Macroduct sweat collection system and conductivity analyzer in the diagnosis of cystic fibrosis. J Pediatr 1994;124:255-60.
6. Heeley ME, Woolf DA, Heeley AF. Indirect measurements of sweat electrolyte concentration in the laboratory diagnosis of cystic fibrosis. Arch Dis Child 2000;82:420-4.
7. Lezana JL, Vargas MH, Karam-Bechara J, Aldana RS, Furuya MEY. Sweat conductivity and chloride titration for cystic fibrosis diagnosis in 3834 subjects. J Cystic Fibrosis 2003;2:1-7.
8. Mastella G, Dicesare G, Borruso A, Menin L, Zanolla L. Reliability of sweat-testing by the Macroduct collection method combined with conductivity analysis in comparison with the classic Gibson and Cooke technique. Acta Paediatr 2000;89:933-7.
9. Barben J, Ammann RA, Metlagel A, Schoeni MH. Conductivity determined by a new sweat analyzer compared to chloride concentrations for the diagnosis of cystic fibrosis. J Pediatr 2005;146:183-8.
10. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative: Standards for Reporting of Diagnostic Accuracy. Clin Chem 2003;49:1-6.
11. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7-18.
12. Department of Health and Human Services, Health Care Financing Administration. Clinical laboratory improvement amendments of 1988; final rule. Fed Reg 1992;7165:[42CFR493.1217].

SCREENING FOR CONGENITAL CYTOMEGALOVIRUS INFECTION: A TAPESTRY OF CONTROVERSIES

See related article, p 194. Reprint requests: Dr Gail J. Demmler, Texas Children's Hospital, MC 3-2371, 6621 Fannin Street, Houston, TX 77030-2399. E-mail: gjdemmle@texaschildrenshospital.org. J Pediatr 2005;146:162-4. 0022-3476/$ - see front matter. Copyright © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.jpeds.2004.11.020

Congenital infection with cytomegalovirus (CMV) is an underappreciated endemic public health problem that was first heralded by Weller in 1971 as an issue worthy of our attention and concern.1 Since then, numerous large cohort studies of mother-infant pairs conducted around the world have documented congenital infection rates between 0.3% and 2.2%.2,3 These studies have also documented that most (85%-90%) CMV infections are silent at birth, yet longitudinal studies of identified newborns have shown that late sequelae, especially progressive hearing loss and possibly also learning or behavioral differences, emerge months and even years later in up to 15% of these children.4,5 A smaller proportion of congenitally infected newborns, 5% to 15%, have recognizable symptoms and signs at birth, and, unfortunately, many of these children will experience neurosensory sequelae that significantly affect their quality of life. Although preconceptional maternal antibody appears to protect the fetus against transplacental infection and severe disease, the protection is incomplete: both symptomatic congenital infection and neurosensory sequelae, including hearing loss, have been documented in congenitally infected children whose mothers had recurrent CMV infection.6

The article by Naessens et al in this issue of The Journal provides the latest information on the prevalence of congenital CMV infection in Brussels, Belgium.7 Consistent with other studies, the authors found that 54% of 7140 pregnant women had serologic evidence of past CMV infection, and 4.1% of pregnant women had either a primary CMV infection documented by seroconversion during pregnancy or a recent primary or recurrent CMV infection documented by immunoglobulin (Ig)M antibody at the first prenatal visit. They also documented congenital CMV infection in 44 (0.62%) of the newborns studied, most of whom (36; 82%) were born to women with primary or recent primary infections. On the basis of their findings, the authors proposed a strategy for screening pregnant women that would identify >80% of newborns at risk for congenital infection and neurosensory sequelae. Unfortunately, rather than bringing us closer to solving the problem, this proposal adds yet another weave to the tapestry of controversies that covers, rather than confronts, the multitude of issues surrounding congenital CMV infection.

The controversy surrounding screening for congenital CMV infection is not new. In fact, screening newborns for congenital CMV infection was first proposed in the early 1970s, when the link between congenital CMV infection and hearing loss, first recognized by Medearis in 1964,8 was strengthened by several investigators who conducted longitudinal studies of the outcome of congenitally infected newborns. In 1982, Hanshaw wrote an editorial accompanying one such study, conducted by Saigal et al, in which he emphasized the importance of congenital CMV infection as a major cause of nonhereditary sensorineural hearing loss and proposed that the cost of a CMV screening program may be justified when one considers the price of delayed diagnosis, unnecessary tests, and the proven value of early intervention in hearing-impaired children.9,10 He concluded his editorial with the opinion, "We have more reason to consider screening newborns than we have ever had before." Now, in 2005, the controversy surrounding screening newborns for congenital CMV infection continues, recently fueled by the realization that universal newborn hearing screening, endorsed as a standard of care in 2000, may miss more than two thirds of children who develop hearing loss from congenital CMV infection.11,12 Given decades of understanding about the large numbers of affected children and the proven benefit of early intervention, the issue should not be whether we should screen but rather which approach is best to screen and monitor all congenitally infected newborns. To date, a variety of approaches have been evaluated to diagnose congenital CMV infection in the newborn. These methods include