
Verification bias in pediatric studies evaluating diagnostic tests

Ann S. Bates, MD, Peter A. Margolis, MD, PhD, and Arthur T. Evans, MD, MPH

From the Departments of Pediatrics and Medicine, Indiana University, Indianapolis, Indiana, and the Departments of Pediatrics and Medicine, University of North Carolina, Chapel Hill

Supported in part by grant D28 PE-55009 from the Bureau of Health Professions, Health Resources and Services Administration, and by Institutional NRSA award T32 PE 15001, National Institutes of Health (Dr. Bates). Submitted for publication July 28, 1992; accepted Nov. 19, 1992. Reprint requests: Ann S. Bates, MD, Regenstrief Institute, Fifth Floor, 1001 W. Tenth St., Indianapolis, IN 46202. Copyright © 1993 by Mosby-Year Book, Inc.

Improperly designed evaluations of diagnostic tests may lead to inaccurate conclusions about a test's accuracy. One problem, verification bias, occurs if subjects are not equally likely to have the diagnosis verified by a gold-standard evaluation and if selection for further evaluation is dependent on the diagnostic test result. To determine whether verification bias is a problem in pediatric studies of diagnostic tests, we conducted a critical appraisal of all studies evaluating diagnostic tests published in three pediatric journals during a 3-year period. Thirty-six percent were subject to verification bias. The most prevalent cause was restriction of the patient sample to those whose diagnosis had been verified by a gold-standard evaluation, when the decision to obtain the gold standard was influenced by the diagnostic test result. Verification bias may have serious effects on the estimated sensitivity and specificity of a test. Improved awareness of the potential for verification bias may help physicians improve their selection and interpretation of diagnostic tests and thereby improve the quality and efficiency of patient care. (J PEDIATR 1993;122:585-90)

Clinicians use published evaluations of diagnostic tests to guide their selection of which tests to use. When evaluations are improperly designed, conducted, or reported, however, their results may lead to inaccurate conclusions about a test's accuracy and utility. Proposed criteria for judging the quality of diagnostic test studies include the following: whether the patients studied were representative of the patients to whom the test would be applied in practice, whether the reliability of the test was documented, whether the gold-standard evaluation was performed without knowledge of the test results, and whether the test characteristics were properly computed and presented.1-4

One of the more difficult problems to recognize, "verification" or "workup" bias,5-10 is not often included in lists of such criteria and thus is not well recognized. Verification bias occurs if the diagnoses for patients with different test results are not equally likely to be confirmed, or verified, by a gold-standard evaluation. It typically occurs when selection for further evaluation is dependent on the diagnostic test result. In clinical practice, patients who have certain clinical findings or positive results on initial tests are more likely to undergo additional tests to verify the diagnosis; patients without those findings or with negative test results are less likely to undergo more definitive testing. Although this approach may be sensible and cost-effective in clinical practice, when it occurs in studies designed to evaluate the performance of diagnostic tests, the sensitivity and specificity of those tests may be misrepresented.

ELM    Early Language Milestone Scale
SICD   Sequenced Inventory for Communication Development

Most tests are not perfect; some patients with negative test results may have the disease (false-negative results). Similarly, some patients with positive test results may not have the disease (false-positive results). If patients with negative test results who do not undergo further evaluation are assumed to be disease free, the investigators will falsely label some patients with disease but with negative test results as not having disease. This will produce a falsely elevated estimation of sensitivity and specificity. For example, in an evaluation of newborn screening for congenital adrenal hyperplasia, only patients with an elevated screening test result underwent further evaluation.11 Patients with negative screening test results who had congenital adrenal hyperplasia would have been misclassified as not having the disease.

Verification bias can also occur if patients with negative results who never underwent a gold-standard evaluation are excluded from the calculation of test characteristics altogether. In this situation the sensitivity will be falsely elevated but the specificity will be falsely decreased. For example, a retrospective study that evaluated the accuracy of the lactose breath hydrogen test in the diagnosis of enteropathy in children included only those who had undergone jejunal biopsy.12 Because biopsy is performed only rarely on children with negative breath hydrogen test results, these patients would not be included in this study.

The objective of our study was to determine whether verification bias is an important problem in studies of diagnostic tests reported in pediatric journals. We illustrate how this problem can affect the test characteristics, describe how it can be recognized, explain how to correct for this bias, and discuss how the bias can be avoided in the design of the study.

METHODS

Sample selection. To ascertain the prevalence of verification bias in the pediatric literature, we retrieved all published studies of diagnostic tests that appeared in the American Journal of Diseases of Children, The Journal of Pediatrics, or Pediatrics from January 1987 to December 1989. We searched the National Library of Medicine MEDLINE data base using the MeSH heading "Diagnostic Tests, Routine" for articles published in these journals and containing the words sensitivity and specificity in the title, in the abstract, or as key words. To ensure completeness of the sample, one of us (A.S.B.) read the abstracts of all articles published in the three journals during that period to identify eligible articles that had been missed by the MEDLINE search. Articles were included if they evaluated a test for clinical use in diagnosis, presented original data, and used human subjects.

Determination of verification bias. If all eligible study subjects (regardless of test result) received a definitive diagnosis according to the same gold-standard evaluation, the article was considered to be free of verification bias. Verification bias was considered to be present if subjects with positive and negative diagnostic test results did not have an equal chance of undergoing a gold-standard evaluation.

If patients with negative (or positive) test results were less likely to undergo a gold-standard evaluation, the study was considered to be subject to verification bias, regardless of whether these patients were eliminated from further calculations or labeled as having had true negative results. Verification bias is common in retrospective studies because only tested patients who subsequently undergo the gold-standard evaluation are usually included, and the gold standard is almost always applied more often to patients with positive test results. Therefore retrospective studies were considered to be subject to verification bias unless the report stated that all patients who underwent the diagnostic test had an equal chance of undergoing the gold-standard evaluation.

Critical review. All articles were reviewed independently by two of the three investigators according to a standard data collection form.* Disagreements between reviewers were resolved after joint review of the article. On a very few occasions the third investigator was needed to resolve a persistent disagreement. To verify the accuracy of our evaluations, we sent our findings to the corresponding author of each article. Each author received a list of the specific evaluation criteria that were not met by their study. We asked the authors to review this list and contact us if they had comments or corrections.

*A copy of the data collection form and the detailed criteria are available on request from the authors.

RESULTS

Forty-two articles met our inclusion criteria.11-52 Only 19 articles were detected by the MEDLINE literature search; 23 others were identified by manual search. Fifteen studies (36%) were considered subject to verification bias.11-26 In 7 of the 15 studies the investigators used a retrospective design and restricted the patient sample to those whose diagnosis had been verified by a gold-standard evaluation. For example, one study examined the utility of various clinical signs and symptoms as predictors of cervical spine injury.13 Patients who had not undergone cervical spine radiography were excluded from the study. Many of the clinical signs that were being evaluated were likely to have been used as criteria for obtaining a radiograph; patients without those signs (i.e., with negative test results) were less likely to be included in the sample.

The other eight studies exhibiting verification bias were performed prospectively; nevertheless, the patients with negative test results were less likely to undergo the gold-standard evaluation than were patients with positive results. In six of the eight prospective articles, some of the patients with negative test results did not have verification of the diagnosis but were assumed to have had "true" negative results. Three of these were evaluations of screening tests in which only patients with positive test results underwent further evaluation. For example, a neonatal galactosemia screening program completely examined only those subjects with abnormal screening test findings.14 The others were assumed to be disease free. The remaining two prospective studies that were subject to verification bias examined only a random subset of patients with negative test results and excluded patients whose diagnoses were not verified. One of these studies is described in detail below.

Responses from authors. We sent our critical review of each article to the corresponding authors. Three letters were returned. No authors contradicted our findings about patient sample selection. Two authors did not consider their studies to have been evaluations of diagnostic tests. One author stated that he was interested in whether a screening questionnaire could be incorporated easily into a pediatric practice to identify potential drug and alcohol problems.27 The authors found that the questionnaire was capable of discriminating drug and alcohol use among teenagers with and without a history of substance abuse. The other author stated that the instrument evaluated was neither a test nor a pediatric instrument.28 This study found pediatricians to have low sensitivity but high specificity in identifying the presence of psychiatric disorders. Because these articles reported evaluations of an instrument used in diagnosis, they were included in our sample. Neither of these studies exhibited verification bias.

DISCUSSION

Methodologic problems are common among studies evaluating diagnostic tests. More than one third of the articles that we assessed had potential verification bias. Studies of diagnostic tests that identified patients through retrospective chart review were particularly prone to verification bias.

We found in the medical literature only one previously published assessment of the prevalence of verification bias. Using the search terms "diagnostic test" and either "efficacy," "sensitivity," "specificity," or "false positive rate," Greenes and Begg10 reviewed 145 studies published between 1976 and 1980 that were identified by a MEDLINE search of all indexed journals. At least 26% were thought to be subject to verification bias. However, they acknowledged that their search identified only a small subset of the articles reporting evaluation of diagnostic tests. We found a somewhat higher prevalence of verification bias, possibly because our sample of three clinical pediatric journals was more restrictive and our methods for identifying all relevant articles were more comprehensive. Only 19 of our 42 articles were obtained from the MEDLINE search by a search strategy similar to that used by Greenes and Begg.


Recognition of verification bias. There are two forms of verification bias. In one form, some patients with negative diagnostic test results do not undergo a gold-standard evaluation and are assumed to be truly disease free. A study evaluating the sensitivity and specificity of history taking and physical examination in diagnosing serious illness in febrile infants illustrates the problem.15 The gold standard for the diagnosis of "serious illness" was at least one abnormal finding on a battery of laboratory tests. For example, all patients with positive culture results, abnormal serum electrolyte values, hypoxemia, or abnormalities on chest radiograph were considered to have a serious illness. House officers ordered the laboratory tests that they believed to be appropriate for patients with positive findings on history or physical examination but rarely ordered the full battery of tests and often failed to order any laboratory work if the history and physical examination findings were entirely normal. Fewer than one third of the patients underwent the full battery of tests that defined the gold standard; the two thirds who did not undergo this battery of tests and who had negative results on the tests that they did undergo were assumed to have negative results on all tests and therefore not to have serious illness. In this case, by assuming that patients who were not tested with the full battery of tests were disease free, the authors may have misclassified cases as having had "true negative" results. This would produce falsely elevated sensitivity and specificity. The magnitude of the effect on sensitivity and specificity depends on the extent to which the test result influences the decision to perform the gold-standard evaluation; the more influential the test, the stronger the effect. It was not possible to estimate the magnitude of the effect in this study because no information was available on the influence of the historical and physical findings on the decision to perform a gold-standard evaluation.
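To make this mechanism concrete, the following short sketch works through a purely hypothetical example; the prevalence, test characteristics, and verification fraction below are invented for illustration and are not taken from the febrile-infant study or from any other study reviewed here.

```python
# Hypothetical illustration of the first form of verification bias, in which
# unverified test-negative patients are assumed to be disease free.
# All counts below are invented for illustration only.

# "True" situation: 1000 patients, 10% disease prevalence, and a test with
# 80% sensitivity and 90% specificity, fully verified by the gold standard.
tp, fn = 80, 20      # diseased patients: test positive / test negative
fp, tn = 90, 810     # disease-free patients: test positive / test negative

true_sens = tp / (tp + fn)                        # 0.80
true_spec = tn / (tn + fp)                        # 0.90

# Biased study: every test-positive patient undergoes the gold standard,
# but only 20% of test-negative patients do; the remaining 80% are simply
# labeled as having had "true negative" results.
verified_fraction_neg = 0.20
fn_observed = fn * verified_fraction_neg          # only 4 false negatives are detected
missed_cases = fn - fn_observed                   # 16 diseased patients mislabeled disease free
tn_observed = tn + missed_cases                   # the mislabeled cases inflate the TN cell

apparent_sens = tp / (tp + fn_observed)           # ~0.95, falsely elevated
apparent_spec = tn_observed / (tn_observed + fp)  # ~0.90, also (slightly) elevated

print(f"sensitivity: true {true_sens:.1%}, apparent {apparent_sens:.1%}")
print(f"specificity: true {true_spec:.1%}, apparent {apparent_spec:.1%}")
```

In this contrived example the apparent sensitivity climbs from 80% to about 95%, whereas the specificity rises only slightly; as noted above, the size of the distortion depends on how strongly the findings being evaluated drive the decision to perform the gold-standard evaluation.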

A second form of verification bias, in which some patients with negative test results are not evaluated further and are excluded from analysis, is illustrated by a study of the Early Language Milestone Scale, a screening test for language delay in young children.16 The gold standard was an exhaustive, standardized interview (the Sequenced Inventory for Communication Development). Investigators applied the gold standard to all available patients with positive screening test results (40/53; 75%) and to a randomly selected comparable number (37/604; 6%) of those with negative test results. The authors reported the data reproduced in the Figure. By applying the gold standard to an equal number of those who had positive and negative results on the screening test, a large proportion of those with negative results were excluded from analysis. This produces a falsely increased sensitivity, but specificity, unlike that in the first form of verification bias, is falsely decreased.

Figure. Example of the effect of excluding some patients with negative test results from analysis.

A. Reported results (verification bias present)

                         Gold standard (SICD)
                         +        -      Total
Screening test   +      26       14       40
(ELM)            -       4       33       37
                 Total  30       47       77

Sensitivity = 26/(26 + 4) = 87%; specificity = 33/(33 + 14) = 70%. Reported calculation of the sensitivity and specificity of the screening test (ELM) when only 6% of the subjects who passed the screening test (negative results) and 75% of the subjects who failed the test (positive results) were subjected to the gold standard (SICD). Patients excluded from verification of disease status do not appear in the table or in the calculations.

B. Reconstructed results (verification bias corrected)

                         Gold standard (SICD)
                         +        -      Total
Screening test   +      26       14       40
(ELM)            -      50      412      462
                 Total  76      426      502

Sensitivity = 26/(26 + 50) = 34%; specificity = 412/(412 + 14) = 97%. Reconstructs the study results after adjustment for the unequal proportion of subjects with negative results who underwent gold-standard evaluation. The correction was performed by multiplying the test-negative cells by the factor by which subjects with positive test results were more likely to undergo the gold-standard evaluation (75%/6% = 12.5).

Correction of verification bias. One can correct for verification bias by estimating the result that would be obtained if all positive and negative test results were verified. We reconstructed the two-by-two table by calculating how many times the gold standard (SICD) was more likely to be obtained for those with positive screening test results (ELM) than for those with negative results (75%/6% = 12.5) (Figure). We then multiplied the test-negative cells (lower row of cells in the Figure) by this number. The magnitude of the effect on sensitivity and specificity is demonstrated by comparing the sensitivities and specificities. After correction, the expected sensitivity is 34%, rather than the 87% reported in the study. The expected specificity is 97%, compared with the reported 70%. Thus when patients with negative test results are disproportionately excluded from application of the gold standard, the sensitivity is falsely elevated and the specificity is falsely lowered. Effects of this magnitude may have serious consequences for a reader's interpretation of a test's usefulness. For example, the ELM is now widely perceived to be a sensitive screening test. However, as described above, the actual sensitivity (after correction for verification bias) suggests that many cases of speech and language delay will be missed.
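The arithmetic of this correction can be retraced in a few lines. The sketch below is illustrative only; it reuses the verified counts reported in the Figure and the rounded verification fractions (75% and 6%) described above.

```python
# Verified counts from panel A of the Figure (ELM screening test vs. SICD gold standard).
a, b = 26, 14   # ELM positive: SICD positive (a), SICD negative (b)
c, d = 4, 33    # ELM negative: SICD positive (c), SICD negative (d)

# Reported (biased) estimates, based only on the verified patients.
sens_reported = a / (a + c)            # 26/30 = 87%
spec_reported = d / (d + b)            # 33/47 = 70%

# Test-positive subjects were verified far more often than test-negative
# subjects: 75% (40/53) versus 6% (37/604), a ratio of 12.5.
factor = 0.75 / 0.06                   # 12.5

# Scale the test-negative row up by this factor to estimate what complete
# verification would have shown (panel B of the Figure).
c_corrected = c * factor               # 50
d_corrected = d * factor               # 412.5 (shown as 412 in the Figure)

sens_corrected = a / (a + c_corrected)             # 26/76 = 34%
spec_corrected = d_corrected / (d_corrected + b)   # ~412/426 = 97%

print(f"reported:  sensitivity {sens_reported:.0%}, specificity {spec_reported:.0%}")
print(f"corrected: sensitivity {sens_corrected:.0%}, specificity {spec_corrected:.0%}")
```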

Avoiding verification bias. There are several ways to avoid verification bias when one is planning a diagnostic test evaluation. The surest way is to perform the gold-standard evaluation when the diagnostic test result is unknown. If it is not possible for all patients to undergo the gold-standard evaluation, a random sample of all results may be verified. If equal proportions of positive and negative results are verified, the sensitivity and specificity will be unbiased and no correction is required. However, if a disproportionate number of positive or negative results are verified, the correction procedure described earlier can be applied (see the sketch below).

If it is not possible to perform the gold-standard evaluation on all patients, another approach is to follow all patients with negative test results to detect any whose condition was misdiagnosed. However, this approach is valid only for diseases that do not resolve spontaneously and that become apparent with time, such as cancer. For example, if the "disease" is bacteremia or electrolyte abnormalities or infiltrate on chest radiograph, as in the aforementioned study of the history and physical examination in the detection of illness in infants, clinical follow-up may miss many cases of disease that are transient and that resolve spontaneously. In all evaluations of diagnostic tests, it is crucial for researchers to describe in detail the method of patient selection and to indicate whether the decision to perform a gold-standard evaluation could have been influenced by the diagnostic test result.9
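When only a random fraction of test-positive and test-negative results is verified, the same correction amounts to weighting each verified patient by the inverse of the chance that a patient with that test result was verified. The helper below is a sketch of that idea (the function name and interface are ours, not part of the article or of any standard library); with the rounded verification fractions used above, it reproduces the corrected estimates for the ELM example.

```python
def corrected_sens_spec(a, b, c, d, verified_frac_pos, verified_frac_neg):
    """Estimate sensitivity and specificity corrected for unequal verification.

    a, b -- verified test-positive patients with and without disease
    c, d -- verified test-negative patients with and without disease
    verified_frac_pos, verified_frac_neg -- fractions of all test-positive and
        test-negative patients who underwent the gold-standard evaluation
    """
    # Weight each verified patient by the inverse of the chance that a patient
    # with that test result was verified; this is algebraically the same as
    # multiplying the test-negative cells by the verification ratio.
    w_pos, w_neg = 1 / verified_frac_pos, 1 / verified_frac_neg
    sensitivity = (a * w_pos) / (a * w_pos + c * w_neg)
    specificity = (d * w_neg) / (d * w_neg + b * w_pos)
    return sensitivity, specificity

# ELM/SICD example with the rounded verification fractions (75% and 6%):
# returns approximately (0.34, 0.97), matching the corrected Figure values.
print(corrected_sens_spec(26, 14, 4, 33, 0.75, 0.06))
```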

Diagnostic tests often are evaluated in a stepwise fashion across several studies.53 This process can lead indirectly to verification bias. After preliminary studies of a diagnostic test demonstrate that the test may be useful, patients with negative test results may be less likely to undergo a gold-standard evaluation and therefore less likely to be included in a study. Therefore the subjects with negative test results are underrepresented and the reported specificity will be falsely low.54 Likewise, the sensitivity will be overestimated. This progression has been illustrated by a study of the declining specificity of radionuclide ventriculography, which was initially reported to be highly specific for coronary artery disease.55 Subsequent evaluations reported a much lower specificity. The authors of the study concluded that this effect was partly the result of the frequent practice of not performing the gold-standard evaluation on patients with negative ventriculography results (verification bias).

One of the limitations of this review is that there is no gold standard for judging the quality of published research, especially research that evaluates diagnostic tests. We improved the reliability of our assessments by using two independent reviewers and a standardized data collection protocol. In addition, we gave the authors of the articles that we reviewed an opportunity to correct any errors that we might have made. This was considered essential because the editorial process may have restricted the publication of important details about the study design. Critical reviews of the research methods of published studies risk misrepresentation if the authors are not given an opportunity to clarify details about study design. The authors of some articles that we evaluated told us that they had not intended their studies to be interpreted as definitive evaluations of diagnostic tests but merely as important preliminary or exploratory data. If such compromises in study design are made, however, the effect of the design modifications should be discussed explicitly. Preliminary studies may be subject to large errors in the estimates of sensitivity and specificity.


CONCLUSION

Despite previous publications that stressed the need for improved methods of evaluating diagnostic tests, serious problems are still prevalent in the pediatric literature. Verification bias was detected in more than one third of the articles that we evaluated. Readers should be aware of study designs that may be susceptible to verification bias, including retrospective analyses of test performance and studies in which not all patients undergo the gold-standard evaluation. Improved awareness of the potential for verification bias at every level--investigator, reviewer, journal editor, and reader--might help physicians improve their selection and interpretation of diagnostic tests and thereby the quality and efficiency of patient care.


REFERENCES 1. Sheps SB, Schechter MT. The assessment of diagnostic tests: a survey of current medical research. JAMA 1984;252:241822. 2. Arroll B, Schechter NIT, Sheps SB. The assessment of diagnostic tests: a comparison of medical literature in 1982 and 1985. J Gen Intern Med 1988;3:443-7. 3. Mulrow CD, Linn WD, Gaul MK, Pugh JA. Assessing quality of a diagnostic test evaluation. J Gen Intern Med 1989;4:28895. 4. Department of Clinical Epidemiology and Biostatistics, Mc-

21.

22.

23.

589

Master University. How to read clinical journals. II. To learn about a diagnostic test. Can Med Assoc J 1981;124:703-10. Begg CB. Biases in the assessment ofdiagnostic tests. Stat Med 1987;6:41 !-24. Ransohoff DF, Feinstein AR. Problem of spectrum and bias in evaluating the etiicacy of diagnostic tests. N Engl J Mcd i 978;299:926-30. Panzer R J, Suchman AL, Griner PF. Workup bias in prediction research. Med Decis Making 1987;7:115-9. Begg CB, Greenes RA. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 1983;39:207-15. Begg CB. Methodologic standards for diagnostic test studies. J Gen Intern Med 1988;3:518-9. Greenes RA, Begg CB. Assessment of diagnostic technologies: methodology for unbiased estimation from samples of selectively verified patients. Invest Radiol 1985;20:751-6. Thompson R, Seargeant L, Winter JSD. Screening for congenital adrenal hyperplasia: distribution of 17-hydroxyprogesterone concentrations in neonatal blood spot specimens. J PEDIATR 1989;I 11:400-4. Levine J J, Seidman E, Walker WA. Screening tests for enteropathy in children. Am J Dis Child 1987;141:435-8. Rachesky I, Boyce WT, Duncan B, Bjelland J, Sibley B. Clinical predictors of cervical spine injuries in children: radiographic abnormalities. Am J Dis Child 1987;141:199-201. Greenberg CR, Dilling LA, Thompson R, Ford JD, Seargeant LE, Haworth JC. Newborn screening for galactosemia: a new method used in Manitoba. Pediatrics 1989;84:331-5. McCarthy PL, Lembo RM, Fink HD, Baron MA, Cicchetti DV. Observation, history, an d physical examination in diagnosis of serious illnesses in febrile children --<24months. J PEDIATR 1987;110:26-30. Walker D, Gugenheim S, Downs MP, Northern JL. Early Language Milestone Scale and language screening of young children. Pediatrics 1989;83:284-8. Edwards JR, Ulrich PP, Weintrub PS, et al. Polymerase chain reaction compared with concurrent viral cultures for rapid identification of human immunodeficiency virus infection among high-risk infants and children. J PEDIATR 1989; 115:200-3. Bucher HU, Fanconi S, Baeckert P, Duc G. Hyperoxemia in newborn infants: detection by pulse oximetry. Pediatrics 1989;84:226r30. Tucker NT, Barguthy FS, Prihoda T J, Kumar V, Lerner A, Lebenthal E. Antigliadin antibodies detected by enzymelinked immunosorbent assay as a marker of childhood celiac disease. J PEDIATR 1988;113:286-9. Jellinek b,lS, Murphy JM, Robinson J, Feins A, Lamb S, Fenton T. Pediatric Symptom Checklist: screening school-age children for psychosocial dysfunction. J PEDIATR 1988; 112:201-9. Powell KR, Kaplan SB, tlall CB, Nasello MA, Roghmann KJ. Periorbital cellulitis: clinical and laboratory findings in 146 episodes, including tear countercurrent electrophoresis in eighty-nine episodes. Am J Dis Child 1988;142:853-7. Cheu HW, Brown DR, Rowe MI. Breath hydrogen excretion as a screening test for the early diagnosis of necrotizing enterocolitis. Am J Dis Child 1989;143:156-9. Bonadio WA, Smith DS, ttillman S. Clinical indicators of intracranial lesion on computed tomographic scan in children with parietal skull fracture. Am J Dis Child 1989;143:194-6.

590

Bates, 3Iargolis, and Evans

24. Hennes H, Lee M, Smith D, Sty JR, Losek J. Clinical predictors of severe head trauma in children. Am J Dis Child 1988;142:1045-7. 25. Griffin TC, Christoffel KK, Binns H J, et al. Family history evaluation as a predictive screen for childhood hypercholesterolemia. Pediatrics 1989;84:365-73. 26. Villalta IA, Pramanik AK, Diaz-Blanco J, Herbst J]. Diagnostic errors in neonatal polycythemia based on method of hematocrit determination. J PEDIATR 1989;114:433-5. 27. Klitzner M, Schwartz RH, Gruenewald P, Blasinsky M. Screening for risk factors for adolescent alcohol and drug use. Am J Dis Child 1987;141:45-9. 28. Costello E J, Burns BJ, Costello A J, Edelbrock C, Dulcan M, Brent D. Service utilization and psychiatric diagnosis in pediatric primary care: the role of the gatekeeper. Pediatrics 1988;82:435-41. 29. Zuckerman B, Amaro tt, Cabral H. Validity of self-reporting of marijuana and cocaine use among pregnant adolescents. J PEDIATR 1989;! 15:812-5. 30. Pifer LLW, Woods DR, Edwards CC, Joyner RE, Anderson F J, Arheart K. Pneumocystis carinii serologic study in pediatric acquired immunodeficiency syndrome. Am J Dis Child 1988;142:36-9. 3 !. Ostrea EM, Brady M J, Parks PM, Asensio DC, Naluz A. Drug screening of meconium in infants of drug-dependent mothers: an alternative to urine testing. J PEDIATR 1989;115:474-7. 32. Remafedi G, Abdalian SE. Clinical predictors of Chlamydia trachomatis endocervicitis in adolescent women: looking for the right combination. Am J Dis Child 1989;143:1437-42. 33. Kellogg JA, Landis RC, Nussbaum AS, Bankert DA. Performance of an enzyme immunoassay test and anaerobic culture for detection of group A streptococci in a pediatric practice versus a hospital laboratory. J PEDIATR 1987;1 ! 1:18-21. 34. Hauser G J, Pollack MM, Sivit C J, Taylor GA, Bulas DI, Guion CJ. Routine chest radiographs in pediatric intensive care: a prospective study. Pediatrics 1989;83:465-70. 35. Redd SC, Facklam RR, Collin S, et al. Rapid group A streptococcal antigen detection kit: effect on antimicrobial therapy for acute pharyngitis. Pediatrics 1988;82:576-81. 36. Schwartz DM, Schwartz RH. Validity of acoustic reflectometry in detecting middle ear effusion. Pediatrics 1987;79:73942. 37. Soren K, Willis E. Chlamydia and the adolescent girl: the enzyme immunoassay as a screening tool. Am J Dis Child 1989;143:51-4. 38. Taubman B, Barroway RP, McGowan KL. The diagnosis of group A, B-hemolytic streptococcal pharyngitis in the office setting: rapid latex test versus throat culture. Am J Dis Child 1989;143:102-4. 39. Kunnamo I, Kallio P, Pelkonen P, Hovi T. Clinical signs and laboratory tests in the differential diagnosis of arthritis in children. Am J Dis Child 1987;141:34-40.

The Journal of Pediatrics April 1993

40. Banco L, Jayashekaramurthy S, Graffam J. The inability of a temperature-sensitive pacifier to identify fevers in ill infants. Am J Dis Child 1988;142:!71-2. 41. Muir A, Daneman D, Daneman A, Ehrlich R. Thyroid scanning, ultrasound, and serum thyroglobulin in determining the origin of congenital hypothyroidism. Am J Dis Child 1988; 142:214-6. 42. Mauro RD, Poole SR, Lockhart CH. Differentiation of epiglottiditis from laryngotracheitis in the child with stridor. Am J Dis Child 1988;142:679-82. 43. lvarsson SA, Ericsson UB, Frediksson B, Persson PH. Ultrasonic imaging in the differential diagnosis of diffuse thyroid disorders in children. Am J Dis Child 1989;143:1369-72. 44. Orlando MS, Frank T. Audiometer and audioscope hearing screening compared with threshold test in young children. J PEDIA'rR 1987;110:261-4. 45. Dobkin D, Shulman ST. Evaluation of an ELISA for group A streptococcal antigen for diagnosis of pharyngitis. J PEDIATR 1987;110:566-9. 46. Czinn S J, Carr H. Rapid diagnosis of Campylobacter pyloridis-associated gastritis. J PEDIATR 1987;110:569-70. 47. Spivak W, Sarkar S, Winter D, Glassman M, Donlon E, Tucker KJ. Diagnostic utility of hepatobiliary scintography with 99m-Tc-DISIDA in neonatal cholestasis. J PEDIA'I'R 1987;I 10:855-61. 48. Sauder RA, Chesrown SE, Loughlin GM. Clinical application of transepithelial potential difference measurements in cystic fibrosis. J PEDIATR 1987;111:353-8. 49. Frankenhurg WK, Ker CY, Engelke S, Schaefer ES, Thornton SM. Validation of key Denver Developmental Screening Test items: a preliminary study. J PEDIATR 1988;I 12:560-6. 50. Sochett E, Daneman D. Screening tests to detect microalbuminuria in children with diabetes. J PEDIATR 1988;112:744-8. 51. Kimball TR, Weiss RG, Meyer RA, Daniels SR, Ryckman FC, Schwartz DC. Color flow mapping to document normal pulmonary venous return in neonates with persistent pulmonary hypertension being considered for extracorporeal membrane oxygenation. J PEDIATR 1989;114:443-7. 52. Dennison BA, Kikuchi DA, Srinivasan SR, Webber LS, Berenson GS. Parental history of cardiovascular disease as a indication for screening for lipoprotein abnormalities in children. J PEDIATR 1989;! 15:186-94. 53. Nierenberg AA, Feinstein AR. How to evaluate a diagnostic marker test: lessons from the rise and fall of dexamethasone suppression test. JAMA 1988;259:1699-702. 54. Sox HC. Probability theory in the use of diagnostic tests: an introduction to critical study of the literature. Ann Intern Med 1986;104:60-6. 55. Rozanski A, Diamond GA, Berman D, et al. The declining specificity of exercise radionuclide ventriculography. N Engl J Med 1983;309:518-22.