PAEDIATRIC RESPIRATORY REVIEWS (2008) 9, 295–300
REVIEW
Newborn screening for cystic fibrosis: the motion against – voices in the wilderness
Mark Rosenthal
Royal Brompton Hospital, Sydney Street, London SW3 6NP, UK
KEYWORDS CF screening; Scepticism
George Bernard Shaw, amongst many others, wrote that ‘you should try everything once except incest and folk dancing’. To this short list should be added writing against newborn screening (NBS), as this topic generates such intense emotion that the facts get buried. The following is therefore not a treatise against NBS for CF but a warning to the reader not to get carried away by the white heat of new technology and to retain scientific, but not Luddite, scepticism.
SCREENING TEST CRITERIA
The standard criteria for a NBS test are that it is practicable, cheap, simple, highly sensitive and highly specific. The condition that is screened for has to be treatable, and the outcome after screening has to be indisputably superior to that achieved by treatment after diagnosis by other methods. Congenital hypothyroidism (CH) is a classic example of the last tenet, as a failure to diagnose it by 30 days, and certainly by 60 days, has devastating consequences for intellectual development.
1526-0542/$ – see front matter © 2008 Published by Elsevier Ltd. doi:10.1016/j.prrv.2008.09.003
THE CYSTIC FIBROSIS SCREENING ALGORITHM
The national CF screening algorithm is outlined in Fig. 1. It has two or three stages. Stage 1 is a heel prick immunoreactive trypsin (IRT) at 5–7 days. If this is above a certain cut-off, Stage 2 is gene analysis initially for four mutations, and if this
is inconclusive, a 29- or 31-mutation analysis is performed on the original heel prick sample. Two mutations provide a diagnosis; one or no mutations leads to a repeat IRT at 28 days, again by heel prick (Stage 3). If this is above a different cut-off, there is referral for definitive diagnosis. If it is below the cut-off and one mutation is present, there is genetic counselling for carrier status; if no mutations are found the child is pronounced normal.
So the process is practicable but it could never be described as simple compared with, for example, the one-stage detection of a raised thyroid stimulating hormone (TSH) level for CH. Its complexity therefore must be justified by its benefits to outcome. Its cost is relatively modest at approximately (UK)£4 per screened child.
The assumed sensitivity from Fig. 1 is the sum of the green boxes (3.5 + 0.5 + 0.1) divided by the total number of CF patients identified by screening and clinical means (green + yellow box): 4.1/4.35 = 94%. This is not bad but remember that it means 6 in 100 cases are missed – the false negatives. For the UK with some 300 new cases per year this equates to 18 missed cases. It is a well recognized phenomenon that following the introduction of screening, clinical suspicion of the diagnosis markedly declines. The result is likely to be that these 18 cases will be diagnosed later than would have occurred prior to screening because the health carers will be prone to say, ‘it cannot be CF as the child has been screened’.
The next criterion to consider is specificity and here the situation is unsatisfactory. From Fig. 1 it can be seen that the
Figure 1 Diagnostic algorithm for UK newborn screening for cystic fibrosis.
number of patients detected/10 000 births is 4.1 (green boxes). However, there are five patients in the pink box: these are babies with a high initial IRT and one known mutation but whose second IRT is below the cut-off, although this could be by a tiny amount. These babies are pronounced carriers and, at least in south-east England, geneticists will see the parents of such babies to provide counselling. In my opinion, this category should be called a false positive. These babies get past two of the three screening stages and require a specific piece of health care (parental genetic counselling). Thus, the specificity of the screening process is 4.1/(4.1 + 5) = 45%, which frankly is very poor. The problem is compounded by the fact that, philosophically at least, it is difficult to prove a negative. The suspicion of CF has been raised in the families of the babies in the pink box. No amount of counselling will prove to these families that their baby is only a carrier. In the first 6 months of screening, I have seen two children whose parents are wholly unconvinced that their baby ‘is only a carrier’. I understand that there is at least the threat of a legal case in the UK involving a baby where a diagnosis or otherwise cannot be confirmed as the ‘screening process has failed in its objective’. The diagnosis of a small minority of children with CF can be exceedingly problematical: in one case with which I was involved it took me 2 years before I could ‘get off the fence’. Thus, ‘proving’ that someone does not have CF can be daunting and time consuming. So, in view of the criteria discussed so far, does the CF screening process increase or decrease the total sum of human happiness? In 2001 when only 20% of the UK population were screened, the overall median age at diagnosis was 4 months; in France, where there was even less screening, 4 months; and in the US 6 months.1 Of the 300 babies with CF born each year in the UK, 15% will have
meconium ileus and will be diagnosed at birth anyway, leaving 255 for whom diagnosis is not obvious at birth. If the number (18) not detected by screening is then subtracted, this leaves 237 cases diagnosed by screening some 2–3 months sooner than by clinical means. Remember that the 18 missed cases may be diagnosed later than they otherwise would have been, but by how much the literature does not say. Add to this the 366 families in the UK drawn into the process because their baby is identified as a carrier (pink box, Fig. 1) – 1.56 for every child with CF detected – and the balance of human happiness probably does tip towards screening but by a surprisingly small amount. In Australia, where there is a long-standing nationwide screening programme, 94% of babies in 2000 were diagnosed by 6 months but only 70% were diagnosed by screening.1
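The arithmetic in this section can be retraced with a short sketch. It uses only the figures quoted in the text from Fig. 1; the function name and variable names are illustrative, and the rounding follows the text (sensitivity is rounded to 94% before the missed cases are counted). Note that the ratio 4.1/(4.1 + 5), which the text labels specificity after counting carriers as false positives, is the quantity conventionally called the positive predictive value.

```python
def screening_arithmetic():
    """Retrace the per-10 000-birth and per-year figures quoted in the text."""
    # Rates per 10 000 births, as quoted from Fig. 1.
    detected = 3.5 + 0.5 + 0.1   # green boxes: CF found by screening
    all_cf = 4.35                # green + yellow boxes: all CF, however found
    carriers = 5.0               # pink box: carriers referred for counselling

    sensitivity_pct = round(100 * detected / all_cf)
    # Share of flagged babies who actually have CF, if carriers are
    # counted as false positives (the text's 4.1/(4.1 + 5)).
    flagged_with_cf_pct = round(100 * detected / (detected + carriers))

    uk_cases = 300                                      # new UK cases per year
    missed = uk_cases * (100 - sensitivity_pct) // 100  # false negatives
    meconium_ileus = uk_cases * 15 // 100               # diagnosed at birth anyway
    not_obvious = uk_cases - meconium_ileus
    diagnosed_earlier = not_obvious - missed
    carrier_families = round(uk_cases * carriers / detected)

    return {
        "sensitivity_pct": sensitivity_pct,
        "flagged_with_cf_pct": flagged_with_cf_pct,
        "missed": missed,
        "not_obvious": not_obvious,
        "diagnosed_earlier": diagnosed_earlier,
        "carrier_families": carrier_families,
    }


if __name__ == "__main__":
    for name, value in screening_arithmetic().items():
        print(f"{name}: {value}")
```

Dividing the 366 carrier families by the 237 children diagnosed earlier gives roughly 1.54 rather than the 1.56 quoted, so the text presumably used a slightly different denominator; the conclusion is unaffected.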
WHAT ARE THE CLINICAL BENEFITS OF SCREENING?
This balance of human happiness cited above would tip further towards screening if the last two screening criteria were to be demonstrated: that the condition is treatable and that screening is indisputably superior to diagnosis by other means. A good result here would justify sucking the 366 carrier families into the process. The first criterion is easy: CF is definitely treatable although not currently curable. Do screened patients do better than clinically diagnosed ones? This is the crucial issue but the entire worldwide CF screening process effectively rests on one randomized blinded clinical trial, the Wisconsin study.2 Between 1985 and 1994 all infants born in Wisconsin, USA were screened for CF using blood spot IRT, and after 1991 IRT plus ΔF508 testing. For those infants positive for either scenario, half were recalled for sweat testing and the
other half, blinded to both parents and clinicians, were left to be diagnosed clinically or were recalled for testing if by 4 years no diagnosis of CF had been entertained. Those with meconium ileus in either group were excluded as the purpose was to examine the value of screening on outcome. Despite chance differences in pancreatic status and genotype frequency between the two groups, following correction there remained significant, if small, differences in height for age centile or z scores. However, the number of observations at age 10 years was four in the screened and treated group versus nine in the screened but left for clinical diagnosis group – these are very small numbers on which to base worldwide screening. Great play was made of the difference in mean age at diagnosis (12 vs 72 weeks).
However, the median age at diagnosis, a more realistic measure given the naturally skewed distribution, was 7 versus 23 weeks. The range for the screened group was 4–281 weeks and the clinical group 7–372 weeks. Why did the screened subjects have to wait up to 5.4 years for treatment? As for the clinical group, if all the remaining undiagnosed subjects were tested at 4 years, why was diagnosis delayed until they were aged up to 7.2 years? Was ascertainment all it was cracked up to be? A subsequent paper3 addressed the concerns expressed by Wald and Morris4 and did show a persisting increased risk of height and weight < 10th centile in the screened but left to be clinically diagnosed group during follow-up. It is also stated that by chance, the clinically diagnosed group had a higher
Figure 2 Comparisons of pulmonary function tests (PFTs) as related to age between the screened and control groups after the age of 7 years. (A) % Predicted FEV1/FVC. (B) % Predicted FEV1. (C) % Predicted mean forced expiratory flow during the middle half of the FVC (FEF25–75). (D) Residual volume:total lung capacity (RV:TLC) ratio. P values shown were obtained from generalized estimating equation (GEE) model adjusting for CF centre, sex and age. The dashed lines show the lower limits of normal at 89.25% (A), 82.4% (B) and 67.9% (C). The average upper or lower 95% confidence intervals are represented by vertical bars. (Reproduced from Farrell et al.5 by kind permission of the American Thoracic Society.).
incidence of pancreatic sufficiency and more non-ΔF508 mutations, conferring a theoretical advantage to that group. As for lung function, this is a case of ‘the mountain bringing forth a mouse’. The 2003 report from the Wisconsin study states: ‘we experienced unexpected confounders when some post-randomized imbalances emerged and a subgroup of screened patients acquired Pseudomonas aeruginosa (PsA) prematurely’.5 No differences in lung function were demonstrated. Fig. 2 describes the lung function/age relationship. My interpretation is that the screened group after about age 10 years had small but consistently worse lung function than the control group. Fig. 3 shows that the screened group has an inferior chest radiograph score after age 10 years, possibly related to increased PsA acquisition (Fig. 4). So, the screened patients had worse lung function and chest radiograph scores and more PsA. Great emphasis is placed on possible reasons for the screened patients doing worse but the outcome remains the same: the screened patients aged 10 years are doing worse than the clinically diagnosed ones. One unexpected result was the claim that cognitive function was affected in the clinically diagnosed children who had a low vitamin E level at diagnosis.5 After adjusting for parental education and marital status, clinically diagnosed infants (mean, not median, age at diagnosis = 18 months) with a low vitamin E level had a score some 13 points lower than the screened infants with the same vitamin E level. If the vitamin level was normal at diagnosis, there was no difference
Figure 3 Comparison of Wisconsin chest X-ray (WCXR) scores5 as related to age between the screened and control groups. The P value for group comparison is 0.017 when the generalized estimating equation (GEE) model I includes patient group, CF centre, sex and age, whereas the P value for group comparison is 0.10 when the GEE model II includes the covariates above plus adjustment for risk factors: genotype, pancreatic status and Pseudomonas aeruginosa acquisition. The numbers of observations, i.e. patients at each age, are indicated above the line for the screened group and below the line for the control group. The dashed line at WCXR = 5 shows the value that discriminates potentially irreversible, although still mild, lung disease. For WCXR, a high score indicates greater chest radiograph abnormalities. The average upper or lower 95% confidence intervals are represented by the vertical bars. (Reproduced from Farrell et al.5 by kind permission of the American Thoracic Society.).
Figure 4 Analysis of Pseudomonas aeruginosa (PsA) acquisition5 as patients age in the two groups and stratified by centres [A, Madison or B, Milwaukee (where an old, small clinic was used from 1985 to 1989)] into four subgroups: screened group + Madison, control group + Madison, screened group + Milwaukee and control group + Milwaukee. Time to first PsA acquisition was compared using log-rank tests (P = 0.007 and 0.005, respectively). (Reproduced from Farrell et al.5 by kind permission of the American Thoracic Society.).
in cognitive score irrespective of patient grouping. The implication is that it is the duration of low vitamin E levels that is important. However, scrutiny of their Table 2 (ref. 6) shows that the significant correlation between vitamin E level at diagnosis and cognitive score explains just 6% of the variability, the other 94% being unaccounted for, so the effect, though real, is very small and swamped by other factors. It is worth noting that 27% of adults with CF achieved a college education or beyond in 2004 and 24% of patients with CF in the UK had a university degree in 2002, neither of which suggests a uniform reduction in cognition. My conclusion from the Wisconsin trial is: ‘is that it?’ What of evidence from other sources? The UK study in Wales and the West Midlands from 1985 to 1989 allocated screening on an alternate week basis,7 with babies born in the non-screened weeks being diagnosed clinically. Seventy-eight cases were diagnosed by screening and 71 clinically. By definition there is incomplete ascertainment of the clinical group, as not all the cases may have been diagnosed yet. At age 4 years there were no nutritional differences between the groups. By age 5 years there had been four deaths in the clinical group and none in the screened group, but two of the four deaths occurred in subjects diagnosed by 7 weeks of age. Thus it is
doubtful whether screening would have had any impact on those deaths. In the last 15 years there have been no deaths among cases < 5 years and two among those < 10 years at our non-screening institution with an average clinic population of 350. Thus, the Italian study quoting an 11.8% mortality in the clinically diagnosed group from Sicily with an average 11-year follow-up compared with 1.8% in the screened group from Veneto is a ridiculous comparison and prompts questions as to the quality of care the former were receiving.8 The French study,9 comparing adjacent regions with and without screening but a common treatment protocol at their respective centres, based its conclusions on only seven and eight subjects, respectively, at 10 years, and contains the remarkable observation that the clinically diagnosed group was diagnosed at a mean 471.8 days with an SD of 607 days: thus, if the distribution were normal, about 10% of this group were diagnosed before conception. The Australian group10 compared children diagnosed clinically between 1978 and 1980 with those diagnosed by screening from 1981 to 1983. By age 15 years, nutritional indices were the same but there was a 12% difference in forced expiratory volume in 1 s (FEV1) favouring the screened group and no difference in mortality. The use of historical controls in a condition with a rapidly improving prognosis (approximately 9 months per year for the past 30 years) is open to much misinterpretation. In desperation, people have therefore looked at CF database data to demonstrate in ordinary usage the virtues of screening. As an example, a comparison of the UK data (20% screened), French data (10% screened, but only 60% overall data coverage), US data (< 20% screened) and Australian data (supposedly 100% screened but only 70% diagnosed by screening – very odd) has been undertaken.1 The median ages at diagnosis are 4, 6, 6 and 1.8 months, respectively.
The percentage of the UK population at about 16 years with a body mass index (BMI) < 10th centile is 12% compared with 17% for the screened Australian population. PsA prevalence rates are greater in the UK population than the Australian (50% vs 22%). However, the proportion of patients with ‘normal’ lung function, although ‘normal’ is often left undefined, is 53% in the UK, 37% in Australia and 42% in the US. My personal experience with US lung function is with one 13-year-old American child who had outpatient consultations with us, semi-alternating with a US CF centre. Our FEV1 was always 20% lower (with chest radiograph and symptoms to match) than her US results. Great care is required when comparing data across countries; if anything, data are generally worse in screened Australia than in the largely unscreened UK.
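The remark above about the French study (a mean age at diagnosis of 471.8 days with an SD of 607 days) can be checked with a short sketch. It assumes, purely for illustration, that age at diagnosis is normally distributed and that conception falls about 266 days before birth; neither assumption comes from the study, and the point of the remark is precisely that such data cannot be normal.

```python
from statistics import NormalDist

# Figures quoted in the text for the clinically diagnosed French group.
MEAN_DAYS = 471.8
SD_DAYS = 607.0

# Assumption for illustration: conception is about 266 days before birth.
CONCEPTION_DAY = -266.0


def implied_fractions():
    """Fractions a normal model would place before birth and before conception."""
    age_at_diagnosis = NormalDist(mu=MEAN_DAYS, sigma=SD_DAYS)
    before_birth = age_at_diagnosis.cdf(0.0)
    before_conception = age_at_diagnosis.cdf(CONCEPTION_DAY)
    return before_birth, before_conception


if __name__ == "__main__":
    before_birth, before_conception = implied_fractions()
    print(f"diagnosed before birth:      {before_birth:.1%}")
    print(f"diagnosed before conception: {before_conception:.1%}")
```

Under these assumptions the normal model places a little over one in ten diagnoses before conception, in line with the ‘about 10%’ above. An SD larger than the mean for a quantity that cannot be negative signals a heavily skewed distribution, for which a median and range would be more informative.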
COST–BENEFIT ANALYSIS OF CF SCREENING
Finally, efforts have been made to demonstrate that CF screening saves money. The UK database undertook a comparison of Scottish screened patients with, I can only
Figure 5 Comparison of costs of care between screened and unscreened subjects divided by Pseudomonas aeruginosa status11.
assume, English unscreened patients.11 The treatment protocols were unmatched and outcomes per unit cost completely unknown. The claim was a median annual saving in treatment cost per screened patient of US$1600. This claim can only be valid if the outcomes, which are not considered, are at least equal. Our institution has one of the best lung function outcomes (unscreened) of any CF institution based on CF registry data, but I would also be completely unsurprised if our hospital admission rate per patient was also one of the UK’s highest. Are these two facts causal, associated or unassociated? I do not know but it does have a crucial bearing on cost analyses. The distributions of the cost were so skewed in each population that, for example, in relation to pseudomonas status (Fig. 5), I see no real difference.
SUMMARY
So now we are stuck with CF screening because the unruly zealots have escaped from their cage again and managed to bamboozle us all into doing it on the basis of one flawed proper clinical trial which produced ambivalent, unexciting results and much hearsay from around the world. It would be foolish to favour the corollary to screening which would be that a delay in diagnosis is advantageous. The current situation remains, as in Scottish law, ‘not proven’. Diagnosis is only the beginning of the attritional war against CF, not the end. Therefore, my concern is not so much when cases are diagnosed, as the great majority in the UK prior to screening were diagnosed before the age of 12 months and screening will not have a huge impact on this; it is whether the patients, once diagnosed, receive appropriate aggressive, unremitting, life-long therapy.
REFERENCES
1. McCormick J, Sims EJ, Green MW, Mehta G, Culross F, Mehta A. Comparative analysis of cystic fibrosis registry data from the UK with USA, France and Australasia. J Cyst Fibros 2005; 4: 115–122.
2. Farrell PM, Kosorok MR, Laxova A et al. Nutritional benefits of neonatal screening for cystic fibrosis. Wisconsin Cystic Fibrosis Neonatal Screening Study Group. N Engl J Med 1997; 337: 963–969.
3. Farrell PM, Kosorok MR, Rock MJ et al. Early diagnosis of cystic fibrosis through neonatal screening prevents severe malnutrition and improves long-term growth. Wisconsin Cystic Fibrosis Neonatal Screening Study Group. Pediatrics 2001; 107: 1–13.
4. Wald NJ, Morris JK. Neonatal screening for cystic fibrosis. BMJ 1998; 316: 404–405.
5. Farrell PM, Li Z, Kosorok MR et al. Bronchopulmonary disease in children with cystic fibrosis after early or delayed diagnosis. Am J Respir Crit Care Med 2003; 168: 1100–1108.
6. Koscik RL, Farrell PM, Kosorok MR et al. Cognitive function of children with cystic fibrosis: deleterious effect of early malnutrition. Pediatrics 2004; 113: 1549–1558.
7. Doull IJ, Ryley HC, Weller P, Goodchild MC. Cystic fibrosis-related deaths in infancy and the effect of newborn screening. Pediatr Pulmonol 2001; 31: 363–366.
8. Grosse SD, Rosenfeld M, Devine OJ, Lai HJ, Farrell PM. Potential impact of newborn screening for cystic fibrosis on child survival: a systematic review and analysis. J Pediatr 2006; 149: 362–366.
9. Siret D, Bretaudeau G, Branger B et al. Comparing the clinical evolution of cystic fibrosis screened neonatally to that of cystic fibrosis diagnosed from clinical symptoms: a 10-year retrospective study in a French region (Brittany). Pediatr Pulmonol 2003; 35: 342–349.
10. McKay KO, Waters DL, Gaskin KJ. The influence of newborn screening for cystic fibrosis on pulmonary outcomes in New South Wales. J Pediatr 2005; 147(3 Suppl): S47–S50.
11. Sims EJ, Mugford M, Clark A et al. UK Cystic Fibrosis Database Steering Committee. Economic implications of newborn screening for cystic fibrosis: a cost of illness retrospective cohort study. Lancet 2007; 369: 1187–1195. Erratum in: Lancet 2007; 370: 28.