EDITORIALS
THE JOURNAL OF PEDIATRICS DECEMBER 2000
Promises and pitfalls in the evaluation of pediatric asthma scores

J Pediatr 2000;137:744-6. Copyright © 2000 by Mosby, Inc. 0022-3476/2000/$12.00 + 0 9/18/111458 doi:10.1067/mpd.2000.111458

Assessment of acute asthma severity in young children is difficult. Pulmonary function tests can provide reliable and objective information on the severity of airways obstruction but require cooperation and may not be feasible in young children.1 Therefore pediatric asthma scores, consisting of a combination of clinical symptoms and signs, are frequently used to estimate the severity of acute airways obstruction, to guide treatment decisions, and to evaluate treatment results. A search of the medical literature (1966-1992) identified no fewer than 16 clinical asthma scores.2 Most scores were designed in an ad hoc manner, based on clinical experience and face validity only. Information on the clinimetric properties of the scores in terms of reliability, validity, and responsiveness was scarce.
See related article, p 762.

Since 1992, at least two pediatric asthma scores have been developed by using a more formal methodology: the Clinical Asthma Score1 and the Preschool Respiratory Assessment Measure (PRAM),3 which is introduced in this issue of The Journal of Pediatrics. The PRAM was developed by relating potentially relevant items, such as wheezing and retractions, to a measure of pulmonary function (respiratory resistance), which is believed to be of sufficient reliability and responsiveness in children aged 3 years and older.4 Commendable is the fact that separate data sets were used to design the PRAM
and, subsequently, to test its properties. However, the PRAM still needs to be validated in other settings and other patient groups. Given that a considerable number of pediatric asthma scores are available, emphasis should now be placed on evaluating and comparing the reliability, validity, and responsiveness of frequently used asthma scores, rather than on the development of yet another asthma score.

The study of these clinimetric properties is not without pitfalls. There appears to be little consensus in the literature regarding definitions and methods, especially concerning responsiveness. Kirschner and Guyatt5 and Guyatt et al6,7 defined responsiveness as the ability to detect a clinically important change over time. Responsiveness was presented as a signal-to-noise ratio, in which the signal indicates the clinical change one wishes to detect, and the noise the random measurement error over which the change needs to be detected.7 However, other authors have argued that responsiveness is not a separate dimension, but just another aspect of validity. Responsiveness would simply incorporate longitudinal information (clinical change) into the evaluation of validity. If an asthma score is able to discriminate between levels of severity, it should be able to do this at any point in time.8 Yet responsiveness refers to changes over time within patients, whereas validity or reliability usually refers to cross-sectional differences between patients.

The study by Chalut et al3 demonstrated a strong association between the PRAM and pulmonary function regarding changes over time, whereas the (cross-sectional) association between PRAM and pulmonary function before treatment was rather weak. The authors subsequently concluded that the PRAM appears to be responsive but less suited for discrimination between levels of asthma severity. This somewhat unexpected outcome can be explained by the strict criteria that were used for patient selection. The severity of obstruction was mild to moderate in nearly all children. Improvement after treatment could readily be detected (strong signal, little noise), but the small range in severity resulted in a relatively poor correlation between the two measures before treatment. Such findings emphasize the
need for the evaluation of asthma scores in different settings and patient populations and for adequate methods for the study of clinimetric properties.

A variety of statistical methods have been described for the assessment of responsiveness, including the computation of standardized response means,9 effect sizes,10 responsiveness ratios,7 relative efficiency statistics,11 and receiver operating characteristic curves.12 The interpretation of these statistics is difficult because the unit of analysis varies and a direct comparison is not possible. One “solution” is to calculate all these statistics for a variety of clinical scores in one patient population and evaluate the relative responsiveness of the scores. The actual ranking of measures based on their relative responsiveness may be similar across statistical methods13 but may also vary considerably.11,14 When facing such results, it is difficult to draw conclusions on the responsiveness of clinical scores or to decide in favor of one specific score.

In view of these difficulties, should we still consider investigating the responsiveness of pediatric asthma scores? I think we should. Given the fact that pulmonary function tests are usually not available for preschool-aged children, alternative measures such as pediatric asthma scores are needed to evaluate improvement or deterioration of acute asthma. It does not really matter whether one prefers the term responsiveness or longitudinal validity, as long as adequate methods are used to answer these questions. Most methods require an external criterion or “gold standard” for change.
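As a rough illustration of three of the statistics named above, the following sketch computes a standardized response mean, an effect size, and a responsiveness ratio from made-up scores. It assumes commonly used formulations (mean change divided by the SD of the change scores, by the SD of the baseline scores, and by the SD of change in clinically stable patients, respectively), which may differ in detail from those in the cited papers. The function names and all data are hypothetical and are not taken from the PRAM study.

```python
from statistics import mean, stdev


def improvements(before, after):
    """Per-patient improvement; a lower score is assumed to mean milder disease."""
    return [b - a for b, a in zip(before, after)]


def standardized_response_mean(before, after):
    """Mean change divided by the SD of the change scores."""
    change = improvements(before, after)
    return mean(change) / stdev(change)


def effect_size(before, after):
    """Mean change divided by the SD of the baseline scores."""
    return mean(improvements(before, after)) / stdev(before)


def responsiveness_ratio(change_in_treated, change_in_stable):
    """Signal (mean change after treatment) over noise (SD of change in stable patients)."""
    return mean(change_in_treated) / stdev(change_in_stable)


# Hypothetical asthma scores for six children before and after treatment.
before = [6, 5, 7, 4, 6, 5]
after = [3, 3, 3, 2, 4, 1]
# Hypothetical change scores for six clinically stable children measured twice.
stable_change = [0, 1, -1, 0, 1, -1]

srm = standardized_response_mean(before, after)
es = effect_size(before, after)
rr = responsiveness_ratio(improvements(before, after), stable_change)
print(f"SRM = {srm:.2f}, effect size = {es:.2f}, responsiveness ratio = {rr:.2f}")
```

Because the three statistics share a numerator but use different denominators, they give different values for the same data, which is one reason they cannot be compared directly.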
In the evaluation of asthma scores, several external criteria for (change in) asthma severity have been used, including pulmonary function (forced oscillation techniques),3 a treatment of known efficacy,1 and a general judgment of severity by professionals.3 Ideally, the external criterion should be an objective measure that is assessed by someone blinded to the results of the asthma score.
A general clinical judgment may not be the best option, because it is clearly not independent of the asthma score being evaluated.1 The clinical signs and symptoms that make up the score will also form an important part of the general evaluation. This is confirmed by the relatively strong association, reported by Chalut et al,3 between scores on the PRAM and appraisal of severity by professionals (rs = 0.50).

Forced oscillation techniques or other pulmonary function tests may not always be available or feasible in young children, but fortunately, there is a treatment of known efficacy for acute asthma in children (nebulized bronchodilators, oral or intravenous steroids). In this approach the improvement in an asthma score after therapy is assumed to be a clinically relevant change. If this change can be detected over the random measurement error, the asthma score is considered to be responsive.

Deciding on the clinical relevance of change remains arbitrary. Attempts should be made to define, in advance, the magnitude of a minimal clinically important change,14 analogous to the definition of a clinically important difference in the design of clinical trials. Chalut et al3 demonstrate that a change in the PRAM score of 3 points corresponds to a change of at least 25% in respiratory resistance, which may certainly be considered to be clinically important. Interpreting the strength of correlations seems to be more difficult. How relevant is a Spearman correlation of 0.58 between changes in the PRAM and respiratory resistance? Statistical significance is clearly not informative here, because it depends heavily on sample size. Such difficulties in interpretation limit the usefulness of correlation coefficients in the evaluation of responsiveness.

Finally, it is important to realize that clinimetric properties may depend on the setting and patient population. An asthma score that is responsive in secondary care or in a specific age group may not necessarily perform equally well in other circumstances. Therefore additional studies, comparing the performances of several pediatric asthma scores in various settings and populations, should be encouraged in order to support (or refute) the use of asthma scores in clinical practice.

Daniëlle van der Windt, PhD
Institute for Research in Extramural Medicine
Vrije Universiteit
Amsterdam, 1081 BT
The Netherlands
REFERENCES
1. Parkin PC, Macarthur C, Saunders NR, Diamond SA, Winders PM. Development of a clinical asthma score for use in hospitalized children between 1 and 5 years of age. J Clin Epidemiol 1996;49:821-5.
2. Van der Windt DAWM, Nagelkerke AF, Bouter LM, Dankert-Roelse JE, Veerman AJP. Clinical scores for acute asthma in pre-school children. A review of the literature. J Clin Epidemiol 1994;47:635-46.
3. Chalut DS, Ducharme FM, Davis GM. The Preschool Respiratory Assessment Measure (PRAM): a responsive index of acute asthma severity. J Pediatr 2000;137:762-8.
4. Ducharme FM, Davis GM. Respiratory resistance in the emergency department: a reproducible and responsive measure of asthma severity. Chest 1998;113:1566-72.
5. Kirschner B, Guyatt GH. A methodological framework for assessing health indices. J Chron Dis 1985;38:27-36.
6. Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chron Dis 1987;40:171-8.
7. Guyatt GH, Kirschner B, Jaeschke R. Measuring health status: what are the necessary properties? J Clin Epidemiol 1992;45:1341-5.
8. Hays RD, Hadorn D. Responsiveness to change: an aspect of validity, not a separate dimension. Qual Life Res 1992;1:73-5.
9. Liang MH, Fossel AH, Larson MG. Comparison of five health status instruments for orthopedic evaluation. Med Care 1990;28:632-42.
10. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989;27:S178-89.
11. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol 1997;50:239-46.
12. Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chron Dis 1986;39:897-906.
13. Stucki G, Liang MH, Fossel AH, Katz JN. Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol 1995;48:1369-78.
14. Stratford PW, Binkley JM, Riddle DL. Health status measures: strategies and analytic methods for assessing change scores. Phys Ther 1996;76:1109-23.