Letters to the Editor / Journal of Clinical Epidemiology 64 (2011) 1463–1469
the sum of the percentages of false positive and false negative classifications is smallest (the Receiver Operating Characteristic [ROC] method, which is an anchor-based method) [2]. The SDC is defined as described above. In Fig. 1a the SDC is larger than the MIC. If the measurement error of this instrument were reduced, the distributions of the scores in both groups would become narrower, and the overlap between the curves would also become smaller. This would not influence the MIC, but the SDC would become smaller (closer to the MIC). If the measurement error is reduced enough, the SDC becomes smaller than the MIC (Fig. 1b) [6]. In summary, if the MIC is smaller than the SDC, we argue that one should try to reduce the SDC, not increase the MIC. The SDC can be reduced, for example, by increasing the number of items in a scale or by taking the mean score of repeated measurements. In conclusion, we think the solution of Kemmler et al. is inadequate because the MIC and the SDC are two different concepts. What is considered clinically relevant should not be determined by the measurement error of the scale. Therefore, SEM or SD should not be used as a measure of the MIC.
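As an illustration of this point, and assuming the usual distribution-based definitions for the sake of the sketch (the notation is not taken from the letter: $\sigma_x$ is the score SD, $r$ the test-retest reliability such as an ICC, and $k$ the number of repeated measurements that are averaged), the relevant relations can be written as:

```latex
% Standard error of measurement from the score SD and the reliability r
\mathrm{SEM} = \sigma_x \sqrt{1 - r}

% Smallest detectable change at the individual level
% (95\% confidence, difference between two measurements)
\mathrm{SDC} = 1.96 \cdot \sqrt{2} \cdot \mathrm{SEM} \approx 2.77 \cdot \mathrm{SEM}

% Averaging k repeated measurements shrinks the SEM, and hence the SDC,
% by a factor of sqrt(k), while an anchor-based MIC is unaffected
\mathrm{SEM}_{\bar{x}} = \frac{\mathrm{SEM}}{\sqrt{k}}
\qquad\Rightarrow\qquad
\mathrm{SDC}_{\bar{x}} \approx \frac{2.77 \cdot \mathrm{SEM}}{\sqrt{k}}
```

Under these assumptions, averaging four repeated measurements roughly halves the SDC, moving the situation of Fig. 1a toward that of Fig. 1b without altering the anchor-based MIC.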
C.B. Terwee*
Department of Epidemiology and Biostatistics and the EMGO Institute for Health and Care Research
VU University Medical Center
Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands
*Corresponding author.
E-mail address:
[email protected]
B. Terluin
Department of General Practice and the EMGO Institute for Health and Care Research
VU University Medical Center, Amsterdam, The Netherlands
D.L. Knol
H.C.W. de Vet
Department of Epidemiology and Biostatistics and the EMGO Institute for Health and Care Research
VU University Medical Center
Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands
References
[1] Kemmler G, Zabernigg A, Gattringer K, Rumpold G, Giesinger J, Sperner-Unterweger B, et al. A new approach to combining clinical relevance and statistical significance for evaluation of quality of life changes in the individual patient. J Clin Epidemiol 2010;63:171–9.
[2] de Vet HC, Ostelo RW, Terwee CB, van der Roer N, Knol DL, Beckerman H, et al. Minimally important change determined by a visual method integrating an anchor-based and a distribution-based approach. Qual Life Res 2007;16:131–42.
[3] Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 2008;61:102–9.
[4] de Vet HCW, Terwee CB, Knol DL, Bouter LM. When to use agreement versus reliability measures. J Clin Epidemiol 2006;59:1033–9.
[5] de Vet HC, Terwee CB, Ostelo RW, Beckerman H, Knol DL, Bouter LM. Minimal changes in health status questionnaires: distinction between minimally detectable change and minimally important change. Health Qual Life Outcomes 2006;4:54.
[6] de Vet HCW, Terluin B, Roorda LD, Mokkink LB, Ostelo RW, Knol DL, et al. Three ways to quantify uncertainty in individually applied "minimally important change" (MIC) values. J Clin Epidemiol 2010;63:37–45.

doi: 10.1016/j.jclinepi.2011.06.015
[Fig. 1. (a) Anchor-based Minimal Important Change (MIC) distribution; the Smallest Detectable Change (SDC) is larger than the MIC. (b) Anchor-based MIC distribution; the SDC is smaller than the MIC.]

Clinically relevant, statistically significant, or both? Minimal important change in the individual subject revisited

In reply:
Scientific concepts develop over time. Sometimes it takes a long time until a concept takes on a form that is both
scientifically sound and generally accepted. Several approaches may develop in parallel; controversial discussions, and even errors and their uncovering, may be an important part of the scientific discourse. All of this seems to apply to the concept of the minimal important difference (MID) or minimal important change (MIC), introduced about 20 years ago [1].

The starting point of our article [2] was an observation made when studying longitudinal quality-of-life (QOL) data of cancer patients, collected routinely for monitoring purposes. For most subscales of the QOL instrument used (European Organisation for Research and Treatment of Cancer [EORTC] QLQ-C30 [3]), we found a conspicuously large gap between the recommended MIC [4] and the smallest detectable change (SDC), with a substantial proportion of changes falling within this gap: 50% or more of the "relevant" changes were indistinguishable from random fluctuations! An illustration and discussion of this issue was the main purpose of our study. In order not merely to criticize the MIC concept when it is applied to individual subjects, we offered a suggestion for how the concept might be improved.

Having considered the criticism by Terwee et al. [5], we acknowledge that our suggested approach had not been sufficiently well thought out. Combining the MIC and the SDC into a new threshold is indeed problematic. It has the clear disadvantage that it overrides, and thus disregards, the originally determined (for example, anchor-based) MIC value. A viable alternative, in our view, would be to report both the MIC and the SDC. Changes exceeding the MIC but falling below the SDC should be labeled with a warning: "Beware! Despite potential clinical relevance, this change may be due to chance fluctuation alone." We would like to point out here that it has never been our intention to advocate the use of QOL instruments with low reliability. In this respect, we feel misinterpreted by Terwee et al.

Terwee et al. suggest placing less emphasis on the type I error and instead using both the type I and type II errors when determining and evaluating the MIC [5-7]. Their Receiver Operating Characteristic (ROC) approach looks appealing because it integrates the two types of error. However, two things should be borne in mind:
1) Combining type I and type II errors in an ROC analysis does not exempt one from reporting the size of the two types of error. Both should lie within reasonable limits, for example, below 0.2 or 0.25 (an illustrative computation is sketched below).
2) The ROC approach requires an anchor that can serve as a gold standard for the construct to be measured, that is, change in the particular QOL dimension. Surprisingly, the anchors used in the literature, for example, transition ratings or other single-item "scales," have often not been extensively validated, and considerable criticism about their reliability and validity has been raised [7,8].
Hence, not only the reliability of the QOL instruments, as Terwee et al. recommend, but also the reliability and validity of the anchors should be improved to increase the accuracy and enhance the applicability of the ROC method.
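The following is a minimal sketch, not drawn from either letter, of how an ROC-based MIC cutoff and the two error rates discussed in point 1 might be computed; the function name, variable names, and simulated data are illustrative assumptions only:

```python
import numpy as np

def roc_optimal_mic(change_scores, improved):
    """Find the change-score cutoff that minimizes the sum of the
    false-positive and false-negative proportions, given an external
    anchor classifying each patient as improved (True) or not (False).

    Returns the cutoff together with both error rates so that they can
    be reported alongside the MIC rather than hidden in the analysis.
    """
    change_scores = np.asarray(change_scores, dtype=float)
    improved = np.asarray(improved, dtype=bool)

    best = None
    # Scan the observed change scores as candidate cutoffs.
    for cutoff in np.unique(change_scores):
        classified_improved = change_scores >= cutoff
        # False positives: anchor says "not improved", scale says "improved".
        fp_rate = np.mean(classified_improved[~improved])
        # False negatives: anchor says "improved", scale says "not improved".
        fn_rate = np.mean(~classified_improved[improved])
        total_error = fp_rate + fn_rate
        if best is None or total_error < best[1]:
            best = (cutoff, total_error, fp_rate, fn_rate)

    cutoff, _, fp_rate, fn_rate = best
    return cutoff, fp_rate, fn_rate

# Hypothetical example: simulated pre-post change scores and an anchor rating.
rng = np.random.default_rng(0)
improved = rng.random(200) < 0.5
changes = np.where(improved, rng.normal(8, 6, 200), rng.normal(0, 6, 200))
mic, fp, fn = roc_optimal_mic(changes, improved)
print(f"ROC-based MIC cutoff: {mic:.1f} (FP rate {fp:.2f}, FN rate {fn:.2f})")
```

Reporting the cutoff together with both misclassification proportions addresses point 1, and the anchor used as the reference classification (here a simulated improved/not-improved flag) must itself be sufficiently reliable and valid, as point 2 stresses.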
Georg Kemmler*
Johannes Giesinger
Bernhard Holzner
Department of Psychiatry and Psychotherapy
Medical University of Innsbruck, Anichstr. 35, 6020 Innsbruck, Austria
*Corresponding author.
E-mail address:
[email protected] (G. Kemmler)
References
[1] Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials 1989;10:407–15.
[2] Kemmler G, Zabernigg A, Gattringer K, Rumpold G, Giesinger J, Sperner-Unterweger B, et al. A new approach to combining clinical relevance and statistical significance for evaluation of quality of life changes in the individual patient. J Clin Epidemiol 2010;63:171–9.
[3] Aaronson NK, Ahmedzai S, Bergman B, Bullinger M, Cull A, Duez NJ, et al. The European Organization for Research and Treatment of Cancer QLQ-C30. J Natl Cancer Inst 1993;85:364–76.
[4] Osoba D, Rodrigues G, Myles J, Zee B, Pater J. Interpreting the significance of changes in health-related quality of life scores. J Clin Oncol 1998;16:139–44.
[5] Terwee CB, Terluin B, Knol DL, de Vet HCW. Combining clinical relevance and statistical significance for evaluating quality of life changes in the individual patient. J Clin Epidemiol 2011;64:1465–8 [in this issue].
[6] de Vet HCW, Ostelo RW, Terwee CB, van der Roer N, Knol DL, Beckerman H, et al. Minimally important change determined by a visual method integrating an anchor-based and a distribution-based approach. Qual Life Res 2007;16:131–42.
[7] de Vet HCW, Terluin B, Roorda LD, Mokkink LB, Ostelo RW, Knol DL, et al. Three ways to quantify uncertainty in individually applied "minimally important change" (MIC) values. J Clin Epidemiol 2010;63:37–45.
[8] Guyatt GH, Norman GR, Juniper EF, Griffith LE. A critical look at transition rating scales. J Clin Epidemiol 2002;55:900–8.

doi: 10.1016/j.jclinepi.2011.06.014
Engraving marble and comparative effectiveness reviews

To the Editor:

Tsertsvadze et al. [1] wisely insisted on the issue of updating comparative effectiveness reviews (CERs). They