Volume 4 • Number 4 • 2001 VALUE IN HEALTH
LETTERS TO THE EDITOR
To the Editor: Health-related quality of life (HRQL), health status, and other specific functional status assessments that are included under the umbrella of patient-reported outcomes (PROs) are increasingly used as efficacy end points in randomized controlled trials. It is now recognized that, although perceptual, PROs can be measured in reliable and valid ways. Indeed, evidence of the scientific soundness of the questionnaire should be provided. In that sense we fully support Paul Kind’s general statement “we need demonstrable rigor in our methods.” Nevertheless, Dr. Kind’s comments raise two major issues: the perspective taken for scaling, and the level of data reported in a single manuscript.
© ISPOR 1098-3015/01/$15.00/344 344–345
Scaling
It is common practice for multi-item scales in descriptive (psychometric) questionnaires to be scored using the method of summated ratings. Indeed, simple summing of scores over the individual items is the most rational index. This “linear model” approach works if items are measuring the same construct, the scaling assumption being based on a similar distribution of responses to items and similar item variances. In addition, the internal consistency reliability of the scale is estimated using Cronbach’s alpha coefficient. This provides an indication of the degree of convergence between different items hypothesized to represent the same construct. Classic references include Likert [1], Nunnally [2], and Streiner [3]. This was the perspective we took for the MSF-4, bearing in mind that the MSF-4 was a descriptive questionnaire aimed at evaluating the sexual functional status of men with benign prostatic hypertrophy (BPH). We actually followed the psychometric criteria described and recommended by the Medical Outcomes Trust and its Scientific Advisory Committee [4], based on Likert’s [1] theory. We did not introduce a valuation system (explicit weights) in the scoring algorithm of the MSF-4, given that this questionnaire is not a preference-based instrument and the introduction of differential weights in a one-domain, multi-item scale does not seem to provide a substantial advantage over using the unweighted score, particularly when item-total correlations are similar or when the reliability is acceptable [5,6]. Furthermore, improvement in the quality of the items and/or increases in the number of items are generally recommended ways of improving reliability, rather than the weighting of items. In addition, major issues related to weighting are still under discussion: Which method should be used? Whose value should be taken into account?
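For readers less familiar with the two quantities discussed above, the following is a minimal sketch of unweighted summated scoring and of Cronbach’s alpha computed from item and total-score variances. The function names and the response data are hypothetical and chosen only for illustration; they are not MSF-4 data and do not reproduce the analyses behind the MSF-4.

```python
from statistics import variance


def summated_scores(responses):
    """Unweighted (Likert-style) scale score: sum of item scores per respondent."""
    return [sum(row) for row in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of numeric scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)),
    using sample variances throughout.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # zip(*responses) transposes rows (respondents) into columns (items).
    item_vars = [variance(col) for col in zip(*responses)]
    total_var = variance(summated_scores(responses))
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents x 4 items (assumed scoring range, made up).
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
    [3, 3, 3, 4],
]

if __name__ == "__main__":
    print("scale scores:", summated_scores(responses))
    print("Cronbach's alpha:", round(cronbach_alpha(responses), 3))
```

No item weights appear in the scoring function, which mirrors the unweighted approach described above; alpha rises as the items covary more strongly relative to their individual variances.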
Level of data displayed in a single manuscript
If the first issue raised by Dr. Kind can be considered as theoretical, or even philosophical, the second one is very practical. According to the perspective taken, the underlying theory, and the context, authors have to face a difficult choice. What is the minimum level of data that should be reported in a single manuscript, taking into account the type of journal and the numbers of words/tables recommended by the editor? How much evidence should be provided to demonstrate the appropriateness of the scoring system and the reliability and validity of the PRO instrument? One can easily note that, even though standards of validation are available, great variability exists in the types of data reported in manuscripts describing the development and use of PRO instruments. In particular, details supporting the scoring algorithm or the ordinality of item response categories are not commonly reported. Following the usual practice, in our manuscript we decided to put the focus on the clinical validity of the MSF-4 questionnaire rather than report details on the scaling assumptions. A great deal more information is available in the analyses than was reported in the manuscript. Interested readers can contact the author for additional details on the MSF-4 instrument characteristics. Again, we think the main issue is the absence of consensus regarding the type of data that should be shown in a manuscript to support the validation of a scale. In any case, as stated by Dr. Kind, we should go beyond Cronbach’s alpha.
Patrick Marquis, Mapi Values, Lyon, France.
References
1 Likert R. A technique for the measurement of attitudes. Arch Psychol 1932;140:5–55.
2 Nunnally JC. Psychometric Theory (2nd ed.). New York: McGraw-Hill, 1978.
3 Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. New York: Oxford University Press, 1989.
4 Lohr KN, Aaronson NK, Alonso J, et al. Evaluating Quality of Life and Health Status Instruments: development of scientific review criteria. Clin Therapeutics 1996;18:979–92.
5 Lei H, Skinner HA. A psychometric study of life events and social readjustment. J Psychosomatic Research 1980;24:57–65.
6 Edwards AL, Kenney KC. A comparison of the Thurstone and Likert techniques of attitude scale construction. J Appl Psychology 1946;30:72–83.