Pain, 62 (1995) 101-109
© 1995 Elsevier Science B.V. All rights reserved 0304-3959/95/$09.50
PAIN 2764
The factorial structure and stability of the McGill Pain Questionnaire in patients experiencing oral mucositis following bone marrow transplantation
Gary W. Donaldson *
Fred Hutchinson Cancer Research Center, Seattle, WA 98104 (USA)
(Received 10 June 1994, revision received 2 November 1994, accepted 8 November 1994)
Summary
The McGill Pain Questionnaire (MPQ) (Melzack 1975) is an important assessment tool for multidimensional pain measurement in both clinical practice and research. Despite widespread acceptance, empirical analyses have not consistently verified the 3 a-priori factors that guided the subclass construction of the Pain Rating Index (PRI) of the MPQ. This study compared the a-priori model with 2 qualitatively different factor models in 191 patients with oral mucositis pain at 3 days and 10 days following bone marrow transplantation. A semantic model defined by Sensory Action, Sensory Evaluation, and Affective Evaluation factors of subclass descriptor content fit better than the a-priori model and a model positing a single general pain factor. The 3 semantic PRI factors were highly intercorrelated, with the sensory factors correlating more highly with an independent visual analogue scale (VAS) pain measure. Standardized factor regression coefficients between the two occasions of measurement ranged between 0.4 and 0.5. Mean factor change was greatest for Sensory Evaluation and lowest for Affective Evaluation. All analyses were conducted with the LISREL 7 structural equation modeling program. Although the factor analyses indicated an unambiguous ranking of PRI models according to statistical criteria, these theoretical results generalize poorly to simple scores formed by direct addition of the PRI subclasses. Summary scores can only approximate the unobserved factors and cannot retain the fine discriminations revealed by the theoretical factors. Psychometric considerations suggest that a single PRI total score will yield better practical measurement than any scoring rules based on multiple factors.
Key words: McGill Pain Questionnaire; Bone marrow transplantation; Factor analysis; LISREL
* Corresponding author: Gary W. Donaldson, Pain and Toxicity Program, FB-600, Fred Hutchinson Cancer Research Center, 1124 Columbia Street, Seattle, WA 98104, USA.
SSDI 0304-3959(94)00256-8
Introduction
The McGill Pain Questionnaire (MPQ) (Melzack 1975) is widely regarded as the definitive assessment tool for multidimensional pain. The item development for the Pain Rating Index (PRI) of the MPQ derives from the gate control theory of pain (Melzack and Wall 1965; Melzack and Casey 1968), according to which
pain comprises separate sensory, affective, and evaluative components. By constructing subclasses of verbal descriptors for these theoretical dimensions, the MPQ developers sought to operationalize measurement of the pain components as 3 correlated but non-redundant summative scores. Despite widespread acceptance of the MPQ in the field, analyses have not consistently verified the presence of the 3 non-redundant PRI factors predicted by the a-priori model (for review of these exploratory analyses, see Turk et al. 1985; Lowe et al. 1991; Holroyd et al. 1992). Exploratory factor analyses are poorly suited to testing a-priori factor structures. In this paper, I follow the
recent tradition of Turk et al. (1985), Lowe et al. (1991), and Holroyd et al. (1992) in considering confirmatory models that allow direct tests of factorial structure. Turk et al. (1985) found acceptable statistical fit for the a-priori model in 2 different chronic pain populations, and Lowe et al. (1991) reported satisfactory statistical fit in 2 different acute pain populations. Holroyd et al. (1992) found that the a-priori structure for the PRI fit data from a large sample of chronic pain patients marginally better than a single factor, but not as well as a 4-factor model inspired by exploratory factor analysis. Lowe et al. (1991) found factor intercorrelations in the 0.5-0.8 range and concluded that the 3 factors were adequately separated, while Turk et al. (1985) and Holroyd et al. (1992), who reported slightly higher correlations of 0.6-0.8, questioned the discriminant validity of the a-priori factors and scoring. These 3 confirmatory studies indicate that the a-priori 3-factor structure for the PRI can provide an acceptable fit to data from several different populations; other models, however, might fit as well or
better. Just as a convincing experiment requires a credible control group for comparison, persuasive evidence for the validity of a particular model requires comparing its performance against substantively plausible alternatives. In the present study, I compare the fit of the a-priori model to 2 qualitatively different, yet plausible, factor structures for the data. The PRI, despite its theoretical 3-dimensional structure, is often used in practice to obtain a single overall score by summing the subclasses defining the PRI. This common practice suggests that a single-factor model, in which all items load on the same general factor, might fit the data adequately. If the single-factor model fits almost as well as models with multiple factors, one should question whether the additional complexity of the multi-factor models is warranted. Examination of the descriptors composing the PRI subclasses suggests that patient endorsement of subclasses might cluster about 3 underlying semantic dimensions that differ somewhat from the theoretical organization implied by the gate control theory. Specifically, the Temporal, Spatial, Punctate Pressure, Incisive Pressure, Constrictive Pressure, and Traction Pressure descriptors express semantic content of sensory action. The Thermal, Brightness, Dullness, Sensory Miscellaneous, Tension, and Evaluative descriptors connote sensory evaluation. The Autonomic, Fear, Punishment, and Affective Miscellaneous subclasses represent affective evaluation. (It is helpful when examining these groupings to attend to the descriptors themselves, rather than the subclass names, which are useful but not infallible summaries of their semantic content.)
Fig. 1 summarizes the 3 models as alternative path diagrams for the 16 subclasses of the PRI (consistent with common practice, I restrict analysis to the first 16 subclasses of the PRI). The different symbols in this figure distinguish two kinds of variables: observed, measured variables (squares) and unobserved, inferred variables, or factors (circles). The unmeasured factors can be evaluated only through their effects, represented by arrows in Fig. 1, on observed variables. Arrows from factors to variables symbolize factor loading relationships; missing arrows imply factor loadings of zero. Additional arrows pointing toward the variables represent the effects of residual or unique influences independent of the factors, analogous to the error terms in regression equations. Curved bi-directional arrows indicate correlated factors (i.e., an oblique factor analysis solution).
[Fig. 1. Three alternative measurement models for the PRI (panels: Semantic Model, Single Factor Model, A Priori Model). Circles represent unobserved variables, and boxes denote observed subclass scores. Straight arrows indicate hypothesized factor loading relationships and unique variances. Curved arrows denote correlations.]
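To make the three specifications in Fig. 1 concrete, the following minimal Python sketch encodes each model as a loading pattern (any loading not listed is fixed to zero) and counts the degrees of freedom the specification leaves. Two details are assumptions not spelled out in the text: the a-priori model is taken to follow Melzack's (1975) grouping of subclasses 1-10 (sensory), 11-15 (affective), and 16 (evaluative), and, because the evaluative factor then has a single indicator, that subclass's unique variance is assumed to be fixed. Factor variances are fixed to 1, and the subclass identifiers are hypothetical. Under these assumptions the counts agree with the df column of Table I (102, 101, and 104).

```python
# Hypothetical identifiers for the first 16 PRI subclasses, in MPQ order.
SUBCLASSES = [
    "Temporal", "Spatial", "Punctate", "Incisive", "Constrictive", "Traction",
    "Thermal", "Brightness", "Dullness", "SensoryMisc", "Tension",
    "Autonomic", "Fear", "Punishment", "AffectiveMisc", "Evaluative",
]

# Each model maps a factor to the subclasses allowed to load on it;
# any loading not listed is fixed to zero.
SEMANTIC = {
    "SensoryAction": SUBCLASSES[0:6],
    "SensoryEvaluation": ["Thermal", "Brightness", "Dullness",
                          "SensoryMisc", "Tension", "Evaluative"],
    "AffectiveEvaluation": ["Autonomic", "Fear", "Punishment", "AffectiveMisc"],
}
SINGLE = {"GeneralPain": list(SUBCLASSES)}
# Assumption: Melzack's (1975) grouping of subclasses 1-10 / 11-15 / 16.
A_PRIORI = {
    "Sensory": SUBCLASSES[0:10],
    "Affective": SUBCLASSES[10:15],
    "Evaluative": SUBCLASSES[15:16],
}


def degrees_of_freedom(model, n_vars=16):
    """Distinct variances/covariances minus free parameters.

    Free parameters: one loading per listed subclass, one unique variance per
    subclass, and the correlations among factors (factor variances fixed to 1).
    Assumption: a single-indicator factor has its unique variance fixed, since
    its loading and unique variance cannot both be identified.
    """
    n_moments = n_vars * (n_vars + 1) // 2
    n_loadings = sum(len(items) for items in model.values())
    n_uniques = n_vars - sum(1 for items in model.values() if len(items) == 1)
    k = len(model)
    n_factor_corrs = k * (k - 1) // 2
    return n_moments - (n_loadings + n_uniques + n_factor_corrs)


for name, model in [("a-priori", A_PRIORI), ("semantic", SEMANTIC),
                    ("single-factor", SINGLE)]:
    print(f"{name}: df = {degrees_of_freedom(model)}")
```

The semantic and a-priori models use the same number of loadings; in this sketch the one-df difference comes entirely from the single-indicator assumption.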
Methods
Sample
The MPQ data for these analyses were provided by 191 patients who received bone marrow transplants for leukemias and other hematological malignancies during 1987-1989 at Fred Hutchinson Cancer Research Center. The sample was 53% male and 47% female, with a mean age of 35 (range: 18-57). All but 2% of the sample were high school graduates, and 41% were college graduates. The diagnoses varied: 15% acute myelogenous leukemia, 11% acute lymphocytic leukemia, 36% chronic myelogenous leukemia, 15% lymphoma, and 23% other classifications. Preparatory chemoradiotherapy for bone marrow transplantation leaves patients immunosuppressed and vulnerable to adventitious infections that produce severe oral mucositis and pain. Average oral pain between transplant and engraftment closely tracks the course of mucositis during this period (Schubert et al. 1992). Typically, such pain begins 2 or 3 days after transplant, peaks about 1 week later, then gradually subsides (Chapko et al. 1989; Donaldson and Moinpour 1992). Even strong opioids, which are available to patients as needed, cannot completely mask this pain (Chapko et al. 1989). The analyses reported here include complete data (n = 191) obtained 3 days following transplant, and a subsample of 176 patients available 10 days following transplant. The first occasion corresponds to a time when pain is generally increasing, and the second to the typical time of maximum pain.
Procedures
Trained interviewers collected the MPQ measures as part of a larger questionnaire battery administered in the patients’ rooms. The analyses described in this paper also include visual analogue scale (VAS) pain reports, scaled from 0 = no pain to 100 = as much pain as possible, obtained by patient self-report at these same two occasions.¹
¹ All data analyzed in this paper are available on request from the author.
Overview of statistical modeling process
All statistical models were estimated using the LISREL 7 (Jöreskog and Sörbom 1989) structural equation modeling program. In LISREL, models are specified in advance, rather than letting the algorithm find the ‘best’ solution by an empirical criterion. In practice, model specification requires identifying a set of prohibited relationships among factors and variables by fixing their factor loadings to zero. The program then estimates the remaining relationships and provides diagnostic statistics useful in evaluating how well the specified relationships fit the data. Poor fit indicates the model needs revision, while good fit indicates the model may be one (of perhaps many) consistent with the data.
The program was used in 5 stages, with the best model at each stage serving as the starting model for the next stage. In stage 1, I compared the fit of the 3 alternative factor analysis models independently within each occasion and selected the best model according to several diagnostic criteria. In stage 2, the best stage-1 model was applied longitudinally, allowing factors at the second occasion to be determined by corresponding factors (only) at the first occasion. The third stage introduced a few technical modifications of the model to improve fit and parsimony. The fourth stage examined the consistency of factor interpretation by testing the invariance of factor loading patterns at the two occasions. The final stage estimated the mean changes in the unobserved factors.
Clinically useful factors must have detectable and distinguishable clinical consequences. In psychometric terms, proposed factors should have discriminant validity: different patterns of relationships with other criteria. I examined the discriminant validity of the 3 factors in the final model by considering their sensitivity to change and their correlations with contemporaneous VAS pain reports. Different patterns of association with these criteria would provide evidence that substantively distinguishable aspects of pain had been measured.
Results
Stage 1: evaluation of cross-sectional models
Although evaluating qualitatively different models is never straightforward, several diagnostic indices, available as routine output from LISREL 7, help describe model adequacy. Each of these indices assumes a different rationale and stresses different criteria for model evaluation.
χ² statistic. The most common measure of fit reported in confirmatory factor analysis studies is the χ² statistic, which indicates the absolute lack of fit of the model. In large multivariate normal samples, the P value provided by LISREL corresponds approximately to the probability of obtaining a χ² this large or larger (i.e., a model that fits this poorly or more poorly) if the theoretical population model is true. Contrary to conventional significance paradigms, large values of χ²
and small P values indicate poor fit, leading to rejection of the hypothesis that the assumed model is ‘true’. Modifications in models can be evaluated similarly. If the modifications are restrictive, relatively large increases in χ² reject the hypothesis expressed by the restrictions; if the modifications are expansive, relatively large decreases in χ² suggest the modifications improved the model.
Comparative fit index (CFI). A goodness-of-fit index based on a different rationale is given by the comparative fit index (Bentler 1990), or CFI, which ranges from 0 = complete lack of fit to 1.0 = perfect fit. The CFI compares the relative performance of a theoretical model, as measured by the χ² statistic, to that of a standard baseline model. CFI values greater than 0.9 are generally considered adequate.
Akaike information criterion. As one estimates more free parameters, the degrees of freedom available to evaluate model adequacy decrease. Models with zero degrees of freedom will fit the data perfectly, and models with few degrees of freedom will tend to fit very well. Clearly, χ² statistics alone cannot inform about model parsimony. By comparing the χ² statistic against its degrees of freedom, parsimonious fit can be evaluated. One well-respected index for this purpose is the Akaike Information Criterion (AIC) (Akaike 1987), for which large values indicate poor fits and small or negative values correspond to good, parsimonious fits.
Proportion of explained variance. For factor analysis models, it is important to know how much variance in the observed variables is accounted for by the unobserved factors. LISREL provides a coefficient of determination, ranging from 0 to 1, for the observed variables considered jointly. I also report the median, across observed variables, of the squared multiple correlation for each variable considered separately. Both indices correspond to the proportions of variance accounted for by the unobserved factors.
Extreme residuals.
Finally, I report a diagnostic index that tends to be very useful in practical application: the number of standardized residuals for the fitted variances and covariances that exceed 2.00 in absolute
value. One expects, assuming normality, approximately 5% of these residuals to exceed 2.00 with acceptable models. Suspect models are often betrayed by greater numbers of extreme residuals.
Individually, none of the above indices is entirely satisfactory; jointly, they provide a reasonable range of criteria for evaluating model adequacy. To the extent that models rank consistently on these separate criteria, one can be fairly confident in choosing among the models. The results for the 3 alternative models on these criteria are presented, separately for each occasion, in Table I. The criteria indicate an unambiguous rank-ordering of the 3 models. The semantic 3-factor model fits the data best; the single-factor model fits worst, and the a-priori 3-factor model is in between. Accordingly, the semantic 3-factor model was retained for the next phases of analysis.
Stage 2: initial longitudinal model
The cross-sectional semantic model was next extended to both occasions simultaneously in a single LISREL analysis. The primary aim at this stage was estimation of the regression coefficients predicting Day 10 factors from their respective Day 3 counterparts. These coefficients denote the stabilities of individual differences in the factors. It is of considerable interest to know whether the pain factors differ in their stabilities. Factors with high stabilities (near 1.0) are more trait-like and predictable, and should require less frequent repeated measurement in subsequent studies than state-like factors displaying low stabilities (near 0). The fit of the stage 2 longitudinal model was poor: χ² = 919.88 with df = 519.
Stage 3: technical modifications in longitudinal structure
LISREL diagnostics indicated that the longitudinal model fit poorly at stage 2 because the assumption of uncorrelated residuals, both in the factors and in the measured variables, was violated. Longitudinal data frequently violate this assumption, since unexplained individual differences tend to persist across occasions

TABLE I
DIAGNOSTIC CRITERIA FOR 3 ALTERNATIVE CROSS-SECTIONAL MODELS

Model            χ²       df    |ZRSD|>2   Median SMC   Coef. of determ.   CFI      AIC
Day 3 post-transplant
  A-priori       265.59   102   33         0.380        [illegible]        0.857    61.59
  Semantic       223.46   101   23         0.444        0.971              0.893    21.46
  Single-factor  286.94   104   33         0.350        0.908              0.840    78.94
Day 10 post-transplant
  A-priori       175.01   102   15         0.372        [illegible]        0.921    -21.91
  Semantic       138.70   101   12         0.408        0.981              0.959    -63.30
  Single-factor  235.41   104   27         0.335        0.904              0.858    27.41

Note: |ZRSD|>2 = number of standardized residuals exceeding 2.00 in absolute value; SMC = squared multiple correlation; Coef. of determ. = coefficient of determination.

and induce correlations in the error terms of successive measurements. With LISREL, unlike ordinary regression, such patterns of correlated errors can be identified and correctly modeled. A stage 3 model allowed the errors of prediction of the Day 10 factors to intercorrelate, and also allowed the respective errors of measurement for the Autonomic, Fear, Punishment, and Affective Miscellaneous subclasses (i.e., the subclasses measuring Affective Evaluation) to correlate between Day 3 and Day 10. These modifications resulted in a much improved fit, with a new χ² of 671.09 with df = 508 (Δχ² = 248.79, Δdf = 11, P < 0.0001, a highly significant improvement).
Stage 4: factorial invariance across occasions
The previous models maintained the same factor pattern at the two occasions but allowed coefficient magnitudes to vary. Interpretation is simplest when factor loading patterns are completely invariant. The hypothesis of complete invariance, tested by specifying that all factor loadings be equal at the two occasions, yielded a χ² of 704.58 with df = 521 (Δχ² = 33.49, Δdf = 13, P = 0.0014). Since the constraints introduced a significant lack of fit, I rejected the hypothesis of complete factor invariance. Although complete invariance is simplest to interpret, partial invariance allows meaningful estimation of changes in factor means. LISREL diagnostics suggested that the lack of invariance occurred primarily in the Constrictive Pressure, Dullness, and Autonomic subclasses. This degree of partial factor invariance is consistent with sufficiency conditions that Byrne et al. (1989) have described for inferring meaningful factor change. To test for partial factor invariance, I constrained all factor loadings except these 3 to be equal across occasions. This hypothesis was consistent with the data (χ² = 683.77, df = 518; change from stage 3: Δχ² = 12.68, Δdf = 10, P = 0.24).
Stage 5: incorporating factor means
The final model extended the LISREL analysis at stage 4 to include mean data as well as covariance data and allowed the means of the unobserved factors to differ between occasions. This model yielded a χ² of 703 with df = 532, still consistent with the data. Furthermore, changes in the levels of factors (means) were consistent with factor loadings and stabilities (regressions); relative to stage 4, introducing means yielded an insignificant loss of fit: Δχ² = 19.23, Δdf = 14, P = 0.156 (see Mandys et al. (1994) for the rationale for this test).
The factor loadings for the final model appear in Table II. Factor loadings are the estimated regression coefficients for predicting the observed variables from the factors. In Table II, they are standardized for ease of interpretation, but estimation, hypothesis testing, and imposition of equality constraints were carried out with the unstandardized parameters, as required for proper statistical specification. (Because of differences in variances, the descriptive standardized loadings can differ even when the unstandardized loadings are held constant.) The VAS variables are omitted from this table because they were treated as separate factors in the analyses. The Table II semantic model describes loadings of roughly the same magnitude within and across factors

TABLE II
FINAL MODEL: STANDARDIZED LISREL FACTOR LOADINGS

                        Day 3                                     Day 10
Subclass                Sens. Action  Sens. Eval.   Affect. Eval. Sens. Action  Sens. Eval.   Affect. Eval.
Temporal                0.549                                     0.530
Spatial                 0.606                                     0.582
Punctate Pressure       0.691                                     0.647
Incisive Pressure       0.674                                     0.646
Constrictive Pressure*  0.558                                     0.684
Traction Pressure       0.690                                     0.674
Thermal                               0.559                                     0.570
Brightness                            0.529                                     0.550
Dullness*                             0.687                                     0.520
Sensory Misc.                         0.651                                     0.620
Tension                               0.608                                     0.590
Autonomic*                                          0.661                                     0.592
Fear                                                0.586                                     0.697
Punishment                                          0.784                                     0.891
Affective Misc.                                     0.657                                     0.788
Evaluative                            0.753                                     0.727

* Unstandardized loadings (not shown) free to vary across occasions; the unstandardized loadings were constrained to be equal across occasions for all other variables.
Note: blanks denote parameters fixed to zero. All loadings are significantly different from zero at P << 0.00001.

TABLE III
FINAL MODEL: ESTIMATED FACTOR CORRELATIONS (above diagonal), VARIANCES (diagonal), AND COVARIANCES (below diagonal)

Day 3            Sens. Action   Sens. Eval.    Affect. Eval.  VAS Pain
Sensory Action   1.151          0.850          0.910          0.642
Sensory Eval.    0.769          0.711          0.752          0.767
Affect. Eval.    0.730          0.474          0.559          0.508
VAS Pain         17.364         16.311         9.580          637.72

Day 10           Sens. Action   Sens. Eval.    Affect. Eval.  VAS Pain
Sensory Action   1.615          0.882          0.750          0.610
Sensory Eval.    1.147          1.047          0.656          0.655
Affect. Eval.    1.169          0.824          1.504          0.402
VAS Pain         21.432         18.519         13.638         764.15

and occasions. This is usually considered a desirable feature of a robust simple-structure factor analysis, since all indicator variables perform comparably as measures of their respective factors. Most of the loadings are in the 0.5-0.7 range (the largest, for Punishment, is 0.891), so the unobserved factors explain roughly 25-50% (squaring the factor loadings) of the variance of most subclass scores.²
² Readers wishing to compare these factor loadings with those reported by others should note that my Table II results assume standardization of both factors and variables (i.e., what LISREL 7 now calls the ‘completely standardized’ solution). In previous releases of LISREL, the ‘standardized solution’ was only partially standardized. Additional complications attend the interpretation of standardized solutions in multi-sample analyses.
Table III reports estimated covariances and correlations among factors at Day 3 and Day 10. As in conventional factor analysis, these estimates differ from those that would be obtained by correlating summative scores. One notable feature of Table III is the increase in factor variances (and covariances) between occasions. As pain generally increases, patients become more heterogeneous in their reports of pain. Affective Evaluation is particularly striking: its variance increases almost 3-fold. The increase in variance of the VAS pain score is relatively smaller than the increases for the MPQ factors. Correlations among the 3 MPQ pain factors are generally high (particularly at Day 3), ranging from 0.66 to 0.91. Although the correlation between the 2 sensory factors is very high at both Day 3 and Day 10, correlations involving Affective Evaluation are lower at Day 10. At both occasions, the VAS pain score correlates more highly with the sensory factors, particularly with Sensory Evaluation, than with the Affective Evaluation factor.
Table IV presents the regression coefficients for predicting Day 10 factors from Day 3 factors (the stability coefficients). When standardized, the coefficients for the factors are all in the 0.4-0.5 range (0.416-0.521), while the VAS measure has a lower stability (0.296). The relatively low VAS stability is more apparent than real: the 3 factor stabilities correspond to perfectly reliable unobserved factors, but the coefficient for the observed VAS scale is attenuated by measurement error.
Table V displays the estimated changes in factor means between Day 3 and Day 10. To provide a metric for factor comparison, I have computed coefficients of variation (CV) by dividing each estimated factor change by the standard deviation of factor change, a quantity that can be derived from ancillary LISREL output. Larger CVs correspond to greater between-occasion mean changes relative to the inferred standard deviation of factor change scores. As expected for this population, all changes in the semantic factors and the VAS measure were significant. Sensory Evaluation displayed the greatest relative change, with the mean level at Day 10 being higher than the mean level at Day 3 by 0.702 of the standard deviation of change. Affective Evaluation, by the same criterion, increased only half as much (0.355).
Although the stage 5 results appear in separate tables for clarity, all the results in Tables II-V were obtained from a single analysis. In LISREL, all parameter estimates are simultaneous. The results of Tables II-V describe the 3 assumed factors generating the observed patterns of MPQ data, but provide no direct information about practical scoring rules derived from these results. The inferred factors can at best be approximated by observed scores, and multiple solutions for these approximations exist; this is known as the factor score indeterminacy problem (e.g., Mulaik 1972). No consensus exists among

TABLE IV
FINAL MODEL: ESTIMATED REGRESSION (STABILITY) COEFFICIENTS FOR PREDICTING DAY 10 FACTORS FROM DAY 3 FACTORS

                       Coefficient   Standardized coefficient
Sensory Action         0.493         0.416
Sensory Evaluation     0.514         0.424
Affective Evaluation   0.855         0.521
VAS Pain               0.324         0.296

Note: all P << 0.00001.

TABLE V
FINAL MODEL: ESTIMATED INCREASES IN FACTOR MEANS BETWEEN DAY 3 AND DAY 10

                       Increase   P           CV of change (Increase/SD of change)
Sensory Action         0.600      << 0.0001   0.469
Sensory Evaluation     0.711      << 0.0001   0.702
Affective Evaluation   0.288      0.0053      0.355
VAS Pain               17.099     << 0.0001   0.544

factor analysts for resolving this general problem, but factor scores for psychometric subscales have traditionally been estimated by simple addition of indicator variables. Table VI contains correlations and internal consistencies of semantic model factor scores computed in this manner. Since observed scores are somewhat unreliable, correlation magnitudes are lower than those estimated for the unobserved factors; they are nonetheless quite high when compared with their reliabilities, which bound the observed correlations from above (an observed correlation cannot exceed the square root of the product of the two reliabilities).
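The comparison between Table VI and Table III can be made explicit with Spearman's classical correction for attenuation, r_xy / sqrt(r_xx × r_yy), which estimates the correlation between error-free versions of two scores. The sketch below is not an analysis reported in the paper; it simply applies that textbook formula to the Table VI reliabilities and correlations and prints the Table III factor correlations alongside. The short labels SA, SE, and AE are hypothetical abbreviations. Because internal-consistency coefficients and the LISREL factor model define "true score" differently, the corrected values should only be expected to land in the same general range as the factor correlations.

```python
import math

# Table VI: internal-consistency reliabilities of the summed scores.
RELIABILITY = {
    3: {"SA": 0.767, "SE": 0.797, "AE": 0.719},
    10: {"SA": 0.760, "SE": 0.739, "AE": 0.783},
}
# Table VI: correlations among the summed scores.
OBSERVED_R = {
    3: {("SA", "SE"): 0.660, ("SA", "AE"): 0.708, ("SE", "AE"): 0.563},
    10: {("SA", "SE"): 0.673, ("SA", "AE"): 0.611, ("SE", "AE"): 0.549},
}
# Table III: estimated correlations among the unobserved factors.
FACTOR_R = {
    3: {("SA", "SE"): 0.850, ("SA", "AE"): 0.910, ("SE", "AE"): 0.752},
    10: {("SA", "SE"): 0.882, ("SA", "AE"): 0.750, ("SE", "AE"): 0.656},
}


def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's correction: estimated correlation between true scores."""
    return r_xy / math.sqrt(r_xx * r_yy)


for day in (3, 10):
    for (a, b), r in OBSERVED_R[day].items():
        corrected = disattenuate(r, RELIABILITY[day][a], RELIABILITY[day][b])
        print(f"Day {day} {a}-{b}: observed {r:.3f}, "
              f"corrected {corrected:.3f}, factor {FACTOR_R[day][(a, b)]:.3f}")
```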
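The model comparisons at Stages 3 through 5 are χ² difference (likelihood-ratio) tests between nested models: the difference in χ² is referred to a χ² distribution whose degrees of freedom equal the difference in df. A minimal pure-Python sketch follows; it computes the upper-tail probability from the standard recurrence for the regularized upper incomplete gamma function, so no statistics package is needed. The function names are illustrative, not LISREL output. Applied to the values reported in the text, it gives P ≈ 0.0014 for complete invariance, P ≈ 0.156 for adding means, and about 0.24 for the partial-invariance comparison (Δχ² = 12.68 on 10 df).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma function and y = x / 2, starting from
    Q(1, y) = exp(-y) for even df or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_difference(chi2_restricted, df_restricted, chi2_general, df_general):
    """Likelihood-ratio test of a restricted model nested in a more general one."""
    d_chi2 = chi2_restricted - chi2_general
    d_df = df_restricted - df_general
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# (label, restricted chi2, df, more general chi2, df) as reported in the text
COMPARISONS = [
    ("stage 2 vs stage 3", 919.88, 519, 671.09, 508),
    ("complete invariance vs stage 3", 704.58, 521, 671.09, 508),
    ("partial invariance vs stage 3", 683.77, 518, 671.09, 508),
    ("stage 5 (means) vs stage 4", 703.00, 532, 683.77, 518),
]

for label, c_r, df_r, c_g, df_g in COMPARISONS:
    d_chi2, d_df, p = chi2_difference(c_r, df_r, c_g, df_g)
    print(f"{label}: delta chi2 = {d_chi2:.2f}, delta df = {d_df}, P = {p:.4g}")
```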
Discussion
Although the subclasses measured the semantic factors consistently across dynamic conditions, the structure and intensity of the factors changed substantially. The analyses suggest greater separation of factors, particularly in the correlations involving Affective Evaluation, as pain increased from Day 3 to Day 10. The lower levels of pain observed at Day 3 may not reliably elicit separate dimensions of pain. The most notable longitudinal change was a general increase in variance between Day 3 and Day 10, implying that patients varied in how much their pains increased. Moreover, the modest levels (0.4-0.5) of the factor stability coefficients limit the predictability of these increases. Heterogeneity was prominent for Affective Evaluation; patients were almost 3 times more variable on this factor at Day 10 than at Day 3. Despite limited predictability, such variability allows considerable scope for clinical intervention, since consistent treatment could moderate some of the more severe affective responses.
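The variance and stability figures cited here can be checked against Tables III and IV. Because each Day 10 factor is regressed only on its own Day 3 counterpart, a standardized stability is the unstandardized coefficient multiplied by the ratio of the Day 3 to Day 10 standard deviations. The short sketch below (an illustration, not output from the original analysis; the names are hypothetical) applies that rescaling and prints each Day 10/Day 3 variance ratio. It reproduces the standardized column of Table IV and the roughly 2.7-fold variance increase for Affective Evaluation.

```python
import math

# Factor variances at Day 3 and Day 10 (diagonal of Table III) and
# unstandardized stability coefficients (Table IV).
VAR_DAY3 = {"Sensory Action": 1.151, "Sensory Evaluation": 0.711,
            "Affective Evaluation": 0.559, "VAS Pain": 637.72}
VAR_DAY10 = {"Sensory Action": 1.615, "Sensory Evaluation": 1.047,
             "Affective Evaluation": 1.504, "VAS Pain": 764.15}
STABILITY = {"Sensory Action": 0.493, "Sensory Evaluation": 0.514,
             "Affective Evaluation": 0.855, "VAS Pain": 0.324}


def standardized_slope(b, var_x, var_y):
    """Express a simple-regression slope in SD units: b * sd(x) / sd(y)."""
    return b * math.sqrt(var_x / var_y)


for name, b in STABILITY.items():
    ratio = VAR_DAY10[name] / VAR_DAY3[name]
    beta = standardized_slope(b, VAR_DAY3[name], VAR_DAY10[name])
    print(f"{name:22s} variance ratio {ratio:4.2f}  "
          f"standardized stability {beta:.3f}")
```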
All pain dimensions showed substantial mean increases between Day 3 and Day 10, as expected for this population. As proportions of variability, these increases were higher for the sensory dimensions and the VAS score than for Affective Evaluation. This finding underscores the heterogeneity of changes in Affective Evaluation: the variability at Day 10 is large compared with the mean change from Day 3.
Any simple model (i.e., one with simple structure and few factors) fitting these data well must posit highly correlated factors, and many different factor patterns could provide roughly comparable fits. The homogeneous correlations among the PRI items will not incisively differentiate competing models. Fig. 2 illustrates this ambiguity by representing both the a-priori and semantic factor solutions at Day 3 as vectors embedded in 3 dimensions. The axes are the first 3 principal components of the correlation matrix of the PRI items; this coordinate system is an objective ‘hyperspace’ that provides consistent reference points across factor analysis solutions. Vectors close together in space represent highly correlated variables, while orthogonal vectors represent zero correlations. Both solutions place 3 highly oblique dimensions in the interior space of the variables, so that projections from variables fall on a constrained surface area. The placement, however, is ambiguous: the clustering of the variables does not allow definitive visual separation. Minor shifts in the clustering could exert considerable influence on the placement of these dimensions. A single factor, by contrast, could be placed unambiguously along the axis of the first principal component, yielding, at some cost in overall fit, a highly robust and interpretable pain dimension. Fig. 2 also suggests that the 3-factor solutions are rather small subspaces of the space defined by the entire set of variables.
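The claim that a single factor could sit unambiguously on the first principal component can be illustrated with a small power-iteration sketch. The 16 × 16 PRI item correlation matrix behind Fig. 2 is not reported here, so as a stand-in this example uses the 3 × 3 Day 3 factor correlation matrix from Table III; the function names are hypothetical, and power iteration is simply one convenient way to extract a dominant eigenvector. For that matrix the first component carries roughly 89% of the total variance, with nearly equal weights on all three factors.

```python
import math

# Day 3 factor correlations from Table III, ordered Sensory Action,
# Sensory Evaluation, Affective Evaluation. Used only as a small stand-in
# for the 16 x 16 PRI item correlation matrix behind Fig. 2.
R_DAY3 = [
    [1.000, 0.850, 0.910],
    [0.850, 1.000, 0.752],
    [0.910, 0.752, 1.000],
]


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def first_principal_component(m, iters=500, tol=1e-12):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix via power iteration."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v' M v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v


lam, vec = first_principal_component(R_DAY3)
share = lam / len(R_DAY3)  # trace of a correlation matrix = number of variables
print(f"first eigenvalue {lam:.3f}, share of total variance {share:.1%}")
print("component loadings:", [round(math.sqrt(lam) * x, 3) for x in vec])
```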
Restricting analysis to the subspaces allows finer distinctions among the inferred dimensions generating the common variance, or covariance, shared by the variables, but removes the reliable variance specific to each subclass along with random measurement error. Since the development of the PRI subclasses espoused distinctiveness, not overlap, as the main criterion, analysis of shared variance may be inappropriate (Gracely 1992). For the data in Fig. 2, LISREL estimates the shared variance to be only 43%
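
TABLE VI
INTERNAL CONSISTENCY RELIABILITIES (diagonal) AND CORRELATIONS FOR SEMANTIC MODEL SCORES COMPUTED BY SUBCLASS ADDITION

Day 3                 Sensory Action    Sensory Evaluation  Affective Evaluation
Sensory Action        (0.767)
Sensory Evaluation    0.660             (0.797)
Affective Evaluation  0.708             0.563               (0.719)

Day 10                Sensory Action    Sensory Evaluation  Affective Evaluation
Sensory Action        (0.760)
Sensory Evaluation    0.673             (0.739)
Affective Evaluation  0.611             0.549               (0.783)
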
for both solutions. The subclasses contain more reliable information than can be incorporated in an analysis of common variance; moreover, this information enriches (or ‘contaminates’, depending on one’s point of view) scores formed by subclass addition. Extensive concern over the number and nature of common factors may divert attention from broader questions regarding the information content of the MPQ subclasses. There is no ‘correct’ number of factors for these data. One can choose to explain much of the covariance with a single, very stable, factor; or, one can choose to explain more covariance with highly oblique, rather unstable, factors. Either choice leaves most of the variance unexplained.
Factor analysis describes hypothetical constructs that can at best be approximated by adding subclass scores together. Direct scoring of observed subclasses will generally not mirror the patterns defined by factor analysis of shared variance. Fig. 2 shows how these scores ‘drift’ from the positions of the true factors. By summing subclass scores directly, all the unique variance removed by factor analysis is added back in, and observed scores more closely resemble their indicator subclasses than their respective factors. Fig. 2 suggests that as much discrepancy exists between the true factors and scores within a model as between the true factors of the two models.
Most clinicians and investigators use the MPQ, not to answer factor analysis questions about populations, but to obtain pain scores for individuals. Because of the indeterminacies described above, factor analysis methodology provides little practical guidance in defining clinically useful subscales for the PRI. In fact, conclusions based on unobserved factors may have unanticipated consequences when extrapolated to psychometric scores computed by ordinary addition of items. Consider two scoring methods for the PRI, one based on adding together the indicator subclasses for the 3 a-priori factors, the other obtained by simply adding all the subclasses together to form a total score. Can either of these methods seriously mislead? For the single overall score, the answer is ‘Not very much’; for the 3 theoretical subscores, the answer is ‘Possibly quite a bit’.
The overall score unquestionably provides a good, broad measure of pain. In these data, the internal consistency reliabilities of the overall score were 0.88 at Day 3 and 0.87 at Day 10. Given the breadth of item descriptors, these are impressive reliabilities by any standard. At Day 3, the overall score correlated 0.69 with VAS pain report; at Day 10, the correlation was 0.56. This constitutes measurement of the highest quality, with excellent reliability and good validity.
Now consider scoring the PRI to obtain 3 subscores. The rationale for measuring 3 putatively different dimensions of pain is presumably to understand how they operate differently. Gracely (1992), for example, has argued that we should examine affective pain conditional on sensory pain. To pursue this suggestion, assume that clinical interest lies in identifying patients who are high in affective pain but low in sensory pain; formally, this question concerns the discrepancy or difference between two scores. The psychometric properties of the PRI subscores will not support this kind of distinction, since differences become notoriously unreliable as the correlation between the subtracted scores approaches their individual reliabilities. At Day 3, for example, the a-priori scoring of Sensory Pain and Affective Pain yields reliabilities of 0.84 and 0.75, respectively, and a correlation of 0.71. Using conventional formulas for the reliability of a difference score based on its components (Lord and Novick 1968), the reliability of the Sensory-Affective difference is 0.29 (with both subscores standardized, (0.84 + 0.75 - 2 × 0.71)/(2 - 2 × 0.71) = 0.17/0.58 ≈ 0.29), an unacceptable value in any context. When 2 scores overlap
this highly, examining how they differ inevitably exaggerates the effects of measurement error. This is not a unique failing of the a-priori model; similar conclusions hold for scores defined by the semantic model, or for any set of scores whose intercorrelations approach their reliabilities. Unfortunately, these are exactly the kinds of scores produced from data sharing one very strong factor. Even when additional factors significantly improve the statistical fit of the factor analysis model, scores based on them will generally lack discriminant validity (see also Holroyd et al. 1992).
[Figure 2 image: two panels, “A Priori Model” and “Semantic Model”, each plotting subclass, factor and score vectors against principal Components 1, 2 and 3.]
Fig. 2. A-priori and semantic factor models nested within the space of the first 3 principal components. The axes are the first 3 principal components of the PRI subclasses, a frame of reference that is independent of the measurement model. Vectors within this space represent subclasses and factors, and the cosines between vectors equal the corresponding correlations. The shaded interior surfaces are the subspaces of common variance defined by the two factor analysis models; factor loadings are projections from subclasses onto these surfaces in the coordinate system of the factors. The score vectors, defined by adding subclasses, do not coincide with the vectors for factors.
Based on these psychometric considerations, I am inclined to agree with Turk and his colleagues (Turk et al. 1985) that single-factor scoring should generally be used in practice to obtain pain ratings for individuals. The unobserved MPQ factors inferred in theoretical models are poorly approximated by simple summative scoring, and finely drawn conclusions about unobserved factors do not generalize to the observed scores. Furthermore, no scoring approximation, however complex, can improve on the discriminant validity of the inferred factors themselves, which is marginal at best (see Table III).
This paper describes factor analysis results based on one population of patients, and it should not be regarded as a definitive account of the structure of the PRI, which is likely to vary across populations. I do not suggest that the semantic model, which outperformed the a-priori model in these analyses, should now be considered the ‘true’ or ‘best’ factor model for the PRI. I suggest, rather, that the ‘best’ model is unknowable, and that it makes little practical difference anyway: clinicians and investigators will still face the basic scoring choice of treating the PRI either as a single scale or as subscores possessing marginal discriminant validity.
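The difference-score argument can be checked numerically. The short Python sketch below applies the classical reliability formula for a difference score D = X - Y (Lord and Novick 1968) to the Day 3 a-priori values quoted above (reliabilities 0.84 and 0.75, correlation 0.71). It assumes uncorrelated measurement errors and, by default, equal variances for the two subscores. Under the equal-variance assumption the formula reduces to ((r_XX + r_YY)/2 - r_XY) / (1 - r_XY), which reproduces the reported value of 0.29. The function name `difference_reliability` is illustrative only and is not part of any published MPQ scoring procedure.

```python
def difference_reliability(rel_x, rel_y, r_xy, sd_x=1.0, sd_y=1.0):
    """Classical reliability of the difference score D = X - Y.

    rel_x, rel_y: reliabilities of X and Y; r_xy: observed correlation.
    sd_x, sd_y: standard deviations (the defaults assume standardized scores).
    Assumes the measurement errors of X and Y are uncorrelated.
    """
    var_x, var_y = sd_x ** 2, sd_y ** 2
    cov_xy = r_xy * sd_x * sd_y
    # True-score variance of D: with uncorrelated errors, the observed
    # covariance is entirely true-score covariance.
    true_var = rel_x * var_x + rel_y * var_y - 2 * cov_xy
    # Observed variance of D.
    obs_var = var_x + var_y - 2 * cov_xy
    return true_var / obs_var


# Day 3 a-priori Sensory and Affective subscores, as reported in the text.
print(f"{difference_reliability(0.84, 0.75, 0.71):.2f}")  # 0.29

# The reliability of the difference shrinks toward zero as the correlation
# approaches the mean reliability of the two subscores (here 0.795).
for r in (0.0, 0.40, 0.60, 0.71, 0.79):
    print(r, round(difference_reliability(0.84, 0.75, r), 3))
```

Because the numerator is the mean reliability minus the correlation, any pair of subscores whose intercorrelation approaches their reliabilities, as the PRI subscores do, yields a nearly unreliable difference, regardless of which factor model defined the subscores.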
Factor analysis studies of common variance provide one way to understand the information in the PRI, but complementary approaches stressing unique attributes of the subclasses may be at least as valuable (Holroyd et al. 1992). Future multivariate research should encompass methodologies that analyze individual subclasses (e.g., Kass 1980; Breiman 1984; Wasserman 1989; McArthur et al. 1991), as well as further longitudinal study of the factor structure of the PRI in diverse populations. No methodological approach, however sophisticated, can reveal the single best way to use the PRI information for all populations and all hypotheses.
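The earlier point that summing subclasses adds the unique variance back in, so that score vectors drift away from the factor vectors in Fig. 2, can be quantified in the simplest case. The Python sketch below assumes a single-factor model with standardized subclasses, factor variance 1 and mutually uncorrelated unique parts. Under these assumptions the unit-weighted sum S correlates with the factor F at sum(λ) / sqrt(sum(λ)^2 + sum(ψ)), where each uniqueness is ψ = 1 - λ^2. The loadings are hypothetical, not estimates from these data, and the oblique multi-factor models discussed in this paper would require the full factor covariance matrix. A Monte Carlo check under the same model is included.

```python
import math
import random


def sum_score_factor_corr(loadings):
    """Correlation of a unit-weighted sum score S with the common factor F.

    Assumes one factor with variance 1, standardized subclasses, and mutually
    uncorrelated unique parts, so each uniqueness is 1 - loading**2.
    """
    common = sum(loadings)                           # Cov(S, F) = sum of loadings
    unique = sum(1 - lam ** 2 for lam in loadings)   # unique variance re-entering S
    return common / math.sqrt(common ** 2 + unique)  # Var(S) = common**2 + unique


def simulate_corr(loadings, n=20000, seed=1):
    """Monte Carlo estimate of the same correlation under the same model."""
    rng = random.Random(seed)
    fs, ss = [], []
    for _ in range(n):
        f = rng.gauss(0, 1)  # common factor score
        # Each subclass = loading * factor + independent unique part.
        s = sum(lam * f + math.sqrt(1 - lam ** 2) * rng.gauss(0, 1) for lam in loadings)
        fs.append(f)
        ss.append(s)
    mf, ms = sum(fs) / n, sum(ss) / n
    cov = sum((a - mf) * (b - ms) for a, b in zip(fs, ss))
    var_f = sum((a - mf) ** 2 for a in fs)
    var_s = sum((b - ms) ** 2 for b in ss)
    return cov / math.sqrt(var_f * var_s)


loadings = [0.7, 0.6, 0.5, 0.4]  # hypothetical loadings, not estimates from this study
print(round(sum_score_factor_corr(loadings), 3))  # 0.799
print(round(simulate_corr(loadings), 3))          # close to the analytic value
```

Only when every uniqueness is zero does the sum score coincide with the factor; any unique variance pulls the correlation below 1, which is the single-factor analogue of the drift shown in Fig. 2.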
Acknowledgements
This research was supported by National Cancer Institute Grant CA 38552. I am grateful to Nigel E. Bush, C. Richard Chapman and Carol Moinpour for their comments on an earlier draft of this paper.
References
Akaike, H., Factor analysis and AIC, Psychometrika, 52 (1987) 317-332.
Bentler, P.B., Comparative fit indices in structural models, Psychol. Bull., 107 (1990) 238-246.
Byrne, B.M., Muthén, B. and Shavelson, R.J., Testing for the equivalence of factor covariance and mean structures: the issue of partial measurement invariance, Psychol. Bull., 105 (1989) 456-466.
Chapko, M.K., Syrjala, K.L., Schilter, L., Cummings, C. and Sullivan, K.M., Chemoradiotherapy toxicity during bone marrow transplantation: time course and variation in pain and nausea, Bone Marrow Transplant., 4 (1989) 181-186.
Donaldson, G.W. and Moinpour, C.M., Strengthened estimates of individual pain trends in children following bone marrow transplantation, Pain, 48 (1992) 147-156.
Gracely, R.W., Evaluation of multi-dimensional pain scales, Pain, 48 (1992) 297-300.
Holroyd, K.A., Holm, J.E., Keefe, F.J., Turner, J.A., Bradley, L.A., Murphy, W.D., Johnson, P., Anderson, K., Hinkle, A.L. and O’Malley, W.B., A multi-center evaluation of the McGill Pain Questionnaire: results from more than 1700 chronic pain patients, Pain, 48 (1992) 301-311.
Jöreskog, K.G. and Sörbom, D., LISREL 7: a guide to the program and applications (2nd edn.), SPSS, Chicago, IL, 1989.
Lord, F.M. and Novick, M.R., Statistical theories of mental test scores, Addison-Wesley, Reading, MA, 1968.
Lowe, N.K., Walker, S.N. and MacCallum, R.C., Confirming the theoretical structure of the McGill Pain Questionnaire in acute clinical pain, Pain, 46 (1991) 53-60.
Mandys, F., Dolan, C.B. and Molenaar, P.C.M., Two aspects of the simplex model: goodness of fit to linear growth curve structures and the analysis of mean trends, J. Ed. Behav. Stat., 19 (1994) 201-215.
McArthur, D.L., Cohen, M.J. and Schandler, S.L., Rasch analysis of functional assessment scales: an example using pain behaviors, Arch. Phys. Med. Rehabil., 72 (1991) 296-304.
Melzack, R., The McGill Pain Questionnaire: major properties and scoring methods, Pain, 1 (1975) 277-299.
Melzack, R. and Casey, K.L., Sensory, motivational and central control determinants of pain: a new conceptual model. In: D. Kenshalo (Ed.), The Skin Senses, CC Thomas, Springfield, IL, 1968.
Melzack, R. and Wall, P.D., Pain mechanisms: a new theory, Science, 150 (1965) 971-979.
Mulaik, S.A., The Foundations of Factor Analysis, McGraw-Hill, New York, 1972.
Schubert, M.M., Williams, B.E., Lloid, M.E., Donaldson, G. and Chapko, M.K., Clinical assessment scale for the rating of oral mucosal changes associated with bone marrow transplantation, Cancer, 69 (1992) 2469-2477.
Turk, D.C., Rudy, T.E. and Salovey, P., The McGill Pain Questionnaire reconsidered: confirming the factor structure and examining appropriate uses, Pain, 21 (1985) 385-397.
Wasserman, P., Neural Computing, Theory, and Practice, Van Nostrand Reinhold, New York, 1989.