Bifactor structure of clinical disability in relapsing multiple sclerosis


Multiple Sclerosis and Related Disorders (2014) 3, 176–185


Eric Chamot a,*, Ilya Kister b,1, Garry R. Cutter c,2

a Department of Epidemiology, University of Alabama at Birmingham School of Public Health, 1665 University Blvd, Suite 217H, Birmingham, AL 35294-0022, USA
b NYU-Multiple Sclerosis Care Center, Department of Neurology, NYU School of Medicine, 240 East 38th Street, NY 10016, USA
c Department of Biostatistics, UAB School of Public Health, 1665 University Blvd, Suite 410B, Birmingham, AL 35294-0022, USA

Received 8 November 2012; received in revised form 1 March 2013; accepted 7 June 2013

Keywords: Multiple sclerosis; Disability evaluation; Outcome measure; Multiple Sclerosis Functional Composite (MSFC); Factor analysis; Bifactor model

Abstract

Background: Multiple sclerosis (MS) can affect virtually every neurological function, which complicates the conceptualization and assessment of disability. Similar challenges are encountered in other medical fields, including child cognitive development and psychiatry. In these disciplines, recent progress in diagnosis and outcome measurement has been achieved by capitalizing on the bifactor model.
Objective: To present, in accessible terms, an application of bifactor confirmatory factor analysis to the study of clinical disability outcomes in MS.
Methods: Data included 480 assessments of 301 patients with relapsing–remitting MS who participated in the North American interferon beta-1a (Avonex) clinical trial. Measures consisted of the Expanded Disability Status Scale (EDSS), the three components of the Multiple Sclerosis Functional Composite (MSFC), and five other clinical measures of neurological function. We determined which of three confirmatory factor analysis models (unidimensional, multidimensional, and bifactor) best described the structure of the data.
Results: EDSS scores ranged from 0 to 8 (94% between 0 and 4). The final bifactor model fitted the data well, explained 59.4% of total variance, and provided the most useful representation of the data. In this model, the nine measures defined a scoring dimension of global neurological function (63.1% of total composite score variance) and two auxiliary dimensions of extra variability in leg and cognitive function (17.1% and 9% of total composite score variance, respectively).
Conclusion: Bifactor modeling is a promising approach for furthering understanding of the structure of disability in MS and for refining composite measures of global disability.
© 2013 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +1 205 934 7176; fax: +1 205 934 8665.
E-mail addresses: [email protected] (E. Chamot), [email protected] (I. Kister), [email protected] (G.R. Cutter).
1 Tel.: +1 212 598 6305.
2 Tel.: +1 205 975 5048.

2211-0348/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.msard.2013.06.005

1. Introduction

Multiple sclerosis (MS) can affect nearly every neurological system.

This characteristic of the disease considerably complicates the conceptualization and measurement of disability (Rudick et al., 1996). The Expanded Disability Status Scale (EDSS; Kurtzke, 1983) and the Multiple Sclerosis Functional Composite (MSFC; Cutter et al., 1999) are two of the disability outcome measures most widely used in MS clinical trials, but both have well-known limitations and lack simple interpretation (Hobart et al., 2000; Fox et al., 2007; Cohen et al., 2012). In its current form, the MSFC gives equal weight to the different domains of disability and thus does not attempt to account for the variable degree of association among disability domains. Part of the confusion and disagreement about what EDSS and MSFC scores represent could be alleviated with a composite measure of global neurological disability that does not over- or under-represent specific domains of disability. The measurement problems associated with multi-faceted, not directly observable constructs such as disability in MS are encountered in other medical fields, where rapid progress has recently been made by capitalizing on applications of the bifactor model in factor analysis (FA) and item response theory (IRT) analysis (e.g., in psychiatry, Simms et al., 2012; pediatric cognitive development, Willoughby et al., 2010; quality of life, Reeve et al., 2007). In this manuscript, we present an intuitive introduction to the fundamentals of confirmatory factor analysis (CFA), with special emphasis on bifactor CFA. We also present a substantive example of how these methods can be used as a starting point for the development of unconfounded composite outcome measures of global neurological disability in MS.

2. Materials and methods

In Table 1 we provide a list of crude rules of thumb to guide the interpretation of key assumptions, methodological options, and study results.

2.1. Introduction to confirmatory factor analysis (CFA) and bifactor CFA

FA can be defined as a modeling technique used to study latent variables, or factors, such as global neurological disability, executive function, or depression, that are conceptualized as existing on a continuum but cannot be directly observed or measured. Under both its exploratory (EFA) and confirmatory (CFA) forms, FA assumes that patients' scores on a pool of observable proxy measures, or indicators, reflect patients' positions on one or several underlying factors. FA's purpose is thus to infer the characteristics of the factors and their scales of measurement from the variances and covariances among the observable measures. EFA is primarily descriptive, whereas in CFA pre-specified hypotheses about the factors and their connections to the indicators are tested against the data. We first describe CFA and then bifactor CFA (specifically as implemented in version 6 of the Mplus software package; Muthén and Muthén, 1998–2011).


Table 1 Guide to interpretation of assumptions, methods, and results.

Topic | Rule of thumb

Correlations among indicator measures for FA to be worth performing
  Proportion of correlations greater than 0.30 | ≥50%
  Correlations suggesting indicator redundancy (to be avoided) | >0.80

Normality of indicator measures
  Acceptable skew | ≤2.0
  Acceptable kurtosis | ≤7.0

Essential unidimensionality versus multidimensionality
  EFA: ratio of the variance explained (i.e., "eigenvalue") by the first factor extracted to that explained by the second factor extracted | >4.0
  Bifactor CFA | Correlations between the general factor and the indicator measures are >0.30 and substantially greater than their counterparts on the secondary factors

CFA model estimation
  Continuous indicators | Maximum likelihood
  At least one ordered-categorical indicator | Robust weighted least squares

Criteria of CFA model fit
  Chi-square goodness-of-fit test | P>0.05
  Root Mean Square Error of Approximation (RMSEA) | ≤0.05–0.08
  Comparative Fit Index (CFI) | ≥0.90–0.95
  Tucker–Lewis Index (TLI) | ≥0.90–0.94
  Mean absolute residual correlation | <0.10

Interpretation of standardized loadings (indicator–factor correlations)
  Trivial-to-weak | 0.20–0.39
  Moderate | 0.40–0.60
  Strong | >0.60

Interpretation of between-factor correlations
  Weak | <0.40
  Moderate | 0.40–0.60
  Strong | >0.60

Interpretation of indicator residual variance
  Large | >50%

Note: The above guidelines are not fully settled and do not apply to all situations. Alternative methods or more sophisticated criteria are sometimes needed, in particular to determine the degree of unidimensionality of a set of indicator measures. References: Beauducel and Herzberg (2006), Bollen (1989), Cook et al. (2009), Jackson et al. (2009), Norman and Streiner (2008), Reeve et al. (2007), Reise et al. (2007), Reise et al. (2013), and Schmitt (2011).


CFA has close ties with simple linear regression. Fig. 1a and Eq. (1) below illustrate this relationship for the simplest, unidimensional case, in which each of a set of hypothetical indicator measures (e.g., a battery of cognitive neuropsychological tests) is influenced by only one common factor that one is trying to estimate (e.g., global cognitive function). For each indicator, a linear regression equation is specified so that

Observed score = intercept + (loading × factor score) + residual error   (1)

"Loading" is a usual regression coefficient defining the metric of measurement; that is, by how much the score on the indicator is expected to change for a one-unit change in factor score. When factor scores are assumed to be normally distributed with a mean of zero and a variance of one, as is generally the case, patients located at the mean on the factor have an average rating on an indicator equal to the intercept for this indicator. Finally, when all the indicators are standardized (i.e., converted to z-scores), the resulting "standardized loadings" represent correlations between the indicators and the common factor.
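In symbols (our restatement of Eq. (1) under the standardizations just described; the notation is ours, not the authors'):

```latex
% Unidimensional CFA for a standardized indicator z_{Y_i} and a factor \eta with mean 0 and variance 1,
% assuming the residual \varepsilon_i is uncorrelated with \eta.
z_{Y_i} = \lambda_i \, \eta + \varepsilon_i, \qquad
\operatorname{Var}(\varepsilon_i) = 1 - \lambda_i^{2}, \qquad
\operatorname{Corr}(Y_i, \eta) = \lambda_i .
```

This is why, in Figs. 2–4 below, the variance of a measure explained by a factor equals one minus its residual variance.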


In Fig. 1a–d, data are assumed to be available for six observable measures (Y1–Y6), each represented by a horizontal bar with length equal to the variance of the measure (identical for all measures, for simplicity). In Fig. 1a (unidimensional case), the common factor G explains a large fraction (black) of the variance of each measure. Since the variance of a measure is generally interpreted as the amount of information that it provides, there is little information in each measure that is not explained by the factor (residual variance, white). Under these and other assumptions of classical test theory, an "MSFC-like" composite of the six indicators would have the properties of a unidimensional rating scale measuring G with greater accuracy and reliability than any of the indicator measures individually. Group differences in composite scores would also have a meaningful interpretation once the metric of the composite is defined. Fig. 1b presents one of several possible forms of multidimensional CFA models, in which most of the variance of the measures Y1 and Y2 is explained by factor F1, whereas most of the variance of the measures Y3–Y4 and Y5–Y6 is explained by the factors F2 and F3, respectively. In contrast with Fig. 1a, only a negligible fraction of the information in each measure could have been explained by a factor common to all six measures (black). As a result, the correlations among the three factors are negligible, and a composite of the six measures would amount to mixing apples, carrots, and milk in variable proportions to obtain a single score of "thick or thin puree".


Fig. 1 Comparison of unidimensional, multidimensional, and bifactor confirmatory factor analysis models. Model 1a—Unidimensional: Yi = intercepti + loadingi(G) + residual errori. Model 1b—Multidimensional, uncorrelated: Yi = intercepti + loadingF1,i(F1) + loadingF2,i(F2) + loadingF3,i(F3) + residual errori. The variance that could be explained by a general factor is very small; as a result, there are no meaningful correlations among the factors F1, F2, and F3. Model 1c—Multidimensional, correlated: Yi = intercepti + loadingF1,i(F1) + loadingF2,i(F2) + loadingF3,i(F3) + residual errori. The variance that could be explained by a general factor is substantial. This translates into noticeable correlations among factors (double-headed, curved arrows). Model 1d—Bifactor (restricted): Yi = intercepti + loadingG,i(G) + loadingF1′,i(F1′) + loadingF2′,i(F2′) + loadingF3′,i(F3′) + residual errori. This represents an alternative way to model the data represented in model 1c: a general factor G is introduced into the model. Note: Each horizontal bar represents the variance of one of six indicator measures Y1–Y6. For convenience, the measures are assumed to have been standardized to have equal variance. Variables enclosed in circles are factors (i.e., latent variables). Arrows indicate correlations (standardized loadings) between factors and indicator measures. Bar shading distinguishes variance explained by the factors F1, F2, and F3, variance explained by the general factor G, and residual/unexplained variance. F1, F2, and F3 are usual common factors; F1′, F2′, and F3′ are the specific/auxiliary factors corresponding to F1, F2, and F3 in the bifactor model after the variance explained by the general factor G is parsed out. See Section 2.1 for further details.

In a randomized clinical trial, a difference in mean score between the treatment groups would indicate that there is more "puree" in one group than in the other, but there would be no clear answer to the question of what constitutes a minimal important difference score. Furthermore, because the components of the composite are confounded with each other, inconsistent conclusions would be reached across studies about which patient characteristics independently predict higher scores. The structure of measurement described in Fig. 1c combines elements of Fig. 1a and b. Like the structure of disability in MS as previously described, it is neither nearly unidimensional nor strictly multidimensional. CFA suggests that three factors are needed to explain the variances and covariances among the six measures, but the model does not explicitly recognize that a substantial fraction of the variance of each measure (black) could be explained by a common factor; instead, the model allows for noticeable correlations among the factors (represented by the double-headed, curved arrows joining each pair of factors). In this case, the three components (F1, F2, and F3) of the multidimensional composite of the six measures would still be confounded with each other, though to a lesser degree than in Fig. 1b owing to the correlations among factors. In Fig. 1d, we assume that a bifactor CFA model is fitted to the data that yielded model 1c (Gibbons and Hedeker, 1992). The most important difference between models 1c and 1d is that a general factor G is specified in model 1d to account for the important fraction of the variance of each measure represented in black in model 1c. In the bifactor model 1d, three specific or auxiliary factors F1′, F2′, and F3′ each explain an additional fraction of the variance of the measures in the doublets Y1–Y2, Y3–Y4, and Y5–Y6. In Fig. 1d, the bifactor model is "restricted", which means that the correlations among the four factors are constrained to be zero (no curved arrows joining the factors). In other words, the factors F1′, F2′, and F3′ in Fig. 1d represent the part of the reliable variances of F1, F2, and F3 in Fig. 1c that is not explained by the general factor G. The concept of the bifactor model was conceived in the 1930s (Holzinger and Swineford, 1937), but it is only in the last few years that its unique potential for the study and measurement of broad, multi-faceted constructs has been fully recognized (Reise, 2012). In psychiatry, the bifactor approach has been used to investigate the lack of separation among DSM-IV mental disorders (Regier et al., 2009). Simms et al. (2012) have shown, for example, that the substantial overlap in symptom ratings among patients with syndromes of depression, anxiety, and somatization could be advantageously represented with a restricted bifactor model similar to that of Fig. 1d. The main appeal of the model was its capacity to decompose the variance in indicator measures of depression, anxiety, and somatization into independent and easily interpretable components: the general factor captured the overall severity of general psychological distress, whereas the specific factors estimated uncorrelated residual variations in depressive, anxiety, and somatic symptoms. These results suggested that it might be possible to compare patients on an unambiguous scale of overall psychological distress, using a well-defined, unidimensional definition of minimal important score difference, and to simultaneously classify them into diagnostic categories based on their profiles of residual variability in psychiatric symptoms.
The study also showed that predictors of general psychological distress severity could be isolated from predictors of excess severity for specific symptom domains.

In line with these results, we hypothesized that neurologic disability in MS can be decomposed into a general component of global disability and uncorrelated components of extra variability in specific domains of neurologic dysfunction.
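To make the variance decomposition of Fig. 1d concrete, the small R simulation below (our illustration only; the sample size, loadings, and variable names are arbitrary) generates six indicators from one general factor and three orthogonal doublet-specific factors and displays the resulting correlation pattern:

```r
# Simulate data with the restricted bifactor structure of Fig. 1d:
# every indicator loads on a general factor G; pairs of indicators
# (Y1-Y2, Y3-Y4, Y5-Y6) additionally load on orthogonal specific factors.
set.seed(123)
n <- 1000
G <- rnorm(n)                      # general factor
S <- replicate(3, rnorm(n))        # three specific factors, uncorrelated with G
lam_g <- 0.6                       # loading on the general factor
lam_s <- 0.5                       # loading on the specific factor
spec  <- rep(1:3, each = 2)        # which specific factor each indicator uses
Y <- sapply(1:6, function(i)
  lam_g * G + lam_s * S[, spec[i]] +
    rnorm(n, sd = sqrt(1 - lam_g^2 - lam_s^2)))  # unit-variance indicators
colnames(Y) <- paste0("Y", 1:6)

round(cor(Y), 2)
# Indicators within a doublet correlate ~ lam_g^2 + lam_s^2 = 0.61;
# indicators from different doublets correlate ~ lam_g^2 = 0.36.
```

All indicators are positively correlated because of the shared general factor, while the extra within-doublet correlation is what the specific (auxiliary) factors absorb.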

2.2. Sample and observed measures

Data for this proof-of-concept study included baseline and 2-year follow-up assessments of 301 men and women with relapsing–remitting MS (RRMS) recruited in 1990–1993 into the North American interferon beta-1a clinical trial (a subset of the original pooled database assembled by the Task Force on Clinical Outcomes Assessments for the development of the MSFC; Jacobs et al., 1996; Fischer et al., 1999). The rationale for choosing this dataset was that participants were evaluated with the EDSS and multiple clinical measures of cognitive, hand, and leg function, including the three components of the MSFC—the timed 25-foot walk test (T25FW), the nine-hole peg test (9-HPT), and the 3-second paced auditory serial addition test (PASAT)—as well as the ambulatory index (AI), the gait speed test (GST), the box and blocks test (BBT), the symbol digit modalities test (SDMT), and the total score on the controlled oral word association test (COWAT). All patients had an EDSS of 1 to 3.5 at baseline. Two-year follow-up observations were recorded for 179 subjects (60%). Approval for this secondary analysis was obtained from the institutional review board of the University of Alabama at Birmingham (rather than from the board that oversaw the original trial).

2.3. Preliminary analyses

We used EFA and hierarchical item cluster analysis (Revelle, 1979) to obtain a preliminary description of the data structure and degree of multidimensionality (Reeve et al., 2007; Lai et al., 2006). Providing an introduction to these exploratory techniques is beyond the scope of this paper (see Norman and Streiner (2008) and Studerus et al. (2010) for elementary descriptions). Analyses were conducted with the psych package (Revelle, 2010) for the statistical software R (R Development Core Team, 2011).
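For readers who want to reproduce this type of preliminary screening, a minimal psych sketch follows (the data frame `dat`, holding the nine transformed clinical measures, and the number of factors are our assumptions; the authors' exact scripts are not reproduced in the paper):

```r
library(psych)

# 'dat' is a hypothetical data frame with the nine (transformed) clinical measures.
R <- cor(dat, use = "pairwise.complete.obs")

# Ratio of the first to the second eigenvalue (see Table 1 for interpretation)
ev <- eigen(R)$values
ev[1] / ev[2]

# Exploratory factor analysis and hierarchical item cluster analysis (Revelle, 1979)
fa(R, nfactors = 3, n.obs = nrow(dat), rotate = "oblimin")
iclust(R)
```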

2.4. A priori models

We compared the following three CFA models in terms of fit to the data: (1) a unidimensional model in which all nine observed measures defined a single factor of neurological disability (similar to Fig. 1a); (2) a multidimensional model including three correlated factors of arm, leg, and cognitive dysfunction (similar to Fig. 1c); and (3) a restricted bifactor model in which all nine measures contributed to a general factor of global disability and subgroups of the same nine measures also contributed nontrivially to specific (auxiliary) factors of leg, hand, and cognitive dysfunction (similar to Fig. 1d). Composite measures are rarely strictly unidimensional, because they virtually always include observed measures influenced by more than one underlying factor. Should a unidimensional model fail to fit the observed data, it is important to evaluate whether the data can be considered essentially unidimensional (i.e., unidimensional to a first approximation; Reise et al., 2007).



In this study, this amounted to judging whether the domain-specific factors not accounted for in the unidimensional model were sufficiently weak so as not to introduce clinically important bias in the single factor. The multidimensional model of neurological disability assumed that the nine observed measures had too little in common to be the manifestation of a single factor; instead, three correlated factors (arm, leg, and cognitive) were assumed to be necessary to account for the heterogeneity in the levels of correlation among the observed measures. Finally, the bifactor model was seen as an alternative to both the essentially unidimensional and the multidimensional models (Reise et al., 2007).
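A sketch of how these three candidate structures could be written in lavaan model syntax is given below. The authors fitted their models in Mplus; lavaan is used here purely for illustration, the lower-case variable names are hypothetical column names, and the assignment of indicators to factors (in particular the cross-loadings of the EDSS) is our reading of the Results section rather than the authors' published code:

```r
# (1) Unidimensional: one factor of neurological disability (cf. Fig. 1a)
m_unidim <- '
  g =~ pasat + sdmt + cowat + bbt + nhpt + edss + ai + t25fw + gst
'

# (2) Three correlated factors of cognitive, arm, and leg dysfunction (cf. Fig. 1c)
m_threefactor <- '
  cognitive =~ pasat + sdmt + cowat + edss
  arm       =~ bbt + nhpt + edss
  leg       =~ edss + ai + t25fw + gst
'

# (3) Restricted bifactor: a general factor plus specific factors (cf. Fig. 1d);
#     fitted with orthogonal = TRUE so that all factor covariances are fixed to zero.
m_bifactor <- '
  g         =~ pasat + sdmt + cowat + bbt + nhpt + edss + ai + t25fw + gst
  cognitive =~ pasat + sdmt + cowat
  hand      =~ bbt + nhpt
  leg       =~ edss + ai + t25fw + gst
'
```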

2.5. Model estimation

A robust weighted least squares procedure is generally recommended when the indicators include ordered-categorical measures such as the EDSS and the AI. We used the weighted least squares with mean and variance adjustment (WLSMV) estimator in Mplus. All CFA models were adjusted for possible correlations of residual errors between baseline and follow-up measurements (i.e., the parameters for these correlations were freely estimated instead of being fixed to zero by default).
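In lavaan terms (again an illustration under the assumptions above, not the authors' Mplus setup), estimation with a robust weighted least squares estimator and ordered-categorical EDSS and AI would look roughly as follows; the freeing of baseline–follow-up residual correlations described above is omitted from the sketch for brevity:

```r
library(lavaan)

# 'dat' is the hypothetical analysis data set; 'm_bifactor' is the syntax sketched in Section 2.4.
fit_bifactor <- cfa(m_bifactor,
                    data       = dat,
                    ordered    = c("edss", "ai"),  # treated as ordered categorical
                    estimator  = "WLSMV",          # robust weighted least squares
                    orthogonal = TRUE,             # restricted bifactor: uncorrelated factors
                    std.lv     = TRUE)             # factors scaled to unit variance
```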

2.6. Assessment of model fit and degree of unidimensionality

Many indicators of CFA model fit have been proposed, but all have limitations; therefore, we examined several of them, as recommended (Jackson et al., 2009). We used the chi-square goodness-of-fit test corrected for WLSMV estimation (a nonsignificant result suggests acceptable fit), the Root Mean Square Error of Approximation (RMSEA), the Tucker–Lewis Index (TLI), the Comparative Fit Index (CFI), and the mean absolute residual correlation criterion (Reeve et al., 2007). Several complementary criteria have also been proposed to inform decisions about the degree of unidimensionality versus multidimensionality of a set of measures. The criteria reported in this article are described in Table 1.
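The fit criteria of Table 1 can be pulled from a fitted lavaan object along these lines (a sketch continuing the hypothetical `fit_bifactor` object above; scaled versions of the indices are shown because of the WLSMV estimator):

```r
# Global fit indices (scaled/robust versions under WLSMV)
fitMeasures(fit_bifactor,
            c("chisq.scaled", "df.scaled", "pvalue.scaled",
              "rmsea.scaled", "cfi.scaled", "tli.scaled"))

# Mean absolute residual correlation
res <- residuals(fit_bifactor, type = "cor")$cov
mean(abs(res[lower.tri(res)]))
```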


3. Results

3.1. Sample characteristics and data transformations

In the sample, 73.4% of patients were women. At baseline, mean age was 37.7 years (SD, 7.4) and mean disease duration 6.5 years (SD, 5.8). Because the sample size was relatively small, we used all the baseline and follow-up data available in the analyses. Baseline distributions of patient characteristics were comparable in the groups with and without follow-up data (sex ratio: 0.36 versus 0.29, P=0.30; disease duration: 6.3 versus 6.9 years, P=0.39; other characteristics: not shown). Because of the sparse numbers of high EDSS and AI scores, we collapsed EDSS scores of 4–4.5 and scores ≥5 into two categories. Similarly, AI scores ≥4 were collapsed into a single category. Count measures (e.g., PASAT) were treated as continuous. T25FW and 9-HPT scores were inverse transformed, and GST scores log transformed, to achieve acceptable levels of normality (Beauducel and Herzberg, 2006; Table 1). Before CFA, all continuous scores were standardized (mean of zero, unit variance) and coded so that a higher score indicates a higher level of disability. After transformation, most correlations among observed measures were low to moderate, but 21 out of 36 were greater than 0.30 (Tables 1 and 2). The highest correlations were among measures assessing the same neurological function (cognitive, hand, or leg). EDSS scores correlated moderately with the measures of hand and leg function, but only marginally with the measures of cognitive function. The lowest correlations were between measures of cognitive and leg function.
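A sketch of these transformations in R (hypothetical column names; category boundaries follow the description above, and the direction of each sign flip is left unspecified because it depends on each test's scoring):

```r
# Collapse sparse upper categories of the ordinal measures (EDSS, AI)
dat$edss <- ifelse(dat$edss >= 5, 5, ifelse(dat$edss >= 4, 4, dat$edss))  # 4-4.5 and >=5 pooled
dat$ai   <- pmin(dat$ai, 4)                                               # AI scores >=4 pooled

# Normalizing transformations for the skewed timed tests
dat$t25fw <- 1 / dat$t25fw   # inverse-transformed walk time
dat$nhpt  <- 1 / dat$nhpt    # inverse-transformed peg-test time
dat$gst   <- log(dat$gst)    # log-transformed gait speed test

# z-score the continuous measures, flipping signs where necessary so that a
# higher score always indicates greater disability (flips not spelled out here).
cont <- c("pasat", "sdmt", "cowat", "bbt", "nhpt", "t25fw", "gst")
dat[cont] <- lapply(dat[cont], function(x) as.numeric(scale(x)))
```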

3.2. Preliminary analyses

In EFA, the first factor extracted accounted for 42.4% of total variability. The ratio of the variability explained by the first factor extracted to that explained by the second factor extracted was 2.2, a result suggesting that the data were unlikely to be strictly, or even essentially, unidimensional (see Table 1 and Norman and Streiner, 2008 for interpretation).

Table 2 Correlations among clinical outcome measures.

         PASAT   SDMT    COWAT   BBT     9-HPT   EDSS    AI      T25FW
SDMT     0.557
COWAT    0.467   0.415
BBT      0.400   0.453   0.363
9-HPT    0.302   0.424   0.245   0.741
EDSS     0.183   0.270   0.091   0.425   0.479
AI       0.164   0.226   0.061   0.455   0.450   0.643
T25FW    0.145   0.171   0.023   0.342   0.290   0.483   0.657
GST      0.113   0.134   0.032   0.329   0.292   0.399   0.599   0.404

Note: Differences with previously published tables (Fischer et al., 1999; Cutter et al., 1999) are due to data transformations and differences in methods of correlation calculation. For maximum model accuracy, the weighted least squares with mean and variance adjustment (WLSMV) estimation procedure (see Section 2.5) calculates correlations between pairs of variables differently depending on whether both measures are continuous (Pearson's correlation), one measure is continuous and the other categorical (polyserial correlation), or both measures are categorical (polychoric correlation).



Fig. 2 Unidimensional CFA model of neurological disability in RRMS. Note: S.E. stands for standard error. All the performance measures were scaled to have variance one; therefore, the variance of a performance measure explained by the factor equals one minus the residual variance for this measure. The standardized loadings are interpreted as correlations between the factor and each of the performance measures. By design, no intercept was significantly different from zero.

Instead, all preliminary analyses supported a multidimensional model with two or more correlated factors, or a bifactor model.

3.3. Unidimensional CFA model

As expected, the one-factor CFA model (Fig. 2) provided poor fit (chi-square test, P<0.001; RMSEA, 0.144; CFI, 0.720; TLI, 0.694). Only BBT, 9-HPT, EDSS, and AI had high correlations with the factor (i.e., "standardized loadings" in FA parlance) of about 0.70 or higher. The factor explained 36.9% of total variance in the clinical measures. The pattern of residual correlations indicated that a substantial fraction of the covariation between measures of cognitive function and arm function was not accounted for by the model (mean absolute residual correlation, 0.122).

3.4. Three-factor CFA model

Fig. 3 displays the solution for the three-factor CFA model. Model fit was excellent (chi-square test, P=0.21; RMSEA, 0.020; CFI, 0.995; TLI, 0.994). As hypothesized, PASAT, SDMT, and COWAT had significant correlations with the first factor (cognitive), BBT and 9-HPT with the second factor (arm), and AI, T25FW, and GST with the third factor (leg). EDSS correlated moderately with the leg factor (0.52; P<0.001), weakly with the arm factor (0.25; P<0.001), and not at all with the cognitive factor (−0.02; P=0.82). We observed moderate-to-strong correlations between the arm factor and the other two factors (arm–cognitive, 0.61; arm–leg, 0.55), and only a weak correlation between the leg and cognitive factors (0.25). In combination, the number of large measure–factor correlations (i.e., ≥0.70), the relatively high correlations of the arm factor with the other two factors, and the correlations of EDSS with the arm and leg factors suggested that ratings on the nine observed measures were not strictly multidimensional. The three factors explained 57.8% of total variance.

3.5. Bifactor CFA model

The restricted bifactor model with three auxiliary factors of cognitive, hand, and leg function (i.e., similar to Fig. 1d) failed to converge.


Fig. 3 Three-factor CFA model of neurological disability in RRMS. Note: see Fig. 1 note for interpretation. In Cutter et al. (1999), the EDSS correlated 0.52 with the T25FW, 0.33 with the 9-HPT, and 0.23 with the PASAT.

However, an excellent solution was obtained with a general factor of global neurological function and only two auxiliary factors of cognitive and leg function (Fig. 4). This bifactor representation of the nine observed measures fitted the data significantly better than the unidimensional model (P<0.001) and as well as the three-factor model (chi-square test of model fit, P=0.15; RMSEA, 0.024; TLI, 0.991; CFI, 0.994). The two measures of arm performance were very strongly correlated with the general factor (BBT, 0.91; 9-HPT, 0.86) and thus did not constitute a third specific factor. Correlations of observed measures with the general factor ranged from 0.29 to 0.91 (median, 0.49), whereas correlations ranged from 0.51 to 0.70 (median, 0.54) with the cognitive factor and from 0.47 to 0.83 (median, 0.53) with the leg factor. Only EDSS had a stronger correlation with the general factor than with its secondary factor (Table 1); the reverse was true for PASAT, SDMT, COWAT, AI, GST, and T25FW. The model explained 59.2% of total variance in the data and 89.2% of the variance of the composite sum score of the nine measures. More specifically, 63.1% of composite sum score variance was accounted for by the general factor, 17.1% by the leg factor, and 9.0% by the cognitive factor; 10.8% was unexplained. In other words, the composite score, as an estimate of global neurological disability, was "70.1% unidimensional" (63.1 divided by 63.1+17.1+9.0; Zinbarg et al., 2005) and "29.9% confounded by residual variations in leg and cognitive dysfunction" (Reise et al., 2013).
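For readers who want to trace the arithmetic, the "percent unidimensional" figure is simply the general factor's share of the explained composite variance. The sketch below (our illustration, using the rounded percentages quoted above) reproduces the calculation; with these rounded inputs the ratio comes to about 70.7%, close to the 70.1% reported, presumably because the published figure is based on unrounded estimates:

```r
# Share of explained composite-score variance attributable to the general factor
# (values are the rounded percentages reported in the text above).
general   <- 63.1  # general factor
leg       <- 17.1  # specific leg factor
cognitive <-  9.0  # specific cognitive factor

round(100 * general / (general + leg + cognitive), 1)            # ~70.7
round(100 * (leg + cognitive) / (general + leg + cognitive), 1)  # ~29.3
```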

3.6. Correlations among individual score estimates

Because of the substantial proportion of patients lost to follow-up, correlations among the different individual score estimates were calculated at baseline only. As expected, individual scores on the specific dimensions of the bifactor model were not significantly correlated with each other (the observed weak correlations are in part random and in part an artifact of the method of estimation; DiStefano et al., 2009). Individual scores of global neurologic dysfunction estimated using the bifactor model were less strongly correlated with standard MSFC scores (r=0.77; P<0.001) than with individual scores estimated under the unidimensional model (r=0.95; P<0.001; Table 3), which suggests that the MSFC is not entirely unidimensional. Similarly, the cognitive sub-scores were moderately correlated with the MSFC scores (r=0.35; P<0.001), suggesting that the contribution of the PASAT to the standard MSFC composite is weighted too heavily for the MSFC to be considered an unconfounded measure of global neurological disability.
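Individual factor-score estimates of this kind can be obtained from a fitted lavaan model roughly as follows (a sketch continuing the hypothetical objects introduced earlier; the MSFC-like composite shown is a simple unit-weighted average of z-scores, not the official MSFC scoring algorithm):

```r
# Regression-type (expected a posteriori) factor scores for each assessment
scores <- lavPredict(fit_bifactor)   # one column per factor (e.g., g, cognitive, leg)

# Crude MSFC-like composite from the transformed components
msfc_like <- rowMeans(scale(dat[, c("t25fw", "nhpt", "pasat")]))

round(cor(cbind(scores, msfc_like), use = "pairwise.complete.obs"), 2)
```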

3.7. Sensitivity analysis

Because our sample size was relatively small, we used all the baseline and follow-up data available in the main analyses. Excluding baseline data for the 122 subjects with incomplete follow-up had no substantive effect on study results.


Fig. 4 Bifactor CFA model of neurological disability in RRMS. Note: see Fig. 1 note for interpretation.

Table 3 Baseline correlations among factor scores and MSFC scores.a

                          Global       Global             MSFCb     Cognitive    Leg
                          (bifactor)b  (unidimensional)             (bifactor)   (bifactor)
Global (bifactor)b         1.00
Global (unidimensional)    0.95***      1.00
MSFCb                      0.77***      0.85***            1.00
Cognitive (bifactor)       0.08         0.17*              0.35***c  1.00
Leg (bifactor)            −0.10         0.16*              0.14c    −0.10         1.00

Sidak-adjusted significance levels: *P<0.05; **P<0.01; ***P<0.001.
a Factor scores are regression scores (i.e., expected a posteriori; DiStefano et al., 2009).
b Coded so that a higher score indicates a higher level of disability.
c In Fischer and Rudick (1999), the overall MSFC score correlated 0.68 with the PASAT and 0.67 with the T25FW.


4. Discussion

This study differs in two important ways from previous studies of clinical outcome measures in MS. First, it relied on bifactor analysis to assess the dimensional structure of neurological disability in RRMS as described by the nine outcome measures considered; second, it incorporated the EDSS among the indicator measures of disability to be modeled, instead of treating it as an external "gold standard" of disability.

Our main finding is that, in this sample of patients with RRMS, neurological disability was best represented by a general factor of global neurological dysfunction and two uncorrelated auxiliary factors of leg and cognitive dysfunction. This factor structure shares similarities with that recently described for Guy's Neurological Disability Scale (GNDS; Mokkink et al., 2011). The GNDS exhibited a bifactor structure in which each of the 12 scale domains correlated with a general factor of disability, while 10 domains also defined a spinal factor (lower limb, upper limb, bladder, bowel, and sexual domains), a mental factor (cognition, mood, and fatigue domains), and a bulbar factor (speech and swallow domains). Furthermore, the upper limb domain had a strong correlation with the general factor (0.740), as in our study, and only a negligible correlation with the spinal factor (0.184).


Together, our results and those of Mokkink et al. (2011) support the notion that MS disability can be well represented by a global disability factor and uncorrelated domain-specific disability factors. These results make sense from neuroanatomic and clinical standpoints, since the structures subserving "leg function", for instance, are neuroanatomically distinct from those subserving cognition. It is also understandable that the T25FW, AI, and GST would be strongly correlated with each other, as all purport to measure mobility.

In this research, neurological disability in MS exhibited a bifactor structure (Figs. 1d and 4) and did not satisfy basic criteria of essential unidimensionality. This suggests that functional performance tests in MS should be conceptualized as markers of both global and domain-specific disability (possibly with the exception of measures, such as those of arm function, that loaded only on the general factor in this study). Therefore, an association between external variables (e.g., disease duration or age) and a standard composite measure such as the MSFC could be due to a relation between the variable and global disability, domain-specific disability, or both. The associations of the composite outcome with the external variables might also vary across patient samples, and possibly underestimate or overestimate the associations of the true level of global neurological disability with these variables.

A related concern with the current MSFC scoring scheme is that equal nominal weights are assigned to the T25FW, 9-HPT, and PASAT. However, in our CFA models, the measures of cognitive function provided less information toward assessing the level of global neurological function than the measures of leg and, especially, arm function. Moreover, factor scores specific to the cognitive domain correlated substantially and significantly with the MSFC scores. Together, these results suggest that the current scoring scheme is a source of confounding in investigations of the relations between the overall level of neurological dysfunction and patient characteristics or treatment.

Our proof-of-concept study suffered from several limitations. The study sample included only RRMS patients with baseline EDSS ≤3.5. Restricting participant selection to those scoring within a narrow range of an observed measure is discouraged in FA (Bollen, 1989). A consequence of the restriction in EDSS range at baseline is that correlations among observed measures, and between observed measures and factors, are likely to have been weaker than those that would have been observed in a more heterogeneous sample. No primary measure was available for several known dimensions of disability (e.g., vision, sphincter function). Therefore, there might be more than two auxiliary factors of neurological dysfunction, and the measures of arm function might contribute to one of these auxiliary factors, as was the case in the GNDS study (Mokkink et al., 2011). Finally, we did not assess the reliability, the measurement invariance, or the concurrent and predictive validity of the scores generated.

5. Conclusion

Our results suggest that it might be possible to apply bifactor FA/IRT methods to other existing datasets that include a broad range of clinical outcome measures.

Such efforts might help refine the conceptualization of the clinical measurement of global disability in MS and, ultimately, develop a bifactor model of neurological disability in MS that is invariant across samples of MS patients and situations (Cook and Petersen, 1987; McHorney and Cohen, 2000; Reeve et al., 2007). Once validated, such a pre-calibrated model would allow one to calculate unbiased scores of global neurological disability with only a subset of the performance measures employed for model development (conceivably just three or four measures; Lord, 1980). This feature of IRT models is relied upon, for instance, in applications of item banks and computer adaptive testing for patient-reported outcomes (Reeve et al., 2007). Methods of score linking might also be used to compare scores from different outcome measures of neurological disability on a common metric (Yen and Fitzpatrick, 2006). Various nomograms or conversion tables could then be developed to facilitate the interpretation of the least intuitive measures (e.g., MSFC-like composites) by establishing stable connections between these and easier-to-interpret measures such as the EDSS.

Funding acknowledgement This work was supported by the National Multiple Sclerosis Society [award number HCO127]. The funding agency played no role in study design, analysis, results interpretation, or report writing.

Conflict of interest statement EC serves on the NMSS Task Force on Clinical Disability Measures. IK has no conflict of interest to declare. GRC served on scientific advisory boards for sanofi-aventis, Cleveland Clinic, Daiichi Sankyo, GlaxoSmithKline, Genmab A/S, Eli Lilly and Company, Medivation, Inc., Modigenetech, Ono Pharmaceutical Co. Ltd., PTC Therapeutics, Inc., Teva Pharmaceutical Industries Ltd., Vivus Inc., University of Penn, the NIH (NHLBI, NINDS, NICHD) and NMSS; serves on the editorial board of Multiple Sclerosis; has received speaker and consulting honoraria from Alexion Pharmaceuticals, Inc., Bayhill Therapeutics, Bayer Schering Pharma, Novartis, Genzyme Corporation, Nuron Biotech, Peptimmune, Somnus Pharmaceuticals, Sandoz, Teva Pharmaceutical Industries Ltd., UT Southwestern, and Visioneering Technologies, Inc.; has received funding for travel and speaker honoraria from Consortium of MS Centers and Bayer Schering Pharma; and has received research support from the NIH (NICHHD, NIDDK, NINDS, NHLBI, NIAIDS), NMSS, the Consortium of MS Centers, and Klein-Buendel Incorporated.

Acknowledgements We thank Dr. R.A. Rudick and the National Multiple Sclerosis Society (NMSS) for their permission to use the database assembled by the NMSS Task Force on Clinical Outcomes Assessments for secondary analyses pertaining to the development of new outcome measures in MS.


References

Beauducel A, Herzberg YP. On the performance of maximum likelihood versus means and variance adjusted least squares estimation in CFA. Structural Equation Modeling 2006;13:186–203.
Bollen KA. Structural equations with latent variables. New York: John Wiley & Sons; 1989.
Cohen JA, Reingold SC, Polman CH, Wolinsky JS, et al., for the International Committee on Clinical Trials in Multiple Sclerosis. Disability outcome measures in multiple sclerosis clinical trials: current status and future prospects. Lancet Neurology 2012;11:467–76.
Cook L, Petersen N. Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances. Applied Psychological Measurement 1987;11:225–44.
Cook KF, Kallen MA, Amtmann D. Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT's unidimensionality assumption. Quality of Life Research 2009;18(4):447–60.
Cutter GR, Baier ML, Rudick RA, Cookfair DL, Fischer JS, Petkau J, et al. Development of a multiple sclerosis functional composite as a clinical trial outcome measure. Brain 1999;122:871–82.
DiStefano C, Zhu M, Mindrila D. Understanding and using factor scores: considerations for the applied researcher. Practical Assessment, Research & Evaluation 2009;14(20):1–11.
Fischer JS, Rudick RA, Cutter GR, Reingold SC. The Multiple Sclerosis Functional Composite (MSFC): an integrated approach to MS clinical outcome assessment. The National MS Society Clinical Outcomes Assessment Task Force. Multiple Sclerosis 1999;5:244–50.
Fox RJ, Lee JC, Rudick RA. Optimal reference population for the multiple sclerosis functional composite. Multiple Sclerosis 2007;13:909–14.
Gibbons RD, Hedeker DR. Full-information item bi-factor analysis. Psychometrika 1992;57:423–36.
Hobart J, Freeman J, Thompson A. Kurtzke scales revisited: the application of psychometric methods to clinical intuition. Brain 2000;123:1027–40.
Holzinger MG, Swineford F. The bi-factor model. Psychometrika 1937;2:40–54.
Jackson DL, Gillaspy JA, Purc-Stephenson R. Reporting practices in confirmatory factor analysis: an overview and some recommendations. Psychological Methods 2009;14:6–23.
Jacobs LD, Cookfair DL, Rudick RA, Herndon RM, Richert JR, Salazar AM, et al. Intramuscular interferon beta-1a for disease progression in relapsing multiple sclerosis. Annals of Neurology 1996;39:285–94.
Kurtzke JF. Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology 1983;33:1444–52.
Lai JS, Crane PK, Cella D. Factor analysis techniques for assessing sufficient unidimensionality of cancer-related fatigue. Quality of Life Research 2006;15:1179–90.
Lord FM. Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum; 1980.
McHorney CA, Cohen AS. Equating health status measures with item response theory: illustrations with functional status items. Medical Care 2000;38(Suppl. II):II-43–59.

Mokkink LB, Knol DL, Uitdehaag BM. Factor structure of Guy's Neurological Disability Scale in a sample of Dutch patients with multiple sclerosis. Multiple Sclerosis 2011;17:1498–503.
Muthén LK, Muthén BO. Mplus user's guide. 6th ed. Los Angeles: Muthén & Muthén; 1998–2011 [computer software].
Norman GR, Streiner DL. Biostatistics: the bare essentials. 3rd ed. Hamilton: BC Decker Inc; 2008.
R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2011. ISBN 3-900051-07-0. Available from: 〈http://www.R-project.org/〉.
Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care 2007;45(Suppl 1):S22–31.
Regier DA, Narrow WE, Kuhl EA, Kupfer DJ. The conceptual development of DSM-V. American Journal of Psychiatry 2009;166:645–50.
Reise SP, Morizot J, Hays RD. The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Quality of Life Research 2007;16:19–31.
Reise SP. The rediscovery of bifactor measurement models. Multivariate Behavioral Research 2012;47:667–96.
Reise SP, Bonefay WE, Haviland MG. Scoring and modeling psychological measures in the presence of multidimensionality. Journal of Personality Assessment 2013;95:129–40.
Revelle W. Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research 1979;14:57–74.
Revelle W. psych: procedures for psychological, psychometric, and personality research. Evanston: Northwestern University; R package version 1.1.12; 2010 [computer program].
Rudick R, Fischer J, Antel J, Confavreux C, Cutter G, Ellison G, et al. Clinical outcomes assessment in multiple sclerosis. Annals of Neurology 1996;40:469–79.
Schmitt TA. Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment 2011;29(4):304–21.
Simms LJ, Prisciandaro JJ, Krueger RF, Goldberg DP. The structure of depression, anxiety and somatic symptoms in primary care. Psychological Medicine 2012;42:15–28.
Studerus E, Gamma A, Vollenweider FX. Psychometric evaluation of the Altered States of Consciousness Rating Scale (OAV). PLoS One 2010;5(8):e12412.
Willoughby MT, Blair CB, Wirth RJ, Greenberg M. The measurement of executive function at age 3: psychometric properties and criterion validity of a new battery of tasks. Psychological Assessment 2010;22:306–17.
Yen WM, Fitzpatrick AR. Item response theory. In: Brennan RL, editor. Educational measurement. 4th ed. Westport, CT: American Council on Education and Praeger Publishers; 2006.
Zinbarg RE, Revelle W, Yovel I, Li W. Cronbach's α, Revelle's β, and McDonald's ωH: their relations with each other and two alternative conceptualizations of reliability. Psychometrika 2005;70(1):123–33.