Measurement and Analysis of Quality of Life in Epidemiology


Handbook of Statistics, Vol. 28 ISSN: 0169-7161 Copyright © 2012 Elsevier B.V. All rights reserved DOI: 10.1016/B978-0-44-4518750.00015-4

15


Mounir Mesbah Laboratoire de Statistique Théorique et Appliquée, Université de Pierre et Marie Curie, Paris, France

Abstract

Health related Quality of Life (HrQoL) has been one of the most important outcome measures in clinical trials over the past 20 years. More recently, it has also become important in epidemiological surveys, where, unlike clinical trials, the number of end points involved is generally larger. In both settings, epidemiology and clinical trials, its measurement and statistical analysis remain an issue. The validation of a HrQoL measurement is generally done by internal consistency methods, as external standards or experts are generally not available. These methods are mainly based on the statistical validation of measurement models, using goodness-of-fit tests. We show in this chapter how such validation can be done using the empirical Backward Reliability Curve (the α-curve). Finally, we present some new simulation and real data results.

Keywords: quality of life, backward reliability curve, unidimensionality, measurement models, Rasch family models, simulations

1. Introduction

In epidemiological surveys, Health related Quality of Life (HrQoL) is often considered as a global subjective health indicator. This confusion is mainly due to the strong correlation between the two concepts in most modern societies. HrQoL is increasingly recognized as an important specific end point, generally treated as a primary or at least a secondary criterion in most epidemiological studies. For reasons that are easy to explain, the primary end point is generally survival (duration of life) or another biological efficacy variable. Most of the time, Quality of Life appears as an internal time-dependent covariate in the survival analysis, or as a secondary end point. Increasingly, however, the effect of the treatment on survival or on other biological efficacy variables is already well known, so the investigation of the treatment effect on Quality of Life becomes the main issue.

The World Health Organization (The WHOQoL Group, 1994) defines Quality of Life as: “an individual’s perception of his/her position in life in the context of the culture and value systems in which he/she lives, and in relation to his/her goals, expectations, standards and concerns. It is a broad-ranging concept, incorporating in a complex way the person’s physical health, psychological state, level of independence, social relationships, and their relationship to salient features of their environment.” Patient Reported Outcomes (PROs) measurements are sometimes confused with Quality of Life measurements. Quality of Life is a broad concept referring to all aspects of a person’s well-being. HrQoL is most often assessed through a patient questionnaire, where item (or question, or variable) responses are often categorical.

In this paper, we present mathematical methods used in the statistical validation and analysis of HrQoL. These methods are based on the statistical validation of some essential properties induced by measurement models linking the observed responses and the unobserved latent HrQoL variables. In Section 2, some important measurement models used in HrQoL research are introduced. Within that section, we show how some important inequalities involving the Kullback–Leibler measure of association among conditionally independent variables can be very helpful in the process of validation. Then, the family of Rasch measurement models is introduced. The Rasch model can be considered as the standard of unidimensional measurement models. It must be used as a “docking” target in building unidimensional scores.
Statistical validation of HrQoL measurement models is thoroughly considered in Section 3. First, we define the reliability of a measurement and give its expression, together with the expression of the reliability of the sum of item responses under the parallel model, which is estimated by Cronbach's Alpha coefficient. Then the Backward Reliability Curve is presented, its connection with the notion of unidimensionality is explained, and we show how it can be used to check empirically the unidimensionality of a set of variables. Cronbach's Alpha coefficient is well known as a reliability or internal consistency coefficient, but it offers little help in the process of validating questionnaires. On the other hand, the Backward Reliability Curve can be very helpful in the assessment of unidimensionality, which is a crucial measurement property. We explain why, when such a curve is not increasing, lack of unidimensionality of a set of questions is strongly suspected. In Section 4, we say more about the construction of unidimensional HrQoL scores. This step generally follows the previous step of checking unidimensionality using the Backward Reliability Curve. In a multidimensional context, the separability of the measured concepts needs to be confirmed. Differential instrument functioning, or invariance of measurement across groups, is an important property which is addressed within the same section. Analysis of HrQoL change between groups is tackled in Section 5. Direct statistical analysis of latent scores through a global latent regression model is briefly discussed, then longitudinal analysis of HrQoL, and finally joint analysis of HrQoL and survival. In Section 6, some simulations are presented, confirming the good behavior of the Backward Reliability Curve when the items are unidimensional, and its ability to detect lack of unidimensionality. The last section (Section 7) is devoted to the presentation of some interesting real data examples.

2. Measurement models of Health related Quality of Life

2.1. Classical unidimensional models for measurement

Latent variable models involve a set of observable variables $A = \{X_1, X_2, \ldots, X_k\}$ and a latent (unobservable) variable $\theta$, which may be either unidimensional (i.e., scalar) or vector valued of dimension $d \le k$. In such models, the dimensionality of $A$ is defined by the number $d$ of components of $\theta$. When $d = 1$, the set $A$ is unidimensional. In a HrQoL study, measurements are taken with an instrument: the questionnaire. It is made up of questions (or items). The random response of a subject $i$ to a question $j$ is denoted $X_{ij}$. The random variable generating responses to a question $j$ is denoted, without confusion, $X_j$. The parallel model is a classical latent variable model describing the unidimensionality of a set $A = \{X_1, X_2, \ldots, X_k\}$ of quantitative observable variables. Define $X_{ij}$ as the measurement of subject $i$, $i = 1, \ldots, n$, given by a variable $X_j$, where $j = 1, \ldots, k$; then

$$X_{ij} = \tau_{ij} + \varepsilon_{ij}, \qquad (1)$$

where $\tau_{ij}$ is the true measurement corresponding to the observed measurement $X_{ij}$ and $\varepsilon_{ij}$ a measurement error. Specifying $\tau_{ij}$ as

$$\tau_{ij} = \beta_j + \theta_i$$

defines the parallel model. In this setting, $\beta_j$ is an unknown fixed (nonrandom) parameter, the effect of variable $j$, and $\theta_i$ is an unknown random parameter, the effect of subject $i$. It is generally assumed to have zero mean and unknown standard deviation $\sigma_\theta$. The zero-mean assumption is an arbitrary identifiability constraint with consequences for the interpretation of the parameter: its value must be interpreted relative to the mean population value. In our setting, $\theta_i$ is the true latent HrQoL that the clinician or health scientist wants to measure and analyze. It is a zero-mean individual random part of all observed subject responses $X_{ij}$, the same whatever the variable $X_j$ (in practice, a question $j$ of a HrQoL questionnaire). The $\varepsilon_{ij}$ are independent random effects with zero mean and standard deviation $\sigma$, corresponding to the additional measurement error. Moreover, the true measure and the error are assumed uncorrelated: $\mathrm{cov}(\theta_i, \varepsilon_{ij}) = 0$. This model is known as the parallel model because the regression lines relating any observed item $X_j$, $j = 1, \ldots, k$, to the true unique latent measure $\theta_i$ are parallel.
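To make the parallel model concrete, here is a minimal simulation sketch in Python with NumPy. It assumes, for illustration only, Gaussian $\theta_i$ and $\varepsilon_{ij}$ and arbitrary values of $\beta_j$, $\sigma_\theta$ and $\sigma$ (the model itself does not require normality); the helper `simulate_parallel` is hypothetical. It checks two consequences of the model: all inter-item correlations share the same value $\sigma_\theta^2/(\sigma_\theta^2 + \sigma^2)$ up to sampling error, and the regression of every item on $\theta$ has slope 1, so the regression lines differ only through their intercepts $\beta_j$.

```python
import numpy as np

def simulate_parallel(n, beta, sigma_theta, sigma, rng):
    """Draw X_ij = beta_j + theta_i + eps_ij under the parallel model (Gaussian by assumption)."""
    beta = np.asarray(beta, dtype=float)
    theta = rng.normal(0.0, sigma_theta, size=n)          # zero-mean latent HrQoL
    eps = rng.normal(0.0, sigma, size=(n, beta.size))      # independent measurement errors
    return beta[None, :] + theta[:, None] + eps, theta

rng = np.random.default_rng(2012)
X, theta = simulate_parallel(5000, beta=[0.0, 1.0, -0.5, 2.0],
                             sigma_theta=1.0, sigma=1.0, rng=rng)

# Every pair of items has (up to sampling error) the same correlation, here 0.5.
R = np.corrcoef(X, rowvar=False)
off_diag = R[~np.eye(R.shape[0], dtype=bool)]
print(off_diag.round(2))

# The regression of each item on theta has slope 1: the lines are parallel.
slopes = [np.polyfit(theta, X[:, j], 1)[0] for j in range(X.shape[1])]
print(np.round(slopes, 2))
```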


Another way to specify Model (1) is through the conditional moments of the observed responses. The conditional mean of a subject response is specified as

$$E(X_{ij} \mid \theta_i; \beta_j) = \beta_j + \theta_i. \qquad (2)$$

Again, $\beta_j$, $j = 1, \ldots, k$, are fixed effects and $\theta_i$, $i = 1, \ldots, n$, are independent random effects with zero mean and standard deviation $\sigma_\theta$. The conditional variance of a subject response is specified as

$$\mathrm{Var}(X_{ij} \mid \theta_i; \beta_j) = \mathrm{Var}(\varepsilon_{ij}) = \sigma^2. \qquad (3)$$

These assumptions are classical in experimental design. This model defines relationships between different kinds of variables: the observed score $X_{ij}$, the true score $\tau_{ij}$, and the error $\varepsilon_{ij}$. It is interesting to make some remarks about the assumptions underlying this model. The random part of the true measure given by the response of individual $i$ to a question $j$ is the same whatever the variable $j$: $\theta_i$ does not depend on $j$. The model is unidimensional. One can assume that, in their random part, all observed variables (questions $X_j$) are generated by a common unobserved variable $\theta_i$. More precisely, let $X^*_{ij} = X_{ij} - \beta_j$ be the calibrated version of the response of person $i$ to item $j$. Models (2) and (3) can be rewritten as

$$E(X^*_{ij} \mid \theta_i; \beta_j) = \theta_i, \quad \forall j, \qquad (4)$$

with the same assumptions on $\beta$ and $\theta$ and the same conditional variance model. Another important consequence of the previous assumptions, when the distribution is normal, is a conditional independence property: whatever $j$ and $j'$, two observed items $X_j$ and $X_{j'}$ are independent conditionally on the latent $\theta_i$. So, even when normality cannot be assumed, it is essential to specify this property.

2.2. Classical multidimensional models for measurement

Classical multidimensional models for measurement generalize the previous simple parallel model

$$X_j = \beta_j + \theta + \varepsilon_j$$

(the subject subscript $i$ is dropped without risk of confusion) from one true component $\theta$ to $p$ true components $\theta_l$, with $1 \le l \le p$. First, note that

$$X_j = \beta_j + \theta + \varepsilon_j \iff X_j - \beta_j = \theta + \varepsilon_j \iff X^*_j = \theta + \varepsilon_j. \qquad (5)$$

In classical multidimensional models for measurement, also known as factor analysis models, the observed item is a linear function of $p$ latent variables:

$$X_j = a_{j1}\theta_1 + a_{j2}\theta_2 + \cdots + a_{jp}\theta_p + E_j. \qquad (6)$$

This is usually written in matrix form:

$$X = AU + E, \qquad (7)$$

where $A$ is the factor loading matrix and $U$ and $E$ are independent.


Principal Component Analysis (PCA) is a particular factor analysis model with $p = k$ and without error terms ($E$ is not in the model). In PCA, the components $\theta_l$ are chosen orthogonal ($\theta_l \perp \theta_m$) and with decreasing variance (amount of information). In practice, a varimax rotation is often performed after a PCA to allow a better interpretation of the latent variables in terms of the original variables. It allows a clear clustering of the original variables into (unidimensional) subsets. In Section 3.2 we will show how this can be checked using a graphical tool, the Backward Reliability Curve. Parallel as well as factor analysis models are members of the classical measurement models. They deal mainly with quantitative continuous responses, even if some direct adaptations of these models to more general responses are available today. In the next section, we present the modern approach, which includes the classical one as a special case. Within this approach, qualitative and quantitative responses can be treated indifferently. Some general properties, not well known but very important for the validation process of questionnaires, are also presented. We introduce the Rasch model and show how it can be interpreted as a nonlinear parallel model, more appropriate when responses are categorical.
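As an illustration of model (7) and of PCA as an eigen-decomposition, the following sketch (NumPy, with a hypothetical loading matrix) simulates eight items driven by two independent latent components, four items each. Under these assumptions every item has unit variance, items within a cluster correlate at 0.64, and the population correlation matrix has two roots equal to 1 + 3(0.64) = 2.92 and six roots equal to 0.36, so two components clearly dominate. Because the two leading roots are nearly equal, the unrotated eigenvectors may mix the two clusters, which is one reason a rotation such as varimax is applied before interpretation.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 3000, 8

# Hypothetical loading matrix A: items 0-3 load only on theta_1, items 4-7 only on theta_2.
A = np.zeros((k, 2))
A[:4, 0] = 0.8
A[4:, 1] = 0.8

U = rng.normal(size=(n, 2))             # two independent latent components
E = rng.normal(scale=0.6, size=(n, k))  # errors independent of U (variance 0.36)
X = U @ A.T + E                         # model (7), one row per subject

# PCA as an eigen-decomposition of the sample correlation matrix,
# with the roots sorted by decreasing variance.
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
print(eigvals.round(2))   # two roots near 2.92 and six near 0.36, up to sampling error
```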

2.3. Latent variable models and Graphical Modeling

Modern ideas about measurement models are more general. Instead of arbitrarily defining the relationship between the observed and the true latent variable as an additive function (of the true latent and the error), they just focus on the joint distribution $f(X, \theta)$ of the observed and the true variables. We do not need to specify any kind of distance between $X$ and $\theta$. The error $E$ and its relation to the observed $X$ and the latent $\theta$ could be anything! This leads us naturally to Graphical Modeling. Graphical Modeling aims to represent the multidimensional joint distribution of a set of variables by a graph. We will focus on conditional independence graphs. The interpretation of an independence graph is easy. Each multivariate distribution is represented by a graph composed of nodes and edges between nodes. Nodes represent one-dimensional random variables (observed or latent, i.e., non-observed), while a missing edge between two variables means that those two variables are independent conditionally on the rest (all other variables in the multidimensional distribution). Since the pioneering work of Lauritzen and Wermuth (1989), many monographs on Graphical Modeling have become available (Whittaker, 1990; Lauritzen, 1996; Edwards, 2000). One way to define latent unidimensionality in the context of graphical models is straightforward: a set of variables $X$ is unidimensional if there exists one and only one scalar latent variable $\theta$ such that each variable $X$ is related to $\theta$ and only to $\theta$. In Fig. 1a, the set of variables $X_1, X_2, \ldots, X_9$ is unidimensional. In Fig. 1b, the set of variables $X_1, X_2, \ldots, X_9$ is bidimensional. The unidimensionality is a consequence of the dimension of $\theta$. The word latent means more than the fact that $\theta$ is not observed (or hidden). It means that $\theta$ is causal. The observed items $X_j$ are caused by the true unobserved $\theta$ and by no other variable!
This causal property is induced by the conditional independence property. If $X_j$ is independent of $X_{j'}$ conditionally on $\theta$, then knowledge of $\theta$ is enough. Such directed graphical models are also known as causal graphs or Bayesian networks.


Fig. 1. Graphical unidimensional or bidimensional model.

2.3.1. Measure of association and graphical models

Let $K(f, g)$ be the Kullback–Leibler information between two distributions with respective density functions $f$ and $g$:

$$K(f, g) = \int f(x) \log\left(\frac{f(x)}{g(x)}\right) dx. \qquad (8)$$

The Kullback–Leibler Measure of Association (KI) between two random variables $X$ and $Y$ with respective marginal distributions $f_x$ and $f_y$ and joint distribution $f_{xy}$ is given by

$$KI(X, Y) = K(f_{xy}, f_y f_x). \qquad (9)$$

In the same way, the measure of association between two variables $X$ and $Y$ conditionally on a third one $Z$ is the Kullback–Leibler Measure of Conditional Association $KI((X, Y) \mid Z)$, which, using similar straightforward notation, is given by

$$KI((X, Y) \mid Z) = K\left(f_{xyz},\, f_{y \mid z} f_{x \mid z} f_z\right) = K\left(f_{xyz},\, \frac{f_{yz} f_{xz}}{f_z}\right). \qquad (10)$$
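For discrete variables the integrals in (8) to (10) become sums, so KI can be computed exactly from a joint probability table. The plain-Python sketch below is an illustration under assumed values: the helpers `marginal`, `ki` and `ki_cond` and all probabilities are hypothetical. It builds a binary latent $Z$ with two binary items $X$ and $Y$ that are independent given $Z$, and checks results (1), (2) and (4) of Theorem 2.1.

```python
from itertools import product
from math import log

def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx (a tuple of positions)."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def ki(joint, i, j):
    """KI(X_i, X_j) of eq. (9): K(f_ij, f_i f_j), with a sum instead of an integral."""
    pij, pi, pj = marginal(joint, (i, j)), marginal(joint, (i,)), marginal(joint, (j,))
    return sum(p * log(p / (pi[(u,)] * pj[(v,)])) for (u, v), p in pij.items() if p > 0)

def ki_cond(joint, i, j, c):
    """KI((X_i, X_j) | X_c) of eq. (10): K(f_ijc, f_ic f_jc / f_c)."""
    pijc = marginal(joint, (i, j, c))
    pic, pjc, pc = marginal(joint, (i, c)), marginal(joint, (j, c)), marginal(joint, (c,))
    return sum(p * log(p * pc[(w,)] / (pic[(u, w)] * pjc[(v, w)]))
               for (u, v, w), p in pijc.items() if p > 0)

# Hypothetical latent structure: binary Z with P(Z = 1) = 0.4 and binary items
# X and Y drawn independently given Z, so X is independent of Y conditionally on Z.
p_z = {0: 0.6, 1: 0.4}
p_x1 = {0: 0.2, 1: 0.7}   # P(X = 1 | Z = z)
p_y1 = {0: 0.3, 1: 0.9}   # P(Y = 1 | Z = z)
joint = {}
for x, y, z in product((0, 1), repeat=3):
    px = p_x1[z] if x else 1.0 - p_x1[z]
    py = p_y1[z] if y else 1.0 - p_y1[z]
    joint[(x, y, z)] = p_z[z] * px * py

X, Y, Z = 0, 1, 2   # positions of the variables in each outcome tuple
print(ki_cond(joint, X, Y, Z))                                      # (1): zero up to rounding
print(ki_cond(joint, Y, Z, X), ki(joint, Y, Z) - ki(joint, X, Y))   # (2): the two agree
print(ki(joint, X, Y), ki(joint, X, Z), ki(joint, Y, Z))            # (4): the first is smallest
```

The item-latent associations $KI(X, Z)$ and $KI(Y, Z)$ both exceed the item-item association $KI(X, Y)$, which is the pattern exploited below for questionnaire validation.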

Theorem 2.1 Let $X$, $Y$, and $Z$ be three random variables such that $X$ is independent of $Y$ conditionally on $Z$. Then, under mild general regularity conditions, we have:
(1) $KI((X, Y) \mid Z) = 0$;
(2) $KI((Y, Z) \mid X) = KI(Y, Z) - KI(X, Y)$;
(3) $KI((X, Z) \mid Y) = KI(X, Z) - KI(X, Y)$;
(4) $KI(X, Y) \le KI(X, Z)$ and $KI(X, Y) \le KI(Y, Z)$.

Proof. Results 1, 2, and 3 can be easily derived. Result 4 is a direct consequence of 1, 2, 3, and the nonnegativity of the Kullback–Leibler information (so that $KI((Y, Z) \mid X) \ge 0$ and $KI((X, Z) \mid Y) \ge 0$). The interpretation of 2 and 3 is the following: if we use KI as a measure of association, then the marginal association between two variables related by an edge in the graph G is stronger than the marginal association between two non-related variables. Remarks: (1) If $(X, Y)$ is normally distributed, then $KI(X, Y)$ is a monotonic function of $\rho^2(X, Y)$, the square of the correlation coefficient. So $KI(X, Y)$ can be considered as a generalization of $\rho^2(X, Y)$.


(2) If $(X, Y, Z)$ is normally distributed, then $KI((X, Y) \mid Z)$ is a monotonic function of $\rho^2(X, Y \mid Z)$, the square of the partial correlation coefficient. So $KI((X, Y) \mid Z)$ can be considered as a generalization of $\rho^2(X, Y \mid Z)$. Using result (4) of Theorem 2.1 and the collapsibility property of a graphical model (Frydenberg, 1990; Mesbah et al., 1999), one can derive the following useful results. Consequences: (1) In Fig. 1a, the marginal association between any observed item $X$ and the latent variable $\theta$ is stronger than the association between two observed items. (2) In Fig. 1b, the marginal association between any observed item $X$ and its own latent variable $\theta$ is stronger than the association between that item $X$ and another latent variable (other dimensions). These two relationships between marginal measures of association are useful characterizations of the conditional independence property, which is a core property of latent variable models. Remarks: Under the parallel model presented in Section 2, whatever $j$ and $j'$, we have $\mathrm{Corr}(X_j, X_{j'}) = \rho$ and $\mathrm{Corr}(X_j, \theta) = \sqrt{\rho}$; then

$$\mathrm{Corr}^2(X_j, \theta) = \rho \;\ge\; \mathrm{Corr}^2(X_j, X_{j'}) = \rho^2.$$

This is a direct consequence of the fact that, under normality and the parallel model assumption, items are independent conditionally on the latent variable. Consequences 1 and 2 are very helpful in the process of questionnaire validation. The graphical models framework is helpful to explain relationships between variables when some of these are observed and others are not. Historically, the Rasch model, which we introduce in the next section, was established earlier, in the 1960s, mainly as a measurement model more appropriate to binary responses, which occur frequently in HrQoL questionnaires. Nevertheless, its connection with graphical models, through the conditional independence properties it includes, is recent.
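Under normality, the KI between two variables with correlation $r$ is the bivariate Gaussian mutual information $-\tfrac{1}{2}\ln(1 - r^2)$, a monotonic function of $r^2$ as stated in Remark (1). The short sketch below, with a hypothetical helper `gaussian_ki`, uses this to check Consequence 1 numerically under the parallel model: the item-latent association, based on $\mathrm{Corr}^2(X_j, \theta) = \rho$, exceeds the item-item association, based on $\mathrm{Corr}^2(X_j, X_{j'}) = \rho^2$.

```python
from math import log

def gaussian_ki(r):
    """KI between two jointly Gaussian variables with correlation r: -0.5 * ln(1 - r^2)."""
    return -0.5 * log(1.0 - r * r)

# Parallel model: Corr(X_j, theta) = sqrt(rho) and Corr(X_j, X_j') = rho.
for rho in (0.1, 0.3, 0.5, 0.7, 0.9):
    item_latent = gaussian_ki(rho ** 0.5)
    item_item = gaussian_ki(rho)
    print(f"rho={rho}: KI(X_j, theta)={item_latent:.3f}, KI(X_j, X_j')={item_item:.3f}")
```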

2.3.2. The family of Rasch measurement models

The parallel model presented in Section 2 is a linear mixed model. When item responses are binary, ordinal, or categorical, the parallel model is inappropriate. For instance, when the item response is a Bernoulli variable $X_{ij}$ taking values $x_{ij}$ (coded, for instance, 0 (failure, false, or no) or 1 (success, correct, or yes)), the theories of the exponential family and of generalized linear models (McCullagh and Nelder, 1989) suggest a generalized linear model as an alternative to model (2). Instead of the linear model

$$E(X_{ij} \mid \theta_i; \beta_j) = \beta_j + \theta_i, \qquad (11)$$

define the generalized linear model, using the canonical link associated with the Bernoulli distribution,

$$\mathrm{Logit}\, E(X_{ij} \mid \theta_i; \beta_j) = \theta_i - \beta_j, \qquad (12)$$


with, as previously, $\beta_j$ a fixed effect and $\theta_i$ independent random effects with zero mean and standard deviation $\sigma_\theta$. This model is known as the Mixed Rasch model. Its classical version, with $\theta_i$ assumed to be a fixed parameter, was introduced and popularized by the Danish mathematician Georg Rasch (Rasch, 1960) with the expression below. It is probably the most popular modern measurement model in the psychometric context, where it is mainly used as a measurement model. Under the Rasch model framework, the probability of the response given by a subject $i$ to a question $j$ is

$$P(X_{ij} = x_{ij} \mid \theta_i; \beta_j) = \frac{\exp\left(x_{ij}(\theta_i - \beta_j)\right)}{1 + \exp(\theta_i - \beta_j)}. \qquad (13)$$

$\theta_i$ is the person parameter: it measures the ability of individual $i$ on the latent trait. It is the true latent variable on a continuous scale. It is the true score that we want to obtain using the instrument (questionnaire), whose $k$ items (questions) allow us to estimate the true measurement (HrQoL) $\theta_i$ of person $i$. $\beta_j$ is the item parameter. It characterizes the level of difficulty of the question. The Rasch model is a member of the Item Response Theory models (Fischer and Molenaar, 1995). The Partial Credit model (Masters, 1982) is another member of the same family: it is the equivalent of the Rasch model for ordinal categorical responses with more than two levels. Let $P_{ijx} = P(X_{ij} = x)$; then

$$P_{ijx} = \frac{\exp\left(x\theta_i - \sum_{l=1}^{x} \beta_{jl}\right)}{\sum_{h=0}^{m_j} \exp\left(h\theta_i - \sum_{l=1}^{h} \beta_{jl}\right)}, \qquad (14)$$

for $x = 0, 1, \ldots, m_j$ (item $j$ has $m_j + 1$ ordered levels, and an empty sum, for $x = 0$ or $h = 0$, is taken as zero); $i = 1, \ldots, N$ (number of subjects); $j = 1, \ldots, k$ (number of items). Figure 2a shows the probability-of-positive-response curves of a set of three Rasch items, drawn on the same graphic. All three curves are increasing and “parallel” (the curves of two different items never cross).
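The behavior described for Fig. 2a can be checked directly from (13). The sketch below is plain Python; the item difficulties and the grid of $\theta$ values are hypothetical. It evaluates the probability of a positive response for three items and verifies that each curve increases with $\theta$ and that the curves never cross, the easier item (smaller $\beta_j$) being always the more likely to be endorsed.

```python
from math import exp

def rasch_prob(x, theta, beta):
    """P(X = x | theta; beta) for a binary Rasch item, eq. (13); x is 0 or 1."""
    return exp(x * (theta - beta)) / (1.0 + exp(theta - beta))

betas = [-1.0, 0.0, 1.5]                  # hypothetical item difficulties
grid = [t / 2 for t in range(-8, 9)]      # theta from -4 to 4 in steps of 0.5
curves = [[rasch_prob(1, t, b) for t in grid] for b in betas]

# Each curve increases with theta ...
increasing = all(c[i] < c[i + 1] for c in curves for i in range(len(grid) - 1))
# ... and the curves never cross: at every theta, the easier item (smaller beta)
# has the higher probability of a positive response.
ordered = all(curves[0][i] > curves[1][i] > curves[2][i] for i in range(len(grid)))
print(increasing, ordered)   # True True
```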

Fig. 2. Probability curves of simulated items from Rasch and Partial Credit models.


Figure 2b shows the level probability curves of a set of three Partial Credit items, each with three ordinal levels, drawn on the same graphic. The curve corresponding to the lowest level is always decreasing. The curve corresponding to the highest level is always increasing, and all other curves look like a Gaussian (bell-shaped) curve. It is easy to show, using (14), that this behavior is always the same for Partial Credit items. Once more, curves corresponding to the same level are “parallel” curves.
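The shapes described for Fig. 2b follow from (14). The plain-Python sketch below implements the Partial Credit level probabilities for one item, assuming levels indexed $0, \ldots, m$ with an empty sum for level 0; the helper `pcm_probs` and the thresholds are hypothetical. For a three-level item ($m = 2$) it checks that the lowest level decreases, the highest level increases, and the middle level peaks, here at $(\beta_{j1} + \beta_{j2})/2 = 0$. With a single threshold ($m = 1$) the function reduces to the Rasch probability (13).

```python
from math import exp

def pcm_probs(theta, thresholds):
    """Level probabilities P(X = x), x = 0..m, of one Partial Credit item, eq. (14).

    thresholds = [beta_j1, ..., beta_jm]; the item has m + 1 ordered levels.
    """
    exponents, cumulative = [0.0], 0.0          # level 0: empty sum, exponent 0
    for x, beta in enumerate(thresholds, start=1):
        cumulative += beta                       # sum over l <= x of beta_jl
        exponents.append(x * theta - cumulative)
    numerators = [exp(e) for e in exponents]
    total = sum(numerators)                      # denominator of (14)
    return [v / total for v in numerators]

# Hypothetical item with three ordinal levels (m = 2), as in Fig. 2b.
grid = [t / 2 for t in range(-8, 9)]
probs = [pcm_probs(t, [-1.0, 1.0]) for t in grid]
p0, p1, p2 = [[p[x] for p in probs] for x in range(3)]

print(all(a > b for a, b in zip(p0, p0[1:])))   # lowest level: decreasing
print(all(a < b for a, b in zip(p2, p2[1:])))   # highest level: increasing
print(grid[p1.index(max(p1))])                  # middle level: peak at theta = 0
```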

2.3.3. Rasch model properties

(1) Monotonicity of the response probability function.
(2) Local sufficiency: sufficiency of the total individual score for the latent parameter (considered as a fixed parameter).
(3) Local independence: items are independent conditionally on the latent variable.
(4) No differential item functioning: conditionally on the latent variable, items are independent of external variables.

The first property is an essential property for latent models. It is included in the Rasch model through the logistic link. The Mokken model (Molenaar and Sijtsma, 1988) does not assume the logistic link but a nonparametric monotone link function: this is appealing for the HrQoL field, but by relaxing the logistic link we lose the sufficiency property (2) of the total individual score, which is the most interesting characteristic property of the Rasch model in the HrQoL field. This property justifies the use of simple scores as surrogates for the latent score. Kreiner and Christensen (2002) focus on this sufficiency property and define a new class of nonparametric models: the Graphical Rasch model. The last properties (3 and 4) are neither included in nor specific to the Rasch model; they are general properties added to latent models. Considering the latent parameter as a fixed parameter leads to the joint maximum likelihood method, which, in this context, can be inconsistent (Fischer and Molenaar, 1995). The conditional maximum likelihood method, based on the sufficiency property, gives consistent and asymptotically normal estimates of the item parameters (Andersen, 1970). When the latent parameter is explicitly assumed to be random, estimates of $(\beta, \sigma^2)$ can be obtained by the marginal maximum likelihood method. In HrQoL practice, the distribution of the latent parameter is generally assumed to be Gaussian with zero population mean and unknown population variance $\sigma^2$.
The likelihood function is easily derived by marginalizing the joint distribution of the item responses and the latent variable over the unobserved random parameter and then using the local independence property. With $N$ subjects and $k$ items, one gets

$$L(\beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{N/2}} \prod_{i=1}^{N} \int \prod_{j=1}^{k} \frac{\exp\left((\theta - \beta_j)x_{ij}\right)}{1 + \exp(\theta - \beta_j)} \exp\left(-\frac{\theta^2}{2\sigma^2}\right) d\theta. \qquad (15)$$

Estimates of the $\beta$ parameters can be obtained using Newton–Raphson and numerical integration techniques, or with an EM algorithm combined with Gauss–Hermite quadrature (Hamon and Mesbah, 2002; Fischer and Molenaar, 1995).
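The integral in (15) has no closed form, but Gauss–Hermite quadrature handles it well. The sketch below (NumPy; the function name `rasch_marginal_loglik`, the number of nodes and the simulated data are assumptions for illustration) evaluates $\log L(\beta, \sigma^2)$: after the change of variable $\theta = \sqrt{2}\,\sigma u$, the Gaussian integral becomes $\pi^{-1/2}\sum_q w_q f(\sqrt{2}\,\sigma u_q)$ over the nodes $u_q$ and weights $w_q$ returned by `numpy.polynomial.hermite.hermgauss`. Only the likelihood evaluation is shown; maximizing it would be left to a Newton–Raphson, EM or general-purpose optimizer.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def rasch_marginal_loglik(beta, sigma, X, n_nodes=30):
    """log L(beta, sigma^2) of eq. (15), with theta ~ N(0, sigma^2).

    X: (N, k) array of 0/1 responses; beta: k item difficulties.
    """
    nodes, weights = hermgauss(n_nodes)          # rule for integrals of exp(-u^2) f(u)
    theta = np.sqrt(2.0) * sigma * nodes         # change of variable u -> theta
    w = weights / np.sqrt(np.pi)                 # weights for the N(0, sigma^2) density
    eta = theta[:, None] - np.asarray(beta, dtype=float)[None, :]   # (Q, k): theta_q - beta_j
    # log P(X_i = x_i | theta_q): by local independence, a sum over items of the log of (13).
    loglik = X @ eta.T - np.logaddexp(0.0, eta).sum(axis=1)[None, :]  # (N, Q)
    # Integrate theta out for each subject (log-sum-exp for stability), then sum over subjects.
    top = loglik.max(axis=1, keepdims=True)
    per_subject = top[:, 0] + np.log(np.exp(loglik - top) @ w)
    return float(per_subject.sum())

# Usage on hypothetical simulated data (true beta and sigma chosen arbitrarily).
rng = np.random.default_rng(0)
beta_true, sigma_true = np.array([-1.0, 0.0, 1.0]), 1.2
theta = rng.normal(0.0, sigma_true, size=500)
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta_true[None, :])))
X = (rng.random(p.shape) < p).astype(int)
print(rasch_marginal_loglik(beta_true, sigma_true, X))
```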


2.3.4. The remaining issue of estimation of latent parameters

Estimation of item parameters is generally the main interest in the psychometric area. Calibration of the HrQoL is the preliminary goal. When the item parameters are known (or assumed fixed and known), estimation of the latent parameter is straightforward. One easy method is just to maximize the classical joint likelihood, treating the latent parameter as a fixed parameter. Because the item parameters are assumed known, there is no inconsistency problem. Another popular estimator of the latent parameter is the Bayes estimator, given by the posterior mean of the latent distribution (Fischer and Molenaar, 1995). Other estimators can be obtained. Mislevy (1984) proposes a nonparametric Bayesian estimator of the latent distribution in the Rasch model. Martynov and Mesbah (2006) give a nonparametric estimator of the latent distribution in a Mixed Rasch model. With $g$ the density of the latent distribution, the posterior distribution of the latent parameter is defined as

$$P(\theta_i \mid x_i, \beta) = \frac{P(X_i = x_i \mid \theta_i, \beta)\, g(\theta_i)}{\int P(X_i = x_i \mid \theta_i, \beta)\, g(\theta_i)\, d\theta_i}. \qquad (16)$$

The Bayesian modal estimator is $\hat{\theta}_i$, the value of $\theta_i$ that maximizes the posterior distribution, while the Bayes estimator is the posterior mean:

$$\tilde{\theta}_i = \int \theta_i\, P(\theta_i \mid x_i, \beta)\, d\theta_i. \qquad (17)$$
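When the item parameters are known, (16) and (17) are one-dimensional computations. The sketch below (NumPy) evaluates the posterior on an even grid and returns the Bayes (posterior mean) and modal estimates. It assumes, for illustration only, a Rasch likelihood (13) and a Gaussian prior $g = N(0, \sigma^2)$, and replaces the integrals by grid sums; the helpers `posterior_theta`, `eap` and `map_estimate` are hypothetical. Because of the sufficiency property of the Rasch model, response vectors with the same total score give the same posterior.

```python
import numpy as np

def posterior_theta(x, beta, sigma=1.0, grid=None):
    """Posterior (16) of theta for one response vector x, with item parameters beta known.

    The prior g is taken as N(0, sigma^2) (an assumption for illustration); the integral
    in the denominator of (16) is replaced by a sum over an even grid.
    """
    if grid is None:
        grid = np.linspace(-6.0, 6.0, 2401)
    eta = grid[:, None] - np.asarray(beta, dtype=float)[None, :]
    loglik = eta @ np.asarray(x, dtype=float) - np.logaddexp(0.0, eta).sum(axis=1)
    logpost = loglik - 0.5 * (grid / sigma) ** 2     # log of P(x | theta) g(theta), plus a constant
    post = np.exp(logpost - logpost.max())
    return grid, post / post.sum()                  # normalized grid weights

def eap(grid, post):
    """Bayes estimator (17): the posterior mean."""
    return float(np.sum(grid * post))

def map_estimate(grid, post):
    """Bayesian modal estimator: the grid point with the largest posterior weight."""
    return float(grid[np.argmax(post)])

grid, post = posterior_theta([1, 1, 0, 0, 1], beta=[-1.0, -0.5, 0.0, 0.5, 1.0])
print(eap(grid, post), map_estimate(grid, post))
```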

The estimation of latent individual parameters from a frequentist point of view remains an issue. It is also done in two steps. First, the item parameters are consistently estimated by a conditional or marginal maximum likelihood method; then their estimated values are plugged into a modified likelihood function, assuming known values for the item parameters. The conditioning argument can be used to estimate the latent parameter directly, through a likelihood function conditional on the total item scores. The generally small number of items limits the use of this method in real practice. In the next sections, we show how validation of questionnaires (Section 3) and construction of scales (Section 4) can be performed.

3. Validation of HrQoL measurement models

3.1. Reliability of an instrument: Cronbach Alpha coefficient

A measurement instrument gives us values that we call the observed measure. The reliability $\rho$ of an instrument is defined as the ratio of the variance of the true measure to the variance of the observed measure. Under the parallel model, one can show that the reliability of any variable $X_j$ (as an instrument to measure the true value) is given by

$$\rho = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma^2}, \qquad (18)$$

which is also the constant correlation between any two variables. This coefficient is also known as the intraclass correlation coefficient. The reliability coefficient $\rho$ can easily be interpreted as the squared correlation coefficient between the true and the observed measure. When the parallel model is assumed, the reliability of the sum of $k$ variables equals

$$\rho_k = \frac{k\rho}{k\rho + (1 - \rho)}. \qquad (19)$$

Fig. 3. Theoretical relationship between α and the number of items.

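This formula is known as the Spearman–Brown formula. It indicates a simple relationship between $\rho_k$ and $k$, the number of variables. It is easy to show that $\rho_k$ is an increasing function of $k$. Figure 3 shows, drawn on the same graph, the theoretical reliability curves corresponding to $\rho = 0.1, 0.2, \ldots, 0.9$. The maximum likelihood estimator of $\rho_k$, under the parallel model and normal distribution assumption, is known as Cronbach's Alpha Coefficient (CAC) (Cronbach, 1951; Kristof, 1963). Its expression is

$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{j=1}^{k} S_j^2}{S_{tot}^2}\right), \qquad (20)$$

where

$$S_j^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(X_{ij} - \bar{X}_j\right)^2 \quad\text{and}\quad S_{tot}^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(X_{i\cdot} - \bar{X}_{\cdot\cdot}\right)^2,$$

with $\bar{X}_j$ the mean of item $j$, $X_{i\cdot} = \sum_{j=1}^{k} X_{ij}$ the total score of subject $i$, and $\bar{X}_{\cdot\cdot}$ the mean total score, so that $S_{tot}^2$ is the sample variance of the total score.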

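Under the parallel model, the joint covariance matrix of the observed items $X_j$ and the latent trait $\theta$ is

$$V_{X,\theta} = \begin{pmatrix}
\sigma_\theta^2 + \sigma^2 & \sigma_\theta^2 & \cdots & \sigma_\theta^2 & \sigma_\theta^2 \\
\sigma_\theta^2 & \sigma_\theta^2 + \sigma^2 & \cdots & \sigma_\theta^2 & \sigma_\theta^2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\sigma_\theta^2 & \sigma_\theta^2 & \cdots & \sigma_\theta^2 + \sigma^2 & \sigma_\theta^2 \\
\sigma_\theta^2 & \sigma_\theta^2 & \cdots & \sigma_\theta^2 & \sigma_\theta^2
\end{pmatrix},$$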


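and the joint correlation matrix of the observed items $X_j$ and the latent trait $\theta$ is

$$R_{X,\theta} = \begin{pmatrix}
1 & \rho & \cdots & \rho & \sqrt{\rho} \\
\rho & 1 & \cdots & \rho & \sqrt{\rho} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\rho & \rho & \cdots & 1 & \sqrt{\rho} \\
\sqrt{\rho} & \sqrt{\rho} & \cdots & \sqrt{\rho} & 1
\end{pmatrix}.$$

The marginal covariance matrix $V_X$ and correlation matrix $R_X$ of the $k$ observed variables $X_j$ under the parallel model are

$$V_X = \begin{pmatrix}
\sigma_\theta^2 + \sigma^2 & \sigma_\theta^2 & \cdots & \sigma_\theta^2 \\
\sigma_\theta^2 & \sigma_\theta^2 + \sigma^2 & \cdots & \sigma_\theta^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_\theta^2 & \sigma_\theta^2 & \cdots & \sigma_\theta^2 + \sigma^2
\end{pmatrix}
\quad\text{and}\quad
R_X = \begin{pmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & \vdots & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{pmatrix}.$$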

This structure is known as the compound symmetry type. It is easy to show that the reliability of the sum of $k$ items given in (19) can be expressed as

$$\rho_k = \frac{k}{k - 1}\left(1 - \frac{\mathrm{trace}(V_X)}{J' V_X J}\right), \qquad (21)$$

with $J$ a vector with all components equal to 1, and

$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\mathrm{trace}(S_X)}{J' S_X J}\right), \qquad (22)$$

where $S_X$ is the observed (sample) covariance matrix, the empirical estimate of $V_X$. There is in the literature, even recently, an understandable confusion between Cronbach's Alpha as a population parameter (theoretical reliability of the sum of items) and its sample estimate. The exact distribution of $\alpha$ under the Gaussian parallel model and its asymptotic approximation are well known (van Zyl et al., 2000). In the next subsections, we recall their main results.
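The Spearman–Brown formula (19) and the two equivalent expressions (20) and (22) of Cronbach's alpha translate directly into code. The NumPy sketch below uses hypothetical function names and simulated parallel data with $\sigma_\theta = \sigma = 1$, so that $\rho = 0.5$ by (18) and $\rho_6 = 3/3.5 \approx 0.857$ by (19). It computes $S_{tot}^2$ as the sample variance of the total score, which is exactly $J' S_X J$, so the two forms of alpha agree to rounding error.

```python
import numpy as np

def spearman_brown(rho, k):
    """Reliability of the sum of k parallel items, eq. (19)."""
    return k * rho / (k * rho + (1.0 - rho))

def cronbach_alpha(X):
    """Cronbach's alpha, eq. (20); X is an (n, k) array, one row per subject."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1)            # S_j^2
    total_var = X.sum(axis=1).var(ddof=1)       # S_tot^2: variance of the total score
    return k / (k - 1) * (1.0 - item_var.sum() / total_var)

def cronbach_alpha_matrix(X):
    """The same estimate written with the sample covariance matrix S_X, eq. (22)."""
    S = np.cov(np.asarray(X, dtype=float), rowvar=False)
    k = S.shape[0]
    J = np.ones(k)
    return k / (k - 1) * (1.0 - np.trace(S) / (J @ S @ J))

# Hypothetical parallel data: sigma_theta = sigma = 1, so rho = 0.5 by (18).
rng = np.random.default_rng(42)
n, k = 4000, 6
theta = rng.normal(size=n)
X = theta[:, None] + rng.normal(size=(n, k))
alpha = cronbach_alpha(X)
print(alpha, cronbach_alpha_matrix(X), spearman_brown(0.5, k))  # close, up to sampling error
```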

3.1.1. Exact distribution of Cronbach Alpha

Assuming the parallel model with Gaussian distributions for the latent and error components, we have

$$\frac{1 - \alpha}{1 - \rho_k} \sim F_{n(k-1),\, n}, \qquad (23)$$

where $F_{n(k-1),\,n}$ is the Fisher distribution with $n(k-1)$ and $n$ degrees of freedom. A direct consequence is that, under the same assumptions, the exact population mean and variance of $\alpha$ are

$$E(\alpha) = \frac{n\rho_k - 2}{n - 2}; \qquad \mathrm{Var}(\alpha) = \frac{2(1 - \rho_k)^2\, n(nk - 2)}{(k - 1)(n - 2)^2(n - 4)}. \qquad (24)$$

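3.1.2. Asymptotic distribution of Cronbach Alpha

When the Gaussian distribution cannot be assumed but the parallel form remains, the following results are obtained as $n \to \infty$:

$$\text{(a)}\ E(\alpha) \to \rho_k; \qquad \text{(b)}\ n\,\mathrm{Var}(\alpha) \to \frac{2k(1 - \rho_k)^2}{k - 1}; \qquad \text{(c)}\ \alpha \to \rho_k; \qquad (25)$$

$$\text{(d)}\ \frac{1}{2}\ln(1 - \alpha) \;\approx\; N\left(\frac{1}{2}\ln(1 - \rho_k),\ \frac{k}{2(k - 1)n}\right). \qquad (26)$$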
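Result (d) gives a simple large-sample interval for $\rho_k$. Since $\mathrm{Var}\left(\tfrac{1}{2}\ln(1 - \alpha)\right) \approx k/(2(k-1)n)$, the standard error of $\ln(1 - \alpha)$ is $\sqrt{2k/((k-1)n)}$; an interval on that scale is mapped back through $\rho_k = 1 - \exp(\ln(1 - \rho_k))$, which is decreasing, so the bounds swap. The stdlib-only sketch below is an illustration, not a method stated in the chapter; the function name `alpha_asymptotic_ci` and the example values are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def alpha_asymptotic_ci(alpha, k, n, level=0.95):
    """Approximate confidence interval for rho_k based on result (d):
    0.5 * ln(1 - alpha) ~ N(0.5 * ln(1 - rho_k), k / (2 (k - 1) n)) for large n."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = sqrt(2.0 * k / ((k - 1) * n))   # standard error of ln(1 - alpha)
    centre = log(1.0 - alpha)
    # 1 - exp(.) is decreasing, so the upper bound on ln(1 - rho_k) gives the lower bound on rho_k.
    return 1.0 - exp(centre + z * se), 1.0 - exp(centre - z * se)

# Hypothetical values: alpha = 0.80 observed on k = 10 items and n = 200 subjects.
print(alpha_asymptotic_ci(0.80, k=10, n=200))
```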

In addition, it is easy to show a direct connection between the CAC and the percentage of variance of the first component in PCA, which is often used to assess unidimensionality (Moret et al., 1993). PCA is mainly based on the analysis of the latent roots of $V_X$ or $R_X$ (or, in practice, of their sample estimates). The matrix $R_X$ has only two different latent roots: the greater root is $\lambda_1 = (k - 1)\rho + 1$, and the other, multiple, roots are $\lambda_2 = \lambda_3 = \cdots = \lambda_k = \frac{k - \lambda_1}{k - 1} = 1 - \rho$. So, using the Spearman–Brown formula, we can express the reliability of the sum of the $k$ variables as

$$\rho_k = \frac{k}{k - 1}\left(1 - \frac{1}{\lambda_1}\right).$$

This clearly indicates a monotonic relationship between $\rho_k$, which is consistently estimated by the CAC, and the first latent root $\lambda_1$, which in practice is naturally estimated by the corresponding root of the observed correlation matrix, and thus by the percentage of variance of the first principal component in a PCA. So the CAC can also be considered as a measure of unidimensionality. Nevertheless, such a measure is not very useful because, as is easy to show using the Spearman–Brown formula (19), under the parallel model assumption the reliability of the total score is an increasing function of the number of variables. So, if the parallel model is true, increasing the number of items will increase the reliability of a questionnaire. Moreover, this coefficient lies between 0 and 1. A zero value indicates a totally unreliable scale, while a unit value means that the scale is perfectly reliable. Of course, in practice, these two scenarios never occur! The Cronbach α-coefficient is an estimate of the reliability of the raw score (sum of item responses) of a person if the model generating those responses is a parallel model. It could be a valid criterion of the unidimensionality of such responses if, again, those item responses are generated by a parallel model.
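The eigenvalue relation above can be checked numerically. The NumPy sketch below builds the compound-symmetry correlation matrix $R_X$ for hypothetical values $\rho = 0.3$ and $k = 8$ (the helper `compound_symmetry` is illustrative), confirms that the roots are $\lambda_1 = (k-1)\rho + 1 = 3.1$ and $k - 1$ copies of $1 - \rho = 0.7$, and recovers $\rho_k$ from $\lambda_1$ with the same value as the Spearman–Brown formula (19).

```python
import numpy as np

def compound_symmetry(rho, k):
    """Correlation matrix R_X of k parallel items: 1 on the diagonal, rho elsewhere."""
    return (1.0 - rho) * np.eye(k) + rho * np.ones((k, k))

rho, k = 0.3, 8                                  # hypothetical values
lam = np.sort(np.linalg.eigvalsh(compound_symmetry(rho, k)))[::-1]
print(lam.round(4))   # lambda_1 = (k - 1) rho + 1 = 3.1, then k - 1 roots equal to 1 - rho = 0.7

rho_k_from_root = k / (k - 1) * (1.0 - 1.0 / lam[0])
rho_k_spearman_brown = k * rho / (k * rho + (1.0 - rho))
print(rho_k_from_root, rho_k_spearman_brown)     # the same value, up to rounding
```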


In the next subsection, we show how to build and use a more operational and more valid criterion to measure the unidimensionality of a set of items: the Backward Reliability Curve (the α-curve).

3.2. Unidimensionality of an instrument: Backward Reliability Curve

Statistical validation of unidimensionality can be performed through a goodness-of-fit test of the parallel model or of the Rasch model. There is a large literature on the subject, with classical or modern methods. These goodness-of-fit tests are generally not very powerful because their null hypothesis does not focus on unidimensionality: it indirectly includes other additional assumptions (for instance normality for parallel models, local independence for Rasch models, etc.), so a departure from these null hypotheses is not specifically a departure from unidimensionality. In the following, we present a graphical tool that is helpful in the step of checking the unidimensionality of a set of variables. It consists of a curve drawn in a stepwise manner, using estimates of the reliability of subscores (totals of subsets of the starting set). The first step uses all variables and computes their CAC. Then, at every successive step, one variable is removed from the score. The removed variable is the one that leaves the score (the remaining set of variables) with the maximum CAC value among all the remaining sets checked at this step. This procedure is repeated until only two variables remain. If the parallel model is true, increasing the number of variables increases the reliability of the total score, which is consistently estimated by Cronbach's alpha. Thus, a decrease of such a curve after adding a variable would make us strongly suspect that the added variable does not constitute a unidimensional set with the variables already in the curve. This algorithm has been successfully used in various medical applications (Moret et al., 1993; Curt et al., 1997; Nordman et al., 2005). Drawing the Backward Reliability Curve (BRC) of a set of items assumed to be unidimensional is an essential tool in the validation process of a HrQoL questionnaire.
When developing a HrQoL questionnaire, the main goal is generally to measure some unidimensional latent subjective traits (such as sociability, mobility, etc.). Applying the BRC to empirical data is very helpful for detecting non-unidimensional subsets of items. When the BRC is not increasing, one can remove one or more items to obtain an increasing curve. If the reduced set gives an increasing curve, it is, in this sense, more valid in terms of unidimensionality than the original set.
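The stepwise procedure described in Section 3.2 can be sketched in a few lines of Python. This is a minimal illustration, not the author's SAS program: the function names are hypothetical, and it relies on the identity that Cronbach's alpha of any subset of items can be read off the item covariance matrix, α = k/(k − 1) · (1 − Σ item variances / variance of the total), so the covariance matrix is computed only once.

```python
def covariance_matrix(data):
    """Sample covariance matrix (n - 1 denominator) of the item columns."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for row in data:
        dev = [row[j] - means[j] for j in range(p)]
        for a in range(p):
            for b in range(a, p):
                cov[a][b] += dev[a] * dev[b]
    for a in range(p):
        for b in range(a, p):
            cov[a][b] /= n - 1
            cov[b][a] = cov[a][b]
    return cov


def cronbach_alpha(cov, items):
    """Cronbach's alpha of the sum of the items listed in `items`."""
    k = len(items)
    item_var = sum(cov[j][j] for j in items)
    total_var = sum(cov[a][b] for a in items for b in items)
    return k / (k - 1) * (1 - item_var / total_var)


def backward_reliability_curve(data):
    """Return [(remaining_items, alpha), ...] from all items down to two items."""
    cov = covariance_matrix(data)
    current = list(range(len(data[0])))
    curve = [(tuple(current), cronbach_alpha(cov, current))]
    while len(current) > 2:
        # Try removing each item in turn and keep the subset with the largest alpha.
        candidates = [[i for i in current if i != j] for j in current]
        current = max(candidates, key=lambda s: cronbach_alpha(cov, s))
        curve.append((tuple(current), cronbach_alpha(cov, current)))
    return curve
```

Plotting alpha against the number of remaining items gives the α-curve; under the parallel model it should increase with the number of items, so a drop when an item re-enters flags that item as suspect.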

4. Construction of Quality of Life scores

4.1. From reliability to unidimensionality

Individual Quality of Life is frequently measured by computing one or several scores. This approach assumes that the set of items being considered represents a single dimension (one score) or multiple dimensions (multiple scores). These scores can be considered as statistics, i.e., functions of individual measurements (for instance item responses), and they must have good statistical properties. The Cronbach α-coefficient, as an indicator of the reliability of an instrument, is probably one of the most used in the HrQoL field and, more generally, in applied psychology. The big trouble with Cronbach α as a reliability coefficient is the lack of a clear scientific rule for deciding whether or not a score (based on a set of items) is reliable: we need a threshold for that decision. Following Nunnally (1978), a scale is satisfactory when its Cronbach's alpha is at least around 0.7. This "Nunnally rule" is an empirical rule without any clear scientific justification, so reliability is not a directly operational indicator. The Spearman–Brown formula (6) is a direct consequence of the parallel model assumptions. It implies that, when adding an item, or more generally increasing the number of items, the reliability of the sum of item responses must increase. This is, of course, a population property characterizing the parallel model; its sampling version is probably less regular. The Cronbach α coefficient is the sampling estimate of the reliability of the sum of item responses. Hence, using the Backward Reliability Curve as an empirical rule to validate graphically the parallel model, and thus the unidimensionality of a set of items, is straightforward. The Backward Reliability Curve should be used in an exploratory way to find unidimensional sets of items: it is a fast way to find suspect items, i.e., items that must be removed to obtain an increasing curve and thus a parallel model. It can also be used in a confirmatory way on a given set assumed to be unidimensional. When a given set of items has a nice Backward Reliability Curve (i.e., smoothly increasing and close to the theoretical Spearman–Brown curve), one can additionally perform statistical goodness-of-fit tests to check specific underlying properties. This mainly consists of validating the compound symmetry structure of the covariance matrix of the items, including the assumptions of equal item variances and equal item–latent covariances.
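To make the "theoretical Spearman–Brown curve" concrete, here is a small sketch assuming formula (6) has its standard form, ρ_k = kρ/(1 + (k − 1)ρ), where ρ is the reliability of a single parallel item. The helper `items_needed` is a hypothetical illustration of how the Nunnally threshold of 0.7 translates into a number of parallel items; it is not part of the chapter.

```python
def spearman_brown(rho, k):
    """Reliability of the sum of k parallel items, each with reliability rho."""
    return k * rho / (1 + (k - 1) * rho)


def items_needed(rho, target=0.7):
    """Smallest number of parallel items whose sum reaches the target reliability."""
    if not (0 < rho < 1 and 0 < target < 1):
        raise ValueError("rho and target must lie strictly between 0 and 1")
    k = 1
    while spearman_brown(rho, k) < target:
        k += 1
    return k


# Theoretical curve for rho = 0.3: strictly increasing in k, as the parallel model requires.
curve = [round(spearman_brown(0.3, k), 3) for k in range(1, 11)]
```

With ρ = 0.3, five items give a reliability of about 0.68 and six items give 0.72, so `items_needed(0.3)` is 6. An empirical α-curve that tracks this increasing shape is what the confirmatory use described above looks for.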
When the item responses are binary or ordinal, one can test some underlying properties of the Rasch model (Hamon et al., 2002). In practice, this is rarely done, because such tests are not implemented in most general statistical software. Under the Rasch model, a reliability coefficient close to Cronbach's alpha can be derived (Hamon and Mesbah, 2002); it can be interpreted in the same way as in parallel models. A Backward Reliability Curve can be used as a first step, followed by a goodness-of-fit test of the Rasch model. Hardouin and Mesbah (2004) used a multidimensional Rasch model and the Akaike Information Criterion, in a step-by-step procedure, to obtain unidimensional clusters of binary variables in an exploratory way. In real HrQoL research, simpler validation techniques are most often used. More details are given in the next section.

4.2. Specificity and separability of scores

The measurement models considered here are very simple models based on the unidimensionality principle. They can be defined as Rasch-type models: the parallel model for quantitative items and the Rasch or Partial Credit model for ordinal items. Each "unidimensional" set of items is related to one and only one latent variable. There is no confusion between "concepts," so an item cannot be related directly to two latent variables; an item can be related to another latent variable only through its own latent variable. This is, of course, a strong property, hard to achieve in practice. HrQoL questionnaires are built from questions expressed in words, and health concepts (psychological, sociological, or even physical) are often not clearly separated. Nevertheless, measurement is generally considered the beginning of science, and science is hard to achieve. So, correlations between each item and all unidimensional scores must be computed. This can be considered part of the internal validation in a multidimensional setting, to ensure the separability of the subsets. We must check that, for any item: (1) Specificity: there is a strong correlation between that item and its own score, and (2) Separability: the correlation between that item and its own score is higher than the correlation between the same item and the scores built on other dimensions. This is a direct consequence of Section 2.3. The first property is another view of the internal consistency condition of the subscale. Under the parallel model, that correlation is the same for every item, and it is also known as the intra-class coefficient. Cronbach's alpha is a monotone function of that value. It must be evaluated for each subscale, and the correlations between each item and all subscores must be tabulated.

Fig. 4. Graphical latent variable model.

4.3. Graphical latent variable models for quality of life questionnaires

Graphical latent variable models for scales can easily be defined as graphical models (Lauritzen and Wermuth, 1989) built on the multivariate distribution of variables with three kinds of nodes:
• nodes corresponding to observed (manifest) variables, i.e., items or questions,
• nodes corresponding to unobserved (hidden) variables, i.e., latent variables,
• and nodes corresponding to other external variables.
Figure 4 shows two examples: the first (Fig. 4a) with 13 items related to 3 latent variables and no external variables, and the second (Fig. 4b) with 9 items related to 2 latent variables and 2 external variables Y and Z. In the part of the graph relating items to their latent variable, as before, items are not pairwise connected by edges; they are related only to the latent variable. The following properties must also hold:
(1) Monotonicity: the distribution of an item conditional on its latent variable must be a monotone function of the latent variable.
(2) Non Differential Item Functioning, which is a graphical property: there are no direct edges between the node of any item and another latent variable, or between any item and any external variable.
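The structural rules above (no item–item edges, each item linked to its own latent variable only, no item–external edge) can be checked mechanically on an edge list. This is a hypothetical sketch: the node names below mimic the layout of Fig. 4b but are assumptions, and the monotonicity property (1) cannot be checked from the graph alone because it needs data.

```python
def check_measurement_graph(edges, item_to_latent, latents, externals):
    """Return a list of violations of the graph rules for a measurement model.

    edges: iterable of undirected (u, v) tuples.
    item_to_latent: dict mapping each item node to its own latent node.
    """
    items = set(item_to_latent)
    problems = []
    for u, v in edges:
        if u in items and v in items:
            problems.append(f"item-item edge {u}-{v}")
            continue
        for a, b in ((u, v), (v, u)):
            if a not in items:
                continue
            if b in latents and b != item_to_latent[a]:
                problems.append(f"{a} linked to foreign latent {b}")
            elif b in externals:
                problems.append(f"{a} linked to external {b} (DIF)")
    linked = set(edges) | {(v, u) for u, v in edges}
    for item, latent in item_to_latent.items():
        if (item, latent) not in linked:
            problems.append(f"{item} not linked to its latent {latent}")
    return problems


# Hypothetical Fig. 4b-like layout: q1-q5 measure theta1, q6-q9 measure theta2.
items = {f"q{j}": ("theta1" if j <= 5 else "theta2") for j in range(1, 10)}
edges = list(items.items()) + [("Y", "theta1"), ("Z", "theta2"), ("Y", "theta2")]
```

External variables may be linked to latent variables; only a direct edge from an item to an external variable (or to a foreign latent variable) signals differential item functioning.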

5. Analysis of Quality of Life change between groups

5.1. Use of HrQoL scores or global analysis

Development and validation of a HrQoL questionnaire is generally hard work, requiring more than one survey and many real data sets. When the structure of the questionnaire is stabilized, i.e., when the clustering of the items into subsets of unidimensional items is clearly defined, one needs simple rules for analyzing data from studies that include the HrQoL questionnaire together with other external variables. So a HrQoL questionnaire, like any instrument, must include "guidelines" for the statistical analysis step. Most of the time, for ease of use, only simple rules based on computing simple scores are included:
(1) Sum of item responses: this score is a sufficient statistic for the latent parameter under the Rasch model. Under the parallel model, its reliability is estimated by Cronbach's alpha coefficient. It is the simplest and easiest score to derive.
(2) Weighted sum of item responses: this is more complicated than the previous score. The weights are generally fixed and obtained from a Principal Component Analysis previously performed in a "large representative population."
(3) Percentage of item responses: this score is similar to the first, with a different range of values, between 0% and 100%. When a dimension includes k ordinal items with responses coded 0, 1, ..., m (all items with the same maximum level m), this score is obtained by dividing the first score by km (and multiplying by 100).
Unfortunately, estimation of the latent parameter is rarely suggested in the "guidelines book" of a HrQoL questionnaire, because it requires specific software with a latent variable estimation module. Score (2) requires knowledge of the "good weights" provided by the instrument developer, which is generally a marketing device that obliges any user of the questionnaire (for instance scientists, clinical investigators, pharmaceutical companies, etc.) to pay royalties.
In practice, these weights are generally obtained in a specific population and are not valid for another one. Using a score such as (1) or (3) is, in our view, the best simple approach, in particular when we do not have easy access to specific software for estimating Rasch-type models.
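The three guideline scores of Section 5.1 are straightforward to compute for one respondent on one dimension. In this sketch the weights of score (2) are placeholders supplied by the caller, since the chapter notes that real weights come from the instrument developer and are population-specific.

```python
def sum_score(responses):
    """Score (1): sum of item responses."""
    return sum(responses)


def weighted_sum_score(responses, weights):
    """Score (2): weighted sum; the weights are assumed to be supplied externally."""
    return sum(w * r for w, r in zip(weights, responses))


def percentage_score(responses, m):
    """Score (3): sum of k ordinal responses coded 0..m, rescaled to 0-100%."""
    k = len(responses)
    if any(not 0 <= r <= m for r in responses):
        raise ValueError("responses must be coded 0..m")
    return 100 * sum(responses) / (k * m)
```

For example, four items coded 0 to 3 with responses `[0, 1, 2, 3]` give a sum score of 6 and a percentage score of 50.0, since 6 out of a possible 4 × 3 = 12 points were obtained.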


5.2. Latent regression of HrQoL

It is usual to analyze HrQoL data with classical linear or generalized linear models in which the responses are HrQoL scores built in a first (measurement) step. Item responses are thus set aside and replaced by summary surrogate scores. The analysis is of course easier and can be done with classical general-purpose software. Generally one assumes that the distribution of the scores is Gaussian, which is made plausible by the fact that most measurement models (parallel, Rasch, etc.) specify a Gaussian distribution for the latent variable. For instance, when the score is a percentage, one can analyze its relation with other external variables by means of a logistic regression model, which allows interesting interpretations in terms of odds ratios. Nevertheless, analyzing surrogate scores as "observations" instead of the actual observations, i.e., item responses, can give unsatisfactory results (Mesbah, 2004), mainly in terms of lack of efficiency. So, when analyzing the relationships between the latent HrQoL and other external variables (for instance survival time, treatment, sex, age, etc.), it can be more efficient to consider a global model, even if one does not need to build new scores or to validate the measurement model once more. In fact, under some additional simple conditions that can easily be assumed in most real situations, considering such a global model leads to better statistical efficiency. Building a global model that takes the latent trait parameter into account, without separating the measurement and analysis steps, is a promising latent regression approach (Christensen et al., 2004; Sébille and Mesbah, 2005), made possible nowadays by the increasing performance of computers. Nevertheless, this approach needs to be handled with care. Each practical case must be theoretically well analyzed, with a careful investigation of which identifiability constraints to choose.
We have to make sure that this choice does not distort the interpretation of the final results. Joint analysis of a longitudinal variable and an event time is nowadays a very active field. Vonesh et al. (2006), Cowling et al. (2006), and Chi and Ibrahim (2006) are a few recent papers indicating that "Joint modeling of longitudinal and survival data is becoming increasingly essential in most cancer and AIDS clinical trials." Mainly because of the complexity of the computer programs, there are unfortunately no papers considering a joint model of a longitudinal latent trait and an event time. In Section 5.3.1 we present a detailed study of the joint analysis of a latent longitudinal variable and an event time. Another very popular method in the 1990s was the Q-TWIST (Quality adjusted Time WIthout Symptoms of Toxicity) approach (Gelber et al., 1996), in which duration of life is divided into categories corresponding to various health states with given utilities. It is thus a kind of weighted survival analysis (weighted by utility or HrQoL weights). It was a two-step approach, but the main criticisms concerned the fact that the utility values used had, in practice, very poor measurement properties (Mesbah and Singpurwalla, 2008). Our approach can be placed in the framework of mixed models, with a clear interpretation of the random factor as a latent trait previously validated in a measurement step; the items are repeated measurements of this true latent trait.


Computer programs for building and estimating nonlinear models with random effects are nowadays available even in general-purpose software (Hardouin and Mesbah, 2007).

5.3. Longitudinal analysis of HrQoL and the response shift issue

5.3.1. Joint analysis of a longitudinal QoL variable and an event time

The following models are motivated by a HrQoL clinical trial involving the analysis of a longitudinal HrQoL variable and an event time. In such a trial, the longitudinal HrQoL variable is often unobserved at dropout time. The model proposed by Dupuy and Mesbah (2002) (DM model) works when the longitudinal HrQoL is directly observed at each visit, except of course at dropout time. We propose to extend the DM model to the latent case, i.e., when the HrQoL variable is obtained through a questionnaire. Let T be a random time to some event of interest, and let Z be the longitudinally measured HrQoL. Let C be a random right-censoring time. Let $X = T \wedge C$ and $\Delta = \mathbf{1}\{T \le C\}$. Suppose that T and C are independent conditionally on Z. Following the Cox model, the hazard function of T has the form

$$\lambda(t \mid Z) = \lambda(t)\exp\big(\beta^T Z(t)\big). \tag{27}$$

The observations are $(X_i, \Delta_i, Z_i(u), 0 \le u \le X_i)_{1 \le i \le n}$. The unknown parameters are $\beta$ and $\Lambda(t) = \int_0^t \lambda(u)\,du$. Let us assume that C is non-informative for β and Λ. Dupuy and Mesbah (2002) suggest a method that assumes a non-ignorable missing-data process, takes into account the unobserved value of the longitudinal HrQoL variable at dropout time, and uses a joint modeling approach for the event time and the longitudinal variable. The Dupuy and Mesbah model assumes that

$$\lambda(t \mid Z) = \lambda(t)\exp\big(\beta^T W(t)\big) = \lambda(t)\exp\big(\beta_0 Z_{a_d} + \beta_1 Z_d\big), \tag{28}$$

with
• Z having a density that satisfies the Markov property $f_Z(z_j \mid z_{j-1}, \dots, z_0; \alpha) = f_Z(z_j \mid z_{j-1}; \alpha)$, $\alpha \in \mathbb{R}^p$,
• C non-informative for α and not depending on Z(t).


Let $W(t) = (Z_{a_d}, Z_d)^T$ and $\beta^T = (\beta_0, \beta_1)$. The observations are $Y_i = (X_i, \Delta_i, Z_{i,0}, \dots, Z_{i,a_d})_{1 \le i \le n}$. The unknown parameters of the model are $\tau = (\alpha, \beta, \Lambda)$. The model contains hidden variables: the missing values of Z at dropout time, $Z_{i,d}$ (see Fig. 5). The objective is to estimate τ from n independent observation vectors $Y_i$.


Fig. 5. QoL assessments: $t_0 = 0 < \cdots < t_{j-1} < t_j < \cdots$. Z takes the value Z(t) at time t and constant values $Z_j$ in the intervals $(t_{j-1}, t_j]$; $Z_j$ is unobserved until $t_j$.

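The likelihood for one observation $y_i$ ($1 \le i \le n$) is

$$L^{(i)}(\tau) = \int \lambda(x_i)^{\delta_i} \exp\big(\delta_i \beta^T w_i(x_i)\big) \exp\Big(-\int_0^{x_i} \lambda(u)\, e^{\beta^T w_i(u)}\,du\Big)\, f(z_{i0}, \dots, z_{ia_d}, z_d; \alpha)\,dz_d = \int l(y_i, z_d, \tau)\,dz_d.$$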

The parameter τ is identifiable. First, suppose that the functional parameter Λ is a step function $\Lambda_n(t)$ with jumps at the event times $X_i$, taking unknown values $\Lambda_n(X_i) = \Lambda_{n,i}$; then rewrite the likelihood and estimate α, β and the $\Lambda_{n,i}$. The contribution of $y_i$ to the likelihood is now taken to be

$$L^{(i)}(\tau) = \int (\Delta\Lambda_{n,i})^{\delta_i} \exp\big(\delta_i \beta^T w_i(x_i)\big) \exp\Big(-\sum_{k=1}^{p(n)} \Delta\Lambda_{n,k}\, e^{\beta^T w_i(x_k)}\, \mathbf{1}\{x_k \le x_i\}\Big)\, f(z_{i0}, \dots, z_{ia_d}, z_d; \alpha)\,dz_d,$$

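where $\Delta\Lambda_{n,k} = \Delta\Lambda_n(X_k) = \Lambda_{n,k} - \Lambda_{n,k-1}$, $\Delta\Lambda_{n,1} = \Lambda_{n,1}$, and $X_1 < \dots < X_{p(n)}$ ($p(n) \le n$) are the increasingly ordered event times. The maximizer $\hat\tau_n$ of $\sum_{i=1}^{n} \log L^{(i)}(\tau)$ over $\tau \in \Theta_n$ satisfies

$$\sum_{i=1}^{n} \frac{\partial}{\partial \tau} \mathcal{L}^{(i)}_{\hat\tau_n}(\tau)\Big|_{\tau = \hat\tau_n} = 0,$$

where $\mathcal{L}^{(i)}_{\hat\tau_n}(\tau) = E_{\hat\tau_n}\big[\log l(Y, Z; \tau) \mid y_i\big]$. We refer to $\sum_{i=1}^{n} \mathcal{L}^{(i)}_{\hat\tau_n}(\tau)$ as the EM-loglikelihood.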
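The survival factor of the step-function likelihood contribution above can be evaluated directly once a value of the hidden dropout QoL $z_d$ is plugged into the covariate path $w_i(t)$. The following sketch is a hypothetical helper, not Dupuy and Mesbah's code: it returns only the logarithm of $(\Delta\Lambda_{n,i})^{\delta_i} \exp(\delta_i \beta^T w_i(x_i)) \exp(-\sum_k \Delta\Lambda_{n,k} e^{\beta^T w_i(x_k)} \mathbf{1}\{x_k \le x_i\})$. The density $f(\cdot; \alpha)$ and the integral over $z_d$, which the EM algorithm handles, are deliberately left out.

```python
import math


def survival_loglik_contribution(x_i, delta_i, jump_times, jump_sizes, w_i, beta):
    """Log of the survival factor of L^(i) for one fixed value of the hidden z_d.

    jump_times: ordered event times X_1 < ... < X_p(n).
    jump_sizes: jumps Delta Lambda_{n,k} of the step cumulative baseline hazard.
    w_i: function t -> covariate vector w_i(t), with z_d already plugged in.
    beta: coefficient vector.
    """
    def linear_predictor(t):
        return sum(b * w for b, w in zip(beta, w_i(t)))

    ll = 0.0
    if delta_i:
        # An observed event time is one of the jump times of Lambda_n.
        k = jump_times.index(x_i)
        ll += math.log(jump_sizes[k]) + linear_predictor(x_i)
    for t_k, d_k in zip(jump_times, jump_sizes):
        if t_k <= x_i:
            ll -= d_k * math.exp(linear_predictor(t_k))
    return ll
```

With β = 0 the function reduces to $\delta_i \log \Delta\Lambda_{n,i} - \Lambda_n(x_i)$, the familiar log-likelihood of a subject under a discrete baseline hazard, which is a quick sanity check on the reconstruction.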


An EM algorithm for solving the maximization problem is described by Dupuy and Mesbah (2002). A maximizer $\hat\tau_n = (\hat\alpha_n, \hat\beta_n, \hat\Lambda_n)$ of $\sum_{i=1}^{n} \log L^{(i)}(\tau)$ over $\tau \in \Theta_n$ exists and, under some additional conditions,

$$\big(\sqrt{n}(\hat\alpha_n - \alpha_t),\ \sqrt{n}(\hat\beta_n - \beta_t),\ \sqrt{n}(\hat\Lambda_n - \Lambda_t)\big) \rightsquigarrow G,$$

where G is a tight Gaussian process in $l^\infty(\mathcal{H})$ with zero mean and covariance process $\operatorname{cov}(G(g), G(g'))$ (Dupuy et al., 2006). From this we deduce, for instance, that (1) $\sqrt{n}(\hat\beta_n - \beta_t)$ converges in distribution to a bivariate normal distribution with mean 0 and variance–covariance matrix $\Sigma_{\tau_t}^{-1}$, (2) a consistent estimate of $\Sigma_{\tau_t}$ is obtained, and similar results are obtained for $\hat\alpha_n$ and $\hat\Lambda_n$.

When the HrQoL variable z was observed (except for the last, unobserved, dropout value $z_d$), the likelihood for one observation $y_i$ ($1 \le i \le n$) was

$$L^{(i)}(\tau) = \int \lambda(x_i)^{\delta_i} \exp\big(\delta_i \beta^T w_i(x_i)\big) \exp\Big(-\int_0^{x_i} \lambda(u)\, e^{\beta^T w_i(u)}\,du\Big)\, f(z_{i0}, \dots, z_{ia_d}, z_d; \alpha)\,dz_d = \int l(y_i, z_{id}, \tau)\,dz_{id},$$

where $y_i = (x_i, \delta_i, z_{i0}, \dots, z_{ia_d}) = (x_i, \delta_i, z_i^{obs})$, and all the previous statistical inference, based on the likelihood

$$L^{(i)}(\tau) = \int l\big((x_i, \delta_i, z_i^{obs}), z_d, \tau\big)\,dz_d, \tag{29}$$

is well supported by theoretical asymptotic results and well-working computer algorithms. In the latent variable context, $z_i^{obs}$ is in fact not directly observed. The p item responses $Q_{ij}$ of subject i (response vector $Q_i$) are observed and must be used to recover the latent HrQoL values $z_i$ through a measurement model. The obvious choice in our context is the Rasch model, which for binary responses is:



$$P(Q_{ij} = q_{ij} \mid z_i, \zeta_j) = f(q_{ij}; z_i, \zeta_j) = \frac{e^{(z_i - \zeta_j)\, q_{ij}}}{1 + e^{z_i - \zeta_j}}. \tag{30}$$

So the observations are now $Y_i = (X_i, \Delta_i, Q_{i0}, \dots, Q_{ia_D})_{1 \le i \le n}$, with $Q_i = (Q_{i1}, \dots, Q_{ip})$ for a unidimensional scale of p items. The unknown parameters of the model are τ = (α, β, Λ), together with the nuisance parameters ζ. The objective is now to estimate τ from n independent observation vectors $Y_i$. Let us suppose that the following two assumptions hold: (1) the DM analysis model holds for the true unobserved QoL Z and dropout D or survival T; (2) the Rasch measurement model relates the observed item responses Q to QoL Z.
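Equation (30) and the local independence it is combined with can be sketched as follows (function names are hypothetical). The second function multiplies item probabilities given z, which is what the measurement part of the likelihood does; the usage check illustrates why the sum score is a sufficient statistic under the Rasch model: two response patterns with the same total have a likelihood ratio that does not depend on z.

```python
import math


def rasch_prob(q, z, zeta):
    """P(Q = q | z, zeta) for a binary item (q in {0, 1}) under the Rasch model (30)."""
    return math.exp((z - zeta) * q) / (1 + math.exp(z - zeta))


def pattern_likelihood(pattern, z, zetas):
    """Probability of a whole response pattern given z, assuming local independence."""
    lik = 1.0
    for q, zeta in zip(pattern, zetas):
        lik *= rasch_prob(q, z, zeta)
    return lik


# Patterns (1, 0) and (0, 1) share the total score 1; their ratio is exp(zeta_2 - zeta_1)
# whatever the value of z.
zetas = [-0.5, 0.8]
ratio_low = pattern_likelihood([1, 0], -1.0, zetas) / pattern_likelihood([0, 1], -1.0, zetas)
ratio_high = pattern_likelihood([1, 0], 2.0, zetas) / pattern_likelihood([0, 1], 2.0, zetas)
```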


Two main issues arise first:
• specification of a model for the data and the true latent QoL;
• choice of an estimation method.
Similarly to the Rasch model, for categorical ordinal responses (coded 0, ..., $m_j$, with a maximum level $m_j$ that may differ across items), the Partial Credit model is

$$P^{pc}(Q_{ij} = c \mid z_i, \zeta_j) = \frac{\exp\big(c z_i - \sum_{l=1}^{c} \zeta_{jl}\big)}{\sum_{h=0}^{m_j} \exp\big(h z_i - \sum_{l=1}^{h} \zeta_{jl}\big)}. \tag{31}$$
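Equation (31) translates into a short function returning all category probabilities of one item. This sketch (hypothetical name `pcm_probs`) subtracts the largest logit before exponentiating, a standard numerical-stability trick that does not change the probabilities. With a single threshold it reduces to the Rasch model (30), which serves as a check.

```python
import math


def pcm_probs(z, thresholds):
    """Category probabilities P(Q = c | z), c = 0..m, under the Partial Credit model (31).

    thresholds: [zeta_j1, ..., zeta_jm] for one item with responses coded 0..m.
    """
    logits = [0.0]  # category 0: empty sum of thresholds
    for c in range(1, len(thresholds) + 1):
        logits.append(c * z - sum(thresholds[:c]))
    top = max(logits)
    expd = [math.exp(v - top) for v in logits]  # subtract the max for numerical stability
    total = sum(expd)
    return [e / total for e in expd]
```

For instance, `pcm_probs(0.3, [-0.2])[1]` equals the Rasch probability $e^{0.5}/(1 + e^{0.5})$ of a positive response with $z - \zeta = 0.5$.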

The joint distribution of Q (items), Z (latent), D (time to death or dropout), and T (treatment) can be derived using only the conditional independence property (Fig. 6):

$$f(Q, D, T \mid Z) = \frac{f(Q, Z, D, T)}{f(Z)} = \frac{f(Q, Z)}{f(Z)}\,\frac{f(Z, D, T)}{f(Z)}, \tag{32}$$

so that

$$f(Q, D, T \mid Z) = f(Q \mid Z)\, f(D, T \mid Z). \tag{33}$$

Then, without any other assumption, we can specify two models:
• First model:
$$f(Q, Z, D, T) = f(Q \mid Z)\, f(D \mid Z, T)\, f(Z \mid T)\, f(T). \tag{34}$$
• Second model:
$$f(Q, Z, D, T) = f(Q \mid Z)\, f(Z \mid D, T)\, f(D \mid T)\, f(T). \tag{35}$$

Fig. 6. Joint graphical model for HrQoL and survival.


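The correct likelihood must be based on the probability function of the observations, i.e., now, $Y_i = (X_i, \Delta_i, Q_{i0}, \dots, Q_{ia_D})_{1 \le i \le n}$. The parameters of the model are τ = (α, β, Λ) and the nuisance difficulty parameters ζ of the HrQoL questionnaire. The model contains non-observed (hidden) variables (latent Z, missing Q): $(Z_{i0}, \dots, Z_{ia_D}, Z_{id}, Q_d)_{1 \le i \le n}$. Following directly from the graph of the DMq model (Fig. 6), the factorization of the joint distribution of the observations ($Y_i$), the latent HrQoL (Z) and the missing questionnaire $Q_d$ can now be specified; integrating over the hidden variables then gives the likelihood

$$L^{(i)}(\tau) = \int \prod_{v=0}^{a_d} \prod_{j=1}^{p} \frac{e^{q_{ijv} z_{iv} - \sum_{l=1}^{q_{ijv}} \zeta_{jl}}}{\sum_{h=0}^{m_j} e^{h z_{iv} - \sum_{l=1}^{h} \zeta_{jl}}} \; \prod_{j=1}^{p} \sum_{c=0}^{m_j} \frac{e^{c z_{id} - \sum_{l=1}^{c} \zeta_{jl}}}{\sum_{h=0}^{m_j} e^{h z_{id} - \sum_{l=1}^{h} \zeta_{jl}}} \; (\Delta\Lambda_{n,i})^{\delta_i} \exp\big(\delta_i \beta^T w_i(x_i)\big) \exp\Big(-\sum_{k=1}^{p(n)} \Delta\Lambda_{n,k}\, e^{\beta^T w_i(x_k)}\, \mathbf{1}\{x_k \le x_i\}\Big)\, f(z_{i0}, \dots, z_{ia_d}, z_{id}; \alpha)\, dz_{i0} \cdots dz_{ia_d}\, dz_{id},$$

where $q_{ijv}$ denotes the observed response of subject i to item j at visit v, and the second product corresponds to the missing questionnaire at dropout, summed over its possible responses.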

The marginalization over the latent variables is similar to the marginalization over the missing value at dropout, so computer programs are easily extended. Nevertheless, when the number of latent components is large, computing time can be very long. So, in health applications, a two-step approach is generally, and reasonably, preferred.
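The marginalization step can be illustrated in its simplest form: one visit, binary Rasch items, and a latent variable assumed to be N(0, 1) (the chapter's model instead uses the Markov density $f(z_{i0}, \dots; \alpha)$ across visits). The sketch below, with hypothetical function names, approximates $P(Q = q) = \int \prod_j P(Q_j = q_j \mid z)\,\phi(z)\,dz$ with the trapezoidal rule on a grid. A grid of G points per latent component needs $G^K$ evaluations for K latent components, which is why computing time grows so quickly in the longitudinal latent model.

```python
import math


def rasch_prob(q, z, zeta):
    """P(Q = q | z, zeta) for a binary Rasch item."""
    return math.exp((z - zeta) * q) / (1 + math.exp(z - zeta))


def marginal_pattern_prob(pattern, zetas, n_grid=201, z_max=6.0):
    """P(Q = pattern), integrating the latent z ~ N(0, 1) out by the trapezoidal rule."""
    h = 2 * z_max / (n_grid - 1)
    total = 0.0
    for g in range(n_grid):
        z = -z_max + g * h
        weight = h * (0.5 if g in (0, n_grid - 1) else 1.0)
        phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        lik = 1.0
        for q, zeta in zip(pattern, zetas):
            lik *= rasch_prob(q, z, zeta)
        total += weight * lik * phi
    return total
```

Summed over all $2^p$ response patterns, the marginal probabilities should add up to one (up to the truncation of the normal tails at ±6), which is a convenient check of any such integration routine.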

5.3.2. Response shift issue in longitudinal analysis of HrQoL

This is probably the most challenging current issue. Psychometricians are familiar with the difficulties of longitudinal analysis. Is the observed change a real change, or an artifact of the measuring instrument (questionnaire), which becomes obsolete over time? Do the responses of those interviewed reflect their current state, or a memory effect of the same question asked at a previous visit? Wagner (2005) clearly explains what "response shift" means: Response shift is a theoretical construct that provides a framework for this investigation. In essence, it posits that people can adjust how they think about their Quality of Life when they encounter relevant new information. In this model, antecedents (e.g., demographics, personality), interact with a catalyst (intervention or change in health status) to elicit psychological mechanisms (e.g., social comparison) in order to accommodate the catalyst. Response shift then influences one’s Quality of Life evaluation. …response shift per se refers to a change in one’s evaluation of Quality of Life as a result of: (a) a redefinition of the target construct (i.e., reconceptualization); (b) a change in values (i.e., the importance of component domains constituting the target construct), or (c) a change in internal standards of measurement (scale recalibration in psychometric terms).


Non Differential Instrument Functioning (Non DIF) methodology can partly be used to analyze longitudinal data affected by "response shift" noise. It mainly consists of checking that, between two times t1 and t2, the distribution of the instrument measurements conditional on the latent variable is unchanged.

6. Simulation results

A large number of simulations were performed; we summarize them below. The data were simulated from three model types: the parallel model (PM), the Rasch model (RM), and the Partial Credit model (PCM). The number of questions was ten in all simulations. The difficulty parameters were chosen as percentiles of the standard normal distribution cutting the area under the curve into ten classes of equal probability. For the PCM, we used three levels for all questions. The first item parameters are identical to those of the Rasch model; the second item parameter is obtained by shifting the first so that the area under the curve between the two parameters is always equal to 0.05. The sample sizes were 30, 50, 100, 150, and 200. Finally, for the same model types (PM, RM, or PCM) and the same number of items (10), we simulated bidimensional models in which the first five items were unidimensional, depending on a first latent variable, and the last five were also unidimensional but depended on a second latent variable, chosen to be independent of the first. Latent variables were simulated as standard normal. We then drew the alpha curve for each set of simulated items. All simulations and graphs were made with SAS® software; the detailed program can be obtained from the author on request by email. The results are shown in Figs. 7 and 8. We find that: (1) for the parallel model (PM), even for small sample sizes, the empirical Backward Reliability Curve (alpha curve) has a shape very close to the expected curve; (2) for the Rasch (RM) and Partial Credit (PCM) models, the shape quickly comes to resemble that of the expected curve, and from a sample size of 200 on, the empirical Backward Reliability Curves are as close to the expected curve as for the PM.
These results clearly show that the α-curve can help detect multidimensionality. More interestingly, we observe a break in the curves that separates the set of items into the two simulated dimensions.
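The break observed in the bidimensional case can be reproduced with a small Python re-creation of the parallel-model design only (the chapter's simulations were done in SAS and also covered RM and PCM). Assumptions: items 0 to 4 equal a first standard normal latent variable plus unit-variance noise, items 5 to 9 depend in the same way on an independent second latent variable, and the seed and sample size are arbitrary. Under this design the population alpha is about 0.83 for the five items of one dimension but falls to 0.75 when one item of the other dimension is added, so the curve drops right after five items.

```python
import random


def covariance_matrix(data):
    """Sample covariance matrix (n - 1 denominator) of the item columns."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for row in data:
        dev = [row[j] - means[j] for j in range(p)]
        for a in range(p):
            for b in range(a, p):
                cov[a][b] += dev[a] * dev[b]
    for a in range(p):
        for b in range(a, p):
            cov[a][b] /= n - 1
            cov[b][a] = cov[a][b]
    return cov


def cronbach_alpha(cov, items):
    k = len(items)
    item_var = sum(cov[j][j] for j in items)
    total_var = sum(cov[a][b] for a in items for b in items)
    return k / (k - 1) * (1 - item_var / total_var)


def backward_reliability_curve(data):
    """Map k (number of remaining items) -> (items, alpha), from all items down to two."""
    cov = covariance_matrix(data)
    current = list(range(len(data[0])))
    curve = {len(current): (tuple(current), cronbach_alpha(cov, current))}
    while len(current) > 2:
        candidates = [[i for i in current if i != j] for j in current]
        current = max(candidates, key=lambda s: cronbach_alpha(cov, s))
        curve[len(current)] = (tuple(current), cronbach_alpha(cov, current))
    return curve


def simulate_bidimensional_parallel(n, seed=2024):
    """Items 0-4 load on one latent variable, items 5-9 on an independent one."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        theta1, theta2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        row = [theta1 + rng.gauss(0.0, 1.0) for _ in range(5)]
        row += [theta2 + rng.gauss(0.0, 1.0) for _ in range(5)]
        data.append(row)
    return data
```

Reading `curve[k]` for k = 2, ..., 10 gives the α-curve; the five items retained at k = 5 should all come from one dimension, and alpha should be lower at k = 6 than at k = 5.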

7. Real data examples

7.1. Health related Quality of Life of diabetics in France

Figure 9 is an example of application to a real data set. This data set comes from ENTRED, a large national survey (N = 3198) on the Quality of Life of diabetics in France, using a random sample of diabetic patients contacted by mail. The HrQoL measurement instrument used was the "Diabetic Health Profile" (Chwalow et al., 2008). It consists of 32 ordinal questions (4 levels) split into three dimensions: "Barriers to Activity" (13 items), "Psychological Distress" (14 items), and "Disinhibited Eating" (5 items). The Backward Reliability Curve of the "Psychological Distress" dimension is shown in Fig. 9. Removing item 22 ("Do you look forward to the future?") leads to a perfectly increasing curve.

Fig. 7. Backward Reliability Curve for unidimensional models.

Fig. 8. Backward Reliability Curve for bidimensional models.

Fig. 9. Backward Reliability Curve of the psychological distress dimension.

7.2. Health related Quality of Life in oncology

In this example, Quality of Life was assessed among subjects involved in a cancer clinical trial (Awad et al., 2002). Quantitative scores were obtained via a self-administered HrQoL instrument. There were two treatment groups, and a non-ignorable dropout analysis was performed. Results are given in Table 1 (Mesbah et al., 2004).


Table 1 HrQoL analysis in a cancer clinical trial

β 0

Arm Random

0.16

A NI 0.13

SE(β 0 ) β 1

0.08

0.08 0.36

α SE(α )

0.96 0.01

SE(σ e2 ) Loglikelihood

0.17

B NI 0.09

Test Random

Statistics NI

0.033

0.35

  

 0.37 

0.08

0.08 0.32

0.95 0.01

0.96 0.01

0.95 0.01

0.35

0.32

0.70

0.71

0.571

0.576

2.19565

2.18126

0.046

0.047 896.4

0.04 927.2

0.04 857.2

SE(β 1 )

σ e2

Arm Random

0.09

963.9

0.09



 



 

In this example, HrQoL, except for its value at dropout time, was simply treated as an observed continuous score Z. In fact, however, HrQoL is not directly observed: it is an unobserved latent variable. In practice, HrQoL data always consist of multidimensional binary or categorical observed variables, forming a Quality of Life Scale, used to measure the true unobserved latent variable HrQoL. From the Quality of Life Scale we can derive HrQoL scores, i.e., statistics. These scores are surrogates of the true unobserved latent variable HrQoL.

7.3. Health related Quality of Life and housing in Europe

This example is based on a data set from the WHO LARES "Housing and Health" survey (Bonnefoy and LARES Group, 2004; Bonnefoy et al., 2003; Fredouille et al., 2009; Mesbah, 2009), a large survey conducted in eight large European cities. A total of 8519 questionnaires, self-rated by all persons in the selected dwellings, were collected, but only people older than 18 were considered in this work; 6920 valid questionnaires were retained. A HrQoL score was derived after a preliminary exploratory phase, based mainly on a PCA with varimax rotation, followed by a confirmatory phase using the Backward Reliability Curve method (see Fig. 10 for the curve of the Quality of Life Scale finally built) and Rasch model methodology (goodness-of-fit tests of Rasch models). This score can be interpreted as the estimated probability of good HrQoL or as a proportion of the best possible HrQoL. This proportion is the ratio of two numbers: the numerator is the number of responses positively associated with good Quality of Life, and the denominator is the maximum value the numerator can reach. We can therefore analyze this score by multiple logistic regression and present odds ratios as measures of association. All computations were made with SAS® software. The odds ratios were estimated under multiple logistic regression models, and the final model was chosen after a parsimonious stepwise model selection. Table 2 shows odds ratios between Quality of Life and some selected significant housing condition factors, for a few domains. An odds ratio greater than one means that the factor is positively associated with Quality of Life; an odds ratio smaller than one means that the factor is negatively associated with Quality of Life. The 95% confidence interval of each odds ratio is given in parentheses. Nevertheless, we must be aware that, because the LARES survey was cross-sectional (rather than longitudinal) and observational (rather than interventional), the causal interpretation of the housing factors revealed is somewhat limited. Establishing evidence of a causal relationship is more complex work.

Fig. 10. The empirical Backward Reliability Curve for the Quality of Life Scale.

Table 2 HrQoL and housing information

| Housing factor | Odds ratio | 95% CI |
|---|---|---|
| Panel block | 0.962 | (0.932; 0.993) |
| Semi-detached housing unit | 1.134 | (1.077; 1.194) |
| Multifamily apartment block, up to six residential units | 1.122 | (1.084; 1.162) |
| In the urban center close to a busy street | 1.095 | (1.057; 1.135) |
| Window can be opened in flat | 1.080 | (1.037; 1.124) |
| Window cannot be closed in flat | 0.929 | (0.910; 0.947) |
| Single-glazed windows | 1.047 | (1.020; 1.075) |
| Condensation signs at windows | 0.937 | (0.900; 0.975) |
| Wallpaper, paint, etc. gets off wall | 0.950 | (0.924; 0.977) |
| Shared spaces are well maintained/taken care of | 1.055 | (1.027; 1.083) |
| One or two Graffitis | 0.891 | (0.861; 0.923) |
| Vegetation/greenery visible on facades/windows/balconies | 1.029 | (1.006; 1.053) |
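The proportion-score analysis of Section 7.3 can be sketched as a binomial logistic regression in which each respondent contributes y "positive" responses out of a maximum of m. This is not the SAS analysis used for LARES: the toy data, the single binary covariate, and the function names are hypothetical. Treating item responses from the same person as independent binomial trials is itself an assumption; overdispersion would call for quasi-binomial or robust standard errors.

```python
import math


def fit_binomial_logit(successes, trials, x, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit(p_i) = b0 + b1 * x_i, successes_i ~ Bin(trials_i, p_i).

    Returns (b0, b1, se_b1), with se_b1 taken from the inverse information matrix.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, m, xi in zip(successes, trials, x):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = y - m * p            # score contribution
            w = m * p * (1.0 - p)    # information weight
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1, math.sqrt(h00 / det)


def odds_ratio_ci(b1, se, z=1.96):
    """Odds ratio exp(b1) with its Wald 95% confidence interval."""
    return math.exp(b1), math.exp(b1 - z * se), math.exp(b1 + z * se)


# Hypothetical toy data: four respondents, score out of 20, binary housing factor x.
successes, trials, x = [10, 12, 15, 14], [20, 20, 20, 20], [0, 0, 1, 1]
b0, b1, se = fit_binomial_logit(successes, trials, x)
```

With a single binary factor the fitted odds ratio must equal the ratio of pooled odds in the two groups, (29/11)/(22/18) ≈ 2.16 here, which is a convenient check; in the real analysis several factors enter the model simultaneously, and each exp(coefficient) is read as in Table 2.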


8. Conclusion

In this work, I presented the main modern statistical methods and models used in the validation and analysis of a Quality of Life measure in an epidemiological context. The validation step is mainly internal and consists of analyzing the unidimensionality of the set of items (questions) forming the scale. After giving the mathematical definition of unidimensionality (the parallel model), an empirical algorithm based on the Backward Reliability Curve to assess the validity of such a model was presented. The main ideas of the extension to multidimensional scales and to categorical variables were then indicated. The definition (or construction) of variables and indicators, and the analysis of the evolution of their joint distribution across populations, times, and areas, are generally two different, well-separated steps of a statistician's work in the field of Health related Quality of Life. The first step generally deals with calibration and metrology of questionnaires; the key words are measurement or scoring, depending on the area of application. The Backward Reliability Curve can be used as a tool to confirm the unidimensionality of a set of items. When more than one dimension is present, computing scores and correlations between items and scores is useful to check the separability of dimensions. The second step is certainly better known to most statisticians. Linear, generalized linear, time series, and survival models are very useful in this step, where the variables constructed in the first step are incorporated and their joint distribution with the other analysis variables (treatment group, time, duration of life, etc.) is investigated. HrQoL scores, validated during the first step, are then analyzed with complete omission of the real observations, i.e., the item responses. The latent nature of the HrQoL concept is generally neglected.
Mesbah (2004) compared two strategies. The simple strategy separates the two steps. The global strategy defines and analyzes a single model that includes both the measurement step and the analysis step.

Suppose that, with a real data set, one finds a significant association between an external covariate and a score built from the items. Then the true association, i.e., the one between that covariate and the true latent trait, is probably larger. So, if the scientific goal is to show an association between the true latent trait and the covariate, a global model is not needed. The model that uses the built score as a surrogate for the true latent trait is sufficient, and conclusions drawn with the built score also hold for the true latent trait.

Now suppose one finds no significant association between the built score and the covariate. Then the true association could be anything, possibly larger. In that case one has to consider a global model, even if there is no need to build new scores or to validate the measurement model.

A global model takes the latent trait parameter into account in a single step, i.e., without separating measurement from analysis. Building such a model is a promising latent regression approach (Christensen et al., 2004; Sébille and Mesbah, 2005), made feasible by the increasing performance of computers. In the HrQoL field, most papers follow a two-step approach, in which the HrQoL scores are used instead of the original item response data. Moreover, the scientific results are published in different kinds of journals. Some are devoted to the validation of measurements and instruments. The more numerous others specialize in the analysis of previously validated measurements.
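The claim that a built score understates the true association follows from the classical attenuation result. The derivation below is a minimal sketch that assumes the parallel model for the items. It takes the errors ε_j to be mutually independent, with mean 0 and common variance σ_ε², and independent of both the latent trait θ and the covariate Z. The symbols θ, Z, S and ρ_k are notation introduced here.

```latex
\begin{aligned}
X_{j} &= \theta + \varepsilon_{j}, \quad j = 1,\dots,k,
\qquad S = \frac{1}{k}\sum_{j=1}^{k} X_{j} = \theta + \bar\varepsilon,\\
\operatorname{Cov}(S, Z) &= \operatorname{Cov}(\theta, Z),
\qquad \operatorname{Var}(S) = \sigma_\theta^{2} + \frac{\sigma_\varepsilon^{2}}{k},\\
\operatorname{Corr}(S, Z) &= \operatorname{Corr}(\theta, Z)\,\sqrt{\rho_k},
\qquad \rho_k = \frac{\sigma_\theta^{2}}{\sigma_\theta^{2} + \sigma_\varepsilon^{2}/k} \le 1 .
\end{aligned}
```

Since ρ_k ≤ 1, |Corr(S, Z)| ≤ |Corr(θ, Z)|, so a significant association with S carries over to θ. Conversely, a weak, nonsignificant Corr(S, Z) is compatible with a larger Corr(θ, Z), which is why a global model is needed in that case. For binary or ordinal items (Rasch-type models) the parallel model holds only approximately, so this argument is best read as a heuristic.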


In this work, we have presented simulations showing that the Backward Reliability Curve behaves well as an estimate of the true reliability curve given by the Spearman–Brown formula. This holds even when the sample size is small and when the items are simulated from Rasch or Partial Credit models. Finally, we have presented three applications to real data sets, including data from a large survey conducted in eight large European cities: the LARES study. I used these data to derive Quality of Life related housing scores that can be easily interpreted in terms of odds ratios. The Quality of Life measure is constructed on its own and internally validated in a first step. The Quality of Life related housing factors, by contrast, were obtained using multiple logistic regression.
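The true reliability curve referred to here follows from the Spearman–Brown formula. Let ρ be the reliability of a single item under the parallel model, i.e., the common correlation between any two parallel items. The reliability of the sum of m items is then:

```latex
\rho_m = \frac{m\,\rho}{1 + (m-1)\,\rho}, \qquad m = 1,\dots,k,
\qquad
\frac{\partial \rho_m}{\partial m} = \frac{\rho\,(1-\rho)}{\bigl(1 + (m-1)\rho\bigr)^{2}} > 0 \quad (0 < \rho < 1).
```

For a hypothetical ρ = 0.3, this gives ρ_2 ≈ 0.46, ρ_5 ≈ 0.68 and ρ_10 ≈ 0.81. Under the parallel model, Cronbach's α computed on m items estimates ρ_m, which is why the empirical α-curve can be compared with this increasing curve. Items generated by Rasch or Partial Credit models satisfy the parallel model only approximately.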

Annex: A SAS Macro for the α-curve

```sas
/* A SAS macro for the alpha-curve (Backward Reliability Curve).
   The data set must contain only numerical variables, named X1, ..., Xk (k >= 3).
   Usage: %courbealpha(dataset=mydata);  where mydata is a placeholder name */
%macro courbealpha(dataset=);
  /* List the items: variables whose names start with X */
  proc contents data=&dataset out=tmpxOUTL(where=(upcase(substr(name,1,1))='X') keep=name crdate) noprint; run;
  proc sort data=tmpxOUTL nodupkey; by name; run;

  /* nva = number of items k */
  proc means data=tmpxOUTL noprint; var crdate; output out=myout n(crdate)=nva; run;
  data _null_; set myout; call symputx("nva", nva); run;

  /* Space-separated list of the item names, written to the log */
  data _null_; length allvars $2000; retain allvars; set tmpxOUTL end=eof;
    allvars = catx(' ', allvars, name);
    if eof then call symputx('varlist', allvars);
  run;
  %put NOTE: &nva items: &varlist;

  %if &nva < 3 %then %do;
    %put ERROR: at least 3 items are needed to draw the alpha-curve.;
    %goto exit;
  %end;

  data table; set &dataset; run;
  options mprint mlogic;

  /* Backward steps: at step i, remove the item whose deletion yields the largest
     raw alpha; stop when 2 items remain (alpha is undefined for a single item). */
  %do i = 1 %to %eval(&nva - 2);
    proc corr alpha data=table nomiss outp=perstat; var _all_; run;
    /* Rows with _TYPE_='RAWALDEL' hold the raw alpha obtained when each item is deleted */
    data tab1; set perstat; where _type_='RAWALDEL'; drop _name_ _type_; run;
    proc transpose data=tab1 out=tab2; run;
    proc sort data=tab2 out=tab3; by descending col1; run;
    /* Keep the best deletion; n = number of items remaining after it */
    data listtable&i; set tab3(obs=1); n = %eval(&nva - &i); call symputx("var", _name_); run;
    data table; set table; drop &var; run;
    %put NOTE: step &i removes &var;
  %end;

  /* Stack the results of all steps and plot alpha against the number of items */
  data all; set listtable1-listtable%eval(&nva - 2); run;
  proc sort data=all out=sorted; by n; run;
  symbol1 i=j value=none pointlabel=("#_name_" position=bottom) color=red;
  proc gplot data=sorted; plot col1*n; run; quit;
%exit:
%mend courbealpha;
```

References
Andersen, E.B., 1970. Asymptotic properties of conditional maximum likelihood estimators. J. Roy. Stat. Soc. Ser. B 32, 283–301.
Awad, L., Zuber, E., Mesbah, M., 2002. Applying survival data methodology to analyze longitudinal Quality of Life data. In: Mesbah, M., Cole, B.F., Lee, M.L.T. (Eds.), Statistical Methods for Quality of Life Studies: Design, Measurement and Analysis. Kluwer Academic, Boston.
Bonnefoy and LARES Group, 2004. Habitat et Santé: état des connaissances. Les échos du logement, vol. 4.
Bonnefoy, X.R., Braubach, M., Moissonnier, B., Monolbaev, K., Röbbel, N., 2003. Housing and health in Europe: preliminary results of a pan-European study. Am. J. Public Health 93, 1559–1563.
Chi, Y., Ibrahim, J.G., 2006. Joint models for multivariate longitudinal and multivariate survival data. Biometrics 62, 432–445.
Christensen, K.B., Bjorner, J.B., Kreiner, S., Petersen, J.H., 2004. Latent regression in loglinear Rasch models. Commun. Stat. Theory Methods 33, 1295–1313.
Chwalow, J., Meadows, K., Mesbah, M., Coliche, V., Mollet, E., 2008. Empirical internal validation and analysis of a QoL instrument in French diabetic patients during an educational intervention. In: Huber, C., Limnios, N., Mesbah, M., Nikulin, M. (Eds.), Mathematical Methods in Survival Analysis, Reliability and Quality of Life. Wiley, London.
Cowling, B.J., Hutton, J.L., Shaw, J.E.H., 2006. Joint modelling of event counts and survival times. Appl. Stat. 55, 31–39.
Cronbach, L.J., 1951. Coefficient alpha and the internal structure of tests. Psychometrika 16, 297–334.
Curt, F., Mesbah, M., Lellouch, J., Dellatolas, G., 1997. Handedness scale: how many and which items? Laterality 2, 137–154.
Dupuy, J.-F., Mesbah, M., 2002. Joint modeling of event time and nonignorable missing longitudinal data. Lifetime Data Anal. 8, 99–115.
Dupuy, J.-F., Grama, I., Mesbah, M., 2006. Asymptotic theory for the Cox model with missing time-dependent covariate. Ann. Stat. 34.
Edwards, D., 2000. Introduction to Graphical Modelling, second ed. Springer-Verlag, New York.
Fischer, G.H., Molenaar, I.W., 1995. Rasch Models: Foundations, Recent Developments and Applications. Springer-Verlag, New York.
Fredouille, J., Laporte, E., Mesbah, M., 2009. Housing and mental health. In: Ormandy, D. (Ed.), Housing and Health in Europe: The WHO LARES Project. Taylor and Francis, Boston.
Frydenberg, M., 1990. Marginalization and collapsibility in graphical interaction models. Ann. Stat. 18, 790–805.
Gelber, R.D., Goldhirsch, A., Cole, B.F., Wieand, H.S., Schroeder, G., Krook, G.E., 1996. A quality-adjusted time without symptoms or toxicity (Q-TWiST) analysis of adjuvant radiation therapy and chemotherapy for resectable rectal cancer. J. Natl. Cancer Inst. 88, 1039–1045.
Hamon, A., Mesbah, M., 2002. Questionnaire reliability under the Rasch model. In: Mesbah, M., Cole, B.F., Lee, M.L.T. (Eds.), Statistical Methods for Quality of Life Studies: Design, Measurement and Analysis. Kluwer Academic, Boston.


Hamon, A., Dupuy, J.F., Mesbah, M., 2002. Validation of model assumptions in quality of life measurements. In: Huber, C., Nikulin, M., Balakrishnan, N., Mesbah, M. (Eds.), Goodness of Fit Tests and Model Validity. Kluwer Academic, Boston.
Hardouin, J.B., Mesbah, M., 2004. Clustering binary variables in subscales using an extended Rasch model and Akaike information criterion. Commun. Stat. Theory Methods 33, 1277–1294.
Hardouin, J.B., Mesbah, M., 2007. The SAS macro-program %ANAQOL to estimate the parameters of item responses theory models. Commun. Stat. Theory Methods 36, 437–453.
Kreiner, S., Christensen, K.B., 2002. Graphical Rasch models. In: Mesbah, M., Cole, B.F., Lee, M.L.T. (Eds.), Statistical Methods for Quality of Life Studies: Design, Measurement and Analysis. Kluwer Academic, Boston.
Kristof, W., 1963. The statistical theory of stepped-up reliability coefficients when a test has been divided into several equivalent parts. Psychometrika 28, 221–238.
Lauritzen, S.L., 1996. Graphical Models. Oxford University Press, Oxford.
Lauritzen, S.L., Wermuth, N., 1989. Graphical models for association between variables, some of which are qualitative and some quantitative. Ann. Stat. 17 (1), 31–57.
McCullagh, P., Nelder, J., 1989. Generalized Linear Models. Chapman and Hall, London.
Martynov, G., Mesbah, M., 2006. Goodness of fit test and latent distribution estimation in the mixed Rasch model. Commun. Stat. Theory Methods 35, 921–935.
Masters, G.N., 1982. A Rasch model for partial credit scoring. Psychometrika 47, 149–174.
Mesbah, M., 2004. Measurement and analysis of health related quality of life and environmental data. Environmetrics 15, 471–481.
Mesbah, M., 2009. Building quality of life related housing scores using LARES study – a methodical approach to avoid pitfalls and bias. In: Ormandy, D. (Ed.), Housing and Health in Europe: The WHO LARES Project. Taylor and Francis, Boston.
Mesbah, M., Singpurwalla, N., 2008. A Bayesian ponders "The Quality of Life". In: Vonta, F., Nikulin, M., Limnios, N., Huber-Carol, C. (Eds.), Statistical Models and Methods for Biomedical and Technical Systems. Birkhäuser, Boston.
Mesbah, M., Lellouch, J., Huber, C., 1999. The choice of loglinear models in contingency tables when the variables of interest are not jointly observed. Biometrics 48, 259–266.
Mesbah, M., Dupuy, J.F., Heutte, N., Awad, L., 2004. Joint analysis of longitudinal quality of life and survival processes. In: Handbook of Statistics, vol. 23. Elsevier B.V.
Mislevy, R.J., 1984. Estimating latent distributions. Psychometrika 49, 359–381.
Molenaar, I.W., Sijtsma, K., 1988. Mokken's approach to reliability estimation extended to multicategory items. Psychometrika 49, 359–381.
Moret, L., Mesbah, M., Chwalow, J., Lellouch, J., 1993. Validation interne d'une échelle de mesure: relation entre analyse en composantes principales, coefficient alpha de Cronbach et coefficient de corrélation intra-classe. Revue d'Épidémiologie et de Santé Publique 41 (2), 179–186.
Nordman, J.F., Mesbah, M., Berdeaux, G., 2005. Scoring of visual field measured through Humphrey perimetry: principal component, varimax rotation followed by validated cluster analysis. Invest. Ophthalmol. Vis. Sci. 48, 3168–3176.
Nunnally, J., 1978. Psychometric Theory, second ed. McGraw-Hill, New York.
Rasch, G., 1960. Probabilistic Models for Some Intelligence and Attainment Tests. Danmarks Paedagogiske Institut, Copenhagen.
Sébille, V., Mesbah, M., 2005. Sequential analysis of quality of life Rasch measurements. In: Nikouline, M., Commenges, D., Huber, C. (Eds.), Probability Statistics and Modeling in Public Health. In Honor of Marvin Zelen. Kluwer Academic, New York.
The WHOQoL Group, 1994. The development of the World Health Organization Quality of Life Assessment Instrument (the WHOQoL). In: Orley, J., Kuyken, W. (Eds.), Quality of Life Assessment: International Perspectives. Springer-Verlag, Heidelberg.
van Zyl, J.M., Neudecker, H., Nel, D.G., 2000. On the distribution of the maximum likelihood estimator of Cronbach's alpha. Psychometrika 65, 271–280.
Vonesh, E.F., Greene, T., Schluchter, M.D., 2006. Shared parameter models for the joint analysis of longitudinal data and event times. Stat. Med. 25, 143–163.
Wagner, J.A., 2005. Response shift and glycemic control in children with diabetes. Health Qual. Life Outcomes 3, 38.
Whittaker, J., 1990. Graphical Models in Applied Multivariate Statistics, first ed. Wiley, New York.