Journal of Business Research 61 (2008) 1219 – 1228
Questions about formative measurement

James B. Wilcox a,⁎, Roy D. Howell a,1, Einar Breivik b

a Area of Marketing, Rawls College of Business, Texas Tech University, Lubbock, Texas, 79409, United States
b Department of Strategy Management, Norwegian School of Economics and Business Administration, Bergen, Norway

Received 1 May 2007; received in revised form 1 November 2007; accepted 1 January 2008
Abstract

A growing body of literature addresses the issue of formative measurement. However, questions remain regarding the nature of formative measures, their properties, and their usefulness, especially in the context of theory testing and structural equations modeling. This paper poses an incomplete list of questions and suggests possible answers to them, and concludes that the use of formative measurement remains problematic in theory testing research.

© 2008 Elsevier Inc. All rights reserved.

Keywords: Formative measurement; Reflective measurement; Structural equations modeling
While Bollen (2002) notes that “Nearly all measurement in psychology and the other social sciences assumes effect indicators” (p. 616), an alternative conceptualization wherein observable indicators are modeled as the cause of latent constructs has also been offered and investigated (Blalock, 1964; Bollen and Lennox, 1991; Cohen et al., 1990; Diamantopoulos and Winklhofer, 2001; Edwards and Bagozzi, 2000; Fornell and Bookstein, 1982; Heise, 1972; Howell, 1987; Law et al., 1998; MacCallum and Browne, 1993; Mackenzie et al., 2005; Podsakoff et al., 2003). This paper refers to these alternative views of the direction of the relationship between latent constructs and their associated observables as reflective and formative measurement, respectively. The research cited above has clarified the distinction between formative and reflective measurement, made a clear case for the dangers of misspecifying formative models as reflective, provided a methodology for estimating structural equation models with formative indicators, and provided guidance in developing formative measures. However, many questions
⁎ Corresponding author. Tel.: +1 806 742 3438; fax: +1 806 742 2199.
1 Tel.: +1 806 742 1543; fax: +1 806 742 2199.
0148-2963/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.jbusres.2008.01.010
remain with regard to the usefulness, appropriateness and analytic implications of formative measures. In particular, most proponents have been silent with regard to the effectiveness of formative measures in theory testing and the accumulation of knowledge. As Churchill (1979, p. 64) notes in the article often credited with creating blind adherence (Diamantopoulos, 2006) to reflective measurement, “A critical element in the evolution of a fundamental body of knowledge…is the development of better measures of the variables.” He adds, “Researchers should have good reasons for proposing additional new measures given the many available for most…constructs of interest” (Churchill, 1979, p. 67). Reflective measurement has filled the role of creating measures of constructs that can be used in different studies by different researchers to test different theories. But what of formative measurement? Can formative measurement fill the same need? Does formative measurement allow researchers to use the same “off-the-shelf” measure in different contexts to test different theories? This question is the overarching issue addressed by this research. The article is organized around a series of questions about formative measurement that are derived from explicit claims and/or implicit processes in the formative measurement literature. The questions asked can broadly be classified as those that pertain to determining whether a construct should be (or is) measured formatively or reflectively (questions 1, 2 and 3),
those that address more practical analytical issues (questions 4, 5, 6 and 7) and the final question that asks why researchers should develop formative measures (question 8). The purpose of this paper is to provide provisional answers to these questions about formative measurement in the context of theory testing and the accumulation of knowledge across studies.

1. Can a researcher tell if a construct is formative or reflective?

As noted above, formative measurement has received considerable attention as an underutilized approach to representing latent constructs. Diamantopoulos and Winklhofer (2001) “…believe that several marketing constructs currently operationalized by means of reflective indicators would be better captured if approached from a formative perspective” (p. 274 and see Table 1, p. 275). Likewise, Rossiter (2002) suggests a different paradigm for measure development and cites multiple examples of constructs inappropriately measured as reflective. Thus the question: can a researcher tell which is which and, therefore, which to use? This paper next addresses three questions pertaining to this issue.

1. Are constructs inherently formative or reflective?

According to Podsakoff et al. (2003), “Some constructs … are fundamentally formative in nature and should not be modeled reflectively” (p. 650). Similarly, Rossiter's (2002) C-OAR-SE model assumes judge or rater input to determine the nature of the construct. If constructs are inherently either formative or reflective, the researcher would be obliged to measure them accordingly. For example, Heise (1972, p. 153) suggests that, “SES is a construct induced from observable variations in income, education, and occupational prestige, and so on; yet it has no measurable reality apart from these variables which are conceived to be its determinants.” However, Kluegel et al. (1977) develop subjective SES measures that function acceptably as reflective indicators.
In questioning the applicability of their conceptualization of construct validity to formatively measured constructs, Borsboom et al. (2004) note that, “One might also imagine that there could be procedures to measure constructs like SES reflectively – for example, through a series of questions like How high are you up the social ladder? Thus, that attributes like SES are typically addressed with formative models does not mean that they could not be assessed reflectively…” (p. 1069). Assessing an individual's SES through multiple informant reports, network analyses, etc. also seems possible. A given research situation or research tradition may favor either formative or reflective measurement, but constructs themselves, posited under a realist philosophy of science as existing apart from their measurement, are neither formative nor reflective.

2. Do the observables inform the decision of which to use?

If the constructs themselves do not inform the formative vs. reflective decision, what about the items used to measure the construct? While in some cases determining the direction of causation between measures and their constructs
appears to be easy (Diamantopoulos and Winklhofer, 2001; Jarvis et al., 2003; Podsakoff et al., 2003), many instances exist in which examining the items alone leaves the direction indeterminate, and the larger research context must be considered. Edwards and Bagozzi (2000) suggest several criteria derived from the literature on causation that might be employed in this regard, including association, temporal precedence, and the elimination of rival causal explanations. They go on to show that a simple formative/reflective categorization may be overly simplistic. Bollen and Ting (2000, p. 4) note that, “Establishing the causal priority between a latent variable and its indicators can be difficult,” and offer a promising empirical tool, based on vanishing tetrad analysis, for determining whether the covariance structure among a set of items is more consistent with a formative or a reflective measurement model. Bollen and Ting's (2000) effort suggests that a simple examination of a set of indicators along with a “mental experiment” may be insufficient to make the determination. The issue is further complicated by the likelihood that “indicators of psychological constructs might be a mixture of effect and causal indicators” (Bollen and Ting, 2000, p. 3). An example is found in Diamantopoulos and Winklhofer (2001), who list several examples of constructs that suggest the use of formative measurement models based on the items used to measure the construct, such as perceived coercive power (cf. Gaski and Nevin, 1985). The items Gaski and Nevin employ constitute a list of different areas where the supplier might possess the ability to take different kinds of action and might comply with a formative measurement model.
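The logic behind the vanishing tetrad analysis mentioned above can be sketched numerically. For four effect indicators of a single latent variable, the model-implied covariances satisfy σ12σ34 = σ13σ24 = σ14σ23, so all three tetrad differences vanish, whereas the covariances among causal indicators carry no such restriction. The sketch below works with population covariance matrices only; it is not Bollen and Ting's (2000) statistical test, and the loadings and covariances are hypothetical values chosen purely for illustration.

```python
import numpy as np

def tetrads(cov):
    """Return the three tetrad differences among four variables (indices 0..3)."""
    s = cov
    t1 = s[0, 1] * s[2, 3] - s[0, 2] * s[1, 3]
    t2 = s[0, 2] * s[1, 3] - s[0, 3] * s[1, 2]
    t3 = s[0, 1] * s[2, 3] - s[0, 3] * s[1, 2]
    return np.array([t1, t2, t3])

# Effect-indicator case: covariance implied by one common factor with unit
# variance. The loadings are hypothetical.
loadings = np.array([0.8, 0.7, 0.6, 0.5])
cov_reflective = np.outer(loadings, loadings) + np.diag(1.0 - loadings**2)

# Causal-indicator case: covariances among the items are unrestricted.
# This matrix is hypothetical (and positive definite by diagonal dominance).
cov_formative = np.array([
    [1.0, 0.5, 0.1, 0.2],
    [0.5, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.4],
    [0.2, 0.1, 0.4, 1.0],
])

t_reflective = tetrads(cov_reflective)  # all (numerically) zero
t_formative = tetrads(cov_formative)    # generally nonzero

print(np.round(t_reflective, 12))
print(np.round(t_formative, 3))
```

In applied work the tetrads are computed from sample covariances and tested jointly for departure from zero, so nonzero sample values alone do not settle the question.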
However, the original definition forwarded by French and Raven (1959), where coercive power is defined as the target's belief that the agent has the ability to punish him or her, does not by itself suggest either a formative or reflective conceptualization. Indeed, in their study of the effects of nonverbal behavior on perceptions of a female employee's power bases, Aguinis and Henle (2001) use a reflective scale of coercive power. Again, the conceptualization of the measurement model is often more dependent on the choice of the researcher than some inherent characteristic of a particular construct. To further complicate the matter the same list of items might, depending on the wording of the general instructions, be conceptualized as either formative or reflective. The items used by Gaski and Nevin (1985) could be employed in both a formative measurement model and a reflective measurement model depending on how the general instructions are presented. If the general instructions involve future actions, the responses might reflect a general capability by the supplier. Since the instructions refer to hypothetical actions the respondents are likely to reply based on some general notion of supplier capability instead of specific actions. Conversely, if the general instructions are pointing to past behavior a formative measurement model might be more applicable. The different conceptualizations appear in Fig. 1. In their study Gaski and Nevin (1985) ask the respondents to check how much capability the supplier has to take each of the actions included on the list. This instruction is a general
Fig. 1. Different measurement conceptualizations involving the same list of items.
instruction that could be interpreted in several ways and the corresponding coefficient alpha of 0.69 is somewhat inconclusive with regard to which measurement conceptualization should be preferred. Similarly, Bollen and Ting (2000) discuss the possibility that a set of indicators may be causal with respect to one construct and reflective with respect to another. In their example, a child's viewing of violent television programs, playing violent video games, and listening to music with violent themes may be formative indicators of “exposure to media violence” but reflective indicators of “propensity to seek violent entertainment”. In the example of Gaski and Nevin's (1985) “perceived coercive power” construct, perhaps renaming the construct “perceived propensity to wield coercive power” results in a reflective interpretation. If the researcher has a choice regarding the naming of the construct (given the theory), the researcher has a choice in modeling the construct.

Other instances can be even more problematic. Jarvis et al. (2003) suggest that Kohli et al.'s (1993) measure of market orientation (intelligence generation, intelligence dissemination, and responsiveness) is formative. While this view may be appropriate, the conceptualization of market orientation as a predisposition that leads a firm to gather, disseminate, and respond to information seems equally valid. Further, the indicators have a clear causal ordering. One cannot disseminate information that has not been generated, and at the firm level responding to non-disseminated information is difficult, suggesting that a simplex structure might be appropriate in modeling this construct. Cohen et al.'s (1990) observation that one researcher's measurement model may be another's structural model is appropriate.
3. Do the relationships among the observables inform the decision?

If constructs are not inherently reflective or formative and the items themselves do not always provide guidance as to which model to choose, perhaps the relationships among the items can provide insight. MacKenzie et al. (2005) suggest that “indicators in this type of [reflective] measurement model should be highly correlated…” (p. 711). They also note that, “Indeed, it would be entirely consistent with this [formative] measurement model for the indicators to be completely uncorrelated” (p. 712). Perhaps then, covariance (or lack thereof) among observables will inform the decision. This question can be addressed by looking at the situations that are not described by MacKenzie et al. (2005) in the quote above. If one can find no formative measures with highly correlated items or no reflective measures (or measures that might be deemed to be reflective) with uncorrelated items, perhaps the researcher can answer the question based on inter-item correlation or lack thereof. Jarvis et al. (2003) also claim that for formative measures, covariation among the indicators is not necessary or implied. Does this claim mean that formative indicators are not correlated? Does the claim mean that correlation is not relevant? Addressing the initial question first, the answer is no; the claim simply means that the source of the covariation does not (cannot) come from the latent variable being formatively measured. Thus, to the extent that formative observables are correlated, the correlation must come from somewhere else, as depicted in Fig. 2. The sources of covariation are numerous, with some better known than others.

Response or single source bias. Because a single respondent fills out the entire questionnaire, systematic variance unrelated
Fig. 2. Correlated indicators in a formative model.
to the constructs themselves may be present among otherwise unrelated items (Podsakoff and Organ, 1986).

Artifact. Given the common use of integer rating scales, floor or ceiling effects can lead to spurious correlation among items.

Halo. A halo effect occurs when one attribute is used to generalize about other attributes of the same object, even though the attributes are unrelated (Fisicaro and Lance, 1990). Whether halo is attributable to cognitive bias or cognitive laziness, the result is correlation among the traits.

Ecological or structural. As noted above, one person's measurement model is another person's structural model (Cohen et al., 1990). To the extent that structural relationships are strong and the structural components are converted to a measure, the items may be correlated.

Importantly, note that reflective measures are subject to these same effects. Neither formative nor reflective indicators are more susceptible to the influences noted, and so covariation among items is of little use in identifying whether indicators are formative or reflective.

Correlation among formative indicators, even high correlation, may be possible. Is this situation of any importance? As pointed out by Diamantopoulos and Winklhofer (2001), a formative measure is essentially a multiple regression with the construct representing the dependent variable and the indicators as the predictors. Therefore correlation among the indicators results in multicollinearity, which can lead to instability in coefficients. High collinearity makes the individual contribution of each indicator difficult to determine, which makes item selection a challenge. As a result, Diamantopoulos and Winklhofer (2001) suggest using normal regression diagnostics in order to delete redundant items.
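One widely used regression diagnostic of this kind is the variance inflation factor (VIF), which for standardized predictors equals the corresponding diagonal element of the inverse of the predictor correlation matrix. The choice of VIF here is an assumption made for illustration, since the passage above speaks only of regression diagnostics in general, and the equal inter-item correlations of 0.2 and 0.9 are hypothetical. The sketch below shows how quickly the sampling variance of an indicator weight inflates as the indicators become more highly correlated.

```python
import numpy as np

def equicorrelation(n_items, r):
    """Correlation matrix in which every pair of items correlates at r."""
    return (1.0 - r) * np.eye(n_items) + r * np.ones((n_items, n_items))

def vif(corr):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(corr))

# Four formative indicators, weakly versus strongly intercorrelated.
vif_low = vif(equicorrelation(4, 0.2))
vif_high = vif(equicorrelation(4, 0.9))

print(np.round(vif_low, 2))   # about 1.09 for every indicator
print(np.round(vif_high, 2))  # about 7.57 for every indicator
```

With correlations of 0.2 the variance of each weight is inflated by roughly 9%, whereas at 0.9 it is more than seven times what it would be with uncorrelated indicators, which is the instability the regression analogy predicts.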
As discussed more fully below (questions 4, 5) the regression approach is problematic in that the regressed variable actually represents the shared variance of the outcome variables (observed or latent) of the formative measure (Diamantopoulos, 2006). Thus the coefficients of the indicators of a formatively measured construct would change depending on the endogenous variables selected to identify the model. Adding reflective items is generally recommended to identify a formative model; the resulting specification is referred to as a MIMIC model (Diamantopoulos and Winklhofer, 2001). In the context of theory development and testing, the objective of the researcher is to create and/or use measures of constructs that have value in more than one model.
Borsboom et al. (2003) describe the interest in latent variables as stemming from “…the intuitive appeal of explaining a wide range of behaviors by invoking a limited number of latent variables” (p. 203). In essence, as the outcome variables change, so will the measure and the researcher's reaction to the measure. Thus, correlation among the observables can be a very real problem. An alternative to the regression formulation is for the items to be summed to form an index (equivalent to assuming that all betas of the observables are fixed to 1). In this case, correlated items will implicitly add more weight to the underlying cause of the correlation; the higher the correlation, the greater the impact. Another possibility would be to conduct a principal components analysis on a set of items and keep one item from each of the orthogonal components, thus ensuring that inter-item correlation is not a problem. Unfortunately, a further argument against forming an index is that summing uncorrelated items can result in information loss (Howell et al., 2007). Bollen (1984) has demonstrated analytically that true reflective items must be correlated with one another. However, consider other measurement models which in a thought experiment appear to be reflective (i.e., the arrows go from the construct to the items or, in Bollen's terms, appear to be effect indicators). Borsboom et al. (2003) suggest that this situation is a neglected area and cite the relationship between diagnostic criteria and mental disorders in the Diagnostic and Statistical Manual of Mental Disorders (1994). In this view the items used to measure the construct may be alternate manifestations of the construct. Thus, in at least some cases, observables that appear to be caused by a latent trait do not correlate, suggesting that inter-item correlation is not a useful criterion for distinguishing formative and reflective measures.
An alternative view (Law et al., 1998) might consider these measures as a profile model. Thus, correlation among indicators is not only of little value in determining the appropriateness of formative or reflective measurement models; such correlation is also problematic in its own right with regard to unwanted meaning. At best, unrecognized correlation among items may lead to lack of stability in regression coefficients; at worst, the items may carry meaning from some unintended and undesirable construct, and the normal procedures for identifying multicollinearity in regression are not necessarily consistent across uses of the formative measure. Thus, the statement that items in a formative measure need not correlate can lead to an unwarranted confidence in the quality of the measure.

2. What is the impact of formative measures in structural equation models?

Prior to addressing the remaining questions, a review of current practices with regard to formative measures in structural equation models may be helpful. First, note that two forms of formative measures can be considered. Eq. (1) expresses the latent variable as a linear combination of observables:

$$\eta = \gamma_1 x_1 + \gamma_2 x_2 + \cdots + \gamma_n x_n \tag{1}$$
Two points are relevant with regard to this formulation. First, this model cannot be estimated without reference to a dependent variable or construct, and second, this model implies that all relevant observables are included as predictors or else the measure would, by definition, be misspecified. To avoid the misspecification problem, an alternative formative measure has been suggested by Bollen and Lennox (1991), Diamantopoulos and Winklhofer (2001), Jarvis et al. (2003), and Law et al. (1998) and explored by MacCallum and Browne (1993). This specification includes an error term, ζ:

$$\eta = \gamma_1 x_1 + \gamma_2 x_2 + \cdots + \gamma_n x_n + \zeta \tag{2}$$
Estimating the parameters of Eq. (2) requires inclusion of outcome measures. According to MacKenzie et al. (2005), each construct with formative indicators must emit paths to at least two unrelated reflective indicators, two unrelated latent constructs or some combination of the two. Fig. 3 represents a model with two latent constructs as outcomes. Notice in this formulation the impact of formative indicators on the outcomes is completely mediated by the latent construct.

4. Does the meaning of a formatively measured construct depend on the dependent variable(s) or construct(s) included in the model?

As recognized by Heise (1972), a construct measured formatively is not just a composite of its measures; rather, “it is the composite that best predicts the dependent variable in the analysis … Thus the meaning of the latent construct is as much a function of the dependent variable as it is a function of its
indicators” (p. 160). This situation is illustrated in the examples presented in Figs. 3 and 4. When dependent constructs change the empirical nature of the formatively measured construct changes. Fig. 3 presents a properly specified formative measurement model. Each of the formative observables (x1 to x4) contributes to the latent variable, such that its empirical realization is consistent with the content of the indicators and presumably its conceptual definition. Fig. 4 depicts a model with the same formative indicators but different outcomes. In contrast to the formatively measured construct in Fig. 3, the formatively measured construct in Fig. 4 is weakly associated with x1 and x2, but strongly associated with x3 and x4. Suppose, for example, that the formatively measured construct is socioeconomic status (SES), following Heise (1972), and assume that x1 is income and x4 is education. SES in Fig. 3 is closely related to both income and education, while in Fig. 4 SES is a strong function of education but not income. That is, SES in one model is not SES in another model — its empirical realization depends on the outcomes included in the model. This situation is problematic, since the nominal definition of the construct has not changed. Burt (1976) refers to this problem as interpretational confounding: “The problem, here defined as ‘interpretational confounding’, occurs as the assignment of empirical meaning to an unobserved variable which is other than the meaning assigned to it by an individual a priori to estimating unknown parameters. Inferences based on the unobserved variable then become ambiguous and need not be consistent across separate models” (p. 4). While Burt's discussion is in the context of reflective measurement, where interpretational confounding can indeed be a problem in the face of few, weak
Fig. 3. Illustration of a formative model. Note. All ly equal 1. Theta epsilon diagonal, te1–te4 equal 0.84. Correlations between all x variables set to 0.2. Chi-square = 0.03 (d.f. = 46). Squared Multiple Correlations for Reduced Form (SMCRF): eta1 = 0.66 (formative construct), eta2 = 0.64, eta3 = 0.43. Covariance matrix found in Appendix.
Fig. 4. Illustration of a formative model with different endogenous constructs. Note. All ly equal 1. Theta epsilon diagonal, te1–te4 equal 0.84. Correlations between all x variables set to 0.2. Chi-square = 0.03 (d.f. = 6). Squared Multiple Correlations for Reduced Form (SMCRF): eta1 (formative construct) = 0.50, eta4 = 0.26, eta5 = 0.47. Covariance matrix found in Appendix.
reflective measures, interpretational confounding is an obvious problem with formative measurement (Howell et al., 2007). The nature of the formatively measured construct changes from model to model and study to study depending on what the formatively measured construct is predicting. If a construct named “A” in one study is substantively different from a construct named “A” in another study, accumulation of knowledge regarding a construct is rendered meaningless or impossible, since the construct in one study is incommensurable with a different construct, but with the same name, in another study. Blalock (1982) makes this point clear in his chapter entitled “The comparability of measurement” (pp. 57–107). Blalock (1982) observes that, “Whenever measurement comparability is in doubt, so is the issue of the generalizability of the underlying theory…If the theory succeeds in one setting but fails in another, and if measurement comparability is in doubt, one will be in the unfortunate position of not knowing whether the theory needs to be modified, whether the reason for the differences lies in the measurement-conceptualization process, or both” (p. 30). The problem Blalock (1982) discusses is apparent in reviews of the SES literature. For example, Bradley and Corwyn (2002) note that the predictive value of specific composites of SES yields inconsistent results across research contexts. They observe that at times components of SES are similar in their correlations with outcome measures, while “At other times they appear to be tapping into different underlying phenomena and seem to be connected to different paths of influence…” (p. 373).

5. What is the error term associated with formatively measured constructs?
Podsakoff et al. (2003) and MacKenzie et al. (2005) claim that models with formative indicators (Eq. (2)) involve constructs with “surplus meaning”. That is, the construct (or latent variable) contains meaning “over and above its simple and mathematical representation” (Podsakoff et al., 2003, p. 621). The rationale for granting surplus meaning to the formative construct in a confirmatory model is that the composite latent variable takes measurement error into account (see Podsakoff et al., 2003, p. 623). This claim is based on the procedure offered by MacCallum and Browne in that “the latent variable would be defined as a linear function of the indicators, plus a disturbance term” (1993, p. 533). However, Diamantopoulos (2006, pp. 14–15) notes that “… any estimate of the error term — and hence the surplus meaning of the construct — is not only a function of the selected indicators but also depends on the selection of the additional constructs or measures used to attain model identification…and thus, the selection of the ‘external’ (i.e. additional variables necessary for achieving identification) is just as crucial in a formative measurement model as is the selection of the formative indicators themselves”. Selection of the outcome variables is crucial, but surplus meaning has nothing to do with the formatively measured construct. In the MIMIC model, the dependent variable (that which is regressed on the formative indicators) is the shared variance of the reflected variables or constructs. The error term, then, is realized as the shared variance between the outcomes not accounted for by the formative indicators. Thus, surplus meaning has to do with the endogenous variables, not the formative construct.
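The dependence of the disturbance term on the outcomes can be made concrete with a deliberately simplified population-level sketch. It assumes a MIMIC model with two observed outcomes whose loadings are both fixed to 1 and whose errors are uncorrelated, so that the covariance between the two outcomes equals Var(η), the indicator weights are γ = Σxx⁻¹σxy, and the disturbance variance is Var(η) − γ′Σxxγ. This is simpler than the models in the figures, where the outcomes are themselves latent constructs, and all numerical values are hypothetical rather than taken from the Appendix; the function name `mimic_solution` is likewise introduced only for this illustration.

```python
import numpy as np

def mimic_solution(sxx, sxy, cov_outcomes):
    """Population solution for a two-outcome MIMIC model with unit loadings.

    sxx          -- covariance matrix of the formative indicators x
    sxy          -- covariances of x with either outcome (equal under unit loadings)
    cov_outcomes -- covariance between the two outcomes, which equals Var(eta)
    Returns the weights gamma, the disturbance variance, and R^2 of eta.
    """
    gamma = np.linalg.solve(sxx, sxy)
    explained = float(gamma @ sxx @ gamma)
    disturbance = cov_outcomes - explained
    return gamma, disturbance, explained / cov_outcomes

# Four indicators with unit variances, all intercorrelated at 0.2 (hypothetical).
sxx = 0.8 * np.eye(4) + 0.2 * np.ones((4, 4))

# Outcome set A: every indicator relates equally to the outcomes.
sxy_a = sxx @ np.array([0.3, 0.3, 0.3, 0.3])
gamma_a, psi_a, r2_a = mimic_solution(sxx, sxy_a, cov_outcomes=1.0)

# Same indicators and same x-outcome covariances, but the outcomes share less variance.
_, psi_a_low, r2_a_low = mimic_solution(sxx, sxy_a, cov_outcomes=0.7)

# Outcome set B: different outcomes that relate mainly to x3 and x4.
sxy_b = sxx @ np.array([0.05, 0.05, 0.5, 0.5])
gamma_b, psi_b, r2_b = mimic_solution(sxx, sxy_b, cov_outcomes=1.0)

print(np.round(gamma_a, 2), round(r2_a, 3), round(r2_a_low, 3))
print(np.round(gamma_b, 2), round(r2_b, 3))
```

Under these assumptions the proportion of variance in η explained by the indicators rises from 0.576 to about 0.823 simply because the two outcomes covary less, and the weights change entirely when the outcomes change, although the formative indicators themselves are untouched in every case.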
This issue can also be illustrated by a comparison of the models in Figs. 3 and 4. The model in Fig. 3 suggests that the formative construct explains a substantial portion of the variance in both η2 and η3. More interesting is the error term associated with the formative construct. The model suggests that 34% of the variance in the formative construct is unexplained and hence represents surplus meaning. Based on the above proposed interpretation, the understanding of the disturbance term is that it is something that is associated with the formative construct. As Diamantopoulos (2006) notes, and as can be seen in Fig. 4, the disturbance term changes when different endogenous constructs are included in the model without any changes to the formative measures. In this model the formative construct (η1) is not as influential in explaining the endogenous constructs. More importantly, the disturbance term of the construct increases and the exogenous variables explain less variance in the construct (50% as opposed to 66% of the previous model). Furthermore, the relative contribution of the exogenous variables is altered as compared to the model presented in Fig. 3. Given that the exogenous variables (formative measures) are the same in both models the interpretation of the disturbance term as being associated with the formative construct appears to be ill-founded. This point is further illustrated in the model found in Fig. 5. This model is comparable to the one presented in Fig. 3 except that the association between the endogenous construct η2 and η3 is smaller. The reduction in the disturbance term (from 34% to 7% unexplained variance) further suggests that the error term is
associated with the covariance of the endogenous constructs one wants to explain rather than the formative measures used to interpret the meaning of the formative construct. Hence, Diamantopoulos (2006) may be correct in questioning the idea that the disturbance term represents surplus meaning of the formative construct. This evidence suggests that η1 in Figs. 3–5 may be more appropriately considered a (second-order, in this case) factor deriving its meaning from the variance common to η2 and η3, and predicted by x1–x4.

6. Should a structural equation model with one or more formatively measured constructs be expected to exhibit adequate model fit?

Formative indicators need not covary, need not have the same nomologic net, and hence “are not required to have the same antecedents and consequences” (Jarvis et al., 2003, p. 203). Indeed, referring to indicators used to (formatively) measure charismatic leadership, Podsakoff et al. (2003) suggest that, “Moreover, the antecedents and consequences of these diverse forms of leader behavior would not necessarily be expected to be the same” (p. 650). If this is the case, how can they be summarized as a single, meaningful construct with common antecedents and consequences? The models in Figs. 3–5 imply that the effects of the xi are completely mediated (Baron and Kenny, 1986) by η1. The xi as formative indicators are not required to have the same consequences, and thus x1 may relate strongly to η2 and weakly to η3, while x3, for example, may relate strongly to η3 and weakly or negatively to η2, yet their effects are hypothesized to flow through a single construct (η1) with some unitary
Fig. 5. Illustration of a formative model in which the covariance between y1–y4 and y5–y8 is lower than in Fig. 3 (0.99 as opposed to 1.39, see covariance matrix in Appendix). Note. All ly equal 1. Theta epsilon diagonal, te1–te4 equal 0.84. Correlations among all x variables set to 0.2. Chi-square = 0.02 (d.f. = 46). Squared Multiple Correlations for Reduced Form (SMCRF): eta1 (formative construct) = 0.93, eta2 = 0.64, eta3 = 0.43. Covariance matrix found in Appendix.
interpretability. That is, η1 is “something”, and it is supposed to be the same “something” in its relationship with both η2 and η3, yet each x is connected to η1 through only one γ. In the case where the xi relate differently to the included ηi, (a) substantial lack of fit in the model will be evident, and (b) it will be difficult to interpret the meaning of η1 either in terms of the (potentially nonexistent) covariance of η2 and η3 or the formative indicators. As illustrated by Howell et al. (2007) it is easy to appreciate the difficulty in fitting a model under these circumstances, and the difficulty should be readily apparent in the magnitude of lack of fit. The formatively measured construct fails to function as a “point variable” (Burt, 1976). The formative items lack external consistency, which Anderson and Gerbing (1982) describe as the condition in which the indicators correlate with other constructs in proportion to their correlation with their own construct. In a structural equation model, external consistency is necessary for fit in formatively as well as reflectively measured constructs. Since reflective measures are conceptualized as having a common cause, they can be expected to intercorrelate (except when alternate manifestations of the construct are used as indicators) and one has some reason to believe that they might therefore relate similarly to other constructs. Internal consistency among formative indicators is not applicable, however, so external consistency becomes the sole condition for assessing the degree to which a formatively measured construct functions as a unitary entity. As noted previously, however, Bollen and Lennox (1991) and Jarvis et al. (2003) among others, note that formative indicators need not covary, need not have the same nomologic net, and hence “are not required to have the same antecedents and consequences” (Jarvis et al., 2003, p. 203). This view seems directly at odds with the concepts of point variability and external consistency.
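The proportionality requirement behind external consistency can be illustrated with a small population-level sketch. If the effects of the x's on two outcomes flow only through η1, the matrix of covariances between the indicators and the outcomes equals Σxxγβ′, which has rank one, so every indicator's covariances with the two outcomes stand in the same ratio. Once individual indicators affect the outcomes by other routes, that matrix has rank two and a fully mediated model cannot reproduce it. For simplicity the outcomes are treated here as variables with known population covariances, and the weights, paths, and direct effects are all hypothetical.

```python
import numpy as np

# Four indicators with unit variances, all intercorrelated at 0.2 (hypothetical).
sxx = 0.8 * np.eye(4) + 0.2 * np.ones((4, 4))
gamma = np.array([0.3, 0.3, 0.3, 0.3])  # weights of x on eta1 (hypothetical)
beta = np.array([0.8, 0.6])             # paths from eta1 to the two outcomes

def indicator_outcome_cov(sxx, gamma, beta, direct=None):
    """Covariances between indicators (rows) and outcomes (columns).

    Each outcome k is beta[k] * eta1 plus optional direct effects of x
    (column k of `direct`) plus an error uncorrelated with x.
    """
    effects = np.outer(gamma, beta)
    if direct is not None:
        effects = effects + direct
    return sxx @ effects

# Complete mediation: every effect of x passes through eta1.
mediated = indicator_outcome_cov(sxx, gamma, beta)

# Indicators with different consequences: x1 also drives the first outcome
# and x3 also drives the second one.
direct = np.zeros((4, 2))
direct[0, 0] = 0.3
direct[2, 1] = 0.3
unmediated = indicator_outcome_cov(sxx, gamma, beta, direct)

rank_mediated = np.linalg.matrix_rank(mediated, tol=1e-10)
rank_unmediated = np.linalg.matrix_rank(unmediated, tol=1e-10)
ratios = mediated[:, 0] / mediated[:, 1]  # constant and equal to beta[0] / beta[1]

print(rank_mediated, rank_unmediated)
print(np.round(ratios, 3))
```

The rank-one pattern is what a model with a single mediating construct implies; indicators that are free to have different consequences will generally violate it, which is the source of the lack of fit discussed above.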
The researcher has no reason to expect a model containing formative indicators to fit. In a model where formative indicators of a construct have different consequences, lack of fit is to be expected.

7. Are (formative) causes of constructs necessary for their definition and measurement?

Jarvis et al. (2003, p. 214) refer to a model similar to Fig. 3 but with five x's, where η2 and η3 are interpreted as, “content-valid measures tapping the overall level of the construct (e.g., overall satisfaction, overall assessment of perceived risk, overall trust, etc.), and their x1–x5 are measures of the key conceptual components of the construct (e.g., facets of satisfaction, risk, or trust).” They suggest (p. 214) that the facets and the overall measures should be considered as a measurement model with five formative and two reflective indicators. “Indeed, in this instance, it would not make sense to interpret this structure as five exogenous variables … influencing a separate and distinct endogenous latent construct … with two reflective indicators, because the five causes of the construct are all integral aspects of it and the construct cannot be defined without reference to them” (Jarvis et al., 2003, p. 214, emphasis added). First note that one of the criteria for causation employed by Edwards and Bagozzi (2000) is the requirement that the cause and the effect be distinct entities. It
does not make sense to say that formative indicators are both causes and integral aspects of a construct. Further, satisfaction can be, and has been, defined without reference to its antecedents, and numerous studies have measured overall satisfaction and its consequences (e.g., loyalty, repurchase intention, likelihood of recommending) without any measurement of its antecedents. Admittedly, understanding the drivers of satisfaction can lend insight and managerial relevance, but they need not be considered definitional. Similarly, MacKenzie et al. (2005) propose a model with two reflective indicators of job satisfaction (“Overall, how satisfied are you with your job?” and “Generally speaking, I am very satisfied with all facets of my job”) along with several predictors (such as satisfaction with pay, coworkers, supervisor, the work itself, etc.). They argue that such a model “should be viewed as a single latent construct with a mixture of formative and reflective indicators rather than as a single reflective-indicator latent construct with multiple causes because the indicators… all relate to the same conceptual domain specified in the construct definition and are all content valid operationalizations of the same construct” (p. 727). However, overall job satisfaction can be defined and measured without reference to its causes (see, for example, Bagozzi, 1980 or Scarpello and Campbell, 1983). Further, satisfaction with pay, for example, is not a valid operationalization of the same construct (overall satisfaction). Again, one may wish to examine predictors of job satisfaction, but its sources are not an inherent part of its definition and are unnecessary for its measurement. A similar perusal of the exemplars of constructs with formative indicators presented by Jarvis et al.
(2003), the formative examples given by Diamantopoulos and Winklhofer (2001), the constructs considered by Bollen and Lennox (1991), and the cases investigated by Bollen and Ting (2000) suggests few that could not be, at least conceptually, defined without reference to their causes and measured reflectively. Constructs need not be defined by their antecedents.

8. Why develop formative measures?

Citing difficulties arising from identifying formative models with measurement error at the construct level (see MacCallum and Browne, 1993), Jarvis et al. (2003) suggest that, “In our view, the best option for resolving the identification problem is to add two reflective indicators to the formative construct, when conceptually appropriate. The advantages of doing this are that (a) the formative construct is identified on its own and can go anywhere in the model, (b) one can include it in a confirmatory factor model and evaluate its discriminant validity and measurement properties, and (c) the measurement parameters should be more stable and less sensitive to changes in the structural relationships emanating from the formative construct” (p. 213). Similarly, the formative scale development processes described by Diamantopoulos and Winklhofer (2001) and MacKenzie et al. (2005) also require at least two reflective indicators. Reflective indicators should be obtained, but if the researcher is able to obtain two reflective indicators, obtaining at least
one more reflective indicator (but preferably several more) would result in a testable reflectively measured construct, and the problem of developing formative measures vanishes. If the researcher wants to predict the construct in question, then the formative part of the model is an option, but the model and its parameters have a much more straightforward interpretation.

3. Summary

While formative measurement has received increasing attention, employing formative measurement in theory testing research remains somewhat problematic. With regard to the meaning of formatively measured constructs, this paper suggests that 1) Constructs themselves are inherently neither formative nor reflective, suggesting that the researcher often has a choice between formative and reflective measurement; 2) The indicators in a measurement model do not inform the decision as to whether items should be modeled formatively or reflectively; and 3) Inter-indicator correlation is not a valid basis to determine whether items should be modeled formatively or reflectively. With regard to analysis issues the paper concludes that 4) The empirical meaning of a formatively measured construct depends on the outcome variables in the model, such that while the name of a formatively measured construct may remain the same, the construct's empirical realization will vary from model to model and study to study; 5) The meaning of the error term associated with formatively measured constructs in structural equations models is more closely associated with the constructs dependent on the formative construct and their correlation than on the formative measures; 6) Since formative indicators of a single construct may have different consequences, the indicators should not be expected to perform as a unitary entity in a structural equations model, and thus adequate model fit should not be expected; 7) Causes of constructs are not necessary for their definition, and thus the so-called formative measures of a construct with reflective measures are optional, not necessary; and 8) Since recent work on developing formative measures suggests that at least two reflective measures be obtained, by simply adding at least one more reflective item formative measurement ceases to be an issue. Causes of the reflectively measured construct can be modeled, but they need not be considered measurement.

In dealing with existing formative measures, the suggestion of Howell et al. (2007) that the formative items be examined as independent predictors with separate direct effects on the outcome variables of interest is appropriate. In the case of a single outcome, such a direct effects model and the formative model in Eq. (1) are equivalent. Further, interaction effects among the formative indicators can be considered, providing potentially greater insight.

In summary, this study supports Borsboom et al. (2003) and Ping (2004) who question the status of formatively measured constructs as latent variables. The provisional answers to the questions raised above suggest that, in the context of theory testing, formative measurement (at this stage of development, at least) should not be considered an equally good alternative to the reflective measurement model which has served the social sciences well for many decades. Perhaps formative measurement conceptualizations and procedures that overcome the problems raised will be developed. Given the intuitive appeal and potential practical benefits of formative measurement, researchers may benefit from such efforts.
Appendix A

Covariance matrix for examples in Figs. 3–5 (upper triangle shown; the matrix is symmetric)

        y1    y2    y3    y4    y5    y6    y7    y8    y9   y10   y11   y12   y13   y14   y15   y16    x1    x2    x3    x4
y1    3.35  2.51  2.51  2.51  1.39  1.39  1.39  1.39  1.02  1.02  1.02  1.02  0.73  0.73  0.73  0.73  0.95  0.84  0.73  0.62
y2          3.35  2.51  2.51  1.39  1.39  1.39  1.39  1.02  1.02  1.02  1.02  0.73  0.73  0.73  0.73  0.95  0.84  0.73  0.62
y3                3.35  2.51  1.39  1.39  1.39  1.39  1.02  1.02  1.02  1.02  0.73  0.73  0.73  0.73  0.95  0.84  0.73  0.62
y4                      3.35  1.39  1.39  1.39  1.39  1.02  1.02  1.02  1.02  0.73  0.73  0.73  0.73  0.95  0.84  0.73  0.62
y5                            1.62  1.22  1.22  1.22  1.02  1.02  1.02  1.02  0.73  0.73  0.73  0.73  0.54  0.48  0.42  0.35
y6                                  1.62  1.22  1.22  1.02  1.02  1.02  1.02  0.73  0.73  0.73  0.73  0.54  0.48  0.42  0.35
y7                                        1.62  1.22  1.02  1.02  1.02  1.02  0.73  0.73  0.73  0.73  0.54  0.48  0.42  0.35
y8                                              1.62  1.02  1.02  1.02  1.02  0.73  0.73  0.73  0.73  0.54  0.48  0.42  0.35
y9                                                    1.73  0.89  0.89  0.89  0.78  0.78  0.78  0.78  0.19  0.24  0.34  0.38
y10                                                         1.73  0.89  0.89  0.78  0.78  0.78  0.78  0.19  0.24  0.34  0.38
y11                                                               1.73  0.89  0.78  0.78  0.78  0.78  0.19  0.24  0.34  0.38
y12                                                                     1.73  0.78  0.78  0.78  0.78  0.19  0.24  0.34  0.38
y13                                                                           1.81  1.41  1.41  1.41  0.32  0.40  0.56  0.64
y14                                                                                 1.81  1.41  1.41  0.32  0.40  0.56  0.64
y15                                                                                       1.81  1.41  0.32  0.40  0.56  0.64
y16                                                                                             1.81  0.32  0.40  0.56  0.64
x1                                                                                                     1.0   0.2   0.2   0.2
x2                                                                                                           1.0   0.2   0.2
x3                                                                                                                 1.0   0.2
x4                                                                                                                       1.0
Note. Covariances between y1–y4 and y5–y8 are 0.99 instead of 1.39 for Fig. 5.
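The direct-effects alternative discussed in the text can be tried on the Appendix A covariances. The following is a minimal sketch, not the authors' analysis: it regresses one representative y from each block of four on x1–x4 using only the tabled covariances, then checks whether the resulting weight vectors are proportional across outcomes, as a single formatively measured construct would require. Treating y1–y4, y5–y8, y9–y12, and y13–y16 as four indicator blocks is an assumption read off the repeating pattern of the matrix, and the helper names `direct_effects` and `ratio_spread` are hypothetical.

```python
import numpy as np

# Covariances among x1..x4 (Appendix A): unit variances, 0.2 off the diagonal.
S_xx = np.full((4, 4), 0.2) + 0.8 * np.eye(4)

# Covariances of x1..x4 with one representative y from each block of four.
# The block structure is an assumption based on the pattern in the matrix.
s_xy = {
    "y1": np.array([0.95, 0.84, 0.73, 0.62]),
    "y5": np.array([0.54, 0.48, 0.42, 0.35]),
    "y9": np.array([0.19, 0.24, 0.34, 0.38]),
    "y13": np.array([0.32, 0.40, 0.56, 0.64]),
}


def direct_effects(s):
    """Regression weights of one outcome on x1..x4, computed from covariances."""
    return np.linalg.solve(S_xx, s)


def ratio_spread(b_num, b_den):
    """Max minus min of the elementwise ratio; 0 means exact proportionality."""
    r = b_num / b_den
    return float(r.max() - r.min())


b = {name: direct_effects(s) for name, s in s_xy.items()}

for name, weights in b.items():
    print(name, np.round(weights, 3))

print("ratio spread, y5 vs y1:", round(ratio_spread(b["y5"], b["y1"]), 3))
print("ratio spread, y9 vs y1:", round(ratio_spread(b["y9"], b["y1"]), 3))
```

On these covariances, the weights for y1 and y5 come out close to proportional (the small spread is within rounding of the tabled values), while the weights for y9 order the x's in the opposite direction. With a single outcome, any set of direct-effect weights can be absorbed into one formative composite, so this comparison only becomes informative when two or more outcomes are modeled.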
References

Aguinis H, Henle C. Effects of nonverbal behavior on perceptions of a female employee's power bases. J Soc Psychol 2001;141:537–49.
Anderson JC, Gerbing DW. Some methods for respecifying measurement models to obtain unidimensional construct measurement. J Mark Res 1982;19:453–60.
Bagozzi RP. Performance and satisfaction in an industrial sales force: an examination of their antecedents and simultaneity. J Mark 1980;44:65–77.
Baron RM, Kenny DA. The moderator–mediator variable distinction in social psychological research: conceptual, strategic and statistical considerations. J Pers Soc Psychol 1986;51:1173–82.
Blalock HM. Causal inference in nonexperimental research. New York: Norton; 1964.
Blalock HM. Conceptualization and measurement in the social sciences. Beverly Hills: Sage; 1982.
Bollen KA. Multiple indicators: internal consistency or no necessary relationship? Quality and Quantity 1984;18:377–85.
Bollen KA. Latent variables in psychology and the social sciences. Ann Rev Psychol 2002;53:605–34.
Bollen KA, Lennox R. Conventional wisdom on measurement: a structural equation perspective. Psychol Bull 1991;110:305–14.
Bollen KA, Ting K. A tetrad test for causal indicators. Psychol Methods 2000;5:3–32.
Borsboom D, Mellenbergh GJ, van Heerden J. The theoretical status of latent variables. Psychol Rev 2003;110:203–19.
Borsboom D, Mellenbergh GJ, van Heerden J. The concept of validity. Psychol Rev 2004;111:1061–71.
Bradley RH, Corwyn RF. Socioeconomic status and child development. Ann Rev Psychol 2002;53:371–99.
Burt RS. Interpretational confounding of unobserved variables in structural equation models. Soc Meth Res 1976;5:3–52.
Churchill GA. A paradigm for developing better measures of marketing constructs. J Mark Res 1979;16:64–73.
Cohen P, Cohen J, Teresi J, Marchi M, Velez C. Problems in the measurement of latent variables in structural equations causal models. App Psychol Meas 1990;14:183–96.
Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Association, 4th ed. 1994.
Diamantopoulos A. The error in formative measurement models: interpretation and modeling implications. J Modell Manag 2006;1:7–17.
Diamantopoulos A, Winklhofer HM. Index construction with formative indicators. J Mark Res 2001;38:269–77.
Edwards JR, Bagozzi RP. On the nature and direction of relationships between constructs and measures. Psychol Methods 2000;5:155–74.
Fisicaro SA, Lance CE. Implications of three causal models for the measurement of halo error. App Psychol Meas 1990;14:419–29.
Fornell C, Bookstein FL. Two structural equation models: LISREL and PLS applied to consumer exit-voice theory. J Mark Res 1982;19:440–52.
French JR, Raven B. The bases of social power. In: Cartwright D, editor. Studies of social power. Ann Arbor, MI: Institute for Social Research; 1959.
Gaski JF, Nevin JR. The differential effects of exercised and unexercised power sources in a marketing channel. J Mark Res 1985;22:130–42.
Heise DR. Employing nominal variables, induced variables, and block variables in path analysis. Soc Meth Res 1972;1:147–73.
Howell RD. Covariance structure modeling and measurement issues: a note on interrelations among a channel entity's power sources. J Mark Res 1987;24:119–26.
Howell RD, Breivik E, Wilcox JB. Reconsidering formative measurement. Psychol Methods 2007;12:205–18.
Jarvis CB, MacKenzie SB, Podsakoff P. A critical review of construct indicators and measurement model misspecification in marketing and consumer research. J Cons Res 2003;30:199–216.
Kohli AK, Jaworski BJ, Kumar A. MARKOR: a measure of market orientation. J Mark Res 1993;30:467–77.
Kluegel JR, Singleton R, Starnes CE. Subjective class identification: a multiple indicators approach. Amer Soc Rev 1977;42:599–611.
Law KS, Wong C, Mobley WH. Toward a taxonomy of multidimensional constructs. Acad Manage Rev 1998;23:741–55.
MacCallum RC, Browne MW. The use of causal indicators in covariance structure models: some practical issues. Psychol Bull 1993;114:533–41.
MacKenzie SB, Podsakoff PM, Jarvis CB. The problem of measurement model specification in behavioral and organizational research and some recommended solutions. J App Psychol 2005;90:710–30.
Ping Jr RA. On assuring valid measurement for theoretical models using survey data. J Bus Res 2004;57:125–41.
Podsakoff P, Organ D. Self reports in organizational research: problems and prospects. J Manage 1986;12:531–44.
Podsakoff PM, MacKenzie SB, Podsakoff NP, Lee JY. The mismeasure of man(agement) and its implications for leadership research. Leadership Quarter 2003;14:615–56.
Rossiter JR. The C-OAR-SE procedure for scale development in marketing. Intern J Res Mark 2002;19:305–35.
Scarpello V, Campbell JP. Job satisfaction: are all the parts there? Personnel Psychol 1983;36:577–600.