Experimental designs in management and leadership research: Strengths, limitations, and recommendations for improving publishability


Philip M. Podsakoff a,⁎, Nathan P. Podsakoff b

a Department of Marketing, Warrington College of Business Administration, University of Florida, Gainesville, FL 32611, United States of America
b Department of Management and Organizations, Eller College of Management, University of Arizona, Tucson, AZ 85720-1080, United States of America

Keywords: Experimental designs; Laboratory experiments; Field experiments; Quasi-experiments; Causal inference

Abstract

Despite the renewed interest in the use of experimental designs in the fields of leadership and management over the past few decades, these designs are still relatively underutilized. Although there are several potential reasons for this, chief among them is misunderstanding the value of these designs. The purpose of this article is to review the role of laboratory, field, and quasi-experimental designs in management and leadership research. We first discuss the primary goals of experimental studies. Next, we examine the characteristics of experimental designs and how to distinguish laboratory, field, and quasi-experiments from one another and from non-experimental studies. Following these discussions, we provide examples of each type of experimental design and discuss their relative strengths and limitations. Finally, we discuss steps that researchers can take to increase the probability of having articles reporting experiments accepted by leadership and management journals.

We consider the … experiment to be the core research method … In advocating the experimental method, we are taking it as axiomatic that the purpose for which this method is best suited is that of testing theory rather than describing the world as it is. Without doubt, for descriptive and exploratory purposes, there are alternative models of systematic observation and data collection that can better serve the needs of the researcher. However, for subjecting theory-inspired hypotheses about causal relationships to potential confirmation or disconfirmation, the experiment is unexcelled in its ability to provide unambiguous evidence about causation, to permit control over extraneous variables, and to allow for analytic exploration of the dimensions and parameters of a complex phenomenon. (Aronson, Brewer, & Carlsmith, 1985, p. 443)

Introduction

If the growing percentage of articles published in industrial/organizational (I/O) psychology and management over the past few decades

is any indication, then there is renewed interest in the use of experimental designs. Several authors (Austin, Scherbaum, & Mahlman, 2002; Colquitt, 2008; Griffin & Kacmar, 1991; Scandura & Williams, 2000; Stone-Romero, Weaver, & Glenar, 1995; Taylor, Goodwin, & Cosier, 2003) have chronicled the downward trend in publication rates of experimental studies, particularly those conducted in laboratory settings, from the 1960s through the late 1990s. However, our examination of recent leadership, management, and I/O psychology publications suggests that this trend may be reversing.1 Indeed, the results of our review indicated that although the percentage of articles using laboratory, field, or quasi-experimental designs remained relatively stable between 1990 and 2009 at around 7%, it increased to 8% during 2010–2014, and then to almost 11.5% during 2015–2018, with the vast majority of the experiments (about 83%) conducted in the laboratory.

One obvious reason for the use of experimental designs is their ability to provide evidence of causality (Antonakis, Bendahan, Jacquart, & Lalive, 2010; Campbell & Stanley, 1963; Colquitt, 2008; Falk & Heckman, 2009). Indeed, as indicated by the quotation at the beginning of this article, the power of experiments to establish cause-and-effect relationships is critical to the development of knowledge in



⁎ Corresponding author.
1 We searched the Academy of Management Journal, Administrative Science Quarterly, Journal of Applied Psychology, Journal of Organizational Behavior, Leadership Quarterly, Journal of Management, and Personnel Psychology, using the key words “experiment,” “experiments,” “experimental,” “laboratory experiment,” “quasi-experiment,” “quasi-experiments,” “field experiment,” and “field experiments.” We excluded articles that were not empirical in nature; using the remaining articles, we calculated the percentage of articles that were experimental in nature in six time periods (1990–1994, 1995–1999, 2000–2004, 2005–2009, 2010–2014, and 2015–March 2018).
https://doi.org/10.1016/j.leaqua.2018.11.002
Received 14 June 2018; Received in revised form 28 October 2018; Accepted 5 November 2018
1048-9843/© 2018 Published by Elsevier Inc.



encouraging, there remains a need for a more comprehensive discussion of the strengths and potential limitations of experimental designs. Therefore, this article is intended to provide an integrative review of the role of experiments in management and leadership research. First, we discuss the basic goals of experimental designs and identify the conditions necessary to establish causal relationships. Next, we: (a) examine the characteristics of experiments and discuss how to distinguish between laboratory, field, and quasi-experimental designs, (b) provide examples of each type from the leadership literature, and (c) discuss the relative strengths and limitations of each type of experimental design. Finally, we discuss some of the practical issues researchers encounter when using experimental methods in various research settings and provide some recommendations for addressing them. Although this article is intended primarily for doctoral students, our recommendations should also prove worthwhile for any researcher interested in improving the rigor and publishability of their experimental research.

the organizational and behavioral sciences. It is therefore not surprising that Jones (1985, p. 282) argued that experiments are “the most powerful technique[s] available for demonstrating causal relationships between variables,” and other scholars (Antonakis, 2017; Eden, 2017; Hauser, Linos, & Rogers, 2017; Holmes, 2014; Kenny, 1979) have referred to them as the “gold standard” of scientific research. Similarly, several researchers (Aronson et al., 1985; Colquitt, 2008; Fisher, 1984; Ilgen, 1986) have noted the importance of experimental designs for testing theory and helping us to develop a better understanding of the complex world in which we live. Finally, we do not believe that it is coincidental that the growth in the percentage of experimental studies over the past decade follows Colquitt's (2008) call in the Academy of Management Journal (AMJ) for more laboratory experiments. Although Colquitt's editorial was directed at researchers interested in publishing experiments in AMJ, it also served as a signal that other management journals might be interested in experimental research. The recent appearance in management and leadership journals of editorial statements citing Colquitt (2008) and echoing his call for laboratory experiments (cf. Anderson & Edwards, 2015; Antonakis, 2017; Mueller, 2018; Van Witteloostuijn, 2015; Zellmer-Bruhn, Caligiuri, & Thomas, 2016) lends weight to this proposition.

That said, we still believe that experiments (particularly laboratory experiments) are often under-appreciated in management and leadership research. This lack of appreciation stems from the criticisms directed at such designs over the years. First are the criticisms of laboratory research for a presumed lack of realism, use of student subjects, and concerns about external validity (e.g., Colquitt, 2008; Greenberg & Tomlinson, 2004; Ilgen, 1986; Mook, 1983; Taylor et al., 2003). Campbell (1986, p. 276) has captured the essence of these criticisms, noting that “in the minds of its critics, laboratory research is of low quality, experimental in nature, theoretical and esoteric, and rigidly controlled to the point of sterility. Worst of all, it uses students as subjects.” Although Colquitt (2008) noted that researchers may be more likely to subscribe to this view than editors and reviewers, the effect remains the same – fewer papers reporting laboratory experiments are submitted (and subsequently published) in management journals.

Next, we believe that there is a general misunderstanding of some basic characteristics of experimental designs and their subsequent strengths and limitations. For example, it is not uncommon for laboratory experiments to be criticized for their “artificiality” and the amount of control they exercise over extraneous variables (Babbie, 2014), even though this control is among the most important virtues of the method (Henshel, 1980; Mook, 1983; Webster & Sell, 2014). As long as these misunderstandings persist, it is unlikely that the advantages of experimental methods will be fully appreciated.

Third, several researchers (Greenberg & Tomlinson, 2004; Stone-Romero et al., 1995) have noted that it is easier to administer questionnaire surveys in field settings than to conduct experiments in laboratory settings, and that the growth in the use of covariance structure and multilevel analyses has provided researchers with more sophisticated ways of analyzing survey data.

Finally, like Schwenk (1982) and Taylor et al. (2003), we think that many researchers perceive that rigor and relevance are orthogonal concepts and that relevance should take precedence over rigor. This false dichotomy has also been noted by other scholars (Lonati, Quiroga, Zehnder, & Antonakis, 2018). However, misplaced criticism of laboratory experiments leads many researchers to focus their efforts on non-experimental, questionnaire-based research in organizational settings, rather than on experimental studies in more controlled settings.

Of course, we are not implying that experimental methods have been completely neglected in leadership and management. For example, in addition to Colquitt's (2008) call for more laboratory experiments, Grant and Wall's (2009) article on the benefits of quasi-experiments in organizational settings, and the recent articles by Chatterji, Findley, Jensen, Meier, and Nielson (2016), Eden (2017), and Hauser et al. (2017), on the virtues of field experiments have made important contributions to the literature. However, although these articles are

What are the goals of experimental research designs?

The basic goal of experimental research designs is to determine the causal relationships between independent and dependent variable(s). Although there is some debate about what constitutes a causal relationship (e.g., Cheng, 1997; Cook & Campbell, 1979; Pearl, 2000; Spirtes, Glymour, & Scheines, 1993), most organizational and behavioral scientists (cf. Antonakis et al., 2010; Bickman & Rog, 2009; Campbell, 1957; Cook & Campbell, 1979; deVaus, 2001) subscribe to the idea that a cause-effect relationship is established with three criteria: (a) covariation between the independent and dependent variables; (b) temporal precedence, such that variation in the independent variable precedes variation in the dependent variable; and (c) alternative explanations for the observed relationship have been ruled out.

The importance of the second and third criteria for establishing a causal relationship can be illustrated with a simple example. Suppose a researcher hypothesizes that supportive leader behaviors (SLB) increase employees' task performance (TP), and that after gathering data on these variables in an organizational setting the researcher finds that there is a relatively strong positive correlation (r = 0.63) between the measures of these variables. On this basis, the researcher might conclude that SLB causes an improvement in TP. However, the correlation can be explained by a variety of alternate causal relationships between these variables, which are illustrated in Fig. 1. First, as indicated in Panel 1, it is possible that the observed correlation between SLB and employees' TP supports the hypothesis that SLB cause employees to perform better. Second, as indicated in Panel 2, it is also possible that this correlation reflects the fact that leaders are more supportive of employees who perform well. In other words, high employee TP causes leaders to exhibit more SLB. A third possibility (illustrated in Panel 3) is that SLB and employee TP are reciprocally related: supportive leaders elicit better performance from their employees and this high performance is reinforced by yet more support from leaders. Of course, it is also possible that the observed correlation between SLB and employee TP is spurious and due to a third (confounding) variable. For example, it is possible that the organization's reward system causes SLB and employee TP to covary, although they are not causally related. This spurious relationship is shown in Panel 4. Finally, as illustrated in Panel 5, the correlation between SLB and TP may be moderated by another variable, such that the relationship is positive at one level of the moderator and weaker, non-existent, or negative at another level of the moderator.

The role of experimental designs in minimizing threats to internal validity

Experimental research designs are important because they minimize threats to internal validity. Internal validity is the confidence a

The Leadership Quarterly xxx (xxxx) xxx–xxx

P.M. Podsakoff, N.P. Podsakoff

Fig. 1. Possible causal relationships between supportive leader behaviors (SLB) and employee task performance (TP).
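To make the spurious-relationship case in Panel 4 of Fig. 1 concrete, the short simulation below is our own illustrative sketch (the variable names, coefficients, and the use of Python/NumPy are assumptions, not material from the article). It generates SLB and TP scores that are both driven by a single confounder (the organization's reward system) and by nothing else; the two measures correlate at roughly r = 0.63 even though neither causes the other, and the association disappears once the confounder is removed.

```python
# Illustrative sketch of Panel 4 in Fig. 1: a confounder produces a spurious
# SLB-TP correlation. All names and effect sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

reward_system = rng.normal(size=n)                  # unmeasured confounder
slb = 1.3 * reward_system + rng.normal(size=n)      # SLB depends only on rewards
tp = 1.3 * reward_system + rng.normal(size=n)       # TP depends only on rewards

r_observed = np.corrcoef(slb, tp)[0, 1]
print(f"Observed SLB-TP correlation: {r_observed:.2f}")            # ~0.63

# Because the data were simulated, the confounder can be stripped out exactly;
# what remains of SLB and TP is uncorrelated.
r_deconfounded = np.corrcoef(slb - 1.3 * reward_system,
                             tp - 1.3 * reward_system)[0, 1]
print(f"Correlation after removing the confounder: {r_deconfounded:.2f}")  # ~0.00
```

A non-experimental study that measured only SLB and TP could not distinguish this scenario from Panels 1–3; random assignment of the independent variable, by contrast, breaks its dependence on the reward system.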

whether an independent variable causes changes in a dependent variable is to manipulate the independent variable quantitatively (e.g., in terms of its magnitude, intensity, or frequency). For example, a leadership researcher might be interested in comparing the effects of high and low levels of autocratic leadership. Such an experiment would require the researcher to define the conceptual domain of autocratic leadership, to operationalize it, to manipulate its level (so one group is exposed to high levels and another group is exposed to low levels), and then to observe the effects on the dependent variable(s) of interest. However, if an experiment includes a more extensive range of values of the independent variable (e.g., low, medium, and high), it is possible to explore curvilinear effects of the independent variable on the dependent variable(s). Yet another possibility is that a researcher is interested in comparing the effects of two qualitatively different types of leadership behavior (e.g., SLB and charismatic leadership behavior) on one or more dependent variables. Such an experiment would require establishing a conceptual distinction between the two forms of leader behavior, followed by the manipulation of these behaviors and comparison of their effects. However, since we are dealing with qualitatively different treatments, the researcher must also establish that the comparison is fair – i.e., the manipulations represent equivalent levels of the respective constructs. According to Cooper and Richardson (1986, p. 179), fair comparisons require that:

researcher has that a change (whether naturally occurring or due to manipulation) in the independent variable causes the observed change in the dependent variable. Although there are a number of confounding variables that may threaten internal validity, the most prominent include selection, history, maturation, testing, instrumentation, regression, mortality and selection by maturation interactions (Campbell & Stanley, 1963; Cook & Campbell, 1976, 1979; Crano, Brewer, & Lac, 2015). Definitions and examples of these threats are provided in Table 1. As we note below, laboratory experiments are particularly well-suited to minimizing threats to internal validity and establishing causal relationships, because participants are randomly assigned to treatments and because these designs offer the researcher a high degree of control over the independent and extraneous variables.

What are the characteristics that differentiate between types of experimental studies and between experimental and non-experimental studies?

Although there are many types of experimental designs (e.g., Eden, 2017; Harrison & List, 2004; Shadish, Cook, & Campbell, 2002), we focus on the three designs most widely used in management and leadership research: laboratory experiments, field experiments, and quasi-experiments. Fig. 2 provides four questions that researchers can use to distinguish between these types of experiments and between experimental and non-experimental designs.

The first question is whether the independent variable is explicitly manipulated or not. In a study designed to determine whether changes in an independent variable have an effect on changes in a dependent variable there should be at least two different treatment conditions. For example, a researcher interested in the simple question of what effect the manipulation of an independent variable (e.g., abusive supervision) has on a dependent variable (e.g., counterproductive employee behavior) might investigate this by comparing a treatment group (exposed to abusive supervision) with a control group that does not receive the treatment. Experiments that compare the effects of the presence or absence of an independent variable using an experimental and a control group may prove particularly useful in the early stages of a research program, when a leadership researcher is trying to determine whether the independent variable has any effect on the outcome(s) of interest.

Another experimental technique that can be used to determine

“the competing theories, factors, or variables are operationalized, manipulated, or measured with equivalent strength. By equivalent strength we mean that: (a) the competing theories, factors, or variables are operationalized, manipulated, or measured with equal care and fidelity (i.e., there is procedural equivalence); and (b) the values taken by the factors or variables vary over equivalent ranges of values in their respective populations (i.e., there is distributional equivalence).”

At a minimum, it would be important to show that the levels of SLB and charismatic leadership behavior are manipulated to be approximately equivalent with respect to the distance from their respective population means. Otherwise, as noted by Cooper and Richardson (1986, p. 179), “When a convincing case for … equivalence cannot be made, and the results favor the theory or construct that was more strongly operationalized … then the possibility that the comparison was


Table 1
Definitions and examples of threats to internal validity.

Selection
Definition: Potential threat due to differences between experimental and control (or comparison) groups that exist prior to the administration of the treatment(s) and may be responsible for the observed effect on the dependent variable(s).
Example: Participants who are selected by an organization for high-potential leadership training may differ from control/comparison group participants who are not selected for this training with respect to conscientiousness, intelligence, and interpersonal effectiveness. To the extent that these pre-existing differences are associated with leadership emergence and effectiveness, the causal effect of the treatment will be less clear.

History
Definition: Potential threat due to the occurrence of an unanticipated event during the experiment that is not part of the experimental treatment; this event may be responsible for the observed effect on the dependent variable(s). Generally speaking, the potential threat of history becomes more problematic as the length of time between the treatment and the measurement of the dependent variable(s) increases.
Example: A researcher examining the effects of a leadership development program will be unsure if the program is responsible for changes in the effectiveness of the leaders if changes in the organization's compensation system take place between the delivery of the program and measurement of the dependent variable(s).

Maturation
Definition: Potential threat due to study participants growing more mature, more experienced, more fatigued, more knowledgeable, older, etc., when these processes are not the treatment of interest. Maturation threats become more problematic as the period of time between the treatment and the measurement of the dependent variable(s) increases.
Example: The maturation of middle-school students who are exposed to a leadership skills development program may make it difficult to tell whether it is the program or the students' natural gains in life experience and intellectual skills that are responsible for the effectiveness of the program.

Testing
Definition: Potential threat due to the fact that pretest measures may sensitize, prime, or otherwise influence subsequent measures of the dependent variable(s).
Example: Participants who are asked to rate their stress level before the implementation of a stress reduction program may monitor indicators of stress more carefully, thus making it difficult to tell whether the stress reduction program, the increased monitoring, or a combination of the two is responsible for any change in the dependent variable(s).

Instrumentation
Definition: Potential threat due to changes in the measurement instrument from the pretest to the posttest that cause changes in the dependent variable; such changes would not be attributable to the treatment.
Example: Observers who are coding leaders' behaviors for a study in which the leaders are trained to be more supportive may change how they categorize the behaviors as they become more experienced or more familiar with the coding system. To the extent that such changes in coding are reflected in the posttest measures, the effects of the treatment will be less certain.

Regression
Definition: Potential threat due to the fact that when participants are assigned to treatment and control/comparison groups on the basis of extreme scores, their posttest scores on the dependent variable(s) may become more moderate (i.e., regress toward their mean).
Example: If poorly performing supervisors are selected to participate in a program in which they receive feedback about their lack of effectiveness and are assigned specific goals, it will be difficult to tell whether any improvement in their performance is due to the feedback and goal-setting program or to regression to the mean level of performance.

Mortality
Definition: Potential threat due to the fact that participants who drop out of an experiment may differ in some meaningful way from those who remain in the experiment.
Example: If a study uses several treatment groups in which leaders are trained to exhibit different leadership styles and the number of leaders who drop out of an “autocratic leader” treatment group is higher than from the other groups because some of the leaders are uncomfortable exhibiting autocratic leader behaviors, it would call into question the finding that leaders trained to exhibit empowering leader behaviors become more effective than the leaders trained to be autocratic.

Selection × maturation interaction
Definition: Potential threat due to differential maturation of the treatment and control or comparison groups with respect to the dependent variable, where the differential maturation is attributable to differences between the groups at the start of the study.
Example: If participants are assigned to a treatment group on the basis of their interpersonal skills, and participants who have better interpersonal skills mature faster than participants with lower interpersonal skills (who will have been assigned to the control or comparison group), it will be less clear whether any change in the dependent variable is due to the treatment or to the initial group differences in interpersonal skills.

assign them (or allow them to assign themselves) to treatment conditions. For example, organizations may assign employees to a training program based on their potential (e.g., leadership development) or deficiencies (e.g., knowledge or skills training programs), or offer an organization-sponsored program on a voluntary basis (e.g., stress management intervention). Although these assignment procedures may seem practical, they can be problematic because they introduce the potential for confounding factors (e.g., demographic characteristics, personality variables, IQ, etc.) to influence the internal validity of the study. One way to address potential confounds is by “matching” participants to each experimental condition with respect to what the researcher considers to be the most prominent confounding variables (Holmes, 2014). For example, a researcher interested in determining which of two leader behavior training programs (leadership empowerment vs. directive leadership) has the greatest impact on leaders' effectiveness might match participants in the treatment conditions on IQ, as intelligence has been shown to be related to leadership emergence and effectiveness (Judge, Colbert, & Ilies, 2004). Unfortunately, however, matching participants on one (or a few) characteristic(s) does not guarantee that all potential confounds are controlled (Holmes, 2014; Shadish et al., 2002). For example, matching trainees on intelligence

unfair must be explicitly addressed when the results are discussed.” Because all of the designs described above expose each participant to a single treatment, they are referred to as between-subjects (or between-participants) designs. It is also possible to conduct a within-person (or within-subjects) experiment in which the participants are exposed to multiple treatments over time. Although within-subject designs have the advantage of reducing the error variance associated with individual differences among the participants and thus increasing statistical power, the fact that exposure to one treatment can confound the effects of subsequent treatments leads many researchers to prefer between-subjects designs. However, regardless of whether a study uses a between- or within-subjects design, it can only be considered experimental if at least one of the independent variables has been manipulated; otherwise, the study is considered to be non-experimental.

The second question used to determine the type of experimental design is: Are participants randomly assigned to conditions? In some experimental studies participants are allowed to remain in the groups in which they naturally reside (e.g., existing work groups in organizations; sports teams; cohorts/classes, etc.). In these studies, the researcher does not control the assignment of the participants to groups, but does control how treatment(s) are assigned to the pre-existing groups. In other studies, the organization for which the participants work may


Fig. 2. Decision tree for classifying experimental research designs.

temperature, lighting, ambient noise, equipment, etc.), psychological characteristics (e.g., cognitive requirements of the task, job stress or strain, work-related distractions, etc.) and social characteristics (e.g., presence of other people, number and type of interactions with others, potential for interpersonal conflict, etc.). The problem is that these factors may affect the dependent variable(s) directly or through interactions with the independent variable(s). Laboratory experiments offer the researcher greater control over extraneous variables than do field experiments, because the researcher can control the independent variable and a host of extraneous physical, psychological, and social factors in a laboratory setting. This is one reason why research conducted in laboratory settings generally has higher expected internal validity than research conducted in field settings, although this depends on the specific level of control exercised on these characteristics in each setting.

Of course, as noted in Fig. 2, it is possible that even in studies where an independent variable has been manipulated, participants may not be assigned randomly to conditions. In such cases, it is important to ask a final set of questions: does the design include a control/comparison group, or multiple observations of a single group of participants? If the design includes a control or comparison group or if multiple measures of the dependent variable are taken before and after the manipulation of the independent variable in a single group, then the study qualifies as a quasi-experiment. If neither is the case, then the study is non-experimental, even if the independent variable has been manipulated. Campbell and Stanley (1963) referred to this kind of design as a pre-experimental design.
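The four questions in Fig. 2 amount to a simple decision procedure. The sketch below expresses that logic in code purely for illustration; the function name, argument names, and the use of Python are our own assumptions rather than anything specified in the article.

```python
def classify_design(iv_manipulated: bool,
                    random_assignment: bool,
                    controlled_setting: bool,
                    control_group_or_repeated_measures: bool) -> str:
    """Classify a study using the four questions from Fig. 2 (illustrative only)."""
    if not iv_manipulated:
        # No manipulation of the independent variable: not an experiment.
        return "non-experimental study"
    if random_assignment:
        # Manipulated IV plus random assignment: laboratory vs. field experiment
        # depends on whether the researcher controls the setting.
        return "laboratory experiment" if controlled_setting else "field experiment"
    if control_group_or_repeated_measures:
        # Manipulated IV, no random assignment, but a control/comparison group
        # or repeated observations of a single group: quasi-experiment.
        return "quasi-experiment"
    # Manipulated IV with none of the above safeguards: pre-experimental design.
    return "pre-experimental (non-experimental) design"


# Example: a treatment rolled out to intact work teams, compared with a
# non-equivalent comparison team, without random assignment.
print(classify_design(iv_manipulated=True, random_assignment=False,
                      controlled_setting=False,
                      control_group_or_repeated_measures=True))
# -> quasi-experiment
```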

will not necessarily control for extraversion, conscientiousness, openness to experience, or other factors which have also been shown to influence perceived leadership effectiveness (Judge, Bono, Ilies, & Gerhardt, 2002). Moreover, the difficulty increases as the number of characteristics on which the participants must be matched increases, as does the number of cases required to select a matched sample (Schwab, 2005). That said, some matching algorithms, such as propensity score matching, do work well (Holmes, 2014). We discuss this approach briefly in the section on quasi-experimental designs.

Unlike the approaches discussed above, random assignment is used in experimental studies to create multiple groups that are presumed to be equivalent in terms of various attributes (e.g., age, gender, personality, race, IQ, etc.). Under random assignment each participant has an equal chance of being assigned to each treatment condition. Schwab (2005, p. 64) noted that the primary advantage of random assignment over other assignment procedures is that it “controls for nuisance variables whether or not researchers are aware of them.” Random assignment is often viewed as a great equalizer, because it increases confidence that all extraneous factors that could influence participants' behaviors are approximately equally distributed across conditions. In a study designed to examine the effects of an independent variable on dependent variables at the individual level, this is accomplished by randomly assigning participants to treatments. However, in studies designed to examine the effects of independent variables on dependent variables at the group level, control is maximized by (a) randomly assigning participants to groups and then (b) randomly assigning groups to treatments. This is particularly important if the composition of the groups (e.g., in terms of demographic characteristics, personality traits, abilities, skills, or other psychological variables) could influence the outcomes of the experiment. In such cases, failing to assign participants to the groups randomly would provide a potential alternative explanation for the observed effects.

Assuming that participants have been assigned to treatment conditions randomly, the third question is: Does the researcher have control over the experimental setting? As Crano et al. (2015) have noted, research settings may differ in a number of ways that affect the level of control a researcher has over extraneous variables. These factors include the setting's physical characteristics (e.g., physical layout,

The importance of manipulation checks in experimental designs

Although random assignment facilitates internal validity by minimizing or eliminating individual differences as an explanation for observed effects, manipulation checks are required to confirm that the treatment conditions have operationalized the independent variable as it has been conceptualized (i.e., the manipulation is construct valid; Cook & Campbell, 1979). Ideally, manipulation checks should be undertaken as part of a pilot study, both to allow the researcher to revise the manipulation before it is used in the primary study (should this


Laboratory experiments

prove necessary), and because manipulation checks made before or after measurement of a dependent variable can prove problematic (cf. Aronson & Carlsmith, 1968; Kidd, 1976; Lonati et al., 2018; Perdue & Summers, 1986; Wetzel, 1977). For example, manipulation checks carried out before measurement of the dependent variable(s) could alert participants to the nature of the study (i.e., serve as demand characteristics), whereas manipulation checks carried out after the measurement of the dependent variable(s) may be ineffective because the effects of the manipulation may have already dissipated or because participants' responses to the manipulation may bias their response to the manipulation check (Kidd, 1976; Lonati et al., 2018; Perdue & Summers, 1986). Although there are some circumstances under which manipulation checks may be deemed unnecessary or counterproductive (cf. Sigall & Mills, 1998), providing evidence that the treatments used in an experiment are related to the level of the variables they are intended to manipulate increases confidence in the inferences made from such studies (Perdue & Summers, 1986). Furthermore, if an experiment includes multiple independent variables it is important to determine whether the manipulations themselves are confounded. More specifically, Perdue and Summers (1986, p. 322) argued that,

Characteristics of laboratory experiments

Fisher (1984, p. 169) defined a laboratory experiment as a procedure in which the researcher attempts to test causal hypotheses by manipulating one or more independent variables (hypothesized causes) and measuring one or more dependent variables (hypothesized effects) while controlling for all other variables. If done properly, the researcher may conclude that varying levels of the independent variable caused the observed differences in the dependent variable, since nothing in the situation, procedure, or subjects was systematically different across groups except for the independent variable. (Italics in original.)

As indicated in Table 2, laboratory experiments are designed to establish causal relationships between independent and dependent variables. Laboratory experiments accomplish this more effectively than other experimental designs, because the researcher not only has precise control over the independent variable, but also because the participants are randomly assigned to treatment conditions, and the researcher exercises considerable control over the research setting. Thus, unlike experiments conducted in field settings, laboratory experiments enable the researcher to control a variety of physical, psychological, and social extraneous variables, which reduces the number of alternative explanations (rival hypotheses) that can be used to explain changes in the dependent variable(s) and increases the internal validity of the study and the replicability of its findings (Camerer, 2015).

Of course, the high degree of control over the independent and extraneous variables that is possible in laboratory settings also has some potential disadvantages. For example, laboratory experiments are often compared unfavorably with field experiments and quasi-experiments on the grounds that controlled settings (a) are artificial and lack realism, (b) increase the potential for subjects' reactivity, and (c) lack generalizability.
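The logic of Fisher's definition can be mimicked in a few lines of simulation. The sketch below is our own hypothetical illustration (the variable names, effect sizes, and use of Python/SciPy are assumptions, not material from the article): participants are randomly assigned to a treatment or control condition, an unmeasured individual difference influences the dependent variable, and yet, because assignment is random, that nuisance variable is balanced across conditions in expectation and the simple group comparison recovers the manipulated effect.

```python
# Hypothetical sketch of a minimal randomized laboratory experiment;
# names and numbers are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 200
true_effect = 0.5

# Unmeasured individual difference (e.g., trait conscientiousness).
conscientiousness = rng.normal(size=n)

# Random assignment: exactly half of the participants in each condition.
condition = rng.permutation(np.repeat([0, 1], n // 2))

# The DV depends on the manipulation, the nuisance variable, and noise.
performance = (true_effect * condition
               + 0.8 * conscientiousness
               + rng.normal(size=n))

# Randomization balances the nuisance variable across conditions (in expectation).
balance = (conscientiousness[condition == 1].mean()
           - conscientiousness[condition == 0].mean())

# A simple two-group comparison estimates the manipulated effect.
estimate = performance[condition == 1].mean() - performance[condition == 0].mean()
t_stat, p_value = stats.ttest_ind(performance[condition == 1],
                                  performance[condition == 0])

print(f"difference in conscientiousness across conditions: {balance:.2f}")
print(f"estimated treatment effect: {estimate:.2f} (true value {true_effect}), p = {p_value:.3f}")
```

If assignment were not random (say, more conscientious participants self-selected into the treatment), the same comparison would mix the treatment effect with the nuisance variable, which is exactly the endogeneity problem discussed later in this article.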

an adequate analysis of a manipulation check for a given factor (manipulation) within a multiple-factor design requires the use of the full-factorial ANOVA model whenever it is plausible that one manipulation may have inadvertently affected an independent variable associated with a different manipulation. Furthermore, researchers must be concerned with the statistical significance of all main and interaction effects, not just those involving the factor corresponding to the manipulation check measure being analyzed. A statistically significant main effect for the manipulation (factor) corresponding to the manipulation check being analyzed provides evidence in favor of the convergent validity of that particular manipulation. To the extent that other main and/or interaction effects are statistically significant, the discriminant validity of the associated manipulations becomes suspect. Ideally, only one effect, the main effect of the factor (manipulation) of interest, will be statistically significant. If effects associated with other manipulations prove to be statistically significant, these manipulations will have been “falsified” in the sense that they have not had their intended effects.

Examples of laboratory experiments in the leadership domain To illustrate some of the ways in which laboratory experiments have been used in leadership research we provide a few examples from the literature. The first example (Doci & Hofmans, 2015) treats leadership as the dependent variable, whereas the other two examples (Howell & Frost, 1989; Podsakoff, Whiting, Podsakoff, & Mishra, 2011) treat leadership as the independent variable. Doci and Hofmans (2015) conducted a laboratory experiment to examine the effects of task complexity on transformational leadership behaviors. They hypothesized that leaders who experience stress due to the complexity of their group's task are less likely to engage in transformational leadership than leaders whose groups perform less complex tasks. They also hypothesized that the putative negative effect of task complexity on transformational leadership would be mediated by the leaders' core self-evaluations (i.e., leaders' perceptions of themselves, their worth, and their abilities). They used a within-subjects research design in which participants were required to work in three-person groups simulating teams charged with making decisions of varying complexity. One member was randomly assigned to play the role of the leader and the other two participants played the role of subordinates. Three different tasks were developed (choosing a new office space to rent; choosing a new product to market; choosing a new project manager) at three different levels of complexity (low; moderate; high). After participants were allocated to groups and roles, the groups performed a training task to familiarize participants with the requirements of the experimental tasks. Then each group was asked to solve three different decision-making tasks of variable complexity. To control for possible order effects, both the sequence of the tasks and their complexity were randomly assigned to groups. For manipulation checks, participants

Perdue and Summers (1986) also note that if researchers are concerned that other, related constructs may be influenced by their manipulation or that their manipulation could be interpreted in terms of more than one construct, then they would be wise to check for confounds. Confound checks are measurements of variables that have not been explicitly manipulated, but may nevertheless have an effect on the dependent variable(s). If it can be shown that the treatment influences measures of the manipulated variable but not the potential confounds, this will increase confidence that the theoretical construct of interest – and not some other construct – caused the observed variation in the dependent variables.
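A hedged sketch of the kind of full-factorial manipulation-check analysis Perdue and Summers describe is shown below, using simulated data for a 2 × 2 design; the factor names, effect sizes, and the use of Python's statsmodels package are our own assumptions, not details from the article. The check measure for factor A should show a significant main effect of A only; a significant effect of B, or of the A × B interaction, on that measure would suggest that the manipulations are confounded.

```python
# Hypothetical manipulation-check analysis for a 2 x 2 between-subjects design,
# following the full-factorial ANOVA logic of Perdue and Summers (1986).
# All data are simulated; variable names are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
n_per_cell = 50

factor_a = np.repeat([0, 1], 2 * n_per_cell)           # e.g., low vs. high abusive supervision
factor_b = np.tile(np.repeat([0, 1], n_per_cell), 2)   # e.g., low vs. high workload

# Manipulation-check measure for factor A (perceived abusiveness); in a clean
# design it is driven by factor A alone.
check_a = 2.0 * factor_a + rng.normal(size=4 * n_per_cell)

df = pd.DataFrame({"A": factor_a, "B": factor_b, "check_A": check_a})
model = ols("check_A ~ C(A) * C(B)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
# Desired pattern: a significant C(A) row and non-significant C(B) and
# C(A):C(B) rows. The same model would be fit to any confound-check measure,
# where ideally no effect reaches significance.
```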

What are the characteristics, strengths, and limitations of the various types of experimental designs?

There is considerable variability within each of the three general types of experimental design. Some of these differences directly influence the inferences that can be made about the internal (and external) validity of the findings. In the sections that follow, we clarify the implications of these differences by: (a) comparing laboratory, field, and quasi-experiments with respect to a number of characteristics, (b) providing examples from the leadership literature, and (c) discussing their strengths and limitations. A summary of these points is provided in Table 2.


Table 2
Comparison of the characteristics of laboratory experiments, field experiments, and quasi-experiments.

Objective or goal of the design – Laboratory experiments: establish causal relationship between IV and DV. Field experiments: establish causal relationship between IV and DV. Quasi-experiments: establish causal relationship between IV and DV.
Manipulation of the IV – Laboratory: yes. Field: yes. Quasi: yes.
Random assignment of participants to conditions – Laboratory: yes. Field: yes. Quasi: no.
Controlled setting – Laboratory: yes. Field: no. Quasi: no.
Amount of control over IV – Laboratory: very precise control of IV. Field: variable. Quasi: variable.
Amount of control over extraneous variables – Laboratory: high. Field: moderate to high. Quasi: low to moderate.
Internal validity of findings – Laboratory: high. Field: moderate to high. Quasi: low to moderate.
Replicability of findings – Laboratory: high. Field: moderate. Quasi: low to moderate.
Number of potential rival hypotheses – Laboratory: low. Field: low to moderate. Quasi: moderate to high.
Realism of research setting – Laboratory: low. Field: high. Quasi: high.
Participants' awareness of participation – Laboratory: high. Field: moderate to low. Quasi: low.
Generalizability of findings – Laboratory: low to moderate. Field: moderate to high. Quasi: high.

Strengths of design

Laboratory experiments:
• Researcher manipulates IV(s) and has substantial control over potential confounding variables.
• Random assignment to experimental conditions reduces the chance that pre-existing differences between conditions will be able to account for observed changes in the DV(s).
• Control over independent and confounding variables reduces error variance, and increases the likelihood that relationships will be detected.
• Minimizes effects of endogeneity biases.
• Allows researchers to obtain consistent estimates of the effects of the IV(s) on the DV(s), and estimators that converge on the population parameters as sample size increases.
• High internal validity permits researcher to make strong claims regarding causal relationships between IVs and DVs.
• Provides researcher with an effective way of examining both the main and interactive effects of two or more IVs on DVs.
• Allows for use of complex factorial designs that would be difficult, if not impossible, to implement in field settings.
• Particularly effective in testing “crucial hypotheses.”
• Permits researcher to study topics that are difficult, if not impossible, to study in natural environments (e.g., field settings where researchers are concerned for the health and safety of workers).
• Easier for researchers to examine participants' behavior or the outcomes of their behavior, rather than their behavioral intentions or perceptions of specific behaviors.
• Permits researchers to examine the construct validity of their measures using experimental (causal) techniques.
• Permits strong (experimental) tests of mediation hypotheses.

Field experiments:
• Researcher (or someone in the organization) manipulates the IV(s).
• Random assignment to experimental conditions reduces the chance that pre-existing differences between conditions will be able to account for observed changes in the DV(s).
• Permits researcher to explore causal relationship between IV and DV.
• Consistent with the growing interest in evidence-based management.
• Because the experiment is conducted in a real-life context, the intensity of the IVs is more likely to “mirror” that in real life.
• Observed behaviors are more likely to reflect the form and strength of real-life behaviors because they are occurring in a natural setting.
• Typically results in lengthier exposure of participants to the experimental setting(s).
• Because participants are generally less aware of experimental conditions than in a laboratory experiment, demand characteristics are less likely to influence the results.
• Because they are conducted in organizational settings, the results are likely to have practical relevance.
• External validity is typically higher than laboratory experiments.
• Facilitates the development of better theories of time and temporal relationships.
• Facilitates collaboration between researchers and practitioners.
• In principle, allows for strong (experimental) examination of mediation hypotheses.

Quasi-experiments:
• Independent variable is manipulated by someone.
• Strengthens causal inference when random assignment is not possible or ethical.
• Permits researcher to explore causal relationships between IV and DV.
• Consistent with the growing interest in evidence-based management.
• Because the experiment is conducted in a real-life context, the intensity of the IVs is more likely to “mirror” that in real life.
• Observed behaviors are more likely to reflect the form and strength of real-life behaviors because they are occurring in a natural setting.
• Typically results in lengthier exposure of participants to the experimental setting(s).
• Because participants are generally less aware of experimental conditions than in a laboratory experiment, demand characteristics are less likely to influence the results.
• Because they are conducted in organizational settings, their results are likely to have practical relevance.
• Facilitates the development of better theories of time and temporal relationships.
• High external validity.
• Minimizes ethical concerns about harm to participants, inequity, paternalism, and deception.
• Facilitates collaboration between researchers and practitioners.

Potential limitations of design

Laboratory experiments:
• Presumed inability to manipulate complex constructs (e.g., leadership behaviors) with precision.
• Participants' awareness of experimental setting increases the probability that demand characteristics act as confounding variables.
• Artificiality of studies (brief exposure to manipulation; behaviors and decisions are less meaningful and less consequential in the laboratory setting) reduces ecological validity.
• The high level of control may result in an impoverished environment in which the manipulated variable is the only stimulus to which participants can respond.
• Concern that student participants are not representative of non-student populations.
• Findings from laboratory settings may not generalize to non-laboratory (e.g., organizational) settings.
• When deception is used in laboratory experiments, some participants may suspect the deception and behave differently from participants who do not suspect deception.

Field experiments:
• May be difficult to gain access to organizations to carry out this type of research.
• Less precise control over the IV(s) than in the laboratory.
• More difficult to control extraneous variables (environmental factors) than in laboratory settings.
• More difficult to replicate findings than in laboratory research.
• Difficult to implement complicated factorial designs.
• Field experiments typically require more time, effort, and planning than laboratory experiments.
• Potentially more ambiguity about cause-effect relationships than in laboratory experiments.

Quasi-experiments:
• May be difficult to gain access to organizations to carry out this type of research.
• Less precise control over the IV(s) than in the laboratory.
• Lack of random assignment increases the probability that pre-existing group differences influence subsequently observed changes in the DV(s) of interest.
• Less confidence that experimental and control groups do not differ in some important ways than in randomized experiments.
• More difficult to exercise control over extraneous variables (environmental factors) than in laboratory settings.
• Susceptible to the effects of endogeneity biases.
• More threats to internal validity than in laboratory settings.
• More difficult to replicate findings than in laboratory research.
• Difficult to implement complicated factorial designs.
• Quasi-experiments typically require more time, effort, and planning than laboratory experiments.
• Potentially more ambiguity about cause-effect relationships than in laboratory experiments.

Examples from the leadership literature
Laboratory experiments: Doci and Hofmans (2015); Podsakoff et al. (2011); Howell and Frost (1989).
Field experiments: Dvir et al. (2002); Martin et al. (2013); Avey et al. (2011).
Quasi-experiments: Hui et al. (2001); Grant & Hofmann (2011, Study 1); DeRue et al. (2012).

Note. IV = independent variable. DV = dependent variable.

depicting high and low levels of each. These scripts were rated by subject-matter experts to ensure that they depicted the intended behavior at the intended intensity level (high vs. low). They trained one actor to serve as the interviewer and another to serve as the interviewee and recorded a series of scripted interviews. In the main experiment, Podsakoff et al. presented the videos to subjects who were randomly assigned to one of the 32 treatment conditions. Manipulation and confound checks of the videos were performed in a pilot study using students who did not participate in the main experiment. As task performance and OCB are qualitatively different variables, the authors established the equivalence of the corresponding manipulations (Cooper & Richardson, 1986). Specifically, the results of the pilot study demonstrated that: (a) videos intended to depict high levels of each behavioral variable (i.e., supervisor task performance, administrative task performance, helping, voice and loyalty) elicited high ratings of the corresponding behavior; (b) videos intended to depict low levels of each behavior elicited low-level ratings of the corresponding behavior; (c) videos depicting different behaviors received similar mean ratings; and (d) high- and low-level videos of each behavior had significantly different ratings.

Podsakoff et al. (2011) found that the hypothetical job candidate was generally rated more competent, received higher overall evaluations, and received higher salary recommendations when exhibiting higher levels of helping, voice, and loyalty behaviors in the interview than when exhibiting lower levels of these behaviors, even after controlling for the scripted responses regarding task performance. They also found that the interviewee's responses to voice- and loyalty-related questions interacted with job level such that these responses tended to have stronger effects on selection decisions related to the supervisory position compared to the entry-level position. Finally, content analyses of participants' open-ended responses indicated that selection decisions were particularly sensitive to responses indicating low levels of voice

rated the complexity of each task immediately after they had performed it; and the participants assigned to subordinate roles provided the measures of the dependent variables by rating their leader's transformational leadership behavior after performing each task. Consistent with their hypotheses, Doci and Hofmans (2015) reported that task complexity affected transformational leadership. However, post hoc analyses indicated that the overall difference between conditions was primarily due to lower ratings of transformational leadership in the high task complexity condition compared with the low task complexity condition. Ratings of transformational leadership in the low and moderate complexity conditions were only marginally different, and ratings in the high and moderate complexity conditions were not significantly different. Doci and Hofmans also found partial support for their hypothesis that core self-evaluations mediated the relationship between task complexity and transformational leadership behavior.

In our second example, Podsakoff et al. (2011) examined the effects of job candidates' propensity to exhibit organizational citizenship behaviors (OCBs) on selection decisions. They set up simulated interviews in which an actor playing a job candidate responded to interview questions about job performance and three types of OCB in administrative positions. More specifically, they used a 2 (task behavior response: high vs. low) × 2 (helping behavior response: high vs. low) × 2 (loyalty behavior response: high vs. low) × 2 (voice behavior response: high vs. low) × 2 (job position: supervisory vs. entry-level) between-subjects factorial design to examine the effects of these factors on participants' overall evaluations of the candidate, ratings of the candidate's perceived competence, and recommendations for the candidate's starting salary.

Podsakoff et al. (2011) developed interview questions designed to capture candidates' likely task performance and OCBs in each of the two jobs and operationalized task performance and OCBs by creating scripts


Table 3
Manipulations used in Howell and Frost's (1989) study to operationalize charismatic, structuring and considerate leader behaviors.
Source: Adapted from Howell and Frost (1989).

Charismatic leadership
Verbal behaviors:
• Articulate overarching goals
• Communicate high performance expectations to participants and display confidence in their ability to reach these expectations
• Empathize with participants' needs
Non-verbal behaviors and interaction style:
• Project a powerful, confident, dynamic presence
• Alternate between pacing and sitting on edge of desk
• Lean toward participants and maintain direct eye contact
• Adopt a relaxed posture and animated facial expressions
Paralinguistic cues:
• Speak to participants in a captivating, engaging tone of voice

Structuring leadership
Verbal behaviors:
• Explain nature of the task
• Decide what needed to be done, and how it should be done
• Be clear about the quantity of work to be accomplished within a specified period of time
• Maintain specific work standards
Non-verbal behaviors and interaction style:
• Project a neutral, business-like manner – neither warm nor cold
• Sit behind desk and maintain intermittent eye contact
• Adopt a neutral facial expression and demeanour (i.e., do not provide positive reinforcement by nodding or smiling)
Paralinguistic cues:
• Speak to participants using a moderate level of speech intonation
• Neutral tone of voice

Considerate leadership
Verbal behaviors:
• Express concern for personal well-being of participants
• Emphasize importance of the comfort and satisfaction of participants
• Engage in two-way communication with participants
Non-verbal behaviors and interaction style:
• Project a friendly, approachable persona
• Sit on the edge of the desk, lean toward participants, and maintain direct eye contact
• Adopt a relaxed posture and friendly facial expressions and demeanour (i.e., nodding approval to participants when appropriate, smiling, etc.)
Paralinguistic cues:
• Speak to participants using a warm tone of voice

indicate that laboratory experiments can differ in complexity. Doci and Hofmans's experiment was relatively straightforward, examining the effect of just one independent variable on one mediator and one dependent variable. They used a modified within-subjects design, in which all of the subjects were exposed to all levels of the independent variable. In contrast, the Podsakoff et al. and Howell and Frost studies were considerably more complex, in that they tested the main and interactive effects of several independent variables on several dependent variables. Both these studies employed between-subjects designs in which the participants were exposed to only one treatment. Finally, despite the fact that all three of the studies were conducted in controlled laboratory settings and involved an element of role-playing, they varied with respect to the level of mundane realism – which refers to the extent to which situations that participants encounter in the laboratory are similar to situations they encounter in real life (Aronson & Carlsmith, 1968). For example, unlike the Doci and Hofmans study, in which participants were assigned specific roles and exposed to all levels of the independent variable, or the Podsakoff et al. (2011) study, in which participants viewed recorded interviews on computers, participants in the Howell and Frost study were immersed in a 2.5 h simulation in which they were expected to perform tasks that are similar to work carried out in real-life organizations with a confederate leader and co-workers. Evidence for the realism of Howell and Frost's study was provided by post-study debriefs, which indicated that none of the participants suspected the true purpose of the study or the use of confederates.

and helping behaviors. A final example of a laboratory experiment that incorporates leadership concepts was reported by Howell and Frost (1989). These researchers conducted a complex study examining the main and interactive effects of three different leadership styles and two levels of group productivity norms on participants' interpersonal adjustment, task adjustment, and task performance using a decision-making task. In this 3 × 2 between-subjects factorial study, participants were randomly assigned to work (a) under the supervision of a confederate of the researcher exhibiting high or low levels of charismatic, structuring, or considerate leadership behaviors and (b) in the presence of two coworkers (also confederates) who exhibited high or low productivity on the task. In order to operationalize the leadership behaviors investigated in the study, Howell and Frost (1989) made a concerted effort to distinguish between them on the basis of the (a) verbal behaviors, (b) nonverbal behaviors and interaction style, and (c) paralinguistic cues associated with these behaviors in the literature (see Table 3 for a summary of the differences). They then trained their confederates (professional actors) to engage in these behaviors and conducted manipulation checks to confirm that the actors portrayed the intended leadership styles accurately. They also trained the confederate co-workers to exhibit high or low productivity norms and checked that this manipulation had the intended effects. Howell and Frost found that participants working under the direction of a charismatic leader expressed higher levels of adjustment to their task, their leader, and co-workers, and performed better than participants working for structuring or considerate leaders, regardless of the productivity norm displayed by their co-workers. They also found that participants working under a structuring leader in a group that exhibited a high productivity norm reported higher task satisfaction and lower role conflict than participants working under a structuring leader in a group that exhibited a low productivity norm. Finally, Howell and Frost reported that participants working under the direction of a considerate leader in a group exhibiting a high productivity norm expressed higher task satisfaction than participants working under the direction of a considerate leader in a group exhibiting a low productivity norm. Together, the three studies discussed above demonstrate that laboratory experiments can be applied to various aspects of leadership using a variety of designs in order to address a range of research questions. For example, these studies demonstrate that leadership can be treated as an independent (Howell & Frost, 1989; Podsakoff et al., 2011) or dependent variable (Doci & Hofmans, 2015). They also

Strengths of laboratory experiments

One of the strengths of randomized laboratory experiments is that they permit researchers to address concerns about endogeneity. As noted earlier, one of the conditions necessary for establishing causality is ruling out the possibility that some factor, other than the independent variable being manipulated, is the cause of the change in the dependent variable. Experiments achieve this when participants are randomly assigned to conditions, thus creating a context in which the independent variable is not correlated with other manipulated or unmeasured variables. However, when this is not the case, and the independent variable may be correlated with confounding factors, the problem of endogeneity exists. According to Semadeni, Withers, and Certo (2014, pp. 1070–1071), “Endogeneity occurs when an independent variable is correlated with the error term (also known as “disturbance” or “residual”) in an ordinary least squares (OLS) regression model … [When this happens] the errors are not random… [and]


… this leads to biased coefficient estimates.” Because this is not the case in rigorous laboratory experiments, endogeneity is not a concern when interpreting findings from these designs (Antonakis et al., 2010). Indeed, randomized laboratory experiments permit researchers to obtain consistent estimates of the effects of the independent variable(s) on the dependent variable(s), with estimators that converge on the population parameters as the sample size increases (Boruch, Weisburd, Turner, Karpyn, & Littell, 2009; Shadish, 2011). Laboratory experiments possess several other strengths. First, they permit researchers to make strong claims about the internal validity of their findings (Antonakis et al., 2010; Brown & Lord, 1999; Colquitt, 2008; Ilgen, 1986; Wofford, 1999). Because (a) participants in laboratory experiments are randomly assigned to treatment conditions, (b) researchers have precise control over the independent variable(s) of interest, and (c) researchers can exercise substantial control over potential confounding (extraneous) variables, confidence is increased that the independent variable - and not some other factor - is the cause of the observed changes in the dependent variable(s) (Campbell & Stanley, 1963; Cook & Campbell, 1979; Falk & Heckman, 2009; James, 1980; Stone-Romero, 2002). Moreover, because laboratory experiments reduce random error, they increase the likelihood that relationships between the independent and dependent variables will be detected. Third, Mook (1983) and Ilgen (1986) noted that laboratory experiments are particularly well-suited to exploring “can it happen” hypotheses, which are fundamental to the testing of some theoretical statements, while Griffin and Kacmar (1991) argued that laboratory experiments are an especially effective method of testing “crucial hypotheses” – i.e., hypotheses designed to test competing theories or models. A good illustration of the use of laboratory research to test crucial hypotheses is provided in a monograph by Latham, Erez, and Locke (1988). In previous research, Latham and Erez had reported conflicting findings on the relationship between participation in goal setting, goal commitment, and job performance. With Locke serving as a mediator, the authors conducted several experiments designed to reconcile inconsistent findings. In summarizing the results of their studies the authors noted the advantages of using experimental studies to resolve scientific conflicts and disputes. For example, from her experience, Erez (in Latham et al., 1988, p. 768) concluded that, “The collaboration process is not a zero-sum game… both sides gain from the process because it helps to define the specific conditions necessary to validate their predictions.” Another strength of laboratory experiments, illustrated by both the Podsakoff et al. (2011) and Howell and Frost (1989) articles, is that they permit researchers to examine the main and interactive effects of independent variables using fairly complex factorial designs. Although such designs are obviously not limited to laboratory settings, it is difficult to imagine implementing the six experimental conditions utilized by Howell and Frost in a field setting, let alone teasing out the effects of the 32 treatment conditions on the three dependent variables used in the Podsakoff et al. study. 
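The endogeneity problem described above, and the way random assignment resolves it, can be made concrete with a small simulation. In the sketch below, all parameters are hypothetical: an unmeasured "ability" variable affects the outcome and, in the endogenous case, also drives who is treated, so the same OLS estimator is biased in that case but recovers the true treatment effect when assignment is random.

```python
# A minimal sketch (hypothetical parameters) of why random assignment
# addresses endogeneity: when treatment depends on an unmeasured variable
# that also affects the outcome, the OLS estimate of the treatment effect
# is biased; with random assignment the same estimator recovers the effect.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = 0.5
ability = rng.normal(size=n)  # unmeasured confound

def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    return np.polyfit(x, y, 1)[0]

# Endogenous assignment: higher-ability units are more likely to be treated
treated_endog = (ability + rng.normal(size=n) > 0).astype(float)
y_endog = true_effect * treated_endog + ability + rng.normal(size=n)

# Random assignment: treatment is independent of ability
treated_rand = rng.integers(0, 2, size=n).astype(float)
y_rand = true_effect * treated_rand + ability + rng.normal(size=n)

print("true effect:              ", true_effect)
print("estimate, endogenous case:", round(ols_slope(treated_endog, y_endog), 3))
print("estimate, randomized case:", round(ols_slope(treated_rand, y_rand), 3))
```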
Fifth, several authors have noted that laboratory experiments allow researchers to study leadership issues and topics that are difficult, if not impossible, to study in natural environments because they are rare or because they raise ethical concerns (Brown & Lord, 1999; Falk & Heckman, 2009; Griffin & Kacmar, 1991; Ilgen, 1986). For example, Brown and Lord's (1999, p. 534) discussion of Hunt, Boal, and Dodge's (1999) study of the effects of different types of charismatic leadership provides an excellent illustration of the value of laboratory experiments when it comes to studying rare events:

this difficulty reflects the fact that it is unlikely that enough data will be available to examine the effectiveness of different responses to crisis situations. Moreover, to merely describe how different leaders have responded to crises does not necessarily provide direction regarding what could be done in a crisis. Thus, when the phenomenon under consideration is rare, experimental studies provide not only the opportunity to study these situations, but also provide the opportunity to discover the most effective leadership styles for these events and to determine causality. Sixth, laboratory experiments permit researchers to examine the actual behaviors of participants, or the outcomes of these behaviors, rather than their intentions to exhibit such behaviors or their perceptions of these behaviors (Baumeister, Vohs, & Funder, 2007; Colquitt, 2008). More specifically, Baumeister et al. (2007, p. 396) noted that although psychology calls itself the science of behavior, some psychological subdisciplines have never directly studied behavior, and studies on behavior are dwindling rapidly in other subdisciplines … [and] … the direct observation of behavior has been increasingly supplanted by introspective self-reports, hypothetical scenarios, and questionnaire ratings...[Indeed] the selfreport appears to have all but crowded out all other forms of behavior. Behavioral science today … mostly involves asking people to report on their thoughts, feelings, memories, and attitudes. Occasionally they are asked to report on recent or hypothetical actions. Or, somewhat differently (and more rarely), reaction times, implicit associations, or memory recall might be assessed in the service of illuminating a cognitive process. But that is as close as most research gets. Direct observation of meaningful behavior is apparently passé. The same criticism of self-reports could be leveled at research in the field of organizational behavior. However, one of the main advantages of laboratory experiments is that they offer researchers the chance to observe behaviors and their outcomes directly. This advantage is important in the context of leadership research, because most of the contemporary theories and models and incorporate leader behaviors. For example, Doci and Hofmans (2015) examined the effects of task complexity on the transformational leadership behaviors of participants, and Howell and Frost (1989) examined the effects that specific leadership styles had on participants' task performance. Use of laboratory experiments to validate a scale. There are two further, often unappreciated, advantages of laboratory experiments in the context of leadership research. The first is that laboratory experiments can provide strong evidence for the validity of leadership measures. Traditionally, the validity of leadership measures has been inferred from (a) examining the content validity of items intended to measure the leadership construct, (b) assessing the psychometric properties of the scale (e.g., reliability, measurement model fit, factor loadings) and (c) observing whether the empirical relationships between measures of the leadership construct and other constructs in the nomological network are consistent with hypotheses (Churchill, 1979; MacKenzie, Podsakoff, & Podsakoff, 2011; Schwab, 1980). Researchers often assume that if these conditions are satisfied, one can be reasonably confident that inferences based on the measurements made using the scale are valid. 
However, a number of researchers (Borsboom, 2009; Borsboom, Mellenbergh, & Van Heerden, 2004; MacKenzie et al., 2011; Podsakoff, Podsakoff, MacKenzie, & Klinger, 2013) have noted that there are several problems with this approach to scale validation. Chief among these problems is the fact that the concept of validity implies direction and causality, and correlational evidence based on a nomological network does not provide evidence for either one of these criteria. Indeed, as noted by Podsakoff et al. (2013, p. 100) this traditional approach,

In their investigation, Hunt et al. (1999) experimentally examined how different forms of charismatic leadership (visionary versus crisis responsive) functioned during times of crisis and how effective these forms of leadership were once a crisis had subsided. By their very nature crises are rare events, making them difficult to investigate in the field using correlational survey methods. In part,

…fails to test the heart of what validity is all about. When most


employee trust and job satisfaction; Schaubroeck, Lam, and Cha (2007) investigated whether team potency mediated the relationship between transformational leadership and team performance; and Detert, Trevino, Burris, and Andiappan (2007) examined whether group-level counterproductivity mediated the relationships between different modes of managerial influence and unit-level financial performance and customer satisfaction. Measurement-of-mediation designs. Traditionally, mediated effects models are tested using one of two techniques. The first technique, which is typically used in non-experimental field studies, involves measuring the predictor variable, the proposed mediating variable, and the criterion variable, and then demonstrating that the indirect effect of the predictor variable on the criterion variable (through the mediator) is statistically significant. Zhang and Bartol (2010) used this design in a cross-sectional field study to demonstrate that the relationship between empowering leadership and employee creativity was mediated by employees' feelings of psychological empowerment, intrinsic motivation, and engagement in the creative process. The second technique, used in laboratory experiments, involves measuring the presumed mediator and dependent variable after manipulation of the independent variable in order to demonstrate that variation in the independent variable is related to variation in the dependent variable “through” the mediating variable. Allen and Rush (1998, Study 2) used this design in a laboratory study to demonstrate that the effect of OCBs on raters' evaluations of an instructor's performance was mediated by the raters' liking of and affective commitment to the instructor. Spencer et al. (2005) referred to these designs as measurement-of-mediation designs and noted that they are useful in situations where it is easy to measure the proposed psychological mediating mechanism, but hard to manipulate it. However, Spencer et al. (2005) also identified a number of potential limitations of these designs: (a) the observed relationship between the mediator and the dependent variable may be spurious, because the designs are correlational; (b) measuring the mediating variable at approximately the same time as the dependent variable may sensitize or prime participants to respond to the dependent variable; and (c) neither design permits strong causal inferences about the relationship between the mediator and the dependent variable. In addition, because these designs often measure the mediator and the dependent variable using the same source and at the same point in time, they are also vulnerable to common method biases (Podsakoff, MacKenzie, Lee, & Podsakoff, 2003; Podsakoff, MacKenzie, & Podsakoff, 2012). For all of these reasons, several experts (e.g., Judd, Kenny, & McClelland, 2001; Kenny, 2008; MacKinnon, 2008) have noted that a statistically significant indirect effect does not, on its own, imply causation. Experimental-causal-chain designs. Fortunately, alternatives to measurement-of-mediation designs do exist. Spencer et al. (2005) referred to one alternative as the experimental-causal-chain design. This design involves carrying out two sequential experiments. In the first experiment the independent variable is manipulated to demonstrate its effect on the presumed mediating variable. In the second experiment the presumed mediating variable is manipulated to determine its effect on the dependent variable of interest. 
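A minimal sketch of this two-experiment logic is shown below, using simulated data and placeholder condition labels rather than values from any actual study: the first experiment manipulates the independent variable and measures the proposed mediator, and the second manipulates the mediator and measures the dependent variable.

```python
# Hypothetical sketch of the experimental-causal-chain logic: Experiment 1
# manipulates X and measures the proposed mediator M; Experiment 2
# manipulates M and measures the dependent variable Y. All data, effect
# sizes, and labels are simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 80  # hypothetical participants per condition

# Experiment 1: manipulate X (control vs. treatment), measure the mediator M
m_control = rng.normal(loc=3.0, scale=1.0, size=n)
m_treated = rng.normal(loc=3.6, scale=1.0, size=n)   # assumed effect of X on M
print("Exp 1 (X -> M):", stats.ttest_ind(m_treated, m_control))

# Experiment 2: manipulate M directly (low vs. high), measure the outcome Y
y_low_m  = rng.normal(loc=4.0, scale=1.0, size=n)
y_high_m = rng.normal(loc=4.5, scale=1.0, size=n)    # assumed effect of M on Y
print("Exp 2 (M -> Y):", stats.ttest_ind(y_high_m, y_low_m))

# Significant effects in both experiments are taken as support for the
# causal chain X -> M -> Y, without a statistical test of the indirect effect.
```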
If the independent variable causes the mediator and the mediator causes the dependent variable, then this is interpreted as support for the hypothesized causal chain linking the independent variable to the dependent variable through the mediating variable. A recent study by Liang et al. (2016) used the experimental-causal-chain approach to examine why supervisors abuse poorly performing subordinates. Liang et al. hypothesized that poorly performing subordinates elicit hostility from supervisors, prompting them to engage in abusive behavior. They conducted two laboratory experiments to test this hypothesis. In the first experiment they manipulated whether participants in a poor subordinate performance condition made hostile or non-hostile attributions about the poor performing subordinate, by asking the participants to imagine that the subordinate's poor

researchers are asked what is meant by the concept of validity, they say that indicators of a construct are valid to the extent that they measure what a theory says it does (Kelley, 1927). This suggests that basing inferences about validity primarily on a scale's relationships with other constructs in its nomological net is problematic because these relationships only indirectly reveal whether the scale is measuring what it is intended to measure (i.e., whether changes in the theoretical construct cause corresponding changes in the observed responses to the scale items). It would be better to test this assumption directly. Podsakoff et al. (2013) argue that a more effective way of establishing whether a scale is measuring what it is intended to measure is to manipulate the construct of interest and observe what effect this has on items designed to reflect the underlying construct. Assuming that variations in the manipulated construct cause variations in the items, one can infer that the scale is valid. Podsakoff et al. go on to say that scales purporting to measure behavioral constructs (included in many prominent leadership theories) are particularly amenable to such validation using videos that display high and low levels of the leadership behaviors in experiments. The authors provide a step-by-step guide to the development and use of such videos in the scale validation process. A good example of this procedure was reported by Maynes and Podsakoff (2014, Study 4). These authors were interested in demonstrating the validity of their newly developed measures of employee voice behaviors (i.e., constructive, destructive, supportive, and defensive voice). More specifically, they wanted to demonstrate that (a) their measures of employee voice covaried with the attributes they were intended to measure; (b) variation in the measures was preceded by variation in the attributes; and (c) variation in the measures was not caused by other variables. Following the recommendations of Podsakoff et al. (2013), Maynes and Podsakoff developed scripts portraying high and low levels of each type of voice behavior, validated the scripts with subject-matter experts, filmed the scripts with professional actors, showed participants who had been randomly assigned to experimental conditions videos exhibiting high or low levels of each behavior, and asked them to rate the actors using items of all four voice behaviors, as well as other, related constructs. Consistent with their hypotheses, Maynes and Podsakoff (2014) reported that manipulations of all four voice behaviors influenced ratings of the respective voice behavior, and that estimates of the strength of the paths from voice manipulations to putatively related measures were significantly greater than those for the paths to unrelated measures. Taken together, these findings suggest that the measures of voice behavior developed by Maynes and Podsakoff have a high degree of veridical validity (MacKenzie et al., 2011), measure relatively distinct constructs, and are relatively distinct from measures of related constructs (i.e., the measures possess discriminant validity). Given the validity-related criticisms directed at some scales purporting to measure leadership constructs (Podsakoff & Schriesheim, 1985; Schriesheim, House, & Kerr, 1976; Schriesheim & Stogdill, 1975; Van Knippenberg & Sitkin, 2013), this procedure may prove particularly worthwhile for researchers who are interested in validating measures of leadership behaviors. 
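In practice, the core analysis in such an experimental validation study can be quite simple. The sketch below uses simulated ratings and placeholder scale names (not data from Maynes and Podsakoff) to check that a high versus low manipulation of the focal behavior produces a large difference on the focal scale and a much smaller difference on a conceptually distinct scale.

```python
# Hypothetical sketch of an experimental scale-validation check in the
# spirit of Podsakoff et al. (2013): participants randomly assigned to a
# high- vs. low-level portrayal of the focal behavior should differ strongly
# on the focal scale and much less on a conceptually distinct scale.
# All data are simulated; scale names are placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 60  # hypothetical participants per condition

# Ratings on the focal scale (e.g., the manipulated behavior) in each condition
focal_low  = rng.normal(loc=2.2, scale=0.8, size=n)
focal_high = rng.normal(loc=5.4, scale=0.8, size=n)

# Ratings on a conceptually distinct scale from the same participants
other_low  = rng.normal(loc=3.5, scale=0.8, size=n)
other_high = rng.normal(loc=3.7, scale=0.8, size=n)

def cohens_d(a, b):
    """Standardized mean difference using a pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

print("Focal scale:", stats.ttest_ind(focal_high, focal_low),
      " d =", round(cohens_d(focal_high, focal_low), 2))
print("Other scale:", stats.ttest_ind(other_high, other_low),
      " d =", round(cohens_d(other_high, other_low), 2))
```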
Use of laboratory experiments to test mediation models. Another underappreciated benefit of laboratory experiments is that they allow researchers to conduct strong tests of mediated effects models (Eden, Stone-Romero, & Rothstein, 2015; MacKinnon, Fairchild, & Fritz, 2007; Spencer, Zanna, & Fong, 2005; Stone-Romero & Rosopa, 2010). Most leadership researchers are interested not just in the direct effects of leader behaviors on employee outcome variables, but also in identifying the theoretical mechanisms that transmit the effect of these behaviors to their outcomes. For example, Podsakoff, MacKenzie, Moorman, and Fetter (1990) examined whether the relationship between transformational leadership and employees' OCB was mediated by 11


provided by a study conducted by Bauman, Tost, and Ong (2016). These authors hypothesized that unethical behavior by high-ranking individuals changes how people respond to lower-ranked individuals who transgress in the same way. More specifically, they hypothesized that observers would recommend less severe punishment for people if they were imitating higher-ranked individuals rather than people of the same rank, but that this effect would only operate when the two transgressors were members of the same organization. They based this hypothesis on the argument that “the rank-dependent imitation effect on punishment arises because the first actor's behavior is either seen as a mitigating circumstance that reduces blame for the imitator or makes the behavior seem more normal (or both) … [and that] … these mechanisms should only engage when transgressors are members of the same organization” (Bauman et al., 2016, p. 128). To test these hypotheses, Bauman et al. (2016) conducted a 2 × 2 scenario study (Study 2)2 in which they manipulated the rank of the first actor (e.g., higher rank or the same rank as the second actor) and whether the actors worked for the same organization or not, and analyzed the severity of the punishments recommended by the participants. Consistent with their predictions, these authors found an interaction between the rank of the first actor and organization commonality. More specifically, they found that (a) when the two actors were from the same organization, recommended punishments were less severe when the first actor was ranked above the second actor, but (b) when the two actors were from different organizations the severity of the punishment was the same, regardless of the rank of the first actor. Based on these findings, Bauman et al. (2016, p. 129) concluded that, “Given that outgroup members should not influence attributions of blame or perceptions of descriptive norms, the results provide initial evidence that attributions of blame and descriptive norms may play a role in the rankdependent imitation effect.”
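The statistical pattern that carries the moderation-of-process evidence in a study like this is the interaction term in a 2 × 2 design, unpacked with simple-effects comparisons. The following sketch uses simulated data and hypothetical cell means, not Bauman et al.'s results, to illustrate that analysis.

```python
# Hypothetical sketch of a 2 x 2 moderation-of-process analysis pattern:
# the rank manipulation should affect recommended punishment only when the
# two transgressors belong to the same organization. Data, cell means, and
# variable names are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 50  # hypothetical participants per cell
cells = {
    ("higher", "same"):      3.0,  # less severe punishment (assumed)
    ("equal",  "same"):      4.0,
    ("higher", "different"): 4.0,
    ("equal",  "different"): 4.0,
}
rows = []
for (rank, org), mean in cells.items():
    for score in rng.normal(loc=mean, scale=1.0, size=n):
        rows.append({"rank": rank, "org": org, "punishment": score})
data = pd.DataFrame(rows)

# The rank x organization interaction term carries the moderation-of-process
# prediction; simple effects unpack it within each organization condition.
model = smf.ols("punishment ~ C(rank) * C(org)", data=data).fit()
print(model.summary().tables[1])
for org in ["same", "different"]:
    sub = data[data["org"] == org]
    diff = (sub[sub["rank"] == "higher"]["punishment"].mean()
            - sub[sub["rank"] == "equal"]["punishment"].mean())
    print(f"rank effect in {org}-organization condition: {diff:.2f}")
```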

performance was intended to cause harm (hostile) or was outside the subordinate's control (non-hostile). Consistent with their hypotheses, Liang et al. found that participants in the hostile attribution condition experienced greater hostility toward the subordinates than participants in the non-hostile condition. In the second laboratory experiment, they manipulated whether participants felt broadly hostile (e.g., angry, hostile, scornful, disgusted, or loathing) or well-disposed (e.g., happy, joyful, delighted, cheerful, excited, or enthusiastic) to a subordinate with whom they interacted, to determine whether these feelings affected participants' intention to engage in abusive behavior. Consistent with their hypotheses, Liang et al. found that participants' in the hostile condition indicated that they intended to use more abusive supervisory tactics on their subordinate than participants in the well-disposed condition. Thus, using these two experiments, Liang et al. supported an experimental-causal-chain between poor performance, hostility, and intentions to employ abusive supervision. Spencer et al. (2005, p. 846) argued that the experimental-causalchain approach provides particularly strong evidence for mediating effects, even though they do not allow for statistical tests of mediation: “The reason we make this claim is that by manipulating both the independent variable and the mediating variable we can make strong inferences about the causal chain of events. We argue that such designs should be understood as a powerful way to examine psychological processes.” In addition, since the mediating variable and dependent variable are not obtained from the same source in this design, potential common source biases are minimized (Podsakoff et al., 2003; Podsakoff et al., 2012). However, as noted by Spencer et al. (2005) and others (Fischer, Dietz, & Antonakis, 2017; Stone-Romero & Rosopa, 2010), this approach is not without its limitations. One is that the researcher must be able to measure the proposed mediating mechanism, which is not always possible. Another limitation is that the researcher must be able to manipulate both the independent and mediating variables. Third, the researcher must be able to provide a compelling argument that the mediating variable measured in Study 1 and manipulated in Study 2 is, in fact, the same variable. Fourth, these designs do not allow researchers to estimate the indirect effect statistically or to calculate how much of the effect of the independent variable on the dependent variable can be attributed to the mediator. In other words, this design does not provide a statistical test of the “indirect effect,” nor an effect size for the mediating effect. Finally, experimental-causal-chain designs are undoubtedly more difficult to implement (if not impossible) when a model includes multiple mediators or when multiple mediating effects are sequentially ordered. Notwithstanding these limitations, we believe that the experimental-causal-chain approach has more benefits than limitations, and we encourage leadership researchers to consider this approach. Moderation-of-process designs. Unfortunately, the experimentalcausal-chain approach cannot be used in cases where the proposed mediator is not amenable to measurement. In such cases an experimental moderation-of-process approach can be used to examine the effects of an unmeasured mediating mechanism, provided that the presence or absence of the mediation process can be manipulated (Spencer et al., 2005). 
The moderation-of-process approach can provide evidence of the mediating effects of psychological processes provided two conditions are met. The first condition is that the presumed moderating variable has an effect on the proposed psychological mechanism or process. The second condition is that the only way in which the moderator influences the relationship between the independent variable and the dependent variable is through its effect on the psychological process, and that there is no other explanation for the observed pattern of moderating effects. In other words, the manipulations of the moderator indicate the presence or absence of the process presumed to transmit the effect of the independent variable on the dependent variable, and not some other process. One example of the use of this design in the leadership domain is

Potential limitations of laboratory research Despite their advantages, laboratory experiments have some potential limitations. However, several of these limitations may be less problematic than once thought. For example, Wofford (1999) noted that one of the main criticisms is that complex constructs, like leadership, cannot be validly operationalized in laboratory settings. Indeed, leader behavior is complex and many types of leader behavior are highly correlated with each other (DeRue, Nahrgang, Wellman, & Humphrey, 2011; Judge & Piccolo, 2004; Piccolo et al., 2012). That said, contemporary definitions of charismatic and transformational leadership appear to be no more difficult to operationalize in laboratory settings than in field settings, where they are typically measured using surveys. Moreover, laboratory experiments have been used to examine a variety of complex processes, including strategic management decisions (Schwenk, 1982) and battlefield decision processes (Zelditch, 1969). The relatively high correlations reported between different forms of leadership are more problematic if there is overlap at the conceptual level. However, leadership is not the only area in the management literature where there is overlap between purportedly different constructs. For example, organizational commitment and organizational involvement share conceptual content, as does work-related prosocial behavior and OCB. Regardless of context, the key is to employ clear and concise conceptual definitions of constructs which not only identify the attributes shared by the related constructs, but also articulate the attributes unique to the focal construct under consideration (MacKenzie, 2 We generally agree with Lonati et al. (2018) regarding the potential weaknesses of hypothetical choice scenario studies (e.g., the potential to create unwanted demand effects, and the lack of certainty as to whether self-reported choices would be reflected in actual behavior). Nevertheless, we reference the Bauman et al. (2016) study because it is one of the few leadership studies we could find that employs a moderation-of-process approach.


Henshel (1980), Mook (1983), and Ilgen (1986), criticism about the artificiality of laboratory experiments demonstrates a misunderstanding of the objectives of this research. The basic goal of laboratory experiments is to test theoretical propositions about the causal relationships between variables (Berkowitz & Donnerstein, 1982; Ilgen, 1986; Kruglanski, 1975; Postman, 1955) and in this context, artificiality may be regarded as a virtue rather than a vice (Henshel, 1980; Mook, 1983). Moreover, as noted by Ilgen (1986), beyond allowing researchers to examine whether some event, condition, or process can occur, laboratory experiments are particularly well-suited to addressing questions and issues which may be impractical to investigate in field settings. For example, conducting field research may (a) be too costly, (b) raise ethical concerns, (c) put the health or safety of organizational participants at risk or (d) not allow the researcher to examine the effects of certain variables directly. Similarly, because most social phenomena are complex and determined by multiple factors, “it is nearly impossible to learn much about specific cause-effect relations without the benefit of a controlled artificial research setting” (Kardes, 1996, p. 280). That said, we do not mean to suggest that laboratory experiments are appropriate for exploring all leadership phenomena. For example, Mitchell, Vogel, and Folger (2015) noted that it may be difficult to get participants to express realistic yet taboo reactions in laboratory settings, such as the satisfaction experienced from witnessing one's peers being subjected to (justified) abusive supervision. Moreover, building on the work of Heath and Sitkin (2001), Mitchell et al. argued that some aspects of organizations may be difficult to capture in laboratory settings. Although we generally agree with these statements, we feel it is important to note that these concerns should not discourage leadership researchers from studying phenomena that can be examined in a laboratory setting. Regarding the generalizability of laboratory experiments, there are several points worth noting. First, a number of scholars (Bass & Firestone, 1980; Berkowitz & Donnerstein, 1982; Highhouse, 2009; Kardes, 1996; Lucas, 2003; Lynch, 1982) have noted that illuminating the psychological processes by which the independent variable affects the dependent variable is more important than demonstrating that findings from the laboratory generalize to field settings. Second, comparisons of the empirical relationships reported in laboratory experiments and field studies (Anderson, Lindsay, & Bushman, 1999; Locke, 1986; Mitchell, 2012; Vanhove & Harms, 2015), suggest that they are often quite similar. For example, Locke (1986) asked prominent scholars in a variety of different content areas in I/O psychology, organizational behavior and human resources management to compare the results of laboratory and field studies. Locke's summary of these comparisons indicated that the direction of the effects observed were almost always the same in both settings. More compelling evidence for the comparability of laboratory and field findings comes from a series of meta-analytic studies. Anderson et al. (1999) reported a high correlation (r = 0.73) between effect sizes from laboratory and field studies of a variety of social psychological phenomena. Similar results were reported by Mitchell (2012), who replicated the Anderson et al. 
meta-analysis using a substantially larger sample (217 versus 38) covering a wider range of psychological phenomena. After removing an outlier, Mitchell reported that the average correlation between the effect sizes reported in field and laboratory settings was virtually the same (r = 0.71) as that reported by Anderson et al. Mitchell cautioned that the average correlation did vary by sub-field, but it is encouraging that the correlation was highest in the field of I/O psychology (r = 0.89). Finally, a more recent study by Vanhove and Harms (2015) examined 203 meta-analyses from both laboratory and field settings reporting estimates for relationships involving “workplace phenomena.” These authors reported that although the correspondence between findings from laboratory and field settings is still fairly high in some cases, the relationship is dependent on several factors, including the specific type of variables used as predictors and outcomes in these settings (i.e., demographic characteristics, traits, psychological states,

2003; Podsakoff, MacKenzie, & Podsakoff, 2016; Suddaby, 2010). Perhaps one of the best examples of this approach is Howell and Frost's (1989) laboratory-based manipulation of charismatic, structuring, and considerate leadership styles that we discussed earlier. Drawing on the conceptual definitions in the literature, Howell and Frost distinguished these leadership styles in terms of three main attributes: verbal behaviors; nonverbal behaviors/interaction styles; and paralinguistic cues. The results of their study show that rather complex leader behaviors can be distinguished from one another, and that they produce some predictable differences in a variety of outcome variables. Another major concern regarding laboratory experiments is that their findings may be affected by demand characteristics (Lonati et al., 2018; Orne, 1962, 1969), and experimenter expectancy effects (Rosenthal, 1967; Rosenthal & Rosnow, 1991). According to Crano et al. (2015, p. 134), demand characteristics represent the “totality of all social cues communicated in a laboratory not attributable to the manipulation, including those emanating from the experimenter and the laboratory setting, which alter and therefore place a demand on the responses of participants.” As noted by Shimp, Hyatt, and Snyder (1991), these cues are problematic because they lead researchers to make erroneous inferences about cause-effect relationships. The related phenomenon of experimenter expectancy effects is defined by Rosenthal and Rosnow (1991, p. 619) as an “artifact which results when the hypothesis held by the experimenter leads unintentionally to behavior toward the subjects which, in turn, increases the likelihood that the hypothesis will be confirmed.” Although demand characteristics and experimenter expectancy effects can be problematic in any research setting (McCambridge, de Bruin, & Witton, 2012), they are often more challenging in laboratory experiments because such settings heighten the salience of the stimuli manipulated by the researcher. Lonati et al. (2018) identified several features of experimental settings that increase the probability of demand characteristics, including the perceived authority of the experimenter over participants, participants' need for social approval, the salience of the experimental manipulation, and situations in which experimenters who are not blind to the treatments interact with the participants. Despite the potential problems produced by demand artifacts, research has shown that these effects typically occur under specific conditions (i.e., when participants are apprehensive about how their performance is being evaluated or when they become aware of the specific hypothesis being tested and adopt a faithful participant role; Weber & Cook, 1972), and demand effects are not inevitable provided researchers take precautions to minimize potential problems. Strategies for reducing demand artifacts have been provided by Sawyer (1975), Rosenthal and Rosnow (1991) and Lonati et al. (2018). In addition, Rosenthal and Rosnow (1991) have provided several suggestions for minimizing experimenter expectancy effects. These include (a) taking steps to ensure that experimenters are blind to the hypotheses and/or treatment condition being administered, (b) minimizing the interactions between experimenter and study participants, (c) using more than one experimenter, (d) monitoring the behavior of the experimenters during the study and (e) analyzing experiments for order effects. 
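Some of these safeguards can be implemented before data collection begins. The sketch below shows one hypothetical way of doing so: it generates a balanced random assignment schedule in which experimenters who interact with participants see only arbitrary condition codes, while the key linking codes to conditions is stored separately by the researcher. The condition labels and file names are placeholders.

```python
# Hypothetical sketch of one practical safeguard: random assignment is
# generated in advance, and experimenters receive only coded condition
# labels so that those who interact with participants remain blind to the
# treatment that each code represents.
import csv
import random

conditions = ["charismatic", "structuring", "considerate"]  # placeholder labels
participants = [f"P{i:03d}" for i in range(1, 61)]

random.seed(2024)
codes = {cond: code for cond, code in zip(conditions, ["A", "B", "C"])}

# Balanced random assignment: equal numbers of participants per condition
schedule = conditions * (len(participants) // len(conditions))
random.shuffle(schedule)

with open("experimenter_schedule.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["participant", "condition_code"])  # experimenters see only codes
    for pid, cond in zip(participants, schedule):
        writer.writerow([pid, codes[cond]])

# The key linking codes to conditions is kept separately by the researcher
with open("condition_key_private.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["condition_code", "condition"])
    for cond, code in codes.items():
        writer.writerow([code, cond])
```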
Laboratory experiments are also criticized for their artificiality on the grounds that (a) observations in laboratory settings are made over a relatively short period; (b) the consequences of behavior in a laboratory setting rarely correspond to those for the same behavior in real organizational settings; (c) “paper people” are not the same as the stimuli employees encounter in real organizational settings (Murphy, Herr, Lockhart, & Maguire, 1986); and (d) social interactions and tasks performed in work settings are much more complex than those that occur in the laboratory (Dobbins, Lane, & Steiner, 1988; Ilgen, 1986). These criticisms have led some researchers (Gadlin & Ingle, 1975; Harré & Secord, 1972; Kingstone, Smilek, Ristic, Freisen, & Eastwood, 2003) to argue that the control exercised in the laboratory is purchased at the price of generalizability, and other researchers to dismiss the findings from laboratory experiments as irrelevant. However, as noted by 13


would encourage leadership researchers to consider the short- and long-term effects of using deception in laboratory experiments and, consistent with APA guidelines, to avoid the unnecessary use of deception. Moreover, if deception is used, researchers should be able to justify its use to participants during the debrief.

workplace characteristics and decision-making). Third, criticisms regarding the lack of generalizability of laboratory experiments appear to assume that the findings of field studies are inherently more generalizable. However, research by Dipboye and Flanagan (1979) called this assumption into question. Their analysis of the applied psychology literature indicated that studies conducted in field settings are as narrow as laboratory studies in terms of the types of actors, behaviors, and settings sampled. This led them to conclude that,

What can be done to increase the probability that laboratory experiments will be published? Given the predisposition that many reviewers and editors appear to have against laboratory experiments, researchers must present compelling reasons for conducting and reporting such studies. Colquitt (2008) provided a useful set of recommendations for those interested in publishing laboratory experiments in the organizational sciences. These recommendations include (a) making a significant contribution to the literature by testing, extending, or building new theory; (b) ensuring that the experimental setting and procedures capture the essence of the constructs of interest (i.e., experimental realism); (c) meeting high standards of technical adequacy in terms of internal validity, construct validity, and statistical conclusion validity; (d) using behavioral dependent variables or their outcomes (where appropriate); and (e) striving to produce original, interesting, and important research findings. Although Colquitt's recommendations were developed specifically for researchers interested in publishing in AMJ, they are relevant to scholars wanting to publish laboratory research in other journals in the organizational sciences, including those focused on the leadership domain. Beyond following Colquitt's (2008) recommendations, there are other strategies researchers can use to increase the likelihood that their laboratory experiments will be published. These strategies include (a) providing complementary qualitative data, either from the laboratory experiment itself or from a separate qualitative study; (b) pairing the results of the laboratory experiment with a quantitative field study; or (c) combining the results of several laboratory experiments to help clarify theoretical mechanisms (i.e., mediating variables) or boundary conditions (i.e., moderators). A variation on the first strategy was employed by Podsakoff et al. (2011) in their examination of the effects of job applicants' propensity to exhibit OCB on selection decisions. In addition to the quantitative analysis of the effects of task performance and OCB on participants' ratings and salary recommendations for interviewees, Podsakoff et al. also content analyzed participants' responses to open-ended questions about their decisions. The results of this analysis not only supported the quantitative findings, but also provided additional insights into the factors influencing raters' judgments. For example, these analyses indicated that (a) low-level responses had substantially more perceived influence on participants' evaluations and decisions than high-level responses and (b) low levels of voice and helping behaviors were perceived to have more influence than low levels of task performance or organizational loyalty. These findings led Podsakoff et al. to speculate that responses indicating low helping behavior might be treated as a signal that the job candidate might be difficult to work with, or unwilling to chip in and help others when necessary. Likewise, these authors speculated that low voice might be interpreted by raters as a signal that the job candidate would be unwilling to take the initiative to help the organization, even if he or she had suggestions for improvement. It is unlikely that these deeper insights would have been made without the qualitative data. 
Other advantages of using mixed methods designs that integrate quantitative and qualitative data in leadership research have been discussed by Stentz, Plano Clark, and Matkin (2012). An example of the second strategy is the study by Giessner, van Knippenberg, and Sleebos (2009) on the effects that leaders' group prototypicality and performance have on followers' perceptions of their leadership effectiveness. Using a laboratory experiment, in combination with a scenario study and a cross-sectional field study, these authors found support for their hypothesis that prototypical and non-prototypical leaders would receive similar evaluations after success, but after

Contrary to the common belief that field settings provide for more generalization of research findings than laboratory settings do, field research appeared as narrow as laboratory research in the actors, settings, and behaviors sampled. Indeed, industrial-organizational psychology seems to be developing in the laboratory a psychology of the college student, and in the field, a psychology of the self-report of male, professional, technical, and managerial employees in productive-economic organizations. (Dipboye & Flanagan, 1979, p. 141) Of course, we are not suggesting that leadership researchers conducting laboratory experiments should ignore concerns about the generalizability of their findings. After all, leadership researchers are interested in improving the effectiveness of leaders in real-life organizational settings. However, like Highhouse (2009), we think that the primary focus of laboratory experiments should be on making sure that the manipulations of leadership phenomena are valid, representative, fair, and powerful enough to produce the intended effects. Another concern about laboratory research is that student participants are not representative of the general population, which raises questions about the applicability of studies which rely on student samples. Indeed, there is a long history of concern about the use of student participants in social science research (Cooper, McCord, & Socha, 2011; McNemar, 1946; Oakes, 1972; Peterson, 2001; Rosenthal & Rosnow, 1969; Schultz, 1969). The basic issue is whether student samples produce the same results as non-student samples. Several authors (Compeau, Marcolin, Kelley, & Higgins, 2012; Gordon, Slade, & Schmitt, 1986, 1987; Henry, 2008; Landy & Bates, 1973; Sears, 1986; Slade & Gordon, 1988) have claimed that student participants differ in important ways from non-student participants. However, others (Dobbins et al., 1988; Greenberg, 1987) have argued that criticisms of the use of students demonstrates a misunderstanding of the goals of laboratory experiments, and are often flawed. One potential way to reconcile these conflicting opinions was suggested by Gordon et al. (1986). These authors noted that in many of the cases where student and non-student samples produce similar results, the student and non-student samples were equally (un)familiar with the tasks that they are asked to perform, and that differences are likely to occur when populations are differentially familiar with a task. In other words, to increase generalizability, student and non-student participants should be matched for their task familiarity. Unfortunately, we are not aware of any systematic examination providing evidence of this proposal; thus, we regard it as an interesting avenue for future research. One final limitation of laboratory experiments is the possible side effects of using deception. Although the use of deception is fairly common in some domains of psychological inquiry, several researchers (Antonakis, 2017; Hertwig & Ortmann, 2008; Jamison, Karlan, & Schechter, 2008; Ortmann & Hertwig, 2002) have noted potential problems with the practice. For instance, research has shown that participants who feel deceived behave differently from those who do not sense deception, and that repeated exposure to deception makes participants less trusting of researchers' intentions in subsequent studies (Hertwig & Ortmann, 2001, 2008; Jamison et al., 2008). Of course, obscuring the reasons for an experiment or a manipulation is not always deception. 
However, lying, deliberately misleading study participants, or mischaracterizing the purpose of the experiment, is typically considered to be deception (Ortmann & Hertwig, 2002). Therefore, we


Nahrgang et al., 2013). Therefore, well-crafted group experiments (whether conducted in the lab or the field) should enhance the likelihood of getting research published in management and leadership journals.

failure group-prototypical leaders would be rated more effective than non-prototypical leaders. In discussing the merits of using a cross-sectional field study to supplement their experimental study the authors noted that, The laboratory may seem a somewhat artificial setting to study the role of leader group prototypicality, given the lack of interaction between leader and followers and the ad hoc nature of the group and the leadership relation… [Therefore], an obvious and important question is whether the relationships studied in the present research may not only be observed in a laboratory setting but also in an organizational setting. To address exactly that question, the present research combined different research methodologies to provide both evidence from controlled experiments that can speak to issues of causality and evidence from the field that can speak to issues of generalizability. (Giessner et al., 2009, p. 446)

Field experiments

Characteristics of field experiments

Hauser et al. (2017, p. 186) defined field experiments as “Studies that induce a change in a randomly selected subset of individuals (or teams, or units) within their natural organizational context, and compare outcomes to a randomly selected group for which the change was not introduced.” As indicated in Table 2, field experiments share several similarities with laboratory experiments (e.g., establishing causal relationships; manipulation of independent variables; random assignment). That said, there are important differences, some of which favor the field experiment. For example, because field experiments are conducted in “natural” environments, they tend to be more realistic, less likely to sensitize participants to the experimental conditions, and produce results that are generally perceived to be more generalizable. However, unlike laboratory experiments, field settings may require that the independent variables be manipulated by someone in the organization where the study is being conducted, rather than by the researcher. This lack of control raises concerns about the quality of the manipulation (construct validity) if the person carrying out the manipulation does not maintain the same standards as a good experimenter. Field experiments also do not typically permit control over potentially important aspects of the context in which the study is conducted (Harrison & List, 2004). This increases the possibility that rival explanations account for the findings of field experiments, and also means that the results of field experiments may be more difficult to replicate.
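The defining assignment step in such a design can be illustrated with a short script. In the hypothetical sketch below, intact teams (placeholder identifiers) within an organization are randomly split into a group that receives the change and a group that continues as usual; outcomes would then be compared across the two sets of teams.

```python
# A minimal, hypothetical sketch of the assignment step in a field
# experiment as defined by Hauser et al. (2017): intact units (e.g., teams)
# are randomly split into a group that receives the intervention and a
# group that does not. Team identifiers here are placeholders.
import random

teams = [f"team_{i:02d}" for i in range(1, 21)]  # 20 hypothetical intact teams

random.seed(11)
shuffled = random.sample(teams, k=len(teams))
treatment_teams = sorted(shuffled[: len(teams) // 2])
control_teams = sorted(shuffled[len(teams) // 2 :])

print("Receive the intervention:", treatment_teams)
print("Business as usual:       ", control_teams)
# Outcomes are then compared across the two groups, with analyses that
# respect the fact that randomization occurred at the team level.
```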

A variation on the third strategy (combining the results of several laboratory experiments to clarify the boundary conditions for a specific relationship) has been reported recently by Bendahan, Zehnder, Pralong, and Antonakis (2015). These authors examined the effects that leadership power has on corruption. In their first experiment, they manipulated two variables assumed to be related to a leader's power (number of followers, and the amount of autonomy the leader has in allocating rewards), and found that both of these variables independently influenced leaders' antisocial behavior (corruption). In the second experiment, Bendahan et al. examined the potential moderating effects of two individual difference variables (leader's personality and testosterone level) on the relationship between power and corruption. Their results showed that power interacted with testosterone level to predict corruption, such that corruption was highest when power and baseline testosterone level were both high. They also found that a leader's honesty did not interact with power, but it did have a main effect on corruption that dissipated over time. Thus, the combination of these two studies provided support for Lord Acton's maxim that “Power tends to corrupt, and absolute power corrupts absolutely” (Acton & Himmelfarb, 1948), and also provided new insights into the effects of testosterone on this relationship. Finally, given the increased interest in the study of groups and teams (Mathieu, Hollenbeck, van Knippenberg, & Ilgen, 2017), another strategy for increasing the likelihood of publishing laboratory experiments is to study the relationships between group-level inputs, processes, and outcomes. Several examples using this strategy exist in the literature. For example, Johnson, Hollenbeck, DeRue, Barnes, and Jundt (2013) examined the effects of providing groups with feedback and diagnostic lists on team change processes and performance in selfmanaged teams, and found that structurally misaligned teams that received diagnostic lists and feedback about their misalignment were more likely to change their structure and improve their performance than teams that did not receive the feedback or diagnostic lists. Nahrgang et al. (2013), examined the effects of three different types of goal setting (specific learning, general “do your best” learning, and specific performance goals) on team performance. Contrary to findings at the individual level, they reported that: (a) teams with specific learning goals performed worse than teams with “do your best” learning goals or specific performance goals, and (b) the negative effect of specific learning goals, relative to “do your best” or specific performance goals were magnified under conditions of higher task complexity. The benefits of using laboratory experiments to study group phenomena include: (a) enhancing a researcher's ability to establish causal relationships between group inputs and group processes and their outcomes, (b) minimizing the effects of extraneous variables that are difficult to control when conducting research with real groups in organizational settings, and (c) they can be used to provide strong tests of whether individual-level effects are homologous at the group level (cf.

Examples of field experiments in leadership research We focus on three studies (Avey, Avolio, & Luthans, 2011; Dvir, Eden, Avolio, & Shamir, 2002; Martin, Liao, & Campbell, 2013) that highlight some of the challenges of conducting experiments in field settings. Dvir et al. (2002) studied the effects of transformational leadership training on follower development and performance in two phases. In Phase 1, 160 infantry cadets engaged in officer training in the Israel Defense Force (IDF) were randomly assigned to experimental or alternative treatment group workshops designed to enhance their leadership skills. The experimental workshops focused on transformational leadership theory, whereas the alternative treatment workshops focused on elements of “eclectic” leadership and were based on a psychodynamic framework. Both workshops used a variety of training methods, including role playing, group discussions, simulations, presentations, video cases, and peer and trainer feedback. Phase 2 of the study began after the cadets had completed the officer training course. This phase was conducted during a four-month infantry basic training course that began two months after the leadership training workshops ended. In this phase 54 (34%) of the cadets from Phase 1 were assigned to lead platoons undergoing basic training; 32 of them had received training in transformational leadership and the remaining 22 had received training in eclectic leadership. These 54 platoon leaders had a total of 90 non-commissioned officers (NCOs) reporting directly to them and a total of 724 indirect followers (new recruits) who reported to the NCOs. Dvir et al. (2002) assessed the impact of the platoon leaders' behaviors on (a) the development of their direct followers (NCOs) and (b) the development and performance of their indirect followers (new recruits) at the end of basic training course. Dvir et al. (2002) collected leadership ratings and developmental information from the NCOs and the new recruits at the beginning and end of the basic training course and the performance data from the new recruits at the end of basic training. They conducted several manipulation checks to determine whether the transformational leadership 15


training had the intended effects. As expected, the first manipulation (confound) check showed that there were no significant differences in how favorably the platoon leaders in the experimental and alternative treatment groups responded to the leadership workshops. This finding suggests that differences in the effects produced by the platoon leaders could not be attributed to differences in how positively they responded to the training they had received. The second manipulation check indicated that the platoon leaders from the experimental group had acquired more knowledge about transformational leadership theory than the platoon leaders in the alternative treatment group. Finally, the third manipulation check indicated that NCOs in platoons led by the members of experimental group gave their platoon leaders higher ratings for transformational leadership behavior than the NCOs in the alternative treatment group. In contrast, new recruits' ratings of the transformational leadership behavior of platoon leaders were not significantly different across groups. The results of the study revealed several differences between the development of the NCOs of the platoon leaders who had received transformational leadership training and those who had received eclectic leadership training. For example, there were significant group differences in NCOs' feelings of self-efficacy, critical or independent thinking ability, and extra effort, as well as a marginal difference in the strength of their collectivistic orientation. However, somewhat surprisingly, the majority of these differences resulted from the fact that the dependent variables tended to decline over basic training in the alternative treatment group, but remained unchanged in the experimental group. Thus, the primary impact of the transformational leadership training was to prevent regression in NCOs serving under the person trained, rather than to enhance their development. Although Dvir et al. (2002) found that measures of new recruits' development were similar regardless of the leadership training their platoon leader had received, they did find performance differences between the two groups of new recruits. Specifically, recruits in the experimental platoons performed better than recruits in the eclectic leadership platoons on a written test about light weapons and on an obstacle course, and there was a marginal group difference in performance on a practical light weapons test. Dvir et al.'s (2002) study is interesting, because it provides some insight into both the benefits and challenges of field experiments in real organizational settings. On the positive side, this study is one of the first to demonstrate that leaders' behavior can affect both their direct and indirect followers. Demonstrating the indirect impact of leadership would likely be difficult, although perhaps not impossible, in a laboratory setting. On the other hand, the Dvir et al. study also demonstrates some of the challenges that researchers face in conducting experiments in field settings because they may not possess as much control over the independent variable and have considerably less control over extraneous variables. 
This was exemplified by the fact that, a few weeks before the platoon leaders in the experimental condition began performing their leadership role, they received a three-hour “booster” session to reinforce the lessons that they had learned in training, but due to budgetary constraints the platoon leaders in the control condition did not receive an analogous session. Dvir et al. noted that this difference in training protocols meant that they could not rule out the possibility that the booster session was responsible for some of the effects they had observed. In our second example, Martin et al. (2013) utilized a pretest-posttest experimental design with a control group to examine the consequences of leader behaviors. Business leaders in the United Arab Emirates were recruited and randomly assigned to one of three training conditions: (a) empowering leadership, (b) directive leadership, or (c) a control condition. Leaders in the empowering and directive leadership groups received their experimental treatments in two phases. The first phase consisted of a one- to two-hour training session. The second phase lasted for 10 weeks and required the leaders to engage in the newly learned leader behavior for 15 min each day. To ensure

compliance with this protocol, participating leaders were asked to maintain a daily log and hold bi-weekly discussions with a research assistant. Participants in the control group did not receive any specific training in leadership behavior, and were simply told to continue leading their teams in their usual style. Data on the dependent variables were obtained from internal or external customers of the leaders. Employee surveys were used to check the two manipulations of leadership behavior, and a measure of employees' satisfaction with their supervisor was used in moderator analysis. Customer surveys were used to measure core task proficiency and proactivity of the work units. The manipulation checks seemed to confirm that the leadership treatments had their intended effects; specifically, the authors reported that the directive leadership treatment accounted for 15% of the variance in the directive leadership manipulation check, and the empowering leadership treatment accounted for 36% of the variance in the empowering leadership manipulation check. Consistent with the authors' hypotheses, the results showed that work units' task proficiency and their satisfaction with their leaders increased between the pretest and posttest in both the directive and empowering leadership groups, but not in the control group. Post hoc comparisons indicated that posttest proficiency was higher in the empowering and directive leadership conditions than in the control condition, and similar in the two experimental groups. Also consistent with the authors' hypotheses, work units' proactivity increased from the pretest to the posttest in the empowerment leadership group, but not the other two groups; post hoc comparisons indicated that posttest proactivity was similar in the directive and control group work units, and higher in the empowering group units than in either the directive or control group units. However, Martin et al. (2013) did not find evidence to support their hypothesis that satisfaction with the leader moderated the relationships between leader behavior and the outcome variables. Indeed, contrary to their hypotheses (a) satisfaction with the leader did not interact with directive leadership to influence task proficiency and (b) although satisfaction with leader did moderate the relationships between empowering leadership and work units' task proficiency and proactivity, the interactive effects were in the opposite direction to that predicted. The Martin et al. (2013) study is interesting because the authors examined: (a) the effects of two qualitatively different types of leadership behavior on unit-level outcome variables, and (b) potential moderators of these effects. Thus, this study demonstrates that field experiments possess some of the same flexibility as laboratory experiments. However, Martin et al. did report that the empowering leadership treatment accounted for more than twice the amount of variance in the relevant manipulation check (36%) than the directive leadership treatment did (15%), which raises questions about the equivalence of the treatments (Cooper & Richardson, 1986). There are several reasons for the possible lack of equivalence in this study. For example, (a) the directive leadership treatment may not have captured the complete conceptual domain of this construct or (b) the manipulation check may not have measured the directive leadership construct effectively. 
In the first case, the manipulation lacks construct validity, whereas in the second case it is the measure that lacks validity. Alternatively, it is possible that the manipulation captured the complete domain of the construct but that the directive training was not as effective as the empowerment training, or that, although the training in both experimental conditions was equally effective, the participants in the directive leadership condition were more uncomfortable exhibiting the directive leadership behaviors than participants in the empowering leadership condition. In any case, differences in the strengths of these manipulations may be an important qualifier of the reported findings. We are not suggesting that establishing the equivalence of qualitatively different behavioral treatments is more important in field experiments than in laboratory experiments. However, it may be easier to ensure equivalence in laboratory experiments than in field experiments, because laboratory researchers generally have greater ability to pilot


variety of topics that are of interest to organizational scholars and practitioners alike, including the effects of monetary (Bandiera, Barankay, & Rasul, 2007, 2011; Shearer, 2004) and non-monetary incentives (Bandiera et al., 2011) on employees' motivation and performance, how credit market imperfections and liquidity constrain a firms' growth (de Mel, McKenzie, & Woodruff, 2008), and how frontline opinion leaders can be used as change agents in organizational settings (Lam & Schaubroeck, 2000). Third, field experiments are less susceptible to criticisms about artificiality because they are conducted in real organizational settings, with “real” people performing “real” jobs; the manipulation of the independent variable reflects the intensity of stimulus events in organizational settings; and participants are typically exposed to the manipulation for a longer period of time and are naturally incentivized to do their jobs (Lonati et al., 2018). Moreover, because participants are often less aware of the experimental conditions in field settings, the possible effects of demand characteristics are less of a concern. In addition, field experiments, like laboratory experiments, provide an opportunity to measure behaviors (and their outcomes) rather than attitudes and perceptions. Finally, like laboratory experiments, field experiments can be used to examine mediation effects using experimental-causal-chain and moderation-of-process designs. Although we are not aware of any set of field experiments in the leadership domain that were explicitly designed to test for mediation effects using either of these approaches, Eden et al. (2015) demonstrated that meta-analyses can be used to synthesize the results of multiple randomized field experiments to establish evidence in support of an experimental-causal-chain model. These authors examined the evidence relating to Eden's (1992, 2003) Pygmalion mediation model. This model hypothesizes that: managers' expectations → managers' leadership behavior → subordinates' self-efficacy → subordinates' performance. In order to test these causal linkages, Eden et al. (2015) first examined meta-analytic evidence from five field experiments demonstrating that managers' expectations cause managerial leadership behavior. Next, they examined evidence from the only true field experiment (Dvir et al., 2002) showing that leadership behavior affects subordinates' self-efficacy. Finally, they examined data from five field experiments showing that subordinates' self-efficacy increases their performance. Eden et al. argued that the collective evidence from these studies supports the hypothesis that leadership behavior and self-efficacy mediate the relationship between leaders' expectations and subordinates' performance. Of course, as noted earlier, the ability to use experimental-causalchain designs to make strong statements about the veracity of mediating effects is conditional on several factors: the measurability of the proposed mediator, the manipulability of both the independent and mediating variables, and whether the mediating variables being manipulated and measured do in fact represent the same construct. Nevertheless, given the advantages of field experiments when it comes to maximizing the internal and external validity of findings, we encourage leadership researchers to consider using experimental-causalchain designs to examine mediation hypotheses. Like Eden (2017) and Hauser et al. 
(2017), we also encourage researchers to explore the possibility of using moderation-of-process designs in their field experiments on leadership. As noted by Hauser et al. (2017, p. 195), this is possible if researchers include “not just an experimental condition that produces the desired treatment effect (as predicted by the theory), but also include a condition that turns off this effect by blocking a theorized pathway responsible for the effect. That is, by also testing a version of the intervention where the treatment should not show the same impact.” Of course, researchers examining psychological processes using moderation-of-process designs must have strong theoretical grounds for assuming that the levels of the moderator block or facilitate the psychological process of interest, and that there is not some other process that could explain the effects of the independent variable on the dependent variable(s).
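To make this analytic logic concrete, the following sketch (our own illustration, using simulated data and hypothetical variable names rather than data from any of the studies cited above) shows how a moderation-of-process design might be analyzed: if the theory is correct, the treatment should affect the outcome when the theorized pathway is open, and this effect should largely disappear in the condition designed to block the pathway, which corresponds to a treatment by blocking-condition interaction.

# A minimal sketch, not taken from any of the cited studies, of how a
# moderation-of-process design can be analyzed: the treatment effect should
# appear when the theorized pathway is open and disappear when it is blocked,
# which corresponds to a treatment x blocking-condition interaction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400
treatment = rng.integers(0, 2, n)   # 0 = control, 1 = intervention
blocked = rng.integers(0, 2, n)     # 0 = pathway open, 1 = pathway blocked

# Hypothetical data: the intervention raises the outcome only when the pathway is open.
outcome = 5 + 1.5 * treatment * (1 - blocked) + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([treatment, blocked, treatment * blocked]))
fit = sm.OLS(outcome, X).fit()
print(fit.summary(xname=["const", "treatment", "blocked", "treat_x_blocked"]))
# Evidence consistent with the theorized process: a positive treatment effect and a
# negative treatment x blocked interaction of roughly the same magnitude, so that the
# simple effect of the treatment in the blocked condition is near zero.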

test their manipulations. For example, in the Podsakoff et al. (2011) laboratory experiment, the authors assessed the equivalence of their behavioral manipulations (task performance, helping, voice and loyalty) in a pilot study with students who did not participate in the main study. In the pilot study, students were: (a) shown one video depicting the questions and responses for either a high or low level of one behavior, (b) asked to sort the script of that video into one of the four behavioral categories on the basis of its content, and (c) then rate the extent to which the behavior in the script represented a high or low level of the focal behavior, using a scale from 1 (low level) to 7 (high level). Analysis of the data showed that: (a) the participants classified the scripts into the appropriate behavioral category 92% of the time, (b) there was a high degree of consensus about the level of behavior (high vs. low) depicted in videos, and (c) high- and low-level videos of each behavior were found to differ significantly from each other. Notwithstanding the potential difficulties in obtaining this data in field settings, when possible, leadership researchers conducting field experiments should examine the equivalence of their manipulations before employing them. Our final field experiment was conducted by Avey et al. (2011). These researchers examined the effects of leader positivity and problem complexity on followers' positivity and job performance. The sample consisted of engineers in an aerospace firm, and participants were randomly assigned to one of four conditions: (a) a high leader positivity-low problem complexity condition, (b) a high leader positivityhigh problem complexity condition, (c) a low leader positivity-low problem complexity condition, and (d) a low leader positivity-high problem complexity condition. In order to make the experimental tasks as realistic as possible, participants in all conditions were asked to solve problems that were directly related to their jobs. Similarly, in order to ensure that the leadership manipulations were as realistic as possible, Avey et al. led the participants to believe that the high or low expressions of positivity they received were from a team of senior engineering leaders to whom they reported. Manipulation checks indicated that the leader positivity manipulation influenced the measure of this construct, but did not influence the measure of problem complexity, whereas the manipulation of problem complexity influenced the measure of problem complexity but not the measure of leadership positivity. In addition, the leadership positivity by problem complexity interaction did not affect either the leadership positivity or the problem complexity manipulation checks. Consistent with Perdue and Summers (1986), these findings provide strong evidence that leadership positivity can be manipulated independently from problem complexity, and that the manipulations of these variables were not confounded in this study (at least not with the content from their other manipulation). The results showed that (a) leadership positivity had a positive effect on followers' positivity and performance, (b) problem complexity had a negative effect on followers' positivity and (c) there was no interaction between leader positivity and task complexity with respect to followers' positivity or performance. 
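For readers who want a concrete template, the following sketch (our own, based on simulated data and hypothetical variable names, not the Avey et al. data) illustrates the Perdue and Summers (1986) logic for confound checks in a two-factor experiment: each manipulation check is regressed on both manipulated factors and their interaction, and only the focal manipulation should have a reliable effect on its own check.

# Minimal sketch (not the original authors' code) of the Perdue and Summers (1986)
# logic for checking that two manipulations are not confounded: each manipulation
# check should respond to its own manipulation, but not to the other manipulation
# or to their interaction. All variable names and data are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200                                   # hypothetical participants
positivity = rng.integers(0, 2, n)        # 0 = low, 1 = high leader positivity
complexity = rng.integers(0, 2, n)        # 0 = low, 1 = high problem complexity

# Simulated manipulation-check ratings: each check responds only to its own factor.
check_positivity = 3 + 2.0 * positivity + rng.normal(0, 1, n)
check_complexity = 3 + 1.5 * complexity + rng.normal(0, 1, n)

def confound_check(check, own, other):
    """Regress a manipulation check on both factors and their interaction."""
    X = sm.add_constant(np.column_stack([own, other, own * other]))
    fit = sm.OLS(check, X).fit()
    return fit.params, fit.pvalues        # only the own factor should have a reliable effect

for label, check, own, other in [
    ("positivity check", check_positivity, positivity, complexity),
    ("complexity check", check_complexity, complexity, positivity),
]:
    params, pvals = confound_check(check, own, other)
    print(label, "b(own)=%.2f p=%.3f | b(other)=%.2f p=%.3f | b(inter)=%.2f p=%.3f"
          % (params[1], pvals[1], params[2], pvals[2], params[3], pvals[3]))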
Strengths of field experiments
As in laboratory experiments, one of the strengths of field experiments is that the researcher (or someone in the organization) exercises control over the independent variable(s), and the participants are randomly assigned to conditions. This means that researchers can use field experiments to explore causal relationships, and reduce the threats to the internal validity of their studies. Second, field experiments are a constructive response to the growing interest in evidence-based management practices that has developed over the past few decades (Pfeffer & Sutton, 2006; Rousseau, 2012; Rynes & Bartunek, 2017; Shadish & Cook, 2009). As noted by Shadish and Cook (2009), there is an increasing number of fields in which practitioners and administrators are interested in developing and adopting evidence-based interventions, and field experiments provide such evidence. For example, field experiments lend themselves to the investigation of a


finding, which might not have otherwise been readily identifiable.

Potential limitations of field experiments As noted above, field experiments have some limitations (e.g., researchers are unlikely to exercise as much control over the independent or extraneous variables as in laboratory experiments, it is potentially more difficult to replicate the findings of field experiments, and it may prove more difficult to implement complex factorial designs in field settings than in the laboratory). In addition, another potential limitation of field experiments is that it can be difficult to gain access to organizations to carry out field experiments. Indirect evidence of this is provided by Scandura and Williams (2000), who reported that field experiments accounted for a relatively small (and declining) percentage of studies reported in the leading management journals (i.e., 3.9% in the 1980s to 2.2% in the 1990s). This is consistent with our analysis, which indicated that the percentage of field experiments published in the seven journals we examined never exceeded 2%. However, given that field experiments require organizations to allow a researcher to manipulate variables likely to have a significant impact on employees' attitudes, perceptions, and behaviors, it is not really surprising that they are often reluctant to participate. Moreover, since organizations want to optimize the performance of their units, managers may look unfavorably on the potential costs of disruption resulting from the random assignment of participants to conditions. Finally, Eden (2017) noted that field experiments are often perceived to be complex and difficult to conduct, which may deter scholars from attempting them.

Quasi-experiments Characteristics of quasi-experiments According to Grant and Wall (2009, p. 655), A quasi-experiment is a study that takes place in a field setting and involves a change in a key independent variable of interest but relaxes one or both of the defining criteria of laboratory and field experiments: random assignment to treatment conditions and controlled manipulation of the independent variable. Quasi-experiments thus include experimenter-controlled and manager-controlled interventions in which random assignment is not achieved, such as when treatments are assigned to intact or preexisting groups. As indicated in Table 2, quasi-experiments share some similarities with both laboratory and field experiments. For example, like laboratory and field experiments, their objective is to establish causal relationships and they involve manipulation of independent variable(s). However, in contrast to other types of experimental designs, participants are not randomly assigned to treatment conditions in quasi-experiments. In fact, it is common for treatment conditions to be assigned to pre-existing groups or for participants to self-select into treatment conditions. As noted earlier, assigning participants to treatments in these ways increases the possibility that participants in the various conditions differ on other characteristics that may be related to the dependent variable(s). Like field experiments, quasi-experiments also differ from laboratory experiments in that the researcher typically does not control potentially important aspects of the context in which the study is conducted; thus, increasing the number of rival hypotheses that might account for the findings.
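A small simulation (our own, with artificial data rather than data from any study cited here) makes this selection problem concrete: when an unmeasured pre-existing characteristic influences both who ends up in the treated group and the dependent variable, a simple comparison of group means will suggest a treatment effect even when the true effect is zero, whereas random assignment does not produce this bias.

# A small simulation of why assigning treatments to intact or self-selected groups
# matters: a pre-existing characteristic that drives both group membership and the
# outcome creates an apparent "treatment effect" even though the true effect is zero.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
ability = rng.normal(0, 1, n)                       # unobserved pre-existing difference

# Nonrandom assignment: higher-ability people are more likely to be in the treated unit.
treated_nonrandom = (ability + rng.normal(0, 1, n)) > 0
# Random assignment, for comparison.
treated_random = rng.integers(0, 2, n).astype(bool)

outcome = 2.0 * ability + rng.normal(0, 1, n)       # true treatment effect is zero

def naive_effect(treated):
    return outcome[treated].mean() - outcome[~treated].mean()

print("Apparent effect, nonrandom assignment: %.2f" % naive_effect(treated_nonrandom))
print("Apparent effect, random assignment:    %.2f" % naive_effect(treated_random))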

What can be done to increase the probability that field experiments will be published?
Although there is considerably less resistance among editors and reviewers to publishing field experiments, the larger challenge may be convincing an organization of the benefits of participation. Fortunately, Eden (2017) has provided useful suggestions for overcoming managers' objections to field experiments. These include (a) refraining from using research jargon, (b) explaining the purpose and value of randomization to managers, (c) looking for creative ways to implement randomized experiments, (d) using treatments of deleterious independent variables (e.g., stress) that are designed to reduce, rather than increase, their effects, and (e) piggybacking on naturally occurring events in the organization. We believe that these recommendations are sound and would encourage readers interested in conducting field experiments to read Eden's paper for additional details on how to implement them. We also think that Colquitt's (2008) suggestions for those interested in publishing laboratory experiments are relevant to those wanting to increase their likelihood of publishing field experiments in the leadership domain (e.g., aim to test, extend or build new theory; ensure high internal validity, construct validity and statistical conclusion validity; use behavioral dependent variables where possible; strive to produce original, interesting, and important research findings). In addition, researchers should emphasize in their papers that field experiments can combine the best elements of experimental research with the ecological validity of real organizational settings. Finally, as we noted earlier in our discussion of laboratory experiments, combining the results of field experiments with the results of other qualitative or quantitative studies should make them easier to publish. A good example of this approach is Li, Zheng, Harris, Liu, and Kirkman's (2016) examination of the spillover effects of providing positive social recognition in teams. These authors combined two laboratory experiments and one field experiment to show that the recognition received by a single team member boosted his or her teammates' individual performance, and the collective performance of the team. However, the results of this field experiment also highlighted an unintentional downside of administering individual recognition in existing organizational teams, in that the performance of employees in the control condition decreased following the recognition announcements in the experimental condition. Since this drop in performance did not occur in the more internally valid laboratory experiments conducted by Li et al., it led the researchers to speculate on potential explanations for this

Examples of quasi-experiments in leadership research
As noted by Shadish et al. (2002), there is a wide variety of quasi-experimental designs. To illustrate this variety, we highlight one study that uses a pretest-posttest nonequivalent groups design (Hui, Lam, & Schaubroeck, 2001), one that uses an interrupted time-series design (Grant & Hofmann, 2011), and one that uses a cohort design (DeRue, Nahrgang, Hollenbeck, & Workman, 2012). These designs are generally not as effective as regression-discontinuity designs (RDDs), but few studies in the leadership domain have used RDDs (for an exception see Steffens, Peters, Haslam, & van Dick, 2017). This is unfortunate, because when used properly, regression-discontinuity designs provide strong evidence of cause-effect relationships. We encourage leadership researchers interested in learning more about them to refer to Shadish et al. (2002), Antonakis et al. (2010), and Cappelleri and Trochim (2015). In the first of the studies we explore, Hui et al. (2001) examined the effect that training bank employees to become service quality leaders has on customer satisfaction and employees' compliance to the requirements of a new service quality program. Hui et al. tested two hypotheses. The first hypothesis stated that, compared with organizational units that did not have a service quality leader, units using frontline employees as service quality leaders would be more successful in implementing the service quality initiative. The second hypothesis stated that, compared with organizational units that used randomly selected frontline employees as service quality leaders, units using frontline employees selected on the basis of their OCB would be more successful in implementing the service quality initiative. Hui et al. (2001) tested their hypotheses in three U.S. branches of a large multinational bank, using a two-wave, repeated measures design. In one branch employees were selected to become service quality leaders on the basis of previous OCB, in the second branch the selection of service quality leaders was random, and in the third (control) branch no employees were trained to be service quality leaders. No differences in age, education level, or organizational tenure were found between the


quasi-experiment indicated that fundraisers who received an ideological message from a beneficiary performed significantly better after the message was delivered, whereas the performance of fundraisers who received messages from leaders did not change following delivery of the message. However, Grant and Hofmann noted that since the fundraisers were not randomly assigned to treatment conditions, their results were subject to several threats to validity. These included selection threats (fundraisers who showed up for the scholarship student's speech may have been more committed than other fundraisers), and multiple treatment effects (the messages varied in terms of content as well as source). Nevertheless, Grant and Hofmann provided reasons why several other potential threats (e.g., history, testing, instrumentation, test-treatment interaction effects, statistical regression, resentful demoralization, compensatory rivalry, compensatory equalization and treatment diffusion) were not likely to have influenced their findings. They conducted two laboratory experiments to address some of these potential threats. The final quasi-experiment was reported by DeRue et al. (2012). These authors were interested in exploring the effects that structured reflection, in the form of after-event reviews (AERs), have on experience-based leadership development activities, as well as how prior experiences and personality influence the impact of AERs on leadership development. DeRue et al. used MBA students as participants in a quasiexperimental cohort design. According to Cook and Campbell (1979, p. 127), “cohorts” are “groups of respondents who follow each other through formal institutions or informal institutions.” In the DeRue et al. study, the first cohort of MBA students (the control group) preceded the second cohort of MBA students (the experimental cohort) by two years. Comparisons of the two cohorts indicated that they were similar with respect to a variety of factors previously shown to be related to leadership, including demographic variables, experience, cognitive ability, and personality traits. In addition, in order to reduce the likelihood that the subsequent findings would be due to unknown confounds or selection-maturation biases, both cohorts were exposed to the same curriculum, taught by the same instructors, and exposed to the same extracurricular activities and leadership development experiences. Unlike the participants in the control cohort, who were asked by trained facilitators simply to discuss the lessons they had learned after each major leadership developmental activity, the participants in the experimental cohort were guided through the AER protocol by the facilitators. Participants first answered a series of questions relating to the activity (e.g., about the goal of the experience, their own behavior and contributions, the behavior of others, and specific actions they could take to improve their future performance), and were then guided by their facilitator to identify what they had learned about their leadership capabilities. Consistent with DeRue et al.'s (2012) hypotheses, the results showed that the AER intervention had a positive effect on leadership development, and that this effect was stronger in participants who were more conscientious, more open to new experiences, more emotionally stable, and had experienced greater developmental challenges in their previous work experiences. 
However, in contrast to their hypotheses, the authors found that neither participants' cognitive ability nor their amount of work experience moderated the relationship between the AER intervention and leadership development.
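To illustrate the basic analytic logic of an interrupted time-series design such as the one used by Grant and Hofmann (2011, Study 1), the sketch below (our own stylized example, with simulated data and hypothetical values rather than the authors' data) regresses a daily performance series on the pre-intervention trend, an indicator for the post-intervention period, and the time elapsed since the intervention, so that the intervention appears as a change in level and/or slope at the interruption.

# A stylized sketch of the basic interrupted time-series (segmented regression) logic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
days = np.arange(60)                   # hypothetical 60 observation days
cutoff = 30                            # intervention (e.g., the beneficiary's speech)
post = (days >= cutoff).astype(float)
time_since = np.where(post == 1, days - cutoff, 0)

# Simulated daily performance with a level shift after the intervention.
performance = 100 + 0.2 * days + 15 * post + rng.normal(0, 5, len(days))

X = sm.add_constant(np.column_stack([days, post, time_since]))
fit = sm.OLS(performance, X).fit()
print(fit.params)   # [intercept, pre-existing trend, level change, slope change]
# In a real application, autocorrelated errors would need attention, for example by
# requesting robust standard errors:
# fit = sm.OLS(performance, X).fit(cov_type="HAC", cov_kwds={"maxlags": 5})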

tellers who were trained in the two banks that received the experimental treatment. This treatment consisted of three weekly, two-hour group training sessions (led by an independent consultant) for those selected to become service quality leaders. The first session was devoted to a discussion of the new company policy for improving the quality of service. The second session identified specific behavioral changes that were needed to improve the quality of customer service, and also included a discussion on how to use conversations to alert tellers to the benefits of providing quality service. The final session involved brainstorming strategies for improving service quality. Three dependent variables were used to assess the effects of the training: customers' ratings of satisfaction with the service they had received, bank employees' self-ratings of compliance to the new service quality program, and supervisors' ratings of their employees' compliance to the new service quality program. A manipulation check indicated that leaders selected on the basis of their OCB did, indeed, receive higher supervisor ratings on a measure of OCB than other employees in the three bank branches, as well as the leaders chosen randomly to be trained in the other branch bank. Consistent with both hypotheses, Hui et al. (2001) reported that the two branches that used trained frontline employees as service quality leaders received higher customer satisfaction ratings than the branch without any service quality leaders (Hypothesis 1), and that the branch with “good citizens” as leaders received higher customer satisfaction ratings than the branch with randomly selected leaders (Hypothesis 2). Supervisor ratings of employees' compliance to the new service quality program also provided support for the hypotheses; however, employees' self-ratings only supported Hypothesis 1. Supervisors in the two branches that used frontline leaders reported better compliance to the new service quality plan than supervisors in the branch without frontline leaders but, inconsistent with Hypothesis 2, self-reported compliance to the new service quality program was similar in the branch with “good citizens” as leaders and the branch that used randomly selected leaders. In discussing the potential limitations of their study, Hui et al. (2001) noted that although the three bank branches were randomly assigned to treatment conditions, the tellers were not randomly assigned to the branches, and the researchers could not control for extraneous influences in the branch environments. However, they went on to provide reasons why several potential threats to construct and internal validity (e.g. compensatory equalization, resentful demoralization, selection, maturation, and reactance) were implausible explanations for their findings. In the second quasi-experiment, Grant and Hofmann (2011, Study 1) examined the effects that the source of an ideological message (leader versus beneficiary of the message) has on the performance of the targets of the message. They noted that although virtually all of the previously reported studies had positioned leaders as the source of ideological messages, in some organizations such messages are delivered by beneficiaries. They went on to hypothesize that ideological messages delivered by beneficiaries have stronger effects on employees' performance than messages delivered by leaders. 
Grant and Hofmann (2011) tested their hypotheses by studying the behavior of 60 university fundraisers in a naturally occurring quasi-experiment that took place over a three-month period. The fundraisers were responsible for contacting alumni and persuading them to donate money. The interventions occurred when the fundraisers' manager invited two university leaders and a scholarship student (a beneficiary of the fundraising process) to deliver messages at the beginning of the fundraisers' shifts. The authors tracked the performance of the fundraisers on a daily basis, before and after the interventions. Fourteen of the fundraisers received a message from a Director of the Young Alumni, 23 fundraisers received a message from a member of the Board of Trustees, and 18 fundraisers received a message from a scholarship student beneficiary. Performance was measured by the amount of money raised by each of these three groups. The results of Grant and Hofmann's (2011) interrupted time-series

Strengths of quasi-experiments
Quasi-experiments share many of the strengths of field experiments. First, control is exercised over the independent variable(s) of interest in quasi-experiments.3 In addition, quasi-experiments are less susceptible

3 It is worth noting, however, that in one type of quasi-experiment (often referred to as a “natural experiment”), the manipulation can happen naturally. In these studies, the researcher does not manipulate the independent variable, but instead takes advantage of the natural occurrence of the manipulation.


confidence in causal inferences based on quasi-experimental data (compared with laboratory experiments), as well as confidence in the replicability of the findings. Finally, it is also difficult, if not impossible, to find quasi-experiments that examine multiple independent variables using complex factorial designs.

to criticisms about artificiality and demand characteristics (compared to laboratory experiments). The reasons for this advantage are that: (a) quasi-experiments are typically conducted in real organizational settings with real employees performing real jobs, (b) manipulation of the independent variable reflects the intensity of stimulus events in real organizational settings, (c) participants are typically exposed to the treatment(s) for a longer period of time than participants in laboratory experiments, and (d) participants are normally less aware of the experimental conditions, and, therefore, less subject to some forms of participant reactivity. Finally, like other experimental designs, quasi-experiments provide researchers with an opportunity to measure actual behaviors (or their outcomes), rather than focusing only on attitudes and perceptions. In addition to the strengths identified above, Grant and Wall (2009, p. 653) have noted that quasi-experiments may be particularly beneficial for: “(a) strengthening causal inference when random assignment and controlled manipulation are not possible or ethical; (b) building better theories of time and temporal progression; (c) minimizing ethical dilemmas of harm, inequity, paternalism, and deception; (d) facilitating collaboration with practitioners; and (e) using context to explain conflicting findings.” Similar points regarding the benefits of quasi-experiments for exploring causal relationships when ethical considerations preclude random assignment or controlled manipulations, or when there is reluctance to participate in such studies, have been noted by Thyer (2012). Furthermore, Thyer has also noted that small-scale quasi-experiments may be particularly useful in testing the effectiveness of interventions before investing more resources in conducting large-scale field experiments.

What can be done to increase the probability that quasi-experiments will be published? The critical first step is to obtain access to a participating organization. Grant and Wall (2009) have provided several worthwhile recommendations for gaining the cooperation of organizations including: (a) building long-term relationships and trust with organizations; (b) disseminating the findings of previous research to practitioners; (c) explaining how quasi-experiments can help practitioners achieve their goals; (d) asking questions in order to find out what practitioners value and tailoring the study to these values; (e) highlighting the advantages that quasi-experiments have for researchers; (f) emphasizing common goals and unique expertise; (g) translating research jargon into common language; and (h) finding the right contacts in the organization. Beyond these strategies, leadership researchers need to look for ways to enhance the contributions of their quasi-experiments. One obvious way is to follow the lead of Grant and Hofmann (2011), and pair the results of a quasi-experiment with laboratory (or field) experiments that directly address problems associated with the nonrandom assignment of participants to treatments. Similarly, the findings of quasi-experiments can be enhanced by combining them with the results of a non-experimental survey study that provide additional insights into the boundary conditions of the relationships examined in the quasi-experiment, or with a qualitative study that delves deeper into the theoretical mechanisms responsible for the findings. We believe that there are other strategies that researchers can use to minimize threats to validity in quasi-experiments. The first recommendation is to anticipate and proactively address the likely threats to internal validity (Cook & Campbell, 1979; Mark & Reichardt, 2009). At the most basic level this can be accomplished by listing the threats in the planning phase of the study and then examining how well design decisions address each of them. Such planning helps avoid problems that might not otherwise be recognized until after the study has been conducted. Given that participants are not randomly assigned to conditions in quasi-experiments, one of the most obvious threats to the validity in such studies is selection. In response to this threat, researchers (Gu & Rosenbaum, 1993; Rosenbaum & Rubin, 1983; Smith, 1997; Stuart, 2010) have developed, described, and in some cases tested, the effects of a variety of techniques designed to match non-randomly assigned participants across conditions. Matching is “any method that aims to equate (or ‘balance’) the distribution of covariates in the treated and control groups” (Stuart, 2010, p. 1). These procedures are designed to rule out potential threats to internal validity by ensuring that groups are equivalent with respect to potential confounding factors. Matching techniques include propensity score matching, individual-to-individual (or 1:1) matching, frequency distribution matching, weighted matching, and sub-classification matching. A complete treatment of matching techniques is beyond the scope of this paper, but we encourage interested readers to examine articles on this topic by Stuart (2010), Harder, Stuart, and Anthony (2010), Connelly, Sackett, and Waters (2013), and Li (2013), as well as the book by Holmes (2014). 
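As a concrete, though deliberately simplified, illustration of one of these techniques, the sketch below (our own, with simulated data and hypothetical variable names) estimates propensity scores with a logistic regression and then performs 1:1 nearest-neighbor matching with replacement; in real applications, researchers would also need to assess covariate balance and common support, as the sources cited above discuss.

# A bare-bones sketch of 1:1 nearest-neighbor matching on the propensity score,
# in the spirit of the procedures reviewed by Stuart (2010) and Li (2013).
# All data and variable names are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
covariates = rng.normal(0, 1, (n, 3))                        # e.g., tenure, age, pretest score
treat = (covariates @ np.array([0.8, 0.3, 0.5]) + rng.normal(0, 1, n)) > 0
outcome = covariates @ np.array([1.0, 0.5, 0.5]) + 2.0 * treat + rng.normal(0, 1, n)

# 1. Estimate propensity scores from a logistic regression of treatment on covariates.
ps_model = sm.Logit(treat.astype(int), sm.add_constant(covariates)).fit(disp=0)
pscore = ps_model.predict(sm.add_constant(covariates))

# 2. For each treated case, find the control case with the closest propensity score.
treated_idx = np.where(treat)[0]
control_idx = np.where(~treat)[0]
matches = control_idx[np.argmin(np.abs(pscore[treated_idx][:, None]
                                       - pscore[control_idx][None, :]), axis=1)]

# 3. Compare outcomes within the matched sample (a crude estimate of the
#    average treatment effect on the treated in this simulated example).
att = (outcome[treated_idx] - outcome[matches]).mean()
print("Matched estimate of the treatment effect: %.2f (true simulated value 2.0)" % att)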
Finally, because there is a greater likelihood that the independent variables examined in quasi-experimental studies are endogenous (that is, correlated with the error terms of the dependent variables), quasi-experiments are susceptible to endogeneity biases, which render the estimates inconsistent. As a result, we encourage researchers interested in publishing such studies to heed the recommendations of experts (Antonakis et al., 2010; Kennedy, 2008) on how these biases can be controlled.
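One remedy that these sources discuss is instrumental-variable estimation; the following minimal sketch (our own, with simulated data and a hypothetical instrument) shows how two-stage least squares can recover the causal effect of an endogenous predictor that naive regression overstates, provided a valid instrument is available.

# A minimal two-stage least squares sketch. The instrument, variable names, and data
# are hypothetical; a credible application stands or falls with the quality of the
# instrument (relevance and the exclusion restriction).
import numpy as np

rng = np.random.default_rng(5)
n = 2000
u = rng.normal(0, 1, n)                       # unobserved confound
z = rng.normal(0, 1, n)                       # instrument: related to x, unrelated to the error
x = 0.8 * z + u + rng.normal(0, 1, n)         # endogenous predictor (e.g., a leadership score)
y = 1.0 * x + u + rng.normal(0, 1, n)         # outcome; the true effect of x is 1.0

def ols_slope(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

print("Naive OLS slope (biased upward):  %.2f" % ols_slope(x, y)[1])

# Stage 1: regress the endogenous predictor on the instrument and keep the fitted values.
x_hat = np.column_stack([np.ones(n), z]) @ ols_slope(z, x)
# Stage 2: regress the outcome on the fitted values from stage 1.
print("Two-stage least squares slope:    %.2f" % ols_slope(x_hat, y)[1])
# (Standard errors from this naive second stage are wrong; dedicated IV routines
# should be used in practice.)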

Limitations of quasi-experiments Despite the potential benefits of quasi-experiments, researchers are likely to encounter certain limitations when using these designs. For example, because many of the variables that management researchers are interested in manipulating (e.g., leadership behaviors, incentive systems, organizational or job characteristics etc.) are likely to affect the actions of employees, organizations may be unwilling to participate in the research unless they are convinced that the outcomes will be positive. Furthermore, some managers may be reluctant to have researchers figuratively “looking over their shoulders” and scrutinizing the effectiveness of their actions. These factors not only make it difficult to gain access to organizations, they also mean that more time, effort, and planning may be required to conduct quasi-experiments than laboratory experiments. Aside from these practical considerations, researchers conducting quasi-experiments face other, design-related challenges. Foremost among them, quasi-experiments do not allow for random assignment, which raises the possibility that pre-existing differences between groups, or selection biases, may (at least in part) account for the observed results. In addition, quasi-experimental designs are more susceptible to the potential deleterious effects of endogeneity biases (Antonakis et al., 2010). Third, researchers typically exercise considerably less control over the treatment condition(s) and extraneous variables in quasi-experiments compared with laboratory or field experiments. This is illustrated by Grant and Hofmann's (2011, Study 1) inability to control the specific content of the ideological messages presented by the three different sources, which raises obvious questions about the construct validity (and equivalence) of the manipulations in the study. Researchers' lack of control over the potential effects of other, extraneous variables is also illustrated by the fact that in all the quasi-experiments we discussed, the authors found it necessary to explain why their findings could not be accounted for by confounding variables. When taken together, these limitations tend to decrease

(footnote continued) Grant and Hofmann's (2011, Study 1) is an example of this type of study.


generalizability of field and laboratory research findings. American Psychologist, 35, 463–464. Bauman, C. W., Tost, L. P., & Ong, M. (2016). Blame the shepherd not the sheep: Imitating higher-ranking transgressors mitigates punishment for unethical behavior. Organizational Behavior and Human Decision Processes, 137, 123–141. Baumeister, R. F., Vohs, K. D., & Funder, D. C. (2007). Psychology as the science of selfreports and finger movements: Whatever happened to actual behavior? Perspectives on Psychological Science, 2, 396–403. Bendahan, S., Zehnder, C., Pralong, F. P., & Antonakis, J. (2015). Leader corruption depends on power and testosterone. The Leadership Quarterly, 26, 101–122. Berkowitz, L., & Donnerstein, E. (1982). External validity is more than skin deep: Some answers to criticisms of laboratory experiments. American Psychologist, 37, 245–257. Bickman, L., & Rog, D. J. (2009). Applied research design: A practical approach. In L. Bickman, & D. J. Rog (Eds.). The Sage handbook of applied social research methods (pp. 3–43). (2nd ed.). Thousand Oaks, CA: Sage. Borsboom, D. (2009). Measuring the mind: Conceptual issues in contemporary psychometrics. Cambridge, England: Cambridge University Press. Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071. Boruch, R. F., Weisburd, D., Turner, M. T., III, Karpyn, A., & Littell, J. (2009). Randomized controlled trials for evaluation and planning. In L. Bickman, & D. J. Rog (Eds.). The Sage handbook of applied social research methods (pp. 147–181). Thousand Oaks, CA: Sage. Brown, D. J., & Lord, R. G. (1999). The utility of experimental research in the study of transformational/charismatic leadership. The Leadership Quarterly, 10, 531–539. Camerer, C. F. (2015). The promise and success of lab-field generalizability in experimental economics: A critical reply to Levitt and List. In G. Fréchette, & A. Schotter (Eds.). Handbook of experimental economic methodology (pp. 249–295). Oxford, UK: Oxford University Press. Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54, 297–312. Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago, IL: Rand McNally. Campbell, J. P. (1986). Labs, fields, and straw issues. In E. A. Locke (Ed.). Generalizing from laboratory to field settings (pp. 269–279). Lexington, MA: Heath. Cappelleri, J. C., & Trochim, W. M. (2015). Regression discontinuity design. International encyclopedia of the social & behavioral sciences. Vol. 20. International encyclopedia of the social behavioral sciences (pp. 152–159). Chatterji, A. K., Findley, M., Jensen, N. M., Meier, S., & Nielson, D. (2016). Field experiments in strategy research. Strategic Management Journal, 37, 116–132. Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405. Churchill, G. A., Jr. (1979). A paradigm for developing better measures of marketing constructs. Journal of Marketing Research, 16, 64–73. Colquitt, J. A. (2008). From the editors: Publishing laboratory research in AMJ: A question of when, not if. Academy of Management Journal, 51, 616–620. Compeau, D., Marcolin, B., Kelley, H., & Higgins, C. (2012). Research commentary—Generalizability of information systems research using student subjects — A reflection on our practices and recommendations for future research. Information Systems Research, 23, 1093–1109. Connelly, B. 
S., Sackett, P. R., & Waters, S. D. (2013). Balancing treatment and control groups in quasi-experiments: An introduction to propensity scoring. Personnel Psychology, 66, 407–442. Cook, T. D., & Campbell, D. T. (1976). The design and conduct of quasi-experiments and true experiments in field settings. In M. Dunnette (Ed.). Handbook of industrial and organizational psychology (pp. 223–326). Skokie, IL: Rand McNally. Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Boston, MA: Houghton Mifflin Company. Cooper, C. A., McCord, D. M., & Socha, A. (2011). Evaluating the college sophomore problem: The case of personality and politics. Journal of Psychology, 145, 23–37. Cooper, W. H., & Richardson, A. J. (1986). Unfair comparisons. Journal of Applied Psychology, 71, 179–184. Crano, W. D., Brewer, M. B., & Lac, A. (2015). Principles and methods of social research (3rd ed.). New York, NY: Routledge. de Mel, S., McKenzie, D., & Woodruff, C. (2008). Returns to capital in microenterprises: Evidence from a field experiment. Quarterly Journal of Economics, 123, 1329–1372. DeRue, D. S., Nahrgang, J. D., Hollenbeck, J. R., & Workman, K. (2012). A quasi-experimental study of after-event reviews and leadership development. Journal of Applied Psychology, 97, 997–1015. DeRue, D. S., Nahrgang, J. D., Wellman, N., & Humphrey, S. E. (2011). Trait and behavioral theories of leadership: An integration and meta-analytic test of their relative validity. Personnel Psychology, 64, 7–52. Detert, J. R., Trevino, L. K., Burris, E. R., & Andiappan, M. (2007). Managerial modes of influence and counterproductivity in organizations: A longitudinal business-unitlevel investigation. Journal of Applied Psychology, 92, 993–1005. deVaus, D. (2001). Research design in social research. Thousand Oaks, CA: Sage. Dipboye, R. L., & Flanagan, M. F. (1979). Are findings from the field more generalizable than in the laboratory? American Psychologist, 34, 141–150. Dobbins, G. H., Lane, I. M., & Steiner, D. D. (1988). A note on the role of laboratory methodologies in applied behavioural research: Don't throw out the baby with the bath water. Journal of Organizational Behavior, 9, 281–286. Doci, E., & Hofmans, J. (2015). Task complexity and transformational leadership: The mediating role of leaders' state core self-evaluations. The Leadership Quarterly, 26, 436–447. Dvir, T., Eden, D., Avolio, B. J., & Shamir, B. (2002). Impact of transformational leadership on follower development and performance: A field experiment. Academy of Management Journal, 45, 735–744. Eden, D. (1992). Leadership and expectations: Pygmalion effects and other self-fulfilling prophecies in organizations. The Leadership Quarterly, 3, 271–305. Eden, D. (2003). Self-fulfilling prophecies in organizations. In J. Greenberg (Ed.).

Concluding remarks Although there is evidence of renewed interest in the use of experimental designs in management and leadership research (e.g., Anderson & Edwards, 2015; Antonakis, 2017; Colquitt, 2008; Van Witteloostuijn, 2015; Zellmer-Bruhn et al., 2016), they are still relatively underutilized. The strength of experimental designs is that they provide strong evidence of causal relationships between independent and dependent variables. We therefore encourage leadership researchers to include laboratory experiments, field experiments, and quasi-experiments in their methodological toolkit. Although we have not addressed all of the issues associated with this important topic, we have hopefully provided leadership researchers interested in using experimental designs with some valuable suggestions for improving their research. Like Ariely (2010, pp. 292–293), we believe that the knowledge gained from experimental studies is important for both leadership scholars and practitioners alike: The importance of experiments as one of the best ways to learn what really works and what does not seems incontrovertible. I don't see anyone wanting to abolish scientific experiments in favor of relying more heavily on gut feelings or intuitions. But, I'm surprised that the importance of experiments isn't recognized more broadly, especially when it comes to important decisions in business or public policy. Frankly, I am often amazed by the audacity of the assumptions that businesspeople and politicians make, coupled with their seemingly unlimited conviction that their intuition is correct…But politicians and businesspeople are just people, with the same decision biases we all have, and the types of decisions they make are just as susceptible to errors in judgment as medical decisions. So shouldn't it be clear that the need for systematic experiments in business and policy is just as great? Acknowledgements Philip M. Podsakoff gratefully acknowledges the support provided by the Hyatt and Cici Brown Chair in Business. References Acton, J. E. E. D. A., & Himmelfarb, G. (1948). Essays on freedom and power. Boston, MA: Beacon Press. Allen, T. D., & Rush, M. C. (1998). The effects of organizational citizenship behavior on performance judgments: A field study and a laboratory experiment. Journal of Applied Psychology, 83, 247–260. Anderson, C. A., Lindsay, J. J., & Bushman, B. J. (1999). Research in the psychological laboratory: Truth or triviality? Current Directions in Psychological Science, 8, 3–9. Anderson, D. M., & Edwards, B. C. (2015). Unfulfilled promise: Laboratory experiments in public management research. Public Management Review, 17, 1518–1542. Antonakis, J. (2017). On doing better science: From thrill of discovery to policy implications. The Leadership Quarterly, 28, 5–21. Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making causal claims: A review and recommendations. The Leadership Quarterly, 21, 1086–1120. Ariely, D. (2010). The upside of irrationality: The unexpected benefits of defined logic at work and at home. New York, NY: HarperCollins. Aronson, E., Brewer, M., & Carlsmith, J. M. (1985). Experimentation in social psychology. In G. Lindzey, & E. Aronson (Vol. Eds.), Handbook of social psychology(3rd ed.). Vol. 1. Handbook of social psychology (pp. 441–486). New York, NY: Random House. Aronson, E., & Carlsmith, J. M. (1968). Experimentation in social psychology. In G. Lindzey, & E. Aronson (Vol. Eds.), The handbook of social psychology. Vol. 2. 
The handbook of social psychology (pp. 1–79). Reading, MA: Addison - Wesley. Austin, J. T., Scherbaum, C. A., & Mahlman, R. A. (2002). History of research methods in industrial and organizational psychology: Measurement, design, analysis. In S. G. Rogelberg (Ed.). Handbook of research methods in industrial and organizational psychology (pp. 1–33). Malden, MA: Blackwell. Avey, J. B., Avolio, B. J., & Luthans, F. (2011). Experimentally analyzing the impact of leader positivity on follower positivity and performance. The Leadership Quarterly, 22, 282–294. Babbie, E. (2014). The practice of social research (14th ed.). Boston, MA: Cengage Learning. Bandiera, O., Barankay, I., & Rasul, I. (2007). Incentives for managers and inequality among workers: Evidence from a firm level experiment. Quarterly Journal of Economics, 122, 729–774. Bandiera, O., Barankay, I., & Rasul, I. (2011). Field experiments with firms. Journal of Economic Perspective, 25, 63–82. Bass, A. R., & Firestone, I. J. (1980). Implications of representativeness for


Organizational behavior: The state of the science (pp. 91–122). (2nd ed.). Mahwah, NJ: Lawrence Erlbaum. Eden, D. (2017). Field experiments in organizations. Annual Review of Organizational Psychology and Organizational Behavior, 4, 91–122. Eden, D., Stone-Romero, E. F., & Rothstein, H. R. (2015). Synthesizing results of multiple randomized experiments to establish causality in mediation testing. Human Resource Management Review, 25, 342–351. Falk, A., & Heckman, J. J. (2009). Lab experiments are a major source of knowledge in the social sciences. Science, 326, 535–538. Fischer, T., Dietz, J., & Antonakis, J. (2017). Leadership process models: A review and synthesis. Journal of Management, 43, 1726–1753. Fisher, C. D. (1984). Laboratory experiments. In T. S. Bateman, & G. R. Ferris (Eds.). Method & analysis in organizational research (pp. 169–185). Reston, VA: Reston Publishing. Gadlin, H., & Ingle, G. (1975). Through the one-way mirror: The limits of experimental self-reflection. American Psychologist, 30, 1003–1009. Giessner, S. R., van Knippenberg, & Sleebos, E. (2009). License to fail? How leader group prototypicality moderates the effects of leader performance on perceptions of leadership effectiveness. The Leadership Quarterly, 20, 434–451. Gordon, M. E., Slade, L. A., & Schmitt, N. (1986). “Science of the sophomore” revisited: From conjecture to empiricism. Academy of Management Review, 11, 191–207. Gordon, M. E., Slade, L. A., & Schmitt, N. (1987). Student guinea pigs: Porcine predictors and particularistic phenomena. Academy of Management Review, 12, 160–163. Grant, A. M., & Hofmann, D. A. (2011). Outsourcing inspiration: The performance effects of ideological messages from leaders and beneficiaries. Organizational Behavior and Human Decision Processes, 116, 173–187. Grant, A. M., & Wall, T. D. (2009). The neglected science and art of quasi-experimentation: Why-to, when-to, and how-to advice for organizational researchers. Organizational Research Methods, 12, 653–686. Greenberg, J. (1987). The college sophomore as guinea pig: Setting the record straight. Academy of Management Review, 12, 157–159. Greenberg, J., & Tomlinson, E. C. (2004). Situated experiments in organizations: Transplanting the lab to the field. Journal of Management, 30, 703–724. Griffin, R., & Kacmar, K. M. (1991). Laboratory research in management: Misconceptions and missed opportunities. Journal of Organizational Behavior, 12, 301–311. Gu, X. S., & Rosenbaum, P. R. (1993). Comparison of multivariate matching methods: Structures, distances, and algorithms. Journal of Computational and Graphical Statistics, 2, 405–420. Harder, V. S., Stuart, E. A., & Anthony, J. C. (2010). Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. Psychological Methods, 15, 234–249. Harré, R., & Secord, P. F. (1972). The explanation of social behavior. Oxford, UK: Blackwell. Harrison, G. W., & List, J. A. (2004). Field experiments. Journal of Economic Literature, 42, 1009–1055. Hauser, O. P., Linos, E., & Rogers, T. (2017). Innovation with field experiments: Studying organizational behaviors in actual organizations. In A. P. Brief, & B. M. Staw (Vol. Eds.), Research in organziational behavior. 37. Research in organziational behavior (pp. 185–198). Heath, C., & Sitkin, S. B. (2001). Big-B versus Big-O: What is organizational about organizational behavior? Journal of Organizational Behavior, 22, 43–58. Henry, P. J. (2008). 
College sophomores in the laboratory redux: Influences of a narrow data base on social psychology's view of the nature of prejudice. Psychological Inquiry, 19, 49–71. Henshel, R. L. (1980). The purposes of laboratory experimentation and the virtues of deliberate artificiality. Journal of Experimental Social Psychology, 16, 466–478. Hertwig, R., & Ortmann, A. (2001). Experimental practices in economics: A methodological challenge for psychologists? Behavioral and Brain Sciences, 24, 383–403. Hertwig, R., & Ortmann, A. (2008). Deception in experiments: Revisiting the arguments in its defense. Ethics & Behavior, 18, 59–92. Highhouse, S. (2009). Designing experiments that generalize. Organizational Research Methods, 12, 554–566. Holmes, W. M. (2014). Using propensity scores in quasi-experimental designs. Thousand Oaks, CA: Sage. Howell, J. M., & Frost, P. J. (1989). A laboratory study of charismatic leadership. Organizational Behavior and Human Decision Processes, 43, 243–269. Hui, C., Lam, S. S. K., & Schaubroeck, J. (2001). Can good citizens lead the way in providing quality service? A field quasi-experiment. Academy of Management Journal, 44, 988–995. Hunt, J. G., Boal, K. B., & Dodge, G. E. (1999). The effects of visionary and crisis-responsive charisma on followers: An experimental examination of two kinds of charismatic leadership. The Leadership Quarterly, 10, 423–448. Ilgen, D. R. (1986). Laboratory research: A question of when, not if. In E. Locke (Ed.). Generalizing from laboratory to field settings (pp. 257–268). Lexington, MA: Lexington Books. James, L. R. (1980). The unmeasured variables problem in path-analysis. Journal of Applied Psychology, 65, 415–421. Jamison, J., Karlan, D., & Schechter, L. (2008). To deceive or not to deceive: The effect of deception on behavior in future laboratory experiments. Journal of Economic Behavior & Organization, 68, 477–488. Johnson, M. D., Hollenbeck, J. R., DeRue, D. S., Barnes, C. M., & Jundt, D. (2013). Functional versus dysfunctional team change: Problem diagnosis and structural feedback for self-managed teams. Organizational Behavior and Human Decision Processes, 122, 1–11. Jones, R. A. (1985). Research methods in the social and behavioral sciences. Sunderland, MA: Sinauer Associates. Judd, C. M., Kenny, D. A., & McClelland, G. H. (2001). Estimating and testing mediation and moderation in within-subject designs. Psychological Methods, 6, 115–134. Judge, T. A., Bono, J. E., Ilies, R., & Gerhardt, M. W. (2002). Personality and leadership: A qualitative and quantitative review. Journal of Applied Psychology, 87, 765–780. Judge, T. A., Colbert, A. E., & Ilies, R. (2004). Intelligence and leadership: A quantitative review and test of theoretical propositions. Journal of Applied Psychology, 89, 542–552. Judge, T. A., & Piccolo, R. F. (2004). Transformational and transactional leadership: A meta-analytic test of their relative validity. Journal of Applied Psychology, 89, 755–768. Kardes, F. R. (1996). In defense of experimental consumer psychology. Journal of Consumer Psychology, 5, 279–296. Kelley, T. L. (1927). Interpretation of educational measurements. New York, NY: Oxford University Press. Kennedy, P. (2008). A guide to econometrics (6th ed.). Malden, MA: Blackwell. Kenny, D. A. (1979). Correlation and causality. New York, NY: John Wiley & Sons. Kenny, D. A. (2008). Reflections on mediation. Organizational Research Methods, 11, 353–358. Kidd, R. F. (1976). Manipulation checks: Advantage or disadvantage. Representative Research in Social Psychology, 7, 160–165. Kingstone, A., Smilek, D., Ristic, J., Friesen, C. K., & Eastwood, J. D. (2003). Attention, researchers! It is time to take a look at the real world. Current Directions in Psychological Science, 12, 176–180. Kruglanski, A. W. (1975). The human subject in the psychology experiment: Fact and artifact. In L. Berkowitz (Vol. Ed.), Advances in experimental social psychology. Vol. 8. Advances in experimental social psychology (pp. 101–147). New York, NY: Academic Press. Lam, S. S. K., & Schaubroeck, J. (2000). A field experiment testing frontline opinion leaders as change agents. Journal of Applied Psychology, 85, 987–995. Landy, F. J., & Bates, F. (1973). Another look at contrast effects in the employment interview. Journal of Applied Psychology, 58, 141–144. Latham, G. P., Erez, M., & Locke, E. A. (1988). Resolving scientific disputes by the joint design of crucial experiments by the antagonists: Application to the Erez-Latham dispute regarding participation in goal setting. Journal of Applied Psychology, 73, 753–772. Li, M. (2013). Using the propensity score method to estimate causal effects: A review and practical guide. Organizational Research Methods, 16, 188–226. Li, N., Zheng, X., Harris, T. B., Liu, X., & Kirkman, B. L. (2016). Recognizing “Me” benefits “We”: Investigating the positive spillover effects of formal individual recognition in teams. Journal of Applied Psychology, 101, 925–939. Liang, L. H., Lian, H., Brown, D. J., Ferris, D. L., Hanig, S., & Keeping, L. M. (2016). Why are abusive supervisors abusive? A dual-system self-control model. Academy of Management Journal, 59, 1385–1406. Locke, E. A. (1986). Generalizing from laboratory to field: Ecological validity or abstraction of essential elements? In E. A. Locke (Ed.). Generalizing from laboratory to field settings (pp. 257–267). Lexington, MA: Heath. Lonati, S., Quiroga, B. F., Zehnder, C., & Antonakis, J. (2018). On doing relevant and rigorous experiments: Review and recommendations. Journal of Operations Management. https://doi.org/10.1016/j.jom.2018.10.003. Lucas, J. W. (2003). Theory-testing, generalization, and the problem of external validity. Sociological Theory, 21, 236–253. Lynch, J. G., Jr. (1982). On the external validity of experiments in consumer research. Journal of Consumer Research, 9, 225–239. MacKenzie, S. B. (2003). The dangers of poor construct conceptualization. Journal of the Academy of Marketing Science, 31, 323–326.
MacKenzie, S. B., Podsakoff, P. M., & Podsakoff, N. P. (2011). Construct measurement and validation procedures in MIS and behavioral research: Integrating new and existing techniques. MIS Quarterly, 35, 293–334. MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. New York, NY: Lawrence Erlbaum. MacKinnon, D. P., Fairchild, A. J., & Fritz, M. S. (2007). Mediation analysis. Annual Review of Psychology, 58, 593–614. Mark, M. M., & Reichardt, C. S. (2009). Quasi-experimentation. In L. Bickman, & D. J. Rog (Eds.). The Sage handbook of applied social research methods (pp. 182–213). Los Angeles, CA: Sage. Martin, S. L., Liao, H., & Campbell, E. M. (2013). Directive versus empowering leadership: A field experiment comparing impacts on task proficiency and proactivity. Academy of Management Journal, 56, 1372–1395. Mathieu, J. E., Hollenbeck, J. R., van Knippenberg, D., & Ilgen, D. R. (2017). A century of work teams in the Journal of Applied Psychology. Journal of Applied Psychology, 102, 452–467. Maynes, T. D., & Podsakoff, P. M. (2014). Speaking more broadly: An examination of the nature, antecedents, and consequences of an expanded set of employee voice behaviors. Journal of Applied Psychology, 99, 87–112. McCambridge, J., de Bruin, M., & Witton, J. (2012). The effects of demand characteristics on research participant behaviours in non-laboratory settings: A systematic review. PLoS One, 7, 1–6. McNemar, Q. (1946). Opinion-attitude methodology. Psychological Bulletin, 43, 289–374. Mitchell, G. (2012). Revisiting truth or triviality: The external validity of research in the psychological laboratory. Perspectives on Psychological Science, 7, 109–117. Mitchell, M. S., Vogel, R. M., & Folger, R. (2015). Third parties' reactions to the abusive supervision of coworkers. Journal of Applied Psychology, 100, 1040–1055. Mook, D. (1983). In defense of external invalidity. American Psychologist, 38, 379–387. Mueller, J. (2018). Finding new kinds of needles in haystacks: Experimentation in the course of abduction. Academy of Management Discoveries, 4, 103–108. Murphy, K. R., Herr, B. M., Lockhart, M. C., & Maguire, E. (1986). Evaluating the performance of paper people. Journal of Applied Psychology, 71, 654–661. Nahrgang, J. D., DeRue, D. S., Hollenbeck, J. R., Spitzmuller, M., Jundt, D. K., & Ilgen, D. R. (2013). Goal setting in teams: The impact of learning and performance goals on process and performance. Organizational Behavior and Human Decision Processes, 122, 12–21.

Oakes, W. (1972). External validity and the use of real people as subjects. American Psychologist, 27, 959–962. Orne, M. T. (1962). On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17, 776–783. Orne, M. T. (1969). Demand characteristics and the concept of quasi-controls. In R. Rosenthal, & R. L. Rosnow (Eds.). Artifact in behavioral research (pp. 147–179). New York, NY: Academic Press. Ortmann, A., & Hertwig, R. (2002). The costs of deception: Evidence from psychology. Experimental Economics, 5, 111–131. Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press. Perdue, B. C., & Summers, J. O. (1986). Checking the success of manipulations in marketing experiments. Journal of Marketing Research, 23, 317–326. Peterson, R. A. (2001). On the use of college students in social science research: Insights from a second-order meta-analysis. Journal of Consumer Research, 28, 450–461. Pfeffer, J., & Sutton, R. I. (2006). Hard facts, dangerous half-truths and total nonsense. Boston, MA: Harvard University Press. Piccolo, R. F., Bono, J. E., Heinitz, K., Rowold, J., Duehr, E., & Judge, T. A. (2012). The relative impact of complementary leader behaviors: Which matter most? The Leadership Quarterly, 23, 567–581. Podsakoff, N. P., Podsakoff, P. M., MacKenzie, S. B., & Klinger, R. L. (2013). Are we really measuring what we say we're measuring? Using video techniques to supplement traditional construct validation procedures. Journal of Applied Psychology, 98, 99–113. Podsakoff, N. P., Whiting, S. W., Podsakoff, P. M., & Mishra, P. (2011). Effects of organizational citizenship behaviors on selection decisions in employment interviews. Journal of Applied Psychology, 96, 310–326. Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903. Podsakoff, P. M., MacKenzie, S. B., Moorman, R., & Fetter, R. (1990). The impact of transformational leader behaviors on employee trust, satisfaction, and organizational citizenship behaviors. The Leadership Quarterly, 1, 107–142. Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2012). Sources of method bias in social science research and recommendations on how to control it. Annual Review of Psychology, 63, 539–569. Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2016). Recommendations for creating better concept definitions in the organizational, behavioral, and social sciences. Organizational Research Methods, 19, 159–203. Podsakoff, P. M., & Schriesheim, C. A. (1985). Field studies of French and Raven's bases of social power: Reanalysis, critique, and suggestions for future research. Psychological Bulletin, 97, 387–411. Postman, L. (1955). The probability approach and nomothetic theory. Psychological Review, 62, 218–225. Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55. Rosenthal, R. (1967). Covert communication in the psychological experiment. Psychological Bulletin, 67, 356–367. Rosenthal, R., & Rosnow, R. L. (1969). The volunteer subject. In R. Rosenthal, & R. L. Rosnow (Eds.). Artifact in behavioral research (pp. 41–92). New York, NY: Academic Press. Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research: Methods and data analysis (2nd ed.). New York, NY: McGraw-Hill. Rousseau, D. M. (2012). Envisioning evidence-based management. In D. M. Rousseau (Ed.). The Oxford handbook of evidence-based management (pp. 3–23). New York, NY: Oxford University Press. Rynes, S. L., & Bartunek, J. M. (2017). Evidence-based management: Foundations, development, controversies and future. Annual Review of Organizational Psychology and Organizational Behavior, 4, 235–261. Sawyer, A. G. (1975). Demand artifacts in laboratory experiments in consumer research. Journal of Consumer Research, 1, 20–30. Scandura, T. A., & Williams, E. A. (2000). Research methodology in management: Current practices, trends, and implications for future research. Academy of Management Journal, 43, 1248–1264. Schaubroeck, J., Lam, S. S. K., & Cha, S. E. (2007). Embracing transformational leadership: Team values and the impact of leader behavior on team performance. Journal of Applied Psychology, 92, 1020–1030. Schriesheim, C. A., House, R. J., & Kerr, S. (1976). Leader initiating structure: A reconciliation of discrepant research results and some empirical tests. Organizational Behavior and Human Performance, 16, 297–321. Schriesheim, C. A., & Stogdill, R. M. (1975). Differences in factor structure across three versions of the Ohio State leadership scales. Personnel Psychology, 28, 189–206. Schultz, D. P. (1969). Human subjects in psychological research. Psychological Bulletin, 72, 214–228. Schwab, D. P. (1980). Construct validity in organizational behavior. In L. L. Cummings, & B. Staw (Vol. Eds.), Research in organizational behavior. Vol. 2. Research in organizational behavior (pp. 3–43). Greenwich, CT: JAI Press. Schwab, D. P. (2005). Research methods for organizational studies (2nd ed.). Mahwah, NJ: Lawrence Erlbaum. Schwenk, C. R. (1982). Why sacrifice rigour for relevance? A proposal for combining laboratory and field research in strategic management. Strategic Management Journal, 3, 213–225. Sears, D. O. (1986). College sophomores in the laboratory: Influences of a narrow data base on social psychology's view of human nature. Journal of Personality and Social Psychology, 51, 515. Semadeni, M., Withers, M. C., & Certo, S. T. (2014). The perils of endogeneity and instrumental variables in strategy research: Understanding through simulations. Strategic Management Journal, 35, 1070–1079. Shadish, W. R. (2011). Randomized controlled studies and alternative designs in outcome studies: Challenges and opportunities. Research on Social Work Practice, 21, 636–643. Shadish, W. R., & Cook, T. D. (2009). The renaissance of field experimentation in evaluating interventions. Annual Review of Psychology, 60, 607–629. Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York, NY: Houghton Mifflin. Shearer, B. S. (2004). Piece rates, fixed wages and incentives: Evidence from a field experiment. Review of Economic Studies, 71, 513–534. Shimp, T. A., Hyatt, E. M., & Snyder, D. J. (1991). A critical appraisal of demand artifacts in consumer research. Journal of Consumer Research, 18, 273–283. Sigall, H., & Mills, J. (1998). Measures of independent variables and mediators are useful in social psychology experiments: But are they necessary? Personality and Social Psychology Review, 2, 218–226. Slade, L. A., & Gordon, M. E. (1988). On the virtues of laboratory babies and student bath water: A reply to Dobbins, Lane, and Steiner. Journal of Organizational Behavior, 9, 373–376. Smith, H. (1997). Matching with multiple controls to estimate treatment effects in observational studies. Sociological Methodology, 27, 325–353. Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005). Establishing a causal chain: Why experiments are often more effective than mediational analysis in examining psychological processes. Journal of Personality and Social Psychology, 89, 845–851. Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search. New York, NY: Springer-Verlag. Steffens, N. K., Peters, K., Haslam, S. A., & van Dick, R. (2017). Dying for charisma: Leaders' inspirational appeal increases post-mortem. The Leadership Quarterly, 28, 530–542. Stentz, J. E., Plano Clark, V. L., & Matkin, G. S. (2012). Applying mixed methods to leadership research: A review of current practices. The Leadership Quarterly, 23, 1173–1183. Stone-Romero, E. F. (2002). The relative validity and usefulness of various empirical research designs. In S. G. Rogelberg (Ed.). Handbook of research methods in industrial and organizational psychology (pp. 77–98). Malden, MA: Blackwell. Stone-Romero, E. F., & Rosopa, P. J. (2010). Research design options for testing mediation models and their implications for facets of validity. Journal of Managerial Psychology, 25, 697–712. Stone-Romero, E. F., Weaver, A. E., & Glenar, J. L. (1995). Trends in research design and data analytic strategies in organizational research. Journal of Management, 21, 141–157. Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25, 1–21. Suddaby, R. (2010). Construct clarity in theories of management and organization. Academy of Management Review, 35, 346–357. Taylor, L. A., III, Goodwin, V. L., & Cosier, R. A. (2003). Method myopia: Real or imagined? Journal of Management Inquiry, 12, 255–263. Thyer, B. A. (2012). Quasi-experimental research designs. Oxford, UK: Oxford University Press. Van Knippenberg, D., & Sitkin, S. B. (2013). A critical assessment of charismatic transformational leadership research: Back to the drawing board? Academy of Management Annals, 7, 1–60. Van Witteloostuijn, A. (2015). Toward experimental international business: Unraveling fundamental causal linkages. International Journal of Cross-Cultural Management, 22, 530–544. Vanhove, A. J., & Harms, P. D. (2015). Reconciling the two disciplines of organisational science: A comparison of findings from lab and field research. Applied Psychology: An International Review, 64, 637–673. Weber, S. J., & Cook, T. D. (1972). Subject effects in laboratory research: An examination of subject roles, demand characteristics, and valid inference. Psychological Bulletin, 77, 273–295. Webster, M., Jr., & Sell, J. (2014). Why do experiments? In M. Webster, Jr., & J. Sell (Eds.). Laboratory experiments in the social sciences (pp. 5–21). (2nd ed.). London, UK: Elsevier. Wetzel, C. G. (1977). Manipulation checks: A reply to Kidd. Representative Research in Social Psychology, 8, 88–93. Wofford, J. C. (1999). Laboratory research on charismatic leadership: Fruitful or futile? The Leadership Quarterly, 10, 523–529. Zelditch, M. (1969). Can you really study an army in the laboratory? In A. Etzioni, & E. Lehman (Eds.). A sociological reader on complex organizations (pp. 528–539). New York, NY: Holt, Rinehart and Winston. Zellmer-Bruhn, M., Caligiuri, P., & Thomas, D. C. (2016). From the editors: Experimental designs in international business research. Journal of International Business Studies, 47, 399–407. Zhang, X., & Bartol, K. M. (2010). Linking empowering leadership and employee creativity: The influence of psychological empowerment, intrinsic motivation, and creative process engagement. Academy of Management Journal, 53, 107–128.
