Experimental designs in management and leadership research: Strengths, limitations, and recommendations for improving publishability


Philip M. Podsakoff a,⁎, Nathan P. Podsakoff b

a Department of Marketing, Warrington College of Business Administration, University of Florida, Gainesville, FL 32611, United States of America
b Department of Management and Organizations, Eller College of Management, University of Arizona, Tucson, AZ 85720-1080, United States of America

Keywords: Experimental designs; Laboratory experiments; Field experiments; Quasi-experiments; Causal inference

Abstract

Despite the renewed interest in the use of experimental designs in the fields of leadership and management over the past few decades, these designs are still relatively underutilized. Although there are several potential reasons for this, chief among them is misunderstanding the value of these designs. The purpose of this article is to review the role of laboratory, field, and quasi-experimental designs in management and leadership research. We first discuss the primary goals of experimental studies. Next, we examine the characteristics of experimental designs and how to distinguish laboratory, field, and quasi-experiments from one another and from non-experimental studies. Following these discussions, we provide examples of each type of experimental design and discuss their relative strengths and limitations. Finally, we discuss steps that researchers can take to increase the probability of having articles reporting experiments accepted by leadership and management journals.

We consider the … experiment to be the core research method … In advocating the experimental method, we are taking it as axiomatic that the purpose for which this method is best suited is that of testing theory rather than describing the world as it is. Without doubt, for descriptive and exploratory purposes, there are alternative models of systematic observation and data collection that can better serve the needs of the researcher. However, for subjecting theory-inspired hypotheses about causal relationships to potential confirmation or disconfirmation, the experiment is unexcelled in its ability to provide unambiguous evidence about causation, to permit control over extraneous variables, and to allow for analytic exploration of the dimensions and parameters of a complex phenomenon. (Aronson, Brewer, & Carlsmith, 1985, p. 443)

Introduction

If the growing percentage of articles published in industrial/organizational (I/O) psychology and management over the past few decades

is any indication, then there is renewed interest in the use of experimental designs. Several authors (Austin, Scherbaum, & Mahlman, 2002; Colquitt, 2008; Griffin & Kacmar, 1991; Scandura & Williams, 2000; Stone-Romero, Weaver, & Glenar, 1995; Taylor, Goodwin, & Cosier, 2003) have chronicled the downward trend in publication rates of experimental studies, particularly those conducted in laboratory settings, from the 1960s through the late 1990s. However, our examination of recent leadership, management, and I/O psychology publications suggests that this trend may be reversing.1 Indeed, the results of our review indicated that although the percentage of articles using laboratory, field, or quasi-experimental designs remained relatively stable between 1990 and 2009 at around 7%, it increased to 8% during 2010–2014, and then to almost 11.5% during 2015–2018, with the vast majority of the experiments (about 83%) conducted in the laboratory.

One obvious reason for the use of experimental designs is their ability to provide evidence of causality (Antonakis, Bendahan, Jacquart, & Lalive, 2010; Campbell & Stanley, 1963; Colquitt, 2008; Falk & Heckman, 2009). Indeed, as indicated by the quotation at the beginning of this article, the power of experiments to establish cause-and-effect relationships is critical to the development of knowledge in



⁎ Corresponding author.
1 We searched the Academy of Management Journal, Administrative Science Quarterly, Journal of Applied Psychology, Journal of Organizational Behavior, Leadership Quarterly, Journal of Management, and Personnel Psychology, using the key words “experiment,” “experiments,” “experimental,” “laboratory experiment,” “quasi-experiment,” “quasi-experiments,” “field experiment,” and “field experiments.” We excluded articles that were not empirical in nature; using the remaining articles, we calculated the percentage of articles that were experimental in nature in six time periods (1990–1994, 1995–1999, 2000–2004, 2005–2009, 2010–2014, and 2015–March 2018).
https://doi.org/10.1016/j.leaqua.2018.11.002
Received 14 June 2018; Received in revised form 28 October 2018; Accepted 5 November 2018
1048-9843/© 2018 Published by Elsevier Inc.



encouraging, there remains a need for a more comprehensive discussion of the strengths and potential limitations of experimental designs. Therefore, this article is intended to provide an integrative review of the role of experiments in management and leadership research. First, we discuss the basic goals of experimental designs and identify the conditions necessary to establish causal relationships. Next, we: (a) examine the characteristics of experiments and discuss how to distinguish between laboratory, field, and quasi-experimental designs, (b) provide examples of each type from the leadership literature, and (c) discuss the relative strengths and limitations of each type of experimental design. Finally, we discuss some of the practical issues researchers encounter when using experimental methods in various research settings and provide some recommendations for addressing them. Although this article is intended primarily for doctoral students, our recommendations should also prove worthwhile for any researcher interested in improving the rigor and publishability of their experimental research.

the organizational and behavioral sciences. It is therefore not surprising that Jones (1985, p. 282) argued that experiments are “the most powerful technique[s] available for demonstrating causal relationships between variables,” and other scholars (Antonakis, 2017; Eden, 2017; Hauser, Linos, & Rogers, 2017; Holmes, 2014; Kenny, 1979) have referred to them as the “gold standard” of scientific research. Similarly, several researchers (Aronson et al., 1985; Colquitt, 2008; Fisher, 1984; Ilgen, 1986) have noted the importance of experimental designs for testing theory and helping us to develop a better understanding of the complex world in which we live. Finally, we do not believe that it is coincidental that the growth in the percentage of experimental studies over the past decade follows Colquitt's (2008) call in the Academy of Management Journal (AMJ) for more laboratory experiments. Although Colquitt's editorial was directed at researchers interested in publishing experiments in AMJ, it also served as a signal that other management journals might be interested in experimental research. The recent appearance in management and leadership journals of editorial statements citing Colquitt (2008) and echoing his call for laboratory experiments (cf. Anderson & Edwards, 2015; Antonakis, 2017; Mueller, 2018; Van Witteloostuijn, 2015; Zellmer-Bruhn, Caligiuri, & Thomas, 2016) lends weight to this proposition.

That said, we still believe that experiments (particularly laboratory experiments) are often under-appreciated in management and leadership research. This lack of appreciation stems from the criticisms directed at such designs over the years. First are the criticisms of laboratory research for a presumed lack of realism, use of student subjects, and concerns about external validity (e.g., Colquitt, 2008; Greenberg & Tomlinson, 2004; Ilgen, 1986; Mook, 1983; Taylor et al., 2003). Campbell (1986, p. 276) has captured the essence of these criticisms, noting that “in the minds of its critics, laboratory research is of low quality, experimental in nature, theoretical and esoteric, and rigidly controlled to the point of sterility. Worst of all, it uses students as subjects.” Although Colquitt (2008) noted that researchers may be more likely to subscribe to this view than editors and reviewers, the effect remains the same – fewer papers reporting laboratory experiments are submitted (and subsequently published) in management journals.

Next, we believe that there is a general misunderstanding of some basic characteristics of experimental designs and their subsequent strengths and limitations. For example, it is not uncommon for laboratory experiments to be criticized for their “artificiality” and the amount of control they exercise over extraneous variables (Babbie, 2014), even though this control is among the most important virtues of the method (Henshel, 1980; Mook, 1983; Webster & Sell, 2014). As long as these misunderstandings persist, it is unlikely that the advantages of experimental methods will be fully appreciated.

Third, several researchers (Greenberg & Tomlinson, 2004; Stone-Romero et al., 1995) have noted that it is easier to administer questionnaire surveys in field settings than to conduct experiments in laboratory settings, and that the growth in the use of covariance structure and multilevel analyses has provided researchers with more sophisticated ways of analyzing survey data.

Finally, like Schwenk (1982) and Taylor et al. (2003), we think that many researchers perceive that rigor and relevance are orthogonal concepts and that relevance should take precedence over rigor. This false dichotomy has also been noted by other scholars (Lonati, Quiroga, Zehnder, & Antonakis, 2018). However, misplaced criticism of laboratory experiments leads many researchers to focus their efforts on non-experimental, questionnaire-based research in organizational settings, rather than on experimental studies in more controlled settings.

Of course, we are not implying that experimental methods have been completely neglected in leadership and management. For example, in addition to Colquitt's (2008) call for more laboratory experiments, Grant and Wall's (2009) article on the benefits of quasi-experiments in organizational settings, and the recent articles by Chatterji, Findley, Jensen, Meier, and Nielson (2016), Eden (2017), and Hauser et al. (2017), on the virtues of field experiments have made important contributions to the literature. However, although these articles are

What are the goals of experimental research designs?

The basic goal of experimental research designs is to determine the causal relationships between independent and dependent variable(s). Although there is some debate about what constitutes a causal relationship (e.g., Cheng, 1997; Cook & Campbell, 1979; Pearl, 2000; Spirtes, Glymour, & Scheines, 1993), most organizational and behavioral scientists (cf. Antonakis et al., 2010; Bickman & Rog, 2009; Campbell, 1957; Cook & Campbell, 1979; deVaus, 2001) subscribe to the idea that a cause-effect relationship is established with three criteria: (a) covariation between the independent and dependent variables; (b) temporal precedence, such that variation in the independent variable precedes variation in the dependent variable; and (c) alternative explanations for the observed relationship have been ruled out.

The importance of the second and third criteria for establishing a causal relationship can be illustrated with a simple example. Suppose a researcher hypothesizes that supportive leader behaviors (SLB) increase employees' task performance (TP), and that after gathering data on these variables in an organizational setting the researcher finds that there is a relatively strong positive correlation (r = 0.63) between the measures of these variables. On this basis, the researcher might conclude that SLB causes an improvement in TP. However, the correlation can be explained by a variety of alternate causal relationships between these variables, which are illustrated in Fig. 1. First, as indicated in Panel 1, it is possible that the observed correlation between SLB and employees' TP supports the hypothesis that SLB cause employees to perform better. Second, as indicated in Panel 2, it is also possible that this correlation reflects the fact that leaders are more supportive of employees who perform well. In other words, high employee TP causes leaders to exhibit more SLB. A third possibility (illustrated in Panel 3) is that SLB and employee TP are reciprocally related: supportive leaders elicit better performance from their employees and this high performance is reinforced by yet more support from leaders. Of course, it is also possible that the observed correlation between SLB and employee TP is spurious and due to a third (confounding) variable. For example, it is possible that the organization's reward system causes SLB and employee TP to covary, although they are not causally related. This spurious relationship is shown in Panel 4. Finally, as illustrated in Panel 5, the correlation between SLB and TP may be moderated by another variable, such that the relationship is positive at one level of the moderator and weaker, non-existent, or negative at another level of the moderator.

The role of experimental designs in minimizing threats to internal validity

Experimental research designs are important because they minimize threats to internal validity. Internal validity is the confidence a

The Leadership Quarterly xxx (xxxx) xxx–xxx

P.M. Podsakoff, N.P. Podsakoff

Fig. 1. Possible causal relationships between supportive leader behaviors (SLB) and employee task performance (TP).
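To make the spurious-relationship case in Panel 4 of Fig. 1 concrete, the short simulation below is our own illustrative sketch (the variable names, coefficients, and the use of Python/NumPy are assumptions, not material from the article). It generates SLB and TP scores that are both driven by a single confounder (the organization's reward system) and by nothing else; the two measures correlate at roughly r = 0.63 even though neither causes the other, and the association disappears once the confounder is removed.

```python
# Illustrative sketch of Panel 4 in Fig. 1: a confounder produces a spurious
# SLB-TP correlation. All names and effect sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

reward_system = rng.normal(size=n)                  # unmeasured confounder
slb = 1.3 * reward_system + rng.normal(size=n)      # SLB depends only on rewards
tp = 1.3 * reward_system + rng.normal(size=n)       # TP depends only on rewards

r_observed = np.corrcoef(slb, tp)[0, 1]
print(f"Observed SLB-TP correlation: {r_observed:.2f}")            # ~0.63

# Because the data were simulated, the confounder can be stripped out exactly;
# what remains of SLB and TP is uncorrelated.
r_deconfounded = np.corrcoef(slb - 1.3 * reward_system,
                             tp - 1.3 * reward_system)[0, 1]
print(f"Correlation after removing the confounder: {r_deconfounded:.2f}")  # ~0.00
```

A non-experimental study that measured only SLB and TP could not distinguish this scenario from Panels 1–3; random assignment of the independent variable, by contrast, breaks its dependence on the reward system.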

whether an independent variable causes changes in a dependent variable is to manipulate the independent variable quantitatively (e.g., in terms of its magnitude, intensity, or frequency). For example, a leadership researcher might be interested in comparing the effects of high and low levels of autocratic leadership. Such an experiment would require the researcher to define the conceptual domain of autocratic leadership, to operationalize it, to manipulate its level (so one group is exposed to high levels and another group is exposed to low levels), and then to observe the effects on the dependent variable(s) of interest. However, if an experiment includes a more extensive range of values of the independent variable (e.g., low, medium, and high), it is possible to explore curvilinear effects of the independent variable on the dependent variable(s). Yet another possibility is that a researcher is interested in comparing the effects of two qualitatively different types of leadership behavior (e.g., SLB and charismatic leadership behavior) on one or more dependent variables. Such an experiment would require establishing a conceptual distinction between the two forms of leader behavior, followed by the manipulation of these behaviors and comparison of their effects. However, since we are dealing with qualitatively different treatments, the researcher must also establish that the comparison is fair – i.e., the manipulations represent equivalent levels of the respective constructs. According to Cooper and Richardson (1986, p. 179), fair comparisons require that:

researcher has that a change (whether naturally occurring or due to manipulation) in the independent variable causes the observed change in the dependent variable. Although there are a number of confounding variables that may threaten internal validity, the most prominent include selection, history, maturation, testing, instrumentation, regression, mortality and selection by maturation interactions (Campbell & Stanley, 1963; Cook & Campbell, 1976, 1979; Crano, Brewer, & Lac, 2015). Definitions and examples of these threats are provided in Table 1. As we note below, laboratory experiments are particularly well-suited to minimizing threats to internal validity and establishing causal relationships, because participants are randomly assigned to treatments and because these designs offer the researcher a high degree of control over the independent and extraneous variables.

What are the characteristics that differentiate between types of experimental studies and between experimental and non-experimental studies?

Although there are many types of experimental designs (e.g., Eden, 2017; Harrison & List, 2004; Shadish, Cook, & Campbell, 2002), we focus on the three designs most widely used in management and leadership research: laboratory experiments, field experiments, and quasi-experiments. Fig. 2 provides four questions that researchers can use to distinguish between these types of experiments and between experimental and non-experimental designs.

The first question is whether the independent variable is explicitly manipulated or not. In a study designed to determine whether changes in an independent variable have an effect on changes in a dependent variable there should be at least two different treatment conditions. For example, a researcher interested in the simple question of what effect the manipulation of an independent variable (e.g., abusive supervision) has on a dependent variable (e.g., counterproductive employee behavior) might investigate this by comparing a treatment group (exposed to abusive supervision) with a control group that does not receive the treatment. Experiments that compare the effects of the presence or absence of an independent variable using an experimental and a control group may prove particularly useful in the early stages of a research program, when a leadership researcher is trying to determine whether the independent variable has any effect on the outcome(s) of interest.

Another experimental technique that can be used to determine

“the competing theories, factors, or variables are operationalized, manipulated, or measured with equivalent strength. By equivalent strength we mean that: (a) the competing theories, factors, or variables are operationalized, manipulated, or measured with equal care and fidelity (i.e., there is procedural equivalence); and (b) the values taken by the factors or variables vary over equivalent ranges of values in their respective populations (i.e., there is distributional equivalence).”

At a minimum, it would be important to show that the levels of SLB and charismatic leadership behavior are manipulated to be approximately equivalent with respect to the distance from their respective population means. Otherwise, as noted by Cooper and Richardson (1986, p. 179), “When a convincing case for … equivalence cannot be made, and the results favor the theory or construct that was more strongly operationalized … then the possibility that the comparison was


Table 1
Definitions and examples of threats to internal validity.

Selection
Definition: Potential threat due to differences between experimental and control (or comparison) groups that exist prior to the administration of the treatment(s) and may be responsible for the observed effect on the dependent variable(s).
Example: Participants who are selected by an organization for high-potential leadership training may differ from control/comparison group participants who are not selected for this training with respect to conscientiousness, intelligence, and interpersonal effectiveness. To the extent that these pre-existing differences are associated with leadership emergence and effectiveness, the causal effect of the treatment will be less clear.

History
Definition: Potential threat due to the occurrence of an unanticipated event during the experiment that is not part of the experimental treatment; this event may be responsible for the observed effect on the dependent variable(s). Generally speaking, the potential threat of history becomes more problematic as the length of time between the treatment and the measurement of the dependent variable(s) increases.
Example: A researcher examining the effects of a leadership development program will be unsure if the program is responsible for changes in the effectiveness of the leaders if changes in the organization's compensation system take place between the delivery of the program and measurement of the dependent variable(s).

Maturation
Definition: Potential threat due to study participants growing more mature, more experienced, more fatigued, more knowledgeable, older, etc., when these processes are not the treatment of interest. Maturation threats become more problematic as the period of time between the treatment and the measurement of the dependent variable(s) increases.
Example: The maturation of middle-school students who are exposed to a leadership skills development program may make it difficult to tell whether it is the program or the students' natural gains in life experience and intellectual skills that are responsible for the effectiveness of the program.

Testing
Definition: Potential threat due to the fact that pretest measures may sensitize, prime, or otherwise influence subsequent measures of the dependent variable(s).
Example: Participants who are asked to rate their stress level before the implementation of a stress reduction program may monitor indicators of stress more carefully, thus making it difficult to tell whether the stress reduction program, the increased monitoring, or a combination of the two is responsible for any change in the dependent variable(s).

Instrumentation
Definition: Potential threat due to changes in the measurement instrument from the pretest to the posttest that cause changes in the dependent variable; such changes would not be attributable to the treatment.
Example: Observers who are coding leaders' behaviors for a study in which the leaders are trained to be more supportive may change how they categorize the behaviors as they become more experienced or more familiar with the coding system. To the extent that such changes in coding are reflected in the posttest measures, the effects of the treatment will be less certain.

Regression
Definition: Potential threat due to the fact that when participants are assigned to treatment and control/comparison groups on the basis of extreme scores, their posttest scores on the dependent variable(s) may become more moderate (i.e., regress toward their mean).
Example: If poorly performing supervisors are selected to participate in a program in which they receive feedback about their lack of effectiveness and are assigned specific goals, it will be difficult to tell whether any improvement in their performance is due to the feedback and goal-setting program or to regression to the mean level of performance.

Mortality
Definition: Potential threat due to the fact that participants who drop out of an experiment may differ in some meaningful way from those who remain in the experiment.
Example: If a study uses several treatment groups in which leaders are trained to exhibit different leadership styles and the number of leaders who drop out of an “autocratic leader” treatment group is higher than from the other groups because some of the leaders are uncomfortable exhibiting autocratic leader behaviors, it would call into question the finding that leaders trained to exhibit empowering leader behaviors become more effective than the leaders trained to be autocratic.

Selection × maturation interaction
Definition: Potential threat due to differential maturation of the treatment and control or comparison groups with respect to the dependent variable, where the differential maturation is attributable to differences between the groups at the start of the study.
Example: If participants are assigned to a treatment group on the basis of their interpersonal skills, and participants who have better interpersonal skills mature faster than participants with lower interpersonal skills (who will have been assigned to the control or comparison group), it will be less clear whether any change in the dependent variable is due to the treatment or to the initial group differences in interpersonal skills.

assign them (or allow them to assign themselves) to treatment conditions. For example, organizations may assign employees to a training program based on their potential (e.g., leadership development) or deficiencies (e.g., knowledge or skills training programs), or offer an organization-sponsored program on a voluntary basis (e.g., stress management intervention). Although these assignment procedures may seem practical, they can be problematic because they introduce the potential for confounding factors (e.g., demographic characteristics, personality variables, IQ, etc.) to influence the internal validity of the study. One way to address potential confounds is by “matching” participants to each experimental condition with respect to what the researcher considers to be the most prominent confounding variables (Holmes, 2014). For example, a researcher interested in determining which of two leader behavior training programs (leadership empowerment vs. directive leadership) has the greatest impact on leaders' effectiveness might match participants in the treatment conditions on IQ, as intelligence has been shown to be related to leadership emergence and effectiveness (Judge, Colbert, & Ilies, 2004). Unfortunately, however, matching participants on one (or a few) characteristic(s) does not guarantee that all potential confounds are controlled (Holmes, 2014; Shadish et al., 2002). For example, matching trainees on intelligence

unfair must be explicitly addressed when the results are discussed.” Because all of the designs described above expose each participant to a single treatment, they are referred to as between-subjects (or between-participants) designs. It is also possible to conduct a within-person (or within-subjects) experiment in which the participants are exposed to multiple treatments over time. Although within-subject designs have the advantage of reducing the error variance associated with individual differences among the participants and thus increasing statistical power, the fact that exposure to one treatment can confound the effects of subsequent treatments leads many researchers to prefer between-subjects designs. However, regardless of whether a study uses a between- or within-subjects design, it can only be considered experimental if at least one of the independent variables has been manipulated; otherwise, the study is considered to be non-experimental.

The second question used to determine the type of experimental design is: Are participants randomly assigned to conditions? In some experimental studies participants are allowed to remain in the groups in which they naturally reside (e.g., existing work groups in organizations; sports teams; cohorts/classes, etc.). In these studies, the researcher does not control the assignment of the participants to groups, but does control how treatment(s) are assigned to the pre-existing groups. In other studies, the organization for which the participants work may


Fig. 2. Decision tree for classifying experimental research designs.

temperature, lighting, ambient noise, equipment, etc.), psychological characteristics (e.g., cognitive requirements of the task, job stress or strain, work-related distractions, etc.) and social characteristics (e.g., presence of other people, number and type of interactions with others, potential for interpersonal conflict, etc.). The problem is that these factors may affect the dependent variable(s) directly or through interactions with the independent variable(s). Laboratory experiments offer the researcher greater control over extraneous variables than do field experiments, because the researcher can control the independent variable and a host of extraneous physical, psychological, and social factors in a laboratory setting. This is one reason why research conducted in laboratory settings generally has higher expected internal validity than research conducted in field settings, although this depends on the specific level of control exercised on these characteristics in each setting.

Of course, as noted in Fig. 2, it is possible that even in studies where an independent variable has been manipulated, participants may not be assigned randomly to conditions. In such cases, it is important to ask a final set of questions: does the design include a control/comparison group, or multiple observations of a single group of participants? If the design includes a control or comparison group or if multiple measures of the dependent variable are taken before and after the manipulation of the independent variable in a single group, then the study qualifies as a quasi-experiment. If neither is the case, then the study is non-experimental, even if the independent variable has been manipulated. Campbell and Stanley (1963) referred to this kind of design as a pre-experimental design.
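The four questions in Fig. 2 amount to a simple decision procedure. The sketch below expresses that logic in code purely for illustration; the function name, argument names, and the use of Python are our own assumptions rather than anything specified in the article.

```python
def classify_design(iv_manipulated: bool,
                    random_assignment: bool,
                    controlled_setting: bool,
                    control_group_or_repeated_measures: bool) -> str:
    """Classify a study using the four questions from Fig. 2 (illustrative only)."""
    if not iv_manipulated:
        # No manipulation of the independent variable: not an experiment.
        return "non-experimental study"
    if random_assignment:
        # Manipulated IV plus random assignment: laboratory vs. field experiment
        # depends on whether the researcher controls the setting.
        return "laboratory experiment" if controlled_setting else "field experiment"
    if control_group_or_repeated_measures:
        # Manipulated IV, no random assignment, but a control/comparison group
        # or repeated observations of a single group: quasi-experiment.
        return "quasi-experiment"
    # Manipulated IV with none of the above safeguards: pre-experimental design.
    return "pre-experimental (non-experimental) design"


# Example: a treatment rolled out to intact work teams, compared with a
# non-equivalent comparison team, without random assignment.
print(classify_design(iv_manipulated=True, random_assignment=False,
                      controlled_setting=False,
                      control_group_or_repeated_measures=True))
# -> quasi-experiment
```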

will not necessarily control for extraversion, conscientiousness, openness to experience, or other factors which have also been shown to influence perceived leadership effectiveness (Judge, Bono, Ilies, & Gerhardt, 2002). Moreover, the difficulty increases as the number of characteristics on which the participants must be matched increases, as does the number of cases required to select a matched sample (Schwab, 2005). That said, some matching algorithms, such as propensity score matching, do work well (Holmes, 2014). We discuss this approach briefly in the section on quasi-experimental designs.

Unlike the approaches discussed above, random assignment is used in experimental studies to create multiple groups that are presumed to be equivalent in terms of various attributes (e.g., age, gender, personality, race, IQ, etc.). Under random assignment each participant has an equal chance of being assigned to each treatment condition. Schwab (2005, p. 64) noted that the primary advantage of random assignment over other assignment procedures is that it “controls for nuisance variables whether or not researchers are aware of them.” Random assignment is often viewed as a great equalizer, because it increases confidence that all extraneous factors that could influence participants' behaviors are approximately equally distributed across conditions. In a study designed to examine the effects of an independent variable on dependent variables at the individual level, this is accomplished by randomly assigning participants to treatments. However, in studies designed to examine the effects of independent variables on dependent variables at the group level, control is maximized by (a) randomly assigning participants to groups and then (b) randomly assigning groups to treatments. This is particularly important if the composition of the groups (e.g., in terms of demographic characteristics, personality traits, abilities, skills, or other psychological variables) could influence the outcomes of the experiment. In such cases, failing to assign participants to the groups randomly would provide a potential alternative explanation for the observed effects.

Assuming that participants have been assigned to treatment conditions randomly, the third question is: Does the researcher have control over the experimental setting? As Crano et al. (2015) have noted, research settings may differ in a number of ways that affect the level of control a researcher has over extraneous variables. These factors include the setting's physical characteristics (e.g., physical layout,

The importance of manipulation checks in experimental designs

Although random assignment facilitates internal validity by minimizing or eliminating individual differences as an explanation for observed effects, manipulation checks are required to confirm that the treatment conditions have operationalized the independent variable as it has been conceptualized (i.e., the manipulation is construct valid; Cook & Campbell, 1979). Ideally, manipulation checks should be undertaken as part of a pilot study, both to allow the researcher to revise the manipulation before it is used in the primary study (should this


Laboratory experiments

prove necessary), and because manipulation checks made before or after measurement of a dependent variable can prove problematic (cf. Aronson & Carlsmith, 1968; Kidd, 1976; Lonati et al., 2018; Perdue & Summers, 1986; Wetzel, 1977). For example, manipulation checks carried out before measurement of the dependent variable(s) could alert participants to the nature of the study (i.e., serve as demand characteristics), whereas manipulation checks carried out after the measurement of the dependent variable(s) may be ineffective because the effects of the manipulation may have already dissipated or because participants' responses to the manipulation may bias their response to the manipulation check (Kidd, 1976; Lonati et al., 2018; Perdue & Summers, 1986). Although there are some circumstances under which manipulation checks may be deemed unnecessary or counterproductive (cf. Sigall & Mills, 1998), providing evidence that the treatments used in an experiment are related to the level of the variables they are intended to manipulate increases confidence in the inferences made from such studies (Perdue & Summers, 1986). Furthermore, if an experiment includes multiple independent variables it is important to determine whether the manipulations themselves are confounded. More specifically, Perdue and Summers (1986, p. 322) argued that,

Characteristics of laboratory experiments

Fisher (1984, p. 169) defined a laboratory experiment as a procedure in which the researcher attempts to test causal hypotheses by manipulating one or more independent variables (hypothesized causes) and measuring one or more dependent variables (hypothesized effects) while controlling for all other variables. If done properly, the researcher may conclude that varying levels of the independent variable caused the observed differences in the dependent variable, since nothing in the situation, procedure, or subjects was systematically different across groups except for the independent variable. (Italics in original.)

As indicated in Table 2, laboratory experiments are designed to establish causal relationships between independent and dependent variables. Laboratory experiments accomplish this more effectively than other experimental designs, because the researcher not only has precise control over the independent variable, but also because the participants are randomly assigned to treatment conditions, and the researcher exercises considerable control over the research setting. Thus, unlike experiments conducted in field settings, laboratory experiments enable the researcher to control a variety of physical, psychological, and social extraneous variables, which reduces the number of alternative explanations (rival hypotheses) that can be used to explain changes in the dependent variable(s) and increases the internal validity of the study and the replicability of its findings (Camerer, 2015).

Of course, the high degree of control over the independent and extraneous variables that is possible in laboratory settings also has some potential disadvantages. For example, laboratory experiments are often compared unfavorably with field experiments and quasi-experiments on the grounds that controlled settings (a) are artificial and lack realism, (b) increase the potential for subjects' reactivity, and (c) lack generalizability.
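The logic of Fisher's definition can be mimicked in a few lines of simulation. The sketch below is our own hypothetical illustration (the variable names, effect sizes, and use of Python/SciPy are assumptions, not material from the article): participants are randomly assigned to a treatment or control condition, an unmeasured individual difference influences the dependent variable, and yet, because assignment is random, that nuisance variable is balanced across conditions in expectation and the simple group comparison recovers the manipulated effect.

```python
# Hypothetical sketch of a minimal randomized laboratory experiment;
# names and numbers are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 200
true_effect = 0.5

# Unmeasured individual difference (e.g., trait conscientiousness).
conscientiousness = rng.normal(size=n)

# Random assignment: exactly half of the participants in each condition.
condition = rng.permutation(np.repeat([0, 1], n // 2))

# The DV depends on the manipulation, the nuisance variable, and noise.
performance = (true_effect * condition
               + 0.8 * conscientiousness
               + rng.normal(size=n))

# Randomization balances the nuisance variable across conditions (in expectation).
balance = (conscientiousness[condition == 1].mean()
           - conscientiousness[condition == 0].mean())

# A simple two-group comparison estimates the manipulated effect.
estimate = performance[condition == 1].mean() - performance[condition == 0].mean()
t_stat, p_value = stats.ttest_ind(performance[condition == 1],
                                  performance[condition == 0])

print(f"difference in conscientiousness across conditions: {balance:.2f}")
print(f"estimated treatment effect: {estimate:.2f} (true value {true_effect}), p = {p_value:.3f}")
```

If assignment were not random (say, more conscientious participants self-selected into the treatment), the same comparison would mix the treatment effect with the nuisance variable, which is exactly the endogeneity problem discussed later in this article.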

an adequate analysis of a manipulation check for a given factor (manipulation) within a multiple-factor design requires the use of the full-factorial ANOVA model whenever it is plausible that one manipulation may have inadvertently affected an independent variable associated with a different manipulation. Furthermore, researchers must be concerned with the statistical significance of all main and interaction effects, not just those involving the factor corresponding to the manipulation check measure being analyzed. A statistically significant main effect for the manipulation (factor) corresponding to the manipulation check being analyzed provides evidence in favor of the convergent validity of that particular manipulation. To the extent that other main and/or interaction effects are statistically significant, the discriminant validity of the associated manipulations becomes suspect. Ideally, only one effect, the main effect of the factor (manipulation) of interest, will be statistically significant. If effects associated with other manipulations prove to be statistically significant, these manipulations will have been “falsified” in the sense that they have not had their intended effects.

Examples of laboratory experiments in the leadership domain To illustrate some of the ways in which laboratory experiments have been used in leadership research we provide a few examples from the literature. The first example (Doci & Hofmans, 2015) treats leadership as the dependent variable, whereas the other two examples (Howell & Frost, 1989; Podsakoff, Whiting, Podsakoff, & Mishra, 2011) treat leadership as the independent variable. Doci and Hofmans (2015) conducted a laboratory experiment to examine the effects of task complexity on transformational leadership behaviors. They hypothesized that leaders who experience stress due to the complexity of their group's task are less likely to engage in transformational leadership than leaders whose groups perform less complex tasks. They also hypothesized that the putative negative effect of task complexity on transformational leadership would be mediated by the leaders' core self-evaluations (i.e., leaders' perceptions of themselves, their worth, and their abilities). They used a within-subjects research design in which participants were required to work in three-person groups simulating teams charged with making decisions of varying complexity. One member was randomly assigned to play the role of the leader and the other two participants played the role of subordinates. Three different tasks were developed (choosing a new office space to rent; choosing a new product to market; choosing a new project manager) at three different levels of complexity (low; moderate; high). After participants were allocated to groups and roles, the groups performed a training task to familiarize participants with the requirements of the experimental tasks. Then each group was asked to solve three different decision-making tasks of variable complexity. To control for possible order effects, both the sequence of the tasks and their complexity were randomly assigned to groups. For manipulation checks, participants

Perdue and Summers (1986) also note that if researchers are concerned that other, related constructs may be influenced by their manipulation or that their manipulation could be interpreted in terms of more than one construct, then they would be wise to check for confounds. Confound checks are measurements of variables that have not been explicitly manipulated, but may nevertheless have an effect on the dependent variable(s). If it can be shown that the treatment influences measures of the manipulated variable but not the potential confounds, this will increase confidence that the theoretical construct of interest – and not some other construct – caused the observed variation in the dependent variables.
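A hedged sketch of the kind of full-factorial manipulation-check analysis Perdue and Summers describe is shown below, using simulated data for a 2 × 2 design; the factor names, effect sizes, and the use of Python's statsmodels package are our own assumptions, not details from the article. The check measure for factor A should show a significant main effect of A only; a significant effect of B, or of the A × B interaction, on that measure would suggest that the manipulations are confounded.

```python
# Hypothetical manipulation-check analysis for a 2 x 2 between-subjects design,
# following the full-factorial ANOVA logic of Perdue and Summers (1986).
# All data are simulated; variable names are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
n_per_cell = 50

factor_a = np.repeat([0, 1], 2 * n_per_cell)           # e.g., low vs. high abusive supervision
factor_b = np.tile(np.repeat([0, 1], n_per_cell), 2)   # e.g., low vs. high workload

# Manipulation-check measure for factor A (perceived abusiveness); in a clean
# design it is driven by factor A alone.
check_a = 2.0 * factor_a + rng.normal(size=4 * n_per_cell)

df = pd.DataFrame({"A": factor_a, "B": factor_b, "check_A": check_a})
model = ols("check_A ~ C(A) * C(B)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
# Desired pattern: a significant C(A) row and non-significant C(B) and
# C(A):C(B) rows. The same model would be fit to any confound-check measure,
# where ideally no effect reaches significance.
```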

What are the characteristics, strengths, and limitations of the various types of experimental designs?

There is considerable variability within each of the three general types of experimental design. Some of these differences directly influence the inferences that can be made about the internal (and external) validity of the findings. In the sections that follow, we clarify the implications of these differences by: (a) comparing laboratory, field, and quasi-experiments with respect to a number of characteristics, (b) providing examples from the leadership literature, and (c) discussing their strengths and limitations. A summary of these points is provided in Table 2.


Table 2
Comparison of the characteristics of laboratory experiments, field experiments, and quasi-experiments.

Objective or goal of the design – Laboratory experiments: establish causal relationship between IV and DV. Field experiments: establish causal relationship between IV and DV. Quasi-experiments: establish causal relationship between IV and DV.
Manipulation of the IV – Laboratory: yes. Field: yes. Quasi: yes.
Random assignment of participants to conditions – Laboratory: yes. Field: yes. Quasi: no.
Controlled setting – Laboratory: yes. Field: no. Quasi: no.
Amount of control over IV – Laboratory: very precise control of IV. Field: variable. Quasi: variable.
Amount of control over extraneous variables – Laboratory: high. Field: moderate to high. Quasi: low to moderate.
Internal validity of findings – Laboratory: high. Field: moderate to high. Quasi: low to moderate.
Replicability of findings – Laboratory: high. Field: moderate. Quasi: low to moderate.
Number of potential rival hypotheses – Laboratory: low. Field: low to moderate. Quasi: moderate to high.
Realism of research setting – Laboratory: low. Field: high. Quasi: high.
Participants' awareness of participation – Laboratory: high. Field: moderate to low. Quasi: low.
Generalizability of findings – Laboratory: low to moderate. Field: moderate to high. Quasi: high.

Strengths of design

Laboratory experiments:
• Researcher manipulates IV(s) and has substantial control over potential confounding variables.
• Random assignment to experimental conditions reduces the chance that pre-existing differences between conditions will be able to account for observed changes in the DV(s).
• Control over independent and confounding variables reduces error variance, and increases the likelihood that relationships will be detected.
• Minimizes effects of endogeneity biases.
• Allows researchers to obtain consistent estimates of the effects of the IV(s) on the DV(s), and estimators that converge on the population parameters as sample size increases.
• High internal validity permits researcher to make strong claims regarding causal relationships between IVs and DVs.
• Provides researcher with an effective way of examining both the main and interactive effects of two or more IVs on DVs.
• Allows for use of complex factorial designs that would be difficult, if not impossible, to implement in field settings.
• Particularly effective in testing “crucial hypotheses.”
• Permits researcher to study topics that are difficult, if not impossible, to study in natural environments (e.g., field settings where researchers are concerned for the health and safety of workers).
• Easier for researchers to examine participants' behavior or the outcomes of their behavior, rather than their behavioral intentions or perceptions of specific behaviors.
• Permits researchers to examine the construct validity of their measures using experimental (causal) techniques.
• Permits strong (experimental) tests of mediation hypotheses.

Field experiments:
• Researcher (or someone in the organization) manipulates the IV(s).
• Random assignment to experimental conditions reduces the chance that pre-existing differences between conditions will be able to account for observed changes in the DV(s).
• Permits researcher to explore causal relationship between IV and DV.
• Consistent with the growing interest in evidence-based management.
• Because the experiment is conducted in a real-life context, the intensity of the IVs is more likely to “mirror” that in real life.
• Observed behaviors are more likely to reflect the form and strength of real-life behaviors because they are occurring in a natural setting.
• Typically results in lengthier exposure of participants to the experimental setting(s).
• Because participants are generally less aware of experimental conditions than in a laboratory experiment, demand characteristics are less likely to influence the results.
• Because they are conducted in organizational settings, the results are likely to have practical relevance.
• External validity is typically higher than laboratory experiments.
• Facilitates the development of better theories of time and temporal relationships.
• Facilitates collaboration between researchers and practitioners.
• In principle, allows for strong (experimental) examination of mediation hypotheses.

Quasi-experiments:
• Independent variable is manipulated by someone.
• Strengthens causal inference when random assignment is not possible or ethical.
• Permits researcher to explore causal relationships between IV and DV.
• Consistent with the growing interest in evidence-based management.
• Because the experiment is conducted in a real-life context, the intensity of the IVs is more likely to “mirror” that in real life.
• Observed behaviors are more likely to reflect the form and strength of real-life behaviors because they are occurring in a natural setting.
• Typically results in lengthier exposure of participants to the experimental setting(s).
• Because participants are generally less aware of experimental conditions than in a laboratory experiment, demand characteristics are less likely to influence the results.
• Because they are conducted in organizational settings, their results are likely to have practical relevance.
• Facilitates the development of better theories of time and temporal relationships.
• High external validity.
• Minimizes ethical concerns about harm to participants, inequity, paternalism, and deception.
• Facilitates collaboration between researchers and practitioners.

Potential limitations of design

Laboratory experiments:
• Presumed inability to manipulate complex constructs (e.g., leadership behaviors) with precision.
• Participants' awareness of experimental setting increases the probability that demand characteristics act as confounding variables.
• Artificiality of studies (brief exposure to manipulation; behaviors and decisions are less meaningful and less consequential in the laboratory setting) reduces ecological validity.
• The high level of control may result in an impoverished environment in which the manipulated variable is the only stimulus to which participants can respond.
• Concern that student participants are not representative of non-student populations.
• Findings from laboratory settings may not generalize to non-laboratory (e.g., organizational) settings.
• When deception is used in laboratory experiments, some participants may suspect the deception and behave differently from participants who do not suspect deception.

Field experiments:
• May be difficult to gain access to organizations to carry out this type of research.
• Less precise control over the IV(s) than in the laboratory.
• More difficult to control extraneous variables (environmental factors) than in laboratory settings.
• More difficult to replicate findings than in laboratory research.
• Difficult to implement complicated factorial designs.
• Field experiments typically require more time, effort, and planning than laboratory experiments.
• Potentially more ambiguity about cause-effect relationships than in laboratory experiments.

Quasi-experiments:
• May be difficult to gain access to organizations to carry out this type of research.
• Less precise control over the IV(s) than in the laboratory.
• Lack of random assignment increases the probability that pre-existing group differences influence subsequently observed changes in the DV(s) of interest.
• Less confidence that experimental and control groups do not differ in some important ways than in randomized experiments.
• More difficult to exercise control over extraneous variables (environmental factors) than in laboratory settings.
• Susceptible to the effects of endogeneity biases.
• More threats to internal validity than in laboratory settings.
• More difficult to replicate findings than in laboratory research.
• Difficult to implement complicated factorial designs.
• Quasi-experiments typically require more time, effort, and planning than laboratory experiments.
• Potentially more ambiguity about cause-effect relationships than in laboratory experiments.

Examples from the leadership literature
Laboratory experiments: Doci and Hofmans (2015); Podsakoff et al. (2011); Howell and Frost (1989).
Field experiments: Dvir et al. (2002); Martin et al. (2013); Avey et al. (2011).
Quasi-experiments: Hui et al. (2001); Grant & Hofmann (2011, Study 1); DeRue et al. (2012).

Note. IV = independent variable. DV = dependent variable.

depicting high and low levels of each. These scripts were rated by subject-matter experts to ensure that they depicted the intended behavior at the intended intensity level (high vs. low). They trained one actor to serve as the interviewer and another to serve as the interviewee and recorded a series of scripted interviews. In the main experiment, Podsakoff et al. presented the videos to subjects who were randomly assigned to one of the 32 treatment conditions. Manipulation and confound checks of the videos were performed in a pilot study using students who did not participate in the main experiment. As task performance and OCB are qualitatively different variables, the authors established the equivalence of the corresponding manipulations (Cooper & Richardson, 1986). Specifically, the results of the pilot study demonstrated that: (a) videos intended to depict high levels of each behavioral variable (i.e., supervisor task performance, administrative task performance, helping, voice and loyalty) elicited high ratings of the corresponding behavior; (b) videos intended to depict low levels of each behavior elicited low-level ratings of the corresponding behavior; (c) videos depicting different behaviors received similar mean ratings; and (d) high- and low-level videos of each behavior had significantly different ratings.

Podsakoff et al. (2011) found that the hypothetical job candidate was generally rated more competent, received higher overall evaluations, and received higher salary recommendations when exhibiting higher levels of helping, voice, and loyalty behaviors in the interview than when exhibiting lower levels of these behaviors, even after controlling for the scripted responses regarding task performance. They also found that the interviewee's responses to voice- and loyalty-related questions interacted with job level such that these responses tended to have stronger effects on selection decisions related to the supervisory position compared to the entry-level position. Finally, content analyses of participants' open-ended responses indicated that selection decisions were particularly sensitive to responses indicating low levels of voice

rated the complexity of each task immediately after they had performed it; and the participants assigned to subordinate roles provided the measures of the dependent variables by rating their leader's transformational leadership behavior after performing each task. Consistent with their hypotheses, Doci and Hofmans (2015) reported that task complexity affected transformational leadership. However, post hoc analyses indicated that the overall difference between conditions was primarily due to lower ratings of transformational leadership in the high task complexity condition compared with the low task complexity condition. Ratings of transformational leadership in the low and moderate complexity conditions were only marginally different, and ratings in the high and moderate complexity conditions were not significantly different. Doci and Hofmans also found partial support for their hypothesis that core self-evaluations mediated the relationship between task complexity and transformational leadership behavior.

In our second example, Podsakoff et al. (2011) examined the effects of job candidates' propensity to exhibit organizational citizenship behaviors (OCBs) on selection decisions. They set up simulated interviews in which an actor playing a job candidate responded to interview questions about job performance and three types of OCB in administrative positions. More specifically, they used a 2 (task behavior response: high vs. low) × 2 (helping behavior response: high vs. low) × 2 (loyalty behavior response: high vs. low) × 2 (voice behavior response: high vs. low) × 2 (job position: supervisory vs. entry-level) between-subjects factorial design to examine the effects of these factors on participants' overall evaluations of the candidate, ratings of the candidate's perceived competence, and recommendations for the candidate's starting salary.

Podsakoff et al. (2011) developed interview questions designed to capture candidates' likely task performance and OCBs in each of the two jobs and operationalized task performance and OCBs by creating scripts


Table 3
Manipulations used in Howell and Frost's (1989) study to operationalize charismatic, structuring and considerate leader behaviors.
Source: Adapted from Howell and Frost (1989).

Charismatic leadership
Verbal behaviors:
• Articulate overarching goals
• Communicate high performance expectations to participants and display confidence in their ability to reach these expectations
• Empathize with participants' needs
Non-verbal behaviors and interaction style:
• Project a powerful, confident, dynamic presence
• Alternate between pacing and sitting on edge of desk
• Lean toward participants and maintain direct eye contact
• Adopt a relaxed posture and animated facial expressions
Paralinguistic cues:
• Speak to participants in a captivating, engaging tone of voice

Structuring leadership
Verbal behaviors:
• Explain nature of the task
• Decide what needed to be done, and how it should be done
• Be clear about the quantity of work to be accomplished within a specified period of time
• Maintain specific work standards
Non-verbal behaviors and interaction style:
• Project a neutral, business-like manner – neither warm nor cold
• Sit behind desk and maintain intermittent eye contact
• Adopt a neutral facial expression and demeanour (i.e., do not provide positive reinforcement by nodding or smiling)
Paralinguistic cues:
• Speak to participants using a moderate level of speech intonation
• Neutral tone of voice

Considerate leadership
Verbal behaviors:
• Express concern for personal well-being of participants
• Emphasize importance of the comfort and satisfaction of participants
• Engage in two-way communication with participants
Non-verbal behaviors and interaction style:
• Project a friendly, approachable persona
• Sit on the edge of the desk, lean toward participants, and maintain direct eye contact
• Adopt a relaxed posture and friendly facial expressions and demeanour (i.e., nodding approval to participants when appropriate, smiling, etc.)
Paralinguistic cues:
• Speak to participants using a warm tone of voice

indicate that laboratory experiments can differ in complexity. Doci and Hofmans's experiment was relatively straightforward, examining the effect of just one independent variable on one mediator and one dependent variable. They used a modified within-subjects design, in which all of the subjects were exposed to all levels of the independent variable. In contrast, the Podsakoff et al. and Howell and Frost studies were considerably more complex, in that they tested the main and interactive effects of several independent variables on several dependent variables. Both these studies employed between-subjects designs in which the participants were exposed to only one treatment. Finally, despite the fact that all three of the studies were conducted in controlled laboratory settings and involved an element of role-playing, they varied with respect to the level of mundane realism – which refers to the extent to which situations that participants encounter in the laboratory are similar to situations they encounter in real life (Aronson & Carlsmith, 1968). For example, unlike the Doci and Hofmans study, in which participants were assigned specific roles and exposed to all levels of the independent variable, or the Podsakoff et al. (2011) study, in which participants viewed recorded interviews on computers, participants in the Howell and Frost study were immersed in a 2.5 h simulation in which they were expected to perform tasks that are similar to work carried out in real-life organizations with a confederate leader and co-workers. Evidence for the realism of Howell and Frost's study was provided by post-study debriefs, which indicated that none of the participants suspected the true purpose of the study or the use of confederates.

and helping behaviors. A final example of a laboratory experiment that incorporates leadership concepts was reported by Howell and Frost (1989). These researchers conducted a complex study examining the main and interactive effects of three different leadership styles and two levels of group productivity norms on participants' interpersonal adjustment, task adjustment, and task performance using a decision-making task. In this 3 × 2 between-subjects factorial study, participants were randomly assigned to work (a) under the supervision of a confederate of the researcher exhibiting high or low levels of charismatic, structuring, or considerate leadership behaviors and (b) in the presence of two coworkers (also confederates) who exhibited high or low productivity on the task. In order to operationalize the leadership behaviors investigated in the study, Howell and Frost (1989) made a concerted effort to distinguish between them on the basis of the (a) verbal behaviors, (b) nonverbal behaviors and interaction style, and (c) paralinguistic cues associated with these behaviors in the literature (see Table 3 for a summary of the differences). They then trained their confederates (professional actors) to engage in these behaviors and conducted manipulation checks to confirm that the actors portrayed the intended leadership styles accurately. They also trained the confederate co-workers to exhibit high or low productivity norms and checked that this manipulation had the intended effects. Howell and Frost found that participants working under the direction of a charismatic leader expressed higher levels of adjustment to their task, their leader, and co-workers, and performed better than participants working for structuring or considerate leaders, regardless of the productivity norm displayed by their co-workers. They also found that participants working under a structuring leader in a group that exhibited a high productivity norm reported higher task satisfaction and lower role conflict than participants working under a structuring leader in a group that exhibited a low productivity norm. Finally, Howell and Frost reported that participants working under the direction of a considerate leader in a group exhibiting a high productivity norm expressed higher task satisfaction than participants working under the direction of a considerate leader in a group exhibiting a low productivity norm. Together, the three studies discussed above demonstrate that laboratory experiments can be applied to various aspects of leadership using a variety of designs in order to address a range of research questions. For example, these studies demonstrate that leadership can be treated as an independent (Howell & Frost, 1989; Podsakoff et al., 2011) or dependent variable (Doci & Hofmans, 2015). They also

Strengths of laboratory experiments

One of the strengths of randomized laboratory experiments is that they permit researchers to address concerns about endogeneity. As noted earlier, one of the conditions necessary for establishing causality is ruling out the possibility that some factor, other than the independent variable being manipulated, is the cause of the change in the dependent variable. Experiments achieve this when participants are randomly assigned to conditions, thus creating a context in which the independent variable is not correlated with other manipulated or unmeasured variables. However, when this is not the case, and the independent variable may be correlated with confounding factors, the problem of endogeneity exists. According to Semadeni, Withers, and Certo (2014, pp. 1070–1071), “Endogeneity occurs when an independent variable is correlated with the error term (also known as “disturbance” or “residual”) in an ordinary least squares (OLS) regression model … [When this happens] the errors are not random… [and]


… this leads to biased coefficient estimates.” Because this is not the case in rigorous laboratory experiments, endogeneity is not a concern when interpreting findings from these designs (Antonakis et al., 2010). Indeed, randomized laboratory experiments permit researchers to obtain consistent estimates of the effects of the independent variable(s) on the dependent variable(s), with estimators that converge on the population parameters as the sample size increases (Boruch, Weisburd, Turner, Karpyn, & Littell, 2009; Shadish, 2011). Laboratory experiments possess several other strengths. First, they permit researchers to make strong claims about the internal validity of their findings (Antonakis et al., 2010; Brown & Lord, 1999; Colquitt, 2008; Ilgen, 1986; Wofford, 1999). Because (a) participants in laboratory experiments are randomly assigned to treatment conditions, (b) researchers have precise control over the independent variable(s) of interest, and (c) researchers can exercise substantial control over potential confounding (extraneous) variables, confidence is increased that the independent variable - and not some other factor - is the cause of the observed changes in the dependent variable(s) (Campbell & Stanley, 1963; Cook & Campbell, 1979; Falk & Heckman, 2009; James, 1980; Stone-Romero, 2002). Moreover, because laboratory experiments reduce random error, they increase the likelihood that relationships between the independent and dependent variables will be detected. Third, Mook (1983) and Ilgen (1986) noted that laboratory experiments are particularly well-suited to exploring “can it happen” hypotheses, which are fundamental to the testing of some theoretical statements, while Griffin and Kacmar (1991) argued that laboratory experiments are an especially effective method of testing “crucial hypotheses” – i.e., hypotheses designed to test competing theories or models. A good illustration of the use of laboratory research to test crucial hypotheses is provided in a monograph by Latham, Erez, and Locke (1988). In previous research, Latham and Erez had reported conflicting findings on the relationship between participation in goal setting, goal commitment, and job performance. With Locke serving as a mediator, the authors conducted several experiments designed to reconcile inconsistent findings. In summarizing the results of their studies the authors noted the advantages of using experimental studies to resolve scientific conflicts and disputes. For example, from her experience, Erez (in Latham et al., 1988, p. 768) concluded that, “The collaboration process is not a zero-sum game… both sides gain from the process because it helps to define the specific conditions necessary to validate their predictions.” Another strength of laboratory experiments, illustrated by both the Podsakoff et al. (2011) and Howell and Frost (1989) articles, is that they permit researchers to examine the main and interactive effects of independent variables using fairly complex factorial designs. Although such designs are obviously not limited to laboratory settings, it is difficult to imagine implementing the six experimental conditions utilized by Howell and Frost in a field setting, let alone teasing out the effects of the 32 treatment conditions on the three dependent variables used in the Podsakoff et al. study. 
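The endogeneity problem described above, and the way random assignment resolves it, can be made concrete with a small simulation. In the sketch below, all parameters are hypothetical: an unmeasured "ability" variable affects the outcome and, in the endogenous case, also drives who is treated, so the same OLS estimator is biased in that case but recovers the true treatment effect when assignment is random.

```python
# A minimal sketch (hypothetical parameters) of why random assignment
# addresses endogeneity: when treatment depends on an unmeasured variable
# that also affects the outcome, the OLS estimate of the treatment effect
# is biased; with random assignment the same estimator recovers the effect.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = 0.5
ability = rng.normal(size=n)  # unmeasured confound

def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    return np.polyfit(x, y, 1)[0]

# Endogenous assignment: higher-ability units are more likely to be treated
treated_endog = (ability + rng.normal(size=n) > 0).astype(float)
y_endog = true_effect * treated_endog + ability + rng.normal(size=n)

# Random assignment: treatment is independent of ability
treated_rand = rng.integers(0, 2, size=n).astype(float)
y_rand = true_effect * treated_rand + ability + rng.normal(size=n)

print("true effect:              ", true_effect)
print("estimate, endogenous case:", round(ols_slope(treated_endog, y_endog), 3))
print("estimate, randomized case:", round(ols_slope(treated_rand, y_rand), 3))
```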
Fifth, several authors have noted that laboratory experiments allow researchers to study leadership issues and topics that are difficult, if not impossible, to study in natural environments because they are rare or because they raise ethical concerns (Brown & Lord, 1999; Falk & Heckman, 2009; Griffin & Kacmar, 1991; Ilgen, 1986). For example, Brown and Lord's (1999, p. 534) discussion of Hunt, Boal, and Dodge's (1999) study of the effects of different types of charismatic leadership provides an excellent illustration of the value of laboratory experiments when it comes to studying rare events:

this difficulty reflects the fact that it is unlikely that enough data will be available to examine the effectiveness of different responses to crisis situations. Moreover, to merely describe how different leaders have responded to crises does not necessarily provide direction regarding what could be done in a crisis. Thus, when the phenomenon under consideration is rare, experimental studies provide not only the opportunity to study these situations, but also provide the opportunity to discover the most effective leadership styles for these events and to determine causality. Sixth, laboratory experiments permit researchers to examine the actual behaviors of participants, or the outcomes of these behaviors, rather than their intentions to exhibit such behaviors or their perceptions of these behaviors (Baumeister, Vohs, & Funder, 2007; Colquitt, 2008). More specifically, Baumeister et al. (2007, p. 396) noted that although psychology calls itself the science of behavior, some psychological subdisciplines have never directly studied behavior, and studies on behavior are dwindling rapidly in other subdisciplines … [and] … the direct observation of behavior has been increasingly supplanted by introspective self-reports, hypothetical scenarios, and questionnaire ratings...[Indeed] the selfreport appears to have all but crowded out all other forms of behavior. Behavioral science today … mostly involves asking people to report on their thoughts, feelings, memories, and attitudes. Occasionally they are asked to report on recent or hypothetical actions. Or, somewhat differently (and more rarely), reaction times, implicit associations, or memory recall might be assessed in the service of illuminating a cognitive process. But that is as close as most research gets. Direct observation of meaningful behavior is apparently passé. The same criticism of self-reports could be leveled at research in the field of organizational behavior. However, one of the main advantages of laboratory experiments is that they offer researchers the chance to observe behaviors and their outcomes directly. This advantage is important in the context of leadership research, because most of the contemporary theories and models and incorporate leader behaviors. For example, Doci and Hofmans (2015) examined the effects of task complexity on the transformational leadership behaviors of participants, and Howell and Frost (1989) examined the effects that specific leadership styles had on participants' task performance. Use of laboratory experiments to validate a scale. There are two further, often unappreciated, advantages of laboratory experiments in the context of leadership research. The first is that laboratory experiments can provide strong evidence for the validity of leadership measures. Traditionally, the validity of leadership measures has been inferred from (a) examining the content validity of items intended to measure the leadership construct, (b) assessing the psychometric properties of the scale (e.g., reliability, measurement model fit, factor loadings) and (c) observing whether the empirical relationships between measures of the leadership construct and other constructs in the nomological network are consistent with hypotheses (Churchill, 1979; MacKenzie, Podsakoff, & Podsakoff, 2011; Schwab, 1980). Researchers often assume that if these conditions are satisfied, one can be reasonably confident that inferences based on the measurements made using the scale are valid. 
However, a number of researchers (Borsboom, 2009; Borsboom, Mellenbergh, & Van Heerden, 2004; MacKenzie et al., 2011; Podsakoff, Podsakoff, MacKenzie, & Klinger, 2013) have noted that there are several problems with this approach to scale validation. Chief among these problems is the fact that the concept of validity implies direction and causality, and correlational evidence based on a nomological network does not provide evidence for either one of these criteria. Indeed, as noted by Podsakoff et al. (2013, p. 100) this traditional approach,

In their investigation, Hunt et al. (1999) experimentally examined how different forms of charismatic leadership (visionary versus crisis responsive) functioned during times of crisis and how effective these forms of leadership were once a crisis had subsided. By their very nature crises are rare events, making them difficult to investigate in the field using correlational survey methods. In part,

…fails to test the heart of what validity is all about. When most


employee trust and job satisfaction; Schaubroeck, Lam, and Cha (2007) investigated whether team potency mediated the relationship between transformational leadership and team performance; and Detert, Trevino, Burris, and Andiappan (2007) examined whether group-level counterproductivity mediated the relationships between different modes of managerial influence and unit-level financial performance and customer satisfaction. Measurement-of-mediation designs. Traditionally, mediated effects models are tested using one of two techniques. The first technique, which is typically used in non-experimental field studies, involves measuring the predictor variable, the proposed mediating variable, and the criterion variable, and then demonstrating that the indirect effect of the predictor variable on the criterion variable (through the mediator) is statistically significant. Zhang and Bartol (2010) used this design in a cross-sectional field study to demonstrate that the relationship between empowering leadership and employee creativity was mediated by employees' feelings of psychological empowerment, intrinsic motivation, and engagement in the creative process. The second technique, used in laboratory experiments, involves measuring the presumed mediator and dependent variable after manipulation of the independent variable in order to demonstrate that variation in the independent variable is related to variation in the dependent variable “through” the mediating variable. Allen and Rush (1998, Study 2) used this design in a laboratory study to demonstrate that the effect of OCBs on raters' evaluations of an instructor's performance was mediated by the raters' liking of and affective commitment to the instructor. Spencer et al. (2005) referred to these designs as measurement-of-mediation designs and noted that they are useful in situations where it is easy to measure the proposed psychological mediating mechanism, but hard to manipulate it. However, Spencer et al. (2005) also identified a number of potential limitations of these designs: (a) the observed relationship between the mediator and the dependent variable may be spurious, because the designs are correlational; (b) measuring the mediating variable at approximately the same time as the dependent variable may sensitize or prime participants to respond to the dependent variable; and (c) neither design permits strong causal inferences about the relationship between the mediator and the dependent variable. In addition, because these designs often measure the mediator and the dependent variable using the same source and at the same point in time, they are also vulnerable to common method biases (Podsakoff, MacKenzie, Lee, & Podsakoff, 2003; Podsakoff, MacKenzie, & Podsakoff, 2012). For all of these reasons, several experts (e.g., Judd, Kenny, & McClelland, 2001; Kenny, 2008; MacKinnon, 2008) have noted that a statistically significant indirect effect does not, on its own, imply causation. Experimental-causal-chain designs. Fortunately, alternatives to measurement-of-mediation designs do exist. Spencer et al. (2005) referred to one alternative as the experimental-causal-chain design. This design involves carrying out two sequential experiments. In the first experiment the independent variable is manipulated to demonstrate its effect on the presumed mediating variable. In the second experiment the presumed mediating variable is manipulated to determine its effect on the dependent variable of interest. 
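A minimal sketch of this two-experiment logic is shown below, using simulated data and placeholder condition labels rather than values from any actual study: the first experiment manipulates the independent variable and measures the proposed mediator, and the second manipulates the mediator and measures the dependent variable.

```python
# Hypothetical sketch of the experimental-causal-chain logic: Experiment 1
# manipulates X and measures the proposed mediator M; Experiment 2
# manipulates M and measures the dependent variable Y. All data, effect
# sizes, and labels are simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 80  # hypothetical participants per condition

# Experiment 1: manipulate X (control vs. treatment), measure the mediator M
m_control = rng.normal(loc=3.0, scale=1.0, size=n)
m_treated = rng.normal(loc=3.6, scale=1.0, size=n)   # assumed effect of X on M
print("Exp 1 (X -> M):", stats.ttest_ind(m_treated, m_control))

# Experiment 2: manipulate M directly (low vs. high), measure the outcome Y
y_low_m  = rng.normal(loc=4.0, scale=1.0, size=n)
y_high_m = rng.normal(loc=4.5, scale=1.0, size=n)    # assumed effect of M on Y
print("Exp 2 (M -> Y):", stats.ttest_ind(y_high_m, y_low_m))

# Significant effects in both experiments are taken as support for the
# causal chain X -> M -> Y, without a statistical test of the indirect effect.
```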
If the independent variable causes the mediator and the mediator causes the dependent variable, then this is interpreted as support for the hypothesized causal chain linking the independent variable to the dependent variable through the mediating variable. A recent study by Liang et al. (2016) used the experimental-causal-chain approach to examine why supervisors abuse poorly performing subordinates. Liang et al. hypothesized that poorly performing subordinates elicit hostility from supervisors, prompting them to engage in abusive behavior. They conducted two laboratory experiments to test this hypothesis. In the first experiment they manipulated whether participants in a poor subordinate performance condition made hostile or non-hostile attributions about the poor performing subordinate, by asking the participants to imagine that the subordinate's poor

researchers are asked what is meant by the concept of validity, they say that indicators of a construct are valid to the extent that they measure what a theory says it does (Kelley, 1927). This suggests that basing inferences about validity primarily on a scale's relationships with other constructs in its nomological net is problematic because these relationships only indirectly reveal whether the scale is measuring what it is intended to measure (i.e., whether changes in the theoretical construct cause corresponding changes in the observed responses to the scale items). It would be better to test this assumption directly. Podsakoff et al. (2013) argue that a more effective way of establishing whether a scale is measuring what it is intended to measure is to manipulate the construct of interest and observe what effect this has on items designed to reflect the underlying construct. Assuming that variations in the manipulated construct cause variations in the items, one can infer that the scale is valid. Podsakoff et al. go on to say that scales purporting to measure behavioral constructs (included in many prominent leadership theories) are particularly amenable to such validation using videos that display high and low levels of the leadership behaviors in experiments. The authors provide a step-by-step guide to the development and use of such videos in the scale validation process. A good example of this procedure was reported by Maynes and Podsakoff (2014, Study 4). These authors were interested in demonstrating the validity of their newly developed measures of employee voice behaviors (i.e., constructive, destructive, supportive, and defensive voice). More specifically, they wanted to demonstrate that (a) their measures of employee voice covaried with the attributes they were intended to measure; (b) variation in the measures was preceded by variation in the attributes; and (c) variation in the measures was not caused by other variables. Following the recommendations of Podsakoff et al. (2013), Maynes and Podsakoff developed scripts portraying high and low levels of each type of voice behavior, validated the scripts with subject-matter experts, filmed the scripts with professional actors, showed participants who had been randomly assigned to experimental conditions videos exhibiting high or low levels of each behavior, and asked them to rate the actors using items of all four voice behaviors, as well as other, related constructs. Consistent with their hypotheses, Maynes and Podsakoff (2014) reported that manipulations of all four voice behaviors influenced ratings of the respective voice behavior, and that estimates of the strength of the paths from voice manipulations to putatively related measures were significantly greater than those for the paths to unrelated measures. Taken together, these findings suggest that the measures of voice behavior developed by Maynes and Podsakoff have a high degree of veridical validity (MacKenzie et al., 2011), measure relatively distinct constructs, and are relatively distinct from measures of related constructs (i.e., the measures possess discriminant validity). Given the validity-related criticisms directed at some scales purporting to measure leadership constructs (Podsakoff & Schriesheim, 1985; Schriesheim, House, & Kerr, 1976; Schriesheim & Stogdill, 1975; Van Knippenberg & Sitkin, 2013), this procedure may prove particularly worthwhile for researchers who are interested in validating measures of leadership behaviors. 
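In practice, the core analysis in such an experimental validation study can be quite simple. The sketch below uses simulated ratings and placeholder scale names (not data from Maynes and Podsakoff) to check that a high versus low manipulation of the focal behavior produces a large difference on the focal scale and a much smaller difference on a conceptually distinct scale.

```python
# Hypothetical sketch of an experimental scale-validation check in the
# spirit of Podsakoff et al. (2013): participants randomly assigned to a
# high- vs. low-level portrayal of the focal behavior should differ strongly
# on the focal scale and much less on a conceptually distinct scale.
# All data are simulated; scale names are placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 60  # hypothetical participants per condition

# Ratings on the focal scale (e.g., the manipulated behavior) in each condition
focal_low  = rng.normal(loc=2.2, scale=0.8, size=n)
focal_high = rng.normal(loc=5.4, scale=0.8, size=n)

# Ratings on a conceptually distinct scale from the same participants
other_low  = rng.normal(loc=3.5, scale=0.8, size=n)
other_high = rng.normal(loc=3.7, scale=0.8, size=n)

def cohens_d(a, b):
    """Standardized mean difference using a pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

print("Focal scale:", stats.ttest_ind(focal_high, focal_low),
      " d =", round(cohens_d(focal_high, focal_low), 2))
print("Other scale:", stats.ttest_ind(other_high, other_low),
      " d =", round(cohens_d(other_high, other_low), 2))
```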
Use of laboratory experiments to test mediation models. Another underappreciated benefit of laboratory experiments is that they allow researchers to conduct strong tests of mediated effects models (Eden, Stone-Romero, & Rothstein, 2015; MacKinnon, Fairchild, & Fritz, 2007; Spencer, Zanna, & Fong, 2005; Stone-Romero & Rosopa, 2010). Most leadership researchers are interested not just in the direct effects of leader behaviors on employee outcome variables, but also in identifying the theoretical mechanisms that transmit the effect of these behaviors to their outcomes. For example, Podsakoff, MacKenzie, Moorman, and Fetter (1990) examined whether the relationship between transformational leadership and employees' OCB was mediated by 11


provided by a study conducted by Bauman, Tost, and Ong (2016). These authors hypothesized that unethical behavior by high-ranking individuals changes how people respond to lower-ranked individuals who transgress in the same way. More specifically, they hypothesized that observers would recommend less severe punishment for people if they were imitating higher-ranked individuals rather than people of the same rank, but that this effect would only operate when the two transgressors were members of the same organization. They based this hypothesis on the argument that “the rank-dependent imitation effect on punishment arises because the first actor's behavior is either seen as a mitigating circumstance that reduces blame for the imitator or makes the behavior seem more normal (or both) … [and that] … these mechanisms should only engage when transgressors are members of the same organization” (Bauman et al., 2016, p. 128). To test these hypotheses, Bauman et al. (2016) conducted a 2 × 2 scenario study (Study 2)2 in which they manipulated the rank of the first actor (e.g., higher rank or the same rank as the second actor) and whether the actors worked for the same organization or not, and analyzed the severity of the punishments recommended by the participants. Consistent with their predictions, these authors found an interaction between the rank of the first actor and organization commonality. More specifically, they found that (a) when the two actors were from the same organization, recommended punishments were less severe when the first actor was ranked above the second actor, but (b) when the two actors were from different organizations the severity of the punishment was the same, regardless of the rank of the first actor. Based on these findings, Bauman et al. (2016, p. 129) concluded that, “Given that outgroup members should not influence attributions of blame or perceptions of descriptive norms, the results provide initial evidence that attributions of blame and descriptive norms may play a role in the rankdependent imitation effect.”
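The statistical pattern that carries the moderation-of-process evidence in a study like this is the interaction term in a 2 × 2 design, unpacked with simple-effects comparisons. The following sketch uses simulated data and hypothetical cell means, not Bauman et al.'s results, to illustrate that analysis.

```python
# Hypothetical sketch of a 2 x 2 moderation-of-process analysis pattern:
# the rank manipulation should affect recommended punishment only when the
# two transgressors belong to the same organization. Data, cell means, and
# variable names are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 50  # hypothetical participants per cell
cells = {
    ("higher", "same"):      3.0,  # less severe punishment (assumed)
    ("equal",  "same"):      4.0,
    ("higher", "different"): 4.0,
    ("equal",  "different"): 4.0,
}
rows = []
for (rank, org), mean in cells.items():
    for score in rng.normal(loc=mean, scale=1.0, size=n):
        rows.append({"rank": rank, "org": org, "punishment": score})
data = pd.DataFrame(rows)

# The rank x organization interaction term carries the moderation-of-process
# prediction; simple effects unpack it within each organization condition.
model = smf.ols("punishment ~ C(rank) * C(org)", data=data).fit()
print(model.summary().tables[1])
for org in ["same", "different"]:
    sub = data[data["org"] == org]
    diff = (sub[sub["rank"] == "higher"]["punishment"].mean()
            - sub[sub["rank"] == "equal"]["punishment"].mean())
    print(f"rank effect in {org}-organization condition: {diff:.2f}")
```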

performance was intended to cause harm (hostile) or was outside the subordinate's control (non-hostile). Consistent with their hypotheses, Liang et al. found that participants in the hostile attribution condition experienced greater hostility toward the subordinates than participants in the non-hostile condition. In the second laboratory experiment, they manipulated whether participants felt broadly hostile (e.g., angry, hostile, scornful, disgusted, or loathing) or well-disposed (e.g., happy, joyful, delighted, cheerful, excited, or enthusiastic) to a subordinate with whom they interacted, to determine whether these feelings affected participants' intention to engage in abusive behavior. Consistent with their hypotheses, Liang et al. found that participants' in the hostile condition indicated that they intended to use more abusive supervisory tactics on their subordinate than participants in the well-disposed condition. Thus, using these two experiments, Liang et al. supported an experimental-causal-chain between poor performance, hostility, and intentions to employ abusive supervision. Spencer et al. (2005, p. 846) argued that the experimental-causalchain approach provides particularly strong evidence for mediating effects, even though they do not allow for statistical tests of mediation: “The reason we make this claim is that by manipulating both the independent variable and the mediating variable we can make strong inferences about the causal chain of events. We argue that such designs should be understood as a powerful way to examine psychological processes.” In addition, since the mediating variable and dependent variable are not obtained from the same source in this design, potential common source biases are minimized (Podsakoff et al., 2003; Podsakoff et al., 2012). However, as noted by Spencer et al. (2005) and others (Fischer, Dietz, & Antonakis, 2017; Stone-Romero & Rosopa, 2010), this approach is not without its limitations. One is that the researcher must be able to measure the proposed mediating mechanism, which is not always possible. Another limitation is that the researcher must be able to manipulate both the independent and mediating variables. Third, the researcher must be able to provide a compelling argument that the mediating variable measured in Study 1 and manipulated in Study 2 is, in fact, the same variable. Fourth, these designs do not allow researchers to estimate the indirect effect statistically or to calculate how much of the effect of the independent variable on the dependent variable can be attributed to the mediator. In other words, this design does not provide a statistical test of the “indirect effect,” nor an effect size for the mediating effect. Finally, experimental-causal-chain designs are undoubtedly more difficult to implement (if not impossible) when a model includes multiple mediators or when multiple mediating effects are sequentially ordered. Notwithstanding these limitations, we believe that the experimental-causal-chain approach has more benefits than limitations, and we encourage leadership researchers to consider this approach. Moderation-of-process designs. Unfortunately, the experimentalcausal-chain approach cannot be used in cases where the proposed mediator is not amenable to measurement. In such cases an experimental moderation-of-process approach can be used to examine the effects of an unmeasured mediating mechanism, provided that the presence or absence of the mediation process can be manipulated (Spencer et al., 2005). 
The moderation-of-process approach can provide evidence of the mediating effects of psychological processes provided two conditions are met. The first condition is that the presumed moderating variable has an effect on the proposed psychological mechanism or process. The second condition is that the only way in which the moderator influences the relationship between the independent variable and the dependent variable is through its effect on the psychological process, and that there is no other explanation for the observed pattern of moderating effects. In other words, the manipulations of the moderator indicate the presence or absence of the process presumed to transmit the effect of the independent variable on the dependent variable, and not some other process. One example of the use of this design in the leadership domain is

Potential limitations of laboratory research Despite their advantages, laboratory experiments have some potential limitations. However, several of these limitations may be less problematic than once thought. For example, Wofford (1999) noted that one of the main criticisms is that complex constructs, like leadership, cannot be validly operationalized in laboratory settings. Indeed, leader behavior is complex and many types of leader behavior are highly correlated with each other (DeRue, Nahrgang, Wellman, & Humphrey, 2011; Judge & Piccolo, 2004; Piccolo et al., 2012). That said, contemporary definitions of charismatic and transformational leadership appear to be no more difficult to operationalize in laboratory settings than in field settings, where they are typically measured using surveys. Moreover, laboratory experiments have been used to examine a variety of complex processes, including strategic management decisions (Schwenk, 1982) and battlefield decision processes (Zelditch, 1969). The relatively high correlations reported between different forms of leadership are more problematic if there is overlap at the conceptual level. However, leadership is not the only area in the management literature where there is overlap between purportedly different constructs. For example, organizational commitment and organizational involvement share conceptual content, as does work-related prosocial behavior and OCB. Regardless of context, the key is to employ clear and concise conceptual definitions of constructs which not only identify the attributes shared by the related constructs, but also articulate the attributes unique to the focal construct under consideration (MacKenzie, 2 We generally agree with Lonati et al. (2018) regarding the potential weaknesses of hypothetical choice scenario studies (e.g., the potential to create unwanted demand effects, and the lack of certainty as to whether self-reported choices would be reflected in actual behavior). Nevertheless, we reference the Bauman et al. (2016) study because it is one of the few leadership studies we could find that employs a moderation-of-process approach.


Henshel (1980), Mook (1983), and Ilgen (1986), criticism about the artificiality of laboratory experiments demonstrates a misunderstanding of the objectives of this research. The basic goal of laboratory experiments is to test theoretical propositions about the causal relationships between variables (Berkowitz & Donnerstein, 1982; Ilgen, 1986; Kruglanski, 1975; Postman, 1955) and in this context, artificiality may be regarded as a virtue rather than a vice (Henshel, 1980; Mook, 1983). Moreover, as noted by Ilgen (1986), beyond allowing researchers to examine whether some event, condition, or process can occur, laboratory experiments are particularly well-suited to addressing questions and issues which may be impractical to investigate in field settings. For example, conducting field research may (a) be too costly, (b) raise ethical concerns, (c) put the health or safety of organizational participants at risk or (d) not allow the researcher to examine the effects of certain variables directly. Similarly, because most social phenomena are complex and determined by multiple factors, “it is nearly impossible to learn much about specific cause-effect relations without the benefit of a controlled artificial research setting” (Kardes, 1996, p. 280). That said, we do not mean to suggest that laboratory experiments are appropriate for exploring all leadership phenomena. For example, Mitchell, Vogel, and Folger (2015) noted that it may be difficult to get participants to express realistic yet taboo reactions in laboratory settings, such as the satisfaction experienced from witnessing one's peers being subjected to (justified) abusive supervision. Moreover, building on the work of Heath and Sitkin (2001), Mitchell et al. argued that some aspects of organizations may be difficult to capture in laboratory settings. Although we generally agree with these statements, we feel it is important to note that these concerns should not discourage leadership researchers from studying phenomena that can be examined in a laboratory setting. Regarding the generalizability of laboratory experiments, there are several points worth noting. First, a number of scholars (Bass & Firestone, 1980; Berkowitz & Donnerstein, 1982; Highhouse, 2009; Kardes, 1996; Lucas, 2003; Lynch, 1982) have noted that illuminating the psychological processes by which the independent variable affects the dependent variable is more important than demonstrating that findings from the laboratory generalize to field settings. Second, comparisons of the empirical relationships reported in laboratory experiments and field studies (Anderson, Lindsay, & Bushman, 1999; Locke, 1986; Mitchell, 2012; Vanhove & Harms, 2015), suggest that they are often quite similar. For example, Locke (1986) asked prominent scholars in a variety of different content areas in I/O psychology, organizational behavior and human resources management to compare the results of laboratory and field studies. Locke's summary of these comparisons indicated that the direction of the effects observed were almost always the same in both settings. More compelling evidence for the comparability of laboratory and field findings comes from a series of meta-analytic studies. Anderson et al. (1999) reported a high correlation (r = 0.73) between effect sizes from laboratory and field studies of a variety of social psychological phenomena. Similar results were reported by Mitchell (2012), who replicated the Anderson et al. 
meta-analysis using a substantially larger sample (217 versus 38) covering a wider range of psychological phenomena. After removing an outlier, Mitchell reported that the average correlation between the effect sizes reported in field and laboratory settings was virtually the same (r = 0.71) as that reported by Anderson et al. Mitchell cautioned that the average correlation did vary by sub-field, but it is encouraging that the correlation was highest in the field of I/O psychology (r = 0.89). Finally, a more recent study by Vanhove and Harms (2015) examined 203 meta-analyses from both laboratory and field settings reporting estimates for relationships involving “workplace phenomena.” These authors reported that although the correspondence between findings from laboratory and field settings is still fairly high in some cases, the relationship is dependent on several factors, including the specific type of variables used as predictors and outcomes in these settings (i.e., demographic characteristics, traits, psychological states,

2003; Podsakoff, MacKenzie, & Podsakoff, 2016; Suddaby, 2010). Perhaps one of the best examples of this approach is Howell and Frost's (1989) laboratory-based manipulation of charismatic, structuring, and considerate leadership styles that we discussed earlier. Drawing on the conceptual definitions in the literature, Howell and Frost distinguished these leadership styles in terms of three main attributes: verbal behaviors; nonverbal behaviors/interaction styles; and paralinguistic cues. The results of their study show that rather complex leader behaviors can be distinguished from one another, and that they produce some predictable differences in a variety of outcome variables. Another major concern regarding laboratory experiments is that their findings may be affected by demand characteristics (Lonati et al., 2018; Orne, 1962, 1969), and experimenter expectancy effects (Rosenthal, 1967; Rosenthal & Rosnow, 1991). According to Crano et al. (2015, p. 134), demand characteristics represent the “totality of all social cues communicated in a laboratory not attributable to the manipulation, including those emanating from the experimenter and the laboratory setting, which alter and therefore place a demand on the responses of participants.” As noted by Shimp, Hyatt, and Snyder (1991), these cues are problematic because they lead researchers to make erroneous inferences about cause-effect relationships. The related phenomenon of experimenter expectancy effects is defined by Rosenthal and Rosnow (1991, p. 619) as an “artifact which results when the hypothesis held by the experimenter leads unintentionally to behavior toward the subjects which, in turn, increases the likelihood that the hypothesis will be confirmed.” Although demand characteristics and experimenter expectancy effects can be problematic in any research setting (McCambridge, de Bruin, & Witton, 2012), they are often more challenging in laboratory experiments because such settings heighten the salience of the stimuli manipulated by the researcher. Lonati et al. (2018) identified several features of experimental settings that increase the probability of demand characteristics, including the perceived authority of the experimenter over participants, participants' need for social approval, the salience of the experimental manipulation, and situations in which experimenters who are not blind to the treatments interact with the participants. Despite the potential problems produced by demand artifacts, research has shown that these effects typically occur under specific conditions (i.e., when participants are apprehensive about how their performance is being evaluated or when they become aware of the specific hypothesis being tested and adopt a faithful participant role; Weber & Cook, 1972), and demand effects are not inevitable provided researchers take precautions to minimize potential problems. Strategies for reducing demand artifacts have been provided by Sawyer (1975), Rosenthal and Rosnow (1991) and Lonati et al. (2018). In addition, Rosenthal and Rosnow (1991) have provided several suggestions for minimizing experimenter expectancy effects. These include (a) taking steps to ensure that experimenters are blind to the hypotheses and/or treatment condition being administered, (b) minimizing the interactions between experimenter and study participants, (c) using more than one experimenter, (d) monitoring the behavior of the experimenters during the study and (e) analyzing experiments for order effects. 
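Some of these safeguards can be implemented before data collection begins. The sketch below shows one hypothetical way of doing so: it generates a balanced random assignment schedule in which experimenters who interact with participants see only arbitrary condition codes, while the key linking codes to conditions is stored separately by the researcher. The condition labels and file names are placeholders.

```python
# Hypothetical sketch of one practical safeguard: random assignment is
# generated in advance, and experimenters receive only coded condition
# labels so that those who interact with participants remain blind to the
# treatment that each code represents.
import csv
import random

conditions = ["charismatic", "structuring", "considerate"]  # placeholder labels
participants = [f"P{i:03d}" for i in range(1, 61)]

random.seed(2024)
codes = {cond: code for cond, code in zip(conditions, ["A", "B", "C"])}

# Balanced random assignment: equal numbers of participants per condition
schedule = conditions * (len(participants) // len(conditions))
random.shuffle(schedule)

with open("experimenter_schedule.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["participant", "condition_code"])  # experimenters see only codes
    for pid, cond in zip(participants, schedule):
        writer.writerow([pid, codes[cond]])

# The key linking codes to conditions is kept separately by the researcher
with open("condition_key_private.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["condition_code", "condition"])
    for cond, code in codes.items():
        writer.writerow([code, cond])
```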
Laboratory experiments are also criticized for their artificiality on the grounds that (a) observations in laboratory settings are made over a relatively short period; (b) the consequences of behavior in a laboratory setting rarely correspond to those for the same behavior in real organizational settings; (c) “paper people” are not the same as the stimuli employees encounter in real organizational settings (Murphy, Herr, Lockhart, & Maguire, 1986); and (d) social interactions and tasks performed in work settings are much more complex than those that occur in the laboratory (Dobbins, Lane, & Steiner, 1988; Ilgen, 1986). These criticisms have led some researchers (Gadlin & Ingle, 1975; Harré & Secord, 1972; Kingstone, Smilek, Ristic, Freisen, & Eastwood, 2003) to argue that the control exercised in the laboratory is purchased at the price of generalizability, and other researchers to dismiss the findings from laboratory experiments as irrelevant. However, as noted by 13


would encourage leadership researchers to consider the short- and long-term effects of using deception in laboratory experiments and, consistent with APA guidelines, to avoid the unnecessary use of deception. Moreover, if deception is used, researchers should be able to justify its use to participants during the debrief.

workplace characteristics and decision-making). Third, criticisms regarding the lack of generalizability of laboratory experiments appear to assume that the findings of field studies are inherently more generalizable. However, research by Dipboye and Flanagan (1979) called this assumption into question. Their analysis of the applied psychology literature indicated that studies conducted in field settings are as narrow as laboratory studies in terms of the types of actors, behaviors, and settings sampled. This led them to conclude that,

What can be done to increase the probability that laboratory experiments will be published? Given the predisposition that many reviewers and editors appear to have against laboratory experiments, researchers must present compelling reasons for conducting and reporting such studies. Colquitt (2008) provided a useful set of recommendations for those interested in publishing laboratory experiments in the organizational sciences. These recommendations include (a) making a significant contribution to the literature by testing, extending, or building new theory; (b) ensuring that the experimental setting and procedures capture the essence of the constructs of interest (i.e., experimental realism); (c) meeting high standards of technical adequacy in terms of internal validity, construct validity, and statistical conclusion validity; (d) using behavioral dependent variables or their outcomes (where appropriate); and (e) striving to produce original, interesting, and important research findings. Although Colquitt's recommendations were developed specifically for researchers interested in publishing in AMJ, they are relevant to scholars wanting to publish laboratory research in other journals in the organizational sciences, including those focused on the leadership domain. Beyond following Colquitt's (2008) recommendations, there are other strategies researchers can use to increase the likelihood that their laboratory experiments will be published. These strategies include (a) providing complementary qualitative data, either from the laboratory experiment itself or from a separate qualitative study; (b) pairing the results of the laboratory experiment with a quantitative field study; or (c) combining the results of several laboratory experiments to help clarify theoretical mechanisms (i.e., mediating variables) or boundary conditions (i.e., moderators). A variation on the first strategy was employed by Podsakoff et al. (2011) in their examination of the effects of job applicants' propensity to exhibit OCB on selection decisions. In addition to the quantitative analysis of the effects of task performance and OCB on participants' ratings and salary recommendations for interviewees, Podsakoff et al. also content analyzed participants' responses to open-ended questions about their decisions. The results of this analysis not only supported the quantitative findings, but also provided additional insights into the factors influencing raters' judgments. For example, these analyses indicated that (a) low-level responses had substantially more perceived influence on participants' evaluations and decisions than high-level responses and (b) low levels of voice and helping behaviors were perceived to have more influence than low levels of task performance or organizational loyalty. These findings led Podsakoff et al. to speculate that responses indicating low helping behavior might be treated as a signal that the job candidate might be difficult to work with, or unwilling to chip in and help others when necessary. Likewise, these authors speculated that low voice might be interpreted by raters as a signal that the job candidate would be unwilling to take the initiative to help the organization, even if he or she had suggestions for improvement. It is unlikely that these deeper insights would have been made without the qualitative data. 
Other advantages of using mixed methods designs that integrate quantitative and qualitative data in leadership research have been discussed by Stentz, Plano Clark, and Matkin (2012). An example of the second strategy is the study by Giessner, van Knippenberg, and Sleebos (2009) on the effects that leaders' group prototypicality and performance have on followers' perceptions of their leadership effectiveness. Using a laboratory experiment, in combination with a scenario study and a cross-sectional field study, these authors found support for their hypothesis that prototypical and non-prototypical leaders would receive similar evaluations after success, but after

Contrary to the common belief that field settings provide for more generalization of research findings than laboratory settings do, field research appeared as narrow as laboratory research in the actors, settings, and behaviors sampled. Indeed, industrial-organizational psychology seems to be developing in the laboratory a psychology of the college student, and in the field, a psychology of the self-report of male, professional, technical, and managerial employees in productive-economic organizations. (Dipboye & Flanagan, 1979, p. 141) Of course, we are not suggesting that leadership researchers conducting laboratory experiments should ignore concerns about the generalizability of their findings. After all, leadership researchers are interested in improving the effectiveness of leaders in real-life organizational settings. However, like Highhouse (2009), we think that the primary focus of laboratory experiments should be on making sure that the manipulations of leadership phenomena are valid, representative, fair, and powerful enough to produce the intended effects. Another concern about laboratory research is that student participants are not representative of the general population, which raises questions about the applicability of studies which rely on student samples. Indeed, there is a long history of concern about the use of student participants in social science research (Cooper, McCord, & Socha, 2011; McNemar, 1946; Oakes, 1972; Peterson, 2001; Rosenthal & Rosnow, 1969; Schultz, 1969). The basic issue is whether student samples produce the same results as non-student samples. Several authors (Compeau, Marcolin, Kelley, & Higgins, 2012; Gordon, Slade, & Schmitt, 1986, 1987; Henry, 2008; Landy & Bates, 1973; Sears, 1986; Slade & Gordon, 1988) have claimed that student participants differ in important ways from non-student participants. However, others (Dobbins et al., 1988; Greenberg, 1987) have argued that criticisms of the use of students demonstrates a misunderstanding of the goals of laboratory experiments, and are often flawed. One potential way to reconcile these conflicting opinions was suggested by Gordon et al. (1986). These authors noted that in many of the cases where student and non-student samples produce similar results, the student and non-student samples were equally (un)familiar with the tasks that they are asked to perform, and that differences are likely to occur when populations are differentially familiar with a task. In other words, to increase generalizability, student and non-student participants should be matched for their task familiarity. Unfortunately, we are not aware of any systematic examination providing evidence of this proposal; thus, we regard it as an interesting avenue for future research. One final limitation of laboratory experiments is the possible side effects of using deception. Although the use of deception is fairly common in some domains of psychological inquiry, several researchers (Antonakis, 2017; Hertwig & Ortmann, 2008; Jamison, Karlan, & Schechter, 2008; Ortmann & Hertwig, 2002) have noted potential problems with the practice. For instance, research has shown that participants who feel deceived behave differently from those who do not sense deception, and that repeated exposure to deception makes participants less trusting of researchers' intentions in subsequent studies (Hertwig & Ortmann, 2001, 2008; Jamison et al., 2008). Of course, obscuring the reasons for an experiment or a manipulation is not always deception. 
However, lying, deliberately misleading study participants, or mischaracterizing the purpose of the experiment, is typically considered to be deception (Ortmann & Hertwig, 2002). Therefore, we


Nahrgang et al., 2013). Therefore, well-crafted group experiments (whether conducted in the lab or the field) should enhance the likelihood of getting research published in management and leadership journals.

failure group-prototypical leaders would be rated more effective than non-prototypical leaders. In discussing the merits of using a cross-sectional field study to supplement their experimental study the authors noted that, The laboratory may seem a somewhat artificial setting to study the role of leader group prototypicality, given the lack of interaction between leader and followers and the ad hoc nature of the group and the leadership relation… [Therefore], an obvious and important question is whether the relationships studied in the present research may not only be observed in a laboratory setting but also in an organizational setting. To address exactly that question, the present research combined different research methodologies to provide both evidence from controlled experiments that can speak to issues of causality and evidence from the field that can speak to issues of generalizability. (Giessner et al., 2009, p. 446)

Field experiments

Characteristics of field experiments

Hauser et al. (2017, p. 186) defined field experiments as “Studies that induce a change in a randomly selected subset of individuals (or teams, or units) within their natural organizational context, and compare outcomes to a randomly selected group for which the change was not introduced.” As indicated in Table 2, field experiments share several similarities with laboratory experiments (e.g., establishing causal relationships; manipulation of independent variables; random assignment). That said, there are important differences, some of which favor the field experiment. For example, because field experiments are conducted in “natural” environments, they tend to be more realistic, less likely to sensitize participants to the experimental conditions, and produce results that are generally perceived to be more generalizable. However, unlike laboratory experiments, field settings may require that the independent variables be manipulated by someone in the organization where the study is being conducted, rather than by the researcher. This lack of control raises concerns about the quality of the manipulation (construct validity) if the person carrying out the manipulation does not maintain the same standards as a good experimenter. Field experiments also do not typically permit control over potentially important aspects of the context in which the study is conducted (Harrison & List, 2004). This increases the possibility that rival explanations account for the findings of field experiments, and also means that the results of field experiments may be more difficult to replicate.
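The defining assignment step in such a design can be illustrated with a short script. In the hypothetical sketch below, intact teams (placeholder identifiers) within an organization are randomly split into a group that receives the change and a group that continues as usual; outcomes would then be compared across the two sets of teams.

```python
# A minimal, hypothetical sketch of the assignment step in a field
# experiment as defined by Hauser et al. (2017): intact units (e.g., teams)
# are randomly split into a group that receives the intervention and a
# group that does not. Team identifiers here are placeholders.
import random

teams = [f"team_{i:02d}" for i in range(1, 21)]  # 20 hypothetical intact teams

random.seed(11)
shuffled = random.sample(teams, k=len(teams))
treatment_teams = sorted(shuffled[: len(teams) // 2])
control_teams = sorted(shuffled[len(teams) // 2 :])

print("Receive the intervention:", treatment_teams)
print("Business as usual:       ", control_teams)
# Outcomes are then compared across the two groups, with analyses that
# respect the fact that randomization occurred at the team level.
```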

A variation on the third strategy (combining the results of several laboratory experiments to clarify the boundary conditions for a specific relationship) has been reported recently by Bendahan, Zehnder, Pralong, and Antonakis (2015). These authors examined the effects that leadership power has on corruption. In their first experiment, they manipulated two variables assumed to be related to a leader's power (number of followers, and the amount of autonomy the leader has in allocating rewards), and found that both of these variables independently influenced leaders' antisocial behavior (corruption). In the second experiment, Bendahan et al. examined the potential moderating effects of two individual difference variables (leader's personality and testosterone level) on the relationship between power and corruption. Their results showed that power interacted with testosterone level to predict corruption, such that corruption was highest when power and baseline testosterone level were both high. They also found that a leader's honesty did not interact with power, but it did have a main effect on corruption that dissipated over time. Thus, the combination of these two studies provided support for Lord Acton's maxim that “Power tends to corrupt, and absolute power corrupts absolutely” (Acton & Himmelfarb, 1948), and also provided new insights into the effects of testosterone on this relationship. Finally, given the increased interest in the study of groups and teams (Mathieu, Hollenbeck, van Knippenberg, & Ilgen, 2017), another strategy for increasing the likelihood of publishing laboratory experiments is to study the relationships between group-level inputs, processes, and outcomes. Several examples using this strategy exist in the literature. For example, Johnson, Hollenbeck, DeRue, Barnes, and Jundt (2013) examined the effects of providing groups with feedback and diagnostic lists on team change processes and performance in selfmanaged teams, and found that structurally misaligned teams that received diagnostic lists and feedback about their misalignment were more likely to change their structure and improve their performance than teams that did not receive the feedback or diagnostic lists. Nahrgang et al. (2013), examined the effects of three different types of goal setting (specific learning, general “do your best” learning, and specific performance goals) on team performance. Contrary to findings at the individual level, they reported that: (a) teams with specific learning goals performed worse than teams with “do your best” learning goals or specific performance goals, and (b) the negative effect of specific learning goals, relative to “do your best” or specific performance goals were magnified under conditions of higher task complexity. The benefits of using laboratory experiments to study group phenomena include: (a) enhancing a researcher's ability to establish causal relationships between group inputs and group processes and their outcomes, (b) minimizing the effects of extraneous variables that are difficult to control when conducting research with real groups in organizational settings, and (c) they can be used to provide strong tests of whether individual-level effects are homologous at the group level (cf.

Examples of field experiments in leadership research We focus on three studies (Avey, Avolio, & Luthans, 2011; Dvir, Eden, Avolio, & Shamir, 2002; Martin, Liao, & Campbell, 2013) that highlight some of the challenges of conducting experiments in field settings. Dvir et al. (2002) studied the effects of transformational leadership training on follower development and performance in two phases. In Phase 1, 160 infantry cadets engaged in officer training in the Israel Defense Force (IDF) were randomly assigned to experimental or alternative treatment group workshops designed to enhance their leadership skills. The experimental workshops focused on transformational leadership theory, whereas the alternative treatment workshops focused on elements of “eclectic” leadership and were based on a psychodynamic framework. Both workshops used a variety of training methods, including role playing, group discussions, simulations, presentations, video cases, and peer and trainer feedback. Phase 2 of the study began after the cadets had completed the officer training course. This phase was conducted during a four-month infantry basic training course that began two months after the leadership training workshops ended. In this phase 54 (34%) of the cadets from Phase 1 were assigned to lead platoons undergoing basic training; 32 of them had received training in transformational leadership and the remaining 22 had received training in eclectic leadership. These 54 platoon leaders had a total of 90 non-commissioned officers (NCOs) reporting directly to them and a total of 724 indirect followers (new recruits) who reported to the NCOs. Dvir et al. (2002) assessed the impact of the platoon leaders' behaviors on (a) the development of their direct followers (NCOs) and (b) the development and performance of their indirect followers (new recruits) at the end of basic training course. Dvir et al. (2002) collected leadership ratings and developmental information from the NCOs and the new recruits at the beginning and end of the basic training course and the performance data from the new recruits at the end of basic training. They conducted several manipulation checks to determine whether the transformational leadership 15


training had the intended effects. As expected, the first manipulation (confound) check showed that there were no significant differences in how favorably the platoon leaders in the experimental and alternative treatment groups responded to the leadership workshops. This finding suggests that differences in the effects produced by the platoon leaders could not be attributed to differences in how positively they responded to the training they had received. The second manipulation check indicated that the platoon leaders from the experimental group had acquired more knowledge about transformational leadership theory than the platoon leaders in the alternative treatment group. Finally, the third manipulation check indicated that NCOs in platoons led by the members of experimental group gave their platoon leaders higher ratings for transformational leadership behavior than the NCOs in the alternative treatment group. In contrast, new recruits' ratings of the transformational leadership behavior of platoon leaders were not significantly different across groups. The results of the study revealed several differences between the development of the NCOs of the platoon leaders who had received transformational leadership training and those who had received eclectic leadership training. For example, there were significant group differences in NCOs' feelings of self-efficacy, critical or independent thinking ability, and extra effort, as well as a marginal difference in the strength of their collectivistic orientation. However, somewhat surprisingly, the majority of these differences resulted from the fact that the dependent variables tended to decline over basic training in the alternative treatment group, but remained unchanged in the experimental group. Thus, the primary impact of the transformational leadership training was to prevent regression in NCOs serving under the person trained, rather than to enhance their development. Although Dvir et al. (2002) found that measures of new recruits' development were similar regardless of the leadership training their platoon leader had received, they did find performance differences between the two groups of new recruits. Specifically, recruits in the experimental platoons performed better than recruits in the eclectic leadership platoons on a written test about light weapons and on an obstacle course, and there was a marginal group difference in performance on a practical light weapons test. Dvir et al.'s (2002) study is interesting, because it provides some insight into both the benefits and challenges of field experiments in real organizational settings. On the positive side, this study is one of the first to demonstrate that leaders' behavior can affect both their direct and indirect followers. Demonstrating the indirect impact of leadership would likely be difficult, although perhaps not impossible, in a laboratory setting. On the other hand, the Dvir et al. study also demonstrates some of the challenges that researchers face in conducting experiments in field settings because they may not possess as much control over the independent variable and have considerably less control over extraneous variables. 
This was exemplified by the fact that, a few weeks before the platoon leaders in the experimental condition began performing their leadership role, they received a three-hour “booster” session to reinforce the lessons that they had learned in training, but due to budgetary constraints the platoon leaders in the control condition did not receive an analogous session. Dvir et al. noted that this difference in training protocols meant that they could not rule out the possibility that the booster session was responsible for some of the effects they had observed. In our second example, Martin et al. (2013) utilized a pretest-posttest experimental design with a control group to examine the consequences of leader behaviors. Business leaders in the United Arab Emirates were recruited and randomly assigned to one of three training conditions: (a) empowering leadership, (b) directive leadership, or (c) a control condition. Leaders in the empowering and directive leadership groups received their experimental treatments in two phases. The first phase consisted of a one- to two-hour training session. The second phase lasted for 10 weeks and required the leaders to engage in the newly learned leader behavior for 15 min each day. To ensure

compliance with this protocol, participating leaders were asked to maintain a daily log and hold bi-weekly discussions with a research assistant. Participants in the control group did not receive any specific training in leadership behavior, and were simply told to continue leading their teams in their usual style. Data on the dependent variables were obtained from internal or external customers of the leaders. Employee surveys were used to check the two manipulations of leadership behavior, and a measure of employees' satisfaction with their supervisor was used in moderator analysis. Customer surveys were used to measure core task proficiency and proactivity of the work units. The manipulation checks seemed to confirm that the leadership treatments had their intended effects; specifically, the authors reported that the directive leadership treatment accounted for 15% of the variance in the directive leadership manipulation check, and the empowering leadership treatment accounted for 36% of the variance in the empowering leadership manipulation check. Consistent with the authors' hypotheses, the results showed that work units' task proficiency and their satisfaction with their leaders increased between the pretest and posttest in both the directive and empowering leadership groups, but not in the control group. Post hoc comparisons indicated that posttest proficiency was higher in the empowering and directive leadership conditions than in the control condition, and similar in the two experimental groups. Also consistent with the authors' hypotheses, work units' proactivity increased from the pretest to the posttest in the empowerment leadership group, but not the other two groups; post hoc comparisons indicated that posttest proactivity was similar in the directive and control group work units, and higher in the empowering group units than in either the directive or control group units. However, Martin et al. (2013) did not find evidence to support their hypothesis that satisfaction with the leader moderated the relationships between leader behavior and the outcome variables. Indeed, contrary to their hypotheses (a) satisfaction with the leader did not interact with directive leadership to influence task proficiency and (b) although satisfaction with leader did moderate the relationships between empowering leadership and work units' task proficiency and proactivity, the interactive effects were in the opposite direction to that predicted. The Martin et al. (2013) study is interesting because the authors examined: (a) the effects of two qualitatively different types of leadership behavior on unit-level outcome variables, and (b) potential moderators of these effects. Thus, this study demonstrates that field experiments possess some of the same flexibility as laboratory experiments. However, Martin et al. did report that the empowering leadership treatment accounted for more than twice the amount of variance in the relevant manipulation check (36%) than the directive leadership treatment did (15%), which raises questions about the equivalence of the treatments (Cooper & Richardson, 1986). There are several reasons for the possible lack of equivalence in this study. For example, (a) the directive leadership treatment may not have captured the complete conceptual domain of this construct or (b) the manipulation check may not have measured the directive leadership construct effectively. 
In the first case, the manipulation lacks construct validity, whereas in the second case it is the measure that lacks validity. Alternatively, it is possible that the manipulation captured the complete domain of the construct but that the directive training was not as effective as the empowerment training, or that, although the training in both experimental conditions was equally effective, the participants in the directive leadership condition were more uncomfortable exhibiting the directive leadership behaviors than participants in the empowering leadership condition. In any case, differences in the strengths of these manipulations may be an important qualifier of the reported findings. We are not suggesting that establishing the equivalence of qualitatively different behavioral treatments is more important in field experiments than in laboratory experiments. However, it may be easier to ensure equivalence in laboratory experiments than in field experiments, because laboratory researchers generally have greater ability to pilot


variety of topics that are of interest to organizational scholars and practitioners alike, including the effects of monetary (Bandiera, Barankay, & Rasul, 2007, 2011; Shearer, 2004) and non-monetary incentives (Bandiera et al., 2011) on employees' motivation and performance, how credit market imperfections and liquidity constrain a firms' growth (de Mel, McKenzie, & Woodruff, 2008), and how frontline opinion leaders can be used as change agents in organizational settings (Lam & Schaubroeck, 2000). Third, field experiments are less susceptible to criticisms about artificiality because they are conducted in real organizational settings, with “real” people performing “real” jobs; the manipulation of the independent variable reflects the intensity of stimulus events in organizational settings; and participants are typically exposed to the manipulation for a longer period of time and are naturally incentivized to do their jobs (Lonati et al., 2018). Moreover, because participants are often less aware of the experimental conditions in field settings, the possible effects of demand characteristics are less of a concern. In addition, field experiments, like laboratory experiments, provide an opportunity to measure behaviors (and their outcomes) rather than attitudes and perceptions. Finally, like laboratory experiments, field experiments can be used to examine mediation effects using experimental-causal-chain and moderation-of-process designs. Although we are not aware of any set of field experiments in the leadership domain that were explicitly designed to test for mediation effects using either of these approaches, Eden et al. (2015) demonstrated that meta-analyses can be used to synthesize the results of multiple randomized field experiments to establish evidence in support of an experimental-causal-chain model. These authors examined the evidence relating to Eden's (1992, 2003) Pygmalion mediation model. This model hypothesizes that: managers' expectations → managers' leadership behavior → subordinates' self-efficacy → subordinates' performance. In order to test these causal linkages, Eden et al. (2015) first examined meta-analytic evidence from five field experiments demonstrating that managers' expectations cause managerial leadership behavior. Next, they examined evidence from the only true field experiment (Dvir et al., 2002) showing that leadership behavior affects subordinates' self-efficacy. Finally, they examined data from five field experiments showing that subordinates' self-efficacy increases their performance. Eden et al. argued that the collective evidence from these studies supports the hypothesis that leadership behavior and self-efficacy mediate the relationship between leaders' expectations and subordinates' performance. Of course, as noted earlier, the ability to use experimental-causalchain designs to make strong statements about the veracity of mediating effects is conditional on several factors: the measurability of the proposed mediator, the manipulability of both the independent and mediating variables, and whether the mediating variables being manipulated and measured do in fact represent the same construct. Nevertheless, given the advantages of field experiments when it comes to maximizing the internal and external validity of findings, we encourage leadership researchers to consider using experimental-causalchain designs to examine mediation hypotheses. Like Eden (2017) and Hauser et al. 
(2017), we also encourage researchers to explore the possibility of using moderation-of-process designs in their field experiments on leadership. As noted by Hauser et al. (2017, p. 195), this is possible if researchers include “not just an experimental condition that produces the desired treatment effect (as predicted by the theory), but also include a condition that turns off this effect by blocking a theorized pathway responsible for the effect. That is, by also testing a version of the intervention where the treatment should not show the same impact.” Of course, researchers examining psychological processes using moderation-of-process designs must have strong theoretical grounds for assuming that the levels of the moderator block or facilitate the psychological process of interest, and that there is not some other process that could explain the effects of the independent variable on the dependent variable(s).
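To make this analytic logic concrete, the following sketch (our own illustration, using simulated data and hypothetical variable names rather than data from any of the studies cited above) shows how a moderation-of-process design might be analyzed: if the theory is correct, the treatment should affect the outcome when the theorized pathway is open, and this effect should largely disappear in the condition designed to block the pathway, which corresponds to a treatment by blocking-condition interaction.

# A minimal sketch, not taken from any of the cited studies, of how a
# moderation-of-process design can be analyzed: the treatment effect should
# appear when the theorized pathway is open and disappear when it is blocked,
# which corresponds to a treatment x blocking-condition interaction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400
treatment = rng.integers(0, 2, n)   # 0 = control, 1 = intervention
blocked = rng.integers(0, 2, n)     # 0 = pathway open, 1 = pathway blocked

# Hypothetical data: the intervention raises the outcome only when the pathway is open.
outcome = 5 + 1.5 * treatment * (1 - blocked) + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([treatment, blocked, treatment * blocked]))
fit = sm.OLS(outcome, X).fit()
print(fit.summary(xname=["const", "treatment", "blocked", "treat_x_blocked"]))
# Evidence consistent with the theorized process: a positive treatment effect and a
# negative treatment x blocked interaction of roughly the same magnitude, so that the
# simple effect of the treatment in the blocked condition is near zero.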

test their manipulations. For example, in the Podsakoff et al. (2011) laboratory experiment, the authors assessed the equivalence of their behavioral manipulations (task performance, helping, voice and loyalty) in a pilot study with students who did not participate in the main study. In the pilot study, students were: (a) shown one video depicting the questions and responses for either a high or low level of one behavior, (b) asked to sort the script of that video into one of the four behavioral categories on the basis of its content, and (c) then rate the extent to which the behavior in the script represented a high or low level of the focal behavior, using a scale from 1 (low level) to 7 (high level). Analysis of the data showed that: (a) the participants classified the scripts into the appropriate behavioral category 92% of the time, (b) there was a high degree of consensus about the level of behavior (high vs. low) depicted in videos, and (c) high- and low-level videos of each behavior were found to differ significantly from each other. Notwithstanding the potential difficulties in obtaining this data in field settings, when possible, leadership researchers conducting field experiments should examine the equivalence of their manipulations before employing them. Our final field experiment was conducted by Avey et al. (2011). These researchers examined the effects of leader positivity and problem complexity on followers' positivity and job performance. The sample consisted of engineers in an aerospace firm, and participants were randomly assigned to one of four conditions: (a) a high leader positivity-low problem complexity condition, (b) a high leader positivityhigh problem complexity condition, (c) a low leader positivity-low problem complexity condition, and (d) a low leader positivity-high problem complexity condition. In order to make the experimental tasks as realistic as possible, participants in all conditions were asked to solve problems that were directly related to their jobs. Similarly, in order to ensure that the leadership manipulations were as realistic as possible, Avey et al. led the participants to believe that the high or low expressions of positivity they received were from a team of senior engineering leaders to whom they reported. Manipulation checks indicated that the leader positivity manipulation influenced the measure of this construct, but did not influence the measure of problem complexity, whereas the manipulation of problem complexity influenced the measure of problem complexity but not the measure of leadership positivity. In addition, the leadership positivity by problem complexity interaction did not affect either the leadership positivity or the problem complexity manipulation checks. Consistent with Perdue and Summers (1986), these findings provide strong evidence that leadership positivity can be manipulated independently from problem complexity, and that the manipulations of these variables were not confounded in this study (at least not with the content from their other manipulation). The results showed that (a) leadership positivity had a positive effect on followers' positivity and performance, (b) problem complexity had a negative effect on followers' positivity and (c) there was no interaction between leader positivity and task complexity with respect to followers' positivity or performance. 
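For readers who want a concrete template, the following sketch (our own, based on simulated data and hypothetical variable names, not the Avey et al. data) illustrates the Perdue and Summers (1986) logic for confound checks in a two-factor experiment: each manipulation check is regressed on both manipulated factors and their interaction, and only the focal manipulation should have a reliable effect on its own check.

# Minimal sketch (not the original authors' code) of the Perdue and Summers (1986)
# logic for checking that two manipulations are not confounded: each manipulation
# check should respond to its own manipulation, but not to the other manipulation
# or to their interaction. All variable names and data are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200                                   # hypothetical participants
positivity = rng.integers(0, 2, n)        # 0 = low, 1 = high leader positivity
complexity = rng.integers(0, 2, n)        # 0 = low, 1 = high problem complexity

# Simulated manipulation-check ratings: each check responds only to its own factor.
check_positivity = 3 + 2.0 * positivity + rng.normal(0, 1, n)
check_complexity = 3 + 1.5 * complexity + rng.normal(0, 1, n)

def confound_check(check, own, other):
    """Regress a manipulation check on both factors and their interaction."""
    X = sm.add_constant(np.column_stack([own, other, own * other]))
    fit = sm.OLS(check, X).fit()
    return fit.params, fit.pvalues        # only the own factor should have a reliable effect

for label, check, own, other in [
    ("positivity check", check_positivity, positivity, complexity),
    ("complexity check", check_complexity, complexity, positivity),
]:
    params, pvals = confound_check(check, own, other)
    print(label, "b(own)=%.2f p=%.3f | b(other)=%.2f p=%.3f | b(inter)=%.2f p=%.3f"
          % (params[1], pvals[1], params[2], pvals[2], params[3], pvals[3]))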
Strengths of field experiments
As in laboratory experiments, one of the strengths of field experiments is that the researcher (or someone in the organization) exercises control over the independent variable(s), and the participants are randomly assigned to conditions. This means that researchers can use field experiments to explore causal relationships, and reduce the threats to the internal validity of their studies. Second, field experiments are a constructive response to the growing interest in evidence-based management practices that has developed over the past few decades (Pfeffer & Sutton, 2006; Rousseau, 2012; Rynes & Bartunek, 2017; Shadish & Cook, 2009). As noted by Shadish and Cook (2009), there is an increasing number of fields in which practitioners and administrators are interested in developing and adopting evidence-based interventions, and field experiments provide such evidence. For example, field experiments lend themselves to the investigation of a


finding, which might not have otherwise been readily identifiable.

Potential limitations of field experiments As noted above, field experiments have some limitations (e.g., researchers are unlikely to exercise as much control over the independent or extraneous variables as in laboratory experiments, it is potentially more difficult to replicate the findings of field experiments, and it may prove more difficult to implement complex factorial designs in field settings than in the laboratory). In addition, another potential limitation of field experiments is that it can be difficult to gain access to organizations to carry out field experiments. Indirect evidence of this is provided by Scandura and Williams (2000), who reported that field experiments accounted for a relatively small (and declining) percentage of studies reported in the leading management journals (i.e., 3.9% in the 1980s to 2.2% in the 1990s). This is consistent with our analysis, which indicated that the percentage of field experiments published in the seven journals we examined never exceeded 2%. However, given that field experiments require organizations to allow a researcher to manipulate variables likely to have a significant impact on employees' attitudes, perceptions, and behaviors, it is not really surprising that they are often reluctant to participate. Moreover, since organizations want to optimize the performance of their units, managers may look unfavorably on the potential costs of disruption resulting from the random assignment of participants to conditions. Finally, Eden (2017) noted that field experiments are often perceived to be complex and difficult to conduct, which may deter scholars from attempting them.

Quasi-experiments Characteristics of quasi-experiments According to Grant and Wall (2009, p. 655), A quasi-experiment is a study that takes place in a field setting and involves a change in a key independent variable of interest but relaxes one or both of the defining criteria of laboratory and field experiments: random assignment to treatment conditions and controlled manipulation of the independent variable. Quasi-experiments thus include experimenter-controlled and manager-controlled interventions in which random assignment is not achieved, such as when treatments are assigned to intact or preexisting groups. As indicated in Table 2, quasi-experiments share some similarities with both laboratory and field experiments. For example, like laboratory and field experiments, their objective is to establish causal relationships and they involve manipulation of independent variable(s). However, in contrast to other types of experimental designs, participants are not randomly assigned to treatment conditions in quasi-experiments. In fact, it is common for treatment conditions to be assigned to pre-existing groups or for participants to self-select into treatment conditions. As noted earlier, assigning participants to treatments in these ways increases the possibility that participants in the various conditions differ on other characteristics that may be related to the dependent variable(s). Like field experiments, quasi-experiments also differ from laboratory experiments in that the researcher typically does not control potentially important aspects of the context in which the study is conducted; thus, increasing the number of rival hypotheses that might account for the findings.
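A small simulation (our own, with artificial data rather than data from any study cited here) makes this selection problem concrete: when an unmeasured pre-existing characteristic influences both who ends up in the treated group and the dependent variable, a simple comparison of group means will suggest a treatment effect even when the true effect is zero, whereas random assignment does not produce this bias.

# A small simulation of why assigning treatments to intact or self-selected groups
# matters: a pre-existing characteristic that drives both group membership and the
# outcome creates an apparent "treatment effect" even though the true effect is zero.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
ability = rng.normal(0, 1, n)                       # unobserved pre-existing difference

# Nonrandom assignment: higher-ability people are more likely to be in the treated unit.
treated_nonrandom = (ability + rng.normal(0, 1, n)) > 0
# Random assignment, for comparison.
treated_random = rng.integers(0, 2, n).astype(bool)

outcome = 2.0 * ability + rng.normal(0, 1, n)       # true treatment effect is zero

def naive_effect(treated):
    return outcome[treated].mean() - outcome[~treated].mean()

print("Apparent effect, nonrandom assignment: %.2f" % naive_effect(treated_nonrandom))
print("Apparent effect, random assignment:    %.2f" % naive_effect(treated_random))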

What can be done to increase the probability that field experiments will be published?
Although there is considerably less resistance among editors and reviewers to publishing field experiments, the larger challenge may be convincing an organization of the benefits of participation. Fortunately, Eden (2017) has provided useful suggestions for overcoming managers' objections to field experiments. These include (a) refraining from using research jargon, (b) explaining the purpose and value of randomization to managers, (c) looking for creative ways to implement randomized experiments, (d) using treatments of deleterious independent variables (e.g., stress) that are designed to reduce, rather than increase, their effects, and (e) piggybacking on naturally occurring events in the organization. We believe that these recommendations are sound and would encourage readers interested in conducting field experiments to read Eden's paper for additional details on how to implement them. We also think that Colquitt's (2008) suggestions for those interested in publishing laboratory experiments are relevant to those wanting to increase their likelihood of publishing field experiments in the leadership domain (e.g., aim to test, extend or build new theory; ensure high internal validity, construct validity and statistical conclusion validity; use behavioral dependent variables where possible; strive to produce original, interesting, and important research findings). In addition, researchers should emphasize in their papers that field experiments can combine the best elements of experimental research with the ecological validity of real organizational settings. Finally, as we noted earlier in our discussion of laboratory experiments, combining the results of field experiments with the results of other qualitative or quantitative studies should make them easier to publish. A good example of this approach is Li, Zheng, Harris, Liu, and Kirkman's (2016) examination of the spillover effects of providing positive social recognition in teams. These authors combined two laboratory experiments and one field experiment to show that the recognition received by a single team member boosted his or her teammates' individual performance, and the collective performance of the team. However, the results of this field experiment also highlighted an unintentional downside of administering individual recognition in existing organizational teams, in that the performance of employees in the control condition decreased following the recognition announcements in the experimental condition. Since this drop in performance did not occur in the more internally valid laboratory experiments conducted by Li et al., it led the researchers to speculate on potential explanations for this

Examples of quasi-experiments in leadership research
As noted by Shadish et al. (2002), there is a wide variety of quasi-experimental designs. To illustrate this variety, we highlight one study that uses a pretest-posttest nonequivalent groups design (Hui, Lam, & Schaubroeck, 2001), one that uses an interrupted time-series design (Grant & Hofmann, 2011), and one that uses a cohort design (DeRue, Nahrgang, Hollenbeck, & Workman, 2012). These designs are generally not as effective as regression-discontinuity designs (RDDs), but few studies in the leadership domain have used RDDs (for an exception see Steffens, Peters, Haslam, & van Dick, 2017). This is unfortunate, because when used properly, regression-discontinuity designs provide strong evidence of cause-effect relationships. We encourage leadership researchers interested in learning more about them to refer to Shadish et al. (2002), Antonakis et al. (2010), and Cappelleri and Trochim (2015). In the first of the studies we explore, Hui et al. (2001) examined the effect that training bank employees to become service quality leaders has on customer satisfaction and employees' compliance to the requirements of a new service quality program. Hui et al. tested two hypotheses. The first hypothesis stated that, compared with organizational units that did not have a service quality leader, units using frontline employees as service quality leaders would be more successful in implementing the service quality initiative. The second hypothesis stated that, compared with organizational units that used randomly selected frontline employees as service quality leaders, units using frontline employees selected on the basis of their OCB would be more successful in implementing the service quality initiative. Hui et al. (2001) tested their hypotheses in three U.S. branches of a large multinational bank, using a two-wave, repeated measures design. In one branch employees were selected to become service quality leaders on the basis of previous OCB, in the second branch the selection of service quality leaders was random, and in the third (control) branch no employees were trained to be service quality leaders. No differences in age, education level, or organizational tenure were found between the


quasi-experiment indicated that fundraisers who received an ideological message from a beneficiary performed significantly better after the message was delivered, whereas the performance of fundraisers who received messages from leaders did not change following delivery of the message. However, Grant and Hofmann noted that since the fundraisers were not randomly assigned to treatment conditions, their results were subject to several threats to validity. These included selection threats (fundraisers who showed up for the scholarship student's speech may have been more committed than other fundraisers), and multiple treatment effects (the messages varied in terms of content as well as source). Nevertheless, Grant and Hofmann provided reasons why several other potential threats (e.g., history, testing, instrumentation, test-treatment interaction effects, statistical regression, resentful demoralization, compensatory rivalry, compensatory equalization and treatment diffusion) were not likely to have influenced their findings. They conducted two laboratory experiments to address some of these potential threats. The final quasi-experiment was reported by DeRue et al. (2012). These authors were interested in exploring the effects that structured reflection, in the form of after-event reviews (AERs), have on experience-based leadership development activities, as well as how prior experiences and personality influence the impact of AERs on leadership development. DeRue et al. used MBA students as participants in a quasiexperimental cohort design. According to Cook and Campbell (1979, p. 127), “cohorts” are “groups of respondents who follow each other through formal institutions or informal institutions.” In the DeRue et al. study, the first cohort of MBA students (the control group) preceded the second cohort of MBA students (the experimental cohort) by two years. Comparisons of the two cohorts indicated that they were similar with respect to a variety of factors previously shown to be related to leadership, including demographic variables, experience, cognitive ability, and personality traits. In addition, in order to reduce the likelihood that the subsequent findings would be due to unknown confounds or selection-maturation biases, both cohorts were exposed to the same curriculum, taught by the same instructors, and exposed to the same extracurricular activities and leadership development experiences. Unlike the participants in the control cohort, who were asked by trained facilitators simply to discuss the lessons they had learned after each major leadership developmental activity, the participants in the experimental cohort were guided through the AER protocol by the facilitators. Participants first answered a series of questions relating to the activity (e.g., about the goal of the experience, their own behavior and contributions, the behavior of others, and specific actions they could take to improve their future performance), and were then guided by their facilitator to identify what they had learned about their leadership capabilities. Consistent with DeRue et al.'s (2012) hypotheses, the results showed that the AER intervention had a positive effect on leadership development, and that this effect was stronger in participants who were more conscientious, more open to new experiences, more emotionally stable, and had experienced greater developmental challenges in their previous work experiences. 
However, in contrast to their hypotheses, the authors found that neither participants' cognitive ability nor their amount of work experience moderated the relationship between the AER intervention and leadership development.
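To illustrate the basic analytic logic of an interrupted time-series design such as the one used by Grant and Hofmann (2011, Study 1), the sketch below (our own stylized example, with simulated data and hypothetical values rather than the authors' data) regresses a daily performance series on the pre-intervention trend, an indicator for the post-intervention period, and the time elapsed since the intervention, so that the intervention appears as a change in level and/or slope at the interruption.

# A stylized sketch of the basic interrupted time-series (segmented regression) logic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
days = np.arange(60)                   # hypothetical 60 observation days
cutoff = 30                            # intervention (e.g., the beneficiary's speech)
post = (days >= cutoff).astype(float)
time_since = np.where(post == 1, days - cutoff, 0)

# Simulated daily performance with a level shift after the intervention.
performance = 100 + 0.2 * days + 15 * post + rng.normal(0, 5, len(days))

X = sm.add_constant(np.column_stack([days, post, time_since]))
fit = sm.OLS(performance, X).fit()
print(fit.params)   # [intercept, pre-existing trend, level change, slope change]
# In a real application, autocorrelated errors would need attention, for example by
# requesting robust standard errors:
# fit = sm.OLS(performance, X).fit(cov_type="HAC", cov_kwds={"maxlags": 5})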

tellers who were trained in the two banks that received the experimental treatment. This treatment consisted of three weekly, two-hour group training sessions (led by an independent consultant) for those selected to become service quality leaders. The first session was devoted to a discussion of the new company policy for improving the quality of service. The second session identified specific behavioral changes that were needed to improve the quality of customer service, and also included a discussion on how to use conversations to alert tellers to the benefits of providing quality service. The final session involved brainstorming strategies for improving service quality. Three dependent variables were used to assess the effects of the training: customers' ratings of satisfaction with the service they had received, bank employees' self-ratings of compliance to the new service quality program, and supervisors' ratings of their employees' compliance to the new service quality program. A manipulation check indicated that leaders selected on the basis of their OCB did, indeed, receive higher supervisor ratings on a measure of OCB than other employees in the three bank branches, as well as the leaders chosen randomly to be trained in the other branch bank. Consistent with both hypotheses, Hui et al. (2001) reported that the two branches that used trained frontline employees as service quality leaders received higher customer satisfaction ratings than the branch without any service quality leaders (Hypothesis 1), and that the branch with “good citizens” as leaders received higher customer satisfaction ratings than the branch with randomly selected leaders (Hypothesis 2). Supervisor ratings of employees' compliance to the new service quality program also provided support for the hypotheses; however, employees' self-ratings only supported Hypothesis 1. Supervisors in the two branches that used frontline leaders reported better compliance to the new service quality plan than supervisors in the branch without frontline leaders but, inconsistent with Hypothesis 2, self-reported compliance to the new service quality program was similar in the branch with “good citizens” as leaders and the branch that used randomly selected leaders. In discussing the potential limitations of their study, Hui et al. (2001) noted that although the three bank branches were randomly assigned to treatment conditions, the tellers were not randomly assigned to the branches, and the researchers could not control for extraneous influences in the branch environments. However, they went on to provide reasons why several potential threats to construct and internal validity (e.g. compensatory equalization, resentful demoralization, selection, maturation, and reactance) were implausible explanations for their findings. In the second quasi-experiment, Grant and Hofmann (2011, Study 1) examined the effects that the source of an ideological message (leader versus beneficiary of the message) has on the performance of the targets of the message. They noted that although virtually all of the previously reported studies had positioned leaders as the source of ideological messages, in some organizations such messages are delivered by beneficiaries. They went on to hypothesize that ideological messages delivered by beneficiaries have stronger effects on employees' performance than messages delivered by leaders. 
Grant and Hofmann (2011) tested their hypotheses by studying the behavior of 60 university fundraisers in a naturally occurring quasi-experiment that took place over a three-month period. The fundraisers were responsible for contacting alumni and persuading them to donate money. The interventions occurred when the fundraisers' manager invited two university leaders and a scholarship student (a beneficiary of the fundraising process) to deliver messages at the beginning of the fundraisers' shifts. The authors tracked the performance of the fundraisers on a daily basis, before and after the interventions. Fourteen of the fundraisers received a message from a Director of the Young Alumni, 23 fundraisers received a message from a member of the Board of Trustees, and 18 fundraisers received a message from a scholarship student beneficiary. Performance was measured by the amount of money raised by each of these three groups. The results of Grant and Hofmann's (2011) interrupted time-series

Strengths of quasi-experiments
Quasi-experiments share many of the strengths of field experiments. First, control is exercised over the independent variable(s) of interest in quasi-experiments.3 In addition, quasi-experiments are less susceptible

3 It is worth noting, however, that in one type of quasi-experiment (often referred to as a “natural experiment”), the manipulation can happen naturally. In these studies, the researcher does not manipulate the independent variable, but instead takes advantage of the natural occurrence of the manipulation.


confidence in causal inferences based on quasi-experimental data (compared with laboratory experiments), as well as confidence in the replicability of the findings. Finally, it is also difficult, if not impossible, to find quasi-experiments that examine multiple independent variables using complex factorial designs.

to criticisms about artificiality and demand characteristics (compared to laboratory experiments). The reasons for this advantage are that: (a) quasi-experiments are typically conducted in real organizational settings with real employees performing real jobs, (b) manipulation of the independent variable reflects the intensity of stimulus events in real organizational settings, (c) participants are typically exposed to the treatment(s) for a longer period of time than participants in laboratory experiments, and (d) participants are normally less aware of the experimental conditions, and, therefore, less subject to some forms of participant reactivity. Finally, like other experimental designs, quasi-experiments provide researchers with an opportunity to measure actual behaviors (or their outcomes), rather than focusing only on attitudes and perceptions. In addition to the strengths identified above, Grant and Wall (2009, p. 653) have noted that quasi-experiments may be particularly beneficial for: “(a) strengthening causal inference when random assignment and controlled manipulation are not possible or ethical; (b) building better theories of time and temporal progression; (c) minimizing ethical dilemmas of harm, inequity, paternalism, and deception; (d) facilitating collaboration with practitioners; and (e) using context to explain conflicting findings.” Similar points regarding the benefits of quasi-experiments for exploring causal relationships when ethical considerations preclude random assignment or controlled manipulations, or when there is reluctance to participate in such studies, have been noted by Thyer (2012). Furthermore, Thyer has also noted that small-scale quasi-experiments may be particularly useful in testing the effectiveness of interventions before investing more resources in conducting large-scale field experiments.

What can be done to increase the probability that quasi-experiments will be published? The critical first step is to obtain access to a participating organization. Grant and Wall (2009) have provided several worthwhile recommendations for gaining the cooperation of organizations including: (a) building long-term relationships and trust with organizations; (b) disseminating the findings of previous research to practitioners; (c) explaining how quasi-experiments can help practitioners achieve their goals; (d) asking questions in order to find out what practitioners value and tailoring the study to these values; (e) highlighting the advantages that quasi-experiments have for researchers; (f) emphasizing common goals and unique expertise; (g) translating research jargon into common language; and (h) finding the right contacts in the organization. Beyond these strategies, leadership researchers need to look for ways to enhance the contributions of their quasi-experiments. One obvious way is to follow the lead of Grant and Hofmann (2011), and pair the results of a quasi-experiment with laboratory (or field) experiments that directly address problems associated with the nonrandom assignment of participants to treatments. Similarly, the findings of quasi-experiments can be enhanced by combining them with the results of a non-experimental survey study that provide additional insights into the boundary conditions of the relationships examined in the quasi-experiment, or with a qualitative study that delves deeper into the theoretical mechanisms responsible for the findings. We believe that there are other strategies that researchers can use to minimize threats to validity in quasi-experiments. The first recommendation is to anticipate and proactively address the likely threats to internal validity (Cook & Campbell, 1979; Mark & Reichardt, 2009). At the most basic level this can be accomplished by listing the threats in the planning phase of the study and then examining how well design decisions address each of them. Such planning helps avoid problems that might not otherwise be recognized until after the study has been conducted. Given that participants are not randomly assigned to conditions in quasi-experiments, one of the most obvious threats to the validity in such studies is selection. In response to this threat, researchers (Gu & Rosenbaum, 1993; Rosenbaum & Rubin, 1983; Smith, 1997; Stuart, 2010) have developed, described, and in some cases tested, the effects of a variety of techniques designed to match non-randomly assigned participants across conditions. Matching is “any method that aims to equate (or ‘balance’) the distribution of covariates in the treated and control groups” (Stuart, 2010, p. 1). These procedures are designed to rule out potential threats to internal validity by ensuring that groups are equivalent with respect to potential confounding factors. Matching techniques include propensity score matching, individual-to-individual (or 1:1) matching, frequency distribution matching, weighted matching, and sub-classification matching. A complete treatment of matching techniques is beyond the scope of this paper, but we encourage interested readers to examine articles on this topic by Stuart (2010), Harder, Stuart, and Anthony (2010), Connelly, Sackett, and Waters (2013), and Li (2013), as well as the book by Holmes (2014). 
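As a concrete, though deliberately simplified, illustration of one of these techniques, the sketch below (our own, with simulated data and hypothetical variable names) estimates propensity scores with a logistic regression and then performs 1:1 nearest-neighbor matching with replacement; in real applications, researchers would also need to assess covariate balance and common support, as the sources cited above discuss.

# A bare-bones sketch of 1:1 nearest-neighbor matching on the propensity score,
# in the spirit of the procedures reviewed by Stuart (2010) and Li (2013).
# All data and variable names are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
covariates = rng.normal(0, 1, (n, 3))                        # e.g., tenure, age, pretest score
treat = (covariates @ np.array([0.8, 0.3, 0.5]) + rng.normal(0, 1, n)) > 0
outcome = covariates @ np.array([1.0, 0.5, 0.5]) + 2.0 * treat + rng.normal(0, 1, n)

# 1. Estimate propensity scores from a logistic regression of treatment on covariates.
ps_model = sm.Logit(treat.astype(int), sm.add_constant(covariates)).fit(disp=0)
pscore = ps_model.predict(sm.add_constant(covariates))

# 2. For each treated case, find the control case with the closest propensity score.
treated_idx = np.where(treat)[0]
control_idx = np.where(~treat)[0]
matches = control_idx[np.argmin(np.abs(pscore[treated_idx][:, None]
                                       - pscore[control_idx][None, :]), axis=1)]

# 3. Compare outcomes within the matched sample (a crude estimate of the
#    average treatment effect on the treated in this simulated example).
att = (outcome[treated_idx] - outcome[matches]).mean()
print("Matched estimate of the treatment effect: %.2f (true simulated value 2.0)" % att)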
Finally, because there is a greater likelihood that the independent variables examined in quasi-experimental studies are endogenous (that is, correlated with the error terms of the dependent variables), quasi-experiments are susceptible to endogeneity biases, which render the estimates inconsistent. As a result, we encourage researchers interested in publishing such studies to heed the recommendations of experts (Antonakis et al., 2010; Kennedy, 2008) on how these biases can be controlled.
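One remedy that these sources discuss is instrumental-variable estimation; the following minimal sketch (our own, with simulated data and a hypothetical instrument) shows how two-stage least squares can recover the causal effect of an endogenous predictor that naive regression overstates, provided a valid instrument is available.

# A minimal two-stage least squares sketch. The instrument, variable names, and data
# are hypothetical; a credible application stands or falls with the quality of the
# instrument (relevance and the exclusion restriction).
import numpy as np

rng = np.random.default_rng(5)
n = 2000
u = rng.normal(0, 1, n)                       # unobserved confound
z = rng.normal(0, 1, n)                       # instrument: related to x, unrelated to the error
x = 0.8 * z + u + rng.normal(0, 1, n)         # endogenous predictor (e.g., a leadership score)
y = 1.0 * x + u + rng.normal(0, 1, n)         # outcome; the true effect of x is 1.0

def ols_slope(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

print("Naive OLS slope (biased upward):  %.2f" % ols_slope(x, y)[1])

# Stage 1: regress the endogenous predictor on the instrument and keep the fitted values.
x_hat = np.column_stack([np.ones(n), z]) @ ols_slope(z, x)
# Stage 2: regress the outcome on the fitted values from stage 1.
print("Two-stage least squares slope:    %.2f" % ols_slope(x_hat, y)[1])
# (Standard errors from this naive second stage are wrong; dedicated IV routines
# should be used in practice.)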

Limitations of quasi-experiments Despite the potential benefits of quasi-experiments, researchers are likely to encounter certain limitations when using these designs. For example, because many of the variables that management researchers are interested in manipulating (e.g., leadership behaviors, incentive systems, organizational or job characteristics etc.) are likely to affect the actions of employees, organizations may be unwilling to participate in the research unless they are convinced that the outcomes will be positive. Furthermore, some managers may be reluctant to have researchers figuratively “looking over their shoulders” and scrutinizing the effectiveness of their actions. These factors not only make it difficult to gain access to organizations, they also mean that more time, effort, and planning may be required to conduct quasi-experiments than laboratory experiments. Aside from these practical considerations, researchers conducting quasi-experiments face other, design-related challenges. Foremost among them, quasi-experiments do not allow for random assignment, which raises the possibility that pre-existing differences between groups, or selection biases, may (at least in part) account for the observed results. In addition, quasi-experimental designs are more susceptible to the potential deleterious effects of endogeneity biases (Antonakis et al., 2010). Third, researchers typically exercise considerably less control over the treatment condition(s) and extraneous variables in quasi-experiments compared with laboratory or field experiments. This is illustrated by Grant and Hofmann's (2011, Study 1) inability to control the specific content of the ideological messages presented by the three different sources, which raises obvious questions about the construct validity (and equivalence) of the manipulations in the study. Researchers' lack of control over the potential effects of other, extraneous variables is also illustrated by the fact that in all the quasi-experiments we discussed, the authors found it necessary to explain why their findings could not be accounted for by confounding variables. When taken together, these limitations tend to decrease

(footnote continued) Grant and Hofmann's (2011, Study 1) is an example of this type of study.


generalizability of field and laboratory research findings. American Psychologist, 35, 463–464. Bauman, C. W., Tost, L. P., & Ong, M. (2016). Blame the shepherd not the sheep: Imitating higher-ranking transgressors mitigates punishment for unethical behavior. Organizational Behavior and Human Decision Processes, 137, 123–141. Baumeister, R. F., Vohs, K. D., & Funder, D. C. (2007). Psychology as the science of selfreports and finger movements: Whatever happened to actual behavior? Perspectives on Psychological Science, 2, 396–403. Bendahan, S., Zehnder, C., Pralong, F. P., & Antonakis, J. (2015). Leader corruption depends on power and testosterone. The Leadership Quarterly, 26, 101–122. Berkowitz, L., & Donnerstein, E. (1982). External validity is more than skin deep: Some answers to criticisms of laboratory experiments. American Psychologist, 37, 245–257. Bickman, L., & Rog, D. J. (2009). Applied research design: A practical approach. In L. Bickman, & D. J. Rog (Eds.). The Sage handbook of applied social research methods (pp. 3–43). (2nd ed.). Thousand Oaks, CA: Sage. Borsboom, D. (2009). Measuring the mind: Conceptual issues in contemporary psychometrics. Cambridge, England: Cambridge University Press. Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071. Boruch, R. F., Weisburd, D., Turner, M. T., III, Karpyn, A., & Littell, J. (2009). Randomized controlled trials for evaluation and planning. In L. Bickman, & D. J. Rog (Eds.). The Sage handbook of applied social research methods (pp. 147–181). Thousand Oaks, CA: Sage. Brown, D. J., & Lord, R. G. (1999). The utility of experimental research in the study of transformational/charismatic leadership. The Leadership Quarterly, 10, 531–539. Camerer, C. F. (2015). The promise and success of lab-field generalizability in experimental economics: A critical reply to Levitt and List. In G. Fréchette, & A. Schotter (Eds.). Handbook of experimental economic methodology (pp. 249–295). Oxford, UK: Oxford University Press. Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54, 297–312. Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago, IL: Rand McNally. Campbell, J. P. (1986). Labs, fields, and straw issues. In E. A. Locke (Ed.). Generalizing from laboratory to field settings (pp. 269–279). Lexington, MA: Heath. Cappelleri, J. C., & Trochim, W. M. (2015). Regression discontinuity design. International encyclopedia of the social & behavioral sciences. Vol. 20. International encyclopedia of the social behavioral sciences (pp. 152–159). Chatterji, A. K., Findley, M., Jensen, N. M., Meier, S., & Nielson, D. (2016). Field experiments in strategy research. Strategic Management Journal, 37, 116–132. Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405. Churchill, G. A., Jr. (1979). A paradigm for developing better measures of marketing constructs. Journal of Marketing Research, 16, 64–73. Colquitt, J. A. (2008). From the editors: Publishing laboratory research in AMJ: A question of when, not if. Academy of Management Journal, 51, 616–620. Compeau, D., Marcolin, B., Kelley, H., & Higgins, C. (2012). Research commentary—Generalizability of information systems research using student subjects — A reflection on our practices and recommendations for future research. Information Systems Research, 23, 1093–1109. Connelly, B. 
S., Sackett, P. R., & Waters, S. D. (2013). Balancing treatment and control groups in quasi-experiments: An introduction to propensity scoring. Personnel Psychology, 66, 407–442. Cook, T. D., & Campbell, D. T. (1976). The design and conduct of quasi-experiments and true experiments in field settings. In M. Dunnette (Ed.). Handbook of industrial and organizational psychology (pp. 223–326). Skokie, IL: Rand McNally. Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Boston, MA: Houghton Mifflin Company. Cooper, C. A., McCord, D. M., & Socha, A. (2011). Evaluating the college sophomore problem: The case of personality and politics. Journal of Psychology, 145, 23–37. Cooper, W. H., & Richardson, A. J. (1986). Unfair comparisons. Journal of Applied Psychology, 71, 179–184. Crano, W. D., Brewer, M. B., & Lac, A. (2015). Principles and methods of social research (3rd ed.). New York, NY: Routledge. de Mel, S., McKenzie, D., & Woodruff, C. (2008). Returns to capital in microenterprises: Evidence from a field experiment. Quarterly Journal of Economics, 123, 1329–1372. DeRue, D. S., Nahrgang, J. D., Hollenbeck, J. R., & Workman, K. (2012). A quasi-experimental study of after-event reviews and leadership development. Journal of Applied Psychology, 97, 997–1015. DeRue, D. S., Nahrgang, J. D., Wellman, N., & Humphrey, S. E. (2011). Trait and behavioral theories of leadership: An integration and meta-analytic test of their relative validity. Personnel Psychology, 64, 7–52. Detert, J. R., Trevino, L. K., Burris, E. R., & Andiappan, M. (2007). Managerial modes of influence and counterproductivity in organizations: A longitudinal business-unitlevel investigation. Journal of Applied Psychology, 92, 993–1005. deVaus, D. (2001). Research design in social research. Thousand Oaks, CA: Sage. Dipboye, R. L., & Flanagan, M. F. (1979). Are findings from the field more generalizable than in the laboratory? American Psychologist, 34, 141–150. Dobbins, G. H., Lane, I. M., & Steiner, D. D. (1988). A note on the role of laboratory methodologies in applied behavioural research: Don't throw out the baby with the bath water. Journal of Organizational Behavior, 9, 281–286. Doci, E., & Hofmans, J. (2015). Task complexity and transformational leadership: The mediating role of leaders' state core self-evaluations. The Leadership Quarterly, 26, 436–447. Dvir, T., Eden, D., Avolio, B. J., & Shamir, B. (2002). Impact of transformational leadership on follower development and performance: A field experiment. Academy of Management Journal, 45, 735–744. Eden, D. (1992). Leadership and expectations: Pygmalion effects and other self-fulfilling prophecies in organizations. The Leadership Quarterly, 3, 271–305. Eden, D. (2003). Self-fulfilling prophecies in organizations. In J. Greenberg (Ed.).

Concluding remarks Although there is evidence of renewed interest in the use of experimental designs in management and leadership research (e.g., Anderson & Edwards, 2015; Antonakis, 2017; Colquitt, 2008; Van Witteloostuijn, 2015; Zellmer-Bruhn et al., 2016), they are still relatively underutilized. The strength of experimental designs is that they provide strong evidence of causal relationships between independent and dependent variables. We therefore encourage leadership researchers to include laboratory experiments, field experiments, and quasi-experiments in their methodological toolkit. Although we have not addressed all of the issues associated with this important topic, we have hopefully provided leadership researchers interested in using experimental designs with some valuable suggestions for improving their research. Like Ariely (2010, pp. 292–293), we believe that the knowledge gained from experimental studies is important for both leadership scholars and practitioners alike: The importance of experiments as one of the best ways to learn what really works and what does not seems incontrovertible. I don't see anyone wanting to abolish scientific experiments in favor of relying more heavily on gut feelings or intuitions. But, I'm surprised that the importance of experiments isn't recognized more broadly, especially when it comes to important decisions in business or public policy. Frankly, I am often amazed by the audacity of the assumptions that businesspeople and politicians make, coupled with their seemingly unlimited conviction that their intuition is correct…But politicians and businesspeople are just people, with the same decision biases we all have, and the types of decisions they make are just as susceptible to errors in judgment as medical decisions. So shouldn't it be clear that the need for systematic experiments in business and policy is just as great? Acknowledgements Philip M. Podsakoff gratefully acknowledges the support provided by the Hyatt and Cici Brown Chair in Business. References Acton, J. E. E. D. A., & Himmelfarb, G. (1948). Essays on freedom and power. Boston, MA: Beacon Press. Allen, T. D., & Rush, M. C. (1998). The effects of organizational citizenship behavior on performance judgments: A field study and a laboratory experiment. Journal of Applied Psychology, 83, 247–260. Anderson, C. A., Lindsay, J. J., & Bushman, B. J. (1999). Research in the psychological laboratory: Truth or triviality? Current Directions in Psychological Science, 8, 3–9. Anderson, D. M., & Edwards, B. C. (2015). Unfulfilled promise: Laboratory experiments in public management research. Public Management Review, 17, 1518–1542. Antonakis, J. (2017). On doing better science: From thrill of discovery to policy implications. The Leadership Quarterly, 28, 5–21. Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making causal claims: A review and recommendations. The Leadership Quarterly, 21, 1086–1120. Ariely, D. (2010). The upside of irrationality: The unexpected benefits of defined logic at work and at home. New York, NY: HarperCollins. Aronson, E., Brewer, M., & Carlsmith, J. M. (1985). Experimentation in social psychology. In G. Lindzey, & E. Aronson (Vol. Eds.), Handbook of social psychology(3rd ed.). Vol. 1. Handbook of social psychology (pp. 441–486). New York, NY: Random House. Aronson, E., & Carlsmith, J. M. (1968). Experimentation in social psychology. In G. Lindzey, & E. Aronson (Vol. Eds.), The handbook of social psychology. Vol. 2. 
The handbook of social psychology (pp. 1–79). Reading, MA: Addison - Wesley. Austin, J. T., Scherbaum, C. A., & Mahlman, R. A. (2002). History of research methods in industrial and organizational psychology: Measurement, design, analysis. In S. G. Rogelberg (Ed.). Handbook of research methods in industrial and organizational psychology (pp. 1–33). Malden, MA: Blackwell. Avey, J. B., Avolio, B. J., & Luthans, F. (2011). Experimentally analyzing the impact of leader positivity on follower positivity and performance. The Leadership Quarterly, 22, 282–294. Babbie, E. (2014). The practice of social research (14th ed.). Boston, MA: Cengage Learning. Bandiera, O., Barankay, I., & Rasul, I. (2007). Incentives for managers and inequality among workers: Evidence from a firm level experiment. Quarterly Journal of Economics, 122, 729–774. Bandiera, O., Barankay, I., & Rasul, I. (2011). Field experiments with firms. Journal of Economic Perspective, 25, 63–82. Bass, A. R., & Firestone, I. J. (1980). Implications of representativeness for


Organizational behavior: The state of the science (pp. 91–122). (2nd ed.). Mahwah, NJ: Lawrence Erlbaum. Eden, D. (2017). Field experiments in organizations. Annual Review of Organizational Psychology and Organizational Behavior, 4, 91–122. Eden, D., Stone-Romero, E. F., & Rothstein, H. R. (2015). Synthesizing results of multiple randomized experiments to establish causality in mediation testing. Human Resource Management Review, 25, 342–351. Falk, A., & Heckman, J. J. (2009). Lab experiments are a major source of knowledge in the social sciences. Science, 326, 535–538. Fischer, T., Dietz, J., & Antonakis, J. (2017). Leadership process models: A review and synthesis. Journal of Management, 43, 1726–1753. Fisher, C. D. (1984). Laboratory experiments. In T. S. Bateman, & G. R. Ferris (Eds.). Method & analysis in organizational research (pp. 169–185). Reston, VA: Reston Publishing. Gadlin, H., & Ingle, G. (1975). Through the one-way mirror: The limits of experimental self-reflection. American Psychologist, 30, 1003–1009. Giessner, S. R., van Knippenberg, & Sleebos, E. (2009). License to fail? How leader group prototypicality moderates the effects of leader performance on perceptions of leadership effectiveness. The Leadership Quarterly, 20, 434–451. Gordon, M. E., Slade, L. A., & Schmitt, N. (1986). “Science of the sophomore” revisited: From conjecture to empiricism. Academy of Management Review, 11, 191–207. Gordon, M. E., Slade, L. A., & Schmitt, N. (1987). Student guinea pigs: Porcine predictors and particularistic phenomena. Academy of Management Review, 12, 160–163. Grant, A. M., & Hofmann, D. A. (2011). Outsourcing inspiration: The performance effects of ideological messages from leaders and beneficiaries. Organizational Behavior and Human Decision Processes, 116, 173–187. Grant, A. M., & Wall, T. D. (2009). The neglected science and art of quasi-experimentation: Why-to, when-to, and how-to advice for organizational researchers. Organizational Research Methods, 12, 653–686. Greenberg, J. (1987). The college sophomore as guinea pig: Setting the record straight. Academy of Management Review, 12, 157–159. Greenberg, J., & Tomlinson, E. C. (2004). Situated experiments in organizations: Transplanting the lab to the field. Journal of Management, 30, 703–724. Griffin, R., & Kacmar, K. M. (1991). Laboratory research in management: Misconceptions and missed opportunities. Journal of Organizational Behavior, 12, 301–311. Gu, X. S., & Rosenbaum, P. R. (1993). Comparison of multivariate matching methods: Structures, distances, and algorithms. Journal of Computational and Graphical Statistics, 2, 405–420. Harder, V. S., Stuart, E. A., & Anthony, J. C. (2010). Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. Psychological Methods, 15, 234–249. Harré, R., & Secord, P. F. (1972). The explanation of social behavior. Oxford, UK: Blackwell. Harrison, G. W., & List, J. A. (2004). Field experiments. Journal of Economic Literature, 42, 1009–1055. Hauser, O. P., Linos, E., & Rogers, T. (2017). Innovation with field experiments: Studying organizational behaviors in actual organizations. In A. P. Brief, & B. M. Staw (Vol. Eds.), Research in organziational behavior. 37. Research in organziational behavior (pp. 185–198). Heath, C., & Sitkin, S. B. (2001). Big-B versus Big-O: What is organizational about organizational behavior? Journal of Organizational Behavior, 22, 43–58. Henry, P. J. (2008). 
College sophomores in the laboratory redux: Influences of a narrow data base on social psychology's view of the nature of prejudice. Psychological Inquiry, 19, 49–71. Henshel, R. L. (1980). The purposes of laboratory experimentation and the virtues of deliberate artificiality. Journal of Experimental Social Psychology, 16, 466–478. Hertwig, R., & Ortmann, A. (2001). Experimental practices in economics: A methodological challenge for psychologists? Behavioral and Brain Sciences, 24, 383–403. Hertwig, R., & Ortmann, A. (2008). Deception in experiments: Revisiting the arguments in its defense. Ethics & Behavior, 18, 59–92. Highhouse, S. (2009). Designing experiments that generalize. Organizational Research Methods, 12, 554–566. Holmes, W. M. (2014). Using propensity scores in quasi-experimental designs. Thousand Oaks, CA: Sage. Howell, J. M., & Frost, P. J. (1989). A laboratory study of charismatic leadership. Organizational Behavior and Human Decision Processes, 43, 243–269. Hui, C., Lam, S. S. K., & Schaubroeck, J. (2001). Can good citizens lead the way in providing quality service? A field quasi-experiment. Academy of Management Journal, 44, 988–995. Hunt, J. G., Boal, K. B., & Dodge, G. E. (1999). The effects of visionary and crisis-responsive charisma on followers: An experimental examination of two kinds of charismatic leadership. The Leadership Quarterly, 10, 423–448. Ilgen, D. R. (1986). Laboratory research: A question of when, not if. In E. Locke (Ed.). Generalizing from laboratory to field settings (pp. 257–268). Lexington, MA: Lexington Books. James, L. R. (1980). The unmeasured variables problem in path-analysis. Journal of Applied Psychology, 65, 415–421. Jamison, J., Karlan, D., & Schechter, L. (2008). To deceive or not to deceive: The effect of deception on behavior in future laboratory experiments. Journal of Economic Behavior & Organization, 68, 477–488. Johnson, M. D., Hollenbeck, J. R., DeRue, D. S., Barnes, C. M., & Jundt, D. (2013). Functional versus dysfunctional team change: Problem diagnosis and structural feedback for self-managed teams. Organizational Behavior and Human Decision Processes, 122, 1–11. Jones, R. A. (1985). Research methods in the social and behavioral sciences. Sunderland, MA: Sinauer Associates. Judd, C. M., Kenny, D. A., & McClelland, G. H. (2001). Estimating and testing mediation and moderation in within-subject designs. Psychological Methods, 6, 115–134. Judge, T. A., Bono, J. E., Ilies, R., & Gerhardt, M. W. (2002). Personality and leadership: A qualitative and quantitative review. Journal of Applied Psychology, 87, 765–780. Judge, T. A., Colbert, A. E., & Ilies, R. (2004). Intelligence and leadership: A quantitative review and test of theoretical propositions. Journal of Applied Psychology, 89, 542–552. Judge, T. A., & Piccolo, R. F. (2004). Transformational and transactional leadership: A meta-analytic test of their relative validity. Journal of Applied Psychology, 89, 755–768. Kardes, F. R. (1996). In defense of experimental consumer psychology. Journal of Consumer Psychology, 5, 279–296. Kelley, T. L. (1927). Interpretation of educational measurements. New York, NY: Oxford University Press. Kennedy, P. (2008). A guide to econometrics (6th ed.). Malden, MA: Blackwell. Kenny, D. A. (1979). Correlation and causality. New York, NY: John Wiley & Sons. Kenny, D. A. (2008). Reflections on mediation. Organizational Research Methods, 11, 353–358. Kidd, R. F. (1976). Manipulation checks: Advantage or disadvantage. Representative Research in Social Psychology, 7, 160–165. Kingstone, A., Smilek, D., Ristic, J., Friesen, C. K., & Eastwood, J. D. (2003). Attention, researchers! It is time to take a look at the real world. Current Directions in Psychological Science, 12, 176–180. Kruglanski, A. W. (1975). The human subject in the psychology experiment: Fact and artifact. In L. Berkowitz (Vol. Ed.), Advances in experimental social psychology. Vol. 8. Advances in experimental social psychology (pp. 101–147). New York, NY: Academic Press. Lam, S. S. K., & Schaubroeck, J. (2000). A field experiment testing frontline opinion leaders as change agents. Journal of Applied Psychology, 85, 987–995. Landy, F. J., & Bates, F. (1973). Another look at contrast effects in the employment interview. Journal of Applied Psychology, 58, 141–144. Latham, G. P., Erez, M., & Locke, E. A. (1988). Resolving scientific disputes by the joint design of crucial experiments by the antagonists: Application to the Erez-Latham dispute regarding participation in goal setting. Journal of Applied Psychology, 73, 753–772. Li, M. (2013). Using the propensity score method to estimate causal effects: A review and practical guide. Organizational Research Methods, 16, 188–226. Li, N., Zheng, X., Harris, T. B., Liu, X., & Kirkman, B. L. (2016). Recognizing “Me” benefits “We”: Investigating the positive spillover effects of formal individual recognition in teams. Journal of Applied Psychology, 101, 925–939. Liang, L. H., Lian, H., Brown, D. J., Ferris, D. L., Hanig, S., & Keeping, L. M. (2016). Why are abusive supervisors abusive? A dual-system self-control model. Academy of Management Journal, 59, 1385–1406. Locke, E. A. (1986). Generalizing from laboratory to field: Ecological validity or abstraction of essential elements? In E. A. Locke (Ed.). Generalizing from laboratory to field settings (pp. 257–267). Lexington, MA: Heath. Lonati, S., Quiroga, B. F., Zehnder, C., & Antonakis, J. (2018). On doing relevant and rigorous experiments: Review and recommendations. Journal of Operations Management. https://doi.org/10.1016/j.jom.2018.10.003. Lucas, J. W. (2003). Theory-testing, generalization, and the problem of external validity. Sociological Theory, 21, 236–253. Lynch, J. G., Jr. (1982). On the external validity of experiments in consumer research. Journal of Consumer Research, 9, 225–239. MacKenzie, S. B. (2003). The dangers of poor construct conceptualization. Journal of the Academy of Marketing Science, 31, 323–326.
MacKenzie, S. B., Podsakoff, P. M., & Podsakoff, N. P. (2011). Construct measurement and validation procedures in MIS and behavioral research: Integrating new and existing techniques. MIS Quarterly, 35, 293–334. MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. New York, NY: Lawrence Erlbaum. MacKinnon, D. P., Fairchild, A. J., & Fritz, M. S. (2007). Mediation analysis. Annual Review of Psychology, 58, 593–614. Mark, M. M., & Reichardt, C. S. (2009). Quasi-experimentation. In L. Bickman, & D. J. Rog (Eds.). The Sage handbook of applied social research methods (pp. 182–213). Los Angeles, CA: Sage. Martin, S. L., Liao, H., & Campbell, E. M. (2013). Directive versus empowering leadership: A field experiment comparing impacts on task proficiency and proactivity. Academy of Management Journal, 56, 1372–1395. Mathieu, J. E., Hollenbeck, J. R., van Knippenberg, D., & Ilgen, D. R. (2017). A century of work teams in the Journal of Applied Psychology. Journal of Applied Psychology, 102, 452–467. Maynes, T. D., & Podsakoff, P. M. (2014). Speaking more broadly: An examination of the nature, antecedents, and consequences of an expanded set of employee voice behaviors. Journal of Applied Psychology, 99, 87–112. McCambridge, J., de Bruin, M., & Witton, J. (2012). The effects of demand characteristics on research participant behaviours in non-laboratory settings: A systematic review. PLoS One, 7, 1–6. McNemar, Q. (1946). Opinion-attitude methodology. Psychological Bulletin, 43, 289–374. Mitchell, G. (2012). Revisiting truth or triviality: The external validity of research in the psychological laboratory. Perspectives on Psychological Science, 7, 109–117. Mitchell, M. S., Vogel, R. M., & Folger, R. (2015). Third parties' reactions to the abusive supervision of coworkers. Journal of Applied Psychology, 100, 1040–1055. Mook, D. (1983). In defense of external invalidity. American Psychologist, 38, 379–387. Mueller, J. (2018). Finding new kinds of needles in haystacks: Experimentation in the course of abduction. Academy of Management Discoveries, 4, 103–108. Murphy, K. R., Herr, B. M., Lockhart, M. C., & Maguire, E. (1986). Evaluating the performance of paper people. Journal of Applied Psychology, 71, 654–661. Nahrgang, J. D., DeRue, D. S., Hollenbeck, J. R., Spitzmuller, M., Jundt, D. K., & Ilgen, D. R. (2013). Goal setting in teams: The impact of learning and performance goals on process and performance. Organizational Behavior and Human Decision Processes, 122, 12–21.

Oakes, W. (1972). External validity and the use of real people as subjects. American Psychologist, 27, 959–962. Orne, M. T. (1962). On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17, 776–783. Orne, M. T. (1969). Demand characteristics and the concept of quasi-controls. In R. Rosenthal, & R. L. Rosnow (Eds.). Artifact in behavioral research (pp. 147–179). New York, NY: Academic Press. Ortmann, A., & Hertwig, R. (2002). The costs of deception: Evidence from psychology. Experimental Economics, 5, 111–131. Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press. Perdue, B. C., & Summers, J. O. (1986). Checking the success of manipulations in marketing experiments. Journal of Marketing Research, 23, 317–326. Peterson, R. A. (2001). On the use of college students in social science research: Insights from a second-order meta-analysis. Journal of Consumer Research, 28, 450–461. Pfeffer, J., & Sutton, R. I. (2006). Hard facts, dangerous half-truths and total nonsense. Boston, MA: Harvard University Press. Piccolo, R. F., Bono, J. E., Heinitz, K., Rowold, J., Duehr, E., & Judge, T. A. (2012). The relative impact of complementary leader behaviors: Which matter most? The Leadership Quarterly, 23, 567–581. Podsakoff, N. P., Podsakoff, P. M., MacKenzie, S. B., & Klinger, R. L. (2013). Are we really measuring what we say we're measuring? Using video techniques to supplement traditional construct validation procedures. Journal of Applied Psychology, 98, 99–113. Podsakoff, N. P., Whiting, S. W., Podsakoff, P. M., & Mishra, P. (2011). Effects of organizational citizenship behaviors on selection decisions in employment interviews. Journal of Applied Psychology, 96, 310–326. Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903. Podsakoff, P. M., MacKenzie, S. B., Moorman, R., & Fetter, R. (1990). The impact of transformational leader behaviors on employee trust, satisfaction, and organizational citizenship behaviors. The Leadership Quarterly, 1, 107–142. Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2012). Sources of method bias in social science research and recommendations on how to control it. Annual Review of Psychology, 63, 539–569. Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2016). Recommendations for creating better concept definitions in the organizational, behavioral, and social sciences. Organizational Research Methods, 19, 159–203. Podsakoff, P. M., & Schriesheim, C. A. (1985). Field studies of French and Raven's bases of social power: Reanalysis, critique, and suggestions for future research. Psychological Bulletin, 97, 387–411. Postman, L. (1955). The probability approach and nomothetic theory. Psychological Review, 62, 218–225. Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55. Rosenthal, R. (1967). Covert communication in the psychological experiment. Psychological Bulletin, 67, 356–367. Rosenthal, R., & Rosnow, R. L. (1969). The volunteer subject. In R. Rosenthal, & R. L. Rosnow (Eds.). Artifact in behavioral research (pp. 41–92). New York, NY: Academic Press. Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research: Methods and data analysis (2nd ed.). New York, NY: McGraw-Hill. Rousseau, D. M. (2012). Envisioning evidence-based management. In D. M. Rousseau (Ed.). The Oxford handbook of evidence-based management (pp. 3–23). New York, NY: Oxford University Press. Rynes, S. L., & Bartunek, J. M. (2017). Evidence-based management: Foundations, development, controversies and future. Annual Review of Organizational Psychology and Organizational Behavior, 4, 235–261. Sawyer, A. G. (1975). Demand artifacts in laboratory experiments in consumer research. Journal of Consumer Research, 1, 20–30. Scandura, T. A., & Williams, E. A. (2000). Research methodology in management: Current practices, trends, and implications for future research. Academy of Management Journal, 43, 1248–1264. Schaubroeck, J., Lam, S. S. K., & Cha, S. E. (2007). Embracing transformational leadership: Team values and the impact of leader behavior on team performance. Journal of Applied Psychology, 92, 1020–1030. Schriesheim, C. A., House, R. J., & Kerr, S. (1976). Leader initiating structure: A reconciliation of discrepant research results and some empirical tests. Organizational Behavior and Human Performance, 16, 297–321. Schriesheim, C. A., & Stogdill, R. M. (1975). Differences in factor structure across three versions of the Ohio State leadership scales. Personnel Psychology, 28, 189–206. Schultz, D. P. (1969). Human subjects in psychological research. Psychological Bulletin, 72, 214–228. Schwab, D. P. (1980). Construct validity in organizational behavior. In L. L. Cummings, & B. Staw (Vol. Eds.), Research in organizational behavior. Vol. 2. Research in organizational behavior (pp. 3–43). Greenwich, CT: JAI Press. Schwab, D. P. (2005). Research methods for organizational studies (2nd ed.). Mahwah, NJ: Lawrence Erlbaum. Schwenk, C. R. (1982). Why sacrifice rigour for relevance? A proposal for combining laboratory and field research in strategic management. Strategic Management Journal, 3, 213–225. Sears, D. O. (1986). College sophomores in the laboratory: Influences of a narrow data base on social psychology's view of human nature. Journal of Personality and Social Psychology, 51, 515. Semadeni, M., Withers, M. C., & Certo, S. T. (2014). The perils of endogeneity and instrumental variables in strategy research: Understanding through simulations. Strategic Management Journal, 35, 1070–1079. Shadish, W. R. (2011). Randomized controlled studies and alternative designs in outcome studies: Challenges and opportunities. Research on Social Work Practice, 21, 636–643. Shadish, W. R., & Cook, T. D. (2009). The renaissance of field experimentation in evaluating interventions. Annual Review of Psychology, 60, 607–629. Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York, NY: Houghton Mifflin. Shearer, B. S. (2004). Piece rates, fixed wages and incentives: Evidence from a field experiment. Review of Economic Studies, 71, 513–534. Shimp, T. A., Hyatt, E. M., & Snyder, D. J. (1991). A critical appraisal of demand artifacts in consumer research. Journal of Consumer Research, 18, 273–283. Sigall, H., & Mills, J. (1998). Measures of independent variables and mediators are useful in social psychology experiments: But are they necessary? Personality and Social Psychology Review, 2, 218–226. Slade, L. A., & Gordon, M. E. (1988). On the virtues of laboratory babies and student bath water: A reply to Dobbins, Lane, and Steiner. Journal of Organizational Behavior, 9, 373–376. Smith, H. (1997). Matching with multiple controls to estimate treatment effects in observational studies. Sociological Methodology, 27, 325–353. Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005). Establishing a causal chain: Why experiments are often more effective than mediational analysis in examining psychological processes. Journal of Personality and Social Psychology, 89, 845–851. Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search. New York, NY: Springer-Verlag. Steffens, N. K., Peters, K., Haslam, S. A., & van Dick, R. (2017). Dying for charisma: Leaders' inspirational appeal increases post-mortem. The Leadership Quarterly, 28, 530–542. Stentz, J. E., Plano Clark, V. L., & Matkin, G. S. (2012). Applying mixed methods to leadership research: A review of current practices. The Leadership Quarterly, 23, 1173–1183. Stone-Romero, E. F. (2002). The relative validity and usefulness of various empirical research designs. In S. G. Rogelberg (Ed.). Handbook of research methods in industrial and organizational psychology (pp. 77–98). Malden, MA: Blackwell. Stone-Romero, E. F., & Rosopa, P. J. (2010). Research design options for testing mediation models and their implications for facets of validity. Journal of Managerial Psychology, 25, 697–712. Stone-Romero, E. F., Weaver, A. E., & Glenar, J. L. (1995). Trends in research design and data analytic strategies in organizational research. Journal of Management, 21, 141–157. Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25, 1–21. Suddaby, R. (2010). Construct clarity in theories of management and organization. Academy of Management Review, 35, 346–357. Taylor, L. A., III, Goodwin, V. L., & Cosier, R. A. (2003). Method myopia: Real or imagined? Journal of Management Inquiry, 12, 255–263. Thyer, B. A. (2012). Quasi-experimental research designs. Oxford, UK: Oxford University Press. Van Knippenberg, D., & Sitkin, S. B. (2013). A critical assessment of charismatic transformational leadership research: Back to the drawing board? Academy of Management Annals, 7, 1–60. Van Witteloostuijn, A. (2015). Toward experimental international business: Unraveling fundamental causal linkages. International Journal of Cross-Cultural Management, 22, 530–544. Vanhove, A. J., & Harms, P. D. (2015). Reconciling the two disciplines of organisational science: A comparison of findings from lab and field research. Applied Psychology: An International Review, 64, 637–673. Weber, S. J., & Cook, T. D. (1972). Subject effects in laboratory research: An examination of subject roles, demand characteristics, and valid inference. Psychological Bulletin, 77, 273–295. Webster, M., Jr., & Sell, J. (2014). Why do experiments? In M. Webster, Jr., & J. Sell (Eds.). Laboratory experiments in the social sciences (pp. 5–21). (2nd ed.). London, UK: Elsevier. Wetzel, C. G. (1977). Manipulation checks: A reply to Kidd. Representative Research in Social Psychology, 8, 88–93. Wofford, J. C. (1999). Laboratory research on charismatic leadership: Fruitful or futile? The Leadership Quarterly, 10, 523–529. Zelditch, M. (1969). Can you really study an army in the laboratory? In A. Etzioni, & E. Lehman (Eds.). A sociological reader on complex organizations (pp. 528–539). New York, NY: Holt, Rinehart and Winston. Zellmer-Bruhn, M., Caligiuri, P., & Thomas, D. C. (2016). From the editors: Experimental designs in international business research. Journal of International Business Studies, 47, 399–407. Zhang, X., & Bartol, K. M. (2010). Linking empowering leadership and employee creativity: The influence of psychological empowerment, intrinsic motivation, and creative process engagement. Academy of Management Journal, 53, 107–128.
