A scenario-based experiment and a field study: A comparative examination for service failure and recovery

International Journal of Hospitality Management 41 (2014) 125–132

Contents lists available at ScienceDirect

International Journal of Hospitality Management journal homepage: www.elsevier.com/locate/ijhosman

Jong-Hyeong Kim a,1, SooCheong (Shawn) Jang b,∗

a School of Tourism Management, Sun Yat-sen University, 135 Road Xin Gang Xi, 510275 Guangzhou, China
b School of Hospitality and Tourism Management, Purdue University, Marriott Hall, 900 W. State Street, West Lafayette, IN 47907, USA

Article info

Keywords: Service failure; Service recovery; Field research; Scenario experiment; Research methodology

Abstract

Scenario-based experiments are an important method in service marketing, especially in research on service failures and service recoveries. Field studies on these topics are rare because of the expense and ethical issues involved in a real setting. This raises a question: can the results obtained from experiments accurately predict real-world field behavior? To obtain more accurate information regarding service failures and recoveries, this study compares the results from a scenario-based experiment with those from a field study. The findings provided mixed support for concordance between the scenario-based experimental results and those obtained in a field setting. Negative emotions, such as anger and discontent toward service failures, were consistent in both cases. However, positive emotions (i.e., contentment with recovery efforts and overall satisfaction) and switching behavioral intentions differed significantly depending on the data source (i.e., scenario or field). Specifically, the scenario experiments tended to overstate positive feelings and understate negative behavioral intentions resulting from service failures. An analysis of these differences suggests practical ways to enhance the design of future scenario-based experiments.

© 2014 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +1 765 496 3610.
1 Tel.: +86 20 8411 4584.
http://dx.doi.org/10.1016/j.ijhm.2014.05.004
0278-4319/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Many critics of experimental methods claim that people’s behavior in the laboratory, as well as their behavior in hypothetical scenarios, is unconnected to their behavior in the field (Falk and Heckman, 2009). A common criticism is that the artificial conditions of an experiment produce unrealistic data (Bardsley, 2005). Bardsley (2005) noted that experimental studies lack the rich, real-life context that may be important for behavior in the field. Moreover, experimental studies may be subject to an experimenter demand effect (Orne, 1962): participants may alter their actions to conform to the behavior that they believe the experimenter desires (e.g., Levitt and List, 2007; Orne, 1962). However, researchers who use hypothetical elicitation methods, such as scenario-based experiments, argue that these methods provide a high degree of internal validity by manipulating and controlling variables. Further, they avoid the expense and ethical issues involved in real settings, such as actual service failures in a restaurant (Bitner, 1990). This controllability allows experimenters to test precise predictions derived from theories and/or models while holding all else constant (Calder et al., 1981).

As discussed above, there is lively debate in the social sciences about field studies versus laboratory experiments. A number of hospitality and service marketing researchers favor the latter methodology for advancing causal knowledge (Falk and Heckman, 2009). A critical assumption underlying the interpretation of data from scenario experiments is that the results gained from this method can be extrapolated to a real-world setting. However, an important criticism by field researchers is that the external validity of scenario-experiment results is questionable, especially in settings involving monetary loss (e.g., service failures) and those where customers’ emotions matter more than cognition (e.g., recovery situations). In an experimental condition, participants are not part of the service setting and therefore do not need to worry about delays, financial loss, or waiting time (Michel, 2001). Previous empirical findings have bolstered criticisms of hypothetical elicitation methods. For example, willingness-to-pay elicited from hypothetical decision tasks almost always exceeds willingness-to-pay elicited from non-hypothetical decision tasks (Little and Berrens, 2004; List and Gallet, 2001; Murphy et al., 2005). As a consequence, for scenario-based experiments to achieve their full potential as an invaluable empirical tool in service marketing, we need to examine whether we can reliably generalize the results of these experiments to real-world settings of interest to marketing researchers.

With this issue in mind, this study investigates the potential correspondence, as well as incongruence, between scenario-based research methods (hypothetical experiment) and field study methods (non-hypothetical setting) regarding customers’ responses to service failures and service recoveries in restaurants. In particular, this study examines the ability of the scenario experiment to predict customers’ emotions (both positive and negative) and behavioral intentions in the real world. To that end, the following research questions are presented and tested in this study: (1) Can the results of scenario-based experiments be generalized to actual environments? (2) If not, under what circumstances might the results correspond to those from the field? (3) If they do not correspond to results obtained in the field, how can the results (e.g., attitudes and behaviors) identified in scenario-based experiments be interpreted and the conclusions applied to natural settings?

2. Literature review

2.1. Scenario-based experiments and field studies

Conducting service failure and recovery research in the field is rare due to a number of challenges, including the expense and time involved (as service failures occur infrequently), ethical issues, and managerial unwillingness to intentionally impose service failures on customers (Bitner et al., 1990). For these reasons, previous empirical studies have used the critical incident technique (CIT), which asks respondents to recall actual critical failure incidents (e.g., Bitner et al., 1990), and written complaints (e.g., Tax et al., 1998). However, researchers have often criticized these methods for various limitations. For example, Singh and Wilkes (1996) noted that the CIT method may be flawed because of respondents’ recall bias (i.e., a tendency to recall the most significant service failures). Moreover, the time lag between the occurrence of a service failure and the respondent’s subsequent description of it may lead to reinterpretation of the incident (Johnston, 1995). Regarding complaint letters, Day et al. (1981) stated that only a minority of dissatisfied customers write them, so complainers are not representative of all dissatisfied customers.

The experimentally generated scenario method, which may overcome the limitations of other methods, has been widely used for studying service failures and recovery. It generally improves internal validity because it allows tight control of the study environment, and this control allows precise predictions derived from a theory or a model to be tested. Schendel and Hofer (1979) supported the use of experimental studies for several reasons. For example, experimental studies are ideal for questions that cannot be addressed through field research owing to access problems and expense. Moreover, the control inherent in experimental studies increases the ability to evaluate causal hypotheses. In the same vein, experimental research may provide an effective method for testing a nomological network. However, lack of realism is the key drawback of this method. Respondents read a hypothetical scenario and are then asked to express how they feel (e.g., anger and satisfaction) about the described situation. Since the respondents are not part of the described service setting, they may not be sufficiently stimulated to have a strong emotional response to the scenario. To overcome this lack of realism, some researchers have focused on recent events, describing a service failure and recovery effort by a business the customer has recently patronized (e.g., Smith and Bolton, 1998). Others have utilized the sequence-oriented problem identification (SOPI) research method, which blueprints a specific service transaction sequentially in a service encounter (Botschen et al., 1996). All these efforts are commendable and help reduce the skepticism surrounding the use of scenario methods. However, because researchers make inferences about real life when conducting scenario-based experiments, one of the most important questions is whether the responses generated by the method accurately predict real-world field behavior. Recently, some researchers have assessed whether results obtained in a lab are echoed in the field and vice versa (e.g., Aaker et al., 2008; Barsky, 2011; Levitt and List, 2007; Lusk and Norwood, 2009). The following section discusses previous studies that have compared results from laboratory experiments with those from field studies.

2.2. Review of previous lab-field comparisons

Two prominent articles have examined the correspondence between laboratory experiments and field studies and reached different conclusions (Camerer, 2011; Levitt and List, 2007). Levitt and List (2007) expressed skepticism about generalizing the findings of laboratory experiments to the field. They contended that human behavior may be sensitive to a number of factors that systematically vary between the laboratory and real-world settings. In particular, they identified five factors in experimental economics through which a laboratory experiment influences human behavior: moral and ethical considerations, scrutiny of one’s actions by others, decision context, self-selection of the experimental subjects, and the stakes. On the other hand, Camerer (2011) highlighted evidence of agreement across the two research methods by discussing six comparisons of laboratory and field studies. Camerer (2011) judged correspondence by whether the two types of studies yielded effects of the same sign, similar coefficients, and correlated results across contexts. After assessing the level of agreement between laboratory and field studies, Camerer concluded that “there is no replicated evidence that experimental economics lab data fail to generalize to central empirical features of field data” (p. 35). With these competing claims in mind, we expanded our review of the literature comparing results across laboratory and field studies in parallel conditions. Table 1 summarizes this stream of research and its findings. As shown in Table 1, results for the correspondence between the two types of studies are mixed: not all laboratory results generalize to a field setting, but they do correspond to field results under some conditions. For example, assessments of emotional responses showed a relatively higher level of correspondence between laboratory and field studies than assessments of behavioral responses. In particular, when a behavior was related to a sensitive issue (e.g., race or dignity), the two methods produced discrepant results. This supports Levitt and List’s (2007) notion that scrutiny of subjects’ actions in an experimental setting causes individuals to respond differently in scenarios than in real settings. To the best of our knowledge, no study in the service marketing literature has compared the results from experimental studies with those in the field, so little is known about the correspondence between scenario-based and real-world studies in this field. Thus, this study set out to examine the correspondence between the results of a scenario-based experiment, one of the most commonly used research methods in service marketing, and those of a field study.


Table 1
Review of previous studies that compared results of laboratory and field experiments.

Valentino et al. (2002)
Context: This study explored the effect of political advertising, with and without racial cues, on support for George W. Bush preceding the 2000 presidential election. The authors conducted both a laboratory and a field study. Subjects in both studies watched identical experimental stimuli (control: product advertisement; neutral condition: no racial cues; white condition: white actors in the video; black and white condition: images of black actors receiving government services).
Findings: The outcome measure in both the laboratory and field study was candidate preference, where Al Gore = −1, no preference = 0, George W. Bush = 1. The mean of this variable in each group represented the percentage lead for George W. Bush. In the laboratory, control < neutral < white < black and white, whereas in the field, control < black and white < neutral < white.
Laboratory-field correspondence: No

Aaker et al. (2008)
Context: While conducting longitudinal experiments both in the field and the laboratory, the authors investigated recollections of mixed emotions. In the field study, MBA students’ feelings after they received their graded exams were assessed in two different time frames (time 1: when students reviewed test scores; time 2: 2 weeks after time 1). In the experimental condition, the research subjects read two advertisements for a fictitious moving company, which were pretested to evoke either happy or mixed emotions. The participants’ emotions were assessed three times (time 1: immediately after they read the advertisements; time 2: 1 week after time 1; time 3: 2 weeks after time 1).
Findings: The results of the two studies consistently showed that mixed emotional experiences are more difficult to recall accurately than unipolar emotional experiences. Specifically, over time people remember mixed emotional experiences as less mixed. However, there was no memory decay effect with unipolar emotional experiences.
Laboratory-field correspondence: Yes

Shang et al. (2008)
Context: This study examined the effects of gender-specific appeals on contributions to a public radio station. Based on common practice in public radio fundraising (i.e., new potential donors are informed of the past donation behavior of others), the authors specifically tested the effect of identity congruency on donation behavior by manipulating gender congruency. In the match condition, male (female) callers were told “we had another member; he (she) contributed $240.” In the mismatch condition, the pronouns were switched. Both field experiments (during an on-air fundraising drive for a public radio station) and laboratory experiments were conducted.
Findings: Four laboratory experiments were conducted, as well as a field experiment. As the results of experiment 3A (among the four) best paralleled those of the field study, we report the findings of a comparison between the field study and experiment 3A. In the field, the subjects donated more in the match condition than in the mismatch condition. However, this pattern was reversed in the laboratory experiment.
Laboratory-field correspondence: No

Barsky (2011)
Context: This research examined influences on unethical work behavior through two studies, a laboratory simulation (i.e., role play) and a field survey (the majority of the study sample were supervisors and managers). The author specifically tested the relationships among mechanisms of moral disengagement, participation, and unethical behavior. The participants in the experiment were asked to play the role of a manager, were presented with a set of items, and were required to indicate what actions they would take to resolve the issues contained within the items. The participants in both studies were asked to rate the items on an unethical deception scale.
Findings: The findings were consistent across the studies: moral justifications and displacement of responsibility were significant predictors of unethical behavior. Moreover, participation in goal-setting (i.e., employees’ voice in the process) was negatively related to the perpetration of unethical behavior.
Laboratory-field correspondence: Yes

Kamins et al. (2011)
Context: This study examined the influence of social factors (i.e., number of bidders and social information about the bidders) on sniping behavior in Internet auctions in a field and a laboratory study. In the field study, the researchers examined actual eBay auctions, whereas a computer simulation that replicated an online auction environment was used in the laboratory experiment. Because real online auctions last days, the same auction duration could not be replicated in the experiment; the subjects in the laboratory experiment joined the simulation only in the last minutes of the auction. In the real setting, a bid was considered sniping when it was placed within the last two minutes of the auction, whereas in the laboratory experiment it was considered sniping only if it was posted in the last 60 s of the auction.
Findings: The results from both the field and the laboratory experiments supported the influence of social factors on sniping behavior: bidders rely on others’ bidding behavior and on their social characteristics as an indication of the true value of the item on sale.
Laboratory-field correspondence: Yes

3. Methods

3.1. Research setting, design, and process

Data were collected on service failure and recovery efforts from real dining experiences and from scenario-based experiences. The scenarios were carefully developed by combining the results from a preliminary study and a review of previous service failure studies (e.g., Silber et al., 2009; Tsai and Su, 2009). We interviewed 95 students at a large university in southern Taiwan using open-ended questions (e.g., recall the most recent restaurant service failure you have experienced and describe how the management responded to the failure). By combining content analysis of the interviews with previous findings in the literature, we identified six types of service failures that potentially give rise to various degrees of negative emotions in a restaurant setting: improperly cooked food, foreign objects in food, inattentive employees, rude/discourteous behavior by employees, slow service, and disordered delivery of dishes. Appropriate service recovery responses to the different service failures were developed using the same procedure. The following scenarios were developed to describe each of these service failure situations:


Improperly cooked food: Some of the dishes you ordered do not taste good and are improperly cooked. You show the dish to the server and mention that it is not properly cooked. After a while, a manager comes to your table and finds that your dishes were indeed improperly cooked. The manager apologizes for the dish defect and asks the server to bring another one.

Foreign object in a dish: While you are eating the dishes that you ordered, you find a bug in your food. You call to the server and show him or her the bug. The server goes to the kitchen and talks with other employees. After a while, a manager comes to your table and apologizes for this occurrence. The manager asks a server to bring another dish and also offers a 50% discount on the meal.

Inattentive employees: While you are eating, you need more water. You try to find your server and discover that he or she is close to your table but does not pay any attention to you. The server seems to avoid eye contact with you and is busy chatting with a coworker. Finally, you call to the server and ask him or her to bring a glass of water. After a while, the server delivers a glass of water and then quickly goes back to chatting with the coworker. A manager eventually comes to your table and apologizes for the server’s attitude and behavior.

Discourteous behavior: While you are eating, you need an extra plate. You look around to find your server and, at that moment, the server passes by your table. You call to the server and ask him or her to bring you an additional plate. After some time, the server arrives with a plate and bangs it down loudly on the table. You do not think the plate slipped from the server’s hand; rather, you feel that he or she dropped the plate on the table deliberately. The server looks very irritated about bringing you the extra plate. You stare at the server because you are shocked by this attitude and behavior. A manager then comes to your table and apologizes for the server’s attitude and behavior.

Slow service: A server informs you that your dishes will be delivered a little late. About 20 min after the server leaves your table, he or she returns, apologizes for the slow service, and asks you to wait a little longer. After waiting for another 20 min, the server arrives with your food. While you are eating your food, the server comes to your table and asks you to choose a free dessert because of the slow service.

Disordered delivery of dishes: While you are waiting for your food, you find that customers at other tables who arrived later than you have already started their meal. After you wait for a while, the server arrives with your food. The server is aware of your dissatisfaction with the order in which the customers were served and apologizes.

Data were collected in Kaohsiung, the second largest city in Taiwan. The data collection was designed to cover a wide range of locations during both weekdays and weekends, and respondents were contacted at different downtown restaurants. A group of eight students who are able to communicate in Mandarin and Taiwanese were recruited and trained by one of the coauthors. The interviewers intercepted restaurant customers who had finished their meals and were exiting restaurants between 12:00 pm and 3:00 pm and between 6:00 pm and 9:00 pm. The restaurant customers were first asked whether they had experienced any service failures during their dining experience. Those who had experienced failures and agreed to participate in the study proceeded to a short interview and received a survey questionnaire. Those who had not experienced any service failures and agreed to participate in the study received a written scenario describing a service failure situation. They were asked to imagine that they were dining at the same restaurant where they had just finished their meal.
The participants were randomly assigned to one of the service failure scenarios. The participants in both the real experience group and the scenario group were asked to rate their negative emotions toward the service failure, their positive emotions toward the service recovery efforts, their overall satisfaction, and their behavioral intentions (i.e., customer loyalty and switching behavior).

3.2. Measures

The questionnaire included manipulation checks for the simulated service failures. The participants were asked to evaluate the dining experience presented in the scenario on a 7-point scale (i.e., whether dining in this restaurant was a wise choice and overall satisfaction with the experience). Then, they were asked to indicate their negative feelings regarding the service failure, their positive feelings about the service recovery efforts, and their overall satisfaction. Multi-item scales were used to measure each construct in this study; validated scales from previous studies were identified and modified to fit the study setting. All items were measured on a 7-point Likert-type scale ranging from strongly disagree (1) to strongly agree (7). Emotions were measured with the constructs of anger, discontent, and contentment adopted from Richins (1997) and Smith and Bolton (2002). Anger was assessed with three items (Cronbach’s α = .90): enraged, furious, and incensed. Discontent was evaluated with four items (Cronbach’s α = .91): distressed, discontented, annoyed, and frustrated. Contentment with the recovery efforts was assessed with three items (Cronbach’s α = .92): satisfied, pleased, and happy. Overall satisfaction was measured with two items (Cronbach’s α = .90) adapted from Maxham and Netemeyer (2002): “Overall, I am pleased with the dining experience,” and “Overall, I am satisfied with the dining experience.” Behavioral intentions were assessed using items adapted from Athanassopoulos et al. (2001) and Bansal et al. (2004).
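The reliability of each multi-item scale in this section (including the behavioral-intention scales) is reported as Cronbach’s α, defined for a k-item scale as α = k/(k − 1) · (1 − Σ item variances / variance of the summed scale score). As a minimal sketch of that formula in Python (the function name and the ratings below are hypothetical illustrations, not the study’s data or analysis code):

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a k-item scale.

    `items` holds one list per item; position r in every list is the score
    of the same respondent r.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("items need equal lengths and at least two respondents")

    # Summed scale score for each respondent.
    totals = [sum(item[r] for item in items) for r in range(n)]
    # Sum of the individual item variances (sample variances, n - 1 denominator).
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 7-point ratings from five respondents on the three anger items.
enraged = [2, 4, 5, 6, 7]
furious = [1, 4, 5, 6, 7]
incensed = [2, 3, 5, 7, 6]

alpha = cronbach_alpha([enraged, furious, incensed])
print(f"alpha = {alpha:.3f}")
```

With these made-up scores the item variances sum to 13.3 and the summed-score variance is 38, so α = 1.5 × (1 − 0.35) = 0.975. Using sample variances in both numerator and denominator keeps the ratio consistent; any common denominator gives the same α.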
Switching behavior was assessed using three items (Cronbach’s α = .74): “I will look for another restaurant next time,” “I will eat at another restaurant next time,” and “I will not dine at this restaurant again.” Customer loyalty was measured with three items (Cronbach’s α = .86): “I would like to visit this restaurant again in the near future,” “I plan to revisit this restaurant again in the near future,” and “I will make an effort to revisit this restaurant in the near future.”

4. Results

4.1. Sample characteristics

A total of 3105 restaurant customers were approached. Of those, 117 (3.8%) had experienced service failures and agreed to participate in the study. Among the 117 failure cases, 20 included multiple failures (e.g., both process and interpersonal failures), and six were related to the restaurant environment; these were not considered further. Of the 91 failure experiences considered in this study, 17 were core failures (5 no recovery; 12 recovery), 34 were interpersonal failures (26 no recovery; 8 recovery), and 40 were procedural failures (14 no recovery; 26 recovery). As the current study specifically focused on situations where individuals receive service recovery following a failure, incidents where customers did not receive any recovery were excluded from the data analysis, leaving 46 cases (12 + 8 + 26) for further analysis. Of the 46 participants, 69.6% were female and 30.4% male. Since there was a substantial gender imbalance in the sample, we conducted chi-square tests to investigate whether gender was related to the emotions and behavioral intentions resulting from service failures and recoveries. The results showed no significant association between gender and any of these ratings (anger: χ²(13, N = 46) = 19.0, p = .123; discontent: χ²(14, N = 46) = 18.58, p = .182; content: χ²(16, N = 46) = 16.36, p = .428; overall satisfaction: χ²(11, N = 46) = 15.58, p = .157; loyalty: χ²(14, N = 46) = 13.61, p = .479; switching: χ²(14, N = 46) = 17.72, p = .220). The age of the participants ranged from 18 to 47 years, with an average age of 23.6 years. The majority of the sample was between 18 and 23 years old (65.2%), followed by participants aged between 24 and 29 (28.3%), 30 and 45 (4.3%), and 46 and 64 years (2.2%).

For the scenario data, 368 participants were recruited for the scenario-based experiment. Of these, 54.1% were female and 45.9% male. Their ages ranged from 18 to 73 years, with an average of 25.8 years. The majority were between 18 and 23 years old (64.1%), followed by participants between 24 and 29 (14.4%), 30 and 45 (13.3%), 46 and 64 (6.5%), and 65 or above (1.6%).

Table 2
Comparison between the mean ratings of the field study and the scenario experiment.

Variable | Field study (A) | Scenario experiment (B) | Mean difference (C = B − A) | Percentage difference (C/A) | p value | Remarks
Anger | 4.62 | 4.48 | −0.14 | −3.03% | .514 | 3.03% decrease in scenario data
Discontent | 5.09 | 4.97 | −0.12 | −2.36% | .517 | 2.36% decrease in scenario data
Content | 3.60 | 4.66 | 1.06 | 29.44% | .000 | 29.44% increase in scenario data
Overall satisfaction | 4.01 | 4.44 | 0.43 | 10.72% | .033 | 10.72% increase in scenario data
Switching | 4.64 | 4.26 | −0.38 | −8.19% | .015 | 8.19% decrease in scenario data
Loyalty | 3.84 | 4.13 | 0.29 | 7.55% | .096 | 7.55% increase in scenario data

Before we tested the correspondence between the scenario-based experiment and the field study, we reviewed the field interviews (i.e., descriptions of service failures and recoveries) to check their consistency with the failure situations projected in the scenarios. The types of service failures in the two studies were highly comparable: respondents in the field study experienced all of the service failures described in the scenarios, namely a foreign object in a dish (8.7%), improperly cooked food (17.4%), poor behavior (13.0%), poor employee attitudes (4.3%), slow service (28.2%), and disordered delivery (21.7%). The only type of service failure identified in the real-world setting that was not included in a scenario was delivery of the wrong dish (6.5%). The service recoveries that customers received in the real-world setting were also quite consistent with those developed in the scenarios. For core service failures, customers received high levels of recovery (e.g., discounts or a free dish). The only difference between the core service failures in the field study and the scenarios was that one subject received a souvenir (i.e., a mug) following a core service failure (i.e., improperly cooked food).
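The failure-type shares reported above can be cross-checked against the recovery counts given earlier in this section (12 core, 8 interpersonal, and 26 procedural cases with recovery, N = 46). The Python sketch below converts each share back to a whole number of cases and sums the counts by failure group. The grouping is partly an assumption: treating employees’ attitudes and behaviors as interpersonal failures and slow service and disordered delivery as procedural failures follows the text, whereas placing foreign objects with core failures and wrong-dish delivery with procedural failures is inferred, as the assignment under which the implied counts reproduce the reported split. The constant and function names are hypothetical.

```python
# Shares (%) of the N = 46 retained field cases by failure type, as reported.
N_FIELD = 46
REPORTED_SHARE = {
    "foreign object in a dish": 8.7,
    "improperly cooked food": 17.4,
    "poor behavior": 13.0,
    "poor employee attitudes": 4.3,
    "slow service": 28.2,
    "disordered delivery": 21.7,
    "wrong dish": 6.5,
}

# Assumed mapping of failure types to the three failure groups (see lead-in).
FAILURE_GROUP = {
    "foreign object in a dish": "core",
    "improperly cooked food": "core",
    "poor behavior": "interpersonal",
    "poor employee attitudes": "interpersonal",
    "slow service": "procedural",
    "disordered delivery": "procedural",
    "wrong dish": "procedural",
}


def implied_counts(shares: dict[str, float], n: int) -> dict[str, int]:
    """Convert rounded percentage shares back to whole-case counts."""
    return {failure: round(share / 100 * n) for failure, share in shares.items()}


def group_totals(counts: dict[str, int], groups: dict[str, str]) -> dict[str, int]:
    """Sum case counts within each failure group."""
    totals: dict[str, int] = {}
    for failure, count in counts.items():
        group = groups[failure]
        totals[group] = totals.get(group, 0) + count
    return totals


counts = implied_counts(REPORTED_SHARE, N_FIELD)
totals = group_totals(counts, FAILURE_GROUP)
assert sum(counts.values()) == N_FIELD  # the implied counts cover every retained case
print(counts)
print(totals)
```

The implied counts (4, 8, 6, 2, 13, 10, and 3 cases) sum to 46 and give group totals of 12 core, 8 interpersonal, and 26 procedural cases, matching the retained recovery cases.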
Regarding interpersonal service failures, such as employees’ attitudes and behaviors, the majority of the subjects (63%) received a low level of recovery (e.g., an apology), whereas the others received a high level of recovery (e.g., discounts). Some differences were identified in the field study for the remedies following procedural service failures (i.e., slow service and disordered delivery of dishes). In the real-world setting, the majority of customers (77.0%) who experienced slow service received a low level of recovery, whereas in the scenario the subjects received a free dessert. Moreover, more than half of the customers (60%) who experienced disordered delivery in the real-world setting received a high level of recovery (e.g., discounts), whereas the scenario provided a low level of recovery (i.e., an apology). The recoveries for these two procedural failures were therefore somewhat mismatched between the field and the scenarios (the scenario recovery was higher for slow service and lower for disordered delivery). However, as the numbers of cases were similar (slow service: 13; disordered service: 10) and both were included in the same failure group, we believe the possible influence of these unequal recoveries on the correspondence between the scenario and the field was canceled out.

4.2. Manipulation checks of scenario data

To examine the effects of the recovery manipulation in the study, the mean ratings of the evaluations of the dining experiences in the recovery group were compared with those in a no-recovery group. The customers’ evaluation ratings were represented by the mean value of contentment with the service recovery efforts. To this end, no-recovery groups for the same service failures were added to the data set, and t-tests were conducted on this variable. The results showed that the participants in the recovery group reported higher mean scores than those in the no-recovery group (Mrecovery = 4.54, Mno recovery = 2.57, t = 23.38, p < .001).

4.3. Comparing analysis results between real dining experiences and scenarios

To determine whether the results from the two data sources corresponded to each other, we conducted t-tests on the dependent variables. Although the mean differences were not significant, the scenario experiments seemed to understate negative emotions toward service failures (anger: Mfield = 4.62, Mscenario = 4.48, n.s., scenario mean 3.03% below the field mean; discontent: Mfield = 5.09, Mscenario = 4.97, n.s., scenario mean 2.36% below the field mean). The scenarios also appeared to overstate positive emotions. For example, the results of the t-test showed that participants in the real-world setting were less satisfied with the recovery efforts than those in the scenario (Mfield = 3.60, Mscenario = 4.66, t = −5.73, p < .001; scenario mean 29.44% above the field mean). Similarly, participants in the field had a lower mean score on overall satisfaction than those in the scenario group (Mfield = 4.01, Mscenario = 4.44, t = −2.14, p < .05; scenario mean 10.72% above the field mean). Moreover, the mean rating of switching behavioral intentions differed by data source (t = 2.44, p < .05; scenario mean 8.19% below the field mean).
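The mean and percentage differences in Table 2 are simple functions of the two group means: C = B − A and C/A, where A is the field-study mean and B the scenario mean. The Python sketch below (a hypothetical helper, not the authors’ analysis code) recomputes both columns from the Table 2 means; it does not reproduce the t-tests, which would require the raw ratings or the group standard deviations.

```python
# Table 2 means: variable -> (field study A, scenario experiment B).
TABLE2_MEANS = {
    "Anger": (4.62, 4.48),
    "Discontent": (5.09, 4.97),
    "Content": (3.60, 4.66),
    "Overall satisfaction": (4.01, 4.44),
    "Switching": (4.64, 4.26),
    "Loyalty": (3.84, 4.13),
}


def mean_and_percentage_difference(field: float, scenario: float) -> tuple[float, float]:
    """Return (C, 100 * C / A) with C = B - A, both rounded to two decimals."""
    c = round(scenario - field, 2)
    # The percentage is relative to the field mean A, so a negative value means
    # the scenario mean falls below the field mean.
    pct = round(100 * c / field, 2)
    return c, pct


for variable, (a, b) in TABLE2_MEANS.items():
    c, pct = mean_and_percentage_difference(a, b)
    print(f"{variable:<21} C = {c:+.2f}   C/A = {pct:+.2f}%")
```

Applied to the six rows, the helper reproduces the mean-difference and percentage-difference columns (e.g., Content: C = 1.06, C/A = 29.44%). Because C/A is taken relative to the field mean, each percentage describes how far the scenario mean lies above or below the field mean.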
The participants who actually experienced service failures in a restaurant were more likely to switch to another restaurant than those who experienced them in a scenario (Mfield = 4.64, Mscenario = 4.26) (Table 2). To examine more closely which failure types and/or situations yielded incongruent results between the two studies, we matched the samples from the field study and the scenario experiment by failure type (i.e., core, interpersonal, and procedural) and conducted a series of t-tests on the dependent measures. As reported in Table 3, the largest group differences were found for procedural failures, where the mean ratings for four of the six dependent measures differed significantly. For example, the respondents in the field study reported significantly higher anger (Mfield = 4.74, Mscenario = 4.20, t = 2.26, p < .05) and discontent (Mfield = 5.27, Mscenario = 4.74, t = 2.57, p < .05) than those in the scenario experiment. Moreover, the respondents in the field study were significantly less satisfied with the recovery efforts than those in the scenario (Mfield = 3.43, Mscenario = 4.78, t = −5.75, p < .001). In the same vein, the respondents who actually experienced a procedural failure showed higher intentions to switch to another restaurant than those who read a scenario (Mfield = 4.55, Mscenario = 4.10, t = 2.39, p < .05). However, for core and interpersonal failures, the differences for most of the dependent variables were minimal. The exceptions were as follows: respondents who experienced core service failures in the field showed significantly higher intentions to switch to another restaurant than those who experienced the same failures in a scenario (Mfield = 5.06, Mscenario = 4.30, t = 2.31, p < .05), and respondents who experienced interpersonal failures in the field evaluated the service recovery efforts as significantly less satisfying than those who experienced these failures in the scenarios (Mfield = 3.59, Mscenario = 4.55, t = −2.31, p < .05). Figs. 1 and 2 portray the mean ratings of emotions and behavioral intentions by data source and failure type.

Table 3
Mean ratings of dependent variables by service failure condition.

| Variable | Core: Real (Mean ± SD) | Core: Scenario (Mean ± SD) | Core: t | Interpersonal: Real (Mean ± SD) | Interpersonal: Scenario (Mean ± SD) | Interpersonal: t | Procedural: Real (Mean ± SD) | Procedural: Scenario (Mean ± SD) | Procedural: t |
|---|---|---|---|---|---|---|---|---|---|
| Anger | 4.72 ± 1.37 | 4.57 ± 1.51 | .33 | 4.04 ± .26 | 4.64 ± 1.30 | −1.38 | 4.74 ± 1.13 | 4.20 ± 1.09 | 2.26* |
| Discontent | 5.06 ± 1.14 | 5.04 ± 1.33 | .06 | 4.58 ± .65 | 5.11 ± 1.11 | −1.41 | 5.27 ± .97 | 4.74 ± .94 | 2.57* |
| Content | 3.96 ± 1.48 | 4.68 ± 1.33 | −1.77 | 3.31 ± 1.35 | 4.55 ± 1.14 | −3.14** | 3.43 ± 1.85 | 4.78 ± .81 | −5.75*** |
| Overall satisfaction | 3.83 ± 1.25 | 4.48 ± 1.27 | −1.69 | 3.94 ± .77 | 4.27 ± 1.17 | −.82 | 4.10 ± 1.56 | 4.47 ± .95 | −1.60 |
| Loyal behavior | 3.67 ± 1.54 | 4.12 ± 1.18 | −1.24 | 3.89 ± 1.01 | 4.01 ± 1.15 | −.31 | 3.91 ± 1.42 | 4.27 ± .76 | −1.79 |
| Switching behavior | 5.06 ± 1.07 | 4.30 ± 1.08 | 2.31* | 4.33 ± 1.12 | 4.36 ± .97 | −.09 | 4.55 ± 1.22 | 4.10 ± .76 | 2.39* |

Note. * p < .05. ** p < .01. *** p < .001.

Fig. 1. Comparison between scenario and field data (Emotions).

Fig. 2. Comparison between scenario and field data (Behavioral Intentions).

5. Discussion

The results revealed differences between real-world experiences and the scenario data. This suggests that researchers need to pay special attention when interpreting findings from scenario experiments, especially when drawing inferences about field behavior from them. Overall, the mean ratings for positive emotions (i.e., contentment and overall satisfaction) in the field study were lower than those in the scenario experiment. Similarly, respondents in the field study showed higher mean ratings for switching intentions and lower ratings for loyalty behaviors than those in the scenario group. Therefore, this study suggests that the mean ratings of positive emotions, as well as of switching behaviors, obtained from scenarios need to be re-evaluated to accurately predict field behavior.

There may be a number of explanations for the discordance between the results of the field and scenario studies. First, the subjects in the scenario read about a hypothetical situation, whereas in the field study they actually experienced a failure and a loss, such as of money or time. Therefore, although the scenario simulates a real-life dining experience, the subjects’ engagement with the dining experience could be quite different. Moreover, because scenarios are artificial, the subjects could view the scenario more from an observer’s perspective than from the perspective of a directly involved customer. As a result, their responses to the scenario questions may be based on a cognitive (i.e., rational) evaluation of each situation, and they may not fully empathize emotionally with the failure situation. In the same vein, the experience of an actual procedural service failure in a real setting, in which customers have waited for a long time to be seated and/or to receive their food, cannot be replicated by simulated scenarios. Accordingly, the dependent variables associated with this type of failure differed significantly between the field-based and the scenario-based data. The findings suggest that procedural service failure scenarios should be carefully developed to simulate the same degree of emotion that customers experience in real settings. The scenarios should include a description of the environment (e.g., waiting areas) and of the customer’s level of hunger.

In the current study, the mean ratings of positive emotions, such as contentment with recovery efforts and overall satisfaction, were significantly lower in the field setting than in the scenario experiment. Therefore, the values of positive emotions obtained in scenario experiments should be somewhat discounted to correctly infer those in the field. This kind of incongruence between the two data sources is consistent with previous research that reported “demand effects” resulting from the scenario method (e.g., McCollough et al., 2000). Experimenter demand effects refer to changes in the behavior of study subjects due to cues about what the researcher is trying to investigate, what the researcher anticipates finding, and what this implies regarding how participants are expected to behave (Orne, 1962). In the current scenario-based experiment, following previous researchers who used the sequence-oriented problem identification (SOPI) method (e.g., Botschen et al., 1996), the scenario described a dining service transaction in a full-service restaurant sequentially (i.e., reception, ordering, consumption, and check-out). Moreover, the descriptions of the service stages containing routine service, service failure, and service recovery were separated. Therefore, the emotional ratings of the subjects in the scenario experiment are somewhat transaction-specific.
However, in a real dining setting, customers make a cumulative evaluation that takes into account multiple transactions and/or interactions at different stages of the dining experience. As a result, when scenario respondents read a description of recovery efforts, they may have focused more on that specific event than customers in a real setting would, and rated their emotions accordingly. This divergence between scenario and field data could be especially severe when researchers highlight the descriptions of failure and recovery situations in a scenario (e.g., by marking the failure incident in a different color, in bold type, or in a different font size). Another interesting result of this study is that the mean ratings for negative feelings toward service failures were generally higher for the field group than for the scenario group, except for interpersonal failures. One plausible explanation relates to the customer’s relationship with the employees and/or the restaurant. Previous researchers noted that the customer-business relationship moderates negative feelings toward failures and the effects of recovery (e.g., Kim et al., 2012; Kwon and Jang, 2012). The respondents in the real-world group may already have relationships with, or be familiar with, the service employees from previous visits. Therefore, they may have a more lenient attitude toward interpersonal failures than those in the scenario group.

6. Conclusions

This study compared the emotional responses and behavioral intentions elicited by a scenario-based experiment with those elicited by real failure experiences. Negative emotions, including anger and discontent with service failures, were quite consistent across the two groups. However, positive emotions and switching behavioral intentions differed considerably. The results provide some insights for service marketing researchers who use scenario-based experiments. For example, they indicate that scenario-based experiments are particularly effective for measuring variables that are based on cognitive evaluations (e.g., assessment of negative feelings). They also suggest that scenarios for studying procedural and interpersonal service failures should be carefully developed to create real-world-like experiences. The strength of experimental studies is that they allow researchers to examine true causal relations between variables while controlling for confounding variables (e.g., Calder et al., 1981). However, researchers need to develop scenarios that include experiences very similar to those encountered in the field so that the scenarios simulate reality as accurately as possible. In the field, full-service restaurants generally develop strategies to reduce customers’ perceived waiting time, which can decrease the associated negative emotions (Kim, 2011). For example, it is common practice for restaurants to modify the waiting environment by providing a newsstand, a television set, or music in the waiting area. According to Zakay’s (1989) allocation model, the perceived amount of time is longer when an individual becomes conscious of the passage of time.
On the other hand, time seems to fly when an individual is focused on an activity and/or is having fun. Therefore, the intensity of the negative emotions elicited by a description of slow service could differ significantly from that of the experience in a real setting. As a result, scenarios should describe the environment and convey the emotional intensity of the failure, which is necessary to elicit realistic emotional effects. According to the literature (e.g., Kim et al., 2012; Kwon and Jang, 2012), the customer-business relationship affects respondents’ evaluations of interpersonal service failures in the food service sector. Therefore, researchers should develop effective ways to simulate real experiences in a scenario. For example, they can indicate the customer’s relationship with the business (e.g., a first-time visit) and use real names for the service employees.

To conclude, based on the results of the current study, we suggest that researchers acknowledge the differences between responses obtained in scenario experiments and in the field regarding positive emotions and switching behavioral intentions. Researchers should therefore pay special attention when interpreting their scenario experiment results and re-evaluate them in order to predict field behavior accurately. We also believe that the current study offers strategies to overcome the limitations of scenario-based experiments and that, as a result, researchers in the area of service marketing are in a better position to develop effective research methods to study service failures.

Although the results of this study have useful implications for research, some limitations must be considered. First, we conducted a preliminary study and an extensive literature review to develop appropriate recoveries for each service failure. However, some recoveries developed in the scenarios, especially for procedural failures, did not accurately parallel those that customers received in real settings. In the scenarios, a higher recovery was provided for slow service and a lower recovery for disordered delivery service. Since the numbers of cases in the field study were almost equal (i.e., slow service: 13; disordered service: 10) and both were included in the same failure group, we believe the possible influence of any unequal recoveries on the correspondence between the scenario and the field was canceled out. However, this mismatch may nonetheless have affected the correspondence between the scenario-based experiment (i.e., procedural failures) and the field study, especially regarding subjects’ evaluations of service recoveries and future behavioral intentions. Moreover, the data were collected from Taiwanese restaurant customers. Therefore, the generalizability of the findings is limited.
As Mattila (1999) noted, Asian customers’ evaluations of service encounters are significantly lower than those of Westerners. Mattila (1999) attributed this to the strong power relations that exist in Asian countries, where service employees of lower social status are required to provide customers with high levels of service. Western countries, on the other hand, are less accepting of social status differences and tend to expect more egalitarian service. Therefore, the results pertaining to interpersonal service failures in particular might differ if this study were replicated in a Western country. Another drawback of this study is that we could not control for customer relationships in the real-experience group because of the small sample size. Considering that previous researchers have found customer relationships to be a significant factor in buffering failure situations (e.g., Kim et al., 2012; Kwon and Jang, 2012), care is needed when interpreting the results. Future research that includes multiple facets of the customer-business relationship (e.g., trust and commitment), as well as customer expectations and previous experiences, is therefore required to examine the moderating role of the customer-business relationship in the subjective evaluation of negative and positive emotions. Finally, to enhance the generalizability of the findings, future research should be expanded to include data from different populations and/or different service industries.

References

Aaker, J., Drolet, A., Griffin, D., 2008. Recalling mixed emotions. J. Consum. Res. 35 (August), 268–278.
Athanassopoulos, A., Gounaris, S., Stathakopoulos, V., 2001. Behavioural responses to customer satisfaction: an empirical study. Eur. J. Marketing 35 (5/6), 687–707.
Bansal, H.S., Irving, P.G., Taylor, S.F., 2004. A three-component model of customer commitment to service providers. J. Acad. Marketing Sci. 32 (3), 234–250.
Bardsley, N., 2005. Experimental economics and the artificiality of alteration. J. Econ. Methodol. 12, 239–251.
Barsky, A., 2011. Investigating the effects of moral disengagement and participation on unethical work behavior. J. Business Ethics 104, 59–75.
Bitner, M.J., 1990. Evaluating service encounters: the effect of physical surroundings and employee responses. J. Marketing 54 (2), 69–82.
Bitner, M.J., Booms, B.H., Tetreault, M.S., 1990. The service encounter: diagnosing favorable and unfavorable incidents. J. Marketing 54 (1), 71–84.
Botschen, G., Bstieler, L., Woodside, A.G., 1996. Sequence-oriented problem identification within service encounters. J. Euromarketing 5 (2), 19–52.
Calder, B.J., Phillips, L.W., Tybout, A.M., 1981. Designing research for application. J. Consum. Res. 8, 197–207.
Camerer, C., 2011. The promise and success of lab-field generalizability in experimental economics: a critical reply to Levitt and List. Available at Social Science Research Network: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1977749
Day, R.L., Grabicke, K., Schaetzle, T., Stauback, F., 1981. The hidden agenda of consumer complaining. J. Retail. 57 (Fall), 86–106.
Falk, A., Heckman, J.J., 2009. Lab experiments are a major source of knowledge in the social sciences. Science 326 (5952), 535–538.
Johnston, R., 1995. The zone of tolerance: exploring the relationship between service transactions and satisfaction with the overall service. Int. J. Service Ind. Manage. 6 (2), 46–61.
Kamins, M.A., Noy, A., Steinhart, Y., Mazursky, D., 2011. The effect of social cues on sniping behavior in Internet auctions: field evidence and a lab experiment. J. Interact. Marketing 25, 241–250.
Kim, J.-H., 2011. Application of the concept of multiphase experience to wait management in restaurant experiences. Asia Pacific J. Tourism Res. 16 (4), 379–394.
Kim, W., Ok, C., Canter, D.D., 2012. Moderating role of a priori customer-firm relationship in service recovery situations. Service Ind. J. 32 (1), 59–82.
Kwon, S.Y., Jang, S., 2012. Effects of compensation for service recovery: from the equity theory perspective. Int. J. Hospitality Manage. 31, 1235–1243.
Levitt, S.D., List, J.A., 2007. Viewpoint: on the generalizability of lab behavior to the field. Can. J. Econ. 40 (2), 347–370.
List, J.A., Gallet, C.A., 2001. What experimental protocol influence disparities between actual and hypothetical stated values? Environ. Resour. Econ. 20, 241–254.
Little, J., Berrens, R., 2004. Explaining disparities between actual and hypothetical stated values: further investigation using meta-analysis. Econ. Bull. 3, 1–13.
Maxham III, J.G., Netemeyer, R.G., 2002. Modeling customer perceptions of complaint handling over time: the effects of perceived justice on satisfaction and intent. J. Retail. 78 (4), 239–252.
McCollough, M.A., Berry, L.L., Yadav, M.S., 2000. An empirical investigation of customer satisfaction after service failure and recovery. J. Service Res. 3 (2), 121–137.
Michel, S., 2001. Analyzing service failures and recoveries: a process approach. Int. J. Service Ind. Manage. 12 (1), 20–33.
Murphy, J.J., Allen, P.G., Stevens, T.H., Weatherhead, D., 2005. A meta-analysis of hypothetical bias in stated preference valuation. Environ. Resour. Econ. 30, 313–325.
Orne, M.T., 1962. On the social psychology of the psychological experiment: with particular reference to demand characteristics and their implications. Am. Psychol. 17, 776–783.
Richins, M.L., 1997. Measuring emotions in the consumption experience. J. Consum. Res. 24 (September), 127–146.
Schendel, D.E., Hofer, C.W., 1979. Strategic Management. Little, Brown, Boston.
Shang, J., Reed II, A., Croson, R., 2008. Identity congruency effects on donations. J. Marketing Res. 45, 351–361.
Silber, I., Israeli, A., Bustin, A., Zvi, O.B., 2009. Recovery strategies for service failures: the case of restaurants. J. Hospitality Marketing Manage. 18 (7), 730–740.
Singh, J., Wilkes, R.E., 1996. When consumers complain: a path analysis of the key antecedents of consumer complaint response estimates. J. Acad. Marketing Sci. 24 (4), 350–365.
Smith, A.K., Bolton, R.N., 1998. An experimental investigation of customer reactions to service failure and recovery encounters: paradox or peril? J. Service Res. 1 (August), 65–81.
Smith, A.K., Bolton, R.N., 2002. The effect of customers’ emotional responses to service failures on their recovery effort evaluations and satisfaction judgments. J. Acad. Marketing Sci. 30 (1), 5–23.
Tax, S.S., Brown, S.W., Chandrashekaran, M., 1998. Customer evaluations of service complaint experiences: implications for relationship marketing. J. Marketing 62 (April), 60–76.
Tsai, C.-T., Su, C.-S., 2009. Service failures and recovery strategies of chain restaurants in Taiwan. Service Ind. J. 29 (12), 1779–1796.
Valentino, N.A., Traugott, M.W., Hutchings, V., 2002. Group cues and ideological constraint: a replication of political advertising effects studies in the lab and in the field. Political Commun. 19, 29–48.
Zakay, D., 1989. An integrated model of time estimation. In: Levin, I., Zakay, D. (Eds.), Time and Human Cognition: A Life Span Perspective. North Holland, Amsterdam, pp. 365–395.