Accepted Manuscript Impertinent mobiles - Effects of politeness and impoliteness in human-smartphone interaction
Astrid Carolus, Ricardo Muench, Catharina Schmidt, Florian Schneider
PII: S0747-5632(18)30616-2
DOI: 10.1016/j.chb.2018.12.030
Reference: CHB 5852
To appear in: Computers in Human Behavior
Received Date: 06 January 2018
Accepted Date: 18 December 2018
Please cite this article as: Astrid Carolus, Ricardo Muench, Catharina Schmidt, Florian Schneider, Impertinent mobiles - Effects of politeness and impoliteness in human-smartphone interaction, Computers in Human Behavior (2018), doi: 10.1016/j.chb.2018.12.030
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Impertinent mobiles - Effects of politeness and impoliteness in human-smartphone interaction
Astrid Carolus, Ricardo Muench, Catharina Schmidt, Florian Schneider Julius-Maximilians-University Wuerzburg, Germany
Corresponding author. Media Psychology, Institute Human-Computer-Media, Julius-Maximilians-University, Wuerzburg, Oswald-Kuelpe-Weg 82, 97074 Wuerzburg, Germany E-mail address:
[email protected]
Abstract
This study aims to provide first insights into human-smartphone interaction by focusing on the effects of smartphones “speaking” either politely or impolitely. Following the idea of media equation and the corresponding paradigm “computers as social actors” (CASA), smartphones are conceptualized as social agents assumed to elicit social responses in their human users (Nass, Steuer, & Tauber, 1994). In a laboratory experiment, participants (n = 85) interacted with a talking phone, which replied to them either politely or impolitely. Participants evaluated this phone twice, before and after they had received the phone’s feedback. An ANOVA revealed that polite phones were evaluated significantly better than impolite phones. Comparing evaluations before and after the feedback showed that polite phones were rated more positively afterwards regarding friendliness but not regarding competence. In contrast, the second evaluation of impolite phones deteriorated on both dimensions: friendliness and competence. Furthermore, results were not affected by ownership (subject’s own vs. not subject’s own phone). However, the gender of the phone (female vs. male voice) affected the evaluation: impolite male phones were evaluated less positively regarding their competence, whereas impolite female phones were not. Transferring the CASA paradigm to “talking smartphones” is considered a heuristically fruitful approach to further analyze humans interacting with phones as well as with speech assistants in general. Results are discussed as an empirical contribution to conceptualizing “smartphones as social actors” (SASA), which activate social norms originally exclusive to human-human interactions.
Keywords: smartphone, media equation, social norms, vocal social agents, human-machine interaction, mobile phone, CASA, impoliteness
1. Introduction
Over the last few years, new technologies have emerged that support and simplify our everyday lives. Smartphones have turned out to be one of the most popular technological innovations, offering a variety of functions and supporting their users in a variety of everyday challenges. For example, health apps provide information on healthy living, monitor their users’ state of health, remind them to drink more water, or recommend training exercises (Stock, 2017). To do so, the device needs to give feedback on the user’s behavior. Consequently, if users do not accomplish their goals, the phone will give negative feedback, either by pushing a textual notification or by verbal output. However, negative evaluations might trigger negative emotional reactions. The feedback or even the sender of that feedback, the phone itself, might be devalued. From a rational point of view, this reaction seems inappropriate. After all, it was the user who installed the app and did not accomplish the goal. Furthermore, the user knows that the device does not act intentionally or consciously. Nevertheless, the user might react emotionally. Similarly, think of yourself: Have you never been upset with your computer for freezing and not saving the latest version of your document? A psychological perspective asks for fundamental cognitions, emotions and motivations to complement the technological view and gain deeper insights into human-computer interaction. A series of experimental studies focused on users interacting with desktop PCs, revealing the tendency to automatically react to computers as if they were human beings (e.g. Reeves & Nass, 1996; Fogg & Nass, 1997a; Nass & Lee, 2001; Johnson & Gardner, 2007). Nass, Steuer, and Tauber (1994) introduced the experimental paradigm “Computers As Social Actors” (CASA) to provide evidence that computers elicit social responses in their human users.
Compared to the 1990s, when these experiments were conducted, today’s technological equipment has changed fundamentally. Although desktop PCs are still in use, mobile technology has overtaken them. Smartphones are the number one device we use continuously throughout the day, constituting some kind of “digital companion” (Carolus, Binder, Muench, Schmidt, Schneider, & Buglass, in press). Furthermore, smartphones integrate a variety of functions which used to be spread over several devices. They are more interactive than desktop PCs, sending more cues (e.g. visual signals, sounds, vibrations, notifications), thereby adding further dimensions to human-computer interaction (HCI) in everyday life or extending existing dimensions. One of the most prominent developments is that phones have become talkative. For example, Google’s virtual assistant (Google Assistant) has been further extended over the last few years. Today, it supports a variety of functions on a variety of devices. If the assistant is installed on their smartphone, users will interact with it primarily through voice commands. They engage in a two-way conversation with the system, which guides them through shopping processes or suggests and recommends certain products, for instance. In contrast to older devices, today’s smartphones do not only mediate communication with a human counterpart but have become the counterpart themselves. Considering intuitive voice control, smartphones might be regarded as “vocal social agents” meeting the requirements of the “social actor” which the CASA paradigm
postulates (Gehl & Bakardjieva, 2017; Nass & Brave, 2005). Considering smartphones as an “entity that obtains a range of social attributions and social responses” (Nass & Brave, 2005, p. 123), our study aims to reconsider the research tradition of CASA. By transferring its basic principles to human-smartphone interaction, we focus on users interacting with “talking” smartphones and ask whether they (unintentionally) adopt social rules originally exclusive to human-human interaction. In our experimental study we ask: Do participants evaluate a smartphone which has replied to them politely differently from one which has replied impolitely? In sum, this study attempts to gain further insights into (1) the transferability of the CASA paradigm to modern technologies and (2) the basic principles of human-smartphone interaction.
2. Literature Review
The basic idea of this paper is to focus on humans interacting with smartphones and adopting social norms originally exclusive to human-human interaction. By choosing smartphones we concentrate on one of the most popular devices, which (1) meets the postulated CASA conditions and (2) offers the seminal feature of voice output. We start our literature review with the research on media equation and the CASA paradigm, followed by theoretical principles of social cues, social norms, and politeness. We end with a summary introducing the SASA paradigm, which conceptualizes “smartphones as social actors”.
2.1 “Media equals real life”: computers as social actors
Reeves and Nass (1996, p. 5) introduced the concept of media equation, which they briefly summarize as “media equals real life”. The fundamental assumption in a nutshell is: If a medium “communicates” with us, we unconsciously react as if it were a human being. More generally speaking, when interacting with technological devices which send social cues, users adopt the basic social rules and norms they also adopt in human-human interaction. These social rules are followed rather unconsciously, requiring only low active mental involvement (Langer, 1989). Thus, media equation is universal and almost unavoidable: it “applies to everyone, it applies often, and is highly consequential” (p. 5). Research referring to media equation transfers the social dynamics of human-human interaction to human-computer interaction, with most studies following a similar approach: the human counterpart is replaced by a media device to see if the same social rules apply (Johnson, Gardner, & Wiles, 2004). In sum, research has shown that even minimal social cues are sufficient to trigger social responses (Nass & Moon, 2000). Johnson and Gardner (2009) give an overview of media equation research presenting four basic categories of psychological effects: social rules and norms, personality traits, communication, and identity processes have been shown to be transferable to users interacting with computers. Regarding their literature overview, we confine ourselves to a brief selection of findings showing the
impact of minimal social cues. For example, a computer whose personality appeared similar to the participant’s was rated significantly better than a computer appearing dissimilar (Nass, Moon, Fogg, Reeves, & Dryer, 1995). Nass and his colleagues were also able to show that simply telling participants that a computer belongs to the same team as they do (signified by a color) results in significantly better evaluations of both performance and friendliness compared to a computer of the other team (Nass, Fogg, & Moon, 1996). As previously introduced, the paradigm of “computers are social actors” condenses these findings (Nass & Moon, 2000; Nass & Steuer, 1993; Nass et al., 1994), thereby defining “social actors” as entities that appear to adopt social traits and to give social responses (Nass & Brave, 2005). These phenomena are hard to explain rationally. If participants are asked directly whether they regard computers as intentional actors, they will deny it (Nass & Moon, 2000). Thus, social responses do not arise from conscious beliefs (Nass, Steuer, & Tauber, 1994). To resolve this contradiction, Reeves and Nass (1996) adduce an evolutionary perspective, which this paper primarily refers to. The human brain as well as the human body are products of evolution, “designed by natural selection to serve survival and reproduction” in our ancestors’ world (Hagen, 2002). As a consequence, the human brain is adapted to the so-called environment of evolutionary adaptedness (EEA), which differs radically from today’s world. In the EEA, all objects perceived were real physical objects, resulting in the heuristic that everything that appeared to be a person was indeed a person. As successfully interacting with these other persons was fundamental for survival and reproduction, an evolved mechanism to detect human beings automatically and therefore resource-efficiently was advantageous.
These evolved psychological mechanisms still shape today’s behavior by rapidly processing social information, resulting in automatic reactions (Buss, 2015; Reeves & Nass, 1996). Thus, our brains are adapted to the challenges of the EEA but encounter today’s new media and technology. Today, the formerly exclusively human cues are sent by technological devices, still resulting in automatic and unconscious reactions, just as they did when sent by another human. One might disagree with the idea of an evolutionary adaptation problem and simply refer to the novelty and unfamiliarity of modern technology, which will decrease with increasing user experience. However, Johnson et al. (2004) showed that experienced participants were particularly prone to effects of media equation. In their study, only participants who reported being experienced computer users showed behavior corresponding to media equation predictions. The authors concluded that high experience results in an automatic and mindless usage of computers, facilitating social reactions towards technology. Revisiting CASA research so far reveals that not all studies confirmed the idea of media equation, resulting in an inconsistent overall picture. Nass and Brave (2005) themselves mentioned several unsuccessful and unpublished studies. Additionally, several published studies failed to show that participants follow social rules when interacting with computers (Shechtman & Horowitz, 2003), PDAs (Goldstein, Alsiö, & Werdenhoff, 2002) or Twitter chatbots (Mou & Xu, 2016). Kiesler and Sproull (1997) reported two studies in line with the idea of CASA by showing that users trust and cooperate with a computer. However, Kiesler and Sproull assume “that the user’s response is an ‘as if’ response
rather than a true attribution of humanity” (p. 197), which is rather limited to specific experimental settings. Their assumption refers to a concept named ‘ethopoeia’ (Reeves & Nass, 1996): social reactions to a non-social entity are “mindless” and would be consciously denied. In sum, CASA studies reveal ambiguous results, which calls for further research focusing on the interaction with modern technology. The idea of media equation and its theoretical assumptions are considered a promising framework for research focusing on humans interacting with computers. The corresponding research paradigm considers “computers as social actors” and originates in studies focusing on desktop PCs, the popular devices at the time. Results showed that computers sent cues which triggered social reactions known from HCI. More than 20 years later, devices as well as user behavior have changed fundamentally. However, the CASA studies focusing on more recent technology are far from drawing a comprehensive picture regarding both the range of devices analyzed and the methodological implementation. Smartphones in particular seem worthy of consideration. On the one hand, they have succeeded desktop PCs as the most common device. On the other hand, they exceed them regarding both functions and options of use, as a result of which the smartphone is becoming the interaction partner itself. Given the criteria of a “social actor” which the CASA paradigm postulates, smartphones are to be considered a research object of modern CASA studies.
2.2 Introducing: Smartphones as social actors (SASA)
CASA originally refers to desktop PCs. However, mobile devices, particularly smartphones, have overtaken them. In contrast to immobile desktop PCs, smartphones accompany us throughout the day, resulting in a fundamentally different usage behavior regarding both the variety of functions and the frequency and duration of use (Lee, Chang, Lin & Cheng, 2014). Independent of time and place, smartphones are used more intensively, more comprehensively and more intuitively. They respond to their users’ multifaceted commands in a more multifaceted way than older technology. Smartphones send a variety of social cues, which are postulated to constitute the fundamental precondition of CASA. For example, if we are lost, we will ask our phone for directions, and our phone will respond by voice output, navigating the way to unknown places. It fulfills tasks, reacts and behaves in ways that only human counterparts have been capable of up to now. Consequently, if the bulky equipment of the 1990s, which was far more limited regarding usage, functions and output, was able to elicit social reactions in its users, we postulate that smartphones do the same. Thus, we consider them to be “social actors” and to constitute the coherent research object in terms of modern CASA studies. Consequently, we refer to “smartphones as social actors” (SASA). Literature research revealed (1) a limited number of studies on that subject (2) yielding mixed results. Goldstein et al. (2002) replaced desktop PCs with personal digital assistants (broadly defined as the ancestors of smartphones) to replicate the effect that a present device is evaluated better by its users than an absent one (Reeves & Nass, 1996). However, they could not replicate the original findings. Similarly, Lang, Klepsch, Nothdurft, Seufert, and Minker (2013) revealed that a web-based anthropomorphic
agent would not be assessed more positively overall if the agent itself asked the evaluating questions. Thus, they also failed to confirm the original results by Nass, Moon, and Carney (1999), who showed that evaluations differ depending on whether the computer itself asked for the evaluation or participants changed device to give the evaluation. More recently, Carolus, Schmidt, Schneider, Mayr, and Muench (2018) confirmed the original results. In a laboratory experiment, participants interacted with a smartphone. Afterwards, they evaluated this phone. Results showed that their evaluations were significantly better if the target phone itself asked for them. Kim (2014) analyzed smartphones acting as specialists in an advertising context. Compared to generalist smartphones, specialist smartphones elicited greater trust in advertisements and an increased purchase intention toward the advertised products. Wang (2017) took smartphone users’ social dispositional factors into account. By conducting an online survey, he revealed associations between dispositions and users’ acceptance and awareness of anthropomorphism. In another study, Carolus, Schmidt, Muench, Mayer, and Schneider (2018) focused on gender stereotypes. Participants of their study interacted with a phone, which was presented in either a blue or a pink sleeve, to solve social dilemmas. Results revealed that participants ascribed significantly more masculine attributes to the blue phone and more feminine attributes to the pink phone. Furthermore, the blue phone was rated to be more competent and its advice was followed significantly more often. To summarize, we argue for considering “smartphones as social actors” and as research objects of modern CASA studies. First empirical SASA studies revealed mixed results, pointing to the need for more research. Consequently, we maintain the basic principles of an established paradigm but adapt it to modern technology.
Updating the CASA to the SASA paradigm seems to be a promising approach to gain deeper insights into human-smartphone interaction.
2.3 Social cues: voice and speech
Social cues sent by the device constitute the prerequisite of the phenomenon of media equation. As outlined above, research following the CASA paradigm considered a variety of cues ranging from gender cues to cues of politeness. First, studies manipulated the voice or speech of devices to analyze participants’ responses. In the process, they manipulated gender cues, one of the most salient identity cues manifested in the human voice. Nass, Moon, and Green (1997) pre-recorded male and female voices, which were played back via a CD player to create the illusion of a talking computer. A few years later, Lee, Nass & Brave (2000) used the CSLU Toolkit, a software package for generating speech, to implement male and female voices. In both experiments, the voices presented were the only social cues to possibly evoke gender stereotypic responses. Results yielded first empirical indications that stereotypes known from human-human interaction are adopted in HCI: evaluations of the performances of the computers were biased in line with gender stereotypes (Lee et al., 2000; Nass et al., 1997). In addition to gender, voice can also function as a marker of personality (Brown, Strong & Rencher, 1973) and therefore influence the perception of the speaker, e.g. in terms of friendliness.
Compared to the technology of the 1990s, modern “intelligent personal assistants” like Siri for iPhones, Microsoft’s Cortana, and Google’s Assistant for Android phones allow users to talk to their phone, which “talks” back, e.g. answering questions or requesting web services. Following the idea that conversation constitutes the next natural form of human-computer interaction (Luger & Sellen, 2016) and considering the growing popularity of devices and apps using speech interfaces, this study chooses vocal cues over textual cues. Implementing voice control means integrating basic principles of human expression and interaction. Pinker (1994) argues from an evolutionary perspective and assumes a universality of language as an instinct or innate ability of humans. Furthermore, speech is regarded as unique to humans, constituting a “definitive marker of humanness” (Nass & Gong, 2000, p. 38). In line with the evolutionary explanation of automatically recognizing human beings by social cues in the EEA, hearing a voice meant hearing another human being. Massaro (1998) pointedly derives the simple heuristic principle that has been universally valid for over 200,000 years: speech equals human. Furthermore, taking into account the importance of speech for identifying gender (Slobin, 1971), the equation might be extended to: speech equals gender. Again, from an evolutionary perspective, gender is a highly relevant characteristic of humans. Thus, detecting gender rather automatically, visually and auditorily, was advantageous regarding survival and reproduction. Consequently, gender is one of the first features attributed to a voice and also the most important criterion when it comes to similarities and differences between voices (Singh & Murry, 1978). However, speech as a sufficient condition of humanity is challenged by modern technology using text-to-speech (TTS). TTS systems have evolved rapidly.
Even so, the quality and structure of text-to-speech are still limited, resulting in e.g. unexpected breaks, wrong emphasis of words and sentences, or pauses between syllables. Although these limitations are decreasing, they still allow listeners to intuitively distinguish between synthetic and human voices (Kamm, Walker, & Rabiner, 1997). Nevertheless, computer voices as well as fragments or nuances of speech tend to be interpreted as a “communicative act”, which we actively try to make sense of (Grice, 1975). Even syllables arbitrarily combined and played backwards are still processed in the manner of normal human speech (Slobin, 1971). Consequently, although TTS may be detected as being non-human, it is interpreted as a communicative act we react to accordingly. Moreover, in contrast to recordings and playbacks of human speech, TTS is considered an efficient way of implementing experimental manipulations in CASA and SASA studies.
2.4 Social norms: politeness and impoliteness
Throughout the history of social science, numerous scientific publications have addressed social norms, attempting to describe and explain human behavior (Cialdini & Trost, 1998). Social norms guide as well as restrict social behavior (Sherif, 1936; Triandis, 1994). Violations of these norms are sanctioned by social networks rather than by judicial systems. Hence, social norms promote e.g. social control and group coherence (Blake & Davis, 1964; Hechter & Opp, 2001). Politeness, for example, is “an integral part in any human society” reflecting and regulating social distance (Stephan, Libermann,
& Trope, 2010, p. 268). Goffman (1967) referred to public identity when introducing “face” as “an image of self delineated in terms of approved social attributes – albeit an image that others might share” (p. 5). Brown and Levinson (1978) took up this idea, understanding faces as “basic wants, which every member knows every other member desires” (p. 312). Because face depends on the ascription of other group members, we are motivated to save our own as well as the others’ faces, thus constraining individual self-interests. Considering the face as being rather fragile and vulnerable when interacting socially, “face work” and polite behavior offer a solution. Acting politely saves both the speaker’s and the listener’s face. In contrast, so-called “face threatening acts” attack the addressees’ face by ignoring their needs, pressuring them or directly accusing them verbally, paraverbally or nonverbally. According to Brown and Levinson (1978), several communicative acts are to be defined as inherently face threatening, e.g. giving negative feedback to someone. To avoid insulting the recipient, face threatening acts need to be mitigated. This mitigation is achieved by using “politeness strategies”, which weaken “face threatening acts” and allow the recipient to save face. Humans try to maintain their own positive self-image until an interaction partner acts contrary to rules of politeness (Brown & Levinson, 1978). If the interaction partner threatens one’s positive self-image, the threatened person will act accordingly and violate rules of politeness to defend that self-image. However, violating social norms regarding politeness elicits negative emotional reactions (Goffman, 1967) and causes offence (Culpeper, 1996), resulting in negative sentiments towards the perpetrator and causing sanctions by the social network (Blake & Davis, 1964). The basic principles of politeness as a social norm have already been applied to CASA research.
Fogg and Nass (1997b) as well as Johnson et al. (2004) tested users’ reactions to feedback given by a computer. Participants interacted with a computer, which rated their performance by giving either neutral or positive feedback. In addition, participants were informed either that the computer was rating their actual performance (indicating “praise”) or that the computer’s feedback was generated automatically (indicating “flattery”). Subsequently, participants evaluated the computer: the computer giving positive feedback was rated significantly better than the computer giving neutral feedback regardless of the feedback type (praise vs. flattery), indicating that participants were following social norms of politeness. In another study, Nass et al. (1994, p. 74) showed that participants who were asked by a computer about its performance rated this computer significantly better than participants asked by a different computer or evaluating via a paper-and-pencil questionnaire. Thus, a computer “asking” for direct feedback “itself” seemed to elicit the social norm of politeness. In line with the interviewer-based bias known from human interviewers (Finkel, Guterbock, & Borg, 1991), participants responded more positively when the computer itself asked for its evaluation. As outlined before, Carolus et al. (2018) revealed significantly better evaluations for the target phone when the phone itself asked for its evaluation. Previous research also analyzed the effects of criticism in terms of negative feedback given by a computer. Regarding human-human interactions, problems that cannot be solved and need to be announced to the conversation partner result in either taking the blame or assigning blame to somebody
else. Taking the blame is the modest solution. But since modest comments are believed to be true, they might lead to being perceived as incompetent (Nass & Steuer, 1993). In an experimental study, participants were told to interact with a phone-based system for buying books. While they were interacting, the system reported several errors for which it blamed either itself or the user. Results revealed that a system that blamed itself was rated to be more likeable. However, a system blaming the user was rated to be more competent (Nass & Brave, 2005). In sum, previous CASA studies which focused on politeness mostly neglected the impact of impoliteness. Positive feedback was contrasted with neutral feedback but not with unfriendly or impolite feedback, thereby confirming Locher and Bousfield (2008), who criticize an “enormous imbalance that exists between academic interest in politeness phenomena as opposed to impoliteness phenomena” (p. 1). Identifying a desideratum here, this paper focuses on impoliteness or face threatening acts, which constitute a phenomenon fundamental for social coexistence or communal life.
2.5 Summary: Smartphones talking either politely or impolitely
A considerable amount of research focusing on human users interacting with computers is based on the CASA paradigm, revealing that social rules derived from human-human interaction are relevant here (Fogg & Nass, 1997a; Moon & Nass, 1996; Nass & Lee, 2001). Speech or language constitutes a social cue essential for human interaction and increasingly adopted by modern technology. Furthermore, with usage becoming more interactive, e.g. devices replying to users, social norms are becoming more relevant. As (1) an omnipresent device we routinely interact with on a daily basis and (2) a device able to send a variety of rich social cues, the smartphone becomes some kind of “digital companion” throughout the day. From this point of view, smartphones appear as “social actors”, which should be at least equivalent to the desktop computers the CASA paradigm is originally based on. However, literature research indicated that studies focusing on smartphones as social actors are rather rare. Furthermore, CASA studies so far have analyzed effects of politeness, disregarding impoliteness. Addressing this desideratum, this study aims to contribute to the analysis of human-smartphone interaction by broadly asking if users apply social norms to the interaction with smartphones “behaving” politely or impolitely. To do so, we transferred the basic principles of CASA to smartphones and focused on politeness as well as on impoliteness. Thus, we asked if users would evaluate a smartphone they had interacted with before differently depending on whether this phone was “polite” or “impolite”.
2.6 Hypotheses
Politeness has been introduced as a universal social norm in human-human interaction, resulting in behavior patterns expected from conversational partners. Fogg and Nass (1997b) analyzed these
patterns using the CASA paradigm and distinguishing between computers giving either “polite” or “neutral” feedback. However, two aspects limit conclusions. (1) “Neutral” feedback was defined as a message which was neither positive nor negative. Following Brown and Levinson (1978), “neutrality” needs to be distinguished from impoliteness, which they defined not only as the absence of politeness but as a face threatening act and therefore as a violation of social norms. This violation elicits negative emotional reactions (Goffman, 1967), negative sentiments towards the perpetrator (Blake & Davis, 1964) or a counter-attack in order to defend one’s self-image (Brown & Levinson, 1978; Culpeper, 1996). (2) Furthermore, evaluation of the computer was limited to three items (helpful, intelligent, and insightful). As a consequence, to adapt the CASA paradigm and the focus on politeness to human-smartphone interaction, we did not replicate the study by Fogg and Nass (1997b) but transferred the basic principles of their CASA study by (a) replacing desktop PCs with smartphones and (b) modifying the type of feedback, replacing “neutral” feedback with “impolite” feedback. Furthermore, (c) we differentiated the dimensions on which the participants evaluated the phone to ask for both a task-oriented dimension (competence) and a relationship-oriented dimension (friendliness). Three blocks of hypotheses were derived. First, we expected users’ evaluations of a smartphone to differ as a consequence of whether they had interacted with a “polite phone” or an “impolite phone” before. Thus, we hypothesized that participants who had received an impolite reply from the smartphone would evaluate this specific phone significantly worse regarding friendliness (H1a) and competence (H1b) than users who had interacted with a “polite” phone.
H1a: Concerning friendliness, smartphones which have given impolite feedback are evaluated significantly worse by their users than smartphones which have given polite feedback.
H1b: Concerning competence, smartphones which have given impolite feedback are evaluated significantly worse than smartphones which have given polite feedback.
Second, we focused on changes in the evaluations over time as an effect of polite feedback (H2) and impolite feedback (H3). Hence, we compared the evaluations before and after the manipulation of feedback separately for each group. Following Fogg and Nass (1997b), smartphones giving positive feedback should be evaluated more positively afterwards. Thus, we expected that users would evaluate the phone as friendlier after it had given polite feedback (H2a). However, we expected the improvement to be limited to friendliness, the relationship-oriented dimension. Hence, we did not expect any impact on competence, the task-oriented dimension (H2b).
H2a: Concerning friendliness, smartphones are evaluated significantly better after they have given polite feedback (t2) than before the feedback (t1).
H2b: Concerning competence, smartphones are not evaluated significantly differently after they have given polite feedback (t2) than before the feedback (t1).
In contrast to the divergent impact of polite feedback on friendliness and competence, the impolite feedback of smartphones should influence both friendliness (H3a) and competence (H3b). As previously mentioned, negative feedback given by a device was shown to result in an ascription of competence (Nass & Brave, 2005). However, our smartphones did not only give negative feedback but actively devalued participants’ contributions, resulting in a face threatening act which needs to be distinguished from mere criticism. Following this reasoning, an “impolite phone” was expected to be evaluated as less friendly after having replied to the users impolitely. Further, the threatening act of impoliteness should result in a more general punishment that does not distinguish between characteristics that are task-related and those that are not (Goffman, 1967). Hence, “impolite smartphones” were also expected to be evaluated as less competent.
H3a: Concerning friendliness, smartphones are evaluated significantly worse after they have given impolite feedback (t2) than before the feedback (t1).
H3b: Concerning competence, smartphones are evaluated significantly worse after they have given impolite feedback (t2) than before the feedback (t1).
In prototypical CASA studies, participants interacted with a desktop PC they had never worked with before. Compared to smartphones, desktop PCs were less individualized regarding settings and programs, and it was more common to share PCs (e.g. within families, in offices or libraries). In contrast, a smartphone is almost exclusively used by a single owner who installs certain apps, ringtones or wallpapers. Using one’s individualized smartphone as extensively as statistics reveal might elicit some kind of emotional attachment to the phone, e.g. conceptualized as trust or involvement, which has not been established with desktop computers (Rempel, Holmes & Zanna, 1985; Walsh et al., 2011).
Consequently, transferring the CASA paradigm to the context of human-smartphone interaction and thereby requesting participants to interact with an unfamiliar phone calls for an analysis of this unfamiliarity itself. Following Reeves and Nass (1996), who consider media equation to be universal and unavoidable, effects should be independent of the specific device. However, interacting with a foreign phone that participants are not accustomed to (e.g. regarding usability or operating system) might affect the evaluations. In order to test for potential effects of ownership on the evaluations of an unfamiliar phone or of one’s own phone, we ask research question 1 (RQ1):
RQ1: In contrast to one’s own smartphone, are smartphones that do not belong to the participants evaluated significantly differently regarding competence and friendliness?
As with ownership, evaluations might also be affected by the voice the smartphone talks with. As outlined above, voice is not only a “definitive marker of humanness” (Nass & Gong, 2000, p. 38) but
also a marker of gender (Slobin, 1971). Consequently, and independent of the level of politeness, the evaluation of the phone might be affected by the gender of the smartphone voice. To check for potential effects regarding the (ascribed) gender of the smartphone, we ask RQ2:
RQ2: Are smartphones which talk in either a female or a male voice evaluated significantly differently regarding competence and friendliness?
3. Methods 3.1 Participants A total of 85 university students (49 females, 36 males) voluntarily participated in a laboratory study from 5 to 28 July 2017. Participants were recruited through social media platforms, online advertisements, the university platform for the recruitment of participants, and cold calls. In return for their participation, they had the chance of winning one of 15 restaurant vouchers. Participants ranged in age from 18 to 31 with an average age of 22.45 years (SD = 2.63). Regarding smartphone usage, they reported a mean experience of 9.08 years (SD = 2.65) and 4.16 hours of daily usage (SD = 2.24).
3.2 Material We conducted a laboratory study following the design established by Fogg and Nass (1997b) as well as Johnson et al. (2004), in which participants evaluated computers which had previously given either polite or neutral feedback. Our study differed from this approach by focusing on phones rather than computers and on the effects of these devices acting politely in contrast to impolitely. Both aspects required pretesting. First, the assumption that the feedback given by the phone would be perceived as either impolite or polite needed to be pretested. Second, as the smartphone communicated by speech and as a person’s voice usually reveals his or her gender, we needed to avoid biases due to potential differences between the voices themselves. Therefore, a second pretest was carried out to make sure we used a male and a female voice that were evaluated similarly. Four variations of feedback were needed for the final experiment: a polite and an impolite feedback, each given by both a female and a male voice. To develop the feedback, participants rated a polite and an impolite text, which were presented together with two distractor phrases. Structure, number of sentences and content were controlled. Only politeness differed, with the user’s performance being either praised (“Thank you for your cooperation. Thanks for taking your time to support me. I appreciate your support and your feedback. The data you provided will contribute a
significant part to the development of this software.”) or criticized (“Unfortunately I cannot thank you for your cooperation. You did not take enough time to support me. I do not appreciate your support and your feedback. The data you provided will not contribute to the development of this software.”), with the distractor phrases containing neutral feedback regarding the personal information participants provided (“The analysis of your data serves the development of this software.” and “Unfortunately the data you provided is not suited for further analysis. It will be excluded from further development.”). In a pretest, participants rated the politeness of the phrases on a seven-point Likert scale ranging from (1) impolite to (7) polite. An online survey realized via Google Forms was voluntarily completed by 118 women and 21 men. On average, they were 28.76 years old (SD = 9.52). A dependent-samples t-test revealed that the “polite” feedback (M = 5.83, SD = 1.12) was rated as significantly more polite than the “impolite” feedback (M = 2.09, SD = 1.51), t(138) = 23.58, p < .001. Consequently, the “impolite” and the “polite” feedback were used for the experiment. A second pretest focused on the voices in which the feedback was given, to find out whether there were any differences in evaluations attributable to the voice itself and not to the manipulation of politeness. Two voices (male and female) were generated via the text-to-speech platform ‘acapelabox’ (www.acapelabox.com). To control for differences in phonemic structure as well as for content potentially biased by gender stereotypes, a neutral, meaningless text was used for the pretest (Banse & Scherer, 1996; Scherer, Banse, Wallbott, & Goldbeck, 1991). Two standard sentences were recited by the male and the female voice: “Hat sundig pron you venzy” and “Fee gott laish jonkill gosterr” (Scherer et al., 1991, p. 126).
Audio outputs were presented to 76 students (85% female), who listened to the audio files and rated the voices in terms of their gender, which was identified accurately by every participant. Further, participants rated the friendliness of each voice by assessing how “friendly”, “warm”, “fun”, “likable”, “polite” and “enjoyable” it was (see Nass et al., 1999). The scale exhibited good reliability with Cronbach’s Alpha = .82. To minimize the risk of gender biases, we needed both voices to be evaluated similarly. To test this, we conducted a dependent-samples t-test. Because no meaningful impact of the voices was expected, the alpha level was set to .2, which is the usual procedure to control for Type II error (Dixon & Massey, 1950). The t-test supported the null hypothesis, with the friendliness evaluation of the male voice (M = 4.08, SD = 1.01) not differing significantly from that of the female voice (M = 4.07, SD = 1.16), t(74) = 2.86, p = .91, d = .01.
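Both pretests rely on dependent-samples t-tests. As a minimal sketch of that computation (the function name and the example ratings are hypothetical and not taken from the study), the test statistic is the mean of the paired differences divided by its standard error, with n − 1 degrees of freedom. This is in line with the t(138) reported for the 139 raters of the first pretest.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df, d) for a dependent-samples t-test on paired ratings.

    d is the mean difference divided by the SD of the differences,
    one common (but not the only) effect size for paired data.
    """
    if len(first) != len(second):
        raise ValueError("both rating lists must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff
    return t, n - 1, d


# Hypothetical 7-point politeness ratings from five raters (illustration only).
polite_text = [6, 7, 5, 6, 7]
impolite_text = [2, 3, 1, 2, 4]
t, df, d = paired_t(polite_text, impolite_text)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

The p-value is omitted here because the Python standard library has no t-distribution; a statistics package would be needed for that step.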
3.3 Procedure and Data Analyses Participants were welcomed by the experimenter and introduced to the experimental setting. They agreed to the ethical guidelines following the ethics commission of the German Psychological Association (DGPs). The subsequent experiment consisted of four parts: (1) interaction with the smartphone, (2) first evaluation of the smartphone by participants (time of evaluation: t1), (3) feedback by the smartphone, (4) second evaluation of the smartphone by participants (time of evaluation: t2). To
interact with the smartphone, a combination of speech output and text-based supplements was used, with input via the touch screen. The interface was provided via the online tool ‘soscisurvey’ (www.soscisurvey.de). To simulate a smartphone application in as externally valid a way as possible, typical elements of the browser in which the tool ran were hidden. Participants were introduced to this app and started the interaction, thereby activating speech output. While the phone “talked”, extracts of the text were also presented on the screen to support the participants’ comprehension. Half of the participants interacted with a female-voiced phone, the other half with a male-voiced phone (gender). Initially, the smartphone welcomed the participants and described the pretended purpose of the main study: (1) the participants’ help was needed for the development of a new, heavily personalized internet search algorithm and (2) their help entailed disclosing personal information. The smartphone then asked participants to disclose increasingly private personal information, starting with gender and age and ending with banking information. Participants could provide each piece of information or decline to answer the question. After completing this task, and without having received feedback yet, the participants changed position and evaluated the smartphone at a desktop PC at a desk a few meters away. This survey was again carried out using soscisurvey. However, the interface looked different from the one displayed on the smartphone. Next, participants returned to the smartphone, which gave them feedback on their performance regarding support in terms of task attendance, either politely or impolitely (politeness). For the feedback, the exact wording pretested before was used: politely (“...You really supported me. I appreciate it.”) or impolitely (“... You did not take enough time to support me. I do not appreciate it.”).
Afterwards, participants returned to the desktop PC to evaluate the smartphone again and to report demographic information. In addition, the ownership of the smartphone was randomized. Half of the participants interacted with their own phone, the other half with an unfamiliar one. Our hypotheses focused on the potential effects of a polite in contrast to an impolite phone. Consequently, hypotheses were tested in a 2x2 experimental mixed factorial design. Participants were randomly assigned to two feedback conditions with a smartphone replying to the participant either politely (polite feedback) or impolitely (impolite feedback). Repeated measures were implemented with the participants evaluating the phone before and after this feedback was given (time of evaluation: t1 or t2). Further, two explorative research questions referred to ownership of the smartphone (participant's own phone vs. phone provided by the experimenter: Samsung Galaxy S5; Android 6.0.1) and gender of the voice the phone was talking with (female vs. male).
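The manuscript does not describe how the random assignment was implemented. Purely as an illustration, and assuming a simple balanced scheme that is not confirmed by the source, the three manipulated factors (type of feedback, gender of the voice, ownership) could be crossed and participants distributed evenly over the resulting cells. All names below are hypothetical.

```python
import itertools
import random
from collections import Counter


def assign_conditions(participant_ids, seed=0):
    """Shuffle participants and deal them round-robin into the 2x2x2 cells.

    This is an assumed balancing scheme for illustration, not the
    procedure reported by the authors.
    """
    cells = list(itertools.product(
        ["polite", "impolite"],   # type of feedback
        ["female", "male"],       # gender of the voice
        ["own", "provided"],      # ownership of the phone
    ))
    rng = random.Random(seed)     # fixed seed keeps the sketch reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: cells[i % len(cells)] for i, pid in enumerate(ids)}


assignment = assign_conditions(range(1, 86))  # 85 participants, as in the study
cell_sizes = Counter(assignment.values())
print(sorted(cell_sizes.values()))
```

With 85 participants and eight cells, such a scheme cannot be perfectly balanced; cell sizes differ by at most one.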
3.4 Measures Self-reports are prone to response biases, e.g. conscious manipulations or social desirability (Van de Mortel, 2008). This is particularly challenging when it comes to evaluating smartphones regarding characteristics that are exclusive for humans. First, smartphones obviously are technological devices and nonhuman, second, Reeves and Nass (1996) argued that media equation is an unconscious
phenomenon. To aim for a more automatic reaction less influenced by conscious rationalizations, we used a forced yes-no task and applied time pressure. Adjectives were displayed on the computer screen for a maximum of 2 seconds, and participants had to decide whether each adjective fit the smartphone they had interacted with (0 = no, 1 = yes). Adjectives were derived from the 12-item scale assessing the “valence of a tutoring-computer” by Nass et al. (1997; 1999). Following principles of content analysis, three independent raters distinguished two dimensions with six items each: “competence” (competent, informative, helpful, analytical, knowledgeable, useful) and “friendliness” (likeable, friendly, warm, enjoyable, fun, polite). Both scales exhibited acceptable reliability, with Cronbach’s Alpha = .74 for “friendliness” and .75 for “competence” (Bland & Altman, 1997).
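To make the scoring concrete, the following sketch assumes that each scale score is the mean of the six binary item responses, which would match the 0 to 1 range noted for Table 1 but is not stated explicitly in the text. It also shows the standard formula for Cronbach's alpha. The function names and example data are hypothetical.

```python
import statistics


def scale_score(item_responses):
    """Mean of binary item responses (0 = no, 1 = yes), ranging from 0 to 1."""
    return sum(item_responses) / len(item_responses)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    item_variances = [
        statistics.variance([row[i] for row in rows]) for i in range(k)
    ]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical yes/no answers of four participants on the six friendliness items.
answers = [
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],
]
print([round(scale_score(row), 2) for row in answers])
print(round(cronbach_alpha(answers), 2))
```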
4. Results 4.1 Preliminary analysis: gender of the voice, ownership of the smartphone In order to ensure that ownership and gender do not have an impact on the participants before the manipulation of interest (politeness of feedback) was carried out, their effects on evaluations of friendliness and competence were tested at t1. As with the pretesting of voices reported before, we tested for equivalence with alpha-level set to 0.2 to minimize Error Type II. As expected, none of the t-tests showed any significant result. Evaluations of friendliness did not differ significantly (t(83)= .22, p = .83, 95% CI [-.12, .15], d = .05) between participants who had interacted with their own phone (M = .49, SD = .30) and participants who evaluated a phone that did not belong to them (M = .47, SD = .32). The same pattern was observed for competence (t(83) = -.61, p = .55, 95% CI [-.16, .08], d = .13). Owners of the phone did not evaluate it significantly differently (M = .74, SD = .33) than non-owners (M = .77, SD = .22). There was also no significant effect of gender of the phone, neither on friendliness (t(83)= .79, p = .43, 95% CI [-.08, .19], d = .02) nor on competence (t(83)= -.23, p = .82, 95% CI [-.13, .10], d = .05). The descriptive results underlined the expected results: “female phone voices” (friendliness: M = .45, SD = .30; competence: M = .76, SD = .26) and “male phone voices” (friendliness: M = .51, SD = .32; competence: M = .75, SD = .30) were not evaluated significantly differently.
4.2 Main analysis To test hypotheses 1 to 3 a mixed ANOVA with a within-subject-factor (time of evaluation) and a between-subject-factor (type of feedback) regarding dependent measures (friendliness, competence) was conducted. Results are reported stepwise following the hypotheses. Table 1 reports the means and standard deviations by condition.
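For a design with one two-level between-subject factor and one two-level within-subject factor, the interaction F-test of a mixed ANOVA is equivalent to an independent-samples t-test on the individual change scores (t2 − t1), with F = t² and (1, N − 2) degrees of freedom. The sketch below illustrates this equivalence with hypothetical data and function names. It is not the authors' analysis script.

```python
import math
import statistics


def interaction_f(group_a, group_b):
    """Interaction F for a 2 (group) x 2 (time) mixed design.

    Each group is a list of (t1, t2) score pairs. The statistic equals the
    squared pooled-variance t-test comparing the groups' change scores.
    """
    diffs_a = [t2 - t1 for t1, t2 in group_a]
    diffs_b = [t2 - t1 for t1, t2 in group_b]
    n_a, n_b = len(diffs_a), len(diffs_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(diffs_a)
        + (n_b - 1) * statistics.variance(diffs_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (statistics.mean(diffs_a) - statistics.mean(diffs_b)) / standard_error
    return t ** 2, (1, n_a + n_b - 2)


# Hypothetical (t1, t2) friendliness scores for two small groups.
polite = [(0.50, 0.67), (0.33, 0.50), (0.67, 0.67), (0.50, 0.50)]
impolite = [(0.50, 0.17), (0.67, 0.33), (0.33, 0.33), (0.50, 0.17)]
f_value, df = interaction_f(polite, impolite)
print(f"F({df[0]},{df[1]}) = {f_value:.2f}")
```

With the 85 participants of this study, the degrees of freedom would be (1, 83), matching the F-tests reported in the remainder of this section.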
Hypothesis 1 focused on the differences between polite and impolite phones after they had given feedback (t2). Evaluations of friendliness (H1a) and competence (H1b) were predicted to differ after the participants had received polite or impolite feedback from the smartphone. Regarding hypothesis 1a, the ANOVA revealed a significant interaction between time of evaluation and type of feedback, F(1,83) = 38.87, p < .001, η² = .32 (see Figure 1). According to the guidelines by Cohen (1988), an effect size of η² = .32 is considered a large effect. Simple effects analysis revealed that the groups differed significantly after the feedback (t2: F(1,83) = 26.04, p < .001). As hypothesized, polite feedback resulted in higher friendliness scores (M = .57, SD = .30) than impolite feedback (M = .26, SD = .27) at the second measurement time. Consequently, the data support hypothesis 1a: impolite smartphones were assessed to be significantly less friendly than polite smartphones.
Figure 1. Interaction between the factors ‘time of evaluation’ and ‘type of feedback’ for the scale ‘friendliness’.
Hypothesis 1b referred to the evaluation of competence. Again, the ANOVA revealed a significant interaction between time of evaluation and type of feedback, F(1,83) = 6.70, p = .02, η² = .07 (depicted in Figure 2), whereby polite phones (M = .80, SD = .27) were evaluated as more competent than impolite phones (M = .59, SD = .32) at the second measurement time. Simple effects analysis showed a significant difference after the participants had received polite or impolite feedback (F(1,83) = 11.03, p < .001). According to Cohen (1988), the effect size of η² = .07 is considered medium. The data support hypothesis 1b: “impolite smartphones” were assessed to be significantly less competent than “polite smartphones”.
Figure 2. Interaction between the factors ‘time of evaluation’ and ‘type of feedback’ for the scale ‘competence’.
Table 1 Means (and standard deviations) for evaluations of friendliness and competence for “impolite phones”, “polite phones” and averaged across both groups (Total) at the first (t1) and second (t2) measurement time.
Measure        “impolite phone”         “polite phone”           Total
               t1          t2           t1          t2           t1          t2
Friendliness   .47 (.30)   .26 (.27)    .50 (.32)   .57 (.30)    .48 (.31)   .42 (.32)
Competence     .71 (.28)   .59 (.32)    .80 (.27)   .80 (.27)    .75 (.28)   .70 (.31)
Note. Scores ranged from 0 to 1; t1.
While hypothesis 1 referred to inter-group differences of evaluations after the feedback (t2), hypotheses 2 and 3 focused on intra-group differences of the evaluations before and after the feedback (within group, t1 vs. t2). Proceeding from the significant interactions between the two factors ‘time of evaluation’ and ‘type of feedback’, simple effects analyses were conducted. Hypothesis 2 referred to friendliness (H2a) and competence (H2b) of “polite phones”. Regarding H2a (Figure 1), simple effect analysis showed a significant difference before and after the feedback (F(1,83) = 6.00, p = .02). In contrast, evaluations of competence (H2b, Figure 2) were found not to be significantly affected by politeness of the smartphone (F(1,83) = .00, p = 1.00). As a result, hypotheses 2a and 2b are to be accepted: Polite feedback by a smartphone affected the evaluation of friendliness but not the evaluation of competence. Hypothesis 3 referred to impolite phones which were predicted to be devalued regarding both friendliness (H3a) and competence (H3b). Again simple effect analysis was conducted revealing significant differences for both evaluations of friendliness (before feedback: M = .47, SD = .30; after feedback: M = .26, SD = .27), with F(1,83) = 40.26, p < .001, as well as evaluations of competence (before feedback: M = .71, SD = .28; after feedback: M = .59, SD = .32), with F(1,83) = 13.25, p < .001. Supporting hypotheses, an impolite feedback deteriorated the assessment of friendliness and competence of a smartphone. Finally, two research questions focused on evaluation effects as a consequence of both smartphone ownership and gender of the phone voice over time. Preliminary analysis had already focused on evaluations at the first time of measurement with smartphones of both conditions “acting” exactly in the same way until that point in time. Results showed the conditions (female vs. male voice; own phone vs. not own phone) to not differ significantly. 
Research questions take up the issue of ownership (RQ1) and gender of the voice (RQ2), incorporating experimental manipulation of politeness. Regarding RQ1, two two-way ANOVAs with the fixed factors ‘type of feedback’ and ‘ownership of smartphone’ did not show significant interactions for t2, neither regarding friendliness (F(1,83) = .15, p = .70, η² = .002) nor competence (F(1,83) = .13, p = .72, η² = .002). Likewise, a main effect of ownership was also not significant regarding both friendliness (F(1,83) = .01, p = .93, η² < .001) and competence
(F(1,83) = .09, p = .77, η² = .001). Consequently, interacting with one’s own phone in contrast to a phone received from the experimenter did not affect the evaluation significantly. The second research question asked for differences in phone evaluations as a consequence of the gender of the voice the phones were “talking” with. Again, two two-way ANOVAs with the fixed factors ‘type of feedback’ and ‘gender of smartphone’ were conducted. Regarding friendliness, neither the interaction (F(1,83) = .17, p = .68, η² = .002) nor the main effect of gender (F(1,83) = .17, p = .68, η² = .002) was significant. Regarding competence, the ANOVA showed a significant interaction (F(1,83) = 5.05, p = .03, η² = .06), with simple effects analysis revealing a significant group difference for male phones (F(1,83) = 16.30, p < .001). The polite male phone (M = .82, SD = .26) was rated as more competent than the impolite male phone (M = .47, SD = .33). However, this effect could not be shown for female phones (F(1,83) = .69, p = .41), as the polite female phone (M = .79, SD = .28) was not rated as significantly more competent than the impolite female phone (M = .71, SD = .26). Among impolite phones, competence ratings differed significantly between the impolite female phone and the impolite male phone (F(1,83) = 7.77, p = .01). To conclude, the gender of the phone affects the evaluation depending on the type of feedback. In addition, the participants’ gender did not significantly affect the evaluations of the polite or impolite phones (friendliness: F(1,83) = .05, p = .83; competence: F(1,83) = 2.05, p = .16).
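The effect sizes reported in this section can be cross-checked against the F-values. The sketch below assumes that the reported η² is a partial eta squared, which the text does not state explicitly; under that assumption, η² = (F · df1) / (F · df1 + df2). The verbal labels use the thresholds of .01, .06 and .14 that are commonly cited as Cohen's (1988) conventions. Function names are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F-value and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_size_label(eta_squared):
    """Verbal label based on commonly cited conventions (.01 / .06 / .14)."""
    if eta_squared >= 0.14:
        return "large"
    if eta_squared >= 0.06:
        return "medium"
    if eta_squared >= 0.01:
        return "small"
    return "negligible"


# F-values of the reported interactions, each with df = (1, 83).
reported = {
    "friendliness: time x feedback": 38.87,
    "competence: time x feedback": 6.70,
    "competence: feedback x voice gender": 5.05,
}
for name, f_value in reported.items():
    eta = partial_eta_squared(f_value, 1, 83)
    print(f"{name}: eta squared = {eta:.2f} ({effect_size_label(eta)})")
```

Rounded to two decimals, these computations reproduce the reported values of .32, .07 and .06.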
5. General Discussion The main goal of this study was to contribute to the analysis of human-smartphone interaction. By transferring the basic principles of the paradigm “computers as social actors” (CASA) to smartphones, we broadly asked whether phones are recognized as “social actors” initiating the adoption of social norms and therefore justifying the idea of smartphones as social actors (SASA). Given the variety of features and functions resulting in many-faceted user behavior and gratifications, this study considers smartphones to differ significantly from the traditional personal computers that CASA research has analyzed. Consequently, findings concerning desktop PCs cannot simply be transferred to smartphones but need to be explored. Further, smartphones are today’s most popular devices, which accompany their owners throughout the day. In contrast to other technologies which are rather discussed with regard to future everyday life (e.g. social robots), smartphones are already ubiquitous, which indicates immediate practical implications. First, CASA studies analyzing smartphones yielded inconsistent results requiring further research. Following Fogg and Nass (1997b), our study focused on the social norm of politeness. However, we replaced desktop computers with smartphones and tested for effects of a face threatening act conducted by a phone, with participants receiving polite or impolite feedback from that phone. To do so, participants interacted with a phone which they evaluated twice, before and after the phone had given them feedback, either politely or impolitely. Results revealed an effect of feedback. An impolite phone
was devalued regarding both friendliness and competence. In contrast, a smartphone “replying” politely was evaluated more positively in terms of friendliness but yielded no significant improvement in terms of competence. Explorative analysis revealed that evaluations by participants rating their own phones and evaluations of phones provided by the experimenter did not differ significantly, neither before nor after the feedback was given (RQ1). Considering how much time users spend with their own smartphone, it is rather surprising and worth reflecting on that a brief experimental manipulation (feedback) affected the evaluation of this well-known device. That the manipulation affected the evaluation of foreign devices the participants had never used before might be comprehensible. However, reassessing one’s own, familiar device because of a brief statement it “gives” points to a sensitivity that developers, practitioners and researchers should be aware of. Apart from the valence, the feedback of an application installed on the phone might affect the assessment of the device itself. Perhaps the completion of certain tasks on the phone might elicit similar effects. Consequently, on the one hand, affirmative feedback could be implemented to improve reviews of a technological device (e.g. computer, smartphone, smart assistant) or a process (e.g., software, user experience). On the other hand, critical feedback should be handled with care. Future research will need to clarify whether these effects can be transferred to assessment procedures in general. Will reviews of a purchased product or a used service be more positive if the reviewer has received favorable feedback shortly before? From a methodological perspective, RQ1 refers to a basic question of transferring the CASA paradigm to smartphones: Is it possible to employ phones which are not owned by the participants? Back in the 1990s, the use of an experimental desktop PC provided by the researchers was easier to justify than the use of a foreign smartphone.
Smartphones are normally used exclusively by one owner. However, the results of RQ1 indicate that both options are viable: one’s own device as well as a device provided by the researcher. RQ2 concentrated on the voices of the smartphones to test for potential gender effects (Lee et al., 2000). Using “talking” smartphones inevitably results in an ascription of gender (Singh & Murry, 1978). The female and the male voice were pretested to be equally friendly. A test of equivalence supported equivalence prior to the feedback regarding both friendliness and competence. The experimental manipulation changed results to some extent. The gender of the voice the phone answered in did not affect participants’ evaluations of friendliness but did affect the evaluation of competence. Polite male phones were rated as more competent than impolite male phones, and impolite female phones were rated as more competent than impolite male phones. The interaction between type of feedback and gender again emphasizes the importance of gender. In line with the CASA paradigm, “speaking” technology seems to elicit gender ascriptions resulting in gender stereotypes that developers should be aware of. To conclude, “male phones” seem to be especially vulnerable to the effects of impoliteness, which suggests that male-voiced technological assistants should be implemented carefully. To address results and interpretations in depth, our first hypothesis focused on the differences between participants’ evaluations depending on whether they had received polite or impolite feedback from the phone. Supporting H1a (friendliness) and H1b (competence), the analysis revealed significant
differences at the second evaluation time, with worse evaluations of impolite phones. In line with social norms of human interactions, an impolite smartphone is sanctioned. However, descriptive analysis reveals that impolite and polite phones were not evaluated equivalently even before the manipulation took place: Although the competence of the impolite phone is not statistically significantly lower, equivalence cannot be assumed (MD = .10, p = .11). This needs to be considered when interpreting the significant difference after the manipulation, although we assume the slight difference to be rather random as the manipulation had not taken place yet. Future research needs to clarify this effect. Hypotheses 2 and 3 asked for longitudinal differences. Data analysis supports hypothesis 2. The phone was rated as friendlier after it “replied” politely, indicating a positive politeness strategy (Brown & Levinson, 1978). Moreover, results reveal that the evaluation is affected domain-specifically. Supporting hypothesis 2b, friendliness but not competence is affected by politeness: mean evaluations of competence are nearly the same for ratings before and after the polite feedback, indicating that competence ratings might be task-oriented and therefore not affected by the smartphone’s politeness. In line with human interactions, politeness seems to be fundamental for social relationships but not for task-related situations (Brown & Levinson, 1978). Thus, a politeness strategy does not result in a general improvement of evaluations but is limited to relevant domains. The third hypothesis focused on the participants receiving impolite feedback. Again, the data support our prediction. Impoliteness of the phone results in users devaluing both friendliness (H3a) and competence (H3b).
Because a direct sanction of the phone’s face threatening act was hardly possible during the interaction task, participants seem to have seized the opportunity to give a worse evaluation at the end of the session (Blake & Davis, 1964; Brown & Levinson, 1978). In contrast to the effects of polite feedback (H2), the phone was devalued in terms of both friendliness and competence. Similar to human-human interactions, violating social norms seems to elicit negative emotional reactions (Goffman, 1967) and offence (Culpeper, 1996), resulting in more general negative sentiments towards the perpetrator. Regarding fields of application, this result is highly relevant. While the positive effects of polite behavior are limited to associated domains, the reactions to impolite phones indicate devastating consequences. Real usage scenarios often provide delayed options to give feedback at the end of the interaction, which users might use to “take revenge”. And this revenge might be more general. Interactive applications or devices which give feedback to their users should be designed carefully to avoid fierce opposition elicited by devices allegedly violating social norms.
5.1 Limitations and Future Research While this study provides promising insights into the idea of conceptualizing smartphones as social actors, there are limitations as well as questions remaining unanswered regarding both methodological and theoretical considerations. Starting with methodology, the part of the experiment in which participants interacted with the phone needs to be reconsidered. First, our cover story told participants they were going to take part in an experiment on a search algorithm. In line with this story, participants were requested by the smartphone to disclose personal data. The level of privacy of this data increased
over time, starting with age and height but becoming more intimate when participants were asked for their total number of sexual partners and their banking PIN. Disclosing such intimate data might be associated with embarrassment as well as a potential security risk. Participants sensitive to these issues might become mistrustful and react reluctantly, which in turn may result in biased evaluations of the smartphone. Future research should consider a less offensive cover story. Second, participants were addressed by the phones via voice output. However, they could not respond by speech but needed to enter their answers via touchscreen. Future research should integrate voice input on the part of the respondents to increase usability. Voice input would prevent a change of communication channel, resulting in a more intuitive interaction: answering vocal signals by sending vocal signals. A consistent modality would be a further step towards bringing CASA to the world of modern technology, as both current applications and current hardware releases are already capable of speech recognition, potentially reinforcing effects of media equation. Consequently, voice-controlled intelligent personal assistant services like Amazon Echo or Google Home constitute interesting research objects for modern CASA research. Third, the feedback the smartphone gave was not tailored to the participants’ answers but was held constant to increase comparability. However, participants who had performed well but received negative feedback might devalue the phone because of the non-matching feedback. Future research should test tailored feedback. Fourth, the measures implemented in this study (and in CASA research in general) should be evaluated critically. Both friendliness and competence were assessed by adopting the items introduced by Nass et al.
(1997; 1999), which we transferred into a forced yes-no task with applied time pressure to minimize both conscious manipulations and the influence of social desirability, to which standard self-report measures are prone. However, being forced to choose between only these two options might not allow for sufficiently fine-grained judgments. Future research should focus on the development of new measures to gain more valid insights into the participants’ assessment of the “characteristics” of devices. Similarly, future studies will need to clarify what exactly participants evaluated. The items of our measures referred explicitly to “the smartphone”. Thus, we conclude that the participants referred to “the smartphone” they had just interacted with when answering the items. However, this conclusion is not backed by compelling evidence. Especially the group of participants devaluing their own phone because of a brief manipulation raises questions. Future studies will need to differentiate points of reference (device, application, developer) by comparing different measures and different. Furthermore, studies might involve the assessment of physiological components, which are interpreted as non-reactive and therefore less manipulable (Kluger, Lewinsohn, & Aiello, 1994). Moreover, Nass et al. (1999) demonstrated that evaluations were more positive when the computer itself asked for them. Thus, future research will need to manipulate the place of evaluation. Finally, students seem to be an appropriate starting point for research on smartphones because they represent the age group using the device most intensively. In consequence, they were suitable participants for a first study. However, they might be particularly skeptical regarding the experimental
scenario with “talking” phones. Future studies should focus on less experienced participants. Further, sample size and statistical power need to be increased. Regarding the theoretical background, the idea of politeness as well as the basic idea of CASA need to be reconsidered. Starting with politeness, this study mainly refers to the concept introduced by Brown and Levinson (1978), which is rather a linguistic tool to strategically shape social interaction in order to avoid or at least attenuate conflicts (Eelen, 2001; Ide, 1989). Although Brown and Levinson claim their theory to be culturally universal, this universality is questioned, and cultural sensitivity is assumed (Ide, 1989). Consequently, to generalize our findings, the concept of politeness we refer to needs to be reassessed considering possible intercultural differences (Ide, 1989; Hill, Ide, Ikuta, Kawasaki & Ogino, 1986). Further, the experimental manipulation was rather minimal. The experimental conditions only differed regarding four sentences of feedback. Additionally, the impoliteness of the “impolite phone” was not very drastic, with the impolite condition being “not polite” rather than explicitly “impolite” (“Unfortunately I cannot thank you for your cooperation. You did not take enough time to support me. I do not appreciate your support and your feedback.”). Moreover, the task participants accomplished was not of real relevance for them. Future research should focus on more realistic tasks, e.g. health apps or fitness apps which give feedback on real performance. In addition, the phenomenon of media equation is not limited to politeness. The CASA paradigm has been applied to a variety of social norms, offering a fruitful approach for further research on human-smartphone interaction. Consequently, future work can draw on a body of research approaches known from CASA studies, e.g. 
the analysis of gender stereotypical behavior (Nass et al., 1997), affiliation with a team (Nass et al., 1996) or reciprocal behavior (Fogg & Nass, 1997a). However, these CASA studies revealed contradictory results, leaving open questions which future research should address. Hence, the basic idea of computers constituting “social actors” needs to be refined. Kiesler and Sproull (1997) reject the conclusion drawn by Nass and his colleagues and argue that users may not actually consider a device to be a social actor, but still follow social rules when interacting with this device. In line with the common phrase “if it walks like a duck and talks like a duck, we are going to treat it like a duck”, Kiesler and Sproull reason that participant behavior may rather be a consequence of the information they have in the specific experimental setting. In this view, participant behavior could not be regarded as the result of an ascription of humanity to the device, thus rejecting the idea of the “social actors” concept. The evolutionary perspective we introduced here might serve to integrate these two conflicting positions. Evolutionary psychology considers the human body, brain, and extensive parts of human behavior as a product of evolved adaptations. These adaptations enabled our ancestors to solve adaptive problems in their specific environment. The human species evolved to live in social groups, so detection of other human beings and production of appropriate social behavior was important. This resulted in evolved psychological mechanisms which allowed for automatic - and therefore rapid - detection of and reaction towards other human beings. These mechanisms still shape today’s human behavior. However, they were shaped to fit an ancient world
without today’s new media and technology. Therefore, seemingly social cues sent by today’s technological devices can elicit automatic social reactions that were originally designed for human interaction. Thus, social reactions towards computers do not depend on a conscious assumption of a “social actor”, and we share Kiesler and Sproull’s (1997) cautious position on deducing conscious attributions of humanity from the existence of social reactions. However, social reactions towards computers are produced by evolved psychological mechanisms for social interaction with human conspecifics and are triggered rather unconsciously. Within the automatic functioning of our evolved mechanisms, computers, smartphones, and digital assistants assume the roles of our ancestors’ human counterparts and might - with all due care - be regarded as “social actors”. This evolutionary perspective then raises questions about how information from automatic, evolved functioning may in turn inform conscious behavior, attributions and interaction with technology.
6. Conclusion
Social psychologists have extensively studied the effects of politeness within human-human interactions. Studies in the field of human-computer interaction have drawn on this body of research. The present study aims to move this idea forward and to meet the requirements of modern technology. Thus, we argue for focusing on human-smartphone interaction to analyze smartphones as social actors (SASA). Our findings support the SASA paradigm in two ways: First, applying the methodological approach of CASA studies established for desktop PCs to modern smartphones does work. Second, first empirical evidence has been found that phones are considered social actors eliciting reactions according to social norms: smartphones sending social cues trigger reactions comparable to those triggered by human interaction partners. It seems as if we (unconsciously) react socially regarding attributions and responses. In sum, we interpret our findings as a first step towards the confirmation of our idea of the SASA paradigm. Consequently, our findings are worth considering in terms of designing human-machine interaction. It takes no more than simple vocal cues to elicit social responses. Designers should be aware of the human sensitivity to cues which might be interpreted as social. On the one hand, designers might avoid certain (negative) reactions to their product. On the other hand, they could design their product to trigger certain (desired) reactions. Furthermore, our study calls for future research from various scientific perspectives. As our approach required only basic programming skills, researchers from various disciplines can implement phones as social actors and thus carry out studies efficiently. In addition, other technologies such as robots and speech assistants, which are currently capturing the market, should be studied. Returning to our introductory example of mobile devices or apps giving feedback on users’ performance e.g. 
regarding health activities, this feedback needs to be carefully phrased. Negative feedback “communicated” by the smartphone might result in a devaluation not only of the app but also of the smartphone itself. Our findings suggest complementing a critical review with e.g.
encouraging or motivating statements to lessen possible negative effects. By transferring a research approach known from the 1990s into an increasingly digitalized world in which devices are becoming mobile and talkative, we showed the resulting SASA paradigm to be a promising and heuristically fruitful approach to exploring the psychological effects of modern, mobile technology.
References
Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of personality and social psychology, 70(3), 614-636. doi:10.1037//0022-3514.70.3.614
Blake, J., & Davis, K. (1964). Norms, Values, and Sanctions. Handbook of Modern Sociology, 456–484.
Bland, J. M., & Altman, D. G. (1997). Statistics notes: Cronbach's alpha. BMJ, 314(7080), 572. doi:10.1136/bmj.314.7080.572
Brown, P., & Levinson, S. C. (1978). Universals in language usage: Politeness phenomena. Questions and politeness: strategies in social interaction, 56–311.
Brown, B. L., Strong, W. J., & Rencher, A. C. (1973). Perceptions of personality from speech: Effects of manipulations of acoustical parameters. The Journal of the Acoustical Society of America, 54(1), 29-35.
Buss, D. (2015). Evolutionary psychology: The new science of the mind (5th ed.). New York, NY: Routledge.
Carolus, A., Binder, J. F., Muench, R., Schmidt, C., Schneider, F., & Buglass, S. (in press). Smartphones as digital companions: Characterizing the relationship between users and their phones. New Media & Society.
Carolus, A., Schmidt, C., Muench, R., Mayer, L., & Schneider, F. (2018). Pink stinks - at least for men. How minimal gender cues affect the evaluation of smartphones. In M. Kurosu (Ed.), Human-Computer Interaction. Interaction in Context. HCI 2018. Lecture Notes in Computer Science, vol 10902 (pp. 512-525). Springer, Cham. doi:10.1007/978-3-319-91244-8_40
Carolus, A., Schmidt, C., Schneider, F., Mayr, J., & Münch, R. (2018). Are people polite to smartphones? How evaluations of smartphones depend on who is asking. In M. Kurosu (Ed.), Human-Computer Interaction. Interaction in Context. HCI 2018. Lecture Notes in Computer Science, vol 10902 (pp. 500-511). Springer, Cham. doi:10.1007/978-3-319-91244-8_39
Cialdini, R., & Trost, M. (1998). Social influence: Social norms, conformity and compliance. In D. T. Gilbert, S. T. Fiske & G. Lindzey (Eds.), The Handbook of Social Psychology (4th ed., pp. 151-192). Boston: McGraw-Hill.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum Associates.
Culpeper, J. (1996). Towards an anatomy of impoliteness. Journal of Pragmatics, 25(3), 349–367. doi:10.1016/0378-2166(95)00014-3
Dixon, W. J., & Massey, F. J. (1950). Introduction To Statistical Analysis. New York: McGraw-Hill Book Company.
Eelen, G. (2001). A Critique of Politeness Theories. Manchester: St. Jerome Publishing.
Finkel, S. E., Guterbock, T. M., & Borg, M. J. (1991). Race-of-interviewer effects in a preelection poll: Virginia 1989. Public opinion quarterly, 55(3), 313-330.
Fogg, B. J., & Nass, C. (1997a). How users reciprocate to computers: an experiment that demonstrates behavior change. In Extended Abstracts of the CHI’97 Conference of the ACM/SIGCHI (pp. 331–332). New York: ACM Press. doi:10.1145/1120212.1120419
Fogg, B. J., & Nass, C. (1997b). Silicon sycophants: the effects of computers that flatter. International Journal of Human-Computer Studies, 46(5), 551–561. doi:10.1006/ijhc.1996.0104
Gehl, R. W., & Bakardjieva, M. (2016). Socialbots and Their Friends: Digital Media and the Automation of Sociality. New York, London: Routledge.
Goffman, E. (1967). Interaction ritual: Essays on face-to-face behavior. New York: Anchor Books.
Goldstein, M., Alsiö, G., & Werdenhoff, J. (2002). The media equation does not always apply: People are not polite towards small computers. Personal and Ubiquitous Computing, 6(2), 87-96. doi:10.1007/s007790200008
Grice, H. P. (1975). Logic and Conversation. In P. Cole and J. Morgan (Eds.), Syntax and Semantics (pp. 41-58). New York: Academic Press.
Hagen, E. (2002). The evolutionary psychology. Retrieved from http://www.anth.ucsb.edu/projects/human/epfaq/ep.htm
Hechter, M., & Opp, K. D. (2001). Social norms. New York: Russell Sage Foundation.
Hill, B., Ide, S., Ikuta, S., Kawasaki, A., & Ogino, T. (1986). Universals of linguistic politeness: Quantitative evidence from Japanese and American English. Journal of pragmatics, 10(3), 347-371.
Ide, S. (1989). Formal forms and discernment: Two neglected aspects of universals of linguistic politeness. Multilingua-journal of cross-cultural and interlanguage communication, 8(2-3), 223-248.
Johnson, D., & Gardner, J. (2007). The media equation and team formation: Further evidence for experience as a moderator. International Journal of Human-Computer Studies, 65(2), 111-124. doi:10.1016/j.ijhcs.2006.08.007c
Johnson, D., & Gardner, J. (2009). Exploring mindlessness as an explanation for the media equation: a study of stereotyping in computer tutorials. Journal of Personal and Ubiquitous Computing, 13, 151-163. doi:10.1007/s00779-007-0193-9
Johnson, D., Gardner, J., & Wiles, J. (2004). Experience as a moderator of the media equation: the impact of flattery and praise. International Journal of Human Computer Studies, 61(3), 237–258. doi:10.1016/j.ijhcs.2003.12.008
Kamm, C., Walker, M., & Rabiner, L. R. (1997). The Role of Speech Processing in Human-Computer Intelligent Communication. Speech Communication, 23, 263–278. doi:10.1016/s0167-6393(97)00059-9
Kim, K. J. (2014). Can smartphones be specialists? Effects of specialization in mobile advertising. Telematics and Informatics, 31(4), 640-647. doi:10.1016/j.tele.2013.12.003
Kiesler, S., & Sproull, L. (1997). “Social” Human-Computer Interaction. In B. Friedman (Ed.), Human Values And The Design of Computer Technology (pp. 191-199). Cambridge: University Press.
Kluger, A. N., Lewinsohn, S., & Aiello, J. R. (1994). The influence of feedback on mood: Linear effects on pleasantness and curvilinear effects on arousal. Organizational Behavior and Human Decision Processes, 60(2), 276-299. doi:10.1006/obhd.1994.1084
Lang, H., Klepsch, M., Nothdurft, F., Seufert, T., & Minker, W. (2013). Are computers still social actors?. In CHI'13 Extended Abstracts on Human Factors in Computing Systems (pp. 859-864). New York, NY: ACM. doi:10.1145/2468356.2468510
Langer, E. J. (1989). Minding matters: The consequences of mindlessness-mindfulness. Advances in experimental social psychology, 22, 137-173. doi:10.1016/S0065-2601(08)60307-X
Lee, E. J., Nass, C., & Brave, S. (2000). Can computer-generated speech have gender?: an experimental test of gender stereotype. In CHI'00 Extended Abstracts on Human Factors in Computing Systems (pp. 289-290). New York: ACM. doi:10.1145/633292.633461
Lee, Y. K., Chang, C. T., Lin, Y., & Cheng, Z. H. (2014). The dark side of smartphone usage: Psychological traits, compulsive behavior and technostress. Computers in Human Behavior, 31, 373-383. doi:10.1016/j.chb.2013.10.047
Locher, M. A., & Bousfield, D. (2008). Introduction: Impoliteness and power in language. In M. A. Locher & D. Bousfield (Eds.), Impoliteness in Language: Studies on its Interplay with Power in Theory and Practice (pp. 1-13). New York, NY: Mouton de Gruyter.
Massaro, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. London: MIT Press.
Moon, Y., & Nass, C. (1996). How “real” are computer personalities? Psychological responses to personality types in human-computer interaction. Communication research, 23(6), 651-674.
Mou, Y., & Xu, K. (2017). The media inequality: Comparing the initial human-human and human-AI social interactions. Computers in Human Behavior, 72, 432-440.
Nass, C., & Brave, S. (2005). Wired for speech: How voice activates and advances the human-computer relationship. Cambridge, MA: MIT press.
Nass, C., Fogg, B. J., & Moon, Y. (1996). Can computers be teammates?. International Journal of Human-Computer Studies, 45(6), 669-678. doi:10.1006/ijhc.1996.0073
Nass, C., & Gong, L. (2000). Speech interfaces from an evolutionary perspective. Communications of the ACM, 43(9), 36–43. doi:10.1145/348941.348976
Nass, C., & Lee, K. M. (2001). Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. Journal of Experimental Psychology: Applied, 7(3), 171–181. doi:10.1037//1076-898x.7.3.171
Nass, C., & Moon, Y. (2000). Machines and Mindlessness: Social Responses to Computers. Journal of Social Issues, 56(1), 81–103. doi:10.1111/0022-4537.00153
Nass, C., Moon, Y. M., & Carney, P. (1999). Are people polite to computers? Responses to computer-based interviewing systems. Journal of Applied Social Psychology, 29(5), 1093–1110. doi:10.1111/j.1559-1816.1999.tb00142.x
Nass, C., Moon, Y., Fogg, B. J., Reeves, B., & Dryer, D. C. (1995). Can computer personalities be human personalities?. International Journal of Human-Computer Studies, 43(2), 223-239. doi:10.1006/ijhc.1995.1042
Nass, C., Moon, Y., & Green, N. (1997). Are machines gender neutral? Gender‐stereotypic responses to computers with voices. Journal of applied social psychology, 27(10), 864-876. doi:10.1111/j.1559-1816.1997.tb00275.x
Nass, C., & Steuer, J. (1993). Voices, boxes, and sources of messages. Human Communication Research, 19(4), 504-527. doi:10.1111/j.1468-2958.1993.tb00311.x
Nass, C., Steuer, J., & Tauber, E. R. (1994, April). Computers are social actors. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 72-78). ACM.
Pinker, S. (1994). The Language Instinct. New York, NY: Harper Perennial Modern Classics.
Reeves, B., & Nass, C. (1996). The Media Equation. How people treat computers, television, and new media like real people and places. CSLI Publications and Cambridge university press.
Rempel, J. K., Holmes, J. G., & Zanna, M. P. (1985). Trust in close relationships. Journal of personality and social psychology, 49(1), 95. doi:10.1037//0022-3514.49.1.95
Scherer, K. R., Banse, R., Wallbott, H. G., & Goldbeck, T. (1991). Vocal cues in emotion encoding and decoding. Motivation and emotion, 15(2), 123-148. Retrieved from https://doi.org/10.1007/BF00995674
Shechtman, N., & Horowitz, L. M. (2003, April). Media inequality in conversation: how people behave differently when interacting with computers and people. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 281-288). ACM.
Sherif, M. (1936). The psychology of social norms. Journal for the Theory of Social Behaviour, 41, 53–76. Retrieved from https://doi.org/10.1086/217439
Singh, S., & Murry, T. (1978). Multidimensional classification of normal voice qualities. The Journal of the Acoustical Society of America, 64(1), 81-87. Retrieved from http://dx.doi.org/10.1121/1.381958
Slobin, D. I. (1971). Psycholinguistics. Illinois: Scott Foresman.
Stephan, E., Liberman, N., & Trope, Y. (2010). Politeness and Psychological Distance: A Construal Level Perspective. Journal of Personality and Social Psychology, 98(2), 268–280. http://doi.org/10.1037/a0016960
Stock, J. (2017, September 8). The best health apps that put a doctor in your pocket. Retrieved from http://www.telegraph.co.uk/business/open-economy/apps-to-improve-health-and-fitness/
Triandis, H. C. (1994). Culture and social behavior. New York: McGraw-Hill Book Company.
Van de Mortel, T. F. (2008). Faking it: social desirability response bias in self-report research. Australian Journal of Advanced Nursing, 25(4), 40-48. Retrieved from http://www.ajan.com.au/Vol25/Vol_25-4_vandeMortel.pdf
Wang, W. (2017). Smartphones as Social Actors? Social dispositional factors in assessing anthropomorphism. Computers in Human Behavior, 68, 334-344. Retrieved from https://doi.org/10.1016/j.chb.2016.11.022
Highlights
results support research paradigm conceptualizing “smartphones as social actors”
smartphones elicit reactions according to social norm of politeness
phone “talking impolitely” was devalued regarding friendliness and competence
phone “talking politely” was revaluated regarding friendliness but not competence
gender of the phone impacted the evaluation of “impolite phones”