Audio channel constraints in video-mediated communication


Interacting with Computers 16 (2004) 1069–1094 www.elsevier.com/locate/intcom

Alison Sanford (a,*), Anne H. Anderson (b), Jim Mullin (b)

(a) Psychology, University of Strathclyde, 40 George Street, Glasgow G1 1QE, UK
(b) University of Glasgow, 58 Hillhead Street, Glasgow G12 8QQ, UK

Received 12 February 2004; revised 23 June 2004; accepted 30 June 2004
Available online 7 August 2004

Abstract

This study investigated the effects of two types of audio channels upon the effectiveness of task-based interactions in a video-mediated context (VMC). Forty undergraduates completed a collaborative task (The Map Task) using either a full or half-duplex audio channel. Their performance was compared to face-to-face interactions, taken from the Human Communication Research Centre corpus of Map Task Dialogues. Effects of varying the audio channel were explored by comparing task performance, patterns of speech, and establishment of mutual understanding. Users of the full-duplex VMC made insufficient allowance for the VMC context; they completed the task less accurately than face-to-face participants, and interrupted each other more frequently than other participants. Participants in the half-duplex VMC however performed as well as face-to-face participants. They made sensible adaptations to the constraints imposed by the half-duplex VMC context, producing longer dialogues, with more explicit turn-taking management, and taking greater care in establishing mutual knowledge.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Video-mediated communication; Audio channel configuration; Task performance; Patterns of speech; Discourse analysis; Adaptations

* Tel.: +44-141-548-2696; fax: +44-141-548-4001. E-mail address: [email protected] (A. Sanford).
0953-5438/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.intcom.2004.06.015

There is an expanding interest in the use of desktop Video Mediated Communication (VMC) for a wide range of applications. For example, VMC is being used in distance education (Fels and Weiss, 2000; Knowles and Dillon, 1996; McAndrew et al., 1996; Robinson, 1993), telemedicine (Furnace et al., 1996; Reiss et al., 1996; Tröster et al., 1995) and telepsychiatry (Manning et al., 2000) and to facilitate group meetings in the research and business sectors (Carletta et al., 2000;


O’Conaill et al., 1993; O’Conaill and Whittaker, 1997). Much of the research in this area has focused upon the visual channel of communication, with some debate occurring over the advantages of adding a visual channel in mediated communication (e.g. Daly-Jones et al., 1998; Finn, 1997). The effects of the configuration and quality of the audio channel available in VMC have attracted considerably less attention. This is surprising given earlier findings (see, for example, O’Malley et al., 1996; Tang and Isaacs, 1993; Williams, 1977) that the quality of the audio channel can have a profound effect upon communication.

In the present study we seek to establish the effects of two different types of audio channel (half-duplex and full duplex) upon task-based interactions in VMC, by varying the type of audio channel made available to participants whilst holding the quality of the visual channel constant. We will determine whether changes in the audio channel affect the process of communication by examining the quality of task performance, turn management and structure of the dialogues, and processes involved with establishing mutual understanding. Comparisons will be made with data collected from face-to-face interactions. Analysis of this range of factors will enable us to determine the effectiveness of the task-based interactions, and any cognitive or social costs incurred due to changes in the audio signal.

Several points emerge from earlier literature. The first point relates to task performance. Early research has shown that communication in computer or technology-mediated settings rarely has a detrimental effect upon collaborative task performance (e.g. Anderson et al., 1996; Cohen, 1984; Doherty-Sneddon et al., 1997; Ochsman and Chapanis, 1974; Williams, 1977). Several exceptions to this general finding have been reported. O’Malley et al. (1996) found that task performance in a collaborative problem solving task (The Map Task, Brown et al., 1984) was reduced when participants completed the task over videophones. O’Malley et al. suggest that poorer task performance was associated with an increased rate of interruption, which occurred in videophone conversations because of the ‘delay’ in transmission of the audio signals (the reasons for which are explained below). A reduction in task performance has also been observed when novice users of text-based conferencing systems interact with each other (e.g. McGrath et al., 1993; Newlands et al., 2003); however, performance improved as the novices gained experience of the system. The effect of availability and quality of audio signals upon interpersonal communication requires investigation.

The second point arises from research into the impact of VMC. The audio channel in many VMC systems does not provide the quality of audio we are accustomed to when speaking over the telephone or in face-to-face interactions. Two problems illustrate this point clearly: transmission lag and the use of full or half-duplex audio channels. Problems with transmission lag (visual signals being received slightly after the audio signals) occur because it takes more time to digitise and compress a video image for transmission than an audio signal (Angiolillo et al., 1997). It is difficult, therefore, to keep the two sets of signals together when they are transmitted over the network. Whilst it is possible to build in a delay in the audio channel, so that the signals arrive at approximately the same time, it is difficult to get the synchronisation perfect.

Whether a VMC system can support full duplex (or ‘open channel’) or half-duplex audio depends on the type of network being used. Open channel audio would often be preferred, but it can be problematic due to the effects of ‘echo’ and ‘feedback’ (O’Conaill et al., 1993). These problems can be overcome by using uni-directional microphones, and by ‘dampening’ the rooms in which the video-conferencing occurs. One alternative is to use half-duplex audio tools, transmitting only one voice at a time. Switching between speakers can be controlled by a ‘voice activated’ mechanism (the first person to talk gains control of the audio


channel), or it can be controlled manually using a ‘click to speak’ audio tool. The latter is often preferred because voice-activated switching can be disrupted by background noise, which can result in the audio channel being switched from the current speaker before that person has completed their turn (O’Conaill et al., 1993). In summary, the audio channel in VMC frequently differs from more familiar communicative settings, due to delay in the audio signal, audio and visual signals being out of synch with each other, and the use of click to speak audio tools. These factors can affect the quality of the audio link between participants.

The third point emerging from the literature is that the effects of varying the quality and availability of a visual channel have attracted much attention, but systematic comparisons of variations in the audio channel are rare. With regard to the visual channel, there is a growing consensus that participants prefer to use a mediated context that allows them to see who they are talking to, perhaps because provision of a visual channel enhances feelings of ‘social presence’ (Anderson et al., 1997; Daly-Jones et al., 1998; Gale, 1990; Olson et al., 1997; Sellen, 1995). Further research (Anderson et al., 1996, 1997; Daly-Jones et al., 1998; O’Conaill et al., 1993; O’Conaill and Whittaker, 1997; Sellen, 1992, 1995) has examined the impact of the visual channel upon patterns of speech in VMC. For example, Sellen (1992, 1995) compared patterns of speech in three multiparty video-conferencing systems (Picture-in-Picture (PIP), Hydra and LiveWire) with face-to-face and audio conferencing. Each system provided good quality visual signals, but they differed in the way in which they displayed the videos of the participants. Sellen found very few effects of communicative context; length and frequency of turns were similar across all VMC contexts, and did not vary significantly from face-to-face or audio-only interactions. However, group discussions in the Hydra and PIP settings were more formal, and contained fewer interruptions than face-to-face discussions.

In a similar manner, O’Conaill et al. (1993) and O’Conaill and Whittaker (1997) evaluated the effects of two VMC systems that provided either good or low quality audio and visual signals. The first VMC system (ISDN) provided poor quality video and audio signals; the temporal resolution of the video signal was low, and the audio channel was half-duplex and not in synch with the visual signals. The second VMC system (LIVE-NET) provided full duplex audio and broadcast quality video with negligible transmission delay. Overall, the findings by O’Conaill et al. (1993) and O’Conaill and Whittaker (1997) show that patterns of speech were more disrupted in ISDN than LIVE-NET meetings. ISDN discussions were more formal, turns of speech were longer and they contained very few interruptions or backchannels. When speech did overlap (usually due to simultaneous starts) it was due to lag in the transmission of the signals; speakers tried to avoid this by making greater use of explicit forms of turn management (for instance, naming the next speaker). Discussions over the LIVE-NET system more closely resembled face-to-face interactions, but still contained fewer interruptions and more explicit turn management, suggesting a more formal style of interaction despite better quality of visual and audio signals. However, it is difficult to know whether this effect is due to the lag in transmission of signals, to the half-duplex audio channel, to the impoverished video images, or indeed to the joint effect of all of these features of the ISDN system. O’Conaill et al. (1993) suggest that further laboratory based research is required to tease apart the relative impact of low quality video and audio signals upon VMC collaborations.

The fourth point concerns the processes involved in establishing mutual understanding, or ‘grounding’ (Clark and Wilkes-Gibbs, 1986; Isaacs and Clark, 1987; Clark


and Schaefer, 1987, 1989). The process of grounding ensures that each participant has understood previous contributions to the conversation to a level sufficient for their current purpose, and is a crucial factor ensuring effective communication. Clark and Brennan (1991) propose that the way in which grounding is achieved will be determined by the purpose of the conversation. More importantly for this study, the process of grounding also changes with the medium of communication. Clark and Brennan (1991) suggest that different communication contexts can be described in terms of the grounding constraints that they afford; the process of grounding in VMC should be facilitated by the grounding constraints of visibility, audibility, cotemporality, and sequentiality. That is, participants will be able to make use of the visual and spoken channels to ensure that they have understood each other. The quality of the video and audio links provided by a VMC system may restrict the use of these grounding constraints. Changes in patterns of speech found in VMC could change the process of grounding, or make it more difficult to complete successfully. This point is supported by Isaacs and Tang (1994), who found that even small transmission lags disrupted the process of grounding, and reduced levels of user satisfaction. The impact of these factors on task performance and communicative effectiveness still needs to be examined.

With these points in mind, this study will examine the effect of using two differently configured audio channels (full duplex and half-duplex) to determine whether restraining access to the spoken channel has any impact on establishing effective communication during a collaborative task. Pairs of participants completed one version of the Map Task (Brown et al., 1984) whilst communicating remotely via a desktop video-mediated conferencing system, which provided video signals of moderate quality and either full or half-duplex audio. The Map Task was chosen because it affords an objective measure of the quality of the interactions between participants; a low task accuracy score (that is, a small deviation from the target route) indicates that participants communicated and collaborated in an effective manner. Other measures were included so that the effectiveness of turn-taking procedures could be ascertained. Following the work by O’Conaill and Whittaker (1997) and Sellen (1992, 1995), this analysis was based on the length of the dialogues, the number of turns of speech, the length of turns, and rate of interruptions. In addition, the way in which mutual understanding was established in each communicative context was analysed via Conversational Games Analysis (Kowtko et al., 1992). This analysis examines a range of communicative strategies used by participants as they try to establish that they have understood each other’s contributions to the dialogue. Comparisons will also be made between the VMC interactions and dialogues taken from the Human Communication Research Centre (HCRC) corpus of Map Task dialogues. The HCRC dialogues took place in a face-to-face context, and provide a baseline set of measures against which the effectiveness of the communication and collaboration in the two VMC contexts can be evaluated.

Conversational Games Analysis (CGA) provides a framework for looking at the pragmatic function and content of utterances, and how they are used to achieve Conversational Goals. It is derived from Artificial Intelligence models of communication (e.g. Houghton and Isard, 1987) and was developed to examine goal directed exchanges that occur during the Map Task. The analysis involves coding every utterance in terms of what the speaker is attempting to achieve, based upon the function of the utterance rather


than its linguistic form or content. CGA is based upon two hierarchically related levels of dialogue structure, Conversational Games and Conversational Moves. Conversational Games are units of linguistic interaction, consisting of the series of initiating and response Moves required to fulfil the purpose, or ‘conversational goal’, of the interaction. A Conversational Game takes the name of the Move that initiated the Game. For instance, an INSTRUCT Game is initiated by an Instruct Move, and consists of the Moves required to complete the instruction. Fig. 1 (taken from Carletta et al., 1997) demonstrates how the conversational games are defined.

Fig. 1. Conversational move categories.


Although some Games can be accomplished with just an Initiating Move, others require several Conversational Moves to complete a Game. In addition, one Conversational Game can be nested (embedded) within another Game. For instance, during an INSTRUCT Game a participant may wonder if she has correctly understood an instruction. She may ask a question to seek clarification of the instructions; this initiates a new Conversational Game (in this case a CHECK Game) within the already existing INSTRUCT Game. An example of an INSTRUCT Game with an embedded CHECK Game is given below:

Example 1. An INSTRUCT Game
IG: Now go straight down about 5 centimetres (Instruct Move)
CHECK Game embedded.
IF: Towards crane bay at the bottom? (Check Move)
IG: Yes (Reply-yes) stop a few millimetres from crane bay (Clarify Move)
End of embedded CHECK Game
IF: Okay done that (Acknowledge Move)
End of INSTRUCT Game

CGA was chosen for this study for three reasons. Firstly, the distribution of the Conversational Games and Moves (several specifically highlighting aspects of the processes of grounding) could illustrate the effects of varying the configuration of the audio channel in the two VMC contexts and how this might affect the process of grounding. Secondly, the reliability of the coding has been demonstrated (Carletta et al., 1997; Kowtko et al., 1992). Thirdly, the analysis has been applied to a large corpus of dialogues (e.g. The Map Task Corpus, Anderson et al., 1991) and to dialogues undertaken in video mediated and computer mediated contexts (e.g. Anderson et al., 1997; Doherty-Sneddon et al., 1997; Newlands et al., 2000; O’Malley et al., 1996). Results show that several of the Moves and Games in CGA are affected by communicative context, including Moves that are used to establish grounding (such as CHECK and ALIGN Games). Doherty-Sneddon et al. (1997) report the results from two studies which compared VMC and audio-conferencing (VMC with the video channel disabled) with face-to-face and co-present spoken only dialogues. They found a significant increase in use of ALIGN Games when comparing audio conferencing with VMC, and suggest that the ALIGN Games were used as substitutes for non-verbal forms of alignment in the audio conferencing context.
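The two-level structure of Games and Moves, including the embedding shown in Example 1, can be sketched as a small recursive data structure. This is only an illustrative encoding, not the coding tool used in the study; the class names `Move` and `Game` and their fields are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Move:
    speaker: str  # "IG" (Instruction Giver) or "IF" (Instruction Follower)
    kind: str     # Move category, e.g. "Instruct", "Check", "Acknowledge"
    text: str


@dataclass
class Game:
    name: str  # a Game takes the name of the Move that initiated it
    parts: List[Union[Move, "Game"]] = field(default_factory=list)

    def moves(self) -> List[Move]:
        """Return every Move in dialogue order, descending into embedded Games."""
        out: List[Move] = []
        for part in self.parts:
            if isinstance(part, Game):
                out.extend(part.moves())
            else:
                out.append(part)
        return out

    def depth(self) -> int:
        """Nesting depth: 1 for a Game that contains no embedded Games."""
        inner = [p.depth() for p in self.parts if isinstance(p, Game)]
        return 1 + max(inner, default=0)


# Example 1 from the text: a CHECK Game embedded in an INSTRUCT Game.
check = Game("CHECK", [
    Move("IF", "Check", "Towards crane bay at the bottom?"),
    Move("IG", "Reply-yes", "Yes"),
    Move("IG", "Clarify", "stop a few millimetres from crane bay"),
])
instruct = Game("INSTRUCT", [
    Move("IG", "Instruct", "Now go straight down about 5 centimetres"),
    check,
    Move("IF", "Acknowledge", "Okay done that"),
])

move_kinds = [m.kind for m in instruct.moves()]
print(move_kinds, instruct.depth())
```

Counting Games or Moves of a given type per dialogue then reduces to walking this tree, which is the kind of distributional comparison the analysis relies on.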

1. Method

Design. Two groups of participants completed one attempt at the Map Task using VMC. One group of participants used a VMC system with an open-channel audio link,


the other used the same VMC system but with a half-duplex audio channel. Their data are compared with data taken from the face-to-face condition of the HCRC Map Task corpus of dialogues. The design is therefore a between subjects design, with type of audio tool as the between groups factor (open channel vs. half-duplex vs. face-to-face).

Participants in VMC context. 40 undergraduate students from the University of Glasgow volunteered to take part in the VMC study. Participants were recruited in pairs who were familiar to each other; partners had known each other for at least two months (mean length of familiarity was 3.16 years, range 2 months–10 years).

Participants in face-to-face context. A sample of ten dialogues was taken from the HCRC corpus, these being the first attempts at the Map Task by participants in the face-to-face condition with a familiar partner (Anderson et al., 1991). Participants were taken from the University of Glasgow population, and had known each other for approximately 2 years.

Task and Materials. Three versions of the Map Task were used in the VMC part of the study. Maps 1 and 2 were used alternately in both of the VMC conditions. Map 3 was kept in reserve, and only used if participants had to restart a trial. The materials for the Map Task were pairs of schematic maps, on which landmarks are depicted by simple line drawings and labelled with their intended names. The maps were reproduced on A3 paper (30 cm by 42 cm). The maps were fastened onto hardboard, and fixed over the right or left side of the monitor screen, depending on the handedness of the participants. An example of a pair of the Maps is shown in Fig. 2. The task required one participant (the Instruction Giver) to tell the other participant (the Instruction Follower) about the route drawn on the Instruction Giver’s map, so that the Instruction Follower could reproduce it on his or her map as accurately as possible. Each pair of maps portrayed the same location, but there were several specific and intentional differences. For example, a safe route past the landmarks was already marked onto the Instruction Giver’s map, but this route was missing from the Instruction Follower’s map; some of the landmarks on one map did not appear on the other map. In the face-to-face version of the Map Task, pairs of participants were seated one on each side of a double-sided easel, on which the maps were displayed. The easel prevented participants from seeing each other’s maps whilst allowing a view of the face and upper half of their partner’s body.

Scoring The Map Task: Task performance was assessed following the procedure devised by Anderson et al. (1991); this involved calculating how accurately the route was reproduced on Instruction Followers’ maps. Route accuracy is defined as the deviation in centimetre squares between the expected route and the route drawn by the Instruction Follower. A low deviation score indicates that the Instruction Follower’s route closely resembles the original route, whilst a high deviation score indicates poor task accuracy.

Apparatus for the Video Mediated Contexts. Two SunSPARC 20 stations were operated over a dedicated local area network, running the operating system Solaris and OpenWindows 3.1 Graphical User Interface; this environment enabled the running of video (VIC) and audio (VAT) conferencing tools, both of which were publicly available. VIC provides full colour JPEG encoded video at 5–6 frames per second, and was used in both VMC conditions. The visual input was provided via 2 JVC Videomovie GR.AX60


Fig. 2. Examples of an instruction giver’s map (left) and an instruction follower’s map.


compact VHS recorders, which were centrally placed over the monitor. A very small lag between the visual and audio signals occurred in both VMC systems. The audio tools used are described below.

Open channel audio. An audio link was run between the two rooms in which the SunSPARC stations were situated, providing full duplex sound. Participants communicated via SHURE SM2 dual receiver head and boom mounted microphones. This provided the subjects with full duplex (open channel) audio. The audio output was combined (using a MACKIE micro Series 1202 12-Channel Mic/line Mixer), and an analogue recording made via a JVC KD A33 stereo Cassette recorder.

Half-duplex audio. In this VMC set-up the audio tool was VAT 3.4, which was set for a ‘click to speak’ method of channel activation. The audio was captured on SunSPARC microphones, and relayed to each participant on the computer’s loud speakers. The spoken dialogues were recorded using a Sony Professional Walkman Stereo Cassette-recorder WM-D6C.

Monitor Configuration. For a right-handed subject the A3 paper version of the map was located over the right-hand side of the monitor screen, and the video window and audio tool (when in use) were positioned on the left-hand side. This was reversed for left-handed participants.

Procedure. The two types of audio tool were used on alternate days, starting with the open channel audio set-up. The allocation of subjects to the two conditions of audio tool was therefore quasi-random, depending on which day participants were available. On arrival the pairs of subjects were randomly allocated the role of Instruction Giver or Instruction Follower. Participants were then taken to the two separate rooms in which the VMC systems were set up.
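The practical difference between the two audio configurations is that a half-duplex, click to speak channel carries only one voice at a time, so an attempt to talk while the partner holds the channel does not get through. A minimal sketch of that floor-control rule follows. It is a toy model under stated assumptions, not a description of how VAT works internally; the class `HalfDuplexChannel` and its methods are hypothetical names.

```python
from typing import Optional


class HalfDuplexChannel:
    """Toy click to speak channel: at most one speaker holds the floor."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None  # who currently has the channel
        self.blocked_attempts = 0          # attempts made while the floor was taken

    def press(self, speaker: str) -> bool:
        """Speaker holds the button down; True if they now have the floor."""
        if self.holder is None or self.holder == speaker:
            self.holder = speaker
            return True
        # Someone else is transmitting, so this voice is not carried.
        self.blocked_attempts += 1
        return False

    def release(self, speaker: str) -> None:
        """Speaker lets go of the button, freeing the channel for the partner."""
        if self.holder == speaker:
            self.holder = None


channel = HalfDuplexChannel()
first = channel.press("IG")    # Instruction Giver starts talking
second = channel.press("IF")   # Follower tries to cut in and is blocked
channel.release("IG")          # Giver finishes the turn
third = channel.press("IF")    # Follower can now speak
print(first, second, third, channel.blocked_attempts)
```

In a full duplex (open channel) set-up there is no such gate: both voices are carried at once, so overlapping speech is possible.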
They were given written instructions on the Map Task, which simply stated that the Instruction Giver’s role was to describe to the Instruction Follower where the pathway went on the map, so that the Instruction Follower could draw the route as accurately as possible. They were warned that the maps might not be identical, as different explorers had drawn them. Written instructions concerning the method of communication were also available, as follows:

Instructions for communicating: Open channel. “Put on the headphones, and adjust the microphone so that it is in front of your mouth. You will now be able to hear and talk to your partner, and you will be able to see your partner on the computer monitor. Once you have both read the instructions, and are happy that you can communicate to each other, let the experimenter know that you are ready to begin.”

Instructions for communicating: Click to speak. “In front of you on the computer monitor, you will see a video picture of your partner. Under the video picture of your partner there is an audio toolbox. To talk to your partner, move the ‘mouse’ so that the pointer is in the large grey square of the audio toolbox. Hold down the right-hand button on the ‘mouse’ whilst you talk to your partner. You need to keep the right hand button down all the time you are talking, but release the button to hear what your partner is saying. Please have a short talk with your partner, and make sure that you both understand the instructions before you tell the experimenter that you are ready to start the Map Task.”

Participants were allowed to take as much time as they required to complete the task, since no time constraints had been imposed on the group of participants who had undertaken the task in the face-to-face context. On completion of the task the participants


using the VMC systems were asked to fill in a short questionnaire, which elicited their views on the task and the communication environment that they had just used. Full orthographic transcriptions of the 20 dialogues obtained from the VMC groups were made, using the audio recordings taken during the study.

2. Results

Data from the two VMC contexts and the face-to-face dialogues were compared, and the results are presented in the following order: first, task performance; second, analysis of speech patterns (length of dialogues, turns, etc.) and turn management (rate of interruptions, explicit turn-taking procedures); third, Conversational Games Analysis; fourth, subjective data on ease of communication. Since large differences in variance were observed in some of the data, each set of measures in the results section was first checked for homogeneity of variance. The appropriate form of parametric or non-parametric one-way Analysis of Variance was applied to each set of data, with communicative context as the between group factor, which had three levels: face-to-face, open channel video mediated communication (OC), and click to speak video mediated communication (CTS).

Task performance. The routes drawn in the two VMC environments were scored for accuracy, using the method described above. Since there was considerable variance in the route accuracy scores, the data were submitted to a square root transformation, which resulted in homogeneity of variance between the three groups. The means for the transformed route accuracy scores for the VMC and face-to-face Map Tasks are presented in Table 1. The route accuracy scores for the VMC contexts appear to be larger than the scores obtained in the face-to-face context, indicating that the users of the VMC systems may have drawn their routes less accurately than participants in the face-to-face context. A one-way ANOVA produced a significant main effect of communicative context [F(2,27) = 3.80, p < 0.05]. Post hoc analysis (Tukey HSD) showed that the only significant difference occurred between the route accuracy scores for the OC and face-to-face contexts (p < 0.05). Participants in the OC context performed the Map Task less accurately than participants who communicated face-to-face. To determine why the users of the OC conferencing system performed less well, the way in which they interacted was examined. In the next stage of the analysis, the patterns of speech were analysed to find out if (and how) the different types of audio signals had disrupted normal turn-taking procedures.

Analysis of speech patterns and turn management. The following measures were used in this analysis: the length of the dialogues (number of words and turns), average length of turns, and time taken to complete the task. The data for these measures are presented in Table 2 below.

Table 1
Mean route accuracy scores for VMC and face-to-face map tasks

Face-to-face     Open channel     Click to speak
8.38 (3.21)      11.69 (2.60)     10.13 (2.13)
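As a rough consistency check, the F ratio for route accuracy can be recomputed from the group summaries in Table 1 alone. The sketch below rests on two assumptions that the text does not state explicitly: that the bracketed values are sample standard deviations, and that each context contributed 10 dialogues (which is what 20 VMC dialogues, ten face-to-face dialogues and 27 error degrees of freedom imply). Variable names are illustrative.

```python
# Group summaries copied from Table 1 (square-root transformed scores).
# ASSUMPTION: the second value in each pair is a sample standard deviation.
groups = {
    "face_to_face": (8.38, 3.21),
    "open_channel": (11.69, 2.60),
    "click_to_speak": (10.13, 2.13),
}
n = 10  # ASSUMPTION: dialogues per group, inferred from F(2,27) and 30 dialogues

k = len(groups)
means = [mean for mean, _ in groups.values()]
sds = [sd for _, sd in groups.values()]

# With equal group sizes the grand mean is the plain mean of the group means.
grand_mean = sum(means) / k

# Between-groups and within-groups sums of squares.
ss_between = n * sum((m - grand_mean) ** 2 for m in means)
ss_within = (n - 1) * sum(sd ** 2 for sd in sds)

df_between = k - 1
df_within = k * (n - 1)

f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between},{df_within}) = {f_ratio:.2f}")
```

Under those assumptions the ratio comes out at roughly 3.8 on (2, 27) degrees of freedom, in line with the value reported above.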


Table 2
Measures of the process of communication: mean number of words, turns, words per turn, and time per dialogue

                   Face-to-face        Open channel        Click to speak
Words              1612.10 (574.32)    1908.40 (683.94)    2910.50 (1476.78)
Turns              197.30 (72.27)      235.30 (86.19)      218.70 (114.24)
Turn length        8.37 (1.71)         8.11 (3.55)         14.14 (3.45)
Time in minutes    7.44 (3.25)         9.94 (4.13)         22.23 (11.50)
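For a single transcribed dialogue, the first three measures in Table 2 follow directly from the list of turns. The sketch below is a minimal illustration under two assumptions: words are split on whitespace (the study's exact word-counting conventions are not given), and turn length is total words divided by number of turns. The function name `dialogue_measures` is hypothetical, and the sample turns are taken from Example 1.

```python
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker label, transcribed text of one turn)


def dialogue_measures(turns: List[Turn]) -> Dict[str, float]:
    """Return word count, turn count and mean words per turn for one dialogue."""
    if not turns:
        raise ValueError("a dialogue needs at least one turn")
    # ASSUMPTION: a word is any whitespace-separated token.
    n_words = sum(len(text.split()) for _, text in turns)
    n_turns = len(turns)
    return {
        "words": n_words,
        "turns": n_turns,
        "words_per_turn": n_words / n_turns,
    }


example_dialogue: List[Turn] = [
    ("IG", "Now go straight down about 5 centimetres"),
    ("IF", "Towards crane bay at the bottom?"),
    ("IG", "Yes stop a few millimetres from crane bay"),
    ("IF", "Okay done that"),
]

measures = dialogue_measures(example_dialogue)
print(measures)
```

The group means in Table 2 would then be obtained by averaging these per-dialogue values within each communicative context.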

The data in Table 2 show that the different audio tools had some impact on the process of communication. For example, dialogues in the CTS condition contained more words, and took longer to complete, than those in the OC or face-to-face contexts. An independent ANOVA was carried out on each set of data; the results revealed several significant differences due to communicative context. Although there were no significant differences in the number of turns of dialogue across the three communicative contexts (p > 0.1), there was a significant effect of context upon the length of the dialogues in terms of the number of words per dialogue [F(2,27) = 4.7, p < 0.01]. Further analysis (using Tukey HSD) revealed that the only significant difference was between CTS and face-to-face dialogues; CTS dialogues contained more words (p < 0.05). A main effect of communicative context was also observed for turn length [F(2,27) = 11.62, p < 0.001]. Post hoc analysis revealed that significant differences (p < 0.001) in turn length occurred between CTS and face-to-face dialogues and between CTS and OC dialogues. All other comparisons were non-significant (p > 0.1). Analysis of time taken to complete the task was computed using Kruskal Wallis ANOVA by ranks, which revealed a main effect of communicative context [H(df 2) = 16.66, p < 0.001]. Further analysis (using Mann Whitney U two-tailed tests) showed it took more time to complete the task in the CTS context than in the OC or face-to-face contexts (p < 0.01). Participants in the CTS condition took more than twice as long to complete the task as the other participants in the study.

To summarise, dialogues in the click to speak VMC context took nearly twice as many words, had turns that were almost twice as long, and required more than twice the amount of time to complete as face-to-face interactions. In contrast, all of the comparisons between open channel and face-to-face dialogues were non-significant. Therefore, participants using an open channel VMC system produced dialogues which were structurally very similar to face-to-face interactions, but users of the click to speak VMC system produced dialogues which were structured quite differently from both face-to-face and open channel VMC. This may have been a result of using a half-duplex audio link, which can only transmit one voice at a time. If participants attempt to interrupt each other the audio signals cannot be transmitted; all that is heard is a nasty buzz (or ‘system noise’). Managing smooth transition of turn taking may therefore become more problematic in a CTS context than in an OC context.

The next step in the analysis was to examine how turn taking was achieved. This process was analysed in terms of the number of interruptions that occurred in each dialogue, and the use of explicit turn-taking procedures. Interruptions, which frequently occur in face-to-face interactions, do not usually disrupt the flow of conversation. In mediated contexts


however, interruptions may occur more frequently because interlocutors find it difficult to maintain smooth turn taking. Analysing the rate at which interruptions occurred in the two VMC conditions provided us with an objective measure of successful or smooth turn taking, and of the formality of the interactions. Following the definition used in the HCRC corpus of Map Task dialogues, interruptions in the study were defined as “...points where one person began to speak while another was already talking” (Boyle et al., 1994, p. 8). An example of an interruption is given below. In this example IG denotes the Instruction Giver, IF denotes the Instruction Follower, the pointed brackets (< >) indicate which turns constitute an interruption, and the forward slashes (/) mark the point in the text at which the second speaker interrupted the first speaker.

Example of Interrupted Speech.
<IG: You want to be going horizontal towards/
IF: You mean towards
IG: towards West Lake but/
IF: West Lake?>
IG: Yes just for 2 centimetres.

The example contains two points of interruption, during which some amount of overlapping speech would occur. The term ‘overlapping speech’ could therefore be applied, instead of interruption. Overlaps occur if one or more words of the second speaker’s contribution are perceived to overlap the first speaker’s contribution. However, episodes of simultaneous talk were impossible in the click to speak context, although attempts at simultaneous talk were noticeable, due to the noise made by the VC system when one person tries to take the floor whilst someone else is speaking. Therefore the term interruption will be used here to denote occasions when participants either spoke whilst someone else was speaking or attempted to do so. The frequency of interrupted speech was obtained for each set of dialogues.
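The counting of interruption points in a transcript marked up as in the example above can be sketched as follows. The representation of a dialogue as a list of (speaker, text) pairs and the function name are assumptions made for illustration; the study's own transcripts and coding tools are not described at this level of detail.

```python
def count_interruptions(turns):
    """Count interruption points: turns whose text ends with the '/' marker."""
    return sum(1 for _speaker, text in turns if text.rstrip().endswith("/"))


# The example of interrupted speech, one (speaker, text) pair per turn.
example = [
    ("IG", "You want to be going horizontal towards/"),
    ("IF", "You mean towards"),
    ("IG", "towards West Lake but/"),
    ("IF", "West Lake?"),
    ("IG", "Yes just for 2 centimetres."),
]
n_interruptions = count_interruptions(example)
print(n_interruptions)
```

For this example the count is two, matching the two points of interruption identified in the text.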
A standardised rate of interruption was also calculated (the number of interruptions that occur in every 100 turns of dialogue), to take into account significant differences in the length of dialogues in the click to speak VMC and face-to-face contexts. The raw and standardised data are shown in Table 3. A greater amount of interruption occurred in the OC dialogues than in either the CTS or face-to-face interactions. Analysis was carried out using Kruskal-Wallis one-way ANOVA, first on the raw scores and then on the standardised scores. For the raw scores there was a main effect of context [H(df 2) = 16.46, p < 0.001]. Mann-Whitney U comparisons revealed that twice as many interruptions occurred in the OC context as in either the CTS or face-to-face contexts (p < 0.01). The number of interruptions in the CTS and face-to-face contexts did not vary significantly (p > 0.10). Analysis of the standardised data produced a similar pattern of results, with a main effect of context [H(df 2) = 21.231, p < 0.01]. Follow-up analysis showed that significant differences occurred between all three group means (p < 0.01). The highest rate of interrupted speech occurred in the OC context, fewer interruptions occurred in face-to-face interactions, and the lowest rate of interruption occurred when the CTS audio tool was used. The percentages of turns with interruptions were 17% in open channel, 8% in face-to-face and 4% in click to speak contexts. Analysis of the frequency and rate of interruption has shown that switching turns between speakers appeared to be handled more smoothly in the CTS context. In contrast, this process appears to have been more disrupted in the OC dialogues, where interruptions occurred twice as frequently as in the face-to-face dialogues. One way to avoid interrupted speech is to make use of explicit turn-taking procedures, so these are analysed in the next section.

Table 3
Mean number of interruptions

| | Face-to-face | Open channel | Click to speak |
|---|---|---|---|
| Interruptions (raw) | 14.00 (8.27) | 37.80 (13.37) | 10.00 (6.75) |
| Interruptions (std) | 8.25 (1.14) | 17.09 (7.10) | 4.26 (3.45) |

Analysis of explicit turn-taking procedures was based upon the use of question-ending turns (regardless of whether or not they were ‘tag questions’). Turns that end with a question clearly indicate to other participants that the previous speaker has finished their turn.¹ The frequency of these phenomena was calculated from the transcribed dialogues. A few examples of question-ending turns are shown below.

Examples of question-ending turns:
(1) I am now just below the left hand corner of East Lake, Yeah? [tag question]
(2) I presume the path goes round the Forest, does it? [tag question]
(3) Have you got the Pelicans marked on your map? [non tag question]

The frequency of question-ending turns was standardised so that variations in length of dialogues in the different communicative contexts would not have a confounding effect.
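The standardisation described above (a count expressed per 100 turns of dialogue) can be sketched as follows for question-ending turns. The function names are ours, and the four-turn dialogue is a toy example assembled from turns quoted in this section, not a real dialogue from the corpus.

```python
def count_question_ending_turns(turns):
    """Count turns whose final character is a question mark."""
    return sum(1 for text in turns if text.rstrip().endswith("?"))


def rate_per_100_turns(count, n_turns):
    """Standardise a raw count as a rate per 100 turns of dialogue."""
    if n_turns <= 0:
        raise ValueError("a dialogue must contain at least one turn")
    return 100.0 * count / n_turns


# Toy dialogue: the three quoted question-ending turns plus one statement.
toy_turns = [
    "I am now just below the left hand corner of East Lake, Yeah?",
    "I presume the path goes round the Forest, does it?",
    "Have you got the Pelicans marked on your map?",
    "Yes just for 2 centimetres.",
]
raw_count = count_question_ending_turns(toy_turns)
standardised = rate_per_100_turns(raw_count, len(toy_turns))
print(raw_count, standardised)
```

Dividing by the number of turns is what allows a long CTS dialogue and a shorter face-to-face dialogue to be compared on the same scale; the same helper applies to the standardised interruption rates in Table 3.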
The frequency of question-ending turns was numerically highest in the CTS context (mean 34.36, SD 6.18), lower in the OC context (mean 26.57, SD 10.69), and lowest in face-to-face dialogues (mean 22.85, SD 7.73). An ANOVA was applied, showing a main effect of communicative context [F(2,27) = 4.87, p < 0.01]. Post hoc tests (Tukey HSD) revealed that the only significant difference was between the group means for CTS and face-to-face contexts (p < 0.05). Therefore, speakers in the CTS context completed their turns with a question more frequently than speakers in either the OC or face-to-face dialogues, although only the difference from the face-to-face dialogues was significant. Over one third of all the turns in the CTS dialogues were completed with a question, compared to 27% and 23% of turns in the OC and face-to-face interactions, respectively. This communicative strategy would assist in the smooth transition of speaker turns, clearly signalling when a speaker was about to release control of the half-duplex audio channel. This could explain why there were so few interruptions in the CTS interactions.

¹ In this study participants very rarely used names to assist turn-taking, so this form of explicit turn-taking was not explored further.

Conversational Games Analysis: The aim of this analysis was to see if the types of pragmatic functions, or conversational goals, occurring in the VMC contexts differed depending on the type of audio tool being used, and whether this differed in any significant


way from the face-to-face interactions. The results will be reported at the Conversational Move level; however, Moves that were a response to Initiating Moves were excluded from the analysis as they might be associated (correlated) with specific types of Initiating Moves (e.g. a Check Move often elicits a Clarify response Move). Therefore, the following analysis was based upon the frequency of occurrence of Initiating Moves (Instructs, Aligns, Query-yn, Query-w, Explains and Checks) and any subsequent re-occurrence of these Initiating Moves within a Game. Performing the analysis at the level of Conversational Moves, instead of Conversational Games, has been shown to improve the reliability of the coding system (see Carletta et al., 1997 for a detailed discussion of this point). It is also the most appropriate level of analysis for this study, as we wanted to determine the relative usage of the different types of Conversational Moves, not their level of embeddedness. Carletta et al. (1997) also found that whilst coders typically have very high levels of agreement about the start points of new Conversational Games, agreement over where a Game ends is slightly less reliable. We therefore instructed the coder to decide the start and end points of Games before coding any intervening Moves, thereby improving the reliability of the coding system. The VMC dialogues were coded by an assistant, who was trained in the application of Conversational Games Analysis by one of the authors, an experienced coder. The coder did not know whether the dialogues originated from the OC or the CTS context.

Reliability of coding. The coder of the VMC interactions coded (blind) one of the Map Task dialogues and this coding was checked for reliability against the HCRC corpus of coded dialogues. Inter-judge reliability for the 225 Games contained in the dialogue was high [K = .88] and was significant (p < 0.001).
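A chance-corrected agreement statistic of this kind can be sketched as follows, assuming that K denotes Cohen's kappa computed over two coders' category labels for the same items. The two short label sequences are hypothetical and serve only to show the arithmetic; they are not taken from the coded dialogues.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same sequence of items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n_items = len(labels_a)
    # Proportion of items on which the coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n_items
    # Agreement expected by chance from each coder's category frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n_items * n_items)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical codings of four items by two coders.
coder_1 = ["Instruct", "Check", "Instruct", "Align"]
coder_2 = ["Instruct", "Check", "Align", "Align"]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Kappa discounts the agreement that two coders would reach by chance given how often each uses every category, which is why it is preferred to raw percentage agreement for coding schemes with unevenly used categories.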
The coder and one of the authors then carried out an inter-judge reliability check, both coding afresh the same dialogue from the OC context. The reliability of coding for the 144 Games was again high [K = 0.91] and was significant (p < 0.001). According to Landis and Koch (1977, in Everitt, 1996), these levels of Kappa indicate almost perfect agreement between coders, and compare favourably with the levels of Kappa presented in previous studies where Conversational Games Analysis has been applied (for example, Carletta et al., 1997; Doherty-Sneddon et al., 1997; Kowtko et al., 1992; Newlands et al., 2003).

Although the dialogues from the CTS context contained more words than dialogues in the other contexts, preliminary analysis revealed that communicative context did not significantly affect the number of Initiating Moves required to complete the Map Tasks (p > 0.1). Therefore the following analysis is based on the number of times each type of Conversational Move was initiated during the VMC and face-to-face interactions. The mean frequencies for each Conversational Move, by Instruction Giver and Instruction Follower, in the three contexts are given in Table 4. In line with previous research (for example, Doherty-Sneddon et al., 1997; Newlands et al., 2003) some categories of Initiating Moves were used more frequently than others, and there appears to be some effect of role of participant. For example, Instruction Givers usually initiate Instruct Moves, and Instruction Followers typically introduce Check Moves. A series of two-way mixed analyses of variance was computed, one for each type of Conversational Move, with communicative context (face-to-face, open channel, and click to speak) as a between-subjects factor, and role of participant (Instruction Giver or Instruction Follower) as a within-dialogue repeated measure. The results are presented below.

Table 4
Mean frequency of initiating moves in face-to-face, open channel and click to speak dialogues by instruction giver (IG) and instruction follower (IF)

| Move | Face-to-face IG | Face-to-face IF | Open channel VMC IG | Open channel VMC IF | Click to speak VMC IG | Click to speak VMC IF |
|---|---|---|---|---|---|---|
| Instruct | 17.40 (7.02) | 0.40 (0.69) | 12.40 (3.50) | 0.50 (1.26) | 14.90 (4.20) | 0.60 (1.57) |
| Explains | 7.30 (4.27) | 12.20 (7.11) | 10.80 (6.90) | 12.10 (5.93) | 8.60 (4.81) | 13.20 (10.76) |
| Checks | 3.00 (2.86) | 16.30 (3.77) | 2.60 (1.89) | 17.10 (4.72) | 3.30 (3.36) | 14.80 (10.75) |
| Aligns | 9.50 (5.50) | 0.80 (1.03) | 16.60 (9.92) | 3.30 (2.75) | 21.30 (12.27) | 3.30 (3.19) |
| Query-yn | 11.30 (4.90) | 6.40 (6.80) | 7.40 (3.16) | 8.30 (4.24) | 10.00 (4.52) | 4.10 (2.81) |
| Query-w | 3.20 (2.65) | 4.80 (2.82) | 2.50 (3.40) | 9.90 (4.48) | 7.80 (6.10) | 10.40 (6.48) |

Instruct Initiating Moves. The analysis revealed no significant effect of communicative context (p > 0.1). The main effect of role of participants was significant [F(1,27) = 222.466, p < 0.001], confirming that the Instruction Giver initiated more Instruct Moves than the Instruction Follower (overall means 14.90 vs. 0.50). The interaction between context and role was non-significant (p > 0.10).

Explain Initiating Moves. The analysis revealed no significant effect of communicative context (p > 0.1). The main effect of role of participants was significant [F(1,27) = 7.38, p < 0.01], confirming that the Instruction Follower initiated more Explain Moves than the Instruction Giver (overall means 12.50 vs. 8.90). The interaction between context and role was non-significant (p > 0.10).

Check Initiating Moves. The analysis revealed no significant effect of communicative context (p > 0.1). The main effect of role of participants was significant [F(1,27) = 104.59, p < 0.001], confirming that the Instruction Follower initiated more Check Moves than the Instruction Giver (overall means 16.06 vs. 2.97). The interaction between context and role was non-significant (p > 0.10).

Align Initiating Moves. The analysis revealed a significant effect of communicative context [F(2,27) = 4.09, p < 0.05]. Post hoc analysis (Tukey HSD) revealed that the only significant difference due to context occurred between face-to-face and CTS contexts, with more than twice as many Align Moves being initiated in the CTS context. A significant effect of role was also observed [F(1,27) = 77.41, p < 0.001], the Instruction Giver initiating a greater number of Align Moves than the Instruction Follower (overall means 15.80 vs. 2.47).
The interaction between context and role was also significant [F(2,27) = 3.14, p < 0.05]; further analysis by Tukey HSD showed that Instruction Givers in the CTS context initiated more than twice as many Align Moves as Instruction Givers in the face-to-face dialogues (21.30 vs. 9.50). All other comparisons were non-significant (p > 0.1).

Query-yn Initiating Moves. The analysis revealed no significant effect of communicative context (p > 0.1). The main effect of role of participants was significant [F(1,27) = 13.64, p < 0.001], confirming that the Instruction Giver initiated more yes-no questions than the Instruction Follower (overall means 9.57 vs. 4.10). The interaction between context and role was also significant [F(2,27) = 5.63, p < 0.01]; further analysis by Tukey HSD showed that Instruction Givers initiated more Query-yn Moves in the face-to-face interactions than in the OC context (11.30 vs. 7.40 Moves per dialogue), and that Instruction Followers in the OC dialogues initiated more Query-yn Moves than those in the CTS context (8.30 vs. 4.10 Moves per dialogue). All other comparisons were non-significant (p > 0.1).

Query-w Initiating Moves. The analysis revealed a significant main effect of communicative context [F(2,27) = 4.66, p < 0.01]; post hoc analysis showed that the only significant difference occurred between the CTS and face-to-face dialogues, with more than twice as many open-ended questions being initiated in the CTS context as in the face-to-face interactions. A significant effect of role was also observed [F(1,27) = 16.12, p < 0.001], with Instruction Followers initiating more Query-w Moves than Instruction Givers (overall means 8.37 vs. 4.50). The interaction between context and role was also significant [F(2,27) = 3.36, p < 0.05], and further analysis showed several significant differences. Instruction Givers in the CTS context initiated more than twice as many Query-w Moves as their counterparts in the face-to-face or the OC context (p < 0.05). The number of Query-w Moves initiated by Instruction Followers in the CTS and OC contexts did not differ significantly (means being 10.40 and 9.90, respectively), but in both contexts these participants initiated more Query-w Moves than in the face-to-face context (p < 0.05).

In summary, communicative context had a significant main effect on the frequency of occurrence of Align and Query-w initiating Moves, with more of these Moves occurring in the CTS context than in face-to-face interactions.
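The overall role means quoted for each Move can be cross-checked against the cell means in Table 4. The sketch below assumes that the three contexts contributed equal numbers of dialogues, so that an overall mean is the unweighted average of the three context cells; the dictionary layout and function name are ours.

```python
# Cell means from Table 4, ordered (face-to-face, open channel, click to speak).
TABLE_4 = {
    "Instruct": {"IG": (17.40, 12.40, 14.90), "IF": (0.40, 0.50, 0.60)},
    "Explains": {"IG": (7.30, 10.80, 8.60), "IF": (12.20, 12.10, 13.20)},
    "Checks": {"IG": (3.00, 2.60, 3.30), "IF": (16.30, 17.10, 14.80)},
    "Aligns": {"IG": (9.50, 16.60, 21.30), "IF": (0.80, 3.30, 3.30)},
    "Query-yn": {"IG": (11.30, 7.40, 10.00), "IF": (6.40, 8.30, 4.10)},
    "Query-w": {"IG": (3.20, 2.50, 7.80), "IF": (4.80, 9.90, 10.40)},
}


def role_means(move):
    """Average the three context cells for each role (assumes equal group sizes)."""
    return {role: sum(cells) / len(cells) for role, cells in TABLE_4[move].items()}


for move in TABLE_4:
    means = role_means(move)
    print(f"{move}: IG {means['IG']:.2f}, IF {means['IF']:.2f}")
```

Under that assumption the sketch reproduces, to within rounding, the means quoted for Instruct, Explain, Check, Align and Query-w Moves. For Query-yn the Instruction Follower cells average 6.27, which differs from the 4.10 quoted in the text (4.10 is the click to speak cell value).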
The role that participants played also affected the frequency of each type of Conversational Move, with Instruction Givers initiating a greater number of Instruct, Align and Query-yn Moves and Instruction Followers introducing more of the Explain, Check and Query-w Moves. Significant interactions between context and role were observed for Align, Query-yn and Query-w initiating Moves, illustrating the exaggerated use of these Moves by some participants, depending on the role they played. For example, Instruction Givers typically initiated more Align Moves than Instruction Followers did. This effect was especially strong in the CTS context when compared to the face-to-face interactions, highlighting the fact that Instruction Givers in this setting spent more time and effort establishing common ground with their partners than in the other two contexts.

Subjective Data. After completion of the Map Tasks, participants in the open channel and CTS settings were asked to answer four questions about communicating in these contexts (see Table 5).

Table 5
Questionnaire on ease of communication in open channel and CTS video mediated contexts

(Q1) How easy was it to communicate with your partner?
(Q2) How easy was it to take turns at speaking?
(Q3) How often do you think you interrupted your partner?
(Q4) How often do you think you looked at the video of your partner?


Table 6
Group means from questionnaire data for participants in the click to speak and open channel video mediated contexts

| | Click to speak IG | Click to speak IF | Open channel IG | Open channel IF |
|---|---|---|---|---|
| Question 1 | 2.20 (0.79) | 2.20 (0.92) | 1.60 (0.96) | 2.40 (0.84) |
| Question 2 | 1.80 (0.78) | 2.40 (1.07) | 1.50 (0.53) | 1.20 (0.42) |
| Question 3 | 3.50 (0.53) | 2.90 (0.74) | 3.30 (0.67) | 3.50 (1.27) |
| Question 4 | 2.10 (1.10) | 3.20 (1.22) | 3.10 (0.87) | 3.20 (0.79) |

In questions 1 and 2, a rank of one indicated that participants found it very easy to communicate or take turns, whereas a rank of 5 indicated that they found it very difficult. For questions 3 and 4, a rank of one indicated that participants thought that they interrupted, or looked at, their partner very frequently. The group means (and standard deviations) for these data are given in Table 6. Participants in both VMC contexts found it fairly easy to communicate with each other and to take turns, though the turn-taking ratings by Instruction Followers in the CTS context are higher than those in the other context (a higher rank indicating that turn taking was less easy). Subjective ratings of how frequently participants interrupted their partners indicate that they thought they did so only occasionally; no participants thought they did this very frequently. The data for question 4 (how often did they look at their partner) are also clustered around the middle of the ranked scores, except for Instruction Givers in the CTS context, who thought they gazed more frequently than the other participants. To establish whether the distribution of responses varied with communicative context or role of participants, analysis of variance was computed on the data for each question, with type of audio tool (CTS or OC) and role of participant (Instruction Giver or Instruction Follower) as between-group factors.² Only responses to Question 2 (ease of turn taking) produced significant differences in the distribution of answers. A main effect of context [F(1,36) = 10.075, p < 0.01] showed that CTS users considered turn-taking to be more difficult than participants in the OC context (mean ratings were 2.1 and 1.35, respectively). The main effect of role of participants was non-significant (p > 0.1), but the interaction between audio tool and role of participant approached significance [F(1,36) = 3.63, p < 0.06].
Further analysis of the means involved in this interaction (by Simple Effects analysis) showed that the Instruction Followers had a tendency to rate turn taking differently in the two VMC contexts [F(1,36) = 12.89, p < 0.01]. Instruction Followers who used the open channel audio tool thought that turn taking was easier to accomplish than Instruction Followers in the click to speak context. However, this result must be treated with some caution and should be replicated with a larger sample in order to confirm the interaction between role and context. All other differences between the means in the interaction were non-significant (p > 0.01).

² In this case the role of participants is taken as a between groups factor as each individual gave their rating for each of the questions, rather than their rating as a pair of participants.

Overall, the subjective data confirm the general feeling obtained informally


during de-briefing sessions; participants had enjoyed using this new form of technology and had not found it difficult to communicate with each other in this novel context.
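The context means quoted for Question 2 (2.1 for click to speak and 1.35 for open channel) can be recovered from the cell means in Table 6. The sketch below assumes equal numbers of Instruction Givers and Instruction Followers in each context, so that a context mean is the unweighted average of the two role cells; the names used are ours.

```python
# Question 2 (ease of turn taking) cell means from Table 6; 1 = very easy, 5 = very difficult.
Q2_MEANS = {
    "click_to_speak": {"IG": 1.80, "IF": 2.40},
    "open_channel": {"IG": 1.50, "IF": 1.20},
}


def context_mean(cells):
    """Unweighted mean over role cells (assumes equal numbers of IGs and IFs)."""
    return sum(cells.values()) / len(cells)


cts_mean = context_mean(Q2_MEANS["click_to_speak"])
oc_mean = context_mean(Q2_MEANS["open_channel"])
print(f"CTS {cts_mean:.2f}, OC {oc_mean:.2f}")
```

The same cells show where the near-significant interaction comes from: the gap between contexts is 1.20 rating points for Instruction Followers (2.40 vs. 1.20) but only 0.30 for Instruction Givers (1.80 vs. 1.50).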

3. Discussion

This study investigated the effects of two types of audio tool used in VMC: half-duplex (CTS) and full-duplex (OC) audio channels. These VMC contexts were compared with face-to-face interactions. The study explored the effects these different audio links have upon task performance and the process of communication, whilst holding the quality of the visual signals constant in the two VMC contexts. The results show that one of the VMC contexts (open channel VMC) had an adverse impact upon task performance; tasks were completed less accurately in the open channel VMC context than in the face-to-face interactions. The task performance of users of the click to speak VMC system did not differ significantly from the performance of participants in the face-to-face context. Why did users of the open channel VMC system perform less well in the Map Task? Analysis of the process of communication suggested some possible explanations. The structure of the dialogues in the OC context was similar to that of face-to-face interactions in many ways; the lengths of the dialogues in the OC and face-to-face contexts were very similar, and turn length (words per turn) did not differ between these two contexts. However, the OC dialogues contained many more interruptions than face-to-face interactions. In contrast, participants interacted quite differently in the CTS context; they said more, used longer turns and interrupted each other very infrequently. Since the quality of the visual signals was identical in both of the VMC contexts, it is unlikely that variations in task performance and the process of communication can be attributed to the video images. It seems more probable that these effects were due to the different types of audio links used in the VMC systems.
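The difference between the two audio links can be made concrete with a toy model of a single moment of conversation. This is an illustrative sketch only: the function and its return strings are hypothetical, and it simplifies the behaviour described in this paper (one voice at a time on the half-duplex link, with system noise when two people try to speak) rather than modelling the actual VMC software.

```python
def deliver(active_speakers, duplex):
    """Return what listeners hear when the given speakers talk at the same moment."""
    speakers = sorted(set(active_speakers))
    if not speakers:
        return "silence"
    if duplex == "full":
        # Open channel: all voices are transmitted, so overlapping speech is audible.
        return "speech from " + " and ".join(speakers)
    if duplex == "half":
        # Click to speak: only one voice can be carried; a clash yields noise.
        if len(speakers) == 1:
            return "speech from " + speakers[0]
        return "system noise"
    raise ValueError("duplex must be 'full' or 'half'")


print(deliver(["IG", "IF"], "full"))
print(deliver(["IG", "IF"], "half"))
print(deliver(["IG"], "half"))
```

In the toy model an attempted interruption costs nothing on the full-duplex link but destroys both contributions on the half-duplex link, which is the asymmetry the discussion below draws on.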
One of the outstanding differences in the way people interacted in the OC context was that they interrupted each other very frequently, more frequently than participants in either the face-to-face or CTS VMC contexts; 17% of turns contained interruptions in the open channel context, compared to 8% and 4% of turns in the face-to-face and CTS dialogues, respectively. How do these findings compare with the rates of interruption reported by other researchers in this area? Studies which provide data on the rate of overlapping speech or interruptions include O’Conaill et al. (1993), O’Conaill and Whittaker (1997), and Anderson et al. (1996). O’Conaill et al. report that the mean number of interruptions in the LIVE-NET context (which provided an open channel audio link with high quality visual signals) and the face-to-face context did not vary significantly either (11.75 and 18.60, respectively), whilst users of the ISDN system made significantly fewer interruptions; only 2% of turns were interruptions in this context. This latter effect is similar to the findings reported here, where the percentage of turns that contained simultaneous speech was very low in the CTS context. Anderson et al. (1996) analysed the rate of interruption (a measure which included all forms of simultaneous speech) which occurred in a variety of communicative contexts, including a VMC context very similar to the open channel VMC used here. Their results revealed a non-significant difference in the percentage of turns containing interruptions


between the open channel VMC system and face-to-face interactions (10.9% of turns vs. 13.8% of turns). These comparisons would suggest that the rate of interrupted speech is likely to be higher when participants use an open channel audio link, and lower in a VMC context that provides half-duplex audio signals. This fits the pattern of results obtained in our analyses, but does not explain why the rate of interruption was unusually high in the open channel VMC context. The question that remains to be answered is whether these differences in rates of interruption, especially the very high rate of interruption observed in the OC setting, could account for the observed differences in task performance in this study. One possible explanation for the low level of task performance by users of the OC system is that they had gained the impression that they could communicate in this context as if they were in a face-to-face setting. The transparency of the audio link could have been the cause of this illusion. However, the quality of the visual signals provided by the VMC systems was moderately poor; the temporal resolution was approximately 5 frames per second. The visual information required to achieve smooth transitions of speakers’ turns was probably not available in these VMC contexts due to the low frame rate and size of the video images. In the open channel context this resulted in participants interrupting each other very frequently, which in turn disrupted the process of grounding. Since establishing mutual understanding is essential if participants are to complete the Map Task accurately, the assumption that they could communicate in a style similar to face-to-face interactions resulted in poorer task performance. If users of the OC system had been more aware of the restrictions of the VMC system then they might have made greater allowances for the communicative restraints they were working under.
Participants in the CTS context were more conscious of the restraints imposed by the technology, as they had to manually activate the audio channel each time they wanted to speak. They appear to have made some allowances for the VMC context, saying more, using longer turns, and taking greater care in managing turn-taking procedures. These differences could explain why they achieved levels of task performance similar to those obtained in the face-to-face interactions. One interesting fact about the CTS dialogues was that they were 35% longer (in words and minutes) than the dialogues occurring in the OC or face-to-face interactions. Dialogues in the Map Task tend to be very task-oriented and contain few episodes of social chatting, so the increase in linguistic output could suggest that the CTS dialogues contained more task-oriented information. The extracts below illustrate the differences in turn length between the CTS and OC contexts. In these extracts participants are talking about drawing the route from the West Lake to the Monument. The following symbols have been used in the extracts: IG refers to the Instruction Giver; IF refers to the Instruction Follower; pointed brackets show areas of overlapping or interrupted speech, and a forward slash (/) indicates the point at which an interruption occurs.

Extract 1. Click to speak.
IG: Now do you have a monument on your right hand side of your page?
IF: Ehm I do it’s just up from mid centre to the right, yeah?
IG: That’s right. What I want you to do is from that point, ehm the point that you’re at on the shoulder of the west lake draw a line to the bottom of the monument going right,


at about 40 degrees to the bottom of the monument but underneath it, as if you were going to go round it again. So straight from the shoulder of the west lake, straight down at a 45 degree angle to underneath the monument. Is that clear?
IF: Yep it’s done.

Extract 2. Open channel VMC
IG: So have you got a monument down there?
<IF: I’ve got a monument/
IG: right good
IF: down there>
IG: So you’re going down and round the monument
<IF: Right, round the monument and then/
IG: Uh huh, and then sort of up again>

These extracts demonstrate that participants in the CTS context used longer turns, which frequently contained more information than turns in the open channel context. The provision of longer, more detailed instructions could have reduced the amount of clarification required to establish mutual understanding in this context. Newlands et al. (2003) report that participants completing the Map Task in a text-based computer mediated context adapted to the environment in a similar manner; Instruction Givers gave more precise instructions, which required less checking and clarification. A similar kind of adaptation seems to have occurred in the CTS dialogues, perhaps enabling participants in this context to complete the task more successfully. Additionally, the use of longer turns, instead of a greater number of short turns, would be beneficial in this communicative context, since it would reduce the number of speaker switches and thus assist the management of turn taking. The extracts of dialogues from the two VMC contexts also demonstrate another distinctive difference between the CTS and OC dialogues; users of the CTS system appear to have taken great care not to interrupt each other. This has already been commented upon, but how did these participants manage to switch between speakers without interrupting each other? Why was it so important to avoid episodes of simultaneous speech?
Since the quality of the video signals was identical in both of the VMC systems, the reduction in the number of interruptions was not due to variations in the quality of the video signals. It seems more likely that the consequences of interrupting each other in the CTS context were sufficient to induce greater care in turn-taking procedures; if both participants tried to talk at the same time, the audio signals masked each other and it was impossible to make out what either person was saying. In some of these situations the participants had to negotiate who was going to speak next, a process which sometimes required several speaker turns. The effects of interrupting were, therefore, quite disruptive in the CTS context, and this probably explains why these participants took great care to avoid doing so. Several methods of reducing the amount of overlapping speech have been observed in this study. First of all, the speakers used longer turns which reduced the number of times


turn taking had to be negotiated. Secondly, the CTS participants made greater use of question-ending turns than participants in the OC or face-to-face interactions; over a third of turns in the CTS dialogues ended with a question. The increased use of longer turns and question-ending turns in the CTS dialogues suggests that these participants made greater allowances for the restraints imposed upon them by a half-duplex audio channel. The CTS users appear to have adapted well to the novel communicative environment. It is possible that they did so because the audio channel had to be manually activated, which made them more aware of the technology they were using. By making greater allowances for the communicative context, the CTS users achieved a high level of task performance, and altered the way in which they communicated to a style appropriate to the communicative context. This suggests that CTS users took greater care in planning what they said to each other, and how they interacted. Some evidence for this suggestion is found in the results of the Conversational Games Analysis, which showed significant effects of communicative context for two types of Conversational Games; participants in the CTS context initiated a greater number of Align Moves and more Query-w Moves. In an Align Move or Game, the speaker asks the addressee to confirm that they have understood a previous utterance or instruction. Align Games are, therefore, important in the process of grounding as they are a means of obtaining feedback from the addressee. Our analysis showed that Instruction Givers made significantly greater use of alignment in the CTS context. An increased use of Align Games has also been reported by Doherty-Sneddon et al. (1997) and Anderson et al. (1997), who observed increased use of this type of conversational game in contexts where participants could not see each other. 
These authors suggest that people use Aligns as verbal substitutes for non-verbal ways of obtaining feedback, adopting a more cautious form of communication (see also Shadbolt, 1984). We claim that the increased use of Align Games in the CTS context occurs for similar reasons. Although our participants can see each other, the quality of the video channel is low. In addition, the problems imposed by the half-duplex audio channel made it difficult for the participants to provide well-timed feedback, which is important in referential communication (Clark and Wilkes-Gibbs, 1986).

The increased use of Query-w Moves, or open-ended questions, in the CTS context could also be taken as an indication that these participants were taking greater care in establishing mutual understanding. These Moves were used most frequently by the Instruction Follower, who sought information in this way twice as often as participants in the face-to-face context. The significant interaction between context and role of participants (p < 0.05) also revealed a difference in the use of Query-w Moves by Instruction Followers in the OC and face-to-face conditions, so more open-ended questions were initiated by Instruction Followers in both of the video-mediated contexts than in the face-to-face dialogues. Query-w Moves are requests for new information (rather than requests for confirmation of already given information), and show that making progress in the Map Task is not always the responsibility of the Instruction Giver; the Instruction Follower can assist by asking for new information as well as providing effective and timely feedback. The increased use of open-ended questions in the CTS dialogues could also account for the greater number of ‘question ending turns’ observed in this context (see also O’Conaill and Whittaker, 1997), which are supposed to facilitate turn taking in contexts where the quality of the audio and visual signals is low.

Overall, the results of the Conversational Games Analysis support the idea that participants in the CTS context adapted to the situation in a sensible manner, making more requests for feedback and new information during the dialogues. The CTS dialogues are good examples of participants taking joint responsibility for the establishment of mutual knowledge, in the manner suggested by Clark and Brennan (1991), and could partly explain why these participants performed well on the task, whilst participants in the OC context produced significantly less accurate routes. These results also support the view proposed by Clark and Brennan (1991), who state that the process of grounding will change with communicative context, and that participants in a conversation will use the grounding constraints that cost the least collaborative effort. Requesting verbal forms of feedback, rather than relying on non-verbal forms of alignment, may require a greater amount of dialogue (which could explain why the CTS dialogues were longer and took more time), but receiving timely feedback would assist the process of establishing mutual understanding and hence improve performance on the Map Task.

The subjective ratings data also confirm our view, since participants in the CTS context, especially the Instruction Followers, thought it was more difficult to take turns than in the OC context. This indicates that they were more aware of the constraints imposed by the technology, and adopted a more cautious style of communication and interaction. It is interesting to note, however, that the subjective ratings of the participants appear to be at odds with some of the objective measures of the process of communication.
In particular, the subjective ratings of how frequently participants interrupted their partner appear to differ from the objective measures of interruptions in the VMC contexts. The majority of participants (78% of the subjects) thought that they had only interrupted their partners on a moderate number of occasions (defined on the questionnaire as ‘approximately every other time you spoke’). However, the objective data show that participants in the OC context interrupted each other very frequently, whilst interruptions by users of the CTS system occurred very rarely. One possible reason for the differences between the objective and subjective measures is that interruptions and episodes of overlapping speech have different impacts in the two VMC contexts. In the OC context interruptions did not have a disruptive impact on the flow of communication; participants in the open channel VMC could talk over the top of each other for short periods of time without this really being noticed. This happens fairly regularly in everyday conversations: people are not always aware that they interrupt each other, and therefore under-estimate how frequently interruptions occur. However, in the CTS interactions the effects of interrupting a partner were more disruptive, and this might make people over-estimate how frequently they interrupted each other. This could account for the discrepancies between the objective and subjective data.
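
As a purely illustrative aside, the objective measures discussed in this section (interruptions, question-ending turns and turn length) could in principle be computed from a time-stamped transcript. The sketch below is a minimal Python example under stated assumptions. The `Turn` record, its field names, and the rule that a turn counts as an interruption when it starts before another speaker's turn has ended are hypothetical choices made for illustration; they are not the coding procedure used in this study. Question-ending turns are approximated by a trailing question mark in the transcription, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    """One speaker turn in a transcribed dialogue (hypothetical format)."""
    speaker: str   # e.g. "giver" or "follower"
    start: float   # onset, in seconds from the start of the dialogue
    end: float     # offset, in seconds
    text: str      # orthographic transcription of the turn


def count_interruptions(turns: List[Turn]) -> int:
    """Count turns that begin while another speaker is still talking."""
    last_end: Dict[str, float] = {}  # latest turn offset seen per speaker
    count = 0
    for turn in sorted(turns, key=lambda t: t.start):
        # Overlap: some other speaker's most recent turn has not ended yet.
        if any(end > turn.start
               for spk, end in last_end.items() if spk != turn.speaker):
            count += 1
        last_end[turn.speaker] = max(last_end.get(turn.speaker, turn.end),
                                     turn.end)
    return count


def question_ending_rate(turns: List[Turn]) -> float:
    """Proportion of turns whose transcription ends with a question mark."""
    if not turns:
        return 0.0
    questions = sum(1 for t in turns if t.text.rstrip().endswith("?"))
    return questions / len(turns)


def mean_turn_length(turns: List[Turn]) -> float:
    """Mean number of whitespace-separated words per turn."""
    if not turns:
        return 0.0
    return sum(len(t.text.split()) for t in turns) / len(turns)


if __name__ == "__main__":
    # Invented four-turn fragment, used only to exercise the functions.
    dialogue = [
        Turn("giver", 0.0, 4.0, "Go left of the mill and then down, okay?"),
        Turn("follower", 3.5, 5.0, "Which mill?"),
        Turn("giver", 5.2, 8.0, "The old mill near the top."),
        Turn("follower", 8.1, 9.0, "Right."),
    ]
    print("interruptions:", count_interruptions(dialogue))
    print("question-ending rate:", question_ending_rate(dialogue))
    print("mean words per turn:", mean_turn_length(dialogue))
```

Tracking the latest offset per speaker, rather than comparing only adjacent turns, means the same function would also apply to small groups, where a new turn may overlap with a speaker other than the immediately preceding one.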

4. Conclusions

This study has examined the impact of two forms of VMC upon collaborative interactions. The results have shown that task performance was detrimentally affected in the OC context, though this did not occur in the CTS context. The findings suggest that poorer task performance in the OC context could have been the result of these participants making insufficient allowance for the VMC technology. When the OC dialogues were examined in detail, it was observed that they were very similar in structure to the face-to-face interactions. There was one important difference: the OC dialogues contained a large number of interruptions. These could have occurred because the full-duplex audio link made it easy for the participants to talk to each other, but the relatively low quality of the video images meant that it was difficult to make use of non-verbal communication to assist in the smooth transition of speaker turns. The high incidence of interrupted speech disrupted the process of communication, and may have disrupted the process of establishing mutual understanding, a crucial element of many collaborative tasks.

In contrast, participants in the CTS context achieved a level of task performance similar to face-to-face interactions. Analysis of the process of communication in this context suggests that they did so by adapting to the constraints imposed by the CTS environment. Although the adaptations to the novel CTS environment inflicted some penalties (e.g. lengthier dialogues, greater care in handling turn taking, a more formal style of communication, and greater use of verbal alignments), these participants achieved a reasonable level of task performance. Analysis of the process of communication revealed that these participants took great care not to interrupt each other; they also appeared to alter the way in which information was exchanged, using longer turns to reduce the number of speaker turns. These were sensible adaptations to the constraints imposed by a half-duplex audio channel.
Our primary claim is that different types of audio channel will significantly determine how people interact, but the generality of these results to other settings and contexts requires some consideration. For instance, would similar results occur if a different task had been used? We would predict similar results with other types of collaborative tasks, for instance the types of collaborative problem tasks used by Chapanis (1988). Tasks that are similar in nature (for instance tasks which fall into Quadrant 1 of McGrath’s (1984) Circumplex Model of Group Task Types) could be affected in similar ways by different configurations of the audio channel in a VMC setting. Another way of categorising tasks is to consider the impact of the task upon the participants and the type of computer-supported cooperative work (CSCW) system that could best support their cooperative work (Schmidt, 1994³). Thinking in terms of Schmidt’s classification of types of cooperative work, the Map Task requires pairs of participants to combine their different perspectives of the task and to work together cooperatively in order to complete the task successfully. We would therefore anticipate similar changes in patterns of speech and the process of grounding if the tasks involved were similar in nature to the Map Task, but to ascertain the effect of OC or CTS audio channels upon tasks that require other considerations (e.g. tasks requiring negotiation, or mutual critical assessment) would require further research.

Another possible limit on the generality of the results should be considered. Would the effects of audio channel be similar if small groups had been used instead of dyads? Based on previous work (Anderson et al., 1999; Daly-Jones et al., 1998; Olson et al., 1997), we anticipate that the effect of the audio channel will be greater if participants work in groups rather than dyads. These studies have shown that the problems associated with turn taking escalate when the number of people taking part in video-mediated interactions increases. For instance, Anderson et al. (1999) found that participants working in small groups said more, and interrupted each other more frequently, than pairs of participants who completed the Map Task whilst working in a high quality video conference context. In large group discussions it may therefore be beneficial to use a half-duplex audio channel, forcing participants to adapt to the communicative setting, rather than run the risk of extensive interruptions which could disrupt the communicative process. Further research is required to test out this idea, and to further our knowledge of how to achieve the best fit between the type of task being undertaken and the type of technology that can support users whilst they communicate with one another.

³ We are grateful to the anonymous referee who pointed out this reference to the authors.

Acknowledgements

This research was supported by the ESRC (UK) funded Human Communications Research Centre. We are grateful to the Centre for the use of their corpus of spoken Map Task dialogues. We would like to acknowledge Alan Dickson for his assistance with coding the dialogues.

References

Anderson, A.H., Bader, M., Bard, E., Boyle, E., Doherty, G., Garrod, S., Isard, S., Kowtko, J., McAllister, J., Sotillo, C., Thompson, H., 1991. The HCRC Map Task Corpus. Language and Speech 34 (4), 351–360.
Anderson, A.H., Clark, A., Mullin, J., 1991. Introducing information in dialogues: how young speakers refer and how young listeners respond. Journal of Child Language 18, 663–687.
Anderson, A.H., Mullin, J., Katsavras Brundell, P., McEwan, R., Grattan, E., O’Malley, C., 1999. Multimediating multiparty interactions, in: Sasse, A., Johnson, C. (Eds.), Proceedings of Human Computer Interaction - INTERACT 99. IOS Press.
Anderson, A.H., Newlands, A., Mullin, J., Fleming, A.M., Doherty-Sneddon, G., Van der Velden, J., 1996. Impact of video-mediated communication on simulated service encounters. Interacting with Computers 8 (2), 193–206.
Anderson, A.H., O’Malley, C., Doherty-Sneddon, G., Langton, S., Newlands, A., Mullin, J., Fleming, A.M., Van der Velden, J., 1997. The impact of VMC on collaborative problem solving: an analysis of task performance, communicative process, and user satisfaction, in: Finn, K.E., Sellen, A.J., Wilbur, S.B. (Eds.), Video-Mediated Communication. Lawrence Erlbaum Associates, NJ, pp. 133–172.
Angiolillo, J.S., Blanchard, H.E., Israelski, E.W., Mane, A., 1997. Technology constraints of Video-Mediated Communication, in: Finn, K.E., Sellen, A.J., Wilbur, S.B. (Eds.), Video-Mediated Communication. Lawrence Erlbaum Associates, NJ, pp. 51–73.
Boyle, E., Anderson, A.H., Newlands, A., 1994. The effects of visibility on dialogue and performance in a cooperative problem-solving task. Language and Speech 37, 1–20.
Brown, G., Anderson, A.H., Yule, G., Shillcock, R., 1984. Teaching Talk. Cambridge University Press, Cambridge.
Carletta, J., Isard, A., Isard, S., Kowtko, J., Newlands, A., Doherty-Sneddon, G., Anderson, A.H., 1997. The Reliability of a Dialogue Structure Coding Scheme. Computational Linguistics 23 (1), 13–32.
Carletta, J., Anderson, A.H., McEwan, R., 2000. The effects of multimedia communication technology on noncollocated teams: a case study. Ergonomics 43 (8), 1237–1251.
Chapanis, A., 1988. Interactive human communication, in: Greif, I. (Ed.), Computer-Supported Cooperative Work: A Book of Readings. Morgan Kaufmann, Cambridge, MA, pp. 127–142.
Clark, H.H., Brennan, B.E., 1991. Grounding in communication, in: Resnick, L.B., Levine, J., Teasley, S.D. (Eds.), Perspectives on socially shared cognition. American Psychological Association, Washington.
Clark, H.H., Schaefer, E.F., 1987. Collaborating on contributions to conversations. Language and Cognitive Processes 2 (1), 19–41.
Clark, H.H., Schaefer, E.F., 1989. Contributing to discourse. Cognitive Science 13, 259–294.
Clark, H.H., Wilkes-Gibbs, D., 1986. Referring as a collaborative process. Cognition 22, 1–39.
Cohen, P.R., 1984. The pragmatics of referring and the modality of communication. Computational Linguistics 10, 97–146.
Daly-Jones, O., Monk, A., Watts, L., 1998. Some advantages of video conferencing over high-quality audio conferencing: fluency and awareness of attentional focus. International Journal of Human-Computer Studies 49, 21–58.
Doherty-Sneddon, G., Anderson, A.H., O’Malley, C., Langton, S., Garrod, S., Bruce, V., 1997. Face-to-face and video mediated communication: a comparison of dialogue structure and task performance. Journal of Experimental Psychology: Applied 3 (2), 1–21.
Everitt, B.S., 1996. Making Sense of Statistics in Psychology: A Second-level Course. OUP.
Finn, K.E., 1997. Introduction: An Overview of Video-Mediated Communication Literature, in: Finn, K.E., Sellen, A.J., Wilbur, S.B. (Eds.), Video-Mediated Communication. Lawrence Erlbaum Associates, NJ, pp. 3–21.
Fels, D.I., Weiss, P.L., 2000. Towards determining an attention-getting device for improving interaction during video mediated communication. Computers in Human Behavior 16 (2), 189–198.
Furnace, J., Hamilton, N.M., Helms, P., Duguid, K., 1996. Medical teaching at a peripheral city by videoconferencing. Medical Education 30 (3), 215–220.
Gale, S., 1990. Human aspects of interactive multimedia communication. Interacting with Computers 2 (2), 175–189.
Houghton, G., Isard, S., 1987. Why to speak, what to say and how to say it: Modelling language production in discourse, in: Morris, P. (Ed.), Modelling Cognition. Wiley, pp. 249–267.
Isaacs, E.A., Clark, H.H., 1987. References in conversation between experts and novices. Journal of Experimental Psychology: General 116, 26–37.
Isaacs, E.A., Tang, J.C., 1994. What video can and cannot do for collaboration: a case study. Multimedia Systems 2, 63–73.
Knowles, L.E., Dillon, C.L., 1996. Education and the continuing professional education of Architects: the role of satellite videoconferencing and learning. Journal of Architectural and Planning Research 13 (2), 140–151.
Kowtko, J., Isard, S., Doherty-Sneddon, G., 1992. Conversational Games within Dialogue. Research Paper HCRC/RP-31, Human Communications Research Centre, University of Edinburgh.
McAndrew, P., Foubister, S.P., Mayes, T., 1996. Videoconferencing in a language learning application. Interacting with Computers 8 (2), 207–217.
McGrath, J.E., 1984. Groups: Interaction and Performance. Prentice Hall, Englewood Cliffs, NJ.
McGrath, J.E., Arrow, H., Gruenfeld, D.H., Hollingshead, A.B., O’Connor, K.M., 1993. Groups, tasks and technology: the effects of experience and change. Small Group Research 24, 406–420.
Manning, T.R., Goetz, E.T., Street, R.L., 2000. Signal delay effects on rapport in telepsychiatry. CyberPsychology and Behavior 3 (2), 119–127.
Newlands, A., Anderson, A.H., Mullin, J., Fleming, A., 2000. Processes of Collaboration and Communication in desktop videoconferencing: do they differ from face-to-face interactions? In Proceedings of Gotalog 2000, Fourth Workshop on the Semantics and Pragmatics of Dialogue. Göteborg, Sweden, June 2000.
Newlands, A., Anderson, A.H., Mullin, J., 2003. Adapting Communicative Strategies to Computer-Mediated Communication: an analysis of task performance and dialogue structure. Applied Cognitive Psychology 17 (3), 325–348.
O’Conaill, B., Whittaker, S., 1997. Characterizing, predicting, and measuring video-mediated communication: a conversational approach, in: Finn, K.E., Sellen, A.J., Wilbur, S.B. (Eds.), Video-Mediated Communication. Lawrence Erlbaum Associates, NJ, pp. 107–131.
O’Conaill, B., Whittaker, S., Wilbur, S., 1993. Conversations over video conferences: an evaluation of the spoken aspects of video-mediated communication. Human Computer Interaction 8, 389–428.
Ochsman, R.B., Chapanis, A., 1974. The effects of ten communication modes on the behaviour of teams during cooperative problem solving. International Journal of Man-Machine Systems 6, 579–619.
Olson, J.S., Olson, G.M., Meader, D., 1997. Face-to-face group work compared to remote group work with and without video, in: Finn, K.E., Sellen, A.J., Wilbur, S.B. (Eds.), Video-Mediated Communication. Lawrence Erlbaum Associates, NJ, pp. 157–172.
O’Malley, C., Langton, S., Anderson, A.H., Doherty-Sneddon, G., Bruce, V., 1996. Comparison of face-to-face and video-mediated interaction. Interacting with Computers 8 (2), 177–192.
Reiss, J., Cameon, R., Matthews, D., Shenkman, E., 1996. Enhancing the role public health nurses play in serving children with special health needs: an interactive videoconference on Public Law 99-457 part H. Public Health Nursing 13 (5), 345–352.
Robinson, A., 1993. Communication and contact by videoconferencing in the divided society of Northern Ireland. Journal of Educational Television 19 (3), 127–137.
Schmidt, K., 1994. Cooperative Work and its Articulation: Requirements for Computer Support. Travail Humain 57 (4), 345–366.
Sellen, A.J., 1995. Remote conversations: The effects of mediating talk with technology. Human Computer Interaction 10, 401–444.
Shadbolt, N.R., 1984. Constituting Reference in Natural Language: The problem of referential opacity. Unpublished doctoral dissertation, University of Edinburgh, Edinburgh, Scotland.
Tang, J.C., Isaacs, E., 1993. Why do users like video? Computer Supported Cooperative Work (CSCW) 1, 163–196.
Tröster, A.I., Paolo, A.M., Glatt, S.L., Hubble, J.P., Koller, W.C., 1995. Interactive video conferencing in the provision of neuropsychological services to rural areas. Journal of Community Psychology 23, 85–88.
Williams, E., 1977. Experimental comparisons of face-to-face and mediated communication: A review. Psychological Bulletin 84 (5), 963–976.