Experiments using multimedia interfaces in process control: Some initial results

Experiments using multimedia interfaces in process control: Some initial results

(_~mpul. & Graphics Vol. 17, No. 3, pp. 205-218, 1993 Printed in Great Britain. 0097-8493/93 $6.00 + .00 © 1993 Pergamon Press Ltd. Multimedia EXPE...

1MB Sizes 0 Downloads 29 Views

(_~mpul. & Graphics Vol. 17, No. 3, pp. 205-218, 1993 Printed in Great Britain.

0097-8493/93 $6.00 + .00 © 1993 Pergamon Press Ltd.

Multimedia

EXPERIMENTS USING MULTIMEDIA INTERFACES IN PROCESS CONTROL: SOME INITIAL RESULTS JAMES L. ALTY, MARIUS BERGAN, and PENNY CRAUFURD Loughborough University of Technology, Department of Computer Science, Ashby Road, Loughborough, Leicester LE11 3TU, United Kingdom and CIARAN DOLPHIN Work Research Center, Ltd., 22 North Umberland Road, Dublin 4, Ireland Abstract--This paper reports a series of experiments using different combinations of multimedia interfaces. The task used in the experiments is the Crossman Waterbath. The media combinations are compared and contrasted to draw out pointers towards the effects of different media. It is concluded that warnings affect comprehension and should be minimised during exploratory learning. Sound results were disappointing, and had a detrimental affect on performance. However, it was thought that the problem lay with the lack of discrimination between different sound levels. Those subjects who found the task hard were greatly helped by speech warnings. In conclusion, the experiments show that different presentational styles do indeed matter, and it is possible that multimedia interfaces are particularly useful for aspects of the interface which are difficult to understand• l. INTRODUCTION--FEATURES AND BENEFITS Any application in which technology is used exhibits a number of features. For example, the ability to transmit video pictures over large distances is a feature and the capability of providing output messages using the spoken voice is also a feature. Features can be extremely useful and relevant or they can be useless or even harmful. Whether a feature is useful or harmful may even depend upon the context in which it is used. The important point is that features in themselves are neutral. The capability of receiving video pictures, for example, is of no use to a person dying of thirst in a desert. In like manner, a rich multimedia interface that cannot be understood is of no use to the operator observing it. Features become useful only when they enable h u m a n beings to attain goals that are of importance to them, more easily than they could without the use of the feature. A technological feature that is useful and enables h u m a n beings to attain goals in a more eti~cient or safe m a n n e r is often termed a "benefit." These days, particularly with respect to the application of information technology, features are often promoted as benefits in themselves. For example, a recent issue of Open Computing[ 1] proclaimed: For while Multimedia has the potential to transform computing, many of the components that create it are already available--it's just that users haven't brought them together yet . . . . Admittedly, the one thing that Multimedia's been waiting for is applications. It should be noted that the term Multimedia can be used to describe a number of different phenomena. It can refer to the transmission of multimedia information, to the storage of information on different media, or to the provision of a number of media of c o m m u nication on a human machine interface. In this chapter we are using the term in the latter context.

Multimedia features are highly technological in nature. For example, Table 1 below shows some multimedia features for the S U N SPARC workstation. Such features have the potential of providing benefits to users but they do not provide such benefits in themselves. Most new developments in interface design have been brought in without full prior justification. H u m a n beings are very inventive and will usually eventually work out how to convert features into benefits. This is, however, rather a slow process and it can be speeded up by formal investigations into feature properties and their relationship to human capabilities and weaknesses. For example, some work already exists on media usage and characteristics[2, 3]. However, the move to Multimedia interfaces is more than just a matter of improving information presentation. Marmollin[4] has pointed out that computers can be viewed either as compensating for the intellectual limitations of man or as tools for extending and augmenting the mind. In this latter sense he argues that complex, dynamic and integrated representations of information really are necessary for utilizing all the capabilities of the m i n d . . , the use of multidimensional representations could also be described as an attempt to utilize the whole mind, i.e., both our ability to logical reasoning and our ability to creative thinking. •

.

.

To make the point, consider watching a baUgame on television. It is enough, visually, to see one player score with the ball, but hearing the event as well gives it an added dimension. Although the sound aspects would appear to be redundant, from the point of view of understanding what is happening, they do seem to add to our comprehension of the event. We believe that the application of Multimedia techniques to interface design is not at all well understood. We have therefore been investigating the application 205

206

J. L. ALTY,M. BERGAN,P. CRAUFURDand C. DOLPHIN Table 1. The SUN workstation multimedia features.

Bit-Mapped Display Graphics Audio Visual Data Compression Control Synchronisation Networking Authoring Performance

1152 x 900 bit mapped display, most models providing 8-bit pseudo colour All SPARC stations support Open Windows 2.0 with X 11/NeWs imaging Every SPARC workstation has built in audio capabilities (8 kHz single channel. VideoPix (a video frame grabber is also available) JPEG compression support for VideoPix. Uniflix--video compression/decompression toolkit for full-motion video No complete and comprehensive solution but a number of mechanisms are offered such as shared memory, interprocess communication, timers, semaphores and threads. Client-Server support. Multimedia as natural extension of the client-server model A number of tools are available including HyperNeWS, Cats Meow, and Mediawrite Suf~cient

of multimedia techniques in the improvement of interface design. Our main area of interest is in Process Control, but many of our results will hopefully be of more general use. 2. WHY M I G H T THERE BE PARTICULAR PROBLEMS WITH MULTIMEDIA INTERFACES?

In interface design we have always designed in a pragmatic manner. The first interfaces were produced without little thought as to their content or impact. For example, in the early days of computing, there was only scrolled text as an output medium. Although designers did not fully understand how to use this medium (in computer interfaces) in an effective manner, they did have a thorough understanding of text as a communication medium and had the communication experience of a few hundred years to fall back on. When graphics were used in interfaces, designers could rely on a solid base of experience in graphical communication. Colour was the first medium extension that caused any real problems. Many human beings simply do not understand how to use colour for communication (most of us have difficulty in matching clothes or wallpaper). Only a few human beings are skilled enough to use colour effectively (and we value them as artists). Therefore, the early colour interfaces were disasters with colour being used because it was there not because it was needed--as a feature not as a benefit. It is only recently that we have learned that the secret of using colour is to use it sparingly. Psychologists and artists knew this truth even when colour was initially being used in interface design, but nobody asked their opinion and many mistakes were made. Now we are moving towards a technology that provides text, graphics, animation, sound, speech, and combinations of these. If we knew little about colour we (as computer interface designers) know almost nothing (from a computer interface point of view) about the use of multimedia in communication. Other areas of communication activity could offer us the benefit of considerable experience. The production of films and video sequences, for example, is a highly skilled activity. Not only does the average human being have very high standards by which to judge such media, in addition there are highly developed mechanisms for achieving communication. More work is therefore needed to answer the ques-

tion "what media when?" and such work also needs to provide a methodology to assist designers. These two issues were important starting points for the PROMISE project--an ESPRIT II project that started in 198915]. The first 3 years were concerned with building a multimedia toolset and carrying out some experiments on multimedia interfaces. 3. M U L T I M E D I A EXPERIMENTS IN THE BEHAVIOUR LABORATORY

3.1. The experiment Because experiments and observations in industrial exemplars were expected to take some time to set up and execute, we decided to carry out some initial experiments in the laboratory. Obviously the task chosen should be relevant to process control (i.e., a relatively complex dynamic task with multiple state and control variables). It should also have scope for multimedia presentation, should be readily learned by complete novices, and should be straight forward to implement, i.e., in a 2-3 week period. The task selected for this experiment was Crossman's Waterbath[6] (Fig. 1). The task is closely related to the process control domain[7] and we felt there was scope for multimedia presentations. It has been used on numerous occasions[8, 9] since it was first used by Crossman and Cooke to show operator progression from a second order controller to exhibiting open-loop control. More recently it was used by Gillie and Berry[10] to compare visual displays where they concluded that previous claims that integrated displays are superior to separable displays for monitoring tasks may not generalise to control tasks.

invalve

flowrate

heater

Fig. 1. Crossman's Waterbath.

Experiments using multimedia interfaces The Waterbath Task, illustrated in Fig. 1, is a simulated thermal hydraulic system that consists of a single tank (in the variation used for this experiment). An inflow and an outfow pipe are connected to the tank. There is a valve on each pipe that may be used to regulate the flow. The heater is situated immediately underneath the tank. Inside the tank is an insulated container containing a fixed amount of water and a thermometer. There are also imaginary sensors to measure the level of water in the tank and the rate of flow out of the tank. Thus there are three control variables: in-valve, outvalve, and heater. In our implementation of the system the controls can be set to an integer value between 0 and 100 where 0 means completely shut (or off) and 100 means completely open (or on). The state of the process is indicated by the three variables level (in millimetres), flow-rate (millilitres per second) and temperature (C). The tasks that the subjects had to solve involved achieving new steady state process states from a steady state starting point. The target state was defined as a range within which each of the system variables had to lie after stabilisation of the system. Subjects were placed in front of a PROMISE terminal and their actions were recorded on two television cameras. Sound and speech activity were also recorded. In addition, some gesture information was recorded (as it happened) on the video tape by a hidden observer. 3.2. The procedure More than 50 subjects participated in the experiment. They were mainly undergraduate students in Computer Science and were also mainly male. The few females who took part were balanced across the experimental conditions. Each session lasted about 2½ hours including a debriefing interview. After having read a brief introduction to the experiments that explained the task and described the control and state variables without giving any principles on how to operate the system, the subject was told how to operate the interface used to control the simulation. Each session consisted of two halves. During the first half the subject completed 21 problems of increasing difficulty. These problems only involved simple increases or decreases in one or more variable. The number of variables that had to be increased or decreased is referred to as the dimensionality of the task by Sanderson, Verhage, and Fuid[7]. Another attribute of the task is its compatibility. This corresponds to the minimum number of actions required to solve a problem. Sanderson et al. called problems that could be solved with a single action compatible, and problems that required two actions incompatible. For example, the task of increasing the level and increasing the flowrate is compatible because the dual goal can be achieved with the single action of opening the in-valve from its initial setting. The task of increasing the level whilst decreasing the flow, however, is incompatible because it requires two actions, such as a reduction in the outvalve (to increase the level) followed by a reduction of

207

the in-valve (to decrease the flowrate). It was shown by Sanderson et al.[7] that compatibility had a higher impact on performance than did the dimensionality of a problem. The subject controlled the progression through the session by clicking with the mouse on a "next task" button in the top right corner of the screen after the simulator had recognised that an acceptable steady state had been reached on the previous problem. Subjects were asked to verbalise their beliefs about the system and their reasoning behind actions. They were also asked to minimise the number of actions used to solve each problem. After completing the first 21 tasks the subject was given sets of semantic differential scales on which to state their ratings on various aspects of the interface and on the mental effort required to perform the task. They were also asked to answer three specific questions about the relationship between the control and state variables. The questionnaires were repeated upon completion of the second half of the session. The problems in the second half were different from the first in that the target ranges for the state variables were smaller and placed further away from the starting point for the variables. This meant that the amount of change in the controls was often as important as choosing the correct control variable. Most subjects were able to complete all 13 of these problems in the time available. Finally the subject was interviewed about various aspects of the presentation and any other comments were recorded. In addition to the subjective ratings of the subjects, we recorded the complete session on video camera. As the experiments took place in a professional behaviour laboratory we were able to mix video from three sources on the recorded tape. The three shots available were one of the subject's face, one of the screen, and one overhead shot showing the mouse and keyboard activity. The sound was recorded through a microphone attached to the subject. The final data source for the analysis was a log of all user actions and system generated events. We logged complete snapshots of the system state so as to be able to replay sessions if necessary. The logged data was used for the quantitative analysis of performance. 3.3. Experimental conditions As the experiment was of an exploratory nature it was decided to use a wide range of different media interfaces with relatively few subjects in each group rather than having two or three larger groups. We wanted to observe as many media and combinations of media in action as we could. This had implications for the likelihood of achieving statistically significant results but it was felt that these were of secondary importance compared with the exploratory observations we would be making. There were eight different conditions with five or six subjects in each. Each subject used the same interface throughout the whole session. The conditions were T, G, TMM, GMM, TS, GS, GSP, and GW as shown in Table 2.

J. L. ALTY,M. BERGAN,P. CRAUFURDand C. DOLPHIN

208

Table 2. Experimental conditions. Text T G TMM GMM TS GS GSP GW

History table

Sound

Spoken warning

Text warning

• •

• •





• •

• • •

Graphics

History graph

• • • •



• • • •

T = text only; G = graphics; TMM = text multimedia; GMM = graphics multimedia; TS = text and sound; GS = graphics and sound; GSP = graphics and spoken warnings; GW = graphics and textual warnings.

• Text only (T)--Current values for the state variables were shown numerically with the bounding values for their target ranges. In this interface the subjects altered the controls by entering a letter indicating which control should be altered ( ' T ' for in-valve, " O " for out-valve and " H " for heater) followed by a numeric value between 0 and 100 into a large command field. The current values for the control variables were displayed on the screen. • Graphics (G)--The current values for the state variables were graphically coded in a realistic impression of the waterbath simulation. The level was shown as blue water rising and falling within the tank, the temperature was shown by rising and falling mercury (which happened to be red) in a thermometer and the flow rate was shown as a scaled up bar, which indicated the flowrate. The target ranges were also graphically coded with yellow marker lines indicating where the extremes of the target ranges were for each variable. The state variable values were numerically annotated on the diagram. Sliders were used for the three controls in this condition so no keystrokes were necessary. • Text Multi Media (TMM)--Numerical values of the state variables were shown in scrolling lists. These lists therefore showed the recent history of a state variable's values. Target ranges were shown numerically. In this condition the subject also received spoken warnings when abnormal situations arose. There were two types of warnings: For emergency warnings, such as when the tank was about to overflow with water and the system automatically shut off the invalve, a female voice was used. For trend warnings, which occurred whenever a state variable was persistently moving away from its target range over a long period, a male voice was used to draw the subjects attention to that fact. The flowrate variable was indicated with a continuous sound of flowing water which varied in volume according to the value of the flow rate. • Graphics Multi Media ( G M M ) - - A scaled-down version of the graphic impression of the system used in G above was shown together with a line graph that showed the trends and current value of each of the state variables as different coloured lines. The spoken warnings and the water flow sound were also present.

• Text and Sound (TS)--This was the same as T with the addition of the water sound for the flow rate. It was included to enable us to examine the effect of sound in isolation. • Graphics and Sound (GS)--The same as G with the inclusion of the water sound. • Graphics and Spoken Warnings (GSP)--The same as G with the inclusion of the spoken warnings in the event of erroneous process states. The warnings were in fact provided both textually in the form of messages appearing at the top of the screen and verbally. • Graphics and Textual Warnings (GW)--This is the same as G except that the warnings were only given in the textual manner. 3.4. Independent variables With the verbal protocol analysis, the subjective ratings and comments, and logged data there are clearly a great many ways in which data can be analysed and results presented. In this chapter we present the results under main headings that correspond to dependent variables under investigation. By comparing results from individual or combinations of conditions we can isolate the following main independent variables of which all but one relate to media. 1. The presence/absence of warnings--By comparing conditions G vs. GW and GSP, the effect of having warnings can be examined. Admittedly, the way the warnings are presented is likely to influence the overall effect of the warnings. A comparison of the two ways to present the warnings is also carried out. 2. No sound vs. sound--Comparing conditions T and G vs. TS and GS isolates the use of sound as the only difference between the two groups. 3. No speech vs. speech--GSP vs, GW in Table 2. An alternative grouping here could be T, G, TS, GS, and GW (i.e., conditions without spoken warnings) vs. TMM, GMM, and GSP (conditions with spoken wamings). However, this would introduce a number of other variants that could affect the analysis, such as the presence of trend information in some conditions, the presence of sound in some conditions and the use of text (numbers) as a main source of information in conditions some and graphical representations in others. Thus comparing conditions GSP and GW is most appropriate as the only dif-

Experiments using multimedia interfaces ference between the two is that in condition GW the warnings are redundantly presented with speech. 4. Text vs. graphics--T and TMM vs. G and GMM. There is pair-wise correspondence between the conditions in each of these groups. Within each pair the information content and redundant media usage are the same but the primary media for presenting the information is textual in the first condition and graphical in the second condition. To obtain an indication of whether an interface with a textual inclination or one with a graphical inclination is better for operating, these pairs can be analysed individually or the two groups mentioned above can be analysed. 3.5. Other measures 3.5.1. Performance measures. For the performance analysis of the data logs we chose the completion time for a task, the number of actions needed to complete the task and the number of warning situations as the dependent variables. 1. Time taken to complete the tasks--Completion time is perhaps the most straightforward measure. Obtaining a good (low) time score meant the session was completed quicker and it gave the subjects the feeling of being in control of the simulator. Thus, there was a clear incentive for the subjects to optimize their performance in this respect. Another reason for using the completion time is that the concept of time saving is extremely relevant to the process control domain. 2. Number of actions--The number of actions used is perhaps not as robust a measure as completion time. Although the subjects were asked to minimise the number of actions taken, it became very clear that in situations where there might be a trade-off between speeding up the process and minimising the actions subjects opted to speed up the process at the cost of additional actions. However, we include this measure in our analysis and attempt to offer explanations where the number of actions seems to deviate from the other two measures. 3. Number of warnings given--The number of warning situations should provide a good indicator of how well the subjects operated the simulator. Although not all the conditions included any actual warnings, the number of abnormal situations that would cause a warning to be triggered could still be counted. A large number of such situations would indicate a poor performance. Subjects were observed doing the tasks and were debriefed afterwards. They were also required to answer value judgement questionnaires about the interfaces. 3.5.2. User opinion o f the interface. 1. Subjective Rating Questionnaires--ln the middle and at the end of each session the users were asked to fill in a questionnaire in which they were asked to rate all the different aspects of the interface (graphics, text, sound, speech etc.) on a number of

209

bi-polar adjective scales. These were then analysed and compared between the different interfaces to find out the users perception of the strengths and weaknesses of the different interface styles. The subjective rating scales used were those reported by Shahnavaz, Wang, and Boucherat[ll] as part of their study on the specification of evaluation tools and methodology. A list of bipolar adjective pairs, for example "pleasing--irritating," on a scale ranging from 1 to 7 was presented. The positioning of the positive and negative adjectives on the top end or the bottom end of the scale was balanced between the pairs on the questionnaires to prevent response bias. However, when presented here the positive adjective is always placed at the top end of the scale to ease comparison. 2. Debriefing interviews--After the experimental session the subjects were interviewed and asked their views on the different styles of presentation, any tasks or procedures that they had found particularly difficult, any possible improvements they could suggest and so on. Some of the interviews were videoed and some of them were recorded on paper. Observation of testing--The recordings of the subjects taken during the experimentation were watched to glean any further information. For example: • additional comments about the interfaces; • problems the users had with the tasks; • misunderstandings/misconceptions about the system, controls or displays; • strategies used to achieve goals and avoid problems; • learning patterns. 3.5.3. Measures concerning user understanding o f the system 1. Verbal protocols analysis--The subjects were asked to speak out loud while they were doing the tasks and verbalize why they used the control actions that they used, any connections that they noticed between the control and the state variables and any other thoughts or opinions that they had about the system. The experimental sessions were videoed and the statements concerning connections between the variables were then analysed to find the percentage that were correct. The dependent variables used for the verbal protocol analysis correspond to those used by Sanderson et al.[7]. For 50% of the first 21 problems the number of correct and incorrect statements about a particular system or control variable was coded. The number of correct statements is expressed as a proportion of the total number of statements. 2. Knowledge state questionnaire--In the middle and at the end of each session the subjects were asked to complete a table that gave them control actions using each of the controls and asked them what the effect would be on each of the dependent variables. These tables were analysed to give an idea of the

210

J. L. ALTY,M. BERGAN,P. CRAUFURDand C. DOLPHIN



No Warnings given

200

5

[]

Warnings given

u>

2



°O ~ m

3 lOO

~4

0

Completion time

Number of Actions

Number of Warning situations

Fig• 2. The effect of warnings on performance. subjects' understanding of the effects of the different controls. 4. RESULTS AND DISCUSSION

4.1. Warnings The first effect analysed is that of receiving warnings about persistently bad trends in the state of the process. Figure 2 shows that the warnings improved performance with respect to completion time and number of situations where warnings should be given, but slightly increased the number of actions. However, the difference in the number of actions does not reach significance whereas the other two differences are significant (p < 0.02 for completion time and p < 0.04 for warning situations). The different pattern of results between the number of actions and the other two measures raises a number of interesting issues. In most cases receiving a warning about a bad trend simply enabled the subject to take corrective action earlier than would have been done without the warning• Following that reasoning, the number of actions would not benefit from the warnings as corrective



No Warning

action would have to be taken at some point in any case in order to solve the problem. However, the tendency for the number of actions to be greater in the warnings situation may indicate that warnings can elicit or trigger an initial pressurised response that may not be the best or even correct action and must be followed up by a further corrective action. Without any warnings, subjects strayed far more frequently into situations where a warning would normally have been given. Thus warnings tend to keep a subject within a narrower envelope. When subject comprehension is examined, warnings did have a significant detrimental effect on the number of correct statements made about the outvalve and the flowrate (Fig. 3). These are the variables that are the most difficult to comprehend for subjects in conditions both with and without warnings. Warnings may have a detrimental effect on subjects' understanding of the most difficult aspects of the system because they may be inherently disruptive to the thought process of the subjects. In many situations a subject may be in the middle of observing how the system behaves, trying to build a picture of cause and effect in the system, when a warn-

[]

Warning

1oo

.~ so (p<.o8) (p<.o I )

P" ~

20 0

In valve

Out valve

Heater

Level

Flow Rate Temperature

System or State Variable Fig. 3. The effect of warnings on comprehension.

211

Experiments using multimedia interfaces ing suddenly disrupts this process and forces a corrective action. The action taken may not always be the correct one, as seems to be indicated by the performance results, and it leaves the subject none the wiser about the underlying principles of what is going on. The subjective rating scales indicated that warnings were generally received very well by the subjects. This was also evident from the interviews. However, some subjects commented that warnings sometimes persisted after they were aware of the problem (warnings were repeated at regular intervals if the bad trend that caused the initial warning had not been turned round). This was often due to the slow response of the system, particularly with respect to the temperature. However, we did observe situations where a subject thought he had corrected a bad trend but had either taken the wrong corrective action or had not altered the controls sufficiently to turn round the trend. In these cases it may have been more appropriate to use a different warning that could bring the subjects attention to the mistake. 4.2. Sound (non-verbal) As mentioned previously, sound was intended to provide subjects with redundant information about the flowrate. We chose to use a realistic sample of water flowing from a tap that was played continuously and varied in volume according to the flowrate. In doing so we followed the philosophy that a realistic sound would be more easily understood than would a synthesized unrealistic sound. This view has been strongly put by Gaver[ 12]. The trends in the Fig. 4 (overleaf) indicate that sound had a detrimental effect on performance overall. However, only the number of warning situations approaches significance (p < 0.1). In order to see whether the sound was equally bad for all types of tasks, we broke down the tasks into three categories of difficulty. In category 1 we put the compatible tasks from the first half of the session, that is, the compatible tasks that had unrestricted target ranges for the state variables. In category

No Sound

200

8 -

2 we put the incompatible tasks with unrestricted target ranges. In category 3 we put the tasks from the second half of the session with restricted target ranges for the state variables. These categories also correspond roughly to the order in which subjects had to solve the problems, so it may also reveal something about the learning effects. A repeated measure analysis of variance with the difficulty of the tasks as within subjects factor revealed the following pattern. The clear pattern that emerges from the graphs in Fig. 5 is that operator performance with interfaces using sound was better for the difficult tasks than for the easy ones. The trend shows a marked difference between the category 2 and category 3 tasks. However, multivariate analysis does not show a significant interaction between the difficulty factor and the use of sound. The only dependent variable approaching significant interaction is the number of warning situations (p < 0.15). Although the trend is not significant, it is tempting to speculate about what caused this trend. The fact that category 3 tasks had restricted target ranges that were often placed a considerable distance away from the starting points for the state variables generally meant that larger changes in the flowrate were required in these tasks. The larger changes in the volume of the sound may have enabled subjects to make some use of the audio information or at least made it less irritating and distracting. It may also be that since the category 3 tasks were towards the end of the session, the subjects had become accustomed to it and perhaps managed to use it to their advantage. The verbal protocol analysis of number of correct statements made under this condition (Fig. 6) indicates that sound decreases comprehension about the very aspects of the system in relation to which it was meant to provided redundant information. This result was wholly unexpected. At worst we expected there would be no difference. The only reasonable explanation seems to be that the sound did in fact disturb and distract people from reasoning about the system. Note

[] Sound

t.o o

1.0-

o~

t5

O.8' ¢n O e'4

O.6'

_=

~ 4

100

0

.

. ~ 0.4E

0.2o

0.0-

o Completion time

N u m b e r of Actions Fig. 4. The effect of sound on performance.

ii

N u m b e r of W a r n i n g situations

J. U ALTY,M. BERGAN,P. CRAUFURDand C. DOLPHIN

212

÷ 300

Time

No Sound

o---

Actions

Sound t4"

12"

Warning situations

1C"

206

8 6 4

2 i



6

1



2

j

!

3

,

1

=

,

2

I

I

=

,

1

3

,

'2

!

3

Fig. 5. The effect of sound over task difficulty.

that the verbal protocol analysis was only based on selected tasks from categories 1 and 2 as described above where the sound appeared to have a worse impact on performance than in category 3 tasks. Also, it is important to remember that the flowrate and the out-valve were the aspects of the system that were most difficult to understand overall. Sound was generally received badly by the subjects. If we compare the way in which they rated sound and speech, for example, we can see from Fig. 7 (overleaf) that the sound was thought to be less important, more difficult to use, incomplete, unclear, and irritating than the speech. It is tempting to condemn sound completely, based on our findings. However, it is only the way in which we used sound that seems to have had a poor impact on performance, understanding, and subjective ratings. The main problem with the sound we used was that it was not possible to detect small changes in the sound volume. We were able to play the sound at 100 different volumes, but it appears that this scale was not sufficiently graded for the small changes in the flowrate that were often important. Additionally, subjects appeared unable to pick up the smallest volume changes. By opting for a sampled sound we had hoped to exploit



100

associations with water flowing on behalf of the subjects. We succeeded in this in that no subjects were in any doubt that the sound they heard was the sound of water flowing out of the bath. However, changing waterflows produces changes in the pitch of the sound as well as the volume. We did not have sophisticated enough equipment to produce both of these. Thus, we did not fully exploit associations with changingwaterflows on behalf of the subjects that were perhaps the most important association for the subjects to make. This would account for the sound not being a benefit to the subjects. To account for the decrease in performance the most obvious reason is that the sound was continuous. This was clearly an irritating factor that put many subjects off the task at hand. This does not mean that sound cannot be used to convey continuous information. In the Arkola simulation described by Gaver, Smith, and O'Shea[13], they used discrete sounds to convey continuous information about the state of a bottling plant with considerable success. On a positive note we should not forget the trend found in Fig. 5. The fact that the sound became more useful towards the end of the sessions may indicate that despite the inappropriate way in which we em-

No Sound [ ]

Sound

80

==6o =~ u=

4o

2O

In valve

Out valve

Heater

Level

Flow Rate Temperature

System or State Variable

Fig. 6. Effect of sound on comprehension.

213

Experiments using multimedia interfaces •

6

Sound

[]

(N.S.) _ _

(p<.04) '¢--"

(p<.06) "

Speech

(p<.03) .

(p<.07) ""

|iiii~!

(N.S.)

=....

2

0

:;:~:~:~

v ~,-:.:

.......

l.';N:.~i

~:::+:

Important

Easy to Use

Complete

Clear

Not tiring

Satisfact.

Pleasing

Rating Scale Item Fig. 7. Subjective rating of the auditory information. pioyed the sound, people still showed signs of using it with practice. This suggests that people have a remarkable ability to utilise the audio channel. 4.3. Speech and visual warnings We have seen that warnings about bad trends in the system had a positive effect on performance, particularly for the difficult tasks. Interestingly, we also found that the warnings decreased subjects' understanding of the system. The next question to be posed is whether the way in which the warnings were presented to the users had an impact on these effects. Here we compare the condition where the warnings were given only textually (GW) and the condition in which the warnings were spoken as well as given textually (GSP). It can clearly be seen (Fig. 8) that overall, there is no difference in performance due to the way in which the warnings were given. This was not anticipated because the speech had generally been well received by [ ] No Speech 1 O0

5-

80

4-

60

3-

the subjects and they claimed it helped them solve the problems. It appears that the real help came from the fact that they had warnings at all. Again we inspect where the differences (if any) between the speech and the no-speech conditions lie with respect to tasks of varying difficulty. The pattern in these graphs in Fig. 9 shows that speech was slightly more beneficial to subjects in the earlier/easier tasks. However, the interaction between difficulty and use of speech is only significant for completion time (p < 0.05). Nonetheless, it is possible to offer a plausible explanation for this pattern. In the beginning, the subjects react slightly better to spoken rather than printed warnings. This is probably due to the fact that a spoken warning is seldom (if ever) completely missed. As the session goes on the subjects in conditions with spoken warnings may come to rely on the warnings to help them solve the problem. They may relax in the knowledge that "it will tell me something is wrong anyway." []

,.0

Speech

o

0.5

•"=

OA

~

e,.,

O

¢13

0.6

=

"~= 0.3 L. ¢4

40

0.2

2O

1. ¢' Completion time

o.1 o.o

0N u m b e r of Actions

Fig. 8. The effect of speech on performance.

N u m b e r of Warning situations

J. L. ALTY,M. BERGAN,P. CRAUFURDand C. DOLPHIN

214

No Speech

Speech

ca

Actions

140

8

120

7

1013

6

8C

5

60

4

40

3

1.2

Warning situations

1.0 0.8 0.6

2(3

~l

"

J2

12

~3

0.4 0.2 •

't

"

'2

"

'3 "

I1

J2

"

13

Fig. 9. Performance of speech over task difficulty. As the warnings only come as a result of established bad trends, this is not an optimal way to operate the system. We received a great deal of positive feedback about the speech. Another interesting comparison is between the subjective ratings of spoken and textual warnings. Figure 10 shows no significant differences between the way subjects perceived spoken and textual warnings although there is a quite large difference with respect to the rating of importance. A closer look at the data showed that there was a bi-modal distribution for the ratings of importance for text warnings. One group of subjects had rated the textual warnings as very important and another group had rated them as rather unimportant. Further investigation revealed that these two groups corresponded very well to people who had rated the overall task as difficult and those who had rated the task as easy. This prompted a two-way analysis of variance to see if this was a significant pattern. Figure 1 1 shows that this result is highly significant. There is a clear difference in the rating of the text warnings. Those who found the task easy, found the text warnings unimportant and those who found it hard, valued the text warnings as being very important. This difference does not exist for speech warnings that was rated as important by all subjects.



The likely reason for this result is again that the speech is very intrusive and is never missed. Another important attribute of speech is that it can be processed in parallel with the visual perception of the screen, which means no extra effort is required to take in the spoken information. Thus all the subjects, irrespective of how hard they found the tasks, picked up and employed the spoken warnings. Textual warnings on the other hand can be easily missed. It also requires that the subject switches attention from operating the task to read the warning message. Subjects who found the problems easy to solve may have been more focussed at solving the problems and not paying so much attention to events in the periphery of their vision. It should be noted that the textual warnings were by no means obscure. Thus the earlier suggestion that spoken warnings invite the subjects to rely on them to a greater extent than the textual warnings, may have to be qualified to be true only for people who find the tasks easy to perform. 4.4. Other results A question that may be legitimately asked about multimedia interfaces is whether they actually make any difference. Does the presentation affect the way in

SpeeehWarnings

[]

TextualW a r n i n g s

,I P.,,0

4

~4

3

2 1

0

Important

Clear

Helpful

Complete

Not tiring Easyto U s e

Rating Scale Item Fig. 10. Subjective rating of textual and spoken warnings.

Pleasing

215

Experiments using multimedia interfaces

,a

=o

6

e-'

4

o

| Low Perceived Difficulty ] High Perceived Difficulty 2

E c:

0

Speech Warnings Textual Warnings Fig. 11. Subjective rating of warning medium by perceived task difficulty. is trying to model the system. They are effective in assisting a subject to attain a particular system state but are not so helpful if the goal of the user is improved understanding. Warnings should be particularly useful to trained operators but should be minimised during exploratory learning. This explains why subjects found the warnings useful and yet showed a lower understanding when warning levels were higher. Sound is generally thought to be useful in delivering warnings and for context switches. In these experiments sound was found to have a detrimental effect on performance. Even more worrying, it seemed to have a detrimental effect upon the level of understanding of the very variables (e.g., flowrate) it was designed to help. However our use of a complex naturalistic sound may not have had the discriminability required. Also human beings can distinguish quite small changes in sound. However they are not good at using sound as

which information is picked up and consequently reasoned about? Figure 12 indicates that the presentation does indeed matter. It can be seen that in these experiments there were certain aspects of the system that was very well understood by all subjects, irrespective of what interface was used. In particular, comments about the behaviour of the temperature and the operation of the in-valve were consistently above 95% correct for all interfaces. Some insignificant variations due to the interface was found for the heater and the level, whereas significant variations were found for the flowrate and the outvalve that averaged round 50% and 30% correct statements made respectively. It is clear that the presentation matters considerably for understanding more difficult aspects of a task. This opens up an interesting question of whether there is exists a point where a problem becomes so difficult that the type of presentation again becomes irrelevant. It seems like a reasonable proposition. If the information presented is not sufficient or pitched at the wrong level of abstraction for solving a difficult task it is unlikely that whether it is presented graphically or textually or otherwise will not make a great deal of difference. This is to say that presenting the information an appropriate manner is indeed important but presenting the right information to solve the problem is still more important. The users perception of the interfaces varies with the nature of presentation of information. The text only interface was the least popular being scored overall as the least helpful and the most irritating. Conversely the Graphics Multimedia interface received the most positive response on both scales (Fig. 13). 5. S O M E

POSSIBLE

FROM

THE

=

8o

~

60

(p<.01

~ ,o 8

~

20

Level

FlowRate State Variable

Temperature

100

8o

CONCLUSIONS

~

60

-~

2O

~

o

(p<.01)

RESULTS

The first point to be made (which will be rather obvious to psychologists) is the difficulties involved in this type of experimentation and the dangers in assuming that engineers and scientists can produce optimal interface designs by chance. The adaptability of human beings, the affect of a task upon human behaviour, and the variability between human beings always makes experimentation difficult. Rarely can one come out with a simple definitive statement. However, we can make some interesting observations from the data. Warnings seem to affect comprehension because they can often intrude when a subject

C~ 17;3-B

I00

Interface



Invalve

Heater

Description

Textonly

[ ] Graphicsonly ]

Outvalve System Variable

[] Text Multmedia

[ ] GraphicsandWarnings+ GraphicsandSpeech

GraphicsMultimedia Fig. 12. Effects of interface on comprehension.

J. L. ALTY,M. BERGAN,P. CRAUFURDand C. DOLPHIN

216

{F(4,41)=2.3 p<.07}

{F(4,41)=3.7 p<.01

6~

~5

~t_

5

~4

~

4

~

3

=

2

°~

t-

~2

:::::::::::::::::::::

1 :::::::::::::::::::::::

I r r i t a t i n g - Pleasing

Unhelpful - Helpful

Interface 0escription •

Textonly

[]

Text Multmedia

[]

Graphicsonly

[]

Graphicsand Warnings+ Graphicsand Speech

[]

GraphicsMultimedia Fig. 13. The user perception of the different interface styles.

an absolute measure. In this particular experiment it was the absolute value that also mattered as well as the change. It was important to know if the flow rate was changing in the right direction when a target limit was being approached, but the sound output changes were not sufficient even to provide for this. The continuous nature of the sound also had a deleterious effect. Most subjects liked the speech warnings. Speech did seem to improve performance during learning but it is very intrusive and tends to dominate other information. Those who found the task easy also found the textual warnings irrelevant. Those who found the task difficult found the textual warnings important. However this was not followed through with respect to speech. Even those who found the task easy regarded the speech warnings as important. It seems that speech warnings, in themselves, command special respect possibly because of parallel processing of audio information. For this particular task the graphical representation proved to be the most preferred. Thus our contention that this spatially oriented task would best be presented using graphics is born out by experiment. Finally, the media used for presentation does indeed matter. Those aspects of the system that were particularly difficult to understand benefitted greatly from an appropriate presentation confirming our hypothesis that higher bandwidth for a difficult concept should be of benefit. 6. THE

FUTURE--THE TO

THE

APPROACH

EXEMPLARS

The PROMISE system is being used to develop process information systems in two areas: DOW Benelux's chemical plant in Terneuzen, Holland, and the Hunterston B Nuclear Reactor Simulator of Scottish Power in Ayrshire, Scotland. These companies are partners

in the PROMISE project and the exemplar systems are part of the deliverable output from the project. The first version of the exemplars have been completed. All the experimentation reported above is really a prelude to moving into the Real Plant and Simulator to investigate how multimedia interfaces affect operators in the performance of their tasks. It is too early to report on the success or otherwise of the system, but we can give a general indication of the application areas we are attacking.

6.1. The DO W Benelux exemplar At DOW Benelux, the system has been installed in the control room of one of the chemical plants at the site in Terneuzen for about 8 months. It should be noted that it is a process information system, rather than control system. The information and control systems are separated in order to prevent impulsive control actions taking place. The information system exemplar consists of a range of multimedia applications such as an alarm handler, a flowsheet browser, trending, a logic analyser, a glossary, on-line help, and several expert systems. These are all integrated so that moving between applications is facilitated in a manner similar to a hypermedia system. For example, whilst inspecting a plant flowsheet the operator can request additional information about a process variable such as the flow through a pipe by clicking on the pipe and select "trending" from a menu. This activates the trending application and a recent history of the variable will be shown in an appropriate form, such as a line graph. The goal of the operator is to ensure efficient production of the Amines. The process is very complex and involves a large amount of recycling. This means that there are no absolute measurements for efficiency available to the operator. It is only through experience that judgements can be made about the process state

Experiments using multimedia interfaces based on multiple parameters. Investigations carried out by P R O M I S E revealed that different shifts of operators often use different sets of parameters in their monitoring. P R O M I S E has introduced knowledgebased systems to aid the operators in this respect, but it is not possible to remove the need for the operator to constantly make judgements about the quality and efficiency of the production. The operators have therefore been very involved in the design of the applications. There has been a long period of refining and adding functionality to the system in order to make it usable and to gain the operators confidence in it. We have now achieved this and the system is in a stable state. This is illustrated by the fact that when the P R O M I S E workstation was first put in the control room it was situated at the far end of the operators desk. It was soon requested that it was moved to a central position where it would be more easily accessible. We have already recorded a great deal of the operators' reactions to the system. A long period of semistructured evaluation of the system in use will now follow. This will partly be based on logs of system usage and process behaviour and partly based on extensive interviews with the operators. We will concentrate on the evaluation of the multimedia features of the system. It is hoped that through analysis of this data we will be able to derive general rules and guidelines for the use of multimedia in process control. 6.2. Scottish Power exemplar One of the main differences between the two application areas is that the interface developed at Scottish Power is used with a simulator of the Hunterston B Nuclear Reactor rather than the reactor itself. From the operators point of view there is no difference between the reactor and the simulator as the simulator control room is an exact copy of the real thing. It is used for operator training. From PROMISE's point of view this fact can be used to advantage. A main obstacle when evaluating a system in use with a "live" process is that the evaluation cannot be allowed to interfere with the running of the plant. With a simulator however, it is possible to perform more controlled comparative evaluations by repeatedly monitoring how operators handle process scenarios set up by the experimenters. The system developed at Scottish Power is similarly not intended to replace the current control system. The first P R O M I S E application is on the automatic reactor shutdown sequence. A reactor trip and consequent shutdown is a relatively rare occurrence, but when a trip does occur, a large number of operations have to be carried out within seconds and minutes depending upon the state of the plant at the time of the trip. The goal of these actions is to cool the reactor. The automatic shutdown sequence will always follow the safest route regardless of the extra cost this might incur. The role of the operators during a reactor shutdown is to closely monitor the progress of the sequence and the state of the relevant equipment. Operators cannot in-

217

terfere with the sequence itself but may provide additional information that allows the system to take preventive measure to ensure a more economical path (for example, an operator might realise that a piece of equipment is not available because it has just completed its maintenance period, but a phone call might bring it back on-line). Once the shutdown is completed (the normal sequence takes between 7 and 10 minutes) and the reactor is safe it is the operators responsibility to ensure that the reactor can be brought up again as soon as possible. The financial losses that may result from operator inefficiency during this short and stressful period can be significant. It is clear that the information requirements of the operator during a reactor shutdown are of a different nature than those of an operator monitoring Amine production at DOW. The short duration and the high stress of a shutdown means that there is no time for the operator to browse through extensive applications to find the intbrmation needed. Important information, such as the position of the inlet guide vanes on the gas circulators, must be automatically provided at appropriate times by the information system. Thus the interfaces that are under development will consist of a collection of predefined screens with different views of the key information. Abstract graphs that produce distinctive patterns when key variables are within expected bounds are used so that abnormalities can be quickly identified. We expect to carry out controlled experiments with the operators to evaluate the merits of the different types of presentations used. This will include the use of audio although it is expected that this must be restricted given the large amounts of audio alarms generated by the control system during the reactor shutdown.

Acknowledgements--We would like to thank all our partners in the PROMISE consortium--The Chemical Engineering Department at The University of Leuven (Belgium), Scottish Power in Glasgow, DOW Benelux in Temeuzen (Netherlands), EXIS (Italy) and Tecsiel (Italy). We also gratefully acknowledge support in project No. 2397 from the ESPRIT programme of the European Commission. REFERENCES

1. Open Computing, Multimedia: The key to a thousand doors. Open Computing 3, 39. 2. J. Beagles-Roos, Specific impact of radio and television on adult story comprehension. Presented at a meeting of the American Psychological Association, Los Angeles (1985). 3. A. T. Mayes, C. Dolphin, and J. S. Berzal, Critical M41 issues, deliverable No. 37. PROMISE Esprit and Project No. 2397, Report No. PRO/HCI/I 1/3 (1989). 4. H. Marmollin, Multimedia from the perspectives of psychology. In Multimedia: Systems, Interactions and Applications, L. Kjelldahl (Ed.), Springer-Verlag, Berlin, 3952 (1992). 5. J. L. Alty and C. D. McCartney, Design of a multi-media presentation system for a process control environment. In Multimedia: Systems. Interactions and Applications, L. Kjelldahl (Ed.), Springer-Verlag, Berlin, 293-306 (1991). 6. E. R. Crossman and F. W. Cooke, Manual control of slow response systems. In The Human Operatorin Process Control, E. Edwards and F. Lees (Eds.), Taylor and Francis, London, 51-64 (1974).

218

J. L. ALTY, M. BERGAN, P. CRAUFURD and C. DOLPHIN

7. P. M. Sanderson, A. G. Verhage, and R. B. Fuld, Statespace and verbal protocol methods for studying the human operator in process control. Ergonomics 32(11), 13431372 (1989). 8. N. Moray, P. Lootsteen, and J. Pajak, The acquisition of process control skills. 1EEE Transactions Systems, Man and Cybernetics, SMC-16, 497-504 (1986). 9. N. Moray and I. Rotenberg, Fault management in process control: Eye movement and action. Ergonomics 32(11), 1319-1342 (1989). 10. T. Gillie and D. Berry, Object displays and control of dynamic systems (submitted). 11. P. R. Boucherat, R. H. Benton, H. Shahnavaz, and E. Wang. Emulation of a Multimedia Workstation for the investigation of integration principles in design for useability, human factors in Telecommunications. Proceedings of the 13th Symposium. Torino, Italy, 253-260 (1990). 12. W. W. Gaver, The SonicFinder: An interface that uses

auditory icons. Human-Computer Interaction (Vol. 4), Lawrence Erlbaum Associates, Hillsdale, NJ (1989). 13. W. W. Gaver, R. B. Smith, and T. O'Shea, Effective sounds in complex systems: The Arkola simulation. CHI

'91, Proceedings of the Conference on Human Factors in Computing Systems, ( 1991 ). 14. S. Baker and E. Marshall, Evaluating the man-machine interface--The search for data. In Training, Human Decision Making and Control, J. Patrick and K. Duncan (Eds.), Elsevier/Noah Holland, Amsterdam (1988). 15. P. M. Greenfield, Electronic technologies education, and cognitive development. In Applications of Cognitive Psy-

chology. Problem Solving, Education, and Computing, D. E. Berger, K. Pezdek, and W. P. Banks (Eds.), Lawrence Erlbaum, Hillsdale, NJ, 17-31 (1987). 16. C. D. Wickens, Engineering Paychologyand Human Performance, Charles E. Merrill, Columbus, OH (1984).