Int. J. Human-Computer Studies 64 (2006) 240–250 www.elsevier.com/locate/ijhcs
Performance of new participants in virtual environments: The Nottingham tool for assessment of interaction in virtual environments (NAÏVE)

Gareth Griffiths, Sarah Sharples (née Nichols), John R. Wilson

Virtual Reality Applications Research Team, Institute for Occupational Ergonomics, University of Nottingham, Nottingham NG7 2RD, UK

Available online 20 October 2005
Abstract

There is a need for an assessment tool which reliably distinguishes levels of participant performance in virtual environments (VEs) built within virtual reality (VR) systems. Such screening might be of potential users amongst a company's staff, or might be carried out by human factors experimenters prior to the start of experiments in order to provide a baseline of participant competences. The Nottingham Tool for Assessment of Interaction in Virtual Environments (NAÏVE) comprises a set of VE tasks and related tests, with appropriate performance criterion levels, covering the main aspects of navigation (viewpoint) control and object manipulation and operation. Trials with test participants enabled performance levels to be set to distinguish good, adequate and poor performers, and tests to be distinguished according to whether performance in the general population is evenly spread or is skewed towards success or failure.

© 2005 Published by Elsevier Ltd.
1. Introduction

At one of the first ever meetings on virtual reality (VR) technology, and on the virtual environments (VEs) that are experienced using VR, Bricken (1991) identified the paucity of users as a problem facing VR growth. By this he meant both that researchers needed more real users in order to better understand VR/VE participation, and also that greater take-up in practical applications would only happen if potential users could observe real use by other early adopters of the technology. We have also noted this problem in two particular contexts: first, in experimental tests of VR interfaces, assessing participant behaviour and performance and the (side) effects of VR/VE on participants; and second, in assessing the readiness of potential users to work in real settings. VEs have fascinated human scientists (ergonomists, psychologists, etc.) from the start, initially perhaps more than they did many computer scientists (Wilson, 2003).
They seem to offer the opportunity to test almost all aspects of human behaviour that are found in life in general, and to assess a new set of phenomena as well. They provide a richer "experimental laboratory" than do graphical user interfaces (GUIs), with what seem to be hundreds of possible dependent and independent variables. Research has been carried out into many different factors of usability, and a variety of methods and metrics have been used (Bowman et al., 2002). Equally, there is a substantial literature on how VE participation impacts on visual strain, instability, sickness and other side effects (e.g. Stanney and Hash, 1998; Cobb et al., 1999; Draper et al., 2001; Nichols and Patel, 2002). One of the main problems for much of the experimental work underpinning understanding of VE interactions, usability and side effects concerns the level of expertise of experimental participants. How can we assess navigation in a VE if some participants are struggling to orient themselves in the VE, or to use the input and control devices? Is there an interaction between ratings of feelings of presence and difficulties in viewpoint control? Do problems in understanding the interface increase or reduce feelings of sickness? Much experimental work has allowed naïve participants some form of familiarisation phase,
sometimes with a test at the end. Lindeman et al. (2001) allowed users as many practice trials as they needed, and instructed them to indicate when they could conduct the task quickly and accurately. Pausch et al. (1997), in an experiment which focused on locating a specific letter on various walls, allowed users to carry out practice runs until they felt comfortable. It could therefore be of value to have a structured set of standardised performance tests, to ensure that all participants in a trial are approximately equal in skill and level of understanding. A second and different need is for a screening tool for actual VE use. Organisations that might use VR/VE (a company, hospital, local authority or school, for instance) want to know whether a manufacturing engineer, surgeon, member of the public or student has sufficient base-level experience and skill to get the most out of the VE, whether this is for process layout, laparoscopic surgery, new housing visualisation or a physics lesson. Within this, training applications of VEs really do require some means of knowing, and equalising if possible, participants' entry-level skills and performance, to allow better assessment of training transfer. This paper describes a tool developed to meet these needs of screening experimental participants and assessing potential user competence: the Nottingham Tool for Assessment of Interaction in Virtual Environments (NAÏVE). It has been developed in order to aid identification of participants who are (or would be) experiencing difficulties in participating in a VE, and also to identify what types of task appear to create the greatest difficulty for new (and not so new) users. The work on NAÏVE had begun when the IST project VIEW of the Future (www.view.iao.fhg.de) started, and so is based on less sophisticated VR systems than the VIEW developments, but the general principles are highly relevant. The only well-known work related to NAÏVE is the VE Performance Assessment Battery (VEPAB) (Lampton et al., 1994), which measured and evaluated human performance within VEs. VEPAB was a relatively thorough test, but the tasks within it might be regarded as rather limited and as not going far enough to genuinely test the participant. The tests were also discrete within the testing environment, with participants told when and where to execute the tasks. We wanted to allow more freedom of use whilst maintaining a structure to NAÏVE, so that actions can be related to specific circumstances. The paper opens by examining the content needs of a VE user screening tool. NAÏVE is built within a large VE, and the choices for, and development of, this are discussed next. Then results are reported for a first test of NAÏVE with 40 users, particularly emphasising those tests that distinguish between levels of user on a consistent basis.

2. Basic components of NAÏVE

The screening tool that was to be developed had to be robust and flexible enough to accommodate as many types
of participant as possible. It should also contain as many basic participant tasks as needed to establish the readiness of VE participants to carry out most core activities when using a VR system. The tasks had to be relatively generic. One possibility was to adapt existing VEs by incorporating tests within the environment and interface. However, in order to include tests on a rational basis, it was decided to build a new VE in the light of previously defined appropriate test tasks. By first deciding what tests were required to distinguish between levels of ability of participants, we were better able to build the environments around the tests, instead of the other way around, thus creating a more coherent series of environments and tests. Also, rather than a series of discrete tasks, the tests were mostly integrated into a single large and complex VE that could provide a form of narration, to enhance participant interest and involvement and give a coherent feel to the whole screening tool. We divided types of interaction within the test VE into two: navigation interaction, controlling the viewpoint to move around the VE; and object interaction, using the interface to pick up, move or activate virtual objects (see also Kaur et al. (1999) and Bowman et al. (2001), who define different types of interaction).

2.1. Navigation interaction

There are two fundamental ways to navigate around VEs. The first involves being constrained by gravity: positioned on the ground, walking or riding in a vehicle. The second method allows participants to "fly", unconstrained by having to be in contact with the ground. Tests of navigating while grounded should contain actions such as moving forwards in a straight line and navigating around corners; even these simple tasks may be difficult for a new VE participant. A participant may also need to move backwards in a straight line, and to rotate through a 180° arc. Therefore, for simple navigation, the set of tests included:
- Cornering
- 180° rotation
- Forward motion
- Backward motion.
Potential users also have to make more complex manoeuvres, for instance to walk through doorways, requiring correct judgement of the distances between the viewpoint and the doorway. Observation has shown that VE users frequently collide with doorframes, hatches or other apertures, which in turn leads to frustration and annoyance at the VE in general. Any such test could also relate skills to other elements of VE interaction, such as navigating down an especially narrow or non-regular passage, for instance in a submarine or in mines. The other aspect of complex navigation to be considered is the ability to navigate onto a specific destination. This would be needed should use of a VE require a user to be at a
certain location, for example entering a small elevator or arriving at a machine operation console. The tests for complex navigation therefore included:
- Navigating through doorways
- Navigating onto a specific destination.
When the user wishes to move vertically, to enter a "flying" mode, there are two methods of achieving this. The first is to rotate the view to the direction required and continue to move forward, but now along a diagonal vector, thereby achieving either an upward or downward trajectory. The other method is to keep the angle of the viewpoint constant and simply raise or lower it (a small vector sketch of these two methods follows the list below). Three basic tasks can represent such performance: circumnavigating a single object, navigating in between objects so as to force up and down movements, and travelling on a dedicated vertical pathway. The tests for "flying" navigation were therefore:
- A relatively simple flight path that tested the user's ability to navigate while flying.
- A more complex path that entailed moving in between some objects.
- Flying through doorways.
- A standard vertical test where the user must move vertically up and down.
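The two vertical-movement methods described above can be pictured with a little vector arithmetic. The following is a purely illustrative sketch, assuming a simple heading/pitch viewpoint model rather than the particular VR toolkit used here; all function and parameter names are our own.

```python
import math

def fly_by_pitch(pos, heading_deg, pitch_deg, speed, dt):
    """Method 1: tilt the view and keep moving forward, so the forward
    vector acquires a vertical component (a diagonal climb or dive)."""
    x, y, z = pos
    h, p = math.radians(heading_deg), math.radians(pitch_deg)
    x += speed * dt * math.cos(p) * math.cos(h)
    y += speed * dt * math.cos(p) * math.sin(h)
    z += speed * dt * math.sin(p)      # the vertical change comes from the pitch angle
    return (x, y, z)

def fly_by_elevation(pos, climb_rate, dt):
    """Method 2: keep the viewing angle constant and simply raise or
    lower the viewpoint along the vertical axis."""
    x, y, z = pos
    return (x, y, z + climb_rate * dt)

# One second of movement: pitching 30 degrees up while moving forward at 2 m/s,
# versus a straight vertical translation at 2 m/s with the view left level.
print(fly_by_pitch((0.0, 0.0, 0.0), heading_deg=0, pitch_deg=30, speed=2.0, dt=1.0))
print(fly_by_elevation((0.0, 0.0, 0.0), climb_rate=2.0, dt=1.0))
```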
2.2. Object interaction

As in navigation, object interaction has two distinct subsets of actions. The first involves activating an object, which immediately causes an action somewhere else in the VE: for example, operating a switch that turns on a light, or powering up a piece of machinery that sets an operation in motion. In these examples the user is only responsible for the initial operation. The other form of object interaction involves manipulation (moving and repositioning an object), where the user has to be constantly in touch with the object, for example picking up a virtual container with a virtual fork-lift truck and then moving it to another location. Unlike object operation, here the user becomes responsible for the object and must stay with it until the correct location is reached, or else decide to drop it. Object interaction through operation was tested by requiring activation of wheel and slider-bar elements in the interface. For manipulation, the tests were to grab virtual objects by positioning and locking on an interface element, and then to manoeuvre and rotate the object and move it to a new location.

2.3. Complex object interaction and navigation

We also needed a test that considered object interaction and navigation together. This can involve the user clicking on icons in an external interface to guide an object along a specified path and into a particular location. A requirement of such an application may be minimising object collisions. The test has the potential to identify how good a user is at judging distances between objects. How the user copes with guiding the object around corners gives a good comparison between those who feel unsure and nudge it around and those who feel confident and carry out the movement in a swift, continuous manner. Giving the user the ability to modify the speed of the object also gives an indication of how confident they are in their abilities. Those who feel uneasy may move it around slowly, ensuring that collisions are minimal, whilst those more confident may feel capable of correctly judging the distance between object and wall and be able to move the object at a much quicker and smoother pace. This test may also give us an insight into an individual's sense of orientation within a VE: if a user is capable of moving an object along a continuous path whilst ensuring it is constantly in view through viewpoint control, then we may assume the user is capable of good judgement when it comes to orientation. Fig. 1 summarises the types of test that were planned for the user to carry out in NAÏVE.

Fig. 1. Tests for the Nottingham assessment tool for interaction in virtual environments.

3. Designing the test virtual environments

3.1. Basis of development

To incorporate some form of narration, relatively complex scenarios were defined in order to draw the user into the virtual worlds, thereby increasing feelings of immersion and motivation. A number of possible VEs were considered as potentially suitable for the basis of a VR screening tool. We wanted the tasks and tests to blend seamlessly together, so that the user would think they were part of a 'normal' VE activity in
such a way that the user would not actually know they were being tested, but would feel they were merely exploring a new computer interface. A maze VE would be very simple to construct and, although limited in scope, would provide a good way to test some forms of navigation. A game scenario could incorporate a whole variety of interactions, cover most of the tasks and instil a sense of competition in the users. The eventual decision was for the NAÏVE tool to be made up of:
- An initial training environment (for control device use and familiarisation with functions).
- An introduction VE.
- A maze.
- A game with a hockey puck.
- A representation of the International Space Station (ISS).

Design of the VEs was carried out through a structured methodology (see Wilson et al., 2002) and with reference to human–computer interaction design guidelines (e.g. Dix et al., 1998). A user-centred design approach was employed throughout the entire process, with storyboarding, task analysis, cognitive walkthroughs and expert testing using heuristics. The software used to create the VEs was Superscape V5.6, running under Windows 98 on a PII 450 with 128 MB of RAM and a GeForce 2 graphics card. Although this was the low-cost VR system of choice for much university experimental work of a few years ago, it is no longer widely supported or used. Nonetheless, the tests and measures are generic and NAÏVE is designed to be portable, in principle if not in code, to other VR systems. When first versions had been built, 10 pilot participants, ranging from experts in the field of VR to complete novices, were used to ensure the environments had a high level of usability, employing observation and simple rating scales to assess this. Users were guided by the VE as to where they should proceed next, via a combination of visual and audio cues upon completion of a task.

3.2. Training and introduction virtual environment

In order to create a VE that had a sense of narration to it, all the information that the participant required came from the environment itself. After the initial training session, the participant was taken to the start of the tests. The training environment allowed the user to become familiar with navigating using the joystick by ensuring they completed all the viewpoint manoeuvres that would be required within the main VE (Fig. 2). The introduction was created as a "reception area", in which information was relayed via a virtual secretary (Fig. 3). The secretary instructed the users to proceed to a location that contained the test they were to undergo, indicated by a map to the right of the secretary. Once the user had arrived at the designated location, instructions were once again relayed to them from the environment. In order to make this process as simple and as familiar to the user as possible, a virtual PC with a simplified keyboard was built within the VE. While the user was navigating to the correct locations, the first part of the complex navigation test was conducted: by navigating through the doorways, the user completed this test without being aware that a test was actually taking place.

Fig. 2. Training VE.

Fig. 3. Virtual secretary.

3.3. The maze VE

Cornering in the maze VE needed no prompting, as the environment contained enough corners that required navigating, with certain areas set up to measure user performance automatically. In order to ensure the user carried out the 180° rotation and the backward navigation tests, it was necessary to prompt the user via a series of instruction signs located within the VE (Fig. 4). The yellow sign on the far wall was an instruction for the user to rotate 180°.
3.4. The puck VE

Participants had to navigate a small puck around a mini two-level maze system. Navigation and puck speed were controlled via an external interface, with icons selected by a mouse click (Fig. 5). There were three specific destinations the users had to reach: the first two, on the top level, were indicated by a flashing "1" and "2", and the third was the end of the maze on the second level. When a user reached destination "1" or "2", guidance cues (arrows) changed to indicate the direction required next. Should the user make contact with a wall, a sound indicated a collision.
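As a purely illustrative sketch of this interaction style (icon clicks steering the puck, with wall contacts counted as collisions), the toy maze layout and all names below are our own assumptions, not the Superscape implementation used in the study:

```python
# Walls of a toy single-level maze, as a set of blocked grid cells (illustrative only).
WALLS = {(1, 0), (1, 1), (3, 2)}

class PuckTask:
    """Minimal model of the puck test: icon clicks move the puck one cell at a
    time; moves into a wall are rejected and counted as collisions."""

    MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, start=(0, 0)):
        self.pos = start
        self.collisions = 0

    def click(self, icon: str) -> None:
        dx, dy = self.MOVES[icon]
        target = (self.pos[0] + dx, self.pos[1] + dy)
        if target in WALLS:
            self.collisions += 1   # in the VE this is where the collision sound played
        else:
            self.pos = target

task = PuckTask()
for icon in ["right", "up", "up", "right"]:
    task.click(icon)
print(task.pos, task.collisions)
```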
3.5. The International Space Station (ISS) virtual environment

The construction of the ISS VE was a complex process and needed to be done in a structured manner. The first part was to get the user to the actual space station, using a "Star Trek"-style transporter effect, which it was hoped would involve and engage the user. To activate the transporter the user underwent two 'hidden' tests: navigating onto a yellow disk (completing part of the "specific destination" test) whilst pulling levers (part of the "object operation" tests). Once the transporter effect was over, the user found themselves outside the International Space Station. In order to generate a feeling that the user was flying, a representation of the Earth was added to provide a reference point, instil a sense of location and serve as a ground plane to aid vertical-axis awareness and user orientation (Murta, 1995). Fig. 6 shows the completed external model of the ISS, whilst Fig. 7 shows the internal structure. The user's view was placed within a form of enclosure, to represent the inside of a helmet. The user was able to click on a button to activate information required for the experiment (Fig. 8). When clicking on a particular module, information regarding that module was given to the user. By clicking on the space shuttle or one of the solar sails, it was possible to view a video of either a shuttle launch or the solar sails rotating to face the sun continuously.
Fig. 4. Scene from the maze VE.

Fig. 6. Completed ISS VE (external modules).
Fig. 5. Puck VE.
Fig. 7. Internal view of the ISS.
Fig. 8. View of helmet and VE information.
To carry out the object manipulation test, a separate model of the ISS was constructed with some of the modules missing, indicated by two arrows. Clicking on an arrow brings up a list of the hidden modules; making the correct selection renders the object visible, and the participant is required to grab it and position it in the correct location. The other tests in this section of the ISS are the two free-navigation tests: path 1 entails the user circumnavigating the ISS, whilst path 2 consists of the user navigating in between the modules. Both paths were shown to the user by the VE beforehand. In order to complete the vertical navigation test, the user is required to navigate through the modules to reach the test location. In doing this, another 'hidden' complex navigation test is undertaken: successfully navigating through the module doorways. The vertical navigation test itself involved the user navigating a set path in a centrifuge accommodation module. All environments were evaluated through a combination of expert walkthrough (by the lead author and another VR expert) and exploratory trials with an expert and a complete novice to VR. Testing revealed some problems with the first version of NAÏVE. These included navigation speed for rotation, frame rates that resulted in higher lags than desirable, some objects being difficult to grab no matter how close the user was, and several areas where collision boundaries could be crossed if movement was erratic. Each issue was addressed, and the VE was redesigned and re-tested.
4. Performance measures and criteria for success

Performance was assessed via a set of measures of time and/or accuracy for each test. When the user navigated around a corner, rotated through 180° and completed forward/backward navigation, start and end points for timings were identified with the aid of another expert in VR, whilst accuracy was measured in terms of where the user ended up and how the actual manoeuvring was conducted (e.g. smooth or erratic transition; Fig. 9).

Fig. 9. Example of user cornering.

In order to assess performance on the forward and backward navigation tests, a grid system was implemented (invisibly) within the interface, whereby each square of the grid was activated when moved through, to give a representation of the user's movement. As can be seen in Fig. 10, the user represented by "X"s navigated accurately, whilst the user depicted by the "O"s had problems.

Fig. 10. Results of hypothetical path ("X" represents good navigation, "O" represents poor navigation).

When assessing complex navigation, walking through doorways could be measured in a binary manner: the user either walked through the door cleanly or collided with the door and/or doorframe. For the second part of the test (manoeuvring onto a specific destination) the system again measured the time taken, with an additional measure of the number of attempts the user needed to navigate onto the pad successfully.

The task for the first free navigation test involved circumnavigating the ISS, with time and accuracy once again used as measures. This was done using a series of invisible objects which registered a collision should the user pass through one, with output similar to that of Fig. 10. The second free navigation test also measured the time taken, but additionally looked at how accurate the user was by recording the number of times a collision was made with one of the modules. Measures of time and of the number of mouse clicks were taken for the object operation and object manipulation tests. A user who was accurate would be able to grasp the object and manoeuvre it in one attempt, whilst a less able user would often "drop" the object and have to grasp it again. Measures for each test are summarised in Table 1.

In order to create criterion levels, a series of tests was run with 10 participants of mixed ability. Results from this were then used to create upper control limits (UCLs) for the time and accuracy measures. The idea of an upper control limit was that it should represent a cut-off between unacceptable and acceptable performance, not the absolute best performance achievable. The limits should give us criterion levels for each test that are achieved by the most skilled users, attainable with effort by the average, and not achievable by poor performers. Levels chosen are shown in Tables 2–7. The key is to avoid either floor or ceiling effects in the tests, so that we do not have a situation where all participants will pass or all fail each trial.
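To give a concrete picture of the grid-based accuracy measure described above, the following minimal sketch (in Python; the function names, cell size and scoring rule are illustrative assumptions, not the Superscape code used in the study) records which invisible grid squares a sampled path activates and counts the squares lying off the ideal straight line:

```python
from typing import List, Tuple

def cells_visited(path: List[Tuple[float, float]], cell_size: float = 1.0) -> List[Tuple[int, int]]:
    """Map a sampled viewpoint path (x, y positions) onto invisible grid squares.

    Each square is 'activated' the first time the viewpoint enters it, giving a
    trace of the user's movement like the X/O plots of Fig. 10."""
    visited: List[Tuple[int, int]] = []
    for x, y in path:
        cell = (int(x // cell_size), int(y // cell_size))
        if not visited or visited[-1] != cell:
            visited.append(cell)
    return visited

def lateral_deviation(visited: List[Tuple[int, int]], target_column: int = 0) -> int:
    """Score a forward/backward test as the number of activated squares that
    lie off the ideal straight-line column of the grid."""
    return sum(1 for col, _row in visited if col != target_column)

# Example: a path that drifts sideways activates squares off the centre column.
path = [(0.2, r * 0.5) for r in range(10)] + [(1.4, 5.0), (1.6, 5.5)]
squares = cells_visited(path)
print(squares, lateral_deviation(squares))
```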
Table 1
Tests (with frequency) and performance measures used

- Cornering (×4): time taken to navigate the corner; accuracy measured how the user conducted the task and where they finished.
- 180° rotation (×4): time taken to complete the turn; accuracy measured how the user conducted the task and where they finished.
- Forward–backward navigation (×4): time taken to complete the task; accuracy measured which squares on the grid the user walked upon.
- Complex navigation (×4): time taken to navigate through the doorway; accuracy measured the number of collisions with the doorframe.
- "Flying" navigation, path 1: time taken to complete the path; accuracy measured which squares on the grid the user flew through.
- "Flying" navigation, path 2 (in between objects): time taken to complete the path; accuracy measured the number of times the user collided with objects.
- "Flying" vertical navigation (×4): time taken to complete the task; accuracy measured which squares on the grid the user flew through.
- Puck navigation: time taken to complete the task; accuracy measured how many times the user collided with the walls.
- Wheel rotation: time taken to complete the rotation; accuracy recorded the number of times the user "lost" the wheel and had to re-grab it.
- Slider-bar operation (×2): time taken to complete the bar movement; accuracy recorded the number of times the user "lost" the slider and had to redo it.
- Object manipulation (×8): time taken to move the module; accuracy recorded the number of mouse clicks required.

Table 2
UCL measures for standard navigation

- Cornering time (s): 5
- Cornering (DFR): 11
- 180° time (s): 6
- 180° (DFR): 11
- Forward time (s): 6
- Backward time (s): 6

Table 3
UCL measures for complex navigation

- Doorways: no UCL (scored either 0 or 1)
- Specific destination time (s): 5

Table 4
UCL measures for free navigation

- Path 1 time (s): 36
- Path 2 time (s): 58
- Path 2 accuracy (collisions): 11
- Vertical method 1 (s): 8
- Vertical method 2 (s): 7

Table 5
UCL for object operation

- Slider time (s): 5
- Slider (clicks): 2
- Wheel time (s): 4
- Wheel (clicks): 3
Table 6
UCL measures for the time taken to manoeuvre and click on objects (per ISS module: time, clicks)

- European module: 7, 4
- US module: 9, 3
- Soyuz: 9, 4
- Solar sail: 11, 5
- Radiator: 7, 2
- CAM: 6, 3
- Habitation module: 5, 3
- Thermal panels: 9, 4

Table 7
UCL measures for complex object navigation

- Time (s): 234
- Collisions: 12
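To illustrate how the UCL criterion levels would be applied, the following minimal sketch uses the UCL values from Table 2; the adequate/poor boundary (poor_factor) is a stand-in assumption for the sketch only, since in the study that cut-off was set by expert inspection (see Section 5):

```python
# UCLs for the standard-navigation measures, taken from Table 2.
STANDARD_NAV_UCL = {
    "cornering_time_s": 5,
    "cornering_dfr": 11,
    "rotation_180_time_s": 6,
    "rotation_180_dfr": 11,
    "forward_time_s": 6,
    "backward_time_s": 6,
}

def rate_measure(value: float, ucl: float, poor_factor: float = 2.0) -> str:
    """Rate a single measure against its upper control limit (UCL).

    At or below the UCL counts as 'good'. The boundary between 'adequate' and
    'poor' was set by expert inspection in the study; poor_factor is purely an
    illustrative stand-in here."""
    if value <= ucl:
        return "good"
    if value <= poor_factor * ucl:
        return "adequate"
    return "poor"

# Example: a participant who cornered in 4.2 s but took 14 s to rotate 180 degrees.
ratings = {
    "cornering_time_s": rate_measure(4.2, STANDARD_NAV_UCL["cornering_time_s"]),
    "rotation_180_time_s": rate_measure(14.0, STANDARD_NAV_UCL["rotation_180_time_s"]),
}
print(ratings)  # {'cornering_time_s': 'good', 'rotation_180_time_s': 'poor'}
```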
5. Experiment to test and refine NAÏVE

In order to make a screening tool, NAÏVE, we needed to know which subset of the tests described above could be used to reliably distinguish poor, medium and high performing participants. Forty paid volunteer participants took part in an experiment: 21 male and 19 female, between the ages of 18 and 53. None had previous experience of using VR systems, although some did have a very limited knowledge of VR. Most had an interest in computing technology and computer literacy was high. The participants were paid £10 for their time in conducting the experiment. Participant 35 failed to complete the experiment. The hardware used to run the experiments was the same as that used to build the environments. The participants navigated using a Microsoft Sidewinder Pro 2 joystick and interacted with objects using a standard Microsoft mouse. An Avid splitter allowed an additional signal to be output from the PC into a "quad splitter", which also received a signal from a Sony digital video camera. This enabled the video to record both what happened in the VE and the behaviour of the users themselves, and served as a method of capturing user reactions to areas within the VEs. The VE was administered in versions with high and low levels of detail, to enable a different experimental analysis not reported here (see Griffiths, 2001). Questionnaires were administered before the experiment to collect information regarding the participants' views of VR, computers in general and their immersive tendencies. The VR Attitudes Questionnaire (VEDAC) was adapted from Nichols (1999), the Immersive Tendencies Questionnaire (ITQ) from Witmer and Singer (1998) and the Computer Attitudes Scale Questionnaire (CAS) from Lloyd and Gressard (1984). Upon arriving, the participants filled out the questionnaires and were then instructed to begin. As the VEs contained instructions on how to proceed, little contact with the experimenter was required during the experiment. In addition, at various key points in the VE the participants were presented with simple numerical rating scales of usability and presence via displays within the VE itself, and could rate
their experience of those attributes in that section of the VE. Results from these and other scales are not reported here but may be found in Griffiths (2001).

The participants displayed very different capabilities when navigating and interacting within the VE. Some users were good at navigation, whilst others were better at object interaction; few were competent at both. Because at this stage we were less interested in times or frequency counts of errors for each individual test, and more in overall performance assessment, we needed a way to quickly and easily visualise, summarise and understand the overall results. Subsequently, a subset of the tests was to be identified which represented a broad spectrum of performance across participants, so that it could be used to reliably and robustly identify levels of individual user performance.

First of all, the results for all 40 participants on each test were classified as good, adequate or poor (coded green, blue and red, respectively). The cut-off between good and adequate was the upper control limit criterion level described above. The cut-off between adequate and poor was set on the basis of an inspection of all the results by a panel of VR experts (developers and human factors specialists) convened by the first author; this cut-off represented any natural break in the data as well as being complementary to where the good/adequate boundary lay. A large matrix was constructed with a column for each test, and in each column the results for participants were placed in order of decreasing performance. Thus all green-coded performances appeared at the top of each column, although these comprised different sets of participants, in different numbers, for each test. As an illustration, a part of the matrix showing 30 participants for six tests appears in Fig. 11.

Because NAÏVE was designed to be practically and quickly usable by other industrial groups or by other experimental researchers, we wished to reduce the screening test to a core subset of the most useful tasks/tests, selected so that they reliably distinguish between user performances. There had to be enough tasks to cover all aspects of performance, but not so many that NAÏVE becomes hard to administer. It was important that the selected tasks did not duplicate each other too much. Some of the tasks should extend even the better users, whilst other tasks should allow almost all users the potential to complete them, even if poorly. Finally, the tasks/tests should show variable performance across any workforce or test sample. Table 8 shows the tasks that were identified from the matrix as providing the best spread of performances,
chosen after close inspection by the authors and discussion with colleagues. Unsurprisingly, navigation seemed to provide the most problematic tasks for a proportion of users new to VR, and so navigation tests comprise the majority of tests in this first core version of NAÏVE. With this task set we were able to discriminate user performance and identify users who would or would not experience problems within a VE. To show this, 10 participants (five male, five female) were chosen at random from the group of 40 to examine their performance (Fig. 12). From this we can clearly see that participants 1 and 39 experienced considerable difficulties, participants 21 and 16 performed well, and participants 7, 17 and 19 showed variation in performance across tasks.

The full set of tasks represented a range of performances. A first classification of the tests that participants generally found harder or easier was into five test groups (see Table 9), as follows:

(A) Few poor, few adequate and many good level performances.
(B) Few good, few adequate and many poor level performances.
(C) Few good, few poor and many adequate level performances.
(D) Approximately equal numbers of good, adequate and poor level performances.
(E) Approximately equal numbers of good and adequate performances, and few poor level performances.

Fig. 11. Performance matrix section as example. Light shading represents good (green), medium shading is adequate (blue) and dark shading is poor (red) performance.

Fig. 12. Performance of 10 randomly selected participants for core NAÏVE (from high detail VE).

Table 8
Tasks which describe a spread of capabilities to form the core NAÏVE (task name: task category)

- Paths 1 & 2 (time): Free navigation
- Vertical move (accuracy): Free navigation
- Path 2 (collisions): Free navigation
- 180° (boundary): Standard navigation
- Backward movement (time): Standard navigation
- Cornering (time): Standard navigation
- Forward movement (accuracy): Standard navigation
- Complex object navigation (collisions): Complex object navigation
- Object manipulation (clicks required): Object manipulation
- Wheel (clicks + time): Object operation

Table 9
Performance spectrum for tasks (task name: category A–E)

- Path 2 collision: A
- Cornering (boundary): A
- Slider: A
- Doorways (flying): A
- Vertical movement: B
- Doorways (walking): B
- Paths 1 & 2: B
- Wheel: B
- Object manipulation: C
- 180° (time): C
- Complex object navigation (time): C
- Backward navigation: C
- Cornering (time): C
- Forward navigation: C
- Vertical movement (accuracy): D
- 180° (boundary): D
- Backward navigation (accuracy): E
- Specific destination: E
- Forward navigation (accuracy): E
- Complex object navigation (collision): E
- Object manipulation (clicks required): E
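The A–E grouping is qualitative; purely as an illustration, it could be operationalised along the following lines. The 'few' and 'approximately equal' thresholds below are assumptions made for the sketch, not values used in the study:

```python
def classify_test(good: int, adequate: int, poor: int,
                  few_frac: float = 0.2, balance_tol: float = 0.15) -> str:
    """Assign a test to one of the distribution groups A-E described above.

    A level counts as 'few' if it holds less than few_frac of all results, and
    two levels count as 'approximately equal' if their shares differ by less
    than balance_tol (both thresholds are illustrative assumptions)."""
    total = good + adequate + poor
    g, a, p = good / total, adequate / total, poor / total

    def few(x): return x < few_frac
    def similar(x, y): return abs(x - y) < balance_tol

    if few(p) and few(a):
        return "A"   # mostly good performances
    if few(g) and few(a):
        return "B"   # mostly poor performances
    if few(g) and few(p):
        return "C"   # mostly adequate performances
    if similar(g, a) and similar(a, p):
        return "D"   # roughly even spread
    if similar(g, a) and few(p):
        return "E"   # good/adequate balanced, few poor
    return "unclassified"

# Example: 30 good, 6 adequate and 4 poor results would fall into group A.
print(classify_test(30, 6, 4))
```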
Fig. 13. Comparison of worst and best performers across the task spectrums (S1—worst, S16—best).
Thus, if we wish to use tests that would stretch all participants, we would use tests in groups B and C. To provide tests that should be passed by most participants (i.e. to weed out only very poor performers) we would use tests in group A and, to an extent, group E. If we want tests that should generally show a spectrum of results across a population, then those in group D should be used. We can also use the groups of tests to profile different participants (see Fig. 13, profiling the best and worst performers from the subset of participants represented in Fig. 12).

6. Discussion and conclusions

The NAÏVE screening programme was created as a set of VR/VE tasks and tests to reliably identify users who would potentially experience relative success or difficulty whilst in a VE. Along with assessing users' performance capabilities across a whole spectrum of VE tasks, it is also capable of identifying the areas of interaction that cause users the most problems, for example certain aspects of navigation. Following on from this, it would then be possible to selectively distinguish between groups of users, or to identify specific users who experienced the most difficulty and therefore need training or other support.

The first stage in creating NAÏVE was to propose, plan and build a large series of typical tasks and associated tests into a large space station VE. Criterion levels of poor, adequate and good performance were defined on the basis of observing a first sample of participants. Subsequently, the full set of tests was run with a main group of 40 participants, with the aim both of creating some normative data for the tests and of making decisions about which tests to retain in the core version of NAÏVE for general use. The original version took about 50 min to complete, which was thought to be unacceptable for most experimental or employee screening purposes. We consequently selected a subset of NAÏVE tests which did not have too great a floor or ceiling effect, gave a spectrum of performance across the group of 40, did not give a profile of results for any one test that differed too much from the rest of the tests in terms of successful or unsuccessful candidates, and covered a range of VE task types. The reduced core NAÏVE has 11 tests with 11 measures and takes about 10–20 min to complete. NAÏVE
can be used in its core version or in its full version in order to comparatively assess a number of potential VR/VE participants, or to find a profile of strengths or weaknesses for individual participants.

The work reported here has been carried out to fulfil two different needs: for a participant screening and base-line tool before human factors experiments with VEs; and for an employee screening tool when implementing VE-based work or taking on new staff. A further potential application of NAÏVE, which bridges the other two, is for assessment pre-training and post-training. In employment use we are NOT proposing NAÏVE as a yes/no selection tool; rather, its use is to reliably measure the level of performance of employees before they start work with VR/VE, and to determine where they might need training or use support.

References

Bowman, D.A., Kruijff, E., LaViola Jr., J.J., Poupyrev, I., 2001. An introduction to 3-D user interface design. Presence: Teleoperators and Virtual Environments 10 (1), 96–108.

Bowman, D.A., Gabbard, J.L., Hix, D., 2002. A survey of usability evaluation in virtual environments: classification and comparison of methods. Presence: Teleoperators and Virtual Environments 11 (4), 404–424.

Bricken, W., 1991. Virtual Reality: directions for growth. In: CR'91—Proceedings of the First Annual Conference on Virtual Reality, Olympia, London.

Cobb, S.V.G., Nichols, S., Ramsey, A., Wilson, J.R., 1999. Virtual Reality-Induced Symptoms and Effects (VRISE). Presence: Teleoperators and Virtual Environments 8 (2), 169–186.

Dix, A., Finlay, J., Abowd, G., Beale, R., 1998. Human–Computer Interaction. Prentice-Hall, Europe.

Draper, M.H., Viirre, E.S., Furness, T.A., Gawron, V.J., 2001. Effects of image scale and system time delay on simulator sickness within head-coupled virtual environments. Human Factors 43 (1), 129–146.

Griffiths, G., 2001. Virtual environments usability and user competence: the Nottingham Assessment of Interaction within Virtual Environments (NAÏVE) Tool. Ph.D. Thesis, School of Mechanical, Materials, Manufacturing Engineering and Management, University of Nottingham.

Kaur, K., Maiden, N., Sutcliffe, A., 1999. Interacting with virtual environments: an evaluation of a model of interaction. Interacting with Computers 11, 403–426.

Lampton, D.R., Knerr, B.W., Goldberg, S.I., Bliss, J.P., Moshell, J.M., Blau, B.S., 1994. The Virtual Environment Performance Assessment Battery (VEPAB): development and evaluation. Presence: Teleoperators and Virtual Environments 3 (2), 145–157.
Lindeman, R.W., Sibert, J.L., Templeman, J.N., 2001. The effect of 3D widget representation and simulated surface constraints on interaction in virtual environments. In: Proceedings of IEEE Virtual Reality 2001, pp. 141–148.

Lloyd, B., Gressard, C., 1984. Reliability and factorial validity of computer attitude scales. Educational and Psychological Measurement 44, 501–505.

Murta, A., 1995. Vertical axis awareness in 3D environments. Advanced Interfaces Group, Department of Computer Science, University of Manchester.

Nichols, S., 1999. Virtual Reality Induced Symptoms and Effects (VRISE): methodological and theoretical issues. Ph.D. Thesis, University of Nottingham.

Nichols, S., Patel, H., 2002. Health and safety implications of virtual reality: a review of empirical evidence. Applied Ergonomics 33, 251–271.
Pausch, R., Proffitt, D., Williams, G., 1997. Quantifying immersion in virtual reality. University of Virginia. Submitted to ACM SIGGRAPH 1997.

Stanney, K.M., Hash, P., 1998. Locus of user-initiated control in virtual environments: influences on cybersickness. Presence: Teleoperators and Virtual Environments 7 (5), 447–459.

Wilson, J.R., 2003. If VR has changed… then have its human factors? In: de Waard, D., Brookhuis, K.A., Breker, S.M., Verwey, W.B. (Eds.), Human Factors in the Age of Virtual Reality. Shaker Publishing, Maastricht, The Netherlands, pp. 9–30.

Wilson, J.R., Eastgate, R., D'Cruz, M.D., 2002. Structured development of virtual environments. In: Stanney, K. (Ed.), Handbook of Virtual Environments. Lawrence Erlbaum Associates, Hove.

Witmer, B.G., Singer, M., 1998. Measuring presence in virtual environments: a presence questionnaire. Presence: Teleoperators and Virtual Environments 7 (3), 225–240.