Computers Educ. Vol. 17, No. 1, pp. 49-60, 1991
0360-1315/91 $3.00 + 0.00 Pergamon Press plc
Printed in Great Britain
THE DESIGN AND EVALUATION OF A MULTIMEDIA AUTHORING SYSTEM
CLAIRE O’MALLEY,¹ MICHAEL BAKER² and MARK ELSOM-COOK³
¹Department of Psychology, University of Nottingham, University Park, Nottingham NG7 2RD, England, ²CNRS-IRPEACS, 93 Chemin des Mouilles, BP 167, Lyon-Ecully 69131, France and ³Institute of Educational Technology, The Open University, Walton Hall, Milton Keynes MK7 6AA, England
(Received 16 January 1991; accepted 13 February 1991)
Abstract-There are two issues in evaluating multimedia authoring environments such as SHIVA: the effectiveness of the support for authoring and the educational effectiveness of courses designed with the system. There are few evaluation methods which have been specifically designed for authoring systems. Our approach is to use a variety of methods. For the usability of the interface, a variant of Task-Action Grammar known as D-TAG was used to capture the display-oriented nature of the interaction, together with think-aloud protocols collected from members of the design team. Detailed observational studies with experienced authors using SHIVA addressed the functional model of authoring underlying the system. One unexpected finding was that experienced users developed a common library of “cliches” of logical sequences of material identifiable from their visual patterns. They were also able to parse flowcharts rapidly, even when unfamiliar with the teaching domain. The spatial layout enabled authors to create a visual course structure of concepts and frames, and the overall visual appearance of the concept network appeared to strongly affect authors’ predictions of the teaching sequence. One of the major problems with such an evaluation is scale. Meaningful evaluation requires realistic tasks for authoring courses entailing several hours of contact time, where real training needs have been identified, and realistic settings with professional authoring teams. Traditional task-analytic methods are too low-level, and we were unable to represent interface operations which appear at a higher level in the user’s task. A thorough evaluation of the system should ideally incorporate a number of methods, each addressing a different aspect of the system’s use and at different levels of granularity, preferably over a period of at least 2 yr.
INTRODUCTION
The last decade or two has seen the development of better interfaces to word processors, computer-aided design tools, accounting packages, desktop publishing systems, programming environments and other systems. However, improvements in tools for designing computer-based learning materials have not kept pace with these other developments. Barker’s recent review of authoring languages and systems describes tools which are still low-level and inflexible, with interfaces that suggest they are difficult to learn and use[1,2]. A review by Naughton is particularly damning [3]: “ . . . current authoring systems seem locked in a time-warp somewhere between the ‘garage programming’ of the 1970s and primitive ‘fourth-generation’ systems now slowly finding their way into the DP world”.
There are a few exceptions to this, but these tools are by no means widespread in use. Examples of better high-level authoring systems which make use of graphics and other media include Instructional Design Environment (IDE)[4], Best Course of Action, and Authorware (authoring tools available for the Apple Macintosh). These tools are improvements because they allow the courseware designer to operate at the level of the structure and content of the teaching material, rather than having to work at the level of code. This paper discusses a case study of the design and evaluation of an authoring environment, intended to provide high-level support for designers of multimedia learning materials. This case study is taken as a starting point for examining some of the more general issues involved in evaluating the effectiveness of authoring tools with respect both to the task of authoring and the educational effectiveness of resulting courseware. As such it is not primarily about the results obtained from the evaluation, but instead focuses upon the methodological issues entailed in designing an evaluation of an authoring environment.
The evaluation of authoring environments perhaps raises the dilemma of evaluation in one of its most extreme, but also possibly most relevant, forms. In evaluating an authoring tool for computer-based multimedia teaching, there are two evaluation problems: whether the system is effective and convenient for authors (evaluating the user interface for authors); and whether it leads to better systems being produced for the students. The latter is the more acute question, because it is indirect (poor learning might or might not be due to the authoring system); yet it is crucial, because it can be argued that improving authoring systems is a most important way of improving the quality of the teaching produced by CAL. Before discussing further the methodological issues raised by such an evaluation, we will outline the nature of the authoring environment which is to be evaluated.
SHIVA: A MULTIMEDIA AUTHORING ENVIRONMENT FOR ADAPTIVE TUTORING
SHIVA* is a multimedia authoring environment, running on PCs, and developed under the European DELTA programme. The system is designed to facilitate the work of teams of authors (programmers, media specialists, trainers and course designers). It enables authors to produce multimedia frames of material (graphics, video, text, sound, digitized images) and to link them together in terms of higher-level concepts in the domain (teaching objectives). On the basis of the network created between concepts and frames, SHIVA makes decisions on their presentation order at run time, depending on the student’s responses. The system is in this respect a tool for creating adaptive courseware as well as multimedia materials. Multimedia teaching materials can be created using a set of specialized media editors for colour graphics, digitized sound, static bitmap photographs, video and text editing. These media scenes can be composed into multimedia sequences using a specialized multimedia editor. A visual flowchart editor allows the author to create graphically a branching sequence of such frames of material. At a higher level in the course structure, authors can create a concept map, consisting of the high-level objectives and structure of the course, which may be linked to frame modules created in the flowchart editor, via a graphical interface. Once links have been created between frames and concepts or teaching objectives, the system makes adaptive decisions on their presentation order to the student. These decisions are based on a set of teaching rules which operate on a dynamic student model, a teaching dialogue history and a discourse model which controls dialogue focus shifts in the interaction with the student (based on the concept network). The author can use the simulation tools provided in the student environment to view the course and anticipate the sequence to be presented. 
The author can subsequently make changes to the course structure, using a set of dynamic debugging tools. The multimedia adaptive courseware created with SHIVA is intended to present a rich training environment for learners, in terms of the variety of media and adaptive sequencing of learning materials. In addition, the system supports the design of a range of interaction styles for use with the courseware. Learners may choose to interact with the courseware under guidance, or may access a menu system for free browsing. Facilities are also available for providing a hypertext style of interaction.
The functional architecture of SHIVA
SHIVA is the result of combining two existing systems with complementary functionalities: ORGUE (developed by the Computing and Training Department, CNRS-IRPEACS, France), a graphical courseware development tool and ECAL (developed by Mark Elsom-Cook, Open University, U.K.), an “intelligent” authoring tool which provides adaptive instruction. In addition, SHIVA also combines other specialist tools for creating graphics, digitized images, video and sound.
*SHIVA was developed under the European DELTA programme, project No. 1010, “Advanced Authoring Tools”. Members of the project team include: Mark Elsom-Cook (Open University, U.K.); Michael Baker, Christian Bessiere, Michel Giry, Jean-Louis Leonhardt, Romain Zeiliger (CNRS-IRPEACS, France); Peter Bibby, Claire O’Malley (Nottingham University, U.K.); Patrick Maddelena (APIGRAPH, France); Claudia Scaroni (DATAMAT, Italy); Paul Byerley (SEL, Germany).
SHIVA’s multimedia editors. ORGUE allows the author to combine frames of material created with the specialist editors into a program which specifies the flow of control and the interaction with the student. The author is not required to do any low-level programming, but can link material together graphically. (There are also facilities for linking in specialist programs written in C or Pascal.) The system has three main windows (see Fig. 1). One shows the current flowchart representation, another shows the entire course structure in miniature as a tree-like structure, whilst a third window is used to display a reduced view of the current frame, as the student would see it. Interaction with the system is via mouse and menu commands. Flowcharts may also be nested within one another, reducing the complexity of the visual display. The materials making up the frames which ORGUE uses are created with a set of specialist editors for the different media. MINIGR is a graphical colour editor. The metaphor underlying design in MINIGR is of a set of transparencies which can be overlaid on top of each other, allowing for the creation of dynamic visual scenes. It can also import drawings or digitized images from other editors, produce graphical animation, and import and synchronise digitized video. DIESE is a text editor which is compatible with the graphics created by MINIGR. It also provides limited translation facilities. (This facility is important since SHIVA is being developed within a European learning context.) PICCOLO is an editor for digitized static images, which allows the combination of photographs with text and graphics. SAXO is a digital sound editor, allowing the recording, modification and retrieval of sound at various sampling rates. Once sounds have been digitized they are represented graphically (as spectrographs) and can be manipulated directly, by cutting, pasting and copying sections of the graph.
Fig. 1. This figure shows a mock-up of an ORGUE screen. The large window on the left of the figure shows the current flowchart representation for part of a course. The window on the top right shows a tree-like structure, in miniature, which represents the whole course. The window on the bottom left shows a reduced view of the current frame, as the student would see it.
Finally, SIMENU is an interface design tool which allows the author to create the learner-machine dialogue in the form of menus, control panels, dialogue boxes, etc.
SHIVA’s adaptive tutoring capabilities. Whilst the editors described above support the development of state-of-the-art multimedia courseware, they are unable to provide adaptive instruction. In SHIVA, this is achieved by incorporating a version of the ECAL authoring system[5], which uses simple techniques from artificial intelligence (AI) to represent knowledge of the high-level structure of a course, and to model the learner’s knowledge as a subset of that representation. ECAL stands for “extended computer assisted learning”. The system was intended as an extension of existing CAL systems, allowing them to achieve adaptive instruction without the complexity of a full-blown intelligent tutoring system. ECAL differs from traditional CAL authoring systems in that the specification of course structure and the creation of teaching materials are separated from the flow of control which determines the teaching sequence. The advantages of separating these aspects of courseware development lie in increased ease of modification and support for iterative prototyping of courseware. Traditional systems, in contrast, force the author to consider the content of the course, the structure of the material and the sequence of presentation concurrently. This lack of modularity means that any subsequent changes in content or structure cause interactions whose effects are difficult to predict. The system is based upon an adapted version of a model of curriculum design developed by Posner and Rudnitsky[16]. The author creates a set of concepts to be learned (“intended learning outcomes”, in Posner and Rudnitsky’s terms), and a separate set of course materials in the form of “frames” (screens of text, graphics etc.).
The author then links the frames to the concepts via keywords, which indicate the content of the frames with respect to the course structure. ECAL then creates a presentation order for the material, based upon the structure implicit in the connections between frames and concepts. The concept-frame network created by the author produces a knowledge representation which is used by the presentation system. This controls the interaction with the student. The presentation system also makes use of a dialogue model and a model of the student to generate a course sequence. The dialogue model is based upon concepts such as “focus spaces”[7] and ensures that the system presents material which is relevant to the current topic of discourse. The student model is a modified form of a numeric overlay model, explained in further detail below.
Knowledge representation. ECAL’s knowledge representation is intended to be used by people with little programming experience. The author provides a sequence of keywords associated with each frame (and for diagnostic frames, keywords associated with each response). These keywords constitute the concepts or intended learning outcomes around which the course is structured. The system can then take decisions about sequencing of material based on these keywords. Concepts are identified as being “connected” or not, depending on whether two keywords are used together in describing a particular frame. If the keywords are used together in more than one frame the strength of their connectedness is correspondingly greater. ECAL produces a “connectedness matrix” from this information. The system uses a clustering algorithm to generate a set of relationships between the concepts which reflects the strength of their relationships to all other concepts in the course. In addition to this relatedness, ECAL can also determine the relative “importance” and “generality” of concepts within the course structure.
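The connectedness calculation described above can be illustrated with a short Python sketch. It counts, for each pair of keywords, the number of frames in which both are used. The frame names, the keywords and the function name `connectedness_matrix` are hypothetical, and the sketch is only one plausible reading of the description; it does not reproduce ECAL's actual implementation, and the clustering step is left out.

```python
from collections import Counter
from itertools import combinations


def connectedness_matrix(frames):
    """Count keyword co-occurrences across frames.

    `frames` maps a frame name to the keywords (concepts) the author
    attached to it. The result maps each unordered keyword pair to the
    number of frames in which both keywords appear.
    """
    counts = Counter()
    for keywords in frames.values():
        # sorted(set(...)) removes duplicates and fixes the pair order,
        # so ("a", "b") and ("b", "a") are counted as the same pair.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical course fragment (not taken from the SHIVA studies).
frames = {
    "frame1": ["volcano", "magma"],
    "frame2": ["volcano", "magma", "plate"],
    "frame3": ["plate", "ocean"],
}
matrix = connectedness_matrix(frames)
print(matrix[("magma", "volcano")])  # 2: used together in two frames
print(matrix[("ocean", "volcano")])  # 0: never used together
```

Pairs are stored in alphabetical order, so a lookup should name the two keywords in that order. A pair that never co-occurs simply reads as zero, which corresponds to two concepts that are not connected.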
Importance is simply a measure of how much a concept is used. Generality relates to the scope or coverage of a concept. Two concepts may be of equal importance, but the one which is more widely distributed throughout the course is considered to be more general.
Student model. ECAL’s student model is extremely basic compared with the AI techniques used in most state-of-the-art intelligent tutoring systems. It uses a simplified version of overlay modelling. This assumes that the student’s knowledge of the domain can be viewed as a subset of the expert’s (as captured by the domain or knowledge representation). The modelling task is then to decide which parts of the knowledge representation are understood by the student. The teaching process involves identifying those parts in the model which are not understood and presenting information about them to the student. In ECAL, this type of approach has been modified to form a numeric overlay model which assigns confidence values to each concept represented in the system. The numeric value (ranging from 0 to 1) reflects the confidence of the system that the student understands the concept associated with it. These values are constantly updated by the system as frames are presented to the student. When the student is presented with a frame, the numeric values for the concepts associated with that frame are updated. If the frame is diagnostic (i.e. it presents a direct test of the student’s knowledge), the author specifies in addition a number of possible answers with associated keywords for each concept involved. In order to avoid the “credit assignment problem” (i.e. which concept to “blame” for an incorrect answer) the student model is updated according to a mechanism which takes into account both the behaviour of the student and the previous state of the student model. Changes in the value for a given concept are expressed as percentages of the previous value for that concept. The changes in the student model over time thus form a damped envelope. This means that, as the system gains more confidence that the student understands a particular concept, it becomes harder for the system to change its confidence about the student’s understanding.
Dialogue model. ECAL has a model of dialogue focus which corresponds to the set of goals (domain concepts to be learned) which the teaching dialogue is currently attempting to satisfy. There are three major components to the dialogue model: a dialogue history, a current focus and a set of goals (concepts) remaining to be satisfied. The dialogue focus (i.e. the concept which is currently in focus) is marked either as explicit, which indicates the current teaching goal, or implicit. Concepts in implicit focus are those which are also involved in frames which have been used to present the concept in explicit focus.
Presentation sequence. The presentation of frames is controlled by accessing the student model, dialogue model and knowledge representation.
The system first attempts to choose a frame which maintains the current focus of the dialogue. If this is not possible, it selects a new focus and chooses an appropriate frame for that focus. Given a particular focus, the system collects all the frames which mention that focus explicitly and which have not yet been presented. The frames are given an order of priority such that those which have maximum priority are those in which all other concepts (i.e. other than the current focus) are known, according to the student model. Secondary priority is given to frames where all concepts (other than that in current focus) are in implicit focus. Other frames must contain one or more concepts which are not in focus. These are ordered such that the one with the fewest “new” concepts has the highest priority. Change of focus initially involves promoting to explicit focus a concept which was previously in implicit focus. If this cannot be done, then there is a major change in focus. This is achieved by searching the set of concepts which have not yet been taught and promoting the concept which is mentioned in the maximum number of frames to be the current focus. Further details concerning these rules for choosing frames, controlling the dialogue focus and updating the student model and dialogue history are given in Elsom-Cook and O’Malley[7] and Elsom-Cook[6]. Debugging tools. In addition to creating a course structure by specifying intended learning outcomes or concepts and linking material to them, the author can also successively modify the structure created by ECAL. In this way the system supports iterative course development whilst reducing the complexity of control structure which the author has to deal with. (More details of these tools are available in Elsom-Cook and O’Malley[7].) Graphical interface. The functionality of ECAL has been integrated into the SHIVA system in the form of a graphical tool called PSAUME (see Fig. 2). 
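The frame-ordering rules given under “Presentation sequence” above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than ECAL's own code: the function name `rank_frames`, the `known_at` threshold for treating a concept as known, and the reading of a “new” concept as one that is not in implicit focus are all assumptions, since the text does not define them precisely.

```python
def rank_frames(frames, focus, implicit, confidence, presented, known_at=0.7):
    """Return candidate frame names for `focus`, best first.

    frames     : dict of frame name -> set of concepts linked to the frame
    focus      : concept currently in explicit focus
    implicit   : set of concepts currently in implicit focus
    confidence : dict of concept -> student-model value in [0, 1]
    presented  : set of frame names already shown
    known_at   : assumed threshold above which a concept counts as known
    """
    def sort_key(name):
        others = frames[name] - {focus}
        if all(confidence.get(c, 0.0) >= known_at for c in others):
            return (0, 0)                   # all other concepts are known
        if others <= implicit:
            return (1, 0)                   # all others in implicit focus
        return (2, len(others - implicit))  # fewest "new" concepts first

    # Only frames that mention the focus and have not yet been presented.
    candidates = [name for name, concepts in frames.items()
                  if focus in concepts and name not in presented]
    return sorted(candidates, key=sort_key)


# Hypothetical course fragment and student state.
frames = {
    "f3": {"volcano", "plate", "ocean"},
    "f2": {"volcano", "plate"},
    "f1": {"volcano", "magma"},
    "f4": {"plate", "ocean"},
    "f5": {"volcano", "magma"},
}
order = rank_frames(
    frames,
    focus="volcano",
    implicit={"plate"},
    confidence={"magma": 0.9, "plate": 0.2},
    presented={"f5"},
)
print(order)  # ['f1', 'f2', 'f3']
```

In this example f1 comes first because its only other concept is already known, f2 second because its other concept is in implicit focus, and f3 last because it introduces one concept that is not in focus. An empty result would correspond to the case in which the system has to change focus.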
The function of PSAUME is to allow the author to define a set of high-level concepts which constitute the learning goals of the course. As with other editors within SHIVA, the task is performed graphically. Authors can create concepts, name them and link them graphically to representations of the multimedia frames created with ORGUE. In order to get over problems of visual complexity with large numbers of frames and concepts, there are additional tools which allow authors to create “subviews” of the network in separate windows, to move concepts and frames around without destroying the network, and to highlight particular connections between frames and concepts. Authors can also view the material contained in particular frames from within PSAUME, and “run” parts of a course rather than just the whole course.
The result of creating concepts and linking them to frames in PSAUME is a “runnable” course, where the ECAL component in PSAUME makes adaptive decisions concerning the presentation ordering of frames, as a function of the student’s responses and the specific frame-concept links which the author has established. As described above, ECAL bases its decisions primarily on a simple student model and on a model of coherent shifts in the focus of the teaching dialogue, in terms of the high-level domain concepts. The confidence value of a concept taught is incremented when the student makes a correct response and when a frame to which it is linked is presented, and is decremented on incorrect responses. Coherent focus shifts are controlled via an implicit concept network which the system creates automatically from the frame-concept links established by the author: if two concepts are both linked to the same frame, then an implicit link is created between the two concepts. Given such a network, the system calculates matrices for relatedness, proximity, generality and connectedness between concepts, upon which a set of teaching decision rules can operate. Since the pedagogical decisions made by the system are based largely upon this network, the author can view the implicit network.
As described in more detail above, a complex set of rules is used by the system for controlling shifts to new concept foci in the teaching dialogue, and for deciding the order of presentation of frames linked to concepts. The rules for dialogue shifts operate in conjunction with rules for progressing to a new concept depending on the extent to which it is believed to be known to the student according to the student model. These rules can be examined by the author, using PSAUME’s debugging and simulation tools, at run time. For example, the author could run part of the course, pause it and find out why the system chose to present a particular frame of material, by examining the state of the student model, the dialogue history and the teaching rules. The course can then be modified by making changes to the appropriate links in the concept-frame network. These rules can either be accessed using the special debugging and simulation tools or directly from PSAUME’s concept-frame network representation. In this way, the author can examine the state of the system either with respect to particular frames of material, or with respect to the high-level structure of the course.
Fig. 2. This figure shows a mock-up of a PSAUME screen. The window at the forefront of the figure shows a partial view of the mapping between concepts and frames. The window behind that shows the full network of concepts. The third open window contains a list of parameters and rules underlying system behaviour which can be altered by the author. The icons at the top of the screen represent other “views” of the network structure.
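The way in which confidence values rise and fall, as described in this section, can be illustrated with a small Python sketch of a numeric overlay update. The exact update rule used by ECAL is not given here, so the rule below is an assumption: increases move the value a fixed fraction of the way towards 1, and decreases remove a fixed percentage of the previous value. The function name `update_confidence`, the `rate` parameter and its default are hypothetical. The sketch only captures the damping of increases as the value approaches 1.

```python
def update_confidence(confidence, concept, outcome, rate=0.2):
    """Update and return the confidence value for `concept`.

    confidence : dict of concept -> value in [0, 1], modified in place
    outcome    : "presented" or "correct" raise the value,
                 "incorrect" lowers it
    rate       : assumed fraction scaling each change (not from the paper)
    """
    value = confidence.get(concept, 0.0)
    if outcome in ("presented", "correct"):
        # Move a fixed fraction of the way towards 1, so that gains
        # shrink as the value approaches certainty.
        value += rate * (1.0 - value)
    elif outcome == "incorrect":
        # Remove a fixed percentage of the previous value.
        value -= rate * value
    else:
        raise ValueError(f"unknown outcome: {outcome!r}")
    confidence[concept] = value
    return value


# Hypothetical sequence of events for one concept.
model = {}
for event in ("presented", "correct", "incorrect"):
    update_confidence(model, "plate", event)
print(round(model["plate"], 3))  # 0.288
```

Because every change is a fraction of either the current value or the remaining distance to 1, the value always stays between 0 and 1 without any explicit clipping.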
THE EVALUATION OF SHIVA
There are two main issues in evaluating a multimedia authoring environment such as SHIVA: the effectiveness of the system in supporting the task of authoring and the educational effectiveness of courses resulting from the use of the system.
Support for authoring
SHIVA allows authors to design multimedia frames of teaching material and to specify the control structure of the course (as in ORGUE); in addition the system allows authors to represent high-level concepts involved in the course (using the PSAUME interface to ECAL). The intended advantages of the hybrid system embodied in SHIVA lie in a greater degree of adaptiveness of learning, and decreased production time for modifying courses. The model of authoring embodied in SHIVA needs to be understood in terms of the models of authoring embodied in the two main systems which underlie it: ORGUE and ECAL. For example, ORGUE was designed on the assumption that authors would begin with a complete specification of the course offline. Authors are recommended to produce a set of objectives which define the educational objectives of the course in terms of the subject matter, the target population in terms of prerequisite knowledge, and the pedagogical strategy to be used, to facilitate discussion between course team members and to provide a set of specifications which can later be evaluated[9]. In contrast, ECAL was designed to support these activities online. Having specified the details of the courseware, down to the level of each individual frame, as well as the control structure, the author can then proceed to design materials and sequences using ORGUE itself. Upon entering ORGUE, the author is provided with tools for creating empty frames and linking them together into a logical structure. In this respect, ORGUE allows for at least some form of separation between creation of material and specification of structure. In addition, since the logical structure of the course is represented graphically (in the form of a flowchart), this structure can also be relatively easily modified, without affecting the content of frames. A frame can contain one or more screens created with a set of multimedia graphical editors (i.e. MINIGR, SAXO, etc.).
Whereas in ORGUE each of these editors has to be accessed separately, in SHIVA there is a multimedia editor for creating and merging graphics, sound, digitized images and video. The major difference between ORGUE and ECAL is that in ECAL the author creates materials (frames) and a course structure (concepts or intended learning outcomes) but the system provides the initial sequencing of material (which can then be modified by the author). With ORGUE, the course structure is only implicitly represented in the sequence, which has to be specified by the author. The former provides support for the higher-level stages of courseware design, while the latter provides support for the creation of materials. By combining both types of approach in SHIVA, the author is provided with a more flexible method of authoring. A possible additional advantage of separating course structure from content in SHIVA is that the author is not only supported in structuring the course, but is positively encouraged to reflect on the course structure and change it if necessary. One might hypothesize that this could lead to improvements in the quality of courses produced with SHIVA over other authoring systems. The risk of investing all the time for course design and structuring prior to implementation is that any changes subsequently found to be necessary will be difficult and costly to make, and there is a danger that authors will be discouraged from making improvements. Several more specific evaluation issues arise from the general considerations outlined above: Separation of teaching materials and control structure. One major question which has concerned us in designing the evaluation of SHIVA is the extent to which authors are able to make a clear separation between the domain knowledge to be represented and the way in which it is to be used in a teaching interaction. With traditional types of authoring system (e.g. 
TENCORE), domain and teaching knowledge are combined into a single representation, requiring authors to simultaneously create subject materials and teaching strategy in designing the courseware. The problems are greater in authoring for tutoring systems which are adaptive and make decisions about teaching at run time. This issue has received little attention in the intelligent tutoring systems community
in the past, but there have been some recent exceptions[10-12]. For example Bierman [3] argues that, although intelligent tutoring systems require the separation of domain and teaching knowledge, it is not clear that this separation is enough to make them domain-independent authoring systems.
Representing the domain. An important question to be answered in the evaluation is: to what extent can authors make the domain knowledge for teaching sufficiently explicit? This question is similar to that raised by knowledge elicitation and acquisition for expert systems [13]. The difference in the case of authoring is that knowledge must not only be represented sufficiently clearly for use by the system, but must also be in a suitable form for educational purposes. A related question is, assuming authors can produce the appropriate representation of the domain, how easy is it to separate this representation from its use in a teaching sequence during the design stage? In terms of requiring the author to represent high-level concepts in a course explicitly, we have to consider the scale of course design at which this is possible or relevant. For example, concepts could be defined at the level of whole units or at a much smaller grain size. Our experience with one group of users in France, who were designing a course on technical English, suggests that the authors were easily able to define concepts at the level of whole units (1-2 h of contact time), but not very easily within a unit. However, this may be a function of the way in which particular course teams operate. For other authors and for other courses, meaningful units for representing as teaching objectives may be much smaller. A related issue concerns the size of a frame of material, where frame refers to the lowest level unit with which the ECAL system included in SHIVA can operate. The system has no “knowledge” of anything contained within a frame.
Frames can be of arbitrary size, from one screen of information to a whole sequence. Frames are composed of materials created via the multimedia editor(s) and linked via ORGUE. Within ORGUE itself a frame can contain several nested sequences. On the one hand, this allows for greater flexibility of the system. Authors can choose the extent to which they have control over specifying the teaching sequence. They can either “hard-wire” it in ORGUE, or have the system determine it via PSAUME/ECAL. This is especially important if video or sound are being used, since it becomes even less clear what constitutes a frame: it is no longer a single screen of text/graphics, as in traditional branching CAL systems. On the other hand, this can create problems for the student model, since its accuracy decreases with the size of materials inside a frame. This has been a problem in designing intelligent interactive video systems, for example.
Understanding SHIVA’s teaching decisions. Another question related to the separation of content from control structure concerns the ease with which authors can understand the teaching decisions made by the system, using the available debugging and simulation facilities. Do they agree with these decisions? If not, how easy is it for them to make changes and predict the effects?
Design vs implementation. Another difference between traditional methods of courseware design and the method intended to be supported by SHIVA is that, in the traditional method, there is a separation between design and implementation. Authors usually sketch out the design first on paper, specify it, and then implement it. SHIVA is intended to support the design stage as well as implementation, since the intention is to support a rapid prototyping style of design. Another question for evaluation, then, is the extent to which authors can and do use the tools throughout the courseware design cycle, and to what extent they still have to rely on “offline” methods of support.
Individual authors vs courseware teams. Although SHIVA is not specifically designed to support teams of authors, in that it doesn’t contain specific integrated communication and file transfer facilities, it was certainly designed with course teams as well as individual authors in mind. The different media editors reflect the common practice in computer-based training of having media specialists as part of the design team. In addition, there is often a person responsible for integrating different media. This function is supported by the special purpose multimedia editor.

Relationship between support for authoring and educational effectiveness
SHIVA is intended not merely to provide better support for the task of authoring, but also to improve the quality of resulting courseware. Evaluating this secondary aim obviously involves some measure of improvement in learning gains.
There are two aspects to SHIVA which suggest that it might result in improved courseware. One is that supporting authors in specifying high-level teaching objectives and making the structure of the domain explicit will encourage greater reflection on the content of the course. The second is that flexibility of the system will make it easier to make changes to the course and encourage iterative prototyping and testing before courses are released in their final form. Other features of SHIVA which relate to the student environment also suggest that the quality of learning might be improved. One such feature is the provision of several different styles of interaction, ranging from the typical highly structured CAL sequence to discovery learning styles supported by hypertext techniques. SHIVA also supports a mixed mode of interaction, incorporating the approach of “guided discovery” [14]. Another feature is the ease with which authors can integrate different media, encouraging the development of rich and highly interactive courseware.
Evaluation methods

There are few evaluation methods which have been specifically designed to evaluate authoring systems. One example is the method described by MacKnight and Balagopalan [12], for evaluating functionality, flexibility and productivity. However, these measures are purely quantitative and lacking in underlying theory. For example, productivity is measured in terms of the amount of time taken by an expert author, and the method seems unable to address the types of evaluation issues outlined above with respect to SHIVA.

Our approach has been to address the issues outlined briefly above using a variety of methods. In order to address the usability of the interface to SHIVA we have begun an analysis using a modification of Task-Action Grammar (TAG) [16], known as D-TAG (Display-oriented TAG) [17]. D-TAG is more suitable for evaluating the interface to SHIVA since it captures the display-based nature of the interaction. Whilst it has some limitations in, for example, allowing us to predict errors and in assessing the learnability of the interface, it has already proved useful in identifying inconsistencies in syntax. Other methods for identifying common errors have involved conducting expert walk-throughs with the system, where members of the SHIVA design team, other than those who had been involved in interface implementation, were taken through a series of prespecified tasks and provided concurrent think-aloud protocols during the use of the system.

In order to address issues concerning the functional model of authoring underlying the design of the system, we have been conducting detailed observational studies of experienced authors using SHIVA to create and modify a small course (student contact time approx. 30 min). Four authors were studied, in sessions lasting approx. 4 h each. Two of the authors were experienced users of ORGUE, but had never used ECAL or SHIVA before.
The other two were experienced CAL authors, but neither had used ORGUE, ECAL or SHIVA. Each subject was given 1 h of training in the use of the system. This consisted of an introduction to the overall model of authoring in SHIVA, including a detailed explanation of the purpose and process of creating concept-frame networks in PSAUME, and the rules underlying the system’s teaching decisions. The training session also involved hands-on training in the use of the interface.

The authors were seated in front of the system, with one of the evaluators seated beside them to prompt them in providing think-aloud protocols. Two video records were taken: one camera was focussed on the computer screen, whilst the other was focussed on the author, to record any notes made on paper or reference to documentation. Since we were only interested in evaluating the interface to ORGUE and PSAUME, and not the multimedia editors within this session, a set of frames had already been created for the purpose of the evaluation. The domain for the course was described to the authors, and they were given a list of suggested keywords with which to describe the concepts involved in the course. (This is realistic, since most courseware production is done in teams, where the actual content of the course has already been specified in some way by a member of the team, and a specialist author is given the task of producing the courseware from this specification.) Authors were then given the following tasks:
(i) to create a flowchart specifying a particular sequence of material within ORGUE, where the target sequence was given to the authors to reproduce;
(ii) to create a concept map within PSAUME to represent the teaching objectives of the course, from a list of suggested keywords provided to authors;
(iii) to link concepts and frames in PSAUME and to build a runnable course;
(iv) to predict the teaching sequence resulting from the concept-frame mapping they had created and to check the resulting sequence using SHIVA’s debugging and simulation tools;
(v) to make certain changes to the teaching sequence using the debugging tools.

Some preliminary findings. Although our analyses are not yet complete, some interesting issues were raised during these sessions. One finding which was not anticipated concerned the influence of the graphical interfaces to both ORGUE and PSAUME on the strategies used in authoring. Experienced users of ORGUE seemed to have developed a library of “cliches”, similar to cliches found in studies of expert programmers, which referred to particular logical sequences of material and could be identified visually by recognizing certain patterns. In fact one of the authors, on looking at the flowchart which we had asked him to recreate, decided that the structure was wrong and used his own cliches in constructing the sequence. Many of these cliches also appeared to be common to all authors. This suggests that a useful facility would be to build a library of common routines to facilitate flowchart construction.

In addition, experienced authors were able to parse these flowcharts extremely rapidly, in terms of the teaching interaction, even though they were relatively unfamiliar with the domain. Although this suggests some form of domain-independent teaching strategy, what militates against this interpretation is the finding that one or two of the authors actually recreated the material inside frames as well as the structure of the flowchart, pointing out that they needed to understand the material in order to create the appropriate teaching sequence. The relationship between content and structure is therefore not a straightforward one. It is clear that some aspects of control can be separated from content, but not all.
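The suggested library of common routines for flowchart construction might look something like the following sketch. It is an assumption-laden illustration in Python, not a description of ORGUE: the cliche names (`present_test_remediate`, `linear_sequence`), the edge representation and the branching conditions are all invented for the example, since the particular cliches observed in the study are not enumerated here.

```python
from typing import Callable, Dict, List, Tuple

# An edge in a flowchart: (from_frame, condition, to_frame). Hypothetical representation.
Edge = Tuple[str, str, str]


def linear_sequence(*frames: str) -> List[Edge]:
    """Cliche: present the frames one after another."""
    return [(a, "always", b) for a, b in zip(frames, frames[1:])]


def present_test_remediate(present: str, test: str, remedial: str, exit_to: str) -> List[Edge]:
    """Cliche: present material, test it, loop through remediation until correct."""
    return [
        (present, "always", test),
        (test, "correct", exit_to),
        (test, "incorrect", remedial),
        (remedial, "always", test),
    ]


# The library maps a cliche name to a function that builds its edges.
CLICHE_LIBRARY: Dict[str, Callable[..., List[Edge]]] = {
    "linear_sequence": linear_sequence,
    "present_test_remediate": present_test_remediate,
}


def instantiate(name: str, *frames: str) -> List[Edge]:
    """Build the edges of a named cliche over the given frame names."""
    return CLICHE_LIBRARY[name](*frames)


# Example: reuse one cliche with frames that already exist.
edges = instantiate("present_test_remediate", "intro", "quiz", "hint", "next_topic")
```

The point of the design is that a cliche is parameterized only by frame names, which fits the observation that some aspects of control can be separated from content. It does not address the cases where authors needed to understand the material itself before choosing a structure.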
Another issue concerning the influence of the graphical interface on authoring strategies was found in the use of PSAUME. Authors made the mistake of assuming that, if a concept had many links emanating from it, it was the most important concept and would therefore be taught first. This is not necessarily the case, but the overall visual appearance (a gestalt) of the network seemed to have a strong effect on authors’ predictions concerning the overall teaching sequence. This suggests that other visual techniques may need to be employed in order to represent the more subtle aspects of the underlying representation of the domain in ECAL (i.e. importance, generality, relatedness, focus, etc.). We have already developed some of these techniques in earlier implementations of ECAL (e.g. the “hot concept map” described in Elsom-Cook and O’Malley [7]), but they have not yet been included in SHIVA.

Another finding of interest was the apparent importance of spatial layout in helping authors create the course structure. One or two of the authors tended to lay out frames in ORGUE spatially, and only when they were satisfied with this arrangement did they connect frames together with links. In addition, although spatial layout has no meaning in PSAUME, authors spent a good deal of time arranging concepts and frames to create a visual structure. However, it is difficult to decide with PSAUME whether this was to help them visually “parse” the relationships between concepts and frames (as it seemed to be in ORGUE) or whether it was simply for ease of searching the display, since networks can become quite cluttered. Although authors were encouraged to make use of PSAUME’s facilities for creating subviews of the network, there wasn’t much call for this. This may have been due to the relatively small size of the course in question.
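The gap between "most linked" and "taught first" can be shown with a toy network. The sketch below is in Python and rests on a strong simplifying assumption that is not taken from ECAL: each link is read as a directed "taught before" relation and concepts are ordered by a standard topological sort (Kahn's algorithm). ECAL's actual decisions draw on importance, generality, relatedness and focus, which this example does not model. The concept names are invented.

```python
from collections import defaultdict, deque


def link_counts(links):
    """Number of links touching each concept, i.e. what the eye picks up in the display."""
    counts = defaultdict(int)
    for a, b in links:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


def prerequisite_order(links):
    """Kahn's topological sort; each link (a, b) is read as 'a is taught before b'."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for before, after in links:
        successors[before].append(after)
        indegree[after] += 1
        nodes.update((before, after))
    # Start from concepts with no prerequisites, in alphabetical order for determinism.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("links contain a cycle")
    return order


# Invented toy domain: "speed" is the visual hub of the network.
links = [
    ("units", "speed"),
    ("time", "speed"),
    ("speed", "acceleration"),
    ("speed", "momentum"),
]
counts = link_counts(links)
most_linked = max(counts, key=counts.get)
order = prerequisite_order(links)
```

Here `most_linked` is `"speed"`, with four links, yet under the assumed ordering it comes third, after `"time"` and `"units"`. A display that draws attention to hubs would therefore mislead an author in exactly the way described above.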
In terms of understanding the way in which SHIVA makes teaching decisions, the authors we studied were fairly good at anticipating which frame would be presented next, given the structure they had created in PSAUME. However, this was a relatively small course and it is not clear whether a larger course would be so predictable.

Further work. Clearly, this is too small a scale of study to make any decisions about the effectiveness of SHIVA in supporting authors or the educational effectiveness of resulting courseware. We are currently running two other studies, on a much larger scale, with authors in more realistic settings and involving real courses. One study involves a company called Westmill, in Paris, who are developing courses on teaching commercial English within the banking sector. The other study involves geography teachers at the University of Grenoble, who are developing a course for teaching the processes involved in changes to the water table in the Rhône-Alpes
region. Both studies involve authors who already have a highly structured and well developed courseware design methodology, but have little experience of designing computer-based courseware. This, together with the contrasts in the type of domain being taught, should prove an interesting test case for SHIVA.

PROBLEMS IN EVALUATING THE EFFECTIVENESS OF COURSEWARE DESIGN TOOLS
One of the major problems in evaluating the effectiveness of a complex environment like SHIVA is one of scale. Meaningful evaluation requires realistic tasks (i.e. courses covering several hours of contact time and where real training needs have been identified) and realistic settings (involving professional CAL courseware design and authoring teams, where members of the team have certain organizational constraints and working practices).

Evaluating the interface to SHIVA is not a straightforward matter. Traditional task analytic methods from HCI (e.g. GOMS [18]; TAG [16]) are too low-level. We found that we really needed some kind of formal representational scheme to represent the functional model underlying SHIVA and how it was reflected in interface operations, but the formalisms we tried to apply took us too far away from the model of the system. As a result, we were unable to represent interactions between lower level interface operations which appear at a higher level with respect to the user’s task.

The most valuable method for collecting data on the learnability and usability of both the underlying functional model and the interface operations was the detailed observational study of authors, where tasks were held constant and specified clearly in advance. However, there are two drawbacks to this method. One is the amount of data generated and the time taken to analyse it. (Incidentally, in order to speed up the analysis of the data, the members of the evaluation team sat through the videotaped sessions together and discussed various points as they arose. This turned out to provide not only a more thorough analysis than one person was able to provide alone in the same amount of time, but it also generated suggestions for modifications to the system and for further investigations to be carried out.)
The second drawback is that such sessions are unrealistic tests of the system, which would normally be used by a team of authors and over several weeks if not months of development of the course. A thorough evaluation of the system should ideally incorporate a number of methods, each of which would address a particular aspect of the system’s use and at different levels of granularity. In addition, the evaluation should probably take place over a period of at least two years, rather than the few months we have been able to spend on it, and which most research projects are able to spend on evaluation. Such an investment is clearly costly, particularly if the system may eventually have to be modified. However, the cost of modification can be reduced if the evaluation can be carried out early in the design of the system. Finally, we believe that an investment in the design and evaluation of better authoring tools can make a big impact on the improvement of computer-based learning, both in educational and in commercial training applications. We still need to demonstrate the case with SHIVA, but one only needs to look at other programming domains to see the sense in developing better high-level tools for authoring. Unfortunately, most of CAL is still in the world of low-level programming languages such as BASIC and FORTRAN. This wouldn’t matter if it was simply a case of making the task of authoring more difficult, although that’s bad enough. What’s worse is if it means that we continue to develop low quality courseware as a result.
REFERENCES

1. Barker P., Author Languages for CAL. Macmillan, London (1987).
2. Barker P. and Singh R., Author languages for computer-based learning. Br. J. Educ. Techn. 13, 167-196.
3. Bierman D., ‘Intelligent’ authoring systems: towards better courseware. Proceedings of the Conference on Computers, Education and the Child. Urgench, U.S.S.R. (1988).
4. Card S., Moran T. and Newell A., The Psychology of Human-Computer Interaction. Erlbaum, Hillsdale, N.J. (1983).
5. Elsom-Cook M. (Ed.), Guided Discovery Tutoring. Chapman, London (1990).
6. Elsom-Cook M. (Ed.), Extended computer-assisted learning: Minimalism in guided discovery. In Guided Discovery Tutoring. Chapman, London (1990).
7. Elsom-Cook M. and O’Malley C., ECAL: Bridging the gap between CAL and ITS. Computers Educ. 15, 69-81 (1990).
8. Gaines B. and Boose J., Knowledge Acquisition for Knowledge-Based Systems. Academic Press, London (1988).
9. Grosz B., Focus-spaces: a representation of the focus of attention in a dialog. In Understanding Spoken Language (Edited by Walker D.). North-Holland, New York (1978).
10. Guir R., Conception Pedagogique: Guide Methodologique. L’Association Regionale pour le Developpement de l’Enseignement Multimedia Informatise, Lyon (1986).
11. Howes A. and Payne S., Display-based competence: towards user models for menu-driven interfaces. Int. J. Man-Mach. Stud. 33, 637-655 (1990).
12. MacKnight C. and Balagopalan S., An evaluation tool for measuring authoring system performance. Commun. ACM 32, 1231-1236 (1989).
13. Naughton J., Artificial Intelligence: Applications to Training. Open University report, October 1987.
14. Payne S. and Green T., Task-action grammar: a model of the mental representation of task languages. Human-Computer Interact. 2, 93-133 (1986).
15. Pirolli P. and Russell D., Towards theory and technology for the design of intelligent tutoring systems. Proceedings of the International Meeting on Intelligent Tutoring Systems (IMITS-88). Montreal, Canada (1988).
16. Posner G. and Rudnitsky A., Curriculum Design. Longmans, London (1986).
17. Russell D., Moran T. and Jordan D., The instructional design environment. In Intelligent Tutoring Systems (Edited by Psotka J., Massey L. and Mutter S.). Erlbaum, Hillsdale, N.J. (1988).
18. Spensley F., Generating domain representations for ITS. In Artificial Intelligence and Education (Edited by Bierman D., Breuker J. and Sandberg J.). IOS Publishing, Amsterdam (1989).