Multimodal tool support for creative tasks in the visual arts

Knowledge-Based Systems 13 (2000) 441–450

www.elsevier.com/locate/knosys

J. Sedivy a,1, H. Johnson b,*

a Telepresence Systems Inc., 300-8 Market Street, Toronto, Canada M5E 1W5
b Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, UK

Abstract

This paper presents an investigation into computer tool support for creative tasks in the visual arts, in particular sketching. The research draws on both theoretical models of creative processes and empirical observations of sketching activity in the domain of character development in animation and illustration. These sources show that creative problem solving through sketching is a highly iterative process during which various constraints evolve rapidly and are adapted to. A set of requirements for an experimental, multimodal sketching tool was developed from our theoretical and empirical studies. To support the intense, rapid nature of the task, voice input was implemented, allowing the user to access functionality without interrupting the hand activity. The multimodal tool also included radial marking menus to provide rapid navigation to functionality. Informal evaluations were conducted that provided qualitative feedback about the use of the system. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Sketching; Multimodal tools; Creativity

* Corresponding author. Tel.: +41-122-532-3215; fax: +41-122-5826492. E-mail addresses: [email protected] (J. Sedivy), [email protected] (H. Johnson).
1 Tel.: +1-416-777-1177, ext. 32; fax: +1-416-777-1188.
This paper is derived from "Supporting creative work tasks: The potential of multimodal tools to support sketching", published in the Proceedings of the Third Conference on Creativity and Cognition, Loughborough, UK, October 10–13, 1999, pp. 42–49. Reproduced with permission from ACM © 1999.

1. Introduction

The goal of the research reported in this paper was to investigate ways in which creative tasks in the visual arts, in particular sketching, could be supported by computer tools. The research had two initial aims: to review the literature on both psychological and philosophical models of sketching, and to assess the feasibility of employing multimodal input as a means of supporting the rapid dialogue between the artist and the sketch. A variety of domains were investigated in order to decide which task categories to support and to analyse the requirements that any proposed sketching support tool should meet. The research had two important facets: collecting, analysing and modelling data about creative sketching tasks, and discussing the possibilities for computer tool support with professionals in the field. The specific domain chosen to be supported by the proposed multimodal tool was character development in animation and illustration. As a result of the observational and modelling

phases, a multimodal vector-based drawing program was developed and informally tested with a number of experienced, professional users. The intention was that the informal evaluations would provide information about design issues concerning speech input for creative work tasks and that the sketching tool could be used as a test-bed for further research on the role of input modality in sketching. The informal evaluations revealed the difficulty subjects encountered when using new forms of input, new types of menu and new tools; alternative approaches to evaluation are suggested.

2. Creativity

Creativity can refer to various forms of problem solving, to non-linear thinking or to inspirational creativity (from the Muses). The term can be applied to thought processes in the arts or in the sciences. Creativity is not easy to define, but can be construed (cf. Refs. [2,15]) as a process of evolution towards a solution to a problem, which makes use of a combination of logical and illogical mechanisms. When faced with any creative task or problem, it is essential to be able to investigate a variety of alternative solutions, and this process can outwardly vary across disciplines. For example, scientists might use diagrams and equations written on a chalk board, while musicians might doodle on the keyboard. In the case of creativity in the visual arts, the production of rough sketches in the initial stages of design is extremely common. Perceptually and physically these three activities are quite different and yet

0950-7051/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0950-7051(00)00064-2


Fig. 1. First two renderings of a Winnie the Pooh scene.

they all serve a similar purpose in the creative process. All can be seen as external representations of ideas that can be manipulated, modified, communicated and used to stimulate the development of new ideas. From the perspective of providing tool support for creativity, it is important to understand the relationship between designers and the external representations they employ, and the purposes those representations serve. Within the domain of visual arts, sketches and diagrams are the most common means by which ideas can be explored; therefore, it is necessary to analyse the activity of sketching.

3. Sketching

3.1. Psychological models of sketching

The relationship between the physical act of sketching, the sketched images, and the cognitive processes of the artists themselves is frequently described as a "conversation" between the drawing and the artist [15,19,21]. In creating a sketch, designers make an explicit representation of their ideas, which then aids further reasoning about the problem. It is useful in this context to distinguish between sketches that are used to explore ideas, and are therefore a means to an end, and drawings that are the end products of the sketching process. "Reasoning" through the initial sketches need not be logical, but can be "loosely structured" or analogical. Moreover, different perceptual "modes" can be stimulated by the same drawing. For example, Goldschmidt [9] defines two different states, which she refers to as seeing as and seeing that. Seeing as refers to the process of using "figural" thinking while sketching, and seeing that refers to the use of non-figural elements to reason about the design.

Suwa and Tversky [21], in analysing how architects perceive their sketches, devised a taxonomy of information categories that could be used to classify what designers described about their drawings. They further decompose Goldschmidt's seeing as mode into emergent properties, spatial and functional relations and background knowledge. These categories were then used to define chunks of information that were perceived by the designers. Their analysis revealed that detailed consideration of topics related to a particular idea occurred during examination of local spatial relations in a sketch.

Oxman [18] concentrates less on the data perceived by designers than on the process of design itself. She also argues that there are multiple abstracted representations that the designer can extract from the sketches. These abstractions are broadly divided into three categories: typological schema, topology and formal systems. Each of these abstraction categories is associated with operational methods that can be observed as sketching actions such as refinement, generalisation, scaling or symmetry operations. Her model of visual reasoning in sketching is based on the re-representation hypothesis, which defines creativity as using a cycle of re-representations as a means of conceptual exploration. She extends this theory by arguing that re-representations are driven by external and internal constraints and that the new designs adapt to satisfy these constraints. The adaptation is achieved through the designer's perception of the different abstraction levels and the execution of associated operations. The resulting model of re-representation is described as a cyclical activity that passes through a series of distinct stages before repeating itself.

3.2. Character sketching for book illustration

The tasks to be considered in depth relate to character development in illustration or animation. For the sake of brevity, only our studies of illustrating will be outlined here. Illustration, for instance in a book, can involve either


Fig. 3. Experimenting with details in the margins.

the creation of new visual representations of fictional characters or the incorporation of predefined personas into a scene. Unless the author and the artist work in close collaboration, it is typical for illustrators to receive a copy of the story text and a short description of what should be depicted on each page. The overall composition of the picture and the precise positioning of the characters relative to each other are then left to the artist's discretion. This decision is based on the illustrator's knowledge of what is happening in the story and of the general style expected by the storywriter. For small independent writers, the expected style may be conveyed in an informal description; for large clients such as Disney or Warner Brothers, there may be strict style guides. The decision is further informed by knowledge of established rules about the way compositional elements lead the eye around the page so that it focuses on the desired element, and about the way they contribute to dramatic feeling.

Initially, the illustrator works out different ideas for the composition as thumbnail sketches. These are done with thick, soft pencils in a light colour. As the ideas for the composition become more refined, darker colours and thinner pencils are used to draw over the existing drawing. The drawing is also redone on a larger scale so that smaller details can be worked out. Throughout the sketching process, the artist pauses to rotate the paper to different angles or tilts their head to get a different perspective. While drawing a scene that involved a lot of movement and characters jumping around, one artist jiggled the paper up and down to aid in visualising how the motion could appear.

This iterative process can be seen in the sketch samples in Figs. 1 and 3. In the first image (Fig.
1a), the artist has roughly laid out the appearance of the scene and the approximate positioning of the characters. At this stage, the characters are not recognisable; they merely serve as compositional elements. The thickness and "fuzziness" of the lines are very deliberate because they allow the artist to imagine more than one possibility for a particular element. In fact, several scene elements have been drawn in more than one position for the purpose of experimenting with the scene. For example, the leftmost character (Tigger) and the rightmost character (Christopher Robin) have been drawn both with their arms at their sides and with one arm waving in greeting. Rubbers are never used. If a "mistake" is made, it is simply drawn over in a darker pencil or traced over on a new piece of tracing paper. At the early stages of the sketch, however, the artist does not consider these to be mistakes; rather, they are experiments with different compositions and poses. In many situations, the illustrator prefers to have both representations visible so that their dramatic and compositional impact can be evaluated simultaneously. It is also common to experiment with elements of a scene in whatever space is available on the drawing surface. Fig. 3 shows a part of an illustration where the details of a character's feet are experimented with in the margins.

Fig. 2. Final version of the Winnie the Pooh scene.

Once the composition of the scene has been decided, the artist places a sheet of tracing paper over it and roughly retraces the scene that was last drawn in the darkest colours (Fig. 1b). In this iteration, detailed features of the characters and their facial expressions are developed on a larger scale than the initial thumbnail. The peripheral elements of the scene can be ignored for the moment, but are revisited later for the final image, which integrates the story text with the illustration (Fig. 2). At these later stages, the artist might consult a book of style guides provided by the client to make sure that the proportions have been drawn correctly and that


the movements and facial expressions are "in character". As the drawing nears completion, the image is photocopied onto the page containing the story text to ensure that no crucial elements of the drawing interfere with the text or the spine of the book.

A photocopier is a common tool during the later stages of the drawing, when the composition is being finalised. Images, or parts of images, are often enlarged or reduced, then cut out and glued into a new drawing. For example, the artist might especially like the way a particular character was drawn, except that it is the wrong size. This provides a clue as to the types of operations (scaling, copying, rotations) that could be useful to support in a sketching tool. These operations are clearly difficult for humans to perform, since an accomplished artist would rather walk to another room, make an enlarged or reduced photocopy, cut out an image and paste it into a scene than redraw a precise copy in a different size.

3.3. Tool support for sketching

The studies reported in Section 3.2 clearly indicate the iterative, manipulative and reasoning elements of representing and re-representing ideas for the purposes of solving creative design problems. We need to know whether tools can aid this activity and what computer tool solutions currently exist. Although there is a wide variety of powerful CAD tools to assist architects and graphic designers in creating representations of their final designs, very little exists in the way of support for the earlier stages of the design process. In fact, some of the tools can actually hamper creativity [3].

It is crucial from an HCI perspective to consider which part of the activity or process of sketching should be supported. There are broadly two possibilities. The first relates to supporting the designer with the representation process itself on an abstract level. This can be done by suggesting new or alternative representations, defining constraints, or assisting the user in lateral thinking. This approach has been explored extensively by Candy and Edmonds [5], Suwa and Tversky [21] and Gross [7,10], amongst others.

The second possibility relates to supporting only the physical act of sketching. Under this paradigm, creativity is solely the responsibility of the human using the tool. Support for this level of the creative process has been relatively unexplored. A notable exception is the Electronic Cocktail Napkin [10], which provides some support for drawing management and allows users to sketch diagrams using stylus input. However, because most of the tool's features are focused on assisting the user in defining spatial relationships and constraints, it falls more readily into the first category. Our approach was to explore the potential of aiding the physical act of sketching, thus supporting what Oxman terms "operational methods" rather than "abstraction levels". How can this be accomplished?

Since sketching consists of intense activity for the hands, speech input could potentially be used to manipulate a sketch. Speech input, in lieu of commands expressed through traditional menu bars, would leave the hands free to work on the art and, for experienced users, would also save time. A mixture of speech and gesture is also a natural combination for certain types of expression that are difficult to convey through menus. Speech input alone has a number of potential disadvantages: for instance, how is the novice user given information about what commands are available, and how will they understand the structure of the command set? An additional problem is that we do not know whether issuing linguistic commands will interfere with conceptual thought processes, as in the Stroop effect in psychology. It is therefore considered appropriate to combine the respective strengths of two modes of input, menu-driven and verbal.

The view that speech and gestures are complementary rather than redundant modes of communication is supported in HCI studies by Oviatt [17], Mignot and Carbonell [16], Cohen [6] and Hauptmann [11]. This notion has also formed the basis of the architecture underlying the multimodal software framework proposed by Vo and Waibel [22]. Prototype applications that use multimodal input have been constructed successfully in recent years, but there is scant evidence from user testing that the integration of input modes was successful from a usability perspective, as opposed to a purely technical one. This means that there is a lack of design principles or heuristics to guide designers in constructing such systems.

4. Towards a designed solution

A first step towards a designed solution is to collect, analyse and model data about how designers undertake the creative activity of sketching. This will provide, along with the models of sketching outlined earlier in the paper, a basis for identifying user requirements to be satisfied by the tool. The task was specific enough to identify relatively narrowly defined user requirements, but the sketching activity inherent in it is general enough for the results of the user studies to be applied to other sketching applications.

Based on our first-hand observations, the sketching task was shown to be highly iterative in nature, and the psychological model of sketching proposed by Oxman, referred to previously, was found, with modification, to be an appropriate representation of the sketching activity observed and described above. Although the test subjects in Oxman's study were architects rather than animators, there were similarities in the process of sketching across the domains and only minor adjustments to the model were required. The abstraction levels of typological schema, topology and formal system still apply in this domain; however, the associated operational methods are slightly different.


Fig. 4. Abstraction levels and their associated methods for character and illustration sketching.

The most significant difference between the domains of architectural sketching and character illustration appears to be in the abstraction level of the formal system. In the latter domain, the conceptual components tended to include elements such as compositional rules, perspective and style guides rather than the central, linear and axial concepts employed by architects. In addition, the operational elements differed between the domains. In particular, symmetry operations were not observed, but translation, rotation, scaling and zooming were commonplace. Fig. 4 shows Oxman's model [18], with modifications appropriate to our domain. For a full account of the modelling process and results, see Ref. [20].

The requirements are drawn from the observed task behaviour of subjects and from the relevant literature. In addition, they benefit from informal discussions with designers and informal evaluations of existing technology.

4.1. Requirements

The main requirement is to support rapid iteration of design ideas. This requirement appears to be of fundamental importance and can be achieved through a number of design features:

1. Allow access to functionality as quickly as possible by ensuring dialogues and command sequences are kept as short as possible.
2. Support the use of layers. Transparent layers allow designers to create a modified version of a previous sketch; tracing paper is frequently used for this and could provide a useful metaphor for a sketching program. It should be very easy to add, remove, show or hide layers.
3. Support a variety of pencil thicknesses and colour intensities. All the designers who used pencils to sketch used either different sized or coloured pencils, or varied the pressure to achieve the same effect.
4. Allow users to perform manipulations on their drawings. It should be easy to move and manipulate lines. The transformations to be supported should be rotation, scaling, insertion and deletion.
5. Support the expected aesthetic. Users of design packages now expect images to be anti-aliased so that their work does not appear computer drawn.
6. Provide as much space on the screen as possible for drawing. Several users complained that the assortment of palettes and widgets on the edges of their screen took up too much space.
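Requirements 2 to 4 can be made concrete with a small data model. The following is a minimal sketch only, not the Speak 'n' Sketch implementation: the class names `Stroke`, `Layer` and `Sketch`, their fields, and the choice of Python are all assumptions introduced here for illustration. It stores each sketched line as a list of points with a thickness and an intensity, supports rotation, scaling and translation on those points, and lets layers be added, removed, shown or hidden in a single step.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Stroke:
    """One sketched line, stored as points rather than pixels (hypothetical model)."""
    points: list                 # list of (x, y) tuples
    thickness: float = 1.0       # requirement 3: pencil thickness
    intensity: float = 1.0       # requirement 3: 0.0 = lightest, 1.0 = darkest

    def translate(self, dx, dy):
        """Move every point by (dx, dy)."""
        self.points = [(x + dx, y + dy) for x, y in self.points]

    def rotate(self, degrees, about=(0.0, 0.0)):
        """Rotate counter-clockwise by `degrees` around the point `about`."""
        cx, cy = about
        c = math.cos(math.radians(degrees))
        s = math.sin(math.radians(degrees))
        self.points = [
            (cx + (x - cx) * c - (y - cy) * s, cy + (x - cx) * s + (y - cy) * c)
            for x, y in self.points
        ]

    def scale(self, factor, about=(0.0, 0.0)):
        """Scale uniformly by `factor`, keeping the point `about` fixed."""
        cx, cy = about
        self.points = [
            (cx + (x - cx) * factor, cy + (y - cy) * factor)
            for x, y in self.points
        ]


@dataclass(eq=False)             # compare layers by identity, not by content
class Layer:
    strokes: list = field(default_factory=list)
    visible: bool = True         # requirement 2: show or hide in one step


@dataclass
class Sketch:
    layers: list = field(default_factory=list)

    def add_layer(self):
        """Place a new, empty sheet of 'tracing paper' on top of the pile."""
        layer = Layer()
        self.layers.append(layer)
        return layer

    def remove_layer(self, layer):
        self.layers.remove(layer)

    def visible_layers(self):
        return [layer for layer in self.layers if layer.visible]
```

Keeping the geometry as points, rather than pixels, is what makes the transformations in requirement 4 cheap to apply and to undo; insertion and deletion of lines then reduce to adding or removing `Stroke` objects from a layer.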

4.2. System design

Stemming from the requirements are two fundamental design principles that underlie the implementation of Speak 'n' Sketch. The first is to minimise the time spent accessing functionality so that as much time as possible is spent actually drawing. The second is to make the screen as uncluttered as possible, leaving maximum space for drawing. Rough sketching is a rapid process during which ideas can evolve from conception to completion in a matter of minutes, so access to drawing functionality needs to be rapid. Most solutions to this problem provide palettes of widgets that are always available, but this exacerbates the second problem, lack of screen space. Designers highlighted this as a major problem, because they can then spend significant amounts of time scrolling around an image that does not fit on the screen.

Driven by these two principles, the initial screen layout for Speak 'n' Sketch shows nothing that is not currently being used. All functionality can be accessed via voice commands and pop-up menus. A layering system has been implemented to simulate tracing behaviour (see Fig. 5). As layers are added, the lines on the lower layers become progressively lighter in colour, thus allowing the


Fig. 5. The Speak 'n' Sketch interface.

artist to trace over a rougher version of the image. The layers can be navigated by clicking on their associated tabs with the stylus. These tabs also become lighter in colour the further down the pile they are. A thick black line outlines the topmost active one. Up to five different sheets can be visible at a time, but each can be hidden or displayed independently by toggling the check box on its tab. Although layering is used in applications like Photoshop, layers there do not have any opacity at all, so lower layers are indistinguishable from higher ones, and hiding individual layers is a multistep process.

The drawn images are treated as vectors, which means that the image is represented as a set of lines rather than as a collection of pixels. Each line is an arbitrarily shaped filled polygon that can be individually selected, translated, rotated and scaled. Lines can be grouped together and treated as single entities that can also be moved, rotated and scaled. Whenever an object is

selected, the corners of the bounding box are drawn in, either as circular handles for rotation or as edged corners for scaling. The user can grab one of these handles by pressing down on it and dragging. Resizing is a common manipulation performed by illustrators and graphic designers, who use photocopiers to enlarge or reduce images and then glue them onto a new drawing. In addition to resizing an image for compositional purposes, the scaling option provides a facility for working on details of a drawing: a user can quickly enlarge an image, work on the fine detail and then scale it back down to the desired size. The direct manipulation of translation and rotation also mimics to some degree the type of paper jiggling observed in artists drawing very dynamic, active scenes. In bitmap-based packages such as Photoshop, this type of direct manipulation is not possible. In order to rotate or scale parts of an image, the area must


Fig. 6. Pie menus: (a) in regular mode; and (b) in marking mode.

first be selected and then a dialogue sequence entered into, in which the user must specify the angle and direction of the rotation or the scaling ratio.

The pop-up menus are implemented in a radial format rather than the traditional linear one. Early research has demonstrated that selection from radial menus is much quicker than from linear ones [4]. Furthermore, because the selection is based on broad angles rather than the precise distance travelled down a linear path, commands can often be executed from a radial menu without necessarily attending to it visually. Submenus are displayed by selecting a node in a parent menu, similar to the way in which nested menus work. The submenus are also radial, so a complete selection from hierarchical pie menus consists of a series of strokes differentiated by their angles.

Further investigation into the design of pie menus has been conducted by Kurtenbach and Buxton [13,14]. In this work the concept of a radial menu has been pared down even further by introducing the idea of marks to make menu selections. Essentially, once the user is familiar with the location of menu items, it is no longer necessary to display the entire menu, and selections can be made simply by making the appropriate gesture. As a minimum confirmation of an action being performed when the menu is not displayed, it is still helpful for the user to see an ink trail of the cursor's movement [13]. This is referred to as marking mode. In this mode, there is virtually nothing obstructing the view of the drawing, again reducing visual interference to the absolute minimum. Fig. 6 shows pie menus in both regular and marking modes. The design of the menus in Speak 'n' Sketch is based on the guidelines for radial marking menus established by Kurtenbach [13]. All of the commands that can be executed with the menus


can also be performed with verbal input, with the added advantage that the hierarchical structure of the menus does not have to be navigated. For instance, in order to obtain a thicker brush stroke, the user has to say only the phrase "thicker pen" rather than saying "thickness menu" followed by "thicker pen". By flattening out the grouped structure of the command set, voice input permits a more direct and less tedious dialogue sequence. Another advantage is that simultaneous actions can be performed by using multimodal input. For example, the user can rotate an object using the stylus while changing its colour using verbal commands, or an unwanted layer can be removed with the menu commands while a new one is added with a voice command.

One of the design issues that must be addressed when designing voice-activated systems is how to provide adequate feedback to the user. A simple solution is to provide a status bar that confirms the request or informs users that their command has been understood. Another solution would be to provide auditory feedback. A status bar was provided in Speak 'n' Sketch on the basis that it was likely to be less intrusive. However, adequate studies have not yet been conducted to support or refute this assumption, and it could also be a question of personal preference.

The system design went through several iterations to take into account feedback from professionals with artistic and creative backgrounds who had different degrees of experience with technology. In summary, changes were made to how the pop-up menus were activated, to the method for hiding and showing individual layers, and to shape manipulation.

5. User studies

Informal evaluations with a small number of users have been carried out on the redesigned system. Whilst we would not want to claim that the results are rigorous and widely generalisable, they do provide valuable data and lessons learned about designing tool support for creative tasks.
First, assumptions about the nature of the task, i.e. speed of drawing, and the implications for system performance are discussed. Second, the study and partial results are described, and finally alternative ways of conducting the evaluations are presented.

The perception of the analyst in observing the sketching tasks being undertaken was that the task consisted of very rapid strokes. The system was designed to support this rapidity, and all system testing was based on this assumption. When subjects were given a real task in the evaluation studies, they drew far more slowly than expected. This difference could be due to an illusion, an unintended perceptual exaggeration on the part of the analyst observer. Alternatively, the demand characteristics of the task and task setting may have resulted in slower than normal performance from


the artists and designers. The result was that the internal representation of shapes contained several orders of magnitude more points than was envisaged, and system performance slowed accordingly. The speed of drawing was still reasonable, but object manipulation became difficult. System performance was therefore a significant usability problem.

The study had a number of goals: to investigate the ways in which subjects evolved their character ideas via sketching using traditional materials and the designed tool; to examine the role of multimodal input in the context of a creative task; and to consider the design issues related to speech input and multimodal systems in general. Because of the slow performance of the tool, it was not realistic to undertake experimental comparisons of traditional media and the designed tool. Rather, it was decided to carry out in-depth studies with a few subjects, allowing them to undertake realistic sketching tasks and provide useful qualitative feedback about the design of the system.

The studies consisted of giving subjects a fictional character description for "Roderick – the amazing daredevil frog", whom they were then asked to represent in one of his daredevil activities for a promotional poster. The type of task was derived from examples given by professional illustrators and animators. In addition, it is very common to use reference materials during such a task, so subjects were also provided with several images that could be referred to or traced over.

Prior to the task, a brief demonstration of the system was carried out and the functionality of the marking menus explained. The subjects were then given time to familiarise themselves with the system and to execute small subtasks such as drawing lines, grouping and changing colours. They were then asked to dictate a series of commands to the speech recognition system. This had the dual purpose of fine-tuning the voice recognition to their voice and familiarising them with the command set. After these preliminaries, they were asked to create a representation of Roderick the frog. During this phase they were videotaped, with the intention that they could later provide a retrospective protocol regarding the evolution of their concepts. This technique was used in a similar sketching study by Suwa and Tversky [21]. Prior to their study, the most common technique was to ask the designer to give a running commentary or concurrent protocol. A concurrent protocol will obviously affect the ability to do the task [8], and Speak 'n' Sketch would be unable to differentiate speech intended as input from that intended as commentary.

5.1. Results

The first subject was a professional illustrator who worked exclusively with traditional media (paper and pencils). Despite initial apprehension at using the new technology, he adapted to the use of the tablet and stylus remarkably quickly. He undertook a number of different sketches to get the composition of the scene and to get a sense of what

Roderick's face would be like and the later sketches produced were more ®nal renditions of the scene. Because he was accustomed to using only traditional media, he did not seem to need to make use of any of the transformation functions that were available. In his particular method of working, these type of manipulations come towards the end of the task rather than in the early stages. Hence he did not attempt to group objects together, resize, rotate or translate them. However, the colour palette of progressively light and dark shades was particularly useful for his style of sketching. The second subject was a professional animator who frequently designed web-based animations. She was highly experienced with a tablet and stylus and was accustomed to operating a tablet and keyboard simultaneously. Although she attempted to perform transformations on the character, performance problems ensued and she subsequently con®ned her activity to drawing. The third subject was a professional graphics designer who was highly experienced in a wide range of drawing packages and taught a course in computer graphic design at a college of art. She had used tablets before, but primarily used the keyboard and mouse in her work. Her version of Roderick went through less iteration than that of the ®rst subject, and she made much more use of rotation and scaling operations. Her initial image of the frog was drawn with him facing upwards because she found him easier to draw that way. He was then rotated and resized smaller until he was facing downwards and a parachute was added. She did not make use of the layer functionality. 6. Discussion and conclusions Performing simple tasks gave the subjects enough experience of the system to be able to comment positively and negatively on the design features of Speak 'n' Sketch. All of the subjects agreed that the design of the layering system was very effective. 
They found that the tracing paper effect was very useful for redrawing purposes and that it helped to know which layer was currently being worked upon.

The speech input was also met with great enthusiasm. In particular, subjects who used computers regularly in their work were convinced that it would increase their productivity because it was faster and "allowed continuity of thought". "It means you don't have to stop and think about where you have to go", commented one subject. In a similar vein, another subject noted that "I think those words a hundred times a day – group, optimise, rotate. It would be so good to just say it as soon as you think it". These remarks seem to suggest that, in addition to speeding up the execution of functions, verbally expressed commands also reduced cognitive processing. We must be careful, however, not to claim on the basis of qualitative data and subjective comments, without the benefit of cognitive modelling and performance data, that the transition from thought to speech is more direct than that from thought to hand movement. Further studies combining task analytic techniques such as Task Knowledge Structures (TKS) [12] with cognitive modelling of mental resources, as in Interacting Cognitive Subsystems (ICS) [1], would help to address this question.

The subject who depended on a tablet for her work remarked that the angle at which she worked in order to operate both tablet and keyboard caused shoulder and back strain. Voice input would alleviate this problem. She felt that, at a minimum, voice input had the same functionality as keyboard shortcuts and would improve the quality of her work. A disadvantage of voice input, however, and one often referred to in studies of multimodal input, is the disruption to other workers in settings such as open studios. The disruption, whilst real and significant, is no worse than someone talking on the telephone. In fact it could be considered less disruptive, in that to people not involved in the task the vocalisations consist of (to them) random commands out of context rather than potentially interesting conversations.

All of the users wanted more palettes available to them, as there was much more space in Speak 'n' Sketch than in commercially available packages, where palettes could be a problem. Certain functionality, such as changing pen size, needed to be immediately available in one easy click. The inexperienced computer user made the analogy between picking up a new pencil and clicking on a pen-size palette. He considered this to be more natural than moving through a menu. The radial menus received mixed reviews: two subjects said they would use them in a real system, but the third preferred keyboard shortcuts. Her rationale was that the hand not being used for drawing could be used for the keyboard commands. There is a deep issue here about the degree of disruption and the cognitive demands associated with keyboard shortcuts and radial menus. This needs further systematic investigation.
The evaluation had to be informal because it would have been unrealistic to expect users to learn a new system quickly, adapt to voice input and adapt to a new type of menu, and still be able to draw in their preferred manner, naturally and easily. The implementation had many new features; whilst this allowed us to obtain feedback on a number of design issues and choices, it made it impossible to isolate the effects of individual design features as opposed to their use in combination. Such isolation would be necessary in any systematic experimental manipulation of the factors for the purposes of validating the psychological models of sketching, comparing task performance and quality in traditional and computer-supported contexts, and assessing the behavioural and performance factors associated with the cognitive load of different uses of modalities. A longitudinal study would also be necessary to understand how the tasks evolved through long-term use of computer support. Some features might prove useful, some unnecessary, and others might need to be modified or enhanced.


In future research, we aim to use extensions to the implemented system as a test-bed for further studies, in which we will isolate the effects of speech input and of changes to the menu system over the short and long term. We also intend to undertake further theoretical and empirical research into identifying, measuring and assessing the impact on performance of the different cognitive resources used in interacting with different modalities. The ultimate research aim is to develop principles for multimodal tools to support highly creative and iterative work tasks.

Acknowledgements

We are grateful to the British Council for funding this research. Many thanks to Justin Wyatt for contributing all the sketches in this paper. All sketches are copyrighted by Walt Disney Co.

References

[1] P.J. Barnard, J. May, Representing cognitive activity in complex tasks, Human Computer Interaction (1999) 14.
[2] M. Boden, The Creative Mind: Myths and Mechanisms, Weidenfeld & Nicholson, London, 1992.
[3] S. Bhavnani, et al., CAD usage in an architectural office: from observations to active assistance, Automation in Construction 5 (3) (1996) 243–255.
[4] J. Callahan, et al., An empirical study of pie vs linear menus, CHI88, 1988, pp. 95–100.
[5] L. Candy, E.A. Edmonds, Supporting the creative user: a criteria-based approach to interaction design, Design Studies 18 (2) (1997) 184–194.
[6] P. Cohen, The role of natural language in a multimodal interface, UIST, 1992, pp. 143–149.
[7] E.Y. Do, M. Gross, Drawing as a means to design reasoning, Artificial Intelligence in Design '96, 1996, pp. 22–27.
[8] K.A. Ericsson, H.A. Simon, Protocol Analysis: Verbal Reports as Data, MIT Press, Cambridge, MA, USA, 1984.
[9] G. Goldschmidt, The dialectics of sketching, Creativity Research Journal 4 (2) (1991) 369–383.
[10] M. Gross, The electronic cocktail napkin: a computational environment for working with design diagrams, Design Studies 17 (1) (1996) 53–69.
[11] A. Hauptmann, Speech and gesture for graphic image manipulation, CHI89, 1989, pp. 241–245.
[12] H. Johnson, P. Johnson, Task knowledge structures: psychological basis and integration into system design, Acta Psychologica 78 (1991) 3–26.
[13] G. Kurtenbach, Some articulatory and cognitive aspects of marking menus: an empirical study, Human Computer Interaction 8 (2) (1993) 1–23.
[14] G. Kurtenbach, W. Buxton, The limits of expert performance using hierarchic marking menus, CHI93, 1993.
[15] B. Lawson, How Designers Think: the Design Process Demystified, Butterworths, London, 1990.
[16] C. Mignot, N. Carbonell, Commande orale et gestuelle: etude empirique, Technique et Science Informatiques 15 (10) (1996) 1399–1428.
[17] S. Oviatt, Multimodal interfaces for dynamic interactive maps, CHI96, 1996.


[18] R. Oxman, Design by re-representation: a model of visual reasoning in design, Design Studies 18 (4) (1997) 329–347.
[19] D. Schon, Designing as reflective conversation with the materials of the design situation, Knowledge Based Systems 5 (1992) 3.
[20] J. Sedivy, Multimodal tool support for sketching, QMW Technical Report, 1998.

[21] M. Suwa, B. Tversky, What do architects and students perceive in their design sketches? A protocol analysis, Design Studies 18 (4) (1997) 385–403.
[22] M.T. Vo, A. Waibel, A multimodal human computer interface: combination of speech and gesture recognition, InterCHI93, 1993.