LASAD: Flexible representations for computer-based collaborative argumentation


Int. J. Human-Computer Studies 71 (2013) 91–109 www.elsevier.com/locate/ijhcs

Frank Loll, Niels Pinkwart
Clausthal University of Technology, Julius-Albert-Str. 4, 38678 Clausthal-Zellerfeld, Germany
Received 16 February 2011; received in revised form 16 February 2012; accepted 3 April 2012; available online 26 April 2012

Abstract

Teaching argumentation is challenging, and the factors of how to effectively support the acquisition of argumentation skills through technology are not yet fully explored. One of the key reasons for this is the lack of comparability between studies. In this article, we describe LASAD, a collaborative argumentation framework that can be flexibly parameterized. We illustrate the flexibility of the framework with respect to visualization, structural definitions, and kind of cooperation. Using this framework, this paper presents an evaluation of the impact of different argument representations and of collaborative vs. individual use of an argumentation system on the outcomes of scientific argumentation. We investigate which combinations of these factors produce the best results concerning argument production and learning outcomes. The results of this controlled lab study with 36 participants showed that the use of simple representational formats is superior to highly structured ones: even though the latter encouraged the provision of additional non-given material, the former is less error-prone. A hypothesized structural guidance provided by more complex formats could not be confirmed. With respect to collaboration, the results highlight that arguing in groups leads to more cluttered argumentation maps, including a higher number of duplicate elements. The expected peer reviewing between group members did not occur. Yet, groups also tended to include more points of view in their arguments, leading to more elaborated argument maps.

© 2012 Elsevier Ltd. All rights reserved.

Keywords: Argumentation; CSCL; Visualization

1. Introduction

The ability to argue is essential in many aspects of life. Nevertheless, many people have problems engaging successfully in argumentation activities (Kuhn, 1991). This is not surprising, since argumentation involves multiple skills, including (according to Kuhn, 1991) (a) the skill to generate causal theories in order to support claims, (b) the skill to provide evidence to support the generated theories, (c) the skill to generate alternative theories, (d) the skill to imagine and discuss counterarguments to the existing theories, as well as (e) the skill to rebut alternative theories. Together, these skills are used to "[…] [produce] opinions accompanied by reasons in favor or against, in combination with questioning, clarification, explanation and acknowledgment" (Munneke et al., 2003, p. 115) in order to "[…] persuade or convince others that one's reasoning is more valid or appropriate" (Carr, 2003, p. 76). Thus, "an argument is regarded […] as a dialog between two (or more) people who hold opposing views. Each offers justification for his or her own view, and […] each attempts to rebut the other's view by means of counterargument" (Kuhn, 1993, p. 322).

The importance of argumentation skills has been widely recognized. Nevertheless, this importance is not mirrored in educational contexts. Here, two perspectives can be distinguished. On the one hand, the learning to argue perspective (von Aufschnaiter et al., 2008; Jonassen and Kim, 2010) aims at promoting the skills required to participate in argumentative processes in a reasonable way. On the other hand, the arguing to learn perspective (Andriessen et al., 2003; Andriessen, 2006; von Aufschnaiter et al., 2008; Jonassen and Kim, 2010; Osborne, 2010) focuses on the command of argumentation skills as an essential prerequisite for obtaining domain knowledge.


However, both of these educational perspectives are rarely found in modern education (Osborne, 2010), and their teaching is problematic, caused (among other factors) by teachers' limited time and availability: face-to-face tutoring is still the favored method of teaching argumentation, but it does not scale up well for larger groups. One approach to deal with these issues is the use of argumentation systems (cf. Scheuer et al., 2010, for an overview). These tools engage (groups of) students in argumentation by representing the argument in a graphical fashion (e.g., using a graph, table/matrix, or thread/tree) and allowing students to interact with this representation. By making the representation of an argument (an argument map), which is otherwise typically only an abstract entity in people's minds, explicit and shared, these systems enable discussions and are therefore helpful for the "arguing to learn" perspective stated above. An example of such a graphical representation of an argument is shown in Fig. 1.

The number of available tools to support the creation of argument representations is large. A recent overview of about 50 approaches is presented in Scheuer et al. (2010). Even though these tools share a common goal, that is, to support argumentation and argumentation learning, they differ in many respects. To illustrate these differences, we briefly describe three argumentation systems in the following: Athena (Rolf and Magnusson, 2002), Belvedere (Suthers et al., 1995; Suthers, 2003), and Digalo (Schwarz and Glassner, 2007; McLaren et al., 2010).

Belvedere (Suthers et al., 1995; Suthers, 2003) is a collaborative, educationally targeted argumentation tool for supporting scientific argumentation. Early versions of Belvedere (Suthers et al., 1995) were designed to engage secondary school children in complex scientific argumentation. By means of advisory guidance from an embedded intelligent tutoring system, students are supported in their argumentation and encouraged to self-reflect. The Belvedere system went through multiple revisions, and the focus of the system shifted from advisory guidance to representational guidance (Suthers, 2003), that is, guiding students' discourse by means of different argumentation interfaces.

Digalo (Schwarz and Glassner, 2007) follows a different approach from Belvedere. Instead of focusing on a domain-specific argumentation model, the underlying argument model is flexible. That is, the elements available to model the argument can be defined before the actual argumentation takes place. This way, Digalo is applicable to a larger set of argumentative problems than pre-defined systems such as Belvedere. Digalo was designed to be used in classrooms in groups of three to seven students. In order to assist the teacher, it can be connected to ARGUNAUT (McLaren et al., 2010), a tool that was developed to provide a moderator with additional awareness to supervise multiple ongoing discussions at once.

Whereas Digalo and (some versions of) Belvedere provide support for collaboration, Athena does not. Instead, it is used to argue on one's own (or together in front of one computer) and to compare the results afterwards.

Fig. 1. Graph-based visualization of an argument structure in LASAD showing a part of a legal argument based on a given transcript.


Even though Athena aims at supporting general argumentation, its approach to achieving this goal is completely different from Digalo's. While the latter provides flexible structural elements, Athena provides just a general contribution type ("node") which can be connected via "pro" and "con" relations. However, each element can be assigned a believability score to weight parts of the argument.

Even though these systems differ in details, they share a common underlying principle: all make use of (shared) representations for arguing. Thus, an approach that is used in all these systems is the modeling of arguments. However, the specific benefit of this method has been discussed controversially. On the one hand, there are studies that highlighted improved learning effects when applying argument modeling techniques, either with the help of argumentation tools (Easterday et al., 2007) or even without any tool, just using pencil and paper (Harrell, 2008). On the other hand, there are also studies (e.g., Carr, 2003) which reported no significant benefit of argument mapping techniques.

Nevertheless, argument mapping itself is only one side of the coin. The other side is the details in which existing systems and studies differ. With respect to shared representations, the main factors are the use of different collaboration settings and the different underlying models used to create a shared representation of an argument. One example of where argumentation systems differ is the intended usage mode. Some systems are designed to be used as a supporting means in face-to-face argumentation sessions, while others primarily target remote scenarios. Clearly, the role of the shared representation differs between these two use cases. In the former, the visual argument representation mainly serves to reify the results of the group's activities, while in remote scenarios the computer application (and the arguments represented therein) also has a communicative function. Both settings are practically relevant for education, and both have their advantages (e.g., face-to-face interaction is typically richer, but remote interaction allows for a better analysis of the group's interaction since all communication can be tracked). There have been studies concerning both the kind of collaboration (Munneke et al., 2003; Lund et al., 2007; Sampson and Clark, 2009) and the role of the argument model (Schwarz and Glassner, 2007; McAlister et al., 2004), yet their direct impact on the results of argumentation and on argumentation skill learning is still unclear.

On the collaboration side, Munneke et al. (2003) investigated the effect of constructing diagrams individually before a debate versus collaboratively during a debate. The results were mixed, but indicated that diagrams constructed individually before a debate were mainly used as an information source for the following debate, whereas diagrams constructed during a debate were used to summarize prior chats. In addition, the latter were mostly chaotic compared to the individual ones. Concerning the depth of the discussion, no significant difference could be identified between the two conditions. However, the individual preparation led to a broader discussion about topics.


Lund et al. (2007) investigated the role of diagrams used as a medium for debating as compared to diagrams used to represent a debate. Here, the main results were that the diagram as medium of debate promoted more opinions instead of arguments, whereas the diagrams that were used to represent a debate (presented in a chat) led to a deeper conceptual understanding of the topic. Nevertheless, there were fewer conflicting opinions present in the latter. Sampson and Clark (2009) evaluated whether groups of students craft better scientific arguments than individuals and how group work influences subsequent individual performance. Overall, the results showed no improvement in argument quality. However, a substantial proportion of the students in the collaborative sessions adopted the arguments created within the group in subsequent debates. In a follow-up transfer task, students who had worked in a group performed slightly better than students who had worked alone.

On the argument model side, Schwarz and Glassner (2007) investigated the effects of floor control, that is, a turn-taking technique to simulate asynchronous collaboration in a synchronous setting, and of informal ontologies, that is, the underlying argument model, on the argumentation outcomes. Their study results showed a significantly higher number of chat-style expressions in groups without floor control and ontology than in groups with both. A similar result was shown for the number of non-productive references, which indicates that the presence of floor control and ontology is beneficial. A similar approach was taken by McAlister et al. (2004). Instead of a graph-based visualization, they used an environment which mixed a chat and a thread-based visualization. Here, the ontology was represented in the form of sentence openers which were used to propose the starting part of an argumentation move. The results are in line with those of Schwarz and Glassner (2007), that is, the condition with sentence opener support outperformed the one without with respect to relevant contributions.

Overall, the empirical results available in the literature so far are not conclusive. The presented approaches and results seem to be highly dependent on the contexts in which they have been used, and existing studies about their effects (e.g., Janssen et al., 2010; Osborne, 2010; Sampson and Clark, 2009; Schwarz and Glassner, 2007; Schwarz et al., 2000; Suthers, 2003; Toth et al., 2002) are hardly comparable. The reasons are manifold, including different systems, domains, populations, tools, etc. Concerning collaboration, it is widely accepted that arguing in groups can be beneficial for learning (e.g., Osborne, 2010; Schwarz et al., 2000) and that, consequently, shared representations (as opposed to individually used representations) are a reasonable approach. However, even though collaboration has been shown to be effective for solving "highly intellective tasks" (Laughlin et al., 2006), reasonable collaboration does not occur naturally (Dillenbourg et al., 1995; Rummel and Spada, 2005), and unstructured collaborative argumentation per se will not lead to higher quality arguments (Sampson and Clark, 2009).


Instead, it is important to aid the process of collaboration, e.g., through visualizations (Suthers, 2003; Schwarz and Glassner, 2007), access restrictions (Schwarz and Glassner, 2007) or collaboration scripts (Stegmann et al., 2007). Concerning the underlying argument model and ontology (i.e., the specific representational form used for arguing in groups), the presented studies confirmed that the presence and the granularity of an argument ontology (and the visual representation that it defines) can indeed have an effect on the outcomes of argumentation. Yet, a concrete answer to the question of which argument elements are important in which domain (or to learn to argue in general) has not been systematically investigated so far; Suthers (2003) merely noted that a too detailed ontology may confuse students with "a plethora of choices" (p. 34).

In summary, even though there is some evidence that visual and structural argumentation scaffolds may guide learners to successful interaction, collaboration and success in learning, it is hard to come to general statements about their usefulness based on the existing studies, which differed in many aspects (multiple tools and various settings). In addition, possible interaction effects between the visualization, the kind of cooperation and structural restrictions in terms of specific argument ontologies have not been investigated systematically: do some argument visualizations and ontologies have benefits for individual and/or collaborative usage as compared to others?

In this paper, we want to take a step towards a deeper, more comparable evaluation of the representational factors that make educational argumentation systems (un-)successful. First, we describe the LASAD system, a flexible framework that can be configured with respect to collaboration mode and visual argument representation. This way, one tool can be used to investigate the effect of single factors. Then, we describe the use of this tool in a controlled lab study that investigates the impact of different structural scaffolds (i.e., different representations for arguments) and collaboration settings (i.e., comparing the effects of sharing such a representation vs. using it individually) on the outcomes of scientific argumentation in a remote setting.

2. A step towards comparability: the LASAD framework

One of the major problems of the existing research in the field of computer-supported argumentation is the lack of comparability between studies. Here, the vast number of available (different) software tools for computer-based argumentation plays a key role. As a recent review (Scheuer et al., 2010) revealed, tools differ in essential points such as the underlying ontology (the available elements to model an argument, e.g., nodes and relations), the visualization mode (e.g., graphs, threaded discussions, tables) or the support for collaboration (individual argumentation vs. (a)synchronous collaboration). Therefore, many of the influencing factors for how best to support argumentation in domains such as the law or science are still unclear.

As part of the LASAD (Learning to Argue: Generalized Support Across Domains) project, we wanted to provide a platform that can be configured to fit domain-specific needs and that can be used to conduct more comparable research. Therefore, we collected requirements that are common to most argumentation domains by means of a review of existing argumentation technology (Scheuer et al., 2010; Loll et al., 2011, 2012) as well as by conducting a survey among argumentation experts (Loll et al., 2010a, 2010b) and, based on this, designed a system that is capable of dealing with these requirements. In contrast to other configurable alternatives such as COFFEE (Belgiorno et al., 2008) and Digalo, LASAD is completely web-based and does not require any installation procedure: it can simply be opened in a web browser. Compared to COFFEE, LASAD is an integrated approach without software breaks in which different applications must be started. Digalo, in turn, is limited to ontology-level definitions; here, LASAD offers extended capabilities concerning the interface and process design.

In the remainder of this section, we give an overview of the most important framework features. More concretely, we present the flexibility of the system on three different levels: (1) visualizations, (2) structural definitions, and (3) cooperation. As an example, we illustrate how to configure the system to meet the needs of a controlled lab study dealing with individual/collaborative argumentation using different ontologies.

2.1. Architecture

The LASAD framework is built upon a classic layered architecture consisting of client, server and data layers. Each layer is only able to communicate with its direct neighbor layer via a well-defined interface. This way, it is possible to use and modify each layer independently of the others. In addition, each layer can be deployed on a separate machine, which is beneficial for load distribution. An abstract overview of the architecture is given in Fig. 2. In the following sections, we describe the role of each layer from bottom to top and show how, in this approach, different levels of flexibility are achieved.

Fig. 2. LASAD architecture.

2.1.1. The data layer

On the data level, the key to flexibility is the definition of argument primitives. In LASAD, it is possible to tailor the system on the data layer to fit domain-specific needs by means of configuration files. Among other things, it is possible to define users, argument ontologies and templates. Each user gets a role (e.g., moderator, teacher, or student) in connection with a set of rights (e.g., read, modify, and create arguments; give feedback; highlight elements). The ontology holds the information about the available elements, i.e., the kinds of elements (e.g., node and relation) as well as their meta-information, i.e., additional visualization information including the number of child-elements (e.g., text field, awareness information, internal and external references, etc.).
Templates define the available elements of the user interface, such as a chat window, a list of active users or the presence of a given text. A complete overview of the available configuration options is given in Table 1. Based on a template, concrete argument map instances can be created.

In addition to the configuration of the system, the data layer provides resources for analyzing argumentation processes (including state-based and action-based logs). By means of the action-based logging, the system is able to recreate any step of an argumentation process. This implies that a replay of actions over time is possible and that the argumentation process can be evaluated in detail. There is also an option to save (and load) the current state of the argumentation. Even though these logging mechanisms are flexible enough for the LASAD framework to deal with requirements typical for argumentation in general, not all systems are able to deal with the format. Therefore, it is possible to plug in additional logging mechanisms that take the actions generated in LASAD and generate their own format. In the current version of the system, a "Common Format logging" (Schwarz and Glassner, 2007) is available. A concrete example of how to configure LASAD on the data level (in order to use the system with different types of argument representations) is shown in Section 3.3 as well as in the Appendix.
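As an illustration of the user and role definitions described at the beginning of this subsection, a configuration entry might look as follows. This is a purely hypothetical sketch: the tag and attribute names are ours and may deviate from the actual LASAD configuration schema.

  <!-- Hypothetical sketch of a user/role definition; the actual LASAD schema may differ. -->
  <users>
    <role name="teacher" rights="read,modify,create,delete,give-feedback,highlight"/>
    <role name="student" rights="read,modify,create"/>
    <user name="Bob" role="student"/>
  </users>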

2.2. The server layer

On the second layer, all incoming requests and actions from the connected clients are managed, i.e., requests are answered based on the information in the data layer, and actions are distributed to the other clients that are working on the same argument. The server layer is therefore responsible for concurrency and access control. The server's contribution to the overall flexibility of the system is the set of available interfaces for communicating with the system. In the current version, it is possible to communicate either via Java Remote Method Invocation or (more platform-independently) via Web Services. Thus, the server layer (in combination with the definitions in the data layer) provides support for different kinds of cooperation.


Table 1
Overview of LASAD configuration options.

User
  Role: Each role (such as moderator, teacher, or student) is connected to a set of rights (e.g., read, modify, create, or delete arguments).

Template
  Chat: Enable or disable a simple text chat among users.
  List of users: Enable or disable a list of the users that are actively working on an argument map.
  Number of users: Define the maximum number of active users on the map; this makes it possible to enable or disable collaboration.
  Feedback: Enable support for showing feedback from teachers or artificial intelligence clients.
  Element's details: Enable or disable an additional window for each element on the map (i.e., nodes and relations) to show possibly hidden child-elements, such as additional notes that should not appear on the map directly.
  Cursor tracking: Enable or disable the submission of each user's cursor position to every other user who is actively participating in the argumentation on the map, for additional awareness.
  Transcript: Provide a text that can be used for internal micro-references between the text and argument parts.

Ontology
  Elements: Define the set of available elements to model the argument. These elements can be either nodes or relations. Each element's visualization can be specified more concretely, including size, element border, and colors.
  Child-elements: Define the number and kind of child-elements of an element. A child-element can be a (labeled) text field, an internal reference, an external reference to a web page, an awareness element which shows who created the element, a (labeled) rating element to assign scores such as believability to an element, or an element that allows choosing from a given set of options (e.g., proof standards (Gordon et al., 2007)).

2.2.1. The client layer

The top level, i.e., the client layer, is the "window" to the system. Here, users can argue with a graphical user interface that allows them to create and modify graphical representations of an argument as well as to communicate with others. However, the client layer is not restricted to human users; artificial intelligence (AI) clients can also be added. The latter may be used to point users to possible weaknesses in their argument structure based on analyses of the argument map (an overview of feedback techniques for argumentation support is given in Scheuer et al. (2010, 2011)). To guarantee flexibility, each client is technically able to perform the same actions, i.e., even an AI client may modify the argument if desired. The different clients may not only represent different roles (such as human user or artificial intelligence feedback agent), but can also provide different views on the same logical argument. This way, flexibility on the representation and visualization level is achieved. In the next section, we show how different views on the same data can be realized in LASAD and how interoperability between them is achieved.

2.3. An example of flexibility: argument visualizations

One of the most important challenges for argumentation systems is the translation of abstract arguments that exist only in people's minds into a shared understanding that is visible to all participants (Kirschner et al., 2003). To do so, there have been multiple attempts at representations, including graphs, tables/matrices and threaded or linear discussions. While each of these visualizations comes along with a set of advantages and disadvantages (cf. Scheuer et al., 2010, for an overview), it may be beneficial to combine multiple visualizations of the same data, as all visualizations offer different kinds of guidance for argumentation.

Even a comparison of just a graph- and a table-based visualization reveals these differences at first glance. On the one hand, a graph-based visualization, as shown in Fig. 1, is intuitive to most users (Suthers et al., 1995; van Gelder, 2003). It is highly expressive and, hence, gives a good overview of the argument structure and may guide initial brainstorming and discussions. Further, it is the favored method of visualizing arguments in modern argumentation systems and, hence, the default view in LASAD. However, when a graph grows, it is hard to see the temporal sequence of the argument moves, and the overview of the content decreases rapidly, as graphs tend to lead to "spaghetti" images which consume a lot of space (Hair, 1991; Loui et al., 1997). On the other hand, a table-based visualization, as shown in Fig. 3, allows the relations between argument parts to be investigated systematically, and possibly missing relations can be recognized easily; this becomes important when an argument evolves over time and it gets increasingly easy to overlook missing relations (Suthers, 2003). Even though this might be beneficial in specific situations, it may tempt the arguer to add unnecessary relations. Also, it is an uncommon (non-intuitive) way of stating arguments, and it is limited in expressiveness, as only two types of elements (row, column) are supported. For instance, relations between relations are not possible without additional effort.

In the LASAD framework, the client determines the visualization used. Technically, the argumentation process is distributed via actions to all connected clients. These actions include a command, a category, and a set of parameters.


Fig. 3. Table-based visualization of an argument structure in LASAD showing a part of a legal argument based on a given transcript.

This data contains all the information that is necessary to keep the view instances synchronized. An example of such an action is the following:

  Action:
    Command:   CREATE-ELEMENT
    Category:  MAP
    Parameter: MAP-ID:
    Parameter: USERNAME: Bob
    Parameter: TYPE: box
    Parameter: ELEMENT-ID: fact
    Parameter: POS-X: 87
    Parameter: POS-Y: 79
    Parameter: ID: 67

In this example, the action creates an element (CREATE-ELEMENT) and contains a unique ID for the created element. Each child element of this element will then have an additional parameter PARENT, which creates a logical link between the two elements. Furthermore, the action defines on which argument map the element is created (MAP-ID) and which kind of element it is (ELEMENT-ID, which is specified in the server-side ontology). In addition, there is the author of the element (USERNAME) as well as some visualization information, such as the position of the element. Based on this information, each client is able to generate a view of all elements.
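For illustration, a follow-up action that creates a text-field child of the box above might then carry the PARENT parameter described above. All values in this sketch are hypothetical and merely follow the same notation:

  Action:
    Command:   CREATE-ELEMENT
    Category:  MAP
    Parameter: MAP-ID:
    Parameter: USERNAME: Bob
    Parameter: TYPE: textfield
    Parameter: PARENT: 67
    Parameter: ID: 68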

Data that is unnecessary for a specific visualization (such as the position information for the table view) may be ignored, and visualization-specific differences are applied. In this example, the element type "box" is represented as a freely movable contribution in Fig. 1, whereas it is part of the first row and column in the table view. Another example is the relation, which is drawn as a curve in the graph visualization, whereas it is just a cell in the table view.

There can be even more abstract argument representations than graphs or tables, such as the mobile device representation shown in Fig. 4, which makes use of list structures to fit the needs of the mobile device. In addition to the graphical visualizations, there can be entirely logical representations of the received data (as shown in Listing 1 in the Appendix), which can be used, e.g., by analysis and feedback agents or for exchange between clients. The remainder of this paper (especially the study) deals with graph-based argument representations. We will describe a study in which various representations (differing in their primitive types) were compared.

3. Study

Inspired by prior research results on educational argumentation technologies (cf. above), our goal was to investigate the (interaction) effects of using different argument representations in individual as well as collaborative argumentation on the outcomes of scientific argumentation and on the learning effects with respect to argumentation skills and domain-specific knowledge.


Fig. 4. List-based visualization for mobile devices of an argument structure in LASAD showing an overview of map elements (left) and the content of a concrete element (right).

While prior research results suggest that argument models and ontologies may influence the process of argumentation (e.g., Suthers, 2003), we wanted to systematically evaluate the effects of various argument ontologies (all with corresponding graph-based visualizations) used for the same task. A second goal was to investigate how individual and collaborative argumentation (i.e., sharing representations vs. using them individually) differ, also in the context of different argument ontologies/representations, in order to identify possibilities to better aid argumentative processes.

3.1. Hypotheses

The hypotheses that we wanted to evaluate in this study were divided into three categories: effects of collaboration, effects of argument ontologies, and interaction effects.

3.1.1. Effects of collaboration

C1. Arguing in groups (as opposed to constructing arguments individually) will lead to a more elaborated argument, i.e., an argument of higher quality, due to the different points of view, including the multiple theories and opinions of the participants. In this context, we see the quality of arguments as a direct result of the successful application of argumentation skills (see Introduction). That is, an argument is of high quality if individual points of view are well grounded and alternative positions are considered and evaluated by means of adequate facts. Thus, a more elaborated argument would typically include a higher number of reasoned contributions.

C2. In collaborative argumentation activities, students will be more motivated than in individual ones. We hypothesize this because discussions with other arguers will lead to a greater variation of the task steps and, hence, to a less monotonous activity. Prior results by Pinkwart et al. (2008) highlighted the importance of motivation for promoting good learning results in argumentation activities.

C3. In collaborative sessions, the participation of single users may drop as compared to individual argumentation sessions: shy arguers may stop arguing against a dominant, leading group member.

C4. Collaborative argumentation will lead to more off-topic activities. Prior results by Schwarz and Glassner (2007) show that groups tend to get distracted from tasks, which can be detrimental for the overall argumentation process.

C5. Group members will review and respond to each other's arguments, and, hence, the overall number of mistakes will decrease in comparison to individual activities. We hypothesize this because argumentation is not a trivial undertaking: users may overlook their own mistakes, and, by discussing parts of the argument, typical mistakes of single users may be revealed and corrected.

3.1.2. Effects of argument ontologies and representations

O1. The higher the structural degree of an argument ontology (i.e., the more primitives for representation are available), the more structured we expect the resulting argument map to be. This is a direct consequence if the ontology is used correctly, and it is supported by the findings of Schwarz and Glassner (2007).

O2. The more detailed an argument ontology is, the more elaborated the resulting argument will be. The rationale for this hypothesis is that we expect the multiple elements of detailed ontologies to prompt the users to make use of them and, hence, to think about how to fill them with appropriate materials. Clark and Brennan (1991) also noted that it is easier to refer to knowledge units which have a visual manifestation, so that the presence of various different representational primitives may lead to more discussions and, consequently, to a more detailed resulting argument. When considering the existing argumentation systems, this would imply that the very general argument model provided by Athena ("nodes" connected via "pro" and "con" relations) would be inferior to a Toulmin-based argument model, which explicitly contains structures for more elaborated concepts such as "datum", "warrant" and "rebuttal" (see Section 3.2 for a detailed description of the Toulmin argument model).

3.1.3. Interaction effects

I1. For group argumentation, we hypothesize that the used ontology will influence the degree of collaboration: a more complex ontology may increase the need for collaboration (in order to discuss how to use the different elements to build an argument representation).

I2. In collaborative sessions, highly structured argument ontologies may be detrimental to the quality of the resulting argument (due to the double complexity of keeping track of the group process and using a complicated argument model at the same time), while the scaffolds that more structured ontologies provide may be more helpful in individual usage.

3.2. Design

To investigate the hypotheses, a mixed 3 × 2 design was used. The between-subject factor was the argument ontology. Here, the following three ontologies were used:

1. A simple domain-independent ontology consisting of a general contribution type ("contribution") and three different relation types ("pro", "contra", "undefined"). This means that, in the argumentation tool, the users had one box type and three link types at their disposal for creating argument representations.

2. A second, highly structured, domain-independent ontology based on the Toulmin argumentation scheme (Toulmin, 2003). It consists of five contribution types ("datum", "conclusion", "warrant", "backing", "rebuttal"), so five box types were available for creating argument graphs, and four different relations ("qualifier", "on account of", "unless", "since"), which correspond to four link types in the representation.

3. A domain-specific ontology inspired by the Belvedere (Suthers, 2003) ontology, consisting of three contribution types ("hypothesis", "fact", "undefined") and three relation types ("pro", "contra", "undefined"). This ontology has been shown to be effective for scientific argumentation.

The within-subject factor in the study was collaboration. Each participant was required to argue about one topic on his or her own and about another topic in a group of three. To eliminate possible confounds, we used counterbalancing so that half of the participants began with the group phase while the other half began with the single-user phase. In the group phase, each participant worked on one machine. The participants were only allowed to communicate via a chat tool integrated in the argumentation framework. This simulated a remote discussion even though the users were actually located in the same room (the experimenter was in this room to enforce the rule). Overall, the study took 6 h per user, including a 1 h break between the two sessions.

3.3. Configuring LASAD for the study

For the study design, six different system configurations were required. More concretely, we needed three different argument ontologies, i.e., definitions of the elements available to model the arguments, with two collaboration settings (individual and synchronous collaboration) each. A part of an argument map created via LASAD during the study presented in this paper is shown in Fig. 5. The underlying argument ontology is the simple domain-independent ontology consisting of a general contribution type ("contribution") as well as three different relation types ("pro", "contra", "undefined"), of which the last is not present in the figure.

Fig. 5. Part of an argument map created during the study via LASAD using a domain-independent ontology with general contributions, showing arguments for and against the use of biofuel.

To configure the system to offer these elements, we used the XML configuration file shown (in parts) in Listing 2 in the Appendix.
The XML definition exemplifies the configuration of ontology elements in the LASAD system. In the definition, there are two elements: (1) the general contribution type (marked in red) and (2) the "pro" relation (marked in blue). Each of these elements has two children (defined between the <childelement> tags): (A) a text field, and (B) an awareness element. Whereas the text field is used to enter the concrete argument content, the awareness element shows who created the element. By means of the XML attributes minquantity, maxquantity and quantity, the definition specifies how often the child appears in the element at least (minquantity), at most (maxquantity), and on creation of the element (quantity). In this concrete example, the text area in the contribution type is present exactly once, whereas the text area in the relation type ("pro") is present at most once, but not by default. It can thus be added at runtime (in the used view, this is done via a plus button which appears when the user moves the cursor over the title of the relation). In addition to these general element definitions, there are optional visualization parameters that can be used by the client. While the graph-based view of the web-based client uses this additional information, it can be ignored by other clients. An example here is a table visualization, which does not have any edges: a table-based view will ignore the line-width and line-color attributes.

Once the ontology is defined, there can be multiple template definitions using it. The template definitions are used to enable or disable user interface elements as listed in Table 1. For the collaborative sessions of our study, we used the template definition given in Listing 3 in the Appendix to configure the user interface appropriately. In this definition, we enabled the chat window (chatsystem="true"), the list of active users (listofusers="true") and the distribution of cursor movement information to other clients (track-cursor="true"), i.e., each user was able to see the cursors of the other users during the argumentation as awareness information. In addition, we disabled all other interface elements listed in Table 1 and set the limit of active users in the argumentation to four (three user slots for the participants of the study and one for the observing experimenter). Based on the template definition, we created the four map instances that followed these definitions. The definitions for the other conditions are analogous.

In summary, thanks to the system configuration options, we did not have to use different argumentation systems (which may have come along with a vast set of external confounding factors that could have influenced the results of the study) for conducting the study, but were able to easily specify the ontology, representation and collaboration parameters. No development efforts for different systems were required.
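To give a flavor of such definitions without reproducing the full listings, the following minimal sketch combines an ontology element definition in the spirit of Listing 2 with a template definition in the spirit of Listing 3. The attribute names childelement, minquantity, maxquantity, quantity, line-width, line-color, chatsystem, listofusers and track-cursor follow the description above; the surrounding structure and all concrete values are our own illustration and may deviate from the actual LASAD schema.

  <!-- Hypothetical sketch; cf. Listings 2 and 3 in the Appendix for the actual format. -->
  <ontology>
    <element id="contribution" type="box">
      <!-- exactly one text field, present on creation -->
      <childelement type="textfield" minquantity="1" maxquantity="1" quantity="1"/>
      <!-- awareness element showing who created the contribution -->
      <childelement type="awareness" minquantity="1" maxquantity="1" quantity="1"/>
    </element>
    <element id="pro" type="relation" line-width="2" line-color="#00aa00">
      <!-- text area absent by default (quantity="0"), can be added once at runtime -->
      <childelement type="textfield" minquantity="0" maxquantity="1" quantity="0"/>
    </element>
  </ontology>

  <template chatsystem="true" listofusers="true" track-cursor="true" max-users="4">
    <!-- all other interface options from Table 1 disabled -->
  </template>

A table-based client would simply ignore the line-width and line-color hints, as described above.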

3.4. Study tasks

Each participant worked on two open scientific problems that have no obvious solution. This kind of task choice was motivated by Toth et al. (2002), who used challenging science problems to simulate an authentic argumentation activity, avoiding the demotivation of students caused by hiding the answers to already solved questions. The results of Schwarz et al. (2000) support this decision: their findings indicate that argumentation is most effective if students are arguing under uncertainty. In our study, the concrete topics for the arguments were:

1. The potential of alternative drive concepts for automotives (including the electric car, the fuel cell, and biofuel).
2. The German energy mixture in 2030 (including nuclear power, fossil fuel, and renewable resources).

For each topic, three different possible positions were prepared. To allow all participants to argue for or against any of these positions, the students were provided with two pages of background information per position. This material, given in the form of material chunks (graphs, tables as well as plain text), was typical of scientific argumentation, including facts, examples, statistical data and observations. In addition, there was one page containing background material that was common to all positions. The participants were explicitly allowed to go beyond the given material in their arguments.

Each session about a topic was split into four time slots of 30 min each. In each of the first three slots, the participants were given the background material for one of the three positions (e.g., "nuclear power as a future energy") as well as the common materials, and they were asked to create an argument about this position using the LASAD system. The fourth time slot was used to integrate the three separate positions and to draw a final conclusion to solve the argumentation task. For this last step, the participants were given the materials for all positions again.

3.5. Participants and training

Overall, 36 graduate students (25 male, 11 female) with different majors participated in the study. They were between 19 and 35 years old (m = 24.64, sd = 3.68) and in semesters 1 to 22 (m = 7.00, sd = 5.62). All participants were either native German speakers or fluent in the language (the complete study was conducted in German). Participation was voluntary, and all participants were paid for completing the study. The participants were assigned randomly to the three "ontology" conditions, i.e., in each condition there were four groups consisting of three students each. All but one group included one female student. None of the participants had used the argumentation system before. Thus, a short video introduction (15 min) to the LASAD system was shown to make sure that all participants had the same basis. All videos consisted of three parts: (1) a general introduction on how to interact with the system, (2) an overview of supporting features for working in groups (e.g., chat, cursor tracking), and (3) an ontology-dependent part in which the condition-dependent features of the system were explained using an example common to all conditions.


Finally, the example argument that was presented in the video was distributed to all participants on paper and was available during the complete study.

3.6. Tests and interviews

To test the learning effects caused by the use of the argumentation tool, three multiple-choice tests on argumentation abilities as well as two multiple-choice knowledge tests per topic were used. The tasks of the argumentation tests were taken from a list of questions of the Law School Admission Test (LSAT). Each argumentation ability test consisted of four questions, two from the area of logical reasoning and two from the area of analytical reasoning. These tests took place before the first session, between the two sessions, and after the second session. The order of the tests was counterbalanced. The participants were given 6 min (1.5 min per question) per argumentation test. The knowledge tests were centered on the domains of argumentation in the respective study sessions (automotive concepts and energy mix). They were administered immediately before and after the corresponding sessions (in a counterbalanced manner) to measure domain learning. The participants were given 4 min (1 min per multiple-choice question) per knowledge test.

In addition to these two tests, a questionnaire was used to evaluate the usability of the overall LASAD argumentation system. By means of this test, we wanted to check whether certain features of the system might have hindered the students from engaging in reasonable argumentation, especially since this was the first larger study with the LASAD system. Here, the standardized System Usability Scale (Brooke, 1996), which has been shown to be an accepted measure of usability (Bangor et al., 2008, 2009), was used. Finally, we asked the participants in an open interview about their motivation during the study sessions and about potential problems and ideas for future improvements of the system.

3.7. Coding procedure

The material distributed to the participants consisted of unconnected information chunks, including relevant as well as non-relevant parts. To be able to check how much of the relevant material was used, three domain experts independently created a list of all the facts that could either be taken directly from the material or directly concluded from a combination of multiple information chunks. These lists were merged and discussed; the resulting lists (containing 81 entries for topic 1 and 75 for topic 2) were used as a reference for the relevant information that can be extracted from the hand-out material.

To get further insights into the resulting argument maps, 6 of the 48 maps (one individual map and one collaborative map for each ontology, i.e., 12.5% of all the maps) were randomly chosen and coded element-wise independently by two coders with respect to the use of given material. For each element (boxes and relations) in a diagram, the coders checked whether the contained information was based on a fact in the "reference list" (cf. above) or whether it was a completely new contribution. The coders also rated the correctness of the used ontology elements (if, for instance, a fact element was actually used to represent a fact). To judge the structural quality of an argument map, the coders additionally checked for each of the six chosen maps whether the map contained (a) a starting hypothesis, (b) a conclusion and (c) a clear grouping of the different positions. Based on these coding results, the inter-rater reliability was calculated, resulting in a Cohen's κ of 0.60 for the material used and 0.61 for the elements used. Concerning the general structural features (a-c), both coders agreed 100% on each measure. Taking into account the ill-defined nature of argumentation (Lynch et al., 2010), we considered this level of agreement as acceptable overall. The remaining elements were then coded by one coder in the same manner as described above. Overall, 5477 elements were manually coded this way.

To measure the degree of coordination, the chat messages were also coded. First, the chats (consisting of 878 messages) were divided independently by two coders into episodes that belong together, e.g., a discussion about where to start with argument modeling. Slight remaining differences were resolved via discussion between the coders. This resulted in an overall number of 196 chat episodes. Based on the chat episodes of three sessions (one per ontology, i.e., 25% of all material), the following four categories were agreed on as a coding scheme for the chat episodes: (1) content, (2) structure, (3) coordination, (4) off-topic. Based on this coding scheme, each chat episode within the 12 collaborative sessions was independently coded by two raters. The raters achieved a moderate Cohen's κ of 0.56. However, it turned out that the categories "structure" and "coordination" were often not clearly distinguishable, so these two categories were merged into one, which resulted in a high κ of 0.76. The raters resolved remaining conflicts through discussion.

4. Results

4.1. System usability

This study was the first one conducted with the LASAD framework. Therefore, we were interested in the general usability of the system, in order to rule out detrimental influences on the other outcomes of the study. The SUS test resulted in a mean score of 81.46 (which is similar to a "B" grade). The concrete questions of the test are listed in Table 2 (scale: 1 = strongly disagree to 5 = strongly agree); the results are shown in Fig. 6. It has to be noted that some students had problems with question 6 (not understanding what was meant by inconsistency) and needed further explanation.
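For reference, a participant's SUS score is computed from the ten item ratings s_1, ..., s_10 (each on the 1-5 scale above) as

\mathrm{SUS} = 2.5 \left( \sum_{i \in \{1,3,5,7,9\}} (s_i - 1) + \sum_{i \in \{2,4,6,8,10\}} (5 - s_i) \right)

which maps the positively worded odd items and the negatively worded even items symmetrically onto a 0-100 range.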


Table 2
Standardized questions of the System Usability Scale (Brooke, 1996).

1. I think that I would like to use this product frequently
2. I found the product unnecessarily complex
3. I thought the product was easy to use
4. I think that I would need the support of a technical person to be able to use this product
5. I found the various functions in the product were well integrated
6. I thought there was too much inconsistency in this product
7. I imagine that most people would learn to use this product very quickly
8. I found the product very awkward to use
9. I felt very confident using the product
10. I needed to learn a lot of things before I could get going with this product

Fig. 6. Results of the System Usability Scale test.

The results indicate that the LASAD framework used in this study was perceived as an adequate tool to support argumentation. Questions 2, 3, 4 and 10 highlight the ease of use of the system. In combination with the 15 min tutorial video, this also confirms prior results reported by van Gelder (2003) indicating that the box-and-arrow/graph argument representation is, overall, intuitive to most users. Thus, we have no reason to expect a detrimental influence of the system on the outcomes of the study.

4.2. Overall effects on argumentation abilities and domain knowledge

Based on the scores of the argumentation ability tests (m(T1) = 1.611, sd = 1.02; m(T2) = 2.056, sd = 0.89; m(T3) = 2.083, sd = 1.08; scale ranging from 0 to 4 points), a repeated measures ANOVA was calculated. This showed no statistically significant gains in argumentation skills, but a tendency (F(2, 66) = 2.907, p = 0.062). The between-subject factor "ontology" did not cause a significant effect (F(2, 33) = 0.745, p = 0.483).

Regarding domain knowledge, a significant gain between pre-/post-test scores was consistently achieved. In topic 1 (Potential of Alternative Drive Concepts for Automotives), the pre-test resulted in m = 0.92 (sd = 0.77), whereas the post-test resulted in m = 2.97 (sd = 0.88; based on a paired samples t-test: t(35) = -10.330, p < 0.001; scale ranging from 0 to 4 points). In topic 2 (The German Energy Mixture in 2030), the pre-test resulted in m = 2.31 (sd = 1.04), whereas the post-test resulted in m = 3.42 (sd = 0.84; based on a paired samples t-test: t(35) = -5.976, p < 0.001). Concerning the gain of domain knowledge, there was neither a significant difference between individual/collaborative use of the system nor between the different ontologies.

4.3. Effects of collaboration on the argumentation outcome

An ANOVA highlighted significant differences between individual and collaborative argument maps, as shown in Table 3. In comparison, collaborative argument maps contained a larger number of elements (i.e., boxes and relations between them) overall (F(1, 46) = 18.954, p < 0.001) and a higher percentage of material used twice for the same position without additional benefit for the overall argument (F(1, 46) = 6.983, p = 0.011). If material was used twice, but for different angles of the argument, it was not counted as a duplicate. Contrary to our expectations, the percentage of given material used did not differ significantly between individual and collaborative argumentation (F(1, 46) = 0.932, p = 0.339). Yet, group members provided significantly more own contributions (i.e., contributions not derived from given material) than individual arguers (F(1, 46) = 13.524, p < 0.001). Hypothesis C5 (groups will review the work of their members and, hence, make fewer mistakes), measured by the percentage of wrongly used elements, has to be rejected (F(1, 46) = 0.956, p = 0.333). In fact, mistakes made in the group phases were often very similar to those made in the individual phases (e.g., wrong directions of relations). Thus, hypothesis C1 (group work → higher quality) is only partially supported.

To measure the motivation of the participants, we analyzed the statements in the personal interviews conducted after the study. Here, all groups agreed (after short discussions) that working in groups was more motivating than working alone (hypothesis C2). This is supported by the observations of the experimenter, who stated that the participants in the individual sessions sometimes made a bored impression, as opposed to the collaborative sessions. Also, the groups always used all of the available time for their tasks, while some individuals finished early.


Table 3
Comparison between individual and collaborative argument maps.

                                                               Individual (n = 36)      Collaborative (n = 12)
Overall number of used elements in the workspace               m = 104.00 (sd = 29.72)  m = 143.25 (sd = 15.80)
Number of own contributions (not derived from given material)  m = 9.97 (sd = 5.43)     m = 18.58 (sd = 10.61)
Percentage of material used twice                              m = 5.03% (sd = 4.23)    m = 9.14% (sd = 5.83)
Percentage of erroneously used elements                        m = 22.46% (sd = 18.80)  m = 28.83% (sd = 21.73)

Table 4
Overview of average chat episodes per ontology in multi-user maps.

Group map no.  Ontology  No. of content episodes  No. of structure and coordination episodes  No. of off-topic episodes  Overall
1              Simple    6 (33.3%)                12 (66.7%)                                  0 (0.0%)                   18
2              Simple    3 (33.3%)                4 (44.4%)                                   2 (22.2%)                  9
3              Simple    2 (100.0%)               0 (0.0%)                                    0 (0.0%)                   2
4              Simple    14 (51.9%)               13 (48.1%)                                  0 (0.0%)                   27
m (Simple)               6.25 (44.6%)             7.25 (51.8%)                                0.5 (3.6%)                 14
5              Toulmin   5 (38.5%)                8 (61.5%)                                   0 (0.0%)                   13
6              Toulmin   4 (11.8%)                21 (61.8%)                                  9 (26.5%)                  34
7              Toulmin   5 (83.3%)                1 (16.7%)                                   0 (0.0%)                   6
8              Toulmin   6 (46.2%)                6 (46.2%)                                   1 (7.7%)                   13
m (Toulmin)              5.0 (30.3%)              9.0 (54.5%)                                 2.5 (15.2%)                16.5
9              Specific  6 (31.6%)                13 (68.4%)                                  0 (0.0%)                   19
10             Specific  1 (4.2%)                 20 (83.3%)                                  3 (12.5%)                  24
11             Specific  6 (42.9%)                6 (42.9%)                                   2 (14.3%)                  14
12             Specific  6 (37.5%)                8 (50.0%)                                   2 (12.5%)                  16
m (Specific)             4.75 (26.0%)             11.75 (64.4%)                               1.75 (9.6%)                18.25
m (Overall)              5.33 (32.8%)             9.33 (57.4%)                                1.58 (9.7%)                16.25

These observations can only be seen as indicators with limited validity. They are not conclusive because facial expressions as observed by the experimenter cannot always be categorized easily (Ekman, 1993) and because the required task time depends on context factors such as group size and coordination demands. Among our study participants, the question of the optimal group size for argumentation was discussed controversially. The majority agreed on two to three people arguing together: larger groups and the resulting growing need for coordination were seen as potentially detrimental to the overall results.

Hypothesis C3 (collaboration → participation drop of single users) is not easy to evaluate. We sought to investigate whether users, when working together, became less active. To do so, we first computed the proportion of elements of each user in the collaborative sessions (min = 0.16, max = 0.59, m = 0.33, sd = 0.12)—i.e., single users created between 16% and 59% of a collaborative map. Apparently, there were thus no "drop-outs" and no dominating users creating a whole map alone. To represent how active a user is in individual sessions (as compared to his peers), we also computed, for each user, the proportion of the argument elements in his individual session to the sum of elements of all individual maps of his group members (min = 0.19, max = 0.51, m = 0.33, sd = 0.07). These two values resulted in a significant Pearson correlation

of r = 0.428 (p = 0.009). Thus, hypothesis C3 can be rejected: users who are generally (in)active in individual sessions exhibit the same attitude in collaborative sessions.

The hypothesis that working in groups might lead to a large amount of off-topic talk (hypothesis C4), an aspect found to be critical in prior studies on remote argumentation (Schwarz and Glassner, 2007), could not be confirmed in our study. In the argument maps, there were in fact no noteworthy off-topic contributions (i.e., argument parts that contained either any form of discussion or points that were completely irrelevant for the argument) at all. Also in the chat, the extent of off-topic talk was relatively small (cf. Table 4). However, in interpreting these results, one should consider that the chat tool made it somewhat harder to engage in chat than, e.g., an audio channel would have. One would expect people not to engage in off-topic talk when it takes effort to do so. This means that from our study, no conclusions about off-topic talk can be drawn for other situations where the "cost" of off-topic conversation is lower.
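The correlation underlying this check can be computed directly; in the following Python sketch the per-user activity proportions are invented placeholders, not the study's values.

# Sketch of the hypothesis C3 check: correlate each user's share of the
# collaborative map with their relative activity in the individual sessions.
from scipy import stats

share_collaborative = [0.16, 0.30, 0.54, 0.33, 0.33, 0.34, 0.25, 0.41, 0.34]
share_individual    = [0.19, 0.28, 0.51, 0.35, 0.31, 0.34, 0.27, 0.39, 0.36]
r, p = stats.pearsonr(share_collaborative, share_individual)
print(f"r = {r:.3f}, p = {p:.3f}")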


4.4. The effects of ontology on the argumentation outcome

Based on the structural assessment of the maps (with respect to starting hypothesis, conclusion and clear grouping), no significant difference between the ontology conditions could be identified; hence, hypothesis O1 (higher structural degree of the ontology → improved structure of the argument) has to be rejected. However, the Toulmin-based ontology had to be excluded from this analysis, because it follows a different model of argumentation (beginning with data and then drawing a conclusion) and there is no explicit hypothesis element in the ontology. A difference between ontologies was found in the percentage of wrongly used elements, e.g., using a hypothesis box to represent a fact or ignoring the direction of a pro relation (F(2, 45) = 18.082, p < 0.001). A post-hoc Tukey HSD test indicated a significantly higher error rate (shown in Table 5) in the Toulmin condition than in the others (p < 0.001 for Toulmin vs. Simple and p < 0.001 for Toulmin vs. Specific).

Hypothesis O2 (detailed ontology → elaborated arguments) could be confirmed partly. An ANOVA showed no significant differences between ontologies with respect to the percentage of given material being used (F(2, 45) = 1.909, p = 0.160). However, a non-parametric Kruskal–Wallis test indicated that the number of own contributions (not derived from given material) did differ significantly (p = 0.034), as shown in Fig. 7.

Table 5
Overview of wrongly used ontology elements.

Ontology    Average percentage of wrongly used elements
Simple      m=10.25% (sd=10.80)
Toulmin     m=41.29% (sd=16.48)
Specific    m=20.63% (sd=16.57)

Fig. 7. Differences in own contributions (not derived from given material) between ontologies.
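For completeness, the two ontology comparisons in this subsection could be run along the following lines in Python; the raw per-map values are invented placeholders (only the summary statistics in Table 5 and Fig. 7 are reported in the study).

# Sketch: post-hoc Tukey HSD on error rates and a Kruskal-Wallis test
# on own contributions across the three ontology conditions.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

errors   = np.array([8, 12, 10, 11, 40, 45, 38, 42, 18, 22, 20, 23], dtype=float)
ontology = np.array(["Simple"] * 4 + ["Toulmin"] * 4 + ["Specific"] * 4)
print(pairwise_tukeyhsd(errors, ontology))  # pairwise condition comparisons

own_simple, own_toulmin, own_specific = [8, 9, 7, 10], [15, 18, 20, 14], [10, 12, 9, 11]
h, p = stats.kruskal(own_simple, own_toulmin, own_specific)
print(f"Kruskal-Wallis: H = {h:.3f}, p = {p:.3f}")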

4.5. Interaction effects

Regarding hypothesis I1 (ontology → influence on the degree of collaboration), we analyzed the number of relations between elements of different authors relative to the overall number of links as an indicator of the degree of collaboration (since this reflects the inter-relatedness of contributions from different users). An ANOVA did not reveal any significant difference between ontologies (F(2, 9) = 1.689, p = 0.238). Thus, the hypothesis could not be confirmed. Similarly, a comparison of the number of chat messages across ontology conditions did not show any significant differences either (content episodes: F(2, 9) = 0.212, p = 0.813; structure and coordination episodes: F(2, 9) = 0.408, p = 0.676; off-topic episodes: F(2, 9) = 0.568, p = 0.586). This comparison of the chat messages also speaks to hypothesis I2 (a highly structured ontology will be detrimental to collaborative argumentation), showing that the amount of coordination of structure and activities needed does not depend on the complexity of the argument ontology. In addition, there was no significant interaction effect between individual/group argumentation and the ontology in terms of the number of erroneously used elements (F(2, 42) = 0.605, p = 0.551). As such, I2 has to be rejected.
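To make the collaboration indicator concrete, the following Python sketch computes the share of links connecting contributions of different authors; the data structure and author labels are our own illustration, not LASAD's actual log format.

# Sketch of the degree-of-collaboration indicator used for hypothesis I1:
# the share of links whose source and target were created by different authors.

def cross_author_ratio(links):
    """links: list of (author_of_source, author_of_target) pairs."""
    if not links:
        return 0.0
    cross = sum(1 for src, tgt in links if src != tgt)
    return cross / len(links)

links = [("u1", "u1"), ("u1", "u2"), ("u2", "u3"), ("u3", "u3"), ("u2", "u1")]
print(f"cross-author links: {cross_author_ratio(links):.0%}")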

5. Discussion

Regarding the knowledge and argumentation tests, the results are not surprising. The increase in domain knowledge was an expected side-effect of our study: if students argue about a topic for a longer time with additional material, it can be expected that they gain knowledge in this field. The positive trend shown by the argumentation ability tests is more interesting and needs to be further evaluated in long-term studies: 4 h of using an argumentation system might simply not have been enough to reach statistically significant effects at the .05 level. This is clearly a weakness of the chosen experimental method. In more realistic educational settings, a longer-term use of the argumentation tool can be expected (e.g., a recent use of the LASAD tool in Pittsburgh involved a time span of one semester, with tool use in several weeks during this period).

With respect to collaboration, the results of our study confirm the possible benefit of collaboration for learning argumentation and are in line with prior findings (e.g., Janssen et al., 2010; Osborne, 2010; Sampson and Clark, 2009; Schwarz and Glassner, 2007; Schwarz et al., 2000). Against our hypothesis, groups in our study appeared not to have checked each other's contributions well, but to have argued for or against possible arguments, resulting in more elaborated arguments. This is clearly a point worth future investigation, as peer reviews have been shown to be an effective learning strategy (Gehringer, 2001; Cho and Schunn, 2007; Loll and Pinkwart, 2009), and their inclusion into argumentation systems could be fruitful. Based on a scripted approach, a peer-review process could be enforced in argumentation systems.

Contrary to the results of Schwarz and Glassner (2007), the influence of structural aids and collaboration on the amount of off-topic talk could not be confirmed in our study.
Possibly, the presence of a separate chat window was sufficient to keep the resulting argument map "clean".

Concerning the guiding function of the visual representation and its underlying ontology, our results support Suthers' (2003) findings. The use of the Toulmin argumentation scheme did lead to a different style of argumentation: while the Toulmin approach is based on data used to draw a conclusion (without any hypotheses), the other ontologies in our study employ hypotheses that are then backed up with supporting facts. However, we were not able to provide evidence that a domain-specific approach is more beneficial for the overall argument quality than a domain-independent one. In addition, the participants in our study had some problems with a highly structured argument ontology, confirming prior findings by Suthers (2003) that a broad range of elements may cause problems for students dealing with it: the Toulmin ontology puts excessive demands on the students due to its complexity. In fact, some students even refused to use the ontology correctly at all and only used the colors of the elements as orientation, e.g., using the red "on account of" relation as contra and the green "since" relation as pro.

Whether this behavior is positive (the students actually used the system to devise their own argument representation that made sense to them) or negative (the students did not use the predefined argument categories correctly and thus made argumentation errors) essentially cannot be judged independently of the intended effects of the system use. If the goal of using an argumentation tool is to teach a specific (or even formal) way of arguing, or if the computer is intended to analyze the arguments automatically, then a misuse of categories is problematic and should be avoided. If, however, the core goal of system use is to reach shared representations that make sense to the persons who create them during or after their argument (and where it is not important that other people or computers can interpret the results), then a misuse of predefined ontology elements is certainly acceptable.

There was no noteworthy difference between the other two ontologies. As a limiting factor, though, we would like to mention that the students were not familiar with any argument ontology before the study, and the theoretical argument model of Toulmin was definitely the most complicated one in our study. Also, a less elaborated ontology simply offers fewer possibilities to use elements incorrectly. Nevertheless, the Toulmin ontology was more prompting than the others, that is, it motivated the students to provide more contributions of their own.

6. Conclusion and outlook

In this paper, we investigated the value of sharing representations in the context of educational argumentation—i.e., jointly using visual argument representations in order to acquire domain knowledge and argumentation skills. We introduced LASAD, a framework that provides domain-independent support for computer-based argumentation. We described three levels of flexibility achieved in LASAD: (1) argument visualizations and representations, (2) structural argument definitions, and (3) the kind of cooperation. Using LASAD as a research tool, the study reported in this paper compared different types of argument representations with respect to their educational value and additionally investigated what effects the sharing of these argument representations (as compared to individual use) has. In our analysis, we also looked at possible interaction effects between these two factors (ontology and collaboration).

Our findings highlight the importance of adequate ontologies: the visual representation of arguments (and its complexity) indeed makes a difference for the resulting overall argument quality. Yet, we were unable to find specific different needs of individuals vs. groups that would correspond to the employed ontologies. This could imply that groups are able to deal with quite complex ontologies even though they also have to manage the complexity of group work at the same time. In this sense, a consequence is that designers of argument models would not have to care about shared or individual use of the argument representations they develop. Yet, further investigation may be required here: one could argue that the structure provided by an ontology may even support coordination in groups, as concrete element types may reduce the amount of required discussion (e.g., a hypothesis element that implicitly defines the starting points of an argument), so that both effects (the detrimental as well as the supportive one) may have canceled each other out. Finally, our results confirm that groups were able to enrich argumentation with different points of view independent of the ontology used.

In future research, we plan to carry these results forward into other domains like ethics and legal argumentation and to check whether the results found are valid across argumentation domains and across argument visualizations. Additionally, we plan to compare scripted scenarios with unscripted ones to gain further insights into how to improve the learning process. Also, to enable other, less technology-adept users to benefit from the system's flexibility and configuration mechanisms, we are currently working on an authoring tool that allows the configuration of the system via a graphical interface instead of writing XML code. First evaluations of the authoring tool can be found in (Loll, 2011, 2012).


Acknowledgments


This work was supported by the German Research Foundation under the grant ‘‘LASAD—Learning to Argue: Generalized Support Across Domains’’.


Appendix

Listing 1. Logical view/state-based log.
[XML log excerpt not reproduced; the recoverable fragment records a contribution with the text "vehicle in motor home park (with water and electricity connections)", the outcome "search permitted", the author "t1", and the time "Jan 19 13:42".]


Listing 2. Ontology part of the XML definition of the simple domain-independent ontology.
[XML listing not reproduced.]

Listing 3. Template part of the XML definition of the simple domain-independent ontology (study condition 1b: support for general argumentation in groups).
[XML listing not reproduced.]


References

Andriessen, J., 2006. Arguing to learn. In: Sawyer, R.K. (Ed.), The Cambridge Handbook of the Learning Sciences. Cambridge University Press, pp. 443–460.
Andriessen, J., Baker, M.J., Suthers, D., 2003. Arguing to Learn: Confronting Cognitions in Computer-Supported Collaborative Learning Environments. Kluwer Academic Publishers.
Bangor, A., Kortum, P.T., Miller, J.T., 2008. An empirical evaluation of the System Usability Scale. International Journal of Human–Computer Interaction 24 (6), 574–594.
Bangor, A., Kortum, P.T., Miller, J.T., 2009. Determining what individual SUS scores mean: adding an adjective rating scale. International Journal of Usability Studies 4 (3), 114–123.
Belgiorno, F., Chiara, R.D., Manno, I., Overdijk, M., Scarano, V., van Diggelen, W., 2008. Face to face cooperation with CoFFEE. In: Dillenbourg, P., Specht, M. (Eds.), Proceedings of the 3rd European Conference on Technology Enhanced Learning (EC-TEL 2008). Springer, pp. 49–57.
Brooke, J., 1996. SUS—a quick and dirty usability scale. In: Jordan, P.W., Thomas, B., Weerdmeester, B.A. (Eds.), Usability Evaluation in Industry. Taylor & Francis, pp. 189–194.
Carr, C.S., 2003. Using computer supported argument visualization to teach legal argumentation. In: Kirschner, P.A., Buckingham Shum, S.J., Carr, C.S. (Eds.), Visualizing Argumentation: Software Tools for Collaborative and Educational Sense-Making. Springer, pp. 75–96.
Cho, K., Schunn, C.D., 2007. Scaffolded writing and rewriting in the discipline: a web-based reciprocal peer review system. Computers & Education 48 (3), 409–426.
Clark, H.H., Brennan, S.E., 1991. Grounding in communication. In: Resnick, L.B., Levine, J.M., Teasley, S.D. (Eds.), Perspectives on Socially Shared Cognition, pp. 127–148.
Dillenbourg, P., Baker, M., Blaye, A., O'Malley, C., 1995. The evolution of research on collaborative learning. In: Reimann, P., Spada, H. (Eds.), Learning in Humans and Machines: Towards an Interdisciplinary Learning Science. Elsevier/Pergamon, Oxford, pp. 189–211.
Easterday, M.W., Aleven, V., Scheines, R., 2007. 'Tis better to construct or to receive? Effect of diagrams on analysis of social policy. In: Luckin, R., Koedinger, K.R., Greer, J. (Eds.), Proceedings of the 13th International Conference on Artificial Intelligence in Education. IOS Press, pp. 93–100.
Ekman, P., 1993. Facial expression and emotion. American Psychologist 48 (4), 384–392.
Gehringer, E.F., 2001. Electronic peer review and peer grading in computer-science courses. In: Proceedings of the 32nd SIGCSE Technical Symposium on Computer Science Education, pp. 139–143.
Gordon, T.F., Prakken, H., Walton, D., 2007. The Carneades model of argument and burden of proof. Artificial Intelligence 171, 875–896.
Hair, D.C., 1991. Legalese: a legal argumentation tool. SIGCHI Bulletin 23 (1), 71–74.
Harrell, M., 2008. No computer program required: even pencil-and-paper argument mapping improves critical thinking skills. Teaching Philosophy 31 (4), 351–374.
Janssen, J., Erkens, G., Kirschner, P.A., Kanselaar, G., 2010. Effects of representational guidance during computer-supported collaborative learning. Instructional Science 38 (1), 59–88.
Jonassen, D.H., Kim, B., 2010. Arguing to learn and learning to argue: design justifications and guidelines. Educational Technology Research and Development 58 (4), 439–457.
Kirschner, P.A., Buckingham Shum, S.J., Carr, C.S., 2003. Visualizing Argumentation: Software Tools for Collaborative and Educational Sense-Making. Springer.
Kuhn, D., 1991. The Skills of Argument. Cambridge University Press.
Kuhn, D., 1993. Science as argument: implications for teaching and learning scientific thinking. Science Education 77 (3), 319–337.
Laughlin, P.R., Hatch, E.C., Silver, J.S., Boh, L., 2006. Groups perform better than the best individuals on letters-to-numbers problems: effects of group size. Journal of Personality and Social Psychology 90 (4), 644–651.

Loll, F., 2012. Domain-Independent Support for Computer-Based Argumentation. PhD Thesis, Clausthal University of Technology. Published online at http://www.gbv.de/dms/clausthal/E_DISS/2012/db110727.pdf. ISBN: 978-3-942216-92-0.
Loll, F., Pinkwart, N., 2009. Disburdening tutors in e-learning environments via Web 2.0 techniques. In: Dicheva, D., Mizoguchi, R., Greer, J. (Eds.), Semantic Web Technologies for e-Learning. IOS Press, Amsterdam, The Netherlands, pp. 279–298.
Loll, F., Scheuer, O., McLaren, B.M., Pinkwart, N., 2010a. Computer-supported argumentation learning: a survey of teachers, researchers, and system developers. In: Wolpers, M., Kirschner, P., Scheffel, M., Lindstaedt, S., Dimitrova, V. (Eds.), Lecture Notes in Computer Science (6383)—Proceedings of the 5th European Conference on Technology Enhanced Learning (EC-TEL). Springer Verlag, Berlin, Germany, pp. 530–535.
Loll, F., Scheuer, O., McLaren, B.M., Pinkwart, N., 2010b. Learning to argue using computers—a view from teachers, researchers, and system developers. In: Aleven, V., Kay, J., Mostow, J. (Eds.), Lecture Notes in Computer Science (6095)—Proceedings of the 10th International Conference on Intelligent Tutoring Systems (ITS). Springer Verlag, Berlin, Germany, pp. 377–379.
Loll, F., Pinkwart, N., Scheuer, O., McLaren, B.M., 2011. Simplifying the development of argumentation systems using a configurable platform. In: Pinkwart, N., McLaren, B.M. (Eds.), Educational Technologies for Teaching Argumentation Skills. Bentham Science.
Loui, R.P., Norman, J., Altepeter, J., Pinkard, D., Linsday, J., Foltz, M., 1997. Progress on Room 5: a testbed for public interactive semi-formal legal argumentation. In: Proceedings of the 6th International Conference on Artificial Intelligence and Law (ICAIL 1997). ACM, pp. 207–214.
Lund, K., Molinari, G., Séjourné, A., Baker, M.J., 2007. How do argumentation diagrams compare when student pairs use them as a means for debate or as a tool for representing debate? International Journal of Computer-Supported Collaborative Learning 2 (2–3), 273–295.
Lynch, C., Ashley, K.D., Pinkwart, N., Aleven, V., 2010. Concepts, structures, and goals: redefining ill-definedness. International Journal of Artificial Intelligence in Education 19 (3), 253–266.
McAlister, S., Ravenscroft, A., Scanlon, E., 2004. Combining interaction and context design to support collaborative argumentation using a tool for synchronous CMC. Journal of Computer Assisted Learning 20 (3), 194–204.
McLaren, B.M., Scheuer, O., Miksatko, J., 2010. Supporting collaborative learning and e-discussions using artificial intelligence techniques. International Journal of Artificial Intelligence in Education 20, 1–46.
Munneke, L., van Amelsvoort, M., Andriessen, J., 2003. The role of diagrams in collaborative argumentation-based learning. International Journal of Educational Research 39, 113–131.
Osborne, J., 2010. Arguing to learn in science: the role of collaborative, critical discourse. Science 328 (5977), 463–466.
Pinkwart, N., Lynch, C., Ashley, K.D., Aleven, V., 2008. Re-evaluating LARGO in the classroom: are diagrams better than text for teaching argumentation skills? In: Lecture Notes in Computer Science 5091, pp. 90–100.
Rolf, B., Magnusson, C., 2002. Developing the art of argumentation: a software approach. In: Proceedings of the 5th International Conference on Argumentation, pp. 919–926.
Rummel, N., Spada, H., 2005. Can people learn computer-mediated collaboration by following a script? In: Fischer, F., Mandl, H., Haake, J., Kollar, I. (Eds.), Scripting Computer-Supported Communication of Knowledge: Cognitive, Computational, and Educational Perspectives. Kluwer, Dordrecht, NL.
Sampson, V., Clark, D., 2009. The impact of collaboration on the outcomes of scientific argumentation. Science Education 93 (3), 448–484.
Scheuer, O., Loll, F., Pinkwart, N., McLaren, B.M., 2010. Computer-supported argumentation: a review of the state of the art. International Journal of Computer-Supported Collaborative Learning 5 (1), 43–102.
Scheuer, O., McLaren, B.M., Loll, F., Pinkwart, N., 2011. Automated analysis and feedback techniques to support argumentation: a survey (to appear). In: Pinkwart, N., McLaren, B.M. (Eds.), Educational Technologies for Teaching Argumentation Skills. Bentham Science Publishers.

Schwarz, B.B., Glassner, A., 2007. The role of floor control and of ontology in argumentative activities with discussion-based tools. International Journal of Computer-Supported Collaborative Learning 2, 449–478.
Schwarz, B.B., Neuman, Y., Biezunger, S., 2000. Two wrongs may make a right … if they argue together! Cognition and Instruction 18 (4), 461–494.
Stegmann, K., Weinberger, A., Fischer, F., 2007. Facilitating argumentative knowledge construction with computer-supported collaboration scripts. International Journal of Computer-Supported Collaborative Learning 2, 421–447.
Suthers, D.D., Weiner, A., Connelly, J., Paolucci, M., 1995. Belvedere: engaging students in critical discussion of science and public policy issues. In: Greer, J. (Ed.), Proceedings of the 7th World Conference on Artificial Intelligence in Education (AI-ED). Association for the Advancement of Computing in Education, Charlottesville, pp. 266–273.
Suthers, D.D., 2003. Representational guidance for collaborative inquiry. In: Arguing to Learn: Confronting Cognitions in Computer-Supported Collaborative Learning Environments, pp. 27–46.


Toth, E.E., Suthers, D.D., Lesgold, A.M., 2002. Mapping to know: the effects of representational guidance and reflective assessment on scientific inquiry. Science Education 86 (2).
Toulmin, S.E., 2003. The Uses of Argument, 2nd rev. ed. Cambridge University Press.
van Gelder, T., 2003. Enhancing deliberation through computer supported argument visualization. In: Kirschner, P.A., Buckingham Shum, S.J., Carr, C.S. (Eds.), Visualizing Argumentation: Software Tools for Collaborative and Educational Sense-Making, pp. 97–115.
von Aufschnaiter, C., Erduran, S., Osborne, J., Simon, S., 2008. Arguing to learn and learning to argue: case studies of how students' argumentation relates to their scientific knowledge. Journal of Research in Science Teaching 45 (1), 101–131.