Networking Technology
MultimETH, a Collaborative Editing and Conferencing Project
Hannes P. LUBICH
Institut für Technische Informatik und Kommunikationsnetze, Fachgruppe Kommunikationssysteme, Swiss Federal Institute of Technology (ETH), CH-8092 Zurich, Switzerland (E-mail: lubich@komsys.tik.ethz.ch)
Abstract. We describe a model for the collaborative conferencing and joint editing system MultimETH. We also introduce a document model that has been designed to meet collaborative needs in the context of joint editing. The overall design goals for MultimETH have been to identify and model the conference behavior and the functional requirements of a specific user group, in order to define a complete and easy-to-use set of commands and interaction tools. We also show how to integrate the MultimETH system into the conventional office working environment. The work described in this paper is part of the MultimETH multimedia collaboration project at ETH Zurich.
Keywords. Collaborative conferencing, joint editing, multimedia documents.
Hannes Lubich, born 1961, started studying computer science at the Technical University of Berlin in 1982 and received a Diploma Degree in 1986. From 1986 until 1989, he was a teaching assistant in the Research Group on Communication Systems at the Swiss Federal Institute of Technology, Zurich. In 1989 he received a Ph.D. degree in computer science from ETHZ. His current activities at ETHZ include lecturing and research in the areas of real-time multimedia conferencing, UNIX system development, distributed systems, and message handling.
North-Holland Computer Networks and ISDN Systems 19 (1990) 215-223
1. Introduction
It has become a common remark that we live in the age of communication. However, it is just as common to hear that the communication tools used today do not always match expectations and requirements. In the following paragraphs, we discuss some of the problems that stem from this mismatch between expectations and reality, and we indicate a possible solution for group communication.
Communication within groups, in general, means a flow of information from one or more (active) sources of information to (passive) consumers of information. Three main characteristics of communication can be identified:
(a) Synchronous/asynchronous communication (time-dependent characteristic). Synchronous communication implies communication of all group members at the same time (e.g. in a video conference), while asynchronous communication allows participants to communicate within a certain timeframe (e.g. via postal or electronic mail).
(b) One-way/multi-way communication (flow-of-information characteristic). One-way communication implies a unidirectional flow of information (e.g. newspaper, television), while multi-way communication means true interaction between participants. Multi-way communication may be separated into two-way alternating (e.g. hearings) and two-way simultaneous modes (e.g. brainstorming).
(c) 1:1/1:n/n:m communication (association-between-participants characteristic). 1:1 communication implies two participants communicating with each other (e.g. dialog), while 1:n communication implies a flow of information from one source to many destinations (e.g. lecture). In n:m communication there are many sources and many receivers of information (e.g. conferencing, joint editing).
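Because the three characteristics are independent of each other, any communication form can be described as one point in a three-dimensional classification. The following minimal sketch encodes that taxonomy as Python enums; the class names and the specific attribute combinations assigned to the examples are our own illustrative reading of the text, not a table from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Timing(Enum):
    SYNCHRONOUS = "synchronous"    # all group members communicate at the same time
    ASYNCHRONOUS = "asynchronous"  # communication spread over a timeframe


class Flow(Enum):
    ONE_WAY = "one-way"                            # unidirectional flow
    TWO_WAY_ALTERNATING = "two-way alternating"    # e.g. hearings
    TWO_WAY_SIMULTANEOUS = "two-way simultaneous"  # e.g. brainstorming


class Association(Enum):
    ONE_TO_ONE = "1:1"
    ONE_TO_MANY = "1:n"
    MANY_TO_MANY = "n:m"


@dataclass(frozen=True)
class CommunicationForm:
    name: str
    timing: Timing
    flow: Flow
    association: Association

    def is_multi_way(self) -> bool:
        # Multi-way communication is either of the two-way modes.
        return self.flow is not Flow.ONE_WAY


# Hypothetical classifications of the examples mentioned in the text.
EXAMPLES = [
    CommunicationForm("video conference", Timing.SYNCHRONOUS,
                      Flow.TWO_WAY_SIMULTANEOUS, Association.MANY_TO_MANY),
    CommunicationForm("electronic mail", Timing.ASYNCHRONOUS,
                      Flow.TWO_WAY_ALTERNATING, Association.ONE_TO_ONE),
    CommunicationForm("newspaper", Timing.ASYNCHRONOUS,
                      Flow.ONE_WAY, Association.ONE_TO_MANY),
    CommunicationForm("lecture", Timing.SYNCHRONOUS,
                      Flow.ONE_WAY, Association.ONE_TO_MANY),
    CommunicationForm("joint editing", Timing.SYNCHRONOUS,
                      Flow.TWO_WAY_SIMULTANEOUS, Association.MANY_TO_MANY),
]
```

Filtering `EXAMPLES` on `Association.MANY_TO_MANY` picks out exactly the n:m forms for which, as argued below, few tools exist.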
0169-7552/90/$03.50 © 1990 - Elsevier Science Publishers B.V. (North-Holland)
H.P. Lubich / MultimETH
None of the identified communication forms implies a geographical dimension of communication, although some communication forms may not be useful in a distributed or non-distributed environment (imagine members of an asynchronous communication group sitting in the same room writing letters to each other). Although there is a sufficient number of media types for the 1:1 (telephone, letter) and 1:n forms of communication, there is only a very small number of communication tools that support n:m interactions. Telephone conferencing, bulletin boards, and closed user groups in telematic services like Videotex offer some base functionality; however, these tools do not sufficiently model the communication behavior of their user groups and, therefore, do not fulfill the real needs of a collaborating group. Using these tools, the group is forced to modify its communication behavior to fit the tools available. Thus, electronically supported work easily becomes electronically handicapped work, especially when the user group does not consist of telecommunication experts but of people who want to use electronic support as just one tool among others in a problem-solving process. At this point, one ends up with the oldest and most reliable form of group communication: the personal meeting, a form of communication familiar to all of us. As a result, we spend a significant amount of time in airports and airplanes, railway stations and trains, hotels, and conference rooms instead of being in our most effective working environment: our own office, which is our knowledge base. Almost everybody has experienced a situation in which important documents were missing during a conference in a remote place.
All one can do at that moment is either to skip the corresponding item or to postpone it and rearrange the agenda, while trying to phone somebody at one's office to find those documents and send them to a fax machine near the meeting place, where they have to be photocopied again to be distributed among the members of the meeting. Frequent face-to-face meetings outside one's office have a number of other disadvantages, e.g., the participants are hard for their organization to reach during the meeting and while travelling. Almost everybody who participates regularly in face-to-face meetings can add further items to this list.
2. Requirements for a Computer Supported Conference System
Given that there is a growing worldwide network consisting of hundreds of thousands of interconnected computers and that modern workstations are found on almost every desk, it would seem that human collaboration could be supported by means of computers and computer networks. This concept is not new at all. Since the ARPANET started operation at the end of the sixties, single persons and a growing number of open or closed user groups have collaborated using this or similar networks. Examples of software still used today in many places are the asynchronous transmission of electronic mail, the usage of bulletin boards, and the synchronous transmission of text messages using tools like "chat" or "phone". Some of these communication tools (e.g. electronic mail or bulletin boards) are very useful for asynchronous communication and are therefore widely used. Nonetheless, nobody would try to replace a face-to-face meeting by a "chat" session on a VT100 terminal. Especially in synchronous conferencing, it is of crucial importance that software properly reflects the communication behavior of its user group. The software should not enforce a style of communication that the user is not accustomed to; this only forces the users back to a large number of face-to-face meetings in order to get things done. A computer supported conferencing system should be able to incorporate both synchronous and asynchronous communication and should provide means to dynamically enable 1:1, 1:n and n:m communication. Also, both one-way and multi-way communication must be supported. Geographical distribution of conference participants must be assumed. This form of communication, especially in its geographically distributed variant, is provided today by means of telephone or video conferencing.
However, in the author's opinion, a computer supported conferencing system would allow much better support of synchronous meetings than a telephone or video conference. One advantage is that participants would no longer have to leave their knowledge base to join a meeting in a video conference studio. In addition, the conference system could be embedded into the working environment on the user's workstation, thus enabling data
transfers between the user's private data collection and the conference database. Also, additional and arbitrarily complex new data types could be integrated into the conference system and easily combined with each other, while both telephone and video conferences are limited in that respect. As we have indicated before, such a system needs to be designed and implemented to suit the needs of its user group, thus aiming to model conference activities and forms of interaction as closely as possible. With respect to the user group and conference forms to be supported, it must be assumed, however, that it is not possible to specify or build a conferencing system that satisfies the needs of all existing and potential future user groups, nor can every form of conference be supported by one system. Thus, the MultimETH functionality proposal does not aim at supporting the needs of future user groups but rather focuses on a user group whose conference requirements are already known. Several studies [5,9] have shown that communication needs are extraordinarily high in the field of expert work and that expert groups have the most knowledge of both conferencing procedures and computer usage. Typical goals of expert conferences include preparing a face-to-face meeting or conference, drawing up an agenda or program for a conference or meeting, collaborative editing of documents, code walkthroughs, brainstorming sessions, expert discussions, short-term negotiations, and balloting. A typical example of such a conference is a meeting of the program committee of a large conference that is trying to compile the final program document, define the order of speakers, the affiliation of talks to session topics, etc.
Within the MultimETH project, we focus on our own communication needs as a relatively small group of computer experts who want to use a conferencing system to ease time constraints (and cut down travel expenses) for regular meetings and to collaboratively edit and maintain documents. The conference form and user group can generally be described as collaborative expert work, where a result is produced through the synchronous work of a small group (≤ 10) in which every participant needs to access private and local data and partial results. It is also assumed that the members of the group know each other and that monitoring of non-verbal reactions (as in business negotiations) is not important, thus making integration of
video presentation of the participants (i.e. integration of video conference functionality) unnecessary. Supported media types should include at least text, graphics, images, spreadsheets, and voice to provide a complete office workbench. It must be noted, however, that this does not imply that we want to replace meetings such as the EARN/RARE Joint Networking Conference by a computer supported conference, since their requirements differ greatly from the requirements we have just described. Furthermore, computer supported collaboration is by no means a full replacement for all forms of direct interaction between persons. Social contacts and private by-pass communications during a conference, in particular, must not be neglected, since they play an important role in human interaction.
3. The Conferencing Model and Functionality
Following the definition of user requirements in Section 2, we present the functionality of a conferencing system that allows its users to communicate and to collaboratively edit and manage documents. We also present a document model that allows for multiple media types and simultaneous multiuser access. The system consists of a number of cooperating components:
(1) conference administration,
(2) collaborative conferencing,
(3) collaborative editing,
(4) a homogeneous user interface for the administration, conferencing and editing components,
(5) reliable, synchronous, standardized data communication in a LAN/WAN context.
In this paper, we focus on points (1)-(3). The description of the system architecture, as well as a discussion of the protocol design, the user interface, and implementation aspects, is part of the MultimETH project but beyond the scope of this paper.
3.1. The Conferencing Model
In a multimedia conference, members mainly use synchronous multi-way communication, but 1:1, 1:n and n:m communication must also be provided. The conference model includes not only the functionality needed during a conference; it also covers the time
[Figure components: Chairman; keeper of the minutes; User 2, User 3, User 4; access control mechanisms; voice; document editing]
Fig. 1. Structure and components of the conference model.
before and after a conference, since invitations to join a conference can be sent asynchronously, and information needed at the start of the conference (e.g. documents that will be collaboratively edited during the conference) can be distributed to the participants before the conference takes place. Also, a transcript of a conference can be created for later preparation of the minutes, and documents may be distributed by an electronic mail system. Figure 1 presents the structure of the conference model and its components while a conference is active. Within a conference, certain roles are assigned to participants. The role of the chairman must be assigned to one participant, while other roles (e.g. keeper of the minutes) are optional. All participants may be identified by name and, if a role is assigned to them, also by role. All participants can access their private work space as well as a shared work space (SWS) that contains data accessible to all participants. Data in the shared work space comprises either documents that may be edited collaboratively or non-permanent communication units like voice, data within dialog windows, etc. Communication between the participants is based on a common view of the shared work space and on modification functionality for shared data (WYSIWIS, "What you see is what I see" [8]).
An important component of the conference model is access control for shared resources. In this context, objects within a document, documents themselves, and communication channels are viewed as resources that may be either reserved by one participant or available to all participants without explicit access control. Access control can be explicitly switched on or off for each resource, but the access granting mechanism may differ for different resource types. In this section, we focus on access control for the voice channel resource. Section 3.4 discusses the corresponding access control method for collaborative editing of multimedia documents. If access control is not used, every participant may act at any time; i.e., voice comments will be transmitted between all participants. Simultaneous access to the voice channel must remain possible for the chairman even when explicit access control to the voice channel is switched on, since the chairman should be able to use the voice channel for urgent messages at any time. If access control is switched on, the chairman must explicitly assign a token for each voice channel. Only the participant who possesses such a token may access a voice channel for "writing" (i.e. talking). All other participants can hear the talking participant. Every token has a time attribute that can be defined by the chairman. After expiration of this timer, the participant is notified and
the token "returns" to the chairman, unless the chairman has prolonged the timer in the meantime. Every participant may apply for a token. The chairman assigns the token(s) explicitly; i.e., even if a token is currently unassigned, it is not automatically passed to the next applying participant. If the number of applying participants is larger than the number of tokens, a waiting list is created that can be inspected by all participants. The chairman may edit this list and delete or rearrange entries. Participants may delete their own entry in the list at any time before they get the token. To reduce the operating complexity for the chairman, the token-based access control mechanism for the voice channel is optional and switched off by default. As a compromise between simultaneous access and a high workload for the chairman, the token may also be assigned automatically on a first-come, first-served basis. In this case, a token timer can only be prolonged if no other participant is waiting for the token.
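The token rules above can be summarized as a small state machine. The following is a minimal sketch under our own assumptions, not the MultimETH implementation: the class and method names, the explicit `now` time argument (used instead of real timers), and the `default_duration` value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    holder: Optional[str] = None  # None: the token is with the chairman
    expires_at: float = 0.0       # absolute time at which the token timer runs out


@dataclass
class VoiceChannelControl:
    chairman: str
    num_tokens: int = 1                    # one token per voice channel
    enabled: bool = False                  # token control is switched off by default
    first_come_first_served: bool = False  # automatic assignment mode
    default_duration: float = 60.0         # hypothetical default speaking time
    waiting: List[str] = field(default_factory=list)  # inspectable by everybody
    notices: List[str] = field(default_factory=list)  # expiry notifications
    tokens: List[Token] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        self.tokens = [Token() for _ in range(self.num_tokens)]

    def may_talk(self, who: str) -> bool:
        # Without token control everybody may talk; the chairman always may.
        if not self.enabled or who == self.chairman:
            return True
        return any(t.holder == who for t in self.tokens)

    def apply(self, who: str, now: float) -> None:
        """Queue a participant who asks for a token."""
        if self.may_talk(who) or who in self.waiting:
            return
        self.waiting.append(who)
        if self.first_come_first_served:
            self._auto_assign(now)

    def withdraw(self, who: str) -> None:
        """Participants may delete their own entry before they get a token."""
        if who in self.waiting:
            self.waiting.remove(who)

    def assign(self, who: str, now: float, duration: Optional[float] = None) -> None:
        """Chairman hands a free token to any participant, in any order."""
        token = next((t for t in self.tokens if t.holder is None), None)
        if token is None:
            raise RuntimeError("all tokens are in use")
        self.withdraw(who)
        token.holder = who
        token.expires_at = now + (self.default_duration if duration is None else duration)

    def prolong(self, who: str, extra: float) -> bool:
        """Chairman extends a timer; refused in FCFS mode while someone waits."""
        if self.first_come_first_served and self.waiting:
            return False
        for t in self.tokens:
            if t.holder == who:
                t.expires_at += extra
                return True
        return False

    def tick(self, now: float) -> None:
        """Notify holders of expired tokens and return the tokens to the chairman."""
        for t in self.tokens:
            if t.holder is not None and now >= t.expires_at:
                self.notices.append(f"{t.holder}: token expired")
                t.holder = None
        if self.first_come_first_served:
            self._auto_assign(now)

    def _auto_assign(self, now: float) -> None:
        # First come, first served: the head of the waiting list gets the next free token.
        while self.waiting and any(t.holder is None for t in self.tokens):
            self.assign(self.waiting[0], now)
```

In the manual mode, `assign` deliberately ignores the order of `waiting`, because the chairman decides who speaks next; editing or reordering the list corresponds to manipulating `waiting` directly. Only the first-come, first-served mode consults the list order.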
3.2. The Conferencing Functionality
The conferencing functionality can be organized in a layered fashion. The kernel of the system consists of a data communication module which enables the system to cooperate reliably with other systems in a standardized way. Based on the communication module, the conference management module provides the functionality needed to administer conferences, participants and documents. In particular, this layer provides the user with functions to enter or leave conferences. Within a conference, the multimedia editor layer offers functionality to edit documents collaboratively with other members of the conference. Finally, the functions of the administration, conferencing, and editing modules that are visible to the user are provided under a homogeneous user interface. In the following sections, we step through the various operations that are visible to the end user and indicate the functionality provided for each operation.
Conference Management Functionality
This section contains operations for conference management, such as sending invitations for a conference, and opening, closing, joining, and leaving a conference. Several specialized commands for
administrative tasks are reserved for the chairman, others can be issued by all participants.
Communication without Shared Documents
The operations in this section enable the participants to communicate with each other by 1:1 "chatting", by sending broadcast messages, and by formal voting procedures.
Communication Using Shared Documents
The proposed real-time multimedia conferencing system allows its users to communicate not only by using the voice channel and the interaction functionality discussed in Section 3.1, but also by collaboratively editing documents. Sections 3.3 and 3.4 discuss the basic document and object model and present the collaborative editing functionality on documents and their contents.
3.3. The Document Model and Functionality
The editing component of the MultimETH system may be used to collaboratively create, edit, and manage multimedia documents. A multimedia document consists of a number of structure elements (e.g. title, headlines, chapters, sections, etc.) that have a hierarchical relationship and that have a content portion (e.g. text, graphic, bitmap, etc.) assigned to them. Users must be able to modify the logical structure of a document by modifying structure elements as well as the corresponding content portions. This implies a hierarchical model where a document contains objects which in turn may contain elements (e.g. characters in a text object). The textual description of a multimedia document needs to be formalized further and related to existing document architectures. Therefore, we briefly review the most common standardized document architectures capable of dealing with multimedia information. Finally, this section presents a formally described document type that will be available within the multimedia editor. In view of the urgent need for standardization, given the growing number of incompatible, manufacturer-defined document architectures and text formats, two document architectures, namely SGML [4] and ODA [3],
have been defined by different standardization bodies for different purposes and independently of each other. As part of the project work, we have already shown in [1] that the ODA model suits our needs much better than the SGML model with respect to multimedia documents as they need to be accessed by the collaborative editor. In [1] we have also presented a model that introduces the time dimension to ODA, thus adding time constraints and synchronization functionality to the document architecture. We assume this extended ODA document architecture as the basis of discussion in the following sections. The logical structure of an ODA document can be viewed as a tree structure, formally described by the Backus Naur Form (BNF) defined in [2]. We will use this notation to formally describe the MultimETH document type. A second view of the document is the layout view, which contains information about the presentation of the document and its objects, e.g., number of columns, formats, fonts, and font sizes. Using the layout view, styles for objects may be defined (e.g. headlines must be set bold, each chapter must begin on a new page, etc.) as well as exceptions and additional formatting information (e.g. only the next paragraph should be set with a hanging indent). In the next step, we need to define formally the structure of the document type that is supported in the collaborative editor, following the ODA notation of logical document structures. This document type represents a valid subset of the document architecture described in ODA; see Fig. 2. One could easily add other structure objects as they are defined in ODA, but, with respect to the
requirements of the supported user group, this definition is sufficient for creating all needed document types. Each document and each object within a document, regardless of whether it is a content portion or a structure object, has a set of attributes. In addition to the attributes defined in [3], we define the following attributes for conference-specific information and give their possible values:
- access rights (a combination of "read", "write", "annotate", "delete" and the access groups "owner", "conference", "all"),
- owner (username),
- annotation (text or empty),
- reserved (username or empty),
- created (date, time and username),
- modification (list of date-time-username triplets).
For each document in the shared workspace, the corresponding access rights must be declared as attribute values. The chairman or the owner of a document defines its access rights when it is created in the shared workspace. Afterwards, the access rights may be modified by either the chairman or the owner of the document. The access rights "write" and "annotate" contain "read", while "write" contains "annotate". To change attributes of a document, "write" access is needed. For every document, there may be annotations that are not part of the document content but are administered as an additional attribute. A user needs only "annotate" access, not "write" access, to add annotations to a document. In analogy to the voice channel, access control is provided for collaborative editing, but with a
[Figure: logical structure of the document type, with structure elements marked [ ], ( )* and ( )+, and the content portions TEXT, IMAGE, GRAPHIC, SPREADSHEET, VOICE and CHALKBOARD]
Fig. 2. Subset of the document architecture described in ODA.
different mechanism than a token granted to a user. Instead, the editing process is controlled by defining explicit access rights for documents and for objects within documents, and by explicitly selecting and reserving parts of the document for exclusive write access by one participant at a time. This explicit access control mechanism guarantees the integrity of the shared data. It is described in more detail in Section 3.4. The following sections briefly describe the document-oriented functionality of the collaborative editor.
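The containment rule for access rights, together with the option (described for objects in Section 3.4) of taking an object's rights from the whole document or from the object hierarchically above it, can be illustrated with a minimal sketch. All names here are hypothetical, and two points are our own assumptions: "delete" is treated as independent of the other rights, and the "owner" group refers to the owner of the object being accessed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set, Union

# "write" contains "annotate", and both contain "read".
# Treating "delete" as independent of the others is an assumption.
IMPLIES: Dict[str, Set[str]] = {
    "read": {"read"},
    "annotate": {"annotate", "read"},
    "write": {"write", "annotate", "read"},
    "delete": {"delete"},
}

Rights = Dict[str, Set[str]]  # access group ("owner", "conference", "all") -> rights


class Inherit(Enum):
    DOCUMENT = "same as the whole document"
    PARENT = "same as the object hierarchically on top"


@dataclass(eq=False)  # objects are compared by identity
class DocObject:
    name: str
    owner: str
    rights: Union[Rights, Inherit] = Inherit.PARENT
    annotation: str = ""
    parent: Optional["DocObject"] = field(default=None, repr=False)
    children: List["DocObject"] = field(default_factory=list, repr=False)

    def add(self, child: "DocObject") -> "DocObject":
        child.parent = self
        self.children.append(child)
        return child

    def document(self) -> "DocObject":
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def effective_rights(self) -> Rights:
        if isinstance(self.rights, dict):
            return self.rights
        source = self.document() if self.rights is Inherit.DOCUMENT else self.parent
        if source is None or source is self:
            return {}  # the document itself must declare explicit rights
        return source.effective_rights()


def groups_of(user: str, obj: DocObject, participants: Set[str]) -> Set[str]:
    # Hypothetical membership rule: everyone is in "all", conference members
    # are in "conference", and the object's own owner is in "owner".
    groups = {"all"}
    if user in participants:
        groups.add("conference")
    if user == obj.owner:
        groups.add("owner")
    return groups


def allowed(user: str, obj: DocObject, right: str, participants: Set[str]) -> bool:
    """True if any of the user's groups grants `right`, directly or by containment."""
    rights = obj.effective_rights()
    granted: Set[str] = set()
    for group in groups_of(user, obj, participants):
        for r in rights.get(group, set()):
            granted |= IMPLIES[r]
    return right in granted
```

Resolving inherited rights at lookup time, rather than copying them into each object, means that a later change to the document's rights by the chairman or owner is seen immediately by every object that inherits them.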
Document Management Functions
This section contains operations for browsing the document space of a conference, as well as for creating, removing, copying (within or between workspaces), attributing, and annotating documents. In addition, the multimedia editor may be invoked for a selected document, and documents containing dynamic portions (e.g. voice replay, animation, etc.) may be replayed.
Operations on Complete Documents Within the Editor
Once the editor is invoked, several operations can only be applied to complete documents, such as including a complete document at the current cursor position or storing a selection of the current document as a new document. Also, the logical structure of a document (i.e. the outlining tree) can be viewed. By clicking on a node, parts of the document may be reserved for exclusive write access. Store, restore, and preview operations allow for storage and retrieval of previous versions of a document.
3.4. Objects in Multimedia Documents
As already mentioned in Section 3.3, every object in a multimedia document has a number of attributes that contain conference-specific information. The chairman or the creator of an object defines the access rights and access groups for that object. The creator is the owner of the object. Attribute values may be changed at any time by the chairman or the owner of the object. To ease the administrative workload, the access rights may be specified as being the same as those of the whole document or of the object that is
hierarchically on top of the actual object. For annotations, the same rules apply as for document annotations. Analogous to the voice channel, there is an access control mechanism for collaborative editing that prevents simultaneous write access to resources by requiring the participants to explicitly request parts of the document for exclusive write access. Note that a reservation is not necessary for read access; however, it cannot be guaranteed that read access to a part of the document will return the most recent data when there is a write reservation on that part of the document. Typically, a participant viewing a part of the document that is being edited by somebody else will see any changes as soon as they are received by the reading participant's instance of the conferencing system. The reservation mechanism is realized by presenting the logical structure of the edited document as a document tree in a separate window. Each object in the document is represented as a node of the tree. By clicking on a node and issuing the "lock" command, the user reserves the subtree under the selected node for exclusive write access. The "unlock" command releases the reservation. A user may only reserve one subtree at a time, but this reservation may well contain many objects. The smallest reservable part of a document is one object. Reservation of a subtree is only possible if there is no reservation already in the requested subtree. Once a reservation of a subtree is granted, no subtree that includes the reserved subtree may be reserved. All nodes that may not be reserved for that reason are defined as "unavailable". Nodes that may still be reserved, since they are located in a different area of the document tree, are defined as "free". Consider the logical view (outlining) of a document and a possible user view in Fig. 3.
Let us assume that an attempt is made to reserve the second section of the document by clicking on the corresponding "section" node below the "chapter" node in the outlining tree. Assuming that there is no conflicting reservation request, the corresponding parts of the document tree (i.e. all objects in the subtree of the selected node) are marked as "reserved", while the two nodes hierarchically above the selected node are marked as "unavailable". All other nodes remain marked as "free". The reservation is
[Figure: outlining tree with Chapter, Section and Text nodes, next to the user view "1. Chapter: Example text of the first chapter" with subsection "1.1 Section"]
Fig. 3. Logical view and user view of an example document.
also indicated to the selecting participant in his user view, as indicated in Fig. 4.
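The reservation rules lend themselves to a compact sketch. The following is an illustrative sketch under our own naming assumptions (`TreeNode`, `ReservationManager`, `lock`, `unlock`, `state` are hypothetical, not taken from MultimETH). A node is "reserved" if it lies inside a reserved subtree, "unavailable" if it is a proper ancestor of one, and "free" otherwise; only free nodes may be locked, and each user holds at most one subtree.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class TreeNode:
    label: str
    parent: Optional["TreeNode"] = field(default=None, repr=False)
    children: List["TreeNode"] = field(default_factory=list, repr=False)

    def add(self, label: str) -> "TreeNode":
        child = TreeNode(label, parent=self)
        self.children.append(child)
        return child

    def ancestors_or_self(self) -> Iterator["TreeNode"]:
        node: Optional[TreeNode] = self
        while node is not None:
            yield node
            node = node.parent


class ReservationManager:
    """Subtree reservation for exclusive write access (read access needs none)."""

    def __init__(self) -> None:
        self.locks: Dict[str, TreeNode] = {}  # user -> root of the reserved subtree

    def state(self, node: TreeNode) -> str:
        roots = list(self.locks.values())
        # Inside a reserved subtree (including its root).
        if any(a in roots for a in node.ancestors_or_self()):
            return "reserved"
        # Proper ancestor of a reserved subtree: cannot be reserved itself.
        if any(node in r.ancestors_or_self() for r in roots):
            return "unavailable"
        return "free"

    def lock(self, user: str, node: TreeNode) -> bool:
        if user in self.locks:  # only one subtree per user at a time
            return False
        if self.state(node) != "free":
            return False
        self.locks[user] = node
        return True

    def unlock(self, user: str) -> None:
        self.locks.pop(user, None)
```

For a hypothetical tree document → chapter → {section 1 → text, section 2 → text}, locking "section 2" marks its text as reserved and makes "chapter" and "document" unavailable, which corresponds to the two unavailable nodes in the example above, while "section 1" stays free for another participant.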
3.5. Miscellaneous Operations
Several other operations are needed to allow users to tailor their conferencing system to their personal needs. These operations allow the user to define, for example, whether broadcast messages or other participants' cursors may be displayed in his/her conferencing window. Other operations allow setting local default values for the conferencing system (e.g. a default position and size of the conferencing window). In addition, a number of basic window operations that are not part of the conferencing system are required from the underlying window system, such as close, move, resize, front, back, and redisplay.
3.6. Presentation to the User
To the user, the conferencing and editing system is presented as one application in a multitasking environment, represented by a conferencing window and temporary additional windows (e.g. the outlining window). Thus, the user is free to switch to other applications in the window system at any time.
4. Conclusions and Further Work
A number of conclusions can be drawn from the work we have completed so far:
[Figure: modified outlining tree with nodes marked (unavailable) and (free), next to the user view "1. Chapter: Example text of the first chapter" with the reserved objects highlighted]
Fig. 4. Modified outlining tree and user view of the reservation.
With respect to our perception of the requirements of a dedicated user group, the proposed functionality of the system is complete, including functionality for conference management, voting, joint editing, and document management. The system allows users to remain in their usual working environment, so that they may access their private data at any time during the conference. It includes the framework for future integration of various additional media types. The supported media types fully satisfy the needs of the dedicated user group.
The system contains the definition of a document model that, due to its hierarchical structure, provides an elegant solution to the problem of simultaneous access to objects by more than one user. The inclusion of media types like voice, video, etc., however, implies time dependencies between objects which are not covered by the underlying document model. Therefore, we have added timing and synchronization functionality to the model. In our opinion, this extension makes the ODA model a suitable architecture for collaborative editing of multimedia documents.
In addition to user functionality and document modeling, we have focussed on application layer protocol design and on the requirements for the underlying communication infrastructure. The communication infrastructure of the system is embedded in the OSI model, thus providing both standardized connectivity with other systems and solutions for LAN/WAN interworking. Prototyping of both the application layer protocols and the communication subsystem has revealed that the use of OSI protocols provides acceptable performance and delays for the end user in a LAN context. Public WAN communication services (e.g. packet switched public data networks, X.25) currently do not provide sufficient bandwidth to offer acceptable performance and end-to-end delays.
The development of new WAN technologies like ISDN, fast packet switching, etc. will, however, soon ease this situation, allowing synchronous multimedia conferencing systems to be used over both LAN and WAN. Further work items in the collaboration area have been identified, including the definition of functionality for conference scheduling and subgrouping mechanisms. Other work items remain in document modeling and in the communication area; in particular, a theoretical performance analysis of the conference architecture and of its performance requirements in both the LAN and the WAN context will have to be carried out to complete our experimental results.
References
[1] H. Gerloff, Vergleich und Bewertung von Dokumentenarchitektur-Modellen für Multimedia-Dokumente, ETH Zürich, Diplomarbeit, 1988.
[2] V. Glavinic, A structured view of document production, Technical Report 83-17, Technische Universität Berlin, Fachbereich 20, 1983.
[3] Information processing - Text and office systems - Office document architecture, 1988.
[4] Information processing - Text and office systems - Standard Generalized Markup Language (SGML), 1986.
[5] H. Lubich, Maus, Desktop, OCR und Lan - Neue Interaktionsmedien in der Bürowelt, Technische Universität Berlin, Fachbereich Informatik, Diplomarbeit, 1984.
[6] H. Lubich, MultimETH: Ein Beitrag zur Konzeption eines Echtzeit-Multimedia-Konferenzsystems, Eidgenössische Technische Hochschule Zürich, Abteilung Informatik, Dissertationsarbeit Nr. 20, 1989.
[7] M.T. Rose, The ISO development environment, version 6.0, Performance Systems International, Inc., 1990.
[8] M. Stefik, G. Foster, D.G. Bobrow, S. Lanning and D. Tatar, WYSIWIS revised: Early experiences with multi-user interfaces, in: Proc. Conference on Computer-Supported Cooperative Work, Austin, TX (1986).
[9] P. Wisskirchen et al., Informationstechnik und Bürosysteme, in: Leitfäden der angewandten Informatik (Teubner, Stuttgart, 1983).