Making SGML work


Computer Standards & Interfaces 18 (1996) 37-53

Introducing SGML into an enterprise and using its possibilities in advanced applications

Hans Holger Rath *, Hans-Peter Wiedling

Computer Graphics Center (ZGDV), Wilhelminenstrasse 7, D-64283 Darmstadt, Germany

Abstract

Nowadays more and more companies are evolving into worldwide operating enterprises embedded in interdependent networks of communication, information exchange and product manufacturing. An up-to-date enterprise is spread over several sites with distributed subsidiaries for administration, product launches and manufacturing. At the same time internationalization is combined with a strong need for efficient information exchange in an open system environment. Information management and dissemination are becoming key to success in every phase of the product life cycle. Any solution must take all these developments into account. With SGML-based documents and tools, a flexible approach can be taken towards open, system-independent document management with re-usable portions of information. This paper describes different aspects: the steps required to set up an SGML-based application are introduced, the ideas and benefits of Literate Programming for SGML applications are explained, database functionality in distributed environments is presented, and document exchange and hypermedia documents are considered.

Keywords: Document analysis and structuring; Tailored SGML applications; Literate Programming; Document exchange; Joint editing; Combined text and graphics tagging; World Wide Web; HyTime

1. Introduction

There is a need in industry to switch to electronic documents based on a structured and standardized format. Companies want to perform the whole documentation process with one uniform format which enhances open information interchange, reduces effort, and improves control over the process chain. Typical examples with many hundreds of parts suppliers and manufacturers are the automobile industry, the aerospace industry, and the health care sector. The last few years have shown that SGML [1,8] meets most of the mentioned needs and will extend the usability of electronic documents well into the future. This paper is divided into two major parts: a description of the process of “Making SGML work” in an enterprise and a presentation of advanced applications based on SGML.

* Corresponding author. Email: [email protected], URL: http://zgdv.igd.fhg.de/

0920-5489/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved. SSDI 0920-5489(95)00034-S


The first part (Section 2) explains the most important and common steps to set up an SGML application. These are:
• the analysis of the documents and the work flow,
• the formal specification and documentation,
• the selection of the tools,
• the implementation and customization, and
• the testing, training, and maintenance.

The second part (Section 3) of the paper presents new tools and ideas which enhance the possibilities of SGML or simplify the handling of SGML. These topics are:
• a new approach towards specifying and documenting a hypermedia SGML application in one interactive document (see 3.1),
• an overview of the requirements and the functionality of SGML databases (see 3.2),
• the exchange of electronic documents in a network environment (see 3.3),
• the vision of a powerful document server to support concurrent engineering and office management, as well as the preparation and execution of joint sessions of document editing in distributed environments (Computer Supported Cooperative Work, CSCW) in a LAN and WAN (see 3.4), and
• a survey of techniques and standards which support hypermedia documents (see 3.5).

We close this paper with a summary of all mentioned topics (Section 4).

2. Setting up an SGML application

2.1. Document and work flow analysis

The first step is document and work flow analysis. Document analysis has to determine the structure, formatting, and hyper-linking of the documents. The analysis must consider existing documents, style guides, corporate identity requirements, etc. This step is the most important part of working with SGML, because it influences the quality of the SGML application, just as the design phase in a software project influences the quality of the developed software. The analysis of the structure can be illustrated as shown in Fig. 1.

Furthermore, the current work flow during the document life cycle must be analyzed and the new work flow must be designed. Typically, different phases in the documentation process can be recognized: (1) integrating existing formats, editing and update services, (2) archiving, (3) distribution, and (4) usage and application of the information. In the early phase of developing an SGML application these generic steps are very helpful for determining which functionality belongs to which part of the process. It is important that the documents are syntactically correct and can be validated by an SGML parser in every step of the process chain.

Fig. 1. Analysis of the structure of a document. [Figure not reproduced: a sample page is shown in a layout view and a structure view, with its parts labelled as section heading, subsection heading, paragraph, list items (with a paragraph as part of a list item), cross reference, footnote, and highlighted phrases.]

2.2. Specification and documentation

The second step is the specification of the results of the analysis phase in standardized formats, using tools like parsers and DTD (Document Type Definition) editors. During the formalization it is important to look at existing DTD fragments for common document parts like equations and tables ([7] gives an overview of all relevant table DTDs). If the users of the documents need on-line access to the information, use of the World Wide Web (WWW, see 3.5) should be considered. This step is complete when the formal specifications are documented in a way that even SGML novices understand them. There are two widely used presentation styles to illustrate the specified document structure: syntax (or railroad) diagrams (Fig. 2) and tree diagrams (Fig. 3). If a company wants to use SGML for more than one document type, the re-use of common structures can reduce the costs of subsequent applications. Fig. 4 illustrates this aspect.

Fig. 2. Structure of document type “Doc” as syntax (railroad) diagram. [Figure not reproduced: railroad diagrams for the elements Doc, Section, Subsection, List, Paragraph, ListItem, Title, Heading, HighPhrase, Footnote, and CrossRef.]

Fig. 3. Structure of document type “Doc” as tree diagram (generated with Near & Far).

Fig. 4. Setup efforts can be minimized using an element library for common structures. [Figure not reproduced: three independent DTDs give a setup effort of 300%; three DTDs with 70% common structures give a setup effort of 160%.]

2.3. Selection of tools

The third step covers the selection of the appropriate SGML software tools for authoring, management/retrieval in databases, formatting, publication, delivery, and exchange of the SGML documents. Some selection criteria: support of SGML or at least a structured format, various DTDs, various layouts, tables/formulas/graphics, parsing on-the-fly, true WYSIWYG, publishing print quality, application programming interface (API), database access, CD-ROM support, hypertext. Which criteria are important and which are not depends on the application.

2.4. Implementation and customization

The specified SGML application is implemented with the selected SGML tools in the fourth step. This includes the customization of the tools for the needs of the customer. The customization requires an API of the tool, which should support a mainstream programming language (like C, C++, Lisp). Customizing an application does not only mean bringing the DTD to a specific platform; it also means tailoring the user interface to the requirements of the expected user group, e.g. reducing menu items to prevent wrong interactions and inconsistent states caused by a typist, supporting special keycodes to accelerate the work by avoiding unnecessary mouse movements, and adding menu items for database access or other additional features.

2.5. Testing, training, and maintenance

In the last (fifth) step the authors (that is, the end users) are trained on the SGML tools; later on, the various parts of the application are maintained or further document types are added. During field trials, user requirements and experiences that help to improve the acceptance of the new tools can be integrated very early.
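The numbers quoted in Fig. 4 can be reproduced with a small calculation. The sketch below is one reading of the figure, not a cost model stated in the text: it assumes that setting up a single DTD from scratch costs 100%, that the common structures are paid for once, and that every further DTD costs only its non-shared remainder. The function name and its parameters are illustrative.

```python
def setup_effort_percent(num_dtds: int, common_percent: int) -> int:
    """Total setup effort, in percent of the effort for one stand-alone DTD.

    Assumption: the first DTD costs 100 %; each further DTD re-uses the
    common structures and costs only its non-shared part.
    """
    if num_dtds < 1:
        raise ValueError("need at least one DTD")
    if not 0 <= common_percent <= 100:
        raise ValueError("common_percent must be between 0 and 100")
    return 100 + (num_dtds - 1) * (100 - common_percent)


print(setup_effort_percent(3, 0))   # three independent DTDs
print(setup_effort_percent(3, 70))  # three DTDs sharing 70 % of their structures
```

Under these assumptions three independent DTDs cost 300%, and three DTDs with 70% common structures cost 100% + 2 × 30% = 160%, which agrees with the values in Fig. 4.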

3. Advanced tools and applications

3.1. Integrated specification and documentation of hypermedia SGML applications

3.1.1. Overview

Setting up a conventional SGML application as described above results in formal specifications and in documentation readable by humans. Changes in the specifications easily cause inconsistency with the documentation. The problem could be solved using one interactive document which is comfortable to edit. The document contains the formal specifications and the documentation. The input of the specifications is supported by specialized graphic editors.

The system which will be described in this clause is based on the idea of “Literate Programming”. This term was introduced in 1984 by Donald E. Knuth. He wrote in [6]:

“I must confess that there may also be a bit of malice in my choice of a title. . . . By coining the phrase literate programming, I am imposing a moral commitment on everyone who hears the term; surely nobody wants to admit writing an illiterate program.”

Knuth introduces a new paradigm for programming which was supported by a system called WEB. He wrote:

“I chose the name WEB partly because it was one of the few three letter words of English that hadn’t already been applied to computers. . . . I think that a complex piece of software is, indeed, best regarded as a web that has been delicately pieced together from simple materials. We understand a complicated system by understanding its simple parts, and by understanding the simple relations between those parts and their immediate neighbours.”

In an SGML application scenario there are no programs, but there are formal specifications of the structure (using SGML), layout (using FOSI [10] or DSSSL [5]), and hypermedia elements (e.g. using HyTime [4], see 3.5) of the documents. Therefore, the term “Literate Specifying” seemed appropriate for the new technique to describe hypermedia SGML applications. The system we developed to support this technique is called SWEBS: SGML WEB System.

3.1.2. What is the real problem?

All formal specifications are based on well-defined “languages” which are designed by experts for experts. But if someone wants to process his documents with SGML, that person is an expert on his documents and not on SGML. So he needs a second expert, someone who knows SGML but usually nothing about the documents. It is the same with the layout and the hypermedia structures of the documents. Fig. 5 shows the people involved and their tasks. The structure, layout, and hypermedia experts work together with the document expert. They produce the formal specifications and the documentation for the author and the application programmer. Both have to know and understand the specifications of the experts.

Fig. 5. “Conventional” specification of an SGML application.

The formats that specify a hypermedia SGML application are based on ASCII and are therefore “readable”, but they are difficult to understand. Each format specifies its part of the whole application, but taken together they do not form one single integrated format. Fig. 5 hints at the three main problems:
• Inconsistency between specifications: Since the specification of the whole application is divided into three independent specifications with three different “languages” (e.g. SGML, FOSI/DSSSL, HyTime), inconsistencies can easily arise. Some examples: a structure element could exist without a layout definition, a context condition of a layout definition might never match the structure, or a simple typing error results in different element names in the document structure and the hypermedia structure.
• Too much know-how of expensive experts: Most specifications are keyed in with “stupid” ASCII editors. Therefore the user must be an expert in the format. He has to know almost everything about both syntax and semantics. But the user (another expert?) also has to know everything about the processed documents. During the specification process at least two experts are necessary.
• Inconsistency between specification and documentation: The documentation of the application is done separately from the specification. These are different files written at different times by different people. If the specification is modified, the changes must be duplicated in the documentation. But what happens if the responsible person is not available? The likelihood that specification and documentation become inconsistent after some time approaches 100%.

On the other hand, the documentation process needs more than one expert. They have to know everything about structure, layout, hypermedia, and the underlying documents. It would be much easier if the person who knows his documents could describe the SGML application with an easy-to-use tool. He should not need to know anything about the syntax of the formats. This was the main design goal during the development of SWEBS.

Fig. 6. Specification of an SGML application with SWEBS.

3.1.3. Solutions

The mentioned problems could be solved with these new concepts:
• Compound interactive document: The specifications of the structure, the layout, and the hypermedia structure are integrated in one homogeneous electronic document (SWEBS document, Fig. 6) with an easy-to-use graphical user interface. Although the SWEBS document is based on the mentioned formats, the user does not handle the syntax directly (but he can, if he is an expert).
• Graphical representation of formalisms: The graphical user interface covers the formalisms by representing them in a graphical manner. In particular, two complex concepts were examined in detail: the content model of an SGML element and the context condition of a layout definition.
• Graphical specification: Specialized graphical editors support the user in defining content models and context conditions without exact knowledge of the underlying formalism. The editors follow well-known concepts of user interface ergonomics, so they meet the needs of both novices and experts.
• Check functions to find inconsistencies: SWEBS offers checking functions to ensure both the consistency of the single formats and the consistency among the formats.
• Specification and documentation in one single document: The SWEBS document is designed according to the ideas of “Literate Programming (Specification)”. This means that the SWEBS document contains both the specification and the documentation in a way which is readable by humans and not only by computers. All specification and documentation parts may be combined in any order. They are linked together with cross references (like a web). The reader of the SWEBS document (application programmer, author) should be able to read it like a “good book”.
• Automated generation of the setup data for the authoring system: SWEBS extracts and re-orders all specification parts out of the SWEBS document and generates either files in the selected formats (SGML, FOSI/DSSSL, HyTime) or all setup data which are necessary for an authoring system to run the SGML application.

3.1.4. Architecture of SWEBS

SWEBS is one tool in the arena of SGML software. It covers the analysis and the specification step in the process of setting up an SGML application. Further tools in the process chain are the authoring system, the database, the delivery system, and the document viewer. The interface of SWEBS to the other tools is SGML and other standardized formats. But SWEBS can also be tightly connected to one selected authoring system. In such a case, SWEBS directly generates proprietary data for the SGML application in this authoring system.

The SWEBS document is itself an SGML application. There is a SWEBS DTD and a self-description of the SWEBS document in a SWEBS document. This was one of the first “real” applications described with SWEBS.

The SWEBS software is based on an SGML authoring system with a complete application programming interface (API). The specialized graphical editors are separate programs which communicate with the SWEBS kernel in the authoring system. The check functions and the generation functions of the formats and the setup data are coded as API functions (Fig. 7).

3.1.5. Status of SWEBS and conclusion

SWEBS is still under development. The SWEBS DTD is quite stable. The graphical DTD editor, the generator of the setup data, and the interface to SGML are mostly completed. Up to now, no decision has been made whether to support FOSI or DSSSL; for the moment a self-defined layout language is used, which is very close to DSSSL. The self-description of SWEBS within SWEBS shows that SWEBS is a powerful but easy-to-use tool to define a complex SGML application and to produce easy-to-understand documentation. The idea of “Literate Programming” is very well applicable to the specification of SGML applications.

Fig. 7. Architecture of SWEBS and its cooperation with the authoring system. [Figure not reproduced: the labelled components are SWEBS with its generator of setup data, the authoring system, the user, author, and application programmer, the document structure (SGML DTD), SWEBS documents and SGML documents (each as proprietary format and as SGML instances), specification, creation and update, and a printer.]

3.2. SGML databases

The benefits of SGML documents increase when they are stored in (SGML) databases. Only databases ensure fast access to specific information out of the tremendous amount of electronic documents in a company. The tasks of a database are storage, version control, consistency, and retrieval of data. The use of SGML documents enriches the possibilities compared with documents in other formats. The collection of SGML documents is the database. This means that every piece of tagged information in an SGML document can be retrieved easily. The combination of SGML and databases improves the advantages of both.

Table 1
Features needed in the database for production, publishing, and retrieval. [The original table marks each feature with an asterisk per database type; these marks are not legible in this copy, so only the feature list is given.]
• Access control, authentication, identification, security
• Accounting
• Consistency checks (DTD, cross references)
• Access to document parts
• Check-in/out
• Locking
• Version management
• Retrieval (keyword, full-text, SGML structure)
• Generating of views (e.g., a catalog)
• Adopting information out of SGML instance
• Release process with check for completeness
• On-line access in a LAN/WAN
• Information about updates, effectivity, and status of document
• Annotations (private/public hyperlinks, notes, bookmarks)
• Work flow information
• Integration of other formats (text, graphics)
• Application Programming Interface

Fig. 8. Integration of the three database types. [Figure not reproduced: it shows the production DB, the publishing DB, and the retrieval DBs, the roles author, product manager, and user of document, and the formats paper, SGML, RTF, Interleaf, and Frame.]

3.2.1. Three database types

Databases that deal with (SGML) documents must support various parts of the document life cycle: authoring (production), publishing, and retrieval. The production database coordinates the input of the authors. The publishing database contains the released documents. The user/reader of the documents wants to retrieve specific information out of the retrieval database. These three databases need to be segregated only at the conceptual level. They could be one or two physical databases. Table 1 lists the needed features of the three database types and Fig. 8 shows their integration.

3.2.2. Relational vs. fulltext database

A relational database stores an SGML instance as one unit with some keywords. These keywords may be extracted from the instance during the check-in process. When a user searches for a specific term in the documents, the database will find only those documents which have this term as a given keyword. If necessary, the size of a “document” can be reduced to a part of a document, but the technique and the keyword-dependent results will be the same. A fulltext database offers access to the whole contents of the document. But in many cases too many documents will satisfy one search expression. An SGML fulltext database is the logical solution. It offers the advantages of a fulltext search combined with a context-dependent search in user-selectable parts (namely, SGML elements) of the documents. All operations of the database work on the element unit, even check-in/out and version control. This gives the best flexibility for all three mentioned database types: production, publishing, and retrieval.

3.2.3. Database-publishing and printing on demand

The publishing database, which contains all released documents, could be used to produce highly tailored documents (e.g. a catalog which lists the documents in a customer-specific view). The documents and their parts can easily be re-arranged and printed directly out of the database (re-use without re-keying of the information). Furthermore, all the publications could be printed when the customer wants a copy and not before. This concept can reduce the costs of publications with limited editions.

3.3. Exchange of SGML-coded documents

As mentioned above, on the one hand the need for cross-company exchange of complex documents increases, and on the other hand documents have to be archived for maintenance reasons over a very long period of time. For long-term archiving, standards play an important role. It is important to support standards not only for archiving and re-use but also for the exchange of documents in heterogeneous environments. Available networks should be used for the exchange of SGML-coded documents in order to avoid media breaks (i.e. printing the document on paper, sending it, and re-typing at least parts of the document).

But sending an SGML document is not as easy as sending an ordinary text file. An SGML-based document comprises the document instance, references the DTD, and, at least in technical applications, contains several references to graphical data. The difficulty is that transmitting an SGML-coded document is not supported by any of the commercial off-the-shelf SGML tools. The sender himself has to take care that he sends all necessary files, and even worse: who ensures that the receiver of the document knows how to handle each file and integrates it correctly into the receiver’s working environment? This is time consuming and error prone. The problems increase when the size of the document increases. Only support of the whole document exchange in a controlled manner can guarantee completeness and consistency.

These are the reasons why we decided to develop a carrier for SGML-coded documents. The document is packed into an envelope-like file, some administration data is added, and it is sent by normal means to the receiver, where it is unpacked. In this development EDI [2], CALS-Tape [9], and SDIF [3] concepts have been considered.

For packing the information the following process is performed: the SGML instance is parsed for references to other files. This process can be configured according to the requirements of a special DTD. All needed files are automatically gathered in a temporary directory, compressed, converted to 7-bit ASCII, and packed (Fig. 9). The unpack program at the receiver’s side executes these steps in reverse order. In addition, file names are adapted to system-specific requirements. Therefore the anchors of the references in the SGML instance are also modified to allow smooth integration into the new environment (Fig. 10).

Currently, we are considering adding digital fingerprint information to our carrier file to enhance security during transmission and to avoid unauthorized access.

Fig. 9. Sending an SGML-coded document. [Figure not reproduced: the labelled steps are collect, compress, and encode.]

Fig. 10. Receiving an SGML-coded document. [Figure not reproduced: the labelled steps are decode, uncompress, and distribute.]

Fig. 11. Example of a documentation scenario distributed between two sites. [Figure not reproduced: authors at enterprise site A and enterprise site B each hold private documents and share a cooperatively edited joint document and a database.]

Fig. 12. Layer model for the joint editing of technical documentation. [Figure not reproduced: site A and site B (headquarters, subsidiary, supplier) each stack technical documentation on advanced tools (editor, database, delivery) and standards (SGML, HyTime and others); the sites are connected through telecommunication services, with CD-ROM and database as further media.]
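The packing and unpacking steps of Section 3.3 (Figs. 9 and 10) can be sketched as follows. This is a minimal illustration, not the carrier format developed at ZGDV: it assumes that external files are referenced through SYSTEM identifiers, it uses a gzip-compressed tar archive for collecting and compressing, and it uses Base64 for the conversion to 7-bit ASCII. The function names, the file names, and the in-memory dictionary that stands in for the file system are all hypothetical; the adaptation of file names and the administration data are left out.

```python
import base64
import io
import re
import tarfile

# Assumption: referenced files appear as SYSTEM identifiers, for example
#   <!ENTITY fig1 SYSTEM "fig1.cgm" NDATA cgm>
# A real carrier would be configured per DTD, as the text notes.
SYSTEM_REF = re.compile(r'SYSTEM\s+"([^"]+)"')


def referenced_files(instance_text: str) -> list[str]:
    """Return the file names referenced by SYSTEM identifiers, in order."""
    return SYSTEM_REF.findall(instance_text)


def pack(instance_name: str, files: dict[str, bytes]) -> bytes:
    """Collect the instance and its referenced files, compress, and encode.

    `files` maps file names to contents (a stand-in for the file system).
    The returned carrier contains only 7-bit ASCII characters.
    """
    text = files[instance_name].decode("ascii")
    # dict.fromkeys removes duplicate references while keeping the order.
    names = list(dict.fromkeys([instance_name] + referenced_files(text)))
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:  # collect + compress
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            archive.addfile(info, io.BytesIO(files[name]))
    return base64.encodebytes(buffer.getvalue())  # encode as 7-bit ASCII


def unpack(carrier: bytes) -> dict[str, bytes]:
    """Reverse the steps of pack(): decode, uncompress, and distribute."""
    raw = base64.decodebytes(carrier)
    result = {}
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as archive:
        for member in archive.getmembers():
            result[member.name] = archive.extractfile(member).read()
    return result


files = {
    "manual.sgm": b'<!DOCTYPE doc SYSTEM "doc.dtd" [\n'
                  b'<!ENTITY fig1 SYSTEM "fig1.cgm" NDATA cgm>\n]>\n'
                  b"<doc>...</doc>\n",
    "doc.dtd": b"<!ELEMENT doc - - (#PCDATA)>\n",
    "fig1.cgm": b"\x00\x01\x02 binary graphics data",
}
carrier = pack("manual.sgm", files)
print(unpack(carrier) == files)
```

Because the sender derives the file list from the instance itself, a forgotten DTD or graphic shows up as an error at packing time rather than at the receiver’s site, which is the completeness guarantee the text asks for.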

3.4. The vision of a document server

Besides document exchange there is also an increasing need for tools for concurrent engineering of documents. A document server (Fig. 11) embedded in the network could fulfill such requirements. In particular, short-term changes (e.g. updates, country-specific modifications) and browsing through bulky documents cannot be handled by exchanging the whole data. This is especially true when the document is very large and comprises information equivalent to several thousand pages. The document server has to support concurrent documentation management as well as the preparation and execution of joint sessions of document editing in environments distributed over wide areas. It can take advantage of the fact that telecommunication companies provide access to convenient networks and infrastructure as basic services (Fig. 12). Advanced tools for document editing, archiving, and management are becoming available on the market. These tools can be used and combined with the advantages of SGML for joint editing: SGML documents are well structured, and parts of a document can be identified and accessed directly.
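The claim that parts of a structured document can be identified and accessed directly can be illustrated with a short sketch. SGML itself is not XML; the sample below uses well-formed XML-style markup as a stand-in so that Python’s standard library can parse it. The element names follow the document type of Fig. 2 (Doc, Section, Subsection, Title, Paragraph); the id attribute, the sample text, and the function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# A small structured document; every addressable part carries an id.
SAMPLE = """
<Doc>
  <Section id="s1">
    <Title>Introduction</Title>
    <Paragraph>This text is the introduction.</Paragraph>
  </Section>
  <Section id="s2">
    <Title>Language constructs</Title>
    <Subsection id="s2.1">
      <Title>Entities</Title>
      <Paragraph>Entities are defined within the DTD.</Paragraph>
    </Subsection>
  </Section>
</Doc>
"""


def find_part(doc, part_id):
    """Return the first element whose id attribute equals part_id, or None."""
    return doc.find(f".//*[@id='{part_id}']")


root = ET.fromstring(SAMPLE)
part = find_part(root, "s2.1")

# Only this small part, not the whole document, would have to be exchanged.
print(part.tag, "-", part.findtext("Title"))
print(ET.tostring(part, encoding="unicode").strip())
```

A joint editing session could exchange just the serialized subsection and leave the rest of a document of several thousand pages where it is.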

Special requirements result from the number, size, and complexity of the documentation, as well as from the emerging relevance of interactive multimedia information. The administration of such documentation is closely related to the selection of an appropriate document format. The results of work groups are fixed in documents, and these results are required for the subsequent work flow. Engineers involved in the development of a product do concurrent documentation. The work in progress has to be delivered, integrated with other document parts, and commented on. It is the basis for discussion. On the one hand, documentation reflects the ongoing work in developing new features; on the other hand, existing documentation (e.g. standards, test reports) has to be considered to ensure the quality of the development. The results of a working group in one phase of a larger work flow are the starting point for succeeding phases: specification, design, development, part production, manufacturing, maintenance, recycling (Fig. 13). Examples of such documents are service information, instruction manuals, repair manuals, diagnostic reports, training information, and technical data. Furthermore, electronic documents are important for long-term archiving and reuse after several years.

The entire document comprises many different data types. Drafts, design studies, technical drawings, tables, and formulas, as well as explanations describing the behavior and the physical characteristics of the product, are two sides of the same coin: in the end all parts have to be combined in the product documentation.

Fig. 13. Documentation exchange is required during the whole product life cycle. [Figure not reproduced: the labelled phases include specification, development, construction, (part) production, maintenance, and recycling.]

3.4.1. Concurrent document assembly

If we look more precisely at the document assembly process we can identify several steps:
• Creation, integration, and updating: In this part of the process the documentation is generated. Individual documents are generated, individual documents link to and (re)use documents or parts of documents, and documents are composed of different documents. The different parts of the documents are put together and linking is performed, but not only to text documents. In technical drawings, for example, it is very important for all people involved in the documentation that they can easily recognize which object is meant by a description such as “part B-011234 in Fig. 43.2”.
• Archiving and dissemination: The documents are archived in databases and the information is indexed, classified, and prepared for retrieval, reuse, and application.
• Usage and application of the information: The documents are applied on the shop floor, in the engineering process, in presales talks, etc.

In addition, documents are created and modified by different individuals or groups in the distributed environment. They generate textual and graphical information (e.g. tables, formulae, business graphics), integrate test results, and insert data from a scanner or a video camera. In this step the documentation is structured and provided to other peers for referencing and reuse. Additional information is generated out of the available information (e.g. cross reference lists, table of contents, glossary, index, etc.).

Whenever people, i.e. designers, technical engineers, manufacturers, salesmen, or service engineers, talk about product results they achieved, it is recorded in some kind of documentation. It is the goal of the document server to support the combination of individual, group-specific, and company-wide information.

3.4.2. Objective of a document server

It is the intention to
• support short-term changes in joint document editing sessions.
• support long-term archiving, retrieval, dissemination, and reuse of strategic documents.
• support the gathering of information in a time- and cost-effective way and reduce the time-to-market in the long term.
• support the composition and decomposition of information that is complex but well-structured.
• support the integration and linking of documents and parts of documents. Both textual and non-textual information should be handled in an intuitive and unique way.
• support functionality to manage the work flow of documents.
• apply available systems: advanced tools are available for the handling of different aspects of the described scenario. It is the goal to apply and combine existing software and to come closer to a “plug-and-play” situation in a rather complex application. It is not a goal to re-invent existing components.

3.4.3. Technical approach to joint editing

Distributed editing, archiving, retrieval, interchange, delivery, and comfortable viewing of SGML-based hypermedia documents require advanced tools. The editor should provide a context-sensitive user interface which “knows” the document structure. The archiving and viewing tool should also “know” the structure. This is necessary to offer context-dependent retrieval of the SGML documents to the user. In addition to the server’s main functionality, gateway functionality to and from the server has to be offered. Document conversion is necessary for easy integration of other data and for preparing the structuring of the data.

Authoring complex and large documents normally involves more than one editor. The editors will not sit together during the whole process of document preparation. At the same time, the need to cooperate on documents or on smaller parts of documents increases, even when the authors are separated by long distances. Joint editing of and communication about the document can be based on the document itself without exchanging the whole document. Because it is very simple to address small but well-structured subparts of an SGML document, the best way to support distributed editing of documents is to exchange just these small document parts. (Small) parts of a document are reserved by a particular author; he changes them and makes the changes publicly available. At the same time another user could be browsing information or editing other parts of the same document.

Another application scenario could be that an editor is driven remotely. For discussion, an editor could be driven to a specific location in the document. For performance reasons the document could be preloaded before the session starts, or at run time only those parts of the document that are really needed are transferred. The first steps to enhance an editor in the way described above are just being implemented at ZGDV.

3.5. Hypermedia documents

In the near future electronic documents will be a composition of interactive hypermedia objects. The formats applied have to offer mechanisms to describe how to define and integrate link anchors in non-textual objects like graphics and video. One widely used system is the World Wide Web (WWW). With its HyperText Markup Language (HTML), WWW is SGML-oriented in the way documents are described, and links to other objects can be embedded in a document. Furthermore, it takes advantage of the availability of the Internet to access documents that are spread world-wide. Through the Common Gateway Interface (CGI) mechanism, it is possible to integrate other formats and viewers.

A problem that is especially well-known in the context of document servers in a WWW environment is that when a document is requested only a single file is transmitted. The functionality of sending a complete document with all relevant links is not offered by WWW servers. The user has to search for the missing parts by himself and send a request for every missing part. The completeness of the document is not guaranteed until the provider of the document confirms it by manual checks. That is one reason why we have implemented a carrier for SGML-based documents (see Section 3.3). In the next section we want to focus on another important development that has SGML roots, i.e. HyTime.

3.5.1. HyTime - Hypermedia/Time-based Structuring Language

HyTime (Hypermedia/Time-based Structuring Language, ISO/IEC 10744, [4]) defines a language for addressing multimedia information objects (like text, graphics, audio, video) or parts of these objects, for describing the links between the addressed objects, and for synchronizing static data (text, graphics) with time-based data (audio, video). HyTime is based on SGML. It defines elements and attributes in a meta-DTD and gives them their semantics. The defined HyTime constructs are called Architectural Forms. Because HyTime is based on SGML, it fits all the requirements for linking hypermedia SGML documents together. The HyTime linking techniques also cover every other format of multimedia object; only the addressing of the portion of data must be defined in an application-specific manner. HyTime stores the addressing, linking, and synchronizing information either inside the SGML file or outside the affected multimedia objects in separate files. The latter has some advantages: (i) the same multimedia object can be used in more than one hypermedia application, (ii) link anchors can be placed even in read-only files, and (iii) link anchors can be placed in (binary) files without changing the file.

HyTime is divided into six modules (Base Module, Measurement Module, Location Address Module, Hyperlinks Module, Scheduling Module, Rendition Module). Because not every HyTime application needs all Architectural Forms and the modules comprising them, HyTime software (a HyTime Engine) is adjustable. The Base Module must be used/supported by every HyTime application/Engine. All other modules are optional.
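The advantages of storing link information outside the affected objects can be illustrated with a minimal Python sketch. It is not HyTime syntax and not part of any HyTime Engine: the names `Anchor`, `Link`, and `resolve`, the object names, and the byte-offset addressing scheme are all assumptions chosen for illustration. HyTime leaves the addressing of non-SGML data to the application, and byte offsets are just one possible choice.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """A span inside a media object, addressed from outside the object.

    Hypothetical addressing scheme: byte offsets [start, end).
    """
    object_name: str
    start: int
    end: int


@dataclass(frozen=True)
class Link:
    """A link between two anchors, kept in a separate link table."""
    source: Anchor
    target: Anchor


def resolve(anchor: Anchor, objects: dict[str, bytes]) -> bytes:
    """Return the bytes an anchor points at. The object is only read."""
    data = objects[anchor.object_name]
    if not (0 <= anchor.start <= anchor.end <= len(data)):
        raise ValueError(f"anchor out of range for {anchor.object_name}")
    return data[anchor.start:anchor.end]


# Two objects that are never modified (bytes values are immutable,
# standing in for read-only or binary files). Names are hypothetical.
objects = {
    "manual.sgm": b"<p>See part B-011234 in the drawing.</p>",
    "drawing.bin": bytes(range(16)),
}

# The link table lives apart from both objects.
part = b"B-011234"
start = objects["manual.sgm"].index(part)
links = [
    Link(
        source=Anchor("manual.sgm", start, start + len(part)),
        target=Anchor("drawing.bin", 4, 8),
    )
]

source_text = resolve(links[0].source, objects)
target_bytes = resolve(links[0].target, objects)
```

Because the link table is separate, the same `drawing.bin` could be referenced by a second table belonging to another application, and neither object has to be writable.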


Fig. 14. Modules of HyTime and their dependencies. Legend: the link symbol between two modules means that one requires the other module.

Some modules require other modules (Fig. 14). Each module defines further Architectural Forms belonging to the subject of the module. Only the Base Module defines Architectural Forms with overlapping aspects; these define the behavior of the other modules and of the whole application.
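How an adjustable engine might determine which modules it has to support can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `modules_to_support` is hypothetical, and the dependency table `REQUIRES` is an illustrative placeholder, since the edges of Fig. 14 are not reproduced in the text. The actual dependencies should be taken from the figure or from ISO/IEC 10744 [4]. The only rule taken from the text is that the Base Module is always required.

```python
# ASSUMPTION: illustrative dependency table (module -> directly required
# modules). These edges are placeholders, not a transcription of Fig. 14.
REQUIRES = {
    "base": set(),
    "measurement": {"base"},
    "location address": {"base"},
    "hyperlinks": {"base", "location address"},
    "scheduling": {"base", "measurement"},
    "rendition": {"base", "scheduling"},
}


def modules_to_support(requested):
    """Return the requested modules plus everything they require.

    The Base Module is always included, as stated in the text.
    """
    needed = {"base"}
    stack = list(requested)
    while stack:
        module = stack.pop()
        if module not in REQUIRES:
            raise KeyError(f"unknown module: {module}")
        if module in needed:
            continue  # already handled, avoids repeated work
        needed.add(module)
        stack.extend(REQUIRES[module])
    return needed


rendition_set = modules_to_support(["rendition"])
minimal_set = modules_to_support([])
```

With the placeholder table, an application that only requests the Rendition Module would also pull in the Scheduling, Measurement, and Base Modules, while an application that requests nothing still gets the Base Module.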

4. Conclusion

This paper has given a short overview of introducing SGML to an enterprise and has described some advanced developments at ZGDV which take advantage of the SGML format and available SGML tools. Generally speaking, management of documents means that lots of files have to be handled, and different experts are involved in different phases of the process. We showed that whenever something has to be modified after introducing SGML, the risk of inconsistency is present. One approach to solving this problem is Literate Programming as applied in SWEBS. Another problem that may cause inconsistency during the exchange of SGML documents has been recognized and solved: we presented a carrier for SGML-based documents. The document server offers direct access to documents in a networked environment. For the near future we see a strong need to integrate real multimedia in terms of supporting authoring environments, the linking of different documents, and synchronization of different media streams with respect to time and location. ZGDV has developed or will develop tools which fulfill the described ideas and concepts.

Acknowledgement

The major developments described in this paper are sponsored by the German Ministry of Commerce (Bundesministerium für Wirtschaft) under grant no. 68504, and derived from the WISE project that is performed for the CEC. WISE stands for World Wide Information Support for R&D Efforts. We would like to thank our colleagues Jürg Hofmeyer and Erik Meißner as well as our students for contributing to the success of the projects.

References

[1] International Organization for Standardization, Information processing - Text and office systems - Standard Generalized Markup Language (SGML), ISO 8879:1986, ISO, Geneva, 1986.
[2] International Organization for Standardization, Electronic data interchange for administration, commerce and transport (EDIFACT) - Application level syntax rules, ISO 9735:1988, ISO, Geneva, 1988.
[3] International Organization for Standardization, Information processing - SGML support facilities - SGML Document Interchange Format (SDIF), ISO 9069:1988, ISO, Geneva, 1988.
[4] International Organization for Standardization, Information technology - Hypermedia/Time-based Structuring Language (HyTime), ISO/IEC 10744:1992, ISO/IEC, Geneva, 1992.
[5] International Organization for Standardization, Information technology - Text and office systems - Document Style Semantics and Specification Language (DSSSL), ISO/IEC DIS 10179:1994, ISO/IEC, Geneva, 1994.
[6] D.E. Knuth, Literate Programming, Computer J. 27 (2) (1984) 97-111.
[7] H.H. Rath, Tabellen in SGML, ZGDV Report 7.5/93 (German), Computer Graphics Center (ZGDV), Darmstadt, 1993.
[8] H.H. Rath, SGML - Eine Einführung (German), in: Proc. Workshop SGML in der Praxis, GI (Gesellschaft für Informatik) Fachgruppe 4.9.2 “Multimediale elektronische Dokumente”, Heidelberg (April 1994).
[9] U.S. Department of Defense, Markup requirements and generic style specification for electronic printed output and exchange of text, MIL-M-28001A, U.S. Department of Defense, 1990.
[10] U.S. Department of Defense, Markup requirements and generic style specification for electronic printed output and exchange of text - Appendix B: Output specification, MIL-M-28001A, U.S. Department of Defense, 1990.
[11] H.-P. Wiedling, Elektronische Speicherung und Handhabung von Dokumenten (German), to be published in DIN-Mitteilungen, Beuth Verlag, Berlin, 1995.


Hans Holger Rath studied Computer Science at the Technical University of Karlsruhe, Germany, and received his diploma in computer science in 1990. Since 1990 he has been a member of the scientific staff of the Computer Graphics Center (ZGDV) in Darmstadt, where he has headed the department “Document Computing” since the beginning of 1995. He is also the head of the “Competence and Application Development Center for Online-Publishing (CADENCE)” at ZGDV. Furthermore, he is a member of the German DIN committees “ANP ESHD” and “NABD 13.2” and the international ISO committee “ISO/ITSCG WGI”, where he is one of the technical advisers for SGML. At ZGDV he works in the areas of structured electronic documents, online-publishing, hypermedia documents, CSCW on structured documents, and specification of SGML/HyTime applications.

Hans-Peter Wiedling received a diploma in computer science from the Technical University of Darmstadt, Germany, in 1988 and a Ph.D. in computer science from the same university in 1993. Since 1992 he has headed the department “Graphical User Interfaces and Applications” in the Computer Graphics Center. Since 1994 he has headed the working group in ZGDV on electronic documents, and since March 1995 he has been co-managing a forum for information and communication technology transfer in a regional area. Currently he is responsible for the development of an object-oriented user interface management system with multi-media extensions and manages a variety of projects that improve Human Computer Interaction (e.g. video integration into a process control system). Within this department WWW-related work is performed, e.g. extending WWW to an easy-to-use in-house information and communication system. He has worked since 1988 at the Fraunhofer Institute for Computer Graphics in Darmstadt in the integration division, prepress and publishing systems department. Wiedling participated in the development of an interpreter for page description languages (i.e. PostScript) and a raster image processor. He was also involved in a project for multi-purpose data transfer in the graphic arts industry. His current research interests are multi-media extensions and online information systems.