ODA: a document architecture for open systems


R Hunter*, P Kaijser† and F Nielsen‡ discuss the purpose and coverage of the Office Document Architecture standard

The main concepts and aims of the new ISO 8613 standard, generally referred to as Office Document Architecture (ODA), which has been jointly published by the International Organization for Standardization and the Consultative Committee on International Telephone and Telegraph, are described here. ODA provides a general information architecture which can be used as a basis for encoding documents so that they can be transferred between dissimilar document processing systems. The architecture supports the representation of multi-media documents (containing text, raster graphics and computer graphics) in both revisable and final forms. ODA is expected to have a significant role in the integration of office systems, particularly those that operate in an Open Systems Interconnection environment.

Keywords: office information systems, Office Document Architecture, document interchange, Open Systems Interconnection

The recent growth in the use of personal computers and decentralized computing in the office allows users to create a wide range of document types containing, e.g. text, images, computer graphics and data. This has emphasized the need to be able to transfer information electronically both within and between office systems. There are two main aspects concerning the interchange of information: the communication protocols to be used; and the format of the information to be transferred.

The first of these aspects is being tackled by the development of the Open Systems Interconnection (OSI) communication protocol standards. These aim to define communication standards that will allow computer systems from different manufacturers to communicate with one another. The standards are now maturing, and applications based on them are currently being developed.

The second aspect (information format) can now be satisfied with the use of a new multi-part standard, commonly referred to as ODA (Office Document Architecture), published by the International Organization for Standardization (ISO)¹. In addition, the Consultative Committee on International Telephone and Telegraph (CCITT) has published a set of standards in the form of the T.410 series of recommendations². Apart from two small extensions, one specific to ISO and one specific to CCITT (see below), the two standards are identical. (ODA is used here to refer to both sets of standards.)

The prime objective of ODA is to provide for the representation and encoding of documents so that they can be transferred between different systems irrespective of their manufacture, and so that those systems can have a common understanding of the data that makes up those documents.

One of ODA's most important features is that it provides for the representation of documents in two principal forms, i.e. the processable form, which allows a document to be revised by a recipient, and the formatted form, which allows the precise layout of the document to be specified. ODA also provides for the transfer of documents in formatted processable form; in this form the content of the document is represented in both formatted and processable forms at the same time.

In addition, ODA provides for the representation of multi-media documents. At present, ODA standardizes three content types: character text; raster scanned images; and computer graphics. These content types conform to other existing standards, as shown below.

In general, ODA provides features to support a wide range of different language and cultural requirements, and thus the standard can be used for the worldwide interchange of electronic documents. Also, ODA encoded documents can be transferred using any form of communications, including storage media and OSI communication systems such as Message Handling Systems (MHS)³ and File Transfer, Access and Management (FTAM)⁴.

ODA is a large and complex standard, and only a broad introduction to its capabilities, technical features and potential applications can be given here.

*British Telecom Research Laboratories, Martlesham Heath, Ipswich IP5 7RE, UK
†Siemens AG, Otto Hahn Ring 6, D-8000 Munich 83, FRG
‡National Institute of Standards and Technology, Gaithersburg, MD 20879, USA

0140-3664/89/020069-11 $03.00 © 1989 Butterworth & Co (Publishers) Ltd
vol 12 no 2 april 1989

HISTORICAL BACKGROUND/CURRENT ODA STATUS

In 1981 the European Computer Manufacturers' Association (ECMA) and ISO started work on ODA, and close liaison was established with CCITT shortly after. ISO completed its work on ISO 8613¹ early in 1988, and has approved it for progression as an International Standard (IS). In conjunction with this, in February 1988, CCITT approved the T.410 series of recommendations², which will be identical to ISO 8613. These recommendations were published in the CCITT 1988 Blue Book.

However, there have been two interim standards relating to ODA. The first was CCITT Recommendation T.73⁵, which was published in the 1984 Red Book. This only provided for formatted documents. At the same time, CCITT also approved recommendations defining applications based on T.73 (see T.72 and T.6 in the 1984 Red Book). T.73 is to be replaced in the Blue Book by the T.410 series of recommendations.

In addition, ECMA, who contributed substantially to the early development of ODA, published ECMA-101 in 1985⁶. This was a more advanced standard than T.73, and catered for formatted, processable and formatted processable form documents. ECMA-101 was used as a basis for developing prototypes of ODA-based systems. Also, it is planned to issue a revised version of ECMA-101 which is aligned with ISO 8613.

ISO 8613 and the equivalent T.410 series contain seven parts:

ISO 8613-1 (T.411): Introduction and General Principles;
ISO 8613-2 (T.412): Document Structures;
ISO 8613-4 (T.414): Document Profile;
ISO 8613-5 (T.415): Office Document Interchange Format (ODIF);
ISO 8613-6 (T.416): Character Content Architectures;
ISO 8613-7 (T.417): Raster Graphics Content Architectures;
ISO 8613-8 (T.418): Geometric Graphics Content Architectures.

The two standards are identical except that ISO 8613 provides for the encoding of documents using either ASN.1⁷⁻⁹ or Standard Generalized Markup Language (SGML)¹⁰, whereas the T.410 series includes only the ASN.1 encoding method. The T.410 series also specifies an interface to MHS³. Both standards cater for Videotex applications (note that part 3 is reserved for future use). CCITT has approved a new series of communication protocols for use in conjunction with the T.410 series: the T.430 series¹¹. Both the T.410 and T.430 series will be used as the basis of further document interchange services that will be developed during the next study period (1989 to 1992).

Figure 1. Two views of a document's structure (the logical view and the layout view of the document-specific content)

ODA PROCESSING MODEL

The key concept used in the ODA processing model is that a document can be described in three ways. As illustrated in Figure 1, a document can be viewed in terms of its logical components or its layout components. A document can also be described in terms of both its logical and layout components at the same time.

In the logical view, the content of a document is described in terms of components that have a meaning to the user, e.g. chapters, sections, paragraphs, headers, titles, footnotes, figures and captions. In the layout view, the document is described in terms of components that relate to its presentation, e.g. how the document content is divided into pages, and how portions of the content are positioned within the pages. (How a document can be represented in accordance with these views is described below.)

Figure 2 provides an outline of the document processing model on which ODA is based. The three types of documents that are distinguished in this model are described in turn.

Processable form documents

When a user creates and edits a document in a typical local text processing system, the content of a document is usually stored in what is referred to as processable form. In this form, the document has not yet been laid out in a form suitable for reproduction by an output device. A processable form document is one which is described in terms of its logical components. The document may or may not contain information concerning how it is to be laid out and presented by the layout process. It may also contain rules and other information on how to edit the document. Interchanging documents in this form is thus very useful, since the recipient can easily modify both its content and structure, and can also modify the intended layout and presentation of the document.

Formatted form documents

A formatted form document is one in which the layout and presentation of the document are specified completely. Thus, a formatted document is one which is described in terms of its layout features, and which contains all the information required for an output device to print or display the document as intended by the originator. Documents in this form are not intended to be revised by the recipient. This form is very useful when interchanging finalized documents or other documents that are not intended to be modified.

computer communications

Figure 2. Document processing model outline (the processable form document, the formatted processable form document and the formatted form document, related by the layout process and the imaging process)

Formatted processable form

This is the most powerful and versatile form provided by ODA, since it enables a document to be represented and encoded in both processable and formatted form at the same time, i.e. the document is described both in terms of its logical and layout features. In this form, a document can be edited and reformatted according to information specified by the originator, or it can be displayed by the recipient as it was laid out by the originator. The important feature of this form is that although both the processable and formatted forms of the document are interchanged, the content relating to both forms is the same, and is only interchanged once.

ODA does not define the actual processes (e.g. filing, retrieval, creation, editing, formatting, imaging) that might be carried out on documents in an end system. However, ODA is concerned with interchanging documents with sufficient information to support these processes. The purpose of the ODA processing model is to explain how the information that makes up a document may be used in local document processing.

In particular, one important feature of ODA is the description of the reference model of the document layout (or formatting) process. This reference model provides a means of defining the semantics of documents in processable form, and how this form of document is intended to be displayed by an end system. The actual layout process used in an end system depends on each particular system, but the results of the layout process are expected to be in accordance with the reference model. (In this short review, the reference model is not described further.)
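The relationship between the three forms and the two views of a document can be summarized in a short Python sketch. It is an illustration only: the names DocumentForm, structures_carried, can_revise and can_image_as_laid_out are hypothetical and are not taken from the standard.

```python
from enum import Enum


class DocumentForm(Enum):
    """The three forms in which a document can be interchanged."""
    PROCESSABLE = "processable"
    FORMATTED = "formatted"
    FORMATTED_PROCESSABLE = "formatted processable"


def structures_carried(form):
    """Return the views (logical, layout) in which a document of this form is described."""
    if form is DocumentForm.PROCESSABLE:
        return {"logical"}
    if form is DocumentForm.FORMATTED:
        return {"layout"}
    return {"logical", "layout"}


def can_revise(form):
    """Revision by the recipient relies on the logical description being present."""
    return "logical" in structures_carried(form)


def can_image_as_laid_out(form):
    """Reproducing the originator's layout relies on the layout description being present."""
    return "layout" in structures_carried(form)


for form in DocumentForm:
    print(form.value, can_revise(form), can_image_as_laid_out(form))
```

Only the formatted processable form answers yes to both questions, which is why the text describes it as the most versatile of the three.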

DOCUMENT ARCHITECTURE MODEL

Part 2 of the ODA standard defines a document architecture model which provides a range of features that can be used when describing a document. These features are well defined in the standard so that different systems can have a common understanding of documents when they are interchanged.

The key concept of the model is that a clear distinction is made between the document content and the structural characteristics of the document. In the model the structure and content of a document are completely independent. The advantages of this approach include:

• the structure of a document can be manipulated independently of the content and without a need to understand the content;
• more than one content type can be integrated in a document, irrespective of its structure;
• it is possible to specify the structure of a document without specifying its content, and vice versa.

These advantages are realised by taking an approach which divides and subdivides a document into its component parts; these are referred to as objects. The purpose of this is to distinguish between parts of a document that have, e.g. different logical functions within the document, different layout and presentation requirements or different content types and encoding.

The rules for defining the structural relationships between the objects and their characteristics are defined by what is known as the structural model (see below). The method of representing the document components and their characteristics is then provided by what is known as the descriptive model (see below). In this model the characteristics of each object are described by means of a set of attributes, each of which specifies a particular property of the object and its relationships with other objects.

The content of the document is divided into a number of broadly defined types, e.g. text, computer graphics, data, etc. The structure of each of these content types is described by means of a set of rules known as a content architecture. An important feature of ODA is that there is a clearly defined interface between the structure of a document and its content. Hence, the standard can theoretically provide for the representation of documents containing any type of content (see below).

STRUCTURAL MODEL

There are three main elements of the structural model: specific structures; generic structures; and the content. The structures of particular documents are defined by means of specific structures. Generic structures define object classes and document classes, which provide the means to generate objects and complete documents with common characteristics, respectively. The third element concerns the document content and how it relates to the document structures.

Figure 3. Layout structure of a document (a page divided into a header frame, columns represented by frames, and a footer frame; the blocks within the frames hold text or images)

Specific structures

In the ODA document architecture model, the logical and layout views of a document are represented by a specific logical structure and a specific layout structure, respectively (see Figure 1). The specific layout structure is used to describe documents in formatted form. The specific logical structure is used to describe documents in processable form. Formatted processable form documents are documents which are described both in terms of a logical and a layout structure.

Layout structure

Figure 3 illustrates the layout features of a typical document. In terms of the ODA structural model, this layout structure is represented in the form of a hierarchical tree structure of objects which have a well defined order (illustrated in Figure 4). Non-terminal nodes in the structure are referred to as composite layout objects and terminal nodes are called basic objects. The top level is the document layout root, and below this are the objects of the types page set, page, frame and block.

Figure 4. Hierarchical layout structure (document layout root; page sets, levels 1 to n; pages; frames, levels 1 to n; blocks; content)

Page sets are optional, and may be used to distinguish between groups of pages that have a specific function in a document, e.g. a set of pages having the same format or corresponding to a chapter in the document. There may be more than one level of page set in a structure, and so a page set can be subdivided into groups of other page sets.

Objects of the type page correspond to a unit of the presentation surface. The interpretation of this unit is the responsibility of the application, e.g. a single object of the type page may be reproduced on a single sheet of paper.


Frames are used to partition pages into different areas, e.g. to divide the page into one or more columns of content. Frames are also optional, and like page sets there may be more than one level of frame in the layout structure. Thus, areas within a frame can be delineated by other frames. A particular feature of frames is that their position and dimensions can be defined as fixed or variable within a page. Fixed means that the position and dimensions are predetermined; variable means that the position and dimensions can be specified relative to other frames, and are dependent upon the amount of content to be placed within the frame. This is a very powerful feature, since it provides for a wide variety of possible page layouts.

Blocks are the lowest level objects in the structure, and cannot be subdivided. They act as containers for the document content, and are used to position the content within frames and pages. Only one type of content may be placed within a block, and so blocks are used to delineate portions of content that have different characteristics. Composite images made up of different content types may be formed by overlaying blocks.

ODA defines a co-ordinate and measurement system that allows different precisions to be used when positioning and dimensioning layout objects. In addition, the end system can scale the document so that it can be reproduced on different media, e.g. paper, soft copy devices and microfiche.
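The layout hierarchy described above can be sketched as a small tree type in Python. This is a minimal illustration under stated assumptions: the class LayoutObject, the table ALLOWED_CHILDREN and the function blocks_in_order are hypothetical names, and the table encodes only what the text says (page sets and frames are optional and may nest, blocks cannot be subdivided). Whether frames and blocks may be mixed at one level is not covered by the text and is not checked here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Object types that may appear directly below each layout object type.
# Page sets and frames are optional levels, so a root may hold pages
# directly and a page may hold blocks directly.
ALLOWED_CHILDREN = {
    "root": {"page set", "page"},
    "page set": {"page set", "page"},
    "page": {"frame", "block"},
    "frame": {"frame", "block"},
    "block": set(),  # basic object: cannot be subdivided
}


@dataclass
class LayoutObject:
    kind: str
    children: List["LayoutObject"] = field(default_factory=list)
    content_type: Optional[str] = None  # only meaningful for blocks

    def add(self, child):
        """Attach a subordinate object, rejecting combinations the hierarchy forbids."""
        if child.kind not in ALLOWED_CHILDREN[self.kind]:
            raise ValueError(f"a {self.kind} cannot directly contain a {child.kind}")
        self.children.append(child)
        return child


def blocks_in_order(obj):
    """Yield the blocks of a layout structure depth first, in their defined order."""
    if obj.kind == "block":
        yield obj
    for child in obj.children:
        yield from blocks_in_order(child)


# A one-page document with a single column frame holding two blocks.
root = LayoutObject("root")
page = root.add(LayoutObject("page"))
column = page.add(LayoutObject("frame"))
column.add(LayoutObject("block", content_type="character"))
column.add(LayoutObject("block", content_type="raster graphics"))
print([block.content_type for block in blocks_in_order(root)])
```

Each block carries a single content type, so a passage of text followed by a scanned image needs two blocks, as in the example.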

Logical structure

As in the case of layout structures, the logical structure may be represented in terms of a tree structure (see Figure 5). The top level in the structure is called the document logical root. Below this there may be one or more levels of nodes representing composite logical objects. The lowest level nodes are called basic logical objects. Like blocks, basic logical objects act as containers for the document content, and each contains only one type of content. The branches in the tree structure represent the division of the composite objects into their subordinate components. A composite object therefore represents the structural relationships between groups of subordinate objects.

Figure 5. Hierarchical logical structure (document logical root; composite logical objects, levels 1 to n; basic logical objects; content portions)

The precise purpose of each object in the logical structure is not defined in the ODA standard. It is the user who determines how a particular document is defined in terms of a logical structure, and what the components of that structure represent. This is in contrast to the layout structure, in which the purpose of each object is defined (see above). Thus, a document may be represented simply as a sequence of objects called paragraphs. Alternatively, the user may wish to indicate how the paragraphs are arranged into sections, chapters and annexes; this can be achieved by using an appropriate logical structure. Also, ODA allows logical objects to be linked to appropriate layout objects, making it possible not only to describe the logical semantics of various parts of a document, but to specify how each object is to be laid out and presented.

An example of a logical structure is shown in Figure 6, which outlines how the logical components of a document, and their hierarchical relationships, can be specified.

Figure 6. Specific logical structure example (a document logical root divided into front material, with date, title, author and summary, and a body, with paragraphs and a figure made up of a diagram and a caption; each basic object holds content)

Document content

The content of a document is divided into components called content portions. Each content portion is associated with basic objects of the specific structures that exist in the document. In formatted and processable form documents, each content portion is associated with a basic layout object or a basic logical object respectively; in a formatted processable form document, each content portion is associated both with a basic layout and a basic logical object.
Each content portion contains content information, such as text, raster graphics or geometric graphics information, in encoded form, and any additional information relating to the coding of the content. The content information pertains to a particular type of content, i.e. it is not possible to mix different content types within one content portion.

In principle, for each content type, three content architecture classes may be defined, i.e. formatted, processable and formatted processable content architecture classes (see below for further information). Formatted content is content for which all necessary information relating to its layout has been specified. Such content is intended to be imaged but not edited. Processable content is content which has not been laid out and is suitable for editing. Formatted processable content is a combination of the other two forms, and can be imaged as specified, or edited if required.
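The association between content portions and basic objects in a formatted processable form document can be sketched as follows. The sketch is illustrative only: the classes ContentPortion and BasicObject and the identifiers used are hypothetical. It shows one content portion referenced from both a basic logical object and a basic layout object, so that the content itself is held once.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentPortion:
    content_type: str  # e.g. "character", "raster graphics", "geometric graphics"
    data: bytes        # the encoded content information


@dataclass
class BasicObject:
    identifier: str
    portions: List[ContentPortion] = field(default_factory=list)

    def attach(self, portion):
        """Associate a content portion, keeping to a single content type per object."""
        if any(p.content_type != portion.content_type for p in self.portions):
            raise ValueError("a basic object holds only one type of content")
        self.portions.append(portion)


text = ContentPortion("character", b"ODA separates structure from content.")

paragraph = BasicObject("logical: paragraph 1")  # basic logical object
block = BasicObject("layout: block 1")           # basic layout object (block)
paragraph.attach(text)
block.attach(text)

# Both structures refer to the same portion; the content is not duplicated.
print(paragraph.portions[0] is block.portions[0])
```

In a processable form document only the first association would be present, and in a formatted form document only the second.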

Generic structures

Many of the objects that are present in a specific layout or logical structure that represents a particular document can be classified into groups of objects that have identical or similar characteristics, e.g. the paragraphs in this paper all have identical characteristics. Also, it can be seen that the layout of the pages of this paper is very similar: the size of each page is the same, and the number of columns on each page is the same.

It will also be observed that a particular document can often be regarded as belonging to a group of similar documents, e.g. all technical reports produced by an organization may be drafted in accordance with a predetermined specification, i.e. with a standard front matter, standard items in the contents list (such as management summary), standard paragraphs (such as a copyright notice), and specific rules for laying out the content on the pages. Similarly, letters from a particular company are likely to have a standard layout with a particular logo. Such a group is referred to as a document class.

This has led to the concept of object classes to represent the common information that is applicable to parts of a document that have similar characteristics. Alternatively, object classes can be used collectively to represent document classes (see above). Whether they are used individually or collectively, a set of logical object classes and a set of layout object classes are referred to as a generic logical structure and a generic layout structure respectively.

Object classes

An object class corresponds to any of the types of objects that make up a specific layout or logical structure, e.g. page classes, frame classes and composite logical object classes. One of the main purposes of object classes is that they can act as containers for common information required by a number of objects in the specific layout or logical structure. An object in a specific structure can access the common information by referencing an appropriate object class. A significant saving in the information needed to represent a document can be achieved using this method. In addition, object classes can be used during the document creation and editing process as 'templates' for the creation of parts of a document that have similar features.
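The way an object obtains common information by referencing an object class can be sketched in Python. This is a minimal illustration, not the mechanism defined in the standard: the classes ObjectClass and SpecificObject, and the attribute names 'width', 'height' and 'columns', are hypothetical, and letting a value given on the object itself take precedence over the class is an assumption made for the example.

```python
class ObjectClass:
    """Holds information common to all objects that reference this class."""

    def __init__(self, name, **attributes):
        self.name = name
        self.attributes = attributes


class SpecificObject:
    """An object in a specific structure, optionally referencing an object class."""

    def __init__(self, object_class=None, **attributes):
        self.object_class = object_class
        self.attributes = attributes

    def get(self, name):
        # Assumption: a value specified on the object itself is used first;
        # otherwise the value is taken from the referenced object class.
        if name in self.attributes:
            return self.attributes[name]
        if self.object_class is not None and name in self.object_class.attributes:
            return self.object_class.attributes[name]
        raise KeyError(name)


# One page class shared by every body page of a hypothetical two-column paper.
body_page_class = ObjectClass("body page", width=210, height=297, columns=2)

page_1 = SpecificObject(body_page_class)
page_2 = SpecificObject(body_page_class, columns=1)  # one page departs from the class

print(page_1.get("columns"), page_2.get("columns"), page_2.get("width"))
```

The page dimensions are stored once, in the class, rather than once per page, which is the saving in information that the text refers to.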
Document classes

A document class is a set of object classes that is able to control the creation of a set of complete documents with common characteristics. Consider, for example, a document of the document class 'report'. The logical structure of such a class might be defined as follows:

Report ::= Front matter
               Logo
               Title
               Author's name
               Summary
           Body
               Sequence of chapters
                   Sequence of sections
                       Sequence of paragraphs
           Annexes
           Index

The layout characteristics corresponding to this document class could be represented in a similar way. (Note that the above example consists of only an outline, and is not intended to be a complete definition of the class.)

The logical and layout characteristics of document classes can be defined using ODA by means of a 'complete' generic logical structure and a 'complete' generic layout structure, respectively. The characteristics of each logical component in the document class, e.g. the logo, title or index, are represented by a logical object class. Similarly, layout components are represented by layout object classes.

Figure 7 illustrates the generic logical structure for the above example. This illustrates a document class that consists of a SEQuence of the component 'front matter' followed by the component 'chapter', which can be REPeated. Each chapter contains a title and a number of sections, etc.

Figure 7. Generic logical structure example (object classes: document logical root, front matter, logo, title, author's name, summary, chapter, title, section, paragraph, content; construction labels: SEQ, REP, OPT, CHO)

A 'complete' generic structure can be regarded as a set of rules for controlling the generation of specific structures. These rules are contained in construction expressions in the object classes. The generic logical structure is used to control the creation and editing of documents, and the generic layout structure is used to control the generation of the specific layout structure when the document is formatted (see also below). Thus, document classes are a particularly important feature within the processing model.

It should also be noted that generic structures support many other important features that are useful in document processing, e.g. automatic numbering can be provided for pages, chapters, sections, etc., and it is possible to provide automatic references from one part of a document to another (e.g. references to footnotes and figures).

Generic content

In the same way that content portions can be associated with objects in the specific structure, it is possible to associate generic content portions with object classes in generic structures. This allows for the specification of content which is common to more than one part of the document, e.g. a header or logo which appears on every page of a document.

DESCRIPTIVE MODEL

The descriptive model is concerned with the constituents that can make up a document when it is interchanged. A document may contain up to eight different types of descriptions: logical object and object class descriptions; layout object and object class descriptions; content portion descriptions; layout styles; presentation styles; and the document profile. (These are described further in the following subsections, and which of these descriptions may be contained in each of the three forms of document described in the above section on the ODA processing model is explained below.)

In general, a constituent consists of a set of attributes that describes the characteristics of that constituent. Each attribute has a unique name, and may specify a well defined range of possible values, e.g. the attributes 'object identifier', 'position', 'dimensions' and 'subordinates' may be associated with an object of the type frame. These respectively serve to identify the object, to define its position relative to its containing object, to define its horizontal and vertical dimensions, and to identify the subordinate objects which are contained within that frame.

Document constituents

Object and object class descriptions

These descriptions contain information concerning the characteristics of the objects and object classes that make up a document. This information includes the hierarchical and the non-hierarchical relationships that each component has with other components in the document.

Content portion descriptions

Content portions are represented by content portion descriptions. These descriptions contain the encoded content information itself, and any additional information relating to the encoding method used.
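The construction expressions that object class descriptions carry in a generic structure can be read as a small grammar over subordinate object classes. The Python sketch below checks whether a list of subordinates conforms to such an expression. It is an illustration under stated assumptions, not the notation of the standard: the tuple representation and the function names ends and conforms are hypothetical; SEQ and REP follow the reading given in the text for Figure 7; OPT and CHO are read here as 'optional' and 'choice'; REP is taken to mean one or more occurrences; and treating the logo as optional is an assumption made for the example.

```python
def ends(expr, items, start):
    """Return every position at which a match of expr beginning at start can end."""
    if isinstance(expr, str):  # a single object class name
        matched = start < len(items) and items[start] == expr
        return {start + 1} if matched else set()
    op, *args = expr
    if op == "SEQ":  # each argument in turn, in the given order
        positions = {start}
        for arg in args:
            positions = {e for p in positions for e in ends(arg, items, p)}
        return positions
    if op == "CHO":  # exactly one of the alternatives
        return {e for arg in args for e in ends(arg, items, start)}
    if op == "OPT":  # present or absent
        return {start} | ends(args[0], items, start)
    if op == "REP":  # assumed here: one or more occurrences
        reached, frontier = set(), ends(args[0], items, start)
        while frontier:
            reached |= frontier
            frontier = {e for p in frontier for e in ends(args[0], items, p)} - reached
        return reached
    raise ValueError(f"unknown construction operator: {op}")


def conforms(expr, items):
    """True if the whole list of subordinates is generated by the expression."""
    return len(items) in ends(expr, items, 0)


# Hypothetical expressions for the 'report' document class.
REPORT = ("SEQ", "front matter", ("REP", "chapter"))
FRONT_MATTER = ("SEQ", ("OPT", "logo"), "title", "author's name", "summary")
CHAPTER = ("SEQ", "title", ("REP", "section"))

print(conforms(REPORT, ["front matter", "chapter", "chapter"]))
print(conforms(REPORT, ["chapter", "front matter"]))
print(conforms(FRONT_MATTER, ["title", "author's name", "summary"]))
```

An editing process could apply such a check each time a subordinate is added or removed, which is one way of reading the statement that the generic logical structure controls the creation and editing of documents.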

the document creation date, which may be of interest to the user and can be used in automatic machine processing, e.g. for filing and retrieval.

Styles DOCUMENT ARCHITECTURE CLASSES

The information relating to the layout and presentation of objects and object classes is specified in separate constituents called styles. Each object and object class may refer to an appropriate style, of which there are two types: layout styles; and presentation styles. Layout styles are referred to by logical components, and specify information relating to the layout of those logical objects, e.g. a layout style can be used to control the placement of a'paragraph' at a particular position on a specified page. Presentation styles are referred to by basic logical and layout objects, and specify information relating to the layout and imaging of content portions. Different sets of presentation attributes are applicable to different types of content. For character content, presentation attributes specify, e.g. the indentation required, the character and line spacings, and the font to be used. More than one object or object class may refer to the same style, and this mechanism facilitates document editing, e.g. changing a style will affect all objects that refer to that style. This mechanism also improves transmission efficiency.

ODA defines three document architecture classes: formatted document architecture class; processable document architecture class; and formatted processable document architecture class. Figure 8 shows which of the eight possible types of constituent may be present in a document corresponding to each of the three document architecture classes. These three classes correspond to the three ways in which a document can be interchanged (see above). Figure 9 illustrates the document processing model in more detail, and shows that the creation of a processable form document is carded out under control of the generic logical structure. The generic logical structure also provides factorization information when laying out the document. The layout process creates a specific layout structure, and this is carded out under control of the generic layout structure, layout styles and presentation styles.

Document profile

One further constituent of a document remains to be described: the document profile. This contains information which relates to the document as a whole. The document profile provides two types of information. First, it provides technical information concerning the document, including the type of structures and content used in the document. This information enables the receiving equipment to determine whether it is capable of fully processing the document without the need to parse the whole document. Second, the document profile contains document management information, such as the author's name and

CONTENT ARCHITECTURES

ODA currently caters for three content architectures; the main features of each are summarized in the following subsections. The definition of a content architecture contains the following information:

• The specification of the graphic elements, control functions and encoding methods to be used.
• A definition of the principles of positioning the graphic elements within basic layout objects.
• A description of the content layout process; this is a reference model that defines what the result of a content layout shall be when transforming content belonging to the logical structure to content belonging to the layout structure.
• A definition of the attributes which may be applied to the content; these provide information relating to the layout and imaging of the content.

Figure 8. Document processing model details. [Diagram labels: generic logical structure (constraints on specific logical structure); editing process; specific logical structure; layout styles; presentation styles; generic layout structure (constraints on specific layout structure); layout process, comprising the document layout process and the content layout process; specific layout structure; imaging process; content; processable form document; formatted form document.]
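Because the document profile names the structures and content architectures used in a document, a receiver can compare them with the ones it implements before parsing anything else. The following is a minimal sketch of that check, not the attribute set defined by the standard: the classes `DocumentProfile` and `ReceiverCapabilities`, their field names and the string labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentProfile:
    # Technical information (hypothetical field names).
    architecture_class: str
    content_architectures: frozenset
    # Document management information, e.g. the author's name.
    management: dict = field(default_factory=dict)


@dataclass
class ReceiverCapabilities:
    architecture_classes: frozenset
    content_architectures: frozenset


def can_fully_process(profile, receiver):
    """Decide from the profile alone whether the receiver can cope."""
    return (profile.architecture_class in receiver.architecture_classes
            and profile.content_architectures
            <= receiver.content_architectures)


profile = DocumentProfile(
    architecture_class="processable",
    content_architectures=frozenset({"character", "raster graphics"}),
    management={"author": "A N Other"},
)
text_only = ReceiverCapabilities(
    architecture_classes=frozenset({"processable", "formatted"}),
    content_architectures=frozenset({"character"}),
)
print(can_fully_process(profile, text_only))
```

Here the text-only receiver can reject or downgrade the document immediately, since the profile declares raster graphics content that it does not support.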

Character content architectures

Part 6 of the ODA standard defines formatted, processable, and formatted processable character content architecture classes. These make use of ISO 6937 [12] and any registered subrepertoire of ISO 6937 (e.g. the teletex subrepertoire [13] and the character repertoire of ISO 8859 [14]). Also, different fonts can be selected in accordance with ISO 9541 [15, 16]. Part 6 provides a large range of layout features including alignment, tabulation, writing directions of 0, 90, 180 and 270 degrees to the horizontal, indentation, itemization and first line offsets, parallel annotations, subscripts/superscripts, different modes of emphasis, pairwise kerning, and fixed or proportional line and character spacing.

Raster graphics content architectures

Part 7 of the ODA standard provides for two-dimensional, two-tone raster scanned images which are encoded in accordance with the facsimile encoding methods defined in CCITT Recommendations T.4 [17] and T.6 [18]. A 'bitmap' encoding scheme is also provided. Two content architecture classes are defined: formatted form, which can only be used in formatted form documents; and formatted processable form, which can be used in any of the three forms of document. Formatted processable form is the more flexible form. It can be laid out in accordance with a specified picture element and line spacing, or in accordance with size constraints specified on the resultant image. In the latter case, the image resolution is determined when the content is laid out.
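The two layout options for formatted processable raster content amount to simple arithmetic, sketched below. The function names and the abstract length unit are hypothetical, and the second function assumes square picture elements scaled uniformly to the largest size that fits; the standard's own layout rules may differ.

```python
def size_from_spacing(pels_per_line, lines, pel_spacing, line_spacing):
    """Option 1: spacing is specified, so the image size follows from it."""
    width = pels_per_line * pel_spacing
    height = lines * line_spacing
    return width, height


def spacing_from_size(pels_per_line, lines, max_width, max_height):
    """Option 2: size is constrained; spacing is derived at layout time.

    Assumption: square pels, so one spacing value serves both directions.
    """
    if pels_per_line <= 0 or lines <= 0:
        raise ValueError("image must contain at least one pel")
    return min(max_width / pels_per_line, max_height / lines)


# An image of 1000 pels by 500 lines.
print(size_from_spacing(1000, 500, pel_spacing=4, line_spacing=4))
print(spacing_from_size(1000, 500, max_width=2000, max_height=2000))
```

In the second case the resolution is not fixed by the originator: it is the reciprocal of the spacing that the layout process chooses to satisfy the size constraint.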

Geometric graphics content architecture

Part 8 of the ODA standard provides for computer graphic images, as defined in Computer Graphics Metafile (CGM) [19, 20]. Any binary encoded CGM image is a valid content in ODA documents. A formatted processable content architecture class is the only class defined, and this can be used in any of the three forms of document. This content architecture class has many similarities with the formatted processable raster graphics content architecture class, including an aligned reference content layout model.

OFFICE DOCUMENT INTERCHANGE FORMAT

Part 5 of the ODA standard defines the allowed formats of the data streams that may represent ODA documents. The Office Document Interchange Format (ODIF) data stream consists of a set of interchange data elements, each of which represents one of the allowed constituents of a document. The internal structure of the data elements is defined using Abstract Syntax Notation One (ASN.1), as specified in ISO 8824 [8]. Only the ASN.1 encoding rules defined in ISO 8825 [9] are specified for coding ODA documents.
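To give a feel for what an ASN.1 data element looks like on the wire, the sketch below hand-encodes a tiny structure using the tag-length-value scheme of the ISO 8825 basic encoding rules. It is illustrative only: the element shown (a SEQUENCE holding an INTEGER and an OCTET STRING) is hypothetical and is not one of the interchange data elements defined in Part 5, and only non-negative integers and definite lengths are handled.

```python
def ber_length(n):
    """Encode a definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def ber_tlv(tag, contents):
    """Assemble one element as tag octet, length octets, contents octets."""
    return bytes([tag]) + ber_length(len(contents)) + contents


def ber_integer(value):
    """Universal tag 0x02; this sketch supports non-negative values only."""
    if value < 0:
        raise ValueError("negative integers are outside this sketch")
    # One extra bit keeps the sign bit clear in two's complement.
    size = value.bit_length() // 8 + 1
    return ber_tlv(0x02, value.to_bytes(size, "big"))


def ber_octet_string(data):
    """Universal tag 0x04, primitive encoding."""
    return ber_tlv(0x04, data)


def ber_sequence(*elements):
    """Universal tag 0x30 (constructed SEQUENCE) wrapping encoded elements."""
    return ber_tlv(0x30, b"".join(elements))


element = ber_sequence(ber_integer(5), ber_octet_string(b"ODA"))
print(element.hex())
```

Because every element carries its own tag and length, a receiver can step through a data stream element by element and skip those it does not need to interpret.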

vol 12 no 2 april 1989

In addition, in order to provide a gateway with systems based on SGML (defined in ISO 8879 [10]), ISO 8613 specifies an SGML encoding of ODA documents such that a one-to-one mapping between the two can be performed. (Note that SGML was originally intended for the representation of documents supported by the publishing industry [7].) The SGML representation of ODA documents is included in ISO 8613, but not in the T.410 series of recommendations.

APPLICATIONS

ODA is a 'base' standard which offers a wide range of different sets of features. As a result, there has recently been much activity in the development of functional profiles that will specify subsets that are appropriate for particular applications. Subsets of the ODA standard are referred to as Document Application Profiles (DAPs). DAPs will be published by various standards-making organizations, e.g. by CCITT in the context of telematic service recommendations, by ISO in the form of International Standardized Profiles (ISPs), by CEN/CENELEC in Europe as European Norms (ENs), and by the US Federal Government as Federal Information Processing Standards (FIPS).

CCITT has published four DAPs relating to ODA in the 1988 Blue Book: T.501 [21], T.502 [22], T.503 [23] and T.504 [24]. The Blue Book will also contain other recommendations specifying how these DAPs are to be used in CCITT telematic services. Additional recommendations will define the equipment characteristics and communication requirements for these services.

Additionally, user/manufacturer groups are now engaged in the development of a hierarchically related set of DAPs that will provide for the interchange of documents ranging from simple text documents to highly structured multi-media documents containing text and graphics (examples of the latter are documents that can be produced using desk-top publishing systems). T.502 [22] is expected to form the lowest level in this hierarchy. The groups involved in this include the US National Institute of Standards and Technology (NIST), the European Workshop for Open Systems (EWOS), and the Interoperability Technology Association for Information Processing (INTAP), Japan. Joint meetings of these groups are being held to produce internationally harmonized profiles for publication as ISPs. It is also expected that there will be close liaison with CCITT in this work.

There are now many initiatives concerning the implementation of ODA. Two of the most important are the European Strategic Programme for Research and Development in Information Technology, Piloting of ODA (ESPRIT-PODA) project in Europe, and the Experimental Research in Electronic Submission (EXPRESS) project in the USA. The participants in PODA interworking demonstrations are ICL (UK), Bull (France), Siemens (Germany), Olivetti (Italy) and Océ (The Netherlands). Additional organizations are expected to join the PODA 2 project, which is now under review. The EXPRESS project is mainly being carried out by Carnegie Mellon University and the University of Michigan in collaboration with several other organizations, including the NIST. These projects have similar aims, i.e. to demonstrate the use of ODA to interchange information between different proprietary systems, to develop document editing systems that can fully exploit ODA's potential, and to further enhance ODA's capabilities.

In addition, ISO, in conjunction with other organizations, are currently studying ODA conformance testing methodologies and test suites. These can be used as a basis for developing software tools which will facilitate conformance testing [25].

EXTENSIONS TO ODA

ISO, in conjunction with CCITT, are developing a framework for future extensions to ODA. The areas to be studied include: document access and manipulation functions (these will provide support for enhanced applications such as remote document editing and data entry); colour information; the use of data in documents to provide for applications such as spreadsheets, processable tables and business graphics; security features; annotations and control of document revision; sound (i.e. audio and voice information); and enhanced layout features.

CONCLUSION

ODA is the first international standard that takes a comprehensive view of the needs for the interchange of multimedia documents in both revisable and final forms. ODA provides a general information architecture that allows systems of different manufacturers to have a common understanding of interchanged documents. ODA is thus expected to play a dominant role in the integration of office and communication equipment, particularly for equipment adhering to OSI. Furthermore, ODA is expected to gain rapid acceptance, as it has been adopted as the basis of all worldwide CCITT telematic services for the interchange of documents.

REFERENCES*†

1 Information Processing: Text and Office Systems; Office Document Architecture and Interchange Format (ISO 8613) Parts 1-8 (1989)
2 Open Document Architecture and Interchange Format CCITT T.410 series of recommendations (1988)
3 Message Handling Systems CCITT X.400 series of recommendations (1984 & 1988)
4 Information Processing Systems -- Open Systems Interconnection -- File Transfer, Access and Management (ISO 8571) Parts 1-4 (1987)
5 Document Interchange Protocol for the Telematic Services CCITT Recommendation T.73 (1984)
6 Office Document Architecture ECMA-101 (1985)
7 Smith, J 'Standard Generalized Markup Language and related standards' Comput. Commun. Vol 12 No 2 (April 1989) pp 80-84
8 Information Processing Systems -- Open Systems Interconnection -- Specification of Abstract Syntax Notation One (ASN.1) (ISO 8824) (1987)
9 Information Processing Systems -- Open Systems Interconnection -- Basic Encoding Rules for Abstract Syntax Notation One (ASN.1) (ISO 8825) (1987)
10 Information Processing Systems -- Text and Office Systems -- Standard Generalized Markup Language (ISO 8879) (1987)
11 Document Transfer and Manipulation: Services and Protocols CCITT T.430 series of recommendations (1988)
12 Information Processing: Coded Character Sets for Text Communication (ISO 6937) Parts 1-3 (1985)
13 Character Repertoire and Coded Character Sets for the International Teletex Service CCITT Recommendation T.61 (1988)
14 Information Processing -- 8-bit Single-byte Coded Graphic Character Sets (ISO 8859) Parts 1-3 (1987)
15 Information Processing -- Font and Character Information Interchange (ISO 9541) (to be published)
16 Smura, E, Beeton, B, Savage, K and Griffee, A 'Font Information Interchange Standard ISO/IEC/9541' Comput. Commun. Vol 12 No 2 (April 1989) pp 93-96
17 Standardization of Group 3 Facsimile Apparatus for Document Transmission CCITT Recommendation T.4 (1988)
18 Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus CCITT Recommendation T.6 (1988)
19 Information Processing Systems -- Computer Graphics -- Metafile for the Storage and Transfer of Picture Description Information (ISO 8632) Parts 1-4 (February 1987)
20 Mumford, A 'Why care about the Computer Graphics Metafile?' Comput. Aided Des. Vol 19 No 8 (October 1987) pp 425-430
21 A Document Application Profile MM1 for the Interchange of Formatted Mixed Mode Documents CCITT Recommendation T.501 (1988)
22 A Document Application Profile PM1 for the Interchange of Processable Form Documents CCITT Recommendation T.502 (1988)
23 A Document Application Profile for the Interchange of Group 4 Facsimile Documents CCITT Recommendation T.503 (1988)
24 Document Application Profile for Videotex Interworking CCITT Recommendation T.504 (1988)
25 Carr, R and Dawson, F 'Conformance testing of Office Document Architecture (ODA)' Comput. Commun. Vol 12 No 2 (April 1989) pp 102-106

* International Standards (IS) are available from the International Organization for Standardization Central Secretariat, 1 rue de Varembé, Case Postale 56, CH-1211 Geneva 20, Switzerland.
† CCITT Recommendations are available from CCITT, Place des Nations, CH-1211 Geneva 20, Switzerland.

Roy Hunter graduated from the University of Manchester in 1968 and later received his PhD degree from the University of Salford. He joined British Telecom Research Laboratories in 1970, since when he has been concerned with the development and standardization of telecommunication systems. For the last 5 years he has been involved in the development of the ODA standard in CCITT and ISO. Mr Hunter currently chairs a CCITT SGVIII special rapporteurs group studying the development of functional profiles based on ODA.

computer communications

Per Kaijser received his doctoral degree in quantum chemistry from the University of Uppsala, Sweden, in 1974. He continued with research in physics and chemistry at universities in Canada, Denmark, Germany, Sweden and the USA until 1980, when he started work with Siemens AG. He has been employed as a systems architect in office communications at Siemens since 1984, and shares joint responsibility for the development of the ODA standard. Dr Kaijser is an active member of several standardization committees in DIN, ECMA, ISO and CCITT.


Frances H Nielsen received her BSc in computer science from the American University, Washington DC, and has done graduate work in information systems at the University of Maryland. She is currently an MSc candidate at Johns Hopkins University, Baltimore. Mrs Nielsen is project leader for the development of FIPS related to document architecture and interchange formats in the Office Systems Engineering Group at the NIST. She is a member of ANSI standards committee X3V1, vice-chair of the NIST ODA SIG, and of the TOP Document Architecture Subcommittee.
