Pergamon
Information Processing & Management, Vol. 33, No. 5, pp. 599-614, 1997 © 1997 Elsevier Science Ltd All rights reserved. Printed in Great Britain 0306-4573/97 $17 +0.00
PII: S0306-4573(97)00020-4
LOGICAL STRUCTURE OF A HYPERMEDIA NEWSPAPER
JANNE SAARELA*, MARKO TURPEINEN, TUOMAS PUSKALA, MARI KORKEA-AHO and REIJO SULONEN
Department of Computer Science, Helsinki University of Technology, 02150 Espoo, Finland
Abstract--The OtaOnline project at the Helsinki University of Technology has been distributing Finnish newspapers such as Iltalehti, Aamulehti and Kauppalehti on the Internet since 1994. The editors produce the electronic counterpart of these papers by a conversion process from QuarkXPress documents to HyperText Markup Language. The project is about to step into a new phase by introducing an approach which provides many new features not available in the old process. This paper describes an object-oriented approach which implements the logical model of a hypermedia newspaper. This model encapsulates the structure of the hypermedia documents as well as their capability for transforming into different presentation formats. It also provides a semantic rating mechanism to be used with intelligent agents. A distribution scheme which enables efficient use of this model is also presented. © 1997 Elsevier Science Ltd
1. INTRODUCTION
1.1. OtaOnline
OtaOnline is a testbed, implemented at the campus of the Helsinki University of Technology, for experimenting with the possibilities of the net media and electronic publishing. The testbed has been built around an existing high speed network, ubiquitous in the Otaniemi campus area. OtaOnline consists of a product development project and a research project. These two subprojects share many common research and development efforts, but the ultimate goals are different.
The OtaOnline product development is primarily interested in experimenting with the net media concept. The usage of the net media is carefully studied with different tools. New product concepts and enhancements are introduced on a regular basis.
The OtaOnline research project concerns the problems of large-scale implementation of net media. Four main research areas are production systems, scalable distribution systems, structure modelling of multimedia products, and personalisation of multimedia services.
1.2. Problem domain
This paper discusses the design of a logical structure of a hypermedia news server which fulfills several requirements set for servers of this type. These requirements have been identified as follows:
1. Control over the structure--an electronic newspaper should be designed on two distinct levels: in the document level which combines different media types into one document and in the publication level which creates relations between the documents.
2. Control over presentation--the presentation of the documents should be defined separately from the structure in order to enable several concrete representations from one high-level, presentation-independent representation.
3. Versioning--an edition of an electronic newspaper can be conceived to consist of the objects which have most recently been introduced to the system. This enables the paper to have real-time properties as the contents can be updated at any time.
4. Distribution--if the descriptive part of the documents to be presented is separated from the actual media data, the descriptive elements can be easily distributed and replicated among several hypermedia servers. In this way the servers can determine the material they store and serve according to their own criteria.
5. Metadata--the reusability of the hypermedia documents is much better if the documents have separate information about themselves. This information enables the personalisation aspect of a newspaper as it can be used as a basis for personal views inside a newspaper. This metadata will be separate from the description of the structure presented in requirement 1.
Moreover, the functionality of this logical model will focus on the following aspects:
1. generation of the presentable hypermedia documents;
2. distribution of the objects of this framework;
3. added-value features facilitated by the model.
*To whom correspondence should be addressed. Tel., +358-9-451 3246; fax, +358-9-451 3293; E-mail: [email protected].
1.3. Concrete representations
Different hypermedia representations are available for our purposes. The World-Wide Web family of formats and protocols enables easy linking and presentation of hypertext documents in the HyperText Markup Language (HTML) format. The MHEG-5 ISO standard (see ISO/MHEG-5) is assumed to become popular on other platforms apart from the Internet such as Set Top Boxes (STB) with normal TV sets. The term concrete representation will be used throughout this paper to describe the final presentation format of the client host. HTML, Java and MHEG-5 concrete representations are described in more detail below as they are the formats we are planning to support. The work in this paper concentrates on the HTML implementation.
1.3.1. HTML. HTML was developed as a part of the World-Wide Web project in the early 1990s. At first it was only an SGML-look-alike application but was formally defined and accepted as an Internet standard a few years later. HTML is a specific Document Type Definition (DTD) defined by the means of the SGML (see ISO/SGML) standard. It describes the logical structure of a document despite the fact HTML also contains presentation-oriented markup elements. HTML supports markup for 6 levels of headings, 3 types of lists, text emphases, link anchors and inlined images within the documents. A special FORMS collection of markup elements enables interaction and passing information from the client to the server. Private vendors are also introducing their own features to HTML in the search for more customers. Netscape Communications Corp. has recently introduced version 3.0 of their popular Netscape browser which supports even more markup elements such as frames and scripting languages.
1.3.2. Java. Java is a language developed by Sun Microsystems, Inc. This language is based on C++ and introduces features from Objective C.
It eliminates the most common problems in object-oriented programming by removing pointers and multiple inheritance, and it provides garbage collection for the run-time environment. This language is interpreted on the World-Wide Web by a browser, the first of which was HotJava, also developed by Sun. Netscape browser version 2.0 brings this interpreter to a wide variety of platforms and offers the possibility of bringing the HTML documents alive. Java can be used to write applets, applications or protocol handlers. Applets are programs which are included within HTML documents. They are identified using the APPLET element. Applications are stand-alone Java programs which need not be within HTML documents. Protocol handlers can be used to write handlers for media types which the browser does not support. The output of these handlers should be one of the known and supported media types. The APPLET markup hints the WWW browser to fetch from the server and run the corresponding applet, which has already been compiled into bytecode by a Java compiler. The bytecode consists of machine-independent instructions which run on any Java runtime system.
Java has a supported API which is a class library consisting of classes for graphical user interface, IO, network and utilities such as containers (collection classes). Java also supports the notion of threads which enable concurrent execution of applications. Concurrency can be controlled using the monitor and condition variable paradigm.
1.3.3. MHEG, PREMO. Two international standards are emerging as the most prominent candidates for multimedia interchange and presentation. The Multimedia/Hypermedia Experts Group (MHEG) has suggested an object-oriented framework (see ISO/MHEG-1) where the primitives of the standard are classes. The instances of these classes can be set to perform synchronised actions and user interaction using methods which encapsulate the functionality within the classes. The focus of the standard is in the interchange of multimedia presentations and it uses ASN.1 (see ISO/ASN.1) for transmitting the description of the structure of the presentation and the media data itself. Today, there are some publicly available MHEG engines such as GLUE developed by GMD Fokus/DeTeBerkom¹. Part 1 of the proposed MHEG standard is a large collection of classes from which part 5 specifies only a restricted subset. This subset is assumed to be widely used in Set Top Boxes (STBs) with future television sets. Part 3 specifies a scripting support whose implementation is still left open. Plans to use virtual machines similar to those used by Java have been proposed. Part 5 defines a class library which can be used to create scenes, basic units of presentation. The instances of the class library are described along with basic units of interaction: links and actions. Every object can have several different actions associated with it and thus provides a way to describe the functionality as well.
Secondly, Presentation Environments for Multimedia Objects (PREMO) (see ISO/PREMO) shifts focus to the interactivity of the final presentations and enables methods for creating and modifying the application models of the final presentation. PREMO uses object technology but differs from MHEG in the sense that it controls the presentation details and could even be used to construct an MHEG engine. An example MHEG-5 scene with MPEG video is presented below.
    (:scene Video
      (:items
        (:video
          (:size 320 100)
          (:position 0 0)
          (:hook #MPEG1)
          (:data
            (:name "queen.mpg")
            (:storage-mode #stream)
          )
        )
      )
    )
1.4. Related work
Marcus Kesseler (see Kesseler, 1995) discusses issues on generating HTML presentations from a Hypertext Structure Description Language (HSDL). This work also addresses the problem of designing hypermedia applications in a presentation-independent manner. He does not extend this basic mechanism with any additional features such as more efficient distribution or personalisation.
Ozsu et al. (see Ozsu et al., 1995) discuss the structural issues in a news-on-demand application where the articles do not have a higher level structure. They do present a comprehensive study on modelling articles that have a sound logical and presentational structure. The personalisation of the system comes from the user queries that can be used to filter articles having a given keyword within the article.
¹http://www.fokus.gmd.de/ovma/glue/
1.5. Organisation of this paper
This paper is organised as follows: Section 2 describes an ideal situation we would like the users of our servers to experience. Section 3 presents the logical model which enables the ideal use and goes through its major concepts. Section 4 shows how this model will actually be used. Section 5 discusses the implications of this model. Section 6 presents a framework in which this model will work. Section 7 gives ideas on the future work and Section 8 concludes the paper.
2. AN IDEAL SCENARIO
Jukka, our example user, enters OtaOnline by connecting to an OtaOnline server. Jukka is at his home workstation and uses a WWW browser to read OtaOnline. The server uses public-key cryptography to authenticate Jukka as a legitimate user for session logging purposes and billing of services.
The welcoming screen has many different views based on the service selections that Jukka has made during earlier sessions. The first view brings the news of general interest as an edited issue of latest news items (similar to current OtaOnline contents). The second view that Jukka has selected is a special edition for classical music enthusiasts, including concert reviews, interviews, high quality audio and video samples of recordings. The third view is dynamically constructed according to Jukka's personal preferences described in his user model.
The status of each view is visible on screen. Static (i.e. general) views should be accessible after a short transmission delay. The dynamic (i.e. personal) view is generated at this stage and Jukka is informed when the personalised view is readable. Jukka can also order the personalised view to be constructed at a specific time of day.
Jukka enters the general news view and selects the first document of interest. The news document contains text, images, two synchronised video streams, and an application for entering and viewing user annotations related to the story. The navigation system includes a visual map of the current view and the position of the current story in the view.
Jukka can now move from one article to another article inside the view. The view consists of latest documents as well as links to older documents related to the current document (news threads). The view may include these older documents, or there may be predefined queries to fetch relevant articles from the article archive, if Jukka wants to browse the related stories of interest.
Jukka is able to rate each article, using a predefined scale (0-5).
He thinks the current article was worth reading and gives the article rating 4. These annotations are used in specifying how interesting a given article is and used for providing other users with clues about interesting articles. The rating is purely optional. Jukka enters the module where he can view and change his registered user profile. The profile is based on predefined semantic groups that have an importance value. Jukka wants to see more motor sports news in his personal profile so he changes the 'Sports/Motor' parameter from 2 to 3. Information filtering methods are used to suggest articles to read. This ranked list of interesting material is part of Jukka's personal view. The selection is based on the user profile and the annotations other users have made related to current material. The billing of service is based on the data collected in the server log about the user session. Jukka has a 3 month subscription and is allowed to make as many requests as he likes within that period.
3. GOING FOR THE GOAL
3.1. An object model describing the logical structure
The object model which enables the functionality and flexibility we want to achieve is presented in Fig. 1. We argue for each class and class relation with other classes in the following subsections. This model will be used to provide one high-level representation from which several concrete representations can be derived. This is also known as multi-purpose publishing. By this we mean the capability of supporting different presentation and distribution environments. This model can be used to structure the contents of an electronic newspaper in two separate levels much in the same way as Garzotto et al. (1995) do:
1. Authoring in the Small (AIS)--how a single hypermedia document (here a SingleComponent) is structured. This is accomplished using the Relation class underneath the SingleComponent class in Fig. 1. The Relation class provides a dynamic way to name relations to different MediaObjects. This level of structuring is also similar to the within-component layer of the Dexter Hypertext Reference Model (Halasz & Schwartz, 1994).
2. Authoring in the Large (AIL)--how the documents (here SingleComponents) are structured to form an electronic newspaper. This is accomplished using the classes Publication, View and GroupComponent.
Fig. 1. The logical structure presented by the means of OMT.
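The two structuring levels can be illustrated with a minimal Python sketch. The class names follow Fig. 1, but the fields, the `relate` method and the `titles` helper are assumptions made for illustration and are not taken from the OtaOnline implementation. The Relation class is reduced here to a dictionary from relation name to MediaObject.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class MediaObject:
    """Stand-in for the abstract media base class; `kind` is e.g. 'Text'."""
    object_id: str
    kind: str


@dataclass
class SingleComponent:
    """Authoring in the Small: one document with named relations to media."""
    title: str
    relations: Dict[str, MediaObject] = field(default_factory=dict)

    def relate(self, name: str, target: MediaObject) -> None:
        # Plays the role of the Relation class: a named link to a MediaObject.
        self.relations[name] = target


@dataclass
class GroupComponent:
    """Authoring in the Large: an ordered group of components."""
    title: str
    children: List["Component"] = field(default_factory=list)


Component = Union[SingleComponent, GroupComponent]


@dataclass
class View:
    name: str
    items: List[Component] = field(default_factory=list)


@dataclass
class Publication:
    name: str
    views: List[View] = field(default_factory=list)


def titles(view: View) -> List[str]:
    """Flatten a view into the titles of its SingleComponents, depth first."""
    out: List[str] = []
    stack = list(reversed(view.items))
    while stack:
        item = stack.pop()
        if isinstance(item, GroupComponent):
            stack.extend(reversed(item.children))
        else:
            out.append(item.title)
    return out
```

Because a View only holds references, the same SingleComponent instance can appear in several Views, which is why the overall structure is a graph rather than a tree.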
3.1.1. Media types. Different media types supported by the model are Audio, Video, Dynamic, Text and Image. Each of these types is implemented as a class which has the functionality of converting the actual contents into different formats. The variety of formats is due to the fact that HTML3 browsers may natively support different formats than MHEG-5 or other concrete representation engines. The media classes share the same base class, MediaObject, which remains an abstract class but provides a method for identifying each object uniquely within the framework. The Dynamic class is a special class holding a small program or a routine to be used in the final presentation. A logical name is assigned to each instance of the Dynamic class in order to identify them. For example, a stock ticker display might be a Dynamic object running on the screen.
3.1.2. Media representations. We decided to encode the media types in the following formats in the first stage of development. The next stage will be to reuse these formats to produce other formats if necessary.
• JPEG for Image--a format widely used in the news industry;
• MPEG level 1 for Video--a common format for video content producers;
• AIFF for Audio--a format defined primarily for audio interchange;
• SGML for Text (Universal Text Format DTD) (see Becker, 1995)--an international standard which enables high-level abstraction of documents independent of their presentation. It thus provides an excellent master representation for text documents from which several other representations can be derived.
3.1.3. Synchronising media. Introducing temporal media within the model poses the problem of synchronisation. Possible solutions to this problem can be categorised into two mutually exclusive choices:
1. logical synchronisation--something happens before, after or at the same time as another event;
2. timely synchronisation--something takes place at time $t_1$ having duration $d_1$, something else taking place at $t_2$ having duration $d_2$.
We adopt the concept of timed streams proposed by Gibbs and Tsichritzis (1994) which follows the second paradigm. They consist of a sequence of tuples of the form $\langle e_i, s_i, d_i \rangle$, $i = 1, \ldots, n$. The $e_i$ are the media elements (instances of MediaObject), $s_i$ the start time and $d_i$ the duration (in seconds). These tuples can be generated by the editor as a part of the editorial process. A graphical tool that supports visual cues for synchronisation will be necessary. For example, a Gantt chart gives an intuitive idea on concurrent and serial activation of temporal objects. In the first stage of development we do not enable the editors to set the time points for synchronisation. This design activity can, however, be easily added later as the Relation class provides a place for storing the time points. Figure 2 introduces an example construction of one SingleComponent holding several MediaObjects.
Fig. 2. A SingleComponent has named relations to three MediaObjects.
3.1.4. Conversion between representations. The philosophy we want to stress with this model is the separation of structure and presentation similar to SGML (see ISO/SGML) which is a standard to markup the structure of a document and DSSSL (see ISO/DSSSL) which is a standard for restructuring and formatting the SGML documents. In our model the formatting of the Components is implemented separately from the structure using the Formatter class. This class can have several subclasses each of which does a specific type of formatting. For example, Fig. 3 shows two different formatters, one for HTML3 and one for MHEG-5. The base class, Formatter, holds a list of all available formatters. The editor can apply one of these formatters to any Descriptor subclass and thus produce the desired concrete representation. The formatters are implemented by the system designer and they are used to
produce a standard layout for a publication. It should be noted that the media types themselves also participate in this formatting process by producing the suitable media formats to be used in the concrete representations.
In the first stage of development, the MediaObjects of type Video, Audio and Image shall be encoded in one of the formats already supported by the WWW browsers. The Text objects will be converted from Universal Text Format DTD to HTML3 DTD using an event-driven translator built on top of the validating SGML parser, nsgmls², by James Clark. The translator reads the Element Structure Information Set (ESIS) format output of nsgmls and uses it as a basis for an event-driven compilation to a target output DTD. The tool we use is an extended version of the sgmlspl³ conversion package originally written by David Megginson at the University of Ottawa. In the next stage the conversion of the other MediaObject types can be encapsulated within the classes. The instances can then simply be asked to convert themselves to a desired encoding format.
At this point the formatter for MHEG-5 incorporates a class library generated by the snacc compiler⁴ from the ASN.1 descriptions for MHEG-5. This class library can be used to construct the scenes for the MHEG-5 presentation and to produce the universal encoding according to the Basic Encoding Rules (see ISO/BER). The formatter is, however, still incomplete and does not produce a comprehensive output description.
Fig. 3. The Formatter class structure. [Formatter base class (apply, $add, $list, $instance) with subclasses HTML3_Formatter and MHEG5_Formatter, each providing apply.]
²http://www.jclark.com/
³http://www.uottawa.ca/~dmeggins/Index.html
⁴http://remarque.berkeley.edu/~muir/free-compilers/TOOL/ASN1-1.html
3.1.5. Article classification and rating scheme. With a separate Semantics class we address the problem of classifying the contents of an article apart from its actual representation in a systematic way. The approach we use is a straightforward implementation of rating categories of two levels. An instance of the Semantics class is attached to each instance of SingleComponent. This helps the editor when he/she plans, for example, a View with contents of a given type. One instance of Semantics class is also linked to every user as the users may have personal profiles which determine the semantic contents they are interested in. The user profile can be used to automatically filter information from all available articles to a personal subset of that material. The semantic metadata must, however, be provided by someone early in the editorial process.
We suggest the following scheme for semantic metadata: a fixed set of semantic categories each having a value from 0 to 5. The bigger the value, the more this information is relevant to the given semantic category. 0 indicates a total absence of relevance to a given category. These categories can have one level of subcategory. This is how we extend the categories but still remain in a level abstract enough not to get into single keywords which describe the semantics. Table 1 gives an idea what a semantic metadata entry might look like for an article describing the wedding ceremony of a celebrated domestic music artist.
Table 1. Semantic categories
Categories      Subcategories   Value
Social event    -               5
Social event    marriage        5
Location        domestic        5
Arts            music           2
The categories must be defined before they are used as they need to be available on an equal basis for all news material generated with this model. The categories vary between different publications as the contents may be totally different. For example, business news versus evening newspapers.
The person who assigns these categories and their respective values to SingleComponents, should ideally stay the same as otherwise the subjective classification of these entries might vary too much.
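A semantic metadata entry of this kind could be represented as sketched below. This is an illustration under assumptions: the `Semantics` name comes from the model, but the `rate` and `value` methods, the `allowed` table and the use of `None` for "no subcategory" are hypothetical. The sample values are three of the entries of Table 1.

```python
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, Optional[str]]  # (category, subcategory or None)


class Semantics:
    """Ratings from 0 to 5 over a fixed, predefined set of categories."""

    def __init__(self, allowed: Dict[str, Set[str]]) -> None:
        self.allowed = allowed            # category -> permitted subcategories
        self.ratings: Dict[Key, int] = {}

    def rate(self, category: str, value: int, sub: Optional[str] = None) -> None:
        # Categories must be defined before they are used.
        if category not in self.allowed:
            raise KeyError(f"undefined category: {category}")
        if sub is not None and sub not in self.allowed[category]:
            raise KeyError(f"undefined subcategory: {category}/{sub}")
        if not 0 <= value <= 5:
            raise ValueError("rating must be between 0 and 5")
        self.ratings[(category, sub)] = value

    def value(self, category: str, sub: Optional[str] = None) -> int:
        # Anything not rated counts as 0, i.e. no relevance at all.
        return self.ratings.get((category, sub), 0)


# Entry for the wedding article of Table 1 (hypothetical category table).
wedding = Semantics({"Social event": {"marriage"},
                     "Location": {"domestic"},
                     "Arts": {"music"}})
wedding.rate("Social event", 5, "marriage")
wedding.rate("Location", 5, "domestic")
wedding.rate("Arts", 2, "music")
```

Rejecting undefined categories at rating time is one way to keep the category set equal for all material of a publication; the paper does not say how this is enforced.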
3.2. Using the semantic classification to filter information
In addition to these categories, the users can type in a list of inclusive and exclusive keywords. The inclusive keywords are used to select the articles no matter what the semantic rating, and exclusive keywords are used to discard articles. Now, it is necessary to stress the two-fold nature of Semantics.
1. An editor assigns values to each category from his newspaper's, i.e. the product's, point of view.
2. The user can set a profile based on his/her personal interests. For example, value 5 for sports category will make sure the user gets articles with rating 5 for sports.
An example interface for setting the values using Java is shown in Fig. 4. As an example there are five articles having different ratings 1, 2, 3, 4, 5 for sports. The user says he wants sports with value 4. He will now get all articles having the representative value greater than or equal to 4, i.e. 4 and 5. If he/she wants to be more specific and sets a value for any subcategory of sports such as motor sports and clears the main level sports setting to 0, other sports news will not be supplied for him/her.
Fig. 4. Configuring the user profile with a Java applet.
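The threshold rule of Section 3.2 can be sketched as follows. The `Article`, `Profile`, `matches` and `personal_view` names are hypothetical, and two points are assumptions not settled by the text: an exclusive keyword is taken to override an inclusive one, and a profile value of 0 is taken to mean that the category is switched off.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[str, Optional[str]]  # (category, subcategory or None)


@dataclass
class Article:
    title: str
    ratings: Dict[Key, int]                       # assigned by the editor
    words: Set[str] = field(default_factory=set)  # keywords found in the text


@dataclass
class Profile:
    thresholds: Dict[Key, int]                    # set by the user
    include: Set[str] = field(default_factory=set)
    exclude: Set[str] = field(default_factory=set)


def matches(article: Article, profile: Profile) -> bool:
    if article.words & profile.exclude:      # assumption: exclusion wins
        return False
    if article.words & profile.include:      # selected whatever the rating
        return True
    for key, wanted in profile.thresholds.items():
        # A threshold of 0 means the user has switched that category off.
        if wanted > 0 and article.ratings.get(key, 0) >= wanted:
            return True
    return False


def personal_view(articles: List[Article], profile: Profile) -> List[Article]:
    return [a for a in articles if matches(a, profile)]
```

With five articles rated 1 to 5 for sports and a profile value of 4, `personal_view` keeps the articles rated 4 and 5, which is the example given in the text.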
4. USING THE MODEL
4.1. Generating views
Instantiating the class objects and making the associations between these instances will be left to the editor. His task will consist of creating documents and rating them with the semantic metadata. In addition he makes logical associations between the document instances and generates Views. An instance of the View class is an ordered collection of SingleComponent or GroupComponent instances. The main activity of the editor will be the creation of these Views which form a single Publication. An example of the Views is given in Fig. 5. This task is somewhat similar to the one using Hypermedia Design Model (see Garzotto et al., 1995) implemented by Kesseler (1995). It is worth noting that the structure will be a graph instead of a tree due to the fact that different Views may have common children. Figure 6 presents a prototype tool developed for this purpose.
Fig. 5. Two views sharing the same components.
The concrete representation of these Views will be generated when needed. This we call the late binding of views. Each instance will know whether it has already been converted into a given concrete representation or even whether it is its duty to do the conversion. This concept helps us with the distribution scheme described in more detail later in this paper.
It should be noted that each object in this model will have a unique identifier consisting, for instance, of a date and a positive integer. These identifiers will have an infinite lifetime which guarantees that a single instance of SingleComponent or an instance of Audio can be retrieved from the system at any time. Having the identifier-concrete representation mapping table available at all times helps to reduce the storage requirements at the server, as not all the concrete representations need to be available at each server. If the concrete representation is not here, we know where to find it.
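Late binding can be sketched as a per-object cache that is filled by a registered formatter on first request. The sketch below is an assumption-laden illustration: the registry dictionary stands in for the list held by the Formatter base class, the `formatter` decorator, `render` method and the `date/serial` identifier layout are hypothetical, and the HTML produced is deliberately trivial.

```python
from html import escape
from typing import Callable, Dict

# Registry of available formatters, keyed by concrete representation name.
FORMATTERS: Dict[str, Callable] = {}


def formatter(name: str):
    """Decorator that registers a formatting function under `name`."""
    def register(func):
        FORMATTERS[name] = func
        return func
    return register


class Component:
    def __init__(self, date: str, serial: int, title: str, text: str) -> None:
        self.object_id = f"{date}/{serial}"   # a date plus a positive integer
        self.title = title
        self.text = text
        self._concrete: Dict[str, str] = {}   # representations bound so far

    def render(self, name: str) -> str:
        # Late binding: convert only on first request, then reuse the result.
        if name not in self._concrete:
            self._concrete[name] = FORMATTERS[name](self)
        return self._concrete[name]


@formatter("HTML3")
def to_html3(component: Component) -> str:
    return f"<h1>{escape(component.title)}</h1>\n<p>{escape(component.text)}</p>"
```

A second formatter, for example one for MHEG-5, would be added by registering another function; `Component` itself would not change.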
4.2. Logging activities
Once a client requests an instance of the Component class, the request will be resolved through a logical resolver. Its implementation is very simple: given a logical name, it finds the corresponding instance and checks whether this object has a physical representation at this server, in which case it simply returns it. In case the concrete representation is missing, it will be either generated on-the-fly or fetched from the server that has the MediaObjects associated with the Component.
Once a document is retrieved from the server, a new association between a user-specific LogBook instance and the retrieved Component will be created. On a regular basis, these associations can be collected to one central server which can perform a global analysis of the user behaviour. Figure 7 shows a user who has fetched three Components and annotated one of them. These log entries will work as a basis for the intelligent producer agents which work at the server end analysing the clients. The agents are described in more detail in Turpeinen (1995).
Fig. 6. A Java interface for generating a Publication.
Fig. 7. Logging activities and annotations using LogBook and Annotation.
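The resolver and the logging step can be sketched together. Everything named here (`Server`, `resolve`, `fetch_remote`, the dictionaries) is hypothetical; the sketch always caches a fetched representation, whereas a real server could equally generate it on the fly or discard it after serving.

```python
from typing import Callable, Dict, List, Tuple


class Server:
    def __init__(self, fetch_remote: Callable[[str, str], str]) -> None:
        self.names: Dict[str, str] = {}               # logical name -> object id
        self.local: Dict[Tuple[str, str], str] = {}   # (id, format) -> representation
        self.logbooks: Dict[str, List[str]] = {}      # user -> retrieved object ids
        self.fetch_remote = fetch_remote              # hypothetical peer lookup

    def resolve(self, user: str, name: str, fmt: str = "HTML3") -> str:
        object_id = self.names[name]
        key = (object_id, fmt)
        if key not in self.local:
            # Missing here: fetch it from the server that holds the media.
            self.local[key] = self.fetch_remote(object_id, fmt)
        # Record the retrieval in the user's LogBook.
        self.logbooks.setdefault(user, []).append(object_id)
        return self.local[key]
```

The per-user lists in `logbooks` are what would periodically be collected to a central server for analysis.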
4.3. Annotating components
The user will be given a chance to evaluate the Components by giving, for example, a value ranging from 0 to 5; 0 indicating no interest at all, 3 some interest and 5 I-want-more-of-these. These instances of the Annotation class will be linked from the User instance to several Components. The annotations can be processed later and used in the analysis of the user profile in trying to find out what type of articles the user finds interesting. Letting users see other people's ratings facilitates social filtering first introduced in Malone et al. (1987). This annotation process is separate from the user profile described in Section 3.1.5.
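One simple way to turn annotations into clues for other readers is sketched below. The paper does not say how ratings are aggregated, so the arithmetic mean, the threshold of 3 and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Annotation = Tuple[str, str, int]   # (user, component id, rating 0..5)


def mean_ratings(annotations: List[Annotation]) -> Dict[str, float]:
    """Average rating per component over the given annotations."""
    by_component: Dict[str, List[int]] = defaultdict(list)
    for _user, component_id, rating in annotations:
        if not 0 <= rating <= 5:
            raise ValueError("rating must be between 0 and 5")
        by_component[component_id].append(rating)
    return {cid: mean(values) for cid, values in by_component.items()}


def recommended(annotations: List[Annotation], reader: str,
                at_least: float = 3.0) -> List[str]:
    """Components others rated well that `reader` has not annotated yet."""
    seen = {cid for user, cid, _ in annotations if user == reader}
    others = [a for a in annotations if a[0] != reader]
    scores = mean_ratings(others)
    picks = [cid for cid, score in scores.items()
             if score >= at_least and cid not in seen]
    # Best-rated first; ties broken by identifier for a stable order.
    return sorted(picks, key=lambda cid: (-scores[cid], cid))
```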
5. IMPLICATIONS OF THE MODEL
5.1. Lost concept of an edition
The edition of a hypermedia newspaper complying with the model is a collection of the latest versions of the introduced objects. An editor can generate Views once a day to give the reader a daily newspaper, but the MediaObjects can be introduced at any time. A new version of an object replaces the old one. The new version then has a history link to the previous versions. Figure 8 presents this situation.
It is also possible for the editor to create new articles during the day. Once he or she introduces a new SingleComponent into the system, he or she has to either update a View or a GroupComponent in order to reflect the new association to this article. It is once again worth noting that if the user does not want to go through the predesigned Views, he can take advantage of the filtering process and thus be able to include the new article in his personal newspaper.
Fig. 8. Latest edition defined in terms of latest objects.
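The idea of an edition as the set of latest versions, each with a history link, can be sketched as follows. The `Version` and `Store` classes and their methods are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    object_id: str
    content: str
    previous: Optional["Version"] = None   # history link to the older version


class Store:
    def __init__(self) -> None:
        self.latest: Dict[str, Version] = {}

    def introduce(self, object_id: str, content: str) -> Version:
        # The new version replaces the old one but keeps a link back to it.
        version = Version(object_id, content, self.latest.get(object_id))
        self.latest[object_id] = version
        return version

    def edition(self) -> Dict[str, str]:
        """The current edition: the latest version of every object."""
        return {oid: v.content for oid, v in self.latest.items()}

    def history(self, object_id: str) -> List[str]:
        """Contents of all versions of an object, newest first."""
        out: List[str] = []
        v = self.latest.get(object_id)
        while v is not None:
            out.append(v.content)
            v = v.previous
        return out
```

Since `edition` is computed from whatever is latest at call time, introducing an object at any moment changes the edition immediately, which is the real-time property described above.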
5.2. Object lifetime
As the number of objects becomes increasingly large, an object-collection process can be run on the material. This process can move old objects (old in age, or superseded versions) into long-term storage. The object identifiers remain available at each server, thus providing a way to find old material that has been placed in a specific storage system. To avoid a large table of identifiers being stored at each server, an identifier name server (INS) could be set up to provide a centralised way of knowing the exact location of each resource identified by the identifier.
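A minimal sketch of such an identifier name server and an age-based collection pass is given below. The class and function names, the 30-day limit and the location strings are assumptions for illustration; the paper does not specify a collection policy.

```python
from datetime import date, timedelta
from typing import Dict, List


class IdentifierNameServer:
    """Central table mapping each object identifier to its current location."""

    def __init__(self) -> None:
        self.location: Dict[str, str] = {}

    def register(self, object_id: str, where: str) -> None:
        self.location[object_id] = where

    def locate(self, object_id: str) -> str:
        return self.location[object_id]


def archive_old(ins: IdentifierNameServer, created: Dict[str, date],
                today: date, max_age_days: int = 30,
                archive: str = "long-term-store") -> List[str]:
    """Move objects older than `max_age_days` to the archive; return their ids."""
    cutoff = today - timedelta(days=max_age_days)
    moved = [oid for oid, day in created.items()
             if day < cutoff and ins.locate(oid) != archive]
    for oid in moved:
        # The identifier stays valid; only its recorded location changes.
        ins.register(oid, archive)
    return moved
```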
6. AN ARCHITECTURAL FRAMEWORK
All the benefits of this model can be achieved by using a persistent store for storing the objects. The objects and their associations will remain the same from one invocation of the server to another which provides us with a solid basis for the framework. Object database management systems (ODBMS) (Cattell, 1994) can provide many different features we find useful.
In addition to persistency a concept of versioning is necessary. An electronic newspaper can be taken as a collection of the latest versions of the objects in the logical model. Every time a new version of an existing object is introduced into the model, the old ones still remain available through the versioning mechanism.
The object model that the object database uses may not be final at the early stage of the design process. Changes to the model will cause problems if the ODBMS is not capable of handling them. Some of the available databases can be adjusted with modifications in the attributes, methods, types and inheritance chains. We find this adaptiveness crucially important.
Of other features, a query language seems appropriate. A user who has read a given article can be traced with a query expression. The language should, however, be compatible with an object-oriented language such as C++. Several object databases do this and they also provide an ad-hoc query facility when the ODMG-93 compliant interface is missing.
1. editor introduces a new component into the system
2. receiving server stores the MediaObjects locally
3. server replicates the instances of the logical model
Fig. 9. Replicating instances instead of concrete representations.
6.1. A distribution scheme Distributing the instances of the logical model to all of the servers serving the newspaper is necessary, as we wish to have the same material available at each site. Transferring only the instances of the descriptive elements of our framework over the network to other servers appears more efficient than also transferring all the concrete representations of the media. This plan is shown in Fig. 9. Owing to the huge amount of information contained in electronic newspapers, we plan to distribute the origins of the different types of media across different servers. This leaves room for optimising the server load and storage for a given situation. A server that is missing the concrete representation of an object can choose whether to keep a local copy (cache) of the representations it fetches or to delete them as soon as they have been served. An example situation where a server is missing the Text for a Component and fetches it from a text archive is presented in Fig. 10.
[Figure 10 labels: voting system, text archive, live video feed]
1. request a Component
2. request the Text part of the Component
3. return the Text
4. create the concrete representation and cache it
5. return the concrete representation
Fig. 10. The connected server does the late binding of views.
It is also possible to classify servers by their capabilities. A basic server might only hold the concrete representations of the objects and have no network connection except once a day, when it retrieves the daily newspaper. Other superservers could communicate and update the news material at any time and thus approach a real-time electronic newspaper. The centralised method of the MediaObject class tells whether a given instance can be distributed to several servers or whether it should remain at a single server from which clients request it every time they need it. Should there be an application, say a voting system, whose result influences other objects, say a pie chart image, the application (an instance of the Dynamic class) should remain centralised at one server in order to provide all the clients with the same output. Application-level control of the distribution can be implemented in many ways. These aspects are discussed in (Korkea-aho, 1995).
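The behaviour of a server that lacks a concrete representation can be sketched as follows. This is a minimal Python illustration under stated assumptions: the classes OriginServer and ConnectedServer, the keep_cache option and the centralised argument are hypothetical names chosen here, and the centralised flag merely stands in for the result of the centralised method of the MediaObject class.

```python
class OriginServer:
    """Hypothetical origin of concrete representations, e.g. a text archive."""

    def __init__(self, representations):
        self.representations = dict(representations)
        self.requests = 0  # counts how often this origin is contacted

    def fetch(self, object_id):
        self.requests += 1
        return self.representations[object_id]


class ConnectedServer:
    """Holds the replicated model and fetches concrete data on demand."""

    def __init__(self, origins, keep_cache=True):
        self.origins = origins        # object id -> OriginServer
        self.keep_cache = keep_cache  # keep local copies or drop after serving
        self.cache = {}

    def serve(self, object_id, centralised=False):
        # Centralised objects (e.g. a voting application) are never cached,
        # so every client sees the output of the single origin server.
        if not centralised and object_id in self.cache:
            return self.cache[object_id]
        data = self.origins[object_id].fetch(object_id)
        if self.keep_cache and not centralised:
            self.cache[object_id] = data
        return data


text_archive = OriginServer({"text-42": "Article body"})
voting_system = OriginServer({"vote-1": "51% yes"})
server = ConnectedServer({"text-42": text_archive, "vote-1": voting_system})

first = server.serve("text-42")
again = server.serve("text-42")            # answered from the local cache
server.serve("vote-1", centralised=True)
server.serve("vote-1", centralised=True)   # always forwarded to the origin
```

With keep_cache=False the same server would contact the text archive on every request, which trades storage for network load.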
7. RESULTS
The object model has been implemented on a single Objectivity/DB database, which provides a clean interface for object management. Features such as versioning, clustering and indexing are used in the implementation. Some tests have been conducted in which articles were composed of separate media objects, and higher-level structures were designed to provide the GroupComponent and View levels of this model. Simple formatter subclasses have also been designed which can be used to set the layout for presentable units such as SingleComponents. An example of this is presented in Fig. 11. The articles presented in this example correspond partly to the ones created in Fig. 6. The algorithms for generating the contents and the structure of the newspaper have also been verified, and they have proven to work with the simple rating system presented in this paper. The ease with which a whole Publication can be created is clearly an advantage compared with the old-fashioned editing of HTML documents. However, it remains to be seen how well the editors at the newspaper companies adapt to the addition of a new type of product. What still remains to be verified is the efficiency of using late binding of views instead of transferring the concrete representations over the network in a distributed environment. The current implementation does the late binding of views, but only in a centralised manner.
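The idea of formatter subclasses that set the layout of a presentable unit can be illustrated with the following Python sketch. Only the names Formatter and SingleComponent come from the model; the fields of SingleComponent and the subclasses HtmlFormatter and PlainTextFormatter are assumptions made for this example and do not reproduce the classes of the implementation.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class SingleComponent:
    """Assumed minimal shape of a presentable unit: a title and a body text."""
    title: str
    text: str


class Formatter:
    """Base class: each subclass binds a structure to one presentation format."""

    def format(self, component):
        raise NotImplementedError


class HtmlFormatter(Formatter):
    def format(self, component):
        # Escape the content so that it cannot break the generated markup.
        return f"<h2>{escape(component.title)}</h2>\n<p>{escape(component.text)}</p>"


class PlainTextFormatter(Formatter):
    def format(self, component):
        return f"{component.title.upper()}\n{component.text}"


article = SingleComponent("Markets & money", "Shares rose today.")
html_view = HtmlFormatter().format(article)
text_view = PlainTextFormatter().format(article)
```

The same SingleComponent instance yields both presentations, which is the property that allows a view to be bound to a format only when it is requested.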
8. CONCLUSIONS
A logical model of an electronic newspaper has been described. It aims at a next-generation newspaper which is not only the counterpart of the printed version but also brings added value by introducing several features:
• the documents are described in a structural manner independent of any specific presentation format such as HTML or MHEG-5. Separate Formatter class instances are applied to the structures to produce the desired layout;
• the material is distributed at the level of this model without binding the structure to any specific presentation format before it is needed. The servers are also capable of determining what level of functionality they provide to their clients, as different servers have different storage and processing resources available;
• the contents are rated with a simple two-level category classification system which enables efficient filtering of information and thus personalised views inside the electronic newspaper;
• the users are allowed to annotate the articles. This enables social filtering, i.e. the possibility to follow popular articles read and annotated by other similar users.
We believe this is the most suitable framework for an electronic newspaper. It provides a solid basis for the whole editorial process starting from traditional page layout systems such as QuarkXPress. The process consists of composing single presentable units from media objects. After this, a higher structure can be designed, consisting of groups of presentable units, views collecting together groups and articles, and finally publications, each with their own brand. Once the structure has been designed, the editor attaches specific formatting instructions to the structures on how to lay them out in different presentation environments. The product is then distributed using a scheme which binds the structures with the formatting instructions at the
server from which they are requested. Personalisation is enabled through the use of a semantic rating system which allows for personal generation of the contents and the structure of the product. Other added-value features such as social filtering are also possible with the help of users' annotations.
Fig. 11. A View and its children formatted in HTML by a Formatter class instance.
Acknowledgements--This research was supported by the Finnish Technology Development Center (TEKES), Nokia and the Aamulehti Corporation. Thanks to Mika Honkala for analysing the OMT model.
REFERENCES
[author illegible], D. (1995). UTF [Universal Text Format]: an SGML standard for the news distribution industry. Seybold Report on Publishing Systems 24/10 (Jan 30) 1, 7-17.
Buford, J. F. K. (1994). Multimedia Systems. Reading, MA: Addison-Wesley.
Cattell, R. G. G. (1994). Object Data Management. Reading, MA: Addison-Wesley.
Garzotto, F., Mainetti, L., Paolini, P. (1995). Hypermedia design, analysis and evaluation issues. Communications of the
ACM 38(8), 74-86.
Gibbs, S. J., Tsichritzis, D. C. (1994). Multimedia Programming. Reading, MA: Addison-Wesley.
Halasz, F., Schwartz, M. (1994). The Dexter Hypertext Reference Model. Communications of the ACM 37(2), 30-39.
Isakowitz, T., Stohr, E. A., Balasubramanian, P. (1995). RMM: a methodology for structured hypermedia design. Communications of the ACM 38(8), 34-44.
International Organization for Standardization. Abstract Syntax Notation One (ASN.1): Specification of basic notation. ISO/IEC 8824-1.
International Organization for Standardization. ASN.1 Encoding Rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER). ISO/IEC 8825-1.
International Organization for Standardization. Document Style Semantics and Specification Language, DSSSL, ISO/IEC DIS 10179.
International Organization for Standardization. Support for Base-Level Interactive Applications, ISO/IEC IS 13522-5.
International Organization for Standardization. MHEG Object Representation, Base notation (ASN.1), ISO/IEC DIS 13522-1.
International Organization for Standardization. Presentation Environments for Multimedia Objects (PREMO), ISO/IEC CD 14478-1.
International Organization for Standardization. Standard Generalized Markup Language (SGML), ISO 8879.
Kesseler, M. (1995). A schema-based approach to HTML authoring. World Wide Web Journal, Fourth International World Wide Web Conference Proceedings, pp. 619-631. U.S.A.: O'Reilly & Associates.
Korkea-aho, M. (1995). Scalability in distributed multimedia systems. Technical Report TKO-B128, Helsinki University of Technology.
Malone, T. W., Grant, K. R., Turbak, F. A., Brobst, S. A., Cohen, M. D. (1987). Intelligent information-sharing systems. Communications of the ACM 30(5), 390-402.
Turpeinen, M. (1995). Agent-mediated personalised multimedia services. Technical Report TKO-B125, Helsinki University of Technology.
Özsu, M. T., Szafron, D., El-Medani, G., Vittal, C. (1995). An object-oriented multimedia database system for a news-on-demand application. Multimedia Systems 3, 182-203.