The prototype cloris system: Describing, retrieving and discussing videodisc stills and sequences


Information Processing & Management, Vol. 25, No. 2, pp. 171-186, 1989. Printed in Great Britain. 0306-4573/89 $3.00 + .00. Copyright © 1989 Pergamon Press plc

THE PROTOTYPE CLORIS SYSTEM: DESCRIBING, RETRIEVING AND DISCUSSING VIDEODISC STILLS AND SEQUENCES*

ALAN P. PARKES
Centre for Research on Computers and Learning, Department of Computing, University of Lancaster, Bailrigg, Lancaster, LA1 4YR, England

Abstract. Given the problem of the incorporation of videodisc technology into an Intelligent Computer-Assisted Instruction (ICAI) framework, this article presents an outline of the methodological basis for, and the architecture and operation of, a prototype Video-based ICAI (VICAI) system. The aim of the system is to allow the learner to control the learning environment by watching video material (stills and moving film), interrupting it, and entering into a discussion about the visual and conceptual aspects of the events in progress and the objects on view. The prototype is to be used for experimentation with learners to investigate the strategic and learner-modeling requirements of VICAI systems.

1. INTRODUCTION

Using current technology, a videodisc (typically a collection of 54,000 individually addressable still images) can be controlled by a computer program. Available facilities can include playing sequences of images (as a moving film), indefinite time freeze frame, stopping and starting sequences at any point, and directly accessing and displaying any required frame by specifying its disc address. Often, the video screen can be touch or mouse sensitive, and permit computer-generated graphics and text to be overlaid onto the video image. The potential of such facilities in a learner-motivated instructional situation will, however, only be realized if the controlling program has access to symbolic descriptions of the visual material available to it. This article describes an Artificial Intelligence (AI) approach to the description of instructionally motivated videodisc material, based on the structure of such material, be it still frame and/or moving film.

The overall research problem is this: Given that visual material is to be incorporated into an Intelligent Computer-Assisted Instruction (ICAI) framework, what are the requirements of a formalism via which the conceptual and visual content of such material can be discussed, manipulated, generated, and controlled in a user-responsive way by an ICAI system? Based on research into this problem, a prototype Video-based ICAI (VICAI) system called CLORIS (Conceptual Language Orientated to the Representation of Instructional film Sequences) has been built. It is written in Poplog Pop-11 running on a Sun 2 workstation controlling, via a Felix Link interface, a Philips VP-410 Laservision disc player and a Philips CM8533 color video monitor. This article gives an overview of the methodological basis for, and the architecture and operation of, this prototype system (see [1-4]).

2. THE METHODOLOGICAL BASIS

*A shorter version of this article was presented at the Conference “User-oriented Content-based Text and Image Handling” (RIAO 88) at MIT, Boston, March 1988.

2.1 A VICAI scenario

A learner faces a video screen, holding a pointing device. On the screen a short description of the aims of a sequence of (moving) film is given. The sequence begins. From now to the end of the sequence, the learner can interrupt the film at any point and enter into discussion with the system about that film, such discussion being about:

1. The event that is in progress at the interrupted point (e.g., “What is actually happening now?”) in terms of its place in the temporal chain of events on view (e.g., “Why was x done before y?”), its wider context (e.g., “Tell me more about . . .”), and why and how it was done.
2. The objects that are on the screen at the interrupted point, chosen by selecting them (i.e., by clicking the pointing device on the screen image of the object) (e.g., “What is x for?”; “Tell me more about x”; “Show me some examples of x being used . . .”; “Show me a better view of x”).

The system responds using (1) generated natural language explanations, (2) generated graphics overlays and annotations and (3) further stills and sequences, all being the subject, if required, of further interactions as described above. In addition to this, the learner can use the system directly as a retrieval tool by asking questions, in some form (e.g., “Show x”). Note that this retrieval function could be utilized by another ICAI system, to which this one is an intelligent image and film retrieval tool.

2.2 The requirements of the system

In order to achieve these levels of interaction, a system needs access to the following:

1. A body of domain knowledge describing the concepts, objects, and events for a particular domain in question, and providing access to the background information [4] necessary to facilitate discussing moving films.
2. A body of descriptions relating the terms used in (1) to images in storage, and specifications of the interrelationships among those images, to facilitate learner-system manipulation of networks of related images.
3. A collection of rules specifying the construction of new sequences. Note that new sequences may often be edited existing ones, or recombined components (e.g., shots, scenes; see [5]) of other sequences.

The architectural realizations of the three requirements should be separated because:

1. The domain knowledge exists independently of any medium that may be used to portray examples of that knowledge, and a particular concept or event may be visually representable in a multitude of ways.
2. Images in storage should be described separately because:
   (a) Sequence material may not actually feature in that available, or there may be a mixture of stills and sequences.
   (b) Sequence material is, ultimately, a collection of still frames and, despite the fact that frames may be part of a sequence, they can also be used for other purposes [2].
   (c) The system should support the facility for browsing by the learner through sets of related stills.
   (d) The stills descriptions relate the domain knowledge to images, ultimately to frame addresses, and objects depicted in images to on-screen positional information (e.g., polygons) representing, in some coordinate system, the location of the objects on screen when that frame is on display.
3. The film rules are context and domain independent.

2.3 The meeting of the requirements

2.3.1 Events, instants, and frames. The need is stressed to avoid assuming that because the videodisc is ultimately a collection of individually addressable images, one’s conceptual description of it should start at those images. It can be seen from the VICAI scenario above that the still image has at least two interdependent uses:

1. As a picture in its own right: a photographic depiction of objects from some domain (e.g., a micrometer), which can be the subject of discussion as to the visual and/or conceptual aspects of those objects, giving access via some structure to other depictions, examples, etc. thereof.
2. As an interrupted “moment” from a moving film (i.e., an event “in progress” [6]), in which case its fuller meaning requires not only (1) above, but the situation in the film at that time (e.g., “In this picture the engineer is cleaning the micrometer”). Note that the interpretation of pictures in this way is not restricted to those taken from a moving film (e.g., “Here is a photograph of a man running”): the man is shown “frozen” in a certain position; one infers that he was actually running when photographed.

So, in addition to the description of stills from the more objective perspective suggested by (1) above, conceptual description of moving film requires higher-level structures than the individual frames themselves. The argument is, in fact, analogous to that given by Turner ([7]: Chapter on Temporal Logic), who argues that description of events in terms of instants (cf. the frame) should proceed from our conceptualization of those events, and not from the instants, which only possess meaning by virtue of the events they constitute.

2.3.2 Event structures, and “settings”. What, then, are these structures we should describe? By way of illustration, an example can be considered, using a brief sequence of film (from a London University Audio-Visual Centre demonstration disc) also considered in earlier work [1]. The film sequence is as follows:

An engineer sits at a work bench. In front of him is a small case, from which he takes, then cleans, a micrometer. He moves a small piece of metal to the front of the bench and measures it with the micrometer. He writes the reading in a small note book also located on the bench.

Now see Fig. 1. This mapping shows the way in which the events in the micrometer film are portrayed by “settings.” Intuitively, a setting is a group (≥ 1) of frames sharing the same visual description (in practical terms, this includes frames that have differences deemed by the describer to be irrelevant). In terms of the event structure of the film, the setting is the minimal element of description (i.e., the level below which description is, at the level of events, unnecessary). Referring to Fig. 1, take setting D as an example. In this setting, the rods of the micrometer are shown moving apart. In the film this setting is on display for 6.6 seconds (i.e., it occupies some 150+ frames at 25 frames per second (f.p.s.)). The only changes in the frames in D concern the distance between the rods: in each and every frame the learner could point to the same screen locations to indicate the same objects. Now consider the actions. The unscrew collar one, for example, is shown by:

1. Setting B.
2. A cut to setting C showing a closeup of the engineer’s hands.
3. A cut to setting D showing the rods.
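The idea of a setting as a run of frames sharing one visual description can be pictured as a small lookup table. The following is a minimal sketch only, not the CLORIS implementation (which is written in Pop-11): the `Setting` class, the frame addresses, and the object lists are all hypothetical, chosen so that setting D spans 165 frames (6.6 seconds at 25 f.p.s.).

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    name: str          # setting label, e.g. "D"
    first_frame: int   # inclusive disc address (hypothetical)
    last_frame: int    # inclusive disc address (hypothetical)
    objects: tuple     # objects visible in every frame of the setting


# Hypothetical, non-overlapping settings sorted by first frame.
SETTINGS = [
    Setting("B", 1000, 1099, ("engineer", "micrometer", "bench")),
    Setting("C", 1100, 1149, ("hands", "micrometer")),
    Setting("D", 1150, 1314, ("rods",)),
]
STARTS = [s.first_frame for s in SETTINGS]


def setting_at(frame):
    """Return the setting on display at a frame, or None if none covers it."""
    i = bisect_right(STARTS, frame) - 1
    if i >= 0 and frame <= SETTINGS[i].last_frame:
        return SETTINGS[i]
    return None
```

With such a table, an interruption at any frame inside D yields the same visual description, so a pointing action can be interpreted identically throughout the setting.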

In setting D the rods are moving, but the engineer is not shown explicitly carrying out the action; only its effects are shown. If the film were to be interrupted by the learner at this point, the system could quite reasonably say: “The engineer is tightening the collar,” but it would be unwise to say this if such a picture was arbitrarily retrieved as a still from the disc. Similarly, though the system can tell that at that point the engineer is tightening the collar, this does not mean that the engineer, or even the collar itself, can be seen: the particular setting at the interruption point indicates which objects are on view. The emphasis concerning the meaning of the picture has changed in and out of the moving film context, and the displayed visual characteristics of the current event cannot be reliably inferred from a description of the event alone. This is an excellent illustration of both the interaction between, and the need for the separation of, the descriptions of the stills and events.

2.3.3 The settings structure. Consider now the relationship between the settings themselves. C is, in effect, a zoom in of B, while D is a zoom in of C, and thus, transitively, a zoom in of B. There are other relationships suggested by zoom outs, pans, and tilts (see [5] for a glossary of film terms). Figure 2 briefly describes some of the setting relations. Such relations, however, can exist independently of the moving film scenario. Imagine a database of stills representing paintings. It would be quite feasible to show details from a painting by means of zoom-in pictures: among those zoom ins would be those which were pans, etc. The structure which, for a collection of settings, maintains the details of the setting relations between any two of those settings is called the settings structure. For a diagrammatic representation of (part of) the settings structure for the micrometer film see Fig. 3. The settings structure provides a way of accessing alternative views of the same object(s) in the still frame situation and also provides a basis for making temporal inferences when the settings are part of the temporal chain in the moving film; a simple example: the book appears in setting A but does not appear again until written on (setting I). Following back up the settings chain facilitates finding (and showing) the appearance and location of the book. The settings structure was influenced by the temporal relations transitivity table [8].

X Zi Y (Zoom in): X is a zoomed-in version of Y, i.e., some detail from the setting Y has been represented in full screen size in X. Example: C Zi B.
X Zo Y (Zoom out): The inverse of Zi, i.e., Y is a zoomed-in version of X. Example: C1 Zo D.
X M Y (Modified): X and Y are the same settings save for some change. The actual change is difficult to quantify, but we wish this definition to apply to changes which preserve some non-empty sub-setting common to both. Example: I M I1.
X MZi Y (Modified zoom-in): X is a zoomed-in version of Y with modifications (as for M) which affect the image in X. There must be a setting Y' (though not necessarily existing as such) such that Y' Zi Y => X M Y'. Example: B MZi A.
X MZo Y (Modified zoom-out): The inverse of MZi.
X Pl Y (Pan-left): X is a left pan of Y, i.e., a setting which, in the (visible) world which Y is a part of, is located to the left of Y. Example: E1 Pl F.
X Pr Y (Pan-right): The inverse of Pl. Example: F Pr E.

Fig. 2. Some of the setting relations described. Note: settings in the examples are as in Fig. 1. C1, E1, and I1 are the resulting settings described in notes (a), (b), and (c) in Fig. 1.

Fig. 3. Part of the settings structure for the micrometer film. Note: settings are as in Fig. 1.

2.3.4 Temporal event hierarchies. Reconsider Fig. 1, in particular the micrometer film event hierarchy. The informal interpretation of such a hierarchy can be seen in the clean micro example: Cleaning the micrometer involves:

1. Holding the micrometer.
2. Lifting the cloth.
3. Wiping the rods.
4. Replacing the cloth.
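The event hierarchy above can be sketched as a tree of events carrying frame ranges, in which an event is taken to hold throughout each of its subevents. The sketch below is illustrative only: the `Event` class and all frame numbers are assumptions introduced for the example, not part of the CLORIS system.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    first_frame: int   # inclusive, hypothetical
    last_frame: int    # inclusive, hypothetical
    subevents: list = field(default_factory=list)


def events_in_progress(event, frame):
    """Return the chain of events, outermost first, that hold at a frame."""
    if not (event.first_frame <= frame <= event.last_frame):
        return []
    chain = [event.name]
    for sub in event.subevents:
        deeper = events_in_progress(sub, frame)
        if deeper:
            # Follow the first subevent in progress at this frame.
            chain.extend(deeper)
            break
    return chain


clean = Event("clean micrometer", 100, 499, [
    Event("hold micrometer", 100, 199),
    Event("lift cloth", 200, 299),
    Event("wipe rods", 300, 399),
    Event("replace cloth", 400, 499),
])
```

An interruption at frame 350 yields the chain "clean micrometer", "wipe rods", which is the raw material for a reply of the form "The engineer is cleaning the micrometer, and at this time is wiping the rods."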

Note that each of the sequence of actions could be an event in itself, and require other actions as its subordinates, as in measure metal. Thus, an interruption in any of these subevents can be described: “The engineer is cleaning the micrometer, and at this time is . . .” etc. The interpretation of an event as being true of all its subevents is called downward hierarchicality [9] and is an important inference requirement of the type of system envisaged here.

3. THE CLORIS SYSTEM ARCHITECTURE

In order to demonstrate the applicability of the methodological basis as just outlined, a prototype VICAI system has been developed. The system is called CLORIS and consists of a controller module (also called CLORIS) and three representation modules (Fig. 4). Two of the modules use the conceptual graph formalism [10] as their basic building block. This notation was chosen because it is programming language and domain independent; is well suited to describing objects, concepts, and events in a uniform notation; and has clear psychological, philosophical, and semantic foundations.

[Figure 4: diagram of the CLORIS controller and its three representation modules. Labels recoverable from the figure:
DORIS (Domain Representation Inference System): concepts, objects, events. Sample content: “A particular micrometer, ‘m1’ is...”; “... of metal involves...”.
MORISS (Module of Rules for ...); strategies for image/sequence retrieval. Sample content: “To show a man looking at something, first show a shot of the man looking out of shot, then show a shot of what he is meant to be looking at...”.
BORIS (... Information in Settings): setting descriptions; the settings structure. Sample content: “In frames 200-300, the micrometer ‘m1’ is located on screen in the ...”; “Setting X is a zoom in of setting Y; a left pan of setting Z, which is...”.]

Fig. 4. CLORIS system architecture.

3.1 DORIS: Domain Representation Inference System

This module provides the conceptual descriptions of the concepts, objects, events, and instructional stories to be told in a domain. The constructs utilized are as follows (and are input in the form shown in the related figures):

A type hierarchy
• The definitional hierarchy of type labels used in the system.
• A domain-independent skeletal form of this resides permanently in the CLORIS system, to which a describer adds the terms for his or her domain.
• An illustrative part of such a hierarchy can be seen in Fig. 5.

Types, relation, and schemata definitions
• These are as given in [10] (e.g., see Fig. 6).

Script abstractions
• Are based on the work of Schank et al. [11,12].
• Are abstractions describing a stereotypical sequence of events (e.g., the sequence of tasks in the micrometer film cited). The overall form is basically Schankian, but they take more advantage of the temporal hierarchies as discussed earlier (see Fig. 7).
• Make no reference to films or any other media, since they exist independently of any means used to convey their information content.

3.2 BORIS: Bank of Representations of the information in Settings

This module exists to relate the terms defined in DORIS to still images in storage. The descriptions are in the form of settings (see Fig. 8). Setting descriptions are as visually objective as is possible, i.e., they are descriptions of the objects that are on view in the frame(s) specified, not those that would be “present” in the real world. When the rods are on display and the engineer is tightening the collar, he is not visible, but he is present: this is not a valid inference if the picture of the rods alone is displayed out of the moving film context. Similarly, setting descriptions can also include descriptions of relationships between objects, but the same point applies here too: only visible relationships should be specified (e.g., if the setting shows a book on a shelf). A further point is that any or all of the polygon specifiers are optional: a setting description may indicate the visual “scene” without saying where (on the screen) the objects are, since the facilities to map the screen may be unavailable or not required.
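Where polygon specifiers are present, detecting which object the learner has pointed at reduces to a point-in-polygon test. The sketch below is one possible way to do this, not the method used in CLORIS: it uses the standard ray-casting test, follows the convention noted with Fig. 8 that priority 0 means "front", and uses hypothetical polygons and object names.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def objects_at(x, y, setting):
    """Return names of objects under (x, y), front-most (lowest priority) first."""
    hits = [(priority, name)
            for name, (polygon, priority) in setting.items()
            if point_in_polygon(x, y, polygon)]
    return [name for priority, name in sorted(hits)]


# Hypothetical setting: two overlapping rectangles, the thumb in front.
SETTING = {
    "thumb": ([(0, 0), (10, 0), (10, 10), (0, 10)], 0),
    "collar": ([(5, 5), (20, 5), (20, 15), (5, 15)], 1),
}
```

A click that returns more than one name is the case in which the learner could be asked which object is meant, or the context used to choose.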

[Figure 5: fragment of a type hierarchy linked by specialisation, including the type labels PHYSICAL OBJECT and MEASURING INSTRUMENT.]

Fig. 5. An (illustrative) part of a type hierarchy.

[Figure 6: two conceptual graph schemata for the type MICROMETER, the second built from PART relations, with the following readings.]

Schema 1: “A micrometer is used by an engineer to measure a variety of metal objects.”

Schema 2: “A micrometer has a fine adjustor, a lock, and measuring rods. The collar is located between the two end shafts of the micrometer, on the mid shaft. The fine adjustor is located on one end, the rods on the other. The lock is between the collar and the measuring rods.”

Note: Schemata are PLAUSIBLE, not LOGICALLY CORRECT definitions. A type may have arbitrarily many schemata (a SCHEMATIC CLUSTER). The types in the schema definitions above can, of course, be the subject of schematic definitions. The “depth” of the descriptions is the responsibility of the domain describer. The basic inference operation on schemata is the SCHEMATIC JOIN which, intuitively, is the largest “spreading” join beginning on one concept of two graphs. Inference involves selecting the most relevant schemata for the starting graph (i.e., a “question” or “goal”) and joining them to this graph; see Sowa 1984. Parkes (1966b) gives the definitions for the micrometer film example.

Fig. 6. Two schemata for the concept “micrometer.”

In some cases (i.e., a database of unique still images), each frame would require a separate setting description. As we pointed out, however, one often finds that a setting description can apply to several (usually contiguous) frames. This module also contains the settings structure: a table giving the setting relations between any two settings.

3.3 MORISS: Module of Rules for Interpreting the Structure of Sequences

This can be described as a set of procedures which, given an event structure specification and a collection of settings, produces a moving film. Since these rules will be psychologically motivated (some derived from [13]), they can comprise a model of the likely interpretation, by a viewer, of certain film sequences, and a predictor of likely errors in such interpretations. Only a rudimentary form of this module exists at present (see Fig. 9 for a typical filmic rule). The three modules have been described, so it is time to consider the operation of CLORIS.

SCRIPT for MICROMETER-USE
PROPS: [MICROMETER:'m1] [METAL-OBJECT:'p1]
ROLES: [ENGINEER:'e1]
ENTRY CONDITIONS:
POST CONDITIONS:
SCENE 1.
ACTION 1. [ENGINEER:'e1]<-(AGNT)<-[UNCASE]->(OBJ)->[MICROMETER:'m1].
ACTION 2. [ENGINEER:'e1]<-(AGNT)<-[CLEAN]->(OBJ)->[MICROMETER:'m1].
ACTION 3. [ENGINEER:'e1]<-(AGNT)<-[TAKING-MEASURE]->(OBJ)->[METAL-OBJECT:'p1].
ACTION 4. [ENGINEER:'e1]<-(AGNT)<-[RECORD-MEASURE]->(OBJ)->[METAL-OBJECT:'p1].

Figure annotations: a graph such as [ENGINEER:'e1] ... [TAKING-MEASURE] is interpreted as “The engineer is ...”. Any and all concepts can be schematically defined, or have type/relation definitions associated with them.

Fig. 7. DORIS script abstraction for “using the micrometer.”

[Figure 8: an annotated setting definition. Recoverable content:

setting F
frames 3000 - 3096
screen: [THUMB] => polygon (0,0),(0,7),(23,14),(25,13),(25,12), ...; [COLLAR] => polygon ...,(11,20),(16,23),(28,13), ...; priority 1
srels: (Pr) -> ..., (MZi) -> [C]

Annotations:
Polygon specifier: entered by the describer drawing round the object on screen using the mouse. Uses: the system can highlight an object, or detect when the user clicks inside a polygon, thus selecting the object.
Priority: each polygon has a describer-specified priority number (default 0). 0 indicates front; the greater the number, the further back the object in the image. Thus, since the two polygons overlap, the system knows that the thumb partly hides the collar. The user can be asked which object he requires when clicking on more than one. The context will usually determine the most likely choice, however.
srels: gives the immediate settings related to this one (see Figure 3). At compilation time these are used to propagate the constraints of the relational properties of the setting relations throughout the settings structure.

Note: setting definitions can also include a collection of conceptual graphs with the syntax word “false” preceding them, and these are taken to be false about the images described; see Parkes, 1966b for the use of these.]

Fig. 8. Example setting definition for setting “F”.
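The propagation of immediate setting relations through the settings structure can be illustrated with a small closure computation. This is a sketch under stated assumptions, not the CLORIS compilation procedure: the relation names follow Fig. 2, the inverse pairs are those given there, and the choice to compose only the zoom relations transitively is an assumption made for the example.

```python
# Inverse of each setting relation (M is taken to be symmetric).
INVERSE = {"Zi": "Zo", "Zo": "Zi", "Pl": "Pr", "Pr": "Pl",
           "M": "M", "MZi": "MZo", "MZo": "MZi"}
# Assumption: only the zoom relations are composed transitively here.
TRANSITIVE = {"Zi", "Zo"}


def propagate(srels):
    """Expand immediate relations (x, rel, y), read "x rel y", into a fuller table."""
    table = set()
    for x, rel, y in srels:
        table.add((x, rel, y))
        table.add((y, INVERSE[rel], x))
    changed = True
    while changed:
        changed = False
        for x, r1, y in list(table):
            if r1 not in TRANSITIVE:
                continue
            for y2, r2, z in list(table):
                if y2 == y and r2 == r1 and z != x and (x, r1, z) not in table:
                    table.add((x, r1, z))
                    changed = True
    return table


table = propagate([("C", "Zi", "B"), ("D", "Zi", "C"), ("E", "Pl", "F")])
```

From "C is a zoom in of B" and "D is a zoom in of C" the table gains "D is a zoom in of B" and the corresponding zoom-out entries, which is the kind of inference needed to find a closer or larger view of an object.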

[Figure 9: a filmic rule.

Rule of Co-ordination Reduction I. Based on Principle of filmicity* (Carroll, 1980).

• When actions are repeated in scene, noninitial iterations may be reduced by having some subsets of their identical repetitive elements deleted.

Formally: A1 .. An -> A'j .. A'm (A = action).
Conditions: (1) A'j .. A'm is a continuous subsequence of A'1 .. A'n. (2) Ai = A'i for all i.

Example: The micrometer film actually continues beyond the end point of the analysis in this paper. Immediately after the figure is written in the book (setting I), there is a cut, and the actions from TIGHTEN COLLAR on are shown for a new piece of metal.

* “filmicity” is to film as “syntax” is to language. Carroll distinguishes between this and “cinematicity”, which is concerned with the aesthetics of film.]

Fig. 9. A “filmic” rule.

4. OPERATION OF THE CLORIS SYSTEM

CLORIS exists to demonstrate the realizability of the VICAI scenario. The basic assumption is that CLORIS displays sequences, monitors user activities and interruptions, answers questions (including retrieval) and discusses any part of the film or any still image that the user desires. The output from the system is:

Generated natural language
• This will use a conceptual graph to natural language generator being developed as a further project.

Textual and graphical annotation
• “Shading objects in” and labeling them (from the setting polygon specifiers: see Fig. 8).

Further stills and/or sequences
• Replays of the same sequence (or parts thereof) and short explanatory sequences or example stills, etc.
• Stills: by interpreting the settings structure to find suitable closeups or longshots of a particular object, etc.
• Sequences: by consulting DORIS for event descriptions, these being mapped onto existing sequences, or used as a basis for the generation of new ones (via MORISS).

CLORIS will also use a learner model [14] to guide its discussion with a particular learner. The prototype, however, uses a rudimentary model consisting of a record, for a user, of the level of explanation reached for each concept in the domain. Finally, since the CLORIS system is required to discuss the events in a moving film, relate those events to the domain material, to other sequences and to stills portraying them, and be capable of discussing the event in progress at any stage in a moving film, the constructs called script applications/instantiations are introduced. Any script abstraction in DORIS can have any number of applications associated with it (since a piece of narrative can, in general, be represented by various pieces of film). The application is derived from the abstraction, but presents a more detailed breakdown of the events involved, in respect of the particular actors and props, ultimately relating those events to start-end disc locations where film phrases showing those events can be found (see Fig. 10 for an application example).

SCRIPT APPLICATION A_1 for MICROMETER-USE
[ENGINEER: Jim] [MICROMETER: m1] [METAL-OBJECT: p1]

ACTION 3.1. ?501,2904 [ENGINEER: Jim]<-(AGNT)<-[LOOSEN]->(OBJ)->[COLLAR].
ACTION 3.2. 2905,2989 [ENGINEER: Jim]<-(AGNT)<-[PREPARE]->(OBJ)->[METAL-OBJECT: p1].
ACTION 3.5. 2990,3096 [ENGINEER: Jim]<-(AGNT)<-[TIGHTEN]->(OBJ)->[COLLAR].
ACTION 3.6. 3201,33?? [ENGINEER: Jim]<-(AGNT)<-[ADJUST]->(OBJ)->[FINE-ADJUSTOR].
ACTION 3.6.2. 3211,3390 [ENGINEER: Jim]<-(AGNT)<-[TIGHTEN]->(OBJ)->[FINE-ADJUSTOR].
ACTION 3.7. 3391,34?7 [ENGINEER: Jim]<-(AGNT)<-[...]
ACTION 3.7.2. 3421,34?7 [ENGINEER: Jim]<-(AGNT)<-[...]->(OBJ)->[...].
ACTION 4. 3478,3576

(The two numbers after each action label are start and end disc locations; “?” and “...” mark characters that are illegible in the source.)

Note that some actions have been developed to a greater depth than in the abstraction; this reflects the prevailing instructional requirements.

Fig. 10. A script application.
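The start-end disc locations attached to each action in a script application make it possible to relate an action to the settings whose frames it encompasses. A minimal sketch of that association follows, assuming inclusive frame ranges. The action ranges are the legible ones from Fig. 10; the setting names `S1` and `S2` and their boundaries are hypothetical.

```python
def settings_for_action(action_range, settings):
    """Return names of settings whose frame range overlaps the action's range."""
    a_start, a_end = action_range
    return [name for name, (s_start, s_end) in settings.items()
            if s_start <= a_end and a_start <= s_end]


# Hypothetical settings, keyed by name, with inclusive frame ranges.
SETTING_FRAMES = {"S1": (2905, 2999), "S2": (3000, 3096)}

# Action frame ranges as given for ACTION 3.2 and ACTION 3.5 in Fig. 10.
ACTIONS = {"3.2": (2905, 2989), "3.5": (2990, 3096)}

ACTION_SETTINGS = {label: settings_for_action(frames, SETTING_FRAMES)
                   for label, frames in ACTIONS.items()}
```

An action that spans a cut, such as 3.5 in this made-up layout, is associated with both settings, so the objects on view at an interruption are taken from whichever setting contains the interrupted frame.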

At compilation time, CLORIS forms mappings between the following:

1. The storage-independent DORIS scripts and the storage-dependent CLORIS applications. This involves the filling in of role and prop slots, etc.
2. The application and the settings used in the film sequence to which that application refers, by associating the actions of the application with the settings encompassed by the frame ranges of those actions.

It is on the above mappings that the system’s dialogue about a piece of film is based, for it is via

1. The terms used in the application that the DORIS module can be used to derive explanations for those terms.
2. The settings in BORIS that the graphics can be generated, objects selected and highlighted, user selection of objects detected, object names used to obtain DORIS descriptions, etc., other views of objects and related objects obtained, etc.

5. USER QUESTIONS

The user is allowed to interrupt a moving sequence when required (by pressing one of the mouse buttons), and, at the present time, the subsequent user input at such an interrupted point takes the form of selection (by mouse device) from hierarchically organized pop-up menus (which appear, as does all input and output, as overlays on the video screen) of predefined question types. Examples of some of the menus involved in the prototype can be seen in Figs. 11, 12, and 13. Ideally, it would be desirable to allow the questions to be input by the user as and when required (e.g., in natural language). The approach has been to utilize a question typology [15,16]. It is not being suggested that the menu approach is optimal; the aim is to demonstrate the question-answering power of the prototype. For demonstration purposes, the menu option titles have been chosen to reflect their correspondence with the question types.

6. RETRIEVAL QUESTIONS

As previously mentioned, it is desirable that the system possess the capability of answering arbitrary retrieval questions. The CLORIS controller actually has the basic form of such a facility, but at present it is used to deal with system-generated requests for images satisfying a particular internal query. To make this facility user accessible would involve addressing problems regarding the processing of external (i.e., user-formulated) queries, which is beyond the scope of our current research. However, the basic query mechanism in CLORIS, which is outlined here, would be the target form of those queries.

[Figure 11: INTERRUPTION MENU, with the options:
Talk About Events (see Fig. 12)
Talk About Objects (see Fig. 13)
Resume (continues the film to the end, or to the next interrupt)
QUIT
Note: this menu can be called up on interruption of the film sequence.]

Fig. 11. CLORIS sequence interruption menu.

[Figure 12: talk-about-events menu. Most option labels are not recoverable; the surviving annotations mention a resume option, an option to alter sound/speed parameters, and an option that provides other examples of the current event (i.e., sequences/stills) and only appears if applicable (i.e., examples exist).

Note: (1) In all cases “explanation” uses all three CLORIS output modes, i.e., text, graphics and film. (2) Immediately prior to the appearance of this menu, the learner is asked to choose a sub-sequence from the events up to the interruption point in the film. It is on this sub-sequence that the subsequent explanations are based, until the learner elects to change it.]

Fig. 12. CLORIS talk-about-events menu.

[Figure 13: talk-about-this-object menu. Recoverable options include Left Of, Others, Events involving, and QUIT.

Annotations:
This menu follows selection of an object (by clicking the mouse on the image of that object), after choosing “Talk About Objects” (see Fig. 11). The object is highlighted and labelled with its name.
Each of the view options appears only if applicable. E.g., if the micrometer film is interrupted in setting D, a larger view would be C, a “left of” would be E, and a “right of” would be F. There would be no closer view, etc.
The events option appears if there are any events on film in which this object appears (not including the current sequence).]

Fig. 13. CLORIS talk-about-this-object menu.

6.1 “Given” and “show”

The underlying retrieval mechanism involves the use of conceptual graphs as both the representation formalism (i.e., in settings) and the vehicle for query expression. Here, English will be used to make the arguments more intuitively appealing. Suppose we have a setting description containing the description: the exercise book (b) is located on the desk (d) (where b and d are constants). Now suppose a user query had been expressed:

show: the book (which is used by the engineer) on the desk (which is in the engineer’s workshop).

Internally, CLORIS would represent this as follows (but in conceptual graph form):

given: that the engineer uses a book (call it x) and the engineer’s workshop contains a desk (call it y);
show: the book (x) located on the desk (y).

In this form, the query expresses two things:

1. The givens: these are constraints that are to be applied to the objects to be shown before an attempt is made to show them.
2. The show part: this expresses the visual part of the query (i.e., that which is actually to be shown, after applying the constraints).

Note that if the givens had been omitted from the above, the query would be a request for any images showing any book on any desk. The system’s action would then proceed thus:

1. Find out which particular book is used by the particular engineer, and the particular desk in a workshop; there may be more than one in each case.
2. Restrict the concepts in the query appropriately.

Let us assume that the type hierarchy (q.v.) contains the information that an exercise book is a type of book. Then the system may produce: show a setting in which the exercise book (b) is located on the desk (d), which can be satisfied.

The reasons for this formulation of the query mechanism are as follows:

1. The only visual information required in the query is that the book be shown to be located on the desk: expressing the whole of the initial query as a show: request to CLORIS would mean: “show the engineer using the book when the book is located on the desk in the workshop”, a reasonable request but not what is required. So, the objective is to separate the specification of the visual information required from qualifications applying to that information but not to be included in it.
2. It is unreasonable to assume that the user would know the internal identifiers for the objects: the query asks for a particular book, but the user cannot simply express the name by which that book is known to the system.
3. The total query may contain relations and concepts that are not visible (e.g., “show the car that Jim owned in 1986” actually becomes “given that Jim owned a car in 1986, show that car”; i.e., “owned” is the invisible relationship).
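The two-step treatment of a query (resolve the givens, then restrict the show part) can be sketched with plain triples standing in for conceptual graphs. Everything here is an illustrative assumption: the `FACTS`, `TYPES`, and `SUPERTYPE` tables, the relation names, and the extra individual `pen1` are invented for the example, and CLORIS itself works on conceptual graphs rather than triples.

```python
from itertools import product

# Hypothetical domain knowledge: (subject, relation, individual) triples.
FACTS = {("engineer", "uses", "b"),
         ("engineer", "uses", "pen1"),
         ("workshop", "contains", "d")}
TYPES = {"b": "exercise-book", "pen1": "pen", "d": "desk"}
SUPERTYPE = {"exercise-book": "book"}  # child type -> parent type


def is_a(type_label, ancestor):
    """True if type_label equals ancestor or is a subtype of it."""
    while type_label is not None:
        if type_label == ancestor:
            return True
        type_label = SUPERTYPE.get(type_label)
    return False


def resolve(givens):
    """Bind each given (subject, relation, variable, type) to matching individuals."""
    return {var: sorted(obj for subj2, rel2, obj in FACTS
                        if subj2 == subj and rel2 == rel and is_a(TYPES[obj], typ))
            for subj, rel, var, typ in givens}


def restrict(show, bindings):
    """Replace the variables of a show triple by every combination of bindings."""
    subj_var, rel, obj_var = show
    return [(s, rel, o) for s, o in product(bindings[subj_var], bindings[obj_var])]


bindings = resolve([("engineer", "uses", "x", "book"),
                    ("workshop", "contains", "y", "desk")])
queries = restrict(("x", "located-on", "y"), bindings)
```

Only the restricted show triples are matched against setting descriptions; the "uses" and "contains" relations never need to be visible in any image.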
In conceptual graph terms, once the show part of the query has been constructed (by joining in all relevant schemata and prototypes for the givens part, and applying the necessary changes to the show part), the problem then becomes one of matching the resulting show query against the image description graphs. Some useful work in this area has been done by Gecsei [17], using a scheme in which, interestingly, conceptual graphs are also used as the image and query representation formalism. In this scheme, an image is said to satisfy a query if the query graph is a subgraph of the image graph. This is, however, a slight simplification: consider the query (with no constraints) “show a book.” Given the assumption above regarding exercise books, the image description above would be suitable, but since “book” is not a subgraph of the description, this image would not be retrieved. What is therefore required is that there should be a projection of the query graph in the image graph, that is, that a subgraph (possibly all) of the image graph can be derived from the query graph by type restrictions (i.e., replacing types by subtypes and generic variables by individual markers). In logical terms, this means that the formula representing the image graph implies the formula representing the query graph. Note that this scheme is a generalization of Gecsei’s, since a subgraph is a projection in which all type labels and markers have been replaced by themselves. The point about individual markers is important since pictures depict actual objects, so it is arguable that there should be no variables in a setting description; however, this is not something a system would necessarily need to enforce.
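The projection test just described can be sketched in a few lines. The Python below is an illustrative assumption, not the CLORIS matcher: graphs are encoded as a dictionary of concept nodes `(type, marker)` plus a list of relation triples, a marker of `None` stands for a generic variable, and the names `SUPERTYPE`, `restricts` and `projects` are hypothetical. The search is brute force, which is adequate only for small graphs.

```python
from itertools import product

# Hypothetical type hierarchy: child type -> parent type.
SUPERTYPE = {"exercise_book": "book"}

def is_subtype(type_name, ancestor):
    """True if type_name equals ancestor or lies below it in the hierarchy."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUPERTYPE.get(type_name)
    return False

def restricts(query_node, image_node):
    """An image concept may stand for a query concept if its type is a
    subtype and any individual marker in the query is matched exactly."""
    q_type, q_marker = query_node
    i_type, i_marker = image_node
    return is_subtype(i_type, q_type) and (q_marker is None or q_marker == i_marker)

def projects(query, image):
    """Return a mapping of query nodes onto image nodes if the query graph
    has a projection in the image graph, otherwise None."""
    q_nodes, q_edges = query
    i_nodes, i_edges = image
    q_ids = list(q_nodes)
    i_edge_set = set(i_edges)
    for choice in product(i_nodes, repeat=len(q_ids)):
        mapping = dict(zip(q_ids, choice))
        nodes_ok = all(restricts(q_nodes[q], i_nodes[mapping[q]]) for q in q_ids)
        edges_ok = all((rel, mapping[a], mapping[b]) in i_edge_set
                       for rel, a, b in q_edges)
        if nodes_ok and edges_ok:
            return mapping
    return None

# Image description: exercise book (b) located on desk (d).
IMAGE = ({"n1": ("exercise_book", "b"), "n2": ("desk", "d")},
         [("located_on", "n1", "n2")])

# "show a book": generic concept, no relations.
SHOW_A_BOOK = ({"q1": ("book", None)}, [])
# A query the image cannot satisfy: a desk located on a book.
DESK_ON_BOOK = ({"q1": ("desk", None), "q2": ("book", None)},
                [("located_on", "q1", "q2")])

print(projects(SHOW_A_BOOK, IMAGE))
print(projects(DESK_ON_BOOK, IMAGE))
```

A strict subgraph test corresponds to replacing `is_subtype` with type equality; under that stricter test the "show a book" query would fail against this image, which is the shortcoming the projection formulation removes.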

7. CONCLUSION

This article has presented an overview of the assumptions behind, and the architecture and operation of, the CLORIS prototype VICAI system. The purpose of the prototype is to assess the validity of the methodological assumptions previously detailed with the aim of achieving a VICAI scenario of learner-orientated discussion about pictures and moving film. At the present time, experiments are being performed involving learners interacting with the CLORIS system. These experiments will yield information about:
1. The suitability of the chosen interaction styles.
2. Desirable dialogue management strategies for intelligent multimedia systems.
3. Special requirements of user modeling in VICAI systems.

For now, the onus is on showing that the required information to answer user questions can be obtained; future work will need to concentrate more on the optimal ways of formulating and using such information.

Acknowledgements-This research has been supported by the Science and Engineering Research Council and Logica (Cambridge) Ltd. Special thanks and love to Anna and Betty.

REFERENCES

1. Parkes, A.P. The analysis and description of a sequence of educational film. Technical Report 28, Centre for Research on Computers and Learning, Dept. of Computing, University of Lancaster; 1986.
2. Parkes, A.P. Temporal and conceptual factors influencing the design of a representation language for educational motion films. Working paper, Centre for Research on Computers and Learning, Dept. of Computing, University of Lancaster; 1986.
3. Parkes, A.P. An overview of the architecture of a computer system with which a trainee can discuss the conceptual and visual content of educational film. Technical Report 29, Centre for Research on Computers and Learning, Dept. of Computing, University of Lancaster; 1986.
4. Parkes, A.P. Towards a script-based representation language for educational films. Programmed Learning and Educational Technology, 24: 234-246; 1987.
5. Monaco, J. How to read a film. New York: Oxford University Press; 1977.
6. McDermott, D.V. A temporal logic for reasoning about processes and plans. Cognitive Science, 6: 101-155; 1982.
7. Turner, R. Logics for artificial intelligence. Chichester, England: Ellis Horwood; 1984.
8. Allen, J. An interval-based representation of temporal knowledge. In Proc. 7th International Joint Conference on Artificial Intelligence, 221-226, August 1981, University of British Columbia, Vancouver, Canada; 1981.
9. Shoham, Y. Reified temporal logics: Semantical and ontological considerations. In Proc. 7th European Conference on Artificial Intelligence, 390-397, July 1986, Brighton, England; 1986.
10. Sowa, J.F. Conceptual structures: Information processing in mind and machine. Reading, MA: Addison-Wesley; 1984.
11. Schank, R.C.; Reisbeck, C. Scripts, plans, goals and understanding. Hillsdale, NJ: Erlbaum; 1977.
12. Schank, R.C.; Abelson, R.P. Inside computer understanding. Hillsdale, NJ: Erlbaum; 1981.
13. Carroll, J.M. Toward a structural psychology of cinema. The Hague: Mouton; 1980.
14. Clancey, W.J. Qualitative student models. Annual Review of Computer Science, 1: 381-450; 1986.
15. Hartley, J.R.; Smith, M.J. Question answering and explanation giving in on-line systems. To appear in Artificial intelligence and human learning: Intelligent computer-aided instruction. London: Chapman and Hall; 1986.
16. Lehnert, W.G. The process of question answering. Hillsdale, NJ: Erlbaum; 1977.
17. Gecsei, J. Browsing techniques in image databases. Personal Communication. Department of Information and Operational Research, University of Montreal, C.P. 6128, Succ. A, Montreal, Quebec, H3C 3J7, Canada; 1987.