How to handle an image
STEVE SCRIVENER
The Human-Computer Interface Research Unit, School of Mathematics, Computing and Statistics, Leicester Polytechnic, Leicester, UK
This paper argues that conventional computer graphics techniques are inappropriate for dealing with images and hence for applications where the principal design concern is the synthesis of an image. An alternative approach to computer graphics defined as image handling is a way of utilising more fully the imaging power of the new generation of computer graphic displays. A number of basic functions for handling regions and collections of regions in an image are described to illustrate this approach.
Keywords: computer graphics, image handling, 2D images
During the last few years the number and variety of computer graphic displays has increased markedly. A feature of the new generation of displays is the inclusion of greyscale and colour. Just a little while ago a colour display was the exception whereas now it might almost be said to be commonplace. Another significant development is that in many systems a standard colour television functions as the display device. The trend towards computing systems that are compatible with, or can be interfaced to, standard video equipment has considerable promise. The arrangement illustrated in Figure 1 where the computer is linked to various video devices and can communicate with them is technologically possible. These developments mean that the computer has the potential for handling images of a complexity and variety comparable to other image making media. The insidious green vector of the familiar CRT should soon be a thing of the past. However, the fact that the new graphic displays are functionally more powerful and in general cheaper than earlier models does not necessarily imply that they will actually prove more useful. There are still many problems connected with the storage and retrieval of images and it is also not easy to provide facilities that allow the designer to manipulate an image as he wants.
Figure 1. Video system linked to computer (labelled components: camera, computer, tablet, videodisc)
SOME PROBLEMS WITH INTERNAL MODELS OF THE IMAGE
It is a characteristic of the design process that decisions are often tentative; the designer may change his mind about earlier decisions. Computer graphics systems are often designed in such a way that the design process when using them must necessarily exhibit a certainty of direction that is not evident without a computer. This is partly due to the prevailing approach to the design of computer-aided design systems in which the object being designed is represented in a data structure and the image is displayed by a viewing algorithm that examines the data structure. (See Figure 2.)
Figure 2. Data structure displayed to user through viewing algorithm
vol 4 no 1 january 1983
0142-694X/83/010035-07 $03.00 © 1983 Butterworth & Co (Publishers) Ltd
Figure 3. Creation of two shapes (a) as displayed (b) as perceived
The consequence of this is that the user can only perform operations on the developing image if they are consistent with the computer's representation of it. To illustrate this, consider the following example in which two shapes are created using a hypothetical graphics command language, Figure 3(a):

    BEGIN SQUARE
    LINE 10, 10, 10, 20
    LINE 10, 20, 20, 20
    LINE 20, 20, 20, 10
    LINE 20, 10, 10, 10
    END
    BEGIN CROSS
    LINE 15, 10, 15, 20
    LINE 10, 15, 20, 15
    END

The BEGIN signifies that what follows is to be regarded as a shape and the parameter that is associated with this command defines its name. The shape is then generated by using drawing commands, each of which draws a line from the point given by its first two parameters to the point given by its last two. These shapes might be represented in a data structure as shown in Figure 4. This internal data structure comprises the computer's model of the displayed image and it is with this that the user interacts, Figure 2. As the user effects modifications to the image (eg MOVE SQUARE) these are reflected in the data structure which in turn causes the displayed image to change. Depending on the flexibility of the system the user may manipulate the SQUARE or the CROSS or perhaps individual lines within a shape. However, it is not usually possible to operate on the image as if it consisted of four squares, Figure 3(b). To do this requires a degree of restructuring that is not usually supported by graphics systems.
In some applications, such as circuit design, the set of design components and their structural relationships can be established to a degree that makes mismatch between the machine's and the user's model of the designed object unlikely. However, in 2D design the image is the user's principal concern. Images can be constructed so that the perceiver arrives at several interpretations; images can be ambiguous, Figure 5. Elsewhere 1 4 it has been argued that the inherent ambiguity of imagery plays an important part in the visual arts. It is not unreasonable to expect that if the user reinterprets the image then he may want to manipulate it in terms of the new interpretation. Thus, referring back to the earlier example, he may wish to operate on four squares rather than a square and a cross.
Figure 5. An ambiguous image
In general, computer graphics systems have been developed for applications where the user's understanding of the design object is more or less constant and the likelihood that the system will be required to handle wholesale restructuring of its internal model is remote. If we represent the situation differently, Figure 6, and treat the image as input to the machine rather than output from it, then the internal model can be regarded as
Figure 4. Data structures representing the shapes created in [...]
Figure 6. Machine model viewed as machine interpretation of image
DESIGN STUDIES
the machine's interpretation of the image, although the user may not appreciate the fact that he is communicating with the machine about its interpretation of the image and not the image itself. Consequently, if the machine cannot adapt its interpretation then mismatch between the man's and the machine's understanding of the image can occur.
Another problem with the 'model' is that it is a sparse representation of the image. In Figure 4 for example, only the boundaries are represented. The blank or enclosed areas of the image, although important perceptually, are not described and even the lines are only represented as endpoints. What this means is that the user can only talk about those aspects of the image held in the model.
Finally, as implied earlier, it is often the case that the designer is not aware that he is interacting with the system's internal model of the image rather than the image itself. Typically, as the designer issues drawing commands they automatically cause entries or modifications to the data structure. Although the designer may regard his decisions as tentative, because of the relative sparsity and inflexibility of the data structure he is implicitly taking actions from which he may not be able to recover. It is in this sense that the computer may force a rigid and unrelenting form of design behaviour.
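The restriction described in this section can be made concrete with a small sketch. The dictionary layout and the `move` function below are illustrative assumptions, not the data structure of Figure 4 itself; they only show that a model holding named shapes as line endpoints can be operated on by name, and by nothing else.

```python
# Hypothetical model in the spirit of Figure 4: each named shape is a list of
# line segments, and each segment is stored only as endpoints (x0, y0, x1, y1).
model = {
    "SQUARE": [(10, 10, 10, 20), (10, 20, 20, 20), (20, 20, 20, 10), (20, 10, 10, 10)],
    "CROSS": [(15, 10, 15, 20), (10, 15, 20, 15)],
}


def move(model, name, dx, dy):
    """Translate every segment of a named shape; names not in the model are rejected."""
    if name not in model:
        raise KeyError(f"the model holds no shape called {name!r}")
    model[name] = [
        (x0 + dx, y0 + dy, x1 + dx, y1 + dy) for x0, y0, x1, y1 in model[name]
    ]


move(model, "SQUARE", 5, 0)  # accepted: SQUARE is an entry in the model

try:
    # One of the four squares the user perceives in Figure 3(b).
    move(model, "TOP-LEFT-SQUARE", 5, 0)
except KeyError as err:
    print("rejected:", err)
```

The second call fails not because the perceived square is absent from the screen but because it is absent from the model, which is the mismatch the text describes.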
THE IMAGE HANDLING APPROACH
These problems are particularly troublesome in applications such as 2D design where the constraints on the interpretation of the image are less rigid and where the focus of attention is the displayed image. Since it is the image that the user is interested in, one approach is to shift the emphasis away from the model to the image. The approach adopted in the Raster Graphics Project at Leicester Polytechnic is illustrated in Figure 7. No attempt is made to support a complete model of the image; instead procedures are provided that extract user described objects from the image as and when they are perceived. Usually the identification of an object is a preliminary to performing some operation on it (eg MOVE). Once this has been done the extracted shape can be discarded. This approach has been described as 'image handling', so as to distinguish it from image processing and pattern recognition. Image handling is concerned with the interactive generation and manipulation of images. As such it may involve the use of image processing and pattern recognition techniques, but in the context of the interactive design of images.
This approach makes direct use of a feature of a class of new graphic displays. Video displays, such as the random refresh display, maintain a visually stable picture by redisplaying it around 50 times a second. This is necessary because the phosphor coating on the screen is only temporarily brightened as the electron beam passes over it. To do this the display processor accesses a representation, or display memory, of the image which is usually separate from any model held in the host machine. Thus there are usually two representations of the image. One is used by the applications program for various purposes; the other by the display processor simply to present the image on the display.
One way of storing the image in the display memory for use by the display processor is called a 'bitmap' or 'framebuffer'. This method uses a given amount of memory to store the brightness/colour of every displayable point. The screen position of a point is determined by its location in the framebuffer. Thus for a binary image the two states of a point can be represented in 1 bit of memory; a display capable of presenting 512 × 512 points would require 262 144 bits to store an image. To store a 16 level greyscale or colour image requires 4 bits per point, that is 512 × 512 × 4 = 1 048 576 bits of memory. This may seem a lot of memory but computer memory has become relatively cheap. The framebuffer provides a representation of the displayed image in which there is a one-to-one correspondence between displayed and framebuffer points. It is thus a good approximation of the displayed image; indeed it can be viewed as a digitization of the screen. The extraction and display routines in image handling, Figure 7, operate directly on a framebuffer representation of the image and in this sense share it with the user.
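The memory figures quoted above follow from one stored value per displayable point. A minimal sketch of the arithmetic is given below; the function names and the row-major addressing are assumptions made for illustration, since the text does not specify how a particular display lays out its framebuffer.

```python
def framebuffer_bits(width, height, levels):
    """Bits needed to hold one of `levels` tones for every displayable point."""
    bits_per_point = max(1, (levels - 1).bit_length())
    return width * height * bits_per_point


def point_index(x, y, width):
    """Position of screen point (x, y) in a row-major framebuffer (assumed layout)."""
    return y * width + x


print(framebuffer_bits(512, 512, 2))   # binary image: 262144 bits
print(framebuffer_bits(512, 512, 16))  # 16 levels, 4 bits per point: 1048576 bits
```

Because each screen point maps to exactly one framebuffer location, `point_index` is a one-to-one correspondence of the kind the text relies on.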
BINARY IMAGE HANDLING
As a starting strategy the notion of image handling has been explored using binary images. In this section a number of the primitives (and their combinations) that have resulted from our initial investigation at Leicester Polytechnic are used to illustrate the potential of an image handling approach. The description of primitives is necessarily informal here and the reader is directed to Scrivener 5 for a detailed discussion of binary image handling methods. The fundamental process is the EXTRACT primitive. Its function is to retrieve a user identified region from the image. A region is a set of points of the same tonal value (eg all black) where any two points are connected.
Figure 7. Approach to the image used in the raster graphics project: the emphasis is on the image rather than the machine model
Figure 8. Connectivity: (a) definition of 4-adjacency (b) two points, Pt1 and Pt2, connected by a path of 4-adjacent points
Points are connected if a path made up of 4-adjacent points, Figure 8(a), exists between them, Figure 8(b). To retrieve a region the user identifies a point belonging to it using a locator device such as a cursor. Given this start point all the other points connected to it are retrieved from the framebuffer using EXTRACT and temporarily stored in the host machine in the same form as the framebuffer. To distinguish this area in the host machine from the framebuffer it will be referred to as a framestore. Depending on the memory available several framestores could be provided for system use, a particular framestore being uniquely identified by a system name. The system can maintain control of framestores during the execution of processes. Thus the command EXTRACT may be defined as follows:
EXTRACT a region at x, y
• Locate all points connected to point x, y and store in a framestore.
A complementary facility permits transfer from framestore to framebuffer:
DISPLAY region at x, y
• Put a framestore (region) at x, y in the framebuffer.
During the EXTRACT process points are inverted as they are found, which has the visual effect of erasing the framebuffer shape, Figure 9(b). In fact this is a consequence of the way in which this primitive has been implemented and need not happen, but it has the side effect of providing feedback to the user, who can thus confirm that the correct region is being extracted. To achieve a non-destructive extract, Figures 9(a) and 9(c), the EXTRACT and DISPLAY primitives can be combined to form a new primitive:
FIND region
• Get the x, y of a point in the region from the user.
• EXTRACT region at x, y.
• DISPLAY region at x, y.
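The EXTRACT, DISPLAY and FIND primitives of this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the Leicester implementation: the framebuffer is taken to be a list of rows of 0s and 1s, the region is found by a breadth-first search over 4-adjacent points, and DISPLAY is given an offset (`dx`, `dy`) instead of an absolute position. The function names and signatures are hypothetical.

```python
from collections import deque


def extract(framebuffer, x, y):
    """EXTRACT sketch: collect the 4-connected region of uniform tone at (x, y).

    Points are inverted in the framebuffer as they are found. Returns a
    framestore of the same size (1s mark the region) and the original tone.
    """
    height, width = len(framebuffer), len(framebuffer[0])
    tone = framebuffer[y][x]
    framestore = [[0] * width for _ in range(height)]
    framebuffer[y][x] = 1 - tone
    framestore[y][x] = 1
    queue = deque([(x, y)])
    while queue:
        cx, cy = queue.popleft()
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and framebuffer[ny][nx] == tone and not framestore[ny][nx]:
                framebuffer[ny][nx] = 1 - tone
                framestore[ny][nx] = 1
                queue.append((nx, ny))
    return framestore, tone


def display(framebuffer, framestore, tone, dx=0, dy=0):
    """DISPLAY sketch: write `tone` wherever the framestore is marked, shifted by (dx, dy)."""
    height, width = len(framebuffer), len(framebuffer[0])
    for y, row in enumerate(framestore):
        for x, marked in enumerate(row):
            if marked and 0 <= x + dx < width and 0 <= y + dy < height:
                framebuffer[y + dy][x + dx] = tone


def find(framebuffer, x, y):
    """FIND sketch: EXTRACT followed by DISPLAY at the same place (non-destructive)."""
    framestore, tone = extract(framebuffer, x, y)
    display(framebuffer, framestore, tone)
    return framestore


image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
before = [row[:] for row in image]
store, tone = extract(image, 1, 1)  # destructive: the region is inverted on screen
display(image, store, tone)         # putting it back completes a FIND
```

The isolated point in the bottom right corner is not part of the extracted region because it is only diagonally adjacent to nothing in the region and shares no 4-adjacent path with it.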
Figure 9. Primitive commands (a) screen image (b) EXTRACT (c) FIND (d) COPY (e) MOVE (f) DRAW BOUNDARY (g) FILL (h) COPY-GROUND (i) MOVE-FIGURES-ON-GROUND
With slight modification to the above a function can be defined that copies a region from one location in the framebuffer to another, Figure 9(d) from 9(c):
COPY region
• FIND region.
• Get new location (x1, y1) from the user.
• DISPLAY region at (x1, y1).
A region can be moved from one location in the framebuffer to another if the step of displaying it at its original location is left out, Figure 9(e) from 9(d):
MOVE region
• Get the x, y of a point in the region from the user.
• EXTRACT region at x, y.
• Get the new location (x1, y1) from the user.
• DISPLAY region at x1, y1.
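COPY and MOVE differ only in whether the region is redisplayed at its original location, which a short sketch makes visible. The code below is an illustration under assumptions: for brevity the framestore is held as a set of (x, y) points rather than in the same form as the framebuffer, the new location is interpreted as the place where the user's identified point should land, and all function names are hypothetical.

```python
def extract(fb, x, y):
    """EXTRACT sketch: collect the 4-connected region at (x, y) and invert it in fb."""
    tone, stack, region = fb[y][x], [(x, y)], set()
    while stack:
        cx, cy = stack.pop()
        inside = 0 <= cy < len(fb) and 0 <= cx < len(fb[0])
        if inside and fb[cy][cx] == tone and (cx, cy) not in region:
            region.add((cx, cy))
            stack += [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
    for px, py in region:
        fb[py][px] = 1 - tone  # inversion erases the shape on screen
    return region, tone


def display(fb, region, tone, dx=0, dy=0):
    """DISPLAY sketch: paint the region in `tone`, shifted by (dx, dy) and clipped."""
    for px, py in region:
        nx, ny = px + dx, py + dy
        if 0 <= ny < len(fb) and 0 <= nx < len(fb[0]):
            fb[ny][nx] = tone


def copy(fb, x, y, x1, y1):
    """COPY sketch: FIND the region, then DISPLAY it again at the new location."""
    region, tone = extract(fb, x, y)
    display(fb, region, tone)                  # restore the original (FIND)
    display(fb, region, tone, x1 - x, y1 - y)  # and paint the copy


def move(fb, x, y, x1, y1):
    """MOVE sketch: as COPY, but the original is not redisplayed."""
    region, tone = extract(fb, x, y)
    display(fb, region, tone, x1 - x, y1 - y)


screen = [[0] * 5 for _ in range(5)]
screen[0][0] = screen[0][1] = 1  # a two-point black region on the top row
copy(screen, 0, 0, 0, 2)         # duplicate it two rows down
move(screen, 0, 0, 3, 4)         # then move the original to the bottom right
```

Nothing about the region survives between calls: each operation identifies it afresh from the framebuffer, which is the point of the image handling approach.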
Since a region is inverted as it is extracted, it is in effect filled, or painted, in. If step 3 in FIND is left out a new function can be defined that fills a region:
FILL region
• Get the x, y of a point in the region from the user.
• EXTRACT region at x, y.
This is illustrated in Figure 9(g) where the white area bounded by a black rectangle is filled. As is evident in the above functions, a region is usually retrieved as a preliminary to manipulation (eg COPY). Consequently, once the manipulative process is complete the retrieved region can be discarded. However, a function can be provided for saving an extracted region on disc for later use:
SAVE region on disc
• Get a name for the region from the user.
• Output the framestore onto a disc file having the user specified name.
A complementary function can be defined to load a region from a disc file into a framestore:
LOAD a region from disc
• Get the name of the saved disc file from the user.
• Input the named file into the framestore.
With the aid of these functions the user can manipulate regions as and when they are perceived. An internal model of the image is not maintained in the host machine and as a result the user is not restricted to manipulating the image as if it had a particular and constant structure. If the user's understanding of the image structure changes then he can use the above functions to manipulate it in terms of that structure. He is thus not constrained to a particular view of the image by the computer system.
The basic functions can be extended to allow the manipulation of more complex perceived structures. Consider, for example, the shape in the top left of Figure 9(g). This can be viewed in a number of ways. The black L and T shapes might be seen as lying on a white rectangle, upon a black rectangle, upon a white background. Alternatively it might be seen as a white rectangle with the L and T shapes cut out, upon a black rectangle, upon a white background. In the first case four spatial levels are perceived and in the second three. What is important is that more than one spatial level can be perceived and we would like to be able to handle these 2½D spaces. Figures 9(a)-9(g) reveal that if the image is viewed as planes in space then the object manipulated in the basic functions is always treated as lying on the nearest plane to the viewer (ie no part of it is occluded by a region on a nearer plane). We might define the shape manipulated in this way as a FIGURE since it is viewed as being upon a GROUND. The basic functions, and others, can be combined such that structures perceived as existing in several planes can be manipulated.
Handling the visual entity FIGURE has already been discussed in the sense that the objects (regions) dealt with by the basic functions are implicitly FIGURES. Handling the entity GROUND is more difficult since it involves inferring points that are not present in the framebuffer. The process of extracting a GROUND from the image is given below:
EXTRACT-GROUND at x, y
• EXTRACT region at x, y in the image.
• Locate OUTER-EXTERNAL-BOUNDARY and store in framestore 1.
• EXTRACT region at x, y in framestore 1 into framestore 2.
In the first step the 'visible' part of the entity perceived as GROUND is extracted from the image into framestore 1, Figures 10(a) and 10(b). The purpose of the function OUTER-EXTERNAL-BOUNDARY is to locate, as its name suggests, the outer external boundary of a region and store it. In this case it examines the region in framestore 1 and stores the boundary in framestore 2, Figures 10(b) and 10(c). Extracting the area within this boundary, Figures 10(c) and 10(d), yields the GROUND. The result of a function (eg EXTRACT, OUTER-EXTERNAL-BOUNDARY, etc) always produces a framestore in which the area of interest (eg region) is signified by the presence of 1s, whatever the value of the region processed by the function. The end result of the EXTRACT-GROUND
Figure 10. Primitive EXTRACT-GROUND (panel labels: Image, Framestore, Framestore 1, Pixels of external boundary (outer), Framestore 2)
function is thus a framestore in which the ground is recorded by the presence of 1s although the GROUND region in the image identified by x, y might have been black (1s) or white (0s). We must, therefore, maintain some record of how the final result is to be interpreted. The way in which this can be done is illustrated in the COPY-GROUND function, Figure 9(h):
COPY-GROUND
• Get the x, y of a point in the region perceived as GROUND from the user.
• EXTRACT-GROUND at x, y.
• DISPLAY region in framestore at x, y in framebuffer.
• Get the new location (x1, y1) from the user.
• DISPLAY ground in framestore 2 with the same intensity as the point x, y (in the framebuffer) at x1, y1 in framebuffer.
Thus, if the original region perceived as ground is white, then although the result in framestore 2 is signified by 1s, we can pass the tone of the point used to identify the ground as a parameter to the DISPLAY function so as to inform it of the tone it is to use to display the ground in the framebuffer. It is interesting to compare the COPY function, Figures 9(c) and 9(d), with COPY-GROUND, Figures 9(g)-9(h); although the same x, y in the image is used to indicate the region of interest, the result is quite different.
More complex operations can be defined involving both FIGURE and GROUND:
MOVE-FIGURES-ON-GROUND
• Get the x, y of a point in the region perceived as GROUND from the user.
• EXTRACT-GROUND at x, y.
• Get the new location (x1, y1) from the user.
• DIFFERENCE framestore and framestore 2 and store in framestore 3.
• DISPLAY framestore 3 in the framebuffer at x1, y1 in the inverse tone of x, y.
The operation of this function can be best understood by looking back to Figure 10. After the ground has been extracted the visible part of the ground is held in framestore and the full ground in framestore 2. If we take the DIFFERENCE between these two framestores we are left with the part of the ground concealed by figure; in other words the figure in inverted tone. Thus if we DISPLAY framestore 3 in the inverse tone to the ground we have moved the figure. In this case the figure to be manipulated is identified by a reference to the ground upon which it is perceived to lie.
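The relationship between the visible ground, the full ground and their DIFFERENCE can be sketched as below. Several choices here are assumptions and not the method of the paper: framestores are held as sets of points; instead of tracing the outer external boundary, the full ground is obtained by flooding the exterior of the visible region from a one-point margin (using 4-adjacency) and treating everything not reached as ground; the search is non-destructive; and the old figure positions are repainted in the ground tone so that the result reads as a move. All names are hypothetical.

```python
def region_at(image, x, y):
    """4-connected set of points sharing the tone of (x, y); the image is not altered."""
    h, w, tone = len(image), len(image[0]), image[y][x]
    seen, stack = set(), [(x, y)]
    while stack:
        cx, cy = stack.pop()
        inside = 0 <= cx < w and 0 <= cy < h
        if inside and (cx, cy) not in seen and image[cy][cx] == tone:
            seen.add((cx, cy))
            stack += [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
    return seen


def extract_ground(image, x, y):
    """Return (visible, ground): the visible region and the same region with holes filled."""
    h, w = len(image), len(image[0])
    visible = region_at(image, x, y)
    outside, stack = set(), [(-1, -1)]  # start in a one-point margin around the image
    while stack:
        cx, cy = stack.pop()
        in_margin = -1 <= cx <= w and -1 <= cy <= h
        if in_margin and (cx, cy) not in outside and (cx, cy) not in visible:
            outside.add((cx, cy))
            stack += [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
    ground = {(px, py) for py in range(h) for px in range(w) if (px, py) not in outside}
    return visible, ground


def move_figures_on_ground(image, x, y, x1, y1):
    """Move every figure lying on the ground identified by the point (x, y)."""
    tone = image[y][x]
    visible, ground = extract_ground(image, x, y)
    figures = ground - visible  # DIFFERENCE: the ground concealed by figures
    for px, py in figures:      # assumption: old positions revert to the ground tone
        image[py][px] = tone
    for px, py in figures:      # figures reappear in the inverse tone, shifted
        nx, ny = px + x1 - x, py + y1 - y
        if 0 <= nx < len(image[0]) and 0 <= ny < len(image):
            image[ny][nx] = 1 - tone


picture = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 0],  # a one-point white figure on a black ground
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
move_figures_on_ground(picture, 1, 1, 2, 1)  # shift the figure one point to the right
```

The user points only at the ground; the figure is never identified directly, yet it is the figure that moves.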
Figure 11. Primitive MOVE-FIGURE-AND-GROUND
A function to operate on a figure by reference to a point on the figure has of course been defined 5. One of the features of MOVE-FIGURES-ON-GROUND is that if there are several figures on a ground, Figure 9(i), they will all be moved. In all the functions discussed so far the objects manipulated, whether single or multiple, are of a single tonal value. However both black and white regions can be manipulated together, Figures 9(i) and 11:
COPY-FIGURE-AND-GROUND
• Get the x, y of a point in the region perceived as GROUND from the user.
• EXTRACT-GROUND at x, y.
• DIFFERENCE framestore and framestore 2 and store in framestore 3.
• DISPLAY framestore with the tone of x, y.
• Get the new location (x1, y1) from the user.
• DISPLAY framestore 2 at x1, y1 in the tone of x, y (eg ground).
• DISPLAY framestore 3 at x1, y1 in the inverse tone of x, y (eg figures).
The functions described above show that it is possible to consider an approach to the design of images that permits greater freedom of manipulation. Since, in this approach, the machine maintains no permanent interpretation or model of the design (image) in progress, other than a digital framebuffer, the user is not constrained to operating on the image as if it contained particular, invariant, objects. As he perceives structures in the image he can identify them to the machine and subsequently operate on them. Consequently, the designer can reserve his judgement about the structure of the image; he can change his mind as he goes along. Thus in Figure 9(d) the perceived object manipulated is a black frame whereas in Figure 9(h) it is seen as a black solid rectangle.
CONCLUSIONS
Conventional computer graphics techniques are inappropriate for dealing with images and hence for applications where the principal design concern is the synthesis of an image. It has been argued that the trend in computer graphics has been towards the generation of displayed images from internal models of the design object and that this has led to relatively inflexible representations of images and also restricted representations (eg lack of colour etc). Image handling, with its emphasis on the framebuffer representation of an image and procedures for operating on it, provides an approach that, in the author's opinion, will lead to a fuller realization of the imaging power of the new graphic displays and the potential of video technology, in general, in computer systems.
There are of course many problems to be resolved. In a sense, the problems are concerned not so much with how images can be represented but with how the user can describe what it is that he wishes to operate on in an image. The functions described in this paper indicate the way this might be done using a descriptive language. Generalizing to greyscale and colour images, of course, increases the difficulty. However, preliminary work has been carried out on greyscale and colour 6-8 and the basic functions are being modified to handle these attributes of an image.
In many applications the displayed image functions simply to present some aspect of the object being designed. In these applications the machine's model of the design is essential and an image handling system might, therefore, seem inappropriate. However it could be argued that the kind of freedom to manipulate an image provided by an image handling system could be helpful in the preliminary design stage. One possibility that we hope to explore in the near future is the conversion of an image produced by an image handling system into an internal model using information provided interactively by a user. An image handling module might then provide a front-end to a conventional modelling system.
REFERENCES
1 Scrivener, S A R, Edmonds, E A and Thomas, L A 'Improving image generation and manipulation using raster graphics' in Proc. CAD 78 IPC Science and Technology Press, Guildford, UK (1978)
2 Scrivener, S A R and Edmonds, E A 'Pictorial properties in raster graphics' Proc. CG 81 Brighton, UK (1981)
3 Schappo, A and Edmonds, E A 'An interactive raster graphics language' Proc. BCS 81 London, UK (1981)
4 Scrivener, S A R 'The interactive manipulation of unstructured images' Int. J. Man-Machine Stud. No 16 (1982)
5 Scrivener, S A R An interactive raster graphics language and system for artists and designers PhD Thesis, Leicester Polytechnic, UK (1981)
6 Boreham, D P and Edmonds, E A 'Extracting shapes from greyscale images' Int. J. Man-Machine Stud. No 16 (1982)
7 Edmonds, E A 'Lattice fuzzy logics' Int. J. Man-Machine Stud. No 13 (1981)
8 Edmonds, E A 'Domains of interest in fuzzy sets' Int. J. Man-Machine Stud. No 14 (1981)