The interactive manipulation of unstructured images


Int. J. Man-Machine Studies (1982) 16, 301-313

STEPHEN A. R. SCRIVENER

School of Mathematics, Computing and Statistics, Leicester Polytechnic, P.O. Box 143, Leicester, U.K.

(Received 23 May 1981, and in revised form 23 September 1981)

Conventional approaches to interactive computer graphics do not always seem appropriate for certain kinds of two-dimensional design (e.g. art, graphics). This paper discusses an approach to computer graphics in which the interaction between the man and the machine is viewed as a process of communicating interpretations. The task for the man is to describe the structure perceived in the displayed image. The task for the machine is to derive an interpretation consistent with the user's perception by utilizing the image (the bitmap of a raster display) and the description of it provided by the user. Examples of techniques used in this approach are discussed, including those for handling figure/ground perceptions. It is argued that by using such techniques it is possible for the user to manipulate unstructured images interactively.

1. Introduction

Writing on being creative with computer aided design, Negroponte (1977) noted:

Sketch recognition is as much a metaphor as a fact. It is illustrative of an interest in those areas of design marked by vagary, inconsistency, and ambiguity. While these characteristics are the anathema of algorithms, they are the essence of design.

This passage is quoted because, like Negroponte, we are interested in an area of design that is characterized by its "vagary, inconsistency, and ambiguity": pictorial art and surface design (e.g. textile and graphic design). The objective of our work is to provide computer aided design facilities that assist designers working in two dimensions in the generation and manipulation of images.

Not so long ago the colour display was the exception; now it is close to becoming a commonplace feature of the computer graphics system. Recent developments have made it possible for those interested in computer graphics to consider systems for handling the kind of images that the artist is able to produce using conventional media. At Leicester Polytechnic, we see the way ahead for two-dimensional interactive computer graphics to be in the direction of television, Fig. 1, with the computer providing a mechanism for integrating images from a variety of sources.

FIG. 1. [Block diagram. Recoverable labels: camera, tablet, computer, video disk, display, video cassette, plotter.]

0020-7373/82/030301+13$03.00/0

(C) 1982 Academic Press Inc. (London) Limited


However, although the technology exists for handling a greater variety of imagery, the problem of interfacing the artist to these resources in a satisfactory manner remains to be resolved. This paper describes our approach to the above problem with particular reference to facilities for the interactive manipulation of unstructured images.

2. One step at a time

It seems clear that, in general, the artistic process is characterized by an uncertainty that sometimes leads to unsuccessful sorties, retreat and new plans of campaign. Simon (1975) has argued that design is "tentative, cumulative and selective" and there is perhaps no better example to support his argument than the production of art. This view is confirmed by looking at what artists have to say about the making of art. The English artist Harold Cohen (1974) writes that "we associate with it (the art making process) an elaborate feedback system between the work and artist; and dependent upon this system are equally elaborate decision-making procedures for determining subsequent 'moves' in the work". This is illustrated by the following passage attributed to Matisse (Guichard-Meili, 1967):

Suppose I set out to paint an interior: I have before me a cupboard; it gives me a sensation of red--and I put down a red which satisfies me; immediately a relation is established between this red and the white of the canvas. If I put a green near the red, if I paint in a yellow floor, there must still be between this green, this yellow and the white of the canvas a relation that will be satisfactory to me.

For Matisse, certain constraints may be clear but the way ahead is by no means certain, for it depends on his response to the image so far. For Paul Klee (1925) it was enough to start without any predetermined constraints: "An active line on a walk, moving freely, without goal. A walk for a walk's sake." Klee's method starts with a confrontation between the artist and the medium he uses. While it is a strategy for beginning, his approach does not prescribe the events that will follow. Thus, the artist may or may not have goals when he starts working and it is difficult to predict how he will proceed, but it is highly likely that he will change his mind about things as he goes along.

Sometimes such changes may defy explanation (Crichton, 1977):

Why did you make that change? Because I did

But what did you see? I saw that it should be changed

Well if you change it what was wrong with it before? Nothing, I tend to think one thing is as good as another

Then why change it? Well I may change it again?

Why? Well I won't know until I do it

This illustrates the difficulty of trying to predict how the artist will behave, and also suggests that even if we ask him why he did something he will probably not be able to tell us. The artist, perhaps more than any other potential computer user, forces us to recognize the unpredictability of human behaviour. Interestingly, we can show that one of the factors influencing the artist's progress is the perceptual ambiguity of the image.

3. The ambiguous image

In writing of the technique called Frottage, which he invented, Max Ernst (1948) had this to say:

I made from the (floor) boards a series of drawings by placing on them at random, sheets of paper which I undertook to rub with black lead. In gazing attentively at the drawing obtained, 'the passages and penumbra', I was surprised by the sudden intensification of my visionary capacities and by the hallucinatory succession of images superimposed, one upon the other with the persistence and rapidity characteristic of amorous memories.

We are all, I am sure, familiar with this kind of experience. Often a patterned wallpaper can be the source of many interesting discoveries. Without this predilection to see form in the formless, the horror film would be robbed of one of its most powerful devices: a foggy day in London town.

FIG. 2.

FIG. 3.

In the passage above, Ernst quotes Leonardo da Vinci who was perhaps the first artist to recognize the value of techniques that keep the image in an indeterminate state during the early stages of picture-making. Leonardo's sketches are characterized by "pentimenti", a welter of lines that shatter the integrity of the perceived image, Fig. 2. Gombrich (1966) has argued that Leonardo allowed the indeterminate to rule the sketch as a means to stimulate the mind to further inventions. Gombrich also demonstrates how a picture perceived in an alternative way by Leonardo provided the stimulus for a new composition. He compares the Neptune sketch, Fig. 3, which was produced while Leonardo was in Florence, engaged on the painting of the Battle of Anghiari, to a sketch of the latter, Fig. 4, arguing that the figure rising with upraised arm over a group of horses suggested the image of Neptune driving his sea-horses. As Gombrich (1966) puts it, "in searching for a new solution Leonardo projected new meaning into the forms he saw in his old discarded sketches". The psychologist Richard Gregory (1973) writes:

It may be that susceptibility to illusion is necessary to being creative, for if we were controlled directly by sensed events ... we would surely be tyrannized by the here and now--imprisoned by what is. Artists with their skill, somehow play upon our potential for illusion, allowing us to see and invent new possibilities.

If this is so the artist must develop an "eye" for illusions. To do this the artist often creates techniques for producing ambiguous patterns open to a variety of equally acceptable interpretations; one might call them illusion generators. Consequently we should expect the artist to be tentative not only with respect to process but also with respect to the image. Indeed, maintaining the image in an indeterminate state can be regarded as a strategy for creative invention.


FIG. 4.

4. Communicating about an interpretation

Typically, in interactive computer graphics, information about an object, or objects, to be displayed is recorded in a data structure. The data structure is then processed by a "viewing algorithm" and presented to the user on a visual display. In response to the displayed picture the user can usually initiate commands that cause modifications to the data structure, and consequently the displayed picture, Fig. 5.

FIG. 5. [Diagram. Recoverable labels: display, algorithm.]


Often the data structure representing the displayed picture is built up implicitly as the user issues drawing, or generative, commands. When this is the case the user may not be aware that a particular structure is being given to an image. If we consider for a moment the screen pattern not as output from the computer but as input to it, then the graphical data structure can be thought of as the computer's interpretation of the screen: its "perception" of the displayed "stimulus". On the other side of the screen we have the artist arriving at his interpretations of the displayed data, Fig. 6.

FIG. 6. Communication about an interpretation. [Diagram. Recoverable labels: display, machine's interpretation of image.]

When the user initiates an action intended to affect the pattern he sees, he sets in motion modifications to the data structure. Thus, although the user may not be aware of it, he is in the business of talking about the machine's interpretation of the image, not what he actually sees. Problems can arise when the user attempts, or wishes, to make modifications to the image that do not match the computer's understanding of it. For example, an image might have been generated in such a way that the computer's internal representation is that of Fig. 7(a), e.g. five lines. However, the user might see, and wish to manipulate, it as two squares, Fig. 7(b); or as a "T" shape and a "U" shape, or in terms of any other interpretation that comes to mind.

FIG. 7. [Panels (a), (b) and (c).]

Now, in most applications such mismatch is, some might argue, not a serious problem because it is not the image that the designer is operating on but the underlying model that the image represents. Thus in a CAD system for room layout, the computer's internal data structures can be viewed as a model of the room layout on which the designer is working. In this context the man and machine can be regarded as sharing a model that has a more-or-less constant meaning for the designer. However, as argued previously, we should not expect the artist to maintain a fixed interpretation of the image. Consequently an approach to interactive computer graphics that places emphasis on the problems of communicating about the machine's particular internal model of the image is likely to lead to systems that are at variance with the artist. We have adopted an approach that moves away from the problems of the "model" and concentrates more on the image. Effectively, we aim to provide facilities that allow the user to manipulate structures, as and when they are perceived in the image.

5. Communicating interpretations

An alternative model of interactive computer graphics is illustrated below.

FIG. 8. Communicating interpretations. [Diagram. Recoverable labels: bitmap, display, interpret.]

Here the central idea is that both the man and the machine are "looking" at the same image. A suitable "image" for the computer is the "bitmap" or "frame-buffer" of a raster scan display (Scrivener, Edmonds & Thomas, 1978). Looking at the situation from the outside in, the "bitmap" can be viewed as a digitization of the screen image. As such it provides a reasonable approximation of the image presented to the user. In all further discussion the computer's image is assumed to be represented on a square grid (bitmap). Like the man, the machine can interpret the image using information extracted from it. As well as being able to inspect the image, the computer can also modify it at the request of the user. The dialogue between the man and the machine can be viewed as one in which the user explains to the machine how he is interpreting the image, or part of the image, and what operations are to be applied to it. Seen in this way, the man-machine interaction is a process of communicating interpretations. To take an example of this approach, consider the binary image in Fig. 9(a).

FIG. 9. Move region at location to x, y. [Panels (a) and (b). Recoverable labels: cursor; x, y.]

Given a point in the region to be moved, it is fairly straightforward to define an algorithm that determines the spatial co-ordinates of the points comprising the perceived region.


Given the start point, the procedure grows the region by iteratively grouping together 4-connected (Rosenfeld, 1970) neighbours of the same value. Any given point in the image has eight immediately neighbouring points. Its nearest neighbours are the four points above, below, left and right of it. When only these neighbours are considered the point is said to be 4-connected. Having determined the region, it is then erased by inverting its tone value and re-displaying the region at its original location [alternatively the procedure can be defined in such a way that the "erasure" is performed in parallel with the extraction of the region (Edmonds, Schappo & Scrivener, 1980)]. The operation is completed by displaying the region at its new location. The internal representation of the extracted region, which is regarded as temporary, can now be discarded, or left in what might be called a "short term memory" that is overwritten as new regions are extracted. Copying a region from one location to another is achieved by leaving out the operation that erases the region identified by the user. These examples illustrate facilities that allow the user to operate on features in the image as they are perceived. In this sense the user is in the business of interactively manipulating an unstructured image.
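The move and copy operations described above can be illustrated with a minimal sketch in Python. It is not the implementation used in the work reported here: the names grow_region and move_region, the representation of the binary image as a list of lists of 0 and 1, and the choice to place the region so that the identified point lands on the target location are all assumptions made for illustration.

```python
from collections import deque


def grow_region(image, seed):
    """Return the set of (row, col) points 4-connected to seed with the same value."""
    rows, cols = len(image), len(image[0])
    value = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        # Only the four nearest neighbours are examined (4-connectivity).
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and image[nr][nc] == value:
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region


def move_region(image, seed, target, copy=False):
    """Move (or copy) the region under seed so that seed lands on target.

    The binary image is modified in place. Assumption: the displacement is
    taken as target minus seed; the text does not prescribe this.
    """
    rows, cols = len(image), len(image[0])
    value = image[seed[0]][seed[1]]
    region = grow_region(image, seed)
    if not copy:
        # Erase by inverting the tone value at the original location.
        for r, c in region:
            image[r][c] = 1 - value
    dr, dc = target[0] - seed[0], target[1] - seed[1]
    for r, c in region:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:  # clip at the image edge
            image[nr][nc] = value
    return region
```

The set returned by move_region plays the part of the temporary internal representation: the caller can discard it or keep it as a "short term memory". Passing copy=True leaves out the erasing step, as described for the copy operation.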

6. Splitting and merging regions

At any moment, it is assumed that the user is interested in a part of the image to the exclusion of the rest. This seems intuitively reasonable and has the additional advantage that, in general, only part of the image will need to be processed. In an interactive graphics environment this can be an important consideration. Given that ours is primarily a task of grouping together points in an image starting from some focal point, it has much in common with region growing in image processing (Zucker, 1976). Region growing, in its simplest form, is the process of joining neighbouring points (or collections of points) into larger regions, subject to specific conditions. For example, given that the number of regions and the location of a single point in each region is known, an algorithm can be developed which starts at the known points and appends all neighbouring points which have the same binary value as the known point to form a region. In this way the entire image can be segmented into regions. The procedure discussed previously for extracting a region identified by the user is, in principle, a region growing process. Unlike most region growing algorithms, it is not a requirement that the segmentation be complete (i.e. every point should be assigned to a region). However, it is a requirement that the partial segmentation that results from its application corresponds to the object perceived by the user.

The region extracted is what Brice & Fennema (1970) have described as an atomic region:

A partition of a set $X$ is any collection of sets $(R_1, R_2, \ldots, R_n)$ such that the union of the $R_k$ is exactly $X$ and the pairwise intersection of the $R_k$ is nil unless the two sets are identical.
If we define some equivalence relation on the array $P$--say, as a trivial example, $P(i, j)$ is equivalent to $P(k, l)$ if their values are equal--then this in turn induces a natural equivalence relation on $G$ given by $(i, j)$ is equivalent to $(k, l)$ if and only if $P(i, j)$ is equivalent to $P(k, l)$. Any equivalence relation on $G$ yields a partition of $G$ into equivalence classes. These classes can be further broken down into maximally connected subsets called connected components; we call these homogeneous connected components atomic regions. Using the equivalence relation induced by the equality of the gray scale, the atomic regions are obtained by the first step of our technique.

An atomic region is thus a connected component of a constant grey level [or a physical region in Boreham & Edmonds' (1982) terms]. Thus, whenever the user identifies a region the partial segmentation that results is an atomic region. However, as Fig. 10 illustrates, such a region cannot be regarded as perceptually atomic because it can be subdivided perceptually.

FIG. 10. Extract region at cursor location. [Panels (a), (b) and (c). Recoverable labels: atomic region, perceived region.]
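To make the notion of an atomic region concrete, the following sketch in Python segments a whole image into atomic regions, that is, maximal 4-connected sets of points of equal value. It is offered only as an illustration of the definition quoted above, not as the procedure of Brice & Fennema or of the system described here; the function name atomic_regions and the list-of-lists image representation are assumptions.

```python
def atomic_regions(image):
    """Partition an image into atomic regions.

    Returns (labels, regions): labels[r][c] is the index of the region that
    contains point (r, c), and regions[k] lists the points of region k.
    """
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] is not None:
                continue  # already assigned to an earlier region
            index = len(regions)
            labels[r0][c0] = index
            members = [(r0, c0)]
            stack = [(r0, c0)]
            while stack:
                r, c = stack.pop()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] is None
                            and image[nr][nc] == image[r0][c0]):
                        labels[nr][nc] = index
                        members.append((nr, nc))
                        stack.append((nr, nc))
            regions.append(members)
    return labels, regions
```

Every point receives exactly one label, so the regions form a partition in the sense quoted above. Nothing in the procedure knows about shape, which is why an atomic region can still contain several perceptually distinct parts.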

The algorithm could be modified such that an atomic region or part of one results depending, let us say, on the shape of the region identified by the user. However, the uncertainty of the output from such a procedure may lead to frustration for the user. Clearly the correct person to resolve the kind of ambiguity evident in Fig. 10 is the user. In tackling this problem we abide by the principle that wherever possible the user should be able to choose the segmentation to be performed by providing the system with a high level description of it. Thus, for example, Fig. 10(c) might result from a request such as: "EXTRACT THE SQUARE SHAPED REGION AT CURSOR LOCATION". Here the meaning of REGION is being qualified by SQUARE SHAPED. So far as the system is concerned, without qualification, it understands a REGION to be an atomic region. The atomic region represents a base element to which the user can return at any point. In other words the user can be sure about what will result when he asks for a REGION to be extracted. It is important that the base element used in image description is easily recognised by the user and the atomic region is a suitable choice in this sense. Given the notion of a REGION the structures perceived by the user can be viewed as a segmentation resulting in a region [Fig. 11(a)], part of a region [Fig. 11(b)], or many regions [Fig. 11(c)].

FIG. 11. [Panels: (a) region; (b) square region; (c) region and outer region.]


The first segmentation is invoked by REGION, the second by adding a qualifier to REGION, the third by describing a relationship between REGIONS. Thus, regional qualifiers control the splitting of a region into parts and inter-regional relators the merging of regions. Inter-regional relators could be provided that allow the relationships between adjacent and non-adjacent regions to be described, Fig. 12. Clearly the latter will be the most difficult to implement satisfactorily.

FIG. 12. [Panel labels: region and inner region; region and region above.]
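The distinction between regional qualifiers and inter-regional relators can be sketched as a small request parser in Python. The grammar is purely hypothetical: the paper gives example requests but no syntax, so the Request type, the parse_request function, the treatment of THE as noise and the fixed AT CURSOR LOCATION suffix are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    verb: str         # e.g. "EXTRACT"
    qualifier: tuple  # words qualifying the first REGION (splitting)
    relators: tuple   # one tuple of words per further REGION (merging)


def parse_request(text):
    """Parse 'VERB [THE] <term> {AND <term>} AT CURSOR LOCATION'."""
    words = [w for w in text.upper().split() if w != "THE"]
    if len(words) < 5 or words[-3:] != ["AT", "CURSOR", "LOCATION"]:
        raise ValueError("request must end with AT CURSOR LOCATION")
    verb, body = words[0], words[1:-3]
    # Split the body into terms at each AND.
    terms, current = [], []
    for word in body:
        if word == "AND":
            terms.append(current)
            current = []
        else:
            current.append(word)
    terms.append(current)
    if any("REGION" not in term for term in terms):
        raise ValueError("every term must name a REGION")

    def modifiers(term):
        # Whatever accompanies REGION in a term is its qualifier or relator.
        return tuple(w for w in term if w != "REGION")

    return Request(verb, modifiers(terms[0]),
                   tuple(modifiers(term) for term in terms[1:]))
```

Under these assumptions an unqualified request yields an empty qualifier and no relators (an atomic region), "SQUARE SHAPED" is recorded as a qualifier that would drive splitting, and "OUTER" or "ABOVE" are recorded as relators that would drive merging.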

7. Inter-planar relationships

Thus far, the image has been viewed as a kind of two-dimensional jig-saw puzzle. However, the perception of pictures invariably involves two planes: a figure (foreground) plane and a ground (background) plane. As soon as we think about moving a region (e.g. Fig. 9) this fact becomes obvious. When the region is moved we do not expect to see a void left in its place. We imagine a plane continuing under the region to be moved. We will therefore need to provide mechanisms that will be able to handle relationships between planes; these we shall call INTERplanar relationships as opposed to relationships in a single plane which we shall call INTRAplanar relationships. The notion of many picture planes complicates the problem because part of a region on a lower plane might be occluded by a region on a higher plane. Consequently, in order to complete a segmentation consistent with the perception of an interplanar relationship between regions, information may have to be inferred. In some instances then, it may not be sufficient simply to group points in the image plane, Fig. 13.

FIG. 13. Figure and ground at cursor location. [Recoverable label: atomic regions.]

The handling of figure/ground relationships in binary images has been described elsewhere (Scrivener & Edmonds, 1980) and is reviewed briefly here. If we ignore certain kinds of figure/ground perception, such as those illustrated in Fig. 14(a) and (b), then a region seen as figure will be enclosed by the region seen as ground, Fig. 14(c).

FIG. 14. [Panels: (a) closure; (b) overlap; (c) figure within ground.]

A region can be defined as having internal and external boundaries. Internal boundaries are comprised of all the points belonging to the region with at least one 4-connected neighbour belonging to another region. An external boundary of a region is the set of points (not belonging to the region) 4-connected to the points defining an internal boundary. Internal and external boundaries can be further subdivided into inner and outer boundaries. If a region is described as a FIGURE by the user its external boundaries will be internal to the GROUND. Alternatively, if a region is described as a GROUND by the user its inner external boundary (boundaries) are internal to the FIGURE (FIGURES). Where a FIGURE encloses a region, the enclosed region is assumed to be part of the GROUND. Using these assumptions, algorithms have been defined for extracting regions described by the user in terms of figure/ground relationships. For example, if the user issues the request, "EXTRACT FIGURE AND ITS GROUND AT CURSOR LOCATION", then the process of interpreting it by the machine can be described in the following way. A region is extracted that has been identified by the user as the FIGURE, Fig. 15(b). Since the GROUND is also required, a point on the outer external boundary of the FIGURE is located and this is used to extract the GROUND, Fig. 15(c). Since part of the GROUND was occluded by the FIGURE this must be inferred. To do this the outer internal boundary of the GROUND is determined and filled in, Fig. 15(d). [For a fuller discussion of algorithms for handling figure/ground in binary images see Scrivener (1982).]

FIG. 15. Figure and ground at cursor location. [Panels (a), (b), (c) and (d).]
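The boundary definitions and the figure/ground request can be sketched in Python as follows. This is an illustration under stated assumptions rather than the algorithm of Scrivener (1982): all function names are hypothetical, the point on the outer external boundary is taken to be the point directly above the topmost, leftmost FIGURE point, and the occluded GROUND is inferred by a simple hole-filling flood from outside the image instead of tracing and filling the outer internal boundary.

```python
from collections import deque

N4 = ((-1, 0), (1, 0), (0, -1), (0, 1))
N8 = N4 + ((-1, -1), (-1, 1), (1, -1), (1, 1))


def grow_region(image, seed):
    """Points 4-connected to seed that share its value (an atomic region)."""
    rows, cols = len(image), len(image[0])
    value = image[seed[0]][seed[1]]
    region, frontier = {seed}, deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in N4:
            p = (r + dr, c + dc)
            if (0 <= p[0] < rows and 0 <= p[1] < cols
                    and p not in region and image[p[0]][p[1]] == value):
                region.add(p)
                frontier.append(p)
    return region


def internal_boundary(image, region):
    """Region points with at least one 4-connected neighbour in another region."""
    rows, cols = len(image), len(image[0])
    return {(r, c) for r, c in region
            if any(0 <= r + dr < rows and 0 <= c + dc < cols
                   and (r + dr, c + dc) not in region for dr, dc in N4)}


def external_boundary(image, region):
    """Points outside the region that are 4-connected to its internal boundary."""
    rows, cols = len(image), len(image[0])
    return {(r + dr, c + dc) for r, c in region for dr, dc in N4
            if 0 <= r + dr < rows and 0 <= c + dc < cols
            and (r + dr, c + dc) not in region}


def fill_holes(image, region):
    """Return the region together with every point it encloses."""
    rows, cols = len(image), len(image[0])
    # Flood the complement from a one-point margin around the image. The
    # complement is traversed 8-connected because the region is 4-connected.
    outside, frontier = {(-1, -1)}, deque([(-1, -1)])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in N8:
            p = (r + dr, c + dc)
            if (-1 <= p[0] <= rows and -1 <= p[1] <= cols
                    and p not in region and p not in outside):
                outside.add(p)
                frontier.append(p)
    return {(r, c) for r in range(rows) for c in range(cols)
            if (r, c) not in outside}


def figure_and_ground(image, cursor):
    """Extract the FIGURE at cursor and its GROUND, occluded part included."""
    figure = grow_region(image, cursor)
    top = min(figure)  # topmost, then leftmost, FIGURE point
    above = (top[0] - 1, top[1])  # assumed to lie on the outer external boundary
    if above[0] < 0:
        raise ValueError("figure touches the image edge; no enclosing ground")
    ground = grow_region(image, above)
    return figure, fill_holes(image, ground)
```

For a ring-shaped FIGURE the returned GROUND covers the points under the ring and the region the ring encloses, in line with the assumption that a region enclosed by a FIGURE is part of the GROUND.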

8. Conclusions

The artistic process is characterized by a high degree of tentative decision making. Typically, when the artist starts to work he has a "fuzzy" idea about what will result and will probably change his direction and objectives many times as the work progresses. Furthermore, the image itself may, due to its visual ambiguity, provide a source of change. Alternative perceptions of objects in the image may lead to alternative ideas. It has been argued that the conventional approach to interactive computer graphics (described here as a process of communicating about an interpretation), with its emphasis on an internal model from which the displayed image is derived, does not lend itself to a tentative formulation and manipulation of images. An alternative model has been described in which the focus is shifted away from a highly structured internal model used to represent the displayed image to an unstructured representation (e.g. bitmap). The man and machine are viewed as sharing the image ("stimulus"). The problem for the user is to describe to the machine what he sees and where it is located in the image. The problem for the machine is to take this description and any information it can usefully extract from the image and arrive at an interpretation that matches the user's perception. We have described this as a process of communicating interpretations. The image for both the man and machine can be viewed as two-dimensional. However, the visual structures perceived by the user will usually be implicitly or explicitly multiplanar. The task for the system is not just to segment the image but also to make inferences about structures in several planes where necessary. Since the user may interpret an image, or part of an image, in a variety of ways most of which we can only guess at, our approach is to provide facilities that allow the user to be specific. In this sense the system relies upon the user to provide it with assistance. This he does by issuing descriptions of the perceived structure that guide the system in processing the image. The descriptive language employed by the user makes it possible for him to describe relationships between regions in one plane (intra-planar relationships) and also relationships between regions in several planes (inter-planar relationships).

This work has been supported by the S.E.R.C.

References

BOREHAM, D. P. & EDMONDS, E. A. (1982). Extracting shapes from grey-scale images. International Journal of Man-Machine Studies, 16, 315-326.
BRICE, C. R. & FENNEMA, C. L. (1970). Scene analysis using regions. Artificial Intelligence, 1, 205-226.
COHEN, H. (1974). On purpose: an enquiry into possible roles of the computer in art. Studio International, 187 (962), 9-16.
CRICHTON, M. (1977). Jasper Johns. London: Thames and Hudson.
EDMONDS, E. A., SCHAPPO, A. & SCRIVENER, S. A. R. (1980). Graphics without data structures. Proceedings Computer Aided Design 80, Brighton, pp. 138-145.
ERNST, M. (1948). Beyond Painting. New York: Wittenborn, Schultz Inc.
GOMBRICH, E. H. (1966). Leonardo's method for working out compositions. In Norm & Form: Studies in the Art of the Renaissance. Oxford: Phaidon Press, pp. 58-63.
GREGORY, R. L. (1973). The confounded eye. In Illusion in Nature and Art. London: Duckworth.
GUICHARD-MEILI, J. (1967). Matisse. London: Thames and Hudson.
KLEE, P. (1925). Pedagogical Sketchbook. London: Faber & Faber.
NEGROPONTE, N. (1977). On being creative with computer aided design. Information Processing 77, I.F.I.P., Amsterdam: North-Holland, pp. 695-704.
ROSENFELD, A. (1970). Connectivity in digital pictures. Journal of the Association for Computing Machinery, 17, 146-156.
SCRIVENER, S. A. R. & EDMONDS, E. A. (1980). Pictorial properties in raster graphics: classification and use. Proceedings Computer Graphics 80, Brighton, pp. 423-433.
SCRIVENER, S. A. R. (1982). An interactive raster graphics language and system for artists and designers. Ph.D. Thesis, Leicester Polytechnic.
SCRIVENER, S. A. R., EDMONDS, E. A. & THOMAS, L. A. (1978). Improving image generation and structuring using raster graphics. Proceedings of Computer Aided Design 78, Brighton, pp. 213-223.
SIMON, H. A. (1975). Style in design. In EASTMAN, C. M., Ed., Spatial Synthesis in Computer-Aided Building Design. New York: Applied Science Publishers.
ZUCKER, S. W. (1976). Region growing: childhood and adolescence. Computer Graphics and Image Processing, 5, 382-399.