In search of a general picture processing operator


COMPUTER GRAPHICS AND IMAGE PROCESSING 8, 155-173 (1978)

GOESTA H. GRANLUND

Picture Processing Laboratory, Department of Electrical Engineering, University of Linkoeping, S-581 83 Linkoeping, Sweden

Received July 5, 1977; revised October 6, 1977

The problem of finding a general, parallel, and hierarchical operator for picture processing is considered. An operator is defined which at different levels can detect and describe structure as opposed to uniformity within local regions, whatever structure and uniformity may imply at a particular level. The operator performs a mapping from one complex field to another. The important characteristic of this approach is the use of complex fields, which allows a global-to-local feedback. In the transformation process the image is simplified. A Fourier implementation of the operator is described and a new transform is defined. The operators become increasingly global on higher levels in order to include adjacent high-level features. A hierarchical structure of such transformations gives a sequential description of structure over increasingly larger regions of the image. The processed information at different levels can be used as input to a classifier. Examples are given of processing results.

1. INTRODUCTION

Pictorial pattern recognition systems are often described as consisting of three parts: a preprocessing part, a feature extraction part, and a classification part. The preprocessing is used to enhance or sharpen the image to be processed. This is usually done using linear operations or operations on the gray scale such as thresholding [1-3]. The classification part is fairly well understood [4, 5]. The feature extractor, on the other hand, is very much dependent upon the actual problem and no general theory has emerged on how to deal with it. Feature extraction procedures so far have been ad hoc, often referred to as "a bag of tricks."

The present work grew out of an interest in finding a single picture operator that could in parallel perform a number of useful operations and that could work on several levels in a hierarchy. One background to this interest is the feeling that the eyes and brains of humans and animals are likely to have such standard operators, as the microstructure of the brain probably could not be specified in detail in the genetic information. It is not unlikely that the brain has a general language that is employed in most of its information processing, which is equivalent to the use of such a general operator. These statements are not meant to imply that this is intended to be a model of biological visual systems [6]. Although the model suggested is strongly influenced by what is known about the visual system in humans and animals, it was derived with the objective of being technically functional.

Another objective for this operator is the need for parallel processing in order to supply the computation power needed to process the large amounts of information involved. A number of structures have been suggested for fast processing of pictorial information [7-13] and in some instances these ideas have been implemented in hardware.

0146-664X/78/0082-0155$02.00/0 Copyright © 1978 by Academic Press, Inc. All rights of reproduction in any form reserved.

2. BASIC FUNCTION OF THE OPERATOR

The basic function of the operator is that in a local region of the image it describes the situation as a magnitude and a direction; that is, a two-dimensional vector (see Fig. 1). The optimal correspondence between the numerical values of the vector and the properties within the region is an open question, and a number of choices are possible. An obvious choice is to have the direction of the vector correspond to directional components within the window. The magnitude can represent the amount of this property, such as the step size of the edge or the amplitude of the gray scale variation of the texture. In general we want an operator that gives zero output from a region with a uniform content and a nonzero output from a structured region, whatever uniformity and structure may imply in a given case and on a given level in the hierarchy.

The model has here been centered around the Fourier transform [14, 15]. There is no reason to believe that it gives the most effective operator; rather, experiments on texture have repeatedly indicated that other descriptors are often more suitable [16-18]. The Fourier transform has, however, a number of attractive analytical features which are useful in a first analysis of the situation. They describe in an elegant way relationships between properties in the space domain and frequency domain as well as information content.

FIG. 1. Illustration of the basic function of the operator. (a) Original image; (b) contribution from window to transformed image.


FIG. 2. Fourier transform plane of image within window. Amplitude content regions of interest indicated.

As indicated earlier, the basic hypothesis is that within a window of a certain size we have, within a certain frequency range, a variation predominantly in one direction, and that it is sufficient to give a single magnitude value related to that direction. This is equivalent to saying that in the two-dimensional Fourier transform of the image within the window, there is a considerable amplitude content within only one of the sectors (see Fig. 2). We will have to omit a discussion of the relevance of this assumption in this context. Consequently we find the maximum amplitude content, $A_\theta$, for some direction $\theta = \tan^{-1}(v/u)$, within a range $\theta_a \le \theta \le \theta_b$ within the frequency window $r_1^2 \le u^2 + v^2 \le r_2^2$:

$$A_\theta = \max \int_{r_1}^{r_2} \int_{\theta_a}^{\theta_b} |F(u, v)| \, d\theta \, dr$$

where $F(u, v)$ is the Fourier transform of a picture region $f(x, y)$:

$$F(u, v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, e^{-j2\pi(ux + vy)} \, dx \, dy$$

where $j = \sqrt{-1}$. We will show later that the use of a window of small size inherently gives the amplitude content within a wide frequency band, although the frequency argument within the integral may assume one single value. The window function selected is a two-dimensional symmetric Gaussian

$$g(x, y) = \left(\frac{a}{\pi}\right)^{1/2} e^{-a(x^2 + y^2)}$$

The Gaussian form has been chosen because it is a smooth function giving a Fourier transform with nonnegative values. The transform's being a Gaussian is an attractive feature in space-frequency relationships and will be discussed later. The Fourier transform is computed for the picture function with the window in position $x_0, y_0$ for a number of discrete angles $\theta_n = \tan^{-1}(v_n/u_n)$ and for a certain frequency $r = (u_n^2 + v_n^2)^{1/2}$. The angles used in the computations have been $\theta_n = (n - 1)\pi/8$, $n = 1, 2, \ldots, 8$. Due to symmetry of the amplitude spectrum, only the $\theta_n$ of the first two quadrants have to be considered (see also Fig. 2):

$$F(r, \theta_n)_{x_0, y_0} = F(u_n, v_n)_{x_0, y_0} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, g(x - x_0, y - y_0) \, e^{-j2\pi(u_n x + v_n y)} \, dx \, dy$$

where $r^2 = u_n^2 + v_n^2$ and $\theta_n = \tan^{-1}(v_n/u_n) = (n - 1)(\pi/8)$, $n = 1, 2, \ldots, 8$. The next step is to find the maximum value of $|F(r, \theta_n)|$ as a function of $\theta_n$. This value is also dependent upon the position of the window, $x_0, y_0$:

$$A(r, x_0, y_0) = \max_{\theta_n} |F(r, \theta_n)|_{x_0, y_0}$$
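As an illustration, the windowed transform and the maximization over the eight angles can be sketched in Python. This is only a sketch under stated assumptions, not the original implementation: the integral is replaced by a sum over integer pixel positions truncated to a square of half-width `half`, the picture is passed as a callable `f(x, y)`, and the names `windowed_amplitudes` and `operator_output` are hypothetical.

```python
import cmath
import math


def windowed_amplitudes(f, x0, y0, r, a, half):
    """Return |F(r, theta_n)| for theta_n = (n - 1) * pi / 8, n = 1..8.

    f      -- picture function f(x, y) -> float (assumed callable for any integers)
    x0, y0 -- window position
    r      -- frequency in cycles per pixel
    a      -- parameter of the Gaussian window
    half   -- truncation half-width of the summation (an approximation)
    """
    norm = math.sqrt(a / math.pi)
    amplitudes = []
    for n in range(1, 9):
        theta = (n - 1) * math.pi / 8
        u, v = r * math.cos(theta), r * math.sin(theta)
        total = 0j
        for y in range(y0 - half, y0 + half + 1):
            for x in range(x0 - half, x0 + half + 1):
                g = norm * math.exp(-a * ((x - x0) ** 2 + (y - y0) ** 2))
                total += f(x, y) * g * cmath.exp(-2j * math.pi * (u * x + v * y))
        amplitudes.append(abs(total))
    return amplitudes


def operator_output(f, x0, y0, r, a, half):
    """Return (A, theta_max): the largest amplitude and the angle where it occurs."""
    amplitudes = windowed_amplitudes(f, x0, y0, r, a, half)
    n_max = max(range(8), key=lambda i: amplitudes[i])
    return amplitudes[n_max], n_max * math.pi / 8


if __name__ == "__main__":
    r = 1 / 8                      # one cycle per 8 pixels (illustrative value)
    a = 4 * r * r * math.log(2)    # window width of one period, T_a = 1 / r
    stripes = lambda x, y: math.cos(2 * math.pi * r * x)
    print(operator_output(stripes, 0, 0, r, a, half=12))
```

With these illustrative values a pattern varying along x selects the angle 0, a pattern varying along y selects pi/2, and a constant picture yields a much smaller, though not exactly zero, amplitude.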

The entire picture is processed by this operation, by means of which $A(r, x_0, y_0)$ is computed for each position $(x_0, y_0)$ of the window. One immediate way to represent this information is as vectors with magnitude $A(r, x_0, y_0)$ and direction $\theta_{n\max}$. This would give a transformed, complex function $f_r'(x, y)$ of the original picture function $f(x, y)$ where

$$f_r'(x, y) = A(r, x, y) \, e^{j\theta_{n\max}}$$

Figure 3 illustrates a simple stylized picture function $f(x, y)$ together with its transform $f_r'(x, y)$.

FIG. 3. A stylized picture function $f(x, y)$, (a), with its transform $f_r'(x, y)$, (b).

A problem with this procedure is that only half of the angular space, $0 \le \theta_n \le 7\pi/8$, is used, and that there are certain ambiguities in the definition of the direction of a line or an edge. Also we conceptually consider the two halves of the picture in Fig. 3 maximally different in terms of line direction when the lines are at right angles. It seems appealing that a maximal difference in terms of direction be described by vectors with opposite directions. The difficulties described can be resolved by giving the vector in the transformed, complex function a direction angle $2\theta$. That is, we define the transformed, complex function $f_r^{(1)}(x, y)$ of the original picture function $f(x, y)$ by

$$f_r^{(1)}(x, y) = A(r, x, y) \, e^{j2\theta_{n\max}}$$

We denote this transform by $\mathcal{G}_r$ where

$$f_r^{(1)}(x, y) = \mathcal{G}_r\{f(x, y)\}$$
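The effect of doubling the angle can be checked with a few lines of Python. The sketch below is illustrative only; `double_angle_vector` is a hypothetical helper name, and the amplitude is simply set to 1.

```python
import cmath
import math


def double_angle_vector(amplitude, theta):
    """Return A * exp(j * 2 * theta) for a line or edge direction theta."""
    return cmath.rect(amplitude, 2 * theta)


if __name__ == "__main__":
    horizontal = double_angle_vector(1.0, 0.0)
    vertical = double_angle_vector(1.0, math.pi / 2)
    flipped = double_angle_vector(1.0, math.pi)
    print(abs(horizontal + vertical))  # close to 0: right angles give opposite vectors
    print(abs(horizontal - flipped))   # close to 0: theta and theta + pi coincide
```

Directions at right angles map to opposite vectors, and the directions theta and theta + pi, which describe the same line, map to the same vector.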

The effect of this transform is illustrated in Figs. 4a and b on the same stylized picture function.

FIG. 4. Result of two transformations. (a) Original image $f(x, y)$. (b) First-level transformation $f_{r_1}^{(1)}(x, y)$. (c) Second-level transformation $f_{r_2}^{(2)}(x, y)$.

3. RELATIONSHIP BETWEEN WINDOW SIZE AND FREQUENCY CONTENT

So far nothing has been said about the width of the window $g(x, y)$ in relation to the frequency content obtained. The window function used is

$$g(x, y) = \left(\frac{a}{\pi}\right)^{1/2} e^{-a(x^2 + y^2)}$$

In the following discussion we will, for the sake of notational brevity and simplicity in illustration, use one-dimensional rather than two-dimensional representations. The reader will have no difficulty in later extending the discussion to two dimensions. The window function in one dimension becomes

$$g(t) = \left(\frac{a}{\pi}\right)^{1/2} e^{-at^2}$$

If we multiply $f(t)$ with a window function, $g(t)$, we can consider the transform value

$$F_{t_0}(r) = \int_{-\infty}^{\infty} f(t) \, g(t - t_0) \, e^{-j2\pi rt} \, dt$$

representing the amplitude content at frequency $r$ of the function $f(t) \cdot g(t)$ near $t = t_0$. However, we can also consider $F_{t_0}(r)$ as representing the amplitude content of $f(t)$ near $t = t_0$, within a frequency range around $r$ equal to the Fourier transform of $g(t)$. This situation is apparent if we observe that for

$$F(r) = \mathcal{F}\{f(t) \cdot g(t)\} = \int_{-\infty}^{\infty} f(\tau) \, g(\tau) \, e^{-j2\pi r\tau} \, d\tau$$

and

$$h(t) = f(t) * [g(t) \, e^{j2\pi r_0 t}] = \int_{-\infty}^{\infty} f(\tau) \, g(t - \tau) \, e^{j2\pi r_0 (t - \tau)} \, d\tau$$

we have

$$F(r_0) = h(0) \quad \text{for } g(t) = g(-t).$$
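The identity $F(r_0) = h(0)$ can be verified numerically. The sketch below replaces both integrals by sums over integer $t$; this discretization, the test signal, and the function names are assumptions made purely for illustration.

```python
import cmath
import math


def gaussian_window(t, a):
    """One-dimensional window g(t) = (a / pi)^(1/2) * exp(-a * t^2)."""
    return math.sqrt(a / math.pi) * math.exp(-a * t * t)


def windowed_coefficient(f, a, r0, t0, span):
    """Discrete stand-in for F_t0(r0) = sum of f(t) g(t - t0) exp(-j 2 pi r0 t)."""
    return sum(
        f(t) * gaussian_window(t - t0, a) * cmath.exp(-2j * math.pi * r0 * t)
        for t in range(t0 - span, t0 + span + 1)
    )


def convolution_output(f, a, r0, t, span):
    """Discrete stand-in for h(t) = sum of f(tau) g(t - tau) exp(j 2 pi r0 (t - tau))."""
    return sum(
        f(tau) * gaussian_window(t - tau, a) * cmath.exp(2j * math.pi * r0 * (t - tau))
        for tau in range(t - span, t + span + 1)
    )


if __name__ == "__main__":
    signal = lambda t: math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t)
    big_f = windowed_coefficient(signal, a=0.05, r0=0.1, t0=0, span=40)
    h_zero = convolution_output(signal, a=0.05, r0=0.1, t=0, span=40)
    print(abs(big_f - h_zero))  # close to 0, because g(t) = g(-t)
```

At a window position $t_0 \ne 0$ the two quantities differ only by the phase factor $e^{j2\pi r_0 t_0}$, so their magnitudes still agree.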

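If $H(r) = \mathcal{F}\{h(t)\}$ then

$$H(r) = \mathcal{F}\{f(t) * [g(t) \, e^{j2\pi r_0 t}]\} = \mathcal{F}\{f(t)\} \cdot \mathcal{F}\{g(t) \, e^{j2\pi r_0 t}\}.$$

Now

$$\mathcal{F}\{g(t) \, e^{j2\pi r_0 t}\} = \mathcal{F}\{g(t)\} * \mathcal{F}\{e^{j2\pi r_0 t}\} = \mathcal{F}\{(a/\pi)^{1/2} e^{-at^2}\} * \delta(r - r_0) = e^{-\pi^2 r^2/a} * \delta(r - r_0) = e^{-\pi^2 (r - r_0)^2/a}.$$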
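A short numerical check of this Gaussian frequency function is sketched below. It assumes that the Fourier integral may be replaced by a sum over integer $t$, which is accurate when $a$ is small enough for aliasing to be negligible; the function names are hypothetical.

```python
import cmath
import math


def weighting_response(r, r0, a, span):
    """Sum over integer t of g(t) exp(j 2 pi r0 t) exp(-j 2 pi r t)."""
    return sum(
        math.sqrt(a / math.pi) * math.exp(-a * t * t)
        * cmath.exp(2j * math.pi * (r0 - r) * t)
        for t in range(-span, span + 1)
    )


def predicted_response(r, r0, a):
    """Closed form exp(-pi^2 (r - r0)^2 / a) from the derivation above."""
    return math.exp(-math.pi ** 2 * (r - r0) ** 2 / a)


if __name__ == "__main__":
    a, r0 = 0.02, 0.1  # illustrative values
    for r in (0.05, 0.10, 0.16):
        print(r, abs(weighting_response(r, r0, a, span=80)), predicted_response(r, r0, a))
```

The two columns printed for each frequency should agree closely, with the maximum at $r = r_0$.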

The preceding mathematics shows that finding the amplitude content at frequency $r_0$ of a function $f(t) \cdot g(t)$ (where $g(t)$ is the window function) is equivalent to convolving $f(t)$ with a weighting function $g(t) \, e^{j2\pi r_0 t}$ and also equivalent to finding the amplitude content of $f(t)$ within a frequency band $G(r) = e^{-\pi^2 (r - r_0)^2/a}$. The previous discussion can easily be extended to two dimensions with the same general result. Observe, however, that the entire transformation

$$f_r^{(1)}(x, y) = \mathcal{G}_r\{f(x, y)\}$$

is not a linear operation, as in every position of the window a maximization with respect to angle is performed. The information content is considerably reduced; for example, the phase angle information has been discarded. Thus, no complete reconstruction can be performed from the transform.

From the preceding discussion it is apparent that one picture transform $f_{r_0}^{(1)}(x, y)$ gives the information from the picture function $f(x, y)$ in a band close to frequency $r_0$. The bandwidth obtained is dependent upon the size of the window. We define the width of the window as the distance $T_a$ between points where the window function has decreased to a value which is one-half of its maximum value:

$$g(T_a/2) = g(0)/2$$

which gives

$$T_a = 2\left(\frac{\ln 2}{a}\right)^{1/2}.$$

If we decide to have $n$ periods of the complex sine wave $e^{j2\pi r_0 t}$ within the window, then it can be shown that

$$G(r) = \exp\left(-\frac{\pi^2 n^2 (r - r_0)^2}{4 r_0^2 \ln 2}\right)$$

and that the relative bandwidth $B$ is given by

$$\frac{B}{2} = \frac{r - r_0}{r_0} = \frac{2 \ln 2}{\pi n} = \frac{0.4413}{n}.$$

Figure 5 shows cosine and sine components of the weighting functions $w(t)$ and corresponding frequency function $G(r)$ for windows of length 1 and 2 periods.

4. EXTENDED FREQUENCY COVERAGE

So far we have only discussed the effects of one single transform $f_{r_0}^{(1)}(x, y) = \mathcal{G}_{r_0}\{f(x, y)\}$ on the original picture. We have seen that the transform contains information only within a band centered around the frequency $r_0$. It is, however, possible to cover a larger frequency range by computing a number of transforms $f_{r_m}^{(1)}(x, y) = \mathcal{G}_{r_m}\{f(x, y)\}$, where the frequencies $r_m$ are chosen so that the individual transfer functions are staggered. One way of doing this is to have the same relative window size in relation to the center frequency $r_m$, and choose $r_m$ so that the lower cutoff frequency of one function coincides with the higher cutoff frequency of another function. That is,

$$\exp\left(-\frac{\pi^2 n^2 (r_{m,u} - r_m)^2}{4 r_m^2 \ln 2}\right) = \exp\left(-\frac{\pi^2 n^2 (r_{m-1,l} - r_{m-1})^2}{4 r_{m-1}^2 \ln 2}\right) = \frac{1}{2}$$

where $r_{m,u} = r_{m-1,l}$, and $r_{m-1}$ and $r_m$ are center frequencies of adjacent frequency functions. This arrangement results in a frequency function that is fairly constant (see Fig. 6). For a picture of 512 × 512 elements we may want to cover a frequency range from zero to 256 cycles per dimension of the image.

A consequence of the constant relative window size in relation to the center frequency is that we obtain a bandwidth that is a constant fraction of the center frequency. This implies frequency windows of decreasing widths as the center frequency decreases (see Fig. 6). If we desire a frequency coverage of this type, there is a restriction upon the relationship between window sizes to use for corresponding center frequencies. We have then

$$G(r - r_1) = G(r - r_2) = G(0)/2$$

where $r_2 < r < r_1$.

FIG. 5. Weighting functions for $n = 1$ and $n = 2$ and corresponding frequency functions.

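We have seen earlier that in this case

$$\frac{r - r_2}{r_2} = \frac{r_1 - r}{r_1} = \frac{2 \ln 2}{\pi n} = \frac{B}{2} = \frac{0.4413}{n}.$$

Solving for $r$ gives

$$r = r_1\left(1 - \frac{B}{2}\right) = r_2\left(1 + \frac{B}{2}\right)$$

and

$$\frac{r_1}{r_2} = \frac{1 + B/2}{1 - B/2} = \frac{n + 0.4413}{n - 0.4413}.$$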

FIG. 6. Arrangement of frequency functions to produce a continuous spectral coverage.


If the number of periods within the window is constant with value $n$, the window size $T_a$ as defined earlier is inversely proportional to the center frequency $r_0$. As an example we obtain

$$T_2/T_1 = r_1/r_2 = 2.57 \quad \text{for } n = 1,$$

$$T_2/T_1 = r_1/r_2 = 1.57 \quad \text{for } n = 2.$$
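The bandwidth and window-size relations can be evaluated directly. The following sketch uses hypothetical function names and assumes that the Gaussian parameter is chosen as $a = 4 r_0^2 \ln 2 / n^2$, which follows from requiring $T_a = n/r_0$ in the definition of the window width.

```python
import math


def half_relative_bandwidth(n):
    """B / 2 = 2 ln 2 / (pi n) for n periods within the window."""
    return 2 * math.log(2) / (math.pi * n)


def adjacent_frequency_ratio(n):
    """r1 / r2 = (1 + B/2) / (1 - B/2) for adjacent center frequencies."""
    half_b = half_relative_bandwidth(n)
    return (1 + half_b) / (1 - half_b)


def window_width(n, r0):
    """T_a = 2 (ln 2 / a)^(1/2), with a chosen so that n periods of r0 fit."""
    a = 4 * r0 ** 2 * math.log(2) / n ** 2
    return 2 * math.sqrt(math.log(2) / a)


if __name__ == "__main__":
    for n in (1, 2):
        r1 = 0.12                              # illustrative center frequency
        r2 = r1 / adjacent_frequency_ratio(n)  # next lower center frequency
        print(n, half_relative_bandwidth(n), window_width(n, r2) / window_width(n, r1))
```

Evaluated this way, the ratios come out at roughly 2.58 for $n = 1$ and 1.57 for $n = 2$, in line with the quoted values up to rounding.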

An infinite number of increasingly larger windows will be needed to cover frequencies down to 0 cycles if we maintain the requirement of the same number of cycles, $n$, within the window at every center frequency. An improvement can be obtained if we decrease the relative size of the window as the frequency decreases. A hypothesis, which here has to be given without further comments, is that some similar effect may be responsible for the reduced sensitivity at low spatial frequencies in the visual system.

5. TRANSFORMATION OF HIGHER-LEVEL COMPLEX FIELDS

As a result of the processing described in the preceding section we have a two-dimensional vectorial field $f_r^{(1)}(x, y) = \mathcal{G}_r\{f(x, y)\}$, describing magnitudes and directions of variations in the original image within the frequency range chosen. This field can now be operated upon with an operator that can accept a vectorial input rather than a scalar one. As we used a form of the Fourier transform on the processing of the original scalar picture, it is logical that we continue with a form of the complex Fourier transform to process the vectorial field obtained. The transformed picture function $f_{r_0}^{(1)}(x, y)$ is here complex:

$$f_{r_0}^{(1)}(x, y) = f_{r_0,\mathrm{Re}}^{(1)}(x, y) + j f_{r_0,\mathrm{Im}}^{(1)}(x, y) = \mathcal{G}_{r_0}\{f(x, y)\} = A(r_0, x, y) \, e^{j2\theta}.$$

The Fourier transform is now computed for the picture transform $f_{r_0}^{(1)}(x, y)$ within a Gaussian window for discrete angles

$$\theta_n = \tan^{-1}\frac{v_n}{u_n} = (n - 1)\frac{\pi}{8}, \quad n = 1, 2, \ldots, 8$$

and for a frequency $r = (u_n^2 + v_n^2)^{1/2}$:

$$F^{(1)}(r, \theta_n)_{x_0, y_0} = F^{(1)}(u_n, v_n) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{r_0}^{(1)}(x, y) \, g(x - x_0, y - y_0) \, e^{-j2\pi(u_n x + v_n y)} \, dx \, dy$$

$$= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{r_0}^{(1)}(x, y) \left(\frac{a}{\pi}\right)^{1/2} e^{-a[(x - x_0)^2 + (y - y_0)^2]} \, e^{-j2\pi(u_n x + v_n y)} \, dx \, dy.$$

We obtain the maximum magnitude, $A^{(1)}(r, x_0, y_0)$, of $F_{\theta_n}^{(1)}(r)$ as a function of $\theta_n$ for every window position $(x_0, y_0)$:

$$A^{(1)}(r, x_0, y_0) = \max_{\theta_n} |F^{(1)}(r, \theta_n)|_{x_0, y_0}.$$


Another vector field $f_r^{(2)}(x, y)$ is generated where the position $(x, y)$ is assigned a value with magnitude $A^{(1)}(r, x, y)$ and angle $2\theta_{n\max}$:

$$f_r^{(2)}(x, y) = A^{(1)}(r, x, y) \, e^{j2\theta_{n\max}}.$$

We denote the transformation by $\mathcal{G}_r$ and

$$f_r^{(2)}(x, y) = \mathcal{G}_r\{f_{r_0}^{(1)}(x, y)\}.$$

We do in general consider the picture functions $f_r^{(n)}(x, y)$ to be two-dimensional complex functions, and the operator as working with complex functions, with $f^{(0)}(x, y)$ or $f(x, y)$ a degenerate special case when the field is scalar. We define

$$f^{(0)}(x, y) = f(x, y) \, e^{j0}.$$
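The behavior of the operator on a complex field can be illustrated with a small synthetic example. In the sketch below the first-level field is a hypothetical one with opposite vectors on either side of the line $x = 0$, corresponding to line directions at right angles; the integral is approximated by a truncated sum over integer positions, and the names `transform_level` and `first_level_field` are assumptions.

```python
import cmath
import math


def transform_level(field, x0, y0, r, a, half):
    """Apply the operator to a complex field at window position (x0, y0).

    field -- function field(x, y) -> complex; a scalar picture is the special
             case with zero phase
    Returns A * exp(j * 2 * theta_max), with theta_n = (n - 1) * pi / 8, n = 1..8.
    """
    norm = math.sqrt(a / math.pi)
    best_amplitude, best_theta = -1.0, 0.0
    for n in range(1, 9):
        theta = (n - 1) * math.pi / 8
        u, v = r * math.cos(theta), r * math.sin(theta)
        total = 0j
        for y in range(y0 - half, y0 + half + 1):
            for x in range(x0 - half, x0 + half + 1):
                g = norm * math.exp(-a * ((x - x0) ** 2 + (y - y0) ** 2))
                total += field(x, y) * g * cmath.exp(-2j * math.pi * (u * x + v * y))
        if abs(total) > best_amplitude:
            best_amplitude, best_theta = abs(total), theta
    return cmath.rect(best_amplitude, 2 * best_theta)


def first_level_field(x, y):
    """Hypothetical first-level result: opposite vectors left and right of x = 0."""
    return cmath.rect(1.0, 0.0) if x < 0 else cmath.rect(1.0, math.pi)


if __name__ == "__main__":
    r = 1 / 8                    # illustrative frequency, cycles per pixel
    a = 4 * r * r * math.log(2)  # window width of one period
    print(abs(transform_level(first_level_field, 0, 0, r, a, half=12)))   # on the border
    print(abs(transform_level(first_level_field, 40, 0, r, a, half=12)))  # inside a region
```

Under these assumptions the magnitude on the border between the two regions is an order of magnitude larger than inside either region.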

6. RESULTS FROM PROCESSING

The result of two levels of transformation is illustrated in Fig. 4. Part b displays the vector field obtained after the first transformation. Part c displays the result of the second transformation. We get a contribution only from the region in part b where the vector field changes. One interpretation of the preceding case is that part a represents an image with regions having different textures. Part b represents the texture description and part c indicates the border between the texture regions. Through two transformations we have been able to detect the border between two texture regions. An attractive feature of the operator is that it indicates borders between vector fields of different magnitude as well as direction.

A few experiments have been performed using this operator. The following photographs represent the positive projections of the field vector along a selected direction vector $p = e^{j\varphi}$. The displayed function is a luminance $L$ where

$$L = \begin{cases} L' & \text{if } L' > 0, \\ 0 & \text{if } L' < 0, \end{cases}$$

and

$$L' = c + (f_r^{(n)}(x, y) \cdot e^{j\varphi}).$$
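A minimal sketch of this display rule follows. It treats the two complex numbers as two-dimensional vectors and forms their scalar product; the function name `display_luminance` is hypothetical.

```python
import cmath
import math


def display_luminance(value, phi, c=0.0):
    """Return L = max(L', 0) with L' = c + (value . exp(j * phi)).

    The scalar product treats both complex numbers as two-dimensional vectors.
    """
    direction = cmath.exp(1j * phi)
    scalar_product = value.real * direction.real + value.imag * direction.imag
    return max(c + scalar_product, 0.0)


if __name__ == "__main__":
    vector = cmath.rect(2.0, 0.0)                # field vector with angle 0
    print(display_luminance(vector, 0.0))        # fully displayed
    print(display_luminance(vector, math.pi))    # suppressed: opposite direction
    print(display_luminance(vector, math.pi, c=3.0))  # bias lifts it above zero
```

A vector aligned with the direction vector is displayed at full magnitude, a vector pointing the opposite way is suppressed, and a positive bias `c` brings suppressed values back into view.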

The character $(\cdot)$ denotes the scalar product, and the constant $c$ is used to set a bias level. In Fig. 7 the original picture $f(x, y)$ (part a) consists of 100 × 100 elements. Part b illustrates the result of the first transformation, $f_{r_1}^{(1)}(x, y)$. Window width is one period of frequency $r_1 = 12/100$ points. Part b represents $(f_{r_1}^{(1)}(x, y) \cdot e^{j0})$. Part c illustrates the result of the second transformation, $f_{r_2}^{(2)}(x, y)$. The picture indicates the magnitude $|f_{r_2}^{(2)}(x, y)|$. Window width is one period of $r_2 = 8/100$ points. Figure 7d illustrates $L$ with $c = 0$, $\varphi = 0$.

FIG. 7. Result of two levels of transformation. (a) Original image $f(x, y)$. (b) $f_{r_1}^{(1)}(x, y)$ displayed with $c = 0$, $\varphi = 0$. (c) $|f_{r_2}^{(2)}(x, y)|$. (d) $f_{r_2}^{(2)}(x, y)$ displayed with $c = 0$, $\varphi = 0$.

The choice of phase angle for the direction vector $e^{j\varphi}$ in $L$ determines the angle of lines to be displayed. This is also illustrated in Fig. 8. The original picture, $f(x, y)$, shown in a, is a checkerboard pattern. Part b is the magnitude of the first transform or $|f_r^{(1)}(x, y)|$. Part c shows $L$ for $f_r^{(1)}(x, y)$ with $c = 0$, $\varphi = 0$. Part d shows $L$ for $f_r^{(1)}(x, y)$ with $c = 0$, $\varphi = \pi$. Part e shows $L$ for $f_r^{(1)}(x, y)$ with $c = c_1 \ne 0$, $\varphi = 0$. Part f shows $L$ for $f_r^{(1)}(x, y)$ with $c = 0$, $\varphi = \pi/2$. Part f shows the ambiguous points in the crossings which represent neither vertical nor horizontal structures.

A satisfying property of the operator is that it transforms a single line back to a line. Lines appear to be some sort of primitives within this transform system. However, as we will see later, groups of lines acquire a composite description and the individual lines lose their separate identity. Figure 9a shows the result of processing an image $f(x, y)$ of two lines joining at right angles. The remaining parts b-f are various display representations of the first transform $f_r^{(1)}(x, y)$. Part b gives the logarithm of the magnitude, $\log |f_r^{(1)}(x, y)|$. Part c gives a thresholded $\log L$ with $c = 0$ and $\varphi = 0$. Parts d-f have instead $\varphi = \pi$, $\varphi = \pi/2$, and $\varphi = 3\pi/2$, respectively. An interesting property of the transform is that the direction of the line can be obtained in any single point of the transformed line. This is an aspect of the important global-to-local feedback property of the transform.

It is also interesting to observe that the endpoints of the lines as well as the joining point have properties that are distinguishable from those of other points on the lines.


FIG. 8. Result of one level of transformation. (a) Original image $f(x, y)$. (b) $|f_r^{(1)}(x, y)|$. (c) $f_r^{(1)}(x, y)$ displayed with $c = 0$, $\varphi = 0$. (d) $f_r^{(1)}(x, y)$ displayed with $c = 0$, $\varphi = \pi$. (e) $f_r^{(1)}(x, y)$ displayed with $c = c_1 \ne 0$, $\varphi = 0$. (f) $f_r^{(1)}(x, y)$ displayed with $c = 0$, $\varphi = \pi/2$.

Figure 10 shows that a picture composed of an array of dots (part a) is transformed into a line (part b).

FIG. 9. Result of transformation of lines. (a) Original image $f(x, y)$. (b) $\log |f_r^{(1)}(x, y)|$. (c) [$\log f_r^{(1)}(x, y)$ - threshold] with $c = 0$, $\varphi = 0$. (d) [$\log f_r^{(1)}(x, y)$ - threshold] with $c = 0$, $\varphi = \pi$. (e) [$\log f_r^{(1)}(x, y)$ - threshold] with $c = 0$, $\varphi = \pi/2$. (f) [$\log f_r^{(1)}(x, y)$ - threshold] with $c = 0$, $\varphi = 3\pi/2$.

FIG. 10. Result of transformation of an array of dots. (a) Original image $f(x, y)$. (b) [$\log f_r^{(1)}(x, y)$ - threshold] with $c = 0$, $\varphi = \pi$.

7. DEFINING A HIERARCHICAL STRUCTURE

We have seen in the preceding sections that it is possible to extract most of the information in a picture by analyzing the content in local regions of varying size. We have also seen some of the effects of sequences of transformations, each with a certain window size giving the information within a limited frequency band. The question now arises: What type of structure can combine these two effects in a useful way?

It has been found useful that the windows become increasingly wider on higher transformation levels. One effect of the transform is that it gives a simplification of the pattern. In order to contain the same average amount of information the window must become wider at higher levels of transformation. After every transformation only higher level features remain, and these features have to be related to other features on the same level. Thus the width of the operational field or the window must be increased.

The organization suggested for a system combining several levels of transformations is indicated in Fig. 11. At the bottom left is the first-order transformation covering the highest frequency band around $r_1$ and consequently having the smallest window size. The window size and thus the sampling frequency are indicated by a grid pattern on this and other picture functions. The transformation gives as a result the complex function $f_{r_1}^{(1)}(x, y)$. In accordance with the earlier discussion, this transformed picture function has a lower feature density and ought to be sampled at a lower density and within a lower frequency band. This is indicated by the grid pattern of lower density for $f_{r_1}^{(1)}(x, y)$.

According to the earlier discussion we should proceed with another transformation of $f_{r_1}^{(1)}(x, y)$. It has been found, however, that a better result is obtained if instead a function $f_{r_1}'^{(1)}(x, y)$ is transformed where

$$f_{r_1}'^{(1)}(x, y) = \begin{cases} f''(x, y) \, e^{j2\theta} & \text{for } f''(x, y) \ge 0, \\ 0 & \text{for } f''(x, y) < 0, \end{cases}$$

and

$$f''(x, y) = k[\log |f_{r_1}^{(1)}(x, y)| - c_{th}]$$

as

$$f_{r_1}^{(1)}(x, y) = |f_{r_1}^{(1)}(x, y)| \, e^{j2\theta}.$$

This transfer function removes low-level noise at a level set by $c_{th}$, and gives a compression of the range of values of $f_{r_1}^{(1)}(x, y)$, emphasizing the middle amplitude range. $k$ is a proportionality factor. It may be interesting to observe that this amplitude characteristic is similar to certain stimulation-response characteristics of the visual system.

In order for us to obtain information within lower-frequency ranges, the original picture has to be processed using wider windows and a lower center frequency $r_2 < r_1$. From Fig. 11 it is apparent that the transformed and rescaled picture is combined with the original picture to form a picture function $d_{r_1}^{(1)}(x, y)$ which is transformed further. The combination operator is denoted $\oplus$ and could in general be addition, multiplication, or something else. In our experiments it has been found that a form of addition gives appealing results. We define

$$d_{r_1}^{(1)}(x, y) = [\,|f_{r_1}'^{(1)}(x, y)| + f(x, y)\,] \, e^{j2\theta}.$$
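The rescaling and the additive combination can be sketched pointwise as follows. Several choices here are assumptions and not taken from the text: the logarithm is taken as the natural logarithm, a zero-magnitude input is treated as falling below the threshold, and the names `rescale` and `combine` are hypothetical.

```python
import cmath
import math


def rescale(value, k, c_th):
    """Return f' = f'' e^{j 2 theta} if f'' >= 0 else 0, f'' = k [log|value| - c_th].

    value is one sample of the first-level transform, whose phase is already 2 theta.
    """
    magnitude = abs(value)
    if magnitude == 0.0:
        return 0j  # log undefined; treated as below threshold (assumption)
    f2 = k * (math.log(magnitude) - c_th)  # natural logarithm (assumption)
    if f2 < 0:
        return 0j
    return cmath.rect(f2, cmath.phase(value))


def combine(rescaled, original):
    """Return d = [|f'| + f] e^{j 2 theta}, with the angle taken from f'."""
    return cmath.rect(abs(rescaled) + original, cmath.phase(rescaled))


if __name__ == "__main__":
    sample = cmath.rect(math.e ** 3, 1.0)  # magnitude e^3, phase 2 theta = 1.0
    f_prime = rescale(sample, k=2.0, c_th=1.0)
    print(abs(f_prime), cmath.phase(f_prime))
    print(combine(f_prime, original=0.5))
```

Where the rescaled transform is zero, the combined value reduces to the original picture value with zero phase, which matches the convention $f^{(0)}(x, y) = f(x, y) e^{j0}$.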

As we deal with a logarithmic representation of the transform, an addition of transforms will imply multiplication of the pictures. It might be argued that we do not perform any logarithmic rescaling on the original image; however, many media for picture input, such as photographic film, do in fact exhibit logarithmic characteristics [19]. As the next transformation is done around a lower center frequency, $r_2$, only a band around that frequency will be taken from the original image. This is indicated by the lower resolution grid imposed upon the original picture function for that frequency.

The combined picture function $d_{r_1}^{(1)}(x, y)$ is now transformed into a function $f_{r_2}^{(2)}(x, y)$, which is rescaled logarithmically and thresholded and combined with the original picture giving a picture function $d_{r_2}^{(2)}(x, y)$, and so on for each level analogously to the previous discussion. The number of transformation levels to use is still an open question, and will have to be determined after further experiments have been performed.

An example of the results of four levels of transformations is given in Fig. 12. Here only the information from consecutive transformations of the highest frequency content of the image is shown. The first level transform indicates edges and high frequency content within the original picture. The direction of edges and high-frequency content is apparent from the various displays having different values of $\varphi$. One of the interesting features is the transform of the pillars in the lower right, which appears in Fig. 12c strongly delineated from the surroundings. In contrast, the roof above the pillars appears in part e. In the second level transform we have in part j a strong contribution in the region of transition from the pillars to the roof. These and similar features can be observed from the different transform displays.

It becomes, however, increasingly difficult to follow what is happening on the higher levels. If we look at the display in part o, we can essentially only say that there are two structurally complex regions within the picture. This may at first seem like a meager result, but it need not be. The complexity indicated within these regions does not here imply a large high-frequency content, although only the information in the original picture within a narrow high frequency band has been employed. It is rather an indication of an amount of structure or complexity that has been retained through the transformation levels. A uniform high-frequency content will disappear after two transformations. For levels 3 and 4 the displays with $\varphi = \pi/2$ and $\varphi = 3\pi/2$ were not recorded because their contributions appeared very small. In Fig. 13 another example is given of a picture with three levels of transformation.

FIG. 12. Result of four levels of transformation, $c = 0$.


FIG. 13. Result of three levels of transformation, $c = 0$.

8. CONCLUDING REMARKS

This work has dealt with the problem of what can be achieved in a picture processing system if one is restricted to a single type of operator to be used in the processing of image information ranging from texture to high-level features. One reason for this restriction is the assumption that the visual system of humans and animals is a repeated array of some fairly simple standard operators. A number of similarities to visual systems in structure as well as effects could be pointed out, but that would be outside the scope of this presentation.

The basic function of the operator is to detect and describe structure as opposed to uniformity within local regions, whatever structure and uniformity may imply at a particular level. The operator performs a mapping from one complex field to another. The significant characteristic of this approach is the use of complex fields, which allows a global-to-local feedback, e.g., at every point of a line we can determine its direction. In the process of transformation the image is simplified considerably.

The original image, which is a scalar field, can be viewed as a special case of a vectorial field with the phase angle set to zero. A complex original image is obtained if we include color information and represent the luminance as the magnitude and the hue as the angle of the vector. The saturation, which carries less information, is left out. The system will then be able to delineate between regions of different color as well as regions of different luminance and texture.

The use of a complex representation which also includes direction seems to be quite effective for edge finding. Preliminary experiments indicate that edge candidate elements that are generated on the lowest level are propagated through the next level only if they have the same or similar directions; that is, they are part of a true edge. Noise tends to give edge candidate elements with random directions which are not propagated through the next level. The multiplicative effect between results on different levels gives the advantage of sharp edges due to the small operation area of the lowest level operator combined with the high suppression of noise due to the fact that the operation on the higher level is more global.

Certain ambiguities can arise due to the fact that the input is a complex field. This means that the same output can be caused by several different combinations of magnitude and phase of the input field. This gives rise to certain "illusions" of the system.

This presentation is centered around a Fourier implementation of the operator. It is far from certain that this is the most effective implementation. In general any meaningful property can be used, which can be described by a two-dimensional vector and can be computed within local regions of the image.

There are indications that variable sensitivity, dependent upon the average output from the operator over local regions larger than the operator window, is a useful feature. This will emphasize faint and isolated features which now have a tendency to be lost through repeated transformations.

An interesting feature can be observed in the processing of edges and texture. It is well known that edge components and texture may have similar characteristics over small regions. In order to distinguish between the cases, larger regions have to be analyzed.
If the texture properties are constant over a large area any trace of them will have disappeared on the second level transform. Only the borders will be left. The edges, on the other hand, are defined only over a small band region and will remain through all transformation levels, or they will combine with other features, e.g., lines, and produce a new, different feature on the higher level.

Finally, it should be noted that the transform described is nonlinear, and that a reconstruction of the original image from transforms cannot be done. Most notably phase information is discarded in the processing.

The computations have been performed on a PDP-9 computer, and the computation of one level of the transform for an image of 256 × 256 elements takes three minutes. A planned special hardware processor will reduce this computation time to less than one second.

REFERENCES

1. H. C. Andrews, Computer Techniques in Image Processing, Academic Press, New York, 1970.
2. T. S. Huang, Picture Processing and Digital Filtering, Topics in Applied Physics, Vol. 6, Springer-Verlag, Berlin, 1975.
3. A. Rosenfeld and A. C. Kak, Digital Picture Processing, Academic Press, New York, 1976.


4. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley-Interscience, New York, 1973.
5. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1972.
6. T. N. Cornsweet, Visual Perception, Academic Press, New York, 1970.
7. S. H. Unger, A computer oriented towards spatial problems, Proc. IRE 46, 1958, 1744-1750.
8. B. H. McCormick, The Illinois pattern recognition computer, IEEE Trans. Electron. Comput. EC-12, 1963, 791-813.
9. M. J. E. Golay, Hexagonal parallel pattern transformations, IEEE Trans. Computers C-18, 1969, 733-740.
10. B. Kruse, A parallel picture processing machine, IEEE Trans. Computers C-22, 1973, 1075-1087.
11. B. S. Gray, The Binary Image Processor and Its Applications, Information International Inc., Los Angeles, Calif. 90064, 90365-SC-Jan. 1972.
12. M. J. B. Duff, D. M. Watson, and E. S. Deutsch, A parallel computer for array processing, in Proc. IFIP Cong. 74, 1974.
13. M. J. B. Duff, Clip 4: A large scale integrated circuit array parallel processor, in Proc. Third Intern. Joint Conf. Pattern Recognition 1976, pp. 728-733.
14. R. Bracewell, The Fourier Transform and Its Applications, McGraw-Hill, New York, 1965.
15. J. W. Goodman, Introduction to Fourier Optics, McGraw-Hill, New York, 1968.
16. R. Bajcsy and L. Lieberman, Texture gradient as a depth cue, Computer Graphics Image Processing 5, 1976, 52-67.
17. J. S. Weszka, C. R. Dyer, and A. Rosenfeld, A comparative study of texture measures for terrain classification, IEEE Trans. Systems, Man and Cybernetics SMC-6, 1976, 269-285.
18. R. M. Haralick, K. Shanmugam, and I. Dinstein, Textural features for image classification, IEEE Trans. Systems, Man and Cybernetics SMC-3, 1973, 610-621.
19. T. G. Stockham, Image processing in the context of a visual model, Proc. IEEE 60, 1972, 828-842.