Simulation of process of forming the language for description and analysis of the forms of images




Pattern Recognition

Pergamon Press 1972. Vol. 4, pp. 101-140.

Printed in Great Britain

I. B. MUCHNIK

Institute of Automatics and Telemechanics, Academy of Science of USSR, Moscow, USSR

(Received 15 January 1971)

Abstract--The theme of this paper is a linguistic approach to the problem of pictorial pattern recognition. Instead of the vocabulary of primitives and the grammar being selected by the designer, this new pattern-recognition system is itself able to produce a new description, define the vocabulary, and construct the grammar. The vocabulary consists of two kinds of words: "forms" of characteristic fragments of pictorial patterns (images) and "locations" of these fragments. The enumeration of the forms and locations describes a pattern. The general scheme of the process of forming the language, the ways of extracting the characteristic fragments, the vectors (sentences) connecting "forms" and "locations", and the algorithms for constructing and comparing the image descriptions are described. Simulations of forming a description language and of the analysis of various forms are presented.

INTRODUCTION

A NEW APPROACH to the problem of picture identification has recently been developed. In contrast to the usual geometrical approach,(1-6) in which an image is viewed as a single entity characterized by a set of numbers (for instance, the blackness of all points of the raster), in this new approach the image is supposed to be made up of parts. The system of image analysis is based on a special language. The language for describing the images is built from a vocabulary of primitives and a set of rules (a grammar). Because of this, the approach is usually called a linguistic or structural approach.(7,8,11-21)

In the papers on the linguistic approach, the vocabulary of the primitives and the grammar are usually selected by the programmer. It is therefore necessary, for each new class of images which the machine should be able to describe, to supply the vocabulary of the primitives and the grammar anew.

In the present paper, research is described which was carried out by the author in the laboratory of Professor M. A. Aiserman, at the Institute for the Control Sciences (Automatica and Telemechanika). An alternative possibility is described which is connected with the development of learning machines. It is assumed that the machine, looking at patterns of a certain class, should by itself form a language which is appropriate for their analysis.

The paper describes a specific class of learning machines which form a very simple linguistic system. The vocabulary of such a machine consists of two types of words. The words of the first type represent the "form" of certain characteristic fragments of images. The words of the second type represent the "location" of these fragments on the picture. The proposed language has an extremely simple grammar: the description of a pattern is equivalent to the enumeration of the locations and forms of its characteristic fragments.


The paper is arranged as follows. Section 1 describes the general scheme of the process of forming the language. Sections 2-5 describe the algorithms: section 2 describes the ways of extracting the characteristic fragments from the patterns under study; section 3 shows how, for the given characteristic fragments, the vectors which give their "form" and "location" are determined; section 4 is devoted to the algorithms for constructing and comparing the image descriptions; section 5 gives certain additions to the scheme just outlined. These additions allow a new, broader grammar to be introduced into the language formed by the machine. Finally, sections 6 and 7 describe experiments on simulating the process of forming a description language and on the analysis of various forms.

1. THE GENERAL SCHEME OF THE PROCESS OF FORMING A LANGUAGE

It is assumed that in many cases of practical interest the pattern of an image can be characterized sufficiently well by the relative locations of its local geometrical features. Because of this assumption, the analysis of the images given to the machine at the time of learning (the training sample) starts by outlining those fragments which are related to some geometrical features. These fragments are called characteristic fragments. In order to extract the characteristic fragments, a special method of analyzing the images was derived. This method does not require the geometrical features which have to be found to be specified concretely in advance. After having found all of the characteristic fragments in all the images under study, the machine proceeds to form the vocabulary of the "form" of the characteristic fragments. Each fragment has a vector associated with it. The parameters of this vector give its form. In the simplest case, this vector may just be the set of values of the blackness of its points.
Then this set of vectors which describe the form of the characteristic fragments is sorted using algorithms of automatic classification (learning without a teacher).(22,23) In keeping with the idea of the method of automatic classification, the accumulated set of characteristic fragments is thereby divided into classes of "similar shape". Each class found as a result of the classification of the fragments is taken to be a separate "word"; the vocabulary "form" is the set of these words. With such a system, one can determine for each new fragment which of the existing classes it is "closest to according to form".*

The vocabulary of "locations" is formed in exactly the same way. First, for each of the separated characteristic fragments a set of characteristics of its "location" is found. In this paper, we take a set of characteristics which describe the diagram of the directions in which all the other characteristic fragments of the image lie in relation to the given fragment. Then the whole group of these sets is divided into "clusters" by the same algorithms of automatic classification mentioned earlier. The classes thus found make up the vocabulary "location". The formation of the vocabularies "form" and "location" completes the first part of the learning by the machine.

* This procedure of relating a fragment to one of the existing classes may be accomplished by any of the conventional algorithms of pattern recognition which are based on the geometric approach, using as prototypes the fragments which were used in the initial classification.(1,2)
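The vocabulary-forming step described above can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: plain k-means with a deterministic farthest-point start is used as a stand-in for the automatic classification algorithm of References (22, 23), which is not reproduced in this section, and the names `build_vocabulary`, `nearest_word` and `assign` are hypothetical.

```python
import numpy as np

def assign(x, centers):
    """Index of the nearest centre (squared Euclidean distance) for each row of x."""
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def build_vocabulary(vectors, n_words, n_iter=20):
    """Divide fragment vectors into n_words classes ("words").

    k-means is a stand-in technique here, not the paper's own algorithm.
    """
    x = np.asarray(vectors, dtype=float)
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [x[0]]
    for _ in range(1, n_words):
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = assign(x, centers)
        for k in range(n_words):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return centers, assign(x, centers)

def nearest_word(vector, centers):
    """Class to which a new fragment vector is "closest according to form"."""
    v = np.atleast_2d(np.asarray(vector, dtype=float))
    return int(assign(v, centers)[0])
```

The same two functions would serve for the "location" vocabulary, since the text forms both vocabularies with one classification procedure.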


The second part consists of teaching the machine to construct a description of the image using the vocabularies found. In addition, a method for comparing these descriptions is developed, which makes it possible to take images which are similar "from the human point of view" and group them into classes (patterns). One introduces the following elementary statement characterizing the image: "On the image, at location so and so, there is a characteristic fragment of such and such form." The description of the image is given by a characteristic table, or "form-location" array. The rows of this array correspond to the different elements of the vocabulary "form", and the columns to the different elements of the vocabulary "location"; each element of the array corresponds to an elementary statement. To each element of the array, where a row and a column meet, a number is assigned which characterizes the "reliability measure" of the statement connected with this element. In particular, one may use the simplest coding of the truth of the statement, namely "true" or "false" ("1" or "0"). Consequently, we introduce a space each point of which is given by a "form-location" array. It is assumed that points which are close in this space in fact describe patterns which are similar. This space will be called the space of elementary statements. Thus, it is again possible to use the geometrical approach. The procedure of finding the rules for recognizing the images, which are transformed into points in the space of elementary statements ("form-location" arrays), consists of finding a separating function in this space. This problem is solved by the conventional geometrical methods.(1,2,4)

Investigating how well the closeness of points in the space of elementary statements corresponds to the actual closeness of patterns, and to our own ideas of it, is the main task of the experimental part of the present paper.

It is assumed that all images are given on a discrete raster of one fixed size, although this size may be varied over a very wide range. The images are processed sequentially. There are two possible schemes for the operation of the general algorithm. In the first scheme the algorithm proceeds stage by stage. First, the characteristic fragments are found on all the images supplied for analysis; then for each of these fragments the vectors which characterize their "form", "location", etc. are calculated. In the second scheme the algorithm completes all the stages of the processing on one image, then repeats the same procedure on the next image, and so on until the whole group of images to be processed is exhausted.

2. THE ALGORITHMS FOR EXTRACTING CHARACTERISTIC FRAGMENTS

In this section three algorithms for finding the characteristic fragments of an image are described. As usual, the program for each algorithm should allow for modifications of various kinds so that the experimenter can look for a better solution. The most important of these possibilities are described separately with each of the algorithms. As input information, each of the algorithms described below uses the set of images given directly as a collection of values of the blackness of all the squares of the raster. All the algorithms use a "window" on the image, with which a fragment of some simple form, for instance a square, can be cut out.


The general idea of the algorithms is based on the introduction of a special function which assigns a number to each fragment cut out by the window. Thus, for each fixed image, this function depends only on the coordinates of the center of the window, which give its location. We will call this function an informativity function. In this paper, a characteristic fragment is one at which the informativity function attains a local extremum.

2.1. Algorithm of search with prototype images

The algorithm uses a special image, shown in Fig. 1, where the number in each square gives the blackness. In this image the blackness is maximum in the center and falls monotonically with the distance from the center. Such an image will be called a prototype. As an informativity function, we took the Euclidean distance, in the space of the receptors cut out by the window,(1,2) between the prototype image and the present fragment (Fig. 2). The field of the image is viewed through the window in search of the characteristic fragments in the following way.

(1) A random point is chosen, and the center of the window is placed at this point.
(2) The fragment of the image cut out by the window is compared with the prototype image and the value of the informativity function is computed.
(3) By changing the location of the center of the window, a gradient search for a local minimum (or maximum) of the informativity function is performed; the fragment cut out by the window at the point of the local minimum (or maximum) is chosen as the characteristic fragment.
(4) If the trial steps do not change the informativity function, a step in a previously fixed direction is made.
(5) Whenever the limit of the field of the image is reached, or whenever the window moves onto a "white" fragment, a random step is performed.
(6) After the characteristic fragment is found, a random step is again made and the whole process is repeated in order to accumulate the given number of characteristic fragments.

The basic modifications provided for in the program of this algorithm are the following:

(1) wide variation of the dimensions of the cutting window;
(2) changing the prototype image (two ways of realizing the prototype image are actually used: (i) directly in the memory, where each square of the prototype image holds an arbitrary number supplied from outside, and (ii) by a function of one variable which gives the dependence of the blackness of the prototype image on the distance from its center to the boundary and is specified by one or two parameters);
(3) changing the rule of search for the characteristic fragments, in particular, simple scanning of the analyzed image with sampling of all of its characteristic fragments;
(4) working with a cutting window of circular form;
(5) changing the form of the informativity function, in particular, choosing as the informativity function the scalar product between the prototype image and the present fragment in the space of the receptors of the window.

2.2. Algorithm of search with de-focused image

If a contour image is de-focused (smeared), then in the locations of a pronounced change in form (such as a "small corner", an "intersection", etc.) the blackness will be found to


be the highest. Taking this fact into account, the following algorithm for finding the characteristic fragments was worked out. The blackness of the initial image is transformed into a new image $\delta'(x)$ by the following transformation:

$$\delta'(x) = \sum_{y} \psi(x, y)\,\delta(y) \qquad (1)$$

where $\delta(y)$ is the blackness of the initial image at point $y$ and $\psi(x, y)$ is some function which decreases as the distance between the points $x$ and $y$ increases. The function $\psi(x, y)$ serves as an a priori weight function expressing the closeness between points on the raster. Following Reference (1) we will call $\psi(x, y)$ the potential function (on a plane). For the centers of the characteristic fragments we choose those points $x$ where $\delta'(x)$ has a local extremum. Figure 3 shows the effect of transformation (1) on a black-and-white image of the digit 5. The procedure for searching for the local extrema of $\delta'(x)$ in this algorithm coincides with the procedure described for the previous algorithm.

The program of this algorithm allows the form of the function $\psi(x, y)$ to be changed over a wide range and, in particular, allows changes in the size of the neighbourhood of the point $x$ in which the function is different from zero. Thus, one may change the "degree" of de-focusing of the image to be analyzed. Changing the de-focusing allows us to extract characteristic fragments of different sizes. (The larger the "degree" of de-focusing, the larger the window which cuts out the fragment should be.) Additionally, a very heavy initial de-focusing speeds up the search for a small image on a large raster (Fig. 4).

2.3. Algorithm of search with a "decorrelation" of images

As the informativity function for finding the characteristic fragments, this algorithm uses the Euclidean distance, in the space of the receptors of the cutting window, between the present fragment in its initial form and the same fragment transformed according to equation (1).
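Equation (1) and the "decorrelation" informativity function just defined can be sketched as follows. This is a minimal illustration, not the original program: the weight 1 / (1 + a r^2) inside a square neighbourhood is an assumed choice of potential function, the border is padded with white (zero) squares, and the function names and parameters are hypothetical.

```python
import numpy as np

def defocus(image, radius=1, a=1.0):
    """Equation (1): delta'(x) = sum over y of psi(x, y) * delta(y).

    psi is assumed to equal 1 / (1 + a * r**2) inside a square
    neighbourhood of the given radius and zero outside it.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    padded = np.pad(img, radius)  # zero ("white") border around the raster
    out = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            weight = 1.0 / (1.0 + a * (dx * dx + dy * dy))
            out += weight * padded[radius + dy:radius + dy + h,
                                   radius + dx:radius + dx + w]
    return out

def decorrelation_informativity(image, cy, cx, half=1, radius=1, a=1.0):
    """Euclidean distance between a window and its de-focused version.

    The window is the (2*half+1)-square centred at row cy, column cx and
    must lie inside the image.
    """
    img = np.asarray(image, dtype=float)
    blurred = defocus(img, radius, a)
    window = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    return float(np.linalg.norm(img[window] - blurred[window]))
```

Increasing `radius` plays the role of a larger "degree" of de-focusing; the centers of characteristic fragments for the algorithm of section 2.2 would then be the local extrema of the array returned by `defocus`.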
Comparing a fragment with its de-focused version in this way is analogous to the "decorrelation" of the image which is very often used in methods of compressing the bandwidth of a television channel.(24) The papers which describe these methods show that geometrical features are strongly emphasized on the decorrelated image.

In order to decrease the volume of computation in realizing this algorithm, a special coarse method of picking the characteristic fragments is used. The window is placed initially at a random point, and the value of the informativity function is computed. Then three trial steps are made: upwards, upwards-left, and upwards-right. Each step is one square of the raster. The center of the window is moved to the one of these three points at which the function value is minimum (maximum). If the shift is upwards, then the same three trial steps are taken again and, according to the same rule, a new point is chosen. If the shift is upwards-left, then the next three trial steps are made in different directions: upwards-left, left, and downwards-left (analogously, after a shift upwards-right: upwards-right, right, and downwards-right). In this way the window has an oriented cone of directions for its trial steps, which turns according to the change in the direction of movement of the window. After a while, when the window has made a trajectory with a


given number of steps, or if it touches the boundary of the raster of the analyzed image, the starting point of a new trajectory is chosen at random. This is repeated until a prespecified number of trajectories is completed. The algorithm chooses as a characteristic fragment one at which a local minimum (maximum) of the informativity function is achieved along the trajectory of the window. Figure 5 illustrates schematically the operation of this algorithm.

The algorithms described evidently do not exhaust all means of finding the characteristic fragments. As is seen directly from inspecting them, it is easy to construct other algorithms which fulfill the same task. They are only examples, but one may probably say that they are "natural" or "typical" examples. In this connection, it should be noted that the first algorithm described was used by Mr. N. V. Zavalishin as a basis for formulating a psychophysiological hypothesis about the pattern of fixations of the human eye when looking over an image.(9)

1 1 1 1 1 1
1 2 2 2 2 1
1 2 3 3 2 1
1 2 3 3 2 1
1 2 2 2 2 1
1 1 1 1 1 1

FIG. 1.

FIG. 2.

FIG. 3.

FIG. 4.

FIG. 5.
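The prototype search of section 2.1 can be illustrated with the prototype of Fig. 1. The sketch below is only an illustration under stated assumptions: it uses the simple scanning variant (modification (3) of section 2.1) rather than the gradient search with random steps, it indexes windows by their top-left corner because the 6 x 6 prototype has no single central square, and the function names are hypothetical.

```python
import numpy as np

# Prototype of Fig. 1: blackness 3 in the centre, falling to 1 at the border.
PROTOTYPE = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, 2, 2, 2, 2, 1],
    [1, 2, 3, 3, 2, 1],
    [1, 2, 3, 3, 2, 1],
    [1, 2, 2, 2, 2, 1],
    [1, 1, 1, 1, 1, 1],
], dtype=float)

def informativity_map(image, prototype=PROTOTYPE):
    """Euclidean distance between the prototype and every window position.

    Entry (r, c) refers to the window whose top-left corner is at (r, c).
    """
    img = np.asarray(image, dtype=float)
    ph, pw = prototype.shape
    rows = img.shape[0] - ph + 1
    cols = img.shape[1] - pw + 1
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = np.linalg.norm(img[r:r + ph, c:c + pw] - prototype)
    return out

def local_minima(values):
    """Positions strictly smaller than all of their (up to 8) neighbours."""
    found = []
    rows, cols = values.shape
    for r in range(rows):
        for c in range(cols):
            block = values[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if np.sum(block <= values[r, c]) == 1:  # only the point itself
                found.append((r, c))
    return found
```

Each position returned by `local_minima(informativity_map(image))` marks a candidate characteristic fragment; replacing the norm with a scalar product and the minima with maxima would give modification (5).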


3. ALGORITHMS FOR CONSTRUCTING THE SPACE OF THE FORMS AND LOCATIONS OF THE CHARACTERISTIC FRAGMENTS

The algorithms described in this section use the following initial information: for each separate image, the collection of all its parts cut out by the moving window as characteristic fragments and, for each of these parts, the coordinates of some fixed point of the window (for example, its center). (The coordinates are determined in the usual rectangular system, with the direction of the axes and the origin tied to the raster in an arbitrary but fixed way.) It is exactly this information which the machine receives as a result of the work of the previous programs, which realize the algorithms for finding the characteristic fragments.

3.1. Algorithm of constructing the vectors which give the form of the characteristic fragments

Let us introduce the space of parameters which determine the form of the characteristic fragments. We will call this space the "form" space and denote it by $\mathcal{F}$. The form of a characteristic fragment, taken as a separate image, is completely determined by the blackness given point by point on the raster of the cutting window. The collection of these values may be used directly as a point in the "form" space $\mathcal{F}$. In other words, we may use the space of receptors of the cutting window directly as the space $\mathcal{F}$. But as was shown in References (23, 24), it is better to de-focus the image fragments first. This "smooths" the small changes that are always present in the position of the window on the raster relative to fragments of the same form found in "different contexts". Thus, the first step of the algorithm for constructing the space for the fragments is the transformation of the vectors $\{\delta(x)\}$ of the space of receptors according to equation (1). The second (and last) step of the algorithm consists of normalizing the vectors found after the transformation:

$$f(x) = \frac{\delta'(x)}{\sqrt{\sum_{x} [\delta'(x)]^{2}}} \qquad (2)$$

where $\delta'(x)$ is the blackness of the transformed fragment at point $x$ and $f(x)$ is the $x$-th component of the vector of the "form" space $\mathcal{F}$.

3.2. Algorithm of constructing the vectors which yield the location of the characteristic fragments

The initial information for constructing the vector which gives the location of a given fragment on a given image is the collection of the pairs of coordinates of all the characteristic fragments of this image. On the raster of the image we draw a circle around the center of the given fragment, with a radius smaller than the distance between the centers of two neighbouring fragments. On this circle we mark the points where it is crossed by the radial lines which connect the center of the given fragment with the centers of the other characteristic fragments of the image under study (these centers lie outside the circle). Let us introduce a special function given on the circle and related to a fixed point $\alpha$ on the circle. We choose it so that it decreases monotonically with increasing distance along the circle from this point $\alpha$. Let us call it the standard diagram of the point $\alpha$, and denote


it by $K_{\alpha}(\beta)$. As an example of such a function we may have

$$K_{\alpha}(\beta) = \frac{1}{1 + a\,\{|\alpha - \beta| \pmod{\pi}\}}, \qquad (3)$$

where $|\alpha - \beta| \pmod{\pi}$ is the distance between the fixed point $\alpha$ and a moving point $\beta$ on the circle and $a$ is a positive constant. Let us note that $K_{\alpha}(\beta)$, viewed as a function of the two variables $\alpha$ and $\beta$, is the potential function on the circle.(1,2) The typical form of such a function is shown in Fig. 6. Now we find, for the given fragment, the function on the circle which is the sum of the standard diagrams of all the points marked on its circle:

$$K(\beta) = \sum_{\alpha} K_{\alpha}(\beta). \qquad (4)$$

Finally, let us normalize the function $K(\beta)$:

$$p(\beta) = \frac{K(\beta)}{\sqrt{\int_{0}^{2\pi} K^{2}(\beta)\,d\beta}}. \qquad (5)$$
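Equations (3)-(5) can be evaluated numerically by sampling the circle. The sketch below is an illustration under stated assumptions: the distance on the circle is taken as the shorter arc between the two angles, the integral in equation (5) is replaced by a Riemann sum over `n_samples` points, and the function name and parameters are hypothetical.

```python
import numpy as np

def location_diagram(center, others, a=1.0, n_samples=360):
    """Sampled p(beta) of equations (3)-(5) for one fragment.

    center: (x, y) of the given fragment; others: centres of the other
    characteristic fragments of the same image.
    """
    step = 2.0 * np.pi / n_samples
    beta = np.arange(n_samples) * step
    k = np.zeros(n_samples)
    cx, cy = center
    for ox, oy in others:
        # Angle of the marked point alpha on the circle.
        alpha = np.arctan2(oy - cy, ox - cx) % (2.0 * np.pi)
        d = np.abs(beta - alpha)
        d = np.minimum(d, 2.0 * np.pi - d)  # assumed shorter-arc distance
        k += 1.0 / (1.0 + a * d)            # equations (3) and (4)
    norm = np.sqrt(np.sum(k ** 2) * step)   # Riemann sum for equation (5)
    if norm == 0.0:                         # no other fragments on the image
        return beta, k
    return beta, k / norm
```

With a single other fragment directly along the positive y axis, the sampled diagram peaks at the sample nearest to an angle of 90 degrees, which is the "radiating star" picture with one ray.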

Let us now introduce a space in which the points (collections of parameters) characterize the location of the characteristic fragments. We will call this the "location" space and denote it by $\mathcal{P}$. The function $p(\beta)$ found in this way is taken as the point of the space $\mathcal{P}$ which corresponds to the given characteristic fragment. The geometric interpretation of the function is self-evident. It represents a smoothed and normalized function of the locations of all characteristic fragments of the image relative to the given fragment, and it resembles a radiating star.

The program used for determining the points of the space $\mathcal{P}$ corresponding to the chosen characteristic fragments is a coarse form of the procedure described above. Figure 7 shows a system of eight overlapping sectors of directions into which the plane may be divided relative to a given point. Using this system of sectors, a vector is calculated for each of the chosen fragments; each of its components corresponds to one of the sectors located about the center of the fragment and shows how many other fragments (their centers) are located inside the boundaries of that sector. The computed vector is then normalized and viewed as the point of the space $\mathcal{P}$ corresponding to the characteristic fragment.

The program also provides for an even coarser procedure for determining the location of the characteristic fragment. For the point in the space $\mathcal{P}$ which corresponds to a characteristic fragment, we take a Boolean vector which shows, for each sector located about the center of the given fragment, whether or not at least one of the other fragments (their centers) lies inside its boundaries.
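The coarse eight-sector procedure and its Boolean variant can be sketched as follows. The exact layout of the sectors is an assumption, since Fig. 7 is not reproduced here: sector k is taken to be centred on k * 45 degrees and 90 degrees wide, so that neighbouring sectors overlap. The function name and parameters are hypothetical.

```python
import math

def sector_vector(center, others, n_sectors=8, width_deg=90.0, boolean=False):
    """Counts of other fragment centres per direction sector.

    Sector k is centred on k * 360 / n_sectors degrees and is width_deg
    wide (an assumed layout). With boolean=True the result only records
    whether each sector contains at least one centre.
    """
    counts = [0] * n_sectors
    cx, cy = center
    for ox, oy in others:
        angle = math.degrees(math.atan2(oy - cy, ox - cx)) % 360.0
        for k in range(n_sectors):
            mid = k * 360.0 / n_sectors
            diff = abs(angle - mid)
            diff = min(diff, 360.0 - diff)  # angular distance to sector axis
            if diff < width_deg / 2.0:
                counts[k] += 1
    if boolean:
        return [1 if c > 0 else 0 for c in counts]
    length = math.sqrt(sum(c * c for c in counts))
    return [c / length for c in counts] if length else [0.0] * n_sectors
```

Because the assumed sectors overlap, a centre lying between two sector axes is counted in both, which gives the coarse vector some of the smoothing that the standard diagrams provide in the continuous procedure.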

FIG. 6.

FIG. 7.


4. ALGORITHMS FOR CONSTRUCTING THE DESCRIPTION OF THE IMAGES AND FORMULATING THE SYSTEM OF PICTURES (CLASSES OF IMAGES)

After the programs for the algorithms described above have been run, each characteristic fragment is represented in the machine by two vectors, $f$ and $p$, which represent respectively the "form" and the "location" of the given fragment. As was said previously, on the basis of the set of vectors $\{f(x)\}$ we may form the vocabulary of "forms", and on the basis of the set of vectors $\{p(x)\}$, the vocabulary of "locations". The two vocabularies are formed separately, using the same algorithm of automatic classification described in Reference (23). As a result of running this algorithm, each vector $f$ stored in the memory of the machine receives an index which encodes the type (name) of the form of the corresponding characteristic fragment. In an analogous way, each vector $p$ receives an index which encodes the type (name) of the location of this fragment on the image. Additionally, the machine forms a decision rule which makes it possible to find the indices of any new vectors $f$ and $p$. These rules are given in the machine in the form of the functions

$$d_F = \max_{i}\,\{(c_i^F, f) - a_i^F\}, \qquad i = 1, \ldots, K_F$$

$$d_P = \max_{j}\,\{(c_j^P, p) - a_j^P\}, \qquad j = 1, \ldots, K_P \qquad (6)$$

where $\{c\}$ are vectors and $\{a\}$ are constants which give the boundaries separating the given space into subspaces; $(x, y)$ is the conventional notation for the scalar product of the vectors $x$ and $y$; $K_F$ is the number of subspaces into which the space $\mathcal{F}$ is divided and $K_P$ is the same for the space $\mathcal{P}$. The point $f$ is assumed to belong to the $s$-th subspace of $\mathcal{F}$ if

$$d_F = (c_s^F, f) - a_s^F. \qquad (7)$$

In an analogous way, the point $p$ belongs to the $t$-th subspace of $\mathcal{P}$ if

$$d_P = (c_t^P, p) - a_t^P. \qquad (8)$$
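The decision rule of equations (6)-(8) amounts to taking the index at which the linear score is largest. A minimal sketch follows; the function name `classify` and the numerical values in the usage note are hypothetical, and the vectors `c` and constants `a` are assumed to have been produced already by the classification stage.

```python
import numpy as np

def classify(vector, c, a):
    """Equations (6)-(8): index and value of the largest (c_i, v) - a_i.

    c is a (K, n) array whose rows are the vectors c_i, and a is a
    length-K array of the constants a_i. The full list of K scores is
    returned as well.
    """
    scores = (np.asarray(c, dtype=float) @ np.asarray(vector, dtype=float)
              - np.asarray(a, dtype=float))
    index = int(np.argmax(scores))
    return index, float(scores[index]), scores
```

For instance, with rows c = (1, 0) and (0, 1), constants a = (0.2, 0.1) and f = (0.6, 0.8), the scores are 0.4 and 0.7, so f is assigned to the second subspace. The same function applies unchanged to a location vector p with its own c and a.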

This completes the process of forming the vocabulary for describing the characteristic fragments. The assignment of the vector $f$ to the $s$-th subspace may naturally be viewed as a sign that the corresponding characteristic fragment has some particular form $S$. On the other hand, if the vector $f$ does not belong to the $s$-th subspace, it is assumed that the corresponding fragment does not have the characteristic of the form $S$. The number

$$d_F(f, i) = (c_i^F, f) - a_i^F \qquad (9)$$

may be viewed as a "reliability measure" of the existence of the characteristic $i$ in the corresponding fragment (vector $f$). The values of the characteristics of the locations are coded in the same way.

As was mentioned earlier, once the initial features characterizing the "form" and the "location" of the geometrical features of the image have been formed, the elementary statements may be naturally constructed. Each statement answers the question: "Is there on the given image, at such and such a location, a fragment (at least one) of such and such a form?" In the simplest case such an elementary statement takes one of two values: "there is" or "there is not". But also, exactly as was done for the initial


feature, one may attach to the elementary statement a number which characterizes the "reliability measure" of the truth of the given elementary statement. As such a number, one may choose, for instance, the average of the product

$$d_F(f, i) \cdot d_P(p, j) \qquad (10)$$

taken over all fragments of a given image. We will use one notation, $g(i, j)$, for the value of the elementary statement, independent of whether it is expressed in discrete or in continuous form. The entire set of values of the elementary statements of this kind which may be found for a given image is used as the description of that image. It is given by the matrix $\|g(i, j)\|$.

The first algorithm for constructing the description of the images is based on the construction of a discrete matrix $\|g(i, j)\|$.

(1) Using equation (6), the corresponding pairs of "form-location" indices are found for all the characteristic fragments of an image.
(2) If the list of these pairs contains at least one pair $(i, j)$, then the value of the corresponding element $g(i, j)$ equals "1". Otherwise, the value of the element $g(i, j)$ equals "0".

The second algorithm computes for the image the continuous values of the elements of the matrix $\|g(i, j)\|$.

(1) For each fragment two sets of numbers are found: $\{d_F(f, i)\}$ and $\{d_P(p, j)\}$.
(2) Based on these sets, the matrix of all possible products $\|d_F(f, i) \cdot d_P(p, j)\|$ is found.
(3) The arithmetic mean of the matrices $\|d_F(f, i) \cdot d_P(p, j)\|$ over all characteristic fragments of the image is found. Let us denote the calculated matrix by $\|d(i, j)\|$.
(4) The matrix $\|d(i, j)\|$ may be viewed as a vector whose components are the elements of this matrix, and this vector is now normalized:

$$g(i, j) = \frac{d(i, j)}{\sqrt{\sum_{i,j} d^{2}(i, j)}} \qquad (11)$$

The resulting matrix is chosen in this algorithm as the description of the image.
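The two algorithms for building the "form-location" array can be sketched as follows. This is an illustration only: the function names are hypothetical, the indices are assumed to start from 0, and the reliability measures are assumed to be supplied as one row of values per fragment.

```python
import numpy as np

def discrete_description(index_pairs, n_forms, n_locations):
    """First algorithm: g[i, j] = 1 if some fragment has form i at location j."""
    g = np.zeros((n_forms, n_locations), dtype=int)
    for i, j in index_pairs:
        g[i, j] = 1
    return g

def continuous_description(form_scores, location_scores):
    """Second algorithm: normalised mean of products, equations (10)-(11).

    form_scores[n] holds d_F(f, i) for fragment n and every form i;
    location_scores[n] holds d_P(p, j) for fragment n and every location j.
    The mean matrix is assumed to be non-zero.
    """
    df = np.asarray(form_scores, dtype=float)
    dp = np.asarray(location_scores, dtype=float)
    # Sum of the per-fragment product matrices, divided by the fragment count.
    d = np.einsum('ni,nj->ij', df, dp) / df.shape[0]
    return d / np.sqrt(np.sum(d ** 2))
```

Either array, flattened into a vector, is one point of the space in which images are compared, so two images can be compared by the Euclidean distance between their arrays.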

Let us now consider the Euclidean space whose axes carry the values of the elements of $g(i, j)$. This space will be called the space of the elementary statements. Let us denote it by $S$. Each point of this space is a description of an image. Taking into account the algorithms for constructing the descriptions, it is natural to assume that "similar" images will be clustered in this space. Such clusters will be called pictures. The algorithm for forming the pictures is therefore equivalent to the already known procedure of automatic classification of points, applied to the space of the elementary statements $S$.

5. ON SOME POSSIBLE ADDITIONS TO THE PROPOSED SCHEME OF THE LEARNING MACHINE

The variant of a learning machine discussed above forms a very simple linguistic system for the analysis of images. The vocabulary of this machine consists of only two types of words. The words of the first type denote the "form" of certain characteristic fragments of the images. The words of the second type express the "location" of such fragments on the images. The proposed language has a very simple grammar: the


description of the image is actually an enumeration of the "form" and "location" of all the characteristic fragments. It so happens that the variant described may serve as the basis for constructing a more advanced linguistic system for the analysis of images. In order to make this clear, in the present section we analyze three concrete examples of modifying the above scheme of a learning machine for the analysis of images. The first of these examples is connected with the use of logical methods for analyzing descriptions of images given in the form of "form-location" arrays. The second is connected with a change of the scheme, namely, the replacement of the integral description of the location of the characteristic fragment by a differential description, which characterizes its location relative, for instance, to only one (previously fixed) fragment of the image. The third modification touches the very base of the scheme. It is proposed here to use as the elements of the vocabulary of elementary features not the local characteristics of the form of the image but its global characteristics, which make it possible to evaluate the type of "movement" of its lines and contours ("tracking features"). As will be shown later, all these modifications may also be viewed as additions to the scheme described above.

5.1. The description of the class of images in the form of a logic function of the discrete characteristic "form-location" arrays

Let us form the arrays of "intersection" and "union" of the discrete characteristic arrays of the images of a given class which are fed to the machine for learning. Let us denote them by C and U. The rows and columns of C and U have the same meaning as the rows and columns of the discrete characteristic arrays of the images. When there is a "1" in a particular cell of all the characteristic arrays under discussion, a "1" is put in the corresponding cell of the array C. If there is a "1" in a given cell of at least one of the characteristic arrays, then a "1" is put in the corresponding cell of the array U. Let us denote by the statement "the array A is included in the array B" the following relation between these two arrays: for every cell of the array A containing a "1", there must also be a "1" in the corresponding cell of the array B. One may say, for instance, that the array C is always included in the array U. With the aid of such a statement, one may introduce the following description of a class of images: an image whose characteristic array is X belongs to a given class of images if the array X is included in the array U of the given class and if at the same time the array C of the given class is included in the array X. A description of this kind actually says what "absolutely must exist" on the images of a given class and what "may exist" on the images of a given class.

5.2. Characteristics which indicate the relative location of two (three) local geometrical features on the image

Let us now review all possible pairs of local geometrical features of a single class extracted from all the data used to teach the machine. For each pair one may introduce the following two characteristics.

The characteristic of the direction. Let us again consider the diagram of Fig. 7 and also Fig. 8. To the system of vectors which may be constructed using such figures, it is only natural to assign the terms: "upwards-left" (ul), "upwards-right" (ur), "downwards-left" (dl), "downwards-right" (dr), "upwards" (u), "right" (r), "downwards" (d), and "left" (l). Using this system it is possible to describe the relative location of two fragments as the

phrase, "on the given image the fragment of form a is located upwards-right from the fragment of form b." Obviously, for a more accurate description of the relative location of local geometrical features on the image, it would be necessary to take the original diagram with a larger number of vectors which fix the directions of the parts.
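The assignment of a direction term to a pair of fragments can be sketched as follows. This is only an illustration under stated assumptions: the function name `direction_term` is hypothetical, the y axis is assumed to point upwards, and each of the eight terms is assumed to own an equal 45-degree sector (the actual sectors of Fig. 7 may differ).

```python
import math

# The eight terms of the text, ordered counter-clockwise starting at "right".
TERMS = ["r", "ur", "u", "ul", "l", "dl", "d", "dr"]


def direction_term(from_xy, to_xy):
    """Return the term saying where `to_xy` lies relative to `from_xy`.

    Assumptions (not from the paper): y grows upwards, and every term
    owns a 45-degree sector centred on its nominal direction.
    """
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    if dx == 0 and dy == 0:
        raise ValueError("fragments coincide; the direction is undefined")
    # Angle of the displacement in [0, 360), measured from the +x axis.
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # Shift by half a sector so that each sector is centred on its term.
    sector = int((angle + 22.5) // 45) % 8
    return TERMS[sector]


# "The fragment of form a is located upwards-right from the fragment of form b":
b_center, a_center = (3, 2), (8, 7)
print(direction_term(b_center, a_center))
```

A finer diagram, as the text suggests, would correspond to more sectors of smaller width.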

The logical characteristics. Let us include in the phrases introduced above one of the following logical characteristics.

(a) It exists on all images of the discussed class.
(b) It exists on at least one image of the discussed class.
(c) It exists on all images of the discussed class, and it does not exist on any image of any other class given to the learning machine.
(d) It exists on at least one image of the discussed class, and it does not exist on any image of any other class given to the learning machine.

Using a chosen logical characteristic, one may extract the pairs which satisfy it. More complicated phrases, which state the existence on the image of three local geometrical features of specific forms in a specific relative location, may be constructed in an analogous way. As for the pairs, one may take the triples and introduce the same logical characteristics.

Let us call the pairs and triples found in this way "terms". The terms will be viewed as logical variables. Any function of these variables in conjunctive-disjunctive form whose arguments include all the fragments chosen on all the images of the class under study will be viewed as a description of this class. It makes sense to treat descriptive and discriminant descriptions of a class separately. In a descriptive description, all the variables of the function are terms found during learning with logical characteristics of type "a" and "b". In a discriminant description, all the variables are terms found during learning with logical characteristics of type "c" and "d". In other words, a descriptive description is given in terms found by analyzing the images of one class only, irrespective of how the images of the other classes are organized. In contrast, a discriminant description is given in terms found by comparing the images of one class with those of other classes.
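The four logical characteristics can be illustrated with ordinary set operations. The sketch below assumes that each image has already been reduced to a set of terms, each term being a hashable tuple such as ("arc", "ur", "corner"); this representation and the name `logical_characteristics` are illustrative assumptions, not the paper's data structures.

```python
def logical_characteristics(images_by_class, target):
    """Split the terms of class `target` by characteristics (a)-(d).

    `images_by_class` maps a class name to a list of images, each image
    being a set of terms (an assumed representation).
    """
    own = images_by_class[target]
    if not own:
        raise ValueError("the class under study has no images")
    # Terms seen on at least one image of any other class.
    elsewhere = set()
    for name, images in images_by_class.items():
        if name != target:
            for image in images:
                elsewhere |= image
    on_all = set.intersection(*own)   # (a): present on all images of the class
    on_any = set.union(*own)          # (b): present on at least one image
    return {
        "a": on_all,
        "b": on_any,
        "c": on_all - elsewhere,      # (c): (a) and absent from every other class
        "d": on_any - elsewhere,      # (d): (b) and absent from every other class
    }


training = {
    "3": [{("arc", "ur", "corner"), ("line", "u", "arc")},
          {("arc", "ur", "corner")}],
    "5": [{("line", "u", "arc"), ("bar", "r", "corner")}],
}
result = logical_characteristics(training, "3")
print(sorted(result["a"]), sorted(result["d"]))
```

Terms of types "a" and "b" would feed a descriptive description, and terms of types "c" and "d" a discriminant one.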

5.3. The formation of "tracking features"

For images which consist of lines, we take the coupling relation as the function which gives the relative location of the characteristic fragments on the image.(20) Two fragments are considered to be coupled if they are connected by a line. Moreover, one may distinguish between connecting lines according to their form: "a line of such and such a slope," "an arc convex to the right," "a spiral," etc. In the terms which give the form of the lines, one may also characterize the form of the geometrical features in which the lines meet; for example, "the given characteristic fragment is the union of lines of such and such types." In this way a new task is formulated: the formation of a vocabulary of global linear "tracking features". The general plan of the solution is clear. The first step is the development of an algorithm for finding the connecting lines. The second step is the construction of a space in which the points define the form of these lines. The third step is the automatic classification of the set of points in this space.


The algorithm of discovering the connecting lines. This algorithm is a modification of the algorithm for extracting the characteristic fragments using a prototype. Here, however, a constraint is put on the prototype image: it must have an outer zone of points where the blackness is less than ½; i.e. it must be a prototype image which allows the extraction of the ends of lines. The algorithm uses new operations on the images: the erasing and restoration of certain parts. The description of the algorithm gives no explicit characteristics of the lines found. It is assumed that before the algorithm starts, all the characteristic fragments have already been found.

(1) The moving window is placed on one of the characteristic fragments of the initial image.
(2) The part of the image seen through the window whose corresponding points on the prototype image have a blackness above ½ (the "half" of the fragment seen through the window) is erased.
(3) The new fragment of the image seen through the window is compared with the white (empty) fragment. If the new fragment is white, the entire image is restored, and we return to step (1) and choose a new fragment; it is then certain that the characteristic fragment from which the search for a connecting line started is isolated. Otherwise, the process of finding the line proceeds.
(4) A search is made for a local minimum of the informativity function.
(5) If the local minimum corresponds to a fragment which contains the center of any of the previously extracted characteristic fragments, the process of finding the connecting line is completed, and the straight line which connects the initial and the final fragments is taken as the connecting line. If this condition is not fulfilled, steps (2)-(5) are repeated starting from the local minimum just found. The connecting line is then the broken line which runs from the initial characteristic fragment through all the local minima found to the last characteristic fragment, found according to the condition of step (5).
(6) The process described in steps (1)-(5) is repeated, each time starting from one and the same characteristic fragment, until it is interrupted in step (3). As a result, the memory of the machine holds all the lines which connect the given characteristic fragment to the other characteristic fragments of the same image.
(7) The process of steps (1)-(6) is repeated for another characteristic fragment, etc., until for each of the characteristic fragments there is a list of all the lines connecting it with the other fragments of the same image.

Algorithm for constructing the space of connecting lines. At the end of the operation of the previous algorithm, each connecting line is described in the machine as a sequence of pairs of coordinates (x, y) of points in the field of the image. As the standard beginning of the sequence, we choose that one of the two characteristic fragments whose x coordinate is larger. If the x coordinates of the fragments are the same, we choose the characteristic fragment whose y coordinate is larger. Let us construct a 2n-dimensional vector corresponding to any sequence of pairs of coordinates (connecting line), where n is much larger than the maximum possible number of points in the sequence. (The number of points in the sequence is denoted by k.) As the first n/k components of this vector, we choose the x component of the standard beginning of the sequence. As the next n/k components we choose the value of the y

component of the standard beginning of the sequence. The third n/k components of the vector are equal to the x component of the second point in the sequence, the fourth n/k components to the y component of the second point, etc. The vector found in this way is then normalized and is considered as a characteristic of the form of the connecting line.
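The construction of the 2n-dimensional vector can be sketched as follows. Two points are assumptions made for the illustration, since the text does not fix them: n is taken to be an exact multiple of k, so that n/k is an integer, and the normalization is taken to be division by the Euclidean length. The name `line_vector` is hypothetical.

```python
import math


def line_vector(points, n):
    """Map a connecting line (a list of (x, y) points) to a 2n-vector.

    Each coordinate of each of the k points is repeated n/k times, so
    lines with different numbers of points land in one common space.
    Assumptions: n is a multiple of k; normalization is Euclidean.
    """
    k = len(points)
    if k == 0 or n % k != 0:
        raise ValueError("n must be a positive multiple of the number of points")
    # Standard beginning: the end with the larger x; ties go to the larger y.
    if tuple(points[-1]) > tuple(points[0]):
        points = points[::-1]
    repeats = n // k
    vector = []
    for x, y in points:
        vector.extend([float(x)] * repeats)  # n/k copies of the x component
        vector.extend([float(y)] * repeats)  # n/k copies of the y component
    norm = math.sqrt(sum(c * c for c in vector))
    if norm == 0.0:
        raise ValueError("a line lying entirely at the origin cannot be normalized")
    return [c / norm for c in vector]


# A two-point line and a three-point line both map to vectors of length 2n = 24.
print(len(line_vector([(0, 0), (3, 4)], 12)), len(line_vector([(0, 0), (1, 2), (3, 4)], 12)))
```

Because the standard beginning is chosen first, a line traversed in either direction yields the same vector.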

Algorithm of formation of the vocabulary. The vectors which give the form of the lines form a space of "tracking features." The set of such vectors corresponding to the lines found on all the images given to the machine for learning is the basis for the formation of a vocabulary of "tracking features." Using the algorithm of automatic classification, this set may be divided into groups of neighbouring vectors. Consequently, the space of "tracking features" is divided into subspaces which yield the sought-after vocabulary.

FIG. 8. Labels: ul, ur, dl, dr.

6. AN EXPERIMENTAL ANALYSIS OF SEPARATE STEPS OF THE PROPOSED SCHEME OF A LEARNING MACHINE

For the experiments we used images of different dimensions (18 × 30, 20 × 25, 27 × 100, 36 × 36, 54 × 54, 72 × 72) and different content (handwritten digits and letters, words, cartographic symbols, parts of topographical charts). Windows of different dimensions were used (5 × 5, 6 × 6, 9 × 9, 12 × 12, 13 × 13, 15 × 15, 17 × 17, 19 × 19). In the case of a window covering 12 × 12 squares of the raster of the original image, the raster of the window itself consisted of 6 × 6 squares, so that each square of the raster of the window corresponds to 4 squares of the raster of the original image. The blackness of a square of the raster of the window is taken to be the average blackness of the corresponding squares of the original field.
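The averaging of the original raster into the coarser raster of the window can be sketched as follows. The function name `window_raster`, the representation of an image as a list of rows of blackness values, and the requirement that the window side be an exact multiple of the raster side are assumptions of this illustration.

```python
def window_raster(image, top, left, size, cells):
    """Average a size x size window of `image` down to cells x cells.

    `image` is a list of rows of blackness values in [0, 1].  With
    size=12 and cells=6, every window square averages a 2 x 2 block,
    i.e. 4 squares of the original raster, as in the text.
    """
    if size % cells != 0:
        raise ValueError("the window side must be a multiple of the raster side")
    step = size // cells
    raster = []
    for r in range(cells):
        row = []
        for c in range(cells):
            block = [
                image[top + r * step + i][left + c * step + j]
                for i in range(step)
                for j in range(step)
            ]
            row.append(sum(block) / len(block))  # average blackness of the block
        raster.append(row)
    return raster


# A 12 x 12 checkerboard: every 2 x 2 block holds two black squares.
board = [[(x + y) % 2 for x in range(12)] for y in range(12)]
print(window_raster(board, 0, 0, 12, 6)[0])
```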

6.1. An experimental analysis of the algorithm of choosing the characteristic fragments

The main goal of this sequence of experiments is the verification of the algorithms for selecting the characteristic fragments. Specifically, the goal is to find out whether the selected fragments are "elementary" from the human viewpoint and whether they are selected independently of the "context" of the image. These experiments also determine the effect of the three parameters and logical characteristics of the analyzed algorithms on the selected fragments. All three algorithms described in Section 2 were analyzed. The locations of the local minima and maxima of the informativity function were studied for each experiment separately. It was shown that the characteristic fragments selected by the described programs can be divided into groups which correspond to the location of the centralized geometrical features of the image. It was shown that the influence of the "encircling fragments" on the selection of the characteristic fragments was small with respect to the change

in the dimensions of the analyzed images by a factor of up to 1.5-2 and to the change of the three parameters as proposed by the algorithm, specifically, the dimension of the moving window. The most effective algorithm in practice was found to be the search with a prototype image: a change in the parameters of the prototype image has practically no influence on the groups of characteristic fragments, and, additionally, this algorithm was found very suitable not only for contours but also for half-tone images. The program for extracting the characteristic fragments was also used for the solution of a special problem: finding the location of a known symbol on a printer's copy of a topographical chart.(8) For this application the program was slightly modified, and the image of the symbol to be found was used as the prototype image. The experiment proved the effectiveness of such a program. Next, we describe the separate experiments in detail.

Digits. The basic data for the comparative study of the various algorithms consisted of digits. In the experiment we used four sets of the digits "3" and "5", whose images were given on a raster of 27 × 20 squares. The first set consisted of five simple images of "3" and four images of "5" (Fig. 9). The remaining sets (Fig. 10) included considerably more images (32 images of each digit). The second set consisted of comparatively simple images of practically the same dimensions. The third set consisted of images handwritten by different people with considerable variation of form. The fourth set included specially drawn images which varied considerably in form and dimensions.

As was stated above, we used windows of different dimensions in the experiment. First we show the results obtained with 6 × 6 windows. The algorithm using the prototype image was analyzed most extensively. Here we searched for the minima of the informativity function. On the first set of images we tried 20 variants of the prototype image, differing in the distribution of the blackness in the field of the window, that is, in the rate of decrease of the blackness with the distance from the center of the window. From these we chose by inspection the two best variants (Fig. 11). The results found using these two prototype images are given in Figs. 12 and 13, respectively. It may be seen that if the prototype (standard) window has no band on its boundary with a blackness of less than ½, it does not allow the selection of the ends of lines (nor of part of the corners on the boundary of a non-contour figure). On the other hand, if such a band exists, it leads the search system to "go outside the figure," which is the reason for the increase in the time for searching for a characteristic fragment when prototype images with an outer band of small blackness (less than ½) are used.

The use of the program based on the algorithm of search with the prototype image on the images of the second and third sets showed that the complication of the initial material has little influence on the character of the chosen fragments. The results found on the images of the fourth set showed, additionally, that the dimensions of the images may also be changed considerably. The results of using the algorithm for finding the characteristic fragments based on the search for local extrema (maxima) of the blackness of the defocused images are given in Fig. 14. From this figure it may be seen that they practically do not differ (in the case of contour images) from those obtained using the algorithm of search with a prototype image. This algorithm is, in addition, considerably more economical: the time of the search for a characteristic fragment was shortened by a factor of 5-8.

The third algorithm analyzed in this paper was the algorithm with a decorrelation of the image. The special feature of the program based on this algorithm was found to be the close similarity of small parts of its search trajectory to the contours of the image (Fig. 15). Because of this, a coarse analysis of the relative location of the characteristic fragments may be used already during the search, which shortens the time of searching for them. Additional economy in search time is achieved by reducing the number of trial steps. (The number of trial steps per real step in this case is smaller than when using the algorithm with the prototype image.) The characteristic fragments chosen in this case are centered somewhat differently, but practically coincide with the results found using the algorithm with prototype images. The peculiarities of the search trajectory may be used not only for shortening the time for selecting the characteristic fragments, but also as additional information about the character of the contours of the images.

With a window of dimension 12 × 12, the experiments were performed on the first set of digits using the algorithm of search with a prototype image. Figure 16 shows examples of the results of the search for the local minima of the informativity function, and Fig. 17 for the local maxima of this function. With such a window (of "large" dimension for these images) the forms of the chosen fragments also turn out not to be complicated. Especially interesting are the characteristic fragments which correspond to the maxima of the informativity function. They pick out the "outside" parts of the figure and the "large", comparatively independent parts. For constructing a description of the images, the use of such fragments together with small fragments which characterize the details may be quite useful; for instance, in order to describe the fact that some fragment of small dimension is found in the field of some large fragment.

The influence of the dimension of the cutting window on the location of the chosen fragments is shown directly in Fig. 18. It is seen that even for considerable changes in the dimensions of the window, the location and number of the characteristic fragments on the image vary little. It is natural, therefore, in real problems to restrict oneself to a coarse choice of the dimensions of the cutting window.

Words. The program based on the algorithm of search with prototype images was used for choosing the characteristic fragments on images of handwritten words (. . . . , . . . . ). Only local minima of the informativity function were chosen. Separate experiments were performed with windows of dimensions 6 × 6 and 12 × 12. The chosen characteristic fragments and their relative location on the images of the words are shown in Fig. 19. As is seen, in this case the compression of the text does not influence the choice of the geometrical features of the specific words. Some characteristic fragments are also selected on the intermediate sections, on the connections between letters, but the number of such fragments is less than 15 per cent of all those selected on the images of the words. It follows that the development of a special method of analysis of such fragments may be needed for the automatic separation of letters from one another in a compressed text.

Cartographic symbols. (See References (8) and (12) for details.) A series of experiments was performed on unnormalized handwritten symbols taken from actual maps. The symbols used were "airplane" and "isolated tree" (10 variants of the image of "airplane" and 11 variants of the image of "isolated tree"). They were enlarged forty times and given on a field of 36 × 36. The fragments were chosen using a window of 9 × 9 squares.

The main purpose of testing the programs on these symbols was to find the performance of the algorithms on non-contour images. In this case we used the algorithm of search with prototype images. Only local minima of the informativity function were chosen. Examples of chosen fragments are shown in Fig. 20. The chosen fragments, although they cannot be called elementary as before, may easily be characterized in words: "the left corner with solid black," "the right contour corner," etc. As in the case of the contour images, the characteristic fragments were selected centrally.

FIG. 9.

FIG. 10.

FIG. 11. (a), (b).

FIG. 12.

FIG. 13.

FIG. 14.

FIG. 15.

FIG. 16.

FIG. 17.

FIG. 18. Labels: window dimensions 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, 15 × 15, 17 × 17, 19 × 19.

FIG. 19.

FIG. 20.

Topographical charts. (See Reference (8) for details.) An experiment in finding a known symbol was performed on three fragments of topographical charts given on a field of 72 × 72. Figure 21 shows, enlarged, one of the chart fragments used in the experiment. Figure 22 shows the same fragment in the form in which it was given to the machine. To solve this problem we used a modified algorithm of search with a prototype image. The modification consists of using the image of the symbol as the prototype (Fig. 23) and using a window of dimension 36 × 36 squares. Practice showed that in order to find a symbol it is necessary to select all the local minima of the function and then to choose the smallest of them. In all three cases this minimum was achieved exactly at the place where the sought symbol was located (Fig. 24).
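The rule "select all the local minima, then choose the smallest" can be illustrated by the sketch below. It is not the program used in the experiment. As a loudly stated assumption, the informativity function is replaced here by a simple stand-in, the sum of absolute blackness differences between the window and the prototype; the search also visits every window position exhaustively rather than following a search trajectory. All names are hypothetical.

```python
def dissimilarity(image, prototype, top, left):
    """Stand-in score (an assumption, not the informativity function):
    sum of absolute blackness differences between window and prototype."""
    return sum(
        abs(image[top + i][left + j] - prototype[i][j])
        for i in range(len(prototype))
        for j in range(len(prototype[0]))
    )


def locate_symbol(image, prototype):
    """Return the best window position and the list of all local minima."""
    rows = len(image) - len(prototype) + 1
    cols = len(image[0]) - len(prototype[0]) + 1
    score = [[dissimilarity(image, prototype, r, c) for c in range(cols)]
             for r in range(rows)]
    minima = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                score[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            # A position is a local minimum if no 4-neighbour scores lower.
            if all(score[r][c] <= v for v in neighbours):
                minima.append((score[r][c], r, c))
    best = min(minima)  # the smallest of all the local minima
    return (best[1], best[2]), minima


plus = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
chart = [[0] * 8 for _ in range(8)]
for i in range(3):
    for j in range(3):
        chart[2 + i][4 + j] = plus[i][j]
position, _ = locate_symbol(chart, plus)
print(position)
```

Keeping the whole list of local minima, rather than stopping at the first one found, mirrors the observation in the text that a single local minimum need not be the location of the symbol.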

FIG. 21.

FIG. 22.

FIG. 23.

FIG. 24.

6.2. Experiment for automatic formation of the vocabulary of the form

The original data for the experiments on automatically constructing the vocabulary of the form are the sets of characteristic fragments chosen in the experiments described above. For each of the fragments we calculated the corresponding f-vector. Then the set of f-vectors was divided into groups according to "similarity" using the algorithm of automatic classification. We used partitioning into 5 and 10 groups. The resulting groups of vectors differ among themselves, may be described verbally, and are quite easily characterized by a conventional figure. For some of the experiments the system of such conventional figures is shown in the table of Fig. 25. (The table consists of the basic features of the characteristic fragments shown in Figs. 27 and 29.) Typical features are "horizontal line," "diagonal leading from left upwards to the right," "small corner," etc.

It is interesting to note that this system of figures for a given vocabulary may easily be found automatically. To achieve this, an averaging is performed over each group of characteristic fragments "similar in form" found by the machine. The averaging results in a prototype which shows the general form of the fragments of the corresponding group. It is done so as to construct a black-and-white image in which blackness exists only in those squares where there is blackness in "practically all" fragments of the group. To achieve this, the characteristic fragments are viewed as vectors whose components assume the values "1" and "0". All fragments of one group are summed as vectors, giving a certain "summary" vector. From this "summary" vector two numbers are found: the number of its nonzero components (m) and the sum of all its components (s). We will call the ratio s/m the threshold.
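The averaging procedure can be sketched in a few lines. Following the text, a component of the prototype is set to "1" when the corresponding component of the "summary" vector is larger than or equal to the threshold s/m, and to "0" otherwise. The function name `prototype_fragment` and the handling of an all-white group (returning an all-white prototype) are assumptions of this illustration.

```python
def prototype_fragment(fragments):
    """Build the 0/1 prototype of a group of 0/1 fragment vectors."""
    # "Summary" vector: component-wise sum of all fragments of the group.
    summary = [sum(column) for column in zip(*fragments)]
    m = sum(1 for value in summary if value != 0)  # number of nonzero components
    s = sum(summary)                               # sum of all components
    if m == 0:
        # Assumption: a group of entirely white fragments gives a white prototype.
        return [0] * len(summary)
    threshold = s / m
    return [1 if value >= threshold else 0 for value in summary]


group = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
# Summary vector is [3, 2, 1, 0], so m = 3, s = 6 and the threshold is 2.
print(prototype_fragment(group))
```

Since s/m is the mean of the nonzero components, squares that are black in only a few fragments of the group fall below the threshold and stay white.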
The prototype fragment is the fragment of "1's" and "0's" obtained by comparing the components of the "summary" vector with the threshold. If the number corresponding to a certain component is larger than or equal to the threshold, this component of the prototype vector has the value "1"; otherwise it has the value "0". The system of such prototypes for the classification into 5 and 10 groups is shown in Fig. 26. The original data for constructing these prototypes were given in Figs.

27 and 29 and are the same data which were used to construct the table of Fig. 25 in a qualitative way, "by inspection". Direct comparison of Figs. 25 and 26 shows that the system of prototypes and the system of figures are practically the same. Thus, a qualitative view of the groups of characteristic fragments found agrees quite well with the "average" characteristic of the difference between these groups. It is interesting to note that the form vocabularies found on the basis of different (in form and dimension) original data, and with the aid of different algorithms for choosing the characteristic fragments, turned out to be quite similar.

Digits. The characteristic fragments chosen on the images of the digits of the first set, shown in Fig. 9, are now divided into groups. Figure 27 shows the result of the classification into five groups of the fragments chosen with the aid of the prototype image of Fig. 11a. Figure 28 shows analogous information for the set of characteristic fragments found using the prototype image of Fig. 11b. Figures 29 and 30 show the results of the classification of the corresponding sets of fragments into ten groups. Here, characteristic fragments which include not more than two black squares were excluded from the groups. Thus, in Fig. 29 the "white" fragments fall completely into the second and third groups. The figures show that the groups are quite different in form, and that each group combines images which, from the human point of view, one would naturally regard as "similar". Each of these groups is characterized by the possible variations of a given specified form. Therefore, finding on a large image a characteristic fragment similar to a representative of one of the groups is naturally interpreted as the presence of the given feature in the image.

In addition to the similarity of the characteristic fragments within one group, it should be noted that there is some similarity between the groups of Fig. 27 and Fig. 28 and, respectively, of Fig. 29 and Fig. 30. In particular, similarity may be seen between the first group of Fig. 27 and the first group of Fig. 28, and between the second and fourth groups of these two figures. In Figs. 29 and 30, one should note the similarity between their fourth, seventh and ninth groups, between the sixth group of Fig. 29 and the fifth group of Fig. 30, and between the tenth group of Fig. 29 and the third group of Fig. 30. From this one may conclude that a change in the particular parameters of the prototype image in the search algorithm does not strongly affect the formed vocabularies of features.

Figures 31 and 32 show the distinctive features formed on the basis of the set of characteristic fragments obtained on the most complex images, the fourth set of 5's and 3's. The fragments were chosen with the aid of the algorithm of search with a prototype image (Fig. 11a) and a moving window of dimension 6 × 6 squares. The local minima of the informativity function were selected. Despite the fact that the characteristic fragments of these figures are quite different from those shown in Figs. 27-30, it is easy to note the distinct similarity between the following groups: the first group of Fig. 31 and the fourth group of Fig. 28; the third group of Fig. 31 and the second group of Fig. 28; the fifth group of Fig. 31 and the fifth group of Fig. 28; the fourth group of Fig. 32 and the ninth group of Fig. 29; the second group of Fig. 32 and the fifth group of Fig. 29; the third group of Fig. 32 and the sixth group of Fig. 29; etc. It may also be noted that a substantial increase in the complication of the original data as well as a substantial change in the dimensions of the images (1.5-2 times) have a small

influence on the substantial character of the features obtained. As before, the groups of fragments may easily be divided according to the basic form of the configurations: "small corners," "lines," "terminated lines," etc. Figure 33 shows examples of the vocabulary of features of the form for characteristic fragments of dimension 12 × 12 squares found on the digits of the first set. As is seen, the change from small to large fragments led to a change in the vocabulary. New forms of fragments appeared, which may be called "arcs" or "S-shaped lines." But the majority of the "words" in the vocabulary coincide with the elements of the vocabulary whose fragments were given in Figs. 27-32.

Letters. In this experiment the digits 3 and 5 and the letters a, c, p, w, u (as shown in Fig. 9) were also used as images on which the characteristic fragments were chosen. Using the program based on the algorithm of search with the prototype image, 15 characteristic fragments were chosen from each of the images. (Some characteristic fragments of the images, therefore, appeared in the data more than once.) The resulting features are shown in Fig. 34. It is seen that a substantial increase in the variety of forms of the initial images has only a small influence on the complexity of the features: they are still "small corners" and "lines". The difference consists in the appearance of a new feature (group 7 in Fig. 34) which may apparently be called "connecting lines" and which is quite characteristic of handwritten letters. As before, blanks (in the fourth, eighth and ninth groups) mean that these groups consist of fragments which include not more than two black squares.

Conventional outsized cartographical symbols. (See Reference (12) for more details.) In this experiment the vocabulary of form was formed on the basis of the set of characteristic fragments of dimension 12 x 12 found on ten images of the symbol "airplane" and eleven images of the symbol "isolated tree" of Fig. 20. The features formed are shown in Fig. 35. In this experiment the dependence of the elements of the formed vocabulary on the original data is especially evident. The features obtained differ strongly from those obtained on the digits and the letters. Moreover, it is seen that in this case too the characteristic fragments are divided into groups according to similarity. Within a group, the fragments are simple variants of a given configuration of blackness. The fragments of different groups are quite easily seen to differ in form.

FIG. 25.

FIG. 26.

(Group labels surviving from Figs. 25 and 26: "vertical line", "horizontal line", "ur corner", "sharp ur corner", "lr corner", "dl corner", "ul corner", "sharp middle corner", "R-to-l diagonal", "R-end of the horizontal line", "L-end of a horizontal line".)

FIG. 27.

FIG. 28.

FIG. 29.

FIG. 30.

FIG. 31.

FIG. 32.

FIG. 33.

FIG. 34.

FIG. 35.

6.3. The experiments for automatic formation of the vocabulary of the locations

The goal of the experiments described in this section is to check the "stability" of the vectors p (introduced in Section 3), which give the location of the characteristic fragments on the image, and to study the possibility of forming, from the set of such vectors, a vocabulary in terms of which one could divide similar images into groups in a standard way. Figure 36 gives a qualitative view of the relative location of the minima and maxima of the informativity function on the images of the first set of digits. (Black circles show the locations of the minima and crosses those of the maxima.) Schemes of the distribution of all fragments relative to a single fragment on the field of the image are shown in Fig. 37 in the form of polygons (only those fragments are included which correspond to local minima of the informativity function). These schemes are formed in the following way. From the center of a given fragment we draw rays toward each of the other fragments of the image. On these rays we mark off segments proportional to the distance between the center of the given fragment and the center of the corresponding other fragment. Then the ends of these segments are connected. The black circles in the figure correspond to the centers of the fragments. Figure 37 shows that the schemes of fragments of different images are similar when the fragments are located at the same "places" of the chosen images.
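The construction of the polygon schemes of Fig. 37 can be illustrated by a minimal sketch. It assumes that fragment centers are given as (x, y) pairs and that the polygon vertices are joined in order of ray angle; the function name `location_polygon` and the parameter `scale` (the proportionality factor) are hypothetical and do not come from the paper.

```python
import math

def location_polygon(center, others, scale=1.0):
    """Vertices of the polygon scheme of one fragment, ordered by ray angle.

    center -- (x, y) of the given fragment
    others -- centers of the other fragments of the same image
    scale  -- assumed proportionality factor between segment length and distance
    """
    cx, cy = center
    vertices = []
    for x, y in others:
        dx, dy = x - cx, y - cy
        if dx == 0 and dy == 0:
            continue  # a coincident center defines no ray
        angle = math.atan2(dy, dx)  # direction of the ray toward the other fragment
        # End of the segment laid off on the ray, length = scale * distance.
        vertices.append((angle, (cx + scale * dx, cy + scale * dy)))
    vertices.sort(key=lambda item: item[0])  # connect the ends in angular order
    return [point for _, point in vertices]

polygon = location_polygon((0, 0), [(1, 0), (0, 1), (-1, 0), (0, -1)], scale=0.5)
```

Two fragments lying at the same "place" of two images should then give polygons of similar shape, which is the property the text reads off Fig. 37.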


The stability of the introduced characteristic of the location of fragments under variation of the handwriting is shown very well in Figs. 38 and 39. Figure 38 shows pairs of images in which the "relatively corresponding" fragments are connected. Here "relative correspondence" is found in the following way. For each fragment of one image, one finds the nearest fragment of the other image, in the sense of the Euclidean distance between the vectors p of the fragments compared. Then, for each fragment of the second image, one finds the nearest among the fragments of the first image. In Fig. 38 lines connect only those fragments that are nearest to each other in both directions. It is seen that there are quite a sufficient number of them and that the correspondence obtained is completely natural from the human point of view. Let us note that the question of establishing a correspondence between the characteristic fragments of different images probably has an independent interest. In particular, it is one of the central problems in the analysis of images as developed by V. C. FINE.(16) The method proposed here may be effective for solving the problems of correspondence between characteristic fragments which are important to those researchers.

In Fig. 39 we show how the images of the digits of the first set are divided into parts in correspondence with the automatic classification of the vectors p of their fragments into 3 and 6 groups. Cases "a" and "b" differ only in the way of constructing the vectors: in case a the separate components of the vector p show whether there exists in the corresponding part of the image at least one characteristic fragment; in case b the separate components of p are proportional to the number of characteristic fragments which exist in the corresponding part of the image (see Section 3). It is seen that in all cases the division of the various images into parts is quite similar.
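The "relative correspondence" used for Fig. 38 can be sketched as a mutual nearest-neighbour search over the vectors p. This is only an illustration under the assumption that each vector p is a plain sequence of numbers; the names `nearest_index` and `mutual_correspondence` are hypothetical.

```python
import math

def nearest_index(p, candidates):
    """Index of the candidate vector closest to p in Euclidean distance."""
    return min(range(len(candidates)), key=lambda j: math.dist(p, candidates[j]))

def mutual_correspondence(p_first, p_second):
    """Pairs (i, j) where fragment i of the first image and fragment j of the
    second image are each other's nearest neighbour."""
    pairs = []
    for i, p in enumerate(p_first):
        j = nearest_index(p, p_second)
        # Keep the pair only if the search from the second image leads back to i.
        if nearest_index(p_second[j], p_first) == i:
            pairs.append((i, j))
    return pairs

pairs = mutual_correspondence([(0, 0), (10, 0), (5, 5)], [(9, 1), (1, 0)])
```

In this toy example the third fragment of the first image stays unconnected, because its nearest fragment in the second image is itself closer to another fragment of the first image.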
The vectors p of the characteristic fragments of the images of the first set were constructed according to method a and were then divided into 4 and into 8 groups. The results of this classification are shown in Fig. 40 as systems of average prototypes of the groups. The method of obtaining them coincides exactly with the method of obtaining the prototypes of the forms of characteristic fragments described earlier. Figure 41 shows an example of specially modelled data constructed in the following way. The leftmost image of the digit "5" in this figure was chosen as a prototype. New images were obtained by independently varying the lengths of the segments of the prototype. Each segment could be shortened by 2 or 4 squares of the raster, enlarged by 4 squares, or left unchanged. In this way we obtained 1024 images from the prototype. For each fragment of each image the vector p of type "a" was found. The vectors p of the 6 fragments of the initial image were chosen as prototypes. They were divided into 3 types, as shown in Fig. 39a, and into 6 types (in this case each fragment determines its own type). Then, using the usual recognition program with the rule of assigning the class of the nearest prototype in the sense of Euclidean distance, we found the locations of all fragments of all constructed images. In the experiment, therefore, there were 6144 vectors p. In both experiments all vectors were recognized without any errors. The results of these experiments show that the introduced characteristics of location are stable under non-proportional changes of the images.
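The counts quoted for the modelled data can be checked by enumeration. The sketch below assumes that the prototype "5" has five independently varied segments; this number is not stated in the text and is inferred here from 4^5 = 1024. All names are illustrative.

```python
from itertools import product

# Allowed changes of one segment length, in raster squares:
# shortened by 4 or 2, left unchanged, or enlarged by 4.
LENGTH_CHANGES = (-4, -2, 0, 4)
NUM_SEGMENTS = 5          # assumption: inferred from 4**5 == 1024
FRAGMENTS_PER_IMAGE = 6   # the six fragments of the initial image

# Every combination of independent changes gives one modelled image.
variants = list(product(LENGTH_CHANGES, repeat=NUM_SEGMENTS))
num_images = len(variants)
num_vectors = num_images * FRAGMENTS_PER_IMAGE

print(num_images, num_vectors)
```

Under this assumption the all-zero combination, the unchanged prototype, is counted among the 1024 images, and six fragments per image give the 6144 vectors p mentioned in the text.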

FIG. 36.

FIG. 37.

FIG. 38.

FIG. 39.

FIG. 40.

FIG. 41.

6.4. Experiments to construct the logical descriptions of images and classes of images (pictures)

The experiments of this subsection are connected with the choice of certain additional operations of image analysis which were described in Section 4. In Section 4 it was shown how to construct descriptions of images in the form of logical functions whose variables are special constructions made up of characteristic fragments which stand in certain given relations. These constructions were called terms. Some of the terms chosen for the digit "5" of the first set are shown in Figs. 42 and 43. Figure 42 gives the terms found from fragments of dimension 6 x 6 squares of the raster which correspond to minima of the informativity function. Figure 43 gives the terms found from fragments of dimension 12 x 12 squares of the raster which correspond to maxima of the informativity function. It is seen that the system of terms of Fig. 42 does not duplicate the system of terms of Fig. 43; moreover, they complement each other. These terms may easily be described by phrases of human language of the type "the fragment of form a appears in the third sector of the diagram of directions (that is, upwards) of the fragment of form b." In Fig. 44, on the schematic diagram of the Fig. 5, the locations of some terms found with the aid of each of the introduced logical characteristics (LC) are shown. The terms found with the aid of LC "a" are shown in Fig. 44a; those found with the aid of LC "b" in Fig. 44b; those with the aid of LC "c" in Fig. 44c; and those with the aid of LC "d" in Fig. 44d.
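A term of the kind quoted above can be sketched as a sector lookup on the diagram of directions. The diagram itself is defined in an earlier section, so the layout used here is an assumption: eight 45-degree sectors, numbered counter-clockwise from 1, with sector 1 centered on the rightward direction and the y axis pointing up. This choice agrees with the only fact the text gives, that the third sector points upwards. The names `sector_of` and `term_phrase` are hypothetical.

```python
import math

NUM_SECTORS = 8  # assumption: the number of sectors is not given in this section

def sector_of(center_b, center_a, num_sectors=NUM_SECTORS):
    """Sector (1-based) of fragment a on the direction diagram of fragment b."""
    dx = center_a[0] - center_b[0]
    dy = center_a[1] - center_b[1]  # y axis assumed to point upwards
    angle = math.atan2(dy, dx) % (2 * math.pi)
    width = 2 * math.pi / num_sectors
    # Shift by half a sector so that sector 1 is centered on the rightward ray.
    return int(((angle + width / 2) % (2 * math.pi)) // width) + 1

def term_phrase(form_a, form_b, center_a, center_b):
    """Verbal form of a term, in the style of the phrase quoted in the text."""
    k = sector_of(center_b, center_a)
    return (f"fragment of form {form_a} appears in sector {k} of the diagram "
            f"of directions of the fragment of form {form_b}")

print(term_phrase("a", "b", center_a=(4, 9), center_b=(4, 2)))
```

With fragment a directly above fragment b the sketch reports sector 3, matching the example phrase; any other numbering convention would only change the lookup, not the idea of the term.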
Using terms of this type it was found possible to construct some short (2-5 terms) discriminant and descriptive descriptions of the classes of images, with the aid of which one may recognize the "5's" and "3's" of the first set. As was shown in Section 4, a logical description of a class of images is also introduced in a natural way when "form-location" arrays are used as the descriptions. Such arrays were constructed for the images of the digits of the first set. The prototypes corresponding to the elements of the vocabulary of "location" are shown in Fig. 40, and the prototypes corresponding to the elements of the vocabulary of "form" are shown in Fig. 45. The results of recognition achieved by describing the images of the first set are given in Tables a-d of Fig. 46. All the tables are constructed in the same manner. The number of a column of each table corresponds to the number of the image of the digit (Fig. 9), the images in Fig. 9 being counted from left to right. A "1" in the first row of the table means that the image belongs to the class of "3" and does not belong to the class of "5"; a "1" in the second row means that the image belongs to the class of "5" and does not belong to the class of "3"; a "1" in the third row means that the image does not belong to either of these two classes; finally, a "1" in the fourth row means that the image belongs to both classes. The tables differ in the variant of classification of the characteristic fragments according to "form" and "location" that is used. Table a corresponds to the classification of the fragments into four groups according to "form" and into four groups according to "location". In Table b one uses the results of the classification of the fragments into four groups according to "form" and into eight groups according to "location". Table c differs from Table a in that the fragments are classified according to "form" into eight groups instead of four. Table d differs from Table b in the same way. The tables show that the results of recognition are very good (there are errors only in Table a), despite the fact that the description of the class of "5" was constructed without using any information on the images of "3" and, conversely, the description of the class of "3" was constructed without using any information on the images of "5".
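The row convention of the tables of Fig. 46 can be written out as a small sketch. It assumes that the two class descriptions are evaluated independently and each returns a truth value for an image; the names `table_row` and `build_table` are hypothetical.

```python
def table_row(in_class_3, in_class_5):
    """Row (1-4) that receives the "1" for one image, per the convention above."""
    if in_class_3 and not in_class_5:
        return 1  # belongs to "3" only
    if in_class_5 and not in_class_3:
        return 2  # belongs to "5" only
    if not in_class_3 and not in_class_5:
        return 3  # belongs to neither class
    return 4      # belongs to both classes

def build_table(memberships):
    """Four rows of 0/1 entries, one column per image.

    memberships -- list of (in_class_3, in_class_5) pairs, one per image
    """
    table = [[0] * len(memberships) for _ in range(4)]
    for column, (in_3, in_5) in enumerate(memberships):
        table[table_row(in_3, in_5) - 1][column] = 1
    return table

table = build_table([(True, False), (False, True), (False, False), (True, True)])
```

Rows 3 and 4 can be non-empty precisely because each class description is built without reference to the other class, so an image may satisfy neither description or both.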

FIG. 42.

FIG. 43.

FIG. 44. (Panels a, b, c, d.)

FIG. 45.

FIG. 46. (Tables a, b, c, d.)

7. EXPERIMENTS TO CONSTRUCT A SYSTEM OF VISUAL PICTURES

Experiments to form systems of visual pictures have the goal of checking the proposed scheme of the learning machine as a whole. These experiments are therefore basic for the present paper. Figure 47 shows a flow chart of the main program, which realizes the basic model of the process of forming a language for the analysis of images. The flow chart shows the sequence of processing of a single image. From left to right, each block of the chart denotes one more step of the processing described in Section 1. Altogether, three series of experiments were performed. They differ in the choice of the original data, in the features of the algorithms used, and in their organization.

In the first series of experiments the proposed scheme of recognition was tested for "reliability". To form the vocabulary we used data from 9 images of "5" and "3" of the first set. As characteristic fragments we chose the segments of the images of dimension 6 × 6 squares which correspond to minima of the informativity function. The vocabulary of form includes 8 words. The prototypes corresponding to these words are shown in Fig. 45b. The vocabulary of locations includes 4 words. The prototypes corresponding to them are shown in Fig. 40a. It is essential that in forming the vocabulary of "form" one uses directly the points of the receptor space of the cutting window as the vectors f, and in forming the vocabulary of "location" one uses the vectors p found according to method a (see Section 3). That is, one uses the coarsest characteristics of the "form" and "location" of the fragments. As the description of an image we used a discrete "form-location" array, which therefore included 32 elements altogether. Using the simplest recognition algorithm, by the minimum of the Euclidean distance to the prototypes of the system shown in Fig. 45b and, correspondingly, of the system shown in Fig. 40a, we found for each new fragment to which element of the vocabulary of form its "form" (vector f) belongs and to which element of the vocabulary of location its "location" (vector p) belongs. Thus, for 186 images of the digits "5" and "3" of the second, third and fourth sets of Fig. 10, which did not take part in the process of forming the vocabulary of the original symbols, we obtained binary "form-location" arrays of 32 elements each. Examples of such arrays are shown in Fig. 48.
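The construction of a binary "form-location" array by the nearest-prototype rule can be sketched as follows. The sketch assumes that each fragment is given as a pair (f, p) of numeric vectors and that every fragment is assigned to its nearest prototype; whether the original program also rejected distant fragments is not stated. All names are hypothetical.

```python
import math

def nearest(vector, prototypes):
    """Index of the prototype with minimum Euclidean distance to vector."""
    return min(range(len(prototypes)), key=lambda k: math.dist(vector, prototypes[k]))

def form_location_array(fragments, form_prototypes, location_prototypes):
    """Binary array with one row per "form" word and one column per "location" word.

    fragments -- list of (f, p) pairs, the form and location vectors of a fragment
    """
    array = [[0] * len(location_prototypes) for _ in form_prototypes]
    for f, p in fragments:
        # Mark the cell given by the nearest form word and nearest location word.
        array[nearest(f, form_prototypes)][nearest(p, location_prototypes)] = 1
    return array

def flatten(array):
    """The array read row by row as a single vector of elementary statements."""
    return [cell for row in array for cell in row]

toy = form_location_array(
    [((1, 1), (0.9,)), ((9, 9), (0.1,))],
    form_prototypes=[(0, 0), (10, 10)],
    location_prototypes=[(0,), (1,)],
)
```

With the 8 form prototypes and 4 location prototypes of the first series, the flattened array has the 32 elements mentioned in the text.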
These arrays, viewed as vectors of a space of elementary statements, were divided into groups according to "similarity" with the aid of the algorithm of automatic classification. The results of the division into 10 groups are shown in Fig. 49. It is seen from Fig. 49 that, despite the very coarse methods of constructing the arrays, there is an obvious correlation between the classification of the arrays into groups and the division of the analyzed images into the classes of "5" and "3". The entire set of arrays practically divided itself not into 10 but into 5 groups. Groups 1-5 contain practically no elements. Each of the remaining groups consists basically of images of "5" or of "3" and, what is interesting in its own right, includes images similar in the manner of writing. In particular, all 32 images of "5" of the second set fell into the eighth group and all images of "3" of this set fell into the ninth group. This confirms the assumption that the "form-location" arrays for different pictures (classes of "close" images) form very strong clusters in the space of elementary statements.
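A comparison of groups with classes of the kind described for Fig. 49 amounts to a cross-tabulation. The sketch below assumes that the automatic classification has already produced a group number for every array; the classification algorithm itself is not reproduced, and the name `group_composition` and the sample data are illustrative only.

```python
from collections import Counter

def group_composition(group_ids, class_labels, num_groups):
    """For every class, the number of its images that fell into each group.

    group_ids    -- group number (1..num_groups) assigned to each array
    class_labels -- true class of each image, used only to read the result
    """
    counts = Counter(zip(class_labels, group_ids))
    return {
        label: [counts[(label, g)] for g in range(1, num_groups + 1)]
        for label in sorted(set(class_labels))
    }

# Illustrative data only, not the counts of Fig. 49.
composition = group_composition(
    group_ids=[8, 8, 9, 9, 8],
    class_labels=["5", "5", "3", "3", "3"],
    num_groups=10,
)
```

A group is "basically" of one class when one row dominates its column, which is the sense in which the text relates the groups to the classes of "5" and "3".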


In the second series of experiments the process of forming the pictures was based on the same data that served as the base for forming the vocabulary of the primary features. In addition, the arrays used to form the pictures were constructed in more detail. The experiment was done in the following way. Thirty images (Figs. 50 and 51) were given to the learning machine. The machine was not told which images belong to which class. On this data the machine was to form the vocabulary of the primary features, construct with its aid the array descriptions of the images, and then classify these arrays. Such a scheme of the experiment is in itself a model of the process of forming pictures. As is seen in Figs. 50 and 51, the data used was comparatively simple and, as in the first series of experiments, consisted of images of the digits "5" and "3". The images were given on a field of 32 x 27. The cutting window was chosen with the dimension 9 x 9. Characteristic fragments were chosen corresponding to local minima of the informativity function. The selection of the characteristic fragments was performed using the algorithm of search with a prototype image. The vectors f required to form the vocabulary of "form" were constructed without any simplifications, according to the algorithm described in Section 3. The vectors p required to form the vocabulary of "location" were constructed according to type b (see Section 3); that is, they too were more accurate than in the first experiment. The vocabulary of "form" formed by the machine consists of 8 elements, and the vocabulary of "location" of 5 elements. As the description of the images we chose binary "form-location" arrays of 40 elements each. The entire set of descriptions obtained is shown in Fig. 52 (the descriptions of the "5", in correspondence with Fig. 50) and in Fig. 53 (the descriptions of the "3", in correspondence with Fig. 51). Each array of the extracted characteristic fragments corresponds to one image. The left digit of each fragment corresponds to the index of its form and the right digit to the index of its location. To make this more evident, Figs. 54 and 55 show the descriptions of two images with the features of form and location characterized in words. The classification of the obtained arrays was performed into four groups (Fig. 56). It showed, as was to be expected, that for the real data shown there are practically two closed groups of images, the group of "5" and the group of "3".

The third series of experiments differed from the second primarily in that hand-drawn conventional outsized symbols "airplane" and "isolated tree", analogous to those shown in Fig. 20, were chosen as the original data. Altogether 76 symbols were used: 40 images of "isolated tree" and 36 of "airplane". Each symbol was given in a field of 54 x 54 squares. The process of forming the vocabulary of the primary features in this experiment coincides with the corresponding process in the second experiment, with the following inconsequential difference: in this case the vocabulary of form included 10 elements and the vocabulary of location 6 elements. Thus, the "form-location" array had 60 elements. For each of the analyzed images a discrete and a continuous "form-location" array were constructed exactly as described in Section 4. The classification of the discrete and of the continuous arrays was performed separately, but in both cases the set of arrays divided itself into two groups. In the classification of the discrete arrays only two "errors" occurred: the description of one of the "airplanes" was found in the group of images of "isolated trees" and, conversely, one image of "isolated tree" fell into the group of images of "airplanes". In the classification based on the continuous "form-location" arrays there were no errors at all: one group was formed by "airplanes" and the second by "isolated trees". From this experiment one draws the following two conclusions. The first is that an increase in the statistics of the original data improves the quality of the process of forming the pictures. The second is that the continuous arrays, it seems, are better than the discrete ones.

FIG. 47. (Flow chart; block labels surviving from the extraction: "the i-th picture", "extraction of characteristic fragments", "coordinates of extrema", "indices", "i := i + 1".)

FIG. 48.

FIG. 49. (Table: rows "5" and "3", columns for groups 1-10.)

FIG. 54.

FIG. 55.

FIG. 56.

REFERENCES
1. M. A. AISERMAN, E. M. BRAVERMAN and L. I. ROZONOER, Theoretical basis of the method of potential functions in the problem of teaching automata to separate input situations into classes, Automation and Remote Control 25 (6), (1964).
2. O. A. BASHKIROV, E. M. BRAVERMAN and I. B. MUCHNIK, Algorithms for teaching machines to recognize pictures based on the usage of potential functions, Automation and Remote Control 25 (5), (1964).
3. V. L. BRAILOVSKY, On a method for the recognition of plants described by several parameters, and its applications, Automation and Remote Control 23 (2), (1962).
4. V. A. KOVALEVSKY, The correlation method of image recognition, Zh. Vychisl. Matem. Fiziki 2 (4), (1962).
5. R. A. NASHLYUNAS, Analytical estimate of the reliability of the correlation method of image recognition, Automation and Computers. Mintis, Vilnius (1965).
6. M. I. SHLESINGER and S. SVYATOGOR, On the construction of prototypes for correlation reading automata, Proceedings of the Third All-Soviet Conf. on Information Searching Systems and Automatic Processing of Scientific-Technical Information (1967).
7. M. M. BONGARD, The Recognition Problem, Nauka, Moscow (1967).
8. A. C. VASMUT, I. B. MUCHNIK, A. B. NIKOLAYEV and B. M. CHL~mCHOVlCH, Automation of the process of reading cartographical information, Geodesy and Cartography, No. 6 (1970).
9. N. B. SAVALISHIN, A hypothesis on the distribution of points of fixation during observation of images, Automation and Remote Control No. 12, (1968).
10. N. B. SAVALISHIN and I. B. MUCHNIK, Linguistic (structural) approach to the pattern recognition problem (Review), Automation and Remote Control No. 8 (1969).
11. V. A. KOVALEVSKY, A sequential optimization in problems of recognition and description of images, compendium Recognition of Pictures and Construction of Reading Automata, No. 2, Kiev (1967).
12. I. B. MUCHNIK, Local characteristic formation algorithms for visual patterns, Automation and Remote Control No. 10 (1966).
13. I. B. MUCHNIK, Forming a language for describing visual patterns, Proc. on Automatic Analysis of Complicated Patterns. Miz Publ. Co. (1969).
14. V. P. ROMANOV, Methods of structural analysis of patterns in problems of recognition of visual pictures, Proc. on "Recognition of Pictures and Construction of Reading Automata", No. 2, Kiev (1967).
15. A. A. SAVIN, On the structural description of images, Information Retrieval Systems and Automated Processing of Scientific and Technical Information. VINITI (1967).
16. V. S. FAIN, Recognition of Images, Nauka (1970).
17. A. A. FELDBAUM, On some principles of recognizing pictures, Proc. Self-Learning Automatic Systems, Nauka Publ. Co. (1966).
18. M. EDEN, Handwriting and pattern recognition, IRE Trans. Inform. Theory IT-8 (2) (1962).
19. B. H. McCORMICK and R. NARASIMHAN, Design of a pattern recognition digital computer with application to the automatic scanning of bubble chamber negatives, Nucl. Instruments and Methods 20 (1963).
20. R. NARASIMHAN, Syntax-directed interpretation of classes of pictures, Commun. ACM 9 (3) (1966).
21. S. WATANABE (ed.), Methodologies of Pattern Recognition, The Proc. of the International Conference on Methodologies of Pattern Recognition, held at Honolulu, Hawaii, 14-16 January 1968. Academic Press, New York, London (1969).
22. E. M. BRAVERMAN, The potential function method in the problem of unsupervised machine pattern recognition learning, Automation and Remote Control No. 10 (1966).
23. A. A. DOROFEYUK, Algorithms for teaching pattern recognition without a teacher, Automation and Remote Control No. 6 (1966).
24. V. D. GLESEZ and I. I. CUKERMAN, Information and Vision, M.L. (1961).