0031-3203/87 $3.00 + .00 Pergamon Journals Ltd. Pattern Recognition Society
Pattern Recognition, Vol. 20, No. 2, pp. 213-227, 1987. Printed in Great Britain.
ADDRESS LOCATION ON ENVELOPES*

PEN-SHU YEH, SERGIO ANTOY, ANNE LITCHER and AZRIEL ROSENFELD†

Center for Automation Research, University of Maryland, College Park, MD 20742, U.S.A.

(Received 8 January 1986; in revised form 27 March 1986)
Abstract--In automatic postal address reading systems, the inability of the system to locate the address block correctly contributes significantly to the error rate. Envelopes often contain much extraneous printed information in addition to the address. The ideal approach to address location would be to read all this information and identify the address field by its semantic content, but it would be computationally expensive to implement a full-envelope character reading capability. This paper describes an alternative approach, which segments the envelope into regions of consistent print style and chooses a region most likely to be the address on grounds of position, size, print style, etc., without attempting to read the characters.

Key Words: Postal address reading; Address location
* The research described in this paper was sponsored by the U.S. Postal Service under Contract 104230-84-D-069.
† To whom correspondence should be addressed.

1. INTRODUCTION

The U.S. Postal Service processes nearly 100 billion letter-size mailpieces each year. To provide speedy service and reduce the cost of mail sorting, it has installed 252 Optical Character Readers (OCR's) at various locations since 1980. About 60% of the metered mail sent through these OCR's is read correctly, while the other 40% is rejected due to problems such as hand-printed address, partially obscured address, extraneous printing (advertising messages, etc.), colored background, varying type fonts, etc.

Since hand-addressed mail represents only about 15% of all letter mail, a significant fraction of unreadable mail has machine-printed addresses. It has been recognized that in the case of machine-printed addresses, the inability of the OCR to locate the address block correctly contributes significantly to the high reject rate.(1)

Currently, the problem of address block location is partially dealt with by locating an area of high edge density on the envelope surface,(2) or by optically detecting an area of different reflectivity than the background.(3) Character recognition procedures are then confined to this area only. Another approach limits the scanning of the envelope surface to the last few dark lines, which presumably contain all or part of the address. It is fairly easy to collect mail samples with addresses located in such a way that none of the above approaches would succeed.(4)

The ideal approach to mail sorting would be to read all the printing on the envelope and thus to identify the address field by its semantic content. But it does not appear to be practical to implement a full character reading system over the entire envelope in real time. Instead, our approach was to segment the envelope surface into regions of consistent print style and to choose a region most likely to be the address on grounds of geometrical information, print style, etc., without attempting to recognize the characters.

This approach is strongly supported by an experiment in which an envelope is digitized, reduced to a binary image, and every dark component (whether character or not) is replaced by its bounding rectangle with sides parallel to the edges of the envelope. The resulting image, now totally unreadable, is presented to a set of subjects. All the subjects are able to correctly identify the address block, the stamp, and the return address. Even though the experiment was not designed to include a statistically meaningful sample of mail, the result indicates how knowledge about patterns on the envelopes can aid in mail sorting without the need for recognizing characters.

An outline of our approach is as follows: the image is segmented into connected light and dark regions separated by zero-crossings of a digital Laplacian operator. Properties of the dark connected components are computed and tabulated. These properties are then used to cluster the components into groups which should represent regions of uniform print style. Finally, an inference scheme is used to select the group that is most likely to be the address. The steps in the approach are described in detail in the following sections of this paper, and examples of its application to digitized envelope images are given. Further details and a larger set of examples can be found in Refs (5) and (6).
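The bounding-rectangle experiment above is easy to reproduce on a small binary image. The following Python sketch, offered only as an illustration, labels 8-connected dark components with a breadth-first search (8-connectivity matches the labelling used in Section 2) and replaces each component by its axis-parallel bounding rectangle. The function names, the list-of-rows image format (1 = dark) and the choice to draw the rectangles filled are assumptions of this sketch, not details given in the paper.

```python
from collections import deque


def label_components(img):
    """Label 8-connected sets of 1-pixels; returns {label: [(row, col), ...]}."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = {}
    next_label = 1
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over the 8 neighbours of each pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components[next_label] = pixels
            next_label += 1
    return components


def bounding_rectangles(img):
    """(top, left, bottom, right) of every component, in labelling order."""
    rects = []
    for pixels in label_components(img).values():
        rows = [y for y, _ in pixels]
        cols = [x for _, x in pixels]
        rects.append((min(rows), min(cols), max(rows), max(cols)))
    return rects


def box_image(img):
    """Replace every component by its filled, axis-parallel bounding rectangle."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for top, left, bottom, right in bounding_rectangles(img):
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                out[y][x] = 1
    return out
```

A real envelope would first need the binarization described in Section 2; the sketch starts from an image that is already binary.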
Fig. 1. The original of an envelope.
2. PREPROCESSING
The data used in our experiments consisted of digitized portions of envelope images. The resolution of these digital images was 200 pixels/in., a resolution commonly used in OCR systems. An example of an input image is shown in Fig. 1. A simple grey-level thresholding operator is inadequate for extracting printed components from the envelope image because of non-uniform illumination and colored backgrounds. It is more effective to detect changes in the rate of change of grey level instead. This is accomplished by first applying to the image f(x, y) a digital Laplacian operator defined as
$$\nabla^2 f(x, y) = [f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1)] - 4f(x, y). \tag{1}$$
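Equation (1) can be applied directly to a small image. The Python sketch below assumes the image is a list of rows indexed as f[y][x] and leaves border pixels at 0, a boundary treatment the paper does not specify. It also includes the thresholding at zero that is described later in this section (1 for a positive response, 0 otherwise). With the usual convention that low grey levels are dark, a dark spot on a light background gives a positive response at its centre.

```python
def laplacian(f):
    """Digital Laplacian of equation (1) on a grey-level image f (a list of rows).

    f[y][x] is the grey level at column x, row y. Border pixels, where a
    4-neighbour is missing, are simply left at 0 (an assumption of this sketch).
    """
    h, w = len(f), len(f[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neighbours = f[y][x + 1] + f[y][x - 1] + f[y + 1][x] + f[y - 1][x]
            out[y][x] = neighbours - 4 * f[y][x]
    return out


def zero_crossing_threshold(lap):
    """Binary picture: 1 where the response is positive, 0 where it is non-positive."""
    return [[1 if v > 0 else 0 for v in row] for row in lap]
```

On a linear ramp such as f = 2x + 3y the interior response is exactly zero, which is the ramp property stated in the text that follows the equation.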
Equation (1) is the digital analog of $\partial^2 f/\partial x^2 + \partial^2 f/\partial y^2$. It is a second-difference operator: it has zero response to linear ramps and to constant grey levels, but it does respond to the shoulders at the top and bottom of a ramp. A direct implementation of equation (1) also responds strongly to corners, lines, line ends and isolated points. The effect of noise on the response of a difference operator can be reduced by smoothing the image before applying the operator.

Fig. 2. The result of a Laplacian operator followed by connected component labelling; each labelled component is displayed with a false color.

In particular, an operator can be used that computes differences of local averages. For our problem, good results have been obtained by changing equation (1) into
$$\nabla^2 f(x, y) = \frac{1}{(2r+1)^2}\sum_{n=-r}^{r}\sum_{m=-r}^{r} f(x+n, y+m) - \frac{1}{(2s+1)^2}\sum_{n=-s}^{s}\sum_{m=-s}^{s} f(x+n, y+m) \tag{2}$$

where each double summation term computes the average grey level of a square neighborhood of pixel (x, y). Suitable values for the outer (r) and inner (s) radii of the squares are, in our case, 2 and 1 respectively. The result of this operation is illustrated in Fig. 2.

In general, the zero-crossings of the Laplacian operator correspond to points where transitions between two different grey-level components occur. To delineate the boundary between two adjacent components based on the zero-crossing idea, we apply a threshold operation to the picture resulting from the Laplacian operator. The output is a binary picture where 1's represent pixels with positive second derivatives and 0's represent pixels with non-positive ones. After thresholding, a connected component labelling algorithm is applied to give a unique label to each 8-connected set of pixels. This operation typically yields over a thousand components for a picture such as that in Fig. 1. Many undesirable components exist in the background as the result of non-uniform illumination or reflectivity, and some are superimposed on character strokes because of varying ink density within a stroke. To eliminate these components, a merging procedure is executed wherever two adjacent components display low contrast with each other. A contrast histogram is constructed using the
contrast value between every adjacent pair of components. The first peak of the histogram corresponds to low-contrast components within the background or within character strokes. Merging is performed for those pairs of adjacent components whose contrast is below the first valley in the histogram. The amount of cleaning achieved in this way is clearly seen in Fig. 3: the total number of components is reduced to just over 100 in this example. The method of low-contrast component elimination used here illustrates an important advantage of detecting edges using zero-crossings of the Laplacian rather than a gradient operator.(7) The zero-crossing approach yields complete closed contours of components, so a reasonable estimate of the contrast of a component can be obtained by averaging around its contour. Gradient approaches, on the other hand, do not always yield connected contours, and low-contrast edges must be eliminated by thresholding the gradient magnitudes on a pixel-by-pixel basis, which often leads to errors due to noise.

3. FEATURE EXTRACTION

For each component resulting from the previous processing, a set of features is computed. These include the position, dimensions, average grey level, thickness, and darkness vote of the component. Most of these are self-explanatory. The average component thickness (for characters, it corresponds to stroke thickness) is approximated by the ratio of the area to half of the perimeter. The darkness vote represents the likelihood that the component is darker than the background. It is the result of accumulating indices assigned to a component by comparing its average grey level with those of its neighbors: +1 is assigned when it is darker than a neighbor, while 0 or -1 is assigned if it is equally dark or less dark than a neighbor, respectively. By thresholding the voting result at 0, we retain only those components whose darkness votes are positive. The resultant picture is presented in binary form in Fig. 4, which clearly demonstrates the ability of this operation to extract dark characters from an envelope image.

Fig. 3. The result of contrast computation and merging displayed in false color.

Fig. 4. Binary picture resulting from darkness voting.

The features described above are stored in a so-called pattern matrix. For purposes of illustration, a toy example containing only five characters was
processed. The processing results at each stage are shown in Fig. 5. The resulting pattern matrix is shown in Fig. 6. By thresholding the darkness vote, we are able to disregard backgrounds and holes in characters (such as the holes in O, B, P, etc.). Another important feature, not used in the present implementation, is the color information pertaining to each component. Most advertising messages are printed in colored ink, whereas most addresses are printed in black. It is evident that color can serve as an important factor for determining whether a block is likely to be an address, provided this information can be acquired reliably.

Fig. 5. (a) Toy example: the original (left) and the result of the Laplacian operator (right).

Fig. 5. (b) Toy example: labelled components in false colors (left), contrast computation and merging (middle), binary picture from darkness vote (right).

4. DATA STRUCTURES

The next stage in our approach is to analyze relations among the components. We have found two types of data structures to be useful in carrying out this analysis: the inclusion tree and the proximity tree.

4.1. Inclusion tree

The inclusion tree is obtained from the component adjacency information by first defining the envelope border as the top node in the tree. All components touching (or adjacent to) this node are designated as its sons. Similarly, all components touching a given son (but not the top node) are designated as sons of that son; and so on. The resulting tree is shown in Fig. 7 for the toy example. Since most of the components are characters having at most two interior components, the level in the tree at
which grouping of components occurs can be easily identified as a level consisting of nodes that have at most two sons. Groups of components on an envelope may correspond to the address, the stamp, or an advertising message. Often the address is typed on a label, or on the inside letter showing through a cellophane window on the envelope. The edge of the label or window usually creates a large component in the image, which encloses a large background region containing many characters. This component corresponds to a node having many branches in the inclusion tree. The information carried in the tree thus facilitates the grouping of components into meaningful blocks.

Fig. 5. (c) Binary form of the toy example.

Fig. 5. (d) Correspondence between component labels and connected components of the toy example.

Component label   Component description
1                 background of the image
2                 first (upper leftmost) digit 1
3                 second (upper middle) digit 1
4                 digit 5
5                 digit 2
6                 digit 0
7                 fragment touching the border
8                 inside of digit 0

Fig. 6. The pattern matrix for the toy example in Fig. 5.

Component  Origin     Dimensions  Avg. gray  Stroke thickness  Area   Perimeter  Dark vote
1          (0, 0)     (160, 80)   37         32                11886  536         0
2          (18, 6)    (22, 19)    15         4                 111    80          1
3          (19, 45)   (22, 12)    17         4                 121    80          1
4          (20, 61)   (21, 17)    17         4                 195    138         1
5          (118, 38)  (90, 18)    10         4                 201    150         1
6          (118, 58)  (21, 17)    17         4                 175    120         2
7          (118, 77)  (6, 3)      26         2                 14     12          0
8          (112, 62)  (12, 9)     34         5                 97     44         -1

4.2. Proximity tree

Another important geometrical relation which needs to be considered is the closeness among components. Characters carrying a given type of information (for instance, the address) tend to be placed away from those conveying other types of information (for example, advertisements). This relation can be represented by a distance matrix which tabulates the distance between each pair of components. A more systematic method of representation is a data structure similar to the dendrogram, which has been widely adopted in hierarchical clustering techniques.(8) We call this structure the proximity tree. It is shown in Fig. 8 for the toy example. The tree is constructed by first picking the two
closest components from the distance matrix, in this example C5 and C6. They are linked by the distance D(5, 6) = 3. Another pair of closest components, C3 and C4, is now chosen and linked by their distance D(3, 4) = 5. Referring back to the distance matrix, we find D(2, 3) = 28 and D(2, 4) = 44. Since C3 and C4 are already linked and viewed as a group, the distance between C2 and (C3, C4) is defined as the minimum of D(C2, C3) and D(C2, C4). A tree is thus constructed that represents the geometrical closeness of all the components. Grouping of components based on the proximity tree involves a cutting operation on the tree. For the toy example, cutting the tree at the top node results in two groups, one consisting of C2, C3, C4 and the other of C5, C6. If cutting is performed at a distance of 28, then there will be three groups, namely C2 as the first group, C3, C4 as the second, and C5, C6 as the third. Essentially, cutting at different distances corresponds to the grouping of characters first into words, then into lines, and then into blocks. The choice as to where to cut is not a simple matter. The decision cannot be made before considering the whole set of node distances in the proximity tree and other constraints imposed by the inclusion tree and the pattern matrix. It should be noted that the proximity tree can be generated from features other than the geometrical distances between components. Using color differences, for example, a cutting of the proximity tree will give clusters of components having similar colors. Likewise, proximity trees built on the component size or component thickness will cluster components into size groups and thickness groups.

Fig. 8. The distance matrix and the proximity tree structure for the toy example.
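The construction and cutting of the proximity tree in Section 4.2 amount to single-linkage agglomerative clustering. A minimal Python sketch follows. The first four distances are those quoted for the toy example; the distances between {C2, C3, C4} and {C5, C6} are not given in the text, so the values used here are placeholders chosen only to make the example complete. Cutting at a given length is read here as removing every link of that length or longer, which reproduces the three groups the text obtains when cutting at 28.

```python
from itertools import combinations


def single_linkage(labels, dist):
    """Build a proximity tree by single-linkage merging.

    dist maps frozenset({a, b}) to the distance between components a and b.
    Returns the merges as (group_a, group_b, link_length), shortest link first.
    """
    def link(a, b):
        # Distance between two groups = distance of their closest members.
        return min(dist[frozenset((u, v))] for u in a for v in b)

    groups = [frozenset([c]) for c in labels]
    merges = []
    while len(groups) > 1:
        a, b = min(combinations(groups, 2), key=lambda pair: link(*pair))
        merges.append((a, b, link(a, b)))
        groups.remove(a)
        groups.remove(b)
        groups.append(a | b)
    return merges


def cut(labels, merges, length):
    """Groups left after removing every link whose length is >= `length`."""
    groups = [frozenset([c]) for c in labels]
    for a, b, link_length in merges:  # single-linkage merges come in non-decreasing order
        if link_length < length:
            groups.remove(a)
            groups.remove(b)
            groups.append(a | b)
    return sorted(sorted(g) for g in groups)


# Toy distances: the first four pairs are the values quoted in the text;
# the distances between {C2, C3, C4} and {C5, C6} are made-up placeholders.
TOY_DIST = {frozenset(pair): d for pair, d in {
    (5, 6): 3, (3, 4): 5, (2, 3): 28, (2, 4): 44,
    (2, 5): 110, (2, 6): 130, (3, 5): 100, (3, 6): 105, (4, 5): 98, (4, 6): 102,
}.items()}
```

With these numbers the links are formed at lengths 3, 5, 28 and 98; cutting at 28 yields {C2}, {C3, C4}, {C5, C6}, and cutting at the top link yields {C2, C3, C4}, {C5, C6}.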
Fig. 7. The inclusion tree structure for the toy example.

5. BLOCK SIZE AND ORIENTATION
The proximity tree can usually be used to correctly segment the components into lines and blocks of characters. The sizes and orientations of these blocks
then provide useful information about their probable nature. A complete address usually consists of more than three lines, whereas most commercial messages are organized into one or two lines. The number of lines in a block is suggestive of the type of message and can be used as a piece of evidence for decision making. A method has been devised for detecting the orientation of sets of components that form nearly parallel lines in a block. It uses the proximity tree structure already established and detects the line orientation by finding peaks in the histogram of nearest-neighbor directions. An example is given in Fig. 9 for an envelope image taken at low resolution, processed and segmented into three blocks, two of which would both be candidates for the address field on the basis of position information alone. Here the line orientation, and consequently the number of lines in each block, provides the next level of evidence for resolving the ambiguity. The method was tested on 13 different blocks from 12 envelopes; the deviation of the calculated line orientation angle from the visually determined orientation angle was less than 5°.

When blocks of components are formed by cutting the proximity tree, it often happens that the zip code is segmented into a different block from the main body of the address because it is printed a relatively large distance away. This situation can be remedied by merging the two blocks if the zip code block is found to be on the right side of the address block and the code is collinear with some line in the address block. The collinearity information within a block can be found by the method described above.

Fig. 9. (a) Binary image, (b) pairs of nearest components, (c) direction histogram of left block in (a) and (d) direction histogram of right block in (a).

6. INFERENCE PROCEDURE: ALTERNATIVES
In a knowledge-based vision system, we can expect image processing, feature extraction and scene labelling (or reasoning) to interact with each other to produce a coherent result. In the domain of mailpiece processing for locating the address block, we are more concerned with region properties than with the fidelity of the components extracted at the lowest level of the task, i.e. the image processing phase; hence, it suffices to limit the interaction to the feature extraction and representation levels. Lower-level image processing will be affected by the inference procedure only when missing character strokes or touching components create difficulty, as could happen in character recognition. We have considered several alternative approaches to the design of the inference portion of our system. The advantages and drawbacks of each approach are summarized below.

6.1. Bayesian classification

The likelihood of a block being either ADDRESS, STAMP, RETURN ADDRESS or COMMERCIAL MESSAGE depends on all the features this block possesses, such as number of lines, number of characters, color, character stroke thickness, block position, size, etc. In Bayesian classification, a priori information on these block properties is stored as conditional probabilities. A score for a block is easily computed from these probabilities on the assumption that all features are independent. A Bayesian system usually has high performance once reasonable statistics have been collected. In our case, the collection of the conditional probabilities is impractical. Another drawback of this approach is that relational information such as "the zip code is to the right of the address block" or "Line A is collinear with Line B" cannot readily be represented by a number. The implementation of Bayesian classification methods also relies on first being able to segment all dark components into blocks. Not much can be done if an incorrect or meaningless segmentation results because of an irregular or unexpected layout of the address, or because of a logo printed on the envelope surface. In other words, there will be no interaction between the segmentation and inference procedures.

6.2. Rule-based system

This approach is more flexible, since rules need not be applied to blocks directly. Instead there can exist rules for lines, for character groupings, etc. Examples of rules are the following.

R1. IF the size of a component exceeds a certain range, THEN the component is not a character.
R2. IF the components in a block are not characters, THEN the block is a commercial message.
R3. IF the components of a block are not black, THEN the block is not an address.
R4. IF the components of a block have more than one background, THEN the block must be further split.

A problem with this approach is the lack of certainty. Thus weights or levels of confidence must be set for antecedents and consequents of the rules. The determination of these coefficients can be as hard as determining the probabilities of the Bayesian system, and less reliable. The implementation of a production system with many rules, such as those listed in the above examples, is very difficult without formally defining the level (component or block) at which to apply each set of rules. In general, a control strategy is lacking, and inconsistent results could be obtained by firing rules in different orders or when inconsistencies exist in the rule set.

6.3. Hierarchical ambiguity resolution

In this method, levels of region properties are ordered according to their significance in resolving ambiguous labels of a block. A score is associated with each level, and a level is invoked only when there exist competing labels. This score is found by weighting each block's features based on knowing the feature's likelihood to arise from ADDRESS, STAMP, etc. Associating each level with a score also enables us to judge whether the previous segmentation has been executed properly or whether further refinement is necessary. The only drawback is that this method is sequential; it does not take all the features into account together in reaching a conclusion. A method of this type was used in our implementation.
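The control flow of hierarchical ambiguity resolution (Section 6.3) can be sketched as follows. This is a hypothetical illustration, not the implementation used in the paper: each block carries, for every level, a table of likelihoods for the four labels; the levels are consulted in order of significance; and a further level is invoked only while the two best accumulated scores are within a margin of each other. The level names, likelihood values and the 0.1 margin are all made-up assumptions.

```python
LABELS = ("ADDRESS", "STAMP", "RETURN ADDRESS", "COMMERCIAL MESSAGE")


def resolve_label(block, levels, margin=0.1):
    """Hierarchical ambiguity resolution (illustrative sketch).

    block maps a level name to {label: likelihood} for that block;
    levels lists the level names from most to least significant.
    Returns the winning label and the levels that had to be consulted.
    """
    scores = dict.fromkeys(LABELS, 0.0)
    consulted = []
    for level in levels:
        # Weight this level's evidence into the running score of every label.
        for label, likelihood in block.get(level, {}).items():
            scores[label] += likelihood
        consulted.append(level)
        best, runner_up = sorted(scores.values(), reverse=True)[:2]
        if best - runner_up > margin:  # no competing labels left: stop here
            break
    return max(scores, key=scores.get), consulted


# Hypothetical example: position alone settles a stamp, but an address-like
# block in the lower centre needs the number-of-lines level as well.
LEVELS = ["position", "lines", "stroke"]
```

The early exit is what makes the method sequential, which is the drawback the text points out: later levels are never weighed against earlier ones once a clear winner appears.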
7. INFERENCE PROCEDURE: IMPLEMENTATION
7.1. Rule-based approach

An approach which retains the flexibility of a production-rule system and also considers the significance ordering of knowledge levels seems to be most appropriate for our problem domain. The reason is that for a high percentage of letter mailpieces, identification of the address field is a relatively easy task once an initial segmentation is obtained by splitting the proximity tree; only a small percentage of the letter mail needs to be processed further. Thus evidence such as component size, area, stroke thickness and spectral properties is secondary and should not be invoked immediately. Our inference system begins with an initial segmentation based on Euclidean distances among components. The STAMP and RETURN ADDRESS blocks are eliminated if their region scores do not show uncertainty above a certain threshold. All candidate blocks are then put through a set of tests implemented as IF-THEN rules. This rule set has members such as:

r1. IF block has more than one background, THEN split into individual backgrounds, AND compute region score for each.
r2. IF block has more than one distinct orientation, THEN split into individual orientations, AND compute region score for each.
r3. IF more than one block with score > threshold, THEN check number of components in each, AND keep blocks with > 1 component.
r4. IF block with more than one component, THEN find number of lines n, AND keep blocks with n > 1.
r5. IF block with n > 1, THEN check component size.

In this rule set, rule r1 makes use of the inclusion tree information, and r2 uses the line orientation detection method. Both rules examine a certain "region property". Rules r3 and r4 reflect a priori knowledge of what an address field is likely to be. Subsequent rules will implement the secondary evidence for an address field, which tends to vary depending on the address-printing mechanism.
For machine printed addresses, these rules limit the size of address components, their gray value, and their stroke thickness; for a hand-printed address, the thresholds in these rules will be relaxed and some may not be applicable at all.
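Rules r3 and r4 act as successive filters on the candidate blocks. The Python sketch below is a hypothetical rendering of just those two rules; the Block fields, the 0.5 score threshold and the example blocks are invented for illustration, and r1, r2 and r5 (which need the inclusion tree, the orientation detector and component sizes) are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    score: float       # region score after r1/r2 (hypothetical 0..1 scale)
    n_components: int
    n_lines: int


def apply_r3_r4(blocks, threshold=0.5):
    """Filter candidate blocks with rules r3 and r4 (illustrative only)."""
    candidates = [b for b in blocks if b.score > threshold]
    # r3: if more than one block scores above the threshold,
    # keep only the blocks with more than one component.
    if len(candidates) > 1:
        candidates = [b for b in candidates if b.n_components > 1]
    # r4: for blocks with more than one component, keep those with n > 1 lines.
    return [b for b in candidates if b.n_components == 1 or b.n_lines > 1]
```

For instance, an address block, a one-line advertisement and a single-component logo that all pass the score threshold are reduced to the address block alone: r3 drops the logo and r4 drops the advertisement.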
After each rule has been fired, the region score is updated and a new ordering is configured. In implementing such a system, the order of the rules becomes an important focus-of-attention mechanism. It is sequential in nature and in fact corresponds to a vision system which starts from a low-resolution feature space, where perceptual grouping of individual components is more easily perceived based on various backgrounds or linear features, such as line segments. Domain knowledge of the layout of these perceptual groupings, in this case the positions of STAMP, RETURN ADDRESS, etc., should then be applied. Higher-resolution features like component size, stroke thickness, and individual spectral difference will be the focus of attention only when necessary to resolve ambiguity. In implementing an inference system as described here, some training is inevitable for the purpose of establishing threshold values for our production rules.

7.2. Cutting the proximity tree

In all the approaches to locating the address block, grouping of components into blocks is necessary. As pointed out earlier, cutting the proximity tree at an appropriate level leads naturally to clusters of components, even though this operation is considered difficult in general. In our framework good results have been obtained by using heuristic functions with values associated with each link. It is also important to note that some a priori information could allow an easy cutting. For example, it would suffice to know the number of clusters present on an envelope, or the maximum distance below which two components are assumed to belong to the same cluster. These quantities cannot be known with certainty, but intervals of likelihood can be guessed or measured. For example, we expect to find at least two clusters on an envelope, namely the address and stamp, and we do not expect to find more than four or five.
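If bounds on the number of clusters were available, as in the example of two to five clusters above, choosing where to cut would reduce to simple counting: cutting a proximity tree at length t removes every link of length t or more, and each removed link adds one cluster. The sketch below illustrates this a priori route (the heuristic cutting described next deliberately does not use it) by listing the cut lengths that give an admissible number of clusters. The function name and the example link lengths are hypothetical; in particular, the length 98 for the top link of the toy tree is a placeholder.

```python
def cut_candidates(link_lengths, min_clusters=2, max_clusters=5):
    """Cut lengths whose resulting cluster count lies in [min_clusters, max_clusters].

    Cutting at length t removes every link of length >= t; in a tree, each
    removed link splits off one more cluster, so the count is 1 + #{length >= t}.
    """
    candidates = []
    for t in sorted(set(link_lengths)):
        n_clusters = 1 + sum(1 for length in link_lengths if length >= t)
        if min_clusters <= n_clusters <= max_clusters:
            candidates.append(t)
    return candidates
```

For the toy tree every link length is admissible under the default bounds of two to five clusters, while tightening the bounds to two or three clusters leaves only the cuts at 28 and at the top link.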
In the heuristic cutting of the proximity tree we do not use information of the kind described above. However, domain-dependent information should be used to select the best choice when the heuristic cutting offers more than one possibility. The strategy we adopted for the heuristic cutting originated from the following reasoning. Let us consider an ideal experiment. Assume that two large groups of characters are present on an envelope. In each group the characters are organized into words and lines in the usual way, and the two groups are well apart. An ideal cutting of the proximity tree resulting from such an envelope should present several possibilities. Depending on the point of view, either the two large groups, or each individual word, or perhaps each individual line (depending on whether the interword space is significantly smaller than the interline space) can be considered as complete clusters. Without any specific purpose in mind all of the above possibilities
are equally meaningful and are indeed suggested by our method. Assume now that the two large groups can move on the envelope. As the groups get closer and closer, the impression that there are two large groups of characters on the envelope should become weaker and weaker. The point we want to make is that an absolute answer to the problem of determining which are the clusters is not only impossible, but also incorrect. The answer to the question "Are there two large groups of characters on the envelope?" should not be "yes" or "no", but should be a value that can change gradually. Thus our philosophy is always to say "yes" and to associate a number in the real interval [0, 1] with the answer. This number, which we call the score, represents in some sense the level of belief of the system in its own answer. Total belief is represented by 1, total disbelief by 0. The problem of cutting the proximity tree is usually more complicated than that presented in the ideal experiment described above. The more general answer we are looking for is the length of the shortest link that is still long enough to separate two complete clusters. In order to find the answer, we associate a score with every possible length. The local maxima of the resulting function correspond to meaningful choices for cutting the proximity tree. In tests conducted on digitized envelopes we have indeed found maxima corresponding to cuttings whose resulting clusters were large groups of characters, lines, and words. In this situation it is very easy to set bounds on the numbers of clusters, their sizes, their numbers of components, the lengths of the links, etc. for selecting a kind of clustering which serves a given purpose. The lengths that must be scored are those of the links present in the proximity tree. A two-step process assigns a score to every such length by means of two heuristic functions, h1 and h2.
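The two-step scoring that the next paragraph defines, with h1(p, q) = (l(p) - l(q))/l(p) and h2(x) equal to the minimum of h1 over the pairs in S_x, can be sketched in Python as follows. Leaves are given length 0, and the membership test for S_x is written as l(p) >= x > l(q), so that the link of length x itself counts as cut; the text states l(p) > x, and this tie-breaking is our assumption. The toy tree mirrors the merges of the example in Section 4.2, but its top-link length of 200 is a placeholder.

```python
def h1(l, p, q):
    """Score of a father/son pair: relative drop in link length, (l(p) - l(q)) / l(p)."""
    return (l[p] - l[q]) / l[p]


def h2(l, children, x):
    """Score of a cut length x: the weakest (smallest) h1 over the pairs in S_x.

    Assumption: S_x holds father/son pairs with l(p) >= x > l(q), i.e. the
    father's link is cut and the son's link is kept.
    """
    s_x = [(p, q) for p, sons in children.items() for q in sons if l[p] >= x > l[q]]
    return min(h1(l, p, q) for p, q in s_x)


def score_lengths(l, children):
    """Score every link length present in the tree, in increasing order."""
    return [(x, h2(l, children, x)) for x in sorted({l[p] for p in children})]


def local_maxima(scored):
    """Lengths whose score beats both neighbours (the single neighbour at the ends)."""
    peaks = []
    for i, (x, s) in enumerate(scored):
        left = scored[i - 1][1] if i > 0 else float("-inf")
        right = scored[i + 1][1] if i + 1 < len(scored) else float("-inf")
        if s > left and s > right:
            peaks.append(x)
    return peaks


# Hypothetical toy tree: leaves C2..C6 have l = 0; A = (C5, C6), B = (C3, C4),
# G = (C2, B) and the top node T = (G, A), whose length 200 is a placeholder.
L = {"C2": 0, "C3": 0, "C4": 0, "C5": 0, "C6": 0, "A": 3, "B": 5, "G": 28, "T": 200}
CHILDREN = {"A": ["C5", "C6"], "B": ["C3", "C4"], "G": ["C2", "B"], "T": ["G", "A"]}
```

With these numbers the scores for x = 3, 5, 28, 200 are 1.0, 0.985, 23/28 and 0.86, so the local maxima are the trivial all-singletons cut (x = 3) and the two-block cut at the top node (x = 200); with a much shorter top link the latter would no longer be a maximum.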
The first step assigns a score to every pair of nodes in the proximity tree, say p and q, such that p is the father of q. The score assigned to the pair (p, q) is the value of a heuristic function h1 at (p, q). The second step assigns a score to a length. Let l(p) denote the length of the link represented by a node p of the proximity tree. If x is a length to be scored, a set S_x of pairs of nodes is constructed as follows: the pair (p, q) is in S_x if and only if p is the father of q, l(p) > x, and l(q) < x. The score assigned to x is the value of a second heuristic function h2 at x. A suitable definition for the heuristic functions is the following:

$$h_1(p, q) = \frac{l(p) - l(q)}{l(p)}$$

$$h_2(x) = \min_{(p, q) \in S_x} \{ h_1(p, q) \}$$

In the above discussion l(p) is the length of the link represented by node p; that is, l measures a distance between two clusters of an envelope.

8. TEST RESULTS

Because of limitations on scanning resolution, our
approach was implemented on a set of envelope images digitized at a resolution of 53 pixels/in. In this set of examples, clusters of components were obtained by cutting the proximity tree built on geometrical distance only. A tentative score based on block position is given for each segmented region. This score is computed by first assigning to each pixel three different weighting functions, corresponding to the STAMP, RETURN ADDRESS, and ADDRESS blocks. The function value is thus higher at the upper right corner of the envelope for STAMP, higher at the upper left corner for RETURN ADDRESS, and higher at the central lower portion of the envelope for ADDRESS. Here no distinction is made between a possible ADDRESS block and a COMMERCIAL MESSAGE, since the latter tends to be located randomly over the entire envelope surface, and features other than the block position will be necessary to distinguish the two. Detailed discussions of the results obtained on each image are given in Ref. (5). Another example, involving a part of an envelope digitized at 200 pixels/in., is shown in Fig. 10 (which is the same as Fig. 1). Various proximity trees built on either the region property of background, or component properties such as area, stroke thickness, component height, gray value, etc., were split to give segmentations in the corresponding feature spaces. In Figs 11 and 12 the segmentation into different backgrounds naturally splits the components into two blocks, which happen to match the result from the Euclidean distance segmentation. Figure 13 shows the tendency of components to form distinct line segments. Figures 14-17 indicate the grouping of components according to each component's characteristics. The results of the segmentations in Figs 10-17 strongly suggest an inference procedure which considers more global properties first and uses local features as supporting evidence for a successful interpretation in the domain of address block location.

9. CONCLUDING REMARKS
A domain-dependent region segmentation and labelling system has been investigated. The system is divided into three phases. In the first phase, all the components of interest are extracted from a digitized image. In the second phase, information about the components is accumulated into suitable data structures. The third phase segments the image into regions of consistent scope. The test results show that the design of the system is adequate for the goal of extracting and identifying the significant regions of an envelope. The points of major interest that emerged from our investigation are the following. In the image processing phase, contrast evaluation proves sufficient for retaining all and only the elementary components. In the information synthesis phase, three data structures are defined for storing geometrical and topological information about the components. This information can be extracted efficiently from the image and appears to be suitable for reasoning about components and regions. In the inference phase, an initial segmentation of the image is performed. Domain knowledge is matched against the information gathered from the image. The result of the matching allows a refinement of the initial segmentation and the determination of the intended scope of the various regions.

Fig. 10. The original of part of an envelope at 200 pixels/in.

The proposed approach differs from the traditional ones in the selection of the material to be read. In the proposed system the characters to be read can be selected with a higher degree of accuracy, so better performance is expected. Of course, the approach has limitations,
since it does not read the characters; a commercial message structured like an address and located in the lower center portion of the envelope is likely to be mistaken for the address. However, such cases will be rare, and our approach should be able to handle most of the situations that occur in practice.

Fig. 11. The result of segmentation into different backgrounds.

The proposed approach, as implemented on a conventional computer, is quite slow and could not be used in a practical mail sorting system, where on the order of 1000 envelopes per minute must be processed. The approach involves operations that cannot readily be performed by existing machine vision hardware. However, numerous new computer architectures are now being developed, many of them involving massively parallel multiprocessing and designed for use in computer vision systems. It therefore seems likely that hardware capable of implementing approaches such as the proposed one at the required speeds will be available perhaps ten years from now, which is about the time when the U.S. Postal Service may be ready to install a new generation of automatic mail sorting systems.
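To make the test procedure of Section 8 more concrete, here is a minimal Python sketch of its two steps: cutting a proximity tree built on geometrical distance into clusters, and scoring each cluster by position with three weighting functions for STAMP, RETURN ADDRESS, and ADDRESS. It rests on assumptions that are not taken from the paper. Components are reduced to bounding-box centers in normalized envelope coordinates (origin at the upper left, y increasing downward). The proximity tree is treated as a single-linkage hierarchy, so cutting it at a link length is taken to be equivalent to dropping minimum-spanning-tree links longer than a hypothetical threshold `max_link`. The weighting functions are Gaussian bumps whose centers and widths are illustrative guesses. Finally, the score is averaged over component centers rather than over every pixel of the region.

```python
import math
from typing import Callable, Dict, List, Tuple

# Assumed representation: bounding-box center (x, y) of each component,
# normalized to [0, 1] with the origin at the envelope's upper left corner.
Point = Tuple[float, float]


def mst_edges(points: List[Point]) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j) links."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known link from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the outside point that is closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cut_tree(points: List[Point], max_link: float) -> List[List[int]]:
    """Drop links longer than max_link; the remaining connected pieces are the clusters."""
    root = list(range(len(points)))  # union-find forest

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for length, i, j in mst_edges(points):
        if length <= max_link:
            root[find(i)] = find(j)
    clusters: Dict[int, List[int]] = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def bump(cx: float, cy: float, sigma: float) -> Callable[[float, float], float]:
    """Gaussian weighting function peaked at (cx, cy); parameters are illustrative."""
    return lambda x, y: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))


# Hypothetical weighting functions, one per block type.
WEIGHTS = {
    "STAMP": bump(0.9, 0.1, 0.15),           # upper right corner
    "RETURN ADDRESS": bump(0.1, 0.1, 0.15),  # upper left corner
    "ADDRESS": bump(0.5, 0.65, 0.2),         # central lower portion
}


def position_scores(points: List[Point], cluster: List[int]) -> Dict[str, float]:
    """Average each weighting function over the cluster's component centers."""
    return {name: sum(w(*points[i]) for i in cluster) / len(cluster)
            for name, w in WEIGHTS.items()}


if __name__ == "__main__":
    components = [(0.88, 0.08), (0.92, 0.10),                              # stamp
                  (0.08, 0.08), (0.15, 0.08), (0.08, 0.12),                # return address
                  (0.45, 0.60), (0.52, 0.60), (0.45, 0.66), (0.52, 0.66)]  # address
    for cluster in cut_tree(components, max_link=0.1):
        scores = position_scores(components, cluster)
        print(sorted(cluster), max(scores, key=scores.get))
```

Because the clustering step only needs a distance between points, the same `cut_tree` could in principle be reused for the feature-space segmentations of Figs 14-17, for example by passing `(height, 0.0)` for each component to cluster on component height alone.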
Fig. 12. The result of segmentation based on Euclidean distance.
Fig. 13. The result of segmentation into lines.
Fig. 14. The result of segmentation based on component height.
Fig. 15. The result of segmentation based on component grey value.
Fig. 16. The result of segmentation based on stroke thickness.
Fig. 17. The result of segmentation based on component area.
REFERENCES
1. B. Levine, Optical character reader/channel sorter read/reject analysis, USPS internal memo (1983).
2. Report on Contract 104230-80-D-1868, MRC Corp., Hunt Valley, MD (1982).
3. T. Miura and Y. Nishijima, Arrangement for detecting a window area of a window-having mail item, U.S. Patent 4034341, Nippon Electric Co. Ltd. (1977).
4. S. N. Srihari, J. J. Hull, P. W. Palumbo, D. Niyogi and C. H. Wang, Address recognition techniques in mail sorting: research directions, Technical Report 85-09, Department of Computer Science, SUNY at Buffalo (1985).
5. Final report on Contract 104230-84-D-069, Computer Vision Laboratory, Center for Automation Research, University of Maryland (1985).
6. S. Antoy, Understanding envelopes, Master's thesis, Department of Computer Science, University of Maryland (1985).
7. D. Marr and E. Hildreth, Theory of edge detection, Proc. R. Soc. Lond. B207, 187-217 (1980).
8. R. Dubes and A. K. Jain, Clustering methodologies in exploratory data analysis, Advances in Computers, Vol. 19, Academic Press, New York (1980).
About the Author--PEN-SHU YEH received the B.S. from National Taiwan University in 1974, the M.S. from the University of Washington in 1976, and the Ph.D. from Stanford University in 1981, all in Electrical Engineering. From 1981 to 1983 she did consulting work in the area of signal and image processing. During 1983-85 she was a research scientist in the Center for Automation Research at the University of Maryland. Since 1986 she has been with Martin Marietta Labs in Baltimore. Her interests are in the areas of image and signal processing, computer vision, and biomedical applications.

About the Author--SERGIO ANTOY received the B.S. degree in Mathematics from the University of Genova, Italy, in 1972 and the M.S. in Computer Science from the University of Maryland, College Park, in 1985. In 1984 and 1985 he worked for the Center for Automation Research on the project described in this paper. His thesis on the subject was awarded a 1985 ACM Master Degree Grant. He is currently a researcher for the National Research Council of Italy in Genova. He is involved in applications of Artificial Intelligence methods to Software Engineering problems.

About the Author--ANNE M. LITCHER was born in 1958 in New York City. She received the B.A. in Computer Science in 1979 from Queens College of the City of New York. She is currently a graduate student in the Computer Science Department at the University of Maryland.

About the Author--AZRIEL ROSENFELD received the Ph.D. in Mathematics from Columbia University in 1957. After ten years in the defense electronics industry, in 1964 he joined the University of Maryland, where he is Research Professor of Computer Science and Director of the Center for Automation Research. He is an editor of the journal Computer Graphics and Image Processing, an associate editor of several other journals, a past president of the International Association for Pattern Recognition, and president of the consulting firm ImTech, Inc. He has published 19 books and over 400 papers, most of them dealing with the computer analysis of pictorial information.