Pattern Recognition Letters 8 (1988) 189-196, North-Holland
October 1988
Moving object detection, inspection, and counting using image stripe analysis

Wen-Nung LIE and Yung-Chang CHEN
Sensor Lab., Institute of Electrical Engineering, Tsing Hua University, Hsin-chu, Taiwan 30043, Republic of China

Received 26 February 1988

Abstract: Moving object detection, inspection, and counting using image stripe analysis in factory and commercial applications are investigated. A real-time visitor counting system is presented as a realization of this strategy. Practical experiments show that 95% accuracy is achieved even with a visitor rush. The same principle can be similarly applied to other applications.

Key words: Industrial automation, image sequence processing, image stripe analysis, moving object detection, inspection, real-time system, relation processing.
1. Introduction
Applications of image analysis, such as robot guidance, autonomous vehicle navigation, 3D object recognition, bin picking, quality inspection, and so forth, have been increasingly emphasized in industrial automation. However, owing to the large amount of data processing and the complicated environments involved, it is hard to make such systems practical unless costly special-purpose hardware is utilized and the environment is carefully controlled. There are, nevertheless, applications in daily life that can be handled with simple image methods. Examples include the detection, inspection, and counting of moving objects, such as products on a production line or visitors at an exhibition, which are given as embodiments in the present paper.

Motion estimation from image sequence processing has been an interesting topic and is widely applied in target tracking and robot vision. Huang (1981) addressed methods for the estimation of general 2D or 3D motion (including scaling, translation, and rotation); they used either a differential method, which estimates the image-space shifts of a number of physical points on the same rigid body to determine the motion parameters, or a matching method, which registers the same points in two consecutive images and then computes the transformation parameters between the two sets of feature points. One recent approach, called the temporal-spatial gradient method, uses the velocity components of image points, which can be calculated from the spatial gradients of the images and the local intensity change over time due to motion; it realizes the optical flow concept and is applied extensively to medical imaging and motion-compensated image sequence coding (see Horn (1980) and Netravali (1979)).

Although the above techniques each have their own sophisticated philosophies, they are computationally intensive. Depending on the requirements of a task, processing a whole image frame is not always necessary. For example, detection problems only answer with 'existing' or 'non-existing', counting problems respond with a value, and inspection problems judge the goodness of an object. In general, a 3 by 3 convolution operator applied to a 256 by 256 pixel image may take several seconds for edge detection on a mini-computer. Real-time convolution hardware is available, but its cost is high. Other image processing techniques demand similar execution times. From this point of
0167-8655/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland)
Figure 1. Schematic diagram for the image stripe analysis; fs, ds, Ns, W, δ1, δ2, δ3 are discussed in Section 3. [Figure: a camera produces image frames; selected stripes undergo feature detection and location, yielding pattern lists P1(t), ..., PN(t); property and relation processing, driven by temporal buffers (t' < t) and knowledge rules, produces the output.]
view, some special schemes should be adopted to meet the system requirements.

In this paper we propose a strategy using image stripe analysis for the detection, inspection, and counting of moving objects from a dense sequence of TV images. Our key idea is to sample each image frame at appropriate spacings in the X or Y dimension to form image stripes, to detect and locate partial object features in each stripe, and finally to perform relational processing, in which the relations between each located feature and the others, at temporal or spatial offsets, are explored according to some knowledge rules. For instance, exploring the temporal and spatial sequences of located partial object features determines the existence of an object and its moving trajectory, and the completeness or goodness of an object can be judged by considering the set of its located partial features. The strategy is illustrated in Figure 1. Within this framework, features suitable for 1D detection and location must be carefully defined, and the relation processing is governed by a set of knowledge rules; both depend on the specific application.

A real-time visitor counting system based on image stripe analysis is proposed in the present paper, and the same principle can be applied to other applications with slight modification. In this system, each passing person is characterized by his head viewed from the top. Each image stripe is first processed to locate the head patterns in it, and the relations between these located patterns are then explored to determine the desired properties. Not only is the number of visiting persons counted with reasonable accuracy, but the moving direction of each person is also determined as a by-product.
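As a sketch of how these pieces fit together, one iteration of such a per-frame loop might look as follows in Python (hypothetical function names; the stripe detector and the relation rules are the application-specific components discussed above):

```python
def process_frame(frame, stripe_rows, detect, relate, history):
    """One iteration of the stripe-analysis loop: extract the selected
    stripes from the frame, locate partial object features in each, then
    relate them to the features buffered from earlier instants."""
    located = {i: detect(frame[r]) for i, r in enumerate(stripe_rows)}
    events = relate(located, history)   # knowledge-rule processing
    history.append(located)             # temporal buffer (t' < t)
    return events
```

Here `detect` stands for the 1D feature detector of Figure 1 and `relate` for the relation processing driven by the knowledge rules; both are placeholders, not part of the original system description.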
2. The visitor counting system

2.1. Overview and outline
Counting the number of persons visiting an exhibition is generally necessary for the subsequent analysis of that exhibition. Many techniques could be used to design such a counting system. For instance, a set of infrared or ultrasonic emitters and receivers can be properly arranged at the inlet; whenever the body of a visitor interrupts the rays travelling from emitters to receivers, the accumulated count is incremented. Alternatively, a set of mechanical switches or pressure sensors can be installed to count the persons walking through the inlet. However, such on-off sensors are only suitable for a sequential and sparse flow of visitors; they have the disadvantage of low discrimination and are not versatile for applications in factory automation. In Tsukiyama's (1985) and Furusawa's (1986) papers, image processing techniques were adopted to find a passer's trajectory and to measure congestion, respectively. Their methods used edge
detection and template matching to find the desired patterns. Correspondingly, the operation time ranged from several to tens of minutes on a mini-computer-based implementation. The standard or generalized Hough transform is also useful in industrial applications for detecting specific patterns. Nevertheless, these approaches are all too time-consuming to be practically useful.

Instead of processing the whole image, image stripes are extracted (or linear sensor arrays are utilized). The number of stripes needed depends on the persons' walking speed and the relation processing between stripes. The ordinary walking speed of a person is about 120 steps per minute. Assuming that one step measures 75 centimeters on average, the distance a person moves in 1/30 second is approximately 5 cm; this value influences the stripe sampling rate fs discussed in Section 3. With a real-time picture-taking process, only one image stripe suffices for counting, and at least two image stripes are necessary to determine the moving direction (i.e. outgoing or ingoing). If the system parameters (see Section 3) are adjusted to ensure that the head of each passing person is detected, relations between the patterns located in temporal and spatial stripe sequences can be established to evaluate the desired properties, i.e. the number of passing persons and their moving directions. As shown in Figure 2, the image is of N by N size and the two stripes (SP1 and SP2) are placed symmetrically about the central line of the image. This reduces perspective distortion and the possible occlusion between passing persons.
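The walking-speed arithmetic above can be checked with a few lines (a sketch; the figures are those quoted in the text):

```python
steps_per_min = 120      # ordinary walking pace, from the text
step_len_m = 0.75        # average step length, 75 cm
frame_rate = 30          # TV picture-taking rate, frames/sec

speed_m_s = steps_per_min / 60 * step_len_m   # 1.5 m/s
move_cm = speed_m_s / frame_rate * 100        # about 5 cm per frame interval
print(speed_m_s, move_cm)
```

This confirms the roughly 5 cm per-frame displacement that drives the choice of stripe spacing and temporal reference range in Section 3.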
2.2. Detecting head patterns in image stripes

The easiest and fastest method for head pattern detection in image stripes is binary thresholding, with the head characterized by its hair gray level. Alternatives include non-floor gray level violation and head edge point detection. The former requires pre-calculating the gray level distribution of the floor and takes as the feature the whole human body that blocks off the floor, not just the head; however, this suffers from the non-symmetric shape of the feature and the large variance in the stripe area covered by each passing person.
Figure 2. Stripe configuration of the visitor counting system. [Figure: an N by N image with two stripes, SP1 and SP2, placed symmetrically about the center line; the ingoing direction crosses SP1 first and the outgoing direction crosses SP2 first.]
On the other hand, the latter needs a 2D edge detector. After thresholding, a 1D sliding window of fixed size W moves along each image stripe to decide whether enough dark points (assuming that head points are thresholded to dark and others to bright) have accumulated within the window. If there are (i.e. over a threshold T), a partial head pattern is located at the central position of the sliding window and output for later processing, as indicated by Pi(t) in Figure 1. Hence each Pi(t) represents a list of head pattern positions found in image stripe i at time t. If the threshold T is set to nearly the full head width Wh, only the part near the diameter of the head circle is sampled and detected as a head pattern. Guaranteeing that each stripe samples the head patterns leads to a proper choice of T, which is discussed in Section 3. The detailed procedure for the detection and location of head patterns is given in Algorithm 1 in a C-syntax-like language.
Algorithm 1: detecting head patterns in an image stripe
1. Given W, T, N, gl, gu /* see Section 3 */
2. count = 0; pointer = 0; i = pointer /* count: accumulator, pointer: window start */
3. If (gl < gi <= gu) {pointer = i; goto 4} /* gi: gray level of point i */
   else {i = i + 1; /* look for a starting dark point */
   If (i > N) goto 7 else goto 3}
4. for i = pointer to pointer + W /* test within the window */
   {If (i > N) goto 5
   If (gl < gi <= gu) count = count + 1}
5. If (count > T) /* output the head pattern position */
   {output(pointer + W/2); pointer = pointer + W; i = pointer; count = 0; goto 3}
   /* else skip the first few dark points and look for another starting dark point */
6. for i = pointer to pointer + W
   {If ((gi > gu) or (gi < gl)) {pointer = i; goto 3}}
7. end
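A compact Python transcription of Algorithm 1 might read as follows (a sketch, not the authors' code; the convention is that hair pixels have gray levels in (gl, gu]):

```python
def detect_head_patterns(stripe, W, T, gl, gu):
    """Slide a window of size W along a 1-D stripe of gray levels and
    report the window centre wherever more than T dark (hair) points
    accumulate, following the steps of Algorithm 1."""
    def dark(g):
        return gl < g <= gu

    centers = []
    i = 0
    N = len(stripe)
    while i < N:
        if not dark(stripe[i]):            # step 3: find a starting dark point
            i += 1
            continue
        window = stripe[i:i + W]           # step 4: test within the window
        count = sum(1 for g in window if dark(g))
        if count > T:                      # step 5: head pattern located
            centers.append(i + W // 2)
            i += W
        else:                              # step 6: skip leading dark points
            i = next((k for k in range(i, min(i + W, N)) if not dark(stripe[k])),
                     i + W)
    return centers
```

With the Section 3 values (W = 15, T = 8, gl = 0, gu = 32), a 12-pixel-wide dark run is reported once, near its centre.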
2.3. Relation processing between stripes

So far, partial patterns of the moving objects have been located in each stripe and recorded in terms of their center positions. The elements of relation processing are relation references and relation operations. First, each located pattern is assigned a reference range; in general, this range covers the surrounding neighborhood extending over both the time and spatial domains. Next, the references are investigated according to the relation operations defined between them. In Figure 3 a 3D relation cube and its 2D cross sections are illustrated. Each node represents an image pixel position and is shown as a solid circle if a pattern is located there, and as a hollow circle otherwise. The three axes of the 3D cube represent the time, x, and y coordinates, respectively. Thus an image frame has the same orientation as rectangle ABCD, as shown in Figure 3(b), and image sequences proceed along the time axis. The arrows in Figure 3(a), (c) mean 'refer to'. A solid node can refer to any nodes at temporal or spatial offsets, whereas a hollow node points no arrows outwards. Owing to the causal nature of time sequence processing, every node can refer only to nodes at prior instants. The 'knowledge rules' box of Figure 1 establishes the associated relation references and relation operations. The operations between referred and referring nodes depend on the application; in general, for detection, inspection, and counting problems, the logic AND (∩), OR (∪), and NOT operators are enough.

Taking our visitor counting system as an example, let a given node be denoted by Pt0(i0, x0), that is, a position x0 at stripe i0 and time instant t0; Pt0(i0, x0) = 1 (logic true) if a pattern is located there, and Pt0(i0, x0) = 0 (logic false) otherwise. Our knowledge rules can be stated as follows:
(1) If a person walks first through the 1st image stripe and then through the 2nd stripe, he is ingoing.
(2) If a person walks first through the 2nd image stripe and then through the 1st stripe, he is outgoing.
These rules can be formulated, in a general form, as the following conditions:

1. given δ1, δ2, δ3 > 0 and Pt0(i0, x0)
2. if Pt0(i0, x0) = 0 then "no heads are under surveillance"
3. if Pt0(i0, x0) = 1 then
4.   if [∪ Pt(i, y) : |y − x0| < δ1, 0 < t0 − t ≤ δ2, 0 < i0 − i ≤ δ3] = 1 and [∪ Pt0−1(i0, y) : |y − x0| < δ1] = 0 then "a head is ingoing"
5.   if [∪ Pt(i, y) : |y − x0| < δ1, 0 < t0 − t ≤ δ2, −δ3 ≤ i − i0 < 0] = 1 and [∪ Pt0−1(i0, y) : |y − x0| < δ1] = 0 then "a head is outgoing"
     else "that head stays".
The above conditions are explained in further detail as follows. First, if Pt0(i0, x0) = 0, no heads are under surveillance; only nodes with Pt0(i0, x0) = 1 must be checked for the ingoing or outgoing conditions. For nodes with Pt0(i0, x0) = 1, we first check their spatial neighbors (in the same stripe) at the previous instant t0 − 1 within the tolerance δ1. If a node with Pt0−1(i0, y) = 1 is found, that head is still under surveillance (1 → 1, do not change state); if not, that head comes into surveillance for the first time (0 → 1, change state). The first terms of lines 4 and 5 check the sequence order in which head patterns occur between stripe i0 and a stripe i, with 0 < i0 − i ≤ δ3 or −δ3 ≤ i − i0 < 0. In brief, an ingoing head pattern must first be detected in a stripe i with 0 < i0 − i ≤ δ3 at a time instant prior to t0, and then in stripe i0 at time instant t0. For the outgoing head patterns, the condition is similar. Since our system adopts just two image stripes for surveillance, δ3 = 1; one stripe is checked for ingoing counting and the other for outgoing counting. Notice that the choice of δ2 must take the image stripe sampling rate fs and the walking speed of a visitor into consideration. The setting of these parameters is discussed in detail in the next section.
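Under the assumption that located patterns are buffered per instant and per stripe, the decision logic above can be sketched in Python as follows (hypothetical data layout: `P[t][i]` is the list of head-pattern positions found in stripe i at time t; d1, d2, d3 stand for δ1, δ2, δ3):

```python
def classify(P, t0, i0, x0, d1, d2, d3):
    """Decide whether the head pattern at (t0, i0, x0) is ingoing,
    outgoing, still under surveillance, or newly seen, following the
    knowledge rules of Section 2.3."""
    def near(t, i):
        return any(abs(y - x0) < d1 for y in P.get(t, {}).get(i, []))

    if near(t0 - 1, i0):                 # 1 -> 1: same head, already counted
        return "stays"
    for dt in range(1, d2 + 1):          # look back up to d2 instants
        for di in range(1, d3 + 1):      # over the d3 neighbouring stripes
            if near(t0 - dt, i0 - di):
                return "ingoing"         # crossed a lower-numbered stripe first
            if near(t0 - dt, i0 + di):
                return "outgoing"        # crossed a higher-numbered stripe first
    return "new"                         # first sighting, no direction yet
```

This is only a reading of the conditions, not the authors' implementation; in the real system the "new" case corresponds to a head entering surveillance for the first time.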
Figure 3. Relation graph: solid nodes represent located feature positions. (a) References to the corresponding nodes at different stripes and at different time instants. (b) Orientations of cross sections ABCD and EFGH. (c) References to the nodes at the same stripe, but at different time instants.
3. Determination of parameters and experimental results

In a visitor counting system, the primary parameters to be considered and their mutual relations are listed below:

vw : average walking speed of a person (pixels/sec)
f : image temporal sampling rate, f = 30 frames/sec
fs : image stripe spatial sampling rate
ds : width between adjacent stripes, ds = 1/fs (pixels)
Ns : number of stripes selected per image frame
N : image stripe length (pixels)
Wh : average head diameter (pixels)
W : sliding window size for image stripe processing
T : threshold value of accumulated dark points in a window for detection as a head pattern
gu : upper limit of hair gray level
gl : lower limit of hair gray level
δ1 : range of spatial neighboring references in a stripe
δ2 : range of time-forward temporal references
δ3 : range of neighboring stripe references

Parameters δ1, δ2, and δ3 form a 3D reference range, as stated earlier. To ensure that the passing persons are completely under the surveillance of the selected image stripes, some relations between the parameters must hold:

(1) (vw/f)·δ2 ≥ ds
(2) ds > dmin, where dmin = √(Wh² − T²), such that Pt0(i0, x0) ∩ Pt0(i0 − 1, x0) = 0 and Pt0(i0, x0) ∩ Pt0(i0 + 1, x0) = 0
(3) W > Wh
(4) T ≤ √(Wh² − (vw/f)²)

The first inequality ensures that when a head pattern center is located at Pt0(i0, x0), at least one of its δ2 time-forward temporal references at the immediately neighboring stripe in the opposite moving direction is 1, that is, ∪ Pt(i, y) = 1, where 0 < t0 − t ≤ δ2, |y − x0| < δ1, and i = i0 − 1 or i = i0 + 1, depending on the moving direction; this is illustrated in Figure 4. The second condition says that the value of ds must be
Figure 4. For δ2 = 1 or 2, (vw/f)·δ2 ≥ ds does not hold in the above case, and consequently the two dotted circles may never be sampled by stripe i0 − 1. [Figure: stripes i0 and i0 − 1 separated by ds; head positions at t0, t0 − 1, t0 − 2 along the moving direction, with Pt0(i0, x0) marked.]
greater than a certain minimum so that head patterns are not detected simultaneously at corresponding positions of immediately neighboring stripes. The determination of this minimum depends on several factors, e.g. the height at which the camera is suspended, the average diameter of a head, and the threshold value T; this is illustrated in Figure 5, where a head is modeled as a circle. The third condition seems intuitive, but it justifies its use when some non-head parts of a person (e.g. the shoulders) are thresholded as dark points owing to similar gray levels. The main goal of condition (4) is to guarantee that when a person keeps moving, his head is sampled near its diameter by each image stripe and recognized as a head pattern via the threshold T. The illustration is given in Figure 6, where d is the average movement (in pixels) during a picture-taking interval (1/30 sec). Besides, T is set to a fraction of Wh, regardless of the choice of W.

To verify our algorithm, the visitor counting system was installed in a practical exhibition where a visitor rush is possible. Figure 7 shows the system configuration, consisting of a lighting lamp (300 Watt), a suspended CCD camera (at 5 m height), a passageway (3 m wide), an IBM PC-XT computer equipped with an image frame grabber, 4 display boards connected through modems, an operating program written in assembly language, and so forth. The associated parameters are listed below:

vw = 1.5 m/sec ≈ 75 pixels/sec (with the camera suspended at 5 m height)
f = 30 frames/sec
ds = 7 pixels
Ns = 2 stripes
N = 128 pixels
Wh = 10 pixels
W = 15 pixels
gu = 32
gl = 0
T = 8 pixels
δ1 = 4
δ2 = 3
δ3 = 1
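The parameter constraints of this section can be checked against the listed values with a short script (a sketch; condition (1) is taken here as (vw/f)·δ2 ≥ ds, consistent with the Figure 4 discussion):

```python
import math

vw, f = 75.0, 30.0     # walking speed (pixels/sec) and frame rate
ds, Wh = 7, 10         # stripe spacing and average head diameter (pixels)
W, T = 15, 8           # sliding window size and dark-point threshold
d2 = 3                 # delta_2, range of time-forward temporal references

d = vw / f                           # movement per frame: 2.5 pixels
c1 = d * d2 >= ds                    # (1): 7.5 >= 7
c2 = ds > math.sqrt(Wh**2 - T**2)    # (2): 7 > 6
c3 = W > Wh                          # (3): 15 > 10
c4 = T <= math.sqrt(Wh**2 - d**2)    # (4): 8 <= 9.68...
print(c1, c2, c3, c4)                # -> True True True True
```

All four constraints hold for the installed system, which is consistent with the choice δ2 = 3 (δ2 = 1 or 2 would violate condition (1), as Figure 4 illustrates).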
In general, a high δ2 value requires more buffers to store located head pattern positions for further relational processing. Alternatively, when condition (2) is violated (in which case we could have δ2 = 1), additional heuristic knowledge rules are needed to detect a person's turn-back movement. For a non-jamming flow of traffic the knowledge rules stated in the last section are enough, but for crowded streams more heuristic rules should be added to make the system robust. In practical testing, our system operated with an accuracy of 95%.

Figure 5. The determination of the minimum of ds: dmin = 2·√((Wh/2)² − (T/2)²) = √(Wh² − T²). [Figure: a head circle of diameter Wh crossing stripes SPi and SPi+1.]

Figure 6. The threshold T is chosen such that the shaded area is guaranteed to be sampled by each image stripe: T/2 ≤ √((Wh/2)² − (d/2)²) implies T ≤ √(Wh² − d²), where d = vw/f.
Figure 7. The configuration of the visitor counting system in a practical exhibition. [Figure: a lamp and a CCD camera suspended above a passageway, with stripes 1 and 2 imaged across it.]

In a sparse visitor flow, the accuracy is over 98%. The inaccuracy results mainly from violations of the dark-hair feature, for example a man wearing a non-black hat, or a bald person. In such cases, we may instead choose the floor as the background, so that anything interrupting the floor is recognized as an object pattern.
4. Discussion of image stripe strategy

In the previous sections, we have presented a visitor counting system which uses only two image stripes per frame for counting the passing persons and discriminating their moving directions. Simple processing operates over each image stripe, reducing the tedious image sequence processing to a 1D domain. From the viewpoint of information carrying, a 2D image frame is richer than 1D image stripes. In some cases, simple 1D image processing, like that in our visitor counting system, is inadequate for locating featured objects; nevertheless, the image stripe strategy still works, but only in the sense of image sampling. In other words, conventional 2D processing techniques are applied only to the pixels of the selected image stripes to reduce the computational load, and the post-processing operates as before. Simplifying the low-level image processing and sophisticating the post-processing are the main strategy of this paper. Extracting only certain stripes for 1D image processing reduces the time needed per frame. In this way, real-time image frame grabbing is possible (30 frames/sec) and the speed of the moving objects imposes no limitation in designing a practical system. In reality, many tasks in factory automation, such as the counting and defect inspection of products on a production line (e.g. Davies' (1987) food-product inspection system), can easily be implemented using this strategy. Nevertheless, in applying this strategy, some points about the moving objects should be noted:
(1) The objects' shapes should be as symmetrical and compact as possible.
(2) Stable and distinct features (e.g. the dark characteristic of hair) are required.
Abiding by the above conditions leads to simple relation processing rules. On the contrary, if the moving objects have elongated shapes, it is difficult to choose a proper size for the 1D sliding window, for we cannot accurately predict the coverage of the objects on the image stripes when the objects are placed in arbitrary orientations. If the objects have no stable and distinct features to be detected and located, 2D processing becomes necessary. The above two requirements therefore ensure that the detection and location of object features are as correct as possible. In implementing a practical system, the features of the moving objects and the rules for the relation processing should be carefully defined; they play important roles in a successful realization. For further speed-up, simple hardware for 1D image stripe processing, unlike complicated 2D image processing hardware, can be designed. Speed is an important consideration for practical application to industrial automation.
5. Conclusion

A new strategy using image stripe analysis for the detection, inspection, and counting of moving objects in industrial or commercial applications has been proposed. Conventional 2D image or 3D image sequence processing is replaced by simple techniques operating on each selected 1D stripe. This strategy dramatically reduces the processing time and can easily be implemented in hardware at low cost. The number of image stripes and the width between them are properly chosen subject to
some constraints (case dependent). After object patterns have been detected and located in each stripe, the relations between these detected patterns are explored according to designated knowledge rules to evaluate the desired properties, for instance the goodness of a product, the accumulated number of products, and so forth. Thanks to image sampling and 1D processing, the processing time for each image frame is so small that real-time (30 frames/sec) applications are feasible. A real-time PC-based visitor counting system has been presented as a realization of the proposed strategy. In practical testing at an exhibition, 95% accuracy was achieved even with a visitor rush; better results are possible with a sparse visitor flow. Finally, we discussed the primary concerns and advantages of the image stripe analysis strategy.
References

[1] Netravali, A. and J. Robbins (1979). Motion compensated TV coding: part 1. Bell Syst. Tech. J. 58, 631-670.
[2] Horn, B.K.P. and B.G. Schunck (1980). Determining optical flow. AI Memo 572, M.I.T.
[3] Huang, T.S. and R.Y. Tsai (1981). Image sequence analysis: motion estimation. In: T.S. Huang, Ed., Image Sequence Analysis. Springer, Berlin.
[4] Tsukiyama, T. and Y. Shirai (1985). Detection of the movements of persons from a sparse sequence of TV images. Pattern Recognition 18, 207-213.
[5] Furusawa, H., S. Ikebata and M. Yano (1986). Congestion measurement by scene analysis. Proc. Conference on Industrial Electronics, Control and Instrumentation, IECON'86, 341-346.
[6] Davies, E.R. (1987). A high speed algorithm for circular object location. Pattern Recognition Letters 6, 323-333.