Real-Time Imaging 3, 391–397 (1997)
Three-dimensional Polychromatic Objects Classification Based on Amplitude Modulation and Amplitude-shift Keying
Algorithms allowing the classification of three-dimensional (3D) polychromatic images based on amplitude modulation and amplitude-shift keying are introduced. For this purpose, the 3D information is likened to a ‘carrier wave’, while the color information is likened to a ‘modulating signal’. The first algorithm is based on the amplitude modulation of the range by the hue. On the amplitude-shift keying platform, two algorithms are designed, based on a three-channel red, green and blue modulation and a six-channel hue modulation. In the three-channel approach, the modulation of the three-dimensional image is performed on the basis of the relative ordering of the red, green and blue components, while in the six-channel approach, modulation is performed by partitioning the hue interval into six subintervals. The cross-angle distributions of the modulated range images are calculated in order to obtain invariant representations. Shift keying modulation is also applied to the cross-angle distributions in order to obtain a chromatic sampling of cross-angles. From these cross-angle representations a set of histograms is built. The classification is performed by comparing the unknown histograms to a reference set. The classified object is the reference that gives the smallest error.
© 1997 Academic Press Limited
Eric Paquet Visual Information Technology, Institute for Information Technology, National Research Council of Canada, Montreal Road, Building M-50, Ottawa (Ontario) Canada K1A 0R6 E-mail:
[email protected]
1077-2014/97/060391 + 07 $25.00/ri960073
Introduction
Amplitude modulation (AM) and amplitude-shift keying (ASK) are commonly used in communication systems [1]. Each of them can be described as the product of a carrier wave and a modulating signal. AM is an analogue modulation, while ASK is binary. There is an analogy between those modulations and three-dimensional (3D) polychromatic objects: a 3D object can be likened to a ‘carrier wave’, while its color distribution can be likened to a ‘modulating signal’. In order to exist, the distribution needs a physical support or carrier, which is given by the physical object. The color distribution can be viewed as a modulating signal because it modulates the way in which 3D information is perceived. In this paper it is proposed to modulate the geometrical information by the color distribution using AM and ASK, and to perform the classification on this basis. Four algorithms are presented: the first, based on AM, is presented in the first section, while the other three, based on ASK, are put forward in later sections.
Other sections of the paper are devoted to the analysis and classification of the geometry of modulated images, and considerations on real-time implementation.
Amplitude Modulation Applied to 3D Polychromatic Images
Amplitude modulation [1] can be defined as

$$s(q) = a(q)\, f(q) \tag{1}$$

where f(q) is the ‘carrier wave’, a(q) is the ‘modulating signal’, and q is a set of parameters. We want to apply this equation to 3D polychromatic images. These images are captured by a range camera [2–4] and are made up of a range image

$$z = z(x, y) \tag{2}$$

and a color image

$$r = r(x, y), \qquad g = g(x, y), \qquad b = b(x, y) \tag{3}$$

where the red, green and blue (RGB) components correspond to the RGB representation [5,6]. Equation (2) is simply the 3D shape of the object as it would be seen by an observer looking in the z-direction. The coordinates and the range in Eqns (2) and (3) correspond to the real physical coordinates and range. This means that when the object is translated in space, the distance between a pair of points belonging to the object remains unchanged.
The range camera [2–4] is made up of a linear detector and a laser scanner. The scanner uses a white laser beam composed of a red, a green and a blue line. When the beam reaches the surface of an object, a fraction of the incident light is reflected back to the camera, in which a prism splits the reflected light into its RGB components. Each of these components is deflected to a different position on the linear detector. From the position of these three components and from the state of the scanning system, it is possible to determine the coordinates (x, y) and the range z of the corresponding point. The RGB components are obtained by measuring the intensity of the corresponding spots on the detector. Therefore the color is automatically and always matched to the range. For this reason, none of the algorithms presented here is sensitive to color-range matching. The camera can scan a scene in real time.
Instead of using the RGB system, it is also possible to use the hue-saturation-intensity (HSI) system [5–7]

$$h = h(x, y), \qquad s = s(x, y), \qquad i = i(x, y) \tag{4}$$

which is simply a representation, in cylindrical coordinates, of the RGB system. The hue [7] is defined as

$$h = \tan^{-1}\!\left(-\frac{\alpha}{\beta}\right) + k\pi \tag{5}$$

where h is defined on the interval [0, 2π], k = 2 if β ≤ 0, k = 1 if β > 0 and where

$$\alpha = \frac{1}{\sqrt{2}}\bigl(r(x, y) - b(x, y)\bigr) \tag{6}$$

$$\beta = \frac{1}{\sqrt{6}}\bigl(r(x, y) + b(x, y)\bigr) - \sqrt{\frac{2}{3}}\; g(x, y) \tag{7}$$

It is important to notice that the hue is a function of neither intensity nor saturation. This is not the case for the RGB components, which are functions of both saturation and intensity [6]. As shall be shown later, both representations are interesting from the modulating point of view.
Let us consider the combination of AM and the hue in which the hue is used as a ‘modulating signal’. We call this modulation H-AM or hue-amplitude modulation. In that case, Eqn (1) can be written as

$$z(x, y; 1) = a(x, y)\, z(x, y) \tag{8}$$

where

$$a(x, y) = 1 + \mu \left( \frac{h(x, y)}{\pi} - 1 \right) \tag{9}$$

µ is a modulation parameter and a(x, y) is defined on the interval [1 − µ, 1 + µ]. When µ is small, the modulation of the range by the color is weak, while when µ has a high value, a strong modulation is obtained. Equation (8) enables us to combine both the range and the hue in a single image. The classification of these images shall be considered in a later section.
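The H-AM modulation of Eqns (8) and (9) can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions: the function names are hypothetical, the images are random test data, and the branch selection through k in Eqn (5) is replaced here by `np.arctan2` followed by wrapping to [0, 2π], which is an assumption and may place the origin of the hue differently from the original definition.

```python
import numpy as np

def hue_from_rgb(r, g, b):
    """Hue angle in [0, 2*pi] from the alpha and beta components of Eqns (6)-(7)."""
    alpha = (r - b) / np.sqrt(2.0)
    beta = (r + b) / np.sqrt(6.0) - np.sqrt(2.0 / 3.0) * g
    # Assumption: arctan2 plus wrapping stands in for the explicit k*pi term of Eqn (5).
    return np.mod(np.arctan2(-alpha, beta), 2.0 * np.pi)

def h_am(z, h, mu=0.1):
    """Eqns (8)-(9): scale the range by a factor lying in [1 - mu, 1 + mu]."""
    a = 1.0 + mu * (h / np.pi - 1.0)
    return a * z

# Random test data (hypothetical): a 4 x 4 color image and a range image.
rng = np.random.default_rng(0)
r, g, b = rng.random((3, 4, 4))
z = 10.0 + rng.random((4, 4))

h = hue_from_rgb(r, g, b)
z_mod = h_am(z, h, mu=0.1)
```

With µ = 0.1 the modulated range stays within 10% of the original range, which is why the result can still be treated as a range image. Scaling r, g and b by a common factor leaves the hue, and therefore the modulation, unchanged.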
Amplitude-shift Keying Applied to 3D Polychromatic Images
The modulation described by Eqn (1) can be of binary type [1]

$$s(q) = b(q)\, f(q) \tag{10}$$

The problem is to define the binary function. If the RGB system is used, a binary image can be defined for the red, green and blue components. Nevertheless, as pointed out in the first section, the red-green-blue components are saturation and intensity dependent, e.g. they are functions of the surrounding lighting. Thus, the binarization threshold must take into account all those factors.

RGB amplitude-shift keying applied to 3D polychromatic images
Let us define the components of the RGB image as

$$\mathrm{rgb}(x, y; 1) \equiv r(x, y), \qquad \mathrm{rgb}(x, y; 2) \equiv g(x, y), \qquad \mathrm{rgb}(x, y; 3) \equiv b(x, y) \tag{11}$$

For each of these components, a binary type image b(x, y; n) can be calculated using the following rule

$$\text{IF } \mathrm{rgb}(x, y; n) > \mathrm{rgb}(x, y; m)\ \ \forall\, m \neq n \ \text{ THEN } b(x, y; n) = 1 \ \text{ ELSE } b(x, y; n) = 0 \tag{12}$$

where ∀ means for all and m and n = 1, 2, 3. The behavior of Eqn (12) can be described as follows: the dominant color is determined for each point, and a value of 1 is assigned to the corresponding binary image at the corresponding position, thus creating three binary maps representing the distribution of dominant colors in the xy plane. This eliminates most of the dependence on saturation and intensity and makes use of the natural ordering of the RGB components. These binary images are used to modulate the range image

$$z(x, y; n) = b(x, y; n)\, z(x, y) \tag{13}$$

Equation (13) corresponds to a three-channel modulation. The final result consists of three range images modulated on the basis of the red, green and blue images. Let us call this modulation red-green-blue-amplitude-shift keying (RGB-ASK).

Multichannel hue amplitude-shift keying applied to 3D polychromatic images
It is always possible to divide the hue into subintervals: each subinterval corresponds to a particular color, meaning that the classification is based on colors. In the following algorithm, the hue is divided into six subintervals, corresponding roughly to red, green, blue, magenta, cyan and yellow [5]. To each interval is associated a channel c(n) which is defined as

$$c(n) = \left[ \frac{(n - 1)\pi}{3},\ \frac{n\pi}{3} \right] \tag{14}$$

where n = 1, 2, … , 6. For each channel, a binary image is computed using the following rule

$$\text{IF } h(x, y) \in c(n) \ \text{ THEN } b(x, y; n) = 1 \ \text{ ELSE } b(x, y; n) = 0 \tag{15}$$

where ∈ means belongs to and n = 1, 2, … , 6. The hue is represented by six binary images corresponding to its partition into six color images. On the basis of the six hue channels and by using Eqn (13) it is possible to modulate the range image and to obtain six range-type images. Let us call this modulation multichannel hue-amplitude-shift keying (MCH-ASK).
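The two keying rules, Eqns (12) and (15), followed by the masking of Eqn (13), can be sketched as follows. This is a minimal illustration with hypothetical function names and hand-made 2 × 2 test images. Two choices are assumptions rather than part of the text: pixels where no component is strictly dominant are assigned to no RGB channel, and the hue intervals of Eqn (14) are treated as half-open so that each hue falls into exactly one channel.

```python
import numpy as np

def rgb_ask(z, r, g, b):
    """Eqns (11)-(13): three range images keyed by the strictly dominant RGB component."""
    rgb = np.stack([r, g, b])                      # rgb(x, y; n), n = 1, 2, 3
    masks = np.zeros(rgb.shape, dtype=bool)
    for n in range(3):
        others = [m for m in range(3) if m != n]
        # b(x, y; n) = 1 only where component n exceeds both other components.
        masks[n] = np.all(rgb[n] > rgb[others], axis=0)
    return np.where(masks, z, 0.0)                 # z(x, y; n) = b(x, y; n) z(x, y)

def mch_ask(z, h, channels=6):
    """Eqns (13)-(15): range images keyed by the hue subinterval of each pixel."""
    width = 2.0 * np.pi / channels                 # pi / 3 for six channels
    # Assumption: half-open intervals; a hue of exactly 2*pi goes to the last channel.
    index = np.minimum((h // width).astype(int), channels - 1)
    masks = np.arange(channels)[:, None, None] == index
    return np.where(masks, z, 0.0)

# Hand-made test data (hypothetical).
r = np.array([[0.9, 0.1], [0.2, 0.5]])
g = np.array([[0.1, 0.8], [0.2, 0.5]])
b = np.array([[0.0, 0.1], [0.7, 0.5]])
z = np.array([[1.0, 2.0], [3.0, 4.0]])
h = np.array([[0.1, 1.5], [3.2, 6.2]])

rgb_z = rgb_ask(z, r, g, b)   # shape (3, 2, 2)
hue_z = mch_ask(z, h)         # shape (6, 2, 2)
```

In this example the grey pixel at position (1, 1) has no dominant component and therefore vanishes from all three RGB-ASK images, whereas every pixel appears in exactly one of the six MCH-ASK images.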
Cross-angle Transformation Applied to Modulated Images
The images obtained with RGB-ASK and MCH-ASK are still range images. If the µ parameter is kept small (typically µ = 0.1), the images obtained with H-AM can also be considered as range images. For this reason, the analysis of the modulated images is performed by considering only their geometry. Nevertheless, it should be remembered that the modulation is based only on colors.
One way to characterize a 3D image is to compute a set of normals [8–10]. If no assumptions are made on the geometry of the object, these normals can be sampled randomly. The sampling is totally random and is related to neither the geometric nor the color information, the only condition being that the sampled point belongs to the object. The precision of the description depends on the number of normals. Unless very similar objects are considered, this number can be small [10]. In order to obtain translation and rotation invariance in space, the angles between each pair of normals [9,10] are calculated by using the scalar product

$$\theta(\mathbf{n}_i, \mathbf{n}_j) = \cos^{-1}\!\left( \frac{\mathbf{n}_i \cdot \mathbf{n}_j}{\lVert \mathbf{n}_i \rVert\, \lVert \mathbf{n}_j \rVert} \right) \tag{16}$$
This is called the cross-angle transformation [9]. From this set of angles, it is possible to build a histogram [10]. The horizontal scale of this histogram corresponds to the angle between a pair of normals, while the vertical scale corresponds to the occurrence of a given angle. A logarithmic scale is used for the horizontal scale in order to obtain a better description of small features which correspond to small angles. The number of histograms is one for H-AM, three for RGB-ASK and six for MCH-ASK, since to each modulated image corresponds a histogram. In order to classify an object, the unknown histograms are compared to a set of reference histograms by calculating an error defined as

$$\mathrm{err} = \sum_{n} \sum_{i} \bigl| \mathrm{hist}_{in}(i; n) - \mathrm{hist}_{ref}(i; n) \bigr| \tag{17}$$

where i corresponds to the ith element of the histogram, n is channel n, hist_in is the input histogram and hist_ref is the reference histogram. The classified object corresponds to the smallest error.
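The chain from sampled normals to a classification decision, Eqns (16) and (17), can be sketched as below. Everything specific in this sketch is an assumption made for illustration: the function names, the estimation of normals from finite differences of the range image, a background range of zero, the number of bins, the smallest angle of the logarithmic axis, the normalization of the histograms, and the two synthetic surfaces used as references.

```python
import numpy as np

def sample_normals(z, p=100, rng=None):
    """Surface normals (-dz/dx, -dz/dy, 1) at p randomly chosen object pixels."""
    rng = np.random.default_rng() if rng is None else rng
    dz_dy, dz_dx = np.gradient(z)            # axis 0 is y (rows), axis 1 is x (columns)
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(z)])
    ys, xs = np.nonzero(z > 0)               # assumption: background pixels have range 0
    pick = rng.choice(ys.size, size=min(p, ys.size), replace=False)
    return normals[ys[pick], xs[pick]]

def cross_angles(normals):
    """Eqn (16): angle between every pair of normals."""
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    i, j = np.triu_indices(len(unit), k=1)
    cosines = np.clip(np.sum(unit[i] * unit[j], axis=1), -1.0, 1.0)
    return np.arccos(cosines)

def log_histogram(angles, bins=16, min_angle=1e-3):
    """Occurrences of angles on a logarithmic angle axis, normalized to sum to 1."""
    edges = np.geomspace(min_angle, np.pi, bins + 1)
    counts, _ = np.histogram(np.clip(angles, edges[0], edges[-1]), bins=edges)
    return counts / max(counts.sum(), 1)

def histogram_error(hist_in, hist_ref):
    """Eqn (17): sum of absolute bin differences over all bins and channels."""
    return float(np.sum(np.abs(np.asarray(hist_in) - np.asarray(hist_ref))))

def classify(hist_in, references):
    """Name of the reference whose histograms give the smallest error."""
    return min(references, key=lambda name: histogram_error(hist_in, references[name]))

# Toy example: a paraboloid and a tilted plane as two single-channel references.
yy, xx = np.mgrid[0:32, 0:32]
bowl = 50.0 + 0.05 * ((xx - 16.0) ** 2 + (yy - 16.0) ** 2)
plane = 50.0 + 0.3 * xx
rng = np.random.default_rng(1)
references = {
    "bowl": log_histogram(cross_angles(sample_normals(bowl, 100, rng))),
    "plane": log_histogram(cross_angles(sample_normals(plane, 100, rng))),
}
unknown = log_histogram(cross_angles(sample_normals(bowl, 100, rng)))  # fresh sampling
label = classify(unknown, references)
```

For the multichannel algorithms the same `histogram_error` applies unchanged if the three or six histograms of an object are stacked into one array, since the sum in Eqn (17) runs over both bins and channels.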
Single-channel Hue-shift Keying Applied to Cross-angle Transformation
Binarization can also be useful when computing the cross-angle transformation: the angle between a pair of normals is calculated only if the hue difference between the corresponding points is higher than a given threshold

$$\bigl| h(x_i, y_i) - h(x_j, y_j) \bigr| > \sigma \tag{18}$$
where σ is the threshold. This means that the scalar product is computed only if the colors are distinct enough. This is
not an arbitrary rule because, in many cases, the human visual system is more likely to consider the relations between two surfaces if they have a distinct color. Surfaces which have the same color are, most of the time, analysed as a unique surface. This behavior is roughly described by Eqn (18). If the object has a uniform color, it is assumed that Eqn (18) is satisfied by default. Once the set of angles has been determined, the histogram of the normals can be computed. The classification is then performed, as described in the previous section. This modulation is called single-channel hue-shift keying, or SC-HSK. It should be noticed that only non-modulated images are used for SC-HSK.
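The selection rule of Eqn (18) can be sketched as a filter on the pairs of normals. The function name and the test data are hypothetical, and two points are assumptions: the hue difference is taken as a plain absolute difference, as Eqn (18) is written, and the uniform-color default is implemented by keeping all pairs whenever no pair exceeds the threshold, which is only one possible reading of the text. The normals are those of the non-modulated range image.

```python
import numpy as np

def sc_hsk_angles(normals, hues, sigma=np.pi / 5):
    """Angles between pairs of normals whose hues differ by more than sigma."""
    normals = np.asarray(normals, dtype=float)
    hues = np.asarray(hues, dtype=float)
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    i, j = np.triu_indices(len(unit), k=1)
    keep = np.abs(hues[i] - hues[j]) > sigma       # Eqn (18)
    if not keep.any():
        # Assumption: a uniformly colored object satisfies Eqn (18) by default.
        keep[:] = True
    cosines = np.clip(np.sum(unit[i] * unit[j], axis=1), -1.0, 1.0)
    return np.arccos(cosines[keep])

# Three sampled points (hypothetical): two on a horizontal face, one on a vertical face.
normals = [[0, 0, 1], [0, 0, 1], [1, 0, 0]]
mixed = sc_hsk_angles(normals, hues=[0.0, 0.1, 2.0])     # pair (0, 1) is rejected
uniform = sc_hsk_angles(normals, hues=[1.0, 1.0, 1.0])   # all three pairs are kept
```

The resulting angles can then be turned into a histogram and compared with the reference set exactly as for the modulated images.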
Experimental Results
The reference set for H-AM, RGB-ASK, MCH-ASK and SC-HSK is made up of a vase named per1, another vase named por1, a vase named per2 which has the same shape as per1 but a uniform red color, and a vase named por2, which has the same shape as por1 but a uniform green color. It is important to keep in mind that per1 has exactly the same geometrical shape as per2 and that the same remark applies to por1 and por2. The color illustrations of per1 and por1 are shown in Figures 1 and 2 [11]. Although the background of the illustrations is uniform, the proposed algorithms are not limited to such a background. As a matter of fact, the background can easily be eliminated by thresholding the range image, the threshold being chosen to be smaller than the minimum range of the object. For each image, the H-AM, RGB-ASK, MCH-ASK and SC-HSK modulations are calculated, the cross-angle transformation is computed, and the histograms are built. The parameters are chosen as follows: µ = 0.1 for H-AM and σ = π/5 for SC-HSK. It is important to notice that the sampling of the normals is different for each image, which means in particular that a different sampling is used for the unknown and the reference set. The classification is performed by comparing the histograms with the help of Eqn (17). The smallest error corresponds to the closest match. The errors corresponding to H-AM, RGB-ASK, MCH-ASK and SC-HSK are shown in Tables 1, 2, 3 and 4, respectively. In all cases, the objects have been correctly identified. It is possible to distinguish between per1 and per2 and between por1 and por2 even though the objects in each pair have the same shape (only their color distribution is different). It should be noticed that the discrimination is good even though the shapes of the objects are very similar. The classification is made only on the basis of geometry and not on the colors. The color distribution is used only for analysing the geometry.
The number of sampling points is 100, which is much smaller than the number of points forming the image (65 536). This shows that it is possible to obtain a good discrimination even if the sampling of normals is sparse. For each algorithm, the classification is based on the histograms of the normals. A detailed analysis of how classification can be done with such histograms when the objects do not belong to the training set, are very similar or are noisy can be found in [10] and [12].

Figure 1. Color illustration of per1.

Figure 2. Color illustration of por1.

Table 1. Errors for H-AM (µ = 0.1)
Input    per1    per2    por1    por2
per1      0.0     1.7     1.0     1.4
per2      1.7     0.0     1.7     1.6
por1      1.0     1.7     0.0     1.6
por2      1.4     1.6     1.6     0.0

Table 2. Errors for RGB-ASK
Input    per1    per2    por1    por2
per1      0.0    11.5    10.2    11.1
per2     11.5     0.0    11.2    10.4
por1     10.2    11.2     0.0    11.1
por2     11.1    10.4    11.1     0.0

Table 3. Errors for MCH-ASK
Input    per1    per2    por1    por2
per1      0.0    28.8    21.9    29.3
per2     28.8     0.0    14.4    10.3
por1     21.9    14.4     0.0    16.3
por2     29.3    10.3    16.3     0.0

Table 4. Errors for SC-HSK (σ = π/5)
Input    per1    per2    por1    por2
per1      0.0     0.3     1.5     1.5
per2      0.3     0.0     1.5     1.5
por1      1.5     1.5     0.0     0.5
por2      1.5     1.5     0.5     0.0
Considerations on Real-Time Implementation
In many applications, real-time, or at least fast, implementation is needed. The algorithms described in the previous sections are suitable for this purpose. The input for RGB-ASK is made up of an RGB image and a range image. These images can be provided in real time by the range camera. Now, let us consider an image of dimension M × N. According to Eqns (12) and (13) it is necessary to compare the elements of three pairs of matrices, namely r(x,y)>g(x,y), r(x,y)>b(x,y) and g(x,y)>b(x,y), in order to generate the corresponding binary masks (3MN operations). Such comparisons can be performed in parallel. This is followed by the application of the binary masks to the range image, which requires 3MN operations. This can be performed rapidly, since no multiplications are involved and the masks can be applied in parallel. Finally it is necessary to compute a histogram for each of the modulated images by using Eqn (16). This is the most time-consuming part of the algorithm, since it involves computing derivatives and scalar products. Nevertheless, as shown in the previous sections, the number of points involved is very low. When P points (P ≪ MN) are sampled on the object, 3P normals (there are three modulated images) and 3C(P, 2) scalar products (where C(P, 2) is the number of pairs that can be formed from P normals) need to be calculated. It is possible to compute the derivatives and the scalar products in parallel. For all these reasons, this algorithm is suitable for parallel implementation.
MCH-ASK involves the calculation of the hue. This can be done in real time using a dedicated board. This algorithm involves more calculations than RGB-ASK for the modulation part, because one has to determine whether the hue belongs to an interval, and not only compare two numbers. Compared to RGB-ASK, this algorithm involves twice the number of operations in order to compute the histograms, because the number of modulated images is six instead of three. On the other hand, SC-HSK involves the calculation of only P (P ≪ MN) normals and C(P, 2) scalar products. As shown by Eqn (8), H-AM involves MN multiplications for the modulation. As in the previous cases, these multiplications can be performed in parallel. Only one histogram is involved in the classification process. As we can see, all these algorithms are suitable for parallel implementation.
If no dedicated board is available for computing the hue, it is preferable to resort to the RGB-ASK algorithm, which directly uses the information available from the camera. If such a board is available, SC-HSK should be the fastest algorithm, since it has the smallest number of operations. It should be noticed that the comparison of the unknown histogram with the reference set involves no multiplications, as shown by Eqn (17).
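The operation counts quoted in this section can be tabulated with a few lines of Python. The figures below follow the counts given in the text; the image size of 256 × 256 is an assumption chosen because it yields the 65 536 points mentioned in the experimental section, and the helper name is hypothetical.

```python
from math import comb

def histogram_cost(p, channels):
    """Normals and scalar products needed to build `channels` histograms from p samples."""
    return {"normals": channels * p, "scalar_products": channels * comb(p, 2)}

M, N, P = 256, 256, 100   # assumption: 256 x 256 pixels, 100 sampled points

costs = {
    "H-AM": histogram_cost(P, 1),      # one modulated image
    "RGB-ASK": histogram_cost(P, 3),   # three modulated images
    "MCH-ASK": histogram_cost(P, 6),   # six modulated images
    "SC-HSK": histogram_cost(P, 1),    # non-modulated image only
}

rgb_ask_pixel_ops = 3 * M * N + 3 * M * N   # comparisons plus mask applications
h_am_multiplications = M * N                # one multiplication per pixel, Eqn (8)

for name, cost in costs.items():
    print(name, cost)
```

With P = 100 there are C(100, 2) = 4950 pairs per histogram, so even the six-channel case requires 29 700 scalar products, far fewer than the number of pixels handled in the modulation step of RGB-ASK.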
Conclusions
Classification of 3D polychromatic images based on AM and ASK has been performed. In the case of AM, the range is modulated by the hue distribution. The ASK approach is based on RGB and hue images. From the three RGB images, three binary images are created in order to modulate the range function. In the case of the hue image, the hue interval is divided into six channels which correspond to six binary images that modulate the range distribution. In all cases, the cross-angle distribution of the normals is computed and the histograms of the normals are built. These histograms are compared to a reference set by computing an error. It is also possible to compute the angles between normals directly on non-segmented images by imposing a binary-type decision: the angle is calculated only if the hue difference between the corresponding points is higher than a given threshold.
RGB-ASK directly uses the information available from the camera: this can be an advantage in real-time implementations. MCH-ASK is tightly bound to the color distribution of the object: this means that a more precise description of the color distribution of the object can be obtained. SC-HSK reproduces a characteristic of the human visual system: it considers the geometrical relations between two points only if they have distinct colors. SC-HSK has a closer relation to the 3D information than the other three algorithms, since, strictly speaking, the range image is not modulated by the color information. This can be seen in Table 4, where the error between objects that have the same geometrical structure (per1 and per2, por1 and por2) is much smaller than the error between objects whose structure differs. All algorithms are suitable for parallel implementation, and the sampling of the normals is sparse, so a fast implementation should be possible with proper hardware.
Finally, it should be noticed that 3D polychromatic classification should be more accurate than a standard spectral analysis, for two reasons. Firstly, spectral analysis is based only on colors, while 3D polychromatic classification is based on both colors and geometry. Secondly, 3D polychromatic analysis does not introduce a dichotomy between color and geometry: the two are not treated as independent quantities, which indeed they are not, since the color is a function of the coordinates as well as the range. Thus a more realistic description of a 3D polychromatic object can be obtained.
Acknowledgements
The author would like to thank the National Research Council of Canada, and especially Mr Marc Rioux for the polychromatic range images, and the Research Council in Natural Sciences and Engineering of Canada for funding this project. This project was also carried out with the financial support of CICYT Project TAP93–0667-C03–03 of Spain.
References
1. Stremler, F.G. (1990) Introduction to Communication Systems. Reading: Addison-Wesley Publishing Company.
2. Blais, F., Rioux, M. & Domey, J. (1991) Optical range image acquisition for the navigation of a mobile robot. IEEE International Conference on Robotics and Automation, 2574–2580.
3. Blais, F., Rioux, M. & Maclean, S.G. (1991) Intelligent, variable resolution laser scanner for the space vision system. Acquisition, Tracking and Pointing V, Proc. SPIE 1482, 473–479.
4. Rioux, M., Beraldin, J.A., O’Sullivan, M. & Cournoyer, L. (1991) Eye-safe laser scanner for range imaging. Applied Optics 30, 2219–2223.
5. Russ, J.C. (1992) The Image Processing Handbook. London: CRC Press.
6. Hill, F.S. (1990) Computer Graphics. New York: Macmillan Publishing Company.
7. Gillespie, A.R., Kahle, A.B. & Walker, R.E. (1986) Color enhancement of highly correlated images. I. Decorrelation and HSI contrast stretches. Remote Sensing of Environment 20, 209–235.
8. Faugeras, O. (1993) Three-dimensional Computer Vision. London: MIT Press.
9. Oka, R., Kasvand, T. & Rioux, M. (1985) Cross-angle transform for viewer-independent recognition of 3-D objects. IEEE Proceedings on Computer Vision and Pattern Recognition, 470–475.
10. Paquet, E., Rioux, M. & Arsenault, H.H. (1995) Invariant pattern recognition for range images using phase Fourier transform and neural network. Optical Engineering 34, 1178–1183.
11. The 3D polychromatic images are from Mr Marc Rioux of the Institute for Information Technology, National Research Council, Building M-50, Ottawa (Ontario), K1A 0R6, Canada.
12. Paquet, E., Rioux, M. & Arsenault, H.H. (1995) Recognition of faces by using the phase Fourier transform. Pure and Applied Optics 4, 709–721.