Sewage pipe image segmentation using a neural based architecture


Pattern Recognition Letters 17 (1996) 363-368

Javier Ruiz-del-Solar *, Mario Köppen
Fraunhofer-Institut IPK Berlin, Department of Pattern Recognition, Pascalstr. 8-9, 10587 Berlin, Germany
Received 20 December 1995

Abstract

This article describes a neural architecture for real-time segmentation of sewage pipe video images, which is based on processing mechanisms of the mammalian visual system and corresponds to a modified version of the Boundary Contour System. Remarkable aspects of the proposed architecture are: the use of odd-symmetric 2-D Gabor filters as receptive fields of the neurons at the Oriented Filtering Stage; the use of neurons with collinear and noncollinear receptive fields at the Cooperation Stage; and the pre-processing of the input signal using a Spatial Complex Logarithmic Mapping.

Keywords: Boundary contour system; Gabor filters; Spatial complex logarithmic mapping

1. Introduction

The periodical inspection of sewage pipes is necessary to avoid the ecological damage produced when the transported substances leak into the environment. Such leakage is produced by corrosion, fissures, and the splitting of pipe sections. The small diameter of the pipes does not allow direct human inspection, but visual inspection through the processing of inner images is a practical way to perform this task. The inspection is performed through the processing of a video signal, which is taken by a CCD camera mounted on a remote-controlled camera-car that moves through the inner parts of the pipes. Fig. 1 shows this inspection system. The present work is part of a research project (Lohmann and Nickolay, 1994) to automate the visual inspection process of pipes. Automating the visual inspection process saves human time and effort and can provide accurate, objective, and reproducible results. Additionally, automation can eliminate human errors resulting from fatigue or lack of concentration.

* Corresponding author. Email: [email protected]

0167-8655/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved
SSDI 0167-8655(95)00132-8


Fig. 1. Sewage pipes' inspection system.



Fig. 2. Block diagram of the architecture.

The proposed automatic inspection system works as follows: the camera-car moves through the pipes and looks for the pipes' sockets, because most of the pipes' faults are located in the sockets' surrounding area. Each time a socket is detected the camera-car films its surrounding area, and this information is later analyzed off-line. A complete description of the system can be found in (Lohmann, 1993). From the above description it is clear that the automatic socket detection must be performed in real time. The detection system must also be very robust, because of the variable environmental conditions inside the pipes (variable illumination, lack of equidistance between the sockets, presence of physical obstacles such as solid substances and water, etc.). To implement the detection of sockets we tried several different approaches, such as edge detectors, morphological operators, and classical neural networks (Perceptron and Back Propagation). However, the results were not satisfactory. For this reason a new, very robust, neural-based architecture was designed, whose mechanisms are motivated and justified by evidence from psychophysics and neurophysiology. These mechanisms were adapted and simplified taking into account the main system characteristics: real-time processing, variable environmental conditions inside the pipes, and some a priori knowledge of the physical system properties (geometry of the pipes, CCD camera, and camera-car). The block diagram of this neural-based architecture is shown in Fig. 2. The architecture is composed of three subsystems: the PSS (Preattentive Segmentation Subsystem)¹, the ORS (Object Recognition Subsystem), and the FS (Foveation Subsystem). The PSS segments the input image, or

¹ Preattentive segmentation refers to the ability of mammals to perceive textures without any sustained attention.

more exactly, a reduced image obtained from the original one. It has two inputs: the VIS (Video Input Signal) and the PFS (Parameter Feedback Signal), a signal coming from the ORS that allows the adjustment of local parameters. The ORS detects the pipes' sockets, taking as input the output of the segmentation process (SOS - Segmentation Output Signal). Finally, the FS keeps the camera focus centered in relation to the main axis of the pipes. It receives an input signal (SMS - Spatial Mapping Signal) from the PSS and sends the FFS (Foveation Feedback Signal) to the camera-car. The purpose of this article is to describe the Preattentive Segmentation Subsystem. This description is presented in Section 2. In Section 3 results and preliminary conclusions are given.
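As a rough illustration of the signal flow of Fig. 2, the three subsystems and their signals (VIS, PFS, SOS, SMS, FFS, named as in the article) can be sketched as below; the subsystem bodies are placeholders, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class PSS:                      # Preattentive Segmentation Subsystem
    local_params: float = 1.0   # adjusted through the PFS feedback
    def segment(self, vis):
        sos = [p * self.local_params for p in vis]   # SOS: segmented signal
        sms = vis[: len(vis) // 2]                   # SMS: spatially mapped signal
        return sos, sms

@dataclass
class ORS:                      # Object Recognition Subsystem
    def detect(self, sos):
        found = max(sos) > 0.5                       # placeholder detection rule
        pfs = 0.9 if found else 1.1                  # PFS: parameter feedback
        return found, pfs

@dataclass
class FS:                       # Foveation Subsystem
    def center_on_axis(self, sms):
        return sum(sms) / len(sms)                   # FFS: placeholder correction

def inspection_step(vis, pss, ors, fs):
    sos, sms = pss.segment(vis)          # PSS: VIS (+ PFS) -> SOS, SMS
    found, pfs = ors.detect(sos)         # ORS: SOS -> detection + PFS
    pss.local_params = pfs               # feedback closes the PSS loop
    ffs = fs.center_on_axis(sms)         # FS: SMS -> FFS to the camera-car
    return found, ffs
```

The point of the sketch is the wiring: the PFS feedback from the ORS tunes the PSS, while the FS works only from the PSS's spatial mapping signal.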

2. Preattentive Segmentation Subsystem (PSS)

The Preattentive Segmentation Subsystem (PSS) is formed by three modules (see the block diagram in Fig. 3): the SCLM (Spatial Complex Logarithmic Mapping), the DOI (Discount of Illuminant), and the SBCS (Simplified Boundary Contour System). The following subsections describe each module of the PSS.

2.1. SCLM (Spatial Complex Logarithmic Mapping)

The SCLM module performs a complex logarithmic mapping of the input signal (see Fig. 4). This mapping is given by

(u, v) = (ln √(x² + y²), tan⁻¹(y/x))    (1)

with (x, y) the original coordinates and (u, v) the transformed ones.
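As a minimal sketch, Eq. (1) can be turned into a sampling procedure: take logarithmically spaced radii and equally spaced angles around the pipe centre and read the image at those points. The grid sizes, radii, and centre below are illustrative choices, not the article's values.

```python
import numpy as np

def sclm(image, cx, cy, r_min, r_max, n_u=64, n_v=128, angle_span=2*np.pi):
    """Sample `image` on a log-polar grid centred at (cx, cy)."""
    h, w = image.shape
    # logarithmic sampling in the radial direction (u = ln r)
    u = np.linspace(np.log(r_min), np.log(r_max), n_u)
    # constant sampling in the angular direction (v = angle):
    # the same number of points in each angular sector
    v = np.linspace(0.0, angle_span, n_v, endpoint=False)
    r = np.exp(u)[:, None]                      # radii, shape (n_u, 1)
    x = np.clip(cx + r * np.cos(v), 0, w - 1).astype(int)
    y = np.clip(cy + r * np.sin(v), 0, h - 1).astype(int)
    return image[y, x]                          # mapped image, shape (n_u, n_v)
```

Under this mapping a rotation of the input about the centre becomes a circular shift along the v axis and a scaling becomes a shift along the u axis, which is the invariance the text refers to; restricting `angle_span` below 2π excludes the disturbed lower pipe region.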

Fig. 3. Preattentive segmentation subsystem (PSS).


Fig. 4. Complex logarithmic mapping.

Studies of optical nerves and visual image projection in the cerebral cortex show that the global retinotopic structure of the cortex may be characterized in terms of the geometric properties of that mapping (Schwartz, 1980; Wilson et al., 1990). The SCLM module takes advantage of the circular symmetry of the system by focusing the analysis on circular image segments, which allows a significant reduction of the data to be processed. This data reduction is produced by the logarithmic sampling of the input signal in the radial direction and by the constant sampling (the same number of points is taken) in each angular sector to be transformed. Additionally, this mapping provides an invariant representation of the objects, because rotations and scalings of the input signal are transformed into translations (Schwartz, 1980), which can be easily compensated. Finally, complete circular segments (360°) are not used, because normally there are water or solid sediments in the lower pipe region, which disturb the segmentation.

2.2. DOI - Discount of Illuminant

In this stage variable illumination conditions are discounted by a shunting on-center off-surround network (defined in (Grossberg, 1983)), which models the receptive field response of the ganglion cells of the retina. Image regions of high relative contrast are amplified and regions of low relative contrast are attenuated as a consequence of the discounting process.

2.3. SBCS - Simplified Boundary Contour System

The SBCS module corresponds to a simplified and modified version of the Boundary Contour System (BCS) developed at Boston University (Grossberg and Mingolla, 1985). The BCS model is based primarily on psychophysical data related to perceptual illusions. Its processing stages are linked to stages in the visual pathway: LGN Parvo → Interblob → Interstripe → V4 (see description in (Nicholls et al., 1992)). The BCS model generates emergent boundary segmentations that combine edge, texture, and shading information. The BCS operations occur automatically and without learning or explicit knowledge of the environment. The system performs an orientational decomposition of the input data, followed by short-range competitive and long-range cooperative interactions among neurons. The competitive interactions combine information from different positions and different orientations, and the cooperative interactions allow edge completion. The BCS has been shown to be robust and has been successfully used in different real applications, such as the processing of synthetic aperture radar images (Cruthirds et al., 1992), the segmentation of magnetic resonance brain images (Lehar et al., 1990; Worth, 1993), and the segmentation of images of pieces of meat in an industrial environment (Díaz Pernas, 1993). In general the standard BCS algorithm requires an amount of execution time that does not allow its utilization in our real-time application. For that reason the SBCS was developed. It uses monocular processing, only the "ON" processing channel, a single spatial scale, and three orientations (a description of these characteristics can be found in (Grossberg, 1994)). Each processing stage of the proposed model is explained below.

2.3.1. Oriented Filtering Stage (S1)

Two-dimensional Gabor filters, introduced by Daugman (1980), are used as oriented filters. These

filters model the receptive fields of simple and complex cells in the visual cortex. We use only odd-symmetric filters, which respond optimally to differences of average contrast across their axis of symmetry. Taking into account the circular symmetry of the images and their subsequent logarithmic mapping, we use only three oriented filters (see Fig. 5).

2.3.2. First Competitive Stage (S2)

Cells in this stage compete across spatial position within their own orientation plane. This is done in the form of a standard shunting equation with two additional terms: a tonic input (T) and a feedback signal (V) that comes from a later stage (the Feedback Stage). Our modified dynamic shunting equation is given by

Fig. 6. Bipole cells with collinear and noncollinear branches. The bipole cells (*) are not used.

dW_ijk/dt = -A W_ijk + (B - W_ijk)(J_ijk + C V_ijk + T) - W_ijk Σ_{(p,q)≠(i,j)} G_pqij J_pqk    (2)

At equilibrium, this equation is determined by

W_ijk = B (J_ijk + C V_ijk + T) / (A + C V_ijk + T + Σ_{(p,q)} G_pqij J_pqk)    (3)

where W is the output of this stage; J is the output of the Oriented Filtering Stage; G is a Gaussian mask; k is the orientation index; p, q, i, j are position indices; and A, B, C are constants.

2.3.3. Second Competitive Stage (S3)

At this stage competition takes place only across the orientation dimension, i.e. cells compete with other cells that have the same position but a different orientation. As in (Lehar et al., 1990), the equilibrium condition for this dynamic competition is simulated by finding the maximal response across the orientation planes for each image position and by multiplying all non-maximal values by a suppression factor.

Fig. 5. Oriented filter masks.
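As a hedged sketch of stage S1, an odd-symmetric 2-D Gabor mask can be built from a Gaussian envelope and a sine carrier; three such masks (k = 0, 1, 2) form the filter bank. The size, wavelength, and bandwidth parameters below are illustrative assumptions, not the article's values.

```python
import numpy as np

def odd_gabor(size=9, wavelength=4.0, sigma=2.0, theta=0.0):
    """Odd-symmetric 2-D Gabor mask for orientation `theta` (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate coordinates so the filter axis follows orientation theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2.0 * sigma**2))
    # sine carrier -> odd symmetry: responds to differences of average
    # contrast across the axis of symmetry, not to uniform regions
    return envelope * np.sin(2.0 * np.pi * xr / wavelength)

# three orientations, as used by the SBCS
bank = [odd_gabor(theta=k * np.pi / 3) for k in range(3)]
```

The odd symmetry shows up as antisymmetry of the mask about its centre, so the filter has zero response to a constant image.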

2.3.4. Oriented Cooperation Stage (S4)

The oriented cooperation is performed in each orientation channel by bipole cells that act as long-range statistical AND-gates. Unlike the standard BCS model, we use bipole cells whose receptive fields have collinear and noncollinear branches (see Fig. 6). These receptive fields have properties consistent with the spatial relatability property (deduced from studies performed by Kellman and Shipley (1991)), which indicates that two boundaries can support an interpolation between themselves when their extensions intersect at an obtuse or right angle. The cooperation in the left-half receptive field (L_ijk) is performed among neighbouring cells with the same orientation, and the cooperation in the right-half receptive field (R_ijk) among neighbouring cells across all the orientations. At equilibrium, the output of this stage is defined by

Z_ijk = SL_ijk SR_ijk / (D + SL_ijk SR_ijk),    k = 0, 1, 2    (4)
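Eq. (4) is what makes the bipole cell a "statistical AND-gate": the output is non-zero only when both half-field supports are active. A minimal sketch (the value of the constant D is an assumption):

```python
def bipole_output(SL, SR, D=1.0):
    """Eq. (4): saturating AND-gate of the two half-field supports."""
    return (SL * SR) / (D + SL * SR)
```

With SL = 0 or SR = 0 the output is 0, so no one-sided evidence can complete an edge; when both supports are large the output saturates towards 1.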


with

SL_ijk = Σ_{(p,q)∈L_ijk} Y_pqk F_pqijk,

SR_ijk = (1 / (r_maxk - r_0k + 1)) Σ_{r=r_0k}^{r_maxk} Σ_{(p,q)∈R_ijr} Y_pqr F_pqijr    (5)

and

F_pqijk = exp(-E ((|B_pqijk| - P)² + C_pqijk²)) |cos(Q_pqij - θ_k)|^R,

B_pqijk = (p - i) cos(θ_k) - (q - j) sin(θ_k),

C_pqijk = (p - i) sin(θ_k) + (q - j) cos(θ_k),

Q_pqij = tan⁻¹((q - j)/(p - i)),

θ_k = (k - 1) Δθ    (6)

with F_pqijk the receptive field kernel; B_pqijk and C_pqijk the rotated coordinates; Q_pqij the direction of position (p, q) with respect to the position (i, j); Y the output of the Second Competitive Stage; r_0k and r_maxk the ranges of relatable orientations; D, E, P, and R constants; and Δθ the angle between orientations.

Fig. 7. Input image (376 × 288 pixels).

2.3.5. Feedback Stage (S5)

Before cooperative signals are sent to the first competitive stage, a competition across the orientation dimension and a competition across spatial position take place in order to pool and sharpen the signals that are fed back. Both competitions are implemented in one processing step.
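The orientational competition used here (and in stage S3) admits a compact equilibrium approximation, following the description after (Lehar et al., 1990): keep the maximal response across the orientation planes at each position and multiply all non-maximal responses by a suppression factor. A sketch (the factor's value is an assumption):

```python
import numpy as np

def orientation_competition(Y, suppression=0.25):
    """Winner-keep, loser-suppress across the orientation axis.

    Y has shape (n_orientations, height, width).
    """
    winners = Y == Y.max(axis=0, keepdims=True)      # maximal plane per pixel
    return np.where(winners, Y, suppression * Y)     # suppress non-maximal
```

This sharpens the orientation signal at each position in a single processing step, as the text describes.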

3. Results and conclusions

As a preliminary example of the system's processing capabilities, Fig. 7 shows a sewage pipe image. In this image one can see a socket (in white). This image corresponds to a typical "good" image, but there are others where the visibility is worse or where physical obstacles can be observed. Fig. 8 shows the SCLM module output. It can be seen that the spatial mapping allows a great data reduction (more than a factor of ten), which produces an equivalent reduction in the processing time. Fig. 9 shows the Segmentation Output Signal. It can be seen that the input image (more exactly, the transformed image) is segmented into two areas: the white area, which corresponds to the socket area, and the black one, which corresponds to the rest of the image. In the upper right quadrant one can see noise that corresponds to some geometrical distortion produced by the mapping, but it does not disturb the sockets' detection. The segmented image is the input of the ORS, where the pipes' sockets are finally detected by a SOM (Self-Organizing Map) network. Our architecture is robust enough to recognize the pipes' sockets in almost all cases. At the moment we do not have an exact performance index. This is not easy to determine, because the system deals not only with typical images but also with non-typical ones (scenes with variable environmental conditions). We are currently performing this quantification by analyzing a large quantity of real data obtained by the camera-car working in a semi-automatic way.

Fig. 8. Spatial mapping signal (128 × 64 pixels).


Fig. 9. Segmentation output signal (128 × 64 pixels).

Finally, we want to point out that we think our processing system can also be used to process other kinds of images with circular symmetry, such as images from tubes, tunnels, wheels, etc. Additionally, we believe that more research must be performed on the Oriented Cooperation Stage, the most time-consuming stage, to improve global system performance.

References

Cruthirds, D., A. Gove, S. Grossberg, E. Mingolla, N. Nowak and J. Williamson (1992). Processing of synthetic aperture radar images by the boundary contour system and feature contour system. Proc. Internat. Joint Conf. on Neural Networks, IV, 414-419.

Daugman, J.G. (1980). Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research 20, 847-856.

Díaz Pernas, F. (1993). Arquitectura neuronal para la segmentación y el reconocimiento de imágenes texturadas en color, usando un modelo visual cromático [Neural architecture for the segmentation and recognition of color-textured images, using a chromatic visual model]. Ph.D. Thesis, School of Industrial Engineering, University of Valladolid, Spain.

Grossberg, S. (1983). The quantized geometry of visual space: the coherent computation of depth, form, and lightness. Behavioral and Brain Sciences 6, 625-657.

Grossberg, S. (1994). 3-D vision and figure-ground separation by visual cortex. Perception and Psychophysics 55, 48-120.

Grossberg, S. and E. Mingolla (1985). Neural dynamics of perceptual grouping: textures, boundaries, and emergent segmentations. Perception and Psychophysics 38, 141-171.

Kellman, P.J. and T.F. Shipley (1991). A theory of visual interpolation in object perception. Cognitive Psychology 23, 141-221.

Lehar, S., A. Worth and D. Kennedy (1990). Application of the boundary contour/feature contour system to magnetic resonance brain scan imagery. Proc. Internat. Joint Conf. on Neural Networks, San Diego, June 17-21, 1990, I, 435-440.

Lohmann, L. (1993). Untersuchung der Einsatzmöglichkeiten der Bildverarbeitung zur Automatisierung der Rohr- und Kanalanalyse [Investigation of the applicability of image processing for the automation of pipe and sewer analysis]. Diplomarbeit, Technische Universität Ilmenau, Germany.

Lohmann, L. and B. Nickolay (1994). System der kanten- und texturorientierten Szenenanalyse am Beispiel der Automatisierung in der Umwelttechnik [System for edge- and texture-oriented scene analysis, illustrated by automation in environmental engineering]. Mustererkennung 1994, 658-665.

Nicholls, J., A. Martin and B. Wallace (1992). The visual cortex. In: From Neuron to Brain: A Cellular and Molecular Approach to the Function of the Nervous System, 3rd edition. Sinauer Associates, Sunderland, MA.

Schwartz, E.L. (1980). Computational anatomy and functional architecture of striate cortex: a spatial mapping approach to perceptual coding. Vision Research 20, 645-669.

Wilson, H.R., D. Levi, L. Maffei, J. Rovamo and R. DeValois (1990). The perception of form: retina to striate cortex. In: L. Spillman and J. Werner, Eds., Visual Perception: The Neurophysiological Foundations. Academic Press, New York.

Worth, A.J. (1993). Neural networks for automatic segmentation of magnetic resonance brain images. Ph.D. Thesis, Boston University, Boston, MA.