Fast hough transform on a mesh connected processor array


Information Processing Letters 33 (1990) 243-248 North-Holland

10 January 1990

C.S. KANNAN* and Henry Y.H. CHUANG

Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260, USA

Communicated by David Gries Received 5 June 1989 Revised 21 August 1989

Keywords: Algorithm, parallel processing, pattern recognition, array architecture

1. Introduction

The Hough transform and its generalization have been found very useful in computer vision [4] and are frequently used in detecting the shape of object boundaries in image pattern analysis [1,3,6]. In the simplest form, the method involves transforming each of the edge pixels on an image line into a curve in the parameter space. The parameter space is defined by the parameters used to describe lines in the image space. A line in the image space is mapped into a point in the parameter space. The effectiveness of the method stems from the fact that, with a fixed size parameter space, the computation required grows linearly with the number of edge pixels, rather than quadratically. The amount of computation required in the Hough transform grows with the size of the parameter space, which determines the accuracy of the detection, and the number of edge pixels. Since the amount of computation is generally very large, it is essential to increase the speed of the computation, particularly in real-time image processing.

A number of methods to speed up the Hough transform have been reported [2,7-9]. Basically there are two approaches. One approach uses parallel processors to perform the computation intensive parts in parallel [2,7,9]. In the other approach the amount of computation is reduced through a hierarchical organization of the computation steps [8].

Although parallel processing is essential in computing the Hough transform, it is complicated by the fact that the computation needs global information. Ibrahim et al. reported parallel methods on Non-von, a fine-grain tree-structured SIMD machine with broadcast mechanism [7]. Silberberg reported a parallel method on GAPP [5], a mesh connected programmable array [9]. Chuang et al. reported a systolic array for an improved Hough transform [2]. In all the parallel methods reported, each processor processes either an edge pixel or a point in the parameter space. Assume that the size of the parameter space is n by n and the number of the edge pixels is N (generally $N \gg n$). The method by Ibrahim et al. requires at least $O(N)$ time ¹, Silberberg's method requires $O(n^2)$ time, and the systolic array by Chuang et al. takes $O(N + n)$ time.

In this paper, we present a method on a mesh connected torus processor array that computes the Hough transform in $O(n)$ time. The mesh connected torus architecture is chosen because it is effective in processing most of the low level image processing tasks. In all the previous methods, it is assumed that the input data are the x-y coordinates of the edge pixels. Our method can efficiently compute the Hough transform for the case where the input data are the x-y coordinates of the edge pixels as well as the case where the input is the image itself.

We first describe the Hough transform on sequential computers in Section 2. An efficient parallel Hough transform algorithm is then presented in Section 3. Finally in Section 4, we explain how to deal with the case when the processor array is smaller than N and/or $n^2$.

* Current address: Tartan Laboratories Inc., Pittsburgh, PA 15146, USA.

¹ The time for broadcasting the coordinates of the pixels from the root is not considered.

Volume 33, Number 5, INFORMATION PROCESSING LETTERS, 10 January 1990

0020-0190/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)

2. Hough transform on sequential computers

The Hough transform is based on the duality between points on a line/curve and the parameters of that line/curve. Specifically, a straight line in the $(x, y)$-image plane may be parametrized by the slope-intercept parameters $(m, c)$ where $y = mx + c$, or the $(r, \theta)$ parameters of the normal where $r = x\cos\theta + y\sin\theta$. If the latter parametrization is used, as illustrated in Fig. 1, digitally collinear edge pixels $P_1$ and $P_2$ will have as their duals in the $(r, \theta)$-plane the respective sinusoidal curves intersecting at a common point $(r_k, \theta_k)$ that parametrizes the digital line along which these edge pixels lie.

For a given image window, the corresponding finite-sized $(r, \theta)$-region can be appropriately quantized and treated as a 2-dimensional array of cells to keep track of the count of edge pixels mapped into the corresponding quantized regions. Extraction of the edge line segments in the image window is done by searching for cells of peak counts exceeding a certain threshold value. The computation intensive part is the computation of the counts in the array. The searching for counts exceeding a certain threshold can be done easily once the counts are available.

[Figure 1 panels: Image; Accumulator Array]

Fig. 1. Hough transform utilizing $(r, \theta)$ parametrization.

Assume that the image contains N edge pixels and that the size of the parameter space is $n \times n$, i.e., the range of $\theta$ is quantized into n values $\theta_0, \ldots, \theta_{n-1}$ and similarly the range of r is also quantized into n values $r_0, \ldots, r_{n-1}$. An $n \times n$ array is used to store the counts, which are initialized to zeros. For each of the N edge pixels and each of the n values of $\theta$, the value of r is computed and the count corresponding to $(\theta, r)$ is incremented as given in the following algorithm. $r_{res}$ is the resolution along the r direction.

    for every edge pixel (x, y) do
        for theta = theta_0, theta_1, ..., theta_{n-1} do
        begin
            r := (x cos theta + y sin theta) / r_res;
            count[theta, r] := count[theta, r] + 1
        end

The time complexity of this method is $O(nN)$, which is proportional to the product of the number of edge pixels and the number of quantized values along the $\theta$ direction. The computation is too slow for many applications, especially real-time applications, as N and n can be very large. Therefore, it is very desirable to have a faster method to compute the Hough transform on a mesh connected processor array.
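The sequential accumulation loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `hough_sequential`, the truncation of r to an integer index, the bounds check, and the choice of $\theta$ values in $[0, \pi/2)$ are all assumptions made for the example.

```python
import math

def hough_sequential(edge_pixels, thetas, r_res, n_r):
    """Accumulate Hough counts: one row per theta value, one column per r cell."""
    count = [[0] * n_r for _ in thetas]
    for x, y in edge_pixels:                      # N edge pixels
        for t, theta in enumerate(thetas):        # n quantized theta values
            # Assumed quantization: truncate r / r_res to an integer cell index.
            r = int((x * math.cos(theta) + y * math.sin(theta)) / r_res)
            if 0 <= r < n_r:                      # ignore cells outside the array
                count[t][r] += 1
    return count

n = 8
thetas = [k * math.pi / (2 * n) for k in range(n)]   # assumed range [0, pi/2)
pixels = [(3, y) for y in range(n)]                  # vertical line x = 3
count = hough_sequential(pixels, thetas, r_res=1.0, n_r=16)
print(count[0][3])   # 8: all eight pixels vote for (theta = 0, r = 3)
```

The two nested loops make the $O(nN)$ cost visible: every edge pixel casts one vote for every quantized $\theta$ value.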

3. An efficient parallel Hough transform algorithm

We assume that the processor array is n by n. Initially, we assume that the image contains $N = n \times n$ edge pixels and the parameter space is $n \times n$, i.e., both the number of edge pixels and the size of the parameter space fit the processor array. Later on, we discuss modifications to handle situations when N and/or the parameter space is greater than the size of the processor array. We refer to the processor in the ith row and jth column of the mesh as $P(i, j)$, $0 \le i \le n - 1$ and $0 \le j \le n - 1$.

Fig. 2. Storage inside a processor.

3.1. Phase one of the algorithm

The idea of phase one is to make each processor compute and accumulate the counts for all of the pixels in its column for one particular $\theta$ value. For example, processor $P(i, j)$ is responsible for computing and accumulating the counts for all of the pixels on the jth column for the value of $\theta = \theta_i$. It accumulates these counts in array h. In order to enable processor $P(i, j)$ to access all of the n pixels in its column, the 2-dimensional array of edge pixels is rotated along the vertical direction (i direction) n times.

Initially, the edge pixels and the sine and cosine values are loaded onto the processor array. Processor $P(i, j)$ contains the coordinates of one of the N edge pixels, and the values of $\sin\theta_i$ and $\cos\theta_i$. Also, the h array in each processor is initialized to all zeros. Processor $P(i, j)$ computes $r = (x\cos\theta_i + y\sin\theta_i)/r_{res}$ and increments the rth element of the h array. The coordinates of the edge pixels are then shifted (cyclically) to processor $P((i - 1) \bmod n, j)$. This process is repeated n times. The algorithm for phase one is given below.

The algorithm executed by processor P(i, j):

    loop n times
        r := (x cos theta_i + y sin theta_i) / r_res;
        h[r] := h[r] + 1;
        shift x and y to P((i - 1)(mod n), j);
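Phase one can be simulated sequentially to check the data movement. The sketch below is an illustration under stated assumptions, not the authors' implementation: the name `phase_one`, the nested-list layout `h[i][j][r]`, the truncation of r to an integer index, and the bounds check are all hypothetical choices. The cyclic shift is modelled by having row i take over the pixels previously held by row i + 1.

```python
import math

def phase_one(pixels, thetas, r_res, n_r):
    """Simulate phase one on an n x n mesh.

    pixels[i][j] is the (x, y) pair initially held by P(i, j);
    thetas[i] is the theta value assigned to row i.
    Returns h, where h[i][j] is the local count array of P(i, j).
    """
    n = len(pixels)
    held = [row[:] for row in pixels]
    h = [[[0] * n_r for _ in range(n)] for _ in range(n)]
    for _ in range(n):                                    # n cycles
        for i in range(n):
            c, s = math.cos(thetas[i]), math.sin(thetas[i])
            for j in range(n):
                x, y = held[i][j]
                r = int((x * c + y * s) / r_res)          # assumed truncation
                if 0 <= r < n_r:
                    h[i][j][r] += 1
        # P(i, j) sends its pixel to P((i - 1) mod n, j),
        # so row i now holds what row (i + 1) mod n held before.
        held = [held[(i + 1) % n] for i in range(n)]
    return h

n = 4
thetas = [k * math.pi / (2 * n) for k in range(n)]        # assumed theta range
pixels = [[(i, j) for j in range(n)] for i in range(n)]
h = phase_one(pixels, thetas, r_res=1.0, n_r=8)
print(h[0][0])   # counts held by P(0, 0) for theta_0 over column 0
```

After n cycles every processor has seen each of the n pixels of its column exactly once, which is why each local h array sums to n in this example.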

Before describing phase two of the algorithm, we consider a variation of phase one of the algorithm. In some applications, the processor array may contain the actual image (an image containing pixels with value one (edge pixels) as well as those with value zero). This image could have been the result of executing some other algorithm on the processor array. In such cases, if we want to compute the Hough transform on the actual image, we can make phase one of the algorithm more efficient: the multiplications can be replaced with less expensive operations like additions and subtractions.

Consider an $n \times n$ processor array and an $n \times n$ image. The coordinates of the pixel stored in $P(i, j)$ are $(i, j)$ when the algorithm starts. Therefore, processor $P(i, j)$ will process the pixels with the following coordinates in the n successive cycles: $\{(i, j), (i + 1, j), \ldots, (n - 1, j), (0, j), (1, j), \ldots, (i - 1, j)\}$. Note that the y-coordinates remain unchanged and the x-coordinates differ by one (mod n) in successive cycles. To take advantage of this, we compute and store $r_a = (i\cos\theta_i + j\sin\theta_i)/r_{res}$, $r_b = j\sin\theta_i/r_{res}$, and $r_c = \cos\theta_i/r_{res}$ once initially. In the very first cycle, r takes the value of $r_a$. In the following $(n - i - 1)$ cycles, we keep adding the value of $r_c$ to get the value of r. In the next cycle, r takes the value of $r_b$. We then keep adding the value of $r_c$ to get the value of r in the following $(i - 1)$ cycles. This modification uses only two initial multiplications, and the other multiplications are replaced by additions and subtractions. The modified algorithm is shown below.

The algorithm executed by processor P(i, j):

    r_a := (i cos theta_i + j sin theta_i) / r_res;
    r_b := j sin theta_i / r_res;
    r_c := cos theta_i / r_res;
    r := r_a;
    if pixel = 1 then h[r] := h[r] + 1;
    shift pixel to P((i - 1)(mod n), j);
    loop (n - i - 1) times
        r := r + r_c;
        if pixel = 1 then h[r] := h[r] + 1;
        shift pixel to P((i - 1)(mod n), j)
    end;
    r := r_b;
    if pixel = 1 then h[r] := h[r] + 1;
    shift pixel to P((i - 1)(mod n), j);
    loop (i - 1) times
        r := r + r_c;
        if pixel = 1 then h[r] := h[r] + 1;
        shift pixel to P((i - 1)(mod n), j)
    end

Processors stop after n cycles, since they would have encountered all of the pixels in their respective columns by that time.
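The incremental update of r can be checked against the direct formula with a short sketch. The function name `incremental_r` is hypothetical and the code assumes the pixel order described above (x = i, ..., n - 1, then 0, ..., i - 1 for a fixed y = j); it returns the unquantized r values so that the two computations can be compared.

```python
import math

def incremental_r(i, j, theta, r_res, n):
    """r values seen by P(i, j) over n cycles, using additions only after setup."""
    r_a = (i * math.cos(theta) + j * math.sin(theta)) / r_res  # first pixel (i, j)
    r_b = j * math.sin(theta) / r_res                          # pixel (0, j) after wrap
    r_c = math.cos(theta) / r_res                              # step when x grows by 1
    values = []
    r = r_a
    for _ in range(i, n):          # x = i, i + 1, ..., n - 1
        values.append(r)
        r += r_c
    r = r_b
    for _ in range(0, i):          # x = 0, 1, ..., i - 1
        values.append(r)
        r += r_c
    return values

n, i, j, theta = 6, 2, 4, math.pi / 5
direct = [(x * math.cos(theta) + j * math.sin(theta)) / 1.0
          for x in list(range(i, n)) + list(range(0, i))]
approx = incremental_r(i, j, theta, 1.0, n)
print(max(abs(a - b) for a, b in zip(approx, direct)))  # small rounding difference
```

Repeated floating-point addition accumulates a small rounding error relative to the direct product, so in practice the quantization of r near cell boundaries may differ slightly between the two variants.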

3.2. Phase two of the algorithm

Now we consider the second phase of the algorithm. At the end of the first phase, we have accumulated the counts for $(\theta_i, r_j)$, $0 \le j \le n - 1$, in the h arrays of the processors on the ith row. The contents of the corresponding elements of the h arrays on a row should be summed up to get the total counts, and this should be done in all of the n rows of processors.

Consider the case of n = 3 (Fig. 3). Processor $P(i, 0)$ sends $h(i, 0)$ to processor $P(i, 1)$. Processor $P(i, 1)$ adds $h(i, 1)$ to $h(i, 0)$ and sends the sum to processor $P(i, 2)$. Processor $P(i, 2)$ in turn adds $h(i, 2)$ to $h(i, 0) + h(i, 1)$ (which it received from $P(i, 1)$). Since these additions take place in a pipelined fashion, the total time required is 5 cycles (assuming addition and shifting can be done in the same cycle). Further, the pipelined summing and shifting operations take place in all of the rows in parallel. At the end of this phase, the ith row of the parameter space (corresponding to $\theta_i$) will be stored in the rightmost processor on the ith row, i.e., $P(i, n - 1)$. The algorithm for phase two is given below.

The algorithm executed by processor P(i, j):

    for k := 0 to n - 1 do
        receive the sum h(i, 0)[k] + ... + h(i, j - 1)[k] from P(i, j - 1);
        add h(i, j)[k] to the received sum;
        send the sum h(i, 0)[k] + ... + h(i, j)[k] to P(i, j + 1)

[Figure 3 shows, for n = 3, the arrays h(i, 0), h(i, 1) and h(i, 2) moving through row i and the final sums h(i, 0)[k] + h(i, 1)[k] + h(i, 2)[k] for k = 0, 1, 2.]

Fig. 3. Phase two of the algorithm.
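The pipelined summation along one row can be simulated cycle by cycle. The sketch below is a hypothetical model rather than the authors' code: the name `phase_two_row` and the `partial` table are assumptions, and each cycle is taken to cover one receive, add and send step, as the text assumes.

```python
def phase_two_row(h_row):
    """Simulate phase two on one row of processors.

    h_row[j] is the h array of P(i, j). Returns (totals, cycles), where
    totals is the array ending up in the rightmost processor.
    """
    n = len(h_row)            # processors in the row
    m = len(h_row[0])         # elements per h array
    # partial[j][k]: running sum for element k as it leaves P(i, j)
    partial = [[0] * m for _ in range(n)]
    cycles = 0
    for t in range(n + m - 1):
        for j in range(n):
            k = t - j         # P(i, j) starts at cycle j and handles element t - j
            if 0 <= k < m:
                incoming = partial[j - 1][k] if j > 0 else 0  # column 0 receives nothing
                partial[j][k] = incoming + h_row[j][k]
        cycles += 1
    return partial[n - 1], cycles

totals, cycles = phase_two_row([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(totals, cycles)   # [12, 15, 18] 5
```

For n = 3 the simulation takes 5 cycles, matching the count given in the text; in general the last processor starts at cycle n - 1 and needs n further steps, giving 2n - 1 cycles when the h arrays have n elements.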


A few comments are in order. Processors in the 0th (leftmost) column start executing at the 0th cycle (of the second phase), and they skip the first instruction as they have nothing to receive. Processors in the jth column start executing at the jth cycle. Processors in the (n - 1)th (rightmost) column start executing at the (n - 1)th cycle, and they skip the last instruction as they do not send the counts anywhere else. At the end of phase two, the ith row (corresponding to $\theta_i$) of the parameter space is stored in the h array of processor $P(i, n - 1)$. These values can either be sent to a host processor for selecting the parameters with counts exceeding a certain threshold, or the selection can be done by the processors on the last column.

3.3. Time complexity

Now we want to discuss the number of operations involved in the two phases of the above algorithm.

Phase one. In phase one each processor performs 2n additions, 2n multiplications, n divisions and n shifts. Assuming each of these operations takes one cycle, the total time for phase one is $6n = O(n)$.

Phase two. In phase two each processor performs n additions and n shifts. But since the processors in the last column begin execution only at the (n - 1)th cycle, the total time for phase two is $2n - 1 = O(n)$.

Efficiency. The efficiency, defined as the speed-up (i.e., serial time over parallel time) per processor, of our method is $nN/(n \cdot n^2) = N/n^2$. When N equals $n^2$ (or N is $O(n^2)$), the efficiency becomes a constant.
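The cycle counts and the efficiency figure above can be tabulated with a small helper. The function names `cycle_counts` and `efficiency` are hypothetical, and the efficiency uses the order-of-magnitude times from the text (serial time nN, parallel time n, $n^2$ processors) rather than exact cycle counts.

```python
def cycle_counts(n):
    """Cycles per phase under the one-cycle-per-operation assumption."""
    phase_one = 2 * n + 2 * n + n + n   # additions + multiplications + divisions + shifts
    phase_two = 2 * n - 1               # last column starts at cycle n - 1
    return phase_one, phase_two

def efficiency(n, N):
    """Speed-up per processor: (serial nN / parallel n) / (n * n processors)."""
    serial_time = n * N
    parallel_time = n
    return serial_time / (parallel_time * n * n)

print(cycle_counts(3))     # (18, 5)
print(efficiency(4, 16))   # 1.0, since N equals n^2
```

With N fixed at $n^2$ the ratio stays constant as n grows, which is the sense in which the efficiency "becomes a constant".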


4. [...] large [...]

So far we have assumed that the number of edge pixels as well as the parameter space fit the size of the processor array. We now consider ways of handling the situations when this is not the case.

First let us consider the case when the parameter space is larger than the size of the processor array. Let the size of the parameter space be $kn \times kn$, i.e., there are kn quantized values of $\theta$ and kn quantized values of r. Since there are kn values of $\theta$ for which the computations of phase one must take place, each processor is now responsible for processing k values of $\theta$ instead of just one value of $\theta$ as before. This means that the total computation time of phase one increases by a factor of k and becomes $O(kn)$. To find the computation time for phase two, note that now there are k different h arrays in each processor, and the size of each of these h arrays is kn. Therefore, a total of $k^2 n$ count values are to be summed up in each row of processors, and thus the total time for phase two becomes $O(k^2 n)$. The overall time of the algorithm is, therefore, $O(kn) + O(k^2 n) = O(k^2 n)$.

Now let us consider the second case when the number of the edge pixels is larger than the size of the processor array. Let the number of edge pixels be $M = m \times m$ where m is a multiple of n. In this case each processor must store in its local memory a pixel array of size $(m/n) \times (m/n)$. The time for phase one increases by a factor of $m^2/n^2$, giving a total time of $O(m^2/n) = O(M/n)$ for phase one. The time for phase two remains unchanged.

When both the pixel array as well as the parameter space are larger than the size of the processor array, a combination of the above two schemes can be used. In this case, the time complexity of the algorithm becomes $O(km^2/n + k^2 n) = O(kM/n + k^2 n)$.

References

[1] D.H. Ballard, Generalizing the Hough transform to detect arbitrary shapes, Pattern Recognition 13 (2) (1981) 111-122.
[2] H.Y.H. Chuang and C.C. Li, A systolic array processor for straight line detection by modified Hough transform, in: Proc. IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management (1985).
[3] R.O. Duda and P.E. Hart, Use of the Hough transformation to detect lines and curves in pictures, Comm. ACM 15 (1) (1972) 11-15.
[4] F. Evans, A survey and comparison of the Hough transform, in: Proc. IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management (1985) 378-380.
[5] W. Holsztynski, L. Coulter, K. McCoy and E. Cloud, The GAPP user's manual (rough draft), Martin Marietta Aerospace, Orlando, FL (1985).
[6] P.V.C. Hough, Method and means to recognize complex patterns, US Patent 3069654 (1962).
[7] H.A.H. Ibrahim, J.R. Kender and D.E. Shaw, The analysis and performance of two middle-level vision tasks on a fine-grained SIMD tree machine, in: Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, San Francisco, CA (1985) 248-256.
[8] H.W. Li, Fast Hough transform: A hierarchical approach, Comput. Vision, Graph. Image Process. 36 (1986) 139-161.
[9] T.M. Silberberg, The Hough transform on the geometric arithmetic parallel processor, in: Proc. IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management (1985) 387-393.