Signal Processing 17 (1989) 227-239 Elsevier Science Publishers B.V.
MINIMUM DIGITAL
CYCLE FILTERS
227
FAST
IMPLEMENTATION
BASED
ON
THE
LU
OF
TWO-DIMENSIONAL
DECOMPOSITION*
B. G. M E R T Z I O S Department of Electrical Engineering, Democritus University of Thrace, 67 I00 Xanthi, Greece A. N. V E N E T S A N O P O U L O S Department of Electrical Engineering, University of Toronto, Toronto, M5S 1A4, Canada Received 3 March 1988 Revised 20 June 1988
Abstract. A new architecture for two-dimensional filters is presented, which is based on the LU matrix decomposition and is characterized by minimum cycle time. The clock cycle time in the proposed structure is determined by one multiplication and three additions, which are the maximum number of arithmetic operations in the critical path, independently of the filter's order. This realization is characterized by all known advantages of decomposition structures (modularity, parallelism, reconfigurability) and may be used in real-time image processing applications. Zusammenfassung. Eine neue Architektur fiir bidimensionale Filter, die von einer LU Aufteilung ausgeht und die durch eine minimale Zykluszeit gekennzeichnet ist, wird vorgestellt. Die Clock Zykluszeit der vorgeschlagenen Struktur wird durch eine Multiplikation und drei Additionen bestimmt. Dies ist die maximale Anzahl von Rechnervorg~inge im kritischen Weg, ungeachtet des Orden des Filters. Diese Realisation wird von allen Vorziigen der Aufteilungsstrukturen gekennzeichnet (Modularifiit, Paralellismus, Rekonfigurabilit~it), und kann in Echtzeit-Bildverarbeitungsanwendungenverwendet werden.
R6sum6. Une nouvelle architecture pour les filtres bidimensionnels est pr6sent6e, bas6e sur la d6composition triangulaire sup-inf et caract6ris6e par un temps de cycle minimum. L'horloge du temps de cycle dans la structure propos6e est d6termin6 par une multiplication et trois additions qui sont le nombre maximum d'op6rations arithm6tiques dans la voie critique, ind6pendamment de l'ordre du filtre. Cette r6alisation est caract6ris6e par tousles avantages connus des structures de d6composition (modularit6, parall~lisme, reconfigurabilit6) et peut ~tre utilis6e dans les applications temps r6el de traitement d'images. Keywords. Two-dimensional digital filters, fast implementation techniques, minimum cycle time, LU matrix decomposition.
1. Introduction In
recent
approaches
d i m e n s i o n a l ( l - D ) infinite impulse response (IIR) digital filters has b e e n p r e s e n t e d [3], w h i c h is b a s e d years,
have
a
been
number proposed
of
different
for t h e
fast
i m p l e m e n t a t i o n o f d i g i t a l filters [3, 6, 7, 14, 17]. A m o n g o t h e r efforts, a r e a l i z a t i o n s t r u c t u r e o f o n e * The research reported in this paper was partially supported by the NATO Grant for international cooperation in research, No. 86/0063. 0165-1684/89/$3.50 (~ 1989, Elsevier Science Publishers B.V.
o n the d i r e c t f o r m II a n d m i n i m i z e s the n u m b e r of
arithmetic
operations
in
one
clock
cycle.
R e c e n t l y , the c o n f i g u r a t i o n o f [3] was e x t e n d e d to the t w o - d i m e n s i o n a l ( 2 - D ) case b y u s i n g the direct f o r m I [21]. In modern signal processing applications, the i n c r e a s e o f t h e t h r o u g h p u t rate ( a n d t h e r e f o r e o f
228
B.G. Mertzios, A.N. Venetsanopoulos / Fast implementation of 2-D digital filters
the allowed sampling rate) is the more important criterion; therefore, the minimization of the cycle time appears to be an equivalent objective. In order to clarify some notions, we give the following definitions: Critical path is the path which involves most multiplications and additions between two successive delays. Clock cycle time (z) is the time interval which is determined as the time needed for the execution of the intermediate computations in the critical path and the transfer of the signal from one delay to another. Throughput rate (R) is the processing rate of the filter and it is determined as the number of the produced samples at the output port in the unit of time. Sampling rate (S) is the input data rate and is less than or equal to the throughput rate. Throughput delay or latency is the time interval separating the appearance of an input sample at the input port from the appearance of the corresponding output sample at the output port. The clock cycle time is generally increased with the increase of the filter's order. This results in a decrease of the throughput rate in the implementation of higher order filters. However, following the minimum cycle approach, we can in general modify a given realization such that the critical path involves the minimum number of arithmetic operations, independently of the filter's order; i.e., the cycle time remains the minimum and the throughput rate is the maximum possible always. Therefore, we do not have to worry about any implications of the use of a higher order filter on the throughput rate. The only drawback of the minimum cycle approach is the increase of the latency, but usually latency is not of great importance. Among other architectures for 2-D digital filters, a new technique based on the decomposition theorem is presented in 19, 10, 20], which results in the implementation of an arbitrary IIR 2-D digital filter with a high degree of inherent parallelism and modularity. Special forms of the general Signal Processing
decomposition structui'e described above are the Jordan form decomposition (JD) [19], the singular value decomposition (SVD) [ 1, 18], and the lowerupper triangular (LU) decomposition [13]. A comparative study of the computational cost and of the throughput delay of the above structures with the canonical (direct) realizations is presented in [13]. Additional comparisons involving another decomposition structure and, specifically, the Walsh-Hadamard transform (WriT) decomposition [9, 10] are given in [15]. These structures result in filter architectures which are suitable for VLSI implementation [ 11 ]. In this paper, we shall present realization structures for arbitrary order 2-D finite impulse response (FIR) and IIR digital filters, which are based on the decomposition theorem and, specifically, on the LU decomposition, and are characterized by a minimum cycle time. Both the FIR and the IIR 2-D cases are considered. In the FIR filters, the critical path consists of only one multiplication and one addition, while in the IIR case, it consists of one multiplication and three additions. The proposed structure is characterized by the usual advantages of the decomposition structures and leads to real-time implementation of arbitrary order 2-D digital filters.
2. Matrix decomposition structures
The following theorem refers to the general matrix decomposition of a 2-D rational transfer function which describes a 2-D IIR digital filter. Theorem 1. The 2-D rational transfer function
n(z,, z:)
n(z,, z~)
H ( Z l , Z2) = - d ( Z l , z2) - 1 + d(Zl, Z2) tl 1
rtl 1
Y Y~ n,jz~ i=oj=o
1+~
(1) ~2 di~zI z'2
i=0j=O
(ij)~(o,o)
B.G. Mertzios, A.N. Venetsanopoulos/ Fast implementation of 2-D digitalfilters can be expressed by terms of the form ( z l - Z l i ) , (z2 - z2j ) only, where zli, z2j are constants. Feedback, which contains both variables zl and z2, is required to realize a nonseparable denominator.
229
permutation of (a) the rows and columns of N, and (b) the rows of Z2 and the columns of ZT, we arrive at an alternative form of (2), viz:
n( z, , z2) = 2V, A 2 2 , The proof of the above theorem is given in [20]. In general, in the matrix decomposition approaches, we write the 2-D polynomial n(z~, z2) in the form n(zl, z2) = ZTNZ2,
(2)
(3)
where A is a rectangular (n I + 1) × (ml + 1) matrix of rank p. resulting from the suitable permutation of the columns and rows of N, such that the first /~ successive principle minors are different from zero, i.e.,
where Zl=[1
z,
'''
z]'l]T,
Z2=[1
z2
"'"
z~'] x
al 1
...
a.l k
Dk & det
• ak l
and N=[ni-~.j-1] is a constant ( n l + l ) × ( m ~ + l ) coefficient matrix. The matrix N may be written, in general, in a number of ways as a product of two constant matrices R and S, i.e., N = RS. Then substituting N = R S in (2), we obtain
akk J
k= 1,2,...,/x
(4)
and 2 1 __ [-_A(0) --
n(z,, z2) = [ZTR][SZ2] = ~, ri(zl)si(z2).
•
"'
# 0,
L,~l
_A(1) , ZI
A(nl)]T , - . . ,
2 2 = [Z k(0), zk(l',
ZI
J
,
, , , , gk(ml)] T,
i=l
The smallest number of parallel branches for the realization of (2) is equal to the rank of the decomposition tz = rank N <~ or. Similarly, the denominator polynomial d ( z l , z2) in (1), which does not contain a constant term, is written in the form d ( z l , z2) = Z~DZ2, where D is a constant ( n 2 + l ) x (m2+ 1) coefficient matrix and Zl, Z2 have appropriate dimensions. The matrix D is also decomposed in a product of two matrices P, Q, i.e., D - - P Q . By proper selection of the decomposition matrices in the realization of the denominator polynomial, the realizability of the structure is ensured since no delay free loops appear in the feedback paths. In the sequel, we shall review concisely the LU decomposition which will be used as the basis for the development of the proposed minimum cycle decomposition.
with the a(i), i---O, 1 , . . . , n l , and k(j), j = 0, 1 , . . . , ml, taking the integer values [0,1,2,...,nl] and [ 0 , 1 , 2 , . . . , m l ] with the suitable permutation, respectively. Applying the LU decomposition theorem [5], we may write the matrix A in the form /.t
A = • d,L,U~,
(5)
i=l
where L,=[O
i-~
1
l,+,,•
l , + 2, , ' ' " l ,,+~.,] T is ( n l + l) x l,
Ui=[0
...
0
1
u~,i+l
(6a)
Ui•i+2•''Ui•ml+l] T
is ( m l + l ) × l ,
(6b)
2.1. The L U decomposition for i = l , 2 , . . . , i z In the general case, some of the first/x successive principle minors of the (nl + 1) × (ml + 1) matrix N may not be different than zero. Then by suitable
d l = D1,
and 02 d2 - D1,
• ° •
d~, = D~, . O._
(6c)
1
Vol. 17, No. 3, July 1989
230
B.G. Mertzios, A.N. Venetsanopoulos / Fast implementation of 2-D digital filters
The scalars Dk, k = 1, 2 , . . . , / z are given by (4) and the lgk, Ukh by A 1 2
...
k-1
g=k+l,...,n~+l,
A
2
Ukh --
.-"
k=l,2,...,~,
:)
k-1
(6d)
Ok
h=k+l,...,m~+l,
k= 1,2,..,,/~,
(6e) where the minor of A of order v is defined as
The realization (7) consists of/z parallel branches each one of which is a cascade configuration of two 1-D l~olynomials. It has been shown in the literature that the LU decomposition scheme is characterized by smaller computational and hardware requirements, relatively to the other decomposition structures [13, 15]. For this reason, the LU decomposition will be used as the basis for the development of the present minimum cycle realization structure. Note that if A is a square symmetric matrix, then Li=Ui, i = 1 , 2 , . . . , / z .
2.2. Decomposition structure form H
(6f) ai~.,h
•..
ai,,,k,
l ~ i l < i 2 < . . .
n(z,, z2)= E dJ,(z,)ui(z2)
(LU)
i=1 b~
= E d,u,(z2)i,(z~)
(UL),
(7)
i=1
where
l,(z,) = z~(°) + 12,z~(t) + ' ' " + l.,+,,lz~ ("0, (8a)
/2(z,) = z ~ t ' + 132z~(~)+ ' ' ' + / . , + , . 2 z ~ ("'), : (8b) l,.(z,) = z~ ~'-') + l,.+u,zaO')+ • • • + l., +i,rZ~(n'); (8c) ,,,(z~) = z~(°) + ,,,~z~') + . . • + ,,,.m,+,Z~(" ),
(8d)
The decomposition structure presented in [20], as well as the other realizations which are based on special matrix decompositions [1, 9, 10, 13, 15, 18, 19] are based on the direct form I realization (Fig. 1(a)). Specifically, the whole filter results from the cascade configuration of the array FIR filter HA(Z~, z2) = n(z~, z:) (Fig. 2(a)) with the all pole IIR filter HF(Z~, z2) = 1/d(z~, z2) (Fig. 2(b)), both realized in the matrix decomposition form. Here we introduce the decomposition structure form II (Fig. 3)~ which is based on the direct form II (Fig. l(b)) and results from the cascade configuration of the all-pole IIR recursive filter HF(Z~, z2)= 1/d(zl, z2) with the FIR nonrecursive filter HA(Z~, z2)= n(z~, z2), both implemented in the matrix decomposition form. Although the order of the sub-filters is reversed, the overall transfer function (1) is not affected since we deal with linear space-invariant (LSI) systems [8]. It is seen from Fig. l(b) that the intermediate variable W(z~, z2), which represents the output of the filter HF(Zl, z2), is related to the input and output by the relations W ( z, , z2) = X ( z, , z2) -
d(Zl, z2)
W ( z, , z2),
(9a)
u:(z~) = z~")+ u23z~(~) + . . . + U:.m,+,Z~ (m'),
(8e) Y ( z , , z2) = n(zl, z2) W ( z , , z2),
~(z~) = z~"-" + ,.,,.,,z~ (r) + . • . + Ur,,,,,+~Z~ (m'). Signal
Processing
(8f)
(9b)
in the z-domain. The form II is generally character-
B.G. Mertzios, A.N. Venetsanopoulos / Fast implementation of 2-D digital filters
X(ZI'Z2) [
~[ n ( z l ' z 2
231
Y(Zl,Z2 )
)
Nonrecursive
block
Ia,zl,z21 r a
Recursive block I
I
I
I
w( z~,z2)
X(Zl,Z z)
I
[ Y(Zl,Z2)
i: ~]n(zl'z2)l
II Nonrecursive block a(zl,z2)
l_
I-
b Fig. I. (a) Direct form 1 realization. (b) Direct form II realization of 2-D filters.
ized by reduced storage requirements, less hardware cost and increased throughput rate. In the decomposition structure form II, it is possible to share some of the delay elements Zl, z2 between the feedback and forward branches, as is shown in Fig. 3. Specifically, if h i = n 2 , we could choose the auxiliary matrix R to be equal to P. Then the polynomials r~(z~) would appear as common factors in the feedback and array blocks, thus resulting in a reduction of the storage requirements and the number of delays and registers. Similarly, if ml = m2, we could choose S = Q. In the general case where n2> n~ and m 2 > m l , we consider an augmented coefficient matrix
LoJ}n2- hi'
(10)
of dimensions ( n z + l ) x ( m l + l ) . Then the coefficient matrices )~, D, may be decomposed as follows:
N=RS,
D=RQ,
(11)
where R is an (nE+ l) x ( n : + l) square common auxiliary nonsingular matrix. The matrices S, Q are determined from (11) by S = R-1N, Q = R-1D. Using now the decomposed forms of n(zl, z2), d(Zl, z2), as they result by applying (11), we arrive at the following equations of the output Y(z~, z2) and the output of the feedback branch V(z,, z2) -d ( Z l , g2) W(Zl, Z2):
Y(z,, z2) = Y~ r~(z,)sdz2) W(zl, z2),
(12a)
V(zl, z2) = ~ r,(zl)q,(z2)W(zl, z2),
(12b)
i=1
i=1
where ~: = max(/z, v) = max(rank N, rank D).
(13)
The factor r~(z~), i = 1, 2 , . . . , ~ may be shared by the forward and feedback branches as is seen in Fig. 3. Note that we have now the same number Vol. 17, No. 3, July 1989
B.G. Mertzios, A.N. Venetsanopoulos / Fast implementation of 2-D digital filters
232
+
P
+
+
+
input X(Zl,Z2~
.
(zl,z 2)
:Y '+
a X(zi,z ~
.~
Y(zz'z2) .
-A
v~
+
+ +
+
"'"
~
+
+ .,.
b F i g . 2. ( a ) T h e
nonrecursive array block n(zl, z2) o f H(zl, z 2) with cycle time CA = M u + ( 2 + / z * ) A d . block 1/d(zl, z 2) o f H(z t , z2) with cycle time CF = M U + ( 3 + v * ) A d .
o f parallel branches in the feedback and forward blocks, since they share the factors r~(zl), i =
1, 2 , . . . , ~:. Since the (1,1)th element o f D is by construction zero, we can always select the decomposition Signal Processing
(b) The
recursive feedback
matrices P = R,Q such that each factor ri(Y.l)qi(z2) cannot have a constant term, i.e., cannot involve a delay free loop. Let us, for example, select R = [rq], Q = [ q q ] such that rl~ = 0 , i = 2 , . . . , n2,qll = 0 [20]. Then, the factors q~(z2) and rJzl), i=
B.G. Mertzios, A.N. Venetsanopoulos / Fast implementation of 2-D digital filters
X(Zl,Z 2)
j<
233
W(Zl,Z ) V(Zl,Z 2)
q~(z2)
rZ(z I )
r~(z I)
\
Y (Zl,Z 2)
s~(z2) Fig. 3. The general d e c o m p o s i t i o n structure form II.
2, 3 , . . . , ~, have as factors the delays z2 and z~, respectively. Therefore, they can be written in the form ql(z2) = zzq',(z2), (14)
ri(zl)=zlrl(zO,
i = 2 , 3 . . . . ,~,
where q'l(z2), rl(z0, i --2, 3 , . . . , ~ may not have a constant term. Fig. 3 shows that the delays z2, z~ are placed between the factors q'~(zO, r~(z2) and qi(z2), r'i(z~), i= 2, 3 , . . . , ~, respectively, in order to minimize the number of arithmetic operations per cycle.
3. The cycle time of the decomposition structures This section discusses the cycle time of the realization structures which are based on the general matrix decomposition and the LU decomposition theorem. In all the versions of the matrix decomposition structures, there are 1-D F I R filters, each one of which is a function of one of the two space variables only. Each 1-D filter may be realized in
direct form or in cascade form (by expressing the 1-D polynomial as a product of first- and secondorder terms with real coefficients [16]). Clearly, the direct form realization of the 1-D polynomials in each parallel branch is characterized by a reduced cycle time, since less arithmetic operations are performed in the critical path. Since here the figure of merit is the cycle time, we will consider in all subsequent analysis the direct form realization of the 1-D factors. In the sequel, we will consider separately the F I R filters, the I I R all pole filters, and the I I R filters realized by the direct forms I and II.
3.1. FIR filters Let an F I R filter of the form (1), where /~ = rank N<~ min(n~ + 1, ml + 1).
(15)
Then the array block of the general decomposition structure (Fig. 2(a)) requires at least g parallel branches. In order to avoid the use of time-shared buses and to reduce the cycle time, we use a tree structure of adders for the addition of the partial Vol. 17, NO. 3, July 1989
B.C. Mertzios, A.N. Venetsanopoulos / Fast implementation of 2-D digital filters
234
results in the parallel branches. Then, considering that ( / . t * + l ) A d > Mu, the cycle time is given by • g = M u + (2 +/~*)Ad,
the critical path, we modify slightly the decomposition structure such that in each parallel branch we have one multiplier in cascade with two 1-D polynomials with unity constant terms. This may be done always by dividing each 1-D polynomial by the coefficient of the constant term. Moreover, the realization shown in Fig. 4, which is based on the LU decomposition, has a reduced cycle time, given by
(16)
where Mu, Ad denote the time needed to execute one multiplication and one addition, respectively, and /~*= [log2/~].
(17)
In (17), the symbol Ix] denotes the first positive integer which is greater than or equal to x. Note that in order to have only one multiplication in
~'g.LU= max{Mu + 3Ad, 2Mu + Ad},
(18)
independent of the filter's order.
z I FIR
I
z I FIR
X(Zl,Z2)
Zl
"
z2
l
n1-1J
a
nl-~+l
ml_U+I
-.@
--O b
""
÷'
)
Fig. 4. (a) The realization of an arbitrary order 2-D FIR filter by the LU decomposition. (b) The direct form realization of the I-D FIR in zi, i= 1,2. Signal Processing
B.G. Mertzios, A.N. Venetsanopoulos / Fast implementation of 2-D digital filters
3.2. IIR all pole filters
by
Consider the simpler form of IIR all pole filters of the form 1
n2 "2
I+E
. (19)
(23)
2Mu + 3Ad,
independent of the filter's order (Fig. 5).
3.3. HR filters
E dijzlz2 ij
i=Oj=O ( i , j ) ~ (0,0)
Let v = rank O <~min(n2 + 1, m2+ 1).
(20)
The realization of the filter (19), which is based on the general matrix decomposition structure, uses also a tree structure of adders in the feedback block and therefore it has a cycle time given by (21)
where v*-- Flog2 "1.
q'F,LU - - - -
1
H F ( Z I ' 2"2) = d ( Z l , g2)
rF = 2Mu + (2 + v*)Ad,
235
(22)
It is considered that we have one multiplication in each parallel branch of the feedback block, while the constant terms of the 1-D polynomials rl(Zl), q~(z2), ri(zl), qi(g2), i=2, 3,..., ~ equal unity. The realization of (19), which is based on the LU decomposition, has a reduced cycle time given
We shall consider now an arbitrary 2-D IIR filter described by the 2-D rational transfer function of the form (1), where in general n(zl,z2) and d(zl, z2) are irreducible, relative prime 2-D polynomials. The IIR filter may be realized with the decomposition structures form I and II, which correspond to the known direct form I and II realization structures, respectively. As may be seen from Figs. 1, 2, and 5, both the realizations, which are based on the decomposition forms I and II, have a cycle time given by I"1= rtl = max{ ~'F, "/'A-~'-Ad} = max{2Mu + (2 + v*)Ad, M u + (3 +/z*)Ad}.
(24)
The disadvantage of the general matrix decomposition structures form I and II, which Y (Zl,Z_2)
z2 FIR
1_
zI FIR
---(!)-
zI FIR
--0-
I
[ z2F-
z 2 FIR
-
Z2 FIR
~
~
.
-
~
~
zI FIR
zI FIR
Fig. 5. The realization of an arbitrary order 2-D all pole IIR filter by the LU decomposition. Vol. 17, No. 3, July 1989
B.G. Mertzios, A.N. Venetsanopoulos / Fast implementation of 2-D digital filters
236
have been described above, is that their minimum cycle time depends on the order of the numerator and denominator polynomials of the realized filter. Therefore, the main objective in the following section will be to determine matrix decomposition structures with the minimum cycle time which is independent of the filter's order. Example 1. Let us specify the cycle time for an arbitrary second order 2-D IIR filter. Then we have, in general, n 1 =
m I =
/~= v = 3 ,
n 2 =
m 2 =
2,
/z*= v * = 2
and from (24), we obtain ~', = rn = 2Mu +4Ad.
4. Minimum cycle time decomposition structure The objective here is to determine a matrix decomposition realization structure of the given filter H(z~, z2) with the minimum cycle time, in conjunction with the minimum possible latency. It is pointed out that the main objective is to achieve minimum cycle operation, while the second one is the minimum latency relative to the minimum cycle structure. The matrix decomposition structure is characterized by arrays of parallel branches in both the nonrecursive block and in the feedback path of the recursive block. Each one of these branches is formed by cascading two 1-D polynomials. In order to obtain a structure with minimum computations in the critical path, which is independent of the filter's order, we modify the transfer function (1) as follows: = H(z,,
z2)z z'
=
f(z,, z2) z2)'
(25)
input-output relation becomes ,,~
n I
m1
r(z,, z2) = z.X(zl, z2) Y~ ~ nozlzzi i=Oj=O
n2
- Y(z,, z2) Z
i=o
m2
i
j
Y~ duz,z>
(26)
j=o
(i,j) ¢: ( 0 , 0 )
It is seen from (26) that the new filter's output A A corresponding to Y(z~, z2) is the Ym. = Y,.-1..; i.e., considering row by row scanning, it is delayed by only one pixel compared with the original filter's output Ym.. In the general case with k, I arbitrary positive integers, the output of the modified filter A k I is y,.. =Ym-k..-. while the power z~z2 would appear only in the first term in the r.h.s, of (26). The implication of this fact would be the availability of more delays in the implementation of the input block 7-1kz 2IX ( z I , z 2 ) n ( z l , z2) , which may result in a reduction of the computations in the critical path. Moreover, it is seen from (26) that k the delays z~, z2I do not influence the realization of the block Y(z~, Zz)d(z~, z2). It is clear now that the critical path will be in this latter block I>(Zl, zz)d(zl, z2). To this end we shall focus our attention to the decomposition of d(Zl, z2). In the sequel, for simplicity reasons, we shall consider the second order IIR 2-D filter. The polynomial d(zl, z2) may be properly decomposed, such that none of the products pi(zOqi(z2), i = 1, 2,. ~ , ¢, can contain a constant term. The LU decomposition may lead to an equivalent realization without delay-free loops and with smaller computational requirements. Specifically, d(z~, z2) can be written in the form [5] d(Zl,
g2)
=
AT ^ Z 1 DZ2
=[Zl
1
Fdlo z2]| 0
dll do~
dl2][1] do2|[ z2|
1
Ld2o d2, d~JLz~ j 3
= E d,(2TL)(uTz2),
(27)
i=1
with k and I being non-negative integers. Note that z~, z2 represent unit delays along a row or a column, respectively. To determine the desired minimum cycle structure, it is enough to set k = l , / = 0 . Thus the equation describing the Signal Processing
where L and U are the lower and upper triangular matrices respectively, given by L = [ I , 1,~ ,]=[L, 1o = 0 for i
L2
L3], (28a)
B.G. Mertzios, A.N. Venetsanopoulos/ Fast implementation of 2-D digitalfilters
vT] U=[ui-l,j-1] = uT i,
In (34) we made use of the relations noj = nlt~Oj, j = 1, 2, 3, and 0oo = 1. The following Lemma determines the number of additions in the critical path.
uij=Ofori>j,
vTI (28b)
dl:Ol,
D2
D3
d2=-~1,
(28c)
d3:o-~2
and D~, i = 1, 2, 3, are the principle minors o f / ~ which are different from zero, since/9 is supposed to be non-singular. Also, since doo = 0 in (27), we assign l~o =0. Using (28a)-(28c) and after some algebraic operations, the polynomial d(Zl, z2) may be written in the form
d ( z,, zz) = ( d,oz, + dzoz~)(1 + Uo,Z2+ UozZ~) + (d2 + 12,dzzZ)(z2 + u,2 z2)
2 22. + d3ZlZ
(29)
In (29) we made use of the relations d~o = d~loo= d~, dzo = d~lzo and Uoo= 1. Moreover, applying the LU decomposition, n(z~,z2) may be written as follows: 3
n(z,, z2) = ZTNZ2 = Z ni(Z~Ai)(oTz2) i--1
(30) where A and 0 are lower and upper trianglar matrices respectively, given by
zX=[&_,.j_,]=[A,
A2 zX3], 60 = 0 for i
(31)
Oo=Ofori
(32)
Forl
o=[0, ,.j 1]=|og|,
LolJ N~ n 1:
N 1,
n2 :
,
NI
m3 n3 :
-
-
N2
237
(33)
and N~, i = 1, 2, 3, are the principle minors of N which are different from zero. From (30)-(33), we conclude n( z, , zz) = (noo+ no,z, + nozz~,)
Lemma. There are at least three additions in the critical path of a 2-D IIR digital fiher, realized with the LU decomposition. Proof. Clearly, the critical path is that which is specified by the product of the polynomials ( ZTL,)( uT z2) = ( d,oz, + d2oz~)(1 + u12z2 + u,3z~). This is due to the fact that the above product constitutes the branch which involves the most additions and multiplications, whereas the polynomial n(zl, z2) is multiplied by Zl in (26). Thus, in the critical path (Fig. 6): • One addition is needed for the realization of the polynomial U~Z2. • One addition is needed for the realization of the polynomial ZTL~. • At least one addition is needed for the summation of the partial results from the branches in the feedback path of the recursive block. This addition cannot be extracted from the critical path since the only delay in the critical path is the Zl which is a factor of Z { L 1. [~ Moreover, the input block ZlX(Zl, z2)n(zl, z2) does not involve any extra addition due to the existence of the excessive delay Zl. Furthermore, note that there is only one multiplication in the critical path, since the coefficient of the minimum power of z~ in the polynomials UTZ2, U2Z2T is unity. From the above, it results that the optimal critical path of a 2-D filter realized with the LU decomposition contains one more addition than the corresponding one with the canonical form-like realization presented in [21]. This addition does not degrade significantly the minimum cycle time of a 2-D filter [21]. On the other hand, the proposed realization preserves all the desirable characteristics of decomposition structures [10, 13, 15, 20].
x (1 + 0olZ2+ Oozz~) 5.
-]- ( n 2 z l '}- ¢~21 n 2 Z 2 ) ( Z 2 -{- 012 z 2 ) 2 2
+ n3zlzz.
(34)
Conclusions
In this paper, realization structures based on the matrix decomposition approach, which are VoL 17, No. 3, July 1989
238
B.G. Mertzios, A.N. Venetsanopoulos / Fast implementation of 2-D digital filters
X(zl,z 2) vl~
~
E n2
d2 = 121d2 n 2 = 621n 2
:y. Fig. 6. The minimum cycle realization of a second order 2-D IIR filter by the LU decomposition.
c h a r a c t e r i z e d by m i n i m u m cycles, are p r e s e n t e d . Both F I R a n d I I R 2-D filters are c o n s i d e r e d . In a n y case, the m i n i m u m cycle time, w h i c h c h a r a c terizes the p r o p o s e d m o d i f i e d m a t r i x d e c o m p o s i tion structures, is small a n d i n d e p e n d e n t o f the filter's order. Thus these structures m a y be u s e d as the basis for VLSI p r o c e s s o r s with very high s a m p l i n g rates, w h i c h are s u i t a b l e for r e a l - t i m e image processing applications. F r o m the analysis p r e s e n t e d , it is s h o w n that the LU d e c o m p o s i t i o n structure is the most s u i t a b l e for i m p l e m e n t a t i o n with m i n i m u m cycle time. T h e i m p l e m e n t a t i o n o f the p r o p o s e d structures with VLSI systolic a n d w a r e f r o n t a r r a y p r o c e s s o r s is a t o p i c o f current investigation. Signal Processing
References
[1] H. C. Andres and C. L. Patterson, "Singular value decompositions and digital image, processing", IEEE Trans. Acoust., Speech, Signal Process., Vol. ASSP-24, February 1976, pp. 26-53. [2] N. K. Bose, Applied Multidimensional Digital Signal Processing, Van Nostrand Reinhold, New York, 1982. [3] D. Dubois and W. Steenaart, "High speed stored product recursive digital filters", IEEE Trans. Circuits Syst., Vol. CAS-29, No. 6, June 1982, pp. 390-392. [4] D. E. Dudgeon and R. M. Mersereau, Multidimensional Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1984. [5] F. R. Gantmacher, The Theory of Matrices, Vol. 1, Chelsea, New York, 1977. [6] H. Jaggernauth and A. N. Venetsanopoulos, "Real-time image processing through distributed arithmetic," Proc.
B.G. Mertzios, A.N. Venetsanopoulos / Fast implementation of 2-D digital filters
[7]
[8]
[9]
[10]
[11]
[12]
[13]
IEEE Int. Symp. Circuits and Syst., Vol. 1, Newport Beach, May 1983, pp. 394-397. H. Jaggernauth, A. C. P. Loui, and A. N. Venetsanopoulos, "Real-time image processing by distributed arithmetic implementation of two-dimensional digital filters", IEEE Trans. Acoust., Speech, Signal Process., Vol. ASSP-33, December 1985, pp. 1546-1555. B. G. Mertzios, "Block-space invariance of twodimensional digital filters", Signal Process., Vol. 13, 1987, pp. 141-153. B.G. Mertzios and A. N. Venetsanopoulos, "Walsh matrix implementation of M-D filters by the decomposition theorem", Proc. Int. Conf. on Digital Signal Process. Florence, Italy, September 5-8, 1984, pp. 59-63. B. G. Mertzios and A. N. Venetsanopoulos, "'Modular realization of multidimensional filters", Signal Process., Vol. 7, December 1984, pp. 351-369. B. G. Mertzios and A. N. Venetsanopoulos, "VLSI implementation of two-dimensional digital filters via twodimensional filter chips", IEEE Trans. Circuits Syst., Vol. CAS-33, February 1986, pp. 239-249; and IEEE J. SolidState Circuits, Vol. SE-21, February 1986, pp. 129-139. B. G. Mertzios and A. N. Venetsanopoulos, "Block decomposition structures for the fast modular implementation of two-dimensional digital filters", Circuits Syst. Signal Process., Vol. 8, 1989, in press. C. L. Nikias, A. P. Chrysafis and A. N. Venetsanopoulos, "The LU decomposition theorem and its implications to the realization of two-dimensional digital filters", IEEE
[14]
[15]
[16]
[ 17]
[18]
[19]
[20]
239
Trans. Acoust., Speech, Signal Process., Vol. ASSP-33, June 1985, pp. 694-711. A. Peled and B. Liu, "A new hardware realization of digital filters", IEEE Trans. Acoust., Speech, Signal Process., Vol. ASSP-22, No. 6, December 1974, pp. 456-462. I. Pitas and A. N. Venetsanopoulos, "Two-dimensional digital filter realization by transform decomposition and its analysis", IEEE Trans. Circuits Syst., Vol. CAS-32, October 1985, pp. 1029-1039. L. R. Rabiner and B. Gold, Theory and Applications of Digital Signal Processing, Prentice-Hall, Englewood Cliffs, N J, 1975. F.J. Taylor, "A distributed arithmetic MFIR filter", IEEE Trans. Acoust., Speech, Signal Process., Vol. ASSP-32, February 1984, pp. 186-189. S. Treitel and J. L. Shanks, "The design of multistage separable planar filters", IEEE Trans. Geosci. Electron., Vol. GE-9, January 1971, pp. 10-27. A. N. Venetsanopoulos and B. G. Mertzios, "A general implementation technique for two-dimensional digital filters", Proc. Sixth Summer Symp. on Circuit Theory, Prague, Czechoslovakia, July 12-16, 1982, pp. 176180. A. N. Venetsanopoulos and B. G. Mertzios, "A decomposition theorem and its implications to the design and realization of two-dimensional filters", IEEE Trans. Acoust., Speech, Signal Process., Vol. ASSP-33, December 1985, pp. 1562-1575.
Vol. 17, No. 3, July 1989