Algorithm-Based Fault Tolerance for Matrix Inversion with Maximum Pivoting

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 14, 373-389 (1992)

YAO-MING YEH AND TSE-YUN FENG

Department of Electrical and Computer Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802
Existing fault-tolerant matrix-inversion schemes suffer several drawbacks, such as being limited to fault detection, requiring rollback to resume computation, cost ineffectiveness, instability, significant roundoff errors, and potential false alarms. In this paper, an algorithm-based fault-tolerant scheme for matrix inversion with maximum pivoting that rectifies the above problems is presented. This scheme can correct a single fault and detect multiple faults in each row and column in each iteration within a computation. Sequential and parallel algorithms based on checksums to support the fault-tolerant capability are developed. In this scheme, the redundant processing elements needed and the additional overhead for the enhanced fault-correcting ability are relatively small compared to those in existing schemes. An implementation example that performs $n \times n$ matrix inversion with maximum pivoting on an $(n+1) \times (n+1)$ mesh-connected array processor with a time complexity of $O(n^2)$ is also described. © 1992 Academic Press, Inc.

I. INTRODUCTION

The rapid progress of VLSI technology has reduced the cost of hardware and stimulated the development of parallel processing and massively parallel processing techniques. This development has helped to achieve the high performance of many numerical computations. In addition to high performance, high reliability also becomes more important, to ensure that the results of computation are valid. System-level (or module-level) fault-tolerance techniques can be divided into two categories: system-based error masking and reconfiguration, and algorithm-based fault tolerance. System-based fault-tolerance techniques use a large portion of hardware and redundancy techniques to detect the faulty components, mask them, and even reconfigure the system. These error-masking techniques include Triple Modular Redundancy (TMR), Totally Self-Checking (TSC) circuits, and watchdog processors [10]. The algorithm-based fault-tolerance techniques [3, 4] use redundant computation within the algorithm to detect and correct errors caused by permanent or transient failures in the hardware. This approach is not as generally applicable as the classical techniques such as TMR; however, in some specific cases, it costs much less than the system-based techniques.

Huang and Abraham [3] suggested the use of the checksum method in matrix operations for algorithm-based fault tolerance. However, their scheme was restricted to some basic matrix operations such as matrix addition, scalar product, multiplication, transposition, and LU decomposition. In fact, the checksum method can be used in any matrix computation to provide fault-tolerant capability only if the checksum property can be ensured within the computation. We define the checksum property as the property that the checksum elements always hold the value of the summation of their corresponding row/column elements.

Matrix inversion, which involves a large number of computation steps, is a frequently used calculation in scientific and engineering applications. Computational faults can happen in such a computation-intensive task, especially when a large matrix is inverted. To ensure the correctness of matrix inversion, the need to enhance the computation with fault-tolerant capability is very important. Huang and Abraham [3] and Jou and Abraham [4] have proposed fault-detection schemes for matrix inversion without pivoting. However, these schemes have several drawbacks:

1. Their algorithms are fault detection only. They can only detect the row where the faulty element is located; they cannot pinpoint the faulty element. Therefore, a complete restart of the whole computation is required after a fault is detected.

2. Their algorithms use the Gauss method without pivoting to accomplish the computation of matrix inversion. The Gauss method without pivoting is known to be theoretically unstable: it fails when the pivot element vanishes. In addition, their algorithms suffer from potential roundoff errors from use of the Gauss method without pivoting. These roundoff errors are likely to create "false alarms" when the checksum method is used to detect faults.

3. Huang and Abraham's algorithm uses $n \times (2n+1)$ array processors, and Jou and Abraham's algorithm uses $n \times (2n+2)$ array processors, to invert an $n \times n$ matrix. The redundant processing elements needed in their algorithms are $n \times (n+1)$ and $n \times (n+2)$, respectively. Apparently, both schemes are not cost effective.

In order to improve the stability and accuracy of computation, we propose a new parallel scheme to enforce the fault-detection and -correction ability for the computation of matrix inversion with maximum pivoting. In our scheme, the augmented Full Checksum Matrix (FCM) is defined with checksum elements attached to each row and column of the matrix to be inverted. Thus, $2n+1$ redundant elements are needed for an $n \times n$ matrix to form an $(n+1) \times (n+1)$ augmented FCM. With this augmented FCM, we develop concurrent row operations and parallel pivot selection to parallelize the computation of matrix inversion with maximum pivoting and to enforce fault detecting, locating, and correcting before the errors in faulty elements spread to other elements during execution. Our algorithm overcomes the problems of the existing schemes:

1. Our algorithm can pinpoint the faulty element, correct the error, and resume the computation without rollback.

2. Our algorithm has the flexibility to perform matrix inversion either without pivoting or with maximum pivoting. Maximum pivoting is necessary in the computation of matrix inversion. It helps to derive the inverted matrix from any nonsingular matrix; it also ensures the least roundoff error within the computation. Moreover, it prevents false alarms generated during fault detection.

3. Our algorithm uses $(n+1) \times (n+1)$ mesh-connected array processors to accomplish the whole computation. Our algorithm can also invert a large matrix on a limited number of processors using the proposed interleaved mapping scheme; therefore, it is not restricted by the size of the array processors. As a result, the redundant processing elements needed and the additional computation overhead required are quite small compared to those of other schemes.

In Section II, the sequential version of our scheme is presented. The sequential algorithm performs $n \times n$ fault-tolerant matrix inversion with maximum pivoting on an $(n+1) \times (2n+2)$ matrix with time complexity $O(n^3)$. In Section III, we describe the parallel version of our algorithm. Here, overlap mapping is used to reduce the number of matrix elements involved from $(n+1) \times (2n+2)$ to $(n+1) \times (n+1)$; concurrent row operations are defined to parallelize the row operations in matrix inversion from $O(n^2)$ to $O(n)$. Finally, the implementation of this parallel algorithm on mesh-connected array processors is considered in Section IV. We also describe the parallel schemes for performing maximum pivoting and fault detection and correction.

II. SEQUENTIAL ALGORITHM: THE CHECKSUM METHOD IN MATRIX INVERSION WITH MAXIMUM PIVOTING

The checksum method is commonly used in algorithm-based fault-tolerant techniques [3, 4]. Different kinds of checksum matrices have been proposed to enforce the fault-tolerant capability in matrix operations. Huang and Abraham [3] used the row checksum matrix and Jou and Abraham [4] used the row-weighted checksum matrix to detect faults for matrix inversion without pivoting. However, none of the above-mentioned techniques can correct the faulty element and resume the computation without a complete restart. In our sequential scheme, two FCMs are used in a computation, which provides the capability to pinpoint the faulty element and correct it before the faulty value spreads to other elements. This scheme can recover and continue the computation without a complete restart. We modify the checksum procedure [3] to fit our needs. It is described as follows.

Step 1. Checksum Encoding. Calculate the summation of each row and each column and put the result in its own checksum element.

Step 2. Matrix Operations. Perform the modified matrix row operations on all the elements, including matrix elements and checksum elements. We modify the procedure of the Gauss-Jordan elimination with maximum pivoting to include the operations on checksum elements.

Step 3. Fault Detecting, Locating, and Correcting.

First of all, we negate the resulting value in the checksum elements. Then, we sum up all the resulting values in the same row and same column (i.e., including the checksum elements). If all the resulting values are correct, the computed summation of each row and each column will be zero. As a result, any nonzero summation in a row or a column shows the presence of a fault. We can locate the faulty element by cross-referencing the faulty row and column. We can also correct the faulty value by adding the computed nonzero summation to it. It is possible that roundoff errors from Step 2 might cause a nonzero computed summation from a correct resulting value. In order to avoid this situation, we introduce a tolerance value in fault testing so that it can ignore small roundoff errors. The selection of this tolerance value can be found in the literature [7, 9, 14].

In this section, we focus on Step 2 of the checksum procedure. A modified scheme of the Gauss-Jordan elimination with maximum pivoting is proposed to work with the checksum method. The proposed scheme can be used with various kinds of checksum-encoding schemes such as the FCM [3], the weighted FCM [4], and general linear codes [8]. However, an FCM can be derived at the least computation cost; also, it can provide sufficient information for the purpose of fault detection and correction. Therefore, the FCM is used in our sequential scheme. Before we describe the modified procedure for matrix inversion, we define the FCM as below.

DEFINITION 1. The FCM $A_f$ of the matrix $A$ is an $(n+1) \times (n+1)$ matrix defined as

$$A_f = \begin{pmatrix} A & Ae \\ e^T A & e^T A e \end{pmatrix},$$

where $e$ is called the checksum-encoded vector and is defined as $e^T = [1, 1, \ldots, 1]$. In $A_f$, the checksum column is denoted $Ae$, the checksum row $e^T A$, and the double checksum $e^T A e$.

To invert an $n \times n$ matrix $A$, we need another identity matrix, $B$, of the same size to augment the computation. The FCMs of both matrices, namely $A_f$ and $B_f$, are used and concatenated together for the modified row operations in our scheme. The concatenated matrix is an $(n+1) \times (2n+2)$ matrix and is defined as $P_k$, where $k$ is the stage index. For example, suppose we have a $3 \times 3$ matrix, $A$, to be inverted. Matrices $A$ and $B$ are

$$A = \begin{pmatrix} 3 & 5 & 2 \\ 8 & 1 & 3 \\ 9 & 2 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Then, $A_f$, $B_f$, and $P_1$ are derived as

$$A_f = \begin{pmatrix} 3 & 5 & 2 & 10 \\ 8 & 1 & 3 & 12 \\ 9 & 2 & 4 & 15 \\ 20 & 8 & 9 & 37 \end{pmatrix}, \qquad B_f = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 3 \end{pmatrix}, \qquad P_1 = \left[\, A_f \mid B_f \,\right].$$
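To make Steps 1 and 3 concrete, the following NumPy sketch (ours, not the paper's code; the helper names `fcm_encode`, `fault_test`, and the tolerance `eps` are our choices) encodes the running $3 \times 3$ example, injects a single transient error, and locates and corrects it by cross-referencing the nonzero row and column sums.

```python
import numpy as np

def fcm_encode(M):
    """Step 1: append a checksum column (row sums) and a checksum row (column sums)."""
    n = M.shape[0]
    F = np.zeros((n + 1, n + 1))
    F[:n, :n] = M
    F[:n, n] = M.sum(axis=1)           # checksum column Ae
    F[n, :] = F[:n, :].sum(axis=0)     # checksum row e^T A and double checksum e^T A e
    return F

def fault_test(F, eps=1e-9):
    """Step 3: negate checksums, sum rows/columns, locate and undo a single fault."""
    n = F.shape[0] - 1
    T = F.copy()
    T[:, n] = -T[:, n]
    T[n, :] = -T[n, :]                 # the double checksum is negated twice
    rs, cs = T.sum(axis=1), T.sum(axis=0)
    bad_r = np.flatnonzero(np.abs(rs) > eps)       # eps is the roundoff tolerance
    bad_c = np.flatnonzero(np.abs(cs) > eps)
    fixed = F.copy()
    if bad_r.size == 1 and bad_c.size == 1:        # a single, correctable fault
        i, j = int(bad_r[0]), int(bad_c[0])
        sgn = -1.0 if (i == n) != (j == n) else 1.0  # account for negated checksums
        fixed[i, j] -= sgn * rs[i]
        return (i, j), fixed
    return None, fixed

A = np.array([[3., 5, 2], [8, 1, 3], [9, 2, 4]])   # the running example
F = fcm_encode(A)
F[1, 2] += 1.0                                     # inject a transient error
loc, fixed = fault_test(F)
assert loc == (1, 2) and np.allclose(fixed, fcm_encode(A))
```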

To invert an $n \times n$ matrix, $n$ stages of operations are required. At each stage, a pivot is selected, and then row interchange and row operations on each row of $P_k$ are performed to transform the pivot column into a checksum identity vector, with 1's at the pivot and checksum positions and 0's at the other elements in this column. At the end of Stage $n$, we get an FCM of an identity matrix at the left side of $P_{n+1}$ and an FCM of the inverted matrix at the right side of $P_{n+1}$.

We describe the row operations on $P_k$ by applying elementary transformations on it (i.e., this is done by premultiplying an elementary matrix to $P_k$) [2]. In our modified row operations, one stage of row operations is performed by four kinds of row operations. These row operations are defined as follows. Assume that the pivot selected is on $a_{i^*j^*}$ at Stage $k$.

DEFINITION 2. The row operation to interchange pivot row $i^*$ with row $j^*$ is performed by premultiplying an elementary matrix $R_{i^*j^*}$ to $P_k$, where $R_{i^*j^*}$ is derived by exchanging rows $i^*$ and $j^*$ of an identity matrix. For instance, assume $i^* = 2$, $j^* = 1$; then

$$R_{i^*j^*} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}.$$

DEFINITION 3. The row operation to multiply all elements of the pivot row by a scalar is performed by premultiplying an elementary matrix $Q_{j^*j^*}$ by $P_k$. Matrix $Q_{j^*j^*}$ is derived by replacing the $j^*$th diagonal element of an identity matrix with $1/a_{i^*j^*}$:

$$Q_{j^*j^*} = \mathrm{diag}\!\left(1, \ldots, 1, \tfrac{1}{a_{i^*j^*}}, 1, \ldots, 1\right), \quad \text{with } \tfrac{1}{a_{i^*j^*}} \text{ in position } j^*.$$

DEFINITION 4. The row operation to add a scalar multiple of elements of the pivot row to the corresponding elements of a nonpivot row is performed by premultiplying an elementary matrix $S_{ij^*}$ by $P_k$, where $i$ is the nonpivot row. Matrix $S_{ij^*}$ is formed by inserting $-a_{ij^*}/a_{i^*j^*}$ in the $ij^*$ element of an identity matrix.

DEFINITION 5. The row operation to add a scalar multiple of elements of the pivot row to the corresponding elements of the checksum row is performed by premultiplying an elementary matrix $S_{(n+1)j^*}$ by $P_k$. Matrix $S_{(n+1)j^*}$ is formed by inserting $(I_{j^*}e - a_{(n+1)j^*})/a_{i^*j^*}$ into the $(n+1)j^*$ element of an identity matrix, where $I_{j^*}$ is the identity vector for column $j^*$ and $e$ is the checksum-encoded vector:

$$S_{(n+1)j^*} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \frac{I_{j^*}e - a_{(n+1)j^*}}{a_{i^*j^*}} & 0 & \cdots & 1 \end{pmatrix} \qquad (\text{shown here for } j^* = 1).$$
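As a quick illustration of Definitions 2-5, the NumPy sketch below (ours; names like `R_mat` are our choices) builds the four elementary matrices and applies one stage of them to the example $P_1$; each scalar is read off the working matrix at the moment its transformation is applied, which reproduces the Stage 1 result $P_2$ derived in the example below.

```python
import numpy as np

def R_mat(m, i_s, j_s):
    """Definition 2: exchange rows i* and j* of an identity matrix."""
    M = np.eye(m); M[[i_s, j_s]] = M[[j_s, i_s]]; return M

def Q_mat(m, j_s, pivot):
    """Definition 3: replace the j*th diagonal element with 1/pivot."""
    M = np.eye(m); M[j_s, j_s] = 1.0 / pivot; return M

def S_mat(m, i, j_s, coeff):
    """Definitions 4 and 5: insert coeff at the (i, j*) element."""
    M = np.eye(m); M[i, j_s] = coeff; return M

def fcm(M):
    n = M.shape[0]
    F = np.zeros((n + 1, n + 1)); F[:n, :n] = M
    F[:n, n] = M.sum(axis=1); F[n, :] = F[:n, :].sum(axis=0)
    return F

A = np.array([[3., 5, 2], [8, 1, 3], [9, 2, 4]])
P = np.hstack([fcm(A), fcm(np.eye(3))])        # P_1, a 4 x 8 matrix

# Stage 1 of the worked example: pivot 9 at a_31 (0-based (2, 0)).
P = R_mat(4, 2, 0) @ P                         # interchange rows 1 and 3
P = Q_mat(4, 0, P[0, 0]) @ P                   # scale the pivot row by 1/9
for i in (1, 2):                               # nonpivot rows (Definition 4);
    P = S_mat(4, i, 0, -P[i, 0]) @ P           # scalar read off the current P
P = S_mat(4, 3, 0, 1.0 - P[3, 0]) @ P          # checksum row (Definition 5)

# Both halves of the result are still FCMs (Theorem 1 below).
for H in (P[:, :4], P[:, 4:]):
    assert np.allclose(H[:3, 3], H[:3, :3].sum(axis=1))
    assert np.allclose(H[3, :], H[:3, :].sum(axis=0))
```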

Note that $I_{j^*}e = 1$ for $j^* = 1 \cdots n$ if we use $e^T = [1, 1, \ldots, 1]$ as defined in Definition 1. However, the proposed scheme is a general procedure that can be applied to other checksum-encoding schemes such as Jou and Abraham's weighted FCM [4] or Nair and Abraham's general linear codes [8]. In those checksum-encoding schemes, $e^T$ is defined as $[w_1, w_2, \ldots, w_n]$; therefore, $I_{j^*}e = w_{j^*}$ for $j^* = 1 \cdots n$ instead.

After each stage of row operations, $P_k$ is still a concatenated matrix of two FCMs as defined in Definition 1. This is described in the following theorem.

THEOREM 1. Using the row operations as defined in Definitions 2-5, $P_k$ is the concatenated matrix of two FCMs after each stage of row operations, where $k = 1, 2, \ldots, n+1$.

Proof. We prove this theorem by induction. Initially, $P_1$ is derived from checksum encoding (Step 1 of the checksum procedure); it is a concatenated matrix of $A_f$ and $B_f$, as mentioned previously. Assume that $P_k$ is a concatenated matrix of two FCMs:

$$P_k = \begin{pmatrix} a^k_{11} & \cdots & a^k_{1(n+1)} & b^k_{11} & \cdots & b^k_{1(n+1)} \\ \vdots & & \vdots & \vdots & & \vdots \\ a^k_{(n+1)1} & \cdots & a^k_{(n+1)(n+1)} & b^k_{(n+1)1} & \cdots & b^k_{(n+1)(n+1)} \end{pmatrix}.$$

We prove that $P_{k+1}$ is also a concatenated matrix of two FCMs after one stage of row operations. One stage of row operations consists of $n+2$ transformations on $P_k$: one $R_{i^*j^*}$ to interchange rows $i^*$ and $j^*$, one $Q_{j^*j^*}$ to work on the pivot row, $n-1$ $S_{ij^*}$'s to work on the $n-1$ nonpivot rows, and one $S_{(n+1)j^*}$ to work on the checksum row. To simplify the proof, assume that the pivot selected at this stage is on $a^k_{11}$; that is, $a^k_{i^*j^*} = a^k_{11}$. Therefore, $R_{i^*j^*}$ becomes an identity matrix, and no row interchange is needed. From our assumption, $P_k$ is a concatenated matrix of two FCMs, and the following relations hold among matrix elements and their corresponding checksum elements:

$$a^k_{i(n+1)} = \sum_{j=1}^{n} a^k_{ij} \quad \text{for } i = 1 \cdots n, \tag{II.1}$$

$$b^k_{i(n+1)} = \sum_{j=1}^{n} b^k_{ij} \quad \text{for } i = 1 \cdots n, \tag{II.2}$$

$$a^k_{(n+1)j} = \sum_{i=1}^{n} a^k_{ij} \quad \text{for } j = 1 \cdots n, \tag{II.3}$$

$$b^k_{(n+1)j} = \sum_{i=1}^{n} b^k_{ij} \quad \text{for } j = 1 \cdots n. \tag{II.4}$$

In this stage, $Q_{j^*j^*}$, $S_{ij^*}$, and $S_{(n+1)j^*}$ are derived from an identity matrix by placing $1/a^k_{11}$ at element $(1, 1)$, $-a^k_{i1}/a^k_{11}$ at element $(i, 1)$, and $(1 - a^k_{(n+1)1})/a^k_{11}$ at element $(n+1, 1)$, respectively. $P_{k+1}$ is defined as the result of these $n+2$ transformations on $P_k$. Thus,

$$P_{k+1} = S_{(n+1)j^*} S_{nj^*} \cdots S_{2j^*} Q_{j^*j^*} R_{i^*j^*} P_k. \tag{II.5}$$

Therefore, $P_{k+1}$ is derived with entries (shown for the left matrix; the right matrix transforms identically)

$$a^{k+1}_{1j} = \frac{a^k_{1j}}{a^k_{11}}, \qquad a^{k+1}_{ij} = -a^k_{i1} \times \frac{a^k_{1j}}{a^k_{11}} + a^k_{ij} \;\; (i = 2 \cdots n), \qquad a^{k+1}_{(n+1)j} = (1 - a^k_{(n+1)1}) \times \frac{a^k_{1j}}{a^k_{11}} + a^k_{(n+1)j}.$$

To check that $P_{k+1}$ still holds the checksum property, we must consider each element of the checksum row and column in both $A_f$ and $B_f$ within $P_{k+1}$. For the checksums in the checksum column of $A_f$ and $B_f$, we can extract and group terms (i.e., $1/a^k_{11}$ for the pivot row, $-a^k_{i1}/a^k_{11}$ for a nonpivot row, and $(1 - a^k_{(n+1)1})/a^k_{11}$ for the checksum row) to prove that they satisfy Eqs. (II.1) and (II.2). For the checksums (i.e., $a^{k+1}_{(n+1)j}$ or $b^{k+1}_{(n+1)j}$) in the checksum row of both matrices, the checksum properties in Eqs. (II.3) and (II.4) are assured by the equations

$$a^{k+1}_{(n+1)j} = \frac{1 - \sum_{i=1}^{n} a^k_{i1}}{a^k_{11}} \times a^k_{1j} + \sum_{i=1}^{n} a^k_{ij} = \frac{a^k_{1j}}{a^k_{11}} + \sum_{i=2}^{n} \left[ -\frac{a^k_{i1}}{a^k_{11}} \times a^k_{1j} + a^k_{ij} \right] = \sum_{i=1}^{n} a^{k+1}_{ij},$$

and similarly for $b^{k+1}_{(n+1)j}$. From the above, we prove that $P_{k+1}$ satisfies (II.1)-(II.4), and hence it is the concatenated matrix of two FCMs. Therefore, the theorem is proved by induction. ∎

Using this assured checksum property on the $P_k$'s, where $k = 1 \cdots n+1$, the fault-testing procedure, as described in Step 3 of the checksum procedure, can be applied either once after the last stage or $n$ times, after each stage of matrix operations. However, the nature of matrix row operations makes a faulty value very prone to spread to other elements. The spreading of faulty values is especially significant when the faulty value falls on the pivot row or column. Therefore, instant detection and correction of faulty values after each stage of matrix operations is required. This ensures that a faulty value is detected and corrected before it spreads to other elements. The sequential version of our algorithm is shown in Algorithm 1.

Algorithm 1: Sequential Version

Begin
  Checksum encoding: Derive a concatenated FCM P_1 from given matrix A;
  For k = 1 to n Do
    Select a maximum pivot from P_k;
    Apply row operations on P_k:
      P_{k+1} = S_{(n+1)j*} S_{nj*} ... S_{2j*} Q_{j*j*} R_{i*j*} P_k;
    Fault detection and correction;
  End For;
  Get A^{-1} from P_{n+1};
End.

To illustrate our scheme, we continue the example of the $3 \times 3$ matrix $A$ described previously. We need three stages to invert this matrix. At Stage 1, the pivot is selected from among the nine elements of $A$. The value 9 at element $a_{31}$, which is the maximum among them, is selected as the pivot. Then, row interchange between rows 1 and 3 is done by $R_{31}$:

$$R_{31} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

The transformations for each row of $P_1$ are

$$Q_{11} = \begin{pmatrix} \frac{1}{9} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad S_{21} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -\frac{8}{9} & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad S_{31} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -\frac{1}{3} & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad S_{41} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\frac{19}{9} & 0 & 0 & 1 \end{pmatrix}.$$

According to Eq. (II.5), the result after Stage 1 is

$$P_2 = S_{41} S_{31} S_{21} Q_{11} R_{31} P_1 = \begin{pmatrix} 1 & \frac{2}{9} & \frac{4}{9} & \frac{5}{3} & 0 & 0 & \frac{1}{9} & \frac{1}{9} \\ 0 & -\frac{7}{9} & -\frac{5}{9} & -\frac{4}{3} & 0 & 1 & -\frac{8}{9} & \frac{1}{9} \\ 0 & \frac{13}{3} & \frac{2}{3} & 5 & 1 & 0 & -\frac{1}{3} & \frac{2}{3} \\ 1 & \frac{34}{9} & \frac{5}{9} & \frac{16}{3} & 1 & 1 & -\frac{10}{9} & \frac{8}{9} \end{pmatrix}.$$

At Stage 2, the pivot is selected from among the four remaining elements (i.e., $a_{22}$, $a_{23}$, $a_{32}$, and $a_{33}$). The value $\frac{13}{3}$ at $a_{32}$ is selected as the pivot. The result after Stage 2 is

$$P_3 = S_{42} S_{32} S_{12} Q_{22} R_{32} P_2 = \begin{pmatrix} 1 & 0 & \frac{16}{39} & \frac{55}{39} & -\frac{2}{39} & 0 & \frac{5}{39} & \frac{1}{13} \\ 0 & 1 & \frac{2}{13} & \frac{15}{13} & \frac{3}{13} & 0 & -\frac{1}{13} & \frac{2}{13} \\ 0 & 0 & -\frac{17}{39} & -\frac{17}{39} & \frac{7}{39} & 1 & -\frac{37}{39} & \frac{3}{13} \\ 1 & 1 & \frac{5}{39} & \frac{83}{39} & \frac{14}{39} & 1 & -\frac{35}{39} & \frac{6}{13} \end{pmatrix}.$$

At Stage 3, the only potential pivot left is $-\frac{17}{39}$ at $a_{33}$ of $P_3$. The result after Stage 3 is

$$P_4 = S_{43} S_{23} S_{13} Q_{33} R_{33} P_3 = \begin{pmatrix} 1 & 0 & 0 & 1 & \frac{2}{17} & \frac{16}{17} & -\frac{13}{17} & \frac{5}{17} \\ 0 & 1 & 0 & 1 & \frac{5}{17} & \frac{6}{17} & -\frac{7}{17} & \frac{4}{17} \\ 0 & 0 & 1 & 1 & -\frac{7}{17} & -\frac{39}{17} & \frac{37}{17} & -\frac{9}{17} \\ 1 & 1 & 1 & 3 & 0 & -1 & 1 & 0 \end{pmatrix}.$$

At this point, we can extract the inverted $3 \times 3$ matrix from the upper left corner of the right matrix in $P_4$.
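The whole sequential procedure can be summarized in a short runnable sketch (ours, with the `fcm` helper from the earlier sketch; the per-stage fault test of Step 3 is elided to a comment). It reproduces the $P_4$ block just shown.

```python
import numpy as np

def fcm(M):
    n = M.shape[0]
    F = np.zeros((n + 1, n + 1)); F[:n, :n] = M
    F[:n, n] = M.sum(axis=1); F[n, :] = F[:n, :].sum(axis=0)
    return F

def invert_with_checksums(A):
    """Algorithm 1: n stages of maximum pivoting on P_k = [A_f | B_f]."""
    n = A.shape[0]
    P = np.hstack([fcm(A), fcm(np.eye(n))])
    used = set()                                   # columns already eliminated
    for _ in range(n):
        i_s, j_s = max(((i, j) for i in range(n) if i not in used
                        for j in range(n) if j not in used),
                       key=lambda t: abs(P[t]))    # maximum pivoting
        P[[i_s, j_s]] = P[[j_s, i_s]]              # R: pivot moves to (j*, j*)
        P[j_s] /= P[j_s, j_s]                      # Q: scale the pivot row
        for r in range(n + 1):
            if r == j_s:
                continue
            coeff = (1.0 - P[r, j_s]) if r == n else -P[r, j_s]
            P[r] += coeff * P[j_s]                 # S: clear the pivot column
        used.add(j_s)
        # (the Step 3 fault test would run here, once per stage)
    return P[:n, n + 1:2 * n + 1]

A = np.array([[3., 5, 2], [8, 1, 3], [9, 2, 4]])
Ainv = invert_with_checksums(A)
assert np.allclose(Ainv @ A, np.eye(3))            # equals the block of P_4 above
```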

In short, the computation of matrix inversion with maximum pivoting is done by a series of transformations on matrix $P_1$:

$$P_4 = S_{43} S_{23} S_{13} Q_{33} R_{33} \; S_{42} S_{32} S_{12} Q_{22} R_{32} \; S_{41} S_{31} S_{21} Q_{11} R_{31} P_1.$$

Therefore, we need $n(n+2)$ transformations to invert an $n \times n$ matrix. If we perform the above computation by sequential matrix multiplication, we need $n(n+2)(n+1)^2(2n+2)$ operations for the whole computation, assuming that each matrix multiplication needs $(n+1)^2(2n+2)$ operations. As a result, the time complexity is $O(n^5)$. However, some redundant calculations can be eliminated from the operations in each transformation. The transformation $Q_{j^*j^*}$ needs only $2n+2$ divisions, since only the pivot row is affected by this transformation. Each transformation $S_{ij^*}$ needs only $4 \times (2n+2)$ operations, since each element of the affected row of matrix $P_k$ needs one division, one multiplication, one subtraction, and one negation within this transformation. In addition, $R_{i^*j^*}$ needs $2(2n+2)$ operations for row interchange. After all these redundant operations are eliminated, the whole computation needs $n \times [2(2n+2) + (2n+2) + n \times 4 \times (2n+2)]$ operations. The time complexity is $O(n^3)$.

Besides row operations, we also need to consider the operations for pivot selection and checksum testing. For pivot selection, the number of operations needed is $n \times n$ at Stage 1, $(n-1) \times (n-1)$ at Stage 2, ..., and $1 \times 1$ at Stage $n$. The total number of operations is $n^2 + (n-1)^2 + (n-2)^2 + \cdots + 1 = n(n+1)(2n+1)/6$. Therefore, the time complexity is $O(n^3)$. For checksum testing, the number of operations needed is $(n+1)(2n+2)$ at each stage. The total number of operations for $n$ stages is $n(n+1)(2n+2)$. As a result, the time complexity is also $O(n^3)$. Therefore, considering the operations needed for pivot selection and checksum testing, the time complexity for matrix inversion is still $O(n^3)$.

III. PARALLEL ALGORITHM: A MODIFIED COMPACT SCHEME WITH THE CHECKSUM METHOD

The idea of the modified compact scheme is derived from the compact computation of Waugh and Dwyer [13] and Lin [5]. We parallelize the algorithm and extend it to include fault-correction ability. The idea of our modified compact scheme can be described in two aspects: overlap mapping and concurrent row operations. They are described below.

A. Overlap Mapping

In the sequential algorithm described in Section II, the matrix $P_k$ is formed by concatenating two FCMs, $A_f$ and $B_f$. However, $B_f$ is redundant, as it is used only as an augmented matrix to help the execution of the row operations. In our parallel scheme, we apply a special mapping scheme, called overlap mapping, on $P_k$ so that this redundant matrix can be eliminated from our computation. In our mapping scheme, we assign different mapping mechanisms to matrix elements and checksum elements. For the matrix element $e_{ij}$, the processor stores $a_{ij}$ first. Since the values in the $b_{ij}$'s are either 1's or 0's at the

checksum-encoding phase, they are used only to augment the computation of matrix inversion; therefore, we do not need to store them. Once the processor is in the pivot column at Stage $k$, it switches the value that it stores from $a_{ij}$ to $b_{ij}$ and stores $b_{ij}$ afterward. For the checksum element $c_{ij}$, the processor stores the summation of $a_{ij}$ and $b_{ij}$. Our mapping mechanism is shown in the following equation. Here, the mesh-connected array processors are represented by the first matrix; each element $f^k_{ij}$ represents the value stored in a processing element at Stage $k$. The second matrix shows the values mapped to each processing element:

$$\begin{pmatrix} f^k_{11} & \cdots & f^k_{1(n+1)} \\ \vdots & \ddots & \vdots \\ f^k_{(n+1)1} & \cdots & f^k_{(n+1)(n+1)} \end{pmatrix} = \begin{pmatrix} a_{11} \text{ or } b_{11} & \cdots & a_{1n} \text{ or } b_{1n} & a_{1(n+1)} + b_{1(n+1)} \\ \vdots & \ddots & \vdots & \vdots \\ a_{n1} \text{ or } b_{n1} & \cdots & a_{nn} \text{ or } b_{nn} & a_{n(n+1)} + b_{n(n+1)} \\ a_{(n+1)1} + b_{(n+1)1} & \cdots & a_{(n+1)n} + b_{(n+1)n} & a_{(n+1)(n+1)} + b_{(n+1)(n+1)} \end{pmatrix}.$$

Owing to this mapping scheme, the checksum-encoding procedure should be modified accordingly. We define an augmented FCM for this mapping scheme as below.

DEFINITION 6. The augmented FCM $A'_f$ of the matrix $A$ is an $(n+1) \times (n+1)$ matrix defined as

$$A'_f = \begin{pmatrix} A & Ie + Ae \\ e^T I + e^T A & e^T I e + e^T A e \end{pmatrix},$$

where $I$ is an identity matrix and $e$ is the checksum-encoded vector as defined in Definition 1.

From the above definition, the relationships among matrix elements and their corresponding checksum elements are modified as

$$e^k_{i(n+1)} = 1 + \sum_{j=1}^{n} e^k_{ij} \quad \text{for } i = 1 \cdots n, \tag{III.1}$$

$$e^k_{(n+1)j} = 1 + \sum_{i=1}^{n} e^k_{ij} \quad \text{for } j = 1 \cdots n, \tag{III.2}$$

$$e^k_{(n+1)(n+1)} = \sum_{i=1}^{n} e^k_{i(n+1)} = n + \sum_{i=1}^{n} \sum_{j=1}^{n} e^k_{ij}. \tag{III.3}$$
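A minimal sketch of this modified encoding (ours; the name `augmented_fcm` is our choice) shows Definition 6 on the running example: every checksum carries the identity matrix's contribution of 1, so a single $(n+1) \times (n+1)$ array replaces the $(n+1) \times (2n+2)$ matrix $P_k$.

```python
import numpy as np

def augmented_fcm(A):
    """Definition 6: checksums carry the identity matrix's contribution too."""
    n = A.shape[0]
    E = np.zeros((n + 1, n + 1))
    E[:n, :n] = A
    E[:n, n] = 1.0 + A.sum(axis=1)     # Ie + Ae,            cf. (III.1)
    E[n, :n] = 1.0 + A.sum(axis=0)     # e^T I + e^T A,      cf. (III.2)
    E[n, n] = n + A.sum()              # e^T I e + e^T A e,  cf. (III.3)
    return E

A = np.array([[3., 5, 2], [8, 1, 3], [9, 2, 4]])
print(augmented_fcm(A))
# [[ 3.  5.  2. 11.]
#  [ 8.  1.  3. 13.]
#  [ 9.  2.  4. 16.]
#  [21.  9. 10. 40.]]
```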

With this mapping scheme, new operations should be defined to operate on the augmented FCM. They are described in the next subsection.

B. Concurrent Row Operations

From the analysis of the time complexity of the sequential algorithm in Section II, we found that if we use matrix multiplication to implement the sequential scheme, the bottleneck of the computation is the row operations at each stage. After all calculations performed on each element in the row operations are checked, it is clear that the calculations on the individual elements are independent of each other: each depends only on the pivot and on its corresponding elements at the pivot row and column. If we can broadcast the values of the pivot and of the corresponding elements at the pivot row and column, the calculations can be done in parallel. We also know that all the elements have a similar calculation, represented by the prototype formula

$$e^{k+1}_{ij} = e^k_{ij} - \frac{e^k_{i^*j} \times e^k_{ij^*}}{e^k_{i^*j^*}},$$

where $e^k_{ij}$ is the element at Stage $k$, $e^{k+1}_{ij}$ is the element at Stage $k+1$, $e^k_{i^*j^*}$ is the pivot at Stage $k$, $e^k_{i^*j}$ is its corresponding element at the pivot row at Stage $k$, and $e^k_{ij^*}$ is its corresponding element at the pivot column at Stage $k$. But this prototype formula is not fully applicable in our scheme, especially when the checksum method is added to the computation. At each stage, after the pivot is selected, we classify the elements of the augmented FCM into six classes and derive a formula of concurrent row operations for each of them. They are described in Lemma 1.

LEMMA 1. The concurrent row operations used at each class of element in an augmented FCM are:

1. the pivot,
$$e^{k+1}_{i^*j^*} = \frac{1}{e^k_{i^*j^*}}; \tag{III.4}$$

2. elements in the pivot row except the pivot,
$$e^{k+1}_{i^*j} = \frac{e^k_{i^*j}}{e^k_{i^*j^*}}; \tag{III.5}$$

3. elements in the pivot column except the pivot and checksum,
$$e^{k+1}_{ij^*} = -\frac{e^k_{ij^*}}{e^k_{i^*j^*}}; \tag{III.6}$$

4. the checksum in the pivot column,
$$e^{k+1}_{(n+1)j^*} = 2 - \frac{e^k_{(n+1)j^*} - 2}{e^k_{i^*j^*}}; \tag{III.7}$$

5. elements in nonpivot/nonchecksum rows and nonpivot columns,
$$e^{k+1}_{ij} = e^k_{ij} - \frac{e^k_{i^*j} \times e^k_{ij^*}}{e^k_{i^*j^*}}; \tag{III.8}$$

6. elements in the checksum row except the checksum at the pivot column,
$$e^{k+1}_{(n+1)j} = e^k_{(n+1)j} - \frac{(e^k_{(n+1)j^*} - 2) \times e^k_{i^*j}}{e^k_{i^*j^*}}. \tag{III.9}$$

Proof. See Appendix B. ∎
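A direct way to see Lemma 1 at work is to apply the six formulas element by element; the sketch below (ours, with `augmented_fcm` repeated from the previous sketch so that it runs standalone) performs Stage 1 of the example and asserts that the augmented-FCM relations (III.1)-(III.3) survive the stage, which is the content of Theorem 2 below.

```python
import numpy as np

def augmented_fcm(A):                   # as in the previous sketch
    n = A.shape[0]
    E = np.zeros((n + 1, n + 1))
    E[:n, :n] = A
    E[:n, n] = 1.0 + A.sum(axis=1)
    E[n, :n] = 1.0 + A.sum(axis=0)
    E[n, n] = n + A.sum()
    return E

def concurrent_stage(E, i_s, j_s):
    """Apply (III.4)-(III.9) to every element, for the pivot at (i*, j*)."""
    n = E.shape[0] - 1
    p = E[i_s, j_s]
    PR, PC = E[i_s, :].copy(), E[:, j_s].copy()     # broadcast row/column data
    out = np.empty_like(E)
    for i in range(n + 1):
        for j in range(n + 1):
            if (i, j) == (i_s, j_s):
                out[i, j] = 1.0 / p                              # (III.4)
            elif i == i_s:
                out[i, j] = PR[j] / p                            # (III.5)
            elif j == j_s and i < n:
                out[i, j] = -PC[i] / p                           # (III.6)
            elif j == j_s:
                out[i, j] = 2.0 - (PC[n] - 2.0) / p              # (III.7)
            elif i < n:
                out[i, j] = E[i, j] - PR[j] * PC[i] / p          # (III.8)
            else:
                out[i, j] = E[i, j] - (PC[n] - 2.0) * PR[j] / p  # (III.9)
    return out

E = augmented_fcm(np.array([[3., 5, 2], [8, 1, 3], [9, 2, 4]]))
E = concurrent_stage(E, 2, 0)           # Stage 1: pivot 9 at element (3, 1)
n = 3
assert np.allclose(E[:n, n], 1 + E[:n, :n].sum(axis=1))     # (III.1)
assert np.allclose(E[n, :n], 1 + E[:n, :n].sum(axis=0))     # (III.2)
assert np.isclose(E[n, n], n + E[:n, :n].sum())             # (III.3)
```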

After each stage of concurrent row operations, $P'_k$ is still an augmented FCM as defined in Definition 6. This is described in the following theorem.

THEOREM 2. Using the concurrent row operations described in Eqs. (III.4)-(III.9), $P'_k$ is an augmented FCM after each stage of concurrent row operations, where $k = 1, 2, \ldots, n+1$.

Proof. We prove this theorem by induction. Initially, $P'_1$ is derived from the overlap mapping of $P_1$; therefore, it is an augmented FCM as defined in Definition 6. Assume that $P'_k$ is an augmented FCM at Stage $k$:

$$P'_k = \begin{pmatrix} e^k_{11} & \cdots & e^k_{1(n+1)} \\ \vdots & \ddots & \vdots \\ e^k_{(n+1)1} & \cdots & e^k_{(n+1)(n+1)} \end{pmatrix}.$$

We prove that $P'_{k+1}$ is also an augmented FCM. In order to simplify the proof, we also assume that the pivot selected at this stage is on element $(1, 1)$; that is, $e^k_{i^*j^*} = e^k_{11}$. From Lemma 1, $P'_{k+1}$ is derived by applying Eqs. (III.4)-(III.9) to the corresponding classes of elements of $P'_k$.

In order to verify that the checksum elements of $P'_{k+1}$ satisfy Eqs. (III.1)-(III.3), we consider each element in the checksum row and column within $P'_{k+1}$. Since they can be proved similarly, only the double checksum, the most complicated one, is shown here. For the double checksum $e^{k+1}_{(n+1)(n+1)}$, Eq. (III.9) with $j = n+1$ gives

$$e^{k+1}_{(n+1)(n+1)} = e^k_{(n+1)(n+1)} - \frac{(e^k_{(n+1)1} - 2) \times e^k_{1(n+1)}}{e^k_{11}}.$$

Substituting $e^k_{(n+1)(n+1)} = \sum_{i=1}^{n} e^k_{i(n+1)}$ from Eq. (III.3) and $e^k_{(n+1)1} = 1 + \sum_{i=1}^{n} e^k_{i1}$ from Eq. (III.2), we obtain

$$e^{k+1}_{(n+1)(n+1)} = \frac{e^k_{1(n+1)}}{e^k_{11}} + \sum_{i=2}^{n} \left[ e^k_{i(n+1)} - \frac{e^k_{i1} \times e^k_{1(n+1)}}{e^k_{11}} \right] = \sum_{i=1}^{n} e^{k+1}_{i(n+1)},$$

where the first term is the new checksum of the pivot row by Eq. (III.5), and each bracketed term is the new checksum of a nonpivot row by Eq. (III.8). From the above, we prove that $P'_{k+1}$ satisfies Eqs. (III.1)-(III.3) and thus is also an augmented FCM. Therefore, the theorem is proved by induction. ∎

The parallel version of our algorithm is shown in Algorithm 2.

Algorithm 2: Parallel Version

Begin
  Checksum encoding: Derive an augmented FCM P'_1 from given matrix A;
  For k = 1 to n Do
    Select in parallel a maximum pivot from P'_k;
    Apply concurrent row operations on P'_k:
      1. pivot:                          e^{k+1}_{ij} = 1 / e^k_{i*j*};
      2. pivot row:                      e^{k+1}_{ij} = e^k_{ij} / e^k_{i*j*};
      3. pivot column:                   e^{k+1}_{ij} = -e^k_{ij} / e^k_{i*j*};
      4. the checksum in the pivot
         column:                         e^{k+1}_{ij} = 2 - (e^k_{ij} - 2) / e^k_{i*j*};
      5. nonpivot/nonchecksum row and
         nonpivot column:                e^{k+1}_{ij} = e^k_{ij} - e^k_{i*j} x e^k_{ij*} / e^k_{i*j*};
      6. checksum row, excluding the
         pivot column:                   e^{k+1}_{ij} = e^k_{ij} - (e^k_{ij*} - 2) x e^k_{i*j} / e^k_{i*j*};
    Fault detection and correction;
  End For;
  Get A^{-1} from P'_{n+1};
End.

Using the above equations, we can eliminate the bottleneck present in the row operations. For each stage, at most 5 steps of calculation are needed, since the processing elements execute these equations concurrently. As a result, at most $5 \times n$ steps are needed for $n$ stages. Therefore, the time complexity for executing these concurrent row operations is $O(n)$. Once we apply concurrent row operations to the computation of matrix inversion, the bottleneck of the computation shifts to the procedures of pivot selection and checksum testing. We propose additional parallel schemes for these two procedures; they are discussed in Subsections IV.C and IV.D, respectively.

We illustrate our parallel scheme with the same example as that in Section II. At each stage, the value stored in each processor is shown below. Note that row interchange operations are not performed here; they are implemented by the row/column labeling method, which accomplishes the row interchange when the data are read out after the final stage. The row/column labeling method is described in Subsection IV.E.

Checksum encoding:

$$P'_1 = \begin{pmatrix} 3 & 5 & 2 & 11 \\ 8 & 1 & 3 & 13 \\ 9 & 2 & 4 & 16 \\ 21 & 9 & 10 & 40 \end{pmatrix}.$$

After Stage 1 (pivot 9 at element (3, 1)):

$$P'_2 = \begin{pmatrix} -\frac{1}{3} & \frac{13}{3} & \frac{2}{3} & \frac{17}{3} \\ -\frac{8}{9} & -\frac{7}{9} & -\frac{5}{9} & -\frac{11}{9} \\ \frac{1}{9} & \frac{2}{9} & \frac{4}{9} & \frac{16}{9} \\ -\frac{1}{9} & \frac{43}{9} & \frac{14}{9} & \frac{56}{9} \end{pmatrix}.$$

After Stage 2 (pivot 13/3 at element (1, 2)):

$$P'_3 = \begin{pmatrix} -\frac{1}{13} & \frac{3}{13} & \frac{2}{13} & \frac{17}{13} \\ -\frac{37}{39} & \frac{7}{39} & -\frac{17}{39} & -\frac{8}{39} \\ \frac{5}{39} & -\frac{2}{39} & \frac{16}{39} & \frac{58}{39} \\ \frac{4}{39} & \frac{53}{39} & \frac{44}{39} & \frac{101}{39} \end{pmatrix}.$$

After Stage 3 (pivot -17/39 at element (2, 3)):

$$P'_4 = \begin{pmatrix} -\frac{7}{17} & \frac{5}{17} & \frac{6}{17} & \frac{21}{17} \\ \frac{37}{17} & -\frac{7}{17} & -\frac{39}{17} & \frac{8}{17} \\ -\frac{13}{17} & \frac{2}{17} & \frac{16}{17} & \frac{22}{17} \\ 2 & 1 & 0 & 3 \end{pmatrix}.$$

At this point, if we use the row/column labeling method to read out the matrix elements, we get the desired inverted matrix as

$$A^{-1} = \begin{pmatrix} \frac{2}{17} & \frac{16}{17} & -\frac{13}{17} \\ \frac{5}{17} & \frac{6}{17} & -\frac{7}{17} \\ -\frac{7}{17} & -\frac{39}{17} & \frac{37}{17} \end{pmatrix}.$$
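Putting the pieces together, the sketch below (ours) serializes Algorithm 2: it reuses `augmented_fcm()` and `concurrent_stage()` from the two sketches above, tracks row/column labels instead of physically interchanging rows, and reads the inverse out through the labels, exactly as in Step 4 of Algorithm 3 below.

```python
import numpy as np
# Assumes augmented_fcm() and concurrent_stage() from the sketches above.

def parallel_invert(A):
    """Algorithm 2, serialized: no physical row interchange; the RL/CL
    labels (Subsection IV.E) permute the answer at read-out time."""
    n = A.shape[0]
    E = augmented_fcm(A)
    RL, CL = {}, {}
    free_r, free_c = set(range(n)), set(range(n))
    for _ in range(n):
        i_s, j_s = max(((i, j) for i in free_r for j in free_c),
                       key=lambda t: abs(E[t]))   # parallel pivot selection
        RL[i_s], CL[j_s] = j_s, i_s     # pivot row i* plays the role of row j*
        free_r.discard(i_s); free_c.discard(j_s)
        E = concurrent_stage(E, i_s, j_s)
    inv = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            inv[RL[i], CL[j]] = E[i, j]           # Step 4 of Algorithm 3
    return inv

A = np.array([[3., 5, 2], [8, 1, 3], [9, 2, 4]])
assert np.allclose(parallel_invert(A) @ A, np.eye(3))
```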

IV. IMPLEMENTATION

Relevant practical considerations are needed in a discussion of our parallel scheme for a specific architecture. We use MIMD mesh-connected array processors as the target architecture. The parallel schemes of pivot selection and fault detection and correction are developed according to this architecture.

A. Architecture

Because the position of the pivot selected at each stage is unpredictable, the communication patterns are irregular when the pivot and the elements in the pivot row/column are broadcast. Therefore, the systolic-array SIMD model is not applicable to the computation of matrix inversion with maximum pivoting. Hence, an MIMD model is assumed in our parallel scheme. Here, we consider two types of mesh-connected array processors: without wraparound and with rowwise/columnwise wraparound (also called a toroidal network or multiple-loop network [6]), as shown in Fig. 1. Each processing element has the ability to perform arithmetic operations and two-way communication on each communication link. A register file is also required in each of them to store the values of broadcast data and special variables for pivot selection, checksum testing, and row/column labeling.

FIG. 1. Mesh-connected array processors: (a) without wraparound and (b) with rowwise/columnwise wraparound.

B. Algorithm

Algorithm 3 (Appendix A) shows the details of the implementation of our parallel fault-tolerant matrix-inversion scheme on mesh-connected array processors with rowwise/columnwise wraparound. Steps 1 to 3 are included in the parallel procedure on each element of the array processors; Step 0 does the preprocessing, and Step 4 does the postprocessing. The matrix element is stored in the E register of each processor. The I, J variables indicate the position indices of the matrix element. The other variables needed include P (Pivot), PI (Pivot row index), PJ (Pivot column index), PR (Pivot Row data), PC (Pivot Column data), RS (Row Sum), CS (Column Sum), TL, TR, TU, TD (temporary storage for the ripple chains in the left/right/up/down directions), RL (Row Label), and CL (Column Label).

C. Parallel Pivot Selection

In our parallel scheme, we use a pure local algorithm [1] to perform maximum pivoting and pivot broadcasting: at each communication step, every processing element exchanges its current (P, PI, PJ) with its four neighbors and keeps the entry with the maximum |P|. On the mesh with rowwise/columnwise wraparound, the maximum P value reaches every processing element after at most $n+1$ exchange steps. At one communication step, each processing element needs approximately 10 operations to perform the data exchange and comparison among its neighbors. If we consider the network with rowwise/columnwise wraparound, we need around $10 \times (n+1)$ operations for pivot selection/broadcasting and $(n+1)$ operations for pivot row and column broadcasting. As a result, we need $k \times (n+1)$ operations to broadcast all pivot values to every element in each stage, where $k = 10$. Hence, the time complexity is $O(k \times n^2)$ for pivot selection and broadcasting over the $n$ stages.

D. Distributed Fault Detection and Correction

The steps to accumulate row/column summations for the purpose of fault detection and correction are shown in Fig. 2. The number of propagation steps in each stage depends on the network used: for the network without wraparound, $n$ steps are needed; for the network with rowwise/columnwise wraparound, $(n+1)/2$ steps are needed.

FIG. 2. Rowwise/columnwise data accumulation: (a) initial loading of rowwise/columnwise accumulation; (b) in the architecture without wraparound; and (c) in the architecture with rowwise/columnwise wraparound.

We illustrate the mechanism to locate and correct an error by adding an error offset 1 at $e_{23}$ after checksum encoding, as shown in Fig. 3. From this figure, we know that the error offset affects both the faulty row and the faulty column. By checking its own and its four neighbors' RSs and CSs, the faulty element can detect its own fault and correct its faulty data by subtracting RS (or CS) from it. Fault testing can be performed either once after the final stage or $n$ times, after each stage of concurrent row operations. If it is performed at every stage, the number of operations needed is approximately half of that of pivot selection. It is derived as follows: $(n+1)/2$ propagation steps are needed to accumulate the row/column summations in each stage. One propagation step takes about 10 operations to propagate the data in the four directions and accumulate them into RS and CS. Therefore, we need around $10 \times (n+1)/2 \times n$ operations for $n$ stages of fault testing. The time complexity is $O(l \times n^2)$, where $l = 5$. Figure 4 describes the errors that can be detected and corrected in our scheme.
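The accumulation and self-correction can be simulated in a few lines; the sketch below (ours; the torus exchanges are modeled with np.roll) loads the example FCM with negated checksums, as in Fig. 2, injects the Fig. 3 error offset, gathers RS/CS in floor((n+1)/2) wraparound steps, and lets the faulty element fix itself.

```python
import numpy as np

def ring_sums(T):
    """Accumulate every element's full row/column sum in floor((n+1)/2)
    neighbor-exchange steps on a torus (simulated with np.roll)."""
    m = T.shape[0]
    RS, CS = T.copy(), T.copy()
    TL, TR, TU, TD = T.copy(), T.copy(), T.copy(), T.copy()
    for s in range(1, m // 2 + 1):
        TL = np.roll(TL, 1, axis=1); TR = np.roll(TR, -1, axis=1)
        TU = np.roll(TU, 1, axis=0); TD = np.roll(TD, -1, axis=0)
        if 2 * s == m:                  # the two directions meet: count once
            RS += TL; CS += TU
        else:
            RS += TL + TR; CS += TU + TD
    return RS, CS

# Fig. 2/3 setting: FCM of the example with negated checksums, so that every
# full row and column sums to zero; then the Fig. 3 error offset is injected.
A = np.array([[3., 5, 2], [8, 1, 3], [9, 2, 4]])
T = np.zeros((4, 4))
T[:3, :3] = A
T[:3, 3] = -A.sum(axis=1); T[3, :3] = -A.sum(axis=0); T[3, 3] = A.sum()
T[1, 2] += 1.0                          # error offset +1 on element (2, 3)
RS, CS = ring_sums(T)
bad = np.argwhere((np.abs(RS) > 1e-9) & (np.abs(CS) > 1e-9))
assert (bad == [[1, 2]]).all()          # only the faulty element sees both
T[1, 2] -= RS[1, 2]                     # the faulty element corrects itself
```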

FIG. 3. Fault testing by introducing error offset +1 on element (2, 3).

In Fig. 4a, the errors are located in different rows and columns; these are the errors that can be detected and corrected individually. Figure 4b describes the errors that can be detected but cannot be corrected: the errors are located in the same row and/or column. The errors that cannot be detected are shown in Fig. 4c; these are errors in the same row and column with complementary error offsets. Therefore, in our scheme, the errors that can be detected and/or corrected depend on the distribution of the errors.

FIG. 4. (a) The types of errors that can be detected and corrected; (b) the errors that can be detected but cannot be corrected; and (c) the errors that cannot be detected.

E. The Row/Column Labeling Mechanism

The operations of row interchange and the distinction of potential pivots are supported by the row/column labeling mechanism. This mechanism uses two registers, RL (Row Label) and CL (Column Label), in each nonchecksum element. These two registers act as the flags for the current pivot row and column and as the position indices of the inverted matrix. It is shown in Fig. 5.

FIG. 5. The row/column labeling mechanism.

F. Rowwise/Columnwise Interleaved Mapping for Large Matrices

Algorithm 3 considers only the cases where the matrix size is no larger than the array processor size. However, the number of processors in a system is limited; the modified parallel scheme for the inversion of large matrices is discussed in this subsection. Our enhanced mapping scheme for a large matrix is called rowwise/columnwise interleaved mapping. In this scheme, two passes of mapping through the whole matrix are needed in the procedure to decide the maximum pivot and broadcast this pivot value to every element of the matrix. The first pass computes the local maximum among the elements within each mapping. Either rowwise or columnwise mapping can be used at this pass; within this pass, the computations of each element are the same as in Algorithm 3. At the second pass, the mapping is switched from rowwise to columnwise or vice versa. Since each mapping at this pass covers all the local maximum potential pivots from the last pass, the global maximum pivot can be derived and broadcast as in Algorithm 3. This scheme is demonstrated in Fig. 6, where eight processing elements are assumed to be available for the $4 \times 4$ FCM of the previous example.

Aside from maximum pivot selecting and broadcasting, we also need to broadcast the data in the pivot row and column. One more pass of interleaved mapping is needed to accomplish this. If the second pass is a rowwise mapping, an operation of rowwise broadcasting of the pivot column is performed in this pass after the pivot is decided, and then the third pass is a columnwise mapping to accomplish the columnwise broadcasting of the pivot row. A similar procedure is applied when the second pass uses columnwise mapping.

FIG. 6. Rowwise/columnwise interleaved mapping for pivot selection.

This scheme can also be applied to extend the procedure of distributed fault detection and correction. It needs two passes: one to derive the rowwise summation by rowwise mapping and the other to derive the columnwise summation through columnwise mapping. In other words, to invert a large matrix that is larger than the array processor size, six passes of interleaved mapping are needed for each stage: three passes for parallel pivot selection, one pass for the concurrent row operation, and two passes for distributed fault detection and correction. The advantages of this mapping scheme are that the mapping pattern is regular; it is not affected by the matrix size; and it makes our parallel scheme applicable to any sized nonsingular matrix.
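The two-pass reduction idea can be conveyed by a short sketch (ours: the slicing of the matrix among processors and the function name are our assumptions, not the paper's exact mapping): pass 1 reduces each processor's slice to a local maximum, and pass 2, playing the role of the orthogonal mapping that covers all local winners, yields the global maximum pivot.

```python
import numpy as np

def interleaved_max_pivot(E, n_proc):
    """Two-pass maximum-pivot selection with fewer processors than elements."""
    flat = np.abs(E).ravel()
    # Pass 1 (say, rowwise mapping): each processor scans its own slice.
    slices = np.array_split(np.arange(flat.size), n_proc)
    local_winners = [int(s[np.argmax(flat[s])]) for s in slices if s.size]
    # Pass 2 (columnwise mapping): all local winners are compared together.
    best = max(local_winners, key=lambda k: flat[k])
    return np.unravel_index(best, E.shape)

A = np.array([[3., 5, 2], [8, 1, 3], [9, 2, 4]])
print(interleaved_max_pivot(A, n_proc=8))   # -> (2, 0): the pivot 9 at a_31
```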

V. CONCLUSIONS

A parallel fault-correction scheme for matrix inversion with maximum pivoting has been described. This scheme can detect and correct transient errors in processing elements without rollback. The proposed modified compact scheme with the checksum method optimizes the row operations within a computation to a time complexity of $O(n)$. It also reduces the hardware overhead to half that of existing schemes. The computational merits of matrix inversion require that maximum pivoting be included to ensure stable computation, reduce roundoff errors, and prevent false alarms. A parallel scheme for maximum pivoting is also described. This scheme uses regular local communication to perform pivot selecting and broadcasting with a time complexity of $O(n^2)$. As a result, the whole computation needs a time complexity of $O(n^2)$ to perform $n \times n$ matrix inversion with maximum pivoting on $(n+1) \times (n+1)$ mesh-connected array processors.

APPENDIX A

Algorithm 3: Parallel Algorithm on Mesh-Connected Array Processors

Begin
{ Step 0: Checksum encoding }
  { Sum up each row/column, and load the summation into its checksum element }
  E(i, n+1) <- SUM_{j=1..n} E(i, j)     i = 1 ... n
  E(n+1, j) <- SUM_{i=1..n} E(i, j)     j = 1 ... n
  { Adjust each checksum element by adding 1 to it }
  E(i, n+1) <- E(i, n+1) + 1            i = 1 ... n
  E(n+1, j) <- E(n+1, j) + 1            j = 1 ... n
  { Sum up the checksum row to get the double checksum }
  E(n+1, n+1) <- SUM_{j=1..n} E(n+1, j)

PARALLEL PROCEDURE in processor (I, J), where I = 1 ... n+1, J = 1 ... n+1
for k = 1 to n do
  { Step 1: Parallel pivot selection }
  { Decide the potential pivot and set its (P, PI, PJ) }
  if (RL not set) and (CL not set)
    then P <- E                         { potential pivot }
    else P <- 0                         { not a potential pivot }
  PI <- I; PJ <- J
  for i = 1 to n+1 do
    Send (P, PI, PJ) to its four neighbors
    Receive (P, PI, PJ) from its four neighbors
    { Find the maximum absolute P among its own P and its four neighbors'
      P's, and set the new (P, PI, PJ) accordingly }
    (P, PI, PJ) <- (max(|P's|), PI, PJ)
  endfor
  { Broadcast the pivot row and set RL }
  if (I = PI) then PR <- E; RL <- PJ;
    send PR in the columnwise directions (i.e., up/down)
  else receive PR from the columnwise directions
  { Broadcast the pivot column and set CL }
  if (J = PJ) then PC <- E; CL <- PI;
    send PC in the rowwise directions (i.e., left/right)
  else receive PC from the rowwise directions

  { Step 2: Concurrent row operation }
  if (I = PI) and (J = PJ) then E <- 1/P
  if (I = PI) and (J /= PJ) then E <- E/P
  if (I /= PI) and (I /= n+1) and (J = PJ) then E <- -PC/P
  if (I = n+1) and (J = PJ) then E <- 2 - (E - 2)/P
  if (I /= PI) and (I /= n+1) and (J /= PJ) then E <- E - PR x PC/P
  if (I = n+1) and (J /= PJ) then E <- E - (PC - 2) x PR/P

  { Step 3: Distributed fault detection and correction }
  Copy E to RS, CS, TL, TR, TU, TD
  for i = 1 to (n+1)/2 do
    Send TL, TR, TU, TD to its four neighbors, respectively
    Receive TL, TR, TU, TD from its four neighbors, respectively
    { Accumulate the propagated data into RS and CS }
    if i = (n+1)/2
      then RS <- RS + TL; CS <- CS + TU
      else RS <- RS + TL + TR; CS <- CS + TU + TD
  endfor
  { Fault detection and correction }
  if (|RS| > eps) and (|CS| > eps)
    then E <- E - RS                    { error detected and corrected }
endfor
END parallel procedure

{ Step 4: Read out the inverted matrix according to the row/column labels }
A^{-1}(RL, CL) <- E(I, J)   where I = 1 ... n, J = 1 ... n
END.
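Step 1's pure local maximum selection can be checked with a small simulation (ours; the torus neighbor exchanges are modeled with np.roll, and the number of rounds follows the n+1 exchange steps of Algorithm 3): every element repeatedly keeps the largest |P| seen among itself and its four neighbors, together with that value's indices, until all elements agree.

```python
import numpy as np

def local_max_broadcast(P0, rounds):
    """Each element keeps the largest |P| among itself and its four torus
    neighbors, together with that value's (PI, PJ), for `rounds` rounds."""
    P = P0.astype(float).copy()
    PI, PJ = np.indices(P0.shape)
    PI, PJ = PI.copy(), PJ.copy()
    for _ in range(rounds):
        for axis, shift in ((0, 1), (0, -1), (1, 1), (1, -1)):
            Q  = np.roll(P,  shift, axis=axis)      # the neighbor's view
            QI = np.roll(PI, shift, axis=axis)
            QJ = np.roll(PJ, shift, axis=axis)
            take = np.abs(Q) > np.abs(P)
            P  = np.where(take, Q, P)
            PI = np.where(take, QI, PI)
            PJ = np.where(take, QJ, PJ)
    return P, PI, PJ

A = np.array([[3., 5, 2], [8, 1, 3], [9, 2, 4]])
P, PI, PJ = local_max_broadcast(A, rounds=A.shape[0] + 1)
assert (P == 9).all() and (PI == 2).all() and (PJ == 0).all()
```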

APPENDIX B

Proof of Lemma 1

To simplify the proofs, we assume $a_{11}$ is the pivot selected at Stage $k$. Therefore, $i^* = 1$, $j^* = 1$, and the pivot is $e^k_{i^*j^*} = e^k_{11} = a^k_{11}$.

1. The Pivot

The transformation performed at the pivot row is $Q_{i^*j^*}$. In the following equations, we denote the application of a transformation to an element by the operator $\circ$; thus the application of transformation $Q_{i^*j^*}$ to $b^k_{i^*j^*}$ is written $Q_{i^*j^*} \circ [b^k_{i^*j^*}]$. From our overlap mapping scheme, the pivot processor stores $a^k_{i^*j^*}$ before the transformation and stores $b^{k+1}_{i^*j^*}$ after it. Therefore, we need to consider the calculation on $b^k_{i^*j^*}$ rather than $a^k_{i^*j^*}$. Before the transformation, the value of $b^k_{i^*j^*}$ is 1. Applying $Q_{i^*j^*}$, we get the result on this element:

$$e^{k+1}_{i^*j^*} = Q_{i^*j^*} \circ [b^k_{i^*j^*}] = Q_{i^*j^*} \circ [1] = \frac{1}{a^k_{i^*j^*}} \times 1 = \frac{1}{e^k_{i^*j^*}}.$$

2. Elements in the Pivot Row except the Pivot

The transformation performed at the pivot row is $Q_{i^*j^*}$. For these elements, we do not need to switch the stored values. Therefore, element $e^k_{i^*j}$ stores $a^k_{i^*j}$ (or $b^k_{i^*j}$, if it has switched its storage before) before the transformation and stores $a^{k+1}_{i^*j}$ (or $b^{k+1}_{i^*j}$) after the transformation. Applying $Q_{i^*j^*}$, we get the result on this element:

$$e^{k+1}_{i^*j} = Q_{i^*j^*} \circ [a^k_{i^*j}] = \frac{a^k_{i^*j}}{a^k_{i^*j^*}} = \frac{e^k_{i^*j}}{e^k_{i^*j^*}}.$$

Moreover, we also need to show that this equation is still valid for the checksum element of the pivot row. Before the transformation, the value stored at this checksum is the summation of $a^k_{i^*(n+1)}$ and $b^k_{i^*(n+1)}$, where $a^k_{i^*(n+1)}$ and $b^k_{i^*(n+1)}$ are the summations of the corresponding rows in the left and right matrices, respectively. Therefore, the following checksum property is satisfied:

$$e^k_{i^*(n+1)} = a^k_{i^*(n+1)} + b^k_{i^*(n+1)} = \sum_{j=1}^{n} a^k_{i^*j} + \sum_{j=1}^{n} b^k_{i^*j}. \tag{B.1}$$

The checksum property is also satisfied after the transformation:

$$e^{k+1}_{i^*(n+1)} = a^{k+1}_{i^*(n+1)} + b^{k+1}_{i^*(n+1)} = \sum_{j=1}^{n} a^{k+1}_{i^*j} + \sum_{j=1}^{n} b^{k+1}_{i^*j}. \tag{B.2}$$

Applying $Q_{i^*j^*}$, we get the calculation for this checksum element:

$$e^{k+1}_{i^*(n+1)} = Q_{i^*j^*} \circ [a^k_{i^*(n+1)}] + Q_{i^*j^*} \circ [b^k_{i^*(n+1)}] = \frac{a^k_{i^*(n+1)} + b^k_{i^*(n+1)}}{a^k_{i^*j^*}} = \frac{e^k_{i^*(n+1)}}{e^k_{i^*j^*}}.$$

3. Elements in the Pivot Column except the Pivot and the Checksum

The transformation used in nonpivot row $i$ is $S_{ij^*}$. Since these elements are on the pivot column, the values stored in them are switched from $a^k_{ij^*}$ to $b^{k+1}_{ij^*}$. As a result, the equation of the concurrent row operation is formed by considering the pivot column of the right matrix of $P_k$. Before the transformation, the values in this column form a checksum identity vector; hence $b^k_{ij^*} = 0$ for a nonpivot row $i$. Therefore, the result on element $e_{ij^*}$ is

$$e^{k+1}_{ij^*} = S_{ij^*} \circ [b^k_{ij^*}] = S_{ij^*} \circ [0] = \left(-\frac{a^k_{ij^*}}{a^k_{i^*j^*}}\right) \times 1 + 0 = -\frac{e^k_{ij^*}}{e^k_{i^*j^*}}.$$

4. The Checksum in the Pivot Column

For the checksum row, the transformation used is $S_{(n+1)j^*}$, which inserts $(1 - a^k_{(n+1)j^*})/a^k_{i^*j^*}$ into the $(n+1)j^*$ element of an identity matrix. This checksum element is the summation of its corresponding column. Owing to our overlap mapping scheme, processor $e_{(n+1)j^*}$ stores the summation of $a^k_{(n+1)j^*}$ and $b^k_{(n+1)j^*}$. Before the transformation, the checksum identity vector is located at the pivot column of $B_f$ within $P_k$; therefore, $b^k_{(n+1)j^*} = 1$, and the value of $e^k_{(n+1)j^*}$ is

$$e^k_{(n+1)j^*} = a^k_{(n+1)j^*} + b^k_{(n+1)j^*} = a^k_{(n+1)j^*} + 1.$$

After the transformation, the checksum identity vector is transformed to the pivot column of $A_f$; therefore, $a^{k+1}_{(n+1)j^*} = 1$, and the value of $e^{k+1}_{(n+1)j^*}$ is

$$e^{k+1}_{(n+1)j^*} = 1 + S_{(n+1)j^*} \circ [1] = 1 + \left[ 1 + \frac{1 - a^k_{(n+1)j^*}}{a^k_{i^*j^*}} \times b^k_{i^*j^*} \right] = 2 + \frac{2 - e^k_{(n+1)j^*}}{e^k_{i^*j^*}} = 2 - \frac{e^k_{(n+1)j^*} - 2}{e^k_{i^*j^*}},$$

using $b^k_{i^*j^*} = 1$ and $a^k_{(n+1)j^*} = e^k_{(n+1)j^*} - 1$. The above equations also show that this checksum element still holds the checksum property after each stage of transformations.

5. Elements in Nonpivot/Nonchecksum Rows and Nonpivot Columns

Because these elements are not in the pivot column, we do not need to switch the values stored in them. They are also not in the pivot row; therefore, we apply $S_{ij^*}$ on them. When we apply $S_{ij^*}$, the operation at element $e_{ij}$ is

$$e^{k+1}_{ij} = S_{ij^*} \circ [a^k_{ij}] = a^k_{ij} - \frac{a^k_{ij^*}}{a^k_{i^*j^*}} \times a^k_{i^*j} = e^k_{ij} - \frac{e^k_{ij^*} \times e^k_{i^*j}}{e^k_{i^*j^*}}.$$

This equation is also valid for the elements on the checksum column except the double checksum: these checksum elements hold the checksum property before and after the transformation, as shown in Eqs. (B.1) and (B.2), and applying $S_{ij^*}$ to the stored sum $a^k_{i(n+1)} + b^k_{i(n+1)}$ gives the same form.

6. Elements in the Checksum Row except the Checksum at the Pivot Column

The transformation used on this checksum row is $S_{(n+1)j^*}$. There are two types of checksum elements among them: the double checksum and the checksums that are not at the pivot and checksum columns. We consider the former first. The double checksum is the summation of all the matrix elements before the transformation:

$$e^k_{(n+1)(n+1)} = \sum_{i=1}^{n} a^k_{i(n+1)} + \sum_{i=1}^{n} b^k_{i(n+1)}.$$

Applying $S_{(n+1)j^*}$ to both parts and using $a^k_{(n+1)j^*} = e^k_{(n+1)j^*} - 1$, we get

$$e^{k+1}_{(n+1)(n+1)} = e^k_{(n+1)(n+1)} + \frac{1 - a^k_{(n+1)j^*}}{a^k_{i^*j^*}} \times e^k_{i^*(n+1)} = e^k_{(n+1)(n+1)} - \frac{(e^k_{(n+1)j^*} - 2) \times e^k_{i^*(n+1)}}{e^k_{i^*j^*}}.$$

Two situations might happen for the checksum elements that are not on the pivot and checksum columns: (1) $b^{k+1}_{(n+1)j} = b^k_{(n+1)j} = 1$, which happens for those elements that have not switched their storage before; and (2) $a^{k+1}_{(n+1)j} = a^k_{(n+1)j} = 1$, which happens for those elements that have switched their storage. If we consider the first situation, we have the equations

$$e^{k+1}_{(n+1)j} = S_{(n+1)j^*} \circ [a^k_{(n+1)j}] + 1 = a^k_{(n+1)j} + \frac{1 - a^k_{(n+1)j^*}}{a^k_{i^*j^*}} \times a^k_{i^*j} + 1 = e^k_{(n+1)j} - \frac{(e^k_{(n+1)j^*} - 2) \times e^k_{i^*j}}{e^k_{i^*j^*}}.$$

We can also prove the second situation similarly. ∎

REFERENCES

1. Bokhari, S. H. Finding maximum on an array processor with a global bus. IEEE Trans. Comput. C-33, 2 (Feb. 1984), 133-139.
2. Carnahan, B., Luther, H. A., and Wilkes, J. O. Applied Numerical Methods. Wiley, New York, 1969.
3. Huang, K. H., and Abraham, J. A. Algorithm-based fault tolerance for matrix operations. IEEE Trans. Comput. C-33, 6 (June 1984), 518-528.
4. Jou, J. Y., and Abraham, J. A. Fault-tolerant matrix operations on multiple processor systems using weighted checksums. Real Time Signal Processing VII, Proc. SPIE 495 (1984), 94-101.
5. Lin, W. A parallel error-detecting algorithm for matrix inversion. Unpublished paper, 1988.
6. Lubachevsky, B. D. Efficient distributed event-driven simulations of multiple-loop networks. Comm. ACM 32, 1 (Jan. 1989), 111-131.
7. Luk, F. K. An analysis of algorithm-based fault tolerance techniques. J. Parallel Distrib. Comput. 5 (1988), 172-184.
8. Nair, V. S. S., and Abraham, J. A. General linear codes for fault-tolerant matrix operations. Proc. 1988 IEEE International Symposium on Fault-Tolerant Computing, 1988, pp. 180-185.
9. Ortega, J. M. Numerical Analysis: A Second Course. Academic Press, New York, 1972.
10. Siewiorek, D., and Swarz, R. S. The Theory and Practice of Reliable System Design. Digital Equipment Corp., 1987.
11. Sorenson, D. Analysis of pairwise pivoting in Gaussian elimination. IEEE Trans. Comput. C-34, 3 (Mar. 1985), 274-278.
12. Veldhorst, M. Gaussian elimination with partial pivoting on an MIMD computer. J. Parallel Distrib. Comput. 6 (1989), 62-68.
13. Waugh, F. V., and Dwyer, P. S. Compact computation of the inverse of a matrix. Ann. Math. Statist. 16 (1945), 259-271.
14. Wilkinson, J. H. Rounding Errors in Algebraic Processes. Prentice-Hall, Englewood Cliffs, NJ, 1964.

Received October 2, 1990; accepted February 25, 1991

YAO-MING YEH received the B.S. degree in computer engineering from National Chiao-Tung University, Taiwan, in 1981, and the M.S. degree in computer science and information engineering from National Taiwan University, Taiwan, in 1983. In August 1991, he completed his Ph.D. requirements in the Department of Electrical and Computer Engineering, The Pennsylvania State University, under the supervision of Professor Tse-yun Feng. He was an instructor in the Department of Computer Engineering, National Chiao-Tung University, Taiwan, from 1985 to 1986. His research interests include parallel processing, fault-tolerant computing, performance evaluation, and artificial intelligence.

TSE-YUN FENG received the B.S. degree from the National Taiwan University, Taipei, Taiwan, the M.S. degree from Oklahoma State University, Stillwater, Oklahoma, and the Ph.D. degree from the University of Michigan, Ann Arbor, Michigan, all in Electrical Engineering. He was on the faculty of the Department of Electrical and Computer Engineering, Syracuse University, Syracuse, New York, and Wayne State University, Detroit, Michigan, and the Department of Computer and Information Science, Ohio State University, Columbus, Ohio. He is now Binder Professor of Computer Engineering at The Pennsylvania State University, University Park, Pennsylvania. He has extensive technical publications in the areas of parallel and concurrent processors, interconnection networks, computer architecture, and switching theory. Dr. Feng was Editor-in-Chief of the IEEE Transactions on Computers, 1982-1986. He has edited a number of Sagamore Computer Conference on Parallel Processing Proceedings and Interconnection Networks for Parallel and Distributed Processing. In addition, he has also edited special issues for the IEEE Transactions on Computers, ACM Computing Surveys, and the Proceedings of the IEEE. He has been an invited speaker for various organizations and served as a consultant or reviewer to several companies and publishers. He has received a number of awards and honorary recognitions for his technical contributions and scholarship. He has also been active professionally. He was President of the IEEE Computer Society, 1979-1980, and Chairman of the International Conference on Computers and Applications, among others. He is currently Chairman of the Distinguished Visitors Program of the Computer Society, Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems, and Conference Chairman of the International Conference on Parallel Processing, which he initiated 20 years ago.