An efficient implementation of LU decomposition in C ALAN MEYER Computer Science Department, Washington State University, Pullman, Wa. 99164-1210, USA.
Keywords: LU decomposition, array access, compiler, C, pointers
multiplies, but when one considers the two-dimensional array overhead, this count is seen to be n 3 + 5 n 2 - 6n. While these counts are both 0(n3), the coefficients differ substantially. Aware of the expense o f array access, the programmer can include its reduction as one of the implementation goals and the C programming language provides features whereby this cost may be effectively reduced. The second section o f this paper discusses twodimensional array overhead and how it may be reduced in C. The third section presents an implementation of LU decomposition which utilizes this reduction technique. A listing o f this implementation is given in the Appendix. The details o f array overhead are based on the C compiler running under Unix* bsd 4.3 on a VAX 11/750. The results are independent of the optimizer. Although optimization did improve some portions of the code, the cost o f accessing elements of two-dimensional arrays was not affected. While other compilers and optimizers may perform differently, these results are believed to be typical.
I. I N T R O D U C T I O N
2. TWO-DIMENSIONAL ARRAYS
LU decomposition, a well-known method for solving systems o f linear equations, is widely used since it is easily understood and well suited for computer implementation. This paper presents an implementation of the LU method with partial pivoting, t The implementation o f this particular method was used to explore the overhead o f using two-dimensional arrays and to study how this expense might be reduced. The technique described here can be readily applied to the implementation of other algorithms. The LU method is composed o f two parts: the decomposition step and the solve step. A straightforward analysis o f the algorithm can provide a count o f the number o f arithmetic operations required for each o f these two steps. By considering the expense incurred from accessing elements o f the coefficient array one sees that, in practice, the study o f the algorithm alone does not provide an accurate count. For example, the decomposition step theoretically requires
Most high level languages support multi-dimensional a r rays as a standard data type. By relying on the compiler, the programmer need not dwell on the lower level details o f array access. However, it is sometimes worthwhile to consider these details in order to quantify the overhead involved. In C, arrays are stored internally in row major order. 2.3 As an example o f array access, consider the following array:
An implementation of the well-known LU decomposition method for solving systems of linear equations is presented. Operation counts typically stated for this method are derived from a theoretical analysis of the algorithm and ignore the practical aspects of the implementation. The overhead associated with assessing elements of the two-dimensional coefhcient array are explored herein, and seen to be substantial. This overhead is incorporated into the analysis of LU decomposition and new operation counts are developed. By exploiting the addressing capabilities of C the cost of array access is significantly reduced, resulting in an efhcient implementation. The techniques employed here can be applied to a wide variety of C programs which utilize multi-dimensional arrays.
n 3
n 2
3
2
+
n
6
A c c e p t e d S e p t e m b e r 1987. D i s c u s s i o n closes S e p t e m b e r 1988.
1988 Computational Mechanics l:'ublicalions
double data[ 10] [ 10] ; To access a particular element, data [i] [ j ] , the address o f the element is located with this access equation:
BASE +(i* 10+ j)*ELSIZE
(!)
where BASE is the base address of the array and ELSIZE is the size of each array element. For the machine used, the size o f each double was 8 bytes. In generating code for this equation, the compiler distributed the manipulation o f ELSIZE to produce:
BASE+ i . 8 0 + j . 8 * U n i x is a t r a d e m a r k o f A T & T Bell L a b o r a t o r i e s .
Adv. Eng. Software, 1988, Vol. 10, No. 3
123
The assembly code generated for this equation involves one mukiplication, one shift and two additions. This is the overhead incurred every time an element of a twodimensional array is accessed. It should be noted that this access equation is r¢!atively simple in C since all arrays have a lower bound of zero, whereas languages which allow user specified lower bounds, such as Pascal and Fortran, require a more expensive array access. To access a single element of an array, the evaluation of an equation like (1) cannot be avoided. The key to reducing the array overhead is in recognizing situations where multiple array elements are to be accessed in certain orders. For example, consider the situation in which one needs to access all elements of a particular row of the array. The first element of the row can be located using the access equation, but the expense is not necessary for the other elements. The second element of the row can be found by adding ELSIZE to the address of the first element, and so on. Unfortunately, most compilers are not smart enough to detect this situation and they will instead generate code to evaluate the access equation for each element. By using C pointers, the two-dimensional array can be viewed linearly, allowing the programmer to provide his own equation for accessing elements of the array. Consider, for example, a function for computing the determinant of a triangular matrix, done by multiplying together the diagonal elements. Figure l(a) shows a version of this function using the two-dimensional array features of C. The loop is entered SIZE times, each iteration requiring one multiplication plus the overhead in evaluating the array access equation to locate data[i] [i]. This results in 2 , S I Z E multiplications, SIZE shifts and 2 • SIZE additions to process the loop.
#define deuble
S I Z E 17
A version of this function using pointers is shown in Figure l(b). To access each successive diagonal elemen~ one needs only to add (SIZE+ I)*ELSIZE to ~b address of the previous element. Since both SIZE an~ ELSIZE are known at compile time, the compile~ generates only one addition to locate each diagonal element. This results in only SIZE multiplications and SIZE additions to process the loop in this version, Thus a simple application of pointers provides a function le~ than half as expensive as a similar function using twe dimensional arrays. The use of pointers in this manner is well known to the assembly programmer. Since assembly languages d~r not support two-dimensional arrays, the programmer has no option but to implement his own array acce~,~ code. The advantage of employing this technique in a high level language like C is that the code is more portable. C has proven to be very portable and C compilers are available for a wide range of systems. It can easily be argued that using pointers to circumvent the overhead of array access is contrary to the philosophy of a high level language. After all, the whole point of providing two-dimensional arrays as a language feature is so the programmer can ignore the details of array access. While this argument may be true for some high level languages, it is not as clear-cut for C. An experienced C programmer is well acquainted with using pointers to increase code efficiency, and it is for precisely this type of use that the pointer features of C were provided. It should be noted that it is possible to use tiais method in other high level languages such as Pascal or Fortran. In Pascal, free-union variant records could be employed to trick the compiler into allowing a program-
# d e f i n e S I Z E 17
Determinant(data)
double Determinant(data)
d o u b l e data[SIZE][SIZE] ;
double
*data ;
d o u b l e det = 1.0 ; int i ;
d o u b l e d e t = 1.0 ; int i ;
for(i=0; i
ford(i=0; i
{
return(det)
;
}
} }
return(det)
;
(b) Figure !. Computing the Determinant o f a Triangular Matrix 124
Adv. Eng. Software, 1988, VoL 10, No. 3
array overhead added substantially to the total expense. In the development of a second version, using the pointer features of C, the pattern in which elements o f the coefficient array were accessed during the different steps of the solution was analyzed. Based on this analysis, code to efficiently access the coefficient matrix was developed. The particular access methods used may be found by studying the LU code in the Appendix. The decomposition step and the solve step were considered separately. For each of these steps, the assembly code generated by the compiler was studied to determine the array access overhead, which was included with the theoretical results to establish new operation counts. For the decomposition step the pivot search, the row interchange, the forming and storing o f the multipliers,
mer to access an array with his own access equations. Although this trick is illegal in ANSI Pascal 4 few compilers enforce it and it is a well known technique for getting around Pascal's strong typing. In Fortran, E Q U I V A L E N C E statements could be used in a similar manner. Although employing these tricks may certainly be possible in Pascal and Fortran, C is preferable for this type o f pointer manipulation. In C, this method can be implemented with standard features in a manner consistent with the philosophy o f the language. 3. ARRAY O V E R H E A D IN LU D E C O M P O S I T I O N In a straightforward implementation of LU decomposition using the two-dimensional array features of C,
Theoretical Results
Array Implementation
Pointer Implementation
Decomposition Multipications
13 1 5n - ~12 . +~n
n3+5n 2-6n
Additions
0
4 3 37 -n3 + 11n2 -- --~n
2
Subtractions
Divides
3
11 2
~n + T~
Shifts
13
13
2 3 + l-~ n ' -~n
--
9 2
37 -
1 s 3 7 ~ n + ~n ~ + ~n - 3
1---~n -- 4
3
Tn
5n - ~,~ + ~,~
12
1
5,~ - ~n + ~n
12
1
12
I
12
1
13
12
1n
1 ~
1
5n--~n +~
Solve 2
Multipications
n 2 -- n
Additions
0
2n 2
1 2 5 ~n + ~n
Shifts
0
n~
3n
Subtractions
Divides
n
2
Z n 2 -- n ~
--~
n
n
2
n
3 2
--n
n
+n
1
n
Figure 2. Operation Counts Adv. Eng. Software, 1988, Vol. 10, No. 3
125
Array Implementation
lz
Pointer Implementation
20 .2000 .1000 50 2.444 1.305 i00 17.78 9.095 500 2118. 1116. Figu~ 3. Average Execution ~mes ~ Seconds for Decomposition
and the elimination were included. For the solve step the elimination and the back substitution were included. The results depict the case in which a row interchange is required at each iteration of the decomposition. The operation counts are shown in Figure 2. By considering the cost o f array access a more accurate view o f the operation counts is obtained; the counts for the array implementation are substantially higher than the theoretical counts. The pointer implementation reduces the array overhead resulting in significantly improved counts. After d e v d o p i n g operation counts for both implementations, timing tests were run on the decomposition step. The timing results are shown in Figure 3. Each test run consisted o f loading an n x n array with random values and then called LUDecompose. This timing was performed for n = 20, 50, 100, and 500. The time required for loading the coefficient values was measured and subtracted from the total time in order to isolate the decomposition cost. For each value o f n, twenty-three test runs were performed for both versions of L U D e c o m p o s e on a VAX 11/750 with Floating Point Accelerator. The times were measured using the C shell time function and only the amount of user time was considered. This timing utility is only accurate to one tenth o f a second and so the results for n = 20, in particular, are not as precise as desired. However, the times show a general trend in which the pointer version
executes significantly faster than the array version. These times reflect the speedup anticipated from the operation counts in Figure 2.
4. C O N C L U S I O N The effect o f two-dimensional array overhead in the implementation of LU decomposition in C was studied to achieve a more accurate picture o f the processing required by incorporating the array overhead into the analysis of the algorithm. This analysis was performed for two implementations in C, one using the twodimensional array features o f the language for the coefficient matrix, and the other using pointers. It was found that the cost associated with array access was substantial. The use o f pointers allowed a significant reduction of this cost.
REFERENCES Johnston, R. L., Numerwal Me/hods - A Software Approac/1. Wiley, 1982, pp. 28-44 2 Horowilz, E. and Sabra, S., Fundamentals o f Data SI/ltcturc~. Computer Science Press, 1976, pp. 62-66 3 Kernighan, B. and Ritchie D., The C Progrannning Language. Prentice-Hall 1978, p. 104 4 Cooper, D., S/andard Pascal - User Reference Manual. Norton, 1983, pp. 107-112 1
APPENDIX #include #include /* ** **
** ** ** **
*/
This file contains routines for solving an nxn system of linear equations using LU decomposition. The implementation is based on material in "Numerical Methods - A Software Approach" by Johnston. Function LUDecompose performs the decomposition step, LUSolve solves a system given a particular right hand side vector b, and LUDetermi~_nt will return the determinant of a matrix which has been decomposed to LU form.
static int *pivot = NULL
#define
12.6
SMALLEST PIVOT
;
/* Pivot vector. 1.0e-5
*/
/* Smallest allowed non-singular pivot.
Adv. Eng. Software, 1988, Vat. !0, No. 3
*/
int L U D e c o m p o s e ( a m a t , double
*amat
n, n u m c o l s )
;
int n, n u m c o l s
;
/* ** **
This function m a t r i x 'amat'
**
Parameters
p e r f o r m s t h e L U d e c o m p o s i t i o n step. The coefficient w i l l b e o v e r w r i t t e n w i t h t h e L U m a t r i c e s in t h e p r o c e s s .
:
amat
- T h i s is t h e c o e f f i c i e n t m a t r i x , w h e r e t h e c o e f f i c i e n t s are doubles. This function performs the row-major mapping itself, d e c l a r i n g 'amat' t o b e a p o i n t e r r a t h e r t h a n a n array. The calling routine can define the actual parameter t o b e either.
n - This numcols
*/
parameter
dtmpl,
/* A l l o c a t e if
(pivot
system
(nxn).
1 = n - 1 ;
*dptrl,
storage
*dptr2
;
for t h e p i v o t v e c t o r .
*/
I= NULL)
free(pivot)
if
t h e s i z e of t h e c u r r e n t
- This parameter indicates the actual defined column dimension of t h e c o e f f i c i e n t m a t r i x . A b i t r e m i n i s c e n t of F O R T R A N .
int i, j, k, n m i n u s double
indicates
;
((pivot =
(int *) m a l l o c ( n
* sizeof(int)))
= = NULL)
( fprintf(stderr, return(-2) ;
"Error
in L U D e c o m p o s e
- malloc
\n,,)
;
) /* I n i t i a l i z e
pivot vector.
for (i = 0 ; i < n _ m i n u s _ l * ( p i v o t + i) = i ; *(pivot
+ n_minus_l)
/* M a i n
loop
for
(i = 0 ;
*/ ;
i++)
= 1 ;
- perform
LU decomposition
i < n_minus_l
;
r o w b y row.
*/
i++)
( /* search k=
for largest
pivot
*/
i ;
dptrl = dptr2
/* k will be pivot = a m a t + i * (numcols + 1)
f o r (j = i+l ; j < n ; ( dptr2 += numcols ;
;
row.
*/
/* These point to the /* pivot element.
*/ */
j++)
Adv. Eng. Software, 1988, Vol. 10, No. 3
127
if (
(fabs(*dptr2)
/* C u r r e n t
> fabs(*dptrl) )
d p t r l "- d p t r 2 kJ ;
r o w is n e w p i v o t
row.
*/
;
}
/* k n o w i n d i c ~ t u row conta~nlng lar~mt /* a n d d p t r l p o l n t s to the p i v o t element. if (
(fabs(*dptrl)
< SMALLESTPIVOT)
fprintf(etderr, return(-l) ;
,Error
pivot
/* M a t r i x
in L U D e c ~ p o s e
*/ */ is s i n g u l a r .
- matrix
*/
is s i n g u l a r
\n,,)
;
} if
(k g- i)
/* I n t e r c h a n g e
rows
i and k, u p d a t i n g
p i v o t vector.
*/
{ *(pivot *(pivot dptrl dptr2 = for
+ i) = k ; + n_minus_l) = - *(pivot amat + i * numcols ; amat + k * numcols ;
(j - 0 ;
j < n ;
j++,
+ n minus_l)
dptrl++,
;
dptr2++)
( dtmpl - *dptrl ; * d p t r l ~ *dptr2 ; *dptr2 - dtupl ;
}
/* A t t h i s point, t h e l a r g e s t p i v o t has b e e n f o u n d a n d r o w i */ /* c o n t a i n s t h i s p i v o t value. T h i s n e x t loop p e r f o r m s t h e */ /* e l i m i n a t i o n s t e p on rows (i+l) t h r o u g h (n-l) */ for
(j = i+l
;
j < n ;
j++)
{ d t m p l = * ( a m a t + j * n u m c o l s + i) / *(amat + i * n u m c o l s dptrl = amat + j * numcols + i + 1 ; dptr2 = a m a t + i * n u m c o l s + i + 1 ; for (k = i+1; k < n ; k++, dptrl++, * d p t r l -- d t w p l * *dptr2 ; *(amat + j * numcols
+ i) = d t m p l
+ i)
;
dptr2++)
;
) return(l)
;
void LUSolvs(amat, double
*amat,
b, n, numcols)
*b
int n, n u m c o l s
;
;
/*
**
This ful~tion
**
hand side vector
**
by the function
**
Parameters
** ** **
amat
**
128
solves 'b'.
a system of e(fumtione given a ~rtl~llar The ooe£flcient
LU-~,prior
Bm~rlx
to calling
'aBst'
~
ri~ht
b e de~.x~pomed
LUSolve.
: - T h i s is the LU d e c ~ R s i t i o n o f t h e c o e f f i = L e n t matrix. The original aoeff~ient m a t r i x m u s t f i r s t b e p a s s e d to the f u n c t i o n L U D e o o m p o s e b e E o r e c a l l i n g t h i s routine.
Adv. Eng. Software, 1988, Vol. 10, No. 3
**
b - T h i s p a r a m e t e r is t h e r i g h t h a n d s i d e v e c t o r b. This routine w i l l c o m p u t e t h e s o l u t i o n v e c t o r a n d r e t u r n t h i s r e s u l t in t h e p a r a m e t e r 'b'. N O T E - b W I L L BE O V E R W R I T T E N .
** ** ** **
n - This
parameter
indicates
the size of the current
system
(nxn).
** **
numcols
- This parameter indicates the actual defined column dimension of the coefficient matrix. A bit reminiscent of FORTRAN.
**
*/ ( double
dtmpl,
*dptrl
int i, J, n _ m i n u s /* P e r f o r m
1 = n-i
;
the row interchanges
for (i = 0 ; ( j = *(pivot if
;
i < n_minus_l + i)
;
recorded
in 'pivot'
to v e c t o r
b.
*/
i++)
; /* I n t e r c h a n g e
(j I= i)
element
i a n d j.
*/
overwrite
b.
( d t m p l = * ( b + i) ; * ( b + i) = * ( b + j) *(b + j) = d t m p l ;
;
)
/* S o l v e t h e s y s t e m
L d - P b f o r d.
for
;
(i = 0 ;
i < n
Vector
d will
*/
i++)
( dtmpl
= *(b + i)
;
f o r (j = 0, d p t r l = a m a t + i * n u m c o l s d t m p l - = * d p t r l * *(b + j) ; • (b + i) = d t m p l
/* S o l v e for
j < i ;
j++,
dptrl++)
;
t h e s y s t e m U x = d f o r x.
(i = n m l n u s _ l
;
;
i >= 0 ;
Vector
x will
overwrite
b
(and d).
*/
i--)
{ d t m p l = * ( b + i) ; dptrl = amat + i*numcols for
(j = n _ m i n u s _ l
;
+
n
-
1 ;
j--)
j > i ;
{ d t m p l -= * d p t r l dptrl-- ;
* *(b + j)
;
} * ( b + i) = d t m p l / * d p t r l
;
)
double
LUDeterm~nant(amat,
double
*amat
n, n u m c o l s )
;
int n, n u m c o l s
;
Adv. Eng. Software, 1988, I/ol. 10, No. 3
129
/* ** **
This must
f u m u t l o n w i l l (~mmpute t h e d e t e r m i n a n t of t h e m a t r i x 'amat'. h a v e a l r e a d y b e e n d m e ~ i p e 6 e d to LU form (usa LUDm~ompose).
*/ ( double
det,
*fptr
;
int i ; d e t = 1.0
;
for (fptr = amat, d e t *= * f p t r ; return(det
i = 0
* *(pivot
;
i < n
+ n - I))
;
;
)
130
Adv. Eng. Software, 1988, Vol. 10, No. 3
i++,
fptr +=
(numcols
+ 1))
'amat'