Parallel algorithms for SIMD computers

North-Holland Microprocessing and Microprogramming 28 (1989) 85-90

PARALLEL ALGORITHMS FOR SIMD COMPUTERS

Adriano J. de O. Cruz
NCE-Universidade Federal do Rio de Janeiro, Cx. Postal 2324, Rio de Janeiro, Brasil.

Abstract - Parallel algorithms that manipulate multi-dimensional arrays on two-dimensional SIMD processor arrays are described here. The definition of these algorithms, which will be called COMPRESS and EXPAND, is taken from APL and FORTRAN 8X. These algorithms have already been studied under the constraint that the number of elements is equal to the number of processing elements. In the algorithms presented here these restrictions are lifted. Therefore the application of these algorithms contributes to transforming array languages into more general use languages.

1. INTRODUCTION

The algorithms presented in this work can be applied to a whole class of SIMD architectures that we call processor arrays, sometimes referred to in the literature as mesh-connected parallel computers (MCC). A processor array is an array of N = n*n identical processing elements (PEs). Each PE is connected to its four nearest neighbours and has its own local memory. The PEs execute simultaneously the same instruction, received from a single control unit [1]. One criticism directed against processor arrays concerns the fixed structure to which users have to adapt their applications. Here, we present parallel algorithms that manipulate multi-dimensional arrays on two-dimensional SIMD processor arrays. In these algorithms the number of dimensions, and also the number of elements in each dimension, is no longer a function of the processor array size. Therefore, the application of these algorithms contributes to transforming array languages into more general use languages. The selection of the algorithms was based on the data handling functions available in two languages: APL and FORTRAN 8X. After analysing these languages it was decided to design algorithms to perform the following functions: RESHAPE, SPREAD, COMPRESS (PACK in FORTRAN) and EXPAND (UNPACK in FORTRAN). COMPRESS and EXPAND are presented here, and the description of the other two can be found in [2].

2. DATA MOVEMENT STRATEGY

The method of 'Parallel Data Transforms' (PDTs) [3] proposed by Flanders will be used to rearrange data. A summary of it is presented here. Any bit from an array of data with t dimensions may be identified by giving a binary number x_k for

each dimension k, and a value x_0 for the position of the bit within the element. In this way every bit of this array can be associated with a vector

X = (x_t ... x_2 x_1 x_0).

This array may be mapped onto a block S of storage, with r dimensions, containing the same number of bits. Again, any bit from this block of data can be identified by giving a binary number y_k for each dimension. Therefore every bit of this memory can also be associated with a vector

Y = (y_r ... y_2 y_1).

It is possible to map every element of A into S on a one-to-one basis. The PDT method deals with a subset of the set of mappings, in which the vector Y is formed from a particular ordering (or permutation) of the elements of vector X or their inverses. The ordering can be represented by a vector M = (m_s ... m_2 m_1).

In its final shape each element m_i of the vector above is either of the form a_b or its inverse ā_b, where a corresponds to the bit position and b to the array dimension. This representation, which defines the mapping of the elements of A into S, is called the 'mapping vector'. For a 32*32 matrix of one-bit data, stored in a 32*32 processor array, the mapping vector is (4_2 3_2 2_2 1_2 0_2 / 4_1 3_1 2_1 1_1 0_1). A transformation between mappings may be achieved by successively exchanging pairs of elements in the mapping vector and/or inverting some of them. Each of these operations is associated with a well defined movement of data in store. One form of representing the exchanges is R(i,j; p,q), which defines the exchange of the elements i_j and p_q of the mapping vector. A sequence of exchanges can be represented by a more compact notation. For example, the sequence R(2,1;0,1)R(1,1;0,1) is written more compactly as:

∏_{i=2}^{1} R(i,1; 0,1).
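To make the notation concrete, here is a small Python sketch (our illustration, not part of the original paper) that treats a mapping vector as an ordering of (bit, dimension) labels and applies an exchange R(i,j; p,q) by swapping two labels; the helper name is ours.

```python
# Sketch (our own illustration): a mapping vector is an ordering of
# (bit, dimension) labels; R(i,j; p,q) swaps the labels i_j and p_q.
def exchange(mapping, i, j, p, q):
    """Swap elements i_j and p_q of the mapping vector (a list of (bit, dim) pairs)."""
    a, b = mapping.index((i, j)), mapping.index((p, q))
    mapping[a], mapping[b] = mapping[b], mapping[a]
    return mapping

# The 32*32 example: (4_2 3_2 2_2 1_2 0_2 / 4_1 3_1 2_1 1_1 0_1)
mapping = [(4, 2), (3, 2), (2, 2), (1, 2), (0, 2),
           (4, 1), (3, 1), (2, 1), (1, 1), (0, 1)]
print(exchange(mapping, 2, 1, 0, 1))  # R(2,1;0,1): swap 2_1 and 0_1
```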

3. STORAGE SCHEME

The individual elements in each plane of the array will be stored following the standard column-major storage scheme. No assumption will be made about the matrix dimensions, and there will be no restriction on the number of dimensions. When setting aside storage space for a given array of data it is assumed that its dimensions are a power of 2. For instance, let A[d1, d2] be a two-dimensional matrix and assume that b1 and b2 are the number of bits necessary to represent d1 and d2 respectively. In that case the amount of memory reserved for the matrix will be 2^(b1+b2). In order to maintain consistency the planes of data are stored according to the column-major scheme used with individual array elements. One advantage of this scheme is that it is easy to obtain the coordinates of each array element from the binary representation of its address in the processor array memory. It is easy because the mapping function for the scheme is, simply, a reordering of the vector elements. The disadvantage is that it can cause the allocation of empty planes of memory. One solution is to compress the array, removing the planes that are empty. Arrays can be kept in compressed form and reconstructed only when PDT operations are necessary.
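As a sketch of the idea (ours, with hypothetical helper names), the padded column-major scheme makes each coordinate a bit field of the cell address:

```python
# Sketch (our illustration): with dimensions padded to powers of two, the
# column-major address of A[i1, i2, ...] is just the concatenation of the
# index bits, so coordinates can be read back as bit fields of the address.
def address(indices, bits):
    addr, shift = 0, 0
    for idx, b in zip(indices, bits):   # dimension 1 occupies the low bits
        addr |= idx << shift
        shift += b
    return addr

def coordinates(addr, bits):
    idxs = []
    for b in bits:
        idxs.append(addr & ((1 << b) - 1))
        addr >>= b
    return idxs

bits = [3, 2]                 # A[6,4]: b1 = 3, b2 = 2, 2**(3+2) cells reserved
a = address([5, 2], bits)     # element A(5, 2)
assert coordinates(a, bits) == [5, 2]
```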

4. THE SUBALGORITHMS

The algorithms to be presented later are implemented in steps that we call 'subalgorithms', which will be described now. First, let us introduce some definitions. When operating with arrays it will be more useful to know the relative position of elements in terms of row, column and plane than by coordinates. In order to identify an array element the standard array element reference method A(i,j,k) (0 <= i, j < n) will be used. Pascal syntax will be used to describe the algorithms. However, in situations where parallel data transforms are involved the notation presented in section 2 will be used. The following additions have been made to the Pascal syntax. In the description of parallel actions performed on all elements along a chosen dimension of the processor array, elided subscripts will be used. A masking operation will be indicated by the word Where followed by a boolean expression whose result is a boolean array of dimensions n*n.

A( , ) := B( , ) + C( , ) Where Mask( , ) ;

The expression indicates that the matrix A will receive the result of the addition of the matrices B and C, only in PEs where Mask is True. An extended form of conditional control is used whenever it is necessary to reduce an array of boolean values to a single result, as indicated in the example below.

If Any Mask( , ) = True Then t := t + 1 Else s := s + 1 ;

The notation i_(b:c) will denote the binary number i_b i_(b-1) ... i_c, and x|y will be used to indicate two binary numbers concatenated. The complexity of algorithms will be measured in terms of PE time and the number of single route operations. In the network defined for the MCC model, a single route operation occurs when data is transferred between nearest-neighbour PEs. We assume that operations like addition and single routing operations can be performed in O(1) time.

4.1. LINEARISE AND UNLINEARISE
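For readers more familiar with array libraries, the masked assignment above corresponds to elementwise selection; a minimal sketch (ours), e.g. with NumPy:

```python
# Sketch (our notation demo): the "Where" masked assignment of the extended
# Pascal maps naturally onto elementwise selection with a boolean array.
import numpy as np

n = 4
A, B, C = np.zeros((n, n)), np.ones((n, n)), np.full((n, n), 2.0)
mask = np.eye(n, dtype=bool)      # Mask( , ): True on the diagonal only

A[mask] = (B + C)[mask]           # A( , ) := B( , ) + C( , ) Where Mask( , )

if mask.any():                    # If Any Mask( , ) = True Then ...
    t = 1
```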

LINEARISE performs exchanges between elements of the current mapping vector until elements belonging to the same dimension are grouped together. Let us consider a matrix A[2^b1, 2^b2] stored in an n*n (n = 2^e) processor array, and assume that b1 and b2 are greater than e. Due to the storage scheme used, the (b1-e) elements of the mapping vector describing the first dimension are placed 2*e positions to the left, as indicated in vector 1.

((b2-1)_2 ... e_2 (b1-1)_1 ... e_1 / (e-1)_2 ... 1_2 0_2 / (e-1)_1 ... 1_1 0_1)   (1)

After performing the necessary exchanges the elements are moved to the required positions and the mapping vector 2 is obtained.

((b2-1)_2 ... e_2 (e-1)_2 ... / ... 1_2 0_2 (b1-1)_1 ... e_1 / (e-1)_1 ... 1_1 0_1)   (2)

The application of LINEARISE facilitates the generation of the coordinates of individual elements from the index of the cell where they are stored, because it groups the bits representing each coordinate. As an example of the exchanges in the mapping vector, consider an array A[6,4] stored in a 4*4 processor array. The sequence of exchanges that can be used to transform the original mapping vector into the ideal mapping vector for A is:

original vector:                      (2_1 / 1_2 0_2 / 1_1 0_1)
1st exchange:                         (1_2 / 2_1 0_2 / 1_1 0_1)
2nd exchange (final mapping vector):  (1_2 / 0_2 2_1 / 1_1 0_1)

LINEARISE can be performed by cyclically shifting to their final position the elements that are separated. Cyclical shifts can be executed by exchanging successive vector elements of the group to be shifted with their leftmost element. For the general case, consider an array A[2^b1, 2^b2, ..., 2^bt] stored in an n*n (n = 2^e) processor array, and assume that b1 <= 2*e. Using the more compact notation, LINEARISE is described by:

∏_{m=0}^{e-1} ( ∏_{j=m}^{e-1} R(j,2; m,3) ∏_{k=m-e}^{m-1} R(k,3; m,3) ).

When expanding the formula above, it is assumed that if an index is greater than the upper limit then no exchange is performed, and that negative indices are taken as 0.

If b1 > 2*e then the operation is performed in two phases. The first one consists of (b1 - 2*e) exchanges. The description in compact notation is:

∏_{i=1}^{b1-2*e} ( ∏_{j=1}^{e} R(b1-e-j-i+1, 3; b1-2*e-i, 3) ).

The remaining e stages composing the second phase are described by:

∏_{i=1}^{e} ( ∏_{j=1}^{i} R(e-j, 3; e-i, 2) ∏_{k=1}^{i-1} R(e-k, 2; e-i, 2) ).

Reference [2] shows that, for the case above where b1 <= 2*e, the upper bound on the time spent routing data during LINEARISE is O(planes*n*log2 n). The variable planes contains the number of cell planes occupied by the array; its value is given by planes = 2^(b1+b2+...+bt)/(n*n).

UNLINEARISE can be considered the inverse of LINEARISE, i.e. an array that is linearised has to be restored to its original form. Therefore, vector 2 is the starting vector and vector 1 is the final one. UNLINEARISE can be executed by cyclically shifting groups of elements of the mapping vector. This implementation is similar to the one used in LINEARISE; the difference is that the shifts are now performed in the opposite direction. For example, consider the same array A[6,4] used in LINEARISE. The following exchanges illustrate how the shifts in the opposite direction are performed:

starting vector:                 (1_2 / 0_2 2_1 / 1_1 0_1)
1st exchange:                    (2_1 / 0_2 1_2 / 1_1 0_1)
2nd exchange (original vector):  (2_1 / 1_2 0_2 / 1_1 0_1)

For the general case, let us again consider the array A[2^b1, 2^b2, ..., 2^bt], stored in an n*n processor array. In order to give the compact notation describing UNLINEARISE it is necessary to consider two cases. If 2*e > b1 > e then the operation of the algorithm can be described by:

∏_{m=1}^{b1-e} ( ∏_{j=1}^{b1-e} R(b1-e-j, 3; b1-e-m, 2) ∏_{k=1}^{2*e-b1+m-1} R(e-k, 2; b1-e-m, 2) ).

A call to UNLINEARISE will be indicated by UNLINEARISE(A). Reference [2] shows that the time spent routing data during this algorithm is O(2*planes*n*log2 n).
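The cyclic shifts used by LINEARISE and UNLINEARISE can be pictured in a few lines of Python (our illustration; the function name is ours). Moving the leftmost element of a group rightwards by successive exchanges reproduces the A[6,4] example:

```python
# Sketch (our illustration): rotate a group of mapping-vector elements by
# successive exchanges, as in the A[6,4] LINEARISE example.
def shift_group(mapping, lo, hi):
    """Move mapping[lo] to position hi by successive exchanges with its neighbours."""
    for k in range(lo, hi):
        mapping[k], mapping[k + 1] = mapping[k + 1], mapping[k]
    return mapping

v = ["2_1", "1_2", "0_2", "1_1", "0_1"]   # vector 1 of the A[6,4] example
print(shift_group(v, 0, 2))               # -> ['1_2', '0_2', '2_1', '1_1', '0_1']
```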

4.2. GLOBAL_RANK

Let us define the global rank of an array element as the number of selected array elements with a smaller index. Consider two arrays A and L containing M elements (L is a logical array and must be conformable with A). Let us assume that j elements of L (0 <= j <= M) are True, and the rest are False. The selected elements of A are the ones for which the corresponding element of L is True. GLOBAL_RANK stores the global rank of each selected element of the array A in its associated tag field. As an example consider the 5*3 array stored in a 3*3 processor array illustrated in figure 1, where an asterisk over an element indicates a selected element. Figure 1 shows the ranks of the selected elements. It should be noted that, due to the storage scheme used, elements of consecutive ranks may appear in different planes.

Cell plane 0:            Cell plane 1:
A*(0)  F      K          D      I  N
B      G      L*(3)      E*(1)  J  O
C      H*(2)  M          -      -  -

Figure 1
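Sequentially, the global rank is an exclusive prefix count over the logical array; a minimal Python sketch (ours, reproducing the ranks of figure 1):

```python
# Sketch (our illustration): the global rank of a selected element is the
# number of selected elements with a smaller linear index - an exclusive
# prefix count over the logical array L.
def global_rank(L):
    ranks, count = [], 0
    for selected in L:          # L in linear element order
        ranks.append(count if selected else None)
        count += selected
    return ranks

# Selected elements A, E, H, L of the 15-element example receive ranks 0..3.
L = [i in (0, 4, 7, 11) for i in range(15)]
print(global_rank(L))
```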


The algorithm RANK presented in [4] is the operation used to find the local rank of elements within a plane. Assume that LINEARISE was applied to the array L. The global rank of each element of A is then equal to its local rank plus the maximum rank of the preceding plane increased by one. The procedure presented below calculates the global ranks.

Procedure GLOBAL_RANK ( A, L ) ;
/* GLOBAL VARIABLES :
   planes : planes occupied by A;
   Tag( , ,planes) : resulting ranks;
   e : 2^e = n, the number of PEs per dimension. */
BEGIN
  M := 0 ; /* maximum rank in each plane */
  FOR p := 0 TO ( planes - 1 ) DO
  BEGIN

    SEL( , ) := L( , ,p) ;        /* selected elements */
    RANK ( 2*e ) ;                /* ranks elements of plane p */
    Tag( , ,p) := H( , ) + M ;    /* store results */
    M := MAXIMUM ( H( , ) ) WHERE L( , ,p) ;
    M := M + 1 ;
  END ;
END ; /* of GLOBAL_RANK */

A description of the function MAXIMUM is presented in [2]. The procedure GLOBAL_RANK involves arithmetic and routing operations performed within the RANK operation. According to [4] the number of arithmetic operations in RANK is O(2*e) and the number of single routing operations is O(4*(n-1)). So in GLOBAL_RANK the total time spent in the PE is equal to O(2*e*planes) and the number of single routing operations is O(4*planes*(n-1)).

4.3. CONCENTRATE AND DISTRIBUTE

These algorithms are presented in [4]. They will be used to route array elements to the positions defined by the Tag field, within the planes of PEs. Examples of these algorithms are presented here. As an example of CONCENTRATE consider a vector A = (a, b*, c, d, e*, f, g*, h), where the asterisks indicate selected records. The ranks of the selected elements are (-, 0, -, -, 1, -, 2, -). The result of CONCENTRATE is A = (b, e, g, -, -, -, -, -). The procedure DISTRIBUTE is the reverse of CONCENTRATE. As an example, consider the arrays A = (a, b, c, d, -, -, -, -) and Tag = (0, 1, 3, 7, -, -, -, -). The result of DISTRIBUTE is A = (a, b, -, c, -, -, -, d). The PE time in these operations is O(2*e) and the number of single route operations is O(4*(n-1)) [4].
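The sequential meaning of the two operations can be sketched in Python (our illustration, reproducing the examples above):

```python
# Sketch (our illustration): CONCENTRATE gathers selected elements to the
# front in rank order; DISTRIBUTE scatters them back to the Tag positions.
def concentrate(A, ranks, size):
    out = [None] * size
    for x, r in zip(A, ranks):
        if r is not None:
            out[r] = x
    return out

def distribute(A, tags, size):
    out = [None] * size
    for x, t in zip(A, tags):
        if t is not None:
            out[t] = x
    return out

print(concentrate(list("abcdefgh"), [None, 0, None, None, 1, None, 2, None], 8))
# -> ['b', 'e', 'g', None, None, None, None, None]
print(distribute(list("abcd"), [0, 1, 3, 7], 8))
# -> ['a', 'b', None, 'c', None, None, None, 'd']
```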

5. DESCRIPTION OF THE ALGORITHMS

5.1. COMPRESS

First, let us present a definition of the function COMPRESS. Let A and L be two arrays of t dimensions containing M elements. L is a logical array and must be conformable with A. Let us assume that j elements of L (0 <= j <= M) are true, and the rest are false. The function COMPRESS creates a vector V containing j elements, by selecting all elements of A for which the corresponding element of L is true. The chosen elements of A will appear in the vector V in the same subscript order that they are stored in A. The operation will be denoted by COMPRESS(A,L,V). As an example assume the arrays A[3,3] and L[3,3] (figures 2.a and 2.b). In the array L, 1 and 0 are used to indicate TRUE and FALSE respectively. Figure 2.c shows the vector V generated by the operation COMPRESS(A,L,V).

A D G        0 0 0        C
B E H        0 1 0        E
C F I        1 0 1        I

(a) Array A  (b) Array L  (c) Vector V
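The sequential meaning of the operation is simple to state; a minimal Python sketch (ours, using the column-major element order of figure 2):

```python
# Sketch (our illustration): the sequential meaning of COMPRESS(A, L, V) -
# select the elements of A whose mask entry is True, in subscript order.
def compress(A, L):
    return [a for a, keep in zip(A, L) if keep]

# Figure 2 in column-major element order.
A = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
L = [0, 0, 1, 0, 1, 0, 0, 0, 1]
print(compress(A, L))   # -> ['C', 'E', 'I']
```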

Figure 2

Algorithm COMPRESS

Let us assume a record R containing the arrays A, L and Tag. Initially, COMPRESS rearranges R by applying LINEARISE. This puts the elements of A, which eventually will be moved to V, in the correct order. The relative position in V of each selected element of A is then calculated by GLOBAL_RANK, which leaves the results in Tag. Then the process of routing the elements to their final destination position starts. The first WHILE loop, beginning from plane 0, looks for the first element of A that will have to be moved to plane 0 of V. Once an element is found this loop is terminated and the next WHILE loop is entered. The first test within the second WHILE loop selects all elements in the current plane of A (indicated by the variable s) belonging to the current target plane of V (variable q). The test is performed by comparing the variable q with the set of bits from the index of the element stored in Tag that indicates the plane. This set of bits is obtained by dropping the 2*e least significant bits (which indicate row and column) from the index and considering only the remaining ones. These selected elements are moved to the record G, so that the procedure CONCENTRATE can route them to their correct position within the plane of PEs. Finally they are transferred to V. Since each plane of A may contain elements of more than one plane of V, another test is performed. If elements belonging to the next plane of V are found, then the variable q is incremented and the loop is restarted. If no more elements are found the search has to move to the next plane of A (variable s is incremented). Note that in this last case q is not incremented because more elements belonging to the current plane of V might still be found.

Procedure COMPRESS ( A, V, L )
/* GLOBAL VARIABLES:
   e : log2 n (n*n = array size);
   t : number of dimensions of A;
   b[t] : bits necessary to represent each dim of A;
   planes : planes occupied by A. */
BEGIN
  s := 0 ; /* plane in source array */
  q := 0 ; /* plane in compressed array */
  G.Tag( , ) := ∞ ; /* used by CONCENTRATE */
  bb := b[1] + b[2] + ... + b[t] ;
  IF b[1] > e THEN LINEARISE ( R ) ;
  GLOBAL_RANK ( A, L ) ;
  WHILE (NOT (ANY ((Tag( , ,s)(bb-1):(2*e)) = q))) AND (s < planes) DO
    s := s + 1 ;
  WHILE s < planes DO
  BEGIN
    IF ANY ((Tag( , ,s)(bb-1):(2*e)) = q) THEN
    BEGIN

      G( , ) := R( , ,s) WHERE (Tag( , ,s)(bb-1):(2*e)) = q ;
      CONCENTRATE ( 2*e ) ;
      V( , ,q) := G.A( , ) WHERE G.Tag( , ) <> ∞ ;
      G.Tag( , ) := ∞ ; /* invalid addresses */
    END ;
    IF ANY ((Tag( , ,s)(bb-1):(2*e)) = (q+1)) THEN
      q := q + 1
    ELSE
      s := s + 1 ;
  END ; /* of WHILE */
END ; /* of COMPRESS */

The complexity of COMPRESS can be obtained by adding the complexities of LINEARISE, GLOBAL_RANK and the time spent in the WHILE loops (table 1). Note that at most two destination planes can exist in each plane of A. Also note that the first plane of A can only have one destination plane. Thus, considering the worst possible case, the first loop is executed once and the second one (2*planes - 1) times.

Table 1. Complexity of COMPRESS.

OPERATION     ROUTING                   PE
LINEARISE     planes*n*log2 n           -
GLOBAL_RANK   4*planes*(n-1)            2*e*planes
WHILE loop    4*(2*planes-1)*(n-1)      2*e*(2*planes-1)

So the time spent in the PE is O(2*e*(3*planes-1)) and the time spent routing data is given by O(planes*n*log2 n + 4*(n-1)*(3*planes-1)). If we compare the results obtained above with the time complexity of CONCENTRATE, we will note that the time spent in the PEs is increased by a factor that is proportional to the number of planes occupied by the array. The time spent routing data exhibits an extra overhead introduced by the conditional LINEARISE operation. It should be noted that the time complexity of CONCENTRATE assumes that the array is already ranked.
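The plane test used in the loops above, written Tag( , ,s)(bb-1):(2*e) = q, is just a bit-field extraction; a tiny Python sketch (ours, with made-up values):

```python
# Sketch (our illustration): the destination of a ranked element is decoded
# from its Tag by splitting off the 2*e least significant bits.
e = 2                                      # 4x4 processor array
rank = 0b10110                             # hypothetical global rank in Tag
plane = rank >> (2 * e)                    # bits (bb-1):(2*e): plane of V
pos_in_plane = rank & ((1 << (2 * e)) - 1) # row and column within the plane
print(plane, pos_in_plane)                 # -> 1 6
```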

5.2. EXPAND

Let A, V and L be defined as in the function COMPRESS. EXPAND(V,A,L) creates an array A from the vector V by assigning the successive elements of V, in subscript order, to the elements of A whose corresponding elements of L are true. The algorithm is best described by considering an example. In the example we have a 3 by 3 processor array, a vector V containing 4 elements (fig 3.a) and an array L[3,3] (fig 3.b). EXPAND first linearises the array L, which in this case is already in the final form. Then, an array B is created (fig 3.c) containing the addresses of the cells (standard array reference method, see section 4) to which the elements of V will be moved. In the next step, COMPRESS is used to move these addresses to the Tag field associated with each element of V (fig 3.d). Then the process of routing the elements to their final destination starts. Each plane of V is searched in sequence for elements belonging to A. This search process is similar to the one executed in

COMPRESS. In the example (fig 3.d) elements are moved to G and then DISTRIBUTE is executed. The final result is shown in figure 3.e.

A D -        1 0 0        0,0  -  -
B - -        1 0 0        1,0  -  -
C - -        1 0 1        2,0  -  2,2

(a) Vector V (b) Array L  (c) Array B

A/0,0  D/2,2  -           A - -
B/1,0  -      -           B - -
C/2,0  -      -           C - D

(d) COMPRESS (e) Result

Figure 3
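The sequential meaning of the operation, in the column-major element order of figure 3, can be sketched in Python (our illustration):

```python
# Sketch (our illustration): the sequential meaning of EXPAND(V, A, L) -
# assign successive elements of V, in subscript order, to the True cells of L.
def expand(V, L, default=None):
    it = iter(V)
    return [next(it) if keep else default for keep in L]

# Figure 3 in column-major order: L is True at indices 0, 1, 2 and 8.
V = ["A", "B", "C", "D"]
L = [1, 1, 1, 0, 0, 0, 0, 0, 1]
print(expand(V, L))   # -> ['A', 'B', 'C', None, None, None, None, None, 'D']
```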

Algorithm EXPAND

Procedure EXPAND ( V, A, L )
/* GLOBAL VARIABLES:
   i : index of each PE;
   e : log2 n (n*n is the processor array size);
   t : number of dimensions of A;
   b[t] : bits necessary to represent each dim of A;
   planes : planes occupied by A;
   planes_V : planes occupied by V;
   default : variable of the same type as V. */
BEGIN
  q := 0 ; /* plane in expanded array */
  s := 0 ; /* plane in source vector V */
  bb := b[1] + b[2] + ... + b[t] ;
  FOR p := 0 TO planes DO B( , ,p) := {p|i} ;
  COMPRESS (B, Tag, L) ;
  WHILE s < planes_V DO
  BEGIN
    IF ANY ((Tag( , ,s)(bb-1):(2*e)) = q) THEN
    BEGIN
      G( , ) := R( , ,s) WHERE (Tag( , ,s)(bb-1):(2*e)) = q ;
      DISTRIBUTE ( 2*e ) ;
      A( , ,q) := G( , ) WHERE G.Tag( , ) <> ∞ ;
      A( , ,q) := default WHERE G.Tag( , ) = ∞ ;
      G.Tag( , ) := ∞ ;
    END ;
    IF ANY ((Tag( , ,s)(bb-1):(2*e)) = (q+1)) THEN
      q := q + 1
    ELSE
      s := s + 1 ;
  END ; /* of WHILE */
  UNLINEARISE(A) ;
END ; /* of EXPAND */

The complexity of EXPAND can be obtained by adding the complexities of COMPRESS and the time spent in the WHILE and FOR loops (table 2). Considering the worst possible case, the WHILE loop is executed (planes + planes_V - 1) times.

Table 2. Complexity of EXPAND.

OPERATION     ROUTING                                   PE
FOR loop      -                                         planes
COMPRESS      planes*n*log2 n + 4*(n-1)*(3*planes-1)    2*e*(3*planes-1)
WHILE loop    4*(planes+planes_V-1)*(n-1)               2*e*(planes+planes_V-1)
UNLINEARISE   2*planes*n*log2 n                         -


The total time spent in the PEs is O(planes + 4*e*(2*planes-1) + 2*e*planes_V). The time spent routing data is given by O(3*planes*n*log2 n + 4*(n-1)*(4*planes+planes_V-2)). The results obtained above show that the time complexity of EXPAND, when compared to DISTRIBUTE, suffers an increase that is still proportional to the number of planes.

6. CONCLUSIONS

In this work a scheme to store arrays that allows the implementation of efficient algorithms was proposed. Also, algorithms to manipulate multi-dimensional arrays on two-dimensional processor arrays were developed. Earlier array languages introduce constraints when processing arrays; the algorithms presented here lift these restrictions. The delays introduced by these algorithms are proportional to the number of planes occupied by the array. There is also an additional overhead in preparing the data for processing.

REFERENCES

[1] Jesshope C R, Rushton A J, Cruz A J O, and Stewart J M (1987), "The Structure and Application of RPA - A Highly Parallel Adaptive Architecture", in Highly Parallel Computers, eds G L Reijns and M H Barton, Elsevier Science Publishers, pp 81-95.
[2] Cruz A J O (1988), "The Design of a Control Unit and Parallel Algorithms for a SIMD Computer", Ph.D. Thesis, Southampton University.
[3] Flanders P M (September 1982), "A Unified Approach to a Class of Data Movements on an Array Processor", IEEE Transactions on Computers, vol. C-31, no. 9, pp 809-819.
[4] Nassimi D and Sahni S (February 1981), "Data Broadcasting in SIMD Computers", IEEE Transactions on Computers, vol. C-30, no. 2, pp 101-107.