Control Eng. Practice, Vol. 1, No. 4, pp. 635-643, 1993. © 1993 Pergamon Press Ltd. Printed in Great Britain. All rights reserved.

PARALLEL ALGORITHMS FOR CONTROL

G.W. Irwin

Control Research Group, Department of Electrical and Electronic Engineering, The Queen's University of Belfast, Belfast BT9 5AH, UK

Abstract: Two classes of concurrent algorithm for real-time Kalman filtering are presented in order to illustrate some algorithm engineering concepts. One is based on systolic computation, and fine-grained algorithms are derived for both regular and square-root covariance Kalman filtering. The second relies on an analogue flow diagram description of the differential equations for continuous-time Kalman-Bucy filtering. Implementations on Inmos transputers are described in both cases.

Keywords: Control engineering, Computational methods, Kalman filters, Systolic arrays, Analogue flow diagrams, Transputers.

1. INTRODUCTION

Advances in automatic control, both theoretical and practical, have been stimulated by corresponding developments in computer technology. The increasing use of digital computers in the 1960s space programme was accompanied by advances in state-space control theory while, more recently, the introduction of low-cost microprocessors facilitated widespread application of digital control loops and a corresponding development of adaptive control techniques to exploit the new technology. In each case the technology helped the transfer of advanced control theory into practical applications and, at the same time, stimulated further theoretical advances. Parallel processing is likely to have the same impact as earlier developments in computer technology. This was recognised by the Institute of Electrical and Electronic Engineers (IEEE, 1987) in a strategy document which attempted to predict significant future trends and identify areas where effort was needed.

Increased computational speed is of course the primary benefit of parallel processing, particularly for real-time control. This allows faster systems to be controlled and gives the control engineer the choice of added complexity in the control algorithm. Easy expansion, within a uniform hardware and software base, is another feature of concurrent control systems, since it is possible to add more processors as required, with significant implications for reduced development and maintenance costs. Parallel processing also offers a natural relationship between the control system description at the design stage, in the form of a block diagram, and the final hardware implementation. Finally, fault tolerance can be realised in a parallel control system by organising the computation in a distributed sense, allowing performance degradation rather than a complete breakdown in the event of processor failure. These advantages have been recognised and an increasing number of successful implementations of concurrent controllers are being reported (Irwin and Fleming, 1992; Rogers and Li, 1992).

The remit of this paper was to introduce some recent research themes on parallel algorithms for control. Examination of the literature shows that effort in this area falls into the two broad categories of CACSD and real-time control. The former is motivated by the need to produce efficient, reliable and portable tools for many of the common control design procedures. Thus Laub and Gardiner (1988) describe parallel algorithm implementations for a 32-processor Intel iPSC Hypercube, suitable for the analysis and control design of large-scale systems with state dimensions of several hundreds or thousands. Other work is motivated by a desire to accelerate workstation-based design tools, like Matlab, by producing parallel algorithms for computationally intensive bottlenecks. Thus Megson (1990) identified key mathematical techniques in H∞ design, like the singular value decomposition, and implemented parallel solutions on transputers. Fleming et al. (1992) used transputers to calculate the objective function in multiobjective optimisation.

While CACSD involves off-line computation, on-line control is characterised by the time-critical nature of the control signal calculations. It was the need for systematic procedures for mapping control algorithms onto concurrent architectures suitable for the implementation of real-time controllers which motivated the work presented below. Taking real-time Kalman filtering as a unifying theme, the paper presents two approaches to parallel algorithms. One is based on systolic arrays for fine-grained matrix calculations; the other is suitable for continuous-time algorithms and follows from analogue flow diagram representations of a system. Reference is made to real-time transputer implementations in each case, and an attempt is made to unify the results by a discussion of algorithm engineering concepts.

2. SYSTOLIC ALGORITHMS FOR MATRIX CALCULATIONS

A systolic array (Kung and Leiserson, 1978) is an array of individual processing cells, each of which has some local memory and is connected to its nearest neighbours in the form of a regular lattice. The term "systolic" refers to the rhythmic movement of data across the array, which is analogous to the regular pumping action of the heart. Figure 1 shows a linear array of inner-step processors for the matrix-vector product y = Ax. The y_i, initialised at zero, are clocked from left to right and accumulate the necessary partial products formed by the interaction of the x_j and a_ij data streams. The extension to an orthogonal array for a matrix-matrix product C = AB, as shown in Fig. 2, is relatively straightforward. The original concept was proposed in order to exploit the high-speed switching potential of VLSI technology in fast, real-time, matrix-based signal processing applications. Systolic structures thus combine the necessary concurrent processing with simplicity of circuit design through cell replication across the silicon.

Fig. 1. Linear array for systolic matrix-vector product (each inner-step cell accumulates c_out := c_in + a.b).

Fig. 2. Orthogonal array for systolic matrix-matrix product.

Systolic arrays can also be regarded as descriptions of fine-grained, concurrent algorithms. Thus, Fig. 3 illustrates a triangular array for the QR decomposition of a matrix A, using Givens rotations. The boundary cells calculate the rotation parameters, which are passed to the internal cells that operate on the a_ij.
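To make the data movement of Fig. 1 concrete, the following minimal Python/NumPy sketch (not taken from the paper; the cell naming and skewing convention are illustrative assumptions) simulates the linear array cycle by cycle. Cell j holds x_j, the partial sums y_i are clocked from left to right, and the a_ij stream is skewed so that a_ij reaches cell j on the same cycle as the partial sum for row i.

```python
import numpy as np

def systolic_matvec(A, x):
    """Cycle-accurate simulation of a linear systolic array for y = A x.

    Cell j holds the stationary operand x[j]; partial sums are clocked left
    to right and accumulate A[i, j] * x[j] as they pass cell j.  The A[i, j]
    stream is skewed so that A[i, j] arrives at cell j on cycle i + j.
    """
    A = np.asarray(A, dtype=float)
    n_rows, n_cols = A.shape
    cells = np.asarray(x, dtype=float)   # stationary operands x_j
    pipeline = [None] * n_cols           # partial sum currently held in each cell
    results = []

    for t in range(n_rows + n_cols):
        # Shift right: whatever leaves the last cell is a finished y_i.
        out = pipeline[-1]
        if out is not None:
            results.append(out)
        pipeline = [None] + pipeline[:-1]
        # A fresh partial sum (initialised at zero) enters cell 0 on cycle i.
        if t < n_rows:
            pipeline[0] = 0.0
        # Each occupied cell performs one inner-product step.
        for j in range(n_cols):
            i = t - j                    # row index arriving at cell j now
            if pipeline[j] is not None and 0 <= i < n_rows:
                pipeline[j] += A[i, j] * cells[j]
    return np.array(results)

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])
assert np.allclose(systolic_matvec(A, x), A @ x)
```

After the pipeline drains (n_rows + n_cols - 1 cycles) the collected outputs equal A @ x, which is the point of the systolic schedule: once the array is full, every cell does useful work on every cycle.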

Fig. 3. Triangular array for systolic QR decomposition (boundary cells compute m_out := (m^2 + x_in^2)^{1/2}, c := m/m_out, s := x_in/m_out; internal cells compute x_out := c_in.x_in - s_in.m, m := s_in.x_in + c_in.m, and pass on c_out := c_in, s_out := s_in).

3. SYSTOLIC ALGORITHMS FOR DISCRETE KALMAN FILTERING

The basic systolic arrays above have been used to construct architectures for more complex computations. It is useful to omit the details of the cell descriptions, and to associate the function of the array with its geometrical shape, in order to gain an understanding of the algorithms involved. This section will describe algorithms for covariance Kalman filtering. This is a key technique in both signal processing and modern control and is generic, in the sense that the algorithm is closely related to LQ optimal control, to least-squares parameter estimation, to parameter tracking and to adaptive control. Thus the VLSI algorithms developed here are more generally applicable. The discrete-time system is described by:

x(k+1) = A(k)x(k) + B(k)u(k) + w(k)    (1)

z(k) = C(k)x(k) + v(k)    (2)

where w(k), v(k) are zero-mean, independent, Gaussian white noise sequences with covariance matrices W(k) and V(k) respectively. The optimal state estimation problem is to estimate the n x 1 state vector x(k) from a sequence of noisy m x 1 measurements z(k), in such a way that the mean-square estimation error is minimised.
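As a concrete reference for equations (1) and (2), here is a small NumPy sketch (not from the paper; the second-order system matrices and noise levels are invented purely for illustration) that generates a state trajectory and the noisy measurements z(k) on which the filters below operate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative second-order system; A, B, C, W and V are NOT from the paper.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
C = np.array([[1.0, 0.0]])
W = 0.01 * np.eye(2)     # process noise covariance W(k)
V = 0.04 * np.eye(1)     # measurement noise covariance V(k)

def simulate(steps, u=1.0):
    """Generate states x(k) and measurements z(k) from equations (1)-(2)."""
    x = np.zeros(2)
    xs, zs = [], []
    for _ in range(steps):
        w = rng.multivariate_normal(np.zeros(2), W)
        v = rng.multivariate_normal(np.zeros(1), V)
        x = A @ x + B @ np.atleast_1d(u) + w     # equation (1)
        z = C @ x + v                            # equation (2)
        xs.append(x)
        zs.append(z)
    return np.array(xs), np.array(zs)

states, measurements = simulate(100)
```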

3.1 Regular Covariance Filter (RCF)

The RCF algorithm updates the estimated state vector in two stages, prediction (or time update) followed by filtering (or measurement update), as follows.

Prediction or time update:

x̂(k+1/k) = A(k)x̂(k/k) + B(k)u(k)    (3)

P(k+1/k) = A(k)P(k/k)A^T(k) + W(k)    (4)

Filtering or measurement update:

V_e(k+1) = C(k)P(k+1/k)C^T(k) + V(k)    (5)

K(k+1) = P(k+1/k)C^T(k+1)V_e^{-1}(k+1)    (6)

x̂(k+1/k+1) = x̂(k+1/k) + K(k+1)[z(k+1) - C(k+1)x̂(k+1/k)]    (7)

P(k+1/k+1) = P(k+1/k) - K(k+1)C(k+1)P(k+1/k)    (8)

The matrix-based nature of this algorithm makes it computationally intensive, with O(n^3) calculations to be performed in each iteration, but this same structure makes it ideal for the application of systolic arrays.

The approach taken employs the Fadeev algorithm for calculating the Schur complement of a matrix (Fadeev and Fadeeva, 1963). Thus suppose A, B, C and D are known matrices and the matrix quantity D + CA^{-1}B is required. The compound matrix

[ A   B ]
[ -C  D ]    (9)

is first formed and a linear combination of row 1 is added to row 2 as follows:

[ A        B    ]
[ -C+WA  D+WB ]    (10)

Now, if W is chosen so that the (2,1) element is null, i.e. the (2,1) element is reduced to zero by simple row manipulation, then the required result D + CA^{-1}B appears in the (2,2) position. For systolic computation it is convenient if the matrix A is triangularised first, prior to nullification of C:

[ A   B ]  -- triangularisation -->  [ TA  TB ]  -- nullification -->  [ TA   TB           ]
[ -C  D ]                            [ -C  D  ]                        [ 0    D + CA^{-1}B ]    (11)

The Fadeev algorithm can be performed on a trapezoidal array as shown in Fig. 4, which consists of the basic triangular array for operating on A, together with a rectangular section for performing the operations required on B. Note also the dual mode of operation of the cells.

Fig. 4. Trapezoidal array for systolic Schur complement (the cells have two modes: a triangularisation mode using Givens rotations as in Fig. 3, and a nullification mode in which m_out := m, s := x_in/m_out and x_out := x_in - s_in.m).
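As a quick illustration of equation (11), the sketch below (a dense NumPy stand-in, not the systolic implementation; the test matrices are invented) performs the same two phases: A is first triangularised by an orthogonal transformation, and the -C block is then nullified by row combinations, so that D + CA^{-1}B appears as the fill-in of the (2,2) block.

```python
import numpy as np

def fadeev_schur(A, B, C, D):
    """Compute D + C A^{-1} B by the two phases of equation (11).

    Phase 1 (triangularisation): an orthogonal Q^T reduces A to the upper
    triangular TA, and the same transformation is applied to B to give TB.
    Phase 2 (nullification): rows of [TA TB] are combined with the lower
    rows so that the -C block becomes zero; the fill-in that then appears
    in the (2,2) block is the Schur complement D + C A^{-1} B.
    """
    Q, R = np.linalg.qr(A)             # A = Q R with R upper triangular
    TA, TB = R, Q.T @ B                # triangularised top block [TA TB]
    # Nullification: choose W so that -C + W @ TA = 0, i.e. W = C TA^{-1};
    # the bottom-right block then becomes D + W @ TB = D + C A^{-1} B.
    W = np.linalg.solve(TA.T, C.T).T   # solves W TA = C without forming TA^{-1}
    return D + W @ TB

# Quick check against the direct expression (illustrative matrices only).
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 3))
D = rng.standard_normal((2, 2))
assert np.allclose(fadeev_schur(A, B, C, D), D + C @ np.linalg.solve(A, B))
```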

Table 1. RCF using the Fadeev algorithm

Step | A        | B                     | C                  | D         | Result
1    | I        | x̂(k/k)               | -A(k)              | B(k)u(k)  | x̂(k+1/k)
2    | I        | P(k/k)                | -A(k)              | 0         | A(k)P(k/k)
3    | I        | A^T(k)                | -A(k)P(k/k)        | W(k)      | P(k+1/k)
4    | I        | C^T(k)                | -P(k+1/k)          | 0         | P(k+1/k)C^T(k)
5    | I        | P(k+1/k)C^T(k)        | -C(k)              | V(k)      | V_e(k+1)
6    | V_e(k+1) | I                     | -P(k+1/k)C^T(k+1)  | 0         | K(k+1)
7    | I        | [P(k+1/k)C^T(k+1)]^T  | -K(k+1)            | P(k+1/k)  | P(k+1/k+1)
8    | I        | x̂(k+1/k)             | C(k+1)             | z(k+1)    | Δz(k+1)
9    | I        | Δz(k+1)               | -K(k+1)            | x̂(k+1/k) | x̂(k+1/k+1)

Now the compound matrix above is 'programmable' in the sense that, by selecting appropriate values for B, C and D, a range of algebraic matrix expressions can be generated by the Fadeev algorithm. Yeh (1988) realised that the trapezoidal array could be used for generating the matrix terms required for a systolic Kalman filter. Table 1 defines the data required for the RCF (Gaston and Irwin, 1990a). Analysis of the computation involved with this systolic filter shows that O(3n^2/2) systolic cells are needed to reduce the timesteps between measurements to O(9n). However, the systolic filter has some undesirable features. Each iteration involves 9 steps, with feedback of intermediate results. Also, closer examination of Table 1 shows that the A matrix is an identity for eight of those steps, implying that implicit inversion of an identity matrix is occurring. Further, steps 2, 4 and 6 involve addition of a null matrix. Essentially, redundancy has been added to facilitate parallel processing.
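The schedule in Table 1 can be checked directly. The sketch below (illustrative only, with a dense solve standing in for the trapezoidal array) runs the nine Fadeev steps and compares the results against equations (3)-(8). Here fadeev(A, B, E, D) returns the (2,2) fill-in D - E A^{-1} B, with E denoting the block actually loaded into the bottom-left position; in step 7 the sign of the loaded block is chosen so that equation (8) is recovered.

```python
import numpy as np

def fadeev(A, B, E, D):
    """Fadeev primitive: load [[A, B], [E, D]], triangularise A, nullify E.
    The fill-in of the (2,2) block is D - E A^{-1} B (dense solve used here)."""
    return D - E @ np.linalg.solve(A, B)

def rcf_iteration(xh, P, A, B, C, W, V, u, z):
    """One RCF iteration scheduled as the nine Fadeev steps of Table 1.
    xh = x_hat(k/k), P = P(k/k); u and z are column vectors."""
    n, m = A.shape[0], C.shape[0]
    I_n, I_m = np.eye(n), np.eye(m)
    xp  = fadeev(I_n, xh,    -A,   B @ u)             # 1: x_hat(k+1/k)
    AP  = fadeev(I_n, P,     -A,   np.zeros((n, n)))  # 2: A(k)P(k/k)
    Pp  = fadeev(I_n, A.T,   -AP,  W)                 # 3: P(k+1/k)
    PCt = fadeev(I_n, C.T,   -Pp,  np.zeros((n, m)))  # 4: P(k+1/k)C^T
    Ve  = fadeev(I_n, PCt,   -C,   V)                 # 5: V_e(k+1)
    K   = fadeev(Ve,  I_m,   -PCt, np.zeros((n, m)))  # 6: gain K(k+1)
    # 7: P(k+1/k+1); loaded blocks signed so that equation (8) is recovered
    Pf  = fadeev(I_m, -PCt.T, -K,  Pp)
    dz  = fadeev(I_n, xp,     C,   z)                 # 8: innovation Δz(k+1)
    xf  = fadeev(I_m, dz,    -K,   xp)                # 9: x_hat(k+1/k+1)
    return xf, Pf

# Consistency check against equations (3)-(8) (toy data, not from the paper).
rng = np.random.default_rng(2)
n, m = 3, 2
A  = 0.5 * rng.standard_normal((n, n))
Bm = rng.standard_normal((n, 1))
C  = rng.standard_normal((m, n))
W, V = 0.1 * np.eye(n), 0.2 * np.eye(m)
xh, P = rng.standard_normal((n, 1)), np.eye(n)
u, z  = np.ones((1, 1)), rng.standard_normal((m, 1))

xf, Pf = rcf_iteration(xh, P, A, Bm, C, W, V, u, z)

xp = A @ xh + Bm @ u
Pp = A @ P @ A.T + W
K  = Pp @ C.T @ np.linalg.inv(C @ Pp @ C.T + V)
assert np.allclose(xf, xp + K @ (z - C @ xp))          # equation (7)
assert np.allclose(Pf, Pp - K @ C @ Pp)                # equation (8)
```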

3.2 Square Root Covariance Filter (SRCF)

Numerical errors with regular Kalman filters cause the error covariance matrix P to lose its symmetry, and the filter can 'blow up', causing the state estimates to diverge. Square-root Kalman filters force the P matrix to remain symmetric by propagating its triangular factors or square roots. The equations below define the SRCF due to Morf and Kailath (1975):

Q(k) [ P^{T/2}(k/k-1)C^T(k)   P^{T/2}(k/k-1)A^T(k) ]   [ V_e^{T/2}(k)   V_e^{-1/2}(k)C(k)P(k/k-1)A^T(k) ]
     [ V^{T/2}(k)             0                    ] = [ 0              P^{T/2}(k+1/k)                  ]    (12)
     [ 0                      W^{T/2}(k)           ]   [ 0              0                               ]

where V_e(k) = C(k)P(k/k-1)C^T(k) + V(k), and

x̂(k+1/k) = A(k)x̂(k/k-1) + B(k)u(k) + A(k)P(k/k-1)C^T(k)V_e^{-1}(k)[z(k) - C(k)x̂(k/k-1)]    (13)

The systolic algorithm is derived from the basic arrays in Figs 1-4. The orthogonal decomposition required in equation (12) can be carried out as shown in Fig. 5, which illustrates the data to be entered and the results stored in memory. Since the order in which the data is entered does not affect the final result, the W^{T/2}(k) and V^{T/2}(k) terms can be preloaded.

Fig. 5. Orthogonal decomposition for SRCF.

The results from the decomposition, residing in memory, are now used to provide the state update. Taking the transpose of equation (13), and dropping the time indices for clarity, produces

x̂^T(k+1/k) = [Ax̂ + Bu]^T + [z - Cx̂]^T V_e^{-T/2} . V_e^{-1/2}CPA^T

This Schur complement can be formed by using the first m rows of the architecture in Fig. 5 as a trapezoidal array for the Fadeev algorithm. Further, since the 'A' term (here V_e^{T/2}) is already triangular, only the second or nullification mode needs to be performed. This produces the split architecture given in Fig. 6.

Fig. 6. Split architecture for SRCF.
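A compact NumPy sketch of equations (12) and (13) is given below, assuming time-invariant matrices and using a dense QR factorisation in place of the systolic Givens array; the variable names and toy data are illustrative. The blocks R11, R12 and Sp of the post-array correspond (up to row signs) to V_e^{T/2}, V_e^{-1/2}CPA^T and P^{T/2}(k+1/k), and the state update is formed from these stored factors alone, which is what the split architecture of Fig. 6 exploits.

```python
import numpy as np

def srcf_step(xh, S, A, B, C, W, V, u, z):
    """One square-root covariance filter step (equations (12)-(13)).

    xh = x_hat(k/k-1); S = P^{T/2}(k/k-1) with P = S.T @ S.  A dense QR
    stands in for the systolic Givens array; W and V are noise covariances.
    Returns x_hat(k+1/k) and the updated factor P^{T/2}(k+1/k).
    """
    n, m = A.shape[0], C.shape[0]
    Vs = np.linalg.cholesky(V).T          # V^{T/2}
    Ws = np.linalg.cholesky(W).T          # W^{T/2}
    # Pre-array of equation (12); Q(k) rotates its rows against each other.
    pre = np.block([[S @ C.T,           S @ A.T],
                    [Vs,                np.zeros((m, n))],
                    [np.zeros((n, m)),  Ws]])
    R = np.linalg.qr(pre, mode="r")       # upper-triangular post-array
    R11, R12 = R[:m, :m], R[:m, m:]       # V_e^{T/2} and V_e^{-1/2} C P A^T
    Sp = R[m:m + n, m:]                   # P^{T/2}(k+1/k)
    # State update (13) from the stored factors: A P C^T V_e^{-1} = R12^T R11^{-T}
    innov = z - C @ xh
    xp = A @ xh + B @ u + R12.T @ np.linalg.solve(R11.T, innov)
    return xp, Sp

# Consistency check against the conventional covariance recursion (toy data).
rng = np.random.default_rng(3)
n, m = 3, 1
A  = 0.9 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
Bm = rng.standard_normal((n, 1))
C  = rng.standard_normal((m, n))
W, V = 0.1 * np.eye(n), 0.2 * np.eye(m)
P  = np.eye(n)
S  = np.linalg.cholesky(P).T
xh, u, z = np.zeros((n, 1)), np.ones((1, 1)), rng.standard_normal((m, 1))

xp, Sp = srcf_step(xh, S, A, Bm, C, W, V, u, z)
Pp_direct = A @ (P - P @ C.T @ np.linalg.solve(C @ P @ C.T + V, C @ P)) @ A.T + W
assert np.allclose(Sp.T @ Sp, Pp_direct)
```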

The matrix products required for the next iteration are formed by inserting a linear array between the trapezoidal and lower triangular sections. This captures the updated state vector, once available, and performs the matrix-vector multiplications. The matrix-matrix products are produced by using the bottom triangular section as a multiplier array, to give the complete SRCF array in Fig. 7. This robust and compact systolic filter compares favourably with others in the literature (Gaston et al., 1990), requiring O(2n) timesteps per iteration.

Fig. 7. Complete systolic SRCF.

4. ALGORITHM ENGINEERING

The term "algorithm engineering" describes the hybrid discipline involving the derivation of stable numerical algorithms suitable for parallel computation, the mapping of these algorithms onto dedicated parallel architectures and the interaction between algorithm and architecture. As mentioned earlier, systolic arrays do not necessarily imply a specific hardware realisation. However, they are fundamental in the sense of providing representations of fine-grained parallel algorithms. They define the calculations to be performed, the order in which these are to be done and the input data required. This viewpoint allowed the detail to be omitted from the basic arrays in order to concentrate on the formation of the more complex systolic SRCF algorithm from these building blocks.

Gaston and Irwin (1990a) have identified four ways in which the basic arrays can be used to form more complex algorithms:

(i) Off the peg. Direct mapping of an algorithm onto a basic array, with no changes to the array. The data must be manipulated so that the algorithm is executed correctly.

(ii) Cut-to-fit. Algorithms are mapped onto a basic array which is customised to take account of the special structure of the data. Here the array is modified, as well as the data flow being specified.

(iii) Ensemble. Several well-known arrays are combined to implement one algorithm. These are linked together, with data flowing from one to the other, to complete the calculations.

(iv) Layer. Two or more basic arrays are mapped into one, which switches mode from one operation to the next.

The RCF algorithm provides a good example of the first approach, while the SRCF algorithm was formed by the 'layer' approach. These definitions were motivated by the notion of manipulating geometrical shapes. McWhirter (1989) takes a more formal approach by associating the arrays with fixed matrix operators which do not change with the data being processed. The more complex systolic algorithms are then expressed in terms of these operators.


The next section reports on transputer implementations of the systolic RCF and SRCF algorithms in order to illustrate further the process of algorithm engineering.

5. TRANSPUTER IMPLEMENTATIONS

Transputer implementation of the systolic filters was studied by Maguire and Irwin (1991a). The simple idea of assigning each systolic cell to one transputer is rejected because of the mismatch in granularity between the algorithm and the target hardware. Practical issues like load balancing, communications requirements and the efficient use of each processor dictate that cells are grouped into coarser processes. The strategy proposed uses the inherent linear schedule of the systolic architecture, which is based on a set of equitemporal hyperplanes, as shown in Fig. 8. The schedule vector S, which defines the sequence of operations of the array, points in a direction normal to these hyperplanes, since the computations along a hyperplane can be performed simultaneously. If the array is treated as a wavefront array, the data-driven structure allows the schedule vector to be resolved into orthogonal components. The architecture can then be mapped efficiently into a linear array of row- or column-processes by projecting parallel to one of these schedule vectors and using the concurrency in the resulting pipeline.

Fig. 8. Mapping systolic RCF into row-processes.

For these algorithms, projection was done in the horizontal direction due to the geometry of the arrays and the larger computational requirements of the boundary cells. Fig. 8 illustrates this for a 4th-order RCF array. The resulting structure consists of a pipeline of row-processes in which the original systolic cells are coalesced, thereby removing their systolic operation. One advantage of this approach is that the communication requirements are reduced to passing vectors of data between row-processes, which can be implemented efficiently on transputers. The second, and final, stage in the mapping strategy assigns these row-processes to transputers. Fig. 9 shows a one-to-one assignment for a RCF example, while Fig. 10 illustrates assigning multiple row-processes for a SRCF example. This method proved to be more efficient than a technique suggested by Stewart (1988), which involved folding of the systolic array, much like paper, down to the size of the target hardware architecture.

Fig. 9. Assignment of RCF row-processes to transputers.

Fig. 10. Assignment of SRCF row-processes to transputers.

Transputer implementation results, on an example problem of varying state and measurement dimensions, confirm that the conventional filters run faster than the systolic algorithms on a single transputer. The difference was more pronounced for the RCF because of the redundant calculations introduced into the systolic RCF. Heuristic partitioning of the sequential algorithms is restricted to two processors because of the difficulty in extracting parallelism, and marginal speedups of about 1.1 can be obtained. However, the systolic array approach facilitates the introduction of more processors, allowing significant performance improvements over the single transputer implementations for the more computationally intensive SRCF filter. For both filters, mapping efficiencies improve with higher problem orders because of increasing task granularity.
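The flavour of this two-stage mapping can be shown in software. The sketch below is an illustration rather than the authors' occam implementation: it projects the triangular QR array of Fig. 3 into a pipeline of row-processes, each coalescing the cells of one row of the array, holding one row of R and passing only a vector of rotated data to its successor, so that inter-process traffic is reduced to vectors as described above. Python threads and queues stand in for transputers and their links.

```python
import threading
import queue
import numpy as np

def row_process(i, stored, q_in, q_out):
    """Row-process i of the projected triangular QR array.

    It holds row i of R.  Each incoming vector is rotated against the stored
    row (a Givens rotation that zeroes the vector's leading element); the
    rotated tail is passed on to row-process i + 1.
    """
    while True:
        x = q_in.get()
        if x is None:                  # shutdown token, forwarded downstream
            q_out.put(None)
            return
        r = stored[i]
        a, b = r[0], x[0]
        denom = np.hypot(a, b)
        if denom > 0.0:
            c, s = a / denom, b / denom
            stored[i] = c * r + s * x  # updated row of R
            x = -s * r + c * x         # rotated remainder (leading entry is zero)
        q_out.put(x[1:])               # only a vector travels to the next process

def pipelined_qr(A):
    """Triangular QR by a pipeline of row-processes.  Threads stand in for
    transputers; the GIL makes this a functional simulation, not a speedup."""
    m, n = A.shape
    stored = [np.zeros(n - i) for i in range(n)]
    queues = [queue.Queue() for _ in range(n + 1)]
    workers = [threading.Thread(target=row_process,
                                args=(i, stored, queues[i], queues[i + 1]))
               for i in range(n)]
    for w in workers:
        w.start()
    for row in A:                      # feed data rows into the pipeline
        queues[0].put(row.copy())
    queues[0].put(None)
    for w in workers:
        w.join()
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i:] = stored[i]
    return R

A = np.random.default_rng(4).standard_normal((6, 4))
R = pipelined_qr(A)
assert np.allclose(R.T @ R, A.T @ A)   # R is a valid triangular factor of A
```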

6. PARALLEL KALMAN-BUCY FILTERING

The continuous-time Kalman-Bucy filter (Kalman and Bucy, 1961) is described by a set of ordinary differential equations as follows:

dx̂(t)/dt = A(t)x̂(t) + B(t)u(t) + K(t)[z(t) - C(t)x̂(t)]    (14)

K(t) = P(t)C^T(t)V^{-1}(t)    (15)

dP(t)/dt = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)V^{-1}(t)C(t)P(t) + W(t)    (16)

Application of numerical integration procedures requires the solution of n^2 + n differential equations, which constitutes a considerable burden for on-line implementation. Solution of the Riccati differential equation for the error covariance matrix P(t) can be made more robust by forcing symmetry through propagation of the triangular factor. This also reduces the computation to the solution of (n^2 + 3n)/2 differential equations.

The analogue flow diagram provides a graphical description of a system for implementation on an electronic analogue computer. Further, it shows the parallelism inherent in a continuous-time system by illustrating the relationship between the concurrent operation of the various blocks. Figs. 11 and 12 contain the analogue flow diagrams describing the differential equations for a Kalman-Bucy filter with n = 4 and m = 1.
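For reference, a direct sequential forward-Euler integration of equations (14)-(16) is sketched below, assuming time-invariant A, B, C, W and V; it makes explicit the n^2 + n coupled differential equations that the analogue flow diagrams of Figs 11 and 12 distribute across concurrent blocks. The fourth-order, single-output example matrices are invented for illustration.

```python
import numpy as np

def kalman_bucy_euler(x0, P0, A, B, C, W, V, u_of_t, z_of_t, dt, steps):
    """Forward-Euler integration of the Kalman-Bucy equations (14)-(16).

    Every entry of x_hat and P is one ordinary differential equation, so a
    fourth-order, single-output problem integrates n^2 + n = 20 ODEs;
    exploiting the symmetry of P would reduce this to (n^2 + 3n)/2 = 14.
    """
    x, P = x0.copy(), P0.copy()
    Vinv = np.linalg.inv(V)
    for k in range(steps):
        t = k * dt
        K = P @ C.T @ Vinv                                     # equation (15)
        dx = A @ x + B @ u_of_t(t) + K @ (z_of_t(t) - C @ x)   # equation (14)
        dP = A @ P + P @ A.T - P @ C.T @ Vinv @ C @ P + W      # equation (16)
        x, P = x + dt * dx, P + dt * dP
        P = 0.5 * (P + P.T)        # re-symmetrise against round-off drift
    return x, P

# Illustrative fourth-order, single-output example (matrices are invented).
n, m = 4, 1
A = np.diag(np.ones(n - 1), 1) - 0.5 * np.eye(n)
B = np.zeros((n, 1)); B[-1, 0] = 1.0
C = np.zeros((m, n)); C[0, 0] = 1.0
W, V = 0.01 * np.eye(n), 0.1 * np.eye(m)
x_hat, P = kalman_bucy_euler(np.zeros((n, 1)), np.eye(n), A, B, C, W, V,
                             u_of_t=lambda t: np.ones((1, 1)),
                             z_of_t=lambda t: np.array([[np.sin(t)]]),
                             dt=0.01, steps=500)
```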

In Fig. 11 the filter is operating in open loop, and the state update involves the solution of n differential equations which can be implemented in parallel. Similarly, Fig. 12 shows the analogue flow diagram for the solution of the Riccati equation. This is simplified since it neglects to show the output of the covariance elements, which provide the link to the state update equation. Also, the dimension of the measurement matrix is used to reduce the third term in equation (16) to the multiplication of two elements of P(t). The solution of the n^2 differential equations involved can be reduced by propagating the upper triangular section of the matrix, to produce a triangular array of processing cells similar in structure to the systolic arrays discussed above. Note that here the communication is not restricted to nearest-neighbour, because each processing element, associated with the calculation of an element of P(t), is connected to every other processing element in its row or column. This implies that the processing elements in each row or column can be executed simultaneously.

The concurrent description of the Kalman-Bucy filter contained in the analogue flow diagrams of Figs 11 and 12 can be mapped onto a transputer array by orthogonal projection to produce either interconnected row- or column-processes, similar to stage 1 for the systolic algorithms discussed previously. Stage 2, assigning row-processes to transputers, requires care to achieve load balancing. Thus, for a fourth-order problem, row-processes 1 and 2 were placed on separate transputers, while row-processes 3 and 4 were combined and assigned to a third processor. The broadcasting of data facilitated the placing of non-neighbouring row-processes onto transputers in order to achieve balanced computation.

Fig. 11. Analogue flow diagram for state update (eqn. (14)).

Fig. 12. Analogue flow diagram for covariance update (eqn. (16)).

7. DISCUSSION AND CONCLUSIONS

In this paper two classes of parallel algorithms suitable for real-time Kalman filtering have been described. The first relies on a systolic array representation of the fine-grained computations involved in discrete-time covariance filtering. The RCF and SRCF filters were used to illustrate how complex systolic algorithms can be formed from simpler arrays for basic matrix calculations. It was argued that the systolic arrays constitute a fundamental description of the concurrent computation and do not necessarily imply a VLSI implementation. In order to illustrate this point the systolic filters were realised on transputer hardware. The mapping strategy involved projecting into either row- or column-processes, followed by placing these processes onto processors.

The transputer implementations highlighted some interesting points about algorithm engineering. The regular Kalman filter involves less computation than the square-root form, where added robustness is achieved at the expense of extra computation. This makes the latter less attractive for real-time realisation on conventional von Neumann processors. The systolic array algorithms facilitated the introduction of parallel processing to achieve the speedups necessary for real-time performance. The speedup obtained for the systolic RCF was not large because of the redundant calculations added in order that the data should fit the trapezoidal array, as required for the Fadeev algorithm. The systolic SRCF algorithm, on the other hand, produced a more successful hardware implementation since the orthogonal decomposition fitted more neatly and efficiently onto the triangular array.

The second class of algorithm involves an analogue flow diagram description of the differential equations arising from a continuous-time Kalman-Bucy type filter, and again a projection strategy was used to produce transputer implementations, although this case involved the broadcasting of data since each cell in the algorithm was connected to every other cell in the same row or column of the processing array.

Systolic algorithms have also been proposed for LQ optimal control (Gaston and Irwin, 1990b), for parameter estimation (Gaston and Irwin, 1990c) and for self-tuning control (Maguire and Irwin, 1991b). Some results have also appeared on massively parallel algorithms where the nearest-neighbour communication restriction of systolic computation is removed. The idea here is to combine the fast processing properties of current VLSI technology with the communication advantages of optical technology in the form of broadcast arrays (Miller and Irwin, 1991). The parallel algorithms described have been used in a number of application areas, including self-tuning automatic voltage regulation of a turbogenerator (Maguire, 1991) and target tracking (Kee and Irwin, 1989).

8. ACKNOWLEDGEMENTS

The contributions of Dr Fiona Gaston, Dr L. Maguire and other members of the Control Research Group at Queen's University, Belfast to the ideas described in this paper are gratefully acknowledged.

9. REFERENCES

Fadeev, D.K. and Fadeeva, V.N. (1963). Computational Methods of Linear Algebra, W.H. Freeman and Co.

Fleming, P.J., Crummey, T.P. and Chipperfield, A.J. (1992). Computer assisted control system design and multiobjective optimisation, Proc. ISA Conf. on Industrial Automation, pp. 7.23-7.26.

Gaston, F.M.F., Irwin, G.W. and McWhirter, J.G. (1990). The systolic approach to square root covariance Kalman filtering, J. VLSI Sig. Proc., Vol. 2, No. 1, pp. 37-49.

Gaston, F.M.F. and Irwin, G.W. (1990a). Systolic Kalman filtering: an overview, IEE Proc.-D, Vol. 137, No. 4, pp. 235-244.

Gaston, F.M.F. and Irwin, G.W. (1990b). A systolic linear quadratic optimal controller, Electronics Letters, Vol. 26, No. 14, pp. 1000-1002.

Gaston, F.M.F. and Irwin, G.W. (1990c). A systolic parameter estimator for real-time feedback control, Proc. IEE Conf. IT '90, pp. 111-116.

IEEE (1987). Challenges to control: a collective view, IEEE Trans. on Automatic Control, Vol. AC-32, No. 4.

Irwin, G.W. and Fleming, P.J. (1992). Transputers in Real-Time Control, Research Studies Press (John Wiley and Sons), England.

Kalman, R.E. and Bucy, R.S. (1961). New results in linear filtering and prediction theory, J. Basic Eng., Vol. 83D, pp. 95-108.

Kee, R. and Irwin, G.W. (1989). Parallel implementation of the tracking Kalman filter using a network of transputers, IEE Colloquium Digest on "Navigation, Guidance and Control in Aerospace", No. 1989/142, pp. 1/1-1/7.

Kung, H.T. and Leiserson, C.E. (1978). Systolic arrays for VLSI, in Sparse Matrix Proceedings, Soc. Indust. and Appl. Math., Philadelphia, PA, pp. 245-282.

Laub, A.J. and Gardiner, J.D. (1988). Hypercube implementation of some parallel algorithms in control, in Advanced Computing Concepts in Control Engineering, M.J. Denham and A.J. Laub (eds.), NATO ASI Series, Vol. F47, pp. 362-390.

Maguire, L. (1991). Parallel architectures for Kalman filtering and self-tuning control, PhD Dissertation, The Queen's University of Belfast.

Maguire, L. and Irwin, G.W. (1991a). Transputer implementation of Kalman filters, IEE Proc., Vol. 138, No. 4, pp. 355-362.

Maguire, L. and Irwin, G.W. (1991b). Parallel adaptive control, Proc. 1st European Control Conf., Grenoble, Vol. 1, pp. 590-595.

McWhirter, J.G. (1989). Algorithmic engineering - an emerging discipline, SPIE Proc. on Advanced Algorithms and Architectures for Signal Processing IV, Vol. 1152, pp. 2-15.

Megson, G.M. (1990). Transputer arrays and computer aided control system design, IEE Proc., Vol. 137, No. 4, pp. 197-210.

Miller, P. and Irwin, G.W. (1991). Mapping algorithms onto processor arrays with data broadcasts, in VLSI Systems for DSP and Control, Woods, McCanny and Irwin (eds.), Woodhead Publishing, UK, pp. 21-27.

Morf, M. and Kailath, T. (1975). Square root algorithms for least squares estimation, IEEE Trans. on Automatic Control, Vol. 20, No. 4, pp. 487-497.

Rogers, E. and Li, Y. (1992). Parallel Processing, Prentice Hall Int.

Stewart, R.W. (1988). Mapping signal processing algorithms onto fixed architectures, Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pp. 2037-2040.

Yeh, H.G. (1988). Systolic implementation on Kalman filters, IEEE Trans. ASSP, Vol. 36, No. 9, pp. 1514-1517.