Copyright © IFAC Algorithms and Architectures for Real-Time Control, Ostend, Belgium, 1995
IMPROVING SIGNAL PROCESSING PERFORMANCE USING A TRANSPUTER-DSP PARALLEL ARCHITECTURE
D.F. Garcia Nocetti, J. Martinez Flores, J. Solano Gonzalez
DEA, IIMAS, Universidad Nacional Autonoma de Mexico, Apartado Postal 20-726, Del. Alvaro Obregon, CP 01000, Mexico D.F., Mexico
Abstract: This work presents the development of a Heterogeneous Processing Node (HPN) which combines a transputer with a DSP in order to improve the time performance of parametric spectral estimation applications. The HPN includes libraries for downloading and executing pre-compiled, ready-to-run DSP functions in a transparent manner, using standard transputer programming tools. The HPN has been integrated into a transputer-based parallel processing platform and applied to a case study concerning parametric spectral estimation of Doppler ultrasound signals. Performance analysis and results are presented with respect to a homogeneous approach, revealing the effectiveness of the heterogeneous approach.
Keywords: Heterogeneous Parallel Architectures, Algorithms and Software Architectures, Signal Processing.
1. INTRODUCTION
Estimation of the Power Spectral Density (PSD) of discretely sampled deterministic and stochastic processes is usually based on methods employing the Fast Fourier Transform (FFT), which is computationally efficient and produces reasonable results for a large class of signal processes (Ching and Wu, 1989). In spite of these advantages, FFT-based methods have several inherent performance limitations, such as limited frequency resolution and leakage, which distort the spectral response. Advances in this field have led to the conclusion that parametric spectral estimation methods can give significant improvements in time-frequency resolution at the expense of greater computational complexity (Kay and Marple, 1981; David, et al., 1991).
Previous work has investigated the time performance of parametric methods when implemented on a homogeneous transputer-based system (Ruano, et al., 1993; Garcia Nocetti, et al., 1994a), see figure 1. The results indicate that transputers can compute and coordinate parallel operations of irregular tasks efficiently, particularly when coarse-grain tasks are involved. However, the special demands of some regular fine-grain signal processing operations associated with the parametric methods have revealed shortcomings in using this architecture. Recent research has shown the feasibility of integrating transputers and DSPs in order to build a heterogeneous architecture for signal processing applications (Garcia Nocetti, et al., 1994b).
Figure 1. Homogeneous system architecture (a MONITOR and WORKERs 0 to 3).
The work described here concerns the development of a Heterogeneous Processing Node (HPN), which combines the communications ability of the transputer with a high-performance signal processing co-processor in order to improve signal processing time performance in parametric spectral estimation applications. The HPN includes a number of libraries for downloading and executing pre-compiled, ready-to-run DSP functions in a transparent manner, using standard transputer programming tools. In this way, the HPN can be easily integrated into a transputer-based parallel processing platform. As a result, it will execute efficiently those tasks which are suitable to be performed by a DSP.
The use of the HPN aims to reduce the number of processing elements required to achieve real-time performance for a given application. The heterogeneous architecture has been applied to a case study concerning parametric spectral estimation of Doppler signals for blood flow instrumentation. In particular, the modified covariance method has been implemented. Performance analysis and results with respect to the transputer-based implementation are presented. Results reveal the effectiveness of the heterogeneous approach, which combines the best characteristics of the processing elements involved in solving a problem. The HPN, coupled with its associated software, is intended to be used in the implementation of a real-time spectrum analyzer for calculating and displaying the spectrograms of Doppler signals generated by a blood flow detector, aiming to detect cardiovascular diseases at an early stage.
2. HETEROGENEOUS ARCHITECTURE
Parallel systems based on homogeneous architectures have previously been used to compute parametric spectral estimation algorithms in real-time (Garcia Nocetti, et al., 1994a). However, in many implementations the resulting tasks have been forced to run on ill-suited architectures, resulting in poor performance. An appropriate matching of hardware granularity to task type is essential to fully exploit the parallel computational power available. It is expected that combining the granularity of different processor types will lead to a more effective system that is able to outperform the previous homogeneous approach.
The HPN is the result of integrating the coarser granularity of the transputer with the finer granularity of a digital signal processor. One of the guiding principles in the development of a heterogeneous node is that the communications overhead between the DSP and the transputer should be low enough to fully exploit the characteristics of each processor. Another aspect to take into consideration is that the heterogeneous node should be constructed from commercially available devices in order to minimize both implementation cost and design time. The HPN consists of two separate modules: a T222 transputer and a TMS320C30 DSP (Texas Instruments, 1990), which includes the communications interface. The transputer supports general-purpose parallel computation and the communications scheme within a network of transputers, while the DSP supports high-performance signal processing functions. If more processing capability is needed, a T805 transputer module can substitute for the T222 module. Figure 2 shows a block diagram of the HPN, where one link is used to communicate with the DSP. The remaining links are used to communicate with other transputers. Therefore, several transputers may have access to the HPN as a signal processing co-processor.
Figure 2. HPN block diagram (T222 transputer linked to the TMS320C30 DSP).
Since the HPN includes a transputer, it can be easily connected to more transputers in order to form a heterogeneous hardware architecture. Figure 3 shows this architecture, which is composed of a number of processors and an HPN. Typically, for signal processing applications, a transputer-based parallel processing system is installed within an IBM PC or compatible computer, using a PC TRAM motherboard and a number of interconnected processing and data acquisition TRAM modules. As a result, the HPN can execute those tasks that perform more efficiently when executed on a DSP.
Figure 3. Heterogeneous system architecture (MONITOR, DATACQ, WORKER 0, WORKER 1 and the HPN).
3. SOFTWARE DEVELOPMENT TOOLS
Integrating two different kinds of processing elements within a single parallel processing platform may be cumbersome from the programmer's point of view, since two independent initialization processes have to be performed. To simplify this, an approach has been developed which consists of performing the HPN initialization within the transputer application program, using standard transputer programming tools. In the HPN, the DSP operates as a slave to the transputer, which sets up digital signal processing routines (for example, FFTs) as well as loading data for processing.
The software tools developed for the HPN include a number of libraries written in Parallel C for downloading and executing pre-compiled, ready-to-run DSP functions transparently. Figure 4 shows the flow chart of the initialization process used to transfer signal processing functions to the DSP in the HPN. When the transfer of the functions has been completed, the DSP program is ready to execute the functions requested by the transputers, see figure 5.
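The two phases described above (transfer of functions, then execution on request) can be illustrated with a small simulation. This is only a sketch in Python, not the Parallel C library itself: the class names `DspSlave` and `TransputerHost`, the integer function identifiers and the placeholder routine `scale_by_two` are all hypothetical and do not come from the HPN software.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type for a ready-to-run DSP routine: samples in, samples out.
DspFunction = Callable[[Sequence[float]], List[float]]


class DspSlave:
    """Stand-in for the DSP side: holds downloaded functions and runs them on request."""

    def __init__(self) -> None:
        self._functions: Dict[int, DspFunction] = {}

    def load(self, func_id: int, func: DspFunction) -> bool:
        # Transfer phase: store the function under its identifier and acknowledge.
        self._functions[func_id] = func
        return True

    def execute(self, func_id: int, data: Sequence[float]) -> List[float]:
        # Execution phase: only functions transferred earlier can be run.
        if func_id not in self._functions:
            raise KeyError(f"function {func_id} was never downloaded")
        return self._functions[func_id](data)


class TransputerHost:
    """Stand-in for the transputer side: initialises the DSP, then issues requests."""

    def __init__(self, dsp: DspSlave) -> None:
        self._dsp = dsp

    def initialise(self, library: Dict[int, DspFunction]) -> bool:
        # Download every function once, before any execution request is made.
        return all(self._dsp.load(func_id, func) for func_id, func in library.items())

    def call(self, func_id: int, data: Sequence[float]) -> List[float]:
        # Send the data with the request and hand the DSP's result back to the caller.
        return self._dsp.execute(func_id, data)


def scale_by_two(samples: Sequence[float]) -> List[float]:
    """Placeholder for a real DSP routine such as an FFT."""
    return [2.0 * s for s in samples]
```

In this sketch the application sees a single `call`, which is the sense in which the DSP is used transparently; on the real hardware both phases would travel over the transputer link to the DSP.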
Figure 4. HPN function transfer process.
Figure 5. HPN function execution process.
4. CASE STUDY: PARAMETRIC SPECTRAL ESTIMATION
The heterogeneous architecture has been applied to a case study concerning the Auto-Regressive (AR) parametric spectral estimation of Doppler signals, for applications in blood flow instrumentation (Garcia Nocetti, et al., 1994a). In particular, the modified covariance method has been implemented (Kay and Marple, 1981). Using this technique, it is possible to obtain a better spectral estimate based on an AR model by estimating the parameters of the model from the observations. Spectral estimation in the context of modelling then becomes a three-step procedure. The first step is to select a model. The second step is to estimate [...]
Figure 6. Parametric spectral estimation model, with transfer function
$$H(z) = \frac{1}{1 + \sum_{k=1}^{p} a[k]\, z^{-k}}$$
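Once the AR parameters are available, the power spectral density follows from the AR transfer function H(z) evaluated on the unit circle. The sketch below evaluates that expression directly, point by point. It is not the routine run on the DSP, and it does not implement the modified covariance estimator: the coefficients `a`, the driving-noise variance `noise_var` and the sampling frequency `fs` are assumed to be given, and the function name `ar_psd` is hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def ar_psd(a: Sequence[float], noise_var: float,
           n_freqs: int = 256, fs: float = 1.0) -> List[float]:
    """Evaluate the AR spectrum on n_freqs points in [0, fs/2).

    P(f) = noise_var / (fs * |1 + sum_{k=1..p} a[k] * exp(-j*2*pi*f*k/fs)|^2)
    where a[0] in this list holds the coefficient for k = 1.
    """
    psd = []
    for i in range(n_freqs):
        f = 0.5 * fs * i / n_freqs
        # Denominator polynomial A(f) of H(z), evaluated at z = exp(j*2*pi*f/fs).
        denom = 1.0 + sum(a_k * cmath.exp(-2j * math.pi * f * (k + 1) / fs)
                          for k, a_k in enumerate(a))
        psd.append(noise_var / (fs * abs(denom) ** 2))
    return psd
```

For example, `ar_psd([-0.9], 1.0)` describes a first-order low-pass process whose spectrum peaks at zero frequency. For long frequency grids the same denominator is usually obtained with an FFT of the zero-padded coefficient vector, which is the kind of regular, fine-grain operation that suits a DSP.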
The heterogeneous approach is described and compared with the homogeneous approach previously implemented in Garcia Nocetti, et al. (1994a). The homogeneous approach uses N identical processing modules for the parallel execution of the spectral estimation algorithm over N consecutive data segments, the spectrum of each data segment being calculated by a dedicated processor.
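The homogeneous scheme amounts to handing consecutive data segments to the processors in rotation. The following sketch illustrates that assignment and, under the assumption that each processor must finish one segment in strictly less than N frame periods, the smallest N that keeps up with the input. The function names and the example timings are hypothetical and are not measurements from this work.

```python
from typing import List


def assign_segments(num_segments: int, num_processors: int) -> List[int]:
    """Round-robin assignment: segment k is processed by worker k mod N."""
    return [k % num_processors for k in range(num_segments)]


def min_processors(processing_time_ms: float, frame_time_ms: float) -> int:
    """Smallest N such that processing_time_ms < N * frame_time_ms (strict)."""
    return int(processing_time_ms // frame_time_ms) + 1
```

With an illustrative processing time of 35 ms per segment and a new frame arriving every 10 ms, `min_processors(35.0, 10.0)` gives 4 workers, and `assign_segments(6, 4)` sends segments 0 to 5 to workers 0, 1, 2, 3, 0, 1.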
Figure 1 shows an example using N=4 identical processor modules. The associated times for each processor and the input/output modules are shown in figure 7. A data segment of n=256 samples and a model order p=6 are considered. The figure shows the time slots for each processor in the system, where each slot corresponds to the time span of an input data frame. Each processor takes up a new data segment from the input data (W) and then starts processing the modified covariance algorithm (P). Output is produced at the end of each cycle; therefore, a processor has N time slots to handle one segment of data and the performance is essentially increased by a factor of N. Assuming that the required processing time is less than N time slots, real-time processing can be achieved at a maximum input data rate which is N times higher than that of a uniprocessor system. However, the delay from input to output is the same in both cases.
Figure 7. Timing diagram of homogeneous approach using four transputers (n=256, p=6).
On the other hand, the heterogeneous approach uses M transputer nodes and one HPN to process the spectral estimation algorithm for M consecutive data segments. Each data segment is processed in a distributed manner by one transputer and the HPN. Figure 3 shows an example using two identical processing modules and one HPN. Pipelined processing must be performed in order to evenly space the HPN operations requested by the transputer nodes.
The times for each processor, the HPN and the input/output modules are presented in figure 8. For this approach, the sequence is similar to the homogeneous one; the main difference is the pipeline structure of the system. Here, each processor takes up a new segment of input data (W) and starts processing the first part of the modified covariance algorithm. The parameters of the model (C) are calculated and then transmitted to the HPN, which in turn computes the final power spectral density (D), achieving a much higher processing time performance.
Figure 8. Timing diagram of heterogeneous approach using two transputers and one HPN (n=256, p=6).
5. PERFORMANCE ANALYSIS
The time performance of the heterogeneous architecture has been evaluated as a function of the number of processing elements required to achieve real-time processing. Time performance has been measured using the real-time clock facility available in the transputer, varying the model order (p=4, 6 and 8) and the length of the input data frame (n=64, 128, 256 and 512). The data frame time is determined by the product of the sampling period and the data frame length. For this case study, the data frame time has been fixed at 10 ms, and both sampling frequency and data frame length have been adjusted to satisfy the signal's constraints. Real-time operation is then achieved when a data frame can be processed in less than 10 ms.
In order to assess the execution time performance of the heterogeneous architecture, a comparison with the homogeneous approach has been conducted. Tables 1 and 2 summarise both the real-time execution times and the number of processors required to achieve these results. Results are presented for different values of both the length of the input data frame n and the model order p of the spectral estimator.
Considering n=64, real-time can be achieved for the homogeneous approach using only one transputer. In contrast, the heterogeneous approach requires one transputer and one HPN. For n=128, both approaches offer a similar performance, since they require the same number of processing elements.
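The comparison of processing elements can be tabulated directly from the transputer counts reported in Tables 1 and 2. The sketch below does this under two assumptions: every heterogeneous configuration uses exactly one HPN in addition to its transputers, and the sampling frequency for each frame length follows from the fixed 10 ms frame time as frame length divided by frame time. The names `element_counts`, `HOMOGENEOUS` and `HETEROGENEOUS` are hypothetical.

```python
from typing import Dict, List, Tuple

FRAME_TIME_MS = 10.0
FRAME_LENGTHS: Tuple[int, ...] = (64, 128, 256, 512)

# Number of transputers per model order p, transcribed from Tables 1 and 2.
HOMOGENEOUS: Dict[int, Tuple[int, ...]] = {4: (1, 2, 3, 7), 6: (1, 2, 4, 8), 8: (1, 2, 4, 9)}
HETEROGENEOUS: Dict[int, Tuple[int, ...]] = {4: (1, 1, 2, 3), 6: (1, 1, 2, 4), 8: (1, 2, 3, 5)}


def element_counts(p: int) -> List[Tuple[int, float, int, int]]:
    """Return (n, sampling frequency in Hz, homogeneous elements, heterogeneous elements)."""
    rows = []
    for n, homo, hetero in zip(FRAME_LENGTHS, HOMOGENEOUS[p], HETEROGENEOUS[p]):
        fs_hz = n * 1000.0 / FRAME_TIME_MS  # n samples must span one 10 ms frame
        rows.append((n, fs_hz, homo, hetero + 1))  # +1 accounts for the single HPN
    return rows
```

For p=6, `element_counts(6)` pairs 8 transputers against 4 transputers plus the HPN at n=512, while at n=64 the heterogeneous configuration needs one element more than the homogeneous one.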
The real advantage of the heterogeneous architecture over the homogeneous one is exhibited for n=256 and, to a greater extent, for n=512, where the number of processing elements has been reduced significantly. Note that this is mainly due to the fact that the calculation of the power spectral density has been executed in the HPN, which is a more suitable architecture for this type of task, thereby reducing the total execution time.

Table 1. Real-time homogeneous approach performance: processing time [ms] (number of transputers), for data frame length n and model order p.
p \ n    64           128          256          512
4        5.129 (1)    5.516 (2)    7.782 (3)    7.887 (7)
6        6.680 (1)    6.730 (2)    7.340 (4)    8.101 (8)
8        8.592 (1)    8.132 (2)    8.473 (4)    8.097 (9)

Table 2. Real-time heterogeneous approach performance: processing time [ms] (number of transputers), for data frame length n and model order p.
p \ n    64           128          256          512
4        3.757 (1)    7.072 (1)    5.370 (2)    7.167 (3)
6        5.342 (1)    9.533 (1)    7.470 (2)    7.310 (4)
8        7.288 (1)    5.365 (2)    6.497 (3)    9.160 (5)

6. CONCLUSIONS
This work has presented the development of a Heterogeneous Processing Node (HPN) and the software tools for improving the performance of a parametric spectral estimation application when implemented on a transputer-based parallel processing platform.
The HPN and the software tools have been developed to load and execute digital signal processing functions in a transparent manner, making use of standard transputer tools.
The application of the heterogeneous architecture to the case study has demonstrated its effectiveness, significantly reducing the number of processing nodes required to achieve real-time operation. This has been achieved by distributing the different tasks to the most appropriate processing node and applying a pipelined processing scheme.
The HPN can be extended to a number of applications where digital signal processing is required, obtaining the benefits of a high-performance system with a reduced number of processing elements.
7. ACKNOWLEDGMENTS
The authors gratefully acknowledge Daniela Norma Ramos Hernandez and Hector Benitez Perez for their participation in this work, and the Universidad Nacional Autónoma de Mexico, the DGAPA-UNAM (PAPIID-D0303593), the CONACyT (PAClME-F284A9209) and the British Council for their financial support.
REFERENCES
Ching, P.C., Wu, S.W. (1989). Real-time digital signal processing system using a parallel processing architecture. Microprocessors and Microsystems 13, pp. 653-658.
David, J.Y., Jones, S.A., Giddens, D.P. (1991). Modern spectral analysis techniques for blood flow velocity and spectral measurements with pulsed Doppler ultrasound. IEEE Trans. on Biomedical Engineering 38, pp. 589-596.
Kay, S., Marple, S.L. (1981). Spectrum analysis - A modern perspective. Proc. IEEE 69, pp. 1380-1419.
Garcia Nocetti, D.F., Ruano, M.G., Fish, P., Fleming, P. (1994a). Parallel implementation of a model-based spectral estimator for Doppler blood flow instrumentation. Proc. of 8th IEEE International Parallel Processing Symposium, Cancun, Mexico, pp. 810-814.
Garcia Nocetti, D.F., Solano Gonzalez, J., Martinez Flores, J. (1994b). Heterogeneous architecture for parallel real-time spectral estimation in Doppler blood flow instrumentation. Proc. of IEE International Conference on Control 94, Coventry, U.K., pp. 37-4
Ruano, M.G., Garcia Nocetti, D.F., Fish, P., Fleming, P. (1993). Alternative parallel implementation of an AR-modified covariance spectral estimator for diagnostic ultrasonic blood flow studies. Parallel Computing 19, pp. 463-476.
Texas Instruments Inc. (1990). TMS320C30 Evaluation Module Technical Reference.