Parallel Processing in Real-Time Ultrasonic Imaging


Copyright © IFAC Algorithms and Architectures for Real-Time Control, Vilamoura, Portugal, 1997

PARALLEL PROCESSING IN REAL-TIME ULTRASONIC IMAGING

D.F. Garcia Nocetti, J. Solano Gonzalez, M.F. Valdivieso Casique, R. Ortiz Ramirez, E. Moreno Hernandez*

DFA, IIMAS, Universidad Nacional Autonoma de Mexico, PO Box 20-726, C.P. 01000, Mexico D.F., Mexico.
*Centro de Ultrasonica, ICIMAF, Calle 15 No. 551, Vedado, 10400, Habana, Cuba

Abstract: An ultrasonic imaging system based on a parallel processing architecture is presented. The system exploits different forms of intrinsic parallelism frequently associated with the process of ultrasonic imaging. Beam focusing and scanning techniques are developed and applied to increase image spatial resolution. Parallel interpolation algorithms are implemented to improve the resolution of the ultrasonic images. These algorithms are assessed in terms of their performance in achieving the speed required for real-time ultrasonic imaging applications. The flexibility and scalability of the system allow new developments in technology to be incorporated and more complex algorithms to be used to improve the visual appearance of the image while achieving real-time response.

Keywords: Parallel architectures, real-time performance, image processing, ultrasonic imaging

1. INTRODUCTION

Ultrasonic imaging using digital computers has been a very active research field in recent years (Wells, 1996). Requirements for fast and accurate image construction from ultrasonic signals cover a wide field of applications, ranging from signal acquisition through pre-processing, formation and displaying of the ultrasonic image to its enhancement and analysis (Fish, 1990). Custom-built systems, where algorithms are implemented directly in special-purpose hardware, tend to be expensive in design time, have a limited market and offer limited flexibility to accommodate improvements in technology and algorithms (Cavaye and White, 1993). The availability of parallel architectures offers new opportunities for the realisation of low-cost, flexible, faster and more reliable systems.

This paper presents an ultrasonic imaging system based on a parallel processing transputer architecture. The system exploits the different forms of intrinsic parallelism often found within the process of ultrasound imaging (Webber, 1992). Processes such as beam focusing and scanning, formation, displaying and postprocessing are designed to be executed in parallel in order to achieve the performance required in real-time ultrasonic imaging applications. Beam focusing and scanning techniques are developed and implemented for adjusting the depth at which the transducer beam is focused and for increasing the spatial resolution of the ultrasonic image. With respect to image formation, parallel interpolation techniques are applied to improve the original image. These techniques use mathematical approximation functions to generate intermediate samples which, together with the original ones, present a more detailed image to the observer. Furthermore, the flexibility and scalability of the system allow new developments in technology to be incorporated and open a greater scope for developing and using better and more complex algorithms that can improve the visual appearance of the image and achieve a real-time response.

In the field of ultrasonic imaging, the image quality achieved depends on many factors (Powis et al., 1984). Particularly important are the transducer (e.g. its sensitivity and focusing) and the "front end" electronics (i.e. electronic noise level and accuracy of the digitized signal). Preprocessing (signal dynamic range and interpolation) and postprocessing (grey scale maps and image smoothing) are adjusted to produce the best possible image quality (Russ, 1995).


2. ULTRASONIC IMAGING

The "pulse-echo" principle is the basis of ultrasonic imaging. In practice, a series of acoustic pulses is transmitted along the ultrasound beam, with the transducer "listening" for echoes after each pulse. The time interval between successive transmit pulses must be long enough for all the echoes from one pulse to have died away before the next one is transmitted. To form an ultrasonic image, pulses are transmitted sequentially along adjacent lines of sight, and dots are placed at the appropriate points on the display wherever echoes are detected (see figure 1).

Figure 1. Formation of an ultrasonic image.

Parallel lines of sight (as produced by linear array transducers) give rise to a rectangular scan area. Lines of sight with a common origin point (as produced by electronic phased array transducers) generate a sector scan area. In a linear array, as depicted in figure 2a, the beam is formed using a group of transducer elements; it can therefore be shifted laterally by dropping one element from one end of the group and picking up an extra element at the other end. In a phased array, figure 2b, electronic time delays are used to scan as well as to focus the beam.

Figure 2. Methods for scanning an ultrasound beam: (a) linear array, (b) phased array.

Figure 3 shows an ultrasonic imaging machine in block diagram form. Functions under user control include: transmission power, transmission focal depth (for electronically focused machines), receiver gain, time gain compensation and depth gain compensation settings, preprocessing (signal dynamic range), postprocessing (grey scale maps and image smoothing), image size, zoom, etc.

Figure 3. Block diagram of an ultrasonic imaging system.

One important restriction in ultrasound imaging applications is that the transmission power of the pulses should be kept as low as possible for safety reasons while still obtaining the necessary information. Therefore, the receiver gain, preprocessing and postprocessing should be adjusted to produce the best image quality for a given examination.

3. SYSTEM DESCRIPTION

The work described here concerns the development of a parallel processing ultrasonic imaging system, aiming to scan, construct and display high quality ultrasonic images in real-time at a minimum rate of 25 frames/second. Figure 4 shows a block diagram of the system, which is based on a parallel processing transputer architecture. Three main blocks are processed in a pipelined fashion: focusing and scanning, processing and displaying. A transputer controls the focusing and scanning stage for acquiring the input ultrasonic data. It also provides the required processing power to construct the image to be displayed. The INMOS T805-30 transputer architecture, programmed in the OCCAM language, is connected to a host computer which provides input, output and file system services. The host computer communicates with the first transputer (host node) by running a server program.

Figure 4. Parallel processing ultrasonic imaging system.
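The 25 frames/second target and the pulse-echo constraint of Section 2 together fix a time budget for each line of sight. The sketch below works through that arithmetic. The 56 lines per frame and the 13 cm maximum depth are figures reported elsewhere in this paper; the speed of sound of 1540 m/s is an assumed soft-tissue average that the paper does not state, and all function names are illustrative.

```python
# Timing budget sketch for a pulse-echo scanner.
SPEED_OF_SOUND_M_S = 1540.0  # assumption: typical soft-tissue average, not given in the paper


def round_trip_time_s(depth_m, c=SPEED_OF_SOUND_M_S):
    """Time for a pulse to reach depth_m and for its echo to return."""
    return 2.0 * depth_m / c


def line_budget_s(frames_per_second, lines_per_frame):
    """Longest time one line of sight may take if lines are fired back to back."""
    return 1.0 / (frames_per_second * lines_per_frame)


def max_depth_m(frames_per_second, lines_per_frame, c=SPEED_OF_SOUND_M_S):
    """Deepest echo that can return within one line budget."""
    return c * line_budget_s(frames_per_second, lines_per_frame) / 2.0


if __name__ == "__main__":
    budget = line_budget_s(25, 56)    # 25 frames/s, 56 lines per frame
    echo = round_trip_time_s(0.13)    # deepest target used in the experiments
    print(f"line budget     : {budget * 1e6:.1f} us")
    print(f"round trip 13 cm: {echo * 1e6:.1f} us")
    print(f"max depth       : {max_depth_m(25, 56) * 100:.1f} cm")
```

Under these assumptions the acoustic round trip for the deepest target (about 169 us) fits well inside the per-line budget (about 714 us), so propagation time alone does not prevent 25 frames/second. The rest of the budget is what the acquisition electronics and the pipelined processing stages have to fit into.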


3.1 Focusing and Scanning

The system achieves focusing during transmission of the ultrasonic pulse by introducing delays to each group of elements (e.g. elements 1-8 for the initial pulse, 2-9 for the second and so on) of the 64-element, 3.5 MHz linear array transducer. The delayed pulses are produced by a programmable, transputer-controlled digital input/output module specially designed for this purpose. These delays are programmed electronically to vary the depth of the focal region, allowing it to be matched to the depth of the region under examination (see figure 5).

Figure 5. Electronic beam focusing during transmission.

Three types of beam focusing techniques have been implemented: linear, dihedral and cylindrical. For linear focusing, a group of neighbouring transducer elements is used together to form a larger transducer which projects a rectangular beam. This beam is shifted laterally by one transducer element at a time for successive transmission pulses, and this movement of the beam produces the scanning process. For dihedral focusing, delays are applied to the transducer elements in order to divide the array into two semi-apertures forming an angle to the plane of the array. This type of focusing is intended to increase the image lateral resolution. Finally, cylindrical focusing is achieved by directing the acoustic beams to a central point, using the curvature of the simulated lens generated by the time delays. The beams generated by the pulses from the transducer elements arrive simultaneously at the central point. This type of focusing is used to increase the image axial resolution.

Independently of the beam focusing technique used, pulse echoes are received by the transducer array as a result of each of the shots (56 in total). The received echoes are summed, amplified, and the output signal is digitized by a transputer-controlled A/D converter module. This module takes 256 samples of the signal and encodes them into a gray scale according to the amplitude of each sample. At this stage, a new set of transducer elements can be fired, generating a new set of echoes and its corresponding encoded array. An image frame is then constructed with the 56 vectors produced by this process. The resulting matrix (256x56) is transmitted to the host transputer for further processing and the system is now ready to acquire a new frame.

3.2 Processing

The matrix generated by the focusing and scanning stage is received by the host transputer, buffered and sent to a parallel processing subsystem which transforms the original matrix into an augmented matrix (256x168) to produce a more detailed image to be displayed. In order to generate the augmented matrix, different interpolation algorithms have been implemented: linear splines, quadratic splines and cubic convolution (Russ, 1995). Each of these algorithms has been implemented using a processor farm model of parallelism (Galletly, 1990). Under this scheme a master process controls and assigns tasks to its subordinate worker processes by means of a so-called farmer process. The workers execute identical tasks and, upon completion of a task, further work is assigned until the whole set of tasks has been exhausted. Figure 6 shows the farm topology used for the implementation described in this work. Note that the master also includes a worker process, which allows it to interpolate its own subframes whilst it is waiting for results from the rest of the workers. When all the subframes are processed, the master assembles the augmented data frame consisting of 256x168 elements and sends it to the host transputer, which in turn sends it to the host computer for display.

Figure 6. System processing farm model.

3.3 Displaying

The displaying process is conducted by the host computer, which runs a server program for this purpose. This program interfaces the computer to the host transputer, providing the parallel system with input-output facilities and file system services. Interpolated data frames, consisting of 256x168 elements, are encoded into a gray scale level and transferred to the host video RAM. A user program displays the images on a VGA-type display.
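The transmit focusing described in Section 3.1 can be illustrated with a small delay calculation. The sketch below covers the sliding eight-element group and a cylindrical (point-focus) delay law in which the outer elements fire first so that all pulses reach the focal point together. The element pitch, the speed of sound and every function name are assumptions introduced for illustration; the paper gives none of them, and the delays actually programmed into the transputer-controlled module may differ.

```python
import math

SPEED_OF_SOUND_M_S = 1540.0   # assumption: not stated in the paper
ELEMENT_PITCH_M = 1.0e-3      # assumption: hypothetical element spacing
GROUP_SIZE = 8                # elements fired together (from the text)
NUM_SHOTS = 56                # shots per frame (from the text)


def active_group(shot):
    """0-based indices of the elements fired on a given shot (0..NUM_SHOTS-1).

    Shot 0 uses elements 0-7 (1-8 in the paper's numbering), shot 1 uses 1-8, ...
    """
    if not 0 <= shot < NUM_SHOTS:
        raise ValueError("shot out of range")
    return list(range(shot, shot + GROUP_SIZE))


def element_offsets(pitch=ELEMENT_PITCH_M, size=GROUP_SIZE):
    """Lateral position of each element relative to the centre of the group."""
    centre = (size - 1) / 2.0
    return [(i - centre) * pitch for i in range(size)]


def cylindrical_delays(focal_depth_m, c=SPEED_OF_SOUND_M_S):
    """Firing delays (s) that make all pulses arrive together at the focus.

    The focus lies on the group axis at focal_depth_m. Elements farther from
    the axis have a longer path, so they get the smaller delay.
    """
    paths = [math.hypot(focal_depth_m, x) for x in element_offsets()]
    longest = max(paths)
    return [(longest - p) / c for p in paths]


def linear_delays():
    """Linear (unfocused) case: all elements of the group fire together."""
    return [0.0] * GROUP_SIZE


if __name__ == "__main__":
    for depth_cm in (1, 5, 13):
        delays_ns = [d * 1e9 for d in cylindrical_delays(depth_cm / 100.0)]
        print(depth_cm, "cm:", [round(d, 1) for d in delays_ns])
```

Changing `focal_depth_m` is the software counterpart of reprogramming the delays to move the focal region; shallower foci need larger delay differences across the group. Dihedral focusing is left out of the sketch because the paper does not give the angle it uses.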

4. RESULTS AND ANALYSIS

As described earlier, the ultrasonic imaging system aims to acquire, construct, process and display images at a minimum rate of 25 frames/second. In order to achieve this, different parallel processing schemes have been applied to the different stages that make up the system. For the acquisition stage depicted in figure 4, pulses are fired from the transducer array, which consists of 64 elements with a 3.5 MHz central frequency. Pulses are produced and controlled by a transputer which adds time delays according to the focusing scheme used. Echoes received by the transducer array as a result of each of the shots (56 in total) are summed, amplified and passed through a 1 MHz A/D converter. The 256-bit digitized signal is then passed to the transputer, which encodes it into a gray scale according to the amplitude of each of the samples of the received signal.

As can be observed in the block diagram depicted in figure 4, a transputer network has been used to process the 256x56 original image. In order to produce a more detailed image for the observer (i.e. a 256x168 image), three of the most common interpolation algorithms have been implemented (i.e. linear splines, quadratic splines and cubic convolution), using a Processor Farm parallel scheme for this purpose. However, it is well known (Webber, 1992) that the treatment of images can become cumbersome due to the great amount of data that needs to be passed between the different processors in a parallel network. A granularity and communications study has been conducted in order to determine the grain size which minimizes the communication and synchronisation time for the interpolation algorithms considered. For this study, the original image is buffered by the master transputer and only the interpolated data (i.e. 256x112) is communicated back by the worker processors after the interpolation algorithm has been applied. Table I shows the results, where n is the number of 56-element arrays transmitted and the columns show the communication + sequential processing times for the total image.

n     linear splines   quadratic splines   cubic convolution
1     130.91           173.30              211.27
2     120.19           165.29              206.05
3     117.98           164.93              199.16
4     116.84           164.90              199.15
5     114.70           163.41              197.72
8     114.50           163.23              197.70
16    124.20           163.07              197.62

Table I. Sequential Processing + Communication times for interpolation algorithms.

It can be observed that for the quadratic splines and cubic convolution the optimum grain size is 16, whereas for the linear splines algorithm the optimum grain size is 8. In the case of linear splines the processing time is the smallest relative to the communication of data involved, and the algorithm therefore requires a smaller grain size.

Considering these results, a Processor Farm implementation has been realised using a star topology (to make the most of the 3 transputer links available) and involving the farmer in the process of computing part of the tasks (i.e. an active farm). The results shown in Table II indicate the execution times and the equivalent number of frames/second obtained for each of the implemented algorithms. It can be observed that the goal rate of at least 25 frames/second has been achieved in all cases by varying only the number of processors required for each implementation. It is also important to point out that the scalability obtained from the implementations has been very high, achieving efficiencies up to 98%, 97% and 96% for the linear splines, quadratic splines and cubic convolution algorithms respectively.

number of workers   linear splines   quadratic splines   cubic convolution
1                   100.80 (9.92)    147.76 (6.76)       183.78 (5.44)
2                   50.56 (19.77)    74.30 (13.45)       92.41 (10.82)
3                   34.08 (29.34)    49.72 (20.11)       61.83 (16.17)
4                   28.63 (34.92)    37.92 (26.36)       46.72 (21.4)
5                   -                31.92 (31.32)       38.04 (26.28)
6                   -                -                   33.4 (29.94)

Table II. Performance results. Execution times (ms) and frames/s (in parentheses).

Figures 7 and 8 show the resultant images before and after applying cubic convolution interpolation using a cylindrical focusing technique. A 35 mm diameter pipe has been used for this purpose, placed at different depths ranging from 1 to 13 cm.

In the ultrasonic image observed in figure 7, the pipe is seen as an elliptical slice. Due to the cylindrical focusing applied, the center of the object appears clearer than the edges. Figure 8 shows the ultrasonic image after applying cubic convolution interpolation. The quality of the image is excellent, allowing the observer to distinguish more details, which is an important feature in many systems, such as in medical tomography.
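The interpolation step and the grain size discussed in this section can be sketched as follows. Each 56-sample row is expanded to 168 samples by inserting two linearly interpolated values after every original sample, and rows are handed out to a pool of workers in batches whose size plays the role of the grain n. This is only an illustration: the paper's implementation runs in OCCAM on transputers, whereas the sketch uses Python threads (which show the farm structure but give no real speed-up for this pure-Python work), and the handling of the last sample and all names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

FACTOR = 3  # 56 original samples -> 168 samples per row


def interpolate_row(row, factor=FACTOR):
    """Linear interpolation: keep each sample and add factor-1 values after it.

    Assumption: the last sample has no right-hand neighbour, so it is held.
    """
    out = []
    for i, value in enumerate(row):
        nxt = row[i + 1] if i + 1 < len(row) else value
        for k in range(factor):
            out.append(value + (nxt - value) * k / factor)
    return out


def interpolate_batch(rows):
    """Worker task: interpolate a batch (one grain) of rows."""
    return [interpolate_row(r) for r in rows]


def farm_interpolate(frame, workers=3, grain=8):
    """Master: split the frame into grains, farm them out, reassemble in order."""
    batches = [frame[i:i + grain] for i in range(0, len(frame), grain)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(interpolate_batch, batches)  # map preserves order
        return [row for batch in results for row in batch]


if __name__ == "__main__":
    frame = [[(r + c) % 256 for c in range(56)] for r in range(256)]
    augmented = farm_interpolate(frame, workers=3, grain=8)
    print(len(augmented), "rows x", len(augmented[0]), "samples")
```

A larger `grain` means fewer, bigger messages between master and workers, while a smaller one balances the load more evenly; the granularity study in Table I is the measured version of that trade-off on the real hardware.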

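The efficiencies quoted for Table II can be checked with a short calculation. The sketch below assumes the usual definition of parallel efficiency, E = T1 / (p * Tp), with the one-worker time taken as T1; the paper does not state which definition it uses, so this is an interpretation rather than the authors' procedure. The times are copied from Table II and the function names are illustrative.

```python
# Execution times in ms from Table II, indexed by number of workers.
TIMES_MS = {
    "linear splines":    {1: 100.80, 2: 50.56, 3: 34.08, 4: 28.63},
    "quadratic splines": {1: 147.76, 2: 74.30, 3: 49.72, 4: 37.92, 5: 31.92},
    "cubic convolution": {1: 183.78, 2: 92.41, 3: 61.83, 4: 46.72, 5: 38.04, 6: 33.4},
}
TARGET_FPS = 25.0


def frames_per_second(time_ms):
    """Frame rate equivalent to a per-frame execution time in milliseconds."""
    return 1000.0 / time_ms


def efficiency(times, workers):
    """Assumed definition: E = T1 / (p * Tp), with T1 the one-worker time."""
    return times[1] / (workers * times[workers])


def first_real_time_config(times, target=TARGET_FPS):
    """Smallest worker count whose frame rate reaches the target."""
    for workers in sorted(times):
        if frames_per_second(times[workers]) >= target:
            return workers
    return None


if __name__ == "__main__":
    for name, times in TIMES_MS.items():
        p = first_real_time_config(times)
        print(f"{name}: {p} workers, "
              f"{frames_per_second(times[p]):.2f} frames/s, "
              f"efficiency {efficiency(times, p):.1%}")
```

Under this assumed definition, the first configuration that reaches 25 frames/s uses 3, 4 and 5 workers for linear splines, quadratic splines and cubic convolution, with efficiencies of about 98.6%, 97.4% and 96.6% respectively. That is consistent with the 98%, 97% and 96% reported in the text.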

Figure 7. Ultrasonic original image.

Figure 8. Ultrasonic image applying cubic convolution interpolation.

The system described is also capable of using the multiprocessing block for later analysis of a particular image chosen by the user. It is then possible to apply postprocessing techniques off-line whilst continuing to acquire new information.

5. CONCLUDING REMARKS

The work presented here describes an ultrasonic imaging system based on a parallel processing architecture which acquires, constructs, processes and displays images at a minimum rate of 25 frames/second.

The intrinsic parallelism often found in ultrasonic imaging has been successfully exploited by developing and implementing a parallel processing subsystem for controlling the focusing and scanning stage in the acquisition of ultrasonic data, and for providing the system with the capability required to process and construct high quality images in real-time.

Beam focusing and scanning techniques have also been developed and implemented in order to improve the spatial resolution of the image. The system can be programmed electronically to vary the depth of the focal region to match the depth of the region under consideration. Given the poor resolution of the original acquired image, parallel interpolation algorithms have been used to produce a more detailed image for the observer. Linear splines, quadratic splines and cubic convolution have been implemented using a farm model of parallelism for this purpose. A study of granularity, considering execution and communication times for the interpolation algorithms, has been fundamental to the effectiveness of the approach. The scalability obtained has been very high, achieving efficiencies up to 98%, 97% and 96% for the linear splines, quadratic splines and cubic convolution algorithms respectively.

The flexibility and scalability of the system developed allow us to incorporate emerging technologies and open a greater scope for developing and using more complex algorithms in order to improve the visual appearance of the image and still achieve a real-time response.

ACKNOWLEDGMENTS

The authors acknowledge M. Fuentes, A. Jimenez and M. Castillo for their participation in this work, and DGAPA-UNAM (PAPIIT-IN106796), CONACYT (Proy. 2146P-A9507) and CONACYT (Prog. Mexico-Cuba No. 8.17) for their financial support.

REFERENCES

Cavaye, D.M. and R.A. White (1993). Arterial Imaging: Modern and Developing Technology. Chapman & Hall Medical Series, London, U.K.

Fish, P. (1990). Physics and Instrumentation of Diagnostic Medical Ultrasound. John Wiley & Sons, Chichester, U.K.

Galletly, J. (1990). Occam 2. Pitman Publishing, U.K.

Powis, R.L. (1984). A Thinker's Guide to Ultrasonic Imaging. Urban & Schwarzenberg, Baltimore, USA.

Russ, J.C. (1995). Image Processing Handbook. 2nd Edition, CRC Press.

Webber, H.C. (1992). Image Processing and Transputers. IOS Press.

Wells, P.N.T. (1996). State-of-the-Art of Ultrasound Imaging in Medicine and Biology. Acoustical Imaging Vol. 22, Plenum Press, New York.
