A computer architecture for real time image processing using VLSI


North-Holland Microprocessing and Microprogramming 24 (1988) 309-314


A.D. HOUGHTON, N.L. SEED

The University of Sheffield, Dept. of Electronic and Electrical Engineering, Mappin Street, Sheffield S1 3JD

This paper describes the evolution of a computer architecture suitable for image processing. The design is based on a successful image processing machine called RAPAC [1] and is to be implemented using VLSI devices.

1. INTRODUCTION

The system to be described in this paper is based upon an image processing system called RAPAC, or Reconfigurable Attached Processor Architecture for Convolution. RAPAC was initially aimed at industrial inspection problems where quality control of objects on a moving conveyor is required in real time. During experimentation with the early system, a suitable source of input images was found in the form of a road under the laboratory window. As a result, road traffic monitoring quickly became the main application to which RAPAC was put.

Monitoring road traffic is a twofold process requiring both the tracking and counting of vehicles through a road scene and simple classification of the vehicles. Both of these tasks are formidable in image processing terms, yet both have been successfully accomplished in real time ([2], [3]). The processing power available in RAPAC, however, is not sufficient to run both tasks simultaneously, and expansion of RAPAC is limited by its design.

The inability to expand RAPAC beyond a certain limit has given rise to the design of a new system. The new system retains the highly successful features of RAPAC yet overcomes the associated expansion problems, which are physical in nature.

2. AN OVERVIEW OF RAPAC

An overview of RAPAC is included to explain the design philosophy adopted for the new system. Figure 1 shows a schematic of RAPAC, which is essentially a collection of processor and framestore units connected together in a star network [4]. The processors all perform low level image processing functions and are fabricated from discrete TTL logic. An attached host computer controls the function of RAPAC on a frame by frame basis, and performs any high level processing required for a particular application.

[Figure 1: Schematic of RAPAC. Labels: acquisition; processor/framestore units; central switch network; display; micro control and timing bus; interface; standard bus; host CPU.]

If the architecture of RAPAC is determined by the route which an image takes through the system, then the architecture is dynamically reconfigurable for each frame of processing. Images travel around the system as packets of one whole image (64 kbytes), directed by the large central switch. All image transfer is synchronous, and occurs in continuous frames each 20 ms long. The system is thus compatible with any standard video source such as a camera or VCR. When an image passes through a processor, each pixel of the image is treated in exactly the same way and the whole image undergoes a single common process. This is illustrated later by an example.


Since the architecture (and hence the processing route) may be changed on a frame by frame basis, quite complex algorithms can be performed over a number of frame times in spite of the low level processors available. The ability to change the processing architecture has been one of the keys to the success of RAPAC, and the switching required is performed by a large TTL network.
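The frame by frame reconfiguration described above can be pictured with a small model. The following is a minimal sketch only, not RAPAC's control format: the store names (`fs1` to `fs3`), the two per-pixel operations and the `(source, processor, destination)` route tuples are all hypothetical, chosen to show how a different route in each frame time builds a longer algorithm out of simple processors.

```python
def apply_per_pixel(image, op):
    """A low level processor treats every pixel of the image packet identically."""
    return [op(pixel) for pixel in image]


def run_frames(schedule, stores, processors):
    """Run one list of routes per frame time.

    Each route is (source_store, processor_name, destination_store).
    All names here are illustrative assumptions, not RAPAC identifiers.
    """
    for frame_routes in schedule:
        # Transfers within a frame are synchronous, so every route reads the
        # store contents as they were at the start of the frame.
        snapshot = dict(stores)
        for source, processor, destination in frame_routes:
            stores[destination] = apply_per_pixel(snapshot[source],
                                                  processors[processor])
    return stores


# Two hypothetical low level processors.
processors = {"invert": lambda p: 255 - p, "halve": lambda p: p // 2}

# A tiny three-pixel "image" stands in for a whole 64 kbyte packet.
stores = {"fs1": [0, 100, 255], "fs2": [], "fs3": []}

# The route changes between the first and second frame time.
schedule = [
    [("fs1", "invert", "fs2")],
    [("fs2", "halve", "fs3")],
]

result = run_frames(schedule, stores, processors)
```

Here the second frame reuses the output of the first, which is the sense in which the switch setting, rather than the physical wiring, defines the architecture seen by the image.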

3. A PROCESSING EXAMPLE USING RAPAC

To help illustrate how RAPAC operates, the steps required to detect vehicles in a road scene are outlined. Two processing stages are required from capturing the scene with the camera to producing a set of co-ordinates which correspond to vehicle positions in the image. Each stage occupies one 20 ms frame time, so at this rate 25 frames per second may be processed. Figure 2 shows the architecture which is set up for the first 20 ms of processing.

[Figure 2: RAPAC architecture for the first 20 ms of processing.]

The road scene is captured by the camera and digitised. The digitised image is both stored in framestore 1 and passed into an arithmetic unit where an image of the scene background (stored in framestore 2) is subtracted from it, so showing moving vehicles. The difference image is thresholded and a local neighbour operation applied to it to reduce noise. This semi-processed binary image is then stored in framestore 3.

Figure 3 shows the second phase of processing. The architecture is rearranged so that the semi-processed image in framestore 3 is now passed back through the local neighbour processor, which performs a second noise reduction operation. The final binary image is both passed into a feature extractor and combined with the previously captured image in framestore 1 to produce the new display image. The feature extractor passes the centroids and sizes of all the vehicles detected in the image to the host. Figure 4 shows a typical binary image of vehicles detected in the above way.

[Figure 3: RAPAC architecture for the second 20 ms of processing.]
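The two processing stages above can be sketched in software. This is an illustration under stated assumptions, not the RAPAC hardware behaviour: the absolute difference, the threshold level of 50, the 3 x 3 majority rule used for the local neighbour operation, and the single-object feature extractor are all assumptions, since the paper does not specify them.

```python
def subtract(image, background):
    """Arithmetic unit: per-pixel absolute difference (assumed form of subtraction)."""
    return [[abs(p - b) for p, b in zip(row, bg_row)]
            for row, bg_row in zip(image, background)]


def threshold(image, level):
    """Binarise: 1 where the pixel exceeds `level`, else 0."""
    return [[1 if p > level else 0 for p in row] for row in image]


def local_neighbour(binary):
    """Assumed noise reduction rule: keep a pixel only if at least 5 of the
    9 pixels in its 3 x 3 neighbourhood are set (pixels off the image count as 0)."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = sum(binary[j][i]
                        for j in range(max(0, y - 1), min(h, y + 2))
                        for i in range(max(0, x - 1), min(w, x + 2)))
            out[y][x] = 1 if total >= 5 else 0
    return out


def extract_features(binary):
    """Simplified feature extractor: centroid (x, y) and size of all set
    pixels, treated as one object. Returns None for an empty image."""
    points = [(x, y) for y, row in enumerate(binary)
              for x, v in enumerate(row) if v]
    if not points:
        return None
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n, n)


def detect(scene, background, level=50):
    # First frame time: subtract, threshold, first noise reduction ("framestore 3").
    store3 = local_neighbour(threshold(subtract(scene, background), level))
    # Second frame time: second noise reduction, then feature extraction.
    final = local_neighbour(store3)
    return final, extract_features(final)


# 8 x 8 test scene: a 4 x 4 bright "vehicle" plus one isolated noise pixel.
background = [[10] * 8 for _ in range(8)]
scene = [row[:] for row in background]
for y in range(1, 5):
    for x in range(1, 5):
        scene[y][x] = 200
scene[7][7] = 200  # noise

final, features = detect(scene, background)
```

With this rule the isolated pixel is removed in the first pass, and the vehicle blob loses only its four corner pixels, which shows why a simple neighbourhood operation is useful before feature extraction.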

[Figure 4: Typical binary image of detected vehicles.]

4. THE LIMITATIONS OF RAPAC

Since there is only one arithmetic unit and one local neighbour processor, the ability to route the image is vital where they are required more than once for an operation. This flexible architecture is one of the main keys to the success of RAPAC, but there is a high price to pay for it. The central data switch is a high speed board built out of discrete TTL devices and has eight image input and output ports. Any one of the eight inputs may be routed to any of the other seven outputs. Each processor and framestore is connected to the switch by several (usually two) 16 way ribbon cables which carry signals and earth returns. The number of ports available on the switch determines the number of cards and hence the maximum power which RAPAC can support. It is primarily this limit in the power of RAPAC which prompted the design of a new system.

4.1 The Connected Star Network

The first attempts to expand RAPAC centred on modifying the switching scheme. The physical size of the switching network increases as the square of the number of ports. Without resorting to more specialised switching arrangements such as the Benes switch [5], the switch size is limited to 7 x 8. A second switch was added to RAPAC and connected to the first switch by a bridge. Figure 5 shows this arrangement.
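The growth in switch size can be made concrete with a short calculation. This is a sketch under two assumptions that the text does not state explicitly: that switch size is measured in crosspoints (one per input and output pairing), and that the bridge occupies one port on each of the two switches.

```python
def crosspoints(ports):
    """Crosspoints needed when each input can reach every other port's output."""
    return ports * (ports - 1)


def bridged_capacity(ports_per_switch):
    """Usable ports on two bridged switches, assuming the bridge takes
    one port on each switch."""
    return 2 * (ports_per_switch - 1)


single = crosspoints(8)        # the existing eight-port switch
doubled = crosspoints(16)      # a hypothetical single 16-port switch
bridged = bridged_capacity(8)  # two eight-port switches joined by a bridge
```

Under these assumptions the eight-port switch needs 56 crosspoints (the 7 x 8 arrangement), a single 16-port switch would need 240, and two bridged eight-port switches offer 14 usable ports for 112 crosspoints in total.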

[Figure 5: Switch 1 and Switch 2 connected by a bridge.]

This modification allows almost double the number of processors to be used with RAPAC, but with the inclusion of an image bottleneck. During one frame period, only two images may pass between the two networks, one in each direction. To prevent an image bottleneck from occurring, very careful scheduling of tasks must be done, where possible splitting a job into two parts which are largely independent. Inevitably some processes cannot be fitted onto the architecture in this way, and so take longer while images queue up to be passed across the network.

4.2 A Time Multiplexed Switch

A second approach to routing images through the RAPAC processors was based around a time division multiplexed bus network structure. With a bus network each processor or framestore connects onto a common image bus which is divided into time slots, and each time slot may be allocated to the available cards. This system has several advantages over the star network already in use. Firstly, the bus based switch does not require separate connectors for each processor; all the cards share the same bus. This immediately makes the system more reliable and cheaper. Secondly, if the slots can be dynamically allocated to the processors on a frame by frame basis, then there is no limit to the number of processors which can reside on the switch. The switch limits the number of processors which can send data in a single frame time to the number of available slots, but does not limit the number of processors. Figure 6 shows a bus type switch with associated timing for a typical system. Implementation of such a switch in TTL logic limits the number of time slots on the bus to about 16 where the basic image pixel rate is 5 MHz. Data thus communicates on the bus at 80 MHz. This means a 16 by n switch can be realised, where n is the number of available processor and framestore units.

[Figure 6: Bus type switch and timing. One pixel time (200 ns) is divided into 16 allocatable image slots; the processor and framestore units share the image and control bus.]

Modifying RAPAC to operate on a bus type switch rather than a star type has many attractions in the short term but has not been adopted because it still represents a dead end in the longer term. Provided that a 16 by n switch is sufficient for most image processing applications, the system is quite flexible, but the physical size of the processor cards presents the next limitation on the maximum power of the system. Each processor card in RAPAC measures approximately 22 cm by 22 cm, and is based on a double extended eurocard rack.
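The timing figures for the time multiplexed bus follow directly from the pixel rate and the slot count. The sketch below works through that arithmetic and adds a simple per-frame slot allocator; the allocator and its unit names are hypothetical, included only to illustrate that the slot count limits simultaneous senders rather than the number of attached units.

```python
PIXEL_RATE_HZ = 5_000_000  # basic image pixel rate given in the text
SLOTS = 16                 # time slots per pixel period given in the text


def bus_rate_hz(pixel_rate_hz=PIXEL_RATE_HZ, slots=SLOTS):
    """Every slot carries one pixel per pixel period, so the rates multiply."""
    return pixel_rate_hz * slots


def pixel_time_ns(pixel_rate_hz=PIXEL_RATE_HZ):
    return 1e9 / pixel_rate_hz


def slot_time_ns(pixel_rate_hz=PIXEL_RATE_HZ, slots=SLOTS):
    return 1e9 / bus_rate_hz(pixel_rate_hz, slots)


def allocate_slots(senders, slots=SLOTS):
    """Give each unit sending an image this frame its own slot (illustrative
    policy: slots are handed out in list order)."""
    if len(senders) > slots:
        raise ValueError("more senders than time slots in this frame")
    return {unit: slot for slot, unit in enumerate(senders)}


# Hypothetical frame in which three of many attached units transmit.
allocation = allocate_slots(["acquisition", "arithmetic", "framestore_3"])
```

With the figures quoted above this gives an 80 MHz bus, a 200 ns pixel time and 12.5 ns per slot. Any number of units may sit on the bus, but a frame with more than 16 senders is rejected.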

5. THE VLSI SOLUTION

The recent availability of VLSI tools and facilities at lower prices makes VLSI an attractive proposition wherever reduced board size and chip count are needed. One way of using VLSI could be simply to replace the existing RAPAC cards with smaller VLSI equivalents, and indeed this is being done in one or two special cases. By far the most effective way of using VLSI, however, is not to generate many different VLSI chips for different jobs but to generate one chip, or a small set of chips, common to every process. The reason is that cost is driven not so much by the quantity of chips (indeed the reverse is true) as by the number of different types of chip.


If the whole of RAPAC could be based around a single VLSI processor then it would be very cost effective. To do this clearly implies a rethink of the basic RAPAC structure, basing a new system upon the particular aspects which make RAPAC suitable for real time processing. The two main keys to the success of RAPAC are believed to be the flexible architecture (by virtue of the switch) and the way data is handled by simple but fast processors in an image packet form. Figure 7 shows a proposed structure for a VLSI based version of RAPAC. At first sight this appears to be a fixed parallel array architecture, and physically this is true. The real system architecture, however, depends on the route taken by the data through the processors. If the VLSI devices marked 'P' on the figure are programmable and can perform any one of the functions available on the old RAPAC, then the physical structure superimposed onto the processors becomes almost irrelevant. The physical structure now serves only to provide enough interconnection between the units to allow setup of the parallel and pipelined routes required for any particular operation. A parallel structure was chosen since it allows expansion in any of four directions.

[Figure 7: Proposed VLSI structure. An array of processors marked 'P'; image transfer occurs on the parallel buses; the control bus is used by a host micro.]

The parallel image buses which appear unconnected in Figure 7 could serve a variety of purposes. Primarily the buses allow almost unlimited expansion of the processing machine where more power is required. Alternatively they could connect to special purpose processors such as acquisition and display cards to allow input and output of images, or they could be wrapped around and connected to themselves to generate novel architectures. The control bus serves as a means of programming the various units to perform the required functions.

In Figure 7 the devices marked 'P' must not only be able to function like processors, but must also be able to act like framestores. Clearly, with the level of integration available today it is not practicable to put enough memory (64 kbytes) on the devices themselves, so each device must have some associated external memory. In a dedicated processor solution to a specific problem it might not be necessary for every processor to act as a framestore, so savings are possible by omitting the memory where it is not required.

The parallel architecture of Figure 7 describes a general purpose machine which may be used in a development environment. When a solution to a problem is found, optimisation of the physical structure is simple. If, for example, more communication is required between processors which would be physically separated in a parallel structure, then the image ports can simply be rerouted. Figure 8 shows one possible arrangement of the processors to give more interconnection. Only P1 and P3 are not directly connected together.

[Figure 8: One possible rearrangement of the processors giving more interconnection.]
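The parallel array of Figure 7 can be modelled as a grid in which each processor has north, south, east and west image ports. The following is a minimal sketch assuming a rectangular array indexed by (row, column); the function and data layout are illustrative, not part of the proposed hardware. It lists which ports are linked to a neighbour and which remain open at the edges for expansion, acquisition and display cards, or wrap-around.

```python
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
STEP = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def build_grid(rows, cols):
    """Return (links, open_ports) for a rows x cols array of processors.

    links maps (processor, port) to (neighbour, neighbour_port);
    open_ports lists the edge ports with no neighbour.
    """
    links, open_ports = {}, []
    for r in range(rows):
        for c in range(cols):
            for port, (dr, dc) in STEP.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    links[((r, c), port)] = ((nr, nc), OPPOSITE[port])
                else:
                    open_ports.append(((r, c), port))
    return links, open_ports


links_2x2, open_2x2 = build_grid(2, 2)
links_3x3, open_3x3 = build_grid(3, 3)
```

In a 2 x 2 array every processor has two linked ports and two open ones. Growing the array to 3 x 3 raises the number of open edge ports from 8 to 12, so expansion capacity grows with the array rather than being fixed by a central switch.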

At first glance the structure shown in Figure 7 appears like a transputer. In fact, the system is more like a RISC version of the transputer with modified data flow, handled in packets and transferred on parallel buses. The central switch of Figure 1 has effectively been turned inside out or transformed alongside a reorganisation of the local association of the processors hence giving this structure.

In Section 3 an example of image processing was given, showing how cars can be detected in a road scene. We will consider this example again and map the problem onto the new RAPAC system. Figure 9 shows a system using four VLSI processors.

To perform vehicle detection in 40 ms, the processors can be set up in the following way.


[Figure 9: Four VLSI processors, P1 to P4, with an acquisition board (ACQ) and a display board (DISP).]

First 20 ms of processing.

P1 is set up as an arithmetic unit, subtracting its south input from its west input, and outputting the result on its east output. The current scene is stored to the P1 framestore.

P2 is set up as a local neighbour processor, receiving data from its west input and sending to its south output. The P2 framestore holds the display image, which is directed to the north output.

P3 is set up as a local neighbour processor, receiving data from its north input and sending to its west output. The P3 framestore receives data from its west input and stores it. This image is a semi-processed binary image of the detected vehicles.

P4 is set up as a local neighbour processor, receiving data from its east input and sending to its east output. The P4 framestore contains the reference background image, which is directed to the north output.

In the first 20 ms of processing, the raw road scene image, digitised by the acquisition board (ACQ in Figure 9), is passed into P1. P1 performs the background subtraction, passing the difference image on to P2. P2, P3 and P4 are all set up to perform local neighbour noise reduction operations, storing the semi-processed image in the P3 framestore.

Second 20 ms of processing.

P3 is set up as a local neighbour processor, and the semi-processed image in the P3 framestore is passed through the P3 processor unit and sent to the west output.

P4 is set up as a local neighbour processor, taking data from its east input and passing it onto the north output.

P1 is set up as an arithmetic unit, combining the binary vehicle image with the road scene image previously stored in the P1 framestore. The combined image is sent to the east output.

P2 is set up as a feature extractor. The display image coming in on the west input is passed to the feature extractor, the P2 framestore and the north output to the display board (DISP).

In the second 20 ms of processing, the binary feature image receives a further two local neighbour operations from P3 and P4. A display image is created by P1 and the final results are extracted by P2. Figure 10 shows the processing architecture seen by the image data.

[Figure 10: Processing architecture seen by the image data in the first and second 20 ms of processing.]

Using the new arrangement of processors (Figure 9), not only was the vehicle detection program (Section 3) possible, but three extra local neighbour noise reduction processes were added. It is possible to execute the vehicle detection program with only three VLSI processors, and still perform up to three local neighbour operations.
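The four-processor mapping can be checked with a small model of the two routes. This is a sketch only: the link set is inferred from the port directions given in the text (P1 east to P2 west, P2 south to P3 north, P3 west to P4 east, P4 north to P1 south, ACQ into P1 and P2 out to DISP), and the route lists and operation names are illustrative labels rather than a real configuration format.

```python
# Direct links between units, as implied by the port assignments in the text.
CONNECTED = {frozenset(pair) for pair in [
    ("ACQ", "P1"), ("P1", "P2"), ("P2", "P3"),
    ("P3", "P4"), ("P4", "P1"), ("P2", "DISP"),
]}

# Route taken by the image data in each 20 ms frame time: (unit, operation).
PHASE_1 = [("ACQ", "acquire"), ("P1", "subtract background"),
           ("P2", "local neighbour"), ("P3", "local neighbour"),
           ("P4", "local neighbour"), ("P3", "store")]
PHASE_2 = [("P3", "local neighbour"), ("P4", "local neighbour"),
           ("P1", "combine with scene"), ("P2", "feature extract"),
           ("DISP", "display")]

FRAME_MS = 20


def check_route(route):
    """True if every consecutive pair of units in the route is directly linked."""
    units = [unit for unit, _ in route]
    return all(frozenset((a, b)) in CONNECTED
               for a, b in zip(units, units[1:]))


def count_ops(route, name):
    return sum(1 for _, op in route if op == name)


total_local_neighbour = (count_ops(PHASE_1, "local neighbour")
                         + count_ops(PHASE_2, "local neighbour"))
extra_over_rapac = total_local_neighbour - 2   # Section 3 used two such operations
results_per_second = 1000 / (2 * FRAME_MS)
```

Both routes use only direct links, and the count gives five local neighbour operations against the two in the original RAPAC example, which agrees with the three extra operations and the 25 results per second quoted in the text.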

6. CONCLUSIONS

This paper presents the basis for a high speed image processing machine suited to real time applications. The system proposed is founded upon experience gained from RAPAC, a successful image processing machine. The new system is based around a common VLSI processor which can be arranged in a variety of physical configurations, and provides a flexible processing path to image data. Key features of the new system include almost unlimited expansion capability, and a very high degree of communication between individual processors. The use of a single type of processor makes the system cheaper to manufacture and brings a high degree of standardisation into any control software. This also paves the way for higher level compiler type languages which automatically schedule a task onto the available processor and framestore resources.

REFERENCES

[1] Elphinstone, A.C. et al., RAPAC: A High Speed Image Processing System, Proc. IEE Vol. 134, Pt. E, No. 1, Jan. 1987, pp. 39-46.

[2] Seed, N.L. et al., Real Time Processing of Infra-Red Images from Road Traffic, Proc. SPIE 590, 1985, pp. 233-240.

[3] Houghton, A.D., Seed, N.L. and Smith, R.W.M., Real Time Vehicle Recognition, in print.

[4] Stallings, W., Data and Computer Communications, MacMillan, Second Ed. 1988.

[5] Narraway, J.J. and Venkatesan, R., Fault Diagnosis in Benes Switching Networks, Proc. IEE Vol. 134, Pt. E, No. 2, Mar. 1987, pp. 78-86.