High-speed processor for realtime visual inspection

Visual inspection tasks in industry require fast, high-volume processing but for a justifiable capital investment. J M Edmonds* and E R Davies explain how bit-slice processors satisfy this requirement in a manufacturing system

The paper describes a complete visual inspection system, including the SIP sequential image processor, accompanying software and an algorithm for inspecting biscuits in production. It also analyses the performance of the SIP system. Bit-slice technology is used because of the constraints imposed by industrial manufacture, particularly cost, speed and adaptability. The SIP has been found capable of coping with product rates of ~11 products per second in a typical (rectangular biscuit) inspection task, though with suitable additional circuitry this capability should rise to ~30 products per second.

microsystems realtime systems bit-slice microprocessors image processing automated inspection
Machine Vision Group, Department of Physics, Royal Holloway & Bedford New College, Egham Hill, Egham, Surrey TW20 0EX, UK
*Now at: Philips Research Laboratories, Cross Oak Lane, Redhill, Surrey RH1 5HA, UK
Paper received: 5 December 1989. Revised: 9 October 1990

Over the past two decades computer vision has advanced steadily and many of the early hopes are now being realized. At this stage we are starting to see an explosion of industrial applications, particularly in the areas of automated visual inspection, automated assembly and automated vehicle guidance. However, a major problem remains before these systems can achieve their full potential: that of processing the large amounts of information in visual images sufficiently rapidly to be of practical use, while at the same time using computer hardware that is sufficiently cheap to justify capital investment. In fact, the design of cost-effective realtime vision systems is still a topic on which much research remains to be done.

We have examined the problem of realtime implementation of visual inspection tasks. Typically, these are concerned with the scrutiny of products such as cakes, pistons and circuit boards during manufacture, throughput rates being in the region of 10 parts per second 1-7. Even for images of 128x128 pixels, this corresponds to processing rates of well over 100 k pixels per second. Perhaps the most difficult part of the inspection problem in such cases is that of finding suitable processors and building viable cost-effective systems using them. In this paper we examine how well bit-slice processors fit in with this aim.

The paper examines the basics of visual inspection, and describes SIP, a sequential image processor employing commercial bit-slices, and its microcode. We report how a particular biscuit inspection task has been implemented using SIP, and give an overview of SIP's performance.

BASICS OF VISUAL INSPECTION
The basic purpose of visual inspection is to control the quality of manufactured goods 6. Pictures are obtained by video cameras (for stationary products) or by line-scan cameras mounted above a conveyor (for moving products); the digitized images are then passed to a computer system which can analyse them and cause any defective products to be rejected. Automated visual inspection has several important advantages over inspection by humans:

• Instead of carrying out inspection on a sampling basis, 100% inspection can be achieved. For production rates of 5-10 items per second, and where several inspection processes may have to be carried out on each product, it is unlikely that the human eye will be able to assimilate the required amount of information in the time available. Quite significant defects will thus go undetected. With an automatic system this should be avoided.
• A human cannot measure 'at a glance' the dimensional features of an object to the same accuracy as a computer-based vision system. The latter should be able to obtain and analyse the required numerical data to any specifiable level of precision.
• Many production lines run 24 hours a day, seven days a week, and indeed this is usually vital for keeping costs down. However, the tedious nature of visual inspection makes 30 minutes the probable limit for reliable human control.
• Machine inspection can be performed in environments which humans would find uncomfortable or impossible to endure, e.g. where noise or heat is excessive, or where there is a noxious atmosphere.

0141-9331/91/010011-09 © 1991 Butterworth-Heinemann Ltd
Microprocessors and Microsystems, Vol 15 No 1 January/February 1991
Hardware for realtime inspection

As noted above, a major problem of automated visual inspection is the need to scrutinize products in real time. Since typical rates are 5-10 (or even 20) per second, the image of each product must be obtained and analysed in ~150 ms. Conventional computers are usually too slow to process information at these rates, whereas parallel processors tend to be expensive and in some cases ill-matched to the proposed image analysis tasks (see below). In any case, it should be noted that a high degree of sequentialism is often inherent in inspection algorithms. Thus there is a need for special sequential processors which can tackle these vision tasks efficiently.

Such processors may be dedicated special-purpose hardware systems, but this solution is rather inflexible and badly suited to change of product. The alternative is programmable high-speed hardware. Though this solution tends to inflate software costs, hardware costs may be reduced, and this should represent a more satisfactory situation. Indeed, cost is often a crucial factor in choosing a suitable vision system. Our experience suggests that an affordable cost is often less than £10 000: as Davies 6 notes, above this figure the rather low profit margins characteristic of food product manufacture might be eroded excessively.

With this background of need for programmable, cost-effective realtime inspection systems, we started development of our bit-slice based SIP system, whose architecture and microcode are described below. DIPOD, an earlier bit-slice based machine developed for vision work 8, turned out to have too high a cost for use in many inspection applications: we designed SIP with this constraint in mind.
SEQUENTIAL IMAGE PROCESSOR

A block diagram of SIP is shown in Figure 1. The basic architecture is a 16-bit machine which is partitioned into six units: the processor, the image processing section, the local memory, the program section, the I/O section and the boot section. All except the last are made independent so that they can operate in parallel.
Processor

The processor section consists of four AMD29203 'super' (4-bit) bit-slice processors, operating at 8 MHz, together with a dedicated hardware multiplier. Each 29203 processor contains sixteen 4-bit registers (hence giving a total of sixteen 16-bit registers), three bidirectional I/O ports (DA, DB and Y), an ALU and in-built logic for division. The 29203 was chosen mainly for its ability to output the contents of two registers concurrently with an ALU operation. A register read makes the contents of a register available at the output of one of the three ports (determined by the instruction) as well as to the ALU. Thus, two registers can simultaneously be read, processed and output to ports DA and DB. As can be seen from Figure 1, this means that the contents of two registers can be loaded, in a single clock cycle (125 ns), into the input registers (MX and MY) of the multiplier: the values are then multiplied on the next clock cycle. Note that any data present on the Y-port can be loaded into the register which addresses the DB port; thus, the result from the multiplier can either be written back into one of the registers via the Y-port, or fed back to the input of the multiplier by selecting the appropriate buffers. Alternatively, the ALU can proceed concurrently with the multiplier, and the results need be fetched only when needed. In either case, multiplication requires just two machine cycles.
Image processing section

The core of the design is the image processing section. This consists of two 128x128 image planes, P and Q, an address generator (x,y mapping) and a lookup table as depicted in Figure 1. Although 128x128 image planes are somewhat small, they were chosen for two reasons:

1. At the time of design, eight 64kx1 chips would have been required to hold a single 256x256 image. This would have made SIP too expensive, the two image planes alone costing almost £2000.
2. In general, manipulation of a 256x256 image will take approximately four times as long as for a 128x128 image: the situation will be even worse for a 512x512 pixel image. However, experience has shown that 128x128 is acceptable for much realtime inspection work, thereby keeping processing time to a minimum.

One of the more frequent operations in both image processing and image analysis is manipulation of a pixel and its eight neighbours within a 3x3 window. This can lead to considerable computational effort (both for accessing the data and for processing it) with conventional processors. Indeed, merely fetching the data requires the relevant instruction to be fetched and executed and an appropriate address generated. Since this can cause quite a significant overhead, we decided to implement certain pixel manipulation functions in hardware.

First, appropriate values are written into an X,Y register (Figure 1) to access a specific pixel; the P-register then indexes neighbouring pixels relative to this (X,Y) position. To translate these virtual addresses into absolute addresses in the physical image RAM, a lookup table is used. In SIP four 4kx4 RAM chips are used to access a 5x5 window (in each dimension 7 + 5 = 12 bits are used to generate a 7-bit address). The neighbours indexed by various values are indicated in Figure 2. As a result the following performance was achieved. It takes one clock cycle (125 ns) to load the X,Y or the P-register.
The pixel is then available to the ALU on the next clock cycle and can be manipulated and written back to the image plane in the same cycle. However, this two-cycle access is reduced to just one cycle by pipelining the pixel fetch. This is possible since the processor and image sections are independent; thus, as a pixel is being fetched, the pixel previously obtained can be operated on. Effectively, this means that any pixel can be accessed in a single cycle. This has been found to improve dramatically the performance of many vision algorithms. For example, a Sobel operator applied to a 128x128 image takes 92 ms without the pipelined fetch but only 55 ms with the pipelined fetch. (Without the indexing lookup table the time taken would have been 130 ms.) The pipelined pixel fetch strategy is typically used when there are several reads from an image plane followed by a write, as occurs with the Sobel and many other window operators. In such cases efficiency is virtually doubled, as the above timings indicate. Further details are provided below and in Table 1 (see also Table 2).

Figure 1. Block diagram of SIP and its VMEbus interface. For the sake of clarity, bus transceivers, etc. are not shown. The bold numbers 1-6 show the main sections of SIP, as described in the text. At the time of writing the boot section has not been implemented. 1 - processor section, 2 - image processing section, 3 - local memory section, 4 - program section, 5 - I/O section, 6 - boot section, a - address lines, d - data input lines

    16  15  14  13  12
    17   4   3   2  11
    18   5   0   1  10
    19   6   7   8   9
    20  21  22  23  24

Figure 2. Notation for single digit index into processing window. The full notation accepted by the assembler includes a letter P or Q representing the appropriate image plane, plus one of the numbers 0-24 listed above, e.g. P0, Q8
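The window addressing scheme can be illustrated with a short software model. The Python sketch below is not SIP's actual lookup-table contents. It assumes the Figure 2 layout with index 0 at the centre, y increasing down the image, the coordinate in the upper 7 bits and the P-register index in the lower 5 bits of each 12-bit table address, and wraparound at the image borders. The names WINDOW, OFFSETS, build_lut and physical_address are hypothetical.

```python
# Software model of windowed pixel addressing (illustrative assumptions only).
SIZE = 128  # image planes are 128x128, so each coordinate needs 7 bits

# Figure 2 window notation: index 0 is the centre pixel.
WINDOW = [
    [16, 15, 14, 13, 12],
    [17,  4,  3,  2, 11],
    [18,  5,  0,  1, 10],
    [19,  6,  7,  8,  9],
    [20, 21, 22, 23, 24],
]

# Map each window index to its (dx, dy) offset from the centre.
OFFSETS = {
    index: (col - 2, row - 2)
    for row, line in enumerate(WINDOW)
    for col, index in enumerate(line)
}


def build_lut(axis):
    """Build a 4096-entry table: 7-bit coordinate + 5-bit index -> 7-bit coordinate."""
    lut = [0] * (SIZE * 32)
    for coord in range(SIZE):
        for p in range(32):
            # Indices 25-31 do not appear in Figure 2; mapping them to the
            # centre pixel is an assumption made for this sketch.
            delta = OFFSETS.get(p, (0, 0))[axis]
            # Wraparound at the image border is also an assumption.
            lut[(coord << 5) | p] = (coord + delta) % SIZE
    return lut


LUT_X = build_lut(0)
LUT_Y = build_lut(1)


def physical_address(x, y, p):
    """Translate window index p at position (x, y) into a linear image address."""
    return LUT_Y[(y << 5) | p] * SIZE + LUT_X[(x << 5) | p]
```

With these assumptions, each per-axis table has 4096 entries, which matches the 12-bit address quoted in the text, and a neighbour lookup costs only two table reads rather than any arithmetic in the processor.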
Table 1. Detail of coding for the Sobel edge detector

Cycle  Instruction              Effect
1      MOV P1, R0               Load 1 into P-register on falling edge of clock (this cycle is generally saved by pipelining with previous instruction)
2      SHL R0                   Move result to R0, shift left and load 2 into P-register on falling edge of clock
3      ADD P2, R0               Add result to R0 and write 8 to P-register
4      ADD P8, R0; MOV R0, P0   Add result, load 0 to P-register: write result in second half-cycle
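The register-level steps in Table 1 can be mirrored in a few lines of Python. This is a sketch rather than SIP microcode: the names SOBEL_STEPS and sobel_column_sum are illustrative, and the per-instruction cycle counts are those quoted alongside the Sobel listing in the MICROCODING section.

```python
# Each entry: (instruction, non-pipelined cycles, pipelined cycles),
# using the cycle counts quoted in the paper's Sobel listing.
SOBEL_STEPS = [
    ("MOV P1, R0", 2, 1),
    ("SHL R0", 1, 0),
    ("ADD P2, R0", 2, 1),
    ("ADD P8, R0", 2, 1),
    ("MOV R0, P0", 2, 0),
]


def sobel_column_sum(p1, p2, p8):
    """Compute P2 + 2*P1 + P8 in the same order as the microcode sequence."""
    r0 = p1      # MOV P1, R0
    r0 <<= 1     # SHL R0 (multiply by two using a shift)
    r0 += p2     # ADD P2, R0
    r0 += p8     # ADD P8, R0
    return r0    # MOV R0, P0 (value written back to the centre pixel)


non_pipelined = sum(step[1] for step in SOBEL_STEPS)
pipelined = sum(step[2] for step in SOBEL_STEPS)
```

Summing the two columns reproduces the 9-cycle and 3-cycle totals given for this code fragment.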
Local memory

The local memory section consists of 4 kbyte of high-speed static RAM. Since the processors have only 16 registers, extra RAM is required for variables and arrays, etc. The RAM address is controlled by two 8-bit up/down counters. These were used so that sequential reads through an array stored in RAM would be both fast and efficient. Access to the RAM is two cycles for a read or a write (load address, then read/write); however, since this section can operate concurrently with all other sections, careful programming often means that the access time of the RAM can (effectively) be eliminated.

Program section

The program section consists of the program memory (a 4kx75 RAM) and a program sequencer (an AMD2910). The program memory contains the microcode, whose words are 75 bit long because of the need to control the three sections previously mentioned. The 2910 allows quite complex branch routines to be executed and also makes the microcode more compact.

I/O section

To improve its commercial acceptability, SIP was interfaced to the VMEbus. This allows it to communicate with other VMEbus peripherals such as a disc interface, external RAM and framestores. In the SIP design an external framestore is required for image acquisition and display, and this acts as a bus slave. For SIP to communicate on the VMEbus and become a bus master, a field of microcode was reserved which included DS0, DS1, request and release bus, read/write to VMEbus and a strobe that outputs addresses to the VMEbus. These signal lines were interfaced to a state machine sequencer, held in an FPLS, which monitors the VMEbus and takes control of asynchronous VMEbus transfers; in particular, this device manages the 'bus request' and 'bus granted' lines, and thus eliminates much of the complexity of VMEbus interfacing.
Boot section

This section (which has not at the time of writing been implemented) is designed to 'boot' the system from cold, with the transfer of code from a slow ROM to the fast microprogram memory. The aim of this is to avoid the need for a separate host CPU when in factory use.

Microword structure

SIP's registers and image planes are memory-mapped onto the VMEbus as an 8-bit control/status register, six 16-bit words for the microcode and two banks of 16 kbyte for the image planes. The control/status register allows other processors to monitor and control the state of SIP via the VMEbus. This register includes 'start program', 'stop program' control bits and the 'done' status signal: this bit goes high when SIP has finished its current program and is generally monitored by another processor. Also included is a control bit which allows SIP to be put into single stepping mode: this is used mainly for downloading code and for debugging.

As mentioned above, SIP's microcode consists of 75 bit. From a programmer's point of view, the microcode is split into the following sections:

• Processor control section (which includes the AMD29203 and the multiplier) - 23 bit
• Memory control section (which includes both the image planes and the local memory) - 10 bit
• Program control section (which includes the data sent out on the internal data bus, the program counter instruction and various output enable controls) - 35 bit
• I/O control section (which handles the protocols for controlling the VMEbus when in master mode) - 7 bit.

The following details should be studied in conjunction with SIP's architecture (Figure 1). The 23 bit of the processor control section includes 9 bit for the basic AMD29203 instruction and 4+4 bit to address two of its 16 internal registers, plus 6 bit to: control the functions of the registers; provide the carry input; and to control the loading of the MX and MY registers of the multiplier, and its output enable. The 10 bit required for the memory control section include read and write enables for the image planes and local memory, plus various commands for control (count up, down, load or hold) of the local memory address register and for the X and Y registers that map onto the lookup table. The 35 bit required for the program control section includes 3 bit for the condition code select control (used for conditional jumps), 16 bit for the data that is output onto the main data bus, and 4 bit for the instruction to the AMD2910 program sequencer, plus 11 output enables of the various devices on the board, and the 'done' bit. The 7 bit of the I/O control section consist of various strobes that interface to the VMEbus via an FPLS VME controller, as mentioned above. These control the requesting and releasing of the VMEbus along with various strobes that are necessary to determine when the bus has been granted and which control the bus once granted.

Discussion

As mentioned above, realtime image processing hardware has to handle data rates of the order of 100k pixels/s. To operate on a single pixel, one normally has to fetch it (in many preprocessing stages the neighbours also have to be fetched), manipulate it, and in many cases write the resulting answer back to the image plane. Analysis of image processing algorithms on commercial processors (e.g. J11, 68000) shows two main areas of weakness: the pixel fetch is normally quite slow (and is made even more severe when the neighbours also have to be fetched); and operations on pixels are often quite simple, but are slowed down because these processors are 'general purpose'. For instance, they can take several cycles to carry out an add function. Thus the design had to find solutions to these two problems. First, pixel access was implemented in hardware in order to make it run fast: furthermore, to reduce the effective pixel access time, it was made to run in parallel with the processor. Second, the processor was chosen primarily for speed, thereby leading to use of a bit-slice processor.

Further analysis showed that multiplication is frequently used in image processing algorithms and that small arrays are commonly employed (e.g. for histogramming). For this reason, a small amount of local memory and a high-speed parallel multiplier were incorporated into the design. These were made to run in parallel with each other since analysis of the algorithms developed within this laboratory showed that effective use could be made of them in this configuration. The design of the system was straightforward once all the major design decisions (see especially Figure 1) had been made.
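The division of the 75-bit microword into four control fields can be checked with a small packing routine. The Python sketch below uses only the field widths stated in the Microword structure section; the ordering of the fields within the word and the names FIELDS, pack_microword and unpack_microword are assumptions made for illustration.

```python
# Field widths in bits, as given in the text. The order of the fields
# within the 75-bit word is an assumption made for this sketch.
FIELDS = [("processor", 23), ("memory", 10), ("program", 35), ("io", 7)]
WORD_BITS = sum(width for _, width in FIELDS)


def pack_microword(values):
    """Pack a dict of field values into one integer, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_microword(word):
    """Split a packed microword back into its named fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

The widths sum to 75, so a packed word fits in the six 16-bit words (96 bits) through which the microcode is memory-mapped onto the VMEbus.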
MICROCODING

As mentioned above, bit-slice designs are generally tedious to program because a great deal of hardware information (architecture, internal bus interconnections, etc.) must be known in order to program them. To alleviate this problem, an assembler was developed. As in certain other assembly languages, instructions take the basic form:

    <opcode> <1st operand type>, <2nd operand type>

where the operand type is either data (D), register (R), image plane access (P or Q), x-y register access (X or Y), indexed (I) or local memory (M) access. For example, MOV #4, R0 moves the number 4 to register zero. Note below that comments appear after a semicolon. The effectiveness of the SIP pipelining operation may be seen from Table 1, which shows further detail for the following section of code.

    ; Sobel edge detector                            Number of cycles
    ;                                                Non-pipelined  Pipelined
    MOV P1, R0    ; this section of code             2              1
    SHL R0        ; computes (P2 + 2.P1 + P8)        1              0
    ADD P2, R0    ; and is needed for the Sobel      2              1
    ADD P8, R0    ; y-component of                   2              1
    MOV R0, P0    ; intensity gradient               2              0
    ;             total number of cycles             9              3

First, P1 is fetched by loading 1 into the P-register. On the next cycle, P1 is loaded into R0, then shifted left; simultaneously, the P-register is loaded with a 2. (Note that we have here made use of a feature of the 29203: that one can optionally shift the result of an ALU operation in either direction before outputting to the Y-bus. Clearly, this feature is of value in a Sobel operator which requires several multiply-by-two operations.) On the next cycle, P2 is added to R0 while simultaneously loading 8 into the P-register. P8 is then fetched, added to R0, and written back into the image plane in the same cycle. Pipelining (at present carried out manually) reduces the above section of code from 9 to 3 cycles. The speed improvement resulting from pipelining for certain other common image processing algorithms is shown in Table 2. As SIP has the ability to operate on a pixel and write the result back in the same cycle, many operations take just 1 cycle (125 ns) to execute.

To clarify how the architecture of SIP relates to the microcode, we here include an example showing an intensity histogram of an image being computed. The number at the end of each instruction indicates the number of lines of microcode required in each case:

    ; Intensity histogram computation
    VAR hist:256       ; define an array of 256 words                                    0
    MOV #hist, R0
    CLR R1
    clr: ADD R1, R0    ; clear histogram
    CLR (R0)           ; use indexed mode
    INC R1
    CMP #256, R1       ; finished?
    BNE clr
    MOV #hist, R0      ; move the absolute base address of array hist to register 0      1
    ;
    APPLY              ; initiate a scan over the image                                  1
    MOV P0, R1         ; move the value of P0 to register 1                              2
    ADD R0, R1         ; add to R0 to find offset into array                             1
    INC (R1)           ; increment using indexed facility                                3
    END                ; check for end of image                                          2

The APPLY... END construct used here is handled automatically by the assembler. As can be seen from Table 3, the APPLY statement takes one cycle and initializes the X and Y counters; it also pushes the address of the next instruction onto the stack. The END instruction takes two cycles: on the first cycle the X-Y counters are incremented as appropriate, and on the second, a check is made for the end of the image. For a 128x128 image, the number of cycles required is (6x256 + 2 + 8x16384 + 2) ≈ 130000, giving a total execution time (at an 8 MHz clock rate) of 16 ms. A listing of the full instruction set can be found in Reference 9.

Table 2. Performance of SIP on basic imaging functions (the timings are for 128x128 images)

Algorithm              SIP (a)+ (ms)  SIP (b)+ (ms)  PDP-11/73* (ms)  Speedup of (b) rel to 11/73*
Sobel                  93             56             1076             19
Intensity histogram    16             16             689              43
Threshold              16             11             292              26
Mean (3x3)             65             65             764              12
Median (3x3)           500            445            11607            26
Thin (5 iterations)#   690            471            10712            23

+Column (a) is for non-pipelined and (b) for pipelined code
*Note that a PDP-11/73 is ~2.5 times faster than a PDP-11/23 (see Table 5). Hence the mean gain in speed ~20 relative to a PDP-11/73 corresponds to a gain ~50 relative to a PDP-11/23
#Davies and Plummer algorithm
Table 3. Detail of coding for intensity histogram computation

Cycle  Instruction     Effect
1      MOV #hist, R0   Move base address of table to R0
2      APPLY           Set X and Y registers to 0 and set up scan
3      MOV P0, R1      Set P-register to 0
4                      Move result to R1
5      ADD R0, R1      Add R0 to R1
6      INC (R1)        Load R1 to local memory address
7                      Get result, add 1: write back in same cycle
8      END             Add 1 to image address counters
9                      Test for end of image
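The cycle count quoted for the histogram example, (6x256 + 2 + 8x16384 + 2), can be reproduced with a short calculation. The Python sketch below is a software model only: intensity_histogram, histogram_cycles and cycles_to_ms are hypothetical names, and the constants 6 and 8 are the per-iteration cycle counts implied by the formula in the text.

```python
def intensity_histogram(image, levels=256):
    """Count how many pixels take each grey level (software equivalent of the scan)."""
    hist = [0] * levels
    for row in image:
        for pixel in row:
            hist[pixel] += 1
    return hist


def histogram_cycles(size=128, levels=256):
    """Cycle estimate following the formula in the text."""
    clear_loop = 6 * levels + 2        # clearing loop plus its two set-up instructions
    image_scan = 8 * size * size + 2   # per-pixel loop plus MOV #hist and APPLY
    return clear_loop + image_scan


def cycles_to_ms(cycles, clock_hz=8_000_000):
    """Convert a cycle count to milliseconds at the 8 MHz clock (125 ns per cycle)."""
    return cycles / clock_hz * 1000.0
```

Evaluating histogram_cycles() gives 132612 cycles, or about 16.6 ms at 125 ns per cycle, consistent with the rounded figures of 130000 cycles and 16 ms.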
INSPECTION ALGORITHM

When designing industrial inspection algorithms, it is important to ensure that they are fast, robust and accurate. In these respects we have found the Hough transform (HT) 10 useful for object location. In particular, it is relatively insensitive to noise and the effects of defects or partial object occlusion. In addition it is easily implemented on SIP. Note that though the edge detection could be implemented naturally on a SIMD machine, this is not so for the Hough transform 7.

The basic idea of the HT is that each primary feature located in the input image is permitted to contribute a vote at some position in a separate 'parameter space' indicating the presence of a particular type of object. Analysis of the peaks in parameter space then indicates the presence of objects. For example, circles may be located by accumulating candidate centre positions in parameter space. The primary features that are used to initiate this recognition procedure are usually orientated edge points.

Earlier work in this research group involved the location of round biscuits using the HT, special-purpose hardware being developed both to carry this out and to scrutinize the products in real time 7. Here we report an algorithm for inspecting rectangular chocolate-coated biscuits which we have implemented on SIP. The rectangular biscuits are made by sandwiching cream between two wafers and then coating the biscuit with chocolate. Possible defects that can occur on such products are: too much chocolate (chocolate dripping over the sides); too little chocolate; and incorrect size or shape (non-aligned biscuit wafers).

In our case, the HT is used to determine the orientation of the biscuit and locate the sides. To achieve this, edge points are first found using the Sobel operator, which is well suited for this purpose since it permits edges to be orientated to within 1° (Reference 11).
Once the sides have been located, the biscuit can be matched against a predefined model, thereby allowing the size, the amount of excess chocolate, and any places where chocolate is missing to be found. Note that the natural variability of biscuits (as for other food products) is quite considerable, and hence decisions on quality are rather difficult to make 6.
The steps in the inspection process are now as follows:

1. Apply the Sobel edge detector to the image and determine the orientation of each edge element.
2. Increment the positions in an accumulator array that correspond to these angles 12.
3. Find significant peaks in parameter (orientation) space indicating the most prominent edges in the image (for rectangular products there will be peaks at 90° intervals).
4. From this information, locate each side using least squares to improve the fit, and then deduce the corner points, finally obtaining a best fit rectangle.
5. Determine whether the dimensions are correct (i.e. check the wafers have not slipped during production).
6. Check for insufficient chocolate within the best fit rectangle.
7. Check for excess chocolate outside the best fit rectangle.
8. Initiate rejection if the product is defective.

Further details of this algorithm may be found in Reference 9. It was implemented on SIP and the average time to locate and scrutinize each biscuit (including image acquisition and retrieval from an external framestore) was 130 ms, giving an average throughput capability of 7-8 products per second. The limiting factor here was a rather slow external framestore: use of a framestore with more reasonable access times (i.e. <500 ns) would bring this time down to ~90 ms, hence permitting inspection at ~11 products per second. Clearly, if fewer checks have to be made on each product, even higher throughput rates can be dealt with. A breakdown of the timings is given in Table 4. Sample results obtained with this algorithm are shown in Figure 3. Note that the HT rectangle and circle detection algorithms described in this paper are highly robust and can cope with considerable amounts of noise and image clutter: see for example Figure 4.
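Steps 1 to 3 of the procedure can be sketched in software as an orientation accumulator. The Python example below is an illustration under stated assumptions, not the SIP implementation or the exact method of Reference 12: it uses the standard 3x3 Sobel masks, 1° accumulator bins over 0-359°, and a fixed gradient-magnitude threshold. The function names and the threshold value are hypothetical.

```python
import math


def orientation_histogram(image, min_magnitude=100.0):
    """Accumulate Sobel edge orientations (in 1 degree bins) over the image interior."""
    height, width = len(image), len(image[0])
    accumulator = [0] * 360
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Standard Sobel masks: right column minus left column, bottom row minus top row.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            if math.hypot(gx, gy) < min_magnitude:
                continue  # ignore weak gradients (threshold value is an assumption)
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            accumulator[int(round(angle)) % 360] += 1
    return accumulator


def strongest_orientations(accumulator, count=4):
    """Return the `count` most populated orientation bins, sorted by angle."""
    ranked = sorted(range(len(accumulator)), key=lambda a: accumulator[a], reverse=True)
    return sorted(ranked[:count])
```

For a bright axis-aligned rectangle on a dark background, the four most populated bins under these assumptions fall at 0°, 90°, 180° and 270°, which is the 90° spacing of peaks that the algorithm relies on for rectangular products.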
Table 4. SIP timings for rectangular biscuit algorithm

Operation                                     Time (ms)
Sobel edge detector                           55
Finding and accumulating edge orientations    5.8
Estimating angles of sides                    0.3
Least squares fitting of sides                13
Finding where chocolate is missing            5.8
Finding excess chocolate                      1.8
Total                                         81.7

These times differ only slightly between biscuits. Use of a suitable external framestore on the VMEbus should permit an overall inspection time including image acquisition within ~90 ms, corresponding to product throughput rates ~11 per second (see text)

Figure 3. Results obtained with the rectangular biscuit inspection algorithm: a a rectangular biscuit of somewhat irregular shape; b the fitted rectangular outline with a shape discrepancy clearly marked in one corner; c another rectangular biscuit; d rectangular outline, this time with excess chocolate identified

Figure 4. Robustness of the HT algorithms described in this paper. The centres of all the circular O-rings have been located accurately by the HT-based algorithm in spite of the many objects partly overlapping them

OVERVIEW OF SIP PERFORMANCE

In this section we give an overview of the performance of the SIP system. In particular we compare it with the 'IMP' (Imaging Multi-Processor) inspection system described in Reference 7, which represents earlier work carried out in this research group. The IMP system is a round biscuit inspection system, which makes use of a PDP-11/23 host processor to which hardware accelerators are added to perform the computation-intensive parts of the algorithm. As a result the system was able to perform in real time at product rates ~4 per second, though simple enhancements (including replacing the PDP-11/23 host by a PDP-11/73, roughly 2.5 times faster) were reported as bringing the overall speed up to 11-12 products per second (88 ms per product) (see Figure 1 in Reference 7).
When comparing two different systems, it is important to be sure of comparing like with like. Since the systems have not been run on identical inspection tasks, we here focus on a few of the functions listed by Reference 7, so that we can compare them directly with our own data. The result is seen in Table 5. This table shows that SIP is able to perform the IMP host functions exceedingly quickly, with an average gain in speed of the order of 50. Indeed, SIP gives a spectacular speedup of nearly 80 for the crucial, slowest function, that of edge location. On the other hand, when corn pared with the functions performed in IMP by dedicated TTL hardware, there is actually a drop in speed, as might be expected with a programmable system. However, the loss in speed is quite small, and it seems that the speed obtainable using SIP should be adequate for many realtime inspection applications. If in some instances the speed were inadequate, much of the problem would undoubtedly lie with the edge detector and other specific functions. Hence we envisage that, rather than using the IMP host with a particular amount of dedicated hardware, it will be more cost-effective to use SIP as host with a smaller amount of dedicated hardware. Table 6 lists some of the IMP and SIP-based systems that could be used for implementing round biscuit algorithms. The C*T (cost*time) criterion function for optimizing hardware-software tradeoffs, discussed in References 7 and 13, indicates an optimum system based on SIP but with use of further carefully chosen accelerators. None of this precludes the use of more up-to-date bitslices or other processor chips (such as transputers, see
17
Table 5. Performance of IMP and SIP on inspection routines (both the IMP and the SIP systems are evaluated h~ routines related to round product inspection, in 128x128 images) IMP
Function
1 2 3 4
Find edge points Accumulate points in parameter space Find peaks in parameter space Find light area (no chocolate cover)
SIP
11/23 host (ms)
Dedicated hardware (ms)
(ms)
Speed relative to 11/23
4265 86 20 434
25 + 10 +
55 3.3 0.4 11
78 26 45 39
+later reduced to -half these figures7
Table 6. Comparison of IMP and SIP-based systems

System                                              Cost (£)   Time (ms)   C*T (£s)
1 IMP with 'optimal' complement of accelerators     24700      31+         770
2 IMP using PDP-11/23 + original accelerators*      11500      220         2500
3 IMP using PDP-11/73 + improved accelerators*      12500      88          1100
4 SIP                                               6000       ~120        ~720
5 SIP with improved external framestore             6000       ~80         ~480
6 SIP with IMP (improved) edge detector             9000       ~30         ~270
7 SIP with all IMP (improved) accelerators          11500      ~25         ~290

This table demonstrates that a particular SIP-based system (item 6) is optimum for round product inspection, in terms of the C*T (cost*time) criterion function: note that this criterion is only valid if there are no overriding cost or speed constraints (see discussion in Reference 7). All timings relate to 128×128 images.
N.B. Several of these figures are estimates, based on experience with the SIP system.
*Performance of IMP was limited by an overall cost figure.
+Estimate assuming typical VME and memory access times (see discussion in Reference 7).
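The C*T column of Table 6 is simply cost multiplied by processing time, expressed in £s. The following sketch recomputes it from the cost and time columns and picks out the minimum. The names `SYSTEMS`, `cost_time_product` and `best_system` are illustrative assumptions, and the selection ignores the caveat stated with the table, namely that the criterion only applies when there is no overriding cost or speed constraint.

```python
# (item number, description, cost in pounds, time in ms), copied from Table 6.
# Several of the SIP figures are estimates in the source.
SYSTEMS = [
    (1, "IMP with 'optimal' complement of accelerators", 24700, 31),
    (2, "IMP using PDP-11/23 + original accelerators", 11500, 220),
    (3, "IMP using PDP-11/73 + improved accelerators", 12500, 88),
    (4, "SIP", 6000, 120),
    (5, "SIP with improved external framestore", 6000, 80),
    (6, "SIP with IMP (improved) edge detector", 9000, 30),
    (7, "SIP with all IMP (improved) accelerators", 11500, 25),
]


def cost_time_product(cost_pounds, time_ms):
    """Return the C*T criterion in pound-seconds."""
    return cost_pounds * time_ms / 1000.0


def best_system(systems):
    """Return the entry with the lowest C*T value."""
    return min(systems, key=lambda s: cost_time_product(s[2], s[3]))


if __name__ == "__main__":
    for item, name, cost, time_ms in SYSTEMS:
        print(f"{item} {name}: C*T = {cost_time_product(cost, time_ms):.0f}")
    print("Lowest C*T: item", best_system(SYSTEMS)[0])
```

The recomputed values agree with the tabulated ones to within rounding, and item 6 has the lowest C*T, slightly ahead of item 7.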
below). On the other hand, it is worth noting that using N identical chips tends not to give C*T values as low as might a priori be expected since, for multiprocessors, cost is proportional to N but speed is rarely increased by quite as much as N because data bottlenecks arise. Hence searching for the right types of coprocessor is a worthwhile task. In this context it is worth briefly considering the possibility of using transputer-based multiprocessor systems in our type of inspection application. In fact, transputers have been used quite widely for image processing and analysis. However, though they would seem a priori, with their integrated high-speed data channels, to be ideally adapted for imaging work, two main problems arise in practice: the first is how to partition the task between the various transputers, and
the second is that of devising the best architecture for distributing the data between them. Topologies such as a processor farm using transputers have in some cases shown a near-linear increase in performance with number of processors. For example, a Sobel edge detector implemented on a single T414 transputer was found to take 2616 ms when operating on a 256×256 image, while on five transputers (one T800 master and four T414 slaves) it took 508 ms (Reference 14; see also Reference 15). When we scale these times down to the 128×128 images used in our application, we find that SIP is about twice as fast as five transputers (see Table 2). Thus, if we tried to build a transputer system for running the complete rectangular biscuit inspection algorithm operating at 11 products/s, at least 10 transputers would be needed. However, the linear increase in performance seems to degrade seriously after four slaves (especially with the more trivial types of operation; Reference 16) because of the restricted bandwidth of the links, so that a good many more transputers may be needed to bring performance back to the required level. One way to avoid this problem is to use special-purpose video buses with dual-ported RAM to map the image directly into memory (Reference 17). However, the additional hardware complexity will markedly increase the cost of the system. Thus, for more complex tasks or a moderate product inspection rate, transputers may be viable, but for high product rates (>10/s) SIP seems likely to provide a more practicable solution, the situation obviously being highly application dependent. Overall, then, it is not at all obvious that a transputer system can be built within the overall price limit of £10 000 by which we were constrained. For reference, Table 2 shows the speed of SIP for a number of well-known algorithms.
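The scaling argument above can be made explicit with a short calculation. This is only a sketch under two stated assumptions: that the Sobel timing is proportional to the number of pixels, and that transputer performance scales linearly with processor count (which the text notes is optimistic beyond four slaves). The function names are illustrative, and the factor of two for SIP is taken from the text rather than derived here.

```python
import math


def scale_time_ms(time_ms, from_side, to_side):
    """Rescale a timing between square images, assuming time is
    proportional to the number of pixels (an assumption)."""
    return time_ms * (to_side ** 2) / (from_side ** 2)


def transputers_needed(measured_count, required_speed_ratio):
    """Processors needed to reach a target speed, assuming ideal
    linear speedup with processor count (an assumption)."""
    return math.ceil(measured_count * required_speed_ratio)


if __name__ == "__main__":
    # Published Sobel timings on 256x256 images.
    single_t414 = scale_time_ms(2616, 256, 128)   # 654.0 ms at 128x128
    five_procs = scale_time_ms(508, 256, 128)     # 127.0 ms at 128x128
    print(f"Single T414 at 128x128: {single_t414:.0f} ms")
    print(f"Five transputers at 128x128: {five_procs:.0f} ms")
    print(f"Measured speedup on five transputers: {2616 / 508:.2f}")

    # The text states that SIP is about twice as fast as five transputers.
    sip_advantage = 2.0
    print("Transputers needed to match SIP:",
          transputers_needed(5, sip_advantage))
```

Under these assumptions the five-transputer time scales to about 127 ms, and matching SIP requires about 10 transputers, which is the lower bound quoted above.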
CONCLUDING REMARKS
This paper has described a complete visual inspection system, including a high-speed sequential image processor, accompanying software and a biscuit inspection algorithm. It has also analysed the performance of the SIP system. Bit-slice technology was chosen because of the constraints imposed by industrial manufacture, particularly cost, speed and adaptability. Our experience has amply justified this choice. The one-off cost of the SIP circuit boards was £620 (this price excludes the boot section, which could cost as little as £30), but for a minimum system, a camera, framestore, VMEbus rack, backplane and power supply are also
required, leading to a total cost of some £6000. This means that SIP falls well within the cost constraint of £10 000 we noted earlier. In addition, the SIP scheme has been found capable of coping with product rates of ~11 products/s, as demonstrated in a typical (rectangular biscuit) inspection task. Analysis has shown that when there is no strict cost limit, it may be useful to add dedicated accelerators or additional processors to the SIP system: this should enable it to cope with product rates in the region of 30 per second.
ACKNOWLEDGEMENTS
We are indebted to Dr A I C Johnstone for advice and help, not least in providing us with an early version of his VME interfacing circuit. One of us (JME) would like to thank the SERC and National Physical Laboratory for a CASE Research Studentship. We are grateful to Dr A P N Plummer of the National Physical Laboratory for much assistance early in the SIP project.
REFERENCES
1 Brook, R A and Purll, D J 'On-line image acquisition and analysis for automatic product inspection' Inst. Phys. Conf. Series No 44 Chapter 4 (1979) pp 137-150
2 Agin, G J 'Computer vision systems for industrial inspection and assembly' IEEE Computer (May 1980) pp 11-20
3 Chin, R T 'Automated visual inspection techniques and applications: a bibliography' Pattern Recogn. Vol 15 No 4 (1982) pp 343-357
4 Chin, R T and Harlow, C A 'Automated visual inspection: a survey' IEEE Trans. Pattern Anal. Mach. Intell. Vol 4 No 6 (1982) pp 557-573
5 Cronshaw, A J 'Automatic chocolate decoration by robot vision' in Pugh, A (ed.) Robot Vision IFS Publications, Bedford, UK (1982)
6 Davies, E R 'Design of cost-effective systems for the inspection of certain food products during manufacture' Proc. 4th Int. Conf. Robot Vision and Sensory Controls (1984) pp 437-446
7 Davies, E R and Johnstone, A I C 'Engineering tradeoffs in the design of a real-time system for the visual inspection of small products' Proc. IMechE Conf. on UK Research in Advanced Manufacture (1986) pp 15-22
8 Pritchard, S, Cohen, D and Sleigh, A C 'DIPOD: an advanced multiprocessor system for image analysis' Proc. 2nd IEE Int. Conf. on Image Processing and its Applications, London, UK IEE Conf. Publication No 265 (June 1986) pp 134-138
9 Edmonds, J M 'Studies of inspection algorithms and associated microprogrammable hardware implementations' PhD thesis, London University, UK (1988)
10 Hough, P V C 'Method and means for recognizing complex patterns' Patent No 3069654 (1962)
11 Davies, E R 'Circularity - a new principle underlying the design of accurate edge orientation operators' Image and Vision Computing Vol 2 No 3 (1984) pp 134-142
12 Dudani, S and Luk, A 'Locating straight-line edge segments on outdoor scenes' Pattern Recogn. Vol 10 (1978) pp 145-157
13 Davies, E R and Johnstone, A I C 'Methodology for optimising cost/speed tradeoffs in real-time inspection hardware' IEE Proc. E Vol 136 No 1 (1989) pp 62-69
14 Mirmehdi, M, West, G A W and Dowling, G R 'Label inspection using the Hough transform on transputer networks' Microprocessors Microsyst. (1991) accepted for publication
15 Chapman, R, Willey, T, Bartkowiak, J G and Durrani, T S 'Image processing strategies on transputer arrays' Signal Processing III: Theories and Applications Elsevier Science Publishers B.V. (1986) pp 933-936
16 Edmonds, J M and Davies, E R 'Parallel pattern processing using transputers' Progress report, project NPL 82/0446 (1988)
17 Brown, C and Rygol, M 'Marvin - multiprocessor architecture for vision' Applying Transputer Based Parallel Machines, Occam User Group Meeting 10 IOS, Amsterdam, The Netherlands (1989)
18 Davies, E R and Plummer, A P N 'Thinning algorithms: a critique and a new methodology' Pattern Recogn. Vol 14 Nos 1-6 (1981) pp 53-63

Microprocessors and Microsystems, Vol 15 No 1, January/February 1991
Dr E R Davies is a senior lecturer in microelectronics at RHBNC, University of London. Since 1976 he has been engaged in research on various aspects of machine vision, including edge, corner and circle detection, thinning, Hough transforms, robust pattern matching, and hardware implementations of vision algorithms. He has published more than 40 papers on machine vision and automated visual inspection, and has recently completed a book entitled "Machine vision: theory, algorithms, practicalities" with Academic Press.
Dr J M Edmonds is a senior scientist at Philips Research Laboratories, UK. He gained a PhD from RHBNC (University of London) in 1988 after carrying out research on image processing algorithms and their realtime implementation in microprogrammable hardware. After a further year's research investigating the application of transputers to vision, he moved to Philips where he is currently involved in the development of concurrent systems for realtime signal processing applications.