Microprocessing and Microprogramming 40 (1994) 783-788
Towards a definition of benchmarks for parallel computers dedicated to image processing/understanding

Patrick Bonnin 3,2, Edwige E. Pissaloux 1,2, T. Dillon 4

1 IEF/CNRS URA 022, 91405 Orsay Cedex, France
2 Université Paris XIII, 93430 Villetaneuse, France
3 ETCA/CREA/SP, 94144 Arcueil, France
4 Department of Computer Science and Computer Engineering, La Trobe University, Bundoora, Victoria 3083, Australia
Abstract
This paper presents an effort to define a method for the evaluation of parallel architectures dedicated to vision. A definition of the benchmark concept, and a characterisation of a standard set of general-purpose vision algorithms which could constitute a benchmark, are proposed. These algorithms are independent of the machine architectures, environments, programming models and parallelisation techniques supported. A detailed example illustrating the proposed approach is given.
1 Introduction
The research presented here focuses on tools and methods for the comparison of parallel computers dedicated to vision. Machine vision approaches require their own specific architectures, and very frequently must meet real time processing constraints (for example with mobile robots controlled by vision). As several different parallel architectures have been proposed for machine vision, some means of comparing them is necessary. Benchmarks are the most popular tools for comparing machine speed. Several different benchmarks for parallel image processors already exist ([Nag81], [Gra81], [Moo82], [Pre89], [Lin88], [Wee91], ...). Every designer has his own benchmarks for each image processing problem; for example, filters for image preprocessing, or edge detectors for segmentation. The main characteristics of three examples of
benchmarks are briefly considered next, to provide an overview of the state of benchmarks for parallel image processing. The benchmarks given in references ([Nag81], [Gra81], [Moo82]) test low level image processing aspects only, and thus consider a very limited set of image operations. No information about the algorithm implementations, such as the data types used, the method and the accuracy obtained, is provided. These algorithms do not give enough information about the processing capabilities of the computers tested. The Abingdon Cross benchmark ([Pre89], [Lin88]) finds the medial axis of a cross in a (noisy) background. It tests the following classes of operations: input/output, matrix/vector operations, Boolean operations, filtering, local combinatorial logic operations (used in edge detection), separation from the background, and connectivity-preserving thinning (medial axis transform) of the cross. The architectural characteristics tested are not exhibited; the results give only the total execution time for all steps. The second, knowledge-based machine vision DARPA benchmark ([Wee91]) was designed by the University of Massachusetts, and then verified for its validity, portability and quality by the University of Maryland. The overall task performed by this benchmark is the recognition of an approximately specified 2 1/2-dimensional "mobile" sculpture in a cluttered environment, given images from intensity and range sensors. Roughly 40 algorithms from different image processing levels were chosen. Various statistics were collected, including the global execution time, loading and
precompiled data time, initialization time, and economy criteria. It is very difficult to interpret the statistical results obtained correctly. All the designed benchmarks were frequently evaluated through machine simulators, and some processings were implemented only partially; which algorithms were effectively implemented is not known. The discussion of the above benchmarks highlights the difficulty of using them to compare different architectures precisely. The term "benchmark" has a different meaning for each originator. Moreover, the proposed benchmarks do not give any information on the hardware structures most convenient for a correct (side-effect free) implementation of a given vision problem, especially if conceptual (and not structural) programming is utilised. Finally, these benchmarks do not take into account all vision primitives. This paper tries to overcome some of the above weaknesses. Section 2 proposes a definition of the concept of benchmark, and provides a characterisation of the algorithms for benchmarks. Section 3 proposes practical rules for choosing the most adequate algorithms for low level vision tasks, and gives a detailed example of one algorithm for benchmarking. Section 4 provides some concluding remarks.
2 A characterisation of the concept of a benchmark for a computer dedicated to vision

It is very difficult to give an intrinsic definition of the benchmark concept for a computer dedicated to vision. A proposed definition could be as follows:

Definition: A benchmark is a minimal and functionally complete set of algorithms, together with some criteria, that permits testing and evaluation of the capability of a computer to support a given domain of application.

The above definition is too difficult for a direct implementation. It seems simpler to give some criteria for the characterisation of benchmarks. The existing benchmarks have pointed out that several parameters should be taken into account in order to define a benchmark. This paper focuses on criteria for choosing vision processing algorithms suitable for comparing vision oriented architectures. The chosen algorithms should be machine, programming model, language and parallelisation technique independent. Algorithms displaying such characteristics are potential candidates for a standard set of general purpose vision subroutines ("image processing space's vectors") which allow for the expression of any aspect of vision processing. Therefore, the required characteristics of a benchmark can be summarised as follows:
- the benchmark has to include representative algorithms from the different semantic levels of the whole computer vision process (and not merely an isolated vision-related task or a very simple image processing scenario);
- the benchmark algorithms have to include different control modes of processing: ascendent (bottom-up or data-driven) and descendent (top-down or interpretation/recognition driven);
- the benchmark has to include data conversion algorithms;
- no redundancy between different algorithms is allowed; for example, when several different algorithms perform the same task (say image filtering), only one algorithm will represent this class;
- algorithms have to be parallel or parallelisable;
- algorithms have to be independent of the sense of their application to data;
- algorithms should have the same implementation invariants, i.e.
  * the same data and control structures;
  * the same programming models (shared memory (PRAM) model, or distributed memory model with message passing and synchronous/asynchronous communications);
  * the same parallelisation techniques (these could be explicit, specified by a programmer, or implicit, made by the compiler);
- each algorithm has to test the machine's capability to support it (algorithm-architecture adequation) at the data representation level, the control level, and the data movement (communication) level;
- the hardware characteristics tested have to be pointed out by the benchmark;
- the principle of each algorithm has to be
simple (this does not imply that the implementation is simple).

Several questions arise immediately:
- how to specify the algorithm of a given step?
- how to judge the quality of the result?
In order to preserve maximum invariability, algorithms have to be expressed using universal algorithmic data and control structures. Judging the quality of the result is a more difficult question, because it depends on the image content; statistical measurements are an efficient means of determining whether a given result is correct over a wide range of images. Using the upper temporal limit for processing obtained with a sequential implementation, it is possible to evaluate the real speed-up of a parallel computer (according to Akl's definition [Akl89]) as:

speed_up = T_seq / T_para

where T_seq is the worst-case running time of the fastest known sequential algorithm for the problem, and T_para is the worst-case running time of the parallel algorithm.
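As a purely numerical illustration of this definition (the timings below are hypothetical, not measured on any machine):

```latex
% hypothetical timings, for illustration only
\[
  \mathit{speed\_up} \;=\; \frac{T_{\mathrm{seq}}}{T_{\mathrm{para}}}
  \;=\; \frac{4.8\ \mathrm{s}}{0.4\ \mathrm{s}} \;=\; 12
  \qquad \text{on } p = 16 \text{ processing elements, i.e. an efficiency of } S/p = 0.75 .
\]
```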
3 Practical rules for the design and implementation of a benchmark for a low level vision task

The proposed characterisation of an algorithm for benchmarking has to preserve different invariants. Four of them are discussed hereafter. They are:
- the algorithm's representativity;
- the definition of its operative and control parts;
- the definition of the parallelisation technique;
- the data movements.
3.1 Algorithm's representativity

Above all, it is necessary to choose an algorithm which represents a whole class of applications: an invariant operator or a generic algorithm which can be instantiated for a given application.

In the case of low level vision tasks, the convolution seems to be the best candidate for the edge detector class. It is used with high-pass filters (such as Prewitt, Sobel, Roberts, Kirsch, ...), and with noise-removing or low-pass filters, such as average, median and Gaussian filters. The region growing algorithm is another class of fundamental image processing algorithms. Different algorithms for region growing have been designed: split-and-merge ([Pav74]), cooperative edge-region growing ([Bon89], [Pav90]), ... Other invariants of low level vision include: the homogeneity predicate (useful for region growing; one possible form is sketched below), thresholding, different transforms (for example affine, Fourier, Hough, Euclidean distance), and histograms (of grey levels, of interestingness, ...).
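As an illustration of one of these invariants, a frequently used form of the homogeneity predicate for region growing, the grey-level dynamic range criterion, is sketched below in C; this particular criterion and the function name are assumptions made for illustration, not the paper's prescription.

```c
/* Homogeneity predicate for region growing: a candidate region is declared
 * homogeneous if its grey-level dynamic range stays below a given bound.
 * (One common criterion among many; shown for illustration only.)          */
int is_homogeneous(const unsigned char *pixels, int count, int max_range)
{
    unsigned char lo = 255, hi = 0;
    for (int i = 0; i < count; i++) {
        if (pixels[i] < lo) lo = pixels[i];
        if (pixels[i] > hi) hi = pixels[i];
    }
    return (hi - lo) <= max_range;
}
```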
3.2 Specification of the operative part
We take an edge detector operator as the example for the specification of the operative part of an algorithm for benchmarking. An edge detector looks for a transition between two regions of significantly different intensities. Consequently, the gradient function of the image, which measures the rate of change, will have a large value at these transitional boundary areas. Thus gradient-based, or first-derivative-based, edge detectors enhance the image by estimating its gradient function, and then signal that an edge is present if the gradient value is greater than some predefined threshold.

In order to preserve the requirements for implementation invariants, all implementation elements have to be precisely defined, including the data involved and the formulae used. In the case of a convolution, the input data structures are an image, a 3 x 3 mask, and the threshold value; the output data are three arrays, two containing discrete approximations of the gradient magnitude components (for the X and Y axes), and the third containing the approximation of the gradient argument (a possible layout is sketched below). The calculations to be performed are:
- combinations of the pixel values selected through a mask (X and Y gradient projections);
- gradient magnitude calculation;
- gradient direction calculation.
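A purely illustrative C layout of these input and output data structures could be the following (all type and field names are assumptions, not part of the paper's definition):

```c
#define MASK_SIZE 3

/* Input of the convolution-based edge detector benchmark. */
typedef struct {
    int            n;                             /* image is n x n pixels   */
    unsigned char *pixels;                        /* grey levels, row major  */
    int            mask_x[MASK_SIZE][MASK_SIZE];  /* X gradient mask         */
    int            mask_y[MASK_SIZE][MASK_SIZE];  /* Y gradient mask         */
    double         threshold;                     /* edge decision threshold */
} conv_input_t;

/* Output: X/Y gradient magnitude components and gradient argument. */
typedef struct {
    double *grad_x;    /* discrete approximation of the X component  */
    double *grad_y;    /* discrete approximation of the Y component  */
    double *grad_arg;  /* approximation of the gradient argument     */
} conv_output_t;
```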
The X and Y gradient projections, i.e. the directional derivatives G(I(m,n)) and F(I(m,n)) respectively at point (m,n), are given through the mask of coefficients. The formula used for the calculation of the gradient magnitude has to be given explicitly, for example:
- Euclidean:
  d(m,n) = sqrt( F(I(m,n))^2 + G(I(m,n))^2 )
- max-approximation:
  d(m,n) = max( |F(I(m,n))|, |G(I(m,n))| )
- sum approximation:
  d(m,n) = |F(I(m,n))| + |G(I(m,n))|

The gradient direction (measured in the positive sense from the X axis) can be calculated in numerous ways, for example:
- by using the arctangent, with 4 or 8 discretisation points (according to the connectedness chosen for two pixels):
  φ(m,n) = arctan( F(I(m,n)) / G(I(m,n)) )
- by applying the sign rule to F(I(m,n)) and G(I(m,n));
- by comparing the absolute values of F(I(m,n)) and G(I(m,n)).

These are examples of possible instantiations of the generic definition of the edge detector. The mask coefficients should also be given (Sobel, Prewitt, ...).
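For illustration, these instantiation choices can be written as small C helpers; the names fx and fy stand for the X and Y gradient projections, and the 8-point quantisation corresponds to 8-connectedness (all names are illustrative, not prescribed by the benchmark):

```c
#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Gradient magnitude: three possible instantiations. */
double mag_euclidean(int fx, int fy) { return sqrt((double)fx * fx + (double)fy * fy); }
double mag_max(int fx, int fy)       { int a = abs(fx), b = abs(fy); return a > b ? a : b; }
double mag_sum(int fx, int fy)       { return abs(fx) + abs(fy); }

/* Gradient direction quantised to 8 points (8-connectedness),
 * measured in the positive sense from the X axis, indexed 0..7. */
int dir_8(int fx, int fy)
{
    double phi = atan2((double)fy, (double)fx);      /* angle in (-pi, pi]        */
    int q = (int)floor(phi / (M_PI / 4.0) + 0.5);    /* nearest multiple of 45 deg */
    return (q + 8) % 8;
}
```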
3.3 Specification of the control part of a parallel algorithm
Once the edge detector has been chosen, the precise sequential and parallel algorithms should be explicitly given. A sequential algorithm can be the following one:

input data: IM: image of N x N pixels; masks /* for the X and Y approximations of the gradient */; threshold;
output data: IM_R: image of edge points;

FOR n = 1 TO N DO
  FOR m = 1 TO N DO
    - calculate the projection F(I(m,n)) of the gradient on the X axis, using the convenient mask;
    - calculate the projection G(I(m,n)) of the gradient on the Y axis, using the convenient mask;
    - calculate the magnitude of the gradient, d(m,n);
    - calculate the direction of the gradient, φ(m,n);
    - IF d(m,n) < threshold THEN IM_R(m,n) = 0 ELSE IM_R(m,n) = 1.
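A minimal C sketch of this sequential algorithm follows. The Sobel masks, 8-bit grey levels, the Euclidean magnitude and the border handling are instantiation choices made here for illustration; they are not imposed by the benchmark definition, and the function name is hypothetical.

```c
#include <math.h>

/* Sequential edge detector: 3 x 3 convolution, Euclidean magnitude,
 * binary decision against a threshold.  Border pixels of im_r are
 * left unmodified.                                                   */
void edge_detect_seq(const unsigned char *im, unsigned char *im_r,
                     int n, double threshold)
{
    /* Sobel masks, used here only as one possible instantiation. */
    static const int mx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
    static const int my[3][3] = { {-1,-2,-1}, { 0, 0, 0}, { 1, 2, 1} };

    for (int i = 1; i < n - 1; i++) {
        for (int j = 1; j < n - 1; j++) {
            int fx = 0, fy = 0;   /* X and Y gradient projections */
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++) {
                    int p = im[(i + di) * n + (j + dj)];
                    fx += mx[di + 1][dj + 1] * p;
                    fy += my[di + 1][dj + 1] * p;
                }
            double d = sqrt((double)fx * fx + (double)fy * fy);
            /* the direction could be stored as well, e.g. atan2(fy, fx) */
            im_r[i * n + j] = (d < threshold) ? 0 : 1;
        }
    }
}
```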
A parallel algorithm has to take into account the different parallelism models (data, control, farming, etc.). All models of parallelism have to be investigated in order to evaluate the hardware characteristics. As far as data parallelism is concerned, a data parallel version of the above sequential algorithm will use the same data structures; the following control structure will replace the two "FOR" loops of the sequential algorithm:

DO ON all pixels IN PARALLEL

The proposed edge detector is very simple, because it does not contain data dependencies (one possible concrete rendering of this construct is sketched below).
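The construct "DO ON all pixels IN PARALLEL" has no single standard notation. As one hedged illustration only, the loops of the previous C sketch can be annotated with an OpenMP directive, used here as a stand-in for whatever data parallel construct the target machine provides; the helper process_pixel is hypothetical (the per-pixel body of edge_detect_seq factored out).

```c
#include <omp.h>

/* Hypothetical per-pixel helper: projection, magnitude and threshold
 * decision for pixel (i, j), i.e. the inner body of edge_detect_seq(). */
void process_pixel(const unsigned char *im, unsigned char *im_r,
                   int n, int i, int j, double threshold);

/* Data parallel rendering of the two FOR loops: every pixel is
 * independent, so iterations can be distributed over all PEs.          */
void edge_detect_par(const unsigned char *im, unsigned char *im_r,
                     int n, double threshold)
{
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++)
            process_pixel(im, im_r, n, i, j, threshold);
}
```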
3.4 Data movements

The algorithm discussed so far is based on a 3 x 3 mask applied to each image pixel. Consequently, local data exchanges with the nearest-neighbour pixels are necessary. If a massively parallel architecture is considered (e.g. an n-ary hypercube of dimension 1, 2 or higher), it is possible to associate one processing element with one pixel. Consequently, the communication strategy has to include dedicated I/O operations with the north, north-east, east, south-east, south, south-west, west and north-west neighbours. This will test the capability of interconnection networks with different degrees of connectivity to support this local communication (a possible message-passing rendering is sketched below).
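When there are fewer processing elements than pixels, the same neighbourhood requirement becomes a ghost-border (halo) exchange between image blocks. The fragment below is a hedged sketch, assuming a row-wise block distribution and MPI message passing (neither of which the paper prescribes); it exchanges only the north and south ghost rows, and a complete version would also exchange east/west columns and the four corners.

```c
#include <mpi.h>

/* Exchange one ghost row with the north and south neighbours of a
 * row-wise block distribution: 'rows' local rows of 'n' pixels are
 * stored in block[1..rows]; block[0] and block[rows+1] are halos.   */
void exchange_halo_rows(unsigned char *block, int rows, int n, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int north = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int south = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
    MPI_Status st;

    /* send first real row north, receive the south halo from the south PE */
    MPI_Sendrecv(&block[1 * n],          n, MPI_UNSIGNED_CHAR, north, 0,
                 &block[(rows + 1) * n], n, MPI_UNSIGNED_CHAR, south, 0,
                 comm, &st);
    /* send last real row south, receive the north halo from the north PE */
    MPI_Sendrecv(&block[rows * n],       n, MPI_UNSIGNED_CHAR, south, 1,
                 &block[0],              n, MPI_UNSIGNED_CHAR, north, 1,
                 comm, &st);
}
```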
4 Conclusion
The paper has addressed the problem of benchmarks for vision computers. Starting from an analysis of the existing benchmarks, an informal definition and a characterisation of the benchmark concept have been proposed. A benchmark, in order to be general, has to include (only) all the conceptual
and implementation invariants leading to independence from the machine, the environment, the programming models and the parallelisation techniques. In the context of vision, the algorithms of a benchmark have to be "vectors generating the computer vision algorithm space". Consequently, several elements have to be taken into account when defining such algorithms (processings). This paper has characterised the algorithms for a benchmark more precisely, and a detailed example of the whole practical procedure has been given as well. The edge detection algorithm presented uses the convolution and the Cartesian projections as its basic invariants; the use of such invariants has led to an easy to understand and precise form of the proposed algorithm. However, a benchmark should not only evaluate temporal performance, but also has to be a guide for the design of new computer structures. This is one of our research directions.
References

[1] [Akl89] Akl, S., The design and analysis of parallel algorithms, Prentice Hall, 1989.
[2] [Bon89] Bonnin, P., Zavidovique, B., Principles and applications of a new cooperative segmentation methodology, SPIE Visual Comm. and Image Proc. IV, Philadelphia, Nov. 1989, pp. 677-688.
[3] [Deu89] Deuch et al., Performance of Warp on the DARPA image understanding architecture benchmarks, in P. M. Dew, R. A. Earnshaw, T. R. Heywood (eds.), Parallel processing for computer vision and display, Addison-Wesley, 1989.
[4] [Gra81] Grappel, R., Hemenway, J., EDN benchmark, Electronic Design News, April 1981.
[5] [Hor74] Horowitz, S. L., Pavlidis, T., Picture segmentation by a directed split-and-merge procedure, Proc. of the 2nd Int. Joint Conf. on Pattern Recognition, 1974, pp. 424-433.
[6] [Lin88] Lindskog, B., PICA3, Linköping Studies in Science and Technology, PhD Diss. No. 176, Sweden, 1988.
[7] [Moo82] Moore, M., Crawford, J., Pascal benchmarks, iAPX86 benchmark report, Intel Corp., Feb. 1982.
[8] [Nag81] Nagle, H. T., Nelson, V. P., Digital filter benchmarks, IEEE Micro, Feb. 1981, pp. 23-41.
[9] [Pav90] Pavlidis, T., Liow, Y. T., Integrating region growing and edge detection, IEEE Trans. on PAMI, 12(3), March 1990, pp. 225-233.
[10] [Pea91] Pease, D. et al., PAWS: a performance evaluation tool for parallel computing systems, IEEE Computer, January 1991, pp. 18-19.
[11] [Pre89] Preston, K., Jr., The Abingdon Cross benchmark survey, IEEE Computer, July 1989, pp. 9-18.
[12] [Web92] Webb, J. A., Steps toward architecture-independent image processing, IEEE Computer, Feb. 1992, pp. 21-31.
[13] [Wee91] Weems, Ch., Riseman, E., Hanson, E., Rosenfeld, A., The DARPA image understanding benchmark for parallel computers, J. of Parallel and Distributed Computing, 11, 1991, pp. 1-24.
[14] [Wei90] Weicker, R. P., An overview of common benchmarks, IEEE Computer, Dec. 1990, pp. 65-75.