Pattern Recognition Letters ELSEVIER
Pattern Recognition Letters 17 (1996) 671-677 IAPR pages
Plus ça va, moins ça va
P. Zamperoni * Institut für Nachrichtentechnik, Technische Universität Braunschweig, Schleinitzstraße 23, D-38092 Braunschweig, Germany
Accepted 19 March 1996
Abstract This contribution tries to analyze some reasons why the impact of digital image processing upon its potential fields of application still remains below the expectations. Most of the considerations are of a subjective nature, but the paper's underlying concern about the practical usefulness of image processing techniques is presumably broadly shared within the research community. Thus this paper could represent an initial step towards a broader discussion. Keywords: Image processing
1. The art of image processing
Applied digital image processing may be described as the art of deploying the basic procedures of a toolbox, combining them so as to solve specific problems in engineering, science, medicine, quality control, and other fields of application that demand picture interpretation. Let us assume that the basic tool set is well chosen, i.e. that it represents a good compromise between containing too many special operators and being too small a set of generic operators. We shall come back to this point later. It is an art because it requires two kinds of skills:
• Systematic knowledge of well-established techniques with very predictable performance, such as linear filtering, the Fourier transform, dynamic programming, clustering, etc. This kind of knowledge can be acquired from textbooks or through courses in the traditional way.
• Heuristic talent, experience, and a good sense for the kind of procedure and for the sequence of procedures that are promising for coping with a given problem (sometimes a cascade of well-performing operators yields a badly performing procedure).
The latter kind of skill is very difficult to transmit systematically. The best way to learn it is for the apprentice to watch the master at work for some years, and then to let nature take its course. But this impractical way is of little use for systematic teaching purposes. Asked how to carve a sculpture, Gianlorenzo Bernini would have answered: "Just chisel off the superfluous marble", which of course is no concrete instruction for action. Yet academic teaching and research, which should be one of teaching's necessary preludes, should aim at creating both kinds of skills, no matter whether the pupils are gifted or not. This is surely not the only aim of scientific work, but we somehow expect that one of its spin-offs should be a concern for establishing which part of the knowledge is fundamental, and which are the basic methods to which most of the implemented procedures can be roughly reduced. This looks like a useful task: it would allow the set of essential notions to be reduced to a minimum; and who other than the image processing research community can be expected to carry out this task?
0167-8655/96/$12.00 © 1996 Elsevier Science B.V. All rights reserved. PII S0167-8655(96)00044-X
2. Grow and proliferate
However, doubts that this community as a whole is working towards this aim are justified, even after a superficial look at the current digital image processing literature, or after having attended one of the international conferences with which the event calendar is crowded. And this in spite of the fact that at these conferences one can often enjoy the presentation of quite a number of ingenious and sometimes brilliant methods. Recently I tried to convince somebody who is strongly engaged in digital television and multimedia research of the potential usefulness of digital image processing methods within the scope of his research field as well. His flattering and yet frustrating answer was: "You people seem to be anticipating all the research approaches that should last for the next 300 years. I cannot tell the really useful results from the merely academic ones". One of the reasons for this proliferation of methods is perhaps that implementing what at first glance may look like a good idea, and testing it with real-world images, is much easier than, say, in mechanical engineering: one does not need to build an experimental hardware setup. Is this a real advantage or rather a potential pitfall? If we abuse it, does it lure us into myriads of pointless experiments?
3. Waiting for the Summa
The proliferation of success announcements through technical papers has grown to such an extent that no expert can have a really comprehensive overview of the whole digital image processing field (I would welcome sound objections to this statement). An expert with such an overview would be able to recognize that the "new and better" approach A is an allotropic (and maybe algorithmically more complex) form of methods B, C, ... published ten or twenty years ago. Somebody once stated that intelligence is the capability of recognizing hidden relations and similarities between things. A good dose of such intelligence, preferably equally distributed over numerous research teams, could result in the compilation of a Summa Elaborationis Imaginum, a volume that would be very welcome for teaching purposes at all academic levels. Although several excellent comprehensive basic textbooks on image processing already exist (e.g. Gonzalez and Woods, 1992; Haralick and Shapiro, 1992/1993; Jain, 1989; Sonka et al., 1993), the much-awaited Summa, which should be something more than the union of several textbooks, has not yet been produced. An advantage of having such a Summa close at hand on the bookshelf would be that, before writing a further paper on "a new and better method for ...", one could check whether the trouble is worthwhile, or whether it can be expected that the reviewer, after consulting the same volume, will classify the paper as déjà vu. The actual situation, in the absence of anything similar to such a book, is that even experienced reviewers are seldom in a position to counteract the observable metempsychosis of ideas and approaches. It follows that the state of the system is further degraded by a positive feedback in the proliferation loop.
4. Alienation: nobody understands me
But there are more serious consequences of this proliferation process than the acceptance of one or two papers that are not really new. Mankind, whose welfare we have committed ourselves to improving by spreading over it the benefits of applied image processing, no longer understands us. In this case mankind is represented by those who are responsible for innovation in such fields as industrial automation, workpiece recognition, quality control, or nondestructive testing, all fields that can profit greatly from computer vision techniques. It is true that sometimes the attitude of the responsible persons at this very sensitive interface could be described by the well-known sentence ascribed to a National Socialist notable, "When I hear the word "culture" I unlock my gun", substituting the word "research" for the word "culture"; however, we must earnestly ask ourselves whether we are partially responsible for this alienation.
We are, indeed. Because our attitude is the cause of that special sort of schizophrenia that seems to affect several managers who consider introducing image processing techniques to improve efficiency in industrial automation processes. On the one hand they expect off-the-shelf performance that would imply the fulfillment of all the most daring computer vision dreams; on the other hand they are astonished if some tasks that they perceive as very difficult can be handled by standard image processing methods. The latter circumstance does not make us very sad, but did we earnestly try to discourage unrealistic expectations in the last twenty years? Our offer to those interested outsiders who have been watching the image processing scene, looking out for application spin-offs, has been a firework display of "new and better procedures for ...", an overkill of "intelligent", "dynamical", "generalized" and "knowledge-based" methods; a puzzling spectacle for the potential supporters of our research activities. No wonder that trust in the usefulness of image processing has been lost, together with the faculty of discerning a basic tool from a brilliant variation on the theme.
A measure of the degree of this alienation is the insignificant impact of most of the top hits in the scientific journals' best-seller list upon commercial all-purpose image processing software toolboxes. How many investigations and papers have been dedicated in recent years to a permanent refinement of such sophisticated techniques as simulated annealing, active contours, Markov random fields, relaxation labelling, or scale space methods, and how much of this do we find in software toolboxes? After years of foreplay and improvements, one should expect a début in real life. It is pointless to look for the culprit, but somehow the scientific community as a whole has not been convincing enough about the usefulness of its output. A further detrimental effect is that not only have very sophisticated and computationally complex methods left almost no trace in software toolboxes, but quite a number of relatively simple, artful and efficient operators, such as nonlinear and rank-order filters or adaptive estimators, have also been almost ignored; are they the innocent victims of a diffuse mistrust towards the performance of our research community?
5. What can we do? 1
Sensitivity about the inexplicably delayed triumphal entry of digital image processing methods, an event whose imminence we proclaim in our papers, is not new, and signs of discomfort related to this delay could already be observed several years ago (Nagy, 1983; Pavlidis, 1966, 1992). The title of Pavlidis' (1992) paper takes for granted (who could ever object?) that progress in machine vision is slow, and it just asks why this is so. Unfortunately, the implications of attempts at sublimating this discomfort in Voltaire's spirit (Nagy, 1983), issued occasionally in particularly witty appeals to the scientific community, have not been investigated more closely. A cluster of reactions (e.g. Aloimonos and Rosenfeld, 1991; Bowyer and Jones, 1991; Huang, 1991; Kunt, 1991; Snyder, 1991) was aroused some years ago by a paper of Jain and Binford (1991). With the exception of Kunt's (1991) paper, those reactions left me somewhat perplexed. My impression was that the respondents tried to show in which respects image processing is not scientifically rigorous enough, and how it can become more and more a well-founded discipline, for instance by establishing sound and objective benchmarks for assessing the effectiveness of procedures. This is surely a good point, but it does not explain why image processing has failed to become a widely useful discipline for a multitude of practical applications. And this in spite of the fact that, since 1991, several scientifically very good innovative papers have been published and some efforts towards setting objective operator quality standards have also been made (Haralick, 1994). Have we possibly been too scientific, and did we care too little for the practical impact of the performance that we (scientifically) optimized? I am not aware of similar debates having been published in image processing journals thereafter.
Thus, the well-known aphorism seems to apply again: "Yesterday we stood on the precipice's edge, but today we have made a big step forward". Therefore, once again: what can be done? The following reflections are intended only as a small subjective contribution to an answer to this question, or maybe even as crystallization points for a further debate.
1 © Lenin
5.1. Stop publishing (so much)
Causes and effects of the epidemic paper proliferation raging now, years after 1991, with increased violence, could not be described better than in Kunt's (1991) paper. It is therefore in keeping with its spirit to reduce the intersection set between papers and to redirect the reader to Kunt's publication. What follows is just an attempt to outline some additional points of view on the subject addressed there. There are some inveterate paper writing customs that should be critically reviewed. These customs generate a complex of rituals, enforced by editorial boards, transmitted by reviewers, and scrupulously followed, especially by younger authors, in order to have their papers accepted. These rituals impose an "Introduction", in which creation's history is reviewed under a distorted Saul Steinberg perspective (Rosenberg, 1978) and in which fundamental previous work is fitted, if necessary with violence, into the role of a forerunner. This should create the impression that creation's aim was to prepare the advent of the "present work", thereby leaving a niche free for it, the last jewel, no matter how tiny the increase of knowledge it conveys. Therefore, a first step forward would be to write papers that, with the support of a generous list of references, are focused on the truly original core of the message, assuming that most readers know the background of the story by heart. As for the section "Conclusions", in the worst case it is a collection of "zero-meaning-words" (Pasinetti, 1986), and in the best case it could be highly compressed without a serious loss of information. A reader who has read the paper is also able to draw his own conclusions, which he will always prefer to borrowed ones. What about an agreement between editorial boards, reviewers and authors on accomplishing the humanitarian act (towards the reader) of abolishing those merciless rituals of Introduction and Conclusions, often written with a steamroller technique?
Having watched the image processing literature for over 25 years, I have the feeling that a distorted perspective possibly underlies several publications. It is the historical perspective under which the author considers his work as an episode in the progress of the investigations on a certain subject. Unwritten rules impose a view, which is reflected by writing paradigms like: "Big progress has been made recently in problem X (quotation [A], quotation [B], etc.); however, approach [A] does not ..., approach [B] does not ..., etc. In this paper a new and better method for solving problem X will be presented, that has the advantages of ...". In other words, the tendency is to fit one's own work close to the end (not at the end: "... the obtained results are very promising; further work will be devoted to improving ...") of a linear evolution of increasing progress, as in the classical allegory of the dwarf, who is smaller than the giants, but who can see farther than they can by sitting on their shoulders (Merton, 1965). Now, something must be wrong if this image of one's own place in the universal history of image processing remains, as everybody can observe, stubbornly unchanged through the decades. This concept of time and progress is not at all obvious, but rather a time-bound cultural heritage, bearing a Marxist scent. In consideration of the logarithmic growth of the real progress (which should also include usefulness in practical life) in digital image processing, I feel that the circular concept of time and progress of ancient times would represent a more appropriate ideological background. In this perspective the reality, in this case the physical phenomenon of an image and of the data one can extract from it, is so complex that in a paper dedicated to a specific approach we can seize and describe only one of its multifarious and equally important main facets. The single approaches may all be correct, but each of them can reproduce at most a partial view. The combination of all of them could possibly give a good account of the reality, if only we were able to evoke them all at once in our mind.
So, if we are dwarfs on the shoulders of giants, then we are rather all sitting side by side on their broad shoulders and looking from the same height in different directions; stacks of dwarfs looking farther beyond the horizon seem to be a rare occurrence. A recent study (Cocquerez et al., 1995) reports on an extensive comparative assessment of image segmentation methods following a variety of approaches. Its remarkably honest and courageous conclusion is that there is no intrinsic superiority of one method. And yet, how many papers have been published recently proposing a new and better segmentation method, after having chosen, on the basis of conveniently established criteria, a couple of forerunner works, which may even be really inferior? Concluding proposal: start a debate among editorial boards on setting new and widely accepted paper reviewing criteria. Raising the scientific standards is not the point, because the standards are already high enough.
5.2. Stop meeting (so much)
Most of what has been written in the foregoing section also applies to the proliferation of conferences, symposia, meetings and seminars. Is it materially possible for our community to produce so much valuable work (again: not only from the scientific point of view, but also with respect to practical usefulness) as to satiate the voracious throats of this thousand-headed monster? How many miles must we go to learn something really useful at a conference? This does not mean: to admire brilliant performances. Learning something useful means for me that, after attending a presentation, I feel like implementing the presented method myself and I know that, if I had time to go deeper into the subject, I would succeed in doing so, even without a special initiation into an esoteric science. My motivation would be to enrich my procedure toolbox with a new and useful item. A couple of big international conferences worldwide, e.g. one in the new and one in the old world, and a series of national or regional conferences, which should include English among their official languages, as for instance the French GRETSI Symposium, should be sufficient for keeping track of the really important innovations. "Important" means here again: not important as a step in the progress of an individual trial-and-error epic, but rather worth disseminating because presumably useful to a broad audience.
5.3. Reform the personal curriculum assessment criteria
The criteria adopted in hiring scientific personnel, especially by organizations in charge of funding scientific projects, are often questionable. These criteria are mostly of an indirect nature: they do not evaluate the scientific work itself, but its formal impact on the publishing and conference attending scene. They consider the number and weight (in kilograms) of papers and they evaluate the journals in which the papers have been published. However, the value of a paper does not depend on the name of the journal that published it or on the importance of the conference. A meaningless work can be included in a conference, for instance, because too few papers on that subject were submitted to fill a session. On the other hand, some of the methods now ranked among the fundamental image processing tools were first published in a concise or unpolished form, in internal reports or in journals of modest circulation, as for example: the contour code, the Canny edge quality criteria, the Deriche operator. Another postulate worth discussing is the one claiming that scientific quality is equivalent to innovative content. This is not always true. It can also be a valuable contribution to show that methods A and B are in reality the same thing, i.e. that B is nothing new. Or to show why something cannot work, or why a problem is ill-posed. These are useful achievements, from which the whole community can profit. At any rate, the curriculum assessment criteria should include a competent and detailed expert assessment of the scientific content, no matter how many papers have been published, and where. As long as the rules of the game are what they are, it is no wonder that they do not exert an influence upon the real usefulness of our scientific output.
5.4. Things to encourage
What can be done to render digital image processing, and with it the outcome of our daily work, more useful in practice? There would be plenty of things to suggest that other people could do, but let us concentrate on what we can do, besides exerting a more or less institutionalized self-censorship so as to avoid what has been addressed in the foregoing sections. The concern about the practical usefulness of image processing should be the dominant thought in the definition of new projects, Ph.D. theses and workplans for cooperation between research teams. Here are a few strongly personally biased hints, which are only intended as a primer for further suggestions.
5.4.1. Algorithmic efficiency
One of my impressions resulting from several years of contacts with eastern European colleagues (before the liberalization of the exchanges with Western Europe) was that they had developed a remarkable algorithmic skill, which was perhaps necessary for partially compensating the deficiencies of their computing facilities. This can be a hint towards useful future work in a field where there is still much to do. The usefulness of some methods, whose effects and benefits are well known, could possibly be enhanced by developing new algorithmic implementations with shorter execution times, thereby opening the way to implementations on special processors. Let us consider for instance approximate operator implementations, which are much faster than the exact procedure (e.g. by separation, instead of using a two-dimensional window); how big is the error with respect to the exact solution? When is it so negligible that the approximate and fast solution is good enough for most practical purposes? A further example: it could be worth investigating more closely the possibilities of implementing iconic operators more efficiently on coded images by means of code manipulations; a final decoding of the transformed images is needed only for image output purposes. Approaches of this type are known for binary images represented by means of the contour code, of quadtrees, or of the run-length code. The latter kind of representation is a good example of an astonishing performance improvement, obtained by making use of a code that at first glance seems very inadequate for giving an account of the shape features of a binary object (Young et al., 1981).
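The question of how large the error of a separated operator is can be made concrete with a small experiment. The following sketch is not taken from any of the cited works; it assumes, purely for illustration, that the operator is a 3x3 median filter and that the fast approximation is a 1x3 median along the rows followed by a 3x1 median along the columns. The function names median3x3, separable_median3 and interior_error are hypothetical, and only interior pixels are compared so that border handling does not influence the result.

```python
import random


def median3x3(img):
    """Exact 3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sorted(window)[4]  # middle of 9 values
    return out


def separable_median3(img):
    """Approximation: 1x3 median along rows, then 3x1 median along columns."""
    h, w = len(img), len(img[0])
    rows = [row[:] for row in img]
    for y in range(h):
        for x in range(1, w - 1):
            rows[y][x] = sorted(img[y][x - 1:x + 2])[1]
    out = [row[:] for row in rows]
    for y in range(1, h - 1):
        for x in range(w):
            column = (rows[y - 1][x], rows[y][x], rows[y + 1][x])
            out[y][x] = sorted(column)[1]
    return out


def interior_error(a, b):
    """Return (fraction of differing pixels, mean absolute difference),
    both computed over interior pixels only. Images must be at least 3x3."""
    h, w = len(a), len(a[0])
    diffs = [abs(a[y][x] - b[y][x])
             for y in range(1, h - 1) for x in range(1, w - 1)]
    changed = sum(1 for d in diffs if d) / len(diffs)
    return changed, sum(diffs) / len(diffs)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that the experiment is repeatable
    noisy = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]
    frac, mae = interior_error(median3x3(noisy), separable_median3(noisy))
    print(f"differing pixels: {frac:.1%}, mean absolute error: {mae:.2f}")
```

On a constant image or on an ideal vertical step edge both versions give identical results, so the error appears only where the grey values within the window are irregular. Uniform noise, as in the demonstration above, is therefore close to a worst case, and repeating the measurement on images typical of the application at hand would show whether the faster version is good enough in practice.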
5.4.2. VLSI and special processor implementations
There is a remarkable gap between the abundance of theoretical work on image processing operators accomplished recently, and the number of commercially available VLSI circuits in which these operators are implemented. It is quite plausible that the decision to realize a VLSI chip is prompted by an estimate of the prospective usefulness of its function, i.e. of the corresponding image processing operator, for a wide variety of applications. This estimate is made by the VLSI producer on the basis of user demand. Here the scientific community's concern about the usefulness of the techniques it investigates should be clearly recognizable outside the community's own preserve. If we do not care about popularizing the benefits, for instance, of having flexible and efficient VLSI nonlinear filter circuits for image enhancement and preprocessing at our disposal, there will be a more or less casual spin-off of VLSI circuits that are also usable for image processing applications but are far from being matched to the requirements of the tasks in question. Without our feedback, VLSI producers, with some exceptions (LSI-Logic, 1993), will go on doing the best they can from the point of view of their technology, and it will again be a matter of luck whether and how much of it can be used for image processing purposes.
5.4.3. More debates on the Deeper Sense of Image Processing
Some years ago I participated in the activity of the GRVO (Groupe de réflexion sur la vision par ordinateur), an informal e-mail discussion circle on computer vision initiated by Prof. J.M. Jolion (Lyon). The GRVO was a star-shaped network; the messages from the periphery, consisting of spontaneous reflections, questions, answers, even jokes (pleasant wrappings of serious considerations 2), were retransmitted from the central node to all the participants. Like many good initiatives, the GRVO, after having flourished for a while, decayed and slowly ceased to give signs of life. I think that its decay began when the central concern of most participants shifted from the quest for the sense of image processing methodologies at large to "how to build a better mouse trap" and to showing how good the trap was that they were just building. I hope that these reflections, which are intended at most as a set of empirical statements without certainty, can stimulate the sensitivity of the image processing community and maybe arouse a debate, which does not need to be institutionalized into a rigid frame in order to be lively and fertile. Although most of the opinions expressed in this paper are of a subjective nature and may not be shared by some or by many readers, the underlying concern is earnest and, as I hope, it can constitute a common motivation for pursuing some kind of dialogue.
2 © Rabelais
Acknowledgements
The author wishes to thank E.S. Gelsema for his suggestions and improvements of the English form of this contribution.
References
Aloimonos, Y. and A. Rosenfeld (1991). A response to "Ignorance, Myopia, and Naiveté in Computer Vision Systems". CVGIP: Image Understanding 53, 120-124.
Bowyer, K.W. and J.P. Jones (1991). Revolutions and experimental computer vision. CVGIP: Image Understanding 53, 127-128.
Cocquerez, J.P., S. Philipp and R. Zeboudj (1995). Comparaison de méthodes de segmentation d'images. In: Proc. 15th GRETSI Symposium, Juan-les-Pins, September 1995, 1355-1360.
Gonzalez, R.C. and R.E. Woods (1992). Digital Image Processing. Addison-Wesley, Reading, MA.
Haralick, R.M. (1994). Performance characterization in computer vision. CVGIP: Image Understanding 60, 245-249.
Haralick, R.M. and L.G. Shapiro (1992/1993). Computer and Robot Vision, Vols. I and II. Addison-Wesley, Reading, MA.
Huang, T.S. (1991). Computer vision needs more experiments and applications. CVGIP: Image Understanding 53, 125-126.
Jain, A.K. (1989). Fundamentals of Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ.
Jain, R.C. and T.O. Binford (1991). Ignorance, Myopia and Naiveté in Computer Vision Systems. CVGIP: Image Understanding 53, 112-117.
Kunt, M. (1991). Comments on "Dialogue", a series of articles generated by the paper entitled "Ignorance, Myopia and Naiveté in Computer Vision". CVGIP: Image Understanding 54, 428-429.
LSI-Logic (1993). Digital Signal Processing Databook.
Merton, R.K. (1965). On the Shoulders of Giants. A Shandean Postscript. The Free Press, New York.
Nagy, G. (1983). Candide's practical principle of experimental pattern recognition. IEEE Trans. Pattern Anal. Mach. Intell. 5, 199-200.
Pasinetti, P.M. (1986). Il ponte dell'Accademia. Rizzoli, Milano.
Pavlidis, T. (1966). A critical survey of image analysis methods. In: Proc. 8th Internat. Conf. on Pattern Recognition, Paris, 502-511.
Pavlidis, T. (1992). Why progress in machine vision is so slow. Pattern Recognition Lett. 13, 221-225.
Rosenberg, H. (1978). Saul Steinberg. A. Knopf, New York.
Snyder, M.A. (1991). A commentary on the paper by Jain and Binford. CVGIP: Image Understanding 53, 118-119.
Sonka, M., V. Hlavac and R. Boyle (1993). Image Processing, Analysis and Machine Vision. Chapman & Hall, London.
Young, I.T., R.L. Peverini, P.W. Verbeek and P.J. van Otterloo (1981). A new implementation for the binary and Minkowski operators. Computer Graphics and Image Processing 17, 189-210.