Pattern Recognition Letters 8 (1988) 131-139, North-Holland
September 1988
ARCHI: An expert system for biological objects recognition
Olivier LEBEL
Société BIOCOM, BP 53, 91942 Les Ulis, France
Received 30 June 1988

Abstract: From the biological sample to the final decision, a computer vision system splits its processing into many levels: image acquisition, preprocessing, segmentation, postprocessing, measurements and decision. Most often, the human operator acts on one or more of these levels, either to drive or confirm the processing or to perform it himself. The ARCHI system acts in two ways. First, it drives the acquisition, preprocessing and segmentation steps (driving part). Second, it filters the contours produced by segmentation according to morphological or color parameters learned during previous steps (recognition part). At every level, decisions are taken according to (i) the biological knowledge of the field, (ii) the objects being looked for, and (iii) the associated artefacts. This may imply a modification of previous steps (control and feedback). Moreover, the knowledge of the objects can adapt itself to the sample.
Key words: Expert system, image, picture processing, biology, data analysis.
1. Introduction
The use of expert systems in image processing has been widely studied. The first approach was to design new treatments in order to improve, for example, the quality of segmentation, using knowledge of elementary perceptual primitives (Nazif and Levine, 1984). Alternatively, one can use semantic knowledge to match a pattern of the studied world with primitives extracted by low-level processors (Niemann et al., 1985). These works and many others (e.g., Chassery and Garbay, 1986; Stansfield, 1986) have shown the value of integrating different levels into a global system. The approach presented here emphasizes the use of available treatments, leaving the processing itself to existing, well-known algorithms. The knowledge used represents that of an image processing specialist: for each available treatment, he knows the conditions of use and the order of application. The ARCHI system (Analysis and Representation of the Heuristical Comportment towards Image) has been developed to ease the use of imaging stations by non-specialists. It reproduces the behaviour of an image analysis specialist when a biologist asks him for measurements on microscopic images. The biologist points out some objects of interest, and the specialist tries to find the best sequence of processing steps to obtain the segmented image. Then, by discriminating the objects of interest from the artefacts, the system is able to give the right measurements. ARCHI is built around the notion of objects of interest. This notion guides the processing and is acquired through learning.
0167-8655/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland)
Volume 8, Number 2
PATTERN RECOGNITION LETTERS

2. Field of application
Imaging for laboratories covers various applications such as cytology, radiology and food industry control. Their common characteristic is the use of images for measurement or control purposes. These images can be processed by image analysis systems, which are now widespread. However, such systems remain difficult for non-specialists to use. The purpose of ARCHI is to transfer the specialist's expertise to the system, so that biologists can use an imaging system without any knowledge of algorithms. The user simply focuses on his application, pointing out the objects of interest to the system and specifying the aims of the experiment.
Figure 2. Low-level overview (diagram labels: measurement, selection).
3. System overview
The system is divided into two parts, which can interact through a feedback loop (Figure 1).

3.1. Low-level
The low-level part processes the data from the raw image up to contour extraction. It chooses which preprocessing treatments to apply and in what order, and it chooses the segmentation algorithm. For each chosen treatment, it sets the parameters according to the knowledge of the application (subsection 4.3). Each time the image is modified, it takes measurements on it so that the next treatments can be selected (Figure 2).

3.2. High-level
The purpose of the high-level part is to classify the contours generated by the low-level part. It extracts the objects of interest according to rules generated in a learning step using Fisher linear discriminant analysis (Fisher, 1936; Lebart and Morineau, 1985). Because the contours will subsequently be classified, the low-level part sets the treatment parameters so as to generate too many contours rather than too few, in order to avoid losing any object. If the objects of interest are not already known, a learning step is necessary, during which the biologist classifies the objects manually. Once the objects are learned, the system classifies new contours. It checks the plausibility of the generated contours and may act on preceding steps, for example by modifying the parameter settings of the low-level treatments.

3.3. Hardware architecture (Figure 3)
ARCHI is developed around imaging microstations from the BIOCOM company (AT-compatible computers with specialized imaging boards). The microstations are linked to optical microscopes and are used for image acquisition and visualization. The expert system is developed on a VAX 780, linked to the microstations via a DECnet network.

Figure 1. ARCHI: System overview (diagram blocks: preprocessing and segmentation in the low-level part, classification in the high-level part).

Figure 3. Hardware architecture.
3.4. Software architecture (Figure 4)
ARCHI is written in OPS5 (Digital Equipment Corporation), using a forward-chaining inference engine in predicate logic. For processing, it uses Fortran libraries: IMAGENIA (from the BIOCOM company) for image processing and measurements, and other libraries for I/O and data analysis.
4. Knowledge used (Figure 5)
All the knowledge is encoded in OPS5, either directly in rules (Table 3) or in working-memory elements (Table 1).

4.1. Knowledge of the experiment
Besides knowledge of the images and of the treatments to apply, we use information about the experiment itself.
Figure 5. ARCHI: Knowledge used (diagram labels: image acquisition, raw image, measurements, aim of experiment, processing knowledge, average measurement of parameters, combination of parameters (limits), low-level, high-level).

4.1.1. Object of interest
The main source of information concerns the objects of interest. Their knowledge is represented in the vector space of measured parameters (surface, form factor, Feret diameters, mean grey level and standard deviation of grey level). This knowledge is acquired during the learning step, using Fisher linear discriminant analysis (Fisher, 1936; Lebart and Morineau, 1985), which gives a linear combination of the initial parameters. This discriminant vector defines a separating hyperplane. The hyperplane can be modified a posteriori through a combination of significant parameters (CSP). A CSP is represented by a set of authorized intervals on parameters and a set of relations between parameters. The structure of a CSP is given in Table 1.
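As an illustration only, the learning step can be sketched for two classes (objects of interest versus artefacts) with NumPy instead of the SPAD package cited above. Fisher's discriminant direction is w = S_W^{-1}(m_1 - m_0), where m_1 and m_0 are the class mean vectors of the measured parameters and S_W is the pooled within-class scatter matrix. The hyperplane w·x = c then separates the classes. Placing c halfway between the projected class means is our assumption, and so are the function names.

```python
import numpy as np


def fisher_discriminant(x_interest, x_artefact):
    """Two-class Fisher linear discriminant.

    x_interest, x_artefact: arrays of shape (n_samples, n_parameters), one row per
    learned contour (e.g. surface, form factor, Feret diameters, grey-level stats).
    Returns (w, c): a contour x is assigned to the objects of interest when x @ w > c.
    """
    m1 = x_interest.mean(axis=0)
    m0 = x_artefact.mean(axis=0)
    # Pooled within-class scatter S_W (covariance times (n - 1) for each class).
    s_w = (np.cov(x_interest, rowvar=False) * (len(x_interest) - 1)
           + np.cov(x_artefact, rowvar=False) * (len(x_artefact) - 1))
    # w = S_W^{-1} (m1 - m0), computed with a linear solve rather than an inverse.
    w = np.linalg.solve(s_w, m1 - m0)
    # Assumed threshold: midpoint of the two projected class means.
    c = 0.5 * (m1 @ w + m0 @ w)
    return w, c


def is_object_of_interest(x, w, c):
    """Side of the separating hyperplane on which the parameter vector x falls."""
    return bool(x @ w > c)
```

The separating hyperplane obtained this way is what a CSP can later refine a posteriori.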
Table 1. Structure of a CSP
(literalize CSP
   number        ; unique number to reference a CSP
   name          ; name of the associated object of interest
   status)       ; in {Necessary, Redhibitory}
(literalize RELATION
   number        ; number to reference the relation
   numcsp        ; number of the related CSP
   type          ; in {Equal, Lesser Than, Greater Than}
   param1        ; first argument of the relation
   param2)       ; second argument of the relation
(literalize PARAMCSP
   name          ; name of the parameter (surface, perimeter, ...)
   numcsp        ; number of the related CSP
   minimum       ; lowest limit of the interval
   maximum)      ; highest limit of the interval

Figure 4. Software architecture (diagram label: FORTRAN).

4.1.2. Reference images
To take into account biological variations and some optical deformations, we represent the mean value of each parameter as measured in the learning images. This allows normalization when small variations occur, and it also allows the limits defined in the CSP to be applied.
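The following sketch shows one possible way to evaluate CSPs with the structure of Table 1 on a contour's measurements. It is written in Python rather than OPS5. The semantics are assumptions, not taken from the paper:

- A CSP holds when all of its intervals and relations are satisfied.
- A contour is accepted when every Necessary CSP holds and no Redhibitory (i.e., disqualifying) CSP holds.
- Measurements are first normalized by dividing by the reference-image means described above.

The class names, the `normalize` helper and the tolerance used for Equal are hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical Python counterparts of the OPS5 classes in Table 1.
@dataclass
class ParamCSP:
    name: str        # parameter name, e.g. "surface"
    minimum: float   # lowest limit of the authorized interval
    maximum: float   # highest limit of the authorized interval


@dataclass
class Relation:
    kind: str        # the "type" attribute: "Equal", "LesserThan" or "GreaterThan"
    param1: str      # first argument of the relation
    param2: str      # second argument of the relation


@dataclass
class CSP:
    name: str        # associated object of interest
    status: str      # "Necessary" or "Redhibitory"
    intervals: list = field(default_factory=list)
    relations: list = field(default_factory=list)


def normalize(measures, reference_means):
    """Assumed normalization: divide each parameter by its mean on the learning images."""
    return {name: value / reference_means[name] for name, value in measures.items()}


def csp_holds(csp, m, tol=1e-6):
    """True when every interval and every relation of the CSP is satisfied by m."""
    for p in csp.intervals:
        if not p.minimum <= m[p.name] <= p.maximum:
            return False
    for r in csp.relations:
        a, b = m[r.param1], m[r.param2]
        if r.kind == "Equal":
            ok = abs(a - b) <= tol
        elif r.kind == "LesserThan":
            ok = a < b
        elif r.kind == "GreaterThan":
            ok = a > b
        else:
            raise ValueError(f"unknown relation type: {r.kind}")
        if not ok:
            return False
    return True


def accept(csps, m):
    """Assumed rule: all Necessary CSPs hold and no Redhibitory CSP holds."""
    for csp in csps:
        holds = csp_holds(csp, m)
        if csp.status == "Necessary" and not holds:
            return False
        if csp.status == "Redhibitory" and holds:
            return False
    return True
```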
4.1.3. Kind of experiment
For some processing steps, it is useful to represent the means of image acquisition (microscope or light box). This tells the system how to obtain a white-level reference image. For a light box, this image is obtained by removing the sample; for a microscope, it is obtained either by removing the sample or by defocusing. This knowledge is also used to enable a focusing algorithm (maximum of contrast) when the microscope has such a capability.
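A minimal NumPy sketch of the two uses of this knowledge follows, under two assumptions. First, the white-level correction uses the common flat-field formula: divide by the reference, then rescale to its mean grey level. Second, "maximum of contrast" is measured as grey-level variance. The paper gives neither formula, and the function names are hypothetical.

```python
import numpy as np


def white_reference_correction(image, white, eps=1e-6):
    """Flat-field correction with a white-level reference image (assumed formula).

    The reference is acquired without the sample (light box or microscope) or by
    defocusing (microscope); dividing by it removes uneven illumination.
    """
    white = white.astype(float)
    return image.astype(float) / np.maximum(white, eps) * white.mean()


def contrast(image):
    """Assumed focus measure: grey-level variance (higher means sharper)."""
    return float(np.var(image))


def best_focus(stack):
    """Index of the image with maximum contrast in a focus stack of 2-D arrays."""
    return int(np.argmax([contrast(img) for img in stack]))
```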
4.1.4. Aim of experiment
We distinguish two main directions for processing. Either the purpose of imaging is only to obtain an enhanced image of the sample, or it is used for measurement purposes. This knowledge guides the selection of processing (subsection 4.3). In the measurement case, it is useful to know what precision is needed; this is used to set the processing parameters.
Table 2. Available treatments and parameters. Treatments are divided into two classes depending on the aim of the experiment.
Aim = Measurement
- Smoothing (parameter: number of iterations)
- Low-cut filtering (parameter: size)
- Thresholding (parameter: threshold, see subsection 5.2)
- Vectorization (parameter: approximation)
Aim = Enhancement (no parameters)
- Sharpening
- Median filtering
- Histogram equalization
- Smoothing

4.2. Knowledge about images
To select the treatments at each step, the system needs information about the content of the images. In the present system, this information comes from histogram measurements. In particular, we represent the homogeneity of the image, because microscopic images are often unevenly illuminated. Homogeneity is characterized by the width of the histogram peak at half height. We also represent the degree of binarization of the image, in order to facilitate threshold selection. This is characterized by an entropy measurement, which equals 1 for two equal classes. Furthermore, we use other measurements such as the grey level of the peak.
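The two histogram measurements can be sketched as follows. The half-height width follows the description above. The entropy is a hypothetical reading, since the paper does not give its formula: we split the histogram at its mean grey level and take the base-2 entropy of the two class proportions, which equals 1 when the classes are equally populated. The function names are ours.

```python
import numpy as np


def grey_histogram(image, levels=256):
    """Grey-level histogram of an integer image."""
    return np.bincount(np.asarray(image).ravel(), minlength=levels)


def peak_width_at_half_height(hist):
    """Number of contiguous grey levels around the main peak at or above half its height."""
    peak = int(np.argmax(hist))
    half = hist[peak] / 2.0
    left = right = peak
    while left > 0 and hist[left - 1] >= half:
        left -= 1
    while right < len(hist) - 1 and hist[right + 1] >= half:
        right += 1
    return right - left + 1


def two_class_entropy(hist):
    """Hypothetical binarization measure: base-2 entropy of the two classes obtained
    by splitting the histogram at its mean grey level (1.0 for equal classes)."""
    hist = np.asarray(hist, dtype=float)
    levels = np.arange(len(hist))
    total = hist.sum()
    mean_level = (levels * hist).sum() / total
    p = hist[levels <= mean_level].sum() / total
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p))
```

A wide half-height peak signals uneven illumination. The grey level of the peak itself is simply `np.argmax(hist)`.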
4.3. Knowledge about processing
The heuristic search for an optimized processing sequence is divided into two parts. The first is to select, at each step, the best treatment among those available (Table 2). The second is to set the parameters of the selected treatment to values that depend on other information, especially the average measurements (size, intensity) of the objects of interest. In both parts, the knowledge is represented directly in rules written by the specialist. An example of a rule is given in Table 3.
Table 3. An example of a rule
; If the task is to select processing
; and the aim of experiment is not
; to enhance images
; and there is an inhomogeneous image,
; which is the current image
; and the correction has not already been done
; then create a task to make the correction
(p RAW_IMAGE_CORRECTION_SELECTION
   (TASK ^name processing)
   (EXPERIMENT ^aim <> enhancement)
   (CURRENT_IMAGE ^number ... ^status done)
  -->
   (make TASK ^name raw_image_correction))
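To make the recognize-act behaviour of a forward-chaining engine concrete, here is a Python paraphrase of the Table 3 rule, together with a minimal loop that fires the first applicable rule until none applies. It is a sketch, not OPS5. Working-memory elements are plain dicts, and the `homogeneous` attribute and the CORRECTION element are hypothetical. The extra test for an existing raw_image_correction task stands in for OPS5 refraction, which prevents a rule from firing twice on the same data. Taking the first applicable rule replaces OPS5's conflict-resolution strategies.

```python
def has(wm, cls, **attrs):
    """True if working memory holds an element of class cls with the given attribute values."""
    return any(e["class"] == cls and all(e.get(k) == v for k, v in attrs.items())
               for e in wm)


def raw_image_correction_selection(wm):
    """Paraphrase of the Table 3 rule; returns the elements to make, or None."""
    aim = next((e.get("aim") for e in wm if e["class"] == "EXPERIMENT"), None)
    if (has(wm, "TASK", name="processing")
            and aim != "enhancement"
            and has(wm, "CURRENT_IMAGE", homogeneous=False)      # hypothetical attribute
            and not has(wm, "CORRECTION", status="done")          # hypothetical element
            and not has(wm, "TASK", name="raw_image_correction")):  # stands in for refraction
        return [{"class": "TASK", "name": "raw_image_correction"}]
    return None


def run(wm, rules):
    """Minimal recognize-act cycle: fire the first matching rule until none matches."""
    fired = []
    while True:
        for rule in rules:
            new_elements = rule(wm)
            if new_elements:
                wm.extend(new_elements)
                fired.append(rule.__name__)
                break
        else:
            return fired
```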
5. Application of knowledge
Knowledge about objects, experiment, images and processing enables us to select the processing. The set of treatments considered depends on the aim of the experiment (image enhancement or measurement).
5.1. Image enhancement
In this case, we consider two possible processing sequences, chosen according to the width of the histogram: either histogram equalization and smoothing, or contrast enhancement and median filtering.
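Histogram equalization, the first treatment of one of these sequences, can be sketched with the classic cumulative-histogram mapping below. This textbook formulation for integer grey levels is assumed here; the paper does not say which variant IMAGENIA implements.

```python
import numpy as np


def equalize_histogram(image, levels=256):
    """Classic histogram equalization for an integer grey-level image."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf_min = cdf[cdf > 0][0]        # cumulative count at the darkest occupied level
    if cdf[-1] == cdf_min:           # constant image: nothing to spread
        return image.copy()
    # Map each grey level through the normalized cumulative histogram.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    return np.clip(lut, 0, levels - 1).astype(image.dtype)[image]
```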
5.2. Measurements
If the purpose of the experiment is to obtain measurements on objects (which is compatible with the high-level part), we mainly consider segmentation. Here, we use thresholding only, so the background must be corrected if it is found to be inhomogeneous. Depending on the means of image acquisition, we select either low-cut filtering or a correction using a white-level reference image.

To make thresholding possible, a threshold must be determined from the histogram, using one of two methods depending on the entropy. If the image is rather binary, we set a threshold not far from the peak, either lower or higher depending on the contrast of the object of interest. Otherwise, we use a valley-finding algorithm (Roux, 1988). If we are still unable to determine a threshold, we smooth the image, update the measurements and reapply the process.

After thresholding, we use a vectorization algorithm to build polygonal contours, to which the high-level part can be applied. The approximation parameter of the vectorization is determined by the size of the objects of interest. Using the CSP, the high-level part applies selection rules to discriminate between the objects of interest and the artefacts.
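The threshold selection described in this subsection can be sketched as below. The entropy cut-off, the offset from the peak and the number of smoothing passes are hypothetical parameters. Roux's (1988) valley-finding algorithm is not described here, so a simple stand-in is used: the deepest bin between the two highest, well-separated histogram peaks. For brevity, the sketch smooths the histogram rather than the image.

```python
import numpy as np


def smooth(hist, width=5):
    """Moving-average smoothing of a histogram."""
    kernel = np.ones(width) / width
    return np.convolve(hist, kernel, mode="same")


def valley_between_peaks(hist, min_separation=10):
    """Stand-in valley finder (not Roux's algorithm): deepest bin between the two
    highest local maxima that are at least min_separation grey levels apart."""
    h = np.asarray(hist, dtype=float)
    maxima = [i for i in range(1, len(h) - 1)
              if h[i] > h[i - 1] and h[i] >= h[i + 1]]
    maxima.sort(key=lambda i: h[i], reverse=True)
    if not maxima:
        return None
    first = maxima[0]
    others = [i for i in maxima[1:] if abs(i - first) >= min_separation]
    if not others:
        return None
    lo, hi = sorted((first, others[0]))
    return lo + int(np.argmin(h[lo:hi + 1]))


def select_threshold(hist, entropy, object_darker,
                     binary_cutoff=0.8, offset=10, max_smoothing=3):
    """Entropy-driven choice between a peak-offset threshold and valley finding.

    binary_cutoff, offset and max_smoothing are hypothetical parameter values.
    Returns a grey level, or None if no threshold could be found.
    """
    h = np.asarray(hist, dtype=float)
    if entropy >= binary_cutoff:
        # "Rather binary" image: threshold near the main (background) peak,
        # below it for dark objects and above it for bright ones.
        peak = int(np.argmax(h))
        t = peak - offset if object_darker else peak + offset
        return int(np.clip(t, 0, len(h) - 1))
    for _ in range(max_smoothing + 1):
        t = valley_between_peaks(h)
        if t is not None:
            return t
        h = smooth(h)  # simplification: the paper smooths the image itself
    return None
```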
6. Results
We have applied ARCHI to blood smear images (Image 1). These images show erythrocytes (small round cells), leucocytes (large cells with a complex nucleus) and malignant cells (large cells with a large, textured nucleus). The objects of interest are the malignant cells, and the required precision is low. No white-level reference images are available. Three images are used for learning and one for classification (Image 2).

Image 1. The object to be detected is the malignant cell (large isolated cell).

Image 2. Raw image: the system detects inhomogeneous illumination by measuring the width of the histogram peak at half height.
Image 3. Corrected image: low-cut filtering has been applied and the histogram is now bimodal, allowing threshold detection.
Image 4. Vectorized contour: after thresholding, contours have been vectorized and measured (Table 4).
Images were digitized using a BIOCOM 200 imaging workstation and transferred to a VAX 780 via a DECnet network. All treatments are performed automatically by the system on the VAX 780. The resulting images (Images 1-5) are transferred back to a workstation for visualization.

Image 5. Classified image: applying the learned classification, the malignant cells have been detected.
Table 4. Parameters measured by the system on some contours

Cont.   Surface   Form factor  Great Feret  Orth. Feret  Small Feret  Avg. grey level
10      1065.93   0.94         43.34        36.96        32.74        146
12      988.12    0.97         36.68        35.69        34.69        149
19      1009.34   0.97         37.78        35.70        35.02        145
28      1955.02   0.61         75.02        59.16        37.42        145
31      1128.17   0.96         41.13        38.80        36.22        147
36      1943.71   0.63         66.91        58.27        40.43        148
42      2015.85   0.51         78.08        71.43        37.38        145
44      2020.10   0.55         71.40        67.35        39.97        147
47      2150.24   0.51         77.07        65.29        41.01        147
49      2020.10   0.54         80.30        62.10        36.16        149
*50     4362.02   0.98         80.01        70.52        70.42        98
52      988.83    0.95         37.98        36.86        35.04        147
57      1046.83   0.97         38.16        36.91        36.05        146
62      2020.10   0.74         66.66        56.21        39.10        145
68      2338.39   0.57         79.31        69.54        40.40        148
80      823.32    0.96         34.34        32.52        31.66        149
82      1046.83   0.95         39.90        37.23        35.04        148
83      118.83    0.80         16.67        11.82        11.82        156
Average 1613.43   0.78         55.60        49.02        37.25        145

Number of contours = 18. Great, Orth. and Small Feret are the great, orthogonal and small Feret diameters.
* = the selected contour
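As a check on the transcription of Table 4, the short script below recomputes the Average row from the individual contours. It also confirms that the selected contour (50) is the one with the largest surface and the lowest average grey level. This is an observation from the data, not the classification rule ARCHI actually applied.

```python
# Columns of Table 4, in contour order; contour 50 is the selected one.
contours = [10, 12, 19, 28, 31, 36, 42, 44, 47, 49, 50, 52, 57, 62, 68, 80, 82, 83]
surface = [1065.93, 988.12, 1009.34, 1955.02, 1128.17, 1943.71, 2015.85, 2020.10,
           2150.24, 2020.10, 4362.02, 988.83, 1046.83, 2020.10, 2338.39, 823.32,
           1046.83, 118.83]
form_factor = [0.94, 0.97, 0.97, 0.61, 0.96, 0.63, 0.51, 0.55, 0.51, 0.54, 0.98,
               0.95, 0.97, 0.74, 0.57, 0.96, 0.95, 0.80]
great_feret = [43.34, 36.68, 37.78, 75.02, 41.13, 66.91, 78.08, 71.40, 77.07, 80.30,
               80.01, 37.98, 38.16, 66.66, 79.31, 34.34, 39.90, 16.67]
orth_feret = [36.96, 35.69, 35.70, 59.16, 38.80, 58.27, 71.43, 67.35, 65.29, 62.10,
              70.52, 36.86, 36.91, 56.21, 69.54, 32.52, 37.23, 11.82]
small_feret = [32.74, 34.69, 35.02, 37.42, 36.22, 40.43, 37.38, 39.97, 41.01, 36.16,
               70.42, 35.04, 36.05, 39.10, 40.40, 31.66, 35.04, 11.82]
grey = [146, 149, 145, 145, 147, 148, 145, 147, 147, 149, 98, 147, 146, 145, 148,
        149, 148, 156]


def mean(values):
    return sum(values) / len(values)


# Rounded as in the Average row of Table 4.
averages = {
    "surface": round(mean(surface), 2),
    "form_factor": round(mean(form_factor), 2),
    "great_feret": round(mean(great_feret), 2),
    "orth_feret": round(mean(orth_feret), 2),
    "small_feret": round(mean(small_feret), 2),
    "grey": round(mean(grey)),
}
largest_surface = contours[surface.index(max(surface))]
darkest = contours[grey.index(min(grey))]

print(len(contours), averages)
print(largest_surface, darkest)
```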
Starting with the raw image (Image 2), the system detects inhomogeneous illumination (the histogram peak is too wide). Consequently, it triggers a correction (low-cut filtering), which gives Image 3. On that image, the histogram is now bimodal. The system then thresholds the image with a threshold near, but below, the peak, and is able to vectorize the contours (Image 4). Too many contours are obtained, but this is done on purpose to avoid losing any object. Parameters are measured on the contours (Table 4). The system then applies the classification and obtains Image 5.
7. Conclusion
The first results show that an expert system is well suited to driving simple algorithms. ARCHI keeps an open structure and can easily be extended to new treatments. Furthermore, the representation of objects of interest enables us to build a database of known objects. This database could then be applied to more sophisticated applications, including object recognition in situations where many objects can be found.

Acknowledgements
We would like to thank F. Diqual, M. Roux and G. Colliot for their programming help, and the image laboratory of the ENST in Paris (Pr. H. Maitre) for their expertise.

References
Chassery, J.M., and C. Garbay (1986). Expert systems, image processing and image interpretation. Proc. 8th Int. Conf. Pattern Recognition, Paris, France, 175-178.
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-188.
Lebart, L., and A. Morineau (1985). SPAD: Système portable pour l'analyse des données [Portable system for data analysis]. Version 1985. CESIA, Paris.
Nazif, A., and M.D. Levine (1984). Low level image segmentation: an expert system. IEEE Trans. Pattern Anal. Machine Intell. 6, 555-577.
Niemann, H., H. Bunke, I. Hofmann, G. Sagerer, F. Wolf and H. Feistel (1985). A knowledge based system for analysis of gated blood pool studies. IEEE Trans. Pattern Anal. Machine Intell. 7, 246-259.
Roux, M. (1988). Étude d'un système expert de prétraitement des images [Study of an expert system for image preprocessing]. Rapport de stage [Internship report], École Nationale Supérieure des Télécommunications.
Stansfield, S.A. (1986). ANGY: A rule based expert system for automatic segmentation of coronary vessels from digital subtracted angiograms. IEEE Trans. Pattern Anal. Machine Intell. 8, 188-199.