Micron and Microscopica Acta, Vol. 21, No. 1/2, pp. 29-55, 1990. Printed in Great Britain.
0739-6260/90 $3.00+0.00 1990 Pergamon Press plc
CESAR: A COMPUTER SUPPORTED MEASUREMENT SYSTEM FOR THE ENHANCEMENT OF DIAGNOSTICS AND QUALITY IN CYTOLOGY
T. GAHM* and B. AEIKENS†
*Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung, Universität Hannover, F.R.G. †Robert-Koch-Strasse 40a, 28 Bremen 61, F.R.G.
(Received 18 November 1989; revised 7 March 1990)
Abstract: In the following presentation, the concept of the CESAR Cytology system (CEll Screen and Analysing Routine system) is introduced, which, unlike conventional systems, also comprehensively integrates human empirical knowledge in the evaluation and interpretation of the measurement results. For this purpose CESAR is equipped with an object-oriented data base. Owing to its special structure, graphical presentations of the measurement or classification results (histograms, scatterplots) can be used as directories of a cell image data base. In this way the numerical results can be ‘translated’ into cell images with a similar information content, which are far better suited to comparison with the empirical knowledge of the human brain.
Sponsored by BMFT, Bonn and LOTTO+TOTTO, Lower Saxony.
INTRODUCTION
The evaluation of cytological specimens for diagnostic purposes has become increasingly important in clinical routines. This is not only because, in many cases, the removal of tissue for cellular investigation is relatively simple and not exceptionally stressful to the patient, but also because, especially with respect to the early diagnosis of cancer, changes in the structure of cells give an indication of cancer being present in the patient. Generally, cellular tissue is visually evaluated by a cytologist with the aid of a microscope. The diagnosis is therefore dependent on the experience of the individual cytologist and is based on knowledge derived from this experience and not on explicit scientific criteria. Because the human brain is not capable of generating and memorizing quantitative measurement values, the diagnoses resulting from this method often differ and cannot be reproduced. In contrast to the human brain, computer supported measurement systems are capable of generating quantitative measurement values and storing flawless data sets. Systems currently available on the market which include these characteristics are used for cytologic evaluations. The cells are measured, and the results are statistically analysed, with or without the aid of pattern recognition programs, and graphically displayed. This procedure, however, has three distinct disadvantages which place serious limitations on the results: (1) Empirical knowledge can very rarely be explicitly formulated, and as such cannot be fully accessed or utilized by a computer.
(2) The ability of the brain to compare and recognize patterns and similarities, and to abstract this information according to implicit criteria, cannot be adequately applied. (3) Due to the computing limitations, it is only possible to achieve a simplified data evaluation.
In order to combine the advantages of both the computer based and the visual evaluation, a new concept of a system for diagnosis support in cytology was developed. Its central part is an object-oriented data base. Its structure enables the cytologist to access the quantitative information gained by the measurement procedure and stored in the data base by means of his empirical knowledge. This system will be referred to in the following text as the CESAR Cytology system (CEll Screen and Analysing Routine system) or the CESAR cyto-analyser.
DEVICE CONFIGURATION
The basic components of the CESAR cyto-analyser are illustrated in Fig. 1. The LEITZ ORTHOPLAN research microscope, used for data recording, and the KONTRON image analyser, IPS, used for processing the data, form the nucleus of the system. The comparison of these data with the empirical knowledge of the cytologist leads to the final interpretation and to an enhanced cytological diagnosis.
Microscope
The microscope is equipped with planapochromatic objectives with magnification factors of 63 and 16 and numerical apertures of 1.40 (oil immersion) and 0.40 (dry lens). The cell images are recorded from the microscope using a BOSCH TIVK9B camera with a HEIMANN Pasecon tube. The images are then transmitted to the image analyser where they are digitized in 512 x 512 pixels to a depth of 8 bits. The microscope is fitted with a vario-optovar for adjusting the camera. This enables the magnification to be additionally increased by a factor of 1 to 3.2. A magnification factor of 1.6 proved to be a suitable setting for the oil immersion lens planapochromatic 63/1.40. In this case the geometric resolution for the digitized images was 0.5 µm. Because most of the relevant chromatin structures used for characterizing the texture of the nucleus normally lie in the range of 0.8 µm, this setting is a good compromise between maximum information and storage space for the digitized image. For fully automatic scanning of specimens, the microscope is fitted with a
Fig. 1. Configuration of the CESAR cyto-analyser.
MARZHAUSER scanning stage EK 32 with a specimen mount which can be manually moved in x and y directions and a step motor for automatic focusing. An adjustment unit manufactured by KONTRON is used to stabilize the source of illumination for the microscope. If required, this unit enables the modulation of the video image to be automatically optimized, the storing of preselected levels of illumination, or the manual setting of the level using a potentiometer. In all cases, it prevents an overmodulation of the TV camera.
IPS image analyser
The IPS image analyser manufactured by KONTRON comprises a host computer (Z80 CPU, 8 bit data bus, 16 bit address bus, external ECB bus) for control of the fast image processing unit, which is equipped with an efficient micro-programmable array processor with pipeline structure. The Z80 CPU will soon be replaced by an 80386 CPU. 4 MByte of memory are available for the intermediate storage of digitized images. The video input signal from the TV camera is converted via an 8 bit AD converter with a maximum sample rate of 20 MHz. The output signal can be displayed on a colour monitor via preselected look-up tables. A digitizer tablet (0.02 mm resolution, with an active measuring area of 280 x 280 mm) and a cursor enable interactive manipulations during the processing of images. A special menu technique, incorporating the use of the cursor, provides for user-friendly operation of the image processing software. Special interfaces are available for connecting peripheral devices, such as scanning stage and autofocus controllers, to the host computer. An RS 232 interface enables data to be transferred to other systems. The system is fitted with a 20 MByte hard disk, a 592 KByte floppy disk drive and two drives for exchangeable disks (Bernoulli principle, 10 MByte each) for storage purposes. A 512 x 512 pixel image with 8 bit grey levels can be loaded into the video memory from disk or written from the memory to disk in approx. 1 s. For computing-intensive applications such as the calculation of classification matrices, a three-way interface can be used to switch the host computer over to a second, more powerful host computer which runs on a Motorola 68000 CPU with a 16 bit data bus and a 32 bit address bus.
Software
The software has been developed in the form of various program modules on the basis of the KONTRON image processing library.
The modular structure enables the user to design macros combining a sequence of individual programs specific to a particular application. The required programs are simply selected from a menu, using the cursor.
DATA ACQUISITION
The program sequences needed for the data acquisition macros are largely dependent on the type of material under investigation, its preparation and the required diagnosis criteria. The macros usually generate grey value images of the objects of interest in the investigated medical material. Depending on the chosen operation mode, the preselection of these objects is done interactively by the operator or automatically by the machine according to certain predefined criteria. Figure 2 shows a typical cell scene of a Pappenheim stained cytological specimen of a bone tumour. The cells marked by a frame were automatically detected and their high resolution grey value images (Fig. 3) automatically stored in an image buffer for the subsequent evaluation.
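The automatic preselection described above can be pictured as a grey value threshold followed by connected-component labelling, with the bounding frame of each component marking a candidate cell. The following is only a minimal sketch under that assumption; the text does not state which segmentation method CESAR actually uses, and the names `find_objects`, `threshold` and `min_area` are hypothetical.

```python
from collections import deque

def find_objects(image, threshold, min_area=1):
    """Return bounding frames (x0, y0, x1, y1) of dark objects.

    image: list of rows of grey values (0 = dark, 255 = bright).
    Pixels darker than `threshold` count as object pixels (assumption:
    stained nuclei absorb light and therefore appear dark).
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    frames = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] >= threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and image[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Discard components that are too small to be a cell.
            if area >= min_area:
                frames.append((x0, y0, x1, y1))
    return frames
```

Each returned frame could then be used to cut the corresponding high resolution grey value image out of the scene; the `min_area` filter stands in for the ‘predefined criteria’ mentioned in the text.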
Fig. 2. Typical cell image of a Pappenheim stained cytological specimen of a bone tumour. The cells marked by a frame can be seen again in Fig. 3.
Fig. 3. Grey value image consisting of nine nuclei images.
GENERATION AND MANAGEMENT OF THE DATA BASE
Cell images, cell locations, feature data which were derived from the cell images or from other feature data sets, statistical data, patient data and specimen identification data are part of the central knowledge base of the CESAR system. This knowledge base enables the system to run quantitative and qualitative comparisons with stored cell images and related data and thus to function as a highly accurate ‘electronic cytology atlas’, arranged according to specific problems. The organization of the knowledge base is shown in Fig. 4. The patient data with cross references to the ‘pages’ in the feature and cell image data banks and the allocated specimen numbers are managed by a relational data base. As can be seen from the diagram, the actual access to the specific data always takes place from the preceding (in the diagram) to the following knowledge base. This means that it is only possible to search for the right ‘feature page’ via the patient data; in order to compare images from the cell image data base, it is necessary to have the applicable ‘feature page’ (see below) and, for a reliable search for the original cells, the appropriate grey value images.
Patient data
The most important data relating to the case history and the investigations which have been carried out to date are stored in this block.
Feature data bank
Grey value images represent an enormous amount of data. For that reason they are normally described by a limited number of characteristic features. The features fulfil two purposes: firstly they present a simplified, but viable characteristic description of the cell images (data reduction), and secondly they establish an exceptionally clear-cut and logical access path to the cell image data bank for the CESAR cyto-analyser. During the feature extraction phase, new ‘pages’ in the feature data bank are
Fig. 4. Organization of the knowledge base.
automatically generated from the allocated grey value images. As a rule, data relating to all the cells stemming from the same specimen are stored in one file. The term ‘feature data bank page’, which has already been used several times, relates to this unit. If the amount of data exceeds the capacity of one file, additional files will be created automatically.
Basic feature set
In order to be able to cover a large range of applications, the CESAR software incorporates a comprehensive set of 60 basic features which characterize the geometric, densitometric and texture properties of the cells. The basic features are given in the following list.
Geometric features
1 AREA object area
2 PERI object perimeter
3 CPER convex perimeter
4 DMAX maximum diameter
5 DMIN minimum diameter
6 SMPX x-coordinate of the centre of gravity of the object
7 SMPY y-coordinate of the centre of gravity of the object
8 FCIR circular form factor
9 DCIR diameter of area equivalent circle
10 SAXE smallest axis of a rotational ellipsoid with the same moment of inertia as the current object
11 LAXE largest axis of the same ellipsoid
12 ELAR area of the ellipse calculated from 10 and 11
13 ELPE perimeter of the ellipse
14 ANGL angle between LAXE and the x axis
Densitometric features
15 TMEA mean transmission
16 TMAX maximum transmission
17 TMIN minimum transmission
18 TSTD S.D. of the transmission
19 TRNG TMAX-TMIN
20 OMEA mean optical density
21 OMAX maximum optical density
22 OMIN minimum optical density
23 OSTD S.D. of the optical density
24 ORNG OMAX-OMIN
25 SKEW grey value distribution distortion
26 EXZE grey value distribution excess
27 IODN integrated optical density
Texture features
28 PARE sum of all IOD areas of an object, divided by the total area. The IOD areas are those areas with the highest optical density which together contain a user-defined percentage of the total integrated optical density of the object
29 PMAR mean IOD area
30 PSTD S.D. of the IOD area
31 PANZ number of IOD areas
32 PABD distance between the common centre of gravity of the IOD areas and the centre of gravity of the object
33 PMAB mean distance of the IOD areas from the centre of gravity of the object
34 PSAB S.D. of the individual distances
35 SSTD sum of the standard deviations of the differences in the x and y coordinates of the centres of gravity of the IOD areas to the centre of gravity of the object
36 MPER mean perimeter of the IOD areas
37 RPAR MPER/PMAR
38 ATST PARE/predefined IOD threshold percentage
39 FREQ ratio between the perimeter and area of those object areas containing the most frequent grey value, including an offset region
40 FARE total area of this grey value region
41 LPAR ratio between the perimeter and area of the light object regions. The light object regions are derived from the user-defined percentage of the total object area via the grey value histogram
42 LANZ number of light areas
43 LMEA mean light area
44 LSTD S.D. of these areas
45 LARE sum of all light areas
46 LBAZ number of light areas, the size of which exceeds the minimum area (60 pixels)
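To make a few of the geometric and densitometric entries concrete, the sketch below computes them for a single object. It rests on assumptions that the list itself does not spell out: optical density is taken as OD = -log10(T) for a transmission T relative to the background, and the circular form factor is taken as 4πA/P², a common definition. The function name `basic_features` and its arguments are hypothetical.

```python
import math

def basic_features(transmissions, perimeter):
    """Compute a handful of basic features for one object.

    transmissions: transmission values (0 < T <= 1) of the object's pixels.
    perimeter: object perimeter in pixel units, measured elsewhere.
    """
    area = len(transmissions)                        # AREA, in pixels
    # Assumed definition of optical density: OD = -log10(T).
    densities = [-math.log10(t) for t in transmissions]
    return {
        "AREA": area,
        "PERI": perimeter,
        # Diameter of the circle with the same area as the object.
        "DCIR": 2.0 * math.sqrt(area / math.pi),
        # Assumed form factor: 1 for a circle, smaller otherwise.
        "FCIR": 4.0 * math.pi * area / perimeter ** 2,
        "TMEA": sum(transmissions) / area,
        "TRNG": max(transmissions) - min(transmissions),
        "OMEA": sum(densities) / area,
        "ORNG": max(densities) - min(densities),
        "IODN": sum(densities),
    }
```

Under these assumptions IODN is simply the sum of the per-pixel optical densities, so IODN divided by AREA reproduces OMEA.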
The features 47-60 are derived from the grey value matrix (co-occurrence matrix, for example see Haberäcker (1989)) and relate to the definitions determined by Haralick (1989). To achieve directional invariance, the mean value is determined from the individual values of four directions (0°, 45°, 90°, 135°). The features give information regarding homogeneity and contrast of the image being processed as well as the number and type of available grey value transitions and the complexity of the image.
47 CASM second angular moment
48 CCOR correlation
49 CVAR variance
50 CIDM inverse difference moment
51 CENT entropy
52 DMEA mean value of the difference
53 DVAR difference variance
54 DASM second difference angular moment
55 DCON contrast
56 DENT difference entropy
57 GMEA sum of the mean values
58 GVAR sum of the variances
59 GASM second sum angular moment
60 GENT sum of the entropy
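As an illustration of how such features arise, the sketch below builds a normalized co-occurrence matrix for each of the four directions and averages three textbook quantities over them. It is a sketch only: the pixel distance of 1, the symmetric counting and the normalization are assumptions, and the mapping of the results to the names CASM, DCON and CENT follows the usual Haralick definitions rather than anything stated about the CESAR implementation. The image is assumed to be at least 2 x 2 pixels with grey values already reduced to `levels` steps.

```python
import math

# Pixel offsets (dx, dy) for the directions 0, 45, 90 and 135 degrees.
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1)]

def cooccurrence(image, levels, dx, dy):
    """Symmetric, normalized grey value co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                a, b = image[y][x], image[ny][nx]
                counts[a][b] += 1
                counts[b][a] += 1      # count each pair in both orders
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def texture_features(image, levels):
    """Average three co-occurrence features over the four directions."""
    sums = {"CASM": 0.0, "DCON": 0.0, "CENT": 0.0}
    for dx, dy in OFFSETS:
        p = cooccurrence(image, levels, dx, dy)
        for i in range(levels):
            for j in range(levels):
                sums["CASM"] += p[i][j] ** 2              # angular second moment
                sums["DCON"] += (i - j) ** 2 * p[i][j]    # contrast
                if p[i][j] > 0:
                    sums["CENT"] -= p[i][j] * math.log(p[i][j])  # entropy
    return {name: value / len(OFFSETS) for name, value in sums.items()}
```

A perfectly uniform image gives CASM = 1 with zero contrast and entropy, whereas a fine checkerboard lowers CASM and raises the contrast in the horizontal and vertical directions.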
The features which are to be calculated from a set of cell images can be selected in a feature editor (Fig. 5) with the aid of the cursor. Depending on the selection, the feature is referenced to the nuclei, the plasma or one of the first four statistical moments of the whole cell population. In addition, new weighted feature combinations can be defined via the operators (+, -, x, /, ( )) and stored under a new name in the LINKOMB field. Because specific access to the cell image data bank can be achieved via the individual features, the feature menu displays the main ‘directory’ of the ‘electronic knowledge base’. Because the available basic parameters can be linked into new features, new access paths to the cell image data bank can be created directly.
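The idea of a weighted feature combination can be sketched as the evaluation of a small arithmetic expression over the basic feature values of one cell. The example below is an illustration only and does not reproduce the LINKOMB editor; the function name `combine` and the expression syntax are assumptions. It uses Python's `ast` module so that only the four basic operations, parentheses, feature names and numeric weights are accepted.

```python
import ast
import operator

# The four basic calculation modes.
OPERATORS = {ast.Add: operator.add, ast.Sub: operator.sub,
             ast.Mult: operator.mul, ast.Div: operator.truediv}

def combine(expression, cell):
    """Evaluate an arithmetic feature expression for one cell.

    expression: e.g. "IODN / AREA"; cell: dict of basic feature values.
    Raises KeyError if a referenced basic feature is missing.
    """
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return cell[node.id]           # look up a basic feature
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value              # numeric weight
        raise ValueError("unsupported element in feature expression")
    return walk(ast.parse(expression, mode="eval").body)
```

For a cell with IODN = 200 and AREA = 50, `combine("IODN / AREA", cell)` gives 4.0, a derived feature that can be computed at any time from an existing feature record without touching the images again.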
Fig. 5. Feature editor.
Karhunen-Loeve transformation of features
The feature sets described in the previous chapter are mostly straightforward. Many of the features are dependent on each other and only contain a small amount of additional information. If a feature set is to be stripped of redundant information, the feature vectors must be transformed in such a way that the components are orthogonal to each other. A conversion of this type can be achieved by the Karhunen-Loeve transformation (Niemann, 1983; Young and Calvert, 1974). The components of a transformed feature vector can normally no longer be interpreted, but they are uncorrelated and arranged in the feature vector according to their contribution to the classification. The CESAR software offers the possibility of transforming feature vectors in a data file according to Karhunen-Loeve and either inserting the new feature values for classification purposes or writing them to a new feature file. The latter is again a part of the feature data bank and as such can be used to retrieve cell images from the image data bank or to create new features using the editor. This enables a better understanding of the transformed features.
The image data bank
This data bank currently consists of approx. 25,000 cell images. Data sets of this type require a large amount of memory (25,000 cell images need 1.5 GByte of memory). At present, the CESAR cyto-analyser uses streamer cassettes. In order to process or compare image data sets, the cell images are copied from tape to the disks of the Bernoulli box, where the images can be accessed at high speed. In future this method will be replaced by fast-access optical disks. Besides the grey value information, each cell image contains additional data such as the specimen number, expert opinions and the coordinates of the cell location on the slide.
The specimen identification
The specimens are allocated a 6-digit code in the clinic (consecutive number and year). This code is a part of the specimen measurement protocol and is written on every
cell image as well as in the related feature file. This number is also entered in the relational data bank. In this way the correct arrangement of patient data, feature files, cell images and specimen is guaranteed.
ACCESS TO THE IMAGE DATA BANK
The concept of the CESAR system is based on the quantitative evaluation of measurement values and the incorporation of empiric knowledge for supplementary and checking purposes. For this reason, all methods of feature evaluation have been designed in such a way that results can immediately be used as an access path to the cell image data bank. This means that the evaluation of features is synonymous with the generation of a ‘graphic directory’ for the image data bank. The term ‘graphic’ is used because every type of feature evaluation available on the CESAR system is graphically represented. Access to the cell images is achieved directly via this display. The generation of new features by linking the 60 basic parameters, using the four basic calculation modes is also a part of this concept. Because this linking process can be carried out at any time (even after feature extraction), and exceptionally quickly (within seconds), on an existing feature file, even more comprehensive possibilities exist for creating access paths. It is only due to this concept that data sets can be visualized very easily and can be accessed, evaluated and interpreted on a level the cytologist is acquainted with because of his training.
Histogram display
One-dimensional frequency distribution is displayed in the form of a histogram. Data relating to different measurements or to different patients can be incorporated into one histogram. The individual sections can be allocated various colours in order to facilitate the visual interpretation of the histogram at a later stage. Figure 6 illustrates the frequency distribution of nuclei areas (in pixels) taken from two patients.
The identification of the last patient to be included in the histogram, as well as the allocated colour code, are displayed in the upper margin of the image.
Fig. 6. Frequency distribution of the nuclei area (in pixels) of two patients. Different grey values are used for the B/W presentation instead of colours.
The cursor is used to select the features which are to be included in the histogram display from the feature menu. The input will be ignored if no data set is available for the selected feature. An exception to this occurs with features allocated to the ‘linear combination’ class: here the input will only be ignored when one of the features used for the combination is not present in the data file; otherwise the data relating to the new feature will be calculated from the values in the remaining data file and displayed. This is a fast and flexible method of testing new features based on existing data files.
The mean value, S.D., skewness or excess of the frequency distribution can be displayed by indicating the appropriate fields (MEAN, VAR, SKEW, EXZE) during the feature selection. This is a very easy way of comparing populations according to their first four statistical moments. For example, Fig. 7 shows a comparison of the S.D. of the integrated optical density of 14 Feulgen stained osteosarcoma and 14 aneurysmatic bone cysts. One can see from this comparison that the spread of the IODN in the osteosarcoma population is considerably larger than that in the bone cyst.
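The four population features mentioned above can be sketched as follows. This is a minimal illustration that assumes population (1/n) moments and the usual definitions of skewness and excess; the text does not state which estimators CESAR uses, and the function name `moments` is hypothetical. The key `SD` corresponds to the field labelled VAR in the feature menu, which the text describes as the S.D.

```python
import math

def moments(values):
    """Return mean, S.D., skewness and excess of a list of feature values.

    Assumes at least two distinct values, so that the S.D. is not zero.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = math.sqrt(m2)
    return {
        "MEAN": mean,
        "SD": sd,
        "SKEW": m3 / sd ** 3,          # 0 for a symmetric distribution
        "EXZE": m4 / m2 ** 2 - 3.0,    # 0 for a normal distribution
    }
```

Applied to the IODN values of each specimen, the `SD` entries of two groups of specimens could then be compared in one histogram in the manner of Fig. 7.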
Fig. 7. Comparison of the standard deviation of the integrated optical density of 14 osteosarcoma (dark) and 14 aneurysmatic bone cysts.
The histogram used as a ‘directory’ of the image data bank
If the contents of a histogram class are to be checked, one only needs to select the relevant class using the cursor. The selected class will be identified by an arrow. When two histogram classes are selected, the range between these two classes will be marked (Fig. 8). A special data format is used to store the features of each measured cell population on disk or in the video memory. This format enables fast access to the stored cell images, the feature values of which lie within the marked histogram classes. The result of this class feedback function is that all cell images present in the selected classes will be copied to an overview image in reduced format (Fig. 9). The resolution of the individual cells in the overview image is naturally not as good as in the original image. However, the overview image can be used as a convenient ‘graphic directory’ of the cell image data bank. The image of interest is selected using the cursor. It will then be retrieved from the data bank and loaded to the image
Fig. 8. Marking of five histogram classes for feedback. The feedback region is identified by two arrows which can be interactively defined.
Fig. 9. The feedback cell images (Feulgen stained osteosarcoma cells) are displayed in an overview image.
memory. In this way, both images can be displayed alternately on the monitor (Figs 10, 11).
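The class feedback can be pictured as an index that maps every histogram class to the cells it contains, so that marking a range of classes immediately yields the cell images to be copied into the overview image. The sketch below illustrates this idea only; the ‘special data format’ mentioned in the text is not described, and the names `histogram_classes` and `class_feedback` are hypothetical.

```python
def histogram_classes(values, lower, width, count):
    """Assign each cell index to a histogram class.

    values: one feature value per cell; lower: left edge of class 0;
    width: class width; count: number of classes.
    Returns a list with one list of cell indices per class.
    """
    classes = [[] for _ in range(count)]
    for index, value in enumerate(values):
        k = int((value - lower) // width)
        if 0 <= k < count:             # values outside the histogram are skipped
            classes[k].append(index)
    return classes

def class_feedback(classes, first, last):
    """Cell indices in the marked classes first..last (inclusive)."""
    lo, hi = min(first, last), max(first, last)
    return [index for k in range(lo, hi + 1) for index in classes[k]]
```

The returned indices stand for the stored cell images; selecting one of them in the overview image would then trigger the retrieval of the original image from the data bank.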
As already mentioned, the original cell image contains coded information regarding the position of the cell in the specimen. Therefore, if required, the cell contained in the original image can be automatically relocated under the microscope (Fig. 12). In addition, all measurement values calculated for these cells can be displayed on the B/W monitor (Fig. 13).
Fig. 10. Selecting an image from the overview image, using the cursor.
Fig. 12. The coded information in the cell image enables the related cell to be automatically positioned under the microscope.
Fig. 13. Output of all feature values in the feature file which relate to the selected cell.
A powerful tool for understanding the cells and extracted features is provided by the close link between the image data bank and the evaluation. For example, this enables comparisons to be made, in one image, between different patients or between normal and tumour cells (Fig. 32). The ‘trend feedback’ has been specially implemented to provide a better understanding of the contents of a feature. A cell image is automatically selected from each histogram class. The images are then arranged in an overview image according to
increasing feature values. To accentuate the differences in the feature values from one image to another, the numerical values together with the feature name can also be displayed. Figure 14 shows an example of the frequency distribution of the integrated optical density in a urothelium cell population. The related ‘trend feedback’ can be seen in Fig. 15. The ‘intersection feedback’ enables the specific access to cell images which fulfil
Fig. 14. Frequency distribution of the integrated optical density of a urothelium cell population.
several feature criteria. As explained above, the range in the histogram of the first feature which contains the cell images of interest for feedback is marked. If, however, a criterion for the feedback is that the cell images must fulfil additional feature conditions, the marked range is stored in an intermediate memory, the histogram of the second feature of interest is displayed, and the applicable range is marked there. When the feedback function is then activated, the overview image will only contain those cell images whose feature values are present in both ranges. If one defines the range in the histogram around the mean value µ (e.g. µ ± 2b, b = class width), then one has the possibility of selecting ‘typical’ cell images from the actual data set, with respect to the appropriate features (Fig. 16).
Fig. 16. Intersection feedback: For example, the displayed cell images are located in the two-class environment of the mean value of the features IODN, AREA, FCIR, OMEA. (An explanation of the abbreviations is given in Basic feature set.)
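The selection of ‘typical’ cells by intersection feedback can be sketched as a set intersection over the ranges marked for each feature. The example below assumes that each range is the interval of the mean value plus or minus k class widths, as in the example above; the function name `typical_cells` and the data layout (one dictionary of feature values per cell) are hypothetical.

```python
def typical_cells(cells, features, class_widths, k=2):
    """Indices of cells lying within mean +/- k class widths for every feature.

    cells: list of dicts of feature values, one dict per cell.
    features: names of the features that define the criteria.
    class_widths: dict mapping feature name to histogram class width b.
    """
    selected = set(range(len(cells)))
    for name in features:
        values = [cell[name] for cell in cells]
        mean = sum(values) / len(values)
        half_range = k * class_widths[name]
        in_range = {i for i, v in enumerate(values)
                    if abs(v - mean) <= half_range}
        selected &= in_range           # intersection of the marked ranges
    return sorted(selected)
```

Every additional feature can only shrink the selection, which mirrors the behaviour of the overview image after each further range has been marked.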
Histogram normalization
For some applications, it is necessary to normalize the frequency distribution before quantitative evaluations can be carried out (e.g. DNA measurements). The CESAR has two normalization functions which can be used in such cases. If a reference for the normalization is not available, or if the reference data is mixed with the cells to be normalized and stored in the same feature file, then the cursor can be used to enter the reference values interactively. These values will then be stored and made available as references for the next histogram evaluation. The second possibility for normalization defines the reference values from a population of reference cells. The reference cell population can be derived in various ways. The acquisition of external references can be achieved either from a special specimen or overlaid on the specimen which is to be measured (e.g. chicken erythrocyte). Internal references are already present on the specimen to be measured (e.g. squamous epithelium or leucocytes). In the latter two cases, the reference areas on the specimen can already be taken into consideration by determining the scan regions. The reference cells are presorted in a different way and their images are stored on a different disk.
In all three cases, after the feature extraction process, a data file of the reference cell population is created. Basically the CESAR can use any other feature data file as reference data file. In this way each feature can be normalized by the reference value determined from the frequency distribution of the applicable feature in the reference population. The definition of reference values is illustrated in Figs 17 and 18. The reference histogram is superimposed over the histogram which is to be normalized (Fig. 17). In order to determine the reference values as accurately as possible, the marginal classes of the reference histogram can be interactively removed. The reference
Fig. 17. Determination of the reference value from a reference cell population.
Fig. 18. The reference value is entered in the histogram.
value is then automatically determined from the remaining population. It is often necessary to multiply this value by a correction factor (Böhm, 1968; Böhm et al., 1968) which is entered interactively or stored as a default value. The corrected reference value is then entered in the histogram (Fig. 18) and is available for subsequent evaluations.
Scattergram display
A scattergram is used for the simultaneous display of two features. Similar options to those described for the histogram display are also available for this type of data presentation. The features to be displayed are selected from the feature menu. Feature values of individual cells, as well as the first four statistical moments of the whole cell population, can also be displayed in the scattergram. The scattergram, therefore, provides for the combination of the following types of features:
- 2 cell features from the same patient.
- 2 cell features from 2 different patients.
- 2 population features.
- 1 cell feature and 1 population feature.
The scattergram used as a ‘directory’ of the image data bank
Similar to the histogram, the scattergram can be used as a ‘graphic directory’ of a part of the cell image data bank. By marking individual points on the diagram with the cursor, individual cells complying with the appropriate characteristics defined by the features can be specifically retrieved from the image data bank (Figs 19, 20). Should several cells be located in the same position on the scattergram, and as such be represented by one point only, they will still be correctly interpreted by the feedback and the applicable images retrieved. The individual cell feedback can be supplemented by range feedback. Ranges can be interactively marked in the scattergram via the cursor, or they can be defined by dilated calculated curves, e.g. a regression curve (see Figs 21 and 22). The cell images relating to the points included in the marked range are displayed in the normal way.
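A range defined by a dilated regression curve can be sketched as a band of fixed width around a fitted line. The example below fits a least-squares straight line and selects the points whose vertical distance from it does not exceed a tolerance. Both choices are assumptions made for illustration: the text does not say which curve type CESAR fits or how the dilation is measured, and the names `regression_line` and `range_feedback` are hypothetical.

```python
def regression_line(xs, ys):
    """Least-squares line y = slope * x + intercept.

    Assumes at least two points with different x values.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def range_feedback(xs, ys, tolerance):
    """Indices of points within `tolerance` (in y) of the regression line."""
    slope, intercept = regression_line(xs, ys)
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if abs(y - (slope * x + intercept)) <= tolerance]
```

With AREA on the x axis and IODN on the y axis, the returned indices would identify the cells that follow the general trend of the population; the complement of the selection picks out the outliers instead.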
In the same way as the trend feedback in the histogram display, cell retrieval can also take place systematically (according to increasing feature values in x- or y-direction). Numerical feature values can also be displayed for comparison purposes (Fig. 23). ‘Feature tracking’ is an additional aid which is of particular interest for gaining a better understanding of a data set. First the cells or cell clusters of interest are interactively marked in a scattergram (each dot represents at least one cell). In subsequent scattergrams with different feature combinations, these cells can then always be recognized because their dots are highlighted by a special colour code. Figure 24 shows an identified cell cluster in a scattergram of the nuclei area AREA and the integrated optical density IODN. Now, for example, to answer the question as to where these cells are located in a scattergram of the nuclei area AREA and the mean optical density OMEA, the cells in the marked clusters can be displayed in a new colour coded diagram. It will then immediately be apparent that the cells also cluster in this combination of features (Fig. 25).
Classification
Classification is another step in evaluating cells. This function enables unidentified cells or cell populations to be automatically allocated to classes which are normally empirically predefined. During the training phase, these classes can be arbitrarily defined and modified by the user, and thus easily adapted to the specific problem. In order to create a knowledge base, the CESAR is interfaced to the IPACS [Interactive Pattern Analysis and Classification System (Kirndorfer and Kontron, 1987)] classification software package. The knowledge base which is generated using this package can then be incorporated in the CESAR cyto-analyser for classification purposes.

Fig. 19. Individual cell feedback: The cursor is used for the selection.
Fig. 20. Individual cell feedback: The applicable image is displayed in the overview image in an enlarged format.
Fig. 21. Interactively traced feedback range in a scattergram.
Fig. 22. A feedback range derived from a dilated regression curve.
Fig. 23. Trend feedback: The cells are displayed in the overview image according to increasing feature values in x-direction (IODN).
Fig. 24. Marking a cell cluster in a scattergram (nucleus area to integrated optical density).
Fig. 25. Feature tracking shows that cell clustering also occurs with another combination of features (nucleus area to mean optical density).

Training phase

In addition to selecting a suitable classifier, the dimension and characteristics of the training sample as well as the reliability of the reference values are of prime importance when creating an efficient knowledge base for classification. The input of reference values depends on the specific task, therefore various possibilities for defining the reference values are available.

To create a knowledge base for individual cell classification, every cell in the training sample must first be given a reference appraisal. In the simplest form, this can be done by globally allocating an appraisal to all cells from the same patient. A typical application for this method would be, for example, to apply the diagnosis made by a histologist with respect to a histological section. This method has the disadvantage that all cells from one patient would always be allocated to the same class. The possibility could arise, therefore, that when carrying out a classification at a later stage, the cells could be allocated according to the patient and not, for example, according to malignancy. To a large extent, this problem can be overcome if an opinion is given for each stored cell image by an expert who has no previous knowledge of the case. The cells would then be allocated to several classes and the association between class and patient would be eliminated.

Both methods are available in the CESAR software. Global appraisals can be entered via a simple measurement protocol and, in the same way as the cell positions, stored directly in the applicable grey value image.
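The difference between the two kinds of reference appraisal can be made concrete with a small sketch. It only illustrates the argument above and does not reproduce CESAR code; the cell and patient identifiers, the class labels and both function names are hypothetical. The check reports whether every patient's cells carry a single class, which is the situation in which a classifier might separate patients instead of, for example, malignancy.

```python
from collections import defaultdict

def global_appraisal(cells, diagnosis_by_patient):
    """Give every cell the diagnosis of its patient (global appraisal)."""
    return {cell_id: diagnosis_by_patient[patient]
            for cell_id, patient in cells.items()}

def class_tied_to_patient(cells, appraisal):
    """Return True if each patient's cells all carry one class label."""
    classes_per_patient = defaultdict(set)
    for cell_id, patient in cells.items():
        classes_per_patient[patient].add(appraisal[cell_id])
    return all(len(classes) == 1 for classes in classes_per_patient.values())

# Hypothetical training sample: cell id -> patient id.
cells = {"c1": "P1", "c2": "P1", "c3": "P2", "c4": "P2"}

# Global appraisal, e.g. taken over from a histological diagnosis.
global_labels = global_appraisal(cells, {"P1": "benign", "P2": "malignant"})

# Hypothetical blind appraisal of each stored cell image by an expert.
expert_labels = {"c1": "benign", "c2": "benign",
                 "c3": "benign", "c4": "malignant"}

print(class_tied_to_patient(cells, global_labels))
print(class_tied_to_patient(cells, expert_labels))
```

With the global appraisal the class is fully determined by the patient, whereas the individual appraisals break this association for patient P2.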
The input of the individual cell appraisals is achieved by retrieving the relevant cell images from the image data bank and displaying them on the monitor (Fig. 26). The required image is then marked and the appraisal is entered via the keyboard. This information is also coded directly into the grey value image and can be retrieved at any time. In particular, this method enables the opinions of various experts to be stored, compared and evaluated independently of each other, or the consistency of the appraisals given by one expert to be monitored over a period of time.

Fig. 26. Input and display of individual cell appraisals: the applicable cell can be automatically positioned under the microscope by selecting one of the square fields.

It is not always easy to give an appraisal of individual cells from grey value images, because very often the cell environment must also be taken into consideration for the diagnosis. This problem is overcome by making that information readily available: simply by moving the cursor to one of the square fields (Fig. 26), any one of the cells displayed on the monitor can be automatically positioned under the microscope. The appraisal can then be carried out on the original cells, taking the context information into consideration (Fig. 27) or using high resolution magnification (Fig. 28).

A third method for entering reference values for individual cells makes use of a scattergram. Suppose, for example, that the stored cells of a cell population are to be allocated to two different classes on the basis of their DNA content: the different ranges on the scattergram are traced using the cursor, the cells within the contour are allocated to class 1 and the remaining cells are allocated to class 2. Numerous classes can be generated by consecutively repeating this procedure. The feature editor is used to select the features. Up to 10 different conditions can be predefined for each feature name.

Selection of a classifier

The following 10 classifiers are supported by the CESAR cyto-analyser:
- Minimum distance classifier (ME)
- Mahalanobis distance classifier (MH)
- Quadratic distance classifier (QD)
- Bayes classifier (BY)
- Linear polynomial classifier (LP)
- Quadratic polynomial classifier (QP)
- Linear geometric classifier (LG)
- Quadratic geometric classifier (QG)
- Nearest neighbour classifier (Euclidean distance) (NE)
- Nearest neighbour classifier (city block) (NC)

A detailed description of the classification strategies can be found in Meisel (1972), Niemann (1983) and Young and Calvert (1974). The most suitable classifier for a specific problem is selected empirically by comparing classification results.

Fig. 27. The individual cell appraisal can now also take the context information into consideration.
Fig. 28. Individual cell appraisal using the original cell under the microscope at a magnification of 64 x 1.6, aperture 1.40.

Classification phase
Once the classification matrices have been calculated, they can be stored on floppy or hard disk and loaded into the video memory when required, so that the user can switch quickly between knowledge bases. In this way unidentified cells, or cell populations, can be processed using different and/or differently ‘trained’ classifiers. In contrast to the exceptionally time-consuming and computing-intensive training phase (it takes several hours on a Motorola 68000 to compute the knowledge base of a quadratic classifier using 10 features and a training sample with approx. 6000 vectors), it takes only a few seconds to actually run the classification procedure on a cell population.
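The separation between a costly training phase and a fast classification phase can be sketched for the simplest case, a minimum distance classifier. The sketch is an assumption-laden illustration, not the IPACS or CESAR implementation: the knowledge base is reduced to one mean vector per class, it is serialized with JSON purely for demonstration (the actual storage format is not described here), and the two-feature vectors and class labels B and M are invented.

```python
import json
from collections import Counter

def train_minimum_distance(samples):
    """Compute one mean feature vector per class from (features, label) pairs."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def classify(knowledge_base, features):
    """Assign the class whose mean vector is nearest (squared Euclidean distance)."""
    def sq_dist(mean):
        return sum((f - m) ** 2 for f, m in zip(features, mean))
    return min(knowledge_base, key=lambda label: sq_dist(knowledge_base[label]))

# Hypothetical training sample with reference appraisals B and M.
training = [
    ([100.0, 10.0], "B"), ([120.0, 12.0], "B"),
    ([300.0, 40.0], "M"), ([340.0, 44.0], "M"),
]

# Training is done once; the resulting knowledge base is stored ...
stored = json.dumps(train_minimum_distance(training))
# ... and can be reloaded whenever a population is to be classified.
knowledge_base = json.loads(stored)

population = [[110.0, 11.0], [310.0, 41.0], [330.0, 45.0]]
labels = [classify(knowledge_base, cell) for cell in population]
counts = Counter(labels)                                   # absolute cell counts
relative = {label: n / len(labels) for label, n in counts.items()}
print(labels, dict(counts), relative)
```

Several stored knowledge bases could be kept side by side and applied to the same population in the same way. Because the distance is computed on raw feature values, features with large numerical ranges dominate the decision in this sketch; a real system would have to account for that.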
The results can be output numerically, whereby the absolute cell counts, or the relative portions thereof, are indicated for each individual class, or they can be output graphically. In the latter case, a colour is allocated to each class, and the classification result is projected on a histogram (Fig. 29) or a scattergram (Fig. 30), according to the features which were defined by the user in the feature editor. Not only does this procedure give a clear and concise visual impression of the results, but it also has the added advantage that all feedback possibilities mentioned in the previous chapters can be used (including repositioning the original cells under the microscope so that the classification result can be checked). In turn, the display can be used as a ‘graphic directory’ for the cell image data bank with an ordinal index for tumours. For example, Fig. 32 shows a comparison of diploid and aneuploid cells taken from an osteosarcoma, which was achieved simply by tracing the required areas in the applicable scattergram (Fig. 31). The class index and the patient number are also coded into the grey value image.

The graphic display of various cell populations simultaneously in one image, or the reduced display of four populations, enables visual comparisons to be made between the various classification results and between the reference indexing defined by experts and the computerized classification results. In this way cells which deviate from the class allocation can be easily recognized and, if necessary, can be positioned under the microscope and reappraised.

Fig. 29. Projection of the classification results on a histogram, e.g. the integrated optical density of an osteosarcoma. Two classes: B = benign cells; M = malignant cells.
Fig. 30. Projection of the classification results on a scattergram of nuclei areas and the integrated optical density of an aneurysmatic bone cyst. Two classes: B = benign cells; M = malignant cells.
Fig. 31. Feedback of the classification results of an osteosarcoma by tracing the regions of interest.
Fig. 32. Feedback result: Comparison of benign and malignant cells. The upper index in each grey value image refers to the patient number and the lower one to the class allocation.

DISCUSSION

The concept of designing a cytological evaluation program, which unlike
conventional methods also comprehensively integrates human empirical knowledge in the quantitative measurement and evaluation procedure, provides completely new possibilities for checking and interpreting the results. Due to the close link between images and extracted feature values, the cytologist no longer has to deal exclusively with numbers and their graphical representations, but can also ‘translate’ the numerical results into cell images or even go back to the applicable original cells. Not only does this provide a ‘plausible’ means of checking the computer results by comparing them with one’s own empirical knowledge, but it also makes it easier to interpret the connection between quantitative features and empirical knowledge. This aspect, for example, facilitates the choice of characteristic feature sets for classification purposes. It is also a very powerful tool with respect to explicit knowledge representation
in cytological expert systems. The possibility of defining new and more specific features even after cell acquisition and feature extraction can be used to improve acquisition and to systematically supplement one’s own empirical knowledge. These attributes make the CESAR particularly suitable as an educational training system.

The ease with which the results can be checked encourages completely automatic cell acquisition. Problems which might occur, such as wrong or incomplete object preselection, segmentation and background problems, can easily be detected and mostly corrected afterwards. Because all cell positions and the applicable images are stored, the whole scanning path can be traced automatically whenever needed and the cells in the microscopic image can be compared with the images of the objects collected during data acquisition. Thus CESAR is well suited to fill the gap between interactive evaluation systems and fully automated ones. Tracing the scanning path can also be used to measure the same cells more than once, for example after restaining.

Because the results are always graphically displayed, and access to the various data banks is also achieved via graphic directories, even complicated mathematical operations (e.g. classification) can be easily interpreted, understood and controlled by inexperienced users. This means that the user can concentrate on the cytologic evaluation instead of on the system.

REFERENCES

Böhm, N., 1968. Einfluß der Fixierung und der Säurekonzentration auf die Feulgen-Hydrolyse bei 28°C. Histochemie, 14: 201-211.
Böhm, N., Sprenger, E., Schlüter, G. and Sandritter, W., 1968. Proportionalitätsfehler bei der Feulgen-Hydrolyse. Histochemie, 15: 194-203.
Haralick, R. M., Shanmugam, K. and Dinstein, I., 1973. Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-3, No. 6.
Haberäcker, P., 1989. Digitale Bildverarbeitung, Grundlagen und Anwendungen. Carl Hanser Verlag, München, Wien.
Kirndorfer, H. and Kontron, 1987. IPACS Reference Manual. 8057 Eching, Breslauer Str. 2.
Meisel, W. S., 1972. Computer-oriented approaches to pattern recognition. Mathematics in Science and Engineering, Vol. 83. Academic Press, New York, San Francisco, London.
Niemann, H., 1983. Klassifikation von Mustern. Springer-Verlag, Berlin, Heidelberg, New York, Tokyo.
Takahashi, M., 1987. Farbatlas der onkologischen Zytologie. Dt. Übersetzung der 2. Auflage, perimed-Fachbuch-Verl.-Ges., Erlangen.
Young, T. Y. and Calvert, R. W., 1974. Classification, Estimation and Pattern Recognition. American Elsevier Publishing Company, New York, London, Amsterdam.