Feature evaluation for automated cancer cell identification

Comput. Biol. Med., Pergamon Press, 1975. Vol. 5, pp. 245-255. Printed in Great Britain

Y. IMASATO, T. YONEYAMA,* S. WATANABE* and H. GENCHI*

Department of Medical Systems, Toshiba Electric Co., Ltd., 30, Hisamoto, Takatsu-Ku, Kawasaki, Japan

(Received 21 June 1973 and in revised form 4 November 1974)

Abstract-This paper describes a study on cellular patterns of cervical smears to aid in the development of an automated prescreening instrument for cervical cancer. Normal and abnormal cell images were photographed and stored on magnetic tape along with other data collected on each patient and each cell. This data, which included classifications by means of the percentile method, was then introduced into an information-theoretic ambiguity function to evaluate the usefulness of cellular parameters in grouping the cells into two categories, normal and abnormal.

Automated cytology    Cervical cancer    Prescreening    Pattern recognition    Feature evaluation    Ambiguity    Diagnosis

INTRODUCTION

The shortage of trained cytotechnologists is becoming a serious problem as the need for mass population screening for early detection of cervical cancer increases. Since fewer than five out of 1000 cervical smears in mass screening show a possibility of cancer, the development of an automated prescreening instrument which can distinguish reliably between normal and suspicious smears is one of the most practical solutions. The system named CYBEST (CYto-Biological Electronic Screener by Toshiba) has been developed based on pattern recognition techniques, since it was found that cell malignancy appears predominantly in its morphological patterns. In this article, the method of data collection and the study of feature evaluation, which contributed to the development of this hardware, are described.

DATA ACQUISITION

For the automation of morphological diagnosis in cytology, it is necessary to study the quantitative characteristics of cellular patterns as well as to establish a diagnostic logic through computer simulation. Since the features of cells on the slide are varied and complex, statistical analysis is indispensable in obtaining a logic to differentiate between normal and suspicious cells.

Color slide transparencies were taken of cellular patterns with varying degrees of abnormality. To avoid color discrepancies due to differences in film production, Ektachrome film with the same emulsion number was used. A cell was positioned in the center of the field and photographed through a microscope with an Olympus Photomax, Model LB-A camera. The magnifications of the objective and the ocular were 40x and 10x, respectively, but the total magnification on the film was about 200x due to the intermediate lens.

* R/D Center, Toshiba Electric Co., Ltd.

Fig. 1. Nuclear and cytoplasmic features and their rating values.

Nuclear features
  Density:              Dense, Medium, Light
  Rim:                  Thick, Normal, Obscure
  Shape:                Regular, Irregular
  No. of nucleus:       Mononucleate, Polynucleate
  Chromatin structure:  Fine net, Coarse net, Fine grain, Coarse grain, Uniform
  Degeneration:         None, Slight, Destruction, Fusion, Picnosis, No nucleus
  Nucleolus:            Clear, Obscure
  Vesicle:              Nonexistent, Existent
  Size:                 Large, Small

Cytoplasmic features
  Color:                Green, Red, Orange
  Density:              Dense, Medium, Light
  Shape:                Regular, Irregular
  Amount:               Scant, Medium, Wide, No cytoplasm
  Vesicle:              Nonexistent, Existent
  Inclusion:            Nonexistent, Existent

Samples were collected mainly by a cervical scraping technique from the out-patients of Japan Central Red Cross Hospital and the Central Hospital of Japan National Railways. Data on the patient and on the photographed cells were collected by two cytopathologists and three cytotechnologists using two forms labeled A and B. Form A, which corresponds to each smear, contained the patient number, institute number, age, name, date of test, menstrual cycle, cell collection technique, preparation techniques, cytological diagnosis according to the Papanicolaou classification, clinical findings, histological diagnosis and so forth. Form B, which corresponds to each photographed cell, included the patient number, the degree of abnormality by Papanicolaou classification and the type of cell. Form B also included the classification, by medical professionals, of the 14 nuclear and cytoplasmic features in Fig. 1. The criteria for classification were fixed in advance, and all discrepancies in judgement were resolved by agreement.

The Papanicolaou classification is intended as a cytological judgement on an entire smear to describe the degree of abnormality, and is not used to classify a single cell. Here, in selecting cells to photograph, not more than 10 cells were chosen from each smear: those upon which the final cytological diagnosis of the smear was based, or those which showed the highest malignancy within the smear. In other words, if the smear classification was Pap V, a cell chosen from this smear was classified as Pap V. However, there were not many smears classified as Pap IIIa or Pap IIIb in the routine clinical examination.


Therefore, cellular photographs representing these degrees of abnormality were also taken from smears diagnosed as Pap IV and Pap V. On each slide, there were also objects other than normal and abnormal uterine cells. These, designated as "others" in Table 1, include leukocytes, erythrocytes, mucus, cell remnants, Trichomonas, Candida, bacteria, sperm, scratches on the slide surface and dust; they were also chosen at random and photographed for simulation.

Table 1

Category      No. of photos
Pap I              235
Pap II              93
Pap IIIa           175
Pap IIIb           243
Pap IV             267
Pap V              330
Others             123
Total             1470

Others include leucocyte, erythrocyte, sperm, bacteria, etc.

The nuclear and cytoplasmic perimeters of the photographed cells were enlarged by slide projector and manually traced on sheets of paper. A set consisting of one photograph and the two traced patterns was then stored on magnetic tape in digital form. The photograph was digitized using a flying spot scanner-computer system with an actual resolution of 1 micrometer, discriminating 64 gray levels on a raster of 108 x 108 spots. The cytoplasmic and nuclear perimeters were also stored on magnetic tape, in binary form, through a vidicon camera. The alignment of each cell pattern with its two perimeter patterns was accomplished by visual inspection on a digital display. A block diagram of the system is shown in Fig. 2. The two data forms, A and B, were transferred to IBM punch cards and stored on magnetic tape. The number of cells collected is tabulated in Table 1.
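The digitization step can be illustrated with a short sketch. The raster size (108 x 108) and the number of gray levels (64) are taken from the text; the linear mapping from a normalized intensity in [0, 1] to a gray level, and the names `quantize`, `digitize` and `bits_per_image`, are assumptions made only for illustration, since the scanner's actual transfer characteristic is not described.

```python
RASTER = 108       # spots per side, as stated in the text
GRAY_LEVELS = 64   # gray levels discriminated, as stated in the text


def quantize(value: float, levels: int = GRAY_LEVELS) -> int:
    """Map a normalized intensity in [0, 1] to an integer gray level.

    Assumption: a plain linear mapping; the real scanner response is unknown.
    """
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * levels), levels - 1)


def digitize(image):
    """Quantize a RASTER x RASTER grid (list of rows) of normalized intensities."""
    if len(image) != RASTER or any(len(row) != RASTER for row in image):
        raise ValueError(f"expected a {RASTER} x {RASTER} grid")
    return [[quantize(v) for v in row] for row in image]


def bits_per_image() -> int:
    """Minimum number of bits needed to hold one digitized photograph."""
    bits_per_spot = (GRAY_LEVELS - 1).bit_length()  # 6 bits for 64 levels
    return RASTER * RASTER * bits_per_spot
```

Under these assumptions one photograph needs 108 x 108 x 6 bits, which is why storing roughly 1500 images on magnetic tape was practical.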

Fig. 2. Block diagram of the system. Labelled blocks: FSS, VDC, Controller, Computer system TOSBAC 3400, Console typewriter, Line printer.

Fig. 3. Feature-value ratios by Papanicolaou class: nuclear density (dense, medium, light); chromatin pattern (fine net-like, coarse net-like, fine granular, coarse granular, uniform); cytoplasmic amount; nuclear rim.

ANALYSIS AND DISCUSSION

Based primarily upon the data from forms A and B, each feature and several combinations of features were studied from the viewpoint of distinguishing between normal and suspicious cells. Figures 3-6 show the ratio of feature values within each Papanicolaou classification.

As has been reported before [1], the ratio of cells with high nuclear density increases as the degree of cellular abnormality increases. That is to say, only 6.3% of the cell nuclei in Pap I are densely stained, whereas in Pap V, 88.2% of the cells have densely stained nuclei. Accordingly, the percentage of medium-stained nuclei decreases from 92.6% in Pap I to 10.9% in Pap V.

In the statistics on chromatin structure, the percentage of nuclei with fine chromatin grains decreases from 46.8% in Pap I to 10.9% in Pap V, while the percentage of nuclei with coarse chromatin grains increases from 2.4% in Pap I to 67.3% in Pap V. Nuclei with uniform chromatin structure occupy 48.9% of Pap I, due to the cornified squamous epithelial cells; however, the percentage did not change much in Pap II through V.

Looking at the amount of cytoplasm, which is used as the basis for the nuclear-cytoplasmic ratio, it was found that cells with scant cytoplasm increase from 20% of Pap I to 80.6% of Pap V, whereas cells with medium cytoplasm decrease from 76.3% of Pap I to 13% of Pap V. Cells with naked nuclei (no cytoplasm) and with wide cytoplasm did not show a remarkable percentage change with the change in abnormality.

Thick nuclear rims appear 4% of the time in Pap I, increasing to 77.3% in Pap V, while normal nuclear rims decrease from 91.3% of Pap I to 20.6% of Pap V. Almost all cells in Pap I have a regular nuclear shape, while half of the cells in Pap V have an irregular nuclear shape. Cytoplasmic shape and color did not show a noticeable change with the increase in the degree of abnormality. In a large percentage of Pap I cells the nucleolus was obscure and the cell was mononucleate, whereas in Pap II-V cells a conspicuous nucleolus was observed in 10-16% of cells and polynucleation in 11-24%.

Nuclear size has often been said to be one of the most prominent features in abnormal cells. In this statistical study, cells with small nuclear size decrease from 90.4% of the Pap I cells to 50% of the Pap II cells and to 28.3% of the Pap V cells.

Fig. 4. Feature-value ratios by Papanicolaou class: nuclear degeneration (none, slight, destruction, picnosis, no nucleus); nuclear shape (regular, irregular); cytoplasmic shape (regular, irregular); cytoplasmic color (green, red, orange).

Looking at these results, it would seem that if the occupying ratio of a feature value is heavily dependent upon the degree of abnormality, that feature would provide a good means of discriminating between normal and suspicious cells. However, if only one feature, such as nuclear size, is considered for discrimination, an intolerable number of false positive and false negative cases would appear. To evaluate the "goodness" of each feature individually and in combination with other features, an ambiguity function was introduced. Here, the following well-known information-theoretic function was used [1].

where C, Y and N denote the feature value, the classifying category (i.e., normal and abnormal cells in this case) and the number of categories (i.e., 2 in this case), respectively. N is employed to normalize the uncertainty, H, between 1 and 0. Hence the closer H is to 0, the better the feature is for discrimination. The calculated ambiguity of each feature is listed in Table 2. According to the data, the chromatin structure shows the least ambiguity, and this value is obtained when the chromatin structure is divided into five values, i.e. fine net, coarse net, fine grains, coarse grains and uniform. If the cells have degenerated so as to obscure the chromatin structure, which is often seen in a self-irrigation smear, this five-way division would be very difficult.

Fig. 5. Feature-value ratios by Papanicolaou class: nucleolus (prominent, obscure); no. of nucleus (mononucleus, polynucleus); cytoplasmic density (dense, medium, light); vesicle in cytoplasm (nonexistent, existent).

Table 2

AMBIGUITY 1 (ONE FEATURE)
CHROMATIN PATTERN        0.7489
NUCLEAR DENSITY          0.7498
CYTOPLASMIC AMOUNT       0.7809
NUCLEAR RIM              0.8024
NUCLEAR AREA             0.8294
CYTOPLASMIC SHAPE        0.8993
NUCLEAR DEGENERATION     0.9060
NUCLEAR SHAPE            0.9060
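The displayed form of the ambiguity function is not reproduced in this text. As a rough sketch only, assuming it is the conditional entropy of the category Y given the feature value C, divided by log N so that it lies between 0 and 1 (one common information-theoretic reading of the description above, not confirmed by the source), it could be estimated from rated cells as follows. The function name `ambiguity` and the toy ratings are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def ambiguity(feature_values, categories, n_categories=2):
    """Estimate H(Y | C) / log(N) from paired observations.

    Assumed form (not taken from the source):
        H = -(1 / log N) * sum_{c, y} p(c, y) * log p(y | c)
    0 means the feature value determines the category; 1 means it tells nothing
    when the categories are equally frequent.
    """
    if len(feature_values) != len(categories):
        raise ValueError("feature_values and categories must have equal length")
    total = len(categories)
    if total == 0:
        raise ValueError("at least one observation is required")

    joint = Counter(zip(feature_values, categories))   # counts of (c, y)
    marginal = Counter(feature_values)                 # counts of c

    entropy = 0.0
    for (c, _y), n_cy in joint.items():
        p_cy = n_cy / total
        p_y_given_c = n_cy / marginal[c]
        entropy -= p_cy * math.log(p_y_given_c)
    return entropy / math.log(n_categories)


# Hypothetical ratings for four cells (illustration only).
density = ["dense", "dense", "light", "light"]
color = ["green", "green", "green", "green"]
label = ["abnormal", "abnormal", "normal", "normal"]

perfect = ambiguity(density, label)      # feature separates the classes
useless = ambiguity(color, label)        # feature carries no information
```

In this toy case the first feature gives an ambiguity of 0 and the second gives 1, matching the statement that values closer to 0 indicate a better feature. The sketch makes no attempt to reproduce the values in Table 2.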

Fig. 6. Feature-value ratios by Papanicolaou class: inclusion in cytoplasm (nonexistent, existent); cell isolation (isolate, group, other); nuclear size (large, small); one further panel (nonexistent, existent) whose title is not legible.

Tables 3 and 4 show the ambiguity obtained by combining three and four features, respectively. The smallest ambiguity for three-feature combinations was obtained from nuclear density, chromatin structure and cytoplasmic area, while adding nuclear area to these provided the smallest ambiguity for the four-feature combinations. It should be remarked that a combination of features which individually have small ambiguity was not necessarily the best combination. Table 5 shows the case of combining 6 features: nuclear density, chromatin structure, cytoplasmic area, nuclear size, irregularity of nuclear perimeter and thickness of nuclear perimeter. The obtained value, H = 0.2060, is equivalent to, for instance, 0.5% false negative and 5% false positive cases.

Table 3

AMBIGUITY 2 (THREE FEATURES)
CHROMATIN PATTERN, NUCLEAR DENSITY, CYTOPLASMIC AMOUNT     0.4892
CHROMATIN PATTERN, NUCLEAR RIM, CYTOPLASMIC AMOUNT         0.5094
CHROMATIN PATTERN, NUCLEAR SHAPE, NUCLEAR DENSITY          0.5145
CHROMATIN PATTERN, NUCLEAR DENSITY, NUCLEAR RIM            0.5188

Table 4

AMBIGUITY 3 (FOUR FEATURES)
CHROMATIN PATTERN, NUCLEAR DENSITY, NUCLEAR AREA, CYTOPLASMIC AMOUNT      0.3429
CHROMATIN PATTERN, NUCLEAR DENSITY, NUCLEAR AREA, NUCLEAR RIM             0.3679
CHROMATIN PATTERN, NUCLEAR AREA, NUCLEAR RIM, CYTOPLASMIC AMOUNT          0.3769
CHROMATIN PATTERN, NUCLEAR DENSITY, NUCLEAR SHAPE, CYTOPLASMIC AMOUNT     0.3794

Table 5

AMBIGUITY 4 (SIX FEATURES)
CHROMATIN PATTERN, NUCLEAR AREA, NUCLEAR RIM, NUCLEAR SHAPE, NUCLEAR DENSITY, CYTOPLASMIC AMOUNT     0.2060

Figure 7 shows the decrease of ambiguity as the number of features considered increases. The dashed line represents the minimum uncertainty at each level of feature combination, and the vertical length illustrates the variation of ambiguity among the combinations of features at each level. When an increased number of features is considered, uncertainty decreases remarkably, but in order to obtain H = 0.1, more than 10 features have to be taken into consideration.

Fig. 7. Ambiguity versus number of features.

Since the morphological pattern seems to change gradually from normal to abnormal cells, the ability to separate Pap I from the others (Pap II-V) was also tested, although patients of Pap II are not re-examined after conventional mass screening. Table 6 shows the results of this test when each feature was considered individually. A slight decrease in ambiguity and some changes in order were observed, compared to the former results. This stems from the fact that a cell in Pap II is morphologically close to an abnormal one. Inflammatory cases, namely acute cervical inflammation and Trichomonas infection, also contribute to this, since they tend to be classified as Pap II by human observers but are common causes of false positive cases in instrumental classification.

Table 6

Ambiguity (One feature)
Nuclear density          0.6873
Cytoplasmic amount       0.7341
Chromatin structure      0.7368
Nuclear area             0.7472
Nuclear rim              0.7666
Cytoplasmic shape        0.8466
Nuclear shape            0.8572
Nuclear degeneration     0.8811
No. of nuclei            0.9329
Cytoplasmic color        0.9337
Nucleolus                0.9408
Cytoplasmic vesicle      0.9718
Cytoplasmic density      0.9830
Cell isolation           0.9877
Cell inclusion           0.9890
Nuclear vesicle          0.9894
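The evaluation of feature combinations discussed for Tables 3-5 can be sketched as an exhaustive search over k-feature subsets, treating the tuple of rated values as one joint feature. This is a minimal illustration under stated assumptions, not the authors' program: the ambiguity is assumed to be the conditional entropy of the category given the feature values, normalized by log N, and the names `ambiguity`, `rank_combinations` and the toy features "a", "b", "c" are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def ambiguity(keys, categories, n_categories=2):
    """Assumed form: conditional entropy of category given key, divided by log N."""
    total = len(categories)
    joint = Counter(zip(keys, categories))
    marginal = Counter(keys)
    entropy = 0.0
    for (key, _y), n in joint.items():
        entropy -= (n / total) * math.log(n / marginal[key])
    return entropy / math.log(n_categories)


def rank_combinations(cells, labels, k):
    """Return (ambiguity, feature names) for every k-feature subset, best first.

    cells  : list of dicts mapping feature name -> rated value
    labels : list of category labels, one per cell
    """
    names = sorted(cells[0])
    results = []
    for combo in combinations(names, k):
        # The joint feature value of a cell is the tuple of its ratings.
        keys = [tuple(cell[name] for name in combo) for cell in cells]
        results.append((ambiguity(keys, labels), combo))
    return sorted(results)


# Hypothetical toy data: "a" and "b" are uninformative alone but decisive together.
cells = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
labels = ["normal", "abnormal", "abnormal", "normal"]

best_single = rank_combinations(cells, labels, 1)[0]
best_pair = rank_combinations(cells, labels, 2)[0]
```

In this toy example the best single feature is "c", yet the best pair is ("a", "b"), which omits it. This is the kind of effect behind the remark that features with small individual ambiguity do not necessarily form the best combination. With n features there are n-choose-k subsets at each level, so the exhaustive search is practical only for a small feature set such as the one rated here.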

CONCLUSION

The pioneering study of the cytological prescreening instrument, Cyto-analyser, which was done by researchers at the National Cancer Institute and the Airborne Instruments Lab., clarified by hypothetical instrumentation that nuclear features (density and size) are sufficient for differentiation between normal and suspicious cells [2,3]. It also reported that preparation imperfections, such as leukocyte aggregates, prohibited the development of a clinically useful instrument [4]. There have been other extensive studies on feature evaluation for the purpose of developing a cytological prescreening instrument, in which only one measurable cellular feature (nuclear size) was considered [5].

In our study no cellular feature was measured directly. Rather, fourteen features of a cell were rated and classified professionally and served as possible criteria for screening. As this report shows, it is very difficult to distinguish between normal and abnormal cases if the features of each cell are considered individually. Since the purpose of mass population prescreening is to distinguish between normal and suspicious smears, not between normal and suspicious cells, the statistical features within an entire smear should be considered.

As a result of this study of cellular feature evaluation, the following conclusions have been reached:

(1) The chromatin structure observed in the nucleus is the best single feature for the discrimination between abnormal and normal cases, provided that it is rated according to five classifications; namely, fine net, coarse net, fine grain, coarse grain and uniform.

(2) Nuclear density, nuclear size, chromatin structure, nuclear perimeter and nuclear shape are good discrimination features in the sense that the percentage of cells categorized similarly for each feature changes drastically with the increase in the degree of abnormality.

(3) Even if the combination of six features commonly employed by human observers was utilized, the uncertainty on a single cell decreased only to 0.2060, which is equivalent to, for instance, a 5% false positive and 0.5% false negative rate.

(4) Finally, the morphological features of each cell as well as the statistical features within a smear should be considered to differentiate between normal and abnormal smears.

SUMMARY

This paper describes a study on cellular patterns of cervical smears to aid in the development of an automated prescreening instrument for cervical cancer. About 1500 normal and abnormal cell images were photographed and stored on magnetic tape along with other data collected on each patient and each cell by two cytopathologists and three cytotechnologists. This data, which included classifying certain cellular features and also combining some of these classifications by means of the percentile method, was then introduced into an information-theoretic ambiguity function to evaluate the usefulness of each cellular parameter in grouping the cells into two categories, normal and abnormal.

Acknowledgement-The authors would like to express their gratitude to Dr. Ryosei Kashida (Toshiba Central Hospital), Dr. Noboru Tanaka (Chiba Cancer Center Institute), Dr. Masanobu Takahashi (Central Hospital of the Japanese National Railways) and their associates for providing samples and valuable advice, and also to Mr. Jim Ungerleider (National Biomedical Research Foundation, Washington, D.C.) for his suggestions and editing of this paper.

REFERENCES

1. P. H. Bartel, G. F. Bahr and G. L. Wied, Information theoretic approach to cell identification, Automated Cell Identification and Sorting, G. L. Wied and G. F. Bahr (eds.), pp. 361-384. Academic Press, New York (1970).
2. W. E. Tooles, W. J. Harvath and R. C. Bostrum, A study of the quantitative characteristics of exfoliated cells from the female genital tract-I. Measurement methods and results, Cancer 14, 437 (1961).
3. W. E. Tooles, W. J. Harvath and R. C. Bostrum, A study of the quantitative characteristics of exfoliated cells from the female genital tract-II. Suitability of quantitative cytology measurements for automatic prescreening, Cancer 14, 455 (1961).
4. C. C. Spencer and R. C. Bostrum, Performance of the cytoanalyser in recent clinical trials, J. Nat. Cancer Inst. 29, 267 (1962).
5. M. D. Rimmer, The feasibility of automating pre-screening of cervical smears on the basis of nuclear size, J. Obstet. Gynaec. Br. Cmwlth 75, 1045 (1968).

About the Author-YUICHI IMASATO received the B.S. degree in electronic engineering from the University of Electro-Communications in 1963 and the M.S. degree in biomedical engineering from the University of Pennsylvania in 1969. Since joining Toshiba Electric Co., Ltd., his research activities have included a variety of biomedical instrumentation and medical care systems. From 1974 to 1975, he was an exchange researcher in Sweden through the Japan-Sweden Foundation and was engaged in the investigation of computer applications in medicine. At present, he is a senior engineer in the Biomedical Engineering Division, Toshiba Electric Co., Ltd., where he is in charge of medical care systems and biomedical instrumentation. He is a member of the Japanese Society of Medical Electronics and Biological Engineering.

About the Author-TSUNEO YONEYAMA received the B.E. degree in electronics from Tohoku University, Sendai, Japan in 1962. He joined the Central Research Laboratory, Toshiba Electric Co., Ltd., Kawasaki, Japan in 1962, where he has been engaged in research on electronic image transmission and recording systems. He is presently concerned with electro-optical I/O devices relating to pattern recognition, as a researcher of the Information Systems Laboratory, Toshiba Research and Development Center.

About the Author-SADAKAZU WATANABE was born in Kyoto, Japan, on January 11, 1938. He received the B.S. degree in physics from the University of Kyoto in 1962. He joined Toshiba Electric Co., Ltd. in 1962, where he has been doing research on pattern recognition and image processing, especially for the development of optical character readers and medical image processing systems. He is now a senior research scientist of Toshiba Research and Development Center. He is a member of the Society for Electronics and Communication Engineers of Japan, the Japanese Society of Medical Electronics and Biological Engineering, and the Pattern Recognition Society.

About the Author-HIROSHI GENCHI was born in Tokyo, Japan on November 7, 1934. He received the B.S. degree in applied physics from the University of Tokyo in 1957, the M.S. degree in electrical engineering from Massachusetts Institute of Technology in 1962, and the Ph.D. degree from the University of Tokyo in 1971. He joined Toshiba Electric Co., Ltd. in 1957. Since 1962 he has been doing research in pattern recognition. He was awarded the Okochi Memorial Engineering Award in 1970 for the development of Handwritten Postal Code Reader-Sorters for the Ministry of Posts and Telecommunications in Japan, in which he was the acting project leader. He is now the assistant to the director of the Information Systems Laboratory, Toshiba Research and Development Center. Dr. Genchi is a member of IEEE, the Society for Electronics and Communications Engineers of Japan, the Information Society of Japan, and an associate member of Sigma Xi.