Computer-assisted Diagnosis of Focal Liver Lesions on CT Images: Evaluation of the Perceptron Algorithm

Eike Hein, MD, Andreas Albrecht, PhD, Daniela Melzer, RH, Kathleen Steinhöfel, PhD, Patrik Rogalla, MD, Bernd Hamm, MD, Matthias Taupitz, MD, PhD
Rationale and Objectives. The purpose of the study was to investigate a modified version of the Perceptron algorithm for detecting focal liver lesions on CT scans.

Materials and Methods. The modified Perceptron algorithm is based on simulated annealing with a logarithmic cooling schedule and was implemented on a standard workstation. The algorithm was trained with 400 normal and 400 pathologic CT scans of the liver. An additional 100 normal and 100 pathologic scans were then used to test the detection of pathology by the algorithm. The 1,000 scans used in the study were selected from the portal venous phase of upper abdominal CT examinations performed in patients with normal findings or hypovascularized liver lesions. The pathologic scans contained 1 to 4 focal liver lesions. For the preliminary version of the algorithm used in this study, it was necessary to define regions of interest that were converted to a matrix of 119 × 119.

Results. Training of the algorithm with 400 examples each of normal and abnormal findings took about 75 hours. Subsequent testing took only a few seconds per scan. The diagnostic accuracy in discriminating scans with and without focal liver lesions achieved for the 200 test scans was approximately 99%. The error rates for pathologic and normal scans were comparable to results reported in the literature, which, however, were obtained for much smaller test sets.

Conclusion. The modified Perceptron algorithm has an accuracy of close to 99% in detecting pathology on CT scans of the liver showing either normal findings or hypovascularized focal liver lesions.

Key Words. Pattern recognition; CT; CAD; neoplasm.

© AUR, 2005
Advances in modern imaging technology, in particular in cross-sectional imaging modalities (computed tomography [CT] and magnetic resonance imaging [MRI]), produce an ever-increasing amount of data to be assessed by the radiologist.
An example of an abdominal CT examination obtained on a multislice CT scanner may illustrate the situation: with a scanning length of 45 cm, a slice thickness of 1 mm, and a reconstruction interval of 0.8 mm, 562 CT scans must be evaluated by the radiologist. To support the radiologist in image interpretation, different approaches to computer-assisted, semiautomatic image analysis have been proposed for radiography (1–4) and in particular for CT and MRI (5–7). In the literature, a combination of two methods has been used for such approaches: neural networks are trained on data extracted from texture analysis.
In neural networks, input data are used to adjust functional units in subsequent layers to a given classification problem, where different methods are employed to calculate the adjustment. Usually, the networks computed by such training procedures are of small depth: one or two hidden layers in back-propagation networks (1). In medical imaging, neural networks have been introduced as decision support tools (3, 8–12), e.g., to differentiate microcalcifications on mammograms into benign and malignant (3) or to assist the differential diagnosis of focal liver lesions based on ultrasound or CT images (9–12). The methods reported in the literature rely on feature extraction, for instance of contrast, spatial gray level distribution, or entropy. Six features commonly used in texture analysis are extracted in (8); 13 features are used in (3), which are then reduced to 6 features; and 49 features are used in (10), which are basically derived from 1 gray level feature and 8 texture features. In our approach, we use similar structures, namely networks (loop-free circuits) of functional elements, where the functional units are trained by a specific procedure. However, we use the entire image information as input data, i.e., we do not extract features from the images. The motivation is based mainly on two arguments. First, the choice of features is a subjective decision. The selection of features can, of course, be justified by experiments, but at this point we utilize a second, complexity-based argument: If we assume that tumor classification is a computationally difficult problem, then we can apply well-known results about the circuit complexity of decision problems from the class of the most complex problems associated with a given number n of input variables. The number of functional units (threshold functions) necessary to represent such complex problems on n variables can be estimated by c · (2^n / n)^(1/2) for a small constant c approximately in the range [2, 8] (13), regardless of the depth of the circuits. Thus, if we take c = 8 and n = 1 + 8 = 9 for the number of features used to perform the classification in (10), we obtain 61 as the estimated circuit complexity of image classification based on features. This number of functional units seems to be too small to cover the complex problem of tumor recognition.
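Written out with the values assumed above (c = 8 and n = 1 + 8 = 9 features, as in (10)), the estimate amounts to

\[ c\,\sqrt{2^{n}/n} \;=\; 8\,\sqrt{2^{9}/9} \;=\; 8\,\sqrt{512/9} \;\approx\; 60.3 , \]

i.e., the 61 threshold gates quoted above when rounded up.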
Here, we neglected the complexity of calculating features such as contrast or entropy from the image data. The complexity of these calculations depends on the image size and can be quite large, which is not the case in our approach, where the image size affects the "fan-in" of the threshold gates but not their number. Thus, in our approach we use ROIs (regions of interest) as input data and a relatively large number of 288 functional units to decide the classification problem, of which 252 are perceptron-like units that are adapted to the particular problem.

MATERIALS AND METHODS

Subjects and Image Acquisition

The input data for the algorithm were selected from contrast-enhanced liver CT examinations of a total of 200 patients (CT studies performed over a 4-year period). The scans were acquired in spiral technique on a Tomoscan Expander or a PQ 2000 (Philips Medical Systems, Eindhoven, The Netherlands) using similar scanning parameters. Technical data: 120 kV, 150 mAs, FOV 400 mm, collimation 7–8 mm, pitch 1–1.5, reconstruction interval 4–5 mm, and a delay of 70 sec after intravenous application of an iodinated contrast medium. A total of 50 examinations was selected as the normal population without focal liver lesions or with only one or several benign cysts (Group A). The other group (Group B) consisted of 150 examinations showing one or more hypovascularized liver tumors. The diagnosis in Group A was confirmed by follow-up MRI, CT, or ultrasound over a period of at least 12 months. The diagnoses in Group B were confirmed by biopsy of the liver lesion in 79 cases. In the other 71 cases, the new appearance of hepatic lesions in a patient with a known and histologically validated extrahepatic malignancy was regarded as confirmation of metastases. Table 1 summarizes the different tumor entities present in Group B and the ways in which the diagnoses were confirmed.

The inputs to the algorithm were square fragments (regions of interest, ROIs) of CT images. To reduce the image size, an experienced radiologist determined 2–15 (mean 9) ROIs per examination selected for Groups A and B. This was done because the interface of the program version used at the time accepted input data of this type only. ROIs with a size of 128 × 128 were selected from the original images with the 512 × 512 matrix typically used in CT. The ROIs were then converted into a matrix of 119 × 119 pixels with an 8-bit gray scale in DICOM standard format. Most importantly, all selected ROIs were different from each other; there was no overlap. All ROIs extracted from scans of patients in Group B contained lesions. Partial liver representations and liver vessels were not excluded from the ROI selection.
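The paper does not specify how the 128 × 128 ROIs were windowed and resampled to the 119 × 119, 8-bit input matrix; the following Python sketch illustrates one plausible preprocessing pipeline. The window settings, the nearest-neighbor resampling, the function names, and the synthetic stand-in data are assumptions for illustration only, not the authors' implementation.

import numpy as np

def extract_roi(ct_slice, top, left, size=128):
    # Cut a square ROI (default 128 x 128) from a 512 x 512 CT slice.
    return ct_slice[top:top + size, left:left + size]

def to_8bit(roi_hu, center=60.0, width=400.0):
    # Map Hounsfield units to an 8-bit gray scale with an assumed soft-tissue window.
    lo, hi = center - width / 2.0, center + width / 2.0
    clipped = np.clip(roi_hu, lo, hi)
    return np.round((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)

def resample(roi, out_size=119):
    # Nearest-neighbor resampling from 128 x 128 to 119 x 119.
    idx = np.round(np.linspace(0, roi.shape[0] - 1, out_size)).astype(int)
    return roi[np.ix_(idx, idx)]

# Demonstration on synthetic data standing in for a CT slice in Hounsfield units.
ct_slice = np.random.default_rng(0).normal(50.0, 20.0, size=(512, 512))
roi = resample(to_8bit(extract_roi(ct_slice, 200, 180)))
assert roi.shape == (119, 119) and roi.dtype == np.uint8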
Table 1
Frequency of the different tumors present in the 150 patients of Group B. The diagnosis was confirmed on biopsy either of the primary extrahepatic tumor site or the liver lesion

Tumor Entity                        Biopsy of Liver Lesion    Biopsy of Primary Tumor
Hepatocellular cancer                        18                        –
Cholangiocellular cancer                     11                        –
Colorectal cancer                            11                        23
Breast cancer                                 0                        20
Malignant melanoma                            4                        10
Neuroendocrine tumor                         12                         4
Tumor of the ENT                              0                         5
Renal cell carcinoma                          3                         3
Pancreatic carcinoma                          6                         0
Esophageal carcinoma                          2                         4
Cervical cancer                               0                         2
Angiosarcoma                                  3                         0
Adrenal gland carcinoma                       3                         0
Urothelial carcinoma                          4                         0
Metastasis from unknown primary               2                         –
Total                                        79                        71
Algorithm and Hardware

Theoretical Background

The algorithm adjusts the weights of perceptron-like functional units of the following loop-free circuit of threshold functions: The circuit consists of four layers, where perceptron-like units are placed in layers one and three only. At depth (layer) four, a simple voting function V4 decides on the final outcome. The input values to this output gate V4 come from t perceptron-like threshold functions P[3,i], i = 1, ..., t. Each P[3,i] produces an output from {−1, +1}. The input values to each P[3,i] come from q counting functions, i.e., at layer two there are q × t such counting gates V[2,j], j = 1, ..., q × t. Each V[2,j] produces an integer output from [−t, +t]. The input values to each V[2,j] come from t perceptron-like threshold functions P[1,k], where k = 1, ..., q × t². Each P[1,k] produces an output from {−1, +1}. Thus, the total number of perceptron-like threshold functions is t × (q × t + 1), and the total number of units (gates) is (t + 1) × (q × t + 1). The input values to each P[1,k] are the n = 119 × 119 = 14,161 pixel values. From preliminary experiments we determined t = 7 and q = 5, i.e., the circuit consists of a total of 288 units, of which 252 are perceptron-like threshold functions. The structure is totally different from the one used in (15), and the number of gates is much larger since no gray scale pre-processing is used. The perceptrons P[1,k] and P[3,i] are of the type Σ_s w_s x_s > δ with output from {−1, +1}, where s = 1, ..., 14,161 for P[1,k] and s = 1, ..., t for P[3,i].
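To make the structure concrete, the following Python sketch implements a forward pass through such a depth-four circuit with t = 7 and q = 5. The weights and thresholds are random placeholders (in the study they are learned), and the majority vote at V4 as well as the wiring in which each P[3,i] reads q disjoint counting gates are our assumptions; the code is illustrative rather than the authors' implementation.

import numpy as np

T, Q, N = 7, 5, 119 * 119                      # t, q, and the number of pixel inputs

rng = np.random.default_rng(0)
W1 = rng.normal(size=(Q * T * T, N))           # weights of the q*t^2 layer-one perceptrons P[1,k]
d1 = rng.normal(size=Q * T * T)                # their thresholds (delta)
W3 = rng.normal(size=(T, Q))                   # weights of the t layer-three perceptrons P[3,i]
d3 = rng.normal(size=T)

def classify(pixels):
    # Return +1 (lesion) or -1 (normal) for a flattened 119 x 119 ROI.
    p1 = np.where(W1 @ pixels > d1, 1, -1)                            # layer 1: outputs in {-1, +1}
    v2 = p1.reshape(Q * T, T).sum(axis=1)                             # layer 2: q*t counting gates, values in [-t, +t]
    p3 = np.where((W3 * v2.reshape(T, Q)).sum(axis=1) > d3, 1, -1)    # layer 3: t perceptrons P[3,i]
    return 1 if p3.sum() > 0 else -1                                  # layer 4: simple (majority) vote V4

print(classify(rng.random(N)))                 # demonstration on a random "image"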
The weights w_s are calculated by a combination of the classical perceptron algorithm (14) with a stochastic local search-based optimization procedure. The combination of both methods is motivated by the following observation: The perceptron algorithm is able to correctly classify sample data if the data are linearly separable, which is rarely the case for data related to tumor classification. Therefore, the task is to at least minimize the classification error produced by a single perceptron. Höffgen et al. (16) have shown that finding a perceptron that minimizes the number of misclassified data is a difficult combinatorial problem. Therefore, we need to apply heuristics to the training of perceptrons. Simulated annealing (17) is our choice of a local search-based optimization strategy. The neighborhood relation is defined by the set of samples misclassified according to the decision Σ_s w_s x_s > δ. After the neighborhood selection, the weights w_s are re-calculated according to the perceptron algorithm and the number of misclassified samples is calculated again. If the new perceptron passes the stochastic test associated with simulated annealing (see (17) for details), the calculation continues with the new weights; otherwise, a new neighbor is chosen in accordance with a specific generation probability. Here, we employ a "cooling schedule" in which the temperature is lowered at each successful transition step.

Training and Testing

For the training phase, a novel technique has been introduced: The perceptrons at layer one and layer three are trained by different methods. The training data of 800 samples are partitioned into two groups of equal size, uniformly distributed with respect to negative and positive samples. Thus, 200 positive and 200 negative examples are used to train the first-layer perceptrons. A random selection of 50 positive and 50 negative examples is used to train a single perceptron P[1,k] by the procedure described above. Since we do not assume that the sample sets are linearly separable, we do not aim at zero classification error on the training data. The simulated annealing-based search is interrupted when the size of the misclassified subset stabilizes. After completion of the training phase for layer one, the second subset of 200 + 200 samples is applied to the sub-circuits defined by each P[3,i], where the weights at layer one are now fixed as a result of the first step of the training procedure. In this way, the 50 + 50 random selection out of the 200 + 200 samples generates "new" training data from the outputs of the gates V[2,j] for each of the P[3,i]. This approach allows us to further improve the classification accuracy on the available training and test data.
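The core training step, applied to individual perceptrons at both layers, can be sketched as follows. This is a simplified illustration of the simulated annealing-driven perceptron training described above, not the authors' code: the cooling constant, the stopping rule, and the neighbor-generation details are assumptions, since the paper only outlines the scheme (see (14) and (17)).

import numpy as np

def train_perceptron_sa(X, y, steps=2000, t0=10.0, seed=0):
    # X: (m, n) sample matrix of flattened ROIs, y: (m,) labels in {-1, +1}.
    # Returns weights w and threshold delta of a single perceptron sum(w*x) > delta.
    rng = np.random.default_rng(seed)
    w, delta = np.zeros(X.shape[1]), 0.0

    def misclassified(w, delta):
        return np.flatnonzero(np.where(X @ w > delta, 1, -1) != y)

    cur = misclassified(w, delta)
    best_w, best_delta, best_err = w.copy(), delta, cur.size
    accepted = 0
    for _ in range(steps):
        if cur.size == 0:                      # training subset happens to be separable
            return w, delta
        i = rng.choice(cur)                    # neighbor: pick a misclassified sample
        w_new = w + y[i] * X[i]                # classical perceptron update (14)
        delta_new = delta - y[i]
        new = misclassified(w_new, delta_new)
        temp = t0 / np.log(accepted + 2)       # logarithmic cooling schedule
        # accept improvements always, deteriorations with a Boltzmann-type probability
        if new.size <= cur.size or rng.random() < np.exp((cur.size - new.size) / temp):
            w, delta, cur = w_new, delta_new, new
            accepted += 1                      # temperature is lowered per accepted transition
            if cur.size < best_err:
                best_w, best_delta, best_err = w.copy(), delta, cur.size
    return best_w, best_delta

# Example call on a random 50 + 50 training subset (placeholder arrays):
# w, delta = train_perceptron_sa(X_subset, y_subset)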
The algorithm was implemented and tested on a rather slow workstation (SUN Ultra 5/360; Sun Microsystems, Santa Clara, CA) with 256 MB RAM and a clock rate of 360 MHz; this system was used because of its UNIX environment. The set of learning examples consisted of 400 ROIs without and 400 ROIs with focal liver lesions, selected as described above. Following the learning phase, the algorithm was tested with 100 negative examples from Group A and 100 positive examples from Group B. The run-time required for the learning and testing phases was recorded. We calculated the accuracy of the classification into normal and abnormal (i.e., presence of a focal lesion) examples. We also recorded misclassified ROIs.
RESULTS

The duration of the training phase on the 800 ROIs (400 normal and 400 pathologic scans) was about 75 hours for the depth-four circuit tested in this study. The classification of a single case in the testing phase was performed within seconds on average. On the 100 ROIs of each group with normal and pathologic diagnoses, an accuracy of 98.5% in classifying scans with or without a hypovascularized lesion was achieved. The numbers of false-positive and false-negative classifications are listed in Table 2. The error rates on the 100 pathologic scans and the 100 normal scans were comparable. Figure 1 gives an example of a fragment without a focal liver lesion. There are several contrast-enhanced vessels corresponding to liver veins, which show up as hyperdense structures. These vessels did not interfere with the detection rate of the algorithm. An example of a focal liver lesion is given in Figure 2. A hypodense lesion is surrounded by normal liver parenchyma, corresponding to a liver metastasis in a patient with colorectal cancer.
DISCUSSION

The quality of image interpretation in radiology varies with the experience of the reader (radiologist) (18). This is one of the reasons for the development of automated procedures for reviewing imaging data and arriving at a computer-assisted diagnosis.
Table 2
Classification Results on 200 Test Samples for Two Depth-Four Circuits

[t,q]    False Positive    False Negative    Sensitivity    Specificity    Total Correct Classification
[9,3]          3                 3               97%            97%                97%
[7,5]          1                 2               99%            98%                98.5%
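The measures reported in Table 2 are understood here in their standard form, with TP, TN, FP, and FN denoting true-positive, true-negative, false-positive, and false-negative classifications:

\[ \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{Total correct classification} = \frac{TP + TN}{TP + TN + FP + FN}. \]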
Figure 1. Region of interest of a CT scan obtained during the portal venous phase showing normal, homogeneous liver parenchyma.
Figure 2. Example of a focal liver lesion. Hypovascularized lesion partially surrounded by normal liver parenchyma. This was a metastasis in a patient with colorectal cancer.
Another reason for automated image analysis aimed at preclassification of images into normal and abnormal findings is the large amount of image data generated by modern imaging modalities. With the advances made in computed tomography and the development of multislice CT scanners, up to 1,500 or even more individual images are generated during one CT examination. The same holds for magnetic resonance imaging, where the amount of data to be interpreted and processed is likewise increasing with the development of whole-body scans, e.g., in a screening setting. The method presented here for classifying CT scans into normal and abnormal findings is a first step toward automated image processing. The combination of the perceptron algorithm and simulated annealing used in our study requires only modest parameterization.
Basically, one has to set only the following parameters: the size of the subsets of the training set, the depth and size of the classification circuit, and some parameters of the associated simulated annealing-based procedure. To shorten the long learning phase of the algorithm, the image information was reduced by selecting fragments (ROIs) of the liver for investigating this preliminary version of the algorithm. This was necessary because computer hardware was a limiting factor for calculation speed at the time the study was performed. On the other hand, the liver is a homogeneous organ that appears especially suitable for a feasibility study because there is less interindividual variation in its normal appearance than, for instance, in the breast in X-ray mammography. Furthermore, selection of a region of interest reduces the variety of forms of extrahepatic structures, which might pose problems because of their complex anatomy, and allows for good training of the algorithm with respect to normal images without pathology. Moreover, the majority of focal liver lesions are depicted as round, smoothly marginated structures that are clearly delineated from the surrounding liver parenchyma. Thus, the present study is a simplified test of the algorithm with respect to the medical image material.

The accuracy of the algorithm is close to 99% and thus high. Other studies on detecting focal liver lesions in CT images have produced similar results: Shimizu et al. (11) investigated liver lesions in CT images from four different perfusion phases obtained in 10 patients and achieved a classification sensitivity of 100%. Mir et al. (12) report 99% correct classification of small liver lesions in contrast material-enhanced CT images; their method uses texture analysis. In our study, the classification is not restricted to selected features extracted from the images, i.e., the original CT image information is taken as the input to the algorithm. However, comparing the outcome of our computational experiments with published results is difficult since, unlike other areas of pattern recognition and machine learning, there are no commonly acknowledged benchmark sets. In our study, we used a relatively large number of 200 test samples, which exceeds the total number of images used in comparable studies, e.g., 86 images (3), 30 cases (8), and 147 ROIs (10). Further studies are needed to investigate the performance of the algorithm when the region of interest is extended to the entire liver or even the entire abdominal cross-section. Another option is to expand the algorithm to circuits of depths larger than four to achieve a higher accuracy.
In the present study, training of the algorithm on 800 cases took about 75 hours. However, the training phase is performed only once, and the classification (test) of a single image takes only a few seconds. With a more powerful system than the one used in our study, the speed-up should be roughly proportional to the ratio of the clock rates of the computers used. Moreover, the continuing advances in hardware make it likely that sufficiently powerful PCs will be widely available and, for instance, will also become affordable for radiological practices. Further improvement of the algorithm may also be achieved by incorporating combinations of CT scans obtained during different perfusion phases or combinations of different sequences and/or perfusion phases in MRI.
CONCLUSION

The implementation of the modified perceptron algorithm is a new approach to classifying radiological cross-sectional images. The classification accuracy of between 98% and 99% in detecting pathology on CT scans of the liver showing either normal findings or hypovascularized focal liver lesions is very promising. Thus, the combination of the perceptron algorithm with simulated annealing-based search, together with the depth-four network structure in which the perceptrons at depth three are trained on separate samples, has proved to be competitive. In fact, the algorithm provides a classification tool with which the radiologist can cut out an ROI of fixed size that is then classified within a few seconds. However, further development and testing of the algorithm are necessary before routine clinical application becomes possible.

REFERENCES

1. Asada N, Doi K, McMahon H, et al. Potential usefulness of an artificial neural network for differential diagnosis of interstitial lung diseases: a pilot study. Radiology 1990; 177:857–860.
2. Bohndorf K, Tolxdorff T, Pelikan E, et al. New developments in the computer-assisted diagnosis of focal bone lesions. Radiologe 1992; 32:416–422.
3. Chan H-P, Sahiner B, Petrick N, et al. Computerized classification of malignant and benign microcalcifications on mammograms: texture analysis using an artificial neural network. Phys Med Biol 1997; 42:549–567.
4. Link TM, Majumdar S, Grampp S, et al. Imaging of trabecular bone structure in osteoporosis. Eur Radiol 1999; 9:1781–1788.
5. Handels H, Tolxdorff T. A new segmentation algorithm for knowledge acquisition in tissue-characterizing magnetic resonance imaging. J Digit Imaging 1990; 3:89–94.
6. Bernarding J, Reul J, Tolxdorff T. Differentiation of normal and pathologic brain structures in MRI using exact T1 and T2 values followed by multidimensional cluster analysis. In: Greenes RA, et al., eds. MEDINFO 95 Proceedings. IMIA, 1995; 687–691.
7. Bernarding J, Braun J, Hohmann J, et al. Histogram-based characterization of healthy and ischemic brain tissues using multiparametric MR imaging including apparent diffusion coefficient maps and relaxometry. Magn Reson Med 2000; 43:52–61.
8. Chen EL, Chung PC, Chen CL, et al. An automatic diagnostic system for CT liver image classification. IEEE Trans Biomed Eng 1998; 45:783–794.
9. Maclin PS, Dempsey J. How to improve a neural network for early detection of hepatic cancer. Cancer Lett 1994; 77:95–101.
10. Gletsos M, Mougiakakou SG, Matsopoulos GK, et al. A computer-aided diagnostic system to characterize CT focal liver lesions: design and optimization of a neural network classifier. IEEE Trans Inf Technol Biomed 2003; 7:153–162.
11. Shimizu A, Hitosugi T, Nakagawa J-Y, et al. Development of computer-aided diagnosis system for 3D multi-detector row CT images of livers. International Congress Series 2003; 1256:1055–1062.
12. Mir AH, Hanmandlu M, Tandon SN. Texture analysis of CT images for early detection of liver malignancy. Biomed Sci Instrum 1995; 31:213–217.
13. Maass W. On the complexity of learning on neural nets. In: Proceedings of Computational Learning Theory: EuroCOLT '93. Oxford University Press, 1994; 1–17.
14. Minsky ML, Papert SA. Perceptrons. Cambridge, MA: MIT Press, 1969.
15. Albrecht A, Hein E, Steinhöfel K, et al. Bounded-depth threshold circuits for computer-assisted CT image classification. Artif Intell Med 2002; 24:179–192.
16. Höffgen K-U, Simon H-U, Van Horn KS. Robust trainability of single neurons. J Comput Syst Sci 1995; 50:114–125.
17. Aarts E. Local Search in Combinatorial Optimization. New York: Wiley & Sons, 1998.
18. Hillier JC, Tattersall DJ, Gleeson FV. Trainee reporting of computed tomography examinations: do they make mistakes and does it matter? Clin Radiol 2004; 59:159–162.