Diagnosis of solitary lung nodules using the local form of Ripley’s K function applied to three-dimensional CT data


Computer Methods and Programs in Biomedicine 90 (2008) 230–239


journal homepage: www.intl.elsevierhealth.com/journals/cmpb

Erick Corrêa da Silva a, Aristófanes Corrêa Silva a,∗, Anselmo Cardoso de Paiva a, Rodolfo Acatauassú Nunes b, Marcelo Gattass c

a Federal University of Maranhão-UFMA, Av. dos Portugueses, SN, Campus do Bacanga, Bacanga 65085-580, São Luís, MA, Brazil
b State University of Rio de Janeiro-UERJ, São Francisco de Xavier, 524, Maracanã 20550-900, Rio de Janeiro, RJ, Brazil
c Pontifical Catholic University of Rio de Janeiro-PUC-Rio, Department of Computer Science, R. Marquês de São Vicente, 225, Gávea 22453-900, Rio de Janeiro, RJ, Brazil

Article info

Article history:
Received 8 June 2007
Received in revised form 8 February 2008
Accepted 11 February 2008

Keywords:
Diagnosis of lung nodule
Ripley’s K function
Textural characterization

Abstract

This paper analyzes the application of Ripley’s K function to characterize lung nodules as malignant or benign in computerized tomography images. The proposed characterization method is based on a selection of measures from Ripley’s K function to discriminate between benign and malignant nodules, using stepwise discriminant analysis. Based on the selected measures, a linear discriminant analysis procedure is performed once again in order to predict the classification of each nodule. To evaluate the ability of these features to discriminate the nodules, a set of tests was carried out using a sample of 39 pulmonary nodules, 29 benign and 10 malignant. A leave-one-out procedure was used to provide a less biased estimate of the linear discriminator’s performance. The best setting of the analyzed function in the tested sample presented a sensitivity of 70.0%, with a specificity of 100.0% and an accuracy of 92.3%. Thus, preliminary results of this approach are very promising regarding its contribution to the diagnosis of pulmonary nodules, but it still needs to be tested with larger series and associated with other quantitative imaging methods in order to improve global performance.

© 2008 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Cancer is an increasing cause of mortality and has been filling the gap left by the decline in cardiovascular mortality observed in the last decade. Among the different types of cancer, lung cancer is considered the most important threat, not only for its significant incidence but also because it has become the leading cause of cancer mortality in men and women. In developed countries, lung cancer mortality rates exceed the sum of the mortality rates of prostate, breast and colorectal cancer [1].



Lung cancer is a serious public health problem in Europe, the United States and many other countries around the world. The disease is also known as the one with the shortest survival among malignancies [2]. Once the diagnosis is made, it is estimated that only 13% of patients will live for another 5 years [3]. In Brazil, lung cancer is the leading cause of cancer death in men and the second in women. An incidence of 27,000 new cases of this disease is expected in 2008 [4]. While lung cancer can be dramatically reduced by primary prevention, results of recent campaigns against smoking are expected to become apparent only after two or three decades. Meanwhile, lung cancer continues to present in advanced stages, with a global survival of around 13% in 5 years [3].

∗ Corresponding author. Tel.: +55 98 21098832; fax: +55 98 21098841. E-mail addresses: [email protected] (E.C. da Silva), [email protected] (A.C. Silva), [email protected] (A.C. de Paiva), [email protected] (R.A. Nunes), [email protected] (M. Gattass). 0169-2607/$ – see front matter © 2008 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.cmpb.2008.02.003

From the biological point of view, lung cancer is an uncontrolled growth of abnormal cells in one or both lungs, which can reproduce very fast and cause regional and distant metastases even when the patient is asymptomatic. Lumps of cancer cells (tumors) impair lung function when they obstruct bronchi, but cause less damage when they are located in other areas. The pulmonary parenchyma does not have pain nerve endings and, unless the tumor invades the parietal pleura early, pain will be a late sign of lung cancer [5].

Nowadays, the main chance of discovering lung cancer in its initial stage is by incidentally finding a solitary pulmonary nodule revealed by a chest X-ray or computerized tomography (CT) scan indicated to explore an abnormal thoracic clinical manifestation or performed in a routine preoperative evaluation. Another possibility, which has become important in recent years, is a CT lung cancer screening program in high-risk patients such as heavy smokers (with more than 30 years of tobacco use) [6].

The main issue with solitary pulmonary nodules is identifying their nature. Sometimes this is possible simply through radiological findings that allow a nodule to be diagnosed as benign due to total, central, lamellar or popcorn calcifications and high fat content (hamartoma). Alternatively, some features are highly suggestive of malignancy, such as spiculated margins and pleural tail, but unfortunately around 15% of these findings also occur in benign nodules. In many other cases it is not possible, by means of simple radiological criteria, to identify the nature of the nodule, which is then classified as undetermined. This situation is particularly frequent in nodules less than 1 cm in diameter, where benign etiology can account for more than 90% of the total [7,8].
The top row in Fig. 1 shows the texture from a slice of two benign (a and b) and two malignant (c and d) nodules. The bottom row in Fig. 1 shows their respective 3D shapes. At radiological examination, solitary pulmonary nodules are approximately round lesions smaller than 3 cm in diameter, completely surrounded by lung parenchyma, and can represent a benign or a malignant disease. Any larger lesion is named a pulmonary mass and should be considered malignant until proven otherwise. In all of these situations, etiologic definition is paramount to the medical decision.

Although histological examination, normally obtained by invasive procedures, is the gold standard for diagnosis, imaging methods and especially CT can aid the diagnostic process by analyzing the nodule’s attributes [9]. Radiological characteristics of benignity are well known and are based on calcifications or fat texture patterns that shift the mean radiological density out of the range of soft tissues. Malignancy does not have similar texture criteria and its diagnosis is normally suggested by an irregular shape associated with certain clinical data, such as the tobacco load. The administration of venous iodine contrast during CT improves texture resolution when discriminating between benign and malignant nodules [10]. Recently, there has been renewed attention to the quantification of wash-in and wash-out after contrast injection to characterize a nodule [11]. Unfortunately, the small diameters of nodules and patients’ allergic reactions to these techniques are limiting factors. Even the most modern metabolic imaging method in clinical use, positron emission tomography (PET) combined with helical CT examination (PET–CT), with image acquisitions before and after the intravenous administration of 18-fluoro-deoxyglucose, faces important limitations represented by the false positivity of some inflammatory processes and the false negativity of small or indolent cancers [12].

Computer-aided diagnosis (CAD) systems have been developed to assist radiologists and other specialized physicians in the diagnostic setting, such as in the early detection of lung cancer in radiographs and CT images. These systems can provide a second opinion and may be used as a first stage of radiological interpretation in the near future [13].
On the other hand, there are a number of reports on qualitative morphologic CT data in the medical literature, but there are relatively few reports on quantitative CT data and it seems that, in general, they are underutilized. The present quantitative work intends to apply a local form of Ripley’s K function, which is widely used in spatial studies, through point pattern analysis – as in Ecology – to 3D pulmonary nodules imaged by CT. This work’s main contribution and objective consists in observing the discriminatory power of this new method when distinguishing between benign and malignant nodules. This paper is organized as follows. Section 2 presents some prior works by the authors and related works. Section 3 describes the image database used, presents the segmentation procedure, and discourses about texture analysis presenting Ripley’s K function and explaining its application to nodule characterization in order to diagnose lung nodules. Discussion and analysis of the results using the proposed approach are presented in Section 4. Finally, Section 5 draws some concluding remarks.

Fig. 1 – Examples of benign and malignant lung nodules.

2. Background

The authors have been studying methods for the characterization and diagnosis of lung nodules using several different approaches. Initially, we analyzed the shape and geometrical characteristics of nodules, applying geometrical measures
such as sphericity index, convexity index and the number of skeleton branches, among others [14,15]. The discriminant potential of spatial statistics for texture characterization was subsequently investigated. Thus, a set of spatial statistical measures like semivariogram, covariogram, madogram, as well as some spatial autocorrelation indices such as Moran’s Index and Geary’s Coefficient were studied [16,15,17]. The results obtained from these previous works were very promising, and encouraged the investigation of other spatially based statistical methods that may be more efficient for lung nodule discrimination. Other authors have been hypothesizing that quantitative CT data derived from geometric and texture parameters may contribute to differential diagnosis between benign and malignant solitary pulmonary nodules, even without using contrast. McNitt-Gray et al. [18] extracted measurements from nodule shape, attenuation coefficient, attenuation distribution and texture. They used a discriminant analysis technique with a stepwise variable selection procedure to separate benign from malignant nodules. Kawata et al. [19,20] have presented a method to characterize the internal structure of 3D nodules using the shape index and density of computerized tomography images to locally represent each voxel. They created a histogram of characteristics based on shape index, called shape spectrum measurements, to store voxels with a given index to define the nodule. Matrices similar to those of the texture-analysis method (co-occurrence matrix) were also created for shape index and density. The statistical discriminant analysis technique was employed to classify benign and malignant nodules.

3. Computational methods

The process used in this work for lung nodule diagnosis is based on a four-step procedure, as illustrated in Fig. 2. The first is image acquisition, which is done through a CT exam of the patient’s chest. In the second step, the volume is formed and the 3D representation of the lung nodule is extracted using a region-growing algorithm [21]. Then a characterization step is initiated, starting with the application of a quantitative function to the nodule in order to define some of its determinant aspects. In this work, the local form of Ripley’s K function was used for this characterization. Finally, a stepwise discriminant analysis is performed to verify which set of the obtained measures most precisely discriminates the nodules into two classes, benign and malignant. This concludes the lung nodule’s classification.

Fig. 2 – Steps for lung nodule diagnosis.

3.1. Image acquisition

The images used herein were provided by the Fernandes Figueira Institute and the Pedro Ernesto University Hospital, both in the city of Rio de Janeiro, for a CAD tool development project. They were obtained from different real patients, comprising a total of 39 nodules (29 benign and 10 malignant). The images were acquired with a GE Pro Speed helical tomograph under the following conditions: tube voltage 120 kVp, tube current 100 mA, image size 512 × 512 pixels, voxel size 0.67 mm × 0.67 mm × 1.0 mm. The images were quantized in 12 bits and stored in DICOM format [22].

It is important to stress that the CT exams were performed without contrast injection, which may be used clinically to improve diagnosis but also causes some morbidity and occasional mortality through allergic complications. It is also necessary to highlight that the nodules were previously diagnosed by physicians and that the final diagnosis of benignity or malignancy was further confirmed by histopathological exam of the surgically removed specimen or by 3-year radiological stability. This also explains the reduced size of the present sample.

In our work the nodule’s size is considered to be the maximum diameter of the sphere that involves the most distant points in axes xy or z. According to this definition, the mean diameter of the benign nodules was 23.72 mm (standard deviation 13.34) and the mean diameter of the malignant nodules was 40.93 mm (standard deviation 17.86). There are some nodules that have a diameter equal to 3 cm or less in the xy-axis (the most common definition of a nodule), but in the z-axis their diameter is larger than 3 cm. The data set used contains malignant nodules with a larger mean diameter than the benign ones, which is a common finding. The general characteristic that malignant nodules are larger than benign ones is known and normally found in the specialized literature [23]. But this does not mean that there is a cutoff diameter to separate malignant from benign nodules.
In the studies, usually only percentages are given. In a review study of patients with either screening-detected or incidentally detected lung nodules, the prevalence of malignancy was 6–28% in nodules measuring 5 to 10 mm in diameter, and 64–82% in nodules measuring over 20 mm in diameter [24]. In our work the smallest malignant nodule had a diameter of 12 mm and the smallest benign one 7 mm, but there were malignant nodules with diameters of, for example, 28 and 29 mm, while there were benign nodules with diameters of 35 and 36 mm. Although diameter is a general reference and is included in the Bayesian method to aid in the distinction between benign and malignant nodules, it does not have decisive value alone [25]. Ripley’s K function, which adds 3D texture data, intends to provide a new contribution to this scenario.

Although the literature varies on this point, an attenuation value of 200 HU is advocated by many as a good discriminator between calcified and non-calcified nodules. In our data set, 15 of the 29 benign nodules had voxels equal to or above 200 HU, and 3 of the 10 malignant nodules had voxels above 200 HU.

3.2. Lung nodule segmentation

In most cases, large lung nodules are easily detected visually by physicians, since their shape and location are in contrast with other normal lung structures. However, the voxel density of nodules is similar to that of other structures, such as blood vessels, which makes any kind of automatic computer detection difficult. Generally speaking, solitary pulmonary nodules are normally found unexpectedly in chest X-rays or CTs. The main reason for this is that large (>1 cm) nodules can be easily distinguished from the surrounding structures. If this distinction is difficult, the nodule’s diagnosis is difficult as well.

In an evaluation of an automatic segmentation process or a program created to support lung cancer screening, the gold standard is usually the analysis made by one or more radiologists. For example, it can be difficult for an automatic segmentation program to distinguish between a nodule and the chest wall, but it is relatively easy for an experienced observer to separate these two structures. The same occurs with a nodule close to a vascular structure. Unless the nodule is in a central position close to a hilar vessel, the distinction is not difficult. However, small (<1 cm) nodules represent a different scenario. In this setting the radiological diagnosis can be difficult, as can the separation from vascular structures. Refs. [26,13] are examples of authors who consider the radiologist as a reference to evaluate computer analysis.

In our work, a semi-automatic segmentation process was performed using a Pulmonary Nodule Analysis System called Bebúi [27]. In this system, beyond the 3D region-growing algorithm with voxel aggregation, two resources help and provide greater control over the segmentation procedure: the barrier and the eraser. The barrier is a cylinder placed around the nodule by the user with the purpose of restricting the region of interest and preventing the segmentation by voxel aggregation from invading other lung structures.
The eraser allows physicians to erase undesired structures, either before or after segmentation, in order to avoid and correct segmentation errors [27]. From our sample, 11 nodules required manual intervention using the barrier and eraser because they were close to other lung structures. The bottom row in Fig. 1 shows the 3D reconstruction of the nodules in the top row and exemplifies nodule segmentation.
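As an illustration only, region growing restricted by a cylindrical barrier could be sketched as below. This is not the Bebúi implementation: the intensity window (`low`, `high`), the 6-connected neighborhood, the dictionary representation of the volume and the function names are all assumptions made for this sketch, since the acceptance criterion of the voxel aggregation is not detailed here.

```python
from collections import deque

# Hypothetical sketch, not the Bebui code: 6-connected 3D region growing.
# `volume` maps (x, y, z) -> intensity; a voxel is aggregated when its
# intensity lies in [low, high] (assumed criterion) and it is inside the barrier.
def region_grow_3d(volume, seed, low, high, inside_barrier=lambda p: True):
    def accepted(p):
        return p in volume and low <= volume[p] <= high and inside_barrier(p)

    if not accepted(seed):
        return set()
    region = {seed}
    queue = deque([seed])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in steps:
            q = (x + dx, y + dy, z + dz)
            if q not in region and accepted(q):
                region.add(q)
                queue.append(q)
    return region


def cylinder_barrier(cx, cy, radius, z_min, z_max):
    """Return a predicate that is True for voxels inside a cylinder along z."""
    def inside(p):
        x, y, z = p
        return (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 and z_min <= z <= z_max
    return inside


# Toy volume: a row of six voxels with the same intensity.
toy_volume = {(x, 0, 0): 100 for x in range(6)}
barrier = cylinder_barrier(cx=0, cy=0, radius=2, z_min=0, z_max=0)
unrestricted = region_grow_3d(toy_volume, (0, 0, 0), 50, 150)
restricted = region_grow_3d(toy_volume, (0, 0, 0), 50, 150, barrier)
```

In the toy example the unrestricted growth aggregates all six voxels, while the barrier stops the growth at the cylinder boundary, which is the role the barrier plays when a nodule touches neighboring structures.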


3.3. Textural characterization of nodules applying Ripley’s K function

Textures represent tonal variations in the spatial domain and determine the overall visual smoothness or coarseness of image features. They reveal important information about the structural arrangements of the objects in the image and their relationship with the environment. Consequently, texture analysis provides important discriminatory characteristics related to variability patterns of digital classifications.

Texture processing algorithms are usually divided into three major categories: structural, spectral and statistical [28]. Structural methods consider textures as repetitions of basic primitive patterns with a certain placement rule [29]. Spectral methods are based on the Fourier transform, analyzing the power spectrum [29]. The third and most important group in texture analysis is that of statistical methods, which are mainly based on statistical parameters such as the Spatial Gray Level Dependence Method (SGLDM), the Gray Level Difference Method (GLDM) and Gray Level Run Length Matrices (GLRLM) [30].

In practice, some of the most common terms used by interpreters to describe textures, such as smoothness or coarseness, bear a strong degree of subjectivity and do not always have a precise physical meaning. Analysts are capable of visually extracting textural information from images, but it is not easy for them to establish an objective model to describe this intuitive concept. For this reason, the development of quantitative approaches has been necessary to obtain texture descriptors. Thus, in a statistical context, textures can be described in terms of an important conceptual component associated with pixels (or other units): their spatial association. This component is frequently analyzed at the global level by quantifying the aggregation or dispersion of the element under study [31].

In this work, texture analysis is done by quantifying the spatial association between individual voxel values from nodule images by applying the local form of Ripley’s K function, which will be discussed in the following subsection.

3.3.1. Ripley’s K function

Patterns of “small” objects in two or three dimensions or on the surface of terrestrial or celestial spheres are commonplace; some examples are towns in a region, trees in a forest and galaxies in space. Other spatial patterns such as a sheet of biological cells can be reduced to a pattern of points [32]. Most systems in the natural world are not spatially homogeneous but display some kind of spatial structure. As the name suggests, point pattern analysis comprises a set of tools for looking at the distribution of discrete points [33], for example individual voxels in a three-dimensional image that has been mapped in Cartesian coordinates (x, y, z) within a study volume. Point pattern analysis has a long history in statistics and the vast majority of point pattern methods rely on a single distance measurement. There are plenty of indices (most use the Poisson distribution [34] as the underlying model for inferences about patterns) able to quantify the intensity of the pattern at multiple scales.

Point patterns can be studied by first-order and second-order analysis. The first approach uses point-to-point mean distance or derives a mean area per point, and then inverts this to estimate a mean point density from which the test statistics about the expected point density are derived [33]. Second-order analysis looks at a larger number of neighbors beyond the nearest neighbor. This group of methods is used to analyze the mapped positions of objects in the plane or space, such as trees’ stems, and assumes a complete census of the objects of interest in the zone (area or volume) under study [33]. One of the most commonly used second-order methods is Ripley’s K function.

Ripley’s K function is a tool for analyzing completely mapped spatial point process data, i.e. data on the locations of events. These are usually recorded in two dimensions, but there may be locations along a line or in 3D space. Completely mapped data include the locations of all events in a predefined study area. Ripley’s K function can be used to summarize a point pattern, test hypotheses about the pattern, estimate parameters and fit models [32].

Ripley’s K method is based on the number of points tallied within a given distance or distance class. Its typical definition for a given radius, t, is:

$$K(t) = \frac{A}{n^2} \sum_{i} \sum_{j} \delta_{ij}(d_{ij}) \tag{1}$$

for $i \neq j$, where $A$ is the sampled area, $n$ is the total number of points and $\delta_{ij}$ is an indicator function equal to 1 if the distance $d_{ij}$ between the points on locations $i$ and $j$ is lower than the radius $t$, else it takes on 0. In other words, this method counts the number of points within a circle of radius t from each point, as Fig. 3 shows.

Fig. 3 – Schematic illustration in 2D of measurement of Ripley’s K function. The points of interest within the circle are counted.

Although it is usual to assume stationarity, which means the minimal assumption under which inference is possible from a single observed pattern, K(t) can be interpreted as a non-stationary process because K(t) is defined in terms of a randomly chosen event. It is also usual to assume isotropy, i.e. that one unit of distance in the vertical direction has the same effect as one unit of distance in the horizontal direction.

As every point in the sample is taken once as the center of a plot circle, Ripley’s K function provides an inference at the global level of the studied element. However, this measure can also be considered in a local form for the ith point [35]:

$$K_i(t) = \frac{A}{n} \sum_{i \neq j} \delta(d_{ij}) \tag{2}$$

The analysis of patterns in three dimensions has received less attention than two-dimensional analysis, in part because such data may be harder to find. However, Ripley’s K function analysis can be modified for three dimensions, as described by Baddeley et al. [36]. It retains the usual notation:

$$K_{3D}(t) = \frac{V}{n^2} \sum_{i} \sum_{j} \delta_{ij}(d_{ij}) \tag{3}$$

with $V$ as the plot volume. It represents the number of points within a sphere of radius t with center on each point. In a local form, for analysis in three dimensions, it is defined by:

$$K_{i3D}(t) = \frac{V}{n} \sum_{i \neq j} \delta(d_{ij}) \tag{4}$$

In this work we propose a lung nodule characterization from CT images by quantifying the spatial association among individual voxel values through the local form of Ripley’s K function, Eq. (4), applied to three-dimensional data, in order to verify its accuracy in discriminating between malignant and benign nodules.
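A minimal sketch of the local three-dimensional form referred to as Eq. (4) is given below, purely as an illustration. The function name, the list-of-tuples representation of the points and the choice of expressing coordinates, radius and volume in the same (arbitrary) units are assumptions of this sketch; it counts the points other than the center that lie strictly closer than t and scales the count by V/n.

```python
import math

# Illustrative sketch of the local 3D form of Ripley's K function:
# (V / n) times the number of points j != i with distance d_ij < t.
# `points` is assumed to hold every mapped point, including `center`.
def local_ripley_k_3d(center, points, t, volume):
    n = len(points)
    if n == 0:
        raise ValueError("points must not be empty")
    count = 0
    for p in points:
        if p == center:
            continue  # the i != j condition: the center is not counted
        if math.dist(center, p) < t:  # indicator function of the distance
            count += 1
    return volume / n * count


# Toy pattern of four points; the distances from the origin are 1, 2 and about 5.2.
toy_points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (3, 3, 3)]
k_value = local_ripley_k_3d((0, 0, 0), toy_points, t=2.5, volume=8.0)
print(k_value)  # prints 4.0, i.e. (8.0 / 4) * 2
```

The global form would average this quantity over every point taken as center, which is why the local form is much cheaper to compute.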

3.3.2. Application of Ripley’s K function to the nodules

As our idea was to analyze the spatial association of each individual voxel value, we chose to do this analysis at the local level to avoid the large computational effort of doing it at the global level. Thus, the measures (variables) extracted from the nodules, considered as texture signatures, were obtained by computing the local form of Ripley’s K function for each gray level from the 3D volume generated by the composition of CT images, as stated in Eq. (4).

If the proposed procedure were applied to the original nodule images, obtained in 12 bits, as already mentioned, we would have 4096 gray levels to analyze, which would also require a large computational time. In order to reduce the computational effort we performed experiments with nodules quantized to 3, 4, 5, 6, 7 and 8 bits (or 8, 16, 32, 64, 128 and 256 gray levels, respectively) and observed which ones provided better information to discriminate the nodules. For each quantization level we applied Eq. (4) to each individual voxel value. For example, for a nodule quantized to 8 bits, we obtained the 256 function values of $K_{i3D}(t)$, where i = 0, 1, 2, . . . , 255. Thus, we obtained a set of 504 (equal to 8 + 16 + 32 + 64 + 128 + 256) different values of $K_{i3D}(t)$.

In order to carry out the analysis along almost the entire nodule we performed the analysis using five different distance classes for the radius t. First, we took the nodule’s central voxel and then we took the farthest one from it. We consider as central voxel the point found in the median of the X and Y ranges (using Cartesian orientation) from the middle slice of the segmented object. The distance d between these two voxels was adopted as the base value to compute the distance classes employed. Fig. 4(a) shows this idea in 2D, VC being the central voxel representation and VF representing the farthest one. The different kinds of points illustrate different gray levels (or voxel densities). The analysis was made applying Eq. (4) taking the central voxel as the ith point in which the sphere with radius t was centered, and then computing it with t values of (1/6)d, (1/3)d, (1/2)d, (2/3)d and (5/6)d. Fig. 4(b) illustrates a two-dimensional idea of this analysis. Thus, it was possible to observe the spatial association among individual voxel values from central locations to peripheral zones of the nodules.

Fig. 4 – Schematic illustration in 2D of computing the local form of Ripley’s K function for different distance classes: (a) distance d between the central point and the farthest one from it; (b) sphere representation in 2D with radius t equal to (1/6)d, (1/3)d, (1/2)d, (2/3)d and (5/6)d.

It is important to keep in mind that we used just the segmented nodule’s image, so only the voxels considered part of the nodule were used in the calculations of Eq. (4). Thus, there were “empty” voxels within the spheres that were not computed. There is no doubt that, in the segmented nodule, voxels with the partial contribution of lung parenchyma may exist, but exclusive lung parenchyma voxels were not included by the region-growing algorithm used for nodule segmentation in the automatic phase. This phase is preceded by a manual phase to exclude “touch regions” like chest wall, vessels or large fibrous lines, preserving lung tissue.
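The extraction of the texture signature for one distance class could be sketched as follows. This is one possible reading of the procedure, not a definitive implementation, and several choices are assumptions of the sketch: quantization is done by dropping low-order bits of the stored 12-bit value, the volume V is measured in voxel units (so that V/n = 1), the central voxel is passed in by the caller, and all names are hypothetical.

```python
import math

# The five distance classes, as fractions of the distance d to the farthest voxel.
RADIUS_FRACTIONS = (1 / 6, 1 / 3, 1 / 2, 2 / 3, 5 / 6)


def quantize(value, bits, source_bits=12):
    """Assumed quantization: keep the `bits` most significant of `source_bits` bits."""
    return value >> (source_bits - bits)


def gray_level_signature(voxels, center, bits, fraction):
    """One local-K value per gray level, centered on the central voxel.

    `voxels` maps (x, y, z) -> stored 12-bit value (0..4095) and contains
    only voxels of the segmented nodule, so "empty" voxels are never counted.
    """
    n = len(voxels)
    volume = float(n)  # assumption: V expressed in voxel units
    d = max(math.dist(center, p) for p in voxels)  # distance to the farthest voxel
    t = fraction * d
    counts = [0] * (1 << bits)
    for p, value in voxels.items():
        if p != center and math.dist(center, p) < t:
            counts[quantize(value, bits)] += 1
    return [volume / n * c for c in counts]


def nodule_features(voxels, center, fraction):
    """Concatenate the signatures for 3 to 8 bits: 8 + 16 + ... + 256 = 504 values."""
    return [k for bits in range(3, 9)
            for k in gray_level_signature(voxels, center, bits, fraction)]


# Toy nodule with four voxels on a line; d = 6, so t = 3 for the (1/2)d class.
toy_nodule = {(0, 0, 0): 4095, (1, 0, 0): 0, (2, 0, 0): 4095, (6, 0, 0): 100}
signature_3bit = gray_level_signature(toy_nodule, (0, 0, 0), bits=3, fraction=1 / 2)
features = nodule_features(toy_nodule, (0, 0, 0), fraction=1 / 2)
```

In the toy example, two voxels fall inside the sphere, one in the lowest and one in the highest of the eight gray levels, and the concatenated vector has the 504 entries mentioned in the text.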

3.4. Classification using linear discriminant analysis (LDA)

SPSS (Statistical Package for the Social Sciences) [37], a statistical software package, was used for the analysis and classification of nodules. It was used in order to perform a linear discriminant analysis [38] with a stepwise procedure. The stepwise procedure is performed before the leave-one-out classification process and selects the variables that better discriminate the groups [39], eliminating redundant information and those variables that do not contribute to a distinction between the groups (benign and malignant nodules). In our case, for example, that set of 504 variables was reduced to 6 based on the analysis performed for t = (1/3)d. Thus, even with a reduced sample there was a greatly reduced risk of overtraining.

3.5. Validation and evaluation of the classification methods

In order to validate the classificatory power of the discriminant function, the leave-one-out technique [40] was employed. Through this technique, the candidate nodules from 38 of the 39 cases in our database were used to train the classifier; the trained classifier was then applied to the candidate nodule in the remaining case. This procedure was repeated until all 39 cases in our database had been the “remaining” case. It is important to keep in mind that for all 39 cases the variables used for the LDA are the ones previously selected with the stepwise procedure.

In order to evaluate the classifier with respect to its differentiation ability, we assessed its sensitivity, specificity and accuracy. Sensitivity is defined by TP/(TP + FN), specificity is defined by TN/(TN + FP), and accuracy is defined by (TP + TN)/(TP + TN + FP + FN), where TP is true-positive, TN is true-negative, FN is false-negative, and FP is false-positive. Herein, true-positive means a malignant nodule classified as malignant, i.e. the cancer presence was diagnosed correctly. The meaning of the other terms is analogous.

4. Results

Table 1 shows the results obtained with the proposed method. Values obtained from the presented function applied to each gray level for each quantization were computed separately, and all were combined for the five different distance classes previously described. The best results were obtained from the analysis with t = (1/3)d, which had accuracy above 90% and specificity of 100%. For all other distance classes, specificity above 90% and accuracy around 80% were verified.

Table 1 – Accuracy analysis in the diagnosis of lung nodules (% success)

Distance class t    Specificity    Sensitivity    Accuracy
(1/6)d              93.1           40.0           79.5
(1/3)d              100            70.0           92.3
(1/2)d              96.6           60.0           87.2
(2/3)d              96.6           60.0           87.2
(5/6)d              93.1           40.0           84.6

We believe that the analysis carried out with t = (1/3)d provided the best result because the sphere with radius equal to this distance class still completely encloses the central calcification that some of the benign nodules from our sample present, thus computing more precisely the benignity features of the nodules. For other benign nodules without central calcification this function can still discriminate most central density characteristics, which are lower than but close to the calcification criteria. The classifier obtained 100% specificity, which means that the classification was correct for all benign nodules. However, the result could change with a larger series. On the other hand, because the sensitivity was 70%, a few malignant nodules were interpreted as benign nodules. In practice, this diagnosis could prevent unnecessary surgical intervention for benign nodules, but some cases of early lung cancer detection could have their chances of cure decreased.
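As a quick check of the figures reported for t = (1/3)d, the three measures can be recomputed from confusion counts. The counts used below are not stated explicitly in the text; they are derived here from the sample composition (29 benign, 10 malignant) together with the reported 100% specificity and 70% sensitivity, and the function name is only illustrative.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Derived counts (assumption): all 29 benign nodules correct, 7 of 10 malignant correct.
sens, spec, acc = diagnostic_metrics(tp=7, tn=29, fp=0, fn=3)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # prints: 70.0% 100.0% 92.3%
```

The accuracy of 92.3% corresponds to 36 correctly classified nodules out of 39.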

At the same time, we supposed that the spheres with radius t greater than (1/3)d (in this work, (1/2)d, (2/3)d and (5/6)d) circumscribe the region with the central calcification, thus also enclosing regions where the distinction between malignant and benign nodules is not so clear and providing less discriminant information to the classifier. Accordingly, as can be seen in Table 1, specificity, sensitivity and accuracy are lower for these distance classes.

Based on the results shown in Table 1, more detailed results of the analysis made with distance class t = (1/3)d are shown in Table 2. They confirm that the local form of Ripley’s K function applied to three-dimensional data was a valuable tool to characterize lung nodules and to discriminate between benign and malignant ones when that distance class was adopted and all measures extracted from each proposed quantization were combined.

Table 2 – Detailed accuracy analysis in the diagnosis of lung nodules (% success)

Distance class t    Quantization    Specificity    Sensitivity    Accuracy
t = (1/3)d          3 bits          86.2           70.0           71.8
                    4 bits          86.2           70.0           71.8
                    5 bits          86.2           70.0           71.8
                    6 bits          93.1           70.0           76.9
                    7 bits          89.7           50.0           79.5
                    8 bits          93.1           40.0           84.6
                    All combined    100.0          70.0           92.3

We believe that if this process were repeated with the same nodules segmented by different specialists, these results would not change significantly, because in this series all nodules are solid, relatively well delimited and without ground-glass components. Only a few points would differ, the central point would not shift significantly, and spheres with similar volumes would be obtained. Thus, the values of Ripley’s K function would not be very different.

Taking into account the analysis carried out with the distance class t = (1/3)d, Figs. 5–7 show the application of the local form of Ripley’s K function to the volumes represented by Fig. 1 on each nodule quantization. The values of Ki3D((1/3)d) are normalized in the graphs. In fact, as Ripley’s K function counts the number of occurrences of an event inside a region, these graphs are a proportion of the nodules’ histograms in the volume related to the referred radius.

Fig. 5 – Eq. (4) applied to examples in Fig. 1 quantized to 3 and 4 bits.

Fig. 6 – Eq. (4) applied to examples in Fig. 1 quantized to 5 and 6 bits.

Fig. 7 – Eq. (4) applied to examples in Fig. 1 quantized to 7 and 8 bits.

Recall that Eq. (4) takes into account the number of occurrences of an event (the presence of each gray level) in a region under study with respect to all occurrences of this event in the universe of the sample. Observing the graphs in Figs. 5–7, high values of Ki3D((1/3)d) mean that this ratio is high, but they give no idea of the number of occurrences of each individual voxel value. Thus, it could be said that, in benign nodules, voxels with higher density from the central calcification are not always in the nucleus of this calcification. In malignant nodules, in turn, although histopathological studies indicate the absence of central calcification, Ki3D((1/3)d) shows that a high percentage of the few high-density voxels in these nodules are located in the nucleus covered by the sphere with radius t = (1/3)d.

Figs. 8 and 9 show the application of Ki3D(t) to the volumes represented by Fig. 1 quantized to 8 bits, taking into account the distance classes t = (1/6)d, t = (1/2)d, t = (2/3)d and t = (5/6)d. The application with t = (1/3)d was already shown in Fig. 7.

Fig. 8 – Eq. (4) applied to examples in Fig. 1 quantized to 8 bits and with distance classes t = (1/6)d and t = (1/2)d.

Fig. 9 – Eq. (4) applied to examples in Fig. 1 quantized to 8 bits and with distance classes t = (2/3)d and t = (5/6)d.

Comparing the graphs, we can notice that, as the distance class t is increased, the values of Ki3D(t) grow. This is easy to understand by recalling that the function under study takes into account the frequency of occurrences of an individual voxel value in a region of study with respect to all occurrences of this event in the nodule’s universe. That is, as the volume under study is increased, the number of occurrences of a voxel value considered for analysis in this region increases accordingly. In Fig. 9, it can be observed that for t = (5/6)d, in all nodules almost all individual voxel values provided the maximum or a very high value of Ki3D(t). This explains why one cannot carry out analyses with distance class t = (6/6)d (or simply t = d): with this distance class, all events (all individual voxel values) would provide the maximum value of Ki3D(t), because the whole sample (the entire nodule) would be enclosed by the plot volume. This would prevent the classifier from finding variables that discriminate between benign and malignant nodules.

With respect to comparisons between this approach and the other methods mentioned in Section 1 on the same data set, this technique presented better results regarding specificity, but unfortunately it did not provide very good sensitivity results.
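The growth of Ki3D(t) with t, and its saturation at t = d, can be reproduced on a toy volume. The Python sketch below is a simplified, normalized stand-in for the local K function and not Eq. (4) itself: for each gray level it returns the fraction of that level's voxels lying inside a sphere of radius t around the nodule center. The synthetic volume, the function name `local_k` and the choice of d as the largest center-to-voxel distance are assumptions made for illustration.

```python
import math


def local_k(voxels, center, t):
    """Fraction of each gray level's voxels lying within distance t of center.

    voxels maps (x, y, z) -> gray level. This is a simplified, normalized
    stand-in for the local K function, not Eq. (4) itself.
    """
    total = {}
    inside = {}
    cx, cy, cz = center
    for (x, y, z), gray in voxels.items():
        total[gray] = total.get(gray, 0) + 1
        squared_distance = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
        if squared_distance <= t * t:
            inside[gray] = inside.get(gray, 0) + 1
    return {gray: inside.get(gray, 0) / count for gray, count in total.items()}


# Synthetic "nodule": a ball of radius 3 voxels with a dense core.
# Gray level 2 = core, 1 = intermediate shell, 0 = outer shell.
voxels = {}
for x in range(-3, 4):
    for y in range(-3, 4):
        for z in range(-3, 4):
            r2 = x * x + y * y + z * z
            if r2 <= 9:
                voxels[(x, y, z)] = 2 if r2 <= 1 else (1 if r2 <= 4 else 0)

center = (0, 0, 0)
# Assumption: d is the largest distance from the center to a nodule voxel.
d = max(math.sqrt(x * x + y * y + z * z) for (x, y, z) in voxels)

# Distance classes as (numerator, denominator) fractions of d;
# (6, 6) is the degenerate class t = d discussed in the text.
fractions = [(1, 6), (1, 3), (1, 2), (2, 3), (5, 6), (6, 6)]
curves = {f: local_k(voxels, center, d * f[0] / f[1]) for f in fractions}
```

In this toy volume, t = (1/3)d encloses exactly the dense core, so only gray level 2 reaches the value 1, whereas at t = d every gray level reaches 1 and the measure no longer distinguishes anything, in line with the saturation argument above.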

5. Conclusions

This paper has presented a point pattern analysis function to characterize lung nodules as malignant or benign. The measures extracted from Ripley’s K function were analyzed and showed great discriminatory power when discriminant analysis was used to make the classification. Unfortunately, in spite of a specificity of 100%, the sensitivity was 70%, presenting some false-negative results. However, these numbers were obtained without contrast injection, which has been clinically used to increase specificity and sensitivity but also carries some morbidity and mortality from allergic complications. The number of studied nodules in our dataset is too small to draw definitive conclusions, but the preliminary results of this work are very encouraging, demonstrating that a linear discriminant classifier using point pattern analysis, applying the local form of Ripley’s K function to three-dimensional sample data, can contribute to discriminating benign from malignant lung nodules on CT images. Based on these results, we have observed that such measures provide significant support to a more detailed clinical investigation, and the results were very encouraging when nodules were classified with discriminant analysis. Nevertheless, it is necessary to perform tests with a larger database and more complex cases in order to obtain a more precise behavior pattern. In addition, due to the relatively small size of existing CT lung nodule databases and the various CT imaging acquisition protocols, it is difficult to compare the diagnostic performance of the algorithm developed here with that of others proposed in the literature. In spite of the good results obtained only by analyzing the spatial association of textures using the local form of Ripley’s K function applied to three-dimensional data, further information could be obtained by analyzing point patterns using other rules to determine the plot volume or just by applying the global form of the function under study. As future work, we propose to investigate these alternatives in order to verify the possibility of a more precise and reliable diagnosis.

Acknowledgements

We would like to thank CAPES (process 0044/05-9) for funding this research, and the staff from Instituto Fernandes Figueira, particularly Dr. Márcia Cristina Bastos Boechat, for the images provided.

References

[1] D.M. Libby, J.P. Smith, N.K. Altorki, M.W. Pasmantier, D. Yankelevitz, C.I. Henschke, Managing the small pulmonary nodule discovered by CT, Chest 125 (2004) 1522–1529.
[2] A.B. Tarantino, Nódulo Solitário do Pulmão, 4th edition, Guanabara Koogan, Rio de Janeiro, 1997, pp. 733–753, Chapter 38.
[3] R. Lag, SEER cancer statistics review, 1975–2002, Tech. rep., National Cancer Institute, Bethesda, MD, available at http://seer.cancer.gov/csr/1975_2002/, based on November 2004 SEER data submission, posted to the SEER web site 2005.
[4] INCA, Estimativas 2008: Incidência de câncer no Brasil, 2007, available at www.inca.gov.br.
[5] D. Schrump, N. Altorki, C. Henschke, D. Carter, A. Turrisi, M. Gutierrez, Non-small cell lung cancer, in: V.T. De Vita, S. Hellman, S.A. Rosenberg (Eds.), Cancer, Principles and Practice of Oncology, 6th edition, Lippincott Williams and Wilkins, Philadelphia, 2005, pp. 753–810.
[6] C.I. Henschke, D.F. Yankelevitz, CT screening for lung cancer, Radiol. Clin. North Am. 3 (38) (2000) 487–495.
[7] G.A. Lillington, C.J. Caskey, Evaluation and management of solitary and multiple pulmonary nodules, Clin. Chest Med. 14 (1) (1993) 111–119.
[8] C.I. Henschke, D.F. Yankelevitz, D.P. Naidich, D.I. McCauley, G. McGuiness, D.M. Libby, J.P. Smith, M.W. Pasmantier, O.S. Mietinnen, CT screening for lung cancer: significance of nodules at baseline according to size, Radiology 30 (1) (2003) 11–15.
[9] D. Ost, A.M. Fein, S.H. Feinsilver, The solitary pulmonary nodule, N. Engl. J. Med. 25 (348) (2003) 2535–2542.
[10] S.J. Swensen, The probability of malignancy in solitary nodules. Application to small radiologically indeterminate nodules, Arch. Intern. Med. 8 (157) (1997) 849–855.
[11] Y. Jeong, K. Lee, S. Jeong, M. Chung, S. Shim, H. Kim, O. Kwon, S. Kim, Solitary pulmonary nodule: characterization with combined wash-in and wash-out features of dynamic multidetector row CT, Radiology 2 (237) (2005) 675–683.
[12] G. Pepe, C. Rosseti, S. Sironi, G. Landoni, L. Gianoli, U. Pastorino, P. Zannini, M. Mezzetti, A. Grimaldi, L. Galli, C. Messa, F. Fazio, Patients with known or suspected lung cancer: evaluation of clinical management changes due to 18F-deoxyglucose positron emission tomography (18F-FDG PET) study, Nucl. Med. Commun. 9 (26) (2005) 831–837.
[13] D.-Y. Kim, et al., Pulmonary nodule detection using chest CT images, Acta Radiologica (44) (2003) 252–257.
[14] A.C. Silva, P.C.P. Carvalho, M. Gattass, Analysis of spatial variability using geostatistical functions for diagnosis of lung nodule in computerized tomography images, Pattern Anal. Appl. 7 (3) (2004) 227–234.
[15] A.C. Silva, P.C.P. Carvalho, A. Peixoto, M. Gattass, Diagnosis of lung nodule using Gini coefficient and skeletonization in computerized tomography images, in: Proceedings of the 19th Annual ACM Symposium on Applied Computing (SAC 2004), ACM Press, 2004, pp. 243–248.
[16] A.C. Silva, E.C. da Silva, A.C. de Paiva, R.A. Nunes, M. Gattass, Diagnosis of lung nodule using Moran’s Index and Geary’s Coefficient in computerized tomography images, Pattern Anal. Appl. 11 (2008) 89–99.
[17] A.C. Silva, P.C.P. Carvalho, M. Gattass, Diagnosis of lung nodule using semivariogram and geometric measures in computerized tomography images, Comput. Methods Programs Biomed. 79 (2005) 31–38.
[18] M.F. McNitt-Gray, E.M. Hart, N. Wyckoff, J.W. Sayre, J.G. Goldin, D.R. Aberle, A pattern classification approach to characterizing solitary pulmonary nodules imaged on high resolution CT: preliminary results, Med. Phys. 26 (6) (1999) 880–888.
[19] Y. Kawata, N. Niki, H. Ohmatsu, R. Kakinuma, K. Eguchi, M. Kaneko, N. Moriyama, Quantitative surface characterization of pulmonary nodules based on thin-section CT images, IEEE Trans. Nucl. Sci. 45 (4) (1998) 2132–2138.
[20] Y. Kawata, N. Niki, H. Ohmatsu, M. Kusumoto, R. Kakinuma, K. Mori, H. Nishiyama, K. Eguchi, M. Kaneko, N. Moriyama, Computer aided differential diagnosis of pulmonary nodules using curvature based analysis, in: Proceedings of the International Conference on Image Analysis and Processing, vol. 2, IEEE Computer Society Press, 1999, pp. 470–475.
[21] N. Nikolaidis, I. Pitas, 3D Image Processing Algorithms, John Wiley, New York, 2001.
[22] D.A. Clunie, DICOM Structured Reporting, PixelMed Publishing, Pennsylvania, 2000.
[23] M.K. Gould, J. Fletcher, M.D. Iannettoni, W.R. Lynch, D.E. Midthun, D.P. Naidich, D.E. Ost, Evaluation of patients with pulmonary nodules: when is it lung cancer? ACCP Evidence-based Clinical Practice Guidelines (2nd edition), Chest 132 (3) (Supplement) (2007) 108S–130S.
[24] M.M. Wahidi, J.A.M.D. Govert, R.K. Goudar, M.K. Gould, D.C. McCrory, Diagnosis and management of lung cancer, ACCP Evidence-based Clinical Practice Guidelines (2nd edition), Chest 132 (3) (Supplement) (2007) 94S–107S.
[25] K. Nakamura, H. Yoshida, R. Engelmann, H. MacMahon, S. Katsuragawa, T. Ishida, K. Ashizawa, K. Doi, Computerized analysis of the likelihood of malignancy in solitary pulmonary nodules with use of artificial neural networks, Radiology 214 (2000) 823–830.
[26] S. Takashima, S. Sone, F. Li, Y. Mauyama, M. Hasegawa, M. Kadoya, Indeterminate solitary pulmonary nodules revealed at population-based CT screening of the lung: using first follow-up diagnostic CT to differentiate benign and malignant lesions, Am. J. Roentgenol. 180 (5) (2003) 1255–1263.
[27] A.C. Silva, P.C.P. Carvalho, Sistema de análise de nódulo pulmonar, in: II Workshop de Informática aplicada a Saúde, Universidade de Itajai, Itajai, 2002, available at http://www.cbcomp.univali.br/pdf/2002/wsp035.pdf.
[28] R.C. Gonzalez, R.E. Woods, Digital Image Processing, 3rd edition, Addison-Wesley, Reading, MA, USA, 1992.
[29] A. Meyer-Baese, Pattern Recognition for Medical Imaging, Elsevier, 2003.
[30] V.A. Kovalev, F. Kruggel, H.-J. Gertz, D.Y.V. Cramon, Three-dimensional texture analysis of MRI brain datasets, IEEE Trans. Med. Imag. 20 (5) (2001) 424–433.
[31] M.D. Scheuerell, Quantifying aggregation and association in three dimensional landscapes, Ecology 85 (2004) 2332–2340.
[32] B.D. Ripley, Modelling spatial patterns, J. Roy. Statist. Soc. B 39 (1977) 172–212.
[33] D.L. Urban, Spatial analysis in ecology: point pattern analysis, 2003, available at http://www.nicholas.duke.edu/lel/env352/ripley.pdf.
[34] A. Papoulis, S.U. Pillai, Probability Random Variables and Stochastic Processes, 4th edition, McGraw-Hill, 2002.
[35] M.R.T. Dale, P. Dixon, M.-J. Fortin, P. Legendre, D.E. Myers, M.S. Rosenberg, Conceptual and mathematical relationships among methods for spatial analysis, Ecography 25 (2002) 558–577.
[36] A. Baddeley, C.V. Howard, A. Boyde, S. Reid, Three-dimensional analysis of the spatial distribution of particles using the tandem-scanning reflected light microscope, Acta Stereologica 6 (Suppl. II) (1987) 87–100.
[37] L. Technologies, SPSS 11.0 for Windows, 2003, available at http://www.spss.com.
[38] P.A. Lachenbruch, Discriminant Analysis, Hafner Press, New York, 1975.
[39] J.S. Whitaker, Use of the stepwise methodology in discriminant analysis, Reports – Descriptive – Speeches/Conference Papers 150, Annual Meeting of the Southwest Educational Research Association, Austin, TX, 1997.
[40] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press, London, 1990.