Computer-aided detection (CAD) of lung nodules and small tumours on chest radiographs


European Journal of Radiology 72 (2009) 218–225



Contents lists available at ScienceDirect

European Journal of Radiology journal homepage: www.elsevier.com/locate/ejrad

D.W. De Boo a,∗, M. Prokop b, M. Uffmann c, B. van Ginneken d, C.M. Schaefer-Prokop a,e

a Academic Medical Center (AMC), Dept. of Radiology, Meibergdreef 9, 1105 AZ Amsterdam, Netherlands
b University Medical Center (UMC) Utrecht, Dept. of Radiology, Heidelberglaan 100, 3584 CX Utrecht, Netherlands
c University Hospital Vienna (AKH) Vienna, Dept. of Radiology, Waehringer Guertel 18-20, 1090 Vienna, Austria
d University Medical Center (UMC) Utrecht, Image Science Center, Heidelberglaan 100, 3584 CX Utrecht, Netherlands
e Meander Medical Center, Dept. of Radiology, Utrechtseweg 160, 3800 BM Amersfoort, Netherlands

Article info

Article history: Received 7 May 2009; Accepted 7 May 2009

Keywords: CAD; Lung cancer; Pulmonary metastases; Perception

Abstract

Detection of focal pulmonary lesions is limited by quantum and anatomic noise and highly influenced by the variable perception capacity of the reader. Multiple studies have shown that lesions missed at the time of primary interpretation were visible on the chest radiographs in retrospect. Computer-aided diagnosis (CAD) schemes do not alter the anatomic noise but aim at decreasing the intrinsic limitations and variations of human perception by alerting the reader to suspicious areas in a chest radiograph when used as a ‘second reader’. Multiple studies have shown that detection performance can be improved using CAD, especially for less experienced readers, at the cost of a variable decrease in specificity. There seems to be a substantial learning process for both experienced and inexperienced readers before they can optimally differentiate between false positive and true positive lesions and build up sufficient trust in the capabilities of these systems to use them to their full advantage. Studies so far have focussed either on the stand-alone performance of CAD schemes, to reveal the magnitude of their potential impact, or on retrospective evaluation of CAD as a second reader for selected study groups. Further research is needed to assess the performance of these systems in clinical routine and to determine the trade-off between the performance gain (increased sensitivity and decreased inter-reader variability) and the loss of specificity, with the follow-up examinations it triggers for further diagnostic workup.

© 2009 Published by Elsevier Ireland Ltd.

1. Background

Lung cancer is the second most commonly diagnosed cancer and the leading cause of cancer-related deaths in the United States. Despite increasing awareness of the deleterious effects of smoking and continuous research in lung cancer diagnosis and treatment, lung cancer mortality is very high and has not changed substantially over the last decade. Yet it has been shown that bronchogenic carcinoma has a much better chance of cure if diagnosed at an early (localized) stage [1]. Lung cancer screening using CT is presently the subject of intense research, with a number of randomized and non-randomized trials still ongoing. While CT is very sensitive for the detection of small pulmonary lesions, the vast majority of detected nodules are in fact benign [2]. The verdict is still out on whether CT screening reduces lung cancer-related mortality or instead leads to a large number of unnecessary interventions and ultimately no survival benefit.

∗ Corresponding author. E-mail address: [email protected] (D.W. De Boo). 0720-048X/$ – see front matter © 2009 Published by Elsevier Ireland Ltd. doi:10.1016/j.ejrad.2009.05.062

Lung cancer screening efforts in the 1980s focused on the use of chest radiographs. While more cancers were detected and more resections were performed, there was no survival benefit for screened participants. Since then the technique of chest radiography has seen vast improvements, including the transition to digital radiography with better contrast resolution and better depiction of previously difficult areas such as the lung recesses and the retrocardiac space. Detection performance for small lung tumours should therefore have improved. Whether this translates into better patient survival, however, remains to be seen. Nevertheless, lung cancers missed on chest radiographs remain one of the major reasons for lawsuits against radiologists [3]. Investigators of the Mayo Lung Project reported that 75% of the perihilar nodules (12/16) and 90% of the peripheral nodules (45/50) had been missed at the time of primary interpretation but were visible on the chest radiographs in retrospect [4]; similar results were confirmed by other publications [5,6]. For this reason, detection of suspicious nodules remains a main task for any radiologist reading chest radiographs, even if the primary diagnostic question is not cancer-related. Despite its inferior sensitivity compared with CT, chest radiography represents a fast and relatively cheap imaging modality that remains a


popular choice for the surveillance of pulmonary metastatic disease in patients with known malignancies and is in most cases the primary method of choice to screen for intrapulmonary abnormalities.

2. Rationale for CAD

Detection of subtle focal pulmonary opacities on radiographs remains a challenge. It is limited by two categories of noise: (1) the radiographic noise (mottle) related to the quantum nature of radiation [7,8] and (2) the anatomic noise [9], which refers to surrounding and superprojecting anatomic structures such as ribs, abnormalities in the lung parenchyma and vascular structures. The term ‘conspicuity’ of a lesion describes the relation of feature contrast to surrounding complexity, thus including the contributions of both noise categories [10]. In chest radiography, anatomic noise appears to have the far greater influence on the detection of pulmonary nodules [11,12]. The complexity of the surrounding anatomic noise greatly influences the perception of the radiologist. In an experiment in which the authors slightly varied the location of simulated nodules, Samei et al. found strong correlations between nodule size, nodule location and observer detection performance [9]. Anatomic structures such as ribs and pulmonary vessels superimposed on a subtle lung nodule on a chest radiograph influenced nodule detection, measured as the area under the receiver operating characteristic (ROC) curve (Az), by as much as 28%. The effects of distracting anatomic noise are aggravated by the intrinsic limitations of human perception. Perception is influenced by predictable factors such as training and experience but also by less predictable effects such as concentration, distraction and fatigue. Radiologists are not consistent in what they detect and what they diagnose. The less conspicuous the findings, the greater the variability. When confronted with a set of radiographs a second time, the reader may detect different lesions than the first time.
Studies have repeatedly shown that lesions have been missed that were evident in retrospect [4–6]. To improve interpretation in chest radiography, various technical innovations have been introduced, among which image processing techniques represent the most important tools. Image processing includes techniques such as dual energy subtraction, temporal subtraction and tomosynthesis (see dedicated articles in this issue) that all aim at increasing the conspicuity of a lesion by reducing the impact of anatomic noise. Computer-aided diagnosis (CAD) techniques take a different approach: they do not alter the anatomic noise but aim at decreasing the intrinsic limitations and variations of human perception by alerting the reader to suspicious areas in a chest radiograph. In addition to prototypes only applied under research conditions, there are currently two FDA approved systems available on the market (IQQA-chest, EDDA Technology, Inc., Princeton Junction, NJ, USA, Fig. 1 and ONGUARD; Riverain Medical, Miamisburg, OH, USA, Fig. 2); more systems from other manufacturers are expected to follow. For mammography, the first FDA approved clinical system was introduced as early as 1998. Since then systems have been continuously improved and have been widely adopted in the US, where there is additional reimbursement for the use of CAD [13]. Although mammography CAD has been in use for longer, its clinical value is still being debated [14].

3. Prerequisites for routine implementation of CAD

Computerized methods for automatic detection of lung nodules were published as early as the 1970s. None of these methods, however, was applied in clinical studies at that point, probably because


of large numbers of false positive findings generated with these early methods [15]. A study from 1993 postulated the potential usefulness of CAD schemes for nodule detection if the false positive rate could be reduced to approximately one (!) false positive detection per radiograph at a sensitivity of 75% [16]. More than 15 years later, even the most advanced CAD algorithms do not live up to this postulated requirement. In the meantime CAD has become clinically available for mammography, and it now also appears to be ready for broader clinical application in the chest. The difference in acceptance between CAD for mammography and CAD for chest radiography may stem from the different detection tasks. In mammography the detection task is two-fold: the localization of small but high-contrast microcalcifications on the one hand and the detection of ill-defined soft-tissue masses on the other. While the sensitivity of modern CAD systems for microcalcifications is reported to be very high (98%), the sensitivity for mammographic masses is known to be significantly lower [17]. Pulmonary nodules and masses seem to face problems similar to those of mammographic masses with respect to conspicuity and vulnerability to the effects of overlying and distracting anatomic structures. All systems for computer-assisted detection are designed to be used as a second reader: they are meant as a complementary tool that draws the radiologist’s attention to image areas that need further evaluation. They are not designed to detect all potential lesions, which would allow the radiologist to focus only on the areas identified by the CAD system. The present systems do not provide 100% sensitivity, which makes it necessary for the radiologist still to evaluate the whole image. However, the systems can detect additional lesions that might escape the radiologist’s attention.
In clinical practice, the CAD program runs in the background while the radiologist completes visual interrogation of the PA and lateral radiograph. Subsequently, suspicious lesions detected by the CAD are revealed and the radiologist has the opportunity to accept or discard the CAD findings. Usually the findings are indicated by ROIs around the suspicious lesion or by highlighting the suspicious areas. Comparing the original image with the CAD-annotated image can be done side by side or by toggling the highlighted areas or circles on and off. Whatever the implementation, however, application of CAD results in additional reading effort and increased reading time, at least for a substantial number of cases, for certain readers and within an initial period of adaptation. Whether reading time also remains increased after adaptation to the system has yet to be evaluated. To make these additional investments worthwhile, the number of false positive lesions has to be small and the true positive candidate lesions should complement the lesions found by the radiologists: there is no advantage if the lesions found by CAD completely overlap the lesions found by human observers. Another prerequisite for clinical practice is the seamless integration of the CAD into a picture archiving and communication system (PACS). CAD results have to be available on demand and need to be presented in a fashion that disturbs the normal workflow as little as possible. In addition, CAD should work well regardless of which digital imaging system has been used to acquire the digital chest radiographs. A few logistic and potentially medico-legal issues have not yet been resolved: whether and for how long candidate lesions have to be stored, and whether candidate lesions should be visible only to radiologists or also to the referring physicians [18].
CAD will almost inevitably detect lesions that turn out to be true tumours but were refuted by the radiologist at the time of evaluation. This may lead to more defensive medicine with unnecessary follow-up radiographs or CT examinations if lawyers are allowed to interpret such a situation as an instance of malpractice. In addition, it will be difficult for less experienced clinical colleagues to differentiate between


Fig. 1. Two intrapulmonary metastases (see CT), both correctly assigned by the CAD (IQQA-chest, EDDA Technology, Inc., Princeton Junction, NJ, USA).

true and false positive candidates. Again this holds potential for improper follow-ups and conflict between radiologists and referring physicians. For all these reasons, it is probably most appropriate not to store the CAD results in a PACS environment.

4. CAD results as published so far

4.1. Study designs

Various study setups and statistical models have been applied to assess the effects of CAD (see Table 1). Selection and prevalence of lesions, number and experience of readers, standard of truth and statistical analysis, however, are factors that influence the results and should be considered when drawing conclusions and comparing the performance of various CAD systems. The majority of studies are based on a set of radiographs that was selected to include a certain number of exams containing nodules

or tumours and a certain number of controls (see Table 1). Ideally, presence or absence of lesions is proven by CT as the superior gold standard. However, such a process introduces a selection bias that depends on the selected type of abnormality and does not reflect the normal clinical situation, in which there is no superior standard available for most patients. CAD performance has two components: the stand-alone performance of CAD without human interaction and the effect of CAD on reader performance. Stand-alone performance is a good indicator of the magnitude of the potential effects of CAD: it determines how many and what kind of lesions can be detected at how many false positives per radiograph. True and false positive rates are interrelated: for a specific CAD system, a higher true positive rate always comes with a higher false positive rate. For this reason a compromise always has to be made between these two factors. In some studies the operating point was variable, which allowed this interrelation to be analysed; in other studies the operating point was fixed and provided only one pair of true positive and false positive rates.
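The interrelation between true and false positive rates at different operating points can be made concrete with a small sketch. The marks, scores and counts below are invented for illustration and do not come from any of the cited studies; the function name `operating_points` is likewise hypothetical. The sketch assumes that each CAD candidate mark carries a suspicion score and that each true lesion receives at most one mark.

```python
from typing import List, Tuple

# Hypothetical CAD output pooled over a small test set: each candidate mark
# is (suspicion score, whether it lies on a CT-proven lesion).
MARKS: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, False), (0.80, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, False), (0.20, True),
]
N_LESIONS = 5  # assumed number of true lesions (one is never marked by CAD)
N_IMAGES = 4   # assumed number of radiographs in the test set


def operating_points(marks, n_lesions, n_images, thresholds):
    """Return (threshold, sensitivity, false positives per image) tuples.

    Assumes each true lesion receives at most one CAD mark.
    """
    points = []
    for t in thresholds:
        # Keep only the marks the CAD would display at this operating point.
        kept = [hit for score, hit in marks if score >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        points.append((t, tp / n_lesions, fp / n_images))
    return points


points = operating_points(MARKS, N_LESIONS, N_IMAGES, [0.9, 0.5, 0.1])
for t, sens, fppi in points:
    print(f"threshold {t:.1f}: sensitivity {sens:.0%}, {fppi:.2f} FP per image")
```

Lowering the threshold in this toy data raises sensitivity and the number of false positives per image together, which is the compromise described above. A study with a fixed operating point corresponds to reporting a single row of this output.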


Fig. 2. Correctly assigned CT proven T1 tumour (ONGUARD; Riverain Medical, Miamisburg, OH, USA) in the left upper lobe and one false positive call in the left lung apex due to the crossing of clavicle and 1st rib.

Table 1. List of publications using FDA approved and prototype CAD schemes for the detection of nodules and T1 bronchogenic carcinomas.

Author | System | n | Prevalence
CAD for pulmonary nodules
v Beek (2008) | EDDA | 214 | 26%
Bley (2008) | EDDA | 117 | 36%
Hardie (2008) | No FDA | 154 | 100%
He (2008) | EDDA | 116 | 50%
Kasai (2008) | No FDA | 60 | 52%
Shiraishi (2007) | No FDA | 106 | 100%
Schilham (2007) | No FDA | 247 | 62%
Shiraishi (2006) | No FDA | 48 | 75%
Shiraishi (2006) | Riverain | 459 | 100%
Song (2005) | EDDA | 232 | 36%
Coppini (2003) | Riverain | 128 | –
Shiraishi (2003) | No FDA | 90 | 60%
Freedman (2002) | No FDA | 240 | 67%
MacMahon (1999) | No FDA | 40 | 50%
Xu (1997) | No FDA | 200 | 50%
Kobayashi (1996) | No FDA | 120 | 50%
Matsumoto (1992) | No FDA | 198 | 48%
CAD for T1 lung carcinoma
Li (2008) | Riverain | 34 | 100%
Sakai (2006) | No FDA | 100 | 50%
Kakeda (2004) | No FDA | 90 | 50%

Remaining columns, values in source order (row alignment lost in extraction):
CAD stand alone: –, 39%, 78%, 67%, 52%; 71%, 51%, 67%, 62%, 70%, 71%, 60%**, 75%**, 100%**, 66%, 80%**, 70%, –, 60%; 34%, 74%, –
FP per image by CAD: –, 2.7, 4, 2.4, 4.2; 4.9, 2, 4, –, 5, 2.8, 4.3, 10.2, 3.1, 5, 1**, 1.7, –, 15; 5.9, 2.2, 3.2
Reader vs. reader with CAD: Sensitivity 64% vs. 93%*; –; –; –; Sensitivity 65% vs. 68%*; Az 0.804 vs. 0.816; –; –; –; Az 0.724 vs. 0.778*; –; Sensitivity 49% vs. 80%*; –; –; Az 0.682 vs. 0.808*; Az 0.835 vs. 0.865*; Az 0.825 vs. 0.889*; Az 0.894 vs. 0.940*; –; –; Az 0.986 vs. 0.923*; Az 0.924 vs. 0.986*

FP: false positives; Az: area under ROC curve. * Significant difference. ** Fixed.


Because CAD schemes cannot detect all potential lesions in a radiograph at a reasonable false positive rate, they need to be used as a second reader. How much a CAD scheme is able to improve the performance of a reader depends on the following factors:
- the reader's experience (less experience will lead to a potentially greater increase in performance),
- the stand-alone performance of CAD (the higher the detection rate of nodules that are typically missed by human observers, the more effect a CAD will have) and
- the ability of observers to distinguish between a true positive and a false positive marking.

The last factor in particular has not yet been extensively studied, but it will ultimately determine whether CAD mainly increases the number of true lesions found by an observer or whether it also increases the number of false positives reported by the observer. It is therefore important that readers become familiar with the behaviour of “their” CAD system so that they learn the optimum cut-off for differentiating true and false positive lesions. As a result of unfamiliarity with CAD, observers may dismiss true positive candidate lesions or lose confidence in CAD readings because of too many false positive lesions. Statistical analysis of data is important for interpreting the outcome of CAD studies: sensitivity and specificity are influenced by the prevalence of disease within the study group, especially if a high prevalence is known or suspected by the readers. For this reason, receiver operating characteristic (ROC) approaches are to be preferred because they incorporate the interrelation of sensitivity and specificity of human observers depending on their confidence level. ROC statistics work on a per-region basis: any positive reading of a region containing a true lesion is counted as true positive, independent of whether the reader had actually identified the lesion or had read a false positive contained in the same region.
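As an illustration of why ROC analysis is less dependent on a reader's cut-off than a single sensitivity/specificity pair, the sketch below computes an empirical ROC area from confidence ratings. The ratings are hypothetical, and the empirical (Mann-Whitney) area used here is an assumption made for simplicity: the Az values quoted in the cited studies may have been obtained from fitted ROC models instead.

```python
def roc_area(ratings_pos, ratings_neg):
    """Empirical ROC area: probability that a lesion-containing region is
    rated higher than a lesion-free one, with ties counted as one half."""
    wins = 0.0
    for p in ratings_pos:
        for n in ratings_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(ratings_pos) * len(ratings_neg))


def sens_spec(ratings_pos, ratings_neg, cutoff):
    """Sensitivity and specificity when ratings >= cutoff are called positive."""
    sens = sum(r >= cutoff for r in ratings_pos) / len(ratings_pos)
    spec = sum(r < cutoff for r in ratings_neg) / len(ratings_neg)
    return sens, spec


# Hypothetical 5-point confidence ratings from one reader.
with_lesion = [5, 4, 4, 3, 2]
without_lesion = [1, 2, 2, 3, 4]

print("ROC area:", roc_area(with_lesion, without_lesion))
print("cutoff 4:", sens_spec(with_lesion, without_lesion, 4))
print("cutoff 3:", sens_spec(with_lesion, without_lesion, 3))
```

In this example the same set of ratings yields a sensitivity of 60% at a specificity of 80% for a strict cut-off and the reverse for a lenient one, while the ROC area (0.78) summarises both.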
Localized ROC (L-ROC) analysis forces the reader to indicate a lesion and thus also incorporates the correct localization of a lesion. Any type of ROC statistics, however, requires the availability of a superior gold standard, usually provided by CT examinations.

4.2. CAD for lung nodule detection

The first publications [19] are more than 10 years old and were based on the comparison of conventional radiographs and digitised versions of these radiographs that included the superimposed CAD output. Although it can be expected that CAD software has further improved in the meantime, this first study already reported excellent results: diagnostic accuracy improved significantly with CAD (ROC area 0.906 vs. 0.948) while reading time did not increase significantly. The authors also reported a larger benefit for inexperienced radiologists when using CAD and demonstrated a resulting decrease in the variability of accuracy across readers of different experience levels. The first study that brought the potential of CAD for nodule detection to the attention of a large group of radiologists was a large-scale observer test conducted during the 1996 RSNA scientific assembly in Chicago [20]. Radiologists at the RSNA were invited to evaluate 22 abnormal and 20 normal digitised chest radiographs in random order, first without and then with CAD. The CAD program had a stand-alone sensitivity of 70% at an average false positive rate of only one per radiograph. The 146 observers included chest radiologists, general radiologists and residents. For all categories of readers, an improved detection rate was seen with the application of CAD. ROC areas varied from 0.697 to 0.825 without CAD and from 0.80 to 0.88 with CAD. The correct diagnosis was shown after each case,

which is unrealistic under clinical conditions and is likely to have introduced a learning effect. Learning how to use CAD, however, is crucial: understanding which lesions a CAD is likely to detect and which it is likely to miss is important for a reader deciding whether to accept a CAD-detected lesion or dismiss it as a false positive. The reduction in variability between radiologists of different experience was further confirmed by other studies: the less experienced the observers, the greater the improvement in lung nodule detection with CAD [21–23]. In 2008 Bley et al. published the stand-alone performance of one of the first commercial CAD systems (IQQA-chest, EDDA Technology, Inc., Princeton Junction, NJ, USA) for the detection of CT proven nodules with a mean diameter of 7.5 mm ± 2.2 mm [24]. For these small nodules the CAD system yielded a sensitivity of 39% (stand-alone performance) at 2.5 false positive findings per scan, compared to radiologists who had sensitivities between 18% and 30%. Most interestingly, the agreement among radiologists was higher (kappa = 0.64–0.73) than between radiologists and the CAD (kappa = 0.45–0.52). This indicates that CAD indeed detected different lesions than the radiologists. An important step towards better comparability of the performance of various CAD schemes would be their evaluation using the same study cases. Schilham et al. assessed their own CAD system, developed in academia, using the JSRT database, a large and publicly available database; they reported a sensitivity of 51% when accepting 2.0 false positives per image and a sensitivity of 67% when accepting 4.0 false positives per image [25]. Other authors detected 78% of the nodules at a false positive rate of 4.0 per image using the same database [26]. These results nicely demonstrate the inevitable trade-off between sensitivity and specificity that characterizes these systems. Kasai et al.
evaluated the effect of CAD on the detection of subtle nodules in PA and lateral chest radiographs [27]. About 50% of the nodules (15/31) were visible only on the PA views. The CAD software indicated candidate lesions on both views. The authors found an increased sensitivity (64.9% vs. 67.6%) with CAD but no significant increase of the ROC area (0.804 vs. 0.816, p = 0.297), indicating that the increased sensitivity was counteracted by an increased number of false positive readings with CAD. The only study so far that tried to assess the usefulness of CAD in a clinical environment was published in 2008 by van Beek et al. [28]. They investigated the benefit of CAD for nodule detection on chest films obtained in daily practice. The study group consisted exclusively of patients with known extra-pulmonary primary malignancies who were under surveillance for metastatic disease. For 214 of 324 chest radiographs a standard of reference could be established by CT or follow-up of at least 6 months. Nodules were present on 55 of these 214 chest radiographs. The sensitivity increased significantly with CAD from 63.6% to 92.7%, accompanied by a decrease in specificity (reported as 92.7% vs. 96.2%). In summary, nodule CAD has been shown to increase reader performance, with better sensitivity especially for inexperienced readers, and to decrease inter- and intra-reader variability.

4.3. CAD for detection of early stage bronchogenic cancer

The first studies evaluating the effects of CAD on the detection of T1 tumours were published by Japanese colleagues. This task may differ from detecting metastases because nodule characteristics of T1 tumours are often different from those of metastases. A significantly increased detection rate for early lung tumours (n = 45, mean diameter 18 mm, range 8–25 mm) was described by Kakeda et al. [21], who evaluated CAD using eight radiologists of varying experience. The area under the ROC


curve increased from 0.924 to 0.986, and thus started already at a very high baseline. Board certified radiologists performed better and profited less from CAD than residents, but even residents had a baseline performance without CAD that exceeded 0.90. These results were confirmed by Sakai et al. [29] who also assessed the detection of T1 tumours by eight readers of varying experience. Four chest radiologists and four residents evaluated PA radiographs of 50 patients with T1 tumours (<3 cm in diameter) and 50 controls. Stand-alone performance of CAD (EpiSight/XR Mitsubishi space software, Amagasaki, Japan) included a sensitivity of 74% and an average false positive rate of 2.3 per case for the normal controls. ROC area increased from 0.896 to 0.924 with CAD. CAD might help to reduce the number of T1 bronchogenic tumours originally missed in the reports at the time of clinical evaluation. White et al. reassessed the detectability of 88 missed cancers [30]. They included all prior images of missed cancers in their study and were able to include 114 positive images with 88 different lesions. The CAD system (ONGUARD; Riverain Medical, Miamisburg, OH, USA) correctly identified 36/88 (41%) of the originally missed lung cancers, which indicates the potential of CAD to reduce the number of missed cancers. However, this study only evaluated the stand-alone performance of CAD, which does not include reader response to candidate lesions. It is not clear how many of the tumours detected by CAD would be accepted by readers and at what cost in terms of false positive readings. Li et al. [31] published the stand-alone performance of the same CAD scheme for a selected group of 34 CT proven T1 cancers that had been missed in the original CXR reports. CAD detected 35% of originally missed lesions. Sensitivity was 45% for the more obvious and 30% for the more subtle cases. The average false positive rate was 5.9 per radiograph.

4.4. Differentiation of benign from malignant lesions

Differentiating benign from malignant pulmonary nodules on chest radiographs is a difficult and in many cases impossible task. Usually patients with newly identified non-calcified nodules undergo further diagnostic workup that may include CT, PET or biopsy. The superiority of PET–CT for detecting metabolic activity and the ability to perform accurate growth measurements with CT [32] have made early attempts to use morphological features on radiography for nodule differentiation less attractive. However, in 2003 Shiraishi et al. presented a CAD system trained to discriminate benign nodules from malignancies with a performance better than that of even experienced radiologists (p < 0.002) [33]. In addition to clinical features such as age, sex and history of the patients, a computerized analysis of morphological features such as irregularity, density and contrast formed the basis for this CAD scheme. With CAD the increase in performance was higher for experienced than for inexperienced radiologists, which suggests that CAD and personal skills complement each other. Most interestingly, however, the stand-alone performance of CAD was higher than the performance of each radiologist, independent of experience. This demonstrates that radiologists may be reluctant to accept the suggestions of a CAD [33]. In a second study the same group of authors tested the combined task of detection and classification using an advanced CAD scheme and reported a similarly significant improvement for both detection and classification with application of CAD (ROC areas 0.724 vs. 0.778, p = 0.008) [34]. In summary, both detection and classification of T1 tumours have been shown to improve with the application of CAD, which is especially appealing for lesions that had been missed in the primary reports.


5. Limitations of present studies

The vast majority of studies assessed the performance of CAD for a selected study group with a varying prevalence and a varying conspicuity of lesions. The selection of the lesions with respect to size, location and overall conspicuity is tremendously important when interpreting the results. Additional information, such as the fact that the lesions had originally been missed or a relatively low sensitivity below 40%, indicates a generally low conspicuity of the majority of lesions. The assessment of CAD is limited both by the inclusion of many small nodules (5–10 mm based on CT), many of which might be hardly or not at all visible on a chest radiograph, and by the inclusion of lesions that are so obvious that the performance without CAD already exceeds an ROC area of 0.9. As recently pointed out in an editorial by Gur [35], the higher the baseline performance, the more difficult it becomes to prove a clinically relevant improvement by a certain technique (e.g., CAD). The consequence is that the various studies show widely varying levels of improvement because of their varying baseline performance. He also stressed how difficult it is to appreciate a ‘clinically relevant performance difference’ as one approaches a perfect performance level. While statistics may show significance even for small differences, the question of clinical relevance and importance frequently remains unanswered. In studies with a limited and selected group of readers (mostly four to six), there is a high chance that outliers have a great influence on the results. Freedman graphically demonstrated the diversity across readers and cases with and without use of CAD by so-called ‘heat maps’ [36].
These ‘heat maps’ display three-dimensional data in two dimensions, with the third dimension represented by colour, and were found well suited to document the complexity of reader variability as a function of case conspicuity, experience and level of training, and the interaction between these factors and CAD. This diversity is easily missed in summary statistics but represents the determining factor in the individual (clinical) situation. An equally important point is the integration of readers’ behaviour. For organizational reasons it is obviously easier to assess the stand-alone performance of a CAD system without including readers in the study. But it should not be forgotten that these systems are intended to be used as a second opinion, and that the highest performance can therefore be expected when readers use CAD as a complementary tool. To be able to use a CAD to its full advantage, however, readers have to go through a learning curve: they have to become familiar with the potential and limitations of the CAD they are using, and they have to build up trust in the CAD’s capability, so that they can accept the true positive lesions and correctly rule out the false positive lesions without too great a loss of specificity. More experience has to be gained with respect to the influence of image processing on the performance of CAD. Though there is a tendency towards a consensus on what represents a ‘good quality chest radiograph’ with respect to processing, there is still room for individually customized versions that may differ drastically from the image display in other institutions. Most manufacturers use some type of multi-scale multi-frequency processing, which has similar but not identical effects on the image display. He et al.
found a significant impact on the performance of CAD for the detection of nodules when comparing default processing (parameter 0.0 for structure preference) with a high-pass filtered image (parameter 0.4 for structure preference) and a low-pass filtered image (parameter < 0.01 for structure preference) [37].

So far, only one study has aimed to investigate the effect of CAD in daily practice [28]. It is therefore the only study that provides some information about the order of magnitude of improvement through CAD that can be expected in clinical routine, and about how many radiographs have to be evaluated to achieve it.

The results are valid for a specific group of oncology patients with known extrapulmonary malignancies. In a clinical setting, establishing the truth remains a problem because not all of these patients can undergo CT; conversely, including only patients with CT would again introduce a selection bias. It is thus not surprising that a truth based on follow-up and CT could be established for only 66% of the original study images. The prevalence of disease was unusually high: 25.7% (55/214) of patients were found to have lung nodules. According to the authors, pathology (e.g., nodules) was established in 19 of the 55 patients only with the help of CAD, resulting in an increase of sensitivity from 63.6% to more than 90%. Further clinical evaluation, however, confirmed nodules in only 16 patients, and in only 5 of these 16 patients were the nodules proven to be malignant.
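The figures quoted above can be put side by side in a short calculation. The following Python sketch uses only the numbers stated in the text. It assumes that the 214 patients are those for whom a truth could be established, and that the 16 patients with confirmed nodules belong to the 19 patients flagged only with the help of CAD. Both assumptions, and all variable names, are made here for illustration and are not taken from the original study [28].

```python
# Figures quoted in the text for the prospective study [28].
patients_with_truth = 214    # assumed: patients for whom a truth could be established
patients_with_nodules = 55   # patients found to have lung nodules
cad_only_detections = 19     # patients in whom pathology was established only with CAD
confirmed_nodules = 16       # assumed to be a subset of the 19 CAD-only detections
malignant_nodules = 5        # confirmed nodules that were proven malignant


def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


prevalence = percent(patients_with_nodules, patients_with_truth)
confirmed_share = percent(confirmed_nodules, cad_only_detections)
malignant_share = percent(malignant_nodules, confirmed_nodules)
# Radiographs read per patient with a proven malignant nodule found only with CAD.
reads_per_malignancy = patients_with_truth / malignant_nodules

print(f"prevalence of nodules: {prevalence:.1f}%")
print(f"CAD-only detections confirmed: {confirmed_share:.1f}%")
print(f"confirmed nodules proven malignant: {malignant_share:.2f}%")
print(f"radiographs per proven malignancy: {reads_per_malignancy:.1f}")
```

Under these assumptions, roughly 43 radiographs had to be read for each patient in whom a malignant nodule was found only with the help of CAD. This figure applies to a high-prevalence oncology population and should not be transferred to other settings without caution.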

6. Discussion

6.1. Sensitivity vs. specificity

An increase in sensitivity and a decrease in specificity are to some extent linked, which is easily understood if we consider the concept of each reader’s ‘individual threshold’ for accepting or dismissing a lesion. This concept applies both to lesions the reader detected unaided and to candidate lesions ‘offered’ by CAD. Even though many if not most false positive lesions may be easily dismissed given a certain level of radiological knowledge and experience, there is a substantial number of lesions for which it is difficult to differentiate between projection effects, scarring and a potential malignancy. These lesions increase in number in patients with pre-existing lung disease such as chronic obstructive lung disease, chronic airways disease or interstitial lung disease. Especially for this patient group it seems inevitable that a substantial increase in sensitivity associated with CAD is accompanied by a decrease in specificity. The opposite effect is seen when readers keep the threshold high: a substantial number of lesions correctly indicated by CAD will not be accepted as such. In summary, learning effects, i.e., adjusting to the characteristics (capacities as well as limitations) of the system, seem to be very important.

The detrimental effect of decreasing specificity or, at least, substantially decreasing reader confidence is best studied in lesion-free images (‘controls’). Images previously called not suspicious will now be called suspicious, and cases previously characterized as being of low suspicion will now be called ‘highly suspicious’ when also identified by CAD. The ability to differentiate false from true positive lesions and to rule out false positive lesions is significantly correlated with the experience of the reader and the amount of anatomic background noise in the source image.
Yet a decrease in specificity with the application of CAD is described in multiple studies, and not only for inexperienced readers. Quantifying this effect in terms of the number of secondarily indicated CT examinations for further diagnostic workup seems warranted, given the implications for radiation dose, cost, and the number and invasiveness of further diagnostic examinations.
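The ‘individual threshold’ concept described in this section can be illustrated with a small numerical sketch. The suspicion scores below are invented purely for illustration and are not study data. The function name `rates_at_threshold` is likewise hypothetical. The sketch shows that lowering the threshold at which a candidate lesion is accepted raises sensitivity and lowers specificity, while raising the threshold does the opposite.

```python
# Hypothetical suspicion scores (0-100) that a reader assigns to candidate lesions.
# These values are invented for illustration only; they are not study data.
true_lesion_scores = [35, 50, 62, 70, 81, 90]
false_candidate_scores = [10, 20, 30, 45, 55, 65]


def rates_at_threshold(threshold, lesion_scores, false_scores):
    """Return (sensitivity, specificity) when candidates scoring >= threshold are accepted."""
    accepted_true = sum(score >= threshold for score in lesion_scores)
    rejected_false = sum(score < threshold for score in false_scores)
    return accepted_true / len(lesion_scores), rejected_false / len(false_scores)


# A high threshold mimics a cautious reader, a low threshold a permissive one.
for threshold in (80, 60, 40):
    sensitivity, specificity = rates_at_threshold(
        threshold, true_lesion_scores, false_candidate_scores
    )
    print(f"threshold {threshold}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```

In this toy example the reader with the highest threshold dismisses every false candidate but also rejects most true lesions, which corresponds to the situation in which correctly indicated CAD marks are not accepted.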

6.2. Decrease of intra- and inter-reader variability

Freedman et al. demonstrated that most of the lesions newly identified by one radiologist using CAD were actually lesions that the same radiologist or other radiologists would have identified had they interpreted the images without CAD at another time point. Even where the application of CAD did not significantly increase sensitivity, most studies so far have shown that it decreases intra- and inter-observer variability [36].

6.3. Value of a high negative predictive value of CAD

The majority of radiographic exams are indeed performed to exclude pathology. It is conceivable that the increase in confidence that CAD provides when excluding pathology (‘I did not see a nodule and CAD also did not find a nodule’) will play a substantial role in clinical practice, depending on the prevalence of disease in the patient group. It has to be noted, however, that the less than 100% sensitivity of CAD requires its use as a ‘second reader’ and precludes its use for primary reading.
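The dependence of the negative predictive value (NPV) on disease prevalence follows directly from Bayes' theorem. The sketch below assumes a hypothetical sensitivity of 0.80 and specificity of 0.90 for a reader using CAD. These values are placeholders and are not taken from any of the cited studies. Only the prevalence of 25.7% comes from the text (section 5); the two lower prevalences are illustrative.

```python
def negative_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV = P(no disease | negative reading), derived from Bayes' theorem."""
    true_negative_fraction = specificity * (1.0 - prevalence)
    false_negative_fraction = (1.0 - sensitivity) * prevalence
    return true_negative_fraction / (true_negative_fraction + false_negative_fraction)


# Hypothetical reader-plus-CAD performance (placeholder values, not study data).
ASSUMED_SENSITIVITY = 0.80
ASSUMED_SPECIFICITY = 0.90

# 0.01 and 0.05 are illustrative; 0.257 is the prevalence quoted for study [28].
for prevalence in (0.01, 0.05, 0.257):
    npv = negative_predictive_value(ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.3f}: NPV {npv:.3f}")
```

With sensitivity and specificity held fixed, the NPV falls as prevalence rises, so a negative reading by both reader and CAD is most reassuring in low-prevalence populations.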

7. Options for the future

Computer-aided diagnosis (CAD) has become one of the major research topics in medical imaging and diagnostic radiology, and has been applied to various imaging modalities including CT, MRI, ultrasound and radiography. Current CAD schemes appear to have the potential to increase the sensitivity for detection of focal lung lesions, such as small tumours and metastases. They appear to work best for less experienced readers. At the same time, CAD is likely to increase the rate of false positive readings. Up to now there is little evidence about how CAD performs in a clinical environment with regard to diagnostic accuracy on the one hand and negative effects, such as unnecessary follow-up exams with increased cost and unnecessary radiation exposure, on the other. Publications so far have tended to focus on the effects of CAD on diagnostic accuracy; its effects on productivity may eventually prove to be equally important.

Development and evaluation of these techniques require large databases with validated clinical images. Ideally, common databases should become available that can be used to train and test CAD systems and to improve the comparison of future commercial systems. This issue is being addressed successfully by several initiatives with the goal of providing a database of web-accessible cases for use in CAD research [38].

References

[1] Henschke CI, McCauley DI, Yankelevitz DF, et al. Early lung cancer action project: overall design and findings from baseline screening. The Lancet 1999;354:99–105.
[2] Swensen SJ, Jett JR, Hartmann TE, et al. CT screening for lung cancer: five year prospective experience. Radiology 2005;235:259–65.
[3] White CS, Sali AI, Meyer CA. Missed lung cancer on chest radiography and computed tomography: imaging and medico-legal issues. J Thorac Imaging 1999;14:63–8.
[4] Muhm JR, Miller WE, Fontana RS, et al. Lung cancer detected during a screening program using four month chest radiographs. Radiology 1983;148:609–15.
[5] Quekel LG, Kessels AG, Goei R, van Engelshoven JM. Miss rate of lung cancer on the chest radiograph in clinical practice. Chest 1999;115:720–4.
[6] Shah PK, Austin JH, White CS, et al. Missed non-small cell lung cancer: radiographic findings of potentially resectable lesions evident only in retrospect. Radiology 2003;226:235–41.
[7] Gavelli G, Giampalma E. Sensitivity and specificity of chest X-ray screening for lung cancer. In: Proceedings of the international conference on prevention and early diagnosis of lung cancer. 1998. p. 103–8.
[8] Burgess AE, Wagner RF, Jennings RJ. Human signal detection performance for noisy medical images. IEEE Computer Soc Int Workshop Med Imaging 1982:88–105.
[9] Samei E, Flynn MJ, Eyler WR. Detection of subtle lung nodules: relative influence of quantum and anatomic noise on chest radiographs. Radiology 1999;213:727–34.
[10] Kundel HL, Revesz G. Lesion conspicuity, structured noise, and film reader error. Am J Roentgenol 1976;126:1233–8.
[11] Boynton RM, Bush WR. Recognition of forms against a complex background. J Opt Soc Am 1956;46:758–64.
[12] Revesz G, Kundel HL, Graber MA. The influence of structured noise on the detection of radiologic abnormalities. Invest Radiol 1974;9:479–86.
[13] Bick U, Diekmann F. Digital mammography: what do we and what don’t we know. Eur Radiol 2007;17:1931–42.
[14] Astley SM. Computer-aided detection for screening mammography (review). Clin Radiol 2004;59:390–9.
[15] van Ginneken B, ter Haar Romeny BM, Viergever MA. Computer-aided diagnosis in chest radiography: a survey. IEEE Trans Med Imaging 2001;20(12):1228–41.
[16] Matsumoto T, Doi K, Kano A, Nakamura H, Nakanishi T. Evaluation of the potential benefit of computer-aided (CAD) for lung cancer screenings using photofluorography: analysis of an observer study. Nippon Acta Radiol 1993;53:1195–207.
[17] Warren Burhenne LJ, Wood SA, D’Orsi CJ, et al. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 2000;215:554–62.
[18] Chen JJ, White CS. Use of CAD to evaluate lung cancer on chest radiography. J Thorac Imaging 2008;23(2):93–6.
[19] Kobayashi T, Xu XW, MacMahon H, Metz CE, Doi K. Effect of a computer-aided diagnosis scheme on radiologists performance in detection of lung nodules on radiographs. Radiology 1996;199:843–8.
[20] MacMahon H, Engelmann R, Behlen FM, et al. Computer-aided diagnosis of pulmonary nodules: results of a large scale observer test. Radiology 1999;213:723–6.
[21] Kakeda S, Moriya J, Sato H, et al. Improved detection of lung nodules on chest radiographs using a commercial computer-aided diagnosis system. AJR 2004;182:505–10.
[22] Shiraishi J, Katsuragawa S, Ikezoe J, et al. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules. AJR 2000;174:71–4.
[23] Song W, Fan L, Xie Y, Qian JZ, Jin Z. A study of inter-observer variations of pulmonary nodule marking and characterizing on DR images. Proc SPIE 2005;5749:272–80.
[24] Bley TA, Baumann T, Saueressig U, et al. Comparison of radiologist and CAD performance in the detection of CT confirmed subtle pulmonary nodules on digital chest radiographs. Invest Radiol 2008;43(Suppl. 16).
[25] Schilham A, van Ginneken B, Loog M. A computer-aided diagnosis system for detection of lung nodules in chest radiographs with an evaluation on a public database. Med Image Anal 2006;10:247–58.
[26] Hardie RC, Rogers SK, Wilson T, Rogers A. Performance analysis of a new computer aided detection system for identifying lung nodules on chest radiographs. Med Image Anal 2008;12:240–58.
[27] Kasai S, Li F, Shiraishi J, Doi K. Usefulness of computer-aided diagnosis schemes for vertebral fractures and lung nodules on chest radiographs. AJR 2008;191:260–5.
[28] van Beek EJR, Mullan B, Thompson B. Evaluation of a real-time interactive pulmonary nodule analysis system on chest digital radiographic studies: a prospective study. Acad Radiol 2008;15:571–5.
[29] Sakai S, Soeda H, Takahashi N, et al. Computer-aided nodule detection on digital chest radiography: validation test on consecutive T1 cases of resectable lung cancer. J Digit Imaging 2006;19(4):376–82.
[30] White CS, Flukinger T, Jeudy J. Use of a computer-aided detection system to detect missed lung cancer on chest radiography. Radiol Soc N Am 2006 (abstract).
[31] Li F, Engelmann R, Metz CE, Doi K, MacMahon H. Lung cancers missed on chest radiographs: results obtained with a commercial computer-aided detection program. Radiology 2008;246(1):273–80.
[32] Gietema HA, Schaefer-Prokop CM, Mali W, Groenewegen G, Prokop M. Pulmonary nodules: interscan variability of semiautomated volume measurements with multisection CT: influences of inspirations level, nodule size and segmentation performance. Radiology 2007;245(3):888–94.
[33] Shiraishi J, Abe H, Engelmann R, et al. Computer-aided diagnosis to distinguish benign from malignant solitary nodules on radiographs: ROC analysis of radiologists’ performance – initial experience. Radiology 2003;227:469–74.
[34] Shiraishi J, Hiroyuki A, Li F, et al. Computer-aided diagnosis for the detection and classification of lung cancers on chest radiographs: ROC analysis of radiologists’ performance; 2006.
[35] Gur D. Imaging technology and practice assessment studies: importance of the baseline or reference performance level. Radiology 2008;247:8–11.
[36] Freedman M, Osicka T. Reader variability: what we can learn from computer-aided detection experiments. JACR 2006;3(6):446–55.
[37] He Q, He W, Wang K, Ma D. Effect of multislice processing in digital chest radiography on automated detection of lung nodule with a computer assistance system. J Digit Imaging 2008;21(Suppl. 1):S164.
[38] MacMahon H. Advanced image processing and computer-aided diagnosis: are we there yet? Editorial. J Thorac Imaging 2008;23(2):75–6.