Interobserver agreement in describing video capsule endoscopy findings: A multicentre prospective study




Digestive and Liver Disease 43 (2011) 126–131




Alessandro Pezzoli a,*, Renato Cannizzaro b, Marco Pennazio c, Emanuele Rondonotti d, Laura Zancanella e, Nadia Fusetti a, Marzia Simoni f, Franco Cantoni g, Raffaele Melina h, Angela Alberani i, Giancarlo Caravelli j, Federica Villa d, Fausto Chilovi e, Tino Casetti g, Gaetano Iaquinto h, Nicola D'Imperio i, Sergio Gullini a

a Department of Gastroenterology and Endoscopy Unit, Azienda ospedaliera-universitaria Sant'Anna, Ferrara, Italy
b Department of Gastroenterology, Centro di riferimento oncologico (CRO), Aviano (PN), Italy
c Department of Gastroenterology, San Giovanni Battista University Teaching Hospital, Torino, Italy
d Department of Gastroenterology, Department of Medical Sciences, IRCCS Policlinico Mangiagalli, Regina Elena Foundation, Università di Milano, Italy
e Department of Gastroenterology, Ospedale generale, Bolzano, Italy
f Istituto di Fisiologia Clinica CNR, Unità di Epidemiologia Ambientale Polmonare, Pisa, Italy
g Department of Gastroenterology, Ospedale S. Maria delle Croci, Ravenna, Italy
h Department of Gastroenterology, Ospedale Moscati, Avellino, Italy
i Department of Gastroenterology, Ospedale Maggiore, Bologna, Italy
j Department of Gastroenterology, Ospedale S. Gennaro, Napoli, Italy

Article info

Article history: Received 18 February 2010; Accepted 26 July 2010

Keywords: Bleeding; Capsule endoscopy; Interobserver agreement; Kappa statistic

Abstract

Background and aim: Few studies have specifically addressed interobserver agreement in describing lesions identified during capsule endoscopy. The aim of our study was to evaluate interobserver agreement in the description of capsule endoscopy findings.
Materials and methods: Consecutive short segments of capsule endoscopy recordings were prospectively reviewed by 8 investigators. Seventy-five videos were prepared by an external investigator (gold standard). The investigators described the findings using the same validated and standardized capsule endoscopy structured terminology. Agreement was assessed using Cohen's kappa statistic.
Results: For the ability to detect a lesion, agreement with the gold standard was moderate (κ 0.48), as was agreement on the final diagnosis (κ 0.45). The best agreement was observed in identifying the presence of active bleeding (κ 0.72), whereas the poorest agreement concerned lesion size (κ 0.32). Agreement with the gold standard was significantly better for endoscopists with a higher case volume of capsule endoscopy per year. Diagnostic concordance was better in the presence of angiectasia than in the presence of polyps or ulcers/erosions.
Conclusions: Correct lesion identification and diagnosis seem more likely in the presence of angiectasia, and for readers with more experience in capsule endoscopy reading.

Published by Elsevier Ltd on behalf of Editrice Gastroenterologica Italiana S.r.l.

1. Background

Capsule endoscopy (CE) has been in use in clinical practice since 2000 [1], and within a few years it has become a first-line test for visualizing the mucosa of the small intestine [2,3]. This technique has made it possible to examine the small-bowel mucosa directly and to visualize intestinal segments that were previously explored only with radiological techniques.

∗ Corresponding author at: Department of Gastroenterology and GI Endoscopy, University Hospital Sant’Anna, Corso Giovecca 203, 44100 Ferrara, Italy. Tel.: +39 0532236833; fax: +39 0532236932. E-mail address: [email protected] (A. Pezzoli).

As a consequence, small-bowel pathologies have started to be observed for the first time, and this might explain the difficulty in detecting significant lesions and interpreting the findings. Few studies have specifically addressed interobserver agreement in describing lesions identified during CE. Interobserver agreement has mostly been analysed as an ancillary part of studies designed for other aims (e.g. comparison of the diagnostic yield of different technologies) or documented only in abstract form [4–9]. Most published studies include a limited number of readers [10–13], often with minimal experience in CE reading [14], or focus only on specific pathologies [15,16].

The aim of the present study was to evaluate interobserver agreement and accuracy in the interpretation of CE by expert endoscopists, who recorded their findings using the same standardized terminology.



2. Materials and methods

This was a multicentre study in which 8 endoscopists (investigators) participated. All investigators had considerable experience in reading CE, performing an average of 56 CE per year (range 30–100 CE/year). An external investigator (M.P.), a well-known gastroenterologist with extensive experience in capsule endoscopy (over 900 CE readings at the time the study was performed), who did not take part as a reader, randomly selected 75 short segments of CE (20 s each) from his historical patients. The segments were recorded (and de-identified) on DVD (videos) and sent to each investigator. Short segments were chosen to ensure that all investigators analysed the same images in each video; moreover, in this way a high number of cases could be analysed by several investigators in a relatively short time. The external investigator was the only one who knew the clinical history and final diagnosis of his patients, and he was regarded as the "gold standard" (GS) in the statistical analyses.

Both the GS and the investigators were provided with an "ad hoc" form to record their CE findings. The form included questions with mutually exclusive multiple response options based on the standardized capsule endoscopy structured terminology (CEST) [17,18]. The CEST allows different attributes to be described with standardized and appropriate terms, taking into account the specific characteristics of CE. The attributes are a list of descriptive concepts, such as the size, colour and shape of a lesion, for each of which there is a series of possible values corresponding to an appropriate term; qualifying attributes providing additional details are attached to the term. The list of terms, together with the specifications given by the attributes, allows the concepts evoked by the observer to be translated into a structured language. The investigators received no training on CEST before the revision process.
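To make the idea of a structured description concrete, a single CE finding expressed with a CEST-like vocabulary can be thought of as a term plus a set of attributes, each restricted to a list of allowed values. The record below is a purely illustrative Python sketch; the field names and value lists are ours (loosely mirroring Table 1), not the official CEST schema.

```python
# Hypothetical sketch of a structured CE finding record. Field names and value
# lists are illustrative only and do not reproduce the official CEST schema.
finding = {
    "term": "angiectasia",              # descriptive term chosen by the reader
    "attributes": {
        "shape": "flat",                # e.g. flat, plaque, protruding, submucosal
        "colour": "red",                # e.g. red, white, similar to mucosa, other
        "size": "<5 mm",                # e.g. <5 mm, >5 mm, >10 mm
    },
    "qualifiers": {"bleeding": False},  # additional detail attached to the term
}

print(finding["term"], finding["attributes"])
```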

The considered items were the presence of a lesion, bleeding, the shape/colour/size of the lesion, and the final diagnosis. 'Target' characteristics of the videos, as reported by the GS, are shown in Table 1. The most common diagnoses were ulcers (20%), polyps (17%), and angiodysplasias (11%); eight percent of cases had a diagnosis of mass/tumour.

Table 1. 'Target' characteristics of the videos (n = 75) as stated by the gold standard.

Item                      n (%)
Presence of a lesion      64 (85.3)
Presence of bleeding      11 (14.7)
Shape
  Flat                    35 (46.7)
  Plaque                  7 (9.3)
  Protruding              15 (20.0)
  Submucosal              7 (9.3)
Colour
  Red                     18 (24.0)
  White                   25 (33.3)
  Similar to mucosa       20 (26.7)
  Other                   1 (1.3)
Size
  <5 mm                   5 (6.7)
  >5 mm                   27 (36.0)
  >10 mm                  32 (42.7)
Diagnosis
  Mass-tumour             6 (8.0)
  Ulcer/erosion           15 (20.0)
  Angiectasia             8 (10.7)
  Polyp                   13 (17.3)
  Other                   22 (29.3)
  Normal mucosa           11 (14.7)


A miscellanea of other diagnoses (e.g. villous abnormalities, stenosis, bleeding without an evident lesion, scar, aphtha, diverticula) was also included, as well as 11 cases (15%) with normal mucosa. These negative videos were included to evaluate the investigators' ability to discriminate between normal and pathological findings. Each investigator was blinded to the interpretations of the 7 other participants and to the clinical data of the cases; however, the investigators were informed that negative videos were present. All investigators used the GIVEN software program to view the recordings in consecutive order.

3. Statistical analyses

Statistical analyses were performed with SPSS (release 13); the routines used were frequency analysis, cross-tabulation, the chi-square test and logistic regression analysis. Interobserver agreement was assessed using Cohen's kappa (κ) statistic [19,20], which measures the level of agreement between 2 investigators assigning one of n categories to m subjects, taking into account the agreement expected by chance. The κ index ranges from 0 (absence of agreement) to 1 (perfect agreement): a value below 0.20 indicates slight agreement, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, and above 0.81 almost perfect agreement. Cohen's κ indices were computed separately for each considered item and for each of the 8 pairs consisting of one investigator and the GS. The results are reported as mean κ, 95% confidence interval (CI) of the mean, and minimum–maximum. Since κ may tend to underestimate agreement [21], the absolute agreement (accordance with the standard assignation) was also considered.

Absolute agreement was used in logistic regression analyses to assess the risk of disagreement with the GS, for lesion identification and diagnosis, in the presence/absence of the more common specific target diagnoses (angiectasia, polyps, or ulcer/erosion). The statistical model included disagreement (0 = no, 1 = yes) as the dependent variable and the specific target diagnoses as independent dichotomous variables (0 = no, 1 = yes). For statistical analyses, ulcers and erosions were grouped into a single category defined as "ulcer". Agreement with the GS was also evaluated separately for the two investigators with the highest case volume of CE (more than 60 CE per year) and the two with the lowest (about 30 CE per year). A sample of 75 cases reviewed by each investigator was the minimum number of videos, as indicated by a power analysis, needed to give the analyses statistical power at a significance level of 5% (p = 0.05) [22]. Finally, for additional information, we evaluated interobserver agreement between the investigators, independent of the GS, for each item. Cohen's κ indices were obtained for each pair of different readers (28 combinations) to provide means and ranges (minimum–maximum); the mean of these Cohen's κ indices equals the total κ as computed with Fleiss' κ statistic, which evaluates interobserver agreement amongst more than 2 readers [20].
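The agreement index used throughout is Cohen's κ = (po − pe)/(1 − pe), where po is the observed proportion of agreement and pe the proportion expected by chance. The analyses themselves were run in SPSS; purely as an illustration of the computation, the Python sketch below derives κ for one hypothetical investigator–GS pair (labels invented) and would be repeated for each of the 8 pairs to obtain the mean values reported in Table 2.

```python
# Illustrative sketch only (the study used SPSS): Cohen's kappa for one
# investigator versus the gold standard, using invented example labels.
from sklearn.metrics import cohen_kappa_score

gold_standard = ["angiectasia", "ulcer", "polyp", "normal", "ulcer"]
investigator = ["angiectasia", "erosion", "polyp", "normal", "ulcer"]

kappa = cohen_kappa_score(gold_standard, investigator)
print(f"Cohen's kappa vs. gold standard: {kappa:.2f}")

# The per-item values in Table 2 are means of such pairwise kappas over the
# 8 investigator-GS pairs (or the 28 investigator-investigator pairs).
```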
4. Results

All investigators reviewed all 75 short segments and filled in the answer sheet for all the considered categories (response rate 100%). The results regarding interobserver agreement between the investigators and the GS are summarized in Table 2. For each considered item, Cohen's κ values varied considerably among the readers, as indicated by the large ranges.

For the first item analysed, the presence/absence of a lesion, absolute concordance of the investigators with the GS was achieved in 86% of the observations.

Table 2. Interobserver agreement (total number of observations, n = 600).

Item        Absolute agreement with GS (%)   Cohen's κ vs. GS*, mean (95% CI) [min–max]   Cohen's κ between investigators**, mean (95% CI) [min–max]
Lesion      86.2                             0.48 (0.33–0.62) [0.21–0.72]                 0.41 (0.35–0.46) [0.12–0.78]
Bleeding    92.2                             0.72 (0.60–0.83) [0.45–0.90]                 0.60 (0.55–0.65) [0.28–0.79]
Shape       61.2                             0.45 (0.36–0.55) [0.28–0.60]                 0.35 (0.32–0.39) [0.20–0.53]
Colour      61.2                             0.49 (0.40–0.58) [0.34–0.63]                 0.40 (0.36–0.44) [0.22–0.57]
Size        47.5                             0.34 (0.23–0.44) [0.13–0.48]                 0.30 (0.26–0.33) [0.13–0.44]
Diagnosis   57.2                             0.45 (0.38–0.52) [0.33–0.57]                 0.40 (0.37–0.42) [0.22–0.53]

* Cohen's κ computed for each of the 8 pairs consisting of one investigator and the gold standard (GS).
** Cohen's κ computed for each of the 28 different pairs of investigators.

The overall κ coefficient was 0.48 (95% CI 0.33–0.62), indicating moderate agreement. Relative to the 'target' findings, sensitivity (identification of true positives) was high (91%), whereas specificity (identification of true negatives) was quite low (59%), meaning that normal status was not recognized in 41% of the negative videos (Fig. 1). On average, each observer correctly identified 6.5 of the 11 normal-mucosa videos (standard deviation 2, range 3–10).

The agreement for active bleeding was good (κ 0.72, 95% CI 0.60–0.83). The presence/absence of bleeding was correctly identified in 92% of the cases, and both sensitivity and specificity were high (83% and 90%, respectively) (Fig. 1).

The agreement with the GS was moderate for both the shape (κ 0.45, 95% CI 0.36–0.55) and the colour of the lesion (κ 0.49, 95% CI 0.40–0.58). The size of the lesion was the item with the worst agreement, with an overall κ of 0.34 (95% CI 0.23–0.44). Fig. 2 shows the prevalence of lesion characteristics, as reported by the investigators, according to the target findings. "Protruding" was the shape most easily identified (78% of lesions with this shape were correctly described), whereas "plaque" was the most difficult to recognize (only 21% of such lesions were correctly described). Red was the colour most often correctly recognized (70% of red lesions were correctly described). As concerns size, the investigators agreed with the GS mainly for lesions smaller than 5 mm or larger than 10 mm.

Finally, the agreement for the diagnosis was also moderate, with a κ of 0.45 (95% CI 0.38–0.52); absolute agreement with the GS was achieved in about 57% of the cases. Angiectasia was the diagnosis most often correctly identified, whereas a correct diagnosis of ulcer occurred with the lowest prevalence (Fig. 3).

Data concerning the agreement for lesion detection and diagnosis were analysed separately for the more common target pathologies (angiectasia, polyps and ulcer) to assess whether the type of diagnosis affects the extent of concordance with the GS. The presence of a lesion was correctly identified in all videos of patients with a diagnosis of angiectasia, and in 91.4% and 85.5% of videos relating to patients with polyps or ulcer, respectively (significant difference between diagnoses, p for trend < 0.01).
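For reference, the sensitivity and specificity values cited in this section (91%/59% for lesion detection, 83%/90% for bleeding; Fig. 1) follow the standard definitions; the brief check below is only an illustration using figures already reported above.

\[
\text{sensitivity}=\frac{TP}{TP+FN},\qquad \text{specificity}=\frac{TN}{TN+FP},\qquad \text{e.g. } \frac{6.5}{11}\approx 0.59
\]

where, for lesion detection, a "positive" observation is a video containing a lesion according to the GS, and 6.5/11 corresponds to the mean number of the 11 normal videos correctly read as normal per observer.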

Fig. 1. Lesion and bleeding identification. Prevalence of investigators’ findings according to the gold standard (GS).

Fig. 2. Shape (a), colour (b), and size (c) of the lesion. Prevalence of investigators’ findings according to the gold standard (GS).

Fig. 3. Prevalence of diagnoses reported by the investigators according to the ‘target’ diagnoses of the gold standard (GS).




Fig. 4. Multiple logistic regression analysis. Risk for disagreement (dependent variable) with the gold standard (GS) as concerns lesion identification (filled circles) and final diagnosis (empty triangles), mutually adjusted for the specific GS 'target' diagnoses (reference: absence of the specific diagnosis). Statistically significant associations (p < 0.05) are highlighted.

Table 3. Agreement between the investigators and the gold standard according to lower or higher experience in capsule endoscopy reading.

Item        Cohen's κ (95% CI), lower case volume/year   Cohen's κ (95% CI), higher case volume/year   p*
Lesion      0.24 (0.06–0.42)                              0.50 (0.30–0.71)                              <0.05
Bleeding    0.71 (0.54–0.88)                              0.77 (0.63–0.92)                              ns
Shape       0.40 (0.33–0.41)                              0.49 (0.39–0.58)                              ns
Colour      0.41 (0.37–0.42)                              0.42 (0.27–0.46)                              ns
Size        0.19 (0.10–0.25)                              0.36 (0.30–0.43)                              ns
Diagnosis   0.36 (0.28–0.50)                              0.52 (0.40–0.61)                              <0.05

* Statistical difference of absolute agreement by chi-square test; ns = not significant.
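The p-values in Table 3 refer to chi-square comparisons of absolute agreement between the lower- and higher-volume readers. The snippet below is a minimal, hypothetical sketch of such a comparison (the study itself used SPSS); the agree/disagree counts are invented for illustration, with 150 observations per group corresponding to 2 readers x 75 videos.

```python
# Illustrative only: chi-square comparison of absolute agreement (agree/disagree
# counts) between lower- and higher-volume readers; the counts are invented.
from scipy.stats import chi2_contingency

#               agree  disagree
table = [[ 95,   55],   # two lower-volume readers (150 observations)
         [120,   30]]   # two higher-volume readers (150 observations)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```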

As concerns the diagnosis, absolute agreement with the GS was significantly more frequent for videos of patients with a diagnosis of angiectasia (87.5%) than for videos of patients with a diagnosis of polyps (51.0%) or ulcer (44.2%) (p for trend < 0.001).

Logistic regression analyses quantify the risk of disagreement with the GS in the presence of angiectasia, polyps, or ulcer, compared with their absence. For lesion identification, this risk was about two-fold lower for patients with, rather than without, a diagnosis of polyps (odds ratio, OR 0.42, 95% CI 0.20–0.89, p = 0.02); the diagnosis of ulcer was also inversely associated with the risk of disagreement, but not significantly (OR 0.74, 0.41–1.33) (Fig. 4). As regards the final diagnosis, the risk of disagreement with the GS was about five-fold lower for patients with, rather than without, angiectasia (OR 0.20, 95% CI 0.09–0.43, p < 0.001), and about two-fold higher for patients with, rather than without, ulcer (OR 1.75, 1.14–2.67, p < 0.01). The risk of disagreement was also higher, but not significantly, for patients with a diagnosis of polyps (OR 1.33, 0.85–2.08) (Fig. 4).

The two investigators with the lowest case volume of CE per year showed significantly lower agreement than the two with the highest, both for lesion identification (κ 0.24, 95% CI 0.06–0.42 versus κ 0.50, 95% CI 0.30–0.71, p < 0.05) and for diagnosis (κ 0.36, 0.28–0.50 versus κ 0.52, 0.40–0.61, p < 0.05). The difference in experience did not significantly affect the agreement relating to bleeding or to lesion description. The results are summarized in Table 3.

Additionally, we evaluated interobserver agreement between the 8 investigators, independent of the GS (Table 2). In terms of agreement level, the results were generally similar to those found for agreement between the investigators and the GS, except for the shape of the lesion (fair rather than moderate agreement).
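The model behind the odds ratios reported above is an observation-level logistic regression of disagreement with the GS on indicators for the GS 'target' diagnoses. The sketch below (Python/statsmodels, with invented data; the original analysis was run in SPSS) is included only to make the model structure explicit; variable names are hypothetical.

```python
# Illustrative sketch only (the study used SPSS): risk of disagreement with the
# gold standard modelled on indicators for the GS 'target' diagnoses.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 600  # 8 investigators x 75 videos, as in the study

# Hypothetical observation-level data; in the real analysis these columns come
# from the reading forms and the gold-standard diagnoses.
df = pd.DataFrame({
    "disagree":    rng.integers(0, 2, n),  # 1 = reading differs from GS
    "angiectasia": rng.integers(0, 2, n),  # 1 = GS target diagnosis present
    "polyp":       rng.integers(0, 2, n),
    "ulcer":       rng.integers(0, 2, n),
})

model = smf.logit("disagree ~ angiectasia + polyp + ulcer", data=df).fit(disp=0)
print(np.exp(model.params))      # odds ratios for disagreement
print(np.exp(model.conf_int()))  # 95% confidence intervals
```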

5. Discussion

Our results seem to indicate some difficulty in interpreting CE findings even on the part of expert endoscopists, especially as concerns the description of the lesions. In the literature, the diagnostic yield of CE ranges from 31% to 74% across studies [23–26]. This wide range may be due to several factors, and we feel that the varying accuracy of endoscopists in detecting and describing small-bowel lesions could be a relevant issue. It is therefore very important to address interobserver variability when analysing small-bowel images: in studies on CE reading agreement, the interpretation of images is central and high precision in describing the findings is required. In our opinion, the main strength of the present study is the use of the same standardized terminology to describe CE findings, which allows a more accurate comparison of the reports; in previous studies it is often difficult to understand how the readers reported CE results. To date, the CEST terminology has been used in only one previous study [27].

A comparison with other studies is not straightforward because of methodological differences. We found, on average, moderate interobserver agreement (total mean κ = 0.48), in line with the results of previously published papers (mean κ = 0.50) [28]. A similar result was reported by Lai et al., who assessed the agreement of small-bowel diagnoses among three readers, including one specialist in gastroenterology (κ = 0.56) [10]. This result may be judged satisfactory, but not optimal. A recent study on CE reports based on CEST found better agreement between gastroenterology experts and a gold standard (κ = 0.64) [27]. Substantial agreement amongst three expert observers was also reported by Jensen et al. (κ = 0.68), but it concerned only the detection of small-bowel Crohn's disease [29].

In the present study, sensitivity for the detection of a lesion was very high (91%), in agreement with other studies [10,11], but specificity for recognizing normal mucosa was quite low (59%); this explains the moderate agreement with the GS for lesion detection (κ = 0.48). There is evidence that CE, in comparison with other tests, may increase the number of false positives [30]. This is one of the limitations of CE: findings of uncertain relevance can sometimes be noticed in healthy subjects [31]. In our study, false-positive findings were reported for 41% of videos with normal mucosa. Of these, the incorrect presence of polyps was recorded in only one case; in most normal cases (33%) the observers reported the presence of ulcers (or erosions). The lesions reported for normal videos were mainly flat (81%), white or similar to the mucosa in 31% of cases, and larger than 10 mm in only 14% of cases. Excessive precision in reporting small or minor mucosal abnormalities, which are clinically insignificant and not reported in routine clinical practice, could partially explain the low specificity for detecting normal status.

In agreement with other investigators [4,6], concordance was better in the presence of a target diagnosis of angiectasia than of polyps or ulcers. This result is relevant, since angiectasia is the most common finding revealed by CE [32], and it seems to permit a more specific diagnosis. The relatively low sensitivity for polyps (51%) in our study might confirm the conclusions of Matsumoto et al. [33], who noted that double-balloon enteroscopy appears to be superior to CE in the diagnosis of small-intestinal polyps.
There are conflicting results on interobserver agreement for ulcers and erosions. A study that compared the ability of CE to diagnose postoperative recurrence of Crohn's disease between two readers found almost perfect agreement for erosions, but considerably lower agreement for ileal ulcerations [34]. Another study reported that accuracy was higher in cases with ulcers and lower for erosions [27]. When we analysed ulcers and erosions separately, we found fair agreement for both, likely due, in part, to the relatively high number of false-positive findings.



Several studies show that CE identifies bleeding more accurately than other pathologies [6–10,27], and this was confirmed in our study by the substantial agreement with the GS and by both high sensitivity and specificity.

Lesion size was the category with the worst agreement (κ = 0.34). With the advent of double-balloon enteroscopy some ileal lesions can be removed endoscopically [35], and a correct definition of size is important for therapeutic decision-making. The difficulty in sizing small-bowel lesions is another clinical limitation of CE [31]; on the other hand, the poor ability of endoscopists to correctly determine the size of polyps is a well-known phenomenon even in conventional endoscopy [36,37].

Although our study involved only expert endoscopists, there was wide variability in interobserver agreement between each investigator and the GS. This might be partially explained by the noticeable variation in case volume among the investigators. In agreement with other studies [6,27], interobserver agreement was better among more experienced than less experienced endoscopists; the investigators with the lowest number of CE per year more often missed the lesion and made an incorrect diagnosis. Little is known about training in CE. The American Society for Gastrointestinal Endoscopy (ASGE) has established the threshold number of procedures at which competence can be expected in conventional endoscopy [38], but no specific indications are given about training in CE reading [39]. With the increasing use of this procedure in gastrointestinal departments, formal training is becoming an emerging issue, as outlined by our results and by recent data [40].

Our study has some limitations. The choice to show only a short segment of CE might be regarded as a major limitation, because it might affect the extent of agreement between the investigators and the GS as concerns the final diagnosis. Our investigators were unaware of the patients' history and, reviewing only a short segment of CE, were unable to express an opinion on the clinical significance of the findings. However, in comparison with studies based on observation of the complete CE transit, we found similar results regarding the extent of interobserver agreement in the final diagnosis. It should be pointed out that our main aim was to evaluate those aspects of interobserver agreement that have been poorly or never investigated in previous studies, namely the reader's ability to detect both positive and negative findings (presence/absence of any lesion) and to characterize detected lesions (size, colour, and shape); the evaluation of these aspects is not affected by the duration of the CE segment. Such a short segment also avoids comparing findings related to different scenarios. Last but not least, as reported above, we were able to collect a large body of observations from a relatively high number of observers, favouring reliable statistical analyses, without requiring a large expenditure of their time. Other previously published studies on interobserver agreement also analysed short segments of CE [6–27].

We evaluated the accuracy of the investigators against the target findings defined by an expert gastroenterologist, as other studies did [27,29]. Unfortunately, the final diagnosis was not confirmed by endoscopic, surgical or histological outcomes for all videos; this might be another limitation that explains the poor agreement for subjective categories such as size and shape.
Finally, the lack of CEST training might be criticized. All investigators in the present study were expert in CE reading, but we cannot exclude that such training could slightly increase the extent of agreement. However, in a recent study based on CEST, no training was performed, despite the fact that the study included trainees as well as experienced gastroenterologists [27].

In conclusion, our study confirms that there is a moderate degree of agreement in reporting CE findings, even though concordance increases in the presence of bleeding and angiectasia. Analyses with statistical power and high comparability of the findings, owing to the use of the same standardized lexicon, confirm the difficulty in interpreting CE images even for expert endoscopists.

Conflict of interest statement

None declared.

Acknowledgement

We thank Ms. Alison Milne for reviewing the English version of the manuscript.

References

[1] Meron GD. The development of the swallowable video capsule (M2A). Gastrointest Endosc 2000;52:817–9.
[2] Mishkin DS, Chuttani R, Croffie J, et al. ASGE Technology Status Evaluation Report: wireless capsule endoscopy. Gastrointest Endosc 2006;63:539–45.
[3] Marmo R, Rotondano G, Piscopo R, et al. Meta-analysis: capsule endoscopy vs. conventional modalities in diagnosis of small bowel diseases. Aliment Pharmacol Ther 2005;22:595–604.
[4] Saurin JC, Delvaux M, Gaudin JL, et al. Diagnostic value of endoscopic capsule in patients with obscure digestive bleeding: blinded comparison with video push-enteroscopy. Endoscopy 2003;35:576–84.
[5] Mylonaki M, Fritscher-Ravens A, Swain P. Wireless capsule endoscopy: a comparison with push enteroscopy in patients with gastroscopy and colonoscopy negative gastrointestinal bleeding. Gut 2003;52:1122–6.
[6] De Leusse AVI, Landi B, Burtini P, et al. Video capsule endoscopy (CE) for obscure gastrointestinal bleeding: feasibility, diagnostic yield and interobserver agreement. Gastroenterology 2003;124:A245.
[7] Mergener K, Enns R. Interobserver variability for reading capsule endoscopy examinations. Gastrointest Endosc 2003;57:AB85.
[8] Sigmundsson H, Das A, Isenberg G. Capsule endoscopy (CE): interobserver comparison of interpretation. Gastrointest Endosc 2003;57:AB165.
[9] Sneha J, Deepak K, Crawford PJ. Inter-observer variations on interpretation of capsule endoscopy and its impact on training requirements for competence. Gastrointest Endosc 2008;67:AB301.
[10] Lai LH, Wong GL, Chow DK, et al. Inter-observer variations on interpretation of capsule endoscopies. Eur J Gastroenterol Hepatol 2006;18:283–6.
[11] De Leusse A, Landi B, Edery J, et al. Video capsule endoscopy for investigation of obscure gastrointestinal bleeding: feasibility, results and interobserver agreement. Endoscopy 2005;37:617–21.
[12] Levinthal GN, Burke CA, Santisi JM. The accuracy of an endoscopy nurse in interpreting capsule endoscopy. Am J Gastroenterol 2003;98:2669–71.
[13] Niv Y, Niv G. Capsule endoscopy examination: preliminary review by a nurse. Dig Dis Sci 2005;50:2121–4.
[14] Chen GC, Enayati P, Tran T, et al. Sensitivity and inter-observer variability for capsule endoscopy image analysis in a cohort of novice readers. World J Gastroenterol 2006;12:1249–54.
[15] Petroniene R, Dubcenco E, Baker JP, et al. Given capsule endoscopy in celiac disease: evaluation of diagnostic accuracy and interobserver agreement. Am J Gastroenterol 2005;100:685–94.
[16] Biagi F, Rondonotti E, Campanella J, et al. Video capsule endoscopy and histology for small-bowel mucosa evaluation: a comparison performed by blinded observers. Clin Gastroenterol Hepatol 2006;4:998–1003.
[17] Korman LY, Delvaux M, Gay G, et al. Capsule Endoscopy Structured Terminology (CEST): proposal of a standardized and structured terminology for reporting capsule endoscopy procedures. Endoscopy 2005;37:951–9.
[18] Delvaux M, Friedman S, Keuchel M, et al. Structured terminology for capsule endoscopy: results of retrospective testing and validation in 766 small-bowel investigations. Endoscopy 2005;37:945–50.
[19] Landis JR, Koch GG. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 1977;33:363–74.
[20] Fleiss JL. The measurement of interobserver agreement. In: Statistical methods for rates and proportions. New York: John Wiley; 1981. p. 212–36.
[21] Strijbos J, Martens R, Prins F, Jochems W. Content analysis: what are they talking about? Comput Educ 2006;46:29–48.
[22] Altaye M, Donner A, Eliasziw M. A general goodness-of-fit approach for inference procedures concerning the kappa statistic. Stat Med 2001;20:2479–88.
[23] Costamagna G, Shah SK, Riccioni ME, et al. A prospective trial comparing small bowel radiographs and video capsule endoscopy for suspected small bowel disease. Gastroenterology 2002;123:999–1005.
[24] Ell C, Remke S, May A, et al. The first prospective controlled trial comparing wireless capsule endoscopy with push enteroscopy in chronic gastrointestinal bleeding. Endoscopy 2002;34:685–9.
[25] Pennazio M, Santucci R, Rondonotti E, et al. Outcome of patients with obscure gastrointestinal bleeding after capsule endoscopy: report of 100 consecutive cases. Gastroenterology 2004;126:643–53.

[26] Mata A, Bordas JM, Feu F, et al. Wireless capsule endoscopy in patients with obscure gastrointestinal bleeding: a comparative study with push enteroscopy. Aliment Pharmacol Ther 2004;15:189–94.
[27] Jang BI, Lee SH, Moon JS, et al. Inter-observer agreement on the interpretation of capsule endoscopy findings based on capsule endoscopy structured terminology: a multicenter study by the Korean Gut Image Study Group. Scand J Gastroenterol 2010;45:370–4.
[28] Mergener K, Ponchon T, Gralnek I, et al. Literature review and recommendations for clinical application of small-bowel capsule endoscopy, based on a panel discussion by international experts. Endoscopy 2007;39:895–909.
[29] Jensen MD, Nathan T, Kjeldsen J. Inter-observer agreement for detection of small bowel Crohn's disease with capsule endoscopy. Scand J Gastroenterol 2010;45:878–84.
[30] Lashner BA. Sensitivity-specificity trade-off for capsule endoscopy in IBD: is it worth it? Am J Gastroenterol 2006;101:965–6.
[31] Muñoz-Navas M. Capsule endoscopy. World J Gastroenterol 2009;15:1584–6.
[32] Sturniolo GC, Di Leo V, Vettorato MG, et al. Small bowel exploration by wireless capsule endoscopy: results from 314 procedures. Am J Med 2006;119:341–7.
[33] Matsumoto T, Esaki M, Moriyama T, et al. Comparison of capsule endoscopy and enteroscopy with the double-balloon method in patients with obscure bleeding and polyposis. Endoscopy 2005;37:827–32.


[34] Bourreille A, Jarry M, D'Halluin PN, et al. Wireless capsule endoscopy versus ileocolonoscopy for the diagnosis of postoperative recurrence of Crohn's disease: a prospective study. Gut 2006;55:978–83.
[35] Yamamoto H, Yano T, Kita H, et al. New system of double-balloon enteroscopy for diagnosis and treatment of small intestinal disorders. Gastroenterology 2003;125:1556.
[36] Fennerty MB, Davidson J, Emerson SS, et al. Are endoscopic measurements of colonic polyps reliable? Am J Gastroenterol 1993;88:496–500.
[37] Gopalswamy N, Shenoy VN, Choudhry U. Is in vivo measurement of size of polyps during colonoscopy accurate? Gastrointest Endosc 1997;46:497–502.
[38] American Society for Gastrointestinal Endoscopy. Principles of training in gastrointestinal endoscopy. Gastrointest Endosc 1999;49:845–53.
[39] Faigel DO, Baron TH, Adler DG, et al. ASGE guideline: guidelines for credentialing and granting privileges for capsule endoscopy. Gastrointest Endosc 2005;61:503–5.
[40] Sidhu R, Sakellariou P, McAlindon ME, et al. Is formal training necessary for capsule endoscopy? The largest gastroenterology trainee study with controls. Dig Liver Dis 2008;40:298–302.