Stigmata of hemorrhage in bleeding peptic ulcers: an interobserver agreement study among international experts

James Y.W. Lau, MBBS, FRCS(Edin), Joseph J.Y. Sung, MBBS, MRCP(UK), PhD, Angus C.W. Chan, MBChB, FRCS(Edin), Grace W.Y. Lai, MSc(Oxon), Joseph T.F. Lau, PhD (Cal), Enders K.W. Ng, MBChB, FRCS(Edin), S.C. Sydney Chung, MD, FRCS(Ed), MRCP(UK), FRCP(Ed), Arthur K.C. Li, MA, MD, FRCS, FRCS(Ed), FRACS

Shatin, Hong Kong

Background: Stigmata of hemorrhage predict rebleeding and outcome of patients with bleeding peptic ulcers. There are variabilities in reported incidences of stigmata and their respective rebleeding risks. We sought to study the interobserver agreement among experts.

Methods: Between June 1994 and July 1994, 100 consecutive patients with bleeding peptic ulcers underwent videoendoscopy within 24 hours of their admissions. An edited videotape of these ulcers was compiled and sent to an international panel of 14 experts. They independently rated these ulcers exclusively into one of the six categories: spurting, oozing, nonbleeding visible vessel, adherent clot, flat pigmented spot, or clean based. Agreement between any two experts was expressed by a kappa estimate (K). Agreements over individual stigmata and a composite K estimate (Kw) signifying overall agreement were also computed.

Results: Out of the possible 91 pairwise K estimates among 14 experts, 35 (38.5%) were less than or equal to 0.40, indicating poor agreement. None of the K estimates was greater than 0.75. Composite K estimates for individual stigmata were as follows: spurting K = 0.664, oozing K = 0.420, nonbleeding visible vessel K = 0.342, adherent clot K = 0.426, flat pigmented spot K = 0.393, and clean-based ulcer K = 0.371. The weighted K estimate was 0.426.

Conclusion: Agreement between experts was poor in more than a third of occasions. Although the overall interobserver agreement was fair (0.4 < K < 0.75), agreements for nonbleeding visible vessels, flat pigmented spots, and clean-based ulcers were poor. (Gastrointest Endosc 1997;46:33-6.)
Received July 16, 1996. For revision September 19, 1996. Accepted January 28, 1997.
From the Departments of Surgery and Medicine, Center of Clinical Trials and Epidemiological Research, Prince of Wales Hospital, The Chinese University of Hong Kong, Shatin, Hong Kong.
Presented in part at the annual meeting of the American Society for Gastrointestinal Endoscopy, May 1995, San Diego, California (Gastrointest Endosc 1995;41:368).
Reprint requests: Sydney S.C. Chung, Surgery, Prince of Wales Hospital, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
Copyright © 1997 by the American Society for Gastrointestinal Endoscopy 0016-5107/97/$5.00 + 0 37/1/81576
VOLUME 46, NO. 1, 1997

Two decades ago, Forrest et al. 1 described endoscopic findings in patients with bleeding peptic ulcers. They were categorized into those with active bleeding, signs of recent bleeding, or no sign of bleeding. The use of endoscopic signs in the prediction of rebleeding and mortality has been common. Reports on the assessment of various endoscopic treatments have been based on modifications of the above nomenclature. However, large discrepancies exist over the reported prevalence of these endoscopic signs and their respective rebleeding risks. This may be attributed to the variable interpretations of these signs. The present study attempts to elucidate the interobserver agreement over endoscopic signs in bleeding peptic ulcers among international experts.
PATIENTS AND METHODS

Between June 1994 and July 1994, 100 consecutive patients with peptic ulcer bleeding who presented to the Prince of Wales Hospital underwent endoscopy (GIFXQ200 with the EVIS-200 system, Olympus Co., Tokyo, Japan) within 24 hours of admission. Video images were recorded on U-matic tapes (U-matic SP, VO-9600P videorecorder, Sony, Tokyo, Japan). An edited videotape in the VHS format consisting of 20-second clips of these 100 bleeding ulcers was compiled and sent to 14 investigators with interests in endoscopic ulcer hemostasis (Table 1). They independently rated these ulcers exclusively into one of six categories: ulcers with (1) spurting or (2) oozing hemorrhage, ulcers with (3) nonbleeding visible vessels, (4) adherent clots, or (5) flat pigmented spots, and (6) clean-based ulcers.

Table 1. An international panel of 14 experts participated in the study

Name            Country
P.C. Bornman    South Africa
F.J. Branicki   Australia
S.C.S. Chung    Hong Kong
P.B. Cotton     United States
M.L. Freeman    United States
D.M. Jensen     United States
H.J. Lin        Taiwan
P. Rutgeerts    Belgium
N. Soehendra    Germany
J.D. Sollano    Philippines
R.J.C. Steele   England
D.W. Storey     Australia
C. Sugawa       United States
C.P. Swain      England

The interobserver agreement over endoscopic signs in bleeding peptic ulcers was analyzed using kappa statistics (K). Pairwise K estimates were calculated to evaluate agreement between any two endoscopists. 2, 3 An individual K estimate was then computed for each of the six categories. 3 These estimates represented the degree of agreement among endoscopists in each category. A weighted K estimate (Kw) signifying a single overall measure of interobserver agreement in the suggested nomenclature was computed using the method proposed by Landis and Koch. 4 It represented the weighted average of the K estimates for the individual categories. Each K value and its standard error were estimated, and a test of significance was performed of the null hypothesis of chance agreement alone (H0: K = 0) against the alternative hypothesis that the interobserver agreement was better than the degree of agreement by chance alone. The hypothesis that the underlying value of K is 0 is tested by referring the value of the critical ratio (z = K/standard error of K) to the tables of the standard normal distribution. A sufficiently large z rejects the hypothesis. K estimates computed represent agreement over and above the agreement that could have occurred by chance alone. The maximum value of K is 1, indicating perfect diagnostic concordance. A value below zero results when the observed agreement falls below agreement expected by chance alone.
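As an illustration of the statistics described above, the following Python sketch computes a two-rater K and the per-category and overall multi-rater K. The paper does not print its formulas, so both forms are assumptions taken from the standard textbook treatment of kappa: the two-rater form compares observed with chance-expected agreement, and the multi-rater form combines per-category K values with weights p(1 - p). The function names and the toy ratings are hypothetical and are not the study data.

```python
from collections import Counter


def pairwise_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters who labelled the same ulcers."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty set of ulcers")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance, from each rater's own category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (observed - expected) / (1.0 - expected)


def category_kappas(counts):
    """Per-category and overall kappa for many raters (assumed textbook form).

    counts[i][j] is the number of raters who placed ulcer i in category j.
    Returns (per_category, overall); overall is the p*(1-p)-weighted average.
    """
    if not counts:
        raise ValueError("at least one ulcer is required")
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every ulcer must be rated by the same number (>= 2) of raters")
    per_category, numerator, denominator = [], 0.0, 0.0
    for j in range(len(counts[0])):
        p = sum(row[j] for row in counts) / (n_subjects * n_raters)
        pq = p * (1.0 - p)
        if pq == 0.0:
            per_category.append(None)  # category never used, or used for every rating
            continue
        disagreement = sum(row[j] * (n_raters - row[j]) for row in counts)
        kappa_j = 1.0 - disagreement / (n_subjects * n_raters * (n_raters - 1) * pq)
        per_category.append(kappa_j)
        numerator += pq * kappa_j
        denominator += pq
    overall = numerator / denominator if denominator else None
    return per_category, overall


if __name__ == "__main__":
    # Toy ratings for four ulcers by two hypothetical raters.
    rater_1 = ["spurting", "oozing", "clean based", "clean based"]
    rater_2 = ["spurting", "clean based", "clean based", "clean based"]
    print(pairwise_kappa(rater_1, rater_2))
    # Toy counts: two ulcers, three raters, two categories.
    print(category_kappas([[3, 0], [1, 2]]))
```

In the study's setting, `counts` would have 100 rows (ulcers), six columns (categories), and each row would sum to 14 (raters).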
A higher score indicates less disagreement whereas wide disagreement gives rise to a low score. Interpretation of the K estimate (K) is as follows: K = 1 indicates perfect diagnostic concordance; K ≥ 0.75 indicates excellent agreement beyond chance; 0.40 < K < 0.75 indicates fair to good agreement beyond chance; and 0 ≤ K ≤ 0.40 indicates poor agreement beyond chance.
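The interpretation bands above can be written as a small lookup. This is a minimal sketch: the function name and the returned labels are illustrative, and the label for negative values is an assumption based on the statement that K falls below zero when observed agreement is less than chance agreement.

```python
def interpret_kappa(kappa):
    """Map a kappa estimate to the agreement bands used in this paper."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa == 1.0:
        return "perfect"
    if kappa >= 0.75:
        return "excellent"
    if kappa > 0.40:
        return "fair to good"
    if kappa >= 0.0:
        return "poor"
    # Assumed label: the paper only says such values mean less agreement than chance.
    return "below chance"


if __name__ == "__main__":
    for value in (0.664, 0.426, 0.342):
        print(value, interpret_kappa(value))
```

Note that the boundary value 0.40 falls in the "poor" band, as the bands are stated in the text.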
RESULTS

All endoscopists participated in the study. Pairwise K estimates are expressed in the cells of a 14 × 14 matrix table (Table 2). Out of the possible 91 pairwise K estimates among 14 experts, the value was below or equal to 0.40 on 35 occasions (38.5%), signifying poor interobserver agreement between two given endoscopists. All the other K estimates fell within the range of 0.40 and 0.75, indicating fair to good agreement. None of the K estimates was greater than 0.75.

An individual K estimate was then computed for each of the six categories in the proposed nomenclature (Table 3). Good agreement beyond chance (K = 0.664) was obtained for ulcers with spurting hemorrhage. Fair agreement beyond chance was obtained for ulcers with oozing hemorrhage and adherent clots with K = 0.420 and 0.426, respectively. Agreements for nonbleeding visible vessels, flat pigmented spots, and clean-based ulcers were poor, with K = 0.342, 0.393, and 0.371, respectively. The weighted K estimate (Kw) among the six categories was 0.426 ± 0.005, indicating fair overall agreement. Each K estimate was tested for its departure from the value of 0, which signified chance agreement alone. All K estimates were significantly greater than 0. In addition, the magnitudes of the z-scores for the individual categories and the overall measure indicated good reliability of the estimates.
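The arithmetic behind these results can be checked with a short Python sketch. The pair count and percentage follow directly from the text. The z-score part rests on an assumption: the paper does not print its standard-error formula, so the sketch uses the null-hypothesis standard error sqrt(2 / (N n (n - 1))) commonly given for a single category in multi-rater kappa (N ulcers, n raters). Under that assumption the recomputed critical ratios come out within about 0.05 of the z-scores reported in Table 3, the residual being consistent with K having been rounded to three decimals. The variable names are illustrative.

```python
from math import comb, sqrt

N_RATERS = 14
N_ULCERS = 100

# Number of distinct rater pairs and the share with poor agreement (K <= 0.40).
pairs = comb(N_RATERS, 2)
poor_share = 35 / pairs
print(f"{pairs} pairs, {100 * poor_share:.1f}% with poor agreement")

# ASSUMPTION: per-category standard error under the null hypothesis K = 0.
# This formula is not stated in the paper; it is the common textbook form.
se_null = sqrt(2 / (N_ULCERS * N_RATERS * (N_RATERS - 1)))

# (K estimate, reported z-score) per endoscopic sign, as given in Table 3.
reported = {
    "Spurting hemorrhage": (0.664, 63.356),
    "Oozing hemorrhage": (0.420, 40.086),
    "Nonbleeding visible vessel": (0.342, 32.643),
    "Adherent clot": (0.426, 40.610),
    "Flat pigmented spot": (0.393, 37.532),
    "Clean based": (0.371, 35.382),
}

for sign, (kappa, z_reported) in reported.items():
    z = kappa / se_null  # critical ratio z = K / standard error of K
    print(f"{sign:28s} K = {kappa:.3f}  z = {z:7.3f}  (reported {z_reported})")
```

Because the assumed standard error depends only on the numbers of ulcers and raters, it is the same for every category, which is why the reported z-scores are proportional to the K estimates.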
Table 2. Pairwise K estimates (K) among 14 international experts in the rating of 100 ulcers

Rater   R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     R11     R12     R13     R14
R1      1
R2      0.267*  1
R3      0.168*  0.367*  1
R4      0.253*  0.462   0.317*  1
R5      0.298*  0.399*  0.407   0.562   1
R6      0.426   0.500   0.368*  0.419   0.454   1
R7      0.398*  0.502   0.467   0.418   0.431   0.440   1
R8      0.331*  0.641   0.487   0.448   0.456   0.594   0.621   1
R9      0.313*  0.553   0.414   0.388*  0.422   0.557   0.513   0.573   1
R10     0.349*  0.496   0.296*  0.412   0.407   0.439   0.485   0.534   0.532   1
R11     0.344*  0.474   0.380*  0.400*  0.383*  0.511   0.529   0.616   0.520   0.429   1
R12     0.365*  0.493   0.314*  0.364*  0.360*  0.394*  0.537   0.508   0.340*  0.484   0.414   1
R13     0.257*  0.579   0.400   0.323*  0.342*  0.447   0.395*  0.567   0.419   0.455   0.332*  0.401   1
R14     0.363*  0.384*  0.290*  0.451   0.404   0.434   0.480   0.407   0.470   0.464   0.380*  0.344*  0.374*  1

*K estimate < 0.4, indicating poor agreement beyond chance.

Table 3. K estimates (K) over individual stigmata and the weighted K (Kw) estimate indicating the overall agreement

Endoscopic signs              K estimate (K)   Z-score   Variance
Spurting hemorrhage           0.664            63.356    0.076
Oozing hemorrhage             0.420            40.086    0.095
Nonbleeding visible vessel    0.342            32.643    0.095
Adherent clot                 0.426            40.610    0.096
Flat pigmented spot           0.393            37.532    0.096
Clean based                   0.371            35.382    0.067
Weighted K (Kw ± S.D.)        0.426 ± 0.005    87.270

DISCUSSION

Endoscopic appearances of bleeding peptic ulcers provide valuable prognostic information. They predict rebleeding risks and subsequent outcome in patients. 5-9 In the era of therapeutic endoscopy, their identification becomes essential for formulation of treatment strategies; ulcers with active bleeding and high-risk stigmata mandate treatment and those with low-risk stigmata or a clean base can be left untreated. The findings of stigmata of recent hemorrhage in an ulcer crater would also pinpoint the source of bleeding and allow accurate placement of hemostatic devices.

The incidence and risk of rebleeding associated with these stigmata vary widely in the literature. Their reported prevalence varied from 5.6% to 57.8% 8-14 and their incidence of rebleeding had an even wider range of from 0% to 100%. ~, 8, 9, 11, 15-24 The discrepancy may be explained by the variable visual interpretation among endoscopists. The N.I.H. Consensus Conference on peptic ulcer bleeding called for better standardization on terminology of various stigmata. 25 Laine et al. 26 conducted a survey at the postgraduate course of the American College of Gastroenterologists. Respondents were shown slides of bleeding ulcers with stigmata. They disagreed in more than a quarter of cases in the labeling of ulcer features. The diagnostic accuracy appeared to be related to the endoscopists' experience and was shown to improve after a brief session of teaching. This particular study was based on the assumption that the investigators' diagnosis represented the correct answers.

Our study attempted to elucidate the interobserver agreement among international experts. It was not designed to arrive at a consensus of correct ulcer labeling among experts. It was hoped that video recordings provided a closer resemblance to real-life situations. The interpretation of edited videotapes, however, entails limitations, some of which were pointed out to us by our international panel of experts. First, some panelists commented that a 20-second recording was too short for full assessment of some of the more difficult ulcers. Second, some expressed the wish to wash and unveil some ulcers for better viewing before committing to a diagnosis. Views before and after endoscopic washings were included in some but not all ulcers. Third, respondents were forced to commit to a diagnosis by virtue of the study design even when they considered it equivocal. These limitations rendered interpretation of stigmata more difficult and might have exaggerated the disagreement among respondents.

Some respondents disagreed with the nomenclature used in the study and suggested their own. There were diverse opinions among panelists as to how endoscopic signs of bleeding peptic ulcers should be classified. One panelist suggested that bleeding activity should be graded into ooze, mild, moderate, and severe. A separate category of "slough" should be introduced. Another panelist chose to ignore red and black dots and placed them in the category of clean-based ulcers. He also argued that clots not covering entire ulcer craters should be construed as visible vessels. A novel classification based on the bleeding activity was suggested by another panelist. Ulcers were categorized into three groups: ones with (1) arterial bleeding (pulsatile), (2) active oozing, or (3) no active bleeding. The ones with active oozing or no active bleeding were further divided according to their associated features: those with visible vessels or protruding thrombus, those with adherent clot, or those with flat ulcer base.

The findings of the present report nevertheless confirm the variability among experts in the interpretation of endoscopic appearances in bleeding peptic ulcers. Given any two endoscopists, their agreement was poor more than one third of the time. Although the overall interobserver agreement was fair, agreements over visible vessels, flat pigmented spots, and clean-based ulcers were poor. Extrapolating the disagreement into clinical situations, stigmata considered dangerous by one endoscopist may be disregarded by another. Conversely, stigmata considered benign by one endoscopist may be labeled dangerous and treated by another. The variable agreement also raises concern over the validity of past reports over stigmata of hemorrhage in bleeding peptic ulcers and, in particular, when one generalizes findings from one report to another.

The interpretation of endoscopic signs in bleeding peptic ulcers is highly subjective and observer dependent. In the N.I.H. Consensus Conference, a visible vessel was defined as a pigmented protuberance in the ulcer crater. 25 The appreciation of protuberances or elevations from ulcer floors can be subtle; they can be alternately labeled as a flat pigmentation when seen en face. The distinction between nonbleeding visible vessels and clots can also be difficult because both are protuberant. In addition, both nonbleeding visible vessels and clots are likely to represent heterogeneous conditions. Vessels vary in size, depth, and location. The definition of adherent clots, on the other hand, varies with endoscopic washing techniques. Nevertheless, there is little doubt that endoscopic signs in bleeding peptic ulcers are of prognostic importance.
The correct interpretation of high-risk stigmata leads to early endoscopic treatment and improves patient outcome. Consensus development panels among institutions may improve uniformity in labeling stigmata of hemorrhage in bleeding peptic ulcers.

ACKNOWLEDGMENT

We thank our panel of international experts for taking part in the study.

REFERENCES

1. Forrest JAH, Finlayson NDC, Shearman DJC. Endoscopy of upper gastrointestinal bleeding. Lancet 1974;2:394-7.
2. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull 1971;76:378-82.
3. Fleiss JL. The measurement of interrater agreement in statistical methods for rates and proportions. 2nd ed. New York: Wiley; 1981.
4. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
5. Foster DN, Miloszewski KJA, Losowsky MS. Stigmata of recent hemorrhage in diagnosis and prognosis of upper gastrointestinal bleeding. BMJ 1978;1:1173-7.
6. Griffiths WH, Neumann DA, Welsh JD. The visible vessel as an indicator of uncontrolled or recurrent gastrointestinal hemorrhage. N Engl J Med 1979;300:1411-3.
7. Storey DW, Bown SG, Swain CP. Endoscopic prediction of recurrent bleeding in peptic ulcers. N Engl J Med 1981;305:915-6.
8. Wara P. Endoscopic prediction of major rebleeding--a prospective study of stigmata of hemorrhage in bleeding ulcer. Gastroenterology 1985;88:1209-14.
9. Chung SCS, Leung JWC, Lo KK, So LYS, Li AKC. Natural history of the sentinel clot: an endoscopic study [abstract]. Gastroenterology 1990;98:A31.
10. Vallon AG, Cotton PB, Laurence BH, et al. Randomized trial of endoscopic argon laser photocoagulation in bleeding peptic ulcers. Gut 1981;22:228-33.
11. Bornman PC, Theodorou NA, Shuttleworth RD, et al. Importance of hypovolaemic shock and endoscopic signs in predicting recurrent haemorrhage from peptic ulceration: a prospective evaluation. BMJ 1985;291:245-7.
12. Swain CP, Kirkham JS, Salmon PR, et al. Controlled trial of Nd-YAG laser photocoagulation in bleeding peptic ulcers. Lancet 1986;1:1113-6.
13. O'Brien JD, Day SJ, Burnham WR. Controlled trial of small bipolar probe in bleeding peptic ulcers. Lancet 1986;1:464-7.
14. Matthewson K, Pugh S, Northfield TC. Which peptic ulcer patients bleed? Gut 1988;29:70-4.
15. Papp JP. Endoscopic electrocoagulation in the management of upper gastrointestinal tract bleeding. Surg Clin North Am 1982;62:797-805.
16. Panes J, Viver J, Forne M. Controlled trial of endoscopic sclerosis in bleeding peptic ulcers. Lancet 1987;2:1292-4.
17. Jensen DM, Machicado G, Kovacs T, et al. Controlled randomized study of heater probe and BICAP for hemostasis of severe ulcer bleeding [abstract]. Gastroenterology 1988;94:A208.
18. Swain CP, Bown SG, Storey DW, et al. Controlled trial of argon laser photocoagulation in bleeding peptic ulcers. Lancet 1981;2:1313-6.
19. Freitas D, Donato A, Monteiro JG. Controlled trial of liquid monopolar electrocoagulation in bleeding peptic ulcers. Am J Gastroenterol 1985;80:853-7.
20. Laine L. Multipolar electrocoagulation for the treatment of ulcers with non-bleeding visible vessels: a prospective, controlled trial [abstract]. Gastroenterology 1988;94:A246.
21. Buset M, Des Marez B, Vandermeeren A, et al. Laser therapy for non-bleeding visible vessels in peptic ulcer hemorrhage: a prospective randomized study. Gastrointest Endosc 1988;34:173.
22. Moreto M, Zaballa M, Ibanez S, et al. Efficacy of monopolar electrocoagulation in the treatment of bleeding gastric ulcer: a controlled trial. Endoscopy 1987;19:54-6.
23. Krejs GJ, Little KH, Westergaard LT, et al. Laser photocoagulation for the treatment of acute peptic ulcer bleeding: a randomized controlled clinical trial. N Engl J Med 1987;316:1618-21.
24. Chang-Chien C, Wu C, Chen P, et al. Different implications of stigmata of recent hemorrhage in gastric and duodenal ulcers. Dig Dis Sci 1988;33:400-4.
25. Therapeutic endoscopy and bleeding ulcers. JAMA 1989;262:1369-72.
26. Laine L, Freeman M, Cohen H. Lack of uniformity in evaluation of endoscopic prognostic features of bleeding ulcers. Gastrointest Endosc 1994;40:411-7.