Observer Variation in Assessment of Jejunal Biopsy Specimens


GASTROENTEROLOGY 1982;83:1217-22

A Comparison Between Subjective Criteria and Morphometric Measurement

GINO ROBERTO CORAZZA, FIORENZA BONVICINI, MARZIO FRAZZONI, MARISA GATTO, and GIOVANNI GASBARRINI

III Patologia Medica, Policlinico Universitario S. Orsola, Bologna, Italy; Istituto di Igiene, Universita di Bologna, Bologna, Italy

In an observer variation study, the agreement rate reached by a pair of observers who measured 85 jejunal biopsy specimens using surface/volume ratio was compared with that reached by a pair of observers who subjectively assessed the same biopsy series. Both the agreement reached between the subjective observers and that reached between the objective observers proved significant by kappa statistics. However, both total and partial kappa values were greater for the pair of objective observers in every diagnostic category. Moreover, the percentage of agreement on the grading of all biopsy specimens, independent of the diagnostic categories, proved significantly higher for the objective observers. In the pair who made a qualitative assessment, interobserver variation was particularly evident in the grading of partial villous atrophy. By contrast, in the pair who made a quantitative assessment, agreement was present in all but one slide. The kappa values for the observers examining the same biopsy specimens on two separate occasions indicated closer agreement rates for the two objective observers. It is concluded that morphometric measurement is more reliable than subjective criteria in the assessment of jejunal biopsy specimens and that qualitative assessment of biopsy specimens should be coupled with quantitative histology.

Received November 24, 1981. Accepted July 9, 1982. Address requests for reprints to: G. R. Corazza, M.D., III Patologia Medica, Policlinico S. Orsola, 40138 Bologna, Italy. This work was presented in part at the XXIII Congress of the Italian Society of Gastroenterology, Bologna, Italy, July 1981. The authors are grateful to Dr. M. F. Dixon, Department of Pathology, University of Leeds, United Kingdom, for his critical review. © 1982 by the American Gastroenterological Association 0016-5085/82/121217-06$02.50

Although jejunal mucosal morphometry has been widely used both in gluten-sensitive enteropathy (1-8) and in many other conditions (9-16), the claims of greater reliability of morphometry over simple subjective evaluation have never been adequately assessed. The purpose of this study was (a) to compare the agreement rate reached by a pair of observers who measured the jejunal biopsy specimens by a morphometric technique with that reached by a pair of observers who subjectively assessed the same biopsy series and (b) to evaluate the intraobserver variation for each of the four observers.

Materials and Methods

Intestinal Biopsy Specimens

Ninety-six jejunal biopsy specimens were considered for the study, but 11 were discarded according to the criteria of Perera et al. (17) because they were too small or poorly oriented. All the biopsy specimens were taken from the region of the ligament of Treitz using the adult Crosby capsule as part of the routine diagnostic procedure. Sections, fixed in buffered formalin, were processed by standard methods and, after paraffin embedding, were cut at 4-μm thickness and stained with hematoxylin-eosin. The final clinical diagnoses of the 85 patients whose biopsy specimens were included in the study were: 18 untreated gluten-sensitive enteropathy, 23 gluten-sensitive enteropathy on gluten-free diet, 2 dermatitis herpetiformis, 2 Crohn's disease, 1 Whipple's disease, 1 hypogammaglobulinemia, 1 giardiasis, and 37 subjects whose biopsy specimens excluded evidence of diseases of the small intestine. However, the four independent observers were not given any clinical information, as its interpretation could have proven a further source of variation among the observers (18). The 85 slides were therefore identified only by a code number.


Histological Assessment

Two observers (A and B, henceforth named subjective observers) with several years of experience in small bowel histopathology evaluated the biopsy specimens subjectively, grading the slides as normal, i.e., finger- or leaf-shaped villi; partial villous atrophy (PVA), i.e., villouslike structure still detectable; and subtotal villous atrophy (SVA), i.e., villi no longer detectable. Two observers (C and D, henceforth named objective observers), one trainee and one with several years of experience in small bowel diseases, objectively assessed the slides according to the method of Dunnill and Whitehead (2) using a Weibel graticule (Graticules Ltd., Tunbridge Wells, Kent, U.K.) inserted in the microscope eyepiece. The magnification was kept constant throughout the study (×125). At this magnification the length (l) of each line cast on the section was measured with a stage micrometer (l = 1.7 × 10^-2 cm). A record was made of the number of times the lines cut the mucosal surface (c) and the number of hits (h), i.e., end points of the lines falling on the tissue section between the covering epithelium and the muscularis mucosae (Figure 1). For each specimen the number of hits counted was approximately 200. The ratio c/lh was then calculated and regarded as a measure of surface area to volume ratio.
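The calculation of the ratio can be illustrated with a short Python sketch. It assumes that c/lh denotes c divided by the product of l and h, which is our reading of the notation rather than a formula spelled out in the text. The function name and the example counts are hypothetical and are not data from the study.

```python
def surface_to_volume_ratio(cuts: int, hits: int, line_length_cm: float = 1.7e-2) -> float:
    """Return c / (l * h) for one biopsy specimen.

    cuts           -- number of times the graticule lines cut the mucosal surface (c)
    hits           -- number of line end points falling on the tissue (h)
    line_length_cm -- length of each graticule line at the working magnification (l);
                      the default is the value quoted in the text
    """
    if hits <= 0 or line_length_cm <= 0:
        raise ValueError("hits and line length must be positive")
    return cuts / (line_length_cm * hits)


# Hypothetical counts for illustration only (not taken from the study).
ratio = surface_to_volume_ratio(cuts=290, hits=200)
print(f"c/lh = {ratio:.1f}")
```

With about 200 hits per specimen, as in the study, the ratio is driven mainly by the number of cuts, which falls as the villi flatten.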


Interobserver and Intraobserver Variation

Variation between the subjective observers (A and B) was calculated by comparing their pathologic diagnoses made as described above. The objective observers' (C and D) measurements of the c/lh ratio reached a high correlation coefficient (rs = 0.989, p < 0.001) (Figure 2). Then observer D graded the slides histologically into three groups called normal (40 slides), intermediate (23 slides), and atrophic (22 slides) in accordance with Dunnill and Whitehead (2). In fact, these categories correspond to the groups of normal, PVA, and SVA, respectively. For each of the groups the mean of the c/lh values obtained by observer D was calculated (normal: mean = 86.26, range = 57.6-140.5; intermediate: mean = 29.35, range = 17.6-49.8; atrophic: mean = 9.33, range = 1.3-15.6). The values for the three groups were significantly different (rank sum test, p < 0.001), and there was no overlap. The lower limit of the normal range and the upper limit of the atrophic range were used to partition the c/lh values obtained by observer C into three corresponding groups. This enabled us to compare the results of the two objective observers for the agreement in the diagnosis of each group.

All the observers blindly reexamined 16 slides chosen at random after an interval of 3 mo, and their results were compared with those obtained at the first examination.

Statistical Analysis

The degree of agreement on a single diagnosis was measured as (a) overall agreement, i.e., the proportion of all slides on which the observers agreed on the presence or absence of such a diagnosis, and as (b) specific agreement, i.e., the proportion of slides on which the observers agreed on the presence of the diagnosis sought (19). Overall agreement is an index that takes into account all slides agreed upon as not having the given diagnosis, whereas specific agreement ignores them and can be interpreted as the conditional probability that one observer will report a specific diagnosis, given that the other did so. As neither overall nor specific agreement rates take into account the varying contribution of chance agreements to observed agreement rates, we also used kappa statistics, which measure the level of agreement not accounted for by chance (20,21). Kappa varies from negative values for less than chance agreement, through 0 for chance agreement, to +1.0 for perfect agreement. Its sampling characteristics are known, and therefore we applied a modified χ² approach as a separate test of significance, sensitive to differences between the observed and expected patterns of agreement (22). To give additional insight into the data analysis, we partitioned the total kappa (Kt) into a set of partial kappas (Kp) (22). Each Kp evaluates the agreement rates reached by two observers for those slides placed by the first observer in a specific diagnostic category. Finally, the percentage of agreement on the grading of all biopsy specimens, independent of the three diagnostic categories, obtained by the pair of subjective observers was compared with that obtained by the pair of objective observers using the χ² test.

Results

Interobserver Variation

Table 1 shows the subjective assignment of the 85 slides to the three diagnostic categories by observers A and B. Table 2 shows the assignment of the 85 slides to the three diagnostic categories on the basis of the c/lh ratio obtained by observers C and D. Table 3 shows the agreement rates in the diagnosis reached by the two pairs of observers. For each grade, both the overall and the specific agreement rates were higher between the objective observers. In the pair of observers who made a qualitative assessment of the slides, the interobserver variation was particularly evident in the grading of PVA. This trend was confirmed by kappa statistics. Although the level of agreement was greater than that expected by chance (Kt > 0) for each of the histologic grades and each of the pairs of observers, and the pattern of agreement was distributed differently from expectation at a high significance level (p < 0.001), the Kt values were still greater for the objective observers. The percentage of agreement on the grading of all biopsy specimens, independent of the three diagnostic categories, proved significantly higher for the pair of objective observers (A/B = 85.8%, C/D = 98.8%; χ² = 10.07, p < 0.005).

Given the fact that intermediate lesions corresponded to PVA and atrophic mucosa to SVA, we calculated the agreement between observer B and observer D, who had also graded the slides subjectively. The agreement did not prove greater than that found between A and B. In fact, the Kt values were 0.54, 0.74, and 0.88 for PVA/intermediate lesions, SVA/atrophic mucosa, and normal mucosa, respectively.

Table 4 shows the same data analyzed by Kp. In the pair of subjective observers, the lowest Kp values were found in the case of PVA (Kp = 0.51, 0.75). In particular, when observer A made this diagnosis, observer B disagreed notably (Kp = 0.51). The A/B agreement was higher in the case of SVA and normal mucosa (Kp = 0.93 and 0.90, respectively). Conversely, when observer B made these diagnoses, the agreement rate was lower (Kp = 0.76 and 0.82, respectively), meaning that some slides graded by observer A as PVA were graded by observer B as SVA or normal mucosa. In the pair of objective observers, agreement was present in all but one slide. This was diagnosed as atrophic by observer C and intermediate by observer D.

Figure 1. A. Normal jejunal mucosa. B. Partial villous atrophy. C. Subtotal villous atrophy. The number of cuts made by the superimposed template on the mucosal surface decreases from A through C while the number of hits on the lamina propria increases. (Magnification ×80.)

Figure 2. Correlation between measurements of the c/lh ratio by observers C and D on 85 intestinal biopsies (rs = 0.989, p < 0.001). Axes: c/lh for observer C (horizontal) against c/lh for observer D (vertical).

Intraobserver Variation

Table 5 shows Kt values for each of the four observers who examined the same 16 biopsy specimens on two separate occasions. Once again a closer agreement rate was reached by observers C and D. Observer D reached an absolute agreement rate.
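The comparison of the two overall agreement percentages reported above (A/B = 85.8%, C/D = 98.8%) can be sketched in Python as a χ² test on a 2 × 2 table of agreed and disagreed slides. The counts of 73 and 84 agreed slides out of 85 are the diagonal sums of Tables 1 and 2. The use of the uncorrected Pearson statistic is an assumption on our part, as the text does not say whether a continuity correction was applied, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: observer pair (A/B, C/D); columns: slides agreed, slides not agreed.
chi2, p_value = chi_square_2x2(73, 12, 84, 1)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Under these assumptions the statistic comes out at about 10.08, close to the reported 10.07, with a p value below 0.005.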

Discussion

Jejunal mucosal changes are usually described in subjective terms. However, it has been claimed that even a severe abnormality such as a flat mucosa might represent a source of diagnostic disagreement among expert observers (23). In addition, mild mucosal changes have been reported in gluten-sensitive enteropathy (24-26) and in many other conditions including dermatitis herpetiformis, tropical sprue, Whipple's disease, intestinal lymphoma, giardiasis, and eosinophilic gastroenteritis (27), and the detection of these lesser degrees of mucosal abnormality constitutes a frequent puzzle for pathologists. In the presence of partial villous atrophy, pathologists are sometimes reluctant to define a biopsy specimen as abnormal, yet a uniform diagnostic behavior is required in the assessment of conditions like gluten-sensitive enteropathy, which imply a serial assessment of biopsy specimens and lifelong therapeutic decisions.

Observer variability has recently been evaluated in the interpretation of liver (28,29) and esophageal biopsy specimens (30), but it has been barely attempted in small intestine biopsy specimens. As far as the quantitative histology is concerned, Risdon and Keeling (4) found a highly significant correlation for the c/lh ratio between two objective observers, but confined their analysis to the two sets of numerical results. By dividing our numerical results into three grades and by a complete statistical analysis of the interobserver agreement within each of the grades, we have shown the impressive reliability of objective measurements. Furthermore, in the comparison of the agreement between the subjective and objective pairs of observers, the percentage of agreement on the grading of all biopsy specimens, independent of the three diagnostic categories, proved significantly higher in the pair of observers who assessed the biopsy specimens morphometrically. Moreover, in each of the diagnostic categories, the interobserver agreement between the objective observers was always higher than that obtained by the two subjective observers. This was particularly obvious in the case of moderate mucosal lesions. This is not surprising because, although the agreement between the paired reportings was not statistically measured, previous studies on the qualitative assessment of jejunal biopsy specimens have reported a remarkable interobserver variation for moderate changes (5,6,31). It is therefore very impressive that in the evaluation of those biopsy specimens graded as intermediate by observer D, disagreement with the other objective observer was present in only one slide out of 23, and it must be pointed out that even in this case the c/lh values were very close. The good agreement between one trainee and one experienced observer confirms the suggestion that observers with less experience can be helped by morphometric measurements (16).

The interobserver agreement between our subjective observers, although lower than that found between the objective observers, was supported by kappa values significantly higher than those expected by chance, confirming the experience of the observers. The fact that the agreement rate reached by B and D (the objective observer who had also assessed the biopsy specimens subjectively) was not higher than that by A and B further confirms that the subjective evaluation per se and not the skill of individual observers determines the higher variation always present between the subjective observers.

As far as the intraobserver variation is concerned, the lesser agreement found for the subjective observers raises some doubts about the validity of qualitative assessment of jejunal biopsy specimens in the diagnosis of gluten-sensitive enteropathy, and emphasizes the need to make direct comparisons between serial biopsy specimens and not to report individual biopsy specimens in isolation.

The c/lh values obtained by observer D for the intermediate group (Methods) did not overlap with the normal or atrophic grades. This is in keeping with previous results from Dunnill and Whitehead (2) and confirms that the method is helpful in detecting even mild villous abnormalities. Other authors (4,6) failed to get a similar discriminatory power, but both the studies were performed on children and, as relative immaturity of mucosal morphology in young children is documented (4,16), a partial overlap between young controls and children with mild degrees of jejunal abnormalities is not an unexpected finding.

In conclusion, we have shown the greater reliability of quantitative histology, and we believe that the higher inter- and intraobserver variation shown within the pair of subjective observers may also be relevant in the accuracy of diagnosis. If it is true that there is no guarantee that a reliable assessment is a valid one, it is also true that a less reliable assessment must be less valid. Hence, we propose that, whenever possible, qualitative assessment of jejunal biopsy specimens should be coupled with quantitative histology.

Table 1. Subjective Histologic Grading on 85 Intestinal Biopsy Specimens: Observer A vs. Observer B

                      Observer A
Observer B      Normal    PVA(a)    SVA(b)    Total
Normal            37         4         0        41
PVA(a)             2        14         1        17
SVA(b)             0         5        22        27
Total             39        23        23        85

(a) PVA = Partial villous atrophy. (b) SVA = Subtotal villous atrophy.

Table 2. Objective Histologic Grading on 85 Intestinal Biopsy Specimens: Observer C vs. Observer D

                      Observer C
Observer D      Normal    Intermediate    Atrophic    Total
Normal            40           0              0         40
Intermediate       0          22              1         23
Atrophic           0           0             22         22
Total             40          22             23         85

Table 3. Interobserver Variation for Each Pair of Observers

                              Overall      Specific     Total     Significance
                              agreement    agreement    kappa     > chance
Observers A and B
  Partial villous atrophy       0.86         0.69        0.61       0.001
  Subtotal villous atrophy      0.93         0.88        0.83       0.001
  Normal mucosa                 0.93         0.92        0.86       0.001
Observers C and D
  Intermediate                  0.99         0.98        0.97       0.001
  Atrophic                      0.99         0.98        0.97       0.001
  Normal                        1.00         1.00        1.00       0.001

Table 4. Interobserver Variation for Each Pair of Observers (Partial Kappa)

                              A/B     Significance     B/A     Significance
                                      > chance                 > chance
Partial villous atrophy       0.51       0.05          0.75       0.05
Subtotal villous atrophy      0.93       0.05          0.76       0.05
Normal mucosa                 0.90       0.05          0.82       0.05

                              C/D     Significance     D/C     Significance
                                      > chance                 > chance
Intermediate                  1.00       0.05          0.94       0.05
Atrophic                      0.94       0.05          1.00       0.05
Normal                        1.00       0.05          1.00       0.05

Table 5. Intraobserver Variation in 16 Biopsy Specimens on Two Separate Occasions

Observer      Total kappa      Significance > chance
A                0.63               0.005
B                0.89               0.001
C                0.91               0.001
D                1.00               0.001
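The per-category indices reported in Tables 3 and 4 can be recomputed from a 3 × 3 cross-tabulation such as Table 1. The Python sketch below is illustrative only and rests on several assumptions that are ours. Table 1 is read with observer B in rows and observer A in columns. Specific agreement is taken as 2a/(2a + b + c), where a is the number of slides both observers placed in the category and b and c are the slides only one of them placed there. Kappa is computed for "this category versus any other". Partial kappa is taken as the conditional agreement rate corrected for the second observer's base rate. The function names are hypothetical.

```python
# Table 1 counts, read with observer B in rows and observer A in columns;
# category order: Normal, PVA, SVA.
TABLE_1 = [
    [37, 4, 0],
    [2, 14, 1],
    [0, 5, 22],
]


def category_agreement(table, k):
    """Overall agreement, specific agreement, and kappa for category k,
    treating the grading as 'category k' versus 'any other category'."""
    n = sum(sum(row) for row in table)
    both = table[k][k]                          # both observers chose k
    row_total = sum(table[k])                   # row observer chose k
    col_total = sum(row[k] for row in table)    # column observer chose k
    neither = n - row_total - col_total + both  # neither observer chose k
    overall = (both + neither) / n
    specific = 2 * both / (row_total + col_total)
    chance = (row_total * col_total
              + (n - row_total) * (n - col_total)) / n ** 2
    kappa = (overall - chance) / (1 - chance)
    return overall, specific, kappa


def partial_kappa(table, k):
    """Chance-corrected agreement for the slides that the column observer
    placed in category k (assumed form of the partial kappa)."""
    n = sum(sum(row) for row in table)
    row_total = sum(table[k])
    col_total = sum(row[k] for row in table)
    conditional = table[k][k] / col_total  # P(row observer says k | column observer says k)
    marginal = row_total / n               # row observer's base rate for k
    return (conditional - marginal) / (1 - marginal)


overall, specific, kappa = category_agreement(TABLE_1, 1)  # index 1 is PVA
print(f"PVA: overall={overall:.2f} specific={specific:.2f} kappa={kappa:.2f}")
print(f"PVA partial kappa, observer A first: {partial_kappa(TABLE_1, 1):.2f}")
```

For PVA this gives an overall agreement of 0.86, a kappa of 0.61, and a partial kappa of 0.51, in line with Tables 3 and 4. The specific agreement comes out at 0.70 against the tabulated 0.69, so the definition used in the study may differ slightly or the published value may have been truncated.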


References

1. Stewart JS, Pollock DJ, Hoffbrand AV, et al. A study of proximal and distal intestinal structure and absorptive function in idiopathic steatorrhoea. Q J Med 1967;36:425-44.
2. Dunnill MS, Whitehead R. A method for the quantitation of small intestinal biopsy specimens. J Clin Pathol (Lond) 1972;25:243-6.
3. Chapman BL, Henry K, Paice F, et al. Measuring the response of the jejunal mucosa in adult coeliac disease to treatment with a gluten-free diet. Gut 1974;15:870-4.
4. Risdon RA, Keeling JW. Quantitation of the histological changes found in small intestinal biopsy specimens from children with suspected coeliac disease. Gut 1974;15:9-18.
5. Scott BB, Losowsky MS. Patchiness and duodenal-jejunal variation of the mucosal abnormality in coeliac disease and dermatitis herpetiformis. Gut 1976;17:984-92.
6. Glasgow JFT, Corkey CWB, Molla A. Critical assessment of small bowel biopsy in children. Arch Dis Child 1979;54:604-8.
7. Slavin G, Sowter C, Robertson K, et al. Measurement in jejunal biopsies by computer-aided microscopy. J Clin Pathol 1980;33:254-61.
8. Rosekrans PCM, Meyer CJLM, Polanco I, et al. Long-term morphological and immunohistochemical observations on biopsy specimens of small intestine from children with gluten-sensitive enteropathy. J Clin Pathol 1981;34:138-44.
9. Binder V, Soltoft J, Gudmand-Hoyer E. Histological and histochemical changes in the jejunal mucosa in ulcerative colitis. Scand J Gastroenterol 1974;9:293-7.
10. Dano P, Vagn Nielsen O, Petri M, et al. Jejunal morphology and mucosal enzyme activity following intestinal shunt operation for obesity. Scand J Gastroenterol 1976;11:129-34.
11. Dunne WT, Cooke WT, Allan RN. Enzymatic and morphometric evidence for Crohn's disease as a diffuse lesion of the gastrointestinal tract. Gut 1977;18:290-4.
12. Wright SG, Tomkins AM. Quantitative histology in giardiasis. J Clin Pathol 1978;31:712-6.
13. Riecken EO, Zewnek A, Lay A, et al. Quantitative study of mucosal structure, enzyme activities and phenylalanine accumulation in jejunal biopsies of patients with early and late onset diabetes. Gut 1979;20:1001-7.
14. Bateson MC, Hopwood D, Mac Gillivray JB. Jejunal morphology in multiple sclerosis. Lancet 1979;i:1108-10.
15. Rosekrans PCM, Meyer CJLM, Cornelisse CJ, et al. Use of morphometry and immunohistochemistry of small intestinal biopsy specimens in the diagnosis of food allergy. J Clin Pathol 1980;33:125-30.
16. Penna FJ, Hill ID, Kingston D. Jejunal mucosal morphometry in children with and without gut symptoms and in normal adults. J Clin Pathol 1981;34:386-92.
17. Perera DR, Weinstein WM, Rubin CE. Small intestinal biopsy. Hum Pathol 1975;6:157-217.
18. Koran LM. The reliability of clinical methods, data and judgments. N Engl J Med 1975;293:695-701.
19. Fletcher CM. The problem of observer variation in medical diagnosis with special reference to chest diseases. Methods Inf Med 1965;3:98-103.
20. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Measmt 1960;20:37-46.
21. Spitzer RL, Fleiss JL. A re-analysis of the reliability of psychiatric diagnosis. Br J Psychiatry 1974;125:341-7.
22. Light RJ. Measures of agreement for qualitative data: some generalizations and alternatives. Psychol Bull 1971;76:365-77.
23. Shmerling DH. Questionnaire of the ESPGAN on coeliac disease (discussion). In: McNicholl B, McCarthy CF, Fottrell PF, eds. Perspectives in coeliac disease. Lancaster: MTP Press, 1978:245-9.
24. Challacombe DN, Dawkins PD, Baylis JM, et al. Small-intestinal histology in coeliac disease. Lancet 1975;i:1345.
25. Dellipiani AW. Small-intestinal histology in coeliac disease. Lancet 1975;ii:550.
26. Scott BB, Losowsky MS. Coeliac disease with mild mucosal abnormalities: a report of four patients. Postgrad Med J 1977;53:134-8.
27. Brandborg LL. Histologic diagnosis of diseases of malabsorption. Am J Med 1979;67:999-1006.
28. Orlandi F, and the study group of randomized clinical trials. Observer error in morphological diagnosis of chronic active hepatitis and cirrhosis. Ital J Gastroenterol 1979;11:5-8.
29. Theodossi A, Skene AM, Portman B, et al. Observer variation in assessment of liver biopsies including analysis by kappa statistics. Gastroenterology 1980;79:232-41.
30. Adami B, Eckardt VF, Paulini K. Sampling error and observer variation in interpretation of esophageal biopsies. Digestion 1979;19:404-10.
31. Rubin CE, Brandborg LL, Phelps PC, et al. Studies of celiac disease. 1. The apparent identical and specific nature of the duodenal and proximal jejunal lesion in celiac disease and idiopathic sprue. Gastroenterology 1960;38:28-49.