Interactive cytometry, chance or evil of bias?


Path. Res. Pract. 188,391-395 (1992)

Interactive Cytometry, Chance or Evil of Bias?

G. Burger, M. Aubele, U. Jütting and G. Auer*

GSF-Institut für Strahlenschutz, Neuherberg, FRG and *Immunpathology Institute, Karolinska Hospital, Stockholm, Sweden

SUMMARY

Interactive selection of a limited number of cells in imaging cytometry, used to determine the DNA histogram of breast cancer cells (currently the best-known prognosticator), entails both statistical and systematic sampling problems. Analysis of the histograms of 361 breast cancer aspirate specimens measured in two laboratories demonstrates the expected high statistical variation, given that only 100 cells were measured per case, but also slight systematic differences. Controlled systematic sampling without pathological bias results in a somewhat higher malignancy grading than selective, biased sampling. We have no explanation for this finding. The main result, however, is that we did not find the expected opposite. At least for this application, this invalidates the argument that expert pathologists are needed for reliable interactive sampling.

Introduction

Imaging cytometry has the main advantage that the objects to be measured can be selected visually. For tumor specimens in particular, it is often stated that selecting 'tumor cells' is the most economical way of measuring representative cell populations (2, 9). This is certainly true for most routinely prepared smears, imprints or even histological sections, where automated specimen screening cannot feasibly detect and reject every object that is not an isolated tumor cell. Visual recognition of artefacts, agglomerates and blood cells is easy. It is slightly more difficult for cells from connective tissue, and more difficult still for benign cells of the same tissue origin. The reason is that not all tumor cells look alike, especially if they stem from different tumor regions of possibly different genetic origin. This can easily result in observer-specific, biased cell sampling, which is obviously dangerous when the subsequent specimen classification depends sensitively on the occurrence of certain cell types. Generally speaking, the chance offered by cell selection carries with it the evil of an inherent violation of basic sampling constraints. The problems that arise concern the accuracy of measurement.

Another problem, which arises mainly for economic reasons when only small numbers of cells are selected and measured, is that of the precision of measurement. Precision has to be judged not with respect to the total number of cells measured but rather with respect to the number of cells that are relevant for classification. The problem is immediately apparent in DNA histograms, where information on the prominent stemline alone (e.g. in terms of ploidy) is generally not sufficient for diagnostic histogram evaluation. In this case, both precision and accuracy also depend on correct standardization, performed by suitable selection and measurement of internal reference cells. Especially in retrospective studies based on archival material, the use of external reference cells is meaningless, internal reference cells (e.g. leukocytes) are sometimes rare, and the lack of DNA standardization is a serious problem.

© 1992 by Gustav Fischer Verlag, Stuttgart
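To give a feel for the precision problem described in the Introduction, the following minimal sketch estimates the statistical uncertainty of a rate (for example, the fraction of cells above some DNA threshold) when only a small number of cells is measured. It assumes independent sampling and uses the normal approximation to the binomial distribution; the function name `rate_precision` and the example counts are illustrative and not taken from the paper.

```python
import math


def rate_precision(k, n, z=1.96):
    """Return (rate, standard error, approximate 95% interval) for k of n cells.

    Uses the normal approximation to the binomial distribution, which is
    only a rough guide when k is small.
    """
    p = k / n
    se = math.sqrt(p * (1.0 - p) / n)
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return p, se, (low, high)


# Illustrative example: 10 of 100 measured cells exceed a given DNA threshold.
p, se, (low, high) = rate_precision(10, 100)
print(f"rate = {p:.2f}, SE = {se:.3f}, approx. 95% interval = {low:.2f} to {high:.2f}")
```

Under these assumptions, an observed rate of 10% in 100 cells is compatible with a true rate of roughly 4% to 16%, which illustrates why a classification threshold placed at 10% is sensitive to sampling noise.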

Aim of the Study

The aim of the study is to investigate the influence of interobserver sampling variations on patients' prognosis. This requires independent investigation not only of the experimental and numerical problems connected with DNA histogram acquisition, but also of prognostically relevant methods of histogram grading.


Material and Methods

A total of 361 Feulgen-stained fine needle aspirates from patients with invasive breast adenocarcinomas, with follow-up periods of 10 to 15 years, from the Stockholm breast cancer trial were investigated independently by two laboratories. In one laboratory (Stockholm), each specimen was carefully screened by an experienced pathologist, and the most representative tumor cell regions were marked for subsequent cell measurement. Special care was taken regarding the possible admixture of benign epithelial cells, which could erroneously affect the DNA measurements. The measurements were performed by means of a microspectrophotometer (7, 8). In the other laboratory (Munich), all specimens were scanned in a controlled manner and every well-preserved epithelial cell was measured. Measurements were performed by TV image cytometry (4). In both laboratories, 100 cells were generally selected per specimen. For this study, only the total nuclear extinction (integrated optical density, IOD) was analyzed. As an internal DNA reference, up to 30 leukocytes were measured. No distinction was made between granulocytes and lymphocytes; both show the same average IOD, the value of which was set at 1.9 c.
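The internal standardization described above can be sketched as a simple rescaling: the mean IOD of the reference leukocytes is assigned the value 1.9 c, and every measured cell is scaled by the same factor. This is only an illustration of the principle. The function name `iod_to_c` and the IOD readings are made up, and the laboratories' actual software may have handled outliers or staining corrections differently.

```python
from statistics import mean

REFERENCE_C = 1.9  # c-value assigned to the mean leukocyte IOD in the text


def iod_to_c(cell_iods, reference_iods, reference_c=REFERENCE_C):
    """Scale raw IOD values so that the mean reference IOD equals reference_c."""
    if not reference_iods:
        raise ValueError("at least one internal reference cell is required")
    scale = reference_c / mean(reference_iods)
    return [iod * scale for iod in cell_iods]


# Made-up IOD readings in arbitrary units, for illustration only.
leukocyte_iods = [95.0, 100.0, 105.0]
tumor_iods = [100.0, 200.0, 300.0]
c_values = iod_to_c(tumor_iods, leukocyte_iods)
print([round(c, 2) for c in c_values])
```

Because every c-value is proportional to the reciprocal of the reference mean, an error in the few reference cells shifts the whole histogram, which is one reason why rare or poorly preserved reference cells are a problem in archival material.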

Results

As a first check on agreement between the laboratories, the two sets of DNA histograms were classified visually according to Auer typing as described in the literature (1), at first by each laboratory independently and then, in a non-paired exercise, by joint group consensus. For the consensus typing, the following qualitative criteria were applied to the histograms:

Auer I: Narrow (obviously homogeneous), low-proliferating diploid stemline distributions with only a small proportion of tetraploid cells (< 10%). The correct positioning of the peak is not important. Any ploidy between 1.5 c and 2.5 c (defined as the 2 c interval) is accepted. It is believed that the 2 c deviations are to a large extent due to standardization problems. Even if the peaks were genetically not truly diploid, their prognostic characteristics were assumed to be similar. The main visual criterion is the 2.5 c exceeding rate. The overall classification is 'nonproliferating diploid'.

Auer II: Significantly broadened peridiploid peaks (possibly genetically not homogeneous) but still low proliferating, or narrow stemlines as in Auer I with any proportion of tetraploid cells (4 c interval, between 3.5 c and 5 c) above 10%. The latter may constitute a G2 block of a diploid stemline and/or a new stemline of resting cells. The 5 c exceeding rate should in any case be small, unless it is due to a pure octoploid component. The main visual criteria are the S-phase (cells in between the two main peaks, defined as the 3 c interval between 2.5 c and 3.5 c), the G2 amount and the 5 c exceeding rate. The overall classification is generally 'nonproliferating tetraploid'.

Auer III: Proliferating diploid populations (marked S-phase) or clearly aneuploid, but rather resting, stemlines. The 5 c exceeding rate should still not be higher than an estimated 10%. The main visual criteria are the S-phase and the 5 c exceeding rate. The overall classification is 'proliferating euploid or nonproliferating aneuploid'.

Auer IV: Any multiploid, aneuploid or heteroploid distribution with a 5 c exceeding rate > 10% (often typically described as a 'Manhattan skyline'). The overall classification is 'proliferating aneuploid'.

Table 1 shows the confusion matrix for the consensus evaluation of both measurements; 69.8% of all cases are in agreement. The matrix is balanced: there is no systematic under- or overrating by either side, as could be caused by systematic deviations between the measured histograms.

Table 1. Comparison of independent measurement and subsequent visual DNA typing of 361 aspirate specimens with consensus classification (rows: Munich data set; columns: Stockholm data set)

                  Stockholm data set
Munich data set   Auer I   Auer II   Auer III   Auer IV   Total
Auer I                57         6          6         0      69
Auer II               12        45          7        13      77
Auer III               8        15         29        21      73
Auer IV                0         9         12       121     142
Total                 77        75         54       155     361

This is further investigated by analysing the distributions of various cellular components that are expected to be particularly dependent on biased cell selection and that represent the main classification criteria mentioned above. Examples are the 2.5 c exceeding rate and the 5 c exceeding rate. Figures 1 and 2 show the results. In both figures the straight line of correspondence and the x- and y-regression lines are shown. In addition, the 10% thresholds for the 2.5 c exceeding rate (Fig. 1) and for the 5 c exceeding rate (Fig. 2) are shown. The main result is the large variance in the measures, demonstrating a high degree of statistical uncertainty. In both figures, slight systematic deviations from strict correspondence can be seen.

Fig. 1. Correlation of the 2.5 c exceeding rates for two independent measurements (scatter plot; x-axis: Munich data set, y-axis: Stockholm data set).

Fig. 2. Correlation of the 5 c exceeding rates for two independent measurements (scatter plot; x-axis: Munich data set, y-axis: Stockholm data set).

Table 2 demonstrates that the Munich data set includes fewer purely diploid cases with 2.5 c exceeding rates below 10% (67 cases) than the Stockholm data set (85 cases). The situation is similar in Table 3. Here the Munich data set systematically shows a somewhat higher 5 c exceeding rate than the Stockholm data set.

Table 2. 2.5 c exceeding rates for two independent measurements (rows: Munich data set; columns: Stockholm data set)

                  Stockholm data set
Munich data set   < 10%   ≥ 10%   Total
< 10%                55      12      67
≥ 10%                30     264     294
Total                85     276     361

Table 3. 5 c exceeding rates for two independent measurements (rows: Munich data set; columns: Stockholm data set)

                  Stockholm data set
Munich data set   ≤ 10%   > 10%   Total
≤ 10%               242      17     259
> 10%                24      78     102
Total               266      95     361
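The agreement figure and the balance of Table 1 can be re-derived from the reported counts. The following minimal sketch computes the overall agreement, the marginal totals, and the number of cases graded higher by one laboratory than by the other. The function names `summarize` and `upgrade_balance` are illustrative; only the counts come from Table 1.

```python
# Table 1 as reported: rows = Munich typing, columns = Stockholm typing (Auer I..IV).
TABLE_1 = [
    [57, 6, 6, 0],
    [12, 45, 7, 13],
    [8, 15, 29, 21],
    [0, 9, 12, 121],
]


def summarize(matrix):
    """Return total, overall agreement, row totals and column totals."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    agreement = sum(matrix[i][i] for i in range(size)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(row[j] for row in matrix) for j in range(size)]
    return total, agreement, row_totals, col_totals


def upgrade_balance(matrix):
    """Count cases graded higher by the row lab and by the column lab."""
    size = len(matrix)
    cells = [(i, j) for i in range(size) for j in range(size)]
    row_higher = sum(matrix[i][j] for i, j in cells if i > j)
    col_higher = sum(matrix[i][j] for i, j in cells if i < j)
    return row_higher, col_higher


total, agreement, munich_totals, stockholm_totals = summarize(TABLE_1)
print(f"cases = {total}, agreement = {agreement:.1%}")
print("Munich totals:", munich_totals)
print("Stockholm totals:", stockholm_totals)
print("graded higher by Munich / by Stockholm:", upgrade_balance(TABLE_1))
```

With the Table 1 counts, 252 of 361 cases lie on the diagonal (69.8%), and the discordant cases split almost evenly (56 graded higher by Munich, 53 graded higher by Stockholm), which is what the text means by a balanced matrix.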

Discussion

These differences obviously do not result in a marked imbalance of the visual classification given in Table 1. The non-concordant classifications in this table may be due partly to quantitative differences in the histograms and partly to uncertainties in the visual, qualitative histogram typing. To avoid these uncertainties, and to be able to attribute the differing classifications to true histogram deviations, an algorithm was developed for automated histogram evaluation, based largely on the criteria mentioned above. These were the 2 c, 3 c and 4 c interval contents and the 5 c exceeding rate. The programme is structured as a decision tree. The limits were set by trial and error so that four strata were defined which show significantly different survival curves for both data sets. Table 4 shows the new confusion matrix based on the automated approach. This matrix is now markedly imbalanced, and overall systematic differences between the two data sets are obvious.

Table 4. Confusion matrix of an automatic DNA typing for two independent measurements (rows: Munich data set; columns: Stockholm data set)

                  Stockholm data set
Munich data set      I     II    III     IV   Total
I                   69     11      0      0      80
II                  24     61     12      2      99
III                  3     28     83     13     127
IV                   0      0     17     38      55
Total               96    100    112     53     361

In addition to the new algorithm, which provides a dichotomous histogram typing, other, continuous grading algorithms according to Böcking (2 c deviation index), Brugal (ploidy balance) and Stenkvist (entropy) have been applied (3, 10, 11). The presentation and discussion of all results is beyond the scope of this paper and is published in detail elsewhere (6). As an example, only the situation for the entropy measure is shown in Fig. 3. The results of an equal partitioning of the distribution into three strata are shown in the confusion matrix of Table 5. Again, the matrix is imbalanced towards a higher malignancy grading in the case of the Munich data set.

Fig. 3. Correlation of the entropy measure for two independent measurements (scatter plot; x-axis: Munich data set, y-axis: Stockholm data set).

Table 5. Confusion matrix for the entropy measure for two independent measurements (rows: Munich data set; columns: Stockholm data set)

                  Stockholm data set
Munich data set      I     II    III   Total
I                   82     11      0      93
II                  39    111     16     166
III                  0     31     71     102
Total              121    153     87     361

References

1 Auer GU, Caspersson T, Wallgren A (1980) DNA content and survival in mammary carcinoma. Analyt Quant Cytol Histol 3: 161-165
2 Askensten UG, van Rosen AK, Nilsson RS, Auer GU (1989) Intratumoral variations in DNA distribution patterns in mammary adenocarcinomas: A combined flow cytometric and microspectrophotometric study. Cytometry 10: 326-333
3 Böcking A, Chatelain R, Biesterfeld S, Noll E, Biesterfeld D, Wohltmann D, Goecke C (1989) DNA grading of malignancy in breast cancer. Prognostic validity, reproducibility and comparison with other classifications. Analyt Quant Cytol 11/2: 73-80
4 Burger G, Jütting U, Gais P, Rodenacker K, Schenck U (1988) Anwendung der hochauflösenden Bildanalyse in der medizinischen Diagnostik. In: Morphometrie in der Zyto- und Histopathologie. Burger G, Oberholzer M, Gössner W (Eds) Springer Verlag Berlin, 200-226
5 Burger G, Jütting U, Aubele M, Auer G (1989) The role of high resolution cytometry in the prognosis of breast cancer. In: Advances in Analytical Cellular Pathology. Burger G et al. (Eds) Elsevier Sc. Publ., 179-180
6 Burger G, Jütting U (1991) The analytical evaluation of DNA-distributions for breast cancer prognosis. Submitted to ACP
7 Fallenius AG, Auer G, Carstensen JM (1988) Prognostic significance of DNA measurements in 409 consecutive breast cancer patients. Cancer 62: 331-341
8 Fallenius AG, Franzen SA, Auer GU (1988) Predictive value of nuclear DNA content in breast cancer in relation to clinical and morphologic factors. Cancer 62: 521-530
9 Mesker WE, Eysackers MJ, Ouwerkert-van Velzen MCM, Driel-Kulker AMJ, Ploem JS (1989) Discrepancies in ploidy determination due to specimen sampling errors. Analytical Cellular Pathology 1/2: 87-95
10 Opfermann M, Brugal G, Vassilakos P (1987) Cytometry of breast carcinoma: Significance of ploidy balance and proliferation index. Cytometry 8: 217-224
11 Stenkvist B, Westman-Naeser S, Holmquist J, Nordin B, Bengtsson E, Vegelius J, Eriksson O, Fox CH (1978) Computerized nuclear morphometry as an objective method for characterizing human cancer cell populations. Cancer Research 38: 4688-4697

Received September 27, 1991. Accepted October 28, 1991

Key words: Interactive cytometry

PD Dr. Georg Burger, GSF-Institut für Strahlenschutz, Ingolstädter Landstr. 1, D-W-8042 Neuherberg, FRG
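As a supplementary illustration of the histogram features named in the Results and Discussion (the 2 c, 3 c and 4 c interval contents, the 2.5 c and 5 c exceeding rates, and an entropy measure), the following sketch computes them from a list of standardized c-values. Several points are assumptions rather than the authors' method: the interval limits are taken from the Auer criteria above with the upper limit treated as inclusive, the entropy is plain Shannon entropy over fixed-width bins and may differ from the definition used by Stenkvist, and the bin width of 0.25 c and the example specimen are made up. The decision limits of the authors' automated typing are not given in the paper and are not reproduced here.

```python
import math


def interval_fraction(c_values, low, high):
    """Fraction of cells with low < c <= high (boundary handling is an assumption)."""
    return sum(1 for c in c_values if low < c <= high) / len(c_values)


def exceeding_rate(c_values, threshold):
    """Fraction of cells with a DNA content above the threshold."""
    return sum(1 for c in c_values if c > threshold) / len(c_values)


def histogram_entropy(c_values, bin_width=0.25):
    """Shannon entropy in bits of the binned histogram (bin width is an assumption)."""
    counts = {}
    for c in c_values:
        bin_index = math.floor(c / bin_width)
        counts[bin_index] = counts.get(bin_index, 0) + 1
    n = len(c_values)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def histogram_features(c_values):
    """Collect the features that the text names as classification criteria."""
    return {
        "2c_interval": interval_fraction(c_values, 1.5, 2.5),
        "3c_interval": interval_fraction(c_values, 2.5, 3.5),
        "4c_interval": interval_fraction(c_values, 3.5, 5.0),
        "2.5c_exceeding": exceeding_rate(c_values, 2.5),
        "5c_exceeding": exceeding_rate(c_values, 5.0),
        "entropy_bits": histogram_entropy(c_values),
    }


# Made-up specimen of 100 cells, for illustration only.
cells = [2.0] * 80 + [3.0] * 8 + [4.0] * 10 + [6.0] * 2
for name, value in histogram_features(cells).items():
    print(f"{name}: {value:.3f}")
```

In this made-up specimen, the 5 c exceeding rate rests on only 2 of the 100 cells, so a single additional or missed cell changes the rate by half. This is the kind of sensitivity to cell selection that the paper examines.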