Quantifying Interobserver Variation in Target Definition in Palliative Radiotherapy



Int. J. Radiation Oncology Biol. Phys., Vol. 80, No. 5, pp. 1498–1504, 2011. Copyright © 2011 Elsevier Inc. Printed in the USA. All rights reserved. 0360-3016/$–see front matter

doi:10.1016/j.ijrobp.2010.04.014

CLINICAL INVESTIGATION

Palliation

QUANTIFYING INTEROBSERVER VARIATION IN TARGET DEFINITION IN PALLIATIVE RADIOTHERAPY

DANIEL GRABARZ, M.D.,*† TONY PANZARELLA, M.SC.,‡ ANDREA BEZJAK, M.D., M.SC., F.R.C.P.,† MICHAEL MCLEAN, M.D., F.R.C.P.,† CHRISTINE ELDER, M.B.CH.B., F.R.A.N.Z.C.R.,†§ AND REBECCA K. S. WONG, M.B.CH.B., M.SC., F.R.C.P.†

*Centro Oncologia Mendel & Associados; †Radiation Medicine Program, Princess Margaret Hospital, University of Toronto, Toronto, Ontario, Canada; ‡Biostatistics Department, Princess Margaret Hospital, and Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; and §Oncology Department, Auckland City Hospital, New Zealand

Purpose: To describe the degree of interobserver and intraobserver variability in target and field definition when using three-dimensional (3D) volume–based vs. two-dimensional (2D) field–based planning.

Methods and Materials: Standardized case scenarios and diagnostic imaging for 9 palliative cases (3 bone metastases, 3 palliative lung cancers, and 3 abdominal/pelvic soft-tissue disease) were presented to 5 radiation oncologists. After deciding on the intended anatomic target, observers created two sets of treatment fields, first using a 2D field–based and then a 3D volume–based planning approach. Percent overlap, under-coverage, and over-coverage were used to describe interobserver and intraobserver variation in target definition.

Results: The degree of interobserver variation was similar for 2D and 3D planning, with a degree of overlap of 76% (range, 56%–85%) and 74% (range, 55%–88%), respectively. When the treatment fields defined by the same observer using the two planning methods were compared, the mean degree of overlap was 78%; under-coverage, 22%; and over-coverage, 41%. There was statistically significantly more under-coverage when field-based planning was used for bone metastases (33%) than for other anatomic sites (16%) (p = 0.02). In other words, 2D planning is more likely to result in geographic misses in bone metastases than in other areas.

Conclusions: In palliative radiotherapy, clinically significant interobserver and intraobserver variation existed with both field- and volume-based planning approaches. Strategies that would reduce this variability deserve further investigation.

© 2011 Elsevier Inc.

Key Words: Palliative radiotherapy, Interobserver variability, Radiotherapy planning, Fluoroscopy, Computed tomography.

Conflict of interest: none. Received Aug 5, 2009, and in revised form April 1, 2010. Accepted for publication April 3, 2010.

Reprint requests to: Rebecca K. S. Wong, M.B.Ch.B., M.Sc., F.R.C.P., Radiation Medicine Program, Princess Margaret Hospital, University of Toronto, 610 University Ave., Toronto, Ontario, Canada M5G 2M9. Tel: (416) 946-2126; Fax: (416) 946-2111; E-mail: [email protected]

INTRODUCTION

Approximately 30% to 50% of cancer patients require radiotherapy (1), with an estimated 20% to 50% of courses delivered with palliative intent (1, 2). The technique used to deliver palliative radiotherapy is in part influenced by the intended dose. "Simple" palliative radiotherapy plans can be defined operationally as techniques that use no more than two beams, without conformal or intensity-modulated radiotherapy planning algorithms. The clinical circumstances in which this approach is appropriate are common, representing approximately 50% to 70% of the palliative radiotherapy courses delivered in contemporary radiotherapy departments (3). Historically, simple palliative radiotherapy has been planned with a fluoroscopy-based approach, in which treatment field borders are placed according to anatomic landmarks (e.g., 5 cm below the carina to define an inferior border intended to encompass the mediastinal lymph nodes). This approach has more recently evolved into virtual simulation (4), which uses digitally reconstructed radiographs generated from planning computed tomography (CT), and full volume-based planning (5), in which the complete three-dimensional (3D) CT data set is used for target delineation and dosimetric assessment.

The efficacy and toxicity of any radiotherapy treatment are a function of many factors, including dose fractionation, target definition, and treatment technique. Although the concepts of gross tumor volume (GTV), clinical target volume (CTV), and planning target volume (PTV) described in International Commission on Radiation Units and Measurements (ICRU) Report 62 (5) are the cornerstone of radiotherapy practice today, especially for curative treatments (6), their application in the palliative setting is less explicit. For example, the choice of GTV may reflect the radiation oncologist's estimate of which part of the visible tumor is responsible for existing or impending symptoms. The principles that should guide optimal CTV definition in different palliative circumstances are not well established. All of these factors can contribute to substantial variation in the final choice of treatment target volumes in palliative patients.

The goal of this project was to explore the degree of interobserver variability in target definition for common palliative scenarios, and to describe the degree of intraobserver variability when using a two-dimensional (2D) field–based vs. a 3D volume–based approach to treatment field definition. This project was a companion study to a larger project designed to develop and examine the use of a cone-beam CT–enabled one-step online planning process for palliative radiotherapy (7–9).

METHODS AND MATERIALS

For the purpose of our study, 2D, field, or digitally reconstructed radiograph (DRR)–based planning refers to the placement of treatment fields based on an estimate, from bony landmarks, of the location of the gross target, CTV, and PTV. This approach has been, and remains, standard practice in many centers for simple palliative radiotherapy. Three-dimensional planning refers to the use of the 3D anatomic information available from planning CT to delineate the GTV, CTV, and PTV.

Nine patients and five observers formed the subject of this study. The 9 patients, 3 for each of the 3 anatomic areas (spine, mediastinum, and abdomen and pelvis), were accrued as part of a clinical study designed to develop and examine the use of a cone-beam CT–enabled one-step online planning process for palliative radiotherapy.
Patients were eligible for the study if they were being considered for palliative radiotherapy and were suitable for a parallel-pair beam arrangement and palliative dose fractionation (1–10 fractions). Five observers (three staff radiation oncologists and two radiation oncology fellows) performed the target delineation exercise. The project was approved by the University Health Network Research Ethics Board, and all patients provided informed consent.

Standardized clinical information and diagnostic images were provided. Observers were instructed to decide what the target area of interest should be before commencing the delineation exercise. Each observer then generated treatment fields for all 9 study cases using two methods in sequence. First, for the 2D (field or, more accurately, DRR-based) approach, observers placed treatment fields using DRRs only (TFDRR). Fields and shielding were placed so that the intended target volume would be adequately covered by 95% of the prescribed dose. This approach mimicked conventional fluoroscopy-based planning. Second, treatment fields were created using 3D-based planning. After the GTV was defined, a standardized margin of 1.5 cm (assuming 5 mm for CTV, 5 mm for PTV, and 5 mm for penumbra) was used to generate the treatment fields (TF3D). Observers were instructed not to modify the TFDRR after viewing the 3D information.

Axial, sagittal, and coronal reconstructions were available for 3D planning, as were windowing and zooming functions to optimize the images as clinically indicated. No contrast was used. Image sets were exported to the Pinnacle planning platform (Pinnacle³, version 8.0; Philips Medical Systems, Madison, WI) for field and target delineation. The volumes and treatment fields created were used for study purposes only.
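The 3D field-generation rule above (GTV plus a uniform 1.5-cm margin made of 5-mm CTV, PTV, and penumbra components) can be illustrated with a short sketch. It rests on assumptions that are not from the study: the GTV has already been projected into the beam's-eye view, the field is taken to be the axis-aligned rectangle enclosing that projection, and the three margin components simply add. The names `Box` and `field_from_gtv` are hypothetical; the study's fields were drawn in the planning system, not computed with code like this.

```python
from dataclasses import dataclass

# Margin components (cm) as stated in the study: 5 mm each for CTV, PTV, and penumbra.
CTV_MARGIN_CM = 0.5
PTV_MARGIN_CM = 0.5
PENUMBRA_MARGIN_CM = 0.5


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned rectangle in the beam's-eye view (cm)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    @property
    def area(self) -> float:
        """Field area in cm^2."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def field_from_gtv(gtv_projection: Box,
                   margins_cm: tuple = (CTV_MARGIN_CM, PTV_MARGIN_CM, PENUMBRA_MARGIN_CM)) -> Box:
    """Expand the projected GTV bounding box by the summed margin on every side.

    Assumption: margins add linearly (0.5 + 0.5 + 0.5 = 1.5 cm), matching the
    study's stated 1.5-cm total; beam divergence and shielding are not modeled.
    """
    m = sum(margins_cm)
    return Box(gtv_projection.x_min - m, gtv_projection.x_max + m,
               gtv_projection.y_min - m, gtv_projection.y_max + m)


gtv = Box(-4.0, 4.0, -3.0, 3.0)  # hypothetical 8 x 6 cm projected GTV
field = field_from_gtv(gtv)
print(field, field.area)
```

With these assumptions an 8 × 6 cm projection becomes an 11 × 9 cm field (99 cm²), because a 1.5-cm margin on each side adds 3 cm to each dimension.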


Data analysis

For each of the 9 cases, TFDRR, GTV, and TF3D were generated by each of the 5 observers. Variability across treatment fields (TFDRR and TF3D) can be described both in terms of their absolute area and in terms of their spatial relationship to each other; the same applies to GTVs. Two sets of metrics were therefore used to describe interobserver variability. First, absolute areas were compared using the variation coefficient (VC), calculated as the ratio of the standard deviation (SD) to the mean of a set of values (VC = SD/Mean); a smaller value represents less variation across observers (10, 11). The same approach was used to describe variation across a set of target volumes. Second, when spatial location was considered, the degree of overlap, over-coverage, and under-coverage was used. To allow comparison between cases, these values were standardized by expressing them as percentages of a reference region. Because all observers were equally qualified to define the region, there was no gold standard for interobserver comparisons. To address this, two values were calculated for each pair of observers, using the region defined by each observer in turn as the denominator. For each case, the 10 pairs formed by 5 observers therefore yielded 20 values each for percent overlap, under-coverage, and over-coverage (Fig. 1). Intraobserver variation between the two planning strategies (2D DRR–based vs. 3D volume–based planning) was described with a similar methodology, except that the treatment fields defined using 3D planning were always used as the denominator. Significance testing was performed with the nonparametric Wilcoxon rank-sum test (exact) when two groups were compared and the Kruskal-Wallis test when more than two groups were compared. The Friedman test was used to compare related samples.
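The VC definition can be checked against the illustration used later in the Discussion: square fields from 10 × 10 to 14 × 14 cm in 1-cm steps. The study does not state whether the sample or the population SD was used; this sketch assumes the sample SD (n − 1 denominator), which reproduces the quoted value of 0.26, whereas the population SD gives about 0.23. The helper name `variation_coefficient` is hypothetical.

```python
import statistics


def variation_coefficient(values):
    """VC = SD / mean, using the sample standard deviation (n - 1 denominator)."""
    return statistics.stdev(values) / statistics.mean(values)


# Square fields from 10 x 10 to 14 x 14 cm in 1-cm steps; areas in cm^2.
areas = [side * side for side in range(10, 15)]  # [100, 121, 144, 169, 196]

vc_sample = variation_coefficient(areas)
vc_population = statistics.pstdev(areas) / statistics.mean(areas)
print(round(vc_sample, 2), round(vc_population, 2))  # 0.26 0.23
```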

Fig. 1. Definition of overlap, over-coverage, and under-coverage. [Schematic: overlapping contours drawn by observer A and observer B, dividing the plane into the regions "A not B", "A&B" (overlap between A and B), and "B not A".] Overlap = (A&B)/A or (A&B)/B; under-coverage = (B not A)/B; over-coverage = (A not B)/B.
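The Fig. 1 definitions and the pairwise scheme described under Data analysis can be sketched with each region represented as a set of pixels. The pixel representation, the rectangular example regions, and the helper names (`rect`, `pair_metrics`, `interobserver_metrics`) are assumptions for illustration only; the study computed these quantities on contours and fields in its planning system. Taking every ordered pair of observers, so that each observer's region serves in turn as the denominator, gives 5 × 4 = 20 values per case. Because overlap and under-coverage share the same reference region B, under-coverage always equals 100% minus overlap, which is why those two columns sum to 100 in Table 5.

```python
from itertools import permutations
from statistics import mean


def rect(r0, r1, c0, c1):
    """Hypothetical pixelized region: the set of (row, col) cells in a rectangle."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


def pair_metrics(a, b):
    """Percent overlap, under-coverage, and over-coverage of region a against reference b.

    Following Fig. 1 with B as the reference (denominator):
      overlap        = |A & B| / |B|
      under-coverage = |B not A| / |B|
      over-coverage  = |A not B| / |B|
    """
    ref = len(b)
    return (100.0 * len(a & b) / ref,
            100.0 * len(b - a) / ref,
            100.0 * len(a - b) / ref)


def interobserver_metrics(regions):
    """Metrics for all ordered observer pairs, so each observer is the reference in turn."""
    return [pair_metrics(a, b) for a, b in permutations(regions, 2)]


# Five hypothetical observers contouring the same case on a coarse grid.
observers = [rect(0, 10, 0, 10), rect(1, 11, 0, 10), rect(0, 10, 2, 12),
             rect(0, 12, 0, 10), rect(2, 10, 1, 9)]
values = interobserver_metrics(observers)
print(len(values))  # 20 ordered pairs for 5 observers
print(round(mean(v[0] for v in values), 1))  # mean percent overlap for this toy case
```

For the intraobserver comparison, the same `pair_metrics` call would be applied once per observer and case, with the TFDRR region as `a` and the TF3D region as the fixed reference `b`.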

RESULTS

Patient and observer characteristics

Nine consecutive cases fulfilling the inclusion criteria were accrued, three for each of the three anatomic areas (spine, lung, and abdominal/pelvic soft-tissue disease). Patient characteristics are shown in Table 1. Among the spine cases, one patient (Case S1) had significant soft-tissue disease around L2 overlying the kidneys, whereas 2 (Cases S2 and S3) had multiple vertebral bone metastases, requiring decisions on which levels to include in the target volume. Among the lung cases, 2 patients (Cases L1 and L2) had significant mediastinal lymphadenopathy and one (Case L3) had significant postobstructive atelectasis, requiring a decision on the extent of disease to include. The abdomen/pelvis cases had relatively well-defined soft-tissue disease.

Interobserver variation in 2D field–based planning

Observers were asked to place the treatment fields they would use to treat each case. The fields were intended to provide adequate coverage of the target volume, with at least 95% coverage. TFDRR refers to the treatment fields defined using DRR-based (2D) planning. The mean interobserver (5 observers) field area ranged from 57.2 to 194.4 cm² across the 9 cases. The mean coefficient of variation (VC = SD/Mean) was 0.22 (range, 0.04–0.54) (Table 2). When the spatial relationship between the fields was taken into account, the mean percent overlap was 76% (range, 56%–85%). Consistency was worst for AP2 (para-aortic nodal mass overlying the kidneys), where the percent overlap was 56% (SD, 36), and best for L3 (hilar mass with postobstructive pneumonitis), where the percent overlap was 85% (SD, 8) (Table 2).

Interobserver variation in 3D volume–based planning

Variation in GTV. When the observers used 3D volume–based planning, the mean GTV volumes ranged from 29 to 599 cm³ (Table 3). The mean VC (SD/Mean) was 0.34 (range, 0.09–0.66). When the spatial relationship was taken into consideration, there was no significant difference in the degree of interobserver variation across clinical sites (mean overlap of 57% [SD, 13], 65% [SD, 13], and 77% [SD, 9] for spine, lung, and abdomen/pelvis, respectively; p = 0.13). Similarly, there was no clinically significant relationship with observer expertise.

Variation in TF3D. The TF3D treatment fields were defined by applying a uniform expansion to the GTV (Table 4). The mean field size across observers for the 9 study cases ranged from 60 to 194.2 cm². The mean coefficient of variation was 0.49 (range, 0.1–0.66), and the mean percent overlap was 74% (range, 55%–88%).

Field- vs. volume (3D)–based planning

When the treatment fields defined by the same observer using the two planning approaches (2D- or 3D-based planning) were compared, the degree of overlap between TFDRR and TF3D was 78% (range, 65%–90%). There was no significant difference in the degree of interobserver variability across disease sites (p > 0.21) or across observers (p > 0.05) (Fig. 2). The reasons for the variability appear to differ across sites. For spine, the percent overlap was 67%, and the discrepancy was predominantly due to under-coverage (mean under-coverage for spine, lung, and abdomen/pelvis was 33.3%, 15%, and 16.7%, respectively; p = 0.02). In other words, the TFDRR was smaller than the TF3D. If the 3D-based field is taken as the gold standard, 2D planning was more likely to result in geographic misses in the planning of bone metastases. On the other hand, for lung and abdomen/pelvis, the degree of agreement was 85% and 83.3%, respectively, with the discrepancy predominantly attributable to over-coverage (mean over-coverage, 48.3% and 58%, respectively; p = 0.08); that is, 2D-based treatment fields tended to be larger and to include more normal tissue than 3D-based treatment fields (Table 5).
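The spine vs. other-site comparisons in Table 5 use an exact Wilcoxon rank-sum test. The sketch below computes such a test by full enumeration in plain Python. It is applied to the per-case mean under-coverage values in Table 5 only because observer-level values are not reported, so it should not be read as a reconstruction of the authors' analysis. `average_ranks` and `exact_rank_sum_test` are hypothetical helpers; SciPy's `scipy.stats.mannwhitneyu(..., method="exact")` computes the equivalent Mann-Whitney form of the test for untied data.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def exact_rank_sum_test(x, y):
    """Two-sided exact Wilcoxon rank-sum test.

    Enumerates every way of assigning the pooled ranks to a group of size len(x)
    and counts assignments whose rank sum is at least as far from its null
    expectation as the observed rank sum.
    """
    ranks = average_ranks(list(x) + list(y))
    n1, n = len(x), len(ranks)
    w_obs = sum(ranks[:n1])
    expected = n1 * (n + 1) / 2
    distance = abs(w_obs - expected)
    extreme = total = 0
    for idx in combinations(range(n), n1):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= distance - 1e-9:
            extreme += 1
    return w_obs, extreme / total


# Per-case mean under-coverage (%) from Table 5: spine vs. lung plus abdomen/pelvis.
spine = [32, 35, 33]
other_sites = [18, 14, 13, 10, 21, 19]
w, p = exact_rank_sum_test(spine, other_sites)
print(w, round(p, 3))  # 24.0 0.024
```

The three spine values occupy the top three of the nine pooled ranks, so among the C(9, 3) = 84 possible assignments only the observed one and its mirror image (ranks 1 to 3) are as extreme, giving p = 2/84.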

Table 1. Case characteristics

Case | Target area | History
S1 (Case 1) | Spine | Colon cancer; back pain; retroperitoneal nodes and L2 destruction
S2 (Case 4) | Spine | Breast cancer; back pain localized to right rib and C7; numbness in left arm; multiple metastases in spine seen on CT (T5–6, 9, 11, L1–2), with epidural disease at T8, 10, and 12 on MRI
S3 (Case 9) | Spine | Prostate cancer (hormone refractory); back pain localized around T7–8; diffuse metastases in spine with narrowing of canal at T7–8
L1 (Case 2) | Lung | Non–small-cell lung cancer; cough; right upper lobe mass with mediastinal nodes
L2 (Case 5) | Lung | Non–small-cell lung cancer; shortness of breath; right hilar mass
L3 (Case 8) | Lung | Non–small-cell lung cancer; shortness of breath; left upper lobe mass with soft tissue around left main bronchus, mediastinal nodes, and postobstructive pneumonia
AP1 (Case 3) | Abdomen/pelvis | Rectal cancer; rectal bleeding; rectal mass, 3.2 cm; right external iliac node causing mild right hydronephrosis
AP2 (Case 6) | Abdomen/pelvis | Renal cancer; left flank pain; primary previously resected; left para-aortic mass; known liver, lung, and mediastinal metastases
AP3 (Case 7) | Abdomen/pelvis | Colon cancer; anemia/GI bleed; cecal mass with local regional nodes; known liver metastases

Abbreviations: CT = computed tomography; MRI = magnetic resonance imaging; GI = gastrointestinal.

Table 2. Interobserver variation in TFDRR

Case | Mean TFDRR (SD) (cm²)* | Variation coefficient†‡ | Percent overlap (SD)§
Spine | | |
S1 | 140.7 (17.8) | 0.13 | 77 (16)
S2 | 73.8 (23.4) | 0.32 | 79 (18)
S3 | 57.2 (12.6) | 0.22 | 77 (15)
Mean across site | 90.6 | 0.22 | 77.7
Lung | | |
L1 | 72.7 (2.6) | 0.04 | 84 (6)
L2 | 84.2 (19.2) | 0.23 | 71 (19)
L3 | 108.4 (11.4) | 0.11 | 85 (8)
Mean across site | 88.4 | 0.13 | 80
Abdomen or pelvis | | |
AP1 | 194.4 (43.1) | 0.22 | 79 (13)
AP2 | 102.6 (55.1) | 0.54 | 56 (36)
AP3 | 167 (33) | 0.2 | 74 (15)
Mean across site | 154.7 | 0.32 | 69.7
Mean (all cases) | 111.2 | 0.22 | 75.8

Abbreviation: TFDRR = treatment fields as defined using digitally reconstructed radiographs. * TFDRR, p = 0.23 (significance testing across treatment site). † Variation coefficient = SD/Mean TFDRR. ‡ Variation coefficient, p = 0.45 (significance testing across treatment site). § Percent overlap, p = 0.52 (significance testing across treatment site).
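The across-site comparisons in the table footnotes use the Kruskal-Wallis test. A minimal pure-Python sketch for exactly three groups, with the standard tie correction, is shown below and applied to the per-case percent overlap values in Table 2. It is illustrative only: it does not reproduce the footnoted p = 0.52, which may be based on data not shown in the table, and with three values per group the chi-square approximation is crude. `kruskal_wallis_3` is a hypothetical helper; SciPy users would typically call `scipy.stats.kruskal` instead.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Tie-corrected Kruskal-Wallis H and chi-square p-value for exactly 3 groups.

    With k = 3 groups the reference distribution has 2 degrees of freedom,
    whose chi-square survival function is exactly exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_sq = 0.0
    start = 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        sum_sq += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * sum_sq - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)  # tie correction
    return h, math.exp(-h / 2)


# Per-case percent overlap from Table 2, grouped by site.
spine, lung, abdomen_pelvis = [77, 79, 77], [84, 71, 85], [79, 56, 74]
h, p = kruskal_wallis_3([spine, lung, abdomen_pelvis])
print(round(h, 2), round(p, 2))  # 1.65 0.44
```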

Table 3. Interobserver variation in GTV volume

Case | Mean GTV volume (SD) (cm³)* | Variation coefficient†‡ | Percent overlap (SD)§
Spine | | |
S1 | 599 (136.5) | 0.23 | 68 (12.8)
S2 | 93.2 (50.4) | 0.54 | 41.8 (29.1)
S3 | 95.2 (63.3) | 0.66 | 60.2 (30.8)
Mean across site | 262.5 | 0.48 | 57
Lung | | |
L1 | 97 (20.9) | 0.22 | 73.6 (12.5)
L2 | 29.1 (9.5) | 0.33 | 71.5 (20.6)
L3 | 73.8 (47.7) | 0.65 | 50.3 (29.6)
Mean across site | 66.6 | 0.4 | 65
Abdomen and pelvis | | |
AP1 | 120.8 (22.3) | 0.18 | 68.3 (11.9)
AP2 | 61.4 (11.6) | 0.19 | 76 (10.9)
AP3 | 368.8 (32.6) | 0.09 | 85.3 (5.8)
Mean across site | 183.7 | 0.15 | 77
Mean (all cases) | 170.9 | 0.34 | 66.3

Abbreviation: GTV = gross tumor volume. * GTV volume, p = 0.51 (significance testing across treatment site). † Variation coefficient = SD/Mean GTV. ‡ Variation coefficient, p = 0.05 (significance testing across treatment site). § Percent overlap, p = 0.13 (significance testing across treatment site).

Table 4. Interobserver variation in TF3D

Case | Mean TF3D (SD) (cm²)* | Variation coefficient†‡ | Percent overlap (SD)§
Spine | | |
S1 | 194.2 (50.7) | 0.26 | 71 (18)
S2 | 94.1 (33.3) | 0.35 | 62 (29)
S3 | 69.5 (33.8) | 0.49 | 75 (22)
Mean across site | 119.3 | 0.37 | 69.3
Lung | | |
L1 | 73.3 (7.6) | 0.1 | 88 (8.2)
L2 | 60 (13.5) | 0.23 | 55 (35)
L3 | 89.5 (32.3) | 0.36 | 85 (8.1)
Mean across site | 74.3 | 0.23 | 76
Abdomen and pelvis | | |
AP1 | 132.4 (12.2) | 0.1 | 88 (6.2)
AP2 | 84.8 (56.3) | 0.66 | 55 (42)
AP3 | 148.4 (8.9) | 0.1 | 83 (9.3)
Mean across site | 121.9 | 0.29 | 75.3
Mean (all cases) | 105.2 | 0.49 | 73.5

Abbreviation: TF3D = treatment field as defined using three-dimensional–based planning. * TF3D, p = 0.34 (significance testing across treatment site). † Variation coefficient = SD/Mean TF3D. ‡ Variation coefficient, p = 0.67 (significance testing across treatment site). § Percent overlap, p = 0.77 (significance testing across treatment site).

Fig. 2. Percent overlap between treatment fields defined by digitally reconstructed radiograph (DRR)–based (TFDRR) and three-dimensional (3D) volume–based (TF3D) planning, shown as box plots for each case (x-axis: S1, S2, S3, L1, L2, L3, AP1, AP2, AP3; y-axis: percent overlap, 0–100). Solid circles represent the mean, and the horizontal line within each box is the median (50th percentile). The box spans the interquartile range (25th to 75th percentile), and the whiskers extend up to 1.5 times the interquartile range from the box. Data points beyond the whiskers are considered outliers and are shown as open circles. The degree of interobserver variation in overlap between TFDRR and TF3D was not significant (p > 0.21).

Table 5. Comparison of agreement between TFDRR and TF3D

Case | Mean percent overlap (SD) | Mean percent under-coverage (SD) | Mean percent over-coverage (SD)
Spine | | |
S1 | 68 (14) | 32 (14) | 8 (7)
S2 | 65 (16) | 35 (16) | 37 (79)
S3 | 67 (19) | 33 (19) | 8 (8)
Mean across site | 66.7* | 33.3† | 17.6‡
Lung | | |
L1 | 82 (6) | 18 (6) | 17 (5)
L2 | 86 (9) | 14 (9) | 73 (27)
L3 | 87 (11) | 13 (11) | 55 (69)
Mean across site | 85 | 15 | 48.3
Abdomen and pelvis | | |
AP1 | 90 (6) | 10 (6) | 57 (26)
AP2 | 79 (35) | 21 (35) | 89 (58)
AP3 | 81 (15) | 19 (15) | 28 (20)
Mean across site | 83.3 | 16.7 | 58
Mean (all cases) | 78 | 22 | 41

Abbreviations: TFDRR = treatment fields as defined using digitally reconstructed radiographs; TF3D = treatment field as defined using three-dimensional–based planning. * Overlap, p = 0.02 (significance testing across treatment site comparing spine vs. lung/abdomen and pelvis). † Under-coverage, p = 0.02 (significance testing across treatment site comparing spine vs. lung/abdomen and pelvis). ‡ Percent over-coverage, p = 0.08 (significance testing across treatment site comparing spine vs. lung/abdomen and pelvis).

DISCUSSION

Clinically significant variations in target delineation across observers deserve scrutiny, whether in the curative or the palliative setting. Our study highlighted the presence and degree of variation across observers in defining the target for simple palliative radiotherapy. The degree of agreement across observers was similar whether a 2D or a 3D approach was used, with a degree of overlap of 76% (range, 56%–85%) and 74% (range, 55%–88%), respectively. When the same observer's fields from the two planning strategies were compared (intraobserver variation), the degree of overlap ranged from 65% to 90%. We also found a tendency to use larger treatment fields for lung and abdomen/pelvis cases when planning with DRRs. There were no significant differences in the degree of interobserver variability related to observer experience, anatomic location, or planning approach.

Different metrics and methodologies have been used to describe interobserver variation. In general, studies have described an "average" region of interest based on all observers (12–14), together with a composite measure that accounts for variation in all directions or along key directions (e.g., craniocaudal, lateral, and anterior–posterior), summarized with statistics such as the coefficient of variation (10, 11), degree of overlap (15), and under-coverage and over-coverage (16). These metrics are not directly comparable across studies, because they are sensitive to the number of observers and to the contours of the least consistent observer. In our study, a modified approach based on pairwise comparison of every pair of observers was used. This has the advantage of being less sensitive to outliers than using the union of all observers' regions as the standard against which each observer is compared.

What is the clinical significance of VC and percent overlap? Unfortunately, no standard criterion is available. In the radical setting, VCs of 0.2 with CT and 0.14 with magnetic resonance imaging have been observed for GTV definition in head-and-neck cancer (10). Another way to put the VC in context is to consider a set of common field sizes: for square fields from 10 × 10 to 14 × 14 cm in 1-cm increments, the coefficient of variation of the field areas is 0.26. The absolute value, however, is perhaps less important than its relative value. In terms of percent overlap, we consider an overlap of 80% or greater a reasonable criterion for clinically acceptable variation in PTV target definition. Fig. 3 shows a representative case in which 5 observers defined the target of interest; here, the coefficient of variation of the GTV volumes was 0.23, and the percent overlap was 68%.

Fig. 3. Gross tumor volume (GTV) defined by 5 observers (Case S1): mean GTV, 599 cm³ (SD, 136.5 cm³); coefficient of variation, 0.23; and percentage overlap, 68% (SD, 12.8%).

Interobserver variation in target delineation has been the subject of intense investigation in many curative clinical settings, including prostate, head-and-neck (17), breast, bladder, lung (18–20), and esophageal cancer. Strategies shown to reduce interobserver variation include the addition of contrast (14, 21), complementary imaging modalities [e.g., positron emission tomography (22) and magnetic resonance imaging (23)], fiducial markers (24), the inclusion of a diagnostic radiologist in the multidisciplinary team for target delineation (20, 21), and delineation protocols (11, 18, 25). These strategies predominantly address uncertainty in target visualization. In simple palliative radiotherapy, the sources of variability go beyond the challenge of imaging. Key factors likely include clinical judgment about which GTV is relevant for radiotherapy and the lack of evidence or consensus to guide appropriate CTV definition in palliative radiotherapy. When multiple sites of disease are detected radiologically, a common scenario in palliative practice, the clinician must make an explicit judgment on the radiotherapy target (the GTV), synthesizing the clinical history and the physical and radiologic findings to establish which area of disease is likely responsible for the patient's symptoms. Different clinical decisions can result in significant variations in the final target volumes.

Simple radiotherapy can be planned using 2D planning (i.e., based on DRRs only), 3D planning without dosimetry, or 3D planning with dosimetry, and all three approaches are currently used in clinical practice. The use of 3D planning (with or without dosimetry) can reduce geographic misses. Dosimetry provides the opportunity to use simple optimization strategies (e.g., different beam weighting) that could improve normal tissue sparing. Although it remains to be proven whether, in palliative radiotherapy, fewer geographic misses and lower dose to normal tissue translate into improved efficacy and quality of life at a population level, achieving these for individual patients is a sound goal for the practice of palliative radiotherapy.

What strategies can be used to reduce this variability? Variability due to differences in clinical decision making may respond to peer-review strategies such as a tumor board. What constitutes an appropriate CTV margin in different palliative clinical scenarios should be the subject of further investigation. In contrast to the radical setting, where the potential routes of spread and long-term patterns of failure have provided evidence to guide the design of CTV margins, information on short- and intermediate-term local symptom control and failure patterns, the risk of morbidity from key anatomic areas, and estimates of toxicity is lacking. Consensus-based guidelines may offer the best way forward while the evidence needed to guide clinical practice is built. Similarly, choices of PTV margin require due consideration of the patient's condition and discomfort, which guide the choice of immobilization and the tolerance criteria for treatment verification. Building an evidence base for setup variability in different palliative clinical scenarios would provide the data needed for guidelines and consensus in clinical practice.

Our study was based on 9 cases and 5 observers, a modest sample given the wide spectrum of scenarios in the practice of simple palliative radiotherapy, so our findings should be generalized with caution. Nevertheless, we provide proof of principle that clinically significant interobserver variability in target definition exists in palliative radiotherapy. This is important and demands attention and further investigation.

CONCLUSION

Palliative radiotherapy involves a complex interaction of practical considerations based on the patient's physical condition, tumor anatomy, imaging characteristics, and the planning strategy used. Our study highlighted the existence of clinically significant variability in target delineation for planning palliative radiotherapy treatments suitable for "simple" techniques. Strategies that can be efficiently incorporated into clinical practice to reduce this variability could have a valuable impact on patient outcomes and deserve further investigation.

REFERENCES

1. Radiotherapy practices in Sweden compared to the scientific evidence. Acta Oncol 1996;35(Suppl. 6):89–97.
2. Mackillop WJ, Zhou S, Groome P, et al. Changes in the use of radiotherapy in Ontario 1984-1995. Int J Radiat Oncol Biol Phys 1999;44:355–362.
3. Haddad P, Wong RK, Pond GR, et al. Factors influencing the use of single vs multiple fractions of palliative radiotherapy for bone metastases: A 5-year review. Clin Oncol (R Coll Radiol) 2005;17:430–434.
4. Driver DM, Drzymala M, Dobbs HJ, et al. Virtual simulation in palliative lung radiotherapy. Clin Oncol (R Coll Radiol) 2004;16:461–466.
5. International Commission on Radiation Units and Measurements. ICRU report 62. Prescribing, recording and reporting photon beam therapy (supplement to ICRU report 50). Bethesda, MD: International Commission on Radiation Units and Measurements; 1999.
6. Weiss E, Hess C. The impact of gross tumor volume (GTV) and clinical target volume (CTV) definition on the total accuracy in radiotherapy theoretical aspects and practical experiences. Strahlenther Onkol 2003;179:21–30.
7. Letourneau D, Wong R, Moseley D, et al. Online planning and delivery technique for radiotherapy of spinal metastases using cone-beam CT: Image quality and system performance. Int J Radiat Oncol Biol Phys 2007;67:1229–1237.
8. Vloet A, Letourneau D, Moledina S, et al. Feasibility testing of a quality assurance process for a cone-beam CT enabled online planning and treatment model for palliative radiotherapy. Rad Oncol 2007;84:S62.
9. Letourneau D, Vloet A, Wong R, et al. Assessment of intrafraction motion for on-line planning and delivery of radiotherapy for patients with spinal metastases. Rad Onc 2007;84:S45.
10. Geets X, Daisne JF, Arcangeli S, et al. Inter-observer variability in the delineation of pharyngo-laryngeal tumor, parotid glands and cervical spinal cord: Comparison between CT-scan and MRI. Radiother Oncol 2005;77:25–31.
11. Tai P, Van Dyk J, Yu E, et al. Variability of target volume delineation in cervical esophageal cancer. Int J Radiat Oncol Biol Phys 1998;42:277–288.
12. Hurkmans CW, Borger JH, Pieters BR, et al. Variability in target volume delineation on CT scans of the breast. Int J Radiat Oncol Biol Phys 2001;50:1366–1372.
13. Meijer GJ, Rasch C, Remeijer P, et al. Three-dimensional analysis of delineation errors, setup errors, and organ motion during radiotherapy of bladder cancer. Int J Radiat Oncol Biol Phys 2003;55:1277–1287.
14. Breen SL, Publicover J, De Silva S, et al. Intraobserver and interobserver variability in GTV delineation on FDG-PET-CT images of head and neck cancers. Int J Radiat Oncol Biol Phys 2007;68:763–770.
15. Struikmans H, Warlam-Rodenhuis C, Stam T, et al. Interobserver variability of clinical target volume delineation of glandular breast tissue and of boost volume in tangential breast irradiation. Radiother Oncol 2005;76:293–299.
16. McJury M, Fisher PM, Pledge S, et al. The impact of virtual simulation in palliative radiotherapy for non-small-cell lung cancer. Radiother Oncol 2001;59:311–318.
17. Jeanneret-Sozzi W, Moeckli R, Valley JF, et al. The reasons for discrepancies in target volume delineation: A SASRO study on head-and-neck and prostate cancers. Strahlenther Onkol 2006;182:450–457.
18. Steenbakkers RJ, Duppen JC, Fitton I, et al. Observer variation in target volume delineation of lung cancer related to radiation oncologist-computer interaction: A 'Big Brother' evaluation. Radiother Oncol 2005;77:182–190.
19. Van de Steene J, Linthout N, de Mey J, et al. Definition of gross tumor volume in lung cancer: Inter-observer variability. Radiother Oncol 2002;62:37–49.
20. Horan G, Roques TW, Curtin J, et al. "Two are better than one": A pilot study of how radiologist and oncologists can collaborate in target volume definition. Cancer Imaging 2006;6:16–19.
21. McJury M, Dyker K, Nakielny R, et al. Optimizing localization accuracy in head and neck, and brain radiotherapy. Br J Radiol 2006;79:672–680.
22. Greco C, Rosenzweig K, Cascini GL, et al. Current status of PET/CT for tumour volume definition in radiotherapy treatment planning for non-small cell lung cancer (NSCLC). Lung Cancer 2007;57:125–134.
23. Smith WL, Lewis C, Bauman G, et al. Prostate volume contouring: A 3D analysis of segmentation using 3DTRUS, CT, and MR. Int J Radiat Oncol Biol Phys 2007;67:1238–1247.
24. Moseley DJ, White EA, Wiltshire KL, et al. Comparison of localization performance with implanted fiducial markers and cone-beam computed tomography for on-line image-guided radiotherapy of the prostate. Int J Radiat Oncol Biol Phys 2007;67:942–953.
25. Seddon B, Bidmead M, Wilson J, et al. Target volume definition in conformal radiotherapy for prostate cancer: Quality assurance in the MRC RT-01 trial. Radiother Oncol 2000;56:73–83.