Radiation Physics and Chemistry 168 (2020) 108580
Do a priori expectations of plan quality offset planning variability in head and neck IMRT?
N. Alves (b,∗), T. Ventura (a), J. Mateus (a), M. Capela (a), M.C. Lopes (a)
(a) IPOCFG, E.P.E, Medical Physics Dept., Av. Bissaya Barreto, 3000-075, Coimbra, Portugal
(b) FCTUC, Physics Dept., Coimbra, Portugal
ARTICLE INFO
Keywords: Radiotherapy; Head & neck; IMRT; Dosimetry; PlanIQ
ABSTRACT
Introduction: Radiotherapy planning is still a rather biased and planner-dependent process with large output variability. This work evaluates whether the feasibility estimation tool from PlanIQ software version 2.2 (Sun Nuclear Corporation) contributes to a significant improvement in the quality of intensity modulated radiation therapy plans for head and neck patients in Tomotherapy, and whether it leads to greater homogeneity across different planners.
Methods: Twenty-eight head and neck clinical cases, distributed among three planners (A, B, C), were planned 1) without the use of PlanIQ's feasibility tool and 2) using the feasibility dose-volume histogram (DVH) with feasibility level f=0.1 as a guideline for the organs-at-risk (OAR) constraints. In both scenarios the resulting DVHs were compared to the closest feasibility DVH curve, and the mean or maximum dose to the OARs as well as the target coverage were assessed. Finally, the results were validated with SPIDERplan.
Results: For planner A, 3 out of 17 OARs showed a statistically significant lowering of the mean f value with PlanIQ. For planner B, only 1 OAR achieved this result, and most of the OARs presented a higher mean f value in scenario 2. Planner C had the greatest improvement, with 12/17 OARs showing significant differences. The dose limits clinically demanded for the OARs were generally met in both scenarios by all planners, and the improvements in OAR sparing in scenario 2 for planners A and C were not at the expense of the coverage of the target volumes. These results were corroborated by the SPIDERplan scores, which also showed that the standard deviation of the global plan scores achieved by the different planners was lower with the use of the tool.
Conclusion: For planners A and B, the use of PlanIQ did not show a great improvement in the sparing of organs-at-risk, whereas for planner C it made a major difference. In any case, PlanIQ contributed to a lower variability in plan quality among the different planners.
1. Introduction
Radiotherapy is a cancer treatment approach that relies on ionizing radiation, such as photons, electrons or heavy ions, to deliver toxic levels of energy to tumorous cells, leading to their controlled destruction. The planning of radiotherapy treatments involves a complex multi-objective optimization problem: to deliver a high radiation dose to the target volumes, complying with the objectives set by clinicians, whilst reducing the dose received by the adjacent organs to clinically acceptable values (Breedveld et al., 2018). Over the years, several advances have been made towards developing techniques and technologies that solve this problem more efficiently, such as intensity modulated radiation therapy (IMRT), volumetric-modulated arc therapy, stereotactic body RT and adaptive RT (Garibaldi et al., 2017), favoring the creation of better-quality radiotherapy plans. IMRT is a treatment technique that allows the delivery of highly conformal dose distributions by modulating the intensity of the photon beam over time using multi-leaf collimators (MLC) (Garibaldi et al., 2017).
Helical Tomotherapy is a treatment technology in which a 6 MV linear accelerator mounted onto a CT-type ring gantry rotates at constant speed around the patient, while the patient's couch slides through it (Welsh et al., 2002). This contrasts with IMRT delivery by conventional linear accelerators, which typically consists of modulated static/dynamic fields or arcs around the patient, who lies on a fixed couch position (Welsh et al., 2002). In Tomotherapy, the radiation beam takes a helical shape and this, combined with the intensity modulation provided by a binary MLC, allows for the delivery of highly conformal doses to the target volumes (Welsh et al., 2002).
∗ Corresponding author. E-mail address: [email protected] (N. Alves).
https://doi.org/10.1016/j.radphyschem.2019.108580 Received 30 July 2019; Received in revised form 8 October 2019; Accepted 14 November 2019 Available online 15 November 2019 0969-806X/ © 2019 Elsevier Ltd. All rights reserved.
Radiotherapy treatment planning is still considered a combination of science and art, being highly dependent on the planners’ experience and skills (Nelms et al., 2012). Due to the intrinsic complexity of the optimization task planners are faced with, there is a high output variability across different planners, as each may have a particular approach to meeting the dose prescription requirements set by the clinicians. For example, some planners may choose to stop attempting to further spare OAR once the dose limit is met, while others may opt to pursue extra sparing (Fried et al., 2017). Multiple studies reinforce the need to pursue a dose as low as possible, especially in structures that are critical to the patient's quality of life, such as the salivary glands, pharyngeal constrictors, or larynx (Fried et al., 2017; Rancati et al., 2010). Nevertheless, planners guide their process by generic dose constraints on both the targets and the critical structures which, if met, allow the patient to be treated safely, but which are not specific to any patient and do not guarantee that the best possible plan is achieved for each individual. In this sense, it could be argued that planners would benefit from a priori knowledge of the patient-specific best achievable results for the sparing of OAR, which could provide individually ambitious objectives for each structure.
It is this goal that the Feasibility Estimation Module integrated in PlanIQ version 2.2 by Sun Nuclear Corporation proposes to accomplish. This tool provides an a priori understanding of how difficult it will be to achieve a certain dose distribution in each OAR for a given patient, based on individual information on the patient anatomy, while guaranteeing full coverage of the target volumes by the prescribed dose (Ahmed et al., 2017). This qualitative information is provided by dividing the dose-volume histograms into different areas that correspond to different levels of feasibility (f). This tool could not only assist planners in their optimization tasks, but also provide a means to assess and compare the quality of the plans across different planners, and hopefully contribute to a more homogeneous plan quality output (Ahmed et al., 2017).
Ahmed et al. presented a validation for 10 clinical head-and-neck (H&N) patients treated with Volumetric Modulated Arc Therapy (VMAT) (Ahmed et al., 2017). Additionally, Fried et al. compared the sparing of patients’ salivary glands and larynx with and without the use of the tool, using 10 patients and considering VMAT (Fried et al., 2017). The goal of this study is to assess the PlanIQ Feasibility tool in a clinical context, considering 28 H&N patients with multiple OAR and different target volume sizes and shapes who were treated with Helical Tomotherapy. It aims to evaluate whether the feasibility estimation is suitable for Tomotherapy treatment planning, and whether it can lead to an overall improvement in OAR sparing and plan quality, as well as contribute to a greater output homogeneity across the different planners at our center.
2. Experimental set-up
2.1. Sample description
Twenty-eight H&N patients were distributed across 3 different planners, A, B and C, who were assigned 8, 9 and 11 patients respectively. The prescribed doses to the Tumor Planning Target Volume (PTV_T) were either 66 Gy or 69.96 Gy, and the prescribed dose to the lymph node PTVs (PTV_N) was either 50.4 Gy, 54 Gy or 59.4 Gy. Helical IMRT treatments were delivered to the patients in 33 fractions using the Tomotherapy HD (Accuray) treatment unit. Regarding the OAR, the spinal cord, brainstem, parotids, oral cavity, oesophagus, lips and mandible were considered for all patients. Furthermore, 23 patients also presented the submandibular glands, 18 the larynx, 24 the cochleae, ears and constrictor muscle, and 26 the thyroid as OARs.
2.2. Feasibility determination
The process of the feasibility calculation by PlanIQ's software is thoroughly described by Ahmed et al. (2017), and only the major steps are presented here. The inputs provided to the software for the feasibility calculations were the patient anatomy data (planning CT scan with the contoured OAR and target volumes), the overlapping order of the targets (from highest to lowest prescribed dose), the dose to be delivered to the target volumes, set to at least 95% of the dose prescribed by the radiation oncologist, which is the minimum required PTV coverage criterion, and finally calculation parameters such as beam energy (6 MV) and dose grid spatial resolution (2 mm).
For a given structure, a feasibility level (f) that ranges between 0 and 1 is computed. An iso-feasibility curve is then created by joining the points in the DVH with the same value of f. The f=0 curve is computed based on a benchmark dose grid that assigns full coverage to the target volumes with their respective prescription doses, and then estimates the minimum dose that any voxel outside these volumes could receive, using both a high gradient dose spread function and a low dose spread to periphery function (Ahmed et al., 2017). This distribution represents the “ideal” sparing curve and is unachievable by design, serving as the foundation for the feasibility calculations (Ahmed et al., 2017). The feasibility level for every point in the DVH that is above the f=0 curve is then computed considering the normalized distance of this point to the f=0 curve, and a closeness-to-feasibility function that converts this distance to a feasibility level (Ahmed et al., 2017).
After all feasibility calculations are completed, the software generates the feasibility DVH (fDVH) for each OAR, which is qualitatively divided into 4 areas: the “impossible” region (red), which has the f=0 curve as its upper boundary and is unachievable by design, given full coverage of the target volumes; the “difficult” region (orange), which comprises curves with f values between 0 and 0.1; the “challenging” region (yellow), ranging between f=0.1 and f=0.5; and finally the “easy” or “probable” region (green), from f=0.5 to f=1, as defined in PlanIQ's reference guide (Sun Nuclear Corporation, 2014-2018). Thus, the lower the feasibility level, the more difficult it is to achieve that dose distribution given full target coverage, the lower the dose received by the OAR and the higher its sparing.
2.3. Plan optimization
For each patient, the fDVH was generated for every OAR using the feasibility estimation tool from PlanIQ's software version 2.2 by Sun Nuclear Corporation. Each planner produced 2 plans for each of their patients: scenario 1) without knowledge of the feasibility estimation provided by PlanIQ, and scenario 2) using as objectives in the Treatment Planning System (TPS, VoLO 2.1.3) 3 metrics taken from the f=0.1 curve of the fDVH for each OAR: the doses received by at least 90% and 50% of the volume (D90 and D50 respectively) and the maximum dose of the non-zero-dose voxels in the structure (Dmax). The tolerance criteria for the OARs were defined according to clinical protocols.
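The qualitative fDVH regions of Section 2.2 and the three planning metrics of Section 2.3 can be illustrated with a minimal Python sketch. It is not PlanIQ's implementation: the function names, the sample curve and the handling of values that fall exactly on a region boundary are assumptions made for illustration, and Dmax is approximated here by the dose at which the cumulative volume first drops to zero.

```python
def feasibility_region(f):
    """Map a feasibility level f in [0, 1] to its qualitative fDVH region.

    Thresholds (0.1 and 0.5) are those quoted in Section 2.2. Assigning a
    boundary value to the upper region is an assumption of this sketch.
    The "impossible" region lies below the f = 0 curve and therefore has
    no f value of its own.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if f < 0.1:
        return "difficult"    # orange
    if f < 0.5:
        return "challenging"  # yellow
    return "easy"             # green


def dose_at_volume(doses, volumes, percent):
    """Dose (Gy) received by at least `percent` of the structure volume.

    `doses` must be increasing and `volumes` (cumulative DVH, in %) must be
    non-increasing and of the same length. Values between samples are
    obtained by linear interpolation.
    """
    for i in range(1, len(doses)):
        v_hi, v_lo = volumes[i - 1], volumes[i]
        if v_hi != v_lo and v_hi >= percent >= v_lo:
            frac = (v_hi - percent) / (v_hi - v_lo)
            return doses[i - 1] + frac * (doses[i] - doses[i - 1])
    raise ValueError("requested volume level is not crossed by the DVH")


def planning_metrics(doses, volumes):
    """D90, D50 and an approximate Dmax read from one cumulative DVH curve."""
    return {
        "D90": dose_at_volume(doses, volumes, 90.0),
        "D50": dose_at_volume(doses, volumes, 50.0),
        # Approximation: dose at which the cumulative volume reaches zero.
        "Dmax": dose_at_volume(doses, volumes, 0.0),
    }


# Hypothetical f = 0.1 curve for one OAR (values invented for illustration).
doses = [0.0, 10.0, 20.0, 30.0, 40.0]       # Gy
volumes = [100.0, 100.0, 80.0, 20.0, 0.0]   # % of structure volume
metrics = planning_metrics(doses, volumes)
```

With this invented curve, `metrics` holds the three values a planner would enter as OAR objectives in scenario 2.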
2.4. Plan analysis
An individual patient analysis was conducted in which the relative DVHs (rDVHs) obtained by each planner in scenarios 1 and 2 for every OAR and patient were crossed with the respective fDVHs generated by PlanIQ. Afterwards, the rDVH and the fDVH were compared using the Dice index, Eq. (1); the closer the value of D is to 1, the more similar the DVHs are:

$$D = 2 \times \frac{A_{rDVH} \cap A_{fDVH}}{A_{rDVH} + A_{fDVH}} \qquad (1)$$
The example in Fig. 1 corresponds to the spinal cord of one of planner C's patients. Crossing the ftheor for each OAR per planner in scenarios 1 and 2 with the absolute dose delivered to the OAR showed that, in general, ftheor was lower in scenario 2 and that the dose constraints were met in both scenarios. Fig. 2 shows an example of this analysis for all of planner C's patients, for the spinal cord.
In Eq. (1), $A_{rDVH}$ is the area under the rDVH curve and $A_{fDVH}$ the area under the fDVH curve. The fDVH closest to the planner's rDVH (fDVHtheor) was thus identified, together with its respective value of f (ftheor). The obtained values of ftheor were then crossed with the mean dose (Dmean) or Dmax for each OAR (according to the clinical endpoint), as well as with the respective clinical dose limit and with the coverage of the target volumes. An analysis per planner was then carried out, in which the mean ftheor from scenarios 1 and 2 for each OAR of the patients of a given planner was computed, as well as its standard deviation. To assess the extent to which the similarity between the DVHs was acceptable, the mean maximum Dice index and respective standard deviation for each OAR were also calculated. The Wilcoxon signed-rank test with a significance level of 0.05 was used to determine whether there were statistically significant differences between the mean ftheor values from scenarios 1 and 2.
Finally, a validation of the results was carried out using the SPIDERplan quality assessment tool. This tool, described in detail by Ventura et al. (2016), uses an intuitive graphical representation through customized radar plots and an associated score function, based on the dose constraints of the targets and OARs, to evaluate the quality of the plans. A score lower than 1 is achieved when all planning aims are surpassed, and a score higher than 1 means that the plan does not comply with the required dose constraints. Structures are organized in groups, and both groups and structures are assigned weights that reflect the radiation oncologist's clinical preferences. Furthermore, a global plan score combining all structure scores and the clinical preferences is calculated, and a score per group can also be determined.
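The Dice comparison of Eq. (1) and the selection of the closest feasibility curve (fDVHtheor, ftheor) can be sketched in a few lines of Python. Several points are assumptions made for this sketch, not details given in the article: both curves are sampled on the same dose grid, areas are computed with the trapezoidal rule, the intersection term is taken as the area under the pointwise minimum of the two curves, and the closest curve is taken to be the candidate with the highest Dice index. All names and numbers are hypothetical.

```python
def area(doses, volumes):
    """Area under a DVH curve using the trapezoidal rule."""
    return sum(
        (doses[i + 1] - doses[i]) * (volumes[i] + volumes[i + 1]) / 2.0
        for i in range(len(doses) - 1)
    )


def dice_index(doses, rdvh, fdvh):
    """Dice index between two DVHs sampled on the same dose grid.

    Assumption: the intersection of the two areas is the area under the
    pointwise minimum of the curves.
    """
    overlap = [min(r, f) for r, f in zip(rdvh, fdvh)]
    total = area(doses, rdvh) + area(doses, fdvh)
    if total == 0:
        raise ValueError("both curves have zero area")
    return 2.0 * area(doses, overlap) / total


def closest_feasibility(doses, rdvh, fdvh_by_f):
    """Return (f, Dice) for the candidate fDVH most similar to the rDVH.

    `fdvh_by_f` maps a feasibility level f to the fDVH volumes at that level.
    """
    scored = ((f, dice_index(doses, rdvh, curve)) for f, curve in fdvh_by_f.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical curves (cumulative volume in % on a common dose grid in Gy).
doses = [0.0, 10.0, 20.0, 30.0]
rdvh = [100.0, 80.0, 40.0, 0.0]
candidates = {
    0.1: [100.0, 60.0, 20.0, 0.0],
    0.5: [100.0, 80.0, 40.0, 0.0],
}
f_theor, max_dice = closest_feasibility(doses, rdvh, candidates)
```

In practice the candidate set would contain many iso-feasibility curves, and the maximum Dice value would be recorded alongside ftheor as the similarity check described above.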
In this study, the structures were divided into five groups (PTV, Critical, DigestOral, Bone and Other), which were assigned weights of 55%, 33%, 6%, 4% and 2%, respectively. The composition of the groups depends on factors such as the pathology and clinical preferences, established on the basis of institutional or international protocols (such as RTOG 0615 (Lee et al., 2014)), and the weights were assigned according to the clinical relevance of each group (Table 1) (Ventura et al., 2016).
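To show how the group weights enter a global plan score, the following Python sketch combines group scores as a weighted sum. This is only an assumption consistent with the description above; the exact SPIDERplan score function is defined in Ventura et al. (2016) and is not reproduced in this article. The weights are the ones quoted above, while the group scores and the function name are hypothetical.

```python
# Group weights quoted in the text (55%, 33%, 6%, 4% and 2%).
GROUP_WEIGHTS = {
    "PTV": 0.55,
    "Critical": 0.33,
    "DigestOral": 0.06,
    "Bone": 0.04,
    "Other": 0.02,
}


def global_score(group_scores, weights=GROUP_WEIGHTS):
    """Weighted sum of group scores (assumed form of the global score)."""
    if set(group_scores) != set(weights):
        raise ValueError("group scores and weights must cover the same groups")
    return sum(weights[group] * group_scores[group] for group in weights)


# Hypothetical group scores, invented for illustration only.
example_scores = {
    "PTV": 0.98,
    "Critical": 0.60,
    "DigestOral": 0.90,
    "Bone": 0.45,
    "Other": 1.00,
}
example_global = global_score(example_scores)
```

Because the PTV and Critical groups carry 88% of the total weight, a change in their scores moves the global score far more than an equal change in the remaining groups.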
3.2. Analysis per planner
The results of the analysis of the mean ftheor for each OAR per planner are shown in Fig. 3. For planner A, there was a statistically significant decrease in the mean ftheor for the left submandibular gland, the oral cavity and the lips in scenario 2. For planner B, there was a significant decrease in the mean ftheor for the spinal cord in scenario 2 but, in contrast, a significant increase was observed for the parotids, larynx and oesophagus, revealing that the sparing of these OAR was worse with the use of the feasibility tool. Regarding planner C, there were significant declines of high magnitude in the mean ftheor for almost every OAR in scenario 2, the constrictor muscle, larynx and cochleae being the only exceptions. The mean Dice indexes for each structure across all planners were all above 0.8, meaning that the pairs of curves are generally quite similar.
3.3. Target coverage
Regarding the target coverage, for planners A and B there was no visible difference between scenarios 1 and 2. For planner A, all targets received more than 95% of the prescribed dose, and for planner B there were 2 cases of PTV_N that fell slightly below the 95% mark in scenario 1 and 2 in scenario 2. Planner C was the only one who showed a visible distinction between the coverage in scenarios 1 and 2, with the percentage of dose to the target tending to be smaller in scenario 2; in spite of that, only two PTV_N and one PTV_T fell below 95% of the prescribed dose.
3.4. SPIDERplan validation
The group scores from the SPIDERplan validation in both scenarios are shown in Table 2. For all planners, the Critical group achieved a better (closer to 0) score in scenario 2. For the DigestOral group, planners A and C achieved a better score in scenario 2, but planner B had the opposite result. For the Bone group, the differences were small. Regarding the PTV group, for planners A and B the score was almost identical in both scenarios, whereas for planner C there is a slight increase in the score for scenario 2. As for the global scores per planner, shown in Table 3, all planners showed a decrease in the global score for scenario 2, with planner C achieving the largest difference between the two scenarios. The standard deviation of the scores was also lower in scenario 2.
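The mean and standard deviation reported in Table 3 can be reproduced from the three per-planner global scores. The short Python check below uses the standard library only; that the reported spread is the sample standard deviation (n − 1 in the denominator) is an inference from the numbers, not something stated in the article, and the variable and function names are hypothetical.

```python
from statistics import mean, stdev

# Global SPIDERplan scores for planners A, B and C, as listed in Table 3.
global_scores = {
    "scenario 1": [0.856, 0.815, 0.861],
    "scenario 2": [0.827, 0.796, 0.797],
}


def summarize(scores):
    """Mean and sample standard deviation, rounded to three decimals."""
    return round(mean(scores), 3), round(stdev(scores), 3)


summary = {name: summarize(values) for name, values in global_scores.items()}
```

The drop in standard deviation between the two scenarios is the quantity the article uses as its measure of reduced inter-planner variability.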
3. Results
3.1. Analysis per OAR
The comparative analysis of the rDVHs and fDVHs showed that for most OAR the obtained rDVH was closer to the ideal sparing curve (f=0) in scenario 2. One example is presented in Fig. 1.

Table 1
Distribution of the structures per groups and respective group weights in the SPIDERplan configuration.
Group | Structures | Group weight
PTV | PTV-T, PTV-N | 55%
Critical | Spinal cord, Brainstem | 33%
DigestOral | Parotids, Oral cavity, Oesophagus, Larynx, Submandibular glands, Lips, Constrictor muscle | 6%
Bone | Mandible, Ears, Cochleae | 4%
Other | Thyroid | 2%

4. Discussion
The proposed methodology allowed for the assessment of the PlanIQ feasibility tool in a real clinical environment, considering 28 H&N patients with multiple OAR and target volumes with competing objectives, distributed across 3 planners with different degrees of experience and planning approaches. Furthermore, it explores whether this tool is suitable for treatment with Helical Tomotherapy. Since the introduction of this feasibility tool in PlanIQ's software, there have been few studies to validate its efficacy and relevance for H&N patients. In the work by Ahmed et al. (2017), where the algorithm behind the feasibility tool is described, a validation is presented for 10 H&N datasets, considering 3 OAR (namely the parotids and the inferior pharyngeal constrictor). Moreover, just one OAR was considered at a time for the manually driven plans, to avoid competing objectives.
Fig. 1. fDVHs with rDVH and fDVHtheor for scenario 1 (left) and 2 (right) for the spinal cord of a patient from planner C.
Fig. 2. The red dots represent scenario 1 and the blue dots represent scenario 2. On the right, the red line and the blue line depict the mean ftheor for the Spinal Cord of planner C patients in scenarios 1 and 2 respectively. On the left, the black line represents the dose limit that is clinically acceptable for the spinal cord. There is a clear fall off in the maximum dose received by the spinal cord in scenario 2, that is associated with a smaller value of ftheor. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Another study, by Fried et al. (2017), compared the sparing of patients' salivary glands and larynx with and without the use of the tool, using 10 patients with primary lesions of the oropharynx. In both studies VMAT was the adopted treatment technique. In the present study, besides validating the effect of the tool on the DVHs of the OAR, the impact it has on planning variability across the different planners is assessed, as well as which planners benefited most from the use of the tool.
The results indicate that the impact of the feasibility tool is not homogeneous across the three planners, which corroborates the idea that dose planning is highly planner dependent (Nelms et al., 2012). Fig. 3 shows that the use of the tool had a major impact for planner C, since it caused a lowering of the ftheor for every OAR, and a statistically significant difference for most of them. This means that the rDVH is closer to the best-case scenario curve (f=0), and thus the sparing of the OAR is increased in scenario 2 for this planner. As for planner A, it could be argued that the tool did not have as much of an impact: although a decrease of the mean value of ftheor can be seen for the spinal cord, submandibular glands, constrictor muscle, oral cavity and larynx, only three structures showed statistically significant differences between the two scenarios. It is also important to point out that for this planner the mean value of ftheor was already in the yellow (“challenging”) area in scenario 1 for several OAR, whereas for planner C almost all structures were in the green (“easy”) area. This means that for planner A the sparing of the OAR was higher than for planner C prior to the use of the feasibility tool, which could be the reason why the magnitude of the effect the tool had on the sparing was greater for planner C. Regarding planner B, there was a significant improvement in the sparing of the spinal cord in scenario 2, but this was visibly achieved at the expense of the sparing of other, less critical structures, since almost all other OAR showed a worse mean value of ftheor in scenario 2. Despite this, the absolute dose delivered to each OAR shows that the dose criteria were generally met in both scenarios.
The percentage of the prescribed dose delivered to the PTVs made clear that the improvements in OAR sparing achieved by planners A and C in scenario 2 were not due to a sacrifice in the dose to the targets, since overall all planners complied with the objective of 95% of the prescribed dose. Nevertheless, for planner C there was a visible lowering of the percentage of dose delivered to the target in scenario 2, which is also related to the great improvement in the sparing of the patients' OAR. The SPIDERplan validation corroborated these results, as it showed that the planner for whom the use of the tool had the greatest effect was undoubtedly planner C, and that the most positively affected OAR group was the critical one (brainstem and spinal cord). The standard deviation of the global scores for all the planners was lower in scenario 2, which suggests that the use of the PlanIQ feasibility tool contributed to less output variability among the three planners. Moreover, the mean Dice coefficients were relatively high for every OAR, which indicates that the DVHs produced by the tool were in agreement with the ones generated by the TPS for Tomotherapy treatments, an indication that the feasibility tool as implemented in PlanIQ also fits the IMRT planning specifications of this
Fig. 3. The mean ftheor and corresponding standard deviations for each OAR for planners A, B and C are represented. The plot is divided in the feasibility regions (orange, yellow and green) and the statistically significant results are shown with a black star. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Table 2
Group SPIDERplan scores per planner in scenarios 1 and 2.
Group | Planner A, Scenario 1 | Planner A, Scenario 2 | Planner B, Scenario 1 | Planner B, Scenario 2 | Planner C, Scenario 1 | Planner C, Scenario 2
PTV | 0.977 | 0.976 | 0.978 | 0.977 | 0.969 | 0.985
Critical | 0.637 | 0.561 | 0.548 | 0.476 | 0.673 | 0.495
DigestOral | 0.949 | 0.888 | 0.843 | 0.906 | 0.930 | 0.717
Bone | 0.430 | 0.428 | 0.425 | 0.432 | 0.535 | 0.554
Table 3
Global SPIDERplan scores per planner in scenarios 1 and 2 with respective mean and standard deviation.
Global scores | Planner A | Planner B | Planner C | Mean | Standard deviation
Scenario 1 | 0.856 | 0.815 | 0.861 | 0.844 | 0.025
Scenario 2 | 0.827 | 0.796 | 0.797 | 0.807 | 0.018

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.radphyschem.2019.108580.
technology. It is also important to note that some structures presented doses lower than the “impossible” DVH boundary. This happened because only 95%, instead of 100%, of the prescribed dose was considered as the input parameter. There were a few limitations to this analysis inherent to its clinical nature, such as the fact that the plans from scenarios 1 and 2 were produced at different points in time, and the fact that each planner had different patient cases. Nevertheless, the adopted methodology proved adequate for the proposed study, although there is still room for improvement.
5. Conclusion
This study showed that PlanIQ fDVHs can be a valuable tool both for providing ambitious patient-specific goals and as a way to compare and assess the performance of different planners. The results indicate that the use of this tool contributed to the sparing of OAR, although this effect was more evident for planners who exhibited higher mean values of ftheor prior to the use of the tool. The structures that benefited the most were the spinal cord and the brainstem, which are critical to the patients’ quality of life, and this improvement did not compromise the target coverage. It is also demonstrated that this tool is suitable for H&N cases with multiple targets and overlapping OAR treated with Tomotherapy, and that the use of fDVH metrics contributed to a smaller output variability in the quality of the plans produced by different planners.
References
Ahmed, S., et al., 2017. A method for a priori estimation of best feasible DVH for organs-at-risk: validation for head and neck VMAT planning. Med. Phys. https://doi.org/10.1002/mp.12500.
Breedveld, S., et al., 2018. Multi-criteria optimization and decision making in radiotherapy. Eur. J. Oper. Res. https://doi.org/10.1016/j.ejor.2018.08.019.
Fried, D., et al., 2017. Assessment of PlanIQ Feasibility DVH for head and neck treatment planning. Radiol. Oncol. Phys. https://doi.org/10.1002/acm2.12165.
Garibaldi, C., et al., 2017. Recent advances in radiation oncology. ecancermedicalscience. https://doi.org/10.3332/ecancer.2017.785.
Lee, N., et al., 2014. A phase II study of concurrent chemoradiotherapy using intensity-modulated radiation therapy (IMRT) + Bevacizumab (BV) [NSC 708865; IND 7921] for locally or regionally advanced nasopharyngeal cancer, RTOG 0615. Int. J. Radiat. Oncol. Biol. Phys.
Nelms, B.E., et al., 2012. Variations in external beam treatment plan quality: an interinstitutional study of planners and planning systems. Practical Radiat. Oncol. https://doi.org/10.1016/j.prro.2011.11.012.
Rancati, T., et al., 2010. Radiation dose volume effects in the larynx and pharynx. Int. J. Radiat. Oncol. Biol. Phys. https://doi.org/10.1016/j.ijrobp.2009.03.079.
Sun Nuclear Corporation, 2014-2018. Reference Guide. PlanIQ™, pp. 89–97. Document 1216011, Rev F.
Ventura, T., et al., 2016. SPIDERplan: a tool to support decision-making in radiation therapy treatment plan assessment. Rep. Pract. Oncol. Radiother. https://doi.org/10.1016/j.rpor.2016.07.002.
Welsh, J.S., et al., 2002. Helical Tomotherapy: an innovative technology and approach to radiation therapy. Technology in Cancer Research & Treatment. https://doi.org/10.1177/153303460200100413.