Dosimetric Multicenter Planning Comparison Studies for Stereotactic Body Radiation Therapy: Methodology and Future Perspectives

Journal Pre-proof

Dosimetric multicenter planning comparison studies for SBRT: methodology and future perspectives

Francesca Romana Giglioli, MSc, Cristina Garibaldi, MSc, Oliver Blanck, PhD, Elena Villaggi, MSc, Serenella Russo, MSc, Marco Esposito, PhD, Carmelo Marino, MSc, Michele Stasi, MSc, Pietro Mancosu, PhD

PII: S0360-3016(19)33965-3

DOI: https://doi.org/10.1016/j.ijrobp.2019.10.041

Reference: ROB 26019

To appear in: International Journal of Radiation Oncology • Biology • Physics

Received Date: 5 August 2019

Revised Date: 3 October 2019

Accepted Date: 25 October 2019

Please cite this article as: Giglioli FR, Garibaldi C, Blanck O, Villaggi E, Russo S, Esposito M, Marino C, Stasi M, Mancosu P, Dosimetric multicenter planning comparison studies for SBRT: methodology and future perspectives, International Journal of Radiation Oncology • Biology • Physics (2019), doi: https://doi.org/10.1016/j.ijrobp.2019.10.041.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Elsevier Inc. All rights reserved.

Title: Dosimetric multicenter planning comparison studies for SBRT: methodology and future perspectives

Francesca Romana Giglioli (a,*,1), MSc; Cristina Garibaldi (b,1), MSc; Oliver Blanck (c), PhD; Elena Villaggi (d), MSc; Serenella Russo (e), MSc; Marco Esposito (e), PhD; Carmelo Marino (f), MSc; Michele Stasi (g), MSc; Pietro Mancosu (h), PhD

(a) Medical Physics Unit, AOU Città della Salute e della Scienza di Torino, Italy

(b) IEO, European Institute of Oncology IRCCS, Unit of Radiation Research, Milan, Italy

(c) Department of Radiation Oncology, University Medical Center Schleswig-Holstein, Kiel, Germany

(d) Medical Physics Unit, AUSL Piacenza, Italy

(e) Medical Physics Unit, Azienda USL Toscana Centro, Firenze, Italy

(f) Medical Physics Unit, Humanitas C.C.O., Catania, Italy

(g) Medical Physics Department, A.O. Ordine Mauriziano di Torino, Italy

(h) Medical Physics Unit of Radiation Oncology Dept., Humanitas Research Hospital, Milano, Italy

(*) Corresponding author

(1) Francesca Romana Giglioli and Cristina Garibaldi contributed equally to the work and should be considered co-first authors

Corresponding Author: Francesca Romana Giglioli, A.O.U. Città della Salute e della Scienza di Torino, corso Bramante 88/90, Torino, Italy. Tel: (+39) 011 6334825. Fax: (+39) 011 6336614. E-mail: [email protected]

Conflict of interest: None

Funds: None

Running title: multicenters for sbrt: critical review


Abstract: This review summarizes the published literature on stereotactic body radiation therapy (SBRT) multiplanning comparisons, data sharing strategies, and the implementation of benchmark planning cases to improve the skills and knowledge of the participating centers. A total of 30 full-text articles were included. The studies were subdivided into three categories: multiplanning studies on dosimetric variability, planning harmonization before clinical trials, and technical and methodological studies. The methodology employed in the studies was critically analyzed to identify common and original elements, together with their pros and cons. Multicenter planning studies have been shown to play a key role in improving treatment plan harmonization, treatment plan compliance and even clinical practice. This review highlights some fundamental steps that should be taken in order to transform a simple treatment planning comparison study into a potential credentialing method for SBRT accreditation. In particular, prescription and general requirements should always be well defined; data analysis should be performed with independent DVH and/or dose calculations; quality score indices should be constructed; feedback and correction strategies should be provided; and a simple web-based collaboration platform should be used. The reported results showed that a crowd-based re-planning approach is a viable method for achieving harmonization and standardization of treatment planning among centers using different technologies.

1. Introduction

Stereotactic body radiation therapy (SBRT) was introduced several years ago and is now a widely used treatment option for many anatomical sites [1,2]. SBRT delivers high radiation doses to small lesions with short fractionation schemes and therefore requires high accuracy and high dose conformity to spare the surrounding healthy tissues. It often requires complex and highly modulated dose distributions and extensive knowledge of treatment planning, quality assurance and treatment delivery issues. The rapid increase in the use of these treatments could benefit from the sharing and standardization of dosimetric and treatment planning strategies. Implementing benchmark planning cases and multiplanning data sharing strategies could help to increase compliance with clinical trial protocol requirements, and the harmonization of treatment planning may improve the reproducibility of clinical trial outcomes. Furthermore, the standardization of physical and dosimetric approaches to SBRT procedures is essential for enhancing treatment quality. In several countries, working groups have been formed with the aim of sharing skills and strategies and harmonizing clinical practices, including treatment planning. For example, in 2012 the Italian Association of Medical Physicists (AIFM) formed a working group on “Dosimetry, physics, image guidance and radiobiology of SBRT” (AIFM/SBRT-WG), with the aim of supporting the routine use of SBRT. The evaluation of dosimetric consistency between SBRT treatments planned in different centers was one of the main objectives of the AIFM/SBRT-WG [3,4,5,6,7]. The multicenter approach was also explored by other radiation oncology and medical physics SBRT working groups around the world [8,9,10]. Furthermore, multicenter "end-to-end" testing and dummy runs were carried out before the start, and even with the first patients, during the accreditation processes of numerous international trials (e.g., Radiation Therapy Oncology Group, European Organization for Research and Treatment of Cancer, and many more). Besides high quality standards among all participants, comparison of results and joint interpretation are the most important aspects of international clinical trials. For those reasons the multiplanning approach, recently introduced as the so-called knowledge-based approach [11], may play a key role in increasing the overall quality level of clinical trials and enable harmonization of SBRT practice in general. This review focuses on dosimetric multicenter planning comparison studies for SBRT and aims to address three questions: whether multicenter trials are a valid approach for improving the skills and knowledge of the participating centers, whether studies of this kind should be included in special training programs for medical physicists, and whether the methodologies used in those studies are suitable for multiplanning approaches.

2. Methods and materials

A literature search covering 2006 to 2019 was performed in the National Center for Biotechnology Information (NCBI) database (PubMed). The following keywords were used: “SBRT” and (“multiplanning” or “multi institution” or “multi center” or “multi institutional” or “multinational” or “inter-institutional” or “planning benchmark”). The initial search yielded a total of 50 peer-reviewed articles published in various journals and conference proceedings books. The papers were further screened by an expert group against the following criteria: original papers in English dealing with dosimetric treatment plan comparisons among the participating centers from a physical point of view, including publications on contouring or credentialing processes for clinical trials. Reviews were excluded.

3. Results

Thirty studies met our selection criteria and were used for detailed data extraction. In particular, 13% of the papers were published between 2006 and 2009, 26% from 2010 to 2014 and 61% from 2015 to 2019. Multicenter clinical trials on SBRT treatment planning mainly focused on analyzing differences in technology and expertise and on the practical implementation of clinical trial protocols, all of which can lead to relevant dosimetric differences. The studies analyzed can be divided into three main groups: multicenter studies to assess dosimetric variability, planning harmonization studies before clinical trials, and multicenter studies on specific technical and methodological aspects. The aim of most multicenter studies (11/30, 37%) was to evaluate the dosimetric variability induced by different technological factors and human expertise [12,13,14,15,16,4,6,3,5,7]. Twenty-three per cent (7/30) of the studies were conducted for the harmonization of planned doses before clinical trials [17,18,19,20,21,22,9]. Three studies were based on survey results regarding dose prescription and technical preferences [23,24,25]. Three studies evaluated contouring variability [26,27,28], two studies evaluated the dosimetric impact of different approaches to intra-fraction respiratory-induced target motion [29,30] and two studies analyzed the multicentric comparison of dose calculation algorithms [31,32]. One study [33] was designed for CyberKnife planning for spinal treatment. The aim of the paper by Clark et al. [34] was to review the procedures for ensuring the highest accuracy of dose calculation and delivery in multicenter trials. The methods employed in the reviewed multicenter planning studies varied in terms of workflow, data requirements and type of analysis, and strongly depended on the scope of the study.

3.1 Multiplanning studies on dosimetric variability

Most large multicenter planning studies were performed outside of clinical trials or credentialing processes and were aimed at evaluating dosimetric consistency among the different centers [33] using different techniques and platforms [3-8,12,14-16]. An overview of the main multicenter planning studies is shown in Table 1, together with publications of credentialing programs for clinical trials. These studies provided the participating centers with the CT scans and contours of various clinical cases (1-5), target volume and dose prescription guidelines for the planning target volume (PTV), and constraints for organs at risk (OARs) [12,15,3,7]. The evaluation methods used for the multicenter planning studies varied greatly. In some studies, the participating centers were asked to provide the analyzed parameters from their own treatment planning systems (TPS) [8, 15], while one study used the same TPS for all participants [33]. The participating centers were often asked to provide the DICOM RT dose files to allow DVH recalculation with independent software [5,6,12,16,21,33]. Other studies required the entire treatment plan so that the dose could be completely recalculated with an independent TPS [18,22,28,31]. In particular, Habraken et al. [28] adopted a “homemade” automated treatment planning software, highlighting the fact that using automated treatment planning for clinical trials increases plan quality and reduces inter-planner variability. In another study, Kawai et al. [31] evaluated the dosimetric variation of the analytical anisotropic algorithm (AAA) relative to other algorithms in lung SBRT in a multi-institutional setting involving six institutions using a secondary check program, and compared the AAA to the Acuros XB (AXB) algorithm in two institutions. The studies also varied greatly in terms of the parameters evaluated. However, all of the studies assessed the different treatment plans in terms of DVH parameters for the PTV and OARs, and later studies assessed the prescribing, recording and reporting of stereotactic treatments with small photon beams [35] in accordance with the recommendations of the International Commission on Radiation Units and Measurements (ICRU) Report 91. Moreover, some studies also evaluated additional parameters such as treatment time, monitor units, conformity index, homogeneity index and generalized equivalent uniform dose (gEUD) [3-7,9,15,20,21,24,28,29,33], and two studies also evaluated tumor control probability (TCP) and normal tissue complication probability (NTCP) [28,33]. A few studies defined a plan quality metric score based on DVH and other parameters to reduce variability not only in the plan output but also in the plan evaluation [6,12-15,33]. Villaggi et al. [7] evaluated complexity metrics in VMAT plans, as well as dosimetric parameters, as potential plan quality indicators for planning benchmarking. Table 1 summarizes dosimetric variability in multiplanning studies.

| Study ID, year | Method of comparison | Platform for report/review | DVH independent calculation | Initial % of fulfillment | Replan phase | No. of participants | No. of cases | Connection to a clinical trial |
|---|---|---|---|---|---|---|---|---|
| Multicenter studies to assess dosimetric variability | | | | | | | | |
| Nelms 2012 [12] | Scoring mechanism Plan Quality Metric (PQM) with 14 dose metric components | 3DVH software | yes | Not reported | no | 125 | 1 prostate lesion | no |
| Furuya 2015 [16] | Interinstitutional variations and protocol dose constraints | MIM | yes | 100%, PTV inhomogeneity | no | 3 | 3 spine lesions | no |
| Marino 2015 [3] | Intercomparison based on dosimetric results | Standardized forms + MATLAB | no | 66/70 (94%) | yes | 14 | 5 prostate lesions | no |
| Esposito 2016 [4] | Dose distribution comparison | MATLAB | no | 66/70 (94.3%) | no | 14 | 5 liver lesions | no |
| Giglioli 2016 [5] | Protocol dose constraints + gEUD2 | Velocity + MATLAB | yes | 124/130 no violations (95.4%) | no | 26 | 5 lung lesions | no |
| Moustakis 2017 [13] | Fulfillment of the dosimetric plan objectives + mathematical ranking | n/a | no | 96.6% for OAR | no | 29 | 3 lung lesions | no |
| Esposito 2018 [6] | Fulfillment of the dosimetric plan objectives + mathematical score | MIM | yes | 87.5% | yes | 48 | 2 spine lesions | no |
| Moustakis 2018 [15] | Comparison with reference + mathematical ranking | n/a | no | 63.9% for OAR | no | 12 | 3 spine lesions | no |
| Villaggi 2019 [7] | Intercomparison based on dosimetric results and complexity metrics | R | no | 100% for PTV; 70% for OAR; outliers for PTV and OAR doses | yes | 13 | 5 prostate lesions | no |
| Planning harmonization studies before clinical trials | | | | | | | | |
| Djärv 2006 [20] | Intercomparison based on volumes and dosimetric results | n/a | no | Not reported | no | 6 | 2 lung lesions | Scandinavian SBRT study for NSCLC stage I |
| Matsuo 2007 [9] | Interinstitutional variations and protocol dose constraints | n/a | no | 100% for OAR; outliers for PTV doses | no | 11 | 4 lung lesions | JCOG 0403 |
| Eriguchi 2013 [21] | Intercomparison based on dosimetric results | Eclipse | yes | Only variability analyzed | no | 4 | 4 liver lesions; 3 plans for each case | no |
| Lambrecht 2016 [18] | Protocol dose constraints | VodcaRT in EORTC QA platform | yes | 11/12 (91.7%) | provided | 12 | 1 central lung lesion | LungTech |
| Multicenter studies on specific technical and methodological aspects | | | | | | | | |
| Blanck 2016 [33] | Human reviewer + mathematical ranking | ARTIVIEW | yes | 80% for PTV coverage; 60% for OAR constraints | no | 10 | 1 spine lesion | no |
| Colvill 2016 [29] | RTOG 1021 and RTOG 0938 dose constraints; measurements in static and motion phantoms | Standardized forms | no | Not reported | no | 10 | 1 lung and 1 prostate lesion | no |
| Habraken 2017 [28] | Trial protocol + results from the autoVMAT Erasmus-iCycle software | AutoVMAT | yes | 18/20 (90%) + 2 minor deviations | yes | 9 | 1 liver lesion | TRENDY |
| Kron 2018 [22] | Protocol dose constraints + beam arrangement advice | SWAN | yes | 9/40 (77.5%) | yes | 22 | 2 lung lesions | CHISEL |

Table 1 Overview of multicenter planning studies
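Two of the additional parameters mentioned in this section, the generalized equivalent uniform dose and a conformity index, can be illustrated with a minimal Python sketch. It is not the implementation used in any of the reviewed studies: it assumes equal-volume voxels, the common power-mean form of the gEUD, and a simple conformity ratio (prescription isodose volume divided by PTV volume). The exponent `a`, the function names and the toy dose grid are hypothetical.

```python
import numpy as np


def geud(voxel_doses, a):
    """Generalized equivalent uniform dose (power mean) for equal-volume voxels.

    `a` is a tissue-specific exponent chosen by the user (assumption, not
    taken from the reviewed studies). Doses must be positive when a < 0.
    """
    d = np.asarray(voxel_doses, dtype=float)
    return float(np.mean(d ** a) ** (1.0 / a))


def conformity_ratio(dose_grid, ptv_mask, prescription_dose):
    """Volume enclosed by the prescription isodose divided by the PTV volume."""
    dose = np.asarray(dose_grid, dtype=float)
    ptv = np.asarray(ptv_mask, dtype=bool)
    isodose_voxels = np.count_nonzero(dose >= prescription_dose)
    return isodose_voxels / np.count_nonzero(ptv)


# Hypothetical 3x3 dose grid in Gy; the PTV is the lower-right 2x2 block.
dose = np.array([[10.0, 30.0, 30.0],
                 [30.0, 50.0, 50.0],
                 [30.0, 50.0, 50.0]])
ptv = np.array([[False, False, False],
                [False, True, True],
                [False, True, True]])

print("PTV gEUD (a = -10):", geud(dose[ptv], a=-10.0))
print("Conformity ratio at 50 Gy:", conformity_ratio(dose, ptv, 50.0))
print("Conformity ratio at 30 Gy:", conformity_ratio(dose, ptv, 30.0))
```

Computing such indices centrally from the submitted dose grids, rather than collecting the values reported by each TPS, is one way to keep the definitions identical across participants.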

3.2 Planning harmonization before clinical trials

Studies focused on clinical trials or SBRT credentialing programs mainly analyzed dose delivery accuracy in terms of machine output, dosimetry data, dose calculation algorithms and quality assurance (QA) procedures [17-20,22,26-28,30,34]. A simple credentialing procedure was reported by Ibbott et al. [17]. The authors drafted questionnaires to evaluate the institutions’ understanding of the protocol, treatment planning benchmarks to demonstrate the institutions’ ability to perform SBRT, and tumor dose verifications on anthropomorphic phantoms for special treatment techniques. As a strictly “sent-out-and-sent-back” method, Clark et al. [34] performed a dosimetric audit in lung SBRT by sending the MD Anderson (Houston, USA) Imaging and Radiation Oncology Core (IROC) static anthropomorphic phantom (measurements with film, TLD and alanine) and providing feedback to the participating centers. Similarly, Timmerman et al. [19] reported the accreditation process of the RTOG trial number 0236 (NCT00087438), based on an "end-to-end" test and a dummy run of the first patient, and sent the phantom with the contours to each participating center. Furthermore, Djärv et al. [20] evaluated the compliance observed in a dummy run study in a multicenter setting on different volumes and doses. Each center received the CT scans of two lung patients with instructions for target contouring, dose prescribing and dose optimization. Lambrecht et al. [18] presented a comprehensive radiotherapy QA model for their clinical trial (LungTech - NCT01795521) and provided the centers with CT scans, contours and a motion phantom, with the possibility of amending the plan after detailed feedback had been provided. Kron et al. [22] reported multicenter site visits using a dedicated phantom with moving inserts to estimate the dose delivered in the presence of tissue inhomogeneity and target motion. This study was the only dosimetry report with dedicated on-site visits for SBRT credentialing. Moreover, two studies [29,30] investigated the dosimetric impact of various approaches to respiratory-induced tumor motion. Hurkmans et al. [30] determined the accuracy of four-dimensional CT (4DCT) in the ROSEL trial (NCT00687986) by performing a 4DCT scan of a motion phantom in order to evaluate positional and volumetric imaging accuracy, depending on the scanning protocol. Colvill et al. [29] performed a multicenter study involving only expert centers that use motion compensation delivery techniques. The centers were provided with CT images, contours and motion traces in order to account accurately for realistic tumor motion, but they used their own in-house motion phantoms for delivery accuracy assessment.


3.3 Technical and methodological studies

Two studies carried out a multi-institutional retrospective analysis of the technical aspects of SBRT involving a large cohort of patients [23,24]. More specifically, Woerner et al. evaluated whether planners followed the guidelines established by the Radiation Therapy Oncology Group (RTOG) and other literature for lung SBRT [24]. Das et al. assessed the state of dose prescription and dose volume histogram (DVH) compliance with international standards [23]. Redmond and colleagues [25] performed an international survey reviewing the technical aspects of SBRT for managing oligometastatic disease in order to guide safe patient management. Seven high-volume centers participated in this survey, and the levels of agreement were categorized as strong (6-7 common responses), moderate (4-5), low (2-3) or no agreement. In this survey, the key methods for the safe implementation and practice of SBRT for oligometastases, including target delineation, prescription doses, normal tissue constraints, imaging and set-up, were identified and reported. In contrast to dose planning and delivery audits, three studies focused on the impact of contouring variability on dose volume data [26-28]. Habraken et al. [28] provided CT scans of a clinical case from their TRENDY trial (NCT02470533) for contouring and treatment planning according to the trial protocol, and dedicated contours for re-planning after feedback on the original contouring (including prospective plan feedback based on automated treatment planning). Gwinne et al. [26] implemented a workflow to outline the process using a dedicated platform and a pool of reviewers, while Lo et al. [27] evaluated the dosimetric effect of peer review of volume delineation in SBRT lung planning, providing the center with plan details, delineation guidelines and a DVH analysis of the reviewed contours for dosimetric violations.
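The agreement categories described for the seven-center survey by Redmond and colleagues [25] amount to a simple threshold rule on the number of centers giving the same answer. The Python sketch below illustrates that rule only; the function name and the example responses are hypothetical and are not taken from the survey itself.

```python
from collections import Counter


def agreement_level(responses):
    """Classify agreement among seven centers by the most common response.

    Thresholds follow the categories quoted in the text:
    strong (6-7 common responses), moderate (4-5), low (2-3), otherwise none.
    """
    if len(responses) != 7:
        raise ValueError("the survey described in the text had seven centers")
    top_count = Counter(responses).most_common(1)[0][1]
    if top_count >= 6:
        return "strong"
    if top_count >= 4:
        return "moderate"
    if top_count >= 2:
        return "low"
    return "no agreement"


# Hypothetical answers from seven centers to one survey question.
example = ["option A", "option A", "option A", "option A",
           "option B", "option B", "option C"]
print(agreement_level(example))
```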

3.4 Results of the multicenter studies

The results of the reviewed multicenter research studies are as varied as their intended goals and the methods used. Because this review focuses mainly on the methodology rather than on the results of the multicenter approach, only the most relevant issues relating to the outcomes are reported.

3.4.1 Multiplanning studies on dosimetric variability

The violation of OAR limits in non-credentialing multicenter planning studies was generally much higher (as high as 47.6%), probably due to a more stringent design of the planning objectives compared with clinical trial benchmarks. More specifically, significant variations in treatment plan quality based on participants rather than on treatment platforms were observed throughout all multicenter planning studies, and the As Low As Reasonably Achievable (ALARA) principle was not always followed [3-6,8,9,12,33]. Some studies also demonstrated that high quality treatment plans do not require complex planning or delivery techniques or even long treatment times [33, 8], and that more balanced treatment plans in the Pareto optimization solution space are generally clinically preferred [33]. However, this can only be evaluated using expert reviews or score functions that involve multiple conflicting planning objectives rather than just comparing the various parameters separately [6, 8, 12, 14, 15, 33]. The treatment platform did not affect quality except for some small differences in target conformity and dose homogeneity [16,29], probably due to the lack of instructions about inhomogeneity in the planning protocols. Nevertheless, the dose calculation algorithm significantly affected plan harmonization throughout the SBRT studies, especially for lung SBRT [31, 32]. In one study, even with the analytical anisotropic algorithm (AAA), significant differences as high as 7% were observed [32], thus confirming the ICRU report 91 recommendations not to use AAA for lung SBRT [35]. On the other hand, the aforementioned dose inhomogeneity may indeed affect treatment plan quality, e.g., as measured using PTV-gEUD2 (generalized equivalent uniform dose) [5], and high dose inhomogeneity may reduce the dose delivered to the surrounding OARs and increase the dose to the tumor at the same time [3, 5, 31, 36]. However, this may not be true for all locations, e.g., for spinal SBRT [6, 15], and it is not clear whether inhomogeneous doses are desirable in cases where potentially not all of the cells in the PTV are tumor cells. Nevertheless, clinical practice guidelines show that more than one target dose parameter is required for SBRT treatment plan quality harmonization [23], which was determined by observing the results of clinical trials, multicenter planning benchmarks [21] and international recommendations (i.e., ICRU report 91) [35]. However, there is some discordance regarding the dose levels (i.e., prescription dose and dose inhomogeneity) required for each SBRT indication [13], even among experienced medical centers [25]. There is also considerable debate over dose-prescription methods (i.e., volume-based, on PTV DX% or on GTV/ITV/PTV D50%, or point-based, on a reference point or on the maximum dose) and how they may affect plan harmonization. Multicenter planning studies generally follow the ICRU 91 recommendations (e.g., dose prescription to the PTV D98%), but newer studies have proposed prescription to the GTV or ITV median dose [37, 38] in order to achieve greater dose harmonization inside the actual target [14].
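The volume-based dose parameters discussed above (near-minimum D98%, median D50% and near-maximum D2%) can be read directly from the voxel doses of a structure. The following Python sketch is a minimal illustration under stated assumptions: equal-volume voxels, linear interpolation as performed by `numpy.percentile`, and hypothetical structure doses. It does not reproduce the DVH algorithm of any particular TPS or review platform.

```python
import numpy as np


def dose_at_volume(voxel_doses, volume_percent):
    """D_x%: the dose received by at least x% of the structure volume.

    With equal-volume voxels this is the (100 - x)th percentile of the doses.
    """
    d = np.asarray(voxel_doses, dtype=float)
    return float(np.percentile(d, 100.0 - volume_percent))


def target_dose_report(ptv_doses, gtv_doses):
    """Collect the near-minimum, near-maximum and median target doses."""
    return {
        "PTV_D98": dose_at_volume(ptv_doses, 98.0),  # near-minimum dose
        "PTV_D2": dose_at_volume(ptv_doses, 2.0),    # near-maximum dose
        "GTV_D50": dose_at_volume(gtv_doses, 50.0),  # median dose
    }


# Hypothetical voxel doses in Gy for an inhomogeneous PTV and a uniform GTV.
ptv_doses = np.linspace(45.0, 65.0, 201)
gtv_doses = np.full(50, 60.0)

for name, value in target_dose_report(ptv_doses, gtv_doses).items():
    print(f"{name}: {value:.1f} Gy")
```

Reporting all three values for every submitted plan makes plans prescribed to different isodose levels directly comparable, which is the harmonization argument made in the text.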

3.4.2 Planning harmonization before clinical trials

The studies that focused on clinical trials or SBRT program credentialing mainly analyzed dose delivery accuracy and did not evaluate treatment planning except for protocol violations [17-20,26-28,30,34]. However, these studies found significant differences in imaging [30], contouring [26, 28] and delivery quality [17,19,34] over a wide range of indications. Across the participating centers, contouring protocol violations and benchmark delivery failures were as high as 23% [27] and 30% [17], respectively. On the other hand, some SBRT programs or studies on the clinical trial credentialing process focused on treatment planning in more detail [9,21,22,24,32]; even in these cases, protocol violations, i.e., violations of OAR limits based on references (e.g., RTOG, EORTC, etc.), were as high as 15% [24].

3.4.3 Technical and methodological studies

The three retrospective multi-institutional studies [23-25], which focused on the compliance of the centers with international standards, found only moderate agreement with established guidelines (ICRU-83). Specifically, Das et al. [23] found that nearly 95% of patient treatments deviated from the ICRU-83 recommended D50 prescription dose delivery. In the survey by Woerner et al. [24], the planning criteria were unmet for all of the critical structures, such as lung, heart, spinal cord, esophagus and trachea/bronchus, for at least 15% of the patients. Moreover, among the various parameters used to evaluate the SBRT plans, CI100% and CI50% were the most challenging criteria to meet. Finally, Redmond et al. [25] found, in their survey, strong agreement on the acceptable target dosimetric parameters used for plan analysis but low or moderate agreement regarding prescription doses, fractionation and normal tissue constraints. The three studies focusing on the impact of contouring variability on dose volume data [26-28] highlighted that a distributed review approach does not compromise the quality of treatment, does not delay its start and does not require more resources [26]. Lo et al. [27] reported that major changes were recommended in 23% of cases, that 80% of plans had at least one major change recommended, and that 5% of structures presented dosimetric violations. Lastly, the main result of the multicenter studies was that errors or poor quality in contouring, treatment planning or delivery arising from the treatment procedure could be corrected thanks to failure analysis, training and repeated benchmarking. Protocol violations can be significantly reduced, to fewer than 2%, if specific guidelines are followed and knowledge is shared [3, 6, 11].

4. Discussion

The harmonization of the radiotherapy treatment planning process is a major challenge for SBRT treatments, especially if the clinical results have to be shared, compared with other centers or therapeutic options, or evaluated in international multicenter trials. The aim of this review was to summarize the methods employed in multicenter comparisons in order to understand whether there is a suitable approach for multicenter studies focused on SBRT treatment planning that can be considered mandatory not only for clinical trials but also for multicenter comparisons driven by national societies as well as international associations. Currently, clinical trial credentialing is the only area where participation in plan comparison studies is already mandatory and feedback is considered a fundamental part of quality assurance. During credentialing, the participating centers are asked to plan benchmark cases according to the protocols and to perform dose measurements of the treatment plans with a phantom generally provided by the audit institution. With regard to dose delivery accuracy, Ibbott et al. [17] reported, over a longer timeframe, failure rates of over 30% for a head and neck phantom, 14% for a prostate phantom, 29% for a thorax phantom and 25% for a liver phantom test. These tests identified errors such as incorrect evaluation of output factors, inadequate modeling of the linear accelerator beam, incorrect patient positioning and errors in treatment planning software. Furthermore, the tests showed that centers which passed the credentialing process had lower deviation rates (major and minor) in several clinical trials. Nevertheless, the high initial failure rate in the credentialing process is alarming and once again highlights the need for dedicated benchmarks, error correction and quality improvement strategies, the latter currently being performed only sporadically in clinical settings. However, the lack of accreditation centers worldwide means that audit processes are almost exclusively implemented in the context of international or large multicenter trials, although improvements have recently been observed on a national basis [10, 39]. Our review also showed that credentialing for clinical trials is not usually focused on planning comparisons, on the drafting of a best practice guideline or even on the best treatment plan, all of which are fundamental steps for treatment quality optimization. For this reason, we believe that, in addition to SBRT programs or clinical trial credentialing, participating in multicenter planning studies helps healthcare practitioners improve their professional capabilities by comparing and sharing knowledge. Many multicenter planning studies have already involved a large number of institutions (e.g., Esposito et al. [6] involved 38 institutions; Moustakis et al. [15] involved 35 institutions). All of the studies compared various aspects of contouring, planning strategies, dose calculation algorithms and evaluation methods for different SBRT sites such as spine, lung, liver, pancreas and prostate.

4.1 Dose prescription

For all of the studies, the participants received some instructions and/or references with the corresponding treatment plan objectives and it was observed that the more comprehensive the instructions, the more consistent the results. For example, dose prescription should always be well defined, but achieving this with a single parameter set (e.g., 45Gy to the 65% isodose or 60 Gy to the isocenter) is a matter of debate for SBRT [5, 13, 35]. Given the objective of minimizing dose distribution variability among different centers and technologies in the light of multicenter clinical trials, a consensus on prescription modalities may help to harmonize SBRT practices. The recently published ICRU report 91 recommendations on the prescribing, recording and reporting of stereotactic treatments with small photon beams [35, 40] will surely help this process. The ICRU report 91 provides a thorough review of stereotactic radiotherapy techniques and specifically recommends prescription to the isodose surface which should cover the optimal percentage of the PTV while optimally restricting the dose to the OARs. More importantly, the ICRU report 91 also encourages the reporting of additional parameters such as the median target dose for GTV/CTV/ITV (D50%) and the near maximum (PTVD2%) and minimum (PTVD98%) doses which will undoubtedly help to improve inter-center and inter-systems comparisons and harmonization of SBRT practices. 4.2 Treatment plan comparisons: dose calculation and quality metrics Apart from selecting well-defined parameters, another aspect of multicentre studies is that they take data correlation into account. It was observed that dose comparison and statistical analysis were more meaningful in studies where plan and dosimetry data was requested from the participating centers rather than simply a report of the parameter from each center’s own TPS. 
A DVH recalculation by an independent system provided additional monitoring and quality assurance for the native calculations of the participants’ TPSs and ensured consistency between all of the parameters analyzed. In particular, for small structures in regions with high dose gradients, significant discrepancies were observed between recalculated and submitted DVHs, owing to the resolution of the dose grids and the interpolation methods of the various TPSs [41].

Regardless of the technical evaluation methods, efforts should be made to define a comprehensive plan quality metric for objectively comparing treatment plans, rather than comparing single parameters against each other and trying to interpret partial comparisons as a whole. Various parameters were used in the studies to compare the plans: DVH curves and points, conformity and homogeneity indices, EUD, TCP, NTCP and many others. However, it is advisable to refer to a single score in order to capture the overall quality of a treatment plan in terms of conflicting clinical goals and the preferences of the participating centers, or even of the individual participants, in balancing those goals. Blanck et al. [8, 15, 33] compared plans using a qualitative and quantitative score function, and Esposito et al. [6] defined a quality index by combining multiple DVH points into a single score; both were based on the evaluation of the submitted treatment plans. Nelms et al. [12], in contrast, used an absolute scoring function published before the planning study so that the treatment plans could be evaluated at submission. With absolute scoring, the participating centers are able to evaluate the quality of their treatment plans and make improvements based on their scores; however, this may also lead to “pursuing” a specific score with impractical, non-clinical methods, an effect that depends strongly on the score function itself. Furthermore, the study is then likely to lose the average clinical perspective of the participants, which would distort the harmonization process for clinical trials. This could be overcome by using relative scoring based on the participants’ submitted treatment plans, without immediate scoring. Nevertheless, all of these score definitions show that national and international organizations must univocally define a quality score for comparing treatment plans that takes various clinical and physical aspects into account.

Moreover, the results of many multicenter planning studies revealed that meeting specific plan objectives and producing high-quality treatment plans did not depend in a statistically significant way on any technological parameters (delivery platform, TPS, treatment modality, plan complexity, etc.) or on planner experience (years of experience, confidence, certification, education, etc.) [5, 7, 15]. Therefore, we hypothesize that the wide variation in plan quality could be attributed to a general “center experience and skill” category. Knowledge- and experience-sharing practices between health centers, from which best practices could be derived and disseminated, would improve the mean quality and minimize variations in any population of treatment planners.
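
To make the idea of a single plan quality score more concrete, the sketch below combines several dosimetric parameters into one weighted score in Python. It is not the scoring function of any study cited here: the linear interpolation between a threshold value (no credit) and a goal value (full credit), as well as the criterion names, weights and numerical values, are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str         # key of the metric in a plan's metric dictionary
    goal: float       # value that earns full credit
    threshold: float  # value that earns no credit
    weight: float     # relative importance of this criterion


def component_score(value, criterion):
    """Credit in [0, 1], linear between threshold (0) and goal (1).

    Works for "higher is better" (goal > threshold) and
    "lower is better" (goal < threshold) metrics alike.
    """
    span = criterion.goal - criterion.threshold
    if span == 0:
        raise ValueError(f"goal and threshold coincide for {criterion.name}")
    fraction = (value - criterion.threshold) / span
    return min(1.0, max(0.0, fraction))


def plan_quality_score(plan_metrics, criteria):
    """Weighted sum of component scores, normalized to the range 0-100."""
    total_weight = sum(c.weight for c in criteria)
    earned = sum(
        c.weight * component_score(plan_metrics[c.name], c) for c in criteria
    )
    return 100.0 * earned / total_weight


def rank_plans(plans, criteria):
    """Relative ranking: plan identifiers sorted from best to worst score."""
    scores = {pid: plan_quality_score(m, criteria) for pid, m in plans.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical criteria and plan metrics, for illustration only.
    criteria = [
        Criterion("ptv_d98_percent", goal=95.0, threshold=90.0, weight=2.0),
        Criterion("cord_dmax_gy", goal=10.0, threshold=14.0, weight=3.0),
    ]
    plans = {
        "center_A": {"ptv_d98_percent": 95.0, "cord_dmax_gy": 12.0},
        "center_B": {"ptv_d98_percent": 92.5, "cord_dmax_gy": 9.0},
    }
    for plan_id in rank_plans(plans, criteria):
        print(plan_id, plan_quality_score(plans[plan_id], criteria))
```

Publishing such a function before a study corresponds to absolute scoring, whereas applying it (or deriving its goals and thresholds) only after all plans have been collected corresponds to relative scoring; in both cases the result depends heavily on the chosen weights, goals and thresholds.
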
Moore et al. [42] demonstrated sub-optimal planning of the high-dose arm (79.2 Gy) in the RTOG 0126 study of prostate radiotherapy with IMRT. In 94 out of 219 patients (42.9%), the authors found that more appropriate planning would have reduced the risk of grade II rectal toxicity by ≥5%. Therefore, mandatory online QC applications for clinical trials could help to reduce the number of suboptimal plans.

4.3 Benchmarks, interaction and feedback

Effective communication and coordination between health centers are essential for conducting multicenter studies; institutions that are required to undergo a credentialing or benchmarking process are also better prepared to comply with protocol requirements. For both retrospective and prospective approaches, the heterogeneity of the methods implemented in different centers and the lack of standardization make it difficult to compare quality outcomes [34]. A well-designed multicenter quality platform may facilitate the sharing of knowledge between participating centers by providing immediate feedback, as shown in Table 1. In Ibbott et al. [17], Matsuo et al. [9] and Eriguchi et al. [21], effective feedback was given to the institutions through the interpretation of the results and suggestions for improvements: interpersonal communication can help health care practitioners and researchers improve treatment plans. When making plan quality comparisons, Blanck et al. [33] provided feedback to participants through a reviewer ranking performed by a panel of experts, combined with a mathematical ranking method based on the techniques and results of all participating centers. The inclusion of multiple indices and a final mathematically weighted plan rank/score was viewed as a way to speed up the review workflow, making the management of future multicenter studies less work-intensive [8, 15]. On the other hand, Esposito et al. [6], Marino et al. [3] and Villaggi et al. [7] suggested re-optimizing treatment plans in the event of noncompliance with protocol requirements, using as reference guides the best plans or the median results obtained by participants using the same treatment planning and delivery system. This is known as the “crowd knowledge-based” re-planning strategy, which is dynamically determined by the incoming results: the best or median DVH becomes the new benchmark. Many studies have proposed automating or semi-automating the feedback process in order to speed up the treatment plan review [18, 19, 22, 28]. In the studies published by Timmerman et al. [19] and Kron et al. [22], a web-based near real-time (NRT) tool was used to review the plans, and potential deviations were identified and reported to each participating health center in less than 48 hours [19].
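
The crowd knowledge-based strategy described above can be sketched in a few lines of Python. The example groups submissions by planning and delivery platform, takes the median of each metric within a group as the current benchmark, and returns that benchmark as a re-planning guide to every center that exceeds a protocol limit. The data layout, the metric name and the limit are hypothetical, and only "lower is better" metrics (such as OAR doses) are considered in this sketch.

```python
from collections import defaultdict
from statistics import median


def median_benchmarks(submissions):
    """Median of every metric among centers that share a platform."""
    grouped = defaultdict(lambda: defaultdict(list))
    for sub in submissions:
        for metric, value in sub["metrics"].items():
            grouped[sub["platform"]][metric].append(value)
    return {
        platform: {metric: median(values) for metric, values in metrics.items()}
        for platform, metrics in grouped.items()
    }


def replanning_guidance(submissions, protocol_limits):
    """Map each non-compliant center to the benchmarks it should aim for.

    Benchmarks are recomputed from all submissions on every call, so they
    follow the incoming results. Assumes lower values are better.
    """
    benchmarks = median_benchmarks(submissions)
    guidance = {}
    for sub in submissions:
        targets = {
            metric: benchmarks[sub["platform"]][metric]
            for metric, limit in protocol_limits.items()
            if sub["metrics"][metric] > limit
        }
        if targets:
            guidance[sub["center"]] = targets
    return guidance


if __name__ == "__main__":
    # Made-up submissions; "rectum_dmax_gy" and the 35 Gy limit are
    # illustrative values, not protocol constraints from the literature.
    submissions = [
        {"center": "A", "platform": "X", "metrics": {"rectum_dmax_gy": 30.0}},
        {"center": "B", "platform": "X", "metrics": {"rectum_dmax_gy": 34.0}},
        {"center": "C", "platform": "X", "metrics": {"rectum_dmax_gy": 40.0}},
        {"center": "D", "platform": "Y", "metrics": {"rectum_dmax_gy": 32.0}},
        {"center": "E", "platform": "Y", "metrics": {"rectum_dmax_gy": 36.0}},
    ]
    print(replanning_guidance(submissions, {"rectum_dmax_gy": 35.0}))
```

Replacing `median` with `min` would use the best result in each group as the benchmark instead of the median.
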
The quality and consistency of automatically generated plans can assist in evaluating constraint violations in submitted data and can also identify suboptimal treatment plans that fulfill all constraints but whose PTV or OAR doses could be further optimized. Thanks to the automation, fast feedback on plans submitted by participating centers is feasible.

4.4 Delivery quality assurance

In comparison to SBRT programs or clinical trial credentialing processes, a limitation of many multicenter plan comparison studies is the lack of delivery quality assurance (DQA) evaluation of the treatment plans. It is certainly advisable to verify the deliverability and accuracy of the planned dose distributions, ideally under specific conditions such as in moving phantoms [29]. The results should also take overall treatment plan quality into account, despite the additional issue of discrepancies between DQA methods [43], or even include the comparison of complexity metrics for multi-institutional evaluations of treatment plans [44]. Major improvements can be expected in this area in the future.

5. Conclusions

Multicenter planning studies have been shown to play a key role in improving treatment plan harmonization, treatment plan compliance and even clinical practices. National and international associations of medical physicists, radiation oncologists and radiotherapists should be the proponents and coordinators of these projects in order to improve the skills and broaden the knowledge of all participating centers. This review has highlighted that some fundamental steps must be taken in order to transform a simple treatment planning comparison study into a potential credentialing method for SBRT accreditation: prescription and general requirements should always be well defined; data analysis should be performed with independent DVH and/or dose calculations; quality score indices should be constructed; feedback and correction strategies should be provided; and a simple web-based collaboration platform should be used. The results reported in this review showed that a crowd-based re-planning approach is a viable method for achieving harmonization and standardization of treatment planning among centers using different technologies. The comparison between planning techniques could be made easier by assigning to each plan a single overall plan quality score incorporating multiple parameters. The key point of a weighted sum of multiple parameters is the definition of the relative weights and the choice of the ideal goal and threshold values. For that reason, efforts should be made by international organizations, such as ICRU and AAPM, to univocally define robust and clinically relevant quality scores. Furthermore, in the absence of national audit programs, the delivery quality assurance of the generated treatment plans should be included in multicenter planning studies, and centers practicing SBRT should be strongly encouraged to participate in these studies in order to verify their technology and capabilities.
In the future, we believe that both ongoing certified SBRT programs and clinical trial accreditation processes will focus more on treatment plan quality. Furthermore, the recent introduction of automated treatment planning and evaluation is a promising approach for promoting the quality and uniformity of treatments in clinical trials and routine clinical practice. Therefore, in evolving from institution-specific experience to multicenter comparisons, a strictly web-based and easy-to-use collaboration platform for research and radiotherapy quality assurance may prove to be a useful tool for designing future benchmark studies. Interpersonal team communication and knowledge sharing among experienced planners, combined with simple yet comprehensive technology, will improve clinical practices for all participants. We recommend that international scientific societies be the driving force behind such a model-sharing approach in order to achieve wide harmonization. The conclusions are summarized in Table 2.

Considerations
- National and international associations should propose and coordinate multicenter studies
- Prescription and general requirements should always be well defined and extremely detailed
- Data analysis should be performed with independent DVH and/or dose calculations
- Quality score should be constructed
- Feedback and correction strategies should be provided and a simple web-based collaboration platform should be used
- The delivery quality assurance of the generated treatment plans should be included in the multicenter projects
- Automated treatment planning and evaluation is a promising approach for promoting the quality and uniformity of treatment plans
- Interpersonal team communication and knowledge sharing of experienced planners combined with simple yet comprehensive technology will improve clinical practices for all participants

Suggestions
- Strictly web-based and easy-to-use collaboration platform should be implemented
- International societies should encourage model-sharing approach
- Effort should be made by International organizations (ICRU, AAPM...) to univocally define robust and clinically relevant quality scores.

Table 2 Summary overview

References

1. Timmerman RD, Paulus R, Pass HI, et al. Stereotactic Body Radiation Therapy for Operable Early-Stage Lung Cancer: Findings From the NRG Oncology RTOG 0618 Trial. JAMA Oncol 2018;4(9):1263-1266.
2. Joo JH, Park JH, Kim JC, et al. Local Control Outcomes using Stereotactic Body Radiation therapy for Liver Metastases From Colorectal Cancer. Int J Radiat Oncol Biol Phys 2017;99(4):876-883.
3. Marino C, Villaggi E, Maggi G, et al. A feasibility dosimetric study on prostate cancer: Are we ready for a multicenter clinical trial on SBRT? Strahlenther Onkol 2015;191(7):573-581.
4. Esposito M, Maggi G, Marino C, et al. Multicentre treatment planning inter-comparison in a national context: The liver stereotactic ablative radiotherapy case. Phys Med 2016;32:277-283.
5. Giglioli FR, Strigari L, Ragona R, et al. Lung stereotactic ablative body radiotherapy: a large scale multi-institutional planning comparison for interpreting results of multi-institutional studies. Phys Med 2016;32(4):600-606.
6. Esposito M, Masi L, Zani M, et al. SBRT planning for spinal metastasis: indications from a large multicentric study. Strahlenther Onkol 2019;195(3):226-235.
7. Villaggi E, Hernandez V, Fusella M, et al. Plan quality improvement by DVH sharing and planner’s experience: Results of a SBRT multicentric planning study on prostate. Phys Med 2019;62:73-84.
8. Moustakis C, Blanck O, Ebrahimi Tazehmahalleh F, et al. Planning benchmark study for SBRT of early stage NSCLC: Results of the DEGRO Working Group Stereotactic Radiotherapy. Strahlenther Onkol 2017;193(10):780-790.
9. Matsuo Y, Takayama K, Nagata Y, et al. Interinstitutional Variations in Planning for Stereotactic Body Radiation Therapy for Lung Cancer. Int J Radiat Oncol Biol Phys 2007;68(2):416-425.
10. Distefano G, Lee J, Jafari S, et al. A national dosimetry audit for stereotactic ablative radiotherapy in lung. Radiother Oncol 2017;122(3):406-410.
11. Mancosu P, Esposito M, Giglioli F, et al. Time for crowd knowledge-based approach in SBRT planning. Strahlenther Onkol 2017;193(12):1066-1067.
12. Nelms B, Robinson G, Markham J, et al. Variation in external beam treatment plan quality: An interinstitutional study of planners and planning systems. Pract Radiat Oncol 2012;2(4):296-305.
13. Moustakis C, Blanck O, Ebrahimi F, et al. Time for standardization of SBRT planning through large scale clinical data and guideline-based approaches. Strahlenther Onkol 2017;193(12):1068-1069.
14. Wilke L, Avcu Y, Albrecht C, et al. Can normalization to the mean ITV dose achieve a uniform dose distribution in the target volume for lung SBRT? Strahlenther Onkol 2017;193(1):21-22.
15. Moustakis C, Chan MKH, Kim J, et al. Treatment planning for spinal radiosurgery: A competitive multiplatform benchmark challenge. Strahlenther Onkol 2018;194(9):843-854.
16. Furuya T, Tanaka H, Ruschin M, et al. Evaluating dosimetric differences in spine stereotactic body radiotherapy: An international multi-institutional treatment planning study. J Radiosurg SBRT 2015;3(4):307-314.
17. Ibbott GS, Followill DS, Molineu HA, et al. Challenges in Credentialing Institutions and Participants in Advanced Technology Multi-institutional Clinical Trials. Int J Radiat Oncol Biol Phys 2008;71(1 Suppl):S71-5.
18. Lambrecht M, Melidis C, Sonke JJ, et al. Lungtech, a phase II EORTC trial of SBRT for centrally located lung tumours – a clinical physics perspective. Radiat Oncol 2016;11:7.
19. Timmerman R, Galvin J, Michalski J, et al. Accreditation and quality assurance for Radiation Therapy Oncology Group: Multicenter clinical trials using stereotactic body radiation therapy in lung cancer. Acta Oncol 2006;45(7):779-786.
20. Djärv E, Nyman J, Baumann P, et al. Dummy run for a phase II study of stereotactic body radiotherapy of T1-T2 N0M0 medical inoperable non-small cell lung cancer. Acta Oncol 2006;45(7):973-977.
21. Eriguchi T, Takeda A, Oku Y, et al. Multi-institutional comparison of treatment planning using stereotactic ablative body radiotherapy for hepatocellular carcinoma–benchmark for a prospective multi-institutional study. Radiat Oncol 2013;8:113.
22. Kron T, Chesson B, Hardcastle N, et al. Credentialing of radiotherapy centres in Australasia for TROG 09.03 (Chisel), a Phase III clinical trial on stereotactic ablative body radiotherapy of early stage lung cancer. Br J Radiol 2018;91(1085):20170737.
23. Das IJ, Andersen A, Chen Z, et al. State of Dose Prescription and Compliance to International Standard (ICRU-83) in Intensity Modulated Radiation Therapy among Academic Institutions. Pract Radiat Oncol 2017;7(2):145-155.
24. Woerner A, Roeske JC, Harkenrider MM, et al. A multi-institutional study to assess adherence to lung stereotactic body radiotherapy planning goals. Med Phys 2015;42(8):4629-4635.
25. Redmond KJ, Lo SS, Dagan R, et al. A multinational report of technical factors on stereotactic body radiotherapy for oligometastases. Future Oncol 2017;13(12):1081-1089.
26. Gwynne S, Jones G, Maggs R, et al. Prospective review of radiotherapy trials through implementation of standardized multicentre workflow and IT infrastructure. Br J Radiol 2016;89:20160020.
27. Lo A, Liu M, Chan E, et al. The Impact of Peer Review of Volume Delineation in Stereotactic Body Radiation Therapy Planning for Primary Lung Cancer: A Multicenter Quality Assurance Study. J Thorac Oncol 2014;9(4):527-533.
28. Habraken SJM, Sharfo AWM, Buijsen J, et al. The TRENDY multi-center randomized trial on hepatocellular carcinoma – Trial QA including automated treatment planning and benchmark-case results. Radiother Oncol 2017;125(3):507-513.
29. Colvill E, Booth J, Nill S, et al. A dosimetric comparison of real-time adaptive and non-adaptive radiotherapy: A multi-institutional study encompassing robotic, gimbaled, multileaf collimator and couch tracking. Radiother Oncol 2016;119(1):159-165.
30. Hurkmans CW, van Lieshout M, Schuring D, et al. Quality Assurance of 4D-CT Scan Techniques in Multicenter Phase III Trial of Surgery Versus Stereotactic Radiotherapy (Radiosurgery or Surgery for Operable Early Stage (Stage 1A) Non–Small-Cell Lung Cancer [ROSEL] Study). Int J Radiat Oncol Biol Phys 2011;80(3):918-927.
31. Kawai D, Takahashi R, Kamima T, et al. Variation of the prescription dose using the analytical anisotropic algorithm in lung stereotactic body radiation therapy. Phys Med 2017;38:98-104.
32. Nishio T, Kunieda E, Shirato H, et al. Dosimetric verification in participating institutions in a stereotactic body radiotherapy trial for stage I non-small cell lung cancer: Japan clinical oncology group trial (JCOG0403). Phys Med Biol 2006;51(21):5409-5417.
33. Blanck O, Wang L, Baus W, et al. Inverse treatment planning for spinal robotic radiosurgery: an international multi-institutional benchmark trial. J Appl Clin Med Phys 2016;17(3):313-330.
34. Clark CH, Hurkmans CW, Kry SF. The role of dosimetry audit in lung SBRT multi-centre clinical trials. Phys Med 2017;44:171-176.
35. Seuntjens J, Lartigau EF, Cora S, et al. ICRU Report 91. Prescribing, Recording, and Reporting of Stereotactic Treatments with Small Photon Beams. J ICRU 2014;14(2):1-160.
36. Chan MK, Wong M, Leung R, et al. Optimizing the prescription isodose level in stereotactic volumetric-modulated arc radiotherapy of lung lesions as a potential for dose de-escalation. Radiat Oncol 2018;13(1):24.
37. Lacornerie T, Lisbona A, Mirabel X, et al. GTV-based prescription in SBRT for lung lesions using advanced dose calculation algorithms. Radiat Oncol 2014;9:223.
38. Lebredonchel S, Lacornerie T, Rault E, et al. About the non-consistency of PTV-based prescription in lung. Phys Med 2017;44:177-187.
39. Okamoto H, Minemura T, Nakamura M, et al. Establishment of postal audit system in intensity-modulated radiotherapy by radiophotoluminescent glass dosimeters and a radiochromic film. Phys Med 2018;48:119-126.
40. Wilke L, Andratschke N, Blanck O. ICRU report 91 on prescribing, recording, and reporting of stereotactic treatments with small photon beams: Statement from the DEGRO/DGMP working group stereotactic radiotherapy and radiosurgery. Strahlenther Onkol 2019;195(3):193-198.
41. Eaton DJ, Lee J, Paddick I. Stereotactic radiosurgery for multiple brain metastases: Results of multicenter benchmark planning studies. Pract Radiat Oncol 2018;8(4):e212-e220.
42. Moore KL, Schmidt R, Moiseenko V, et al. Quantifying Unnecessary Normal Tissue Complication Risks due to Suboptimal Planning: A Secondary Study of RTOG 0126. Int J Radiat Oncol Biol Phys 2015;92(2):228-235.
43. Hussein M, Clementel E, Eaton DJ, et al. Global Quality Assurance of Radiation Therapy Clinical Trial Harmonisation Group. A virtual dosimetry audit - Towards transferability of gamma index analysis between clinical trial QA groups. Radiother Oncol 2017;125(3):398-404.
44. Hernandez V, Saez J, Jurado-Bruggeman MPD, Jornet N. Comparison of complexity metrics for multi-institutional evaluations of treatment plans in radiotherapy. Phys Imaging Radiat Oncol 2018;5:37-43.

| Study ID, year | Method of comparison | Platform for report/review | DVH independent calculation | Initial % of fulfillment | Replan phase | No. of participants | No. of cases | Connection to a clinical trial |
|---|---|---|---|---|---|---|---|---|
| Multicenter Studies to assess Dosimetric Variability | | | | | | | | |
| Nelms 2012 [12] | Scoring mechanism Plan Quality Metric (PQM) with 14 dose metric components | 3DVH software | yes | Not reported | no | 125 | 1 prostate lesion | no |
| Furuya 2015 [16] | Interinstitutional variations and protocol dose constraints | MIM | yes | 100%, PTV inhomogeneity | no | 3 | 3 spine lesions | no |
| Marino 2015 [3] | Intercomparison based on dosimetric results | Standardized forms + MATLAB | no | 66/70 (94%) | yes | 14 | 5 prostate lesions | no |
| Esposito 2016 [4] | Dose distribution comparison | MATLAB | no | 66/70 (94.3%) | no | 14 | 5 liver lesions | no |
| Giglioli 2016 [5] | Protocol dose constraints + gEUD2 | Velocity + MATLAB | yes | 124/130 no violations (95.4%) | no | 26 | 5 lung lesions | no |
| Moustakis 2017 [13] | Fulfillment of the dosimetric plan objectives + mathematical ranking | n/a | no | 96.6% for OAR | no | 29 | 3 lung lesions | no |
| Esposito 2018 [6] | Fulfillment of the dosimetric plan objectives + mathematical score | MIM | yes | 87.5% | yes | 48 | 2 spine lesions | no |
| Moustakis 2018 [15] | Comparison with Reference + mathematical ranking | n/a | no | 63.9% for OAR | no | 12 | 3 spine lesions | no |
| Villaggi 2019 [7] | Intercomparison based on dosimetric results and complexity metrics | R | no | 100% for PTV; 70% for OAR; outliers for PTV and OAR doses | yes | 13 | 5 prostate lesions | no |
| Planning Harmonization Studies before Clinical Trials | | | | | | | | |
| Djärv 2006 [20] | Intercomparison based on volumes and dosimetric results | n/a | no | Not reported | no | 6 | 2 lung lesions | Scandinavian SBRT study for NSCLC stage I |
| Matsuo 2007 [9] | Interinstitutional variations and protocol dose constraints | n/a | no | 100% for OAR; outliers for PTV doses | no | 11 | 4 lung lesions | JCOG 0403 |
| Eriguchi 2013 [21] | Intercomparison based on dosimetric results | Eclipse | yes | Only variability analyzed | no | 4 | 4 liver lesions; 3 plans for each case | no |
| Lambrecht 2016 [18] | Protocol dose constraints | VodcaRT in EORTC QA platform | yes | 11/12 (91.7%) | provided | 12 | 1 central lung lesion | LungTech |
| Multicenter Studies on Specific Technical and Methodological Aspects | | | | | | | | |
| Blanck 2016 [33] | Human reviewer + mathematical ranking | ARTIVIEW | yes | 80% for PTV coverage; 60% for OAR constraints | no | 10 | 1 spine lesion | no |
| Colvill 2016 [29] | RTOG 1021 and RTOG 0938 dose constraints; measurements in static and motion phantoms | Standardized forms | no | Not reported | no | 10 | 1 lung and 1 prostate lesion | no |
| Habraken 2017 [28] | Trial protocol + results from the autoVMAT Erasmus-iCycle software | AutoVMAT | yes | 18/20 (90%) + 2 minor deviations | yes | 9 | 1 liver lesion | TRENDY |
| Kron 2018 [22] | Protocol dose constraints + beam arrangement advice | SWAN | yes | 9/40 (77.5%) | yes | 22 | 2 lung lesions | CHISEL |

Table 1 Overview of multicenter planning studies
