Effect of CAD on performance in ASPECTS reading


Informatics in Medicine Unlocked 18 (2020) 100295


Marielle Ernst a,*, Martina Bernhardt a, Matthias Bechstein a, Gerhard Schön b, Jens Fiehler a, Charles B.L.M. Majoie c, Henk A. Marquering c,d, Wim H. van Zwam e, Diederik W.J. Dippel f, Robert J. van Oostenbrugge g, Einar Goebell a, on behalf of the MR CLEAN trial investigators 1

a Department of Diagnostic and Interventional Neuroradiology, University Medical Center Hamburg-Eppendorf, Germany
b Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Germany
c Department of Radiology, Academic Medical Center, Amsterdam, the Netherlands
d Department of Biomedical Engineering and Physics, Academic Medical Center, Amsterdam, the Netherlands
e Department of Radiology, Maastricht University Medical Center, the Netherlands
f Department of Neurology, Erasmus MC University Medical Center Rotterdam, the Netherlands
g Department of Neurology, Maastricht University Medical Center and Cardiovascular Research Institute Maastricht (CARIM), the Netherlands

Keywords: Computed tomography; Ischemic stroke; Diagnostic method; Alberta Stroke Program Early CT Score; Machine learning

Abstract

Background and purpose: While computer-aided diagnosis (CAD) tools are already widely used in stroke imaging routines, their influence on actual decision-making remains underexplored. We analyzed the effect of a simulated CAD tool on ASPECT-Scoring of acute-stroke CT scans with respect to reader experience level.
Materials and methods: Baseline CT scans of 100 stroke patients from the MR CLEAN trial, with consensus-ASPECTS as ground truth, were independently ASPECTS-graded by three readers with different levels of experience. Weeks later, the same CTs were re-analyzed with a simulated ASPECTS additionally displayed (s-ASPECTS, generated by adding or subtracting 2 points from the ground truth). Readers were told that the score was generated by an automatic ASPECT-Scoring algorithm. The influence of the displayed s-ASPECTS on the readers' second ASPECT-Scoring was analyzed using a linear mixed model, and reliability was assessed. Performance was measured as the absolute difference between the readers' ASPECTS and the consensus-ASPECTS.
Results: The influence of the s-ASPECTS on the second ASPECT-Scoring was lowest for the reader with the most experience in neuroradiology, while the other readers were significantly more influenced. All readers veered further away from the ground truth in their second ASPECT-Scoring with the s-ASPECTS, though not significantly. Overall inter-rater reliability was excellent (ICC = 0.94 [0.92–0.96]).
Conclusions: ASPECT-Scoring may be significantly influenced by simulated ASPECTS displayed by a suboptimal CAD tool, especially in readers with less experience, and performance tends to decrease.

1. Introduction

Computer-aided diagnosis (CAD) is being increasingly applied in the field of neuroradiology. Especially in image analysis of acute stroke patients, which is the key factor in stroke management, CAD tools might obtain a pivotal role in determining the therapeutic approach and predicting the prognosis in an individualized manner [1].

The Alberta Stroke Program Early CT Score (ASPECTS) is a quantitative method of estimating infarct size with non-contrast computed tomography (NCCT) during the acute phase [2]. ASPECTS is widely used in routine clinical practice as a fast and easy method to guide acute stroke treatment decisions. Previous work has illustrated the importance of ASPECTS for identifying patients who will benefit from endovascular treatment (EVT) [3,4]. Despite its broad application, ASPECTS has limitations. First of all, ASPECTS assessment can be compromised by poor-quality scans due to patient motion. Second, patient characteristics such as prior stroke and leukoaraiosis, as well as observer characteristics such as training and experience, might affect ASPECTS reading [5,6]. Moreover, it has been observed that ASPECTS in the ultra-early phase of stroke (<90 min) has higher inter-rater variability and is less reliable in the prediction of patient outcome [7,8]. Recently, automated ASPECT-Scoring methods have been presented as an alternative to

* Corresponding author. E-mail address: [email protected] (M. Ernst).
1 www.mrclean-trial.org.
https://doi.org/10.1016/j.imu.2020.100295
Received 13 November 2019; Received in revised form 9 January 2020; Accepted 13 January 2020; Available online 21 January 2020
2352-9148/© 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).


visual ASPECTS [9,10]. In two recent studies, the sensitivity and specificity of an automated machine learning algorithm, the e-ASPECTS software, were found to be significantly higher than those of stroke trainees, and non-inferiority to stroke experts was reported [11,12]. So far, the focus of research has been the diagnostic or prognostic performance of CAD tools, but the influence of a CAD tool on ASPECT-Scoring, and thus on clinical decisions, is unclear. CAD tools are said to be particularly helpful for physicians with less experience in stroke imaging, who are usually the first interpreters of a scan of an acute stroke patient. However, a CAD tool that shows inaccurate scores might be misleading and do more harm than good. We aimed to analyze the effect of a suboptimal CAD tool on observers' performance and decisions in ASPECT-Scoring of acute-stroke CT scans with respect to experience level.

2. Materials and Methods

2.1. Study population

The analyses were performed on data from the Multicenter Randomized Clinical Trial of Endovascular Therapy for Acute Ischemic Stroke in the Netherlands (MR CLEAN), which evaluated the effect of usual care versus intraarterial treatment in patients with acute ischemic anterior circulation stroke. The design and main results of the study have been reported previously [13,14]. Baseline NCCT scans of 100 stroke patients from the MR CLEAN trial without beam hardening or motion artifacts and with existing consensus-ASPECTS were randomly chosen and included in the study. Consensus-ASPECTS by the MR CLEAN core lab was determined by three physicians highly experienced in stroke imaging and the ASPECTS system.

2.2. Ethics statement

The MR CLEAN study protocol was approved by the Medical and Ethical Review Committee (Medisch Ethische Toetsings Commissie of the Erasmus MC, Rotterdam, The Netherlands) and the research board of each participating center. All patient records and images were anonymized before analysis, and written informed consent was acquired from all patients or their legal representatives as part of the original trial protocol.

2.3. Image analysis and ASPECT-Scoring

All images were independently analyzed in random order by three readers who were blinded to all clinical information except stroke side: Reader 1, a neuroradiologist with 15 years of experience in stroke imaging; Reader 2, a resident with 4 years of experience in stroke imaging; and Reader 3, a resident with 1 year of experience in stroke imaging. Window and level settings were adjusted at the discretion of the readers to increase contrast between normal and ischemic brain. All readers were instructed in the correct use of ASPECTS according to the current methodology [15]. ASPECTS is calculated by subtracting the number of affected regions from a total possible score of 10, such that lower scores correspond to larger infarcts.

Weeks later, the same readers were asked to grade the 100 NCCT scans again, but this time an ASPECT score was displayed, simulating a CAD tool. Readers were told that the score was generated by an automatic ASPECT-Scoring algorithm. They were not told about the performance characteristics of the CAD tool, and they were not informed that the same initial 100 NCCT scans were presented in a different random order. The displayed simulated automated ASPECT score (s-ASPECTS) was generated by either adding or subtracting two points from the MR CLEAN consensus-ASPECTS, which served as ground truth. This was in line with a recently proposed automated ASPECT-Scoring approach with limits of agreement of 3.3 and 2.6 [10]. In case of an ASPECT score of 10 or 9, two points were subtracted; in case of a score of 0 to 5, two points were added. In half of the cases with an ASPECT score of 6, 7 or 8, two points were added; in the other half, two points were subtracted. The distributions of the consensus-ASPECTS of the original MR CLEAN population and of the included cases, as well as of the s-ASPECTS, are shown in Fig. 1.

2.4. Statistics

The quantitative ASPECT score was treated as a metric variable and differences between the scores were analyzed. First, the deviation in the direction of the displayed s-ASPECT score during the second ASPECT-Scoring was analyzed. If the s-ASPECTS was higher than the first ASPECT-Scoring, the difference between the 2nd and the 1st ASPECT-Scoring was calculated; if the s-ASPECTS was lower than the first ASPECT-Scoring, the difference between the 1st and the 2nd ASPECT-Scoring was calculated. Then, we compared the effect of the displayed s-ASPECT score on the calculated difference between first and second ASPECT-Scoring for each reader by employing a linear mixed model with "deviation in the direction of the displayed s-ASPECT score during second ASPECT-Scoring" as outcome variable, random intercepts for each case, and fixed effects for the readers. Pairwise post-hoc comparisons of readers were performed. The reader-specific estimated marginal means and their 95% confidence intervals are presented, along with the p values of the pairwise comparisons. P values < 0.05 were considered statistically significant.

Second, performance was measured as the absolute difference between the readers' ASPECTS and the consensus-ASPECTS. The change in performance between the first ASPECT-Scoring and the second ASPECT-Scoring with displayed s-ASPECTS was analyzed. A two-sided paired t-test was performed to determine the difference between the mean absolute difference between the consensus ASPECT score and the first ASPECT-Scoring and the mean absolute difference between the consensus ASPECT score and the second ASPECT-Scoring.

Intra-rater reliability was assessed for each reader between the first and the second ASPECT-Scoring. Inter-rater reliability was assessed for each reader between consensus-ASPECTS and the first ASPECT-Scoring, between consensus-ASPECTS and the second ASPECT-Scoring, as well as between s-ASPECTS and the second ASPECT-Scoring. As a measure of inter-rater and intra-rater reliability of the total ASPECT score, intraclass correlation coefficients (ICCs) were calculated. ICC estimates and their 95% confidence intervals were calculated based on a two-way mixed-effects model with absolute agreement. We used the following categories for interpreting ICCs: values less than 0.5 indicate poor reliability, values between 0.5 and 0.75 moderate reliability, values between 0.75 and 0.9 good reliability, and values greater than 0.90 excellent reliability [16]. All analyses were performed with SPSS 25 (IBM, Armonk, New York).
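The scoring rule and the s-ASPECTS generation scheme described in section 2.3 can be sketched as follows. This is a minimal illustration under the stated rules, not the authors' code; the 50/50 tie-break for consensus scores 6–8 is modeled as a caller-supplied flag (`add_two`), a hypothetical parameter.

```python
def aspects(n_affected_regions: int) -> int:
    # ASPECTS: 10 minus the number of affected regions;
    # lower scores correspond to larger infarcts.
    return 10 - n_affected_regions


def simulate_s_aspects(consensus: int, add_two: bool) -> int:
    """Simulated CAD score: consensus ASPECTS (ground truth) +/- 2 points.

    add_two is only consulted for consensus scores 6-8, where half of the
    cases received +2 and the other half -2 (assignment left to the caller).
    """
    if consensus >= 9:        # 9 or 10: subtract two points
        return consensus - 2
    if consensus <= 5:        # 0 to 5: add two points
        return consensus + 2
    return consensus + 2 if add_two else consensus - 2   # 6, 7 or 8
```

Note that under this rule the simulated score always stays within the valid 0–10 range.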

3. Results

3.1. Influence of displayed s-ASPECTS

Mean ASPECTS of the first and second ASPECT-Scoring were 7.31 and 7.10 for Reader 1, 7.31 and 7.23 for Reader 2, and 8.21 and 7.85 for Reader 3. The mean consensus-ASPECTS was 8.02. Mean deviations in relation to the displayed s-ASPECTS during the second ASPECT-Scoring for all three readers are shown in Fig. 2. Reader 1's second scoring did not differ from the first in 49 of 100 cases, Reader 2's in 35 cases, and Reader 3's in 29 cases. Mean deviations in relation to the displayed s-ASPECTS during the second ASPECT-Scoring over all consensus-ASPECTS were 0.17 [-0.06; 0.40], 0.64 [0.41; 0.87] and 0.78 [0.55; 1.01] for Reader 1, Reader 2 and Reader 3, respectively. Hence, the influence of the displayed s-ASPECTS on the second ASPECT-Scoring was lowest for Reader 1, who had the most experience in neuroradiology, while Readers 2 and 3 were significantly more influenced (Table 1).
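The outcome variable behind these mean deviations, as defined in section 2.4, can be sketched as follows. This is a minimal illustration, not the study code; returning 0 when the displayed score equals the first reading is an assumption, as that case is not defined in the text.

```python
def deviation_toward_displayed(first: int, second: int, displayed: int) -> int:
    # Deviation of the 2nd reading in the direction of the displayed s-ASPECTS:
    # positive = the reader moved toward the displayed score,
    # negative = the reader moved away from it.
    if displayed > first:
        return second - first
    if displayed < first:
        return first - second
    return 0  # displayed equals the first reading (assumption: no direction)
```

Averaging this quantity over all 100 cases per reader yields the values plotted in Fig. 2 and compared in the linear mixed model.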


Fig. 1. Distributions of the consensus-ASPECTS of the original MR CLEAN population and of the included cases, as well as of the simulated ASPECTS displayed during the second ASPECT-Scoring.

| ASPECTS                              | 0 | 1 | 2 | 3  | 4  | 5  | 6  | 7  | 8  | 9   | 10  |
|--------------------------------------|---|---|---|----|----|----|----|----|----|-----|-----|
| All MR CLEAN patients (consensus, n) | 1 | 2 | 3 | 10 | 14 | 16 | 29 | 48 | 89 | 114 | 170 |
| Included patients (consensus, n)     | 0 | 1 | 2 | 3  | 4  | 4  | 5  | 10 | 17 | 24  | 30  |
| Simulated ASPECTS (n)                | 0 | 0 | 0 | 1  | 6  | 7  | 12 | 28 | 31 | 6   | 9   |


Fig. 2. Deviation during the second ASPECT-Scoring, one panel per reader (Readers 1–3). X-values of 0 signify no change between the 1st and 2nd ASPECT-Scoring; positive x-values signify a deviation in the direction of the displayed simulated ASPECTS, negative x-values a deviation away from it.

There was no significant difference between Readers 2 and 3 over all consensus-ASPECTS. The influence of the displayed s-ASPECTS across different consensus-ASPECTS ranges is shown in Fig. 3. There was no significant difference between the readers for consensus-ASPECTS 9–10. For consensus-ASPECTS 0–6 there was a significant difference between Reader 3 and Readers 1 and 2, while Readers 1 and 2 did not differ significantly.

3.2. Performance

With regard to the performance of the first ASPECT-Scoring, the mean absolute difference between the estimated ASPECT-Scoring and the consensus-ASPECT-Scoring was lowest for Reader 3 (1.03 ± 1.03), followed by Reader 2 (1.17 ± 1.30) and Reader 1 (1.23 ± 1.11) (Table 2). All readers veered away from the consensus ASPECT score in their second ASPECT-Scoring with the displayed s-ASPECTS, though not significantly: Reader 1 (-0.09, 95% CI -0.28 to 0.10), Reader 2 (-0.20, 95% CI -0.44 to 0.04) and Reader 3 (-0.10, 95% CI -0.33 to 0.13) (Table 3 and Fig. 4). The readers did not significantly differ with regard to performance. The number of patients scored ASPECTS 0–5 by the MR CLEAN experts but not by the readers was n = 5 and n = 3 for Reader 1 in the 1st and 2nd scoring, n = 3 and n = 5 for Reader 2, and n = 4 and n = 4 for Reader 3. The number of patients scored ASPECTS 0–5 by the readers, but not by

Table 1. Difference between readers regarding deviation towards the simulated s-ASPECT score (linear mixed model), fixed coefficients.a

| Model term | Coefficient | Significance | 95% CI lower | 95% CI upper |
|------------|-------------|--------------|--------------|--------------|
| Reader 1   | 0b          |              |              |              |
| Reader 2   | 0.47        | 0.002        | 0.17         | 0.77         |
| Reader 3   | 0.61        | <0.001       | 0.31         | 0.91         |

a Target: deviation between 1st and 2nd ASPECT-Scoring in relation to the simulated s-ASPECT score.
b This coefficient is set to zero because it is redundant.


the MR CLEAN experts, was n = 4 and n = 7 for Reader 1, n = 3 and n = 5 for Reader 2, and n = 0 and n = 1 for Reader 3.

3.3. Intra- and inter-rater reliability

The results of intra- and inter-rater reliability of the total ASPECT score are shown in Table 4. Intra-rater reliability was excellent for Reader 1 (0.94 [0.91–0.96]) and good to excellent for Reader 2 (0.87 [0.80–0.91]) and Reader 3 (0.86 [0.78–0.91]). Overall inter-rater reliability was excellent (0.94 [0.92–0.96]). Inter-rater reliabilities between consensus-ASPECTS and the first as well as the second ASPECT-Scoring were moderate to good for Readers 1 and 2, and good to excellent for Reader 3.

4. Discussion

CAD tools based on artificial intelligence, such as automated reading of NCCTs, are becoming part of the clinical routine. The focus of research on this subject is typically the diagnostic or prognostic performance of these tools [10–12,17,18]. However, the outcome of the interaction between man and machine is of great relevance as well. In particular, the influence of these tools on decision-making is still underexplored, and to our knowledge the effect of a simulated CAD tool on ASPECT-Scoring has not been investigated before. Our study suggests a significant influence of suboptimal s-ASPECTS on ASPECT-Scoring. This might be explained by the well-known anchoring effect: people's answers to a question are influenced by the suggestion of an arbitrary value as a possible answer [19]. On the one hand, this influence would be desirable provided that a CAD tool of high accuracy increases performance in ASPECT-Scoring and thus helps physicians determine prognosis and decide on appropriate therapy in patients presenting with acute ischemic stroke. Many hospitals around the world do not have 24/7 access to an expert neuroradiologist and have to rely on non-expert interpretation of CT scans during the therapeutic time window. In a recent review, automated ASPECTS was argued to be particularly helpful for medical staff who are not accustomed to stroke imaging, such as general practitioners or paramedics; the decision to initiate thrombectomy may thus be markedly faster [1]. A recent study integrated e-ASPECTS into prehospital stroke management via a mobile stroke unit to support decisions regarding treatment options and triage to the most appropriate target hospital [20]. Our study confirms that physicians with less experience in neuroradiology are more influenced than a stroke expert; early research demonstrated that experts are more resistant to anchoring [21].

On the other hand, we showed that physicians are likewise influenced in their ASPECT-Scoring by a suboptimal CAD tool, leading to a decrease in performance and possibly causing wrong treatment decisions. One drawback of the introduction of machine learning techniques into clinical practice is their "black box" nature, i.e., the fact that the machines learn from imaging features that are partly unrecognizable to human beings and that their logic and technical mechanisms are mostly incomprehensible. Moreover, algorithms might show very good performance on a training dataset but be of less use in real clinical practice. Especially for physicians with less experience, it might be difficult to realize when to trust the CAD tool and when it is necessary to consult an expert. Overreliance on artificial intelligence might have serious implications if it erroneously leads to the denial of therapy. ASPECTS could potentially be used to identify patients with a small baseline ischemic core who could benefit from neuroprotection or from collateral augmentation [22].

A recent meta-analysis of the five groundbreaking trials proving efficacy of mechanical thrombectomy published in 2015 could demonstrate neither benefit nor harm of EVT in large infarcts (ASPECTS 0–5) [23]. Though it is still a matter of debate, the consensus statement of the European Stroke Organisation on mechanical thrombectomy in acute ischemic stroke suggested that patients with radiological signs of large

Fig. 3. Mean deviations in relation to the displayed simulated ASPECTS during the 2nd ASPECT-Scoring for all three readers, over all consensus-ASPECTS as well as for different consensus-ASPECTS ranges. Positive values signify a deviation in the direction of the displayed simulated ASPECTS, negative values a deviation away from it.

Table 2. Performance during the first and second ASPECT-Scoring.

|                       | Mean | Standard deviation |
|-----------------------|------|--------------------|
| First ASPECT-Scoring  |      |                    |
| Reader 1              | 1.23 | 1.11               |
| Reader 2              | 1.17 | 1.30               |
| Reader 3              | 1.03 | 1.03               |
| Second ASPECT-Scoring |      |                    |
| Reader 1              | 1.32 | 1.12               |
| Reader 2              | 1.37 | 1.16               |
| Reader 3              | 1.13 | 0.87               |

Performance was measured for each reader as the mean absolute difference between the consensus-ASPECT-Scoring and the first and second ASPECT-Scoring, respectively. A smaller mean absolute difference signifies better performance.

Table 3. Change of performance (paired samples test). The mean change of performance was calculated as the difference between performances during the first and second ASPECT-Scoring (see Table 2).

|          | Mean  | Standard deviation | 95% CI lower | 95% CI upper | Significance (2-tailed) |
|----------|-------|--------------------|--------------|--------------|-------------------------|
| Reader 1 | -0.09 | 0.94               | -0.28        | 0.10         | 0.34                    |
| Reader 2 | -0.20 | 1.21               | -0.44        | 0.04         | 0.10                    |
| Reader 3 | -0.10 | 1.15               | -0.33        | 0.13         | 0.39                    |
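The performance measure and the paired test summarized in Table 3 can be sketched with the standard library only. This is an illustration of the described analysis, not the study code (the study used SPSS).

```python
from math import sqrt
from statistics import mean, stdev


def performance(reader_scores, consensus_scores):
    # Performance = absolute difference from the consensus ASPECTS per case;
    # a smaller value signifies better performance.
    return [abs(r - c) for r, c in zip(reader_scores, consensus_scores)]


def paired_t(perf_first, perf_second):
    # Paired t statistic on the per-case change in performance
    # (first minus second reading, so negative means worse); df = n - 1.
    diffs = [a - b for a, b in zip(perf_first, perf_second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```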


Fig. 4. Change of performance, one panel per reader (Readers 1–3). Performance at the 1st and 2nd ASPECT-Scoring was measured as the absolute difference between the reader's ASPECTS and the consensus-ASPECTS. Negative points signify worse performance, positive points better performance, at the 2nd ASPECT-Scoring.
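The average-measures ICC used for the reliability analyses (section 2.4; computed in the study with SPSS) can be illustrated with the two-way, absolute-agreement formulation, ICC(A,k) in the McGraw and Wong notation. This is a sketch of that formulation, not the SPSS implementation.

```python
def icc_average_absolute(scores):
    # scores: one list per subject, one rating per rater.
    # Two-way ANOVA decomposition, then
    # ICC(A,k) = (MSR - MSE) / (MSR + (MSC - MSE) / n).
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # subjects
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)   # raters
    sse = sum((scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((n - 1) * (k - 1))                                # residual
    return (msr - mse) / (msr + (msc - mse) / n)
```

A systematic offset between raters lowers this absolute-agreement ICC even when the ratings are perfectly correlated, which is why the choice of ICC type matters for tables like Table 4.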


infarcts may be unsuitable for thrombectomy and proposed using ASPECTS for imaging-guided patient selection [24]. Based on this threshold, up to 5% of the patients in our study would have been judged suitable for EVT by our readers though not by the MR CLEAN experts, and in up to 7% access to EVT would have been denied by our readers though not by the MR CLEAN experts. It is therefore recommended not to exclude patients from treatment by strict ASPECTS cut-offs, but to consider other imaging and clinical features as well [25]. Another issue confined to ASPECTS is the poor sensitivity of NCCT in the early period after stroke [26]. Though a recent study showed non-inferior performance of e-ASPECTS in the assessment of NCCT compared to stroke experts, it has to be considered that the sensitivity of 44% of both experts and e-ASPECTS was very low [12]. This might be explained by the fact that follow-up scans, not acute MRI or perfusion CT imaging, served as ground truth to determine the definite area of infarction. Moreover, only three patients were treated with an endovascular approach, while 123 patients received intravenous thrombolysis. As stroke is a dynamic process, further brain areas might have become infarcted in the meantime; thus, the follow-up scans most likely did not reflect the ground truth. This is in line with a Cochrane review on detecting early ischemic changes on CT when compared to MRI [27]. As human judgment is often inconsistent, it is tempting to replace it with objective, reliable automatic algorithms when available. However, ignoring the underlying poor sensitivity is likely to result in overconfidence in computer-aided judgment. Thus, uncritical acceptance of suggestions by a potentially suboptimal CAD tool can lead to worse performance, as shown in our study.

ASPECTS were graded by three readers with different levels of experience in stroke imaging. The effect of CAD tools on medical staff who are not accustomed to stroke imaging would be of further interest and should be analyzed in future studies.

The consistency with which readers make their judgments provides important information about the extent of error inherent to the process of visual imaging evaluation. Reliability of an imaging finding is an essential indicator of the potential of a diagnostic tool and determines its clinical utility. Interobserver reliability of ASPECTS depends on several factors, such as knowledge of the side of the stroke symptoms, stroke onset-to-imaging time, reader experience, and amount of reader training. In accordance with previous studies, the overall inter-rater reliability of total ASPECTS was good [28]. This can be explained by the facts that the readers were trained in ASPECT-Scoring and had knowledge of the affected hemisphere. We did not find a change in reliability between ASPECT-Scoring with or without displayed s-ASPECTS.

As a limitation, patients' ASPECTS were not normally distributed, with 81% of patients having an ASPECTS ≥7. This is in accordance with previous studies reporting an ASPECTS >7 on the initial NCCT in up to 90% of patients [8,28]. We focused on the influence of a suboptimal CAD tool, as its influence would be of greater relevance from a clinical point of view. The effect might be different with an optimal CAD tool and should be analyzed in future studies. Moreover, the displayed s-ASPECTS always deviated 2 points from the consensus-ASPECTS; the influence of a CAD tool might thus be lower or higher for other deviations. The amount of deviation was in line with a recently proposed automated ASPECT-Scoring approach with limits of agreement of 3.3 and 2.6 [10]. We also chose this fixed variation to dispel readers' doubts about the validity of the CAD tool, as a huge deviation from the ground truth might have been striking. Finally, this approach allowed us to simplify our model in order to analyze whether there is an influence or not.

Table 4. Reliability for total ASPECTS assessed with the intraclass correlation coefficient (average measures).

|                                                              | ICC  | 95% CI lower | 95% CI upper | Interpretation        |
|--------------------------------------------------------------|------|--------------|--------------|-----------------------|
| Intra-rater reliability                                      |      |              |              |                       |
| Reader 1                                                     | 0.94 | 0.91         | 0.96         | excellent             |
| Reader 2                                                     | 0.87 | 0.80         | 0.91         | good to excellent     |
| Reader 3                                                     | 0.86 | 0.78         | 0.91         | good to excellent     |
| Inter-rater reliability                                      |      |              |              |                       |
| Overall                                                      | 0.94 | 0.92         | 0.96         | excellent             |
| First and second ASPECT-Scoring vs. consensus-ASPECT-Scoring |      |              |              |                       |
| Reader 1, first ASPECT-Scoring                               | 0.83 | 0.68         | 0.90         | moderate to good      |
| Reader 1, second ASPECT-Scoring                              | 0.83 | 0.60         | 0.91         | moderate to excellent |
| Reader 2, first ASPECT-Scoring                               | 0.82 | 0.69         | 0.89         | moderate to good      |
| Reader 2, second ASPECT-Scoring                              | 0.79 | 0.62         | 0.88         | moderate to good      |
| Reader 3, first ASPECT-Scoring                               | 0.87 | 0.80         | 0.91         | good to excellent     |
| Reader 3, second ASPECT-Scoring                              | 0.86 | 0.79         | 0.91         | good to excellent     |

5. Conclusions

Simulated ASPECTS displayed by a suboptimal CAD tool significantly influences ASPECT-Scoring. This may hold in particular for readers with less experience, and overall performance tends to decrease.

Funding statement

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Ethics statement

The MR CLEAN study protocol was approved by the Medical and Ethical Review Committee (Medisch Ethische Toetsings Commissie of the Erasmus MC, Rotterdam, The Netherlands) and the research board of each participating center. All patient records and images were anonymized before analysis, and written informed consent was acquired from all patients or their legal representatives as part of the original trial protocol.

Subject terms

Ischemic Stroke, Computerized Tomography (CT), Imaging, Prognosis.

Declaration of competing interest

DWJD reports grants from the Dutch Heart Foundation, Medtronic/Covidien, AngioCare, Stryker, Medac, Lamepro, Penumbra, and Top Medical/Concentric. WHvZ reports speaking engagements with Stryker and Codman. HAM reports ownership interest in Nico-lab. CBM reports research grants from the Dutch Heart Foundation for the submitted work, and from the Toegepast Wetenschappelijk Instituut voor Neuromodulatie foundation, the European Commission and Stryker (outside the submitted work); all paid to the institution. The other authors declare no competing interests.


Acknowledgement

None.

References

[1] Lee EJ, Kim YH, Kim N, Kang DW. Deep into the brain: artificial intelligence in stroke imaging. J Stroke 2017;19:277–85.
[2] Barber PA, Demchuk AM, Zhang J, Buchan AM. Validity and reliability of a quantitative computed tomography score in predicting outcome of hyperacute stroke before thrombolytic therapy. ASPECTS Study Group. Alberta Stroke Programme Early CT Score. Lancet 2000;355:1670–4.
[3] Menon BK, Puetz V, Kochar P, Demchuk AM. ASPECTS and other neuroimaging scores in the triage and prediction of outcome in acute stroke patients. Neuroimaging Clin 2011;21:407–23.
[4] Yoo AJ, Berkhemer OA, Fransen PSS, van den Berg LA, Beumer D, Lingsma HF, et al. Effect of baseline Alberta Stroke Program Early CT Score on safety and efficacy of intra-arterial treatment: a subgroup analysis of a randomised phase 3 trial (MR CLEAN). Lancet Neurol 2016;15:685–94.
[5] Coutts SB, Hill MD, Demchuk AM, Barber PA, Pexman JH, Buchan AM. ASPECTS reading requires training and experience. Stroke 2003;34:e179.
[6] von Kummer R. Effect of training in reading CT scans on patient selection for ECASS II. Neurology 1998;51:S50–2.
[7] Bal S, Bhatia R, Menon BK, Shobha N, Puetz V, Dzialowski I, et al. Time dependence of reliability of noncontrast computed tomography in comparison to computed tomography angiography source image in acute ischemic stroke. Int J Stroke 2015;10:55–60.
[8] Naylor J, Churilov L, Chen Z, Koome M, Rane N, Campbell BCV. Reliability, reproducibility and prognostic accuracy of the Alberta Stroke Program Early CT Score on CT perfusion and non-contrast CT in hyperacute stroke. Cerebrovasc Dis 2017;44:195–202.
[9] Stoel BC, Marquering HA, Staring M, Beenen LF, Slump CH, Roos YB, et al. Automated brain computed tomographic densitometry of early ischemic changes in acute stroke. J Med Imag 2015;2:014004.
[10] Kuang H, Najm M, Chakraborty D, Maraj N, Sohn SI, Goyal M, et al. Automated ASPECTS on noncontrast CT scans in patients with acute ischemic stroke using machine learning. AJNR Am J Neuroradiol 2019;40:33–8.
[11] Herweh C, Ringleb PA, Rauch G, Gerry S, Behrens L, Möhlenbruch M, et al. Performance of e-ASPECTS software in comparison to that of stroke physicians on assessing CT scans of acute ischemic stroke patients. Int J Stroke 2016;11:438–45.
[12] Nagel S, Sinha D, Day D, Reith W, Chapot R, Papanagiotou P, et al. e-ASPECTS software is non-inferior to neuroradiologists in applying the ASPECT score to computed tomography scans of acute ischemic stroke patients. Int J Stroke 2017;12:615–22.
[13] Berkhemer OA, Fransen PS, Beumer D, van den Berg LA, Lingsma HF, Yoo AJ, et al. A randomized trial of intraarterial treatment for acute ischemic stroke. N Engl J Med 2015;372:11–20.
[14] Fransen PS, Beumer D, Berkhemer OA, van den Berg LA, Lingsma H, van der Lugt A, et al. MR CLEAN, a multicenter randomized clinical trial of endovascular treatment for acute ischemic stroke in the Netherlands: study protocol for a randomized controlled trial. Trials 2014;15:343.
[15] Wei D, Oxley TJ, Nistal DA, Mascitelli JR, Wilson N, Stein L, et al. Mobile interventional stroke teams lead to faster treatment times for thrombectomy in large vessel occlusion. Stroke 2017;48:3295–300.
[16] Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016;15:155–63.
[17] Goebel J, Stenzel E, Guberina N, Wanke I, Koehrmann M, Kleinschnitz C, et al. Automated ASPECT rating: comparison between the Frontier ASPECT Score software and the Brainomix software. Neuroradiology 2018;60:1267–72.
[18] Guberina N, Dietrich U, Radbruch A, Goebel J, Deuschl C, Ringelstein A, et al. Detection of early infarction signs with machine learning-based diagnosis by means of the Alberta Stroke Program Early CT Score (ASPECTS) in the clinical routine. Neuroradiology 2018;60:889–901.
[19] Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science 1974;185:1124–31.
[20] Grunwald IQ, Ragoschke-Schumm A, Kettner M, Schwindling L, Roumia S, Helwig S, et al. First automated stroke imaging evaluation via electronic Alberta Stroke Program Early CT Score in a mobile stroke unit. Cerebrovasc Dis 2016;42:332–8.
[21] Wilson TD, Houston CE, Etling KM, Brekke N. A new look at anchoring effects: basic anchoring and its antecedents. J Exp Psychol Gen 1996;125:387–402.
[22] Menon BK, Campbell BC, Levi C, Goyal M. Role of imaging in current acute ischemic stroke workflow for endovascular therapy. Stroke 2015;46:1453–61.
[23] Goyal M, Menon BK, van Zwam WH, Dippel DW, Mitchell PJ, Demchuk AM, et al. Endovascular thrombectomy after large-vessel ischaemic stroke: a meta-analysis of individual patient data from five randomised trials. Lancet 2016;387:1723–31.
[24] Wahlgren N, Moreira T, Michel P, Steiner T, Jansen O, Cognard C, et al. Mechanical thrombectomy in acute ischemic stroke: consensus statement by ESO-Karolinska Stroke Update 2014/2015, supported by ESO, ESMINT, ESNR and EAN. Int J Stroke 2016;11:134–47.
[25] Venema E, Mulder M, Roozenbeek B, Broderick JP, Yeatts SD, Khatri P, et al. Selection of patients for intra-arterial treatment for acute ischaemic stroke: development and validation of a clinical decision tool in two randomised trials. Br Med J 2017;357:j1710.
[26] Chalela JA, Kidwell CS, Nentwich LM, Luby M, Butman JA, Demchuk AM, et al. Magnetic resonance imaging and computed tomography in emergency assessment of patients with suspected acute stroke: a prospective comparison. Lancet 2007;369:293–8.
[27] Brazzelli M, Sandercock PA, Chappell FM, Celani MG, Righetti E, Arestis N, et al. Magnetic resonance imaging versus computed tomography for detection of acute vascular lesions in patients presenting with stroke symptoms. Cochrane Database Syst Rev 2009:CD007424.
[28] Finlayson O, John V, Yeung R, Dowlatshahi D, Howard P, Zhang L, et al. Interobserver agreement of ASPECT score distribution for noncontrast CT, CT angiography, and CT perfusion in acute stroke. Stroke 2013;44:234–6.
