Journal Pre-proof
Characterizing autism spectrum disorder by deep learning spontaneous brain activity from functional near-infrared spectroscopy
Lingyu Xu, Yaya Liu, Jie Yu, Xinjuan Li, Xuan Yu, Huiyi Cheng, Jun Li
PII: S0165-0270(19)30395-4
DOI: https://doi.org/10.1016/j.jneumeth.2019.108538
Reference: NSM 108538
To appear in: Journal of Neuroscience Methods
Received Date: 6 November 2019
Revised Date: 28 November 2019
Accepted Date: 29 November 2019
Please cite this article as: Xu L, Liu Y, Yu J, Li X, Yu X, Cheng H, Li J, Characterizing autism spectrum disorder by deep learning spontaneous brain activity from functional near-infrared spectroscopy, Journal of Neuroscience Methods (2019), doi: https://doi.org/10.1016/j.jneumeth.2019.108538
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier.
Characterizing autism spectrum disorder by deep learning spontaneous brain activity from functional near-infrared spectroscopy
Lingyu Xu1,2, Yaya Liu1*, Jie Yu1, Xinjuan Li1, Xuan Yu1, Huiyi Cheng3, Jun Li3,4*
1 School of Computer Engineering and Science, Shanghai University, Shanghai, China
2 Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, China
3 South China Academy of Advanced Optoelectronics, South China Normal University, Guangzhou, China
4 Key Lab for Behavioral Economic Science & Technology, South China Normal University, Guangzhou, China
*Corresponding author at: School of Computer Engineering and Science, Shanghai University, Shanghai 20444, China; South China Academy of Advanced Optoelectronics, South China Normal University, Guangzhou 510631, China, (Y. Liu), (J. Li)
Graphical Abstract
The time-varying behavior of spontaneous hemodynamic fluctuations from fNIRS was analyzed to investigate possible functional patterns associated with autism spectrum disorder (ASD) and the potential contribution of optical channels to ASD/TD classification. A deep learning model combining long-short term memory (LSTM) and a convolutional neural network (CNN) was then constructed to represent the temporal variation of brain activity for accurate identification of ASD.
Highlights
• Monitoring brain activity of bilateral TL by functional near-infrared spectroscopy
• Temporal variation in hemodynamic fluctuations of ASD children differs from TD
• Weaker internal logic but stronger memory to random shocks in ASD children
• Identifying ASD with long-short term memory and convolutional neural network
• Hemodynamic signals of Hb demonstrated classification accuracy of 95.7%
Abstract:
Background: Functional near-infrared spectroscopy (fNIRS) was used to investigate spontaneous hemodynamic fluctuations in the bilateral temporal cortices of typically developing (TD) children and children with autism spectrum disorder (ASD).
New Method: This paper proposes an approach to estimate the global time-varying behavior of brain activity by measuring the change in first-order statistical properties directly from fNIRS time series. A deep learning model combining long-short term memory (LSTM) and a convolutional neural network (CNN) was then constructed, based on an integration strategy with an improved bagging algorithm, to explore potential patterns of temporal variation for ASD identification.
Results: Based on the theory of stationarity, analysis of the global time-varying behavior of hemodynamic fluctuations in oxy-hemoglobin (HbO2) and deoxy-hemoglobin (Hb) demonstrated that children with ASD showed weaker internal logic, but stronger memory and persistence to random shocks, than TD children. Differentiating between ASD and TD with the proposed deep learning approach resulted in highly accurate classification, with a sensitivity of 97.1% and a specificity of 94.3%.
Comparison with Existing Methods: Using the fNIRS time series of Hb from a single optical channel, we achieved a classification accuracy of 95.7%, about 8% higher than previous methods with similar data.
Conclusions: The characterization of the time-varying behavior of brain activity holds promise for better understanding the underlying causes of ASD. The deployed deep learning framework, with its integration strategy, has the potential for screening children at risk of ASD.
Key words: functional near infrared spectroscopy (fNIRS); autism spectrum disorder (ASD); deep learning; long-short term memory (LSTM); convolutional neural network (CNN).
1. Introduction
Autism spectrum disorder (ASD) is a pervasive developmental syndrome characterized by narrow interests, stereotyped behaviors, impaired social interaction, and sensory abnormalities [Li et al.,2016]. At present, the diagnosis of ASD relies solely on behavioral observation, e.g., via the Autism Diagnostic Observation Schedule (ADOS) [Lord et al.,1989]. However, it is largely limited not only by the variability of measurement but also by the poorly understood etiology of autism. Several recent imaging studies have noted significant alterations in brain structure or function associated with ASD, such as enlarged brain volume [Courchesne et al.,2011; Nordahl et al.,2011], accelerated cortical thinning [Ecker et al.,2014; Zielinski et al.,2014], and delayed language development [Ha et al.,2015; Ha et al.,2015]. These instructive findings might render brain imaging a new avenue for aiding the diagnosis of autism.
Compelling imaging studies have linked the emergence and manifestation of autistic symptoms to physiological changes in the temporal lobes (TL). For example, measured with functional magnetic resonance imaging (fMRI), children with ASD showed an increased emotional reaction of the temporal cortex to errors, whereas typically developing (TD) children did not experience such a change [Goldberg et al.,2011]. This echoes studies suggesting reduced activation of the bilateral superior temporal region to novel sounds [Gomot et al.,2006; Boddaert et al.,2004]. Both anomalies have been linked to the excessive perseveration and repetitive behavior in autism. Voxel-based MRI investigations of individuals with ASD [Salmond et al.,2003; Waiter et al.,2004] reported increased gray matter volume in the superior temporal gyrus, which might underlie some autistic abnormalities in language processing and social perception. Moreover, neuropsychological and physiological dysfunction in young autistic children might be associated with a deficient left temporal lobe [Gendry et al.,2005; Chi et al.,2014], enlargement of the right temporal lobe [Jou et al.,2010], and atypical hemispheric lateralization [Jou et al.,2010; Cardinale et al.,2013], especially in language-related areas of the TL [Gage et al.,2009; Lindell et al.,2013]. All these observations highlight that investigating the bilateral TL would contribute to a better understanding of the pathophysiology of ASD. Therefore, this paper attempts to characterize the spontaneous hemodynamic activity of the bilateral TL, with the aim of exploring potential functional patterns for discriminating children with ASD from controls.
Among various identification methods, deep learning has been drawing increasing attention because of its ability to learn from data automatically. That is, it can explore multifaceted features for the desired task automatically, rather than relying on the manual measurement used in traditional machine learning approaches such as support vector machines (SVM), decision trees, or random forests. Several recent deep learning studies have reported promising results on distinguishing individuals with ASD from TD individuals. For example, Dvornek et al. adopted long-short term memory (LSTM) to model heterogeneous resting-state fMRI data, which resulted in a classification accuracy of 68.5% [Dvornek et al.,2017]. Heinsfeld et al. achieved better classification with an autoencoder network (accuracy of 70% and sensitivity of 74%) than with SVM (accuracy of 65% and sensitivity of 68%) [Heinsfeld et al.,2018]. Moreover, Jain et al. deployed a graph convolutional neural network (CNN) on resting-state fMRI data, which demonstrated an accuracy of 70.23% [Jain et al.,2018]. These efforts suggest that deep learning could identify ASD more objectively than traditional methods based on multiple subjective features. In addition to the individual identification task, the deep learning technique also shows great potential in other applications, such as the detection of speech and language abnormalities [Li et al.,2019] and automatic stereotypical motor movement detection [Mohammadian et al.,2018] by combining LSTM with CNN. In this paper, we investigate the feasibility of applying a combined CNN and LSTM model to the identification of ASD.
Since functional near infrared spectroscopy (fNIRS) provides a balance between the high sampling rate of EEG (electroencephalography) and the excellent spatial resolution of fMRI [Wolf et al.,2007; Liu et al.,2019], we used fNIRS to record the spontaneous hemodynamic activity of the bilateral TL in TD children and children with ASD. To investigate the atypical features of spontaneous fluctuations in ASD, the present study is organized as follows. First, we measured and analyzed the global time-varying behavior of brain activity based on the first-order statistical properties of hemodynamic changes in oxy-hemoglobin (HbO2) and deoxy-hemoglobin (Hb). Then, an integrated deep learning model combining LSTM and CNN was constructed to explore further temporal patterns for the identification of ASD. Finally, we discuss the classification performance and draw conclusions.
2. Material and Methodology
2.1. Experimental summary
Twenty-five children with ASD and 22 age-matched TD children were recruited in accordance with the policy of the University's Ethical Review Board. All subjects were right-handed, and written consent for the measurement was signed by the children's parents in advance. All the autistic children involved were diagnosed by experienced clinicians according to the DSM-IV-TR (Diagnostic and Statistical Manual of Mental Disorders, fourth edition, text revision) [American Psychiatric Association, 2000]. Table 1 summarizes the key demographic information of the subjects, including the distribution of the ASD and TD groups by sex and age, and the non-verbal IQ collected using Raven's Standard Progressive Matrices Test [Raven et al.,2003].

Table 1. Demographics summary
Group   Gender       Age (mean)   Age (std)   IQ (mean)   IQ (std)
ASD     G 7, B 18    9.3          1.4         106         12
TD      G 4, B 18    9.5          1.6         91          15
G: Girl, B: Boy. The difference in IQ was significant between the TD and ASD group (p<0.05).
Fig. 1. Schematic representation of the brain showing the location of each measurement. A total of 24 optical channels cover the TL, with 12 channels located in each hemisphere.
Subjects sat quietly with eyes closed on a comfortable chair in a dark room while being scanned by a commercial continuous-wave fNIRS system (FOIRE-3000, Shimadzu Corporation, Kyoto, Japan). The measured parameters were concentration changes in HbO2, Hb and total hemoglobin (HbT), which were converted from changes in optical intensity based on the modified Beer-Lambert law. In contrast to most fNIRS setups, the FOIRE-3000 used in our study does not include the differential pathlength factor (DPF) in the calculation. The FOIRE-3000 is equipped with sixteen fiber sources and sixteen fiber detectors, building up to 52 detecting channels with a fixed source-detector (SD) distance of 3.0 cm. In our study, only the 24 optical channels covering the bilateral TL (Fig.1) were used for the measurement. To identify the locations of the optical probes, the international 10-10 system for electroencephalography (EEG) was referenced with an EEG cap, in which T7 was located between Ch4 and Ch6, and T8 between Ch16 and Ch19. For each subject, approximately 8 minutes of spontaneous cerebral hemodynamic fluctuations from the bilateral TL were recorded with a 70-ms temporal resolution (i.e., 14.3 Hz sampling frequency). Because overlapping information contributes little to better identification, our experiments included only two independent variables (i.e., HbO2 and Hb; HbT = HbO2+Hb), which were preprocessed with centralization and normalization via Z = (S-mean(S))/std(S). This aims to equalize the average fluctuation of the hemodynamic signal S, which might help to reveal structural differences in the temporal variation of brain signals between children with ASD and TD children.
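The normalization Z = (S-mean(S))/std(S) can be illustrated with a minimal Python sketch. It is not the authors' preprocessing code: the function name `zscore`, the toy trace, and the use of the population standard deviation are assumptions made here for illustration, since the text does not say which estimator was used.

```python
import statistics

def zscore(signal):
    """Center a signal and scale it to unit standard deviation: Z = (S - mean(S)) / std(S)."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)  # assumption: population std; the source does not specify
    if std == 0:
        raise ValueError("a constant signal cannot be normalized")
    return [(x - mean) / std for x in signal]

# Hypothetical toy trace standing in for one channel's Hb or HbO2 recording.
toy_trace = [0.12, 0.15, 0.11, 0.20, 0.18, 0.09]
z = zscore(toy_trace)
```

After this step every channel has zero mean and unit spread, so later comparisons reflect the shape of the fluctuation rather than its amplitude.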
2.2. Characterizing the time-varying behavior
To explore the atypical temporal variation in spontaneous hemodynamic fluctuations of autistic children, the global time-varying behavior was measured through the change in first-order statistical properties of the fNIRS time series. This was performed with the Augmented Dickey-Fuller (ADF) test [Liu et al.,2008], implemented with the R functions adfTest and urdfTest in the package fUnitRoots. If the test cannot reject the null hypothesis of a unit root, the hemodynamic fluctuation is regarded as non-stationary in its time-varying behavior.
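To make the unit-root idea concrete, the following Python sketch computes the t-statistic of a plain Dickey-Fuller regression, dy_t = gamma * y_(t-1) + e_t. This is a deliberate simplification and not the fUnitRoots implementation: it has no constant, no trend and no augmentation lags, the function name and toy series are hypothetical, and the threshold is the 1% critical value reported in Table 2, reused here only as an illustrative cut-off.

```python
import math
import random

def dickey_fuller_tstat(series):
    """t-statistic of gamma in the regression dy_t = gamma * y_{t-1} + e_t (no lags, no constant)."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    sxx = sum(v * v for v in lagged)
    gamma = sum(l * d for l, d in zip(lagged, diffs)) / sxx
    residuals = [d - gamma * l for l, d in zip(lagged, diffs)]
    dof = len(diffs) - 1  # one estimated coefficient
    sigma2 = sum(r * r for r in residuals) / dof
    return gamma / math.sqrt(sigma2 / sxx)

CRITICAL_1PCT = -2.5654  # 1% level as listed in Table 2; an illustrative cut-off here

rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(500)]   # stationary toy series
linear_drift = [float(t) for t in range(1, 501)]          # steadily drifting toy series

noise_stat = dickey_fuller_tstat(white_noise)
drift_stat = dickey_fuller_tstat(linear_drift)
noise_is_stationary = noise_stat < CRITICAL_1PCT
drift_is_stationary = drift_stat < CRITICAL_1PCT
```

A strongly negative statistic rejects the unit root (stationary behavior), while a statistic above the critical value does not, which is the decision rule applied to each hemodynamic signal.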
Fig. 2. Two scaled time series (S1 and S2) with different time-varying behavior
Table 2. Augmented Dickey-Fuller test for S1 and S2
                                          S1                       S2
                                          t-Statistic   Prob.*     t-Statistic   Prob.*
Augmented Dickey-Fuller test statistic    -1.0405       0.7408     -3.8305       0.0026
Test critical values:    1% level         -2.5654
                         5% level         -1.9409
                         10% level        -1.6167
Suppose there are two time series of equal length (S1 and S2), as shown in Fig.2. Preliminary observation of the time-path diagrams suggests non-stationary time-varying behavior in S1 and stationary behavior in S2: S1 shows different mean values over different time periods, whereas S2 fluctuates regularly around its mean value. This difference in time-varying behavior between S1 and S2 is confirmed in Table 2, which shows the corresponding ADF test results. The t-statistic of S1 is larger than the critical values at the 1%, 5% and 10% levels, whereas that of S2 is smaller. The smaller the t-statistic is relative to the predefined critical value, the more significant the stationarity of the time-varying behavior. We therefore set the significance level to 0.01 in order to investigate more remarkable temporal patterns contributing to accurate identification. Under this condition, the stationarity of the time-varying behavior is not significant for S1 (p>>0.01), but significant for S2 (p<<0.01).
Based on the ADF test on hemodynamic signals from all subjects, we can estimate the global time-varying behavior of each group at each optical channel, defined as the percentage of significantly stationary samples within the group. At this stage, some potential channels can be identified to guide the discrimination between autistic children and controls. Moreover, an appropriate combination of other manual features with the statistics of time-varying behavior at those channels might be usable by conventional machine learning methods for the desired tasks. In this paper, we instead use a deep learning approach to further study the temporal features of spontaneous hemodynamic fluctuations.
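The group-level score described above (the share of significantly stationary samples within a group at one channel) can be sketched as follows. The function name and the p-values are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
def stationary_fraction(p_values, alpha=0.01):
    """Fraction of samples whose ADF p-value is below alpha (significantly stationary)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return sum(p < alpha for p in p_values) / len(p_values)

# Hypothetical ADF p-values for one optical channel, one value per subject.
td_p_values = [0.001, 0.004, 0.020, 0.003, 0.0005]
asd_p_values = [0.300, 0.008, 0.740, 0.060, 0.002]

td_score = stationary_fraction(td_p_values)    # 4 of 5 below 0.01
asd_score = stationary_fraction(asd_p_values)  # 2 of 5 below 0.01
```

Computing this score per channel and per group gives one value per group and channel, which is the quantity the later channel comparison works with.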
2.3. Deep learning the spontaneous hemodynamic fluctuations
To further explore the potential patterns of temporal variation for ASD identification, we constructed an integrated deep learning model, in which the long-short term memory (LSTM) [Murdoch et al.,2017] is intended to express detailed temporal features and the convolutional neural network (CNN) [Guan et al.,2018] to recognize informative and remarkable patterns. To quantify the contribution of features to the identification task, a total of 2256 hemodynamic signals gathered from fNIRS were split into 48 subsets according to variables and channels. Only 22 ASD and 22 TD subjects were included in the modeling, so that the model could learn the brain activity of both groups in a balanced way. The data of each subset were modeled separately with a 7:3 ratio of training to testing samples, in which the proportion of categories (i.e., ASD and TD) was almost identical. Under random sampling without replacement, one of the subsets was modeled independently to determine the hyper-parameters of the model, and the others were used to evaluate prediction performance. Training was stopped when 200 epochs had been executed or the validation loss had not decreased for 30 epochs. Moreover, a dropout layer applied to the weights for regularization [Dvornek et al.,2017] was introduced to simplify the network so that fewer parameters needed to be trained. Because of the random behavior of the dropout layer in learning and classifying, each training run on the same data may give rise to a different classification. With this in mind, we designed an integration strategy based on an improved bagging algorithm, which combines the effective features of several trained models for accurate identification. As shown in Fig. 3, the architecture of the proposed model involves three main steps: (1) transformation of the input time series into three variants, i.e., primitive matrix PM, Gaussian matrix GM and sampled matrix SM; (2) extraction of remarkable patterns of temporal variation with the combination of LSTM and CNN (LAC); (3) training with supervised learning and classification with the improved bagging algorithm.
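The stopping rule stated above (at most 200 epochs, or no decrease in validation loss for 30 epochs) can be read as the small Python sketch below. It is one interpretation of the rule, not the authors' training loop; in Keras the same behavior is commonly obtained with the EarlyStopping callback, although the text does not say which mechanism was used. The function name and the loss curves are hypothetical.

```python
def stopping_epoch(val_losses, max_epochs=200, patience=30):
    """Return the 1-based epoch at which training stops under the stated rule."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch
    return min(len(val_losses), max_epochs)

# Hypothetical validation-loss curves, for illustration only.
improving = [1.0 / (e + 1) for e in range(300)]                # keeps decreasing
plateau = [1.0 - 0.1 * e for e in range(10)] + [0.5] * 100     # stalls after epoch 10

stop_improving = stopping_epoch(improving)  # limited by the 200-epoch cap
stop_plateau = stopping_epoch(plateau)      # 30 epochs after the last improvement
```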
The details of the three steps are separately described as follows.
Fig. 3. Architecture of the proposed deep learning model
Step 1: Transformation of scaled spontaneous hemodynamic fluctuations
To better express the temporal ordering of the time series, a hyper-parameter 'timestep' is introduced in this transformation stage. Denote S={x1,x2,…,xT} as a hemodynamic signal of length T, and PM, GM, and SM as the three converted inputs. Derived directly from the initial sequence S, the primitive matrix PM has dimension (timestep, T-timestep+1), as shown below. For ease of understanding, each column of the converted matrix can be considered a state element at moment t, i.e., $PM_t \in \mathbb{R}^{\mathrm{timestep}}$, with t=1,2,…,T-timestep+1.

$$
PM = \left[ PM_1, PM_2, \ldots, PM_{T-\mathrm{timestep}+1} \right] =
\begin{bmatrix}
x_1 & x_2 & \cdots & x_{T-\mathrm{timestep}+1} \\
x_2 & x_3 & \cdots & x_{T-\mathrm{timestep}+2} \\
\vdots & \vdots & \ddots & \vdots \\
x_{\mathrm{timestep}} & x_{\mathrm{timestep}+1} & \cdots & x_T
\end{bmatrix}
$$
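The construction of the primitive matrix amounts to sliding a window of length 'timestep' over the signal. A minimal Python sketch, with a hypothetical function name and toy signal, is:

```python
def primitive_matrix(signal, timestep):
    """Build the (timestep x (T - timestep + 1)) matrix whose column t holds x_t .. x_{t+timestep-1}."""
    n_cols = len(signal) - timestep + 1
    if timestep < 1 or n_cols < 1:
        raise ValueError("timestep must be between 1 and len(signal)")
    return [[signal[row + col] for col in range(n_cols)] for row in range(timestep)]

# Toy signal of length T = 5 with timestep = 3 gives a 3 x 3 matrix.
toy_signal = [1, 2, 3, 4, 5]
pm = primitive_matrix(toy_signal, timestep=3)
first_column = [row[0] for row in pm]  # the state element PM_1
```

Each column overlaps its neighbor in all but one entry, so consecutive state elements share most of their history.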
The Gaussian matrix GM and sampled matrix SM follow the same construction as PM but use a differently processed sequence, i.e., noise-added S and down-sampled S, respectively. In detail, the former is generated by adding Gaussian noise U={u1, u2, …, uT} to S, denoted as U+S={u1 + x1, u2 + x2, …, uT + xT}. The latter is reconstructed from S by down-sampling with a sampling rate of r, r=2,3,…. Thus, the processed sequence for producing SM can be expressed as {x1+i*r}, i∈[0, (T-1)/r]. Since the sampled matrix SM is converted from the reconstructed (down-sampled) sequence, its dimension might differ from that of PM and GM.
Step 2: Feature extraction on the temporal variation
Owing to the excellent handling of long-term dependencies in sequences by LSTM, we modeled the above three variants with separate LSTM networks to capture diverse temporal variation in the spontaneous hemodynamic fluctuations. Let ct∈Rq and ht∈Rq represent the cell state and output of an LSTM unit at moment t, where q denotes the number of neurons in the LSTM unit. Taking the primitive matrix PM={PMt} as an example, the update procedure from time t-1 to t can be formulated as follows.
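$$
\begin{aligned}
i_t &= \sigma\left(W_i\,[PM_t;\,h_{t-1}] + b_i\right) && (1)\\
f_t &= \sigma\left(W_f\,[PM_t;\,h_{t-1}] + b_f\right) && (2)\\
o_t &= \sigma\left(W_o\,[PM_t;\,h_{t-1}] + b_o\right) && (3)\\
\tilde{c}_t &= \tanh\left(W_c\,[PM_t;\,h_{t-1}] + b_c\right) && (4)\\
c_t &= i_t \odot \tilde{c}_t + f_t \odot c_{t-1} && (5)\\
h_t &= o_t \odot \tanh\left(c_t\right) && (6)
\end{aligned}
$$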
Here i_t and f_t respectively denote the gates that regulate adding and removing information carried over from the previous step (h_{t-1}) to the current cell state c_t. The final output h_t is determined by the combination of the current cell state c_t and the activated output o_t. The parameters W, b and σ refer to the corresponding weights, biases, and activation function (i.e., tanh), respectively. Thus, a final feature space LPM={ht} is obtained for further analysis. For the Gaussian matrix GM and sampled matrix SM, the corresponding feature representations LGM and LSM are generated through the same operation.
Then, a Convolution1D network was used to encode the temporal features from the LSTM layer, so as to explore potential structural patterns of temporal variation. Specifically, it operates with a series of fixed-size filters, resulting in various feature maps. Note that neurons within the same feature map share weights, which significantly reduces the connections between layers and helps avoid the over-fitting caused by small data. Suppose there are K kernels; the convolution of the matrix LPM with the k-th kernel can be described as follows.

$$
C_k^{PM} = \sigma\left(\mathrm{conv1}\left(W_k, L^{PM}\right) + b_k\right), \quad k = 1, \ldots, K
$$
Here σ denotes the activation function and conv1 refers to the convolution function. The current feature map $C_k^{PM}$ is a feature vector, and the final potential patterns can be represented as a tensor of K feature maps, i.e., $C^{PM} = [C_1^{PM}, C_2^{PM}, \ldots, C_K^{PM}]$. Similarly, the abstracted representations CGM and CSM are generated from LGM and LSM, respectively. Finally, the concatenation Cmerge = CPM⊕CGM⊕CSM was fed into a 'softmax' classifier to predict its class.
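As an illustration of the LSTM update in Eqs. (1)-(6), the sketch below performs the recurrence in plain Python on a toy input. It is not the Keras layer used in the study. Several points are assumptions made here: the gates use the logistic sigmoid, as in the standard LSTM formulation (the text itself names tanh as the activation), and all weights, sizes and helper names (`affine`, `lstm_step`, `run_lstm`, `make_params`) are hypothetical.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def affine(weights, bias, vector):
    """Return W @ vector + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def lstm_step(pm_t, h_prev, c_prev, params):
    """One update from t-1 to t, mirroring Eqs. (1)-(6)."""
    z = list(pm_t) + list(h_prev)  # concatenation [PM_t; h_{t-1}]
    i_t = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], z)]        # Eq. (1)
    f_t = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], z)]        # Eq. (2)
    o_t = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], z)]        # Eq. (3)
    c_tilde = [math.tanh(v) for v in affine(params["W_c"], params["b_c"], z)]  # Eq. (4)
    c_t = [i * g + f * c for i, g, f, c in zip(i_t, c_tilde, f_t, c_prev)]     # Eq. (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                         # Eq. (6)
    return h_t, c_t

def run_lstm(columns, params, q):
    """Feed the columns PM_1, PM_2, ... in order and collect the outputs {h_t}."""
    h, c = [0.0] * q, [0.0] * q
    outputs = []
    for pm_t in columns:
        h, c = lstm_step(pm_t, h, c, params)
        outputs.append(h)
    return outputs

# Hypothetical toy setup: timestep = 3 values per column, q = 2 LSTM neurons.
timestep, q = 3, 2

def make_params(weight):
    """Constant-valued weights and zero biases, purely for illustration."""
    matrix = [[weight] * (timestep + q) for _ in range(q)]
    gates = ("i", "f", "o", "c")
    params = {f"W_{g}": matrix for g in gates}
    params.update({f"b_{g}": [0.0] * q for g in gates})
    return params

toy_columns = [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4], [0.3, 0.4, 0.5]]
features = run_lstm(toy_columns, make_params(0.1), q)       # plays the role of L^PM
zero_features = run_lstm(toy_columns, make_params(0.0), q)  # all-zero weights give zero outputs
```

The collected outputs correspond to the feature space {h_t} that the convolution stage takes as input.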
Step 3: Training and classification with improved bagging algorithm
Based on the improved bagging algorithm, an integration strategy was deployed on the model to make the identification of children with ASD and TD children more accurate and stable. The main idea is that N LACs are trained independently with the same sampled data and hyper-parameters, and the final prediction label' is decided by voting with the mode (i.e., the minority obeys the majority). The process can be formulated as follows.

$$
label' = \mathrm{mode}\left(\mathrm{softmax}\left(C_{merge}^{1}\right),\ \mathrm{softmax}\left(C_{merge}^{2}\right),\ \ldots,\ \mathrm{softmax}\left(C_{merge}^{N}\right)\right)
$$

If more than one category receives the highest vote, the model randomly picks one of them as the final category.
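The voting step can be sketched in a few lines of Python. The function name and the nine votes are hypothetical; the sketch only shows the mode with a random tie-break as described above, and assumes each learner's softmax output has already been reduced to a class label.

```python
import random
from collections import Counter

def majority_vote(predicted_labels, rng=random):
    """Return the most frequent label; ties are broken by a random pick among the leaders."""
    counts = Counter(predicted_labels)
    top = max(counts.values())
    winners = sorted(label for label, n in counts.items() if n == top)
    return winners[0] if len(winners) == 1 else rng.choice(winners)

# Hypothetical predictions from nine independently trained learners for one child.
votes = ["ASD", "TD", "ASD", "ASD", "TD", "ASD", "TD", "ASD", "ASD"]
final_label = majority_vote(votes)  # ASD receives 6 of 9 votes
```

With an odd number of learners and two classes a tie cannot occur, so the random tie-break only matters for an even N or for more than two categories.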
For faster and smoother convergence, the models in this paper were trained with the categorical cross-entropy loss and the Adam optimizer in Keras [Lioutas et al.,2018]. Dropout layers with a rate of 0.2 and batch normalization were employed in the weak learner LAC to improve its generalization capacity. Moreover, the activation functions ReLU and tanh were applied, respectively, to the 32 neurons of the LSTM and to the kernels of size 3 of the Convolution1D layer, with the purpose of avoiding vanishing or exploding gradients. The learning rate was set to 0.001 in the 9 LACs, and the batch size was set to the full size of the training sample. Owing to the independent training of the weak learners, the ensemble diagnosis model (EDM) can effectively reduce the over-fitting caused by small data and the synergistic effect of different features, which helps the generalization and robustness of the model.
3. Experimental Results
3.1. Characterization of time-varying behavior
Fig. 4. Color maps of ADF value matrices. Each number (1-24) on the x-axis denotes an optical channel, and each number (1-22) on the y-axis denotes a subject.
Fig. 5. Distribution of the ADF statistic. Each number (1-24) on the x-axis denotes an optical channel. The y-axis shows the standard deviation (SD) of the ADF statistic for the TD and ASD groups. The larger the SD, the more widespread the distribution of stationarity within the group.
Fig. 4 illustrates the t-statistic of the ADF test for ASD and TD children. Visual observation shows that the values for both hemodynamic variables (i.e., HbO2 and Hb) are larger in the ASD group than in the TD group. This implies that ASD children showed weaker stationarity in the temporal variation of hemodynamic fluctuations than controls. This tendency is embodied more intuitively in the statistical distribution of the t-statistic (Fig. 5). For nearly all optical channels, children in the ASD group showed more consistent time-varying behavior than those in the TD group, especially for HbO2. The theory of stationarity [Liu et al.,2008; Liu et al.,2008] defines stationarity as the internal logicality of a time series, implying a consistent structural relationship between the values of each period and those of previous periods. From the above, we conclude that the hemodynamic fluctuations of TD children are characterized by stronger internal logicality than those of children with ASD. In other words, the bilateral TL of ASD children shows a stronger memory reaction to random shocks than that of TD children. This observation is consistent with studies suggesting "stimulus overselectivity" in social language processing [Liss et al.,2006; López et al.,2008], in which individuals with ASD might tend toward verbatim memory rather than conceptual processing (understanding and remembering each individual detail) globally.
3.2. Identification of informative channels
Based on the ADF values of hemodynamic signals from all subjects, the time-varying behavior of each group at each channel can be estimated as the percentage of significantly stationary samples within the group, as shown in Fig. 6. To explore potential channels implicated in ASD, we measured the difference in global time-varying behavior between the TD and ASD groups at each optical channel, as illustrated in Fig. 7 (a) (c). It was computed by simple subtraction between the scores of global time-varying behavior of the TD and ASD groups, and the sign of the result was not considered.
Fig. 6. Global stationarity for the TD and ASD groups in HbO2 and Hb. The abscissa indicates optical channels, and the ordinate refers to the global stationarity of the group, evaluated as the number of significantly stationary samples / total samples.
Fig. 7. Difference in global stationarity between the TD and ASD groups in signals of HbO2 and Hb. In (a) and (c), the differences (expressed as fractions) are significant at channels marked with an asterisk, and the significance levels are labeled as Ⅰ, Ⅱ and Ⅲ in descending order. Panels (b) and (d) show the ADF t-statistic of spontaneous hemodynamic fluctuations. Blue dots indicate the TD group and yellow triangles denote the ASD group.
Thus, the channels showing a significant difference in global time-varying behavior between groups can be summarized through the following criterion. It was defined with regard to not only a minimum threshold of difference, but also the complete stationarity of the TD or ASD group (i.e., all samples within the group show significant stationarity in time-varying behavior). $P_i^T$ denotes the proportion of samples with significantly stationary time-varying behavior in the TD group at the i-th channel, whereas $P_i^A$ denotes that of the ASD group.

$$
\begin{cases}
\left| P_i^{T} - P_i^{A} \right| \geq 0.05 \\
\left( 1 - P_i^{T} \right)\left( 1 - P_i^{A} \right) = 0
\end{cases}
$$
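The channel-selection criterion can be expressed as a short Python check. This is a sketch under assumptions: the comparison with the 0.05 threshold is taken here as "at least 0.05", the product condition is read as "at least one group is fully stationary", and the function name, tolerance and per-channel proportions are hypothetical.

```python
def is_informative(p_td, p_asd, min_difference=0.05, tol=1e-12):
    """Apply the two-part criterion to the stationary proportions of the TD and ASD groups."""
    differs_enough = abs(p_td - p_asd) >= min_difference  # assumption: threshold is inclusive
    one_group_fully_stationary = abs((1.0 - p_td) * (1.0 - p_asd)) <= tol
    return differs_enough and one_group_fully_stationary

# Hypothetical proportions (TD, ASD) for four channels with 22 subjects per group.
channel_scores = {
    1: (1.0, 1.0),          # no difference
    2: (1.0, 20 / 22),      # TD fully stationary, difference of about 0.091
    3: (21 / 22, 17 / 22),  # large difference, but neither group fully stationary
    4: (1.0, 21 / 22),      # difference of about 0.045, below the threshold
}
informative = [ch for ch, (p_td, p_asd) in channel_scores.items()
               if is_informative(p_td, p_asd)]
```

With 22 subjects per group, a proportion changes in steps of 1/22 (about 0.045), so the 0.05 threshold effectively requires the groups to differ by at least two subjects.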
With the defined filter criteria, analysis of the 24 optical channels demonstrated that the difference in global time-varying behavior between the TD and ASD groups was significant at 5 channels, i.e., channels 3, 6, 8 and 11 in the left hemisphere and channel 22 in the right hemisphere. As shown in Fig. 7 (a) (c), the difference appears in HbO2 for channels 3, 6 and 11, and in Hb for channels 6, 8, 11 and 22. It is worth noting that both HbO2 and Hb at channels 6 and 11 show a significant difference in the global time-varying behavior of the TD and ASD groups. The highlighted channels in the left TL add credence to the view that a core signature of autism is a deficient left anterior temporal cortex response to language [Chi et al.,2014; Eyler et al.,2012]. Furthermore, the significant difference is shown intuitively in Fig. 7(b)(d), in which the points indicate the ADF values of hemodynamic signals. Visual observation shows that ASD children are characterized by larger values (i.e., weaker significance of stationarity) but much more uniform performance than TD children in terms of HbO2 and Hb. Thus, these channels might be useful for investigating potential biomarkers associated with ASD.
3.3. Classification performance
Table 3. Comparison of different feature representations for ASD/TD classification
Source                  Channel   Percentage        Accuracy (%)     Sensitivity (%)   Specificity (%)
                                  Difference (%)    Mean    Std      Mean    Std       Mean    Std
Dvornek et al.,2017     --        --                68.5    --       --      --        --      --
Heinsfeld et al.,2018   --        --                70      --       74      --        63      --
Jain et al.,2018        --        --                70.2    --       --      --        --      --
Hazlett et al.,2017     --        --                81      --       88      --        --      --
Plitt et al.,2015       --        --                69.7    --       --      --        --      --
Abraham et al.,2017     --        --                66.9    2.7      53.2    5.8       78.3    4.1
Li et al.,2016          --        --                87.5    --       81.6    --        94.6    --
HbO2                    3         18.182            95.0    4.82     98.6    4.52      91.4    9.99
HbO2                    11        9.091             93.     5.27     94.3    7.38      92.9    10.10
HbO2                    6         9.091             90.0    9.64     98.6    4.52      81.4    17.88
Hb                      22        13.636            95.7    4.99     97.1    6.02      94.3    9.99
Hb                      8         13.636            95.0    5.88     98.6    4.52      91.4    9.99
Hb                      6         13.636            92.9    4.76     97.1    6.02      88.6    9.04
Hb                      11        9.091             91.4    10.54    98.6    4.52      84.3    21.77
To evaluate the classification performance, three measures, namely sensitivity, specificity and accuracy [Heinsfeld et al.,2018; Abraham et al.,2017], were considered in the present paper. Sensitivity refers to the ratio of children identified as ASD to all children diagnosed with ASD, while specificity is the percentage of children identified as TD among all tested TD children. Accuracy indicates the proportion of correct diagnoses among all predicted labels. Based on 10 runs of the cross-validation procedure, we modeled fNIRS data from different informative channels to demonstrate the effectiveness of the proposed approach. As shown in Table 3, our model performed well even with fNIRS data of only one variable, and better than previous identification methods. Using the HbO2 time series of optical channel 3, which had the most significant difference in global time-varying behavior between the ASD and TD groups, we achieved a specificity of 91.4%, a sensitivity of 98.6% and an accuracy of 95.0%. Similarly, the Hb time series of channel 22 resulted in a more accurate classification, with a sensitivity of 97.1% and a specificity of 94.3%. Compared with the most competitive result using fNIRS data [Li et al.,2016], our accuracy is more than 8% higher. The large standard deviation could be reduced by increasing the sample size. These results indicate that deep learning is very effective for the identification of ASD, and that the temporal variation has the potential to screen children at risk of ASD.
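The three measures defined above can be computed from true and predicted labels as in the following Python sketch. The function name and the eight toy labels are hypothetical, and the sketch assumes that both classes are present in the true labels so that no denominator is zero.

```python
def classification_metrics(true_labels, predicted_labels, positive="ASD"):
    """Sensitivity, specificity and accuracy with ASD treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),        # identified ASD / all diagnosed ASD
        "specificity": tn / (tn + fp),        # identified TD / all tested TD
        "accuracy": (tp + tn) / len(pairs),   # correct labels / all labels
    }

# Hypothetical test set of four ASD and four TD children.
truth = ["ASD"] * 4 + ["TD"] * 4
predicted = ["ASD", "ASD", "ASD", "TD", "TD", "TD", "TD", "ASD"]
metrics = classification_metrics(truth, predicted)
```

With balanced groups, as in the 22 ASD and 22 TD subjects modeled here, accuracy equals the average of sensitivity and specificity.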
4. Discussion
When resting-state data are used to characterize ASD, motion artifact removal is applied in most brain imaging studies [Dvornek et al.,2017; Jain et al.,2018]. A motion artifact generally appears in the brain signal as a sudden drop or rise, which might have a conspicuous impact on functional measures, such as increased short-range connectivity and decreased long-range connectivity [Power et al.,2012]. In contrast to these studies, this paper does not apply artifact removal in the preprocessing of fNIRS data. The reasons for this choice can be summarized in four aspects, as follows.
Fig. 8. Raw time trace of Hb and measured motion-induced artifacts. The green trace is the signal, while the red lines refer to moments at which motion artifacts might be induced.
Firstly, the less motion artifacts are likely to be induced in our data acquisition. It might be attributed to our utilized continuous-wave system (FOIRE-3000), which shows weaker sensitive to motion within a given range than other imaging modalities including fMRI [Wolf et al.,2007; Liu et al.,2019]. We performed analysis of possible motion artifacts on all the involved subjects in our experiment with the reference of density-based anomaly detection methods [Guo et al.,2017]. It demonstrated a score of 0.31% in TD group and 2.76% in ASD group for Hb, and 0.51% in TD group and 3.52% in ASD group for HbO2, which implied a small proportion and non-universality of motion artifacts in the brain activities for either TD or ASD group. When measuring the possible artifacts on each brain signals, we firstly calculated the average and standard deviation of the absolute values of amplitudes across all available points. Under the interval of [xt-δ, xt+δ], the density of point xt is defined as the number of eligible points, whose values belong to the scope of - 13 -
[xt-φ, xt+φ]. Here φ is set to the sum of the measured mean and 3 standard deviations. A motion artifact is taken to be induced at moment t when the point density of xt is not more than δ=50. Figure 8 serves as an example of the measurement.
Secondly, our measurement of time-varying behavior is determined by the global tendency of the temporal variation rather than by individual abnormal fluctuations, unlike previous measurements that are susceptible to motion artifacts. That is, the time-varying behavior characterized with the ADF test mainly reflects the global temporal variation in the first-order statistical properties of time sub-series, which is less sensitive to sudden changes in the signal. As shown by the two normalized time series (S3 and S5) of the same channel in Fig. 9, head movement plausibly occurred, inducing artifacts at around 250 s for S3, and at around 200 s and 265 s for S5. However, the possible motion did not appear to sway the overall trends of the signals, which show a downtrend in S3 and a smooth trend in S5. This was corroborated by the ADF test, which showed non-significant stationarity (p>0.01) in the time-varying behavior of S3 and significant stationarity (p<0.01) in S5.
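The density-based artifact check described under the first reason can be sketched as follows. This is one possible reading of the description, not the authors' implementation: the function name `flag_motion_artifacts`, the parameters `half_window` and `min_density`, the treatment of the neighbourhood as a window of time indices, and the decision to leave edge points unflagged are all assumptions.

```python
from statistics import mean, pstdev


def flag_motion_artifacts(signal, half_window=50, min_density=50):
    """Flag time points whose local density is at most min_density.

    Assumed reading of the text (not confirmed by the source):
    - the neighbourhood of point t is the index window
      [t - half_window, t + half_window];
    - a neighbour is eligible if its value lies in [x_t - phi, x_t + phi];
    - phi = mean(|x|) + 3 * std(|x|), computed over the whole signal.
    Points without a full window (near the edges) are left unflagged.
    """
    abs_vals = [abs(v) for v in signal]
    phi = mean(abs_vals) + 3 * pstdev(abs_vals)
    n = len(signal)
    flags = [False] * n
    for t in range(half_window, n - half_window):
        x_t = signal[t]
        density = sum(
            1
            for s in range(t - half_window, t + half_window + 1)
            if s != t and abs(signal[s] - x_t) <= phi
        )
        flags[t] = density <= min_density
    return flags


# Toy signal for illustration only: flat trace with one sudden rise.
toy = [0.0] * 21
toy[10] = 100.0
toy_flags = flag_motion_artifacts(toy, half_window=5, min_density=2)
print([i for i, flagged in enumerate(toy_flags) if flagged])
```

If the reported scores are the share of flagged points per signal, they would correspond to `sum(flags) / len(flags)`; the text does not state this explicitly.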
Fig. 9. Scaled time series of Hb from the same optical channel (Ch3) for children with TD and ASD. S3 and S4 come from subjects in the TD group, and S5 and S6 from the ASD group.
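The kind of stationarity check applied to series such as S3 and S5 can be illustrated with a simplified Dickey-Fuller regression. The sketch below is not the full ADF test used in the paper: it fits only a constant and the lagged level, with no lagged difference terms, and compares the statistic with the asymptotic 1% critical value for that case. The function names and the synthetic series are assumptions for illustration.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t.

    Simplified Dickey-Fuller regression (constant, no lagged differences);
    the full ADF test adds lagged difference terms to this regression.
    """
    x = series[:-1]                                    # y_{t-1}
    dy = [b - a for a, b in zip(series[:-1], series[1:])]  # y_t - y_{t-1}
    n = len(x)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    sxy = sum((v - x_bar) * (d - dy_bar) for v, d in zip(x, dy))
    gamma = sxy / sxx
    alpha = dy_bar - gamma * x_bar
    rss = sum((d - alpha - gamma * v) ** 2 for v, d in zip(x, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma


# Asymptotic 1% critical value for the constant-only Dickey-Fuller case.
CRITICAL_1PCT = -3.43


def looks_stationary(series, critical=CRITICAL_1PCT):
    """Reject the unit-root null when the statistic is below the critical value."""
    return dickey_fuller_stat(series) < critical


# Synthetic series for illustration only (not fNIRS data).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]  # stationary white noise
walk, level = [], 0.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    walk.append(level)                             # unit-root random walk

print(looks_stationary(noise), dickey_fuller_stat(walk))
```

Because the statistic depends on the overall mean-reverting tendency of the series rather than on individual samples, a few isolated spikes have limited influence on it, which is the property the paragraph above relies on.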
Thirdly, artifacts induced by head movement are random and show no regularity in either autistic children or controls [Li et al.,2016]. The purpose of modeling with a deep learning approach is to explore regular features, so the few, sporadic motion artifacts, which appear in the brain signal as a sudden drop or rise, are normally treated as noise or outliers. They would not be taken as useful features for accurate identification and are often overlooked. A recent study on learning image restoration without clean data [Lehtinen et al.,2018] supports this view. In particular,
analysis of the experimental results shows that our proposed model could make an accurate prediction for the signal with the largest proportion of motion artifacts (39.34%) among all the brain signals. This implies that motion artifacts are unlikely to affect the identification task performed by our combined model. Moreover, we sorted the channels by the amount of motion artifacts they contain and compared the classification performance. Linear regression analysis shows that the amount of motion artifacts has no significant effect on classification accuracy (p=0.11>0.05).
Finally, motion artifact removal (e.g., via global signal regression [Power et al.,2014] and ICA-based nuisance removal [Tyszka et al.,2014]) might result in some loss of information. To investigate more atypical features of brain activity for aiding the diagnosis of ASD, our fNIRS data were processed with normalization only, without other procedures such as motion artifact removal.
Compared with previous deep learning methods for identifying ASD [Dvornek et al.,2017; Heinsfeld et al.,2018; Jain et al.,2018], the framework proposed in this study emphasizes two innovations. One is the combination of LSTM and CNN; the other is an integration strategy based on an improved bagging method. Unlike modeling with a single network, the combined network could provide more salient patterns for the identification task. A recent study with a combined framework has demonstrated successful detection of the atypical prosody and stereotyped idiosyncratic phrases associated with ASD. Although the combination of CNN and LSTM is not new, this is its first use to classify individuals with ASD and TD based on fNIRS data. It could offer a new way of thinking about characterizing the pathophysiology of ASD.
Supervised deep learning aims to learn a stable and balanced model, but sometimes yields only multiple models with individual biases. In the present study, an integration strategy with an improved bagging method was proposed to obtain a comprehensive and stable model with strong supervision. It can be interpreted as a special kind of ensemble learning [Lioutas et al.,2018], which aims to minimize the influence of random factors by combining multiple weakly supervised models. That is, if one weak classifier gives an incorrect prediction, the other weak classifiers are able to correct the error. This property makes our deep learning framework applicable to other detection tasks for ASD and related speech impairments.
A limitation of this study is the small number of subjects. Though a rather accurate classification was achieved on the current sample, the high accuracy might not hold if more subjects with ASD and TD are included. Increasing the number of subjects or augmenting samples in the future may overcome this problem. Another limitation could arise from the characterization of time-varying behavior, in which motion artifacts have negligible effects on the global temporal variation of fNIRS data. This property might be attributable to our fNIRS data acquisition system (FOIRE-3000) and the recruited subjects, so the measurement of time-varying behavior might be less portable to other imaging data. One possible solution is to uncover more statistical properties, so as to ensure that the characterized temporal variation is indicative of cerebral activity.
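The error-correcting property of the ensemble discussed in this section can be illustrated with plain bagging and majority voting. The sketch below shows standard bagging only; it does not reproduce the improved bagging strategy of this study, whose details are given elsewhere in the paper. The function names and the toy predictions of three hypothetical weak classifiers are assumptions for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (the resampling step of bagging)."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Combine per-model labels for one sample; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions of three weak classifiers on four samples
# (1 = ASD, 0 = TD); each of model_b and model_c makes one mistake.
truth = [1, 0, 1, 0]
model_a = [1, 0, 1, 0]
model_b = [1, 1, 1, 0]  # wrong on the second sample
model_c = [0, 0, 1, 0]  # wrong on the first sample

ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(ensemble)

# Each weak model would be trained on its own bootstrap resample, e.g.:
resample = bootstrap_sample(list(range(10)), random.Random(0))
```

In this toy case the ensemble recovers the true labels because the individual errors fall on different samples, so each is outvoted by the other two models.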
5. Conclusion
Temporal variation in spontaneous hemodynamic fluctuations was measured to characterize ASD in this paper. The global time-varying behavior was evaluated by measuring the change in the first-order statistical property of fNIRS time series. The results showed that, compared with TD children, ASD children exhibited weaker internal logic, but stronger memory and persistence upon random shocks, in the hemodynamic fluctuations of HbO2 and Hb. Moreover, the analysis of 24 channels covering the bilateral TL highlighted some channels in which the global time-varying behavior of hemodynamic fluctuations differed significantly between the TD and ASD groups. This might be useful for producing potential biomarkers associated with ASD and for screening children with ASD. To investigate potential functional patterns of temporal variation in the brain activity of ASD, we constructed a combined model of the recurrent neural network LSTM with a CNN, based on the integration strategy with an improved bagging algorithm. In detail, the LSTM aimed to represent temporal features in a hierarchical manner, and the CNN tended to explore salient patterns of time-varying behavior for accurate classification. Using fNIRS data of a single informative channel, highly accurate classification of ASD was achieved, with a sensitivity of 97.1% and a specificity of 94.3%. To the best of our knowledge, this result has not previously been reported in the literature. In the future, we will pay more attention to the interpretation of the well-trained model and to information fusion of fNIRS data from multiple channels and variables, with the purpose of identifying children with ASD.
Authors contribution
Lingyu Xu and Yaya Liu conceived and designed the study. Yaya Liu and Jun Li performed the experiments. Jun Li and Huiyi Cheng provided the experimental data. Yaya Liu and Lingyu Xu wrote the paper. Jie Yu, Xinjuan Li and Xuan Yu reviewed and edited the manuscript. All authors read and approved the manuscript.
Declarations of interest: none
Acknowledgment and Funding
National Program on Key Research Project (Grant No. 2016YFC1401900); National Natural Science Foundation of China (Grant No. 81771876); the Guangdong Provincial Key Laboratory of Optical Information Materials and Technology (Grant No. 2017B030301007); Guangdong Science and Technology Program (Grant No. 2017A010101023); the Innovation Project of Graduate School of South China Normal University.
References
Abraham, A., Milham, M. P., Di Martino, A., Craddock, R. C., Samaras, D., Thirion, B., & Varoquaux, G. (2017). Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example. NeuroImage, 147(October 2016), 736–745. https://doi.org/10.1016/j.neuroimage.2016.10.045
American Psychiatric Association, & American Psychiatric Association Task Force on DSM-IV. (2000). Diagnostic and statistical manual of mental disorders: DSM-IV-TR. American Psychiatric Association.
Boddaert, N., Chabane, N., Belin, P., Bourgeois, M., Royer, V., Barthelemy, C., … Zilbovicius, M. (2004). Perception of Complex Sounds in Autism: Abnormal Auditory Cortical Processing in Children. American Journal of Psychiatry, 161(11), 2117–2120. https://doi.org/10.1176/appi.ajp.161.11.2117
Cardinale, R. C., Shih, P., Fishman, I., Ford, L. M., & Müller, R.-A. (2013). Pervasive Rightward Asymmetry Shifts of Functional Networks in Autism Spectrum Disorder. JAMA Psychiatry, 70(9), 975. https://doi.org/10.1001/jamapsychiatry.2013.382
Chi, R. P., & Snyder, A. W. (2014). Treating autism by targeting the temporal lobes. Medical Hypotheses, 83(5), 614–618. https://doi.org/10.1016/J.MEHY.2014.08.002
Courchesne, E., Campbell, K., & Solso, S. (2011). Brain growth across the life span in autism: Age-specific changes in anatomical pathology. Brain Research, 1380, 138–145. https://doi.org/10.1016/J.BRAINRES.2010.09.101
Dvornek, N. C., Ventola, P., Pelphrey, K. A., & Duncan, J. S. (2017). Identifying Autism from Resting-State fMRI Using Long Short-Term Memory Networks. 362–370. https://doi.org/10.1007/978-3-319-67389-9_42
Ecker, C., Shahidiani, A., Feng, Y., Daly, E., Murphy, C., D’Almeida, V., … Murphy, D. G. M. (2014). The effect of age, diagnosis, and their interaction on vertex-based measures of cortical thickness and surface area in autism spectrum disorder. Journal of Neural Transmission, 121(9), 1157–1170. https://doi.org/10.1007/s00702-014-1207-1
Eyler, L. T., Pierce, K., & Courchesne, E. (2012). A failure of left temporal cortex to specialize for language is an early emerging and fundamental property of autism. Brain, 135(3), 949–960. https://doi.org/10.1093/brain/awr364
Gage, N. M., Juranek, J., Filipek, P. A., Osann, K., Flodman, P., Isenberg, A. L., & Spence, M. A. (2009). Rightward hemispheric asymmetries in auditory language cortex in children with autistic disorder: an MRI investigation. Journal of Neurodevelopmental Disorders, 1(3), 205–214. https://doi.org/10.1007/s11689-009-9010-2
Gendry Meresse, I., Zilbovicius, M., Boddaert, N., Robel, L., Philippe, A., Sfaello, I., … Chabane, N. (2005). Autism severity and temporal lobe functional abnormalities. Annals of Neurology, 58(3), 466–469. https://doi.org/10.1002/ana.20597
Goldberg, M. C., Spinelli, S., Joel, S., Pekar, J. J., Denckla, M. B., & Mostofsky, S. H. (2011). Children with high functioning autism show increased prefrontal and temporal cortex activity during error monitoring. Developmental Cognitive Neuroscience, 1(1), 47–56. https://doi.org/10.1016/J.DCN.2010.07.002
Gomot, M., Bernard, F. A., Davis, M. H., Belmonte, M. K., Ashwin, C., Bullmore, E. T., & Baron-Cohen, S. (2006). Change detection in children with autism: An auditory event-related fMRI study. NeuroImage, 29(2), 475–484. https://doi.org/10.1016/J.NEUROIMAGE.2005.07.027
Guan, Q., Huang, Y., Zhong, Z., Zheng, Z., Zheng, L., & Yang, Y. (2018). Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification. arXiv preprint arXiv:1801.09927.
Guo, F. P., & Hui, H. X. (2017). Anomaly detection algorithm based on the local distance of density-based sampling data. Journal of Software. https://doi.org/10.13328/j.cnki.jos.005134
Ha, S., Sohn, I.-J., Kim, N., Sim, H. J., & Cheon, K.-A. (2015). Characteristics of Brains in Autism Spectrum Disorder: Structure, Function and Connectivity across the Lifespan. Experimental Neurobiology, 24(4), 273–284. https://doi.org/10.5607/en.2015.24.4.273
Hazlett, H. C., Gu, H., Munsell, B. C., … (2017). Early brain development in infants at high risk for autism spectrum disorder. Nature, 542(7641), 348–351. https://doi.org/10.1038/nature21369
Heinsfeld, A. S., Franco, A. R., Craddock, R. C., Buchweitz, A., & Meneguzzi, F. (2018). Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage: Clinical, 17(August 2017), 16–23. https://doi.org/10.1016/j.nicl.2017.08.017
Jain, S. M. (2018). Detection of Autism using Magnetic Resonance Imaging data and Graph Convolutional Neural Networks. Master's dissertation, Rochester Institute of Technology.
Jou, R. J., Minshew, N. J., Keshavan, M. S., Vitale, M. P., & Hardan, A. Y. (2010). Enlarged right superior temporal gyrus in children and adolescents with autism. Brain Research, 1360, 205–212. https://doi.org/10.1016/j.brainres.2010.09.005
Lehtinen, J., Munkberg, J., Hasselgren, J., Laine, S., Karras, T., Aittala, M., & Aila, T. (2018). Noise2Noise: Learning image restoration without clean data. 35th International Conference on Machine Learning, 7, 4620–4631.
Li, J., Qiu, L., Xu, L., Pedapati, E. V., Erickson, C. A., & Sunar, U. (2016). Characterization of autism spectrum disorder with spontaneous hemodynamic activity. Biomedical Optics Express, 7(10), 3871. https://doi.org/10.1364/boe.7.003871
Li, M., Tang, D., Zeng, J., Zhou, T., Zhu, H., Chen, B., & Zou, X. (2019). An automated assessment framework for atypical prosody and stereotyped idiosyncratic phrases related to autism spectrum disorder. Computer Speech & Language, 56, 80–94. https://doi.org/10.1016/J.CSL.2018.11.002
Lindell, A. K., & Hudry, K. (2013). Atypicalities in Cortical Structure, Handedness, and Functional Lateralization for Language in Autism Spectrum Disorders. Neuropsychology Review, 23(3), 257–270. https://doi.org/10.1007/s11065-013-9234-5
Lioutas, V., Passalis, N., & Tefas, A. (2018). Explicit ensemble attention learning for improving visual question answering. Pattern Recognition Letters, 111, 51–57. https://doi.org/10.1016/j.patrec.2018.04.031
Liss, M., Saulnier, C., Fein, D., & Kinsbourne, M. (2006). Sensory and attention abnormalities in autistic spectrum disorders. Autism, 10(2), 155–172. https://doi.org/10.1177/1362361306062021
Liu, H. Z., & Li, C. H. (2008). Research into the ADF and PP Methods in Asymmetric Unit Root Test. Forecasting, (6), 13. https://doi.org/10.3724/SP.J.1005.2008.01003
Liu, T., Liu, X., Yi, L., Zhu, C., Markey, P. S., & Pelowski, M. (2019). Assessing autism at its social and developmental roots: A review of Autism Spectrum Disorder studies using functional near-infrared spectroscopy. NeuroImage, 185, 955–967. https://doi.org/10.1016/j.neuroimage.2017.09.044
López, B., Leekam, S. R., & Arts, G. R. J. (2008). How central is central coherence? Autism, 12(2), 159–171. https://doi.org/10.1177/1362361307086662
Lord, C., Rutter, M., Goode, S., Heemsbergen, J., Jordan, H., Mawhood, L., & Schopler, E. (1989). Autism diagnostic observation schedule: A standardized observation of communicative and social behavior. Journal of Autism and Developmental Disorders, 19(2), 185–212. https://doi.org/10.1007/BF02211841
Mohammadian Rad, N., Kia, S. M., Zarbo, C., van Laarhoven, T., Jurman, G., Venuti, P., … Furlanello, C. (2018). Deep learning for automatic stereotypical motor movement detection using wearable sensors in autism spectrum disorders. Signal Processing, 144, 180–191. https://doi.org/10.1016/j.sigpro.2017.10.011
Murdoch, W. J., & Szlam, A. (2017). Automatic rule extraction from long short term memory networks. arXiv preprint arXiv:1702.02540.
Nordahl, C. W., Lange, N., Li, D. D., Barnett, L. A., Lee, A., Buonocore, M. H., … Amaral, D. G. (2011). Brain enlargement is associated with regression in preschool-age boys with autism spectrum disorders. Proceedings of the National Academy of Sciences of the United States of America, 108(50), 20195–20200. https://doi.org/10.1073/pnas.1107560108
Plitt, M., Barnes, K. A., & Martin, A. (2015). Functional connectivity classification of autism identifies highly predictive brain features but falls short of biomarker standards. NeuroImage: Clinical, 7, 359–366. https://doi.org/10.1016/J.NICL.2014.12.013
Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2012). Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage, 59(3), 2142–2154. https://doi.org/10.1016/J.NEUROIMAGE.2011.10.018
Power, J. D., Mitra, A., Laumann, T. O., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2014). Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage, 84, 320–341. https://doi.org/10.1016/J.NEUROIMAGE.2013.08.048
Raven, J., Court, J. H., & Raven, J. C. (2003). Standard Progressive Matrices (including the Parallel and Plus Version); with Norms for the SPM Plus and Formulae for Calculating Change Scores. Pearson.
Salmond, C. H., de Haan, M., Friston, K. J., Gadian, D. G., & Vargha-Khadem, F. (2003). Investigating individual differences in brain abnormalities in autism. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 358(1430), 405–413. https://doi.org/10.1098/rstb.2002.1210
Tyszka, J. M., Kennedy, D. P., Paul, L. K., & Adolphs, R. (2014). Largely Typical Patterns of Resting-State Functional Connectivity in High-Functioning Adults with Autism. Cerebral Cortex, 24(7), 1894–1905. https://doi.org/10.1093/cercor/bht040
Waiter, G. D., Williams, J. H., Murray, A. D., Gilchrist, A., Perrett, D. I., & Whiten, A. (2004). A voxel-based investigation of brain structure in male adolescents with autistic spectrum disorder. NeuroImage, 22(2), 619–625. https://doi.org/10.1016/j.neuroimage.2004.02.029
Wolf, M., Ferrari, M., & Quaresima, V. (2007). Progress of near-infrared spectroscopy and topography for brain and muscle clinical applications. Journal of Biomedical Optics, 12(6), 062104. https://doi.org/10.1117/1.2804899
Zielinski, B. A., Prigge, M. B. D., Nielsen, J. A., Froehlich, A. L., Abildskov, T. J., Anderson, J. S., … Lainhart, J. E. (2014). Longitudinal changes in cortical thickness in autism and typical development. Brain: A Journal of Neurology, 137(Pt 6), 1799–1812. https://doi.org/10.1093/brain/awu083