Outlier detection based on the neural network for tensor estimation

Outlier detection based on the neural network for tensor estimation

Biomedical Signal Processing and Control 13 (2014) 148–156 Contents lists available at ScienceDirect Biomedical Signal Processing and Control journa...

2MB Sizes 0 Downloads 124 Views

Biomedical Signal Processing and Control 13 (2014) 148–156

Contents lists available at ScienceDirect

Biomedical Signal Processing and Control journal homepage: www.elsevier.com/locate/bspc

Outlier detection based on the neural network for tensor estimation Xiaoying Zhang, Yangde Zhang ∗ National Hepatobiliary and Enteric Surgery Research Center, Ministry of Health, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, PR China

a r t i c l e

i n f o

Article history: Received 13 January 2014 Received in revised form 20 March 2014 Accepted 17 April 2014 Keywords: Diffusion weighted imaging (DWI) Tensor estimation Forward neural network (FNN) Outlier Motion artifact

a b s t r a c t Rationale and objectives: Diffusion weighted imaging (DWI) is always influenced by both thermal noise and spatially/temporally varying artifacts such as subject motion and cardiac pulsation. Motion artifacts are particularly prevalent, especially when scanning an uncooperative population with several disorders. Some motion between acquisitions can be corrected by co-registration approaches. However, automated and accurate motion outlier detection of brain DWIs is an integral component of the analysis and interpretation of tensor estimation. Many different and innovative methods have been proposed to improve upon this technology. In this study, we proposed a classifier work frame, which can classify DWIs as normal images or motion artifacts. Materials and methods: The procedure contains the following stages: first, we used the wavelet transform to extract features from the original DWIs; second, the principle component analysis was used to reduce the features; third, the forward neural network (FNN) was employed to construct the classifier; fourth, a Rossler-based chaotic particle swarm optimization method was proposed to train the FNN; fifth, the cost matrix was determined as the false negative (FN) which was 10 times larger than the false position (FP); and finally, the K-fold cross validation was chosen to avoid overfitting. We applied this method on 60 DWI datasets, including 50 training datasets and 10 test datasets. Results: The experimental results based on our DWI database showed that the proposed method can effectively extract the global feature from images and achieve better performance in tensor estimation by automatic unvoxelwise outlier rejection compared with manual and visual inspection, and previous voxelwise outlier rejection methods. We found that the motion artifact detection accuracy on both the training and test datasets was over 95.8%, while the computation time per DWI slice was only 0.0149 s. Conclusion: The proposed method could potentially remove the influence of unexpected motion artifacts in DWI acquisitions and should be applicable to other magnetic resonance imaging. © 2014 Elsevier Ltd. All rights reserved.

1. Introduction Diffusion tensor imaging (DTI) has become an important magnetic resonance imaging (MRI) procedure for imaging tissues with micro fabric structures in vivo, especially to investigate the integrity of brain white matter, and provide critical information for clinical diagnosis and biomedical research [1–3]. Currently, clinical research in the DTI field is undergoing a rapid expansion to depict white matter connectivity and microstructural features of human brain tissues that weave these sites together into a system with spatially interacting neural elements [4], and it is increasingly applied to brain studies of normal development, aging and pathological changes from various diseases [5]. In particular, diffusion tensor maps are typically computed by fitting the signal intensities

∗ Corresponding author. Tel.: +86 731 4327987; fax: +86 731 4327987. E-mail address: [email protected] (Y. Zhang). http://dx.doi.org/10.1016/j.bspc.2014.04.005 1746-8094/© 2014 Elsevier Ltd. All rights reserved.

from diffusion weighted images as a function of their corresponding b-matrices according to the multivariate least squares regression modal proposed by Basser et al. [1]. However, the signal in diffusion weighted imaging (DWI) is influenced not only by thermal noise, but also by spatially and temporally varying artifacts. All these factors might result in a very low signal-to-noise ratio (SNR) in the DWI dataset and lead to substantial influence on accuracy of the artificial tensor estimation [6,7]. Motion between acquisitions can be corrected by image registration approaches, but individually damaged images are generally uncorrectable, especially in scans of an uncooperative population [8,9]. Therefore, it is important to detect and remove corrupted images and outliers. Currently, most DTI quality control procedures are conducted manually by visually checking the DWI dataset slice by slice (Liu et al., 2010). In clinical practice, if images were corrupted by severe artifacts, they should be removed or be weighted less in subsequent tensor calculation. In theory, we only need six DWIs to calculate a tensor. However, in order to enhance the SNR of DTI, typically 32–128

X. Zhang, Y. Zhang / Biomedical Signal Processing and Control 13 (2014) 148–156

measurements are obtained to improve the robustness of the estimated tensor [10]. Hence, we can usually afford to reject a small subset of the dataset to get tensor estimation and to also not scarify the SNR significantly. Recently, investigators have also developed various algorithms to address this issue by automatically identifying outliers for essential motion correction in DTI tensor estimation. However, the results often suffer from low consistency across different datasets, lack of agreement between different researchers, and difficulty judging motion artifacts by qualitative inspection. The challenge is to identify and reject such images with motion artifacts from the redundant source images in DWIs quickly, accurately and automatically. The traditional tensor estimation method does not work well, although the DTI community is aware of the detrimental effects of the outlier, while some robust tensor estimation methods could identify potential outliers with low weight, in which the computation cost of the iteration procedure is enormous [11,12]. In this study, an automatic algorithm was proposed to improve the accuracy, sensitivity and robustness of the tensor estimation by targeting the most common artifacts in DWIs. The proposed method performs a standard pipeline of processing typical DTI data prior to tensor estimation. The procedure consists of three stages: feature extraction, feature dimension reduction, and forward neural network (FNN)-based classification. The wavelet transform is an effective tool for feature extraction because it allows for the analysis of images at various levels of resolution [13,14]. In order to further reduce the feature vector dimensions and increase the discriminative power, the principal component analysis (PCA) method was employed [15]. Subsequently, the FNN was chosen as the classifier because it is a powerful tool among supervised classifiers and it can classify nonlinear separable patterns and approximate an arbitrary continuous function. Recently, particle swarm optimization (PSO) algorithm is available to train the FNN. PSO is well-known for its lower computational cost, low memory store, and robust convergence to global minima [16]. In order to improve the performance of PSO, we used a chaotic PSO (CPSO) method [17,18] to find the optimal parameters for FNN. The performance of the CPSO method has been compared with that of adaptive back-propagation (BP), adaptive genetic algorithm (GA) and canonical PSO methods. The method has been tested on 60 adult DWI datasets, which were manually inspected for non-recovery datasets. We specifically investigated the ability of the proposed method to provide reliable and rapid tensor estimation in the presence of artifacts resulting from motion artifacts. The experimental results showed that the proposed method could extract the useful features from brain DWIs effectively and achieve good performance in outlier detection. The potential motion outliers could then be successfully excluded from the subsequent tensor estimation procedure. This article is organized into the following sections. Section 2 presents an optimized tensor estimation model based on automatic outlier rejection using the FNN method, including a wavelet transform-based feature extraction and a PCA technique-based feature dimension reduction, and a CPSO method to eliminate the adverse effects in DWIs. In Section 3, experimental results on real DWI datasets are presented to investigate the ability of the algorithm to provide reliable tensor estimation. The final section is devoted to the discussion and conclusions.

2. Methods 2.1. Feature extraction The wavelet transform (WT) is applicable to various fields especially in image classification [19,20]. Discrete wavelet transform (DWT) is a powerful implementation of the wavelet transform. The

149

Fig. 1. Schematic diagram of 2D DWT.

basic fundamental principle of DWT is introduced as follows: suppose x(t) is a square-integrable function, then the continuous WT of x(t) relative to a given wavelet (t) is defined as





W (a, b) =

x(t)

a,b (t)dt

(1)

−∞

where a,b (t)

t − a

1 = √ a

b

(2)

Here, wavelet a,b (t) is calculated from the mother wavelet (t) by translation and dilation: a is the dilation factor and b is the translation parameter (both real positive numbers). Eq. (1) can be discretized by restraining a and b to a discrete lattice (a = 2b and a > 0) to give the DWT, which can be expressed as follows: caj,k (n) = DS[



cdj,k (n) = DS[

n

x(n)gj∗ (n − 2j k)]

n

x(n)h∗j (n − 2j k)]



(3)

Here, coefficients caj,k and cdj,k refer to the approximation components and the detail components respectively. The functions g(n) and h(n) denote the low-pass filter and high-pass filter respectively. The subscripts j and k represent the wavelet scale and translation factors respectively. In applying this technique to DWIs, the DWT is applied separately in each dimension. Fig. 1 illustrates the schematic diagram of 2D DWT. As a result, there are 4 sub-band (LL, LH, HH, HL) images on each scale. The sub-band LL is used for the next 2D DWT. The LL sub-band can be regarded as the approximation component of the image, while the LH, HL, and HH sub-bands can be regarded as the detailed components of the image. As the level of the decomposition increased, we obtained more compact yet coarser approximation components. Thus, wavelets provide a simple hierarchical framework for interpreting the image information. Feature extraction with Level-1 and level-2 decompositions is still huge for further processing. In our algorithm, level-3 decomposition via the Harr wavelet was utilized to the extract features. 2.2. Principal component analysis (PCA) Classification methods often begin with the dimension reduction procedures where data are approximated by points in a lower-dimensional space. The wavelet coefficients obtained from DWT are considered to be the input features, but the dimensionality is too large for computation. There are two general approaches for performing dimensionality reduction: feature extraction and feature selection. Feature extraction transforms the existing features into a lower dimensional space, while feature selection selects a subset of the existing features without a transformation. In this paper we chose the feature extraction and dimension reduction method since our goal pertains to the final classification accuracy rather than the physical features.

150

X. Zhang, Y. Zhang / Biomedical Signal Processing and Control 13 (2014) 148–156

Fig. 2. A two-layer FNN diagram.

The dimensionality of wavelet coefficients obtained from DWT is too large to class computation. In this paper we chose PCA to further reduce the dimensionality. PCA is an efficient tool to reduce the dimension of a data set consisting of a large number of interrelated variables, while retaining most of the variations. PCA describes the space of the original data projecting onto the space in a base of orthogonal eigenvectors. The corresponding eigenvalues account for the energy of the process in the eigenvector directions. Then, these components are ordered based on their eigenvalues that the largest come first. As these resulting components are ordered based on their eigenvalues it could eliminate those components contributing the least to the variation in the data set. 2.3. Improved PSO-based neural network Neural networks (NNs) are widely used in pattern classification since they do not need any information about the probability distribution and a priori probabilities of different classes. A single, hidden layer back-propagation neural network is adopted with sigmoid neurons in the hidden layer, and a linear neuron in the output layer. The training vectors were presented to the NN, which were trained in batch mode. The network configuration is NI × NH × NO. This consists of a two-layer network with NI input neurons, NH neurons in the hidden layer, and NO output indicating that the DWI is normal or with motion artifacts. The structure is depicted in Fig. 2. In this study, since there were only 2 classes (DWIs with or without motion artifacts), only one output neuron was sufficient. The number 1 was denoted for normal DWI and 0 for motion artifacts. The setting of the other two parameters, NI and NH, will be discussed in Section 3. 2.3.1. Cost matrix If a normal image was classified as an artifact, the final DTI can be generated from other channels (larger than 6). Alternatively, if an artifact was classified as a normal one, the final DTI will be impaired. Therefore, we chose the cost matrix as Eq. (4), where the cost of misclassifying artifacts into normal images was 10 times higher than the cost of misclassifying the normal images into artifacts. We defined the positive class as the normal DWIs and the negative class as the motion artifacts. Then, the true position (TP), false position (FP), false negative (FN), and true negative (TN) denoted the correct positive classification, misclassify positive into negative, misclassify negative into positive, and the correct negative classification respectively. Normal (O)

Artifact (O)

Normal (T)

0 (TP)

1 (FP)

Artifact (T)

10 (FN)

0 (TN)

(4)

Fig. 3. Flow chart of the PSO algorithm.

2.3.2. Training method The traditional training method of NN can easily be trapped into the local minima, and the training procedures take a long time. In this study, we chose PSO to find the optimal parameters of the neural network. PSO is a population-based stochastic optimization technique that simulates the social behavior of a swarm of bird, flocking bees, and fish schooling. By randomly initializing the algorithm with candidate solutions, the PSO successfully leads to a global optimum. This is achieved by an iterative procedure based on the processes of movement and intelligence in an evolutionary system. Fig. 3 shows the flow chart of a PSO algorithm. In PSO, each potential solution is represented as a particle. Two properties (position x and velocity v) are associated with each particle. Suppose x and v of the ith particle are given as x = (xi1 , xi2 , · · ·, xiN )

(5)

v = (vi1 , vi2 , · · ·, viN )

(6)

where N stands for the dimensions of the problem. In each iteration step, a fitness function is evaluated for all the particles in the swarm. The velocity of each particle is updated by keeping track of the two best positions. One is the best position a particle has traversed so far and called “pBest.” The other is the best position that any neighbor of a particle has traversed so far. It is a neighborhood best called “nBest.” Hence, a particle’s velocity and position are updated as follows:

v = ω · v + c1 r1 (pBest − x) + c2 r2 (nBest − x)

(7)

x = x + vt

(8)

where ω is called the “inertia weight” that controls the impact of the previous velocity of the particle on its current one. The parameters c1 and c2 are positive constants, called “acceleration coefficients.” The parameters r1 and r2 are random numbers that are uniformly distributed in the interval [0,1]. These random numbers are updated each time they occur. The parameter t stands for the given time-step. The population of particles is then moved according to (7) and (8), and tends to cluster together from different directions. The PSO algorithm runs through these iterative processes until the termination criterion is satisfied. 2.3.3. Chaotic PSO The parameters (r1 , r2 ) were generated by pseudo-random number generators (RNG) in classical PSO. The RNG cannot ensure

X. Zhang, Y. Zhang / Biomedical Signal Processing and Control 13 (2014) 148–156

151

analysis on one subset, and validating the analysis on the other subsets [22]. To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds [23]. In general, cross validation contains three types: random subsampling, K-fold cross validation, and leave-one-out validation. We applied the K-fold cross validation because it is simple, easy to execute, and uses all of the data for training and validation [24]. The mechanism is to create a K-fold partition of the whole dataset, repeat K times to use K-1 folds for training with a left fold for validation, and finally average the error rates of K experiments. In this study, we chose K = 10. 2.4. All procedures

Fig. 4. A Rossler chaotic number generator with a = 0.2; b = 0.4; and c = 5.7.

the optimization’s ergodicity in solution space because they are absolutely random; therefore, we employed the Rossler chaotic operator [21] to generate parameters (r1 , r2 ). The Rossler equations are as follows:

⎧ dx ⎪ = −(y + z) ⎪ ⎪ dt ⎪ ⎪ ⎨ dy = x + ay dt

⎪ ⎪ ⎪ ⎪ ⎪ ⎩ dz dt

The detailed flowchart of our proposed algorithm is shown in Fig. 5, showing the relationship between different modules. The procedure contains the following stages: (i) the WT was used to extract features from the original DWT images. (ii) PCA was used to reduce the features. (iii) The FNN was employed to construct the classifier. (iv) A Rossler-based chaotic PSO method was proposed to train the FNN. (v) The cost matrix was determined when the FN was 10 times larger than FP. (vi) The K-fold cross validation was chosen to avoid overfitting. 3. Analysis of human DWI datasets

(9) 3.1. Experimental material

= b + xz − cz

Here a, b, and c are parameters. In this study, we chose a = 0.2, b = 0.4, and c = 5.7. The results are shown in Fig. 4, where the line in the 3D space exhibits a strong chaotic property called “spiral chaos.” The dynamic properties of x(t) and y(t) satisfy both ergodicity and randomness. Therefore, we let r1 = x(t) and r2 = y(t) to embed the chaotic operator into the canonical PSO method.

Sixty adult subjects were recruited with written, informed parental consent and in compliance with hospital ethics as a control part of a longitudinal MRI study on brain imaging. DTI was performed as part of the brain imaging sequences. The research protocol for the human study was approved by the Central South University’s Institutional Review Board. 3.2. Imaging protocol

2.3.4. Cross validation In order to prevent overfitting, we employed the cross validation method. It is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used to estimate how accurately a predictive model will perform in practice. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the

MR images were collected with a 3.0 Tesla MR scanner system at Xiangya Hospital. This unit is equipped with a high strength and high-speed gradient system capable of conducting DWI. For DWI acquisitions, a single-shot spin echo, echo planar imaging (SE–EPI) sequence was used, with diffusion gradients applied in 15 non-collinear spatial directions and 3 baseline images with

Fig. 5. Flowchart of our proposed algorithm.

152

X. Zhang, Y. Zhang / Biomedical Signal Processing and Control 13 (2014) 148–156

Table 1 Setting of training and test images.

Normal DWIs Motion artifacts Total

Training subset

Test subset

48,953 1447 50,400

9632 448 10,080

b = 1000 s/mm2 for each subject. The thickness of each slice was 2.5 mm without gap, and 60 axial slices (sized 256 × 256 voxel matrix) were assigned to cover the entire brain, resulting in a 0.94 mm × 0.94 mm in-plane resolution. The sequence parameters for DWI were: TE = minimal, which was 70 ms; TR = 17,000 ms; FOV = 24 cm × 24 cm and the acquisition data matrix = 132 × 128. The total DWI scanning time was 9 min 57 s with NEX = 2 to avoid excessive motion of the infant subjects. In addition, routine clinical MRI scans (T1, T2, and fluid-attenuated inversion recovery) were acquired to rule out any incidental pathological abnormalities.

3.3.2. Wavelet-based feature extraction The three levels of wavelet decomposition greatly reduced the input image size as shown in Fig. 7. The top left corner of the wavelet coefficient’s image denotes the approximation coefficients of level-3 whose size is only 32 × 32 = 1024, viz. the features’ numbers decreased from 256 × 256 to 32 × 32. The wavelets here can extract the structure of the artifacts while suppressing the noises, which is why we chose it. 3.3.3. Feature dimension reduction However, the remaining 1024 features are still too large for fast computation. Thus, PCA is used to further reduce the dimensions of the features. The curve of the cumulative sum of the variance with the number of principle components is shown in Fig. 8(a). The variance with the number of principal components from 1 to 38 are partial listed in Table 2. It shows that 31 principle components (bold font in table), which are only 3.03% of the original feature number, and preserve 98.1% of the total variance.

3.3. Experimental results 3.4. FNN training The algorithm was developed via the wavelet toolbox, the neural network toolbox, and the statistical toolbox of Matlab 2009b (Mathworks©). 3.3.1. Database The FNN methodology was applied to 60 series of DWI scans from these subjects, and the results of data saving are reported here to demonstrate our method. The setting of the training and testing images is shown in Table 1. We used 50 subjects for training and 10 subjects for testing. The training subset contained 56 (slices) × 18 (directions) × 50 (subjects) = 50,400 images, among which 48,953 were normal images and 1447 contained artifacts. The test subset contained 56 (slices) × 18 (directions) × 10 (subjects) = 10,080 images, among which 9632 were normal and 448 were corrupted by motion artifacts. Motion between acquisitions can be corrected by image registration approaches, but individually damaged images are generally uncorrectable (Fig. 6). Bad DW images from adult subjects listed below were seriously damaged by head motion. Some texture features are included in both datasets: wave, high brightness, shadow, and double image. The b0 images were made as the baseline, and all the other DWI images were manually inspected and were also calculated the correlation with the baseline to determine whether the image is infested with motion artifacts.

The 31 principle components were then directly sent to FNN. Thus, the NI is 31 and the NH is determined to be 30 according to the information entropy method [25]. The NO is determined to be 1 because our classification is a two-level problem where “0” indicates normal DWIs and “1” indicates motion artifacts. Consequently, the structure of the neural network is 31-301. We used the proposed CPSO method to train the neural network, and the results are shown in Fig. 8(b). It is shown that the CPSO converges at the 124th epoch. We chose an adaptive BP (ABP) method [26], an adaptive GA algorithm (AGA) [25] and a canonical PSO method as the comparative methods. The average weighted median square errors (MSEs) in the validation subsets obtained by these three algorithms are shown in Fig. 8(c). It indicates that all three algorithms are either trapped into local minima or are still slowly reaching global minima. Therefore, our proposed CPSO is fast and effective. 3.4.1. Classification accuracy The confusion matrix of our method in training and test subjects is listed in Table 3. The element of the ith row and the jth column represents the classification accuracy belonging to class i and is assigned to class j after the supervised classification.

Fig. 6. Artifact: DWs damaged by head motions. The top line was adult data; the bottom line was infant data.

X. Zhang, Y. Zhang / Biomedical Signal Processing and Control 13 (2014) 148–156

153

Fig. 7. An example of 3-level 2D DWT: (a) images without artifacts and (b) images with minimal motion artifacts.

The results show that our method obtained high classification accuracy in both training and test subjects. On the other side, the error rate of misclassifying the artifacts into normal images was quite low (2.4% of training subjects and 4.2% of test subjects) compared to the error rate of misclassifying normal images into motion artifacts (11.7% of training subjects and 12.4% of test subjects). The reason lies in the cost matrix we used. We used the identity matrix as a cost function, viz. treat the two error rates equally; the confusion matrix is shown in Table 4. The total error rate is only 2.34% in training subjects and 6.25% in test subjects, which is lower than using formula (4) with 11.43% in training subjects and 12.03% in test subjects. However, our goal is to minimize the error of misclassifying motion artifacts to normal images. It indicates that our method has a low error rate of misclassifying artifacts to normal images (2.4% in training subjects and 4.2% in test subjects) compared to the identity matrix method (20.7% in training subjects and 25.4% in test subjects). The bottom line is that any omission of motion artifacts within the DWIs should seriously influenced the performance of subsequent tensor estimation.

3.4.2. ROC curve The receiver operating characters (ROC) curve in test subjects is shown in Fig. 8(d). It indicates that the whole curves tend to go to the left (low false positive rate) due to the cost matrix. The optimal thresholds are shown as red dots, where the classifier has a very low false positive rate while having an extremely high true positive rate. 3.4.3. Time analysis The time of the training classifier using our proposed algorithm is only about 1 min. After the neural network was trained, there was no need to retrain it because the classifier proved to correctly classify other subjects. Therefore, the classification time only contained the following three phases: WT, trained PCA, and trained neural network. We used the whole classifier system to distinguish hundreds of input images, average the computation time, and finally we found that our classifier took only 0.0149 s to distinguish an image, which is fast enough for real-life applications.

Table 2 Detailed PCA data. No. of prin. comp.

1

6

11

16

21

26

31

36

37

38

Cumulative Variance (%)

39.4

69.7

85.7

92.7

95.4

97.1

98.1

98.7

98.7

98.7

Table 3 Confusion matrix of our method (O denotes output, T denotes target). Training subjects

Normal (T) Artifacts (T)

Test subjects

Normal (O)

Artifacts (O)

Normal (O)

Artifacts (O)

43,225 88.3% 35 2.4%

5728 11.7% 1412 97.6%

8438 87.6% 19 4.2%

1194 12.4% 429 95.8%

154

X. Zhang, Y. Zhang / Biomedical Signal Processing and Control 13 (2014) 148–156

Fig. 8. (a) Variances against the number of principle components (x-axis is on a log scale). (b) Training by our proposed CPSO method. (c) Training performance of the three comparative algorithms. Results show the final MSE found by our proposed CPSO method. (d) ROC curve of test subjects. (For interpretation of the references to color in text, the reader is referred to the web version of the article.)

3.4.4. DWI outlier rejection by FNN and tensor estimation The diffusion tensor is mathematically computed based on DWI data with at least 7 independent directions of gradients at varying strengths, including one b0 baseline imaging dataset using traditional LS methodology. However, many DWI datasets have a very low SNR and contain lots of head motion artifacts, which consequently affect the accuracy of the artificially reconstructed DTI data. Therefore, an optimized procedure for tensor estimation for these uncorrected DWIs with noise and motion artifacts is highly desired. Our outlier rejection approach based on classification with FNN has been successfully applied to large scale DTI studies with several hundred DWI datasets in our lab as well as collaborating labs in Central South University. Single adult DWI dataset results in a

fractional anisotropy (FA) color map were shown in Fig. 9 as an example, and were manually inspected and automatically analyzed by our optimized methodology. Motions in the volume weighted by diffusion gradient along each axis (Y, Z and X), which leading to the bias of certain hue. The automated algorithm identified almost all (97.87%, 46/47) of the motion artifacts in this special subject’s DWI dataset, which had been labeled as damaged by manual inspection. Moreover, our proposed algorithm can identify slices with fewer head motion artifacts, which had not been detected in the previous inspection. The resulting color maps from the experiment demonstrate that while the results of our proposed method are more identical to the reference data, a significant difference exists in many regions between the results of the previous methods and

Table 4 Confusion matrix using the identity cost matrix. Training subjects

Normal (T) Artifacts (T)

Test subjects

Normal (O)

Artifacts (O)

Normal (O)

Artifacts (O)

48,072 98.2% 299 20.7%

881 1.8% 1148 79.3%

9112 94.6% 114 25.4%

520 5.4% 334 74.6%

X. Zhang, Y. Zhang / Biomedical Signal Processing and Control 13 (2014) 148–156

155

Fig. 9. Single adult DT data: the FA color map calculated without any data rejection (left column), and with automated rejection with CPSO-based FNN (right column).

our proposed method, which is capable of detecting potential outliers missed during manual investigation. The traditional LS method in tensor estimation does not contain a mechanism to expel the outliers in DWI data and all error data will enter the procedure of tensor calculation; it will make the resulting tensor image incorrect. In contrast, our method excluded outliers so that errors were minimized. The novel method can find head motions and slight changes from texture features, and the invisibly perceivable artifacts could be detected and rejected automatically. Finally, it is important to emphasize that this method relies on an over-sampled dataset and some problems may arise if the there are many corrupted images within one slice. If too few gradients (less than 6) remained for one slice, the dataset would not be directly saved for further study. Fortunately, the interpolation method for DTI was very helpful when one special slice was missing in most volumes during the scan. This is a fairly straightforward approach that uses a linear interpolation of the eigenvalues and the rotation angle related to each axis [27]. This method has great accuracy for tensor field interpolation and can preserve the three diffusion

directions. The FA color map of recovery slice I from slice I − 1 and I + 1 by the interpolation methods is shown (here I = 30) in Fig. 10. However, if too many continued slices with motion artifacts abound in the DTI dataset, the entire DWI dataset should be deleted without further processing and cannot be saved. In addition, as the original DWI data are evenly distributed, removing the motionaffected DWIs might cause uneven distribution of SNR, which might affect the accuracy of tensor estimation. 4. Discussion and conclusion DTI is used increasingly in clinical research for its ability to depict white matter tracts and for its sensitivity to microstructure and architecture features of brain tissue. However, artifacts are common in DWI acquisitions. They come from both the acquisition system (such as eddy-current artifact and vibration artifact) and the subject being scanned (such as cardiac pulsation artifact and head motion artifact, especially for uncooperative or pediatric subjects). Signal changes produced by these artifacts can be severe

Fig. 10. Color map of recovery slice 30 (b) from slice 29 (a) and 31 (c) with the interpolation method based on directional cosine matrix metrics. (d) Original slice taken from the real slice from the tensor volume; (e) interpolation slice. Ellipsoids view of the white rectangles indicates a slight difference between (d) and (e).

156

X. Zhang, Y. Zhang / Biomedical Signal Processing and Control 13 (2014) 148–156

and result in erroneous diffusion tensor values. Recently, the diffusion tensor has been estimated by robust estimation techniques that assign low weights to artifacts (outliers) from the DWI data and select the selected diffusion weighted signals rather than the traditional full data multivariate least-squares regression model [11,12]; other investigators proposed other automatic outlier rejection by ADC and FA consistency (Jiang et al., 2009). However, the established optimized tensor estimation methods have two main drawbacks: (i) they are computationally intensive while improving the accuracy, sensitivity and robustness; and otherwise, (ii) they indirectly distinguish detailed texture features from calculated measurements (e.g. ADC and FA) rather than from the original DWI data, which would cause the loss of much information during preprocessing. Efforts have been undertaken to address various aspects of this problem, but this field is still poorly explored by targeting the most common artifacts from texture features. In this study we proposed a classifier, which can classify infant images as normal or motion artifacts. We used the WT to extract features from original DWT images due to the excellent multi-scale resolution ability and the fast computation algorithm. PCA was used to reduce the features to only 31 while preserving 98.1% energy of the total variances. FNN was then employed to construct the classifier because the neural network can theoretically approximate any function at any precision. Moreover, the CPSO method was proposed based on the Rossler equation to train the neural network, since traditional methods such as BP are easily trapped into local minima. It is more robust and faster as compared with other algorithms. Therefore, we will apply the CPSO to other biomedical fields. Meanwhile, we embedded the cost matrix where the FN was 10 times larger than FP to the training objective function of the neural network. Finally, the K-fold cross validation was chosen to enhance generalization and K was determined to be 10 throughout the trial-and-error method. In our protocol, we acquired three b0 baseline image datasets for each subject, which makes the reconstruction of the tensor more reliable. We demonstrated that the baseline measurement and tensors can be estimated in separate procedures. Only the closest b0 baseline to the DWIs was taken as the reference used to identify DWIs containing error signals before tensor estimation. All DWIs from ten new subjects were checked by our classifier before tensor estimation. If a small quantity of diffusion weighted slices in a subset with head motion artifacts was detected, these individual slices were deleted and the remaining usable images from excessively redundant acquisition were used for subsequent procedures. If a special slice with too many motion artifacts abounds in all DWI volumes, the entire slice is deleted without further processing, and the tensor interpolation procedure is processed in later steps. We tested this work frame extensively with datasets bearing artifacts from the adult DWI database from uncooperative or pediatric subjects, and we recycled and saved many of them (56.99% 53/93). These experiments and results have demonstrated that the proposed method improves the quality of tensor estimation for the DWI datasets. The impact of the damaged data on the initial estimate of the diffusion tensor is diluted because of the over sampling of diffusion data during acquisition, and tends to be reasonably eliminated in the optimized tensor estimation based on our method. Therefore, it does not affect subsequent outlier rejection from the 4D dataset before tensor estimation. The new method is strong in its ability to perform a direct image texture extraction of the entire brain DWI dataset within a reasonable time frame. The most important contribution of this paper is that the integration of WT, PCA, FNN, and CPSO methods presented a powerful tool in estimating the complex diffusion tensor in DWIs with unavoidable head motion artifacts. It is shown to be effective in detecting motion outliers in DWIs,

while reducing the computational cost. Although these are preliminary observations, we are confident that the new technique offers advantages over the previous methods and can be extended to any subject population prone to motion. This technique of brain DWIs classification based on FNN is a potentially valuable tool to be used in computer-assisted optimized tensor estimation in DTI and it should benefit for the further tractography in DTI studies. The algorithm provides the user with the ability to do the real-time quality control in an automatic mode in lieu of manual inspection. Acknowledgement We gratefully acknowledge the patients and their families for their support and participation. References [1] P.J. Basser, J. Mattiello, D. LeBihan, MR diffusion tensor spectroscopy and imaging, Biophys. J. 66 (1994) 259–267. [2] S. Mori, P.B. Barker, Diffusion magnetic resonance imaging: its principle and applications, Anat. Rec. 257 (1999) 102–109. [3] C. Pierpaoli, et al., Diffusion tensor MR imaging of the human brain, Radiology 201 (1996) 637–648. [4] D. Le Bihan, et al., Diffusion tensor imaging: concepts and applications, J. Magn. Reson. Imaging 13 (2001) 534–546. [5] A.F. DaSilva, et al., A primer on diffusion tensor imaging of anatomical substructures, Neurosurg. Focus 15 (2003) E4. [6] D. Merhof, et al., Correction of susceptibility artifacts in diffusion tensor data using non-linear registration, Med. Image Anal. 11 (2007) 588–603. [7] C.F. Westin, et al., Image processing for diffusion tensor magnetic resonance imaging, in: Medical Image Computing and Computer-Assisted Intervention, Miccai’99, Proceedings, 1679, 1999, pp. 441–452. [8] C. Goodlett, et al., Quantification of measurement error in DTI: theoretical predictions and validation, Med. Image Comput. Comput. Assist. Interv. 10 (2007) 10–17. [9] C.G. Koay, et al., A unifying theoretical and algorithmic framework for least squares methods of estimation in diffusion tensor imaging, J. Magn. Reson. 182 (2006) 115–125. [10] D.K. Jones, The effect of gradient sampling schemes on measures derived from diffusion tensor MRI: a Monte Carlo study, Magn. Reson. Med. 51 (2004) 807–815. [11] L.C. Chang, D.K. Jones, C. Pierpaoli, RESTORE: robust estimation of tensors by outlier rejection, Magn. Reson. Med. 53 (2005) 1088–1095. [12] J.F. Mangin, et al., Distortion correction and robust tensor estimation for MR diffusion imaging, Med. Image Anal. 6 (2002) 191–198. [13] S.P. Valsan, K.S. Swarup, Wavelet transform based digital protection for transmission lines, Int. J. Electr. Power Energy Syst. 31 (2009) 379–388. [14] Z.Y. Zhou, Z.C. Ruan, Multicontext wavelet-based thresholding segmentation of brain tissues in magnetic resonance images, Magn. Reson. Imaging 25 (2007) 381–385. [15] J. Camacho, J. Pico, A. Ferrer, The best approaches in the on-line monitoring of batch processes based on PCA: does the modelling structure matter? Anal. Chim. Acta 642 (2009) 59–68. [16] S. Kiranyaz, et al., Evolutionary artificial neural networks by multi-dimensional particle swarm optimization, Neural Netw. 22 (2009) 1448–1462. [17] B. Liu, et al., Improved particle swarm optimization combined with chaos, Chaos Solitons Fract. 25 (5) (2005) 1261–1271. [18] Y. Song, Z. Chen, Z. Yuan, New chaotic PSO-based neural network predictive control for nonlinear process, IEEE Trans Neural Netw. 18 (2) (2007) 595–600. [19] S. Chaplot, L.M. Patnaik, N.R. Jagannathan, Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network, Biomed. Signal Process. Control 1 (2006) 86–92. [20] E.S.A. El-Dahshan, T. Hosny, A.B.M. Salem, Hybrid intelligent techniques for MRI brain images classification, Digital Signal Process. 20 (2010) 433–441. [21] O.E. Rössler, An equation for continuous chaos, Phys. Lett. A 57 (1976) 397–398. [22] M.B. Chadwick, et al., Actinide ENDF/B-VII cross-section evaluations and validation testing, Ann. Nucl. Energy 36 (2009) 258–262. [23] J. Camacho, J. Picó, A. Ferrer, Data understanding with PCA: structural and variance information plots, Chemom. Intell. Lab. Syst. 100 (2010) 48–56. [24] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, IJCAI (1995) 1137–1145. [25] O. Ludwig, et al., Applications of information theory, genetic algorithms, and neural models to predict oil flow, Commun. Nonlinear Sci. Numer. Simulat. 14 (2009) 2870–2885. [26] S.W. Yu, K.J. Zhu, F.Q. Diao, A dynamic all parameters adaptive BP neural networks model and its application on oil reservoir prediction, Appl. Math. Comput. 195 (2008) 66–75. [27] O. Pasternak, N. Sochen, P.J. Basser, The effect of metric selection on the analysis of diffusion tensor MRI data, Neuroimage 49 (2010) 2190–2204.