A new compression format for fiber tracking datasets

Caroline Presseau, Pierre-Marc Jodoin, Jean-Christophe Houde, Maxime Descoteaux ⁎

Computer Science Department, Faculty of Science, Université de Sherbrooke, 2500 Boulevard Université, Sherbrooke, QC J1K 2R1, Canada
Centre d'Imagerie Moléculaire de Sherbrooke (CIMS), Centre de Recherche CHUS, Canada

⁎ Corresponding author. E-mail address: [email protected] (M. Descoteaux).

Article history: Accepted 22 December 2014; available online xxxx.

Keywords: Compression; Diffusion MRI; Tractography; Diffusion Tensor Imaging (DTI); High Angular Resolution Diffusion Imaging (HARDI)

Abstract
A single diffusion MRI streamline fiber tracking dataset may contain hundreds of thousands, and often millions, of streamlines and can take up to several gigabytes of memory. This amount of data is not only heavy to compute, but also difficult to visualize and hard to store on disk (especially when dealing with a collection of brains). These problems call for a fiber-specific compression format that simplifies its manipulation. As of today, no fiber compression format has yet been adopted, and the need for one is becoming an issue for future connectomics research. In this work, we propose a new compression format, .zfib, for streamline tractography datasets reconstructed from diffusion magnetic resonance imaging (dMRI). Tracts contain a large amount of redundant information and are relatively smooth; hence, they are highly compressible. The proposed method is a processing pipeline containing a linearization, a quantization and an encoding step. Our pipeline is tested and validated under a wide range of DTI and HARDI tractography configurations (step size, streamline number, deterministic and probabilistic tracking) and compression options. Similar to JPEG, the user has one parameter to select: a worst-case maximum tolerance error in millimeters (mm). Overall, we find a compression factor of more than 96% for a maximum error of 0.1 mm without any perceptual change or change of diffusion statistics (mean fractional anisotropy and mean diffusivity) along bundles. This opens new opportunities for connectomics and tractometry applications.


1. Introduction

Diffusion magnetic resonance imaging (dMRI) tractography is an increasingly popular research area that helps in understanding structural connectivity between brain regions. The great success of dMRI tractography comes from its capability to accurately describe some of the neural architecture in vivo (Descoteaux and Poupon, 2014). Streamline fiber tracking datasets contain thousands, if not millions, of streamlines, and each streamline contains hundreds to thousands of 3D points. These streamlines are often called "tracts". Here, we prefer the term streamline for a set of 3D points that represents a virtual anatomical fiber (Côté et al., 2013). For example, a dataset generated with the deterministic DTI (or tensorline) algorithm (Weinstein et al., 1999; Lazar et al., 2003) using a 0.2 mm step size and 500,000 streamlines requires roughly 1.3 gigabytes (GB) of space. As such, some datasets are so large that they cannot be visualized due to the limited amount of RAM (Random Access Memory) available on most computers. Thus, visualization, storage, and handling of such a dataset require heavy processing and a lot of memory. Unfortunately, no fiber compression format has yet been adopted, and the need for one is becoming a glaring issue for future research.


Compression algorithms are categorized into two distinct families: lossless and lossy. Lossless data compression algorithms allow the original signal (or document) to be recovered from the compressed one with no loss of quality (Nelson and Jean-Loup, 1995; Sayood, 2006; Salomon and Motta, 2010). The ZIP, GIF and PNG file formats (Taubman and Marcellin, 2002; Sayood, 2006; Schelkens et al., 2009; Salomon and Motta, 2010) are typical examples of lossless compression. Lossy algorithms, on the other hand, compress by removing elements from the original signal; hence, the exact original data cannot be retrieved from the compressed version (Nelson and Jean-Loup, 1995; Sayood, 2006; Salomon and Motta, 2010). Multimedia file formats such as mp3, JPEG and MPEG are typically associated with lossy compression. As opposed to lossless compression, lossy compression techniques have no limit on the amount by which they can compress a signal. Lossy compression thus involves a trade-off between quality and compression ratio (Nelson and Jean-Loup, 1995; Sayood, 2006; Salomon and Motta, 2010). What differentiates one lossy compression method from another is often the end application it was designed for. Although mp3, JPEG and MPEG share a common ground, they nonetheless have their own specifications. Due to the very nature of their signals, compression methods used for audio files (1D), image files (2D) or movie files (2D + time) cannot be used interchangeably without significant modification. The situation is the same for streamline tracking datasets, which also call for an application-specific compression scheme. Blindly applying a generic compression method to a streamline dataset can only lead to sub-optimal results.

Streamlines are 3D curves represented as collections of 3D points. Unlike voice and music, streamlines are both smooth and defined in a 3D space. Unlike images and videos, streamlines are not defined over a regular lattice and, most importantly, are not functions of space. Neuroimaging tracking applications also have their own specifications. One important feature for a streamline compression method is to preserve perfect accuracy at the end points. For many neuroimaging applications, the accuracy of the starting and ending points is more important than that of any other point along the curve (e.g. connectomics). On the other hand, for neurosurgical planning (Chamberland and Descoteaux, 2012; Fortin et al., 2012), the full path is of interest and the error must be controlled. Also, unlike JPEG or MPEG, whose focus is to keep visual compression artifacts low, streamline datasets are meant for medical applications where qualitative perceptual errors are not a primary factor. Instead, medical users prefer to account for compression errors quantified in millimeters.

The aim of this work is to provide a new lossy compression format, called .zfib, for streamline tracking datasets. We propose a complete, simple and powerful compression scheme validated under a wide range of tractography configurations and compression options. We demonstrate that streamlines are smooth and often represented with a large number of points that can be removed without changing the pictorial view of the streamlines on the screen or the average fractional anisotropy (FA) and mean diffusivity (MD) along white matter bundles. Careful experiments are performed on real datasets of different sizes (24 megabytes (MB) to 15 GB), from different tractography algorithms (deterministic HARDI, probabilistic HARDI, and deterministic DTI¹), different step sizes (0.1 to 1 mm) and different numbers of streamlines (60,000 to 3,000,000). Overall, with a 0.1 mm maximum error, which is very small considering the voxel size (usually 2 mm isotropic), we can reach a compression ratio of more than 96%.

Our findings open new perspectives for future connectomics and group studies using tractography results with large numbers of streamlines (Hagmann et al., 2008; Honey et al., 2009), but also for future tractography methods. Datasets of several gigabytes that were previously impossible to visualize and hard to store on disk may now be visualized and stored with only a few megabytes of memory. For example, the 3,000,000 deterministic DTI and probabilistic HARDI streamline files of size 7.83 GB and 5.92 GB, respectively, can be compressed down to 95.6 MB and 94.4 MB, respectively, with a maximum error of 0.5 mm.

The remainder of this paper is organized as follows. In Section 2, we introduce a generic four-step compression pipeline made of a linearization step, a transformation/approximation step, a quantization step, and an encoding step. These steps are commonly used in well-known compression algorithms such as mp3 and JPEG (Pennebaker and Mitchell, 1993; Sayood, 2006, Chap. 13; Salomon and Motta, 2010, Chap. 7; Nelson and Jean-Loup, 1995, Chap. 11). Since these steps can accommodate different algorithms and include a variety of parameters, Sections 3 and 4 thoroughly validate the impact of each step on compression ratios, speed, and accuracy on a variety of streamline datasets. In Section 5, we discuss the results as well as the pros and cons of every step. We then present the final compression algorithm and provide parameters which produce high compression ratios, low processing time, and high accuracy. We draw conclusions in Section 6.

¹ DTI: Diffusion Tensor Imaging; HARDI: High Angular Resolution Diffusion Imaging.


2. Methodology


2.1. Definitions

A fiber tracking dataset is a set of 3D curves in which each streamline is represented as a series of 3D points. The definitions used in this paper are as follows:

• Streamline point: a point in 3D space, $p_i = (x_i, y_i, z_i) \in \mathbb{R}^3$.
• Streamline: a curve in 3D space containing a finite, ordered and connected sequence of 3D points $f = \{p_1, \dots, p_k\}$.
• Fiber tracking dataset: a set of streamlines $F_n = \{f_1, \dots, f_n\}$ for a finite $n \in \mathbb{N}$.
• $|\cdot|$: the cardinality operator, which returns the number of elements in a set or a sequence. For example, $|f|$ stands for the total number of points $p_i$ in the streamline $f$.

2.2. Piece-wise linearization of streamlines

Tractography algorithms find structural white matter connections by following the principal directions of diffusion at each voxel (Descoteaux and Poupon, 2014). One important tractography parameter is the step size δ, which determines the distance between two consecutive points $p_i$ and $p_{i+1}$. A small δ leads to a tight set of points $p_i$ for each streamline, which implies that many points along each streamline are likely to be collinear in space. Hence, removing some collinear points produces a compression without much effect on the accuracy of the data. Decreasing the number of points not only helps in reducing the file size, it also allows for faster rendering and the use of less RAM at runtime. In fact, this step could even be introduced within tractography algorithms themselves.

The goal of this step is to remove as many points as possible within a certain margin of millimetric error ($\varepsilon$ mm). This is achieved with a piece-wise linearization procedure over each streamline according to the tolerance error $\varepsilon$. Even under fairly severe linearization, the pictorial view of the streamlines remains almost unchanged. Formally, $\hat{f}$ is a valid linearized version of $f$ (the raw input streamline) according to a tolerance value $\varepsilon$ if and only if every point $p_i \in f$ respects the following constraint:

$$\forall p_i \in f:\ \mathrm{dist}(\hat{f}, p_i) < \varepsilon, \qquad i = 1, \dots, |f|, \tag{1}$$

where dist is the shortest distance between a streamline point $p_i$ and the corresponding linearized streamline $\hat{f}$. In other words, $\hat{f}$ is a valid linearized version of $f$ if one cannot remove a point from $\hat{f}$ without creating an error of more than $\varepsilon$ mm.

Fig. 1 illustrates the linearization process for different $\varepsilon$ values. The reader shall note that $\hat{f}$ always contains the starting and ending points $p_1$ and $p_{|f|}$ of $f$. Hence, the accuracy of the end points is preserved, which is crucial not to change the connectivity profile of the tractography dataset.

Fig. 1. A single streamline containing 75 points (dotted points), and its linearized version (dotted points connected by dashed lines) with (a) ε = 0.01 mm and 8 points, (b) ε = 0.05 mm and 6 points, (c) ε = 0.1 mm and 5 points, and (d) ε = 0.5 mm and 4 points.
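The paper does not commit to a specific point-removal algorithm for this step; a Ramer-Douglas-Peucker-style recursion is one natural way to enforce the constraint of Eq. (1) while always keeping both end points. The following Python sketch illustrates the idea (function names are ours, not from the released code):

```python
import numpy as np

def point_segment_distance(points, a, b):
    """Distance from each row of `points` to the 3D segment [a, b]."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    if denom == 0.0:                       # degenerate segment
        return np.linalg.norm(points - a, axis=1)
    t = np.clip((points - a) @ ab / denom, 0.0, 1.0)
    return np.linalg.norm(points - (a + t[:, None] * ab), axis=1)

def linearize(streamline, eps):
    """Simplify an (n, 3) streamline so that every discarded point stays
    within `eps` mm of the kept polyline, in the spirit of Eq. (1).
    The first and last points are always kept, preserving end-point accuracy."""
    keep = np.zeros(len(streamline), dtype=bool)
    keep[[0, -1]] = True
    stack = [(0, len(streamline) - 1)]
    while stack:
        i, j = stack.pop()
        if j <= i + 1:
            continue
        d = point_segment_distance(streamline[i + 1:j], streamline[i], streamline[j])
        k = int(np.argmax(d))
        if d[k] >= eps:                    # farthest inner point violates the tolerance
            keep[i + 1 + k] = True
            stack += [(i, i + 1 + k), (i + 1 + k, j)]
    return streamline[keep]
```

On a smooth 75-point streamline such as the one in Fig. 1, a tolerance of ε = 0.1 mm would typically leave only a handful of points.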

2.3. Streamline transformation and approximation

A signal transformation is a generic term for a reversible mathematical operation which projects a signal from one set of basis functions onto another set of basis functions. These basis functions define the domain on which the signal is represented. For example, a Fourier transformation converts a signal from the spatial domain to the frequency domain, and vice versa. A transformation function does not compress data per se; rather, it concentrates the signal's energy on a smaller number of coefficients. Compression occurs when the least significant coefficients are forced to zero.

For a fiber tracking dataset, each streamline is taken separately in turn. Given a linearized streamline $\hat{f}$, the x, y and z coordinates of the streamline points are each grouped into a single vector, respectively $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$ and $z = (z_1, \dots, z_n)$. Transformation is then done separately on each x, y and z vector. This leads to three 1D transformations: $\mathrm{transf}(\hat{f}(x))$, $\mathrm{transf}(\hat{f}(y))$ and $\mathrm{transf}(\hat{f}(z))$.

Some transformation candidates commonly used for compression are the Discrete Cosine Transform (DCT) (Natarajan and Rao, 1974; Rao, 1990; Salomon and Motta, 2010, Chap. 7; Sayood, 2006, Chap. 13) and the Discrete Wavelet Transform (DWT) (Salomon and Motta, 2010, Chap. 8; Sayood, 2006, Chap. 15). For the purpose of this work, we consider the Daubechies wavelets of orders 4, 6 and 8 (Chui, 1992; Mallat, 2008; Bamberger and Smith, 1992; Daubechies, 1988; Daubechies, 1992; Salomon and Motta, 2010, Chap. 8; Sayood, 2006, Chap. 15) and the biorthogonal Cohen–Daubechies–Feauveau (CDF) wavelets 5/3 and 9/7 (Mallat, 2008, Chap. 8). As mentioned previously, compression occurs when small coefficients are rounded to zero. One way of doing so is to keep the K largest coefficients (Mallat, 2008, Chap. 9) and force the other ones to zero. This operation works as follows:

$$\mathrm{approx}(v_k, K)_i = \begin{cases} v_{k,i} & \text{if } |v_{k,i}| \text{ is among the } K \text{ largest,} \\ 0 & \text{otherwise,} \end{cases} \tag{2}$$

where $v_k = \mathrm{transf}(\hat{f}(k))$. This operation is performed on $v_x$, $v_y$ and $v_z$ independently. Note that the value K controls the amount of information that will be lost (Cohen et al., 2002; Mallat, 2008, Chap. 9; Salomon and Motta, 2010, Chap. 1). The smaller K is, the more aggressive the data compression will be.
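As a concrete illustration of Eq. (2), the sketch below applies a 1D DCT to each coordinate of a streamline and keeps only the K largest-magnitude coefficients. SciPy's dct/idct pair is assumed here; the paper's wavelet candidates would slot in the same way:

```python
import numpy as np
from scipy.fft import dct, idct

def approx(v, K):
    """Eq. (2): keep the K largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-K:]          # indices of the K largest magnitudes
    out[idx] = v[idx]
    return out

def transform_and_approximate(streamline, K):
    """One 1D DCT per coordinate of an (n, 3) streamline, then a K-term approximation."""
    coeffs = dct(streamline, axis=0, norm='ortho')
    return np.column_stack([approx(coeffs[:, d], K) for d in range(3)])

def reconstruct(approx_coeffs):
    """Inverse DCT back to an approximate streamline."""
    return idct(approx_coeffs, axis=0, norm='ortho')
```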

2.4. Quantization

At this point, each streamline has gone through a linearization process ($f \to \hat{f}$) and then a transformation and approximation process ($\hat{f}(k) \to \mathrm{approx}(\mathrm{transf}(\hat{f}(k)), K)$). The resulting signal contains a series of zero and non-zero floating point entries for each dimension x, y and z.

One can further compress the signal by approximating the non-zero entries with a small finite number of values. This allows every streamline to be represented by a finite number of floating point values stored in a dictionary. This step is called quantization. Two quantization procedures are typically used, namely uniform and non-uniform quantization (Mallat, 2008, Chap. 9; Salomon and Motta, 2010, Chap. 7).

Non-uniform quantization keeps the p most significant digits of the value. Given $q_x = \mathrm{approx}(v_x, K)$, a list of approximated coefficients obtained from Eq. (2), and the precision p, a value quantized with the non-uniform procedure is obtained according to the following equations:

$$L(q_{x,i}) = \left\lceil \log_{10} |q_{x,i}| \right\rceil, \tag{3}$$

$$\mathrm{non\_unif\_quant}(q_{x,i}, p) = \frac{\left[\alpha\, q_{x,i}\right]}{\alpha}, \quad p > 0, \tag{4}$$

where $\alpha = 10^{\,p - L(q_{x,i})}$. $\mathrm{non\_unif\_quant}(q_{x,i}, p)$ is the quantized value of $q_{x,i}$ containing its p most significant digits. Note that $\lceil \cdot \rceil$ stands for the ceiling operator whereas $[\cdot]$ stands for the rounding operator.

As for the uniform quantization procedure, it rounds the value at its $10^p$ digit. A value quantized with the uniform procedure with precision p is obtained according to the following equation:

$$\mathrm{unif\_quant}(q_{x,i}, p) = \left[\frac{q_{x,i}}{\beta}\right] \beta, \tag{5}$$

where $\beta = 10^p$. $\mathrm{unif\_quant}(q_{x,i}, p)$ is the quantized value of $q_{x,i}$ rounded at its $10^p$ digit. After the coefficients have been quantized, their values are stored in a dictionary and a lossless encoding algorithm is used to store it on disk. Table 1 shows examples of both quantization procedures.

Table 1
Example of uniform and non-uniform quantizations of 1234.5678.

Quantized value | Uniform quantization (p) | Non-uniform quantization (p)
1234.5678 | −4 | 8
1234.568 | −3 | 7
1234.57 | −2 | 6
1234.6 | −1 | 5
1235 | 0 | 4
1230 | 1 | 3
1200 | 2 | 2
1000 | 3 | 1
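A direct transcription of Eqs. (3)-(5) in Python; the printed examples reproduce rows of Table 1 (up to floating-point representation):

```python
import math

def non_unif_quant(q, p):
    """Eqs. (3)-(4): keep the p most significant digits of q (p > 0)."""
    if q == 0.0:
        return 0.0
    L = math.ceil(math.log10(abs(q)))      # Eq. (3)
    alpha = 10.0 ** (p - L)
    return round(alpha * q) / alpha        # Eq. (4)

def unif_quant(q, p):
    """Eq. (5): round q at its 10^p digit."""
    beta = 10.0 ** p
    return round(q / beta) * beta

print(non_unif_quant(1234.5678, 4))   # 1235.0  (Table 1, p = 4)
print(unif_quant(1234.5678, -1))      # ~1234.6 (Table 1, p = -1; float rounding may show trailing digits)
```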

2.5. Encoding

Encoding is a process by which the symbols of a source signal are replaced by a new set of symbols whose overall length is smaller. Although a plethora of encoding algorithms can be used, we consider the two most widely implemented ones, namely Huffman and arithmetic coding (Mallat, 2008, Chap. 10; Nelson and Jean-Loup, 1995, Chap. 2). These encoding algorithms are the standard image compression algorithms used in JPEG (Pennebaker and Mitchell, 1993; Sayood, 2006, Chap. 13; Salomon and Motta, 2010, Chap. 7; Nelson and Jean-Loup, 1995, Chap. 11) and JPEG-2000 (Taubman and Marcellin, 2002; Sayood, 2006; Schelkens et al., 2009; Salomon and Motta, 2010).

The Huffman algorithm generates a dictionary of symbols with their corresponding probabilities in the dataset. A binary tree is built from the dictionary, and the most probable entries are assigned near the root of the tree, which corresponds to the shortest code words. To obtain the unique code of a symbol, the tree is traversed from the root to the symbol's leaf node, sequentially assigning a 0 to the left branch and a 1 to the right branch (Sayood, 2006, Chap. 3; Salomon and Motta, 2010, Chap. 5; Nelson and Jean-Loup, 1995, Chap. 3-4). While Huffman encodes the message as a chain of symbol codes, arithmetic coding encodes the entire message into a single binary code. Arithmetic encoding starts with a range between 0 and 1 which is updated sequentially with the cumulative distribution function of each symbol. The resulting tag is converted into binary code and then encoded (Sayood, 2006, Chap. 4; Salomon and Motta, 2010, Chap. 5; Nelson and Jean-Loup, 1995, Chap. 5).
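For concreteness, here is a compact Huffman construction matching the description above (a didactic sketch, not the paper's C++ implementation):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code table {symbol: bitstring} from a symbol stream."""
    freq = Counter(symbols)
    if len(freq) == 1:                         # degenerate one-symbol stream
        return {next(iter(freq)): '0'}
    # Heap entries: (weight, tiebreaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(w, i, s) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:                       # merge the two least probable trees
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    table = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):            # internal node: 0 left, 1 right
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:
            table[tree] = prefix               # leaf: store the code word
    walk(heap[0][2], '')
    return table

table = huffman_code([0.0, 0.0, 0.0, 1.5, 1.5, -2.3])
bits = ''.join(table[s] for s in [0.0, 1.5, -2.3])   # frequent symbols get short codes
```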

3. Experimental setup


3.1. Diffusion MRI

Diffusion-weighted images (DWI) were acquired on a single volunteer along 64 uniformly distributed directions using a b-value of b = 1000 s/mm² and a single b = 0 s/mm² image, using a single-shot echo-planar imaging (EPI) sequence on a 1.5 T SIEMENS Magnetom (128 × 128 matrix, 2 mm isotropic resolution, TR/TE 11000/98 ms and GRAPPA factor 2). Diffusion tensor estimation and the corresponding Fractional Anisotropy (FA) map generation were done using MRtrix (Tournier et al., 2012). From this, the single-fiber response function was estimated from all FA values above a threshold of 0.7 within a white matter binary mask. This single-fiber response was used as input for spherical deconvolution (Descoteaux et al., 2009; Tournier et al., 2007) to compute the fiber ODFs, with maximum spherical harmonics order 8, at every voxel. We used deterministic HARDI and probabilistic HARDI tractography algorithms on the fiber ODFs and the deterministic DTI tractography algorithm, with 0.1, 0.2, 0.5 and 1.0 mm step sizes and 60 k, 120 k, 240 k, 500 k, 1 M, 2 M and 3 M streamlines, randomly seeded in an FA mask thresholded at 0.1. MRtrix launches seeds until the requested number of streamlines is reached, for both deterministic and probabilistic algorithms (Tournier et al., 2012). Other parameters were set to the default MRtrix parameters. These settings were used to generate different levels of streamline smoothness and different file sizes. We expect streamlines to be smooth for deterministic DTI and HARDI tracking, and more jagged (thus more difficult to compress) for probabilistic HARDI tractography.

Let us mention that there are benefits to running a tractography algorithm with a step size smaller than the image resolution. First, a small step size prevents the tracking algorithm from jumping from one bundle to another (e.g. between the corpus callosum and the cingulum bundles). Second, streamlines obtained with a small step size have a smoother and better curvature than those obtained with a large step size. Several authors have discussed this issue in the past (Lazar et al., 2003; Tournier et al., 2012; Côté et al., 2013; Girard et al., 2014). In fact, a step size between 1/2 and 1/10th of the image resolution is usually the default in publicly available software (Descoteaux and Poupon, 2014).

Sizes of the datasets used for validation are reported in Table 2. We used the efficient implementation publicly available in MRtrix (Tournier et al., 2012). Note that our compression and decompression codes support the native MRtrix (.tck) (Tournier et al., 2012) and TrackVis (.trk) [www.trackvis.org] tractography file formats.

Finally, from a whole-brain tractography dataset of 500,000 streamlines generated with a step size of 0.5 mm, we automatically extracted 27 bundles with the TractQuerier (Wassermann et al., 2013). The TractQuerier uses operations such as inclusion and exclusion on regions of interest defined by the Freesurfer cortical and white matter regions obtained from the T1-weighted image. Here, we used the queries provided in the online distribution (https://github.com/demianw/tract_querier). Once these bundles are segmented, we can compute the mean FA and MD along each bundle before and after fiber compression under different tolerance errors.

Table 2
Original fiber tracking dataset sizes in megabytes (MB) and gigabytes (GB) for 0.1, 0.2, 0.5 and 1.0 mm step sizes.

Number of streamlines | Det. HARDI | Prob. HARDI | Det. DTI

Step size 0.1 mm
60 k | 212.5 MB | 226.9 MB | 307.4 MB
120 k | 432.5 MB | 454.0 MB | 614.7 MB
240 k | 863.0 MB | 910.2 MB | 1.24 GB
500 k | 1.8 GB | 1.9 GB | 2.57 GB
1 M | 3.6 GB | 3.8 GB | 5.14 GB
2 M | 7.2 GB | 7.58 GB | 10.27 GB
3 M | 10.8 GB | 11.37 GB | 15.41 GB

Step size 0.2 mm
60 k | 111.2 MB | 118.4 MB | 156.2 MB
120 k | 222.4 MB | 236.3 MB | 313.1 MB
240 k | 444.1 MB | 473.6 MB | 627.5 MB
500 k | 926.4 MB | 986.3 MB | 1.31 GB
1 M | 1.85 GB | 1.97 GB | 2.61 GB
2 M | 3.71 GB | 3.94 GB | 5.22 GB
3 M | 5.55 GB | 5.92 GB | 7.83 GB

Step size 0.5 mm
60 k | 46.5 MB | 51.2 MB | 66.4 MB
120 k | 93.0 MB | 102.0 MB | 132.5 MB
240 k | 186.4 MB | 204.4 MB | 265.0 MB
500 k | 388.5 MB | 425.1 MB | 552.1 MB
1 M | 776.6 MB | 850.6 MB | 1.1 GB
2 M | 1.55 GB | 1.7 GB | 2.21 GB
3 M | 2.33 GB | 2.55 GB | 3.32 GB

Step size 1.0 mm
60 k | 24.6 MB | 24.8 MB | 37.6 MB
120 k | 49.1 MB | 49.9 MB | 75.1 MB
240 k | 98.4 MB | 100.3 MB | 150.4 MB
500 k | 204.9 MB | 208.9 MB | 313.9 MB
1 M | 409.9 MB | 418.1 MB | 626.7 MB
2 M | 819.0 MB | 835.3 MB | 1.25 GB
3 M | 1.23 GB | 1.25 GB | 1.88 GB

3.2. Evaluation metrics

In order to gauge performance, we use the following evaluation metrics: compression ratio (CR), maximum error (in millimeters) and processing time. CR is the percentage of reduction in size relative to the uncompressed size, namely

$$\mathrm{CR} = 100 \left(1 - \frac{\text{Compressed Size}}{\text{Uncompressed Size}}\right). \tag{6}$$

The maximum error between the original and the decompressed datasets is used to validate the worst-case maximum Euclidean error. This gives the user an intuitive understanding of the data loss after compression. The processing times in Section 4 are reported in minutes.
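Both metrics are straightforward to compute. A sketch, assuming streamlines are numpy arrays and reusing point_segment_distance from the linearization sketch above (the one-to-one pairing of original and decompressed streamlines is an assumption of this illustration):

```python
import os

def compression_ratio(uncompressed_path, compressed_path):
    """Eq. (6): percentage of size reduction on disk."""
    u = os.path.getsize(uncompressed_path)
    c = os.path.getsize(compressed_path)
    return 100.0 * (1.0 - c / u)

def max_euclidean_error(original, decompressed):
    """Worst-case distance (mm) from any original point to its decompressed polyline."""
    worst = 0.0
    for f, g in zip(original, decompressed):   # paired (n, 3) streamlines
        for p in f:
            d = min(point_segment_distance(p[None, :], g[i], g[i + 1])[0]
                    for i in range(len(g) - 1))
            worst = max(worst, d)
    return worst
```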

4. Results

In this section, we test the parameters and algorithms of every stage of the processing pipeline.

4.1. Linearization

Fig. 2 shows the pictorial sagittal view of streamlines of the corticospinal tract (CST) with four different linearization errors: 0.1, 0.2, 0.5 and 1 mm. We have zoomed in on a small region to highlight the size of the voxels (2 mm isotropic) with respect to the maximal tolerance error. Table 3 shows the percentage of points removed by linearization on several datasets for several errors. As can be seen, with a 0.1 mm maximum error (which is small considering the 2 mm isotropic voxel size of the original data), linearization can remove up to 96% of the points. As shown in Fig. 2(b), even at 96% compression, there is almost no visual difference between pre- and post-compression. This shows that, unlike signals such as voice and music, streamlines are usually smooth and thus highly compressible.

Table 3 also shows that the smaller the step size, the larger the percentage of removed points. This is in line with intuition, since a smaller step size leads to more tightly aligned (and thus collinear) points. One can also note from Table 3 that probabilistic HARDI streamlines show a smaller percentage of removed points than deterministic HARDI and DTI, particularly at small step sizes such as 0.1 and 0.2 mm. This is expected because of the noisy nature of probabilistic HARDI streamlines, which contain sharper transitions. Moreover, Table 4 shows that the percentage of removed points for a given algorithm and a given maximum error is independent of the number of streamlines.

Fig. 2. Comparison of the lateral projections of the corticospinal bundle containing 5000 streamlines (blue) and linearized versions of it (red) with different errors ε. Voxel size is 2 mm. Linearized streamlines (red) with (a) ε = 0.1 mm, (b) ε = 0.2 mm, (c) ε = 0.5 mm, (d) ε = 1 mm.

Table 3
Percentage of points removed with linearization for several tracking methods on 120,000-streamline datasets and different linearization errors (mm).

Algorithm | Step size (mm) | ε = 0.1 | 0.2 | 0.5 | 1.0 | 2.0
Deterministic HARDI | 0.1 | 96.2 | 97.2 | 98.2 | 98.7 | 99.1
Deterministic HARDI | 0.2 | 92.4 | 94.4 | 96.5 | 97.5 | 98.2
Deterministic HARDI | 0.5 | 80.5 | 85.8 | 91.2 | 93.9 | 95.6
Probabilistic HARDI | 0.1 | 95.4 | 97.0 | 98.3 | 98.8 | 99.1
Probabilistic HARDI | 0.2 | 88.9 | 93.1 | 96.2 | 97.5 | 98.2
Probabilistic HARDI | 0.5 | 51.2 | 77.6 | 89.5 | 93.5 | 95.6
Deterministic DTI | 0.1 | 96.5 | 97.5 | 98.4 | 98.8 | 99.1
Deterministic DTI | 0.2 | 93.0 | 95.0 | 96.8 | 97.6 | 98.2
Deterministic DTI | 0.5 | 81.6 | 87.3 | 92.0 | 94.2 | 95.7

Table 4
Percentage of points removed with linearization for several tracking methods on 0.2 mm step size datasets and different linearization errors (mm).

Algorithm | # streamlines | ε = 0.1 | 0.2 | 0.5 | 1.0 | 2.0
Deterministic HARDI | 60,000 | 93.3 | 94.9 | 96.6 | 97.4 | 97.9
Deterministic HARDI | 120,000 | 93.3 | 94.9 | 96.6 | 97.4 | 98.0
Deterministic HARDI | 240,000 | 93.3 | 94.9 | 96.6 | 97.4 | 98.0
Probabilistic HARDI | 60,000 | 90.2 | 93.8 | 96.3 | 97.3 | 97.9
Probabilistic HARDI | 120,000 | 90.2 | 93.7 | 96.3 | 97.3 | 97.9
Probabilistic HARDI | 240,000 | 90.2 | 93.7 | 96.3 | 97.3 | 97.9
Deterministic DTI | 60,000 | 93.8 | 95.4 | 96.8 | 97.5 | 97.9
Deterministic DTI | 120,000 | 93.8 | 95.4 | 96.8 | 97.5 | 97.9
Deterministic DTI | 240,000 | 93.8 | 95.4 | 96.8 | 97.5 | 97.9

4.2. Transformation and approximation

Wavelet transforms and the discrete cosine transform are hard to compare directly, because wavelet decompositions such as Daubechies and Cohen–Daubechies–Feauveau essentially apply to dyadic signals and thus require padding for non-dyadic signals (Mallat, 2008). There are several options for padding, such as zero padding, symmetric padding and periodic padding (Mallat, 2008). Of course, initial padding means cropping the inverse transformation to the initial signal length. Depending on the shape of the signal, padding and cropping may cause artifacts in the decompressed signal (Mallat, 2008).

As shown in Table 5, the DCT exhibits the strongest energy compaction, concentrating almost all the energy of the streamline in a few coefficients.

This means that a small number of coefficients is sufficient to represent each streamline, and that the DCT may seem a good choice at first sight. Given that the shape of streamlines is similar to smooth cosines, it is not surprising that the DCT performs better than Daubechies wavelets: Daubechies basis functions are compact, non-symmetric and more efficient on signals with sharp transitions (e.g. at edges, corners and discontinuities). The blue curve in Fig. 3 shows that, on average, most of the DCT energy is concentrated in the first coefficients. In fact, on average 95% of the energy is contained within the first 3 coefficients.

Table 6 shows the range of values of the DCT-transformed signal in comparison with the original signal. The range of possible values of the DCT-transformed signal is more than 500 times the range of the original signal. While we transform and approximate the signal to minimize the number of values to encode, the transformation significantly extends the range of values. A large range of values means a large number of symbols to encode and thus a slower compression algorithm.

Table 5
Percentage of maximum coefficient compression obtained on a dataset containing 2500 streamlines. Deterministic HARDI tracking with a step size of 0.2 mm, without linearization, was used.

Transformation | Max error (mm): 0.1 | 0.2 | 0.5 | 1.0 | 2.0
DCT | 79.0 | 85.4 | 91.0 | 93.6 | 95.2
Daubechies 4 | 71.3 | 75.4 | 80.2 | 85.0 | 91.0
Daubechies 6 | 65.3 | 70.0 | 77.4 | 83.9 | 90.7
CDF 5-3 | 77.0 | 81.7 | 86.2 | 89.3 | 92.6
CDF 9-7 | 73.9 | 78.4 | 84.2 | 89.4 | 93.6

Table 6
Range of values for a DCT-transformed and a non-transformed dataset containing 120,000 streamlines, using deterministic fODF tracking with a step size of 0.5 mm, without linearization.

Transformation | Min value | Max value
DCT | −41,567.2 | 55,184.1
No transformation | −69.4 | 106.0

Fig. 3. Normalized cumulative sum of DCT average coefficients.

4.3. Quantization

Table 7 shows the dictionary size and the maximum quantization error after uniform and non-uniform quantization. We see that for a given maximum quantization error (say 0.17321 mm), a much smaller dictionary size is obtained with uniform quantization (2523 entries) than with non-uniform quantization (76,107 entries). This means fewer values to encode and thus a better compression ratio and a faster algorithm.

Table 7
Quantization comparison for a dataset containing 120,000 streamlines using deterministic HARDI tracking with a step size of 0.5 mm, without linearization and transformation.

p | Dictionary size | Max quantization error (mm)

Uniform quantization
−4 | 1,837,616 | 0.00017
−3 | 211,131 | 0.00173
−2 | 23,190 | 0.01732
−1 | 2,523 | 0.17321
0 | 272 | 1.73295
1 | 31 | 17.32050
2 | 4 | 173.20500
3 | 1 | 1732.05000

Non-uniform quantization
1 | 132 | 173.20500
2 | 1,118 | 17.32050
3 | 9,398 | 1.73205
4 | 76,107 | 0.17321
5 | 580,726 | 0.01732

4.4. Encoding

Table 8 shows the compression ratio (CR), the total compression time (CT) and the total decompression time (DT) for different tracking algorithms and different step sizes with 120,000 streamlines. Information theory states that the compression power of arithmetic coding is better than or identical to that of Huffman coding (Sayood, 2006; Salomon and Motta, 2010). The results presented in Table 8 are in line with the literature, since the compression ratios of both methods are identical. However, in the current context, arithmetic coding is much slower than Huffman coding, both for compression and decompression.

Fig. 4 shows the compression and decompression times for different tracking algorithms with a 0.2 mm step size and a 0.5 mm maximum error. Again, we see that for the same compression ratio, Huffman coding is much faster both for compression and decompression. The time difference becomes an important factor when the number of streamlines exceeds 1 million: for 3 million streamlines, there is a 20-minute difference between the slowest and the fastest encoding method, which is significant for any real-life application. Looking at the results, it is clear that Huffman coding is more suitable for our problem than arithmetic coding. The main reason why decompression is much faster with Huffman is that it only requires replacing each code word with its corresponding symbol in the dictionary, while arithmetic coding needs to decode the final tag by running the inverse encoding process (Sayood, 2006).

Table 8
Encoding comparison for datasets using tracking with different algorithms and step sizes, 120,000 streamlines and a maximum error of 0.5 mm, with linearization, uniform quantization (p = 0) and no transformation. CR: compression ratio (%); CT: total compression time (min); DT: total decompression time (min).

Algorithm | Step size (mm) | Huffman CR | CT | DT | Arithmetic CR | CT | DT
Deterministic HARDI | 0.1 | 99.3 | 1.88 | 0.02 | 99.3 | 2.50 | 1.42
Deterministic HARDI | 0.2 | 98.7 | 0.62 | 0.03 | 98.7 | 1.26 | 1.45
Deterministic HARDI | 0.5 | 96.7 | 0.23 | 0.03 | 96.7 | 0.94 | 1.62
Deterministic HARDI | 1.0 | 93.4 | 0.16 | 0.03 | 93.4 | 0.92 | 1.72
Probabilistic HARDI | 0.1 | 99.3 | 1.57 | 0.03 | 99.3 | 2.17 | 1.27
Probabilistic HARDI | 0.2 | 98.5 | 0.52 | 0.03 | 98.5 | 1.30 | 1.57
Probabilistic HARDI | 0.5 | 95.7 | 0.26 | 0.04 | 95.7 | 1.36 | 2.27
Probabilistic HARDI | 1.0 | 87.0 | 0.25 | 0.03 | 87.0 | 1.83 | 3.09
Deterministic DTI | 0.1 | 98.8 | 0.61 | 0.02 | 98.8 | 1.17 | 1.14
Deterministic DTI | 0.2 | 98.8 | 0.60 | 0.02 | 98.8 | 1.14 | 1.16
Deterministic DTI | 0.5 | 98.8 | 0.60 | 0.02 | 98.8 | 1.14 | 1.13
Deterministic DTI | 1.0 | 98.8 | 0.64 | 0.02 | 98.8 | 1.20 | 1.14

Fig. 4. Compression and decompression time comparison for datasets using tracking with different algorithms, a 0.2 mm step size, different numbers of streamlines and a maximum error of 0.5 mm.

5. Discussion

5.1. zfib: our final fiber compression method

Taking into account the results reported in the previous section, our final method consists of three major steps: linearization, quantization and encoding. The reasons for excluding the transformation/approximation step are threefold. First, the goal of the transformation/approximation step is to minimize the number of non-zero values to encode. However, Table 6 shows that the transformation significantly extends the range of values; in fact, it increases it by three orders of magnitude. Such a large range of values leads to a large number of non-zero symbols and thus a large dictionary and a slow compression algorithm. Second, as opposed to other compression formats such as mp3, JPEG and MPEG, a compressed fiber dataset must never exceed a maximum error in millimeters. Unfortunately, it is impossible to know in advance the right number of coefficients K to force down to zero for each streamline. The only way of finding the best K for each streamline without exceeding the maximum error ε is to iteratively test different values of K. Thus, for each K, the streamline signal has to be transformed, approximated and inverse transformed to figure out which K provides the best compression ratio without exceeding the maximum error. As one can imagine, doing so at runtime quickly becomes a computational bottleneck. Third, since the linearization step already removes more than 50% of the points (and often more than 95% of the points), the additional compression produced by the transformation/approximation step is not that significant. Nonetheless, the transformation/approximation options remain available in the code provided on github² for further research.

As for the quantization step, according to Table 7 we decided to use uniform quantization (that is, rounding the value at its $10^p$ position). Uniform quantization depends on one parameter, p. Since the user only specifies the maximum error, we have to select a default value for p. To do so, we conducted a series of experiments on various datasets. According to these experiments, a good compromise between compression ratio and processing time is to use p = −1 when the user selects a maximum error of less than 0.2 mm, and p = 0 otherwise. Let us mention that the quantization error must be taken into account right from the beginning, such that the overall error (that is, the linearization error plus the quantization error) corresponds to the maximum error ε given by the user. In that perspective, the maximum quantization error α is obtained with the following equation: $\alpha = \sqrt{3 \cdot 10^{2p}}$. The linearization error is then defined by subtracting the maximum quantization error α from the maximum error threshold ε given by the user. This ensures that the worst-case maximum error ε is respected.

Once p and ε have been computed, the linearization step is performed using Eq. (1) and the resulting dataset is quantized with uniform quantization and parameter p. The dataset is then further compressed with Huffman coding and saved in the .zfib format. Our current implementation allows for two pipelines: the whole pipeline, which includes the linearization, quantization and encoding steps, and the linearization step only (see Fig. B.7). Details for each pipeline are presented in Algorithms 1 and 2 of Appendix B.

² https://github.com/scilus/FiberCompression.
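A condensed sketch of how the three retained steps chain together, reusing the linearize, unif_quant and huffman_code sketches from earlier sections. The on-disk .zfib layout and the ε/α error budget described above are omitted, so this is an outline of the pipeline rather than the released implementation:

```python
def compress(streamlines, eps):
    """Linearize, quantize and Huffman-encode a list of (n, 3) streamlines."""
    p = -1 if eps < 0.2 else 0                # default precision, as in Section 5.1
    quantized = []
    for f in streamlines:
        # NOTE: the paper actually linearizes with eps minus the quantization
        # error alpha (see above); the full budget is omitted in this sketch.
        f_hat = linearize(f, eps)             # Eq. (1)
        quantized.append([unif_quant(v, p) for v in f_hat.ravel()])
    symbols = [v for q in quantized for v in q]
    table = huffman_code(symbols)             # symbol dictionary -> code words
    bitstream = ''.join(table[v] for v in symbols)
    points_per_streamline = [len(q) // 3 for q in quantized]
    return table, bitstream, points_per_streamline
```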

5.2. Comparison with gzip

We also compressed the datasets used in Table 9 with off-the-shelf compression software: the Linux gzip command, which implements Lempel-Ziv coding. Unsurprisingly, the compression ratios obtained with this method are by no means comparable to ours. While our method obtains compression ratios of at least 86% (Fig. 5), gzip is in the range of 8 to 10%.

We also performed another experiment on images acquired on a 2 mm isotropic grid, on which we ran a full-brain tractography using a 2 mm step size with 1 million streamlines and all the other default MRtrix parameters. As can be seen in Table 9, gzip reaches moderate compression ratios of 9-11%, while our FiberCompression algorithm reaches between 75% and 93%. This is because gzip is a lossless compression method which does not exploit the streamline structure.

Table 9
Compressed file sizes obtained with our method, for different maximum compression errors (mm), and with gzip.

Method | Original size | 0.1 | 0.25 | 0.5 | 1.0 | gzip
Deterministic DTI | 228 MB | 53 MB | 51 MB | 31 MB | 21 MB | 202 MB
Deterministic HARDI | 357 MB | 80 MB | 77 MB | 42 MB | 28 MB | 323 MB

Fig. 5. Compression ratio and compression factor comparison for different numbers of streamlines on datasets using tracking with different algorithms, 0.1, 0.2, 0.5 and 1.0 mm step sizes and a maximum error of 0.5 mm.

5.3. Impact on diffusion measures along bundles

Since clinical studies often involve the integration of dMRI metrics such as FA and MD along fiber bundles, we ran an experiment to measure how fiber compression affects the average FA and MD measures along the 27 bundles extracted from whole-brain tractography, as described in the experimental section. Here, we tested two strategies to compute the average FA and MD along a bundle: i) accumulating FA and MD only in voxels where there is a point of the compressed fiber, or ii) performing a smart Bresenham-style integration (Amanatides and Woo, 1987) along the compressed fibers, which accumulates the FA and MD of every voxel traversed by a ray between each pair of consecutive points of the compressed fibers. These two strategies lead to the average error values (in %) shown in Table 10 (top) and Table 10 (bottom), respectively. As can be seen, even with a compression error of 1 mm, we only have average errors of 1.2% and 2.2% when doing a Bresenham-style integration along the compressed fibers. This shows that statistics gathered along compressed fibers are, for all practical purposes, identical to the ones obtained with non-compressed fibers.

Note that naive accumulation of FA/MD values along the fibers, without integration, leads to large percentage differences in average FA and MD between the compressed and non-compressed versions. Smart integration is crucial, since the compressed fibers have many fewer points than the original fibers. Hence, statistics along fiber bundles and fiber selection of bundles with inclusion/exclusion regions must be re-thought in the currently available software. One can no longer simply check whether a point of the fiber is in or out of a voxel or a region of interest: ray tracing must be done smartly to traverse every voxel between pairs of points in the compressed fibers. Visualization tools must therefore be adapted to handle compressed fiber datasets.

Table 10
Mean error (in %) obtained after integrating MD and FA values along 27 fiber bundles compressed with different maximum compression error values (in mm). The top part of the table uses naive accumulation of statistics and the bottom part a Bresenham-style integration (Amanatides and Woo, 1987).

Maximum error (mm) | 0.001 | 0.005 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 | 0.5 | 1

With naive accumulation of statistics
MD | 0.075% | 0.25% | 0.39% | 0.59% | 1.08% | 1.67% | 2.56% | 4.08% | 5.58%
FA | 0.44% | 1.78% | 2.87% | 4.08% | 5.61% | 6.98% | 8.66% | 10.93% | 13.24%

With Bresenham-style integration of statistics
MD | 0.00006% | 0.001% | 0.005% | 0.014% | 0.039% | 0.089% | 0.19% | 0.53% | 1.2%
FA | 0.00015% | 0.002% | 0.007% | 0.017% | 0.058% | 0.136% | 0.30% | 0.88% | 2.2%
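The paper integrates statistics with an exact Amanatides-Woo voxel traversal between consecutive compressed points. As a simpler stand-in that conveys the same idea, the sketch below samples each segment at sub-voxel spacing and accumulates each traversed voxel once (the voxel_size and step defaults are illustrative assumptions, and points are assumed to lie inside the volume):

```python
import numpy as np

def mean_metric_along_streamline(f, metric, voxel_size=2.0, step=0.2):
    """Mean of a scalar map (e.g. FA or MD) over the voxels traversed by f.

    `f` is an (n, 3) array of points in mm; `metric` is a 3D array indexed in
    voxel space. Dense sampling approximates the exact ray traversal used in
    the paper (Amanatides and Woo, 1987).
    """
    visited, values = set(), []
    for a, b in zip(f[:-1], f[1:]):
        n = max(2, int(np.ceil(np.linalg.norm(b - a) / step)) + 1)
        for t in np.linspace(0.0, 1.0, n):
            voxel = tuple(np.floor((a + t * (b - a)) / voxel_size).astype(int))
            if voxel not in visited:       # accumulate each voxel only once
                visited.add(voxel)
                values.append(metric[voxel])
    return float(np.mean(values))
```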


5.4. Parametric curves

A compression method such as ours could accommodate parametric curves such as B-splines and Bézier curves (Piegl and Tiller, 1997). For example, instead of our linearization step, which eliminates collinear points, one could fit a parametric curve on each streamline following a maximum error criterion (Walker et al., 2010). In this way, instead of representing a streamline by a collection of 3D points, one could represent a streamline with a set of spline parameters. Given the relatively smooth aspect of most streamlines, this could result in an impressive compression ratio.

However, this approach is not void of limitations. First, a spline-based regression means controlling the complexity of the spline. In many cases, that means evaluating the number of spline knots as well as their locations (Walker et al., 2010). This inevitably leads to a non-linear optimization algorithm which is significantly slower than our simple linearization procedure. Second, most tractography-related software assumes that a streamline is a collection of 3D points. Thus, a decompression method that returns spline parameters instead of 3D points would suffer from compatibility issues. Of course, the decompression method could also come with a spline-to-3D-point converter, but that would result in more processing and extra parameters to tune. Third, as shown in Table 4, our linearization method can reach a compression ratio of more than 98% given a maximum error of 0.5 mm, a value much smaller than the usual 2 × 2 × 2 mm³ voxel resolution. As a consequence, a spline-based regression could only slightly improve compression at the expense of extra processing time and algorithmic complexity.

Also, one could use parametric curves as basis functions for the transformation step. However, we would still have to implement the back-and-forth approach explained previously to determine K, the number of non-zero coefficients. As mentioned before, such a method would be far too slow to be of any practical use.

5.5. Fiber compression for diffusion MRI

Diffusion MRI tractography techniques have recently been applied in the complete mode, i.e. seeding a complete white matter mask and then tracking "everywhere-to-everywhere" in this mask (Descoteaux and Poupon, 2014). If this complete tracking is done in T1 space, which is generally 1 mm isotropic, there are more than 1 million voxels in the white matter mask. Hence, generating a fiber tracking dataset with 120,000 streamlines results in poor spatial coverage, as roughly only 1 in 10 voxels is used to seed the tractography. This makes the reconstruction of small, narrow and highly curved fiber bundles such as the fornix and the cingulum quite challenging in their full extent. Hence, we need millions of seeds to perform robust reconstruction of specific bundles. Yet, a 120,000-streamline dataset is already more than 220 MB in size if a 0.2 mm step size is used (Tournier et al., 2012).

For the same reason, in the context of structural connectomics, a very large number of seed points is needed to perform tractography before a connectivity matrix is built. The original work of Hagmann and colleagues (Hagmann et al., 2008; Honey et al., 2009) uses 33 seeds per voxel. This results in more than 33 million streamlines for a complete tractography, which is very challenging to store on disk and impossible to visualize. Of course, for a connectomics application, one could save some space by only storing the first and last points of each streamline in an uncompressed .tck or .trk file. However, such a solution would make it impossible to go back to the original streamlines and explore certain paths. Using our fiber compression tool could lead to the exact same connectivity matrix while reducing the disk size by a factor of up to 100. This file could then be reused for quality assurance, visualization and later bundle dissection. Note that for diffusion MRI and functional MRI fusion, a large number of seeds and streamlines are also generated to find the link between functional and structural connectivity (Whittingstall et al., 2014).

Furthermore, the recent application of tract-density imaging (TDI) produces super-resolved qualitative maps of the streamline count (density) at each voxel. These TDI maps are known to be highly variable for low numbers of seeds (Calamante et al., 2011, 2012), and thus the community generates tractography datasets of 5 to 10 million streamlines to produce these maps (Calamante et al., 2011, 2012). Again, storing the original streamline files is quite heavy.

A final remark concerns novel "global" tractography techniques (Mangin et al., 2013; Kreher et al., 2008; Reisert et al., 2011) that are based on a few control points optimized to explain local and global connectivity under different constraints. Given the highly compressible nature of streamlines clearly shown in this work, it is quite possible to imagine structural connectivity captured by a different representation than millions of small line segments joined together. This may well influence the development of a new family of tractography representations for structural connectivity.

6. Conclusion

In this work, we presented a new compression format, .zfib, and a new three-step pipeline that shows impressive compression ratios and relatively low processing times. Each step has been tested, evaluated and validated under a wide range of tractography configurations (step size, deterministic HARDI, probabilistic HARDI, deterministic DTI, streamline number) and compression options. The user has only one parameter to select: a worst-case maximum tolerance error in millimeters (mm). By choosing a large maximum error, the user accepts a larger compression ratio, which leads to a lower resolution in the compressed dataset.

As of today, the whole compression/decompression algorithm is available as a stand-alone application with two modes: command line and a simple interface (see the Appendix and the code available on github³). The command-line mode can be used in batch scripts to compress many streamline datasets. Our current implementation supports the native MRtrix (.tck) (Tournier et al., 2012) and TrackVis (.trk) (trackvis.org) tractography file formats.

Our new compression format both significantly reduces the amount of space required to store fiber tracks on disk and simplifies the visualization of large tractography datasets such as the ones used in connectomics studies. Datasets of several gigabytes may now be stored with only a few megabytes, and our method is especially useful when dealing with a collection of brains. Datasets that were previously impossible to visualize and hard to store can now be manipulated more easily, without affecting the visual aspect of the streamlines or changing the diffusion statistics along white matter bundles.

³ https://github.com/scilus/FiberCompression.

Acknowledgment

The authors are grateful to NSERC, FRQNT and the Samuel de Champlain Program for funding this research.

Appendix A. Interface

(See Fig. A.6.)

Fig. A.6. Simple interface with as few options as possible: input file (.tck, .trk or .zfib format), output file (.tck, .trk or .zfib format), compression/decompression mode and maximum error in mm.

Appendix B. Algorithms

Illustration and details of the available pipelines and algorithms (see Fig. B.7).

Fig. B.7. Available pipelines.

Algorithm 1. Compression pseudo-code
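The pseudo-code listing itself did not survive extraction. Based on Section 5.1, the compression pipeline amounts to the following steps (our paraphrase, not the original listing):

Parameters: Input file (.tck, .trk); maximum error ε (mm).

1. Load the input streamlines (.tck, .trk).
2. Select the quantization precision p from ε and subtract the quantization error α from ε.
3. Linearize each streamline with the remaining tolerance (Eq. (1)).
4. Quantize the coordinates with uniform quantization and precision p (Eq. (5)).
5. Encode the quantized values with Huffman coding and save the file (.zfib).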


Algorithm 2. Decompression pseudo-code.

Parameters: Input file (.tck, .trk).

1. Load compressed file (.zfib).
2. Decode with Huffman.
3. Save file (.tck, .trk).
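Decoding inverts the Huffman sketch of Section 2.5 and regroups the flat symbol stream into streamlines. A minimal sketch consistent with the compress outline above (the bit-at-a-time lookup is didactic, not efficient):

```python
import numpy as np

def huffman_decode(bitstream, table, points_per_streamline):
    """Invert the encoding: bit string -> quantized (n, 3) streamlines."""
    inverse = {code: sym for sym, code in table.items()}
    symbols, buf = [], ''
    for bit in bitstream:
        buf += bit
        if buf in inverse:                 # prefix-free code: first match is a symbol
            symbols.append(inverse[buf])
            buf = ''
    streamlines, i = [], 0
    for n in points_per_streamline:        # regroup flat values as (n, 3) points
        streamlines.append(np.array(symbols[i:i + 3 * n]).reshape(n, 3))
        i += 3 * n
    return streamlines
```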

References

Amanatides, J., Woo, A., 1987. A fast voxel traversal algorithm for ray tracing. Eurographics 87, 3–10.
Bamberger, R.H., Smith, M.J.T., 1992. A filter bank for the directional decomposition of images. IEEE Trans. Signal Process. 40, 882–893.
Calamante, F., Tournier, J.D., Heidemann, R.M., Anwander, A., Jackson, G.D., Connelly, A., 2011. Track density imaging (TDI): validation of super resolution property. NeuroImage 56.
Calamante, F., Tournier, J.D., Smith, R.E., Connelly, A., 2012. A generalised framework for super-resolution track-weighted imaging. NeuroImage 59, 2494–2503.
Chamberland, M., Descoteaux, M., 2012. Neurosurgical tracking at the Sherbrooke Connectivity Imaging Lab (SCIL). International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI'12), DTI Challenge Workshop.
Chui, C.K., 1992. An Introduction to Wavelets. Academic Press, San Diego.
Cohen, A., Daubechies, I., Guleryuz, O.G., Orchard, M.T., 2002. On the importance of combining wavelet-based nonlinear approximation with coding strategies. IEEE Trans. Inf. Theory 48, 1895–1921.
Côté, M.A., Girard, G., Boré, A., Garyfallidis, E., Houde, J.C., Descoteaux, M., 2013. Tractometer: towards validation of tractography pipelines. Med. Image Anal. 17, 844–857.
Daubechies, I., 1988. Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. 41, 909–996.
Daubechies, I., 1992. Ten Lectures on Wavelets. SIAM.
Descoteaux, M., Deriche, R., Knösche, T.R., Anwander, A., 2009. Deterministic and probabilistic tractography based on complex fibre orientation distributions. IEEE Trans. Med. Imaging 28, 269–286.
Descoteaux, M., Poupon, C., 2014. Diffusion-weighted MRI. Comprehensive Biomedical Physics. Elsevier Inc.
Fortin, D., Aubin-Lemay, C., Boré, A., Girard, G., Houde, J.C., Whittingstall, K., Descoteaux, M., 2012. Tractography in the study of the human brain: a neurosurgical perspective. Can. J. Neurol. Sci. 39, 747–756.
Girard, G., Whittingstall, K., Deriche, R., Descoteaux, M., 2014. Towards quantitative connectivity analysis: reducing tractography biases. NeuroImage 98, 266–278.
Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C.J., Wedeen, V.J., Sporns, O., 2008. Mapping the structural core of human cerebral cortex. PLoS Biol. 6, 1479–1493.
Honey, C.J., Sporns, O., Cammoun, L., Gigandet, X., Thiran, J.P., Meuli, R., Hagmann, P., 2009. Predicting human resting-state functional connectivity from structural connectivity. Proc. Natl. Acad. Sci. U. S. A. 106, 2035–2040.
Kreher, B.W., Mader, I., Kiselev, V.G., 2008. Gibbs tracking: a novel approach for the reconstruction of neuronal pathways. Magn. Reson. Med. 60, 953–963.
Lazar, M., Weinstein, D.M., Tsuruda, J.S., Hasan, K.M., Arfanakis, K., Meyerand, M.E., Badie, B., Rowley, H.A., Haughton, V., Field, A., Alexander, A.L., 2003. White matter tractography using diffusion tensor deflection. Hum. Brain Mapp. 18, 306–321.
Mallat, S., 2008. A Wavelet Tour of Signal Processing: The Sparse Way, 3rd edition. Elsevier Inc.
Mangin, J.F., Fillard, P., Cointepas, Y., Le Bihan, D., Frouin, V., Poupon, C., 2013. Toward global tractography. NeuroImage 80, 290–296.
Natarajan, T., Rao, K.R., 1974. Discrete cosine transform. IEEE Trans. Comput. 23, 90–93.
Nelson, M., Jean-Loup, G., 1995. The Data Compression Book, 2nd edition. Wiley.
Pennebaker, W.B., Mitchell, J.L., 1993. JPEG Still Image Data Compression Standard, 3rd edition. Springer.
Piegl, L., Tiller, W., 1997. The NURBS Book, 2nd edition. Springer.
Rao, K., 1990. Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic Press, Boston.
Reisert, M., Mader, I., Anastasopoulos, C., Weigel, M., Schnell, S., Kiselev, V., 2011. Global fiber reconstruction becomes practical. NeuroImage 54, 955–962.
Salomon, D., Motta, G., 2010. Handbook of Data Compression, 5th edition. Springer.
Sayood, K., 2006. Introduction to Data Compression, 3rd edition. Elsevier Inc.
Schelkens, P., Skodras, A., Ebrahimi, T., 2009. The JPEG 2000 Suite. Wiley.
Taubman, D., Marcellin, M., 2002. JPEG2000: Image Compression Fundamentals, Standards and Practice. Springer.
Tournier, J.D., Calamante, F., Connelly, A., 2007. Robust determination of the fibre orientation distribution in diffusion MRI: non-negativity constrained super-resolved spherical deconvolution. NeuroImage 35, 1459–1472.
Tournier, J.D., Calamante, F., Connelly, A., 2012. MRtrix: diffusion tractography in crossing fiber regions. Int. J. Imaging Syst. Technol. 22, 52–66.
Walker, C., O'Sullivan, M., Zhu, Y., Hus, S., MacKenkie, M., Donovan, C., Rajagopal, V., 2010. Regression spline fitting with applications. Proc. of the Annual Conf. of the ORSNZ.
Wassermann, D., Makris, N., Rathi, Y., Shenton, M., Kikinis, R., Kubick, M., Westin, C.F., 2013. On describing human white matter anatomy: the white matter query language. MICCAI. Springer, pp. 647–654.
Weinstein, D., Kindlmann, G., Lundberg, E., 1999. Tensorlines: advection-diffusion based propagation through diffusion tensor fields. IEEE Visualization 1999 (Vis'99), pp. 249–253.
Whittingstall, K., Bernier, M., Houde, J.C., Fortin, D., Descoteaux, M., 2014. Structural network underlying visuospatial imagery in humans. Cortex.