Computerized Medical Imaging and Graphics 32 (2008) 174–182
Lossless compression of medical images using Hilbert space-filling curves
Jan-Yie Liang a, Chih-Sheng Chen a,∗, Chua-Huang Huang a, Li Liu b
a Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
b Graduate Institute of Medical Informatics, Taipei Medical University, Taipei, Taiwan
Received 26 May 2006; received in revised form 12 November 2007; accepted 26 November 2007
Abstract
A Hilbert space-filling curve is a curve traversing the 2^n × 2^n two-dimensional space; it visits neighboring points consecutively without crossing itself. The application of Hilbert space-filling curves in image processing is to rearrange image pixels in order to enhance pixel locality. A computer program of the Hilbert space-filling curve ordering, generated from a tensor product formula, is used to rearrange the pixels of medical images. We implement four lossless encoding schemes, run-length encoding, LZ77 coding, LZW coding, and Huffman coding, along with the Hilbert space-filling curve ordering. Combinations of these encoding schemes are also implemented to study the effectiveness of various compression methods. In addition, differential encoding is applied to the medical images to study the effect of a different image representation on the above encoding schemes. In the paper, we report testing results of compression ratio and performance evaluation. The experiments show that the pre-processing operation of differential encoding followed by the Hilbert space-filling curve ordering, with the compression method of LZW coding followed by Huffman coding, gives the best compression result.
Keywords: Lossless compression; Hilbert space-filling curve; Run-length encoding; LZ77 coding; LZW coding; Huffman coding; Differential encoding
1. Introduction
Modern medical diagnostics are often based on X-ray computerized tomography (CT) and magnetic resonance imaging (MRI) techniques. The raw data delivered by such imaging devices usually take several megabytes of disk space. The diagnostic images for radiologic interpretation must be efficiently stored and transmitted to physicians for future medical or legal purposes. Digital medical image processing generates large and data-rich electronic files. To speed up electronic transmission and to minimize computer storage space, medical images are often compressed into files of smaller size. Compressed medical images must preserve all the original data details when they are restored for image presentation. That is, medical image compression and decompression must be lossless.
∗ Corresponding author. Tel.: +886 4 24517250x3769.
E-mail addresses: [email protected] (J.-Y. Liang), [email protected] (C.-S. Chen), [email protected] (C.-H. Huang), [email protected] (L. Liu).
doi:10.1016/j.compmedimag.2007.11.002
Data compression is a technique with a long history in human activity. Abbreviations and other devices for shortening the length of transmitted messages have been used in every human society. Before Shannon, the practice of data compression was informal. It was Shannon who first created a formal intellectual discipline for data compression. A remarkable outcome of Shannon's formalization of data compression has been the use of sophisticated theoretical ideas [24]. For example, the JPEG standard, proposed in the 1980s and in use for transmitting and storing images, uses discrete cosine transformation, quantization, run-length coding, and entropy coding [20]. In this paper, the concept of information theory is reviewed. Several coding schemes, including run-length coding, Huffman coding, LZ77 coding, and LZW coding, are used to compress medical images. The entropies of CT images before and after pre-processing are shown. The major goal of these encoding schemes is to lower the entropy of the compressed image, i.e., to improve the compression ratio. Run-length encoding is a simple and effective compression scheme [6]. An example of a real-world use of run-length coding is the ITU-T T.4 (G3 fax) standard for facsimile data compression. This is the standard for all home and business facsimile machines used over regular
phone lines. The basic idea is to identify strings of adjacent messages of equal value and replace them with a single occurrence along with a count. The Huffman coding scheme first scans the source data to obtain a probability model and then generates a coding tree from that model [8]. The Huffman coding scheme is probably the most widely used one in data and image compression programs, such as GZIP and JPEG. The Lempel-Ziv coding schemes, known as the LZ77 and LZW coding schemes, are dictionary based [15,16]. A dictionary is created as the source data is parsed. The LZW coding scheme is often enhanced with a probability model, as in the Huffman coding scheme. The Hilbert space-filling curve is a one-to-one mapping between an N-dimensional space and the one-dimensional space [7]. The Hilbert space-filling curve scans all points of the N-dimensional space without crossing itself. Also, the curve preserves neighborhoods of points as much as possible. Since David Hilbert presented the Hilbert space-filling curve in 1891, there have been several research works on formal specifications of operational or functional models. A mathematical history of Hilbert space-filling curves is given by Sagan [22]. Butz proposes an algorithm to compute the mapping function of Hilbert space-filling curves using bit operations [2,3]. Quinqueton and Berthod propose a recursive procedure for computing all addresses of the scanning path [21]. Kamata et al. propose a non-recursive algorithm for the N-dimensional Hilbert space-filling curve using look-up tables [11,12]. Lin et al. present a tensor product based algebraic formulation of two-dimensional and three-dimensional Hilbert space-filling curves [4,18]. The tensor product formulas are also used to generate Hilbert space-filling curves in the C programming language. Jagadish analyzes the clustering properties of Hilbert space-filling curves [9]. He shows that the Hilbert space-filling curve achieves the best clustering, i.e., it is the best space-filling curve in minimizing the number of clusters. Moon et al. provide closed-form formulas for the number of clusters required by a given query region of arbitrary shape for Hilbert space-filling curves [19]. Hilbert space-filling curves have been applied broadly to image compression. In most of these works, image pixels initially stored in the row-major order are reordered using the Hilbert space-filling curve permutation and then a given procedure is employed to perform data compression. Unless an image consists of horizontal or vertical stripes, the Hilbert space-filling curve ordering places pixels of similar color in adjacent areas consecutively, i.e., an image compresses more effectively under the Hilbert space-filling curve order than under the row-major order. Lempel and Ziv derive a lower bound on the compression ratio of an image scanned in the Hilbert space-filling curve order [17]. Kamata et al. use a simple zero-order interpolation compression algorithm to compress gray images and color images [10,13]. Before the application of the zero-order interpolation, the pixels of an image are rearranged into the Hilbert space-filling curve order. Experimental results are reported. The compressed images lose some content details irreversibly. However, according to their study, the quality of the compressed images is close to that of JPEG images. Khuri and Hsu
propose a run-length encoding algorithm with Hilbert space-filling curve ordering [14]. Experiments show that storage can be reduced by up to 60% under the Hilbert space-filling curve order compared to the row/column major order for images with high spatial homogeneity [1,5].
In this paper, we investigate lossless compression of medical images using the Hilbert space-filling curve order. The programs generated from tensor product formulas [18] are used to perform pixel reordering. Four coding schemes, run-length coding, Huffman coding, LZ77 coding, and LZW coding, are tested on medical images. These schemes are also combined in image compression. In addition to the coding schemes, images are pre-processed with order one to order three differentiation. The paper is organized as follows. An overview of lossless compression methods is given in Section 2. The Hilbert space-filling curve is described in Section 3. Programs generated from tensor product formulas for both the Hilbert space-filling curve ordering and its inverse are included in Section 3; the inverse Hilbert space-filling curve order is needed for data decompression. Experiments and performance evaluation are presented and discussed in Section 4. Concluding remarks and future works are given in Section 5.
2. Lossless data compression
Compression is a process used to reduce the physical size of information. A compression process has many goals: to store more information on the same media, i.e., to reduce disk space usage and/or transmission time and bandwidth on a network, and later to reuse the data. By a compression technique, we actually refer to two dual processes: construction of the compressed data and reconstruction of the original data. Based on the result of reconstruction, data compression schemes can be divided into two classes: lossless compression schemes and lossy compression schemes [23]. Compression ratio is defined as the ratio of the number of bytes of the original data to the number of bytes of the compressed result, including overhead bytes. A compression technique is considered optimum when the information content of the compressed data is close to the entropy of the original image. Shannon shows that any definition of entropy satisfying his assumptions has the form −K ∑_{i=1}^{n} p(i) log p(i), where K is a constant fixing the unit of measurement [24]. This implies that the efficiency of a source alphabet with n symbols can be defined simply as being equal to its entropy.
Lossless compression implies no loss of information. If data has been losslessly compressed, the original data can be totally recovered from the compressed data. Lossless compression is used for applications that cannot tolerate any difference between the original and reconstructed data. Lossless compression is generally implemented using one of two different types of modeling: statistical and dictionary-based. A statistical model encodes a single symbol at a time using the probability of the symbol's appearance: it reads a symbol, calculates its probability, then generates its compressed code. An example of this compression scheme is Huffman coding.
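The entropy values reported in Section 4 can be reproduced with a short routine along the following lines. This sketch is ours, not part of the original paper; it assumes 8-bit symbols and uses base-2 logarithms, i.e., K is chosen so that the entropy is expressed in bits per symbol.

#include <math.h>
#include <stddef.h>

/* Shannon entropy, in bits per symbol, of an 8-bit data stream:
   build a histogram, then evaluate -sum p(i) log2 p(i) over the
   symbols with nonzero probability. */
double entropy(const unsigned char *data, size_t n)
{
    size_t count[256] = {0}, i;
    double h = 0.0;
    for (i = 0; i < n; i++)
        count[data[i]]++;
    for (i = 0; i < 256; i++)
        if (count[i] > 0) {
            double p = (double)count[i] / (double)n;
            h -= p * log2(p);
        }
    return h;
}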
A dictionary based compression technique uses a different concept. It reads input data and looks for groups of symbols that appear in a dictionary. If a string match is found, a pointer or index into the dictionary is output instead of the code for the symbols. Intuitively, the longer the match, the better the compression. Static dictionary and adaptive dictionary schemes are the two kinds of dictionary schemes. The genesis of most modern adaptive dictionary schemes can be traced to two landmark papers written by Lempel and Ziv in 1977 and 1978 [15,16]. The coding schemes based on the 1977 paper are named LZ77 schemes; those based on the 1978 paper are named LZ78 schemes. For example, LZW is one of the LZ78 schemes. In the rest of this section, we briefly review the Huffman, LZ77, and LZW coding schemes. We also review a simple coding scheme: run-length coding. In data compression, some pre-processing steps may be performed before the application of a coding scheme. One such pre-process, called differential coding, computes the difference of adjacent data elements. The review therefore also includes the differential coding scheme.
2.1. Huffman coding algorithm
A Huffman code is an optimal prefix code generated from a set of probabilities [8]. It first scans the source data to obtain a probability model; it then generates a coding tree using the obtained probability model. Generation of the prefix-code tree proceeds as follows:
1. Start with a forest of trees, one for each data element. Each tree contains a single vertex with weight Wi = Pi, where Pi is the probability of the data element on the vertex.
2. Repeat until only a single tree remains:
2.1. Select the two trees with the lowest weight roots, say, W1 and W2.
2.2. Combine them into a single tree by adding a new root with weight W1 + W2 and making the two trees its children. It does not matter which subtree is the left or right child, but the convention is to put the lower weight root on the left if W1 ≠ W2.
When building the prefix-code tree, we must consider the conditions for an optimal variable-length binary code [23]. We list these conditions as follows:
1. Given any two letters ai and aj, if P[ai] ≥ P[aj], then li ≤ lj, where li is the number of bits in the codeword for ai.
2. The two least probable letters have codewords with the same maximum length lm.
3. In the tree corresponding to the optimum code, there must be two branches stemming from each intermediate node.
4. We can change an intermediate node into a leaf node by combining all the leaves descending from it into a composite word of a reduced alphabet. If the original tree is optimal for the original alphabet, the reduced tree is also optimal for the reduced alphabet.
In this paper, we build an optimal prefix-code tree when applying the Huffman coding scheme.
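As an illustration of steps 1–2 above, the following sketch builds the prefix-code tree in C. It is our reconstruction, not the authors' program; a linear scan over the forest (assuming nsym ≥ 1) stands in for a priority queue to keep the sketch short.

#include <stdlib.h>

struct node {
    double weight;
    int symbol;                  /* valid for leaves only; -1 otherwise */
    struct node *left, *right;
};

static struct node *new_node(double w, int sym, struct node *l, struct node *r)
{
    struct node *t = malloc(sizeof *t);
    t->weight = w; t->symbol = sym; t->left = l; t->right = r;
    return t;
}

/* weights[i] is the probability of symbol i; returns the tree root. */
struct node *huffman_tree(const double *weights, int nsym)
{
    struct node **forest = malloc(nsym * sizeof *forest);
    struct node *root;
    int i, n = nsym;
    for (i = 0; i < nsym; i++)
        forest[i] = new_node(weights[i], i, NULL, NULL);
    while (n > 1) {
        int lo1 = 0, lo2 = 1, j;            /* indices of the two lightest roots */
        if (forest[lo2]->weight < forest[lo1]->weight) { lo1 = 1; lo2 = 0; }
        for (j = 2; j < n; j++) {
            if (forest[j]->weight < forest[lo1]->weight) { lo2 = lo1; lo1 = j; }
            else if (forest[j]->weight < forest[lo2]->weight) { lo2 = j; }
        }
        /* merge, putting the lower weight root on the left */
        forest[lo1] = new_node(forest[lo1]->weight + forest[lo2]->weight,
                               -1, forest[lo1], forest[lo2]);
        forest[lo2] = forest[--n];          /* drop the consumed root */
    }
    root = forest[0];
    free(forest);
    return root;
}

Codewords are then read off the tree: following a left branch emits a 0 and a right branch emits a 1, so each leaf's path from the root is its prefix-free codeword.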
2.2. LZ77 coding algorithm
The LZ77 algorithm compresses by building a dictionary of previously seen strings consisting of groups of characters of varying lengths. The LZ77 algorithm and its variants use a sliding window that starts from the beginning of the data and moves along with the cursor. The window can be divided into two parts: the part before the cursor, called the dictionary, and the part starting at the cursor, called the lookahead buffer. The sizes of these two parts are parameters of the program and are fixed during execution of the algorithm. At the highest level, the algorithm can be described as follows:
1. Find the longest match of a string starting at the cursor and completely contained in the lookahead buffer to a string in the dictionary.
2. Output a triple (P, n, C) containing the position P of the occurrence in the window, the length n of the match, and the next character C past the match.
3. Move the cursor n + 1 characters forward.
We will use the LZ77 coding scheme in the experiments.
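A minimal sketch of these three steps follows, assuming the triples are simply printed rather than bit-packed; the window and lookahead sizes are illustrative parameters, not the values used by the authors.

#include <stdio.h>

#define WIN  4096   /* dictionary: bytes before the cursor */
#define LOOK 16     /* lookahead buffer size */

/* Emit LZ77 triples (P, n, C): P is the match offset back from the
   cursor (0 if no match), n the match length, C the next character. */
void lz77_encode(const unsigned char *data, size_t len)
{
    size_t cur = 0;
    while (cur < len) {
        size_t best_len = 0, best_pos = 0, i;
        size_t start = cur > WIN ? cur - WIN : 0;
        for (i = start; i < cur; i++) {      /* step 1: find longest match */
            size_t l = 0;
            while (l < LOOK - 1 && cur + l < len - 1 &&
                   data[i + l] == data[cur + l])
                l++;
            if (l > best_len) { best_len = l; best_pos = cur - i; }
        }
        /* step 2: output the triple */
        printf("(%zu, %zu, %c)\n", best_pos, best_len, data[cur + best_len]);
        cur += best_len + 1;                 /* step 3: advance the cursor */
    }
}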
2.3. LZW coding algorithm
Unlike LZ77, the LZW coding scheme does not have an existing dictionary. The LZW algorithm reads the data and tries to match as long a sequence of data bytes as possible with an encoded string in the dictionary. The matched data sequence and its succeeding character are grouped together and then added to the dictionary for encoding later data sequences. For an image with n-bit pixels, the compressed code of each pixel occupies n + 1 bits or more. While a smaller compressed code results in a higher compression rate, it also limits the size of the dictionary. For example, a common arrangement uses a 12-bit compressed code for each 8-bit data element. A 12-bit code size allows 4096 entries in the dictionary. If the encoder runs out of space in the dictionary, the traditional LZW encoder must be aborted and a larger compressed code size tried. The initial dictionary is a collection of roots containing all possible values of an n-bit pixel. Let P be a matched string that is empty at the beginning. The LZW compression algorithm starts from the beginning of the original data stream and performs the following steps:
1. Let C be the next character in the data stream.
2. Is string P + C present in the dictionary?
2.1. If yes, P ⇐ P + C (extend P with C).
2.2. If no,
2.2.1. Output the code word P to the code stream.
2.2.2. Add the string P + C to the dictionary.
2.2.3. P ⇐ C (P now contains only the character C).
3. Is it the end of the data stream?
3.1. If no, go to step 2.
3.2. If yes, output the code word P to the code stream.
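A compact sketch of these steps follows. It is ours, not the authors' implementation: dictionary strings are stored explicitly and searched linearly for clarity, whereas a practical encoder would use a trie or (code, character) pairs, and the 12-bit output codes are printed rather than bit-packed.

#include <stdio.h>
#include <string.h>

#define DICT_MAX 4096          /* 12-bit code size */
#define MAX_STR  64            /* longest dictionary string in this sketch */

static unsigned char dict[DICT_MAX][MAX_STR];
static int dict_len[DICT_MAX];
static int dict_n;

static int find(const unsigned char *s, int len)
{
    int i;
    for (i = 0; i < dict_n; i++)
        if (dict_len[i] == len && memcmp(dict[i], s, len) == 0)
            return i;
    return -1;
}

void lzw_encode(const unsigned char *data, size_t n)
{
    unsigned char P[MAX_STR + 1];
    int plen = 0;
    size_t k;
    for (dict_n = 0; dict_n < 256; dict_n++) {   /* roots: all byte values */
        dict[dict_n][0] = (unsigned char)dict_n;
        dict_len[dict_n] = 1;
    }
    for (k = 0; k < n; k++) {
        unsigned char C = data[k];               /* step 1 */
        P[plen] = C;
        if (find(P, plen + 1) >= 0) {            /* step 2 */
            plen++;                              /* step 2.1: P <= P + C */
        } else {
            printf("%d ", find(P, plen));        /* step 2.2.1 */
            if (dict_n < DICT_MAX && plen + 1 <= MAX_STR) {
                memcpy(dict[dict_n], P, plen + 1);   /* step 2.2.2 */
                dict_len[dict_n++] = plen + 1;
            }
            P[0] = C; plen = 1;                  /* step 2.2.3 */
        }
    }
    if (plen > 0)
        printf("%d\n", find(P, plen));           /* step 3.2: flush P */
}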
We implement the LZW coding algorithm with 8-bit pixels and a 12-bit code size.
2.4. Run-length encoding algorithm
The run-length encoding scheme analyzes the data to be compressed by looking for runs of repeated characters. It stores each run as a single character preceded by a number representing how many times the character is repeated in the run. Random data with no runs of repeated characters does not compress well; the algorithm may not achieve any compression at all. For example, the string “ABAB” becomes “1A1B1A1B”, twice the length of the original string; in this case run-length encoding increases the data size. Run-length encoding has the advantage of being simple to implement, but it is incapable of achieving high compression ratios for most data. Although some graphical images may compress well using run-length encoding, most textual data does not contain long runs of repeated characters. The run-length encoding compression algorithm is described in the following steps:
1. Step through the source data from beginning to end, searching for repeated sequences of characters.
2. Build the compressed data string as the source data is scanned.
In this paper, we also compress medical images using the run-length encoding scheme and compare its results with other encoding schemes.
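A sketch of the scheme, assuming counts are written as text in front of each character as in the “1A1B1A1B” example above, is as follows:

#include <stdio.h>

/* Write each run as a count followed by the repeated character,
   e.g. "AAAB" becomes "3A1B".  Runs are capped at 255 so the count
   also fits in one byte in a binary variant of the format. */
void rle_encode(const unsigned char *data, size_t n)
{
    size_t i = 0;
    while (i < n) {
        size_t run = 1;
        while (i + run < n && data[i + run] == data[i] && run < 255)
            run++;
        printf("%zu%c", run, data[i]);
        i += run;
    }
}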
2.5. Differential encoding algorithm
Let X be the source data string and Xi be the ith character of X. A difference string is generated by taking the differences Xi − Xi−1. Let Y be the resulting data string. The differential encoding algorithm is described below:
1. Let Y0 ⇐ X0.
2. Let this ⇐ X1, last ⇐ X0, and i ⇐ 1.
3. Compute the difference, diff ⇐ this − last.
4. If diff ≥ 0 then Yi ⇐ diff else Yi ⇐ diff + 256.
5. Let i ⇐ i + 1.
6. If i is less than the size of the source string then
6.1. last ⇐ this, this ⇐ Xi.
6.2. Go to Step 3.
Note that the integer value 256 is added in Step 4 to adjust diff when it is negative. The arithmetic operations performed in the algorithm are signed operations. With the adjustment of negative diff, the result Yi in Step 4 occupies at most eight bits; hence the differential encoding algorithm does not change the size of the output data. Furthermore, the adjustment allows the differential decoding algorithm to recover the original data. The differential encoding scheme is only a pre-processing procedure of a data compression algorithm, i.e., it must be followed by another encoding scheme such as the Huffman, LZW, LZ77, or run-length encoding scheme.
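The encoder and its inverse can be sketched in a few lines of C; this is our rendering of the steps above, with the modulo-256 adjustment of Step 4 made explicit.

#include <stddef.h>

/* Differential encoding: y[0] = x[0], y[i] = x[i] - x[i-1] (mod 256).
   The subtraction is signed and 256 is added to negative differences
   so every output value fits in one byte. */
void diff_encode(const unsigned char *x, unsigned char *y, size_t n)
{
    size_t i;
    if (n == 0) return;
    y[0] = x[0];                                            /* step 1 */
    for (i = 1; i < n; i++) {
        int diff = (int)x[i] - (int)x[i - 1];               /* step 3 */
        y[i] = (unsigned char)(diff >= 0 ? diff : diff + 256);  /* step 4 */
    }
}

/* The matching decoder recovers x[i] = x[i-1] + y[i] (mod 256). */
void diff_decode(const unsigned char *y, unsigned char *x, size_t n)
{
    size_t i;
    if (n == 0) return;
    x[0] = y[0];
    for (i = 1; i < n; i++)
        x[i] = (unsigned char)((x[i - 1] + y[i]) & 0xFF);
}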
3. Hilbert space-filling curves
Let R^n be an n-dimensional space. Peano, in 1890, discovered the existence of a continuous curve which passes through every point of a closed square. In 1891, Hilbert [7] presented a curve having the space-filling property in R^2, as shown in Fig. 1. The base case of the 2 × 2 Hilbert space-filling curve in Fig. 1(a) is scanned in the order of the four-point Gray permutation. The 4 × 4 Hilbert space-filling curve in Fig. 1(b) is constructed from four copies of 2 × 2 curves, and the 8 × 8 curve in Fig. 1(c) is then
Fig. 1. Hilbert space-filling curves.
constructed from four copies of 4 × 4 curves. In the recursive construction, the subcurves are connected in a given order, i.e., the order of the four-point Gray permutation. Also, the orientation of each subcurve must be adjusted to fit the connection order. The Hilbert space-filling curve visits all points on a continuous plane if the curve formation algorithm is iterated indefinitely. For a finite space, the Hilbert space-filling curve is a curve on a 2^n × 2^n grid that visits consecutive neighboring points without crossing itself. The 2^n × 2^n Hilbert space-filling curve is recursively constructed from 2^(n−1) × 2^(n−1) Hilbert space-filling curves. A recursive tensor product formula of the Hilbert space-filling curve, mapping two-dimensional data from the row-major ordering to the Hilbert space-filling curve ordering, is presented in [18]. This recursive tensor product formula can be manipulated into an iterative formula. Operations of a tensor product formula can be mapped into program constructs of high-level programming languages. Therefore, the recursive and iterative tensor product formulas of Hilbert space-filling curves are translated into C programs. We use the recursive program to reorder pixels of medical images before applying the compression algorithms described in Section 2.
In addition to compression, an image must be decompressed to restore its original format. In order to perform image decompression, the inverse mapping of the Hilbert space-filling curve ordering must be used. With the recursive tensor product formula of the Hilbert space-filling curve, it is possible to derive the formula of the inverse Hilbert space-filling curve [18] and to generate its corresponding C program.
The construction of two-dimensional Hilbert space-filling curves is shown in Fig. 1. The recursive program for converting an image of 2^n × 2^n points from the row-major order storage to the Hilbert space-filling curve order is illustrated in the code below. The program is written in the C programming language. Initially, the image points are stored in the row-major order in an array a[] of size 4^n. In the original listing, Lines 6 to 8 correspond to the 2 × 2 base case, which accepts an array of four elements and performs the Gray permutation swapping the third and the fourth elements. For the cases of n > 1, the input data is reallocated into 2 × 2 blocks of block size 2^(n−1) × 2^(n−1), and the 2 × 2 Gray permutation is used to permute the blocks. Each block is aligned according to a specific orientation to ensure the correct entry and exit points of the curve within the block. If the blocks are indexed as (0, 0), (0, 1), (1, 0), and (1, 1), the alignment orientations are the transposition, the identity, the identity, and the anti-diagonal transposition, as implemented in Lines 12, 13, 14, and 15, respectively. In Line 16, the intermediate result in array b is copied to array a. Note that the copy operation can simply be implemented as a pointer manipulation. Finally, the recursive construction of the Hilbert space-filling curves of the four blocks is implemented as recursive calls to hilbert() in Lines 17–20.
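The listing itself did not survive in this copy of the paper, so the following is a hypothetical reconstruction that follows the description above (gather the four blocks in Gray-permutation order with the stated alignments, copy back, and recurse); its line positions do not match the line numbers cited in the text, and the original program generated from the tensor product formula [18] may differ in detail.

#include <string.h>

/* Reorder a side-by-side image (side = 2^n, row-major in a[]) into
   Hilbert space-filling curve order.  b[] is caller-supplied scratch
   space of the same size. */
void hilbert(unsigned char *a, int side, unsigned char *b)
{
    int half = side / 2, q = half * half, r, c;
    if (side < 2) return;
    for (r = 0; r < half; r++)
        for (c = 0; c < half; c++) {
            /* first block: transposition */
            b[0 * q + r * half + c] = a[c * side + r];
            /* second and third blocks: identity */
            b[1 * q + r * half + c] = a[r * side + half + c];
            b[2 * q + r * half + c] = a[(half + r) * side + half + c];
            /* fourth block: anti-diagonal transposition */
            b[3 * q + r * half + c] = a[(2 * half - 1 - c) * side + (half - 1 - r)];
        }
    memcpy(a, b, (size_t)side * side);   /* copy intermediate result back */
    hilbert(a + 0 * q, half, b);         /* recurse on the four blocks */
    hilbert(a + 1 * q, half, b);
    hilbert(a + 2 * q, half, b);
    hilbert(a + 3 * q, half, b);
}

For a 512 × 512 image the call is hilbert(img, 512, scratch) with a 262,144-byte scratch buffer. The inverse ordering needed for decompression undoes the two phases in reverse order: invert the recursion first, then scatter the blocks back.

void ihilbert(unsigned char *a, int side, unsigned char *b)
{
    int half = side / 2, q = half * half, r, c;
    if (side < 2) return;
    ihilbert(a + 0 * q, half, b);        /* invert the recursion first */
    ihilbert(a + 1 * q, half, b);
    ihilbert(a + 2 * q, half, b);
    ihilbert(a + 3 * q, half, b);
    for (r = 0; r < half; r++)
        for (c = 0; c < half; c++) {     /* then scatter the blocks back */
            b[c * side + r] = a[0 * q + r * half + c];
            b[r * side + half + c] = a[1 * q + r * half + c];
            b[(half + r) * side + half + c] = a[2 * q + r * half + c];
            b[(2 * half - 1 - c) * side + (half - 1 - r)] = a[3 * q + r * half + c];
        }
    memcpy(a, b, (size_t)side * side);
}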
4. Experiments and performance evaluation
The goal of this paper is to test the effect of the Hilbert space-filling curve ordering and to compare various lossless compression algorithms. These algorithms are applied with and without the Hilbert space-filling curve ordering. The intention is to verify the
effectiveness of the Hilbert space-filling curve ordering in lossless medical image compression. Like differential encoding, the Hilbert space-filling curve ordering is not a compression scheme but a pre-processing procedure. The experiments are carried out on a personal computer with a Pentium III 850 MHz processor and 384 MB of memory running Microsoft Windows 2000. The programs are written in the C language and are developed using Microsoft Visual C++ 6.0. JPEG-LS, a lossless compression standard published jointly by ISO/ITU, is also
tested and compared. The JPEG-LS program is shareware published by the Signal Processing and Multimedia Group (SPMG). The images tested are the 40 CT images shown in Fig. 2 (a1) to (a40). Each image (8 bit/pixel, dimensions 512 × 512 pixels) stored in the bitmap format occupies 263,222 bytes. First, we measure the average entropies of the images. The average entropy of the original CT images is 3.381574. The average entropy of the images rearranged into the Hilbert space-filling curve order and then processed with the differential scheme is
Fig. 2. CT medical images.
Table 1
Average image size of each compression method and pre-processing operation

          None      (1)       (2)       (3)       (4)       (5)       (6)       (7)
None      263,222   111,888   103,709   97,312    118,354   108,223   93,170    169,533
(a)       263,222   112,266   101,230   98,242    118,354   108,473   92,298    210,914
(b)       263,222   110,868   83,915    93,360    108,189   98,869    77,827    180,519
(c)       263,222   113,614   86,869    92,125    110,398   102,156   80,248    220,065
(d)       263,222   111,300   81,535    94,988    108,189   99,145    76,931    203,540
2.715905. The average entropy of the CT images processed using the differential scheme and then rearranged into the Hilbert space-filling curve order is 2.626565. We observe that the CT images processed using the differential scheme followed by the rearrangement into the Hilbert space-filling curve order have the lowest entropy.
In the experiments, we test various encoding schemes, including run-length encoding, LZ77 coding, LZW coding, and Huffman coding, and their combinations. Along with the encoding schemes, differentiation and Hilbert space-filling curve ordering are applied as pre-processing operations. The experimental cases are summarized as follows:
(1) the CT images are compressed using LZ77 coding;
(2) the CT images are compressed using LZW coding;
(3) the CT images are compressed using run-length encoding;
(4) the CT images are compressed using Huffman coding;
(5) the CT images are compressed using LZ77 coding followed by Huffman coding;
(6) the CT images are compressed using LZW coding followed by Huffman coding;
(7) the CT images are processed using JPEG-LS alone.
For each of the above encoding schemes, the following pre-processing operations are applied before image compression:
(a) the pixels of a CT image are rearranged into the Hilbert space-filling curve order;
(b) the CT images are processed using the differential scheme;
(c) the CT images are rearranged into the Hilbert space-filling curve order and then processed with the differential scheme;
(d) the CT images are processed using the differential scheme and then rearranged into the Hilbert space-filling curve order.
All the compression methods are tested on the 40 CT images shown in Fig. 2. The average size and average execution time over the 40 images are reported for each compression method. Table 1 shows the average image size for each of the seven encoding schemes (1)–(7) and each of the four pre-processing operations (a)–(d). The first data column, marked “none”, means that no encoding scheme is applied. The first data row, marked “none”, means that no pre-processing operation is applied. All the image sizes include the file header. Note that all the data in the first column equal the original image size because pre-processing operations alone do not reduce the image size. Table 2 gives the average bit-per-pixel
Table 2
Bit-per-pixel of each compression method and pre-processing operation

        None   (1)    (2)    (3)    (4)    (5)    (6)    (7)
None    8.03   3.41   3.16   2.97   3.61   3.30   2.84   5.17
(a)     8.03   3.43   3.09   3.00   3.61   3.31   2.82   6.44
(b)     8.03   3.38   2.56   2.85   3.30   3.01   2.38   5.51
(c)     8.03   3.47   2.65   2.81   3.37   3.12   2.45   6.72
(d)     8.03   3.40   2.49   2.90   3.30   3.03   2.35   6.21
of the same compression methods and pre-processing operations. Let s be an image size in bytes; the bit-per-pixel is calculated as s × 8/(512 × 512). For example, case (6d) gives 76,931 × 8/262,144 ≈ 2.35 bits per pixel. The experiments show that case (6d), which applies the differential scheme followed by Hilbert space-filling curve ordering as pre-processing operations and then compresses using LZW coding followed by Huffman coding, has the best compression result. The best compression ratio achieves a size reduction of 71.77%. The experiments reveal that Hilbert space-filling curve ordering alone may not help to reduce the compressed image size. However, with the differential pre-processing, the source data are transformed into another form which has better locality. In this case, Hilbert space-filling curve ordering does play its role in enhancing the compression effect. Finally, we would like to point out that JPEG-LS does better without any pre-processing operation; however, its compression ratio is not as good as those of the other encoding schemes.
We also report execution times for image compression and decompression of the various encoding schemes and pre-processing operations in Table 3. In Table 3, two execution times are given in each entry: the first value denotes the execution time for image compression and the other the execution time for image decompression. It is worth noting that it takes about 0.156 s to rearrange pixels into the Hilbert space-filling curve order and about 0.191 s to restore pixels from the Hilbert space-filling curve order back to the row-major order.
Table 3
Execution time for CT image compression and decompression (seconds); each entry gives compression time/decompression time
        (1)           (2)           (3)           (4)           (5)           (6)
None    0.328/0.048   0.082/0.051   0.043/0.042   0.105/0.160   0.408/0.196   0.155/0.181
(a)     0.482/0.237   0.238/0.242   0.198/0.234   0.263/0.353   0.565/0.387   0.315/0.377
(b)     0.341/0.056   0.094/0.058   0.053/0.049   0.115/0.168   0.419/0.202   0.167/0.191
(c)     0.497/0.248   0.250/0.251   0.210/0.241   0.271/0.361   0.576/0.396   0.325/0.385
(d)     0.497/0.248   0.250/0.251   0.210/0.241   0.272/0.361   0.576/0.396   0.325/0.385
Differential and reverse differential operations take only about 0.010 s. The combination of Hilbert space-filling curve ordering and differential pre-processing takes approximately the sum of the two individual execution times. The execution time overhead for Hilbert space-filling curve ordering ranges from 38.24% to 362.79%. The smallest overhead occurs for the compression method combining the LZW and Huffman coding schemes; the largest occurs for the run-length encoding scheme. If we consider the best compression method, the differential scheme with Hilbert space-filling curve ordering as pre-processing and LZW with Huffman coding as the compression scheme, the overhead is about 41.17%.
5. Conclusions and future works
This paper studies the effectiveness of the Hilbert space-filling curve in lossless medical image compression. We rearrange the pixels of CT images into the Hilbert space-filling curve order before applying each of four encoding schemes: run-length encoding, LZ77 coding, LZW coding, and Huffman coding. Combinations of these encoding schemes are also tested. In the experiments, LZW coding followed by Huffman coding yields the best compression result. Also, by measuring the entropy, we can verify that lower entropy gives a better compression ratio. However, the pre-processing operation of Hilbert space-filling curve ordering alone does not make a major improvement. The LZW coding scheme is very sensitive to the size of the table used in the algorithm. The CT images are not compressed better by enlarging the table size; changing the word size in the LZW scheme to match the size of frequently repeated strings usually improves the compression result.
The differential pre-processing operation is applied to the CT images. Since most pixels of a CT image are close to either very bright or very dark gray levels, differential pre-processing converts pixels to differences of closer values. This yields more data elements with similar values and improves the compression ratio. Following the differential pre-processing operation, Hilbert space-filling curve ordering enhances the locality of the differences, which makes the encoding schemes much more effective. We also test CT image compression with the differential operation applied multiple times; it does not improve the compression result. Non-medical images, such as colored human portraits and scenery images, are also tested with the same pre-processing operations and encoding schemes. For these images, the Hilbert space-filling curve ordering gives much better results than for CT images. However, in practice, such images often do not require lossless compression. In future work, we will also study the application of the Hilbert space-filling curve ordering to lossy compression methods.
The Hilbert space-filling curve presented in the paper rearranges source data of 2^n × 2^n points. Many images do not satisfy this restriction. One simple workaround is to extend an image with padding pixels; a real solution is to develop space-filling curves of arbitrary size. We will work on the design of space-filling curves for rectangular spaces and sizes other than powers of two.
References
[1] Abel DJ, Mark DM. A comparative analysis of some 2-dimensional orderings. Int J Geogr Inform Syst 1990;4(1):21–31.
[2] Butz AR. Space filling curves and mathematical programming. Inform Contr 1968;12(4):314–30.
[3] Butz AR. Convergence with Hilbert's space filling curve. J Comput Syst Sci 1969;3(2):128–46.
[4] Chen C-S, Lin S-Y, Huang C-H. Algebraic formulation and program generation of three-dimensional Hilbert space-filling curves. In: Proceedings of the 2004 International Conference on Imaging Science, Systems, and Technology. 2004. p. 254–60.
[5] Gaede V, Günther O. Multidimensional access methods. ACM Comput Surveys 1998;30(2):170–231.
[6] Golomb SW. Run-length encodings. IEEE Trans Inform Theory 1966;IT-12:140–9.
[7] Hilbert D. Über die stetige Abbildung einer Linie auf ein Flächenstück. Mathematische Annalen 1891;38:459–60.
[8] Huffman DA. A method for the construction of minimum redundancy codes. In: Proceedings of the IRE. 1952. p. 1098–101.
[9] Jagadish HV. Linear clustering of objects with multiple attributes. In: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data. 1990. p. 332–42.
[10] Kamata S, Bandoh Y, Nishi N. Color image compression using a Hilbert scan. In: Proceedings of the Fourteenth International Conference on Pattern Recognition, vol. 3. 1998. p. 1575–8.
[11] Kamata S, Eason RO, Bandou Y. A new algorithm for N-dimensional Hilbert scanning. IEEE Trans Image Process 1999;8(7):964–73.
[12] Kamata S, Niimi M, Eason RO, Kawaguchi E. An implementation of an N-dimensional Hilbert scanning algorithm. In: Proceedings of the Ninth Scandinavian Conference on Image Analysis. 1995. p. 431–40.
[13] Kamata S, Niimi M, Kawaguchi E. A gray image compression using a Hilbert scan. In: Proceedings of the Thirteenth International Conference on Pattern Recognition, vol. 2. 1996. p. 905–9.
[14] Khuri S, Hsu H-C. Interactive packages for learning image compression algorithms. In: Proceedings of the Fifth Annual Conference on Innovation and Technology in Computer Science Education. 2000. p. 73–6.
[15] Ziv J, Lempel A. A universal algorithm for sequential data compression. IEEE Trans Inform Theory 1977;23(3):337–43.
[16] Ziv J, Lempel A. Compression of individual sequences via variable-rate coding. IEEE Trans Inform Theory 1978;24(5):530–6.
[17] Lempel A, Ziv J. Compression of two-dimensional data. IEEE Trans Inform Theory 1986;32(1):2–8.
[18] Lin S-Y, Chen C-S, Liu L, Huang C-H. Tensor product formulation for Hilbert space-filling curves. In: Proceedings of the 2003 International Conference on Parallel Processing. 2003. p. 99–106.
[19] Moon B, Jagadish HV, Faloutsos C, Saltz JH. Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Trans Knowl Data Eng 2001;13(1):124–41.
[20] Pennebaker WB, Mitchell JL. JPEG still image data compression standard. Van Nostrand Reinhold; 1993.
[21] Quinqueton J, Berthod M. A locally adaptive Peano scanning algorithm. IEEE Trans Pattern Anal Mach Intell 1981;3(4):403–12.
[22] Sagan H. Space-filling curves. Springer-Verlag; 1994.
[23] Sayood K. Introduction to data compression. 2nd ed. Morgan Kaufmann; 2000.
[24] Shannon CE. A mathematical theory of communication. Bell Syst Tech J 1948;27:379–423, 623–56.
Jan-Yie Liang received the B.S. and M.S. degrees in information engineering and computer science from Feng Chia University, Taichung, Taiwan, in 1985 and 2004, respectively. He is currently a Ph.D. student at the Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan.
His research interests include image processing and software engineering. Chih-Sheng Chen received the B.S. and M.S. degrees in electrical engineering from the National Central University, Taoyuan, Taiwan, in 1985 and 1987, respectively. He is currently a Ph.D. candidate at the Department of Information
Engineering and Computer Science, Feng Chia University, Taichung, Taiwan. His research interests include image processing, algorithms, and software engineering.
Chua-Huang Huang received the B.S. degree in mathematics from Fu-Jen University, Taipei, Taiwan, in 1974, the M.S. degree in computer science from the University of Oregon, Eugene, Oregon, in 1979, and the Ph.D. degree in computer science from the University of Texas at Austin, Austin, Texas, in 1987. From 1979 to 1982, Dr. Huang worked as an assistant researcher in the Telecommunication Laboratories, Ministry of Communication, R.O.C., where he was the project manager of a Chinese terminal project. Dr. Huang was an assistant professor in the Department of Computer and Information Science, the Ohio State University, from 1987 to 1993 and an associate professor from 1993 to 1997. From 1997 to 2000, he was a professor at the Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien, Taiwan. Since 2000, he has been a professor at the Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan. Dr. Huang's research interests include algorithms, software engineering, and RFID.
Li Liu received the B.S. degree in mathematics from Fu-Jen University, Taipei, Taiwan, in 1989, and the Ph.D. degree in computer science and information engineering from National Taiwan University in 1995. Dr. Liu has served as Vice Superintendent of Medical Information, Taipei Medical University Hospital, since 2003. There he has been in charge of the E-Hospital development and deployment of Taipei Medical University Hospital, including: the RFID Enabling Medicare Service Platform, the Intelligent Community Healthcare Network under inter-domain PKI infrastructures, Hypermedia Medical Resources Data Warehousing, the E-Hospital Clinical Info-station, E-Hospital Patient Profile, E-Hospital Intelligence System, E-Hospital CRM, and E-Hospital EBM. In 2006, Dr. Liu was elected chairman of the Healthcare Informatics and Management Committee, Taiwan Hospital Association. Since 1996, Dr. Liu has been an associate professor at the Graduate Institute of Medical Informatics, Taipei Medical University. Also, starting in 2004, Dr. Liu has been an adjunct professor at the National Defense Medical Center, National Defense University. Dr. Liu's research interests and expertise include sensor networks, pervasive computing, bio-informatics, mobile computing, grid computing, security, and image processing.