Computerized Medical Imaging and Graphics 25 (2001) 173–185
www.elsevier.com/locate/compmedimag
State-of-the-art techniques for lossless compression of 3D medical image sets

W. Philips a,*, S. Van Assche b, D. De Rycke a, K. Denecker b

a Department TELIN, Ghent University, Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium
b Department ELIS-MEDISIP, Ghent University, Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium

Received 20 April 2000
Abstract

This paper explains the basic principles of lossless two-dimensional (2D) and three-dimensional (3D) image coding at a high level of abstraction. It also discusses a new inter-frame technique for lossless video coding based on intra-frame prediction and inter-frame context modelling. The performance of this technique is compared to that of state-of-the-art 2D coders on CT and MRI data sets from the Visible Human Project. The results show that the inter-frame technique outperforms state-of-the-art intra-frame coders, i.e. Calic and JPEG-LS. The improvement in compression ratio is significant in the case of CT data but is rather small in the case of MRI data. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Lossless compression; 3D medical images; Visible Human Project
1. Introduction

Recent advances in digital technology have caused a huge increase in the acquisition and processing of three-dimensional (3D) image data. For instance, at the Ghent University Hospital, more than 10 Gbyte of medical image data is produced every week. The bulk of this data is produced in the Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) departments; imaging modalities such as Positron Emission Tomography (PET) and Single Photon Emission Computed Tomography (SPECT) account for only about 5% of this data, whereas angiographic and radiological images are still processed and stored by analogue means (e.g. video cassette recorders and film, respectively).

In order to cope with large digital storage and transmission requirements, data compression is necessary. In particular, compression allows more cost-effective utilisation of network bandwidth and storage capacity. Perhaps more importantly, compression may help to postpone the acquisition of new storage devices or networks when these reach their maximal capacity. Finally, compression facilitates the transport of medical images, e.g. from the hospital to a
private medical practice on removable storage media such as zip-drives and CD-ROMs, because fewer disks are needed and because in some cases the data can be written and read faster.

We must distinguish between lossless techniques, which do not alter the image data, and lossy techniques, which introduce some distortion in order to achieve much larger data compression. Although lossy techniques achieve higher compression ratios than lossless techniques, the latter are usually still preferred in medical environments. There are several reasons for this:

• Although it is easy to keep the loss of image quality introduced by lossy techniques below an acceptable level for most images, there is no guarantee that this is the case for any image that may be encountered. Moreover, it is not easy to reach a consensus on how large a quality loss is acceptable [1].
• Potential legal problems may arise in cases where a diagnosis that is performed on an original image is later contested in a malpractice suit and where only a compressed version of the image is available in court.

Many lossless data compression techniques suitable for image coding exist, but five major categories can be distinguished:

• General-purpose techniques. These have been developed for compressing text files, software and other data typically encountered in computer applications. They do not exploit
specific characteristics of images (such as correlation between neighbouring pixels' grey values) but, despite this fact, often yield a substantial data compression [2,3]. The most important examples are Gzip (based on Lempel–Ziv 1977 coding) [4], Bzip (based on block sorting [5]), and the less well-known Stat (based on context modelling) [6].
• 2D image coding techniques. These techniques are specifically designed and optimised for coding images and therefore should be able to achieve higher compression ratios than general-purpose techniques. The most promising state-of-the-art lossless image coding techniques include the new 2D image coding standard JPEG-LS [7], BTPC [8], Calic [9,10] and techniques based on integer wavelet transforms [11,12].
• Multi-spectral image coding techniques. Multi-spectral images actually consist of several images, each acquired in a different spectral band. They occur, e.g., in remote sensing applications. As these images are not so relevant here, we will not discuss the corresponding compression techniques.
• 3D image coding techniques. These techniques are extensions of the previous category of 2D techniques to sets of images. They tend to be developed and optimised for medical image data, because medical imaging is one of their most important application areas. Comparatively few 3D coders have been described in the literature. Some important examples include integer wavelet based techniques [13], the techniques based on inter-frame context modelling described in Ref. [14] and the inter-frame version of Calic [15]. This last technique was developed for coding multi-spectral images and might be suitable for compression of 3D medical images, but this remains to be verified by experimental studies.
• Video coding techniques. A video sequence is similar to a 3D image in that it consists of a number of 2D images ("frames"). As such, video could be compressed by a 3D image coder and vice versa. However, video compression poses some additional requirements, related to the need for real-time decompression at full speed and with small delay. As we will see later, these requirements are not easily satisfied by "general" 3D techniques and they somewhat restrict the compression ratios that can be achieved. On the other hand, the high speed of video coders and decoders makes them attractive for general 3D image compression. Lossless video coding techniques are still very rare. The ones that have been proposed in the literature are often basically 2D techniques that were modified to exploit some information from the previously encoded frame(s): inter-frame Calic and the technique in Ref. [14] combine the intra-frame prediction and modelling steps that typically occur in 2D techniques with inter-frame context modelling. On the other hand, the technique in Ref. [16] adapts a predictive 2D image coder by providing several possible predictors and encoding each pixel using the predictor that performed best at the same spatial position in the previous frame. This technique also uses motion compensation.
In this paper, we first present a brief summary of the state of the art concerning lossless 2D image coders. It is neither our aim to explain these coders in detail, nor to compare their performances. Indeed, there would be little point in doing this as some excellent papers (e.g. [2,3]) have covered these aspects. Instead, we explain the general, common principles shared by such techniques at a relatively high level of abstraction. The explanation is aimed at interested but non-specialised users of such techniques. In Section 3, we describe some recent lossless compression techniques for 3D images. We focus our attention on video coding techniques because these have lower computational requirements than general 3D techniques and are therefore the most likely to gain acceptance in the near future. In Section 4, we present quantitative data on the improvements in compression ratio that can be achieved by switching from intra-frame (2D) compression techniques to inter-frame (3D or video) techniques. The results show that the current state-of-the-art inter-frame techniques outperform 2D coding techniques. Currently, the reduction in bit rate is probably not yet sufficiently large to make inter-frame techniques really popular, especially in view of their larger computational requirements. However, there is some room for further improvements.
2. State-of-the-art in 2D compression

For lossless compression of 2D images, many techniques are available, e.g. Calic ("Context-based, Adaptive, Lossless Image Codec") [9,10], BTPC ("Binary Tree Predictive Coding") [8], the S+P transform [12], CREW ("Compression with Reversible Embedded Wavelets") [17] and JPEG-LS (the new lossless JPEG standard) [7,18–20]. Within this group, Calic attains the highest compression ratios. JPEG-LS achieves comparable compression ratios on most continuous-tone ("contone") images but is an order of magnitude lower in complexity, and therefore much faster.

In the remainder of this section, we first describe the most important underlying principles of the aforementioned lossless compression techniques. We also describe the two most important compression techniques, i.e. Calic and JPEG-LS, in a little more detail and point out some interesting developments. For a deeper explanation of the other techniques, we refer the reader to Ref. [3].

2.1. General principles

A typical image coder consists of three blocks. The first block is usually a predictor that transforms the image pixels into less correlated prediction errors. The second block, the context modeller, models the statistical properties of the prediction errors. Its output is used by the third block, the statistical coder, which transforms the prediction errors into a bitstream. We now explain each of these blocks in greater detail.
2.1.1. Statistical coding
The statistical coders that are used in lossless compression techniques are basically very simple. At their input they accept a stream of (integer) input symbols x_n, n = 1, 2, …, and a table of probabilities P_{n,k} = P(x_n = k), where P(x_n = k) denotes the prior probability that a given symbol x_n will assume the (integer) value k. The statistical coder transforms the input stream x_n, n = 1, 2, … reversibly into an output bitstream while exploiting the knowledge of the probabilities P_{n,k} to minimise the statistically expected number of output bits. The decoder performs the inverse operation and therefore the probabilities P_{n,k} must be available to both the coder and the decoder at the time they are needed to compress or decompress particular symbols. There exist many types of statistical coders, but the most popular ones are the Huffman coder and the arithmetic coder, both of which come in many different variants.

Note that the table P_{n,k} only provides information concerning the first-order probability of the symbol variables x_n. As such, the coder cannot take advantage of statistical "predictability" due to inter-symbol correlation. When the symbols x_n are statistically independent, the statistical coders are optimal among all possible coders. When the symbols are not independent, this is not the case and the statistical coders do not operate optimally.

In many techniques, the probability table P_{n,k} is estimated from the previously encoded symbols x_j, j = 1, …, n - 1. Note that this information is also available at the decoder at the time when x_n is to be decoded. Therefore, there is no need to explicitly pass the probability tables P_{n,k} to the decoder in this case. This saves a lot of bits, because the probability table is often very large in practice.

As the grey values of neighbouring pixels are highly correlated, and as the statistical coders cannot exploit this correlation, it is not a good idea to pass the grey values directly (i.e. as the symbols x_n) to a statistical coder, because that would lead to very ineffective compression. There exist two different solutions to this problem: prediction and context modelling. Most techniques employ a combination of these solutions.
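To make the role of the probability table concrete, the following sketch (our own illustration in Python, not code from any of the cited coders; all names are ours) computes the ideal code length of -log2 P_{n,k} bits per symbol that a perfect statistical coder would spend, given such a table.

    import math

    def ideal_code_length(symbols, prob_table):
        """Bits an ideal statistical coder would need for `symbols`,
        given per-symbol probability tables prob_table[n][k]."""
        bits = 0.0
        for n, x in enumerate(symbols):
            p = prob_table[n][x]      # P_{n,k} for the value k actually taken
            bits += -math.log2(p)     # ideal code length of this symbol
        return bits

    # Example: a 4-symbol alphabet with a fixed, highly skewed distribution.
    table = [{0: 0.7, 1: 0.15, 2: 0.1, 3: 0.05}] * 6
    print(ideal_code_length([0, 0, 1, 0, 2, 0], table))  # about 1.35 bits/symbol

With a skewed table the cost stays well below the 2 bits per symbol of a fixed-length code; this is the gain that Huffman and (especially) arithmetic coders try to approach.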
2.1.2. Prediction
The grey values of neighbouring pixels are usually very similar or, using the statistical terminology, they are highly correlated. Many techniques exploit this property by first predicting the grey value of the "current" pixel (i.e. the next pixel to encode) from previously encoded neighbouring pixels. The prediction errors (also called "residuals") are then passed to the statistical coder instead of the original grey values. The prediction step can be viewed as a pre-processing operation that facilitates the job of the statistical coder: it transforms highly correlated input data (pixel grey values) into almost statistically independent output data (residuals), which can be coded more effectively by the statistical coder.

The simplest predictors are linear and compute a weighted average of the neighbouring grey values; more complicated predictors try to determine whether the current pixel lies in an area of smoothly varying grey values or rather near an edge (horizontal, vertical or diagonal) and modify the predicted value accordingly. A minimal example of the linear case is sketched below.
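The sketch below (an illustration only, not the predictor of any particular coder) predicts each pixel as the average of its west and north neighbours, which have already been encoded in raster-scan order, and returns the residuals that would be handed to the statistical coder.

    import numpy as np

    def linear_residuals(img):
        """Average-of-west-and-north predictor; returns prediction residuals."""
        img = img.astype(np.int32)
        res = np.zeros_like(img)
        h, w = img.shape
        for y in range(h):
            for x in range(w):
                west = img[y, x - 1] if x > 0 else 0
                north = img[y - 1, x] if y > 0 else 0
                pred = (west + north) // 2     # simple linear predictor
                res[y, x] = img[y, x] - pred   # residual sent to the statistical coder
        return res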
2.1.3. Context modelling
When the input symbols x_n to the statistical coder are not statistically independent, the statistical coder as described above does not compress the data optimally, because the probability that a particular input symbol x_n assumes a value k is not fixed but also depends on the values of the other input symbols x_j, j < n. In this case, one may define P_{n,k}(a_1, …, a_{n-1}), the probability that the "current" symbol x_n will assume the value k, given that the previously encoded symbols x_1, …, x_{n-1} have assumed the values a_1, …, a_{n-1}. In context modelling, P_{n,k}(a_1, …, a_{n-1}) is estimated for each coding history (a_1, …, a_{n-1}) and passed as the probability table to the statistical coder instead of P_{n,k}. In practice, the estimate is updated for each value of n. It can be shown that a statistical coder extended with this kind of context modelling performs optimally from a data compression point of view, even if the input symbols are not statistically independent. Of course, this is only true if the estimates of P_{n,k}(a_1, …, a_{n-1}) are accurate.

In general, P_{n,k}(a_1, …, a_{n-1}) is a highly complicated function of the coding history (a_1, …, a_{n-1}). In fact, this function is not only highly complicated, but also nearly impossible to estimate with any accuracy whatsoever in practice. This is because reliably estimating a general high-dimensional probability distribution requires an enormous amount of data, much more than is available in a single image. For this reason, practical context modelling techniques limit the encoding history to a fixed-size coding context (a_{n-m}, …, a_{n-1}) and usually even summarise this context into a single context parameter β, i.e. they approximate P_{n,k}(a_1, …, a_{n-1}) ≈ P_{n,k}(β(a_1, …, a_{n-1})) and maintain an estimate of P_{n,k}(β) for all possible values of β. The sketch below illustrates this idea.
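The following toy illustration shows the two ingredients just described: a mapping of the recent coding history to a small context parameter β, and per-context adaptive probability estimates. The thresholds and names are ours, and this is not the model of any particular published coder.

    from collections import defaultdict

    class ContextModel:
        """Per-context symbol counts, i.e. adaptive estimates of P_{n,k}(beta)."""
        def __init__(self, num_symbols):
            self.counts = defaultdict(lambda: [1] * num_symbols)  # Laplace-smoothed

        def probability(self, beta, k):
            c = self.counts[beta]
            return c[k] / sum(c)       # current estimate of P_{n,k}(beta)

        def update(self, beta, k):
            self.counts[beta][k] += 1  # refine the estimate after coding x_n

    def quantise_context(prev_errors, levels=(2, 8, 32)):
        """Summarise the recent prediction errors into a small integer beta."""
        activity = sum(abs(e) for e in prev_errors)
        return sum(activity > t for t in levels)

Because the counts are built from previously encoded symbols only, the decoder can maintain exactly the same estimates, so no tables need to be transmitted.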
2.2. Calic

Calic is the most efficient, though rather complex, compressor for contone images [9]. Calic provides both a lossy and a lossless coding scheme. In spite of the complexity of the algorithm, its memory use is very small, as it only requires buffering of two image rows at any given time and some additional memory for modelling.

Calic starts by predicting the value of the "current pixel" from the values of a few previously encoded pixels on the "current row" and on the two previous image rows. From these pixels, Calic estimates the local grey value gradient, which indicates the presence or absence of horizontal, vertical and diagonal edges. This information is then used to select the most relevant pixels for predicting the current pixel. This type of relatively complicated and non-linear predictor performs better than the linear predictors employed in earlier techniques.

Fig. 1. Schematic description of the Calic image coding system.

Even so, the prediction errors are not completely uncorrelated and, for this reason, Calic features a "level 2" predictor to improve the prediction even further. The "level 2" predictor examines the prediction errors from previously encoded pixels whose neighbours were similar to the neighbours of the current pixel. This is achieved by classifying every pixel into one of 256 classes (depending on its neighbouring pixels) and maintaining an estimate of the mean prediction error within each class. If the average prediction error within the class of the current pixel is found to be non-zero, the predicted value of the current pixel is appropriately adjusted to obtain an unbiased estimate; a sketch of this error-feedback idea is given at the end of this subsection.

In order to efficiently encode the resulting "level 2" prediction errors, Calic passes them to an arithmetic coder (or a Huffman coder, which achieves the same goal as an arithmetic coder but using different principles) which is driven by a very complicated (but nevertheless relatively fast) context modeller. This context modeller does not build its context directly from previously encoded grey values, but rather from gradient estimates and statistics of previous prediction errors. Of course, all of this information can be derived by the decoder from the values of the previously encoded pixels.

As illustrated by the (simplified) schematic description of Fig. 1, Calic has some other rather advanced features that help to improve compression but which we will not describe in detail here. For instance, it is able to detect "binary regions," i.e. regions with only two colours. This enables it to efficiently compress office documents such as faxes, but also regions containing only text annotations in medical imagery.
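The "level 2" error feedback can be sketched as follows. This is only an illustration of the bias-cancellation idea, not Calic's actual implementation; the texture classification that assigns each pixel to one of the 256 classes is assumed to be given.

    import numpy as np

    class BiasCorrector:
        """Running mean of the first-level prediction error per texture class,
        used to de-bias new predictions ('level 2' prediction)."""
        def __init__(self, num_classes=256):
            self.err_sum = np.zeros(num_classes)
            self.count = np.zeros(num_classes, dtype=int)

        def correct(self, prediction, cls):
            # Add the mean error observed so far in this class, if any.
            if self.count[cls] == 0:
                return prediction
            return prediction + self.err_sum[cls] / self.count[cls]

        def update(self, cls, error):
            # error = actual grey value minus the first-level prediction
            self.err_sum[cls] += error
            self.count[cls] += 1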
2.3. JPEG-LS

JPEG-LS combines simplicity with the powerful compression potential of context modelling. Fig. 2 shows a block diagram, in which we can clearly distinguish the combined context modeller/predictor on the left and a statistical coder on the right. The JPEG-LS modeller/predictor processes the images in raster-scan mode and has two basic modes of operation: "regular" mode and "run" mode (a third mode, "run interruption" coding, is not discussed in this paper). The run mode is entered in smooth regions, where the limitations of the Golomb–Rice coder (see below) would force it to write at least one bit per pixel. However, coding runs of pixels as super-symbols allows the average number of bits per coded pixel to be less than one. As far as lossless compression is concerned, most pixels are compressed using the "regular" mode; therefore, we will concentrate on this mode here.

In a first step, the current pixel is predicted using the following non-linear predictor, which takes the grey values a, b and c of three neighbouring pixels (see Fig. 2) as inputs:

    x̂ = min(a, b)   if c ≥ max(a, b)
    x̂ = max(a, b)   if c ≤ min(a, b)          (1)
    x̂ = a + b - c   otherwise

Even though this predictor is very simple, it deals appropriately with at least the most basic type of edges, i.e. horizontal and vertical ones. In Fig. 2, this predictor is called the "fixed predictor." A direct transcription of Eq. (1) into code is given below.

The context modeller of JPEG-LS not only provides information to the statistical coder but is also used to improve the prediction in a "level 2" step, which is called "adaptive correction" in Fig. 2. As in Calic, the context modeller computes local gradient information and then uses this information to classify the current pixel into one of a number of classes. It also keeps an estimate of the mean prediction error within each class and subsequently adjusts the prediction to obtain an unbiased estimate.

It was experimentally observed that the resulting "level 2" prediction errors can be reliably modelled with a two-sided geometric distribution (TSGD), which has only two parameters (the rate of decay and the mean value). This means that instead of estimating complete probability tables P_{n,k}(β) (for use in an arithmetic coder) for every possible context value β, the context modeller can simply estimate the two parameters of the TSGD. This has two important advantages: firstly, less memory is required to store two parameters than to store one probability table; secondly, the two parameters of the TSGD can be estimated much more reliably than a general probability table, especially in the early stages of coding when only a few pixels have been encoded. This improves the compression ratio.
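The fixed predictor of Eq. (1) translates directly into code; the sketch below is our transcription, with a the west, b the north and c the north-west neighbour of the current pixel.

    def med_predictor(a, b, c):
        """JPEG-LS 'fixed predictor' of Eq. (1)."""
        if c >= max(a, b):
            return min(a, b)
        if c <= min(a, b):
            return max(a, b)
        return a + b - c

    print(med_predictor(10, 20, 25))  # 10: c dominates both neighbours, so an edge is assumed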
Fig. 2. Overview of the JPEG-LS image coding standard.
The statistical coder used in JPEG-LS is not an arithmetic coder, but rather a Golomb–Rice coder. A Golomb–Rice coder achieves the same compression ratio as an arithmetic coder, but only on data with a TSGD. Furthermore, it is much faster, and this is the reason why it was chosen for JPEG-LS. The only disadvantage of the Golomb–Rice coder is that it has to output at least one bit per symbol, which is not the case for a general arithmetic coder. In very smooth image areas, the prediction errors tend to have a distribution that would ideally lead to an output bit rate of less than one bit per pixel. Such regions would be coded rather inefficiently by JPEG-LS, but this problem is avoided by including a run mode that replaces sequences of identical pixel values by a single "value/run count" pair.
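A minimal sketch of Golomb–Rice coding follows (one common convention; the parameter selection and bit packing of the real JPEG-LS coder are omitted, and the names are ours). The signed prediction error is first mapped to a non-negative integer, whose quotient by 2^k is coded in unary and whose remainder is coded in k bits.

    def rice_encode(value, k):
        """Golomb-Rice code word for a prediction error, as a string of bits."""
        n = 2 * value if value >= 0 else -2 * value - 1   # zig-zag mapping to n >= 0
        q, r = n >> k, n & ((1 << k) - 1)                 # unary quotient, k-bit remainder
        return "1" * q + "0" + format(r, f"0{k}b")

    print(rice_encode(0, 2))    # '000'
    print(rice_encode(-3, 2))   # '1001'

Note that even the most probable error still costs at least one bit, which is exactly the limitation that motivates the run mode described above.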
2.4. Other interesting developments

Most state-of-the-art image compression techniques use prediction as the pre-processing step that facilitates context modelling and arithmetic coding. Recently, however, some techniques have been published that employ transform coding instead of prediction in the pre-processing step. Most of these techniques are based on lossless wavelet transforms in combination with (improved) ideas from lossy wavelet coding, e.g. zero-tree encoding [11]. A few papers also consider other transforms, e.g. the Hadamard transform [21] or the discrete cosine transform (DCT) [22], and extend these transforms towards lossless coding.

Recently, research efforts have been shifting from achieving higher compression ratios and lower processing times to providing additional "features". For example, the embedded wavelet based image coding technique in Ref. [23] allows certain regions of the image to be coded losslessly, while the remaining regions are coded in a lossy manner. Similarly, other techniques, e.g. the one in Ref. [24], allow users who wish to download a large image to specify one or more regions of interest, which are then downloaded first, after which the image quality in the remaining, less interesting regions is built up gradually.

Many techniques also allow "progressive" decompression, i.e. the decompression process gradually builds up the image quality as more and more compressed data is decoded. This is useful in situations where both lossy compression at lower bit rates and lossless compression at higher bit rates must be supported simultaneously.

3. Video compression techniques and 3D techniques

2D image coding techniques can be trivially extended to 3D by applying them to each image frame individually. However, such an approach cannot exploit inter-frame redundancies, i.e. the property that pixels at the same position in neighbouring image frames are usually very similar. Clearly, techniques exploiting such inter-frame redundancies should be able to provide better data compression.

Lossless video compression techniques are obviously useful for coding medical video sequences, e.g. angiographic sequences. By exploiting temporal correlation, i.e. correlation between image frames, video coders can achieve lower bit rates than still image coders, which code each frame independently of the others. Lossless video coders can also be used to compress 3D medical image sets; indeed, by treating one of the dimensions as "time," any 3D image is turned into a video sequence. Each image slice then corresponds with one video frame. Vice versa, a video sequence can also be treated as a 3D image.

Nevertheless, compression techniques for 3D images and video sequences are often radically different because of different decoding requirements. Indeed, video is usually compressed and decompressed as a "stream" in a (nearly) causal fashion: the image sequence is encoded frame by frame in time order. In this process, the coder is allowed to exploit information contained in previously encoded frames, but not information in frames "from the future." This allows the decoder to start decompressing a particular frame as soon as the compressed data for that frame becomes available and therefore to minimise
the delay due to compression and decompression in real-time transmission applications. In practical coders, e.g. MPEG-II compliant ones, the "causal coding" restriction is somewhat loosened and the encoder is also allowed to take advantage of a few frames from the near future to increase the compression ratio; these so-called B-frames are predicted from both a preceding and a succeeding frame. (MPEG stands for the Moving Picture Experts Group, which has developed a number of standards for lossy video coding for multimedia and digital television applications; MPEG-II is the current digital television standard.) This approach leads to an additional small delay in real-time video streaming because the decoder cannot decode a B-frame until it has received the succeeding frame. Summarising, video coders can take advantage of correlation between the current frame and "past" frames rather well, but can take little advantage of the correlation between the current frame and "future" frames.

3D image coders do not have these restrictions. They can take into account "previous" and "next" image slices equally well when encoding a particular slice. As such, from a compression efficiency point of view (and at least in theory), 3D image coders should perform better than video coding techniques. Nevertheless, video coders have a number of advantages: they have smaller memory requirements (because they use fewer frames at a time) and they are often faster (because they are usually designed for real-time applications). Also, video coders have a wider application range than 3D image coders and will therefore innovate faster.

As mentioned before, relatively few papers have been published on inter-frame techniques for lossless 3D image or video coding. Furthermore, most published techniques operate in stream mode and can thus be classified as video coders, even though they were developed mainly for the coding of 3D images. One noteworthy example of a 3D coder that is not a video coder is the 3D integer wavelet transform coder in Ref. [13]. This technique uses 3D integer wavelet packet transforms and allows progressive lossy-to-lossless decompression. Below we describe some common principles of lossless video coders in greater detail and we give some examples of such techniques.

3.1. Inter-frame prediction and motion compensation

A simple approach to exploiting inter-frame redundancies is to extend the 2D predictors that are typically used in 2D image coding techniques to 3D. In that case, the predicted value of a pixel is based not only on pixels from the current frame but also on pixels from the previous frame(s). In Ref. [14], the authors investigated whether the predictive value of simple linear predictors can be improved by including pixels from previous frames in the prediction. It was found that this was not the case, even when the coefficients of the 3D predictor are optimised for this purpose. In fact, the simple intra-frame predictor of LJPEG [25], the predecessor of the current JPEG-LS compression standard, was found to always perform at least as well as an optimal 3D predictor; sometimes the simple predictor performed even better.

Researchers have investigated whether this disappointing finding also holds for non-linear predictors. For instance, the technique in Ref. [16] investigated a predictor that switches between several intra-frame linear predictors based on inter-frame knowledge: for each pixel, the predictor that yielded the smallest prediction error at the same position in the previous frame is selected. The switching process makes the overall predictor non-linear. The results in Ref. [16] concerning this "Previous Best Predictor" (PBP) are somewhat disappointing, at least for video sequences and as far as spatio-temporal decorrelation is concerned (significant improvements were, however, reported in exploiting spectral redundancies in colour images).

The authors of Ref. [16] blame the poor performance of the PBP on the fact that objects (identified as regions of similar grey values or colours) are displaced in different frames of a video sequence due to motion, and demonstrated that a predictor based on block motion compensation is more successful. The principle of motion compensation, as typically included in MPEG-II compliant coders, is simple: when a video camera shoots a scene, objects that move in front of the camera will be located at slightly different positions in successive frames. Block motion compensation considers a square region in the current frame and looks for the most similar square region in the previous frame (motion estimation); a minimal sketch of this search is given below. The pixels in that region are then used to predict the pixels in the square region in the current frame (motion compensation). Note that the motion vectors, which indicate the position of the best block in the previous frame, have to be included in the compressed file; this slightly reduces the benefits of motion compensation.

Block motion compensation depends on the assumption that objects only move from frame to frame but do not deform or rotate. This assumption is not always satisfied in video applications, e.g. because objects may rotate, or may shrink as they move away from the camera. Nevertheless, the results in Ref. [16] indicate that, in combination with simple linear 3D predictors, block motion estimation is quite successful. Furthermore, the compression ratio increases even more when the "sum of absolute differences" similarity criterion that is used in MPEG-II block motion estimation is replaced with a new criterion that aims to minimise the sum of the prediction errors produced by the 3D predictor.
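The sketch below illustrates exhaustive block matching with the "sum of absolute differences" criterion; the block size, search range and names are ours, and this is not the implementation of MPEG-II or of Ref. [16].

    import numpy as np

    def best_match(prev, cur, y, x, block=8, search=4):
        """Find the displacement (dy, dx) in the previous frame that minimises
        the sum of absolute differences with the current block."""
        target = cur[y:y + block, x:x + block].astype(np.int32)
        best, best_sad = (0, 0), None
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                    continue
                cand = prev[yy:yy + block, xx:xx + block].astype(np.int32)
                sad = np.abs(target - cand).sum()   # similarity criterion
                if best_sad is None or sad < best_sad:
                    best, best_sad = (dy, dx), sad
        return best   # motion vector, to be stored in the compressed file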
Fig. 3. The technique of Ref. [26], which switches between motion compensation (the box labelled "inter-frame prediction") and intra-frame prediction.
The assumption of translational motion, which is not always satisfied in video sequences, is even less realistic in 3D image sets, where image frames are not snapshots of moving objects but rather plane slices of 3D objects. Nevertheless, motion compensation does remove some of the correlation between image frames and may therefore also be useful in the coding of medical 3D images.

An important disadvantage of motion estimation is that it is a very time-consuming process. In Ref. [26], this problem is reduced by performing the motion estimation and compensation only for some pixels (see Fig. 3: many pixels are simply encoded using JPEG-LS and for these pixels no motion estimation/compensation is performed). Specifically, a very simple activity measure is computed for every pixel. This activity measure (for which different possibilities are proposed in Ref. [26]) predicts the magnitude of the intra-frame prediction error for the current pixel, based on neighbouring pixels. When the activity measure is low, pure intra-frame prediction performs so well that no benefit can possibly be expected from motion compensation. Only when the measure is high is the computationally intensive motion compensation attempted. In practice, motion compensation is only attempted for a small number of pixels (less than 1%), which makes the technique very fast. The switching rule is sketched below.
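The switching idea can be sketched as follows; the activity measure and the threshold are placeholders of ours, the actual measures proposed in Ref. [26] differ.

    def choose_mode(neigh_intra_errors, threshold=8):
        """Predict how large the intra-frame prediction error will be from the
        neighbouring errors, and only attempt motion compensation when it is high."""
        activity = sum(abs(e) for e in neigh_intra_errors)
        if activity < threshold:
            return "intra"        # JPEG-LS-style intra-frame coding suffices
        return "try_motion"       # the costly motion estimation is worth attempting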
Fig. 4. A simple technique based on a combination of intra-frame prediction and inter-frame context modelling.
The practical implementation of this "partial motion compensation" technique makes use of JPEG-LS (see Fig. 3): the switching block computes the activity measure and switches accordingly between JPEG-LS and inter-frame coding. This approach is very powerful, because almost no modifications to the JPEG-LS coder are required. This facilitates implementing backward compatibility with JPEG-LS, the use of special hardware for JPEG-LS, etc.; also, possible future enhancements to JPEG-LS can effortlessly be taken advantage of.

3.2. Inter-frame context modelling

It is also possible to exploit inter-frame knowledge in the context modelling step rather than in the prediction step. For instance, the technique in Ref. [16] classifies each prediction error into eight classes based on an "activity" statistic derived from neighbouring prediction errors in the current and previous frame. The underlying idea is that the prediction error statistics are rather different in, e.g., smooth regions and regions with static or moving edges.

Similarly, a medical 3D image compression technique proposed in Ref. [14] uses a very simple linear, purely intra-frame predictor (predictor number 7 in the LJPEG standard), but it incorporates inter-frame information in the context model. A simplified overview of this technique is shown in Fig. 4: first, for each pixel a purely intra-frame prediction is made. Then, after subtracting this prediction from the original pixel value, a residual is obtained (and stored for later use in the compression of the next frame). In the context modelling step, the context parameter for encoding the current pixel is a quantised version of the intra-frame prediction error at the same position in the previous frame; a minimal sketch of this mapping is given below. The underlying idea is that in smooth regions, where intra-frame prediction works well, the context parameter will be low. On the other hand, near moving or static edges, the context parameter will be high. Hence, the context model effectively forces the use of different probability tables in these two region types. This is beneficial to the arithmetic coder because the prediction error statistics are indeed very dissimilar in the two types of regions. It was experimentally shown that this inter-frame context modelling leads to an additional 10% increase in compression ratio.

Later, the authors extended the technique described in Ref. [14] in several ways: firstly, the simple inter-frame context model is extended by combining it with the typical intra-frame context parameters used in other 2D compression techniques (e.g. JPEG-LS) and by incorporating a number of small tricks inspired by other compression techniques. Secondly, the simple linear intra-frame predictor was replaced with a more complicated two-level predictor, similar to the one used in JPEG-LS. The details of the extended technique have not yet been published, but a simplified block scheme is shown in Fig. 5.
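A minimal sketch of the inter-frame context parameter of Fig. 4 follows, assuming the residuals of the previous frame have been stored; the quantisation thresholds are illustrative, not the published ones. The resulting parameter β would index per-context probability tables such as those in the context-model sketch of Section 2.1.3.

    def interframe_context(prev_residuals, y, x, thresholds=(1, 3, 7, 15)):
        """Quantise the intra-frame prediction error observed at the same
        position in the previous frame into a small context parameter."""
        e = abs(int(prev_residuals[y][x]))
        return sum(e > t for t in thresholds)   # 0 = smooth region, high = static/moving edge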
Fig. 5. A more sophisticated technique based on a combination of intra-frame prediction and inter-frame context modelling.
In theory, inter-frame redundancies can be exploited equally well by inter-frame predictive techniques as by inter-frame context modelling techniques. Therefore, one may wonder whether one should use inter-frame information in the predictor only, in the context modeller only, or in both. This question has not been answered clearly yet. Some results, e.g. the ones in Ref. [16], indicate that incorporating inter-frame information in both the prediction and the context modelling step is better than incorporating it only in the prediction step. This implies that the inter-frame context modeller is able to compensate for the deficiencies of the inter-frame predictor.

On the other hand, an important advantage of inter-frame modelling over inter-frame prediction is its inherently graceful degradation of the compression ratio when the correlation between two consecutive frames drops. In essence, combined inter/intra-frame modelling reduces to pure intra-frame modelling when there is no correlation between successive frames, and this independently of the design of the context modeller. On the other hand, the compression degradation of an inter-frame predictor depends crucially on the details of its design. Such a predictor may in fact perform worse than an intra-frame predictor when the correlation between successive frames is low. Therefore, it seems safer to rely on inter-frame information only in the context modeller, and not in the predictor.

4. Experimental results

As we mentioned before, relatively few papers on 3D image and video compression have been published. Also, the experimental results in these papers are all obtained on
different input data, some of which is not even of a medical nature. Obviously, the only reliable way to experimentally compare the different techniques would be to apply them to the same medical image data. Unfortunately, this is very difficult at the moment because software implementations of the techniques that have been published are difficult to get hold of. On the other hand, the state of the art in 3D image and video coding is evolving far more rapidly than in 2D image coding: indeed, state-of-the-art 2D techniques such as JPEG-LS combine many advanced "tricks" developed during years of research by many researchers, whereas the research on lossless 3D image and video compression techniques is still at the stage of investigating the performance of basic principles (e.g. inter-frame context modelling or inter-frame prediction), and many combinations of these principles remain unexplored. For this reason, it is too soon for a thorough evaluation and comparison of these techniques.

In view of the above, we will restrict ourselves here to an evaluation of the lossless video coding techniques that we have developed ourselves and we will not attempt to compare these techniques to other 3D image and video coders. Our main intent is to investigate whether exploiting inter-frame correlation presents significant benefits in lossless coding and whether in the future such techniques are likely to replace the new 2D image coding standards such as JPEG-LS.
Fig. 6. Normalised sample images (MRI and CT) from the VHP data set.
In the short term, in view of the immaturity of the 3D image coding techniques, 2D image coding standards will prevail. For an experimental evaluation of 2D techniques, we refer the reader to Refs. [2,3]. At the moment, we strongly recommend JPEG-LS because it not only offers one of the best compression ratios, but is also one of the fastest techniques. Furthermore, as JPEG-LS is now standardised, software implementations will soon appear for many different computing platforms.

In the remainder of this section, we present some results obtained with lossless video coders on MRI and CT image sets taken from the Visible Human Project (VHP) [27]. Fig. 6 shows two example images from these sets (for display purposes, the histograms of these images were normalised to improve visibility; in the experiments, the actual unnormalised data was used). The VHP data set is freely available and is therefore an excellent candidate for the comparison of compression algorithms on medical data. Within the VHP data set, we selected images of the cadaver of a 59-year-old woman. This data set was chosen because it is more recent than the one of the male subject, and should therefore give a better indication of the compression ratios that can be achieved on images acquired using modern medical equipment.

The image modalities that we consider are CT and MRI; in the latter case we investigated PD (proton density), T1- and T2-weighted images. The CT images and the MRI images have a resolution of 512 × 512 pixels and 256 × 256 pixels, respectively. The grey-scale resolution is 12 bit per pixel in each case. In the CT data set, pixels within a frame are separated by a distance of 0.93 mm, whereas the inter-frame distance is 1 mm. For the MRI data sets, the corresponding numbers are 1.88 and 4 mm.

In the following, we compare our inter-frame method (the one in Fig. 5) to JPEG-LS and Calic, the current state-of-the-art purely intra-frame coding techniques. Our main aim is to present some preliminary data on the possible benefits of exploiting inter-frame redundancies. In the case of JPEG-LS and Calic, each frame is compressed independently of
the others, whereas for the inter-frame technique, information from the preceding frame is also used. We wish to point out that the bit rates that we report here for our method are not actual bit rates, but rather close estimates. This is because we did not implement the actual arithmetic coder, but instead estimated the bit rate that it would produce within a given coding context by computing the so-called "empirical entropy" of the input data within that context; the sketch below illustrates this kind of estimate. Experience with arithmetic coders in other image coding techniques has shown that the actual bit rates are usually very close to these entropies. Furthermore, we would like to point out that our method is not optimised to the same degree as JPEG-LS and Calic. Therefore, the bit rate improvements that we will demonstrate for our method are probably conservative estimates.
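The following sketch shows the kind of per-context empirical entropy estimate meant above; it is our own illustration, and the actual contexts and symbol alphabets of the coder of Fig. 5 are more elaborate.

    import math
    from collections import Counter, defaultdict

    def empirical_entropy_bits(residuals, contexts):
        """Within each coding context, build a histogram of the residuals observed
        there and sum -log2 of the empirical probability of every coded symbol."""
        hist = defaultdict(Counter)
        for r, b in zip(residuals, contexts):
            hist[b][r] += 1
        bits = 0.0
        for counts in hist.values():
            total = sum(counts.values())
            for n in counts.values():
                bits += n * -math.log2(n / total)
        return bits

    # Example: residuals observed in two contexts (0 = smooth, 1 = edge).
    print(empirical_entropy_bits([0, 0, 1, 0, -5, 9, 0, 2],
                                 [0, 0, 0, 0, 1, 1, 0, 1]))  # total estimated bits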
Fig. 7. Comparison of the video compression technique of Fig. 5, with 2 state-of-the-art 2D compression techniques, i.e., Calic and JPEG-LS on the VHP CT data set.
Fig. 8. Enlargement of Fig. 7.
Fig. 10. Comparison of the video compression technique of Fig. 5, with 2 state-of-the-art 2D compression techniques, i.e., Calic and JPEG-LS on the VHP MR-T1 data set.
Figs. 7–11 show compression results for the VHP CT and MRI data sets. Each of these graphs displays b_i/p, where b_i is the total number of bits needed to encode the ith frame of the sequence and p is the number of pixels in a frame. The number b_i/p is expressed in bit per pixel and represents the average number of bits needed to encode the pixels of a particular frame. Of course, in the case of the inter-frame coder, b_i depends not only on the contents of the ith frame, but also on the contents of the previous frames.

Table 1 displays some statistical data about the results in the figures. The first number in each table cell is the average bit rate, again expressed in bit per pixel. This is also equal to b/(Np), where b is the total number of bits needed to encode the complete sequence, N is the number of frames in the sequence, and p is the number of pixels in each frame. The second number in each cell of Table 1 is the standard deviation of b_i/p. As is also clear from the figures, not all frames
compress equally well, and the standard deviation quantifies this variability in bit rate.

The results in Table 1 show that the inter-frame coder offers a significant reduction in bit rate over the intra-frame techniques Calic and JPEG-LS for coding CT image data: compared to JPEG-LS, the bit rate of the inter-frame technique is about 9% lower. In the case of the MRI sets, inter-frame modelling still offers an advantage, but the reduction in bit rate is rather small and only of the order of 4%.
Fig. 9. Comparison of the video compression technique of Fig. 5, with 2 state-of-the-art 2D compression techniques, i.e., Calic and JPEG-LS on the VHP MR-PD data set.
Fig. 11. Comparison of the video compression technique of Fig. 5, with 2 state-of-the-art 2D compression techniques, i.e., Calic and JPEG-LS on the VHP MR-T2 data set.
Table 1
Mean bit rate (bit per pixel) and standard deviation

            Calic           JPEG-LS         Inter-frame
CT          4.7 ± 0.65      4.8 ± 0.69      4.4 ± 0.55
MR PD       4.7 ± 0.89      4.8 ± 0.91      4.6 ± 0.91
MR T1       4.5 ± 0.78      4.6 ± 0.80      4.4 ± 0.79
MR T2       4.4 ± 0.73      4.5 ± 0.74      4.3 ± 0.73
The fact that the reduction in bit rate is lower for MRI is not surprising: the physical distance between two slices is 4 mm in the MRI sequences and only 1 mm in the CT sequence. As such, the inter-frame correlation is much lower in the MRI sequences and the gain that can be expected from exploiting this correlation is also lower.

As we have pointed out before, the above comparison is not completely fair, because the proposed inter-frame coder is not optimised to the same degree as, e.g., JPEG-LS. It is reasonable to expect that an optimised implementation would yield a higher gain in bit rate. Even taking this into account, the results suggest that the additional savings that can be expected from exploiting inter-frame correlation are small compared to the savings that can be achieved by exploiting intra-frame redundancies. Furthermore, inter-frame coders are usually slower (because they perform additional computations). Therefore, on balance, it is not clear whether the proposed video coder in its current form would gain rapid acceptance.

From the graphs of Figs. 7 and 9–11, some additional interesting conclusions can be drawn. The graph in Fig. 7 for the CT data set first of all confirms that the bit rate varies considerably within the CT data set. For many frames, the inter-frame technique yields a significantly smaller bit rate than JPEG-LS and Calic. Often the reduction is of the order of 0.5 bit per pixel, i.e. about 12%. However, for some frames, JPEG-LS and Calic outperform the inter-frame technique. We do not yet know why this is the case, but this result shows that it should be possible to improve the inter-frame coder, e.g. by switching off the use of inter-frame information for these frames.

Another strange phenomenon is observed between frames 1000 and 1800 of the CT data set. Fig. 8 shows an enlargement of this region. Clearly, the bit rate fluctuates from frame to frame, i.e. frames that compress well alternate with frames that compress less well. The fluctuation is largest for the inter-frame technique but is also present in the intra-frame results.

The graphs for the MRI data sets display fewer interesting features. They also show a considerable variation in bit rate over the sequence. Furthermore, the inter-frame technique seems to outperform the intra-frame techniques on almost all frames, which agrees with our expectations. Finally, the graphs display a few sudden drops and increases in bit rate for which we do not have a good explanation at the moment.

5. Conclusions

In this paper, we first presented a general classification of lossless image compression techniques. General-purpose
techniques and 2D image coding techniques have been studied extensively and offer quite substantial compression ratios in, e.g., medical applications. On the other hand, the research concerning 3D image coding techniques and lossless video coding techniques is still in its infancy.

Several papers have reported experimental results on the performance of 2D image coders. These results showed that Calic and JPEG-LS are the current state-of-the-art techniques; furthermore, JPEG-LS is the preferred technique because it is faster. In this paper, we presented an overview of the basic principles of 2D coding, but we chose not to include an experimental comparison of these techniques, because this is readily available in other papers.

Section 3 of this paper described some recent developments in the area of lossless general 3D coders and the more specialised lossless video coders, which can be regarded as special cases of general 3D coders. Video coders are of course useful for coding medical video (e.g. angiographic sequences), but they can also be used to compress 3D data sets. For the latter purpose, video coders are more attractive than general 3D techniques because they are faster.

The basic additional principles involved in lossless 3D and video coding, compared to 2D lossless image coding, are inter-frame prediction and inter-frame context modelling. Simple linear predictors are not capable of exploiting inter-frame correlation, but non-linear predictors do lead to a reduction in bit rate. Such predictors can take different forms. The "previous best predictor" scheme switches between different linear predictors according to the performance of these predictors in the previous frame. Motion compensation techniques can be viewed as another type of non-linear predictor but are rather time-consuming. Inter-frame context modelling is another way of exploiting information from the previous frame(s); however, the information that is exploited is of a statistical nature. Of course, inter-frame context modelling can be combined with inter-frame prediction, but it is not yet clear whether such a combination can perform significantly better than inter-frame prediction or inter-frame context modelling on their own.

The authors have developed their own inter-frame compression technique, which is based on a combination of intra-frame prediction and inter-frame context modelling. The results in Section 4 show that such a relatively simple inter-frame technique outperforms intra-frame techniques such as Calic and JPEG-LS on CT and MRI data taken from the Visible Human Project. The improvement is significant in the case of CT data but is rather small in the case of MRI data. In any case, it is probably not large enough to lead to a rapid adoption of video coders in medical 3D image compression, especially since they require more computation time. On the other hand, the results can be improved upon in several ways, so this conclusion may be reversed in the future.
References

[1] Okkalides D. Assessment of commercial compression algorithms, of the lossy DCT and lossless types, applied to diagnostic digital image files. Computerized Medical Imaging and Graphics 1998;22(1):25–30.
[2] Kivijärvi J, Ojala T, Kaukoranta T, Kuba A, László N, Nevalainen O. A comparison of lossless compression methods for medical images. Computerized Medical Imaging and Graphics 1998;22(4):323–39.
[3] Denecker K, Van Assche S, Philips W, Lemahieu I. State of the art concerning lossless medical image coding. In: Veen J-P, editor. Proceedings of the PRORISC IEEE Benelux Workshop on Circuits, Systems and Signal Processing, Mierlo, NL, STW Technology Foundation, 1997. p. 129–36.
[4] Gailly J-L. Documentation of the GZIP program, 1996. ftp://prep.ai.mit.edu/pub/gnu/gzip-1.2.4.tar.gz.
[5] Burrows M, Wheeler D. A block-sorting lossless data compression algorithm. Technical report, Digital SRC Research Report 124, May 1994. ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz.
[6] Bellard F. Compression statistique à contexte fini, June 1995. http://www.polytechnique.fr/~bellard.
[7] Weinberger M, Seroussi G, Sapiro G. The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS. Technical Report HPL-98-193, HP Computer Systems Laboratory, November 1998. http://www.hpl.hp.com/techreports/98.
[8] Robinson J. Efficient general-purpose image compression with binary tree predictive coding. IEEE Transactions on Image Processing 1997;6(4):601–8.
[9] Wu X. Lossless compression of continuous-tone images via context selection, quantization and modeling. IEEE Transactions on Image Processing 1997;6:656–64.
[10] Wu X, Memon N. Context-based adaptive lossless image coding. IEEE Transactions on Communications 1997;45:437–44.
[11] Munteanu A, Cornelis J, Cristea P. Wavelet-based lossless compression of coronary angiographic images. IEEE Transactions on Medical Imaging 1999;18(3):272–81.
[12] Said A, Pearlman W. Image compression via multiresolution representation and predictive coding. Visual Communications and Image Processing, SPIE 1993;2094:664–74.
[13] Xiong Z, Wu X, Yun D, Pearlman W. Progressive coding of medical volumetric data using three-dimensional integer wavelet packet transform. Proceedings of the 1999 Visual Communications and Image Processing 1999;3653:327–35.
[14] Van Assche S, Denecker K, Philips W, Lemahieu I. Lossless compression of three-dimensional medical images. In: Veen J-P, editor. Proceedings of the PRORISC IEEE Benelux Workshop on Circuits, Systems and Signal Processing (PRORISC'98), 1998. p. 549–53.
[15] Wu X, Choi W, Memon N. Lossless interframe image compression via context modeling. In: Storer JA, Cohn M, editors. Proceedings of the Data Compression Conference, 1998. p. 378–87.
[16] Memon N, Sayood K. Lossless compression of video sequences. IEEE Transactions on Communications 1996;44(10):1340–5.
[17] Zandi A, Allen J, Schwartz E, Boliek M. Compression with reversible embedded wavelets. In: Storer JA, Cohn M, editors. Proceedings of the Data Compression Conference, 1995. p. 212–21.
[18] Hewlett Packard Labs. JPEG-LS binaries v0.90. http://www.hpl.hp.com//itc/csl/rcd/infotheory/loco.html.
[19] Weinberger M, Seroussi G, Sapiro G. LOCO-I: a low complexity, context-based, lossless image compression algorithm. In: Storer J, Cohn M, editors. Proceedings of the IEEE Data Compression Conference, 1996. p. 140–9.
[20] ISO/IEC. Lossless and near-lossless compression of continuous-tone still images. ISO/IEC JTC 1/SC 29/WG1, FCD 14495, public draft edition, July 1997.
[21] Philips W, Denecker K. A new embedded lossless/quasi-lossless image coder based on the Hadamard transform. Proceedings of the 1997 IEEE International Conference on Image Processing (ICIP'97) 1997;1:667–70.
[22] Philips W. The lossless DCT for combined lossy/lossless image coding. Proceedings of the International Conference on Image Processing (ICIP'98) 1998;3:871–5.
[23] Nister C, Christopoulos D. Lossless region of interest with a naturally progressive still image coding algorithm. IEEE International Conference on Image Processing (ICIP'98) 1998;3:856–60.
[24] Rogge B, Lemahieu I, Philips W, Denecker K, De Neve P, Van Assche S. Region of interest-based progressive transmission of greyscale images across the Internet. In: Beretta GB, Eschbach R, editors. SPIE Proceedings of the Conference on Color Imaging: Device-Independent Color, Color Hardcopy, and Graphic Arts IV, vol. 3648. 1998. p. 365–72.
[25] International Telegraph and Telephone Consultative Committee (CCITT). Digital Compression and Coding of Continuous-Tone Still Images, Recommendation T.81, 1992.
[26] De Rycke D, Philips W. Lossless non-linear predictive coding of video data through context matching. In: Sanchez B, Torres M, Langlois DG, editors. The Fifth International Conference on Information Systems Analysis and Synthesis (ISAS'99), 1999. p. 42–49.
[27] National Library of Medicine. Visible Human Project. http://www.nlm.nih.gov/research/visible/getting_data.html.
Wilfried Philips was born in Aalst, Belgium on October 19, 1966. In 1989, he received the Diploma degree in Electrical Engineering and in 1993 the PhD degree in Applied Sciences, both from Ghent University, Belgium. From October 1989 until October 1997 he worked at the Department of Electronics and Information Systems of Ghent University with the Flemish Fund for Scientific Research (FWO - Vlaanderen), first as a research assistant and later as a Post-Doctoral Research Fellow. Since November 1997 he has been a Lecturer at the Department of Telecommunications and Information Processing of Ghent University. His main research interests are image restoration, image analysis, lossless and lossy data compression of images and video, and processing of multimedia data.
Steven Van Assche received a Degree in Electrical Engineering from Ghent University, Belgium in 1996. Since then he has been a PhD student and a Research Assistant at the Department of Electronics and Information Systems (ELIS) of the same university. His main research topics are lossless compression of halftone and colour images.
Dirk De Rycke was born in Ghent, Belgium on August 22, 1975. He received a degree in Computer Science from Ghent University in 1997. During 1998-99 he was a Research Assistant at the Department of Telecommunications and Information Processing of Ghent University; his main research area was compression techniques for medical three-dimensional image data. His work was financially supported by a research grant (IWT/SB/981362) of the Flemish Institute for the Advancement of Scientific-Technological Research in Industry (IWT). Currently he is working at ICOS, a Belgian company specialising in the inspection of semi-conductor devices. His current interests include general-purpose compression, lossless and near-lossless image compression, image processing, open source development, and automated visual inspection.
Koen Denecker received a degree in Physical Engineering from Ghent University, Belgium in 1995. Since then, he has been a PhD student in Computer Engineering at the Department of Electronics and Information Systems (ELIS) of the same university. His main interests are lossless and near-lossless image compression for medical and prepress applications. His work is financially supported by the Flemish Fund for Scientific Research (FWO - Vlaanderen).