Microprocessors and Microsystems 25 (2001) 19–31
www.elsevier.nl/locate/micpro
A hybrid system for real-time lossless image compression

G.W. Drost a,*, N.G. Bourbakis a,b

a Image-Video & Machine Vision Laboratory, Binghamton University, Binghamton, NY 13902, USA
b Technical University of Crete, Department of ECE, Chania 73100, Crete, Greece

Received 11 April 2000; revised 12 October 2000; accepted 13 November 2000
Abstract

This paper describes a hardware design for a real-time lossless image compression system based on the organized matrix scanning methodology, SCAN. SCAN is a special-purpose context-free language which describes and generates a wide range of array accessing algorithms from a short set of simple ones. These algorithms may represent scan techniques for image processing, but at the same time they stand as generic data accessing strategies. In this system, an 8-bit gray scale image is transformed into an error image using differential pulse code modulation (DPCM). This error image is subdivided into 64 × 64 pixel blocks, and the bit planes associated with each block are scanned with four different scan sequences. For each image block and each bit plane, the results of the scan which most closely matches the grain of the image are Huffman coded for transmission. High data rates associated with the real-time requirement make a software solution impossible. Pipelining is employed to simultaneously perform the functions of image input, DPCM, SCAN-based run-length encoding, Huffman coding and output. Parallel processing is utilized to perform the scanning and run-length encoding on all bit planes simultaneously. In addition to compression, the use of the SCAN methodology provides good data encryption, and can be extended to include data hiding or watermarking. © 2001 Published by Elsevier Science B.V.

Keywords: Real-time image compression; Hybrid hardware design
1. Introduction

An uncompressed digital gray scale 512 × 512 pixel video at 1 byte/pixel and 30 frames per second requires a bandwidth of 7.87 Mbytes/s. Ninety minutes of this video would require 42.5 Gbytes of storage. One 1024 × 1024 pixel, 8-bit gray scale chest X-ray would require more memory capacity than a 3 1/2 in. high density diskette can provide today. Even though storage costs are decreasing, the volume of digital image data to be transmitted and stored is increasing at a much faster rate. A broad range of data compression schemes has been developed to alleviate these problems. However, when lossless compression is required, such as for satellite photography or medical imaging, and real-time processing at 30 frames per second is also critical, existing solutions have left room for improvement. Bourbakis et al. [1] have proposed a bit-plane processing system which utilizes an image scanning language called SCAN [2]. If an 8-bit gray scale image matrix is separated into its eight binary components or bit planes, and each of these 2D arrays is scanned with the appropriate SCAN language algorithm, then pixels with the same binary value can be grouped together and run-length encoding

* Corresponding author.
0141-9331/00/$ - see front matter © 2001 Published by Elsevier Science B.V. PII: S0141-9331(00)00102-2
would yield very good compression. Actual test results show that this methodology works very well for the more significant bit planes, but compression becomes impossible for the less significant bit planes due to the random nature of these image patterns, as illustrated in Fig. 1. Although this SCAN language based bit-plane processing approach focuses on the spatial redundancy that exists between neighboring pixels, it must also consider the correlation that exists between bit planes. Wang has proposed a solution which first performs differential pulse code modulation (DPCM), and then bit-plane processing [3]. DPCM exploits the inter-plane correlation by producing a difference matrix with an average value much smaller than the average value of the original image. It reduces the number of binary 1s in the more significant bit planes, and therefore increases compression because run-length encoding will have fewer groups of binary 1s to encode. The lossless compression methodology presented in this paper merges the two approaches above. If an image is transformed into an error image using DPCM, and the error image is then divided into equal size blocks (in the order of 64 × 64), it can be shown that bit-plane processing at the block level using a variety of SCAN language algorithms will produce varying degrees of compression. The amount of compression for a given block will be based on three
Fig. 1. Mona Lisa bit planes 0–7.
factors: the texture or micro-topology contained within the block, the region grain or macro-topology, and how well the scan pattern can match this macro-topology. Some lossless compression schemes exploit micro-level redundancy. Ranganathan et al. [4] use block matching at the 2 × 2 pixel level as part of a lossless image compression methodology which achieves 4.42 bits/pixel on Lenna. This is very difficult to accomplish in real time, however, and is not part of the proposed system. The system presented here achieves good performance by utilizing a scan pattern that matches the macro-topology of the image block being processed as closely as possible. It is accomplished without a priori knowledge of the image by scanning each bit plane of each block with four different scan patterns and selecting the best results for output. Not only does the best scan pattern vary from block to block, it frequently varies from bit plane to bit plane for the same block. All this is accommodated in the proposed approach. Another significant feature of this approach is a bypass mode which causes raw error image bit-plane data to be passed through as output for the less significant bit planes when reverse compression would otherwise result. This prevents difficult regions of an image (with complicated texture) from eroding compression gains achieved on easier regions. Section 2 of this paper presents a description of the SCAN language. An overview of the proposed system, followed by more detailed descriptions of key functions, is presented in Section 3. Simulation results are presented in Section 4, followed by conclusions in Section 5.

2. The SCAN language

Although the raster scan is the most widely used image scanning sequence, it is by no means the most effective scanning technique for all images or all processing applications.
An object detection problem, for example, would achieve faster results using a spiral-out scan starting at the image location where the target is predicted to be located.
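Such a non-raster accessing strategy is easy to express in software. The sketch below (in C, the language used for the simulations of Section 4) generates a spiral-in visiting order for an n × n array; it is an illustrative stand-in for a single SCAN primitive rather than part of the SCAN language itself, and the function name is ours:

```c
/* Visit an n x n array in spiral-in order: along the top row, down the
 * right column, back along the bottom row, up the left column, then
 * inward.  Linear indices (row * n + col) are written to `order`. */
static void spiral_in(int n, int *order)
{
    int top = 0, bottom = n - 1, left = 0, right = n - 1, k = 0;
    while (top <= bottom && left <= right) {
        for (int c = left; c <= right; c++) order[k++] = top * n + c;
        for (int r = top + 1; r <= bottom; r++) order[k++] = r * n + right;
        if (top < bottom)
            for (int c = right - 1; c >= left; c--) order[k++] = bottom * n + c;
        if (left < right)
            for (int r = bottom - 1; r > top; r--) order[k++] = r * n + left;
        top++; bottom--; left++; right--;
    }
}
```

For a 4 × 4 array this produces the index order 0 1 2 3 7 11 15 14 13 12 8 4 5 6 10 9. Run-length encoding a bit plane in an order like this, rather than raster order, is what allows a well-matched pattern to group equal-valued pixels into long runs.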
The SCAN language is a fractal-based, context-free language which sequentially accesses the data of a 2D array by describing and generating a wide range (nearly (n × n)!) of space-filling curves, or SCAN patterns, from a short set of simple ones. The basic SCAN language defined in Ref. [2] is composed of a family of 15 different pattern primitives. Each pattern in S is represented by a letter designator to form a 15-symbol set called the SCAN alphabet:
S = {A, B, C, D, E, H, I, L, O, R, S, W, X, Y, Z}.

The symbols of S, or SCAN letters, correspond to the scan orders illustrated in Fig. 2. SCAN language rules enable primitives to be combined hierarchically to form complex scan patterns as illustrated in Fig. 3(a). The SCAN language has also been extended [5] to include rotational and symmetric (reflective) transformations as illustrated in Fig. 3(b). Each member of T denotes a simple scan pattern transformation:

T = {SYMH, SYMV, ROT1, ROT2, ROT3, IDEN}.

The symbols SYMH and SYMV denote the horizontally symmetric and vertically symmetric transformations, respectively, and ROT1, ROT2 and ROT3 denote the 90, 180 and 270° rotation transformations. IDEN specifies the identity transformation. The result is an easily specifiable family of scan patterns which are expandable to fit different image sizes and complexities. This SCAN language has a wide variety of applications beyond image compression, including encryption [6], curve filling, object detection, symmetry detection and region analysis.

3. The hybrid system

The proposed system is illustrated in Fig. 4. The 8-bit gray scale input image is first transformed into an error image using DPCM. Although there is a high probability that the most significant bit plane will be all 0s, it is
Fig. 2. Graphical representation of 15 basic SCAN primitives. Non-homogeneous patterns (a–l), and homogeneous patterns (m–o).
preserved to ensure no ambiguity, and a completely lossless system. Also, DPCM produces a sign bit that is treated as the ninth bit plane. The error image is then divided into 64 × 64 pixel blocks. This image partitioning is done to isolate portions of the image with different patterns or grains, to give the scan pattern with the closest match a chance to deliver good compression. The 64 × 64 pixel block size was selected as a compromise between smaller blocks, which allow better matching to a particular scan pattern, and larger blocks, which generate fewer header overhead bits to identify the scan pattern and Huffman coding algorithms used. Four different SCAN language patterns are applied to each image block in sequence, and seven different preset (static) Huffman codes are utilized. Because the scan pattern that produces the best compression cannot be determined until the Huffman coding results are known, the seven Huffman coding algorithms are applied to the run-length encoder output in parallel. At this stage, the Huffman coded outputs are not preserved; only the bit counts, which are used to measure the performance of each scan pattern, are saved. A bank of three memory buffers is provided as temporary storage for the run-length encoder output for each bit plane. A run-length encoder bypass path is also provided for the case where the encoder produces reverse compression. During the first of the sequence of four scans, one of the three buffers will be loaded with the raw bit-plane data and a
second buffer will receive the run-length encoded data. The third data buffer will hold the best data from the previous image block, awaiting processing by the Huffman encoder. When this first scan is completed, comparators will determine which of the seven Huffman codes produced the best results, and compare this with the raw binary pass-through data bit count. The smaller of the two data sets will be saved, while the memory buffer holding the other data set will be made available to receive the run-length encoded output from the second of the four scans. When the second scan is completed, the best Huffman coded bit count will be compared with the bit count from the previous scan. Again, the smaller of the two data sets will be saved and the memory buffer holding the other data set will be made available to receive the run-length encoded output from the third of the four scans. This process continues until all four scans have been completed. At this point, data from the previous image block has been passed on through the Huffman encoder to the output buffer. Therefore, two of the three data buffers will again be available for the scan cycle of a subsequent image block while the third buffer holds the best run-length encoded data for the current image block. Once an image block has been scanned with the four-scan sequence, the best scan and best Huffman coding algorithm are known for each bit plane. If a bit-plane bit count is greater than break-even (4096 bits for a 64 × 64 pixel
Fig. 3. Two level 'nested' scan patterns: (a) first-level Z SCAN on 2 × 2, with second-level Spiral-In SCAN on 4 × 4; and (b) first-level Spiral-In SCAN on 4 × 4 with symmetric and rotational operators, and second-level Raster SCAN on 2 × 2.
Fig. 4. The hybrid image compression system diagram.
block), the raw bit-plane data will reside in the appropriate data buffer and be passed on as output. If compression was achieved, run-length encoded data will reside in the appropriate data buffer, and be made available to the Huffman encoder along with the identity of the Huffman code to be used (which was determined during run-length encoding). Because run-length encoded bit-plane data for an image block is Huffman coded in a single pass, at least four bit planes can be Huffman coded in sequence in the same time that the four-scan sequence was accomplished in the previous stage. Also, because the most significant two bit planes combined contain fewer data bits than the break-even value of 4096, two Huffman encoders can be used to process all nine bit planes (one Huffman encoder for bit planes 0–4, and the second encoder for bit planes 5–7 and the sign plane).
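The nine-plane decomposition these encoders operate on can be modelled in C as follows. This is a software sketch only (the hardware operates on 64 × 64 blocks in parallel); the names and the int16_t error representation are our assumptions, with plane 0 taken to be the most significant bit as in the text:

```c
#include <stdint.h>
#include <stdlib.h>

/* Split `n` signed DPCM error values into 8 magnitude bit planes plus a
 * sign plane.  plane[b][i] is bit b of |err[i]| (plane 0 = MSB);
 * plane[8][i] is the ninth (sign) plane. */
static void split_planes(const int16_t *err, size_t n, uint8_t plane[9][4096])
{
    for (size_t i = 0; i < n; i++) {
        uint16_t mag = (uint16_t)abs(err[i]);
        for (int b = 0; b < 8; b++)
            plane[b][i] = (mag >> (7 - b)) & 1;  /* extract bit (7 - b) */
        plane[8][i] = err[i] < 0;                /* 1 where the error is negative */
    }
}
```

Because DPCM errors cluster near zero, most bits land in the less significant planes, which is exactly the distribution the break-even test and encoder assignment above exploit.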
The Huffman encoded data is passed on to the dual output buffers. Two buffers are provided for each bit plane to enable storage of the Huffman encoder output for the current image block, while the results from the previous image block are awaiting processing through the accumulator to the output. The accumulator function sequentially routes the
Fig. 5. Common DPCM algorithm used to predict the value of pixel X.
Fig. 6. DPCM algorithms associated with the eight different local image topologies.
'best' encoded data from the nine dual bit-plane output buffers to the system output. Header information is added to the encoded data for each bit plane and each 64 × 64 image block to indicate which scan pattern and Huffman coding algorithms were used, or where encoder bypassing was performed.

3.1. The DPCM encoder

DPCM encoding is a form of predictive coding where the value of a pixel is predicted based on the values of neighbor pixels which are already known. The predicted value is then compared with the actual value, and the difference is used to form an error image. If the encoder and decoder each use the same algorithm(s), then only the value of the first pixel and the error image need to be preserved, and lossless compression can be achieved. In the simplest scheme, DPCM is performed using a raster scan and the value of one neighbor pixel: the pixel at x − 1 for all pixels except the first pixel in each row, which uses the pixel at y − 1. More complex
DPCM schemes use four or more pixels as illustrated in Fig. 5. This image compression system uses a raster scan based adaptive DPCM. The pixels with known values in the local neighborhood of the pixel being predicted are analyzed to determine the local image pattern or grain. This local pattern is classified as one of the eight topologies listed in Fig. 6, and the specific DPCM algorithm is selected based on this classification. These algorithms are easily implemented in logic using shift rights and shift lefts to perform the divide and multiply functions. Using the Right Diagonal Grain Topology as an example, the test for this local pattern is as illustrated in Fig. 7, which makes comparisons between the three groups of two pixels illustrated. If these comparisons agree within preset limits, the DPCM algorithm listed in Fig. 6 for the Right Diagonal Grain Topology will be used. Pixels at the image left, right and top borders are treated as special cases due to the reduced set of neighbor pixels available for grain testing and for use in the predictive coding algorithms.
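A minimal software sketch of this style of shift-based adaptive prediction is shown below. It implements only two illustrative cases, not the eight topologies of Fig. 6, and the grain test and threshold are our simplifications rather than the paper's exact comparisons:

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified adaptive DPCM step for one interior pixel.  W, N and NW are
 * the already-known west, north and north-west neighbours of pixel x.
 * The divide is a shift right, as in the hardware. */
static int16_t dpcm_error(uint8_t x, uint8_t W, uint8_t N, uint8_t NW)
{
    int pred;
    if (abs((int)N - (int)NW) <= 2)      /* row above is flat: horizontal grain */
        pred = W;                        /* predict from the west neighbour */
    else
        pred = ((int)W + (int)N) >> 1;   /* default: mean of W and N via shift */
    return (int16_t)((int)x - pred);     /* signed prediction error */
}
```

A matching decoder applies the same grain test to the already-reconstructed neighbors and adds the error back, which is what makes the scheme lossless.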
Fig. 7. The test for Right Diagonal Grain Topology makes three comparisons as illustrated, to determine if the Right Diagonal Grain DPCM algorithm can be used to predict the value of the pixel at X.
3.2. Scan generators

The first- and second-level scan generators provide the sequencing from one pixel to the next for the run-length encoding bit-plane processing. The first-level scan generator provides the four different scan patterns which are applied to each 64 × 64 pixel block, while the second-level scan generator provides the 64 starting pixel addresses to the first-level generator to enable processing of all 64 blocks of a 512 × 512 pixel image in a raster scan sequence.

Fig. 8. Right Orthogonal SCAN on 4 × 4 window of an 8 × 8 image, starting at address 13 (hex).

3.2.1. First-level scan generator

Observation of the primitive SCAN patterns illustrated in Fig. 2 makes it clear that the first-level scan generator hardware must be capable of taking a starting address and performing addition and subtraction operations. If N × N is the image size, for example, and the Right Orthogonal SCAN of Fig. 2(a) is desired, then the sequence of addresses required would be: start address, +1, +N, −1, start address + 2, +N, +N, −1, −1, and so on. These same operations would apply even if the SCAN window were smaller than the image as in Fig. 8. Fig. 9 illustrates the hardware implementation chosen to
Fig. 9. First-level SCAN generator.
Fig. 10. Second-level SCAN generator (macro-SCAN generator).
generate an addressing stream consistent with any of the SCAN primitives implemented, including the Right Orthogonal SCAN pattern, which will be used as an example. Initialization data is loaded into the starting address register (SAR), the N register and the four comparator registers. Scanning to the right, down and to the left is accomplished by adding SAR and 1, SAR and N, and SAR and 3FFFE hex plus 1 (the two's complement of −1), respectively. A comparator between register counters 'RCount' and 'RowR' alerts the system to change scan directions from scan down to scan left, and from scan left to 'return to the baseline row'. This return to the baseline row is accomplished using one of the Hold registers, which is loaded just prior to starting the downward scan, and which transfers its contents to the SAR register at the completion of the leftward scan. As the scan pattern creates ever larger loops within its defined window, counter CCount ultimately equals the window
size stored in ColR, and the scan is terminated to await new inputs from the second-level SCAN generator. This single data unit approach was selected not only to minimize the overall circuit count, but also to improve speed performance by eliminating the bussing or multiplexing which would be required to funnel multiple unique SCAN generators down to the Address Bus. Each of the SCAN patterns implemented in the first- and second-level SCAN generators can be applied to an input image at the rate of one pixel per system clock cycle, with no clock cycles lost in the partial handshake between the first- and second-level generators.

3.2.2. Second-level SCAN generator

Unlike the first-level generators, which increment around the assigned window in steps as small as one pixel, the
Fig. 11. Static Huffman codes. Algorithms 0, 1 and 3–6 are applied to the run-length encoder output. Algorithms 1–3 are applied to the 'alternating ones' encoder output (d indicates data bits).
second-level generator is required to increment around the assigned second-level window in steps no smaller than the first-level SCAN window size. For this system and a 512 × 512 image, the SCAN algorithm R8#D64 would require the second-level SCAN generator to provide the following sequence of addresses for the first-level generator start points: 0, 64, 128, 192, 256, 320, 384, 448, 32,768, 32,832, 32,896 and so on. For an image size of N, and a first-level window size of W, a second-level SCAN generator producing a Right Orthogonal SCAN must perform the following sequence of operations to accomplish its task:
start address, +W, +W × N, −W, start address + 2 × W, +W × N, +W × N, −W, −W, and so on. The major additional requirement for second-level SCAN generators is the ability to multiply the first-level window size by the image size to effect scanning in the vertical direction. This multiplication is implemented by loading the image size into a shift register, then performing one shift left for a first-level window size of 2 × 2, two shift lefts for a first-level window size of 4 × 4, three shift lefts for a first-level window size of 8 × 8, and so on. Although computational time increases proportionally to the first-level SCAN window size, larger
first-level windows require more time to SCAN, so sufficient time is always available.

Fig. 10 illustrates the data unit for the second-level SCAN generator. This design is very similar to the first-level SCAN generator data unit due to the similarity of tasks. The requirement for the second-level generator to scan left and right in increments of W (first-level window size), versus increments of one pixel, required the addition of the MDLTA register. The requirement for the second-level generator to scan vertically in increments of W × N, versus increments of N, required the N register to be replaced with the MDDLTA (macro down delta) shift left register and the addition of the MCnt (macro counter shift right register) to perform the W × N multiplications. The second-level generator output addresses are developed and loaded into MReg to be ready and waiting when required by the first-level SCAN generator. When the first-level generator acknowledges that a new address input has been received, the second-level generator processes the next address and loads it into MReg to be ready for the next first-level generator cycle. The second-level SCAN generator architecture can provide a wide range of second-level SCAN patterns. Because the proposed system processes each 64 × 64 pixel image block independently, all second-level SCAN patterns will produce the same results. Therefore, only one second-level SCAN pattern is required. For this specific application, the second-level SCAN generator function could be implemented easily using a memory table to provide the 64 preset starting addresses to the first-level generator.

Fig. 12. Two-level nested scans with the discontinuities between first-level patterns highlighted with bold lines.

3.3. Run-length encoders
Run-length encoding produces compression by replacing a sequence of binary data with a sequence of hexadecimal numbers, each of which represents the number of occurrences of the same binary value in an unbroken string. Run-length encoding the binary data string 0000000110000001110000, for example, would produce 7, 2, 6, 3, 4. In this case, the 22 bits of binary data have been compressed into five words of 3 bits each, for a compression ratio of 1.47:1. If the above bit-plane data string were preceded by a binary one, the run-length encoded output would be 0, 1, 7, 2, 6, 3, 4, with the leading zero indicating that the bit-plane data string starts with a 1, not a 0. Because the more significant bit planes have a low percentage of binary 1s, the associated run-length encoded output will frequently have relatively large numbers alternating with very small numbers, indicating large strings of binary 0s separated by very small strings of binary 1s. Often, the large strings of 0s are separated by single binary 1s. A run-length encoded output of 23, 1, 9, 1, 34, 1, … (with values greater than 1 alternating with values of 1) is typical of these more significant bit planes. Because of this situation, a second run-length encoding algorithm was also used. This second encoding algorithm, termed 'alternating ones encoding', would generate an output of 23, 9, 34, … for the above data. For the case where the run-length encoded data has two numbers greater than 1 in sequence, a 0 is inserted in the 'alternating ones' encoded output string to indicate this circumstance. As an example, a run-length encoded output of 23, 1, 9, 2, 34, 1, … would
Fig. 13. Average prediction error for each DPCM algorithm for Lenna.
Fig. 14. Predictive coding performance on Lenna. Total bits for one 512 × 512 pixel bit plane = 262,144. Total bits for the 512 × 512 pixel gray scale image = 262,144 × 8 = 2,097,152.
produce an 'alternating ones' encoded output of 23, 9, 0, 2, 0, 34, ….

3.4. Huffman coding

Both the run-length encoding and the 'alternating ones' encoding algorithms typically produce a wide range of output values. Because of this, a variable-length coding scheme is necessary to prevent the large word length that just one large encoder output value would require from significantly impacting compression. Although dynamic Huffman coding will produce better compression than static coding (because the dynamic code is specifically customized to the data), this superior performance is impacted severely by the need to include the code key as a table along with the header data. Because it is almost certain that the code key will be different for every bit
plane and every image block, the header penalty would be too great. In this system, seven different preset static Huffman codes are used. Six of these codes are applied to the run-length encoder output, and three codes are applied to the 'alternating ones' encoder output. Each of these coding algorithms is illustrated in Fig. 11. The system will output bit-plane data in sequence, one 64 × 64 pixel image block at a time. For each bit plane, output data will be based on the best of four scan patterns, and the best of nine run-length encoder algorithm and Huffman code combinations.

4. Simulation results

A wide spectrum of bit-plane processing based system architectures was simulated, using the C programming
Fig. 15. Input and output images associated with the adaptive DPCM process.
Fig. 16. Graphical representation of the 10 SCAN patterns simulated.
language. In addition, selected system functions [7] were coded in the logic design language AHPL [8] and run on HPSIM [9]. The two critical performance parameters associated with this compression system are processing speed and image data compression. At 30 frames per second, 512 × 512 pixel images must be processed at a rate of one pixel every 127 ns. Simulation results verified that the system could process four scans in sequence at the rate of one new pixel every system clock cycle. Thirty nanosecond clocks enable the sequence of four scans to be completed at the rate of 30 images per second. Image data compression performance varies widely from image to image. Large regions containing random patterns, such as a forest, ocean waves or the curly hair of a portrait subject, cause less compression. Large homogeneous regions such as a blue sky obviously enable improved compression. Compression performance also varied for different SCAN patterns. Continuous scans, such as continued raster and diagonal (SCAN letters C and D), produced compression
results which were better than discontinuous scans such as Right Orthogonal and raster (SCAN letters A and R). In addition, single level scans produced compression results which were better than multiple level 'nested' scans, even though the nested scans were made up of continuous scan primitives. Simple one-level continuous scans are more effective for image compression due to two factors. First, single-level continuous scans always move from one neighbor pixel to the next (the best place to find pixel values nearly the same), as opposed to the discontinuities that exist in most multiple-level nested scans, as illustrated in Fig. 12. Second, because a compound scan has shorter segments with more frequent direction changes, there is a much higher probability that boundaries between image regions will be crossed many more times than for a simple scan. The major advantage of compound scans is the higher level of encryption that can be achieved. Critical to the performance of this system is the subdividing of an image into blocks. This not only enables the bypass mode to be used on difficult blocks to prevent impacting compression gains made on other blocks, it also
Fig. 17. Compression performance on Lenna for the proposed system using one, four and ten different block scanning patterns. The four-SCAN sequence was 1.45% better than the R SCAN baseline, while the ten-SCAN sequence was only 0.084% better than the four-SCAN sequence.
Fig. 18. Total system output bits for Lenna for nine different system configurations.
enables the most effective scan pattern to be used on each block of the image. This system produces compression gains with the following three functions:

1. DPCM;
2. SCAN language based run-length encoding;
3. Huffman coding.

DPCM compression is performed in one pass for each image, including the sign bit. The run-length encoding and Huffman coding form a trial-and-error process which picks the best results from the nine different Huffman coding algorithm and run-length encoding combinations, which are the result of the four different SCAN patterns. The 512 × 512 gray scale Lenna test image was processed, and the results are presented in the following sections. The DPCM algorithms developed for this system produced an average prediction error of only 4.10 per pixel, or 1.6% (4.1/255). Fig. 13 provides the performance details for each element of the DPCM algorithm. In addition, it shows the excellent performance achieved on homogeneous regions of the image, and the significantly poorer performance for the more complex image regions. Fig. 14 illustrates the distribution of binary 1s in each of Lenna's bit planes before and after DPCM, and Fig. 15 provides the input and output images associated with the DPCM process. Before DPCM, the image has a total of 1,032,616 binary 1s in the eight bit planes, or 49.2% 1s. After DPCM, all binary 1s have been eliminated from the most significant bit plane and very good compression is achieved in bit planes 1–5. However, bit planes 6 and 7 have negligible compression, and a high entropy sign plane has been produced. Testing has shown that the sign plane and the least significant two bit planes routinely produce minimal to no compression, and require the binary pass-through mode of the system. These three 'uncompressible' planes place an upper limit on system compression performance of 8:3, or 2.67:1.
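The second of these functions, the run-length and 'alternating ones' encodings described in Section 3.3, can be sketched in C as follows (buffer sizes and names are ours; the alternating-ones pass assumes the bit stream starts with a run of 0s):

```c
#include <stddef.h>

/* Run-length encode a binary string.  out[] holds run lengths, starting
 * with the run of 0s; a leading 0 is emitted if the data starts with 1.
 * Returns the number of output words. */
static size_t rle(const unsigned char *bits, size_t n, unsigned out[])
{
    size_t k = 0;
    unsigned char cur = 0;     /* runs are counted starting from 0s */
    unsigned run = 0;
    for (size_t i = 0; i < n; i++) {
        if (bits[i] == cur) {
            run++;
        } else {
            out[k++] = run;    /* close the current run */
            cur = bits[i];
            run = 1;
        }
    }
    out[k++] = run;
    return k;
}

/* 'Alternating ones' pass: single 1s separating runs of 0s are implicit
 * and dropped; a run of 1s longer than one is escaped as 0, length, 0. */
static size_t alt_ones(const unsigned *in, size_t n, unsigned out[])
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0) {           /* runs of 0s: copy through */
            out[k++] = in[i];
        } else if (in[i] > 1) {     /* multi-bit run of 1s: escape it */
            out[k++] = 0;
            out[k++] = in[i];
            out[k++] = 0;
        }                           /* runs of exactly one 1 are implicit */
    }
    return k;
}
```

For the run-length output 23, 1, 9, 2, 34, 1 the alternating-ones pass reproduces the 23, 9, 0, 2, 0, 34 example of Section 3.3.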
It is up to the scanning, run-length encoding and Huffman coding system functions to process bit planes 1–5 as effectively as possible to minimize erosion of this limit. Testing has also shown that for complex images (such as
Lenna), the most effective scan cannot be determined at the run-length encoder output. The best scan can be selected only by comparing the results from all static Huffman codes applied to all scans. Lenna has been processed using the discontinuous raster scan as a baseline, plus the other nine SCAN patterns illustrated in Fig. 16. Fig. 17 illustrates the system compression performance when only the raster scan is used for run-length encoding, and also when four and ten different SCAN patterns were used. A four-SCAN sequence was ultimately chosen for this system due to the 'diminishing returns' that additional SCAN patterns achieve. The continuous raster and continuous diagonal scans and their rotated permutations were chosen because these four patterns provided the best match to the local grain of images in general.

5. Conclusions

Overall, this four-scan system has consistently achieved compressions of 5.3 bits/pixel to less than 4.0 bits/pixel, with Lenna resulting in 4.609 bits/pixel for a compression ratio of 1.7356:1. Some compression systems have achieved better performance: Ranganathan et al. [4] report 4.42 bits/pixel on Lenna using an adaptive quadtree segmentation scheme. Said and Pearlman [10] report 4.17 bits/pixel on Lenna using the S + P transform, and Weinberger et al. [11] have achieved 4.15 bits/pixel on Lenna using a context modeling methodology. It is not clear that these systems can operate on images in real time without a massively parallel computer architecture. It was a surprise to see the small effect that additional scan patterns achieved (see Fig. 17). Analysis indicates that the DPCM process eliminates most of the image grain that is necessary for a large number of SCAN patterns to be effective. The data presented in Fig. 18 indicates that the compression difference between using only the R-SCAN and the ten-scan sequence is 8% if DPCM is not performed, versus the 1.5% difference for the full system.
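The bits/pixel figures quoted above convert to compression ratios and total bit counts by simple bookkeeping (function names are ours):

```c
/* Compression bookkeeping for an 8-bit, dim x dim gray scale image:
 * ratio against the 8-bit source, and total output bits at a given
 * bits/pixel figure. */
static double compression_ratio(double bits_per_pixel)
{
    return 8.0 / bits_per_pixel;
}

static double output_bits(double bits_per_pixel, int dim)
{
    return bits_per_pixel * dim * dim;
}
```

At 4.609 bits/pixel this gives a ratio of about 1.736:1 against the 8-bit source, consistent with the figure quoted above.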
Fig. 18 makes it clear that DPCM is the biggest contributor to compression, followed by the Huffman coding, then by multiple scans versus only the R-SCAN. The almost 4.5% compression performance advantage
this system produces over the Wang system [3] is undoubtedly due to a combination of factors, including multiple scans, adaptive DPCM and a reasonably large selection of static Huffman codes. It is likely that some additional compression improvement (up to perhaps 3%) could be realized with more work on the DPCM and Huffman coding algorithms, but in general the system functions are very close to optimum. In addition, this SCAN-language-based hybrid system provides very good encryption for applications where information security is a concern [6]. Using the full system with a four-scan sequence as a baseline, the compression reduction for the full system with only one scan is 1.48%; for the four-scan sequence without DPCM, 24.3%; and for the four-scan sequence without Huffman coding, 11.0% (uncompressed image: 2,097,152 bits).

References

[1] N.G. Bourbakis, R. Brause, C. Alexopolos, SCAN image compression/encryption hardware system, Proceedings of the SPIE 2419 (February) (1995) 354–364.
[2] N.G. Bourbakis, A language for sequential access of two-dimensional array elements, Proceedings of the IEEE Workshop on Languages for Automation, Singapore, August 1986, pp. 52–58.
[3] Y. Wang, A set of transforms for lossless image compression, IEEE Transactions on Image Processing 4 (5) (1995).
[4] N. Ranganathan, S. Romaniuk, K.R. Namuduri, A lossless image compression algorithm using variable block size segmentation, IEEE Transactions on Image Processing 4 (10) (1995).
[5] N.G. Bourbakis, C. Alexopolos, A. Klinger, A parallel implementation of the SCAN language, Computer Languages 14 (4) (1989) 239–254.
[6] C. Alexopoulos, N. Bourbakis, N. Ioannou, Image encryption method using a class of fractals, Journal of Electronic Imaging 4 (3) (1995) 251–259.
[7] G.W. Drost, SCAN based lossless image compression application specific integrated circuit, Masters thesis, Binghamton University, 1998.
[8] F.J. Hill, G.R. Peterson, Digital Systems: Hardware Organization and Design, 3rd ed., Wiley, New York, 1987.
[9] Z. Navabi, F.J. Hill, User manual for AHPL simulator (HPSIM2) and AHPL compiler (HPCOM), Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721, 1990.
[10] A. Said, W.A. Pearlman, An image multiresolution representation for lossless and lossy compression, IEEE Transactions on Image Processing 5 (9) (1996).
[11] M.J. Weinberger, J.J. Rissanen, R.B. Arps, Applications of universal context modeling to lossless compression of gray-scale images, IEEE Transactions on Image Processing 5 (4) (1996).
Gary Drost received his BS in Electrical Science from RIT and his MS in Computer Engineering from SUNY-Binghamton in 1998. He spent several years with IBM designing digital circuits. His research interests are digital circuit design, hardware systems for image compression, and distributed systems.
Nikolaos G. Bourbakis (IEEE Fellow) received his BS in mathematics from the National University of Athens, Athens, Greece, and his PhD in computer science and computer engineering from the Department of Computer Engineering & Informatics, University of Patras, Patras, Greece, in 1983. He is currently a Professor in ECE at BU, a Professor at TUC, GR, and the Director of two research laboratories. He has directed several research projects funded by government and industry, has published extensively in refereed international journals and conference proceedings, and is an author, co-author or editor of several books. He is the founder and Editor-in-Chief of the International Journal on AI Tools, the Editor-in-Charge of a research series of books in AI (WS Publisher), the founder and General Chair of IEEE Computer Society conferences, symposia and workshops, an Associate Editor of IEEE and international journals, and a Guest Editor of 14 special issues in IEEE and international journals related to his research interests. He conducts research in applied artificial intelligence, image and video processing, biomedical engineering, and distributed computing, processor design and VLSI-CAD. His research work has been internationally recognized with several prestigious awards, including the IBM Author Recognition Award (1991), the IEEE Outstanding Paper Award ATC (1994), the IEEE Computer Society Technical Research Achievement Award (1998), and the IEEE ICTAI 10 Years Research Contribution Award (1999).