Lossless high-speed data compression for optical interconnects as used in maskless lithography systems

Sven-Hendrik Voss *, Maati Talmi

Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Berlin D-10587, Germany

Available online 26 January 2006

* Corresponding author. Tel.: +49 30 31 00 23 18; fax: +49 30 31 00 22 13. E-mail address: [email protected] (S.-H. Voss).
Abstract

Maskless lithography requires the management and transmission of exposure data at very high data rates. To meet these requirements, an effective representation of the data is necessary, which leads to the need for lossless data compression. In this paper, common standard compression techniques are compared with regard to their applicability, and a new lossless compression approach and its hardware implementation are presented. Furthermore, the trade-off between the buffer size, an essential part of the compression scheme, and the compression ratio is examined. The problem of DC balance in optical transmission is explicitly addressed and solved by a DC-balanced coding. The aim was to develop a cost-optimized method for compressing and decompressing lithography data in real time that is suitable for circuit integration. The results, including a proposed integration into a maskless lithography system, are presented here.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Real-time hardware implementation; Lossless compression; DC balance; Error detection
1. Introduction
One of the common problems in maskless lithography is the management and transfer of the enormous data volumes required to define die structures on a wafer. To achieve effective exposure rates, the electronics inside an aperture plate system must be provided with a large amount of data within very short intervals [1]. A production-worthy maskless lithography tool will require a data path with transfer rates in the terabit/s range. First proof-of-concept tools have to demonstrate the transmission of about 40 Gbps over 40 channels with a channel rate of 1 Gbps [2]. To maximize the throughput of information, it is therefore worthwhile to compress the data effectively, since data compression saves both storage space and transmission bandwidth.
2. Standard compression techniques

At the outset, various standard compression techniques were investigated and analyzed with regard to their applicability to the exposure data, which is available in the form of pixel data. The pixel representation is necessary because processing hierarchical formats such as GDS-2 in real time on the decoder side would be unjustifiably complex. Considering only lossless techniques was a basic principle, since the focus is on transmitting exposure data that does not allow any degradation or loss of information. It was found that only a limited number of compression techniques is theoretically suitable for compressing lithography data. Lempel–Ziv (LZ77 and LZ78) and Lempel–Ziv–Welch (LZW) are compression algorithms based on the substitution of recurring symbols with pointers to previous occurrences in a dictionary. They are fast and easy to implement but come with large memory demands. The portable network graphics format (PNG) is used for the storage of raster images. Its compression scheme uses pre-processing to remove data redundancy, followed by the Deflate algorithm, which can be seen as a combination of LZ77 and Huffman coding.
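The memory behaviour of these dictionary-based approaches, which motivates the look-up table concerns discussed further below, can be made concrete with a minimal LZW sketch. This is an illustrative Python fragment only, not part of the proposed scheme; the function name and the test string are chosen purely for demonstration.

    # Illustrative LZW encoder sketch (not the scheme proposed in this paper).
    # It shows how the dictionary gains one entry for every new phrase,
    # which is the source of the memory demand discussed in the text.

    def lzw_encode(data: bytes):
        # Initialise the dictionary with all single-byte strings.
        dictionary = {bytes([i]): i for i in range(256)}
        next_code = 256
        phrase = b""
        output = []
        for byte in data:
            candidate = phrase + bytes([byte])
            if candidate in dictionary:
                phrase = candidate                  # extend the current match
            else:
                output.append(dictionary[phrase])
                dictionary[candidate] = next_code   # dictionary grows here
                next_code += 1
                phrase = bytes([byte])
        if phrase:
            output.append(dictionary[phrase])
        return output, len(dictionary)

    codes, dict_size = lzw_encode(b"abababababab")
    print(codes, dict_size)   # [97, 98, 256, 258, 257, 260] 261 entries already

Even for this tiny input the table already holds 261 entries; for realistic data streams it must be rebuilt on the decoder side, which is the reason an external SRAM becomes unavoidable.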
Differential pulse code modulation (DPCM) is widely used in lossless audio compression and in the old lossless JPEG graphics format. DPCM is a predictive coding scheme that uses information already received to predict future values; it is suited to sources with memory, i.e., sources where the variance of the prediction error is much smaller than that of the source itself. DPCM encoding is often followed by Huffman coding.

The concepts used in these compression algorithms are simple, but their hardware implementations are considerably more involved, which has to be taken into account when judging applicability. The algorithms are mostly specified in the standards documents as C pseudo-code and thus require the translation of an essentially serial implementation into a pipelineable design. Moreover, some software concepts, e.g., dynamic memory allocation, cannot be applied in hardware, and fixing a maximum memory size in advance is impractical because of changing data statistics and varying entropy. For the LZ derivatives, the problems lie in the management of the look-up tables. Every time a new character is read in, the look-up table has to be searched for a match; if no match is found, a new string has to be added. The resulting problems are obvious: the table can quickly become very large, and if strings of minor length are processed, the overhead can easily exceed the byte count of the string itself (negative compression). The look-up table is written into the output stream one entry at a time during compression and then rebuilt during decompression, which in any case requires an external SRAM (as the simplest and fastest option). This is intolerable on the decoder side because of the chip space, packaging and power constraints in the vacuum environment. The problem remains the same for all dictionary-type compression, including the algorithm used in the PNG format. Predictive algorithms like DPCM show disadvantages in hardware implementation because absolute-value and squaring operations are needed, and these two steps require extensive circuitry. Another challenge with these algorithms lies in pipelining: each value prediction, error calculation and statistics update must be completed before the next value can be processed, which stalls the hardware resources and limits the processing speed to a few Mbps.
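The serial dependency can be seen in a minimal DPCM sketch with a simple previous-sample predictor. This is illustrative Python only, not the coding scheme proposed in this paper: every residual depends on the previously processed value, so consecutive samples cannot be handled in independent pipeline stages.

    # Minimal DPCM sketch (illustrative only).  Each residual depends on the
    # previously reconstructed value - the serial dependency that hinders a
    # pipelined hardware implementation.

    def dpcm_encode(samples):
        prediction = 0
        residuals = []
        for value in samples:
            residuals.append(value - prediction)  # needs the previous result
            prediction = value                    # predictor update
        return residuals

    def dpcm_decode(residuals):
        prediction = 0
        samples = []
        for residual in residuals:
            value = prediction + residual
            samples.append(value)
            prediction = value
        return samples

    pixels = [12, 13, 13, 14, 60, 61, 61]
    res = dpcm_encode(pixels)
    assert dpcm_decode(res) == pixels
    print(res)   # [12, 1, 0, 1, 46, 1, 0] - small residuals for smooth data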
Against the background of these investigations, another approach was pursued: the development of a new, proprietary coding scheme.

3. Overview of the new compression scheme

In proposing the novel lossless compression approach, requirements were first established, including the following conditions:

• Raw data is 6-bit coded, as 64 grey values deliver sufficient accuracy for layout data [2].
• The compression scheme must be lossless.
• The algorithm has to adapt to changes in the data statistics to maximize compression performance.
• The algorithm must be suitable for the intended real-time implementation while minimizing memory and power budgets on the encoder side (non-binding) and on the decoder side (mandatory due to the vacuum environment).
• The trade-off between processing complexity and real-time capability is resolved in favour of a real-time hardware implementation.

Lossless compression is possible by taking advantage of those parts of the layout images that imply redundancy. A spatial sub-pixel approach is used, taking into consideration spatial correlation among neighboring pixels as well as whole repetitive sequences of pixel values. No transformations into the spectral domain are needed. The algorithm exploits inner-symbol bit-correlations rather than symbol- or pixel-correlations, meaning that compression is achieved even if the pixel values are not identical. It is a symmetric method, i.e., almost the same algorithm is used for compression and decompression, resulting in about the same processing time. Fig. 1 shows a block diagram of the basic operations needed, whereas Fig. 2 presents a more detailed block diagram of the simplified encoder structure with regard to the subsequent hardware implementation, highlighting the main functional components.

An analysis of the input data is the first processing stage. The input data is scanned on the fly in sequential order. If groups of pixel values appear in identical order in the data stream, they are grouped and treated as a pattern instead of single values. Concurrently, an inner-symbol bit-correlation check runs on the input data. For this purpose, the pixel values are divided into separate bitsections to deliver finer entities for correlation. The number of bitsections (n) is arbitrary and depends on the bit length of the input data symbols; for 6-bit coded input data, n = 2 is a reasonable value. Assuming input data with adequate dynamics, it is plain to see that in binary representation the image structure becomes decomposed into the lower bitsections. It was found that, for images containing plain intensity values, a change of information over a certain time window in most cases happens in only one bitsection at a time, so the information content of the other bitsections can be utilized for compression. Depending on the outcome of both check routines, the type with the higher expected gain is chosen. Aside from that, no additional data modelling is performed in the encoder (or decoder), making both considerably simple and fast, as real-time capability is one of the essential requirements. Once correlations or repetitions are detected, the 6 bits provided for the representation of each pixel value are ‘‘misused’’ and filled with specific information about the occurrence of the corresponding bitsections. Every number of occurrences ≥ 2 in correlations and repetitions can be efficiently compressed. For an unambiguous decompression it is necessary to distinguish between uncompressed pixel values and the information about the occurrence of corresponding bitsections. For this purpose, the SFS (semantic flag setting) block is introduced, providing a precise differentiation for both cases by setting an accordant flag in the data stream.
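The bitsection idea can be sketched as follows. The exact grouping of bits and the token format used by the proposed encoder are not detailed in this paper; the Python fragment below merely illustrates, for n = 2, how redundancy can appear in one bitsection even when no two pixel values are identical.

    # Hedged sketch of the inner-symbol bitsection idea for 6-bit pixels,
    # n = 2.  The grouping and the example values are illustrative only.

    def bitsections(value, n=2, width=6):
        """Split a width-bit value into n equally sized bitsections (MSB first)."""
        section_bits = width // n
        mask = (1 << section_bits) - 1
        return [(value >> (width - (i + 1) * section_bits)) & mask for i in range(n)]

    pixels = [0b101000, 0b101001, 0b101011, 0b101010]   # upper bitsection constant
    sections = [bitsections(p) for p in pixels]
    print(sections)   # [[5, 0], [5, 1], [5, 3], [5, 2]]

    # The upper bitsection (5) repeats although no two pixel values are equal,
    # so its repeated occurrence can be signalled compactly while only the
    # lower bitsection needs to be transmitted in full.
    print(len({s[0] for s in sections}) == 1)   # True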
Fig. 1. Block diagram of basic operations in the compression scheme: buffer, sub-pixel analysis, correlation estimation, coding decision and balance coding.
Fig. 2. Block diagram of simplified encoder structure highlighting the main functional components.
Because the data is compressed losslessly on the fly, the output rate varies over time. Thus, a ‘‘flow control’’ mechanism and a FIFO buffer are provided, described in the figure as ‘‘VR adjust’’.
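A rough behavioural model of this rate smoothing is sketched below. The depth, drain rate and overflow handling are assumptions made for illustration; the actual ‘‘VR adjust’’ logic is only known from the block diagram.

    # Illustrative model of rate smoothing with a FIFO: a bursty, variable-rate
    # encoder output is drained at a constant link rate.  Depth and thresholds
    # are arbitrary assumptions, not the implemented values.
    from collections import deque

    def simulate_fifo(words_per_cycle, depth=64, drain_per_cycle=1):
        fifo = deque()
        peak = 0
        for produced in words_per_cycle:
            for word in range(produced):          # variable-rate encoder output
                if len(fifo) >= depth:
                    raise OverflowError("FIFO full - flow control must stall the encoder")
                fifo.append(word)
            peak = max(peak, len(fifo))           # occupancy the buffer must absorb
            for _ in range(min(drain_per_cycle, len(fifo))):
                fifo.popleft()                    # constant-rate output link
        return peak

    print(simulate_fifo([3, 0, 0, 2, 0, 1, 0, 0]))   # 3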
4. Buffer size

The trade-off between buffer size and compression ratio has been investigated; the corresponding results are shown in Table 1. There is a specific ‘‘ideal constellation’’ of buffer size and compression efficiency (i.e., a buffer size of 64 · 6). Data reductions by a factor of more than 1:2 are certainly possible and are the subject of ongoing research using this new approach.

Table 1
Dependency of compression results on buffer size

Buffer size (bit)   Random data   Grid structure   Shades structure
4 · 6               1:1.037       1:1.27           1:1.08
8 · 6               1:1.052       1:1.75           1:1.10
12 · 6              1:1.060       1:1.79           1:1.77
16 · 6              1:1.065       1:1.85           1:1.82
32 · 6              1:1.072       1:1.85           1:1.97
64 · 6              1:1.073       1:1.85           1:2.00
128 · 6             1:1.073       1:1.85           1:1.99
512 · 6              1:1.072       1:1.84           1:1.98
1024 · 6             1:1.072       1:1.84           1:1.98
5. Balance coding

Preparing the data for optical transmission adds another requirement, namely the so-called DC balance. For this purpose, a proprietary coding scheme is proposed that complements the compression implementation, guarantees DC balance and, moreover, provides error detection. The proposed balance coding scheme encodes 6-bit data into an 8-bit form so that the transmitted data contains equal numbers of ones and zeros. The surplus of possible code-words can be used to signal the compressed/uncompressed state, thereby changing the interpretation of a given codeword. With DC balance as a coding rule it is also possible to detect errors in the data stream through code violations, since every odd number of erroneous bits results in a code violation; the check can easily be implemented in the form of a look-up table.
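The principle can be sketched as follows. There are exactly C(8,4) = 70 eight-bit words with four ones and four zeros, so 64 of them can carry the 6-bit data and the remaining codewords can act as control codes, while any word of different weight indicates an error. The concrete codeword assignment in this Python sketch is arbitrary and chosen only for illustration; the paper does not specify the actual codebook.

    # Hedged sketch of a 6b/8b DC-balanced code.  Only the principle follows
    # the text (balanced codewords, spare codes for signalling, code-violation
    # based error detection); the mapping itself is an illustrative assumption.
    from itertools import combinations

    # All 8-bit words of Hamming weight 4: C(8,4) = 70 candidates.
    balanced = sorted(sum(1 << b for b in bits) for bits in combinations(range(8), 4))
    assert len(balanced) == 70

    ENCODE = {value: balanced[value] for value in range(64)}   # 64 data codes
    SPARE  = balanced[64:]                                     # 6 control codes
    DECODE = {code: value for value, code in ENCODE.items()}

    def decode(word):
        if bin(word).count("1") != 4:
            raise ValueError("code violation - transmission error detected")
        if word in SPARE:
            return ("control", SPARE.index(word))
        return ("data", DECODE[word])

    tx = ENCODE[0b101101]
    print(decode(tx))                 # ('data', 45)
    try:
        decode(tx ^ 0b1)              # one flipped bit changes the weight
    except ValueError as err:
        print(err)                    # code violation - transmission error detected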
6. Implementation on FPGA platform

After building a rough software reference model as a proof of concept for the algorithm, the encoder/decoder structure was fully implemented in hardware. To this end, the whole design was optimized for pipelining. The underlying hardware platform was a Xilinx Virtex-II Pro FPGA (XC2VP50). The arithmetic operations add up to 304 comparators and 48 adders/subtractors, with multiplication and division avoided entirely, and the provided memory was fixed to approximately 3 kbit, leading to a cost of approximately 18 k transistors for a full one-channel implementation. The implementation achieves a throughput of 750 Mbps with a coding delay of 1.568 µs at the beginning of the data transmission. Based on this result, an additional increase by a factor of two is expected to be achievable with an ASIC implementation. Further optimization and the accompanying technology progression will allow an expected throughput of about 2 Gbps and above.

7. Conclusion

A novel lossless compression approach and its hardware implementation were presented. Because the compressed data is generated on the fly using minimal memory and processing resources, encoding is faster than with conventional algorithms that use block-based search methods or methods known from image processing, which makes the concept well suited for high-speed applications. Although the idea of data compression for maskless lithography has already been addressed in the literature, previous efforts have failed on the required real-time capability or hardware complexity [3].
Particular attention was paid to the possibility of decoding the compressed data in real time in the vacuum environment, with its crucial restrictions on the allowable power dissipation, and to the chip space available in the vacuum area. The proposed coding scheme gives priority to real-time capability and thereby certainly forfeits some coding efficiency, which makes it attractive for transmission rather than storage applications. The percentage data reduction may not be the best achievable among all available techniques; however, the simplicity and adaptivity of the scheme allow a cost-optimized high-speed hardware implementation and applicability beyond image data.

References

[1] S.-H. Voss et al., High-speed data storage and processing for projection mask-less lithography systems, Microelectronic Engineering, MNE 2005 International Conference.
[2] C. Brandstaetter, PML2 – Technology Results, SEMATECH Industry Maskless Meeting talk, San Jose, CA, USA, 17–19 January 2005.
[3] V. Dai, A. Zakhor, Proc. SPIE 4688 (2002).