Lossless high-speed data compression for optical interconnects as used in maskless lithography systems

Sven-Hendrik Voss *, Maati Talmi

Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Berlin D-10587, Germany

Available online 26 January 2006

* Corresponding author. Tel.: +49 30 31 00 23 18; fax: +49 30 31 00 22 13. E-mail address: [email protected] (S.-H. Voss).
Abstract

Maskless lithography requires the management and transmission of exposure data at very high data rates. To meet these requirements, an effective representation of the data is necessary, which leads to the need for lossless data compression. In this paper, common standard compression techniques are compared with regard to their applicability, and a new lossless compression approach and its hardware implementation are presented. Furthermore, the trade-off between the buffer size, an essential part of the compression scheme, and the compression ratio is examined. The problem of DC balance in optical transmission is explicitly addressed and solved by a DC-balanced coding. The aim was to develop a cost-optimized method for compressing and decompressing lithography data in real time that is suitable for circuit integration. The results, including a proposed integration into a maskless lithography system, are presented here.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Real-time hardware implementation; Lossless compression; DC balance; Error detection
1. Introduction
One of the common problems in maskless lithography is the management and transfer of the enormous data volumes required to define die structures on a wafer. To achieve effective exposure rates, the electronics inside an aperture plate system must be provided with a large amount of data within very short intervals [1]. A production-worthy maskless lithography tool will require a data path with transfer rates in the terabit/s range. First proof-of-concept tools have to demonstrate the transmission of about 40 Gbps over 40 channels with a channel rate of 1 Gbps [2]. To maximize the throughput of information, it is therefore worthwhile to compress the data effectively, since data compression saves both storage space and transmission bandwidth.
2. Standard compression techniques

At the outset, various standard compression techniques were investigated and analyzed with regard to their applicability to the exposure data, which is available in the form of pixel data. The pixel representation is necessary because processing hierarchical formats such as GDS-2 in real time on the decoder side would be unjustifiably complex. Considering only lossless techniques was a basic principle, since the focus is on transmitting exposure data that does not allow any degradation or loss of information. It was found that only a limited number of compression techniques is theoretically suitable for compressing lithography data. Lempel–Ziv (LZ77 and LZ78) and Lempel–Ziv–Welch (LZW) are compression algorithms based on the substitution of recurring symbols with pointers to previous occurrences in a dictionary. They are fast and easy to implement but come with large memory demands. The portable network graphics format (PNG) is used for the storage of raster images. Its compression scheme uses pre-processing to remove data redundancy, followed by the Deflate algorithm, which can be seen as a combination of LZ77 and Huffman coding.
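The memory behaviour of these dictionary-based approaches, which motivates the look-up table concerns discussed further below, can be made concrete with a minimal LZW sketch. This is an illustrative Python fragment only, not part of the proposed scheme; the function name and the test string are chosen purely for demonstration.

    # Illustrative LZW encoder sketch (not the scheme proposed in this paper).
    # It shows how the dictionary gains one entry for every new phrase,
    # which is the source of the memory demand discussed in the text.

    def lzw_encode(data: bytes):
        # Initialise the dictionary with all single-byte strings.
        dictionary = {bytes([i]): i for i in range(256)}
        next_code = 256
        phrase = b""
        output = []
        for byte in data:
            candidate = phrase + bytes([byte])
            if candidate in dictionary:
                phrase = candidate                  # extend the current match
            else:
                output.append(dictionary[phrase])
                dictionary[candidate] = next_code   # dictionary grows here
                next_code += 1
                phrase = bytes([byte])
        if phrase:
            output.append(dictionary[phrase])
        return output, len(dictionary)

    codes, dict_size = lzw_encode(b"abababababab")
    print(codes, dict_size)   # [97, 98, 256, 258, 257, 260] 261 entries already

Even for this tiny input the table already holds 261 entries; for realistic data streams it must be rebuilt on the decoder side, which is the reason an external SRAM becomes unavoidable.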
Differential pulse code modulation (DPCM) is widely used in lossless audio compression and in the old lossless JPEG graphics format. DPCM is a predictive coding scheme that uses information already received to predict future values; it is suited to sources with memory, i.e., sources where the variance of the prediction error is much smaller than that of the source itself. DPCM encoding is often followed by Huffman coding.

The concepts used in these compression algorithms are simple, but their hardware implementations are considerably more involved, which has to be taken into account when judging applicability. The algorithms are mostly specified in the standards documents as C pseudo-code and thus require the translation of an essentially serial implementation into a pipelineable design. Moreover, some software concepts, e.g., dynamic memory allocation, cannot be applied in hardware, and fixing a maximum memory size in advance is impractical because of changing data statistics and varying entropy. For the LZ derivatives, the problems lie in the management of the look-up tables. Every time a new character is read in, the look-up table has to be searched for a match; if no match is found, a new string has to be added. The resulting problems are obvious: the table can quickly become very large, and if strings of minor length are processed, the overhead can easily exceed the byte count of the string itself (negative compression). The look-up table is written into the output stream one entry at a time during compression and then rebuilt during decompression, which in any case requires an external SRAM (as the simplest and fastest option). This is intolerable on the decoder side because of the chip space, packaging and power constraints in the vacuum environment. The problem remains the same for all dictionary-type compression, including the algorithm used in the PNG format. Predictive algorithms like DPCM show disadvantages in hardware implementation because absolute-value and squaring operations are needed, and these two steps require extensive circuitry. Another challenge with these algorithms lies in pipelining: each value prediction, error calculation and statistics update must be completed before the next value can be processed, which stalls the hardware resources and limits the processing speed to a few Mbps.
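The serial dependency can be seen in a minimal DPCM sketch with a simple previous-sample predictor. This is illustrative Python only, not the coding scheme proposed in this paper: every residual depends on the previously processed value, so consecutive samples cannot be handled in independent pipeline stages.

    # Minimal DPCM sketch (illustrative only).  Each residual depends on the
    # previously reconstructed value - the serial dependency that hinders a
    # pipelined hardware implementation.

    def dpcm_encode(samples):
        prediction = 0
        residuals = []
        for value in samples:
            residuals.append(value - prediction)  # needs the previous result
            prediction = value                    # predictor update
        return residuals

    def dpcm_decode(residuals):
        prediction = 0
        samples = []
        for residual in residuals:
            value = prediction + residual
            samples.append(value)
            prediction = value
        return samples

    pixels = [12, 13, 13, 14, 60, 61, 61]
    res = dpcm_encode(pixels)
    assert dpcm_decode(res) == pixels
    print(res)   # [12, 1, 0, 1, 46, 1, 0] - small residuals for smooth data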
Against the background of these investigations, another approach was pursued: the development of a new, proprietary coding scheme.

3. Overview of the new compression scheme

In proposing the novel lossless compression approach, requirements were first established, including the following conditions:

• Raw data is 6-bit coded, as 64 grey values deliver sufficient accuracy for layout data [2].
• The compression scheme must be lossless.
• The algorithm has to adapt to changes in the data statistics to maximize compression performance.
• The algorithm must be suitable for the intended real-time implementation while minimizing memory and power budgets on the encoder side (non-binding) and on the decoder side (mandatory due to the vacuum environment).
• The trade-off between processing complexity and real-time capability is resolved in favour of a real-time hardware implementation.

Lossless compression is possible by taking advantage of those parts of the layout images that imply redundancy. A spatial sub-pixel approach is used, taking into consideration spatial correlation among neighboring pixels as well as whole repetitive sequences of pixel values. No transformations into the spectral domain are needed. The algorithm exploits inner-symbol bit-correlations rather than symbol- or pixel-correlations, meaning that compression is achieved even if the pixel values are not identical. It is a symmetric method, i.e., almost the same algorithm is used for compression and decompression, resulting in about the same processing time. Fig. 1 shows a block diagram of the basic operations needed, whereas Fig. 2 presents a more detailed block diagram of the simplified encoder structure with regard to the subsequent hardware implementation, highlighting the main functional components.

An analysis of the input data is the first processing stage. The input data is scanned on the fly in sequential order. If groups of pixel values appear in identical order in the data stream, they are grouped and treated as a pattern instead of single values. Concurrently, an inner-symbol bit-correlation check runs on the input data. For this purpose, the pixel values are divided into separate bitsections to deliver finer entities for correlation. The number of bitsections (n) is arbitrary and depends on the bit length of the input data symbols; for 6-bit coded input data, n = 2 is a reasonable value. Assuming input data with adequate dynamics, it is plain to see that in binary representation the image structure becomes decomposed into the lower bitsections. It was found that, for images containing plain intensity values, a change of information over a certain time window in most cases happens in only one bitsection at a time, so the information content of the other bitsections can be utilized for compression. Depending on the outcome of both check routines, the type with the higher expected gain is chosen. Aside from that, no additional data modelling is performed in the encoder (or decoder), making both considerably simple and fast, as real-time capability is one of the essential requirements. Once correlations or repetitions are detected, the 6 bits provided for the representation of each pixel value are ‘‘misused’’ and filled with specific information about the occurrence of the corresponding bitsections. Every number of occurrences ≥ 2 in correlations and repetitions can be efficiently compressed. For an unambiguous decompression it is necessary to distinguish between uncompressed pixel values and the information about the occurrence of corresponding bitsections. For this purpose, the SFS (semantic flag setting) block is introduced, providing a precise differentiation for both cases by setting an accordant flag in the data stream.
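The bitsection idea can be sketched as follows. The exact grouping of bits and the token format used by the proposed encoder are not detailed in this paper; the Python fragment below merely illustrates, for n = 2, how redundancy can appear in one bitsection even when no two pixel values are identical.

    # Hedged sketch of the inner-symbol bitsection idea for 6-bit pixels,
    # n = 2.  The grouping and the example values are illustrative only.

    def bitsections(value, n=2, width=6):
        """Split a width-bit value into n equally sized bitsections (MSB first)."""
        section_bits = width // n
        mask = (1 << section_bits) - 1
        return [(value >> (width - (i + 1) * section_bits)) & mask for i in range(n)]

    pixels = [0b101000, 0b101001, 0b101011, 0b101010]   # upper bitsection constant
    sections = [bitsections(p) for p in pixels]
    print(sections)   # [[5, 0], [5, 1], [5, 3], [5, 2]]

    # The upper bitsection (5) repeats although no two pixel values are equal,
    # so its repeated occurrence can be signalled compactly while only the
    # lower bitsection needs to be transmitted in full.
    print(len({s[0] for s in sections}) == 1)   # True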
Fig. 1. Block diagram of basic operations in the compression scheme: buffer, sub-pixel analysis, correlation estimation, coding decision and balance coding.
Fig. 2. Block diagram of simplified encoder structure highlighting the main functional components.
Because the data is compressed losslessly on the fly, the output rate varies over time. Thus, a ‘‘flow control’’ mechanism and a FIFO buffer are provided, described in the figure as ‘‘VR adjust’’.
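A rough behavioural model of this rate smoothing is sketched below. The depth, drain rate and overflow handling are assumptions made for illustration; the actual ‘‘VR adjust’’ logic is only known from the block diagram.

    # Illustrative model of rate smoothing with a FIFO: a bursty, variable-rate
    # encoder output is drained at a constant link rate.  Depth and thresholds
    # are arbitrary assumptions, not the implemented values.
    from collections import deque

    def simulate_fifo(words_per_cycle, depth=64, drain_per_cycle=1):
        fifo = deque()
        peak = 0
        for produced in words_per_cycle:
            for word in range(produced):          # variable-rate encoder output
                if len(fifo) >= depth:
                    raise OverflowError("FIFO full - flow control must stall the encoder")
                fifo.append(word)
            peak = max(peak, len(fifo))           # occupancy the buffer must absorb
            for _ in range(min(drain_per_cycle, len(fifo))):
                fifo.popleft()                    # constant-rate output link
        return peak

    print(simulate_fifo([3, 0, 0, 2, 0, 1, 0, 0]))   # 3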
4. Buffer size

The trade-off between buffer size and compression ratio has been investigated; the corresponding results are shown in Table 1. There is a specific ‘‘ideal constellation’’ of buffer size and compression efficiency (i.e., a buffer size of 64 · 6). Data reductions by a factor of more than 1:2 are certainly possible and are the subject of ongoing research using this new approach.

Table 1
Dependency of compression results on buffer size

Buffer size (bit)   Random data   Grid structure   Shades structure
4 · 6               1:1.037       1:1.27           1:1.08
8 · 6               1:1.052       1:1.75           1:1.10
12 · 6              1:1.060       1:1.79           1:1.77
16 · 6              1:1.065       1:1.85           1:1.82
32 · 6              1:1.072       1:1.85           1:1.97
64 · 6              1:1.073       1:1.85           1:2.00
128 · 6             1:1.073       1:1.85           1:1.99
512 · 6              1:1.072       1:1.84           1:1.98
1024 · 6             1:1.072       1:1.84           1:1.98
5. Balance coding

Preparing the data for optical transmission adds another requirement, namely the so-called DC balance. For this purpose, a proprietary coding scheme is proposed that complements the compression implementation, guarantees DC balance and, moreover, provides error detection. The proposed balance coding scheme encodes 6-bit data into an 8-bit form so that the transmitted data contains equal numbers of ones and zeros. The surplus of possible code-words can be used to signal the compressed/uncompressed state, thereby changing the interpretation of a given codeword. With DC balance as a coding rule it is also possible to detect errors in the data stream through code violations, since every odd number of erroneous bits results in a code violation; the check can easily be implemented in the form of a look-up table.
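The principle can be sketched as follows. There are exactly C(8,4) = 70 eight-bit words with four ones and four zeros, so 64 of them can carry the 6-bit data and the remaining codewords can act as control codes, while any word of different weight indicates an error. The concrete codeword assignment in this Python sketch is arbitrary and chosen only for illustration; the paper does not specify the actual codebook.

    # Hedged sketch of a 6b/8b DC-balanced code.  Only the principle follows
    # the text (balanced codewords, spare codes for signalling, code-violation
    # based error detection); the mapping itself is an illustrative assumption.
    from itertools import combinations

    # All 8-bit words of Hamming weight 4: C(8,4) = 70 candidates.
    balanced = sorted(sum(1 << b for b in bits) for bits in combinations(range(8), 4))
    assert len(balanced) == 70

    ENCODE = {value: balanced[value] for value in range(64)}   # 64 data codes
    SPARE  = balanced[64:]                                     # 6 control codes
    DECODE = {code: value for value, code in ENCODE.items()}

    def decode(word):
        if bin(word).count("1") != 4:
            raise ValueError("code violation - transmission error detected")
        if word in SPARE:
            return ("control", SPARE.index(word))
        return ("data", DECODE[word])

    tx = ENCODE[0b101101]
    print(decode(tx))                 # ('data', 45)
    try:
        decode(tx ^ 0b1)              # one flipped bit changes the weight
    except ValueError as err:
        print(err)                    # code violation - transmission error detected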
6. Implementation on FPGA platform

After building a rough software reference model as a proof of concept for the algorithm, the encoder/decoder structure was fully implemented in hardware. To this end, the whole design was optimized for pipelining. The underlying hardware platform was a Xilinx Virtex-II Pro FPGA (XC2VP50). The arithmetic operations add up to 304 comparators and 48 adders/subtractors, with multiplication and division avoided entirely, and the provided memory was fixed to approximately 3 kbit, leading to a cost of approximately 18 k transistors for a full one-channel implementation. The implementation achieves a throughput of 750 Mbps with a coding delay of 1.568 µs at the beginning of the data transmission. Based on this result, an additional increase by a factor of two is expected to be achievable with an ASIC implementation. Further optimization and the accompanying technology progression will allow an expected throughput of about 2 Gbps and above.

7. Conclusion

A novel lossless compression approach and its hardware implementation were presented. Because the compressed data is generated on the fly using minimal memory and processing resources, encoding is faster than with conventional algorithms that use block-based search methods or methods known from image processing, which makes the concept well suited for high-speed applications. Although the idea of data compression for maskless lithography has already been addressed in the literature, previous efforts have failed on the required real-time capability or hardware complexity [3].
Particular attention was paid to the possibility of decoding the compressed data in real time in the vacuum environment, with its crucial restrictions on the allowable power dissipation, and to the chip space available in the vacuum area. The proposed coding scheme gives priority to real-time capability and thereby certainly forfeits some coding efficiency, which makes it attractive for transmission rather than storage applications. The percentage data reduction may not be the best achievable among all available techniques; however, the simplicity and adaptivity of the scheme allow a cost-optimized high-speed hardware implementation and applicability beyond image data.

References

[1] S.-H. Voss et al., High-speed data storage and processing for projection mask-less lithography systems, Microelectronic Engineering, MNE 2005 International Conference.
[2] C. Brandstaetter, PML2 – Technology Results, SEMATECH Industry Maskless Meeting talk, San Jose, CA, USA, 17–19 January 2005.
[3] V. Dai, A. Zakhor, Proc. SPIE 4688 (2002).