Microelectronics Reliability 54 (2014) 338–340

Research Note

Efficient implementation of error correction codes in hash tables

P. Reviriego (a), S. Pontarelli (b), J.A. Maestro (a), M. Ottavi (b)

(a) Escuela Politécnica Superior, Universidad Antonio de Nebrija, C. Pirineos 55, Madrid, Spain
(b) Department of Electronic Engineering, University of Rome "Tor Vergata", Rome, Italy

Article history: Received 6 June 2013; received in revised form 3 August 2013; accepted 8 August 2013; available online 3 September 2013.

Abstract

Hash tables are one of the most commonly used data structures in computing applications. They are used, for example, to organize a data set so that searches can be performed efficiently. A hash table is commonly held in memory, where its entries can suffer errors. To ensure that data stored in a memory is not corrupted when it suffers errors, Error Correction Codes (ECCs) are commonly used. In this research note, a scheme to efficiently implement ECCs for the entries stored in hash tables is presented. The main idea is to use an ECC as the hash function with which the table is constructed. This eliminates the need to store the parity bits for the entries in the memory, as they are implicit in the hash table construction, thus reducing the implementation cost.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Hash tables are commonly used to store a set of data such that searches on the set can be done efficiently [1]. There are many different types of hash tables, but all use one or more hash functions. A hash function maps a block of l bits to a smaller block of m bits. In an ideal hash function, the mapping is balanced, such that the same number of combinations of the l bits maps to each of the combinations of the m bits. It is also desirable that similar values of the input map to different values of the output. Several algorithms have been proposed to implement hash functions that meet these properties.

Once a hash function H is selected, when a new entry k_new is to be added to the table, the first step is to compute the index i_new = H(k_new). This value is then used to determine the position in which the new entry is stored in the table. This can be done in different ways. In its simplest form, known as separate chaining, a linked list is used for each value of the m bits. The linked list contains all the entries whose H(k) collide on the same position. When the hash function is ideal, searches are efficient, as we only need to search one of the M lists, which should be well balanced in length. The concept of a hash table with separate chaining is illustrated in Fig. 1, in which M = 2^m. In this example, a new key is added by computing its hash function, which gives a value of one (H(k_new) = 1), and the new entry is stored at the end of the first linked list.

Hash tables are commonly stored in memory and can consume a significant amount of space. The data is therefore exposed to the errors that affect the memory, such as radiation induced soft errors [2]. To avoid data corruption in memories, Error Correction Codes (ECCs) are commonly used [3].
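As a concrete illustration of the separate-chaining structure of Fig. 1, the following minimal Python sketch inserts and searches keys. The hash function H is a hypothetical placeholder, not one of the hash functions considered in this note.

```python
# Minimal sketch of a hash table with separate chaining, as in Fig. 1.
# The hash function H is a hypothetical stand-in for any balanced mapping.

M = 2 ** 16                       # number of lists, M = 2^m with m = 16
table = [[] for _ in range(M)]    # one (initially empty) list per index

def H(key: int) -> int:
    """Placeholder hash: map a key to an m-bit index."""
    return key % M                # any reasonably balanced mapping works here

def insert(key: int) -> None:
    table[H(key)].append(key)     # append to the list that the key hashes to

def search(key: int) -> bool:
    return key in table[H(key)]   # only one of the M lists has to be scanned

insert(0x1234ABCD)
assert search(0x1234ABCD)
```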

ECCs add additional bits to each memory word, which are used to detect and correct errors, and therefore increase the area and power consumption of the memory. Typically, the overhead due to the ECC is at least 12.5% for 64 bit data blocks [4,5].

One interesting observation is that ECCs can also be used as hash functions. This can be explained as follows: each of the parity check bits is the logical XOR of a different subset of the data bits. Therefore, the parity bits tend to be uniformly distributed, and similar data words produce different parity bits. In particular, when used as a hash function, an ECC with a given minimum Hamming distance d_min will not have collisions for blocks that differ in d_min − 1 or fewer bits. The use of ECCs as hash functions has been explored, for example, for BCH codes in [6] to provide flexible hash functions that can be used for different block sizes. That work, however, did not consider using the ECC for error detection or correction in hash tables.

In this research note, the use of ECCs as hash functions is approached from a different perspective. The goal in our case is to reduce the cost of implementing error correction for the entries stored in the hash table. This is achieved by embedding the parity bits into the position of the list in which a key is stored. This eliminates the need to physically store the parity bits in the memory, thus reducing the implementation cost. The present research note introduces the idea and shows its potential using a case study. A detailed study of the performance of commonly used ECCs as hash functions is beyond the scope of this note.
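To make this observation concrete, the sketch below builds a toy hash from parity bits, each computed as the XOR of a subset of the data bits. The check-bit masks are illustrative (Hamming-style position masks plus an overall parity bit); they are not the check matrix of any code discussed in this note.

```python
# Toy example of an ECC-style hash: each hash bit is the XOR (parity) of a
# subset of the data bits, selected by a mask. Illustrative masks only.

MASKS = [0xAAAA, 0xCCCC, 0xF0F0, 0xFF00, 0xFFFF]  # one mask per parity bit

def parity(x: int) -> int:
    """XOR of all bits of x."""
    return bin(x).count("1") & 1

def ecc_hash(data: int) -> int:
    """Concatenate the parity bits computed over a 16-bit data word."""
    h = 0
    for mask in MASKS:
        h = (h << 1) | parity(data & mask)
    return h

# Because the last mask covers every bit, any single-bit change flips at
# least one parity bit, so words at Hamming distance 1 never collide.
print(ecc_hash(0x1234), ecc_hash(0x1235))  # the two hashes differ
```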

2. Proposed scheme

A linear block error correction code takes an input block of k bits and maps it to a block of n bits, where n is larger than k. In many cases, systematic codes are used, such that the first k bits of the output block are the same as the input bits and the remaining n − k bits are the added bits, also known as parity check bits [7].


Fig. 1. Example of adding an entry in a hash table with separate chaining.

Both the data and the parity check bits are stored in the memory, thus increasing the area and power relative to an unprotected memory. To detect errors, the n − k parity checks are recomputed from the stored k data bits and compared with the n − k stored parity checks. When there are differences, errors have occurred and, based on the specific pattern of the differences, the errors may be corrected. The number of correctable bit errors depends on the minimum Hamming distance d_min of the code and on the decoding algorithm used. ECCs used to protect memories typically have a minimum distance of four and are known as Single Error Correction Double Error Detection (SEC–DED) codes [8].

The mapping of the k input bits to the n − k parity bits can be seen as a hash function. In fact, as mentioned in the introduction, since the parity check bits are the XOR of different subsets of the data bits, they tend to be uniformly distributed. In addition, for two data blocks to collide, they have to differ in at least d_min bits. This ensures that similar blocks map to different hash values.

Let us consider a hash table with separate chaining in which an ECC is used as the hash function. An interesting observation is that, in this case, the ECC bits for each entry are implicit in the index of the list in which the entry is stored. Therefore, error detection and correction can be achieved by recomputing the parity check bits of the stored entry and comparing them with the index of the list in which the entry is stored. This is illustrated in Fig. 2, where an entry in the first list is checked for errors. The proposed scheme removes the need to store the parity bits for each entry, thus reducing the cost. Another, more subtle, advantage is that errors can now affect the data bits but not the ECC parity check bits. This could eventually be used to optimize the ECC, increasing the error detection capabilities, as shown in [9], or reducing the decoding delay, as shown in [10].

Fig. 2. Illustration of error detection in the proposed scheme: the parity bits HECC(k11) of a stored entry are recomputed and compared with the index of its list; a match means no error is detected.
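The following hypothetical Python model captures the essence of the proposed scheme: entries are placed in the list selected by their parity bits, so a later scrub only needs to recompute those bits and compare them with the list index. It reuses the illustrative 5-bit parity hash sketched in the introduction; it is not the hardware implementation.

```python
# Hypothetical model of the proposed scheme (cf. Fig. 2): the parity bits
# of an entry are not stored; they are implicit in the index of its list.

def parity(x: int) -> int:
    return bin(x).count("1") & 1

def ecc_hash(data: int) -> int:
    """Illustrative 5-bit parity hash from the earlier sketch."""
    h = 0
    for mask in (0xAAAA, 0xCCCC, 0xF0F0, 0xFF00, 0xFFFF):
        h = (h << 1) | parity(data & mask)
    return h

table = [[] for _ in range(2 ** 5)]     # 5 parity bits -> 32 lists

def insert(key: int) -> None:
    table[ecc_hash(key)].append(key)    # the list index *is* the parity bits

def scrub() -> list:
    """Report every entry whose recomputed parity bits disagree with the
    index of the list holding it, i.e. every detected error."""
    return [(i, p) for i, bucket in enumerate(table)
            for p, entry in enumerate(bucket)
            if ecc_hash(entry) != i]

insert(0x1234)
table[ecc_hash(0x1234)][0] ^= 0x0004    # corrupt one stored data bit
print(scrub())                          # the corrupted entry is reported
```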

3. Evaluation

A case study will be used to show the benefits of the proposed scheme. The purpose is to illustrate that, in a real application and using two common ECCs, the scheme is effective both in terms of the performance of the hash table and in terms of the correction of errors. Better results for the hash table performance could possibly be obtained by optimizing the ECC or by applying some bit reordering prior to the use of the ECC.

The application considered is the network monitoring of the IP flows traversing a high speed link. For example, for each pair of source and destination IP addresses that exchange packets over the link, we want to count the number of packets and bytes. For this, a data structure that allows a fast search using the 64 bits of the two IP addresses as the key is needed. This can be implemented with a hash table. In particular, a hash table with m = 16 bits, such that there are M = 64 K lists in which the entries can be placed, is considered.

For this example, a Double Error Correction–Triple Error Detection (DEC–TED) BCH code with a minimum distance d_min = 6 is used as the hash function. This code requires 15 parity bits for 64 data bits. Therefore, one additional bit needs to be computed to obtain the 16 bit hash index. This is done by simply selecting the least significant bit of the entry. To evaluate the goodness of the hash function, the length of the longest list among all M lists is used. This is commonly known as the Longest Length Probe (LLP) sequence and determines the worst case for accessing a value stored in the table. The results are also compared with those of the traditional H3 hash function [11].

To ensure that the distribution of IP addresses is realistic, packet traces from the 2012 CAIDA Anonymized Internet Traces Dataset have been used [12]. The selected trace contains 21 GBytes of traffic and is composed of 28 million packets and 1.4 million IP flows. The simulation procedure is as follows: a number of entries taken randomly from the trace are added to the hash table and the LLP is measured. Different load factors (i.e., the ratio between the number of entries and the table size) are used to evaluate the LLP relative to the number of entries stored. Finally, the experiment is repeated three hundred times for each configuration and the average value is reported.

The results are summarized in Table 1. It can be observed that the traditional H3 hashing and the proposed BCH hash give similar results in all cases. The results for BCH show only a small degradation of around 5%. This shows that using a BCH code as a hash function does not have a significant impact on the effectiveness of the hash table. As mentioned before, the ECC could be optimized to improve its performance as a hash function. The cost of implementing the DEC–TED BCH code is 15 bits per entry in the hash table. Therefore, the saving for the case study considered, which has 64 K entries, is 960 Kbit of memory. This clearly illustrates the benefits of the proposed technique.

To complete the case study, a second ECC was considered, in particular a SEC–DED code. For 64 data bits, a SEC–DED code requires 8 parity bits. The remaining 8 bits of the hash index were obtained using the H3 hash function. This second example therefore shows how traditional hashing and ECC can also be combined. The results for the LLP are presented in Table 1 and are similar to those of the BCH code and the H3 hash function. The memory savings in this case are 512 Kbit, as only 8 bits per entry are needed to implement the SEC–DED code.

Finally, to put the benefits of the proposed scheme in perspective, Table 2 summarizes the memory savings for different word sizes and for different ECCs when M is 64 K.
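A minimal sketch of the LLP measurement is given below. Synthetic 64-bit random keys stand in for the CAIDA addresses (which are not reproduced here), and a placeholder hash is used where the evaluation plugs in H3, the 15 BCH parity bits plus the entry's least significant bit, or the SEC–DED/H3 combination; the real-trace results are those of Table 1.

```python
# Sketch of the Longest Length Probe (LLP) measurement, with synthetic keys.

import random

M = 2 ** 16  # 64 K lists, as in the case study

def llp(hash_fn, keys) -> int:
    """Length of the longest of the M lists after inserting all keys."""
    lengths = [0] * M
    for k in keys:
        lengths[hash_fn(k)] += 1
    return max(lengths)

def toy_hash(key: int) -> int:
    # Placeholder; the paper plugs in H3, the BCH parity bits plus the
    # entry's least significant bit, or 8 SEC-DED parity bits plus 8 H3 bits.
    return key % M

random.seed(0)
for load in (0.5, 0.6, 0.7, 0.8, 0.9):
    keys = [random.getrandbits(64) for _ in range(int(load * M))]
    print(load, llp(toy_hash, keys))
```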

Table 1. Longest length probe (LLP) sequence averaged over 300 experiments.

Load factor    H3 hash    BCH hash    SEC–DED/H3 hash
0.5            5.7        5.9         6.0
0.6            6.1        6.3         6.2
0.7            6.5        6.8         6.3
0.8            6.8        7.2         6.8
0.9            7.2        7.5         7.4


Table 2. Memory savings in Kbits when using the proposed scheme for different word sizes when M = 64 K.

Word size    SEC–DED    DEC–TED
16           384        704
32           448        832
64           512        960
128          576        1088
256          640        1216
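The rows of Table 2 follow directly from the number of parity bits each code adds per word: the saving is simply M × (n − k) bits. The short sketch below reproduces the table using the parity-bit counts implied by it (e.g., 8 SEC–DED bits and 15 DEC–TED bits for 64 data bits, as in the case study).

```python
# Savings = M x (parity bits per word), reported in Kbit as in Table 2.

M = 64 * 1024                                        # 64 K entries

SEC_DED = {16: 6, 32: 7, 64: 8, 128: 9, 256: 10}     # parity bits per word
DEC_TED = {16: 11, 32: 13, 64: 15, 128: 17, 256: 19}

print("Word size  SEC-DED  DEC-TED")
for w in (16, 32, 64, 128, 256):
    print(f"{w:9d}  {M * SEC_DED[w] // 1024:7d}  {M * DEC_TED[w] // 1024:7d}")
```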

In particular, SEC–DED and DEC–TED codes are considered, as they are commonly used to protect memories. The savings for other values of M are proportional and can be obtained by dividing the values in the table by 64 K and multiplying by the new value of M. It can be seen that the savings are significant in all cases and larger for the DEC–TED codes.

In many cases, the memory devices used to store the hash table may already provide some protection against errors, such as a Single Parity Check (SPC) or a Single Error Correction (SEC) code [3]. In those cases, the proposed scheme can be used to provide additional protection. For a memory protected with SPC, error correction can be provided for the entries. For a SEC code, additional error correction capabilities, such as double error correction, can be implemented. Therefore, the proposed scheme can also be used to enhance the protection already implemented in the memory devices.

4. Conclusions

In this research note, it has been shown that it is possible to efficiently implement error detection and correction in hash tables by using an ECC as the hash function. With the proposed scheme, there is no need to store the ECC parity check bits in the memory, thus reducing the area and power consumption. A case study has been presented to illustrate the benefits in a practical scenario.

The proposed scheme can be optimized by selecting or designing ECCs that have good properties as hash functions. Another possibility could be to randomize the data bits prior to applying the ECC to ensure good performance of the ECC hashing. The randomization would be undone in the read process after the ECC decoding. These ideas, as well as the application of the proposed scheme to other case studies, are left for future work.

References

[1] Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to algorithms. 3rd ed. MIT Press and McGraw-Hill; 2009.
[2] Nicolaidis M. Design for soft error mitigation. IEEE Trans Dev Mater Reliab 2005;5(3):405–18.
[3] Chen CL, Hsiao MY. Error-correcting codes for semiconductor memory applications: a state-of-the-art review. IBM J Res Dev 1984;28(2):124–34.
[4] Gherman V, Evain S, Auzanneau F, Bonhomme Y. Programmable extended SEC–DED codes for memory errors. In: IEEE VLSI Test Symposium (VTS); 2011. p. 140–5.
[5] Neale A, Sachdev M. A new SEC–DED error correction code subclass for adjacent MBU tolerance in embedded memory. IEEE Trans Dev Mater Reliab 2013;13(1):223–30.
[6] Grossman JP, Jakab L. Using the BCH construction to generate robust linear hash functions. In: IEEE Information Theory Workshop; 2004.
[7] Lin S, Costello DJ. Error control coding. 2nd ed. Englewood Cliffs, NJ: Prentice-Hall; 2004.
[8] Richter M, Oberlaender K, Goessel M. New linear SEC–DED codes with reduced triple bit error miscorrection probability. In: IEEE On-Line Testing Symposium; 2008. p. 37–42.
[9] Reviriego P, Liu S-F, Lee S, Maestro JA. Efficient error detection in double error correction orthogonal Latin squares codes. In: Second Workshop on Manufacturable and Dependable Multicore Architectures at Nanoscale (MEDIAN'13); May 30–31, 2013.
[10] Reviriego P, Pontarelli S, Maestro JA, Ottavi M. A method to construct low delay single error correction (SEC) codes for protecting data bits only. IEEE Trans Comput Aided Des Integr Circ Syst 2013;32(3):479–83.
[11] Ramakrishna MV, Fu E, Bahcekapili E. Efficient hardware hashing functions for high performance computers. IEEE Trans Comput 1997;46(12):1378–81.
[12] CAIDA Anonymized Internet Traces 2012 Dataset.