Journal of Parallel and Distributed Computing 54, 133-161 (1998), Article No. PC981461
Average Case Analysis of Searching in Associative Processing

Panagiotis E. Nastou,* Computer Technology Institute (CTI), 70 G. Papandreou Street, GR-15773 Zografos, Athens, Greece

Dimitrios N. Serpanos, Institute of Computer Science, Foundation for Research and Technology-Hellas, P.O. Box 1385, GR-71110 Heraklion, Crete, Greece

and Dimitrios G. Maritsas,* Computer Technology Institute (CTI), 70 G. Papandreou Street, GR-15773 Zografos, Athens, Greece

Received December 4, 1995; revised May 11, 1998; accepted May 11, 1998
We introduce an average case analysis of the search primitive operations (equality and thresholding) in associative memories. We provide a general framework for analysis, using as parameters the word space distribution and the CAM size parameters m (number of memory words) and n (memory word length). Using this framework, we calculate the probability that the whole CAM memory responds to a search primitive operation after comparing up to the k most significant bits (1 ≤ k ≤ n) in each word; furthermore, we provide a closed formula for the average value of k and for the probability that there exists at least one memory word that equals the centrally broadcast word. Additionally, we derive results for the cases of uniform and exponential distribution of word spaces. We prove that in both cases the average value of k depends strongly on lg m when n > lg m: in the case of the uniform distribution, the average value is practically independent of n, while in the exponential case it depends weakly on the difference between the sample space size 2^n and the CAM size m. Furthermore, in both cases, the average k is approximately n when n ≤ lg m. Verification of our theoretical results through massive simulations on a parallel machine is presented. One of the main results of this work, that the average value of k can be much smaller than n
* Also with the Department of Computer Engineering and Information Sciences, University of Patras, Patras, Greece.
Also with the Department of Computer Science, University of Crete, Heraklion, Crete, Greece.
0743-7315/98 $25.00 Copyright © 1998 by Academic Press. All rights of reproduction in any form reserved.
or even practically independent of n in some cases, has an important practical effect: associative memories can be designed with fast execution times of threshold primitives and low implementation complexity, leading to high performance associative memories that can scale up to sizes larger than previous designs at a low cost. © 1998 Academic Press, Inc.
1. INTRODUCTION

A Content Addressable Memory (CAM) is a memory system which can access its stored data by their contents rather than by their address in memory. One of its main characteristics is its capability to perform search operations, where memory word contents are compared with centrally broadcast words. The most typical search operation is the Equality Search, where the CAM indicates the memory words that match a broadcast word. The system can provide high performance due to the inherent parallelism in searching all CAM words (and possibly bits).

The general structure of a CAM is shown in Fig. 1. It is a two-dimensional array of identical processing elements (PEs), where the unit processing element is a one-bit cell that performs the standard read/write memory operations (similarly to random access memory) and also contains sufficient logic to compare its bit content with the corresponding bit content of a global register called the Comparand Register (register CR in Fig. 1). The comparand register contains the data to be written in memory (or accepts a data word read), or the data for a search in the CAM. The results of a search operation, e.g., the identification of the CAM words matching CR in an Equality Search, are written in the Tag Register (register TR in Fig. 1). In general, it is not necessary that all bits in a word, or that all words, participate in a CAM operation: a Mask Register (register MR in Fig. 1) identifies the bits of the comparand register and the bit-columns of the CAM that are involved
FIG. 1. The conceptual model of a CAM.
in an operation, while the subset of CAM words participating in the operation is specified by the Word-Select Register (register WSR in Fig. 1). So, during an equality search operation, for example, if WSR(i) = 1, then the ith CAM word compares its contents with the comparand register contents and, if the unmasked bits match, it sets the TR(i) bit.

In addition to the Equality Search operation described above, there exist many other search operations that can be performed over the whole CAM. These operations identify memory words with values that satisfy certain conditions: Threshold Searches identify words with values greater or less than CR; Adjacency Searches identify words with values closest to CR, either from above or below; Extremum Searches identify the words with the maximum or minimum value in the CAM; Double Limit Searches identify words with values between or outside two given limits X and Y. Equality and Threshold Searches can be considered primitive search operations, because all other search operations can be synthesized from (or analyzed into) them, and because they are directly implemented in each CAM memory word. Many sets of CAM operations have been proposed and implemented considering Equality and Threshold Search operations as primitives by Davis and Lee, Ramamoorthy et al., and Scherson and Ilgen [2, 11, 12], while a few implementations are based on Extremum Searches [11].

A Content Addressable Parallel Processor (CAPP) is built around a CAM. In addition to the CAM, it contains a centralized control unit (CU), which specifies the sequence of primitive operations that implements an application and synchronizes their execution in the CAM. Clearly, there are two key parameters for a high-performance CAPP: the size of the associative memory and the performance of the primitive search operations (since CAPP applications can be viewed as a sequence of primitives, i.e., equality and threshold searches, as well as read/write memory operations).
Concerning the memory size, it is well known that associative memories suffer from low density due to the high complexity of the memory cells. Regarding high speed primitives, design enhancements in the memory cells improve performance by taking advantage of the inherent parallelism within CAM word operations. Such enhancements, which aim to achieve higher performance (faster execution) of primitive searches, reveal a trade-off between memory size and execution time of primitives. The trade-off becomes clear with an analysis of the execution of these primitive searches: at the beginning of an Equality or Threshold primitive, each memory word compares in parallel its bits with the corresponding bits of the comparand register. After this step, each word combines the results of the bit comparisons and calculates its response (equal, greater, or less than CR). With the simplest implementation, this combining process is performed bit serially with a time complexity O(n), where n is the word bit length. A specific serial model was used by Ramamoorthy et al. [11]. To improve performance, Davis and Lee explored parallelism in the combining process and proposed the combining word tree topology achieving the significantly improved time complexity O(lg n) [2]. However, the improvement in performance is traded off for significant increase in the implementation complexity introduced by the word tree. So, speed improvement requires a higher complexity CAM cell resulting in even lower memory densities than before.
The disadvantage of the high complexity memory cell required by the word tree originates from the fact that O(lg n) execution time is provided even for the worst case scenario of a search operation, whereby all n bit comparison results need to be combined.

In this paper, we perform an average case analysis of primitive search operations (equality and thresholding), using as parameters the word space distribution and the size variables m and n of the CAM. We show that, for realistic CAM configurations and for the most representative physical distributions, on the average it is not necessary for all n bits of the memory words to be combined (or compared) in order to obtain a response from all CAM memory words; importantly, only a small number k (k ≤ n) of most significant bits needs to be combined on the average. We provide a closed formula for the mean value μ_{m,n} of k and show that, when n > lg m, μ_{m,n} is strongly affected by lg m. All our theoretical results have
been verified through extensive simulations of several CAM configurations on a massively parallel machine, a Parsytec GCel 3512. Our work is a complete analysis, providing results for variable m, n, and word space distribution, and in that sense it generalizes the analysis of Davarakis and Maritsas [1], which produces results for a CAM memory size of one word without considering the distribution of the word space. The analysis in [1] is based on p_c, i.e., the probability of mismatch between peer bits of the comparand register and a single memory word (memory model with one word, m = 1), and it is assumed that p_c is constant for a memory configuration (there is no indication if and how p_c is related to the distribution of the data memory space). Furthermore, the analysis given in [1] for a CAM with memory size m > 1 depends on the memory contents.

The paper is organized as follows. Section 2 describes the CAM model used in our analysis, introduces the notation, and gives some definitions that are the basis of our analysis. Section 3 presents the analysis for the case of a variable word space distribution, presenting a formula for the calculation of μ_{m,n} and the probabilities of equality and inequality. Sections 4 and 5 provide the analysis for the uniform and exponential word space distributions, respectively. Section 6 describes our simulator on the Parsytec GCel 3512 and gives experimental results that verify our analytical results. Preliminary results, mainly for the general case analysis presented in this paper, appear in [9].
2. MODELS, DEFINITIONS, AND NOTATION

In this section we describe the CAM model used in the analysis. We also introduce the notation used and provide some definitions that facilitate the presentation of the analysis.

2.1. Associative Memory Model

We consider a Content Addressable Memory (CAM), such as the one shown in Fig. 1, that consists of a comparand register (CR) and m memory words, each with bit length n. We focus our analysis on the primitive operations for Equality and Threshold Search. In an Equality Search, the CAM responds by identifying the words that match (are equal to) the contents of CR, while in a Threshold Search the CAM responds by identifying the words that contain values greater (or less, depending on the primitive) than CR. There are two execution models for these primitive operations: the Word Serial Model, where all words in the CAM memory are compared to CR sequentially, and the Word Parallel Model, where the contents of CR are compared with the contents of each CAM word in parallel. These two models provide significantly different performance characteristics: if d_i is the delay of word i to respond, then the total response time in the Word Serial Model is d_s = ∑_{i=1}^{m} d_i, while in the Word Parallel Model it is d_p = max_{1≤i≤m}(d_i). Although our analysis applies to both execution models, in the remainder of the paper we assume the Word Parallel Model.
When the CAM executes a primitive operation following the Word Parallel Model, the execution can be considered as a sequence of two steps, as described in Section 1. As a first step, all memory words compare their bits with the corresponding bits of CR in a bit parallel fashion. The second step combines in each word (all words in parallel) the results of the n pairwise bit comparisons, in order to calculate the response of the word. Since in our analysis we calculate the number k of the n pairwise bit comparisons, starting from the most significant bit, that need to be combined in order to obtain a word response (or a response from all memory words), the method of the combination process (the serial model used by Ramamoorthy et al. [11], or parallel combination using a tree, as Davis and Lee did [2]) is irrelevant to our analysis. Although our results are independent of the method, for simplicity of presentation we assume the sequential combining process in the remainder of this paper. So, one can view the combination process as a sequence of stages: at the first stage, each word checks the result of the comparison of the most significant bit. The words that differ from CR at this bit respond, while the remaining ones continue at the second stage, where the result of the comparison of the second most significant bit is checked, and so on. The process continues until all CAM words respond. Figure 2 shows parallel pseudocode describing the execution of the Threshold Search primitive GREATER using the CAM model which we assume in our presentation; i.e., it follows the Word Parallel Model and performs bit-parallel word comparisons and sequential combining of the results when executing a primitive. As the pseudocode indicates, each CAM cell M[j][i] is considered to calculate two results (variables) simultaneously: E_i and G_i, indicating equality and greater, respectively.
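The two-step execution just described (bit-parallel comparison, then MSB-first sequential combining) can be sketched as a small behavioral model in Python; the function name and return convention below are illustrative and are not taken from the paper's pseudocode:

```python
# Behavioral model of the Word Parallel execution with MSB-first sequential
# combining. A word responds at the first stage where its bit differs from
# the corresponding CR bit; a word equal to CR forces all n stages.

def greater_search(cr, words, n):
    """Return (tags, k): tags[j] is True iff word j is greater than CR;
    k is the number of combining stages until the whole CAM responds."""
    tags = [False] * len(words)
    finished = [False] * len(words)
    k = 0
    for bit in range(n - 1, -1, -1):          # stage 1 checks bit n-1 (MSB)
        k += 1
        cr_bit = (cr >> bit) & 1
        for j, w in enumerate(words):
            if finished[j]:
                continue
            w_bit = (w >> bit) & 1
            if w_bit != cr_bit:               # first differing bit decides
                tags[j] = w_bit > cr_bit
                finished[j] = True
        if all(finished):
            break
    # a word equal to CR never responds early, so it forces k = n stages
    return tags, k
```

For example, with CR = 5 and words (7, 5, 2) for n = 4, word 2 responds at stage 2, word 7 at stage 3, and the word equal to CR forces all n = 4 stages to be combined.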
The serial combination process of our model is illustrated through the structure of the pseudocode in the block parbegin ... parend, where the variable t_k defines in each word the most significant bit that is checked. The execution of the threshold operation is completed when all CAM words set their flags finish_j to 1.

2.2. Definitions and Notation

Since the memory word length is n, if we denote S = {0, 1}, the word sample space for both CR and each memory word is the set F = S^n. Assuming an n-bit word w of the set F, the bits of its binary representation are numbered as w_{n−1} w_{n−2} ... w_1 w_0, where w_{n−1} is the most significant bit. As both CR and the memory words take values from F, an instance of the CAM can be represented by a tuple t = (t_0, t_1, ..., t_m), where the first coordinate (t_0) corresponds to the contents of CR and the remaining m coordinates correspond to the contents of the m memory words. So, tuple t is a member of the set F^{(m+1)} = F × ... × F ((m+1) times), and F^{(m+1)} is the sample space of the CAM contents, containing 2^{n(m+1)} elements. Two tuples t, s ∈ F^{(m+1)} are different if there exists at least one coordinate where they differ, i.e., if t = (t_0, t_1, ..., t_m) and s = (s_0, s_1, ..., s_m), then t ≠ s if ∃i: 0 ≤ i ≤ m and t_i ≠ s_i.

Given the above models and notation, we can describe more precisely the approach of our analysis: given an instance t of the CAM, we can calculate the k
FIG. 2. Pseudocode for the threshold primitive GREATER.
stages (k ≤ n) of the combination process that are required before the whole CAM responds to a Threshold Search. Taking into account the distribution of the word space F, we calculate the probability that the execution of a primitive will end after k stages of the combination process.

For our analysis, and for a more comprehensive presentation in the following sections, we define three different types of sets. First, we define A_{k,φ}, with 1 ≤ k ≤ n and 0 ≤ φ ≤ 2^k − 1, as the set of all words that have the value φ in the k most significant bits. Clearly, for a fixed k and all values φ, the sets A_{k,φ} form a partition of the sample space F. The set A_{k,φ} can be expressed in terms of a random variable X as follows: consider the random variable X over the sample space F such that (i) X(x_j) = d_j, with x_j ∈ F and d_j ∈ N the decimal value of x_j, and (ii) with density f(x) = ∑_{j=0}^{2^n−1} p_j δ(x − d_j), where p_j = P[X = d_j]. Then A_{k,φ} can be expressed as

    A_{k,φ} = {φ · 2^{n−k} ≤ X ≤ (φ+1) · 2^{n−k} − 1}.

A second set that we are interested in during the combination process is the set of all CAM instances where the k most significant bits of CR contain the same data
as the k most significant bits of some memory words. So, we define B_{ki} as the set of all such tuples, where the first coordinate (CR) and exactly (i−1) of the remaining m coordinates (memory words) have the same k most significant bits (1 ≤ i ≤ m+1). It is easy to prove that B_{ki} ⊂ F^{(m+1)} (where 1 ≤ k ≤ n and 1 ≤ i ≤ m+1) and thus P(B_{ki}) < 1: given a tuple t_1 ∈ B_{ki}, one can always construct a tuple t_2 ∈ F^{(m+1)} which has CR and exactly i or (i−2) of the remaining m coordinates with the same k most significant bits, i.e., t_2 ∉ B_{ki}.

The third set we define is C_k, the set of all tuples where the first coordinate (CR) differs from all the remaining m coordinates (memory words) in at least one of the k most significant bits, while there is at least one of the m coordinates with the first (k−1) most significant bits matching the corresponding bits of CR. This implies that k is the minimum number of most significant bits that has to be combined so that the whole CAM responds to a primitive search. The sets C_1, C_2, ..., C_n are mutually exclusive, as proven in the following lemma.

Lemma 2.1. For all 1 ≤ k, l ≤ n such that k ≠ l, the sets C_k and C_l are mutually exclusive.

Proof. Without loss of generality, assume k < l. If a tuple t belongs to C_k, then every memory word of t differs from CR in at least one of the k most significant bits. If t also belonged to C_l, then at least one memory word of t would match CR in the first (l−1) ≥ k most significant bits, a contradiction. ∎
Lemma 2.2. The sets B_{n2}, ..., B_{n(m+1)} form a partition of the equality set E.
Proof. Based on the definition of the set B_{ki}, the set B_{ni} contains all those CAM instances where exactly (i−1) of the m CAM words match CR. Thus, E = B_{n2} ∪ B_{n3} ∪ ... ∪ B_{n(m+1)}. Consider two sets B_{ni} and B_{nj}, where 2 ≤ i, j ≤ m+1 and i ≠ j. These two sets are not disjoint only if there exist at least two tuples s ∈ B_{ni} and t ∈ B_{nj} such that s = t. Given the definition of the sets B_{nl} (2 ≤ l ≤ m+1), it is impossible that s = t, since each tuple s ∈ B_{ni} differs from each tuple t ∈ B_{nj} in at least |j − i| ≠ 0 coordinates. Consequently, the sets B_{n2}, ..., B_{n(m+1)} are mutually exclusive. ∎

Based on the above definition of B_{ki}, for a fixed k, the set B_{k1} consists of all tuples t where the first coordinate (CR) differs from all remaining coordinates in at least one of the k most significant bits. According to the definition of C_k and
Lemma 2.1, it is deduced that the sets C_1, ..., C_k form a partition of B_{k1}; thus, it holds that B_{k1} = C_1 ∪ C_2 ∪ ... ∪ C_k. At the boundary case k = n, the set B_{n1} is B_{n1} = C_1 ∪ C_2 ∪ ... ∪ C_n. Consequently, a partition of the thresholding set T is obtained, since this set equals B_{n1}. So, we deduce that the CAM (CR and memory) sample space F^{(m+1)} is partitioned into the sets B_{n1}, B_{n2}, ..., B_{n(m+1)}, i.e., into the sets C_1, C_2, ..., C_n, B_{n2}, ..., B_{n(m+1)}; thus, the following condition holds:

    ∑_{1≤k≤n} P(C_k) + ∑_{2≤i≤m+1} P(B_{ni}) = 1    or    P(B_{n1}) + P(E) = 1.    (1)
3. AVERAGE CASE ANALYSIS OF THE THRESHOLD QUERY

We want to evaluate the probability that the whole CAM responds after combining k bits, i.e., there is at least one word that needs to combine k bits, while the remaining words respond after combining i bits, with i ≤ k. First, we calculate the probability that all memory words respond with no word matching the contents of CR after combining up to k bits per word; i.e., we calculate P(C_k) for all k. Then we calculate the probability P(E) = P(∪_{2≤l≤m+1} B_{nl}) that there is at least one memory word equal to CR. Based on these two probabilities, we derive a closed formula for the mean value μ_{m,n} of the number k of most significant bits that suffice to be compared and combined so that the whole CAM responds to a primitive operation.

Given the definitions and lemmas of Section 2.2, the probabilities P(C_k) (for all k: 1 ≤ k ≤ n) and P(E) are calculated as follows:

• P(C_1) = P(B_{11}), while P(C_k) = P(B_{k1}) − P(B_{(k−1)1}), since B_{k1} = B_{(k−1)1} ∪ C_k and B_{(k−1)1} ∩ C_k = ∅ for k ≠ 1;
• P(E) = 1 − P(B_{n1}), based on Eq. (1).

The probabilities P(B_{k1}) for all k: 1 ≤ k ≤ n can be directly calculated from P(B_{ki}) as the special case i = 1. So, the probabilities P(E) and P(C_k) can be easily calculated if we know P(B_{ki}). To calculate P(B_{ki}), we first define B^φ_{ki} (0 ≤ φ ≤ 2^k − 1) as the set

    B^φ_{ki} = {t | t ∈ B_{ki} ∧ decimal value(CR_{n−1} ... CR_{n−k}) = φ}.

B^φ_{ki} is the set of CAM instances where exactly (i−1) words match CR in the k most significant bits, and the decimal value of those k bits is φ. The sets B^j_{ki} and B^l_{ki} are mutually exclusive for j ≠ l, since CR has the value j in the k most significant bits in the first set, while it has the value l in the second. Furthermore, since B_{ki} = B^0_{ki} ∪ ... ∪ B^{2^k−1}_{ki}, the probability of the event B_{ki} is

    P(B_{ki}) = ∑_{φ=0}^{2^k−1} P(B^φ_{ki}).    (2)
To evaluate the probability P(B^φ_{ki}), it is important to identify how the coordinates of a tuple take values from the sample space F. In our analysis, we construct tuples by drawing (m+1) words from F = S^n independently. So, at first a word is drawn from the sample space F and is assigned to the first coordinate; then a second word is drawn and assigned to the second coordinate, and the drawing is repeated for each of the remaining (m−1) coordinates. The probability that the k most significant bits of the first coordinate (CR) hold the decimal value φ is equal to the probability of the event A_{k,φ}, as defined in Subsection 2.2. This is also true for the remaining (i−1) words of the tuples in B^φ_{ki}, which match CR in the k most significant bits. Since these (i−1) words can be located anywhere inside the tuple, there are C(m, i−1) (the binomial coefficient "m choose i−1") tuples (instances) which have the value φ in the k most significant bits of the i matching coordinates. Furthermore, any of the remaining (m−i+1) coordinates can store any value x, x ∈ {0, 1, ..., 2^n − 1}, except the values {φ · 2^{n−k}, ..., (φ+1) · 2^{n−k} − 1} (if any one of the (m−i+1) coordinates took any of these excluded values, then that coordinate would have the value φ in its k most significant bits, resulting in a memory with i words, and not (i−1), matching CR in the k leftmost bits; a contradiction). So, the probability P(B^φ_{ki}) can be calculated as
    P(B^φ_{ki}) = C(m, i−1) (P(A_{k,φ}))^i (1 − P(A_{k,φ}))^{m−i+1}
                = C(m, i−1) c_k(φ)^i (1 − c_k(φ))^{m−i+1},    (3)

where c_k(φ) is the probability of the event A_{k,φ}. Probability c_k(φ) can be easily calculated as

    c_k(φ) = P(A_{k,φ}) = P[φ · 2^{n−k} ≤ X ≤ (φ+1) · 2^{n−k} − 1]
           = F((φ+1) · 2^{n−k} − 1) − F(φ · 2^{n−k} − 1),    (4)
where F(x) = ∑_{j=0}^{x} f(j) is the distribution function of the random variable X. If we substitute the probability P(B^φ_{ki}) in Eq. (2) with the result of Eq. (3), we obtain for P(B_{ki})

    P(B_{ki}) = C(m, i−1) ∑_{φ=0}^{2^k−1} c_k(φ)^i (1 − c_k(φ))^{m−i+1}.    (5)

This shows that, in general, P(B_{ki}) is a function of m and of the distribution of the word space, for a given value of k. The probability P(B_{k1}) is given by Eq. (6), which follows from Eq. (5) for i = 1:

    P(B_{k1}) = ∑_{φ=0}^{2^k−1} c_k(φ) (1 − c_k(φ))^m.    (6)
Given the probabilities P(E), P(C_k), and P(B_{k1}) calculated above, the mean value μ_{m,n} of the number k of most significant bits that define the completion of the execution of the threshold primitive is derived as follows:

    μ_{m,n} = ∑_{1≤k≤n} k P(C_k) + ∑_{2≤i≤m+1} n P(B_{ni})
            = n P(E) + ∑_{1≤k≤n} k P(B_{k1}) − ∑_{2≤k≤n} k P(B_{(k−1)1})
            = n − n P(B_{n1}) + ∑_{1≤k≤n} k P(B_{k1}) − ∑_{1≤l≤n−1} (l+1) P(B_{l1})
            = n + ∑_{1≤k≤n−1} k P(B_{k1}) − ∑_{1≤l≤n−1} (l+1) P(B_{l1})
            = n − ∑_{1≤k≤n−1} P(B_{k1}).    (7)
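As a numerical sketch of this framework, Eqs. (6) and (7) can be evaluated directly once the probabilities c_k(φ) = P(A_{k,φ}) of Eq. (4) are supplied; the function names below are illustrative, not from the paper:

```python
def p_bk1(ck, m, k):
    # Eq. (6): P(B_k1) = sum over phi of c_k(phi) * (1 - c_k(phi))^m,
    # where ck(k, phi) returns P(A_{k,phi}) for the chosen distribution
    return sum(ck(k, phi) * (1 - ck(k, phi)) ** m for phi in range(2 ** k))

def mu(n, m, ck):
    # Eq. (7): mu_{m,n} = n - sum_{k=1}^{n-1} P(B_k1)
    return n - sum(p_bk1(ck, m, k) for k in range(1, n))

# Uniform word space (Section 4): c_k(phi) = 2^{-k} for every phi
def uniform_ck(k, phi):
    return 2.0 ** (-k)
```

For m = 1 under the uniform distribution, this reproduces the boundary value μ_{1,n} = 2 − 2^{−n+1} derived at the end of Section 4.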
The above analysis provides a general framework for the execution of the primitive search operations of a CAM. The framework uses a set of probabilities P(B_{k1}), for all 1 ≤ k ≤ n, which are a function of the word space distribution, and calculates the probability of equality P(E) and the mean value μ_{m,n}.

4. UNIFORM DISTRIBUTION

Consider the uniform distribution for the previously defined random variable X over the sample space F. Then its density function is f(x) = ∑_{j=0}^{2^n−1} p_j δ(x − d_j), where p_j = P[X = d_j] = 1/2^n. It is known [10] that the entropy H(X) of the random variable X is maximized when the density function is uniform, and then H(X) = n. This means that the uncertainty of the occurrence of a specific word is maximized; from this point of view, the uniform distribution can be considered the worst case.

First, we extract the equation that gives the probability of the event B_{ki} for all 1 ≤ k ≤ n and 1 ≤ i ≤ m+1. Since the probability p_j is 1/2^n, it is easy to evaluate the distribution function F(x) for each x; thus, the probability c_k(φ) = 2^{−k} is derived from Eq. (4). Equation (8) gives the probability P(B_{ki}) for all 1 ≤ i ≤ m+1 and 1 ≤ k ≤ n, and is derived from Eq. (5) by substituting the evaluated probability c_k(φ):

    P(B_{ki}) = C(m, i−1) · 2^{−k(i−1)} · (1 − 2^{−k})^{m−i+1}.    (8)

The probability P(B_{k1}) for all 1 ≤ k ≤ n, needed both for the calculation of the mean value μ_{m,n} and for the probability of equality P(E), is derived from Eq. (8) for i = 1:

    P(B_{k1}) = (1 − 2^{−k})^m.    (9)
As shown in Subsection 2.2, the probability P(E) that there is a match between CR and at least one memory word can be extracted from Eq. (1). So, by setting k = n in Eq. (9), the probability of at least one match between memory and CR is

    P(E) = 1 − P(B_{n1}) = 1 − (1 − 2^{−n})^m.    (10)

Equation (10) verifies some intuitively expected results:

1. For a fixed value of m, i.e., fixed memory size, P(E) → 0 as the bit length of the words increases (n → ∞). This agrees with intuition, because as memory words become very large and the word sample space F grows, while the memory size remains constant, the probability that a word will be selected twice (once for CR and once for memory) decreases.

2. For a fixed value of n, i.e., fixed word length, P(E) → 1 as the memory size increases (m → ∞). This occurs because the probability that a word will match CR increases as the number of selections from the word sample space F (equal to the memory size m) increases, while the cardinality of F remains constant.

So, based on Eqs. (7) and (9), Eq. (11) is derived, giving the mean value μ_{m,n} for the uniform distribution:

    μ_{m,n} = n − ∑_{0≤l≤n−1} (1 − 2^{−l})^m = n − S_n.    (11)
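Equations (9)-(11) can be cross-checked by direct Monte Carlo simulation under the uniform distribution. The sketch below is illustrative (the function names are not from the paper), and the simulated mean only approximates Eq. (11):

```python
import random

def mu_uniform(n, m):
    # Eq. (11): mu_{m,n} = n - sum_{l=0}^{n-1} (1 - 2^-l)^m
    return n - sum((1 - 2.0 ** (-l)) ** m for l in range(n))

def response_bits(cr, words, n):
    # number of MSBs combined until the whole CAM responds: n if some word
    # equals CR, otherwise 1 + the longest common MSB prefix with CR
    best = 0
    for w in words:
        if w == cr:
            return n
        best = max(best, n - (w ^ cr).bit_length())   # common leading bits
    return best + 1

def estimate_mu(n, m, trials=20000, seed=1):
    # Monte Carlo estimate of mu_{m,n} under the uniform word distribution
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        cr = rng.getrandbits(n)
        total += response_bits(cr, [rng.getrandbits(n) for _ in range(m)], n)
    return total / trials
```

For n = 32 and m = 2048 (lg m = 11), mu_uniform gives a value close to lg m + 1.312, in line with Theorem 4.1 below.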
To obtain a closed formula for μ_{m,n}, we need to calculate S_n. This is done through the following lemma and theorems.

Lemma 4.1. S_n can be approximated as

    S_n ≅ ∑_{−lg m ≤ i < n−lg m} e^{−2^{−i}} + O(2^{−2n}).

Proof. We have

    S_n = ∑_{0≤l≤n−1} (1 − 2^{−l})^m.

We denote by a(l, m) the sequence of numbers (1 − 2^{−l})^m, for 0 ≤ l < n. Substituting l = lg m + i, the terms can be written as

    a(i, m) = (1 − 2^{−i}/m)^m.
In [6] Knuth et al. give an approximation of ln(1+z) for z → 0:

    ln(1+z) = z − z²/2 + z³/3 + O(z⁴).

This formula can be used for a(i, m) = (1+z)^m with z = −2^{−i}/m, keeping the asymptotic notation at the second power of z, since 2^{−i}/m approaches zero very fast as i → ∞ (since n → ∞):

    ln(a(i, m)) = m · ln(1 − 2^{−i}/m)
                = m · (−2^{−i}/m − 2^{−2i}/(2m²) + O(2^{−3i}/m³))
                = −2^{−i} − 2^{−2i}/(2m) + O(2^{−3i}/m²).

So, a(i, m) is

    a(i, m) = e^{−2^{−i} + O(2^{−2i}/(2m))} = e^{−2^{−i}} · e^{O(2^{−2i}/(2m))}.    (12)

Since Knuth [6] shows that e^{O(f(n))} = 1 + O(f(n)) if f(n) = O(1), Eq. (12) becomes

    a(i, m) = e^{−2^{−i}} · (1 + O(2^{−2i}/(2m))) = e^{−2^{−i}} + O(2^{−2i}/(2m)).

So, finally, S_n is evaluated:

    S_n = ∑_{0≤l≤n−1} (1 − 2^{−l})^m
        = ∑_{−lg m ≤ i < n−lg m} (e^{−2^{−i}} + O(2^{−2i}/(2m)))
        = ∑_{−lg m ≤ i < n−lg m} e^{−2^{−i}} + O(2^{−2n}). ∎    (13)
Table 1 shows some values of (1 − 2^{−l})^m and of its approximation e^{−2^{−i}} for various l and i (l and i are related as l = lg m + i) and for m = 2K. Clearly, the sequence a(i, m) is increasing in the range [−lg m, n − lg m). Furthermore, a(i, m) ≈ 0 for i ≤ −3, with an absolute error of O(2^{−2n}).
TABLE 1
Terms a(l, m) and Their Approximation for Various Values of l (m = 2K)

    l     (1 − 2^{−l})^m      i      e^{−2^{−i}}
    0     0                   −11    0
    ⋮     ⋮                   ⋮      ⋮
    6     9.83 × 10^{−15}     −5     1.26 × 10^{−14}
    7     1.05 × 10^{−7}      −4     1.12 × 10^{−7}
    8     3.35 × 10^{−4}      −3     3.35 × 10^{−4}
    9     0.0182              −2     0.0183
    10    0.1352              −1     0.1353
    11    0.3677              0      0.3678
    12    0.6064              1      0.6065
    13    0.7787              2      0.7788
    14    0.8824              3      0.8825
    15    0.9394              4      0.9394
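The entries of Table 1 can be reproduced directly (for m = 2048, so lg m = 11 and l = lg m + i); a quick sketch:

```python
import math

m = 2048                                     # m = 2K, so lg m = 11
for l in range(6, 16):
    i = l - 11                               # l = lg m + i
    exact = (1 - 2.0 ** (-l)) ** m           # a(l, m) = (1 - 2^-l)^m
    approx = math.exp(-2.0 ** (-i))          # e^{-2^{-i}}
    print(f"l={l:2d}  i={i:3d}  exact={exact:.4e}  approx={approx:.4e}")
```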
Theorem 4.1. If n > lg m, then

    μ_{m,n} = lg m + 1.312 − (m/2^{n−1}) · (1/3) · (1 − m/2^n) + O(2^{−2n}).

Proof.
Using the result of Lemma 4.1, S_n is calculated as follows:

    S_n = ∑_{−lg m ≤ i < n−lg m} e^{−2^{−i}} + O(2^{−2n})
        = ∑_{−lg m ≤ i ≤ 0} e^{−2^{−i}} + ∑_{1 ≤ i < n−lg m} e^{−2^{−i}} + O(2^{−2n})
        = B_n + C_n + O(2^{−2n}).

The calculation of B_n is straightforward. As mentioned above, in Table 1 the value of e^{−2^{−i}} is approximately zero for i ≤ −3. So,

    B_n = ∑_{−lg m ≤ i ≤ 0} e^{−2^{−i}} ≈ e^{−4} + e^{−2} + e^{−1} = 0.5214.
We calculate C_n using the approximation of e^z given by Knuth et al. [6]: e^z = 1 + z + z²/2! + z³/3! + z⁴/4! + O(z⁵), as z → 0. Since 2^{−i} → 0 as i → ∞ (since n → ∞):

    C_n = ∑_{1 ≤ i < n−lg m} e^{−2^{−i}}
        = ∑_{1 ≤ i < n−lg m} (1 − 2^{−i} + 2^{−2i−1} + O(|−2^{−3i}|))
        = n − lg m − 1 − ∑_{1 ≤ i < n−lg m} 2^{−i} + 2^{−1} ∑_{1 ≤ i < n−lg m} 2^{−2i} + O(∑_{1 ≤ i < n−lg m} |−2^{−3i}|)
        = n − lg m − 11/6 − (m/2^{n−1}) · (1/3) · (m/2^n − 1) + O(2^{−3n}).
So, if n > lg m,

    S_n = n − lg m − 11/6 + 0.5214 − (m/2^{n−1}) · (1/3) · (m/2^n − 1) + O(2^{−2n}).

Thus, from Eq. (11), μ_{m,n} becomes

    μ_{m,n} = lg m + 1.312 − (m/2^{n−1}) · (1/3) · (1 − m/2^n) + O(2^{−2n}). ∎
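The closed formula of Theorem 4.1 can be compared numerically against the exact sum of Eq. (11); in the quick sketch below (illustrative names), the two agree to within a few hundredths whenever n > lg m:

```python
import math

def mu_exact(n, m):
    # exact mean from Eq. (11)
    return n - sum((1 - 2.0 ** (-l)) ** m for l in range(n))

def mu_closed(n, m):
    # Theorem 4.1 closed formula (n > lg m), with the O(2^{-2n}) term dropped
    return math.log2(m) + 1.312 - (m / 2.0 ** (n - 1)) * (1 - m / 2.0 ** n) / 3

for n, m in [(24, 256), (32, 2048), (20, 1024)]:
    print(n, m, round(mu_exact(n, m), 4), round(mu_closed(n, m), 4))
```

The comparison also shows the logarithmic growth in m: doubling lg m by three (m = 256 → 2048) shifts the exact mean by almost exactly 3 bits.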
Thus it has been shown that μ_{m,n} changes only slightly when the word bit length n takes large values while the memory size m remains constant. Furthermore, μ_{m,n} changes logarithmically with the memory size m.

Theorem 4.2. If n ≤ lg m, then n − 0.1539 ≤ μ_{m,n} ≤ n.

Proof. Consider the sum A_n = ∑_{−lg m ≤ i < n−lg m} e^{−2^{−i}}. The mean value m · 2^{−k} >> 1 is the average number of words that have the same k most significant bits as the comparand register (CR). Expanding the condition of the De Moivre-Laplace theorem in our case, with p = 2^{−k} and q = 1 − 2^{−k}, we deduce

    m >> 1/(pq)  ⟺  m >> 2^{2k}/(2^k − 1)
                 ⟺  lg m >> lg(2^{2k}) − lg(2^k − 1) = 2k − lg(2^k − 1) ≈ k.    (14)

This implies that the condition holds for k << lg m.
FIG. 3. The probability P(B_{k1}) in a single-word threshold query (m = 1) for uniform distribution.
Clearly, if n ≤ lg m, then μ_{m,n} lies within 0.1539 of n. Summarizing both cases:

    μ_{m,n} = lg m + 1.312 − (m/2^{n−1}) · (1/3) · (1 − m/2^n) + O(2^{−2n}),   if n > lg m
    μ_{m,n} ≈ n  (μ_{m,n} ∈ [n − 0.1539, n]),                                   if n ≤ lg m.
An interesting boundary case is that of m = 1, i.e., the case of comparing two words of bit-length n. In this case, P(B_1^k) is the probability of mismatch between the two words, while P(E) is the probability of a match. These probabilities can be directly derived from Eqs. (10) and (9):

P(B_1^k) = 1 − 2^{−k}   (15)
P(E) = 2^{−n}.
Furthermore, the average number of most significant bits that need to be combined (and compared) in order to get a response is

μ_{1,n} = n − Σ_{0≤l≤n−1} (1 − 2^{−l})
        = n − n + Σ_{0≤l≤n−1} 2^{−l}
        = 2 − 2^{−n+1}.
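The closed form μ_{1,n} = 2 − 2^{−n+1} can also be verified exactly from the response-position distribution; the following check (our illustration) computes the expectation with exact rational arithmetic.

```python
from fractions import Fraction

def mu_1n(n):
    """Exact E[response position] for one word versus CR (the m = 1 case).

    The comparison responds at bit k when the first differing bit is k
    (probability 2^-k for k < n); at bit n it responds either for the last
    possible mismatch or for a full match, so P(K = n) = 2^-(n-1)."""
    e = sum(Fraction(k, 2**k) for k in range(1, n))
    e += Fraction(n, 2**(n - 1))
    return e

for n in (4, 8, 16):
    assert mu_1n(n) == 2 - Fraction(2, 2**n)  # closed form 2 - 2^{-n+1}
print(float(mu_1n(16)))  # 2 - 2**-15 = 1.999969482421875
```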
SEARCHING IN ASSOCIATIVE PROCESSING
149
As the result shows, μ_{1,n} approaches 2 for large word lengths n, and changes only slightly as the word length increases. Figure 3 shows the theoretical values of P(B_1^k) evaluated with Eq. (15) and the values (relative frequencies) obtained by executing a single-word comparison routine (using a uniform random generator) on a Sun workstation. Finally, the theoretical and the experimental values for μ_{1,n} are shown in Figs. 6 and 7.

5. EXPONENTIAL DISTRIBUTION

Consider the exponential distribution for a random variable X defined over the sample space F. Then, its density function is f(x) = Σ_{j=0}^{2^n−1} p_j · δ(x − d_j), where p_j = P[X = d_j] = p · e^{−p·d_j}. The basic Eq. (16), given below, shows that the size of the sample space |F| = 2^n defines the value of the parameter p < 1:

Σ_{j=0}^{2^n−1} p_j = p(1 − e^{−p·2^n})/(1 − e^{−p}) = 1.   (16)

Thus, as the sample space increases (i.e., the word length n takes large values, n → ∞), the parameter p decreases to zero (p → 0). Using the approximation formulas e^z = 1 + z + z²/2! + z³/3! + O(z⁴) and ln(1+z) = z + O(z²) (as z → 0), which are given in [6], Eq. (16) becomes

2^n = −(1/p) ln(p/2) + 1/3 + O(p).   (17)
Equation (17) directly relates the parameter p to the word length n. Table 2 gives the values of p for various word lengths n. As in the case of the uniform distribution, the mean value μ_{m,n} is calculated with Eq. (7) using the probabilities P(B_1^k) (1 ≤ k ≤ n−1).

TABLE 2
Values of p for Various Values of the Word Length n

 n      p
 8      1.83497 × 10^{−2}
 9      1.02978 × 10^{−2}
10      5.72148 × 10^{−3}
11      3.15142 × 10^{−3}
12      1.72301 × 10^{−3}
13      9.35965 × 10^{−4}
14      5.055636 × 10^{−4}
15      2.7172704 × 10^{−4}
16      1.450392 × 10^{−4}
20      1.15067728 × 10^{−5}
24      8.72886636 × 10^{−7}
32      4.62959156594 × 10^{−9}
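The entries of Table 2 can be reproduced by solving Eq. (16) numerically. The sketch below (our own; the bracketing interval and iteration count are arbitrary choices) uses bisection, which applies because the left-hand side of Eq. (16) is increasing in p.

```python
import math

def solve_p(n, lo=1e-12, hi=1.0, iters=200):
    """Solve Eq. (16), p(1 - e^{-p 2^n})/(1 - e^{-p}) = 1, for p by bisection.

    Both factors of the left-hand side grow with p, so the root is unique
    on (0, 1) and bisection converges."""
    def g(p):
        return p * (1.0 - math.exp(-p * 2.0**n)) / (1.0 - math.exp(-p)) - 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# spot-check two Table 2 entries
print(solve_p(8))   # ~1.83497e-2
print(solve_p(12))  # ~1.72301e-3
```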
First, we calculate the probability c_k(φ) for a fixed k (φ ∈ [0 … 2^k − 1]), which defines P(B_1^k). Based on Eq. (4), we derive a closed formula for c_k(φ):

c_k(φ) = F((φ+1) 2^{n−k} − 1) − F(φ 2^{n−k} − 1)
       = [p/(1 − e^{−p})] (1 − e^{−p 2^{n−k} (φ+1)}) − [p/(1 − e^{−p})] (1 − e^{−p 2^{n−k} φ})
       = [p/(1 − e^{−p})] (e^{−p 2^{n−k} φ} − e^{−p 2^{n−k} (φ+1)})
       = [p/(1 − e^{−p})] (1 − e^{−p 2^{n−k}}) e^{−p 2^{n−k} φ}
       = b_k e^{−p 2^{n−k} φ},   (18)

where b_k = [p/(1 − e^{−p})](1 − e^{−p 2^{n−k}}) = b(1 − e^{−p 2^{n−k}}). For large n, the parameter b is approximated by 1/(1 − p/2 + O(p²)), since e^{−p} = 1 − p + p²/2 + O(p³). Thus, the factor b_k becomes b_k = (1 − e^{−p 2^{n−k}})/(1 − p/2 + O(p²)). If we substitute c_k(φ) in Eq. (6) with the result of Eq. (18), we obtain the probability P(B_1^k):
P(B_1^k) = Σ_{φ=0}^{2^k−1} b_k e^{−p 2^{n−k} φ} (1 − b_k e^{−p 2^{n−k} φ})^m   (19)

= [1/(p(m+1) 2^{n−k})] Σ_{φ=0}^{2^k−1} d/dφ (1 − b_k e^{−p 2^{n−k} φ})^{m+1}

≈ [1/(p(m+1) 2^{n−k})] ∫_0^{2^k} d/dφ (1 − b_k e^{−p 2^{n−k} φ})^{m+1} dφ

= [(1 − b_k e^{−p 2^n})^{m+1} − (1 − b_k)^{m+1}] / (p(m+1) 2^{n−k})

= [b^{m+1}(1 − e^{p 2^{n−k}}/e^{p 2^n})^{m+1} − b^{m+1}(1 − e^{p 2^{n−k}}/e^{p 2^n})^{m+1} e^{−p(m+1) 2^{n−k}}] / (p(m+1) 2^{n−k})   (since b(1 − e^{−p 2^n}) = 1)

= b^{m+1} (1 − e^{p 2^{n−k}}/e^{p 2^n})^{m+1} · (1 − e^{−p(m+1) 2^{n−k}}) / (p(m+1) 2^{n−k}).   (20)
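The sum-to-integral step behind Eq. (20) can be checked numerically against the exact sum of Eq. (19). The sketch below (our own; the configuration n = 16, m = 256, k = 10 is an arbitrary choice) evaluates both forms and confirms they agree closely.

```python
import math

def p_param(n):
    """Parameter p of the exponential model, from Eq. (16), by bisection."""
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lhs = mid * (1 - math.exp(-mid * 2.0**n)) / (1 - math.exp(-mid))
        lo, hi = (mid, hi) if lhs < 1 else (lo, mid)
    return 0.5 * (lo + hi)

def pb_exact(n, m, k, p):
    """Exact P(B_1^k): the sum of Eq. (19) over all 2^k prefixes phi."""
    b = p / (1 - math.exp(-p))
    bk = b * (1 - math.exp(-p * 2**(n - k)))
    a = p * 2**(n - k)
    return sum(bk * math.exp(-a * phi) * (1 - bk * math.exp(-a * phi))**m
               for phi in range(2**k))

def pb_closed(n, m, k, p):
    """Closed-form approximation of Eq. (20)."""
    b = p / (1 - math.exp(-p))
    x = p * (m + 1) * 2**(n - k)
    # a_k = (1 - e^{p 2^{n-k}} / e^{p 2^n})^{m+1}, written with one exponent
    ak = (1 - math.exp(p * 2**(n - k) - p * 2**n))**(m + 1)
    return b**(m + 1) * ak * (1 - math.exp(-x)) / x

n, m, k = 16, 256, 10
p = p_param(n)
exact, closed = pb_exact(n, m, k, p), pb_closed(n, m, k, p)
print(exact, closed)  # the two values agree to within a few percent
```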
As mentioned in Subsection 2.2, the probability P(E) of the existence of at least one match between the memory words and CR is calculated by the equation P(E) = 1 − P(B_1^n). The probability P(B_1^n), that all memory words differ from CR in at least one of the n bits, is calculated by Eq. (20) for k = n. Thus, Eq. (21) gives the probability of equality P(E):

P(E) = 1 − P(B_1^n)
     = 1 − b^{m+1} (1 − e^p/e^{p 2^n})^{m+1} (1 − e^{−p(m+1)}) / (p(m+1)).   (21)
Assuming a fixed word length n, the right part of Eq. (21) converges to 1 as the memory size m increases (m → ∞). This means that there is an integer m_0, which depends on n, such that the probability P(E) is in a small neighborhood of 1 for all m ≥ m_0, while P(B_1^n) is in a small neighborhood of 0. This occurs because the probability that a word will match CR increases as the number of selections from the word sample space F (equal to the memory size m) increases, while the cardinality of the sample space F remains constant. Furthermore, given the definitions of Section 3, the probability P(B_1^n) is an upper bound of P(B_1^k), for 1 ≤ k ≤ n−1. These probabilities define the mean value μ_{m,n} through Eq. (7). Thus, if P(B_1^n) converges to 0 (for m ≥ m_0), then μ_{m,n} ≈ n. Table 3 presents some values of P(E), calculated using Eq. (21), for various values of the parameter m and for n = 12. In this table, when the number of selections m from the word sample space becomes greater than the size 2^n of the sample space (n ≤ lg m), the probability P(E) converges to 1. For large n and fixed m, the factor
a_k = (1 − e^{p 2^{n−k}}/e^{p 2^n})^{m+1} = (1 − 1/(e^{p 2^n})^{(2^k−1)/2^k})^{m+1}
of P(B_1^k) converges to 1 very fast, even for small k. Figure 4 gives an example of the behavior of this factor for n = 32. Moreover, the factor b^{m+1} of P(B_1^k) is approximated by 1/(1 − (m+1)p/2 + O(p²)), which is very close to 1 for large n (since p → 0). Consequently, the probability P(B_1^k) can be approximated by

P(B_1^k) ≈ (1 − e^{−p(m+1) 2^{n−k}}) / (p(m+1) 2^{n−k}).   (22)
The behavior of the approximated probability P(B_1^k) is analyzed using lg m as the reference point for k; if we substitute k = lg m + l, where −lg m < l ≤ n − lg m, in Eq. (22), we obtain

P(B_1^l) = (1 − e^{−p((m+1)/m) 2^{n−l}}) / (p((m+1)/m) 2^{n−l}) ≈ (1 − e^{−p 2^{n−l}}) / (p 2^{n−l}) = h(l).   (23)

For negative values of l (i.e., 1 ≤ k < lg m), the exponent p 2^{n−l} is large, so the exponential term is negligible and P(B_1^l) ≈ 1/(p 2^{n−l}).
TABLE 3
Values of P(E) for Various Values of the Parameter lg m and for Word Length n = 12

lg m    P(E)
 8      0.192354
 9      0.336595
10      0.531167
11      0.725883
12      0.859321
13      0.930019
14      0.965431
15      0.983131
16      0.991966
For l = n − lg m (i.e., k = n), Eq. (23) gives

h(n − lg m) = (1 − e^{−pm}) / (pm)
            = (1 − 1 + pm − m²p²/2 + O(p³)) / (pm)
            = 1 − mp/2 + O(p²).   (24)
Thus, there is an integer l_0 such that the probabilities P(B_1^l) are in a small neighborhood of 1 for all l ≥ l_0. Figure 4 illustrates the behavior of the probabilities P(B_1^k), for n = 32 and m = 512, as an example. The mean value μ_{m,n} is calculated using Eq. (7), which is rewritten as

μ_{m,n} = n − Σ_{−lg m+1 ≤ l ≤ −1} P(B_1^l) − Σ_{0 ≤ l ≤ n−lg m−1} P(B_1^l)
        = n − ε − A.   (25)
Based on the previous analysis of the probabilities P(B_1^l) for negative values of the variable l, we obtain a closed formula for the sum ε:

ε = Σ_{−lg m+1 ≤ l ≤ −1} 1/(p 2^{n−l}) = (m−2)/(p m 2^n).   (26)
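The geometric sum behind Eq. (26) is pure algebra and can be verified exactly; the small check below (our own) confirms that Σ_{−lg m+1 ≤ l ≤ −1} 2^l = (m − 2)/m, from which ε = (m−2)/(p m 2^n) follows after factoring out 1/(p 2^n).

```python
from fractions import Fraction

# For negative l, P(B_1^l) ~ 1/(p 2^{n-l}) = 2^l / (p 2^n); summing the 2^l
# over l = -lg m + 1, ..., -1 gives 1 - 2/m = (m - 2)/m exactly.
for lgm in (4, 8, 12):
    m = 2**lgm
    s = sum(Fraction(1, 2**(-l)) for l in range(-lgm + 1, 0))
    assert s == Fraction(m - 2, m)
print("Eq. (26) geometric sum verified")
```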
Regarding the sum A in Eq. (25), we consider two separate cases. When n ≤ lg m, the sum A is equal to 0 by definition. Thus, μ_{m,n} is

μ_{m,n} = n − (m−2)/(p m 2^n).   (27)
FIG. 4. Behavior of the factors a_k = (1 − e^{p 2^{n−k}}/e^{p 2^n})^{m+1} and h_k = (1 − e^{−p(m+1) 2^{n−k}})/(p(m+1) 2^{n−k}) of the probability P(B_1^k), assuming n = 32, m = 512.
It becomes clear from this equation and the analysis of the behavior of the probability P(E) that μ_{m,n} approaches n for all m ≥ m_0 = 2^n. Table 4 presents values of μ_{m,n} for various CAM configurations, calculated with Eq. (7) using Eq. (20) for the evaluation of the probabilities P(B_1^k). The entry of Table 4 that corresponds to the CAM configuration lg m = 16, n = 16 is equal to the result obtained using Eq. (27) for the same configuration. When n > lg m, we obtain a lower bound for A and, consequently, an upper bound for μ_{m,n}. Based on the analysis of the behavior of P(B_1^l), A is bounded as
A = Σ_{l=0}^{n−lg m−1} (1 − e^{−p 2^{n−l}}) / (p 2^{n−l})
  ≥ ∫_0^{n−lg m−1} (1 − e^{−p 2^{n−x}}) / (p 2^{n−x}) dx   (u = 2^{−x}, w = p 2^n)
  = [1/(w ln 2)] ∫_{2m/2^n}^{1} (1 − e^{−wu}) / u² du
  = [1/(w ln 2)] ∫_{2m/2^n}^{1} (1/u²) du − [1/(w ln 2)] ∫_{2m/2^n}^{1} (e^{−wu}/u²) du
  = 1/(2pm ln 2) − 1/(p 2^n ln 2) − [1/(w ln 2)] ∫_{2m/2^n}^{1} (e^{−wu}/u²) du
  = 1/(2pm ln 2) − 1/(p 2^n ln 2) − B.   (28)
TABLE 4
Mean Value μ_{m,n} for Various Values of the Parameter lg m and for Word Lengths n = 16, 24, 32, Evaluated by Eq. (20)

lg m    μ_{m,16}    μ_{m,24}    μ_{m,32}
 4       7.19        7.82        8.29
 5       8.16        8.78        9.25
 6       9.14        9.78       10.22
 7      10.13       10.77       11.21
 8      11.11       11.76       12.20
 9      12.07       12.76       13.20
10      13          13.76       14.20
11      13.86       14.76       15.20
12      14.61       15.75       16.20
13      15.20       16.75       17.20
14      15.58       17.74       18.20
15      15.79       18.73       19.20
16      15.89       19.70       20.20
According to Gradshteyn and Ryzhik [5], the integral B can be handled using

∫_x^∞ (e^{−at}/t²) dt = a Ei(−ax) + e^{−ax}/x   (a > 0).   (29)
The symbol Ei(x) stands for the exponential integral function, defined by Ei(x) = −∫_{−x}^∞ (e^{−t}/t) dt, for x < 0. The exponential integral function does not have a closed form. Consequently, the integral B can be expressed as

B = [1/(w ln 2)] ∫_{2m/2^n}^∞ (e^{−wu}/u²) du − [1/(w ln 2)] ∫_1^∞ (e^{−wu}/u²) du
  = [1/ln 2] (Ei(−2pm) − Ei(−p 2^n)) + e^{−2pm}/(2pm ln 2) − e^{−p 2^n}/(p 2^n ln 2)
  = e^{−2pm}/(2pm ln 2) − e^{−p 2^n}/(p 2^n ln 2) − [1/ln 2] ∫_{2pm}^{p 2^n} (e^{−t}/t) dt.
Substituting the above expression of B in Eq. (28), we obtain an upper bound for the mean value μ_{m,n}:

μ_{m,n} ≤ n − (m−2)/(p m 2^n) − (1 − e^{−2pm})/(2pm ln 2) + (1 − e^{−p 2^n})/(p 2^n ln 2) − [1/ln 2] ∫_{2pm}^{p 2^n} (e^{−t}/t) dt.   (30)
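The bound of Eq. (30) can be evaluated numerically without any special-function library. In the sketch below (our own; step counts and bracketing interval are arbitrary), the substitution t = e^s turns the integrand e^{−t}/t into the smooth, bounded e^{−e^s}, so a plain trapezoidal rule suffices.

```python
import math

def p_param(n):
    """Parameter p from Eq. (16), solved by bisection."""
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lhs = mid * (1 - math.exp(-mid * 2.0**n)) / (1 - math.exp(-mid))
        lo, hi = (mid, hi) if lhs < 1 else (lo, mid)
    return 0.5 * (lo + hi)

def expint(a, b, steps=200000):
    """Trapezoidal value of the integral of e^{-t}/t from a to b (0 < a < b),
    computed after the substitution t = e^s."""
    sa, sb = math.log(a), math.log(b)
    h = (sb - sa) / steps
    total = 0.5 * (math.exp(-a) + math.exp(-b))
    for i in range(1, steps):
        total += math.exp(-math.exp(sa + i * h))
    return total * h

def mu_upper_bound(n, m):
    """Upper bound on mu_{m,n} from Eq. (30), exponential distribution."""
    p = p_param(n)
    w, v, ln2 = p * 2.0**n, 2 * p * m, math.log(2)
    return (n - (m - 2) / (p * m * 2.0**n)
            - (1 - math.exp(-v)) / (v * ln2)
            + (1 - math.exp(-w)) / (w * ln2)
            - expint(v, w) / ln2)

print(round(mu_upper_bound(24, 1024), 2))  # cf. 14.28 reported in the text
print(round(mu_upper_bound(32, 1024), 2))  # cf. 14.76 reported in the text
```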
The bound calculated above provides a very good approximation to the precise value of μ_{m,n}, as can be seen by comparing precise values with the bound values. Table 4 includes a set of precise values of μ_{m,n}, which have been calculated using Eq. (7) (the probabilities P(B_1^k) were calculated with Eq. (20)). We have also
calculated representative values for the upper bound of Eq. (30) using the mathematical tool Scilab under the operating system Linux; for the CAM configurations (i) n = 24, m = 1024 and (ii) n = 32, m = 1024, we calculate the upper bounds 14.28 and 14.76, respectively. These values are very close to the precise values of μ_{m,n} given in Table 4, showing that Eq. (30) provides a good upper bound of μ_{m,n}. Furthermore, the precise values of μ_{m,n} provide an indication of the behavior of μ_{m,n} as a function of m for fixed n: interestingly, μ_{m,n} increases logarithmically as the CAM size m increases. This logarithmic dependency becomes evident by refining the bound of Eq. (30) further. Gradshteyn and Ryzhik [5] give Eq. (31), which provides an asymptotic representation of the exponential integral function for x < 0:
Ei(x) = C + ln(−x) + Σ_{j=1}^∞ x^j/(j · j!).   (31)
Based on this, we obtain an asymptotic form for the integral that appears in Eq. (30):

[1/ln 2] ∫_{2pm}^{p 2^n} (e^{−t}/t) dt = [1/ln 2] (Ei(−p 2^n) − Ei(−2pm))
= n − lg m − 1 + [1/ln 2] Σ_{j=1}^∞ ((−p 2^n)^j − (−2pm)^j)/(j · j!).   (32)
Applying Eq. (32) to Eq. (30), we obtain a new expression for the upper bound of + m, n : n
+ m, n lg m+1& &
m&2 2 n(1&e &2pm )&2m(1&e & p2 ) & pm2 n pm2 n+1 ln(2)
1 (&p2 n ) j &(&2pm) j ) : . ln(2) j=1 jj !
(33)
Inequality (33) proves that the mean value μ_{m,n} depends strongly on the parameter lg m, as indicated by the values given in Table 4. Furthermore, it indicates a weak dependency on the difference between the size of the sample space 2^n and the CAM size m in the last term of the bound. The infinite sum there converges, as can easily be shown (since lim_{j→∞} |u_{j+1}/u_j| = 0).

6. PARALLEL SIMULATION OF THE CLASSICAL CAPP

We have developed a simulator of the classical CAPP on a multiprocessor system in order to obtain simulation results for the performance of the primitive search operations (equality and thresholding) with different CAM configurations and either uniform or exponential distribution of the word space. In the following, we describe our environment and provide the results of the simulations.
6.1. The Parsytec GCel 3/512 Machine

The Parsytec GCel 3/512 is a MIMD parallel computer consisting of 512 processors. It is physically divided into eight cubes. Each cube consists of four clusters. Sixteen processors and one redundant processor form a cluster. The GCel is built using INMOS T805 transputers. Each processor is a node in a 2-dimensional communication grid, the data network (D-network), communicating with other processors using message passing. A user can allocate a subset of the 512 processors, called a partition. A GCel cluster is the smallest allocatable partition. SUN workstations provide the front end to the GCel. GCel clusters are connected via physical links to one of the front-end workstations. The operating environment of the GCel is called PARIX (PARallel extensions to unIX). The GCel is controlled by a network of control processors (C-network, one transputer per cube), which is independent of the D-network. The CNet software is a program running on the C-network. The main task of the CNet is the configuration of the machine, i.e., connecting and disconnecting processors as 2-dimensional grids.

6.2. Simulation Results

We have implemented a simulator of the classical CAPP on the Parsytec parallel machine. The Content Addressable Memory (CAM) of the CAPP is partitioned into L parts, where L is the number of processors in the allocated partition. The ith part of the CAM belongs to the processor with identification number Id = i. Furthermore, the processor with Id = 0 is the controller of the CAPP. Each processor loads the data of its CAM partition following the uniform or the exponential distribution of the n-bit patterns. Uniformly distributed n-bit patterns are produced using the uniform random number generators of the available operating system. The exponential random number generator produces n-bit patterns according to the following mechanism, given in [10]:

w_i = k  if  F(k−1) ≤ u_i < F(k),

where u_i is a uniformly distributed random number in the range (0, 1), k ∈ [0 … 2^n − 1], and F(k) = p(1 − e^{−p(k+1)})/(1 − e^{−p}). Values of the parameter p are given in Table 2. Both random number generators use a different seed for each CAM partition and word (independent data filling of the CAM words). The seed of each random number generator is a function of the corresponding processor identification number Id, modified by the current word index and a user-supplied constant. The controller broadcasts the contents of the comparand and mask registers to the slave processors. After this startup step, all processors execute the search operation. Implementing this operation, each processor compares the contents of the comparand register with the contents of each memory word. When an internal bit of a memory word differs from the corresponding bit of the comparand register, then the operation for that word stops and there is a responder (less or greater than
comparand). This is repeated for all memory words. The last word (or words) that responds defines the maximum number of bits (the corresponding internal bits) that are compared and combined in a word in order to obtain a response from each part of the CAM. This number is denoted as the Threshold. It is obvious that the last word (or words) may respond at the nth bit either for equality or for inequality. When the processors finish the simulation of their partitions, they communicate via a hypercube interconnection network to combine their responses and to provide the maximum Threshold from their individual values. At the end, the processor with Id = 0 contains the Threshold. So, in summary:

• each processor loads its CAM partition, drawing data from a uniformly or exponentially distributed word space;
• each processor performs the comparison between the comparand register (CR) and the contents of each word;
• each processor collects the responses and calculates the Threshold;
• the processor with Id = 0 collects data from all other processors through a hypercube interconnection network and calculates the Threshold.

The above process is repeated 10^5 times in each experiment, and the average value of the Threshold is evaluated. Figures 5, 6, and 7 present some of the experimental and theoretical results we have obtained. Figure 5 shows that, for n > lg m, the average threshold increases logarithmically as the memory size increases with n kept constant. Furthermore,
FIG. 5. Mean value + m, n versus number of words of the CAM for n=14, 32 for uniformly distributed word space.
FIG. 6. Theoretical curves for μ_{m,n} versus CAM word length for uniformly distributed word space.

FIG. 7. Experimental curves for μ_{m,n} versus CAM word length for uniformly distributed word space.
FIG. 8. Mean value + m, n versus number of words of the CAM for n=12, 14, 16 for exponentially distributed word space.
it is shown that, as the binary logarithm of the memory size m becomes greater than n, the experimental values of the average threshold become greater than n − 1 and approach n. Figures 6 and 7 show that the threshold changes only slightly (it is almost constant) as the word length increases, when m is constant. Figure 8 shows the experimental values of the average threshold, obtained with the simulator, in the case of an exponential word space distribution. As the figure indicates, in CAM configurations with n > lg m, the average threshold increases logarithmically as the memory size m changes with n fixed; as lg m approaches n, the average threshold approaches n.
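The word-generation mechanism of Subsection 6.2 can be sketched sequentially. The code below is our own inverse-transform rendering of the rule w_i = k ⟺ F(k−1) ≤ u_i < F(k), assuming d_j = j; the sample size and the checked quantile are arbitrary choices.

```python
import math
import random

def exp_word_gen(n, p, rng):
    """Draw an n-bit pattern w = k with F(k-1) <= u < F(k), where
    F(k) = p (1 - e^{-p(k+1)}) / (1 - e^{-p}) is the CDF of the truncated
    exponential word space of Section 5 (with d_j = j)."""
    u = rng.random()
    c = (1 - math.exp(-p)) / p   # u*c = 1 - e^{-p(k+1)} exactly at u = F(k)
    # inverting F(k-1) <= u < F(k) gives k = floor(-ln(1 - u*c)/p)
    return min(2**n - 1, int(-math.log1p(-u * c) / p))

rng = random.Random(7)
n, p = 12, 1.72301e-3            # p for n = 12, from Table 2
samples = [exp_word_gen(n, p, rng) for _ in range(20000)]

# the empirical CDF at k should match F(k); check near the median
k = int(math.log(2) / p)         # F(k) ~ 0.5 here
F = p * (1 - math.exp(-p * (k + 1))) / (1 - math.exp(-p))
emp = sum(1 for s in samples if s <= k) / len(samples)
print(abs(emp - F))              # small
```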
7. CONCLUSIONS

We presented a general framework for the average case analysis of the search primitive operations, i.e., equality and thresholding, in associative memories. The framework considers as parameters the CAM size m (number of memory words), the memory word length n, and the word space distribution. The results are formulas for calculating (i) the probability that the whole CAM responds after comparing up to k most significant bits (1 ≤ k ≤ n) in each word, (ii) the probability that there exists at least one memory word that is equal to the centrally broadcast word, and (iii) the average value of k, i.e., μ_{m,n}. Thus, using this framework, one can obtain results for the behavior of the average value of k for any CAM configuration and for any word space distribution model.
We used this framework to obtain results for the cases where the memory draws its data from a uniformly or exponentially distributed word space, which are considered the most representative physical distribution models [7]; furthermore, the uniform distribution is considered a worst-case scenario, since it is characterized by maximum entropy. Interestingly, for both distributions, it is proven that μ_{m,n} depends strongly on lg m when n > lg m. Furthermore, μ_{m,n} is practically independent of n in the uniform distribution, while it depends weakly on the difference between the size of the word space 2^n and the CAM size m in the case of the exponential distribution. Finally, μ_{m,n} is approximately n in the case n ≤ lg m. Verification of the analytical results through massive simulations indicates the correctness of the analysis. The important outcome of the presented work is that one can build associative memory cells with low complexity that achieve high-speed execution of the primitive search operations. Thus, high system performance is achieved without decreasing the density of the CAM memories. The way to build such memories is to implement a combining tree only for the μ_{m,n} most significant bits in each word and to allow serial combination of the bit comparison results for the remaining n − μ_{m,n} bits.
ACKNOWLEDGMENTS This work was supported in part by the ESPRIT III Basic Research Programme of the EU under Contract 9072 (Project GEPPCOM).
REFERENCES

1. C. Davarakis and D. Maritsas, A probabilistic parallel associative search and query set of algorithms, J. Parallel Distrib. Comput. 14 (1992), 37–49.
2. W. Davis and D. Lee, Fast search algorithms for associative memories, IEEE Trans. Comput. C-35, No. 5 (May 1986), 456–461.
3. A. Decegama, "The Technology of Parallel Processing," Prentice–Hall International, Englewood Cliffs, NJ, 1989.
4. C. Foster, "Content-Addressable Parallel Processors," Van Nostrand Reinhold, New York, 1976.
5. I. S. Gradshteyn and I. M. Ryzhik, "Table of Integrals, Series, and Products," Academic Press, San Diego, 1980.
6. D. Knuth, R. Graham, and O. Patashnik, "Concrete Mathematics," Addison–Wesley, New York, 1989.
7. D. Knuth, "The Art of Computer Programming, Seminumerical Algorithms," Addison–Wesley, New York, 1987.
8. T. Kohonen, "Content-Addressable Memories," Springer-Verlag, Berlin, 1987.
9. P. Nastou, D. Serpanos, and D. Maritsas, Analysis of searching in associative processing, in "Proc. 1995 Zeus Workshop on Parallel Programming and Computation," pp. 222–227, Linköping, Sweden, 1995.
10. A. Papoulis, "Probability, Random Variables, and Stochastic Processes," McGraw–Hill, New York, 1984.
11. C. Ramamoorthy, J. Turner, and B. Wah, A design of a fast cellular associative memory for ordered retrieval, IEEE Trans. Comput. C-27, No. 9 (September 1978), 800–815.
12. I. Scherson and S. A. Ilgen, A reconfigurable fully parallel associative processor, J. Parallel Distrib. Comput. 6 (1989), 69–89.
13. I. Scherson and S. Ruhman, Multi-operand arithmetic in a partitioned associative architecture, J. Parallel Distrib. Comput. 5 (1988), 655–668.