A parallel optical implementation of arithmetic operations

Optics & Laser Technology 49 (2013) 173–182

Contents lists available at SciVerse ScienceDirect

Optics & Laser Technology journal homepage: www.elsevier.com/locate/optlastec

Ali Gholami Rudi, Saeed Jalili*
Department of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran

Article history: Received 12 September 2012; Received in revised form 4 December 2012; Accepted 23 December 2012; Available online 31 January 2013

Abstract

In this paper we present an optical processor for performing arithmetic operations in parallel. The implementation of arithmetic operations makes it possible to perform various computational tasks. As case studies we solve the bounded subset sum problem in parallel and perform parallel primality testing. The processor uses two-dimensional optoelectronic planes for both performing logic operations and storing data, which eliminates the need for transferring data from electronic devices in each step of the computation. The presented processor seems easier to realize than most of the past parallel optical processors due to its simpler and more compact architecture, while staying powerful enough to carry out computations from diverse applications. © 2012 Elsevier Ltd. All rights reserved.

Keywords: Parallel optical computing; Arithmetic operations; Primality testing

1. Introduction

The speed limitation of VLSI technology has forced the electronic industry to use parallelism to achieve higher throughput. However, despite the great advances in electronic processors, they still suffer from limited parallelism and scalability. As a matter of fact, the industry is moving towards technologies like multi-core and distributed systems, which are more complex, power hungry, and expensive than older processors. On the other hand, the high parallelism of optics, its ability to implement digital logic, its success in data transfer, and the advances in optical devices make light a viable alternative to electricity for computation. In fact, studies on using optics for computing began more than five decades ago [1]. Especially in the years 1980–1995, a large number of researchers invested in optical computing and presented several techniques for using optics in parallel processing, such as optical shadow casting [2–17], optical symbolic substitution [18–24], and optical vector–matrix multiplication [1,25–31]. After this golden age, optical computing experienced a slowdown near the end of the last millennium. Some researchers attribute this slowdown to the limitations of optical devices and the breakthroughs in electronic computers [32]. In any case, optics failed to play a major role in computing after this period. The remarkable progress of the last few decades in optical devices, previously one of the main obstacles to implementing optical processors, has made optical processors more viable now than at any time in the past [33,34]. In the last few years, there seems to

* Corresponding author. Tel.: +98 21 8288 3374; fax: +98 21 8288 4325. E-mail address: [email protected] (S. Jalili).

0030-3992/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.optlastec.2012.12.029

be growing interest in parallel optical computing; researchers are trying to use optical devices, like liquid crystals, to implement parallel arithmetic operations like addition [35–37] and to improve the performance of optical vector–matrix multiplication [29,38–41]. Optics has also been used to solve intractable combinatorial problems more efficiently using the parallelism of light [42–52]. Although most of these efforts did not make optical processors any more realizable than they previously were, they did remind us that the high parallelism of light could benefit even today's advanced electronic computers. In this paper we present a parallel optical processor for implementing arithmetic operations. The processor uses two-dimensional optoelectronic planes both to implement logical operations and to store data. Storing data planes inside the processor solves one of the main challenges of previous processors; data no longer needs to be transferred to electronic circuits or external memory devices in each cycle. The processor is very compact and seems much easier to implement than many of the previous optical processors. Unlike most of the past optical processors, which implement limited arithmetic operations, the presented processor implements various arithmetic operations to carry out different computational tasks; as case studies we solve the bounded subset sum problem and perform parallel primality testing. This paper is organized as follows: In Section 2 we present our parallel optical processor and an overview of its optoelectronic implementation. Then in Section 3 we implement arithmetic operations using the presented optical processor. In Section 4 we solve the subset sum problem in parallel and perform parallel primality testing using the operations implemented in Section 3. We discuss a possible extension of the processor in Section 5,

which makes it possible to perform operations on the numbers inside a single number array. In Section 6 we review the related studies in the literature, compare them with the presented processor, and discuss the feasibility of the presented processor based on the available technologies. Finally in Section 7 we conclude this paper.

2. The parallel optical processor

In this section we present our parallel optical processor, describe its architecture, and sketch its optoelectronic implementation. The operands of our processor are very large arrays of bits. The processor has a fixed number of such bit arrays and can perform bitwise logical operations on the corresponding bits of any of them. These operations are performed on all bits of the operands in parallel; i.e. the processor follows the Single Instruction Multiple Data (SIMD) organization in Flynn's taxonomy of parallel processors [54]. In Section 2.1 we describe the bit array operations supported by the processor, and in Section 2.2 we present the architecture of the processor and explain how it can be implemented using optoelectronic devices. Note that the presented processor performs simple bitwise logical operations on bit arrays and does not natively implement numerical arithmetic operations; we postpone describing the implementation of arithmetic operations until Section 3, in which we use these bit arrays for storing numbers and use the processor's bit array operations to implement arithmetic operations on them.

2.1. Processor bit arrays and operations

The proposed processor has a fixed number of large bit arrays. These bit arrays can be uniquely addressed; Bi denotes the i-th bit array of the processor, and |Bi| denotes the number of bits in Bi. All bit arrays of a processor have an equal number of bits, so |Bi| is the same for every i. We also use Bi,j to denote the j-th bit of the bit array Bi, for 0 ≤ j < |Bi|. The processor provides two native operations on its bit arrays: bitwise complement and bitwise NAND. The processor can perform these operations on, and move their results to, any of its bit arrays. The bitwise complement operation Bz ← BitNot(Bx) moves the bitwise complement of Bx into Bz. After this operation, each bit of Bz equals the complement of its corresponding bit in Bx; in other words, Bz,i equals the complement of Bx,i, for 0 ≤ i < |Bz|. The bitwise NAND operation Bz ← BitNand(Bx, By) moves the bitwise NAND of bit arrays Bx and By into Bz. After this operation, Bz,i equals the NAND of Bx,i and By,i, for 0 ≤ i < |Bz|. Other bitwise logical operations can be implemented using bitwise NAND and complement, as shown in Table 1. In this table, we use Bt0, Bt1, ... as temporary bit arrays.

2.2. Processor architecture and its optical implementation

In this section we describe an overview of the optoelectronic implementation of the processor described in Section 2.1. The processor consists of several two-dimensional planes for both storing bit arrays and performing logical operations on them; we use the term data planes to refer to these optoelectronic planes. As will be discussed shortly, the processor controls these data planes via their electronic input command signals (control bus) to transfer data between data planes or to perform logical operations on them. A data plane is depicted in Fig. 1. The area of a data plane is divided into many equal-sized cells. Each cell stores a single bit,

Table 1
Implementing bit array operations using processor NAND and complement.

Operation               Description   Implementation
Bz ← Bx                 Copying       Bt0 ← BitNot(Bx); Bz ← BitNot(Bt0)
Bz ← BitAnd(Bx, By)     Bitwise AND   Bt0 ← BitNand(Bx, By); Bz ← BitNot(Bt0)
Bz ← BitOr(Bx, By)      Bitwise OR    Bt0 ← BitNot(Bx); Bt1 ← BitNot(By); Bz ← BitNand(Bt0, Bt1)
Bz ← BitXor(Bx, By)     Bitwise XOR   Bt0 ← BitNand(Bx, By); Bt1 ← BitNand(Bx, Bt0); Bt2 ← BitNand(By, Bt0); Bz ← BitNand(Bt1, Bt2)
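The derivations of Table 1 can be checked with a short software sketch; the function names below are ours, and a bit array is modeled as a plain list of 0/1 values:

```python
# A minimal sketch of Table 1: every bitwise operation is built from the
# processor's two native primitives, BitNot and BitNand.

def bit_not(x):
    return [1 - b for b in x]

def bit_nand(x, y):
    return [1 - (a & b) for a, b in zip(x, y)]

def bit_copy(x):                      # Bz <- Bx
    return bit_not(bit_not(x))

def bit_and(x, y):                    # Bz <- BitAnd(Bx, By)
    return bit_not(bit_nand(x, y))

def bit_or(x, y):                     # Bz <- BitOr(Bx, By)
    return bit_nand(bit_not(x), bit_not(y))

def bit_xor(x, y):                    # Bz <- BitXor(Bx, By)
    t0 = bit_nand(x, y)
    return bit_nand(bit_nand(x, t0), bit_nand(y, t0))
```

For x = 0011 and y = 0101, these yield AND = 0001, OR = 0111, and XOR = 0110, matching the truth tables.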

Fig. 1. An example 4 × 8 data plane, with its cells and their shared control bus.

can detect light, and, depending on its state, can block the light passing through it. The memory bits of the cells in a data plane form a large array of bits that implements a bit array. Although the cells in a data plane are independent, they all decode and follow the instructions on the 2-bit control bus of their data plane. The electronic logic in each cell is very simple: a flip-flop stores the cell's memory bit, and based on the signals on the control bus and the detection of light by its photodetector, the cell blocks the light (becomes opaque) or lets it pass (becomes transparent). Different technologies can be used to implement this switching behavior. Like some of the recent optical processors [36,37,39], the presented processor can use polarized light and liquid crystals for switching: both light sources can produce horizontally polarized light, and the cells that block the light can rotate the polarization direction of the light by 90 degrees using liquid crystals. Vertically polarized light can then be treated as blocked. In Section 6.3 we discuss how data planes can be implemented using microdisplays [55]. The processor manages data planes via their control bus. Depending on the signals carried by the control bus, the cells in a data plane can be in one of four states: Transparent, Record, Filter, and Combine. Table 2 describes the behavior of data plane cells in each of these states. Fig. 2 shows the high-level design of the processor. The states of the processor's data planes are controlled electronically via their control bus. There are two coherent light sources; the processor can turn them on when required. Based on the states of these light sources and data planes, the processor can move data between data planes. To implement the bitwise complement of a data plane (By ← BitNot(Bx)), the destination plane is put into the Record state, the source plane into the Filter state, and the other planes into the Transparent state. Then turning on the light source closer to the source plane transfers the complement of the source plane to the destination. For bitwise NAND (Bz ← BitNand(Bx, By)), the complement of the first operand is transferred to the destination plane, as in the BitNot() operation. Then the complement of the second operand is transferred to the destination plane similarly, except that the destination plane is put into the Combine state instead of the Record state.

Table 2
Data plane states.

Plane state   Description
Transparent   All cells become transparent
Record        The memory bit of a cell changes to one if it detects light, and zero otherwise
Combine       Like Record, except that cells retain their old values if no light is detected
Filter        All cells whose memory bit is one become opaque; the others become transparent
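The plane states of Table 2 and the two-pass BitNot/BitNand scheme of Section 2.2 can be simulated in software. In the sketch below (names are ours), a plane is a list of bits and one pulse() models a single flash of a light source; it assumes the pulse direction is chosen so that the source plane precedes the destination in traversal order:

```python
# A software model of data-plane states: a "filter" plane blocks light
# where its bit is one, a "record" plane stores the detected light, and
# a "combine" plane ORs the detected light into its stored bits.

def pulse(planes, states):
    """Propagate one light pulse through the planes, in index order."""
    light = [1] * len(planes[0])          # light entering the stack
    for plane, state in zip(planes, states):
        if state == "filter":             # cells with bit 1 become opaque
            light = [l & (1 - b) for l, b in zip(light, plane)]
        elif state == "record":           # bit <- 1 iff light is detected
            plane[:] = light
        elif state == "combine":          # bit <- bit OR detected light
            plane[:] = [p | l for p, l in zip(plane, light)]
        # "transparent" planes pass the light through unchanged

def bit_not(planes, src, dst):            # Bdst <- BitNot(Bsrc)
    states = ["transparent"] * len(planes)
    states[src], states[dst] = "filter", "record"
    pulse(planes, states)

def bit_nand(planes, x, y, dst):          # Bdst <- BitNand(Bx, By)
    bit_not(planes, x, dst)               # first pass: Bdst <- NOT Bx
    states = ["transparent"] * len(planes)
    states[y], states[dst] = "filter", "combine"
    pulse(planes, states)                 # second: Bdst <- Bdst OR NOT By
```

For planes A = 0011 and B = 0101, bit_nand leaves 1110 in the destination, i.e. NOT(A AND B); in hardware, the choice between the two light sources plays the role of the traversal-order assumption made here.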

Fig. 2. The architecture of the proposed optical processor.

3. Implementing arithmetic operations

In this section we use the processor presented in the previous section to implement arithmetic operations. Natively, the processor implements bitwise logical operations on bit arrays. We combine bit arrays to form large arrays of numbers (Section 3.1) and implement arithmetic operations on these number arrays using the processor's bit array operations (Section 3.3). The encoding of numbers and their operations are largely based on the processor presented by Louri in his three-dimensional optical architecture [24]. In order to describe the implementation of arithmetic operations, we use procedures, whose syntax is briefly explained in Section 3.2. Ultimately, these procedures should be translated into a series of the processor's native bit array operations that can be executed sequentially by the processor. The use of procedures helps us describe the algorithms while omitting the low-level details.

3.1. Number arrays

The native operands of the presented optical processor are bit arrays. We combine the processor's bit arrays to form large arrays of numbers, i.e. number arrays; in other words, in order to represent arrays of numbers, we assign a few of the processor's bit arrays to a number array. In the rest of this paper, we use capital letters to reference number arrays. The number array N contains |N| bit arrays; we use Ni to point to the i-th bit array of N and Ni,j to point to the j-th bit in Ni. The j-th number in N, N(j), is stored in two's complement representation in bits N|N|-1,j, N|N|-2,j, ..., N0,j, where N0,j is its least significant bit. Note that in this encoding, Ni stores the i-th bit of all numbers in N. As explained in Section 2, all bit arrays of the processor and, as a result, all bit arrays of a number array have the same number of bits; we use ||N|| to denote the number of bits in Ni (and equally the number of numbers in N) for any i in 0 ≤ i < |N|. On the other hand, different number arrays may contain different numbers of bit arrays; the number of bit arrays of a number array determines the number of bits in its numbers and hence the magnitude of the numbers it can store. Fig. 3 shows an example number array N where |N| equals 4 and ||N|| equals 8. As in the previous section for bit arrays, we achieve parallelism by performing an arithmetic operation on all numbers in number arrays in parallel. For instance, the result of adding number arrays A and B is the number array C, in which C(i) equals A(i) + B(i) for 0 ≤ i < ||A||. We discuss the implementation of arithmetic operations on number arrays in Section 3.3.

3.2. Procedures

To describe algorithms on number arrays, for instance to demonstrate the implementation of arithmetic operations, we use procedures. Procedures are simple wrappers around processor bit array operations. They allow describing algorithms on number arrays while ignoring details like the mapping of number arrays to processor bit arrays and the number of bit arrays in number arrays. For execution, procedures should be translated to a sequence of processor bit array operations as described in Section 2.1. Procedures follow these rules to make the expression of number array algorithms easier:

Number arrays: In procedures, number arrays are referenced using capital letters. Number array assignment is implemented as the assignment of corresponding bit arrays: the assignment A ← B is translated into Ai ← Bi for 0 ≤ i < |A|, given that |A| = |B|. For convenience, we define a special notation for two useful number arrays: a constant array is a number array all of whose numbers are equal; X̄ denotes the constant array whose numbers are equal to the constant X; for instance, 0̄ is the number array whose numbers are all zero. Also, the enumeration array, E, is the number array whose i-th number is i.

Operations: A procedure defines an operation that can be used in other procedures. For brevity, we omit temporary variables. For operations Op1() and Op2(), we write B ← Op2(Op1(A)) instead of T ← Op1(A) followed by B ← Op2(T).

Constant loops: Procedures may use constant loops, which represent repetitions with a fixed number of iterations. The body of these loops is duplicated for each iteration. For instance, the body of a for loop with variable i for values 0 until |M| − 1 is duplicated |M| times, each with a different value of i.

For executing a procedure, its nested calls are inlined recursively, constant loops are expanded recursively, temporary number arrays are introduced when needed, and bit arrays are

Fig. 3. The number array N containing (3, 6, 4, 0, 4, 3, 2, 1), and its corresponding bit arrays N3, N2, N1, N0.


mapped to number arrays (Ni is replaced with Bx where Bx is the processor bit array corresponding to the i-th bit array of N). The resulting bit array operations can be executed on the processor sequentially.

3.3. Arithmetic operations

In this section we implement the logical and arithmetic operations of Table 3. Some of these operations are implemented trivially using bit array operations on the corresponding bit arrays of the input number arrays, like Not() in Procedure 1: each bit array of the resulting number array is the result of BitNot() on a bit array of the input number array. Some of these operations are more complicated.

Procedure 1.
procedure Not(X)
  R ← 0̄
  for i from 0 to |X| − 1 do
    Ri ← BitNot(Xi)
  end for
  return R
end procedure

The implementation of the nontrivial operations of Table 3 is shown in Table 4. Note that the number of bit arrays in number arrays can be changed based on the magnitude of the numbers they contain; for instance, it is possible to double the number of bit arrays to hold the result of a multiplication (but for simplicity, we do not allow it in procedures). ConditionalArray() in Table 4 needs special attention. It selects each number of the resulting array from the input arrays (the second and third arguments) based on the value of the corresponding number in the input condition array (the first argument). The ConditionalArray() operation is very important when transforming conditional statements in algorithms. For instance, the statement A ← B in an if (C) block of a procedural algorithm is executed only if the condition C holds. In the equivalent number array procedure, each number in the array A should be updated only if its corresponding number in C is nonzero. Therefore, we can replace this statement with A ← ConditionalArray(C, B, A); Fig. 4 shows an example. Table 4 shows the implementation of this operation; it uses the Extend() operation, which changes all bits of nonzero numbers in the input array to one.

4. Case studies

In this section we solve two nontrivial problems using the proposed optical processor. We measure the complexity of these solutions based on the number of bit array operations needed to implement them.

4.1. The bounded subset sum problem

Given the set A = {a0, a1, ..., an−1} of numbers and a number b, in the subset sum problem the goal is to find out if there exists a subset of A the sum of whose elements equals b. The naive solution is to try every subset of A and check if this condition holds for any of them. We employ the parallelism in number arrays to solve this problem more efficiently. We can define a one-to-one mapping between the subsets of A and n-bit binary numbers: the i-th bit of a number specifies the presence of ai in the corresponding subset. Procedure 2 uses this mapping to calculate the sums of all subsets of A in parallel: the i-th number in the number array S is dedicated to the subset mapped to the number i, Ai. Procedure 2 calculates the sum of the numbers in each of these subsets. In this procedure we assume that ||S|| ≥ 2^|A| and |S| ≥ |A|. Also, we assume that |S| is large enough to hold the sum of the numbers in A.

Procedure 2.
1: procedure SubsetSum(A, b)
2:   S ← 0̄
3:   for i from 0 to |A| − 1 do
4:     S ← ConditionalArray(E & 2^i, S + ai, S)
5:   end for
6:   return S = b̄
7: end procedure

Table 3
The list of number array operations.

Operation                    Description
Not(A)                       Bitwise complement of all numbers in A
A & B                        Bitwise AND of corresponding numbers in A and B
A | B                        Bitwise OR of corresponding numbers in A and B
Extend(A)                    Switch all bits of the i-th number to one if A(i) ≠ 0, for 0 ≤ i < ||A||
Sign(A)                      The extended sign bit of A
LShift(A, i)                 Multiply all numbers in A by 2^i
A + B                        Addition; Add(A, B) is an alias and Carry(A + B) is the resulting carry
−A                           Two's complement; equivalent to Not(A) + 1
A − B                        Subtraction; equivalent to A + (−B)
A = B                        Equality check; equivalent to Not(Extend(A − B))
A < B                        Comparison; equivalent to Extend(Sign(A − B)) & Extend(Carry(A − B))
A × B                        The product of A and B; Multiply(A, B) is an alias
A / B                        The quotient of the integer division of A by B; Divide(A, B) is an alias
Remainder(A, B)              The remainder of dividing A by B
Pow(A, B)                    A raised to the power B
ConditionalArray(C, A, B)    The i-th number of the resulting array is A(i) if C(i) is nonzero and B(i) otherwise

Procedure 2 calculates the sum of the numbers in Ai for 0 ≤ i < 2^n in n steps: in step i, we calculate the array S^i (the value of S in the i-th iteration of the for loop in lines 3–5), whose j-th number equals the sum of the numbers in Aj ∩ {a0, a1, ..., ai−1}. Then the array S^n holds the sum of the elements in Aj ∩ {a0, a1, ..., an−1}, i.e. the sum of the elements of Aj, for 0 ≤ j < 2^n. In step i, we add ai to S^i(j) only if the i-th bit of E(j) is set, since this bit indicates the presence of ai in the corresponding subset. Recall that E is the enumeration array, as defined in Section 3.2. Hence (line 4):

  S^(i+1)(j) = S^i(j) + ai   if E(j) & 2^i ≠ 0
  S^(i+1)(j) = S^i(j)        otherwise

Finally, to find the subsets with sum b, we compare S^n with the constant array b̄ (line 6). Fig. 5 demonstrates this process for the set A = {2, 3, 4} and b = 7. This algorithm performs O(|A|) arithmetic number array operations. Since each of these operations performs at most O(|A|) bit array operations, the time complexity (or the bit array operation complexity) of this solution is O(|A|^2).
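Ignoring the optics and modeling number arrays as plain lists, the behavior of Procedure 2 can be sketched as follows (the function names are ours; each loop iteration is one SIMD step, and the list comprehension plays the role of ConditionalArray):

```python
# A software sketch of Procedure 2: S(j) accumulates the sum of the
# subset of `a` encoded by the bits of the index j.

def subset_sums(a):
    n = len(a)
    E = list(range(2 ** n))               # the enumeration array E
    S = [0] * (2 ** n)
    for i in range(n):
        # S <- ConditionalArray(E & 2^i, S + a_i, S)
        S = [s + a[i] if e & (1 << i) else s for s, e in zip(S, E)]
    return S

def subset_sum_exists(a, b):
    return any(s == b for s in subset_sums(a))
```

subset_sums([2, 3, 4]) reproduces the S^3 row of Fig. 5, (0, 2, 3, 5, 4, 6, 7, 9), and subset_sum_exists([2, 3, 4], 7) is True.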


Table 4
The implementation of number array operations.

procedure Add(A, B)
  R ← 0̄
  C ← 0̄
  for i from 0 to |A| − 1 do
    Ri ← BitXor(Ci, BitXor(Ai, Bi))
    Ci+1 ← BitOr(BitAnd(Ai, Bi), BitAnd(Ci, BitXor(Ai, Bi)))
  end for
  return R
end procedure

procedure Multiply(M, F)
  (multiplication using shifting and addition)
  P ← 0̄
  for i from 0 to |M| − 1 do
    P ← ConditionalArray((M & 2^i) ≠ 0̄, P + LShift(F, i), P)
  end for
  return P
end procedure

procedure Pow(B, P)
  R ← 1̄
  for |B| times do
    R ← ConditionalArray(P > 0̄, Multiply(R, B), R)
    P ← ConditionalArray(P > 0̄, P − 1̄, P)
  end for
  return R
end procedure

procedure Extend(A)
  R ← 0̄
  for i from 0 to |A| − 1 do
    R0 ← BitOr(R0, Ai)
  end for
  for i from 1 to |A| − 1 do
    Ri ← R0
  end for
  return R
end procedure

procedure LShift(A, k)
  R ← 0̄
  for i from 0 to |A| − 1 − k do
    Rk+i ← Ai
  end for
  return R
end procedure

procedure Divide(N, D)
  (division using shifting and subtraction)
  Q ← 0̄
  for i from |N| − 1 downto 0 do
    C ← LShift(D, i)
    Q ← ConditionalArray(C ≤ N, Q | 2^i, Q)
    N ← ConditionalArray(C ≤ N, N − C, N)
  end for
  (now Q contains the quotient and N the remainder)
  return Q
end procedure

procedure Sign(A)
  R ← 0̄
  for i from 0 to |R| − 1 do
    Ri ← A|A|−1
  end for
  return R
end procedure

procedure ConditionalArray(C, T, F)
  return (Extend(C ≠ 0̄) & T) | (Extend(C = 0̄) & F)
end procedure
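Add() in Table 4 works purely with bitwise operations on bit arrays. The sketch below mirrors it in software, representing each bit array as a Python integer whose j-th bit belongs to the j-th number, so that a single &, ^ or | processes all numbers at once (the helper names are ours):

```python
# Bit-sliced ripple-carry addition in the style of Table 4's Add().
# Bit j of A[i] is bit i of the j-th number of the array A.

def add(A, B):
    width = len(A)
    R, c = [0] * width, 0                 # c is the carry bit array
    for i in range(width):
        axb = A[i] ^ B[i]
        R[i] = c ^ axb                    # Ri <- Xor(Ci, Xor(Ai, Bi))
        c = (A[i] & B[i]) | (c & axb)     # Ci+1 <- Or(And(Ai,Bi), And(Ci, Xor(Ai,Bi)))
    return R

def slices(nums, width):
    """Pack a list of numbers into `width` bit arrays (bit-slice them)."""
    return [sum((((x >> i) & 1) << j) for j, x in enumerate(nums))
            for i in range(width)]

def unslice(S, count):
    """Unpack bit arrays back into `count` numbers."""
    return [sum((((S[i] >> j) & 1) << i) for i in range(len(S)))
            for j in range(count)]
```

unslice(add(slices([1, 2, 3, 4], 4), slices([5, 6, 7, 0], 4)), 4) gives [6, 8, 10, 4]: a single pass of four bit-array steps adds all four pairs of numbers at once.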

Fig. 4. An example of the ConditionalArray(C, B, A) operation.

S^0 = (0, 0, 0, 0, 0, 0, 0, 0)
+2 → S^1 = (0, 2, 0, 2, 0, 2, 0, 2)
+3 → S^2 = (0, 2, 3, 5, 0, 2, 3, 5)
+4 → S^3 = (0, 2, 3, 5, 4, 6, 7, 9)
=7 → (S^3 = 7̄) = (0, 0, 0, 0, 0, 0, 1, 0)

Fig. 5. Solving the subset sum problem for A = {2, 3, 4} and b = 7.


4.2. Parallel primality testing

We implement the Miller–Rabin primality test [56,57] to test all numbers in a number array in parallel. The algorithm relies on Fermat's little theorem, stated as Theorem 1, and the lemma following it:

Theorem 1. For any prime p and any integer a, where 1 ≤ a < p, we have a^(p−1) ≡ 1 (mod p).

Lemma 1. If x^2 ≡ 1 (mod p) for some integer x and prime p, where x ≥ 1, we have x^2 − 1 ≡ 0 (mod p) and hence (x − 1)(x + 1) ≡ 0 (mod p), which implies x ≡ 1 (mod p) or x ≡ p − 1 (mod p).

Given an odd integer n, we can rewrite n as 2^s·d + 1, where d is also an odd integer. Then if n is prime, for any integer a where 1 ≤ a < n, we should have a^(2^s·d) ≡ 1 (mod n) by Theorem 1. Then by Lemma 1, a^(2^i·d) ≡ 1 or n − 1 (mod n) for some i in 0 ≤ i < s, and a^(2^j·d) ≡ 1 (mod n) for every j in i < j ≤ s. Procedure 3 checks the converse of this for all numbers in the number array N. A number n is proved to be composite if a^d ≢ 1 (mod n) and a^(2^i·d) ≢ n − 1 (mod n) for all 0 ≤ i < s. If N(j) is proved to be composite, the value of R(j) is changed to one.

Procedure 3.
1: procedure IsComposite(N, A)
2:   S ← FirstBit(N − 1̄)
3:   D ← Divide(N − 1̄, Pow(2̄, S))
4:   X ← ModularExp(A, D, N)
5:   R ← 0̄
6:   S ← ConditionalArray((X = 1̄) | (X = N − 1̄), −1̄, S)
7:   for |N| times do
8:     X ← ModularExp(X, 2̄, N)
9:     S ← ConditionalArray(X = N − 1̄, −1̄, S − 1̄)
10:    R ← ConditionalArray((S ≥ 0̄) & (R = 0̄) & (X = 1̄), 1̄, R)
11:  end for
12:  return R
13: end procedure

Lines 2 and 3 of Procedure 3 find the values of s and d for every number n = 2^s·d + 1 in the input array N. The implementation of FirstBit() is described in Procedure 4; it finds the position of the least significant nonzero bit of each number in the input array. In line 4 of Procedure 3, ModularExp() is called to calculate A^D (mod N) efficiently. The implementation of the ModularExp() operation is described in Procedure 5 (details are explained in [58]). Lines 6–11 of Procedure 3 contain the main logic of the algorithm: they try to identify composite numbers, for which a^d ≢ 1 (mod n) and a^(2^i·d) ≢ n − 1 (mod n) for all 0 ≤ i < s. The variable S is the counter of the loop in lines 7–11; R(j) is updated only if S(j) ≥ 0 (recall that the array R in Procedure 3 indicates whether the numbers in N are composite or not). Line 6 checks if a^d ≡ 1 or a^d ≡ n − 1 (mod n); if either of these conditions holds for N(j), although N(j) may be composite, we cannot prove it with the given value of A(j). Hence the value −1 is assigned to S(j) to prevent setting R(j) in future iterations of the loop. The loop tries different values of i for A(j)^(2^i·D(j)) (mod N(j)); if it equals n − 1, we cannot prove N(j) composite, so −1 is assigned to S(j) to skip the rest of the loop. Otherwise, if X(j) = 1 while S(j) ≥ 0, N(j) is certainly composite and R(j) is set.

Procedure 4.
1: procedure FirstBit(N)
2:   R ← 0̄
3:   for i from |N| − 1 downto 0 do
4:     R ← ConditionalArray(N & 2^i, ī, R)
5:   end for
6:   return R
7: end procedure

Procedure 5.
1: procedure ModularExp(B, P, M)
2:   C ← 0̄
3:   D ← 1̄
4:   for i from |P| − 1 downto 0 do
5:     C ← Multiply(C, 2̄)
6:     D ← Remainder(Multiply(D, D), M)
7:     C ← ConditionalArray((P & 2^i) ≠ 0̄, C + 1̄, C)
8:     D ← ConditionalArray((P & 2^i) ≠ 0̄, Remainder(Multiply(D, B), M), D)
9:   end for
10:  return D
11: end procedure

Since the Miller–Rabin test is probabilistic, IsComposite() is called multiple times for different values of A; a number is composite if it is proved to be composite at least once by IsComposite(). MillerRabin() takes the number array N of input numbers and the constant k, which specifies the number of tests to perform. rand() in line 4 is replaced with a random number in each expansion of the for loop in MillerRabin().

Procedure 6.
1: procedure MillerRabin(N, k)
2:   R ← 0̄
3:   for k times do
4:     R ← R | IsComposite(N, Remainder(rand(), N))
5:   end for
6:   return R
7: end procedure

This procedure performs O(k·|N|^3) bit array operations, where k is the parameter passed to the MillerRabin() operation.
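The branch-free, elementwise style of Procedures 3–6 can be sketched in software with plain lists standing in for number arrays; Python's built-in pow plays the role of ModularExp, every update is a ConditionalArray-like elementwise select, and odd inputs greater than 2 are assumed (helper logic and names are ours):

```python
# A sketch of one IsComposite() round (Procedure 3) over a list of odd
# numbers N and bases A; all updates are elementwise selects, mirroring
# the SIMD execution model of the processor.

def is_composite(N, A):
    S = [((m - 1) & -(m - 1)).bit_length() - 1 for m in N]  # m - 1 = 2^s * d
    D = [(m - 1) >> s for m, s in zip(N, S)]
    X = [pow(a, d, m) for a, d, m in zip(A, D, N)]          # X <- A^D mod N
    R = [0] * len(N)
    # S(j) <- -1 where a^d is 1 or n-1: compositeness cannot be proved
    S = [-1 if x in (1, m - 1) else s for x, m, s in zip(X, N, S)]
    for _ in range(max(m.bit_length() for m in N)):
        X = [pow(x, 2, m) for x, m in zip(X, N)]            # repeated squaring
        S = [-1 if x == m - 1 else s - 1 for x, m, s in zip(X, N, S)]
        # a nontrivial square root of 1 proves N(j) composite
        R = [1 if (s >= 0 and r == 0 and x == 1) else r
             for s, r, x in zip(S, R, X)]
    return R
```

is_composite([15, 13, 17], [4, 2, 3]) returns [1, 0, 0]: 15 is caught because 4^7 ≡ 4 and 4^14 ≡ 1 (mod 15), a nontrivial square root of one, while the primes 13 and 17 reach n − 1 during squaring and are never flagged.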

5. Extension: data reduction

Until now, we assumed the numbers in number arrays are independent and the corresponding numbers in all number arrays form independent branches of execution. In this model, however, it is impossible to carry out useful operations like finding the minimum, calculating the sum or the average, or, in general, merging the results of independent branches. To make these operations possible, we add a new bit array operation called BitFold() to the processor (similar to Louri's horizontal and vertical shift functions [24]). Given the bit array Bx and integer k, after the operation By ← BitFold(Bx, k), the i-th bit of By is the (2^k + i)-th bit of Bx. Based on this operation, we define the corresponding number array operation in Procedure 7.

Procedure 7.
1: procedure Fold(A, k)
2:   R ← 0̄
3:   for i from 0 to |A| − 1 do
4:     Ri ← BitFold(Ai, k)
5:   end for
6:   return R
7: end procedure
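The effect of BitFold on a flat bit array can be stated in two lines; the list-of-bits model below, with bits past the end of the array read as zero, is an assumption of ours:

```python
# BitFold(Bx, k): bit i of the result is bit (2^k + i) of Bx.
# Bits read from beyond the array are taken as zero (our assumption).

def bit_fold(b, k):
    off = 1 << k
    return [b[off + i] if off + i < len(b) else 0 for i in range(len(b))]
```

For example, bit_fold([0, 1, 0, 1, 1, 0, 1, 0], 2) is [1, 0, 1, 0, 0, 0, 0, 0]: the upper half of the array is shifted down by 2^2 = 4 positions.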


Procedures 8 and 9 demonstrate two possible uses of the Fold() operation, for calculating the minimum and the average of number arrays. Fig. 6 shows the steps of Min() for an example number array; R^i shows the value of the number array R in the i-th iteration of the loop in Procedure 8.

Procedure 8.
1: procedure Min(A)
2:   R ← A
3:   for i from log(||A||) − 1 downto 0 do
4:     N ← Fold(R, i)
5:     R ← ConditionalArray(N < R, N, R)
6:   end for
7:   return R
8: end procedure
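On a plain list whose length is a power of two, the fold-and-compare loop of Procedure 8 looks as follows; padding with zeros past the end of the folded array is harmless here because only the shrinking live prefix is ever read (names are ours):

```python
import math

# A sketch of Procedure 8: repeatedly fold the upper half of the array
# onto the lower half and keep the elementwise minimum. After log2(n)
# steps, element 0 holds the minimum of the whole array.

def fold(a, i):
    """Element j of the result is a[2^i + j] (zero past the end)."""
    off = 1 << i
    return [a[off + j] if off + j < len(a) else 0 for j in range(len(a))]

def array_min(a):
    r = list(a)
    for i in reversed(range(int(math.log2(len(a))))):
        n = fold(r, i)
        r = [min(x, y) for x, y in zip(r, n)]   # ConditionalArray(N < R, N, R)
    return r[0]
```

array_min([2, 8, 5, 3, 6, 4, 1, 7]) follows exactly the iterations of Fig. 6 and returns 1.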

Procedure 9.
1: procedure Average(A)
2:   R ← A
3:   for i from log(||A||) − 1 downto 0 do
4:     R ← Fold(R, i) + R
5:   end for
6:   return Divide(R, ||A||)
7: end procedure

In Procedure 9, R is initialized to A and the folded partial sums accumulate in R; after the loop, the first number of R holds the sum of the numbers in A, and the division yields the average.

A possible implementation of the BitFold() operation for the optical implementation presented in Section 2.2 is deflecting the light to shift the data, assuming that both dimensions of the data planes are powers of 2. Then for BitFold(B, k), where B is a 2^r × 2^c data plane, if k < c, a vertical shift of magnitude 2^k is performed, and if k ≥ c, a horizontal shift of size 2^(k−c) is performed. As an alternative implementation (to avoid deflecting the light, as it complicates the processor), spatial shifting planes can be introduced, in which each cell detects light from a neighboring cell instead of its own; this allows shifting the data vertically or horizontally, but it is less efficient than the implementation based on light deflection, as it requires more steps to implement BitFold(). Fig. 7 shows the result of applying the BitFold() operation on a 4 × 4 array.

6. Discussion and related work

One of our main goals in this paper was to design a processor powerful enough to perform various computational tasks while being easier to implement than the past parallel optical processors. We defer discussing the feasibility of implementing the presented processor until Section 6.3, after reviewing the past processors in Section 6.1 and comparing the architecture of our processor with previous processors in Section 6.2.

6.1. Related work

As mentioned in the Introduction, several techniques have been presented for parallel optical computing, like optical shadow casting, optical symbolic substitution, optical vector–matrix multiplication, and optical switching and logic gates. In optical shadow casting, the light emitted by an array of light emitting diodes passes through a few input planes and then forms an overlapped pattern on the output plane [2]. Optical shadow casting was used to implement addition [4,10], and was augmented to use the modified signed-digit number system [59] to perform addition more efficiently [7,15,16]. In optical symbolic substitution, the data is represented as an image and different substitution rules are defined for implementing different logical operations on the image [18]. Optical symbolic substitution was used to implement addition [19,20] and multiplication [21,23]. Probably the most notable work in the literature is Louri's [22,24]: he presented a parallel optical processor capable of performing various arithmetic operations. In his model, the bits of the operands are stored in separate planes and, using a technique called perfect shuffle, the corresponding bits of the operands are placed next to each other to perform logical operations using symbolic substitution. Louri's architecture supports various arithmetic operations, and in this respect it is probably the most remarkable model in the literature. Goodman [1] and Heinz et al. [25] employed the concepts of Fourier optics to implement vector–matrix multiplication. Reif and Tyagi implemented various other operations like sorting, string matching, and prefix sum based on vector–matrix multiplication [27]. More recently, Shaked et al. employed this Fourier optics-based fast optical matrix multiplication method to solve the bounded TSP [48,60]. Also, Tamir et al. presented heuristic

Fig. 6. The value of R in different iterations of Min([2, 8, 5, 3, 6, 4, 1, 7]).

Fig. 7. The fold operation for the 4 × 4 bit array P.
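The reduction shown in Fig. 6 can be modeled in software as follows; this is only a functional sketch, with Python's min() standing in for the comparison that the optical processor carries out with its arithmetic operations.

```python
def fold(values, k, op=min):
    # Fold(R, k): an array of 2**k numbers is folded in half by combining
    # element i with element i + 2**(k-1); with min as the combining
    # operation, repeated folding computes the minimum of the array.
    half = 1 << (k - 1)
    return [op(values[i], values[i + half]) for i in range(half)]

R0 = [2, 8, 5, 3, 6, 4, 1, 7]   # R0 in Fig. 6
R1 = fold(R0, 3)                # [2, 4, 1, 3]
R2 = fold(R1, 2)                # [1, 3]
R3 = fold(R2, 1)                # [1], the minimum of the original array
```

Each fold halves the array, so the minimum of 2^k numbers is found in k steps, matching the three iterations of Fig. 6.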


solutions for the TSP using optical vector–matrix multiplication [29,50]. Optical implementations of vector–matrix multiplication usually rely on different intensity levels of light to encode inputs and outputs. This makes these architectures more vulnerable to noise, due to their analog nature, and limits the magnitude of the operands, because of the limitations in producing, detecting, and storing different intensity levels of light.

Some of the past parallel optical processors use technologies like self-electro-optic effect devices (SEED) [61], electron-trapping devices [62,63], liquid crystals [36,37], and spatial light modulators [64] for optical logic implementation and optical switching and modulation. Louri et al. [64–67] and Choo et al. [68] used such devices in their optical parallel database architectures. One of the main problems of using nonlinear optical devices, like those based on SEED, for implementing optical logic gates is their high energy consumption [69]. This is probably why many of the recent optical processors prefer liquid crystals for switching instead [35–37].

6.2. Comparison

The main differences and advantages of the presented processor compared to previous processors are as follows:

Electronic logic implementation: The presented processor uses electronics for implementing the logic and optics for communication (thus addressing Tucker's first issue concerning the use of nonlinear optical devices in optical processors [69]).

Memory integration: Most of the past architectures perform the computation in a single step: they assume that the inputs are already encoded in the input planes, and they produce the output in the output planes. But for performing algorithms and multi-step computations, the architecture should explicitly specify a mechanism for storing the result of each step to be used in later steps of the computation.
Decoding the output, transferring it to the memory, loading the inputs from the memory, and encoding them in each cycle of the computation is a nontrivial task for large data planes, but it is essential for an optical processor. Some authors suggested using external optical memories [24,30,68] (whose integration with the processor may be difficult) or transferring data via electronic buses [24,29,40,41]. In this paper we integrated the memory inside the processor (addressing Tucker's fifth issue about the unavailability of a satisfactory optical memory [69]). In the proposed processor, accessing the memory is very fast and is performed in each processor cycle. Addressing the limitation on the number of planes inside a processor, for instance by cascading processors, is left as future work.

Eliminating the need for combining input planes: In the presented processor, binary operations between data planes are performed between the shadow of the source data plane and the memory of the cells in the destination plane (the Combine state). This eliminates the need for complex operations like perfect shuffle [24,70,71] to combine input planes, which is necessary for stateless planes, since in techniques like optical shadow casting, optical logic gates, or symbolic substitution, corresponding bits should be spatially adjacent.

Avoiding complex interconnects: The presented processor avoids complex interconnects and light deflection (as in [2,22,24,64–68,72]) and does not require placing planes at the focal point of lenses (as in optical vector–matrix multiplication processors) or on light beams (as in optical shadow casting). This makes the processor architecture much simpler: the identical data planes can be aligned very close to each other, resulting in a compact processor that is much easier to assemble and configure.
Diversity of supported operations: Almost all of the past optical processors implement a single arithmetic operation, like addition [14,15,23,35,36,53,63], multiplication [70], or vector–matrix multiplication [29,38,39]. In this paper we implement several different arithmetic operations, making it possible to perform various multi-step computational tasks. As pointed out in Section 6.1, Louri's architecture also implements various arithmetic operations [24]; however, it is architecturally much harder to realize, due to light deflection, complex light paths and interconnects, large size, and the use of external memory.
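The vulnerability of intensity-encoded vector–matrix multiplication to quantization and noise, noted earlier in this section, can be illustrated with a toy simulation. The 16 intensity levels and 5% detector noise below are assumed figures for illustration, not measurements of any device.

```python
import random

def optical_vmm(matrix, vec, levels=16, noise=0.05):
    # Toy model of an intensity-encoded optical vector-matrix multiplier:
    # each operand is quantized to one of `levels` intensity levels, and
    # each detected row sum carries multiplicative detector noise.
    def q(x):
        return round(x * (levels - 1)) / (levels - 1)
    out = []
    for row in matrix:
        s = sum(q(a) * q(x) for a, x in zip(row, vec))
        out.append(s * (1.0 + random.uniform(-noise, noise)))
    return out

# With noise disabled and inputs already on the intensity grid, the
# result is exact; an off-grid operand is distorted by quantization.
on_grid = optical_vmm([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], noise=0.0)
# on_grid == [1.0, 0.0]
off_grid = optical_vmm([[0.0, 1.0]], [0.0, 0.5], noise=0.0)
# off_grid[0] == 8/15, not 0.5: the operand was snapped to the grid
```

This is why such architectures limit the magnitude of the operands: larger values require more distinguishable intensity levels than the devices can produce, detect, and store reliably.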

6.3. Feasibility

We now discuss the feasibility of the presented architecture based on the technologies and devices commercially available today. As discussed in Section 2.2, each cell uses electronic devices for implementing the logic and optical devices for communication. The simple logic in each cell is shown in Fig. 8 and can be implemented using CMOS technology. The optical communication devices, i.e. the photodetector and the switching device, should be integrated with the electronic circuit of each cell. Highly efficient photodiodes, based on silicon or other materials, are available and can be easily integrated with CMOS circuits. Modulators such as liquid crystals (LC), micro-electromechanical systems (MEMS), and nonlinear optical devices can be used to implement optical switching in cells. Given the relatively high energy consumption of nonlinear optical devices like those based on SEED, they do not seem feasible for data planes, considering their large number of cells. But both LC and MEMS devices have been used commercially in microdisplays [55]. There is a strong resemblance between transmissive microdisplays and data planes: both have a large number of cells (pixels in microdisplays) that can control the transmission of light. Also, most microdisplays use matrix addressing, and each microdisplay pixel has one bit of DRAM memory, just like a data plane cell. The main difference is that in data planes the memory of each cell should be initialized from a photodetector instead of the active matrix. LC has been widely used in microdisplays [55]. If used to implement data planes, the transmittance of the LC determines the number of possible data planes in the processor. A transmittance of 85% [73] may allow more than 30 data planes in a processor, but we expect higher transmittance in future LC devices, given the current research efforts in that direction.
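The dependence of the plane count on transmittance can be made concrete: light crossing n planes is attenuated as 0.85^n, so the admissible n follows from the smallest fraction of light the detectors can still register. The 0.5% threshold below is a hypothetical figure chosen for illustration.

```python
import math

def max_planes(transmittance, threshold):
    # Largest n with transmittance**n >= threshold: light crossing n data
    # planes is attenuated geometrically, so
    # n = floor(log(threshold) / log(transmittance)).
    return math.floor(math.log(threshold) / math.log(transmittance))

remaining = 0.85 ** 30            # roughly 0.76% of the light survives 30 planes
n = max_planes(0.85, 0.005)       # 32 planes at the assumed 0.5% threshold
```

At this assumed threshold the bound comes out slightly above 30 planes, consistent with the estimate in the text; a more sensitive detector (lower threshold) or higher transmittance raises the bound.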
A photodetector should be added to each pixel of the microdisplay; it cannot be placed on the LC pixel itself, since it would block the light from reaching the photodetectors in other data planes. So a semi-transparent mirror should concentrate some of the light in each pixel on the photodetector; this would inevitably decrease the transmittance of the cells. Although the switching speed of LCs is usually low (below 1 kHz), the clock speed of the processor can be higher, since only two data planes are involved in each cycle and the processor can use different planes in consecutive cycles.

MEMS devices have also been used in microdisplays [55]. Since achieving a high aperture ratio (the ratio of the area of each pixel that passes the light) is harder in transmissive MEMS microdisplays, most of the commercially available MEMS microdisplays are reflective [74], but transmissive MEMS microdisplays are also available [75]. Note that in data planes a low aperture ratio is acceptable, and even required, to prevent diffracted light from entering neighboring cells; so producing transmissive MEMS microdisplays for implementing data planes should be much less challenging. Furthermore, transmissive MEMS microdisplays have a number of important advantages over LC microdisplays. First, since switching in MEMS pixels relies on physically blocking the light, pixels in MEMS-based microdisplays have a transmittance of almost 100% in the transparent state. This greatly increases the number of possible data planes in a processor compared to LC microdisplays. Second, by placing the photodiodes on the MEMS shutter itself, or by adjusting the shutter to reflect the light onto the photodiode when in the filter state, the photodiodes do not block the light when pixels are in the transparent state. Also, MEMS devices can achieve higher switching speeds than LCs.

Fig. 8. The schematic logic circuit in each cell.

Using the presented processor as a complementary technology for today's electronic processors involves transferring the initial inputs and the final outputs between the electronic and the optical processors and managing the control bus of the data planes to perform the required computational steps. Transferring the initial inputs to the optical processor can be done with a data plane whose cell memory bits can be modified electronically from an external bus (using matrix addressing, like a regular microdisplay); for transferring the final outputs to the electronic processor, a similar data plane can be used whose cell memories can be transferred electronically out of the processor. Although these steps may be much slower than the optical processor cycles, they should not happen very often for multi-step algorithms and computational tasks, i.e. the majority of today's computational needs that can benefit from parallelism.

7. Conclusion

In this paper we presented a parallel optical processor that implements arithmetic operations for performing general-purpose numerical computations. The main strength of the processor is its high parallelism; a 2^10 × 2^10 data plane implementation can perform one million bitwise operations in parallel. As case studies, we solved the bounded subset sum problem and performed parallel primality testing using the presented processor. We extended the processor with the reduction extension of Section 5 to perform operations among the numbers inside a number array, like finding their minimum, efficiently. We discussed the optical implementation of the presented processor in Section 2.2. Its architecture addresses many of the challenges in previous parallel optical processors: it uses simple interconnects, avoids light deflection, is very compact, and removes the need for transferring data from electronic memories in each cycle by integrating memory planes inside the processor. In addition to the architectural advantages of the presented processor over many of the past parallel optical processors, as discussed in Section 6.2, it seems easier to realize as well, considering its similarity to an existing technology, i.e. transmissive microdisplays. However, just as with optical interconnects [76,77], whose importance is now widely acknowledged, there are admittedly technical issues that need to be addressed before parallel optical processing can compete with electronic processors, such as the optical switching speed, the crosstalk between cells and data planes, signal distribution delays, the placement of the photodetectors, and the cost. The detailed study of these issues, the limitations they impose, and the required devices is left as future work.

We are also investigating ways of automating the conversion of procedural algorithms to execute on the presented model. It may also be possible to adapt some algorithms specifically for this model, to exploit the reduction extension for more efficient execution; the problems solved in the map-reduce model are especially interesting [78]. From the implementation standpoint, we are evaluating possible technologies to create a prototype of the processor.

References

[1] Goodman J. Introduction to Fourier optics. McGraw-Hill; 1968.
[2] Ichioka Y, Tanida J. Optical parallel logic gates using a shadow-casting system for optical digital computing. Proceedings of the IEEE 1984;72(7):787–801.
[3] Karim M, Awwal A, Cherri A. Polarization-encoded optical shadow-casting logic units: design. Applied Optics 1987;26(14):2720–5.
[4] Awwal A, Karim M, Cherri A. Polarization-encoded optical shadow-casting scheme: design of multioutput trinary combinational logic units. Applied Optics 1987;26(22):4814–8.
[5] Awwal A, Karim M. Polarization-encoded optical shadow-casting programmable logic array: simultaneous generation of multiple outputs. Applied Optics 1988;27(5):932–6.
[6] Awwal A, Cherri A, Karim M. Logical equivalence of optical symbolic substitution and shadow-casting schemes. Optics Communications 1992;91(1–2):18–22.
[7] Cherri A, Karim M. Symbolic substitution based flagged arithmetic unit design using polarization-encoded optical shadow-casting system. Optics Communications 1989;70(6):455–61.
[8] Cherri A, Awwal A, Karim M. Morphological transformations based on optical symbolic substitution and polarization-encoded optical shadow-casting systems. Optics Communications 1991;82(5):441–5.
[9] Alam M, Karim M. Efficient combinational logic circuit design using polarization-encoded optical shadow-casting. Optics Communications 1992;93(3–4):245–52.
[10] Alam M, Karim M. Multiple-valued logic unit design using polarization-encoded optical shadow-casting. Optics & Laser Technology 1993;25(1):17–23.
[11] Ahmed J, Awwal A. General purpose computing using polarization-encoded optical shadow casting. In: National aerospace and electronics conference. IEEE; 1992. p. 1146–51.
[12] Ahmed J, Awwal A. Polarization-encoded optical shadow casting: an efficient multiprocessor design using truth-table partitioning. Optics & Laser Technology 1991;23(6):345–8.
[13] Ahmed J, Awwal A, Haque M. Polarization-encoded optical shadow casting: arithmetic logic unit (ALU) design using truth-table partitioning. Optics Communications 1992;90(1–3):156–64.
[14] Awwal A, Karim M. Polarization-encoded optical shadow-casting: direct implementation of a carry-free adder. Applied Optics 1989;28(4):785–90.
[15] Li G, Liu L, Hua J, Shao L. Unified negabinary symbolic arithmetic for addition and subtraction with polarization-encoded optical shadow casting. Optics & Laser Technology 1997;29(4):221–7.
[16] Fyath R, Alsaffar A, Alam M. Optical binary logic gate-based modified signed-digit arithmetic. Optics & Laser Technology 2002;34(7):501–8.
[17] Fyath R, Ali S, Alam M. Four-operand parallel optical computing using shadow-casting technique. Optics & Laser Technology 2005;37(3):251–7.
[18] Brenner K. New implementation of symbolic substitution logic. Applied Optics 1986;25(18):3061–4.
[19] Jeon H, Abushagur M. Digital optical arithmetic processor based on symbolic substitution. In: Southeastern symposium on system theory. IEEE; 1988. p. 221–3.
[20] Jeon H, Abushagur M, Sawchuk A, Jenkins B. Digital optical processor based on symbolic substitution using holographic matched filtering. Applied Optics 1990;29(14):2113–25.
[21] Brenner K, Kufner M, Kufner S. Highly parallel arithmetic algorithms for a digital optical processor using symbolic substitution logic. Applied Optics 1990;29(11):1610–8.
[22] Louri A, Hwang K. A bit-plane architecture for optical computing with two-dimensional symbolic substitution. In: International symposium on computer architecture, vol. 16; 1988. p. 18–27.
[23] Casasent D, Woodford P. Symbolic substitution modified signed-digit optical adder. Applied Optics 1994;33(8):1498–506.
[24] Louri A. Three-dimensional optical architecture and data-parallel algorithms for massively parallel computing. IEEE Micro 1991;11(2):24–7.
[25] Heinz R, Artman J, Lee S. Matrix multiplication by optical methods. Applied Optics 1970;9(9):2161–8.
[26] Ando S, Sekine S, Mita M, Katsuo S. Optical computing using optical flip-flops in Fourier processors: use in matrix multiplication and discrete linear transforms. Applied Optics 1989;28(24):5363–73.
[27] Reif J, Tyagi A. Efficient parallel algorithms for optical computing with the discrete Fourier transform (DFT) primitive. Applied Optics 1997;36(29):7327–40.
[28] Haist T, Osten W. An optical solution for the traveling salesman problem. Optics Express 2007;15:10473–82.
[29] Tamir D, Shaked N, Geerts W, Dolev S. Parallel decomposition of combinatorial optimization problems using electro-optical vector by matrix multiplication architecture. Journal of Supercomputing 2010;62(2):633–55.
[30] Ghoniemy S, Karam O. An optoelectronic expert system for knowledge base systems. In: The 10th international conference on intelligent systems design and applications. IEEE; 2010. p. 689–95.
[31] Ghoniemy S, Karam O. Performance of an optoelectronic expert system for massively parallel knowledge base applications. In: International symposium on signal processing and information technology. IEEE; 2010. p. 108–13.
[32] Ambs P. Optical computing: a 60-year adventure. Advances in Optical Technologies; 2010.
[33] Kress B, Meyrueis P. Applied digital optics: from micro-optics to nanophotonics. Wiley; 2009.
[34] Caulfield H, Dolev S. Why future supercomputing requires optics. Nature Photonics 2010;4(5):261–3.
[35] Liu Y, Peng J, Chen Y, He H, Su H. A new carry-free adder model for ternary optical computer. In: International symposium on distributed computing and applications to business, engineering and science. IEEE; 2011. p. 64–8.
[36] Song K, Yan L. Design and implementation of the one-step MSD adder of optical computer. Applied Optics 2012;51(7):917–26.
[37] Wang H, Song K. Simulative method for the optical processor reconfiguration on a dynamically reconfigurable optical platform. Applied Optics 2012;51(2):167–75.
[38] Mei L, Hua-can H, Yi J. A new method for optical vector–matrix multiplier. In: 2009 International conference on electronic computer technology. IEEE; 2009. p. 191–4.
[39] Wang X, Peng J, Li M, Shen Z, Shan O. Carry-free vector–matrix multiplication on a dynamically reconfigurable optical platform. Applied Optics 2010;49(12):2352–62.
[40] Tamir D, Shaked N, Wilson P, Dolev S. Electro-optical DSP of tera operations per second and beyond. In: Optical SuperComputing. Springer; 2008. p. 56–69.
[41] Tamir D, Shaked N, Wilson P, Dolev S. High-speed and low-power electro-optical DSP coprocessor. Journal of the Optical Society of America A 2009;26(8):A11–20.
[42] Oltean M. Solving the Hamiltonian path problem with a light-based computer. Natural Computing 2008;7(1):57–70.
[43] Oltean M, Muntean O. Exact cover with light. New Generation Computing 2008;26(4):329–46.
[44] Oltean M, Muntean O. Solving the subset-sum problem with a light-based device. Natural Computing 2009;8(2):321–31.
[45] Oltean M, Muntean O. Solving NP-complete problems with delayed signals: an overview of current research directions. In: Optical SuperComputing. Springer; 2008. p. 115–27.
[46] Dolev S, Fitoussi H. Masking traveling beams: optical solutions for NP-complete problems, trading space for time. Theoretical Computer Science 2010;411(6):837–53.
[47] Goliaei S, Jalili S. An optical wavelength-based solution to the 3-SAT problem. In: Optical SuperComputing. Springer; 2009. p. 77–85.
[48] Shaked N, Messika S, Dolev S, Rosen J. Optical solution for bounded NP-complete problems. Applied Optics 2007;46(5):711–24.
[49] Dolev S, Fitoussi H. The traveling beams optical solutions for bounded NP-complete problems. In: Fun with Algorithms. Springer; 2007. p. 120–34.
[50] Tamir D, Shaked N, Geerts W, Dolev S. Combinatorial optimization using electro-optical vector by matrix multiplication architecture. In: Optical SuperComputing. Springer; 2009. p. 130–43.
[51] Sartakhti JS, Jalili S, Rudi AG. A new light-based solution to the Hamiltonian path problem. Future Generation Computer Systems 2013;29(2):520–7.
[52] Goliaei S, Jalili S, Salimi J. Light-based solution for the dominating set problem. Applied Optics 2012;51(29):6979–83.
[53] Bocker R, Drake B, Lasher M, Henderson T. Modified signed-digit addition and subtraction using optical symbolic substitution. Applied Optics 1986;25(15):2156–465.
[54] Flynn M. Some computer organizations and their effectiveness. IEEE Transactions on Computers 1972;100(9):948–60.
[55] Armitage D, Underwood I, Wu S. Introduction to microdisplays. Wiley; 2006.
[56] Miller G. Riemann's hypothesis and tests for primality. Journal of Computer and System Sciences 1976;13(3):300–17.
[57] Rabin M. Probabilistic algorithm for testing primality. Journal of Number Theory 1980;12(1):128–38.
[58] Cormen T, Leiserson C, Rivest R, Stein C. Introduction to algorithms. 3rd ed. MIT Press; 2009.
[59] Avizienis A. Signed-digit number representations for fast parallel arithmetic. IRE Transactions on Electronic Computers 1961;EC-10:389–400.
[60] Shaked N, Tabib T, Simon G, Messika S, Dolev S, Rosen J. Optical binary-matrix synthesis for solving bounded NP-complete combinatorial problems. Optical Engineering 2007;46(10):108201–11.
[61] Miller D. Quantum-well self-electro-optic effect devices. Optical and Quantum Electronics 1990;22:61–98.
[62] Li G, Qian F, Ruan H, Liu L. Compact parallel optical modified-signed-digit arithmetic-logic array processor with electron-trapping device. Applied Optics 1999;38(23):5039–45.
[63] Qian F, Li G, Ruan H, Jing H, Liu L. Two-step digit-set-restricted modified signed-digit addition–subtraction algorithm and its optoelectronic implementation. Applied Optics 1999;38(26):5621–30.
[64] Louri A. Optical content-addressable parallel processor: architecture, algorithms, and design concepts. Applied Optics 1992;31(17):3241–58.
[65] Louri A, Hatch J. Optical content-addressable parallel processor for high-speed database processing. Applied Optics 1994;33(35):8153–63.
[66] Louri A, Hatch J. Optical implementation of a single-iteration thresholding algorithm with applications to parallel data-base/knowledge-base processing. Optics Letters 1993;18(12):992–4.
[67] Louri A, Hatch J. Optical content-addressable parallel processor for high-speed database processing. Applied Optics 1994;33(35):8153–63.
[68] Choo P, Detofsky A, Louri A. Multiwavelength optical content-addressable parallel processor for high-speed parallel relational database processing. Applied Optics 1999;38(26):5594–604.
[69] Tucker R. The role of optics in computing. Nature Photonics 2010;4(7):405.
[70] Al-Ghoneim K, Casasent D. High-accuracy pipelined iterative-tree optical multiplication. Applied Optics 1994;33(8):1517–27.
[71] Walker A, Yang T, Gourlay J, Dines J, Forbes M, Prince S, et al. Optoelectronic systems based on InGaAs-complementary-metal-oxide-semiconductor smart-pixel arrays and free-space optical interconnects. Applied Optics 1998;37(14):2822–30.
[72] Alam M, Karim M, Awwal A. Two-dimensional register design using polarization-encoded optical shadow-casting. Optics Communications 1994;106(1–3):11–8.
[73] Jiao M, Li Y, Wu S. Low voltage and high transmittance blue-phase liquid crystal displays with corrugated electrodes. Applied Physics Letters 2010;96(1):011102.
[74] Liao C, Tsai J. The evolution of MEMS displays. IEEE Transactions on Industrial Electronics 2009;56(4):1057–65.
[75] Brosnihan T, Payne R, Gandhi J, Lewis S, Steyn L, Halfman M, et al. Pixtronix digital micro-shutter display technology: a MEMS display for low power mobile multimedia displays. In: International society for optics and photonics, vol. 7594. SPIE; 2010.
[76] Miller D. Device requirements for optical interconnects to silicon chips. Proceedings of the IEEE 2009;97(7):1166–85.
[77] Miller D. Optical interconnects to electronic chips. Applied Optics 2010;49(25):F59–70.
[78] Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Communications of the ACM 2008;51(1):107–13.