Applied Mathematics and Computation 167 (2005) 16–27 www.elsevier.com/locate/amc
An efficient binary sequence generator with cryptographic applications ´ lvarez, Joan-Josep Climent *, Leandro Tortosa, Rafael A Antonio Zamora Departament de Cie`ncia de la Computacio´ i Intel Æ lige`ncia Artificial, Universitat d’Alacant, Campus de Sant Vicent del Raspeig, Ap. Correus 99, E-03080, Alacant, Spain
Abstract As the Internet has become more and more important, the need for secure systems has increased accordingly. Our proposal is an efficient sequence generator that can be used in these secure systems. It is based on the powers of a block upper triangular matrix and, besides achieving great statistical results and efficiency, it is very flexible and can be easily adapted to different applications. The matricial techniques employed can also be used in other kinds of cryptographic primitives. 2004 Elsevier Inc. All rights reserved. Keywords: Pseudorandom generator; Blum–Blum–Shub generator; Binary sequence; Cipher; Efficient; Cryptography; Security; Monobit test; Serial test; Poker test; Autocorrelation; Linear complexity; Primitive polynomial; Companion matrix; Block upper triangular matrix; Order
*
Corresponding author. ´ lvarez),
[email protected] (J.-J. Climent), E-mail addresses:
[email protected] (R. A
[email protected] (L. Tortosa),
[email protected] (A. Zamora). 0096-3003/$ - see front matter 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.amc.2004.06.065
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
17
1. Introduction During the latest years, the Internet has become extremely popular. Everybody seems to be present in one way or another: businesses setup facilities for electronic commerce, governments provide services, and even individuals create sites of the most diverse subjects. Despite the immense possibilities, the Internet and the Web are vulnerable to malicious attacks. The harmful results of these attacks are multiplied by the fact that an attack not only implies a loss of service and money, or an inconvenience to the end user, but it also implies a damage to the business reputation that not many can afford. Businesses have taken this into account, increasing the demand of more secure services. Various approaches to the Internet security have been considered, differing in the scope of applicability and the location within the TCP/IP stack. The IPSec [6] system is a general purpose solution placed below the TCP protocol so it is transparent to end users and applications; it includes a filtering system allowing that only the required traffic is processed by IPSec. The SSL (Secure Socket Layer) [7] and TLS (Transport Layer Security) [5] systems are also relatively general purpose, but are placed right over the TCP protocol, providing the choice of being implemented either within the protocol suite (and accessible to all applications) or by specific packages such as browsers and http servers. Another approach is placing the security system within the application, tailoring it to the specific needs and peculiarities of that given application; being SET (Secure Electronic Transaction) [20] the most popular example of this approach. All of these security systems use cryptography; generally using public key cryptosystems to establish a session key and private key (symmetric) cryptosystems to transfer data securely between both parties using that session key. For example, SSL [7] uses RSA, Diffie-Hellman and Fortezza to exchange keys (establish session keys); and RC4, RC2, DES, 3DES, DES40, IDEA and Fortezza to transfer the data between client and server. Most cryptographic systems are based on unpredictable quantities (see [16,19,21]). The keys, prime numbers or challenge values of many cryptosystems need to be unpredictable enough so that the probabilities of all different values are more or less the same, making impossible any search optimization based on the reduction of the key space to the most probable values. These are obtained from random sequences that can be either truly random or pseudorandom, meaning that they are generated deterministically but appear to be random enough for practical use. A truly random generator is based on a natural source of randomness. This source is sampled and then postprocessed to make it free of biasing and skewing. The system must be designed to prevent any observation or manipulation by the opponent and also any non random external interference (such as
18
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
electromagnetic radiation from periodic sources, etc.). Proper function should be tested periodically ensuring that the previous conditions hold true. Natural sources of randomness include the thermal noise of a resistor, noise input from a microphone or a camera, particle emission time during radioactive decay, etc; software based events can also be useful, such as the system clock, the elapsed time between keystrokes, mouse moves or network packets, operating system statistics, etc. [16,19,21]. A pseudorandom generator [9,11–13,16,19,21] is a completely deterministic algorithm, in the sense that the sequence it generates is a function of its inputs and, unlike a truly random generator, its output can be reproduced. This means that we only need the seed (the input to the pseudorandom generator) in order to regenerate the complete output sequence. The output sequence is much longer than the seed and it is not really random, it is just undistinguishable from a real random sequence. For security applications we need to produce sequences with large periods, high linear complexities and good statistical properties. Several statistical tests are applied; these tests include checking the frequency of single bits, pairs of bits and of other bit patterns. Also, the autocorrelation and the linear complexity of the sequence are used. [4,16,19,21]. Most available cryptographic generators are based on linear feedback shift registers (LFSRs) (see [8,10,18]). The LFSRs are so popular because they can be easily implemented in hardware, they produce sequences of large periods and with good statistical properties, and have a simple structure that can be analyzed easily. LFSRs by themselves are not very secure, but because they are so efficient they are commonly enhanced with other techniques to improve their cryptographic properties. Another popular generator is the Blum–Blum–Shub (BBS) [3] generator. It is based on two large prime numbers p and q that satisfy p 3(mod 4) and q 3(mod 4), and a randomly chosen number s that is relatively prime to n = pq. Let X0 = s2 mod n and Xi = (Xi 1)2 mod n. The output sequence is obtained by taking the least significant bit of each Xi. The BBS generator is secure due to the intractableness of factoring n. Other generators combine a block cipher in different ways to obtain a cryptographically secure random bit sequence, such as the pseudorandom generator specified in ANSI X9.17 which performs three triple DES encryptions on each iteration [22]. Our proposal is a new kind of generator which uses matricial techniques. It can be used to generate session keys, random values in challenge/response systems, etc. It is a very flexible algorithm, which can be adjusted to satisfy memory and speed requirements and can also be modified to create other kinds of cryptographic primitives. The rest of the paper is organized as follows: the required linear algebra basics are given in Section 2; Section 3 describes the generator in detail; and Sec-
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
19
tion 4 includes the results obtained by our proposal in some standard tests and a comparison with BBS; finally we give some conclusions in Section 5. 2. Preliminaries In this section, we present some basic linear algebra notions and properties that are needed for our proposals (see [14] for a more detailed description). Besides, we introduce the mathematical notation we are going to use in the following. Let f ðxÞ ¼ a0 þ a1 x þ þ an1 xn1 þ xn be a monic polynomial in Zp ½x . The companion matrix of f is the n · n matrix 3 2 0 1 0 0 0 6 0 0 1 0 0 7 7 6 6 . .. 7 . . . 6 . . . . A¼6 . . 7 . . . 7: 7 6 4 0 0 0 0 1 5 a0 a1 a2 an2 an1 If f is irreducible over Zp ½x , then the order of A is equal to the order of any root of f in Fpn and the order of A divides pn 1. Furthermore, if we assume that f is a primitive polynomial over Zp ½x , then the order of A is exactly pn 1. Consequently, if we work with primitive polynomials over Zp ½x , we can easily build matrices whose orders are maximal. Odoni et al. [17], propose an extended scheme which is based on the construction of the block matrix 2 3 A1 0 0 6 0 A 0 7 6 7 2 7 A¼6 .. 7; .. 6 .. 4 . . 5 . 0 0 Ak where Ai is the companion matrix of fi, being fi, for i = 1, 2, . . ., k, distinct primitive polynomials over Zp ½x of degrees ni, for i = 1, 2, . . ., k, respectively. The order of Ai is pni 1, for i = 1, 2, . . ., k, therefore, the order of A is given by lcmð pn1 1; pn2 1; . . . ; pnk 1Þ: With the aim of using a matrix of this kind in a public key distribution system, they conjugate this matrix with an invertible matrix P, with n = n1 + n2 + + nk, obtaining a new matrix A ¼ P AP 1 which has the same order as that of A.
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
20
3. Description of the generator Our generator is based on the powers of block upper triangular matrices defined over Zp , with p prime. As we take the different powers of these matrices we have, as a result, a sequence of matrices of very long period that has great properties in terms of randomness. Each element of the sequence (each block upper triangular matrix) can be processed to obtain a series of values that conform an output sequence with great statistical values. This scheme is simple enough to be really fast but incorporates enough complexity to present great cryptographic properties; it is also greatly adaptable for other cryptographic primitives such as block ciphers, hash functions and public key cryptosystems. Consider the block upper triangular matrix M defined as A X ; ð1Þ M¼ O B whose entries lie in Zp , where A is an r · r matrix, B is an s · s matrix and X is an r · s matrix, and O denotes the s · r zero matrix. The following result, which is the base of our generator, establishes the expression of the different powers of matrix M. It also defines matrix X (h) in terms of A, B and X. Theorem 1. Let M be the block upper triangular matrix given by (1). Taking h as a non negative integer then " # h ðhÞ A X Mh ¼ ; ð2Þ O Bh where X
ðhÞ
8
if h ¼ 0; ð3Þ
if h P 1:
i¼1
Also, if 0 6 t 6 h then X ðhÞ ¼ At X ðhtÞ þ X ðtÞ Bht :
ð4Þ
Proof. We prove Eq. (2) first, using induction over h. For h = 0 and h = 1 the result is obvious. We suppose that it is true for h 1 and check that it holds true for h. Clearly, h
M ¼ MM
h1
¼
A
X
O
B
"
Ah1 O
X ðh1Þ Bh1
#
" ¼
Ah
AX ðh1Þ þ XBh1
O
Bh
# :
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
21
From the induction hypothesis and from (3) we have that AX ðh1Þ þ XBh1 ¼ A
h1 X
Ah1i XBi1 þ XBh1 ¼
i¼1
¼
h X
h1 X
Ahi XBi1 þ XBh1
i¼1
Ahi XBi1 ¼ X ðhÞ ;
i¼1
obtaining the same expression that in (2). Also, if 0 6 t 6 h we have " #" # " At X ðtÞ Aht X ðhtÞ Ah h t ht M ¼MM ¼ ¼ O Bt O Bht O
At X ðhtÞ þ X ðtÞ Bht Bh
# :
Comparing this result with (2) we obtain (4). h In order to generate the pseudorandom bit sequence, we fix matrices A and B and choose randomly matrix X, which constitutes the seed of the sequence. Next we apply expression (4) to obtain the following sequence of matrices: X ð2Þ ; X ð3Þ ; X ð4Þ ; . . .
ð5Þ
(h)
For each matrix X , for h = 2, 3, . . ., we establish a bit extraction operation which can be as simple as adding all the elements of X(h) obtaining a new element x(h), for h = 2, 3, . . ., in Zp from which we take the least significant bit, b(h), of its binary expression; or as complex as required and taking as many bits per iteration as needed. In this way, we have the sequence of bits bð2Þ ; bð3Þ ; bð4Þ ; . . . This sequence is then filtered by the following process, improving security and removing any bias from the sequence: cðiÞ ¼ bðiÞ XOR cði1Þ for i ¼ 2; 3; . . . and cð1Þ ¼ 0:
ð6Þ
One of the basic properties that every pseudorandom sequence should hold is that its period must be very long (at least a period of 2128). Now, we analyze the way we can get long periods for the sequence given by (6). The key for the solution of this problem is the construct appearing in Section 2. Let f(x) = a0 + a1x + + ar1xr1 + xr and g(x) = b0 + b1x + + s1 bs1x + xs be two primitive polynomial in Zp ½x and let A and B be the corresponding companion matrices. Let P and Q be two invertible matrices such that A ¼ P AP 1 and B ¼ QBQ1 . With this construction, matrix M of (1), and therefore the period of the sequence (5) is lcmðpr 1; ps 1Þ:
22
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
Table 1 Order of M for different values of p, r and s p
r
s
Digits
3
32 48 64 32 30 64 24 32 64 22 32 64 16 32 64 16 32 64 12 32 64 9 32 64
31 47 63 31 33 63 27 31 63 21 31 63 19 31 63 15 31 63 13 31 63 10 31 63
30 39 47 38 39 61 39 43 70 39 50 67 39 57 98 40 64 111 46 76 168 40 93 169
5
7
11
19
31
251
257
The value of p or the sizes of A and B need not be very large in order to achieve long periods, as seen in Table 1, which shows some results for the period in terms of the parameters p, r and s. The value appearing in the column digits represents the number of decimal digits of the period of the binary sequence (the integer 2128 has 39 decimal digits). Observe that the values taken for r and s are relatively prime, in order to optimize the period. Finally, it should be remarked that the generation of the sequence (6) is very efficient if we consider t = 1 in expression (4); and that every output bit depends on the seed X.
4. Results 4.1. Test parameters The generator has been checked with five different statistics [2,16] (frequency tests plus autocorrelation tests) and with the linear complexity of the sequence [1,15]. In these tests, n is the length of the sequence.
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
23
The monobit test statistic is a single bit frequency test designed to check that the number of zeros and ones of the sequence is more or less the same. The test is expressed by the following equation, with n0 and n1 being the number of zero bits and one bits respectively 2
X1 ¼
ðn0 n1 Þ : n
This test follows a v2 distribution with 1 degree of freedom. The serial test checks that the frequency of pairs of bits (00, 01, 10 and 11) is also about the same. The parameters n00, n01, n10 and n11 are the number of occurrences of such pairs allowing them to overlap. X2 ¼
4 2 ðn2 þ n201 þ n210 þ n211 Þ ðn20 þ n21 Þ þ 1: n 1 00 n
This follows a v2 distribution with 2 degrees of freedom. The poker test checks that patterns of length m appear the same number of times sequence without overlapping. The value m is obtained with n in the m n ¼ 5 2 and n , k is taken as k ¼ is the number of occurrences of the i m m pattern i. ! 2m 2m X 2 X3 ¼ n k: 3 i¼1 i X3 follows a v2 distribution with 2m 1 degrees of freedom. A run is a pattern of all zeros or all ones; a block is a run of ones and a gap is a run of zeros. The expected number of runs of length i is ei = (n i + 3)/2i+2. The value of k is taken as the largest integer i for which ei P 5 and is the length limit for which runs are accounted; finally, Bi and Gi are the number of blocks and gaps of length i (up to length k) of the sequence. The runs test is expressed by X4 ¼
2 2 k k X X ðBi ei Þ ðGi ei Þ þ ; ei ei i¼1 i¼1
following a v2 distribution with 2k 2 degrees of freedom. The autocorrelation checks for coincidences between the sequence and itself displaced d bits. Taking A(d) as the amount of bits not equal between the sequence and itself displaced by d bits, we have 2 AðdÞ nd pffiffiffiffiffiffiffiffiffiffiffi 2 ; X5 ¼ nd which follows a N(0,1) distribution. The linear complexity of a sequence of bits is the length of the shortest LFSR that can generate that sequence. The expected linear complexity for a
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
24
random sequence is n/2. If we plot the linear complexity of a sequence against its length (for n = 1,2,3. . .) we have the linear complexity profile which should follow closely the line n/2 if the sequence is random. The linear complexity of a sequence can be computed using the Berlekamp–Massey algorithm [1,15]. 4.2. Test results The expected value for the linear complexity of a sequence is LC = n/2, being n the length of the sequence. For the rest of the tests, the results are compared with the threshold values given in Table 2, passing the test those below the threshold; a is the significance level or, more exactly, the probability that a sequence fails the test even if it should have passed that test. The higher the significance level, the more restrictive we are with the results. In our experimentation we have considered different values for p, r and s, obtaining in all cases excellent results for the six tests described in this section. As an example, in Table 3 we show the results for p = 257, r = 8 and s = 9. The primitive polynomials used to obtain matrices A and B, as described in Section 3, are f(x) = x8 + x + 19 and g(x) = x9 + x + 6 respectively. The seed X is an 8 · 9 matrix obtained using a white noise based random generator. We have performed each test with different sequences of lengths 2 · 103, 2 · 104, 2 · 105 and 2 · 106 bits; and different values of X. We can see that the results are particularly good, much better than the threshold values for a = 0.100 in Table 2; they are also highly regular, showing
Table 2 Statistics threshold values Monobit
a
Serial
Poker
Runs
Autocorrelation 3
0.001 0.005 0.010 0.025 0.050 0.100
10.830 7.870 6.635 5.024 3.842 2.706
13.820 10.590 9.210 7.378 5.992 4.605
330.5 316.9 310.5 301.1 293.2 284.3
2 · 10
2 · 10
29.59 25.19 23.21 20.48 18.31 15.99
39.25 34.27 32.00 28.85 26.30 23.54
4
5
2 · 10
2 · 10
51.18 45.56 42.98 39.36 36.42 33.20
59.70 53.67 50.89 46.98 43.77 40.26
6
3.090 2.576 2.326 1.960 1.645 1.282
Table 3 Statistical test results Length 3
2 · 10 2 · 104 2 · 105 2 · 106
Monobit
Serial
Poker
Runs
Autocorrelation
LC
1.352 0.012 1.095 0.814
1.514 1.579 1.451 3.313
251.7 243.1 239.1 207.2
11.260 8.441 26.550 37.590
0.800 0.800 0.789 0.807
1001 10001 100000 1000000
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
25
Fig. 1. Linear complexity profile of the sequence of length 2000.
no strange values and with all values passing the tests equally well. As we increase the number of bits produced we see that there is no direct relation between the length of the sequence and the values for the tests, there being differences mostly due to the fact that distinct X matrices were used for each sequence. The linear complexity test also yields excellent results, reaching in all three tests the n/2 mark. This means that the sequence presents about the same nonlinearity as that of a truly random sequence, avoiding predictability due to high linearity. The linear complexity profile is plotted in Fig. 1, showing how good the linear complexity follows n/2. 4.3. Comparison with the BBS generator We show in Table 4 a comparison between our algorithm and the well known BBS generator. The time shown is the time in seconds consumed by each algorithm to generate a 2 · 105 bits long sequence; not counting any precomputation or key setup, only the actual bit generation. The test has been conducted using the same computer and compiler with the same optimization settings for both algorithms, in order to make the test as fair as possible. Similar results have been achieved for different sequence lengths. The results confirm that the proposed generator is a lot faster than BBS, while the statistical results are better for most of the parameters. The main Table 4 Comparison with BBS for a 2 · 105 bits sequence Algorithm
Monobit
Serial
Poker
Runs
Autocorrelation
LC
Time
Matricial BBS
0.041 0.776
0.657 1.188
237.5 272.9
20.798 10.342
0.784 0.792
100000 100000
0.013 s 1.020 s
26
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
advantage of our algorithm lies in its flexibility, allowing to choose matrix sizes and bit extraction schemes for a given application, therefore optimizing for higher speeds, better security or low memory requirements.
5. Conclusions With the aim of contributing tools useful in secure systems, we have presented a new cryptographic pseudorandom generator based on the powers of a block upper triangular matrix. It is remarkable that, using primitive polynomials to generate the diagonal blocks and taking very small primes and matrix sizes, we can produce bit sequences of very high periods, great statistical properties and high linear complexities; in this way, the sequence can be generated very efficiently since no time consuming computations are required as it would be necessary for large primes or huge matrices. Even with such small sizes, the algorithm assures a high level of security due to the fact that the key space is very large and that the seed participates in the generation of each bit. The algorithm appears even more useful if we take into account that the size of the matrices and the number of bits taken per iteration can be chosen precisely for a given application, satisfying low memory requirements or obtaining very high speeds.
References [1] E.R. Berlekamp, Algebraic Coding Theory, McGraw Hill, New York, 1968. [2] H. Beker, F. Piper, Cipher Systems: The Protection of Communications, John Wiley and Sons, New York, 1982. [3] L. Blum, M. Blum, M. Shub, A simple unpredictable pseudorandom number generator, SIAM J. Comput. 15 (1986) 364–383. [4] M. Blaze, W. Diffie, R. Rivest, B. Schneier, T. Shimomura, E. Thompson, M. Weiner, Minimal Key Lengths for Symmetric Ciphers to Provide Adequate Commercial Security, Counterpane Labs Publications, 1996, Available on http://www.counterpane.com/ keylength.html. [5] T. Dierks, C. Allen, The TLS Protocol. Internet Draft, www.ietf.org, RFC2246 (1999). [6] N. Ferguson, B. Schneier, A Cryptographic Evaluation of IPSec, Counterpane Labs Publications, 1999, Available on http://www.counterpane.com/ipsec.html. [7] A.O. Freier, P. Karlton, P.C. Kocher, The SSL Protocol, Version 3.0. Internet Draft, Netscape, March 1996. [8] A. Fuster, J. Garcı´a, An efficient algorithm to generate binary sequences for cryptographic purposes, Theor. Comput. Sci. 259 (2001) 679–688. [9] J. Garcı´a, M. Rodrı´guez, A family of keystream generators with large linear complexity, Appl. Math. Lett. 14 (2001) 545–547. [10] S.W. Golomb, Shift Register Sequences, Aegean Park Press, California, 1982. [11] J. Kelsey, B. Schneier, N. Ferguson, Yarrow-160: notes on the design and analysis of the Yarrow cryptographic pseudorandom number generator, Sixth Annual Workshop on Selected
R. A´lvarez et al. / Appl. Math. Comput. 167 (2005) 16–27
[12] [13]
[14] [15] [16] [17] [18] [19] [20] [21] [22]
27
Areas in Cryptography, Lecture Notes in Computer Science, Springer-Verlag, Berlin, Vol. 1758 (1999) 13–33. J. Kelsey, B. Schneier, D. Wagner, C. Hall, Cryptanalitic attacks on pseudorandom number generators, Fast Software Encryption, Fifth International Workshop, Springer-Verlag, 1998. J.C. Lagarias, Pseudorandom number generators in cryptography and number theory. in: C. Pomerance (Ed.), Proc. Symposia in Applied Mathematics, Cryptography and Computational Number Theory. 42 (1990) 115–143. R. Lidl, H. Niederreiter, Introduction to Finite Fields and their Applications, Cambridge University Press, Cambridge, 1994. J.L. Massey, Shift-register synthesis and BCH decoding, IEEE Trans. Inf. Theory 15 (1969) 122–127. A. Menezes, P. van Oorschot, S. Vanstone, Handbook of Applied Cryptography, CRC Press, Florida, 2001. R.W.K. Odoni, V. Varadharajan, P.W. Sanders, Public key distribution in matrix rsings, Electron. Lett. 20 (1984) 386–387. R.A. Rueppel, Analysis and Design of Stream Ciphers, Springer-Verlag, Berlin, 1986. B. Schneier, Applied Cryptography: Protocols, Algorithms and Source Code in C, second ed., John Wiley and Sons, New York, 1996. SET Book 3: Formal Protocol Definition. SET Secure Electronic Transaction LLC (SETCo), 1999. W. Stallings, Cryptography and Network Security: Principles and Practice, Third ed., Prentice Hall, New Jersey, 2003. Accredited Standards Committee, X9 – Financial Services: American National Standard, Financial Institution Key Management. X9––Secretariat, American Bankers Association, ANSI X9.17. 1995.