Information Processing Letters 70 (1999) 89–93
A fast correlation attack on multiplexer generators
L. Simpson a,1, J. Golić b,2, M. Salmasizadeh c,3, E. Dawson a,∗
a Information Security Research Centre, Queensland University of Technology, GPO Box 2434, Brisbane Q 4001, Australia
b Faculty of Electrical Engineering, University of Belgrade, Bulevar Revolucije 73, 11001 Belgrade, Yugoslavia
c Electronic Research Centre, Sharif University of Technology, P.O. Box 11365-8639, Tehran, Iran
Received 1 August 1998 Communicated by S.G. Akl
Abstract
The security of multiplexer generators is investigated with respect to a fast correlation attack based on iterative probabilistic decoding. The number of inputs to the multiplexer determines the coefficient of correlation between the keystream sequence and phase-shifts of one of the underlying shift register sequences. This known plaintext attack is successful if the number of low-weight parity-checks used is sufficiently large given the number of inputs, $2^k$, to the multiplexer. Successful experimental attacks are conducted for k = 2, 3 and 4. For multiplexer generators with large component shift registers, this attack is an improvement on previously described attacks based on similar assumptions. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Cryptography; Multiplexer generator; Fast correlation attacks; Correlation coefficients
Author notes: E. Dawson is the corresponding author. Research by M. Salmasizadeh was conducted while the author was at QUT.

1. Introduction

This paper investigates the security of the multiplexer generator (MG), a type of keystream generator for stream ciphers, with respect to the fast correlation attack based on iterative error-correction. This known plaintext attack is conducted under the assumption that the cryptanalyst knows the complete structure of the generator, and the secret key defines only the initial states of the component shift registers.

In this paper, as in [7], multiplexed sequences are defined as a slight generalization of the class of binary sequences introduced and analyzed in [8]. Let $d = \{d(t)\}_{t=0}^{\infty}$ be a binary maximum-length sequence of period $P_d = 2^n - 1$, n > 1. Then the multiplexed sequence $z = \{z(t)\}_{t=0}^{\infty}$ is defined by

$$z(t) = d\bigl(t + \gamma(X(t))\bigr), \quad t \ge 0, \tag{1}$$
where $X = \{X(t)\}_{t=0}^{\infty}$ is a periodic integer sequence with period $P_a = 2^m - 1$, m > 1, such that the range of values for X is {0, ..., K − 1} and γ is an injective mapping {0, ..., K − 1} → {0, ..., n − 1}, K > 1.

The multiplexed sequence z can be produced by a MG using two regularly clocked binary linear feedback shift registers (LFSRa and LFSRd) and a multiplexer (MUX), as illustrated in Fig. 1. Let both LFSRa and LFSRd have primitive feedback polynomials and lengths m and n, respectively. The multiplexer acts as a nonlinear combiner in the following manner. At time instant t, the contents of a fixed set of k stages of LFSRa are used to form an address, X(t). This is used by the multiplexer to select a stage, γ(X(t)), of LFSRd from a fixed set of K stages Γ = {γ(X): 0 ≤ X ≤ K − 1}, where $K = 2^k$ for k ≤ m − 1, and $K = 2^k - 1$ for k = m. The content of the stage γ(X(t)) is output as z(t), the current term in the keystream sequence. Both registers are clocked and the process repeated to form the keystream, a multiplexed sequence.

Fig. 1. Multiplexer generator. (Block diagram: LFSRa, producing a(t), supplies the ADDRESS X(t) to the MUX; LFSRd, producing d(t), supplies the INPUT; the MUX outputs z(t).)

0020-0190/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0020-0190(99)00045-9

Multiplexed sequences possess a cross-correlation weakness [7]: there is a termwise statistical dependence between the output sequence z and phase-shifts of the sequence d. The correlation coefficient between $\{z(t)\}_{t=0}^{\infty}$ and $\{d(t + \Delta t)\}_{t=0}^{\infty}$ is equal to the probability that z(t) is chosen to be the same random variable as d(t + Δt), for any Δt ∈ Γ. Therefore, the correlation coefficient is equal to 1/K, independently of t. This can be exploited to perform a divide and conquer correlation attack.

2. Fast correlation attack algorithm

The fast correlation attack on the MG is a divide and conquer attack. Firstly, the correlation between the keystream sequence z and a phase-shift of the underlying shift register sequence d is used to attack LFSRd. Assuming the initial state of LFSRd has been recovered through the fast correlation attack, LFSRa is then attacked, and the initial state of LFSRa reconstructed using the sequences d and z. The recovered initial states are then tested for correctness by generating the keystream sequence for these initial states and comparing with the original keystream sequence. The attack is described in more detail in Sections 2.1 and 2.2.

2.1. Recovery of initial state of LFSRd

The fast correlation attack on the MG is based on a model [11] in which the observed segment of the keystream sequence $z = \{z(t)\}_{t=0}^{N-1}$ is regarded as a noisy version of a segment of a phase-shifted version of the unknown LFSRd sequence d, that is, z(t) = d(t + Δt) + e(t) for some Δt ∈ Γ, where $\{e(t)\}_{t=0}^{N-1}$ is a segment of a binary noise sequence with Pr(e(t) = 1) = p < 0.5 for each t = 0, 1, ..., N − 1. The relationship between the correlation coefficient c between z(t) and d(t + Δt) and the probability of noise p is given by c = 1 − 2p > 0.

An iterative probabilistic error-correction algorithm [5], based on [9], is used to make modifications to the keystream sequence to obtain $\hat{d} = \{d(t + \Delta t)\}_{t=0}^{N-1}$, a phase-shifted version of the original LFSR sequence d, where the correct phase-shift, Δt ∈ Γ, has to be determined. The algorithm uses low-weight parity-checks based on the LFSRd feedback polynomial. Consequently, for any assumed Δt, the candidate LFSRd initial state is obtained from any n consecutive bits of $\hat{d}$ by backward LFSR recursion. Each candidate LFSRd initial state is then used to generate a candidate shift register sequence, whose minimal and maximal phase-shifts (from Γ) are then tested for correlation with the original keystream sequence z. A candidate phase-shift is accepted if and only if both the correlations are close to 1/K. Note that testing the correlation requires $O(K^2)$ known keystream bits, which is typically much smaller than N. As a result, if the fast correlation attack is successful, the correct LFSRd initial state, that is, the sequence d, is recovered.

2.2. Recovery of initial state of LFSRa
When the LFSRd sequence is known, the MG reduces to a time-varying nonlinear filter generator with LFSRa as the underlying shift register and the filter function defined by the stages of LFSRd used as inputs to the multiplexer. The LFSRa initial state is reconstructed by a sort of conditional correlation attack [10].
Each input to the multiplexer defines a Boolean function of k input variables. For each such Boolean function, find correlation coefficients to all the $2^k - 1$ nonzero linear functions of the input variables, conditioned on the value of the known keystream bit. Select correlation coefficients with absolute value greater than or equal to some assumed threshold (e.g., 0.8). The corresponding linear equations involving the address bits, X(t), thus hold with probabilities significantly different from one half. Repeat the procedure until a sufficient number of linear equations are collected. Then recover the LFSRa initial state by randomly choosing a subset of m of the collected linear equations. If the subset is nonsingular, then solve the system to obtain the candidate LFSRa initial state; otherwise select another subset. The reconstructed LFSRa initial state is tested for correctness by using the LFSRd and candidate LFSRa initial states to generate the output sequence and comparing it with the known keystream sequence. If they are identical, then the correct LFSR initial states have been found and are unique if the LFSR lengths are coprime (see [6]). If k is small (e.g., k ≤ 3) then the LFSRa initial state may be recovered in a simplified procedure by setting the threshold value equal to 1.

2.3. Testing procedure

The performance of the fast correlation attack algorithm described above was experimentally analyzed to find conditions under which the attack on the MG can be successful. The experiments use the following procedure: the sequence a is generated and the sequence of addresses is constructed from it; the sequence d is generated and the sequence of addresses is applied to it to generate the corresponding segment of the keystream sequence, z; the fast correlation attack is then launched on the produced segment of the keystream sequence. This is repeated for a number of sequences a and d generated from different LFSRa and LFSRd initial states, respectively.
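The generation step of this procedure can be sketched as follows. This is a minimal illustration and not the authors' experimental code: the register lengths (5 and 7), the recurrences, the initial states, the consecutive address taps and the identity mapping γ(X) = X are all assumptions chosen to keep the example small, and the helper names `lfsr_sequence`, `multiplexer_keystream` and `correlation` are hypothetical. The sketch generates a, forms the addresses, applies them to d to obtain z, and then estimates the correlation between z and each phase-shift of d.

```python
def lfsr_sequence(exponents, init, length):
    """Bits of the recurrence s(t) = XOR of s(t - e) for e in `exponents`.

    `exponents` lists the nonzero-degree terms of the feedback polynomial,
    e.g. [1, 7] for 1 + x + x^7; `init` holds the first max(exponents) bits.
    """
    seq = list(init)
    while len(seq) < length:
        t = len(seq)
        bit = 0
        for e in exponents:
            bit ^= seq[t - e]
        seq.append(bit)
    return seq[:length]


def multiplexer_keystream(a, d, addr_taps, gamma, length):
    """z(t) = d(t + gamma[X(t)]), with X(t) built from bits a(t + tap)."""
    z = []
    for t in range(length):
        x = sum(a[t + tap] << j for j, tap in enumerate(addr_taps))
        z.append(d[t + gamma[x]])
    return z


def correlation(z, d, shift):
    """Empirical correlation coefficient between z(t) and d(t + shift)."""
    agree = sum(1 for t, bit in enumerate(z) if bit == d[t + shift])
    return 2 * agree / len(z) - 1


# Assumed toy parameters: m = 5, n = 7, k = 2, so K = 4.
N = 3937  # (2^5 - 1) * (2^7 - 1), one joint period of the two registers
a = lfsr_sequence([2, 5], [1, 0, 0, 1, 1], N + 5)
d = lfsr_sequence([1, 7], [1, 0, 1, 1, 0, 0, 1], N + 7)
gamma = [0, 1, 2, 3]  # assumed injective mapping, Gamma = {0, 1, 2, 3}
z = multiplexer_keystream(a, d, addr_taps=[0, 1], gamma=gamma, length=N)

for shift in range(7):
    label = "in Gamma" if shift in gamma else "not in Gamma"
    print(shift, label, round(correlation(z, d, shift), 3))
```

Under these assumptions the estimated coefficient should be close to 1/K = 0.25 for the four shifts in Γ and close to zero for the remaining shifts, which is the dependence the attack relies on.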
The set of parity-checks used is formed from phase-shifts of the parity-check polynomials. For k = 2 and 3 these polynomials are obtained by repeated squaring [9] of the LFSRd feedback polynomial, assumed to have a low weight (number of nonzero terms). For k ≥ 4 the correlation coefficient 1/K is so small that many parity-checks must be generated to conduct a successful attack. Obtaining these by repeated squaring results in parity-checks which require impractically long segments of the keystream sequence. Other methods then have to be applied to obtain low-weight polynomial multiples [4]. These methods can also be applied if the LFSRd polynomial does not have a low weight (e.g., if its weight is 10 or bigger). If the parity-checks used are not of low weight, then the required keystream length is reduced, but then more such parity-checks are needed and the complexity of the fast correlation attack may become prohibitively high.
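The repeated-squaring construction can be sketched in a few lines. Over GF(2), $f(x)^2 = f(x^2)$, so squaring doubles every exponent and leaves the weight unchanged, while the degree (and hence the span of keystream each parity-check covers) doubles. The function names below are hypothetical, and the rule of keeping a polynomial only while its degree is below the keystream length is an assumption made for illustration, not a rule stated in the text.

```python
def square_gf2(exponents):
    """Square a GF(2) polynomial given by its exponents: f(x)^2 = f(x^2)."""
    return [2 * e for e in exponents]


def squared_parity_checks(feedback_exponents, keystream_length):
    """Feedback polynomial and its repeated squares that fit in the keystream.

    Assumption: a polynomial is usable only if its degree is smaller than
    the number of available keystream bits.
    """
    polys = []
    poly = sorted(feedback_exponents)
    while poly[-1] < keystream_length:  # poly[-1] is the degree
        polys.append(poly)
        poly = square_gf2(poly)
    return polys


# Trinomial 1 + x + x^63, written as its list of exponents.
for n_bits in (2500, 20000, 130000):
    polys = squared_parity_checks([0, 1, 63], n_bits)
    print(n_bits, len(polys), "polynomials, largest degree", polys[-1][-1])
```

With the degree-63 trinomial used in Section 3, this simple rule gives 6 polynomials for N = 2500 and 12 for N = 130000, which agrees with the counts listed in Table 2 for k = 2 and 3. That agreement is an observation about the sketch, not a statement made by the authors.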
3. Experimental results

In the experiments, we used a 59-bit LFSRa with primitive feedback polynomial $1 + x^2 + x^4 + x^7 + x^{59}$, and a 63-bit LFSRd with primitive feedback polynomial $1 + x + x^{63}$. The experiment was conducted for k = 2, 3 and 4. For k = 2 and 3 a set of parity-check polynomials was formed by the repeated squaring of the LFSRd feedback polynomial. For k = 4 parity-check polynomials of weight 3 were generated from low-weight low-degree polynomial multiples of the LFSRd feedback polynomial. The values of K, c and p for each value of k are given in Table 1.

Table 1
Values of c and p for each k

k    K     c          p
2    4     0.25000    0.375000
3    8     0.12500    0.437500
4    16    0.06250    0.468750

For k = 2 and 3, experiments were performed for both consecutive (CONS) and full positive difference set (FPDS) tap settings, for a range of keystream lengths. For k = 4 FPDS tap settings are not possible, so the experiments were performed for consecutive tap settings only. For each keystream length and each k, five different seeds were used for the LFSRa initial state and twenty different seeds were used for the LFSRd initial state. Thus, for each keystream length, each value of k and for each set of conditions, the fast correlation attack was performed one hundred times.

In terms of the algorithm proposed in [5], the resetting and stopping criteria for the fast correlation attack were as follows: the attack was reset when the average residual error probability dropped below 0.02, the number of parity-checks remained unchanged for five iterations or the cumulative number of complementations in a round reached 3000; the attack stopped when 90% of the parity-checks were satisfied, 50 rounds were completed or if an error-free sliding window of length 63 was found at the end of any round. The search for a sliding window began once the level of satisfied parity-checks exceeded 80%. The attack was considered successful if the correct LFSR initial states were obtained.

In the case of the MG, the noise added to the LFSRd sequence is not independent, due to the feedforward transform of the LFSR sequence. For comparison, we also launched the fast correlation attack, using the same set of parity-checks and keystream length, on a keystream sequence formed by adding the same number of independent (INDEP) random errors to the LFSR sequence.

The outcomes of the experiments were considered with respect to k, the number of taps used to form the addresses. Table 2 shows the success rate as the proportion of successful attacks for each value of k, under each set of conditions. As k increases, increasing the number of parity-checks, and consequently the length of the keystream sequence, increases the probability of successful recovery of the LFSR initial states.

Table 2
Success rate (%) versus k

k    Keystream    Number of parity-     Success rate (%)
     length       check polynomials     CONS    FPDS    INDEP
2    2500         6                     4       5       52
2    5000         7                     52      68      100
2    10000        8                     98      99      100
2    20000        9                     98      100     100
3    10000        8                     3       1       1
3    20000        9                     29      19      30
3    35000        10                    68      68      94
3    65000        11                    94      99      100
3    130000       12                    98      99      100
4    132000       42                    13      n/a     14
4    263000       48                    74      n/a     68

Table 2 also shows that the tap setting has little effect on the success of the attack, with the proportion of successful attacks being much the same for both of the tap settings used. In addition, Table 2 shows that the fast correlation attack is less successful in the case of the MG than it is against independent noise.
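The entries of Table 1 can be reproduced from the relations given earlier: the correlation coefficient is c = 1/K with $K = 2^k$ (for k ≤ m − 1), and c = 1 − 2p, so p = (1 − c)/2. The short sketch below only restates that arithmetic; the function name `mux_noise_parameters` is hypothetical.

```python
def mux_noise_parameters(k):
    """Return (K, c, p) for a multiplexer with k address bits.

    K = 2^k inputs, correlation coefficient c = 1/K,
    and noise probability p obtained from c = 1 - 2p.
    """
    K = 2 ** k
    c = 1 / K
    p = (1 - c) / 2
    return K, c, p


for k in (2, 3, 4):
    K, c, p = mux_noise_parameters(k)
    print(f"k={k}  K={K:2d}  c={c:.5f}  p={p:.6f}")
```

The noise probability approaches 0.5 quickly as k grows, which is consistent with the longer keystreams and larger parity-check sets needed for k = 4.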
4. Comparison with other attacks

A number of attacks on the MG have been published in recent years. All are known plaintext attacks, conducted under assumptions similar to those of the fast correlation attack presented in this paper.

Divide and conquer attacks proposed by Anderson [1,2] and Zeng, Yang and Rao [12] differ from the attack presented in this paper because they target LFSRa, rather than LFSRd. The approaches involve guessing the initial content of LFSRa and checking this guess for consistency with the known keystream sequence. In [2], the exhaustive search of the LFSRa keyspace is reduced; the approach is to guess only some of the LFSRa bits, and to use the LFSRa feedback polynomial to construct as many bits in the LFSR sequence as possible, using these to check for consistency. This improvement is dependent on the LFSRa feedback polynomial being of low weight (also a requirement for the fast correlation attack) and the feedback taps being "bunched". A different type of attack exploits the resynchronization weakness existing in some applications of synchronous stream ciphers [3]. The applicability of such an attack is limited by the requirement for a certain number of resynchronizations.

For divide and conquer attacks, there is a trade-off between the length of keystream required to perform the attack and the number of trials needed to obtain the correct LFSR initial state. In the attacks proposed by Anderson [1,2] and Zeng, Yang and Rao [12] only a small amount of keystream is required for the consistency checks, but a large number of guesses may be needed to recover the correct LFSR initial state. A lower bound on the required keystream length is given in [12] as $N > m + n 2^k$, with $2^{m+k}$ linear consistency tests being performed. As this is exponential in the length of LFSRa, the attack is not possible when LFSRa is large.
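To make the trade-off concrete, the figures quoted from [12] can be evaluated for the register lengths used in the experiments (m = 59, n = 63). The sketch below is illustrative only. The function names are hypothetical, and the work estimate for the fast correlation attack rests on two assumptions that are not taken from the paper: each weight-w parity-check polynomial contributes at most w checks per keystream bit, and only a single iteration is counted. The paper itself states only that the work is proportional to the keystream length and to the average number of parity-checks per bit.

```python
def lct_attack_cost(m, n, k):
    """Keystream bound m + n * 2^k and test count 2^(m + k) quoted from [12]."""
    return m + n * 2 ** k, 2 ** (m + k)


def fast_correlation_work(keystream_length, num_polynomials, weight=3):
    """Rough parity-check evaluations for ONE iteration (assumption:
    at most `weight` checks per polynomial per keystream bit)."""
    return keystream_length * num_polynomials * weight


m, n = 59, 63
for k in (2, 3, 4):
    bound, tests = lct_attack_cost(m, n, k)
    print(f"k={k}: keystream bound {bound} bits, 2^{m + k} = {tests} tests")

# Example row from Table 2: k = 2, N = 10000, 8 weight-3 polynomials.
print("fast correlation, one iteration:", fast_correlation_work(10000, 8))
```

Even allowing for many iterations and rounds, the second figure stays far below the number of consistency tests, at the price of a keystream segment that is longer by one to two orders of magnitude.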
By contrast, the fast correlation attack described in this paper requires a longer segment of keystream, but requires no guessing of the LFSR initial states. The computational complexity of the fast correlation attack is concentrated in the modification of the keystream sequence, and is proportional to the keystream sequence length and to the average number of parity-checks per keystream bit used. The number of parity-checks used depends on the value of the correlation coefficient of the LFSRd sequence to the keystream sequence, and also on the weight of the parity-checks used.

Because the attacks targeting LFSRa require exhaustive search of the shift register initial state, or a slightly reduced exhaustive search, these attacks have computational complexity which is exponential in m, the length of LFSRa. Hence they are not applicable in situations where this register is large, even where k, the number of taps used to form the address input to the multiplexer, is small. The fast correlation attack outlined in this paper is not subject to this limitation, as evidenced by successful attacks where m = 59. However, the success of the fast correlation attack algorithm is dependent on the value of p, a direct function of k, and on the existence of enough low-weight polynomial multiples of the feedback polynomial of LFSRd to use for parity-checks.

5. Conclusion

Systematic computer simulations on MGs show that the fast correlation attack based on iterative error-correction is successful provided that the number of low-weight parity-checks used is sufficiently large. In general, if the attack is successful, then the modified keystream sequence converges to one of the phase-shifts of the original LFSRd sequence that is correlated to the keystream sequence. The computational complexity of the fast correlation attack is proportional to both the keystream sequence length and the average number of parity-checks per keystream bit used. These depend on both the value of the correlation coefficient of the LFSRd sequence to the keystream sequence and on the weight of the parity-checks used.
Successful experiments were conducted for k = 2, 3 and 4. The fast correlation attack has been shown to be successful if k is less than five, provided that the weight of the feedback polynomial for LFSRd is relatively low (e.g., less than 10) or low-weight polynomial multiples of the feedback polynomial can be found. For a MG to offer practical security against this fast correlation attack and other known attacks, the component LFSRs should be large, with feedback polynomials which are not of low weight and do not have low-weight polynomial multiples of small or moderately large degree. Also, at least five or six taps from LFSRa should be used to form the addresses.

References

[1] R.J. Anderson, Solving a class of stream ciphers, Cryptologia 14 (3) (1990) 285–288.
[2] R.J. Anderson, Faster attack on certain stream ciphers, Electron. Lett. 29 (15) (1993) 1322–1323.
[3] J. Daemen, R. Govaerts, J. Vandewalle, Resynchronization weakness in synchronous stream ciphers, in: T. Helleseth (Ed.), Advances in Cryptology - EUROCRYPT ’93, Lecture Notes in Computer Sci., Vol. 765, Springer, Berlin, 1994, pp. 159–167.
[4] J.D. Golić, Computation of low-weight parity-check polynomials, Electron. Lett. 32 (21) (1996) 1981–1982.
[5] J.D. Golić, M. Salmasizadeh, A. Clark, A. Khodkar, E. Dawson, Discrete optimisation and fast correlation attacks, in: E. Dawson, J. Golić (Eds.), Cryptography: Policy and Algorithms, Lecture Notes in Computer Sci., Vol. 1029, Springer, Berlin, 1996, pp. 186–200.
[6] J.D. Golić, The number of output sequences of a binary sequence generator, in: D.V. Davies (Ed.), Advances in Cryptology - EUROCRYPT ’91, Lecture Notes in Computer Sci., Vol. 547, Springer, Berlin, 1991, pp. 160–167.
[7] J.D. Golić, M. Salmasizadeh, E. Dawson, Autocorrelation weakness of multiplexed sequences, in: Internat. Symposium on Information Theory and its Applications, Vol. 2, 1994, pp. 983–987.
[8] S.M. Jennings, A special class of binary sequences, PhD Thesis, University of London, 1980.
[9] W. Meier, O. Staffelbach, Fast correlation attacks on certain stream ciphers, J. of Cryptology 1 (3) (1989) 159–167.
[10] W. Meier, O. Staffelbach, Correlation properties of combiners with memory in stream ciphers, J. of Cryptology 5 (1) (1992) 67–86.
[11] T. Siegenthaler, Decrypting a class of stream ciphers using ciphertext only, IEEE Trans. Comput. C-34 (1985) 81–85.
[12] K.C. Zeng, C.H. Yang, T.R.N. Rao, On the linear consistency test (LCT) in cryptanalysis with applications, in: G. Brassard (Ed.), Advances in Cryptology - CRYPTO ’89, Lecture Notes in Computer Sci., Vol. 434, Springer, Berlin, 1990, pp. 164–174.