Information Processing Letters 116 (2016) 192–196
An improvement of a cryptanalysis algorithm

Oualid Benamara a, Fatiha Merazka b,*, Kamel Betina a

a LATN Lab., Institute of Mathematics, University of Science and Technology Houari Boumediene (USTHB), P.O. Box 32, El Alia, Algiers, Algeria
b LISIC Lab., Telecommunications Department, University of Science and Technology Houari Boumediene (USTHB), P.O. Box 32, El Alia, Algiers, Algeria
Article info

Article history:
Received 20 September 2014
Received in revised form 14 July 2015
Accepted 6 August 2015
Available online 12 August 2015
Communicated by S.M. Yiu

Abstract

In this paper we present simulations that show how some pseudo random number generators can improve the effectiveness of a statistical cryptanalysis algorithm. We deduce mainly that a better generator enhances the accuracy of the cryptanalysis algorithm.
© 2015 Elsevier B.V. All rights reserved.

Keywords:
Cryptography
Markov Chain Monte Carlo
Classical cryptosystems
Pseudo random number generators
* Corresponding author.
E-mail addresses: [email protected] (O. Benamara), [email protected] (F. Merazka), [email protected] (K. Betina).
http://dx.doi.org/10.1016/j.ipl.2015.08.002
0020-0190/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Cryptography is the science of encrypting data so that, without a secret key, no party other than the sender and the receiver can recover the secret data [10]. Cryptanalysts, in turn, try to break cryptosystems in order to expose security flaws, which can then be corrected. According to Kerckhoffs's principle, the algorithm should be known to all parties, including an adversary who wants to break the code; the security should rely on the secret key rather than on the algorithm used [2]. We refer to [8] for the standard definitions. Classical cryptosystems operate at the byte level of the data, whereas modern ones operate at the bit level. We study the substitution–transposition ciphers in this work, but the cryptanalysis techniques may also be applied to modern cryptosystems because the mode of operation is the same. The two kinds of cryptosystems differ in their parameters, such as the alphabet space, and in that modern cryptosystems rely on more sophisticated operations.

In this paper we deal with classical cryptosystems and try to improve a cryptanalysis algorithm by testing different pseudo random number generators (PRNGs). The pseudo random number generator is a key parameter in the cryptanalysis process. The goal of this study is to evaluate how different PRNGs impact the performance of the MCMC algorithm in terms of the metrics defined in the later sections.

This paper is organized as follows: the next sections present the Markov Chain Monte Carlo (MCMC) algorithm and a survey of the related theory. The outputs of the application of some PRNGs to MCMC are given thereafter, together with the simulation results. We analyze and interpret those results in Section 4.
2. Markov Chain Monte Carlo

In this section, we give the definition of Markov Chain Monte Carlo (MCMC). A Markov chain is a stochastic process $X_1, X_2, \ldots$ (the $X_i$'s are random variables) that satisfies the following property [9]:

$$P(X_{n+1} = x_{n+1} \mid X_n = x_n, X_{n-1} = x_{n-1}, \ldots, X_1 = x_1) = P(X_{n+1} = x_{n+1} \mid X_n = x_n)$$

The above property means that the state at time $n+1$ depends only on the state at time $n$ and not on the earlier ones at times $n-1, n-2, \ldots, 2, 1$. A Markov chain is said to be on a finite or countable state space if the random variables take only finitely or countably many values. A Markov chain is said to be irreducible if it is possible to go from any state to any other state in the state space with nonzero probability. A state $i$ in the state space is said to be aperiodic if the returns to $i$ occur at irregular times, that is, if the possible return times have greatest common divisor 1. The chain is aperiodic if all of its states are aperiodic.

Markov Chain Monte Carlo (with its variants) is a set of techniques or algorithms intended to generate a Markov chain which converges to a given probability distribution [11]. The proposition below justifies the MCMC procedure.

Proposition 1. If a Markov chain $X_n$ on a finite or countable state space $\mathcal{X}$ is irreducible and aperiodic, with stationary distribution $\pi$, then for every subset $A \subseteq \mathcal{X}$,

$$\lim_{n \to \infty} P(X_n \in A) = \int_A \pi(x)\,dx$$

The above proposition means that the values of $X_n$ are approximately distributed according to $\pi$ for sufficiently large $n$.

2.1. MCMC algorithm

In this section we restate the MCMC algorithm as described in [4]. Let $\pi$ be a probability distribution. The MCMC algorithm operates as follows:
• Choose an initial state $X_0 \in \mathcal{X}$, where $\mathcal{X}$ is the set of all possible states that the Markov chain may take. In probability theory we call $\mathcal{X}$ the universe.
• For $n = 1, 2, 3, \ldots$:
  • Propose a new state $Y_n \in \mathcal{X}$ from some symmetric proposal density $q(X_{n-1}, \cdot)$.
  • Let $U_n \sim \mathrm{Uniform}[0, 1]$, independently of $X_0, \ldots, X_{n-1}, Y_n$.
  • If $U_n < \pi(Y_n)/\pi(X_{n-1})$, then "accept" the proposal by setting $X_n = Y_n$; otherwise "reject" the proposal by setting $X_n = X_{n-1}$.

At the end of this process, we obtain a Markov chain $(X_n)_{n \in \mathbb{N}}$ whose distribution converges to $\pi$.
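As an illustration of the accept/reject loop above, the following is a minimal C++ sketch rather than the implementation of [4] or [14]. It assumes a small finite state space {0, ..., K-1}, a uniform (and therefore symmetric) proposal, and `std::mt19937` as a stand-in PRNG. The names `metropolis_frequencies`, `weights` and `seed` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Metropolis sampler on the finite state space {0, ..., K-1}.
// weights[i] is pi(i) up to a normalising constant (all entries > 0).
// Returns the fraction of iterations spent in each state.
std::vector<double> metropolis_frequencies(const std::vector<double>& weights,
                                           std::size_t iterations,
                                           unsigned seed) {
    assert(!weights.empty() && iterations > 0);
    std::mt19937 gen(seed);  // assumption: stand-in for the PRNG under study
    std::uniform_int_distribution<std::size_t> propose(0, weights.size() - 1);
    std::uniform_real_distribution<double> uniform01(0.0, 1.0);

    std::vector<std::size_t> visits(weights.size(), 0);
    std::size_t x = 0;  // initial state X_0
    for (std::size_t n = 1; n <= iterations; ++n) {
        const std::size_t y = propose(gen);  // symmetric proposal: q(x, y) = 1/K
        const double u = uniform01(gen);     // U_n drawn uniformly from [0, 1)
        if (u < weights[y] / weights[x]) {
            x = y;                           // accept the proposal
        }                                    // otherwise keep X_n = X_{n-1}
        ++visits[x];
    }

    std::vector<double> freq(weights.size());
    for (std::size_t i = 0; i < freq.size(); ++i) {
        freq[i] = static_cast<double>(visits[i]) / static_cast<double>(iterations);
    }
    return freq;
}
```

With weights proportional to 1, 2, 3, 4, the returned frequencies should approach 0.1, 0.2, 0.3 and 0.4 as the number of iterations grows, which is the behaviour Proposition 1 describes.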
2.2. An MCMC algorithm to break a substitution–transposition cryptosystem

The substitution ciphers operate by replacing each letter in the clear text by another letter according to the key, which defines the correspondence. The one-time pad (OTP) cipher, which belongs to this class, is one of the rare cryptosystems whose security is proved. However, it is impractical, because the secret key is as long as the clear text. The transposition cipher permutes the positions of the letters of the clear text, but leaves the letters themselves unchanged. Combining the above two functions results in the substitution–transposition cipher [8].

In order to apply the above MCMC algorithm to the deciphering process, we proceed as described in [4]. At the implementation step, we use the best parameters found in that study, such as the scaling parameter and the number of iterations, resulting in a more accurate algorithm:

1. Choose an initial state (the states here are all the possible encryption keys), and a fixed scaling parameter $p > 0$.
2. Repeat the following steps for many iterations (e.g. 10 000 iterations).
  • Given the current state $x$, propose a new state $y$ from some symmetric density $q(x, y)$.
  • Sample $u \sim \mathrm{Uniform}[0, 1]$, independently from all other variables.
  • If $u < (\pi(y)/\pi(x))^p$, then "accept" the proposal by replacing $x$ with $y$; otherwise reject $y$ by leaving $x$ unchanged.

Here the score function is

$$\pi(x) = \prod_{\beta_1, \beta_2} r(\beta_1, \beta_2)^{f_x(\beta_1, \beta_2)}$$

where $r(\beta_1, \beta_2)$ are the frequencies of the letter pairs $(\beta_1, \beta_2)$ in the reference text and $f_x(\beta_1, \beta_2)$ are those in the text decrypted using the key $x$.

2.3. Testing methodology

We have tested the MCMC algorithm with different pseudo random number generators (PRNGs). The results in terms of accuracy and number of successful decryptions are given in the next section, where the metrics are also defined. We also give the full algorithm of each PRNG used. The source code of the MCMC is available at [14]. Once the files are downloaded and installed, download the two files [15] and the file [16], and use them to replace the corresponding files in the MCMC directory. After that, switch between the PRNGs by uncommenting and commenting out the functions at RandUtil::newRand() in the file RandUtil.cpp. Then build the program again and run it. Each simulation consists of the following:

• Generate a secret key.
• Encrypt the text using the above key.
• Run the MCMC algorithm over the encrypted text in order to recover the secret key.
• Once done, compute the average correctness and evaluate whether the cryptanalysis was successful (that is, whether the key has been recovered), according to the definitions given in the next section.

The above steps are repeated 100 times. At the end, we output the average correctness and the number of successful decryptions. The number of repetitions can be changed from 100 through the number-of-runs parameter in the main function.

3. The simulations

A pseudo random number generator is designed to generate a sequence of binary numbers that aims to behave as "truly random numbers" [6]. Several techniques are used to achieve this task, each one with advantages and disadvantages. Some PRNGs rely on physical phenomena and others are number theory based. In order to measure the performance of a PRNG, a battery of tests is applied, such as the diehard tests designed by G. Marsaglia [12] and the NIST tests [13]. We refer to the survey of the subject given in [5] and the references therein. In the next subsections we recall three important PRNGs together with their impact on the MCMC simulation output.

3.1. drand48 pseudo random number generator

Table 1
MCMC algorithm performance using drand48 PRNG. SN: the simulation number. AC: measures the correspondence between the actual secret key and the key generated by MCMC, averaged over the 100 runs. NSD: the number of successful decryptions out of 100 runs of the program.

SN    AC      NSD
1     0.85    82
2     0.85    82
3     0.82    81
4     0.83    81
5     0.82    79
This generator generates a sequence of numbers according to the linear congruence

$$X_{n+1} = (a X_n + c) \bmod m$$

where $a$, $c$ and $m$ are constants. In Table 1, the abbreviations are as follows:

• SN is the simulation number, which refers to the five simulations done in this paper.
• AC is the average correctness, computed as follows. For each simulation, the text is encrypted with a different key and the cryptanalysis algorithm is run. The cryptanalysis algorithm outputs a decryption key. The AC measures the overall agreement between the output of the cryptanalysis algorithm and the actual encryption key.
• NSD is the number of successful decryptions out of the 100 performed for each simulation.

3.2. xorshift pseudo random number generator

One of the properties of this generator is that it is a very fast algorithm with a long period ($2^{128} - 1$) [7]. Also, as we see below, its design is simple and it has performed well in the tests measuring the quality of a PRNG.

    #include <stdint.h>

    /* 128-bit xorshift generator; returns the next 32-bit output. */
    uint32_t xor128(void) {
        /* Seeds: the four state words must not all be zero. */
        static uint32_t x = 123456789;
        static uint32_t y = 362436069;
        static uint32_t z = 521288629;
        static uint32_t w = 886751238;
        uint32_t t;

        t = x ^ (x << 11);
        x = y; y = z; z = w;   /* rotate the state words */
        return w = w ^ (w >> 19) ^ (t ^ (t >> 8));
    }

Note that x, y, z and w are seeds and have to be chosen as 32-bit integers. An enhanced version of this generator is available in [3]. Table 2 shows the obtained results.

Table 2
MCMC algorithm performance using xorshift PRNG. SN: the simulation number. AC: measures the correspondence between the actual secret key and the key generated by MCMC, averaged over the 100 runs. NSD: the number of successful decryptions out of 100 runs of the program.

SN    AC        NSD
1     0.896     88
2     0.9161    81
3     0.8933    89
4     0.9156    91
5     0.8811    89

3.3. Chaotic iteration (CI) pseudo random number generator

This generator is obtained from reference [1]. As shown later in this section, this generator surpasses xorshift in some tests, and a deep theoretical study proved that it has good randomness properties. The obtained results are tabulated in Table 3. In the algorithm below, x is a binary array of length N.

    a := XORshift1();
    m := a mod 2 + c;
    for i = 0, ..., m do
        b := XORshift2();
        S := b mod N;
        x[S] := (1 - x[S]) mod 2;
    end for
    r := x;
    return r;

Table 3
MCMC algorithm performance using CI PRNG. SN: the simulation number. AC: measures the correspondence between the actual secret key and the key generated by MCMC, averaged over the 100 runs. NSD: the number of successful decryptions out of 100 runs of the program.

SN    AC        NSD
1     0.8700    85
2     0.8744    82
3     0.88      87
4     0.8983    82
5     0.8478    87
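The chaotic-iteration pseudocode of Section 3.3 can be sketched in C++ as follows. This is only an illustrative reading of the pseudocode, not the generator of [1]. It assumes N = 32 so that the binary array x fits in one 32-bit word, and it uses Marsaglia's 32-bit xorshift step (shifts 13, 17, 5) for both XORshift1 and XORshift2. The value of c, the seeds, and the names `xorshift32` and `CIGenerator` are hypothetical choices made for the example.

```cpp
#include <cassert>
#include <cstdint>

// One step of a 32-bit xorshift generator (shift triple 13, 17, 5).
// The state must be nonzero, otherwise it stays at zero forever.
inline std::uint32_t xorshift32(std::uint32_t& s) {
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    return s;
}

// Chaotic-iteration generator following the pseudocode: on each call,
// a few pseudo-randomly chosen bits of the internal vector x are flipped.
struct CIGenerator {
    static constexpr std::uint32_t N = 32;  // assumption: x packed into 32 bits
    std::uint32_t c;       // constant c of the pseudocode (value is an assumption)
    std::uint32_t s1, s2;  // states of XORshift1 and XORshift2
    std::uint32_t x;       // the binary array x, one bit per entry

    CIGenerator(std::uint32_t c_, std::uint32_t seed1, std::uint32_t seed2,
                std::uint32_t x0)
        : c(c_), s1(seed1), s2(seed2), x(x0) {
        assert(seed1 != 0 && seed2 != 0);
    }

    std::uint32_t next() {
        const std::uint32_t a = xorshift32(s1);
        const std::uint32_t m = a % 2 + c;         // m := a mod 2 + c
        for (std::uint32_t i = 0; i <= m; ++i) {   // for i = 0, ..., m
            const std::uint32_t b = xorshift32(s2);
            const std::uint32_t S = b % N;         // S := b mod N
            x ^= (std::uint32_t{1} << S);          // x[S] := 1 - x[S]
        }
        return x;                                  // r := x
    }
};
```

Because each step only flips bits of x, two generators built with the same c and seeds but different initial vectors produce outputs that differ by exactly the XOR of their initial vectors.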
Comparing Table 3 with Tables 2 and 1, we conclude that xorshift has the better overall result in terms of AC and NSD.

3.4. Further computations

In order to enhance the accuracy of the computations, we increase the number of simulations. Each simulation consists of generating the secret key, encrypting the clear text with that key and then running the MCMC to recover the key, as described in Section 2.3. In this section we repeat the above procedure 1000 times and then record the outputs. We consider the NSD and the AC as the output parameters and choose the best seed from the previous subsection. The records are shown in Table 4.

Table 4
MCMC algorithm performance with CI, xorshift and drand48 PRNGs after 1000 runs in terms of AC and NSD.

                 NSD    AC
drand48 PRNG     800    0.8466
CI PRNG          811    0.8486
xorshift PRNG    833    0.8693

Over 1000 runs of the MCMC algorithm, the number of successful runs for the xorshift PRNG is greater than that for drand48 by 33. The number of successful runs for the CI PRNG is greater than that for drand48 by 11. This indicates that xorshift on average achieves better scores than the two others. Although the xorshift scores are greater than the others, they remain close to the CI and drand48 scores. The reason is that on some simulations xorshift fails to achieve greater scores. One way to avoid this is to use the seed (Section 3.2) that leads to better results. For example, the seed used in Table 2, simulation 4, seems to be better than the others. So the MCMC algorithm will be well configured if that seed is used for later runs instead of the others.

More computations are given in this section. The values shown in the cells of the tables correspond to the number of successful decryptions after 100 runs of the MCMC algorithm. The "random" library in C++ was used to generate the seeds for the xorshift PRNG. Also, seeds have to be chosen so that they contain 9 to 10 decimal digits. Tables 5, 6 and 7 (see Appendix A) contain the results obtained for the three PRNGs. Using Tables 5, 6 and 7, we have computed the mean of the values in each table and the results are given in Table 8 (see Appendix A).
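The summary statistics reported in Appendix A are plain means and maxima over lists of NSD scores. The following short C++ sketch shows one way to compute them. The names `ScoreSummary` and `summarize` are hypothetical and are not taken from the source code at [14].

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct ScoreSummary {
    double mean;    // arithmetic mean of the scores
    int max_score;  // largest score in the list
};

// Mean and maximum of a list of NSD scores (one score per simulation).
ScoreSummary summarize(const std::vector<int>& scores) {
    assert(!scores.empty());
    const long total = std::accumulate(scores.begin(), scores.end(), 0L);
    return {static_cast<double>(total) / static_cast<double>(scores.size()),
            *std::max_element(scores.begin(), scores.end())};
}
```

For example, applied to the five drand48 NSD values of Table 1 (82, 82, 81, 81, 79), `summarize` gives a mean of 81.0 and a maximum of 82.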
Table 9 (see Appendix A) contains the maximum values obtained from Tables 5, 6 and 7. We deduce from Tables 8 and 9 that the CI and the xorshift PRNGs achieve better scores in terms of the mean and that the xorshift PRNG achieves the best score in terms of the maximum number of successful decryptions (NSD).

4. Results discussion

Recall that the PRNG is used to sample a random value in the interval $[0, 1]$. After that, the test $U_n < \pi(Y_n)/\pi(X_{n-1})$ is done to choose whether to accept the value of $Y_n$ or not. This process leads to the generation of a Markov chain $X_1, X_2, \ldots$ which converges to the secret key used for the encryption. Recall that the state space represents all the possible encryption keys. The AC and NSD (Section 3.1) both measure the success of the above procedure. In other words, AC and NSD measure the convergence of the Markov chain $X_1, X_2, X_3, \ldots$ to the stationary distribution $\pi$. By design, the Markov chain $X_1, X_2, \ldots$ is theoretically convergent. Thus a failure to converge may be induced by weaker statistical properties of the PRNG used. The xorshift and CI PRNGs have been tested and passed the diehard battery of tests, and drand48 was used in the original study [4]. The diehard tests check the period and other metrics related to the generator in order to measure the degree of randomness of the PRNG. However, in the simulations done in this work, xorshift is found to be more accurate than the two others. We conclude in this section that, in terms of the convergence of the Markov chain and in the configuration considered in this paper, xorshift delivers better scores.

5. Conclusion

In this paper we have presented our simulations on the cryptanalysis of classical cryptosystems. We have outlined that MCMC techniques are accurate tools in the field of cryptanalysis. Our results show that the xorshift PRNG is better suited to this kind of application, since it gave the highest scores. Previous studies show that CI PRNGs have the best statistical properties, but surprisingly this does not imply that they are more suitable for all kinds of applications, such as the one considered in this paper. We have seen through this work that MCMC techniques are accurate cryptanalysis tools, and our results confirm that classical cryptosystems are insecure.
Moreover, even though the substitution and transposition ciphers are quite old, their operating principle remains widely used in modern cryptosystems, such as the AES and the DES standards. As a future direction of this work, it would be interesting to use MCMC over modern ciphers.

Appendix A

A.1. 100 simulations with each PRNG

The detailed results obtained for the three PRNGs in terms of NSD are given in Tables 5, 6 and 7.

Table 5
Scores of 100 runs for the drand48 PRNG in terms of NSD.

Scores
83-86-80-87-82-83-84-78-84-76-84-83-79-87
78-81-78-84-82-83-85-82-78-85-81-78-83-83
86-80-87-84-82-81-83-83-83-79-89-79-85-80
80-81-80-82-76-89-77-79-88-85-87-76-81-79
86-78-79-79-80-74-80-80-86-82-83-78-82-85
80-81-78-80-82-82-83-84-86-87-80-89-81-82-84
79-79-88-78-82-81-89-81-84-86-86-87-84-85-73

Table 6
Scores of 100 runs for the xorshift PRNG in terms of NSD.

Scores
85-78-86-87-85-83-75-80-80-80-82-85-90-90
79-79-87-85-83-81-82-80-80-84-82-86-91-78
83-78-93-76-85-76-87-80-86-86-81-88-83-78
93-76-85-76-87-87-78-88-88-79-79-80-82-83
85-79-85-86-84-82-78-80-87-80-91-81-88-80
82-82-83-84-82-82-87-84-78-88-87-84-74-81-80
81-84-83-87-84-85-80-84-82-82-75-86-79-78-85

Table 7
Scores of 100 runs for the CI PRNG in terms of NSD.

Scores
87-82-88-84-87-85-81-82-81-87-78-88-85-83
76-80-85-82-70-85-81-85-84-80-79-78-81-82
86-83-84-87-81-86-84-84-79-90-85-82-86-86
80-86-83-85-76-81-83-82-87-86-88-75-84-83
83-84-85-83-82-86-77-86-86-83-79-83-80-83
81-78-82-81-81-80-85-86-81-88-88-80-82-89-79-76
86-83-84-86-84-79-86-77-83-80-77-84-86-85

A.2. 100 simulations summarised in terms of means with each PRNG

The results in terms of means with each PRNG are summarised in Table 8.

Table 8
Mean values of the scores of the three PRNGs in terms of NSD.

              drand48 PRNG    xorshift PRNG    CI PRNG
Mean score    82.18           82.83            82.85

A.3. 100 simulations summarised in terms of maxima with each PRNG

The results in terms of maximum score with each PRNG are summarised in Table 9.

Table 9
Maximum score of the three PRNGs in terms of NSD.

                 drand48 PRNG    xorshift PRNG    CI PRNG
Maximum score    89              93               90

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful comments.

References
[1] Jacques Bahi, Christophe Guyeux, Qianxue Wang, Improving random number generators by chaotic iterations. Application in data hiding, in: Int. Conf. on Computer Application and System Modeling, ICCASM 2010, 2010.
[2] David Kahn, The Codebreakers: The Story of Secret Writing, 2nd edn., Scribners, 1996, p. 235.
[3] Richard P. Brent, Some long-period random number generators using shifts and xors, in: Proceedings of the 13th Biennial Computational Techniques and Applications Conference, CTAC 2006, 2006.
[4] Jian Chen, Jeffrey S. Rosenthal, Decrypting classical cipher text using Markov chain Monte Carlo, Stat. Comput. (2012).
[5] Pierre L'Ecuyer, Uniform random number generation, Ann. Oper. Res. 53 (1) (1994) 77–120.
[6] Donald E. Knuth, The Art of Computer Programming, Addison Wesley Longman Publishing Co., Inc., 1998.
[7] George Marsaglia, Xorshift RNGs, J. Stat. Softw. (2003).
[8] Alfred J. Menezes, Scott A. Vanstone, Paul C. Van Oorschot, Handbook of Applied Cryptography, CRC Press, Inc., ISBN 0849385237, 1996.
[9] Jeffrey S. Rosenthal, First Look at Rigorous Probability Theory, World Scientific Publishing Co., 2006.
[10] B. Schneier, Applied Cryptography, 2nd edn., Wiley, New York, 1996.
[11] Steve Brooks, Andrew Gelman, Galin L. Jones, Xiao-Li Meng, Handbook of Markov Chain Monte Carlo, Chapman Hall/CRC, 2011.
[12] George Marsaglia, The Marsaglia random number CDROM including the diehard battery of tests of randomness, also at: http://stat.fsu.edu/pub/diehard, 1996.
[13] Andrew Rukhin, et al., A statistical test suite for random and pseudorandom number generators for cryptographic applications, National Institute of Standards and Technology (NIST), Technology Administration, U.S. Department of Commerce, 2010, http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf.
[14] probability.ca/decipher.
[15] https://www.dropbox.com/s/i9ccjnx0giqdwr1/RandUtil.cpp?dl=0.
[16] www.dropbox.com/s/317kmjxdd3mi5ic/RandUtil.h?dl=0.