Computing AES related-key differential characteristics with constraint programming


Journal Pre-proof Computing AES related-key differential characteristics with constraint programming

David Gerault, Pascal Lafourcade, Marine Minier, Christine Solnon

PII: S0004-3702(18)30363-1
DOI: https://doi.org/10.1016/j.artint.2019.103183
Reference: ARTINT 103183
To appear in: Artificial Intelligence
Received date: 5 July 2018
Revised date: 5 September 2019
Accepted date: 15 October 2019

Please cite this article as: D. Gerault et al., Computing AES related-key differential characteristics with constraint programming, Artif. Intell. (2019), 103183, doi: https://doi.org/10.1016/j.artint.2019.103183.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier.

Computing AES Related-Key Differential Characteristics with Constraint Programming

David Gerault (a), Pascal Lafourcade (a), Marine Minier (b), Christine Solnon (c)

(a) Université Clermont Auvergne, CNRS, LIMOS, F-63000 Clermont, France
(b) Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
(c) Univ. Lyon, INSA Lyon, CNRS, LIRIS, F-69621, France

Abstract

Cryptanalysis aims at testing the properties of encryption processes, and this usually implies solving hard optimization problems. In this paper, we focus on related-key differential attacks for the Advanced Encryption Standard (AES), which is the encryption standard for block ciphers. To mount these attacks, cryptanalysts need to solve the optimal related-key differential characteristic problem. Dedicated approaches do not scale well for this problem, and need weeks to solve its hardest instances. In this paper, we improve existing Constraint Programming (CP) approaches for computing optimal related-key differential characteristics: we add new constraints that detect inconsistencies sooner, and we introduce a new decomposition of the problem in two steps. These improvements allow us to compute all optimal related-key differential characteristics for AES-128, AES-192 and AES-256 in a few hours.

Keywords: Constraint Programming, AES, Differential cryptanalysis.

Preprint submitted to Artificial Intelligence, October 21, 2019

1. Introduction

Since 2001, AES (Advanced Encryption Standard) has been the encryption standard for block ciphers [FIP01]. It guarantees communication confidentiality by using a secret key K to cipher an original plaintext X into a ciphertext AES_K(X), in such a way that the ciphertext can further be deciphered into the original one using the same key, i.e., X = AES_K^{-1}(AES_K(X)).

Cryptanalysis aims at testing whether an attacker can observe any non-random property in the encryption process: if he cannot, then confidentiality is guaranteed. In particular, differential cryptanalysis [BS91] studies the propagation of the difference δX = X ⊕ X' between two plaintexts X and X' through the cipher, where ⊕ is the exclusive or (xor) operator. If the distribution of the output difference δC = AES_K(X) ⊕ AES_K(X') is non-uniform, then an adversary can exploit this non-uniformity to guess the key K used to encrypt messages with difference δX faster than by performing exhaustive search: confidentiality is therefore lowered.

Today, differential cryptanalysis is public knowledge, and block ciphers such as AES have proven security bounds against differential attacks. Hence, [Bih93] proposed to consider differences not only between the plaintexts X and X' but also between the keys K and K' to mount related-key attacks. In this case, the cryptanalyst is interested in finding optimal related-key differentials, i.e., input and output differences that maximize the probability of obtaining the output difference given the input difference. In other words, we search for δX, δK, and δC that maximize the probability that δC is equal to AES_K(X) ⊕ AES_{K⊕δK}(X ⊕ δX) for any plaintext X and any key K.

Finding an optimal related-key differential is a highly combinatorial problem that scales poorly. To simplify this problem, we usually compute differential characteristics which additionally specify intermediate differences after each step of the ciphering process. Also, to reduce the search space, Knudsen [Knu95] has introduced truncated differential characteristics where sequences of bits are abstracted by single bits that indicate whether sequences contain differences or not. Typically, each byte (8-bit sequence) is abstracted by a single bit (or, equivalently, a Boolean value). In this case, the goal is no longer to find the exact differences, but to find the positions of these differences, i.e., the presence or absence of a difference for every byte.
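The byte-to-Boolean abstraction described above can be illustrated with a minimal Python sketch. The helper names `xor_blocks` and `truncate` and the two sample 16-byte blocks are hypothetical, chosen only for illustration; they do not come from the paper.

```python
def xor_blocks(x: bytes, x_prime: bytes) -> bytes:
    """Byte-wise difference deltaX = X xor X' of two blocks of equal length."""
    if len(x) != len(x_prime):
        raise ValueError("blocks must have the same length")
    return bytes(a ^ b for a, b in zip(x, x_prime))


def truncate(delta: bytes) -> list:
    """Abstract each differential byte by one bit: 1 if non-zero, 0 otherwise."""
    return [int(b != 0) for b in delta]


# Two sample plaintexts that differ only in the byte at index 2.
x = bytes(range(16))
x_prime = bytes([0x00, 0x01, 0xFF, 0x03]) + bytes(range(4, 16))

delta_x = xor_blocks(x, x_prime)   # exact byte differences
pattern = truncate(delta_x)        # positions of the differences only
```

The truncated pattern keeps the position of the difference but forgets which of the 255 non-zero values it takes, which is what shrinks the search space.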
However, some truncated differential characteristics may not be valid (i.e., there do not exist actual byte values corresponding to these difference positions) because some constraints at the byte level are relaxed when reasoning on difference positions. Hence, the optimal related-key differential characteristic problem is usually solved in two steps [BN10, FJP13]. In Step 1, every differential byte is abstracted by a Boolean variable that indicates whether there is a difference or not at this position, and we search for all truncated differential characteristics. Then, for each of these truncated differential characteristics, Step 2 aims at deciding whether it is valid (i.e., whether it is possible to find actual byte values for every Boolean variable) and, if it is valid, at finding the actual byte values that maximize the probability.

Two main approaches have been proposed to solve the optimal related-key differential characteristic problem on AES: a graph traversal approach [FJP13], and a Branch & Bound approach [BN10]. The approach of [FJP13] requires about 60 GB of memory when the key has 128 bits, and it has not been extended to larger keys. The approach of [BN10] only takes several megabytes of memory, but solving Step 1 requires several days of computation when the key has 128 bits, and several weeks when the key has 192 bits. Of course, each of these problems must be solved only once, and CPU time is not the main issue provided that it is "reasonable". However, during the process of designing new ciphers, this evaluation sometimes needs to be repeated several times. Hence, even though not crucial, a good CPU time is a desirable feature.

Another point that should not be neglected is the time needed to design and implement these approaches: to ensure that the computation is completed within a "reasonable" amount of time, it is necessary to reduce branching by introducing clever reasoning. Of course, this hard task is also likely to introduce bugs, and checking the optimality of the computed solutions may not be so easy. Finally, reproducibility may also be an issue. Other researchers may want to adapt these algorithms to other problems, with some common features but also some differences, and this may again be very difficult and time-consuming.

1.1. Declarative approaches for cryptanalytic problems

An appealing alternative to dedicated approaches is to use generic solvers such as Integer Linear Programming (ILP), Boolean satisfiability (SAT) or Constraint Programming (CP). In this case, we have to design a model of the problem (by means of linear inequalities for ILP, Boolean clauses for SAT, and constraints for CP), and this model is automatically solved by generic solvers.

ILP. Many symmetric cryptanalysis problems on different ciphers have been tackled with ILP.
For example, ILP has been used to find optimal (related-key) differential characteristics against bit-oriented block ciphers such as SIMON, PRESENT or LBlock [SHW+14], to search for impossible differentials [ST17], and to provide security bounds in the design of the new lightweight block cipher SKINNY [BJK+16]. ILP models can only contain linear inequalities. Therefore, it is necessary to transform non-linear operators into sets of linear inequalities. However, the resulting ILP model may not scale well. For example, in [AST+17], the authors introduce an ILP model of the non-linear part of a block cipher (such as the SubBytes operation used in AES). They present some results on SKINNY-128 where the time required to find differential paths is about 15 days.

SAT. SAT and Satisfiability Modulo Theories (SMT) have also been used to solve cryptanalysis problems. For example, in [MP13], Mouha and Preneel use a SAT solver to search for optimal differential characteristics for a type of ciphers called ARX, in which the non-linear components are different from the SubBytes operation used in AES. In [SWW17], Sun et al. propose a SAT/SMT model to search for division properties on ARX and word-based block ciphers, and they show on SHACAL-2 that it scales better than ILP. In [KLT15], the authors analyze the general class of functions underlying the Simon block cipher: they derive SAT/SMT models for the exact differential and linear behaviour of Simon-like round functions, and they use SAT/SMT solvers to compute optimal differential and linear characteristics for Simon.

The SAT solver used in [KLT15] is CryptoMiniSat [SNC09], which is well suited to solve cryptanalysis problems. In particular, it introduces xor-clauses to easily model xor operations, and it uses Gaussian elimination to efficiently propagate these xor-clauses. These xor-clauses can be used to model bitwise xor operations in Step 2. However, they cannot be used to model these operations in Step 1. Indeed, while 1 ⊕ 1 = 0 at the bitwise level (during Step 2), this is no longer true during Step 1 because the xor of two bytes different from 0 may be either 0 or a value different from 0, depending on whether the two bytes have the same value or not (see Section 3.2).

Similarly to ILP, non-linear operations such as SubBytes are not straightforward to model by means of clauses. In [Laf18], Lafitte shows how to encode a relation associated with a non-linear operation into a set of clauses and, in [SWW18], Sun et al. show how to reduce the number of clauses of this kind of encoding by using the same approach as in [AST+17].

CP.
CP models have been used in [LCM+17] for modeling algebraic side-channel attacks on AES and in [RSM+11] to design the non-linear part of a block cipher with good properties. Recently, we have introduced CP models for finding related-key differential characteristics for AES [GMS16], Midori [GL16], and SKINNY [SGL+17]. This preliminary work has shown us that off-the-shelf CP solvers are able to solve these problems quicker than dedicated approaches. In particular, the CP model of [GMS16] for Step 1 of AES when the key has 128 bits is solved in a few hours by CP solvers such as Gecode [Gec06], Choco [PFL16], or Chuffed [CS14]. This CP model has allowed us to find a better solution than the one claimed to be optimal in [FJP13, BN10] for 4 rounds of AES when the key has 128 bits.

However, the CP model for Step 1 described in [GMS16] is not able to solve all instances of AES within a reasonable amount of time. In [GLMS17, GLMS18], we show how to decompose the hardest instances into independent sub-problems which can be solved in parallel, thus allowing us to solve all AES instances. However, computing optimal differential characteristics with the approach of [GLMS17, GLMS18] is still very time-consuming. For example, the hardest instance (for 10 rounds of AES-192) needs more than two months of CPU time to be solved.

1.2. Contributions and outline of the paper

In this paper, we introduce a new CP model for Step 1 and a new decomposition of the solution process in two steps that allow us to solve every instance of AES, for all key lengths, in a few hours: all instances but two are solved in less than one hour, while the two hardest instances are solved in 5 hours and 15 hours, respectively.

In Section 2, we describe the context of this work: the AES ciphering process, the AES related-key differential characteristic problem, the two step solving process that is used in [FJP13, BN10, GMS16], how to use differential characteristics to design related-key attacks, and the basic principles of CP.

In Section 3, we describe the basic CP model of [MSR14] for solving Step 1. This model is a straightforward encoding which associates a constraint between Boolean variables with every AES operation. A weakness of this model is that the xor constraint at the Boolean level is a poor abstraction of the xor operation at the byte level. As a consequence, there is a huge number of truncated differential characteristics, most of which are discarded at Step 2 because they are not valid.

In Section 4, we introduce a first contribution of this paper, which is a new CP model for solving Step 1.
The idea is to generate new xor equations by combining initial equations coming from the key schedule, in a preprocessing step, and to use these new equations to filter the search space. We also tighten the definition of the xor constraint between Boolean variables by reasoning on differences between bytes. We compare our new model with the model introduced in [GMS16], and we show that it is faster, and that it drastically reduces the number of invalid truncated differential characteristics for the most challenging instance.


In Section 5, we describe the CP model for solving Step 2, which is a straightforward extension of the model of [GMS16] to other key lengths than 128 bits. This model is able to quickly find the optimal byte-consistent solution, given a truncated differential characteristic. However, for some instances, there are many truncated differential characteristics and in this case finding the optimal solution for all truncated differential characteristics becomes time-consuming.

In Section 6, we introduce a second contribution of this paper, which is a new decomposition of the solution process in two steps which allows us to solve the full problem much quicker.

In Section 7, we give an overview of the new cryptanalysis results that have been derived from the new related-key differential characteristics computed thanks to our new CP models, and we describe some further work.

2. Background

We start by presenting the AES encryption scheme and introducing the notations used in our modeling. Then, we describe the AES related-key differential characteristic problem, the two step solving process generally used to solve it, and how differential characteristics are used to mount related-key attacks. Finally, we recall some basic principles of constraint programming.

Throughout the paper, we note [i, j] the set of all integer values ranging from i to j, i.e., [i, j] = {v ∈ N : i ≤ v ≤ j}.

2.1. Description of AES

AES ciphers blocks of length n = 128 bits, where each block is seen as a 4 × 4 matrix of bytes, where a byte is a sequence of 8 bits. Given a 4 × 4 matrix of bytes M, we note M[j][k] the byte at row j ∈ [0, 3] and column k ∈ [0, 3].

The length of keys is l ∈ {128, 192, 256} bits, and we note AES-l the AES with keys of length l. The only difference when changing the length l of the key is in the KeySchedule operator. We illustrate AES and describe our cryptanalytic models for the case where l = 128, such that the key is a 4 × 4 matrix of bytes.
We describe the cases where l ∈ {192, 256} in the Appendix.

Like most of today's block ciphers, AES is an iterative process which is composed of r rounds. The number of rounds r depends on the key length: r = 10 (resp. 12 and 14) when l = 128 (resp. 192 and 256). An AES round has an SPN (Substitution-Permutation Network) structure and is described in Fig. 1 for l = 128. It uses the following five operations, which are described in detail below: SubBytes (SB), ShiftRows (SR), MixColumns (MC), AddRoundKey (ARK), and KeySchedule (KS). Before the first round, AddRoundKey is applied on the original plaintext X and the initial key K_0 = K to obtain X_0 = ARK(X, K_0). Then, for each round i ∈ [0, r − 1]: SubBytes is applied on X_i to obtain SX_i = SB(X_i); ShiftRows is applied on SX_i to obtain Y_i = SR(SX_i); MixColumns is applied on Y_i to obtain Z_i = MC(Y_i) (except during the last round where MixColumns is omitted so that Z_{r−1} = Y_{r−1}); KeySchedule is applied on K_i to obtain K_{i+1} = KS(K_i); and AddRoundKey is applied on Z_i and K_{i+1} to obtain X_{i+1} = ARK(Z_i, K_{i+1}). The final ciphertext is X_r.

Figure 1: AES ciphering process for 128-bit keys. Each 4 × 4 array represents a group of 16 bytes. Before the first round, X_0 is obtained by applying ARK on the initial text X and the initial key K = K_0. Then, for each round i ∈ [0, r − 1], SB is applied on X_i to obtain SX_i, SR is applied on SX_i to obtain Y_i, MC is applied on Y_i to obtain Z_i (except during the last round when i = r − 1), KS is applied on K_i to obtain K_{i+1}, and ARK is applied on K_{i+1} and Z_i to obtain X_{i+1}. The ciphertext is X_r.

SubBytes. SB is a non-linear permutation which is applied on each byte of X_i separately, according to a look-up table (called S-box) S : [0, 255] → [0, 255], i.e.,

∀i ∈ [0, r − 1], ∀j, k ∈ [0, 3], SX_i[j][k] = S(X_i[j][k]).

ShiftRows. SR rotates the second (resp. third and fourth) row of SX_i to the left by one (resp. two and three) byte positions, i.e.,

∀i ∈ [0, r − 1], ∀j, k ∈ [0, 3], Y_i[j][k] = SX_i[j][(k + j)%4]

where % is the modulo operator that returns the remainder of the Euclidean division.

MixColumns. MC multiplies each column of the input matrix Y_i by a fixed 4 × 4 matrix M to obtain each corresponding output column in the matrix Z_i, i.e.,

\[ \forall i \in [0, r-2],\ \forall j, k \in [0, 3],\quad Z_i[j][k] = \bigoplus_{x=0}^{3} M[j][x] \cdot Y_i[x][k] \]

where · is a finite field multiplication operator. For the last round, MC is not applied and we have Z_{r−1} = Y_{r−1}.

Maximum Distance Separable property. The coefficients of the matrix M used by MC have been chosen to ensure a so-called MDS (Maximum Distance Separable) property. This property comes from the theory of error correcting codes [MS77] and guarantees the maximal possible diffusion property [Sin06]. For AES, it implies that, for each round i ∈ [0, r − 2] and each column k ∈ [0, 3], the number of bytes that are different from zero in column k of Y_i or Z_i is either equal to zero or strictly greater than four, i.e.,

\[ \forall i \in [0, r-2],\ \forall k \in [0, 3],\quad \Big( \sum_{j=0}^{3} (Y_i[j][k] \neq 0) + (Z_i[j][k] \neq 0) \Big) \in \{0, 5, 6, 7, 8\}. \]

This property also holds when xoring different columns. More precisely, let i_1, i_2 ∈ [0, r − 2] be two round numbers, and k_1, k_2 ∈ [0, 3] be two column numbers. For every row j ∈ [0, 3], we have

\[ Z_{i_1}[j][k_1] \oplus Z_{i_2}[j][k_2] = \Big( \bigoplus_{x=0}^{3} M[j][x] \cdot Y_{i_1}[x][k_1] \Big) \oplus \Big( \bigoplus_{x=0}^{3} M[j][x] \cdot Y_{i_2}[x][k_2] \Big) = \bigoplus_{x=0}^{3} M[j][x] \cdot \big( Y_{i_1}[x][k_1] \oplus Y_{i_2}[x][k_2] \big) \]

Therefore, the MDS property also holds for the result of the xor of two different columns, i.e.,

\[ \forall i_1, i_2 \in [0, r-2],\ \forall k_1, k_2 \in [0, 3],\quad \Big( \sum_{j=0}^{3} (dY[j] \neq 0) + (dZ[j] \neq 0) \Big) \in \{0, 5, 6, 7, 8\} \]

where dY[j] = Y_{i_1}[j][k_1] ⊕ Y_{i_2}[j][k_2] and dZ[j] = Z_{i_1}[j][k_1] ⊕ Z_{i_2}[j][k_2].
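The MixColumns formula and the MDS property can be checked on concrete values with a small Python sketch. The text above does not list the coefficients of M or the reduction polynomial of the finite field; the values used below are those of the AES specification (FIPS-197), and the helper names `gf_mul`, `mix_column` and `weight` are hypothetical.

```python
# Fixed MixColumns matrix of AES (FIPS-197); M[j][x] is row j, column x.
M = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:       # reduce as soon as the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return result


def mix_column(col):
    """Z[j] = xor over x of M[j][x] * Y[x], for one 4-byte column Y."""
    out = []
    for j in range(4):
        acc = 0
        for x in range(4):
            acc ^= gf_mul(M[j][x], col[x])
        out.append(acc)
    return out


def weight(col):
    """Number of bytes different from zero in a column."""
    return sum(1 for b in col if b != 0)


y = [0xDB, 0x13, 0x53, 0x45]
z = mix_column(y)                   # output column for the input column y
active = weight(y) + weight(z)      # MDS: this sum is 0 or at least 5
```

Because `mix_column` is linear with respect to xor, applying it to the xor of two columns gives the xor of their images, which is why the MDS property carries over to xors of columns.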

AddRoundKey. Before the first round, ARK performs a xor between the initial plaintext X and the initial key K_0 to obtain X_0:

∀j, k ∈ [0, 3], X_0[j][k] = X[j][k] ⊕ K_0[j][k].

Then, at each round i, ARK performs a xor between Z_i and subkey K_{i+1} to obtain X_{i+1}, i.e.,

∀i ∈ [0, r − 1], ∀j, k ∈ [0, 3], X_{i+1}[j][k] = Z_i[j][k] ⊕ K_{i+1}[j][k].

KeySchedule. KS computes the subkey K_{i+1} of each round i ∈ [0, r − 1] from the initial key K. Its exact definition depends on the length l of the key. We describe it for l = 128 (see the Appendix for keys of length l ∈ {192, 256}). The subkey at round 0 is the initial key, i.e., K_0 = K. For each round i ∈ [0, r − 1], the subkey K_{i+1} is generated from the previous subkey K_i as follows:

• The first column of K_{i+1} is obtained from the first and last columns of K_i in two steps. First, we apply the SubBytes operation on all bytes of the last column of K_i. We note SK_i[j][3] the resulting byte for row j, i.e.,
  ∀i ∈ [0, r − 1], ∀j ∈ [0, 3], SK_i[j][3] = S(K_i[j][3]).
  Second, we rotate up all bytes in SK_i by one byte position, and then xor them with those in the first column of K_i. Moreover, a constant c_i is xored with the byte at row 0:
  ∀i ∈ [0, r − 1], K_{i+1}[0][0] = SK_i[1][3] ⊕ K_i[0][0] ⊕ c_i
  ∀i ∈ [0, r − 1], ∀j ∈ [1, 3], K_{i+1}[j][0] = SK_i[(j + 1)%4][3] ⊕ K_i[j][0].

• For the last three columns k ∈ [1, 3], we simply perform xors:
  ∀i ∈ [0, r − 1], ∀j ∈ [0, 3], ∀k ∈ [1, 3], K_{i+1}[j][k] = K_{i+1}[j][k − 1] ⊕ K_i[j][k].

2.2. AES related-key differential characteristics

To mount related-key attacks, we track differences through the ciphering process, where differences are obtained by applying the xor operator. We note δA the differential matrix obtained by applying the xor operator on two matrices A and A', and for every row j and column k, δA[j][k] = A[j][k] ⊕ A'[j][k] is called a differential byte. The set of all differential bytes is denoted diffBytes_l. Among these differential bytes, some of them pass through S-boxes, and we note Sboxes_l this set. When the key length is l = 128, these two sets are defined by:

diffBytes_128 = {δX[j][k], δX_r[j][k], δK_r[j][k] : j, k ∈ [0, 3]} ∪ {δK_i[j][k], δSK_i[j][3], δX_i[j][k], δSX_i[j][k], δY_i[j][k], δZ_i[j][k] : i ∈ [0, r − 1], j, k ∈ [0, 3]}

Sboxes_128 = {δK_i[j][3], δX_i[j][k] : i ∈ [0, r − 1], j, k ∈ [0, 3]}

See the Appendix for the definition of diffBytes_l and Sboxes_l when l ∈ {192, 256}.

The goal is to find an optimal related-key differential characteristic, i.e., a byte value for every differential byte in diffBytes_l such that the probability of observing δX_r given δX and δK_0 is maximal. This problem would be easy to solve (and the probability would be equal to 1) if every operation applied during the ciphering process were linear. However, AES is composed of three linear operations (SR, MC, and ARK), and one non-linear operation (SB). Moreover, the key schedule KS combines linear xor operations with the non-linear SB operation.

Propagation of differences by the linear operations. The linear operators only move differences to other places, and we can deterministically compute the output difference of a linear operator given its input difference. For SR and MC, given two matrices A_1 and A_2, we have SR(A_1) ⊕ SR(A_2) = SR(A_1 ⊕ A_2) and MC(A_1) ⊕ MC(A_2) = MC(A_1 ⊕ A_2). This implies that the output difference of SR is δY_i = SR(δSX_i) when the input difference is δSX_i, and the output difference of MC is δZ_i = MC(δY_i) when the input difference is δY_i. Similarly, given four matrices A_1, A_2, A_3, and A_4, we have ARK(A_1, A_2) ⊕ ARK(A_3, A_4) = ARK(A_1 ⊕ A_3, A_2 ⊕ A_4). Therefore, the output difference of ARK is δX_{i+1} = ARK(δZ_i, δK_{i+1}) when the input differences are δZ_i and δK_{i+1}.

Propagation of differences by the non-linear operation SB.
This property no longer holds for SB, which applies to a byte B an S-box transformation S(B) such that, given two bytes B and B', S(B ⊕ B') is not necessarily equal to S(B) ⊕ S(B'). Therefore, given an input differential byte δB_in, we cannot deterministically compute the output difference after passing through S-boxes. We can only compute probabilities. For every couple of differential bytes (δB_in, δB_out) ∈ [0, 255]^2, we can compute the probability that the input difference δB_in becomes the output difference δB_out, which is the proportion of byte couples (B, B') such that δB_in = B ⊕ B' and δB_out = S(B) ⊕ S(B'). More precisely, this probability is denoted p_S(δB_out | δB_in) and is defined by

\[ p_S(\delta B_{out} \mid \delta B_{in}) = \frac{\#\{(B, B') \in [0, 255]^2 \mid (B \oplus B' = \delta B_{in}) \wedge (S(B) \oplus S(B') = \delta B_{out})\}}{256} \]

For the AES S-box, most of the time the probability p_S is equal to 0/256 or 2/256, and rarely to 4/256 [DR13]. The only case where p_S is equal to 1 is when there is no difference, i.e., p_S(0|0) = 1: there is no difference in the output (δB_out = 0) if and only if there is no difference in the input (δB_in = 0), since the S-box is bijective.

Probability of a valuation of differential bytes. To compute the probability of a given valuation of all differential bytes, we first have to check that all linear operators are satisfied by the valuation. If this is not the case, then the probability is equal to 0. Otherwise, it is equal to the product of the transition probabilities of all differential bytes that pass through S-boxes, i.e.,

\[ p = \prod_{\delta B \in Sboxes_l} p_S(\delta SB \mid \delta B) \qquad (1) \]
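The transition probabilities p_S and the product of Eq. (1) can be reproduced with a short Python sketch. It rebuilds the AES S-box from its algebraic definition in FIPS-197 (inversion in GF(2^8) followed by an affine map), which is an assumption taken from the AES specification rather than from the text above. The names `build_sbox`, `build_ddt` and `characteristic_probability` are hypothetical, and the check that linear operators are satisfied is deliberately left out.

```python
from fractions import Fraction


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254 (0 is mapped to 0)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(x: int, s: int) -> int:
    """Rotate a byte to the left by s bit positions."""
    return ((x << s) | (x >> (8 - s))) & 0xFF


def build_sbox():
    """AES S-box: field inversion followed by the FIPS-197 affine transformation."""
    sbox = []
    for a in range(256):
        v = gf_inv(a)
        sbox.append(v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63)
    return sbox


def build_ddt(sbox):
    """ddt[d_in][d_out] = number of bytes B with S(B) xor S(B xor d_in) = d_out."""
    ddt = [[0] * 256 for _ in range(256)]
    for d_in in range(256):
        for b in range(256):
            ddt[d_in][sbox[b] ^ sbox[b ^ d_in]] += 1
    return ddt


def characteristic_probability(ddt, transitions):
    """Product of p_S(d_out | d_in) over the given S-box transitions, as in Eq. (1)."""
    p = Fraction(1)
    for d_in, d_out in transitions:
        p *= Fraction(ddt[d_in][d_out], 256)
    return p


SBOX = build_sbox()
DDT = build_ddt(SBOX)

# One inactive S-box (0 -> 0) and one active S-box with the rare 4/256 transition.
best_out = DDT[0x01].index(4)
p = characteristic_probability(DDT, [(0x00, 0x00), (0x01, best_out)])
```

Using `Fraction` keeps the probabilities exact; an inactive S-box contributes a factor 1, so only active S-boxes lower the product.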

We refer the reader to [BS91] for more details on differential characteristics.

2.3. Two step solving process

The three approaches described in [BN10], [FJP13], and [GMS16] compute optimal differential characteristics in two steps. In a first step, each differential byte δB ∈ diffBytes_l is abstracted with a Boolean variable ΔB such that ΔB = 0 ⇔ δB = 0 and ΔB = 1 ⇔ δB ∈ [1, 255]. In other words, each Boolean variable assigned to 1 gives the position of a difference. The goal is to find all truncated differential characteristics (i.e., all Boolean solutions) that minimize the number of differences passing through S-boxes (i.e., that minimize Σ_{δB ∈ Sboxes_l} ΔB). We say that an S-box δB ∈ Sboxes_l is active whenever there is a difference passing through it, i.e., ΔB = 1.

In a second step, for each truncated differential characteristic, we search for actual byte values that satisfy the AES operations and that maximize the probability p defined in Eq. (1). When a Boolean variable ΔB is equal to 0 in the truncated differential characteristic, there is only one possible value for δB, which is 0. However, when ΔB = 1, there are 255 possible values for δB. Note that some truncated differential characteristics are not valid and cannot be transformed into byte solutions because truncated differential characteristics only satisfy Boolean abstractions of the actual AES operations. These characteristics are said to be byte-inconsistent.

Algorithm 1: Computation of optimal differential characteristics

    Input: The size l of the key and the number r of rounds
    Output: An optimal differential characteristic c*
     1  begin
     2      v* ← Step1-opt(l, r)
     3      v ← v*
     4      c* ← null
     5      repeat
     6          T ← Step1-enum(l, r, v)
     7          for each truncated differential characteristic t ∈ T do
     8              c ← Step2(l, r, t)
     9              if c ≠ null and (c* = null or p(c) > p(c*)) then c* ← c
    10          v ← v + 1
    11      until c* ≠ null and p(c*) ≥ 2^{-6v}
    12      return c*

The complete procedure to find optimal differential characteristics is described in Algorithm 1. It first calls function Step1-opt to compute the minimal number v* of active S-boxes in a truncated differential characteristic (line 2), and it initializes v to this minimal number of active S-boxes. Then, it calls function Step1-enum to compute the set T of all truncated differential characteristics such that the number of active S-boxes is equal to v (line 6). For each truncated differential characteristic t ∈ T, it calls function Step2 (line 8): if t is not byte-consistent, Step2 returns null; otherwise, it returns the optimal differential characteristic c associated with t. If this optimal differential characteristic has a greater probability than c*, then it updates c* (line 9). Hence, when exiting from the loop lines 7-9, if c* is equal to null (because all truncated differential characteristics in T are byte-inconsistent), we have to increment v and iterate again on lines 6 to 10; otherwise, p(c*) is the largest probability with v active S-boxes and we have 2^{-7v} ≤ p(c*) ≤ 2^{-6v} (because each active S-box has a probability which is equal to 2^{-7} or 2^{-6}). However, it may be possible that p(c*) is not maximal: there may exist a solution with higher probability with v + 1 active S-boxes if p(c*) < 2^{-6(v+1)}. In this case, we have to increment v and iterate again on lines 6 to 10 to check whether there exists a higher probability with one more active S-box.

In this paper, we describe CP models for implementing functions Step1-opt, Step1-enum, and Step2. Step1-opt and Step1-enum use the same model, which is described in Sections 3 and 4. The model of Step2 is described in Section 5. In Section 6, we introduce a new decomposition in two steps that speeds up the solution process.

2.4. From differential characteristics to related-key attacks

Let us first clarify the relation between the differential characteristic c* computed by Algo. 1 and an optimal differential. What cryptanalysts would like to have is an optimal differential, i.e., input and output differences that maximize the probability of observing the output difference given the input difference. Differential characteristics have been introduced to simplify the problem by specifying all intermediate differences (after each step of the ciphering process). Several differential characteristics may have the same input and output differences and only differ on intermediate differences. Hence, the probability of a differential is the sum of the probabilities of all differential characteristics that share its input and output differences. Thus, finding the differential characteristic with the best probability gives us a lower bound on the expected differential probability. This notion of expected differential probability evaluates the behavior of the differential averaged over the whole key space, assuming the independence of intermediate probabilities, the stochastic equivalence of the keys and other simplifying assumptions such as those coming from the related-key setting, for example. In summary, the probability of the differential characteristic computed by Algo. 1 (denoted p) is an approximation for a lower bound of the expected differential probability.
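For reference, the control flow of Algorithm 1, which produces the characteristic c* discussed above, can be sketched in Python. This is only an illustration under stated assumptions: the three solver functions and the probability function are passed in as callables, and the stubs used in the usage example (`toy_opt`, `toy_enum`, `toy_step2`, `toy_prob`) are hypothetical toy data, not AES results.

```python
from fractions import Fraction


def optimal_characteristic(l, r, step1_opt, step1_enum, step2, prob):
    """Loop of Algorithm 1; step2 returns None for byte-inconsistent inputs."""
    v = step1_opt(l, r)              # minimal number of active S-boxes
    best = None
    while True:
        for t in step1_enum(l, r, v):
            c = step2(l, r, t)
            if c is not None and (best is None or prob(c) > prob(best)):
                best = c
        v += 1
        # With v active S-boxes the probability is at most 2^(-6v), so a
        # larger v cannot improve on best once prob(best) >= 2^(-6v).
        if best is not None and prob(best) >= Fraction(1, 2 ** (6 * v)):
            return best


# Hypothetical toy stubs: the only truncated characteristic with 2 active
# S-boxes is byte-inconsistent, so the loop must move on to 3 active S-boxes.
def toy_opt(l, r):
    return 2


def toy_enum(l, r, v):
    return {2: ["a"], 3: ["c", "d"]}.get(v, [])


def toy_step2(l, r, t):
    table = {"c": ("c", Fraction(1, 2 ** 21)), "d": ("d", Fraction(1, 2 ** 19))}
    return table.get(t)              # "a" is absent, hence None


def toy_prob(c):
    return c[1]


result = optimal_characteristic(128, 3, toy_opt, toy_enum, toy_step2, toy_prob)
```

As in the pseudo-code, the loop never terminates if no byte-consistent characteristic exists at all; a practical implementation would presumably bound v.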
Let us now explain how the differences δX, δK, and δX_r of the differential characteristic c* computed by Algo. 1 may be used to construct a so-called distinguisher. To this aim, let us assume that we can query an oracle: given a plaintext X, this oracle returns C = Enc_K(X) such that C is either the result of ciphering X by r rounds of the AES with an unknown and randomly chosen key K, or the output of a random permutation. The attacker randomly chooses a set S = {X^1, ..., X^m} of m plaintexts. For each plaintext X^i ∈ S, she asks the oracle to compute C^i = Enc_K(X^i) and C'^i = Enc_{K'}(X'^i) where X'^i = X^i ⊕ δX, and K' = K ⊕ δK. If the oracle implements the AES, then she can expect to find a pair (X^i, X'^i) such that C^i ⊕ C'^i = δX_r after trying roughly 1/p random input pairs. As p is much larger than 2^{-n}, this is much sooner than expected for a randomly selected permutation. In other words, the differential characteristic c* allows the attacker to know if the Enc function of the oracle is r rounds of the AES or a random permutation: we have built a distinguisher on r rounds, i.e., we have exhibited a particular property that distinguishes a random permutation from the AES with a complexity equal to O(1/p).

Finally, this distinguisher may be used to mount a related-key differential attack on r + 1 rounds. This last step is difficult to explain in a few lines. The basic idea is to add an extra round at the end of the r rounds and to recover some key bits of the subkey K_{r+1} by exploiting the differences in the ciphertexts. We refer the reader to [BS91] for more details on related-key attacks.

2.5. Reminders on Constraint Programming

We briefly recall basic principles of CP and we refer the reader to [RBW06] for more details.

CP is used to solve Constraint Satisfaction Problems (CSPs). A CSP is defined by a triple (X, D, C) such that X is a finite set of variables, D is a function that maps every variable x_i ∈ X to its domain D(x_i) (that is, the finite set of values that may be assigned to x_i), and C is a set of constraints (that is, relations between some variables which restrict the set of values that may be assigned simultaneously to these variables).

Constraints may be defined in extension, by listing all allowed (or forbidden) tuples of the relation, or in intension, by using mathematical operators. Let us consider for example a CSP with X = {x_1, x_2, x_3} such that D(x_1) = D(x_2) = D(x_3) = {0, 1}, and let us consider a constraint that ensures that the sum of the variables in X is different from 1.
This constraint may be defined by a table constraint that enumerates all allowed tuples: (x1, x2, x3) ∈ {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)}. Conversely, it may be defined by enumerating all forbidden tuples: (x1, x2, x3) ∉ {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. Finally, it may be defined by using arithmetic operators: x1 + x2 + x3 ≠ 1.
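As a quick sanity check, the three definitions of this constraint can be compared by enumeration. The following is a minimal Python sketch (the variable names are illustrative and not taken from the paper) that builds the relation from the arithmetic definition and compares it with the two tables.

```python
from itertools import product

domain = (0, 1)
all_tuples = set(product(domain, repeat=3))

# Definition in extension: the allowed tuples listed in the text.
allowed = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)}
# Definition in extension: the forbidden tuples listed in the text.
forbidden = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
# Definition in intension: the arithmetic relation x1 + x2 + x3 != 1.
by_formula = {t for t in all_tuples if sum(t) != 1}

# The three definitions describe the same relation.
same_relation = (allowed == by_formula) and (by_formula == all_tuples - forbidden)
print(same_relation)
```

The table forms scale poorly with the number of variables (2^n tuples to consider), which is why sum-like constraints are usually stated in intension.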

Solving a CSP involves assigning values to variables such that all constraints are satisﬁed. More formally, an assignment A is a function which maps each variable xi ∈ X to a value A(xi ) ∈ D(xi ). An assignment A satisﬁes (resp. violates) a constraint c ∈ C if the tuple deﬁned by the values assigned to the variables of c in A belongs (resp. does not belong) to the relation deﬁned by c. An assignment is consistent (resp. inconsistent) if it satisﬁes all the constraints (resp. violates some constraints) of the CSP. A solution of a CSP is a consistent assignment. An objective function may be added to a CSP, thus deﬁning a Constrained Optimization Problem (COP). This objective function is deﬁned on some variables of X and the goal is to ﬁnd the solution that optimizes (minimizes or maximizes) the objective function. CP languages provide high-level features to deﬁne CSPs and COPs in a declarative way. Then, these problems are solved by generic constraint solvers which are usually based on a systematic exploration of the search space: Starting from an empty assignment, they incrementally extend a partial consistent assignment by choosing a non-assigned variable and a consistent value for it until either the current assignment is complete (a solution has been found) or the current assignment cannot be extended without violating constraints (the search must backtrack to a previous choice point and try another extension). To reduce the search space, this exhaustive exploration of the search space is combined with constraint propagation techniques: Each time a variable is assigned to a value, constraints are propagated to ﬁlter the domains of the variables that are not yet assigned, i.e., to remove values that are not consistent with respect to the current assignment. When constraint propagation removes all values from a domain, the search must backtrack. 
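The systematic exploration described above can be sketched on the small example CSP (three 0/1 variables whose sum must differ from 1). This is a toy illustration under our own naming, not the algorithm of any particular solver: the consistency test is specific to this single constraint, whereas real solvers rely on generic propagators.

```python
def can_still_hold(partial, n_vars):
    """Check whether x1 + ... + xn != 1 is still satisfiable by some completion."""
    lo = sum(partial)                     # every unassigned variable set to 0
    hi = lo + (n_vars - len(partial))     # every unassigned variable set to 1
    return not (lo == 1 and hi == 1)      # fails only if the sum is forced to 1

def search(n_vars, partial=()):
    """Depth-first search that extends a consistent partial assignment."""
    if len(partial) == n_vars:
        yield partial                     # complete assignment: a solution
        return
    for value in (0, 1):                  # value ordering: try 0 first
        candidate = partial + (value,)
        if can_still_hold(candidate, n_vars):
            yield from search(n_vars, candidate)
        # otherwise: backtrack and try the next value

solutions = list(search(3))
print(solutions)
```

Variables are assigned in index order and the value 0 is tried first; these two choices are exactly what variable and value ordering heuristics control.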
Different levels of constraint propagation may be considered; some enforce stronger consistencies than others, i.e., they remove more values from the domains. A widely used propagation level enforces Generalized Arc Consistency (GAC): a constraint c is GAC if for each variable x_i of c and each value v_i ∈ D(x_i), there exists a tuple t in the Cartesian product of the domains of the variables of c such that t satisfies c and x_i is assigned to v_i in t (t is called a support of x_i = v_i). Let us consider for example the constraint x1 + x2 + x3 ≠ 1. If D(x1) = D(x2) = {0} and D(x3) = {0, 1}, then the constraint is not GAC because there is no support of x3 = 1 (the only tuple in D(x1) × D(x2) × D(x3) that satisfies the constraint is (0, 0, 0)). Hence, to enforce GAC, we must remove 1 from D(x3).
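The GAC example above can be reproduced with a naive filtering routine that keeps a value only if it appears in at least one satisfying tuple. This is a sketch for a single constraint, with hypothetical function names; practical propagators avoid enumerating the full Cartesian product.

```python
from itertools import product

def enforce_gac(domains, constraint):
    """Return filtered domains in which every remaining value has a support."""
    filtered = [set() for _ in domains]
    for t in product(*domains):           # every tuple of the Cartesian product
        if constraint(t):                 # t is a support for each of its values
            for i, v in enumerate(t):
                filtered[i].add(v)
    return filtered

domains = [{0}, {0}, {0, 1}]
result = enforce_gac(domains, lambda t: sum(t) != 1)
print(result)  # [{0}, {0}, {0}]
```

If some filtered domain came back empty, the current partial assignment would have no consistent extension and the search would backtrack.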

A key point to speed up the solving process of CSPs is to define the order in which variables are assigned, and the order in which values are assigned to these variables, when building the search tree. CP languages allow the user to specify this through variable and value ordering heuristics.

3. Basic CP model for Step 1

Functions Step1-opt and Step1-enum used in Algorithm 1 for computing optimal differential characteristics share the same CP model. The only difference is in the goal of the solving process: in Step1-opt the goal is to search for a solution that optimizes a given variable called objStep1, whereas in Step1-enum the goal is to enumerate all solutions when the variable objStep1 is assigned to a given value. In this section, we describe a first CP model for Step1-opt and Step1-enum, called CPBasic, which has been introduced in [MSR14] and is derived in a straightforward way from the definition of the AES operations.

3.1. Variables of CPBasic

CPBasic does not associate a Boolean variable with every differential byte δB ∈ diffBytes_l. Indeed, during Step 2, the initial differential plaintext δX, the last round differential subkey δK_r, and the final differential ciphertext δX_r can be deterministically computed given the values of all other differential bytes:
• δX is obtained by xoring δX_0 and δK_0.
• δK_r is obtained by applying the key schedule to δK_{r−1} for AES-128, and to δK_{r−1} and δK_{r−2} for AES-192 and AES-256. Note that for the bytes δB that pass through S-boxes during this last round, we deterministically choose for δSB the value that maximizes p_S(δSB | δB).
• δX_r is obtained by xoring δZ_{r−1} and δK_r.

Hence, CPBasic associates a Boolean variable ΔB with every differential byte δB ∈ diffBytes_l \ {δX[j][k], δK_r[j][k], δX_r[j][k] : j, k ∈ [0, 3]}. Each Boolean variable ΔB is assigned to 0 if δB = 0, and to 1 otherwise. We also define an integer variable objStep1 which corresponds to the number of active S-boxes.
The domain of this variable is D(objStep1) = [1, l/6]. Indeed, the smallest possible value is 1 because we need to have at least one active S-box to have a differential characteristic (we forbid the obvious solution such that δX and δK only contain bytes set to 0, meaning that there is no difference in the initial plaintext and key). The largest possible value is l/6 because the highest probability p_S(δout | δin) to pass through the AES S-box is 2^{-6} = 4/256 when δin ≠ 0. Therefore, the greatest probability of a differential characteristic with objStep1 active S-boxes is 2^{-6·objStep1}. As we want a differential characteristic which is more efficient than the key exhaustive search, its probability must be greater than 2^{-l} (as explained in Section 2.4) and, therefore, objStep1 must not be larger than l/6.

3.2. Definition of a xor constraint

As many AES transformations use the xor operator, we define a constraint to model it at the Boolean level. During Step 2, when reasoning at the byte level, the xor operator is applied on each bit of each byte using the following table:

x   y   x⊕y
0   0   0
0   1   1
1   0   1
1   1   0

However, during Step 1, each byte is abstracted by a Boolean value indicating whether the byte is equal to 0 or not, and we must consider an abstraction of this operator. More precisely, let us consider three differential bytes δB1, δB2 and δB3 such that δB1 ⊕ δB2 = δB3 (or equivalently, δB1 ⊕ δB2 ⊕ δB3 = 0). If two of these bytes are equal to 0, then we know for sure that the third one is also equal to 0 (because 0 ⊕ 0 = 0). Also, if one differential byte is equal to 0 and a second one is different from 0, then we know for sure that the third one is different from 0 (because 0 ⊕ δB_i = δB_i). However, when two differential bytes are different from 0, then we cannot know if the third one is equal to 0 or not (because if the two bytes have the same value then the third one is equal to 0, whereas if they have different values the third one is different from 0). Hence, when abstracting differential bytes δB1, δB2 and δB3 with Boolean variables ΔB1, ΔB2 and ΔB3 (which only model the fact that there is a difference or not), we obtain the following definition of the xor constraint: XOR(ΔB1, ΔB2, ΔB3) ⇔ ΔB1 + ΔB2 + ΔB3 ≠ 1

(C1) objStep1 = Σ_{δB ∈ Sboxes_l} ΔB
(C2) ∀δB ∈ Sboxes_l, ΔSB = ΔB
(C3) ∀i ∈ [0, r − 2], ∀j, k ∈ [0, 3], XOR(ΔZ_i[j][k], ΔK_{i+1}[j][k], ΔX_{i+1}[j][k])
(C4) ∀i ∈ [0, r − 1], ∀j, k ∈ [0, 3], ΔY_i[j][k] = ΔSX_i[j][(j + k)%4]
(C5) ∀i ∈ [0, r − 2], ∀k ∈ [0, 3], Σ_{j=0}^{3} (ΔY_i[j][k] + ΔZ_i[j][k]) ∈ {0, 5, 6, 7, 8}
(C6) ∀j, k ∈ [0, 3], ΔZ_{r−1}[j][k] = ΔY_{r−1}[j][k]
(C7) ∀i ∈ [0, r − 1], ∀j ∈ [0, 3], XOR(ΔK_{i+1}[j][0], ΔK_i[j][0], ΔSK_i[(j + 1)%4][3])
(C8) ∀i ∈ [0, r − 1], ∀j ∈ [0, 3], ∀k ∈ [1, 3], XOR(ΔK_{i+1}[j][k], ΔK_{i+1}[j][k − 1], ΔK_i[j][k])

Figure 2: Constraints of the CPBasic model for Step 1 of AES-128

where + stands for the integer addition. In other words, either the three variables are false (no difference), or at least two variables are true (at least two differences).

3.3. Constraints of CPBasic

The constraints of CPBasic are listed in Fig. 2 and are described below.

Number of active S-boxes. Constraint (C1) ensures that objStep1 is equal to the number of active S-boxes, i.e., the number of ΔB variables that are associated with differential bytes in Sboxes_l and that are assigned to 1.

SubBytes. Constraint (C2) ensures that there is a difference in the output differential byte of an S-box iff there is a difference in the input differential byte. Indeed, SB has no effect on the presence/absence of differences during Step 1 because the S-box is bijective and (B ⊕ B′ = 0) ⇔ (S(B) ⊕ S(B′) = 0).

AddRoundKey. ARK is modeled by the xor constraint (C3).

ShiftRows. SR is modeled by the equality constraint (C4) that simply links the shifted bytes.

MixColumns. MC cannot be modeled at the Boolean level, as knowing where differences hold in Y_i is not enough to determine where they hold in Z_i. However, the MDS property is modeled by constraint (C5): it ensures that the number of differences in a same column of Y_i and Z_i is equal to 0 or greater than 4. Constraint (C6) models the fact that MC is not applied during the last round.

KeySchedule. KS is modeled by xor constraints (combined with a rotation of the bytes of SK for some columns). When l = 128, this corresponds to constraints (C7) and (C8). The constant c_i which is xored with SK_i[1][3] and K_i[0][0] when defining K_{i+1}[0][0] does not appear in (C7): it is canceled when xoring K_{i+1}[0][0] with K′_{i+1}[0][0] to define δK_{i+1}[0][0]. Indeed, we have:

δK_{i+1}[0][0] = K_{i+1}[0][0] ⊕ K′_{i+1}[0][0]
             = (SK_i[1][3] ⊕ K_i[0][0] ⊕ c_i) ⊕ (SK′_i[1][3] ⊕ K′_i[0][0] ⊕ c_i)
             = δSK_i[1][3] ⊕ δK_i[0][0]

See Appendix for the definition of the KS constraints when l ∈ {192, 256}.

3.4. Goal

The same model is used to solve two problems, i.e., Step1-opt and Step1-enum. The difference between these problems is in their goals:
• For Step1-opt, the goal is to find the minimum number of active S-boxes, i.e., minimize objStep1;
• For Step1-enum, the goal is to enumerate all solutions when objStep1 is assigned to a given value v. Each solution corresponds to a truncated differential characteristic with v active S-boxes.

3.5. Ordering heuristics

As the goal is to minimize the number of active S-boxes (for Step1-opt) or enumerate all solutions with a small number of active S-boxes (for Step1-enum), we define ordering heuristics as follows: first assign variables associated with bytes that pass through S-boxes (those in Sboxes_l), and first try to assign them to 0.
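The Boolean xor constraint of Section 3.2, on which (C3), (C7) and (C8) rely, can be checked exhaustively at the byte level. The sketch below (our own illustration, not code from the paper) enumerates every byte triple with b1 ⊕ b2 ⊕ b3 = 0, abstracts each byte to "zero / non-zero", and compares the observed Boolean patterns with the tuples whose sum differs from 1.

```python
from itertools import product

# Every byte triple with b1 ^ b2 ^ b3 == 0 is determined by (b1, b2).
observed = set()
for b1, b2 in product(range(256), repeat=2):
    b3 = b1 ^ b2
    # Boolean abstraction: 1 if the byte is non-zero, 0 otherwise.
    observed.add((int(b1 != 0), int(b2 != 0), int(b3 != 0)))

# Tuples accepted by XOR(d1, d2, d3), i.e. d1 + d2 + d3 != 1.
accepted = {t for t in product((0, 1), repeat=3) if sum(t) != 1}

abstraction_is_exact = (observed == accepted)
print(abstraction_is_exact)
```

The two sets coincide for a single equation: the abstraction loses no pattern and admits no impossible one. The weakness discussed in Section 3.6 only appears when several xor constraints interact, because the Boolean pattern (1, 1, 1) and the pattern (1, 1, 0) are both allowed without recording whether the two non-zero bytes are equal.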


3.6. Performance of CPBasic

CPBasic is complete in the sense that for any solution at the byte level (on δ variables), there exists a solution of CPBasic at the Boolean level (on Δ variables). However, experiments reported in [GMS16] have shown us that there is a huge number of solutions of CPBasic which are byte-inconsistent, i.e., which do not correspond to solutions at the byte level. For example, for AES-128, when the number of rounds is r = 3, the optimal solution of Step1-opt has objStep1 = 3 active S-boxes, and Step1-enum enumerates more than five hundred truncated differential characteristics with three active S-boxes. However, none of them is byte-consistent. Actually, the optimal byte-consistent truncated differential characteristic has five active S-boxes. In this case, most of the solving time is spent generating useless truncated differential characteristics which are discarded in Step 2.

4. CPXOR: New CP model for Step 1

In [GMS16], we have described another model for Step 1, called CPEQ, that drastically reduces the number of inconsistent truncated differential characteristics for AES-128. In this section, we describe a new model for Step 1, called CPXOR, and we show that it is more efficient than CPEQ. A weakness of CPBasic comes from the fact that the xor constraint between Boolean variables is a poor abstraction of the xor relation at the byte level, because whenever two xored bytes are different from zero, we cannot know whether the result is equal to zero or not. In Section 4.1, we show how to tighten this abstraction by generating new xor equations, obtained by combining initial equations coming from the key schedule. In Section 4.2, we describe the variables of CPXOR, and we introduce new Boolean variables that model differences at the byte level. In Section 4.3, we describe the constraints of CPXOR, and we show how to tighten the definition of the AES operations at the Boolean level by reasoning on differences at the byte level.
In Section 4.4 we discuss some implementation issues and in Section 4.5, we experimentally compare CPXOR with CPEQ.

4.1. Generation of new xors

Every subkey differential byte δK_i[j][k] either comes from the initial differential key δK, or is obtained by xoring two differential bytes according to the key schedule rules. Hence, the whole key schedule implies a set of 16·(r − 1) (resp. 16·(r − 1) − 8 and 16·(r − 2)) xor equations for AES-128 (resp. AES-192 and AES-256), where each of these equations involves three differential bytes. We propose to combine these initial equations to infer new equations. Let us consider, for example, the three equations that define δK_1[0][3], δK_2[0][2] and δK_2[0][3], respectively, for AES-128:

δK_0[0][3] ⊕ δK_1[0][2] ⊕ δK_1[0][3] = 0    (2)
δK_1[0][2] ⊕ δK_2[0][1] ⊕ δK_2[0][2] = 0    (3)
δK_1[0][3] ⊕ δK_2[0][2] ⊕ δK_2[0][3] = 0    (4)

These equations share bytes: δK_1[0][2] for Eq. (2) and (3), δK_1[0][3] for Eq. (2) and (4), and δK_2[0][2] for Eq. (3) and (4). We may combine Eq. (2), (3), and (4) by xoring them, and exploit the fact that B ⊕ B = 0 for any byte B, to generate the following equation:

δK_0[0][3] ⊕ δK_2[0][1] ⊕ δK_2[0][3] = 0    (5)

This new equation is redundant at the byte level, as (2) ∧ (3) ∧ (4) ⇒ (5). However, at the Boolean level, the propagation of the xor constraint corresponding to Eq. (5) detects inconsistencies which are not detected when only considering the xor constraints corresponding to Eq. (2), (3), and (4). Let us consider, for example, the case where ΔK_0[0][3] = 1, ΔK_2[0][1] = ΔK_2[0][3] = 0, and all other Boolean variables still have 0 and 1 in their domains. In this case, the propagation of xor constraints associated with Eq. (2), (3), and (4) does not detect an inconsistency as only one variable is assigned for each constraint, and for the two other variables, we can always choose values such that the sum is different from 1. However, at the byte level, δK_0[0][3] cannot be assigned to a value different from 0 when δK_2[0][1] = δK_2[0][3] = 0. Indeed, in this case, Eq. (3) and (4) imply:

δK_1[0][2] ⊕ δK_2[0][2] = δK_1[0][3] ⊕ δK_2[0][2] = 0
⇒ δK_1[0][2] = δK_2[0][2] = δK_1[0][3]
⇒ δK_0[0][3] ⊕ δK_1[0][2] ⊕ δK_1[0][3] = δK_0[0][3]

As a consequence, if δK_0[0][3] ≠ 0, we cannot satisfy Eq. (2). This inconsistency is detected when propagating the xor constraint associated with Eq. (5), as it ensures that ΔK_0[0][3] + ΔK_2[0][1] + ΔK_2[0][3] ≠ 1. Hence, we propose to combine xor equations of the key schedule to generate new equations. More precisely, given two equations δB_1 ⊕ … ⊕ δB_n = 0 and δB′_1 ⊕ … ⊕ δB′_m = 0 such that {B_1, …, B_n} ∩ {B′_1, …, B′_m} ≠ ∅, we generate the equation:

⊕_{B ∈ ({B_1, …, B_n} ∪ {B′_1, …, B′_m}) \ ({B_1, …, B_n} ∩ {B′_1, …, B′_m})} δB = 0.
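The combination rule amounts to taking the symmetric difference of the two sets of bytes. The sketch below is a minimal Python illustration under our own conventions (bytes are plain strings, equations are frozensets, and no bound is put on the equation length, unlike the Picat implementation described in this section). It applies the rule to Eq. (2), (3), and (4) until no new equation appears.

```python
from itertools import combinations

def combine(eq1, eq2):
    """Xor two equations (sets of bytes xoring to 0); None if they share no byte."""
    if not (eq1 & eq2):
        return None
    return eq1 ^ eq2          # symmetric difference: shared bytes cancel out

def closure(initial):
    """Combine equations pairwise until a fixed point is reached."""
    known = set(initial)
    changed = True
    while changed:
        changed = False
        for e1, e2 in combinations(list(known), 2):
            new = combine(e1, e2)
            if new and new not in known:
                known.add(new)
                changed = True
    return known

# Hypothetical string labels standing for the differential key bytes.
eq2 = frozenset({"K0[0][3]", "K1[0][2]", "K1[0][3]"})
eq3 = frozenset({"K1[0][2]", "K2[0][1]", "K2[0][2]"})
eq4 = frozenset({"K1[0][3]", "K2[0][2]", "K2[0][3]"})
eq5 = frozenset({"K0[0][3]", "K2[0][1]", "K2[0][3]"})

equations = closure([eq2, eq3, eq4])
print(len(equations), eq5 in equations)
```

On this small example the fixed point contains seven equations: the three initial ones, three equations with four bytes, and the three-byte equation (5).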

This new equation is recursively combined with existing ones to generate other equations until no more equation can be generated. For example, from Eq. (2), (3), and (4), we generate the following equations:

From (2) and (3): δK_0[0][3] ⊕ δK_1[0][3] ⊕ δK_2[0][1] ⊕ δK_2[0][2] = 0    (6)
From (2) and (4): δK_0[0][3] ⊕ δK_1[0][2] ⊕ δK_2[0][2] ⊕ δK_2[0][3] = 0    (7)
From (3) and (4): δK_1[0][2] ⊕ δK_1[0][3] ⊕ δK_2[0][1] ⊕ δK_2[0][3] = 0    (8)

Then, from Eq. (2) and (8) (or, equivalently, from Eq. (3) and (7) or from Eq. (4) and (6)), we generate Eq. (5). As no new equation can be generated from Eq. (2), (3), (4), (5), (6), (7), and (8), the process stops. The number of new equations that may be generated grows exponentially with respect to the number r of rounds. For example, for AES-128, when r = 3 (resp. r = 4), the total number of new equations that may be generated is 988 (resp. 16332). When further increasing r to 5, the number of new equations becomes so large that we cannot generate them within a time limit of one hour. To avoid this combinatorial explosion, we only generate equations that involve at most four differential bytes. Indeed, the basic Boolean xor constraint associated with an equation simply states that the sum of the Boolean variables must be different from 1. In Section 4.3.1, we show how to strengthen this constraint by reasoning on differences at the byte level when the xor constraint involves no more than four variables. In this case, each xor constraint is very efficiently handled by simply channeling three pairs of Boolean variables. Preliminary experiments have shown us that these strengthened xor constraints both reduce the number of choice points and speed up the solution process, and that further adding xor constraints for equations that involve more than four variables (to forbid that their sum is equal to one) does not significantly reduce the number of choice points and often increases time. For AES-128 (resp.
AES-192 and AES-256) with r = 10 (resp. r = 12 and r = 14) rounds, the number of initial equations coming from the key schedule

is 144 (resp. 168 and 192). From these initial equations, we generate 122 (resp. 168 and 144) new equations that involve three differential bytes, and 1104 (resp. 1696 and 1256) new equations that involve four differential bytes. The algorithm that generates equations has been implemented in Picat [ZKF15], and the time spent by Picat to search for all equations of length smaller than or equal to four is always smaller than 0.1 seconds. This time is negligible compared to the time needed to solve Step 1. We denote by xorEq_l the set of all equations (both initial and generated equations) coming from the key schedule when the key length is l. Note that xorEq_l actually contains all possible equations with at most four differential bytes implied by the key schedule algorithm. In other words, our procedure does not miss implied equations even if it does not allow the generation of intermediate equations of length greater than 4. We have checked this by exhaustively generating all possible equations with at most four differential key bytes and, for each equation that does not belong to xorEq_l, we have proven that it is not a logical consequence of the initial set of equations of the key schedule (by demonstrating that xorEq_l is consistent with the negation of the equation). The whole checking procedure (performed by a program written in C) for all possible equations is done in a few minutes for the three possible key lengths.

4.2. Variables of CPXOR

All variables of CPBasic are also variables of CPXOR. We introduce new Boolean variables that are used to tighten the Boolean abstraction by reasoning on differences at the byte level: given two differential bytes δB1 and δB2, the Boolean variable diff_{δB1,δB2} is equal to 1 if δB1 ≠ δB2, and to 0 otherwise. We do not define a diff variable for every couple of differential bytes in diffBytes_l, but restrict our attention to couples of bytes for which it is useful to know whether they are equal or not.
More precisely, we consider three separate sets of differential bytes and we only compare differential bytes that belong to a same set.
• The first set, called DK, contains bytes coming from δK and δSK matrices. The diff variables associated with these bytes are used to tighten the constraints associated with the equations of xorEq_l, as explained in Section 4.3.1.
• The second and third sets, called DY and DZ, contain bytes coming from δY and δZ matrices, respectively. The diff variables associated with these bytes are used to propagate the MDS property of MixColumns at the byte level, as explained in Section 4.3.2.

For each of these three sets, we consider a separate subset for each row j ∈ [0, 3]:
• For DK, we know that every initial xor equation due to the key schedule either involves three bytes on a same row j (i.e., ΔK_{i+1}[j][k], ΔK_{i+1}[j][k − 1], and ΔK_i[j][k]), or it involves two bytes on a same row j (i.e., ΔK_i[j][0] and ΔK_{i+1}[j][0]), and a byte that has just passed through an S-box on the next row (j + 1)%4 (i.e., ΔSK_i[(j + 1)%4][3]). As we cannot know if the input and output differences of S-boxes are equal or not (i.e., if δK_i[(j + 1)%4][3] is equal to δSK_i[(j + 1)%4][3] or not), we can limit diff variables to couples of differential bytes that occur on a same row of δK matrices, or on two consecutive rows of δK and δSK matrices.
• For DY and DZ, the MDS property implies relations between columns of DY and DZ and, in Section 4.3.2, we use a generalization of this property that xors bytes of a same row for different columns. As these xors are only performed on bytes that occur in a same row, we can limit diff variables to couples of differential bytes that occur in a same row of δY matrices (for DY) and δZ matrices (for DZ).

Hence, for each row j ∈ [0, 3], we define the three following sets:

DK_j = {δK_i[j][k], δSK_i[(j + 1)%4][3] : i ∈ [1, r], k ∈ [0, 3]}
DY_j = {δY_i[j][k] : i ∈ [0, r − 2], k ∈ [0, 3]}
DZ_j = {δZ_i[j][k] : i ∈ [0, r − 2], k ∈ [0, 3]}.

Given these sets, we define the diff variables as follows: for each set D ∈ {DK_j, DY_j, DZ_j : j ∈ [0, 3]}, and for each pair of differential bytes {δB1, δB2} ⊆ D, we define a Boolean variable diff_{δB1,δB2}.

4.3. Constraints of CPXOR

The constraints of CPXOR are listed in Fig. 3. Constraints (C1) to (C6) of Fig. 3 are identical to Constraints (C1) to (C6) of CPBasic (Fig. 2). Constraints (C2), (C4), (C6), (C7), (C10), and (C11) are equalities of the form Δ_i = Δ_j: they allow us to rename variables in order to simplify the description of the model without changing the performance of the solvers.

(C1) objStep1 = Σ_{δB ∈ Sboxes_l} ΔB
(C2) ∀δB ∈ Sboxes_l, ΔSB = ΔB
(C3) ∀i ∈ [0, r − 2], ∀j, k ∈ [0, 3], XOR(ΔZ_i[j][k], ΔK_{i+1}[j][k], ΔX_{i+1}[j][k])
(C4) ∀i ∈ [0, r − 1], ∀j, k ∈ [0, 3], ΔY_i[j][k] = ΔSX_i[j][(j + k)%4]
(C5) ∀i ∈ [0, r − 2], ∀k ∈ [0, 3], Σ_{j=0}^{3} (ΔY_i[j][k] + ΔZ_i[j][k]) ∈ {0, 5, 6, 7, 8}
(C6) ∀j, k ∈ [0, 3], ΔZ_{r−1}[j][k] = ΔY_{r−1}[j][k]
(C7) ∀D ∈ {DK_j, DY_j, DZ_j : j ∈ [0, 3]}, ∀{δB1, δB2} ⊆ D, diff_{δB1,δB2} = diff_{δB2,δB1}
(C8) ∀D ∈ {DK_j, DY_j, DZ_j : j ∈ [0, 3]}, ∀{δB1, δB2, δB3} ⊆ D, diff_{δB1,δB2} + diff_{δB2,δB3} + diff_{δB1,δB3} ≠ 1
(C9) ∀D ∈ {DK_j, DY_j, DZ_j : j ∈ [0, 3]}, ∀{δB1, δB2} ⊆ D, diff_{δB1,δB2} + ΔB1 + ΔB2 ≠ 1
(C10) ∀(δB1 ⊕ δB2 ⊕ δB3 = 0) ∈ xorEq_l, (diff_{δB1,δB2} = ΔB3) ∧ (diff_{δB1,δB3} = ΔB2) ∧ (diff_{δB2,δB3} = ΔB1)
(C11) ∀(δB1 ⊕ δB2 ⊕ δB3 ⊕ δB4 = 0) ∈ xorEq_l, (diff_{δB1,δB2} = diff_{δB3,δB4}) ∧ (diff_{δB1,δB3} = diff_{δB2,δB4}) ∧ (diff_{δB1,δB4} = diff_{δB2,δB3})
(C12) ∀i1, i2 ∈ [0, r − 2], ∀k1, k2 ∈ [0, 3], Σ_{j=0}^{3} (diff_{δY_{i1}[j][k1],δY_{i2}[j][k2]} + diff_{δZ_{i1}[j][k1],δZ_{i2}[j][k2]}) ∈ {0, 5, 6, 7, 8}
(C13) ∀i1, i2 ∈ [0, r − 2], ∀j, k1, k2 ∈ [0, 3], diff_{δK_{i1+1}[j][k1],δK_{i2+1}[j][k2]} + diff_{δZ_{i1}[j][k1],δZ_{i2}[j][k2]} + ΔX_{i1+1}[j][k1] + ΔX_{i2+1}[j][k2] ≠ 1

Figure 3: Constraints of the CPXOR model for Step 1
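Constraints (C8) to (C11) of Fig. 3 are all necessary conditions at the byte level, which can be confirmed by brute force. The sketch below is our own illustration: it uses 4-bit values as stand-ins for bytes (an assumption made only to keep the enumeration small; the properties depend only on xor and equality), and the helper names diff and delta are hypothetical.

```python
from itertools import product

VALUES = range(16)  # assumption: 4-bit stand-ins for differential bytes

def diff(a, b):
    """Value of the diff variable: 1 iff the two values differ."""
    return int(a != b)

def delta(a):
    """Value of the Boolean abstraction: 1 iff the value is non-zero."""
    return int(a != 0)

# (C8): among three values, exactly one differing pair is impossible.
c8_holds = all(diff(a, b) + diff(b, c) + diff(a, c) != 1
               for a, b, c in product(VALUES, repeat=3))

# (C9): diff and the two Boolean abstractions never sum to 1.
c9_holds = all(diff(a, b) + delta(a) + delta(b) != 1
               for a, b in product(VALUES, repeat=2))

# (C10): for a ^ b ^ c == 0, each diff equals the abstraction of the third value.
c10_holds = all(diff(a, b) == delta(a ^ b)
                and diff(a, a ^ b) == delta(b)
                and diff(b, a ^ b) == delta(a)
                for a, b in product(VALUES, repeat=2))

# (C11): for a ^ b ^ c ^ d == 0, pairs of diff variables are channeled.
c11_holds = all(diff(a, b) == diff(c, a ^ b ^ c)
                and diff(a, c) == diff(b, a ^ b ^ c)
                and diff(a, a ^ b ^ c) == diff(b, c)
                for a, b, c in product(VALUES, repeat=3))

print(c8_holds, c9_holds, c10_holds, c11_holds)
```

This only shows that the constraints never reject a byte-consistent assignment; how much they prune in practice is what the experiments of Section 4.5 measure.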

Constraints (C7) and (C8) ensure symmetry and transitivity on diff variables. For each set D ∈ {DK_j, DY_j, DZ_j : j ∈ [0, 3]}, they ensure that the pairs of bytes in D whose diff variable is equal to 0 define an equivalence relation. Constraint (C9) relates diff and Δ variables. Indeed, whenever two Boolean variables ΔB1 and ΔB2 are equal to 0, then we know for sure that there is no difference at the byte level, i.e., δB1 = δB2. Similarly, whenever ΔB1 ≠ ΔB2, we know for sure that there is a difference at the byte level, i.e., δB1 ≠ δB2. Constraints (C10) and (C11) propagate the xor equations of xorEq_l and are described in Section 4.3.1. Constraint (C12) propagates the MDS property at the byte level and is described in Section 4.3.2. Constraint (C13) propagates ARK at the byte level and is described in Section 4.3.3.

4.3.1. Constraints associated with the xor equations of xorEq_l

In the basic model, every xor constraint involves exactly three Boolean variables, and ensures that the sum of these variables is different from 1. In Section 4.1, we have shown how to generate new xor equations from initial key schedule equations, and these new equations either involve three or four differential bytes. Hence, we need to extend the xor constraint for the case where we have four Boolean variables. A straightforward extension is to simply state that the sum of the Boolean variables must be different from 1. Indeed, a xor equation that involves four differential bytes cannot be satisfied if exactly one of these four differential bytes is different from zero. However, this is a poor abstraction of the relation at the byte level. We propose to tighten Boolean xor constraints associated with the key schedule by exploiting diff variables. Constraint (C10) corresponds to the case of equations that involve three differential bytes. For each equation δB1 ⊕ δB2 ⊕ δB3 = 0 in xorEq_l, it ensures that whenever two differential bytes of {δB1, δB2, δB3} have different values, then the third one is different from 0 and therefore its associated Boolean variable is equal to 1. Note that this constraint combined with Constraint (C9) ensures that ΔB1 + ΔB2 + ΔB3 ≠ 1.
Indeed, if ΔB1 = 1 and ΔB2 = ΔB3 = 0, then Constraint (C9) implies that diff_{δB2,δB3} = 0, which is inconsistent with the fact that diff_{δB2,δB3} must be equal to ΔB1. Constraint (C11) corresponds to the case of equations that involve four differential bytes. For each equation δB1 ⊕ δB2 ⊕ δB3 ⊕ δB4 = 0 in xorEq_l, it ensures that whenever two differential bytes of {δB1, δB2, δB3, δB4} are equal then the two other ones must also be equal. Again, this constraint combined with Constraint (C9) ensures that ΔB1 + ΔB2 + ΔB3 + ΔB4 ≠ 1. Indeed, if ΔB1 = 1 and ΔB2 = ΔB3 = ΔB4 = 0, then Constraint (C9) implies that diff_{δB2,δB3} = 0 and diff_{δB1,δB4} = 1, which is inconsistent with the fact that diff_{δB2,δB3} must be equal to diff_{δB1,δB4}.

4.3.2. Propagation of MDS at the byte level

As seen in Section 2.1 (paragraph “Maximum Distance Separable property”), the MDS property also holds for the result of the xor of two different columns of δY and δZ. Hence, for all i1, i2 in [0, r − 2], and for all k1, k2 in [0, 3], we have:

Σ_{j=0}^{3} ((δY_{i1}[j][k1] ⊕ δY_{i2}[j][k2] ≠ 0) + (δZ_{i1}[j][k1] ⊕ δZ_{i2}[j][k2] ≠ 0)) ∈ {0, 5, 6, 7, 8}.

This property is modelled by constraint (C12).
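The property behind (C12) follows from the linearity of MixColumns: the xor of two output columns is the image of the xor of the two input columns, so the MDS bound applies to it. The sketch below is a sampling-based illustration, not a proof. It implements the standard AES MixColumns matrix on one column and counts differing bytes between two columns before and after the transformation; the function names and the sampling scheme are our own.

```python
import random

def xtime(b):
    """Multiply by 2 in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def mix_column(col):
    """Apply the AES MixColumns matrix (2 3 1 1 / 1 2 3 1 / 1 1 2 3 / 3 1 1 2)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]

def difference_count(y1, y2):
    """Number of rows where the two columns differ, before plus after MixColumns."""
    z1, z2 = mix_column(y1), mix_column(y2)
    return (sum(a != b for a, b in zip(y1, y2))
            + sum(a != b for a, b in zip(z1, z2)))

rng = random.Random(0)
counts = set()
for _ in range(2000):
    y1 = [rng.randrange(256) for _ in range(4)]
    y2 = list(y1)
    # Change between 0 and 4 positions so that low-weight cases are exercised.
    for pos in rng.sample(range(4), rng.randrange(5)):
        y2[pos] ^= rng.randrange(1, 256)
    counts.add(difference_count(y1, y2))

print(sorted(counts))
```

Every observed count falls in {0, 5, 6, 7, 8}: two columns that differ in a single row before MixColumns necessarily differ in all four rows afterwards.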

4.3.3. Propagation of ARK at the byte level

ARK implies the following equations: ∀i1, i2 ∈ [0, r − 2], ∀j, k1, k2 ∈ [0, 3],

δK_{i1+1}[j][k1] ⊕ δZ_{i1}[j][k1] = δX_{i1+1}[j][k1]
δK_{i2+1}[j][k2] ⊕ δZ_{i2}[j][k2] = δX_{i2+1}[j][k2].

By xoring these two equations, we obtain:

δK_{i1+1}[j][k1] ⊕ δK_{i2+1}[j][k2] ⊕ δZ_{i1}[j][k1] ⊕ δZ_{i2}[j][k2] = δX_{i1+1}[j][k1] ⊕ δX_{i2+1}[j][k2].

From this equation, we infer that it is not possible for exactly one of the three following inequalities to be true:

δK_{i1+1}[j][k1] ≠ δK_{i2+1}[j][k2]
δZ_{i1}[j][k1] ≠ δZ_{i2}[j][k2]
δX_{i1+1}[j][k1] ≠ δX_{i2+1}[j][k2]

Constraint (C13) ensures this property: the first two inequalities are modeled by diff_{δK_{i1+1}[j][k1],δK_{i2+1}[j][k2]} and diff_{δZ_{i1}[j][k1],δZ_{i2}[j][k2]}; for the third inequality, we exploit the fact that if ΔX_{i1+1}[j][k1] + ΔX_{i2+1}[j][k2] = 1 then δX_{i1+1}[j][k1] ≠ δX_{i2+1}[j][k2].
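Both statements above (the "not exactly one inequality" property and the sum used in (C13)) can be checked by enumeration. The following sketch is an illustration under the assumption that 4-bit values stand in for bytes, which keeps the loop at 16^4 iterations; the variable names k1, z1, x1, etc. are ours and denote one key byte, one Z byte and the resulting X byte of each of the two ARK equations.

```python
from itertools import product

VALUES = range(16)  # assumption: 4-bit stand-ins for differential bytes

violations = 0
for k1, k2, z1, z2 in product(VALUES, repeat=4):
    # ARK at the byte level: each X byte is the xor of a key byte and a Z byte.
    x1, x2 = k1 ^ z1, k2 ^ z2

    # Property inferred from the xored equation.
    exactly_one = [k1 != k2, z1 != z2, x1 != x2].count(True) == 1

    # Left-hand side of (C13): two diff values plus two Boolean abstractions.
    c13_sum = int(k1 != k2) + int(z1 != z2) + int(x1 != 0) + int(x2 != 0)

    if exactly_one or c13_sum == 1:
        violations += 1

print(violations)
```

No byte-level assignment violates either condition, so adding (C13) to the Boolean model never removes a byte-consistent solution.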


4.4. Implementation

CPXOR only uses common and widely implemented constraints. Therefore, it may be easily implemented with any existing CP library. In order to ease the comparison between different solvers, we have implemented CPXOR with a high-level modeling language called MiniZinc [NSB+07]: MiniZinc models are translated into a simple subset of MiniZinc called FlatZinc, using a compiler provided by MiniZinc, and most existing CP solvers have developed FlatZinc interfaces (currently, there are fifteen CP solvers which have FlatZinc interfaces). We ran experiments with Gecode [Gec06], Choco [PFL16], Chuffed [CS14], and Picat-SAT [ZKF15]: Gecode and Choco are CP libraries which solve CSPs by combining search with constraint propagation; Chuffed is a lazy clause hybrid solver that combines features of finite domain propagation and Boolean satisfiability; Picat-SAT translates CSPs into Boolean satisfiability formulae, and then uses the SAT solver Lingeling [Bie14] to solve them. Gecode and Choco are clearly outperformed by Chuffed and Picat-SAT for both Step1-opt and Step1-enum. This shows us that clause learning is a key ingredient for solving these problems. Picat-SAT and Chuffed have complementary performance. For the optimisation problem Step1-opt, Picat-SAT is always the fastest solver. For the enumeration problem Step1-enum, Chuffed is faster than Picat-SAT on small instances, i.e., all AES-128 instances, and AES-192 (resp. AES-256) instances when the number of rounds r is lower than 5 (resp. 7). However, Picat-SAT is faster than Chuffed on larger instances and Chuffed is not able to solve AES-192 (resp. AES-256) instances within a 24-hour time limit when r ≥ 9 (resp. r ≥ 11). The good performance of Picat-SAT is not surprising given that most variables are Boolean variables.
If we exclude equality constraints, there are only two different kinds of constraints:
• (C3), (C8), (C9), and (C13) are of the form: Σ_{Δi ∈ S} Δi ≠ 1;
• (C1), (C5) and (C12) are of the form: Σ_{Δi ∈ S} Δi = v, where v is the integer variable objStep1 for (C1) and an integer variable whose domain is {0, 5, 6, 7, 8} for (C5) and (C12);
where S is a set of Boolean variables. These two kinds of constraints are easily translated into Boolean formulae by combining at-most and at-least encodings which are widely studied in the SAT community [ZK17]. Note that we cannot use the xor-clauses introduced in CryptoMiniSat [SNC09]

for modeling the xor constraint in Step 1 because the semantics of the xor operation is changed when abstracting every byte by a Boolean value.

As CPXOR only uses sum constraints, we can also translate it into an Integer Linear Programming (ILP) model in a very straightforward way, by using an encoding similar to the one introduced in [MWGP12], for example. Each constraint Σ_{Δi ∈ S} Δi ≠ 1 is translated into #S inequalities: for each Δj ∈ S, we add the inequality −Δj + Σ_{Δi ∈ S \ {Δj}} Δi ≥ 0. Each constraint Σ_{Δi ∈ S} Δi = v such that v is an integer variable whose domain is {0, 5, 6, 7, 8} is encoded by introducing a new Boolean variable x ∈ {0, 1} and adding the following inequalities:

Σ_{Δi ∈ S} Δi ≥ 5x
∀Δi ∈ S, x ≥ Δi

When x = 0, every Δi is constrained to be equal to 0 by x ≥ Δi; otherwise, the first inequality ensures that at least 5 variables of S are set to 1. We have used Gurobi [Opt18] to solve this ILP model, and experiments have shown us that it is not competitive with Picat-SAT.

4.5. Comparison of CPXOR with CPEQ

The model introduced in [GMS16], called CPEQ, is an improvement of CPBasic. Like CPXOR, CPEQ exploits equality relationships at the byte level to better propagate the MDS property. Constraints (C1) to (C9) and Constraints (C12) and (C13) are common to CPXOR and CPEQ (in [GMS16], CPEQ is defined by using eq variables instead of diff variables, such that each diff_{δB1,δB2} variable that occurs in a constraint of CPXOR must be replaced by 1 − eq_{δB1,δB2} in CPEQ). However, in CPEQ we do not infer new xor equations from the initial equations of the key schedule, as explained in Section 4.1, and therefore Constraints (C10) and (C11) are not used in CPEQ. Instead of generating new equations, CPEQ uses the key schedule rules to pre-compute, for each differential byte δK_i[j][k], a set V(i, j, k) of differential bytes such that δK_i[j][k] is equal to the result of xoring all bytes in V(i, j, k). Then, for each differential byte δK_i[j][k], a variable V_1(i, j, k) is constrained to be equal to the subset of bytes of V(i, j, k) that are different from 0. These V_1 variables are used to infer that two differential bytes are equal when their corresponding V_1

variables are equal, and that ΔKi [j][k] is equal to 0 (resp. 1) when V1 (i, j, k) is empty (resp. contains only one variable). In this section, we experimentally compare CPXOR with CPEQ . Experimental setup. All experiments have been performed on a single core of a server with an Intel Xeon E5-2687W v4 CPU at 3.00GHz. We compare CPXOR and CPEQ on the two problems described in Section 2.3, i.e., Step1-opt, that aims at ﬁnding the minimal value of objStep1 , and Step1-enum, that aims at enumerating all truncated diﬀerential characteristics when the value of objStep1 is ﬁxed to a given value v. We consider 23 instances denoted AES-l-r where l ∈ {128, 192, 256} is the key length and r is the number of rounds: r ∈ [3, 5] (resp. [3, 10] and [3, 14]) when l = 128 (resp. 192 and 256). We do not consider values of r larger than 5 (resp. 10) when l = 128 (resp. 192) because for these values the maximal probability becomes smaller than 2−l . CPXOR and CPEQ are both implemented2 with MiniZinc [NSB+ 07], and we report experimental results obtained with Picat-SAT which is the best performing solver among all the considered solvers (as discussed in Section 4.4). Results for Step1-opt. For each instance, Table 1 reports the optimal value v ∗ of ObjStep1 computed by Step1-opt. This optimal value may be diﬀerent for CPXOR and CPEQ as they consider diﬀerent abstractions of the KS xor operations. In practice, for all instances but one, both models ﬁnd the same optimal value, and for this optimal value there exists at least one byteconsistent truncated diﬀerential characteristic (see Table 2) so that the repeat loop (lines 6-10) of Algorithm 1 is executed only once. However, for AES192-10, the minimal value of objStep1 is equal to 27 with CPEQ whereas it is equal to 29 with CPXOR . 
As a consequence, if we use CP_EQ to solve Step1-opt, the repeat loop of Algorithm 1 is executed three times: $v$ is successively assigned to 27, 28, and 29, and for each of these values, we need to solve Step1-enum and Step2. When $v = 27$ (resp. 28), Step1-enum with CP_EQ finds 92 (resp. 1436) truncated differential characteristics which are all byte-inconsistent, so that Step2 returns null for each of them. When $v = 29$, some truncated differential characteristics are byte-consistent and the repeat loop is stopped. For this instance, CP_XOR is able to infer that there is no byte-consistent truncated differential characteristic with 27 or 28 active S-boxes, and it returns 29, which is the smallest possible value for which there are byte-consistent truncated differential characteristics.

Footnote 2: These models are available on https://gitlab.inria.fr/source code/aes-cryptanalysiscp-xor-2019/

| Instance | Step1-opt CP_EQ: $v^*$ | Step1-opt CP_EQ: $t_1^{opt}$ | Step1-opt CP_XOR: $v^*$ | Step1-opt CP_XOR: $t_1^{opt}$ | Step1-enum CP_EQ: #T | Step1-enum CP_EQ: $t_1^{enum}$ | Step1-enum CP_XOR: #T | Step1-enum CP_XOR: $t_1^{enum}$ |
|---|---|---|---|---|---|---|---|---|
| AES-128-3 | 5 | 4 | 5 | 3 | 4 | 6 | 4 | 4 |
| AES-128-4 | 12 | 21 | 12 | 14 | 8 | 74 | 8 | 38 |
| AES-128-5 | 17 | 44 | 17 | 33 | 1113 | 32340 | 1113 | 22869 |
| AES-192-3 | 1 | 3 | 1 | 2 | 15 | 16 | 15 | 10 |
| AES-192-4 | 4 | 8 | 4 | 5 | 4 | 12 | 4 | 7 |
| AES-192-5 | 5 | 14 | 5 | 8 | 2 | 13 | 2 | 9 |
| AES-192-6 | 10 | 34 | 10 | 18 | 6 | 65 | 6 | 45 |
| AES-192-7 | 13 | 72 | 13 | 37 | 4 | 98 | 4 | 66 |
| AES-192-8 | 18 | 205 | 18 | 73 | 8 | 752 | 8 | 333 |
| AES-192-9 | 24 | 2527 | 24 | 520 | 240 | 43359 | 240 | 13524 |
| AES-192-10 | 27 | 3715 | 29 | 3285 | 27548 | - | 602 | 216120 |
| AES-256-3 | 1 | 3 | 1 | 3 | 33 | 39 | 33 | 29 |
| AES-256-4 | 3 | 8 | 3 | 7 | 14 | 38 | 14 | 25 |
| AES-256-5 | 3 | 13 | 3 | 8 | 4 | 21 | 4 | 15 |
| AES-256-6 | 5 | 25 | 5 | 17 | 3 | 29 | 3 | 20 |
| AES-256-7 | 5 | 48 | 5 | 47 | 1 | 22 | 1 | 15 |
| AES-256-8 | 10 | 61 | 10 | 49 | 3 | 76 | 3 | 52 |
| AES-256-9 | 15 | 172 | 15 | 106 | 16 | 705 | 16 | 430 |
| AES-256-10 | 16 | 236 | 16 | 112 | 4 | 385 | 4 | 224 |
| AES-256-11 | 20 | 488 | 20 | 286 | 4 | 705 | 4 | 312 |
| AES-256-12 | 20 | 625 | 20 | 140 | 4 | 1228 | 4 | 463 |
| AES-256-13 | 24 | 1621 | 24 | 822 | 4 | 1910 | 4 | 597 |
| AES-256-14 | 24 | 2179 | 24 | 682 | 4 | 1722 | 4 | 607 |

Table 1: Comparison of CP_EQ and CP_XOR for solving Step 1. For each instance, we display the results with CP_EQ and CP_XOR for Step1-opt (optimal value $v^*$ of $obj_{Step1}$, and time $t_1^{opt}$ in seconds) and Step1-enum (number #T of truncated differential characteristics when $obj_{Step1}$ is assigned to the value $v^*$ found with CP_XOR, and time $t_1^{enum}$ in seconds). We report '-' when the time exceeds two weeks.
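The control flow discussed above (solve Step1-opt, then alternate Step1-enum and Step 2 while increasing $v$) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `step1_opt`, `step1_enum` and `step2` are hypothetical callables standing for the three solver calls, `step2` is assumed to return `None` for a byte-inconsistent truncated characteristic and a `(log2_probability, characteristic)` pair otherwise, and the exit test (stop at the first value of $v$ that yields a byte-consistent characteristic) is a simplification, since the exact exit condition of Algorithm 1 is not restated in this section.

```python
from typing import Callable, Iterable, Optional, Tuple

Solution = Tuple[int, object]  # (log2 of the probability, byte-level characteristic)


def search_optimal(step1_opt: Callable[[], int],
                   step1_enum: Callable[[int], Iterable[object]],
                   step2: Callable[[object], Optional[Solution]],
                   max_v: int = 200) -> Optional[Tuple[int, Solution]]:
    """Increase v from the Step1-opt bound until Step 2 finds a byte-consistent solution."""
    v = step1_opt()
    while v <= max_v:  # max_v is a hypothetical safeguard against endless search
        best = None
        for truncated in step1_enum(v):
            solution = step2(truncated)  # None means byte-inconsistent
            if solution is not None and (best is None or solution[0] > best[0]):
                best = solution
        if best is not None:
            return v, best
        v += 1
    return None


# Toy stand-ins mimicking the behaviour reported for AES-192-10 with CP_EQ:
# the Boolean bound is 27, but byte-consistent characteristics only exist for v = 29.
calls = []


def toy_enum(v):
    calls.append(v)
    return [f"T{v}-{i}" for i in range(3)]


def toy_step2(truncated):
    return (-176, truncated) if truncated.startswith("T29") else None


result = search_optimal(lambda: 27, toy_enum, toy_step2)
```

With a tighter Step 1 model, the first call already returns 29 and the loop body runs once, which is the saving reported for CP_XOR.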

When comparing CPU times, we note that CP_XOR is always at least as fast as CP_EQ: CP_XOR is able to solve every instance but one (AES-192-10) in at most 822 seconds, whereas CP_EQ needs up to 2527 seconds to solve these instances. For instance AES-192-10, CP_XOR is a bit faster than CP_EQ (3285 seconds instead of 3715), while it finds a better solution that has 29 active S-boxes instead of 27.

Results for Step1-enum. In Table 1, we report results for solving Step1-enum when $v$ is fixed to the optimal value $v^*$ found when solving Step1-opt with CP_XOR. CP_XOR is always faster than CP_EQ, and it is able to solve all instances but three in at most 607 seconds. The three most challenging instances are AES-128-5, AES-192-9, and AES-192-10, which are solved by CP_XOR in less than 7 hours, 4 hours and 60 hours, respectively. CP_EQ is able to solve AES-128-5 and AES-192-9 in less than 9 and 12 hours, respectively. However, AES-192-10 cannot be solved by CP_EQ within two weeks. Actually, for this instance, CP_XOR strongly reduces the number of solutions: there are 602 truncated differential characteristics with CP_XOR instead of 27548 with CP_EQ (see footnote 3). Note that reducing the number of truncated differential characteristics is very important to reduce the total solving time because, for each characteristic of Step 1, we need to search for an optimal byte solution (or prove that it is not byte-consistent, which is the case of every characteristic found with CP_EQ but not with CP_XOR).

4.6. Discussion

Experimental results show that adding new xor constraints (obtained by combining the initial equations coming from the key schedule) allows us to tighten the Boolean abstraction and speed up the solution process. Our new model only uses common and widely implemented constraints and it has been implemented in MiniZinc. Therefore, it can be solved by any CP solver that accepts MiniZinc models and it can easily be translated into any constraint-based modeling language.

As a counterpart, using this model requires pre-computing new xor equations. Although this pre-computation step has a negligible time complexity compared to the solving process, it may limit the re-usability of the proposed approach for other cryptanalysis problems.
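To make the pre-computation step more concrete, the sketch below combines xor equations pairwise and shows how a derived equation rejects a Boolean assignment that the initial equations accept. It is an illustration under assumptions, not the authors' generator: equations are represented as sets of hypothetical byte names whose xor is 0, the size bound `max_size=4` is an illustrative parameter, and dropping oversized intermediate results means that this simple closure may miss some short equations.

```python
def combine_equations(initial, max_size=4):
    """Close a set of xor equations under pairwise xor, keeping short results only.

    Each equation is a frozenset of byte names whose xor equals 0.
    """
    known = set(initial)
    frontier = set(initial)
    while frontier:
        new = set()
        for a in frontier:
            for b in known:
                c = a ^ b  # bytes shared by both equations cancel out
                if c and len(c) <= max_size and c not in known:
                    new.add(c)
        known |= new
        frontier = new
    return known


def satisfies_abstraction(assignment, equations):
    """Boolean abstraction of a xor equation: the number of nonzero bytes is never 1."""
    return all(sum(assignment[b] for b in eq) != 1 for eq in equations)


# Two toy equations: a ^ b ^ c = 0 and b ^ c ^ d = 0, which together imply a = d.
initial = {frozenset("abc"), frozenset("bcd")}
closed = combine_equations(initial)

# a, b, c nonzero and d zero: accepted by the initial abstraction,
# rejected once the derived equation a ^ d = 0 is added.
candidate = {"a": 1, "b": 1, "c": 1, "d": 0}
accepted_before = satisfies_abstraction(candidate, initial)
accepted_after = satisfies_abstraction(candidate, closed)
```

Here `accepted_before` is true and `accepted_after` is false: the candidate is a Boolean solution that cannot be byte-consistent, and the derived equation removes it at the Boolean level.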

Footnote 3: The number of solutions with CP_EQ has been found by decomposing Step1-enum into independent sub-problems which have been solved in parallel (see [GLMS17]).


Another possibility to strengthen the CP model would have been to introduce a new global constraint. Global constraints are a key point of CP success: they both ease the modeling step by providing compact ways of declaring constraints, and speed up the solution process by providing dedicated propagators. In our context, we could replace constraints (C10) and (C11) by a single constraint $globalXOR(\{\Delta K_i[j][k] : i \in [0, r], j, k \in [0, 3]\})$. In this case, we must implement a propagator that filters the domains of the $\Delta K_i[j][k]$ variables in order to forbid Boolean assignments that do not satisfy the key schedule xor equations at the byte level. A key point is to find a good compromise between the strength of the filtering and its time complexity, and this usually involves significant design and implementation work. This also implies a loss of universality of the model, as it can only be solved by solvers for which a propagator dedicated to globalXOR has been implemented.

5. CP model for Step 2

Given a truncated differential characteristic computed by Step1-enum, Step 2 aims at searching for the byte values with the highest differential probability (or proving that the characteristic is not byte-consistent). In this section, we describe the CP model introduced in [GMS16] for AES-128, extended to AES-192 and AES-256 in a straightforward way.

5.1. Variables of the Step 2 model

For each differential byte $\delta B \in diffByte_l$, we define an integer variable. The domain of each of these variables depends on whether it has a corresponding Boolean variable in the Step 1 model:

• Each differential byte $\delta B \in \{\delta X[j][k], \delta K_r[j][k], \delta X_r[j][k] : j, k \in [0, 3]\}$ has no Boolean counterpart in the Step 1 model (because its value is deterministically inferred from the values of other variables). In this case, the domain is $D(\delta B) = [0, 255]$.

• Each differential byte $\delta B \in diffByte_l \setminus \{\delta X[j][k], \delta K_r[j][k], \delta X_r[j][k] : j, k \in [0, 3]\}$ has a Boolean counterpart $\Delta B$ in the Step 1 model. In this case, the domain is $D(\delta B) = \{0\}$ if $\Delta B = 0$, and $D(\delta B) = [1, 255]$ otherwise.
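The two cases above amount to a small lookup. The helper below is a hypothetical illustration (the name `byte_domain` and the use of `None` for "no Boolean counterpart" are assumptions, not part of the model of [GMS16]):

```python
def byte_domain(boolean_value=None):
    """Domain of a Step 2 byte variable given its Step 1 Boolean counterpart.

    boolean_value is None when the byte has no Boolean counterpart in Step 1.
    """
    if boolean_value is None:
        return range(0, 256)   # unconstrained byte: [0, 255]
    if boolean_value == 0:
        return range(0, 1)     # inactive byte: {0}
    return range(1, 256)       # active byte: [1, 255]
```

Fixing the domains this way is what lets Step 2 reuse the Step 1 solution as a filter: every byte whose Boolean value is 0 is already assigned before search starts.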


As we look for a byte-consistent solution with maximal probability, we declare an integer variable $P_{\delta B}$ for each differential byte $\delta B \in Sboxes_l$: this variable corresponds to the base 2 logarithm of the probability $p_S(\delta SB|\delta B)$ of obtaining the S-box output difference $\delta SB$ when the S-box input difference is $\delta B$. The domain of $P_{\delta B}$ depends on the value of $\Delta B$ in the truncated differential characteristic: if $\Delta B = 0$, then $p_S(0|0) = 1$ and therefore $D(P_{\delta B}) = \{0\}$; otherwise, $p_S(\delta SB|\delta B) \in \{\frac{2}{256}, \frac{4}{256}\}$ and $D(P_{\delta B}) = \{-7, -6\}$ (the constraint associated with the SubBytes operation forbids couples $(\delta B, \delta SB)$ such that $p_S(\delta SB|\delta B) = 0$).

Finally, we introduce an integer variable $obj_{Step2}$ which corresponds to the base 2 logarithm of the probability of the differential characteristic. The domain of $obj_{Step2}$ is derived from the number of differences that pass through S-boxes in the truncated differential characteristic, i.e., $D(obj_{Step2}) = [-7 \cdot v, -6 \cdot v]$ where $v = \sum_{\delta B \in Sboxes_l} \Delta B$.

5.2. Constraints of the Step 2 model

The constraints basically follow the AES operations to relate variables, as described in Section 3 for Step 1, but consider the definition of the operations at the byte level, instead of the Boolean level. A main difference is that the SubBytes operation, which has no effect at the Boolean level, must be modeled at the byte level. This is done thanks to a ternary table constraint which exhaustively lists all triples $(X, Y, P)$ such that there exist two bytes $B_1$ and $B_2$ whose difference before and after passing through S-boxes is equal to $X$ and $Y$, respectively, and such that $P$ is the base 2 logarithm of the probability of this transformation: for all $\delta B \in Sboxes_l$, we add the constraint

$$(\delta B, \delta SB, P_{\delta B}) \in \{(X, Y, P) \mid \exists (B_1, B_2) \in [0, 255] \times [0, 255],\ X = B_1 \oplus B_2,\ Y = S(B_1) \oplus S(B_2),\ P = \log_2(p_S(Y|X))\}.$$

This table constraint contains $1 + 255 \cdot 127$ tuples. The first tuple is (0, 0, 0) and it corresponds to the case where the input differential byte $\delta B$ is 0: in this case, the output differential byte $\delta SB$ is also 0, and $p_S(0|0) = 1$. All other tuples correspond to non-null input differences: there are 255 possible non-null input differences, and for each of them there are 127 possible output differences (one for which the probability is equal to $2^{-6}$, and 126 for which the probability is equal to $2^{-7}$).

All other constraints are defined in a rather straightforward way, using table constraints.
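The tuple count given above can be checked by building the table directly from the AES S-box. The sketch below is an illustration rather than the code used in the paper: it recomputes the S-box from its algebraic definition (inverse in GF(2^8) followed by the affine map, as specified in [FIP01]) instead of hard-coding it, and the names `gf_mul`, `aes_sbox` and `sbox_table` are chosen here for the example.

```python
from collections import Counter


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def aes_sbox(a):
    """AES S-box: multiplicative inverse (0 maps to 0) followed by the affine map."""
    inv = 0
    if a:
        inv = 1
        for _ in range(254):  # a^254 is the inverse of a in GF(2^8)
            inv = gf_mul(inv, a)
    out = inv
    for shift in (1, 2, 3, 4):  # xor with the four left rotations of inv
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


S = [aes_sbox(a) for a in range(256)]


def sbox_table():
    """All triples (X, Y, P) with P = log2(p_S(Y | X)) and p_S(Y | X) > 0."""
    table = []
    for x in range(256):
        counts = Counter(S[b] ^ S[b ^ x] for b in range(256))
        for y, n in counts.items():
            # n is 2, 4 or 256, so log2(n / 256) is an exact integer
            table.append((x, y, (n.bit_length() - 1) - 8))
    return table


TABLE = sbox_table()
```

The table has 32386 tuples, i.e., $1 + 255 \cdot 127$, and the remaining $2^{16} - 32386 = 33150$ pairs $(X, Y)$ are exactly the impossible transitions that the constraint forbids.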

5.3. Objective function

The goal is to find a byte-consistent solution with maximal differential probability. As we consider logarithms, this amounts to searching for a solution that maximizes the sum of all $P_{\delta B}$ variables. Hence, we constrain $obj_{Step2}$ to be equal to the sum of all $P_{\delta B}$ variables:

$$obj_{Step2} = \sum_{\delta B \in Sboxes_l} P_{\delta B}$$

and we define the objective function as the maximization of $obj_{Step2}$.

5.4. Implementation

Our Step 2 model only uses table constraints. These constraints are straightforward to implement with a CP library, and dedicated propagators have been designed for efficiently propagating them. Indeed, table constraints are an active research topic and a key point for the success of CP (see, e.g., [CY10, DHL+16, VLS18]). Hence, we have implemented our Step 2 model with Choco [PFL16], using the domOverWDeg variable ordering heuristic and the lastConflict search strategy (both predefined in Choco).

It is possible to model Step 2 using other kinds of approaches, such as ILP or SAT; however, these approaches do not directly support table constraints and do not have dedicated algorithms for propagating them. Therefore, they are less straightforward to use. In the next two paragraphs, we describe some recent works that introduce ILP or SAT encodings for solving similar cryptanalysis problems.

Encoding table constraints with ILP. In [AST+17], Abdelkhalek et al. describe how to encode the S-box differential table which relates 8-bit input differences with 8-bit output differences. In a first step, they generate one inequality for each 16-bit tuple that does not belong to the table (in order to forbid this tuple). For the AES S-box, there are 33150 forbidden tuples. In a second step, they reduce the number of inequalities. As the initial set of inequalities corresponds to a product-of-sum representation of Boolean functions, the minimum set of inequalities corresponds to the set of prime implicants of the relation, and it can be computed by using the Quine-McCluskey algorithm [Qui55, MJ56]. However, as computing the minimum set is an NP-hard problem, they use the heuristic algorithm called Espresso [BSVMH84] to compute an approximate solution. For the AES S-box, this algorithm reduces the number of inequalities from 33150 to 8302. Note that these inequalities must be added for each active S-box.
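The first step of this encoding (one inequality per forbidden tuple) can be illustrated on a small relation. The sketch below uses the standard cut that excludes a single 0/1 point, $\sum_{i : t_i = 0} x_i + \sum_{i : t_i = 1} (1 - x_i) \geq 1$; whether [AST+17] writes its inequalities in exactly this form is an assumption, and the 3-bit xor relation is only a toy stand-in for the 16-bit S-box table.

```python
from itertools import product


def forbidding_inequality(point):
    """Return (coeffs, rhs) such that sum(coeffs[i] * x[i]) >= rhs excludes only `point`."""
    coeffs = [-1 if bit else 1 for bit in point]
    rhs = 1 - sum(point)
    return coeffs, rhs


def feasible_points(relation, n):
    """0/1 points satisfying one forbidding inequality per tuple outside `relation`."""
    cuts = [forbidding_inequality(p)
            for p in product((0, 1), repeat=n) if p not in relation]
    return {x for x in product((0, 1), repeat=n)
            if all(sum(c * xi for c, xi in zip(coeffs, x)) >= rhs
                   for coeffs, rhs in cuts)}


# Toy relation on 3 bits: all (a, b, a xor b), so 8 - 4 = 4 tuples are forbidden.
xor_relation = {(a, b, a ^ b) for a in (0, 1) for b in (0, 1)}
recovered = feasible_points(xor_relation, 3)
```

Each inequality measures the Hamming distance to its forbidden point and requires it to be at least 1, so the feasible 0/1 points are exactly the tuples of the relation. The minimization step then looks for a smaller set of inequalities with the same feasible points.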

Encoding table constraints with SAT. In [Laf18], Lafitte describes how to encode a relation $R \subseteq \{0, 1\}^n$ (such that $R$ is composed of $k$ words of $\{0, 1\}^n$) as a SAT formula with $2^n - k$ clauses: each of these clauses is a disjunction of $n$ literals which forbids one word of $\{0, 1\}^n \setminus R$. This encoding may be used to model the S-box differential relation and, in this case, we have 33150 clauses for each byte that passes through an active S-box. In [SWW18], Sun et al. show how to reduce the number of clauses by using the same approach as in [AST+17].

In [Wal00], Walsh shows that enforcing arc consistency on a binary constraint filters more values than unit propagation on the kind of SAT encoding proposed in [Laf18, SWW18]. This explains why SAT solvers on this encoding are usually outperformed by CP solvers. However, in [Bac07], Bacchus introduces a SAT encoding of an arbitrary constraint (defined by a set of allowed tuples) which enforces GAC and is linear in the number of allowed tuples. Also, the set of allowed tuples may be represented by a binary decision diagram (BDD) [Bry86], and BDDs may be encoded into clauses which enforce GAC, as described by Eén and Sörensson in [ES06], for example. These encodings should improve the performance of SAT solvers (and also of ILP solvers, as similar encodings could be used for ILP too) for computing optimal differential characteristics.

5.5. Experimental evaluation

In Table 2, we display the CPU time needed by Choco to search for optimal differential characteristics, given the set of truncated differential characteristics computed by Step1-enum. For all instances but three (AES-128-5, AES-192-9, and AES-192-10), the time needed to find the optimal solution given one truncated differential characteristic is smaller than 100 seconds, and the number of truncated differential characteristics is smaller than 20, so that the optimal solution for all truncated differential characteristics is found in less than 500 seconds. However, for AES-128-5 (resp. AES-192-9 and AES-192-10), the average solving time per truncated differential characteristic is 210 (resp. 147 and 92) seconds, and there are 1113 (resp. 240 and 602) truncated differential characteristics, so that the total solving time for all truncated differential characteristics exceeds 65 hours (resp. 9 hours and 15 hours). Note that for these three instances most truncated differential characteristics are not byte-consistent: among the 1113 (resp. 240 and 602) characteristics enumerated by Step1-enum, only 97 (resp. 13 and 202) are byte-consistent.

| Instance | #T | #B | $p$ | $t_2$ | $t_2 / \#T$ |
|---|---|---|---|---|---|
| AES-128-3 | 4 | 2 | $2^{-31}$ | 10 | 2.5 |
| AES-128-4 | 8 | 8 | $2^{-75}$ | 40 | 5 |
| AES-128-5 | 1113 | 97 | $2^{-105}$ | 235086 | 211.2 |
| AES-192-3 | 15 | 15 | $2^{-6}$ | 15 | 1 |
| AES-192-4 | 4 | 4 | $2^{-24}$ | 13 | 3.3 |
| AES-192-5 | 2 | 2 | $2^{-30}$ | 11 | 5.5 |
| AES-192-6 | 6 | 6 | $2^{-60}$ | 35 | 5.8 |
| AES-192-7 | 4 | 4 | $2^{-78}$ | 46 | 11.5 |
| AES-192-8 | 8 | 8 | $2^{-108}$ | 119 | 14.9 |
| AES-192-9 | 240 | 80 | $2^{-146}$ | 35254 | 146.9 |
| AES-192-10 | 602 | 202 | $2^{-176}$ | 55310 | 91.9 |
| AES-256-3 | 33 | 33 | $2^{-6}$ | 26 | 0.8 |
| AES-256-4 | 14 | 14 | $2^{-18}$ | 25 | 1.8 |
| AES-256-5 | 4 | 4 | $2^{-18}$ | 12 | 3 |
| AES-256-6 | 3 | 3 | $2^{-30}$ | 11 | 3.7 |
| AES-256-7 | 1 | 1 | $2^{-30}$ | 9 | 8.8 |
| AES-256-8 | 3 | 1 | $2^{-60}$ | 19 | 6.3 |
| AES-256-9 | 16 | 16 | $2^{-92}$ | 457 | 28.6 |
| AES-256-10 | 4 | 4 | $2^{-98}$ | 160 | 40 |
| AES-256-11 | 4 | 4 | $2^{-122}$ | 178 | 44.5 |
| AES-256-12 | 4 | 4 | $2^{-122}$ | 237 | 59.3 |
| AES-256-13 | 4 | 4 | $2^{-146}$ | 244 | 61 |
| AES-256-14 | 4 | 4 | $2^{-146}$ | 302 | 75.5 |

Table 2: Results of Choco for solving Step 2. For each instance, we display: the number #T of truncated differential characteristics, the number #B of byte-consistent truncated differential characteristics, the maximal probability $p = 2^{obj_{Step2}}$ among all byte-consistent truncated differential characteristics, the total time $t_2$ for solving Step 2 for all truncated differential characteristics, and the average time $t_2 / \#T$ for solving Step 2 for one characteristic. Times are in seconds.

6. New two-step decomposition

Every Boolean solution computed during Step 1 corresponds to a truncated differential characteristic. Given one of these Boolean solutions, Step 2 is solved rather quickly, as shown in Table 2. However, in some cases, the number of Boolean solutions is quite large, and solving Step 2 for all Boolean solutions becomes time-consuming.

To reduce the number of Boolean solutions, we propose to shift the frontier between Steps 1 and 2. More precisely, we modify the output of Step1-enum: instead of enumerating all Boolean solutions, we focus on the variables associated with bytes that pass through S-boxes, i.e., we enumerate all assignments of Boolean variables associated with differential bytes in $Sboxes_l$. For AES-128, this amounts to enumerating all assignments of $\Delta X_i[j][k]$ and $\Delta K_i[j][3]$ that belong to Boolean solutions (in other words, we do not enumerate the values of $\Delta K_i[j][k]$ with $k \in [0, 2]$, and we do not enumerate the values of $\Delta Y_i[j][k]$ and $\Delta Z_i[j][k]$). Hence, every Boolean assignment computed by the new Step 1 no longer corresponds to a truncated differential characteristic (as Boolean variables associated with bytes that do not pass through S-boxes are not assigned), but to a non-empty set of truncated differential characteristics (all truncated differential characteristics that share the same values for Boolean variables associated with bytes that pass through S-boxes).

Step 2 is adapted to integrate the fact that the new Step 1 does not assign values to some variables: the domain of a variable $\delta B$ associated with a Boolean variable $\Delta B$ which is not assigned in the new Step 1 is $D(\delta B) = [0, 255]$.

This simple reduction of the scope of the new Step 1 greatly reduces the number of different assignments (as many assignments only differ on values assigned to $\Delta Y_i$, $\Delta Z_i$, or $\Delta K_i[j][k]$ with $k \in [0, 2]$), without increasing the size of the search space to explore in the new Step 2. Indeed, the values of $\delta Y_i$, $\delta Z_i$, and $\delta K_i[j][k]$ with $k \in [0, 2]$ are deterministically inferred from the values of the variables associated with inputs and outputs of S-boxes (i.e., $\delta X_i$, $\delta SX_i$, $\delta K_i[j][3]$, and $\delta SK_i[j][3]$ for AES-128).

In Table 3, we give the results obtained with this new decomposition. For the new Step 1, we compare CP_EQ and CP_XOR for solving Step1-enum with Picat-SAT.
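The effect of restricting the enumeration to S-box variables can be illustrated by projecting complete Boolean solutions onto those variables. The sketch below is a toy illustration under assumptions: the variable names are hypothetical, and it groups already-enumerated solutions after the fact, whereas the new Step 1 asks the solver to enumerate the projected assignments directly.

```python
from collections import defaultdict


def project_on_sboxes(boolean_solutions, sbox_vars):
    """Group complete Boolean solutions by their values on the S-box variables."""
    groups = defaultdict(list)
    for solution in boolean_solutions:
        key = tuple(solution[var] for var in sbox_vars)
        groups[key].append(solution)
    return dict(groups)


# Three toy Boolean solutions; dX and dK3 stand for bytes that pass through S-boxes,
# dY and dZ for bytes that do not.
solutions = [
    {"dX": 1, "dK3": 0, "dY": 0, "dZ": 1},
    {"dX": 1, "dK3": 0, "dY": 1, "dZ": 1},
    {"dX": 0, "dK3": 1, "dY": 1, "dZ": 0},
]
groups = project_on_sboxes(solutions, ["dX", "dK3"])
```

Here three truncated characteristics collapse into two Boolean assignments, so Step 2 is called twice instead of three times; each call leaves the unassigned bytes with the full domain [0, 255].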
Again, CP_XOR is faster than CP_EQ, and it is able to solve all instances but two in less than 7 minutes. The two hardest instances are AES-128-5, which is solved in 24 minutes, and AES-192-10, which is solved in less than 4 hours. For this last instance, CP_EQ did not terminate after three days. We therefore stopped it, and report the number of Boolean assignments it found within these three days in Table 3.

Both CP_EQ and CP_XOR find the same number of Boolean assignments (#T) for all instances but one. For AES-192-10, there are 7 Boolean assignments with CP_XOR, and at least 40 with CP_EQ (as CP_EQ has enumerated 40 Boolean assignments within the CPU time limit of 3 days).

| Instance | New Step 1 CP_EQ: #T | New Step 1 CP_EQ: $t_1^{enum}$ | New Step 1 CP_XOR: #T | New Step 1 CP_XOR: $t_1^{enum}$ | New Step 2: #B | New Step 2: $t_2$ | New Step 2: $t_2 / \#T$ | Total time: seq | Total time: par |
|---|---|---|---|---|---|---|---|---|---|
| AES-128-3 | 2 | 3 | 2 | 2 | 2 | 7 | 3.5 | 12 | 9 |
| AES-128-4 | 1 | 16 | 1 | 8 | 1 | 13 | 12.6 | 35 | 35 |
| AES-128-5 | 103 | 2651 | 103 | 1409 | 27 | 52313 | 507.9 | 53755 | 3388 |
| AES-192-3 | 14 | 14 | 14 | 8 | 14 | 19 | 1.4 | 29 | 11 |
| AES-192-4 | 2 | 7 | 2 | 4 | 2 | 7 | 3.5 | 16 | 13 |
| AES-192-5 | 1 | 9 | 1 | 4 | 1 | 4 | 3.8 | 16 | 16 |
| AES-192-6 | 2 | 27 | 2 | 11 | 2 | 14 | 7.0 | 43 | 36 |
| AES-192-7 | 1 | 41 | 1 | 17 | 1 | 7 | 7.4 | 61 | 61 |
| AES-192-8 | 1 | 221 | 1 | 57 | 1 | 8 | 8.2 | 138 | 138 |
| AES-192-9 | 3 | 1720 | 3 | 386 | 3 | 109 | 36.3 | 1015 | 942 |
| AES-192-10 | ≥40 | - | 7 | 13558 | 7 | 281 | 40.1 | 17124 | 16892 |
| AES-256-3 | 33 | 34 | 33 | 23 | 33 | 36 | 1.1 | 62 | 27 |
| AES-256-4 | 10 | 25 | 10 | 14 | 10 | 24 | 2.4 | 45 | 23 |
| AES-256-5 | 4 | 20 | 4 | 10 | 4 | 15 | 3.8 | 33 | 22 |
| AES-256-6 | 3 | 27 | 3 | 12 | 3 | 16 | 5.3 | 45 | 34 |
| AES-256-7 | 1 | 21 | 1 | 8 | 1 | 7 | 7.4 | 62 | 62 |
| AES-256-8 | 2 | 55 | 2 | 18 | 2 | 14 | 7.0 | 81 | 74 |
| AES-256-9 | 4 | 203 | 4 | 63 | 4 | 69 | 17.3 | 238 | 186 |
| AES-256-10 | 1 | 99 | 1 | 41 | 1 | 45 | 45.3 | 198 | 198 |
| AES-256-11 | 1 | 320 | 1 | 77 | 1 | 28 | 27.8 | 391 | 391 |
| AES-256-12 | 1 | 258 | 1 | 89 | 1 | 35 | 35.2 | 264 | 264 |
| AES-256-13 | 1 | 694 | 1 | 140 | 1 | 46 | 46.0 | 1008 | 1008 |
| AES-256-14 | 1 | 1087 | 1 | 97 | 1 | 35 | 34.8 | 814 | 814 |

Table 3: Results with the new decomposition. For each instance, we display: the results for the new Step 1 with CP_EQ and CP_XOR (#T = number of Boolean assignments computed by Step1-enum; $t_1^{enum}$ = time spent by Picat-SAT to solve Step1-enum), the results for the new Step 2 (#B = number of byte-consistent Boolean assignments; $t_2$ = time spent by Choco for all Boolean assignments; tmax = worst time per Boolean assignment), and total time for solving the whole problem by Algorithm 1 with CP_XOR and the new decomposition (seq = time when using a single core; par = time when using #T cores in parallel for solving Step 2). Times are in seconds. We report '-' when time exceeds 3 days.

As expected, there are fewer Boolean assignments with the new decomposition than with the original one for many instances. In some cases, the reduction is drastic: from 1113 (resp. 240 and 602) to 103 (resp. 3 and 7) for AES-128-5 (resp. AES-192-9 and AES-192-10). As a consequence, the time needed to enumerate all Boolean assignments is also smaller with the new decomposition. In some cases, the speed-up is significant. For example, with CP_XOR, Step1-enum is solved in more than 6 (resp. 3 and 60) hours with the initial decomposition for AES-128-5 (resp. AES-192-9 and AES-192-10), whereas it is solved in less than 24 (resp. 7 and 226) minutes with the new decomposition.

For all instances but one (AES-128-5), every Boolean assignment computed by Step1-enum is byte-consistent. However, for AES-128-5, only 27 Boolean assignments, among the 103 computed by Step1-enum, are byte-consistent.

The time spent by Choco to find the optimal differential characteristic given one Boolean assignment ($t_2 / \#T$) is rather comparable to the one displayed in Table 2 for the initial decomposition: it is larger for 9 instances, equal for 3 instances, and smaller for 10 instances. As there are fewer Boolean assignments with the new decomposition than with the initial one, the total time for Step 2 ($t_2$) is often smaller and, for the most challenging instances, it is much smaller: it is larger than 65 hours (resp. 9 hours and 15 hours) with the initial decomposition for AES-128-5 (resp. AES-192-9 and AES-192-10), whereas it is smaller than 15 hours (resp. 9 minutes and 4 hours) with the new decomposition.
The last two columns of Table 3 give the time needed to solve the whole problem as described in Algorithm 1, i.e., solve Step1-opt with CP_XOR, then solve Step1-enum with CP_XOR and the new Step 1, and finally solve the new Step 2 for each Boolean assignment computed by Step1-enum (for all instances, the loop of lines 6 to 10 is executed only once as the exit condition is satisfied when $v = v^*$). The seq column gives the time when using a single core of a server with an Intel Xeon E5-2687W v4 CPU at 3.00GHz. The par column gives the time when Step 2 is solved in parallel for each Boolean assignment, using #T cores of the same server. This time has been estimated by adding the times of Step1-opt and Step1-enum to the maximal time needed to solve Step 2 for one Boolean assignment, over the #T Boolean assignments computed by Step1-enum.

The total seq time is smaller than one hour for all instances but two, and 11 instances are solved in less than one minute. The two hardest instances are AES-128-5 (which is solved in less than 15 hours), and AES-192-10 (which is solved in less than 5 hours). When using #T cores to solve Step 2 in parallel for each Boolean assignment, the total solving time is reduced for all instances that have more than one Boolean assignment. In particular, the solving time is reduced to less than one hour for instance AES-128-5.

This is a clear improvement with respect to dedicated approaches, as the approach of [FJP13] cannot be extended to keys of size $l > 128$ bits, due to its memory complexity, and the approach of [BN10] needs several weeks to solve Step 1 for AES-192.

As already pointed out in [GMS16], for AES-128-4, we have found a byte-consistent solution with $obj_{Step1} = 12$ and a probability equal to $2^{-79}$. This solution is better than the solution claimed to be optimal in [BN10] and [FJP13]: in these papers, the authors address the same problem as we do and state that the best byte-consistent solution has $obj_{Step1} = 13$ and a probability equal to $2^{-81}$. Finally, we have found better solutions for AES-256. In particular, we have computed the actual optimal differential characteristic for AES-256-14, and its probability is $2^{-146}$, instead of $2^{-154}$ for the one given in [BKN09].

7. Conclusion

We introduced new CP models for finding optimal differential characteristics for AES: we introduced a new decomposition of the solving process in two steps, which allows us to reduce the number of Boolean assignments, and a new CP model for solving the first step, which exploits new xor constraints inferred from the key schedule to reduce the number of byte-inconsistent Boolean assignments. These new CP models have allowed us to solve all instances in a few hours.

For AES-128, we have found a better differential characteristic for $r = 4$ rounds, with a greater probability, but it cannot be used to improve attacks as the best attacks in the related-key and chosen-key models exploit the optimal 5-round related-key differential characteristic.
For AES-192 and AES-256, the optimal differential characteristics computed with our new models allowed us to mount new related-key attacks, related-key distinguishers and q-multicollisions. In particular, we improved the related-key distinguisher and the basic related-key differential attack on the full AES-256 by a factor $2^6$ and the q-multicollisions by a factor 2 (see [GLMS18] for more details).

These cryptanalytic problems open new and exciting challenges for the CP community. In particular, these problems are not easy to model. More precisely, naive CP models (such as the one described in Section 3) are easy to design but they may not scale well. The addition of new xor equations and the introduction of diff variables that model inequality relations at the byte level drastically improve the solving process, but these constraints are not straightforward to find and implement. Hence, a challenge is to define new CP frameworks, dedicated to this kind of cryptanalytic problem, in order to ease the development of efficient CP models for these problems.

Among the most challenging cryptanalytic problems that we want to look at, we may cite the ones studied in [CHP+18], which directly implement particular attacks from related-key differential characteristics, and the recent results obtained using ILP solvers for studying the so-called division properties in another family of symmetric-key encryption primitives, the stream ciphers [TIHM17].

Finally, developing efficient methods to evaluate the security of block ciphers is a first step towards the computer-aided design of such primitives. This is a challenging and promising direction of research that will help us to have security by design.

Acknowledgements. This research was conducted with the support of the FEDER program of 2014-2020, the region council of Auvergne-Rhône-Alpes, the GDR-IA, and the ANR (DeCrypt ANR-18-CE39-0007). We thank the reviewers for their comments that helped us improve the paper, Jérémie Detrey for implementing the C code that checks the completeness of the sets $xorEq_l$, Charles Prud'homme and Jean-Guillaume Fages for their technical support on the use of Choco, and Neng-Fa Zhou for his technical support on the use of Picat.

[AST+17] Ahmed Abdelkhalek, Yu Sasaki, Yosuke Todo, Mohamed Tolba, and Amr M. Youssef. MILP modeling for (large) s-boxes to optimize probability of differential characteristics. IACR Trans. Symmetric Cryptol., 2017(4):99–129, 2017.

[Bac07] Fahiem Bacchus. GAC via unit propagation. In Principles and Practice of Constraint Programming, Lecture Notes in Computer Science, pages 133–147. Springer, 2007.

[Bie14] Armin Biere. Yet another local search solver and Lingeling and friends entering the SAT competition 2014. pages 39–40, 01 2014.

[Bih93] Eli Biham. New types of cryptoanalytic attacks using related keys (extended abstract). In Advances in Cryptology - EUROCRYPT '93, volume 765 of Lecture Notes in Computer Science, pages 398–409. Springer, 1993.

[BJK+16] Christof Beierle, Jérémy Jean, Stefan Kölbl, Gregor Leander, Amir Moradi, Thomas Peyrin, Yu Sasaki, Pascal Sasdrich, and Siang Meng Sim. The SKINNY family of block ciphers and its low-latency variant MANTIS. In Advances in Cryptology - CRYPTO 2016 Part II, volume 9815 of LNCS, pages 123–153. Springer, 2016.

[BKN09] Alex Biryukov, Dmitry Khovratovich, and Ivica Nikolic. Distinguisher and related-key attack on the full AES-256. In Advances in Cryptology - CRYPTO 2009, volume 5677 of LNCS, pages 231–249. Springer, 2009.

[BN10] Alex Biryukov and Ivica Nikolic. Automatic search for related-key differential characteristics in byte-oriented block ciphers: Application to AES, Camellia, Khazad and others. In Advances in Cryptology - EUROCRYPT 2010, volume 6110 of LNCS, pages 322–344. Springer, 2010.

[Bry86] Randal E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Trans. Computers, 35(8):677–691, 1986.

[BS91] Eli Biham and Adi Shamir. Differential cryptoanalysis of Feal and N-Hash. In Advances in Cryptology - EUROCRYPT '91, volume 547 of Lecture Notes in Computer Science, pages 1–16. Springer, 1991.

[BSVMH84] Robert King Brayton, Alberto L. Sangiovanni-Vincentelli, Curtis T. McMullen, and Gary D. Hachtel. Logic Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers, 1984.

[CHP+18] Carlos Cid, Tao Huang, Thomas Peyrin, Yu Sasaki, and Ling Song. Boomerang connectivity table: A new cryptanalysis tool. In Advances in Cryptology - EUROCRYPT 2018, volume 10821 of Lecture Notes in Computer Science, pages 683–714. Springer, 2018.

[CS14] Geoffrey Chu and Peter J. Stuckey. Chuffed solver description, 2014. Available at http://www.minizinc.org/challenge2014/description_chuffed.txt.

[CY10] Kenil C. K. Cheng and Roland H. C. Yap. An MDD-based generalized arc consistency algorithm for positive and negative table constraints and some global constraints. Constraints, 15(2):265–304, 2010.

[DHL+16] Jordan Demeulenaere, Renaud Hartert, Christophe Lecoutre, Guillaume Perez, Laurent Perron, Jean-Charles Régin, and Pierre Schaus. Compact-table: Efficiently filtering table constraints with reversible sparse bit-sets. In Michel Rueher, editor, Principles and Practice of Constraint Programming - 22nd International Conference, CP 2016, Toulouse, France, September 5-9, 2016, Proceedings, volume 9892 of Lecture Notes in Computer Science, pages 207–223. Springer, 2016.

[DR13] Joan Daemen and Vincent Rijmen. The design of Rijndael: AES - the advanced encryption standard. Springer Science & Business Media, 2013.

[ES06] Niklas Eén and Niklas Sörensson. Translating pseudo-boolean constraints into SAT. JSAT, 2(1-4):1–26, 2006.

[FIP01] FIPS 197. Advanced Encryption Standard. Federal Information Processing Standards Publication 197, 2001. U.S. Department of Commerce/N.I.S.T.

[FJP13] Pierre-Alain Fouque, Jérémy Jean, and Thomas Peyrin. Structural evaluation of AES and chosen-key distinguisher of 9-round AES-128. In Advances in Cryptology - CRYPTO 2013 - Part I, volume 8042 of LNCS, pages 183–203. Springer, 2013.

[Gec06] Gecode Team. Gecode: Generic constraint development environment, 2006. Available from http://www.gecode.org.


[GL16] David Gérault and Pascal Lafourcade. Related-key cryptanalysis of Midori. In Progress in Cryptology - INDOCRYPT 2016, volume 10095 of LNCS, pages 287–304, 2016.

[GLMS17] David Gérault, Pascal Lafourcade, Marine Minier, and Christine Solnon. Revisiting AES related-key differential attacks with constraint programming. Cryptology ePrint Archive, Report 2017/139, Extended version of [GLMS18], 2017. https://eprint.iacr.org/2017/139.

[GLMS18] David Gérault, Pascal Lafourcade, Marine Minier, and Christine Solnon. Revisiting AES related-key differential attacks with constraint programming. Inf. Process. Lett., 139:24–29, 2018.

[GMS16] David Gérault, Marine Minier, and Christine Solnon. Constraint programming models for chosen key differential cryptanalysis. In Principles and Practice of Constraint Programming - CP 2016, volume 9892 of LNCS, pages 584–601. Springer, 2016.

[KLT15] Stefan Kölbl, Gregor Leander, and Tyge Tiessen. Observations on the SIMON block cipher family. In Advances in Cryptology - CRYPTO 2015 - 35th Annual Cryptology Conference, Santa Barbara, CA, USA, August 16-20, 2015, Proceedings, Part I, volume 9215 of Lecture Notes in Computer Science, pages 161–185. Springer, 2015.

[Knu95] Lars R. Knudsen. Truncated and higher order differentials. In Fast Software Encryption, pages 196–211. Springer, 1995.

[Laf18] Frédéric Lafitte. Cryptosat: a tool for SAT-based cryptanalysis. IET Information Security, 12(6):463–474, 2018.

[LCM+ 17] Fanghui Liu, Waldemar Cruz, Chujiao Ma, Greg Johnson, and Laurent Michel. A tolerant algebraic side-channel attack on AES using CP. In Principles and Practice of Constraint Programming - 23rd International Conference, CP 2017, Melbourne, VIC, Australia, August 28 - September 1, 2017, Proceedings, volume 10416 of Lecture Notes in Computer Science, pages 189–205. Springer, 2017.

[MJ56] E. J. McCluskey Jr. Minimization of boolean functions*. Bell System Technical Journal, 35(6):1417–1444, 1956.

[MP13] Nicky Mouha and Bart Preneel. A proof that the ARX cipher Salsa20 is secure against differential cryptanalysis. IACR Cryptology ePrint Archive, 2013:328, 2013.

[MS77] Florence Jessie MacWilliams and Neil James Alexander Sloane. The theory of error-correcting codes, volume 16. Elsevier, 1977.

[MSR14] Marine Minier, Christine Solnon, and Julia Reboul. Solving a Symmetric Key Cryptographic Problem with Constraint Programming. In 13th International Workshop on Constraint Modelling and Reformulation (ModRef), in conjunction with CP'14, pages 1–13, 2014.

[MWGP12] Nicky Mouha, Qingju Wang, Dawu Gu, and Bart Preneel. Differential and linear cryptanalysis using mixed-integer linear programming. In Chuan-Kun Wu, Moti Yung, and Dongdai Lin, editors, Information Security and Cryptology, pages 57–76. Springer, 2012.

[NSB+ 07] Nicholas Nethercote, Peter J. Stuckey, Ralph Becket, Sebastian Brand, Gregory J. Duck, and Guido Tack. MiniZinc: Towards a standard CP modelling language. In Principles and Practice of Constraint Programming - CP 2007, volume 4741 of LNCS, pages 529–543. Springer, 2007.

[Opt18] Gurobi Optimization. Gurobi optimizer reference manual, 2018.

[PFL16] Charles Prud'homme, Jean-Guillaume Fages, and Xavier Lorca. Choco Documentation. TASC, INRIA Rennes, LINA CNRS UMR 6241, COSLING S.A.S., 2016.

[Qui55] W. V. Quine. A way to simplify truth functions. The American Mathematical Monthly, 62(9):627–631, 1955.

[RBW06] Francesca Rossi, Peter van Beek, and Toby Walsh. Handbook of Constraint Programming (Foundations of Artificial Intelligence). Elsevier Science Inc., New York, NY, USA, 2006.


[RSM+ 11] Venkatesh Ramamoorthy, Marius-Calin Silaghi, Toshihiro Matsui, Katsutoshi Hirayama, and Makoto Yokoo. The design of cryptographic S-boxes using CSPs. In Principles and Practice of Constraint Programming - CP 2011 - 17th International Conference, CP 2011, volume 6876 of Lecture Notes in Computer Science, pages 54–68. Springer, 2011.

[SGL+ 17] Siwei Sun, David Gérault, Pascal Lafourcade, Qianqian Yang, Yosuke Todo, Kexin Qiao, and Lei Hu. Analysis of AES, SKINNY, and others with constraint programming. In 24th International Conference on Fast Software Encryption, 2017.

[SHW+ 14] Siwei Sun, Lei Hu, Peng Wang, Kexin Qiao, Xiaoshuang Ma, and Ling Song. Automatic security evaluation and (related-key) differential characteristic search: Application to SIMON, PRESENT, LBlock, DES(L) and other bit-oriented block ciphers. In Advances in Cryptology - ASIACRYPT 2014, Part I, volume 8873 of LNCS, pages 158–178. Springer, 2014.

[Sin06] R. Singleton. Maximum distance q-nary codes. IEEE Trans. Inf. Theor., 10(2):116–118, September 2006.

[SNC09] Mate Soos, Karsten Nohl, and Claude Castelluccia. Extending SAT solvers to cryptographic problems. In Theory and Applications of Satisfiability Testing - SAT 2009, 12th International Conference, SAT 2009, volume 5584 of Lecture Notes in Computer Science, pages 244–257. Springer, 2009.

[ST17] Yu Sasaki and Yosuke Todo. New impossible differential search tool from design and cryptanalysis aspects - revealing structural properties of several ciphers. In Advances in Cryptology - EUROCRYPT 2017, volume 10212 of Lecture Notes in Computer Science, pages 185–215, 2017.

[SWW17] Ling Sun, Wei Wang, and Meiqin Wang. Automatic search of bit-based division property for ARX ciphers and word-based division property. In Advances in Cryptology - ASIACRYPT 2017, pages 128–157, 2017.


[SWW18] Ling Sun, Wei Wang, and Meiqin Wang. More accurate differential properties of LED64 and Midori64. IACR Transactions on Symmetric Cryptology, 2018(3):93–123, 2018.

[TIHM17] Yosuke Todo, Takanori Isobe, Yonglin Hao, and Willi Meier. Cube attacks on non-blackbox polynomials based on division property. In Advances in Cryptology - CRYPTO 2017, volume 10403 of Lecture Notes in Computer Science, pages 250–279. Springer, 2017.

[VLS18] Hélène Verhaeghe, Christophe Lecoutre, and Pierre Schaus. Compact-MDD: Efficiently filtering (s)MDD constraints with reversible sparse bit-sets. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, pages 1383–1389, 2018.

[Wal00] Toby Walsh. SAT v CSP. In Principles and Practice of Constraint Programming, volume 1894 of Lecture Notes in Computer Science, pages 441–456. Springer, 2000.

[ZK17] Neng-Fa Zhou and Håkan Kjellerstrand. Optimizing SAT encodings for arithmetic constraints. In Principles and Practice of Constraint Programming - 23rd International Conference, CP 2017, volume 10416 of Lecture Notes in Computer Science, pages 671–686. Springer, 2017.

[ZKF15] Neng-Fa Zhou, Håkan Kjellerstrand, and Jonathan Fruhman. Constraint Solving and Planning with Picat. Springer, 2015.


APPENDIX

Extension to AES-192 and AES-256

Description of KeySchedule when $l \in \{192, 256\}$

AES-192

For AES-192, the initial key $K$ has 6 columns. The first four columns of $K$ initialize the four columns of $K_0$, i.e., for each row $j \in [0, 3]$ and each column $k \in [0, 3]$, $K_0[j][k] = K[j][k]$. The last two columns of $K$ initialize the first two columns of $K_1$, i.e., for each row $j \in [0, 3]$ and each column $k \in [0, 1]$, $K_1[j][k] = K[j][k + 4]$. The last two columns of $K_1$ and the four columns of all following subkeys are defined as follows.

Columns 1 and 3 are always obtained by performing a xor between bytes of the previous column and of the 6th previous column:

$$\forall i \in [2, r],\ \forall j \in [0, 3],\quad K_i[j][1] = K_{i-2}[j][3] \oplus K_i[j][0]$$

$$\forall i \in [1, r],\ \forall j \in [0, 3],\quad K_i[j][3] = K_{i-1}[j][1] \oplus K_i[j][2]$$

Columns 0 and 2 are also obtained by performing a xor between bytes of the previous column and of the 6th previous column, but some of these xors are combined with SubBytes operations, word rotations, and constant additions, depending on the value of $i \% 3$.

For column 0, if the round number $i \in [1, r]$ is such that $i \% 3 \neq 0$, then

$$\forall j \in [0, 3],\quad K_i[j][0] = K_{i-2}[j][2] \oplus K_{i-1}[j][3]$$

otherwise

$$K_i[0][0] = K_{i-2}[0][2] \oplus SK_{i-1}[1][3] \oplus c_i$$

$$\forall j \in [1, 3],\quad K_i[j][0] = K_{i-2}[j][2] \oplus SK_{i-1}[(j + 1) \% 4][3]$$

where $c_i$ is a constant, and $SK_{i-1}[j][3]$ is the result of applying the SubBytes operation to $K_{i-1}[j][3]$, i.e.,

$$\forall j \in [0, 3],\quad SK_{i-1}[j][3] = S(K_{i-1}[j][3])$$

For column 2, if the round number $i \in [2, r]$ is such that $i \% 3 \neq 1$, then

$$\forall j \in [0, 3],\quad K_i[j][2] = K_{i-1}[j][0] \oplus K_i[j][1]$$

otherwise

$$\forall j \in [0, 3],\quad K_i[j][2] = K_{i-1}[j][0] \oplus SK_i[(j + 1) \% 4][1]$$

where $SK_i[j][1]$ is the result of applying the SubBytes operation to $K_i[j][1]$, i.e.,

$$\forall j \in [0, 3],\quad SK_i[j][1] = S(K_i[j][1])$$

AES-256

For AES-256, the initial key $K$ has 8 columns, and these columns are used to initialize $K_0$ and $K_1$, i.e., for each row $j \in [0, 3]$ and each column $k \in [0, 3]$, $K_0[j][k] = K[j][k]$ and $K_1[j][k] = K[j][k + 4]$. Then, for each round $i \in [2, r - 1]$, the subkey $K_{i+1}$ is generated from the previous subkeys $K_i$ and $K_{i-1}$ as follows:

• The first column of $K_{i+1}$ is obtained from the first and last columns of $K_i$ in two steps. First, we apply the SubBytes operation to all bytes of the last column of $K_i$. We denote by $SK_i[j][3]$ the resulting byte for row $j$, i.e.,

$$\forall i \in [1, r - 1],\ \forall j \in [0, 3],\quad SK_i[j][3] = S(K_i[j][3]).$$

Then, depending on whether $i$ is odd or even, we either simply perform a xor, or combine this xor with a rotation and a constant addition: for each $i \in [1, r - 1]$, if $i \% 2 = 0$ then

$$K_{i+1}[0][0] = SK_i[1][3] \oplus K_i[0][0] \oplus c_i$$

$$\forall j \in [1, 3],\quad K_{i+1}[j][0] = SK_i[(j + 1) \% 4][3] \oplus K_i[j][0]$$

else

$$\forall j \in [1, 3],\quad K_{i+1}[j][0] = SK_i[j][3] \oplus K_i[j][0].$$

• For the last three columns $k \in [1, 3]$, we simply perform xors:

$$\forall i \in [1, r - 1],\ \forall j \in [0, 3],\ \forall k \in [1, 3],\quad K_{i+1}[j][k] = K_{i-1}[j][k - 1] \oplus K_i[j][k].$$
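The AES-192 column rules given earlier in this appendix can be cross-checked with a little index arithmetic. The sketch below assumes the FIPS 197 word layout, in which column $k$ of subkey $K_i$ is key-schedule word $w[4i + k]$, each word is the xor of the previous word and the 6th previous word, and SubBytes (with rotation) is applied only when the word index is a multiple of 6. The function name `column_sources` and the tuple encoding are illustrative choices, not taken from the paper.

```python
from typing import Optional, Tuple

NK = 6  # number of columns (32-bit words) in an AES-192 master key

Column = Tuple[int, int]  # (subkey index i, column index k)


def column_sources(i: int, k: int) -> Optional[Tuple[Column, Column, bool]]:
    """Describe how column k of subkey K_i is derived for AES-192.

    Returns None when the column is copied directly from the master key.
    Otherwise returns (previous column, 6th previous column, uses_subbytes).
    """
    t = 4 * i + k                     # word index of column k of K_i (assumed layout)
    if t < NK:
        return None                   # words 0..5 come straight from the master key
    prev = divmod(t - 1, 4)           # word t-1 expressed as (i, k)
    sixth_prev = divmod(t - NK, 4)    # word t-6 expressed as (i, k)
    uses_subbytes = (t % NK == 0)     # rotation and SubBytes only every 6th word
    return prev, sixth_prev, uses_subbytes
```

For instance, `column_sources(3, 0)` returns `((2, 3), (1, 2), True)`: column 0 of $K_3$ combines column 2 of $K_1$ with the SubBytes image of column 3 of $K_2$, which matches the $\delta SK_i[j][3]$ variables declared for $i \% 3 = 2$.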


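Definition of $\mathit{diffBytes}_l$ and $\mathit{Sboxes}_l$ when $l \in \{192, 256\}$

AES-192

$$\begin{aligned}
\mathit{diffBytes}_{192} ={} & \{\delta X[j][k],\ \delta K_i[j][k],\ \delta X_i[j][k],\ \delta SX_i[j][k],\ \delta Y_i[j][k],\ \delta Z_i[j][k] : i \in [0, r],\ j \in [0, 3],\ k \in [0, 3]\} \\
& \cup \{\delta SK_i[j][1] : i \in [1, r],\ j \in [0, 3],\ i \% 3 = 1\} \\
& \cup \{\delta SK_i[j][3] : i \in [1, r],\ j \in [0, 3],\ i \% 3 = 2\}
\end{aligned}$$

$$\begin{aligned}
\mathit{Sboxes}_{192} ={} & \{\delta K_i[j][1] : i \in [1, r - 1],\ j \in [0, 3],\ i \% 3 = 1\} \\
& \cup \{\delta K_i[j][3] : i \in [1, r - 1],\ j \in [0, 3],\ i \% 3 = 2\} \\
& \cup \{\delta X_i[j][k] : i \in [0, r - 1],\ j \in [0, 3],\ k \in [0, 3]\}
\end{aligned}$$

AES-256

$$\begin{aligned}
\mathit{diffBytes}_{256} ={} & \{\delta X[j][k],\ \delta K_i[j][k],\ \delta X_i[j][k],\ \delta SX_i[j][k],\ \delta Y_i[j][k],\ \delta Z_i[j][k] : i \in [0, r],\ j \in [0, 3],\ k \in [0, 3]\} \\
& \cup \{\delta SK_i[j][3] : i \in [1, r],\ j \in [0, 3],\ k \in [0, 3]\}
\end{aligned}$$

$$\begin{aligned}
\mathit{Sboxes}_{256} ={} & \{\delta K_i[j][3] : i \in [1, r - 1],\ j \in [0, 3]\} \\
& \cup \{\delta X_i[j][k] : i \in [0, r - 1],\ j \in [0, 3],\ k \in [0, 3]\}
\end{aligned}$$

Definition of constraints related to KS for Step 1 when $l \in \{192, 256\}$

Constraints ($C_7$) and ($C_8$) of Fig. 2 correspond to the key schedule for AES-128. For AES-192 and AES-256, these two constraints must be replaced by the constraints listed below.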


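AES-192

$$\forall i \in [0, r - 1],\ \forall j \in [0, 3]:\quad
\begin{cases}
\mathrm{XOR}(\Delta K_i[j][0],\ \Delta SK_{i-1}[(j + 1) \% 4][3],\ \Delta K_{i-2}[j][2]) & \text{if } (i - 1) \% 3 = 2 \\
\mathrm{XOR}(\Delta K_i[j][0],\ \Delta K_{i-1}[j][3],\ \Delta K_{i-2}[j][2]) & \text{otherwise}
\end{cases}$$

$$\forall i \in [2, r - 1],\ \forall j \in [0, 3]:\quad \mathrm{XOR}(\Delta K_i[j][1],\ \Delta K_{i-2}[j][3],\ \Delta K_i[j][0])$$

$$\forall i \in [1, r - 1],\ \forall j \in [0, 3]:\quad
\begin{cases}
\mathrm{XOR}(\Delta K_i[j][2],\ \Delta SK_i[(j + 1) \% 4][1],\ \Delta K_{i-1}[j][0]) & \text{if } i \% 3 = 1 \\
\mathrm{XOR}(\Delta K_i[j][2],\ \Delta K_i[j][1],\ \Delta K_{i-1}[j][0]) & \text{otherwise}
\end{cases}$$

$$\forall i \in [1, r - 1],\ \forall j \in [0, 3]:\quad \mathrm{XOR}(\Delta K_i[j][3],\ \Delta K_{i-1}[j][1],\ \Delta K_i[j][2])$$

AES-256

$$\forall i \in [1, r - 1]:\quad \mathrm{XOR}(\Delta K_{i+1}[0][0],\ \Delta SK_i[1][3],\ \Delta K_i[0][0])$$

$$\forall i \in [1, r - 1],\ \forall j \in [1, 3]:\quad
\begin{cases}
\mathrm{XOR}(\Delta K_{i+1}[j][0],\ \Delta SK_i[(j + 1) \% 4][3],\ \Delta K_i[j][0]) & \text{if } i \% 2 = 0 \\
\mathrm{XOR}(\Delta K_{i+1}[j][0],\ \Delta SK_i[j][3],\ \Delta K_i[j][0]) & \text{otherwise}
\end{cases}$$

$$\forall i \in [1, r - 1],\ \forall j \in [0, 3],\ \forall k \in [1, 3]:\quad \mathrm{XOR}(\Delta K_{i+1}[j][k],\ \Delta K_{i-1}[j][k - 1],\ \Delta K_i[j][k]).$$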
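To make the Step 1 constraints above more concrete, here is a minimal Python sketch. It assumes the usual Boolean abstraction of a byte xor, in which each $\Delta$ variable is 1 when the corresponding byte difference is nonzero and $\mathrm{XOR}(\Delta A, \Delta B, \Delta C)$ holds exactly when $\Delta A + \Delta B + \Delta C \neq 1$; this semantics should be checked against the definition given with Fig. 2. The function names and the `("K", i, j, k)` variable encoding are hypothetical, chosen only for illustration.

```python
from itertools import product


def xor_constraint(da: int, db: int, dc: int) -> bool:
    """Boolean abstraction of a byte xor: exactly one nonzero difference is impossible."""
    return da + db + dc != 1


def reachable_patterns() -> set:
    """Abstract every byte triple (a, b, a ^ b) to its zero / nonzero pattern."""
    patterns = set()
    for a, b in product(range(256), repeat=2):
        c = a ^ b
        patterns.add((int(a != 0), int(b != 0), int(c != 0)))
    return patterns


def aes192_column3_constraints(r: int) -> list:
    """Variable triples for XOR(dK_i[j][3], dK_{i-1}[j][1], dK_i[j][2]), i in [1, r-1]."""
    return [
        (("K", i, j, 3), ("K", i - 1, j, 1), ("K", i, j, 2))
        for i in range(1, r)
        for j in range(4)
    ]
```

Enumerating all byte pairs shows that the zero / nonzero patterns reachable by a real xor are exactly the triples accepted by `xor_constraint`, which is why the abstraction loses no Step 1 solution while still allowing patterns that Step 2 may later reject at the byte level.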


We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us. We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property. We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct communications with the office). He/she is responsible for communicating with the other authors about progress, submissions of revisions and final approval of proofs. We confirm that we have provided a current, correct email address which is accessible by the Corresponding Author and which has been configured to accept email from [email protected]. Signed by all authors as follows:

• David Gerault

• Pascal Lafourcade

• Marine Minier

• Christine Solnon
