ARTICLE IN PRESS
JID: MICPRO
[m5G;February 1, 2017;6:32]
Microprocessors and Microsystems 0 0 0 (2017) 1–10
Contents lists available at ScienceDirect
Microprocessors and Microsystems journal homepage: www.elsevier.com/locate/micpro
Timing attack on NoC-based systems: Prime+Probe attack and NoC-based protection
Cezar Reinbrecht a,∗, Altamiro Susin a, Lilian Bossuet b, Georg Sigl c, Johanna Sepúlveda c
a Federal University of Rio Grande do Sul, Brazil
b Laboratoire Hubert Curien - UMR CNRS 5516, University of Lyon, France
c Institute for Security in Information Technology, Technical University of Munich, Germany
Article info
Article history: Received 16 June 2016; Revised 24 November 2016; Accepted 13 December 2016; Available online xxx
Keywords: Network-on-Chip Security NoC Timing attack Timing side-channel attack

Abstract
Many authors have shown how to break the AES cryptographic algorithm with side-channel attacks, especially cache-oriented timing attacks such as Prime+Probe. In this paper, we present two practical timing attacks on NoCs that improve the Prime+Probe technique: P+P Firecracker and P+P Arrow. Our attacks target the communication between an ARM Cortex-A9 core and a shared cache memory. Furthermore, we evaluate a security-enhanced NoC as a countermeasure against the timing attack. Finally, we demonstrate that attacks on MPSoCs through the NoC are a real threat and need to be further explored.
∗ Corresponding author. E-mail addresses: [email protected], [email protected] (C. Reinbrecht). URL: http://lattes.cnpq.br/2486661464139271 (C. Reinbrecht)
© 2017 Elsevier B.V. All rights reserved.

1. Introduction
Multi-Processor Systems-on-Chip (MPSoCs) have emerged as the hardware paradigm for next-generation devices. MPSoCs provide flexibility and energy efficiency, making them suitable for the mobile segment and for new trends such as the Internet-of-Things (IoT). These applications interact heavily with external devices, which raises several security concerns. Typically, the shared hardware resources, such as processors, memory, and communication components, are potential targets of attacks. In the past few years, memories have been used to retrieve sensitive information from computational systems through Side Channel Attacks (SCAs). SCAs are among the most dangerous threats targeting hardware components [1,2]. SCAs based on the timing behavior of memories became very popular, and the first such attacks aimed to break the Advanced Encryption Standard (AES) [3–6].

One of the most efficient timing attack techniques is Prime+Probe (P+P), proposed by Osvik et al. [7]. The Prime+Probe attack uses a spy process running on the same CPU as the victim to monitor the cache accesses during cryptographic operations. Different authors have optimized P+P, as presented in [8–11].

Recent works have proposed new attack strategies for more complex system architectures, such as MPSoCs. However, attacking MPSoC environments is challenging. The main reasons include the complexity of the system and new security features. Novel MPSoC protection wraps sensitive IP cores in secure zones [12]. Inside a secure zone, the IP cores run trusted applications, which allows sensitive information to be processed in a highly controlled environment. This feature makes the classic Prime+Probe attack impossible, since a spy process cannot run on the victim's IP core. However, the communication structure can also be used to attack the system [13].

This work improves the Prime+Probe attack by extending it to the communication structure of the MPSoC, the Network-on-Chip (NoC). The NoC also leaks important information about the system. The works in [13–16] showed that the NoC is a potential threat and can be used to execute an SCA. Therefore, we exploit this vulnerability together with shared-memory attack strategies to implement a novel Prime+Probe strategy, the NoC Prime+Probe (NoC P+P). Depending on the characteristics of the MPSoC, such as the configuration of the cache memory and the attributes of the NoC, different strategies can be applied to perform the NoC P+P. Consequently, we developed two techniques to cope with various cache sizes and NoC behaviors: P+P Firecracker and P+P Arrow. P+P Firecracker targets the identification of all non-accessed sets of the cache during a cryptographic task. P+P Arrow aims to discover the used set of the shared cache at each access. The main difference between these techniques is that P+P Arrow requires high precision in monitoring the NoC, which results in a more efficient attack. The main contributions of this work are:
http://dx.doi.org/10.1016/j.micpro.2016.12.010 0141-9331/© 2017 Elsevier B.V. All rights reserved.
Please cite this article as: C. Reinbrecht et al., Timing attack on NoC-based systems: Prime+Probe attack and NoC-based protection, Microprocessors and Microsystems (2017), http://dx.doi.org/10.1016/j.micpro.2016.12.010
C. Reinbrecht et al. / Microprocessors and Microsystems 000 (2017) 1–10

Table 1
Summary of Prime+Probe attacks.
| Work | Platform | Leakage Source | Method | Traces used |
|---|---|---|---|---|
| Osvik et al. [7] | SoC (single core) | L1 Cache | Spy process | 16,000 |
| Xinjie et al. [8] | SoC (single core) | L1 Cache | Spy process | 350 |
| Liu et al. [10] | Bus-based MPSoC | LLC - L3 Cache | Spy process | 33,600 |
| Oren et al. [11] | Bus-based MPSoC | LLC - L3 Cache | Browser process | 5000 |
| Yao et al. [14] | NoC-based MPSoC | NoC | Spy process | Not mentioned |
| Wassel [15] | NoC-based MPSoC | NoC | Spy process | Not mentioned |
| Sepúlveda et al. [13] | NoC-based MPSoC | NoC | Spy process | Not mentioned |
| This work (Firecracker) | NoC-based MPSoC | NoC (Shared Cache) | Spy process | 14 |
| This work (Arrow) | NoC-based MPSoC | NoC (Shared Cache) | Spy process | 256 |
• An analysis of the shared memory configuration and its impact on system security;
• An analysis of the NoC configuration and its impact on system security;
• Two novel attacks to be performed in NoC-based MPSoCs;
• A description of practical experiments and results.

This paper is organized into seven sections. Section 2 presents related work on cache timing attacks. Section 3 explains the background on the proposed platform and the cipher algorithm. The proposed timing attacks are presented in Section 4. The Gossip NoC countermeasure is described in Section 5. Section 6 shows all experiments and results. Finally, we conclude the paper in Section 7.

2. Related work
Timing attacks on caches were first mentioned by Kocher [3] and Kelsey et al. [4]. According to Kocher, cryptographic algorithms running on a platform always present timing leakages that can be used to perform side channel attacks (SCAs). Tsunoo et al. [5] presented the first practical timing attack on caches, breaking the DES algorithm. This work opened a new front in SCA research. Bernstein [6] adapted Tsunoo's attack to break AES. At the same time, Osvik et al. [7] proposed three techniques to perform very effective timing attacks on caches: (i) Evict+Time, (ii) Prime+Probe, and (iii) Asynchronous. Prime+Probe was further applied in [8–11]. Xinjie et al. [8] propose a novel analysis strategy to implement Prime+Probe: instead of using the accessed lines of the cache, it targets the non-accessed lines, with the objective of revealing the key with a reduced number of traces. Liu et al. [10] present a Prime+Probe attack targeting the last-level cache of a bus-based MPSoC. It employs two techniques to break an ElGamal decryption: (i) probing cache sets without knowing the virtual address mapping; and (ii) identifying the victim's security-critical accesses using temporal access patterns.
The Prime+Probe technique is well suited to attacking MPSoC platforms, since their components can directly access hardware information such as physical addresses and the communication infrastructure. However, a practical timing attack on a NoC-based MPSoC has not been reported yet. The exploration of NoC channels as a leakage source was addressed by Yao and Suh [14], Wassel et al. [15] and Sepúlveda et al. [13]. The works of Yao and Suh [14] and Wassel et al. [15] propose the integration of a hard QoS mechanism to isolate the sensitive information. They include temporal network partitioning, based on high [14] and bounded [15] priority arbitration schemes. Furthermore, the work of Sepúlveda et al. [13] proposes random arbitration and adaptive routing as protection techniques. Despite the different levels of protection, these works do not show the effective execution of the timing attack. Our contributions are to show for the first time a timing attack on a NoC-based MPSoC
and to verify the timing protection of Gossip NoC. Table 1 presents the state of the art of attacks on SoCs and MPSoCs.

3. Background
This section introduces the concepts required to understand the Prime+Probe attack. The first subsection describes the MPSoC environment. The second presents the AES algorithm and its cache accesses.

3.1. MPSoC architecture
The target MPSoC architecture is composed of homogeneous processing IP cores (ARM Cortex-A9), external interfaces, memories and a NoC. Our approach can also be extended to heterogeneous NoCs. The memory hierarchy follows the proposal of Sievers et al. [17], where the processing IP cores have access to a local cache and a shared cache. This memory strategy increases performance and reduces the area overhead compared with a fully local cache policy. The NoC is characterized by a mesh topology and a deterministic routing algorithm. Moreover, the MPSoC implements a security zone, as in [12], defined at design time. The IPs inside this zone are considered trusted and are responsible for performing sensitive operations, such as AES encryption. Hence, the normal Prime+Probe attack, which requires a spy process running on the same IP core as the cipher, is impossible. To overcome this protection, new Prime+Probe mechanisms are needed in MPSoC environments. Fig. 1 presents an example of our target MPSoC, composed of 12 IP cores, one shared cache (IP 0), one shared memory (IP 15) and two external interfaces (IP 3 and IP 11) in a 4×4 mesh-based NoC.

Fig. 1. Homogeneous MPSoC platform arranged in a 4 × 4 mesh topology.

3.2. Memory access in AES
The timing attack aims to reveal the secret key used in AES encryption. AES is widely used in many (especially commercial) applications. It is based on a substitution-permutation strategy, composed of a key expansion step and several rounds, whose number depends on the key size (10 rounds for a 128-bit key). It can be implemented using just logical and arithmetic operations. However, to obtain better performance, the cipher is implemented with the main operations pre-computed and stored in large tables (1 kB each), as presented in [18]. Fig. 2 presents the AES dataflow. The performance-oriented AES has two main phases. The first computes the expanded key (K → k0, …, k10), which is used in all round computations. The second computes the intermediate states $x^{round}$. The same operations are repeated in every round. Each round is composed of the operations SubBytes, ShiftRows, and MixColumns (Fig. 2), and the result is XORed with the round key $K^{round}$. These three operations are precomputed and stored in four tables ($T_0$, $T_1$, $T_2$, $T_3$). Therefore, given a 16-byte plaintext $p = (p_0, \ldots, p_{15})$, encryption proceeds by computing a 16-byte intermediate state at each round $r$, as presented by Eq. (1):

$$x^r = (x^r_0, \ldots, x^r_{15}) \qquad (1)$$

The resulting $x^{10}$ is the ciphertext. These lookup tables combine all algebraic operations, namely ShiftRows, MixColumns and SubBytes. The tables of the last round are different, because they do not include the MixColumns computation, as observed in Fig. 2.

The initial state $x^0$ is computed by Eq. (2):

$$x^0_i = p_i \oplus k_i \quad \text{for } i = 0, \ldots, 15 \qquad (2)$$

The initial calculation is very important for the attack analysis, as presented in Section 4. The first 9 rounds are then computed by updating the intermediate state as in Eq. (3), for $r = 0, \ldots, 8$:

$$
\begin{aligned}
(x^{r+1}_0, x^{r+1}_1, x^{r+1}_2, x^{r+1}_3) &\leftarrow T_0[x^r_0] \oplus T_1[x^r_5] \oplus T_2[x^r_{10}] \oplus T_3[x^r_{15}] \oplus K^{r+1}_0 \\
(x^{r+1}_4, x^{r+1}_5, x^{r+1}_6, x^{r+1}_7) &\leftarrow T_0[x^r_4] \oplus T_1[x^r_9] \oplus T_2[x^r_{14}] \oplus T_3[x^r_3] \oplus K^{r+1}_1 \\
(x^{r+1}_8, x^{r+1}_9, x^{r+1}_{10}, x^{r+1}_{11}) &\leftarrow T_0[x^r_8] \oplus T_1[x^r_{13}] \oplus T_2[x^r_2] \oplus T_3[x^r_7] \oplus K^{r+1}_2 \\
(x^{r+1}_{12}, x^{r+1}_{13}, x^{r+1}_{14}, x^{r+1}_{15}) &\leftarrow T_0[x^r_{12}] \oplus T_1[x^r_1] \oplus T_2[x^r_6] \oplus T_3[x^r_{11}] \oplus K^{r+1}_3
\end{aligned}
\qquad (3)
$$

The last round is computed by repeating Eq. (3) with $r = 9$, except that $T_0, \ldots, T_3$ are replaced by $T^{10}_0, \ldots, T^{10}_3$.
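To make the link between Eq. (2), Eq. (3) and the cache concrete, the following minimal sketch computes the first-round table indices and the cache lines they touch. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and `ENTRIES_PER_LINE = 4` assumes that one cache line holds four table entries.

```python
ENTRIES_PER_LINE = 4  # assumption: one cache line holds four table entries


def first_round_indices(plaintext: bytes, key: bytes) -> list:
    """Eq. (2): x0_i = p_i XOR k_i, the index of each first-round lookup."""
    if len(plaintext) != 16 or len(key) != 16:
        raise ValueError("AES-128 uses 16-byte blocks and 16-byte keys")
    return [p ^ k for p, k in zip(plaintext, key)]


def table_for_byte(i: int) -> int:
    """In Eq. (3), state byte i is always looked up in table T_(i mod 4)."""
    return i % 4


def touched_lines(plaintext: bytes, key: bytes) -> set:
    """(table, line) pairs accessed by the 16 first-round lookups."""
    indices = first_round_indices(plaintext, key)
    return {(table_for_byte(i), x // ENTRIES_PER_LINE)
            for i, x in enumerate(indices)}


# Hypothetical example values, chosen only to make the mapping easy to follow.
example_lines = touched_lines(bytes(range(16)), bytes(16))
```

An attacker only observes the line, not the entry inside it, so the two low-order bits of each index stay hidden in this model.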
Fig. 2. AES-128 dataflow.

3.3. Timing side channel attack on NoCs
A timing side-channel attack is an analysis that exploits information leakage through the timing behavior of computation or communication. The work in [13] shows this attack on a NoC: a malicious IP inside the MPSoC injects data to saturate a NoC path and then observes the variation in its own throughput. The change in throughput is used to infer the access pattern of the traffic that intersects the router under attack. Sensitive communication produces low throughput rates. According to Reinbrecht et al. [16], this attack can be implemented by one or more infected IPs, which can be classified into two groups:
• Injectors, responsible for injecting data into the NoC at high data rates in order to increase the congestion of the attacked path;
• Observers, responsible for injecting data at lower data rates and sampling the delay of their packets in order to collect the throughput traces of the attacked path.
The malicious packets do not necessarily violate the security policies of the NoC (source, destination or size) to execute the attack, so these packets often cannot be detected by a security-enhanced NoC. To understand how reliable the NoC timing attack can be in a real MPSoC scenario, a practical experiment was elaborated. It evaluates the efficacy and precision of a timing attack on the NoC.

Practical experiment. This experiment evaluates the success rate and accuracy of the data collected by the attacker (infected IP). The success rate is the fraction of sensitive accesses that are identified. The accuracy reflects the share of false positives in the gathered information. To extract these metrics, we evaluated 5000 samples. Each sample corresponds to the latency of sending a message to the NoC during a cryptography task. After the experiment, the accesses identified by the attack were compared with the real sensitive traffic that passed through the network. This comparison was used to calculate the success rate and accuracy presented in Table 2. Fig. 3 presents the latency results. It can be observed that the typical latency of simple packets transmitted by the network interface is five cycles. Any variation in this latency can therefore be interpreted as a big packet passing through the attacker's router. Table 2 presents the relation between the true sensitive packets and those estimated by the attacker. We tested two scenarios:
1. Scenario 1 - External communication flows do not access the victim IP
2. Scenario 2 - External communication flows can access the victim IP
Although our measurements identify only 41% of all accesses, the accuracy is 100% when no other IP accesses the victim during the attack. This accuracy decreases when additional IPs communicate with the victim, because the attacker cannot identify the destination of the traffic. However, if a second infected IP is included in the system, the accuracy can be increased again, since it becomes possible to determine when the victim's access takes place.

Fig. 3. Transmission latency of the attacker. Latencies higher than five cycles represent big collisions in the attacker's router.

Table 2
Relation of true and estimated sensitive packets by the attack.
| Timing attack | Estimated | Real | Success rate | Accuracy |
|---|---|---|---|---|
| Scenario 1 | 6647 | 16000 | 41.54% | 100% |
| Scenario 2 | 8663 | 16000 | 41.64% | 76.72% |

4. NoC Prime+Probe attack: Firecracker & Arrow
Our proposal follows the previous works on Prime+Probe to implement a cache attack during a cryptographic task, but combines it with the NoC timing attack to improve the technique. The result is a novel attack called NoC Prime+Probe. This combination allows such attacks to be performed in highly complex systems, like MPSoCs. The objective of the NoC Prime+Probe attack is to use the leakage channel of the NoC to monitor the interaction between the shared cache and the victim IP performing the cryptographic task, in order to retrieve the secret key. The preconditions to perform the NoC Prime+Probe attack are:
• Attacker knows the IPs' mapping on the MPSoC
• Attacker knows the routing algorithm used in the NoC
• Attacker knows the cache configuration
• Attacker generates the encryption plaintext
• Attacker knows the location of the AES lookup tables in memory
• Attacker can access the shared cache
• Attacker can control an IP core inside the MPSoC
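The monitoring primitive that these preconditions rely on, an infected IP timing its own packets as described in Section 3.3, can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the five-cycle baseline is the typical uncongested latency reported for Fig. 3, and the two metrics are written under the assumption that success rate means true positives over real sensitive accesses and accuracy means true positives over detected accesses.

```python
BASELINE_CYCLES = 5  # typical uncongested packet latency reported for Fig. 3


def detect_collisions(latencies, baseline=BASELINE_CYCLES):
    """Return the sample indices whose latency exceeds the baseline."""
    return [i for i, lat in enumerate(latencies) if lat > baseline]


def success_rate_and_accuracy(detected, sensitive):
    """Assumed definitions:
    success rate = true positives / real sensitive accesses,
    accuracy     = true positives / detected accesses."""
    detected, sensitive = set(detected), set(sensitive)
    true_pos = len(detected & sensitive)
    success = true_pos / len(sensitive) if sensitive else 0.0
    accuracy = true_pos / len(detected) if detected else 0.0
    return success, accuracy


# Made-up latency trace (in cycles) and ground truth, for illustration only.
trace = [5, 5, 9, 5, 12, 5, 7, 5]
flagged = detect_collisions(trace)
metrics = success_rate_and_accuracy(flagged, [2, 4, 5])
```

Under these assumed definitions, the Scenario 1 row of Table 2 is consistent: 6647 identified accesses out of 16000 real ones gives about 41.54%.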
4.1. Stages of both attacks
To develop an effective attack, NoC P+P follows a five-stage process: (i) infection; (ii) prime; (iii) NoC timing attack; (iv) probe; and (v) analysis. Together with two auxiliary steps, plaintext generation and the encryption request, this gives the seven steps listed below. The dataflow can be observed in Fig. 4.
1. Infection: The infection starts when the attacker stores malware in the MPSoC. This malicious software may be spread over several IPs. The infected IPs must intersect the sensitive path. Several infected IPs may aid in the malicious monitoring task, thus improving the efficiency of the attack.
2. Generate Plaintext: The attacker generates a known random plaintext to be encrypted by AES-128. This plaintext is used in the analysis step to calculate the key candidates.
3. Prime: The infected IP prepares the cache. The goal is to guarantee that no AES lookup tables are in the cache before the attack. By spreading a random vector of its own across the cache, the attacker overwrites several cache memory locations. After the cache is ready, the infected IP sends a random plaintext to the AES cryptoprocessor to start the execution of the cryptographic task.
4. Request Encryption: With the cache prepared, the attacker requests an encryption of the known plaintext from the target cipher IP.
5. NoC Timing Attack: The throughput of the infected IP core is monitored to detect the AES accesses to the cache. This timing attack is based on the technique used by Reinbrecht et al. [16]. By continuously requesting NoC communication, the infected IP core can monitor the collisions with the sensitive packets. The throughput of the infected IP reveals the access pattern and the volume of communication over the sensitive path. Considering the mesh-based NoC with XY routing presented in Fig. 1, the infected IP core could be placed at IP 1. This position guarantees that cache responses to the victim will collide with the attacker's packets.
6. Probe: The Probe aims to verify which cache locations were accessed during the AES execution. After the trigger from the NoC Timing Attack stage, the infected IP core fetches its random vector from the cache memory. Longer fetching times reveal cache misses, and thus the memory locations used by the AES. The cache configuration, such as size, number of sets and associativity, can change the strategy of the Probe stage. At this step, two branches can be observed in the flowchart of Fig. 4. Fig. 4(a) shows the Firecracker flow, where the non-used sets are sampled, and Fig. 4(b) shows the Arrow flow, where the used sets are sampled. This information is then stored and sent to a host computer for the next stage, the Analysis.
7. Analysis: The analysis is a post-processing step that uses the collected data to calculate the key candidates and retrieve the key. The mathematical theory depends on the technique employed, which can exploit the usage of the sets (used or non-used) or a particular round of the AES algorithm (first, first and second, or last). The cache line configuration can limit the analysis, because the attacker can observe only the accesses to cache lines and not to individual words. Therefore, the number of words in a line determines the amount of data that cannot be revealed.

Fig. 4. NoC Prime+Probe attack flowchart. (a) Firecracker flow variation. (b) Arrow flow variation.

4.2. P+P Firecracker technique
P+P Firecracker is an attack that aims to check the non-used parts of the shared cache during the encryption. This technique was inspired by the Prime+Probe variant proposed by Xinjie et al. [8]. Because the non-accessed sets are the target, specific information such as which table is accessed, or in which order, is not needed. It is only required that the first round has finished, which the NoC timing attack determines. The attacker can then reduce the key search space, since the non-accessed table indexes are related to the bytes of the key. Since knowing which table is accessed is not important, this technique is more suitable for small shared caches. Small caches concentrate the tables T0, T1, T2 and T3 in the same sets, so it is not possible to tell from the set which table was accessed. The full P+P Firecracker operation can be observed in the flowchart presented in Fig. 4(a). These activities take into account that the tables are concentrated in the same sets and that the NoC timing attack can have a lower success rate. As observed in Fig. 4(a), after generating the plaintext, the prime stage fills all sets with attacker data. Then, the attacker requests the encryption and starts the NoC timing attack. It monitors the accesses, and the 16th access triggers the probe stage. In the probe stage, the attacker data is read again. The cache misses are recorded and sent to the host computer for the analysis stage. The Analysis stage follows the algorithm presented in [8]. We perform the first-round analysis, in which only the accesses made during the first AES round are used. The values used in the first-round lookups follow the expression $x^0_i = p_i \oplus k_i$ ($i = 0, \ldots, 15$). Therefore, by testing the data acquired in the Probe stage, it is possible to identify which sets have not been used by the crypto-processor, i.e., the non-accessed indexes. Given that $x^0_i = p_i \oplus k_i$ and that the plaintext byte $p_i$ is known, it follows that $k_i = p_i \oplus x^0_i$ ($i = 0, \ldots, 15$). Hence, every key candidate that maps to a non-accessed index can be removed. This process is repeated with new plaintexts until all bytes of the key are revealed.
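The elimination step described above can be sketched as a small simulation. It is a simplified model rather than the algorithm of [8]: it assumes a noise-free probe, that only first-round accesses are observed, that all tables map to the same sets, and that a cache line holds four table entries (`LINE_SHIFT = 2`). All function names are hypothetical.

```python
import random

LINE_SHIFT = 2  # assumption: four table entries per line, so 2 index bits are hidden


def accessed_lines(plaintext, key):
    """Lines touched by the 16 first-round lookups (tables share the same sets)."""
    return {(p ^ k) >> LINE_SHIFT for p, k in zip(plaintext, key)}


def eliminate(candidates, plaintext, lines_hit):
    """Remove every key-byte candidate whose lookup would hit a non-accessed line."""
    for i, p in enumerate(plaintext):
        candidates[i] = {k for k in candidates[i]
                         if ((p ^ k) >> LINE_SHIFT) in lines_hit}
    return candidates


def run_attack(key, encryptions, rng):
    """Repeat prime/encrypt/probe with fresh random plaintexts."""
    candidates = [set(range(256)) for _ in range(16)]
    for _ in range(encryptions):
        plaintext = bytes(rng.randrange(256) for _ in range(16))
        lines_hit = accessed_lines(plaintext, key)  # stands in for the Probe result
        eliminate(candidates, plaintext, lines_hit)
    return candidates


# Hypothetical key and a fixed seed, used only to make the run repeatable.
secret_key = bytes(range(16))
remaining = run_attack(secret_key, 64, random.Random(0))
```

In this model the correct key byte can never be eliminated, and the four candidates that share its upper six bits remain indistinguishable, which mirrors the line-size limitation mentioned in the Analysis stage.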
4.3. P+P Arrow technique
P+P Arrow is an attack that aims to identify the set of the shared cache used at each access by the cryptographic algorithm. To accomplish that, the NoC timing attack needs to identify the sensitive traffic as it is communicated. To reduce this difficulty, the technique targets bigger shared caches, where the AES tables do not share sets. In the AES algorithm, the pre-computed tables are accessed sequentially (T0, T1, T2 and then T3), which gives the attacker the possibility of probing the cache at every fourth access (considering bigger caches). In such a scenario, the success rate of the NoC timing attack (the ability to identify sensitive packets) has to be higher than 25% to implement P+P Arrow. Our results are acceptable in this respect, since the NoC timing attack can identify 41% of all sensitive traffic. The success rate of 41% implies that the last four accesses of the first round cannot be taken into account: because the attack is only assumed to identify at least one access in four, it is not safe to assume that the last four of the sixteen accesses belong to the first round of the AES algorithm. The technique therefore ignores the last four accesses and can reveal only 12 of the 16 bytes of the key. As a result, the last 4 bytes require a post-processing brute-force search. Another important issue in this technique is collisions when reading the shared cache, which occur when the attacker and the victim IP want to access it at the same time. The cryptographic task must not read the cache during the probe, because that could invalidate valuable information. So, if the AES accesses are faster than the probe stage, the same attack needs to be performed several times, each time checking a different segment of the memory. The detailed P+P Arrow operation can be observed in the flowchart presented in Fig. 4(b).
These actions take into account that the tables are spread over different sets and that the NoC timing attack can have a success rate of 25%. As observed in Fig. 4(b), the prime stage prepares the sets for the attack, but in P+P Arrow it targets one table per attack. Then, after the encryption request, the NoC timing attack is performed, and a probe stage is executed after every four identified accesses. If the AES accesses are faster than a complete probe stage, the attacker needs to repeat the process to acquire the information of all sets. The number of repetitions depends on the time required to probe all possible sets used by the target table. Finally, the probe sends the information to the host. In P+P Arrow, the Analysis stage follows the original concept of Osvik et al. [7], which correlates the set with the candidate byte of the key. We perform the first-round analysis, in which only the accesses made during the first AES round are used. Since each set represents a group of bytes in the memory system, several bytes can be considered candidates. The attack then needs to be performed with different plaintexts to eliminate these possibilities. Experiments revealed that about 20 samples are enough to retrieve a key. However, this strategy only reduces the key search space, revealing 12 bytes of the key. The last 4 bytes require a brute-force search, which means an effort of $2^{32}$.

5. NoC protection
The Gossip NoC is a security-enhanced architecture able to protect the MPSoC against timing attacks. It is composed of a traffic monitor and a countermeasure technique at each router, making it a distributed security mechanism. The name gossip comes from the alert messages that the traffic monitors send to other routers, which spread like gossip. In addition, to avoid false positives, the router uses a reinforcement parameter, called gossip confidence, to decide when to accept the gossip message.
If an attack is detected, the router changes the routing algorithm for the packets that would go through the path under attack. Gossip NoC was presented in [16]. Gossip NoC is based on two protection strategies: (i) Detection, and (ii) Protection. The first includes bandwidth monitoring and the generation of an alert message (gossip) in the presence of abnormal behavior. The second is triggered when any gossip message is received, and can modify the route of the packet from XY to YX. The alert messages reinforce the suspicion of an attack, avoiding false positives. If an attack is detected, the router changes the routing algorithm for the packets that follow the sensitive path. The combined usage of the XY and YX routing algorithms is guaranteed to be deadlock- and livelock-free [19,20]. The gossip router microarchitecture is shown in Fig. 5. It is based on a conventional NoC router composed of a routing scheme (XY and YX), a Round-Robin arbiter and FIFO memory. Three additional main components are integrated:
• Gossip In Block: It controls the internal state of the gossip router according to the values of the input signals. When the number of gossip messages received from neighbor routers exceeds the gossip confidence threshold, an attack is confirmed. As a result, the routing of the gossip switching is modified.
• Gossip switching: It switches the incoming packets from an input to an output. Under attack, the traffic is switched according to the YX algorithm. Otherwise, the XY route is used.
• Gossip Generator: It monitors the traffic bandwidth. When the bandwidth exceeds a protection threshold, a signal indicating a possible attack is activated and transmitted to the Gossip In Block of all the other routers.

Fig. 5. Gossip router microarchitecture: (1) Gossip in block; (2) Gossip logic; (3) Gossip generator.

6. Experiments and results
In this section we present the experimental setup, the evaluation of the NoC Prime+Probe attack techniques, Firecracker and Arrow, and the security efficacy, efficiency and cost of the Gossip NoC.

6.1. Experimental setup
The MPSoC was implemented in an [21] FPGA. An ARM hard IP executes the AES encryption.
Other processing elements were modeled by means of synthesized traffic generators. The external communication IP used was a UART serial interface. The AES accesses the lookup tables from the shared cache of the MPSoC. It is a 16-way set-associative cache, with 1 kB for the Firecracker experiment and 32 kB for the Arrow experiment. The cache response time is 16 cycles, and a main memory access, in the case of a cache miss, adds 100 cycles. However, the total latency of a cache access must also consider the latency of the network interfaces and the NoC routers. Each network interface adds 5 cycles in a congestion-free scenario. Each NoC router adds 2 cycles. The environment follows the concept presented in Fig. 1. The Firecracker attack was performed with 20 encryption requests with random plaintexts generated by the attacker IP. Data collected by the IP during the attack was sent to the serial interface. The host computer acquired the data during the attack and then computed the results. In the same manner, the Arrow attack was implemented in the system, but with the bigger shared cache (32 kB). It requested only 10 cryptography tasks. Both attacks were repeated in a system with Gossip NoC, in order to evaluate the security efficacy.

6.2. Firecracker experiment
The P+P Firecracker attack was performed by analyzing 20 cryptography tasks; all collected data was stored in the attacker's local memory and sent to the host PC for analysis through the UART interface. Fig. 6 plots the number of key bytes revealed by the analysis against the number of encryptions required. It shows that only 14 cryptographic tasks were needed to complete the attack.

Fig. 6. Number of key bytes correctly recovered vs number of encryptions needed in Firecracker Attack.

It is not possible to reveal all bytes of the key, because of a limitation caused by the size of the cache line. The cache line in our experiment stores four words. Hence, after the attack, each table still has four possible addresses to be verified for each byte segment ($k_i = p_i \oplus x^0_i$, $i = 0, \ldots, 15$). Therefore, to recover all bytes of the key, a final brute-force step has to be executed.
The brute-force effort can be calculated by considering that each table has four segments and each segment has four possibilities ($2^2$ for one segment, $(2^2)^4 = 2^8$ for one table). This means an effort of $2^8$ to cover all 256 possibilities of one table. Then, in order to recover the whole key, all four tables have to be computed, giving an effort of $2^{32}$.
6.3. Arrow experiment
The P+P Arrow attack was performed by analysing 10 cryptography tasks, where all data collected was stored in the attacker's local memory and sent to the host PC for analysis through the UART interface. The Arrow attack has an important consideration regarding the NoC timing attack, which is the avoidance of read collisions in the shared cache. The P+P Arrow is based on the precision
Fig. 7. Latency trace of NoC timing attack.
Table 3 Shared cache access latencies in cycles and possible reads count between AES accesses.
Cache Hit | Cache Miss | Limit | Max. Possible | Required
50 cycles | 150 cycles | 268 cycles | 1 read | 64 reads
to acquire the NoC information and probe the cache at the perfect moment. Executing the read at every fourth access identified in the NoC allows the attacker to work without extreme precision. However, this also increases the problem of read collisions, since after four accesses all sets that contain a table need to be checked (in this scenario the tables do not share sets with each other). This means that, for a cache with 64 sets per table, P+P Arrow needs to read four times 64 addresses (256 addresses) after every four accesses, and before the next cryptographic access, in order to avoid a collision.
To implement P+P Arrow without read collisions, we performed a small experiment to measure the time between accesses of the cryptographic task. We took the minimum time measured between two requests identified by the NoC timing attack to define a reading limit. Fig. 7 shows the latency measured by the attacker, where the minimum time between two requests was about 268 cycles. Then, we developed an experiment in which our attacker performed several cache requests to measure the accumulated latency. Our results considered several cache hits and only one miss, as expected in our attack scenario. Table 3 presents the measured latencies and the target limit. As observed in Table 3, it is possible to make only one access between AES accesses, because if more than one miss occurs, the latency will cause a read collision. Therefore, our attack requires executing an encryption 64 times per target table.
Finally, implementing the attack with the read collision problem taken into account, we obtained the results presented in Fig. 8. Fig. 8 shows that P+P Arrow required 256 encryptions to perform the attack. In fact, only one generated plaintext was necessary to reveal the bytes of the key. The precision of this attack identifies the correct set of the cache at the moment it is used.
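The read budget behind these numbers can be reproduced with a short sketch. It uses the latencies of Table 3 and assumes, as a conservative reading of the text, that every probe read may miss, so that the miss latency bounds the number of reads that safely fit between two AES accesses. The function and constant names are hypothetical.

```python
import math

# Values taken from Table 3 and the surrounding text.
MISS_CYCLES = 150      # shared cache access that misses
LIMIT_CYCLES = 268     # minimum gap measured between two AES accesses
SETS_PER_TABLE = 64    # sets to probe per table
TABLES = 4             # number of targeted tables


def max_safe_reads(limit=LIMIT_CYCLES, worst_case_read=MISS_CYCLES):
    """Reads that fit in the gap if each read may be a miss (assumption)."""
    return limit // worst_case_read


def encryptions_needed(sets_per_table=SETS_PER_TABLE, tables=TABLES, reads_per_gap=None):
    """Repetitions of the same encryption needed to probe every set once."""
    if reads_per_gap is None:
        reads_per_gap = max_safe_reads()
    if reads_per_gap < 1:
        raise ValueError("no read fits between two AES accesses")
    per_table = math.ceil(sets_per_table / reads_per_gap)
    return per_table, per_table * tables


print(max_safe_reads())
print(encryptions_needed())
```

With the measured values, two worst-case reads (300 cycles) exceed the 268-cycle limit, so only one read is safe per gap, which leads to 64 repetitions per table and 256 in total.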
However, due to read collisions, this encryption was performed 64 times for each table, giving 256 executions. As presented for P+P Firecracker, four bytes of the key cannot be revealed by this attack because the cache line stores four words. Besides, P+P Arrow has another drawback: the last access to the tables during the first round is not attacked. As a consequence, three more bytes cannot be revealed, making 9 bytes the limit of this technique. Therefore, a brute-force step has to be performed after the analysis stage. To calculate the effort, we use the same logic as presented for P+P Firecracker, but with the consideration that three segments have four possibilities and one segment has 256
Fig. 8. Number of key bytes correctly recovered vs number of encryptions needed in Arrow Attack.
Table 4 FPGA Cyclone V SoC synthesis results of the NoC, the processing elements, caches and the network interfaces (NI).
Component | Logic (in ALMs) | Registers | Power (mW)
HPS (ARM core) | n/a | n/a | n/a
Cache | 7131 | 12969 | 198.75
Attacker | 47 | 96 | 0.27
Core NI | 198 | 280 | 2.70
Cache NI | 2560 | 2725 | 16.02
Router | 499 | 738 | 4.44
NoC | 6254 | 9273 | 63.17
MPSoC Platform | 17826 | 27323 | 885.61
Gossip Router | 605 | 770 | 5.16
possibilities. Then, we have an effort of $2^{14}$ per table, meaning a total effort of $2^{56}$ to reveal the whole key.
6.4. Gossip NoC experiment
The final experiment was performed to evaluate the timing security efficacy of the Gossip NoC under the proposed attack. Results show that Gossip NoC is able to protect the MPSoC due to the traffic deviation. Consequently, the timing attack did not succeed, since it was not possible to perform the probe stage properly. A second attacker could perform a timing attack in the NoC at the same time to discover the new sensitive path. However, since the main attacker triggers the reading step, synchronization between the infected IPs would have to be implemented.
6.5. Hardware costs
Table 4 presents the synthesis results regarding logic utilization, registers and power. As observed in Table 4, the Gossip router increases the logic area of the NoC by about 21%. However, the NoC, which represented 35% of the MPSoC logic when unprotected, grew to 42.5%, meaning that the true logic overhead in the system was 7.5%. When calculating the same impact for power, we obtain a power overhead of only 1.18% in the system.
7. Conclusion
P+P Firecracker presents an easy approach to implement a NoC Prime+Probe attack that requires only 14 different encryptions to reveal almost all of the key. This technique does not require high precision in the NoC timing attack, and tolerates noise from various applications that access the shared cache. This attack reduces the search space of the key to $2^{32}$. Future work aims to analyze the impact of communication noise, and how to improve the NoC timing attack to overcome this difficulty.
The P+P Arrow attack presents a very efficient technique, requiring only one encryption to recover several bytes of the key. However, this technique has two main drawbacks concerning its execution, which made it necessary to execute the same encryption 256 times in our experiments. The first drawback is related to the read collisions, which must be avoided during the probe stage of the attack. The second drawback is the lack of precision of the NoC timing attack in real circumstances, whose solution is to avoid the analysis of the last four accesses of the first round. This attack reduces the search space of the key to $2^{56}$.
In this work, we also presented the efficiency of the Gossip NoC. Gossip NoC was shown to be a lightweight approach to avoid side-channel attacks, increasing the area of the MPSoC by only 7.5%. Besides, we proposed strategies to improve the attack against this countermeasure.
References
[1] R. Karri, K. Wu, P. Mishra, Y. Kim, Concurrent error detection of fault-based side-channel cryptanalysis of 128-bit symmetric block ciphers, in: Design Automation Conference, 2001. Proceedings, 2001, pp. 579–584, doi:10.1109/DAC.2001.156206.
[2] P. Bayon, L. Bossuet, A. Aubert, V. Fischer, F. Poucheret, B. Robisson, P. Maurine, Contactless electromagnetic active attack on ring oscillator based true random number generator, 2012, pp. 151–166.
[3] P.C. Kocher, Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems, in: Proceedings of the 16th Annual International Cryptology Conference on Advances in Cryptology, in: CRYPTO '96, 1996, pp. 104–113.
[4] J. Kelsey, B. Schneier, D. Wagner, C. Hall, Side Channel Cryptanalysis of Product Ciphers, Springer Berlin Heidelberg, Berlin, Heidelberg, 1998, pp. 97–110, doi:10.1007/BFb0055858.
[5] Y. Tsunoo, T. Saito, T. Suzaki, M. Shigeri, H.
Miyauchi, in: Cryptographic Hardware and Embedded Systems - CHES 2003: 5th International Workshop, Cologne, Germany, September 8–10, 2003. Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003, pp. 62–76.
[6] D.J. Bernstein, Cache timing attacks on AES, 2005, (https://cr.yp.to/antiforgery/cachetiming-20050414.pdf). Accessed: 2016-01-03.
[7] D.A. Osvik, A. Shamir, E. Tromer, Cache Attacks and Countermeasures: The Case of AES, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 1–20.
[8] Z. Xinjie, W. Tao, M. Dong, Z. Yuanyuan, L. Zhaoyang, Robust first two rounds access driven cache timing attack on AES, in: Computer Science and Software Engineering, 2008 International Conference on, 3, 2008, pp. 785–788, doi:10.1109/CSSE.2008.633.
[9] Y.A. Younis, K. Kifayat, Q. Shi, B. Askwith, A new prime and probe cache side-channel attack for cloud computing, in: Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference on, 2015, pp. 1718–1724, doi:10.1109/CIT/IUCC/DASC/PICOM.2015.259.
[10] F. Liu, Y. Yarom, Q. Ge, G. Heiser, R.B. Lee, Last-level cache side-channel attacks are practical, in: Security and Privacy (SP), 2015 IEEE Symposium on, 2015, pp. 605–622, doi:10.1109/SP.2015.43.
[11] Y. Oren, V.P. Kemerlis, S. Sethumadhavan, A.D. Keromytis, The spy in the sandbox: practical cache attacks in JavaScript and their implications, in: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, in: CCS'15, ACM, New York, NY, USA, 2015, pp. 1406–1418, doi:10.1145/2810103.2813708.
[12] J. Sepulveda, D. Florez, G. Gogniat, Efficient and flexible NoC-based group communication for secure MPSoCs, in: 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2015, pp. 1–6, doi:10.1109/ReConFig.2015.7393301.
[13] M. Sepulveda, J.-P. Diguet, M. Strum, G.
Gogniat, NoC-based protection for SoC time-driven attacks, Embedded Syst. Lett. IEEE 7 (1) (2015) 7–10, doi:10.1109/LES.2014.2384744.
[14] W. Yao, E. Suh, Efficient timing channel protection for on-chip networks, in: Networks on Chip (NoCS), 2012 Sixth IEEE/ACM International Symposium on, 2012, pp. 142–151, doi:10.1109/NOCS.2012.24.
[15] H. Wassel, G. Ying, J. Oberg, T. Huffmire, R. Kastner, F. Chong, T. Sherwood, Networks on chip with provable security properties, Micro, IEEE 34 (3) (2014) 57–68, doi:10.1109/MM.2014.46.
[16] C. Reinbrecht, A. Susin, L. Bossuet, J. Sepúlveda, Gossip NoC – avoiding timing side-channel attacks through traffic management, in: 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2016, pp. 601–606, doi:10.1109/ISVLSI.2016.25.
[17] G. Sievers, J. Daberkow, J. Ax, M. Flasskamp, W. Kelly, T. Jungeblut, M. Porrmann, U. Rückert, Comparison of shared and private L1 data memories for an embedded MPSoC in 28nm FD-SOI, in: Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 2015 IEEE 9th International Symposium on, 2015, pp. 175–181, doi:10.1109/MCSoC.2015.25.
[18] J. Daemen, V. Rijmen, The Design of Rijndael, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2002.
[19] A. Borhani, A. Movaghar, R. Cole, A new deterministic fault tolerant wormhole routing strategy for k-ary 2-cubes, in: Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on, 2010, pp. 1–7, doi:10.1109/ICCIC.2010.5705721.
[20] K. Tatas, S. Sawa, C. Kyriacou, Low-cost fault-tolerant routing for regular topology NoCs, in: Electronics, Circuits and Systems (ICECS), 2014 21st IEEE International Conference on, 2014, pp. 566–569, doi:10.1109/ICECS.2014.7050048.
[21] Altera, 2016. Cyclone V SoC - Overview, (Date last accessed 17-April-2016). URL https://www.altera.com/products/soc/portfolio/cyclone-v-soc/overview.tablet.html.
Cezar Rodolfo Wedig Reinbrecht, Ph.D. student (Federal University of Rio Grande do Sul UFRGS, BR), M.Sc. in Computer Science (Federal University of Rio Grande do Sul UFRGS, BR, 2012) and B.Sc. in Computer Engineering (Catholic University of Rio Grande do Sul PUCRS, BR, 2009). He joined Laboratoire Hubert Curien (France) in 2015, working as a researcher in the field of MPSoC security. He worked at NSCAD Microeletronics from 2010 to 2014, as project manager from 2013 to 2014 and as instructor/IC designer from 2010 to 2012. As project manager, his main projects were an ARM-M3 SoC with support for IEEE 802.15.4 and the start of the project of a transponder for the Brazilian CubeSat (satellite) targeted for the SBCDA (INPE). As IC designer, he worked on four main front-end designs, notably a fault-tolerant SoC (containing two MIPS cores) in partnership with the Brazilian Space Agency and a digital baseband for IEEE 802.15.4. He also worked at Uniritter University as a professor in the Informatics department from 2013 to 2015. His research interests include MPSoC architectures, security of Networks-on-Chip and silicon photonics.
Altamiro Susin graduated in Electrical Engineering from the Federal University of Rio Grande do Sul (1972) and received a Diplôme d'Études Approfondies from the Institut National Polytechnique de Grenoble (1979), an M.Sc. in Computer Science from the Federal University of Rio Grande do Sul (1977), a Ph.D. in Informatique from the Institut National Polytechnique de Grenoble (1981) and a post-doctorate from McGill University (1998). He is currently a professor at the Federal University of Rio Grande do Sul. His research focuses on Electrical Engineering with emphasis on microelectronics. His main topics are: Microprocessors, Integrated Circuits Architecture, Microelectronics, VLSI.
Lilian Bossuet received an M.S. degree (2001) in electrical engineering from Institut National des Sciences Appliquées Rennes, France, and a Ph.D. degree (2004) in electrical engineering and computer sciences from the University of South Brittany, Lorient, France. From 2005 to 2010, he was an Associate Professor and the head of the Embedded System Department in the Bordeaux Institute of Technologies. Since 2010, he has been an Associate Professor at the University of Lyon/Saint-Étienne and a member of the Hubert Curien Laboratory. He is the head of the Embedded System Security group in this laboratory. He holds the outstanding CNRS (Centre National de la Recherche Scientifique) Chair of Applied Cryptography and Embedded System Security. His main research activities focus on embedded systems hardware security, IP protection, crypto-processor design, and reconfigurable architecture for security. Lilian is a senior member of the IEEE.
Georg Sigl finished his PhD in Electrical Engineering at Technical University Munich in 1992 in the area of layout synthesis. Afterwards he introduced new design-for-testability concepts in telecommunication ASICs at Siemens. In 1996 he joined the automotive microcontroller department at Siemens HL (later Infineon) to develop a universal library for peripherals to be used in 16- and 32-bit microcontrollers. From 2000 he was responsible for the development of new secure microcontroller platforms in the Chip Card and Security division. Under his responsibility, two award-winning platforms, the SLE88 (Cartes Sesames Award 2001) and the SLE78 (Cartes Sesames Award 2008; Innovation Award of the German Industry 2010), were designed. In June 2010, he founded a new institute at Technical University Munich for Security in Electrical Engineering and Information Technology. In parallel, he is driving embedded security research as director at the Fraunhofer Research Institute for Applied and Integrated Security AISEC Munich.
Martha Johanna Sepúlveda Flórez received the M.Sc. and Ph.D. degrees in Electrical Engineering (Microelectronics) from the University of São Paulo, Brazil in 2006 and 2011, respectively. She was a postdoctoral fellow at the Integrated Systems and Embedded Software group at this University and at the group of Embedded Security of the University of South Brittany, France. Moreover, Dr. Sepúlveda was a Visiting Researcher at the Computer Architecture group at the University of Bremen, Germany. In 2014, she worked as a Senior INRIA Postdoctoral researcher at the Heterogeneous Systems group at the University of Lyon, France. Since 2015, she has held a Senior Researcher Assistant position at the Technical University of Munich, Germany. She has been working in the field of embedded security design for more than 10 years. Her research interests also include high performance SoC design and new technologies design.