Nano Communication Networks
Contents lists available at ScienceDirect
Nano Communication Networks journal homepage: www.elsevier.com/locate/nanocomnet
Role based shared memory access control mechanisms in NoC based MP-SoC
Arnab Kumar Biswas a,∗, S.K. Nandy b
a Department of Electronic Systems Engineering, Indian Institute of Science, Bangalore, India
b Supercomputer Education and Research Center, Indian Institute of Science, Bangalore, India
Article info
Article history: Received 26 February 2015; Received in revised form 1 November 2015; Accepted 18 November 2015; Available online xxxx
Keywords: NoC architecture; RBAC implementation; Access control; Confidentiality; Availability
Abstract
Security is becoming one of the main aspects of Multiprocessor System-on-Chip (MP-SoC) design. Software attacks, the most common type of attack, mainly exploit vulnerabilities such as buffer overflow. This is possible if proper access control to memory is absent in the system. In this paper, we propose three hardware based mechanisms to implement the Role Based Access Control (RBAC) model in a Network-on-Chip (NoC) based MP-SoC. To the best of our knowledge, our solutions are the first attempts to implement hardware based solutions based on the RBAC model. All three proposed mechanisms use a Resource Access Manager (RMAN) in a shared memory MP-SoC. The processing element (PE) connected routers of the NoC are retrofitted with a module called the Local Access Manager (LMAN). The three proposed access control mechanisms are central, hybrid, and local, depending on where the access control decisions are taken. The proposed mechanisms ensure confidentiality and availability in the whole MP-SoC, i.e., they can not only detect and prevent unauthorized access attacks but also prevent denial-of-service attacks. The largest area increase of a PE connected router (for the hybrid case) is only 8.6% compared to a normal router. Experimental results show that, in most cases, the proposed mechanisms have average memory access latencies similar to those of a normal NoC. The only drawback of the three mechanisms is that they have a scalability limit and cannot be used for a very large number of PEs. This drawback is solved by a modified local access control mechanism. We show the effectiveness of the modified local access control mechanism by implementing it in a massively parallel multi-application architecture called REDEFINE. We show that the modified local access control mechanism is scalable and causes acceptable area and power overhead. Synthesis results show that the REDEFINE router area and power increase by only 1.13% and 1.3%, respectively, without degrading the maximum frequency of operation. © 2015 Elsevier Ltd. All rights reserved.
∗ Corresponding author. E-mail addresses: [email protected] (A.K. Biswas), [email protected] (S.K. Nandy).
http://dx.doi.org/10.1016/j.nancom.2015.11.002
1878-7789/© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
A Multiprocessor System-on-Chip (MP-SoC) typically comprises two sub-systems, viz. computation and communication. Fig. 1 shows an abstract representation of the two sub-systems and their interactions. Nowadays, MP-SoCs are used extensively as embedded systems, specifically in different kinds of consumer electronic products including PDAs and cell phones. These products are used to access, store, or manipulate various kinds of sensitive information, which makes security a serious concern in MP-SoC design [1]. A secure MP-SoC means that both the computation and communication sub-systems are secure from different kinds of attacks, which can be classified as physical, side-channel,
and software attacks [1]. Among them, software attacks are more common because they are relatively easy to implement and deploy. A software attack is launched by executing malicious software on the target system or by exploiting vulnerabilities in installed software. Buffer overflow is the most common software vulnerability used to launch various kinds of software attacks that result in unauthorized data access, i.e., stealing of sensitive data and modification or destruction of data [2]. Buffer overflow is possible if writing to an unauthorized location in the memory is not prevented. So, the root cause is unauthorized access, or the violation of confidentiality. This indicates that, if we implement an efficient access control mechanism in hardware itself, the majority of software attacks can be prevented. The access control mechanism can not only restrict memory access to authorized applications, but also restrict the actions they are authorized to perform.
Fig. 1. Multiprocessor System-on-Chip mainly consists of computation sub-system and communication sub-system.
In this paper, our aim is to protect the computation sub-system from software attacks by implementing security measures on top of the communication sub-system. The on-chip communication architecture can play a central role in MP-SoC security because of its system-wide presence. In this paper, a Network-on-Chip (NoC) is considered as the communication sub-system. The NoC can not only perform on-chip communication efficiently but can also help to enforce a given security policy in the MP-SoC. The computation sub-system enters the NoC only at the routers. So, security mechanisms have to be present at all the entry points (routers) of the communication sub-system to provide security.
In this paper, we propose three access control mechanisms based on the Role Based Access Control (RBAC) model. Details of the RBAC model and its hardware implementation mechanism in an MP-SoC are given in Section 3.
The three proposed access control mechanisms are central, hybrid, and local, depending on where the access control decision is taken. The central approach has a central Resource Access Manager (RMAN) that is responsible for taking access control decisions. Every processing element (PE) connected router is retrofitted with a Local Access Manager (LMAN), and these routers are connected to the RMAN. In the hybrid approach, both the RMAN and the LMANs are responsible for taking access control decisions. In the local approach, only the LMANs take the memory access control decisions. These three
proposed mechanisms have a scalability limit and are not applicable to shared memory MP-SoCs that contain a large number of PEs and memory modules. The shared memory provides data exchange and synchronization support in an MP-SoC. The proposed access control mechanisms prevent a number of software attacks. This not only ensures confidentiality by means of access control but also enables the MP-SoC to maintain resource (memory) availability for the authorized applications. That means the proposed techniques can detect and prevent information leakage/corruption attacks as well as denial-of-service attacks. The proposed access control mechanisms are built into our existing NoC architecture framework.
Next, the scalability limit of the three access control mechanisms is removed by modifying the local access control mechanism. We implement this modified local access control mechanism in a massively parallel data flow architecture called REDEFINE [3–6]. We use the version of REDEFINE that supports multiple applications. The modified local access control mechanism is shown to prevent a malicious application from launching an unauthorized access attack in REDEFINE.
The first contribution of our proposed methods (central, hybrid, local, and modified local) is the hardware implementation and analysis of the RBAC model with simulation and synthesis results. It is known that a shared bus is profitable compared to an NoC up to around 6 PEs (in the case of simultaneous parallel applications); otherwise the scalability limit of a shared bus is around a dozen PEs [7]. So, the first three proposed methods are still useful for NoC based MP-SoCs of moderate size. The second contribution of the first three methods is that they prepare the path for the highly scalable solution, i.e., the modified local approach. The modified local approach can be understood easily only after a clear description and analysis of the first three techniques, because it evolved from them.
The main contribution of the modified local approach is that it is highly scalable. Its implementation in the massively parallel REDEFINE architecture is described with simulation and synthesis results.
1.1. Outline of the paper
The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 analyses the RBAC model and its implementation procedure in an MP-SoC. Section 4 gives detailed descriptions of the three proposed memory access control mechanisms. Simulation results for the three access control mechanisms are given in Section 5. Section 6 describes the highly scalable modified local access control mechanism and its implementation in the REDEFINE architecture. Simulation results of the REDEFINE implementation are also given in that section. Section 7 concludes the paper.
2. Related work
Hardware protection mechanisms such as the Trusted Platform Module (TPM) are designed to protect encryption keys [8]. TPM also includes the measurements of the
integrity of an OS to establish trust. However, TPM's protection model does not consider how and where the keys are stored after they are removed from the TPM. It is the application that is responsible for ensuring proper access control of sensitive data. In fact, the decrypted keys from a TPM can be extracted by examining memory contents [9,10]. A reset attack against TPM is also reported in the literature [11]. In 2013, Germany's Federal Office for Information Security (BSI) issued a statement that TPM 2.0 chips could lead to "a loss of control" over both the Operating System (OS) and the hardware, without specifying exactly how that could occur [12]. Our target is to provide security inside an MP-SoC without depending on an external device like a TPM.
A security-aware communication architecture design is introduced in [13]. The architecture, called SECA, enforces access control rules in a particular context. AMBA bus transactions are monitored using a Security Enforcement Module (SEM) and a Security Enforcement Interface (SEI) for each slave device connected to the bus. The solution is applicable only to AMBA bus based systems and not to NoC based systems.
In ARM TrustZone [14], one extra NS (non-secure) bit is added to the 32 bit AXI bus. This NS bit is used to differentiate secure transactions from non-secure ones. However, there is a data coherence problem resulting from the 33 bit address space: the same memory location appears as two distinct locations in the address map, one Secure and one Non-secure. TrustZone is therefore only applicable to a modified 33 bit AXI bus based system.
In [15], software–hardware mechanisms are provided by the Secret Protection (SP) architecture. There, a Trusted Software Module (TSM) is directly protected by SP's hardware features.
In that work, the authors enable application-level access control and information sharing with direct hardware support and protection, bypassing the dependency on the OS. This means that existing applications have to be modified according to the architecture.
In [16], a software–hardware architecture called DataSafe is proposed that realizes the concept of self-protecting data. This architecture provides dynamic instantiations of secure data compartments (SDCs) and hardware monitoring of the data flows from the compartment. The security depends on the modified hypervisor code base that is responsible for different cryptographic operations. The memory organization is changed to store tag bits for every stored word. Processor datapaths are also changed to add hardware tags. So, overhead is present in both the software and hardware parts of the solution. Our proposed solutions do not require major changes in software or hardware as required in these earlier works.
The authors of [17] propose a mechanism to secure a reconfigurable MP-SoC based on an NoC. The system consists of Secure NIs (SNIs) and a Secure Configuration Manager (SCM). The SNIs handle denial-of-service attacks and unauthorized read/write accesses. The SCM configures system resources and NIs and also monitors the system for possible attacks. The granularity of the access control is the hardware device, i.e., if one PE has some permission, all the applications running on that PE also have that permission. This is a major
drawback, and application-level flexibility of access control is also not provided. Our proposed solutions use role based access control, whose granularity is finer than that of device based access control.
In [18], the authors propose a secure NoC architecture composed of a set of Data Protection Units (DPUs) implemented within the Network Interfaces (NIs). A central unit called the Network Security Manager manages the runtime configuration of the programmable part of the DPUs. The access rights are defined for two operating roles (kernel/user) in each DPU, but that is not sufficient to support multiple security roles per PE. Though there is no performance penalty during access control, the mechanism implements permission tables in every NI, which makes this solution extremely area hungry. The RBAC model provides finer access control granularity in our case, and the overhead of an entire permission table in every NI is absent in our mechanisms.
The authors of [19] propose a memory Protection Unit that supports flexible co-hosting of multiple software stacks on a shared memory MP-SoC using an NoC. Apart from the caches present in every NI, there is a central memory holding the full permission table. So, the area requirement is not less than that of the earlier solution in [18]. All access control decisions are made using the caches. On a cache miss, the mechanism requires several clock cycles because of the long cache update procedure. If any application with any access level is allowed to run on any PE, then application performance is hampered by the permission update overhead. The RBAC model provides finer access control granularity in our case, and the overhead of large caches in every NI is absent in our mechanisms.
3. Implementation of role based access control
The most popular access control model is the Role Based Access Control (RBAC) model [20–22]. A role is a semantic construct that forms the basis of RBAC.
A system administrator creates roles according to the different functions that must be performed in the system. Once roles are defined, the administrator grants permissions (access authorizations) to the roles instead of assigning them directly to the users. Then, the users are assigned appropriate roles according to user identities, performance necessities, and security requirements. The combination of users and permissions brought together by roles is not fixed. On the other hand, a permission associated with a role is more stable. Therefore, implementing an access control mechanism becomes simpler if we consider roles. New users can simply be assigned to existing roles by the administrator as per requirements (both performance and security). Fig. 2 shows the four main entities of the RBAC base model, i.e., users (U), roles (R), permissions (P), and sessions (S).
In the case of an MP-SoC, users (U) in the RBAC model correspond to the different applications running on different PEs. These applications are assigned appropriate roles on the basis of the functional requirements of a task. The principle of least privilege is followed for the assignments, i.e., only the minimum access permissions (in terms of roles) needed to complete a specific task are granted. An application starts a session at the start of its execution. Fig. 2 also shows user assignment (UA) and permission
assignment (PA) relations, which are many-to-many relations. An application may have different roles assigned to different parts of it. Likewise, different applications may need the same role to execute a particular task. Similarly, a role can have many permissions, and a permission can be assigned to many roles. The double-headed arrow from the session to R in Fig. 2 indicates that each session may map an application to possibly many roles. The single-headed arrow from the session to U in Fig. 2 indicates that each session is associated with a single application. An MP-SoC with multiple PEs may have multiple sessions running at the same time.
Fig. 2. Role based access control base model.
The formal definition of the RBAC base model is as follows [21].
Definition 1. The RBAC base model has the following components:
• $U$, $R$, $P$, and $S$ (users, roles, permissions, and sessions, respectively);
• $PA \subseteq P \times R$, a many-to-many permission-to-role assignment relation;
• $UA \subseteq U \times R$, a many-to-many user-to-role assignment relation;
• $user : S \to U$, a function mapping each session $s_i$ to the single user $user(s_i)$ (constant for the session's lifetime); and
• $roles : S \to 2^R$, a function mapping each session $s_i$ to a set of roles $roles(s_i) \subseteq \{r \mid (user(s_i), r) \in UA\}$ (which can change with time); session $s_i$ has the permissions $\bigcup_{r \in roles(s_i)} \{p \mid (p, r) \in PA\}$.
Fig. 3 shows the steps required to execute an application in an MP-SoC with the proposed access control mechanisms. The assignment of applications to appropriate roles is done by a trusted software component. It is assumed that a malicious application cannot acquire a role merely by issuing a malicious access request. The trusted software component is part of our ongoing work and beyond the scope of this paper. The administrator stores the role-to-permission assignments in a table called the permission table. For all the proposed mechanisms, the loading of the permission table is authenticated by hash based signatures.
Other authentication methods are also available in the literature that can ensure that only the administrator can load the permission table. The proposed access control mechanisms check the access request of every memory-bound packet against the permission associated with the packet's role. In case of an access violation, both the packet and the PE that originated it are blocked.
4. Central, hybrid and local access control mechanisms
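Before turning to the individual mechanisms, the RBAC base model of Definition 1 can be made concrete with a small software sketch. It is only an illustration of the set relations, not the hardware implementation proposed in this paper. The class name `RBACModel`, the application and role names, and the encoding of a permission as a `(buffer, action)` pair are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RBACModel:
    """Software sketch of Definition 1 (all names are illustrative)."""
    ua: set = field(default_factory=set)         # UA: {(user, role)}
    pa: set = field(default_factory=set)         # PA: {(permission, role)}
    user_of: dict = field(default_factory=dict)  # user : S -> U
    roles_of: dict = field(default_factory=dict) # roles : S -> 2^R

    def start_session(self, session, user, roles):
        # roles(s) must be a subset of {r | (user(s), r) in UA}.
        assignable = {r for (u, r) in self.ua if u == user}
        if not set(roles) <= assignable:
            raise PermissionError(f"{user} is not assigned all of {sorted(roles)}")
        self.user_of[session] = user
        self.roles_of[session] = set(roles)

    def permissions(self, session):
        # Union over the active roles r of {p | (p, r) in PA}.
        active = self.roles_of[session]
        return {p for (p, r) in self.pa if r in active}

    def is_allowed(self, session, permission):
        return permission in self.permissions(session)


# Two applications (users) with one role each; permissions are
# hypothetical (buffer, action) pairs.
model = RBACModel(
    ua={("app_a", "producer"), ("app_b", "consumer")},
    pa={(("buf0", "store"), "producer"), (("buf0", "load"), "consumer")},
)
model.start_session("s1", "app_a", {"producer"})
model.start_session("s2", "app_b", {"consumer"})
```

In this sketch, session `s1` may store to `buf0` but session `s2` may not, and an application cannot start a session with a role that UA does not assign to it.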
In this section, three access control mechanisms are proposed. In all cases, multiple RMANs and multiple LMANs are used, but their structures differ from case to case. All the figures in this section are general in nature, with indications of optional modules and connections for the different proposed cases. For example, if a module or connection is present only for the central and hybrid cases, it is marked "(For Central and Hybrid)" in the figure if space is available. This particular module or connection is absent in the local case. The optional points that are not marked in a figure are stated in the caption of that figure.
4.1. Central access control mechanism
Fig. 4 shows a 4 × 4 mesh topology NoC with 4 shared memories and 4 RMANs. Every shared memory module requires an RMAN for access control. The routers in the last column do not include LMANs because they are not connected to PEs. The remaining 12 routers are connected to PEs and include LMANs. In the central case, the connections between the PE connected routers and the RMAN are unidirectional. The LMAN diverts all packets destined for the memory from the router towards the RMAN. In the RMAN, every packet is checked against a permission table entry. The entry is accessed using the packet's role as an index. If the packet is allowed, it is sent directly to the router connected to the memory. All other packets, including the response packet from the memory, travel only through the routers. If a malicious application tries to load from or store to a memory location for which it does not have permission, the packet is blocked in the RMAN. The data line connection from all 12 routers to RMAN3 is shown. Only one router at a time is allowed to send data to an RMAN. Fig. 5 shows the router connections required for 4 RMANs and 2 neighboring routers.
Request signals are 1 bit wide, grant/block signals are 2 bits wide (only a 1 bit grant signal for the local case), and the data connections to the RMAN and other routers are 32 bits wide. A multi-level interconnection implementation allows the 12 routers to communicate with the 4 RMANs. For the central case, the block update connection to the same-row RMAN is absent and the data lines to the RMANs are unidirectional. The bidirectional arrow of the data lines in Fig. 5 indicates that the same lines also carry the partial permission table (PPT) update for the hybrid and local cases. For the local case, only roles are sent to the RMAN instead of data, and only grant signals arrive from the RMAN instead of grant/block signals.
The block diagram of the RMAN is shown in Fig. 6. It mainly consists of an Arbitration module, a Manager module, a Comparator, and a Permission table. In the central case, there is a selector module that detects the grant or the block signals and sends the grant/block signals accordingly. The grant signal goes first, and the block signal may follow after data comparison. Every grant/block signal from the selector is 2 bits wide. One bit indicates the type of the signal (grant or block) and the other bit carries the information (set or not). After receiving requests from multiple routers, the arbiter grants one router and informs this router id
Fig. 3. Different steps of an application execution in an MP-SoC with proposed memory access control mechanisms.
Fig. 4. 4 × 4 mesh topology NoC with 4 shared memories and 4 RMANs. Apart from the 4 routers in the last column, every router is connected to a PE (not shown). The same topology is used for the central, hybrid, and local access control mechanism implementations, with different modules (RMAN and routers) and connections between them. The connections between the RMANs and the memory connected routers are present only for the central and hybrid cases.
Fig. 5. The router connections with 4 RMANs. For central case, the data links to RMANs are unidirectional. For local case, only roles are sent to RMAN instead of data and only grant signals arrive from RMAN instead of grant/block signals.
Fig. 6. RMAN block diagram with different modules. The number of Grant/Block signals and Request signals correspond to the number of routers connected. For local case, only roles arrive to RMAN from granted router in place of data and grant signal is sent from RMAN in place of grant/block signal. The connection to the memory connected router is also absent for local case.
to the manager module. Data arrives from the granted router at the comparator to be compared against the corresponding permission table entry. The comparator sends an allowed/not-allowed signal to the manager module after the comparison. If the result is not-allowed, the manager module sends a block signal to the granted router via the selector module. Otherwise, the comparator passes the packet to the memory connected router if that router is ready to receive. The sending of a grant signal, the arrival of a packet at the comparator, the comparison, and the passing or blocking of the packet all happen in one clock cycle. When a packet is blocked, the manager module saves the malicious router address for later audit by the administrator. It can also store the whole packet, depending on the audit requirement. A secure loading technique (authentication of the administrator by hash based signatures) is available to upgrade or load permission table entries. Otherwise, the trust of the system cannot be ensured.
Currently, it is assumed that a signal reaches the RMAN from a router, or vice versa, within 1 clock cycle. Note that all the proposed modules operate with a handshaking mechanism (e.g., request, grant/block, and ready signals) and do not depend on a particular number of clock cycles for signal propagation (1 in the current case). This is true for all three proposed mechanisms. We consider 1 clock cycle signal propagation so that signal integrity considerations are easily satisfied. This may not hold for larger fabric sizes (for example, a 10 × 10 NoC or larger). Signal integrity mainly depends on the frequency of operation of the MP-SoC. For a large clock period, signal integrity may be satisfied (considering 1 clock cycle propagation), but the performance of the MP-SoC will be hampered. Higher frequency routers (e.g.
pipelined routers) can also use the proposed access control mechanisms, and signal propagation may take more than 1 clock cycle, provided the signal integrity requirements (setup and hold times) are satisfied. The number of lines connected to an RMAN also increases as the number of routers increases. This may complicate the RMAN design both externally (number of ports) and internally (logic to handle all signals). For all these reasons, none of the three access control mechanisms is applicable to MP-SoCs
with a very large number of PEs and shared memory modules.
Fig. 7 shows the architecture of a PE connected router. The router has 4 input and 4 output ports connected to the 4 neighboring routers. There is an input and an output port connected to a PE. The routing table loader (RT Loader) module loads the routing tables from special configuration packets. The LMAN module is also shown with its connections to an RMAN. For the central case, the permission update from the RMAN and the block update to the RMAN are absent. In the local case, there is no block signal from the RMAN to the LMAN, and the LMAN sends the role in place of data to the RMAN.
The LMAN block diagram is shown in Fig. 8. It mainly consists of a memory module address store, a control module, and a data register. The comparator and the PPT are absent for the central case. An interpreter accepts the grant/block signal from the RMAN and detects the type of the signal (grant or block). It then forwards the signal either as a grant signal or as a block signal. In the case of multiple RMANs, multiple interpreters are present to accept multiple grant/block signals. The memory module address store is loaded with the addresses (router addresses) of the memory modules present in the MP-SoC. The LMAN receives a packet from the eject port connected to the PE in parallel with the eject port FIFO. The parallel reception ensures that the LMAN does not affect the normal packet flow in any manner unless the packet is going to memory. If the packet address is that of a memory module, the LMAN disables the eject port FIFO of the router. Disabling the FIFO stops that packet from being routed to any other router. The control module sends a request signal to the RMAN for memory access. In the case of multiple RMANs, it sends the request signal to the RMAN corresponding to the particular memory module. Next, the copy of the packet held in the eject port FIFO is dequeued, and the FIFO can accept new packets from the connected PE.
This means that the PE can continue normal operation by sending non-memory packets to the other PEs. The packet stored in the LMAN is sent to the RMAN after the RMAN grants the request. The RMAN sends a block signal to the LMAN if the role corresponding to the packet does not have the required permission. The LMAN can block the PE, clearing any data present in the eject port FIFO, or it can send a warning signal to the PE about the mis-attempt. If the
Fig. 7. Router architecture including the LMAN with connections with an RMAN.
Fig. 8. LMAN block diagram with connections with an RMAN. For local case, Data register sends role to RMAN in place of data.
mis-attempt is unintentional, the PE/application can correct it. If the attempt is intentional, i.e., malicious, it will be unsuccessful again.
For the central case, the memory connected router has one input port through which it receives packets from the RMAN to be sent to the memory. The router passes the free signal of the memory through to the RMAN and the data packet from the RMAN through to the memory. All packets destined for the memory are diverted by the LMANs to the RMAN. This ensures that all packets from all the routers are properly checked in the RMAN before reaching the memory. When the memory replies, the reply packet traverses only the routers of the NoC. The response packet does not require any access control, because access has already been granted to the destination PE/application, which is why it receives the response from the memory.
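The central RMAN behavior described above (arbitrate among requesting routers, send a grant, compare the packet against the permission table entry indexed by its role, then forward or block) can be sketched in software as follows. This is a behavioral illustration under stated assumptions, not the authors' hardware design: the fixed-priority arbiter, the bit positions in the 2 bit grant/block signal, the `Packet` fields, and the permission entry layout `(allowed ranges, can_read, can_write)` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

GRANT, BLOCK = 0, 1  # assumed values of the signal "type" bit


def encode_signal(kind, is_set):
    """Pack a 2 bit grant/block signal: type bit (assumed MSB), info bit."""
    return (kind << 1) | int(is_set)


def decode_signal(signal):
    """What an LMAN interpreter would do: recover (type, set)."""
    return (signal >> 1) & 1, bool(signal & 1)


@dataclass(frozen=True)
class Packet:
    src_router: int
    role: int
    range_id: int   # leftmost address bits selecting a memory range
    is_write: bool


class CentralRMAN:
    def __init__(self, permission_table):
        # role -> (set of allowed range ids, can_read, can_write); assumed layout
        self.table = permission_table
        self.audit_log = []  # router addresses saved for later audit

    def allowed(self, pkt):
        entry = self.table.get(pkt.role)  # the role is the table index
        if entry is None:
            return False
        ranges, can_read, can_write = entry
        action_ok = can_write if pkt.is_write else can_read
        return pkt.range_id in ranges and action_ok

    def serve(self, pending):
        """Serve one request; `pending` maps router id -> waiting packet."""
        router = min(pending)  # assumed fixed-priority arbitration
        pkt = pending.pop(router)
        signals = [encode_signal(GRANT, True)]  # the grant goes first
        if self.allowed(pkt):
            return router, pkt, signals  # forwarded to the memory router
        self.audit_log.append(router)
        signals.append(encode_signal(BLOCK, True))  # block follows comparison
        return router, None, signals
```

With a table such as `{0: ({0b00}, True, False)}`, a store issued under role 0 is granted the link, fails the comparison, and its router address is logged, while a load to range `00` under the same role is forwarded.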
4.2. Hybrid access control mechanism
In the hybrid case, the connections between the PE connected routers and the RMAN are bidirectional, as shown in Fig. 4. The LMAN and RMAN behavior for the first memory-bound packet is the same as in the central case. The only difference is that the hybrid RMAN also sends the permission table entry corresponding to the packet to the corresponding LMAN. An LMAN has a partial permission table (PPT) to hold the permission table entry. The next packet from the same PE is compared in the LMAN itself if the packet role remains the same. Otherwise, the path through the RMAN to the memory is taken again and the LMAN PPT is updated. After comparison in the LMAN, an allowed packet is routed through the NoC routers towards the memory. For the hybrid case, both the RMAN and the LMAN are responsible for blocking a malicious packet. Memory connected routers not only
receive packets from RMANs but from other routers also. All three routers of a row send a block update signal to the RMAN of the same row, as shown in Fig. 4. If a malicious packet is detected in an LMAN, the RMAN of that row can save the router address for audit purposes.
The main components of the RMAN are similar to those of the central case, as shown in Fig. 6. The main differences are the permission update to the LMAN and the block update from the LMAN. The operations of the different modules are also similar to the central RMAN case. The LMAN does not send the packet to the RMAN if it is compared locally in the LMAN. When a malicious packet is detected, the LMAN sends a block update signal to the RMAN for audit purposes. Whether the packet is detected in the RMAN or the LMAN, the manager module of the RMAN saves the malicious router address for later audit by the administrator.
For the hybrid case LMAN, the main differences from the central case are the presence of the comparator module and the partial permission table (PPT), as shown in Fig. 8. The control module sends a block update signal to the RMAN if a malicious packet is detected in the comparator. If the PPT entry is empty or if there is a role mismatch between the packet and the PPT entry, the control module sends a request signal to the RMAN. Only the LMAN that has sent a packet (after receiving a grant) to the RMAN is supposed to receive a PPT update. This is ensured by the control module of that LMAN, which enables the update reception port of the PPT. There is a high chance that the same PE/application will use the same role more than once consecutively for sending packets to the memory. In that case, the PPT entry holds the correct permission entry and the comparison happens locally in the LMAN. In this way, the arbitration delay of the LMANs that really need to update their PPTs from the RMAN is reduced, because the arbitration delay depends on the total number of LMAN requests.
Unlike in the central case, the memory connected router in the hybrid case receives packets destined for the memory not only from the RMAN but from other routers also. So, the port connecting the RMAN with the router behaves like any other input port. Contrary to the central case, the input packet from the RMAN is not bypassed.
4.3. Local access control mechanism
The connection between the RMAN and the memory connected router in Fig. 4 is absent in the local case, because packets are checked for proper permissions in the LMAN only. LMANs do not send any packet to the RMAN, even for the first packet. If the LMAN PPT is empty or does not hold the entry corresponding to the packet's role, the LMAN sends the role to the RMAN to obtain the corresponding permission table entry. The next packet from the same PE is compared in the LMAN itself if the packet role remains the same. Otherwise, the whole procedure of updating the LMAN PPT from the RMAN is repeated. After comparison in the LMAN, an allowed packet is routed through the NoC routers towards the memory. For the local case, only the LMAN is responsible for blocking a malicious packet. That is why the grant/block signals sent by the RMAN in the central and hybrid cases are only grant signals (no block signals) in the local case.
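The PPT behavior shared by the hybrid and local cases (compare locally while the role stays the same, contact the RMAN only when the PPT is empty or the role changes) can be sketched as a small behavioral model. The sketch follows the local case, where only the role travels to the RMAN and the comparison happens in the LMAN; in the hybrid case the first packet would itself be checked in the RMAN. It assumes a single-entry PPT and a hypothetical entry layout `(allowed ranges, can_read, can_write)`; the class and attribute names are illustrative only.

```python
class LocalRMAN:
    """Holds the full permission table and serves PPT update requests."""

    def __init__(self, permission_table):
        self.table = permission_table  # role -> (ranges, can_read, can_write)
        self.lookups = 0               # number of PPT updates served

    def entry_for(self, role):
        self.lookups += 1
        return self.table.get(role)


class LocalLMAN:
    """LMAN with an assumed single-entry partial permission table (PPT)."""

    def __init__(self, rman):
        self.rman = rman
        self.ppt_role = None    # role of the cached entry (None = empty PPT)
        self.ppt_entry = None
        self.violations = 0     # count of blocked packets

    def check(self, role, range_id, is_write):
        # PPT empty or role mismatch: send the role to the RMAN for an update.
        if self.ppt_role != role:
            self.ppt_entry = self.rman.entry_for(role)
            self.ppt_role = role
        # The comparison itself always happens locally in the LMAN.
        if self.ppt_entry is None:
            allowed = False
        else:
            ranges, can_read, can_write = self.ppt_entry
            allowed = range_id in ranges and (can_write if is_write else can_read)
        if not allowed:
            self.violations += 1
        return allowed
```

Consecutive packets with the same role reach the RMAN only once in this model, which mirrors the reduction in arbitration requests described for the hybrid case.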
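Table 1
An example of a permission table.

| Role Id | Multi range (2) | 1st range | 2nd range (optional) | Permission |
|---------|-----------------|-----------|----------------------|------------|
| 00      | 0               | 00        | xx                   | 10         |
| 01      | 0               | 10        | xx                   | 01         |
| 10      | 0               | 00        | xx                   | 11         |
| 11      | 1               | 01        | 00                   | 10         |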
The RMAN in the local access case does not contain the selector and comparator modules. The reasons are, respectively, that the RMAN does not send a block signal to the LMAN and that the LMAN does not send any packet to the RMAN to be checked by the comparator. The connection with the memory connected router is also absent in the local case. In the local case LMAN, the interpreter module is absent and the grant signal from the RMAN is 1 bit wide. The interpreter is not required because the RMAN sends only a grant signal in place of the grant/block signal. In the local case, the LMAN sends the role instead of data to the RMAN, and the memory connected router receives packets destined for the memory from the other routers only. So, the memory connected router does not have any extra port connecting to the RMAN in the local case.

4.4. Permission table

In the permission table, only those memory addresses are allowed whose leftmost few bits are equal to a particular value. If there are N CPUs trying to access memory, then there must be provision to allocate their data separately. So, we need at least N different memory ranges. Now, if the memory address is $n$ bits wide, there are $2^n$ locations in the memory. Then we can select a maximum number (say $m$) that is a power of 2 and follows the relation $m \le 2^n / N$. So, if $p$ is equal to $\log_2 m$, $(n - p)$ bits can be used to distinguish different memory ranges. In every range $2^p = m$ locations are present. Each PE can be allocated a range that is disjoint from every other memory range. But for a shared memory architecture, two or more CPUs must have a common memory range so that they can communicate with each other by data sharing. More than one range for one PE can also be allowed, and these ranges need not be contiguous. Extra bits per table entry are required to indicate whether multiple ranges (non-contiguous memory ranges) are allowed for that entry. If allowed, these extra bits also indicate the number of ranges allowed.
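The range-size arithmetic above can be made concrete with a short sketch. The function and type names below are hypothetical and not part of the paper's hardware; the code only evaluates the relations stated in the text (largest power of two m with m ≤ 2^n/N, p = log2 m, and n − p range-select bits), assuming n ≤ 32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: how an n-bit address space splits into ranges. */
typedef struct {
    unsigned offset_bits;  /* p: bits addressing a location inside one range */
    unsigned range_bits;   /* n - p: leftmost bits that select the range     */
    uint64_t range_size;   /* m = 2^p locations per range                    */
} range_layout;

/* Largest power-of-two range size m with m <= 2^n / N, as in the text. */
range_layout partition_address_space(unsigned n, unsigned num_pes)
{
    assert(n >= 1 && n <= 32 && num_pes >= 1);
    uint64_t total = (uint64_t)1 << n;   /* 2^n memory locations */
    assert(num_pes <= total);            /* at least one location per PE */
    uint64_t limit = total / num_pes;    /* floor(2^n / N) */

    unsigned p = 0;
    while (((uint64_t)1 << (p + 1)) <= limit)
        p++;                             /* grow m = 2^p while it still fits */

    range_layout layout = { p, n - p, (uint64_t)1 << p };
    return layout;
}
```

For 8-bit addresses and 4 PEs this gives p = 6, i.e. 64 locations per range selected by the 2 leftmost address bits, which matches the 2-bit range fields of Table 1.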
Table 1 shows an example of the permission table. Here, 4 roles are assumed, which are represented by a 2 bit role-id. The role-id is used to address a particular table entry. As the required number of roles increases, the number of role-id bits also increases. The multi-range option is present and supports a maximum of 2 memory ranges. The permissions are encoded as follows: read only (10), write only (01), and both read and write allowed (11). So, according to the table, role-id 0 is allowed to access any memory address between hexadecimal 00 and 3F (8 bit memory addresses), and within this range it is only allowed to read/load data. Role-ids 0, 2, and 3 share this same memory address range. An alternate
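Table 2
Main differences between the three proposed mechanisms.

|                               | Central      | Hybrid                              | Local               |
|-------------------------------|--------------|-------------------------------------|---------------------|
| Access control decision taken | In RMAN      | In RMAN or in LMAN                  | In LMAN             |
| Memory going packets travel   | Through RMAN | Through RMAN or through NoC routers | Through NoC routers |
| PPT is present                | No           | Yes                                 | Yes                 |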
memory range representation is possible using start and end memory addresses. The number of bits required (8 + 8 = 16) is higher than the number of bits required (5) in Table 1 for representing a memory range. In spite of the higher number of bits, the whole memory address representation gives the flexibility to use any contiguous memory range. It does not have the constraint of Table 1 that all addresses in a range share the same few leftmost bits. The choice has to be made considering the memory range allocation requirement and the size of the table. In this paper, the permission table representation of Table 1 is considered.

4.5. Walk-through example

The main differences between the central, hybrid, and local access control mechanisms are given in Table 2, which summarizes the details given earlier. Next we provide a walk-through example using Fig. 4. Let us first consider the central case and assume that a packet generated by the PE connected to router03 wants to read a memory address located in memory3. The LMAN in router03 detects the memory going packet in the eject port and sends a request to RMAN3. RMAN3 sends a grant signal to router03 according to its arbitration policy. Then the LMAN in router03 sends the packet to RMAN3, where it is checked for the required permission according to its role. Let us assume that this packet is legitimate and is passed to router33, which in turn sends the packet to memory3. Memory3 returns the response packet to router33. Router33 then routes the response packet to router03 through router23 and router13 (following the X–Y routing algorithm). For the hybrid case, the same procedure is followed for the first memory going packet. In addition, RMAN3 sends the permission table entry corresponding to the packet's role to the LMAN in router03. If the next memory going packet from router03 has the same role, it is checked in the LMAN and is not sent to the RMAN. The allowed packet is routed through the NoC routers to reach router33 and then memory3.
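The check performed on each memory going packet can be sketched in software as follows. This is only an illustrative model, not the authors' BSV implementation: the struct layout, the function names, and the `LMAN_ASK_RMAN` outcome are assumptions, while the table contents and the permission encoding (10 read only, 01 write only, 11 both) are taken from Table 1 with 8-bit addresses whose 2 leftmost bits select the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Permission encoding of Table 1: 10 = read only, 01 = write only, 11 = both. */
enum { PERM_WRITE = 0x1, PERM_READ = 0x2 };

/* One permission table entry; field names are hypothetical. */
typedef struct {
    bool    multi_range;  /* true if the optional 2nd range is valid      */
    uint8_t range1;       /* 2 leftmost address bits of the 1st range     */
    uint8_t range2;       /* 2 leftmost address bits of the 2nd range     */
    uint8_t perm;         /* PERM_READ, PERM_WRITE, or both               */
} perm_entry;

/* Table 1, indexed by the 2-bit role-id ("xx" fields are stored as 0). */
static const perm_entry permission_table[4] = {
    { false, 0x0, 0x0, PERM_READ },
    { false, 0x2, 0x0, PERM_WRITE },
    { false, 0x0, 0x0, PERM_READ | PERM_WRITE },
    { true,  0x1, 0x0, PERM_READ },
};

/* Compare an 8-bit address and its operation against one table entry. */
bool access_allowed(const perm_entry *e, uint8_t addr, bool is_write)
{
    assert(e != NULL);
    uint8_t prefix = (uint8_t)(addr >> 6);  /* keep the 2 leftmost bits */
    bool in_range = prefix == e->range1 ||
                    (e->multi_range && prefix == e->range2);
    uint8_t needed = is_write ? PERM_WRITE : PERM_READ;
    return in_range && (e->perm & needed) != 0;
}

/* Partial permission table (PPT): a single cached entry inside the LMAN. */
typedef struct {
    bool       valid;
    uint8_t    role;
    perm_entry entry;
} ppt;

typedef enum { LMAN_FORWARD, LMAN_BLOCK, LMAN_ASK_RMAN } lman_action;

/* Decide locally when the cached role matches, otherwise defer to the RMAN. */
lman_action lman_check(const ppt *t, uint8_t role, uint8_t addr, bool is_write)
{
    assert(t != NULL);
    if (!t->valid || t->role != role)
        return LMAN_ASK_RMAN;               /* empty PPT or role mismatch */
    return access_allowed(&t->entry, addr, is_write) ? LMAN_FORWARD
                                                     : LMAN_BLOCK;
}

/* Model of the PPT update that the RMAN sends back for a given role. */
void ppt_update(ppt *t, uint8_t role)
{
    assert(t != NULL && role < 4);
    t->valid = true;
    t->role  = role;
    t->entry = permission_table[role];
}
```

In this model, `LMAN_ASK_RMAN` stands for "send the packet to the RMAN" in the hybrid case and "send only the role to the RMAN" in the local case; in both cases a later packet with the same role is resolved locally once `ppt_update` has run.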
In the local case for the same example, the LMAN in router03 never sends packets to RMAN3. It sends the packet's role to RMAN3 and receives the corresponding permission table entry. From the next packet onwards, this update is not required unless the packet generated by the connected PE has a different role. In the local case, the allowed packet travels only through the NoC routers to reach the memory.

5. Experimental results of central, hybrid, and local mechanisms

We use the 4 × 4 mesh topology NoC architecture of Fig. 4 for our experiments with the three access control mechanisms. Bluespec System Verilog (BSV) [23] is used
for implementation purpose. The 32 bit packet used in our experiments has a separate role-id field apart from the opcode, memory address, source and destination router address, and payload fields. The role-id field width depends on the number of roles permitted in the system. For N roles, $\log_2 N$ bits are required in the role-id field to indicate every role separately. It is to be noted that the proposed mechanisms are not dependent on the number of roles permitted in the MP-SoC. For simulation, routers with 1 clock cycle latency following the X–Y routing algorithm are used. This means that after entering a router, a packet can exit from an output port in the next clock cycle if there is no contention at the output port.

5.1. Memory access latency results

We use artificial traffic generators (designed using BSV) to generate traffic. The generators are modified to generate only memory going packets. The first 1000 clock cycles are considered as the warm-up period, clock cycles 1000 to 0.1 million as the measurement period, and clock cycles 0.1 million to 1 million as the drainage period. The average memory access latency variations for the different MP-SoC simulation cases are shown in Fig. 9. The memory access latency denotes the time (number of clock cycles) required by a packet to reach memory after being injected by a PE. Average memory access latency is calculated by averaging all the latencies at a particular injection rate for a particular simulation case. The packet travels through the RMAN or through NoC routers depending on the simulation case. The simulations are done to show the overhead of the access control mechanisms on the memory accesses compared to a normal NoC. Both the best case and worst case scenarios of the hybrid and local cases are considered to show the latency variations. The best case scenario happens when none of the PEs/applications changes its role for the entire duration of the simulation.
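As a rough illustration of how the average memory access latency could be computed from simulation logs, the sketch below averages per-packet latencies over the measurement period. The record layout and function name are hypothetical, and the rule that a packet is counted when it is injected inside the measurement period is an assumption; the paper does not spell out this detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WARMUP_END   1000u    /* end of warm-up period (clock cycles)        */
#define MEASURE_END  100000u  /* end of measurement period (0.1 million)     */

/* Hypothetical log record for one memory going packet. */
typedef struct {
    uint32_t inject_cycle;  /* cycle at which the PE injected the packet */
    uint32_t arrive_cycle;  /* cycle at which the packet reached memory  */
} packet_record;

/* Average latency (in cycles) of packets injected in the measurement period.
 * Assumption: packets injected during warm-up or drainage are ignored. */
double average_latency(const packet_record *recs, size_t count)
{
    assert(recs != NULL || count == 0);
    uint64_t sum = 0;
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        if (recs[i].inject_cycle < WARMUP_END ||
            recs[i].inject_cycle >= MEASURE_END)
            continue;
        sum += recs[i].arrive_cycle - recs[i].inject_cycle;
        used++;
    }
    return used ? (double)sum / (double)used : 0.0;
}
```

One such average would be taken per injection rate and per simulation case to obtain a curve like those in Fig. 9.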
Worst case scenario happens when all the PEs/applications change their respective roles for every memory destined packets. This means that every LMAN PPT has to be updated for every memory destined packets. It is evident from Fig. 9 that average memory access latencies for central, best case hybrid, and best case local cases are similar to the normal NoC case for all injection rates. In fact, the central case is better than the normal NoC case up to 50% injection rate. The average memory access latencies for worst case hybrid, and worst case local access control mechanism are around 20% higher than the normal NoC case. The overhead is acceptable if we consider the increased security of the MP-SoC. So, if we consider the fact that the worst case condition is rare in normal operation, the proposed mechanisms ensure increased security without incurring memory access latency overhead. The
Fig. 9. Average memory access latency variations with respect to injection rates for different MP-SoC simulation cases.

Table 3
Area of different modules for different access control mechanism cases. All the values are in µm².

|                         | Central | Hybrid | Local                |
|-------------------------|---------|--------|----------------------|
| RMAN                    | 386     | 430    | 347                  |
| PE connected router     | 5725    | 5918   | 5809                 |
| Memory connected router | 5408    | 6181   | 5447 (normal router) |
main factor of the proposed mechanisms' memory access latency is the arbitration granting delay. This delay is the time duration for which a PE has to wait to get a grant from the RMAN. A matrix arbiter is used in the RMAN to optimize this latency component. Other reasons for the comparable latency are the multi-hop delay and the congestion in NoC routers, which are not present when packets flow through the RMAN.

5.2. Area results

Synopsys Design Compiler is used for synthesis with 65 nm Faraday technology libraries. The area results of different modules for different access control mechanisms are given in Table 3. The memory connected router of the local access control mechanism is a normal router, as indicated in the table, i.e. without any changes introduced because of access control. This normal router's area serves as the baseline for the router area increases of the different access control mechanisms. The memory connected router area for the hybrid case is 13.5% higher than the normal router. This is because of the extra port of the router to connect the RMAN. For the central case, the memory connected router area is 0.7% lower than the normal router. The reason is that there is no arbiter for the output port connected to the memory module because only the RMAN can access the memory. The PE connected router area for the central case is 5.1% higher than the normal router because of the inclusion of the LMAN module. A PE connected router in both the local and the hybrid cases has a PPT in the LMAN. That is why
such router areas for the local and the hybrid cases are 6.6% and 8.6% higher, respectively, than the normal router. The PPT in the LMAN holds only one permission table entry from the RMAN permission table. The RMAN area mainly depends on the size of the permission table. Currently, the permission table holds entries for 4 roles. The permission table area will increase with an increase in the number of roles. The advantage of using RMANs is not the small area that results from the small number of roles but the total area saving, because only one copy of the permission table is required in the MP-SoC (for 1 memory module).

5.3. Power results

Synopsys PrimeTime is used to get the total power values (i.e. the sum of static power and dynamic power) using 65 nm Faraday technology libraries. The power results of different modules for different access control mechanisms are given in Table 4. The memory connected router for the hybrid case has an extra port and receives memory destined packets from all ports. That is why the power consumption of this router is higher than in the central and the local cases. The memory connected router for the central case only receives packets from the RMAN and consumes less power than in the local case. The PE connected router for the central case does not contain a PPT in the LMAN and consumes less power than in the hybrid and the local cases. Power dissipations in the PE connected router for the hybrid and the local cases are similar because of similar LMAN modules. The RMAN for the central case sends all the memory destined packets in the MP-SoC to the
Table 4
Total power of different modules for different access control mechanism cases.

|                                 | Central | Hybrid | Local                  |
|---------------------------------|---------|--------|------------------------|
| RMAN (in µW)                    | 80.23   | 67.38  | 57.86                  |
| Memory connected router (in mW) | 0.1123  | 0.4289 | 0.2764 (normal router) |
| PE connected router (in mW)     | 0.3697  | 0.5725 | 0.5378                 |
memory connected router and consumes higher power than in the hybrid and the local cases. The RMAN for the hybrid case sends only a few memory destined packets to the memory connected router and consumes less power than in the central case but more power than in the local case.

5.4. Memory availability considerations

In this subsection we discuss the availability of memory in different scenarios, assuming that unauthorized access is no longer a problem. If a source (application) running in an MP-SoC tries to access memory, it can always access it if that is the only source in the MP-SoC. One application may run various parts in different PEs. So from the application's point of view, availability is 100%, but from an individual PE's point of view availability may not be 100%. Here, 100% availability means that the application or PE can access the memory all the time. In case of multiple memory access attempts from multiple PEs, a single PE may not be able to access the memory all the time because of the other PEs' attempts. Even if other PEs are not trying to access the memory, their normal packets still interfere with the memory going packets in the network. We assume that different applications run on different PEs and that one application does not have different parts running in different PEs. We represent a source by $X_i$ with $x_i$ as its injection rate, where $i \in \{1, 2, \ldots, N\}$. N is the total number of sources in the MP-SoC trying to access the memory. Even if $x_i$ is not 100%, availability can be 100% if that source is the only source running in the MP-SoC. We propose a metric to measure availability in percentage as given below:

$$\text{availability} = \frac{R_i}{G_i} \times 100\%. \qquad (1)$$
Here $G_i$ is the number of memory going packets generated by $X_i$ and $R_i$ is the number of packets received by memory from $X_i$. If M memory modules are present, then

$$\text{availability} = \frac{R_{1i} + R_{2i} + \cdots + R_{Mi}}{G_i} \times 100\%. \qquad (2)$$
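The availability metric of Eqs. (1) and (2) can be evaluated from packet counters as in the following sketch. The function name and the convention of returning 0 when a source generated no packets are assumptions made for illustration; with a single memory module the sum reduces to Eq. (1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Availability (in percent) of memory for one source X_i.
 * received[k] is the number of packets from X_i that reached memory module k
 * (R_{ki} in Eq. (2)); generated is G_i.
 * Assumption: a source that generated no packets reports 0%. */
double availability_percent(const uint64_t *received, size_t num_memories,
                            uint64_t generated)
{
    assert(received != NULL || num_memories == 0);
    if (generated == 0)
        return 0.0;
    uint64_t total = 0;
    for (size_t k = 0; k < num_memories; k++)
        total += received[k];            /* R_1i + R_2i + ... + R_Mi */
    return 100.0 * (double)total / (double)generated;
}
```

For example, a source that generated 1000 memory going packets of which 750 reached memory has an availability of 75%, whether those packets went to one module or were spread over several.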
Let us assume that A and B are two resources in the MP-SoC and $X_i$ is trying to access B. But $X_i$ can access B only by accessing A. So the availability of B depends on the availability of A for $X_i$. If $X_j$ is also accessing A but not B, then the availability of B to $X_i$ reduces. Here $j \ne i$ and $j \in \{1, 2, \ldots, N\}$. In our case, A is the NoC and B is the memory. Network congestion (lower availability of the NoC) causes lower memory availability. Better memory availability can be achieved in the central case and in some portions of the hybrid case, where the RMAN is used to access memory. Though the RMAN can be considered a resource itself, it is dedicated to memory only, unlike the NoC. So memory availability will not be reduced by a source that is not trying to access memory. This gives an advantage compared to the normal
NoC, local, and some portions of hybrid cases, where the NoC is used to access memory. If a malicious source launches a denial-of-service (DoS) attack by flooding the NoC without any attempt to access memory, or by sending legitimate packets to memory in large numbers, the packets cannot be stopped. But access through the RMAN (central and some portions of the hybrid case) can provide protection from a DoS attack, because in those cases memory availability does not change even if malicious sources try to generate more memory requests. The matrix arbiter in the RMAN ensures that if a source receives a grant signal, it will not receive it again unless all other requesting sources get grants. In case of memory access through the NoC, legitimate packets find it difficult to reach memory if malicious packets flood the NoC, resulting in a reduction of memory availability. The memory availability simulations are done using the NoC of Fig. 4 designed using BSV. The first 1000 clock cycles are considered as the warm-up period, clock cycles 1000 to 0.1 million as the measurement period, and clock cycles 0.1 million to 1 million as the drainage period. We simulate and calculate availabilities for different memory access control scenarios using three packet generators as PE0, PE1, and PE2. The three PEs are connected to router00, router01, and router10 respectively in Fig. 4. We simulate using three PEs assuming that only three applications are running in the MP-SoC simultaneously, which is a reasonable scenario. We also assume that all three applications are generating only memory going packets trying to access the same memory bank (memory1). The memory availabilities are calculated using Eq. (1) for all scenarios. Figs. 10–12 show the memory availability variations of PE0, PE1, and PE2 respectively with respect to injection rates for different MP-SoC simulation cases. As expected, the central and worst case hybrid scenarios have maximum availabilities for PE0 and PE2, even better than the normal NoC.
This is because of the use of RMAN to access memory instead of normal NoC routers. Worst case local scenario is the worst case among all scenarios because of its waiting time to get partial permission table update for every memory access attempt and the congestion in NoC routers. In case of PE1, best case hybrid and best case local have maximum availabilities and comparable to normal NoC. This is because of PE1’s location and use of X–Y routing algorithm in the NoC. PE1 can access memory without any contention from PE0 or PE2. But for all PEs availabilities decrease with continuous increase in injection rate. It is clear from the simulation results that the use of RMAN can provide higher memory availabilities in most of the cases compared to normal NoC. In other cases (PE1), proposed local and hybrid scenarios provide availabilities comparable to normal NoC. The simulation results clearly show the effects of proposed mechanisms on memory availability and help to compare
Fig. 10. Memory availability variations of PE0 with respect to injection rates for different MP-SoC simulation cases.
Fig. 11. Memory availability variations of PE1 with respect to injection rates for different MP-SoC simulation cases.
between different cases including normal NoC. They also show the effectiveness of our proposed metric of (memory) availability.
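The fairness property of the matrix arbiter used in the RMAN (a source that has just been granted is not granted again until every other requesting source has been served) can be illustrated with a small software model. The paper does not give the arbiter's internals; the sketch below assumes the common textbook formulation in which a priority matrix is updated so that the winner becomes the lowest-priority requester. `NUM_REQ` and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_REQ 4  /* hypothetical number of LMANs requesting one RMAN */

/* prio[i][j] == true means requester i currently beats requester j. */
typedef struct {
    bool prio[NUM_REQ][NUM_REQ];
} matrix_arbiter;

/* Initial priority order: 0 > 1 > 2 > 3. */
void arbiter_init(matrix_arbiter *a)
{
    assert(a != NULL);
    for (int i = 0; i < NUM_REQ; i++)
        for (int j = 0; j < NUM_REQ; j++)
            a->prio[i][j] = (i < j);
}

/* Grant one requester per call; returns its index or -1 if nobody requests. */
int arbiter_grant(matrix_arbiter *a, const bool req[NUM_REQ])
{
    assert(a != NULL && req != NULL);
    for (int i = 0; i < NUM_REQ; i++) {
        if (!req[i])
            continue;
        bool beaten = false;
        for (int j = 0; j < NUM_REQ; j++) {
            if (j != i && req[j] && a->prio[j][i]) {
                beaten = true;           /* another requester outranks i */
                break;
            }
        }
        if (beaten)
            continue;
        /* The winner drops to lowest priority: everyone else now beats it. */
        for (int j = 0; j < NUM_REQ; j++) {
            if (j == i)
                continue;
            a->prio[i][j] = false;
            a->prio[j][i] = true;
        }
        return i;
    }
    return -1;
}
```

With a flooding source 0 and an occasional source 2 both requesting in every round, the grants alternate 0, 2, 0, 2, so the flooding source cannot starve the other one. This is the behaviour the text relies on for DoS protection when memory is accessed through the RMAN.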
6. Modified local access control mechanism

Apart from all the advantages, all three access control mechanisms (central, hybrid, and local) have a drawback: they have a scalability limit. To solve this drawback as well, we design a modified local access control mechanism. The modified local access control mechanism is highly scalable and is implemented in a version of a massively parallel data flow architecture called REDEFINE. Our intention is to show that our proposed mechanism is effective in enhancing the security of REDEFINE, not to propose the REDEFINE architecture itself. Another intention is to show that the proposed mechanism can be easily implemented by minimal modifications to a massively parallel data flow architecture like REDEFINE.

6.1. Modified local access control mechanism design
We observe that high scalability can be ensured if the LMAN uses NoC itself for communication with RMAN. But blocking the PE’s outgoing packet for a long time (till the permission update arrives from RMAN through NoC) is not a good solution in terms of latency. The solution can be useful if the role itself remains same for the whole duration of execution. Here also the partial permission table has to be loaded the first time by requesting the RMAN through NoC. Instead of doing that we send the permission updates to modified LMANs before the start of normal execution and as a result the latency intensive update operation does not impact the normal execution time. We have
Fig. 12. Memory availability variations of PE2 with respect to injection rates for different MP-SoC simulation cases.
implemented the modified local access control mechanism in a version of the reconfigurable data flow architecture called REDEFINE.

6.2. REDEFINE architecture and proposed modifications

REDEFINE is a massively parallel reconfigurable dataflow architecture. A dedicated compiler compiles code written in C to be run on REDEFINE. Currently REDEFINE works as a co-processor and is connected to the host via an AXI bus interconnection. The host is responsible for loading applications and extracting results after execution completes. Since the first publication in [3], REDEFINE has evolved through continuous development [4–6]. We work with a multi-application version of REDEFINE. Throughout this paper, by REDEFINE we mean this particular version of REDEFINE. There are mainly three phases of operation: programming phase, execution phase, and end phase. During the programming phase, applications are loaded by the host. There is a specific input data structure that has to be followed for correct execution, and it is generated by the REDEFINE compiler. After the programming phase, the host requests REDEFINE to execute a particular application and loads its input data. The multi-application option saves the clock cycles required for loading applications during the programming phase. That means that once programmed, REDEFINE can execute any application (among the programmed applications) any number of times, one after another, without having to load it each time from the host. Only the execution phase has to be started, and after the end of an execution (end phase), REDEFINE notifies the host to collect the results. In this section we show that an unauthorized access attack, i.e. a violation of confidentiality, is possible against REDEFINE's multi-application version, and how our proposed modified local access control mechanism can protect against such an attack. Fig. 13(a) shows the modified REDEFINE architecture. It has an Input/Output (I/O) controller that is connected to the host by the AXI bus.
There is a support logic around
NoC fabric whose main task is to help PEs execute applications correctly. A honeycomb topology NoC fabric with torus links is also shown in Fig. 13(b). Eight data memory banks are shown with connections to both the NoC, via the support logic, and the I/O controller. We have modified the I/O controller, support logic, and routers to implement our modified local access control mechanism. Modifications are mainly limited to the inclusion of new modules without any major changes to the existing implementations.

6.2.1. I/O controller modifications

A permission controller is included in the REDEFINE I/O controller. It receives permission updates from the host and updates the permission table. Here we assume that the host has two different modes of operation: administrator and user. The administrator is responsible for loading the permission table and the user uses the REDEFINE coprocessor to execute some application. Loading of the permission table is authenticated using hash based signatures, similar to the central, hybrid, and local cases. This ensures that loading is secure and is done by the administrator only. There can be multiple users of the REDEFINE coprocessor and one such user is the attacker. We also assume that the attacker wants to launch an unauthorized access attack against REDEFINE by loading a malicious application and executing it on REDEFINE. When an execution request arrives from the host, the I/O controller state is changed. The permission controller receives the I/O controller state, which indicates the current state and the current application id. In the current version of the multi-application scenario, an application is associated with one role only. The permission controller uses the associated role id to access the permission table and sends the permission update to the modified router configuration module in the support logic. The different connections of the permission controller are shown in Fig. 14 and the different working steps are shown in Fig. 15.
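The permission controller's reaction to an execution request can be summarized by the following sketch. It is a behavioural model under stated assumptions, not the hardware design of Fig. 15: the state names, table sizes, the start/end address format of a permission update, and the function name are all hypothetical. Only the sequence (observe I/O controller state and application id, map the application to its single role, read the permission table, emit the update) follows the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_APPS  4  /* hypothetical number of programmed applications */
#define NUM_ROLES 4  /* hypothetical number of roles                   */

/* Hypothetical I/O controller states visible to the permission controller. */
typedef enum { IO_IDLE, IO_PROGRAMMING, IO_EXECUTION_REQUESTED } io_state;

/* Hypothetical permission update sent towards the router configuration
 * module: one permitted address range plus the permitted operations. */
typedef struct {
    uint32_t first_addr;
    uint32_t last_addr;
    uint8_t  perm;
} perm_update;

typedef struct {
    uint8_t     role_of_app[NUM_APPS];  /* one role per application        */
    perm_update table[NUM_ROLES];       /* permission table loaded by host */
} permission_controller;

/* Returns true and fills *out when an update must be sent to support logic. */
bool permission_controller_on_state(const permission_controller *pc,
                                    io_state state, unsigned app_id,
                                    perm_update *out)
{
    assert(pc != NULL && out != NULL);
    if (state != IO_EXECUTION_REQUESTED || app_id >= NUM_APPS)
        return false;                   /* nothing to do in other states */
    uint8_t role = pc->role_of_app[app_id];
    if (role >= NUM_ROLES)
        return false;                   /* no valid role for this application */
    *out = pc->table[role];             /* entry for the application's role */
    return true;
}
```

Because the update is produced when the execution request arrives, it can reach the routers before normal execution starts, which is the point of the modified local mechanism.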
If a malicious access attempt is detected by local access manager (LMAN) in router, a malicious reporting packet is generated by LMAN towards support
Fig. 13. (a) Block diagram of REDEFINE architecture and (b) REDEFINE NoC with our proposed modified local access control mechanism. Only the permission table is visible at this level among our proposed modules.
Fig. 14. Different connections of permission controller included in the I/O controller.
logic. The support logic then transfers this malicious report to the permission controller in the I/O controller for future audit by the host (administrator).

6.2.2. Support logic modifications

Fig. 16(a) shows the router configuration module already present in the support logic of REDEFINE. The memory offset of a particular application arrives from the I/O controller to be sent to the NoC routers. After receiving the memory offset, the control module changes state from normal to the memory offset update state. Next it directs the packet composer to form configuration packets with a special payload taken from the memory offset storage. We present a modified router configuration module as shown in Fig. 16(b). A permission update storage is included in addition to the existing memory offset storage. Apart from the memory offset update state, a permission update state is also included in the control module. The control
module controls the flow of data from the permission update storage and the memory offset storage using the MUX. It also directs the packet composer accordingly to form and send configuration packets to the NoC routers. It is to be noted that the configuration packet formation is secure from the attacker, because the modified router configuration module receives signals only from the I/O controller and cannot be influenced by an attacker from outside or through a malicious application. The working steps of the modified router configuration module are shown in Fig. 17. The modified portions of the I/O controller and the support logic together implement the RMAN module functionality of the modified local access control mechanism.

6.2.3. NoC router modifications

Fig. 18(a) shows the memory address updating module in the REDEFINE router. Here a packet payload checker checks the input packets to identify a memory offset carrying configuration packet. After detection of a configuration packet, the memory offset is extracted from the payload and is stored in the memory offset storage. The packet payload checker modules operate in parallel with the normal packet processing sections like the routing logic. This ensures that the normal packet flow latency is not increased and performance is not hampered. The routers are connected in a honeycomb topology with four ports including the eject port. Eject port input packet payloads are checked for memory load or store type and, if found, the memory address of the packet is modified by adding the memory offset in the
Fig. 15. Working steps of permission controller.
Fig. 16. (a) Router configuration module, and (b) modified router configuration module.
Fig. 17. Working steps of modified router configuration module.
Fig. 18. (a) Memory address updating module, and (b) Local access manager (LMAN) by modifying the memory address updating module in REDEFINE router.
packet payload checker and modifier module. The eject port input packet is then transferred towards the routing logic. We modify the memory address updating module to form the local access manager (LMAN) as shown in
Fig. 18(b). Here the partial permission table is updated if a permission update configuration packet is detected by any packet payload checker. The steps involved in updating both memory offset storage and partial permission table is shown in Fig. 19(a). The packet payload checker and
Fig. 19. (a) Working steps to update both memory offset storage and partial permission table, and (b) Working steps to update every memory going packet address and to check its permission at the eject port.
modifier first sends the modified load/store packet to control module instead of sending it to routing logic. The control module compares the packet’s new memory address and opcode (load or store) against the permitted memory range and permitted operation present in the partial permission table. The partial permission table always contains the permission update at the time of eject port input packet checking. This is because the configuration packets are sent and permission updates are stored before the start of normal application execution. If the packet is not malicious i.e. not trying to do any unauthorized access, the control module enables the packet payload checker and modifier to send the packet towards the routing logic. Otherwise the control module directs the malicious reporting packet composer to compose a packet to be sent to modified I/O controller via support logic for future auditing purpose. The packet payload checker and modifier cannot send any eject port input packet to routing logic unless enabled by the control module. Currently the malicious reporting packet consists of the current router’s address with a special payload. The steps involved in updating memory going packet address and checking its permission at the eject port is shown in Fig. 19(b). Our proposed modifications do not cause any latency increment of normal packet flow similar to memory address updating module in Fig. 18(a).
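The eject port processing described above (add the memory offset, then compare the new address and the opcode against the partial permission table) can be modelled as follows. This is a sketch under assumptions: the permitted range is represented by hypothetical inclusive start and end addresses, the permission bits reuse the read/write encoding of Table 1, and all names are illustrative rather than taken from the REDEFINE implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_LOAD, OP_STORE } mem_op;
enum { PERM_STORE = 0x1, PERM_LOAD = 0x2 };  /* same bit encoding as Table 1 */

/* LMAN state written by configuration packets before execution starts. */
typedef struct {
    uint32_t mem_offset;  /* memory offset storage                          */
    uint32_t first_addr;  /* partial permission table: permitted range,     */
    uint32_t last_addr;   /* inclusive (hypothetical representation)        */
    uint8_t  perm;        /* permitted operations                           */
} lman_state;

typedef struct {
    mem_op   op;
    uint32_t addr;        /* address as generated by the PE, before offset */
} mem_packet;

typedef enum { TO_ROUTING_LOGIC, TO_MALICIOUS_REPORT } lman_verdict;

/* Packet payload checker and modifier followed by the control module check. */
lman_verdict lman_eject_check(const lman_state *s, mem_packet *pkt)
{
    assert(s != NULL && pkt != NULL);
    pkt->addr += s->mem_offset;                       /* address update */
    uint8_t needed = (pkt->op == OP_STORE) ? PERM_STORE : PERM_LOAD;
    bool in_range = pkt->addr >= s->first_addr && pkt->addr <= s->last_addr;
    if (in_range && (s->perm & needed) != 0)
        return TO_ROUTING_LOGIC;                      /* packet is allowed */
    return TO_MALICIOUS_REPORT;                       /* block and report  */
}
```

As a usage example with assumed values, an application with offset 0 and a permitted range of 0 to 127 may load from address 8, while a load from address 136 falls outside its range and is reported instead of being forwarded to the routing logic.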
6.2.4. Memory operation modifications in multi-application REDEFINE architecture

Each application is associated with a specific memory offset to enable the execution of multiple applications one after another. Instruction and configuration memories also require offsets for each application. These offsets are loaded into the I/O controller at the time of programming. The I/O controller sends the appropriate offsets to the instruction and configuration memories in the support logic after receiving an execution request from the host. The data memory offsets are sent to the LMANs in the routers using configuration packets. After the offsets are loaded, normal execution is started in the PEs. For data memory operations, offsets are added to the data memory addresses in the packet payload checker and modifier module of the LMAN inside the router. This ensures that each application has its own memory range in each data memory bank. It is to be noted that our access control mechanism can handle multiple address ranges; the permission table example of Table 1 shows 2 address ranges (there can be more than 2). Similarly, the modified local access control mechanism in REDEFINE can also handle multiple ranges, but the multi-application version of REDEFINE does not support multiple ranges. Our intention in this section is to show that our proposed access control mechanism can be easily implemented in a massively parallel architecture and also to show its
effectiveness. Currently work is going on to improve the multi-application version of REDEFINE. Unauthorized access attack is possible in the multiapplication REDEFINE without our proposed modified local access control mechanism. For example, if legitimate application Y (with memory address range from A1 to A2 ) is executed first and a malicious application X (with memory address range from 0 to A1 ) is executed later, X can access data contents of application Y from the data memory. Application X can use large memory addresses that causes overflow of its memory range. It can simply load data from the range A1 to A2 and store them back in its own range between 0 and A1 . I/O controller can transfer the results to host from the appropriate data memory ranges after completion of execution. Our proposed modified local access control mechanism can protect from the unauthorized access attack. The LMAN inside routers ensure that an application is accessing from its own memory range and the operation is also as per the allowed operation at that memory address. The presence of LMAN inside routers prevents malicious packets (unauthorized access packets) to flood the NoC and protects from denial-of-service. The flooding of malicious packets cannot be stopped if the access control mechanism is present only at the destination (memory) instead of source (eject port of router). 6.3. Simulation and synthesis results and discussion We have simulated two applications on REDEFINE one after another with the first one legitimate and the second one malicious as mentioned in 6.2.4. Both the C source codes and their explanations are given next. Listing 1: Legitimate application source code 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    unsigned int in[8], out[8];
    int main()
    {
        REDEFINE_input(&in);
        out[0] = in[0];
        out[1] = in[1];
        out[2] = in[2];
        out[3] = in[3];
        out[4] = in[4];
        out[5] = in[5];
        out[6] = in[6];
        out[7] = in[7];
        REDEFINE_output(&out);
        return 0;
    }
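The offset addition and range check that Section 6.2.4 attributes to the packet payload checker and modifier module of the LMAN can be sketched in software as follows. This is only an illustrative model, not the hardware implementation: the type and function names (`mem_op_t`, `app_range_t`, `lman_check_and_translate`), the range size, and the bitmask encoding of permissions are all assumptions introduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation codes, usable as a bitmask. */
typedef enum { OP_LOAD = 1, OP_STORE = 2 } mem_op_t;

/* Hypothetical per-application entry held by an LMAN. */
typedef struct {
    uint32_t offset;      /* start of the application's range in the bank */
    uint32_t size;        /* length of the range in bytes (assumed value) */
    uint32_t allowed_ops; /* bitwise OR of permitted mem_op_t values      */
} app_range_t;

/*
 * Checks one word (4-byte) access. On success, writes offset + logical
 * address to *phys_addr and returns true. Returns false when the access
 * leaves the application's range or the operation is not permitted,
 * which models the packet being blocked.
 */
bool lman_check_and_translate(const app_range_t *app, uint32_t logical_addr,
                              mem_op_t op, uint32_t *phys_addr)
{
    if (app->size < 4u || logical_addr > app->size - 4u) {
        return false; /* access falls outside the application's own range */
    }
    if ((app->allowed_ops & (uint32_t)op) == 0u) {
        return false; /* operation not allowed for this range */
    }
    *phys_addr = app->offset + logical_addr;
    return true;
}
```

Under these assumptions, an application with offset 128 and an assumed range size of 128 bytes has its logical address 8 translated to physical address 136, while an application with offset 0 and the same range size is refused when it presents address 136.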
Listing 1 shows a legitimate C source code that can be compiled by the REDEFINE compiler. REDEFINE_input and REDEFINE_output are REDEFINE compiler specific functions that indicate the input and output arrays. This application is executed first, with a memory offset value of 128. The array in contains the input data and the array out contains the output data after successful execution. The input data is loaded starting at physical memory address 136 and the output result is stored starting at physical memory address 168. Both the addresses are given after addition of the memory
offset. The application contains eight memory loads from the in array and eight memory stores to the out array. These 8 load instructions and 8 store instructions are executed in REDEFINE PEs. After each execution, a memory-bound packet is generated that traverses the NoC. Our proposed modified local access control mechanism allows these packets after checking them at the eject port of the routers. Each array element is one word (4 bytes) long and the data memory is byte addressable. So if in[0] accesses memory address 136, in[1] accesses memory address 140 (i.e. the address one word later).

Listing 2: Malicious application source code
    unsigned int in[9], out[9];
    int main()
    {
        REDEFINE_input(&in);
        out[0] = in[0];
        out[1] = in[32];
        out[2] = in[33];
        out[3] = in[34];
        out[4] = in[35];
        out[5] = in[36];
        out[6] = in[37];
        out[7] = in[38];
        out[8] = in[39];
        REDEFINE_output(&out);
        return 0;
    }
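The address arithmetic behind the out-of-range indices in Listing 2 can be checked with a short calculation. The sketch below assumes the values stated in the surrounding text: 4-byte words, a byte-addressable memory, the malicious in array starting at address 8, and the legitimate application's eight input words occupying addresses 136 up to (but not including) 168. The helper names `element_address` and `count_loads_in_range` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_BYTES 4u /* each array element is one 4-byte word */

/* Byte address of element `index` of an array starting at `base`. */
uint32_t element_address(uint32_t base, uint32_t index)
{
    return base + index * WORD_BYTES;
}

/* Counts how many of the indexed loads fall inside [lo, hi). */
size_t count_loads_in_range(uint32_t base, const uint32_t *indices, size_t n,
                            uint32_t lo, uint32_t hi)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t addr = element_address(base, indices[i]);
        if (addr >= lo && addr < hi) {
            ++count;
        }
    }
    return count;
}
```

With base 8 and the indices 0, 32, 33, ..., 39 used in Listing 2, in[32] maps to address 136 and in[39] to address 164, so eight of the nine loads fall inside the legitimate application's input range; only in[0] stays within the malicious application's own range.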
A malicious C source code is shown in Listing 2. This application is executed after the legitimate application, with a memory offset of 0. Execution starts only after the modified router configuration modules inside every router have finished the configuration (i.e. the memory offset and permission updates corresponding to this application) in response to the configuration packets. The input data is loaded after physical memory address 8 and the output result is stored after physical memory address 48. Both the addresses are given after addition of the memory offset. Apart from the first load, all the remaining eight load operations access the legitimate application's input data and store them into the malicious application's output array in its own memory range. This is because in[32] is the 32nd word after in[0] and therefore accesses the memory address (32 × 4) + 8 = 136. Without our proposed modified local access control mechanism, the execution of the second, malicious application is successful. With our proposed modules included, the malicious application cannot execute successfully beyond the first, legitimate load operation (i.e. in[0]). All the remaining 8 load instructions generate packets that try to load from the legitimate application's memory range. These malicious load access packets are blocked by the LMANs in the eject ports of the routers. Reports of the malicious accesses are also received from the routers by the permission controller in the I/O controller.

We synthesize the router and the router configuration module in the support logic before and after our modifications using Synopsys Design Compiler. A 4 × 4 REDEFINE NoC
Table 5
Comparisons with previous works.

| | Our implementation in REDEFINE | [17] | [18] | [19] |
|---|---|---|---|---|
| Granularity of access control | Role based | PE based | Kernel/user per PE | Multiple software stack per PE |
| Router area | 46,477 µm² | Not reported | 126,500 µm² (a) | Not reported |
| Operating frequency | 625 MHz | Not reported | 1 GHz (b) | Not reported |

(a) The number is obtained after taking (4 port router area + minimum NI area). Area is scaled down by 2 from the reported number to scale down from 130 to 65 nm technology node for proper comparison.
(b) Frequency is scaled up by 2 from the reported number to scale down from 130 to 65 nm technology node for proper comparison.
is used for synthesis. We use the 65 nm Faraday technology libraries. The minimum operating clock period is 1.6 ns, i.e. the maximum operating frequency is 625 MHz, both before and after our modifications. The inclusion of our proposed modules therefore does not affect the REDEFINE operating frequency. The router area increases from 45,957 to 46,477 µm², i.e. an increase of 1.13%. The router's power consumption increases from 5.4 to 5.47 mW, i.e. an increase of 1.3%. The router configuration module area is 15,163 µm² and the modified router configuration module area is 15,368 µm², i.e. an increase of 1.4%. The router configuration module's power consumption is 3.27 mW and the modified router configuration module's power consumption is 3.33 mW, i.e. an increase of 1.83%. Note that the modified local access control mechanism adds a constant area overhead to the REDEFINE router area and does not degrade the scalability of the REDEFINE architecture.

Comparisons with previous works are given in Table 5. The main point to note is that our modified local access control mechanism employs the RBAC model (i.e. role based access). Role based access control gives a finer granularity of access control than the previous works in [17–19]. In [18], at most 10 PEs/memories are used to show results, so we use the router area corresponding to a 4 × 4 NoC for a proper comparison. The purpose of the area and frequency numbers is to show that our access control mechanism implementation does not degrade the REDEFINE router compared to other work. In fact, the REDEFINE operating frequency remains the same after our modifications.

7. Conclusion

In this paper, we have proposed four memory access control mechanisms (central, hybrid, local, and modified local) using RMAN and LMAN for NoC based MP-SoCs. All four access control mechanisms are hardware implementations of the RBAC model in MP-SoCs.
The mechanisms can easily be retrofitted onto an existing NoC architecture. The proposed mechanisms can not only detect and prevent unauthorized access attacks but can also prevent denial-of-service attacks. That means the proposed solutions provide confidentiality and availability together. The simulation results have shown that the average memory access latencies for the worst case hybrid and the worst case local access control mechanisms are around 20% higher than in the normal NoC case. All other access control mechanisms have average memory access latencies similar to the normal NoC case for all injection rates. The experimental results have shown that the largest area increase of
a PE connected router (for the hybrid case) is only 8.6% compared to a normal router area. The scalability limit of the central, hybrid, and local mechanisms is resolved by the modified local access control mechanism. We have implemented this mechanism in a massively parallel data flow architecture called REDEFINE, which is capable of executing multiple applications one after another. Apart from the simulation results, comparisons with previous works are given to show the effectiveness of the proposed mechanism. The REDEFINE router area and power increase by 1.13% and 1.3% respectively, without degrading the maximum frequency of operation. This shows that all four proposed access control mechanisms can provide security based on the RBAC model without degrading the target NoC.

Acknowledgments

The authors would like to thank MHRD, Govt. of India for the first author's Ph.D. scholarship and Dr. Ranjani Narayan of Morphing Machines Pvt. Ltd. for invaluable discussions on the REDEFINE architecture.

References

[1] S. Ravi, A. Raghunathan, P. Kocher, S. Hattangady, Security in embedded systems: Design challenges, ACM Trans. Embed. Comput. Syst. 3 (2004) 461–491.
[2] E. Chien, P. Ször, Blended attacks exploits, vulnerabilities and buffer-overflow techniques in computer viruses, White Paper, 2002.
[3] M. Alle, K. Varadarajan, A. Fell, R.R. C., N. Joseph, S. Das, P. Biswas, J. Chetia, A. Rao, S.K. Nandy, R. Narayan, Redefine: Runtime reconfigurable polymorphic asic, ACM Trans. Embed. Comput. Syst. 9 (2009) 11:1–11:48.
[4] K. Varadarajan, A coarse grained reconfigurable architecture framework supporting macro-dataflow execution (Ph.D. thesis), Indian Institute of Science, Bangalore, 2012. URL: http://cadl.iisc.ernet.in/cadlab/resources/phd-thesis/.
[5] M. Alle, Compiling for coarse-grained reconfigurable architectures based on dataflow execution paradigm (Ph.D. thesis), Indian Institute of Science, Bangalore, 2012. URL: http://cadl.iisc.ernet.in/cadlab/resources/phd-thesis/.
[6] A. Fell, RECONNECT: A flexible router architecture for network-on-chips (Ph.D. thesis), Indian Institute of Science, Bangalore, 2012. URL: http://cadl.iisc.ernet.in/cadlab/resources/phd-thesis/.
[7] F. Moraes, N. Calazans, A. Mello, L. Möller, L. Ost, Hermes: an infrastructure for low area overhead packet-switching networks on chip, Integr. VLSI J. 38 (2004) 69–93.
[8] TPM, TPM Main Specification, Online, 2013. URL: http://www.trustedcomputinggroup.org/resources/tpm_main_specification.
[9] J.A. Halderman, S.D. Schoen, N. Heninger, W. Clarkson, W. Paul, J.A. Calandrino, A.J. Feldman, E.W. Felten, Lest we remember: Cold boot attacks on encryption keys, in: USENIX Security Symposium, 2008. URL: https://citp.princeton.edu/research/memory/.
[10] A. Kumar, Discovering passwords in the memory, White Paper, 2003.
[11] PKILab, TPM Reset Attack, Online, 2007. URL: http://www.cs.dartmouth.edu/~pkilab/sparks/.
[12] H.T. Wolde, German agency warns windows 8 pcs vulnerable to cyber threats, Online (Reuters), 2013. URL: http://in.reuters.com/article/2013/08/23/microsoft-germanyidINDEE97M00E20130823.
[13] J. Coburn, S. Ravi, A. Raghunathan, S. Chakradhar, Seca: security-enhanced communication architecture, in: Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES'05, ACM, New York, NY, USA, 2005, pp. 78–89. http://dx.doi.org/10.1145/1086297.1086308. URL: http://doi.acm.org/10.1145/1086297.1086308.
[14] ARM, ARM Security Technology Building a Secure System using TrustZone Technology (white paper), ARM Limited, 2009.
[15] Y.-Y. Chen, R. Lee, Hardware-assisted application-level access control, in: P. Samarati, M. Yung, F. Martinelli, C. Ardagna (Eds.), Information Security, in: Lecture Notes in Computer Science, vol. 5735, Springer, Berlin Heidelberg, 2009, pp. 363–378. http://dx.doi.org/10.1007/978-3-642-04474-8_29.
[16] Y.-Y. Chen, P.A. Jamkhedkar, R.B. Lee, A software-hardware architecture for self-protecting data, in: Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS'12, ACM, New York, NY, USA, 2012, pp. 14–27. http://dx.doi.org/10.1145/2382196.2382201.
[17] J.-P. Diguet, S. Evain, R. Vaslin, G. Gogniat, E. Juin, Noc-centric security of reconfigurable soc, in: Networks-on-Chip, 2007, NOCS 2007, First International Symposium on, 2007, pp. 223–232. http://dx.doi.org/10.1109/NOCS.2007.32.
[18] L. Fiorin, G. Palermo, S. Lukovic, V. Catalano, C. Silvano, Secure memory accesses on networks-on-chip, IEEE Trans. Comput. 57 (2008) 1216–1229.
[19] J. Porquet, A. Greiner, C. Schwarz, Noc-mpu: A secure architecture for flexible co-hosting on shared memory mpsocs, in: Design, Automation Test in Europe Conference Exhibition, DATE, 2011, pp. 1–4. http://dx.doi.org/10.1109/DATE.2011.5763291.
[20] D.F. Ferraiolo, R. Sandhu, S. Gavrila, D.R. Kuhn, R. Chandramouli, Proposed nist standard for role-based access control, ACM Trans. Inf. Syst. Secur. 4 (2001) 224–274.
[21] R. Sandhu, E. Coyne, H. Feinstein, C. Youman, Role-based access control models, Computer 29 (1996) 38–47.
[22] S. Osborn, R. Sandhu, Q. Munawer, Configuring role-based access control to enforce mandatory and discretionary access control policies, ACM Trans. Inf. Syst. Secur. 3 (2000) 85–106.
[23] Bluespec, 2014. URL: http://www.bluespec.com.

Arnab Kumar Biswas received his B.Eng. degree in Electronics and Communication Engineering from Burdwan University in 2008 with the University Gold Medal. Then, he received his M.Tech. degree in Microelectronics and VLSI from the Indian Institute of Technology Roorkee in 2011. His research interests are VLSI circuits and systems, I/O circuits, NoC architectures, and Secure systems. Currently, he is pursuing his Ph.D. in the Department of Electronic Systems Engineering of the Indian Institute of Science.

S.K. Nandy is a Professor at the Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore. His research interests are in the areas of Low Power and High Performance Embedded Systems on a Chip, VLSI architectures for Reconfigurable Systems on Chip, and Architectures and Compiling Techniques for Heterogeneous Many Core Systems. Nandy received the B.Sc. (Hons.) Physics degree from the Indian Institute of Technology, Kharagpur, India, in 1977. He obtained the BE (Hons) degree in Electronics and Communication in 1980, M.Sc. (Engg.) degree in Computer Science and Engineering in 1986 and the Ph.D. degree in Computer Science and Engineering in 1989 from the Indian Institute of Science, Bangalore. He has over 150 publications in International Journals, and Proceedings of International Conferences.