Decentralized control for dynamically reconfigurable FPGA systems

Accepted Manuscript Decentralized control for dynamically reconfigurable FPGA systems Chiraz Trabelsi, Samy Meftali, Jean-Luc Dekeyser PII: DOI: Refer...

Download PDF

779KB Sizes 0 Downloads 212 Views

Report

PDF Reader
Full Text

Accepted Manuscript Decentralized control for dynamically reconfigurable FPGA systems Chiraz Trabelsi, Samy Meftali, Jean-Luc Dekeyser PII: DOI: Reference:

S0141-9331(13)00060-4 http://dx.doi.org/10.1016/j.micpro.2013.04.012 MICPRO 2043

To appear in:

Microprocessors and Microsystems

Received Date: Revised Date: Accepted Date:

27 August 2012 4 March 2013 24 April 2013

Please cite this article as: C. Trabelsi, S. Meftali, J-L. Dekeyser, Decentralized control for dynamically reconfigurable FPGA systems, Microprocessors and Microsystems (2013), doi: http://dx.doi.org/10.1016/j.micpro. 2013.04.012

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Decentralized control for dynamically reconfigurable FPGA systems Chiraz Trabelsi, Samy Meftali and Jean-Luc Dekeyser INRIA Lille Nord Europe - LIFL - Universite Lille1 Lille, FRANCE Email:{Firstname.Lastname}@inria.fr

Abstract—The progress in FPGA technology has enabled FPGA-based reconfigurable systems to target increasingly sophisticated applications, which has led to a high control design complexity, resulting in longer design cycles. In this paper, we propose a control design approach for FPGA-based reconfigurable systems aiming at increasing design productivity. This approach is based on a semi-distributed control model that splits different control concerns (monitoring, decision-making and reconfiguration) between autonomous modular controllers and makes use of formalism-oriented design, to decrease the control design complexity and facilitate design verification, reuse and scalability. This model is composed of distributed controllers handling the self-adaptivity of the system reconfigurable regions and a coordinator to respect the system global constraints. To enhance design productivity, the proposed control model is generated automatically using a high-level modeling approach. This approach is based on MDE (Model-Driven Engineering) and the MARTE (Modeling and Analysis of Real-Time and Embedded Systems) standard, allowing to make low-level technical details transparent to designers and to automate code generation from high-level models. Experiments on the generated control systems showed that the proposed semi-distributed control model is more flexible, reusable and scalable than the centralized one, at the cost of a slight increase in required hardware resources. Index Terms—Partial dynamic reconfiguration, FPGA, semidistributed control, centralized control, mode-automata, highlevel modeling, UML, MARTE

I. I NTRODUCTION Thanks to their ability to be reconfigured an arbitrary number of times, FPGAs (Field Programmable Gate Arrays) offer a high flexibility for modern embedded systems design. Classically, FPGAs are used to implement static designs that are not changed at runtime. In order to adapt to the changing execution conditions, such systems have to implement all the possible functionalities at the same time on the chip, which has two main drawbacks. The first one is that it leads to increased system size which might not suit FPGAs having reduced resources. The second one is that, even if the system is implemented on a large FPGA, this causes a waste of resources since not all the functionalities are required at the same time and leads thus to unnecessary additional power consumption. Therefore, dynamic reconfiguration appeared to handle this problem allowing the FPGA to be reconfigured at runtime in order to load different functionalities to adapt to runtime changes. However, this solution is based on the reconfiguration of the entire system, which leads to increased reconfiguration time overheads. Partial Dynamic Reconfiguration (PDR), sup-

ported by several FPGAs, offers more flexibility by allowing portions of the FPGA to be reconfigured at runtime, while the rest remains operating [1]. Such a powerful mechanism allows to enhance the overall performance by overlapping execution and reconfiguration of different portions of the system. The PDR approach divides the system into a static part and a dynamic part composed of Partial Reconfigurable Regions (PRRs). The content of such regions is modified at runtime through reconfiguration, by loading different configuration data (bitstreams). Reconfiguration control (bitstream loading control) can be done either by an external controller to the FPGA, or an internal one reducing communication latency. The progress in FPGA technologies has enabled to embed a growing number of computing resources on one chip targeting increasingly sophisticated applications. However, this has led to a growing design complexity since design tools do not evolve at the same pace as hardware technology, resulting in a productivity gap. One of the most complex design tasks for Reconfigurable Systems-on-Chip (RSoC) is the control design, since it has to handle different aspects related to runtime adaptivity. In this context, autonomy, modularity and formalism-oriented design can be viewed as an effective combination to deal with the growing RSoC design complexity. Indeed, splitting the control problem between autonomous controllers handling each the self-adaptivity of a reconfigurable component of the system allows to decrease their design complexity. In order to achieve self-adaptivity, each controller has to handle three main tasks: monitoring, decision-making and reconfiguration. Allocating each of these tasks to a separate module of the controller enhances design reuse and scalability. Using a design formalism allows also to decrease design complexity by abstracting the control problem and decreasing its dependency to the system implementation. It facilitates design verification, reuse and scalability resulting in shorter design cycles. Another solution to decrease design complexity is to raise the low-level technical details using Model-Driven Engineering (MDE) [2]. MDE is a high-level design approach that is suitable for Systems-on-Chip (SoC) co-design, allowing to model both software and hardware parts. In order to use MDE for a high level description of a system in a specific domain such as embedded systems, UML (Unified Modeling Language) [3] profiles are used. A UML profile is a set of stereotypes that adds specific information to a UML model

in order to describe a system related to a specific domain. Several UML profiles target embedded systems design such as the Modeling and Analysis of Real-Time and Embedded systems (MARTE) [4] profile, a standard profile promoted by the Object Management Group (OMG). In this paper, we propose a control design approach for FPGA-based reconfigurable systems aiming at increasing design productivity. This approach is based on a semidistributed control model. This model divides the control problem between autonomous controllers handling each the self-adaptivity of a reconfigurable region of the system through three major tasks: monitoring, decision-making and reconfiguration using three different modules. In order to respect global system constraints and correlations between reconfigurable regions, the controllers’ reconfiguration decisions are coordinated by a coordinator before launching reconfigurations. This two-layer decision-making is well adapted to singleFPGA, as well as multi-FPGA systems by implementing the coordinator on a master FPGA. The semi-distributed control is also well adapted to higher hierarchy control architectures by organizing the controllers into clusters coordinated by local coordinators and implemented either on the same FPGA or on different FPGAs. The proposed semi-distributed decisionmaking design is based on the mode-automata formalism, which allows to abstract the control problem and gives clear control semantics decreasing design complexity. Such a splitting of the control problem offers a high design flexibility facilitating reuse and scalability. In order to make low-level technical details transparent to designers and to automate code generation, we propose a high-level modeling approach. This approach allows to model FPGA-based systems integrating the semi-distributed control using a SoC design framework based on MDE and the MARTE standard. Using this approach, we generated automatically semi-distributed control systems. The generated code allowed to validate the proposed semidistributed control model and to show that it is more flexible, reusable and scalable than the centralized one, at the cost of a slight increase in required hardware resources. The rest of this paper is organized as follows. Section 2 gives a summary of the related works. Section 3 illustrates the proposed control model. Section 4 presents the high-level modeling approach through a video processing application case study. In section 5, the efficiency of the proposed model in terms of design productivity and resources and power overheads is evaluated. The last section concludes this paper and gives some future works. II. R ELATED WORKS Centralized control for reconfigurable FPGA-based systems was proposed in several works. Implementations can be divided into software and hardware ones. For the software implementation, the reconfiguration control is handled by one or more processors [5] [6]. In this context, the processor has two control tasks: reconfiguration decisions and bitstream loading through the configuration port such ICAP for Xilinx FPGAs [7]. Such a software implementation adds a non-negligible

execution overhead, which could make it not suitable for high performance applications. The hardware implementation of the control offers a solution to this problem. Indeed, hardware controllers allow to improve the control performance, compared to the software solution, thanks to the high parallelism offered by the hardware implementation [8]. However, it adds an area overhead whose impact is decreasing remarkably thanks to the increasing FPGA size. Another type of control implementation is to use an Operating System (OS). This OS has to offer additional services compared to the standard ones, such as spatio-temporal scheduling [9], the preemption and migration of the hardware tasks [10], and the communication between the hardware tasks [11]. Operating systems can be implemented on software or hardware. Extensions of standard OSs have been proposed in order to manage the reconfiguration [12] [13] [14]. The main drawback of such a software implementation is its execution time overhead. Several works implemented hardware operating systems, which allows to obtain better performances compared to the software solution [15] [16]. However, implementing the services of an OS in hardware leads to a considerable resources overhead. The main limit of the centralized control approaches is their high design complexity and rigidity, as well as their communication bottlenecks. A solution to these problems is to split the control between distributed controllers. Distributed control for FPGA-based systems adaptivity has been proposed by several works. In [17], a hardware controller was allocated to each reconfigurable region in order to control the tasks it runs throughout the application execution. However, reconfiguration decisions were dependent only on the task graph, and the correlation between regions was not treated. In [18], the distributed control was used for the reconfiguration of an organic computing system. However, this work focused more on the distributed access to the configuration port (ICAP), allowing to accelerate the reconfiguration process compared to the centralized access, without giving details about the used control components (monitoring, decisions, etc). In [19], the authors propose a general model of networked entities, handling each computing, monitoring, control and communication. Nevertheless, the decision of reconfiguring such entities is done in a centralized way. Distributed control was also used for multi-FPGA systems, which have started to gain interest and have been investigated in several works [14] [20] [21] [22]. The main limit of such works is that only one controller is used per FPGA, which implies a high design complexity of controllers. Various design approaches were proposed for distributed adaptivity control of non-FPGAbased systems [23] [24] [25]. Among these approaches multiagent systems [26] [27] [28], negotiation-based control [24] [29], and game theory models [30] [31]. To master design complexity, formal control models are important since they enable shorter design cycles by improving design reuse as well as offering formal verification. Mode-automata formalism [32], on which is based the semidistributed decision-making in our control model, is very suitable for the mode nature of most reconfigurable systems

external events Control model

Controlled region1

local monitored data

......

Controller1

Monitoring module

Monitoring module

monitoring data load(mode) Reconfiguration Decision module module loaded_mode

reconfiguration actions

Controllern

monitoring data load(mode) Reconfiguration Decision module module loaded_mode coordinator's suggestions/ responses

reconfiguration requests/ acceptance/ refusal Coordinator

local monitored data

Controlled regionn

reconfiguration actions Distributed monitoring Semi-distributed decision-making Distributed reconfiguration

Fig. 1: Overview of the proposed control model controller_decisioni (monitoring_datai, loaded_modei, coord_inprogress, coord_suggestioni, coord_decisioni) = (controlleri_request, controlleri_response, load_modei) coord_decisioni (modei j) = refusal conditioni (modei j) / send_decision (modei j)

Modei 1

coord_decisioni (modei j) = refusal coord_decisioni (modei j) = refusal conditioni (modei j) / send_decision (modei j) conditioni (modei j) / send_decision (modei j) Modei 2 Modei Mi loaded_modei = modei 1

loaded_modei = modei 2

coord_decisioni (modei j) = authorization loaded_modei = modei 2 / load _modei = modei j

coord_decisioni (modei j) = authorization / load _modei = modei j

coord_decisioni (modei j) = authorization / load _modei = modei j

* conditioni (modei j) / send_decison(modei j) <=> [ conditioni_request (modei j)= true / controlleri_request = modei j or condition i_acceptance (modei j)= true / controlleri_response = acceptance or condition i_refusal (modei j)= true / controlleri_response = refusal ]

Fig. 2: The controller mode-automaton

allowing them to switch between different modes, such as energy modes or display modes in video processing applications for instance. Such a formalism gives clear control semantics decreasing design complexity. It has been adopted as a specification language for control oriented reactive systems [33]. It has also been used to model control for FPGA-based reconfigurable systems [34]. However, the proposed models targeted a centralized controller. High-level modeling for reconfigurable systems was proposed in several works. In [35], the authors present a codesign methodology for reconfigurable MPSoC (Multiprocessor SoC), in order to generate reconfigurable embedded systems automatically using the MARTE UML profile. The proposed solution for control modeling is extremely simplistic, based on a UML state machine handling a repeated sequence of configurations for one reconfigurable co-processor. This solution is not suitable for more complex reconfiguration requirements. In [36], the authors propose a MARTEoriented design approach of dynamic reconfigurable FPGAbased RSoC using mode automata control for the Gaspard2 SoC design framework [37]. However, the proposed approach is based on a centralized reconfiguration controller for the whole FPGA, which makes it not suitable for large complex systems. In [38], the authors present a design approach for dynamic reconfigurable FPGA-based RSoC. This approach is based on the MARTE concepts for reconfiguration control such as mode and configuration concepts. However, it has the limit of remodeling the whole system for each possible configuration. There are also several other model-driven approaches for RSoC control that are not MARTE-oriented [39] [40]. The main drawback of the MDE approaches presented above is that they are limited to centralized control.

III. S EMI - DISTRIBUTED CONTROL MODEL The main objective of the control model proposed in this paper is to solve design problems related to design complexity, verification, reuse and scalability. The proposed control model targets FPGA-based systems implementing PDR. In this context, allocating a controller to each Partial Reconfigurable Region (PRR) of the system allows each region to be selfadaptive through an autonomous controller, which enhances the reusability of the controllers and the scalability of the control design targeting systems containing more reconfigurable regions. Each controller manages three major aspects: monitoring, reconfiguration decision-making and reconfiguration realization. In order to insure the flexibility of the controllers, they are designed in a modular way in order to facilitate their modification and reuse. Decision-making is the most complex task in the control model. In order to decrease such a complexity and to facilitate the reuse of decisionmaking components, we use a formalism-oriented design. The proposed control model combines thus three main key points to achieve design productivity: autonomy, modularity and formalism. A. Autonomous modular distributed controllers for selfadaptivity Each distributed controller is composed of three modules handling monitoring, reconfiguration decision-making and reconfiguration realization for a given reconfigurable region, as Figure 1 shows. The monitoring module collects information from the behavior of the controlled region and other external events such as those sent by sensors, for example. The monitoring data are sent then to the decision module, which makes reconfiguration decisions accordingly.

Each decision module makes local decisions about whether or not a reconfiguration of the controlled region is required. Due to the local vision of each controller, launching a reconfiguration of its controlled region without checking whether it can coexist with the current configurations of the other regions might result in problems such as safety problems or might not respect the control global constraints such as those related to performance, temperature, energy consumption, etc. Therefore, before launching a reconfiguration that it estimates required according to the monitoring data, the controller has to send a reconfiguration request to the coordinator. If the coordinator authorizes the requested reconfiguration, the decision module notifies the reconfiguration module in order to launch the required configuration. The role of the reconfiguration module is to apply reconfiguration actions on the controlled region, which consist in loading the required configuration data (bitstream) in the reconfigurable region through a configuration port, such as ICAP for Xilinx FPGAs. By using a reconfiguration module for each controller, partial reconfigurations can be launched in a distributed way, which reduces reconfiguration time for FPGA supporting parallel reconfigurations. After loading the required bitstream, the reconfiguration module notifies the decision module so that it updates its automaton’s current mode. B. Mode-oriented decision-making The proposed decision-making model is based on the modeautomata formalism [32]. Such a formalism decreases the control design complexity by dividing it according to different modes. A mode-automaton has clear and precise semantics allowing to facilitate design reuse. It is mainly composed of modes and transitions. It is basically an automaton whose states (modes) are labeled by data-flow programs. Transitions are associated with conditions, which serve to act as triggers. The proposed decision-making model is composed of the distributed controllers’ automata and the coordinator’s automaton. These automata communicate through coordination information whenever one of the controllers estimates that a reconfiguration of its controlled region is required. In this case, it sends a reconfiguration request to the coordinator and waits for its decision. If the request implies also the reconfiguration of other regions in order to respect the global system constraints, the coordinator sends reconfiguration suggestions to the concerned controllers. These controllers can accept or refuse those suggestions. After treating the controllers responses, the coordinator gives its decision, which can be either the authorization or the refusal of the requested reconfiguration. 1) The Controller mode-automaton: The decision module of each controller is modeled by a mode-automaton. Figure 2 shows an example of this mode-automaton. Each mode Modei j corresponds to a given configuration/mode of the controlled regioni , where i ∈ [1..n], n is the number of the system’s reconfigurable regions, j ∈ [1..Mi ] and Mi is the number of the modes/configurations of regioni . Here, we assume that each

coordinator({controlleri_request}, {controlleri_response}) = (coord_inprogress, {coord_suggestioni}, {coord_responsei}) {controlleri_request} != {} / coord_inprogress=true {controlleri_request} = {}

Idle

TreatRequests treat_requests()

involve_others=false /coord_inprogress=false, send_decision(), final_decision=null

final_decision !=null / coord_inprogress=false, send_decision(), final_decision=null

involve_others=true / send_suggestions()

TreatResponses treat_responses()

final_decision=null

Fig. 3: The coordinator mode-automaton

reconfigurable region has a set of configuration possibilities predefined at design-time. Inputs and outputs of the modeautomaton are shown in its header. Inputs are monitoring data (monitoring datai ) sent by the monitoring module, a reconfiguration success notification (loaded modei ) from the reconfiguration module, and coordination information from the coordinator. These information include a notification at the beginning/end of each coordination process (coord inprogress), reconfiguration suggestions (coord suggestioni ), and the coordinator decision at the end of a coordination process (coord decisioni ). Based on monitoring data, the current mode of the controlled region, and on coordination information (coord inprogress and coord decisioni ), the controller makes a decision on whether a reconfiguration of the controlled region is required. If yes, it sends a reconfiguration request to the coordinator (controlleri request ). During a coordination process, a controller might receive reconfiguration suggestions from the coordinator to which it can respond (controlleri response ) with acceptance or refusal depending on monitoring data. If the controller receives a reconfiguration authorization (coord decisioni (modei j ) = authorization) from the coordinator, it sends a reconfiguration command to the reconfiguration module (load modei ), indicating the mode to be loaded. If the coordinator refuses a reconfiguration (coord decisioni (modei j ) = re f usal), it cannot be launched because it cannot coexist with the current configurations of the other regions, according to the global system constraints as it will be explained later. 2) The coordinator mode-automaton: The role of the coordinator is to coordinate the reconfiguration decisions of the controllers in order to guarantee that the system configuration respects the global system constraints. For this, a table containing the allowed system configurations is used, according to constraints of safety, performance, consumption, etc. This table is determined at design time and can be filled manually by the designer or generated from high-level languages such as those based on contract mechanism [41]. We call this table GC (Global Configurations) shown in Figure 4, where each row corresponds to a global configuration, which is a combination of partial configurations of the reconfigurable regions. This table is defined as follows: GC[i, k] = modei ji.k ∀i ∈ [1..n], ji.k ∈ [1..Mi ] and k ∈ [1..K] where, modei ji.k is the mode of regioni corresponding to the global configuration k; n is the number of reconfigurable regions and of controllers; K is the

Global configurations

1

2

.......

K

Region1

mode1 j1.1

mode1 j1.2

.......

mode1 j1.K

Region2

mode2 j2.K

Reconfigurable regions

mode2 j2.1

mode2 j2.2

.......

.......

.......

.......

.......

Regionn

moden jn.1

moden jn.2

.......

....... mode2 jn.K

Fig. 4: Global configurations table (GC)

number of the global configuration possibilities, and Mi is the number of modes/configurations of regioni . The exchanges between the controllers and the coordinator are not continuous in time. They occur only when one of the controllers decides that a reconfiguration of its controlled region is required. This allows to reduce the impact of the communication latency inside the control model on the overall system performance. The coordination algorithm is executed using a three-mode automaton as shows Figure 3. The idle mode corresponds to a coordinator waiting for reconfiguration requests coming from controllers. The TreatRequests and TreatResponses modes correspond to a coordinator that is treating the controllers’ reconfiguration requests and responses to reconfiguration suggestions, respectively. The coordinator starts at the idle mode. Whenever it receives a reconfiguration request ({controlleri request } = 6 {}), it sends a notification to the controllers indicating that a coordination is in progress (coord inprogress = true), so that they stop sending reconfiguration requests. It moves then to the TreatRequests mode. Due to its local vision of the system, a request sent by a controller concerns only its controlled region, which corresponds here to a cell of the GC table. Note that the number of requests received at the same time depends on the instants reconfiguration decisions are made by controllers, as well as on the communication type between the controllers and the coordinator (through a bus, point-to-point, etc). In this paper, we used a point-to-point communication as will be shown in the case study. This implementation has the advantage of accelerating the coordination process offering the possibility to receive more than one request or response from controllers at the same time. Other communication architectures are still possible here by modifying the communication part of the coordinator without modifying the coordination algorithm. Once in the TreatRequests mode, the coordinator checks whether the configuration(s) requested by the controller(s) at a given time combined to the current configurations of the regions handled by other controllers exist as a global configuration possibility in the GC table. If no, it checks which other reconfigurations that, combined to the one requested, allow to obtain one or more global configurations that respect the global system constraints. Here, if more than one global configuration satisfy the requested reconfiguration(s), possibilities are ordered according to the control strategy/objective. In the present implementation of the coordinator, we order possibilities in a list according to the number of partial reconfigurations they require, and we give the highest priority to the global configuration that requires less partial reconfigurations

in order to minimize reconfiguration time. Different algorithms can also be used here to order configuration possibilities. In case the first possibility in the ordered list satisfies the requests and doesn’t require reconfiguration of other regions (involve others = f alse), the coordinator ends the coordination process (coord inprogress = f alse) and sends its decision ( f inal decision = authorization) directly to the controllers requesting the reconfigurations as shows Figure 3. Otherwise, the coordination process is divided into steps. Each coordination step is related to a possibility of the ordered list. The coordinator begins with sending reconfiguration suggestions to the controllers whose regions have to be reconfigured in order to move to the first possibility. Then it goes to the TreatResponses mode. If a coordination step ends with positive responses from all the concerned controllers, the coordination process ends with an authorization ( f inal decision = authorization). In this case, the coordinator notifies the controllers (coord inprogress = f alse) and sends its reconfiguration authorization to the controllers that requested the reconfigurations as well as the other concerned controllers, and goes back to the idle mode. Otherwise, the coordinator remains at the TreatResponses mode and considers the next possibility. The coordination ends with a refusal ( f inal decision = re f usal) if all the possibilities have been refused by the controllers. In this case, the coordinator notifies the controllers (coord inprogress = f alse) and sends a reconfiguration refusal to the controllers requesting the reconfigurations. Then it goes back to the idle mode. C. Advantages compared to the centralized model Thanks to the distribution of the control problem, the proposed control model provides a high design flexibility compared to the centralized control facilitating reuse and scalability. In the case of a centralized decision-making, the decision module of the centralized controller can be modeled by a mode-automaton, where each mode corresponds to a global configuration of the system (a combination of the configurations of the reconfigurable regions). Transitions from a mode to another depend thus on a global vision of the system, using monitoring data as well global constraints (the same as those used by the coordinator in the semi-distributed model). This makes the centralized decision module tightly dependent on the implemented system, which is an obstacle to design reuse. Indeed, when designers want to add other regions to a previous system design, the whole decision model has to be rewritten (both modes and transitions), because each mode has to take into account the new regions. On the other hand, with the semi-distributed decision-making model, adding new regions simply requires adding a controller for each new region. The monitoring and decision modules of the controllers can be reused easily since they depend only on the monitoring data related to the controlled region. The coordination algorithm doesn’t change. It has only to increase the number of controllers to be coordinated, and to modify the global constraints checked by the coordinator. This is simply done by modifying the GC table.

hrhc1:HFilter_RHC1

<> {direction=inout}

<> {direction=inout}

i :Interface PLB_interface1

PLB_signals

h1 :HFilter_PRR1

PLB_signals

PLB_interface2 <> {direction=out}

<> {direction=in}

hc1:HFilter_HW-C1 m :Monitoring

PLB_interface3 <> {direction=in}

monitoring_PLB_signals PLB_signals

PLB_interface4

<> battery_level {direction=out}

battery_level

battery_level battery_level

performance_level

d:Decision performance_level battery_level coordination_info

coordination_info

reconﬁguration_commands

coordination_info

r:Reconﬁguration reconf_PLB_signals

PLB_signals

reconﬁguration_commands

<> {direction=inout}

Fig. 6: The RHC modeling 1) Modeling and deployment UML + MARTE proﬁle

Application

Arcitecture Allocation Deployment

2) Model Transformations

Deployed MARTE

RTL

Deployment proﬁle

IP Library

Proﬁle Model

3) Code Generation C

VHDL Platform-dependent ﬁles

Code Model Transformation

4) Physical Implementation bitstream generation

uses used for

Fig. 5: FPGA-based systems design flow in the Gaspard2 framework IV. T HE HIGH - LEVEL MODELING APPROACH In order to maximize the efficiency of our control design approach in terms of design productivity, we integrate the semi-distributed control model in a SoC design framework named Gaspard2. Gaspard2 [42] is based on MDE and the MARTE standard allowing to move from high-level models to automatic code generation. Previous works on control modeling for reconfigurable systems based on this framework [36]

[34] have the drawback of being limited to centralized control. Our contribution to this framework is a modeling approach of FPGA-based reconfigurable systems integrating the proposed semi-distributed control. Thanks to this high-level approach, the low-level technical details of the control implementation is transparent to designers and the semi-distributed control systems are generated automatically, which allows for a high design productivity. The design flow in Gaspard2 follows several steps: 1) system modeling and deployment, 2) model transformations, 3) code generation, and 4) physical implementation, as shows Figure 5. More details about these steps can be found in [42]. Gaspard2 allows code generation targeting different languages such as Fortran, SystemC, OpenCL, C, VHDL, etc. Figure 5 shows the design flow of FPGA-based reconfigurable systems targeted by our approach. After model transformations, the code generation tool generates the necessary code in C and VHDL languages as well as FPGA-dependent files. The generated code can then be integrated into FPGA tools for synthesis, PAR (Placement and Routing) and bitstream generation. It can also be integrated into the IP library in order to be reused in other system designs at the deployment level. Our contribution in Gaspard2 allows to model controlled FPGA-based reconfigurable systems. However, the generation of entire systems is not yet fully supported. Nevertheless, the current Gaspard2 version allows the generation of control systems composed of distributed controllers and coordinators allowing to validate the proposed semi-distributed control model. In the rest of this section, we present our high-level modeling approach for FPGA-based reconfigurable systems integrating the proposed semi-distributed control through a video processing case study.

The considered video processing case study is based on a video downscaler application targeting a single-FPGA system. This downscaler transforms a video signal, which is expressed in Common Intermediate Format (CIF:352x288 pixels), into a smaller size video (132x128 pixels). This application is composed of two main tasks: a horizontal filter and a vertical filter. By applying the horizontal filter on the input frame, its size is reduced to 132x288 pixels. Then, by applying the vertical filter, we obtain the target size (132x128 pixels). Each filter is composed of a repetition of an elementary task. At each iteration of this task, a frame block is treated. In order to guarantee a high performance of the application, both filters are implemented in hardware using hardware accelerators. Each accelerator performs an elementary task of one of the filters. Using different parallelism degrees, each filter can be implemented by more than one accelerator. The input block sizes of the accelerators can also be varied in order to obtain different performance and power results. Indeed, using a bigger input block size for an accelerator reduces the execution time thanks to a higher hardware parallelism, at the cost of a higher resource overhead and thus a higher power consumption. The variety of parallelism possibilities of the considered application allows to test the scalability of our control model by varying the number of controllers, as will be explained in section V. In our case study, the objective of the control is to adapt the downscaler application to runtime changes in performance and power requirements. Each accelerator is implemented in a reconfigurable region that has three possible implementations using different input block sizes. Here, for both filters, the first version/mode (having the largest input block) corresponds to the highest performance but at the same time the highest power consumption. The third mode is the least performing but the least consuming. Our semi-distributed control model allocates a controller to each region allowing to switch different modes depending on requirements in terms of performance and power consumption. In the rest of this section, we detail our approach to integrate the proposed semi-distributed control in the RSoC design flow of Gaspard2. A. Application modeling An application is composed of hardware and software tasks. Each task is modeled by a UML component. Inputs and outputs of a task are represented by ports having the FlowPort stereotype of MARTE. More details about application modeling in Gaspard2 can be found in [42]. At the application level, we do not add extensions to integrate the semi-distributed control. So, we will describe briefly the components modeled at this level and that will be used later in the next modeling levels. Each reconfigurable task is modeled by a number of components, representing each a version of the related task. In our case study, the application is divided into hardware tasks executed by hardware accelerators and software tasks executed by the processor. Each hardware task corresponds to an elementary filter task executed by a PRR. Three versions

(HFilteri /V Filteri , i ∈ [1..3]) are possible for both filters, as we said previously. Software tasks are divided are composed of a number of static tasks and a reconfigurable task. Static tasks perform frame reading and building before and after the downscaler execution, respectively. The reconfigurable task is the task that sends input frame blocks to PRRs and retrieves output blocks afterwards. The configuration of this task depends thus on the current configurations of the PRRs whose inputs/output block sizes is different from a configuration to another. As we said previously, the coordinator guarantee that the system configuration respects global system constraints. In this case study, we suppose that global systems constraints require that all the regions have to implement the same mode number. This choice allows to test both acceptance and refusal in coordination processes by restricting global configuration possibilities. In this case, the reconfigurable software task, has three versions Downscaler i, i ∈ [1..3], where each Downscaler i communicates with PRRs implementing HFilteri and V Filteri . Note that HFilteri and V Filteri are the hardware tasks corresponding to PRR modes HFilter modei and V Filter modei respectively. B. Architecture modeling The architecture model contains the hardware system implemented on FPGA, on which the application will be running. The architecture modeling is also based on components. Here, we focus on the control integration at the architecture level. The proposed semi-distributed control is wholly designed in hardware in order to avoid the execution time overhead of the software implementation. Therefore, the distributed controllers’ modeling is done at the architecture level. For this, we introduce the concept of Reconfigurable Hardware Components (RHCs) in order to simplify the control design. Each RHC is composed of a PRR, a hardware controller (HwC) handling the self-adaptivity of this PRR, and an interface component allowing the communication between the HRC and the PLB bus (Processor Local Bus) [43], which is the system bus used here. Figure 6 shows the structure of one of the RHCs used for the horizontal filter. The rest of the RHCs follow the same concept. Each HW-C is composed of three components: monitoring, decision-making, and reconfiguration. In order to adapt the system to changes in performance and power requirements, each controller monitors two elements. The first element is the required performance level, which is a user input. The second is the battery level sent by a battery sensor. Performance levels are given through push buttons and handled as interruptions by the processor, which sends them then to controllers. Therefore, the monitoring component inputs are the PLB signals (allowing to extract the user performance level), and the battery level sent by the battery sensor, as shows Figure 6. The decision component’s inputs are the monitoring outputs modeled by the per f ormance level and battery level ports, and the coordination information through the con f iguration in f o port. After a reconfiguration is authorized by the coordinator, the decision component notifies the reconfiguration component

<< modeBehavior >> HFilterHW-C1

<< mode >> HFilter1_mode1 << TransitionAcceptance >> [battery_level / H1 >= b3,1 . FB / H1] << TransitionRequest >> [performance_level=1 and (battery_level / H1 >= b3,1 . FB / H1)]

<< TransitionAcceptance >> [true]

<< TransitionAcceptance >> [battery_level / H1 >= b2,1 . FB / H1] << TransitionRequest >> [performance_level=1 and (battery_level / H1 >= b2,1 . FB / H1)]

<< TransitionRefusal >> [battery_level / H1 < b2,1 . FB / H1]

<< TransitionRequest >> [performance_level=2 or (battery_level / H1 < a1,2 . FB / H1)]

<< TransitionRequest >> [performance_level=3]

<< mode >> HFilter1_mode2

<< TransitionRefusal >> [battery_level / H1 < b3,1 . FB / H1]

<< TransitionAcceptance >> [true]

<< TransitionRequest >> [performance_level=3 or (battery_level / H2 < a2,3 . FB / H1)]

<< TransitionAcceptance >> [battery_level / H2 >= b3,2 . FB / H1] << TransitionRefusal >> << TransitionRequest >> [battery_level / H2 < b3,2 . FB/ H1] [performance_level=2 and (battery_level / H2 >= b3,2 . FB / H1)]

<< TransitionAcceptance >> [true] << mode >> HFilter1_mode3

Fig. 7: The HW-C’s mode-automaton

through the recon f iguration commands port. As parallel reconfigurations are only possible for systems having more than one configuration port (ICAP for Xilinx FPGAs), such as multi-FPGA systems but not single-FPGA systems because current FPGAs have only one ICAP, in our single-FPGA system the reconfiguration will be realized in a sequential way. Each reconfiguration module contains a dedicated register indicating which mode is to be loaded in the controlled region. These registers are then read by a processor, which communicates with the ICAP in order to load the required bitstreams in the reconfigurable regions. Therefore, the reconfiguration component has two ports. The first one allows it to receive reconfiguration commands from the decision component and to notify this latter when the required reconfiguration has finished. The second one allows it to communicate with the processor as explained previously. The RHCs are then integrated in an architecture model containing other components. The architecture is not illustrated by a figure due to space limitation. The considered architecture contains four RHCs, where two RHCs implement the horizontal filter and the two others the vertical one. It contains also a coordinator, a Microblaze processor, a PLB bus, a hardware module emulating the battery sensor, and a UART allowing the user to follow the application and the control progress. In order to offer the possibility for the user to introduce the required performance level, the system contains push buttons and an interruption controller to handle the user commands as interruptions. The system contains also a SysACE controller for accessing the partial bitstreams placed in a Compact Flash, an ICAP controller to carry out partial reconfiguration, and the DDR3 memory to store the input and downscaled frames.

The compact Flash will also be used to store the images to be downscaled. This can be replaced by a camera connected to the FPGA in order to downscale its input video. The system contains also a TFT (Thin Film Transistor) controller in order to display images before and after downscaling. More details about the modeling of such hardware components using the Gaspard2 framework can be found in [42]. Our semi-distributed decision-making is easily adaptable to the Gaspard2 framework, since it is based on mode-automata formalism which is supported by the MARTE profile. In order to model the semi-distributed decision-making, we associate to the decision component of each HW-C a MARTE modeautomaton, where each mode corresponds to a given configuration/mode of the controlled RHC. Each mode represents thus a version of the hardware tasks mentioned previously. Provided that all reconfigurable regions have three configuration possibilities, each Hw-C uses a decision module modeled by a three-mode automaton. Figure 7 shows the modeautomaton of the HW-C modeled in Figure 6. Inspired from reactive mode-automata formalism [32], a mode in MARTE is an extension of the UML state, and mode behavior is an extension of the UML state machine. The mode-automaton has the modeBahavior stereotype, and each mode has the mode stereotype. Transitions have the modeTransition stereotype, and give the events triggering the transition and the activity that has to be carried out to move from a mode to another. In order to simplify the modeling of the Hw-C decisions, we extended the modeTransition stereotype using three new stereotypes shown in Figure 7. These stereotypes are added to the MARTE profile in order to facilitate the modeling of the decision module activities (requests, suggestion acceptance

and suggestion refusal) by making them implicit in transitions. The TransitionRequest stereotype indicates that the decision module sends a request to the coordinator asking for a reconfiguration to the target mode if the transition condition is valid. Likewise, TransitionAcceptance and TransitionRefusal stereotypes indicate that the decision module accepts or refuses the received reconfiguration suggestion to the target mode. Although these stereotypes are applied to transitions, they are not actual transitions to the target mode since actual transitions are only executed after the loading of the corresponding configuration (partial bitstream) as we explained in section III-B. Using these extensions allows to hide many implementation details (which are handled automatically by the model transformation and code generation tools) for designers. This simplifies the control design significantly. The designer has to mention only the transition conditions, since the transition activities are implicit (request, acceptance and refusal). The actual mode-automaton (implementing actual transitions) is obtained automatically by the code generation tool based on the concepts described in Figure 2. In our case study, the decision making process of each decision module is mainly based on the following rules, which are represented by the transition conditions in Figure 7: • Being at H/V Filter mode j1 , a controller decides that a reconfiguration to a less consuming mode H/V Filter mode j2 is required, only if the user requires a lower performance level, or the consumption constraints do not allow to stay at H/V Filter mode j1 , which is the case when the following condition is valid battery level/H j1 < a j1, j2 .FB/H1

•

(1)

as shows Figure 7, where H j (V j for the vertical filter) is the energy consumption per cycle of the controlled region’s mode H/V Filter mode j , battery level is the battery level sent by the battery sensor and FB is the energy of a full battery. This constraint allows to check whether the available energy is under a threshold (determined by a j1, j2 ) that allows to stay at H/V Filter mode j1 , taking as a reference the highest consumption (H1 ). In this case study, we take a1,2 = 75% and a2,3 = 75%.75%, which give the thresholds to move from H/V Filter mode1 to H/V Filter mode2 and from H/V Filter mode2 to H/V Filter mode3 respectively. FB = 1000J, H1 = 0.6nJ, H2 = 0.4nJ, H3 = 0.2nJ, V1 = 0.7nJ, V2 = 0.5nJ, and V3 = 0.3nJ. All numeric values are not shown in Figure 7 for a better readability. In order to move from a H/V Filter mode j2 to a H/V Filter mode j 1 that consumes more, it is necessary that the user requires a performance level that is higher than the previous one and that the consumption constraints allow to move to the target mode, which is the case when battery level/H j1 >= b j2, j1 .FB/H1

(2)

Note that the term b j2, j1 is greater than a j1, j2 from equation (1), in order to avoid that, once

in H/V Filter mode j1 , the consumption constraints in (1) lead the controller to decide to go back to H/V Filter mode j2 so soon, which would lead to an infinite loop. b j2, j1 has to be well chosen in order to avoid this problem. In our case study, we take a b2,1 = b3,1 = 75% + 5% and b3,2 = 75%.75% + 5%. • If the controller receives a reconfiguration suggestion from the coordinator it treats it as follows. If the suggestion requires to move to a less consuming mode, the controller accepts directly. Otherwise, the controller checks the consumption constraints in (2) in order to accept or refuse as shows Figure 7. The actual mode-automaton of the horizontal filter is illustrated in Figure 8. In this figure, the mode-automaton is described through three different control aspects (sending reconfiguration requests, treating the coordinator’s suggestions, and treating the coordinator’s response) for a better readability. The controller related to the vertical filter follows the same concepts. As for the coordinator modeling, it can be done by defining global system constraints using a contract-based language as presented in [34]. However, contracts cannot be modeled using MARTE. Therefore, MARTE has to be extended in order to handle further control needs. This extension is out of the scope of this paper. In the current implementation, the coordinator is written directly in VHDL and stored in the IP library in order to be attached later to the design at the deployment level. As we said previously, the application contains a reconfigurable software task, which sends input frame blocks to PRRs and retrieves output blocks afterwards. In order to handle the runtime-adaptation of this task, we use a software controller (SW-C). This controller is modeled by a mode-automaton associated to the processor. Each mode of the automaton corresponds to one configuration of the system. Here, we speak about a system configuration because in our case, in addition to software adaptation, the processor has to have a vision of the PRR configurations in order to launch the PRRs reconfiguration through the ICAP. As we said previously, we assume that global system constraints require that all the RHCs implement the same mode number. Therefore, there are three possible system configurations, which correspond to a three-mode automaton as shows Figure 9. The determination of the required system configuration will is handled by the code generation tool by generating the necessary code for the processor allowing to read reconfiguration registers, launching the required partial reconfiguration and to use the right version of the reconfigurable software task. As for the required partial bitstreams, they will be indicated later at the deployment level, so that the code generation tool inserts them in the software controller’s generated code. C. Allocation modeling The allocation model allows to allocate application tasks to architectural components, which are the processor and the PRRs. For this, Gaspard2 uses the MARTE allocate stereotype, which is an extension of the UML abstraction. In order

Controller's reconﬁguration requests coord_inprogress=false and refused_mode2 = false and performance_level=2 or (battery_level/ H1 < a1,2 . FB / H1) / request(mode2)

coord_inprogress=false and refused_mode2 = false and coord_inprogress=false and refused_mode3 = false and performance_level=2 and (battery_level / H2 >= b3,2 . FB / H1) performance_level=3 or (battery_level / H2 < a2,3 . FB / H1) / request(mode2) / request(mode3) loaded (mode2) = true loaded (mode3) = true HFilter_mode3 HFilter_mode2 loaded (mode2) = true loaded (mode1) = true

HFilter_mode1

coord_inprogress=false and refused_mode3 = false and performance_level=3 / request(mode3)

a)

coord_inprogress=false and refused_mode1= false and performance_level=1 and (battery_level / H1 >= b2,1 . FB / H1) / request(mode1)

coord_inprogress=false and refused_mode1= false and performance_level=1 and (battery_level / H1 >= b3,1 . FB / H1) ) / request(mode1)

Treating the coordinator's suggestions

coord_suggestion=mode1 and (battery_level / H1 >= b2,1 . FB / H1) coord_suggestion=mode2 and (battery_level / H2 >= b3,2 . FB / H1) coord_suggestion=mode3 /accept(mode1) / accept(mode2) / accept(mode3) coord_suggestion=mode1 and (battery_level / H1 >= b3,1 . FB / H1) ) loaded (mode3) = true loaded (mode2) = true HFilter_mode3 / accept(mode1) HFilter_mode2 HFilter_mode1 loaded (mode1) = true loaded (mode2) = true coord_suggestion=mode2 and (battery_level / H1 < b3,1 . FB / H1) )/ coord_suggestion=mode1 and (battery_level / H1 < b2,1 . FB / H1) coord_suggestion=mode2 and refuse(mode1) coord_suggestion=mode3/ accept(mode3) / refuse(mode1) b) (battery_level / H2 < b3,2 . FB / H1) / refuse(mode2) coord_suggestion=mode2 / accept(mode2)

Treating the coorinator's reponse

coord_response (mode2) = authorization coord_response (mode1) = authorization coord_response (mode2) = authorization / load(mode2) coord_response (mode3) = authorization / load(mode1) coord_response (mode3) = authorization coord_response (mode1) = authorization / load(mode2) / load(mode3) / load(mode3) / load(mode1) loaded (mode2) = true loaded (mode3) = true HFilter_mode3 HFilter_mode2 HFilter_mode1 loaded (mode2) = true loaded (mode1) = true coord_response (mode1) = refusal coord_response (mode3) = refusal coord_response (mode3) = refusal / / refused_mode1 = true coord_response (mode1) = refusal / refused_mode3 = true coord_response (mode2) = refusal refused_mode3= true coord_response (mode2) = refusal c) / refused_mode2 = true / refused_mode1 = true / refused_mode2 = true

Fig. 8: The mode-automaton to be generated for the horizontal filter’s controller << modeBehavior >> SW-C

<< conﬁguration >> System_Conﬁguration1

<< modeTransition >> [requiredConﬁguration=3] /ReconﬁgureToDownscaler3 << modeTransition >> [requiredConﬁguration=2] /ReconﬁgureToDownscaler2

r:ReadFrame

<< modeTransition >> [requiredConﬁguration=3] /ReconﬁgureToDownscaler3

<< mode >> System_mode1

<< mode >> System_mode2 << modeTransition >> [requiredConﬁguration=1] /ReconﬁgureToDownscaler1

d1:Downscaler1 R_out

R

G

G_in

G_out

G

B

B_in

B_out

B

<< mode >> System_mode3 << modeTransition >> [requiredConﬁguration=2] /ReconﬁgureToDownscaler2

b:BuildFrame

R_in

R

<>

<> {nature=timeScheduling} <> mb:MicroBlaze

<>

{mode =System_mode_1}

<< conﬁguration >> h1_1:HFilter1_Conf1

<< conﬁguration >> h2_1:HFilter2_Conf1

<< conﬁguration >> v1_1:VFilter1_Conf1

<< conﬁguration >> v2_1:VFilter2_Conf1

<< modeTransition >> [requiredConﬁguration=1] /ReconﬁgureToDownscaler1

Fig. 9: The mode-automaton of software controller

Fig. 11: Modeling of a system configuration

{mode = HFilter1_mode1} << conﬁguration >> HFilter1_Conf1 hh1:HFilter1

<> {nature=timeScheduling} <> h1:HFilter_PRR1

Fig. 10: Allocation of a hardware task on a PRR

to model the control at this level, we use the configuration MARTE stereotype. This stereotype is used to model PRRs configurations as well as system configurations. Figure 10 shows a part of the allocation model. It gives, as example, the configuration corresponding to the first mode of a PRR implementing the horizontal filter, which is HFilter1 mode1 in Figure 7. Here, the configuration contains the allocation of the hardware task HFilter1 (which corresponds to the first mode of the PRR) to the PRR HFilter PRR1. For this, the property

nature of the allocate stereotype is set to timeScheduling in order to indicate to the code generation tool that the PRRs will be reconfigured in order to implement different hardware tasks. In order to enhance design reuse, our approach reuses the PRR configurations when modeling system configurations. The configuration of Figure 10 is reused in Figure 11, which describes the system configuration corresponding to the first mode in Figure 9. As Figure 11 shows, each system configuration combines the allocation of the software tasks to the processor, and the PRRs configurations corresponding to the considered system mode. Reconfigurable software tasks are allocated using timeScheduling for the nature property of the allocate stereotype. Here, the reconfigurable software task used for the first system configuration is Downscaler1 described in section IV-A.

<> SwHFilterBlock1-C

<> Monitoring-VHDL

<>

<>

<< conﬁguration >> HFilter1_Conf1

<>

<>

<>

<>

<> hﬁlterBlock1.c

<> monitoring.vhd

<> h_1_1.bit

Fig. 12: Deployment of static components and PRRs

Region Region Region Region

1 2 3 4

Global configuration number 1 2 3 HFilter mode1 HFilter mode2 HFilter mode3 HFilter mode1 HFilter mode2 HFilter mode3 V Filter mode1 V Filter mode2 V Filter mode3 V Filter mode1 V Filter mode2 V Filter mode3

TABLE I: GC table for the 4-region system

physical implementation of FPGA-based systems integrating the semi-distributed control. V. E XPERIMENTAL RESULTS

D. Deployment modeling The deployment level allows to link existing IPs to the modeled components in order to add details allowing to get closer to the targeted platform. At this level, we use the Gaspard2 deployment profile. Stereotypes softwareIP and hardwareIP are used to indicate several platform-dependent attributes such as the implementation language and other parameters. The softwareIP stereotype is used for the application software elementary components. The hardwareIP stereotype is used for the application and architecture hardware elementary components. More details about Gaspard2 deployment can be found in [42]. All the elementary components of our design are deployed using the softwareIP and hardwareIP stereotypes, except for the PRRs since they have different implementations that will be switched at runtime. Existing code files are attached to elementary components using the codeFile stereotype, which is an extension of the UML artifact. We associate VHDL codeFiles to hardware IPs, and C codeFiles to software IPs. In order to deploy the PRRs, we extended the use of the codeFile stereotype in order to enable its association to the PRR configurations. Figure 12 gives a part of the deployment model handling the deployment of a software task, a hardware component (the monitoring component), and a PRR configuration. For the HW-Cs, we associate code files to the monitoring and reconfiguration components, whereas decision modules will be generated automatically from the modeled mode-automata. E. Code generation As we said previously, the generation of entire FPGA-based systems is not yet fully supported. Nevertheless, the current Gaspard2 version allows the generation of control systems composed of distributed controllers and coordinators allowing to validate the proposed semi-distributed control model. For this, we extended the existing UML two MARTE model transformation of Gaspard2 in order to take the extended mode-automata into account. A chain of extended model transformations is then carried out in order to obtain a RTL (Register Transfer Level) model, as shows Figure 5. This model contains technical details allowing the generation of the VHDL code of the control model using the code generation tool. Further extensions will be carried out in future works in order to enable the generation of the necessary code for the

In this section, we validate the generated semi-distributed control model through a simulation scenario. Then we evaluate its efficiency in terms of design complexity, reuse, scalability, and resource and power overheads compared to the centralized model. A. The semi-distributed control validation through a simulation scenario In our simulation scenario, we consider the control model used in section IV, which is composed of 4 controllers and a coordinator. The control model was simulated using ISE 12.4 of Xilinx. The first two controllers control regions HFilter PRR1 and HFilter PRR2, which implement the horizontal filter. The two others control regions V Filter PRR1 and V Filter PRR2, which implement the vertical filter. The coordinator’s GC table is described in Table I, indicating that all the regions have to implement the same mode number. As we said previously, the inputs of the control model are the available battery signal sent by the battery sensor, and the processor commands. These commands allow to send the user required performance level, read reconfiguration registers of the reconfiguration modules and notify them at the end of the reconfigurations. Since, in the current design flow, the processor and its C code are not generated, we simulate the processor communication with the control model using VHDL processes. These processes manage the PLB signals of the controllers in order to simulate the processor commands allowing to read reconfiguration registers and to send performance levels. For our simulation scenario, we assume that the processor reads the reconfiguration registers after each frame downscaling and that for global configurations 1, 2 and 3, the system downscales 10, 8 and 5 frames respectively. These values are used to determine the instants the processor reads reconfiguration registers. The battery sensor is simulated by a VHDL module, where the battery level decrementation step takes into account the energy consumption per cycle of the reconfigurable regions, which are the values used in Section IV-B. Figure 13 describes the simulation scenario represented by a chart, where the x axis describes instants when different events occur. These events are related to the current battery level, the performance level required by the user and the coordination processes. The y axis describes the available

Resource occupation Semi-distributed control model Centralized control model

Slice registers Slice LUTs Slice registers Slice LUTs

2 375 (0.12%) 474 (0.31%) 290 (0.1%) 286 (0.19%)

Number of controlled regions 4 6 8 744 (0.25%) 1112 (0.37%) 1479 (0.49%) 907 (0.6%) 1388 (0.92%) 2010 (1.33%) 434 (0.14%) 576 (0.19%) 720 (0.24%) 545 (0.36%) 736 (0.49%) 964 (0.64%)

10 1847 (0.61%) 2499 (1.66%) 862(0.29%) 1207 (0.8%)

TABLE II: Synthesis details of the semi-distributed and centralized control models battery level 100%.FB 1 80%.FB 75%.FB

1

2

(75%.75% +5%).FB.V2/V1 (75%.75% +5%).FB.H2/H1 75%.75%.FB.V2/V1 75%.75%.FB.H2/H1

2

3

3

3

3

2

3

0%.FB

0

simulation time

3

t1 t1 + p 1 t2 t2 + p 2 t3

performance level=1

t4

t5 t5 + p 3 t6 t6 + p 4

performance level=2

Fig. 13: Simulation scenario

battery energy with a precision of the thresholds used by the different controllers to make reconfiguration decisions. The chart points are labeled with numbers corresponding to the global configuration at different instants of the simulation. At t < t1 , the current global configuration is number 1. The user required performance level is 1. At t = t1 , the available battery energy (battery level) reaches 75% of a fully-charged battery (FB). In this case, all controllers send reconfiguration requests to the coordinator asking to move to H/V Filter mode2 as we have seen in Figure 8(a). The coordinator notifies the controllers that a coordination process is in progress. Then, it looks for the global configuration(s) that satisfy the received requests. Here, only one global configuration satisfies the requests, which corresponds to column 2 of Table I. Since the current coordination process doesn’t involve additional controllers to those that sent the requests, a reconfiguration authorization is sent to the controllers with a notification of the end of the coordination process. The decision modules of the controllers send reconfiguration commands to the reconfiguration modules asking them for loading H/V Filter mode2 as it was shown in Figure 8(c). Then, when the processor launches its read commands, it finds that the controllers require to move to H/V Filter mode2. After loading the required partial configurations, the processor notifies the controllers (loaded(mode2) = true in Figure 8(a)) so that they update the current modes of their mode-automata to be H/V Filter mode2. The whole process ends at t = t1 + p1 , by modifying the global configuration to 2. Note that p j corresponds to the coordination process number j including the time required to load partial bitstream if the reconfiguration has been authorized.

At t = t2 , the available battery is less than (75%.75%).FB.V 2/V 1. Controllers 3 and 4 send reconfiguration requests to the coordinator related to V Filter mode3. The coordinator sends then reconfiguration suggestions (moving to HFilter mode3) to controllers 1 and 2. These controllers accept the suggestions according to Figure 8(b). The coordinator authorizes then the reconfiguration to all the controllers. The whole process ends at t = t2 + p2 by modifying the global configuration to 3. At t = t3 , the battery is flat, so it moves to the charging mode. Since the regions implementing the horizontal filter consume less than the other regions, controllers 1 and 2 are the first to reach a battery threshold allowing their regions to move to mode2 if the performance level required by the user is 2. For this, we simulate the change of the user performance level to 2 at t = t4 . At t = t5 , the battery threshold is reached for Controllers 1 and 2 (battery level > (75%.75% + 5%).FB.H2/H1). In this case, controllers 1 and 2 send reconfiguration requests to the coordinator in order to move to HFilter mode2 at t = t5 . The coordinator suggests V Filter mode2 to controllers 3 and 4. Since the available battery doesn’t allow to move to V Filter mode2 (battery level < (75%.75% + 5%).FB.V 2/V 1), controllers 3 and 4 send a refusal to the coordinator. The coordination process ends at t = t5 + p3 with a reconfiguration refusal sent by the coordinator to controllers 1 and 2. Those controllers will not send requests to move to mode2 anymore because they will be refused. They can wait until the coordinator sends them reconfiguration suggestions to move to mode2. At t = t6 , there is enough battery to move to V Filter mode2. Controllers 3 and 4 send reconfiguration requests to the coordinator. The coordinator sends suggestions to controllers 1 and 2 to move to HFilter mode2. These controllers accept, and the whole process ends at t = t6 + p4 by modifying the global configuration to 2. B. Design reusability and scalability In order to evaluate the efficiency of our control model compared to the centralized one in terms of design reusability and scalability, we designed the semi-distributed and centralized control models for different numbers of controlled regions (up to n = 10 regions, where n/2 regions implement the horizontal filter and the rest the vertical filter). For the semidistributed control model, we used the high-level modeling approach to generate the control systems automatically. We started the semi-distributed model design with one controller

Resource overhead

for each type of filter. Later, we reused these controllers to compose bigger control models. As we said previously, the coordinator is not modeled using high-level modeling. So, for the coordination scalability, we modified its code manually. This modification consists only in modifying the number of coordinated controllers given as parameter to the coordinator, as well as the GC table. On the other hand, adapting the centralized controller to different numbers of regions was more complicated. The centralized controllers were implemented using one mode-automaton as explained previously. The controller was rewritten each time to adapt to the system implementation, which led to longer design phases.

VI.

CONCLUSION

In this paper, we propose a novel control design approach for FPGA-based reconfigurable systems aiming at increas-

1%

Distributed controllers' overhead in slice registers

0,5%

Distributed controllers' overhead in slice LUTs

0%

Coordinator's overhead in slice registers 2

4

6

8

10

Number of regions

Coordinator's overhead in slice LUTs

Fig. 14: Resource overhead variation of the semi-distributed model with the number of controlled regions

Power overhead (mW)

C. Resource and power overheads After simulating both control models, we synthesized them for Virtex6-xc6vlx240t in order to estimate their overheads in terms of hardware resources. Figure 14 shows that the overhead of the distributed controllers is linear with the number of reconfigurable regions, because as we explained in section V-B, the controllers are reused to move from a parallelism degree to another. The overhead of controllers is up to 0.57% of slice registers and 1.36% of slice LUTs, which is an acceptable overhead compared to the overhead of reconfigurable regions as presented in works such as [44] and [45]. The coordinator’s overhead is also linear with the number of regions. The main reason to this is that the implemented coordinator uses a point-to-point communication allowing to handle many requests/responses from and to the controllers at the same time, which increases the required resources with the number of distributed controllers. Using different communication types decreases the overhead of the coordinator at a cost of longer coordination processes. However, here also there is an overhead of the communication architecture (overhead of a bus, a NoC, etc.). Up to 10 controlled regions the overhead of the implemented version of the coordinator is acceptable (0.035% of slice registers and 0.29% of slice LUTs). Table II gives a comparison between the overhead of the semi-distributed and the centralized models. The semidistributed control model has an overhead that is almost twice the centralized model overhead for different numbers of regions. This difference of overhead is mainly due to the resources required for the coordination between controllers as well as to the higher modularity of the semi-distributed control. However, the overhead of the semi-distributed model (up to 2499 slice LUTs) is quite acceptable for large FPGAs such as Virtex-6 used here, and the even larger Virtex-7 [46]. The power overhead of the semi-distributed model is also almost twice the centralized model overhead for different numbers of regions for a frequency of 100MHz as shows Figure 15. Up to 10 regions this overhead does not exceed 5mW, which is considered as an acceptable consumption compared to the consumption of reconfigurable regions.

1,5%

5 4,5 4 3,5 3 2,5 2

Power overhead of the semi-distributed model

1,5 1 0,5 0

Power overhead of the centralized model

2

4 6 8 Number of regions

10

Fig. 15: Power overhead of both semi-distributed and centralized control models

ing design productivity. This approach is based on a semidistributed control model. This control model is well adapted to both single-FPGA and multi-FPGA systems. This control model is composed of distributed autonomous modular controllers controlling each the runtime adaptivity of a region of the system, and a coordinator. Each controller handles three tasks: monitoring, decision-making and reconfiguration. The role of the coordinator is to coordinate the decision of the distributed controllers in order to respect the global system constraints. The decision-making design is based on the modeautomata formalism allowing to decrease the its complexity and to facilitate its reuse. For a higher design productivity, we propose a high-level modeling approach allowing to model RSoC integrating the semi-distributed control. This approach is applied to the SoC design framework Gaspard2 based on MDE and the MARTE standard, which allows to make lowlevel technical details transparent to designers and to automate code generation from high-level models. The generated semidistributed control systems code was used to validate the proposed control model and showed that it is more flexible, reusable and scalable than the centralized one, at the cost of a slight increase in required hardware resources. As future works, we plan to integrate our control model in a whole reconfigurable system in order to evaluate more its efficiency through physical implementations. Our high-level approach will be also extended in order to generate controlled FPGAbased RSoC from high-level models automatically in order to maximize design productivity.

R EFERENCES [1] P. Lysaght, B. Blodget, J. Mason, J. Young, and B. Bridgford, “Invited paper: Enhanced architectures, design methodologies and cad tools for dynamic reconfiguration of xilinx fpgas,” in FPL, 2006, pp. 1–6. [2] Object Management Group (OMG), “Portal of the model driven engineering community,” http://www.planetmde.org. [3] ——, UML Infrastructure Specification, v2.1.2, http://www.omg.org/spec/UML/2.1.2/Infrastructure/PDF/, 2007. [4] ——, “Uml profile for modeling and analysis of real-time and embedded systems, version 1.1,” 2011, http://www.omg.org/spec/MARTE/1.1/. [5] R. Koch, T. Pionteck, C. Albrecht, and E. Maehle, “A lightweight framework for runtime reconfigurable system prototyping,” in IEEE International Workshop on Rapid System Prototyping. Los Alamitos, CA, USA: IEEE Computer Society, 2007, pp. 61–64. [6] B. Blodget, S. McMillan, and P. Lysaght, “A lightweight approach for embedded reconfiguration of fpgas,” in Proceedings of the conference on Design, Automation and Test in Europe - Volume 1, ser. DATE ’03, Washington, DC, USA, 2003, pp. 399–400. [7] M. Hubner, C. Schuck, and J. Becker, “Elementary block based 2dimensional dynamic and partial reconfiguration for virtex-ii fpgas,” in International Parallel and Distributed Processing Symposium, ser. IPDPS’2006, 2006, pp. 196–196. [8] E. Carvalho, N. Calazans, F. Moraes, and D. Mesquita, “Reconfiguration control for dynamically reconfigurable systems,” in DCI’04, 2004, pp. 405–410. [9] K. Bazargan, R. Kastner, and M. Sarrafzadeh, “Fast template placement for reconfigurable computing systems,” IEEE Design Test of Computers, vol. 17, pp. 68–83, January 2000. [10] L. Levinson, R. Manner, M. Sessler, and H. Simmler, “Preemptive multitasking on fpgas,” in Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines, ser. FCCM ’00, Washington, DC, USA, 2000, pp. 301–302. [11] T. Marescaux, A. Bartic, D. Verkest, S. Vernalde, and R. Lauwereins, “Interconnection networks enable fine-grain dynamic multi-tasking on fpgas,” in Proceedings of the 12th International Conference on FieldProgrammable Logic and Applications, ser. FPL ’02, London, UK, 2002, pp. 795–805. [12] N. Bergmann, J. Williams, and P. Waldeck, “Egret: A flexible platform for real-time reconfigurable system-on-chip,” in Proceedings of International Conference on Engineering Reconfigurable Systems and Algorithms, 2003, pp. 300–303. [13] M. D. Santambrogio, V. Rana, and D. Sciuto, “Operating system support for online partial dynamic reconfiguration management,” in International Conference on Field Programmable Logic and Applications, 2008, pp. 455–458. [14] V. Rana, M. Santambrogio, D. Sciuto, B. Kettelhoit, M. Koester, M. Porrmann, and U. Ruckert, “Partial dynamic reconfiguration in a multifpga clustered architecture based on linux,” in Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, 2007, pp. 1 –8. [15] P. Merino, J. C. Lopez, and M. F. Jaco, “A hardware operating system for dynamic reconfiguration of fpgas,” in Proceedings of the 8th International Workshop on Field-Programmable Logic and Applications, From FPGAs to Computing Paradigm, London, UK, 1998, pp. 431–435. [16] K. Danne, “Operating systems for fpga based computers and their memory management,” in Organic and Pervasive Computing Workshop Proceedings, ser. ARCS 2004, 2004, p. 41. [17] S. Toscher, T. Reinemann, and R. Kasper, “An adaptive fpga-based mechatronic control system supporting partial reconfiguration of controller functionalities,” in Proceedings of the first NASA/ESA conference on Adaptive Hardware and Systems, ser. AHS ’06, Washington, DC, USA, 2006, pp. 225–228. [18] C. Schuck, B. Haetzer, and J. Becker, “An interface for a decentralized 2d reconfiguration on xilinx virtex-fpgas for organic computing,” International Journal of Reconfigurable Computing, vol. 2009, January 2009. [19] J.-M. Philippe, B. Tain, and C. Gamrat, “A self-reconfigurable fpgabased platform for prototyping future pervasive systems,” in Evolvable Systems: From Biology to Hardware, ser. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2010, vol. 6274, pp. 262–273. [20] S. Muhlbach and A. Koch, “A dynamically reconfigured multi-fpga network platform for high-speed malware collection,” International Journal of Reconfigurable Computing, vol. 2012, January 2012.

[21] X. Y. Niu, K. H. Tsoi, and W. Luk, “Reconfiguring distributed applications in fpga accelerated cluster with wireless networking,” in International Conference on Field Programmable Logic and Applications (FPL), 2011, pp. 545 –550. [22] A. Akoglu, A. Sreeramareddy, and J. G. Josiah, “Fpga based distributed self healing architecture for reusable systems,” Cluster Computing, vol. 12, pp. 269–284, September 2009. [23] V. Vyatkin, M. Hirsch, and H.-M. Hanisch, “Systematic design and implementation of distributed controllers in industrial automation,” in IEEE Conference on Emerging Technologies and Factory Automation, ser. ETFA ’06, 2006, pp. 633 –640. [24] V. Douros, G. Polyzos, and S. Toumpis, “Negotiation-based distributed power control in wireless networks with autonomous nodes,” in IEEE 73rd Vehicular Technology Conference, ser. (VTC Spring), 2011, pp. 1–5. [25] C.-H. Yu, J. Werfel, and R. Nagpal, “Collective decision-making in multi-agent systems by implicit leadership,” in Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 3 - Volume 3, ser. AAMAS ’10, 2010, pp. 1189–1196. [26] K. Huang, S. Srivastava, and D. Cartes, “Decentralized reconfiguration for power systems using multi agent system,” in Annual IEEE Systems Conference Waikiki Beach, 2007, pp. 1–6. [27] M. Khalgui and H. M. Hanisch, “Reconfiguration protocol for multiagent control software architectures,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 41, no. 1, pp. 70–80, 2011. [28] A.Deshmukh, F.Ponci, A.Monti, L.Cristaldi, R.Ottoboni, M.Riva, and M.Lazzaroni, “Multi agent systems: An example of dynamic reconfiguration,” in Instrumentation and Measurement Technology Conference, Sorrento, ITALY, 2006, pp. 1172 –1177. [29] M. Dias and A. Stentz, “Enhanced negotiation and opportunistic optimization for marketbased multirobot coordination,” Robotics Institute, Carnegie Mellon University, Pittsburgh (Pennsylvania), Tech. Rep. CMU-RI-TR-02-18, 2002. [30] D. Puschini, F. Clermidy, P. Benoit, and G. S. L. Torres, “Temperatureaware distributed run-time optimization on mp-soc using game theory,” in IEEE Computer society annual Symposium on VLSI, ser. ISVLSI ’08, Montpellier, France, 2008, pp. 375 –380. [31] ——, “Adaptive energy-aware latency-constrained dvfs policy for mpsoc,” in IEEE International SOC Conference, ser. SOCC 2009, Belfast, Northern Ireland, UK, 2009, pp. 89–92. [32] F. Maraninchi and Y. Remond, “Mode-automata: a new domain-specific construct for the development of safe critical systems,” Science of Computer Programming, no. 46, pp. 219–254, 2003. [33] E. Borde, G. Haik, and L. Pautet, “Mode-based reconfiguration of critical software component architectures,” in Design, Automation Test in Europe Conference Exhibition, ser. DATE ’09, 2009, pp. 1160 –1165. [34] S. Guillet, F. de Lamotte, E. Rutten, G. Gogniat, and J.-P. Diguet, “Modeling and formal control of partial dynamic reconfiguration,” in International Conference on Reconfigurable Computing and FPGAs, ser. ReConFig’10, 2010, pp. 31–36. [35] J. Vidal, F. de Lamotte, G. Gogniat, J.-P. Diguet, and P. Soulard, “Uml design for dynamically reconfigurable multiprocessor embedded systems,” in Proceedings of the Conference on Design, Automation and Test in Europe, ser. DATE ’10, 2010, pp. 1195–1200. [36] I. R. Quadri, S. Meftali, and J.-L. Dekeyser, “Integrating mode automata control models in soc co-design for dynamically reconfigurable fpgas,” in International Conference on Design and Architectures for Signal and Image Processing, ser. DASIP ’09, 2009, pp. 73–80. [37] O. Labbani, J.-L. Dekeyser, P. Boulet, and E. Rutten, “Introducing control in the gaspard2 data-parallel metamodel: Synchronous approach,” in Proceedings of the International Workshop MARTES: Modeling and Analysis of Real-Time and Embedded Systems, 2005. [38] I. R. Quadri, A. Gamatie, P. Boulet, and J.-L. Dekeyser, “Modeling of configurations for embedded system implementations in marte,” in 1st workshop on Model Based Engineering for Embedded Systems Design, ser. DATE ’10, 2010, pp. 11–16. [39] S. Pillement and D. Chillet, “High-level model of dynamically reconfigurable architectures,” in Conference on Design and Architectures for Signal and Image Processing, ser. DASIP ’09, 2009, pp. 81–88. [40] R. Gumzej, M. Colnaric, and W. A. Halang, “A reconfiguration pattern for distributed embedded systems,” Software and System Modeling, vol. 8, pp. 145–161, 2009.

[41] G. Delaval, H. Marchand, and E. Rutten, “Contracts for modular discrete controller synthesis,” in Proceedings of the ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, ser. LCTES ’10, 2010. [42] A. Gamatie, S. L. Beux, E. Piel, R. B. Atitallah, A. Etien, P. Marquet, and J.-L. Dekeyser, “A model driven design framework for massively parallel embedded systems,” ACM Transactions on Embedded Computing Systems (TECS), vol. 10, no. 4, pp. 1–36, 2011. [43] Xilinx, CoreConnect Architecture, http://www.xilinx.com/ipcenter/ processor central/coreconnect/coreconnect.htm. [44] M. Rummele-Werner, T. Perschke, L. Brauna, M. Hubner, and J. Becker, “A fpga based fast runtime reconfigurable real-time multi-objecttracker,” in IEEE International Symposium on Circuits and Systems (ISCAS), 2011, pp. 853–856. [45] J. Huang, M. Parris, J. Lee, and R. F. DeMara, “Scalable fpga-based architecture for dct computation using dynamic partial reconfiguration,” ACM Transactions on Embedded Computing Systems, vol. 9, no. 1, pp. 1–18, 2008. [46] Xilinx, “7 series fpgas overview, advance product specification, Tech. Rep. DS180 (v1.8), 2011.

Chiraz Trabelsi received the Engineer’s degree in software engineering from the National Institute of Applied Science and Technology, Tunis, Tunisia, in 2009. She is currently a third-year Ph.D student in computer science at Université Lille1, Lille, France. She is a member of the DaRT project at Institut National de Recherche en Informatique et Automatique (INRIA Lille-Nord Europe), France. Her research interests include reconfigurable systems, embedded systems and high-level modeling.

Professor Samy Meftali received both his Master of Science (1999) and his Ph.D. in Comoratory, Université de Lille 1 where he worked on heterogeneous multi language simulation platforms for embedded SoCs. He became associate professor at Université de Lille 1 in 2004. He worked then in LIFL laboratory and INRIA DaRT project. His research interests are mainly modelling, simulating and implementing massively parallel systems on partially and dynamically reconfigurable FPGAs. He obtained l'Habilitation à Diriger des Recherche (HDR) in 2010. He is currently leading several French and international research programs as ANR Famous and INRIA Med3+3, AUF. Professor Jean-Luc Dekeyser received his PhD degree in computer science from the university of Lille in 1986. He worked on high performance computing at Institute in Florida State University, then he joined in 1988 the University of Lille in France as an assistant professor. There he created a research group working on High Performance Computing in the CNRS lab. He is currently Professor in computer science at University of Lille and heading the DaRT Inria project. His research interests include embedded systems, System on Chip co-design, synthesis and simulation, performance evaluation, high performance computing, model driven engineering, dynamic reconfiguration, Chip-3D.