Microelectronics Journal 45 (2014) 311–324
Microelectronics Journal journal homepage: www.elsevier.com/locate/mejo
Timing characterization and constraining tool
Piotr Amrozik*, Andrzej Napieralski
Department of Microelectronics and Computer Science, Lodz University of Technology, Wólczańska 221/223, 90-924 Łódź, Poland
Article info
Article history: Received 7 February 2013; received in revised form 19 November 2013; accepted 27 November 2013; available online 15 December 2013
Keywords: Design automation; Design methodology; Integrated circuit synthesis; Digital integrated circuits; Reconfigurable logic
Abstract
This paper presents the Timing Characterization and constraining Tool (TCT), which facilitates the design of modular reconfigurable Integrated Circuits (ICs) by supporting early constraint-based design space exploration and timing constraining. These steps of the design methodology are crucial for the quality of results and are not directly addressed by the synthesis tools used nowadays. Although the idea of TCT is presented here using one of the currently available logic synthesis tools as an example, it can easily be adapted to other ones. Such flexibility increases the usability of TCT and makes it very helpful for scientists who look for new integrated architectures that utilize dynamically reconfigurable resources.
© 2013 Elsevier Ltd. All rights reserved.
1. Introduction
Unique features of Dynamically Reconfigurable (DR) circuits give an opportunity to optimize the performance (execution time of complex tasks and power dissipation) of integrated computational architectures thanks to better utilization of the available hardware resources. For that reason, integrated computational systems based on DR hardware are a very attractive field of study for researchers [1]. Many scientific publications show that such systems have great potential to compete with or even replace general-purpose processors in diverse implementations [2–11]. Most of the above-referenced systems employ off-the-shelf modular DR ICs, commonly known as Field-Programmable Gate Arrays (FPGAs). It is reported that non-conventional use of these devices requires an effective solution to their reconfiguration time bottleneck. This goal can be achieved by utilizing sophisticated reconfiguration techniques, like multi-context dynamic reconfiguration [12], and different granularities of reconfigurable matrices. The problem is that devices equipped with such advanced reconfiguration solutions and diverse granularities have never been available on the market. Therefore, researchers who work on new computational architectures based on DR hardware have to look for possibilities to realize their ideas as custom ICs [1,10,13]. Ubichip [13] is an example of a modular DR IC developed for special scientific purposes. It was designed and fabricated for the PERPLEXUS project carried out under the European Commission's Sixth Framework Programme [14].
During the implementation of Ubichip, the design team was forced to develop its own design flow, which employs a hierarchical approach along with cloning techniques [15]. This approach solves the critical problems related to Static Timing Analysis (STA) [16], and it enabled the designers to bring the project to a successful end. However, one of the specific features of the developed flow is that it makes the design very sensitive to the quality of the floorplan. This is a typical problem of hierarchical designs [17,18]. In the case of FPGA-like architectures, most of the chip floorplan is composed of a matrix of identical Reconfigurable Modules (RMs). Since RMs constitute partitions to be cloned in the chip layout [15], the size and shape of these logic blocks significantly affect the design: its timing, size and power consumption. Therefore, the designers need something more than general early chip estimations in such cases. They have to explore the design space of RMs before starting the floorplan preparation in order to carry out the whole process effectively. The early constraint-based design space exploration of a logic block is an iterative process, which consists of determining design space points by manipulating timing constraints [16,19]. Each point is composed of three values: worst path delay, silicon area utilization and power consumption of the circuit. The result of this process is a set of design space points with the timing constraints correlated to them. Determining the worst path delay (timing characterization) of an RM requires analysis of path delays in relation to each of its configuration modes. It can be realized with any kind of simulation that allows collecting data on module timings, for example with back-annotated functional simulations or STA. The first approach seems to be a natural candidate for this purpose.
* Corresponding author.
0026-2692/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.mejo.2013.11.014
P. Amrozik, A. Napieralski / Microelectronics Journal 45 (2014) 311–324
However, it requires additional effort from designers (e.g. preparation of appropriate testbenches) and has to be done using dedicated software, outside of the synthesis tools. Therefore, back-annotated functional simulations are not suitable for early constraint-based design space exploration. Fortunately, STA, which is used by every synthesis tool, can be employed for this purpose as well. Such an approach gives a significant advantage, since it allows the timing characterization of a logic block to be performed during its synthesis. Consequently, the design space of the module can be explored in one logic synthesis session. Timing characterization of an RM with STA requires detailed analysis of the reports produced by STA, which is a rather laborious task. Moreover, the architecture of reconfigurable logic blocks causes serious problems for the STA engine [15]. These obstacles can be overcome by appropriate timing constraining of the circuit. Therefore, the timing constraining of RMs for their timing characterization with STA is very important and requires careful consideration; it is discussed in detail further in this paper. In conclusion, early constraint-based design space exploration and timing constraining of RMs are crucial steps of the logic synthesis stage of modular reconfigurable IC design. These steps of the methodology are very time-consuming and quite difficult even for experienced designers. Unfortunately, up to now, there have not been any Electronic Design Automation (EDA) tools that support designers in this field. The constant interest of researchers in integrated DR systems and the authors' experience gained during the design of Ubichip became the motivation for starting work on the Timing Characterization and constraining Tool (TCT), a tool intended to make designers' lives easier and to encourage researchers to implement custom modular DR ICs. In the following part of this paper, the idea of TCT is presented.
In Section 2, the general assumptions regarding the tool are listed and discussed; this section is an introduction to the description of TCT. Section 3 addresses the timing constraining issues of RMs. It presents single-mode and multi-mode timing constraining techniques and their influence on the results produced by STA from the perspective of RM timing characterization. Section 4 contains a detailed description of TCT and its influence on the design methodology of modular reconfigurable ICs. In Section 5, the results of TCT verification are presented and discussed. Finally, the conclusions are given.
2. TCT – general assumptions
Taking into consideration the problems related to the modular reconfigurable IC design methodology and the state-of-the-art EDA tools, the authors assumed that TCT should:
1. combine logic synthesis with design space exploration of an RM;
2. not be a stand-alone application, but extend the capabilities of existing logic synthesis tools by cooperating with them to cover the early stage of the design methodology;
3. operate relying only on data carried by standard technology libraries and input data provided by designers;
4. automatically generate a set of timing constraints in a commonly used industrial format, such as Synopsys Design Constraints (SDC) [19], for each determined design space point;
5. automatically explore the design space by manipulating the timing constraints;
6. provide reports on the design space exploration results, including timing characterization;
7. generate results in a reasonable period of time.
Logic synthesis tools commonly used today, like CADENCE Encounter RTL Compiler [20], provide commands giving access to the design database created during a synthesis session. Therefore, developing a new logic synthesis tool to enable constraint-based design space exploration of RMs is not necessary. It is better to implement a custom tool based on the Tool Command Language (TCL) that utilizes access to the design database and in this manner supports reconfigurable circuit design (as stated in points 1 and 2). Regarding the 3rd point, the input data for the tool should be simpler and easier to prepare than a set of timing constraints in, e.g., the SDC standard. Thus, the tool should rely on the designer's knowledge of the RM architecture and not require detailed knowledge of timing constraints. Timing constraints have a direct impact on synthesis results [16], so the designer must be sure of their reliability. Furthermore, the timing constraints have to be set using an appropriate technique which enables STA to be exploited for early constraint-based design space exploration (particularly timing characterization). To perform this task, the tool should manage the timing constraints automatically. Moreover, it should save them for each designated design space point in a commonly used format (according to points 4 and 5). The SDC format is most suitable, since it is a well-known industrial standard supported by the IC EDA tools of most vendors. According to point 6, the results of the design space exploration should be reported in a format which is human-readable and can be easily converted into graphs. This is important since TCT is intended to give the designer a clear view of the design from the perspective of a floorplan. The tool should also generate the results in reasonable time. Manually performed design space exploration can take up to a few person-months (depending on the design team's experience and the complexity of the design).
Therefore, shortening the time needed for this operation to several hours would be a great achievement. Moreover, if the results can be produced quite fast and the input file is not complicated to prepare, the tool can be used for early predictions, which simplifies the choice of a fabrication process and facilitates fulfillment of the design specification.
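Points 4 to 6 can be illustrated with a minimal Python sketch of how one design space point (worst path delay, area, power and the SDC files that produced it) might be recorded and summarized in a CSV report that is both human-readable and easy to plot. The class and field names, the units and the CSV layout are assumptions for illustration only, not TCT's actual report format; TCT itself is TCL-based and runs inside the synthesis tool.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DesignPoint:
    """One design space point: the three values named in the text, plus the
    per-mode SDC files that produced it (field names are hypothetical)."""
    index: int
    worst_delay_ns: float
    area_um2: float
    power_mw: float
    sdc_files: Dict[str, str] = field(default_factory=dict)  # mode -> SDC path


def write_summary(points: List[DesignPoint], stream) -> None:
    """Write one CSV row per design point, readable by a human and easy to
    load into any spreadsheet or plotting tool."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["point", "worst_delay_ns", "area_um2", "power_mw"])
    for p in points:
        writer.writerow([p.index, p.worst_delay_ns, p.area_um2, p.power_mw])


# Dummy values for illustration only (not results from the paper).
buf = io.StringIO()
write_summary([DesignPoint(0, 4.2, 1520.0, 0.85, {"mode1": "point0_mode1.sdc"}),
               DesignPoint(1, 3.9, 1610.5, 0.91)], buf)
print(buf.getvalue())
```

Keeping the SDC file names with each point mirrors point 4: every row of the summary can be traced back to the constraint set that produced it.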
3. Constraining methods of RMs
To start the discussion, let us analyze the simple RM depicted in Fig. 1. It can be considered a typical reconfigurable logic block, similar in its structure to those used in available FPGA devices. This simplification is introduced to facilitate the discussion by limiting the number of timing paths to be taken into account; it does not omit any issue that is important from the perspective of timing constraining. The considered RM has two input ports, IN and MODE, and one output port called OUT. From the point of view of module behavior, the IN and OUT ports belong to data-flow paths. Through the MODE port, the configuration bit can be stored in the module configuration memory, represented in this case by an ordinary Flip-Flop (FF) called FFc. The configuration mechanism of this RM (that is, the manner in which FFc is enabled to store the bit) can also be omitted here, because it is only additional logic which has no influence on the setting of timing constraints. The example RM can be configured to operate in one of two sequential modes. In the first mode, the MODE signal is logic-0 and the data-flow path is as follows: from port IN through combinational logic 1, the first MUX, FFd, combinational logic 2 and the second MUX to output port OUT. In the second mode, the MODE signal is logic-1 and the data-flow path is as follows: from input port IN through combinational logic 3, the first MUX, FFd, combinational logic 4 and the second MUX to output port OUT. It is also assumed that combinational logic 1 and 2 include more logic levels than 3 and 4 (marked by the different sizes of the blocks in the figure). As a consequence, the first mode is slower (introduces bigger timing delays) than the second one.
Fig. 1. Example of a Reconfigurable Module (RM).
3.1. Single-mode technique
Suppose that the RM is constrained in a typical manner, that is, the clock declaration is made and input and output delays are defined. STA will then recognize the timing paths A to G, as illustrated in Fig. 2. As mentioned earlier, from the perspective of the data flow during normal operation of the module, the only true paths are A and B, or C and D, depending on the state of FFc (logic-0 or logic-1, respectively). All remaining paths are functionally false. However, STA takes into account all the paths of the circuit, regardless of whether they are functionally or logically false. This scenario induces three main problems. Firstly, the designer does not differentiate operating modes by means of timing constraints. As a result, the timing specification of the RM cannot be directly reflected by timing constraints.
Moreover, since STA does not take module functionality into account, it has to work around the combinational loops, which are likely to exist in the RM [15]. Secondly, the designer cannot force the logic optimization tools to put the same effort into all parts of interest of the circuit. Logic optimization tools use STA to determine the paths (parts of logic) which should be optimized, e.g. with respect to logic delay. Bearing in mind that the first mode utilizes most of the module resources, it is clear that the worst path belongs to path group A or B, so paths from these groups will mainly be optimized. As a result, the logic related to the remaining part of the circuit would not be optimized with the same effort as the logic having the worst path delays. The direct consequence of such a scenario is that the second mode and the configuration process would not be as fast as they could be. Thirdly, the timing reports produced by STA, although complete, do not provide direct information for the timing characterization of the module. That is, the designer does not obtain information on the operating speed of the circuit in a given operating mode. Theoretically, such information can be obtained, but it would require deep analysis of these reports. Nevertheless, since these reports typically include thousands of paths, such an attempt is totally impractical. Because the timing reports are too general, STA-based timing characterization of the module is almost impossible to perform.
It must be mentioned here that the second and third problems can be reduced with cost groups. Cost groups are introduced, e.g., in CADENCE RTL Compiler, to address problems with the distribution of optimization effort across the logic. Cost groups make it possible to collect selected paths and make the optimization engine put the same (or a directly specified) effort into every defined cost group. They also allow producing timing reports which take into account only the paths from a given cost group. The cost group approach requires setting additional constraints on selected path groups. Because typical RMs are very complex and many paths are shared among the operating modes, selecting the paths manually is too error-prone. Therefore, it is not an effective solution for the considered problems. To conclude the discussion of single-mode timing constraining, it is worth emphasizing that, although timing information can still be extracted from timing reports by some additional manipulations, the loss in constraint handling and especially in logic optimization cannot be neglected.
Fig. 2. Example of a Reconfigurable Module (RM) – single-mode constraints.
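As a rough illustration of grouping paths by type, which is how TCT later forms its cost groups automatically (Section 4.2) instead of hand-picking paths, the following Python sketch assigns each path to I2C, C2C, C2O or I2O from the kind of its startpoint and endpoint and keeps the slowest path per group. The path tuples and delays are dummy values for paths A to D of Fig. 1, not data from the paper. In this toy case the I2C report is still dominated by first-mode path A, so second-mode path C stays hidden under single-mode constraints.

```python
from typing import Dict, List, Tuple

# (startpoint kind, endpoint kind) -> cost group name used in Section 4.2.
# "port" = module port; "reg" = register (clock pin as startpoint,
# data pin as endpoint).
GROUPS = {
    ("port", "reg"): "I2C",
    ("reg", "reg"): "C2C",
    ("reg", "port"): "C2O",
    ("port", "port"): "I2O",
}


def worst_per_group(paths: List[Tuple[str, str, str, float]]
                    ) -> Dict[str, Tuple[str, float]]:
    """Return, for every cost group present, its slowest path and delay.

    Each path is given as (name, start_kind, end_kind, delay)."""
    worst: Dict[str, Tuple[str, float]] = {}
    for name, start, end, delay in paths:
        group = GROUPS[(start, end)]
        if group not in worst or delay > worst[group][1]:
            worst[group] = (name, delay)
    return worst


# Dummy delays (ns): A and C end at FFd, B and D start at FFd.
paths = [("A", "port", "reg", 2.0), ("B", "reg", "port", 1.5),
         ("C", "port", "reg", 1.0), ("D", "reg", "port", 0.8)]
print(worst_per_group(paths))  # {'I2C': ('A', 2.0), 'C2O': ('B', 1.5)}
```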
3.2. Multi-mode technique
Suppose now that the RM is constrained in a multi-mode manner, that is, with the set_mode SDC command, and that every operating mode is represented by its own clock and its own input and output delay definitions. Moreover, the FFc output pin has been assigned, by the set_case_analysis command, a constant according to its actual value: logic-0 for the first mode and logic-1 for the second. This constant propagates through the logic, excluding undesired timing paths. The result of constraining the example circuit in this manner is illustrated in Fig. 3. STA now considers paths A, B and E for the first mode, and C, D and E for the second one. Path E, related to the configuration logic, is still active and must be disabled. This can be achieved with the set_disable_timing command, which disables the timing arcs of FFc, eliminating the endpoint of path E. After these operations, STA processes paths A and B for the first mode, and C and D for the second mode, which is in accordance with the actual data flow in the circuit in a given mode.
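The effect of the two commands can be mimicked with a small Python model of the paths of Fig. 2. Since the figure is not reproduced here, the sketch assumes that A and C end at FFd, B and D start at FFd, E ends at the FFc data input, and F and G are the configuration paths launched by FFc toward FFd and OUT. A constant on the FFc output blocks the paths of the other mode and the FFc-launched paths, while disabling the FFc arcs removes E. This is a toy model of the reasoning above, not an STA engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimingPath:
    name: str
    needs_ffc: Optional[int] = None  # FFc value that selects this path's MUX input
    from_ffc: bool = False           # path is launched by the FFc output
    to_ffc: bool = False             # path ends at the FFc data input


# Assumed labelling of the path groups of Fig. 2.
PATHS = [
    TimingPath("A", needs_ffc=0),    # IN -> logic 1 -> MUX -> FFd
    TimingPath("B", needs_ffc=0),    # FFd -> logic 2 -> MUX -> OUT
    TimingPath("C", needs_ffc=1),    # IN -> logic 3 -> MUX -> FFd
    TimingPath("D", needs_ffc=1),    # FFd -> logic 4 -> MUX -> OUT
    TimingPath("E", to_ffc=True),    # MODE -> FFc (configuration logic)
    TimingPath("F", from_ffc=True),  # FFc -> MUX select -> FFd
    TimingPath("G", from_ffc=True),  # FFc -> MUX select -> OUT
]


def analyzed_paths(case_value: Optional[int] = None,
                   disable_ffc: bool = False) -> List[str]:
    """Names of the paths that remain timed.

    case_value=None: single-mode constraints, no constant on FFc.
    case_value=0/1: set_case_analysis on the FFc output.
    disable_ffc=True: set_disable_timing on FFc.
    """
    timed = []
    for p in PATHS:
        if case_value is not None:
            if p.from_ffc:
                continue  # a constant launches no transitions
            if p.needs_ffc is not None and p.needs_ffc != case_value:
                continue  # MUX input not selected by the constant
        if disable_ffc and (p.from_ffc or p.to_ffc):
            continue      # FFc arcs removed: no start/endpoint at FFc
        timed.append(p.name)
    return timed


print(analyzed_paths())                                # all seven paths
print(analyzed_paths(case_value=0))                    # ['A', 'B', 'E']
print(analyzed_paths(case_value=0, disable_ffc=True))  # ['A', 'B']
print(analyzed_paths(case_value=1, disable_ffc=True))  # ['C', 'D']
```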
Fig. 3. Example of a Reconfigurable Module (RM) – multi-mode constraints.
The presented multi-mode constraining technique resolves all three problems described in the previous section. It gives the designer precise control over timing constraints. Each of the actual operating modes of the module can be reflected by a dedicated constraint set. Therefore, the timing specification of the module can be directly converted into an equivalent set of timing constraints. Moreover, the combinational loops created by the logic controlling the operating modes of the module are eliminated in a natural way. The problem regarding the distribution of logic optimization effort is solved automatically. In the multi-mode analysis case, the logic optimization tool analyzes the circuit separately for each mode. Therefore, the same effort is put into the logic used by every mode individually. The problem with timing reports is also solved. Since STA generates reports separately for every operating mode, the designer gets direct information on the timings of each mode. This makes the timing characterization of modules much easier. It must be stressed that the preparation of multi-mode timing constraints requires more effort than in the case of the single-mode technique. It also requires good knowledge of the circuit architecture from the designer. This can be regarded as a drawback of this approach. There is one more matter requiring discussion, namely the configuration process. It is a very important aspect of reconfigurable circuits [1,12]. Since the number of configuration bits of an RM is often large, the designer must pay special attention to the reconfiguration time. Unfortunately, in the case of modules which do not implement special configuration mechanisms (e.g. an output port disabling mechanism), the configuration process involves all the resources. This can be illustrated by the example of an RM depicted in Fig. 2 (see paths F and G).
Every change of the FFc state causes signal propagation to all of the endpoints: to the input pin of FFd and to the output port. Another issue is path E, which is disabled in the multi-mode constraining approach. Since it is not considered by STA and the optimization tools, the designer has to take care of its timing too. Therefore, the configuration process involves almost every path of the circuit. To ensure that all timings are met during the configuration process, almost all of the timing paths must be taken into consideration. This can be done, e.g., by declaring a specific mode devoted to the configuration process. This special mode must be constrained in the single-mode manner in order to cover all of the paths of the circuit. In this way, the special mode will be automatically characterized and the actual reconfiguration time of the circuit can be determined.
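A per-mode constraint set for the RM of Fig. 1, plus the single-mode-style configuration mode just described, could be generated as plain SDC text, as in the Python sketch below. It emits one SDC text per mode rather than using set_mode, since the way modes are declared differs between tools; the port, pin and cell names (clk, IN, MODE, OUT, FFc, FFc/Q) and the numeric values are hypothetical. create_clock, set_input_delay, set_output_delay, set_case_analysis and set_disable_timing are standard SDC commands. Note that in TCT the timing arcs are disabled only after technology mapping (Section 4.2).

```python
from typing import Optional


def mode_sdc(period: float, in_delay: float, out_delay: float,
             ffc_value: Optional[int] = None) -> str:
    """Emit SDC text for one constraint mode of the RM of Fig. 1.

    ffc_value=0/1: operating mode (case analysis plus disabled FFc arcs).
    ffc_value=None: configuration mode, constrained in the single-mode way
    so that every path, including E, F and G, stays timed.
    """
    lines = [
        f"create_clock -name clk -period {period} [get_ports clk]",
        f"set_input_delay -clock clk {in_delay} [get_ports {{IN MODE}}]",
        f"set_output_delay -clock clk {out_delay} [get_ports OUT]",
    ]
    if ffc_value is not None:
        # The constant on the configuration FF output prunes the other mode.
        lines.append(f"set_case_analysis {ffc_value} [get_pins FFc/Q]")
        # Removing the FFc arcs removes the endpoint of configuration path E.
        lines.append("set_disable_timing [get_cells FFc]")
    return "\n".join(lines)


# Dummy numbers; the first mode is assumed slower than the second (Section 3).
modes = {
    "mode1": mode_sdc(5.0, 1.0, 1.0, ffc_value=0),
    "mode2": mode_sdc(3.0, 1.0, 1.0, ffc_value=1),
    "config": mode_sdc(5.0, 1.0, 1.0),
}
print(modes["mode1"])
```

Generating the configuration mode from the same function keeps all three constraint sets consistent, so only the case-analysis lines differ between them.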
Fig. 4. Example of reconfigurable routing resources – disabling timing arcs by mode.
Fig. 5. Main stages of TCT operation.
3.3. Constraining of routing resources
Another important part of the typical RM architecture is reconfigurable routing resources. They are often partially included inside an RM, so their timing constraining must be discussed as well. Let us consider the example depicted in Fig. 4. As can be seen, the module switches the input signal from port IN to output port OUT1 or OUT2 according to the chosen operating mode (set via port CONF_R and FFc). This is an example of a simple switch box commonly used in reconfigurable devices [12]. During normal operation of the circuit (after configuration), there are two valid timing paths from a data-flow perspective: from port IN to port OUT1 in the first mode, and from port IN to port OUT2 in the second mode. These paths are symmetrical, so they introduce almost the same delays. Therefore, there is no point in assigning a separate constraint mode to each path in this case; it is better to analyze the circuit as a whole. This can be achieved by using the set_disable_timing command on the routing configuration FF. As a result, the timing path related to the CONF_R port is removed and the data-flow paths are analyzed simultaneously. Obviously, the removed path related to the CONF_R signal should not be left unconstrained, for the same reasons as in the case of the module from Fig. 1. Therefore, it must be taken into account by the mode intended for the configuration process.
4. TCT
In general, the TCT operation can be divided into three main stages, as illustrated in Fig. 5. At the beginning, the input data required for TCT must be prepared by the designer. Then, using the provided data, the tool performs the initial timing constraining of the design. Finally, the last stage of the flow can be started: the constraint-based design space exploration. In the following part of this section, all these stages are described in detail.
4.1. Input data
Besides the technology libraries and the elaborated RM, that is, the Register-Transfer Level (RTL) description of the RM mapped onto generic logic [21], the tool requires additional input data. First of all, during the first step of the methodology developed in [15], the neighborhood of the RM (bearing in mind that it constitutes a partition to be cloned) must be modeled by a set of environmental constraints [16,19]. This must be done by the designer using SDC commands that define the input port drivers and output port loads of the module. Another necessary input is the definition of the RM operating modes. It can be provided as a Comma-Separated Values (CSV) file. For the purpose of this paper, such data is called the “mode file”. The CSV format is suitable since it can be easily prepared by a human (using broadly available software tools) and processed by scripts. The mode file stores the names of the modes, entries that specify whether a mode is purely combinational or sequential, and a specification of the configuration registers. An example of such a file is depicted in Fig. 6. The values used in the presented table are described in the next subsection. Finally, the designer needs to provide an initial constraint set that will be used as a starting point for the design space exploration algorithm.
Fig. 6. Example of the mode file.
Fig. 7. The initial constraining stage of TCT.
4.2. Initial constraining
The flow diagram of the initial constraining stage is depicted in Fig. 7. As presented, this stage is composed of six subsequent steps. In general, during this part of the flow the timing constraints are set on an RM using the multi-mode constraining technique described in Section 3. As a result, the first design space point is determined. At the beginning, the mode file is read and the data is converted into the TCT database. Then the tool sets timing constraints according to the database entries and the initial constraint set prepared by the designer. After this process, the design is constrained for every mode, and the configuration registers that are marked with 0 or 1 in the mode file propagate the appropriate constants across the circuit. After setting the initial timing constraints, the first logic synthesis is started. Since the design has not been mapped yet, this is a complete logic synthesis with technology-independent Boolean optimization, technology mapping and technology-dependent gate optimization [22]. This full logic synthesis is required to allow disabling the timing arcs of configuration registers, since timing arcs can be removed only from mapped cells. Disabling timing arcs is performed according to the multi-mode technique. TCT disables the timing arcs corresponding to the input pin of all configuration registers that propagate 0 or 1 constants. In this way, the tool eliminates path E presented in Fig. 3. The registers marked with x are equivalent to routing configuration registers (see Fig. 4); TCT disables all of the timing arcs of these registers. Registers marked with n have their timing arcs preserved. This setting is intended for configuration registers that act as operational registers in a given mode (for example, in modes in which the LUT registers act as memory). In the next step, the cost groups are created. Paths from all these groups are optimized with the same effort (for more details see Section 3.1). This is especially important for the design space exploration process, during which the timing constraints are consecutively changed, independently for each path group. Therefore, it is important to ensure balanced logic optimization after every change of the timing constraints. One cost group is created for each path group:
I2C cost group – all input-to-register (in2reg) paths (all paths of the in2reg logic, see Fig. 9),
C2C cost group – all register-to-register (reg2reg) paths,
C2O cost group – all register-to-output (reg2out) paths,
I2O cost group – all input-to-output (in2out) paths.
At the end of the initial constraining stage, the second logic synthesis is performed. Its purpose is to create new reports which include the changes introduced to the design database; these reports are required for further TCT operation. For this reason, incremental logic synthesis is used [22]. This type of logic synthesis does not map the design again, but only performs technology-dependent optimization of the logic under the current constraint set.
4.3. Design space exploration
During this stage, the early constraint-based design space exploration with circuit timing characterization is performed. The subsequent steps of this stage are depicted in Fig. 8. In the first step, the tool saves the data of the first design space point (the netlist and SDC files) and generates timing, area and power reports. These data are based on the results of the initial constraining stage and constitute the starting point for the design space exploration. In general, the design space exploration can be divided into two phases. During the first phase, the constraining algorithm checks the current timing constraints of every constraint mode and changes them if required. This algorithm is presented in detail later in this section. At the beginning of the second phase, incremental logic synthesis is performed to optimize the circuit under the new constraints determined by the constraining algorithm. Based on the synthesis results, the next design space point is saved. However, design space points are saved only if the constraining algorithm reports a change in the timing constraints. This keeps near-identical design space points, obtained with the same timing constraint set, out of the summary report.
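The two-phase loop, together with the stopping rule stated in the following paragraph (no constraint change in three consecutive iterations), can be sketched in Python as below. The callables stand in for TCT's TCL procedures and the synthesis tool and are hypothetical; max_iterations is an added safety limit that the text does not mention.

```python
from typing import Callable, List


def explore(adjust_constraints: Callable[[], bool],
            incremental_synthesis: Callable[[], None],
            save_point: Callable[[int], None],
            patience: int = 3,
            max_iterations: int = 100) -> List[int]:
    """Run the two-phase exploration loop; return the saved point indices.

    adjust_constraints: phase 1, returns True if any mode's constraints changed.
    incremental_synthesis: phase 2, re-optimizes under the current constraints.
    save_point: stores netlist, SDC files and reports for one point.
    """
    save_point(0)                  # first point: result of initial constraining
    saved = [0]
    unchanged = 0
    iteration = 0
    while unchanged < patience and iteration < max_iterations:
        iteration += 1
        changed = adjust_constraints()
        incremental_synthesis()
        if changed:
            save_point(iteration)  # only points with new constraints are kept
            saved.append(iteration)
            unchanged = 0
        else:
            unchanged += 1         # stop after `patience` quiet iterations
    return saved


# Scripted stand-in for the constraining algorithm (dummy behavior).
script = iter([True, True, False, True] + [False] * 10)
saved = explore(lambda: next(script), lambda: None, lambda i: None)
print(saved)  # [0, 1, 2, 4]
```

With this script the loop runs seven iterations: the change at iteration 4 resets the counter, and iterations 5 to 7 bring no change, which ends the exploration.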
After the second phase is finished, the decision on continuation of the design space exploration is taken. This decision relies on the progress of the timing constraining. The design space exploration is repeated until the constraining algorithm leaves the timing constraints unchanged for three consecutive iterations. Such a situation implies that the timing constraints are so tight that the optimization tools cannot obtain a better result than the current one.
Fig. 8. Flow diagram of the constraint-based design space exploration.
Fig. 9. The example of an RM within the environment.
The constraining algorithm directly affects the design space points, the quality of timing characterization and the timing constraints set on the circuit. Because of its important role in TCT, it is now presented in detail. The constraining algorithm implements the multi-mode timing constraining technique described in Section 3.2, which enables characterization of the RM operating speed in particular constraint modes. This algorithm is complex because it must take into account two main aspects: the placement of an RM inside an undetermined environment (remembering that TCT is intended to explore the design space of a partition) and the interaction between timing constraints. Therefore, in order to better illustrate the constraining algorithm in the case of sequential modes (for combinational modes it is quite simple and does not require examples), let us consider the circuit depicted in Fig. 9. This module is composed of two FFs and some combinational logic. Suppose that the in2reg logic introduces a delay of $D_1$, the reg2reg logic a delay of $D_2$, and the reg2out logic a delay of $D_3$. Another assumption is that the considered module will be used as a partition, that the logic delays of the environment modules are not known, and that this environment is only modeled initially (by the environmental constraints).
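The three scenarios discussed next can be condensed into a small decision rule. The Python sketch below assumes that a group's worst delay equals its time budget minus its worst slack (I2C budget = clock period minus input delay, C2C budget = clock period, C2O budget = clock period minus output delay), and then picks the constraint values given in the text for the group with the largest delay. Ties are resolved in the order C2C, I2C, C2O, matching the check order of the algorithm; the positive-slack checks and the combinational modes are not modeled.

```python
from typing import Dict, Tuple


def worst_delays(period: float, in_delay: float, out_delay: float,
                 slacks: Dict[str, float]) -> Dict[str, float]:
    """Recover the worst delay of each sequential cost group from its slack.

    Assumption: delay = budget - slack, with the budgets listed below."""
    budgets = {"I2C": period - in_delay, "C2C": period,
               "C2O": period - out_delay}
    return {g: budgets[g] - slacks[g] for g in budgets}


def constraints_for(d1: float, d2: float, d3: float) -> Tuple[float, float, float]:
    """(clock period, input delay, output delay) for in2reg delay d1,
    reg2reg delay d2 and reg2out delay d3, per the three scenarios."""
    if d2 >= d1 and d2 >= d3:   # 1: reg2reg critical, all groups constrained fitly
        return d2, d2 - d1, d2 - d3
    if d1 >= d3:                # 2: in2reg critical, reg2reg left underconstrained
        return d1, 0.0, d1 - d3
    return d3, d3 - d1, 0.0     # 3: reg2out critical


# Dummy numbers: a 5 ns clock with 3 ns input and 2 ns output delay, zero slacks.
d = worst_delays(5.0, 3.0, 2.0, {"I2C": 0.0, "C2C": 0.0, "C2O": 0.0})
print(d)                                              # {'I2C': 2.0, 'C2C': 5.0, 'C2O': 3.0}
print(constraints_for(d["I2C"], d["C2C"], d["C2O"]))  # (5.0, 3.0, 2.0)
print(constraints_for(6.0, 5.0, 4.0))                 # (6.0, 0.0, 2.0)
```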
Fig. 10. Path group delays of the example of an RM for $D_2 > D_1$ and $D_2 > D_3$.
Fig. 11. Path group delays of the example of an RM for $D_1 > D_2$, $D_1 > D_3$ and $D_3 > D_2$ (inappropriate constraint set).
Fig. 12. Path group delays of the example of an RM for $D_1 > D_2$, $D_1 > D_3$ and $D_3 > D_2$ (appropriate constraint set).
Fig. 13. Path group delays of the example of an RM for $D_3 > D_1$ and $D_3 > D_2$.
Figs. 10–13 are introduced to facilitate understanding of the constraining algorithm. They present the delays of the in2reg, reg2reg and reg2out paths under various timing constraint sets. During timing constraining of the example RM, three scenarios must be considered:
1. $D_2 > D_1$ and $D_2 > D_3$: This is the case in which the reg2reg logic includes the longest delay path (called the critical path). The final operating frequency of the module depends on the worst delays of the reg2reg paths. Consequently, all constraints have to be set in accordance with the $D_2$ delay by using the set_clock command with a delay declaration of $D_2$. This command constrains each logic group to $D_2$ (see Fig. 10). Because $D_2 > D_1$ and $D_2 > D_3$, the in2reg and reg2out logic is then underconstrained.¹ Therefore, the set_input_delay command with a $(D_2 - D_1)$ declaration and the set_output_delay command with a $(D_2 - D_3)$ declaration must be used. This is necessary since the environment logic delays are not known and they can influence the final operating frequency. To reduce this risk, the in2reg and reg2out logic has to be constrained fitly.²
2. $D_1 > D_2$ and $D_1 > D_3$: In this case the in2reg logic has the longest delay (includes the critical path). Suppose now that the circuit is constrained in accordance with the reg2reg logic (by the set_clock command with a $D_2$ declaration), as presented in the previous point. This is illustrated in Fig. 11. Consequently, the in2reg logic (and, in the case of $D_3 > D_2$, the reg2out logic too) is overconstrained,³ which causes timing violations within the in2reg logic group (and, in the case of $D_3 > D_2$, within the reg2out logic group too). This is not a proper set of constraints for this circuit. Moreover, such a circuit will never be faster than determined by the $D_1$ delay (plus the delay introduced by the environment), so the reg2reg logic is unnecessarily constrained fitly: optimization of the reg2reg logic will not improve the maximal frequency of the considered module. Taking the above discussion into account, it can be deduced that the set_clock command should be used with a $D_1$ declaration, which makes the in2reg logic properly constrained (see Fig. 12). Accordingly, the input delay must be set to 0 by the set_input_delay command.⁴ Although the reg2reg logic is underconstrained, it should be left unchanged, because it is pointless to constrain it tighter, as explained earlier.
The reg2out logic is underconstrained too. In this case, however, it cannot be left unchanged, since it can influence the final operating frequency of the circuit. This can happen when the combinational logic loading the OUT port of the considered module (the in2reg env. logic in Fig. 9) has a sufficiently long delay (longer than the delay of the reg2out env. logic). As assumed, the designer cannot
¹ This means that the constraint set on this logic is looser than its delay (the slack is positive).
² This means that the constraint set on this logic is almost the same as its delay (the slack is close to 0).
³ This means that the constraint set on this logic is tighter than its delay (the slack is negative).
⁴ A zero input delay related to env. module 1 from Fig. 9 may seem unrealistic from the point of view of the complete design. However, in this case the input delay does not influence the timing characterization of the considered module. The neglected input delay can easily be taken into account by changing the clock period during top-level assembly.
determine this delay; thus, the reg2out logic should be constrained fitly by the set_output_delay command with a (D1 − D3) declaration.
3. D3 > D1 and D3 > D2: In this case the reg2out logic has the longest delay (it includes the critical path). This situation is similar to the previous point. Therefore, the set_clock command with a D3 declaration, the set_input_delay command with a (D3 − D1) declaration, and the set_output_delay command with a 0 declaration should be used.
The foregoing discussion shows that timing constraining of an RM used as a partition is not a trivial task. This is especially true in the context of timing characterization, which requires the delays of the logic groups to be taken into consideration. That is why the constraining algorithm requires data on worst slacks and current constraints (see Fig. 8). These data enable the constraining algorithm to calculate and compare the worst delays for every cost group⁵ and to decide according to which logic group the constraints should be set. If the constraint-changing algorithm has to check the constraints for a sequential mode, the flow is as follows:
1. Check whether the worst delay path belongs to the C2C group and whether the worst slack for the C2C path group is positive (see the decision phase in Fig. 14(a) and Table 1). If so, start the procedure of changing constraints and finish the constraint check for this mode. Otherwise go to point 2.
2. Check whether the worst delay path belongs to the I2C group and whether the worst slack for the I2C path group is positive (see the decision phase in Fig. 14(b)). If so, start the procedure of changing constraints and finish the constraint check for this mode. Otherwise go to point 3.
3. Check whether the worst delay path belongs to the C2O group and whether the worst slack for the C2O path group is positive (see the decision phase in Fig. 14(c)). If so, start the procedure of changing constraints and finish the constraint check for this mode. Otherwise go to point 4.
4. Check whether the worst slack for the I2C path group is positive (see decision phase 1 in Fig. 14(d)). If so, start the procedure of changing this constraint and then go to decision phase 2; otherwise skip the procedure of changing the constraint for the I2C group and go directly to decision phase 2. During decision phase 2, check whether the worst slack for the C2O path group is positive (see decision phase 2 in Fig. 14(d)). If so, start the procedure of changing this constraint; otherwise skip it. In both cases, finish the constraint check for this mode.
If the condition in point 1 is satisfied, the constraint tightening procedure is started; its flow is depicted in Fig. 14(a). This procedure can be divided into three phases: calculation of the new clock period (NCP) in phase 1, of the new input delay (NID) in phase 2, and of the new output delay (NOD) in phase 3. At the beginning of phase 1, the worst slack of the C2C path group (slack(C2C)) is compared with the constraint limit (CL). If it is greater, the NCP is obtained by decreasing the current clock period by CL. This limit prevents the clock period from being reduced too much in a single step, which would in turn reduce the total number of design space points (it can be regarded as setting a maximum constraint step). This mechanism is especially useful at the beginning of the design space exploration process, when the
⁵ The cost groups were set up so that each cost group represents one logic group; see Section 4.2.
P. Amrozik, A. Napieralski / Microelectronics Journal 45 (2014) 311–324
Fig. 14. Changing timing constraints. (a) Sequential mode, point 1. (b) Sequential mode, point 2. (c) Sequential mode, point 3. (d) Sequential mode, point 4.
Table 1
Legend for figures “Changing timing constraints”.
Abbreviation   Description
CL             Constraint limit
CP             Clock period
E2IR           External to internal max delay ratio
IOD            I2O worst delay
NCP            New clock period
NID            New input delay
NIOD           New input to output delay
NOD            New output delay
OC             Overconstraining ratio
PG_WD          Worst delay of given path group
slack(PG)      Worst slack for a given path group
WD             Worst delay of each path group
WDPG           Worst delay path group
initial timing constraints are loosely set and slacks are large. Thanks to the CL parameter, the designer can limit the progress made in a single timing constraining step. If slack(C2C) is less than or equal to CL, the NCP is set to the worst delay of the C2C path group (C2C_WD) and additionally overconstrained by a value calculated from the overconstraining ratio (OC), which is defined by the designer: the overconstraining value is the product of OC and slack(C2C). In this way the designer can decide how much the constraints should be tightened, relative to their current value, during the design space exploration.
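Phase 1 therefore reduces to a two-branch rule. The following is a minimal Python sketch of that rule as described in the text, not the TCT source (which is written in Tcl); the function and parameter names are hypothetical, all values are in nanoseconds, the numbers in the usage lines are illustrative only, and it assumes that "decreased by CL" means the current clock period minus CL.

```python
def new_clock_period(cp: float, slack: float, worst_delay: float,
                     cl: float, oc: float) -> float:
    """Phase 1 (hypothetical sketch): compute the new clock period (NCP).

    cp          -- current clock period constraint (CP), ns
    slack       -- worst slack of the driving path group, e.g. slack(C2C), ns
    worst_delay -- worst delay of that path group, e.g. C2C_WD, ns
    cl          -- constraint limit (CL): largest step allowed per iteration
    oc          -- overconstraining ratio (OC), chosen by the designer
    """
    if slack > cl:
        # Loose constraints: tighten by at most CL in one iteration.
        return cp - cl
    # Small slack: snap to the worst delay, then overconstrain by OC * slack.
    return worst_delay - oc * slack


# Early iteration: 6 ns of slack exceeds CL = 2 ns, so CP drops by CL.
print(new_clock_period(cp=10.0, slack=6.0, worst_delay=4.0, cl=2.0, oc=0.5))  # 8.0
# Later iteration: 1 ns of slack <= CL, so NCP = 7.0 - 0.5 * 1.0.
print(new_clock_period(cp=8.0, slack=1.0, worst_delay=7.0, cl=2.0, oc=0.5))   # 6.5
```

Because the second branch subtracts OC × slack from the worst delay, a larger OC pushes the next design space point further beyond the delay achieved so far.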
When the new clock period is determined, the new input and output delays can be calculated (see phases 2 and 3 in Fig. 14(a)). First, the NID is calculated as the difference between the NCP and the current worst delay of the I2C path group (I2C_WD). Then it is checked whether the NID is less than zero. This can happen due to rounding errors and must be eliminated: if the condition is true, the NID is set to zero. The NID is also compared with the product of the NCP and a user-defined parameter called the external to internal max delay ratio (E2IR); if the NID is greater, it is set in accordance with this ratio. For example, if NCP = 10 ns, E2IR = 0.8 and the current I2C_WD is 1 ns, the NID is set to 0.8 × 10 ns = 8 ns and not to the initially calculated 10 ns − 1 ns = 9 ns. This mechanism gives the designer the possibility not to overconstrain the input logic if sufficient “time space” is left for the logic delay of the environment. The feature is useful when there are large delay differences among the logic groups. For instance, when the worst in2reg logic delay is several times shorter than the worst delay of the reg2reg logic, it is pointless to constrain the in2reg logic fitly, because the environment logic has enough “time space” not to influence the final frequency of the circuit. The E2IR gives the designer a mechanism to prevent such unnecessary tightening. After the NID is determined, the input delay constraint is set on the circuit. A similar procedure is performed to calculate the new output delay (compare phases 2 and 3 in Fig. 14(a)); therefore, it is not described here in detail. When the output delay is calculated, the constraining algorithm is started again for the next mode.
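Phases 2 and 3 amount to a subtraction followed by two clamps. The sketch below reproduces the worked example from the text; it is a hypothetical Python illustration with invented names (TCT itself is written in Tcl), and it assumes that the NOD uses exactly the same two clamps as the NID, since the text only states that the procedure is similar.

```python
def new_external_delay(ncp: float, group_worst_delay: float, e2ir: float) -> float:
    """Phases 2/3 (hypothetical sketch): new input delay (NID) or output delay (NOD).

    ncp               -- new clock period from phase 1, ns
    group_worst_delay -- I2C_WD when computing NID, C2O_WD when computing NOD, ns
    e2ir              -- external to internal max delay ratio (E2IR)
    """
    delay = ncp - group_worst_delay
    if delay < 0.0:
        # Only possible through rounding errors: clamp to zero.
        delay = 0.0
    # Never reserve more than E2IR * NCP for the environment logic.
    return min(delay, e2ir * ncp)


# Worked example from the text: NCP = 10 ns, E2IR = 0.8, I2C_WD = 1 ns.
print(new_external_delay(10.0, 1.0, 0.8))  # 8.0, not 10 - 1 = 9.0
```

With E2IR close to 1 the clamp rarely applies and the input or output logic is constrained fitly; lowering E2IR leaves more “time space” to the environment.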
When the condition from point 2 of the constraining algorithm is satisfied (recall that in this case the condition from point 1 is not satisfied), the constraint tightening procedure is started in accordance with the worst delay of the I2C path group (see phases 1, 2 and 3 in Fig. 14(b)). Phase 1 of this procedure is similar to phase 1 of the previous point (compare Fig. 14(a) and (b)). Since in this point the timing constraints are set in accordance with I2C_WD, the new input delay has to be set to zero. The output delay is calculated by the same procedure as in point 1.
When the condition from point 3 is satisfied (again, this means that the conditions from points 1 and 2 are not met), the constraint tightening procedure is started in accordance with the worst delay of the C2O group (see Fig. 14(c)). The flow of changing constraints in this case is similar to the previous point, but this time the new output delay is set to 0, and the input delay is calculated as in point 1.
Point 4 covers the scenario in which the current clock period constraint is not yet fulfilled, but the worst slacks of the input and/or output logic are positive. In such a case, only the constraints for the I2C and/or C2O cost groups need to be tightened. As illustrated in Fig. 14(d), this procedure is composed of two decision phases and two constraining phases. During each decision phase, the worst slack of the given path group is checked for being positive; if it is, the corresponding constraining phase is started. The constraining phases are similar to phases 2 and 3 of point 1. The only difference is that the NID and NOD are not checked for being less than zero, since such a case is not possible. When this procedure is completed, the constraining algorithm is started to check the constraints for another mode.
The constraining procedure is much simpler for combinational modes, as illustrated in Fig. 15.
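Before turning to the combinational case, the whole sequential-mode flow (points 1 to 4) can be summarized in one function. This is a minimal Python sketch of one reading of the text and Fig. 14, not TCT's Tcl code; the function name, the dictionary layout of worst delays and slacks, and the assumption that point 4 uses the current clock period in place of the NCP are all hypothetical. Branches 1, 2 and 3 correspond to the three cases discussed earlier, in which D2, D1 and D3, respectively, is the largest delay.

```python
from typing import Dict, Optional, Tuple

# Checking order of points 1, 2 and 3 (also used to break ties).
GROUPS = ("C2C", "I2C", "C2O")


def tighten_sequential(cp: float, wd: Dict[str, float], slack: Dict[str, float],
                       cl: float, oc: float, e2ir: float
                       ) -> Tuple[int, Optional[float], Optional[float], Optional[float]]:
    """Hypothetical sketch of the sequential-mode flow (points 1-4, Fig. 14).

    wd and slack map each path group ("I2C", "C2C", "C2O") to its worst
    delay and worst slack in ns. Returns (point, NCP, NID, NOD), where
    None means that the corresponding constraint is left unchanged.
    """
    def phase1(group: str) -> float:
        # New clock period: step by CL, or snap to WD and overconstrain by OC.
        s = slack[group]
        return cp - cl if s > cl else wd[group] - oc * s

    def external(period: float, group: str) -> float:
        # New input/output delay, clamped to the range [0, E2IR * period].
        return min(max(period - wd[group], 0.0), e2ir * period)

    # Worst delay path group (WDPG); max() returns the first maximal entry.
    wdpg = max(GROUPS, key=lambda g: wd[g])

    if wdpg == "C2C" and slack["C2C"] > 0:  # point 1: reg2reg is critical
        ncp = phase1("C2C")
        return 1, ncp, external(ncp, "I2C"), external(ncp, "C2O")
    if wdpg == "I2C" and slack["I2C"] > 0:  # point 2: in2reg is critical
        ncp = phase1("I2C")
        return 2, ncp, 0.0, external(ncp, "C2O")
    if wdpg == "C2O" and slack["C2O"] > 0:  # point 3: reg2out is critical
        ncp = phase1("C2O")
        return 3, ncp, external(ncp, "I2C"), 0.0
    # Point 4: clock not met yet; tighten only I/O groups that still have slack.
    nid = external(cp, "I2C") if slack["I2C"] > 0 else None
    nod = external(cp, "C2O") if slack["C2O"] > 0 else None
    return 4, None, nid, nod


cp = 10.0
wd = {"I2C": 2.0, "C2C": 4.0, "C2O": 3.0}
slack = {g: cp - d for g, d in wd.items()}  # assumes zero I/O delays so far
print(tighten_sequential(cp, wd, slack, cl=2.0, oc=0.5, e2ir=0.8))
# (1, 8.0, 6.0, 5.0)
```

The zero clamp is kept in point 4 as well; according to the text it can never trigger there, so it does not change the result.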
During the decision phase, the worst slack of the I2O group is checked for being positive. If it is, the constraint must be tightened, and the constraining phase is started. As can be seen in the figure, this phase is similar to phase 1 of point 1 of the flow for sequential modes (depicted in Fig. 14(a)): the slack is compared with the CL, and the constraint is overconstrained in a similar manner. After its value is determined, the set_path_delay command with the appropriate delay declaration is used. When the constraining phase is accomplished, or when slack(I2O) is negative or zero, the constraining procedure is finished.
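The combinational flow reuses the phase 1 rule on the input-to-output constraint. A minimal Python sketch follows; the parameter name for the current input-to-output constraint is hypothetical (Table 1 has no symbol for it), the sample values are illustrative, and the returned NIOD is what the text then passes to the set_path_delay command.

```python
from typing import Optional


def tighten_combinational(iod_constraint: float, iod: float, slack: float,
                          cl: float, oc: float) -> Optional[float]:
    """Hypothetical sketch of the combinational-mode flow (Fig. 15).

    iod_constraint -- current input-to-output delay constraint, ns (hypothetical name)
    iod            -- I2O worst delay (IOD), ns
    slack          -- worst slack of the I2O path group, slack(I2O), ns
    Returns the new input-to-output delay (NIOD), or None when slack <= 0.
    """
    if slack <= 0:
        # Decision phase: nothing to tighten, the procedure finishes.
        return None
    if slack > cl:
        # Same step limit as in phase 1 of the sequential flow.
        return iod_constraint - cl
    # Snap to the worst delay and overconstrain by OC * slack.
    return iod - oc * slack


# A 10 ns constraint on a 3 ns I2O path: 7 ns of slack exceeds CL = 2 ns.
print(tighten_combinational(10.0, 3.0, 7.0, cl=2.0, oc=0.5))  # 8.0
```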
4.4. TCT output data
Besides performing the design space exploration of the RM, TCT is also intended to prepare the data necessary for the later stages of the proposed methodology (particularly for the floorplanning process) [15]. For that purpose it generates all the reports and files needed during the first stages of physical synthesis. In general, the output data produced by TCT consist of a set of timing constraints and netlists, and detailed reports on timing, area and power (including summary reports). The set of timing constraints (in SDC format), the netlists and the reports are generated for every design space point during the design space exploration process. It is worth noting that the netlists and the corresponding SDC files are ready to be used directly in the next stages of the design flow. The detailed reports on timing, area and power hold all the data produced by the design space exploration algorithm. However, they are too detailed to convey the essential information directly and are generated for debugging purposes only. Therefore, TCT is equipped with procedures that scan these reports, collect the most important information, and return it as a single CSV-formatted report. A part of the summary report (for the first design space point) is presented in Fig. 16. As can be seen, besides general information (the kind of report, the creation and modification dates, and the design version), it provides details on the area utilization, power dissipation and timings of the module. The timing information includes the current timing constraints and the worst delays for each path group and every mode.
4.5. Design flow with TCT
TCT modifies the netlist generation stage of the design flow presented in [15]. As depicted in Fig. 17, the first stage of the flow (the division of the RTL description) remains the same. Then, the logic synthesis of the partitions is performed with TCT, so there is no need to prepare the timing constraints for the partitions manually.
Instead, the mode file and the environmental constraints have to be prepared as described in Section 4.1. After the logic synthesis with TCT, the designer can choose the design space point for the physical synthesis. It is worth noting that the designer is not forced to accept the timing constraints as they are: the resulting constraint set can be used as a basis for further modifications, and the reports and files already provided by the tool serve as a comprehensive overview of the design. Obviously, after the timing constraints are modified, the logic synthesis of the partitions has to be repeated in order to produce the proper netlists. This flexibility is a great advantage of the approach: on the one hand, TCT produces ready-to-use files and reports, enabling fast preparation of the floorplan; on the other hand, it facilitates fine adjustment of the timing constraints for demanding designs. During the layout generation stage [15], the design is physically synthesized with the chosen design space point. If verification shows that the design specification is not fulfilled, the designer can consider another point from the design space. However, such a situation is less probable than in an implementation without TCT, because during floorplanning the designer can rely on the TCT output data, which provide a comprehensive overview of the design space.
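The per-point constraint files mentioned in Section 4.4 are in SDC format. As a minimal sketch, assuming that the clock, input/output delay and path delay settings discussed above map onto the standard SDC commands create_clock, set_input_delay/set_output_delay and set_max_delay, and that each constraint mode gets its own SDC text, one mode of one design space point could be rendered as follows. The function names, port names and values are hypothetical, and the clock port must not appear in the input list.

```python
from typing import Optional, Sequence


def port_query(names: Sequence[str]) -> str:
    """Format a list of port names as an SDC get_ports query."""
    return "[get_ports {" + " ".join(names) + "}]"


def mode_sdc(mode: str, inputs: Sequence[str], outputs: Sequence[str],
             clock_port: Optional[str] = None, period: Optional[float] = None,
             input_delay: float = 0.0, output_delay: float = 0.0,
             io_delay: Optional[float] = None) -> str:
    """Render one constraint mode of one design space point as SDC text (ns).

    Sequential modes: a clock plus input/output delays (inputs must not
    contain the clock port). Combinational modes: a max delay from inputs
    to outputs.
    """
    lines = [f"# constraint mode: {mode}"]
    if period is not None:
        if clock_port is None:
            raise ValueError("a sequential mode needs a clock port")
        lines.append(f"create_clock -name {clock_port} -period {period:.3f} "
                     f"[get_ports {clock_port}]")
        lines.append(f"set_input_delay -clock {clock_port} {input_delay:.3f} "
                     f"{port_query(inputs)}")
        lines.append(f"set_output_delay -clock {clock_port} {output_delay:.3f} "
                     f"{port_query(outputs)}")
    if io_delay is not None:
        lines.append(f"set_max_delay {io_delay:.3f} -from {port_query(inputs)} "
                     f"-to {port_query(outputs)}")
    return "\n".join(lines) + "\n"


# Hypothetical ports and values for one sequential and one combinational mode.
print(mode_sdc("counter", ["din", "en"], ["dout"], clock_port="clk",
               period=8.0, input_delay=6.0, output_delay=5.0))
print(mode_sdc("2level_c", ["a", "b"], ["y"], io_delay=2.5))
```

Keeping one SDC text per mode mirrors the multi-mode idea: each operating mode is characterized against its own constraints rather than against a single merged set.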
5. Verification of TCT
Fig. 15. Changing timing constraints – combinational mode.
The performance of TCT was verified using Ubichip. This device was designed with the methodology without TCT and fabricated in 2009 in the UMC 180 nm process. The results obtained during its physical synthesis were confirmed by the physical tests of the
Fig. 16. TCT summary report.
Fig. 17. Getting the netlist stage with TCT.
device. For these reasons, Ubichip was an ideal candidate for verifying the reliability of the proposed tool. Therefore, the authors decided to use Ubicell [23], the basic element of the Ubichip reconfigurable array, as the example of an RM. The verification of TCT was performed in three steps. The first consisted of the logic synthesis of Ubicell with TCT. The obtained results were then verified by physical synthesis (second step) and back-annotated functional simulations of the module (third step).
5.1. Logic synthesis with TCT
The primary purpose of this experiment was to show the usefulness of the TCT flow for designing the RM. It was also intended to generate input data for the next two steps of the verification.
As described in Section 4, TCT requires input data in the form of a mode file. This file has to reflect the actual operating modes and architecture of the RM. Therefore, the mode file prepared for this test contained declarations of 16 constraint modes, which cover the actual operating modes of Ubicell, plus one additional mode called default_s. Ubicell has no special mechanisms implemented for the reconfiguration process. Therefore, during reconfiguration, signals from the reconfiguration registers propagate through the circuit without any obstacles. Consequently, for this specific operation the module should be considered a single-mode one (as discussed in Section 3). For this reason, the default_s constraint mode was introduced. In this mode, all the reconfiguration registers were assigned the value n. This keeps all the timing arcs of the reconfiguration registers enabled (see Section 4.1), so no constants propagate through the circuit. Besides covering the reconfiguration mode, the default_s mode is also useful for discussing the results: it shows the difference between constraining the RM in the single-mode and in the multi-mode manner. The initial timing constraints were set as follows: an initial clock period of 10 ns, input and output delays of 0 ns for all sequential modes, and an input-to-output delay of 10 ns for each combinational mode. The results of the logic synthesis with TCT were obtained in about 24 h on a personal computer equipped with an INTEL CORE I7 X980 3.33 GHz processor, 3 × 4 GB PC-12800 RAM and an Intel X58 chipset. The tool determined 11 design space points. The generated summary report was converted into diagrams, which are presented for the purpose of the discussion in Figs. 18–21. Fig. 18 depicts a general comparison of the worst delays related to every Ubicell operating mode and every determined design space point. The X-axis shows the worst delays in nanoseconds.
On the Y-axis the subsequent operating modes are listed; the abbreviations are explained in Table 2. Eleven bars are drawn for every mode, each representing one design space point. The bars are color coded according to the legend on the right-hand side of the figure, and the length of each bar represents the worst delay for the given design space point. The design space points are labeled from ubicell_0 (the first point, with the loosest timing constraints) to ubicell_10 (the last point, with the tightest timing constraints). As can be seen in this figure, the default_s mode has the longest delays. The remaining operating modes (constrained with
Fig. 18. Comparison of the Ubicell worst delays in relation to operating modes and determined design space points. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
the multi-mode technique) are actually much faster than could be deduced from the default_s mode results alone. The difference ranges from several hundred picoseconds for point ubicell_0 of the normalALU mode to almost two nanoseconds for points ubicell_8 and ubicell_9 of the shiftReg, 3SM, 2x2SM, counter and indLUTs_s modes. In the worst cases this corresponds to an error of about 700%. This confirms the considerations presented in Section 3. In conclusion, the single-mode constraining technique is not suitable for STA-based timing characterization of RMs, since in this case STA takes into consideration all the circuitry, including the parts that are not used by particular modes. Fig. 18 also shows the progress of the design space exploration process: subsequent points have shorter and shorter worst delays. This illustrates the proper reaction of the optimization tools to the circuit overconstraining performed by the
design space exploration algorithm. There are also some exceptions to this rule. They are clearly visible for the first design space points (for instance, point ubicell_3 of the addCLA mode) as well as for the last ones (for instance, point ubicell_10 of the lfsr mode). This result can be explained by the current values of the timing constraints set on the circuit: for the first design space points they are quite loose, and for the last points they are set too tightly. The conclusion of this analysis is that not all of the design space points are worth considering; the most reliable points lie in the middle of the determined design space.
5.2. Physical synthesis
In order to verify the results obtained during the Ubicell logic synthesis with TCT, physical synthesis was performed. This
Table 2
Description of the Ubicell operating mode abbreviations.
Mode abbr.   Description
default_s    “Single-mode” constraints
sram         SRAM mode
lfsr         Linear feedback shift register
addCLA       Carry-Lookahead (CLA) adder
subCLA       CLA subtractor
normalALU    ALU mode
shiftReg     Shift register
3SM          3-bit state machine
2x2SM        Two 2-bit state machines
confReg      Four 1-bit state machines
counter      Counter mode
2level_s     2-level logic (registered)
wideDec_s    Multi-input logic function (registered)
indLUTs_s    4 independent 4-input logic functions (registered)
2level_c     2-level logic (combinational)
wideDec_c    Multi-input logic function (combinational)
indLUTs_c    4 independent 4-input logic functions (combinational)
Fig. 20. Worst delay vs. area – normalALU mode.
Fig. 21. Worst delay vs. area – counter mode.
Fig. 19. Worst delay vs. area – default_s mode.
synthesis was similar to the partition Place and Route (P&R) session of the design flow presented in [15]. It was a typical flat approach composed of the following steps: floorplan preparation, initial placement, clock tree synthesis, post-CTS optimization, routing, and signal integrity checks. The floorplan of this physical synthesis was based directly on the results produced by TCT. The core area was increased by 10%, because the logic synthesis results did not include the area needed by, among other things, the clock tree [22]. The scripts prepared for this test were also equipped with reporting mechanisms similar to the ones implemented in TCT, so the output results were formatted in the same manner, which made the comparison easier. Figs. 19–23 compare the results (the worst delay and area) obtained during the logic synthesis with TCT and during the physical synthesis of Ubicell. They are prepared only for the operating modes that are most interesting from the perspective of the analysis. As can be seen, the timing results of the logic synthesis for the points from the middle of the design space roughly correspond to those from the physical synthesis. Significant differences concern the first design space points (from ubicell_0 to ubicell_2). This is reasonable, since in these cases the timing constraints are too loose, which usually leads to unreliable results: sometimes the physical synthesis optimization algorithms obtain better timings than the logic synthesis, sometimes worse. Nevertheless, the timing specification is met in every case. In conclusion, the results produced by TCT are confirmed: the netlists and timing constraint sets of the determined design space points allow physical devices to be obtained whose performance is consistent with the predictions.
Fig. 22. Worst delay vs. area – 3SM mode.
Fig. 23. Worst delay vs. area – shiftReg mode.
5.3. Back-annotated simulations
After the successful physical synthesis, the design was checked for functional correctness and fulfillment of the timing specification. For this purpose, back-annotated functional simulations were performed with Modelsim [24] and Very High Speed Integrated Circuits Hardware Description Language (VHDL) testbenches. The input data for these simulations consisted of the netlists and Standard Delay Format (SDF) files obtained from the physical synthesis, as well as the behavioral VHDL description of Ubicell. These simulations were intended to validate the correctness of the multi-mode timing constraining technique. In general, each test was performed in two steps. First, Ubicell was configured to act in a chosen operating mode. Then it was functionally verified by test vectors applied to its inputs. Both the configuration and the functional verification were conducted with respect to the timing characterization determined by TCT. That is, for sequential operating modes the module clock and input ports were driven preserving the reg2reg and in2reg worst delays, respectively, and the timing requirements of the outputs were checked against the reg2out worst delay from the timing characterization. For combinational modes, the signal propagation time through the circuit was checked against the in2out worst delay. Test vectors were prepared for every operating mode, and they cover almost all of the Ubicell resources. Every design space point (every Ubicell version) determined by TCT was tested, and all tests passed. The successful back-annotated functional verification of Ubicell validated the multi-mode constraining technique used in TCT.
5.4. Summary
TCT was verified using Ubicell as the example of an RM. This choice is justified by the work done in the PERPLEXUS project, during which Ubichip was implemented using the methodology without TCT, fabricated and physically verified. The operating chip is the best source of reliable data for the verification of TCT. The verification of the tool was divided into three steps. The first was the logic synthesis of Ubicell performed with TCT. The tool automatically set the timing constraints with the multi-mode technique, explored the design space using the dedicated algorithm and produced detailed reports on every designated point, including the area of the module, its timings and estimations of power consumption. It also prepared the data required for the further steps of the implementation flow. These results were produced in about 24 h on a typical PC, which took incomparably less effort and time than manual timing constraining and design space exploration of Ubicell; in the non-TCT approach, this takes time measured in person-months. The results of the design space exploration are fully consistent with the performance of the Ubicells of the fabricated chip. The second verification step was the physical synthesis of the module based on the data prepared previously by TCT. As shown, for every determined design space point the sets of netlists and timing constraints lead to similar post-layout results regarding the timings and the area occupied by the module. The first and second verification steps together confirmed the overall correct operation of TCT. In particular, they tested the design space exploration algorithm and showed that it produces useful results for early chip performance estimations. They also demonstrated that the data produced by the tool can easily be used in the remaining part of the design methodology.
As a result, the use of TCT greatly reduces the time needed to close the first part of the project (accurate estimations of chip performance) and to obtain the first version of the chip layout. The third verification step comprised the back-annotated functional simulations based on the data obtained during the previous steps. These tests were intended to confirm whether the proposed method of timing constraining of the module is precise and reliable. The obtained results confirmed the reliability of the proposed multi-mode timing constraining technique and showed consistency between the timings reported by STA and the back-annotated functional simulations.
6. Conclusions
The idea of the new EDA tool, called TCT, results from the authors' experience gained during the implementation of the dynamically reconfigurable chip Ubichip. TCT is intended to facilitate the design of integrated electronic devices utilizing reconfigurable resources. The authors developed TCT in Tcl and integrated it with CADENCE Encounter RTL Compiler. TCT equips this logic synthesis tool with algorithms for early constraint-based design space exploration. It supports the designer in the timing constraining process and in the timing characterization of reconfigurable modules. TCT automatically sets the timing constraints on the reconfigurable module using the multi-mode technique presented in Section 3. Thanks to this feature, the designer is supported in a very time-consuming and error-prone process that has a direct impact on the quality of the design. Moreover, TCT employs STA for the timing characterization of a reconfigurable module, which enables fast and early estimation of the design timings. Thanks to the proposed design space exploration algorithm, TCT provides detailed reports on timing, area and power, giving a comprehensive overview of circuit performance at an early stage of the design flow. TCT reveals possible trade-offs between timing effort, area utilization and power consumption. Therefore, the tool provides a great opportunity for fast and reliable exploration of a given architecture in a chosen IC process technology. It also facilitates floorplanning, which is a critical issue in the design methodology of modular DR devices. TCT is fully compatible with the design flow proposed in [15] and enhances it by radically reducing the time needed to complete the floorplan. The idea of TCT can be adapted to similar design flows and to other currently available logic synthesis tools. TCT was successfully verified by using it for the Ubichip implementation.
The tool performs a full design space exploration of the basic element of the Ubichip reconfigurable array in about 24 h. This is a significant achievement, since in the non-TCT approach this stage of the methodology lasted a few person-months. Therefore, TCT gives designers reliable support during the first stages of the DR circuit design flow, particularly in the reliable timing constraining of RMs (partitions) and the preparation of a chip floorplan.
Acknowledgment
This work was funded from the 2010/2011 budget resources of the Polish Ministry of Science and Higher Education as a research project entitled “Development of CAD tools for dynamically reconfigurable digital circuits”, number: N N515 603639. It also utilized part of the research on the PERPLEXUS project financed by the European Commission 6th Framework Programme, number: IST-2006-34632. Piotr Amrozik was a scholarship holder of the project entitled “Innovative unlimited education, integrated development of Technical University of Lodz, university management, innovative educational offer, augmentation of hiring ability including disabled” supported by the European Social Fund.
References
[1] M. Platzner, J. Teich, N. Wehn, Dynamically Reconfigurable Systems, Architectures, Design Methods and Applications, Springer Science+Business Media B.V., ISBN: 978-90-481-3484-7, 2010.
[2] S. Vassiliadis, D. Soudris, Fine- and Coarse-Grain Reconfigurable Computing, Springer eBook Collection, Springer, ISBN: 978-1-4020-6504-0, 2007.
[3] L. Józwiak, N. Nedjah, M. Figueroa, Modern development methods and tools for embedded reconfigurable systems: a survey, Integration, The VLSI Journal 43 (2010) 1–33, http://dx.doi.org/10.1016/j.vlsi.2009.06.002.
[4] X. Zhang, Y. Ding, Y. Huang, X. Dong, Design and implementation of a heterogeneous high-performance computing framework using dynamic and partial reconfigurable FPGAs, in: 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), 2010, pp. 2329–2334, http://dx.doi.org/10.1109/CIT.2010.401.
[5] L. Zhuo, V. Prasanna, Scalable and modular algorithms for floating-point matrix multiplication on reconfigurable computing systems, IEEE Trans. Parallel Distrib. Syst. 18 (4) (2007) 433–448, http://dx.doi.org/10.1109/TPDS.2007.1001.
[6] T. Todman, G. Constantinides, S. Wilton, O. Mencer, W. Luk, P. Cheung, Reconfigurable computing: architectures and design methods, Comput. Digit. Techn. IEEE Proc. 152 (2) (2005) 193–207, http://dx.doi.org/10.1049/ip-cdt:20045086.
[7] T. El-Ghazawi, E. El-Araby, M. Huang, K. Gaj, V. Kindratenko, D. Buell, The promise of high-performance reconfigurable computing, Computer 41 (2) (2008) 69–76, http://dx.doi.org/10.1109/MC.2008.65.
[8] D. Rossi, F. Campi, S. Spolzino, S. Pucillo, R. Guerrieri, A heterogeneous digital signal processor for dynamically reconfigurable computing, IEEE J. Solid-State Circuits 45 (8) (2010) 1615–1626, http://dx.doi.org/10.1109/JSSC.2010.2048149.
[9] R. Khraisha, J. Lee, A scalable h.264/avc deblocking filter architecture using dynamic partial reconfiguration, in: 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2010, pp. 1566–1569, http://dx.doi.org/10.1109/ICASSP.2010.5495525.
[10] R. Kiełbik, G. Jabłoński, B. Świercz, P. Amrozik, Instructionless processor architecture using dynamically reconfigurable logic, in: Mixed Design of Integrated Circuits and Systems (MIXDES), 2010 Proceedings of the 17th International Conference, 2010, pp. 112–116.
[11] N. Kapre, A. DeHon, SPICE2: spatial processors interconnected for concurrent execution for accelerating the spice circuit simulator using an fpga, IEEE Trans. Comput. Aided Design Integr. Circuits Syst. 31 (1) (2012) 9–22, http://dx.doi.org/10.1109/TCAD.2011.2173199.
[12] S. Hauck, A. DeHon, Reconfigurable computing: the theory and practice of FPGA-based computation, Systems on Silicon, Morgan Kaufmann, United States, ISBN: 978-0-12-370522-8, 2008.
[13] A. Upegui, Y. Thoma, E. Sanchez, A. Perez-Uribe, J. Moreno, J. Madrenas, The perplexus bio-inspired reconfigurable circuit, in: Second NASA/ESA Conference on Adaptive Hardware and Systems, 2007, AHS 2007, 2007, pp. 600–605, http://dx.doi.org/10.1109/AHS.2007.105.
[14] E. Sanchez, A. Perez-Uribe, A. Upegui, Y. Thoma, J. Moreno, A. Napieralski, A. Villa, G. Sassatelli, H. Volken, E. Lavarec, Perplexus: pervasive computing framework for modeling complex virtually unbounded systems, in: Second NASA/ESA Conference on Adaptive Hardware and Systems, 2007, AHS 2007, 2007, pp. 587–591, http://dx.doi.org/10.1109/AHS.2007.84.
[15] L. Kotynia, P. Amrozik, A. Napieralski, Methodology for implementing scalable run-time reconfigurable devices, in: Int. J. Electron. Telecommun. (2011) 177–183, http://dx.doi.org/10.2478/v10177-011-0025-8.
[16] J. Bhasker, R. Chadha, Static Timing Analysis for Nanometer Designs, A Practical Approach, Springer Science+Business Media, LLC, ISBN: 978-0-387-93819-6, 2009, URL: 〈http://www.google.pl/books?id=N1Zn1RdqPVoC〉.
[17] H. Kaeslin, Digital Integrated Circuit Design, From VLSI Architectures to CMOS Fabrication, Cambridge University Press, UK, ISBN: 978-0-52-188267-5, 2008.
[18] L. Wang, Y. Chang, K. Cheng, Electronic Design Automation, Synthesis, Verification, and Test, Morgan Kaufmann Publishers, ISBN: 978-0-12-374364-0, 2009.
[19] Synopsys, SDC official website, 2013. URL: 〈http://www.synopsys.com/community/interoperability/pages/tapinsdc.aspx〉.
[20] CADENCE, Encounter RTL Compiler official website, 2013. URL: 〈http://www.cadence.com/products/ld/rtl_compiler/pages/default.aspx〉.
[21] H.
Bhatnagar, Advanced ASIC chip synthesis: using Synopsys Design Compiler, Physical Compiler, and PrimeTimeKluwer Academic Publishers, ISBN: 978-079-237644-6, 2002. CADENCE, Using Encounter RTL Compiler, Product version 10.1., Technical documentation, 2010. J. Moreno, J. Madrenas, A reconfigurable architecture for emulating large-scale bio-inspired systems, in: IEEE Congress on Evolutionary Computation, 2009, CEC '09, 2009, pp. 126–133. http://dx.doi.org/10.1109/CEC.2009.4982939. Mentor Graphics, Modelsim official website, 2013. URL: 〈http://model.com/〉.