Proceedings of the 13th IFAC Conference on Programmable Devices and Embedded Systems, May 13-15, 2015. Cracow, Poland
Available online at www.sciencedirect.com
ScienceDirect
IFAC-PapersOnLine 48-4 (2015) 354–361
On PLCs Control Program Hardware Implementation
Selected Problems of Mapping and Scheduling
Adam Milik*
*Institute of Electronics, Silesian University of Technology, Gliwice, Poland (e-mail: adam.milik@polsl.pl).

Abstract: The paper presents an FPGA dedicated method of mapping a PLC program written according to the IEC61131-3 standard. The complete synthesis process is described, from the program description to the hardware implementation, through the mapping and scheduling procedures. The PLC programming languages are translated into a common intermediate graph form, which enables a massively parallel implementation. An originally developed graph structure with attributed edges is presented. Finally, the graph mapping methodologies are discussed. A general hardware mapping concept and algorithms for utilizing specific FPGA components are presented. An efficient mapping to the DSP48 block is shown; it attempts to utilize all features of the block in a pipelined calculation model. The considerations are summarized with a comparison of implementation results for the general hardware mapping and for the mapping with use of DSP48 units.

© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Peer review under responsibility of International Federation of Automatic Control. ISSN 2405-8963. DOI 10.1016/j.ifacol.2015.07.060

Keywords: PLC, FPGA, DSP48, LD, IL, SFC, DFG, high level synthesis, logic synthesis, reconfigurable hardware

1. INTRODUCTION

Programmable logic controllers (PLCs) have been in use since the early 1970s. They successfully replaced mechanical and electromechanical control systems, offering better performance and reliability. Today PLCs have become a standard in automation. The performance improvement of logic controllers is a main concern for automation system designers. Is it possible to improve the performance of a PLC? Attempts to answer this question have been made by different researchers, for example in (Chmiel and Hrynkiewicz, 2010). One of the ideas that can significantly increase performance is a fully custom hardware implementation of the controller structure (Du et al. 2010, Economakos and Economakos 2012, Ichikawa et al. 2011, Mocha and Kania 2012, Milik 2013). It was even shown that controllers with advanced fuzzy algorithms are efficiently implemented in hardware (Wyrwoł and Hrynkiewicz, 2013). These solutions are the opposite of microprocessor based implementations, which rely on serial processing of instructions. The serial processing concept is a source of significant performance limitation. It can be overcome by a dedicated implementation of a hardware structure for the execution of a particular control task. In contrast to serial processing, massively parallel processing is possible with use of SRAM configured FPGAs. The application specific logic controller (ASLC) assures reprogrammability through static reconfiguration. It is worth mentioning an interesting proposal of a dedicated PLC FPGA described in (Welch and Carletta, 2000).

The great success of PLCs is connected with the ease of programming. The method of program design has been standardized by the IEC61131-3 document and its subsequent revisions (Cenelec, 2013, John and Tiegelkamp, 2010). The standard describes a wide range of programming languages, from the simple textual instruction list (IL) to the high level structured text (ST). The ladder diagram (LD) graphical language is still very popular among automation designers. It has been inherited from the design methodology based on electrical schematics of relay control systems. The SFC language is derived from GRAFCET (David, 1995). It offers the ability to describe concurrent processes. All languages allow a control algorithm to be expressed independently of the target hardware platform. Automation designers do not have to go into implementation details and can concentrate on solving the control problem.

The ASLC (application specific logic controller) is implemented from a standard program with use of high level synthesis covering translation, scheduling and mapping. Each of those steps requires knowledge and skills from areas like compilation, digital synthesis, operation scheduling and hardware mapping. To make the application specific logic controller competitive with standard PLCs, a set of EDA tools must be created. This paper is a continuation of the research program on the ASLC implementation (Milik and Hrynkiewicz 2012 and 2014, Milik 2013). It briefly recalls basic definitions and recently presented methods of intermediate program representation. Then it moves to the stages devoted to technology mapping. The early mapping stages, scheduling problems and different mapping strategies targeted to specific components of contemporary FPGAs, represented by the Xilinx Spartan 6 family, are discussed.

2. THE PLC PROGRAM INTERMEDIATE REPRESENTATION

The control program must be translated into a form that is suitable for further handling accommodated to hardware
synthesis. The intermediate representation is expected to reveal not only the program semantics but also the operation dependencies. It is important to reveal all independent operations for possible parallel execution in hardware. An enhanced data flow graph (EDFG) has been proposed for this purpose. Flow graphs are commonly used by compilers (Hopcroft and Ullman 1979, Wirth, 1976) and in hardware synthesis (Gajski et al., 1994).

The EDFG is a directed acyclic graph. It is given by $G = \langle V, E \rangle$, where V is a set of nodes and E is a set of directed edges. A directed edge e is an ordered triple $e = \langle v_{SRC}, v_{DST}, a \rangle$, where $v_{SRC}$ is the predecessor node and $v_{DST}$ is the successor node of the directed edge, and a is an attribute of the edge chosen from the set A. The set A consists of the unary operations applicable to a particular node type. Exemplary EDFGs are shown in Fig. 1. The attributed edge combines an assignment operation with a logic or an arithmetic complement. This modification significantly simplifies the algorithms for graph creation, optimization and hardware mapping. The figure shows the raw form of the logic graph (1) and of the arithmetic graph (3) obtained during the compilation process. Next to them are shown the graphs after the merge operations, (2) and (4) respectively. Conditional execution is implemented with use of the selection node shown in (5).

Fig. 1. The EDFG concept (panels 1 to 5; legend "Graph Items Description": AND, OR, ADD and SUB nodes, conditional selection, simple edge, inverting (NOT) edge, complement edge).

The compilation process is aimed at obtaining a high performance processing model. It is based on a single processing cycle in which the input variables are propagated through the nodes of the EDFG up to the assignment nodes. It has been described in detail in (Milik, 2013). The compilation process preserves the sequential processing order with respect to variable access. This can be written as an ordered sequence of functions represented by the rungs of the LD:

$$ q_i = f_i(I, Q), \quad i = 1 \ldots r \tag{1} $$

where I denotes the set of variables associated with inputs and Q is the set of variables associated with outputs and internal nodes. The above sequential processing is transformed into parallel processing by introducing an auxiliary set of variables D as follows:

$$ d_i = f_i(I, d_1, \ldots, d_{i-1}, q_i, \ldots, q_n), \quad i = 1 \ldots n $$

The set D represents the already evaluated values. The processing cycle is completed by assigning the values of the items from D to Q:

$$ Q = D \tag{2} $$

Fig. 2 shows the essence of the compilation concept. The input languages are translated into the common EDFG representation. The first case shows the compilation of the LD into the raw EDFG form, which is further processed (optimized). The second case depicts the translation of an SFC step into an equivalent EDFG structure. A control program written with use of different languages is thus brought to a common representation that retains its processing capabilities. The directed edges determine the operation dependencies, which enables extracting independent operations and tasks.

Fig. 2. The compilation outline process for LD and SFC to EDFG (panels: Ladder diagram, Sequential Functional Chart; legend: read variable c1, write variable c1, simple edge, sub-EDFG delivering c1, step s2 set node, step s2 clear node).

3. PREPARATION FOR GENERAL HARDWARE MAPPING

After the generation and optimization stages the EDFG is transformed in order to meet the requirements of the target hardware platform. The general hardware mapping procedure assumes that basic arithmetic blocks corresponding to the graph operation nodes (addition, multiplication and division) are available. Attributed edges allow for easy optimization of arithmetic expressions. The node merge reduces the tree height and the respective path lengths from variable or constant read nodes to assignment nodes. The node merge also simplifies constant handling: after optimization (propagation), the argument set of an arithmetic operation node contains at most one constant node. It is important to balance the graph, because the compilation process creates the grapevine form from consecutive assignments. Two fundamental transformations, described further, are applied to the EDFG before the scheduling and mapping processes can take place.
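The effect of the node merge on an additive subgraph can be illustrated with a short Python sketch. It is not the author's implementation: the `Node` class, the helpers `merge_add`, `evaluate`, `var_node` and `const_node`, and the restriction to additive nodes are assumptions made for illustration only. Each argument is stored as a pair (node, complemented), mirroring the attributed edge; nested additions are flattened into one multi-argument node, complement attributes along a path are combined by parity, and all constants are folded into a single value.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical EDFG node; args holds (node, complemented) pairs (attributed edges)."""
    kind: str                      # "add", "var" or "const"
    name: str = ""                 # used by "var" nodes
    value: int = 0                 # used by "const" nodes
    args: list = field(default_factory=list)


def var_node(name):
    return Node("var", name=name)


def const_node(value):
    return Node("const", value=value)


def evaluate(node, env):
    """Reference semantics: a complemented edge negates the argument value."""
    if node.kind == "var":
        return env[node.name]
    if node.kind == "const":
        return node.value
    return sum(-evaluate(a, env) if comp else evaluate(a, env)
               for a, comp in node.args)


def merge_add(node, negate=False):
    """Flatten nested additive nodes into (terms, constant).

    terms is a list of (variable name, complemented) pairs; the constant
    collects every constant argument, so at most one constant remains.
    """
    terms, constant_sum = [], 0
    for child, comp in node.args:
        neg = negate ^ comp        # complements along a path combine by parity
        if child.kind == "const":
            constant_sum += -child.value if neg else child.value
        elif child.kind == "add":
            sub_terms, sub_const = merge_add(child, neg)
            terms += sub_terms
            constant_sum += sub_const
        else:
            terms.append((child.name, neg))
    return terms, constant_sum


# y = a - (b - (c + 3)) + 2, built from nested additive nodes
inner = Node("add", args=[(var_node("c"), False), (const_node(3), False)])
middle = Node("add", args=[(var_node("b"), False), (inner, True)])
y = Node("add", args=[(var_node("a"), False), (middle, True), (const_node(2), False)])

terms, constant = merge_add(y)
print(terms, constant)
```

Under these assumptions the three nested nodes collapse into a single node with arguments a, complemented b, c and one constant, which corresponds to y = a - b + c + 5.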
3.1. Complement edges transformation

The EDFG complement edges are convenient for transforming arithmetic operations. When all transformations (optimizations) on the arithmetic level are completed, the complement edges are translated into an equivalent hardware representation. It consists of an adder, a logic inversion and a constant value:

$$ -a = \bar{a} + 1 \tag{3} $$

The respective expansion procedure is applied only to arithmetic nodes with complemented arguments. It is aimed at reducing the number of additional operations but does not constrain the successive processes. The transformation of the complement arcs connected to an addition node is shown in Fig. 3.1 and Fig. 3.2. The process changes each complement arc into a bitwise inversion arc and adds the constant 1. Owing to the property of a single constant argument per operation node, the constant values are merged. The transformation is applied iteratively to all complement arcs. Finally, the operation is implemented with use of an adder and typical logic components available in FPGAs. An adder is implemented with use of LUTs and the adjacent arithmetic support components that create the carry chain. The multiplicative operation nodes allow the complement edges to be reduced, either by cancelling them (Fig. 3.3) or by propagating them to the constant (Fig. 3.4). Finally, a single complement edge is propagated before or after the node. The decision depends on the successive node. If the successive node performs an addition, the complement edge is forward propagated. When forward propagation is not possible, the complement operation is expanded and back propagated to the argument with the shorter calculation time.

Fig. 3. The complement edge transformations. Annotations recoverable from the panels: $y = a - b = a + \bar{b} + 1$ (with the hardware concept of an adder with carry input Ci); $y = a + b - c - d = a + b + \bar{c} + \bar{d} + 2$; $y = a + b - c - d + 3 = a + b + \bar{c} + \bar{d} + 5$; $y = (-a)(-b) = a \cdot b$; $y = (-a) \cdot 7 = a \cdot (-7)$; $a \cdot (-b) = \overline{(a \cdot b)} + 1 = a \cdot (\bar{b} + 1)$. Legend: simple edge, complement edge, inverting edge.

3.2. Multiple arguments node expansion

A node with multiple arguments (more than 2) does not correspond to an elementary hardware component. It cannot be directly mapped and requires expansion to satisfy the constraints of real components. The expansion process creates two-argument nodes that are directly mapped. Special handling is used for additive nodes with a constant. The developed expansion process iteratively expands arithmetic nodes with more than two arguments. The calculation time of the expanded arguments is taken into consideration during the process: the procedure balances the total calculation time of the subtree during node expansion. Considering the execution time allows for better balancing of the tree and improves the obtained scheduling result. Let the variable t be associated with an EDFG node and describe the operation completion time. The operation completion time (in the as soon as possible approach) for a particular node is calculated as:

$$ t_j = \max_i (t_i) + tp_j \tag{4} $$

where $t_j$ is the calculation completion time of node j, $t_i$ is the completion time of the i-th argument node of node j, and $tp_j$ is the operation time of node j according to the operation type and implementation target. The value $t_j$ can be determined provided that all $t_i$ are known.

The expansion algorithm starts by initializing the variable t for each operation node. In Fig. 4.1 the variable t is shown as a subscript of each node. The variable t of an operation node is set to the uninitialized value (t = -1), except for variable read nodes. Those nodes are initialized with t = 0, as their value is immediately available. The following procedure is applied to all variable assignment nodes.

Fig. 4. The arithmetic node expansion algorithm example.
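The completion-time rule (4) and the pairwise expansion of this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `expand_node`, the generated node names N1, N2, ..., the use of a heap, and a single operation time `tp` shared by all two-argument nodes are not taken from the paper. The sketch repeatedly joins the two arguments with the smallest completion times into a new two-argument node until only two arguments remain.

```python
import heapq
from itertools import count


def expand_node(arg_times, tp):
    """Expand one multi-argument node into two-argument nodes.

    arg_times maps each argument name to its completion time t_i and tp is
    the (assumed constant) operation time of a two-argument node.
    Returns the completion time t_j of the expanded node and the list of
    newly created nodes as (name, left, right, t) tuples.
    """
    order = count()                           # tie-breaker keeps heap entries comparable
    heap = [(t, next(order), name) for name, t in arg_times.items()]
    heapq.heapify(heap)
    created = []
    while len(heap) > 2:
        t1, _, left = heapq.heappop(heap)     # the two arguments with the
        t2, _, right = heapq.heappop(heap)    # smallest completion times
        name = f"N{len(created) + 1}"
        t_new = max(t1, t2) + tp              # rule (4) for the new node
        created.append((name, left, right, t_new))
        heapq.heappush(heap, (t_new, next(order), name))
    t_j = max(t for t, _, _ in heap) + tp     # rule (4) for node j itself
    return t_j, created


# Four arguments that are all available at t = 0 give a balanced tree.
balanced, nodes = expand_node({"a": 0, "b": 0, "c": 0, "d": 0}, tp=1)
# A late argument (d at t = 4) is joined last, so it is not delayed further.
late, _ = expand_node({"a": 0, "b": 0, "c": 0, "d": 4}, tp=1)
print(balanced, nodes, late)
```

With unit operation time the first call completes at t = 2 instead of t = 3 for a chain of additions, and in the second call the late argument d is connected directly to the expanded node.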
PDeS 2015 May 13-15, 2015. Cracow, Poland
Adam Milik et al. / IFAC-PapersOnLine 48-4 (2015) 354–361
Where: e is a variable associated with input signal, d – a variable associated with an output signal, p – a read-write variable associated with internal signal, α and β read only variables (parameters of function).
applied for all variable assignment nodes. Starting from variable assignment node the procedure traces back to argument nodes. If the t variable of current node is not assigned than it is a subject of the t value calculation according to (4). If there is an argument node that t = -1 the procedure is called recursively for this node. If the graph adjacency matrix row contains more than two nonzero items than the node is subject of the expansion process. The node arguments count is used in practical EDFG implementation instead of adjacency matrix. Arguments pair with the smallest t value is selected for expansion. Selected arguments are reconnected to the newly created node. The new node is assigned t value according to (4) and becomes the argument of the node j (the expanded node). The expanded nodes are marked with grey colour in figures (Fig. 4.1 and Fig. 4.2). The node j expansion process is continued until number of arguments is equal to 2. There is a special case of expansion for addition nodes with a constant. An adder is able to perform addition of 2 bit vector arguments (a, b) and a single bit item (ci) applied to the carry in input. When the constant value c meet a requirement:
y = a + b + ci : ci ∈ {0,1} n
c < n-1 : c ∈ } ⇒ c = ∑ ci j
357
1.
e
d
p
α
2.
e
α
p
β p
d
e
d d
β
d 0
1
2
4
t
The simplest method utilize a direct correspondence between hardware arithmetic primitives and respective nodes of an EDFG. The EDFG is directly mapped to respective arithmetic components that are separated with registers (shown as hatched rectangles). The direct mapping process bounds permanently each node with unique hardware resource. It has been shown schematically in the figure (Fig. 5.2). The assignment process can be made with two possible scheduling greedy approaches ASAP or ALAP. For the circuit the ASAP method has been selected. The flow controller (not shown in the diagram) assures an ordered data flow through the unit. At the bottom of the schematic diagram the time line of calculation process is shown. The problem of variable allocation does not exist while each node is assigned to unique hardware component. The direct mapping concept results in fast resource run out and small hardware utilization. In this approach each arithmetic module is used once per calculation cycle. It can be noticed that the directly mapped structure is perfect for pipelined processing systems preferred in system working without feedback (Sun et al. 2007). The structure (Fig. 5.2) requires balancing the time dependencies in all processing paths by registers inserting. The PLC controller or fast feedback controllers require cyclic calculation in a loop with a controlled object. The utilization of pipelined processing is limited while an object feedback is required for starting next calculations cycle.
(5)
j =1
Where: c is a constant node value represented as a natural number, n is a number of arguments of an expanded node. The constant value is distributed among expanded nodes in form of carry in value. 4. FPGA ORIENTED SCHEDULING AND MAPPING The FPGA device perfectly fits for implementing the logic dependencies of the ASLC. The Boolean dependencies are implemented within single cycle according to (2). The ASLC not only relays on Boolean variables evaluation but also implements arithmetic calculations and mutual dependencies between Boolean and numerical data. The arithmetic operations are resource demanding while they operate on long bit vectors. High level programming languages defined several integer types as 16, 32 or even 64 bit vectors. The FPGAs offers custom sized numeric variables implementation abilities with different arithmetic or even DSP support (Deschamps et al. 2006). In this paper the mapping problem to the recent architecture of Spartan 6 is considered. The considerations starts from general hardware mapping approach and move to the more specialized general approach and finally the specific method of utilizing DSP48A1 unit is presented.
4.2. Resource sharing mapping

The limited number of resources, especially resource demanding multiplier cores and dividers, and the low utilization factor of components motivate developing a method
4.1. General direct mapping procedure
The obtained EDFG (after arithmetic operation expansion) is the subject of hardware mapping. Fig. 5 illustrates the problem given in the form of the EDFG and the direct mapping concept. The exemplary diagram shows the implementation of a typical signal processing equation:
$$ d = \alpha (e - p) + \beta (d - e) + d, \qquad p = e \tag{6} $$
Fig. 5. The direct hardware mapping procedure: (1) An input EDFG, (2) The mapped circuit structure.
[Figure labels (Figs. 5 and 6): variables p, d, v1, v2; parameters α, β held in distributed RAM; time axis t (cycles 0 to 4); schedule table with columns Y, A, B for the adder and for the multiplier.]

Fig. 6. The EDFG mapping with resource sharing.
(α, β). These variables are accessed in a read-only fashion. A separate channel is created for updating the parameters.
that enables resource sharing. The set of arithmetic cores and the set of variables are reused and distributed in time. Fig. 6 depicts the result of the scheduling and mapping processes. The schematic diagram shows the obtained structure of the circuit. The table next to the circuit holds the schedule of variables passed to the arithmetic units and the assignment of the respective results. For a systematic scheduling procedure each operation node is described with the structure:
$$ v_{SCH} = \langle c_S, c_L, c, t_0, t_1 \rangle \tag{7} $$

Summarizing, the shared resources approach introduces much higher mapping complexity than the direct approach. Its main advantage is the sharing of strongly limited resources. The high cost of sharing adders is the main limitation and makes the implementation difficult. Utilizing distributed RAMs reduces multiplexing costs, but is currently limited to the special case of read-only variables.
Where cS and cL are the ASAP and ALAP schedule cycles respectively, c is the scheduled calculation cycle, t0 is the result creation cycle, and t1 is the cycle of the last access to the result. In the literature the scheduling process usually assumes unit execution time for all operations (Gajski et al. 1994, Paulin and Knight 1989, Wang et al. 2008). The proposed general structure allows scheduling of operations with different execution times and resource availability. The scheduling procedure fills in the respective fields successively, starting from cS and cL. The difference between cL and cS determines the operation mobility. The mobility factor is used to select operations for assignment: nodes with a lower mobility factor are scheduled first. Local dependencies serve as further selection factors. The algorithm is based on a modified list scheduling approach. The maximal available resource set is determined at the beginning of the procedure. The scheduling procedure assigns the operations to particular operation cycles. After the first pass of the assignment, a resource reduction step is possible, depending on operation mobility: operations are shifted within their mobility range into free time gaps.

The schedule map then requires binding to physical resources. The binding procedure aims to reduce the overall sharing cost, a problem raised in (Hadjis et al. 2012). The resource sharing cost is calculated for each input. To reduce address decoding costs a one-hot controller is used (Kania and Czerwiński, 2013). The cost of multiplexing n arguments with k-input LUTs (look-up tables) under the one-hot approach is given by the following formula:
$$ \delta_{LUT} = v_s \cdot \begin{cases} \left\lceil \dfrac{2n - k}{k - 1} \right\rceil + 1, & n > 1 \\ 0, & n = 1 \end{cases} \tag{8} $$

4.3. DSP48 module mapping

Xilinx FPGAs have offered DSP support through a hardwired core called DSP48 since the Virtex 4 family. The unit is intended to implement the most common DSP operation, multiplication with result accumulation. The Spartan 6 family implements the DSP48A1 version of the core (Xilinx, 2009). Its general block diagram is shown in Fig. 7. The unit is partially programmable at run time; the run-time programmable components are marked in the diagram with gray rectangles. The equivalent EDFG mapping pattern is shown under the diagram. The central part and key feature of the block is the multiplier unit. The core processes 18 bit numbers, which fully satisfies the requirements of 16 bit integer processing. The 48 bit accumulator is well suited to fixed point number scaling and protects against overflow with a margin of 12 bits. The register structure is selected statically during the implementation process. The pipelined structure shown in the diagram has been chosen for timing efficiency. The EDFG expanded for general mapping is not suitable for DSP48 mapping. The node merge concept discussed under EDFG transformation prepares the graph for mapping concepts other than the general one. Using DSP48 units requires clustered expansion to the pattern shown in Fig. 7.2. There are two levels of clustering: the first merges operations into a single DSP48 unit, and the second merges operations with accumulative addition of the result. The clustering operation improves performance by enabling
Where: vs is the vector size and δLUT is the number of LUT generators. The δLUT factor exhibits low sensitivity to changes made during the resource allocation procedure, which hinders tracking of those changes. Therefore a general multiplexing factor for variables and constants was introduced. It counts the number of inputs required by a particular hardware resource, assigning a value of 2 to each variable and 1 to each constant. Finally, the variable nodes are exchanged in order to minimize the overall multiplexing cost in the scheduled operation set.
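The mobility-driven list scheduling described in this subsection can be sketched as follows. This is a minimal illustration and not the author's tool: it assumes unit execution time, a single tie-breaking rule (lower mobility first, then node name), and a hypothetical operation graph for the example equation; the names `OPS`, `mobility` and `list_schedule` are invented for the sketch.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# node -> (unit type, predecessor nodes); hypothetical decomposition of
# d = alpha*(e - p) + beta*(d - e) + d. Subtractions run on the adder.
OPS = {
    "sub1": ("add", []),
    "sub2": ("add", []),
    "mul1": ("mul", ["sub1"]),
    "mul2": ("mul", ["sub2"]),
    "add1": ("add", ["mul1"]),
    "add2": ("add", ["add1", "mul2"]),
}


def mobility(ops):
    """Mobility cL - cS of every node, assuming unit execution time."""
    deps = {n: preds for n, (_, preds) in ops.items()}
    order = list(TopologicalSorter(deps).static_order())
    c_s, c_l = {}, {}
    for n in order:
        c_s[n] = max((c_s[p] + 1 for p in deps[n]), default=0)
    last = max(c_s.values())
    for n in reversed(order):
        succ = [s for s in order if n in deps[s]]
        c_l[n] = min((c_l[s] - 1 for s in succ), default=last)
    return {n: c_l[n] - c_s[n] for n in order}


def list_schedule(ops, units):
    """Assign each node a cycle without exceeding units[kind] per cycle."""
    needed = {kind for kind, _ in ops.values()}
    if any(units.get(kind, 0) < 1 for kind in needed):
        raise ValueError("every operation type needs at least one unit")
    mob = mobility(ops)
    done = {}  # node -> scheduled cycle
    cycle = 0
    while len(done) < len(ops):
        # A node is ready when all predecessors finished in an earlier cycle.
        ready = [n for n, (_, preds) in ops.items()
                 if n not in done
                 and all(p in done and done[p] < cycle for p in preds)]
        ready.sort(key=lambda n: (mob[n], n))  # lower mobility first
        free = dict(units)
        for n in ready:
            kind = ops[n][0]
            if free[kind] > 0:
                free[kind] -= 1
                done[n] = cycle
        cycle += 1
    return done


schedule = list_schedule(OPS, {"add": 1, "mul": 1})
```

With one adder and one multiplier the sketch still completes the example in four cycles, because the two operations with non-zero mobility fill cycles in which their unit would otherwise be idle.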
[Fig. 7.1 labels: inputs A[17:0], B[17:0], D[17:0], C[47:0], PCIN and CIN; multiplexers X and Z; registers with clock enable (CE); function selection; output P.]
In order to reduce argument multiplexing costs, the use of distributed RAM (Fig. 6) is attempted. The problem of using memory modules is also addressed in (Coussy et al. 2008). Currently the use of distributed RAM is restricted to variables that hold processing parameters. The processing parameters are marked in the gray rectangles
Fig. 7. The DSP48A simplified block diagram (1) and EDFG mapping pattern (2).
calculation for one cycle in order to propagate the arguments. Finally, Fig. 8.3 shows the mapping result for mixed addition and multiplication nodes. It shows successive reuse of the unit with pipelined processing. The procedure attempts to fully utilize the unit without stall cycles.
[Fig. 8 labels: panels 1 to 3 showing DSP48 clusters with arguments a, b, c, d and the constants 0 and 1.]

[Fig. 9 labels: control unit (CTRL, CLK, AARGS); two RAMB8 block memories with ports DIA, AA, DIB, AB, DOA, DOB; a DSP48A1 with inputs A, B, D, output P and an OP control word.]

Fig. 9. The DSP48A simplified block diagram (1) and EDFG mapping pattern (2).
The Spartan 6 device offers two block RAM cores next to each DSP48 unit. When the memories are configured with an 18 bit data bus, which matches the width of the DSP48 data path, they are able to store 512 words. The memory content is initialized by the configuration process, so no dedicated initialization logic is needed and the overall hardware overhead is reduced. The schematic diagram of a calculation unit based on a DSP48 and the respective block RAMs is shown in Fig. 9. The block RAMs operate in parallel, creating a three-argument addressable register set for the A, B and D inputs of the DSP48. The control unit is implemented with general purpose logic components. Its concept is based on a counter-based control unit with LUT-based argument readdressing. The use of dedicated hardware resources and the minimal use of general purpose logic resources (LUTs and flip-flops) are the fundamental advantages of the architecture. Even the smallest devices of the Spartan 6 family (e.g. XC6SLX4) enable implementation of 8 arithmetic blocks running at 250 – 300 MHz, which results in a peak performance of 2.0 – 2.4 GMAC/s.
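The quoted peak figures can be checked with simple arithmetic, under the assumption (implied but not stated in the text) that each arithmetic block completes one multiply-accumulate per clock cycle:

```latex
\[
8 \times 250\ \mathrm{MHz} = 2.0\ \mathrm{GMAC/s},
\qquad
8 \times 300\ \mathrm{MHz} = 2.4\ \mathrm{GMAC/s}.
\]
```

These are upper bounds; any cycle in which a block stalls or only moves arguments lowers the sustained rate.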
Fig. 8. The EDFG expansion procedure for DSP48A mapping.

pipelined calculations. The mapping procedure requires declaring two constant values, 0 and 1, that enable bypassing the adder and multiplier stages respectively. Let us consider the general expansion and mapping for addition and multiplication nodes. The expansion procedure starts from the assignment node and traces back to the argument nodes. The general properties of a merged arithmetic node should be recalled: the argument set of a node contains nodes with different operations, and an identical operation precedes the current node only if it is an argument of multiple nodes (and so cannot be merged). The mapping procedure recursively visits unassigned argument nodes. The set of nodes is ordered according to the calculation time with the ASAP approach. If the cycle identifier of a node is not assigned, then the node is subject to recursive evaluation. The DSP48 pattern is created with clustered nodes, to which the nodes are reconnected. An addition node with multiple arguments selects the pair of arguments with the smallest cycle identifiers (possibly variable read nodes). They are reconnected to the DSP48 pattern as the B and D arguments. If there are still unmapped arguments, a new DSP48 instance is created and connected to the previous one in an accumulative fashion. The B and D argument selection of the DSP48 unit is repeated. The operation continues until all arguments of the addition node are assigned. The last mapping procedure holds the created DSP48 instance, so that tracing back to the node that requested the mapping enables use of the multiplier node.
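The expansion of a multi-argument addition node into accumulating DSP48 clusters can be sketched as below. The sketch assumes a simplified cluster model P = (D + B) · A + accumulator input, with the constant 1 on A bypassing the multiplier and the constant 0 filling an unused D input. The class `Dsp48Cluster`, the function names and the choice of which argument goes to B and which to D are illustrative assumptions, not the author's implementation. At least one argument is expected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dsp48Cluster:
    b: str                     # argument on the B input
    d: str                     # argument on the D input ("0" when unused)
    a: str = "1"               # constant 1 bypasses the multiplier stage
    acc: Optional[int] = None  # index of the cluster feeding the accumulator


def expand_addition(args):
    """args: list of (name, cycle_id); returns a chain of clusters."""
    # Arguments with the smallest cycle identifiers are consumed first.
    ordered = [name for name, _ in sorted(args, key=lambda item: item[1])]
    clusters = []
    for i in range(0, len(ordered), 2):
        pair = ordered[i:i + 2]
        d = pair[1] if len(pair) == 2 else "0"
        acc = len(clusters) - 1 if clusters else None
        clusters.append(Dsp48Cluster(b=pair[0], d=d, acc=acc))
    return clusters


def evaluate(clusters, values):
    """Reference evaluation of the chain under the simplified model."""
    env = {**values, "0": 0, "1": 1}
    results = []
    for c in clusters:
        acc = results[c.acc] if c.acc is not None else 0
        results.append((env[c.d] + env[c.b]) * env[c.a] + acc)
    return results[-1]


chain = expand_addition([("x", 2), ("y", 0), ("z", 1), ("w", 0), ("v", 3)])
total = evaluate(chain, {"x": 4, "y": 1, "z": 3, "w": 2, "v": 5})
```

Five arguments produce three chained clusters here: the first takes the two arguments available in cycle 0, and the last pairs the remaining argument with the constant 0.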
4.4. Hardware mapping summary

The Spartan 6 FPGAs offer a rich set of resources able to implement arithmetic calculations. In order to explain the differences, an exemplary PID controller design has been implemented with the presented strategies (Tab. 1). The results were obtained by generating the mapped design in Verilog HDL, using specific components according to the guidelines of the XST synthesis tool (Xilinx, 2013). LUT generators operating as logic and as memory components are distinguished. The DSP48 utilization distinguishes between multiplier-only usage and full utilization of the unit. In order to emphasize the differences in mapping, one (1CH) and two independent instances (2CH) of the PID controller are considered.
The mapping pattern for a multiplication node with multiple arguments is shown in Fig. 8.2. It is a rare case in signal processing, but a general approach must address it. The nodes with the smallest ASAP time factor are selected. The operation is repeated iteratively, creating DSP48 clusters until all arguments are assigned. It should be noted that the DSP48 defers the next
The direct mapping strategy (DIR) is the simplest one. Registers dominate, since they store all variables and
The presented algorithms belong to an originally developed synthesis tool for hardware implementation of LD, IL and SFC control programs. The compilation and synthesis tool is the subject of ongoing research and development. It is planned to extend arithmetic support to floating point numbers and to further improve the scheduling and mapping processes.
Tab. 1. Mapping strategies implementation result comparison

Type  Map alg.  FF   LUT LOG  LUT MEM  DSP48 MUL  DSP48 ALL  fMAX [MHz]  tC  TH [Mc/s]
1CH   DIR       297  115      -        4          -          197.2       7   28.1
1CH   SHR       189  136      -        1          -          130.6       7   18.6
1CH   SDR       112  50       18       1          -          180.3       7   25.7
1CH   D48D      24   28       72       -          1          260.5       8   32.5
1CH   D48B      22   20       -        -          1          302.9       10  30.2
2CH   DIR       594  230      -        8          -          185.3       7   26.4
2CH   SHR       351  175      -        1          -          128.4       10  12.8
2CH   SDR       204  86       18       1          -          178.1       10  17.8
2CH   D48D      28   32       72       -          1          239.8       10  23.9
2CH   D48B      22   23       -        -          1          302.9       14  21.6
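The throughput column of Tab. 1 appears to be the maximal clock frequency divided by the number of calculation cycles tC. The paper does not state this relation explicitly, so the check below is an observation on the tabulated numbers rather than the author's definition; the names `TAB1` and `throughput_mcs` are introduced only for this sketch.

```python
# (fMAX [MHz], tC [cycles], TH [Mc/s]) copied from Tab. 1
TAB1 = {
    ("1CH", "DIR"): (197.2, 7, 28.1),
    ("1CH", "SHR"): (130.6, 7, 18.6),
    ("1CH", "SDR"): (180.3, 7, 25.7),
    ("1CH", "D48D"): (260.5, 8, 32.5),
    ("1CH", "D48B"): (302.9, 10, 30.2),
    ("2CH", "DIR"): (185.3, 7, 26.4),
    ("2CH", "SHR"): (128.4, 10, 12.8),
    ("2CH", "SDR"): (178.1, 10, 17.8),
    ("2CH", "D48D"): (239.8, 10, 23.9),
    ("2CH", "D48B"): (302.9, 14, 21.6),
}


def throughput_mcs(f_max_mhz, t_c):
    """Assumed relation: one result every t_c clock cycles at f_max."""
    return f_max_mhz / t_c


# Largest deviation between the assumed relation and the tabulated values.
worst = max(abs(throughput_mcs(f, tc) - th) for f, tc, th in TAB1.values())
best = max(TAB1, key=lambda key: TAB1[key][2])
```

Every row agrees with fMAX / tC to within 0.1 Mc/s, and the highest tabulated throughput belongs to the single-channel D48D variant.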
REFERENCES

Cenelec (2013). EN 61131-3, Programmable Controller – Part 3: Programming Languages, Intern. Standard, Management Centre, Avenue Marnix 17, Brussels.
Chmiel M., Hrynkiewicz E. (2010) Concurrent operation of processors in the bit-byte CPU of a PLC. Control and Cybernetics, vol. 39, iss. 2, pp. 559-579.
Canis A., Choi J., Aldham M., Zhang V., Kammoona A., Anderson J.H., Brown S., Czajkowski T. LegUp: High-level synthesis for FPGA-based processor/accelerator systems, ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), pp. 33-36, Monterey, CA, February 2011.
Coussy P., Chavet C., Bomel P., Heller D., Senn E., Martin E.: GAUT: A High-Level Synthesis Tool for DSP Applications. From C Algorithm to RTL Architecture, Coussy P., Morawiec A. (eds.) High-Level Synthesis. From Algorithm to Digital Circuit, Springer Science + Business Media, 2008.
Czerwiński R., Kania D. (2013) Finite state machine logic synthesis for complex programmable logic devices. Springer, Berlin, 2013.
Du D., Xu X., Yamazaki K. (2010) A study on the generation of silicon-based hardware PLC by means of the direct conversion of the ladder diagram to circuit design language, The International Journal of Advanced Manufacturing Technology, Springer London, 2010, vol. 49, issue 5, pp. 615-626.
David R. (1995) Grafcet: A powerful tool for specification of logic controllers, IEEE Transactions on Control Systems Technology, vol. 3(3), 1995, pp. 253-268.
Deschamps J.P., Bioul G.J.A., Sutter G.D.: Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems, John Wiley & Sons, 2006.
Economakos C., Economakos G. (2008) FPGA implementation of PLC programs using automated high-level synthesis tools, IEEE International Symposium on Industrial Electronics, pp. 1908 – 1913.
Economakos C., Economakos G. (2012) C-based PLC to FPGA translation and implementation: The effects of coding styles, 16th International Conference on System Theory, Control and Computing, pp. 1-6, 12-14 Oct. 2012.
Gajski D., Dutt N., Wu A., Lin S. (1994) High-Level Synthesis Introduction to Chip and System Design, Kluwer Academic Publishers.
Hadjis S., Canis A., Anderson J.H., Choi J., Nam K., Brown S., Czajkowski T.: Impact of FPGA Architecture on Resource Sharing in High-Level Synthesis, ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, February 2012.
Hopcroft J.E., Ullman J.D. (1979) Introduction to Automata Theory, Languages and Computation, Addison-Wesley Publishing Company, Massachusetts, 1979.
coefficients. The LUT utilization is acceptable. The next algorithm (SHR - shared) implements resource sharing but reveals the problem of its additional cost. It reduces the calculation resources, but the overall LUT utilization is higher than in the direct approach. Utilizing distributed RAMs (SDR - shared with distributed RAM) allows reduction of registers and multiplexing resources. The mapping strategies mentioned so far are based on elementary arithmetic components, and the DSP48 core is used as a multiplier only. The DSP48 core mapping strategy enables full utilization of the core and a further reduction of resource requirements. There are two implementation methods, differing in how variable storage is implemented: D48D uses the DSP48 with distributed RAMs, while D48B uses the RAMB8 block memories. In both cases the use of general purpose logic is significantly reduced, as all arithmetic computations are implemented in the DSP48 block. Finally, the throughput (TH), expressed as the number of fully processed samples per second, should be analyzed. The D48D method performs well and gives the best trade-off between resource requirements and performance.

5. SUMMARY

The paper presents the entire synthesis process of a hardware implemented controller, from an abstract description to hardware mapping. The paper recalls previously presented methods of translating PLC languages to the original intermediate form, the EDFG. The obtained structure represents the entire control process that is the subject of hardware mapping. Finally, the hardware mapping stages are presented. The mapping procedure starts with accommodation of the EDFG to hardware components. The multiple argument node expansion problem is discussed, with the aim of maximizing parallel execution of the algorithm. The other problem addressed is adder optimization with the use of attributed edges. The hardware-prepared EDFG is scheduled with two different approaches.
The direct approach allows simple transformation into a hardware structure. Hardware component reuse has been introduced in the second method, which increases the utilization of hardware components. Finally, the DSP48 component mapping procedure is shown. It introduces a mapping strategy that expands operation nodes into DSP48 patterns. The method benefits from the pipelined architecture and the accumulative adder of the DSP48 unit.
Ichikawa S., Akinaka M., Hata H., Ikeda R., Yamamoto H. (2011) An FPGA implementation of hard-wired sequence control system based on PLC software. IEEJ Trans Elec Electron Eng, 6: 367–375. doi: 10.1002/tee.20670
John K. H., Tiegelkamp M. (2010) IEC 61131-3: Programming Industrial Automation Systems: Concepts and Programming Languages, Requirements for Programming Systems, Decision-Making Aids, Springer-Verlag, Berlin Heidelberg.
Milik A., Hrynkiewicz E. (2014) On Translation of LD, IL and SFC Given According to IEC-61131 for Hardware Synthesis of Reconfigurable Logic Controller, Proc. of IFAC World Congress, Cape Town, 2014.
Milik A. (2013) On Hardware Synthesis of Reconfigurable Logic Controllers From Ladder Diagrams According to IEC61131-3, IFAC Workshop on Programmable Devices and Embedded Systems, 2013.
Milik A., Hrynkiewicz E. (2012) Synthesis and implementation of reconfigurable PLC, International Journal of Electronics and Telecommunications, vol. 58, nr 1, March 2012, pp. 85-94.
Mocha J., Kania D. (2012) Hardware Implementation of a control program in FPGA structures, Przegląd Elektrotechniczny, Dec. 2012, vol. 88, issue 12a, pp. 95-100.
Paulin P.G., Knight J.P.: Algorithms for high-level synthesis, IEEE Design & Test of Computers, vol. 6, iss. 6, pp. 18-31, 1989.
Sun W., Wirthlin M. J., Neuendorffer S.: FPGA Pipeline Synthesis Design Exploration Using Module Selection and Resource Sharing, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, pp. 254-265, Feb. 2007.
Wang G., Gong W., Kastner R. (2008) Operation Scheduling: Algorithms and Applications, Coussy P., Morawiec A. (eds.) High-Level Synthesis. From Algorithm to Digital Circuit, Springer Science + Business Media, 2008.
Welch J.T., Carletta J. (2000) A direct mapping FPGA architecture for industrial process control applications, International Conference on Computer Design, pp. 595-598, 2000. doi: 10.1109/ICCD.2000.878352
Wirth N. (1976) Algorithms + Data Structures = Programs, Prentice Hall, 1st Edition.
Wyrwoł B., Hrynkiewicz E. (2013) Decomposition of the fuzzy inference system for implementation in the FPGA structure, Int. J. of App. Math. and Comput. Sci., 2013, vol. 23, no. 2, pp. 473-483.
Xilinx (2009) UG389, Spartan-6 FPGA DSP48A1 Slice.
Xilinx (2013) UG XST User Guide for Virtex-6, Spartan-6, and 7 Series Devices.