ARTICLE IN PRESS
JID: MICPRO
[m5G;November 24, 2015;12:11]
Microprocessors and Microsystems 000 (2015) 1–10
Contents lists available at ScienceDirect
Microprocessors and Microsystems journal homepage: www.elsevier.com/locate/micpro
An IEC 61131-3-based PLC implemented by means of an FPGA

M. Chmiel a, J. Kulisz a,∗, R. Czerwinski a, A. Krzyzyk b, M. Rosol b, P. Smolarek a

a Institute of Electronics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka Str. 16, 44-100 Gliwice, Poland
b Cadence Design Systems, Inc., Katowice, Poland
Article info

Article history: Received 29 September 2015; Accepted 2 November 2015; Available online xxx

Keywords: Programmable logic controllers; EN 61131-3 Standard; Central processing units; Arithmetic and logic units; Floating-point arithmetic; Field programmable gate arrays
Abstract

The paper discusses the design process of a programmable logic controller (PLC) implemented by means of an FPGA device. At the machine language level, the PLC implements a subset of the instruction set defined in the EN 61131-3 norm. Different aspects of the instruction list and hardware architecture design are presented; however, two aspects are the most important: the Central Processing Unit (CPU) and the Arithmetic and Logic Unit (ALU). The ALU can execute 34 operations, which include the basic logic operations, comparators, and the four basic arithmetic operations. The operations can be performed on fixed-point and floating-point numbers. All the operations are implemented fully in hardware, so the solution is fast. The developed PLC is implemented using an FPGA device; however, the HDL models used for synthesis can be easily ported to an ASIC. © 2015 Published by Elsevier B.V.
1. Introduction

Programmable Logic Controllers (PLCs) have gained great popularity over the last 50 years as a convenient means of implementing control systems. For many years PLCs were constructed using general-purpose microprocessors or Microcontroller Units (MCUs), e.g. the popular MCS-51 from Intel [1,2]. Such solutions were cheap, but quite inconvenient for implementing many functionalities specific to PLCs. The PLC programs had to be compiled to the machine language of the target MCU, and the specific PLC functionality was implemented fully in software. Field Programmable Gate Arrays (FPGAs), and Application Specific Integrated Circuits (ASICs) in general, offer an interesting alternative to MCU-based PLC design. Logic resources of modern FPGA devices can reach millions of equivalent logic gates [3,4], which is sufficient to implement a full 32-bit CPU, together with most peripherals, in a single device. The great popularity of PLCs led many vendors to offer many different kinds of controllers. An attempt to bring some order to the PLC market led to the establishment of the EN 61131 standard. EN 61131 is a standard for programmable controllers [5]. It consists of nine parts; however, the most important one for the Authors' research is part three (EN 61131-3). This part discusses programming languages. Two groups of languages are specified: text languages and graphic
∗
Corresponding author. Tel.: +48 322371495. E-mail addresses:
[email protected] (M. Chmiel),
[email protected] (J. Kulisz),
[email protected] (R. Czerwinski),
[email protected] (A. Krzyzyk),
[email protected] (M. Rosol),
[email protected] (P. Smolarek).
languages. Graphical languages are the most common in industry. On the other hand, the most convenient form of language, both during the design of the controller and for programmers, is the Instruction List (IL). The instruction list is especially helpful for testing, commissioning and improving control programs. Some manufacturers offer controllers that can be programmed using languages classified under the EN 61131-3 Standard [1,2,6]. Very often, however, the hardware structure of the PLC is not compatible with the software standard: the manufacturer uses a compiler that first translates a program written in a standard-based language to the “native” language and then compiles it for the controller [7]. Such an approach often makes the use of the PLC suboptimal. As a matter of fact, the controller resources are not aligned with the standard. The logic capacity and flexibility of modern FPGAs give us an opportunity to develop and test different solutions and to build prototypes of PLCs [8,9,10,11,12]. The presented design is based on the classical “software” architecture: the PLC is realized as a micro-programmed unit whose operation is based on the control program. Implementation of the PLC gives us a chance to check the advantages and disadvantages of the standard. The feasibility of the standard collections can also be checked [13,14]. Moreover, it is possible to embed function blocks like B-BAC, written in a standard-based language, into the PLC and increase the effectiveness of the control program [15]. The above-mentioned considerations inspired a group of researchers from the Silesian University of Technology to develop a prototype design of a PLC, and to implement it as a whole in an FPGA device. Another objective of the project was to design a CPU with the instruction list compatible with the EN 61131-3 Standard
http://dx.doi.org/10.1016/j.micpro.2015.11.001 0141-9331/© 2015 Published by Elsevier B.V.
Please cite this article as: M. Chmiel et al., An IEC 61131-3-based PLC implemented by means of an FPGA, Microprocessors and Microsystems (2015), http://dx.doi.org/10.1016/j.micpro.2015.11.001
[7]. With this design the objective was to achieve compatibility with the norm at the machine language level. However, it has to be stressed that only a subset of the norm is actually implemented, not the full norm. An objective of the norm was to define a set of operations and a variety of data types that is as complete as possible, even redundant, and is supposed to cover all imaginable problems. Many operations (usually described as “functions”) are very complex, and their direct implementation in hardware would be very difficult and impractical. From this set a subset of 122 instructions was chosen for implementation. The idea of developing a PLC CPU that is optimized for executing control programs written in a particular language was presented in a number of works, e.g. in [16] for the LAD format, and in [17] and [18] for IL. The purpose of the paper is to present a concept of a classical implementation of a programmable logic controller by means of FPGA logic devices. The presented PLC concept is compatible with the EN 61131-3 Standard. The term “classical software architecture” means a software processor. In fact, it should be stated that the presented solution is a System on Chip. The presented CPU is really a software processor; however, it is supported by hardware modules, e.g. timers and counters are realized as hardware blocks and work in parallel (concurrently) with the CPU [19,20,21]. The most important aspect of the paper is the design of the Arithmetic and Logic Unit (ALU). A characteristic feature of the ALU is that all supported arithmetic operations are implemented fully in hardware. The paper concerns three different aspects: the EN 61131-3 standard, elements of the PLC structure, and FPGA resources.

2. PLC structure

There are many degrees of freedom when starting to design a PLC. One constraint was obvious, namely the EN 61131-3 Standard, but the others had to be assumed. It has been assumed that instructions are unary or without any argument.
Because of the experimental and research character of the work, there is no need to build a huge PLC. The basic assumptions include: the data bus is 32 bits wide, the address bus is 8 bits wide, and the control bus is 10 bits wide. A generalized structure of the PLC is presented in Fig. 1. The most important part of the controller is the Central Processing Unit (CPU). The CPU includes elements that control the operation of the entire controller, i.e. read and execute instructions. Moreover, the CPU includes memories dedicated to markers (bit memory markers and double word memory markers) with marker controllers. The central processing unit is divided into two main parts: the bit logic unit and the double word arithmetic-logic unit. Both parts operate on stacks of Current Result (CR) registers. There is also the overflow flag (OV) inside the CPU. The purpose of the two separate data processing units is to accelerate execution of simple operations on binary variables [19,21]. Counters and timers are realized as hardware modules that support operation of the central processing unit. Counters and timers are based on 30-bit wide data. The timer resolution is equal to 1 ms. From the “outside” point of view, the most important feature is the possibility of connecting inputs and outputs. The need to support different signal standards (24VDC, 120VAC, 10 V for analogue I/O) made it necessary to connect signal modules outside the FPGA. The I/O controller, which is a part of the PLC implemented inside the FPGA, is responsible for communication with the signal modules. Input and output image memories are parts of the I/O controller. The imaging is synchronized with the program cycle (control loop). In the control program, inputs are read from and outputs are written to the image memories. The control bus includes signals for controlling communication of the CPU with internal modules like:
• counters: CU – Count Up, CD – Count Down, SC – Set Counter, R – Reset,
• timers: TOF – Timer OFF, TP – Timer Pulse, TON – Timer ON,
• or I/O controller.

The control bus also includes signals for reading data from modules into the CPU (RD) and for writing data into modules from the CPU (WR). The end of the control loop is signaled by means of the END bit in the control bus.

3. Memory controller

One of the most important features of PLCs is memory. It is hard to imagine a control program in a real industrial solution that uses no data memory. The second kind of memory is the program memory. The structure of the data memory is the most interesting for programmers, and it should be carefully constructed in the PLC. Data in PLCs have different formats. First of all, bit variables are used for bit calculations. However, it is necessary to use “wider” data for calculations based on byte, word, double word or floating-point variables. It would be good if different data formats were accessible within a single memory. For example, the M0.0 bit should be a part of the M0 double word marker. Programmers should have access both to the bit and to the entire double word. PLC vendors commonly offer this possibility; however, access to a bit is sometimes done by masking operations, explicitly or implicitly, i.e. by the software after program compilation. There are microprocessors, like the ARM Cortex-M3, which have immediate access to bits and double words thanks to the bit-band alias [22]. Similar functionality is also available in modern FPGA devices. As a rule, modern FPGA devices contain dedicated memory blocks (Block RAMs), which can be configured as true dual-port memories [23]. These features of Block RAMs are used in the presented PLC. For example, there are two double word memory markers that are bit accessible as bit markers (Fig. 2b). These two double word memory markers can also be accessed as ordinary double word memory markers.
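The idea of one storage array seen through a bit view and a double word view can be illustrated with a small software model. This is only a sketch, not the authors' HDL: the class name `MarkerMemory` and the mapping "bit address = 32 × word index + bit index" are assumptions made for illustration, since the paper does not spell out the mapping.

```python
class MarkerMemory:
    """Toy model of a marker memory with a bit view and a double word view.

    Assumption (not from the paper): bit k of double word w has the
    bit address w * 32 + k.
    """

    WORD_BITS = 32

    def __init__(self, n_dwords=2):
        # One shared storage array; both views read and write it.
        self.dwords = [0] * n_dwords

    def read_dword(self, w):
        return self.dwords[w]

    def write_dword(self, w, value):
        self.dwords[w] = value & 0xFFFFFFFF

    def read_bit(self, bit_addr):
        w, k = divmod(bit_addr, self.WORD_BITS)
        return (self.dwords[w] >> k) & 1

    def write_bit(self, bit_addr, bit):
        w, k = divmod(bit_addr, self.WORD_BITS)
        if bit:
            self.dwords[w] |= 1 << k
        else:
            self.dwords[w] &= ~(1 << k) & 0xFFFFFFFF


mem = MarkerMemory()
mem.write_bit(0, 1)    # set M0.0 through the bit view
mem.write_bit(33, 1)   # bit 1 of the second double word
mem.write_dword(0, mem.read_dword(0) | 0x80000000)  # double word view, same storage
```

In hardware the two views would correspond to the two ports of a dual-port Block RAM configured with different data widths, so no masking instructions are needed in the control program.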
Input and output image memories can also be arranged as bit/double word memories. Due to the 8-bit memory address in the designed PLC, only 256 memory cells can be addressed. In fact, each address space is half that size, because one bit of the address (the MSB) is used as the bit/double word designator (as presented in the example in Fig. 2c). It is possible to address 128 bits and 128 double words. Those 128 bits are parts of double words thanks to the true dual-port RAM. Four double words are bit accessible: one double word of the input image memory, one double word of the output image memory, and two double words of the markers memory. However, the memory map is designed for research purposes. There are no obstacles to extending the address bus and, as a result, the memory map too.

4. Central processing unit

The programmable logic controller CPU is designed to execute the instructions specified by the programmer. The instruction list of the presented CPU was designed based on the EN 61131-3 Standard, in particular on its text language, Instruction List (IL). The instruction list of the designed unit consists of the following operations: copying bit/double word data, bitwise operations on bit/double word data, arithmetic, rotation and move operations on double word data, trigger detection, jumps, and servicing of counters and timers. The complete instruction list can be found in [24]. As already stated, instructions in IL are unary or without any argument. The first part of the instruction is the opcode. In this project the CPU executes an instruction set consisting of 120 types of commands, so the operation code is 7 bits wide.
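The opcode width follows directly from the number of commands, and together with the 32-bit operand used in this design it gives the 39-bit program memory word. The following sketch shows the arithmetic and one possible packing. The field order (opcode in the upper bits) and the function names `encode` and `decode` are assumptions for illustration only; the paper does not state the layout of the word.

```python
N_COMMANDS = 120
OPERAND_BITS = 32

# Smallest number of bits that can distinguish 120 opcodes (0..119).
OPCODE_BITS = (N_COMMANDS - 1).bit_length()
WORD_BITS = OPCODE_BITS + OPERAND_BITS


def encode(opcode, operand=0):
    """Pack opcode and operand into one program memory word.

    Assumed layout: opcode in the upper bits, operand in the lower 32 bits.
    """
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit into the opcode field")
    return (opcode << OPERAND_BITS) | (operand & 0xFFFFFFFF)


def decode(word):
    """Split a program memory word back into (opcode, operand)."""
    return word >> OPERAND_BITS, word & 0xFFFFFFFF
```

With 7 opcode bits, up to 128 commands can be distinguished, so the instruction set still has a small reserve of unused codes.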
Fig. 1. The structure of the presented programmable logic controller.
Fig. 2. a) Model of dual-port RAM. b) Fragment of markers memory organization. c) Example of bit/byte addressing scheme.
[Figure content: a dual-port RAM with ports A (m bits) and B (n bits), each with its own control inputs (CTRL A, CTRL B), sharing one memory array; marker words MW 0 to MW 3, bits 31 to 0; addressing example with DI1 at Addr 1000_0001 and DI0 at Addr 0000_0001, address bits 7 to 0.]
The second part of a command is the optional operand, which may be data or a constant. The standard does not specify the size of the operands; it can actually be anything. As the most common data in industry are stored on 16 bits (Integer) or 32 bits (Real), it was decided to use 32-bit wide data. The combined operation code and operand result in a memory cell 39 bits wide. A one-operand CPU requires an extra special register, called CR (Current Result) in the standard [5,7]. Each operation uses the contents of the register, and the result is written back to CR. As programmable controllers perform operations on numeric variables and on single bits, the unit is equipped with two types of CR: CR_W0 (32 bits) for numeric operations and CR_b0 (1 bit) for bit operations. Moreover, to use “bracket operations” it was necessary to build in stacks of CRs (by means of the LIFO, Last In First Out, data structure). The structure of the CPU is presented in Fig. 3. The most important parts of the CPU are:
(a) Program counter (PC) – it is implemented as a 10-bit counter module synchronized by the main system clock, with the possibility of writing values directly (for “jump” execution);
(b) Program memory (PM) – it contains the program written by the programmer; it has been placed in the structure of a “Block
Fig. 3. The structure of the CPU.
[Figure content: block diagram with the Program Counter, Program Memory (PM_ADDR, 39-bit word), Command Counter, Instruction Decoder, Double Word Arithmetic-Logic Unit (ALU_W) with the Word Stack, CR_W0 and the OV flag, and Bit Logic Unit (ALU_b) with the Bit Stack and CR_b0, connected by the Address Bus, Word Data Bus, DATA_BIT_BUS and Control Bus.]
RAM” of an FPGA; at this stage of the research, the program is loaded during the FPGA configuration process;
(c) Command counter (CC) – it is a special block that controls the duration (number of system clock cycles) of particular instructions. It has direct access to the command code and the instruction decoder. Thanks to the command counter, the CPU is able to carry out instructions of any execution duration, e.g. multiplication or division instructions that take more than one system clock cycle (Table 4);
(d) Bit-logic unit (ALU_b) – it performs operations on bits. The bit-logic unit includes a processing module, CR_b0, and a 256-bit stack for bracket instructions. It executes commands specific to PLCs, like AND/OR/XOR/NOT;
(e) Arithmetic-logic unit (ALU_W) – it performs operations on 32-bit variables. It includes the processing module, CR_W0, and a 256-double-word stack for bracket instructions. In addition, the ALU generates the overflow (OV) flag. The ALU_W is described in detail in Section 5;
(f) Data bus (DB) – it enables communication between different PLC modules. It must therefore be very “flexible”. There are no tri-state buffers inside contemporary FPGAs, so it is impossible to create a traditional bi-directional bus. Each module must contain separate inputs and outputs in order to write and read information. As a result, every module using double word data contains a 64-bit port. Such a data bus was created using multiplexers, and it includes: six one-bit inputs, seven 32-bit inputs, one one-bit output and one 32-bit output. Moreover, to execute instructions and to ensure flexible communication, a connection has been created between the least significant bit of the 32-bit output and the one-bit input, as well as between the one-bit output and the least significant bit of the 32-bit input. It is therefore possible to transmit one-bit data from markers memory by means of the 32-bit output of
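Fig. 4. An instruction cycle of the designed PLC.
[Figure content: state diagram with states 0 Initialization, 1 Decoding, 2 Execution and 3 Instruction fetch; transitions occur on CLK; the Execution state is held while Command Counter <> 0 and is left when Command Counter = 0.]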
markers memory. Communication via the data bus controller consists in creating a suitable channel in the data bus multiplexers. This channel is determined by the instruction decoder;
(g) Instruction decoder (ID) – it decodes the operation code and generates control signals for all modules of the PLC. It has been designed as a finite state machine implementing the four main phases of the instruction cycle (Fig. 4): initialization, decoding, execution, and instruction fetch for the next command. The transition to each subsequent phase occurs on the rising edge of the system clock, except for the instruction execution state. The execution phase lasts a specified number of clock cycles (see the command counter). During the initialization phase the command counter is set to the number of clock cycles required to execute the instruction (Table 4). During the decoding phase the appropriate control signals are prepared. The third phase is the execution of the instruction. In this phase the instruction decoder waits until the command counter reaches zero. In the meantime it resets the increment or decrement signals for the stacks, prepares the PC and resets the control signals. The last phase of the state machine of the instruction decoder is
Fig. 5. Three clock cycle instruction realization diagram.
Fig. 6. One clock cycle instruction realization diagram.
preparation for the next instruction cycle, e.g. the signal that increments the PC is reset. Time analysis of the state machine is shown in Figs. 5 and 6. The timing diagram presented in Fig. 5 shows the execution of the first instruction from the moment the system is turned on. It takes three clock cycles for the ALU to generate the correct result. The time analysis presented in Fig. 6 shows the execution of a command “in the middle of the program”, for which the ALU generates the result in one clock cycle. Markers memory is the memory located in the CPU, where the user can store any data: single-bit or 32-bit. It is built into Block RAMs, and its structure was described in Section 3.
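The interplay between the instruction decoder and the command counter described in item (g) can be sketched as a small generator that lists the decoder state on every clock tick. This is an illustrative model only: the function name `instruction_cycle` is hypothetical, and the exact cycle accounting (for example, whether the initialization state is revisited for every instruction) is an assumption, not something the text states explicitly.

```python
def instruction_cycle(exec_cycles):
    """Yield the decoder state for every clock tick of one instruction.

    exec_cycles is the value loaded into the command counter, i.e. the
    number of clock cycles the ALU needs for this instruction.
    Assumption: all four phases are passed once per instruction.
    """
    if exec_cycles < 1:
        raise ValueError("an instruction needs at least one execution cycle")
    command_counter = exec_cycles      # loaded during initialization
    yield "initialization"
    yield "decoding"
    while command_counter != 0:        # execution is held until the counter is zero
        command_counter -= 1
        yield "execution"
    yield "instruction fetch"


one_cycle = list(instruction_cycle(1))   # e.g. a logic operation
multi_cycle = list(instruction_cycle(9)) # e.g. a long arithmetic operation
```

The point of the model is that only the execution state has a variable duration; all other states take exactly one clock cycle, so the decoder itself does not need to know how a particular ALU operation is implemented.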
5. Arithmetic and logic unit

It was decided that the PLC would support two number types: integer and real. Both types are represented as 32-bit wide data words. Integer numbers use the well-known Two’s Complement (TC) format [25], and the real numbers are represented as single precision floating point numbers according to the IEEE 754 Standard [26].
5.1. Implementation of the ALU – the combinatorial version

It was decided that the first version of the ALU would be a fully combinatorial structure, consisting of separate subcircuits responsible for generating results for the relevant operations. A block diagram of the combinatorial version of the ALU is presented in Fig. 7. As the diagram shows, the input arguments to be processed by the ALU are applied to the inputs of all subcircuits. The subcircuits operate in parallel, and each of them generates the result for the relevant operation. The final result is selected by the output multiplexer, and transferred to the output of the whole ALU. The address inputs of the multiplexer are controlled by the opcode delivered by the control unit of the CPU. In general, the subcircuits responsible for performing the individual operations do not share resources, although addition and subtraction were implemented in one module. The objective of the combinatorial implementation was to obtain an estimate of the maximum speed that can be achieved by the ALU for a target technology. For this reason resource sharing was also avoided. If the same resource were used by several subcircuits of the ALU, large multiplexers would have to be inserted into the circuit structure, and this would lengthen signal paths and increase time delays.
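The "compute everything in parallel, then select" structure can be expressed as a short behavioural model. The sketch below is not the authors' VHDL: it covers only five of the operations, the opcode values are hypothetical, and the overflow rule for addition and subtraction (signed 32-bit overflow) is an assumption.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit word as a two's complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def _arith(value):
    """Return (32-bit result, OV flag) for an exact integer result."""
    return value & MASK, not (-(1 << 31) <= value < (1 << 31))


# Hypothetical opcode assignment; each entry models one subcircuit.
SUBCIRCUITS = {
    0b00000: lambda a, b: _arith(to_signed(a) + to_signed(b)),  # ADD_I
    0b00001: lambda a, b: _arith(to_signed(a) - to_signed(b)),  # SUB_I
    0b00010: lambda a, b: (a & b & MASK, False),                # AND
    0b00011: lambda a, b: ((a | b) & MASK, False),              # OR
    0b00100: lambda a, b: ((a ^ b) & MASK, False),              # XOR
}


def alu(opcode, a, b):
    """Model of the combinatorial ALU: returns (result, OV)."""
    # Every subcircuit sees both arguments and produces its own result.
    results = {code: circuit(a, b) for code, circuit in SUBCIRCUITS.items()}
    # The output multiplexer, addressed by the opcode, picks one of them.
    return results[opcode]
```

In software this evaluation of all branches is wasteful, but it mirrors the hardware: the delay of an operation is the delay of its own subcircuit plus the output multiplexer, independent of the other subcircuits.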
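Fig. 7. Structure of the combinatorial version of the ALU.
[Figure content: the 32-bit arguments A and B are applied in parallel to the subcircuits (Add/Sub Int, Mul Int, Add/Sub Real, Mul Real, Compare, and others); each subcircuit delivers a 32-bit result and an OV flag, the comparator delivers a comparator output; an output multiplexer controlled by the 5-bit Opcode selects the 32-bit Result, the OV flag and the comparator output.]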
The combinatorial version of the ALU performed 32 operations, which included:
• the four basic arithmetic operations on integer and real numbers (i.e. ADD_I, SUB_I, MUL_I, DIV_I, ADD_R, SUB_R, MUL_R, DIV_R),
• the modulo operation for integer numbers (MOD_I),
• conversion from integer to real (ITR), and from real to integer (Round),
• sign conversion for integer and real numbers (NEG_I, NEG_R),
• the six basic comparators for integer and real numbers (EQ_I, NE_I, GT_I, GE_I, LT_I, LE_I, EQ_R, NE_R, GT_R, GE_R, LT_R, LE_R),
• three logic operations on 32-bit data words (AND, OR, XOR),
• shift to the left, and shift to the right operations (SL, SR),
• two rotation operations (RL, RR).

A more detailed description of the combinatorial ALU, together with the full list of the implemented operations, can be found in [27]. The design was prepared as a set of synthesizable VHDL modules. Implementation of logic operations in VHDL is simple, as logic operations can be directly synthesized. This also applies to comparators, fixed point addition, and subtraction, if appropriate numerical packages are included. If vectorized notation is used for the arguments, together with the relevant operators in the VHDL code, the synthesis software is capable of identifying the relevant operations and implementing them using fast carry generators, which is a very efficient and fast solution [28]. In the comparator block only two comparators were implemented directly, i.e. “Greater than” and “Equal”. The remaining operations
were generated using simple logic relations, e.g. “Less than or equal” is equivalent to “not Greater than”. The floating point comparators use basically the same circuitry as their fixed point counterparts. They differ slightly, as the floating point representation is based on the sign-magnitude concept. Floating point comparators also check whether the input arguments are correct. Synthesizing multiplication and division is a distinct problem. Several algorithms for performing these operations can be found in the literature. The synthesis software is usually not capable of synthesizing the operations directly, and the designer has to specify the circuit structure at a lower level of abstraction. Another option is to use IP core generators. However, this would yield a solution that is bound to the synthesis software used, and to the target architecture. As one of the objectives of the project was to prepare a design that is architecture-independent and portable between software platforms, this approach was dropped. Fortunately, modern FPGA architectures as a rule contain dedicated functional blocks (usually referred to as “DSP Blocks”) that facilitate implementation of more complex arithmetic operations, and multiplication in particular [29]. The ISE software that was used for synthesis is capable of synthesizing fixed point multiplication directly, provided that the target FPGA structure contains blocks of this kind. It was assumed for the design that the target architecture would contain some kind of DSP blocks. Multiplication was thus described in VHDL in a similar manner to addition and subtraction, i.e. by using vectorized notation and the “*” operator. In such a case the synthesis software implements fixed point multiplication using DSP blocks. The solution is both efficient in terms of resource usage and very fast [30].
However, this concept could not be applied to division operations, as no functional blocks performing division can be found in FPGA architectures. Finally, the fixed point division was implemented using the simple digit recurrence algorithm [25] iterated in space, i.e. as a structure consisting of cascaded adders/subtractors. Implementation of floating point operations in hardware is a more complex problem, as the floating point representation is not uniform, unlike the two’s complement representation. A floating point number consists of three components: the sign, the exponent, and the mantissa [26]. The actual operations should be performed on the mantissas of the arguments, but the exponents also have to be taken into account, as they contain information on the order of magnitude of a number. Fig. 8 shows a block diagram representing the structure of the subcircuit performing floating point addition and subtraction. In the first block (the “CompMux” block), the input arguments are compared, the greater and the smaller numbers are identified, and they are transferred to the “Gt” and “Le” outputs respectively. Then, in the “ExSub” block, the exponent of the smaller number is subtracted from the exponent of the greater number. The difference “delta” is transferred to the “MShifter” block, which is responsible for shifting the mantissa of the smaller number by the appropriate number of bits, before the mantissas are added or subtracted in the “MAddSub” block. Finally, in the “Norm” block, the resulting number is assembled and transferred to the circuit output. The “Norm” block is also responsible for handling special cases, like infinities. If the result cannot be represented as a regular floating point number, the “NAN” output is activated. The combinatorial ALU was synthesized using the ISE software from Xilinx, and implemented in a Spartan 6 device. Table 1 shows the resulting resource usage. As we can see, the combinatorial ALU consumes a vast amount of logic elements in the target structure.
This is the price for speed and parallel operation. The floating point operations and division (both fixed and floating point) in particular require many logic resources. Instruction execution times, which were obtained for the combinatorial ALU, are presented in Table 2.
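The data path of the floating point adder/subtractor described above (CompMux, ExSub, MShifter, MAddSub, Norm) can be followed step by step in a simplified behavioural model. This is a sketch under stated assumptions, not the authors' circuit: the result is truncated rather than rounded, three guard bits are kept while shifting (the constant `GUARD` is an assumption), subnormal numbers are not handled, and infinities and NaNs are only flagged. All function names are hypothetical.

```python
import struct

GUARD = 3  # extra low-order bits kept during alignment (assumption)


def f32_to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]


def unpack(bits):
    """Split a single precision word into sign, exponent and mantissa."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    man = bits & 0x7FFFFF
    if exp != 0:
        man |= 1 << 23              # restore the hidden leading one
    return sign, exp, man


def fadd(a, b, subtract=False):
    """Return (result_bits, nan_flag) for a + b, or a - b if subtract."""
    if subtract:
        b ^= 0x80000000             # subtraction = addition of the negated operand
    # CompMux: route the operand with the greater magnitude to "Gt".
    gt, le = (a, b) if (a & 0x7FFFFFFF) >= (b & 0x7FFFFFFF) else (b, a)
    s_gt, e_gt, m_gt = unpack(gt)
    s_le, e_le, m_le = unpack(le)
    if e_gt == 0xFF:                # infinity or NaN on input: only flagged here
        return 0x7FC00000, True
    # ExSub: difference of the exponents.
    delta = e_gt - e_le
    # MShifter: align the mantissa of the smaller number.
    m_gt <<= GUARD
    m_le = (m_le << GUARD) >> delta
    # MAddSub: add for equal signs, subtract otherwise.
    m = m_gt + m_le if s_gt == s_le else m_gt - m_le
    # Norm: renormalize the mantissa and assemble the result.
    if m == 0:
        return 0, False
    exp = e_gt
    if m >> (24 + GUARD):           # carry out of the mantissa
        m >>= 1
        exp += 1
    while not (m >> (23 + GUARD)):  # leading zeros after cancellation
        m <<= 1
        exp -= 1
    if not 0 < exp < 0xFF:          # not a regular floating point number
        return 0x7FC00000, True
    return (s_gt << 31) | (exp << 23) | ((m >> GUARD) & 0x7FFFFF), False
```

For example, `fadd(f32_to_bits(1.5), f32_to_bits(2.25))` goes through delta = 1, a one-bit shift of the smaller mantissa, and an addition that needs no renormalization. Because the greater operand is always routed to "Gt", the mantissa subtraction never yields a negative value and the sign of the result is simply the sign of "Gt".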
Fig. 8. Structure of the circuit performing floating-point addition/subtraction.
[Figure content: inputs A, B and Op enter the CompMux block, which delivers the greater (Gt) and the smaller (Le) argument split into exponents (EGt, ELe), mantissas (MGt, MLe) and sign (SLe); ExSub produces delta; MShifter produces MLe_Sh; MAddSub feeds the Norm block, which drives the Result and NaN outputs.]

Table 1
Resources usage for the combinatorial ALU implemented in Spartan 6.

Logic elements    Usage
DSP blocks        8
LUTs              4592

Table 2
Instruction execution times for the combinatorial ALU implemented in Spartan 6.

Instruction         Instruction execution time [ns]
Add/Sub. Integer    10
Mult. Integer       32
Div. Integer        173
Add/Sub. Real       27
Mult. Real          27
Div. Real           128
Word logic          9
Compare Integer     11
Compare Real        11
Shift/Rotate        16

The term “Instruction execution time” for the combinatorial ALU is in fact equivalent to the delay of the signal path starting at the ALU inputs, ending at the ALU outputs, and running through the subcircuit responsible for generating the result for the relevant operation. As the table shows, for most of the instructions the ALU is very fast, and the delays are not greater than 27 ns. The division operations are the exception. The digit recurrence algorithm translated to combinatorial hardware formed a structure containing extremely long signal paths with hundreds of logic levels. Delays of the combinatorial logic reaching 180 ns would degrade the frequency of the clock controlling the whole CPU, if the assumption was made that every instruction is executed in one clock cycle. So it was assumed for the design that the frequency of the clock would be adjusted to the delays of the faster operations, and that the division operations would be executed in more clock cycles. To accomplish this, a timer/counter circuit was implemented in the CPU, which counts an appropriate number of clock cycles for every instruction and generates a “Ready” signal for the control unit in the CPU. In the next step an attempt was made to optimize the ALU structure.

5.2. Implementation of the ALU – the synchronous version

The combinatorial ALU, which served as the reference design, proved that the ALU can be significantly faster than the rest of the CPU. It was decided that the CPU would be controlled by a 50 MHz clock, i.e. the clock period was equal to 20 ns. Most of the instructions could be executed in one clock cycle. Moreover, the design of the CPU itself limited its operation speed. The CPU needed at least three clock cycles to execute all of the steps of an instruction cycle. So it was decided to optimize the ALU with respect to resource usage. Apart from some minor refinements, two ways of optimizing resources were applied to the structure of the combinatorial ALU:
• Reuse resources in space.
• Reuse resources in time, i.e. implement sequential processing.
To introduce resource reuse in space, the design of the ALU was reviewed, and function blocks that could be shared between subcircuits generating results for different operations were identified. Applying resource sharing in space usually forces the insertion of multiplexer blocks, often of large size (e.g. 32 bits wide), into the circuit structure. This method is thus efficient only if the shared functional blocks are sufficiently large. The additional circuitry almost always introduces some extra logic levels into signal paths, so the resources are saved at the cost of worse instruction execution times. At first the resource sharing in space was applied to the comparator blocks. The “Greater than” comparison can be obtained by subtracting the arguments and testing the sign of the result. The comparator block was therefore merged with the fixed point adder/subtractor block. Moreover, the fixed point and floating point comparator blocks can be easily merged together. Positive floating point numbers can be compared using a fixed point comparator. For negative numbers a simple converter circuit has to be added at the comparator inputs, as the floating point representation is based on the sign-magnitude concept. Additionally, some extra circuitry was added to detect illegal floating point arguments (e.g. NAN). The illegal arguments are indicated by the comparator block using the overflow flag. Resource sharing in space was also applied to the blocks performing multiplication and division. The floating point multiplication and division can be partially executed using the relevant fixed point blocks. However, sharing resources between the fixed point and floating point adder/subtractor was found inefficient. While performing a floating point operation, the mantissa of a temporary result often needs to be shifted. For this purpose the same blocks were used as those generating the results of the shift and rotate operations. The shifter blocks were based on the multiplexer scheme.
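The merged comparator can be sketched in a few lines: one "Greater than" test obtained from a subtraction, one "Equal" test, the remaining four relations derived from these two, and a converter that maps sign-magnitude floating point words onto integers with the same ordering. This is an illustrative model, not the authors' circuit; the function names are hypothetical, and the converter shown (negating the magnitude of negative numbers) is one possible choice.

```python
import struct


def f32_to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_signed(bits):
    """Interpret a 32-bit word as a two's complement integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def float_key(bits):
    """Converter for the sign-magnitude format: an integer with the same order."""
    magnitude = bits & 0x7FFFFFFF
    return -magnitude if bits >> 31 else magnitude


def is_nan(bits):
    return ((bits >> 23) & 0xFF) == 0xFF and (bits & 0x7FFFFF) != 0


def six_relations(a, b):
    """Derive all six comparisons from "Greater than" and "Equal" only."""
    gt = (b - a) < 0            # subtract the arguments and test the sign
    eq = a == b
    return {"GT": gt, "EQ": eq, "NE": not eq,
            "GE": gt or eq, "LE": not gt, "LT": not (gt or eq)}


def compare_int(a_bits, b_bits):
    """Returns (relations, OV); OV is never set for integers."""
    return six_relations(to_signed(a_bits), to_signed(b_bits)), False


def compare_real(a_bits, b_bits):
    """Returns (relations, OV); OV flags an illegal (NaN) argument."""
    ov = is_nan(a_bits) or is_nan(b_bits)
    return six_relations(float_key(a_bits), float_key(b_bits)), ov
```

Note that the converter maps +0.0 and -0.0 to the same integer, so they compare as equal, in agreement with IEEE 754.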
Applying sequential execution consists in inserting parallel registers into a circuit structure for storing temporary results, and using the same circuitry to generate temporary results in subsequent clock cycles, instead of iterating the circuitry in space. At first, parallel registers were added at the inputs of both arguments. Thanks to this, the ALU became more independent from the rest of the CPU, and the control unit of the CPU is not blocked while the ALU executes long operations.
Table 3 Resources usage for the synchronous ALU.
Fig. 9. Structure of the revised circuit performing fixed-point division. [Block diagram: 32-bit inputs A and B, each passing through a two's complement (TC) to sign-magnitude converter; a shift register with parallel load holding the remainder and the quotient, with an initial data write; four levels of adders/subtractors.]
Sequential processing was applied to the circuits performing division. In the combinatorial version of the ALU, the fixed point division subcircuit contained 32 levels of adder/subtractor blocks, which generated a big delay of almost 200 ns. The division subcircuit was therefore redesigned. A block diagram of the redesigned division subcircuit is presented in Fig. 9. The objective of the work was to reduce the number of adder/subtractor blocks at the cost of executing the division operations in several clock cycles. It was found experimentally (i.e. by simulation) that the delay of four levels of adder/subtractor blocks is still less than the assumed clock period of 20 ns. So the redesigned structure of the fixed point division subcircuit contains four levels of adder/subtractor blocks generating a temporary result, which is stored in a parallel register. Four bits of the result are calculated in one clock cycle, and the division subcircuit needs 9 clock cycles to generate the result (one clock cycle is required to write the input arguments to a parallel register). So the execution time is only slightly longer than the execution time in the combinatorial ALU, but the division block consumes several times fewer resources.

Finally, two additional operations were added to the ALU instruction list: the square (SQR_R) and the square root (SQRT_R). The square operation uses the floating point multiplication circuit. However, for implementing the square root, a new, complex block had to be added to the ALU structure. The square root subcircuit is a hardware implementation of the digit-by-digit algorithm [31]. The structure of the circuit is similar to the structure of the division block. Like the division block presented in Fig. 9, the square root subcircuit calculates four bits of the result in one clock cycle. The amounts of resources required by the redesigned ALU, depending on the target architecture, are presented in Table 3.
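The sequential division scheme described above (one clock cycle to load the arguments, then four quotient bits per clock cycle) can be modelled in software as follows. This is a minimal sketch, not the authors' circuit: the function names are hypothetical, a restoring digit recurrence is used for clarity (the adder/subtractor levels in Fig. 9 may implement a different variant), and the signed wrapper assumes that the quotient is truncated toward zero and that the remainder takes the sign of the dividend.

```python
def divide_sequential(dividend, divisor, width=32, bits_per_cycle=4):
    """Unsigned division of two magnitudes.

    Returns (quotient, remainder, clock_cycles). Each inner step models one
    adder/subtractor level; bits_per_cycle levels are evaluated per cycle.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << width) - 1
    # Cycle 1: parallel load of the {remainder, quotient} shift register.
    remainder, quotient = 0, dividend & mask
    cycles = 1
    for _ in range(width // bits_per_cycle):
        for _ in range(bits_per_cycle):
            # Shift the register pair left by one bit.
            remainder = (remainder << 1) | (quotient >> (width - 1))
            quotient = (quotient << 1) & mask
            # Trial subtraction; keep it only if the result is non-negative.
            if remainder >= divisor:
                remainder -= divisor
                quotient |= 1
        cycles += 1  # the temporary result is stored in the register
    return quotient, remainder, cycles


def div_i(a, b):
    """Signed division via sign-magnitude conversion of both arguments.

    Assumption of this sketch: truncation toward zero, remainder with the
    sign of the dividend.
    """
    quotient, remainder, cycles = divide_sequential(abs(a), abs(b))
    if (a < 0) != (b < 0):
        quotient = -quotient
    if a < 0:
        remainder = -remainder
    return quotient, remainder, cycles
```

With the default parameters the model performs eight iterations of four bits each plus the load cycle, i.e. the 9 clock cycles quoted in the text; for instance `div_i(100, 7)` returns `(14, 2, 9)`.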
As Table 3 shows, the resource usage of the redesigned ALU is similar to that of the combinatorial, non-optimized version. However, one has to keep in mind that the new ALU contains an extra subcircuit responsible for generating the square root, which is quite complex.
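To give an idea of what the square root subcircuit computes, the digit-by-digit recurrence can be sketched for an integer radicand. This is an illustrative model only: the function name is hypothetical, the operand is a plain unsigned integer rather than a floating point mantissa, and the cycle count merely reflects the stated four result bits per clock cycle plus one assumed load cycle, so it is not meant to reproduce the SQRT_R figure in Table 4.

```python
def isqrt_digit_by_digit(radicand, width=32, bits_per_cycle=4):
    """Integer square root by the binary digit-by-digit method.

    Returns (root, remainder, clock_cycles) with
    root * root + remainder == radicand and 0 <= remainder <= 2 * root.
    """
    if width % 2 != 0:
        raise ValueError("width must be even")
    if not 0 <= radicand < (1 << width):
        raise ValueError("radicand out of range")
    remainder, root = 0, 0
    cycles = 1  # assumed load cycle, by analogy with the division block
    result_bits = width // 2
    for i in range(result_bits):
        # Bring down the next two radicand bits, most significant pair first.
        shift = width - 2 * (i + 1)
        remainder = (remainder << 2) | ((radicand >> shift) & 0b11)
        # Trial value 4 * root + 1 decides the next result bit.
        trial = (root << 2) | 1
        root <<= 1
        if remainder >= trial:
            remainder -= trial
            root |= 1
        if (i + 1) % bits_per_cycle == 0:
            cycles += 1  # bits_per_cycle result bits are produced per cycle
    return root, remainder, cycles
```

The loop body has the same shape as one level of the division recurrence (shift, trial subtraction, conditional update), which is consistent with the remark that the two circuits are structurally similar.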
| Logic elements | Usage (Spartan 6) | Usage (Artix 7) |
|---|---|---|
| DSP blocks | 4 | 4 |
| LUTs | 4297 | 4160 |
Table 4 shows execution times for all of the instructions implemented in the ALU. The data presented in the table concern only the ALU itself. Some extra clock cycles may be necessary because of the operation of the CPU as a whole. For most of the instructions the result can be generated in one clock cycle. The execution times are similar to those of the combinatorial version of the ALU, which was used as the reference.

The instruction execution times were compared with the relevant data for a few CPUs delivered by leading PLC manufacturers [32,33]. Unfortunately, data regarding the operation of the ALUs themselves are not published for commercially available PLCs. The only way to present a realistic comparison is to compare the execution times of whole instruction cycles. The operation of the CPU as a whole requires three extra clock cycles to be added to the cycles used by the ALU itself. The relevant data are also listed in the table.

Unfortunately, as a rule, only sample data concerning instruction execution times are published for the newest CPUs that have recently appeared on the market. In [34] one can find that for the CPU 1215, i.e. the fastest CPU of the new S7-1200 family of compact PLCs from Siemens, floating point math instructions execute in 2.3 µs. Similarly, for the S7-1500 the execution time is 16 ns for an integer math instruction and 64 ns for a floating point math instruction [35]. PLCs of the Quantum series from Schneider execute integer math instructions in 45–60 ns, and floating point instructions in 400–500 ns [36]. The VIPA company, which supplies CPUs advertised as compatible with Siemens PLCs on the machine language level but faster than the original product, claims that its CPUs can execute arithmetic instructions on integer numbers in 10 ns, and on floating point numbers in 60 ns.
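The relation between the ALU-only and whole-CPU figures in Table 4 follows directly from the 20 ns clock period and the three extra clock cycles mentioned above. The short sketch below reproduces that arithmetic for a few instructions; the constant and function names are hypothetical, and the cycle counts are copied from Table 4.

```python
CLOCK_PERIOD_NS = 20      # 50 MHz system clock
CPU_OVERHEAD_CYCLES = 3   # extra cycles added by the CPU as a whole


def execution_time_ns(alu_cycles, whole_cpu=False):
    """Execution time for a given number of ALU clock cycles."""
    cycles = alu_cycles + (CPU_OVERHEAD_CYCLES if whole_cpu else 0)
    return cycles * CLOCK_PERIOD_NS


# ALU clock cycles for selected instructions, as listed in Table 4.
ALU_CYCLES = {"ADD_I": 1, "DIV_I": 10, "ADD_R": 2, "DIV_R": 9, "SQRT_R": 8}

for name, cycles in ALU_CYCLES.items():
    alu_ns = execution_time_ns(cycles)
    cpu_ns = execution_time_ns(cycles, whole_cpu=True)
    print(f"{name:7s} ALU: {alu_ns:3d} ns   whole CPU: {cpu_ns:3d} ns")
```

Since the overhead is a fixed number of cycles, it dominates the single-cycle instructions (20 ns in the ALU versus 80 ns for the whole instruction), while for the multi-cycle division and square root it adds only a modest fraction.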
In [37] a design under development was presented, for which the objective was similar: to develop a PLC directly executing commands defined in the IEC 61131-3 norm. It was reported that integer math instructions execute in 20 to 60 ns, and floating point math instructions in 120 ns. As the above-mentioned data show, the presented ALU is comparable with the devices supplied by leading PLC manufacturers. One has to keep in mind that the CPUs of modern PLCs delivered by leading manufacturers are implemented as Application Specific Integrated Circuits (ASICs). For technological reasons an ASIC can achieve a much better speed than programmable logic. It is also possible that the implementation of some operations (e.g. division) is based on a ROM, which is easy and cheap to implement in an ASIC, but not convenient in the FPGA technology. However, the presented design is architecture-independent and can be easily ported to an ASIC. In such a case a significant increase of the clock frequency, and thus a corresponding reduction of instruction execution times, can be expected.

Table 4 Instruction execution times for the synchronous ALU (Spartan 6 and Artix 7).

| Instruction | ALU itself: clock cycles | ALU itself: execution time [ns] | Whole CPU: clock cycles | Whole CPU: execution time [ns] | Siemens S7-312 | Siemens S7-319 | GE CPE040 |
|---|---|---|---|---|---|---|---|
| ADD_I | 1 | 20 | 4 | 80 | 220 | 10 | 330 |
| SUB_I | 1 | 20 | 4 | 80 | 220 | 10 | 300 |
| MUL_I | 1 | 20 | 4 | 80 | 210 | 8 | 350 |
| DIV_I | 10 | 200 | 13 | 260 | 510 | 50 | 300 |
| MOD_I | 10 | 200 | 13 | 260 | 430 | 60 | 300 |
| ADD_R | 2 | 40 | 5 | 100 | 1100 | 40 | 300 |
| SUB_R | 2 | 40 | 5 | 100 | 1100 | 40 | 300 |
| MUL_R | 1 | 20 | 4 | 80 | 1110 | 40 | 290 |
| DIV_R | 9 | 180 | 12 | 240 | 4850 | 60 | 360 |
| ITR | 1 | 20 | 4 | 80 | 700 | 20 | 270 |
| ROUND | 1 | 20 | 4 | 80 | 4820 | 25 | 310 |
| NEG_I | 1 | 20 | 4 | 80 | 120 | 5 | – |
| NEG_R | 1 | 20 | 4 | 80 | 200 | 5 | – |
| AND | 1 | 20 | 4 | 80 | 280 | 14 | 440 |
| OR | 1 | 20 | 4 | 80 | 280 | 14 | 480 |
| XOR | 1 | 20 | 4 | 80 | 280 | 14 | 440 |
| EQ_I | 1 | 20 | 4 | 80 | 430 | 23 | 350 |
| GT_I | 1 | 20 | 4 | 80 | 430 | 23 | 350 |
| GE_I | 1 | 20 | 4 | 80 | 430 | 23 | 360 |
| NE_I | 1 | 20 | 4 | 80 | 430 | 23 | 360 |
| LE_I | 1 | 20 | 4 | 80 | 430 | 23 | 360 |
| LT_I | 1 | 20 | 4 | 80 | 430 | 23 | 360 |
| EQ_R | 1 | 20 | 4 | 80 | 1670 | 46 | 370 |
| GT_R | 1 | 20 | 4 | 80 | 1670 | 46 | 380 |
| GE_R | 1 | 20 | 4 | 80 | 1670 | 46 | 380 |
| NE_R | 1 | 20 | 4 | 80 | 1670 | 46 | 380 |
| LE_R | 1 | 20 | 4 | 80 | 1670 | 46 | 380 |
| LT_R | 1 | 20 | 4 | 80 | 1670 | 46 | 380 |
| SL | 1 | 20 | 4 | 80 | 460 | 19 | 630 |
| SR | 1 | 20 | 4 | 80 | 460 | 19 | 630 |
| RL | 1 | 20 | 4 | 80 | 450 | 19 | 360 |
| RR | 1 | 20 | 4 | 80 | 450 | 19 | 400 |
| SQR_R | 1 | 20 | 4 | 80 | 1150 | 40 | 330 |
| SQRT_R | 8 | 160 | 11 | 220 | 8140 | 475 | 290 |

6. I/O controller

The I/O controller forms an intermediate block between the central processing unit and the signal modules. Due to industrial standards, the signal modules were implemented outside the FPGA. In fact, the I/O controller is responsible for communication in three directions: with the CPU, the signal modules, and the image memory. It communicates with the central processing unit within the control loop (Section 4). After the end of the control loop the I/O controller communicates with the signal modules: input data are collected and output data are transferred to the signal modules. The transmission is based on a simple parallel protocol synchronized by means of read and write signals. The I/O controller is responsible for intelligent data exchange with the I/O modules, because the internal data bus is 32 bits wide, whereas the external data bus is 16 bits wide. Furthermore, analogue 16-bit data are converted to a 32-bit two’s complement value.

Two problems concern the I/O controller and the PLC configuration. First of all, the address space of the signal modules (8-bit external address bus) is incompatible with the PLC memory map. This is because the PLC memory map was designed for experiments, and it will be extended in the future. The second problem is that in this configuration there is no information about module presence, so the I/O controller must check the entire address space. The simple transmission will be replaced in the future by an intelligent data exchange protocol. The I/O controller is based on a dual-port block RAM, analogously to the memory presented in Section 3.

7. Conclusions

The design process of a programmable logic controller is shown in the paper. The designed PLC prototype is compliant with the EN 61131-3 standard on the machine language level. Instructions are unary or without arguments. The developed PLC is implemented using an FPGA device in the form of a System on Chip. Hardware units like the I/O controller support the central processing unit for more effective work. However, all the operations are implemented fully in hardware. The design was prepared as a set of synthesizable Verilog and VHDL models.

The central processing unit includes an arithmetic-logic unit supporting floating point operations, and a bit-logic unit. It is very important that the bit-logic unit executes operations within one clock cycle. It is 80 ns for the 50 MHz clock oscillator that supplies the FPGA in the prototype PLC. The instruction decoder was designed to execute 122 commands. It is built in a very “flexible” way, and it can be easily adapted to other implementations. The element most responsible for this “flexibility” is the program counter, which counts the number of system clock cycles needed to execute a particular instruction. By entering a new value the number of cycles can be easily modified. The longest instructions are floating point calculations as well as multiplication and division.

The ALU supports the main execution unit in executing arithmetic and logic operations on 32-bit words. The ALU can execute 34 operations, which include the basic logic operations, comparators, and the four basic arithmetic operations. The comparators and the arithmetic operations can be performed for two data types: integer, represented in the two’s complement format, and real, represented as a floating point number compatible with the IEEE 754 norm.

Comparison of the presented CPU with products delivered to the market by leading PLC manufacturers shows that it is possible to achieve similar performance using programmable logic technology, and in particular FPGA devices, as the hardware platform.

Future work will concentrate on optimizing the structure of the design, which can be achieved by applying resource sharing to a bigger extent, and/or pipelining. It is also planned to extend the instruction set implemented in the ALU so far with more complex operations, e.g. logarithm, exponent, and trigonometric functions.

Acknowledgments

This work was supported by the Ministry of Science and Higher Education funding for statutory activities (decision no. 8686/E367/S/2015 of 19 February 2015).
References

[1] M.S. Boggs, T.L. Fulton, S. Hausman, G. McNabb, A. McNutt, S.W. Stimmel (2003). Programmable logic controller – method, system and apparatus. US Patent No. US 6,574,743 B1.
[2] Siemens, Simatic S7-200 Programmable Controller – System Manual, Siemens AG, Germany, 2008.
[3] Altera Corporation (2013). Altera product catalog, (http://www.altera.com/literature/sg/product-catalog.pdf last accessed 17.11.15).
[4] Xilinx Inc. (2013). UltraScale architecture product selection guide. (http://www.xilinx.com/publications/prod_mktg/ultrascale_product_selection_guide.pdf last accessed 17.11.15).
[5] K.-H. John, M. Tiegelkamp, IEC 61131-3: Programming Industrial Automation Systems, Springer, 2010.
[6] Rockwell Automation (2012). Logix5000 Controllers IEC 61131-3 Compliance. Rockwell Automation Publication 1756-PM018C-EN-P.
[7] Cenelec, EN 61131-3, Programmable Controller – Part 3: Programming Languages, International Standard, Management Centre, Avenue Marnix 17, B-1000 Brussels, 2013.
[8] A. Milik, (2006). High Level Synthesis – Reconfigurable Hardware Implementation of Programmable Logic Controller, PDeS’06, Brno, Czech Republic, 14-16 Feb. 2006, 138–143.
[9] J. Mocha, D. Kania, Hardware implementation of a control program in FPGA structures, Electr. Rev. 88 (12/2012) (2012) 95–100 (in Polish).
[10] M. Chmiel, J. Mocha, E. Hrynkiewicz, A. Milik, Central processing units for PLC implementation in Virtex-4 FPGA, in: Proceedings of the 18th IFAC World Congress, Milano, Italy, 2001 August 28-September 2.
[11] Z. Hajduk, J. Sadolewski, B. Trybus, FPGA-based execution platform for IEC 61131-3 control software, Electr. Rev. (2011) ISSN 0033-2097. Vol. 87. No. 8/2011.
[12] R. Czerwinski, M. Chmiel, W. Wygrabek, FPGA implementation of programmable logic controller compliant with EN 61131-3, in: Proceedings of the 12th IFAC/IEEE Conference on Programmable Devices and Embedded Systems, PDeS’2013, Velke Karlovice, Czech Republic, 2013 25-27 September 2013, pp. 24–29.
[13] E. Hrynkiewicz, M. Chmiel, Programmable logic controller – basic structure and idea of programming, Electr. Rev. (2012) 98–101 Vol. 88. No. 11b/2012.
[14] E. Hrynkiewicz, M. Chmiel, About programmable logic controller – step by step, Electr. Rev. (2012) 303–307 Vol. 88. No. 9a/2012.
[15] T. Klopot, P. Laszczyk, K. Stebel, J. Czeczot, Flexible function block implementation of the balance-based adaptive controller as the potential alternative for PID-based industrial applications, Trans. Inst. Meas. Control. 36 (8) (2014) 1098–1113.
[16] M. Chmiel, E. Hrynkiewicz, M. Muszynski, The way of ladder diagram analysis for small compact programmable controller, in: Proceedings of the 6th Russian-Korean International Symposium on Science and Technology, KORUS2002, Novosybirsk, Russia, 2002, pp. 169–173.
[17] S.L. Carrillo, A.Z. Polo, M.P. Esmeral, Design and implementation of an embedded microprocessor compatible with IL language in accordance to the norm IEC 61131-3, in: Proceedings of the 2005 International Conference on Reconfigurable Computing and FPGAs (ReConFig 28-30 Sept. 2005), IEEE Computer Society, 2005, pp. 18–23.
[18] M. Okabe, Development of processor directly executing IEC 61131-3 language, in: Proceedings of the SICE Annual Conference, The University Electro-Communications, 20-22 August 2008, Tokyo, Japan, 2008, pp. 2215–2218.
[19] M. Chmiel, E. Hrynkiewicz, Remarks on Parallel bit-byte CPU structures of programmable logic controllers, in: M.A. Adamski, A. Karatkevich, M. Wegrzyn (Eds.), Design of Embedded Control Systems, Springer, 2005, pp. 231–242. Section V.
[20] M. Chmiel, On reducing PLC response time, Bulletin of the Polish Academy of Sciences-Technical Sciences 56 (3) (2008) 229–238.
[21] M. Chmiel, E. Hrynkiewicz, Concurrent operation of the processors in bit-byte CPU of a PLC, Control Cybern. 39 (2) (2010) 559–579.
[22] ARM (2008). Cortex-M3, Technical Reference Manual.
[23] Xilinx (2011). Spartan-6 FPGA Block RAM Resources, User Guide.
[24] M. Chmiel, R. Czerwinski, P. Smolarek, IEC 61131-3-based PLC implemented by means of FPGA, in: Proceedings of 13th IFAC/IEEE International Conference on Programmable Devices and Embedded Systems, PDeS’15, May 13–14, 2015, Krakow, Poland, 2015, pp. 383–388.
[25] J.P. Deschamps, G.J.A. Bioul, G.D. Sutter, Synthesis of arithmetic circuits, FPGA, ASIC and Embedded Systems, Wiley Interscience, Hoboken, 2006.
[26] IEEE (2008). IEEE 754-2008 – Standard for Floating-Point Arithmetic. DOI 10.1109/IEEESTD.2008.4610935.
[27] J. Kulisz, M. Chmiel, A. Krzyzyk, M. Rosol, A hardware implementation of arithmetic operations for an FPGA-based programmable logic controller, in: Proceedings of the 13th IFAC Conference on Programmable Devices and Embedded Systems, PDeS’, 13th–15th May 2015, Krakow, Poland, 2015, pp. 471–476.
[28] Xilinx Inc. (2011). Spartan-3 Generation FPGA User Guide UG331 (v1.8).
[29] Xilinx Inc. (2011). Virtex-6 FPGA DSP48E1 User Guide UG369 (v1.3).
[30] J. Kulisz, J. Mikucki, An IP-Core generator for circuits performing arithmetic multiplication, in: Proceedings of the 12th IFAC/IEEE Conference on Programmable Devices and Embedded Systems, PDeS’2013, 25th–27th September 2013, Velke Karlovice, Czech Republic, 2013, pp. 59–64.
[31] L. Yamin, C. Wanming, Implementation of single precision floating point square root on FPGAs, in: Proceedings of the IEEE Symposium on FPGA for Custom Computing Machines, Napa, California, USA, 1997, pp. 226–232.
[32] Siemens AG (2012). Simatic S7-300 Instruction List, CPU 312, CPU 314, CPU 315-2 DP, CPU 315-2 PN/DP, CPU 317-2 PN/DP, CPU 319-3 PN/DP, IM151-8 PN/DP CPU, IM 154-8 PN/DP CPU.
[33] General Electric Company (2015), Intelligent Platforms, Programmable Control Products, PACSystems, RX7i & RX3i CPU Reference Manual, GFK-2222 W, August 2015.
[34] Siemens AG (2012). Simatic S7-1200 Programmable Controller, System Manual.
[35] Siemens AG (2013). Simatic S7-1500 Technical Data.
[36] Schneider Electric, Modicon Quantum Automation platform, Hot Standby System Unity Pro, 2013.
[37] VIPA GmbH (2014). VIPA System 300S SPEED7 - CPU 314-6CF02, Manual, May 2014.

Miroslaw Chmiel received the M.Sc. degree and the Ph.D. degree in Technical Sciences from the Faculty of Automatics, Electronics and Computer Science, Silesian University of Technology in 1992 and 2003, respectively. Now he works as an Assistant Professor at the Institute of Electronics, The Silesian University of Technology, Poland. His research interests include design and application of Digital Circuits, and, in particular, Programmable Logic Controllers, and Programmable Logic Devices.
Jozef Kulisz received the M.Sc. degree and Ph.D. degree in Technical Sciences from the Faculty of Automatics, Electronics and Computer Science, Silesian University of Technology in 1992 and 2003, respectively. Now he works as an Assistant Professor at the Institute of Electronics, Silesian University of Technology, Poland. His research interests include design and application of Digital Circuits, and in particular Programmable Logic Devices, Hardware Description Languages, and Programmable Logic Controllers.
Robert Czerwinski received the M.Sc. degree and the Ph.D. degree in Technical Sciences from the Faculty of Automatics, Electronics and Computer Science, Silesian University of Technology, Poland, in 2001 and 2006, respectively. Now he works as an Assistant Professor at the Institute of Electronics, Silesian University of Technology, Poland. His research interests include programmable logic devices, logic synthesis, and optimization.
Adrian Krzyzyk received the B.S. degree in electronics engineering in Technical Sciences from the Faculty of Automatics, Electronics and Computer Science, Silesian University of Technology, Poland, in 2014. He is at present an M.S. student at the Silesian University of Technology, and a probationer at Cadence Design Systems. His research interests focus on logic synthesis.
Marcin Rosol received the B.S. degree in electronics engineering in Technical Sciences from the Faculty of Automatics, Electronics and Computer Science, Silesian University of Technology, Poland, in 2014. He is at present an M.S. student at the Silesian University of Technology. He is also a probationer at Cadence Design Systems. His research interests focus on logic synthesis.
Patryk Smolarek received the B.S. degree in electronics engineering in Technical Sciences from the Faculty of Automatics, Electronics and Computer Science, Silesian University of Technology, Poland, in 2014. He is at present an M.S. student at the Silesian University of Technology, and a probationer at Cadence Design Systems. His research interests include logic synthesis and verification.