The Dynamic Properties Investigation of the PLC CPU Implemented in FPGA
M. Chmiel*, E. Hrynkiewicz**
Institute of Electronics, Silesian University of Technology, Gliwice
* (e-mail: [email protected]) ** (e-mail: ehrynkiewicz@polsl.pl)

Abstract: The paper presents program examples written and tested to show the possibilities of constructing PLC CPUs on an FPGA development platform. The presented units are optimised for minimum response and throughput time. The constructions are based on the bit-word CPU structure and on two methods of data (condition) exchange: with acknowledge and without acknowledge; in both cases the control data are passed through a set of flip-flops. A third unit was built for comparison with these two. It is a simple single-processor unit that can execute both binary and numerical operations. The experiments showed the high performance of the developed CPUs.

Keywords: Programmable Logic Controller; Central Processing Unit; Bit-Word Structure of CPU; Scan Time; Throughput Time; Response Time; Concurrent Operation; Field Programmable Gate Array.
1. INTRODUCTION
It can be observed that the processes controlled by PLCs are mostly binary in character, or binary with a small analogue component. There are also controlled objects in which the analogue and digital parts are independent. This observation led designers to the conclusion that it is possible to develop central processing units called bit-byte or bit-word units (see Aramaki et al., 1997). Such units consist of two processors, one for binary and one for analogue tasks, and each processor executes the tasks assigned to it. In this way such a unit allows a few processors to operate in parallel. For such a CPU, the main problems to solve are how to assign tasks to the particular processors and how to find a CPU structure suited to carrying out such an assignment in practice, as shown by Michel (1990). Another important problem, inseparable from the hardware, is the programming tools. They should enable easy and efficient creation of the control program and take advantage of all aspects of the multiprocessor unit. Bit-word CPUs are often designed and constructed with standard microprocessors, while the development of programmable logic devices creates new possibilities in this area. A bit-word PLC CPU implemented on an FPGA platform was presented by the authors in Chmiel et al. (2009), using the ideas presented in Chmiel and Hrynkiewicz (2005) and Chmiel et al. (2005). The concurrent execution of instructions and the processor synchronisation mechanism are presented in the cited papers. Information between the processors may be exchanged in two alternative ways: by means of flags written to a set of flip-flops equipped with an acknowledge mechanism (see Chmiel and Hrynkiewicz, 2008), and by means of an exchange memory implemented as dual-port RAM (see Chmiel et al., 2010).
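The difference between exchanging a flag with and without acknowledge can be illustrated with a small software model of a single exchange cell. This is a minimal sketch under stated assumptions, not the authors' VHDL: the class name FlagCell, its methods, and the rule that the producer stalls while the previous flag has not yet been acknowledged are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlagCell:
    """Hypothetical model of one exchange flip-flop plus a handshake bit."""

    value: bool = False
    full: bool = False  # set by the producer, cleared by the consumer's acknowledge

    # Exchange without acknowledge: the producer simply overwrites the flag
    # and the consumer reads whatever value is currently stored.
    def write(self, value: bool) -> None:
        self.value = value

    def read(self) -> bool:
        return self.value

    # Exchange with acknowledge (assumed protocol, for illustration only).
    def try_write_ack(self, value: bool) -> bool:
        """Store value if the previous one was consumed; False means 'stall and retry'."""
        if self.full:
            return False
        self.value, self.full = value, True
        return True

    def try_read_ack(self) -> Optional[bool]:
        """Return a fresh value and acknowledge it; None means 'stall and retry'."""
        if not self.full:
            return None
        self.full = False  # acknowledge: frees the cell for the next write
        return self.value


cell = FlagCell()
print(cell.try_read_ack())        # nothing written yet: the consumer would wait
print(cell.try_write_ack(True))   # producer stores a flag
print(cell.try_write_ack(False))  # refused: the first flag is not acknowledged yet
print(cell.try_read_ack())        # consumer takes the flag and acknowledges it
```

In the acknowledged variant a processor may have to wait for its partner, while without acknowledge both sides always proceed; this waiting is what the later timing measurements compare.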
Fig. 1. CPU with data exchange mechanism.
The programming language for the developed CPU is similar to the STL language of the Siemens S7-300/400 (see Berger, 2001) and S7-200 (see Siemens, 2009) PLC families. A special compiler was developed to assign parts of the control program to each processor. The programmer writes the program as a single sequence of instructions. The compiler checks the syntax and splits the sequence into two streams, which are then compiled separately for each processor and written to that processor's program memory.
The paper deals with the investigation of the dynamic properties of such CPUs. The dynamic features of a programmable logic controller are described by parameters such as instruction execution time, scan time and throughput time. All of these parameters must be taken into account when optimising a PLC central processing unit (see Chmiel, 2008).
2. THE STRUCTURES OF BIT-WORD CPU PROCESSORS IMPLEMENTED IN FPGA
For experimental and evaluation purposes, three central processing unit structures have been designed, described in VHDL and implemented: a CPU with two independently working processors, a CPU with two mutually dependent processors, and a single-processor CPU executing both word and bit operations. In the two-processor units, one processor was designed for bit operations and the other for word operations.
2.1 Word Processor Hardware Implementation
The block diagram of the designed word processor is shown in Fig. 2. It consists of the blocks described in Chmiel et al. (2011).
Fig. 2. Word processor block diagram.
2.2 Word Processor Executed Operations
The following operations are executed on 16-bit words:
Transfer operations: the contents of the word and bit accumulators are transferred to I/O modules or marker memory;
Arithmetic operations performed on 16-bit arguments;
Comparison operations performed on the contents of the word accumulators, with the result stored in the bit (Boolean) accumulator Ac_A;
Logic operations on the word accumulator contents;
Bit coprocessor operations: a set of logic operations performed on the 1-bit accumulators Ac_A and Ac_B, with the result always stored in Ac_A;
Counter and timer operations;
I/O space and process image memory configuration.
2.3 Bit Processor Hardware Implementation
The bit processor has been designed to perform logic operations quickly. Its general construction was derived from the word processor, with the simplifications that the specific nature of logic bit operations allows (Fig. 3):
The accumulator size is reduced from 16 bits to 1 bit, and the Ac_a register is the default target for all operations;
The ALU is restricted to logic operations only, so it should rather be called a Logic Unit (LU);
The number of auxiliary registers is reduced;
The bit coprocessor has been removed, as it is no longer required in this structure;
An 8-bit bus is used for data transfer instead of a 16-bit one.
Using VHDL for design purposes enables flexible description and easy modification of functionality (see Skahill, 1996).
Fig. 3. Bit processor block diagram.
2.4 Bit Processor Operations
The instruction list of the bit processor covers the following operations:
Data transfer instructions, which exchange data between the I/O space, marker memory and exchange memory;
Logic operations on the contents of accumulators Ac_a and Ac_b, with the result placed in Ac_a;
Reading of the binary (Boolean) outputs of counters and timers;
I/O space and process memory configuration.
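The 1-bit accumulator pair and the Logic Unit described above can be sketched as a toy software model. This is an illustration under assumptions, not the authors' design: the class BitProcessor and its method names are hypothetical, and the rule that a load first shifts the old Ac_a into Ac_b is inferred from the listing comments "I0.5 to ak_A (shift)" and "M10 to ak_A to ak_B" that appear later in the paper.

```python
class BitProcessor:
    """Toy model of the bit processor's 1-bit accumulators and Logic Unit.

    Assumptions: a load moves the old Ac_a into Ac_b before loading the new
    bit, and every logic operation combines Ac_a with Ac_b and leaves the
    result in Ac_a.
    """

    def __init__(self, memory):
        self.memory = memory  # process image and marker memory: address -> bool
        self.ac_a = False
        self.ac_b = False

    def load(self, address):
        """Roughly what LDB/LOB do in the listings: shift Ac_a to Ac_b, load a bit."""
        self.ac_b = self.ac_a
        self.ac_a = self.memory[address]

    def logic(self, op):
        """Logic Unit operation; the result always goes to Ac_a (ORB ~ 'or')."""
        if op == "and":
            self.ac_a = self.ac_a and self.ac_b
        elif op == "or":
            self.ac_a = self.ac_a or self.ac_b
        elif op == "xor":
            self.ac_a = self.ac_a != self.ac_b
        elif op == "not":
            self.ac_a = not self.ac_a
        else:
            raise ValueError(f"unknown logic operation: {op}")

    def store(self, address):
        """Roughly what STMB does: copy Ac_a to a bit address."""
        self.memory[address] = self.ac_a


# M10 OR M11 -> O0.0, mirroring the LOB M10 / LOB M11 / ORB sequence.
cpu = BitProcessor({"M10": False, "M11": True, "O0.0": False})
cpu.load("M10")
cpu.load("M11")
cpu.logic("or")
cpu.store("O0.0")
print(cpu.memory["O0.0"])
```

Keeping the LU restricted to single-bit operations on two accumulators is what makes the bit processor so much smaller than the word processor.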
2.5 Timer and Counter Hardware Implementation
Timers and counters are used by both the bit and the word processing units. The timer and counter units have been designed to operate with both single- and dual-processor CPUs.
They are autonomous units that operate under a configuration controlled by the word processor. Each timer and counter has an individual triggering input and an individual output. This allows direct access to the timers and counters, which reduces the system bus load, and also allows the word and bit processing units to access the timers and counters simultaneously (Fig. 4). The implementation of the timers and counters is presented in detail in Chmiel et al. (2010).
Fig. 4. The connections between timers and processors.
The CPU has been equipped with 16 timers, T0 to T15, implementing the on-delay, retentive on-delay and off-delay timer functions. Each timer can operate with a resolution of 1 s, 100 ms, 10 ms or 1 ms, programmed individually in the time base unit. The maximum count is 16383 clock pulses. In addition, the CPU has been equipped with 16 counters, C0 to C15. Each counter can operate in one of three modes: count up, count down and bidirectional.
3. IMPLEMENTATION RESULTS
After the design and verification processes were completed, the CPU was implemented in the target device. To assess the implementation quality, the resource usage of each block was recorded. The target device is a Xilinx XC4VLX25 from the Virtex-4 family (see Xilinx, 2008). Not only the CPU but a whole small PLC was implemented on this platform, including a digital input module, a digital output module, a flash memory controller and a serial asynchronous interface. This unit consumes about 17% (1841 slices) of the available logic resources and about 15% of the Block RAMs (11 blocks), while the CPU itself consumes 13% of the logic resources.
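The timer parameters given in Section 2.5 (a maximum count of 16383 pulses and time bases of 1 ms, 10 ms, 100 ms and 1 s) fix the range of each time base: from about 16.4 s at 1 ms resolution up to about 4.55 h at 1 s resolution. The sketch below, with the hypothetical helper choose_time_base, picks the finest time base that can represent a requested delay; rounding to the nearest whole pulse is an assumption.

```python
TIME_BASES_MS = (1, 10, 100, 1000)  # 1 ms, 10 ms, 100 ms and 1 s resolutions
MAX_COUNT = 16383                   # maximum timer count in time-base pulses


def choose_time_base(delay_ms):
    """Return (time_base_ms, preset) using the finest base that can hold delay_ms.

    Hypothetical helper: the preset is rounded to the nearest whole pulse.
    """
    for base in TIME_BASES_MS:
        preset = round(delay_ms / base)
        if 1 <= preset <= MAX_COUNT:
            return base, preset
    raise ValueError(f"{delay_ms} ms cannot be represented by a single timer")


for base in TIME_BASES_MS:
    print(f"time base {base} ms: longest delay {MAX_COUNT * base / 1000} s")
print(choose_time_base(500))     # half period of a 1 Hz generator
print(choose_time_base(20_000))  # 20 s exceeds the 1 ms range, so 10 ms is used
```

Choosing the finest base that fits keeps the timing error at most half a pulse of the smallest usable resolution.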
Two processors (bit and word) were implemented to investigate different configurations of central processing units. Most instructions of these processors are executed within 2 clock cycles. The development board was clocked by a 50 MHz oscillator (20 ns clock period), which corresponds to 40 ns per two-cycle instruction.
4. EXPERIMENTS
Special programs were prepared to investigate the efficiency of CPU operation, to compare the designed and implemented processors with each other, and to compare PLCs built with the designed CPUs with PLCs available on the market. The problems solved by the individual programs were diversified so that each of them uses different resources and capabilities of the CPUs. The problems and the programs, written in the programming language developed for the designed CPUs, are presented below. Note that the developed CPUs were equipped with identically structured bit and word processors. Therefore, unlike in many other constructions, no difference can be observed between the execution times of operations on binary variables and on numerical variables.
Program 1. A 1 Hz rectangular wave generator is modelled. Its output pulses increment a counter. The counter state is compared with constants and, as a result, successive LEDs are lit, so the number of lit LEDs depends on the counter state. At most 8 LEDs are lit; the ninth pulse resets the counter and counting starts again. The program uses a counter, a timer, a comparator and 8 cells of the data exchange hardware. The number of data exchange operations is similar to the number of other operations, which means that the compiler introduces many auxiliary instructions while splitting the program. This program therefore tests the behaviour of the CPUs when a large number of data exchanges between the processors occurs and the compiler has to introduce many auxiliary instructions controlling the data exchange process.
Program 2. Setting of a binary output with hysteresis is modelled. The word processor reads the numeric value of an analogue input and compares it with two threshold values. The comparison results are transferred to the bit processor, which uses them to set the binary output. In this program the main operations are executed by the word processor, and information is transferred in one direction only, from the word processor to the bit processor.
Program 3. Its task is to solve the logic equation represented by the ladder diagram shown in Fig. 5. The generator outputs (Gen 1Hz and Gen 5Hz) are used as logic variables in the ladder diagram. The two rectangular wave generators are modelled using timers, as in Program 1.
Fig. 5. Ladder diagram of the task solved by Program 3.
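The control task of Program 2 described above can be sketched in a few lines. This is a behavioural model only, not the program that ran on the CPUs: the function names, the threshold values 30 and 70, and the assumption that the output switches on at the upper threshold and off at the lower one are all hypothetical.

```python
def hysteresis_step(value, output, low, high):
    """One scan of a two-threshold (hysteresis) binary output.

    Word-processor part: compare the analogue value with both thresholds.
    Bit-processor part: use the two comparison results to set or reset the
    output; between the thresholds the output keeps its previous state.
    Assumption: the output switches on at or above `high`, off at or below `low`.
    """
    at_or_above_high = value >= high  # first comparison result sent to the bit side
    at_or_below_low = value <= low    # second comparison result sent to the bit side
    if at_or_above_high:
        return True
    if at_or_below_low:
        return False
    return output


def run(samples, low=30, high=70):
    """Apply hysteresis_step to a sequence of analogue samples, starting with the output off."""
    output, trace = False, []
    for value in samples:
        output = hysteresis_step(value, output, low, high)
        trace.append(output)
    return trace


print(run([10, 50, 80, 50, 20, 50]))
```

Only the two comparison results cross from the word side to the bit side, which matches the two one-directional exchanges counted for Program 2.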
Program 4. The counting of a factory's 24-hour production is simulated. Three production lines work in the factory. The program uses timers, counters and marker memory, and information between the processors is transferred in both directions.
The programs solving the problems described above were written in the language proposed in Chmiel et al. (2010) for the built CPUs. Next, the real execution time of each program was measured (see Table 1).
Table 1. Test program execution time (processor work time in μs)
CPU type      Bit-word CPU, no acknowledgment   Bit-word CPU, with acknowledgment   Single-processor CPU
Processor     word        bit                   word        bit                     CPU
Prog. 1       2.22        1.62                  2.28        2.28                    2.24
Prog. 2       0.76        0.50                  0.76        0.76                    0.92
Prog. 3       0.96        2.52                  2.52        2.52                    3.18
Prog. 4       2.88        2.02                  2.96        2.96                    3.12
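Since a dual-processor CPU has finished a program only when its slower processor has finished, Table 1 can be condensed into one completion time per CPU. The sketch below does this under that assumption; the names TABLE1 and completion_times are illustrative. It reproduces the comparisons discussed next, for example that with acknowledge Program 1 runs slightly slower than on the single-processor CPU.

```python
# Execution times in microseconds, copied from Table 1.
# Each row: (no-ack word, no-ack bit, ack word, ack bit, single CPU)
TABLE1 = {
    "Prog. 1": (2.22, 1.62, 2.28, 2.28, 2.24),
    "Prog. 2": (0.76, 0.50, 0.76, 0.76, 0.92),
    "Prog. 3": (0.96, 2.52, 2.52, 2.52, 3.18),
    "Prog. 4": (2.88, 2.02, 2.96, 2.96, 3.12),
}


def completion_times(row):
    """Return (no-ack, ack, single) program completion times.

    A dual-processor CPU finishes only when its slower processor finishes,
    so its completion time is taken as the maximum of the two work times.
    """
    nack_word, nack_bit, ack_word, ack_bit, single = row
    return max(nack_word, nack_bit), max(ack_word, ack_bit), single


for name, row in TABLE1.items():
    nack, ack, single = completion_times(row)
    fastest = min(("no-ack", nack), ("ack", ack), ("single", single), key=lambda p: p[1])
    print(f"{name}: no-ack {nack:.2f} us, ack {ack:.2f} us, "
          f"single {single:.2f} us -> fastest: {fastest[0]}")
```

On these figures the CPU without acknowledge is never slower than the one with acknowledge, and the two are equal for Programs 2 and 3.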
The following conclusions may be drawn from the measured results. When the number of data exchange operations is comparable to the number of other operations, the program execution time of a dual-processor CPU becomes similar to that of the single-processor CPU (see Program 1). A further increase in the number of information exchange operations would make the single-processor CPU the fastest. Consequently, to improve the efficiency of dual-processor CPUs, the information exchange time between the processors must be shortened. System efficiency could be improved by providing the information exchange hardware with an individual data line to and from each memory cell or flip-flop. This removes the need to address the memory cells and thus shortens the access time to the cells. The cost of this solution is a higher utilisation of programmable device resources, which is not critical for the chosen FPGA. In Program 1 there are 8 transfers of information from the word processor to the bit processor, so 8 auxiliary instructions were introduced into the program of each processor. Because of this considerable number of auxiliary instructions, the single-processor CPU executes Program 1 faster than the dual-processor CPU exchanging information with acknowledge. The whole program for the dual-processor CPU consists of 59 instructions, 16 of which were added by the compiler while splitting the program into the parts assigned to the bit and word processors; this is more than 27% of all instructions. The listing below shows the fragment of the program with two exchanges of information, each requiring one auxiliary instruction in each processor. The auxiliary instructions added by the compiler are DFL 0, DFL 1, LFLB 0 and LFLB 1.
Nr   Word processor   Bit processor   Comment
12   CRD C0                           ; C0 to Accu_A
13   LBL 1                            ; 1 to Accu_B
14   GE                               ; if C0 >= 1
15   DFL 0 *                          ; 1 to exchange memory
16                    LFLB 0 *        ; C0 >= 1: 1 to ac_A
17                    STMB O0.0       ; ac_A to O0.0
18   LBL 2                            ; 2 to Accu_B
19   GE                               ; if C0 >= 2
20   DFL 1 *                          ; 1 to exchange memory
21                    LFLB 1 *        ; C0 >= 2: 1 to ac_A
22                    STMB O0.1       ; ac_A to O0.1
(* auxiliary instruction inserted by the compiler)
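A minimal sketch of how a splitting compiler could produce the listing above from a single instruction sequence. The paper does not describe the compiler's rules, so the classification sets, the rule "insert an exchange pair whenever an instruction consumes the Boolean result produced on the other processor", and the function name split_program are assumptions; the mnemonics and the DFL/LFLB and DFLB/LFL pairs are taken from the listings in this paper.

```python
# Hypothetical classification of mnemonics that appear in the paper's listings.
WORD_OPS = {"CRD", "LBL", "GE", "WCRES"}
BIT_OPS = {"LDB", "LOB", "ORB", "STMB"}
LOADS = {"CRD", "LBL", "LDB", "LOB"}  # start a new chain; do not consume ac_A


def split_program(program):
    """Split one instruction stream into (word, bit) streams with exchange pairs."""
    word, bit = [], []
    previous = None  # "W" or "B": processor of the previous instruction
    next_cell = 0    # next free exchange memory cell
    for mnemonic, *operands in program:
        if mnemonic in WORD_OPS:
            side = "W"
        elif mnemonic in BIT_OPS:
            side = "B"
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        # The Boolean result crosses processors: insert an auxiliary pair.
        if previous is not None and side != previous and mnemonic not in LOADS:
            if previous == "W":
                word.append(f"DFL {next_cell}")
                bit.append(f"LFLB {next_cell}")
            else:
                bit.append(f"DFLB {next_cell}")
                word.append(f"LFL {next_cell}")
            next_cell += 1
        (word if side == "W" else bit).append(" ".join([mnemonic, *operands]))
        previous = side
    return word, bit


fragment = [("CRD", "C0"), ("LBL", "1"), ("GE",), ("STMB", "O0.0"),
            ("LBL", "2"), ("GE",), ("STMB", "O0.1")]
word, bit = split_program(fragment)
print(word)
print(bit)
```

For this fragment the sketch inserts exactly the four auxiliary instructions marked in the listing; applied to a bit-to-word sequence such as LOB M10, LOB M11, ORB, WCRES C1 it produces a DFLB/LFL pair, as in the Program 4 listing.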
In Program 1 the word processor executes 30+8 instructions, while the bit processor executes only 13+8 instructions. A CPU working with acknowledge can never execute a program faster than one working without acknowledge. This results from the construction of the processors: in both types of CPU they execute the same instructions at the same speed, so a CPU working with acknowledge can at best equal a CPU working without acknowledge. This was the case when Program 2 and Program 3 were executed. A construction in which the processors operate fully independently therefore seems justified, because it will operate faster than the other constructions. The difference in program execution time between the CPU whose processors exchange information with acknowledge and the one without acknowledge depends mostly on the program structure. The execution times of Program 2 and Program 3 on the two dual-processor CPUs do not differ much. In Program 2 there are two exchanges of information (Fig. 6), both directed from the word processor to the bit processor. Although the bit processor has to wait a long time for information from the word processor, its program finishes earlier. The program for the word processor is longer, so it determines the execution time of the whole program.
Fig. 6. Program 2 executing with acknowledgment (timelines of the word and bit processors, showing when each processor is working, the information transfers, and the bit processor waiting for information from the word processor).
Program 2 consists of 23 instructions, including 2 information exchange instructions for each processor: the word processor executes 13+2 instructions and the bit processor 6+2 instructions. The opposite situation can be observed during the execution of Program 3. In this case the word processor has to wait for information from the bit processor, but it finishes its program before the bit processor does. Program 3 contains 59 instructions, including 2 data exchange instructions for each processor. In Program 3 the word processor executes 18+2 instructions and the bit processor 6+2 instructions. Here, too, information is exchanged in one direction only.
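The waiting shown in Fig. 6 can be reproduced qualitatively with a small scheduling model. This is a sketch under simplifying assumptions: every instruction takes the nominal 40 ns, the positions of the exchange instructions are invented (only their counts, 13+2 and 6+2, follow Program 2), and the function name simulate is hypothetical, so the resulting times are not the measured values of Table 1.

```python
def simulate(streams, acknowledge, instr_ns=40):
    """Finish time (ns) of each processor for the given instruction streams.

    streams maps a processor name to a list of operations: "op" is an
    ordinary instruction, ("put", k) writes exchange cell k and ("get", k)
    reads it. Every operation is assumed to take one instruction time.
    With acknowledge a ("get", k) cannot complete before the matching
    ("put", k) has completed; without acknowledge it never waits.
    """
    pc = {name: 0 for name in streams}
    clock = {name: 0 for name in streams}
    written_at = {}  # exchange cell -> time at which its put completed
    progress = True
    while progress:
        progress = False
        for name, ops in streams.items():
            while pc[name] < len(ops):
                op = ops[pc[name]]
                start = clock[name]
                if acknowledge and isinstance(op, tuple) and op[0] == "get":
                    if op[1] not in written_at:
                        break  # producer has not written yet: run the other side
                    start = max(start, written_at[op[1]])
                clock[name] = start + instr_ns
                if isinstance(op, tuple) and op[0] == "put":
                    written_at[op[1]] = clock[name]
                pc[name] += 1
                progress = True
    if any(pc[name] < len(ops) for name, ops in streams.items()):
        raise RuntimeError("deadlock: a get has no matching put")
    return clock


# Hypothetical Program 2 shape: word 13+2 instructions, bit 6+2 instructions.
word_stream = ["op"] * 3 + [("put", 0)] + ["op"] * 3 + [("put", 1)] + ["op"] * 7
bit_stream = [("get", 0)] + ["op"] * 2 + [("get", 1)] + ["op"] * 4
for ack in (True, False):
    label = "with ack" if ack else "no ack"
    print(label, simulate({"word": word_stream, "bit": bit_stream}, ack))
```

With acknowledge the bit processor stalls on both reads yet still finishes before the word processor, so, as in Program 2, the longer word stream alone decides the completion time.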
Noticeably shorter execution times by both processors of the CPU exchanging information without acknowledge are obtained for Program 4. This is an effect of the even load of both processors. The bit processor mainly evaluates the moments when counting should start and calculates the logic function that controls when all counters used in the program are reset. The word processor counts the subsequent elements and executes additions, information transfers and counter resetting. Program 4 consists of 72 instructions, including 10 pairs of additional instructions, introduced by the compiler, that control the information exchange between the processors. In this program the word processor executes 32+10 instructions and the bit processor 20+10 instructions.
This program is the longest one and contains the most interesting elements. The listing of the servicing of one counter is presented below.

Nr   Word processor   Bit processor   Comment
5                     LDB I0.0        ; I0.0 to ac_A
6                     DFLB 0          ; ac_A to exch. memory
7    LDR 0                            ; edge detection
8    CU C1                            ; incrementing of C1
9                     LOB M10         ; M10 to ac_A, ac_A to ac_B
10                    LOB M11         ; M11 to ac_A, ac_A to ac_B
11                    ORB             ; M10 OR M11 to ac_A
12                    DFLB 1          ; ac_A to exch. memory
13   LFL 1                            ; exch. memory to ac_A
14   WCRES C1                         ; if ac_A = 1, reset C1

As can be seen, information is exchanged between the processors twice in this part of the program. The bit processor has to transfer information about the state of input I0.0 to the word processor. The word processor has to detect the rising edge on this input and then execute the appropriate transfers between flag memory cells. The word processor processes the request by incrementing counter C1 (operations 5 to 8). Lines 9 to 14 clear the counter at the beginning of each shift in the factory. The fragment of the program in lines 46 to 48 is executed only when the condition is met; conditional execution is achieved by prefixing the instructions with W. This part of the program also contains inter-processor data exchange: the bit processor transfers the state of input I0.5 to the word processor, which checks the signal for changes. If it detects a positive edge, the marker exchange is performed.

Nr   Word processor   Bit processor   Comment
43                    LDB I0.5        ; I0.5 to ac_A (shift)
44                    DFLB 6          ; ac_A to exch. memory
45   LDR 6                            ; edge detection
46   WLDW M0                          ;
47   WSTW M1                          ; if edge: M0 to M1
48   WSTMB M10                        ; and set M10 marker

The developed programs have been used to benchmark the developed CPUs against the Siemens PLCs that are most popular in Poland. Table 2 presents the execution times of these programs. Most of the commercial units execute programs written as LAD diagrams much more slowly than the presented solutions, and direct implementation in STL gives the most efficient code. Two observations can be made. First, a significant reduction of execution time can be achieved by implementing the program in STL. Second, the only commercial unit that can be compared with the developed two-processor solutions is the S7-319, which is the most expensive one, and only if the program is written in STL and optimised.

Table 2. Test program execution time for commercial PLCs (μs)
CPU type     S7-315   S7-315           S7-315   S7-319   S7-224
Language     LAD      LAD, optimised   STL      STL      LAD
Prog. 1      57.38    55.28            6.79     1.42     268.9
Prog. 2      61.9     39.8             1.87     0.544    222.6
Prog. 3      54.2     52.4             4.07     0.59     83.68
Prog. 4      75.6     44.3             6.76     0.695    280.2
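Table 2 can be related to the designed CPU by dividing each commercial execution time by the designed CPU's time for the same program. The sketch below assumes that the completion time of the bit-word CPU without acknowledge is the work time of its slower processor in Table 1; the names TABLE2, DESIGNED and time_ratios are illustrative. A ratio above 1 means the designed CPU finished the program faster.

```python
# Execution times in microseconds for Programs 1-4, copied from Table 2.
TABLE2 = {
    "S7-315 LAD":           (57.38, 61.9, 54.2, 75.6),
    "S7-315 LAD optimised": (55.28, 39.8, 52.4, 44.3),
    "S7-315 STL":           (6.79, 1.87, 4.07, 6.76),
    "S7-319 STL":           (1.42, 0.544, 0.59, 0.695),
    "S7-224 LAD":           (268.9, 222.6, 83.68, 280.2),
}
# Designed bit-word CPU without acknowledge: slower processor's work time
# from Table 1 (assumed to be the program completion time).
DESIGNED = (max(2.22, 1.62), max(0.76, 0.50), max(0.96, 2.52), max(2.88, 2.02))


def time_ratios(commercial, designed=DESIGNED):
    """Commercial time divided by designed-CPU time, rounded to 2 decimals."""
    return [round(c / d, 2) for c, d in zip(commercial, designed)]


for unit, times in TABLE2.items():
    print(f"{unit:22s}", time_ratios(times))
```

With these figures the ratios exceed 1 for every S7-315 and S7-224 configuration, while for the S7-319 running STL they fall below 1 on all four programs.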
To measure the throughput time, one digital input module and one digital output module were connected to the CPUs. The program shown below copies the state of I0.0 to O0.0. It takes 21 clock cycles, which gives a throughput time of 420 ns for each of the presented CPUs, excluding the I/O modules.
SOEAB 1     ; number of the last module to the auxiliary register (2 clock cycles)
SCNB 0      ; state of the input module to the process image of inputs (5 clock cycles)
LDB I0.0    ; I0.0 to Ac_a (4 clock cycles)
STMB O0.0   ; Ac_a to O0.0 (4 clock cycles)
UOB 1       ; process image of outputs to the output module (6 clock cycles)
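The 420 ns figure follows directly from the cycle counts in the listing and the 50 MHz board clock (20 ns period). The short calculation below, with illustrative names, makes that arithmetic explicit; it covers only the CPU's own instructions, not the delays of the I/O modules themselves.

```python
CLOCK_HZ = 50_000_000                # development board oscillator
CLOCK_PERIOD_NS = 10**9 // CLOCK_HZ  # 20 ns

# Clock cycles per instruction of the throughput test program (from the listing).
THROUGHPUT_PROGRAM = [
    ("SOEAB 1", 2),    # number of the last module to the auxiliary register
    ("SCNB 0", 5),     # input module state to the process image of inputs
    ("LDB I0.0", 4),   # I0.0 to Ac_a
    ("STMB O0.0", 4),  # Ac_a to O0.0
    ("UOB 1", 6),      # process image of outputs to the output module
]

total_cycles = sum(cycles for _, cycles in THROUGHPUT_PROGRAM)
throughput_ns = total_cycles * CLOCK_PERIOD_NS
print(f"{total_cycles} cycles x {CLOCK_PERIOD_NS} ns = {throughput_ns} ns")
```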
Table 3. Scan times and throughput times for the tested CPUs
CPU type               S7-315   S7-319   S7-224          Designed CPU
Scan time [μs]         300      10       370 (Boolean)   50
Throughput time [ms]   3.90     2.90     6.50            1.70
The throughput times presented in Table 3 were measured for CPUs with one digital input module and one digital output module. The scan times were taken from the technical specifications of the considered PLCs.
6. CONCLUSIONS
The developed central processing unit structures are a fully custom design created from the ground up. The purpose of the design was to compare the obtained performance with that offered on the market. Two different mechanisms of inter-processor communication have been implemented in the designed units: one is based on a set of flip-flops, the other on dual-port RAM. This solution allows fully parallel operation. The use of VHDL and high-density programmable logic devices made it possible to design, construct and verify completely new constructions. The experiments were carried out on the designed units after their implementation in the FPGA. The presented processing unit has its own instruction list, which also required designing a dedicated assembler. The obtained results prove the efficiency of data exchange without acknowledgement. The data exchange requires additional instructions that are inserted automatically during compilation. Despite this program overhead, improved performance has been achieved. It must be pointed out that, even without acknowledgement, the parallel execution of the program by both processors ensures proper program processing, without race conditions that would lead to undesired behaviour of the PLC. Work on reducing the inter-processor data exchange time without acknowledge and on developing improved processor cores dedicated to PLCs will be continued. The processor architecture will be improved by reducing the number of cycles required to complete an instruction and by using queue mechanisms.
Comparing control program execution times across PLC manufacturers is difficult because there is no standard set of benchmark programs that would allow a rational and objective assessment of controller performance. The results can be misinterpreted depending on the chosen program and PLC family. Reliable reference programs and results that could be used for comparison are lacking. Manufacturers do not publish the architectural details of the PLC processing unit; the available information is very general and does not cover the number of processors operating in the CPU or their functional dependencies. The execution times of the proposed programs presented in Table 2 differ significantly from one another. We can only suppose that these differences result from differences in the architectures of the PLC CPUs.
ACKNOWLEDGEMENT
This work has been supported by the Polish Ministry of Science and Higher Education (5391/B/T02/2010/38).
REFERENCES
Aramaki N., Shimokawa Y., Kuno S., Saitoh T., Hashimoto H. (1997), A New Architecture for High-performance Programmable Logic Controller, Proceedings of the IECON'97 23rd International Conference on Industrial Electronics, Control and Instrumentation, IEEE, vol. 1, pp. 187-190, New York, USA.
Berger H. (2001), Automating with STEP 7 in STL and SCL: SIMATIC S7-300/400 Programmable Controllers, Siemens AG, Germany.
Chmiel M., Hrynkiewicz E. (2005), Remarks on Parallel Bit-Byte CPU Structures of Programmable Logic Controllers. In: Design of Embedded Control Systems, Section V (Adamski M.A., Karatkevich A., Węgrzyn M., eds.), pp. 231-242, Springer Science+Business Media, Inc.
Chmiel M., Hrynkiewicz E., Milik A. (2005), Concurrent Operation of the Processors in Bit-Byte CPU of a PLC, Preprints of the IFAC World Congress, Prague, Czech Republic, July 3-8.
Chmiel M., Hrynkiewicz E. (2008), Fast Operating Bit-Byte PLC, Preprints of the 17th IFAC World Congress (on DVD-ROM), Seoul, Korea, July 6-11, pp. 14810-14815.
Chmiel M. (2008), On Reducing PLC Response Time, Bulletin of the Polish Academy of Sciences, Technical Sciences, Vol. 56, No. 3, pp. 229-238.
Chmiel M., Mocha J., Hrynkiewicz E. (2010), An FPGA-Based Bit-Word PLC CPUs Development Platform, The International IFAC Workshop on Programmable Devices and Embedded Systems, PDeS'10, October 6-7, Pszczyna, Poland, pp. 155-160.
Chmiel M., Mocha J., Hrynkiewicz E., Milik A. (2011), Central Processing Units for PLC Implementation in Virtex-4 FPGA, Proceedings of the 18th IFAC World Congress, August 28-September 2, Milano, Italy.
Michel G. (1990), Programmable Logic Controllers: Architecture and Applications, John Wiley & Sons, West Sussex, England.
Siemens (2002), SIMATIC S7-200 Programmable Controller System Manual, ed. 04/2002, Siemens AG.
Skahill K. (1996), VHDL Language for Programmable Devices, Prentice Hall Practice.
Xilinx (2008), Virtex-4 FPGA User Guide, UG070, version 2.6, www.xilinx.com, USA.