' ! " J (, L(~ ~!R~J
|LilGTRIG POLLIR $t8?11f?18 R|II|IllelCH
m
ELSEVIER
Electric Power Systems Research 33 (1995) 69-75
Real-time simulation methods for a six-pulse converter Tim A. Haskew, David Jeff Jackson Department o/Electrical Engineering, The Universi O' o! Alabama, Box 870286, Tuscaloosa, AL 35487-0286, USA Received 10 October 1994
Abstract This paper demonstrates the simulation of a six-pulse converter within stringent real-time constraints. A clamped-input technique was employed to reduce the computational burden at each time step and to relax the time step constraints associated with other system components which may be interacting in a system-level simulation. Additionally, unique architectural aspects of the computing platform were exploited to enhance the simulation performance. The results and methodologies employed to obtain real-time processing on the IBM RISC System/6000 that are presented are intended to provide guidelines for various problems of a similar nature.
Keywords: Converters; Switching devices; Simulation techniques; Distributed processing; Data processing
I. Introduction Six-pulse power converters are common in many industrial and utility applications. Thus, simulation of these types of systems inherently includes simulation of converter circuits. The primary challenge in the simulation of such converters is presented by the power semiconductor switching devices. To accurately treat the effects of device switching, small simulation time steps are often employed. This produces excellent results at the expense of increased computation time. The small time step is applied to the entire system, which introduces an unnecessarily high level of oversampling to system components operating predominantly at the fundamental power frequency. Real-time simulation of power systems is a topic that is currently receiving a great deal of attention. The concept is to provide digital transient network analyzers that will allow digital hardware in the loop. This will provide a much more cost effective means of control and protection system design and calibration as well as operator training than currently exists. The current concept in these simulation methodologies is based on distributed processing architectures [1-3]. Other work has been focused on serial simulation of smaller scale systems for relay testing with digital signal processing (DSP) hardware [4]. Specifically, the simulation of a six-pulse converter is considered herein. This simulation is designed to oper0378-7796/95/$09.50 © 1995 Elsevier Science S.A. All rights reserved S S D I 0378-7796(95)00930-G
ate in real time with realizable interfaces to other components in a larger system. The computational platform used was the IBM RISC System/6000 Model 320. C and assembly language analyses are presented.
2. Converter topology The converter scheme for consideration is presented in Fig. 1. For modeling purposes, the valves were represented as bistate resistances; each valve had definite on and off resistance values. This modeling procedure treats conduction losses, but neglects switching losses. The circuit model for the converter is shown in Fig. 2.
+; vac
{ {? { l
-------o
3 o
I Fig. 1. Schematic diagram ot" the converter.
T.A. Haskew, D.J. Jackson Electric Power System.~ Research 33 (1995) 69 75
70
Table t Circuit element values
Fig. 2. Circuit model of the converter.
•
0
1
R
4 o
~
Value
R~ C L V. Vb V~ VI = V2 R~
100 f~ 101aF 10mH 60 sin(100zrt) kV 60 sin(100~rt - 120 °) kV 60 sin(100~rt + 120 °) kV 50 kV 5 0.056 Q when conducting 1 Mr2 when blocking
&/
s
+
Va
.
•
l1
_d- Vdc i/2
Circuit element
R.
Vc
; I
L_.7~ +__z,'% ?. I 2--*¢vv-~
Fig. 3 (left)• DC supply. Fig. 4 (right). Load model•
The DC input was modeled as an ideal bipolar DC supply, as indicated in Fig. 3. Fig. 4 shows the load model employed. Note that this load representation is a Thrvenin equivalent for each phase with a purely resistive impedance. The load model represents an interface with other system components. Hence, the Thrvenin equivalent may be varying dynamically. Values for the circuit elements are provided in Table 1.
3. State model formulation The state model for the complete network was found by graph theoretic techniques [5]. A tree for the network included all capacitors and voltage sources and excluded all inductors. Symbolic voltage and current sources were substituted for the capacitors and inductors, respectively, thus producing a resistive equivalent network, as shown in Fig. 5. A component equation (CE) was written for each circuit element in the converter model. All tree currents were expressed as a sum of cotree currents, thus producing a set of independent Kirchhoff current law (KCL) equations. Similarly, all cotree voltages were expressed as a sum of tree voltages. This provided a set of independent Kirchhoff voltage law (KVL) equations. Loop analysis was found to be optimal for the network. Thus, the K C L equations were substituted into the CEs. This result was substituted in the KVL equations. The result was a set of 12 simultaneous equations. The dummy equations were removed from the set, and six simultaneous independent equations
involving capacitor voltages, inductor currents, load currents, and lower-leg R C combination currents remained. Inspection of these equations, as was expected, indicated that the operation of each phase of the converter was independent of the other phases. This fact was exploited for simulation coding purposes. Solution of the nondummy equations allowed the lower-leg R C currents and load currents to be written as functions of the inputs and state variables. Thus, a standard-form state model could be formulated for each phase. The state model was expressed as
xi = A~/xi + B~/ui
(1)
The state model of Eq. (1) was converted to one of homogeneous nature: -fi = C0xi
(2)
In Eq. (2), xi is a state vector which has been augmented to include the original state vector as well as the voltage inputs. The subscript i denotes the phase under consideration ( i = 1,2, 3). The coefficient matrix is defined in Eq. (3) and the state vector in Eq. (4):
~--~R II ~--~g 21
~. R31
VcllT~ ILII~ Vc21~ iL21 ~ Vc31_+_+ iL3]~ 13+
vl i
Vb
v2,
)
~
R22 +_~ R32 Rc
V~z2_ u21
Rc
Rc
V~e2 "l
Vc3e
Fig. 5. Resistive equivalent network.
vc
71
T.A. Haskew, D.J. Jackson/Electric Power Systems Research 33 (1995) 69 75
xi =
(4) Ui
The subscript i on the coefficient matrices denotes the phase described, while the subscript j denotes the matrix corresponding to the present switch status. Note that the lower submatrices of C 0 are zero because the input is clamped over the time step, thereby having a zero-valued time derivative. Each phase contains two switches that can be in one of two states. Thus, there are four coefficient matrices for each phase. As each phase is identical in component values, only four coefficient matrices need to be computed for the entire converter, as indicated by Eq. (5). Each coefficient matrix was o f dimension 7 x 7. C i / = C2i = C3~ = C,
j = 1, 2, 3, 4
(5)
4. State model simulation
Each phase of the converter had an independent state model. Thus, a set of state models required simulation. For the ith state model (i = 1, 2, 3), a first-order finite-difference simulation was employed. The recursive relation is provided in Eq. (6), where Wj is the transition matrix defined in Eq. (7):
xi(tk) = ~,Xi(tk
t)
(6)
~i=(I--AtCi)
I
(7)
where I is the identity matrix, and At is the simulation time step. Along with the recursive matrix multiplication indicated in Eq. (6), the status of all switches required evaluation at each simulation solution point. An SCR is shown in Fig. 6(a) with the voltage and current conventions defined in the passive sense. The SCR circuit model is shown in Fig. 6(b). The switching conditions for an ideal SCR are presented in Table 2. A state of 0 represents an open switch, and a state of 1 represents a closed switch. When the bistate representation of the SCR is employed, the current and voltage are proportional. Thus, the comparative status relation can be expressed without the voltage, as indicated in Table 3. Note that only under two conditions does the switch status change. For simulation purposes, these are the only conditions that must be evaluated. Furthermore, no information has to be computed from the state vector. Since the i
(a)
(b)
Fig. 6. (a) SCR with positive Vii conventions. (b) SCR equivalent model with V/i conventions.
Table 2 Ideal SCR switching conditions At previous time step State
V
0 0
~>0 <0
At present time step i
1 I
Drive signal
State
>0
1 0
>0 ~<0
I 0
Table 3 Resistive SCR model switching conditions At previous time step
At present time step
State
i
Drive signal
State
0 0
~>0 <0
>0
1 0
1 1
>0 ~<0
1 0
switching devices are in series with inductors, the switch currents appear directly in the state vector. With the simplistic DC source model, the only output required for the simulation was the load currents. In fact, these currents can be found from three of the dummy equations previously mentioned. Thus, the load currents can be expressed as
i,(tk ) = Djxi(tk )
(8)
It bears mention at this point that the simulation is not dependent on the evaluation of the load currents. These values are only required at those points in simulation time when converter-load interfaces are required. Fig. 7 shows the load currents calculated during simulation. The valves were fired at 50 Hz and a 50 ~ts time step was used. This time step was the target value for real-time simulation. The load voltage sources were 4000,0
-200O.0
~ o 0 o - -
g.000
~
i
,
0--010
,
OmO ~
01030 Time (s)
Fig. 7. Simulation output waveforms.
Oleo
0'050
72
T.A. Haskew, D.J. Jackson/Electric Power 3~vstems Research 33 (1995) 69 75
4000.0 '',
¢!
, '
--il ..........i."
i
multiuser, multitasking nature of the AIX operating system.
2000.0 6. The IBM RISC System/6000 computer architecture <
0.0 /:':i
"2000"I ! -4000
'J
',./ 0 010
•/
i 0 020 0.030 Time(s)
,./ i 0.040
0.050
Fig. 8. ATP-EMTP simulationresults. interfaced with the simulation at each time step to provide a worst-case scenario. To demonstrate the validity of the simulation, Fig. 8 shows the results for the same system simulated with the Alternative Transient Program (ATP) version of the Electromagnetic Transient Program (EMTP). The results are identical, and thus are not shown on the same set of axes.
To most effectively exploit the potential of a given computer architecture as a platform for real-time simulation, a fundamental understanding of certain aspects of the computer architecture must be developed. The computer architecture for the RS/6000, shown in Fig. 9, consists of a highly concurrent, superscalar, pipelined central processing unit (CPU). Detailed descriptions of the processor architecture may be found in Refs. [6-8]. The CPU is capable of simultaneously executing integer, floating-point, and branch related instructions. Additionally, the processor can execute compound operations, such as floating-point multiply and accumulate, which play a significant role in decreasing the total execution time for simulations of this nature. The pipelined, concurrent organization of the processor allows the execution of many operations to be overlapped, thus allowing some operations to be executed with zero cycle cost. Exploiting these features yields very high fixed- and floating-point performance.
5. Considerations for real-time simulation 7. Simulation implementation Numerous considerations must be made for realtime simulation systems. Generally, these considerations are, in fact, a subset of the issues that must be addressed in any dedicated embedded, real-time environment. From a programming viewpoint, it is naturally imperative that executed code be as efficient as possible. Obtaining this efficiency involves a multilevel process which includes consideration of algorithms, compiler efficiency, coding style, code optimization techniques, and the target computer architecture which will, to a large degree, determine the feasibility of the real-time simulation. Additional concerns arise when the target computer architecture is general purpose in nature and supports a multiuser, multitasking operating system such as AIX (i.e. the IBM implementation of UNIX). These concerns address such issues as interrupt latency tolerance, context switch time between multiple processes executing on the system, control of memory management within the computer system, and dynamic versus fixed priorities for scheduling processes on the system. The ability to accurately analyze execution times for the simulation is also of primary concern. Obtaining sufficient efficiency for the simulation primarily involves an analysis of code generated using the C language, inclusion of several commonly utilized code optimization techniques which exploit the architectural features of the IBM RISC System/6000 platform, and inclusion of code to minimize effects of the
As previously mentioned, the simulation was developed in the C language for execution on the IBM RISC System/6000 Model 320 platform. Several iterations of optimization, which consisted of various levels of loop decomposition, were performed to allow the simulation to meet the real-time constraints. An analysis of the generated assembly language code provided insight to the optimization procedures applied and to the effect these procedures had on execution/simulation time. The time-critical section in the initial simulation program consisted of a triply nested loop structure in which the state vector was updated for each phase of the converter output. The outer loop iterates across the inde-
I
DataCache
]
Main Memory
Fig. 9. The IBM R1SC System/6000computer architecture.
73
T.A. Haskew, D.J. Jackson/Electric Power Systems Research 33 (1995) 69-73
Table 4 Simulation times and associateddata for optimizationsof the simulation Program (C code)
Average simulation time (ms) for 900 time steps
Maximum simulation time (ms)
Minimum simulation time (ms)
Standard deviation
Code size (bytes)
( 1) Totally looped (2) Decomposedterms (3) Decomposedequations (4) Totally decomposed
49.05 41.37 39.33 33.96
51.75 41.92 40.75 35.84
48.46 41.10 38.89 33.57
0.42 0.23 0.31 0.36
21177 21103 21531 21929
pendent phases, the secondary loop iterates across the equations to be evaluated for a given phase, and the inner loop iterates across terms within each equation. As expected, this procedure introduced the highest degree of computational overhead. Although some of the loop overhead can be hidden within the computation [6,9], the real-time constraints were not met with this approach. The first optimization technique utilized was the decomposition of the inner loop structure to remove the loop overhead associated with iterating across individual terms within an equation. This technique provided a small degree of reduction in computational overhead, but did not meet the given time constraints. The loop decomposition was extended to include the iterations across the equations for a given phase. The code structure then consisted of two completely unrolled/decomposed loops which formed the set of independent equations for each phase. The program consisted of a singly nested loop structure iterating across the three phases. As a final decomposition optimization, the phase loop was also decomposed. The resulting code met the real-time constraints of the system under most operating conditions. The execution/simulation times and additional data of interest for each of the programs listed above are given in Table 4. A point which should be noted is the code size reduction in the program with the term loop decomposed. This reduction indicates that the overhead in the inner loop structure of the original program is prohibitive, both computationally and spatially, with respect to the complexity of the operations performed within the loop. The overhead is attributed to array element address calculations for the three 3-dimensional arrays referenced within the loop and the code comprising the loop structure itself. An important compiler option used to allow the analysis of assembly code generated by the C compiler is the -qdebug = cycles switch. Use of this switch when generating assembly language output, as described in Ref. [9], enables the generation of information saved by instruction overlap made possible by the architecture of the IBM RISC System/6000. The compiler generates
the number of cycles required for execution of a particular instruction within the program. A cycle count of zero for a given instruction implies that the instruction is overlapped with another currently executing instruction and therefore does not add to the execution time for the program. An instruction of this type is referred to as hidden. A sample section of the assembly language generated for the original simulation is shown in Table 5. The use of hidden and compound instructions ( F M A = floating-point multiply and accumulate) within this code fragment is considered efficient. However, the ratio of data movement operations to arithmetic operations is high and is therefore considered an inefficient implementation. The total static cycle count for each implementation of the real-time portion of the simulation and a breakdown of the cycles into arithmetic, data movement, and loop overhead instructions are summarized in Table 6. The static cycle count consists of all visible arithmetic, data movement, and loop overhead cycles. The static cycle and instruction count data for the various simulation implementations show that the compiler is able to effectively schedule overlapped instructions and thus reduce total execution time. Additionally, the percentage of data movement cycles required for execution is significantly reduced in the case of the totally decomposed code. These factors, coupled together, allow the simulation to progress in substantially less than real time for the last case. Although the static cycle count increases with the increasing decomposed code, the number of dynamically executed instructions is in fact reduced. A summary of the number of dynamically executed instructions for a single simulation time step for each program is given in Table 7. This instruction count, combined with the information in Table 6, demonstrates the effectiveness of performing the described code optimizations. As shown, the number of executed instructions is significantly reduced for the totally decomposed code. This reduction, combined with the reduction in loop overhead, and the increased ratio of hidden-to-visible instructions accounts for the increased efficiency and decreased execution time of the simula-
T.A. Haskew, D.J. Jackson/Electric Power Systems Research 33 (1995) 69 75
74
Table 5 Inner loop construct and corresponding assembly language code C language code
Assembly language code No. of cycles
Instruction mnemonic
for(kkk = 0;kkk < 7;kkk + + ) x[iii][k][jjj] + = phi[ii][jjj][kkk]*x[iii][i][kkk];
Operands
Comments
Beginning of loop Load data from array phi Load data from array x Multiply and add Increment loop index Store result in array x Branch until complete Increment array index for x
CL.68 1
LFDU
fp0,r5 = phi(r5,8)
1
LFDU
fp2,r6 = x(r6,8)
1
FMA
fpl = fpl,fp0,fp2
0
AI
r0 = r0,1
1
STFL
x(r30,r24,0) = fpl
0
BCT
CL.68
1
A!
r30 = r30,8
tion. The execution times, per simulation time step, for each program, with respect to the real-time constraint, are shown in Fig. 10. Although the optimization methods described allow the simulation to proceed in real time under most operating conditions, the multiuser, multitasking nature of the AIX operating system may introduce latency in the simulation. Latency is introduced primarily by the normally varying priorities the AIX operating system assigns to various programs queued to execute, context switching between multiple processes executing on the system, and the overhead associated with code being transferred to virtual memory. To overcome these latencies, code was introduced into the simulation which establishes a fixed, high priority for the process. This favored priority effectively blocks other user code from executing on the system until the simulation is complete. This favored priority also slightly reduces the execution time for the simulation. Additionally, code was introduced to pin the
simulation code to real memory, thus preventing the operating system from transferring the code to virtual memory. Therefore, no overhead associated with virtual page swapping was present. The effects of these two techniques are shown in Fig. 11. The original totally decomposed code is executed twice: once without any system load and once when an artificial system load is introduced twice during the simulation. The system was artificially loaded, by introducing additional programs to be executed, at the sixteenth and sixty-third simulation time steps. As shown in Fig. 11, the execution time of the original code corresponding to the totally decomposed case remains sensitive to the system load and may exceed the real-time constraint. Also as indicated, the simulation may be desensitized to system load, thus minimizing variance in execution time to allow real-time constraints to continue to be met. Additional descriptions of these coding techniques may be found in Ref.
[lO].
Table 6 Static instruction cycle breakdown for the simulations Program
(1) (2) (3) (4)
Totally looped Decomposed terms Decomposed equations Totally decomposed
Arithmetic cycles used
Data movement cycles
Loop overhead cycles
Visible
Hidden
Visible
Hidden
Visible
Hidden
161 146 170 274
l0 16 22 73
141 121 191 192
30 29 46 59
33 34 37 64
14 13
l0 15
Hidden instructions (total)
Total static cycle count
54 56 78 147
335 301 407 530
T.A. Haskew, D.J. Jackson/"Electric Power Systems Research 33 (1995) 69 75
Table 7 Dynamic instruction count for a single simulation time step for each program Program
Dynamic instruction count per simulation time step
( 1) (2) (3) (4)
1024 750 738 685
Totally looped Decomposed terms Decomposed equations Totally decomposed
50 p.s ~
n
t
75
8. Simulation conclusions The optimization procedures described provide for the real-time execution of the simulation. Additionally, the optimizations provide better than real-time results, thus allowing the simulation to be interfaced with other component simulations in a larger system. This interface is the subject of future research. Availability of increasingly powerful computers, namely, advanced models of the IBM RISC System/6000, will continue to improve simulation times. Thus, the optimization techniques described, being applicable to higher performance models, afford the opportunity for formulating more complex simulations which may interact with other device simulations in a distributed fashion. The accuracy of the simulation has been verified on existing simulation and analysis packages.
References (1)
(2)
(3)
(4)
Fig. 10. Simulation times for programs with respect to real time.
140.0
•
.
.
,
•
.
.
,
.
.
.
,
.
.
.
,
.
.
.
1 -Original code: no system load - - - Original code: with introduced system load - " " Modified code: with introduced system load
120.0
100.0
I.~
r
?t
80.0
E 6O.0
.I
I
I
J
i
i
40.0
20"01.0
. . . . . . . . . . 21,0
;,
41.0 6 .0 Simulation Time Step
,;,
8 .0
Fig. I I. Effects of fixing program execution priority and code pinning.
[1] D.M. Falcao, E. Kaszkurewicz and H,L.S. Almeida, Application of parallel processing techniques to the simulation of power system electromagnetic transients, IEEE Trans. Power Syst., 8 (1993) 90 96. [2] R.C, Durie and C. Pottle, An extensible real-time digital transient network analyzer, IEEE Trans. Power Syst., 8 (1993) 84 89. [3] H. Taoka, I. lyoda, H. Noguchi, N. Sato and T. Nakazawa, Real-time digital simulator for power system analysis on a hypercube computer, 1EEE Trans. Power Syst., 7 (1992) l 10. [4] M. Kezunovi~ et al., Transients computation for relay testing in real-time, IEEE Trans. Power Delivery, 9 (3) (1994) 1298 1307. [5] W.A. Blackwell and L.L. Grigsby, lntroductoo' Network Theorl,, PWS, Boston, MA, 1985. [6] M. Misra (ed.), IBM RISC System/6000 Technology, IBM Document No. SA23-2619, IBM, Austin, TX, 1990. [7] R.R. Oehler and R.D. Groves, IBM RISC System/6000 processor architecture, IBM J. Res. Dev., 34 (1990) 23 36. [8] G.F. Grohoski, Machine organization of the IBM RISC System/ 6000 processor, IBM J. Res. Dev., 34 (1990) 37 58. [9] R. Bell, IBM RISC System/6000 PerJormance Tuning jor Numerically hltensive FORTRAN and C Programs, IBM Document No. GC24-3611, IBM, Poughkeepsie, NY, 1991. [10] R. Jordan et al., AIX Version 3.1 RISC System~6000 as a Real-Time St'stem, IBM Document No. GG24-3633-0, IBM, Austin, TX, 1991.