Improvement of GPU parallel real-time equilibrium reconstruction for plasma control

Fusion Engineering and Design 128 (2018) 82–85


Fusion Engineering and Design journal homepage: www.elsevier.com/locate/fusengdes


Y. Huang (a), B.J. Xiao (a,b), Z.P. Luo (a,*), Q.P. Yuan (a)

(a) Institute of Plasma Physics, Chinese Academy of Sciences, Hefei, China
(b) School of Nuclear Science & Technology, University of Science & Technology of China, China

ARTICLE INFO

Keywords: Equilibrium reconstruction; Plasma control; EAST; GPU parallel computation

ABSTRACT

Improvements to the GPU parallel real-time equilibrium reconstruction code P-EFIT are presented. P-EFIT is based on the EFIT framework but is built on the CUDA™ architecture to take advantage of massively parallel graphics processing unit (GPU) cores and significantly accelerate the computation. A newly designed architecture and newly applied techniques allow P-EFIT to be integrated into the plasma control system (PCS) as a sub-algorithm. Efficient parallel strategies and algorithms allow P-EFIT to achieve good computational performance. P-EFIT can complete one equilibrium iteration and transfer real-time diagnostics, control signals and equilibrium data in 375 μs with a 129 × 129 spatial grid. P-EFIT provides high spatial resolution, customized modules and internal current profile reconstruction for real-time plasma control in EAST.

* Corresponding author. E-mail address: [email protected] (Z.P. Luo).

https://doi.org/10.1016/j.fusengdes.2018.01.043
Received 22 June 2017; Received in revised form 2 January 2018; Accepted 17 January 2018
0920-3796/ © 2018 Elsevier B.V. All rights reserved.

1. Introduction

Experimental equilibrium reconstruction provides information such as the plasma boundary, current density and safety factor profiles for tokamak operation and research. EFIT reconstructs the equilibrium by finding the Grad-Shafranov solution that is the least-squares best fit to the experimental measurements [1]. It efficiently combines equilibrium and fitting iterations to search for an optimum solution and has been widely used in tokamaks worldwide. However, the full EFIT algorithm is too computationally intensive to be used in real-time control, particularly at high spatial grid resolution. Its real-time version, RT-EFIT [2], is currently used in DIII-D, EAST, KSTAR, NSTX and MAST [3–6]. Based on CPU OpenMP parallel computation, the codes GPEC [7] and LIUQE [8] have been developed and applied in ASDEX-U [9]. Additionally, P-EFIT, a GPU-based parallelized version of the EFIT code for magnetic reconstructions, has been developed; it significantly accelerates the whole computation process [10]. P-EFIT efficiently takes advantage of massively parallel GPU cores to accelerate the EFIT reconstruction algorithms, and it has been successfully implemented for plasma control in EAST [11].

P-EFIT is based on the EFIT framework but, instead of using a multicore CPU platform, it takes advantage of massively parallel GPU cores to significantly accelerate the computation. The equilibrium reconstruction algorithm consists of several sequential middle-scale matrix multiplications that need to be iterated. Parallelizing these algorithms requires many computational cores and a large amount of communication between cores. There are standard linear algebra routines in the CUDA™ library, but the matrices in plasma equilibrium reconstruction algorithms are too small for these routines to achieve good performance when used directly. The customized parallel algorithms in the P-EFIT modules therefore have to be designed carefully, with full consideration of the needs of the numerical algorithms and the capacity of the GPU. Efficiently distributing work over hundreds of GPU cores and minimizing the communication among them are the basic principles. Some optimizations for middle-scale matrix multiplication are described in [10]. A fast Grad-Shafranov solver based on eigenvalue decomposition solves the block tri-diagonal linear system in parallel on the GPU, as described in [12]. Efficient parallel strategies and algorithms allow P-EFIT to achieve good computational performance regardless of different current representations and diagnostics [13]. P-EFIT supports Multiple-Input Multiple-Output (MIMO) plasma shape control experiments [14,15] and includes the POlarimeter-INTerferometer (POINT) diagnostic for real-time plasma current density and q profile reconstruction in EAST [13].

Advanced plasma control for EAST steady-state operation requires more accurate and detailed real-time plasma equilibrium reconstruction. P-EFIT needs to be improved to have higher spatial resolution [16], more diagnostics and integration into PCS. The framework and algorithms in P-EFIT should be updated for universality and integration. With the newly designed architecture and interface, PCS can directly manage P-EFIT, which is integrated into PCS as a sub-algorithm; the equilibrium data and control signals are transferred through PCI-E. With the development of GPU hardware, optimization is carried out for the P100 GPU. Robust parallel strategies and algorithms allow P-EFIT to achieve good computational performance, including equilibrium calculation, global parameters, geometry parameters, profile calculation and data transmission. After integration into PCS, P-EFIT can complete one equilibrium iteration and transfer real-time diagnostics, control signals and equilibrium data in 375 μs with a 129 × 129 spatial grid on the P100 GPU. With all these improvements, P-EFIT can provide precise, detailed real-time plasma equilibrium reconstruction for sophisticated plasma control in EAST.

2. Improvement in P-EFIT algorithms and architecture

P-EFIT was previously a system independent of PCS, and a reflective memory (RFM) network was built between them to share real-time data [11]. However, this implementation has some disadvantages when extending P-EFIT's capacity to allow more applications in plasma control. Firstly, because P-EFIT was coupled to PCS as a sub-system, the only communication method between the two systems was the RFM network. When emergency situations occur, PCS cannot manage P-EFIT directly, which could make the systems unstable. A more reliable approach is to integrate P-EFIT into PCS as a sub-algorithm, so that PCS can invoke and manage P-EFIT directly as a thread. Secondly, more applications in plasma control require much more data to be transferred in real time, and the data transmission speed of RFM cannot fulfill this requirement. After integrating P-EFIT into PCS as a sub-algorithm, PCI-E can be used to transfer large amounts of real-time data between the CPU and GPU. Lastly, more modules in P-EFIT need to be developed to meet the real-time control requirements.

2.1. Integrated into PCS as a sub-algorithm

The new EAST PCS has one host and one real-time computing node. The real-time computing machine is a Dell R730 server with two eight-core Intel Xeon E5-2667 CPUs running at 3.2 GHz, on which the 64-bit Red Hat Linux 6.7 operating system is installed [17]. After installing the Tesla P100 GPU hardware into the PCS real-time computing node, as shown in Fig. 1, the first challenge is how to make P-EFIT directly invocable by PCS. EAST PCS is written in the C programming language and P-EFIT is written in CUDA-C [18]. Although CUDA-C is the C language with CUDA extensions, it is hard to adapt either source code to the other's. The integration is realized by compiling and wrapping the P-EFIT source code into a static library, which is linked when compiling the PCS source code. The architecture is shown in Fig. 2: the GPU performs the equilibrium calculations, and PCS allocates one CPU core, called RT7, as the P-EFIT host. All data are transferred between CPU and GPU memory through PCI-E; the stream and DMA techniques used to accelerate data transmission are introduced in the following sections. In this way, P-EFIT is integrated into PCS as a sub-algorithm that can be invoked and managed by PCS directly.

Fig. 1. The hardware architecture of integrating P-EFIT into PCS: a Tesla P100 GPU is installed in the PCS real-time computing node.

Fig. 2. The software architecture of integrating P-EFIT into PCS: PCS invokes P-EFIT directly, and data are transferred between PCS and P-EFIT through PCI-E.

Fig. 3. Two streams are allocated on the GPU: stream0 drives the kernel engine for GPU parallel computation, and stream1 drives the copy engine for transferring data through PCI-E.

Fig. 4. Compared with sequential data transmission, concurrent data transmission overlaps most of the data transmission and copy time with computation.


Fig. 5. Comparison of control error signals between the former and updated P-EFIT versions. RX1, ZX1, RX2 and ZX2 are X-point control errors; SEG04, SEG03, SEG06, SEG01, SEG08 and SEG09 are control errors on control segments.

2.2. Concurrent data interface through PCI-E

In addition to the magnetic diagnostics and control errors used in plasma shape control, much more data needs to be transferred in real time when extending P-EFIT's capacity to allow more applications in plasma control. For example, plasma boundary locations are needed in density control [19], profile information is required in current profile control [13], and the poloidal flux distribution on the grid is needed in NTM control [20]. When P-EFIT reconstructs the equilibrium on a 129 × 129 grid, about 80,000 bytes of data in total, containing control parameters and equilibrium results, need to be transferred from GPU memory to CPU memory through PCI-E per iteration in real time. Although the bandwidth of PCI-E 3.0 is much higher than that of RFM, sequential data transmission and copying would cost over 100 μs.

As shown in Fig. 3, the stream technique [18] is applied in P-EFIT. Two streams are allocated on the GPU: one drives the GPU's kernel engine for parallel computation, and the other drives the GPU's copy engine for transferring data through PCI-E. The direct memory access (DMA) technique allows the GPU to access CPU memory directly. By combining the stream and DMA techniques, a large quantity of data can be transferred concurrently through PCI-E between PCS and P-EFIT, as shown in Fig. 4. Compared with sequential data transmission, concurrent data transmission allows P-EFIT to transfer data while parallel computation kernels are executing. In this way, most of the time cost of data transmission and copying can be overlapped, and the additional time consumed by data transfer per iteration is reduced to about 25 μs.
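The benefit of overlapping transfer with computation can be illustrated with a toy timing model. The sketch below is not P-EFIT code: it only combines the figures quoted in the text and in Table 1 (375 μs per iteration, of which about 25 μs is data transmission, and "over 100 μs" for a sequential copy, taken here as a lower bound of 100 μs). The function names and the split of the transfer into a hidden part and an exposed part are assumptions made for illustration.

```python
def iteration_time_sequential(compute_us, transfer_us):
    """Computation and copy run back to back, so their times add."""
    return compute_us + transfer_us


def iteration_time_overlapped(compute_us, transfer_us, exposed_us):
    """The copy engine runs alongside the kernel engine.

    Only `exposed_us` of the transfer (the part that cannot be hidden
    behind running kernels) lengthens the iteration.
    """
    if not 0 <= exposed_us <= transfer_us:
        raise ValueError("exposed time must lie between 0 and the transfer time")
    return compute_us + exposed_us


# Figures taken from the text; 100 us is only the stated lower bound.
COMPUTE_US = 350    # 375 us total minus 25 us data transmission (Table 1)
TRANSFER_US = 100   # "over 100 us" for sequential transmission and copy
EXPOSED_US = 25     # transfer cost left per iteration after overlapping

sequential = iteration_time_sequential(COMPUTE_US, TRANSFER_US)
overlapped = iteration_time_overlapped(COMPUTE_US, TRANSFER_US, EXPOSED_US)
hidden_fraction = (TRANSFER_US - EXPOSED_US) / TRANSFER_US

print(f"sequential: at least {sequential} us per iteration")
print(f"overlapped: about {overlapped} us per iteration")
print(f"transfer time hidden behind kernels: at least {hidden_fraction:.0%}")
```

Under these assumptions, at least three quarters of the transfer time is hidden behind kernel execution. The real saving depends on how the kernels and copies are actually scheduled on the two streams.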

Fig. 6. Evolution of the execution time of one P-EFIT iteration over the whole discharge.

Table 1 Time consumed by each part of P-EFIT in a typical equilibrium reconstruction iteration.

Part                               T (μs)    %
Equilibrium calculation            232       61.9
Control signals calculation        19        5.1
Global parameters calculation      29        7.7
Profile calculation                48        12.8
Geometry parameters calculation    22        5.9
Data transmission                  25        6.6
Total                              375       100
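As a quick consistency check on Table 1, the short sketch below recomputes the total and the percentage share of each part from the listed times. The variable names are illustrative only. The unrounded share of data transmission is about 6.67%, which the table lists as 6.6, presumably so that the percentage column sums to exactly 100.

```python
# Times per part in microseconds, copied from Table 1.
part_times_us = {
    "Equilibrium calculation": 232,
    "Control signals calculation": 19,
    "Global parameters calculation": 29,
    "Profile calculation": 48,
    "Geometry parameters calculation": 22,
    "Data transmission": 25,
}

# The parts should add up to the quoted time per iteration.
total_us = sum(part_times_us.values())

# Share of each part as a percentage of the whole iteration.
shares = {name: 100.0 * t / total_us for name, t in part_times_us.items()}

for name, t in part_times_us.items():
    print(f"{name:<32} {t:>4} us  {shares[name]:5.1f} %")
print(f"{'Total':<32} {total_us:>4} us")
```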

2.3. Newly designed algorithms and modules

When extending P-EFIT's capacity to allow more applications, global parameters such as βp, li, volume and stored energy, together with the safety factor profile, are important information for plasma control and physics analysis [20]. The previous P-EFIT version did not trace the exact location of the plasma boundary and magnetic flux surfaces; it only computed the flux at the boundary, which is sufficient for plasma shape control with the isoflux control algorithm [2,11]. To calculate βp, li, volume, stored energy and the safety factor profile, the locations of the plasma boundary and flux surfaces are needed first. Integrals over the whole plasma volume, the poloidal cross-section, the plasma boundary and the flux surfaces are then performed to calculate the results. These computations cost more than 30% of the whole computation time in EFIT.

In designing the GPU algorithms for plasma boundary tracing and flux surface location searching, the algorithm in EFIT and the parallel strategy in reference [8] are adopted. The GPU searches along 96 angular rays from the magnetic axis; by using the massively parallel GPU cores, each ray and each point on the rays can be searched in parallel, and the whole boundary tracing and surface computation costs less than 50 μs with a 129 × 129 spatial grid. There are also image processing algorithms that could perform parallel contouring of the flux; these may be explored in a future P-EFIT version. Once the plasma boundary and flux surface locations are known, the integrals can be easily parallelized on the GPU, and βp, li, volume, stored energy and safety factor can be calculated in 30 μs with a 129 × 129 spatial grid.

At the same time, a two-dimensional interpolation algorithm takes the place of the Green's function algorithm in calculating the control error signals [11]. Compared with the Green's function algorithm, the interpolation algorithm is more computationally efficient and flexible; it is faster by a factor of ten.
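The ray-based boundary search can be sketched in a few lines of serial Python. This is only an illustration of the idea, not the P-EFIT implementation: the flux map is a toy analytic function with circular flux surfaces, the grid extent, axis position and boundary radius are hypothetical, and bilinear interpolation is assumed as one simple form of two-dimensional interpolation. Only the 129 × 129 grid and the 96 rays are taken from the text.

```python
import math

N = 129                      # grid points per side, as in the text
R_MIN, R_MAX = 1.0, 3.0      # hypothetical grid extent in metres
Z_MIN, Z_MAX = -1.0, 1.0
R_AXIS, Z_AXIS = 2.0, 0.0    # hypothetical magnetic axis position
MINOR_RADIUS = 0.5           # hypothetical boundary radius
N_RAYS = 96                  # number of angular rays, as in the text

dR = (R_MAX - R_MIN) / (N - 1)
dZ = (Z_MAX - Z_MIN) / (N - 1)

# Toy flux map: psi grows with squared distance from the axis, so the
# flux surfaces are circles and the exact answer is known.
psi = [[(R_MIN + i * dR - R_AXIS) ** 2 + (Z_MIN + j * dZ - Z_AXIS) ** 2
        for j in range(N)] for i in range(N)]
PSI_BOUNDARY = MINOR_RADIUS ** 2


def interp_psi(r, z):
    """Bilinear (two-dimensional) interpolation of psi at (r, z)."""
    x = (r - R_MIN) / dR
    y = (z - Z_MIN) / dZ
    i = min(max(int(x), 0), N - 2)
    j = min(max(int(y), 0), N - 2)
    tx, ty = x - i, y - j
    return ((1 - tx) * (1 - ty) * psi[i][j] + tx * (1 - ty) * psi[i + 1][j]
            + (1 - tx) * ty * psi[i][j + 1] + tx * ty * psi[i + 1][j + 1])


def trace_ray(theta, step=0.5 * dR, max_steps=400):
    """March outward from the axis until psi crosses the boundary value."""
    prev_rho, prev_psi = 0.0, interp_psi(R_AXIS, Z_AXIS)
    for k in range(1, max_steps + 1):
        rho = k * step
        cur = interp_psi(R_AXIS + rho * math.cos(theta),
                         Z_AXIS + rho * math.sin(theta))
        if cur >= PSI_BOUNDARY:
            # Linear interpolation between the two bracketing samples.
            frac = (PSI_BOUNDARY - prev_psi) / (cur - prev_psi)
            rho_b = prev_rho + frac * (rho - prev_rho)
            return (R_AXIS + rho_b * math.cos(theta),
                    Z_AXIS + rho_b * math.sin(theta))
        prev_rho, prev_psi = rho, cur
    raise RuntimeError("boundary not found along ray")


# Each ray is independent of the others, which is what makes the search
# easy to spread over many GPU threads; here the rays run in a plain loop.
boundary = [trace_ray(2 * math.pi * n / N_RAYS) for n in range(N_RAYS)]

# Shoelace formula for the area enclosed by the traced boundary polygon.
shifted = boundary[1:] + boundary[:1]
area = 0.5 * abs(sum(r0 * z1 - r1 * z0
                     for (r0, z0), (r1, z1) in zip(boundary, shifted)))
print(f"traced {len(boundary)} boundary points, area = {area:.4f} m^2")
```

Because no ray depends on another, a GPU version could assign one thread (or one group of threads) to each ray. The polygon area at the end stands in for the cross-section integrals mentioned in the text, which can be evaluated once the boundary points are known.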

3. Benchmark tests and computational performance

With the development of GPU hardware, the Tesla P100 GPU was chosen for P-EFIT optimization and customization. Updated parallel algorithms and architecture allow P-EFIT to achieve good computational performance for the whole equilibrium iteration, including data transmission. Benchmarks were performed to demonstrate the validity and computational performance of the updated P-EFIT after integration into PCS. EAST PCS has a running mode called hardware test, in which all the hardware instruments are used and diagnostic data are read from a historical experimental shot. As shown in Fig. 5, control error signals on the same segments as in [11], calculated by P-EFIT, are compared. Shot 73113 is a discharge in the EAST 2017 campaign using the former P-EFIT version, and shot 998960 is the simulated discharge run by the PCS hardware test using the data from shot 73113. Both P-EFIT versions use the polynomial plasma current representation (NP = 2, NF = 1, δ = 1) [1,11] with magnetic diagnostics only. The benchmark shows that the control error signals from the former and updated P-EFIT versions are consistent with each other.

The time consumed by P-EFIT per whole equilibrium reconstruction iteration when integrated in PCS was also tested. As shown in Fig. 6, P-EFIT takes about 375 μs per iteration with a 129 × 129 spatial grid during the discharge. Compared with the former P-EFIT version with a 65 × 65 spatial grid on the Tesla K20 [11], the updated P-EFIT completes one iteration in a similar time. The time consumed by each part of the P-EFIT equilibrium reconstruction iteration is summarized in Table 1. While fulfilling the requirement for real-time control, finer spatial resolution could provide more accurate results when extending to profile reconstruction with more diagnostics [1,16]. Although P-EFIT uses only magnetic data in this benchmark, there is redundancy in the computational performance.

4. Conclusion

P-EFIT provides a routine real-time plasma equilibrium reconstruction method which has high spatial resolution, customized modules and internal current profile calculation for plasma control in EAST. Efficient parallel strategies and algorithms allow P-EFIT to achieve good computational performance regardless of the varying matrix sizes that arise from different current representations and diagnostics. P-EFIT includes the POINT diagnostic for real-time plasma current density and q profile reconstruction in EAST. P-EFIT has been integrated into PCS as a sub-algorithm through a newly designed architecture and interface. After integration into PCS, P-EFIT can complete one iteration, including equilibrium calculation, profile calculation, parameter calculation and data transmission, in 375 μs with a 129 × 129 spatial grid. Benchmarks were performed to demonstrate the validity and computational performance of the updated P-EFIT. With all these improvements, P-EFIT can provide precise, detailed real-time equilibrium reconstruction for sophisticated plasma control in EAST.

Acknowledgments

This work is supported by the National Magnetic Confinement Fusion Research Program of China (No. 2014GB103000, No. 2015GB102004) and the National Natural Science Foundation of China (No. 11575245, No. 11375237, No. 11405205).

References

[1] L.L. Lao, et al., MHD equilibrium reconstruction in the DIII-D tokamak, Fusion Sci. Technol. 48 (2) (2005) 968–977.
[2] J.R. Ferron, et al., Real time equilibrium reconstruction for tokamak discharge control, Nucl. Fusion 38 (7) (1998) 1055.
[3] J.G. Kwak, et al., Key features in the operation of KSTAR, IEEE Trans. Plasma Sci. 40 (2012) 697–704.
[4] D.A. Gates, et al., Plasma shape control on the National Spherical Torus Experiment (NSTX) using real-time equilibrium reconstruction, Nucl. Fusion 46 (2006) 17–23.
[5] L. Pangione, et al., New magnetic real time shape control for MAST, Fusion Eng. Des. 88 (2013) 1087–1090.
[6] Huazhong Wang, Jiarong Luo, Qinchao Huang, Real time equilibrium reconstruction algorithm in EAST tokamak, Plasma Sci. Technol. 6 (4) (2004) 2390.
[7] M. Rampp, et al., A parallel Grad-Shafranov solver for real-time control of tokamak plasmas, Fusion Sci. Technol. 62 (3) (2012) 409–418.
[8] J.M. Moret, et al., Tokamak equilibrium reconstruction code LIUQE and its real time implementation, Fusion Eng. Des. 91 (2015) 1–15.
[9] L. Giannone, et al., Improvements for real-time magnetic equilibrium reconstruction on ASDEX Upgrade, Fusion Eng. Des. 100 (2015) 519–524.
[10] X.N. Yue, B.J. Xiao, Z.P. Luo, Y. Guo, Fast equilibrium reconstruction for tokamak discharge control based on GPU, Plasma Phys. Control Fusion 55 (2013) 085016.
[11] Yao Huang, et al., Implementation of GPU parallel equilibrium reconstruction for plasma control in EAST, Fusion Eng. Des. 112 (2016) 1019–1024.
[12] Yao Huang, Bing-Jia Xiao, Zheng-Ping Luo, Fast parallel Grad-Shafranov solver for real-time equilibrium reconstruction in EAST tokamak using graphic processing unit, Chin. Phys. B 26 (8) (2017) 085204.
[13] Y. Huang, et al., Development of real-time plasma current profile reconstruction with POINT diagnostic for EAST plasma control, Fusion Eng. Des. 120 (2017) 1–8.
[14] R. Albanese, et al., A MIMO architecture for integrated control of plasma shape and flux expansion for the EAST tokamak, Control Appl. IEEE (2016) 611–616.
[15] Y. Guo, et al., Preliminary results of a new MIMO plasma shape controller for EAST, contributed paper to this conference, to be published in Fusion Engineering & Design.
[16] Q. Ren, et al., High spatial resolution equilibrium reconstruction, Plasma Phys. Controlled Fusion 53 (9) (2011) 095009.
[17] Q.P. Yuan, et al., Upgrade of EAST plasma control system for steady-state advanced operation, contributed paper to this conference, to be published in Fusion Engineering & Design.
[18] CUDA C Programming Guide v. 8.0, NVIDIA, 2016.
[19] Q.P. Yuan, et al., Plasma current, position and shape feedback control on EAST, Nucl. Fusion 53 (4) (2013) 043009.
[20] S.Y. Liang, et al., Real-time detection for magnetic island of neoclassical tearing mode in EAST plasma control system, Plasma Sci. Technol. 18 (2) (2016) 197–201.