GPU-OpenCL accelerated probabilistic power flow analysis using Monte-Carlo simulation

Electric Power Systems Research 147 (2017) 70–72
Morad Abdelaziz
Department of Electrical and Computer Engineering, Université Laval, Québec, Canada

Article history: Received 18 October 2016; Received in revised form 20 February 2017; Accepted 21 February 2017

Abstract

This paper investigates the acceleration of Monte-Carlo (MC) based probabilistic power flow (PPF) analysis by exploiting the massively parallel architecture of graphics processing units (GPUs).

Keywords: GPU; Monte-Carlo simulation; Probabilistic power flow; Renewable energy

1. Introduction

Probabilistic power flow (PPF) analysis is an effective tool to account for the uncertain nature of loads and renewable resources in power system studies. PPF analysis calculates the probability distributions of the power flow variables from the probability distributions of the stochastic generation and demand. There is overwhelming agreement in the literature that Monte-Carlo (MC) based PPF techniques are the most capable and accurate among the different PPF techniques [1]. However, MC based PPF techniques require numerous deterministic power flow calculations to converge. To speed up the PPF analysis, several analytical and approximate techniques have been proposed in the literature to alleviate the computational burden of the MC simulations [2], at the expense of accuracy. In this paper, an alternative path to speeding up the PPF analysis is investigated. A parallel implementation of the MC based PPF analysis on graphics processing units (GPUs) is presented, offloading the heavy computational burden of the MC method to the GPU. The proposed implementation exploits the massively parallel architecture of GPUs, resulting in a much shorter execution time while retaining the accuracy of the MC method.

2. Literature review

In the past few years, the field of general-purpose GPU computing has seen remarkable advances [3], and several GPU-based power system algorithms have been proposed in the literature. In Refs. [4,5], GPU-based programs were developed to enable the rapid electromagnetic transient simulation of large-scale power systems. In Ref. [6], the use of GPUs to perform fast small-signal stability analysis of real-world transmission systems was proposed. In Ref. [7], the application of GPU computing to the distribution network reconfiguration problem was investigated. Despite the interest shown over the past decade in harnessing the GPU's computing capabilities in power system applications, to the author's best knowledge the use of GPUs to speed up MC based PPF analysis has not been previously investigated in the literature.

Other approaches have, however, been investigated in the literature to speed up the PPF analysis, prominent among them linearization methods and point estimate methods. For instance, in Ref. [8] a linearized power flow model was adopted to speed up an MC based probabilistic technique. The problem with linearized models is that, as the system perturbations around the linearization point grow, the solution becomes prone to increasingly large and unacceptable linearization errors. Point estimate methods are based on computing the non-linear deterministic power flow for a set of predefined sample points of the input variables [9]. While point estimate methods can outperform linearization methods in accuracy, they are generally less accurate than MC based methods, as evidenced by the fact that point estimate methods are typically benchmarked against MC based methods [10].

E-mail address: [email protected]
http://dx.doi.org/10.1016/j.epsr.2017.02.022
© 2017 Elsevier B.V. All rights reserved.


Fig. 1. OpenCL platform and device models.

3. GPU computing and OpenCL abstraction

The Open Computing Language (OpenCL) is a hardware-independent open standard for heterogeneous computing that can be used to program and run code on any compliant processing device, irrespective of the device type or vendor [11]. Given this cross-platform nature, OpenCL is the abstraction adopted in this work. OpenCL abstracts the different processing devices, independent of their vendor or type, as compute devices. Fig. 1 shows a schematic of the OpenCL platform and device models. Each OpenCL compute device has one or more compute units, each composed of one or more processing elements. In a multicore CPU, the number of cores represents the number of compute units, each typically having a single processing element. In a contemporary GPU, by contrast, there are hundreds of simple processing cores (running at low to moderate frequencies), organized into single-instruction multiple-data (SIMD) clusters of processing elements. Accordingly, while an OpenCL application can run on different types of processing devices, the prominent advantage of OpenCL is how well it is suited to programming GPUs, which contain a large number of compute units and processing elements.

The OpenCL platform model divides a system into a single CPU-based host connected to, and managing, one or more compute devices (e.g., GPUs). An OpenCL application is the combination of a host program, executed by the host, and OpenCL kernels, executed by one or more OpenCL devices. The host program builds the kernels and downloads them to the devices for execution. Each OpenCL kernel can be executed multiple times concurrently over a predefined N-dimensional index space called the NDRange; an instance of the kernel is executed by a processing element for each point in the NDRange. It is important to highlight that the kernel code executed by each processing element is purely sequential. Each independent kernel execution is called a work-item. Although every work-item executes the same kernel code, the path it follows through the code and the data it processes may vary from one work-item to the next. Work-items are grouped together into work-groups, and the work-items in each work-group are executed concurrently by the processing elements of a single compute unit. Each work-item is uniquely identified by two N-dimensional tuples giving its global position and its local position, known as its global ID and local ID, respectively. Work-items rely on their IDs to determine which data they should access and process.
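To make the kernel/work-item terminology concrete, the following minimal OpenCL C kernel is an illustrative sketch only (it is not code from the paper, and the kernel name scale_samples and its arguments are hypothetical). Each work-item executes the same kernel body but uses its global ID, obtained with get_global_id(0), to select the data it operates on.

```c
/* Minimal illustrative OpenCL C kernel (hypothetical, not from the paper).
 * When enqueued over a 1-D NDRange of global size n, one work-item runs
 * this body for each index; get_global_id(0) returns that index. */
__kernel void scale_samples(__global const float *in,
                            __global float *out,
                            const float factor,
                            const uint n)
{
    size_t gid = get_global_id(0);    /* this work-item's global ID         */
    if (gid < n)                      /* guard: NDRange may exceed n        */
        out[gid] = factor * in[gid];  /* same code, different data per item */
}
```

On the host side, such a kernel would be built from source with clBuildProgram() and launched with clEnqueueNDRangeKernel() over the desired NDRange; the OpenCL runtime then maps the resulting work-groups onto the compute units of the selected device.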

4. GPU implementation of MC based PPF

The parallelization strategy adopted in this work is to perform the deterministic power flow computations required by the MC method concurrently on independent processing elements. First, the initial setup of the data, which includes reading the system data, calculating the bus admittance matrix and generating random samples from the probability distributions of the stochastic loads and generators in the system, is performed on the CPU, since it takes a negligible fraction of the total run-time. Second, a 1-dimensional discrete computational domain (i.e., NDRange) of global size Nsamples is defined for the execution of the load flow operations. A work-item is defined for each independent load flow operation, executed at one point of this NDRange and representing a specific MC sample. The sizes of the work-groups collecting the different work-items together depend on the specifics of the kernel implementation and on the GPU device in use, and hence cannot be generalized. However, using the OpenCL function clGetKernelWorkGroupInfo(), the maximum as well as the preferred work-group sizes can be identified before the kernels are executed on the GPU. OpenCL devices cannot access the host memory directly. Accordingly, in the third step, the data required to run the entire simulation is transferred to the GPU's global memory. Fourth, each processing element uses its global ID to access the data pertaining to its MC sample, copies the data to its private memory and then uses it to perform the load flow computation. The work-items use the Newton–Raphson method to solve the power flow equations. However, to avoid the computationally expensive matrix inversion usually adopted in the Newton–Raphson method, Gaussian elimination followed by backward substitution is used to solve the matrix equation $J(x^{(t-1)})\,\Delta x^{(t)} = -F(x^{(t-1)})$ relating the mismatch vector function $F$, the Jacobian matrix $J(x)$ and the Newton update $\Delta x^{(t)}$ of the vector of unknowns $x$ at iteration $t$. Fifth, the results of the different load flow calculations are transferred back to the host CPU, where the expected values and standard deviations of the power flow variables are calculated.
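As a concrete sketch of the per-work-item linear solve, the routine below shows one possible implementation (an assumption, not the paper's actual kernel code) of Gaussian elimination followed by backward substitution for $J(x^{(t-1)})\,\Delta x^{(t)} = -F(x^{(t-1)})$, operating entirely on the work-item's private copy of the Jacobian and mismatch vector. The constant N (the number of unknowns) is assumed to be defined at kernel build time (e.g., via a -D compiler option), and partial pivoting is added here for numerical robustness; the paper only states that Gaussian elimination and backward substitution are used.

```c
/* Sketch of the per-work-item dense linear solve (assumed implementation,
 * not the paper's code).  On entry, J holds the Jacobian evaluated at
 * x^(t-1) and rhs holds -F(x^(t-1)); on exit, rhs holds the Newton step
 * dx^(t).  N is the number of unknowns, defined at build time (-D N=...). */
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

void solve_newton_step(double J[N][N], double rhs[N])
{
    /* forward elimination */
    for (int k = 0; k < N - 1; ++k) {
        int p = k;
        for (int i = k + 1; i < N; ++i)           /* choose the largest pivot  */
            if (fabs(J[i][k]) > fabs(J[p][k]))
                p = i;
        if (p != k) {                             /* swap rows k and p         */
            for (int j = k; j < N; ++j) {
                double tmp = J[k][j]; J[k][j] = J[p][j]; J[p][j] = tmp;
            }
            double tmp = rhs[k]; rhs[k] = rhs[p]; rhs[p] = tmp;
        }
        for (int i = k + 1; i < N; ++i) {         /* eliminate below the pivot */
            double m = J[i][k] / J[k][k];
            for (int j = k; j < N; ++j)
                J[i][j] -= m * J[k][j];
            rhs[i] -= m * rhs[k];
        }
    }
    /* backward substitution */
    for (int i = N - 1; i >= 0; --i) {
        double s = rhs[i];
        for (int j = i + 1; j < N; ++j)
            s -= J[i][j] * rhs[j];
        rhs[i] = s / J[i][i];                     /* rhs now holds dx^(t)      */
    }
}
```

Each work-item would call such a routine once per Newton–Raphson iteration, updating $x^{(t)} = x^{(t-1)} + \Delta x^{(t)}$ until the mismatch norm falls below a chosen tolerance.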

5. Case study

In this section, the proposed parallel GPU implementation of the MC based PPF is tested on the 33-bus distribution test system shown in Fig. 2. The uncertainty in the bus loads can be modelled using a normal distribution [12,13].

Fig. 2. 33-bus test system. Feeder and load data can be found in Ref. [16].


Table 1
Expected values (p.u.) and standard deviations (×10−3 p.u.) of the bus voltage magnitudes.

Bus no. | E[V] (CPU) | σ (CPU) | E[V] (GPU) | σ (GPU)
--------|------------|---------|------------|--------
1       | 1          | 0       | 1          | 0
2       | 0.9973     | 0.12    | 0.9973     | 0.12
3       | 0.9847     | 0.78    | 0.9846     | 0.77
4       | 0.9780     | 1.10    | 0.9780     | 1.10
5       | 0.9714     | 1.40    | 0.9714     | 1.40
6       | 0.9548     | 2.20    | 0.9548     | 2.20
7       | 0.9515     | 2.30    | 0.9515     | 2.30
8       | 0.9474     | 2.60    | 0.9474     | 2.60
9       | 0.9422     | 3.20    | 0.9422     | 3.10
10      | 0.9375     | 3.70    | 0.9374     | 3.70
11      | 0.9368     | 3.80    | 0.9368     | 3.80
12      | 0.9357     | 4.00    | 0.9357     | 4.00
13      | 0.9311     | 4.90    | 0.9310     | 4.80
14      | 0.9290     | 5.10    | 0.9290     | 5.10
15      | 0.9279     | 5.40    | 0.9278     | 5.40
16      | 0.9268     | 5.80    | 0.9268     | 5.70
17      | 0.9253     | 6.40    | 0.9252     | 6.30
18      | 0.9247     | 6.40    | 0.9247     | 6.30
19      | 0.9968     | 0.12    | 0.9968     | 0.12
20      | 0.9932     | 0.15    | 0.9932     | 0.15
21      | 0.9925     | 0.16    | 0.9925     | 0.16
22      | 0.9919     | 0.17    | 0.9919     | 0.17
23      | 0.9814     | 1.10    | 0.9814     | 1.10
24      | 0.9755     | 1.80    | 0.9754     | 1.80
25      | 0.9722     | 1.80    | 0.9721     | 1.80
26      | 0.9531     | 2.30    | 0.9531     | 2.30
27      | 0.9508     | 2.50    | 0.9508     | 2.40
28      | 0.9406     | 3.20    | 0.9405     | 3.10
29      | 0.9328     | 3.60    | 0.9327     | 3.60
30      | 0.9295     | 3.90    | 0.9294     | 3.80
31      | 0.9258     | 4.50    | 0.9257     | 4.40
32      | 0.9250     | 4.60    | 0.9249     | 4.50
33      | 0.9247     | 4.60    | 0.9246     | 4.50

In this case study, the bus loads in the system are assumed to follow normal distributions, with the mean values being the nominal bus loads and the standard deviations equal to 5% of the mean values [13]. Beta and Weibull distributions are the most popular distributions for representing solar insolation and wind speed, respectively [14,15]. Here, the output powers of the two photovoltaic generators located at buses 13 and 28 are assumed to follow two different Beta distributions, with the shape parameters α and β shown in Fig. 2. Similarly, the wind speeds for the three wind turbines located at buses 17, 24 and 32 are assumed to follow three different Weibull distributions, with the shape and scale parameters, k and c, respectively, given in Fig. 2.

In order to determine the effectiveness and accuracy of the proposed parallel GPU implementation, the MC based PPF was solved twice (with Nsamples = 16,384 in each test): first sequentially on the CPU and then in parallel on the GPU. The CPU version was implemented in C and compiled with full optimization for best performance. The GPU version was implemented in C/OpenCL. Both implementations adopted the same approach to solving the deterministic power flow equations, as described in Section 4. The two tests were run on a PC equipped with a 2.7 GHz Intel Core i5 processor, 8 GB of RAM and an AMD Radeon R9 M290X GPU.
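As an illustration of the initial sampling step performed on the CPU, the helper functions below sketch one common way (an assumption, not necessarily the authors' implementation) to draw normally distributed load samples and Weibull-distributed wind-speed samples; Beta-distributed photovoltaic output would typically be drawn with a library routine or from a pair of gamma variates and is omitted here. The helper rand01() is hypothetical.

```c
/* Host-side sketch (assumed, not from the paper) of drawing the random MC
 * inputs: Normal(mu, sigma) bus loads via the Box-Muller transform and
 * Weibull(k, c) wind speeds via inverse-transform sampling. */
#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Hypothetical helper: uniform variate strictly inside (0, 1). */
static double rand01(void)
{
    return (rand() + 1.0) / ((double)RAND_MAX + 2.0);
}

/* Normal sample with mean mu and standard deviation sigma (Box-Muller). */
static double sample_normal(double mu, double sigma)
{
    double u1 = rand01(), u2 = rand01();
    return mu + sigma * sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

/* Weibull sample with shape k and scale c (inverse-transform method). */
static double sample_weibull(double k, double c)
{
    return c * pow(-log(1.0 - rand01()), 1.0 / k);
}
```

With the 5% assumption used in this case study, a load with nominal value P would be drawn as sample_normal(P, 0.05 * P) for each MC sample; rand() is used here only to keep the sketch short, and a higher-quality generator would normally be preferred for MC studies.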

The expected values and standard deviations of the bus voltages were calculated in both tests and are given in Table 1. The results obtained using the GPU are almost identical to those obtained using the CPU. This indicates that the PPF results can be accurately estimated using the proposed parallel GPU implementation. The average run times of the CPU and GPU implementations were 40.43 s and 1.99 s, respectively. These results show that the speed-up attained by the parallel GPU implementation reached 20.3×.

6. Conclusion

This paper presented a parallel GPU implementation of the MC based PPF problem. A 33-bus distribution test system was used to test the correctness, effectiveness and adaptability of the proposed implementation. The results showed that exploiting the massively parallel architecture of GPUs in solving MC based PPF problems can provide a significant speed-up compared to a sequential implementation on CPUs.

References

[1] F. Miao, V. Vittal, G.T. Heydt, R. Ayyanar, Probabilistic power flow studies for transmission systems with photovoltaic generation using cumulants, IEEE Trans. Power Syst. 27 (4) (2012) 2251–2261.
[2] J.M. Morales, J. Pérez-Ruiz, Point estimate schemes to solve the probabilistic power flow, IEEE Trans. Power Syst. 22 (4) (2007) 1594–1601.
[3] R.C. Green, L. Wang, M. Alam, Applications and trends of high performance computing for electric power systems: focusing on smart grid, IEEE Trans. Smart Grid 4 (2) (2013) 922–931.
[4] Z. Zhou, V. Dinavahi, Parallel massive-thread electromagnetic transient simulation on GPU, IEEE Trans. Power Deliv. 29 (3) (2014) 1045–1053.
[5] J.K. Debnath, A.M. Gole, W.-K. Fung, Graphics processing unit based acceleration of electromagnetic transients simulation, IEEE Trans. Power Deliv. 31 (5) (2016) 2036–2044.
[6] F. Milano, Small-signal stability analysis of large power systems with inclusion of multiple delays, IEEE Trans. Power Syst. 31 (4) (2016) 3257–3266.
[7] V. Roberge, M. Tarbouchi, F. Okou, Distribution system optimization on graphics processing unit, IEEE Trans. Smart Grid (2017), http://dx.doi.org/10.1109/TSG.2015.2502066, in press.
[8] A. Bagchi, L. Goel, P. Wang, Generation adequacy evaluation incorporating an aggregated probabilistic model of active distribution network components and features, IEEE Trans. Smart Grid (2017), http://dx.doi.org/10.1109/TSG.2016.2616542, in press.
[9] X.M. Ai, J.Y. Wen, T. Wu, W. Lee, A discrete point estimate method for probabilistic load flow based on the measured data of wind power, IEEE Trans. Ind. Appl. 49 (5) (2013) 2244–2252.
[10] C.-L. Su, Probabilistic load-flow computation using point estimate method, IEEE Trans. Power Syst. 20 (4) (2005) 1843–1851.
[11] C.-L. Su, P.-Y. Chen, C.-C. Lan, L.-S. Huang, K.-H. Wu, Overview and comparison of OpenCL and CUDA technology for GPGPU, in: Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Kaohsiung, Taiwan, 2012, pp. 448–451.
[12] W. Li, R. Billinton, Effect of bus load uncertainty and correlation in composite system adequacy evaluation, IEEE Trans. Power Syst. 6 (4) (1991) 1522–1529.
[13] Z. Ren, W. Li, R. Billinton, W. Yan, Probabilistic power flow analysis based on the stochastic response surface method, IEEE Trans. Power Syst. 31 (3) (2016) 2307–2315.
[14] Z. Qin, W. Li, X. Xiong, Incorporating multiple correlations among wind speeds, photovoltaic powers and bus loads in composite system reliability evaluation, Appl. Energy 110 (2013) 285–294.
[15] Y. Li, W. Li, W. Yan, J. Yu, X. Zhao, Probabilistic optimal power flow considering correlations of wind speeds following different distributions, IEEE Trans. Power Syst. 29 (4) (2014) 1847–1854.
[16] D. Singh, R. Misra, D. Singh, Effect of load models in distributed generation planning, IEEE Trans. Power Syst. 22 (4) (2007) 2204–2212.