Neural network-based nonlinear prediction of magnetic storms

Journal of Atmospheric and Solar-Terrestrial Physics 64 (2002) 651–656
D. Jankovičová*, P. Dolinský, F. Valach, Z. Vörös

Geomagnetic Observatory Hurbanovo, Geophysical Institute SAS, Komárňanská 108, 947 01 Hurbanovo, Slovak Republic

Abstract

The best known manifestations of the solar wind impact on the Earth's magnetosphere are the geomagnetic storms. At middle latitudes, the magnetic storm is best described in terms of the horizontal component of the geomagnetic field. We use the method of neural networks, which is suitable for modeling nonlinear dynamical systems and for the nowcasting as well as forecasting of magnetic storms. For constructing a neural network model, a multivariate method for determining the inputs is applied first. This method enables us to reduce the original 17 input variables to two variables, the so-called principal components. The performance of the model is characterized by the correlation coefficient ρ. Its mean value is 0.93 considering two principal components and a time history of 6 h. Our interest is also focused on forecasting the geomagnetic index Dst more than 1 h ahead. Another question investigated here is how the inclusion of the history of the Dst index in the input matrix influences the predicting ability of the network. © 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Magnetospheric physics; Magnetic storms; Solar wind–magnetosphere interactions; Nonlinear methods

1. Introduction

One of the most important goals in understanding space weather variability is the prediction of geomagnetic storms, which are usually measured by the geomagnetic index Dst. Modern approaches suggest that the complex dynamics of the magnetosphere can be modeled using nonlinear methods. The method of neural networks is well suited for such modeling. It is known that the solar corona is not uniform and expands outward. The bulk of the solar wind plasma flows out from the Sun's active regions and, under suitable geometric and energetic conditions, can reach the near-Earth environment. Magnetospheric and ionospheric currents increase and intensify as a consequence of the enhanced level of solar wind–magnetosphere coupling (Akasofu and Chapman, 1972; Jacobs, 1989). At middle and low latitudes the ring current dominates, and at higher latitudes a system of ionospheric electrojet currents and field-aligned currents. This is manifested especially in the horizontal component (H-component) of the geomagnetic field in a given region, which is often used for constructing the geomagnetic indices (Dst, AE) (Davis and Sugiura, 1966).

* Corresponding author. E-mail address: [email protected] (D. Jankovičová).

The global level of the magnetic storm is monitored by the geomagnetic index Dst, which is constructed from the H-component of four near-equatorial observatories around the world in the geomagnetic latitude range (±9.5°, ±33.3°). The solar wind-driven magnetosphere–ionosphere system is a dynamical system which can be described in terms of a relevant number of parameters (Weigel et al., 1999). In the past years a number of satellites measuring the solar wind parameters have been launched, e.g. the ACE satellite, which has been operational since August 1997 and is situated at the L1 Earth–Sun Lagrange point (240 R_E, where R_E is the Earth radius). According to the largest decrease of the Dst index, storms are categorized as storms of low, medium, high and extreme disturbance:

low:      Dst > −20 nT,
medium:   −20 nT > Dst > −50 nT,
high:     −50 nT > Dst > −100 nT,
extreme:  −100 nT > Dst.
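These thresholds translate directly into a simple classification rule. A minimal Python sketch follows (ours, not from the paper); the function name classify_storm and the use of the minimum hourly Dst reached during a storm are illustrative assumptions.

```python
def classify_storm(dst_min_nT: float) -> str:
    """Classify a storm by the minimum (most negative) hourly Dst reached,
    using the disturbance thresholds quoted above (illustrative helper)."""
    if dst_min_nT > -20:
        return "low"
    elif dst_min_nT > -50:
        return "medium"
    elif dst_min_nT > -100:
        return "high"
    return "extreme"

# Example: a storm bottoming out at Dst = -173 nT is an extreme event.
print(classify_storm(-173.0))  # -> extreme
```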

Many papers dealing with predictions of geomagnetic activity demonstrate that magnetic storms are mainly controlled by the solar wind (Gleisner et al., 1996; Gleisner and Lundstedt, 1999; Lundstedt and Wintoft, 1994). Some of the solar wind parameters, especially n, v and Bz



(n—concentration of the solar wind particles, v—solar wind speed, Bz—the north–south component of the IMF) control the main part of the geomagnetic activity.

In this paper, a neural network model is constructed for modeling and, consequently, for the prediction of magnetic storms on the basis of known parameters of the solar wind measured by ACE. We use multivariate analysis to determine the inputs of the neural network model. In our analysis, the time series of the solar wind parameters and the Dst index with time resolution Δt = 1 h are used from 1998 to 1999.

2. Data analysis methods

2.1. Principal component analysis

The main idea of principal component analysis is to describe the dispersion of an array of P points in m-dimensional space (Gnanadesikan, 1977; Reyment and Jöreskog, 1996). For this purpose, we introduce a new set of orthogonal linear coordinates, the so-called principal components. Let x = (x_1, x_2, ..., x_m) denote the m coordinates of an observation, so that the rows of the P × m matrix X constitute the P m-dimensional observations. The mean vector is defined by

    x_{\mathrm{mean}} = \frac{1}{P} X^{\mathsf{T}} \mathbf{1},    (1)

and the covariance matrix by

    S = ((s_{ij})) = \frac{(X - X_{\mathrm{mean}})^{\mathsf{T}} (X - X_{\mathrm{mean}})}{P - 1},    (2)

where 1 is a column vector whose elements are all equal to 1, and X_mean is a P × m matrix each row of which is equal to x_mean. The geometric interpretation of principal component analysis is the following. The principal components introduce the axes of the concentration ellipsoids centered at the centre of gravity of the samples. These ellipsoids can be described by the quadratic form

    c = (X - X_{\mathrm{mean}})\, S^{-1} (X - X_{\mathrm{mean}})^{\mathsf{T}},    (3)

where c > 0. Algebraically, principal component analysis involves finding the eigenvalues and eigenvectors of the sample covariance matrix. For example, to obtain the first principal component z_1, we seek the vector of coefficients a = (a_1, a_2, ..., a_m) such that the linear combination a^{\mathsf{T}} x has the maximum sample variance of all linear combinations. The solution of this problem is to find the largest eigenvalue \lambda_1 of S; the required a is the eigenvector l_1 of S corresponding to \lambda_1 (z_1 = l_1^{\mathsf{T}} x). For determining the second and subsequent principal components we need the m eigenvalues \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m \geq 0 and the corresponding eigenvectors of S. The required principal components are then given by l_1^{\mathsf{T}} x, l_2^{\mathsf{T}} x, ..., l_m^{\mathsf{T}} x. We can obtain them by the so-called spectral decomposition of the matrix S: there exists an orthogonal matrix L such that

    S = L \Lambda L^{\mathsf{T}},    (4)

where \Lambda is a diagonal matrix with diagonal elements \lambda_1, \lambda_2, ..., \lambda_m (the eigenvalues of S), and the columns of L are the eigenvectors of S. The principal component transformation of the data, which is defined to include a shift of origin to the sample mean, is then specified by

    Z = L^{\mathsf{T}} (X - X_{\mathrm{mean}}).    (5)
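As a concrete illustration of Eqs. (1)–(5), the following Python sketch (ours; NumPy assumed, the function name principal_components is illustrative) computes the principal components by an eigendecomposition of the sample covariance matrix; np.linalg.eigh is used because S is symmetric.

```python
import numpy as np

def principal_components(X):
    """Principal-component transform in the spirit of Eqs. (1)-(5).

    X : (P, m) array, P observations of m variables.
    Returns (Z, eigvals, L): the transformed data, the eigenvalues sorted in
    descending order, and the matrix whose columns are the eigenvectors of S.
    """
    P, m = X.shape
    x_mean = X.mean(axis=0)              # Eq. (1): sample mean vector
    Xc = X - x_mean                      # shift origin to the sample mean
    S = Xc.T @ Xc / (P - 1)              # Eq. (2): sample covariance matrix
    eigvals, L = np.linalg.eigh(S)       # spectral decomposition, Eq. (4)
    order = np.argsort(eigvals)[::-1]    # sort lambda_1 >= ... >= lambda_m
    eigvals, L = eigvals[order], L[:, order]
    Z = Xc @ L                           # Eq. (5), with observations as rows
    return Z, eigvals, L

# Proportion of total variance carried by each PC (cf. Table 1):
# prop = eigvals / eigvals.sum()
```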

2.2. Neural network as a nonlinear model

Neural networks represent an alternative for nonlinear modeling. A neural network model is based on the ability to learn input–output relations and recognize patterns from a database (Hertz et al., 1991). We used a feed-forward multilayer perceptron network, which is represented by the mapping

    g: \mathbb{R}^N \to \mathbb{R}^n.    (6)

It consists of one input layer with N inputs, one hidden layer with q units and one output layer with n outputs. The output of this model can be expressed as (Nørgaard, 1997)

    y_n = F_n\left( \sum_{j=1}^{q} W^{(a)}_{nj}\, f_j\left( \sum_{l=1}^{N} W^{(b)}_{jl}\, x_l + W^{(b)}_{j0} \right) + W^{(a)}_{n0} \right),    (7)

where W^{(b)}_{jl} are the weights between input l and hidden unit j, and W^{(a)}_{nj} are the weights between hidden unit j and output unit n. The activation functions F_n(x) and f_j(x) are linear or nonlinear. We used one hidden layer with f_j(x) a hyperbolic tangent (nonlinear) activation function and F_1(x) a linear function in the output layer. For a given set of M inputs, we define the normalized mean square error (NMSE) by

    \mathrm{NMSE} = \frac{\sum_{s=1}^{M} \left( y^{\mathrm{out}}_{s} - y^{\mathrm{pred}}_{s} \right)^2}{M \sigma^2},    (8)

where y^out denotes the actual given output, y^pred the neural network output and \sigma^2 the variance of the actual output. The network is trained to minimize the NMSE by choosing the set of weights according to a modified form of gradient descent called error back-propagation (Rumelhart et al., 1986):

    W^{(a,b),\mathrm{new}}_{nj} = W^{(a,b),\mathrm{old}}_{nj} + \Delta W_{nj},    (9)

where

    \Delta W_{nj} = -\eta\, \frac{\partial\,\mathrm{NMSE}}{\partial W_{nj}} + \alpha\, \Delta W^{\mathrm{old}}_{nj},    (10)

with \eta \ll 1 the learning rate and \alpha an inertial (momentum) term in the range (0, 1) applied to the previous weight increment. The performance of the model can be characterized by the correlation coefficient

    \rho = \frac{\sum_{j=1}^{M} \left( y^{\mathrm{out}}_{j} - y^{\mathrm{out}}_{\mathrm{mean}} \right)\left( y^{\mathrm{pred}}_{j} - y^{\mathrm{pred}}_{\mathrm{mean}} \right)}{\sigma_{y^{\mathrm{out}}}\, \sigma_{y^{\mathrm{pred}}}},    (11)

where y^out_mean is the mean value of the actual output and y^pred_mean the mean value of the predicted output.
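To make Eqs. (7)–(11) concrete, here is a compact NumPy sketch of a one-hidden-layer perceptron with tanh hidden units and a single linear output, trained by gradient descent with a momentum term. This is our illustrative reconstruction, not the authors' code (they used the toolbox of Nørgaard, 1997); the layer size q, the weight initialization, the epoch count and all helper names are assumptions.

```python
import numpy as np

def train_mlp(X, y, q=8, eta=1e-5, alpha=0.9, epochs=2000, seed=0):
    """One-hidden-layer perceptron in the spirit of Eq. (7): tanh hidden units
    f_j, a single linear output F_1, trained by gradient descent with momentum,
    Eqs. (9)-(10).  Hyperparameters here are illustrative, not the authors'."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    Wb = rng.normal(0.0, 0.1, (q, N + 1))   # input->hidden weights W^(b), bias included
    Wa = rng.normal(0.0, 0.1, (1, q + 1))   # hidden->output weights W^(a), bias included
    dWb, dWa = np.zeros_like(Wb), np.zeros_like(Wa)
    Xb = np.hstack([X, np.ones((M, 1))])    # constant input column for the bias terms

    for _ in range(epochs):
        h = np.tanh(Xb @ Wb.T)              # hidden activations f_j, shape (M, q)
        hb = np.hstack([h, np.ones((M, 1))])
        y_pred = (hb @ Wa.T).ravel()        # linear output, Eq. (7)
        err = y_pred - y

        delta_o = 2.0 * err / M                                  # d(MSE)/d(y_pred)
        gWa = (delta_o @ hb)[None, :]                            # gradient w.r.t. W^(a)
        delta_h = np.outer(delta_o, Wa[0, :q]) * (1.0 - h ** 2)  # backprop through tanh
        gWb = delta_h.T @ Xb                                     # gradient w.r.t. W^(b)

        dWa = -eta * gWa + alpha * dWa      # Eq. (10): gradient step plus momentum
        dWb = -eta * gWb + alpha * dWb
        Wa, Wb = Wa + dWa, Wb + dWb         # Eq. (9): weight update
    return Wa, Wb

def predict(Wa, Wb, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    hb = np.hstack([np.tanh(Xb @ Wb.T), np.ones((X.shape[0], 1))])
    return (hb @ Wa.T).ravel()

def nmse(y_out, y_pred):
    """Normalized mean square error in the spirit of Eq. (8)."""
    return np.mean((y_out - y_pred) ** 2) / np.var(y_out)

def corr(y_out, y_pred):
    """Correlation coefficient rho of Eq. (11)."""
    return np.corrcoef(y_out, y_pred)[0, 1]
```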


Table 1
The proportion of total data variance fit by each component

PC    PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8    PC9    PC10   PC11   PC12   PC13   PC14   PC15   PC16   PC17
prop  0.331  0.166  0.085  0.074  0.058  0.054  0.052  0.042  0.037  0.028  0.024  0.022  0.010  0.007  0.002  0.000  0.000

Table 2
The investigated extreme events in the years 1998–1999 (p—the number of the time period; Tstart—the starting time; Tend—the ending time)

p    Tstart                Tend
1    10.2.1998 (6:00)      27.2.1998 (8:00)
2    3.3.1998 (12:00)      20.3.1998 (4:00)
3    26.4.1998 (16:00)     13.5.1998 (8:00)
4    19.6.1998 (20:00)     6.7.1998 (12:00)
5    27.7.1998 (8:00)      13.8.1998 (24:00)
6    21.8.1998 (8:00)      7.9.1998 (24:00)
7    15.9.1998 (8:00)      2.10.1998 (24:00)
8    6.1.1999 (6:00)       22.1.1999 (22:00)
9    10.2.1999 (16:00)     27.2.1999 (8:00)
10   11.4.1999 (0:00)      26.4.1999 (16:00)
11   15.9.1999 (8:00)      2.10.1999 (24:00)
12   14.10.1999 (12:00)    31.10.1999 (4:00)

3. Results

In this paper, the time series of the solar wind parameters Bx, By, Bz, |B|, n, v, nv, n|B|, v|B|, vn|B|, dBx/dt, dBy/dt, dBz/dt, dB/dt, dv/dt, dn/dt, the Akasofu parameter ε and the Dst index with time resolution Δt = 1 h for the years 1998–1999 were studied. The solar wind data were obtained from the ACE database at the NOAA Space Environment Center and the hourly Dst values were provided by the World Data Center, Kyoto. The time series of each parameter were normalized by x_norm = (x − x_mean)/σ_x, where x denotes the given variable, x_mean is the mean and σ_x is the standard deviation of the series of x. Since we do not know exactly the number and the properties of the control parameters, in the first part of our analysis we determined the linear combinations of the 17 parameters which could control the maximum of the variance of the geomagnetic activity. Using principal component analysis we determined 17 principal components (PCs). It is assumed that the contributions of the PCs which explain little of the total data variance are negligible. Their contributions to the total data variance are shown in Table 1. In the second part of the analysis, we used these PCs to model the geomagnetic index Dst by a neural network. We selected only the intervals of the extreme storms from the interval 1998–1999. There were 12 extreme events (Dst < −100 nT) considered (Table 2). We divided these sets as follows: one set for training (p = 1), sets for validation (p = 2–4), and independent sets for prediction (p = 5–12). First, a model without the Dst time history T was examined, considering only the time history of the principal components. We considered cases in which the number of PCs was varied. Fig. 1a (Fig. 1b) shows the correlation coefficients ρ (NMSE) for these cases. We can see that when we take more than two PCs into account, the results of the predictions do not change considerably.
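One plausible way to assemble the network inputs from the PC series, the time history T and (optionally) the Dst history is sketched below; the function build_inputs and its argument layout are our assumptions, since the exact input ordering is not spelled out in the paper.

```python
import numpy as np

def build_inputs(pc, dst, T=6, lead=1, use_dst_history=True):
    """Assemble input/target pairs for the network.

    pc   : (n_samples, 2) array of the first two principal components (hourly)
    dst  : (n_samples,) hourly Dst series
    T    : time history in hours fed to the network
    lead : prediction horizon in hours (1, 6, 12 or 18 in the paper)

    Returns X of shape (n_examples, N) and y of shape (n_examples,).
    This is one plausible arrangement, not necessarily the authors' layout.
    """
    X, y = [], []
    for t in range(T, len(dst) - lead):
        row = pc[t - T:t + 1].ravel()            # PC1, PC2 over the last T+1 hours
        if use_dst_history:
            row = np.concatenate([row, dst[t - T:t + 1]])   # past Dst values
        X.append(row)
        y.append(dst[t + lead])                  # Dst `lead` hours ahead
    return np.asarray(X), np.asarray(y)
```

With use_dst_history=False this corresponds to the purely solar-wind-driven model discussed above; setting it to True lets the persistence information enter the input.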

The information about the variance contained in the first PC is ∼33.1%, and in the second PC ∼16.6%. In the following we neglect the contributions of the succeeding PCs and take into account only the first two PCs with the largest variance. The correlation coefficient increases with time history and has its maximum at T = 6 h (Fig. 1a). Using the NMSE as a characteristic parameter we obtain the same qualitative results (Fig. 1b). The largest mean value of ρ between y^out and y^pred is 0.64 (Fig. 1c). The parameters of the network were taken as η = 0.00001, α = 0.9, T = 6 h.

When the Dst history was also considered as input, the quality of the predictions increased. In the case of one hour ahead prediction, the value of ρ did not change with the time history of the inputs or with the number of PCs. The mean value of ρ was 0.93. In comparison with Fig. 1c, the network obtains more information from Dst than from the solar wind parameters. This seems reasonable, because the Dst index does not exhibit such rapid changes at the time resolution considered. When the Dst index was predicted several hours ahead (6, 12 and 18 h) (Fig. 2), the values of ρ decreased. In parallel, the contribution and importance of the succeeding PCs to the total variance increased. The dependences of ρ on the history T for a single PC are similar to those in Fig. 1a. The 1, 6, 12 and 18 h ahead predictions are shown in Fig. 3. The parameters of the networks were chosen to be η = 0.00001, α = 0.9, T = 6 h.

In order to estimate the role of persistence introduced by including the Dst history in the input, the correlation coefficients between the original time series and the series delayed by 1, 6, 12 and 18 h were evaluated as 0.95, 0.56, 0.22 and 0.11, respectively. Comparing these values with Table 3, it can be concluded that for the 1 h ahead prediction persistence plays a dominant role (the prediction is practically made by persistence). For prediction times of 6, 12 and 18 h the role of persistence decreases.
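The persistence figures quoted above can be checked for any hourly Dst series with a simple lagged correlation. A minimal sketch follows; the variable dst_series is assumed to hold the hourly Dst values for the studied intervals.

```python
import numpy as np

def persistence_corr(dst, lag):
    """Correlation between Dst(t) and Dst(t - lag): the skill of a
    'no-change' forecast that simply repeats the last observed value."""
    return np.corrcoef(dst[lag:], dst[:-lag])[0, 1]

# for lag in (1, 6, 12, 18):
#     print(lag, persistence_corr(dst_series, lag))
```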


Fig. 1. The correlation coefficients (a) and NMSE (b) as functions of different numbers of PCs and different solar wind time histories, not considering the history of Dst [◦—PC1; ∗—(PC1, PC2); pentagram—(PC1, PC2, PC3); triangle—(PC1, PC2, PC3, PC4)]; (c) the prediction for T = 6 h for two PCs (storm No. 11 in Table 2).

Our results show that up to 86% of the variance of the Dst index was reproduced by the trained network for the considered data sets. This result is comparable to the study of Wu and Lundstedt (1996) (ρ ∼ 0.91), who used Elman recurrent neural networks and considered as inputs only the following solar wind parameters: the interplanetary magnetic field z-component (GSM), the solar wind plasma density n and the solar wind velocity v.

4. Conclusion

The aim of this paper was (a) to find linear combinations—principal components—of the parameters of the solar wind which could explain the largest part of the observed variance of geomagnetic activity. Within the frame of multivariate analysis 17 parameters were considered. Principal component analysis revealed that the information content of the subsequent PCs decreases rapidly. Therefore, only the first two (most important, holding about ∼49% of the information) principal components were considered;

(b) to elaborate a model for the prediction of the Dst index using neural networks with principal components as inputs. Good results were achieved for 1 h ahead predictions, mainly if the Dst history was also considered as input. When the Dst index is not included in the input, that is, the information on the actual magnetospheric state is neglected and only the solar wind parameters are considered, the main phase of the storms is better reproducible than the recovery phase. However, when the Dst index is also considered as input, both phases of the storms are equally well predicted. This is because the main phase of geomagnetic storms is thought to be driven mainly by the solar wind, while the recovery phase seems to be determined mainly by internal magnetospheric processes. These findings give evidence of the nonlinear character of the solar wind–magnetosphere interaction processes. The best mean value of ρ is 0.93, considering the two principal components and T = 6 h. This result is comparable to the results of Wu and Lundstedt (1996) (ρ ∼ 0.91), Gonzalez et al. (1989) (ρ ∼ 0.90) or Pisarskij et al. (1989) (ρ ∼ 0.95).


Fig. 2. The correlation coefficients as functions of different numbers of PCs and different solar wind time histories in the case when the history of Dst is considered [◦—PC1; ∗—(PC1, PC2); pentagram—(PC1, PC2, PC3); triangle—(PC1, PC2, PC3, PC4)] for 1, 6, 12 and 18 h ahead prediction.


Fig. 3. The prediction of the Dst index for 1, 6, 12 and 18 h ahead with Dst in the input (storm No. 11 in Table 2).


Table 3
The correlation coefficients for different predicted times

Predicted time [h]   1      6      12     18
ρ                    0.93   0.73   0.69   0.66

Acknowledgements

The authors wish to acknowledge the NOAA Space Environment Center for solar wind data and the World Data Center for Geomagnetism for geomagnetic data. We also wish to acknowledge the useful advice of the referees. This work was supported by VEGA grant 2/6040/99.

References

Akasofu, S., Chapman, S., 1972. Solar Terrestrial Physics. Oxford University Press, Oxford.

Davis, T.N., Sugiura, M., 1966. Auroral electrojet activity index AE and its universal time variations. Journal of Geophysical Research 71 (3), 785–801.

Gleisner, H., Lundstedt, H., Wintoft, P., 1996. Predicting geomagnetic storms from solar-wind data using time-delay neural networks. Annales Geophysicae 14, 679–686.

Gleisner, H., Lundstedt, H., 1999. Ring current influence on auroral electrojet predictions. Annales Geophysicae 17, 1268–1275.

Gnanadesikan, R., 1977. Methods for Statistical Data Analysis of Multivariate Observations. Wiley, New York.

Gonzalez, W.D., Tsurutani, B.T., Gonzalez, A.L.C., Smith, E.J., Tang, F., Akasofu, S.I., 1989. Solar wind–magnetosphere coupling during intense magnetic storms (1978–1979). Journal of Geophysical Research 94, 8835–8851.

Hertz, J., Krogh, A., Palmer, R.G., 1991. Introduction to the Theory of Neural Computation. Addison-Wesley, Reading, MA (Chapter 4).

Jacobs, J.A., 1989. Geomagnetism, Vol. 3. Academic Press, London.

Lundstedt, H., Wintoft, P., 1994. Prediction of geomagnetic storms from solar wind data with the use of a neural network. Annales Geophysicae 12, 19–24.

Nørgaard, M., 1997. Neural Network Based System Identification Toolbox. Technical University of Denmark, Denmark.

Pisarskij, V.Y., Feldstein, Y.I., Rudneva, N.M., Prigancová, A., 1989. Ring Current and Interplanetary Medium Parameters. Studia Geophysica et Geodaetica. Academia, Prague, Czechoslovakia.

Reyment, R.A., Jöreskog, K.G., 1996. Applied Factor Analysis in the Natural Sciences. Cambridge University Press, Cambridge.

Rumelhart, D.E., Hinton, G., Williams, R., 1986. Learning representations by back-propagating errors. Nature 323, 533.

Weigel, R.S., Horton, W., Tajima, T., 1999. Forecasting auroral electrojet activity from solar wind input with neural networks. Geophysical Research Letters 26 (10), 1353–1356.

Wu, J.G., Lundstedt, H., 1996. Prediction of geomagnetic storms from solar wind data using Elman recurrent neural networks. Geophysical Research Letters 23 (4), 319–322.