Application of new training methods for neural model reference control




Engineering Applications of Artificial Intelligence 74 (2018) 312–321


Amir H. Jafari a, Martin T. Hagan b,*

a George Washington University, Washington, DC, United States
b Oklahoma State University, Stillwater, OK, United States

ARTICLE INFO

Keywords: Recurrent neural network; Model reference control; SOM; Novelty sampling; Magnetic levitation

ABSTRACT

In this paper, we introduce new, more efficient, methods for training recurrent neural networks (RNNs) for system identification and Model Reference Control (MRC). These methods are based on a new understanding of the error surfaces of RNNs that has been developed in recent years. These error surfaces contain spurious valleys that disrupt the search for global minima. The spurious valleys are caused by instabilities in the networks, which become more pronounced with increased prediction horizons. The new methods described in this paper increase the prediction horizons in a principled way that enables the search algorithms to avoid the spurious valleys. The work also presents a novelty sampling method for collecting new data wisely. A clustering method determines when an RNN is extrapolating, which occurs when the RNN operates outside the region spanned by the training set, where adequate performance cannot be guaranteed. The new method presented in this paper uses a clustering method for extrapolation detection, and then the novel data is added to the original training set. The network performance is improved when additional training is performed with the augmented data set. The new techniques are applied to the model reference control of a magnetic levitation system. The techniques are tested on both simulated and experimental versions of the system.

1. Introduction

RNNs are good candidates to represent nonlinear dynamic systems, as demonstrated by their application in many areas, such as system identification and control (Hagan and Demuth, 1999), long term predictions of chemical processes (Su et al., 1992), financial analysis of multiple stock markets (Roman and Jameel, 1996) and phasor detection (Kamwa et al., 1996). However, it is well known that RNNs are difficult to train (Atiya and Parlos, 2000; Gori et al., 2010). (It should be noted that we are using RNN to designate any discrete time neural network with one or more feedback connections that contain one or more delays.) Two of the proposed reasons for the difficulties in RNN training are the problems of vanishing and exploding gradients (Bengio et al., 1994; Pascanu et al., 2012). When a recurrent network is stable (or, more precisely, a given trajectory is stable), effects of inputs to the network diminish as they move forward in time, and, consequently, the gradients of performance with respect to inputs and weights diminish as they are propagated backward in time. This is referred to as the vanishing gradient problem, which makes it difficult to learn long-term dependencies between inputs and outputs, if the initial weights of the network produce a stable response. This is generally not as important an issue in nonlinear system identification as it is, for example, in

natural language processing, where the meaning of a word might be more accurately identified by context in the previous paragraph. The exploding gradient problem is the complement of the vanishing gradient problem. If the network is unstable (a given trajectory is unstable), the effects of inputs to the network grow as they propagate forward in time. The gradients will therefore grow as they move backward in time. This is connected to the existence of spurious valleys in the error surfaces of recurrent networks (Jesus et al., 2001; Horn et al., 2009; Phan and Hagan, 2013a). These valleys are not associated with the true minimum of the surface, or with the problem the RNN is trying to solve. They are strongly dependent on the input sequence in the training data. (If the input sequence changes, even though the system being modeled stays the same, the valleys will move significantly.) Any batch search algorithm is very likely to be trapped in these spurious valleys. These valleys occur in regions of instability, as was shown in Horn et al. (2009) and Phan and Hagan (2013a). Alternate training methods have been developed to mitigate the effects of these spurious valleys (Jesus et al., 2001; Phan and Hagan, 2013b). Because the spurious valleys depend so strongly on the input sequence, one alternate method is to divide the data into multiple subsequences, or minibatches. The subsequences can be alternated during training, which will move the valleys and prevent the algorithm

Corresponding author. E-mail address: [email protected] (M.T. Hagan).

https://doi.org/10.1016/j.engappai.2018.07.005 Received 8 August 2017; Received in revised form 16 June 2018; Accepted 14 July 2018 0952-1976/© 2018 Elsevier Ltd. All rights reserved.


from becoming trapped (Jesus et al., 2001). Recently, Phan and Hagan (2013b) demonstrated a modified procedure, in which the error gradient associated with each subsequence is monitored during training. Large gradient magnitudes indicate that the training algorithm is located within a spurious valley for those subsequences, and so those subsequences can be removed temporarily from the training process. Another technique that was introduced in Phan and Hagan (2013b) was to increase the prediction horizon gradually during the training process. The initial training segment used a one-step-ahead prediction. This was increased at each training segment, until the prediction horizon during the final training segment covered the full length of the original sequences. This process can require long training times, if the prediction horizon is increased too slowly, but will fail to converge if the prediction horizon is increased too quickly. In this paper, we introduce a method that searches for an optimal horizon step at each training segment (Jafari and Hagan, 2015). We demonstrate the process on a practical system identification problem. Even after a recurrent network has been successfully trained, satisfactory performance can only be ensured if the network inputs are similar to those in the training set. This is also true for feedforward networks, but extrapolation is a more urgent problem for recurrent networks, where, because of feedback connections, responses can become unstable when network inputs (including feedback signals) fall outside the training set. The process of detecting network inputs that are outside the training set is called novelty detection (Pimentel et al., 2014). In this paper, we propose a type of novelty detection based on clustering. We demonstrate that the proposed technique is able to detect incipient network failures and instabilities well before they occur.

The clustering method we use for novelty detection is the Self-Organizing Map (SOM) (Kohonen, 1990). This is a topology preserving network, in that neurons within the network have neighbor relationships that are preserved by the training process. The idea will be to train the SOM on composite vectors that contain the inputs to the network augmented with the target network output. The SOM will divide the training data into clusters, so that each input/target pair will be near one cluster center. When a new data point appears that is not near any cluster center, extrapolation will be identified. We are also going to use the SOM to collect additional data in order to improve the training procedure. It is unlikely that the original data set will effectively cover the full range of conditions where the network will be used. The RNN will extrapolate when network inputs fall outside the space spanned by the training data set. We are going to collect additional training data when the SOM indicates extrapolation. Then, we will retrain the RNN with the new data combined with the initial training data set. This procedure is known as novelty sampling (Raff et al., 2005). This will be done in phases until no novel conditions are detected after many additional tests. Some of these ideas were first presented in Jafari and Hagan (2015), but they will be expanded in three ways in this paper. First, we introduce the use of the SOM for novelty sampling, which is used to select new data for the training process. Second, in addition to the modeling of dynamic systems, we also apply novelty sampling, and the other new recurrent network training methods, to the training of a model reference control (MRC) system, represented as a larger RNN that contains both the plant model and a neurocontroller. Finally, all of the new procedures will be tested on a magnetic levitation system — both simulated and experimental versions. This paper is organized as follows.

In Section 2, we explain the modified training algorithm, in which subsequences are removed from the training set when their gradient becomes large. Next, in Section 3, we introduce the new method for determining the optimal prediction horizon. In Section 4, we describe the recurrent network modeling of a magnetic levitation system. In Section 5, we enhance the models by adding new data by novelty detection. Next, in Section 6, we describe how the new procedures can be used to develop model reference controllers. In Section 7, we perform experimental verification of the methods on an experimental prototype maglev system.

2. Modified training for recurrent neural networks

In order to train an RNN to approximate a dynamic system, we need appropriate data. Unlike static networks, where each input/target pair stands on its own, RNN data must consist of ordered sequences of inputs and target outputs. When training recurrent networks, the length of the sequence determines the prediction horizon. If the length of a sequence increases by one, then the prediction horizon increases by one. For example, if the training sequences have a length of 5 time steps, and the maximum number of delays in the network is 2, then training the closed-loop network means that we are doing 3-step-ahead predictions. For the method introduced in Phan and Hagan (2013b), training begins with short prediction horizons, and then the prediction horizons increase as training proceeds. Short prediction horizons require short sequences, so the original training sequences are divided into multiple subsequences. The network is trained for multiple iterations at a given prediction horizon. We call this a training segment. After the completion of a training segment, the prediction horizon is increased by the horizon step, which requires that the original training sequences be subdivided again. The key concept introduced in Phan and Hagan (2013b) was that, because the spurious valleys are caused by the input sequence, each training subsequence will have a different set of valleys. If training becomes trapped in a valley, the sequence that owns that valley could then be removed from the training set. In order to determine which sequence to remove, the individual training gradients for each sequence are computed, and the sequence with the largest gradient is removed, since gradients will be highest inside the spurious valleys. To determine when the training algorithm has entered a valley, Phan and Hagan (2013b) proposed using a feature of the Levenberg–Marquardt (LM) training algorithm (Hagan and Menhaj, 1994), as described below.
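Before the LM details below, the bookkeeping of dividing long training sequences into subsequences for a given prediction horizon can be sketched as follows. This is an illustrative sketch, not the authors' code; the function name and shape conventions are our assumptions.

```python
import numpy as np

def make_subsequences(inputs, targets, horizon, num_delays):
    """Split one long sequence into subsequences whose closed-loop part is
    `horizon` steps, after `num_delays` initial-condition steps.
    Hypothetical helper; shapes and conventions are our assumptions."""
    seq_len = num_delays + horizon            # total subsequence length
    subs = []
    for start in range(0, len(inputs) - seq_len + 1, seq_len):
        subs.append((inputs[start:start + seq_len],
                     targets[start:start + seq_len]))
    return subs

# Sequences of length 5 with 2 delays give 3-step-ahead predictions,
# matching the example in the text.
u = np.arange(10.0)
y = np.arange(10.0)
subs = make_subsequences(u, y, horizon=3, num_delays=2)
print(len(subs), len(subs[0][0]))             # 2 5
```

Increasing the horizon step then simply means calling the splitter again with a larger `horizon`, producing fewer, longer subsequences.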
The LM update rule for the weights 𝐱𝑘 at the 𝑘th iteration is

$$\Delta \mathbf{x}_k = -\left[ \mathbf{J}^T(\mathbf{x}_k)\,\mathbf{J}(\mathbf{x}_k) + \mu_k \mathbf{I} \right]^{-1} \mathbf{J}^T(\mathbf{x}_k)\,\mathbf{e}(\mathbf{x}_k) \tag{1}$$

where 𝐞 is the network error and 𝐉 is the Jacobian matrix of the network errors with respect to the weights. 𝐉 can be computed using backpropagation. For RNNs, we need to use dynamic backpropagation. Jacobian calculations for a general dynamic network can be found in Jesus and Hagan (2007). The important feature of the LM algorithm is that as 𝜇𝑘 becomes large it reverts to steepest descent with a small learning rate, which guarantees that the performance function 𝐅(𝐱) (typically mean square error) must decrease if 𝜇 is made large enough. The algorithm starts with a small 𝜇𝑘, and if 𝐅(𝐱) does not decrease at any iteration, the algorithm increases 𝜇𝑘 by a factor of 10. If 𝐅(𝐱) decreases, 𝜇𝑘 is reduced by a factor of 10, because the algorithm converges faster in the Gauss–Newton mode (𝜇𝑘 small). The algorithm is stopped if 𝜇𝑘 becomes too large. This indicates that 𝐅(𝐱) does not decrease even when a very small step is taken in the steepest descent direction, and therefore that the algorithm is stuck in one of the spurious valleys, which tend to be steep and narrow.

The other approach suggested in Phan and Hagan (2013b) was to start the first training segment with one-step-ahead predictions and then to increase the prediction horizon gradually for each successive training segment. The training procedure from Phan and Hagan (2013b) can be summarized as follows (Method 1):

1. In the first training segment, use open-loop training (one-step-ahead predictions). All training segments involve maxit iterations of the training algorithm.
2. Closed-loop training with increasing prediction horizon: do 𝑘-step-ahead prediction (𝑘 ≥ 2). This includes segmentation of the original long sequences into smaller subsequences.
3. At each iteration of the LM algorithm, if 𝜇 reaches 𝜇𝑚𝑎𝑥, remove the subsequence with the largest gradient. If 𝐅(𝐱) does not decrease, keep removing the subsequence with the next largest gradient until


Fig. 1. Selecting best horizon step (Method 2).

𝐅(𝐱) decreases (the algorithm escapes from the valleys). Return the removed subsequences to the training data before proceeding to the next iteration.
4. Increase the prediction horizon 𝑘 (sequence length). If all subsequences are removed, shorten the prediction horizon and go back to step 2.
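The procedure above, together with the horizon-step selection that Section 3 develops as Method 2, can be sketched in code. The `lm_step`, `F`, and `gradient_norm` functions below are toy stand-ins (a simple quadratic performance index), chosen only so the sketch runs end to end; only the control flow mirrors the method, and the threshold values are our assumptions.

```python
import numpy as np

MU_MAX = 1e10   # assumed stopping threshold; the paper's value is not stated
MAXIT = 5       # iterations per training segment (the paper's "maxit")

def F(weights, subseqs):
    """Toy performance index F(x) = ||x||^2 (stand-in for the MSE)."""
    return float(np.sum(weights ** 2))

def gradient_norm(weights, subseq):
    """Per-subsequence gradient magnitude (toy stand-in)."""
    return float(np.linalg.norm(2.0 * weights))

def lm_step(weights, mu):
    """One LM trial step: -(H + mu I)^-1 grad for the toy F above."""
    grad = 2.0 * weights
    return weights - grad / (2.0 + mu)

def train_segment(weights, subseqs):
    """Method 1, steps 2-3: LM iterations, temporarily removing the
    largest-gradient subsequence whenever mu reaches MU_MAX."""
    mu = 0.01
    for _ in range(MAXIT):
        active = list(subseqs)        # removed subsequences are returned
        while True:
            trial = lm_step(weights, mu)
            if F(trial, active) < F(weights, active):
                weights, mu = trial, max(mu / 10.0, 1e-12)
                break                 # step accepted: next iteration
            mu *= 10.0
            if mu >= MU_MAX and len(active) > 1:
                # trapped in a spurious valley: drop the worst subsequence
                active.sort(key=lambda s: gradient_norm(weights, s))
                active.pop()
                mu = 0.01
            elif mu >= MU_MAX:
                return weights        # give up on this segment
    return weights

def best_horizon_step(mse_by_step):
    """Method 2 sketch (Section 3): among MSEs evaluated 1..maxstep steps
    beyond the current horizon, return the local minimum with smallest MSE."""
    best = None
    for i, m in enumerate(mse_by_step):
        left = mse_by_step[i - 1] if i > 0 else float("inf")
        right = mse_by_step[i + 1] if i + 1 < len(mse_by_step) else float("inf")
        if m <= left and m <= right and (best is None or m < mse_by_step[best]):
            best = i
    return best + 1                   # horizon step, 1-based

w = train_segment(np.array([1.0, -2.0]), subseqs=[0, 1, 2])
print(F(w, []) < 1e-6, best_horizon_step([0.9, 0.4, 0.7, 0.3, 0.8]))  # True 4
```

In a real implementation, `F` and the per-subsequence gradients would come from dynamic backpropagation through the closed-loop network.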

In the method proposed in Phan and Hagan (2013b), the prediction horizon was incremented according to a preplanned schedule. The schedule was conservative, with small horizon steps, since it is difficult to know the optimal horizon, and horizons that are too long can produce steep, narrow valleys. Because the schedule was conservative, training times could become quite long. In the next section, we will describe a modification to Phan and Hagan (2013b), in which the best horizon step is determined for each training segment.

3. New procedure for horizon step selection

This section provides a brief discussion of the new procedure for determining the best horizon step. A more detailed discussion is provided in Jafari and Hagan (2015). Before we describe the new procedure, we want to discuss the effect of the prediction horizon on the training process. As shown in Horn et al. (2009) and Phan and Hagan (2013a), the spurious valleys appear in regions of the weight space where the network is unstable. Even though the network output is very small at the bottom of the valley, small changes in the weights will result in unstable responses. The network response is initially small, but after some time oscillatory behavior begins. If the oscillatory behavior only takes place over a small region at the end of the sequence, then training can continue successfully. However, when the oscillations occur over a significant percentage of the sequence, it is difficult for the training to recover a stable response. We want to increase the prediction horizon as much as possible at each training segment, but we do not want to increase it so far that unstable behavior occurs over too large a percentage of any of the subsequences. The objective of our proposed horizon step selection method is to find the largest horizon step for which the oscillations occur over a sufficiently small percentage of the sequence. The new procedure will follow the same steps outlined in the previous section, including the use of the LM algorithm, but in the final step, where the prediction horizon is increased, the following procedure will be implemented to determine the horizon step. Using the weights determined at the completion of the previous training segment, the MSE will be computed for prediction horizons from 1 to maxstep steps ahead of the prediction horizon used in the previous training segment. At this point, the algorithm will find all local minima of the MSE with respect to the prediction horizon. It will then select the local minimum with the smallest MSE. The flowchart for choosing the best horizon step is shown in Fig. 1. We will reference this as Method 2 in the remainder of the paper.

4. Demonstration of the new training procedure on magnetic levitation

In this section, we will consider the basic magnetic levitation system. Magnetic levitation has been used in several industrial applications, such as transportation systems. In our simulated magnetic levitation system, we will suspend a magnet above an electromagnet, as in maglev trains. The goal of this magnetic levitation system is to control the position of a magnet above an electromagnet. Fig. 2 shows the magnetic levitation system, which consists of a magnet suspended above an electromagnet, where the magnet is constrained so that it can only move in the vertical direction. The equation of motion of the magnet is:

$$\frac{d^2 y(t)}{dt^2} = -g + \frac{\alpha}{M} \frac{i^2(t)\,\operatorname{sgn}[i(t)]}{y(t)} - \frac{\beta}{M} \frac{dy(t)}{dt} \tag{2}$$

where 𝑦(𝑡) is the distance of the magnet above the electromagnet, 𝑖(𝑡) is the current in the electromagnet, 𝑀 is the mass of the magnet, 𝑔 is the gravitational constant, 𝛽 is a viscous friction coefficient and 𝛼 is a field strength constant. Table 1 shows the simulation parameters that we used.

Fig. 2. Magnetic levitation system.

Table 1
Simulation parameters for the magnetic levitation.
𝛽 = 12, 𝛼 = 15, 𝑔 = 9.8 (m/s²), 𝑀 = 3 (kg), 𝑖(𝑡) = −1 to 4 (A), sampling interval = 0.010 (s)

In order to model this system using RNNs, first we need to collect a set of training data. We used Simulink as a tool to gather data from this dynamic system, and we applied random inputs consisting of a series of pulses of random widths and heights, known as a skyline function (Hagan et al., 2002). An example is shown in Fig. 4(a). We use a Nonlinear AutoRegressive eXogenous (NARX) network (see Narendra and Parthasarathy, 1990), shown in Fig. 3, to model this system. The network we used had 3 input and output delays (the prediction begins with the fourth data point) and also initially had 15 hidden neurons. A total of 20 sequences of 10,000 time steps each were generated to produce the entire training data set, in order to cover the full range of required network operation. Fig. 4(b) shows a histogram of magnet position in the training set (in centimeters), which demonstrates the coverage of the modeling. The magnetic levitation system is to be modeled as the magnet position varies in the range from 0 to 6 centimeters and as the current varies from −1 to 4 amps.

The first training segment for the modified training algorithm (Method 1) in Section 2 uses open-loop (one-step-ahead prediction) training. In this training segment, there are two inputs to the NARX network shown in Fig. 3 — the input sequence and the target sequence. The tapped delay line in the feedback connection is filled with previous target values (magnet positions in this case). After performing the open-loop training segment, we performed additional training segments with multiple-step-ahead predictions (closing the feedback connections). The horizon step for each training segment was chosen automatically by the new procedure (Method 2) discussed in Section 3.

For the maglev identification, it took 67 training segments for the prediction horizon to reach the full length of the original sequence (9998-step-ahead prediction). The method reached 𝜇𝑚𝑎𝑥 (which indicated a spurious valley and required the temporary removal of some subsequences) in 14 of the 67 training segments. After training was completed for the maximum prediction horizon, the network response closely followed the target response for all 20 of the original training sequences. Fig. 5(a) shows the results of the 9998-step-ahead prediction. We see very little difference between the actual position of the magnet and the position predicted by the NARX network. (The MSE is 2.57 × 10⁻⁴.)

In order to validate the NARX network, we generated 100 additional test sequences of 10,000 time points, which were not used for training. In some of the cases the network is extrapolating, and the network response is inaccurate. A typical oscillatory response is shown in Fig. 5(b). This type of oscillatory response is characteristic of RNNs. The feedback connections allow for the possibility of instabilities. However, when the network has been accurately trained over the relevant input space, it should be possible to avoid these issues. Our hypothesis was that the instabilities occur when the network inputs fall outside the space spanned by the training data set. In this case, the network would be extrapolating, and reasonable performance could not be guaranteed. In the next section, we will introduce a method to determine when the network is extrapolating. The objective will be to detect pending oscillatory behavior in a trained network, well before the oscillation occurs.
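The data-collection setup described in this section (skyline inputs driving a simulation of Eq. (2) with the Table 1 parameters) can be sketched as follows. The paper used Simulink; the forward-Euler integration, pulse-width range, and initial conditions here are our assumptions.

```python
import numpy as np

# Parameters from Table 1
BETA, ALPHA, G, M = 12.0, 15.0, 9.8, 3.0
DT = 0.010   # sampling interval (s)

def skyline(n, rng, lo=-1.0, hi=4.0, wmin=10, wmax=200):
    """Skyline input: pulses of random widths and heights (Hagan et al.,
    2002). The pulse-width range is our assumption."""
    u = np.empty(n)
    t = 0
    while t < n:
        width = int(rng.integers(wmin, wmax + 1))
        u[t:t + width] = rng.uniform(lo, hi)   # constant random height
        t += width
    return u

def simulate(current, y0=2.0, ydot0=0.0):
    """Forward-Euler integration of Eq. (2); the paper used Simulink, so
    the integration scheme and initial state are our assumptions."""
    y, ydot = y0, ydot0
    ys = np.empty(len(current))
    for k, i in enumerate(current):
        yacc = (-G + (ALPHA / M) * (i * i * np.sign(i)) / max(y, 1e-3)
                - (BETA / M) * ydot)
        ydot += DT * yacc
        y = max(y + DT * ydot, 0.0)            # magnet cannot fall below coil
        ys[k] = y
    return ys

rng = np.random.default_rng(0)
u = skyline(10000, rng)
y = simulate(u)
print(u.shape, float(y.min()) >= 0.0)          # (10000,) True
```

Each (input, output) pair of sequences generated this way corresponds to one of the 10,000-step training sequences described above.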

5. Enhancing the training results by novelty detection and novelty sampling

5.1. Novelty detection

Extrapolation is defined as estimating, beyond the original observation range, the value of an output variable on the basis of its relationship with an input variable or variables. In the case of a NARX model, the inputs are the past values of the control signal and the past values of the system output. When the vector of these past values falls outside the range covered in the training set, the network is extrapolating. When training recurrent networks, it will not be possible to guarantee reasonable network performance if the network inputs move outside the range of the data on which the network is trained. It is important to be able to detect when this extrapolation is occurring. For example, if the RNN is part of a feedback control system (Hagan et al., 2002), we would want to disable the RNN when extrapolation occurs, and replace it with a conventional controller. Detecting extrapolation is a form of novelty detection. We want to know when the inputs to the network fall outside the range of the training data. In this section we propose a type of novelty detection for RNNs. A number of approaches to novelty detection are reviewed

Fig. 3. NARX recurrent network.
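To make the closed-loop operation of the NARX network in Fig. 3 concrete, here is a minimal sketch of multi-step prediction with 3 input and 3 output delays. The trained two-layer network is replaced by a toy linear map; only the tapped-delay-line bookkeeping reflects the text.

```python
import numpy as np

def narx_closed_loop(f, u, y_init):
    """Closed-loop (multi-step) simulation of a NARX model with 3 input
    and 3 output delays, as in Fig. 3. `f` maps the six delayed values
    to the next output; here a toy linear map replaces the trained MLP."""
    y = list(y_init)                  # the first 3 outputs seed the TDLs
    for k in range(3, len(u)):
        x = np.concatenate([u[k - 3:k], y[k - 3:k]])
        y.append(f(x))                # prediction starts at the 4th point
    return np.array(y)

# Toy stand-in "network": a stable weighted sum of the delayed signals.
w = np.array([0.1, 0.1, 0.1, 0.2, 0.2, 0.2])
f = lambda x: float(w @ x)

u = np.ones(10)
y = narx_closed_loop(f, u, y_init=[0.0, 0.0, 0.0])
print(y.shape, round(float(y[3]), 3))          # (10,) 0.3
```

In open-loop (one-step-ahead) training, the `y[k-3:k]` slice would be taken from the target sequence instead of the network's own feedback, which is exactly the difference between the first training segment and the later ones.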


Fig. 4. Training data.

Fig. 5. Training and testing results.

in Pimentel et al. (2014). The approach proposed here is a type of clustering method, in which the inputs from the training set are characterized by a small set of prototype vectors. The minimum distance of a new input to the nearest prototype is used to quantify novelty. For example, if the distance from the new input to the nearest prototype is larger than the maximum distance of that prototype to the cluster of training inputs that are assigned to it, then the new input could be considered novel. (One might also use other similar types of threshold distances to indicate novelty.) The clustering method proposed here for novelty detection is the Self-Organizing Map (SOM) (Kohonen, 1990). This is a topology preserving network, in that neurons within the network have neighbor relationships that are preserved by the training process, allowing visualization in two dimensions. Several studies have shown the SOM to produce more accurate clustering than other algorithms (see Abbas, 2008; Mangiameli et al., 1996), and the ability to visualize the clusters, which is unique to the SOM, was quite helpful in the initial phases of this study. Although other clustering methods could probably be substituted, the SOM worked well for our purposes. The idea in this paper is to train the SOM to cluster combination vectors – delayed inputs to the NARX network and delayed network outputs – augmented with the target network output. For each cluster, we calculate the maximum distance between the cluster center and the most distant member of that cluster in the training set. We then use those maximum distances to determine when the RNN is extrapolating. While

the RNN is operating, at each time step we create an input vector to the SOM. (When the target output is not available, the last element of the SOM input vector is replaced with the RNN output.) The distance of this SOM input to the nearest cluster center is compared with the maximum distance associated with that cluster. If the current input distance is larger than the maximum distance, there is potential extrapolation under way and the extrapolation flag is set. The extrapolation flag goes high many time steps before oscillation begins in the network output. The extrapolation flag may go high for only one time point, and then return low. To ensure that extrapolation is occurring, we wait until the extrapolation flag is high for a predetermined indication width before collecting additional data. The indication width can be adjusted to set desired false alarm rates. We found that an indication width of 3 performed well over many experiments. (For a more complete description of the novelty detection procedure, see Jafari and Hagan, 2015.)

5.2. Novelty sampling

Because it is difficult to guarantee that the original data set will effectively cover the full range of conditions where the network will be used, we are going to collect additional training data using the SOM novelty detector. Whenever the SOM indicates extrapolation, we


Fig. 6. Only test sequence with oscillatory response.

Fig. 7. Target and trained network response after retraining for a test sequence.

will collect that data and then retrain the network with the new data combined with the initial training data. This procedure is known as novelty sampling (Raff et al., 2005). This will be done in phases until no novel conditions are detected after many additional tests. For the magnetic levitation system, we performed two phases of novelty sampling and retraining. Novelty detection identified the sequences which have extrapolation with respect to the original training data set, which was then augmented with these new sequences. The new training data set will grow in phases until there are no significant errors on new test sets. (Even after a network has been implemented, the SOM can continue to monitor performance, and the network can be retrained when a significant number of sequences have been added to the training set.)

After training the NARX model of the maglev system, as described in Section 4, we trained a 20 × 20 SOM over the original 20 training sequences. We then generated 100 new test sequences. We applied the trained NARX network and SOM to the new test sequences. The SOM identified novelty in 39 of the 100 test sequences. These 39 sequences were added to the original 20 training sequences to produce a retraining data set of 59 sequences, which is almost three times as large. We were able to reach the full prediction horizon on all 59 sequences and complete the retraining process, in the same manner as described in Section 4 for the original training process. With the first phase of retraining completed, we checked the new network with additional test sequences. In this part of the process, we should be able to see an improvement on the test results. (We expect the number of oscillatory responses to decrease.) In the second retraining process, we had 59 original training sequences, and we trained another 20 × 20 SOM on this data. We then generated 100 more test sequences.
The NARX network response at this stage for the 100 test sequences was very good for all the sequences except one. Fig. 6 shows this test sequence, which has an oscillatory response for the trained NARX model. We performed the novelty detection for the new data set using the SOM, and 8 new sequences were collected and added to our training data set for a total of 67 sequences. We performed modified training on this new data set to develop a new NARX model. As described in Section 4, we began with one-step-ahead prediction and then multi-step-ahead prediction to reach the full prediction horizon. The training algorithm reached the full prediction horizon with very small MSE, completing the second stage of retraining. After the second stage of the retraining process, we trained an SOM on the 67 training data sequences, and then generated another 100 test sequences. The response of the NARX network was extremely accurate, even for the maximum prediction horizon (9998 steps ahead), and the SOM did not indicate any extrapolation.
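The SOM-based extrapolation test used throughout these retraining phases can be sketched as follows, assuming a trained codebook of prototype vectors; `cluster_radii` and `is_novel` are hypothetical helper names, and the toy two-prototype codebook stands in for the trained 20 × 20 SOM.

```python
import numpy as np

def cluster_radii(codebook, train_vecs):
    """For each SOM prototype, the distance to the most distant training
    vector assigned to it (our reading of the Section 5.1 procedure)."""
    d = np.linalg.norm(train_vecs[:, None, :] - codebook[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    radii = np.zeros(len(codebook))
    for c in range(len(codebook)):
        members = d[nearest == c, c]
        if members.size:
            radii[c] = members.max()
    return radii

def is_novel(x, codebook, radii):
    """Extrapolation flag: the input is farther from its nearest prototype
    than any training vector assigned to that prototype."""
    d = np.linalg.norm(codebook - x, axis=1)
    c = int(d.argmin())
    return bool(d[c] > radii[c])

# Toy prototypes standing in for a trained 20 x 20 SOM codebook.
codebook = np.array([[0.0, 0.0], [5.0, 5.0]])
train = np.array([[0.1, 0.0], [0.0, 0.2], [5.1, 4.9]])
radii = cluster_radii(codebook, train)
print(is_novel(np.array([0.1, 0.1]), codebook, radii))   # False (inside)
print(is_novel(np.array([2.5, 2.5]), codebook, radii))   # True (far away)
```

In the paper, the flag must additionally stay high for an indication width (3 time steps worked well) before a sequence is flagged for novelty sampling.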

In summary, the novelty sampling method, combined with Method 1 and Method 2, results in a very stable and trustworthy model for the magnetic levitation system. Fig. 7 shows the final results for a typical test sequence. As shown in the figure, it is very hard to distinguish between the network response and the target, even for a prediction horizon of 9998 steps. The next step is to train a neural model reference controller (MRC) using the same process. The MRC is also an RNN, with multiple feedback loops, and includes the NARX maglev model as a component. (See Fig. 9.) The MRC training will be a good test of the enhanced RNN training procedures. 6. Neural model reference control Now that we have trained a neural network to represent the plant, we are going to use the model reference control architecture to control the plant output. The neural MRC architecture was first introduced in Narendra and Parthasarathy (1990), and it consists of two parts. The first part is a plant model, which is identified using the NARX network. The second part is an NN controller. The controller network is trained so that the plant output follows the reference model output. The overall MRC block diagram is given in Fig. 8. The architecture of the MRC network is shown in Fig. 9. The MRC is a 4 layer dynamic network; the first two layers make up the controller, and the third and fourth layers make up the plant model (trained NARX network). There are three sets of controller inputs: delayed reference inputs, delayed controller outputs (plant inputs), and delayed plant outputs. For the maglev system, the tapped delay line on the reference input contains delays of 1 and 2 time steps, and the controller outputs and plant outputs are followed by tapped delay lines with delays of 1, 2 and 3. We used 10 hidden neurons for the controller, and the trained NARX plant model has 20 hidden neurons. The next step is to train the controller network. Training proceeds using the full MRC network of Fig. 
9, but the plant model weights are held fixed, while the controller weights are allowed to vary. Training data is obtained by applying a skyline function to the reference model and collecting the reference model output. For our tests, the reference model was chosen to be a critically damped second-order system with a time constant of 1 s. We initially generated 20 training sequences of 10,000 points sampled at 0.02 s. The MRC training process is the same as that described in previous sections for the NARX model — Method 1 is used for updating weights and removing sequences to avoid spurious valleys; Method 2 is used to

A.H. Jafari, M.T. Hagan

Engineering Applications of Artificial Intelligence 74 (2018) 312–321

Fig. 8. Model reference control structure.

Fig. 9. Model reference control network.

determine the best prediction horizon increments; and novelty sampling is used to select additional input data for complete training.

Because the architecture of the MRC is different from that of the NARX network, the initial one-step-ahead training is slightly different. For one-step-ahead training, the feedback lines from the plant output are opened, and the targets are used to replace the feedback terms, as is done for the NARX network. However, the feedback loop from the output of the controller network (Layer 2) cannot be opened, because there are no available target values at this point in the network. These tapped delays are generally set initially to zero. In addition, the second layer weights and bias of the controller network are set to zero, which initializes the system with no input to the plant. This eliminates the possibility of initial instabilities.

Fig. 10 shows the trained MRC response. The bottom axis shows the reference input, and the top axis shows the plant output and the reference model output. At the scale of this figure, the two plots cannot be distinguished.

The SOM training for the MRC uses 10-element vectors, with three delays in each TDL associated with 𝐈𝐖1,1, 𝐋𝐖1,4 and 𝐋𝐖1,2, plus the target. Using the SOM, we performed novelty sampling and collected an additional 100 sequences for training. In general, we found that the MRC training proceeded more quickly than the NARX plant model training. No oscillations were discovered in any of the test sequences, although a small amount of extrapolation occurred initially. After the completion of the novelty sampling, all additional test responses were similar to Fig. 10.

Now that the recurrent network training algorithms have been tested with simulations, the next step is to perform experimental tests. The next section will describe the construction of an experimental maglev system and the development and implementation of a maglev neural MRC.
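The paper does not spell out the SOM-based extrapolation check line by line; a common implementation flags an input as novel when its distance to the nearest SOM prototype (the quantization error) exceeds a threshold derived from the training set. The sketch below illustrates that idea; the grid size, neighborhood schedule, and 99th-percentile threshold are all illustrative assumptions, not values from the paper.

```python
import numpy as np

def train_som(data, grid=(5, 5), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small 2-D SOM; returns one prototype vector per grid node."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    n_nodes = rows * cols
    protos = data[rng.integers(0, len(data), n_nodes)].astype(float)
    # Grid coordinates of each node, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighborhood
            winner = np.argmin(((protos - x) ** 2).sum(axis=1))
            # Gaussian neighborhood around the winning node on the grid.
            d2 = ((coords - coords[winner]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            protos += lr * h[:, None] * (x - protos)
            step += 1
    return protos

def novelty_threshold(protos, data, percentile=99):
    """Quantization errors of the training set define the novelty threshold."""
    qe = np.min(np.linalg.norm(data[:, None, :] - protos[None], axis=2), axis=1)
    return np.percentile(qe, percentile)

def is_novel(protos, x, threshold):
    """An input indicates extrapolation if its quantization error is large."""
    return np.min(np.linalg.norm(protos - x, axis=1)) > threshold
```

Inputs flagged by `is_novel` would then be collected and added to the training set, as in the novelty sampling procedure described above.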

Fig. 10. Trained MRC Response.
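The controller training data described in this section (a random skyline signal passed through the critically damped reference model) can be sketched as follows. Only the 1 s time constant and 0.02 s sample time come from the text; the signal levels, hold times, and the semi-implicit Euler discretization are assumptions for illustration.

```python
import numpy as np

def skyline(n_steps, dt, min_hold=0.5, max_hold=2.0, lo=-1.0, hi=1.0, seed=0):
    """Piecewise-constant 'skyline' signal: random levels held for random durations."""
    rng = np.random.default_rng(seed)
    u = np.empty(n_steps)
    i = 0
    while i < n_steps:
        hold = int(rng.uniform(min_hold, max_hold) / dt)  # hold time in samples
        u[i:i + hold] = rng.uniform(lo, hi)               # random level
        i += hold
    return u

def reference_model(u, dt, tau=1.0):
    """Critically damped 2nd-order reference model 1/(tau*s + 1)^2,
    discretized with a semi-implicit Euler step."""
    y = np.zeros(len(u))
    dy = 0.0
    for k in range(len(u) - 1):
        # y'' = (u - 2*tau*y' - y) / tau^2
        ddy = (u[k] - 2.0 * tau * dy - y[k]) / tau ** 2
        dy += dt * ddy
        y[k + 1] = y[k] + dt * dy
    return y

# One training sequence: 10,000 points sampled at 0.02 s, as in the text.
u = skyline(10000, 0.02)
y_ref = reference_model(u, 0.02)
```

The pair `(u, y_ref)` plays the role of one reference-input/reference-model-output training sequence.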

7. Experimental verification

In order to test the recurrent network training procedures described in the previous sections, we designed and constructed an experimental maglev system, which is shown in Fig. 11. The basic structure was designed using SolidWorks software and constructed with a 3D printer. The permanent magnet was a 1′′ × 1/4′′ × 1′′ neodymium block magnet.


Fig. 12. Target and trained network response for test sequence.

The NN filter works in the following manner. If the absolute value of the difference between the raw IR position measurement and the NARX position estimate is less than a threshold, we perform a weighted average of the NARX estimate and the raw position measurement. Otherwise, we use the NARX position estimate alone. The resulting method is shown as Algorithm 1, where 𝑦𝑠 (𝑡) is the raw IR sensor reading, 𝑦𝑛𝑛 (𝑡) is the output of the NARX maglev model and 𝑦𝑓 (𝑡) is the filter output.
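The filtering rule just described translates directly into code. The threshold 𝜖 and averaging weight 𝛼 are not given in this excerpt, so the default values below are placeholders.

```python
def nn_filter(y_s, y_nn, eps=0.3, alpha=0.5):
    """Blend the raw IR reading y_s with the NARX model estimate y_nn,
    falling back to the model estimate alone when the reading is an outlier.
    eps (threshold) and alpha (averaging weight) are assumed values."""
    if abs(y_s - y_nn) < eps:
        # Reading agrees with the model: weighted average of the two.
        return alpha * y_s + (1.0 - alpha) * y_nn
    # Reading is an outlier: trust the NARX model estimate.
    return y_nn
```

At each sample time the filter output would replace the raw sensor reading fed back to the controller.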

Fig. 11. Magnetic levitation setup.

The electromagnet was an APW EM200 2′′ diameter round flat faced electromagnet. The microcontroller that collected sensor measurements and ran the MRC was an Arduino Mega 2560. We used a DFRobot 2 × 2A DC Arduino Motor Shield to drive the electromagnet using PWM signals from the Arduino. The magnet position sensor was a Sharp GP2Y0A51SK0F IR Analog Distance Sensor, which was read with an Arduino analog input. (Instructions for building such a system are provided at http://thl.okstate.edu/MagLevNeural.html.) To implement the MRC, we used the Simulink Support Package for Arduino Hardware, which is available to all Simulink users. Simulink blocks are available for driving the Arduino PWM output lines and for reading the Arduino analog input lines. Once the neural networks are implemented in Simulink and incorporated with Arduino blocks, the resulting system can be compiled and downloaded to the Arduino with a single click. After the system was constructed, we followed the same steps described in the previous section to first develop a NARX model of the maglev system, and then to train and implement a neural MRC. For the NARX model training set, we applied a skyline input voltage in the range of 9 V to 12 V. This is a voltage range in which the magnet will remain levitated. In reading the IR sensor for the magnet position, we applied a simple three-step median filter to remove noise spikes. This provided the target for the NARX training. Fig. 12 shows an example of a test sequence applied to the NARX plant model after training. We used 5 test sequences. No oscillatory responses were detected, and we did not find any large errors in any of the test results. Training using experimental data required fewer steps than training using simulated data. The next step is to train the controller network. As with the simulated system, training proceeded using Method 1, Method 2 and novelty sampling.
Training data was again generated by applying a skyline input to a critically damped, second-order reference model with a time constant of 1 s. It should be noted that during the controller training, the experimental system is not used. The NARX plant model is used to represent the maglev system, and the MRC network of Fig. 9 is trained off-line, with only the controller network weights being adjusted. After the controller network was trained, the system was implemented using the Simulink Support Package for Arduino Hardware and downloaded to the Arduino. One change was made for the experimental implementation, when compared to the simulation described in the previous section. Because of intermittent noise in the IR position sensor, we decided to use the NARX maglev model as a Neural Network (NN) filter.

At each time step 𝑡:
    read IR sensor 𝑦𝑠 (𝑡);
    update 𝑦𝑛𝑛 (𝑡);
    if |𝑦𝑠 (𝑡) − 𝑦𝑛𝑛 (𝑡)| < 𝜖 then
        𝑦𝑓 (𝑡) = 𝛼 ∗ 𝑦𝑠 (𝑡) + (1 − 𝛼) ∗ 𝑦𝑛𝑛 (𝑡);
    else
        𝑦𝑓 (𝑡) = 𝑦𝑛𝑛 (𝑡);
    end

Algorithm 1: Filtering Algorithm

This NN filter significantly improved the position estimate. It is a unique method that takes advantage of the NARX maglev model. This is a side benefit of using the NN MRC approach. Fig. 13 shows a typical experimental test result for the MRC controller with a random skyline reference signal. The magnet accurately follows the reference input. To verify the quality of the NN MRC controller, we will next compare it with a PID controller. PID controllers represent a standard for industrial applications. The basic PID control algorithm is

𝑢(𝑡) = 𝐾𝑝 𝑒(𝑡) + 𝐾𝑖 ∫₀ᵗ 𝑒(𝜏) 𝑑𝜏 + 𝐾𝑑 𝑑𝑒(𝑡)∕𝑑𝑡    (3)

where 𝑒(𝑡) is the error between the reference signal and the plant output and 𝑢(𝑡) is the control signal (voltage applied to the electromagnet). PID design involves tuning the proportional, integral and derivative gains 𝐾𝑝 , 𝐾𝑖 and 𝐾𝑑 . To perform accurate tuning, we used the MATLAB PID Tuning toolbox. This requires a linear system model, which we obtained by linearizing the trained NARX model around a nominal set point of 1.8 cm. We tuned the controller to achieve a critically damped second order response that matched the reference model used for the NN MRC system. We implemented the PID controller on the experimental system. Because the NN filter was not available for the PID controller, we used a simple low pass linear filter to smooth the IR sensor reading. Every effort was made to produce the best possible PID response, using linear techniques.
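A minimal discrete-time sketch of Eq. (3), with the integral accumulated by summation and the derivative taken as a backward difference. The gains and sample time used below are illustrative placeholders, not the values obtained from the MATLAB tuning tools.

```python
class PID:
    """Discrete PID controller implementing Eq. (3):
    u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # Integral term: rectangular accumulation of the error.
        self.integral += error * self.dt
        # Derivative term: backward difference of the error.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

# Example with placeholder gains and the paper's 0.02 s sample time.
pid = PID(kp=2.0, ki=1.0, kd=0.5, dt=0.02)
u = pid.step(1.0)  # control signal for a unit error at the first sample
```

In practice the derivative term would typically be filtered (or computed from the measured output) to limit noise amplification, which is one reason the raw IR signal had to be low-pass filtered for the PID tests.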


Fig. 13. Experimental data MRC with NN filter for test reference input.

Fig. 14. Experimental data PID with linear filter for test reference input.

Fig. 14 shows the experimental magnet response with the PID controller. The NN MRC controller has a better tracking response (Fig. 13) than the PID controller. The overall RMS error of the NN MRC was found to be 0.08 cm, and the RMS error for the PID controller was 0.13 cm. There are two fundamental reasons for the improved accuracy of the NN MRC. First, the PID controller is linear, and is designed to operate best for the linearized model at a particular set point. The NN MRC is nonlinear, and is designed to operate equally well throughout the operating range for which it is trained. Also, the NARX plant model can be used to accurately filter the sensor data throughout the operating range. We have made available all of the materials needed to reproduce the experiments described in this section at the website http://thl.okstate.edu/MagLevNeural.html. It contains 3D printing files to create the experimental structure, links to websites for purchasing the electronic parts, Simulink models and instructions.

7.1. Discussion

In implementing the new methods on the experimental system, it was necessary to accommodate some practical issues, such as sensor noise. The Sharp distance sensor has noise with a heavy-tailed distribution, producing a number of outlier points. We handled this with two data cleaning methods. First, when collecting data to train the NARX plant model, we filtered the sensor output using a median filter of length 3. This reduced the noise significantly. Second, when operating the MRC, we used the NARX plant model as a filter to smooth the raw sensor signals. Under normal conditions, the raw sensor signal was averaged with the NARX model output. When the raw sensor signal was significantly different from the NARX model output, the NARX model output was used in place of the sensor signal. The system performance was not acceptable without these two data cleaning procedures. However, when they were used, we found the system identification and control tasks were easier to complete successfully for the experimental system than for the simulated system, which did not include any noise. The SOM procedures worked as well for the experimental system as they did for the simulated systems, if not better.

One other practical issue that came up with the experimental system was temperature. The behavior of the electromagnet was different when it was first turned on than after it had been running for some time and had reached thermal equilibrium. The results shown above were obtained in the thermal equilibrium state, which would be the most common operating condition. If it were necessary to operate accurately during an initial warm-up period, then a separate controller could be trained under those conditions.

8. Conclusions

In this paper we have described new methods for training recurrent neural networks for system identification and control. These methods are designed to overcome the problem of spurious valleys in the error surfaces of these networks. The first stage involves beginning the training with one-step prediction horizons, and then increasing the prediction horizons in a principled way. As the prediction horizon is increased, gradient magnitudes are checked for indications that the algorithm is trapped in a spurious valley. Subsequences that are responsible for the valley are temporarily removed. After the recurrent network training is complete, SOM networks are trained, and novelty sampling is used to collect additional data to refine the recurrent networks.

We have performed extensive demonstrations of the new methods on a magnetic levitation system. We have developed NARX models and neural model reference controllers for the maglev system using computer simulations, and have also designed and built an experimental prototype maglev system. We have implemented and tested a neural MRC on the experimental maglev system and have compared it with a PID controller. All simulations and experimental testing validate the performance of the new recurrent training methods. We were able to consistently develop robust NARX models of the physical system and to train stable MRC controllers that outperform linear PID controllers.

We have provided a website (http://thl.okstate.edu/MagLevNeural.html) with all materials necessary to build a maglev system and reproduce the results described in this paper.

References

Abbas, O.A., 2008. Comparisons between data clustering algorithms. Int. Arab J. Inf. Technol. (IAJIT) 5 (3).
Atiya, A.F., Parlos, A.G., 2000. New results on recurrent network training: Unifying the algorithms and accelerating convergence. IEEE Trans. Neural Netw. 11 (3), 697–709.
Bengio, Y., Simard, P., Frasconi, P., 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5 (2), 157–166.
Gori, M., Hammer, B., Hitzler, P., Palm, G., 2010. Perspectives and challenges for recurrent neural network training. Log. J. IGPL 18 (5), 617–619.
Hagan, M.T., Demuth, H.B., 1999. Neural networks for control. In: Proceedings of the 1999 American Control Conference, Vol. 3. IEEE, pp. 1642–1656.
Hagan, M.T., Demuth, H.B., Jesus, O.D., 2002. An introduction to the use of neural networks in control systems. Internat. J. Robust Nonlinear Control 12 (11), 959–985.
Hagan, M.T., Menhaj, M.B., 1994. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Netw. 5 (6), 989–993.
Horn, J., Jesus, O.D., Hagan, M.T., 2009. Spurious valleys in the error surface of recurrent networks - analysis and avoidance. IEEE Trans. Neural Netw. 20 (4), 686–700.
Jafari, A.H., Hagan, M.T., 2015. Enhanced recurrent network training. In: 2015 International Joint Conference on Neural Networks (IJCNN). IEEE, pp. 1–8.
Jesus, O.D., Hagan, M.T., 2007. Backpropagation algorithms for a broad class of dynamic networks. IEEE Trans. Neural Netw. 18 (1), 14–27.
Jesus, O.D., Horn, J., Hagan, M.T., 2001. Analysis of recurrent network training and suggestions for improvements. In: Proc. Int. Joint Conf. Neural Netw., pp. 2632–2637.
Kamwa, I., Grondin, R., Sood, V.K., Gagnon, C., Mereb, J., 1996. Recurrent neural networks for phasor detection and adaptive identification in power system control and protection. IEEE Trans. Instrum. Meas. 45 (2), 657–664.
Kohonen, T., 1990. The self-organizing map. Proc. IEEE 78 (9), 1464–1480.
Mangiameli, P., Chen, S.K., West, D., 1996. A comparison of SOM neural network and hierarchical clustering methods. European J. Oper. Res. 93 (2), 402–417.
Narendra, K.S., Parthasarathy, K., 1990. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1 (1), 4–27.
Pascanu, R., Mikolov, T., Bengio, Y., 2012. On the difficulty of training recurrent neural networks. arXiv preprint arXiv:1211.5063.
Phan, M., Hagan, M.T., 2013a. Error surface of recurrent networks. IEEE Trans. Neural Netw. Learn. Syst. 24 (11), 1709–1721.
Phan, M., Hagan, M.T., 2013b. A procedure for training recurrent networks. In: Proc. Int. Joint Conf. Neural Netw., pp. 1–8.
Pimentel, M., Clifton, D.A., Clifton, L., Tarassenko, L., 2014. A review of novelty detection. Signal Process. 99, 215–249.
Raff, L.M., Malshe, M., Hagan, M., Doughan, D., Rockley, M.G., Komanduri, R., 2005. Ab initio potential-energy surfaces for complex, multichannel systems using modified novelty sampling and feedforward neural networks. J. Chem. Phys. 122 (8), 084104.
Roman, J., Jameel, A., 1996. Backpropagation and recurrent neural networks in financial analysis of multiple stock market returns. In: Proceedings of the Twenty-Ninth Hawaii International Conference on System Sciences, Vol. 2. IEEE, pp. 454–460.
Su, H.T., McAvoy, T.J., Werbos, P., 1992. Long-term predictions of chemical processes using recurrent neural networks: A parallel training approach. Ind. Eng. Chem. Res. 31 (5), 1338–1352.