Time Series Prediction Model of Grey Wolf Optimized Echo State Network

As a novel recurrent neural network, the Echo State Network (ESN) is characterized by strong nonlinear prediction capability and a simple, effective training algorithm. However, conventional ESN prediction requires a large volume of training samples; meanwhile, time series data are often complicated and unstable, so the network cannot learn sufficiently and is difficult to train. As a result, the accuracy of conventional ESN prediction is limited. To address this issue, a time series prediction model based on a Grey Wolf optimized ESN is proposed. The output weight matrix W_out of the ESN is optimized using the Grey Wolf algorithm, so that time series prediction is achieved with simplified training. The results indicate that the optimized time series prediction method achieves superior prediction accuracy at small sample sizes compared with conventional prediction methods.


Introduction
A time series is a group of random variables arranged in time order; time series are widely encountered in daily life and industry, including commerce, meteorology, finance, and agriculture. To uncover the underlying laws of such data and support optimized decision-making, considerable attention has been devoted to time series prediction (Rojas and Pomares 2016).
Affected by various factors, time series are usually characterized by significant randomness and nonlinearity. For accurate prediction of time series data, various prediction models have been proposed. For instance, ARIMA is a conventional linear time series prediction model with considerable prediction capability. However, as data volumes grow, nonlinear time series data become increasingly common, which limits the applicability of linear prediction models. Benefiting from rapid developments in machine learning and artificial intelligence, novel time series prediction models such as neural networks (Chandra 2015; Wen et al. 2012) and support vector machines (SVM) (Yaseen et al. 2016; Misaghi and Sheijani 2017; Nieto et al. 2017; Jaramillo et al. 2017) have been proposed and widely applied to nonlinear time series prediction. Ren et al. achieved rapid collection of receptor temperatures using a back-propagation neural network and rapid prediction of collector tube temperatures from the collected data (Ren et al. 2016). However, traditional neural networks are prone to fall into local optima and to suffer the curse of dimensionality owing to the uncertainty of their structure. Huang et al. reported effective prediction of breast cancer using the SVM algorithm (Huang et al. 2017); although SVM overcomes some problems of traditional neural network prediction, the class of sequences it can process is limited. Qin et al. proposed an RNN-based dual-stage attention mechanism model (denoted DA-RNN) (Qin et al. 2017), which captures long-term temporal dependencies and predicts by selecting relevant driving series. However, such a network must compute error gradients during training.
Because gradient-based methods make the network difficult to train, this inherent disadvantage limits the wide application of recurrent neural networks in practice (Egrioglu et al. 2015). In addition, since the weights must be continuously updated during training and the update process is computationally intensive, the training time of an RNN is long.
The Echo State Network (ESN) (Sacchi et al. 2007) is a novel recurrent network proposed by Jaeger in 2001 and has been applied to time series prediction. The ESN improves on the traditional recurrent neural network with a unique structure: it introduces the concept of a "reservoir", which adapts well to nonlinear systems. Training reduces to a linear regression, the network has a short-term memory function, and the model is simple and fast with high prediction performance. It thus overcomes the large computational complexity, low training efficiency, and local optima of traditional recurrent neural networks, and it suits the processing of time series data in practical problems. However, owing to its unique structure, the ESN requires a large volume of training samples and is difficult to train; moreover, the applicability of the reservoir and the prediction accuracy in complicated situations need to be improved. Accordingly, various optimized ESN models have been proposed. Qin et al. proposed a novel E-KFM model combining the KFM algorithm with the ESN and applied it to multistep prediction of time series data (Xiao et al. 2017); the E-KFM model exhibits good effectiveness and robustness but does not achieve strong optimization results. Qiao et al. obtained accurate ESN predictions by optimizing W_out with a particle swarm algorithm (Qiao et al. 2016), proposing the PSO_ESN prediction model; it improves prediction performance to a certain extent, but particle evolution and the number of iterations lengthen the training time. Zhong et al. optimized a double-layer ESN with genetic algorithms and applied the optimized model to multi-area time series prediction (Zhong et al. 2017); this model improves the accuracy of time series prediction and shortens prediction time to some extent.
However, the genetic algorithm involves complex coding and many parameters whose choices rely on experience, and it cannot handle large-scale computation.
To address these issues, a time series prediction model based on a Grey Wolf optimized ESN is proposed by introducing the Grey Wolf Optimizer, a swarm intelligence optimization algorithm. The rest of this paper is organized as follows. First, the significance of time series prediction and the state of the art in this field are introduced. Then, the Grey Wolf optimized ESN time series prediction method is proposed and described in detail. Finally, the proposed model is verified on different data sets.

Time series prediction method of Grey Wolf optimized ESN
To solve the problems of ESN prediction (e.g., difficult training), a time series prediction model based on a Grey Wolf optimized ESN is proposed. The method eliminates the training difficulty by optimizing W_out with the Grey Wolf algorithm, thereby improving the accuracy of ESN prediction. Experiments demonstrate significant improvements in prediction accuracy on a range of time series data sets.

Echo State Network
The ESN is a novel recurrent neural network consisting of an input layer, a hidden layer, and an output layer (Lun et al. 2015; Han and Mu 2011). As shown in Figure 1, the layers are connected via different weight matrices. Here, W_in is the input weight matrix, connecting the input layer to the reservoir; W is the internal weight matrix, connecting the neurons within the reservoir; W_back is the feedback weight matrix, connecting the output layer back to the reservoir; and W_out is the output weight matrix, connecting the reservoir to the output layer. W_out is the only parameter that requires training.
Unlike in other neural networks, the hidden layer is replaced by a reservoir: a pool of sparsely, dynamically interconnected neurons whose connection weights act as a memory. The reservoir is the core of the ESN, and its parameters are of great significance, including the reservoir size N, the spectral radius SR of the internal weight matrix, the input scaling IS, and the sparsity SD. The reservoir size N is the number of neurons; it affects the predictive power of the ESN and is generally adjusted according to the size of the data set. The spectral radius SR is a key parameter affecting the reservoir's memory capacity; in general, the ESN has a stable echo state property when 0 < SR < 1. Because the neurons in the reservoir are of different types and the data have different characteristics, the input signal must be scaled by the input scaling IS before being fed from the input layer into the reservoir; the stronger the nonlinearity of the data, the larger the input scaling should be. The sparsity SD is the proportion of connected neurons to the total number of neurons in the reservoir; in general, with SD around 10%, the reservoir maintains adequate dynamic characteristics.
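These reservoir settings can be illustrated with a minimal numpy sketch; the function name, weight range, and seed below are illustrative choices, not taken from the paper. It builds a sparse internal weight matrix W and rescales it to a target spectral radius SR:

```python
import numpy as np

def init_reservoir(N=100, SD=0.1, SR=0.9, seed=0):
    """Build a sparse reservoir weight matrix W (illustrative sketch).

    N  - reservoir size (number of internal neurons)
    SD - sparsity: fraction of nonzero internal connections
    SR - target spectral radius (0 < SR < 1 for a stable echo state)
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.5, 0.5, size=(N, N))
    # Zero out all but a fraction SD of the connections.
    W *= rng.random((N, N)) < SD
    # Rescale so the largest eigenvalue magnitude equals SR.
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    if rho > 0:
        W *= SR / rho
    return W
```

With SD = 0.1, roughly 10% of the entries are nonzero, matching the sparsity guideline above.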
The basic equations of the ESN are

x(n + 1) = f(W_in u(n + 1) + W x(n) + W_back y(n))    (1)
y(n + 1) = f_out(W_out (u(n + 1), x(n + 1)))    (2)

where u(n) = (u_1(n), u_2(n), …, u_K(n)), x(n) = (x_1(n), x_2(n), …, x_N(n)), and y(n) = (y_1(n), y_2(n), …, y_L(n)) are the input, state, and output vectors of the ESN, respectively; f and f_out are the activation functions of the reservoir's internal processing units and of the output units, respectively, and are generally tanh functions.
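The ESN state update and the linear-regression training of W_out can be sketched as follows. This is a simplified illustration: the feedback term through W_back is omitted, the input is scalar, and all names and sizes are illustrative assumptions.

```python
import numpy as np

def esn_states(u_seq, W_in, W):
    """Run the reservoir over a scalar input sequence:
    x(n+1) = tanh(W_in * u(n+1) + W @ x(n)); feedback term omitted."""
    x = np.zeros(W.shape[0])
    states = []
    for u in u_seq:
        x = np.tanh(W_in[:, 0] * u + W @ x)
        states.append(x.copy())
    return np.array(states)          # shape (T, N): one state per step

def train_readout(states, targets):
    """Fit W_out by least squares -- the usual linear ESN training step."""
    W_out, *_ = np.linalg.lstsq(states, targets, rcond=None)
    return W_out
```

Prediction then reads y(n) = states @ W_out; in the full network the readout also sees the input u(n), which is dropped here for brevity.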

Grey Wolf Optimizer
The Grey Wolf Optimizer (GWO) is a novel swarm intelligence algorithm proposed by Mirjalili in 2014 (Saremi et al. 2015; Rezaei et al. 2018). The algorithm mimics the social hierarchy and hunting behavior of a grey wolf pack. Grey wolves are social animals with a strict hierarchy of α, β, δ, and ω wolves; α is the leader, who assigns tasks (encircling, pursuing, attacking) to individuals at different levels so as to achieve global optimization. Owing to its simple structure, the negligible parameter tuning required, and its high effectiveness, the GWO algorithm has been widely applied to function optimization. For a population of N grey wolves, X = (X_1, X_2, …, X_N), the location of the ith wolf is X_i = (X_i^1, X_i^2, …, X_i^d), where X_i^d is the location of the ith wolf in the dth dimension. The encircling behavior during hunting is defined as

D = |C · X_p(t) − X(t)|    (4)
X(t + 1) = X_p(t) − A · D    (5)

where A and C are coefficient vectors, t is the iteration number, X(t) is the location vector of a grey wolf, X_p(t) is the location vector of the prey (the target), and D is the distance between the grey wolf and the prey. The coefficient vectors are defined as

A = 2a · r_1 − a    (6)
C = 2 · r_2    (7)

where r_1 and r_2 are random vectors with values in [0, 1] and a is the iteration factor, which decreases linearly from 2 to 0 over the course of the iterations.
Grey wolves have a strong prey-search capability. α leads all activities, while β and δ participate occasionally. In the GWO algorithm, α is defined as the current optimal solution, while β and δ also provide effective target information to α; therefore, α, β, and δ are the three best solutions found so far, and the remaining wolves update their locations around them as follows:

D_α = |C_1 · X_α − X(t)|    (9)
D_β = |C_2 · X_β − X(t)|    (10)
D_δ = |C_3 · X_δ − X(t)|    (11)
X_1 = X_α − A_1 · D_α    (12)
X_2 = X_β − A_2 · D_β    (13)
X_3 = X_δ − A_3 · D_δ    (14)
X(t + 1) = (X_1 + X_2 + X_3) / 3    (15)

where X_α, X_β, and X_δ are the current locations of α, β, and δ, respectively; X(t) is the location of the current grey wolf; D_α, D_β, and D_δ are the distances from the current wolf to α, β, and δ, respectively; X(t + 1) is the updated location of the searching individual; and C_1–C_3 and A_1–A_3 are random coefficient vectors defined by Eq (6) and (7).
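The update rules above can be collected into a minimal GWO implementation. The numpy sketch below (function name, bounds, and population size are illustrative assumptions) tracks the best solution found so far and moves each wolf toward the average of the three guide positions:

```python
import numpy as np

def gwo_minimize(f, dim, n_wolves=20, iters=300, lb=-10.0, ub=10.0, seed=0):
    """Minimal Grey Wolf Optimizer sketch for minimizing f over R^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))
    fit = np.array([f(x) for x in X])
    best_i = np.argmin(fit)
    best_x, best_f = X[best_i].copy(), fit[best_i]
    for t in range(iters):
        order = np.argsort(fit)
        leaders = X[order[:3]].copy()        # alpha, beta, delta
        a = 2.0 * (1 - t / iters)            # iteration factor: 2 -> 0
        for i in range(n_wolves):
            new = np.zeros(dim)
            for g in leaders:
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2 * a * r1 - a           # coefficient vector A
                C = 2 * r2                   # coefficient vector C
                D = np.abs(C * g - X[i])     # distance to the guide
                new += (g - A * D) / 3.0     # average of X_1, X_2, X_3
            X[i] = np.clip(new, lb, ub)
            fit[i] = f(X[i])
            if fit[i] < best_f:
                best_x, best_f = X[i].copy(), fit[i]
    return best_x, best_f
```

On a simple convex test function such as the sphere function, this loop contracts the pack onto the minimum as a decays to zero.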

The optimized Echo State Network algorithm
As a key parameter of the ESN, W_out is normally obtained by linear regression over the training set. Owing to its unique structure, the ESN requires a large volume of training samples, which makes training highly challenging. Therefore, W_out is optimized here using the Grey Wolf algorithm, yielding a Grey Wolf optimized echo state network algorithm (denoted the GWO_ESN algorithm).
The procedure of the GWO_ESN algorithm is as follows:
a) Establish the ESN shown in Figure 1 and initialize the network parameters.
b) Initialize the parameters, the location functions, and the target location functions of α, β, and δ, as shown in Figure 2. Here the initial locations of α, β, and δ are fixed and the corresponding parameters are C and a, respectively; ω denotes the wolves at the bottom of the hierarchy, and the prey is located in the middle.
c) Calculate the fitness value using Eq (16) and compare it with the value of the target function from Step b. Here y_i is the value predicted from W_out via Eq (2) and y is the actual value.
d)–e) Compute the coefficient vectors corresponding to α, β, and δ using Eq (6) and (7).
f) Traverse the time series and update the locations of α, β, and δ using Eq (4) and (5); the specific update equations are Eq (9)–(14). Figure 3 shows the updated locations of the grey wolves.
g) If the maximum iteration number has not been reached, go back to Step b and repeat; if it has been reached, take the final locations of α, β, and δ and compute the ultimate optimization result W_out using Eq (15).

The pseudo code of GWO_ESN:
Algorithm GWO_ESN: optimize W_out
function GWO_ESN(X_i, a, A, C, N, W_in, W, W_back)
    position = initialization(m, dim)
    W_out = position
    t = 0
    repeat
        fitness = ESN(U, Y, M, W_out)
        if fitness < fitness(X_α) then
            X_α = position                      // new alpha (best solution)
        else if fitness < fitness(X_β) then
            X_β = position                      // new beta
        else if fitness < fitness(X_δ) then
            X_δ = position                      // new delta
        end if
        for each wolf: update X_1, X_2, X_3 by Eq (12), (13), (14)
        update a, A, and C
        update X_α, X_β, X_δ
        t = t + 1
    until t > Max_iteration
    W_out = (X_1 + X_2 + X_3) / 3               // Eq (15)
    return W_out
end function
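To make the interplay between the fitness evaluation of Eq (16) and the wolf updates concrete, the following self-contained toy sketch optimizes a candidate W_out for one-step-ahead prediction of a small sine series. The reservoir sizes, weight ranges, and input series are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- tiny ESN forward pass (fixed random W_in, W; toy sizes) ---
N, T = 20, 150
W_in = rng.uniform(-0.5, 0.5, (N, 1))
W = rng.uniform(-0.5, 0.5, (N, N)) * 0.1       # spectral radius << 1
u = np.sin(np.arange(T) * 0.2)                 # toy input series
x, states = np.zeros(N), []
for n in range(T - 1):
    x = np.tanh(W_in[:, 0] * u[n] + W @ x)
    states.append(x.copy())
states, targets = np.array(states), u[1:]      # one-step-ahead targets

def fitness(w_out):
    """Eq (16): training MSE of the candidate output weights."""
    return np.mean((states @ w_out - targets) ** 2)

# --- GWO search over W_out (the repeat/until loop of the pseudocode) ---
P, iters = 15, 100
X = rng.uniform(-1, 1, (P, N))                 # each wolf is a candidate W_out
fit = np.array([fitness(w) for w in X])
best_w, best_f = X[np.argmin(fit)].copy(), fit.min()
for t in range(iters):
    order = np.argsort(fit)
    leaders = X[order[:3]].copy()              # alpha, beta, delta
    a = 2.0 * (1 - t / iters)
    for i in range(P):
        new = np.zeros(N)
        for g in leaders:
            r1, r2 = rng.random(N), rng.random(N)
            A, C = 2 * a * r1 - a, 2 * r2
            new += (g - A * np.abs(C * g - X[i])) / 3.0   # Eq (12)-(15)
        X[i] = new
        fit[i] = fitness(X[i])
        if fit[i] < best_f:
            best_w, best_f = X[i].copy(), fit[i]

W_out = best_w                                 # optimized output weights
```

In a single-output setting the candidate W_out is just a vector of length N, so each wolf position encodes one complete readout.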

Time series prediction model of GWO_ESN
In this article, a time series prediction model based on a Grey Wolf optimized ESN (denoted the GWO_ESN model), combining the ESN with the Grey Wolf algorithm, is proposed. W_out of the ESN is optimized using the Grey Wolf algorithm, and the resulting GWO_ESN algorithm is applied to time series prediction. The model eliminates the need for an over-large volume of training samples in the ESN and improves prediction accuracy.
Step 1: Pre-process the original sequence to obtain de-noised, dimensionality-reduced, normalized data.
Step 2: Initialize the parameters of the ESN and the Grey Wolf algorithm.
Step 3: Optimize W_out of the ESN using the GWO algorithm.
Step 4: Predict with the ESN based on the optimized W_out.
To verify the performance of the time series prediction model, seven data sets were selected for this experiment. The first five are nonlinear data sets: the public EEG data set; railway passenger traffic volume under different influencing factors from the official website of the China Statistical Yearbook (1999–2008); Chinese grain production data for 1985–2011 (sets 1 and 2); and the Shanghai stock index from the historical stock index data of NetEase Finance (1990/12/20–1991/1/24). The last two are chaotic time series, namely the Lorenz chaotic sequence and the Mackey-Glass chaotic sequence. Details of the nonlinear data sets are given in Table 1. The chaotic time series are defined as follows: (1) The Mackey-Glass chaotic time series is defined by the time-delay differential equation

dx/dt = 0.2 x(t − τ) / (1 + x(t − τ)^10) − 0.1 x(t)

with x(0) = 1.2 and τ = 17; the chaotic time series is generated iteratively using the fourth-order Runge-Kutta method.
(2) The Lorenz chaotic time series is described by the three-dimensional ordinary differential equations

dx/dt = a(y − x)
dy/dt = x(c − z) − y
dz/dt = xy − bz

With the parameters a = 10, b = 8/3, c = 28 and the initial values x(0) = y(0) = z(0) = 1, the system becomes chaotic, and the chaotic time series is generated iteratively by the fourth-order Runge-Kutta method. The delay times and embedding dimensions of the sequences are set as τ_1 = 19, τ_2 = 13, τ_3 = 12 and m_1 = 3, m_2 = 5, m_3 = 7.
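Both chaotic series can be generated with a fourth-order Runge-Kutta integrator, as described above. In the sketch below the stated parameters are used; for Mackey-Glass, the standard coefficients 0.2 and 0.1 and the exponent 10 are assumed (the text does not list them), and the delayed term is held fixed over each integration step, a common simplification for the delay equation.

```python
import numpy as np

def mackey_glass(n, tau=17.0, h=0.1, x0=1.2):
    """Mackey-Glass series via RK4; delayed term frozen over each step."""
    d = int(round(tau / h))                    # delay expressed in steps
    hist = [x0] * (d + 1)                      # constant initial history
    for _ in range(n):
        x, xd = hist[-1], hist[-d - 1]
        f = lambda x: 0.2 * xd / (1.0 + xd ** 10) - 0.1 * x
        k1 = f(x)
        k2 = f(x + h / 2 * k1)
        k3 = f(x + h / 2 * k2)
        k4 = f(x + h * k3)
        hist.append(x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(hist[d + 1:])

def lorenz_series(n, h=0.01, a=10.0, b=8.0 / 3.0, c=28.0):
    """Lorenz series via RK4 with x(0) = y(0) = z(0) = 1."""
    def f(s):
        x, y, z = s
        return np.array([a * (y - x), x * (c - z) - y, x * y - b * z])
    s = np.array([1.0, 1.0, 1.0])
    out = np.empty((n, 3))
    for i in range(n):
        k1 = f(s)
        k2 = f(s + h / 2 * k1)
        k3 = f(s + h / 2 * k2)
        k4 = f(s + h * k3)
        s = s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[i] = s
    return out
```

The step sizes h are illustrative; smaller steps trade running time for integration accuracy.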

Evaluation standards
To compare the accuracy of different prediction models and evaluate the performance of the proposed Grey Wolf optimized ESN time series prediction method, two evaluation criteria are used: the fit between the predicted and actual sequences, and the mean square error (MSE) between predicted and actual values. The MSE used in this study is defined as

MSE = (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)^2    (16)

where ŷ_i is the predicted value, y_i is the measured value, and n is the data length.
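The MSE defined above is straightforward to compute; a minimal helper (the function name is illustrative):

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean square error: (1/n) * sum((y_hat_i - y_i)^2)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))
```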

Results and analysis
Experiment 1: Fitting of prediction curves by different time series prediction models to the actual curves
The BP neural network model (Zhai and Cao 2016), the Elman neural network model (Liang et al. 2017), the ESN model (Li et al. 2012), an ESN prediction model based on recursive least squares (denoted RLS_ESN) (Chouikhi et al. 2017), a PSO-optimized ESN model (denoted PSO_ESN) (Zhang et al. 2015), and the proposed GWO_ESN model were included in this prediction experiment. The prediction results of these models on the five time series data sets were compared with the actual results and with each other by means of fitting graphs. Figure 4 shows the fit between the actual results and the predictions of the models on the five data sets. The quality of fit follows the order GWO_ESN model > PSO_ESN model > ESN model > Elman model > BP model. The predictions of the proposed GWO_ESN model fit the actual results closely and the error amplitudes are relatively small, indicating high prediction accuracy. Across data sets, the performance of the BP and Elman networks varies significantly with the data set because of their poor structural stability; meanwhile, the Elman network is better suited to time series than the BP network and thus shows better prediction performance.
Compared with the Elman and BP models, the ESN model exhibits excellent prediction accuracy on the stock and EEG data sets. As shown in Figure 4(d) and (e), these two data sets provide large volumes of training samples, so the ESN predictions are based on sufficient training. For the other three data sets, whose training sample sizes are relatively small, sufficient training cannot be achieved and the prediction accuracy of the ESN is limited. By contrast, the GWO_ESN model performs well in all five cases, indicating strong generalization capability; in other words, the GWO_ESN model is applicable to the prediction of a wide range of time series data. Two conclusions can therefore be drawn. First, the fitting effectiveness of the ESN model varies significantly with the data set. Second, the GWO_ESN model fits all data sets well, whereas the PSO_ESN model is inferior on the historical data sets, although its effectiveness still meets the requirements. Additionally, the predictions of the PSO_ESN model deviate significantly from the actual results for certain data sets, which can be attributed to the data characteristics and parameter settings.
Experiment 1 demonstrates that the fitting performance of the proposed GWO_ESN time series prediction model is significantly improved over the ESN model and that its predictions align closely with the actual results; the prediction performance of the proposed model can therefore be considered optimized. Moreover, compared with the other models, the proposed GWO_ESN model features low time complexity, few parameters, and a highly effective algorithm.

Experiment 2: MSEs of different prediction models for different data sets
Table 2 summarizes the MSEs of the prediction models on the different data sets; a low MSE indicates good model performance. The MSE of the GWO_ESN model is significantly lower than that of the other models, indicating excellent prediction performance. In addition, the prediction accuracies of the PSO_ESN and ESN models are significantly higher than those of the BP neural network and the Elman network. The performance of the BP and Elman network models is unstable: in certain cases they may surpass the ESN and PSO_ESN models, but never the GWO_ESN prediction model.

In summary, the proposed GWO_ESN model delivers excellent prediction performance even with small training sample sizes and is superior to the other models in prediction accuracy. Owing to its superior structural stability, the ESN structure outperforms the BP and Elman network models in prediction on nonlinear data. Furthermore, the introduction of the GWO algorithm enhances the overall performance of the proposed model in all cases relative to the BP and Elman models, and sufficient learning of fluctuating data avoids performance degradation caused by any individual parameter. Table 3 compares the running times of the six prediction models on the different data sets. The GWO_ESN model has a relatively short running time on all seven data sets; although on some data sets it is not faster than the BP, Elman, and ESN prediction models, Table 2 shows that, while ensuring higher prediction accuracy, it runs in relatively little time compared with the other optimization models.

Conclusions
In this paper, we proposed the GWO_ESN time series prediction model, in which W_out of the ESN is optimized using the Grey Wolf algorithm to solve the training difficulties of the ESN caused by its need for a large volume of training samples. The model also allows sufficient learning of fluctuating and nonlinear time series data. Compared with the PSO_ESN model, the RLS_ESN model, the ESN model, the BP neural network model, and the Elman network model, the proposed model shows advantages in prediction accuracy and reliability. However, the reservoir parameters of the ESN in these experiments were selected mainly through empirical rules and repeated trials, and these parameters have a certain influence on the results; finding more suitable parameters to achieve better experimental results is worthy of further study and discussion. In addition, the performance of the proposed model on data with other distributions remains to be verified.

Funding Information
This work was financially supported by the National Youth Science Foundation of China (No. 61503272) and the Scientific and Technological Project of Shanxi (No. 201603D22103-2).