A time series is a sequence of random variables arranged in temporal order. Time series are widely encountered in daily life and industry, including commerce, meteorology, finance, and agriculture. To understand the underlying patterns and to support optimized decision-making, considerable attention has been devoted to time series prediction (Rojas and Pomares 2016) (Jiang et al. 2017).
Because they are affected by many factors, time series usually exhibit significant randomness and nonlinearity (Chen et al. 2017). Various models have been proposed for accurate prediction of time series data. For instance, ARIMA is a conventional linear time series prediction model with considerable predictive capability (Liu et al. 2016). However, as data volumes keep growing, an increasing share of time series data is nonlinear, which limits the applicability of linear prediction models. With the rapid development of machine learning and artificial intelligence, newer prediction models such as neural networks (Chandra 2015) (Wen et al. 2012) and support vector machines (SVM) (Yaseen et al. 2016) (Misaghi and Sheijani 2017) (Nieto et al. 2017) (Jaramillo et al. 2017) have been proposed and widely applied to nonlinear time series prediction. Ren et al. used a back-propagation (BP) neural network to rapidly collect receptor temperatures and to predict the collector tube temperature from the collected data (Ren et al. 2016). However, traditional neural networks are prone to falling into local optima and suffering from the curse of dimensionality because of the uncertainty in their structure. Huang et al. reported effective breast cancer prediction using SVM (Huang et al. 2017). Although SVM overcomes these problems of traditional neural networks, the range of sequences it can process effectively is limited. Qin et al. proposed a dual-stage attention-based recurrent neural network (DA-RNN) (Qin et al. 2017). This model captures long-term temporal dependencies and makes predictions by selecting the relevant driving series. However, the network must compute error gradients during training.
Because gradient-based training of such networks is difficult, this inherent drawback limits the practical application of recurrent neural networks (Egrioglu et al. 2015). In addition, the training algorithm updates the weights continuously and each update is computationally intensive, which increases the training time of RNNs.
The Echo State Network (ESN) (Sacchi et al. 2007) is a recurrent network proposed by Jaeger in 2001 and has been applied to time series prediction. ESN improves on the traditional recurrent neural network with a distinctive structure: it introduces the concept of a “reservoir”, which makes it better suited to nonlinear systems. Training uses only linear regression, and the network has short-term memory. The model is simple and fast, offers high prediction performance, avoids the high computational cost, low training efficiency, and local optima of traditional recurrent neural networks, and is well suited to processing time series data in practical problems. However, because of its structure, ESN requires a large volume of training samples, which makes training difficult. Moreover, the applicability of the reservoir and the prediction accuracy in complicated situations still need to be improved. Various optimized ESN models have therefore been proposed. An E-KFM model combining the KFM algorithm with ESN was proposed and applied to multi-step prediction of time series data (Xiao et al. 2017); it shows good effectiveness and robustness, but its optimization results are not satisfactory. Qiao et al. optimized the Wout of ESN with the particle swarm optimization algorithm and proposed the PSO_ESN prediction model (Qiao et al. 2016). This model improves prediction performance to a certain extent, but its training time is longer because of the particle evolution and the number of iterations. Zhong et al. optimized a double-reservoir ESN with a genetic algorithm and applied it to multi-regime time series prediction (Zhong et al. 2017). This model improves the accuracy of time series prediction and shortens the prediction time to some extent.
However, genetic algorithms involve complex encoding and many parameters whose settings depend on experience, which makes them ill-suited to large-scale computation.
To address this issue, a time series prediction model based on a Grey Wolf optimized ESN is proposed by introducing the Grey Wolf algorithm, a swarm intelligence optimization algorithm. First, the significance of time series prediction and the state of the art in this field are introduced. Then, the GWO-based ESN time series prediction method is proposed and described in detail. Finally, the proposed model is verified on different data sets.
To address issues such as difficult training in ESN prediction, a Grey Wolf optimized ESN time series prediction model is proposed. The method addresses the training difficulty by optimizing Wout with the Grey Wolf algorithm and improves the prediction accuracy of ESN. Experiments demonstrate that the proposed method significantly improves prediction accuracy on different time series data sets.
ESN is a recurrent neural network consisting of an input layer, a hidden layer, and an output layer (Lun et al. 2015) (Han and Mu 2011). As shown in Figure 1, the layers are connected through different weight matrices: Win is the input weight matrix, connecting the input layer to the reservoir; W is the internal weight matrix, connecting the neurons inside the reservoir to each other; Wback is the feedback weight matrix, connecting the output layer back to the reservoir; and Wout is the output weight matrix, connecting the reservoir to the output layer. Wout is the only weight matrix that requires training; Win, W, and Wback are generated randomly and kept fixed.
Unlike in other neural networks, the hidden layer of this network is replaced by a reservoir. The reservoir consists of many sparsely and dynamically interconnected neurons, and it gains its memory capability from the weights stored between neurons. The reservoir is the core of the ESN, and its parameters strongly affect the network: the reservoir size N, the spectral radius SR of the internal connection weights, the input unit scale IS, and the sparsity degree SD. The reservoir size N is the number of neurons in the reservoir and affects the predictive power of the ESN; in general, N is adjusted according to the size of the data set. The spectral radius SR of the internal connection weight matrix is a key reservoir parameter that affects its memory capacity; in general, an ESN has a stable echo state property when 0 < SR < 1. Because the reservoir neurons are of different types and the data have different characteristics, the input signal must be scaled by the input unit scale IS before it is passed from the input layer to the reservoir. IS depends on the nonlinearity of the data to be processed: the stronger the nonlinearity, the larger the input unit scale. The sparsity degree SD is the proportion of connected neurons in the reservoir relative to the total number of neurons; in general, an SD of about 10% lets the reservoir retain adequate dynamic characteristics.
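The following Python sketch illustrates how the four reservoir parameters (N, SR, IS, SD) could be used to build Win and W. It is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, SD is interpreted as the density of nonzero connections, weights are drawn uniformly from [-1, 1], and W is made symmetric (a simplification) so that its spectral radius can be estimated by plain power iteration without external libraries.

```python
import math
import random


def matvec(m, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def spectral_radius_symmetric(w, iters=300):
    """Estimate rho(W) for a symmetric W by power iteration on W @ W.

    W @ W is positive semidefinite with eigenvalues lambda_i**2, so the
    norm of W @ W @ v (for a unit vector v) approaches rho(W)**2.
    """
    n = len(w)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        u = matvec(w, matvec(w, v))
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return 0.0
        lam, v = norm, [x / norm for x in u]
    return math.sqrt(lam)


def init_reservoir(n_res, sr, input_scale, sparsity, n_in=1, seed=0):
    """Hypothetical helper: build Win (scaled by IS) and a sparse W rescaled to SR."""
    rng = random.Random(seed)
    # Input weights in [-1, 1], multiplied by the input unit scale IS.
    w_in = [[input_scale * rng.uniform(-1, 1) for _ in range(n_in)]
            for _ in range(n_res)]
    # Sparse, symmetric internal weights: each pair is connected with probability SD.
    w = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(i, n_res):
            if rng.random() < sparsity:
                w[i][j] = w[j][i] = rng.uniform(-1, 1)
    rho = spectral_radius_symmetric(w)
    if rho == 0.0:
        raise ValueError("reservoir has no connections; increase sparsity or n_res")
    # Rescale so that the (estimated) spectral radius equals SR.
    w = [[v * sr / rho for v in row] for row in w]
    return w_in, w


w_in, w = init_reservoir(n_res=50, sr=0.9, input_scale=0.5, sparsity=0.1)
print(round(spectral_radius_symmetric(w), 6))  # 0.9
```

In practice a non-symmetric W and an eigenvalue routine from a numerical library are more common; the symmetric variant is used here only to keep the example dependency-free.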
The basic equations of ESN are:
where u(n) = [u1(n), u2(n), …, uK(n)], x(n) = [x1(n), x2(n), …, xN(n)], and y(n) = [y1(n), y2(n), …, yL(n)] are the input, state, and output vectors of the ESN, respectively; f and fout are the activation functions of the internal reservoir neurons and of the output units, respectively, and are generally tanh functions.
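As a hedged illustration only, the sketch below assumes the standard ESN update x(n+1) = f(Win u(n+1) + W x(n) + Wback y(n)) with output y(n+1) = fout(Wout x(n+1)), using tanh for both f and fout as stated above. Whether the authors' Wout also acts on u(n) or y(n) is not recoverable from the text. The function name `esn_step`, the sizes, and the random placeholder weights are hypothetical.

```python
import math
import random


def esn_step(x, u, y_prev, w_in, w, w_back, w_out):
    """One ESN update under the assumed standard form:
    x(n+1) = tanh(Win u(n+1) + W x(n) + Wback y(n)),  y(n+1) = tanh(Wout x(n+1)).
    """
    n_res = len(x)
    x_new = []
    for i in range(n_res):
        s = sum(w_in[i][k] * u[k] for k in range(len(u)))                # input drive
        s += sum(w[i][j] * x[j] for j in range(n_res))                    # recurrent drive
        s += sum(w_back[i][m] * y_prev[m] for m in range(len(y_prev)))    # output feedback
        x_new.append(math.tanh(s))                                        # f = tanh
    # fout = tanh applied to the readout Wout x(n+1)
    y_new = [math.tanh(sum(row[j] * x_new[j] for j in range(n_res))) for row in w_out]
    return x_new, y_new


# Hypothetical sizes: N = 20 reservoir neurons, K = 1 input, L = 1 output.
rng = random.Random(1)
N, K, L = 20, 1, 1
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(N)]
w = [[rng.uniform(-0.1, 0.1) for _ in range(N)] for _ in range(N)]
w_back = [[rng.uniform(-0.1, 0.1) for _ in range(L)] for _ in range(N)]
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(N)] for _ in range(L)]

x, y = [0.0] * N, [0.0] * L
states = []
for n in range(100):
    x, y = esn_step(x, [math.sin(0.2 * n)], y, w_in, w, w_back, w_out)
    states.append(x)
print(len(states), len(states[0]), y)
```

Only w_out would change during training; the collected `states` are what a readout-fitting procedure works with.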
The Grey Wolf Optimizer (GWO) is a swarm intelligence algorithm proposed by Mirjalili in 2014 (Saremi et al. 2015) (Rezaei et al. 2018). It mimics the social hierarchy and hunting behavior of grey wolf packs. Grey wolves are social animals with a strict hierarchy of four levels: α, β, δ, and ω. The leader α assigns different tasks (encircling, chasing, attacking) to individuals at different levels so that the pack reaches the global optimum. Because of its simple structure, few adjustable parameters, and high effectiveness, GWO has been widely applied to function optimization.
For a population of N grey wolves, the location of the ith wolf is denoted Xi = (Xi1, Xi2, …, Xid), its position in the d-dimensional search space. The hunting behavior is defined as follows:
where A and C are coefficient vectors, t is the iteration number, X(t) is the location vector of a grey wolf, Xi(t) is the target location vector, and D is the distance between the grey wolf and the prey.
The coefficient vectors are defined as follows:
where r1 and r2 are random vectors with components in [0, 1], and a is the iteration factor.
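For reference, the standard GWO encircling equations are sketched below in the symbols defined above. Because the original equations were lost in extraction, treating these as the authors' exact equations is an assumption. Here ⊙ denotes element-wise multiplication, and T (the maximum number of iterations) is a symbol introduced only to express the usual linear decrease of a from 2 to 0.

```latex
\begin{aligned}
D &= \left| C \odot X_i(t) - X(t) \right| \\
X(t+1) &= X_i(t) - A \odot D \\
A &= 2a\, r_1 - a, \qquad C = 2\, r_2 \\
a &= 2\left(1 - \frac{t}{T}\right)
\end{aligned}
```

When |A| > 1 a wolf tends to move away from the target (exploration); when |A| < 1 it closes in (exploitation), which is why a shrinking a shifts the search from exploration to exploitation.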
Grey wolves are highly capable of locating prey. The leader α commands all hunting activities, and β and δ occasionally participate. In the GWO algorithm, α represents the best solution, while β and δ also provide useful information about the target location. Therefore, α, β, and δ are taken as the three best solutions found so far, and the locations are updated as follows:
where Xα, Xβ, and Xδ are the current locations of α, β, and δ, respectively; X(t) is the current location of a grey wolf; Dα, Dβ, and Dδ are the distances between that wolf and α, β, and δ, respectively; X(t + 1) is the updated location vector; and C and A are coefficient vectors with random components.
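The position update driven by α, β, and δ can be sketched as follows, assuming the standard GWO rules Dα = |C1 ⊙ Xα − X|, X1 = Xα − A1 ⊙ Dα (likewise for β and δ), and X(t + 1) = (X1 + X2 + X3)/3. This is a minimal illustration on a toy sphere function, not the authors' code; the function name `gwo`, the search bounds, and the population and iteration counts are hypothetical choices. This variant keeps the three best positions found so far as α, β, and δ.

```python
import random


def gwo(fitness, dim, n_wolves=20, max_iter=200, lb=-10.0, ub=10.0, seed=0):
    """Minimal Grey Wolf Optimizer (minimization). Returns (best_score, best_position)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # up to three (score, position) pairs: alpha, beta, delta
    for t in range(max_iter):
        # Rank the pack and keep the three best positions found so far.
        scored = sorted(((fitness(w), list(w)) for w in wolves), key=lambda p: p[0])
        leaders = sorted(leaders + scored[:3], key=lambda p: p[0])[:3]
        a = 2.0 - 2.0 * t / max_iter  # iteration factor, decreases from 2 to 0
        for w in wolves:
            for d in range(dim):
                guided = []
                for _, leader in leaders:                 # alpha, beta, delta
                    A = 2.0 * a * rng.random() - a        # A = 2a*r1 - a
                    C = 2.0 * rng.random()                # C = 2*r2
                    D = abs(C * leader[d] - w[d])         # distance to this leader
                    guided.append(leader[d] - A * D)      # component of X1, X2 or X3
                # X(t+1) = (X1 + X2 + X3) / 3, clipped to the search bounds.
                w[d] = min(max(sum(guided) / 3.0, lb), ub)
    return leaders[0]


def sphere(p):
    return sum(v * v for v in p)


best_score, best_pos = gwo(sphere, dim=5)
print(best_score)
```

The only tunable quantities are the population size, iteration budget, and bounds, which reflects the "few adjustable parameters" point made about GWO above.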
As a key parameter of ESN, Wout is determined by linear regression on the training data. Because of its distinctive structure, ESN requires a large volume of training samples, which makes training highly challenging. Therefore, Wout is optimized with the Grey Wolf algorithm, yielding the Grey Wolf optimized echo state network algorithm (denoted the GWO_ESN algorithm).
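For contrast with the GWO approach, the sketch below fits Wout the conventional way, by linear regression on collected reservoir states. Ridge regularization (the constant `lam`) is a common variant rather than something the text specifies, and the tiny reservoir, the sine-wave task, the washout length, and all helper names are hypothetical choices for illustration.

```python
import math
import random


def run_reservoir(inputs, n_res=10, seed=0):
    """Drive a small fixed random reservoir (no output feedback) and collect states."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1, 1) for _ in range(n_res)]
    w = [[rng.uniform(-0.2, 0.2) for _ in range(n_res)] for _ in range(n_res)]
    x = [0.0] * n_res
    states = []
    for u in inputs:
        x = [math.tanh(w_in[i] * u + sum(w[i][j] * x[j] for j in range(n_res)))
             for i in range(n_res)]
        states.append(x)
    return states


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_readout(states, targets, lam=1e-3):
    """Wout = (X^T X + lam I)^-1 X^T y, i.e. regularized linear regression."""
    n = len(states[0])
    gram = [[sum(s[i] * s[j] for s in states) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(n)]
    return solve(gram, rhs)


series = [math.sin(0.3 * n) for n in range(201)]
states = run_reservoir(series[:-1])[20:]   # discard a 20-step washout
targets = series[21:]                      # one-step-ahead targets
w_out = ridge_readout(states, targets)
pred = [sum(wo * xi for wo, xi in zip(w_out, x)) for x in states]
ridge_mse = sum((p - y) ** 2 for p, y in zip(pred, targets)) / len(targets)
zero_mse = sum(y * y for y in targets) / len(targets)
print(f"ridge MSE {ridge_mse:.2e} vs all-zero Wout {zero_mse:.3f}")
```

Because the ridge solution minimizes squared error plus a penalty, its training MSE can never exceed that of the all-zero readout.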
Procedures of the GWO_ESN algorithm are as follows:
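The pseudocode of GWO_ESN:
|Optimize Wout|
|function Wout = GWO_ESN(Xi, a, A, C, N, Win, W, Wback)|
|position = initialization(m, dim); % m wolves, each a candidate Wout|
|score_α = score_β = score_δ = +∞; t = 0|
|repeat|
|  for each wolf i: fitness = ESN(U, Y, M, position(i)); % training MSE|
|    if fitness < score_α: Xδ = Xβ, Xβ = Xα, Xα = position(i) (shift the scores likewise)|
|    elseif fitness < score_β: Xδ = Xβ, Xβ = position(i) (shift the scores likewise)|
|    elseif fitness < score_δ: Xδ = position(i), score_δ = fitness|
|  for each wolf i: compute X1, X2, X3 by Equations (12), (13), and (14)|
|    position(i) = (X1 + X2 + X3)/3|
|  update a, A, and C|
|  t = t + 1|
|until (t > Max_iteration)|
|Wout = (X1 + X2 + X3)/3|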
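A runnable sketch of the GWO_ESN idea follows: each wolf is a candidate Wout, its fitness is the training MSE of the resulting readout, and the standard GWO update moves the pack. It is an illustrative approximation under stated assumptions, not the authors' implementation: Wback is set to zero, the readout is linear, the reservoir size, bounds, and iteration counts are hypothetical, and the function returns Xα (the best position found, as in standard GWO) rather than the average of X1, X2, and X3 used in the final line of the pseudocode.

```python
import math
import random


def run_reservoir(inputs, n_res=10, seed=0):
    """Fixed random reservoir (Win, W never trained); Wback = 0 for simplicity."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1, 1) for _ in range(n_res)]
    w = [[rng.uniform(-0.2, 0.2) for _ in range(n_res)] for _ in range(n_res)]
    x = [0.0] * n_res
    states = []
    for u in inputs:
        x = [math.tanh(w_in[i] * u + sum(w[i][j] * x[j] for j in range(n_res)))
             for i in range(n_res)]
        states.append(x)
    return states


def esn_fitness(w_out, states, targets):
    """Fitness of a wolf: training MSE of the linear readout y = Wout . x."""
    errors = (sum(wo * xi for wo, xi in zip(w_out, x)) - y for x, y in zip(states, targets))
    return sum(e * e for e in errors) / len(targets)


def gwo_esn(states, targets, n_wolves=15, max_iter=100, lb=-5.0, ub=5.0, seed=1):
    """Search Wout with GWO; each wolf position is one candidate Wout."""
    rng = random.Random(seed)
    dim = len(states[0])
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # (fitness, position) of alpha, beta, delta
    for t in range(max_iter):
        scored = sorted(((esn_fitness(w, states, targets), list(w)) for w in wolves),
                        key=lambda p: p[0])
        leaders = sorted(leaders + scored[:3], key=lambda p: p[0])[:3]
        a = 2.0 - 2.0 * t / max_iter
        for w in wolves:
            for d in range(dim):
                total = 0.0
                for _, lead in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    total += lead[d] - A * abs(C * lead[d] - w[d])
                w[d] = min(max(total / 3.0, lb), ub)  # (X1 + X2 + X3) / 3, clipped
    best_mse, best_w_out = leaders[0]
    return best_w_out, best_mse


series = [math.sin(0.3 * n) for n in range(201)]
states = run_reservoir(series[:-1])[20:]   # discard a 20-step washout
targets = series[21:]                      # one-step-ahead targets
w_out, best_mse = gwo_esn(states, targets)
zero_mse = esn_fitness([0.0] * len(w_out), states, targets)
print(f"GWO_ESN training MSE {best_mse:.4f} vs all-zero Wout {zero_mse:.4f}")
```

Note that every fitness evaluation reuses the same precomputed states, so the cost of GWO here is dominated by readout evaluations rather than by running the reservoir.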
In this article, a time series prediction model based on a Grey Wolf optimized ESN (denoted the GWO_ESN model), combining ESN with the Grey Wolf algorithm, is proposed. The Wout of the ESN is optimized with the Grey Wolf algorithm, and the resulting GWO_ESN algorithm is applied to time series prediction. The model alleviates the need of ESN for a very large volume of training samples and improves prediction accuracy.
The experimental environment consists of MATLAB R2014b on Windows 7 Basic, with 8 GB of memory and an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz.
To better verify the performance of the time series prediction model, seven data sets were selected for the experiments. The first five are nonlinear data sets: public EEG data; railway passenger traffic volume and its influencing factors for 1999–2008, from the official website of the China Statistical Yearbook; China's grain production data for 1985–2011 (grain production data 1 and 2); and historical Shanghai Composite Index stock data for 1990/12/20–1991/1/24 from the NetEase Financial Network. The remaining two are chaotic time series: the Lorenz chaotic sequence and the Mackey-Glass chaotic sequence. Details of the nonlinear data sets are given in Table 1. The chaotic time series are defined as follows.
| No. | Dataset | Data length | Training set | Testing set |
|---|---|---|---|---|
| 1 | Separation of EEG data | 5001 × 1 | 2000 | 500 |
| 2 | Railway passenger traffic | 34 × 8 | 16 | 16 |
| 3 | Food production 1 | 27 × 8 | 13 | 13 |
| 4 | Food production 2 | 10 × 10 | 5 | 5 |
| 5 | The Shanghai Composite Index | 400 × 7 | 200 | 199 |
(1) The Mackey-Glass chaotic time series is defined by the following time delay differential equation:
where x(0) = 1.2 and τ = 17; the chaotic time series is generated iteratively with the fourth-order Runge-Kutta method.
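The equation itself is missing from the text, so the sketch below assumes the widely used Mackey-Glass benchmark form dx/dt = 0.2 x(t − τ)/(1 + x(t − τ)^10) − 0.1 x(t), with τ = 17 and x(0) = 1.2 as stated. The step size h = 0.1, the constant history x(t) = 1.2 for t ≤ 0, the unit sampling interval, the linear interpolation of the delayed term at RK4 half steps, and the function name are all assumptions made for illustration.

```python
def mackey_glass(length, tau=17.0, x0=1.2, h=0.1, sample_every=10,
                 beta=0.2, gamma=0.1, exponent=10):
    """Mackey-Glass series via RK4 (assumed coefficients beta=0.2, gamma=0.1, exponent=10).

    Returns `length` samples taken every `sample_every` RK4 steps (unit time for h=0.1).
    """
    delay = int(round(tau / h))      # RK4 steps spanning the delay tau
    hist = [x0] * (delay + 1)        # assumed constant history x(t) = x0 for t <= 0

    def rhs(x, x_tau):
        return beta * x_tau / (1.0 + x_tau ** exponent) - gamma * x

    samples = []
    step = 0
    while len(samples) < length:
        x = hist[-1]
        xd0 = hist[-1 - delay]       # x(t - tau)
        xd1 = hist[-delay]           # x(t - tau + h)
        xdm = 0.5 * (xd0 + xd1)      # x(t - tau + h/2) by linear interpolation
        k1 = rhs(x, xd0)
        k2 = rhs(x + 0.5 * h * k1, xdm)
        k3 = rhs(x + 0.5 * h * k2, xdm)
        k4 = rhs(x + h * k3, xd1)
        hist.append(x + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)
        step += 1
        if step % sample_every == 0:
            samples.append(hist[-1])
    return samples


mg = mackey_glass(500)
print(len(mg), min(mg), max(mg))
```

Choosing h so that τ/h is an integer keeps the delayed values aligned with stored history points, so only the half-step values need interpolation.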
(2) Lorenz chaotic time series
The Lorenz chaotic time series is described by the following three-dimensional ordinary differential equations:
With the parameters a = 10, b = 8/3, and c = 28 and the initial values x(0) = y(0) = z(0) = 1, the system is chaotic, and the chaotic time series is generated iteratively with the fourth-order Runge-Kutta method. The delay times and embedding dimensions of the sequences are set to τ1 = 19, τ2 = 13, τ3 = 12 and m1 = 3, m2 = 5, m3 = 7.
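Assuming the classic Lorenz system with the stated constants mapped as dx/dt = a(y − x), dy/dt = x(c − z) − y, dz/dt = xy − bz, the sketch below integrates it with fourth-order Runge-Kutta and then performs delay-coordinate embedding of the x component with τ1 = 19 and m1 = 3. The step size h = 0.01, the sequence length, and the helper names are hypothetical, and reading τ1, m1 as the settings for the x component is an interpretation of the text.

```python
def lorenz(length, a=10.0, b=8.0 / 3.0, c=28.0, h=0.01, start=(1.0, 1.0, 1.0)):
    """Integrate the Lorenz system with classic RK4 and return `length` (x, y, z) states."""
    def deriv(s):
        x, y, z = s
        return (a * (y - x), x * (c - z) - y, x * y - b * z)

    def shifted(s, k, scale):
        return tuple(si + scale * ki for si, ki in zip(s, k))

    s, traj = start, []
    for _ in range(length):
        k1 = deriv(s)
        k2 = deriv(shifted(s, k1, h / 2))
        k3 = deriv(shifted(s, k2, h / 2))
        k4 = deriv(shifted(s, k3, h))
        s = tuple(si + h * (p + 2 * q + 2 * r + w) / 6.0
                  for si, p, q, r, w in zip(s, k1, k2, k3, k4))
        traj.append(s)
    return traj


def delay_embed(series, tau, m):
    """Delay-coordinate vectors [s(i), s(i + tau), ..., s(i + (m - 1) tau)]."""
    count = len(series) - (m - 1) * tau
    return [[series[i + j * tau] for j in range(m)] for i in range(count)]


xs = [state[0] for state in lorenz(3000)]
embedded = delay_embed(xs, tau=19, m=3)   # tau1 = 19, m1 = 3 for the x component
print(len(embedded), embedded[0])
```

Each embedded vector can then serve as an ESN input whose target is the next value of the series.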
To compare the accuracy of the prediction models and evaluate the proposed GWO-based ESN time series prediction method, two evaluation criteria are used: the fit between the predicted and actual sequences, and the mean square error (MSE) between the predicted and actual values. The MSE used in this study is defined as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

where ŷ refers to the predicted value, y refers to the measured value, and n refers to the data length.
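A direct implementation of the MSE criterion, checked on a few made-up values chosen only for illustration (the function name is hypothetical):

```python
def mse(predicted, measured):
    """Mean square error: (1/n) * sum((y_hat_i - y_i)^2)."""
    if not measured or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, measured)) / len(measured)


print(mse([1.0, 2.0, 4.0], [1.0, 3.0, 2.0]))  # (0 + 1 + 4) / 3 -> 1.6666666666666667
```

Because errors are squared, a single large deviation dominates the score, which is why MSE is sensitive to the occasional large misses discussed in the results.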
The BP neural network model (Zhai and Cao 2016), the Elman neural network model (Liang et al. 2017), the ESN model (Li et al. 2012), the ESN prediction model based on recursive least squares (denoted RLS_ESN) (Chouikhi et al. 2017), the PSO-optimized ESN model (denoted PSO_ESN) (Zhang et al. 2015), and the proposed GWO_ESN model were compared in this prediction experiment. Their prediction results on the five time series data sets were compared with the actual values and with each other using fitting graphs.
Figure 4 shows the fit between the actual values and the predictions of the five prediction models on the five data sets. The goodness of fit ranks as follows: GWO_ESN model > PSO_ESN model > RSN model > Elman model > BP model. The predictions of the proposed GWO_ESN model closely fit the actual values with relatively small error amplitudes, indicating high prediction accuracy. Across data sets, the performance of the BP and Elman networks varied significantly because of their poor structural stability. The Elman network is better suited to time series than the BP network and therefore shows better prediction performance.
Compared with the Elman and BP models, the ESN model achieves excellent prediction accuracy on the stock and EEG data sets. As shown in Figure 4 (d) and (e), these two data sets contain large numbers of training samples, so the ESN is sufficiently trained. On the other three data sets, which have relatively few training samples, sufficient training cannot be achieved and the prediction accuracy of ESN is limited. In contrast, the GWO_ESN model performs well in all five cases, indicating strong generalization capability; that is, the GWO_ESN model is applicable to the prediction of various kinds of time series data. Two conclusions can therefore be drawn. First, the fitting effectiveness of the ESN model varies significantly with the data set. Second, the GWO_ESN model fits all data sets well, whereas the PSO_ESN model is inferior on the historical data sets, although its performance is still acceptable. In addition, the PSO_ESN predictions deviate significantly from the actual values on certain data sets, which can be attributed to the data characteristics and parameter settings.
Experiment 1 demonstrates that the fitting performance of the proposed GWO_ESN time series prediction model is significantly better than that of the ESN model and that its predictions closely match the actual values. The prediction performance of the proposed model is therefore improved. In addition, compared with the other models, the GWO_ESN model has low time complexity, requires fewer parameters, and is highly efficient.
Table 2 summarizes the MSE of the five prediction models on the different data sets; a lower MSE indicates better performance. The MSE of the GWO_ESN model is significantly lower than that of the other models, indicating excellent prediction performance. The PSO_ESN and ESN models are significantly more accurate than the BP and Elman neural networks. The performance of the BP and Elman models is unstable: in certain cases they outperform the ESN and PSO_ESN models, but never the GWO_ESN model.
In summary, the proposed GWO_ESN model achieves excellent prediction performance even with small training samples and outperforms the other models in prediction accuracy. Because of its better structural stability, the ESN architecture is better suited than the BP and Elman networks to prediction on nonlinear data. Moreover, incorporating the GWO algorithm improves the overall performance of the proposed model in all cases relative to the BP and Elman models. Sufficient learning of fluctuating data prevents performance degradation caused by any individual parameter.
Table 3 compares the running times of the six prediction models on the different data sets. The GWO_ESN model has relatively short running times on all seven data sets. On some data sets its running time is not shorter than that of the BP, Elman, and ESN models; however, together with the accuracy results in Table 2, this shows that GWO_ESN achieves higher prediction accuracy while running faster than the other optimization-based models.
In this paper, we proposed a GWO_ESN time series prediction model in which the Wout of the ESN is optimized with the Grey Wolf algorithm to address the difficulty of training ESNs. The model also allows sufficient learning of fluctuating and nonlinear time series data. Compared with the PSO_ESN, RLS_ESN, ESN, BP neural network, and Elman network models, the proposed model shows advantages in prediction accuracy and reliability. In this experiment, the reservoir parameters of the ESN were selected mainly from empirical experience and repeated experiments, and these parameters influence the results to some extent; finding more suitable parameters to achieve better results is worth further study. In addition, the performance of the proposed model on data with other distributions remains to be verified.
This work was financially supported by the National Youth Science Foundation of China (No.61503272), the Scientific and technological project of Shanxi (No.201603D22103-2).
The authors have no competing interests to declare.
Chandra, R. 2015. Competition and Collaboration in Cooperative Coevolution of Elman Recurrent Neural Networks for Time-Series Prediction. IEEE Transactions on Neural Networks & Learning Systems, 26(12): 3123. DOI: https://doi.org/10.1109/TNNLS.2015.2404823
Chen, C, Twycross, J and Garibaldi, J. 2017. A new accuracy measure based on bounded relative error for time series forecasting. Plos One, 12(3): 1–23. DOI: https://doi.org/10.1371/journal.pone.0174202
Chouikhi, N, Ammar, B, Rokbani, N and Alimi, AM. 2017. PSO-based analysis of Echo State Network parameters for time series forecasting. Applied Soft Computing, 55: 211–225. DOI: https://doi.org/10.1016/j.asoc.2017.01.049
Egrioglu, E, Yolcu, U, Aladag, CH and Bas, E. 2015. Recurrent multiplicative neuron model artificial neural network for non-linear time series forecasting. Neural Processing Letters, 41(2): 249–258. DOI: https://doi.org/10.1007/s11063-014-9342-0
Han, M and Mu, DY. 2011. LM algorithm in echo state network for chaotic time series prediction. Control & Decision, 26(10): 1469–1472.
Huang, MW, Chen, CW, Lin, WC, Ke, SW and Tsai, CF. 2017. SVM and SVM ensembles in breast cancer prediction. Plos One, 12(1): e0161501. DOI: https://doi.org/10.1371/journal.pone.0161501
Jaramillo, J, Velasquez, JD and Franco, CJ. 2017. Research in financial time series forecasting with SVM: Contributions from literature. IEEE Latin America Transactions, 15(1): 145–153. DOI: https://doi.org/10.1109/TLA.2017.7827918
Jiang, P, Dong, Q, Li, P and Lian, L. 2017. A novel high-order weighted fuzzy time series model and its application in nonlinear time series prediction. Applied Soft Computing, 55: 44–62. DOI: https://doi.org/10.1016/j.asoc.2017.01.043
Li, D, Han, M and Wang, J. 2012. Chaotic time series prediction based on a novel robust echo state network. IEEE Transactions on Neural Networks & Learning Systems, 23(5): 787–799. DOI: https://doi.org/10.1109/TNNLS.2012.2188414
Liang, Y, Qiu, L, Zhu, J and Pan, J. 2017. A Digester Temperature Prediction Model Based on the Elman Neural Network. Applied Engineering in Agriculture, 33(2): 142–148. DOI: https://doi.org/10.13031/aea.11157
Liu, C, Hoi, SCH, Zhao, P and Sun, J. 2016. Online ARIMA algorithms for time series prediction. In: Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, 1867–1873.
Lun, SX, Yao, XS, Qi, HY and Hu, HF. 2015. A novel model of leaky integrator echo state network for time-series prediction. Neurocomputing, 159(1): 58–66. DOI: https://doi.org/10.1016/j.neucom.2015.02.029
Misaghi, S and Sheijani, OS. 2017. A hybrid model based on support vector regression and modified harmony search algorithm in time series prediction. In: 2017 5th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS). IEEE, 54–60. DOI: https://doi.org/10.1109/CFIS.2017.8003657
Nieto, PJG, García-Gonzalo, E, Fernández, JRA and Muñiz, CD. 2017. A hybrid wavelet kernel SVM-based method using artificial bee colony algorithm for predicting the cyanotoxin content from experimental cyanobacteria concentrations in the Trasona reservoir (Northern Spain). Journal of Computational & Applied Mathematics, 309(1): 587–602. DOI: https://doi.org/10.1016/j.cam.2016.01.045
Qiao, J, Li, R, Chai, W and Han, HJ. 2016. Prediction of BOD based on PSO-ESN neural network. Control Engineering, 23(4): 463–467. DOI: https://doi.org/10.15407/fm23.03.463
Qin, Y, Song, D, Chen, H, Cheng, W, Jiang, G and Cottrell, GJ. 2017. A dual-stage attention-based recurrent neural network for time series prediction. International Joint Conferences on Artificial Intelligence Organization, 2627–2633. DOI: https://doi.org/10.24963/ijcai.2017/366
Ren, T, Liu, S, Yan, G and Mu, HJ. 2016. Temperature prediction of the molten salt collector tube using BP neural network. IET Renewable Power Generation, 10(2): 212–220. DOI: https://doi.org/10.1049/iet-rpg.2015.0065
Rezaei, H, Bozorg-Haddad, O and Chu, X. 2018. Grey Wolf Optimization (GWO) Algorithm. In: Advanced Optimization by Nature-Inspired Algorithms. Springer, 81–91. DOI: https://doi.org/10.1007/978-981-10-5221-7_9
Rojas, I and Pomares, H. 2016. Time Series Analysis and Forecasting. Contributions to Statistics, 43(5): 175–197. DOI: https://doi.org/10.1007/978-3-319-28725-6
Sacchi, R, Ozturk, MC, Principe, JC and Carneiro, AAFM. 2007. Water Inflow Forecasting using the Echo State Network: a Brazilian Case Study. In: International Joint Conference on Neural Networks. DOI: https://doi.org/10.1109/IJCNN.2007.4371334
Saremi, S, Mirjalili, SZ and Mirjalili, SM. 2015. Evolutionary population dynamics and grey wolf optimizer. Neural Computing and Applications, 26(5): 1257–1263. DOI: https://doi.org/10.1007/s00521-014-1806-7
Wen, L, Liang, XM, Long, ZQ and Qin, HY. 2012. RBF neural network time series forecasting based on hybrid evolutionary algorithm. Control & Decision, 27(8): 1265–1268+1272.
Xiao, Q, Chu, C and Zhao, L. 2017. Time series prediction using dynamic Bayesian network. Optik International Journal for Light and Electron Optics, 135: 98–103. DOI: https://doi.org/10.1016/j.ijleo.2017.01.073
Yaseen, ZM, Allawi, MF, Yousif, AA, Jaafar, O, Hamzah, FM and El-Shafie, A. 2016. Non-tuned machine learning approach for hydrological time series forecasting. Neural Computing & Applications, 1–13.
Zhai, J and Cao, J. 2016. The combined prediction model based on time series ARIMA and BP neural network. Statistics and Decision, 3(4): 29–32.
Zhang, Y, Yu, D, Seltzer, ML and Droppo, J. 2015. Speech recognition with prediction-adaptation-correction recurrent neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 5004–5008. DOI: https://doi.org/10.1109/ICASSP.2015.7178923
Zhong, S, Xie, X, Lin, L and Wang, F. 2017. Genetic algorithm optimized double-reservoir echo state network for multi-regime time series prediction. Neurocomputing, 238: 191–204. DOI: https://doi.org/10.1016/j.neucom.2017.01.053