As a novel recurrent neural network, the Echo State Network (ESN) is characterized by strong nonlinear prediction capability and a simple, effective training algorithm. However, conventional ESN prediction requires a large volume of training samples. Meanwhile, time series data are complicated and unstable, resulting in insufficient learning and difficult training of the network. As a result, the accuracy of conventional ESN prediction is limited. To address this issue, a time series prediction model based on a Grey Wolf optimized ESN is proposed. W^{out} of the ESN is optimized using the Grey Wolf algorithm, and time series prediction is achieved with simplified training. The results indicate that the optimized prediction method exhibits superior accuracy at small sample sizes compared with conventional prediction methods.

A time series is a group of random variables arranged in time order, and time series analysis has been widely applied in daily life and industry, including commerce, meteorology, finance, and agriculture. To uncover universal laws and provide references for optimized decision-making, great attention has been invested in time series prediction (

Owing to the effects of various factors, time series are usually characterized by significant randomness and nonlinearity (

The Echo State Network (ESN) ( ) has been applied to time series prediction; in previous work, W^{out} of the ESN was optimized using the particle swarm algorithm (

To address this issue, a time series prediction model of a Grey Wolf optimized ESN is proposed by introducing the Grey Wolf Optimizer (GWO), a swarm intelligence optimization algorithm. First, the significance of time series prediction and the state of the art in this field are introduced. Then, the GWO time series prediction method for ESN is proposed and described in detail. Finally, the proposed model is verified on different data sets.

To solve issues such as difficult training in ESN prediction, a time series prediction model of a Grey Wolf optimized ESN is proposed. This method eliminates the difficult-training issue by optimizing W^{out} using the Grey Wolf algorithm and improves the accuracy of ESN prediction. Additionally, experiments demonstrate significant enhancements in the prediction accuracy of the proposed method over different time series data sets.

ESN is a novel recurrent neural network consisting of an input layer, a hidden layer, and an output layer ( ). W^{in} is the input weight connection matrix, connecting the input layer to the reservoir; W is the internal weight connection matrix, connecting the internal neurons of the reservoir to each other; W^{back} is the feedback weight connection matrix, connecting the output layer back to the reservoir; W^{out} is the output weight connection matrix, connecting the reservoir to the output layer. W^{out} is the only key parameter that requires training.

Echo State Network diagram.

Unlike other neural networks, the hidden layer in this network is replaced by a reservoir. The reservoir consists of sparsely and dynamically interconnected neurons, and it exhibits memory capability through the weights stored between neurons. The reservoir is the core of the ESN, and its parameters are of great significance to the network: the reservoir size N, the spectral radius SR of the internal connection weights, the input unit scale IS, and the sparsity SD. The reservoir size N is the number of neurons; it affects the predictive power of the ESN and is generally adjusted according to the size of the data set. The spectral radius SR is a key parameter of the reservoir that affects its memory capacity; in general, an ESN has a stable echo state property when 0 < SR < 1. Because neuron types in the reservoir and data characteristics differ, the input signal must be scaled by the input unit scale IS when transported from the input layer to the reservoir; the stronger the nonlinearity of the data, the larger the input unit scale. The sparsity SD is the proportion of connected neurons to the total number of neurons; in general, the reservoir maintains suitable dynamic characteristics when SD is about 10%.
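As a concrete illustration, reservoir construction from the parameters above can be sketched in Python with NumPy. The function name `init_reservoir` and the default values are hypothetical; only the constraints SR < 1 and SD ≈ 10% come from the text.

```python
import numpy as np

def init_reservoir(N=100, SR=0.9, SD=0.1, seed=0):
    """Build a sparse N x N reservoir matrix with sparsity SD and spectral
    radius SR (function name and default values are hypothetical)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (N, N))
    W *= rng.random((N, N)) < SD          # keep only a fraction SD of the links
    radius = max(abs(np.linalg.eigvals(W)))
    W *= SR / radius                      # enforce 0 < SR < 1 (echo state property)
    return W

W = init_reservoir()
print(round(float(max(abs(np.linalg.eigvals(W)))), 3))  # 0.9
```

Rescaling by the measured spectral radius, rather than sampling weights directly in a small range, guarantees the echo state condition regardless of N and SD.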

The basic equations of ESN are:

x(n + 1) = f(W^{in}u(n + 1) + Wx(n) + W^{back}y(n))

y(n + 1) = f^{out}(W^{out}x(n + 1))

where u(n) = (u_{1}(n), u_{2}(n), …, u_{K}(n)), x(n) = (x_{1}(n), x_{2}(n), …, x_{N}(n)), and y(n) = (y_{1}(n), y_{2}(n), …, y_{L}(n)) are the input vector, state vector, and output vector of the ESN, respectively; f and f^{out} are the activation functions of the internal reservoir neurons and of the output units, respectively, and both are generally tanh functions.
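A minimal sketch of the state update and readout in NumPy, omitting the W^{back} feedback term for brevity; the dimensions, weight ranges, and random readout are all illustrative (in practice W^{out} is trained).

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, L = 1, 50, 1                         # input, reservoir, output dimensions
W_in = rng.uniform(-0.5, 0.5, (N, K))      # input weights (range is illustrative)
W = rng.uniform(-0.5, 0.5, (N, N))         # internal reservoir weights
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9
W_out = rng.uniform(-0.5, 0.5, (L, N))     # readout (normally trained, random here)

def esn_step(x, u):
    # x(n+1) = f(W^in u(n+1) + W x(n)), f = tanh; W^back term omitted for brevity
    return np.tanh(W_in @ u + W @ x)

x = np.zeros(N)
for u in ([0.3], [0.5]):                   # two toy input samples
    x = esn_step(x, np.array(u))
y = W_out @ x                              # y(n) = f^out(W^out x(n)), identity f^out
print(y.shape)  # (1,)
```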

The Grey Wolf Optimizer (GWO) is a novel swarm intelligence algorithm proposed by Mirjalili in 2014 (

For a population consisting of N grey wolves, the encircling behavior during hunting is modeled as:

D = |C · X_{p}(t) − X(t)|

X(t + 1) = X_{p}(t) − A · D

where A and C are coefficient vectors, t is the iteration number, X(t) is the location vector of a grey wolf, and X_{p}(t) is the location vector of the prey.

The coefficient vectors are defined as follows:

A = 2a · r_{1} − a

C = 2 · r_{2}

where r_{1} and r_{2} are random vectors with values in [0, 1] and a is the iteration factor, which decreases linearly from 2 to 0 over the iterations.
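The coefficient vectors can be computed as below; the helper name `gwo_coefficients` is hypothetical, and the linear decay of a from 2 to 0 is the standard GWO schedule assumed here.

```python
import numpy as np

def gwo_coefficients(t, max_iter, dim, rng):
    """A = 2a*r1 - a and C = 2*r2, with a decaying linearly from 2 to 0."""
    a = 2.0 * (1.0 - t / max_iter)         # iteration factor a
    r1, r2 = rng.random(dim), rng.random(dim)
    return 2.0 * a * r1 - a, 2.0 * r2

rng = np.random.default_rng(0)
A, C = gwo_coefficients(t=0, max_iter=100, dim=3, rng=rng)
print(bool(np.all(np.abs(A) <= 2.0) and np.all((0 <= C) & (C <= 2.0))))  # True
```

Early iterations (|A| possibly > 1) favor exploration; late iterations (|A| < 1) pull wolves toward the leaders, favoring exploitation.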

Grey wolves have a strong prey search capability. α is the leader who commands all activities, and β and δ may participate occasionally. In the GWO algorithm, α is defined as the optimal solution, while β and δ also provide effective target information to α. Therefore, α, β, and δ are the three current optimal solutions, and locations are updated as follows:

D_{α} = |C_{1} · X_{α} − X(t)|, D_{β} = |C_{2} · X_{β} − X(t)|, D_{δ} = |C_{3} · X_{δ} − X(t)|

X_{1} = X_{α} − A_{1} · D_{α}, X_{2} = X_{β} − A_{2} · D_{β}, X_{3} = X_{δ} − A_{3} · D_{δ}

X(t + 1) = (X_{1} + X_{2} + X_{3}) / 3

where X_{α}, X_{β}, and X_{δ} are the current locations of α, β, and δ, respectively; X(t) is the location of the grey wolf being updated; D_{α}, D_{β}, and D_{δ} are the distances from the prey to α, β, and δ, respectively; X(t + 1) is the updated location vector of the search agent; and C and A are random coefficient vectors.
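The three-leader location update can be sketched as follows; `update_position` is a hypothetical helper that computes X_{1}, X_{2}, X_{3} from the leaders and averages them as described above.

```python
import numpy as np

def update_position(X, X_alpha, X_beta, X_delta, a, rng):
    """One GWO location update: compute X_1, X_2, X_3 from the three leaders
    and average them (hypothetical helper following the equations above)."""
    parts = []
    for leader in (X_alpha, X_beta, X_delta):
        r1, r2 = rng.random(X.shape), rng.random(X.shape)
        A, C = 2 * a * r1 - a, 2 * r2
        D = np.abs(C * leader - X)         # distance to this leader
        parts.append(leader - A * D)       # X_1, X_2, X_3
    return sum(parts) / 3.0                # X(t+1) = (X_1 + X_2 + X_3) / 3

rng = np.random.default_rng(0)
X = rng.random(4)
X_next = update_position(X, rng.random(4), rng.random(4), rng.random(4),
                         a=1.0, rng=rng)
print(X_next.shape)  # (4,)
```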

As a key parameter of ESN, W^{out} is obtained by a series of linear regressions on the training set. Owing to its unique structure, ESN requires a large volume of training samples, making its training highly challenging. Therefore, W^{out} is optimized using the Grey Wolf algorithm, and a Grey Wolf optimized echo state network algorithm (denoted the GWO_ESN algorithm) is proposed.

Procedures of the GWO_ESN algorithm are as follows:

Establish ESN as shown in Figure

Initialize the parameters and the location vectors and target function values of α, β, and δ, as shown in Figure

Calculate the value of the fitness function using Eq (16) and compare it with the target function value from Step b. Herein, y_{i} is the value predicted from W^{out} using Eq (2), and y is the actual value.

If the fitness function value obtained in Step c is lower than a target function value of α, β, or δ, the corresponding target function value is updated to the fitness function value.

Calculate parameter a in each iteration using Eq (8), and the coefficient vectors A and C corresponding to α, β, and δ using Eqs (6) and (7).

Traverse the wolf population and update the locations of α, β, and δ using Eqs (4) and (5); the specific updating equations are Eqs (9), (10), (11), (12), (13), and (14). Figure

If the maximum iteration number is not reached, return to Step b and repeat the process; if the maximum iteration number is reached, obtain the updated locations of α, β, and δ and calculate the final optimization result W^{out} using Eq (15).

The pseudo code of GWO_ESN:

Grey wolf’s initial position.

Grey wolf location update.

Input: training samples X_{i}, W^{in}, W, W^{back}
Output: optimized W^{out}

function GWO_ESN(X_{i}, W^{in}, W, W^{back})
    position = initialization(m, dim)
    do
        fitness = ESN(W^{out})
        if fitness < X_{α}
            X_{α} = fitness; X_{1} = position
        end
        if fitness > X_{α} && fitness < X_{β}
            X_{β} = fitness; X_{2} = position
        end
        if fitness > X_{α} && fitness > X_{β} && fitness < X_{δ}
            X_{δ} = fitness; X_{3} = position
        end
        for each of X_{1}, X_{2}, X_{3}
            update by Eqs (12), (13), (14)
        end for
        update a, A, C
        update X_{α}, X_{β}, X_{δ}
    until (t > Max_iteration)
    W^{out} = (X_{1} + X_{2} + X_{3}) / 3
    return W^{out}
end function

In this article, a time series prediction model of a Grey Wolf optimized ESN (denoted the GWO_ESN model), combining ESN and the Grey Wolf algorithm, is proposed. Herein, W^{out} of the ESN is optimized using the Grey Wolf algorithm, and the GWO_ESN algorithm is applied to time series prediction. This model eliminates the need for an over-large volume of training samples in ESN and improves prediction accuracy.

Step 1: Pre-process the original sequence to obtain de-noised, dimensionality-reduced, normalized data.

Step 2: Initialize the parameters of the ESN and the Grey Wolf algorithm.

Step 3: Optimize W^{out} of the ESN using the GWO algorithm.

Step 4: Predict using the ESN based on the optimized W^{out}.
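The four steps above can be sketched end to end on a toy one-step-ahead task. Everything here is illustrative: a sine series stands in for the paper's data sets, and the reservoir size, wolf count, and iteration budget are arbitrary small values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy one-step-ahead task (stand-in for the paper's data sets).
series = np.sin(0.2 * np.arange(200))
N = 30                                     # reservoir size (illustrative)
W_in = rng.uniform(-0.5, 0.5, N)
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # spectral radius 0.9

# Steps 1-2: collect reservoir states driven by the input sequence.
x, states = np.zeros(N), []
for u in series[:-1]:
    x = np.tanh(W_in * u + W @ x)
    states.append(x)
X_train = np.array(states)
y_train = series[1:]

def fitness(w_out):                        # MSE of the readout, Eq (16)
    return np.mean((X_train @ w_out - y_train) ** 2)

# Step 3: GWO over W_out.
n_wolves, max_iter = 10, 50
wolves = rng.uniform(-1, 1, (n_wolves, N))
best_init = min(fitness(w) for w in wolves)
leaders = [(np.inf, None)] * 3             # (fitness, position) for alpha, beta, delta
for t in range(max_iter):
    a = 2 * (1 - t / max_iter)             # iteration factor, Eq (8)
    for w in wolves:
        f = fitness(w)
        for i in range(3):                 # keep the three best wolves as leaders
            if f < leaders[i][0]:
                leaders.insert(i, (f, w.copy()))
                leaders.pop()
                break
    for k in range(n_wolves):              # move every wolf toward the leaders
        parts = []
        for _, lead in leaders:
            r1, r2 = rng.random(N), rng.random(N)
            A, C = 2 * a * r1 - a, 2 * r2
            parts.append(lead - A * np.abs(C * lead - wolves[k]))
        wolves[k] = sum(parts) / 3         # Eq (15)

# Step 4: predict with the optimized readout.
W_out = leaders[0][1]
print(fitness(W_out) <= best_init)  # True: the best solution never worsens
```

Because the leader bookkeeping only ever replaces a leader with a strictly better wolf, the final fitness is guaranteed not to exceed the best initial fitness, which is the property printed at the end.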

The experiment environment was Matlab R2014b on Windows 7 Basic, with 8 GB memory and an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz.

To verify the performance of the time series prediction model, seven data sets were selected for this experiment. The first five are nonlinear data sets: the public EEG data set; railway passenger traffic volume under different influencing factors, 1999–2008, from the official website of the China Statistical Yearbook; China's grain production data 1 and 2 for 1985–2011; and the Shanghai Composite Index historical stock data from NetEase Finance, 1990/12/20–1991/1/24. The latter two are chaotic time series data sets: the Lorenz chaotic sequence and the Mackey-Glass chaotic sequence. The specific data set information is shown in Table

Data set information.

No. | Datasets | Data Length | Training set | Testing set |
---|---|---|---|---|

1 | Separation of EEG data | 5001*1 | 2000 | 500 |

2 | Railway passenger traffic | 34*8 | 16 | 16 |

3 | Food production 1 | 27*8 | 13 | 13 |

4 | Food production 2 | 10*10 | 5 | 5 |

5 | The Shanghai Composite Index | 400*7 | 200 | 199 |

6 | Mackey-Glass | 400*1 | 200 | 199 |

7 | Lorenz | 600*1 | 300 | 299 |

(1) The Mackey-Glass chaotic time series is defined by the following time-delay differential equation:

dx(t)/dt = 0.2x(t − τ) / (1 + x^{10}(t − τ)) − 0.1x(t)

where x(0) = 1.2 and τ = 17; the chaotic time series is generated iteratively using the fourth-order Runge-Kutta method.
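A sketch of generating the series follows. The paper integrates with fourth-order Runge-Kutta; a simple Euler discretization with the standard coefficients 0.2 and 0.1 keeps the sketch short, and the step size dt = 1 is an assumption.

```python
import numpy as np

def mackey_glass(n, tau=17, x0=1.2, beta=0.2, gamma=0.1, p=10, dt=1.0):
    """Euler discretization of dx/dt = beta*x(t-tau)/(1+x(t-tau)^p) - gamma*x(t).
    (The paper uses fourth-order Runge-Kutta; Euler keeps this sketch short.)"""
    x = np.full(n + tau, x0)               # constant history x(t) = x0 for t <= 0
    for t in range(tau, n + tau - 1):
        x_tau = x[t - tau]
        x[t + 1] = x[t] + dt * (beta * x_tau / (1 + x_tau**p) - gamma * x[t])
    return x[tau:]

series = mackey_glass(400)                 # 400 points, as in data set 6
print(len(series))  # 400
```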

(2) Lorenz chaotic time series

The Lorenz chaotic time series is described by the following three-dimensional ordinary differential equations:

dx/dt = a(y − x)

dy/dt = x(c − z) − y

dz/dt = xy − bz

When the parameters are a = 10, b = 8/3, and c = 28 with initial values x(0) = y(0) = z(0) = 1, the system generates chaos, and the chaotic time series is generated iteratively by the fourth-order Runge-Kutta method. The delay times and embedding dimensions of the sequences are set as τ_{1} = 19, τ_{2} = 13, τ_{3} = 12, m_{1} = 3, m_{2} = 5, m_{3} = 7.
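A sketch of generating the Lorenz trajectory with the classical fourth-order Runge-Kutta scheme; the step size dt = 0.01 is an assumption, and the phase-space reconstruction with the stated τ and m values is omitted.

```python
import numpy as np

def lorenz_rk4(n, dt=0.01, a=10.0, b=8.0/3.0, c=28.0, s0=(1.0, 1.0, 1.0)):
    """Integrate dx/dt=a(y-x), dy/dt=x(c-z)-y, dz/dt=xy-bz with classical RK4."""
    def f(s):
        x, y, z = s
        return np.array([a * (y - x), x * (c - z) - y, x * y - b * z])
    s = np.array(s0, dtype=float)
    out = np.empty((n, 3))
    for i in range(n):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out[i] = s
    return out

traj = lorenz_rk4(600)                     # 600 points, as in data set 7
print(traj.shape)  # (600, 3)
```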

To compare the accuracies of different prediction models and evaluate the performance of the proposed GWO time series prediction method for ESN, two evaluation criteria are used: the fit between the predicted and actual sequences, and the mean square error (MSE) between predicted and actual values. The MSE used as an evaluation criterion in this study is defined as:

MSE = (1/n) Σ_{i=1}^{n} (ŷ_{i} − y_{i})^{2}

where ŷ_{i} is the predicted value, y_{i} is the actual value, and n is the number of samples.
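The MSE criterion is straightforward to implement; a minimal sketch:

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean square error: MSE = (1/n) * sum((y_hat_i - y_i)^2)."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return float(np.mean((y_pred - y_true) ** 2))

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.3333333333333333
```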

The BP neural network model ( ), the Elman network model, the ESN model, the RLS_ESN model, and the PSO_ESN model are selected as comparison models (

Figure

Comparison of different model predictions.

Compared with the Elman model and the BP model, the ESN model exhibits excellent prediction accuracy on the stock data set and the EEG data set. As shown in Figure

Experiment 1 demonstrates that the fitting efficiency of the proposed GWO_ESN time series prediction model is significantly improved compared with the ESN model, and its prediction results align closely with the actual results. Therefore, the prediction performance of the proposed GWO_ESN model is considered optimal among the tested models. Meanwhile, the proposed model is characterized by low time complexity, fewer required parameters, and a highly effective algorithm compared with the other models.

Table

Mean square error comparison.

Number | BP | Elman | ESN | RLS_ESN | PSO_ESN | GWO_ESN |
---|---|---|---|---|---|---|

1 | 0.0357 | 0.0164 | 0.0250 | 0.0303 | 0.0217 | 0.0019 |

2 | 0.0413 | 0.0058 | 0.0306 | 0.0224 | 0.0272 | 6.2226e–5 |

3 | 0.0240 | 0.0253 | 0.0221 | 0.0189 | 0.0189 | 0.0013 |

4 | 0.0464 | 0.1284 | 0.0207 | 0.1023 | 0.0266 | 3.84e–6 |

5 | 0.2834 | 0.0887 | 0.0086 | 0.0005 | 0.1241 | 1.6817e–6 |

6 | 0.0362 | 0.0214 | 0.0122 | 0.0056 | 0.0435 | 0.0011 |

7 | 0.0413 | 0.0326 | 0.0237 | 0.0147 | 0.1267 | 2.65e–4 |

In summary, the proposed GWO_ESN model exhibits excellent prediction performance even at small training sample sizes and is superior to the other models in prediction accuracy. Meanwhile, owing to its superior structural stability, the ESN network structure shows advantages over the BP neural network model and the Elman network model in prediction on nonlinear data. Additionally, the introduction of the GWO algorithm leads to enhanced overall performance in all cases compared with the BP and Elman models. Sufficient learning of fluctuating data avoids performance degradation induced by any individual parameter.

Table

Running time comparison (s).

Number | BP | Elman | ESN | RLS_ESN | PSO_ESN | GWO_ESN |
---|---|---|---|---|---|---|

1 | 5.0357 | 8.9908 | 3.1273 | 6.0547 | 240.4544 | 30.3024 |

2 | 4.1332 | 4.8048 | 2.0346 | 4.3509 | 80.0272 | 20.3445 |

3 | 3.0233 | 3.4559 | 0.2234 | 2.0465 | 100.3323 | 14.3445 |

4 | 2.1347 | 5.5456 | 0.4563 | 2.1342 | 90.2314 | 10.8436 |

5 | 3.4536 | 6.1877 | 1.1386 | 3.2432 | 130.3213 | 34.2564 |

6 | 4.5434 | 7.2331 | 2.0454 | 4.0989 | 205.4512 | 38.0921 |

7 | 4.6564 | 6.7789 | 1.8732 | 3.9807 | 180.3455 | 45.6733 |

In this paper, we proposed a GWO_ESN time series prediction model in which W^{out} of ESN is optimized using the Grey Wolf algorithm to solve the difficult training issue in ESN induced by its large training-sample requirement. Meanwhile, this model allows sufficient learning of fluctuating and nonlinear time series data. Compared with the PSO_ESN model, the RLS_ESN model, the ESN model, the BP neural network model, and the Elman network model, the proposed model exhibits advantages in prediction accuracy and reliability. In addition, the reservoir parameters of the ESN in this experiment were mainly selected through empirical summary and multiple experimental results; since these parameters have a certain influence on the results, finding more suitable parameters to achieve better experimental effects is worthy of further study and discussion. Besides, the performance of the proposed model on data with other distributions remains to be verified.

This work was financially supported by the National Youth Science Foundation of China (No. 61503272) and the Scientific and Technological Project of Shanxi (No. 201603D22103-2).

The authors have no competing interests to declare.