Introduction
Electric load forecasting plays a vital role in ensuring the security, stability, and efficiency of the smart grid. From the perspective of electricity generation, it helps operators make a reasonable plan to provide a sufficient power supply and avoid the waste of resources caused by excessive production (Patel et al., 2019). From the perspective of the electricity market, it helps to set time-of-use prices that encourage off-peak power consumption (Zhao et al., 2021). Recently, the continuous emergence of high-precision data acquisition equipment (such as smart meters) in the smart grid has provided strong support for electric load forecasting (Fekri et al., 2021).
As load data are usually recorded sequentially at a fixed time interval, electric load forecasting can be regarded as a time series prediction problem in the field of data mining (Yu et al., 2020). Therefore, the Recurrent Neural Network (RNN), which can effectively capture the temporal correlations of a sequence, has recently been considered a good choice for this task (Bianchi et al., 2017). As the simplest architecture, a Vanilla RNN is generally composed of three parts: the input layer, the hidden layer, and the output layer. Unlike the traditional Multi-Layer Perceptron, which contains only input and output connections, the Vanilla RNN introduces a recurrent connection in the hidden layer from the previous moment to the current one (Elman, 1990). This means that the current input x_t and the previous hidden state h_{t-1} together determine the current hidden state h_t. Because of this self-connection of the hidden layer, an RNN designed for sequence modeling can be regarded as a deep network when it is unrolled along the time axis. Such a deep structure easily leads to vanishing or exploding gradients for long sequences when the parameters are trained with Back Propagation Through Time (BPTT) (Fernando et al., 2018). To address this challenge, different architectures, such as Long Short-Term Memory (LSTM) (Hochreiter et al., 1997) and its variants (Yu et al., 2019), were proposed to improve the trainability of RNNs, but at the cost of significant computational overhead: the introduction of gates means many more parameters must be trained. Recently, the Clock-Work RNN (CW-RNN) was proposed as a simple but effective architecture for alleviating the gradient problem (Koutnik et al., 2014). It first divides the hidden layer into several modules with different update frequencies, and then gives the slow-updating modules recurrent connections with longer time delays.
This allows data dependencies to be passed in fewer time steps, avoiding excessive multiplications of the gradient. In addition, CW-RNN uses a predefined rule instead of training to determine module updates, greatly reducing the number of network parameters. However, this inevitably weakens the generalization ability of the network. The motivation of this article is to refine and extend this architecture based on multi-timescale connections, aiming to resolve the trade-off between performance and the number of parameters. The research content mainly covers an adaptive updating strategy for the modules in the hidden layer and the pruning problem of recurrent connections caused by this strategy. The contributions of this article are:
- A new modularized RNN (M-RNN) is proposed to generalize the existing CW-RNN. M-RNN is a framework with the skip length of the module as its key component, which updates the hidden state with the module as the minimum unit.
- Two adaptive strategies for updating the hidden modules are proposed by designing two new activation functions to calculate the priority of each module. On this basis, unordered and ordered adaptive M-RNNs (AM-RNNs) are defined, respectively, to achieve dynamic multi-timescale connections.
- Since the existing pruning strategy is only applicable to the ordered AM-RNN, a two-way pruning strategy is designed for the unordered AM-RNN to sparsify its recurrent connections.
- Both versions of AM-RNN are compared with other popular RNNs for electric load forecasting. The experimental results show that AM-RNNs achieve better predictive accuracy with fewer network parameters than the RNNs currently widely used in this field.
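To make the clockwork mechanism described above concrete, the following is a minimal NumPy sketch of one CW-RNN step in the spirit of Koutnik et al. (2014): the hidden state is partitioned into modules with fixed periods, and a module is recomputed only at time steps divisible by its period. The function name, weight layout, and the tanh nonlinearity are illustrative assumptions, not taken from this article.

```python
import numpy as np

def cwrnn_step(t, h, x, W_h, W_x, b, sizes, periods):
    """One Clock-Work RNN step (illustrative sketch).

    The hidden state h is split into modules of the given sizes, ordered by
    increasing period. Module i is updated only when t % periods[i] == 0;
    otherwise its previous state is carried over unchanged. An active module
    receives recurrent input only from itself and slower modules, so W_h is
    effectively block upper-triangular.
    """
    h_new = h.copy()
    start = 0
    for size, period in zip(sizes, periods):
        end = start + size
        if t % period == 0:
            # Recurrent input restricted to this module and the slower ones.
            pre = (W_h[start:end, start:] @ h[start:]
                   + W_x[start:end] @ x + b[start:end])
            h_new[start:end] = np.tanh(pre)
        # Inactive modules keep their previous state (longer time delay).
        start = end
    return h_new
```

Because a slow module's state is copied forward over many steps, a gradient flowing through it crosses fewer multiplications, which is the mechanism the text credits for alleviating vanishing gradients.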