A Hybrid Stacking Model for Enhanced Short-Term Load Forecasting

Fusen Guo ¹,*, Huadong Mo, Jianzhang Wu, Lei Pan, Hailing Zhou ⁴, Zhibo Zhang, Lin Li ⁵ and Fengling Huang ⁶

¹ School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Melbourne, VIC 3122, Australia

² School of Systems and Computing, The University of New South Wales, Canberra, ACT 2600, Australia

³ School of Information Technology, Deakin University, Waurn Ponds, Geelong, VIC 3216, Australia

⁴ School of Engineering, Swinburne University of Technology, Melbourne, VIC 3122, Australia

⁵ School of Accounting, Information System and Supply Chain, RMIT University, Melbourne, VIC 3000, Australia

⁶ School of Aeronautics and Astronautics, Shanghai Jiaotong University, Shanghai 200240, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(14), 2719; https://doi.org/10.3390/electronics13142719

Submission received: 20 June 2024 / Revised: 7 July 2024 / Accepted: 8 July 2024 / Published: 11 July 2024

(This article belongs to the Special Issue Advances in Power System Dynamics, Stability, Control and Dispatch with Large-Scale Renewable Energy Penetrated)

Abstract

The high penetration of distributed energy resources poses significant challenges to the dispatch and operation of power systems. Improving the accuracy of short-term load forecasting (STLF) can optimize grid management, thus leading to increased economic and social benefits. Currently, some simple AI and hybrid models struggle with multivariate dependencies, long-term dependencies, and nonlinear relationships. This paper proposes a novel hybrid model for STLF that integrates multiple AI models with lasso regression using the stacking technique. The base learners include ANN, XgBoost, LSTM, stacked LSTM, and Bi-LSTM, while lasso regression serves as the metalearner. By considering factors such as temperature, rainfall, and daily electricity prices, the model aims to more accurately reflect real-world conditions and enhance predictive accuracy. Empirical analyses on real-world datasets from Australia and Spain show significant improvements in the forecasting accuracy, with a substantial reduction in the mean absolute percentage error (MAPE) compared to existing hybrid models and individual AI models. This research highlights the efficiency of the stacking technique in improving STLF accuracy, thus suggesting potential operational efficiency benefits for the power industry.

Keywords:

smart grid; short-term load forecasting; deep learning; stacking approach; time series analysis

1. Introduction

Integrating renewable energy sources into the electricity market has amplified uncertainties in power systems. Electricity load forecasting techniques play a critical role in efficient power system management, which affects the operation and maintenance activities of transmission and distribution systems. Within competitive electricity markets, the accuracy of load forecasts significantly affects financial, infrastructural, and operational aspects. Notably, even a 1% increase in the short-term load forecasting (STLF) error could potentially lead to an additional 10 million USD in annual operational costs [1]. As shown in Figure 1, STLF is important for energy efficiency and sustainability because of its crucial role in balancing power generation and consumption [2,3,4].

Existing prediction models can be primarily categorized into three types: statistical models [5,6,7,8], AI models [9,10,11,12,13,14,15,16], and hybrid models [17,18,19,20,21,22]. Hybrid models are considered to be highly accurate because they combine the strengths of multiple models while overcoming individual models’ limitations [23,24]. However, STLF is often affected by external factors outside the power system, such as temperature and electricity prices [25,26], and very few of the hybrid models in previous studies consider such factors. To make the prediction models work in realistic conditions, it is critical to consider a wide range of factors, particularly external factors [27]. Furthermore, the model-combining methods in existing studies are often simple, e.g., the reciprocal of errors method, which combines long short-term memory network (LSTM) and XgBoost models using weights based on the mean absolute percentage error (MAPE) [18]. Although stacking techniques have been proposed for STLF, they have not been thoroughly analyzed and discussed [28].

We propose a novel hybrid model called the stacked model that leverages the stacking technique to enhance the prediction accuracy by considering multiple external factors. The stacking technique was chosen for its ability to combine the strengths of various base models, thereby improving the overall predictive performance. Our model integrates XgBoost, LSTM, Bi-LSTM, and stacked LSTM as base learners, with each contributing unique strengths: XgBoost handles structured data and captures nonlinear relationships effectively, LSTM and Bi-LSTM excel at learning long-term dependencies in sequential data, and stacked LSTM further improves the depth of temporal feature extraction. Lasso regression is used as the metalearner to combine the outputs of these base models, thus ensuring robust and sparse predictions. By considering factors such as temperature, rainfall, and daily electricity prices, the model aims to more accurately reflect real-world conditions and improve the prediction accuracy under both stable and unstable load conditions. This paper makes several significant contributions to accurate load forecasting for helping power system planning and dispatch, as well as reducing economic loss. Our contributions are summarized as follows:

Based on different AI models’ complementary strengths in handling nonlinear relationships, long-term dependencies, and temporal feature extraction, our model outperformed five single AI models (ANN, XgBoost, LSTM, two-layer LSTM, and Bi-LSTM) and two hybrid models (ANN-WNN and LSTM-XgBoost).
To address the varying load characteristics in different regions, we enhanced the prediction accuracy under both stable and unstable load conditions.
We integrated external factors to simulate real-world conditions and maintain high accuracy, such as minimum and maximum temperature, electricity price, and rainfall level.
We validated the novel application of the stacking technique by integrating a broader diversity of AI models, thus demonstrating its superior capability in improving prediction accuracy.

The rest of this paper is structured as follows: Section 2 provides an overview of the recent related literature on STLF. The related methodology and the framework of the proposed model are discussed in Section 3. We describe the case study and experimental results in Section 4. The effects and limitations of the proposed model are presented in Section 5, and the paper is concluded in Section 6.

2. Literature Review

In this section, we provide a comprehensive review of the various models utilized in STLF.

2.1. Statistical Models

A statistical model is based on a mathematical function that leverages sample data to make projections about broader phenomena; it is grounded in a mix of mathematical functions and statistical principles. Many statistical models exist, including the autoregressive integrated moving average (ARIMA), its extension the autoregressive integrated moving average with explanatory variable (ARIMAX), the Kalman Filter (KF), and multiple linear regression (MLR) models [29]. The ARIMA model has been proposed to predict peak load, and it has better performance than the autoregressive moving average (ARMA) model [8,30]. An improved KF model has been proposed to further enhance the accuracy of peak load forecasting [6].

Additionally, the ARIMAX model, considering varying consumer behavior during weekdays, weekends, and holidays, has outperformed the ARMA model [7,31]. A seasonal autoregressive integrated moving average (SARIMA) model was proposed to address the nonlinear relationships among variables and to determine relevant time lag patterns [5]. However, when the forecasting process incorporates multiple variables, statistical models encounter difficulties due to extended computation times, increased processing demands, and a limited scope for generalization [28].

2.2. Artificial Intelligence Models

AI models are considered to be more advanced due to their ability to unravel complex, nonlinear relationships between load and influencing factors [32]. Artificial neural network (ANN) models are often employed in STLF for their self-learning and error tolerance abilities, but their limited generalizability led to the development of the support vector machine (SVM) model to address these limitations [9]. Comparative studies have demonstrated the superiority of the XgBoost model over the backpropagation neural network in terms of prediction accuracy when considering seasonal patterns [10]. A window-based XgBoost model that incorporates real-time electricity pricing was proposed and maintained an impressive MAPE of 0.35% [16]. Furthermore, an advanced XgBoost model was used to identify extreme weather for determining the range of peak load occurrence [11].

Furthermore, two LSTM models with different uses have been proposed: one for predicting single-step ahead load and the other for multistep intraday rolling horizons. These models have demonstrated superior performance compared to the generalized regression neural network (GRNN) model and extreme learning machine (ELM) [12]. Considering the excellent performance and diversity of LSTM models, Ref. [14] compared the performance of bidirectional long short-term memory (Bi-LSTM) and stacked LSTM and found that Bi-LSTM was better than the stacked LSTM, with the lowest MAPE of 0.22%. In addition, the Bi-LSTM model was validated to have excellent peak value prediction capability [13]. However, the opposite result, with stacked LSTM outperforming Bi-LSTM, was observed in [15]. This performance discrepancy could be attributed to differences in the datasets used, but it is undeniable that various LSTM models perform consistently well in STLF problems.

2.3. Hybrid Models

A hybrid model combines two or more models into a whole, and it leverages the advantages of individual models [33,34]. To make accurate predictions, data processing algorithms have been combined with artificial neural networks, thus resulting in the creation of hybrid models. Ref. [21] proposed a hybrid STLF model using a grasshopper optimization algorithm (GOA) to optimize the parameters and a support vector network (SVN) for prediction. It achieved a higher accuracy than SVM models when considering temperature and humidity. However, Ref. [21] acknowledged the potential for improvement by considering additional influential variables. Another proposed hybrid model used wavelet neural networks (WNNs) to decompose the load and influencing factors into several components and an ANN for prediction, thus resulting in a lower MAPE than that of the ANN alone [22].

A hybrid model has been proposed that uses the k-means method for data grouping and ANN–WNN for prediction, with extraordinary reported results [17]. This model applies WNN to forecast residuals from the ANN model. This integration enhances the data variance and the forecasting accuracy of the model, thus leading to better performance than standalone ANN or WNN models. Furthermore, an LSTM–XgBoost model used a reciprocal error method for improved accuracy, thus achieving an MAPE of 0.57 [18]. Despite the demonstrated effectiveness of these hybrids, the stacking method is viewed to be more effective in integrating models, but the discussions are limited [28]. One hybrid model combined a stacking technique with an improved artificial fish swarm algorithm to unite multiple support vector regression (SVR) models while considering previous-day temperature data [19]. In addition, a hybrid model combining multiple deep neural network (DNN) models as base models was developed in [20], with principal component regression (PCR) used to construct the metamodel. These hybrid models highlight the potential of ensemble learning for improving prediction accuracy.
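For reference, reciprocal-error weighting can be sketched as follows. This is one common form of the idea, in which each model's weight is proportional to the inverse of its validation error; the exact formula used in [18] may differ, and the function names and numbers are illustrative assumptions.

```python
def reciprocal_error_weights(errors):
    """Weight each model by the inverse of its error, normalized to sum to 1."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]

def combine(predictions, weights):
    """Weighted average of the models' predictions for one time step."""
    return sum(p * w for p, w in zip(predictions, weights))

# Made-up validation errors for two models and their forecasts for one day.
weights = reciprocal_error_weights([2.0, 6.0])
forecast = combine([100.0, 120.0], weights)
```

The model with the smaller error receives the larger weight, but the weights are fixed by a formula rather than learned, which is the simplicity that a trained metamodel is meant to overcome.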

3. Methodology and Framework

This section describes the related methodology and the structure of the proposed model.

3.1. Related Methodology

3.1.1. XgBoost

XgBoost, which stands for Extreme Gradient Boosting [35], is a highly efficient and powerful machine learning algorithm designed to enhance the performance of gradient boosting models. It excels in accurately capturing dynamic trends in short-term power loads, thus making it particularly effective for STLF problems [10]. It primarily utilizes gradient boosting decision trees to enhance speed and performance. The technique works by iteratively adding new weak learners to correct the residuals of previous ones. This iterative process addresses the limitations of individual learners, which might not achieve satisfactory results on their own. By combining these weak learners, XgBoost produces a final prediction that is more accurate than the predictions of the individual models. In essence, the final prediction generated by XgBoost is the sum of the scores of each weak learner [36,37].
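The residual-correcting idea can be illustrated in a few lines. The following is a minimal sketch of plain gradient boosting with squared-error loss and one-split regression stumps, not XgBoost itself (it omits the second-order approximation and the regularization term); the names `fit_stump`, `predict`, `boost`, and the toy data are assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split points; both sides stay non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def predict(stumps, x, learning_rate):
    """Final prediction = sum of the (shrunken) scores of all weak learners."""
    return sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)

def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Each round fits a new stump to the residuals left by the previous ones."""
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - predict(stumps, x, learning_rate) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return stumps

# Made-up data with three load levels.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 2.1, 1.9, 5.0, 5.2, 4.8, 9.0, 9.1]

def mse(stumps, learning_rate=0.3):
    return sum((y - predict(stumps, x, learning_rate)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Comparing `mse(boost(xs, ys, 1))` with `mse(boost(xs, ys, 50))` shows the training error falling as more weak learners are added.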

The computational procedure of the XgBoost is as follows [36]:

1:: The sample weights and model parameters are first initialized by assigning the same weight to all samples in the training set.
2:: The classification error rate at each iteration is calculated using Equation (1):

$err_{c} = \frac{\sum_{i} w_{i} I(y_{i} \neq G_{c}(x_{i}))}{\sum_{i} w_{i}}$

(1)

where $w_{i}$ is the weight of the ith sample, $G_{c}$ is the cth classifier, and $I(\cdot)$ is the indicator function.
3:: In the $(c + 1)$th iteration, the weight of the ith sample is updated using Equation (2):

$w_{i} \leftarrow w_{i} e^{a_{c} I(y_{i} \neq G_{c}(x_{i}))}$

(2)

where $a_{c} = \log((1 - err_{c}) / err_{c})$.
4:: Training XgBoost involves minimizing the objective function over the dataset. To balance the reduction of the loss against the model’s complexity, a second-order Taylor expansion is performed on the loss function, and a regularization term is added to the objective function to prevent overfitting. Equation (3) is used to compute the objective function:

$Obj = \sum_{i = 1}^{n} L(y_{i}, \hat{y}_{i}) + \sum_{k = 1}^{K} Ω(f_{k})$

(3)

where n is the number of training samples, K is the number of trees, $L(y_{i}, \hat{y}_{i})$ is the loss function, $y_{i}$ is the true value, $\hat{y}_{i}$ is the predicted value, and $Ω(f_{k})$ is a regularization term used to control the complexity of the tree structure.
5:: The complexity is calculated using Equation (4):

$Ω(f) = γ N + \frac{1}{2} λ \sum_{i = 1}^{N} w_{i}^{2}$

(4)

where N is the number of leaf nodes, $w_{i}$ here denotes the weight of the ith leaf, and $γ$ is the minimum loss reduction required to split a node, which controls how conservative the tree growth is. The coefficient of the L2 regularization on the leaf weights is $λ$. The squared L2 norm of w is used to regulate the tree’s complexity and prevent overfitting.
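To make the quantities in the procedure concrete, the sketch below evaluates the error rate of Equation (1), the weight update of Equation (2), and the complexity term of Equation (4) on made-up numbers. The function names and the toy labels are assumptions for illustration only; the code is not taken from any XgBoost implementation.

```python
import math

def error_rate(weights, y_true, y_pred):
    """Equation (1): weighted share of misclassified samples."""
    wrong = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    return wrong / sum(weights)

def update_weights(weights, y_true, y_pred):
    """Equation (2): multiply the weights of misclassified samples by exp(a_c).

    Requires 0 < err_c < 1, otherwise the logarithm is undefined.
    """
    err = error_rate(weights, y_true, y_pred)
    a_c = math.log((1 - err) / err)
    return [w * math.exp(a_c * (y != p)) for w, y, p in zip(weights, y_true, y_pred)]

def tree_complexity(leaf_weights, gamma, lam):
    """Equation (4): gamma * N + 0.5 * lambda * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

weights = [0.25, 0.25, 0.25, 0.25]   # step 1: equal initial weights
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]                # the third sample is misclassified
new_weights = update_weights(weights, y_true, y_pred)
penalty = tree_complexity([0.5, -1.0, 2.0], gamma=1.0, lam=2.0)
```

With one of four equally weighted samples misclassified, the error rate is 0.25, so the misclassified sample's weight is tripled while the others are unchanged. A tree with more leaves or larger leaf weights receives a larger penalty.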

3.1.2. LSTM

LSTM is a recurrent neural network (RNN) architecture used to address the challenges posed by processing long time sequences in RNNs. LSTM tackles the exploding or vanishing gradient problem encountered by traditional RNNs due to their single-state hidden layer [38]. LSTM introduces a memory cell and three types of gates: an input gate, an output gate, and a forget gate. The memory cell serves as a long-term storage location capable of retaining information over extended periods, while the gates regulate the flow of information into and out of the cell [39]. This architecture allows LSTM to store and propagate errors backward through time and layers, thus facilitating the learning process across multiple time steps [40]. In the STLF problem, LSTM can handle input sequences of variable lengths, where historical load data have different lengths [13]. Moreover, LSTM is adept at capturing sequential patterns and long-term dependencies in historical load data, including daily cycles and seasonal trends [12]. The architecture of an LSTM cell is shown in Figure 2. The calculation process of each cell in LSTM is shown below [38].

Forget Gate:

It determines what information to discard and how much useful information to retain from a prior cell state by assigning a value between 0 and 1 to the previous cell state in comparison to the current cell input with the help of the sigmoid activation function:

$G_{t} = σ(W_{G} \cdot [h_{t - 1}, x_{t}] + b_{G})$

(5)

where $σ$ is the sigmoid function, $W_{G}$ signifies the forget gate’s weight matrix, $b_{G}$ indicates the bias term of the forget gate, and $[h_{t - 1}, x_{t}]$ denotes the concatenation of the two vectors into a single vector.

Input Gate:

The process begins by applying the sigmoid activation function to the combined information from the previous unit output and the current unit input. This generates a vector that determines which parts of the input should be updated. Simultaneously, the tanh activation function is applied to the same combined information to produce another vector that represents the candidate values for the cell state. These two vectors are then multiplied elementwise. The result of this multiplication determines the new information that will be added to the cell state:

$i_{t} = σ(W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})$

(6)

$\tilde{c}_{t} = \tanh(W_{c} \cdot [h_{t - 1}, x_{t}] + b_{c})$

(7)

where $h_{t - 1}$ is the output information, $x_{t}$ is the input information, $W_{i}$ and $W_{c}$ represent the input gate and cell state weight matrices, respectively, and $b_{i}$ and $b_{c}$ denote the bias terms of the input gate and cell state.

Cell State:

Long-term memory is updated by mixing previous data with new inputs. The prior cell state is multiplied elementwise by the forget gate, and the elementwise product of the input gate and the candidate cell state is then added:

$c_{t} = G_{t} * c_{t - 1} + i_{t} * \tilde{c}_{t}$

(8)

where $*$ denotes elementwise multiplication.

Output Gate:

It assigns values between 0 and 1 to regulate the information flowing out of the current cell by applying the sigmoid function to the previous cell output and the current cell input. Next, it applies the tanh function to the updated cell state and multiplies the result elementwise by these gate values:

$o_{t} = σ(W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})$

(9)

$h_{t} = o_{t} * \tanh(c_{t})$

(10)

where $h_{t - 1}$ is the output information of the previous cell, $x_{t}$ is the input information of the current cell, $W_{o}$ indicates the output gate weight matrix, and $b_{o}$ denotes the bias term of the output gate.
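The four gate computations can be combined into a single cell update. The sketch below follows Equations (5) to (10) in plain Python, assuming matrices stored as lists of rows; the `params` dictionary layout and the all-zero toy weights are assumptions chosen to make the result easy to check by hand, not trained values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W · v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell update following Equations (5)-(10)."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    g = [sigmoid(a) for a in affine(params["W_G"], z, params["b_G"])]          # forget gate (5)
    i = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]          # input gate (6)
    c_tilde = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]  # candidate (7)
    c_t = [gk * ck + ik * ctk for gk, ck, ik, ctk in zip(g, c_prev, i, c_tilde)]  # cell state (8)
    o = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]          # output gate (9)
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]                       # hidden state (10)
    return h_t, c_t

# Hypothetical toy parameters: hidden size 1, input size 1, all weights and biases zero.
zero = {k: [[0.0, 0.0]] for k in ("W_G", "W_i", "W_c", "W_o")}
zero.update({k: [0.0] for k in ("b_G", "b_i", "b_c", "b_o")})
h1, c1 = lstm_step([0.7], [0.0], [2.0], zero)
```

With all-zero parameters every gate equals 0.5 and the candidate state is 0, so the previous cell state of 2.0 is halved to 1.0 and the hidden state becomes 0.5 · tanh(1.0).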

3.1.3. Stacked LSTM

The stacked LSTM model is an extension of the LSTM model that comprises multiple layers of LSTM units. Each layer of a stacked LSTM consists of multiple LSTM units, with the output of one layer serving as the input for the following layer [15]. The layers are stacked atop one another to form the architecture of a deep neural network. The architecture of the stacked LSTM is shown in Figure 3. Using a stacked LSTM is intended to enhance the capacity and ability of the model to learn complex patterns in sequential data. Each stack layer can capture distinct levels of abstraction and learn distinct temporal dependencies [20]. Lower layers can concentrate on capturing local patterns and short-term dependencies, whereas higher layers can learn more abstract and long-term dependencies.
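The layer-to-layer wiring can be sketched with a deliberately tiny LSTM whose hidden size is 1. The weights below are hypothetical values chosen only to make the example runnable, and `scalar_lstm_layer` and `stacked_lstm` are illustrative names rather than part of any library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def scalar_lstm_layer(inputs, w):
    """Run a hidden-size-1 LSTM over a sequence and return every hidden state.

    w maps each gate name to (weight on h_{t-1}, weight on x_t, bias).
    """
    h, c, outputs = 0.0, 0.0, []
    for x in inputs:
        # All gates are computed from the previous hidden state before h is updated.
        gate = {k: wh * h + wx * x + b for k, (wh, wx, b) in w.items()}
        c = sigmoid(gate["G"]) * c + sigmoid(gate["i"]) * math.tanh(gate["c"])
        h = sigmoid(gate["o"]) * math.tanh(c)
        outputs.append(h)
    return outputs

def stacked_lstm(inputs, layers):
    """Feed the full output sequence of each layer into the next layer."""
    sequence = inputs
    for w in layers:
        sequence = scalar_lstm_layer(sequence, w)
    return sequence

# Hypothetical weights and a made-up normalized load sequence.
layer = {"G": (0.1, 0.5, 0.0), "i": (0.2, 0.4, 0.0), "c": (0.3, 0.6, 0.0), "o": (0.1, 0.3, 0.0)}
loads = [0.2, 0.4, 0.9, 0.5]
top = stacked_lstm(loads, [layer, layer])
```

The key design point is that the lower layer returns its hidden state at every time step, so the upper layer receives a sequence of the same length as the input.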

3.1.4. Bi-LSTM

The bidirectional LSTM (Bi-LSTM) is an extension of the traditional LSTM model that incorporates both forward and backward information flows. Bi-LSTM consists of a forward LSTM layer and a backward LSTM layer as shown in Figure 4. During the forward pass, the forward LSTM processes the input sequence from the beginning to the end, thus capturing past information. Simultaneously, the backward LSTM processes the input sequence in reverse, thus capturing future information [13]. The outputs of the two layers are combined to obtain the final output sequence. It allows the model to capture contextual information from both the forward and backward perspectives, thus reducing reliance on any single time step and improving model robustness [14].
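The two passes and their alignment can be sketched as follows. To keep the example short, a plain tanh recurrence stands in for each LSTM layer, and the two directions are combined by pairing their states at each time step; both choices, as well as the weights, are assumptions made for illustration.

```python
import math

def run_recurrent(inputs, w_h=0.5, w_x=1.0):
    """Stand-in for an LSTM layer: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def bidirectional(inputs):
    forward = run_recurrent(inputs)               # reads x_1 ... x_T
    backward = run_recurrent(inputs[::-1])[::-1]  # reads x_T ... x_1, re-aligned to time order
    return list(zip(forward, backward))

states = bidirectional([0.1, 0.8, 0.3])
```

At the first time step the forward state has seen only the first input, while the backward state already summarizes the whole sequence; at the last time step the roles are reversed.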

3.1.5. Lasso Regression

The lasso regression algorithm is a linear regression approach with an L1 regularization term; it performs shrinkage and variable selection simultaneously for better prediction [41]. As for feature selection, it can shrink the weights of uninformative features to zero, which mitigates the multicollinearity problem among features. Lasso regression thus identifies the subset of important features that minimizes the prediction error of the response variable by minimizing the loss function:

$L(β) = \sum_{i} \left(Y_{i} - \sum_{j} β_{j} X_{ij}\right)^{2} + λ \sum_{j} |β_{j}|$

(11)

where $Y_{i}$ is the observed outcome for the ith observation, $X_{ij}$ is the jth feature value for the ith observation, $β_{j}$ is the coefficient for the jth feature, and $λ$ is the regularization parameter. By increasing the value of $λ$, the coefficients of less important features are shrunk towards zero, and the model will become sparser and more interpretable.
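The shrinkage effect of the regularization parameter can be seen with a small solver. The sketch below minimizes the loss of Equation (11) by cyclic coordinate descent with soft-thresholding, which is one standard way to fit lasso and not necessarily the solver used in this study; the function names and the toy data are assumptions.

```python
def soft_threshold(rho, t):
    """Shrink rho towards zero by t, returning exactly 0 inside [-t, t]."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Minimize sum_i (y_i - sum_j beta_j x_ij)^2 + lam * sum_j |beta_j|."""
    n_features = len(X[0])
    beta = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                row[j] * (yi - sum(b * v for k, (b, v) in enumerate(zip(beta, row)) if k != j))
                for row, yi in zip(X, y)
            )
            z = sum(row[j] ** 2 for row in X)
            # The threshold is lam / 2 because the squared-error term has no 1/2 factor.
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta

# Made-up data generated as y = 2 * x1 + 0.5 * x2.
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.1], [4.0, -0.2]]
y = [2.25, 3.85, 6.05, 7.9]
beta_small = lasso_fit(X, y, lam=0.01)
beta_large = lasso_fit(X, y, lam=5.0)
```

With the small penalty both coefficients stay close to the generating values, whereas the large penalty sets the weaker second coefficient exactly to zero and slightly shrinks the first.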

3.1.6. Stacking Technique

The stacking technique is an ensemble learning method that combines multiple models to improve machine learning performance. It is used to leverage the strengths of different models and minimize the weaknesses of each model. It is usually composed of two layers: a series of base models forms the first layer, and a metamodel, usually a single model, forms the second layer. The outputs of the base models are fed to the metamodel as new features, and the output of the metamodel is considered to be the final prediction result. As for the first layer, more base models are helpful for feature learning, because the ensemble can draw on the ability of different models to learn from the features. As for the second layer, the regression algorithm has been demonstrated to be effective [20].

3.2. The Framework of the Proposed Model

The innovation of this study mainly lies in developing and applying a new load forecasting model using the stacking technique. Compared to other stacked models, it integrates several AI models as base models and considers more influencing factors in STLF. In addition to load factors, our dataset considers the highest and lowest temperatures, the amount of precipitation, and the daily electricity price. We aim to make more accurate predictions on stable and unstable load datasets under conditions closer to the natural environment.

The framework of the proposed model is shown in Figure 5; it uses the stacking technique to combine the strengths of the models shown for an optimal load forecast. In the architecture of the model, XgBoost, LSTM, Bi-LSTM, and stacked LSTM act as the base models. The predictions of these models are combined and passed to the metamodel, lasso regression, which produces the final prediction. These base models create the first computational layer, with each possessing unique capabilities in terms of processing data and recognizing patterns. Using a metamodel to integrate the predictions from multiple models, we can achieve a more comprehensive and robust forecast that considers diverse patterns and relationships within the data.

During the base model selection process, we chose the XgBoost model due to its capacity to model complex nonlinear relationships, its resistance to overfitting, and its inbuilt regularization. The ability to capture temporal dependencies makes LSTM a base model for this task. The reason we chose stacked LSTM is for a better understanding of the underlying complexities within the data, and we expect to capture more abstract features within sequences of data. As for the Bi-LSTM, it allows the model to capture patterns that future data points influence. Moreover, the grid search technique determines the selection of parameters within these models. This rigorous procedure ensures good performance by selecting the best combination of parameters for each model.

The process of building this model is summarized in the following five steps:

Transforming the training dataset using the min–max normalization. Standardizing the scaling of data is essential for reducing the likelihood of irregularities caused by heterogeneous data ranges.
Individually applying the four fundamental models to generate predictive outputs. These outputs are then merged to produce a comprehensive set of input features for the metamodel to train.
Adopting the min–max normalization for combined input features. It ensures that the input features presented to the metamodel are scaled appropriately, thereby improving the accuracy and reliability of final output predictions.
Utilizing the metamodel to generate the final prediction. The prediction is based on the input features that have been processed.
Validating the final prediction result according to the MAPE based on the testing dataset. It is helpful for evaluating the accuracy of the proposed model.
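The steps above can be sketched end to end on a toy series. Everything below is a simplified illustration under stated assumptions: two trivial forecasters (`persistence` and `moving_average`) stand in for the real base learners, the scaler bounds are fitted on the training portion only, the lasso metamodel has no intercept and is fitted by coordinate descent, and the load values are made up.

```python
def fit_min_max(values):
    return min(values), max(values)

def scale(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]

def unscale(values, lo, hi):
    return [v * (hi - lo) + lo for v in values]

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for squared error + lam * L1 norm (no intercept)."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        for j in range(len(beta)):
            rho = sum(row[j] * (t - sum(b * v for k, (b, v) in enumerate(zip(beta, row)) if k != j))
                      for row, t in zip(X, y))
            z = sum(row[j] ** 2 for row in X)
            beta[j] = max(abs(rho) - lam / 2.0, 0.0) * (1.0 if rho > 0 else -1.0) / z
    return beta

def mape(actual, predicted):
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical stand-ins for the base learners.
def persistence(series, t):
    return series[t - 1]                # yesterday's value

def moving_average(series, t):
    return sum(series[t - 3:t]) / 3.0   # mean of the previous three days

load = [100.0, 104.0, 110.0, 108.0, 103.0, 99.0, 101.0, 107.0, 112.0, 109.0, 104.0, 100.0]
split = 9                               # first 9 days for training, the rest for testing
lo, hi = fit_min_max(load[:split])      # step 1: min-max scaling of the data
scaled = scale(load, lo, hi)

def base_outputs(times):                # step 2: one column of predictions per base model
    return [[persistence(scaled, t), moving_average(scaled, t)] for t in times]

train_t, test_t = range(3, split), range(split, len(load))
train_raw, test_raw = base_outputs(train_t), base_outputs(test_t)

# Step 3: min-max scale each combined feature column using training-set bounds.
bounds = [fit_min_max(col) for col in zip(*train_raw)]

def scale_rows(rows):
    return [[(v - lo_j) / (hi_j - lo_j) for v, (lo_j, hi_j) in zip(row, bounds)] for row in rows]

# Step 4: the metamodel maps the scaled base predictions to the final prediction.
beta = lasso_fit(scale_rows(train_raw), [scaled[t] for t in train_t], lam=0.01)
pred_scaled = [sum(b * v for b, v in zip(beta, row)) for row in scale_rows(test_raw)]

# Step 5: validate on the held-out days, with MAPE computed on the original scale.
test_mape = mape([load[t] for t in test_t], unscale(pred_scaled, lo, hi))
```

Computing the MAPE after undoing the scaling avoids dividing by normalized values that are at or near zero.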

3.3. Measurement Method

This paper uses the MAPE to evaluate the performance and difference between the proposed model and previous models; it clearly provides the disparity between the predicted and actual values and facilitates the analysis of the impact brought about by precise predictions. The measurement method is defined as follows:

$MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{X_{act}(i) - X_{pred}(i)}{X_{act}(i)} \right|$

(12)

where $X_{act}(i)$ represents the actual load in one day, and $X_{pred}(i)$ is the value predicted by the model. A high value indicates that the predictions of the model are significantly off from the actual values. To achieve the highest level of accuracy, the ideal objective in model prediction is to minimize the value.
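As a quick check of the definition, the following sketch computes Equation (12) on three made-up days. The function returns a fraction, so it is multiplied by 100 to obtain a percentage; the function name and the numbers are illustrative.

```python
def mape(actual, predicted):
    """Equation (12): mean absolute percentage error, returned as a fraction."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 200.0, 400.0]      # made-up daily loads
predicted = [110.0, 190.0, 400.0]   # made-up forecasts
score = mape(actual, predicted)
```

The three daily errors are 10%, 5%, and 0%, so the score is 0.05, i.e., an MAPE of 5%. Because each error is divided by the actual load, the measure is undefined for days with zero load.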

4. Performance Evaluation

In this section, we conduct two real case studies to compare the performance of the proposed model with two previously proposed hybrid models and five AI models on stable and unstable load datasets. The benchmarks comprise two hybrid models, ANN–WNN [17] and LSTM–XgBoost [18], and five AI models, namely ANN, XgBoost, LSTM, stacked LSTM, and Bi-LSTM, which are the components of the hybrid models.

4.1. Data Exploration and Preprocessing

The robustness and accuracy of the proposed model were examined through the implementation of two real case studies. The primary case study used data acquired from Victoria, Australia, from 1 January 2015 to 31 December 2019 [42]. The secondary case study utilized a dataset drawn from Spain, which integrated four consecutive years of data from 1 January 2015 to 31 December 2018 [43]. Each dataset is plotted in Figure 6. The Australian dataset manifests a stable and evident seasonal pattern, whereas the Spanish dataset, in contrast, demonstrates notable load fluctuations, thus indicating a more volatile nature.

To effectively simulate the impact of multiple influencing factors on the power load in real-world scenarios, our dataset comprehensively included the daily electric load quantities, a wide spectrum of meteorological elements, and the concurrent daily electricity prices. The features of our datasets are shown in Table 1. We anticipated that our model would maintain its capacity to generate accurate predictions even with the integration of more factors.

Before the model learning, we split 70% of the dataset for training and the remaining 30% for validation and testing. Given the substantial range inherent in several features that this study considered, we adopted the min–max normalization approach. This procedure uniformly adjusts each feature to fit a standard range of [0,1]. The definition of the normalization function is as follows:

$x_{new} = \frac{x_{i} - x_{min}}{x_{max} - x_{min}}$

(13)

where $x_{new}$ represents the processed data, $x_{i}$ is a data point, and $x_{min}$ and $x_{max}$ represent the minimum and the maximum value of the sequence $\{x_{1}, x_{2}, \dots, x_{i}\}$, respectively.
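The split and the normalization can be sketched together. The example assumes a chronological 70/30 split and fits the minimum and maximum on the training portion only; the text does not state either detail, so both are assumptions, as are the function names and the made-up temperatures.

```python
def chronological_split(series, train_percent=70):
    """Keep the first train_percent of the series for training, the rest for later use."""
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]

def fit_min_max(train):
    return min(train), max(train)

def transform(values, x_min, x_max):
    """Equation (13) applied to every value."""
    return [(x - x_min) / (x_max - x_min) for x in values]

def inverse_transform(values, x_min, x_max):
    return [v * (x_max - x_min) + x_min for v in values]

temps = [12.0, 15.0, 9.0, 21.0, 18.0, 14.0, 11.0, 24.0, 16.0, 13.0]  # made-up temperatures
train, rest = chronological_split(temps)
x_min, x_max = fit_min_max(train)
train_scaled = transform(train, x_min, x_max)
rest_scaled = transform(rest, x_min, x_max)
```

When the bounds come from the training data alone, held-out values can fall outside [0, 1]: the temperature of 24.0 maps to 1.25 here because the training maximum is 21.0.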

4.2. Experimental Results

4.2.1. Australia Dataset Experiments

We analyzed the Australian dataset to evaluate the performance of the proposed model on a load series that exhibits a stable trend and distinct seasonal patterns. The dataset is compiled from daily samples taken over five years, thus resulting in 1826 data points. Table 2 lists the MAPE values for the proposed model and benchmarks. The proposed model demonstrated admirable accuracy, with an MAPE of around 5.99%, which is lower than that of the other forecasting models. One important finding is that while the LSTM–XgBoost model performed better than its XgBoost and LSTM component models individually, it fell short of our proposed model. The better performance of the Bi-LSTM and stacked LSTM models compared to the ANN–WNN and LSTM–XgBoost models can be attributed to the parameters set according to their original design.

Specifically, compared to the ANN–WNN model, the proposed model exhibited a significant improvement, with a reduction of 45.65% in the MAPE, thus representing the largest improvement of the proposed model. Similarly, compared to the LSTM–XgBoost model, the proposed model also achieved a reduction of 15.28% in the MAPE. Furthermore, compared to the Bi-LSTM model, which already achieved the lowest MAPE among the other models, the proposed model achieved a notable reduction of approximately 9.2% in the MAPE. Evaluating the remaining AI models, the improvement of the proposed model in terms of the MAPE ranged between 10.6% and 45.46%. These results strongly highlight the potential advantages of the proposed model, as it combines the strengths of different models and effectively identifies complex relationships between variables.
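The percentage reductions quoted above are read here as relative reductions in the MAPE, which is an assumption about how they were computed. Under that reading, the calculation looks as follows; the numbers in the example are hypothetical and are not taken from Table 2.

```python
def relative_reduction(baseline_mape, proposed_mape):
    """Percentage by which the proposed model lowers the baseline's MAPE."""
    return (baseline_mape - proposed_mape) / baseline_mape * 100.0

# Hypothetical values: a baseline MAPE of 8.0% and a proposed-model MAPE of 6.0%.
example = relative_reduction(8.0, 6.0)
```

A drop from 8.0% to 6.0% is a 25% relative reduction, although the absolute difference is only 2 percentage points.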

Figure 7 provides a graphical representation of the predicted and actual lines of the hybrid models and the proposed model. The proposed model demonstrated alignment with the actual data because it displayed a laudable ability to accurately identify increased loads and corresponding peak values, even though fluctuations exist. The model adeptly recognized seasonal patterns, thus contributing to an accurate representation of the overall load trend. However, it had drawbacks in identifying low values. Conversely, the LSTM–XgBoost and ANN–WNN hybrid models did not meet the anticipated performance standards. Instances arose where these models generated forecasts that starkly deviated from the actual data, and a consistent pattern of load overestimation was discernible. Such consistent overestimation could potentially instigate substantial resource wastage. The results highlight the accuracy and efficiency of the proposed model in identifying seasonal patterns and understanding the overall load trend. Consequently, the proposed model outperformed other models in terms of its prediction accuracy and trend identification.

4.2.2. Spain Dataset Experiments

To verify that the proposed model still applies to an unstable load dataset, we conducted a case study on the Spain dataset, which was sampled once daily over four years for a total of 1461 samples. In contrast to the Australian dataset, this dataset lacks evident seasonal patterns and is marked by substantial very short-term fluctuations. Thus, we designate it as an unstable dataset. The performance of the proposed model and benchmarks is displayed in Table 3. The proposed model continued to have the lowest MAPE on the unstable load dataset, at 7.8%. The hybrid models outperformed the individual AI models, with each hybrid exceeding the base models that compose it. Based on the results, we can conclude that the proposed model reduced the MAPE of the LSTM–XgBoost and ANN–WNN models by 11.3% and 16.4%, respectively. Compared to the ANN model, which had the worst performance, the MAPE was reduced by up to 24%.

As shown in Figure 8, the prediction line of the proposed model closely followed the actual line and identified both upward and downward trends in most situations, which underscores the model’s capacity to capture load variations. In contrast, the other hybrid models struggled with trend identification, particularly during extreme conditions, where their predictions frequently deviated considerably from the actual values. A notable advantage of the proposed model is that its predictions contained no outliers, whereas those of the other models contained several. It also avoided the tendency to overestimate the load that was observed with the LSTM–XgBoost model. The proposed model therefore helps avoid scenarios of severe resource wastage and shortage, which is useful for dispatch and generation planning. Despite the improvement in trend identification, a gap between the actual and predicted values remained, so the proposed model still has limitations in forecasting the load level accurately.
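One simple way to quantify the systematic overestimation discussed here is to report a signed percentage error alongside the MAPE. The sketch below is illustrative only: the function names and the toy load values are assumptions, not code or data from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mean_percentage_error(actual, predicted):
    """Signed mean percentage error, in percent; positive means overestimation."""
    return 100.0 * sum((p - a) / a for a, p in zip(actual, predicted)) / len(actual)


# Toy daily loads in MWh (hypothetical values for illustration).
actual = [100.0, 120.0, 110.0, 90.0]
over = [110.0, 132.0, 121.0, 99.0]    # every forecast is 10% too high
mixed = [110.0, 108.0, 121.0, 81.0]   # errors alternate between +10% and -10%

for label, forecast in (("over", over), ("mixed", mixed)):
    print(label, mape(actual, forecast), mean_percentage_error(actual, forecast))
```

Both toy forecasts have the same MAPE, but only the first has a clearly positive signed error, which is the signature of consistent overestimation.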

5. Discussion

In this section, we discuss the implications of the proposed model in real-world scenarios, along with its limitations and recommendations for further improvement.

5.1. The Effect on Economy

Based on the above evaluation, our proposed hybrid approach using the stacking technique has been shown to predict nonlinear, volatile, stationary, and seasonally fluctuating electricity demand accurately. Accurate STLF leads to significant economic benefits; for example, a 1% reduction in prediction error can result in a cost reduction of 10 million USD [1]. The higher accuracy of the proposed model is therefore beneficial for saving costs. In addition, it can help power companies make efficient generation decisions and properly plan maintenance schedules, thus resulting in substantial savings in operational and maintenance costs.

5.2. The Effect on Power Grid

Due to the inherent instability of electricity load and the difficulty of storing electricity, the stability of the power grid is frequently challenged, which leads to extreme scenarios such as power shortages and wastage [44]. In this regard, our proposed predictive model accurately identifies variation trends in electricity load, which supports production and scheduling planning and thereby helps maintain system stability. Furthermore, the stacking technique allows the model to consider more factors, making it more representative of real-world environments. This gives it stronger adaptability and helps operators with power transmission and dispatch decisions, since losses can occur during transmission due to weather conditions.

5.3. Limitations and Recommendations

The effectiveness of the proposed hybrid model utilizing the stacking technique has been demonstrated. However, it has difficulty predicting minimum values in both the stable and the unstable load datasets, and it only partially predicts the load in an unstable electricity load environment. Selecting robust base models is therefore required to enhance the applicability of our forecasting model: future work should explore using more accurate hybrid models as base models or incorporating a broader range of base models. Furthermore, testing the proposed model with a larger dataset, a smaller time interval, or load data from different locations would further validate its effectiveness. Future models could also integrate more variables, such as calendar effects (holidays and weekends) and economic indicators. Incorporating these variables is important, as they can significantly influence electricity load patterns and bring the model closer to real conditions. For instance, holidays and weekends typically exhibit different consumption behaviors compared to regular weekdays [45]. When more features are included in the data, the variational mode decomposition (VMD) method can be considered for signal denoising and feature selection, as it requires less computational effort to obtain its components and is unaffected by the mode mixing issue [46].
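The calendar effects mentioned in this subsection could be encoded as simple indicator features for each daily sample. The following is a minimal sketch under stated assumptions: the function name, the feature names, and the holiday list are hypothetical and were not part of the experiments reported in this paper.

```python
from datetime import date


def calendar_features(day: date, holidays: set) -> dict:
    """Return simple calendar indicators for one daily load sample."""
    return {
        "day_of_week": day.weekday(),           # Monday = 0, Sunday = 6
        "is_weekend": int(day.weekday() >= 5),  # Saturday or Sunday
        "is_holiday": int(day in holidays),
        "month": day.month,
    }


# Hypothetical holiday list; a real study would use the official calendar
# of the region being forecast.
example_holidays = {date(2020, 1, 1), date(2020, 12, 25)}

features_holiday = calendar_features(date(2020, 12, 25), example_holidays)
features_weekend = calendar_features(date(2020, 6, 6), example_holidays)
print(features_holiday)
print(features_weekend)
```

Such indicators could then be appended to the existing inputs (temperature, rainfall, and daily price) before training the base learners.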

6. Conclusions

STLF is an important research topic that aims to minimize prediction errors by leveraging historical load data and relevant influencing factors. In this paper, we have developed a novel hybrid model that integrates multiple AI models using the stacking technique to predict load under various influencing factors, including daily minimum and maximum temperatures, electricity prices, and rainfall. To assess the robustness and accuracy of our model, we conducted comparative analyses with diverse models using load datasets from Australia and Spain. Our results demonstrate significant improvements in the MAPE compared to the ANN–WNN [17] and LSTM–XgBoost [18] hybrids and five individual AI models. These findings indicate that the proposed model can support proactive power planning, help mitigate extreme events such as power outages and excess generation, and improve the overall reliability and efficiency of power systems.

Furthermore, our experimental results confirm the effectiveness of using the stacking technique to combine different models, which outperformed both the individual AI models and the benchmark hybrid models in our experiments. Our approach, which leverages the strengths of individual models and integrates diverse perspectives, improves the accuracy of STLF. Future work could focus on integrating more accurate and advanced hybrid models using the stacking technique. Selecting suitable base models is crucial, as their accuracy directly affects the predictive performance of the metamodel. Additionally, improving the prediction of extreme load points helps in conducting reliable power line inspections, which are essential for ensuring grid stability and security [47].
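The metalearner step of stacking can be sketched as follows. This is a minimal illustration, assuming the base-learner predictions are already available: the plain coordinate-descent Lasso below stands in for a library implementation, all function names are hypothetical, and the numbers are made up rather than taken from the experiments.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha=0.1, n_iter=500):
    """Minimise (1/2n)||y - b - Xw||^2 + alpha * ||w||_1 by coordinate descent."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    # Centre the data so the unpenalised intercept can be recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(k)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual for w = 0
    w = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            col = [Xc[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # a constant column carries no information
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, alpha) / z
            # Keep the residual consistent with the updated weight.
            resid = [r - (new_wj - w[j]) * c for c, r in zip(col, resid)]
            w[j] = new_wj
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, b


def predict(X, w, b):
    """Combine base-learner predictions with the fitted metalearner."""
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


# Hypothetical validation-set predictions from three base learners (columns)
# and the matching actual loads in MWh; none of these numbers come from the paper.
base_predictions = [
    [104.0, 97.0, 108.0],
    [123.0, 118.0, 112.0],
    [115.0, 106.0, 109.0],
    [95.0, 89.0, 101.0],
    [133.0, 126.0, 118.0],
    [110.0, 101.0, 107.0],
]
actual_load = [100.0, 120.0, 110.0, 90.0, 130.0, 105.0]

weights, intercept = fit_lasso(base_predictions, actual_load, alpha=0.1)
stacked = predict(base_predictions, weights, intercept)
print(weights, intercept)
```

A larger `alpha` shrinks the weights further and can set some of them exactly to zero, which is how a Lasso metalearner can down-weight or discard weak base learners. In practice, the metalearner should be fitted on predictions for data that the base learners did not see during training.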

Author Contributions

Conceptualization, F.G., J.W. and L.P.; methodology, F.G. and H.M.; software, F.G. and Z.Z.; validation, F.G., H.M., L.L. and F.H.; formal analysis, F.G. and J.W.; investigation, F.G. and L.P.; resources and data curation, J.W. and L.P.; writing—original draft preparation, F.G. and H.Z.; writing—review and editing, H.M. and H.Z.; visualization, Z.Z.; supervision, L.P.; project administration, H.M. and J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available in https://www.kaggle.com/datasets/manualrg/spanish-electricity-market-demand-gen-price (accessed on 1 January 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

STLF	Short-Term Load Forecasting
AI	Artificial Intelligence
ARIMA	Autoregressive Integrated Moving Average
ARIMAX	Autoregressive Integrated Moving Average with Explanatory Variable
KF	Kalman Filter
GOA	Grasshopper Optimization Algorithm
LSTM	Long Short-Term Memory Network
MAPE	Mean Absolute Percentage Error
SARIMA	Seasonal Autoregressive Integrated Moving Average
GRNN	Generalized Regression Neural Network
ELM	Extreme Learning Machine
ANN	Artificial Neural Network
WNN	Wavelet Neural Network
SVM	Support Vector Machine
SVN	Support Vector Network
SVR	Support Vector Regression
DNN	Deep Neural Network
PCR	Principal Component Regression

References

1. Peng, L.; Lv, S.X.; Wang, L.; Wang, Z.Y. Effective electricity load forecasting using enhanced double-reservoir echo state network. Eng. Appl. Artif. Intell. 2021, 99, 104132.
2. Fan, G.F.; Zhang, L.Z.; Yu, M.; Hong, W.C.; Dong, S.Q. Applications of random forest in multivariable response surface for short-term load forecasting. Int. J. Electr. Power Energy Syst. 2022, 139, 108073.
3. Yang, D.; e Guo, J.; Sun, S.; Han, J.; Wang, S. An interval decomposition-ensemble approach with data-characteristic-driven reconstruction for short-term load forecasting. Appl. Energy 2022, 306, 117992.
4. Habbak, H.; Mahmoud, M.; Metwally, K.; Fouda, M.M.; Ibrahem, M.I. Load Forecasting Techniques and Their Applications in Smart Grids. Energies 2023, 16, 1480.
5. Maldonado, S.; González, A.; Crone, S. Automatic time series analysis for electric load forecasting via support vector regression. Appl. Soft Comput. 2019, 83, 105616.
6. Sharma, S.; Majumdar, A.; Elvira, V.; Chouzenoux, E. Blind Kalman filtering for short-term load forecasting. IEEE Trans. Power Syst. 2020, 35, 4916–4919.
7. Sheshadri, G.S. Electrical load forecasting using time series analysis. In Proceedings of the 2020 IEEE Bangalore Humanitarian Technology Conference, Vijiyapur, India, 8–10 October 2020; pp. 1–6.
8. Nepal, B.; Yamaha, M.; Yokoe, A.; Yamaji, T. Electricity load forecasting using clustering and ARIMA model for energy management in buildings. Jpn. Archit. Rev. 2020, 3, 62–76.
9. Ye, N.; Liu, Y.; Wang, Y. Short-term power load forecasting based on SVM. In Proceedings of the World Automation Congress 2012, Puerto Vallarta, Mexico, 24–28 June 2012; pp. 47–51.
10. Xu, J. Research on power load forecasting based on machine learning. In Proceedings of the 2020 7th International Forum on Electrical Engineering and Automation (IFEEA), Hefei, China, 25–27 September 2020; pp. 562–567.
11. Deng, X.; Ye, A.; Zhong, J.; Xu, D.; Yang, W.; Song, Z.; Zhang, Z.; Guo, J.; Wang, T.; Tian, Y.; et al. Bagging–XGBoost algorithm based extreme weather identification and short-term load forecasting model. Energy Rep. 2022, 8, 8661–8674.
12. Hossain, M.S.; Mahmood, H. Short-term load forecasting using an LSTM neural network. In Proceedings of the 2020 IEEE Power and Energy Conference at Illinois (PECI), Champaign, IL, USA, 27–28 February 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
13. Mughees, N.; Mohsin, S.A.; Mughees, A.; Mughees, A. Deep sequence to sequence Bi-LSTM neural networks for day-ahead peak load forecasting. Expert Syst. Appl. 2021, 175, 114844.
14. Atef, S.; Eltawil, A.B. Assessment of stacked unidirectional and bidirectional long short-term memory networks for electricity load forecasting. Electr. Power Syst. Res. 2020, 187, 106489.
15. Ren, H.; Li, Q.; Wu, Q.; Zhang, C.; Dou, Z.; Chen, J. Joint forecasting of multi-energy loads for a university based on copula theory and improved LSTM network. Energy Rep. 2022, 8, 605–612.
16. Zhao, X.; Li, Q.; Xue, W.; Zhao, Y.; Zhao, H.; Guo, S. Research on ultra-short-term load forecasting based on real-time electricity price and window-based XGBoost model. Energies 2022, 15, 7367.
17. Aly, H.H. A proposed intelligent short-term load forecasting hybrid models of ANN, WNN and KF based on clustering techniques for smart grid. Electr. Power Syst. Res. 2020, 182, 106191.
18. Li, C.; Chen, Z.; Liu, J.; Li, D.; Gao, X.; Di, F.; Li, L.; Ji, X. Power load forecasting based on the combined model of LSTM and XGBoost. In Proceedings of the 2019 International Conference on Pattern Recognition and Artificial Intelligence, Wenzhou, China, 26–28 August 2019; PRAI ’19; Association for Computing Machinery: New York, NY, USA, 2019; pp. 46–51.
19. Tan, Z.; Zhang, J.; He, Y.; Zhang, Y.; Xiong, G.; Liu, Y. Short-term load forecasting based on integration of SVR and stacking. IEEE Access 2020, 8, 227719–227728.
20. Moon, J.; Jung, S.; Rew, J.; Rho, S.; Hwang, E. Combination of short-term load forecasting models based on a stacking ensemble approach. Energy Build. 2020, 216, 109921.
21. Barman, M.; Dev Choudhury, N.; Sutradhar, S. A regional hybrid GOA-SVM model based on similar day approach for short-term load forecasting in Assam, India. Energy 2018, 145, 710–720.
22. El-Hendawi, M.; Wang, Z. An ensemble method of full wavelet packet transform and neural network for short term electrical load forecasting. Electr. Power Syst. Res. 2020, 182, 106265.
23. Li, J.; Deng, D.; Zhao, J.; Cai, D.; Hu, W.; Zhang, M.; Huang, Q. A novel hybrid short-term load forecasting method of smart grid using MLR and LSTM neural network. IEEE Trans. Ind. Inform. 2021, 17, 2443–2452.
24. Tayab, U.B.; Zia, A.; Yang, F.; Lu, J.; Kashif, M. Short-term load forecasting for microgrid energy management system using hybrid HHO-FNN model with best-basis stationary wavelet packet transform. Energy 2020, 203, 117857.
25. Azeem, A.; Ismail, I.; Jameel, S.M.; Harindran, V.R. Electrical load forecasting models for different generation modalities: A review. IEEE Access 2021, 9, 142239–142263.
26. Akhtar, S.; Shahzad, S.; Zaheer, A.; Ullah, H.S.; Kilic, H.; Gono, R.; Jasiński, M.; Leonowicz, Z. Short-Term Load Forecasting Models: A Review of Challenges, Progress, and the Road Ahead. Energies 2023, 16, 60.
27. Yin, C.; Mao, S. Fractional multivariate grey Bernoulli model combined with improved grey wolf algorithm: Application in short-term power load forecasting. Energy 2023, 269, 126844.
28. Massaoudi, M.; Refaat, S.S.; Chihi, I.; Trabelsi, M.; Oueslati, F.S.; Abu-Rub, H. A novel stacked generalization ensemble-based hybrid LGBM-XGB-MLP model for short-term load forecasting. Energy 2021, 214, 118874.
29. Moradzadeh, A.; Mansour-Saatloo, A.; Nazari-Heris, M.; Mohammadi-Ivatloo, B.; Asadi, S. Introduction and literature review of the application of machine learning/deep learning to load forecasting in power system. In Application of Machine Learning and Deep Learning Methods to Power System Problems; Springer: Berlin/Heidelberg, Germany, 2021; pp. 119–135.
30. Ni, H.; Meng, S.; Geng, X.; Li, P.; Li, Z.; Chen, X.; Wang, X.; Zhang, S. Time Series Modeling for Heart Rate Prediction: From ARIMA to Transformers. arXiv 2024, arXiv:2406.12199.
31. Xiao, X.; Mo, H.; Zhang, Y.; Shan, G. Meta-ANN—A dynamic artificial neural network refined by meta-learning for short-term load forecasting. Energy 2022, 246, 123418.
32. Solyali, D. A comparative analysis of machine learning approaches for short-/long-term electricity load forecasting in Cyprus. Sustainability 2020, 12, 3612.
33. Wu, Z.; Zhao, X.; Ma, Y.; Zhao, X. A hybrid model based on modified multi-objective cuckoo search algorithm for short-term load forecasting. Appl. Energy 2019, 237, 896–909.
34. Moradzadeh, A.; Zakeri, S.; Shoaran, M.; Mohammadi-Ivatloo, B.; Mohammadi, F. Short-term load forecasting of microgrid via hybrid support vector regression and long short-term memory algorithms. Sustainability 2020, 12, 7076.
35. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; KDD ’16; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
36. Suo, G.; Song, L.; Dou, Y.; Cui, Z. Multi-dimensional short-term load forecasting based on XGBoost and fireworks algorithm. In Proceedings of the 2019 18th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), Xuzhou, China, 8–10 November 2019; pp. 245–248.
37. Nobre, J.; Neves, R.F. Combining principal component analysis, discrete wavelet transform and XGBoost to trade in the financial markets. Expert Syst. Appl. 2019, 125, 181–194.
38. Lv, L.; Wu, Z.; Zhang, J.; Zhang, L.; Tan, Z.; Tian, Z. A VMD and LSTM based hybrid model of load forecasting for power grid security. IEEE Trans. Ind. Inform. 2022, 18, 6474–6482.
39. Li, Z.; Yu, H.; Xu, J.; Liu, J.; Mo, Y. Stock market analysis and prediction using LSTM: A case study on technology stocks. Innov. Appl. Eng. Technol. 2023, 2, 1–6.
40. Patterson, J.; Gibson, A. Deep Learning: A Practitioner’s Approach; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2017.
41. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
42. Kozlov, A. Daily Electricity Price and Demand Data. 2020. Available online: https://www.kaggle.com/datasets/aramacus/electricity-demand-in-victoria-australia (accessed on 1 January 2024).
43. Jhana, N. Hourly Energy Demand Generation and Weather. 2019. Available online: https://www.researchgate.net/profile/Riyas-Hamsath-Mohammed-Khan/publication/374415434_Hourly_energy_demand_generation_and_weather_Electrical_demand_generation_by_type_prices_and_weather_in_Spain/links/651c5388b0df2f20a20ae18a/Hourly-energy-demand-generation-and-weather-Electrical-demand-generation-by-type-prices-and-weather-in-Spain.pdf (accessed on 1 January 2024).
44. Li, S.; Kong, X.; Yue, L.; Liu, C.; Khan, M.A.; Yang, Z.; Zhang, H. Short-term electrical load forecasting using hybrid model of manta ray foraging optimization and support vector regression. J. Clean. Prod. 2023, 388, 135856.
45. Son, J.; Cha, J.; Kim, H.; Wi, Y.M. Day-ahead short-term load forecasting for holidays based on modification of similar days’ load profiles. IEEE Access 2022, 10, 17864–17880.
46. Ribeiro, M.H.D.M.; da Silva, R.G.; Moreno, S.R.; Canton, C.; Larcher, J.H.K.; Stefenon, S.F.; Mariani, V.C.; Coelho, L.d.S. Variational mode decomposition and bagging extreme learning machine with multi-objective optimization for wind power forecasting. Appl. Intell. 2024, 54, 3119–3134.
47. Li, Y.; Ni, M.; Lu, Y. Insulator defect detection for power grid based on light correction enhancement and YOLOv5 model. Energy Rep. 2022, 8, 807–814.

Figure 1. The diagram of a grid system.

Figure 2. The architecture of the LSTM cell.

Figure 3. The architecture of the stacked LSTM.

Figure 4. The architecture of the Bi-LSTM.

Figure 5. Framework of the proposed model.

Figure 6. The trends of the datasets.

Figure 7. The trend of each prediction outcome in case 1. (a) LSTM–XgBoost model; (b) ANN–WNN Model; (c) proposed model.

Figure 8. The trend of each prediction outcome in case 2. (a) LSTM–XgBoost model; (b) ANN–WNN model; (c) proposed model.

Table 1. The features of the datasets.

Features	Description
Min-temperature	Minimum temperature during the day (°C)
Max-temperature	Maximum temperature during the day (°C)
Rainfall	Daily rainfall (mm)
Daily price	Average price per MWh ($)
Load	Total daily electricity demand (MWh)

Table 2. The performance of models in the Australia dataset.

Models	MAPE (%)
ANN	10.98
XgBoost	7.45
LSTM	7.37
Stacked LSTM	6.70
Bi-LSTM	6.60
LSTM–XgBoost	7.07
ANN–WNN	11.02
Proposed	5.99

Table 3. The performance of models in the Spain dataset.

Models	MAPE (%)
ANN	10.27
XgBoost	9.87
LSTM	9.43
Stacked LSTM	8.10
Bi-LSTM	8.14
LSTM–XgBoost	9.33
ANN–WNN	8.83
Proposed	7.80

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Guo, F.; Mo, H.; Wu, J.; Pan, L.; Zhou, H.; Zhang, Z.; Li, L.; Huang, F. A Hybrid Stacking Model for Enhanced Short-Term Load Forecasting. Electronics 2024, 13, 2719. https://doi.org/10.3390/electronics13142719

