Introduction
With escalating global resource and environmental pressures, countries worldwide are increasingly adopting “dual-carbon” policies and initiatives. The emphasis on clean energy and electric vehicles reflects the prevailing trend of upgrading power supply infrastructure. To achieve the dual-carbon objective, China is committed to constructing a new power system that relies primarily on renewable energy sources. However, integrating a higher proportion of renewables introduces greater volatility and uncertainty into grid operations (Lam et al., 2020), which seriously degrades grid frequency stability and compliance with the Control Performance Standard (CPS). Microgrids offer a solution: they raise the utilization rate of distributed renewable energy and effectively address electricity supply challenges in remote areas, deserts, and islands. Moreover, microgrids provide a crucial avenue for integrating electric vehicles (EVs) and diverse forms of distributed green energy (Fan et al., 2022).
Within microgrids, energy storage units play a vital role in load frequency control and regulation. With the rapid adoption of EVs in recent years, these vehicles can serve as both controllable loads and distributed energy storage units (Chae et al., 2020; Iqbal et al., 2020). Through vehicle-to-grid (V2G) technology, EVs can absorb power from or feed power back to the grid to counteract system frequency deviations during grid disturbances or faults (Ziras et al., 2019; Chae et al., 2020). Several studies (Li et al., 2019; Karkevandi et al., 2018) have established single-area and multi-area load frequency control models that incorporate EVs, proposed various control strategies, and investigated the dynamic characteristics of system frequency control under those strategies. Simulation results indicate that EV participation in frequency regulation can significantly enhance regulation performance. However, the proportional-integral (PI) controllers employed in these models rely on trial-and-error parameter tuning, making it difficult to achieve optimal control performance. Additionally, in pursuit of the dual-carbon goal, large-scale development and integration of new and clean energy sources have become focal points of China's energy landscape (Chen et al., 2020). Consequently, the controllers described in the aforementioned literature need to be enhanced to handle the stochastic disturbances introduced by large-scale renewable integration in islanded microgrids.
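The PI regulation loop discussed above can be sketched in a few lines. The gains, time step, and first-order frequency model below are illustrative assumptions, not values from the cited studies; the sketch only shows why performance hinges on hand-tuned kp/ki.

```python
# Minimal sketch of a trial-and-error-tuned PI load-frequency regulator.
# The gains kp/ki, the time step dt, and the first-order plant model are
# illustrative assumptions, not values from the cited studies.

def pi_lfc_step(delta_f, integral, kp=0.5, ki=0.2, dt=0.1):
    """One discrete PI step on the frequency deviation delta_f (Hz).

    Returns the power command for the EV/storage units and the updated
    integral of the deviation."""
    integral += delta_f * dt
    u = -(kp * delta_f + ki * integral)  # act against the deviation
    return u, integral

# Usage: drive a toy first-order frequency model back toward 0 Hz.
delta_f, integral = 0.2, 0.0  # start from a 0.2 Hz deviation
for _ in range(200):
    u, integral = pi_lfc_step(delta_f, integral)
    delta_f += 0.1 * (u - 0.05 * delta_f)  # illustrative plant dynamics
print(round(delta_f, 4))
```

With these particular gains the deviation decays toward zero, but a poor kp/ki choice can leave the loop oscillatory or sluggish, which is exactly the tuning burden the learning-based controllers below aim to remove.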
The rapid development of machine learning techniques such as reinforcement learning and deep learning in recent years has provided new methods and ideas for solving these problems. Barbalho et al. (2022) designed a microgrid controller based on the DDPG algorithm that stabilizes microgrid frequency by adjusting the output power of energy storage elements. Fan et al. (2022) proposed a DQN-based load frequency control strategy for an islanded microgrid with electric vehicles, which effectively suppresses frequency fluctuations under wind disturbances. Wang et al. (2018, 2021) designed load frequency controllers based on Q-learning and deep Q-learning. Notably, all of these methods derive from classical Q-learning, which updates the Q-function by bootstrapping on the maximum estimated action value. A major drawback of this approach is that taking the maximum over noisy estimates overestimates action values, which can trap the agent in suboptimal policies. To address this issue, Hasselt et al. (2015) proposed the Double Q-learning (DQL) algorithm. DQL improves on classical Q-learning by decoupling action selection from action evaluation, which effectively mitigates Q-learning's overestimation of action values. However, the method is not completely unbiased: while correcting overestimation, it may introduce an underestimation bias in action values. This bias can hinder the agent from discovering optimal strategies in a stochastic environment.
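The two update rules contrasted above can be sketched in tabular form. The single-state environment with zero-mean noisy rewards, and all hyperparameters, are illustrative assumptions (not the setup of the cited papers); they merely make the overestimation bias of the max operator visible.

```python
# Tabular sketch contrasting the classical Q-learning update with the
# Double Q-learning update. The single-state toy environment, the
# zero-mean noisy rewards, and all hyperparameters are illustrative
# assumptions, not the setup of the cited papers.
import random

random.seed(0)

def q_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    # Classical Q-learning: one table both selects and evaluates the
    # next action, so noise in max(Q) biases the target upward.
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

def double_q_update(QA, QB, s, a, r, s2, alpha=0.1, gamma=0.9):
    # Double Q-learning: one table picks the greedy action, the OTHER
    # table evaluates it, decoupling selection from evaluation.
    if random.random() < 0.5:
        a_star = max(range(len(QA[s2])), key=QA[s2].__getitem__)
        QA[s][a] += alpha * (r + gamma * QB[s2][a_star] - QA[s][a])
    else:
        a_star = max(range(len(QB[s2])), key=QB[s2].__getitem__)
        QB[s][a] += alpha * (r + gamma * QA[s2][a_star] - QB[s][a])

# Toy demonstration: every reward is zero-mean noise, so the true value
# of every action is 0, yet bootstrapping on max() drifts well above it.
n_actions = 10
Q = {0: [0.0] * n_actions}
QA, QB = {0: [0.0] * n_actions}, {0: [0.0] * n_actions}
for _ in range(5000):
    a = random.randrange(n_actions)
    r = random.gauss(0.0, 1.0)
    q_update(Q, 0, a, r, 0)           # self-loop back to state 0
    double_q_update(QA, QB, 0, a, r, 0)
print(round(max(Q[0]), 2), round(max(QA[0]), 2))
```

In this toy run the Q-learning estimate ends up well above the true value of 0, while the Double Q-learning estimate stays much closer to it (and, as noted above, may dip below it, reflecting DQL's underestimation bias).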