Introduction
As wireless devices proliferate rapidly, mobile video communication networks face great challenges in increasing network capacity to meet the ever-growing demand (Huang et al., 2017; Zhao et al., 2017a). By offloading user equipment (UEs) from the macro base station (MBS) to femto base stations (FBSs), the heterogeneous network (HetNet) (Zhang et al., 2016; Chen et al., 2015; Zhao et al., 2018) balances mobile video communication network traffic (Lien et al., 2015; Wu et al., 2015). Furthermore, to improve the overall spectral efficiency of the cellular network, the FBS and the MBS (Wang et al., 2016; Bashar, 2015) can share the same channel. Consequently, the HetNet, which can improve network capacity and energy efficiency, is regarded as a promising architecture for future networks.
The mobile offloading problem is one of the factors influencing the performance of HetNets, and it has been investigated in several existing works (Ye et al., 2013; Bayat et al., 2014; Elsherif et al., 2015). In (Ye et al., 2013), user association was proposed to solve the load-balancing problem in heterogeneous cellular networks. Distributed user association and femtocell allocation in HetNets were investigated in (Bayat et al., 2014). The authors in (Elsherif et al., 2015) studied resource allocation and inter-cell interference management to obtain the optimal offloading strategy. Because the mobile offloading optimization problem is non-convex, it is difficult to obtain the globally optimal strategy. Several methods have been developed recently: game theory was applied in (Shen et al., 2014), and Markov approximation (Chen et al., 2013) has also been used to solve this problem. Nevertheless, these existing optimization solutions cannot obtain the optimal strategy effectively without complete and accurate network information, which is usually not available in practice. This paper therefore introduces a reinforcement learning method to solve the mobile offloading optimization problem in HetNets.
Reinforcement learning (RL) methods (Katayama, 2016; Dulac-Arnold et al., 2016; Levine et al., 2017) can obtain the optimal policy for intelligent decision problems by interacting with the environment. Moreover, RL optimizes long-term goals rather than only the immediate reward (Degris et al., 2006; Dung et al., 2006; Eremeev et al., 2018). A widely used RL technique is Q-learning; the authors in (Bennis et al., 2010) proposed a Q-learning based approach to interference avoidance in self-organized femtocell networks. In a single-agent RL system, independent agents alter their actions without collaboration, which may result in fluctuating actions in the learned strategy (Talor et al., 2009; D'Eramo et al., 2017). When several agents share the environment, its dynamics must be taken into account, since they depend on the behaviors of the other agents. In addition, because the cumulative reward of one UE is inevitably influenced by the other UEs' actions, cooperative multi-agent reinforcement learning (MARL) (ElTantawy et al., 2013) should be considered. Unfortunately, MARL faces many obstacles to obtaining the optimal strategy (Awheda et al., 2016; Graham et al., 2010; Wu et al., 2009), such as convergence, learning speed, and multiple equilibria.
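As a concrete reference point, the tabular Q-learning technique mentioned above can be sketched as follows. The two-state toy environment, reward values, and hyperparameters are illustrative assumptions for exposition only, not the HetNet offloading model considered in this paper.

```python
import random

# Hyperparameters (illustrative values, not from the paper).
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor weighting long-term reward
EPSILON = 0.1  # exploration probability

N_STATES, N_ACTIONS = 2, 2
# Q-table initialized to zero: Q[state][action].
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action):
    """Toy environment: action 1 in state 0 yields reward 1, else 0.
    The chosen action determines the next state."""
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    return action, reward

def choose_action(state):
    """Epsilon-greedy action selection over the current Q-table."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

state = 0
for _ in range(5000):
    action = choose_action(state)
    next_state, reward = step(state, action)
    # Q-learning update: move Q(s,a) toward the bootstrapped target
    # r + gamma * max_a' Q(s', a').
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])
    state = next_state
```

After training, the agent prefers action 1 in state 0, the only rewarded transition in this toy setup. A single-agent learner like this treats other agents as part of a static environment, which is exactly the assumption that breaks down in the multi-agent HetNet setting discussed above.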