1. Introduction
In the last decade, the cloud computing paradigm has attracted increasing attention in both academic and commercial settings, owing to its ability to provide a cost-effective IT infrastructure with enhanced efficiency and elasticity (Zhang et al., 2013; Marmol and Kuhnen, 2015; Hayyolalam and Pourhaji-Kazem, 2018). Despite these advantages, cloud computing also raises many challenges in distributed resource management and performance optimization (Weingartner et al., 2015; Fei et al., 2019). For instance, dynamic and unpredictable workloads can lead to poor resource allocation decisions (Valliyammai and Selvi, 2012; Fei et al., 2019; Habibi et al., 2019), and heterogeneous resources and diverse user requirements make some approaches that are effective in traditional distributed systems no longer suitable (Reyes et al., 2010; Sztajnberg et al., 2011). As a result, resource and performance monitoring services play a crucial role in improving and optimizing the resource management policies of current cloud platforms (Fu et al., 2013; Alcaraz-Calero and Aguado, 2015).
Generally, a monitoring service aims to obtain full knowledge of the underlying resources through a set of well-designed toolkits (Povedano-Molina et al., 2013; Thrihinas et al., 2014; Ghanavati et al., 2017; Xu et al., 2018). In a cloud environment, an effective monitoring service should also take into account the inherent features of the cloud, including resource virtualization (Lu et al., 2016), elastic resource provisioning (Thrihinas et al., 2014), the utility-based service model (Gutierrez-Aguado et al., 2016), and so on. To handle these issues, several cloud monitoring solutions and systems have been developed in recent years, each with its own advantages and disadvantages (Montes et al., 2013; Povedano-Molina et al., 2013; Andreolini et al., 2015). Unfortunately, most existing cloud monitoring tools only passively raise an alert event when a QoS violation occurs. As a result, it is difficult for a cloud provider to find the performance bottleneck that causes such a QoS violation based solely on the observed alert-event logs (Povedano-Molina et al., 2013; Alcaraz-Calero and Aguado, 2015). More importantly, because QoS violations from different applications have different semantics, determining the importance of a given QoS violation becomes very difficult, if not impossible (Cicotti et al., 2015; Gutierrez-Aguado et al., 2016; Du and Li, 2017). Finally, from the perspective of cloud users, a fine-grained monitoring service provides more resource information, which is very helpful for improving their QoS satisfaction; however, it also implies a higher sampling frequency, which significantly increases the monitoring overhead from the perspective of the cloud provider (Thrihinas et al., 2014; Mdhaffar et al., 2017).