1. Introduction
In recent years, cloud computing has emerged as a leading paradigm that enables customers to deploy their applications in a cost-effective and performance-isolated manner (Khorshed et al., 2013; Zhang et al., 2016). Along with the new opportunities brought by cloud computing, many challenges arise as the number of cloud resources scales up (Weingartner et al., 2015; Singh & Chana, 2016). From the perspective of cloud providers, knowing the online status of various resources (e.g., availability, capability, efficiency) therefore plays a critical role in effectively managing today's cloud infrastructures (Manvi & Shyam, 2014; Canali & Lancellotti, 2014; Lu et al., 2016). From the perspective of cloud users, full knowledge and control of the current status of those resources also helps to improve the delivered QoS and avoid SLA violations (Serhani et al., 2014; Fatema et al., 2014; Tomanek et al., 2016). As a result, an effective cloud monitoring service has become a necessary piece of infrastructure in current cloud environments (Canali & Lancellotti, 2014; Wang et al., 2017; Mdhaffar et al., 2017).
Generally speaking, a monitoring service collects and aggregates performance-related metrics from distributed resources (Balis et al., 2011; Fatema et al., 2014). By analyzing monitoring data, cloud providers can understand the runtime status of the underlying resources and then detect, diagnose, or resolve problems at an early stage (Jing & Min, 2011; Pardi et al., 2013). Unfortunately, because of the enormous scale and complex structure of cloud systems, it is not easy to manage information about a huge number of resources in a reliable and scalable way, especially when multiple tenants with different monitoring requirements are taken into account (Fatema et al., 2014; Du & Li, 2017). In addition, the inherent characteristics of cloud systems (such as resource virtualization and elastic resource provisioning) make it more difficult to implement a non-intrusive cloud monitoring service (Montes et al., 2013; Povedano-Molina et al., 2013). In the past decade, many cloud monitoring tools and approaches have been proposed, each with its own advantages and disadvantages (Fatema et al., 2014; Wang et al., 2018). Even so, how to mitigate the overhead introduced by the monitoring service is still a key issue that needs to be addressed, since the growing scale and complexity of cloud platforms make it impossible to monitor all resources simultaneously without introducing significant overhead on the network and storage subsystems (Fatema et al., 2014; Weng et al., 2016). Moreover, monitoring data covering all resource elements would be too large to handle efficiently, and in most scenarios such complete coverage is unnecessary. Therefore, a more flexible monitoring service that can dynamically adjust or customize the monitoring process on demand becomes necessary.