1. Introduction
In recent years, emerging technologies such as cloud computing, big data, and the Internet of Things (IoT) (Joshi et al., 2021; Muniswamaiah et al., 2019) have caused the volume of data to grow day by day. The rapid growth of web applications, including search engines, online shopping, and cloud computing, places severe demands on the underlying infrastructure in terms of computing, storage, and networking. To meet the storage and processing needs of such large amounts of data, the Data Center (DC) has become an indispensable information platform, responsible for the management and maintenance of massive computing and storage systems. Internet companies such as Microsoft, Google, Amazon, Facebook, and Alibaba have built high-performance data centers around the world. These data centers connect servers and network switches over a network to meet the needs of high-speed computing and massive storage more conveniently. The Data Center Network (DCN) plays a crucial role in a data center by connecting all of its resources together (Chen et al., 2021).
Machine Learning (ML) is a very successful approach within Artificial Intelligence (AI) (Di Mitri et al., 2017; Phellan et al., 2021). It lies at the core of AI and is a form of AI in which the computer learns how to complete a task by itself. ML can help machines make the right decisions and take smart actions in real time without human intervention. Two common classes of ML models are supervised learning models and unsupervised learning models. ML has been around for some time and has grown rapidly in recent years. In the future, ML is expected to be one of the best solutions for analyzing large amounts of data. If handled well, ML could change the way humans live more than any technology that has ever existed.
As more and more people connect to the Internet, the data handled by DCs keeps increasing; among these data, health-related data is our concern. Health has always been part of our way of life, and every part of life relies on good health. Living a healthy lifestyle can help prevent chronic diseases and long-term illnesses, so good health is of great importance. Accordingly, the main contributions of this paper are summarized as follows. (i) By combining voting-strategy-based global outlier detection with K-means-based nearest-furthest neighbors search, we propose an improved outlier detection algorithm for health-related data. (ii) We propose a local-importance-based random forest feature selection algorithm to measure the importance of each feature.
The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 proposes two algorithms for data preprocessing. Section 4 presents the experimental results. Section 5 concludes this paper.