Big Data in Real Time to Detect Anomalies

Copyright: © 2024 |Pages: 26
DOI: 10.4018/979-8-3693-0413-6.ch015


The proliferation of connected devices and the Internet has made it easier for attackers to infiltrate networks, which can result in cyberattacks, financial loss, theft of healthcare information, and cyber warfare. As a result, network security analytics has drawn considerable interest from researchers, especially anomaly detection in networks, which is regarded as essential for network security. Current methods are ineffective mostly because of the large volumes of data that connected devices generate. A framework is therefore needed that can process massive data in real time and identify network anomalies. This study addresses the problem of real-time anomaly detection. It examines both the key features of relevant machine learning algorithms and the most recent real-time big data processing technologies for anomaly detection. It also describes the recognized research challenges of real-time massive data processing for anomaly detection.
Chapter Preview


Sensors, connected devices, smart home appliances, smart cities, smartphones, mobile clouds, healthcare applications, multimedia, virtual reality, and autonomous cars are just a few of the emerging technologies that are growing quickly. These technologies also contribute to the massive accumulation of real-time data flowing through networks. According to one study, the Internet was expected to host a community of 50.1 billion connected devices by 2020. This anticipated expansion raises serious concerns about network security.

The goal of the current study is to give a thorough understanding of the most recent real-time big data technologies, applications, and anomaly detection methods. The chapter presents a comparative study of three distinct fields (anomaly detection, machine learning techniques, and real-time big data processing) and of the relationships among them. It also presents a comprehensive taxonomy built on a comparison of these domains. The second goal is to identify and describe the difficulties associated with real-time big data processing for anomaly detection.

A study that examined data from 70 organizations spread across 61 countries recorded a total of 79,790 security incidents, with the most network attacks reported from the public, information, and financial services industries; 75% of attacks spread from victim 0 to victim 1 within a single day (Aburomman et al., 2017). Network infrastructure has also been subject to a variety of cyberattacks, such as phishing, malware, search engine poisoning, botnets, distributed denial-of-service attacks, denial-of-service attacks, spam, and credential comparison. Monitoring network threats has emerged as the main challenge for most businesses, particularly in high-profile sectors such as government, energy, healthcare, banking, and research facilities.
These businesses safeguard their infrastructure with a variety of monitoring methods at great financial expense. Yet because attackers use advanced tactics to access the infrastructure, current security technologies and offline log analysis for finding attackers eventually become outdated.

Several large-scale, sophisticated cyberattacks against connected devices were reported in 2016. Attackers compromised thousands of Internet-connected devices, including cameras and recorders. Major websites in the United States were attacked by cybercriminals using common home equipment and gadgets. The development of the Internet of Things (IoT) has paved the way for the generation and flow of enormous amounts of data, which creates a barrier for the procedures used to monitor the security of network infrastructure (Ahmad et al., 2016). According to a recent investigation from SpiderLabs, a significant cyberattack on a Singaporean healthcare data centre led to the loss of the details of 1.5 million patients. However, database administrators did not discover and report the data theft for a few days. They discovered unusual activity on one of SingHealth's IT databases, which prompted an alert and the shutdown of the network to prevent a further data breach. Anomaly detection is one of the effective tools for assuring network security because it helps security analysts spot potential attacks that might affect the network in the coming days or months.

Key Terms in this Chapter

Cyberattacks: Attempts to steal, expose, alter, disable, or destroy another's assets through unauthorized access to computer systems.

Anomaly Detection: The identification of rare items, events, or observations that deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behaviour.

Mobile Clouds: A computing paradigm in which mobile devices use an available cloud computing platform to perform specific tasks and/or access data on demand, rather than relying on the individual devices themselves.

Sensor Data: The output of a device that detects and responds to some type of input from the physical environment.

Contextual Anomalies: A data instance that is anomalous in a specific context, but not otherwise, is termed a contextual anomaly (also referred to as a conditional anomaly).

Network Traffic: The amount of data moving across a computer network at any given time. Network traffic, also called data traffic, is broken down into data packets and sent over a network before being reassembled by the receiving device or computer.

Cluster Centers: A cluster center is a point that represents the central location (usually the mean) of a cluster. It serves as a representative of the points in its cluster.
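
To illustrate how the terms above can fit together, the following is a minimal sketch, not the chapter's method, of using cluster centers to flag an anomaly: each center is the mean of its cluster, and a point is flagged when it lies farther than a threshold from every center. The function names cluster_center and is_anomaly, the two traffic features, the sample values, and the threshold of 5.0 are all hypothetical and chosen only for illustration.

```python
from math import dist
from statistics import fmean


def cluster_center(points):
    """Return the mean point of a non-empty list of equal-length tuples."""
    return tuple(fmean(coords) for coords in zip(*points))


def is_anomaly(point, centers, threshold):
    """Flag a point whose distance to the nearest center exceeds threshold."""
    return min(dist(point, c) for c in centers) > threshold


# Hypothetical features per traffic record:
# (packets per second, mean packet size in hundreds of bytes).
clusters = [
    [(10.0, 5.0), (12.0, 6.0), (11.0, 4.0)],
    [(50.0, 14.0), (52.0, 15.0), (48.0, 16.0)],
]

# One center per cluster, computed as the coordinate-wise mean.
centers = [cluster_center(c) for c in clusters]

# A record close to the first center is treated as normal.
normal_flag = is_anomaly((11.5, 5.5), centers, threshold=5.0)

# A record far from both centers is flagged as an anomaly.
outlier_flag = is_anomaly((30.0, 40.0), centers, threshold=5.0)

print(centers, normal_flag, outlier_flag)
```

In this sketch the clusters are given in advance; in practice they would come from a clustering algorithm, and the threshold would need to be tuned to the data.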
