Anomaly Detection Using System Logs: A Deep Learning Approach

Rohit Sinha, Rittika Sur, Ruchi Sharma, Avinash K. Shrivastava
Copyright: © 2022 | Pages: 15
DOI: 10.4018/IJISP.285584

Abstract

Anomaly detection is an important step in building a secure and trustworthy system. Analyzing logs manually to detect failures and anomalies is daunting. In this paper, we propose an approach that leverages the pattern-matching capabilities of a Convolutional Neural Network (CNN) for anomaly detection in system logs. Features are extracted from log files using a windowing technique. From these features, a one-dimensional image (1×n dimension) is generated in which the pixel values correspond to the features of the logs. A 1D convolution operation is applied to these images, followed by max pooling. After the convolution layers, a multi-layer feed-forward neural network serves as a classifier that learns to label the logs as normal or abnormal from the representation created by the convolution layers. The model learns the variation in log patterns for normal and abnormal behavior. The proposed approach achieved improved accuracy compared to existing approaches for anomaly detection in Hadoop Distributed File System (HDFS) logs.
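The pipeline outlined in the abstract (windowed features, a 1×n image, 1D convolution, max pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the event IDs, window size, step, kernel weights, and all function names below are hypothetical choices made for illustration, and the feed-forward classifier that would be trained on the pooled output is omitted.

```python
from collections import Counter

def window_features(event_ids, vocab, window, step):
    """Slide a fixed-size window over a sequence of log event IDs and
    count how often each event in `vocab` occurs in every window."""
    rows = []
    for start in range(0, len(event_ids) - window + 1, step):
        counts = Counter(event_ids[start:start + window])
        rows.append([counts.get(e, 0) for e in vocab])
    return rows

def to_image(rows):
    """Flatten the per-window counts into one 1 x n row scaled to [0, 1]."""
    flat = [v for row in rows for v in row]
    peak = max(flat) if flat and max(flat) > 0 else 1
    return [v / peak for v in flat]

def conv1d(signal, kernel):
    """Valid (no padding) 1D cross-correlation, the operation CNN layers apply."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Element-wise rectifier: negative responses are set to zero."""
    return [max(0.0, x) for x in xs]

def max_pool1d(signal, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]

# Hypothetical event IDs, as a log parser might emit them for one session.
events = ["E1", "E2", "E1", "E3", "E2", "E1"]
vocab = ["E1", "E2", "E3"]

rows = window_features(events, vocab, window=4, step=2)
image = to_image(rows)                      # the 1 x n "image"
kernel = [1.0, -1.0]                        # assumed, untrained weights
features = max_pool1d(relu(conv1d(image, kernel)), size=2)
```

In a trained model the kernel weights would be learned rather than fixed, and `features` would be passed to the feed-forward classifier that labels the log sequence as normal or abnormal.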

1. Introduction

With the advancement of cutting-edge technologies, our lives are getting easier, but the same technologies can be exploited in ways that cause immense harm to an organization or an individual. Anomaly detection thus becomes an essential task in making our systems and networks secure. At the same time, it is also crucial to protect other rare events from any kind of exploitation. These rare events may be highly significant yet extremely hard to find, partly because the emergence of complex technologies increases the number of vulnerabilities. In modern systems, log files hold the system state and significant information on various execution paths.

Though the data in log files might seem homogeneous, it may contain anomalies that are not visible to the user. Methods for anomaly detection address this issue. Anomaly detection helps to identify anomalous behavior in our data, that is, patterns that do not conform to the expected pattern. System logs can play a critical part in this quest. They record all the information on an actively running process, so if a failure or an anomaly occurs, it gets recorded in the system logs.

During normal functioning, the log files contain homogeneous data, but any unusual behaviour or anomaly results in a pattern change in the log files. We can therefore use this property to detect anomalies by examining the system logs for unusual patterns.
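As a rough illustration of what a "pattern change" can mean in practice, the sketch below compares the event-frequency distribution of a log window against a baseline using total variation distance. This is a simple stand-in chosen for illustration, not the method proposed in this paper; the event names, the threshold, and the function names are assumptions.

```python
from collections import Counter

def event_distribution(event_ids):
    """Relative frequency of each event ID in a non-empty list of log events."""
    counts = Counter(event_ids)
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}

def pattern_shift(baseline, window):
    """Total variation distance between two event distributions
    (0 means identical, 1 means no events in common)."""
    p = event_distribution(baseline)
    q = event_distribution(window)
    return 0.5 * sum(abs(p.get(e, 0.0) - q.get(e, 0.0)) for e in set(p) | set(q))

# Hypothetical event names; real logs would first be parsed into event IDs.
baseline = ["open", "write", "close", "open", "write", "close"]
normal_window = ["open", "write", "close"]
odd_window = ["open", "error", "error"]

THRESHOLD = 0.5  # assumed cut-off, would need tuning on real data
flags = [pattern_shift(baseline, w) > THRESHOLD
         for w in (normal_window, odd_window)]
```

Here the window that matches the baseline mix is not flagged, while the window dominated by an unseen event is. A fixed threshold on a hand-built distance is brittle, which is part of the motivation for learning the patterns from data instead.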

However, log files are generated in huge volumes, and parsing them manually in search of anomalies is not feasible for a human expert. We need an automated process to improve efficiency. Anomaly detection addresses unpredictable or uncertain, rare and minor events, which increases the complexity of the problem for detection methods. The rarity and heterogeneous nature of anomalies make them difficult to identify and lead to normal events being falsely classified as anomalies. Suppose you are working as a system admin at an e-commerce giant. An issue in the front-end may stop your customers from buying things on your platform. How do you know that your customers' spending has suddenly dropped when your services still appear to run perfectly normally? That is when anomaly detection comes in. Although a large number of methods have been introduced over the years (Breunig et al., 2000; Liu et al., 2012), reducing false positives and increasing recall rates for detecting anomalies remains an important yet difficult challenge. Since the data generated by logs are high-dimensional, and anomaly detection in high dimensions has been a long-standing problem (Zimek et al., 2012), performing the detection in a subspace of the original features (Keller et al., 2012; Lazarevic & Kumar, 2005; Liu et al., 2012) or on constructed features seems like a straightforward solution. However, identifying higher-order, heterogeneous and non-linear feature interactions and couplings remains a major challenge for anomaly detection. Furthermore, the previously known subspace-based and feature-selection-based methods (Altalhi & Gutub, 2021; Pang et al., 2018; Pang et al., 2017) do not preserve the proper information. Anomaly detection may therefore become challenging due to the heterogeneity of the anomalies.
These previously known methods also do not address the challenges of detecting anomalies that have spatial, temporal or graph-based interdependent relationships among them. The rarity of anomalies makes anomaly datasets susceptible to noisy instances. The main challenge is to reliably identify the noise, as it can be distributed irregularly in the data space. This challenge poses a major obstacle for traditional machine learning techniques (Juvonen et al., 2015). These challenges posed by anomaly detection in traditional methods can be addressed by Convolutional Neural Networks (CNNs).
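The interplay between rarity, false positives and recall mentioned above can be made concrete with a small worked example. All counts below are hypothetical and chosen only to show the arithmetic; they are not results from this paper.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical workload: anomalies are 1% of all log events.
total_events = 10_000
anomalies = 100
normals = total_events - anomalies   # 9,900 normal events

# Hypothetical detector: finds 90 of the 100 anomalies,
# but also flags 5% of the normal events.
tp = 90
fn = anomalies - tp                  # 10 missed anomalies
fp = 495                             # 5% of the 9,900 normal events

precision, recall = precision_recall(tp, fp, fn)
# recall is 0.9, yet precision is only about 0.15:
# most alerts are false alarms because normal events vastly outnumber anomalies.
```

Even a detector with high recall and a seemingly small false-positive rate produces mostly false alarms when anomalies are rare, which is why both measures have to be improved together.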

Previous approaches used Long Short-Term Memory (LSTM) networks, Principal Component Analysis (PCA) based anomaly detection, invariant mining, one-class Support Vector Machines (SVM), and isolation forests to detect anomalies in system log files (Du et al., 2017; Juvonen et al., 2015; Liang et al., 2007).
