A Review of Machine Learning Techniques for Anomaly Detection in Static Graphs

Hesham M. Al-Ammal (University of Bahrain, Bahrain)
DOI: 10.4018/978-1-7998-2418-3.ch007

Abstract

Detecting anomalies in a data set is a vital step in several cybersecurity applications, including intrusion detection, fraud detection, and social network analysis. Many detection techniques work by examining graph-based data. Analyzing graphs makes it possible to capture relationships and communities as well as anomalies. The advantage of graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent research has applied machine learning methods to the problem. This chapter concentrates on static graphs (both labeled and unlabeled) and summarizes some of these recent studies of machine learning for anomaly detection in graphs. The methods covered include support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter reflects on the successes and challenges of using these methods in the context of graph-based anomaly detection.

Background

Graphs (also called networks in other sources) have been used extensively to model real-world situations in mathematics, operations research, and computer science. Their usefulness stems from the fact that they are generic representations that can be analyzed algorithmically. Once an efficient algorithm exists, any graph that meets its requirements can be examined, leading to insights and pattern discovery. Famously, Euler laid the groundwork for graph theory in 1736 when he studied the so-called Königsberg Bridges problem, formulating the layout as a graph, which led to a solution (Bollobas 2012). More recently, advances in analyzing graph structures such as the World Wide Web led to the success of Google after the PageRank algorithm was employed to strengthen information retrieval (Page et al. 1999). There are numerous other examples of the success (and limitations) of graph-based algorithms, for which the reader is referred to Bollobas (2012), among many other sources.

When examining a massive data set, detecting an anomaly is often more important than obtaining general facts about the data (such as the mean or other summary statistics). Anomaly detection techniques are therefore used to discover rare occurrences within a given data set, and several vital applications rely on them, including cybersecurity, fraud detection, finance, health care, and many others.

It should be noted that in the data mining field the terms “outlier detection” and “anomaly detection” are often confused. “Outlier” is usually used for data points placed independently in a multi-dimensional space. For data represented as a graph, “anomaly” is the more general term, and it better captures cases such as discrepancies as well as rare events.

Furthermore, although there are many outlier detection algorithms, not all data can be represented as points in a multi-dimensional space. In several situations the data have inter-dependencies that are better represented by a graph with links among the objects (Akoglu et al. 2015). In such cases, the richness of the graph representation greatly enhances the data set and provides a means of improving the algorithms or accomplishing a task, as evidenced by the Google search example mentioned previously (Page et al. 1999).
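
The contrast between independent points and linked objects can be made concrete with a small sketch. The example below is purely illustrative and is not a method taken from this chapter or from the cited works: it assumes a toy undirected graph of accounts, uses node degree as the only structural feature, and flags nodes whose degree z-score exceeds a hypothetical threshold of 2.0. The function names (`build_adjacency`, `degree_zscores`, `flag_anomalies`) and the account labels are invented for the illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_zscores(adj):
    """Return the z-score of each node's degree across the whole graph."""
    degrees = {node: len(neighbors) for node, neighbors in adj.items()}
    mu = mean(degrees.values())
    sigma = pstdev(degrees.values())
    if sigma == 0:
        # Every node has the same degree, so nothing stands out.
        return {node: 0.0 for node in degrees}
    return {node: (d - mu) / sigma for node, d in degrees.items()}


def flag_anomalies(edges, threshold=2.0):
    """Flag nodes with unusually high degree (threshold is an assumed value)."""
    scores = degree_zscores(build_adjacency(edges))
    return sorted(node for node, z in scores.items() if z > threshold)


# Toy data (assumed): one account linked to eight others, plus two ordinary links.
edges = [("hub", f"acct{i}") for i in range(8)]
edges += [("acct0", "acct1"), ("acct2", "acct3")]

print(flag_anomalies(edges))  # ['hub']
```

The anomaly here is visible only through the links: no attribute of the "hub" account is inspected at all, so a point-based outlier detector working on per-account features would have nothing to go on. Degree is a deliberately crude feature; it stands in for the richer structural signals that graph-based methods exploit.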

In machine learning, statistical models are constructed from “training data” that enable a computer system to perform a task without being explicitly instructed on how to perform it. This notion of learning “without being explicitly programmed” is attributed to Arthur Samuel and forms the core of the definition of machine learning (Torres 2016).

Machine learning is a branch of Artificial Intelligence (AI). As indicated by the timeline of discoveries within AI in Figure 1, machine learning techniques can belong to any of the main branches of AI, namely: (a) Symbolic AI, (b) Statistical AI, or (c) Hybrid AI techniques. It should be noted that machine learning spans the three branches and can utilize a statistical model, a neural network, or a hybrid model.

Figure 1. Brief overview timeline of artificial intelligence fields and techniques
