Real-Time Cyber Analytics Data Collection Framework

Herbert Maosa, Karim Ouazzane, Viktor Sowinski-Mydlarz
Copyright: © 2022 | Pages: 10
DOI: 10.4018/IJISP.311465

Abstract

In cyber security, it is critical that event data is collected in as near real time as possible to enable early detection of, and response to, threats. Performing analytics on event logs stored in databases slows the response because of the time cost of database insertion and retrieval operations. The authors present a data collection framework that minimizes the need for long-term storage. Events are buffered in memory, up to a configurable threshold, before being streamed in real time using live streaming technologies. The framework deploys virtualized data-collecting agents that ingest data from multiple sources, including threat intelligence. The framework enables the correlation of events from various sources, improving detection precision. The authors have tested the framework in a real-time, machine-learning-based threat detection system. The results show a time gain of 300 milliseconds in transmission time from event capture to the analytics system, compared with storage-based collection frameworks. Threat detection was measured at 95%, which is comparable to the benchmark Snort IDS.

Introduction

Security analytics systems rely on data sourced from multiple network infrastructure devices, such as Intrusion Detection and Prevention Systems (IDPS), network firewalls, routers, switches, and various application firewalls. Before this data can be analyzed for possible security threats, it needs to be collected. Data collection is therefore a critical step in the cyber analytics process, and it can become a performance-critical path for analytics systems (Ramah et al., 2006; Qadeer et al., 2010), especially when there is a need to consume big data or perform analysis in real time.

Figure 1 presents a typical cyber analytics process.

Figure 1. Typical analytics process

The data collected and processed in the preceding stages is consumed by different analytics applications. If those applications and the processes that follow them are time sensitive, the data collection stage must make the ingested data available as quickly as possible while still collecting enough data to support accurate inferences.

While detection capability remains key in any cyber response system, the timeliness of detection is even more important, because an attack that is detected too late will already have caused significant damage by the time a response is made.

This paper presents a data collection framework that enables real-time detection of, and response to, cyber threats. The near-total elimination of local long-term storage of collected data removes a significant time cost. The use of state-of-the-art real-time streaming technologies ensures that data is available to analytics applications as soon as practically possible, enabling those applications to react in real time.
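The abstract describes events being buffered in memory up to a configurable threshold before they are streamed onward. The following Python fragment is a minimal sketch of that idea only, not the authors' implementation. The names `ThresholdBuffer`, `sink`, and the event fields are hypothetical, and the sink here is a plain list standing in for whatever streaming client a real deployment would use.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, str]


class ThresholdBuffer:
    """Hold events in memory and hand them to a sink once a threshold is reached.

    Hypothetical illustration: the class name, the sink callback, and the
    event shape are assumptions, not part of the framework in the paper.
    """

    def __init__(self, threshold: int, sink: Callable[[List[Event]], None]) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.sink = sink
        self._events: Deque[Event] = deque()

    def add(self, event: Event) -> None:
        """Buffer one event and flush as soon as the threshold is reached."""
        self._events.append(event)
        if len(self._events) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Send all buffered events to the sink and empty the buffer."""
        if not self._events:
            return
        batch = list(self._events)
        self._events.clear()
        self.sink(batch)


# Usage: collect the flushed batches in a list instead of a real stream.
batches: List[List[Event]] = []
buffer = ThresholdBuffer(threshold=3, sink=batches.append)
for i in range(7):
    buffer.add({"source": "firewall", "id": str(i)})
buffer.flush()  # push out the final partial batch
print([len(batch) for batch in batches])  # prints [3, 3, 1]
```

A small threshold favours latency and a large one favours throughput, so in this sketch the threshold is the single knob that trades one against the other. No database write sits between capture and delivery.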

The innovation of our solution comes in several ways. First, the system ingests data from multiple source types, including external cyber threat intelligence. This improves the maturity of the overall security operations. Second, the architecture improves on the storage layer by allowing in-memory analytics, which shortens the overall detection and response time. Third, the architecture embraces modern technologies to enable real-time streaming and analysis of security events, mitigating technology limitations prevalent in state-of-the-art solutions.
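To illustrate why ingesting external threat intelligence alongside internal events can help, the sketch below matches in-memory events against a set of known-bad addresses. It is an assumption-laden example rather than the paper's correlation method: the function `correlate`, the field names `sensor` and `src_ip`, and the sample addresses (taken from documentation-reserved ranges) are all hypothetical.

```python
from typing import Dict, Iterable, List, Set

Event = Dict[str, str]


def correlate(events: Iterable[Event], bad_ips: Set[str]) -> List[Event]:
    """Return copies of the events whose source address appears in bad_ips.

    Hypothetical helper: a real system would likely correlate on more
    attributes than a single IP address.
    """
    matches: List[Event] = []
    for event in events:
        if event.get("src_ip") in bad_ips:
            # Annotate a copy so the original event is left untouched.
            matches.append({**event, "intel_match": "true"})
    return matches


# Assumed threat intelligence feed, reduced to a set of addresses.
threat_intel = {"203.0.113.7"}

# Assumed events from two different internal sources.
events = [
    {"sensor": "firewall", "src_ip": "198.51.100.4"},
    {"sensor": "ids", "src_ip": "203.0.113.7"},
]

flagged = correlate(events, threat_intel)
print(flagged)
```

Because both the events and the indicator set are held in memory, the lookup is a set membership test per event and needs no database round trip.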

Our contribution is a flexible, scalable, extensible, multi-source architecture and framework for data collection that enables timely detection of, and response to, security threats.

The rest of this paper is organized as follows:

First, we review recent research in data collection for cyber security, critically analyzing it and highlighting the research gaps that this paper addresses. Then we present our proposed framework, highlighting the architectural pillars that differentiate our work. We then illustrate an implementation based on our framework, followed by experiments and results. Finally, we conclude and propose future work.

The research literature describes a range of data collection methods and technologies.

The collection module proposed in (Razaq et al., 2016) populates security-related data in a local MySQL database, after which a Hadoop Sqoop job exports the data to an off-site data store based on the Hadoop Distributed File System (HDFS). Analytics applications then run on top of the data in the Hadoop system.

The real-time cyber threat detection platform in (Carvalho et al., 2016) collects data from both internal and external sources. After some pre-processing, the data is loaded into multiple databases according to data type (Malware Database, Social Media Database, Email Database, etc.). Big data analytics is then applied using machine learning algorithms that are trained on, and detect threats in, real-time data flows.

Open-source technologies are used in (R. More et al., 2017) to detect threats in real time. Captured sensor data is uploaded to Apache Hadoop clusters, where Apache Mahout is then used for training and classification.
