Unsupervised Learning and Online Anomaly Detection: An On-Condition Log-Based Maintenance System

Leticia Decker, Daniel Leite, Francesco Minarini, Simone Rossi Tisbeni, Daniele Bonacorsi
DOI: 10.4018/IJERTCS.302112

Abstract

The Large Hadron Collider (LHC) demands a huge amount of computing resources to deal with petabytes of data generated from High Energy Physics (HEP) experiments and with user logs, which report user activity within the supporting Worldwide LHC Computing Grid (WLCG). A surge of data and information is expected from the scheduled LHC upgrade: the workload of the WLCG should increase tenfold in the near future. Autonomous system maintenance by means of log mining and machine learning algorithms is of utmost importance to keep the computing grid functional. The aim is to detect software faults, bugs, threats, and infrastructural problems. This paper describes a general-purpose solution to anomaly detection in computer grids using unstructured, textual, and unlabeled data. The solution consists of recognizing periods of anomalous activity based on content and information extracted from user log events. The study compares One-class SVM, Isolation Forest (IF), and Local Outlier Factor (LOF). IF provides the best fault-detection accuracy, 69.5%.

Introduction

Separating High-Energy Physics (HEP) developments and experiments from computational approaches to data analysis is currently an infeasible task. For instance, the Large Hadron Collider (LHC) at CERN in Geneva produces several petabytes of data yearly from particle collision experiments and simulations. Exabytes of data must be processed, including metadata and data from a posteriori analysis. Therefore, a huge amount of computing resources is needed for data storage and to support a computing throughput of around $10^5$ tasks per day, together with an increasing demand for efficient data sharing among computing centers through high-speed networks.

The Worldwide LHC Computing Grid (WLCG) has been created to support HEP experiments at CERN. The grid infrastructure is an essential asset to support the LHC discoveries. Nonetheless, grid resource requests are expected to grow sharply in the near future due to a scheduled LHC upgrade that aims to increase the experiment's luminosity by a factor of 10 over its current value, increasing the amount of data to process. In HEP scattering, luminosity $\mathcal{L}$ is the ratio of events $N$ detected through a cross-section $\sigma$ over a period of time $t$, i.e., $\mathcal{L} = \frac{1}{\sigma}\frac{dN}{dt}$. An increase in $\mathcal{L}$ generates at least a linear, or polynomial, increase in the amount of data and, consequently, a polynomial increase in the workload of computing centers across the grid. A complex technological challenge is envisioned, namely, to keep the grid infrastructure working throughout the Run-3 and Run-4 stages of the High-Luminosity LHC project (HL-LHC) (Di Girolamo et al., 2020; Herr & Muratori, 2006).
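As a rough illustration of why a tenfold luminosity increase implies at least a tenfold increase in recorded events, the short sketch below uses the standard relation in which the event rate equals luminosity times cross-section, so that at constant luminosity the number of events over a time $t$ is $N = \mathcal{L}\,\sigma\,t$. The function name, the baseline luminosity, and the 1 pb cross-section are illustrative assumptions, not values taken from the article.

```python
# Illustrative sketch: all numeric values below are assumed, not from the article.

BARN_CM2 = 1.0e-24  # one barn expressed in cm^2


def expected_events(luminosity_cm2_s, cross_section_cm2, seconds):
    """Events produced at constant luminosity: N = L * sigma * t."""
    return luminosity_cm2_s * cross_section_cm2 * seconds


baseline_luminosity = 1.0e34          # cm^-2 s^-1, assumed baseline value
cross_section = 1.0e-12 * BARN_CM2    # hypothetical process with sigma = 1 pb
one_day = 86_400.0                    # seconds

baseline = expected_events(baseline_luminosity, cross_section, one_day)
upgraded = expected_events(10 * baseline_luminosity, cross_section, one_day)
ratio = upgraded / baseline

print(f"events per day at baseline: {baseline:.1f}")
print(f"events per day after a 10x luminosity increase: {upgraded:.1f}")
print(f"ratio: {ratio:.1f}")
```

The linear scaling shown here is only the lower bound mentioned in the text; the downstream data volume and grid workload may grow faster once simulation, metadata, and reprocessing are included.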

The HEP Software Foundation (HSF) released a road-map document describing the actions needed to prepare the grid to support the HL-LHC upgrade (Albrecht et al., 2019). As a result, the Operational Intelligence group (OpInt) was created as a task force to improve the WLCG quality of service (QoS). Its main research line uses data analytics and log data mining to develop and mature machine learning (ML) tools for event-oriented maintenance systems. Many ad-hoc solutions have been promoted by the OpInt group, from log parsing to diagnostic systems, such as the real-time anomaly detection approaches developed to assist the computing center of the Italian Institute of Nuclear Physics (INFN-CNAF) (de Sousa et al., 2019). Such ML algorithms aim to reduce system downtime and optimize the usage of resources.
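To make the kind of comparison named in the abstract concrete, the sketch below runs the three unsupervised detectors (One-class SVM, Isolation Forest, and Local Outlier Factor) from scikit-learn on a handful of log lines. It is a minimal sketch under stated assumptions, not the authors' pipeline: the log messages are invented, TF-IDF is an assumed feature extraction step, the hyperparameters are arbitrary, and each line is scored individually, whereas the paper targets periods of anomalous activity.

```python
# Illustrative sketch only: toy log lines, features, and parameters are assumptions.
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Hypothetical log messages; real grid logs are far larger and noisier.
log_lines = [
    "user alice submitted job 1001",
    "user bob submitted job 1002",
    "user carol submitted job 1003",
    "user alice finished job 1001",
    "user bob finished job 1002",
    "user carol finished job 1003",
    "user dave submitted job 1004",
    "user dave finished job 1004",
    "kernel panic: storage daemon unreachable, retry limit exceeded",
    "user erin submitted job 1005",
]

# Turn unstructured text into numeric vectors (TF-IDF is an assumed choice).
features = TfidfVectorizer().fit_transform(log_lines).toarray()

# The three detectors compared in the study, with arbitrary settings.
detectors = {
    "One-class SVM": OneClassSVM(nu=0.1, gamma="scale"),
    "Isolation Forest": IsolationForest(contamination=0.1, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=3, contamination=0.1),
}

# fit_predict needs no labels and returns +1 for inliers, -1 for outliers.
predictions = {
    name: detector.fit_predict(features) for name, detector in detectors.items()
}

for name, labels in predictions.items():
    flagged = [line for line, label in zip(log_lines, labels) if label == -1]
    print(f"{name}: {len(flagged)} line(s) flagged")
```

On a toy set this small the three detectors may disagree about which lines are anomalous, which is why a comparison on labeled evaluation periods, as reported in the abstract, is needed to rank them.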
