Introduction
Nowadays, large-scale computing systems are composed of numerous software applications and components running on thousands of operating nodes. The runtime statistics of these systems are continuously gathered and accumulated in the form of log files, which are then analyzed to detect the cause and exact location of issues during system failures and malfunctions (Bao et al., 2018). In general, runtime logging is a standard practice for recording system operational data, helping both developers and support engineers analyze the behavior of systems and track down difficulties that may arise in the future. Log files thus play an essential role in the maintenance and development of software-based computing systems. Additionally, the rich data present in log files facilitate a wide variety of system analytic practices, such as ensuring software security (Latib et al., 2018), analyzing application statistics (Patel & Parikh, 2017), detecting performance anomalies (Vaarandi et al., 2018), and identifying crashes and errors (Suman et al., 2018; Adam et al., 2016).

Despite the wealth of data available in logs, performing effective analysis remains a great challenge for the following reasons. First, modern software systems routinely produce enormous volumes of logs (e.g., a commercial cloud service can generate on the order of gigabytes of data every hour) (Astekin et al., 2019). Such high-volume logs make manual inspection for key diagnostics impractical, even with search and grep utilities. As a result, traditional log analysis methods that depend largely on manual operation have become unfeasible and prohibitively expensive (Jia et al., 2018; Li et al., 2018). Second, the messages in log files are intrinsically unstructured, as developers normally record system activities in a free-text format for better accessibility and flexibility (Rath, 2016).
Therefore, there is a great demand for automated log analysis across all kinds of applications (He et al., 2017; El-Masri et al., 2020). Automated log analysis based on keyword searches with ad hoc scripts, e.g., matching "CRITICAL" or "ERROR", has been found inadequate for diagnosing many problems (Baudart, 2018; Vega et al., 2017). Rule-based methods are more advanced; however, it is difficult to formulate rules covering every case encountered during analysis (Khan & Parkinson, 2018). The drawbacks of these early approaches significantly increased the difficulty of log data analysis. To overcome these limitations, recent studies and industrial tools provide alternatives that combine powerful keyword search with machine learning analytics, such as Splunk (Carasso, 2012), ELK (Smith, 2015), and Logentries (Jaunin & Burdick, 2011). Nevertheless, the first and foremost step in enabling such log analysis is log parsing, through which free-text raw log data are parsed into a stream of structured data (He et al., 2016).
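To make the log parsing step concrete, the following is a minimal sketch of how a free-text log line can be turned into a structured record: a regular expression separates the timestamp, severity level, and message content, and the variable parts of the content are abstracted into a wildcard so that messages sharing the same constant template can be grouped. The log lines and patterns here are illustrative assumptions, not drawn from any specific system or parsing tool cited above.

```python
import re

# Hypothetical free-text log lines (illustrative only).
RAW_LOGS = [
    "2023-04-01 12:00:01 INFO  Connection from 10.0.0.5 established",
    "2023-04-01 12:00:02 ERROR Disk /dev/sda1 usage at 97%",
    "2023-04-01 12:00:03 INFO  Connection from 10.0.0.9 established",
]

# Split each raw line into timestamp, severity level, and free-text content.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<content>.*)$"
)

def parse_line(line):
    """Return a structured record (dict) for one raw log line, or None."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

def template_of(content):
    """Abstract variable parts (paths, numbers, IPs, percentages) into a
    '<*>' wildcard, yielding the constant template shared by similar
    messages."""
    return re.sub(r"(/[\w/.]+|\d+(\.\d+)*%?)", "<*>", content)

records = [parse_line(line) for line in RAW_LOGS]
for record in records:
    record["template"] = template_of(record["content"])
```

After this step, the first and third lines map to the same template ("Connection from <*> established") even though their IP addresses differ, which is exactly the kind of structure downstream machine learning analytics operate on. Production log parsers infer such templates automatically rather than relying on hand-written patterns like these.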