Introduction
The Internet of Things (IoT) and cloud computing are used in many real-time smart applications such as smart healthcare, smart traffic, smart industries and smart cities. To reduce the communication delay between the cloud and IoT devices, Cisco introduced fog computing (Yousefpour et al., 2019) as an intermediate computing infrastructure. To improve the performance of fog computing, a monitoring system is required that keeps track of all the activities and behavior of the fog infrastructure and its associated IoT devices (Birje & Bulla, 2019). A monitoring system is also used in predictive maintenance systems to detect and predict faulty or deviating behavior of fog nodes and IoT devices (Birje & Manvi, 2011). Anomaly detection and root cause analysis models play a vital role in improving the performance of smart applications such as smart industries and smart healthcare systems.
Anomaly detection techniques find unknown patterns or outliers in unlabeled data when something unusual occurs or when conditions deviate from normal behavior. Root cause analysis (RCA) is a systematic process for understanding the reason behind anomalies or faults; it helps the operator diagnose the problem and resolve the issue within a short period of time (Singh, 2020). Root cause analysis allows end users to accurately identify anomalies and their root causes so as to avoid failures that may occur in the future. The following requirements have to be satisfied for effective root cause analysis (Steenwinckel et al., 2021): i) Accuracy: the RCA should give the accurate cause of the problem and reduce false positives; ii) Minimal human effort: the RCA should work automatically, without human involvement; iii) Context aware: the RCA should provide context to increase performance; iv) Adaptive: the RCA system should be capable of adapting its detection behavior to changing conditions; v) Interpretable: the user of the system must be able to easily understand the failures and their causes in order to plan appropriate action; vi) Scalable: the RCA must work efficiently even with huge volumes of data.
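As a small illustration of finding outliers in unlabeled data, the sketch below flags readings that deviate strongly from the median, using the modified z-score based on the median absolute deviation (MAD). This is a generic statistical rule chosen for illustration, not the detection method proposed in this paper; the function name `mad_anomalies`, the threshold of 3.5 and the CPU load readings are all assumptions.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: this rule cannot flag anything.
        return []
    # 0.6745 rescales the MAD so that the score is comparable to a
    # standard z-score when the data are roughly normal.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical CPU load readings (percent) from one fog node, with no labels.
cpu_load = [41.0, 39.5, 40.2, 42.1, 40.8, 97.3, 41.5, 39.9]
anomalous = mad_anomalies(cpu_load)
print(anomalous)
```

A rule of this kind needs no labels, which matches the unlabeled setting described above, but it says nothing about why a reading is unusual; that is the role of root cause analysis.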
There are three main techniques (Steenwinckel et al., 2021; Solé et al., 2017) for root cause analysis in fog computing infrastructure: i) Data-driven: identifies an anomaly and its root cause from unusual patterns using a machine learning or deep learning approach; ii) Knowledge-driven: relies on expert knowledge; iii) Hybrid: combines the data-driven and knowledge-driven techniques to meet the requirements of RCA. Each of these techniques suffers from critical issues: the data-driven technique suffers from limited interpretability and accuracy, the knowledge-driven technique is unable to find new types of faults and their causes, and the hybrid model consumes more computational resources.
In the new era of smart industries, anomaly detection and root cause analysis play a vital role in improving the performance of machines and reducing maintenance cost. Anomaly detection identifies abnormal behavior of production machines and outliers or quality deviations in the production line. Root cause analysis of an anomaly may help to resolve or fix the issue. The existing anomaly detection and root cause analysis models do not meet all the requirements and are computationally expensive. Therefore, there is a need for an effective root cause analysis model that provides high interpretability and accuracy with minimum overhead. Existing works have focused on data-driven root cause analysis using the above-mentioned techniques, but have failed to meet requirements such as accuracy, scalability and interpretability. Also, no work has been carried out highlighting the importance of multi-agent views in predictive maintenance systems. Hence, this paper proposes a multi-agent based data-driven root cause analysis model using the SHAP algorithm. The main objectives of the proposed root cause analysis model are: first, to increase the accuracy and reduce the false positives in detecting anomalies in a fog computing environment; and second, to develop a lightweight root cause analysis that fulfills all requirements of root cause analysis with reduced overhead.
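SHAP explains a model output by assigning each input feature a Shapley value. To make that idea concrete, the sketch below computes exact Shapley values for a toy anomaly score by enumerating every feature subset. It is not the proposed multi-agent model and does not use the SHAP library; the scoring function, the feature names (`cpu`, `temp`, `latency`) and the baseline values are hypothetical, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(score, instance, baseline):
    """Exactly attribute score(instance) - score(baseline) to each feature."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Features in `subset` take the observed value; the others stay
        # at their baseline ("normal") value.
        mixed = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return score(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def anomaly_score(x):
    # Hypothetical score: rises with CPU load and with the product of
    # temperature and latency (an interaction between two features).
    return 0.5 * x["cpu"] + 0.01 * x["temp"] * x["latency"]


baseline = {"cpu": 40.0, "temp": 50.0, "latency": 20.0}   # assumed normal state
observed = {"cpu": 42.0, "temp": 90.0, "latency": 80.0}   # assumed anomalous reading

phi = shapley_values(anomaly_score, observed, baseline)
root_cause = max(phi, key=phi.get)
print(phi, root_cause)
```

The attributions sum to the difference between the observed and baseline scores, so the feature with the largest value can be read as the most likely root cause candidate. This additive, per-feature breakdown is what makes Shapley-based explanations interpretable to an operator.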