Discovering Process Horizontal Boundaries to Facilitate Process Comprehension

Pavlos Delias (Department of Accounting and Finance, Eastern Macedonia and Thrace Institute of Technology, Kavala, Greece) and Kleanthi Lakiotaki (Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece)
DOI: 10.4018/IJORIS.2018040101

Abstract

Automated discovery of a process model is a major task of Process Mining that aims to produce a process model from an event log without any a-priori information. However, when an event log contains a large number of distinct activities, process discovery can be really challenging. The goal of this article is to facilitate process discovery in cases where a process is expected to contain a large set of unique activities. To this end, this article proposes a clustering approach that recommends horizontal boundaries for the process. The proposed approach ultimately partitions the event log so that the human interpretation effort is decomposed. In addition, it makes automated discovery both more efficient and more effective by simultaneously considering two quality criteria: the informativeness and the robustness of the derived groups of activities. The authors conducted several experiments to test the behavior of the algorithm under different settings and to compare it against other techniques. Finally, they provide a set of recommendations that may help process analysts during the process discovery endeavor.

Introduction

The idea of process mining is to discover, monitor and improve fact-based processes by extracting knowledge from event logs readily available in today's systems. There are two main drivers for the growing interest in process mining. On the one hand, more and more events are being recorded, thus providing detailed information about the history of processes. On the other hand, there is a need to improve and support business processes in competitive and rapidly changing environments (Van der Aalst et al., 2012). Discovery is the type of process mining that takes unstructured data (an event log) and, without any a-priori information, produces a process model (Van der Aalst, 2011, p. 10). The focus of this paper is to facilitate the discovery endeavor through a clustering approach. The proposed method is useful when a process is expected to contain a large set of unique event classes. This work aims to provide computerized facilitation to process analysts, following the Decision Support Systems paradigm. Facilitation occurs by a) suggesting a way to divide and conquer (Carmona et al., 2009a) the log file, i.e., to define horizontal boundaries for the global process, b) discovering marginal yet meaningful process models with virtually any technique, and c) providing recommendations for merging the marginal models into an overall process model.
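
To make the idea of horizontal boundaries concrete, the following Python sketch shows one common way to decompose an event log once its activities have been grouped: each trace is projected onto each group, producing one sublog per group that can then be mined separately. It assumes a log represented as plain lists of activity labels; the function name project_log, the activity names and the grouping are all hypothetical, and the grouping is hard-coded here rather than produced by the clustering approach this article proposes.

```python
from typing import Dict, List, Sequence, Set

# A trace is a sequence of activity labels; an event log is a list of traces.
Trace = List[str]


def project_log(log: Sequence[Trace], clusters: Dict[str, Set[str]]) -> Dict[str, List[Trace]]:
    """Project every trace onto each activity cluster, yielding one sublog per cluster.

    Events whose activity is outside a cluster are dropped from that cluster's
    sublog; the relative order of the remaining events is preserved.
    """
    sublogs: Dict[str, List[Trace]] = {name: [] for name in clusters}
    for trace in log:
        for name, activities in clusters.items():
            projected = [a for a in trace if a in activities]
            if projected:  # skip traces that never touch this cluster
                sublogs[name].append(projected)
    return sublogs


if __name__ == "__main__":
    log = [
        ["register", "check", "approve", "ship", "invoice"],
        ["register", "check", "reject"],
        ["register", "approve", "check", "ship", "invoice"],
    ]
    # Hypothetical grouping of activities into two "horizontal" parts of the process.
    clusters = {
        "intake": {"register", "check", "approve", "reject"},
        "fulfilment": {"ship", "invoice"},
    }
    for name, sublog in project_log(log, clusters).items():
        print(name, sublog)
```

Each sublog contains fewer distinct activities than the full log, which is what makes the marginal discovery problems smaller; relating the resulting marginal models back to each other is a separate step.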

Process discovery is one of the most challenging tasks in process mining. State-of-the-art techniques still have problems dealing with large and/or complex event logs and process models (Van der Aalst, 2012a). Complexity, whether due to a large number of distinct activities comprising the process or to a huge number of traces in the event log (the Big Data case), is the main reason why the discovery problem is hard, since most discovery techniques are exponential in the number of activities. These techniques struggle, or even fail to deliver results in reasonable time, when an event log contains a large number of unique event classes. Moreover, process discovery is hard because the process is typically quite unstructured and/or the log that an analyst can obtain is not complete. A large number of activities makes these issues even harder to cope with. Researchers identified early on the need for process mining to keep pace with the Big Data reality, making the decomposition of process mining problems an emerging topic of the field (Van der Aalst, 2012a; Van der Aalst, 2013a; Verbeek & Van der Aalst, 2013; Van der Aalst, 2013b; Munoz-Gama et al., 2013b; Munoz-Gama et al., 2013a).
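
As a rough illustration of why the number of activities matters, the sketch below counts the naive search space of the Alpha algorithm, which considers pairs (A, B) of non-empty activity sets when looking for places, giving (2^n - 1)^2 candidate pairs for n activities. It then compares this with splitting the activities into k equally sized groups. This is an assumption-laden upper bound, not a runtime measurement: real implementations prune the space, and the comparison ignores the cost of relating and merging the groups afterwards. The function names are hypothetical.

```python
def alpha_candidate_pairs(n: int) -> int:
    """Number of (A, B) pairs of non-empty activity subsets over n activities.

    This is the naive space the Alpha algorithm filters when searching for
    places; actual implementations prune it, so treat it as an upper bound.
    """
    return (2 ** n - 1) ** 2


def decomposed_pairs(n: int, k: int) -> int:
    """Same count when n activities are split into k equal-sized groups.

    Assumes k divides n and ignores any pairs that span two groups.
    """
    if n % k != 0:
        raise ValueError("this sketch assumes k divides n")
    return k * alpha_candidate_pairs(n // k)


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, alpha_candidate_pairs(n), decomposed_pairs(n, 4))
```

For n = 16 and k = 4, the count falls from (2^16 - 1)^2, roughly 4.3 billion, to 4 * 15^2 = 900, which is the kind of reduction that motivates decomposing the discovery problem.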

An additional reason why discovery is hard is that, when producing a process model, one must consider multiple criteria. The need to consider multiple criteria in all process mining tasks is well documented in the literature (Buijs et al., 2012; Adriansyah et al., 2011b; Rozinat & Van der Aalst, 2008; Van der Aalst et al., 2008). Four main quality criteria have been identified for discovered process models: fitness (the model should be able to replay the observed behavior), precision (it should not allow too much additional behavior), generalization (it should avoid overfitting) and simplicity (it should not increase, beyond what is necessary, the number of entities required to explain the behavior). Since these criteria conflict, discovery methods must find a way to trade off their requirements. Despite its difficulty, however, the discovery problem remains interesting because organizations need process models that reflect the real processes for documentation, verification, performance analysis, etc. (Lee et al., 2013). These needs are reflected in the growing interest from Business Process Management vendors, consultants, and researchers.
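
Because the four criteria conflict, one simple way to reason about the trade-off is to keep only candidate models that no other candidate beats on all criteria at once (a Pareto front). The Python sketch below assumes each candidate has already been scored in [0, 1] on every criterion, with higher being better; the candidate names and scores are hypothetical, and Pareto filtering is only one of several possible trade-off strategies (a weighted sum is another), not the method of this article.

```python
from typing import Dict, List

CRITERIA = ("fitness", "precision", "generalization", "simplicity")


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if a is at least as good as b on every criterion and strictly better on one.

    All criteria are assumed to be scores where higher is better.
    """
    at_least_as_good = all(a[c] >= b[c] for c in CRITERIA)
    strictly_better = any(a[c] > b[c] for c in CRITERIA)
    return at_least_as_good and strictly_better


def pareto_front(models: Dict[str, Dict[str, float]]) -> List[str]:
    """Names of candidate models not dominated by any other candidate."""
    return [
        name
        for name, score in models.items()
        if not any(
            dominates(other_score, score)
            for other_name, other_score in models.items()
            if other_name != name
        )
    ]


if __name__ == "__main__":
    # Hypothetical quality scores for four candidate models mined from the same log.
    candidates = {
        "flower": {"fitness": 1.0, "precision": 0.1, "generalization": 0.9, "simplicity": 0.9},
        "enumerate": {"fitness": 1.0, "precision": 1.0, "generalization": 0.1, "simplicity": 0.1},
        "balanced": {"fitness": 0.9, "precision": 0.8, "generalization": 0.7, "simplicity": 0.7},
        "worse": {"fitness": 0.8, "precision": 0.7, "generalization": 0.6, "simplicity": 0.6},
    }
    print(pareto_front(candidates))
```

With these made-up scores, the sketch keeps the flower-like, the trace-enumerating and the balanced candidates, and discards the one that is worse than the balanced model on every criterion; choosing among the remaining candidates still requires an analyst's judgment or additional preferences.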
