Parallel Distributed Patterns Mining Using Hadoop MapReduce Framework

Ishak H. A. Meddah (University of Sciences and Technology of Oran “Mohamed Boudiaf” (USTO-MB), Oran, Algeria) and Khaled Belkadi (University of Sciences and Technology of Oran “Mohamed Boudiaf” (USTO-MB), Oran, Algeria)
Copyright: © 2017 | Pages: 16
DOI: 10.4018/IJGHPC.2017040105

Abstract

Processing large volumes of data is becoming more difficult along several dimensions, and the MapReduce framework offers a solution to this problem. It makes it possible to analyze and process vast amounts of data by distributing the computational work across a cluster of virtual servers running in a cloud or across a large set of machines. Process mining, in turn, provides an important bridge between data mining and business process analysis: its techniques extract information from event logs. In general, process mining has two steps: correlation definition or discovery, and process inference or composition. First, the authors mine small patterns from log traces. These patterns represent the execution traces recorded in the log file of a business process. In this step, they use existing techniques. The patterns are represented by finite state automata or by their regular expressions. The final model is a combination of only two types of small patterns, represented by the regular expressions (ab)* and (ab*c)*. Second, the authors compute these patterns in parallel and then combine them using the MapReduce framework. The work has two parts: the map step, in which they mine patterns from execution traces, and the reduce step, in which they combine these small patterns. The authors' results are promising: they show that the approach is scalable, general, and precise, and that using the MapReduce framework reduces execution time.
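
The map and reduce steps described in the abstract can be illustrated with a minimal in-memory sketch in plain Python. It is not the authors' Hadoop implementation: the function names (map_trace, shuffle, reduce_pair, mine) are hypothetical, only the (ab)* pattern is mined, and the reduce step simply keeps the event pairs that satisfy the pattern in every trace rather than applying the composition step the abstract refers to.

```python
import re
from collections import defaultdict
from itertools import permutations

# The small pattern (ab)*: 'a' and 'b' strictly alternate, starting with 'a'.
AB_STAR = re.compile(r"(ab)*")


def map_trace(trace):
    """Map step (hypothetical): for one trace, emit ((a, b), holds) for
    every ordered pair of distinct events that occur in the trace."""
    events = sorted(set(trace))
    for a, b in permutations(events, 2):
        # Project the trace onto {a, b} and rename the events to 'a' / 'b'.
        projected = "".join("a" if e == a else "b" for e in trace if e in (a, b))
        yield (a, b), AB_STAR.fullmatch(projected) is not None


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_pair(values, n_traces):
    """Reduce step (simplifying assumption): keep a pair only if both events
    occurred in every trace and the pattern held each time."""
    return len(values) == n_traces and all(values)


def mine(traces):
    """Run map, shuffle, and reduce sequentially over a list of traces."""
    emitted = [kv for trace in traces for kv in map_trace(trace)]
    grouped = shuffle(emitted)
    return sorted(k for k, v in grouped.items() if reduce_pair(v, len(traces)))
```

For example, mine([["lock", "unlock", "lock", "unlock"], ["lock", "log", "unlock"]]) returns [("lock", "unlock")], because that is the only ordered pair whose projection matches (ab)* in both traces. In a real cluster each call to map_trace would run on a different node, which is where the parallel speed-up would come from.
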
Article Preview

Many techniques have been proposed in the domain of process mining, including the following:

M. Gabel et al. (Gabel & Su, 2008) present a general technique for mining temporal specifications. Their work proceeds in two steps: first they discover simple patterns using existing techniques, and then they combine these patterns using composition rules such as the Branching and Sequencing rules.

A temporal specification expresses formal correctness requirements on the order in which an application performs specific actions and events during execution. The authors discover patterns from execution traces or program source code. The simple patterns are represented by the regular expressions (ab)* or (ab*c)* and by the corresponding finite state automata; the simple patterns are then combined to construct a temporal specification, itself expressed as a finite state automaton.
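
As an illustration of the two simple patterns, the following Python sketch checks whether a single trace satisfies (ab)* or (ab*c)* once the trace is projected onto the events of interest. The helper names project and holds, and the event names in the usage note, are hypothetical and chosen only for this example; the sketch says nothing about how the cited work discovers candidate events.

```python
import re


def project(trace, roles):
    """Drop events not listed in roles and rename the rest to their letters.

    roles maps an event name to the letter it plays in the pattern,
    for example {"open": "a", "close": "b"}.
    """
    return "".join(roles[event] for event in trace if event in roles)


def holds(trace, pattern, roles):
    """Return True if the projected trace matches the pattern in full."""
    return re.fullmatch(pattern, project(trace, roles)) is not None
```

With trace = ["open", "read", "read", "close", "log", "open", "close"], the call holds(trace, r"(ab*c)*", {"open": "a", "read": "b", "close": "c"}) returns True, since the projection is "abbcac". The call holds(trace, r"(ab)*", {"open": "a", "close": "b"}) also returns True, while a trace with two consecutive "open" events fails the (ab)* check.
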

G. Greco et al. (Greco et al., 2006) discover several clusters using a clustering technique, compute a pattern from each cluster, and combine these patterns to construct a final model. They discover a workflow schema and then mine a workflow using a Mine Workflow Algorithm, after which they define clusters from the log traces using a clustering technique, a Process Discover Algorithm, and a set of clustering rules.
