A Hybrid Pre-Post Constraint-Based Framework for Discovering Multi-Dimensional Association Rules Using Ontologies

Emad Alsukhni, Ahmed AlEroud, Ahmad A. Saifan
DOI: 10.4018/IJITWE.2019010106

Abstract

Association rule mining is a very useful knowledge discovery technique for identifying co-occurrence patterns in transactional data sets. In this article, the authors propose an ontology-based framework that discovers multi-dimensional association rules at different levels of a given ontology, subject to user-defined pre-processing constraints. These constraints may be identified using 1) a hierarchy discovered in the datasets; 2) the dimensions of those datasets; or 3) the features of each dimension. The proposed framework also has post-processing constraints to drill down or roll up based on the rule level, making it possible to check the validity of the discovered rules in terms of support and confidence without re-applying association rule mining algorithms. The authors conducted several preliminary experiments to test the framework on the Titanic dataset by identifying the association rules after pre- and post-constraints are applied. The results show that the framework can be practically applied for rule pruning and for discovering novel association rules.

1. Introduction

Association rule mining is one of the most important steps in the process of knowledge discovery from a given database (Fayyad, Piatetsky-Shapiro, Smyth, & Uthurusamy, 1996). It was first introduced by Agrawal, Imielinski, and Swami (1993) to discover items that frequently occur together in the transactions of a database. Its applications include marketing (e.g., discovering items purchased together), computer networks, and friend suggestions in social networks.

Association rule algorithms search for items that are frequently purchased together in marketing databases. To qualify, the occurrence of such items must exceed a percentage n% of the transactions in the database, which is known as the minimum support (Yan, Zhang, & Zhang, 2009). Item sets that pass the minimum support cutoff are known as frequent item sets. The frequent item sets are then used to discover association rules, where each rule consists of a rule antecedent and a rule consequent. In order to examine which of the discovered rules are useful for analysis, a rule quality measure such as rule confidence is used. The confidence of a rule is the ratio between the support of the antecedent and consequent taken together and the support of the antecedent alone.
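As a minimal illustration of these definitions, the sketch below computes support, confidence, and frequent item sets by brute force. It is not the authors' implementation: the transactions, the item names, and the function names are assumptions made up for this example, and the exhaustive enumeration is only practical for toy data.

```python
from itertools import combinations

# Hypothetical toy transactions (not taken from the article).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) divided by support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_itemsets(transactions, min_support):
    """Enumerate every item set whose support meets `min_support` (brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

frequent = frequent_itemsets(transactions, min_support=0.5)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

With a minimum support of 50%, the pair {bread, milk} is frequent because it appears in two of the four transactions, and the rule bread => milk has confidence 0.5 / 0.75, since bread alone appears in three of the four.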

In large databases, which consist of millions of transactions, the large number of discovered association rules makes it infeasible to distinguish the rules that contain novel patterns (Hamalainen & Webb, 2017). In addition, many of the rules may be redundant. Rule semantics is another problem: the set of discovered association rules lacks semantic-based relationships (Ferraz & Garcia, 2008).

Discovering more rules increases the probability of identifying useful and unexpected patterns in transactional databases. On the other hand, when more rules are discovered, a post-processing effort is needed to filter out the non-useful rules and keep the useful ones based on the analyst's goals and preferences.

Given the challenges above, we propose an approach that aims at achieving three objectives:

  1. Using ontologies to discover semantically related association rules on multiple dimensions. This approach is very similar to the way fact and dimension tables are used in a data warehouse to discover useful patterns (Kimball, Ross, Mundy, & Thornthwaite, 2015).

  2. Pre-filtering the set of association rules to be discovered. The pre-filtering is done by applying one or more hierarchical constraints on an ontology-mapped representation of the original items dataset. Such triples of constraints are not possible using the traditional matrix representation of the data sets.

  3. Going beyond the set of discovered association rules by rolling up and drilling down on the dimensions of a specific association rule, that is, post-processing the discovered association rules using domain ontologies. This objective represents our main contribution compared to existing approaches. The main purpose of the post-processing step is to discover redundant patterns at different dimensions, or patterns that contradict the original association rules. This can be done by analyzing one or more dimensions of the discovered rules.
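To make the roll-up idea in the third objective concrete, the following sketch re-evaluates a rule after one of its dimensions is replaced by a more general concept. It is an illustration under stated assumptions, not the framework itself: the `PARENT` hierarchy, the made-up records (which are not rows of the Titanic dataset), and the function names are all hypothetical.

```python
# Hypothetical one-level hierarchy for a "class" dimension (child -> parent).
PARENT = {"1st": "upper", "2nd": "upper", "3rd": "lower", "crew": "lower"}

# Made-up records with three dimensions; illustrative only.
records = [
    {"class": "1st", "sex": "female", "survived": "yes"},
    {"class": "1st", "sex": "male", "survived": "no"},
    {"class": "2nd", "sex": "female", "survived": "yes"},
    {"class": "3rd", "sex": "male", "survived": "no"},
    {"class": "3rd", "sex": "female", "survived": "no"},
    {"class": "crew", "sex": "male", "survived": "no"},
]

def roll_up(record, dimension, parent):
    """Return a copy of `record` with `dimension` replaced by its parent concept."""
    out = dict(record)
    out[dimension] = parent.get(out[dimension], out[dimension])
    return out

def matches(record, pattern):
    """True when the record agrees with every dimension/value pair in `pattern`."""
    return all(record.get(dim) == value for dim, value in pattern.items())

def rule_metrics(records, antecedent, consequent):
    """Support and confidence of antecedent => consequent by direct counting."""
    n_antecedent = sum(matches(r, antecedent) for r in records)
    n_both = sum(matches(r, {**antecedent, **consequent}) for r in records)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Rule at the detailed level: class = 1st => survived = yes.
detailed = rule_metrics(records, {"class": "1st"}, {"survived": "yes"})

# The same rule rolled up one level: class = upper => survived = yes.
rolled_records = [roll_up(r, "class", PARENT) for r in records]
rolled = rule_metrics(rolled_records, {"class": "upper"}, {"survived": "yes"})
```

In this sketch the rolled-up rule is checked by counting matching records directly, so no frequent item set search is repeated. Drilling down would go the other way, replacing a general concept in the rule with each of its children and comparing the resulting measures.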

Our research presents a robust approach for mining association rules at multiple dimensions and under multiple constraints, using the power of ontologies that represent the original dataset (Galarraga, Teflioudi, Hose, & Suchanek, 2013). The goal is to improve association rule pruning techniques by using ontologies, which contain general concepts with meaningful relationships between them. Using domain ontologies serves several objectives. First, ontologies can play a major role in rule pruning by removing redundant rules at different levels. Second, the semantics of the original relationships between different dimensions can be identified. Finally, a domain ontology is a very useful representation method for data with multiple dimensions (Galarraga et al., 2013).
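One way ontology levels can support rule pruning is sketched below. The redundancy criterion is an assumption chosen for illustration and is not claimed to be the one used in the framework: a rule is dropped when its generalization (antecedent items replaced by their parent concepts) is also present with nearly the same confidence. The `PARENT` links, the item names, the confidence values, and the `tolerance` parameter are all hypothetical.

```python
# Hypothetical child -> parent links standing in for a domain ontology.
PARENT = {"apple": "fruit", "pear": "fruit", "kale": "vegetable"}

def generalize(rule):
    """Replace every antecedent item by its parent concept, when it has one."""
    antecedent, consequent = rule
    return (frozenset(PARENT.get(item, item) for item in antecedent), consequent)

def prune_redundant(rules, tolerance=0.05):
    """Keep a rule unless its generalization exists with a similar confidence.

    `rules` maps (antecedent, consequent) pairs to confidence values.
    """
    kept = {}
    for rule, conf in rules.items():
        general = generalize(rule)
        is_specialization = general != rule and general in rules
        if is_specialization and abs(rules[general] - conf) <= tolerance:
            continue  # the more general rule already conveys this pattern
        kept[rule] = conf
    return kept

# Made-up confidences for illustration; not results from the article.
rules = {
    (frozenset({"fruit"}), "milk"): 0.62,
    (frozenset({"apple"}), "milk"): 0.63,
    (frozenset({"pear"}), "milk"): 0.41,
    (frozenset({"kale"}), "milk"): 0.25,
}
pruned = prune_redundant(rules)
```

Here the apple rule is removed because the fruit rule has almost the same confidence, while the pear rule is kept because it deviates noticeably from its generalization and may therefore carry information of its own.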
