Protein Classification Using N-gram Technique and Association Rules

Fatima Kabli (GeCode Laboratory, Department of Computer Science, Tahar Moulay University of Saïda, Saïda, Algeria), Reda Mohamed Hamou (GeCode Laboratory, Department of Computer Science, Tahar Moulay University of Saïda, Saïda, Algeria) and Abdelmalek Amine (GeCode Laboratory, Department of Computer Science, Tahar Moulay University of Saïda, Saïda, Algeria)
Copyright: © 2018 | Pages: 13
DOI: 10.4018/IJSI.2018040106

Abstract

Knowledge extraction from biological data is receiving increasing attention; it addresses general tasks such as grouping, classification, and association. Protein classification is an important activity that helps biologists respond to biological needs. For this reason, the authors present a global framework, inspired by the knowledge extraction process from biological data, that classifies proteins from their primary structure based on association rules. The framework has three main steps. The first, the pre-processing phase, consists of extracting descriptors using the N-gram technique. The second is the extraction of association rules by applying the Apriori algorithm. The third is selecting the relevant rules and applying the classifier. The experiments were performed on five classes of protein extracted from the UniProt data bank, and the technique was compared with five classification methods in the WEKA platform. The obtained results satisfied the authors' aim of proposing an effective protein classifier supported by the N-gram technique and the Apriori algorithm.

2. State Of The Art

The extraction of association rules is an important task that is particularly well adapted to discovering relations between sets of elements in a database. In its most common version, an association rule is characterized as follows: let I = {i1, i2, ..., im} be a set of items and T = {t1, t2, ..., tn} a set of transactions, each one associated with a subset of I. An association rule is written X → Y, in which X, Y ⊆ I and X ∩ Y = ∅. Each rule carries two measures, the support and the confidence, which measure respectively the scope and the precision of the rule.
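The text names support and confidence without giving formulas. For reference, the standard definitions can be written as follows; the notation supp and conf is an assumption introduced here and does not come from the article, while I, T, X and Y are as defined above.

```latex
% Standard definitions; the names \operatorname{supp} and \operatorname{conf}
% are notation assumed here, not taken from the article.
\[
\operatorname{supp}(Z) = \frac{\left|\{\, t \in T : Z \subseteq t \,\}\right|}{|T|}
\quad \text{for any itemset } Z \subseteq I,
\]
\[
\operatorname{supp}(X \rightarrow Y) = \operatorname{supp}(X \cup Y),
\qquad
\operatorname{conf}(X \rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} .
\]
```

Under these definitions, support is the share of all transactions in which the whole rule holds, and confidence is the share of transactions containing X that also contain Y.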

For example, the rule AB → C with support = 20% and confidence = 80% indicates that when A and B occur, C also occurs in 80% of cases, and that all three events occur together in 20% of all instances. The user sets a minimum support threshold and a minimum confidence threshold for the generation of rules.
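The figures in this example can be reproduced with a minimal Python sketch. The transaction data below is hypothetical and was chosen only so that the rule AB → C reaches the stated 20% support and 80% confidence. The function names `support` and `confidence` and the threshold variables are illustrative assumptions, not the authors' implementation.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    x, y = frozenset(antecedent), frozenset(consequent)
    if x & y:
        raise ValueError("antecedent and consequent must be disjoint")
    support_x = support(x, transactions)
    if support_x == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(x | y, transactions) / support_x


# Hypothetical toy data: 20 transactions, each a set of single-letter items.
transactions = (
    [frozenset("ABC")] * 4    # A, B and C together
    + [frozenset("AB")] * 1   # A and B without C
    + [frozenset("C")] * 5
    + [frozenset("D")] * 10
)

rule_support = support("ABC", transactions)            # 4 / 20
rule_confidence = confidence("AB", "C", transactions)  # (4 / 20) / (5 / 20)

# User-chosen thresholds (illustrative values).
min_support = Fraction(1, 5)
min_confidence = Fraction(4, 5)
rule_is_kept = rule_support >= min_support and rule_confidence >= min_confidence

print(f"support = {float(rule_support):.0%}, "
      f"confidence = {float(rule_confidence):.0%}, kept = {rule_is_kept}")
```

Exact fractions are used instead of floats so that the comparison against the thresholds is not affected by rounding.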
