Protein Classification Using N-gram Technique and Association Rules


Fatima Kabli, Reda Mohamed Hamou, Abdelmalek Amine
Copyright: © 2018 | Pages: 13
DOI: 10.4018/IJSI.2018040106


The knowledge extraction process from biological data is receiving increasing attention; it addresses general tasks such as clustering, classification and association. Protein classification is an important activity that helps biologists respond to biological needs. For this reason, the authors present a global framework, inspired by the knowledge extraction process from biological data, to classify proteins from their primary structure based on association rules. The framework has three main steps. The first, the pre-processing phase, extracts descriptors using the N-gram technique. The second extracts association rules by applying the Apriori algorithm. The third selects the relevant rules and applies the classifier. Experiments were performed on five protein classes extracted from the UniProt databank, and the results were compared with those of five classification methods available in the WEKA platform. The results obtained satisfied the authors' purpose of proposing an effective protein classifier supported by the N-gram technique and the Apriori algorithm.
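As a rough illustration of the pre-processing step, the following sketch extracts overlapping N-grams from an amino-acid sequence and turns each protein into the set of its distinct N-grams, which can then serve as one transaction for association rule mining. The value n = 3, the toy sequence, and the helper names `ngram_descriptors` and `to_transaction` are hypothetical choices for illustration; the abstract does not state which N or which descriptor encoding the authors actually use.

```python
from collections import Counter


def ngram_descriptors(sequence: str, n: int = 3) -> Counter:
    """Count the overlapping n-grams (length-n substrings) of a protein sequence."""
    sequence = sequence.upper()
    # A sequence of length L yields L - n + 1 overlapping n-grams (none if L < n).
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def to_transaction(sequence: str, n: int = 3) -> frozenset:
    """Represent one protein as the set of its distinct n-grams (one transaction).

    Assumption: only presence/absence of each n-gram is kept, not its count.
    """
    return frozenset(ngram_descriptors(sequence, n))


if __name__ == "__main__":
    seq = "MKVLAAGMKV"  # hypothetical toy sequence, not taken from UniProt
    print(ngram_descriptors(seq).most_common(3))
    print(sorted(to_transaction(seq)))
```

Keeping only the set of distinct N-grams matches the transaction model used by Apriori (an item is either present or absent); a count-based encoding would need a different mining algorithm or a discretization step.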

2. State of the Art

Association rule extraction is an important task that is particularly well suited to discovering relations between sets of elements in a database. In its most common formulation, an association rule is defined as follows: let I = {i1, i2, ..., im} be a set of items and T = {t1, t2, ..., tn} a set of transactions, each associated with a subset of I. An association rule is an implication X → Y, where X, Y ⊆ I and X ∩ Y = ∅. Each rule is characterized by two measures, support and confidence, which respectively capture the scope and the precision of the rule: the support of X → Y is the fraction of transactions that contain X ∪ Y, and its confidence is the fraction of transactions containing X that also contain Y, i.e. supp(X ∪ Y) / supp(X).
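The two measures can be computed directly from their standard definitions. The sketch below is a minimal illustration assuming transactions are represented as Python `frozenset`s of item labels; the function names and the five toy transactions are hypothetical and only serve to make the definitions concrete.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """conf(X -> Y) = supp(X ∪ Y) / supp(X); X and Y must be disjoint."""
    if x & y:
        raise ValueError("X and Y must be disjoint")
    supp_x = support(x, transactions)
    return support(x | y, transactions) / supp_x if supp_x else 0.0


if __name__ == "__main__":
    # Hypothetical toy database of five transactions.
    T = [frozenset(s) for s in ({"A", "B", "C"}, {"A", "B"}, {"A", "C"},
                                {"B", "C"}, {"A", "B", "C"})]
    ab, c = frozenset({"A", "B"}), frozenset({"C"})
    print(support(ab | c, T))    # support of rule AB -> C
    print(confidence(ab, c, T))  # confidence of rule AB -> C
```

Here {A, B} appears in 3 of the 5 transactions and {A, B, C} in 2, so the rule AB → C has support 2/5 and confidence 2/3.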

For example, the rule AB → C with support = 20% and confidence = 80% indicates that when A and B occur, C also occurs in 80% of cases, and that all three occur together in 20% of all instances. The user sets a minimum support threshold and a minimum confidence threshold for rule generation; only rules that meet both thresholds are retained.
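The role of the two thresholds can be illustrated with a compact version of the textbook Apriori procedure: a level-wise search keeps every itemset whose support reaches the minimum support, and rules are then generated from those itemsets and filtered by minimum confidence. This is a generic sketch, not the authors' implementation or WEKA's; the function names and the 20-transaction toy database (built only so that AB → C reproduces the 20% / 80% figures above) are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float, float]  # (X, Y, support, confidence)


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Level-wise search for every itemset whose support is >= min_support."""
    n = len(transactions)

    def supp(s: Itemset) -> float:
        return sum(1 for t in transactions if s <= t) / n

    current = {frozenset([i]) for t in transactions for i in t}  # 1-itemsets
    frequent: Dict[Itemset, float] = {}
    k = 1
    while current:
        level = {s: v for s, v in ((s, supp(s)) for s in current) if v >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-candidates, then prune any candidate
        # with an infrequent (k-1)-subset (the Apriori / anti-monotonicity property).
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        current = {c for c in candidates
                   if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
    return frequent


def generate_rules(frequent: Dict[Itemset, float], min_confidence: float) -> List[Rule]:
    """Emit X -> Y for every frequent itemset split whose confidence is high enough."""
    rules: List[Rule] = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, s, conf))
    return rules


if __name__ == "__main__":
    # Hypothetical toy database: AB occurs in 5 of 20 transactions, ABC in 4.
    db = [frozenset("ABC")] * 4 + [frozenset("AB")] + [frozenset("D")] * 15
    freq = apriori(db, min_support=0.2)
    for x, y, s, c in generate_rules(freq, min_confidence=0.8):
        print(sorted(x), "->", sorted(y), f"support={s:.2f}", f"confidence={c:.2f}")
```

Raising either threshold shrinks the output: with a minimum confidence above 0.8, for instance, AB → C is discarded even though its support still qualifies.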
