Bagging Approach for Medical Plants Recognition Based on Their DNA Sequences

Mohamed Elhadi Rahmani (GeCoDe Laboratory, Dr. Tahar Moulay University of Saida, Saida, Algeria), Abdelmalek Amine (GeCoDe Laboratory, Dr. Tahar Moulay University of Saida, Saida, Algeria) and Reda Mohamed Hamou (GeCoDe Laboratory, Department of Computer Science, University of Dr. Tahar Moulay, Saida, Algeria)
DOI: 10.4018/IJSESD.2018100103

Abstract

Many drugs in modern medicine originate from plants, and the first step in drug production is recognizing the plants needed for this purpose. This article presents a bagging approach for recognizing medical plants based on their DNA sequences. The authors developed a system that recognizes the DNA sequences of 14 medical plants. They first divided the 14-class data set into two-class sub-data sets; then, instead of applying an algorithm to the full 14-class data set, they applied the same algorithm to each sub-data set. This reduces the problem of classifying 14 plants to a set of two-class sub-problems. To construct the subsets, the authors extracted all possible pairs of the 14 classes, which gives each class more chances to be predicted correctly. The approach also makes it possible to study the similarity between the DNA sequences of one plant and those of every other plant. A new sequence is classified by majority vote. With this approach, accuracy nearly doubled, from 45% to almost 80%.
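
The pairwise decomposition and majority vote described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the preview does not name the base algorithm or the sequence features, so the NearestCentroid learner, the base_composition feature map, and the toy sequences are all assumptions made for the example. With 14 classes, taking all possible pairs gives 14 × 13 / 2 = 91 two-class sub-problems.

```python
from collections import Counter
from itertools import combinations


class NearestCentroid:
    """Toy two-class base learner (an assumption, not the article's algorithm)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
        # Mean feature vector of each class seen during training.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, features):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def base_composition(sequence):
    """Hypothetical feature map: fraction of A, C, G and T in the sequence."""
    counts = Counter(sequence.upper())
    return [counts[base] / len(sequence) for base in "ACGT"]


def train_pairwise(X, y):
    """Train one two-class model for every unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        pairs = [(f, label) for f, label in zip(X, y) if label in (a, b)]
        feats, labels = zip(*pairs)
        models[(a, b)] = NearestCentroid().fit(feats, labels)
    return models


def predict_majority(models, features):
    """Each pairwise model casts one vote; the most voted class wins."""
    votes = Counter(model.predict_one(features) for model in models.values())
    return votes.most_common(1)[0][0]


# Toy data: three made-up "plants", two sequences each.
sequences = ["AAAAAT", "AAAATA", "CCCCCG", "CCCCGC", "GGGGGT", "GGGGTG"]
labels = ["plant_1", "plant_1", "plant_2", "plant_2", "plant_3", "plant_3"]
X = [base_composition(s) for s in sequences]

models = train_pairwise(X, labels)
prediction = predict_majority(models, base_composition("AAAAAA"))
```

Each class takes part in every sub-problem that pairs it with another class (13 of the 91 in the 14-class case), which is one way to read the abstract's remark that each class gets more chances to be predicted correctly.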

1. Introduction

Plants have played an important role throughout human history. They act on the human body through their chemical compounds, by processes identical to those well known for pharmaceutical drugs. Examples include digitalis, which is isolated from foxglove; Taxol, from yew; vincristine, from periwinkle; and morphine, which is extracted from the opium poppy and is considered one of the most effective pain relievers. Herbal medicines act on the human body in the same way as conventional drugs, so they can also have the same side effects (Tapsell, 2006). The best use of plants in medicine therefore requires careful documentation. This is the domain of ethnobotany, the study of plants used by primitive societies in various parts of the world (Acharya, 2008). However, the first step toward the correct use of plants is recognizing the plant species.

In the 1990s, researchers discovered that yew tree bark could not be used as a sustainable source of the drug, which led them to stop using the blockbuster drug Taxol. This is one example of the many clinical trials in which the clinical potential of such compounds was diminished by low production levels in the plant species. In the case of Taxol, a precursor turned out to be more readily available in a renewable part of the tree, and a semi-synthetic protocol could be developed to convert it into the drug. In the search for more efficient ways to make use of the wealth of bioactive compounds, researchers have turned to metabolic engineering of effective plant and microbial production platforms; these techniques are based on DNA sequencing.

DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. It includes any method or technology used to determine the order of the four bases (adenine, guanine, cytosine, and thymine) in a strand of DNA. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery. In biology, knowledge of DNA sequences is one of the main fields of research, with applications in many domains such as medicine, biotechnology, and species recognition. DNA sequencing has recently become fast, which makes it possible to recognize species ranging from plants and animals to humans, and even microbial species, based on their DNA sequences. Much work has been done on recognizing unknown DNA sequences; it falls into several categories: alignment-based methods, alignment-free methods, statistical methods, and others.

Data mining is the core stage of the knowledge discovery process; it aims to extract interesting, nontrivial, implicit, previously unknown, and potentially useful information from data in large databases (Fayyad, 1996). Machine learning is the part of data mining that focuses on prediction based on known properties learned from the training data.

This paper presents a bagging-based approach that uses machine learning algorithms in data mining to identify DNA sequences for the recognition of medical plants. The paper is organized as follows. The next section reviews the literature on the domains this work touches. Section 3 describes the data set used and the Medical Plants Genome Resources collection. Section 4 discusses the proposed approach. Section 5 details the results obtained in the experiments and studies carried out in this work. Finally, Section 6 states the major conclusions and outlines future work.
