Predicting the Future Research Gaps Using Hybrid Approach: Machine Learning and Ontology - A Case Study on Biodiversity

Premisha Premananthan, Banujan Kuhaneswaran, Banage T. G. S. Kumara, Enoka P. Kudavidanage
DOI: 10.4018/978-1-7998-7258-0.ch009


Sri Lanka is one of the global biodiversity hotspots and hosts a large variety of fauna and flora. However, Sri Lankan wildlife now faces many issues because of poor management and inadequate policies to protect it. The lack of technical and research support discourages many researchers from selecting wildlife as their domain of study. This study demonstrates a novel data mining approach for finding hidden keywords and automatically labeling past research work in this area, and then uses those results to predict trending research topics in the field of biodiversity. To model topics and extract the main keywords, the authors used the latent Dirichlet allocation (LDA) algorithm. Based on the topic modeling results, an ontology model was also developed to describe the relationships between keywords. The authors then classified the research papers with an artificial neural network (ANN) using ontology instances, in order to predict future gaps in wildlife research. Automatic classification and labeling will help researchers find the papers they need accurately and quickly.
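The abstract states that papers were classified with an ANN "using ontology instances" but does not describe how those instances become network inputs. The sketch below shows one plausible encoding, purely as an assumption: each paper's extracted keywords are mapped to ontology concepts and turned into a fixed-length binary vector that a neural network could consume. The ONTOLOGY mapping and the function name concept_features are hypothetical.

```python
# Hypothetical ontology fragment: keyword -> broader concept (illustrative only)
ONTOLOGY = {
    "elephant": "Mammal",
    "leopard": "Mammal",
    "frog": "Amphibian",
    "poaching": "Threat",
    "deforestation": "Threat",
    "habitat": "Ecology",
}

# Fixed, sorted concept order so every paper yields vectors of the same layout
CONCEPTS = sorted(set(ONTOLOGY.values()))


def concept_features(keywords):
    """Return a binary vector: 1 if any keyword maps to that ontology concept."""
    present = {ONTOLOGY[k] for k in keywords if k in ONTOLOGY}
    return [1 if c in present else 0 for c in CONCEPTS]


# Keywords not in the ontology (here "survey") are simply ignored
print(concept_features(["elephant", "poaching", "survey"]))  # prints [0, 0, 1, 1]
```

Encoding at the concept level rather than the raw keyword level keeps the input small and lets papers that use different but related keywords share features.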
Chapter Preview


Wildlife protection remains a prominent topic around the world. Wildlife is critical for sustaining life on earth, and conserving biodiversity is critical to maintaining a healthy ecological balance. Sri Lanka in particular has a large set of biodiversity hotspots containing a rare and wide variety of fauna and flora. However, Sri Lankan wildlife is critically threatened for many reasons, mainly human intervention, and urgently needs conservation measures. Wildlife conservation practice is also impeded by a lack of information and technological assistance. The results of wildlife studies could inform conservation, but their current contribution cannot be satisfactorily integrated into data-oriented conservation and management decisions. This research demonstrates a new data mining method for finding hidden keywords and automatically labeling past research activities in this area. Despite the importance of the field and the availability of good data resources, there has been little research interest from outside the field, and researchers struggle to find research gaps and ideas. Our study addresses this issue by proposing a method that uses machine learning and ontology to find research gaps, together with an automatic classification system built on past research papers on the wildlife of Sri Lanka.

The selection of research topics often does not match actual research needs, for multiple reasons. This is a disheartening situation, as there are plenty of opportunities for such work. Contributing factors include inadequate knowledge of existing research and its applicability, inadequate use of technology, and the difficulty of locating some research. Unfortunately, apart from research published in well-known journals, some past research cannot be found online because it is held only in conventional archives. Increasing public awareness of the value of wildlife and of the consequences of losing this heritage can greatly assist conservation. To achieve this, we have to narrow the gap between the public and the accessibility of information on wildlife. Technology can play a major role in bridging this gap. In our case, however, the difficulty lies in the interaction between domain experts and the technical side.

Despite its small size, Sri Lanka has wide ecosystem diversity because of its topographic and climatic heterogeneity as well as coastal influences. As Sri Lankans, we must therefore be aware of the wildlife around us. Today there are many crimes against wildlife and much careless treatment of it, especially of rare species. The major reason is a lack of awareness and knowledge. A large body of research has been conducted on wildlife, but this research (Mateo, Arroyo, & Garcia, 2016) has not properly reached outsiders. Mostly, only professionals involved in wildlife know about it. Even graduates from other departments are unaware of the endangered species or the current situation of our wildlife. This is a bad sign for a country with one of the world's largest wildlife populations. Only a small number of these studies were found to be available online, and there is no proper way to find details of past and ongoing research manually either. We therefore had to depart from past research techniques to arrive at our final solution, and some recent techniques are used here to improve the outputs. Our research aims to resolve the inadequate application of wildlife research and technology in the decision-making process.

These problems make researchers reluctant to select this field even when they are interested in it. Most wildlife studies aim to understand species diversity, behavior, habitat use and ecology, the role of wildlife in disease transmission, species conservation, population management, and methods to control threats to diversity. In our study, we concentrate on reviewing past research papers using data mining techniques to suggest potential research ideas for the future. To fill the data needs for conservation, our solution focuses primarily on semi-automating the discovery of research gaps through analysis of abstracts. Finally, the model includes the most commonly used keywords and the top research topics. This will be a vital milestone for researchers as well as wildlife activists, helping them keep an eye on recent problems that urgently need solutions. We must use our data stores efficiently to remove the barriers to finding research ideas and gaps. With this motivation, we propose a novel methodology using recent technologies to make our model more efficient.
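The paragraph above mentions extracting the most commonly used keywords from abstracts. A minimal sketch of that counting step follows, assuming simple lowercase word tokenization and a small hand-written stop-word list; both are assumptions, since the chapter's actual preprocessing is not described in this preview. The example abstracts are invented for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real pipelines use larger lists)
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "on", "for", "is", "we", "this"}


def top_keywords(abstracts, n=3):
    """Count non-stop-word tokens across all abstracts and return the n most common."""
    counts = Counter()
    for text in abstracts:
        # Split on anything that is not a letter, so "human-elephant" gives two tokens
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


# Invented example abstracts (titles only, for brevity)
abstracts = [
    "Elephant habitat use in the dry zone",
    "Human-elephant conflict and habitat loss",
    "Amphibian diversity in the wet zone",
]
print(top_keywords(abstracts))
```

Raw frequency favors generic domain words; weighting schemes such as TF-IDF, or topic models like LDA, are common refinements when frequent words are not informative on their own.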

Key Terms in this Chapter

Artificial Neural Network (ANN): An artificial neural network (ANN) is a computing system designed to simulate the way the human brain analyzes and processes information. It is a foundation of artificial intelligence (AI) and solves problems that would prove impossible or difficult by human or statistical standards. ANNs have self-learning capabilities that enable them to produce better results as more data becomes available.
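To make the definition concrete, here is the smallest possible ANN: a single sigmoid neuron trained by gradient descent on log-loss. It is only a sketch; the chapter's network architecture, features, and training settings are not given in this preview, and the toy features, labels, learning rate, and epoch count below are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, lr=0.5, epochs=2000):
    """Train one sigmoid neuron with per-sample gradient descent on log-loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of log-loss with respect to the weighted sum
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 if the neuron's output probability is at least 0.5."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical features per paper: [mentions a threat concept, mentions a species concept]
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
# Hypothetical label: 1 = paper is mainly about threats to wildlife
y = [1, 1, 0, 0]
w, b = train_neuron(X, y)
print([predict(w, b, x) for x in X])
```

A practical classifier would stack hidden layers of such neurons and usually rely on a library, but the update rule above is the same "learn from more examples" loop the definition describes.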

Recurrent Neural Network (RNN): A recurrent neural network (RNN) is a type of artificial neural network that uses sequential data or time-series data. These deep learning algorithms are commonly used for ordinal or temporal problems, such as language translation, natural language processing (NLP), speech recognition, and image captioning; they are incorporated into popular applications such as Siri, voice search, and Google Translate.
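The defining feature of an RNN is that the hidden state from the previous step is fed back in at the next step. The following sketch runs a one-unit Elman-style RNN with hand-picked scalar weights (illustrative assumptions, not trained values) to show that the same input leaves a different trace depending on where it appears in the sequence.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit Elman RNN over a sequence and return every hidden state."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The same weights are reused at every step; h carries memory of earlier inputs
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# An early input decays through later steps; a late input is seen at full strength
print(rnn_forward([1, 0, 0]))
print(rnn_forward([0, 0, 1]))
```

Because w_h is below 1, the influence of an early input shrinks at each step; this fading memory is the limitation that LSTM networks were designed to address.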

Conservation: Conservation is the protection and preservation of natural resources so that they can continue to exist for future generations. It involves preserving the diversity of animals, genomes, and habitats, as well as environmental functions such as nutrient cycling.

Latent Dirichlet Allocation (LDA): LDA is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if the observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's presence is attributable to one of the document's topics. LDA is an example of a topic model and belongs to the machine learning toolbox and, more broadly, to the artificial intelligence toolbox.
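The chapter does not say which LDA implementation or inference method the authors used. As a self-contained illustration only, the sketch below fits LDA by collapsed Gibbs sampling, one standard inference technique, using just the Python standard library. The hyperparameters alpha and beta, the iteration count, the seed, and the toy documents are assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (doc_topic_counts,
    topic_word_counts, vocab) after the final sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # tokens per (document, topic)
    nkw = [[0] * V for _ in range(K)]   # tokens per (topic, word)
    nk = [0] * K                        # tokens per topic
    z = []                              # current topic of every token

    # Random initial assignment of each token to a topic
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token from the counts
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def doc_topic_mixture(ndk_row, alpha=0.1):
    """Posterior-mean estimate of one document's topic proportions."""
    total = sum(ndk_row) + alpha * len(ndk_row)
    return [(n + alpha) / total for n in ndk_row]


docs = [
    ["elephant", "corridor", "habitat", "elephant"],
    ["elephant", "habitat", "conflict", "corridor"],
    ["frog", "stream", "amphibian", "frog"],
    ["amphibian", "stream", "frog", "survey"],
]
ndk, nkw, vocab = lda_gibbs(docs, num_topics=2)
for d, row in enumerate(ndk):
    print(d, [round(p, 2) for p in doc_topic_mixture(row)])
```

doc_topic_mixture gives each document's estimated topic proportions, the "mixture of a small number of topics" in the definition; reading the highest counts in each row of nkw gives the words that characterize each topic.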

Long Short-Term Memory (LSTM): LSTM networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems. This behavior is required in complex problem domains such as machine translation and speech recognition.
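The following sketch shows one time step of a single-unit LSTM cell in its standard form with a forget gate. All weights and biases are hypothetical scalars chosen for illustration; a real LSTM uses learned weight matrices over vector inputs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical scalar parameters per gate: (weight on input x, weight on previous h, bias)
PARAMS = {
    "forget": (0.5, 0.5, 1.0),
    "input": (1.0, 0.5, 0.0),
    "candidate": (1.0, 0.5, 0.0),
    "output": (1.0, 0.5, 0.0),
}


def lstm_step(x, h_prev, c_prev, p=PARAMS):
    """One step of a single-unit LSTM cell; returns the new (h, c)."""
    def pre(gate):
        wx, wh, b = p[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))       # fraction of the old cell state to keep
    i = sigmoid(pre("input"))        # fraction of the candidate to write
    g = math.tanh(pre("candidate"))  # candidate values to write
    o = sigmoid(pre("output"))       # fraction of the cell state to expose
    c = f * c_prev + i * g           # additive update lets information persist
    h = o * math.tanh(c)
    return h, c


def lstm_run(inputs):
    """Feed a sequence through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c)
    return h, c


print(lstm_run([1.0, 0.0, 0.0]))
```

The forget gate multiplies rather than overwrites the cell state, so a value written early can survive many steps; this is what lets LSTMs capture long-range order dependence that a plain RNN tends to lose.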

Research: Research is a comprehensive process of investigation involving the collection of data; the recording of critical information; and the review and evaluation of those data and information in accordance with appropriate methodologies established by particular professional fields and academic disciplines. Research is undertaken to evaluate the validity of a theory or an interpretive framework, to compile a body of substantive knowledge and findings that can be shared in an accepted manner, and to generate questions for further inquiry.

Topic Modeling: Topic modeling is a method for unsupervised classification of documents, analogous to clustering on numeric data, which finds natural groups of items (topics) even when we are not sure what we are looking for. Topic modeling provides methods for automatically organizing, understanding, searching, and summarizing large electronic archives.
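Topics found by a model are usually interpreted through their highest-weighted words, and a document can then be labeled with the topic it matches best, which is one way to picture the automated labeling described in the abstract. The weights below are invented, and scoring a document by summing word weights is a simple heuristic assumed for illustration, not proper LDA inference on unseen documents.

```python
# Hypothetical topic-word weights, e.g. counts exported from a fitted topic model
TOPIC_WORDS = {
    0: {"elephant": 40, "conflict": 25, "crop": 10, "habitat": 8},
    1: {"amphibian": 30, "frog": 28, "stream": 12, "habitat": 9},
}


def top_words(topic, n=3):
    """Return the n highest-weighted words, which analysts read to name a topic."""
    weights = TOPIC_WORDS[topic]
    return sorted(weights, key=weights.get, reverse=True)[:n]


def label_document(tokens):
    """Assign the topic whose word weights best cover the document's tokens."""
    def score(topic):
        return sum(TOPIC_WORDS[topic].get(t, 0) for t in tokens)
    return max(TOPIC_WORDS, key=score)


for t in TOPIC_WORDS:
    print(t, top_words(t))
print(label_document(["frog", "stream", "survey"]))
```

Shared words such as "habitat" contribute to both topics, which is why labels are based on the overall score rather than on any single keyword.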

Ontology: An ontology encompasses the description, formal naming, and specification of the types, properties, and relationships between the concepts, data, and entities that underpin one, several, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that describe the subject.
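An ontology can be pictured as a set of subject-relation-object statements. The sketch below stores a small, hypothetical wildlife ontology as plain Python tuples (real projects typically use OWL or RDF tooling) and follows is_a links transitively, which is how a system could infer that a paper about a specific species also concerns its broader categories. The concept names and relations are illustrative assumptions.

```python
# Hypothetical wildlife ontology fragment as (subject, relation, object) triples
TRIPLES = [
    ("AsianElephant", "is_a", "Mammal"),
    ("Mammal", "is_a", "Animal"),
    ("Amphibian", "is_a", "Animal"),
    ("AsianElephant", "threatened_by", "HabitatLoss"),
    ("HabitatLoss", "is_a", "Threat"),
]


def objects(subject, relation):
    """Return every object linked to subject by the given relation."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def ancestors(concept):
    """Follow is_a links transitively to collect every broader category."""
    found, stack = [], [concept]
    while stack:
        for parent in objects(stack.pop(), "is_a"):
            if parent not in found:  # guard against revisiting shared parents
                found.append(parent)
                stack.append(parent)
    return found


print(ancestors("AsianElephant"))                   # prints ['Mammal', 'Animal']
print(objects("AsianElephant", "threatened_by"))    # prints ['HabitatLoss']
```

Keeping relationships explicit like this is what distinguishes an ontology from a flat keyword list: new facts can be derived by traversing the links rather than being stored directly.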
