Comparative Prediction of Wine Quality and Protein Synthesis Using ARSkNN

Ashish Kumar, Roheet Bhatnagar, Sumit Srivastava, Arjun Chauhan
Copyright: © 2020 | Pages: 11
DOI: 10.4018/IJITPM.2020100103

Abstract

The amount of data and information available has grown manifold over the past few decades and is expected to keep increasing exponentially. The ability to harvest and manipulate information from this data has become crucial for effective and faster development. Multiple algorithms have been developed to extract information from this data. These algorithms take different approaches and therefore produce varied outputs in terms of performance and interpretation, and the same algorithm can perform differently on different datasets. To compare their effectiveness, the algorithms are run on different datasets under a fixed set of restrictions (e.g., the hardware platform). This paper presents an in-depth analysis of two classification algorithms: the traditional kNN classifier and the newly developed ARSkNN. The algorithms were executed on three different datasets, and their performance was evaluated using accuracy percentage and execution time as the performance measures.

Introduction

The human race is evolving at a very fast pace. Humans are producing, harvesting, and consuming an unprecedented amount of data. New technologies and analytics platforms help derive previously unforeseen information from this data. This evolution has accelerated, aided primarily by the introduction of computational machines, which help not only with the storage of data but also with development. The dominance of these machines in the digital age has led to the generation of an enormous amount of data, which continues to increase at a rapid pace. As surveyed in 2012, about 2.5 exabytes of data are generated every day, and this amount doubles every 40 months. More data is streamed across the internet every second than was stored on the entire internet only 20 years ago (Chen & Zhang, 2014). To keep up with this flow of data in the form of images, videos, datasets, files, text, animations, etc., it is necessary for humans to harness it for their full benefit.

In such a scheme, big data analytics plays a crucial role owing to its capability to process and analyze large datasets. The approach of harnessing information and using it to predict an unknown event or to find patterns is known as data mining. Mathematicians and statisticians have developed several techniques and methods to extract information from these large datasets. The combination of these techniques with computational processing forms the basis of data mining. The main purpose of data mining techniques (Witten, Frank, Hall & Pal, 2016) is to extract high-level knowledge from raw data. Many algorithms and approaches were developed by mathematicians in the early 20th century. However, due to the technological constraints of that time, these algorithms could not be put to practical use; only with the recent surge in processing power did they again become an area of interest for many researchers. Among the many prevalent algorithms, some of the more commonly used are SVM, kNN, and Gradient Descent (Kotsiantis, Zaharakis & Pintelas, 2007). These algorithms can be broadly classified into four types based on their functionality: Regression, Classification, Clustering, and Rule Extraction.

Classification is a technique in which a model, or classifier, is created to predict the class of a particular query q. This paper discusses two such classification techniques: kNN and ARSkNN. kNN is a commonly used classifier that relies on distance to determine the class of a test instance. There are 76 similarity measures in use, but all of them are based on distance (Choi, Cha & Tappert, 2010). Because of this approach, kNN is often computationally expensive, slow, and memory-intensive. To overcome this problem, another classifier known as ARSkNN was developed, which uses Massim as a similarity measure rather than distance (Kumar, Bhatnagar, & Srivastava, 2014). This approach reduces not only the computational power required but also the overall time taken for classification. The performance of both classifiers is, however, variable and depends on the characteristics of the dataset, e.g., whether it is symmetric, binary, etc.
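As a rough illustration of the distance-based approach described above, a minimal Python sketch of kNN with Euclidean distance might look like the following. The function names, the toy data, and the choice of k = 3 are assumptions made for this illustration and are not taken from the paper; ARSkNN and its Massim measure are not shown here.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest
    training instances (hypothetical helper, not the authors' code)."""
    # Distance from the query to every training instance: this full scan
    # is what makes plain kNN costly in time and memory on large datasets.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (illustrative only): two well-separated classes.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["low", "low", "low", "high", "high", "high"]

predicted = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
print(predicted)
```

Every prediction computes a distance to each training instance, which is the cost the paragraph above attributes to kNN.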

This paper makes the following contributions:

  • It establishes the empirical and experimental foundation of ARSkNN.

  • It empirically evaluates ARSkNN against traditional kNN. The results show the superiority of ARSkNN on two measures: average accuracy percentage and average runtime.
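The two measures named above, accuracy percentage and runtime, could be collected for any classifier along the following lines. This is only a sketch under stated assumptions: the `evaluate` helper, the stand-in `threshold_classifier`, and the toy test set are hypothetical and do not reproduce the authors' experimental setup or results.

```python
import time


def evaluate(predict, test_X, test_y):
    """Return (accuracy percentage, elapsed seconds) for a prediction
    function over a labelled test set (hypothetical helper)."""
    start = time.perf_counter()
    predictions = [predict(x) for x in test_X]
    elapsed = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predictions, test_y))
    accuracy_pct = 100.0 * correct / len(test_y)
    return accuracy_pct, elapsed


def threshold_classifier(x):
    """Stand-in classifier used only to exercise the harness."""
    return "high" if x[0] > 3.0 else "low"


# Toy test set (illustrative only); the last label is deliberately
# one the stand-in classifier gets wrong.
test_X = [(1.0,), (2.0,), (4.0,), (5.0,)]
test_y = ["low", "low", "high", "low"]

accuracy_pct, seconds = evaluate(threshold_classifier, test_X, test_y)
print(f"accuracy: {accuracy_pct:.1f}%  runtime: {seconds:.6f}s")
```

Averaging both values over repeated runs or folds would give figures comparable in kind to the average accuracy percentage and average runtime mentioned above.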

In this paper, the authors compare and evaluate the performance of the kNN classifier, using Euclidean distance as the similarity measure, against ARSkNN. The Wine Quality and Yeast datasets, both taken from the UCI repository (Blake & Merz, 1998), were used in this study. The paper is divided into seven sections: Introduction, Literature Review, Experimental Setup, Datasets Used, Empirical Evaluation, Discussion, and Conclusion.
