Rider Chaotic Biography Optimization-driven Deep Stacked Auto-encoder for Big Data Classification Using Spark Architecture

Anilkumar V. Brahmane (Koneru Lakshmaiah Education Foundation, Guntur, India) and Chaitanya B. Krishna (Koneru Lakshmaiah Education Foundation, Guntur, India)
Copyright: © 2021 | Pages: 21
DOI: 10.4018/ijwsr.2021070103

Abstract

The volume and variety of big data are growing daily to the point that existing software tools struggle to manage it. Furthermore, the prevalence of imbalanced data in huge datasets is a key constraint for the research community. Thus, this paper proposes a novel technique for handling big data using the Spark framework. The proposed technique classifies big data in two steps, feature selection and classification, which are performed in the initial nodes of the Spark architecture. The proposed optimization algorithm, named the rider chaotic biography optimization (RCBO) algorithm, is an integration of the rider optimization algorithm (ROA) and the standard chaotic biogeography-based optimization (CBBO). The proposed RCBO-driven deep stacked auto-encoder, deployed on the Spark framework, effectively handles big data and attains effective classification. Here, the proposed RCBO is employed for selecting suitable features from the massive dataset.
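The two-step pipeline described in the abstract can be pictured as follows. The sketch below is a minimal illustration, not the authors' implementation: it assumes a generic population-based optimizer standing in for RCBO (whose rider and biogeography update rules are not given in this preview), a hypothetical fitness function `evaluate_subset`, and PySpark for distributing the fitness evaluations across worker nodes; the deep stacked auto-encoder itself is left as a placeholder.

```python
# Minimal sketch of the two-step pipeline on Spark, under the assumptions
# stated above. `evaluate_subset` and the toy fitness are hypothetical
# stand-ins; a real RCBO fitness would score the classification accuracy
# of the deep stacked auto-encoder on the candidate feature subset.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rcbo-feature-selection").getOrCreate()
sc = spark.sparkContext

n_features = 50
pop_size = 20
rng = np.random.default_rng(0)

# Each candidate solution is a binary mask over the feature set.
population = [rng.integers(0, 2, n_features) for _ in range(pop_size)]

def evaluate_subset(mask):
    """Hypothetical fitness: smaller subsets score better (lower is better)."""
    return float(mask.sum()) / len(mask)

# Step 1 (feature selection): fitness evaluation is distributed across the
# initial (worker) nodes of the Spark cluster.
scored = sc.parallelize(population).map(lambda m: (evaluate_subset(m), m)).collect()
best_score, best_mask = min(scored, key=lambda t: t[0])
selected = np.flatnonzero(best_mask)
print("selected feature indices:", selected)

# Step 2 (classification) would train the deep stacked auto-encoder on the
# columns indexed by `selected`; that model is omitted from this sketch.
spark.stop()
```

Note that only the fitness evaluations are shipped to the executors; the driver holds the population and picks the best mask, which mirrors how the paper situates feature selection in the initial nodes of the Spark architecture.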
Article Preview

1. Introduction

Data mining is a technology utilized for extracting effective, inherent, theoretically valuable, and understandable information from massive datasets using emergent computing technologies (Karim, et al., 2018; Ramírez-Gallego, et al., 2018). The extreme learning machine (ELM) is adopted when the dataset grows and becomes complex; moreover, ELM is recognized for being fast and effective (Duan, et al., 2018). For data mining on big data to succeed, measurable and effective solutions must be easily accessible to various types of skilled experts (Koliopoulos, et al., 2015). Data mining is a technique for dealing with huge data to improve decision making (Ramírez-Gallego, et al., 2017). Using conventional methods, it is inconvenient to collect datasets with high accuracy and processability. Thus, big data techniques are extensively used for processing massive datasets, providing an easier understanding that makes it possible for organizations to acquire insights from historical data. The academic community has launched various journals on big data; journals such as "Nature" and "Science" have addressed big data obstacles, including "Big data" and "Deal with data", which helped solve various problems. Internet expertise, economics, supercomputing, medicine (Sailee Bhambere, 2017; Sailee D Bhambere, 2017), etc. are some of the domains concerned with big data technology (Lin, et al., 2017). An exceptionally large quantity of processed data is accumulated by current systems, so research on big data has become widespread, as an enormous amount of data is needed for performing research activities (Ramírez-Gallego, et al., 2017). Processing small data is simple, but as data size rises, performance may degrade. Conventional database software is incapable of processing data that vary in size, quantity, variety, and accuracy; moreover, typical database software can neither capture nor store such data, nor does it have the capacity to manage or analyze it (Ramírez-Gallego, et al., 2018).

Parallel programming models have become popular nowadays, inviting interest among researchers in devising novel machine learning algorithms. Usually, big data deals with examining large quantities of data from geographically distributed regions using machine learning algorithms (Lv, et al., 2018). Machine learning has become a prominent analysis tool for multinational companies and governments. Machine learning techniques interpret difficult and complex datasets to make effective decisions after detailed analysis and thereby attain high performance. Anticipated performance is directly connected with the model features on the basis of a parallel programming framework (Sheshasaayee, & Lakshmi, 2017). In the conventional technique, data is collected from all over the world into a central data center, where it is then processed by data-driven parallel applications (Hernández, et al., 2018). Thus, a better tool is still needed for processing huge amounts of data. Apache Spark is one of the big data analysis tools used for security analysis (Lighari, & Hussain, 2017). MapReduce and Spark are the major platforms that are widely used and that support genetic algorithms and particle swarm optimization (Elsebakhi, et al., 2015; Ditzler, et al., 2017; Ekanayake, et al., 2016). MapReduce is a parallel programming model used to achieve maximum productivity when processing big datasets, and its newer variants are highly scalable. For parallel processing, MapReduce utilizes the Hadoop distributed file system (HDFS). However, continued development in the field led to various new platforms such as Kafka, Spark, and Flume, and owing to the shortcomings of MapReduce, many of these new technologies have taken its place (Sheshasaayee, & Lakshmi, 2017). Apache Spark is a fast cluster-computing engine that offers greater scalability and fault tolerance compared with MapReduce (Hadgu, et al., 2015), as the sketch below illustrates.
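To make the contrast with MapReduce concrete, the following is a minimal, self-contained PySpark job (the dataset, app name, and threshold are invented for illustration). It caches an intermediate RDD in memory and reuses it across two actions, where a MapReduce pipeline would instead materialize the intermediate result to HDFS between jobs.

```python
# Minimal PySpark job illustrating Spark's in-memory cluster computing:
# the RDD is computed once, cached, and reused by two actions without
# re-reading the input, whereas a disk-based MapReduce pipeline would
# write intermediate results to HDFS between stages.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-vs-mapreduce").getOrCreate()
sc = spark.sparkContext

# Toy dataset standing in for a large distributed input.
records = sc.parallelize(range(1_000_000), numSlices=8)

# Transformations are lazy; nothing executes until an action is called.
squares = records.map(lambda x: x * x).cache()  # keep results in memory

total = squares.sum()                                 # first action: computes and caches
big = squares.filter(lambda x: x > 500_000).count()   # second action: reuses the cache

print(f"sum of squares = {total}, values above 500000 = {big}")
spark.stop()
```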
