Big Data Analytics in Aquaculture Using Hive and Hadoop Platform

P. Venkateswara Rao (ASCET, India), A. Ramamohan Reddy (S. V. University, India) and V. Sucharita (K. L. University, India)
Copyright: © 2018 | Pages: 7
DOI: 10.4018/978-1-5225-2947-7.ch002

Abstract

In the field of aquaculture, digital advances constantly produce huge amounts of data, and aquaculture data has therefore entered the big data world. The need for data management and analytics models grows as this development progresses, and all of the data can no longer be stored on a single machine. A solution is needed that can store and analyze huge amounts of data, which is the domain of Big Data. In this chapter a framework is developed that provides a solution for shrimp disease analysis using historical data, based on Hive and Hadoop. Data about shrimp is acquired from different sources, such as aquaculture websites and various laboratory reports. After the data is collected from these sources, noise is removed. Once normalization is done, the data is uploaded to HDFS and stored in a file format that Hive supports. Finally, the classified data is placed in a designated location. Based on the features extracted from the aquaculture data, HiveQL can be used to analyze shrimp disease symptoms.
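The workflow described above can be sketched in HiveQL. The table name, columns, delimiter, and HDFS path below are illustrative assumptions for a minimal sketch, not the chapter's actual schema:

```sql
-- Hypothetical external table over normalized shrimp records already uploaded to HDFS.
-- All names and the location are assumed for illustration.
CREATE EXTERNAL TABLE IF NOT EXISTS shrimp_records (
  record_id     STRING,
  farm_id       STRING,
  sample_date   STRING,
  water_temp    DOUBLE,
  salinity      DOUBLE,
  ph            DOUBLE,
  symptom       STRING,   -- e.g. 'white_spots', 'lethargy'
  disease_label STRING    -- label taken from historical lab reports
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/aquaculture/shrimp';

-- Count how often each observed symptom co-occurs with each diagnosed disease.
SELECT symptom, disease_label, COUNT(*) AS occurrences
FROM shrimp_records
GROUP BY symptom, disease_label
ORDER BY occurrences DESC;
```

Queries like this are submitted through the Hive CLI or Beeline; Hive compiles them into jobs that run over the files stored in HDFS, so the analysis scales with the cluster rather than a single machine.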

Introduction

With the rapid growth of massive data, innovative analytical strategies are needed to handle complex data. At present, aquaculture has entered the world of Big Data, and Big Data analysts must identify the key challenges in solving problems for aquaculture experts. Due to advances in communication, huge amounts of information are available, generated in large quantities from many sources. Big Data is defined as digital data at a scale that is very difficult to manage and analyze using traditional software tools (Chen et al., 2014; Marx, 2013; Zhang, 2014). The data being generated is estimated in exabytes per day (Gandomi et al., 2013). There are various sources of big data, as shown in Figure 1: application logs, streams, sensors, machine-generated data, media, business organizations, daily transactions, emails, mobile phone signals, and tweets. Smartphone adoption among users in the USA has risen to 75% (Huh et al., 2014). This kind of data can be used for sentiment analysis, identifying what people think about different products (Arora et al., 2015; Laney, 2001). Volume is therefore considered very important here. Modern meters also generate huge numbers of daily readings, and analyzing this kind of data helps to optimize energy use. Another feature of Big Data is velocity. One aspect of velocity is the combining of different data, whether structured or unstructured (Gandomi et al., 2013), arriving very quickly from sources with different characteristics. The emphasis is on processing capability, based on the infrastructure available for processing the data; otherwise processing is not possible with existing frameworks.
Different authors have given different definitions. Laney (2001) described Big Data as high-volume, high-variety, high-velocity information that demands cost-effective processing for improved decision making; this is the 3V model. The definition was later extended to a 4V model by adding value, and then to a 5V model by adding veracity. Storing and processing data that yields this value requires a new platform, because of the drawbacks of traditional methodologies when Big Data arrives in large volumes, in a variety of forms, and at changing velocity; processing it requires scalability. Big Data processing draws on data from various sources, which is brought together on a single platform. First the data is acquired from the various sources, then it is processed, then visualized, and finally decisions are made from the visualization. Data from sources such as logs, streams, mails, media, and phone calls is combined, and integration tools can be used to combine unstructured and structured data.
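As a small sketch of combining multiple sources on a single platform, the following HiveQL joins two hypothetical ingested tables, one of farm sensor readings and one of laboratory reports. All table and column names here are assumptions for illustration:

```sql
-- Hypothetical join of two ingested sources on a shared farm identifier,
-- aligning sensor readings with lab diagnoses from the same day.
SELECT s.farm_id,
       s.sample_date,
       s.water_temp,
       s.salinity,
       l.diagnosis
FROM sensor_readings s
JOIN lab_reports l
  ON s.farm_id = l.farm_id
 AND s.sample_date = l.report_date
WHERE l.diagnosis IS NOT NULL;
```

Once heterogeneous sources are loaded into Hive tables like these, a join of this kind is how structured records from different origins are analyzed together in one place.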

Figure 1. Various sources of Big Data
