Big Data for Satellite Image Processing: Analytics, Tools, Modeling, and Challenges

Remya S. (VIT University, India), Ramasubbareddy Somula (VIT University, India), Sravani Nalluri (VIT University, India), Vaishali R. (VIT University, India) and Sasikala R. (VIT University, India)
DOI: 10.4018/978-1-5225-3643-7.ch008

Abstract

This chapter introduces the basics of big data, including its architecture, modeling, and tools. Big data is a term for technologies that handle high volumes of data and that can serve as an alternative to RDBMSs and other analytical technologies such as OLAP. Every application has databases that contain its essential information, but the sizes of these databases vary across applications, and the data must be stored, extracted, and modified. To make the data useful, it has to be handled efficiently. This is where big data plays an important role. Big data exceeds the processing and overall capacity of traditional databases. The sections of this chapter present the basic architecture, tools, modeling, and challenges in turn.
Chapter Preview

1. Introduction

Data is increasing rapidly, day by day, in many forms. Traditional data processing software can handle small quantities of data, but when trillions of bytes of information are processed per second, traditional software techniques fail. A new kind of solution is needed to process this data, and Big Data provides one. Big Data is a term used for creating, capturing, communicating, aggregating, storing, and analyzing large amounts of data. There have been many attempts to quantify the growth rate in the volume of data, a phenomenon called the information explosion.

Several major milestones mark the history of sizing data volumes and the evolution of the term Big Data. The following are some of them:

  • In 1971, Arthur Miller stated in “The Assault on Privacy” that:

Too many information handlers seem to measure a man by the number of bits of storage capacity his dossier will occupy.

  • In April 1980, I.A. Tjomsland gave a talk titled “Where Do We Go From Here?” at the Fourth IEEE Symposium on Mass Storage Systems, in which he said:

Data expands to fill the space available. I believe that large amounts of data are being retained because users have no way of identifying obsolete data; the penalties for storing obsolete data are less apparent than are the penalties for discarding potentially useful data.

  • In 1997, Michael Lesk published “How much information is there in the world?” in which he concluded that:

There may be a few thousand petabytes of information all told, and the production of tape and disk will reach that level by the year 2000. So in only a few years, (a) we will be able to save everything- no information will have to be thrown out, and (b) the typical piece of information will never be looked at by a human being. (https://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/2/#1c3097c24343).

The term Big Data is usually said to have been coined in 1998 by John Mashey, Chief Scientist at SGI. Although Michael Cox and David Ellsworth appear to have used the term in print, Mashey reportedly used it in various speeches, which is why he is credited with coming up with it. Other sources, however, say that the term was first used in an academic paper, “Visually Exploring Gigabyte Datasets in Realtime” (ACM) (OECD, 2015; Mark A. Beyer & Douglas Laney, 2012).

The following points differentiate Big Data from traditional business intelligence solutions:

  • Data is retained in a distributed file system instead of on a central server.

  • The processing functions are taken to the data rather than data being taken to the functions.

  • Data comes in different formats, both structured and unstructured.

  • Data includes both real-time and offline data.

  • Technology relies on massively parallel processing (MPP) concepts.
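
The idea of taking the processing function to the data, rather than the data to the function, can be illustrated with a small sketch. This is only a toy model under stated assumptions: the lists in PARTITIONS are hypothetical stand-ins for blocks held on separate nodes of a distributed file system, threads stand in for worker nodes, and the names local_summary, combine, and distributed_mean are invented for illustration and do not come from any particular Big Data framework.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each inner list stands in for the block of
# records stored on one node of a distributed file system.
PARTITIONS = [
    [3, 8, 15, 4],
    [22, 7, 1],
    [9, 14, 6, 30, 2],
]


def local_summary(partition):
    """Run next to the data: reduce one partition to (count, total)."""
    return len(partition), sum(partition)


def combine(summaries):
    """Merge the small per-partition summaries into a global mean."""
    count = sum(c for c, _ in summaries)
    total = sum(t for _, t in summaries)
    return total / count


def distributed_mean(partitions):
    # Threads stand in for worker nodes. Only the function is sent out,
    # and only the small (count, total) pairs come back; the raw records
    # never leave their partition.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        summaries = list(pool.map(local_summary, partitions))
    return combine(summaries)
```

Calling distributed_mean(PARTITIONS) returns the mean of all records while moving only three small tuples between the workers and the caller. In a real MPP system the same split between a local step and a combining step is what lets the work scale across many machines.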

Big Data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information policy. Organizations have to balance the use of data against its confidentiality requirements, and they must determine how long the data has to be retained. With the advent of new tools and technologies for building big data solutions, the availability of skills is a big challenge for CIOs. A higher level of proficiency in data science is required to implement big data solutions today because the tools are not yet user-friendly (Bill Franks, 2012).
