Trending Big Data Tools for Industrial Data Analytics

A. Bazila Banu, V. S. Nivedita
Copyright: © 2023 | Pages: 21
DOI: 10.4018/978-1-7998-9220-5.ch032

Abstract

Big data refers to data sets so substantial that they cannot be stored, managed, or examined using conventional tools. Today, billions of data sources across the world generate data at a very rapid rate. Consider Facebook as an example: it produces approximately 500 terabytes of data every day, comprising photographs, videos, text messages, emoticons, and more. In general, real-world data can be classified into different formats: structured, semi-structured, and unstructured. Structured data, also referred to as quantitative data, has a well-defined format containing only text or numbers and can be easily stored in a relational database; data arranged in an Excel sheet is one example. Semi-structured data has a partial structure that may include tags; HTML documents fall into this category and must be processed before they can be stored in a relational database. Unstructured data, otherwise referred to as qualitative data, has no predefined format; emails and videos fall into this category.
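As a minimal illustration of the difference between these formats, the following Python sketch, which is not drawn from the chapter itself, flattens a semi-structured JSON record into a fixed set of columns so it can be stored as a row in a relational table; the field names are hypothetical.

    import json

    # A semi-structured record: fields may be nested or missing.
    raw = '{"user": "alice", "post": {"text": "hello", "likes": 42}}'
    record = json.loads(raw)

    # Flatten into a fixed set of columns so the row fits a relational table.
    row = (
        record.get("user"),
        record.get("post", {}).get("text"),
        record.get("post", {}).get("likes"),
    )
    print(row)  # ('alice', 'hello', 42)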

Background

Industrial Big Data

In the era of digital economic globalization, intelligent decision making has attracted a great deal of attention in the digital industry market. One prime technology in artificial intelligence is big-data-driven analysis. It enhances productivity and helps in making wise decisions by mining the hidden knowledge and latent potential of big data (M. Ghasemaghaei & G. Cali, 2019). Large volumes of real-time data are applied to industrial processes. Most real-time data are streamed from noisy environments; moreover, among the acquired data, some records are labelled while others are not. Such substantial amounts of data, with the challenges they carry, must be processed to produce an optimized, intelligent output without compromising the time and space dimensions. Hence, big data processing requires extensible methods to distribute and store real-time data and to adapt dynamically to changes in the process in order to provide automatic decisions (Ritu Ratra & Preeti Gulia, 2019). Thus, an end-to-end big data process is expected to integrate, adapt, and generalize the data at every stage so as to create intelligent decisions with respect to the process.
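To make the streaming scenario concrete, the following minimal Python sketch, which is not part of the chapter, smooths a noisy real-time reading with a fixed-size sliding window; the window size and sample values are assumptions for illustration.

    from collections import deque

    WINDOW = 5  # number of recent readings to average (an assumed value)

    def smooth(stream):
        # Yield a moving average over the last WINDOW noisy readings.
        window = deque(maxlen=WINDOW)
        for reading in stream:
            window.append(reading)
            yield sum(window) / len(window)

    # Hypothetical sensor stream with one spike caused by noise.
    noisy = [10.2, 9.8, 30.0, 10.1, 9.9, 10.3]
    for value in smooth(noisy):
        print(round(value, 2))

Filters of this kind are only the first stage; the distribution and storage concerns raised above are what the tools surveyed below address.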

Therefore, non-traditional techniques and strategies are required to store, organize, and process big data sets. Several big data tools are available; the following big data analytics tools are highly recommended and widely applied in industry, serving needs such as data collection, data cleaning, data filtering and extraction, data validation, and data storage:

Hadoop -- To collect and evaluate data.

MongoDB -- To handle data that gets updated frequently.

Talend -- To provide data integration and administration.

Cassandra -- To handle large aggregates of data.

Spark -- To provide real-time processing while handling large volumes of data in a distributed environment.

STORM -- To process high-velocity data in a distributed real-time computational environment.

The following context discusses the aforementioned tools in detail, with their working strategies, application possibilities, merits, and limitations, in order to point out how effectively the tools are applied in big data analytics. The sketches below give a first taste of two of them in practice.
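As a first taste of Spark, here is a minimal PySpark sketch, not taken from the chapter: it assumes the pyspark package is installed, and the machine names and temperature values are hypothetical. It builds a small DataFrame of sensor readings and computes a distributed aggregate per machine.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Start a local Spark session (assumes the pyspark package is installed).
    spark = SparkSession.builder.appName("SensorAggregates").getOrCreate()

    # Hypothetical sensor readings; a real pipeline would read from HDFS or Kafka.
    df = spark.createDataFrame(
        [("pump-1", 71.2), ("pump-1", 69.8), ("pump-2", 80.5)],
        ["machine", "temperature"],
    )

    # Average temperature per machine, computed in parallel across the cluster.
    df.groupBy("machine").agg(F.avg("temperature").alias("avg_temp")).show()

    spark.stop()

Similarly, a minimal pymongo sketch illustrates why MongoDB suits frequently updated data: documents are schemaless, so fields can change between writes. The connection string, database, and collection names here are assumptions for illustration, and a local MongoDB server is presumed to be running.

    from pymongo import MongoClient

    # Connect to a local MongoDB instance on the default port (an assumption).
    client = MongoClient("mongodb://localhost:27017")
    readings = client["plant"]["readings"]

    # Schemaless documents make frequently changing fields easy to store and update.
    readings.insert_one({"machine": "pump-1", "temperature": 71.2, "status": "ok"})
    readings.update_one({"machine": "pump-1"}, {"$set": {"status": "alert"}})

    # Query back every reading for that machine.
    for doc in readings.find({"machine": "pump-1"}):
        print(doc)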

Key Terms in this Chapter

Open-Source: Denoting software for which the original source code is made freely available and may be redistributed and modified.

Web Services: A web service is any piece of software that makes itself available over the internet and uses a standardized XML messaging system.

Batch Processing: Batch data processing is a method of processing large amounts of data at once.

Dataset: A collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer.

Distributed File System (DFS): A Distributed File System (DFS), as the name suggests, is a file system that is distributed across multiple file servers or multiple locations. It allows programs to access or store remote files as they do local ones, enabling users to access files from any networked computer. The main purpose of a DFS is to allow users of physically distributed systems to share their data and resources by using a common file system.

Non-Relational Databases: A non-relational database stores data in a non-tabular form, and tends to be more flexible than the traditional, SQL-based, relational database structures.

Database: A database is an organized collection of structured information, or data, typically stored electronically in a computer system.

Latency: Latency is defined as the delay before a transfer of data begins following an instruction for its transfer.

Stream Processing: Stream processing is a big data technology that focuses on the real-time processing of continuous streams of data in motion.

Data Integration: Data integration involves combining data residing in different sources and providing users with a unified view of them.
