Programming and Pre-Processing Systems for Big Data Storage and Visualization

Hidayat Ur Rahman, Rehan Ullah Khan, Amjad Ali
DOI: 10.4018/978-1-5225-3142-5.ch009

Abstract

This chapter provides a detailed overview of the major concepts used in Big Data. To process huge volumes of data, the first step is pre-processing, which is required to handle anomalies such as missing values by applying various transformations. The chapter gives a detailed overview of pre-processing tools used for Big Data, such as R, Yahoo! Pipes, Mechanical Turk, and Elasticsearch. Besides pre-processing tools, it also covers the storage tools, programming tools, data visualization tools, log processing tools, and caching tools used for Big Data analytics. In other words, this chapter is the core of the book and provides an overview of the major technologies discussed later in the book.
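
To make the pre-processing step concrete, the following is a minimal sketch using the Python pandas library (an assumption made here for brevity; the pre-processing tools discussed in the chapter, such as R, would serve equally well). The column names and values are hypothetical: the sketch imputes missing numeric readings and drops records whose key field cannot be recovered.

    import pandas as pd

    # Hypothetical raw records containing anomalies (missing values).
    raw = pd.DataFrame({
        "sensor_id": [1, 2, 3, 4],
        "temperature": [21.4, None, 23.1, 22.8],
        "city": ["Lahore", "Riyadh", None, "Peshawar"],
    })

    # Transformation 1: impute missing numeric readings with the column mean.
    raw["temperature"] = raw["temperature"].fillna(raw["temperature"].mean())

    # Transformation 2: drop rows whose categorical key is missing entirely.
    clean = raw.dropna(subset=["city"])
    print(clean)
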
Chapter Preview

Background

The term Big Data is commonly used for huge volumes of data that cannot be handled using traditional databases; such data are beyond the capabilities of commonly used software tools to store, manipulate, and process within a reasonable time (Garcia, 2015). A single dataset often holds terabytes (TB) or petabytes (PB) of information. Some of the problems associated with Big Data are capturing streaming data, storage, indexing, sharing, and visualization. Enterprises use these high volumes of data to extract useful knowledge with various software tools. However, as the size of a dataset increases, so does the difficulty of managing it, and advanced tools and techniques are required. Since traditional data analysis and management tools cannot exploit such data, more sophisticated and specialized tools are needed to store and manipulate it. Big Data analytics comprises the tools and techniques that support the decision-making process (Russom, 2011). It covers three main areas: storage and management tools for the data, processing tools for extracting useful information from the data, and visualization. These three areas form the different phases of a decision-making process in Big Data (Russom, 2011).

Key Terms in this Chapter

Extract, Transform, Load (ETL): ETL is a process used in data warehousing to populate one database (typically a warehouse) from another by extracting, transforming, and loading the data.
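
A minimal ETL sketch in Python, using the standard sqlite3 module as a stand-in for the source and warehouse databases; the table and column names (orders, sales_fact) are hypothetical.

    import sqlite3

    src = sqlite3.connect("source.db")      # operational source database (Extract)
    dst = sqlite3.connect("warehouse.db")   # data warehouse being populated (Load)
    dst.execute("CREATE TABLE IF NOT EXISTS sales_fact (product TEXT, revenue REAL)")

    # Extract: read raw order rows from the source system.
    rows = src.execute("SELECT product, quantity, unit_price FROM orders").fetchall()

    # Transform: normalise product names and derive a revenue figure.
    facts = [(product.strip().upper(), quantity * unit_price)
             for product, quantity, unit_price in rows]

    # Load: bulk-insert the transformed rows into the warehouse table.
    dst.executemany("INSERT INTO sales_fact VALUES (?, ?)", facts)
    dst.commit()
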

HBase: Apache HBase is a non-relational Hadoop database used as a Big Data store. HBase is written in Java.
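
HBase stores rows under a row key, with cells grouped into column families. The sketch below uses the third-party happybase Python client purely for illustration (an assumption; HBase is normally accessed through its Java API or the Thrift gateway), and the host, table, and column names are hypothetical.

    import happybase

    # Connect to an HBase Thrift server (hypothetical host).
    connection = happybase.Connection("hbase-thrift-host")
    table = connection.table("web_events")

    # Put: write cells under the 'info' column family for one row key.
    table.put(b"user42#2017-01-01", {b"info:page": b"/home", b"info:ms": b"120"})

    # Get: read the row back by its key.
    print(table.row(b"user42#2017-01-01"))
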

Content Delivery Network (CDN): A Content Delivery Network is used in distributed environments to deliver web content to users based on their geographic location.

DRQL: DRQL is a query language that closely resembles SQL, used for nested data and designed for column-based processing. DRQL is compatible with BigQuery and Dremel.

ZeroMQ: ZeroMQ is an asynchronous messaging system used in distributed and concurrent applications.
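
A small sketch of ZeroMQ messaging through its Python binding, pyzmq, using the request/reply pattern; the two sockets would normally live in separate processes and are shown together only for brevity. The endpoint and message contents are illustrative.

    import zmq

    context = zmq.Context()

    # Server side: a REP socket answers each incoming request.
    server = context.socket(zmq.REP)
    server.bind("tcp://127.0.0.1:5555")

    # Client side: a REQ socket sends a request and waits for the reply.
    client = context.socket(zmq.REQ)
    client.connect("tcp://127.0.0.1:5555")

    client.send_string("ping")       # queued asynchronously by ZeroMQ's I/O thread
    print(server.recv_string())      # -> "ping"
    server.send_string("pong")
    print(client.recv_string())      # -> "pong"
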

Atomicity, Consistency, Isolation and Durability (ACID): Atomicity, consistency, isolation, and durability are the properties of a transaction management system. A transaction should satisfy the ACID properties.
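
Atomicity, for instance, means that a group of statements either all take effect or none do. Below is a minimal sketch using Python's standard sqlite3 module, where the with block forms one transaction; the accounts table and the transferred amount are hypothetical.

    import sqlite3

    conn = sqlite3.connect("bank.db")
    conn.execute("CREATE TABLE IF NOT EXISTS accounts (name TEXT PRIMARY KEY, balance REAL)")
    conn.execute("INSERT OR IGNORE INTO accounts VALUES ('alice', 100), ('bob', 0)")
    conn.commit()

    try:
        # One transaction: committed only if every statement inside succeeds.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - 40 WHERE name = 'alice'")
            conn.execute("UPDATE accounts SET balance = balance + 40 WHERE name = 'bob'")
    except sqlite3.Error:
        # On any failure the whole transfer is rolled back: no partial update survives.
        print("transfer aborted")
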

Greenplum: Greenplum is a Big Data company whose platform provides rapid analytic operations on petabytes of data.

Extreme Application Platform (XAP): GigaSpaces Extreme Application Platform is well suited for high-performance, low-latency transaction processing as well as analytic processing.

Hadoop Distributed File System (HDFS): The Hadoop Distributed File System is used to store very large files and stream data at high bandwidth to servers and user applications.
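
A brief sketch of writing a file to HDFS and streaming it back, using the third-party Python hdfs package against the NameNode's WebHDFS interface (an assumption for illustration; the Java API and the hadoop command line are the more common routes). The host, port, user, and path are hypothetical.

    from hdfs import InsecureClient

    # Hypothetical WebHDFS endpoint exposed by the NameNode.
    client = InsecureClient("http://namenode:9870", user="hadoop")

    # Write a small file; HDFS itself splits large files into blocks across DataNodes.
    with client.write("/data/events.csv", encoding="utf-8", overwrite=True) as writer:
        writer.write("event_id,value\n1,42\n2,17\n")

    # Stream the file back to the client application.
    with client.read("/data/events.csv", encoding="utf-8") as reader:
        print(reader.read())
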

Acunu: Acunu is an analytics platform for high-velocity data, mostly used in production environments.
