
Big Data Analytics Using Apache Hive to Analyze Health Data

Pavani Konagala

Source Title: Nature-Inspired Algorithms for Big Data Frameworks

DOI: 10.4018/978-1-5225-5852-1.ch015

Abstract

A large volume of data is stored electronically, and its total size is very difficult to measure. The data comes from many sources: a stock exchange may generate terabytes of data every day, Facebook may require about one petabyte of storage, and internet archives may store up to two petabytes. Data at this scale is very difficult to manage with relational database management systems, and reading from and writing to disk takes more time as the volume grows. Storing and analyzing such massive data has therefore become a major problem. Big data techniques address it by specifying methods to store and analyze large data sets. This chapter presents a brief study of the big data techniques used to analyze these types of data. It covers Hadoop characteristics, the Hadoop architecture, the advantages of big data, and the big data ecosystem. The chapter also includes a comprehensive study of Apache Hive applied to health-related data and deaths data from the U.S. government.

Chapter Preview

Introduction

The web plays an important role in everyday life, and a large amount of data is available online. These data are generated from various sources such as Twitter, Facebook, cell phone GPS data, and healthcare. Big data analytics (Chen et al, 2014) is the process of collecting and analysing large, complex data sets containing a variety of data types to find customer preferences and other useful information. Processing such data is difficult with traditional data processing applications, so managing and processing these types of data requires a new set of frameworks. Hadoop is an open-source software project for structuring Big Data and making it useful for analytics purposes. Its creator is Doug Cutting, who worked on it at Yahoo; it grew out of the Nutch search engine project. He named it after his son’s toy elephant, and the symbol for Hadoop is a yellow elephant. Hadoop serves as a core platform for processing large data sets over a cluster of servers. These servers are designed to be scalable, with a high degree of fault tolerance.
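The idea of processing a large data set over a cluster of servers can be illustrated with a small single-process sketch. This is not the Hadoop API: it only mimics the pattern of splitting records into chunks, counting within each chunk independently, and merging the partial results. The records, the function names, and the choice of three partitions are all hypothetical and chosen for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical (year, cause_of_death) records; made up for illustration only.
RECORDS = [
    (2014, "heart disease"),
    (2014, "cancer"),
    (2015, "heart disease"),
    (2015, "stroke"),
    (2015, "cancer"),
    (2016, "heart disease"),
]


def partition(records, n_parts):
    """Split the records into n_parts chunks, standing in for cluster nodes."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_count(chunk):
    """'Map' step: count causes within a single chunk only."""
    return Counter(cause for _, cause in chunk)


def reduce_counts(left, right):
    """'Reduce' step: merge two partial counts into one."""
    return left + right


def count_causes(records, n_parts=3):
    """Count causes by combining independent per-chunk counts."""
    partials = [map_count(chunk) for chunk in partition(records, n_parts)]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    for cause, total in count_causes(RECORDS).most_common():
        print(cause, total)
```

Because each chunk is counted independently, the result does not depend on how many partitions are used, which is the property that lets this kind of work be spread across many servers.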

- Seven V’s of Big Data Analytics: Big Data (Sagiroglu et al, 2013) is broken into seven dimensions: Volume, Variety, Velocity, Veracity, Visualisation, Variability and Value.
  - Volume: Volume is the amount of data. The volume of data stored in an organisation has grown from megabytes to petabytes. This sheer volume is what characterises Big Data.
  - Variety: Variety refers to the many sources and types of data, such as structured, semi-structured and unstructured data.
  - Velocity: Velocity is the speed at which data flows in from different sources such as social media sites, mobile devices, business processes, networks and human interaction. This velocity must be handled well to make valuable business decisions.
  - Veracity: Analysis is virtually worthless if the data set being analysed is incomplete or inaccurate. This may happen when the data set is collected from various sources with different formats, noise and errors. More time may be spent cleaning up this noisy data than analysing it.
  - Visualisation: Once the data set is processed, it should be presented in a readable format. The results may contain many parameters and variables that cannot be represented using normal graphical formats or spreadsheets, and even three-dimensional visualisations may not help. Visualisation has therefore become a new challenge of Big Data Analytics. AT&T has announced a new package called Nanocubes for visualisation.
  - Variability: Variability refers to data whose meaning and interpretation change constantly depending on the context. This is particularly true in Natural Language Processing, where a single word may have different meanings and new meanings may replace old ones over time. Interpreting them correctly is essential in applications such as social media analytics. The boundless variability of Big Data therefore presents a unique challenge for data scientists.
  - Value: Big Data Analytics has high potential value. In applications such as the US health care system, Big Data Analytics has reduced spending by 12-17 percent. Big Data offers not only new and effective methods of selling but also new products to meet previously undetected market demands. Many industries use Big Data to reduce costs for their organisations and their customers.
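
The Veracity item above notes that data collected from several sources arrives in different formats, with noise and errors, and must be cleaned before analysis. A minimal sketch of such a clean-up step follows. The rows, the field names (`date`, `state`, `deaths`), the two accepted date formats, and the rejection rules are all assumptions made for illustration; they are not taken from the chapter's data set.

```python
from datetime import datetime

# Hypothetical raw rows from two sources with inconsistent formatting.
RAW_ROWS = [
    {"date": "2015-03-01", "state": "Texas", "deaths": "120"},
    {"date": "03/02/2015", "state": " texas ", "deaths": "98"},
    {"date": "2015-03-03", "state": "Ohio", "deaths": ""},      # missing count
    {"date": "not a date", "state": "Ohio", "deaths": "45"},    # unparseable date
    {"date": "2015-03-04", "state": "Ohio", "deaths": "-7"},    # impossible count
]

# Assumed date formats: ISO (year-month-day) and US-style month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format; return a date, or None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_row(row):
    """Normalise one row, or return None if it is incomplete or inaccurate."""
    day = parse_date(row["date"])
    state = row["state"].strip().title()
    try:
        deaths = int(row["deaths"])
    except ValueError:
        return None
    if day is None or not state or deaths < 0:
        return None
    return {"date": day, "state": state, "deaths": deaths}


def clean(rows):
    """Return the usable rows and the number of rows rejected."""
    kept = [r for r in map(clean_row, rows) if r is not None]
    return kept, len(rows) - len(kept)


if __name__ == "__main__":
    kept, rejected = clean(RAW_ROWS)
    print(len(kept), "rows kept,", rejected, "rows rejected")
```

Counting rejected rows alongside the cleaned ones makes the cost of poor veracity visible before any analysis is run.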
