A Holistic View of Big Data

Won Kim, Ok-Ran Jeong, Chulyun Kim

Source Title: Big Data: Concepts, Methodologies, Tools, and Applications

DOI: 10.4018/978-1-4666-9840-6.ch004

Abstract

Today there is much hype about big data. The discussions seem to revolve around data mining technology, social Web data, and the open source platform of NoSQL and Hadoop. However, database, data warehouse and OLAP technologies are also integral parts of big data. Big data involves data from all sources, not just social Web data. Further, big data requires not only technology, but also a painstaking process for identifying, collecting, and preparing sufficient amounts of relevant data. This paper provides a holistic view of big data.

Introduction

Big data is one of the current IT buzzwords. It refers roughly to extracting actionable intelligence from a large amount of data, including social Web data, and applying it to some important needs of an organization. The data may be stored in the proprietary databases of an organization, purchased from third-party data providers, or gathered from the Internet.

Although there is much hype about big data today, big data has been around for at least three decades or even longer, depending on how it is defined. From the 1970s, database systems, report generators, and decision support systems were the technologies used for managing and analyzing large amounts of data. In the 1990s, data warehousing and data migration technologies made organization-wide decision making easier by bringing together data from various data sources. At about the same time, data mining technology emerged to allow for semi-automatic grouping and classification of data. In all these, relational database systems and file systems have been used for storing data. Recently, the Hadoop open-source platform has become popular for storing and processing big data.

(For expositional simplicity, henceforth we will use the term “big data” to mean not just “a huge amount of data”, but also “storing, managing, and analyzing big data”.) Big data certainly requires technologies. The current hype about big data in the trade press, however, makes it seem that big data is all about technologies, is fully automated magic, and is a requirement for the survival of every organization. In reality, big data is not all about technologies; it requires considerable expert human effort, and it can give an organization a competitive advantage only if used properly. In fact, big data requires the following three critical elements besides technologies.

1. Big data requires data. This may even sound dumb. However, the point is that the data must be of the right kind, must be sufficient in quantity, and must be clean. If relevant data is not available, no actionable intelligence can be discovered. If the amount of data is not sufficient, the results of big data may have no statistical significance. Even if there is a huge amount of data, the data is not usable when much of it is dirty.
2. Big data involves a painstaking process. If this process is not properly followed, efforts to extract actionable intelligence from big data are not likely to succeed. The starting point of the process is to identify an important objective for big data and to explore the feasibility of successfully meeting that objective. The process ends after the actionable intelligence discovered is applied to the business needs. In between, the data must be assessed for its suitability for analysis and then be cleansed, transformed, and encoded for analysis. The suitability assessment and the preparation for analysis require substantial human effort.
3. Big data requires people who understand how to use the technologies and how to execute each step of the big data process. Data mining technology is based on approximate computations that group data based on some measures of “mathematical similarity”, without understanding the meanings of the data. There are many mining tasks, including grouping (clustering) similar objects, classifying new objects into one of the existing groups, detecting anomalous data (outliers), etc. These tasks must be performed on various types of data, including numeric data, words, text, Web pages, multimedia data, sequence data, etc. Further, these tasks must support the special requirements and characteristics of numerous types of applications. To make matters worse, for any given task, there are many algorithms with different tradeoffs. There has been much progress in the usability of the data mining software that embodies the algorithms. However, there is still a long way to go.
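
The three mining tasks named in item 3 can be illustrated with a minimal sketch. It assumes two-dimensional numeric data and Euclidean distance as the measure of "mathematical similarity"; the function names, the sample points, the starting centroids, and the outlier threshold are all illustrative assumptions and are not taken from any particular data mining tool. The clustering step is a bare-bones k-means loop, one of many possible algorithms for this task.

```python
import math


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def classify(point, centroids):
    """Classification: index of the group whose centroid is nearest."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def cluster(points, centroids, rounds=10):
    """Clustering: a bare-bones k-means loop from given starting centroids."""
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for p in points:
            groups[classify(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def is_outlier(point, centroids, threshold):
    """Outlier detection: farther than `threshold` from every centroid."""
    return min(math.dist(point, c) for c in centroids) > threshold


# Hypothetical sample data: two visually obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids = cluster(points, centroids=[points[0], points[3]])

print(centroids)
print(classify((2.0, 2.0), centroids))
print(is_outlier((20.0, 0.0), centroids, threshold=3.0))
```

Nothing in the sketch knows what the numbers mean. The choice of distance measure, the number of groups, the starting centroids, and the threshold are all human judgments, which is the kind of expertise item 3 calls for.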

In other words, big data is difficult to do. In this paper, we provide a holistic view of big data, including technologies and non-technology elements, so that the readers may have a more complete perspective of big data, rather than get sidetracked by the current hype.

The remainder of this article is organized as follows. In the Process section, we will discuss the big data process, along with the technologies relevant to big data. In the Data Mining Technologies section, we will review data mining technologies. In the Database Platform section, we will discuss the big data platform issues. In the Conclusion section, we will outline R&D directions and conclude the article.
