
Role of Open Source Software in Big Data Storage

Rupali Ahuja, Jigyasa Malik, Ronak Tyagi, R. Brinda

Source Title: Handbook of Research on Big Data Storage and Visualization Techniques

DOI: 10.4018/978-1-5225-3142-5.ch005


Abstract

Today, the world revolves around Big Data. Every organization is working to derive value from the huge volumes of data generated each moment. Open source software is being widely adopted by academics, researchers, and industry practitioners to handle various Big Data needs because of its easy availability, flexibility, affordability, and interoperability. As a result, several open source Big Data tools have been developed. This chapter discusses the role of open source software in Big Data storage and how various organizations have benefited from its use. It provides an overview of the popular open source Big Data storage technologies available today. Distributed file systems and NoSQL databases designed for storing Big Data are discussed, along with their features, applications, and a comparison.

Chapter Preview


Background

The amount of data generated each second continues to grow at an exponential rate. Facebook, a social networking website, is home to 40 billion photos, and more than 100 hours of video are uploaded to YouTube every minute. Such figures are rising rapidly in almost every field, increasing the interest in and demand for Big Data storage and management technologies. A forecast from International Data Corporation (IDC) sees the Big Data technology and services market growing at a compound annual growth rate (CAGR) of 23.1% over the 2014-2019 forecast period, with annual spending reaching $48.6 billion in 2019 (IDC, 2016).

Open source tools play a prominent role in addressing Big Data storage issues. The most dominant technologies in the Big Data world, Hadoop and Apache Spark, are open source tools. The most popular Big Data software distribution companies, such as Cloudera and Hortonworks, have built their businesses around open source technologies. Open source is the platform best suited to Big Data solutions. Almost all Big Data solutions run on UNIX-like operating systems such as Linux, which is open source. Without open source tools, the Big Data world would not have grown so rapidly. According to Talend’s CEO, Mike Tuchen, “the entire next-generation data platform will be open source” (Noyes, 2016).

Key Terms in this Chapter

Distributed File Systems (DFS): A file system in which files are distributed across multiple storage resources but appear to users as if they exist in a single location.

MVCC (Multi Version Concurrency Control): A concurrency control method that allows concurrent access to the database without using a locking mechanism, by maintaining different versions of the same data.

Sharding: A database partitioning scheme in which datasets are distributed across nodes to balance load and improve performance.

Inode: In UNIX, an inode is a data structure used to represent a file system object. It stores the attributes and disk block locations of the file system object's data.

Multi-Master Replication: A method of database replication in which a group of computers store and update data. All members can handle client requests and are responsible for transmitting modifications to the rest of the group.

Master Slave Replication: A method of database replication in which data is stored by a group of computers but can be updated by only one member, the “master” of the group. The master is in charge of the group, while several other database servers (the “slaves”) keep copies of all the data that has been written to the master and can be queried. Data cannot be written to slaves directly.

Geographic Replication: A replication system in which data is replicated across geographically distant servers to improve network performance.
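
The sharding entry above can be illustrated with a minimal sketch. It assumes hash-based partitioning, which is only one of several possible schemes, and it models each node as an in-memory dictionary. The names `shard_for` and `ShardedStore` are hypothetical and do not come from the chapter or from any particular database.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (illustrative assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store that spreads records across several 'nodes'.

    Each node is a plain dict here; in a real system it would be a separate server.
    """

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # Route the write to the shard that owns this key.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        # Reads go to the same shard the write was routed to.
        return self.shards[shard_for(key, len(self.shards))].get(key, default)

    def shard_sizes(self) -> list:
        # Number of records held by each shard, useful for checking balance.
        return [len(shard) for shard in self.shards]
```

Calling `put` for many distinct keys and then inspecting `shard_sizes()` shows how records spread across the shards. Because the shard index depends on the number of shards, changing that number in this sketch would remap most keys, which is one reason production systems often use more elaborate schemes.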
