Secure Data Deduplication on Cloud Storage


Shivansh Mishra, Surjit Singh
DOI: 10.4018/978-1-5225-7335-7.ch002

Abstract

Deduplication is the process of removing duplicate data by storing only one copy of the original data and replacing the others with references to that original. When data is stored on cloud storage, client-side deduplication reduces storage and communication overheads from both the client's and the server's perspective. Secure deduplication is the practice by which data stored on the cloud is protected from external influences, so that clients maintain the privacy of their data while the server still takes advantage of deduplication. This is achieved by encrypting the data, using different schemes, into ciphertext that makes sense only to the original client. The schemes created for secure deduplication on cloud storage solve the problem of detecting duplicates in encrypted ciphertext. This chapter provides a brief overview of secure deduplication on cloud storage along with the issues encountered during its implementation. The chapter also includes a literature review and a comparison of some deduplication techniques.

Introduction

In the modern world there have been several technologies that have fundamentally changed the way we interact with the digital ecosystem. Of these, cloud computing has been one of the most influential. The simple fact that we can store our data or offload processing to an off-site location instead of using our local devices has been revolutionary. It has expanded the ways in which we can use data stored on the cloud across various devices and for various purposes. This trend of offloading storage and computing capacity to third-party providers does not seem to be stopping anytime soon. Rather, with the advent of technologies like the IoT (Internet of Things), the amount of cloud storage required is only projected to grow at a mammoth pace. According to a recent report, the total volume of data stored will double every two years until 2020 (Gantz & Reinsel, 2012), and more than 75% of the data produced is considered to be duplicated (Reinsel & Gantz, 2010). Huge savings can therefore be achieved by identifying this duplicate data, and deduplication is a possible solution to this situation.

Data deduplication is the process of looking for redundant sequences of data across different comparison windows. The first unique version of a data object is stored, and subsequent duplicates are merely referenced to the original data object rather than stored again (as shown in Figure 1). This process is completely hidden from users and applications retrieving the stored data. Comparing two similar data objects byte by byte, however, is very cumbersome. Hence, the first step of deduplication is to create a data fingerprint for each object that is written to the storage device. This fingerprint should act as a unique identity key for the data object, with the additional requirement that generating and comparing such fingerprints should not be too expensive. When new data has to be written to the device, its fingerprint is matched against those of the data objects already written to storage. All duplicate copies except the first are not written to actual storage but are simply referenced by pointers to the location of the first unique copy. If a previously unseen data object is encountered, one whose fingerprint does not match any other on the storage device, the full data object is written to storage. Hashes are often used as data fingerprints. Different data structures are used to match hashes, since the sheer number of data objects being processed is very high; both the hash generation and the hash matching schemes are therefore optimized to provide the most efficient results. Deduplication performed on cloud storage has an additional criterion: it must be secure, i.e., the cloud storage provider should not be privy to the actual plaintext being stored by the client. Also, since spoofing attacks are very common on the internet, care should be taken that the mechanism employed is immune to such attacks.
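To make the fingerprint-and-pointer mechanism concrete, the following minimal Python sketch (ours, not taken from the chapter) uses SHA-256 digests as fingerprints and a plain dictionary as the fingerprint index; the class and method names (DedupStore, write, read) are illustrative only:

import hashlib

class DedupStore:
    """Toy content-addressed store: each unique block is kept once;
    duplicates are recorded only as references to the first copy."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes, stored once
        self.refs = []     # logical write order, kept as fingerprints (pointers)

    def write(self, block: bytes) -> str:
        # The SHA-256 digest serves as the block's fingerprint
        fp = hashlib.sha256(block).hexdigest()
        if fp not in self.blocks:      # previously unseen data object
            self.blocks[fp] = block    # write the full block to storage
        self.refs.append(fp)           # a duplicate only adds a pointer
        return fp

    def read(self, index: int) -> bytes:
        return self.blocks[self.refs[index]]

store = DedupStore()
for b in [b"alpha", b"beta", b"alpha", b"alpha"]:
    store.write(b)
print(len(store.refs), "logical blocks,", len(store.blocks), "blocks physically stored")
# -> 4 logical blocks, 2 blocks physically stored

Production systems replace the dictionary with index structures tuned for billions of fingerprints, but the store-once, point-thereafter logic is the same.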

Figure 1.

The figure shows the result of deduplication on duplicate data, such that the total number of stored data blocks is reduced. It also shows the pointer-based storage approach for efficient storage utilization, where duplicate blocks are merely referenced instead of being stored again (here, blocks of the same colour signify identical data blocks in the same storage array).


In this chapter we provide a brief overview of some important aspects of deduplication. We look at some of the advantages of deduplication, contrast them with its disadvantages, and discuss some general limitations of deduplication techniques. In later sections we review the background research that has already been carried out in the field of secure data deduplication. We also present some generic criteria that a deduplication scheme must satisfy if it is to be applied to a cloud storage scenario. We then compare some of the more popular deduplication techniques and look at some directions in which future research in this area could be conducted.

The major advantages of deduplication are as follows:

Key Terms in this Chapter

Deduplication: The process of removing duplicate data from a storage device by saving references to duplicate data.

Proof of Storage (PoS): The process of verifying the identity of the storage server on which the file to be retrieved is stored. This is done by the client and prevents spoofing attacks in which an invalid copy of the file would be transferred to the client.

Convergent Encryption: Also known as content hash keying, the process of encrypting data with a key derived from the data itself, so that identical plaintext produces identical ciphertext (see the sketch after this list).

Message Locked Encryption: A scheme in which a hash of the file itself is used as the key to encrypt the file, so the message "locks" its own encryption key.

Proof of Retrievability (PoR): The process of checking the integrity of a file that is stored on an off-site device. This helps in identifying unauthorized changes made to the file.

Proof of Ownership (PoW): The process of verifying the identity of the client requesting a file for download. This is done by the storage server and prevents unauthorized access to sensitive data (a toy challenge-response sketch is given below).
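As a concrete illustration of convergent encryption, the following minimal Python sketch derives the encryption key from a SHA-256 hash of the plaintext, so identical files encrypt to identical ciphertexts that the server can deduplicate without ever seeing the plaintext. This is a sketch only: it assumes the third-party cryptography package, the function name convergent_encrypt is our own, and the deterministic nonce is used purely to keep the encryption deterministic.

import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # assumes: pip install cryptography

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    # The key is derived from the content itself, so identical
    # plaintexts yield identical keys and identical ciphertexts.
    key = hashlib.sha256(plaintext).digest()      # 32-byte AES-256 key
    nonce = hashlib.sha256(key).digest()[:12]     # deterministic 12-byte nonce
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return key, ciphertext

k1, c1 = convergent_encrypt(b"same file contents")
k2, c2 = convergent_encrypt(b"same file contents")
assert c1 == c2  # the server can detect duplicates on ciphertext alone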
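Similarly, a proof-of-ownership exchange can be sketched as a simple challenge-response: the server sends a fresh nonce, and the client must hash it together with the entire file, thereby demonstrating possession of the file itself rather than just its fingerprint. This is a toy illustration with hypothetical function names; practical PoW schemes (e.g., Merkle-tree-based constructions) are considerably more involved.

import hashlib, hmac, os

def pow_challenge() -> bytes:
    # Server side: issue a fresh random nonce for every ownership check
    return os.urandom(16)

def pow_response(nonce: bytes, file_bytes: bytes) -> bytes:
    # Client side: hashing the nonce together with the whole file shows
    # possession of the file itself, not just of a stored fingerprint
    return hashlib.sha256(nonce + file_bytes).digest()

def pow_verify(nonce: bytes, response: bytes, stored_file: bytes) -> bool:
    # Server side: recompute the expected answer over the stored copy
    expected = hashlib.sha256(nonce + stored_file).digest()
    return hmac.compare_digest(expected, response)

data = b"contents of a deduplicated file"
nonce = pow_challenge()
assert pow_verify(nonce, pow_response(nonce, data), data)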
