Towards Efficient Big Data Storage With MapReduce Deduplication System

Vijesh Joe, Jennifer S. Raj, Smys S.

Source Title: International Journal of Information Technology and Web Engineering (IJITWE) 16(2)

ISSN: 1554-1045|EISSN: 1554-1053|EISBN13: 9781799859758|DOI: 10.4018/IJITWE.2021040103

MLA

Joe, Vijesh, et al. "Towards Efficient Big Data Storage With MapReduce Deduplication System." IJITWE vol.16, no.2 2021: pp.45-57. http://doi.org/10.4018/IJITWE.2021040103

APA

Joe, V., Raj, J. S., & Smys, S. (2021). Towards Efficient Big Data Storage With MapReduce Deduplication System. International Journal of Information Technology and Web Engineering (IJITWE), 16(2), 45-57. http://doi.org/10.4018/IJITWE.2021040103

Chicago

Joe, Vijesh, Jennifer S. Raj, and Smys S. "Towards Efficient Big Data Storage With MapReduce Deduplication System," International Journal of Information Technology and Web Engineering (IJITWE) 16, no.2: 45-57. http://doi.org/10.4018/IJITWE.2021040103

Abstract

In the big data era, demand for data storage and processing is high. Conventional approaches struggle to keep up, and de-duplication is an effective way to reduce both storage space and computation time. Many existing approaches, however, take a long time to identify similar data. This paper proposes a MapReduce de-duplication system that aims for a high de-duplication ratio. MapReduce is a parallel processing approach that helps process a large number of files in less time. The proposed system uses the two thresholds, two divisors with switch (TTTD-S) algorithm for chunking; the switch is an average parameter that TTTD-S uses to minimize chunk-size variance. SHA-3 is used for hashing and a fractal tree index for indexing; in the fractal tree index, reads and writes take place at the same time. The evaluation parameters are data size after de-duplication, de-duplication ratio, throughput, hash time, chunk time, and de-duplication time. The system is tested on the College Scorecard and ZCTA datasets. The experimental results show that the proposed system can reduce duplication and processing time.
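
The chunk, hash, and index steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simplified threshold-and-divisor chunker in the spirit of TTTD (the switch step of TTTD-S is omitted), a toy shift-and-add hash for breakpoint detection, a plain Python dict standing in for the fractal tree index, and sequential rather than MapReduce execution. The function names, parameter values, and the returned byte counts are all hypothetical choices made for illustration.

```python
import hashlib


def chunk(data, t_min=64, t_max=512, main_div=128, backup_div=64):
    """Split bytes at content-defined breakpoints (simplified, TTTD-like).

    t_min / t_max are the two size thresholds; main_div / backup_div are
    the two divisors. All default values are illustrative assumptions.
    """
    chunks = []
    start = 0      # first byte of the chunk being built
    backup = -1    # last backup breakpoint seen in this chunk, if any
    h = 0          # toy shift-and-add hash, not a production rolling hash
    i = 0
    n = len(data)
    while i < n:
        h = ((h << 1) + data[i]) & 0xFFFFFFFF
        length = i - start + 1
        cut = -1
        if length >= t_min:
            if h % backup_div == backup_div - 1:
                backup = i                       # remember a fallback cut point
            if h % main_div == main_div - 1:
                cut = i                          # main divisor matched
            elif length >= t_max:
                cut = backup if backup >= 0 else i   # forced cut at t_max
        if cut >= 0:
            chunks.append(data[start:cut + 1])
            start = cut + 1
            i = cut                              # resume right after the cut
            backup = -1
            h = 0
        i += 1
    if start < n:
        chunks.append(data[start:])              # trailing partial chunk
    return chunks


def deduplicate(blobs):
    """Store each distinct chunk once, keyed by its SHA-3 digest.

    Returns (index, total_bytes, stored_bytes). The dict is a stand-in
    for the fractal tree index described in the abstract.
    """
    index = {}
    total = 0
    for blob in blobs:
        for piece in chunk(blob):
            total += len(piece)
            digest = hashlib.sha3_256(piece).hexdigest()
            index.setdefault(digest, piece)      # keep first copy only
    stored = sum(len(piece) for piece in index.values())
    return index, total, stored
```

Under these assumptions, passing the same input twice to `deduplicate` stores its chunks only once, so `total / stored` (one possible definition of the de-duplication ratio, assumed here) is at least 2 for non-empty input.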
