An Automatic Centroid Image Selection Method Based on Fuzzy Logic Reasoning in Image Deduplication

Ming Chen (Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou, China), Jinghua Yan (National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, China), Tieliang Gao (School of Business, Xinxiang University, Henan, China), Huan Ma (Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou, China), Li Duan (Beijing Jiaotong University, Beijing, China) and Qiguang Tang (Oil & Gas Engineering Service Center, Sinopec Zhongyuan Oilfield, Henan, China)
Copyright: © 2020 | Pages: 12
DOI: 10.4018/IJGHPC.2020100101

Centroid selection plays a key role in image deduplication. It means selecting the optimal image in a duplicate image set as the centroid image, deleting the other image copies, and establishing pointers at their original positions that point to the centroid image. At present, there is no mature centroid selection scheme; centroid selection relies mainly on users completing it manually based on experience. In a massive data environment, this consumes substantial human resources and is prone to errors of subjective judgment. To solve this problem, this article proposes an automatic centroid image selection method based on fuzzy logic reasoning. In a duplicate image set, image attribute information is used to automatically infer a comprehensive quantized value for each image, and the centroid image is selected by comparing these quantized values. The experimental results show that the scheme not only conforms to visual perception characteristics but also meets the purpose of image deduplication.
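The abstract does not specify which attributes or rules are used, so the following is only a minimal sketch of the general idea, assuming a zero-order Sugeno-style fuzzy inference. The two normalised inputs (`resolution` and `quality`), the triangular membership functions and the rule table `RULES` are all hypothetical illustrations, not the rule base of this article. Each image receives a crisp score in [0, 1], and the image with the highest score is taken as the centroid.

```python
# Hypothetical sketch: attributes, fuzzy sets and rules below are
# illustrative assumptions, not the article's actual rule base.

def tri(x, a, b, c):
    """Triangular membership: 0 at or beyond a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Overlapping fuzzy sets on a normalised [0, 1] scale; every input in
# [0, 1] has non-zero membership in at least one set.
SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Hypothetical rules: (resolution level, quality level) -> crisp output.
RULES = {
    ("low", "low"): 0.0, ("low", "medium"): 0.2, ("low", "high"): 0.4,
    ("medium", "low"): 0.3, ("medium", "medium"): 0.5, ("medium", "high"): 0.7,
    ("high", "low"): 0.6, ("high", "medium"): 0.8, ("high", "high"): 1.0,
}

def fuzzy_score(resolution, quality):
    """Infer a comprehensive quantized value in [0, 1] for one image."""
    num = den = 0.0
    for (r_level, q_level), z in RULES.items():
        # Firing strength of the rule: fuzzy AND taken as min.
        w = min(tri(resolution, *SETS[r_level]), tri(quality, *SETS[q_level]))
        num += w * z
        den += w
    if den == 0.0:
        raise ValueError("inputs must lie in [0, 1]")
    # Weighted-average defuzzification.
    return num / den

def select_centroid(images):
    """Return the name of the image with the highest inferred score."""
    return max(images, key=lambda name: fuzzy_score(*images[name]))

# Hypothetical duplicate set: name -> (relative resolution, encoding quality)
images = {"a.jpg": (1.0, 0.95), "b.jpg": (0.5, 0.75), "c.jpg": (0.25, 0.9)}
print(select_centroid(images))  # a.jpg
```

Here `a.jpg` scores about 0.98, `b.jpg` 0.6 and `c.jpg` about 0.49. The min operator for AND and weighted-average defuzzification are standard choices for this kind of inference; a real rule base would be tuned to the attributes the method actually uses.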

With the development of Web 2.0 (Xiao, Cheng, Wei, Li, Wang, & Xu, 2019; Xiao, Cheng, & Liu, 2019), image data on the Internet has grown explosively, placing great pressure on data storage. Simply adding more storage devices cannot keep pace with this data explosion. Therefore, how to use limited storage resources to meet growing storage demand has become an urgent problem in the storage field.

At present, traditional data deduplication technology is unsatisfactory for multimedia data, especially in primary storage systems (Min, Yoon, & Won, 2011). This is because traditional data deduplication judges two data items to be redundant if and only if their bit streams are identical. In the image storage field, however, image encoding rules (Pennebaker & Mitchell, 1993) mean that even a small change can completely change an image's bit stream. Therefore, traditional data deduplication can eliminate only exactly identical images; it cannot handle duplicate images, which have the same visual perception but different encodings.
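This limitation can be illustrated with a minimal sketch of bit-stream-level deduplication. It assumes SHA-256 digests as the identity test (a common choice, not one prescribed by the cited work) and uses short byte strings as stand-ins for encoded image files; flipping a single bit stands in for a re-encoding that changes the bit stream.

```python
import hashlib

def dedup_exact(blobs):
    """Keep one name per distinct SHA-256 digest, i.e. per identical bit stream."""
    kept = {}
    for name, data in blobs.items():
        digest = hashlib.sha256(data).hexdigest()
        kept.setdefault(digest, name)  # first name seen for this digest wins
    return sorted(kept.values())

original = bytes(range(256))   # stand-in for an encoded image file
edited = bytearray(original)
edited[100] ^= 0x01            # one-bit change, e.g. after re-encoding
blobs = {"copy1.jpg": original, "copy2.jpg": original, "edited.jpg": bytes(edited)}

# The exact copy is removed, but the near-duplicate survives.
print(dedup_exact(blobs))  # ['copy1.jpg', 'edited.jpg']
```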

Therefore, a new technology called image deduplication has emerged (Rashid & Miri, 2018). In image deduplication, an optimal image in a duplicate image set is selected as the centroid image according to image attribute information. The other duplicate images are then deleted, and pointers to the centroid image are established at their original positions. When users need them, the other duplicate images can be regenerated from the centroid image through transformations. However, there is still no mature solution for image deduplication, mainly for the following two reasons:

  1. Retrieval accuracy: The accuracy of duplicate image detection can hardly reach 100%, and deleting images by mistake causes losses for users. Therefore, duplicate image deletion currently relies mainly on users manually selecting a representative image based on subjective experience and deleting the others.

  2. Centroid selection: The contents of duplicate images are not exactly the same, so a representative image must be selected before the other duplicates are deleted. To reduce user losses, images with higher perceived quality are generally chosen as representatives, because an image with lower perceived quality can be deleted and, when needed, replaced by one with higher perceived quality, whereas an image with higher perceived quality cannot be replaced by one with lower perceived quality (Etienne, Herve, & Adrian, 2017). At present, which factors and algorithms should be used to automatically select representative images for a duplicate image set remains an open question.
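The deduplication workflow described above (keep a centroid, replace the other copies with pointers, and prefer the copy with the highest perceived quality) can be sketched as follows. This is a minimal illustration under stated assumptions: the in-memory `store`, the `Pointer` class and the `quality` scores are hypothetical, and `resolve` simply returns the centroid's content rather than regenerating the deleted copy through transformations.

```python
from dataclasses import dataclass

@dataclass
class Pointer:
    """Placeholder left at a deleted image's original position (hypothetical)."""
    target: str

def deduplicate(store, duplicate_set, score):
    """Keep the highest-scoring image; replace every other copy with a Pointer."""
    centroid = max(duplicate_set, key=score)
    for path in duplicate_set:
        if path != centroid:
            store[path] = Pointer(centroid)  # delete the copy, keep a pointer
    return centroid

def resolve(store, path):
    """Follow a pointer, if any, to the stored centroid image."""
    entry = store[path]
    return store[entry.target] if isinstance(entry, Pointer) else entry

# Hypothetical duplicate set with perceived-quality scores.
store = {"a.jpg": b"high-quality", "b.jpg": b"low-quality", "c.jpg": b"mid-quality"}
quality = {"a.jpg": 0.9, "b.jpg": 0.4, "c.jpg": 0.6}
centroid = deduplicate(store, list(store), quality.get)
print(centroid, store["b.jpg"], resolve(store, "b.jpg"))
# a.jpg Pointer(target='a.jpg') b'high-quality'
```

Choosing the maximum score reflects the asymmetry noted above: a lower-quality copy can be recovered from a higher-quality centroid, but not the other way round.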

For the first reason, content-based duplicate image detection has been studied since the early 1990s (Chang, Wang, & Li, 1998; Changick, 2003; Sivic & Zisserman, 2003; Etienne, Herve, & Adrian, 2017; Wu, Ard, Ewin, & Michael, 2017; Tang, Li, & Zhu, 2018; Liu, Shen, Wang, & Wang, 2019). Although retrieval accuracy has still not reached 100%, in some special applications it can approach 100% through feature selection and threshold control, which meets the needs of image deduplication when a certain loss is acceptable. For wider applications, however, existing retrieval accuracy is not satisfactory, so duplicate image deletion relies entirely on manual judgment, which occupies a large amount of human and material resources and easily leads to errors of subjective judgment. Given that a major breakthrough on the "semantic gap" is unlikely in the short term, the focus of this paper is not how to improve retrieval accuracy but how to automatically select a centroid image from the duplicate images already found. Automatic centroid selection can support people's decisions, improving work efficiency and reducing judgment errors, which makes it a meaningful problem. This paper therefore studies how to automatically select a representative image as the centroid image according to image content and the characteristics of image deduplication. To solve this problem, we first give the principles of centroid image selection.
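The remark that near-100% accuracy can be reached through feature selection and threshold control can be illustrated with a toy detector. The sketch below swaps in a simple average hash (aHash) as the feature and a Hamming-distance threshold as the control; neither is taken from this article or from the works cited above. It assumes images have already been reduced to 8x8 grayscale grids, and the threshold of 5 bits is an arbitrary illustrative value.

```python
def average_hash(pixels):
    """64-bit average hash of an 8x8 grayscale grid: True where a pixel exceeds the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [p > mean for p in flat]

def hamming(h1, h2):
    """Number of bit positions at which two hashes differ."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

def is_duplicate(img_a, img_b, threshold=5):
    """Treat two images as duplicates if their hashes differ in at most `threshold` bits."""
    return hamming(average_hash(img_a), average_hash(img_b)) <= threshold

gradient = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[p + 3 for p in row] for row in gradient]  # same content, shifted brightness
checker = [[255 if (r + c) % 2 else 0 for c in range(8)] for r in range(8)]  # different content

print(is_duplicate(gradient, brighter))  # True
print(is_duplicate(gradient, checker))   # False
```

Lowering the threshold reduces false positives at the cost of missing more edited copies; raising it does the opposite, which is the trade-off behind accepting "a certain loss".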