Improved Semantic Representation Learning by Multiple Clustering for Image-Based 3D Model Retrieval

Jinghui Chu, Xiaoqian Zhao, Dan Song, Wenhui Li, Shenyuan Zhang, Xuanya Li, An-An Liu
Copyright: © 2022 | Pages: 20
DOI: 10.4018/IJSWIS.297033

Abstract

As the number of 3D models grows and managing them becomes a heavy burden, image-based 3D model retrieval, which organizes unlabeled 3D models using the abundant knowledge learned from labeled 2D images, has drawn attention. However, prior methods are limited in their ability to semantically align corresponding categories across the two domains because label information is missing in the 3D domain. To this end, this paper proposes an improved semantic representation learning approach based on multiple clustering, which improves the reliability of pseudo labels for 3D models and thereby achieves class-level semantic alignment. Specifically, the method first extracts features for 2D images and 3D models. It then performs clustering that combines the 3D features with the semantic information obtained from multiple clustering on the 3D model features, yielding more reliable target pseudo labels. Extensive experiments show that the proposed method achieves average gains of 3.0%-205.0% on popular retrieval metrics on the monocular image-based 3D object retrieval (MI3DOR) benchmark, and 1.3%-69.7% on another advanced benchmark, MI3DOR-2.

Introduction

With the wide application of 3D techniques in academic, medical and other fields (Abburu, 2019), it has become urgent to manage the rapidly growing volume of unlabeled 3D data efficiently. An intuitive idea is to transfer knowledge from available annotated data to these unlabeled 3D data. In practice, existing organized 2D datasets have advantages over 3D datasets in terms of category number and sample diversity. For example, public 2D datasets used for classification or retrieval tasks, such as ImageNet (Deng et al., 2009) and COCO (Lin et al., 2014), have over 20,000 categories and 15,000,000 samples, which is tens of times more than popular 3D datasets used for the same tasks, such as ModelNet (Wu et al., 2015) and ShapeNet (Chang et al., 2015). It is also more convenient to use widely available 2D images to manage 3D models. Thus, this paper leverages a model trained on a labeled 2D dataset to manage the growing body of unlabeled 3D data. However, this attractive task suffers from the domain shift induced by differences in illumination, texture and background between 2D images and 3D models.

Domain adaptation is a mainstream direction for addressing the large domain shift in cross-domain problems such as the image-based 3D model retrieval studied in this paper. Existing domain adaptation methods can be divided into two categories (Xia & Ding, 2020), i.e., discrepancy measurement (Long et al., 2015; Long et al., 2017) and domain adversarial learning (Liu et al., 2019; Zhang et al., 2019). The former measures and reduces the statistical cross-domain distance. The latter inherits the concept of generative adversarial networks (GAN) (Goodfellow et al., 2014) to bridge the gap between two distinct domains. Specifically, a domain discriminator is added on top of the feature extractor to distinguish features of the source domain from those of the target domain. Meanwhile, the feature extractor is trained to generate domain-invariant features for both domains in order to confuse the discriminator. However, this series of domain adaptation methods is unlikely to improve further, since the methods focus only on domain-level alignment (global statistics alignment) and ignore class-level alignment (semantic alignment) across domains.
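
To make the idea of a "statistical cross-domain distance" concrete, the sketch below computes one common statistic of this kind, the squared maximum mean discrepancy (MMD) with a linear kernel, which reduces to the squared distance between the mean feature vectors of the two domains. This is an illustration only: the function names and the toy feature lists are assumptions made for the example, and the preview does not state which discrepancy measure the cited methods or this paper use.

```python
def mean_vector(features):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def linear_mmd2(source, target):
    """Squared MMD with a linear kernel.

    With a linear kernel the statistic is the squared Euclidean distance
    between the source-domain and target-domain feature means.
    """
    mean_s = mean_vector(source)
    mean_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))


# Toy features (hypothetical): two samples per domain, two dimensions each.
source = [[0.0, 1.0], [2.0, 3.0]]        # mean is (1, 2)
target_near = [[0.0, 1.0], [2.0, 3.0]]   # same distribution as source
target_far = [[4.0, 1.0], [6.0, 3.0]]    # mean is (5, 2), shifted along x

print(linear_mmd2(source, target_near))  # 0.0
print(linear_mmd2(source, target_far))   # 16.0
```

A discrepancy-based method would add a term like this to the training loss so that the feature extractor is pushed to shrink it. Note that the statistic compares whole domains only, which is exactly the domain-level limitation described above: it can reach zero even when individual classes are misaligned.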

To alleviate the above issue, numerous works have emerged, among which one of the most effective and convincing approaches is to assign relatively reliable pseudo labels to the target samples and promote class-level alignment on that basis. For example, Xie et al. (2018), Zhou et al. (2019a), and Zhou et al. (2020) all adopted a source classifier that was well trained on the source domain to label the target domain, and then performed their own semantic alignment. However, the source classifier is biased towards the characteristics of the source domain and does not transfer well to the target domain. Thus, Zuo et al. (2021) proposed to train a target-specific classifier to assign pseudo labels to the target samples. Specifically, they trained the target-specific classifier with cross-entropy on easy target samples, using the predictions of the source classifier as ground truth, and with a tailored GAN with two discriminators on tough target samples. Although this method takes target samples into account, the prediction accuracy of the pseudo labels still relies on the source classifier and on domain transferability. To remove the limitation of the source classifier and improve the reliability of pseudo labels for target samples, this paper explores the semantic similarity between target samples at different classification levels through multiple clustering, and then combines this similarity information with the target features to classify the target samples more accurately.
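
The preview does not give the details of the multiple clustering step, so the following is only a rough sketch of one way such an idea could work, not the authors' algorithm. It assumes that "different classification levels" can be approximated by running k-means with several cluster counts, that the similarity information is the fraction of runs in which two samples share a cluster (a co-association matrix), and that "combining" means concatenating each sample's similarity row to its feature vector before a final clustering. All function names, the farthest-point initialization, and the toy data are hypothetical choices made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (empty clusters stay).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(points, ks):
    """Fraction of clusterings (one per k in ks) where samples i, j co-occur."""
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for k in ks:
        labels = kmeans(points, k)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(ks) for c in row] for row in counts]


def pseudo_labels(points, ks, num_classes):
    """Cluster features concatenated with their similarity rows (assumed scheme)."""
    similarity = co_association(points, ks)
    combined = [list(p) + row for p, row in zip(points, similarity)]
    return kmeans(combined, num_classes)


# Toy "3D model features" (hypothetical): two well-separated groups of six.
blob_a = [[0.1 * i, 0.2 * (i % 3)] for i in range(6)]
blob_b = [[10 + 0.1 * i, 10 + 0.2 * (i % 3)] for i in range(6)]
features = blob_a + blob_b

similarity = co_association(features, ks=(2, 3, 4))
labels = pseudo_labels(features, ks=(2, 3, 4), num_classes=2)
print(labels)
```

On this toy data the first six samples receive one pseudo label and the last six the other, without consulting any source classifier. Real target features are far less cleanly separated, so the cluster counts and the way similarity is weighted against the raw features would need tuning.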
