Introduction
With the wide application of 3D techniques in academic, medical and other fields (Abburu, 2019), it has become urgent to manage the rapidly growing volume of unlabeled 3D data efficiently. An intuitive idea is to transfer knowledge from available annotated data to these unlabeled 3D data. In practice, existing organized 2D datasets have advantages over 3D datasets in terms of category number and sample diversity. For example, public 2D datasets used for classification or retrieval tasks, such as ImageNet (Deng et al., 2009) and COCO (Lin et al., 2014), have over 20,000 categories and 15,000,000 samples, which is tens of times more than popular 3D datasets used for the same tasks, such as ModelNet (Wu et al., 2015) and ShapeNet (Chang et al., 2015). It is also more convenient to use widely available 2D images to manage 3D models. Thus, this paper leverages a model trained on a labeled 2D dataset to manage the growing volume of unlabeled 3D data. However, this attractive task suffers from the domain shift induced by differences in illumination, texture and background between 2D images and 3D models.
Domain adaptation is a mainstream approach to addressing the large domain shift in cross-domain problems such as the image-based 3D model retrieval studied in this paper. Existing domain adaptation methods can be divided into two categories (Xia & Ding, 2020), i.e., discrepancy measurement (Long et al., 2015; Long et al., 2017) and domain adversarial learning (Liu et al., 2019; Zhang et al., 2019). The former measures and reduces the statistical distance between domains. The latter inherits the concept of generative adversarial networks (GAN) (Goodfellow et al., 2014) to bridge the gap between two distinct domains. Specifically, a domain discriminator is added on top of the feature extractor to distinguish source-domain features from target-domain features. Meanwhile, the feature extractor is trained to generate domain-invariant features for both domains in order to confuse the discriminator. However, this family of domain adaptation methods is unlikely to improve further, since it focuses only on domain-level alignment (global statistics alignment) and ignores class-level alignment (semantic alignment) across domains.
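The difference between domain-level and class-level alignment can be illustrated with a minimal sketch. It assumes toy one-dimensional features and uses the distance between feature means as a stand-in for a statistical discrepancy measure; the names `domain_gap` and `class_gap`, the class names, and all feature values are hypothetical and are not taken from the cited methods.

```python
# Illustrative sketch only: toy 1-D features, not the paper's data or method.

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def domain_gap(source, target):
    """Domain-level discrepancy: distance between the overall feature means."""
    src_all = [x for feats in source.values() for x in feats]
    tgt_all = [x for feats in target.values() for x in feats]
    return abs(mean(src_all) - mean(tgt_all))

def class_gap(source, target):
    """Class-level discrepancy: average distance between per-class means."""
    gaps = [abs(mean(source[c]) - mean(target[c])) for c in source]
    return mean(gaps)

# Hypothetical features in which the two classes are swapped across domains.
source = {"chair": [-1.1, -0.9], "table": [0.9, 1.1]}
target = {"chair": [0.9, 1.1], "table": [-1.1, -0.9]}

print(domain_gap(source, target))  # global statistics look aligned
print(class_gap(source, target))   # per-class means remain far apart
```

In this toy case the domain-level gap is close to zero although every class is matched to the wrong cluster, which is the failure mode that class-level alignment is meant to prevent.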
To alleviate this issue, numerous works have emerged, among which one of the most effective and convincing approaches is to assign relatively reliable pseudo labels to the target samples and then perform class-level alignment on them. For example, Xie et al. (2018), Zhou et al. (2019a) and Zhou et al. (2020) all adopted a source classifier that was well trained on the source domain to label the target domain, and then performed their own semantic alignment. However, the source classifier is biased towards the characteristics of the source domain and therefore does not transfer well to the target domain. Thus, Zuo et al. (2021) proposed to train a target-specific classifier to assign pseudo labels to the target samples. Specifically, they trained the target-specific classifier with a cross-entropy loss on easy target samples, using the predictions of the source classifier as ground truth, and with a tailored GAN with two discriminators on tough target samples. Although this method takes the target samples into account, the accuracy of its pseudo labels still relies on the source classifier and on domain transferability. To remove the dependence on the source classifier and improve the reliability of the pseudo labels for target samples, this paper explores the semantic similarity between target samples at different classification levels through multiple clustering, and then combines this similarity information with the target features to classify the target samples more accurately.
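The pseudo-labeling step used by the earlier works discussed above can be sketched as follows. This is a minimal illustration, not the procedure of any cited paper: it assumes that "easy" and "tough" target samples are separated by the confidence of the source classifier, which the text does not specify, and the function name `split_by_confidence`, the `threshold` value and the logits are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def split_by_confidence(target_logits, threshold=0.9):
    """Assign pseudo labels and separate easy from tough target samples.

    `threshold` is a hypothetical cut-off, not a value from the cited works.
    Returns (easy, tough): easy holds (sample index, pseudo label) pairs,
    tough holds the indices of low-confidence samples.
    """
    easy, tough = [], []
    for index, logits in enumerate(target_logits):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            easy.append((index, label))
        else:
            tough.append(index)
    return easy, tough

# Hypothetical source-classifier outputs for three target samples.
target_logits = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.0, 0.0, 5.0]]
easy, tough = split_by_confidence(target_logits)
print(easy, tough)
```

Because both the labels and the confidence scores come from the source classifier, any bias it carries towards the source domain propagates directly into the pseudo labels, which is the limitation the clustering-based approach of this paper aims to avoid.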