A Novel Deep Learning-Based Visual Search Engine in Digital Marketing for Tourism E-Commerce Platforms

Yingli Wu, Qiuyan Liu
Copyright © 2024 | Pages: 27
DOI: 10.4018/JOEUC.340386

Abstract

Because of its convenience and efficiency, visual search technology is widely used in the product search functions of major tourism e-commerce platforms. This study introduces an innovative visual search engine model, CLIP-ItP, to explore the application potential of visual search in tourism e-commerce. The model extends the CLIP (contrastive language-image pre-training) framework and is developed in three stages. First, by training an image feature extractor and a linear model, the visual search engine labels images, establishing an experimental visual search engine. Second, CLIP-ItP jointly trains multiple text and image encoders, integrating multimodal data including product image labels, categories, names, and attributes. Finally, leveraging user-uploaded images and jointly selected product attributes, CLIP-ItP provides personalized top-k product recommendations.
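The final recommendation stage described above can be sketched as a similarity search in a shared embedding space. The sketch below is a minimal illustration, not the paper's implementation: the embedding dimension, the weighted fusion of image and attribute queries (`alpha`), and the random "catalogue" are all illustrative assumptions standing in for the learned CLIP-ItP encoders.

```python
import numpy as np

DIM = 4  # hypothetical embedding dimension; CLIP-ItP's encoders are learned


def l2_normalize(v):
    """Normalize a vector to unit length so dot products equal cosine similarity."""
    return v / np.linalg.norm(v)


def top_k_products(query_image_emb, attribute_emb, product_embs, k=3, alpha=0.5):
    """Rank products by a weighted blend of image and attribute similarity.

    alpha balances the user-uploaded image against the selected attributes;
    this linear blend is an illustrative assumption, not the paper's formula.
    """
    q = l2_normalize(alpha * query_image_emb + (1 - alpha) * attribute_emb)
    sims = product_embs @ q            # cosine similarities (rows are unit vectors)
    order = np.argsort(-sims)          # indices sorted by descending similarity
    return order[:k], sims[order[:k]]


# Toy catalogue of 5 products as unit vectors in the shared embedding space.
rng = np.random.default_rng(0)
catalogue = np.stack([l2_normalize(v) for v in rng.normal(size=(5, DIM))])
img_q = l2_normalize(rng.normal(size=DIM))    # user-uploaded image, embedded
attr_q = l2_normalize(rng.normal(size=DIM))   # selected attributes, embedded
idx, scores = top_k_products(img_q, attr_q, catalogue, k=3)
print(idx, scores)
```

In a real deployment the catalogue embeddings would be precomputed offline and the ranking served from an approximate nearest-neighbor index rather than a dense matrix product.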
Article Preview

Visual Search Technology in Tourism E-Commerce

In tourism e-commerce, visual search technology has emerged as a transformative tool, fundamentally changing how consumers explore and discover products online. The technology uses computer vision and machine learning algorithms to analyze and comprehend the visual content of images. By extracting key features, patterns, and characteristics from product images, visual search systems can accurately identify and match items, returning relevant and visually similar results. This capability is particularly valuable when users find it hard to articulate their search intent in text or face language barriers.
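The extract-then-match loop described above can be illustrated with a deliberately simple, classical feature. The sketch below uses per-channel color histograms and histogram intersection purely to show the matching logic; production systems (including the model in this article) use learned deep features, but the nearest-neighbor comparison is structurally the same.

```python
import numpy as np


def color_histogram(image, bins=8):
    """Hand-crafted stand-in for a learned feature extractor:
    a normalized per-channel intensity histogram of an RGB image."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()


def most_similar(query, gallery):
    """Return (index, score) of the gallery image whose histogram
    intersects most with the query's; identical images score 1.0."""
    qh = color_histogram(query)
    scores = [np.minimum(qh, color_histogram(g)).sum() for g in gallery]
    best = int(np.argmax(scores))
    return best, scores[best]


# Toy gallery of 4 random 16x16 RGB "product images".
rng = np.random.default_rng(1)
gallery = [rng.integers(0, 256, size=(16, 16, 3)) for _ in range(4)]
query = gallery[2].copy()  # a query identical to one gallery item
best, score = most_similar(query, gallery)
print(best, round(score, 3))
```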

However, the challenges associated with this task remain substantial. First, deep learning methods for visual search typically demand significant computational resources, which can be prohibitive for small and medium-sized tourism e-commerce platforms. Second, processing image data may raise user privacy concerns, so platforms must take effective measures to safeguard the privacy and security of user data. Overall, the application of visual search technology in tourism e-commerce continues to evolve, offering users and merchants a more intelligent and convenient interactive experience, and it has the potential to become a key driving force in the industry's future development.

Visual Retrieval Based on Multi-Modal Data

Visual retrieval based on multi-modal data enriches the information representation of the retrieval system by jointly considering modalities such as images, text, and audio, providing users with more comprehensive and accurate results. Common multi-modal visual retrieval methods learn correlations between modalities by training multiple modality-specific feature extractors simultaneously. Through information sharing, this approach captures inter-modality correlations more effectively and improves retrieval performance (Ye & Zhao, 2023; Ye et al., 2023; Yuan et al., 2022).

In practice, multi-modal visual retrieval integrates information from images, text, audio, and other sources, yielding a more comprehensive visual understanding that helps satisfy users' retrieval needs more accurately. Considering multiple modalities also aids in modeling the semantic relationships between them, improving the system's grasp of complex semantic information and thereby the relevance of its results. Finally, multi-modal learning improves generalization to new data, especially when samples are limited or some modalities are scarce, enabling better adaptation to diverse data distributions.
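The joint training of image and text encoders that underpins CLIP-style multi-modal retrieval can be sketched as a symmetric contrastive (InfoNCE) loss over a batch of matched image-text pairs. The sketch below is a numpy illustration of that objective only, under assumed toy embeddings; the temperature value and batch are illustrative, and no actual encoder is trained here.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(img_embs, txt_embs, temperature=0.07):
    """Symmetric InfoNCE loss used by CLIP-style joint training.

    Matched image/text pairs sit on the diagonal of the similarity matrix;
    the loss pulls them together and pushes mismatched pairs apart.
    """
    img = img_embs / np.linalg.norm(img_embs, axis=1, keepdims=True)
    txt = txt_embs / np.linalg.norm(txt_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) similarities
    diag = np.arange(len(img))
    loss_i = -log_softmax(logits, axis=1)[diag, diag].mean()  # image -> text
    loss_t = -log_softmax(logits, axis=0)[diag, diag].mean()  # text -> image
    return (loss_i + loss_t) / 2


rng = np.random.default_rng(2)
aligned = rng.normal(size=(4, 8))
# Perfectly aligned image/text embeddings yield a near-zero loss;
# unrelated embeddings yield a higher one.
print(clip_contrastive_loss(aligned, aligned))
print(clip_contrastive_loss(aligned, rng.normal(size=(4, 8))))
```

Minimizing this loss over many batches is what places product images and their textual labels, categories, names, and attributes into the shared space that the retrieval step then searches.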

However, deep learning and joint learning methods can involve high computational complexity, which is challenging in environments with limited computing resources. A further challenge is the laborious, time-consuming annotation of multi-modal data, particularly when the semantic relationships between modalities are intricate.
