Landmark Dataset Development and Recognition

Min Chen, Hao Wu
DOI: 10.4018/IJMDEM.2021100103

Abstract

Landmark recognition aims to detect popular natural and manmade structures within an image. One reason it remains challenging is the lack of large annotated datasets. Existing work mainly focuses on landmarks located in Europe and North America due to regional and language bias. In this study, the authors build a comprehensive Chinese landmark dataset to complement the current data and to benefit research on landmark recognition. The dataset is built by leveraging the vast amount of multimedia data on the web and applying image clustering and retrieval techniques in data preparation and analysis. The result is a Chinese landmark dataset with a total of 42,548 images for 987 unique landmarks. In addition, a landmark recognition model is developed based on advanced deep learning techniques and integrated into a mobile application that allows users to perform landmark prediction without the need for internet access or cellular data coverage.

Introduction

With the development of deep learning algorithms and the advance of computing hardware, image classification technology has improved remarkably in recent years. Many researchers are now working on fine-grained recognition problems, including landmark recognition, which aims to automatically discriminate between landmark categories with subtle visual differences. Currently, one of the biggest challenges in landmark recognition research is the lack of sufficient annotated data per landmark (Wu & Chen, 2020). For example, Google-Landmarks (Araujo & Weyand, 2018), the largest worldwide landmark dataset, released by Google in March 2018, has roughly 70 images per landmark on average. Over 7,000 of its 30,000 landmarks contain 10 or fewer images in their training set (Google-landmarks dataset, 2021). This number of images per landmark/category is significantly smaller than in most datasets used by traditional visual recognition research, such as ILSVRC2017 (ImageNet Large Scale Visual Recognition Challenge) (ILSVRC, 2017), which provides an average of 1,350 images per category as the training and evaluation data for the classification and localization tasks. In addition, Google-Landmarks only provides a unique ID for each landmark without other much-needed annotations, and like most other existing datasets (Zheng et al., 2009), it mainly focuses on landmarks located in Europe and North America due to regional and language bias. Because of the lack of an appropriate dataset and the inherent complexity of the landmark recognition task, existing studies fail to offer an effective solution for recognizing landmarks from pictures. For example, cloud-based landmark recognition APIs, including the Microsoft API and the Google API, yield relatively low accuracy of less than 36%.
Tour the World (Zheng et al., 2009) and Scene maps (Avrithis, Kalantidis, Tolias, & Spyrou, 2010) require a local database to store landmark images, so their landmark recognition is in fact based on image retrieval, which is resource intensive in terms of storage and computational time.

Therefore, in this study, we build a more comprehensive and annotated Chinese landmark dataset to complement the current data and to benefit research on landmark recognition problems. To build such a dataset, the study leverages the vast amount of multimedia data on the web, using image clustering and retrieval techniques in data preparation and analysis. A landmark recognition model is then built on this dataset by adopting and extending existing research concepts on deep learning and transfer learning. Previous research has demonstrated that transfer learning is effective on mid-scale datasets with 200 or fewer categories (Oquab, Bottou, Laptev, & Sivic, 2014), such as Pascal VOC (Everingham, Eslami, Van Gool, et al., 2015) with 20 categories, Oxford Flowers with 102 categories, and Caltech-UCSD Birds (Welinder et al., 2010) with 200 categories. This project extends the technique to develop landmark recognition models with our own dataset, which consists of 987 unique landmarks. Furthermore, the recognition model is integrated into an iOS application so users can perform landmark recognition on images taken by the phone camera or chosen from the device gallery without an Internet connection or cellular data coverage.
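
To make the transfer learning idea concrete, the following is a minimal sketch of one common variant: a pretrained feature extractor is kept frozen and only a new classification head is trained on the target categories. It is not the authors' model. The fixed BACKBONE matrix is a hypothetical stand-in for a pretrained deep network, the 2-D points are made-up placeholders for images, and the three toy classes stand in for the 987 landmarks; all function names are illustrative assumptions.

```python
import math

# Hypothetical stand-in for a pretrained backbone: a fixed linear map plus ReLU.
# In practice this would be a deep network pretrained on a large source dataset.
BACKBONE = (
    (1.0, 0.0),
    (-1.0, 0.0),
    (0.0, 1.0),
    (0.0, -1.0),
)


def extract_features(x):
    """Run the frozen backbone; its weights are read but never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_logits(features, head, bias):
    """Linear classification head applied to backbone features."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(head, bias)]


def train_head(inputs, labels, num_classes, epochs=100, lr=0.05):
    """Fit a new softmax head on frozen features with plain SGD."""
    feats = [extract_features(x) for x in inputs]
    dim = len(feats[0])
    head = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            probs = softmax(head_logits(f, head, bias))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is p_c - 1[c == y].
                grad = probs[c] - (1.0 if c == y else 0.0)
                bias[c] -= lr * grad
                for j in range(dim):
                    head[c][j] -= lr * grad * f[j]
    return head, bias


def predict(x, head, bias):
    """Return the index of the highest-scoring class for input x."""
    logits = head_logits(extract_features(x), head, bias)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy data: three made-up classes in 2-D, standing in for landmark images.
inputs = [(2.0, 0.1), (1.8, -0.2), (2.2, 0.3),
          (-2.0, 0.1), (-1.9, -0.3), (-2.1, 0.2),
          (0.1, 2.0), (-0.2, 1.8), (0.3, 2.2)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

head, bias = train_head(inputs, labels, num_classes=3)
predictions = [predict(x, head, bias) for x in inputs]
```

Only the head and bias are updated during training, so the number of trainable parameters grows with the number of target categories rather than with the size of the backbone. Whether the backbone stays fully frozen or is partly fine-tuned is a design choice that this sketch does not take from the article.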
