A Novel Convolutional Neural Network Based Localization System for Monocular Images

Chen Sun (Tsinghua University, Beijing, China), Chunping Li (Tsinghua University, Beijing, China) and Yan Zhu (Southwest Jiaotong University, Chengdu, China)
DOI: 10.4018/IJSSCI.2019040103


The authors present a robust and extendable localization system for monocular images. To achieve both robustness to noise factors and extendibility to unfamiliar scenes, the system combines a traditional content-based image retrieval structure with a CNN feature extraction model. The core of the system is a deep CNN feature extraction model that maps an image to a d-dimensional space in which images taken close together in the real world have smaller Euclidean distances between their features. The model is a deep ConvNet modified from GoogLeNet. The article proposes a special way to train the feature extraction model using localization results from the Cambridge Landmarks dataset. Experiments show that the system is robust to noise factors, supported by its high-level CNN features. Furthermore, the authors show that the system extends well to unfamiliar scenes, supported by the feature extraction model's generic properties and structure.

1. Introduction

Localization is crucial to daily life and to many applications such as navigation, robotics, and augmented reality. Although the global positioning system (GPS) solves the problem in most situations, there are still cases that it cannot handle well. Many image-based localization methods have been proposed to deal with these cases. This paper proposes a novel localization system, named Dis-Retrieval, that estimates position from a monocular RGB image.

Our proposed system takes a monocular RGB image as input and outputs the position where the image was taken. The core of the system is a deep convolutional neural network (CNN) model that maps an image to a d-dimensional space in which image pairs that are closer together in the real world have feature pairs with smaller Euclidean distances. As Figure 1 illustrates, the structure of our system is similar to that of a content-based image retrieval system (Zhang, Zhao, and Han, 2009). When a query image arrives, the system first uses the CNN feature extraction model to extract a feature vector from it, then matches this vector against the features of labeled images to find the k nearest images, and finally uses the position information of those k images to estimate the query image’s position. As with content-based image retrieval systems, hashing techniques (Gionis, Indyk, and Motwani, 2000) can speed up the search so that the system operates in real time.
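The retrieval and estimation steps described above can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the CNN feature extractor is left out entirely (features are given as plain lists of numbers), the search is a brute-force scan rather than a hash-based lookup, and the inverse-distance weighted mean in `estimate_position` is an assumed aggregation rule, since this preview does not say how the k positions are combined. The names `euclidean`, `k_nearest`, and `estimate_position` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Feature = Sequence[float]          # a d-dimensional feature vector
Position = Tuple[float, float]     # assumed 2-D position (x, y)


def euclidean(a: Feature, b: Feature) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(
    query: Feature,
    database: List[Tuple[Feature, Position]],
    k: int,
) -> List[Tuple[float, Position]]:
    """Return (distance, position) for the k labeled entries closest to the query.

    Brute-force scan; a hashing scheme would replace this step at scale.
    """
    scored = [(euclidean(query, feat), pos) for feat, pos in database]
    scored.sort(key=lambda item: item[0])
    return scored[:k]


def estimate_position(
    neighbors: List[Tuple[float, Position]],
    eps: float = 1e-9,
) -> Position:
    """Assumed rule: inverse-distance weighted mean of the neighbor positions."""
    weights = [1.0 / (dist + eps) for dist, _ in neighbors]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, neighbors)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, neighbors)) / total
    return (x, y)


# Toy labeled database: (feature vector, position where the image was taken).
labeled = [
    ([0.0, 0.0], (0.0, 0.0)),
    ([1.0, 0.0], (10.0, 0.0)),
    ([5.0, 5.0], (100.0, 100.0)),
]

# The query feature would normally come from the CNN feature extraction model.
query_feature = [0.5, 0.0]
estimate = estimate_position(k_nearest(query_feature, labeled, k=2))
```

In this sketch, adding a new place only means appending more `(feature, position)` pairs to `labeled`; the feature extractor itself is untouched.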

Figure 1. Structure of the proposed localization system for monocular images


Before introducing our main contribution, we briefly discuss the motivation for this paper. Currently, there are two main types of methods for image-based localization. Methods of the first type are traditional methods that estimate position by matching images using traditional image features such as SIFT (Lowe, 2004). These methods are easily affected by noise factors in images, such as lighting, camera angle, pedestrians, and cars. Methods of the second type are CNN-feature-based methods, which use a CNN to solve the problem. They can handle the influence of lighting, camera angle, pedestrians, and cars through data-driven learning, but they are less extendable: one trained model can process only one scene or place. For example, suppose we train a model using labeled images of University A. If we want a localization model for University B, we have to train another model using labeled images of University B; even if we only want to add some labeled images of University A, we still have to fine-tune the old model. One model might be forced to handle two universities, but what if there are 1000? A model’s parameters are limited, so it is hard to extend methods of this type to many places. Our proposed system overcomes both difficulties by combining a traditional content-based image retrieval structure with a CNN feature extraction model.
