High Performance Online Image Search with GPUs on Large Image Databases

High Performance Online Image Search with GPUs on Large Image Databases

Ali Cevahir (Rakuten Institute of Technology, Rakuten Inc., Tokyo, Japan) and Junji Torii (Rakuten Institute of Technology, Rakuten Inc., Tokyo, Japan)
DOI: 10.4018/jmdem.2013070102
OnDemand PDF Download:


The authors propose an online image search engine based on local image keypoint matching with GPU support. State-of-the-art models are based on bag-of-visual-words, which is an analogy of textual search for visual search. In this work, thanks to the vector computation power of the GPU, the authors utilize real values of keypoint descriptors and realize real-time search at keypoint level. By keeping the identities of each keypoint, closest keypoints are accurately retrieved. Image search has different characteristics than textual search. The authors implement one-to-one keypoint matching, which is more natural for images. The authors utilize GPUs for every basic step. To demonstrate practicality of GPU-extended image search, the authors also present a simple bag-of-visual-words search technique with full-text search engines. The authors explain how to implement one-to-one keypoint matching with text search engine. Proposed methods lead to drastic performance and precision improvement, which is demonstrated on datasets of different sizes.
Article Preview


In this study, we introduce an online image search engine for large image databases. For a query image, search engine retrieves similar images, such as those taken in different angles, scales, lighting conditions, etc. For example, to search for an item in an e-commerce site, the user takes a picture of the item to be searched and uses the picture as a search query. The search engine is expected to return a list of images, in which images containing the queried object is listed on top. There may be billions of images in the database and the search results should return in real-time.

Visually similar image search is a very vivid research area for the last decade. Recent techniques utilize scale and rotation invariant local feature points of images. Feature points or keypoints of an image are interesting points on the image, which can be extracted using various detectors (Mikolajczyk & Schmid, 2004). Extracted keypoints are then represented by high-dimensional feature vectors or descriptors (Mikolajczyk & Schmid, 2005). For example, 128-element SIFT feature vector representation is a popular method (Lowe, 1999). An image is described by hundreds to thousands of such feature vectors.

Once images are stored as a set of high-dimensional feature vectors, visually similar image search can be realized by matching the closest keypoints in the database with the keypoints of the query image. However, as the number of images increases, it becomes impractical to match keypoints by exclusive search over high-dimensional feature vectors. For this reason, state-of-the-art techniques are focused on matching visual words, instead of matching individual keypoints. Visual word dictionary is calculated by clustering keypoints. Each cluster is represented by a visual word. Each keypoint of images are mapped to a visual word, by finding the cluster that the keypoint falls in. Hence, each image is now represented as a set of visual words. This type of representation of images is called bag-of-visual-words.

Bag-of-visual-words (BoV) representation is similar to bag-of-words for text documents. Therefore retrieval techniques applied for text search can be utilized in BoV. However, as the feature vectors are rounded off, matching quality is decreased. To improve matching quality and accuracy, researchers propose different retrieval methods and scoring algorithms on visual word level. We are going to explain some of them in Related Work section.

In this work, we extend BoV search. We propose a two-step matching technique for keypoints. In the first step, we match visual words. In the second step, we match keypoints from the cluster that is represented by the visual word matched in the first step. By doing so, we retrieve closer points with better accuracy and reduce the errors of matching keypoints at cluster borders. Thanks to the superior performance of GPUs in vector computing, online feature vector matching can be realized in real-time. The first step of keypoint matching runs 104 times faster and the second step runs 20.5 times faster than the CPU on GPUs.

GPU accelerators contain large number of processors. They are originally designed for accelerating graphics processing, but recent GPUs can be programmed for running general purpose high-performance computations. For example, NVIDIA’s CUDA GPUs, which are also the name of their C++ library for programming the hardware, recently gained great popularity in many areas of computational sciences (NVIDIA Corporation, 2012).

We use GPUs in every step for the image search: to extract keypoints, matching visual words and matching keypoints. Not only in the online processing, but we also use GPUs for offline preprocessing, i.e., generating visual words by clustering keypoints. GPUs are usually much faster than CPUs for clustering algorithms used to generate visual words. In our tests, up to 265 times speedup is gained with GPUs. Using GPUs, more time can be devoted to generate better clusters. Employing GPUs, it is possible to use larger training data for clustering, such as the actual image keypoints to be searched. It is also possible to update visual words in shorter cycles, as the images in the database change and clusters are distorted.

Complete Article List

Search this Journal:
Open Access Articles: Forthcoming
Volume 8: 4 Issues (2017)
Volume 7: 4 Issues (2016)
Volume 6: 4 Issues (2015)
Volume 5: 4 Issues (2014)
Volume 4: 4 Issues (2013)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing