Text-Based Image Retrieval Using Deep Learning

Udit Singhania (Vellore Institute of Technology, India) and B. K. Tripathy (Vellore Institute of Technology, India)
Copyright: © 2021 | Pages: 11
DOI: 10.4018/978-1-7998-3479-3.ch007


This chapter is an advanced version of the earlier encyclopedia chapter “An Insight to Deep Learning Architectures”. It focuses on developments in information retrieval after 2014, since earlier work was covered in the previous version. Deep learning plays an important role today, and this chapter draws on deep learning architectures that have evolved over time and have proved efficient for image search and retrieval. It describes various natural language processing techniques for handling the text query. Recurrent neural nets, deep restricted Boltzmann machines, and generative adversarial nets are discussed, with attention to how they have changed the field of information retrieval.
Chapter Preview


Recurrent neural nets (RNNs) belong to the class of artificial neural networks (ANNs) designed specifically to recognize patterns in sequences of data, such as text, handwriting, speech, or numerical time-series data gathered from sensors, stock markets, and government agencies. These algorithms focus on time and sequence, that is, on the temporal dimension. They are considered one of the most powerful and useful types of neural network, alongside the attention mechanism and memory networks. RNNs are applicable even to images, which can be decomposed into a series of patches and treated as a sequence. Traditional neural nets have no sense of persistence: they do not remember the order in which inputs were fed, so earlier inputs cannot inform how later ones are processed. An RNN, by contrast, takes two inputs at each step: the activation computed for the previous input and the present input. It applies an activation function to their combination to obtain a new activation, from which it predicts the output. This is illustrated in Figure 1.
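
The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the chapter's implementation: the function name `rnn_forward`, the weight names `U` and `W`, the tanh activation, and the layer sizes are all assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(inputs, U, W):
    """Run a simple recurrent layer over a sequence and return every hidden state.

    inputs: array of shape (T, input_dim), one row per time step.
    U, W: hypothetical names for the input weights and the recurrent weights.
    """
    s = np.zeros(W.shape[0])  # initial activation: nothing has been seen yet
    states = []
    for x_t in inputs:
        # Two inputs per step: the present input x_t and the previous activation s.
        s = np.tanh(U @ x_t + W @ s)
        states.append(s)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 4, 3, 5  # illustrative sizes
U = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
sequence = rng.normal(size=(T, input_dim))

states = rnn_forward(sequence, U, W)
reversed_states = rnn_forward(sequence[::-1], U, W)
print(states.shape)
```

Because each state depends on the one before it, feeding the same vectors in reverse order generally yields a different final state. That sensitivity to order is what a plain feed-forward network lacks.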

Figure 1.



Here, the RNN is unrolled (or unfolded) into a full network. The formulas that govern the computation in an RNN are as follows:
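
The formulas themselves do not appear in this preview. As an assumption, a commonly used formulation of a simple RNN is given here; the symbols U, V, W, s_t, x_t, and o_t are the conventional ones and may differ from the chapter's own notation:

```latex
\begin{aligned}
s_t &= f\left(U x_t + W s_{t-1}\right) \\
o_t &= \mathrm{softmax}\left(V s_t\right)
\end{aligned}
```

In this formulation x_t is the input at step t, s_t is the hidden state (the network's memory), o_t is the output, and f is a nonlinearity such as tanh or ReLU. The same parameters U, V, and W are shared across all time steps.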

Key Terms in this Chapter

Recurrent Neural Networks: A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior.

Deep Boltzmann Machine: A deep Boltzmann machine (DBM) is a type of binary pairwise Markov random field (undirected probabilistic graphical model) with multiple layers of hidden random variables.

Generative Adversarial Network: A generative adversarial network (GAN) is an unsupervised deep learning technique. It has a generator and a discriminator. The generator produces new data, and the discriminator distinguishes generated inputs from existing (real) inputs, so that the generator's output can be corrected.

Image Retrieval: An image retrieval system is a computer system for browsing, searching, and retrieving images from a large database of digital images. The most commonly used methods add metadata such as captions and keywords to the images.

Convolutional Neural Networks: A multi-layer neural network that is similar to other artificial neural networks but differs in its architecture, built mainly to recognize visual patterns from image pixels.

Long Short-Term Memory: Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. An LSTM cell remembers values over arbitrary time intervals, and its three gates (input, output, and forget) regulate the flow of information into and out of the cell.
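
As a rough sketch of how the three gates act on the cell, the following NumPy code performs a single LSTM step in its standard textbook form. It is not taken from the chapter; the function name `lstm_step`, the stacked weight matrix `W`, the bias `b`, and the sizes used are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape (4 * hidden, input_dim + hidden) and b has shape (4 * hidden,).
    The four row blocks hold the forget gate, input gate, output gate,
    and candidate values, in that order (an assumed layout).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[:hidden])                 # forget gate: how much old cell state to keep
    i = sigmoid(z[hidden:2 * hidden])       # input gate: how much new information to write
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: how much of the cell to expose
    g = np.tanh(z[3 * hidden:])             # candidate values for the cell
    c_t = f * c_prev + i * g                # updated cell state
    h_t = o * np.tanh(c_t)                  # updated hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden = 4, 3  # illustrative sizes
W = rng.normal(scale=0.5, size=(4 * hidden, input_dim + hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(6, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

The cell state is updated additively (`f * c_prev + i * g`), so when the forget gate is close to 1 and the input gate is close to 0 the stored values pass through a step almost unchanged. This is how the cell can retain information over long intervals.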
