InfoScipedia
A Free Service of IGI Global Publishing House

What is Image Captioning?

Deep Learning Research Applications for Natural Language Processing
Image captioning is the process of generating a meaningful, coherent sequence of words that best describes an input image.
Published in Chapter:
Automatic Image Captioning Using Different Variants of the Long Short-Term Memory (LSTM) Deep Learning Model
Ritwik Kundu (Vellore Institute of Technology, Vellore, India), Shaurya Singh (Vellore Institute of Technology, Vellore, India), Geraldine Amali (Vellore Institute of Technology, Vellore, India), Mathew Mithra Noel (Vellore Institute of Technology, Vellore, India), and Umadevi K. S. (Vellore Institute of Technology, Vellore, India)
DOI: 10.4018/978-1-6684-6001-6.ch008
Abstract
Today's world is full of digital images; however, the context is unavailable most of the time. Thus, image captioning is essential for conveying the content of an image. Besides generating accurate captions, an image captioning model must also be scalable. In this chapter, two variants of long short-term memory (LSTM), namely stacked LSTM and bidirectional LSTM (BiLSTM), are used together with convolutional neural networks (CNNs) to implement an encoder-decoder model for generating captions. The bilingual evaluation understudy (BLEU) score is used to evaluate the performance of these two bi-layered models. The study found that the two models performed on par with each other. Some predictions received low BLEU scores, suggesting that the predicted caption was dissimilar to the actual caption, whereas some very high BLEU scores suggested that the model could predict captions close to human-written ones. Furthermore, the bidirectional LSTM model was found to be more computationally intensive and to require more training time than the stacked LSTM model, owing to its more complex architecture.
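To make the evaluation step concrete, the following is a minimal sketch of a sentence-level BLEU score in plain Python. It is not the chapter's evaluation code: the function names (`ngrams`, `modified_precision`, `bleu`) and the example captions are hypothetical, and the sketch assumes whitespace-tokenized captions, uniform weights over 1- to 4-grams, and no smoothing (so any missing n-gram order gives a score of 0).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # For each n-gram, the highest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (illustrative)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing: one empty n-gram order zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter one).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical captions, for illustration only.
reference = "a dog runs on the grass".split()
close_caption = "a dog runs on the beach".split()
unrelated_caption = "two people ride bicycles downtown".split()

print(bleu(reference, [reference]))
print(bleu(close_caption, [reference]))
print(bleu(unrelated_caption, [reference]))
```

A caption identical to the reference scores 1, a caption that shares most of its n-grams scores somewhere in between, and a caption with no overlap scores 0, which mirrors the spread of low and high BLEU scores described in the abstract. In practice, a library implementation with smoothing is usually preferable for short captions.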