ISCG: An Intelligent Sensing and Caption Generation System for Object Detection and Captioning Using Deep Learning

Aahan Singh (Ramaiah Institute of Technology, India), Nithin Nagaraj (Consciousness Studies Programme, National Institute of Advanced Studies, India), Srinidhi Hiriyannaiah (Ramaiah Institute of Technology, India), and Lalit Mohan Patnaik (Consciousness Studies Programme, National Institute of Advanced Studies, Bangalore, India)
Copyright: © 2020 | Pages: 17
DOI: 10.4018/IJIIT.2020100104

Abstract

Artificial intelligence has paved the way for advances in several areas of computing, such as speech recognition, object detection, machine translation, and others. One of the goals of artificial general intelligence is to simulate human thinking and rationality within machines so that they can perceive their environment and then perform reasonable actions based on their perception. Creating a single model that performs every task from visual perception to actuation is currently impossible. The system must therefore be divided into several models, each of which functions independently while also contributing to the operation of the whole intelligent machine. In this paper, an intelligent sensing and caption generation (ISCG) system is proposed which is capable of detecting living/non-living objects and states of motion in images. The system consists of two separate modules, a caption generator and an intelligence engine, with a convolutional neural network (CNN) for determining the different objects in the images. Our model yields state-of-the-art performance on a benchmark dataset.

1. Introduction

Artificial Neural Networks (ANNs) have become popular in recent years and are used for various applications such as classification, clustering, pattern recognition, and prediction. They are relatively more competitive and yield better results than conventional machine learning (ML) techniques (V. S. Dave et al., 2014). ANNs are very useful for developing applications such as image recognition, natural language processing, speech recognition, and machine translation (N. Izeboudjen et al., 2014). The important advantages of ANNs are self-learning, fault tolerance, and the ability to capture non-linearity in input-output mappings (D. Wang et al., 2018). They also ease the modeling of complex natural systems with large inputs (Mahanta, J., 2017). The motivation behind ANNs is that they can be compared to the human brain performing a given task of interest. For example, the human brain is capable of remembering objects and recognizing the semantics behind them (S. Haykin, 2009). The same idea can be extended to ANNs in the development of object detection applications.

Object detection applications help in identifying objects using object models that are known a priori. Labeling objects is one of the important challenges in object detection. For a given image, there are different objects of interest, and labeling each object requires an intelligence mechanism: a set of correct labels needs to be assigned to the objects in the given image. The term detection can be used for functions such as identification, categorization, and discrimination. Recent studies on object detection (A. Babenko et al., 2014; J. Wan et al., 2014; Zou, X. et al., 2019) identify the different labels for objects in an image. However, other essential elements, such as motion and whether objects are living or non-living, are not identified and tagged. The motivation of this paper lies in identifying such essential elements in a given image. Tagging living/non-living objects (and the presence/absence of motion) in the input image could be of great value in security-related applications for threat identification.

The Intelligent Sensing and Caption Generation (ISCG) system proposed in this work detects life and motion in addition to detecting objects within the image. Multiple metrics are available that measure the quality of generated sentences; here, the authors propose a new metric that measures the intelligence level of the captioning model. The intelligence of the model is measured from the set of words used in the generated caption. The proposed method looks for verbs that describe whether an action is being performed in the image. It also looks for words that describe whether an object is inanimate, and thereby whether the object is living or not. Each satisfied criterion results in a score being assigned to the model, and combining the scores for all criteria gives the intelligence level of the system.
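
To make the scoring procedure concrete, the following is a minimal, illustrative sketch of the idea only. The living-entity lexicon, the equal 0.5 weights, and the use of NLTK's part-of-speech tagger (a VBG tag as a proxy for an action being performed) are all assumptions for illustration, not the paper's actual lexicons, weights, or tooling.

```python
import nltk

# Tokenizer and tagger models (resource names may vary by NLTK version).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# Hypothetical lexicon of nouns denoting living entities; the paper's
# actual word lists are not given here.
LIVING_NOUNS = {"man", "woman", "boy", "girl", "dog", "cat", "bird", "horse"}

def intelligence_score(caption: str) -> float:
    """Score a caption on two criteria: does it describe an action
    (approximated by a present-participle verb, POS tag VBG), and does
    it name a living entity? Each satisfied criterion adds 0.5."""
    tagged = nltk.pos_tag(nltk.word_tokenize(caption.lower()))
    has_motion = any(tag == "VBG" for _, tag in tagged)
    has_living = any(tag.startswith("NN") and word in LIVING_NOUNS
                     for word, tag in tagged)
    return 0.5 * has_motion + 0.5 * has_living

print(intelligence_score("a dog running through the grass"))  # 1.0
print(intelligence_score("a red car parked on the street"))   # 0.0
```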

The main contributions of the paper are listed as follows:

  • An ISCG model based on a CNN and an LSTM for recognition of different entities in an image (a generic sketch of this encoder-decoder pattern follows this list).

  • An intelligence score is assigned to each image based on the entities discovered in it.

  • The model has been compared with state-of-the-art methods on the benchmark Flickr8k dataset and generates better captions than the other methods.
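
As a point of reference for the first contribution, the sketch below shows the generic "merge" encoder-decoder pattern that CNN+LSTM caption generators commonly follow: pre-extracted CNN image features and an LSTM encoding of the partial caption are combined to predict the next word. The layer sizes, vocabulary size, maximum caption length, and the choice of Keras are illustrative assumptions, not the paper's reported configuration.

```python
from tensorflow.keras.layers import (Input, Dense, Embedding, LSTM,
                                     Dropout, add)
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000    # assumed vocabulary size
MAX_LEN = 34         # assumed maximum caption length
FEATURE_DIM = 2048   # e.g. pooled features from a pretrained CNN

# Image branch: pre-extracted CNN features projected to the LSTM width.
img_in = Input(shape=(FEATURE_DIM,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Text branch: an LSTM encoding of the partial caption generated so far.
txt_in = Input(shape=(MAX_LEN,))
txt_emb = Embedding(VOCAB_SIZE, 256, mask_zero=True)(txt_in)
txt_vec = LSTM(256)(Dropout(0.5)(txt_emb))

# Merge both modalities and predict the next word of the caption.
merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
next_word = Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[img_in, txt_in], outputs=next_word)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```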

The rest of the paper is organized as follows. Section 2 discusses related work on object detection. Section 3 presents the proposed ISCG system, followed by a discussion of how the scores for intelligent object detection are estimated; finally, the results are discussed in the last section.
