Article Preview
TopIntroduction
From the past few decades, tremendous growth can be seen in the computer vision community. Researchers have provided optimal solutions in different vision based tasks like image classification, object detection, object labelling, saliency estimation, image compression and many more (Lu, 2007; Verschae, 2015; Messer, 2017). Almost all the vision-based applications include a basic step of segmenting an image into meaningful regions, which is basically the process of linking each pixel in an image with a class label. Although many optimal solutions have been provided till date for segmenting an image (Guo, 2018; Zaitoun, 2015), due to unpredictable real world situations and dependency of a majority of vision applications on this step, segmentation of image is still an open research problem for the computer vision community.
In this study, our focus is on performing the semantic segmentation for scene understanding. Semantic segmentation is a process of assigning a meaningful label to each pixel based on the context of the environment (Lateef, 2019). It is a very useful step for a variety of computer vision applications where it is important to understand the context of the operating environment in which the systems are operating, for example in robotics (Kim, 2018), self-driving cars (Kaymak, 2019) etc. Further, scene understanding is a computer vision application which includes analysis and perception of an image of the scene to create an overview of the event depicted in the scene (Xiao, 2013). A scene shows a real-world situation which is extracted from the environment. It includes multiple objects which are interacting with each other, thereby having some meaning. A scene can represent a variety of real-world events ranging from personal events to public events. Scene understanding is the process of interpreting scenes, which are captured through devices like cameras, microphones, contact sensors etc. to get an in depth understanding of it (Aarthi, 2017). The data of a scene can be expressed using various features like color, texture, and light intensity, thus, the process of creating a good understanding of a scene includes proper extraction of features from an image of a scene that characterizes it efficiently. It is based on the idea of vision and cognition, in which the functionality of detection, localization, recognition and understanding is performed first, followed by cognition, which is used to add functionalities like learning, adaption, finding alternatives, interpretation and analysis. The models that perform scene understanding include the capability to analyze events and modify accordingly. It can adapt to unforeseen data and perform robustly in such situations (Li, 2009). Scene understanding is included in machines to make them capable of interpreting events in a similar manner as humans do.