Study and Analysis of Visual Saliency Applications Using Graph Neural Networks


Gayathri Dhara, Ravi Kant Kumar
Copyright: © 2023 | Pages: 24
DOI: 10.4018/978-1-6684-6903-3.ch008

Abstract

GNNs (graph neural networks) are deep learning algorithms that operate on graphs. A graph's unique ability to capture structural relationships among data offers more insight than analyzing data points in isolation. GNNs have numerous applications in different areas, including computer vision. In this chapter, the authors investigate the application of graph neural networks (GNNs) to common computer vision problems, specifically visual saliency, salient object detection, and co-saliency. A thorough overview of the visual saliency problems that have been solved using graph neural networks is provided, and the different research approaches that use GNNs to find saliency and co-saliency between objects are analyzed.

Introduction

Overview of Visual Attention

The human brain is extremely efficient at assembling information about the environment in real time. We constantly collect information about our surroundings through our five senses, but the deeper layers of the brain do not process all the inbound sensory information. Because most arriving sensory information is filtered away, we perceive information with varying levels of attention and involvement; based on external visual stimuli, humans can quickly identify the most interesting points in a scene. Even a highly sophisticated biological brain would find it challenging to positively identify every interesting target in its visual field. The solution used by humans is to break the visual field into smaller regions, each of which is easier to analyze and can be processed separately. This serialization of visual scene analysis is facilitated by visual attention mechanisms. A pixel, object, or person with high visual saliency captures our attention relative to its neighbors; identifying the most salient pixels or regions in an image is, likewise, a critical task in computer vision.

“Visual attention” is a cognitive process that selects relevant information from cluttered visual scenes and filters out the irrelevant. It has two sources: fast, bottom-up, pre-attentive saliency of the retinal input, and slower, top-down processing driven by memory, volition, and the task at hand.

Visual Salience

Visual salience (or visual saliency) is the distinct, subjective perceptual quality that makes some items in the world stand out from their neighbors and immediately grab our attention; it measures how likely human eyes are to fixate on a given area. Humans can determine salient objects (attention centers) more accurately and quickly than any machine. Salient object detection (SOD) is how machines approach this problem.

What Does Salient Object Detection (SOD) Mean?

“A technique used to analyze image surroundings and to extract the impressive parts from the background is termed as saliency detection.” Salient object detection, inspired by the human visual attention mechanism, is how machines tackle the task that visual attention solves for humans. Its significance in computer vision applications stems from its ability to minimize computing complexity (Ahmed et al., 2022).
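As a concrete illustration of how SOD can minimize downstream computation, the saliency map produced by a detector can be thresholded so that later stages process only the salient crop. The sketch below is illustrative, not the chapter's method; the function name and threshold are assumptions.

```python
import numpy as np

def salient_bbox(saliency_map, thresh=0.5):
    """Binarize a saliency map and return the bounding box
    (y_min, x_min, y_max, x_max) of the salient region, so later
    stages can process only that crop instead of the full image."""
    mask = saliency_map >= thresh          # keep only salient pixels
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None                        # nothing salient at this threshold
    return ys.min(), xs.min(), ys.max(), xs.max()
```

A recognizer or segmenter run on the returned crop touches far fewer pixels than one run on the whole frame, which is the complexity saving the paragraph refers to.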

What Does Co-Saliency Detection (Co-SOD) Mean?

Co-salient object detection (Co-SOD) is a recently developed and flourishing branch of SOD. Instead of computing the saliency of a single image, Co-SOD algorithms detect the salient objects common to multiple input images: detecting co-saliency between associated images means finding the salient regions they share. Traditional salient object detection requires only one input image, whereas co-salient detection techniques require a group of images (Zhang et al., 2018a). The main challenge in co-saliency detection is to exploit intra-image and inter-image saliency cues simultaneously, whereas traditional saliency detection considers only intra-image saliency.
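The intra-/inter-image distinction can be sketched in code. The toy function below is an illustrative assumption, not any published Co-SOD algorithm: it weights each image's intra-image saliency map by how similar its pixel features are to the salient content of the other images in the group, so a region scores highly only if it is both salient in its own image and shared across the group.

```python
import numpy as np

def co_saliency(intra_maps, features):
    """Toy co-saliency: fuse each image's intra-image saliency map with
    an inter-image cue derived from the other images in the group.

    intra_maps: list of (H, W) saliency maps with values in [0, 1]
    features:   list of (H, W, D) per-pixel feature maps
    """
    co_maps = []
    for i, (smap, feat) in enumerate(zip(intra_maps, features)):
        # Inter-image cue: mean feature of salient pixels in the OTHER images.
        proto = np.mean([f[intra_maps[j] > 0.5].mean(axis=0)
                         for j, f in enumerate(features) if j != i], axis=0)
        # Cosine similarity of every pixel feature to that shared prototype.
        sim = feat @ proto
        sim = sim / (np.linalg.norm(feat, axis=-1) * np.linalg.norm(proto) + 1e-8)
        inter = (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)
        co_maps.append(smap * inter)  # co-salient = salient AND shared
    return co_maps
```

Real Co-SOD methods learn both cues jointly (for instance with a GNN over region nodes drawn from all images), but the fusion of an intra-image term with an inter-image consistency term follows the same pattern.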

Key Terms in this Chapter

Visual Saliency: The degree to which a specific location or region in an image or video stands out and attracts attention.

Node/Vertex: A representation of an object in a graph.

Co-Saliency Detection: A process of detecting and segmenting common salient objects or regions in multiple images.

Top-Down Saliency: Saliency that is driven by high-level factors such as task demands and prior knowledge.

Attention: A mechanism that allows a neural network to focus on specific parts of an input.

Graph Neural Network (GNN): A type of neural network designed to work with graph data structures, where the nodes and edges in a graph are used as input and output.

Graph Convolutional Network (GCN): A type of GNN that uses convolutional layers to learn features of the nodes in a graph.

Co-Saliency Dataset: A collection of multiple images that share common salient objects or regions, used for training and evaluation of co-saliency models.

Edge: A representation of the relationship between two nodes in a graph.

Co-Saliency Pooling: A technique for aggregating information from multiple images to generate a co-saliency map.

Saliency: A property of visual stimuli that makes them stand out from their surroundings and attract attention.

Co-Saliency: A property of multiple images that makes them share common salient objects or regions.

Co-Saliency Integration: A technique for integrating co-saliency information with other computer vision tasks, such as object recognition and segmentation.

Graph: A data structure that represents objects (nodes) and their relationships (edges).

Message Passing: A process by which information is passed between nodes in a graph.

Saliency Map: A map that represents the degree of saliency of each location or region in an image or video.

Visual Attention: A type of attention that focuses on specific parts of an image or video.

Co-Saliency Map: A map that represents the degree of co-saliency of each location or region in multiple images.

Bottom-Up Saliency: Saliency that is driven by low-level features of the input, such as color, brightness, and orientation.
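Several of the terms above (graph, node, edge, message passing, GCN) can be tied together in a minimal sketch of one graph-convolution layer. This is a generic mean-aggregation layer written in NumPy as an illustrative assumption, not a specific architecture from the chapter; in a saliency setting, the nodes would be image regions, the node features their appearance descriptors, and the edges spatial or visual similarity between regions.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: each node averages messages from its
    neighbours (plus itself), then applies a shared linear map and ReLU.

    adj:    (N, N) adjacency matrix; adj[i, j] = 1 if an edge joins i and j
    feats:  (N, D_in) node feature matrix
    weight: (D_out from D_in) learnable weights, random in practice
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)    # node degrees
    messages = (a_hat / deg) @ feats          # message passing: mean-aggregate
    return np.maximum(messages @ weight, 0.0) # shared linear map + ReLU
```

Stacking a few such layers lets information propagate between nodes several hops apart, which is how GNN-based saliency models spread salient-region evidence across an image (or, for co-saliency, across a group of images).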
