Introduction
Over the last few years, demand for video streaming has increased dramatically with the immense expansion of multimedia communications. Handling the congestion caused by this enormous volume of data over the networks requires more bandwidth and more reliable communication. This has motivated a considerable amount of research on video streaming, video compression, Quality of Service (QoS), and real-time traffic support. To address the video streaming problem from a network traffic perspective, we employ a semantic video analysis mechanism known as saliency detection, and we introduce the concept of gradual saliency. Regions of interest, or salient areas, in an image or video play an important role in the semantic analysis of visual data. Saliency detection is widely exploited in applications such as content-based image/video retrieval, scene understanding, video surveillance, video summarization, event detection, and image/video compression. The human visual attention system selects the significant and interesting areas of the visual information that humans receive in daily life.
We introduce the gradual saliency concept to distinguish different classes of saliency within a video frame. In this way, the most important and informative regions, i.e., the salient regions, are extracted to reduce the volume of video data in transmission. Our main goal is to provide guidance for the encoder in deciding which information can be dropped and which information should form the different video coding layers.
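The precise definition of gradual saliency is given later in the paper; purely as an illustration of the idea of graded saliency classes feeding layered coding, one could partition a saliency map normalized to [0, 1] into priority classes. The function name and thresholds below are hypothetical, not the authors' method:

```python
import numpy as np

def grade_saliency(saliency, thresholds=(0.75, 0.5, 0.25)):
    """Partition a normalized saliency map into graded classes.

    Class 0 is the most salient; higher class numbers are progressively
    less salient. The thresholds are illustrative placeholders, not
    values from the paper.
    """
    # Start every pixel in the least-salient class.
    classes = np.full(saliency.shape, len(thresholds), dtype=int)
    for k, t in enumerate(thresholds):  # thresholds in descending order
        # Assign the most salient class a pixel qualifies for.
        classes[(saliency >= t) & (classes == len(thresholds))] = k
    return classes
```

Under such a scheme, class 0 regions would be assigned to the base coding layer, while higher-numbered (less salient) classes would populate enhancement layers that can be dropped first under network congestion.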
The Human Visual System (HVS) can process this information rapidly and focus on the distinct parts of a scene. Studies indicate that the factors influencing visual attention and eye movements fall into bottom-up and top-down categories (Healey et al., 2012; Duncan et al., 2012). Bottom-up factors capture unconscious attention very quickly and have a strong impact on human visual selection. Top-down factors, on the other hand, capture attention much more slowly and require prior knowledge about the scene. Saliency detection models, or Visual Attention Models (VAMs), employ bottom-up and/or top-down factors to search for the salient parts of visual data. Bottom-up models use low-level features such as color, texture, size, contrast, brightness, position, motion, orientation, and object shape (Duncan et al., 2012), whereas top-down models exploit high-level, context-dependent features such as faces, humans, animals, vehicles, and text. Both kinds of factors can be combined in a VAM, but because of the added complexity and processing time, few such hybrid approaches have been proposed. For a real-time video streaming application, a fast and simple saliency detection method is essential to control network traffic effectively. We therefore avoid modeling cognitive (top-down) bias in our design, since doing so requires machine-learning-based algorithms that are too time-consuming for this purpose, and this choice speeds up the saliency detection procedure.
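To make the bottom-up category concrete, one classic fast low-level cue is intensity center-surround contrast: a pixel is salient when its local neighborhood differs from its wider surround. The sketch below is illustrative only and is not the model proposed in this work; the function names and window sizes are our own assumptions:

```python
import numpy as np

def box_blur(img, k):
    """Mean filter with a k-by-k window, edge-padded (k odd)."""
    pad = k // 2
    p = np.pad(img, pad, mode='edge')
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for dy in range(k):          # sum the k*k shifted copies
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)

def contrast_saliency(img, k_center=3, k_surround=9):
    """Bottom-up intensity saliency as |center mean - surround mean|,
    normalized to [0, 1]. Illustrative sketch, not the paper's model."""
    center = box_blur(img, k_center)
    surround = box_blur(img, k_surround)
    sal = np.abs(center - surround)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
```

On a frame containing a bright patch over a uniform background, this measure peaks along the patch boundary and stays near zero in flat regions, which matches the intuition that locally distinctive areas attract bottom-up attention.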