Introduction
Visual saliency has long been a popular and challenging research area. It is the process of identifying the relevant and informative parts of a scene or image, as shown in Figure 1. The foundation of visual saliency is identifying the prominent region of a scene, whether static or dynamic. It is widely investigated in computer vision because of its numerous applications, such as object detection and recognition (Alexe, Deselaers, & Ferrari, 2012), (Walther & Koch, 2006), content-based image retrieval (Smeulders, Worring, Santini, Gupta, & Jain, 2000), image cropping (Stentiford, 2007), image thumbnailing (Sun & Ling, 2013), and retargeting (Rubinstein, Shamir, & Avidan, 2008). A fundamental task in visual saliency is to produce a saliency map that predicts the prominent regions or objects in a scene.
Figure 1. Original image and its visual saliency
In recent years, a large number of techniques and studies have addressed visual saliency (Ma & Zhang, 2003). Fundamentally, visual saliency relies on attention models that localize the regions of interest in an image that attract human fixations before complex object recognition takes place. It is therefore important to understand the human visual system, which frames the problem as searching for salient objects, with the goal of finding the regular foreground objects that serve as the interesting part for many applications.
The first models to appear addressed the task of predicting eye fixations on images (Liu et al., 2011), which focuses on finding where humans look in static or dynamic scenes. The recent trend focuses more on salient object detection or salient region detection (Ma & Zhang, 2003), (Liu et al., 2011). Existing approaches rely either on purely low-level features or on high-level information alone. The results obtained using either kind of feature in isolation are unsatisfactory, as parts of the background are also detected as salient. Methods that rely on background priors also produce inaccurate results when the salient object touches the image boundary.
In this paper, bottom-up and top-down models are blended to obtain a refined saliency map. Bottom-up models use low-level features to compute an efficient saliency map. Top-down models, in contrast, encode prior knowledge about which objects are likely to be salient in an image. This is a task-driven approach that incorporates high-level information, such as semantics, color, and location, into the saliency map computation.
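The blending idea can be illustrated with a minimal sketch. The low-level cue below is a simplified form of the frequency-tuned contrast of Achanta et al. (2009) (distance of each pixel from the mean image color), and the high-level cue is a Gaussian center-location prior; the specific prior, the blending weight `alpha`, and the toy image are assumptions chosen for illustration, not the paper's actual method:

```python
import numpy as np

def low_level_saliency(img):
    """Bottom-up cue: per-pixel color distance from the mean image color
    (a simplified form of the frequency-tuned method of Achanta et al., 2009)."""
    mean_color = img.reshape(-1, img.shape[2]).mean(axis=0)
    return np.linalg.norm(img - mean_color, axis=2)

def center_prior(h, w, sigma=0.3):
    """Top-down cue: a Gaussian location prior favoring the image center
    (one simple choice of high-level prior, assumed for illustration)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    d2 = ((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def blended_saliency(img, alpha=0.5):
    """Blend the bottom-up map with the prior and normalize to [0, 1]."""
    s = low_level_saliency(img)
    s = s / (s.max() + 1e-8)
    prior = center_prior(*s.shape)
    blended = alpha * s + (1 - alpha) * s * prior
    return blended / (blended.max() + 1e-8)

# Toy image: uniform gray background with a red square near the center.
img = np.full((64, 64, 3), 0.5)
img[24:40, 24:40] = [1.0, 0.0, 0.0]
smap = blended_saliency(img)
```

On this toy image the red square receives high saliency while the uniform background is suppressed; the center prior further down-weights any spurious responses near the image border.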
The rest of the paper is structured as follows: Section 2 reviews related work, Section 3 presents the proposed methodology, and Section 4 validates the proposed method through experiments and results.
Related Work

Earlier research in visual saliency mainly focused on bottom-up models to find the prominent region (Itti, Koch, & Niebur, 1998), (Cheng, Mitra, Huang, Torr, & Hu, 2015), (Hou & Zhang, 2007), (Zhai & Shah, 2006). Bottom-up saliency models used low-level features with biologically inspired models that compute saliency on the basis of human eye fixations (Liu et al., 2011), (Klein & Frintrop, 2011), (Itti, Koch, & Niebur, 1998), (Hou & Zhang, 2007), (Valenti, Sebe, & Gevers, 2009), (Achanta, Hemami, Estrada, & Susstrunk, 2009). These approaches are limited because only low-level features are used, with no high-level information involved in the computation of the saliency maps. The absence of high-level information leads to inaccurate saliency maps. So, to obtain a consistent saliency map, background features (Zhu, Liang, Wei, & Sun, 2014) as low-level cues are blended with high-level information.
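The background-prior idea can be sketched in a few lines: treat the image border as a sample of the background and score each pixel by its distance from the estimated background color. This is a crude stand-in for the robust background measures of Zhu et al. (2014); the border width and the toy image are assumptions for illustration:

```python
import numpy as np

def boundary_background_saliency(img, border=4):
    """Estimate the background color from the image border (a simplified
    background prior in the spirit of Zhu et al., 2014) and score each
    pixel by its color distance from that background model."""
    top = img[:border].reshape(-1, 3)
    bottom = img[-border:].reshape(-1, 3)
    left = img[:, :border].reshape(-1, 3)
    right = img[:, -border:].reshape(-1, 3)
    bg = np.concatenate([top, bottom, left, right]).mean(axis=0)
    s = np.linalg.norm(img - bg, axis=2)
    return s / (s.max() + 1e-8)  # normalize to [0, 1]

# Toy image: gray background with a red square away from the border.
img = np.full((64, 64, 3), 0.5)
img[24:40, 24:40] = [1.0, 0.0, 0.0]
smap = boundary_background_saliency(img)
```

This sketch also exposes the weakness noted above: if the salient object touches the image border, its colors contaminate the background estimate and the object's saliency is suppressed.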