Multi-Resolution Salient Region Detection in Frequency Domain

Yingchun Guo, Yanhong Feng, Gang Yan, Shuo Shi
DOI: 10.4018/ijapuc.2014010105

Salient region detection is a challenging problem in computer vision, with applications in image segmentation, region-based image retrieval, and related tasks. In this paper we present a multi-resolution salient region detection method in the frequency domain that highlights salient regions with well-defined object boundaries. The original image is sub-sampled into three resolution layers, and for each layer the luminance and color saliency features are extracted in the frequency domain. The saliency values are then calculated based on the invariance properties of the Euclidean distance in Lab space, and a normal distribution function is applied to the saliency map of each layer to remove noise and strengthen the correlation among neighboring pixels. The final saliency map is obtained by normalizing and merging the multi-resolution saliency maps. Experimental evaluation shows promising results: the proposed model outperforms the state-of-the-art frequency-tuned model.
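The pipeline summarized in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the input is assumed to be already converted to Lab, the three layers are assumed to use sub-sampling factors 1, 2 and 4, the frequency-domain feature extraction is stood in for by a Gaussian low-pass filter applied to the FFT spectrum, the per-pixel saliency is taken as the Euclidean distance to the mean Lab vector (in the spirit of the frequency-tuned model), and the layers are merged by simple averaging. All function names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_lowpass_fft(channel, sigma):
    """Blur one channel by multiplying its spectrum with a Gaussian transfer function."""
    h, w = channel.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequency, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequency, cycles per pixel
    # Fourier transform of a spatial Gaussian with standard deviation sigma (pixels)
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(channel) * transfer))

def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def layer_saliency(lab, sigma_feature=1.0, sigma_smooth=2.0):
    """Saliency of one layer: Lab distance to the mean color, then Gaussian smoothing."""
    filtered = np.stack(
        [gaussian_lowpass_fft(lab[..., c], sigma_feature) for c in range(3)], axis=-1
    )
    mean_vec = lab.reshape(-1, 3).mean(axis=0)
    distance = np.linalg.norm(filtered - mean_vec, axis=-1)
    # Assumed reading of the "normal distribution function": a Gaussian smoothing pass
    return gaussian_lowpass_fft(distance, sigma_smooth)

def multires_saliency(lab, factors=(1, 2, 4)):
    """Normalize each layer's saliency map, upsample it, and merge by summation."""
    h, w = lab.shape[:2]
    total = np.zeros((h, w))
    for f in factors:
        sub = lab[::f, ::f]  # naive sub-sampling (assumption)
        sal = normalize(layer_saliency(sub))
        # Nearest-neighbor upsampling back to the original size
        total += np.repeat(np.repeat(sal, f, axis=0), f, axis=1)[:h, :w]
    return normalize(total)
```

Calling `multires_saliency(lab)` on an `H x W x 3` floating-point Lab array returns an `H x W` map in [0, 1]. Because the FFT-based filter wraps around the image borders, a practical version would likely pad the image first.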

The human visual system has a remarkable ability to pay more attention to certain salient regions or objects in natural scenes, because these regions or objects differ conspicuously from their surroundings in color, intensity, gradient, and so on. This is the visual attention mechanism of human beings, and many studies have tried to build computational models to simulate it (Borji & Itti, 2013). Visual attention has many applications, for example automatic image cropping (Santella, Agrawala, DeCarlo, Salesin, & Cohen, 2006), adaptive image display on small devices (Chen, Xie, Fan, Ma, Zhang & Zhou, 2003), image/video compression, advertising design (Itti, 2000), and image collection browsing (Rother, Bordeaux, Hamadi & Blake, 2006). Recent studies (Navalpakkam & Itti, 2006; Rutishauser, Walther, Koch, & Perona, 2004) have demonstrated that visual attention also helps object recognition, tracking, and detection.

There are two types of methods for detecting salient regions: those based on the spatial domain and those based on the frequency domain. One of the earliest computational models of visual attention in the spatial domain was proposed by Itti, Koch & Niebur (1998). The algorithm obtains the saliency map from intensity, color, and orientation conspicuity maps. These conspicuity maps are obtained by across-scale addition of feature maps, while the feature maps capture the center-surround differences between various Gaussian pyramid and oriented pyramid scales. The saliency map of this method is useful for locating important regions in a given visual scene, but its resolution is very low. Achanta, Estrada, Wils and Süsstrunk (2008) proposed a salient region detection method (AC) in which a difference-of-means filter is used to estimate center-surround contrast. The lowest frequencies retained depend on the size of the largest surround filter, and the highest frequencies depend on the size of the smallest center filter, so the AC method effectively retains the full resolution. Several other saliency models compute saliency using a graph representation of images (Harel, Koch, & Perona, 2007; Gopalakrishnan, Hu & Rajan 2009). In the Graph-Based Visual Saliency (GBVS) algorithm (Harel, Koch, & Perona, 2007), the edges of a graph denote the similarity between two nodes (pixels). Random walks are then performed on these nodes, and the more often a node is visited, the more salient it is considered to be. Goferman, Zelnik-Manor and Tal (2012) proposed a context-aware (CA) saliency computation approach that employs the color and position information of each image pixel; it can extract the salient objects and also preserve their surrounding regions. However, it calculates the saliency of each pixel by considering the patch dissimilarity of the K most similar patches, which leads to high time complexity.
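As an illustration of center-surround contrast estimated with a difference-of-means filter, the following is a minimal sketch, not the published AC implementation. It assumes a single-channel luminance image rather than Lab feature vectors, uses square windows whose means are computed from an integral image, and sums the absolute center-surround differences over a few surround sizes. The function names and the radii are hypothetical choices.

```python
import numpy as np

def box_mean(img, radius):
    """Mean over a (2*radius+1)^2 window, clipped at the borders, via an integral image."""
    h, w = img.shape
    integral = np.zeros((h + 1, w + 1))
    integral[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    ys = np.arange(h)[:, None]
    xs = np.arange(w)[None, :]
    # Window bounds per pixel, clipped so they stay inside the image
    y0 = np.clip(ys - radius, 0, h)
    y1 = np.clip(ys + radius + 1, 0, h)
    x0 = np.clip(xs - radius, 0, w)
    x1 = np.clip(xs + radius + 1, 0, w)
    area = (y1 - y0) * (x1 - x0)
    total = integral[y1, x1] - integral[y0, x1] - integral[y1, x0] + integral[y0, x0]
    return total / area

def center_surround_saliency(lum, center_radius=0, surround_radii=(2, 4, 8)):
    """Sum of |center mean - surround mean| over several surround sizes."""
    center = box_mean(lum, center_radius)
    sal = np.zeros(lum.shape, dtype=float)
    for r in surround_radii:
        sal += np.abs(center - box_mean(lum, r))
    return sal
```

The output has the same size as the input, which reflects the full-resolution property noted above: the largest surround window bounds the lowest retained frequencies, and the smallest center window bounds the highest.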
The basic drawback of the above-mentioned methods is that they must calculate global or local region saliency for each pixel, so they suffer from high computational complexity, ad hoc design choices, and over-parameterization, and they often produce saliency maps of lower resolution than the original image. These drawbacks often arise from failing to exploit the appropriate spatial frequency content of the original image, as analyzed by Achanta, Hemami, Estrada and Süsstrunk (2009).
