Image Aesthetic Description Based on Semantic Addition Transformer Model

Kai Wang, Shasha Lv, Yongzhen Ke, Jing Guo, Ruikun Wang
DOI: 10.4018/IJCINI.20211001.oa14

Abstract

Image aesthetic quality assessment has been a hot research topic in the field of image analysis during the last decade. Most recently, researchers have proposed comment-type assessment, which automatically describes the aesthetics of an image in text. However, existing works have rarely considered the quality of the aesthetic description. In this work, we propose a novel neural image aesthetic description framework, named Deep Image Aesthetic Reviewer (DIAReviewer), which brings together a Semantic Addition Transformer Model, residual network learning, and an attention mechanism in a single framework. In addition, we design a Semantic Addition module that combines image features with semantic information in order to improve comment quality, such as fluency and complexity. We introduce a new image dataset named Aesthetic Review Dataset (ARD), which contains one or more aesthetic comments for each image. Finally, experimental results on ARD show that our model outperforms other methods in the content complexity and sentence fluency of its aesthetic descriptions.

1. Introduction

In recent years, a growing number of visual aesthetic works, such as paintings, films, videos, and computer games, have been created, saved, and transmitted with digital tools. This growth makes it harder for people to appreciate them manually. People have therefore begun to hope that computers or robots can understand and discover the beauty in an image as humans do. As a result, Image Aesthetics Quality Assessment (IAQA) has been a hot research topic in image analysis during the last decade.

Most previous studies of IAQA try to classify a given photo into one of two categories: high aesthetic quality or low aesthetic quality (Karpathy, 2017; Lu, 2018; Mikolov, 2013). Another popular assessment task is to score photos with continuous values (Vinyals, 2014; Xu, 2015). People, especially human artists, often judge a photo or a drawing by its complex content rather than by a simple number. To the best of our knowledge, there is little research on comment-type IAQA, namely image aesthetic captioning or description, owing to the lack of appropriate datasets, aesthetic models, and so on (Chang, 2017; Jin, 2019; Wang, 2018). Most previous research on photo captioning (Jin, 2018; Kiros, 2014; Le, 2014; Marchesotti, 2011; Wang, 2012; Wang, 2008; Xu, 2016) focuses on describing the objects in a picture or the relationships between them, without describing its aesthetics.

Figure 1. Aesthetic Description of Images

In this work, we propose the Deep Image Aesthetic Reviewer (DIAReviewer), based on the Semantic Addition Transformer Model, for image aesthetic description. This model can predict an aesthetically relevant text description for an image, as shown in Figure 1. DPChallenge (https://www.dpchallenge.com) is a photographers' forum. In this active community, many outstanding photographers share their shooting process and skills through pictures and text. By crawling pictures and comments from DPChallenge and then manually filtering out non-sentence and congratulation-like comments, we built an image aesthetic description dataset, called Aesthetic Review Dataset (ARD), which contains one or more aesthetic comments for each image. It contains 19,405 images and 22,931 comments. We train the DIAReviewer on ARD and evaluate it on ARD using the widely used Bilingual Evaluation Understudy (BLEU) metric, which several image description articles adopt. The contributions of our work include:

  • We introduce a novel image dataset named ARD for image aesthetic description. Each image in ARD corresponds to one or more aesthetic comments. These comments may be complete sentences or just simple words or phrases.

  • We propose the DIAReviewer network framework based on the Semantic Addition Transformer Model, which is specially designed to combine image features with semantic information and to generate aesthetic critiques for images. Furthermore, the generated sentences are more fluent and read more naturally to humans.

  • We improve the traditional Convolutional Neural Network (CNN) by adding two attention mechanisms and residual learning. The results show that this improvement can reduce the loss of image features, enrich the content of critiques, and improve the fluency of sentences.
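
The BLEU evaluation mentioned above can be illustrated with a small, self-contained sketch. This preview does not state which BLEU variant, tokenization, or n-gram order the authors use, so the code below is only an assumption: a sentence-level BLEU with uniform n-gram weights, clipped n-gram precision, a brevity penalty, and no smoothing. The function names (`ngrams`, `modified_precision`, `bleu`) and the example comments are hypothetical and are not taken from ARD.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Return (clipped matches, total candidate n-grams) for order n."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0, 0
    # For each n-gram, keep the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped, sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (assumed variant)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        clipped, total = modified_precision(candidate, references, n)
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(clipped / total))
    c = len(candidate)
    # Reference length closest to the candidate length; ties go to the shorter one.
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical comments, invented for illustration only.
    generated = "nice use of light and great composition".split()
    human_comments = [
        "nice use of light and color".split(),
        "great composition and lighting".split(),
    ]
    print(f"BLEU-1: {bleu(generated, human_comments, max_n=1):.3f}")
    print(f"BLEU-4: {bleu(generated, human_comments, max_n=4):.3f}")
```

Because each ARD image can have more than one comment, the sketch accepts a list of references per generated description. In practice an established implementation with smoothing would likely be preferable, since short comments often have no matching 4-grams and would score zero here.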
