1. Introduction
In recent years, a growing number of visual aesthetic works, such as paintings, films, videos, and computer games, have been created, saved, and transmitted with digital tools. The sheer volume makes it increasingly difficult for people to appreciate them all manually, which raises the question of whether computers or robots can understand and discover the beauty in an image as humans do. Image Aesthetics Quality Assessment (IAQA) has therefore been a hot research topic in image analysis during the last decade.
Most previous studies of IAQA classify images into two categories, high aesthetic quality and low aesthetic quality (Karpathy, 2017; Lu, 2018; Mikolov, 2013). Another popular assessment task is to score photos on a continuous scale (Vinyals, 2014; Xu, 2015). However, when people, especially human artists, are shown a photo or a drawing, they often judge it by its complex content rather than by a simple number. To the best of our knowledge, there is little research on IAQA in the form of comments, namely image aesthetic captioning or description, largely due to the lack of appropriate datasets and aesthetic models (Chang, 2017; Jin, 2019; Wang, 2018). Most previous research on photo captioning (Jin, 2018; Kiros, 2014; Le, 2014; Marchesotti, 2011; Wang, 2012; Wang, 2008; Xu, 2016) focuses on describing the objects in a picture or the relationships between them, without addressing aesthetics.
Figure 1. Aesthetic Description of Images
In this work, we propose the Deep Image Aesthetic Reviewer (DIAReviewer), based on a Semantic Addition Transformer Model, for image aesthetic description. The model predicts an aesthetically relevant text description for an image, as shown in Figure 1. DPChallenge (https://www.dpchallenge.com) is a photographer forum in which many outstanding photographers share their shooting process and techniques with pictures and text. By crawling pictures and comments from DPChallenge, then manually filtering out non-sentences and congratulation-like comments, we built an image aesthetic description dataset called Aesthetic Review Data (ARD), which contains one or more aesthetic comments for each image: 19,405 images and 22,931 comments in total. We train the DIAReviewer on ARD and evaluate it with the widely used Bilingual Evaluation Understudy (BLEU) metric adopted by several image description articles. The contributions of our work include:
- We introduce a novel image dataset named ARD for image aesthetic description. Each image in ARD corresponds to one or more aesthetic comments, which may be complete sentences or just simple words and phrases.
- We propose the DIAReviewer network framework based on the Semantic Addition Transformer Model, specially designed to combine image features with semantic information and generate aesthetic critiques for images. Furthermore, the generated sentences are smoother and more favorable to human readers.
- We improve traditional Convolutional Neural Networks (CNNs) by adding two attention mechanisms and residual learning. The results show that this improvement reduces the loss of image features, enriches the content of the critiques, and improves the fluency of the generated sentences.