Performance Analysis of GAN Architecture for Effective Facial Expression Synthesis

Karthik R., Nandana B., Mayuri Patil, Chandreyee Basu, Vijayarajan R.
DOI: 10.4018/978-1-7998-6690-9.ch015

Abstract

Facial expressions are an important means of communication among human beings, as they convey different meanings in a variety of contexts. All human facial expressions, whether voluntary or involuntary, are formed as a result of the movement of different facial muscles. Despite their variety and complexity, certain expressions are universally recognized as representing specific emotions - for instance, raised eyebrows in combination with an open mouth are associated with surprise, whereas a smiling face is generally interpreted as happy. Deep learning-based implementations of expression synthesis have demonstrated their ability to preserve essential features of input images, which is desirable. However, one limitation of deep learning networks is their dependence on the data distribution and the quality of the images used for training. The variation in performance can be studied by changing the optimizer and loss functions, and their effectiveness analysed based on the quality of the output images obtained.
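To make the experimental axis concrete, the sketch below shows how the optimizer and adversarial loss can be treated as swappable components in a single GAN training step. It is a minimal illustration only, assuming PyTorch; the toy generator and discriminator are hypothetical stand-ins for the chapter's image networks, not the authors' actual architecture.

    import torch
    import torch.nn as nn

    # Toy generator/discriminator on 64-dim vectors; stand-ins for the
    # image-to-image networks studied in the chapter (hypothetical sizes).
    G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64))
    D = nn.Sequential(nn.Linear(64, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

    # The two axes of the study: optimizer choice and loss choice.
    optimizers = {
        "adam":    lambda p: torch.optim.Adam(p, lr=2e-4, betas=(0.5, 0.999)),
        "rmsprop": lambda p: torch.optim.RMSprop(p, lr=2e-4),
        "sgd":     lambda p: torch.optim.SGD(p, lr=2e-4, momentum=0.9),
    }
    bce = nn.BCEWithLogitsLoss()
    losses = {
        "vanilla": lambda logits, real: bce(logits, torch.full_like(logits, float(real))),
        "lsgan":   lambda logits, real: ((logits - float(real)) ** 2).mean(),
    }

    opt_d = optimizers["adam"](D.parameters())
    opt_g = optimizers["adam"](G.parameters())
    adv = losses["vanilla"]

    real = torch.randn(8, 64)   # stand-in for a batch of real training images
    z = torch.randn(8, 16)      # latent input to the generator

    # Discriminator step: push real samples toward 1, fakes toward 0.
    opt_d.zero_grad()
    d_loss = adv(D(real), 1) + adv(D(G(z).detach()), 0)
    d_loss.backward()
    opt_d.step()

    # Generator step: try to fool the discriminator.
    opt_g.zero_grad()
    g_loss = adv(D(G(z)), 1)
    g_loss.backward()
    opt_g.step()

Swapping the dictionary keys changes the optimizer or loss while holding everything else fixed, which is the kind of controlled comparison the chapter describes.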

Introduction

Facial expressions are a direct result of the emotions experienced by human beings and a primary component of interpersonal communication. Expressions produced as a result of emotions are not merely involuntary responses to emotional stimuli, but also a means of conveying empathy. Understanding the emotional state of a person is a key aspect of effective interpersonal communication, and has been studied in fields ranging from human development and psychology to computer science. The cognitive ability of human beings to recognize emotional cues develops early in life, from infancy onwards. Discerning certain emotions such as fear or disgust from facial expressions can be a complex process, as there is little consistency in the way these emotions are exhibited by different people. However, emotions like happiness, surprise, sadness and anger are associated with distinctive modifications of facial features, and facial expressions of these emotions can easily be categorized based on these differences. With the rise of artificial intelligence and deep learning, human-computer interaction has seen significant growth in the quality of interaction, the result of contributions from various fields such as linguistics and industrial disciplines.

Rapid developments in computer interfaces involving computer graphics and programming techniques have improved interaction between humans and computing systems. This interaction can be significantly enhanced by introducing emotion recognition ability to machines. The primary purpose of every machine is to serve the needs of human beings and to help them accomplish certain tasks. New technologies such as virtual reality (VR) and augmented reality (AR) make use of facial expression recognition and synthesis to communicate with humans in a nearly natural manner, and for such applications to appear convincing, realistic emulation of facial expressions is important. Facial expression synthesis has a variety of applications, including the enhancement of facial recognition as well as data augmentation. Often, the need for synthetic expressions arises from a lack of actors available for the purpose, or from performances that lack conviction. Even when actors are available, their involvement is expensive compared with the variety of alternatives for data collection. Automated emotion recognition has been the subject of extensive research in science and technology owing to its large potential in a variety of applications. In the era of Industry 4.0, business communities prefer marketing strategies that elicit emotional responses from customers to attract them towards products and services. Corporate organizations that enforce security by means of biometric data typically employ facial recognition components trained to identify authorized personnel. These systems can be significantly enhanced by further training to identify variations in a single person's face caused by different emotions.

Emotion synthesis also plays a significant role in multimedia applications such as animation, where human-based models created using software need to be animated with facial expressions that are as realistic as possible. Models that synthesise images with facial expressions based on a textual description have also been proposed. Traditional methods of animation require complex modelling and computer graphics techniques in order to model different expressions; the process can be simplified if the required expressions can be generated from a single face. Besides computer animation, emotion detection and synthesis can be incorporated into robots, which can then be used as test subjects or assistants in social assignments.

Key Terms in this Chapter

Deep Learning: A subset of the broader family of machine learning methods that uses multiple layers to progressively extract higher-level features from raw input.
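As a small illustration of that layered feature extraction (PyTorch assumed; the layer sizes are arbitrary, not taken from the chapter):

    import torch.nn as nn

    # Illustrative stack of layers: each transforms the previous layer's
    # output, so later layers operate on progressively higher-level features.
    net = nn.Sequential(
        nn.Linear(784, 256), nn.ReLU(),  # raw input -> low-level features
        nn.Linear(256, 64), nn.ReLU(),   # low-level -> higher-level features
        nn.Linear(64, 10),               # features -> class scores
    )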

Instance Normalization: A normalization technique that scales each channel of each individual sample using the mean and variance computed over that sample's spatial dimensions alone; commonly used in image generation networks.
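A short sketch of the per-sample computation (PyTorch assumed; the tensor shapes are arbitrary):

    import torch
    import torch.nn as nn

    x = torch.randn(4, 3, 32, 32)   # (batch, channels, height, width)

    # Library version.
    y = nn.InstanceNorm2d(3)(x)

    # Equivalent by hand: each channel of each sample is normalized with its
    # own mean and variance, computed over the spatial dimensions only.
    mu = x.mean(dim=(2, 3), keepdim=True)
    var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
    y_manual = (x - mu) / torch.sqrt(var + 1e-5)

    assert torch.allclose(y, y_manual, atol=1e-5)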

Downsampling: The process of reducing the sampling rate or resolution of a digital signal; in convolutional networks, producing a smaller feature map from a larger one. It is the opposite of upsampling.
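For example (PyTorch assumed; shapes are arbitrary), two common ways a convolutional encoder halves spatial resolution:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 64, 64)

    pooled = nn.AvgPool2d(kernel_size=2)(x)                           # -> (1, 3, 32, 32)
    strided = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)(x)  # -> (1, 8, 32, 32)
    print(pooled.shape, strided.shape)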

Neural Network: A network of artificial nodes (neurons) used for predictive modelling. It is generally applied to classification problems and other AI-related applications.

Upsampling: The process of increasing the sampling rate or resolution of a digital signal; in convolutional networks, producing a larger feature map from a smaller one. It is the opposite of downsampling.
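For example (PyTorch assumed), two common ways a generator's decoder doubles spatial resolution:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(1, 8, 32, 32)

    interp = F.interpolate(x, scale_factor=2, mode="nearest")                 # -> (1, 8, 64, 64)
    deconv = nn.ConvTranspose2d(8, 3, kernel_size=4, stride=2, padding=1)(x)  # -> (1, 3, 64, 64)
    print(interp.shape, deconv.shape)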
