Image Processing and Machine Learning Techniques for Facial Expression Recognition


Anastasios Koutlas (University of Ioannina, Greece) and Dimitrios I. Fotiadis (University of Ioannina, Greece)
DOI: 10.4018/978-1-60566-314-2.ch016


The aim of this chapter is to analyze recent advances in image processing and machine learning techniques for facial expression recognition. A comprehensive review of recently proposed methods is provided, along with an analysis of the advantages and shortcomings of existing systems. An example of the automatic identification of basic emotions is also presented: Active Shape Models are used to identify prominent features of the face, Gabor filters are used to represent facial geometry at the locations of selected fiducial points, and Artificial Neural Networks are used to classify expressions into the basic emotions (anger, surprise, fear, happiness, sadness, disgust, neutral). Finally, future trends in automatic facial expression recognition are described.


The face is fundamental to day-to-day interpersonal communication. Humans use facial expressions to convey their emotional states, either consciously (anger, surprise, stress, etc.) or subconsciously (yawning, lip biting), to accompany and enhance the meaning of their thoughts (winking), or to exchange thoughts without talking (head nods, exchanged looks). Facial expressions result from deformations of the face caused by muscle movement. Automating the analysis of facial expressions with computing systems can benefit many scientific fields, such as psychology, neurology and psychiatry, as well as everyday applications such as driver monitoring systems, automated tutoring systems, smart environments and human-computer interaction. Although humans identify changes in facial expressions easily and effortlessly, even in complicated scenes, the same task is far from easy for a machine. Moreover, computing systems must match human robustness and accuracy before they can be used in real-world scenarios and provide adequate aid.

Advances in face detection, face tracking and recognition, and psychological studies, together with the processing power of modern computer systems, make automatic facial expression analysis feasible in real-world settings. Such settings require responsiveness (i.e. real-time processing), sensitivity (i.e. the ability to detect various day-to-day emotional states and visual cues) and tolerance to head movements or sudden changes.

An effective automatic facial expression recognition (AFER) system requires several components. These are outlined in Figure 1.

Figure 1. Structure of an automatic facial expression recognition system

Face detection and the identification of prominent facial features form a crucial step for an AFER system. It is the first step of any system that claims to be automatic, and its accuracy largely determines the overall accuracy of the system. Approaches in the literature differ in whether they identify the face in static images or over time, and in whether they identify prominent features such as the eyes rather than merely detecting the presence of a face in a scene.

Once the face has been located, it must be modeled so that it can be represented appropriately. The facial representation can be based on facial geometry, which captures features that are both shared across humans and distinctive between them. It can also be based on characteristics that emerge after a mathematical transformation modeling texture, position and gray-level information. The feature vector is then built by extracting features, using either a holistic or a local representation. The holistic approach treats the face as a whole, i.e. the processing and the mathematical transformation apply to the entire face without considering any particular prominent features. The local approach, in contrast, treats each prominent feature of the face separately, and feature extraction is applied at selected locations in the image, often called fiducial points. Finally, some systems, processing either image sequences or static images, combine the two approaches and treat the face in a hybrid manner. Systems can also be distinguished by whether or not they use temporal information.
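The contrast between holistic and local feature vectors can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the chapter's implementation: the face is assumed to be an already detected, cropped grey-level array (here random pixels stand in for it), the fiducial points are hypothetical (row, column) coordinates, and the local features are magnitudes of real-valued Gabor filter responses, Gabor filters being the transformation the chapter applies at fiducial points. The kernel size, the two wavelengths, the four orientations and the bandwidth rule sigma = 0.56 × wavelength are all assumed values.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real (even) part of a 2-D Gabor kernel: a cosine carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame so the carrier oscillates along orientation theta
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + (gamma * y_r) ** 2) / (2 * sigma**2))
    kernel = envelope * np.cos(2 * np.pi * x_r / wavelength)
    return kernel - kernel.mean()  # zero mean, so flat regions give no response


def holistic_features(face):
    """Holistic: the whole face image, normalised, becomes one long vector."""
    v = face.astype(float).ravel()
    return (v - v.mean()) / (v.std() + 1e-8)


def local_gabor_features(face, points, size=15, wavelengths=(4, 8), n_orient=4):
    """Local: Gabor response magnitudes sampled only at fiducial points (row, col)."""
    half = size // 2
    padded = np.pad(face.astype(float), half, mode="reflect")  # full patches near borders
    kernels = [gabor_kernel(size, lam, i * np.pi / n_orient, sigma=0.56 * lam)
               for lam in wavelengths for i in range(n_orient)]
    features = []
    for r, c in points:
        # In padded coordinates this patch is centred on original pixel (r, c)
        patch = padded[r:r + size, c:c + size]
        features.extend(abs(np.sum(patch * kern)) for kern in kernels)
    return np.array(features)


rng = np.random.default_rng(0)
face = rng.random((64, 64))                  # stand-in for a cropped grey-level face
fiducials = [(20, 20), (20, 44), (45, 32)]   # hypothetical eye, eye, mouth positions
h = holistic_features(face)
loc = local_gabor_features(face, fiducials)
print(h.shape, loc.shape)                    # (4096,) (24,)
```

The holistic vector grows with image size (64 × 64 = 4096 values), whereas the local vector grows only with the number of fiducial points and filters (3 points × 2 wavelengths × 4 orientations = 24 values). A hybrid representation could, for example, concatenate the two.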

Classification is the last step of an AFER system. The facial actions, or the deformations caused by facial movement, are categorized either as basic emotions or as Action Units (AUs). In this chapter, the classification process is considered temporal or static depending on whether or not it uses temporal characteristics.
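A static classification step can be sketched as a small feed-forward neural network that maps one feature vector to one of the seven classes used in the chapter's example (six basic emotions plus neutral). This is an assumed, minimal design, not the network used in the chapter: the class name `TinyMLP`, the single tanh hidden layer, the softmax output, full-batch gradient descent on cross-entropy, and all sizes and learning rates are illustrative choices, and the training data are synthetic, clearly separable clusters rather than real facial features.

```python
import numpy as np

# The seven classes used in the chapter's example (six basic emotions plus neutral)
EMOTIONS = ["anger", "surprise", "fear", "happiness", "sadness", "disgust", "neutral"]


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class TinyMLP:
    """Hypothetical one-hidden-layer network: tanh hidden units, softmax outputs."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)
        return H, softmax(H @ self.W2 + self.b2)

    def fit(self, X, y, epochs=1000, lr=0.2):
        """Full-batch gradient descent on the mean cross-entropy loss."""
        T = np.eye(self.W2.shape[1])[y]             # one-hot targets
        n = len(X)
        for _ in range(epochs):
            H, P = self.forward(X)
            dZ2 = (P - T) / n                       # gradient w.r.t. output logits
            dZ1 = (dZ2 @ self.W2.T) * (1.0 - H**2)  # back-propagate through tanh
            self.W2 -= lr * (H.T @ dZ2)
            self.b2 -= lr * dZ2.sum(axis=0)
            self.W1 -= lr * (X.T @ dZ1)
            self.b1 -= lr * dZ1.sum(axis=0)
        return self

    def predict(self, X):
        return self.forward(X)[1].argmax(axis=1)


# Synthetic stand-in for 24-dimensional feature vectors: NOT real facial data
rng = np.random.default_rng(1)
centres = rng.normal(0.0, 3.0, (len(EMOTIONS), 24))
y = np.repeat(np.arange(len(EMOTIONS)), 30)
X = centres[y] + rng.normal(0.0, 1.0, (len(y), 24))
X = (X - X.mean(axis=0)) / X.std(axis=0)           # z-score each feature

model = TinyMLP(24, 32, len(EMOTIONS)).fit(X, y)
acc = (model.predict(X) == y).mean()
print(f"training accuracy: {acc:.2f}")
print(EMOTIONS[model.predict(X[:1])[0]])
```

The accuracy printed here is measured on the training data only; a real evaluation would use held-out faces. A temporal classifier would instead take a sequence of such feature vectors as input.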

This chapter introduces recent advances in automatic facial expression recognition. The first part introduces AFER systems, including their structure, objectives and limitations. The second part reviews recent work on face identification, acquisition and recognition, facial feature transformation, feature vector extraction and classification. The third part describes a particular approach along with quantitative results.

Key Terms in this Chapter

Facial Action Coding System (FACS): A system developed by Ekman and Friesen (1978) to categorize human facial expressions. Using FACS, human coders can categorize all possible facial deformations into action units that describe facial muscle movements.

Action Unit (AU): The key element of FACS; each action unit describes the facial deformation caused by a particular facial muscle movement. Of the 44 AUs in total, the majority involve the contraction or relaxation of facial muscles, while the rest involve miscellaneous actions such as “tongue show” or “bite lip”.

Point Distribution Model (PDM): A statistical model of the distribution of landmark points across the shapes in a training set. Once the PDM is constructed, it can approximate the position of each model point in a new image without manual intervention.

Machine Learning: The automatic extraction of information from various types of data using computational and statistical methods, that is, the use of computer algorithms that improve automatically through experience.

Basic Emotions: A small set of prototypic emotions that share characteristics of universality and uniformity across people of different ethnic backgrounds or cultural heritage. The six basic emotions, proposed by Ekman and Friesen (1971), are disgust, fear, joy, surprise, sadness and anger.

Classification: The task of assigning feature vectors to appropriate categories. Each category is called a class.

Feature Vector Extraction: The task of producing a feature vector that describes facial geometry and deformation. There are two ways to model facial geometry and deformation: by using prominent features of the face, or by using a mathematical transformation that models changes in appearance.

Image Processing: The analysis of an image using techniques that can identify shades, colors and relationships which cannot be perceived by the human eye.
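The Point Distribution Model defined above is commonly built with principal component analysis: each shape x (the stacked landmark coordinates) is approximated as x ≈ x̄ + P b, where x̄ is the mean shape, the columns of P are the leading eigenvectors of the shape covariance matrix, and b holds the shape parameters. The sketch below is a minimal illustration under stated assumptions: the training shapes are assumed to be already aligned (alignment, e.g. by Procrustes analysis, is omitted), the model keeps enough modes to explain 98% of the variance, and each parameter is limited to ±3√λᵢ. These thresholds are common choices assumed here, the function names are hypothetical, and the training shapes are synthetic.

```python
import numpy as np


def build_pdm(shapes, var_kept=0.98):
    """PCA shape model from ALIGNED shapes (n_samples, 2k); rows are (x1, y1, ..., xk, yk)."""
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # re-sort into descending order
    eigvals = np.clip(eigvals[order], 0.0, None)    # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    t = min(int(np.searchsorted(cumulative, var_kept)) + 1, len(eigvals))
    return mean, eigvecs[:, :t], eigvals[:t]


def fit_shape(x, mean, P, lam, limit=3.0):
    """Project a shape onto the model (b = P^T (x - mean)), clamp b, rebuild mean + P b."""
    b = P.T @ (x - mean)
    bound = limit * np.sqrt(lam)
    b = np.clip(b, -bound, bound)                   # keep the shape statistically plausible
    return mean + P @ b, b


# Synthetic training set: 5 landmarks varying along two orthonormal deformation modes
rng = np.random.default_rng(0)
k = 5
base = rng.random(2 * k)
Q, _ = np.linalg.qr(rng.normal(size=(2 * k, 2)))
mode_a, mode_b = Q.T
coeffs = rng.normal(size=(200, 2)) * [1.0, 0.7]
shapes = (base + coeffs[:, :1] * mode_a + coeffs[:, 1:] * mode_b
          + rng.normal(0.0, 0.01, size=(200, 2 * k)))

mean, P, lam = build_pdm(shapes)
x_new = base + 0.5 * mode_a - 0.3 * mode_b          # an unseen but plausible shape
x_fit, b_new = fit_shape(x_new, mean, P, lam)
print(P.shape[1], np.linalg.norm(x_fit - x_new))
```

With two true deformation modes and little noise, the model should retain two modes and reconstruct the unseen shape closely. In an Active Shape Model, a step like `fit_shape` would be applied repeatedly to landmark positions proposed by a local search of the image; that search is not shown here.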

