The facial expression has long been of interest to psychology, since Darwin published The Expression of the Emotions in Man and Animals (Darwin, C., 1899). Psychologists have studied the role and mechanism of facial expressions. One of Darwin's great discoveries is that prototypical facial expressions exist across multiple cultures, which provided the theoretical background for vision researchers who tried to classify the categories of the prototypical facial expressions from images. The six representative facial expressions are fear, happiness, sadness, surprise, anger, and disgust (Mase, 1991; Yacoob and Davis, 1994). On the other hand, the real facial expressions that we frequently encounter in daily life consist of many distinct signals that are subtly different. Further research on facial expressions required an objective method to describe and measure the distinct activity of facial muscles. The facial action coding system (FACS), proposed by Hager and Ekman (1978), defines 46 distinct action units (AUs), each of which describes the activity of a distinct muscle or muscle group. The development of the objective description method also influenced vision researchers, who tried to detect the occurrence of each AU (Tian et al., 2001).
6.1 Generalized Discriminant Analysis
Generalized discriminant analysis (GDA) (Baudat and Anouar, 2000) generalizes linear discriminant analysis (LDA) (Duda et al., 2000) to linearly inseparable data using the kernel technique (Cristianini and Taylor, 2000). The main idea of the kernel technique is to use a nonlinear mapping function that maps the linearly inseparable input data into a high-dimensional feature space where the mapped feature vectors become linearly separable.
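As a concrete illustration of this idea (the specific feature map and numbers below are illustrative, not from the text), the degree-2 polynomial kernel k(x, y) = (x·y)^2 returns the inner product of an explicit feature map Φ without ever computing Φ. For 2-D inputs, one standard choice of Φ is:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 polynomial feature map for a 2-D input vector.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    # Kernel function: same inner product, computed in the input space.
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(y)))  # 16.0
print(k(x, y))                 # 16.0 -- identical, without forming phi
```

In GDA the mapping Φ is never evaluated explicitly; every quantity is expressed through such kernel evaluations.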
A nonlinear mapping function Φ maps an input vector x in the input space into the high-dimensional feature space F as x → Φ(x) ∈ F. Then, the covariance matrix in the feature space F can be written as

V = \frac{1}{M} \sum_{l=1}^{N} \sum_{j=1}^{n_l} \Phi(x_{lj}) \Phi(x_{lj})^T,    (1)

where N is the number of classes, n_l is the number of observations in the l-th class, M = \sum_{l=1}^{N} n_l is the total number of observations, and x_{lj} denotes the j-th observation of the l-th class.
We assume that the observations are centered in the feature space F. Then, the between-class scatter matrix in the feature space is represented as

B = \frac{1}{M} \sum_{l=1}^{N} n_l \bar{\Phi}_l \bar{\Phi}_l^T,    (2)

where \bar{\Phi}_l = \frac{1}{n_l} \sum_{j=1}^{n_l} \Phi(x_{lj}) is the mean vector of the l-th class in the feature space.
The goal of GDA is to maximize the between-class scatter and minimize the within-class scatter in the feature space. The solution can be found by solving the following generalized eigenvalue problem:

\lambda V v = B v,    (3)

where \lambda and v are the corresponding eigenvalue and eigenvector. The largest eigenvalue gives the maximum of the following discrimination measure:

J(v) = \frac{v^T B v}{v^T V v}.    (4)
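To make the generalized eigenvalue problem concrete, the sketch below solves a small synthetic instance with SciPy's generalized symmetric eigensolver. The matrices B and V here are random positive (semi)definite stand-ins for the scatter matrices, not quantities from real data:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
V = A @ A.T + 4 * np.eye(4)   # stand-in covariance: symmetric positive definite
C = rng.normal(size=(4, 2))
B = C @ C.T                   # stand-in between-class scatter (rank-deficient)

# eigh(B, V) solves B v = lambda V v, eigenvalues in ascending order.
w, vecs = eigh(B, V)
v = vecs[:, -1]               # eigenvector of the largest eigenvalue

# The Rayleigh quotient v^T B v / v^T V v attains its maximum at w[-1].
rayleigh = (v @ B @ v) / (v @ V @ v)
print(np.isclose(rayleigh, w[-1]))  # True
```

This mirrors the statement above: the top eigenvector of the pencil (B, V) maximizes the discrimination measure.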
For a more compact formulation in matrix form, the original eigenvalue problem can be rewritten by multiplying both sides by \Phi(x_{lj})^T as (Baudat and Anouar, 2000)

\lambda \Phi(x_{lj})^T V v = \Phi(x_{lj})^T B v.    (5)
For further derivation, we consider the following facts. First, the eigenvectors can be expressed as a span of all observations in the feature space F:

v = \sum_{p=1}^{N} \sum_{q=1}^{n_p} \alpha_{pq} \Phi(x_{pq}).    (6)
Second, the inner product in the feature space F can be expressed in a compact form with the help of a kernel function:

k(x_i, x_j) = \Phi(x_i)^T \Phi(x_j).    (7)
For condensed notation, we define a coefficient vector

\alpha = (\alpha_{pq}), \quad p = 1, \ldots, N, \; q = 1, \ldots, n_p,    (8)

and a M \times M kernel matrix

K = (K_{pl}), \quad p, l = 1, \ldots, N,    (9)

where K_{pl} = (k(x_{pi}, x_{lj})) is a n_p \times n_l matrix composed of the inner products in the feature space F, and a M \times M block-diagonal matrix W = \mathrm{diag}(W_1, \ldots, W_N), where W_l is a n_l \times n_l matrix with all terms equal to 1/n_l.
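The matrices K and W can be assembled directly in code. The sketch below is a minimal illustration with an assumed RBF kernel and two tiny hand-made classes; none of the sample values come from the text:

```python
import numpy as np
from scipy.linalg import block_diag

def rbf(x, y, gamma=0.5):
    # Assumed RBF kernel; any valid kernel function k(x, y) would do.
    return np.exp(-gamma * np.sum((x - y) ** 2))

# N = 2 classes with n_1 = 3 and n_2 = 2 observations (M = 5), ordered by class.
X = [np.array([0.0, 0.0]), np.array([0.1, 0.2]), np.array([0.2, 0.1]),
     np.array([2.0, 2.0]), np.array([2.1, 1.9])]
n = [3, 2]
M = sum(n)

# M x M kernel matrix: because the rows are ordered by class, the (p, l)
# block of K is exactly the n_p x n_l block K_pl of inner products.
K = np.array([[rbf(xi, xj) for xj in X] for xi in X])

# M x M block-diagonal matrix W = diag(W_1, ..., W_N), W_l filled with 1/n_l.
W = block_diag(*[np.full((nl, nl), 1.0 / nl) for nl in n])
```

Ordering the observations by class is what makes the block structure of K and W line up, so all subsequent expressions can be written purely in terms of these two matrices.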
By combining Eq. (6) and Eq. (7), the discrimination measure J(v) is modified into a new form J(α) as