Convolutional Neural Networks for Real-Time Eye Tracking in Interactive Applications

Michael Burch, Andrei Jalba, Carl van Dueren den Hollander
DOI: 10.4018/978-1-7998-5077-9.ch022

Abstract

Face alignment and eye tracking for interactive applications must be performed with very low latency, or users will notice the delay. This chapter introduces a face alignment method for real-time applications that uses a convolutional neural network architecture for face and pose alignment. The performance of the novel method is compared to a face alignment algorithm included in the freely available OpenFace toolkit, which also focuses on real-time applications. The approach exceeds OpenFace's performance on both our own and the 300W test sets in terms of accuracy and robustness but requires significant parallel processing power, currently provided by the GPU. For the eye tracking application, stereo cameras are used as input to determine the position of a user's eyes in three-dimensional space. The eye tracking method does not require synchronized recordings, which may contain redundant information, and instead prefers staggered recordings, which maximize the number of possible model updates.

Chapter Preview

Recently, convolutional neural networks (CNNs) have become popular in novel face alignment methods (Jin and Tan, 2016b). For example, Jin and Tan (2016a) use a CNN as a score function to evaluate the quality of previously acquired shape estimates.

The approaches by Zhu et al. (2016) and Jourabloo and Liu (2016) focus on dense model alignment. Wan et al. (2020) discuss a novel CRD algorithm to handle face de-occlusion and face alignment problems simultaneously, while Jiang et al. (2019) describe an efficient end-to-end 3D face alignment framework. Our goal, however, is to abstract from physical appearances and to determine the position of specific landmarks. Among the neural network methods with this goal is a two-stage convolutional part heatmap regression for the first 3D face alignment in the wild challenge (Bulat and Tzimiropoulos, 2016a, 2016b). Both approaches are predecessors of the hourglass-shaped neural networks for face alignment (Bulat and Tzimiropoulos, 2017a, 2017b), on which the method described in this book chapter is also based.
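
To make the idea of heatmap regression more concrete, the following minimal sketch shows how landmark positions can be read from per-landmark heatmaps by taking the location of the highest score. It illustrates the general technique only and is not the implementation of any of the cited methods. The function name `decode_landmarks`, the nested-list heatmap format, and the toy values are assumptions made for this example.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # rows of per-pixel scores for one landmark


def decode_landmarks(heatmaps: List[Heatmap]) -> List[Tuple[int, int]]:
    """Return the (x, y) position of the highest score in each heatmap.

    Hypothetical helper: a plain argmax decoder, the simplest way to turn
    a predicted heatmap into a landmark coordinate.
    """
    landmarks = []
    for heatmap in heatmaps:
        best_score = float("-inf")
        best_xy = (0, 0)
        for y, row in enumerate(heatmap):
            for x, score in enumerate(row):
                if score > best_score:
                    best_score = score
                    best_xy = (x, y)
        landmarks.append(best_xy)
    return landmarks


# Two toy 3x3 heatmaps, one per landmark (values are made up for illustration).
toy_heatmaps = [
    [[0.0, 0.1, 0.0],
     [0.2, 0.9, 0.1],
     [0.0, 0.1, 0.0]],
    [[0.0, 0.0, 0.0],
     [0.0, 0.1, 0.0],
     [0.0, 0.3, 0.8]],
]
print(decode_landmarks(toy_heatmaps))  # prints [(1, 1), (2, 2)]
```

In practice, a network would output one such heatmap per landmark at a higher resolution, and sub-pixel refinement is often applied on top of the plain argmax.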

Another neural network-based approach by Zhang et al. (2016) includes additional attributes in the training data, such as whether a person is smiling, to improve landmark prediction accuracy. The training process is carefully controlled. For example, a type of early stopping is applied to easy-to-infer attributes so that the network does not get stuck in a local minimum and is able to reach its optimal performance.
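
The idea of stopping individual attribute tasks early can be sketched as follows. This is a simplified illustration under stated assumptions, not the stopping criterion used by Zhang et al. (2016): each auxiliary attribute is dropped from the joint loss once its validation loss has failed to improve for a fixed number of epochs. The class `TaskEarlyStopper`, the function `joint_loss`, the `patience` parameter, and all loss values are hypothetical.

```python
from typing import Dict


class TaskEarlyStopper:
    """Flags one auxiliary task as stopped once its validation loss has
    not improved for `patience` consecutive epochs (assumed criterion)."""

    def __init__(self, patience: int = 2) -> None:
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0
        self.stopped = False

    def update(self, val_loss: float) -> None:
        if self.stopped:
            return
        if val_loss < self.best:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.stopped = True


def joint_loss(main_loss: float,
               aux_losses: Dict[str, float],
               stoppers: Dict[str, TaskEarlyStopper]) -> float:
    """Main landmark loss plus the losses of all still-active attributes."""
    total = main_loss
    for name, loss in aux_losses.items():
        if not stoppers[name].stopped:
            total += loss
    return total


# Made-up validation losses per epoch for two attribute tasks.
history = {
    "smiling": [0.50, 0.40, 0.41, 0.42, 0.43],  # plateaus early
    "pose": [0.90, 0.80, 0.70, 0.60, 0.50],     # keeps improving
}
stoppers = {name: TaskEarlyStopper(patience=2) for name in history}
for epoch in range(5):
    for name, losses in history.items():
        stoppers[name].update(losses[epoch])

print({name: s.stopped for name, s in stoppers.items()})
```

In this toy run the "smiling" task is stopped after its loss plateaus, so only the main loss and the "pose" loss contribute to later joint updates.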

Key Terms in this Chapter

Convolutional Neural Network: A type of artificial neural network used in image recognition and processing tasks, designed for processing pixel data.

Eye Tracking: The process of measuring the point of gaze.

Face Alignment: A computer vision task meant to identify geometric structures of human faces in digital images.

Interactive Application: An application that enhances user experience by putting the user in control.
