Improving the Supervised Learning of Activity Classifiers for Human Motion Data

Liyue Zhao (University of Central Florida, USA), Xi Wang (University of Central Florida, USA) and Gita Sukthankar (University of Central Florida, USA)
DOI: 10.4018/978-1-4666-3682-8.ch014


The ability to accurately recognize human activities from motion data is an important stepping-stone toward creating many types of intelligent user interfaces. Many supervised learning methods have been demonstrated for learning activity classifiers from data; however, these classifiers often fail due to noisy sensor data, lack of labeled training samples for rare actions and large individual differences in activity execution. In this chapter, the authors introduce two techniques for improving supervised learning of human activities from motion data: (1) an active learning framework to reduce the number of samples required to segment motion traces, and (2) an intelligent feature selection technique that both improves classification performance and reduces training time. They demonstrate how these techniques can be used to improve the classification of human household activities, an area of particular research interest since it facilitates the development of elder-care assistance systems to monitor household occupants.
Chapter Preview


Human activity recognition has become an increasingly important component of many domains such as user interfaces and video surveillance. In particular, enabling ubiquitous home user assistance systems for elder care requires rapid and robust recognition of human action from portable sensor data. Motion trajectories, gathered from video, inertial measurement units, or motion capture (mocap), are a critical cue for identifying activities that require gross body movement, such as walking, running, falling, or waving. Human motion data typically needs to be segmented into activities before an application can use it. A common processing pipeline for motion data is:

  1. Segment data into short time windows;

  2. Recognize low-level human activities from repetitive patterns of motion executed by the human user within a time window;

  3. Identify a high-level intention or plan from sequences of activities.

For instance, one possible high-level household activity is “baking pizza”, which consists of low-level activities such as “beating an egg” or “kneading dough” that can be recognized from the motion patterns and the objects manipulated.
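The first pipeline step, cutting a motion trace into short time windows, can be sketched as follows. This is only an illustration under stated assumptions: the window size, the step, the toy trace, and the two per-window features (mean and peak-to-peak range) are hypothetical choices, not values taken from the chapter.

```python
from typing import List, Sequence, Tuple


def sliding_windows(samples: Sequence[float], size: int, step: int) -> List[List[float]]:
    """Split a 1-D motion trace into fixed-length, possibly overlapping windows."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # Only full windows are kept; a trailing partial window is dropped.
    return [list(samples[i:i + size]) for i in range(0, len(samples) - size + 1, step)]


def window_features(window: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical per-window features: mean and peak-to-peak range."""
    mean = sum(window) / len(window)
    return (mean, max(window) - min(window))


# Toy one-dimensional trace (assumed data, for illustration only).
trace = [0.0, 0.1, 0.0, 0.9, 1.1, 1.0, 0.1, 0.0]
windows = sliding_windows(trace, size=4, step=2)
features = [window_features(w) for w in windows]
```

Each feature tuple could then be passed to a classifier for the second step (low-level activity recognition). Real sensor data would be multi-dimensional, so the same windowing would be applied per channel.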

Although domain knowledge and common-sense reasoning methods are important for reasoning about a person's high-level intentions, segmentation and activity classification have been successfully addressed by a variety of data-driven approaches, including supervised classifiers such as support vector machines, hidden Markov models, dynamic Bayes nets, and conditional random fields. In the best case, supervised learning can yield classifiers that are robust and accurate. However, two problems frequently occur in supervised learning settings:

  • Lack of Data: Gathering and labeling the data is time-consuming and expensive. In some cases, the activities are highly repetitive in nature (stirring), whereas other actions are infrequent and short in duration (opening the refrigerator). To classify these short actions, learning techniques need to be sample-efficient to leverage relatively small amounts of labeled training data.

  • Feature Selection: Sensors yield data that is both noisy and high-dimensional. Learning classifiers directly from the raw sensor data can be problematic, and applying arbitrary dimensionality reduction techniques does not always yield good results.

In this chapter, we present a case study of how we addressed these problems while performing segmentation and activity recognition of human household actions. First, we introduce an active learning method in which the classifier is initialized with training data from unsupervised segmentation and improved by soliciting labels for the unlabeled samples that lie closest to the classification hyperplane. We demonstrate that this method can be used to reduce the number of samples required to classify motion capture data using Support Vector Machine (SVM) classifiers.
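The query step of such an active learner, choosing the unlabeled samples nearest the decision boundary, might look like the sketch below. It assumes a linear SVM whose weight vector `w` and bias `b` have already been learned; the values of `w`, `b`, and `pool` are made up for illustration, and the function names are hypothetical rather than taken from the chapter.

```python
import math
from typing import List, Sequence


def margin_distance(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Euclidean distance from sample x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_queries(pool: Sequence[Sequence[float]], w: Sequence[float],
                   b: float, k: int) -> List[int]:
    """Return indices of the k unlabeled samples closest to the hyperplane."""
    ranked = sorted(range(len(pool)), key=lambda i: margin_distance(pool[i], w, b))
    return ranked[:k]


# Assumed linear classifier and unlabeled pool (illustrative values only).
w, b = [1.0, -1.0], 0.0
pool = [[2.0, 0.0], [0.5, 0.4], [0.0, 3.0], [1.0, 1.2]]
queries = select_queries(pool, w, b, k=2)  # indices to send to a human labeler
```

In a full loop, the selected samples would be labeled, added to the training set, and the SVM retrained before the next round of queries. With a kernel SVM, the absolute value of the decision function would take the place of the explicit distance used here.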

Second, we present a method to improve classification through intelligent feature selection. The signal data is converted into a set of motifs, approximately repeated symbolic subsequences, for each dimension of IMU data. These motifs leverage structure in the data and serve as the basis to generate a large candidate set of features from the multi-dimensional raw data. By measuring reductions in the conditional log-likelihood error of the training samples, we can select features and train a Conditional Random Field (CRF) classifier to recognize human actions.
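To make the idea of motifs concrete, the sketch below discretizes one signal dimension into symbols and counts repeated symbolic subsequences. It is a simplified stand-in, not the chapter's motif detection algorithm: equal-width binning, the three-letter alphabet, the motif length, and the toy signal are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def discretize(signal: Sequence[float], n_symbols: int = 3) -> str:
    """Map each value to a letter using equal-width bins over the signal's range."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_symbols or 1.0  # fall back to 1.0 for a constant signal
    return "".join(
        chr(ord("a") + min(int((v - lo) / width), n_symbols - 1)) for v in signal
    )


def find_motifs(symbols: str, length: int, min_count: int = 2) -> Dict[str, int]:
    """Count every symbolic subsequence of the given length that repeats."""
    counts = Counter(symbols[i:i + length] for i in range(len(symbols) - length + 1))
    return {motif: c for motif, c in counts.items() if c >= min_count}


# One dimension of a toy signal: three noisy repetitions of a rising pattern.
signal = [0.0, 0.5, 1.0, 0.1, 0.6, 0.9, 0.0, 0.4, 1.0]
symbols = discretize(signal)
motifs = find_motifs(symbols, length=3)
```

Because the raw values differ slightly between repetitions but fall into the same bins, exact matches on the symbol string correspond to approximate repeats in the original signal. Each discovered motif could then contribute candidate features, for example whether it occurs within a given time window.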

Our techniques were evaluated using the CMU Multimodal Activity database (De la Torre Frade et al., 2008), which was collected to facilitate the comparison of different techniques for recognizing household activities. The dataset contains video, audio, inertial measurement unit (IMU), and motion capture data; we demonstrate the utility of our techniques on segmenting motion capture data and recognizing activities from IMU data.



In this section, we give an overview of the concepts that our approach relies on: (1) active learning, (2) feature selection, and (3) motif detection. We also discuss in detail the operation of the conditional random field classifier.
