The Understanding of Spatial-Temporal Behaviors

Yu-Jin Zhang (Tsinghua University, China)
Copyright: © 2018 | Pages: 11
DOI: 10.4018/978-1-5225-2255-3.ch115


This chapter introduces a cutting-edge research field in computer vision and image understanding: spatial-temporal behavior understanding. It gives an overview of the field's main concepts, research focus, typical techniques, and rapid development in recent years. An important task in computer vision and image understanding is to analyze a scene by operating on images of that scene, so that the results can guide action. To do this, one needs to locate the objects in the scene and determine how their positions, attitudes, speeds, and mutual relationships change in space over time. In short, the goal is to grasp the actions in time and space, determine the purpose of those actions, and thus understand the semantic information they convey. This is referred to as the understanding of spatial-temporal behaviors.
Chapter Preview


Research efforts and results on this topic have appeared only in recent years; some statistics can be found in the survey on image engineering (Zhang, 2015c). The annual survey series of yearly bibliographies on image engineering started in 1995 and has been carried out for 21 years (Zhang, 2016). When the series entered its second decade (with the literature statistics for 2005), new hot spots had appeared in image engineering research and applications, so a new subcategory, C5: spatial-temporal technology (including 3-D motion analysis, gesture and posture detection, object tracking, and behavior judgment and understanding), was added to the image understanding category (C) (Zhang, 2006). The emphasis here is on the comprehensive use of the variety of information carried by images and video in order to interpret the dynamics of the scene and of the objects in it.

In the past eleven years, the number of publications belonging to subcategory C5 in the annual survey series has reached a total of 153. Category C has five subcategories, and the total number of publications in category C over these eleven years is 1352, so C5 (about 11% of the category) is still a small subcategory. The distribution of C5 publications by year is shown by the bars in Figure 1, where a third-order polynomial curve fitted to the yearly counts indicates the trend. Overall, this is still a relatively new field of research, so its development is not yet stable.

Figure 1. Some statistics of the numbers of publications for spatial-temporal technology
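The kind of third-order polynomial trend curve drawn over the bars in Figure 1 can be sketched as an ordinary least-squares fit. The yearly counts behind the figure are not reproduced on this page, so the sketch below runs on synthetic values generated from a known cubic; the function names `fit_polynomial` and `evaluate` are hypothetical, and nothing here should be read as the method actually used for the survey.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] of c0 + c1*x + ... + c_degree*x**degree.
    """
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = xs[i] ** j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Synthetic demonstration only: values generated from a known cubic,
# NOT the survey's publication counts.
years = list(range(11))  # year index 0..10 (eleven years)
counts = [2.0 + 1.5 * t - 0.4 * t ** 2 + 0.05 * t ** 3 for t in years]
coeffs = fit_polynomial(years, counts)
trend = [evaluate(coeffs, t) for t in years]
```

With real counts, the fitted curve smooths year-to-year fluctuation; with only eleven points, a cubic is best read as a rough trend indicator rather than a forecast.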


Main Focus of the Article

The definition, development, and stratification of spatial-temporal technology are provided first. Then, moving from low level to high level, the chapter introduces in turn the detection of points of interest, the formation of dynamic trajectories and activity paths, example techniques for action classification and recognition, and the modeling of actions and activities. Several further research directions are discussed before the concluding remarks.

Key Terms in this Chapter

Image Techniques: A collection of various branches of techniques for processing (such as acquiring, capturing, sensing, storing, enhancing, filtering, de-blurring, in-painting, transforming, coding, transmitting, and manipulating), analyzing (such as segmenting, representing, describing, featuring, measuring, classifying, and recognizing), and understanding (such as modeling, registering, matching, reconstructing, training, learning, reasoning, and interpreting) images.

Image Engineering (IE): An integrated discipline/subject comprising the study of all the different branches of image and video techniques. As a general term for all image techniques, it could be considered as a broad subject encompassing mathematics, physics, biology, physiology, psychology, electrical engineering, computer science, automation, etc. Its advances are also closely related to the development of telecommunications, biomedical engineering, remote sensing, document processing, industrial applications, etc.

Image Understanding (IU): One of the three layers of image engineering, which transforms data extracted from images into commonly understood descriptions, and makes subsequent decisions and actions according to the interpretation of the images.

Image Classification: Aims at associating different images with some semantic labels to represent the image contents abstractly. To achieve this goal, various machine learning and pattern recognition techniques could be used.

Clustering: A form of unsupervised learning and a powerful technique for pattern classification. It is a process of grouping, based on some defined criteria, two or more items together to form a larger collection. In the context of image segmentation, it is often considered the multi-dimensional extension of the thresholding technique.
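The remark that clustering extends thresholding to several dimensions can be illustrated with a small k-means sketch. This is one common clustering algorithm chosen here for illustration, not a method named by the chapter; the function `kmeans`, the sample gray levels, and the second feature are all hypothetical.

```python
def kmeans(points, centers, iterations=20):
    """Plain k-means on tuples of floats; `centers` holds the initial centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to the nearest center
        # (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))
            for point in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# 1-D case: two clusters of gray levels behave like a single threshold.
gray = [(10.0,), (12.0,), (15.0,), (200.0,), (210.0,), (205.0,)]
gray_labels, gray_centers = kmeans(gray, [(0.0,), (255.0,)])
threshold = (gray_centers[0][0] + gray_centers[1][0]) / 2  # midpoint of the centers

# 2-D case: same procedure, but each pixel has two features
# (gray level and a hypothetical second feature), which a scalar threshold cannot use.
features = [(10.0, 0.1), (12.0, 0.2), (200.0, 0.9), (210.0, 0.8)]
feature_labels, feature_centers = kmeans(features, [(0.0, 0.0), (255.0, 1.0)])
```

In the 1-D case the decision boundary is the midpoint between the two cluster centers, which is exactly a threshold; in higher dimensions the same assignment rule partitions the feature space instead of the gray-level axis.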

Deep Learning (DL): A new area of machine learning research. It uses a cascade of many layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. It also learns multiple levels of representations that correspond to different levels of abstraction. Deep learning algorithms are based on distributed representations. The composition of a layer of nonlinear processing units used in a deep learning algorithm depends on the problem to be solved.
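The idea of a cascade of nonlinear layers, each taking the previous layer's output as its input, can be sketched as a plain forward pass. The layer sizes, the tanh nonlinearity, and the hand-picked weights below are assumptions made purely for illustration; a real deep learning model would learn its weights from data.

```python
import math


def dense_layer(inputs, weights, biases):
    """One nonlinear processing layer: tanh(W x + b)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass `inputs` through the cascade and keep every intermediate representation."""
    activations = [list(inputs)]
    for weights, biases in layers:
        # Each successive layer consumes the output of the previous one.
        activations.append(dense_layer(activations[-1], weights, biases))
    return activations


# Hypothetical, hand-picked weights (not trained).
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # 3 inputs -> 2 units
    ([[1.0, -1.0]], [0.05]),                             # 2 units  -> 1 unit
]
activations = forward([1.0, 2.0, 3.0], layers)
```

The list `activations` holds the input followed by one representation per layer, which corresponds to the "multiple levels of representations" mentioned in the definition.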

Machine Learning (ML): A powerful tool for pattern classification. It uses the theory of statistics in building mathematical models, and programs computers to optimize a performance criterion using example data or past experience.
