Querying of Time Series for Big Data Analytics

Vasileios Zois (University of Southern California, USA), Charalampos Chelmis (University of Southern California, USA) and Viktor K. Prasanna (University of Southern California, USA)
DOI: 10.4018/978-1-4666-8767-7.ch013

Abstract

Time series data emerge naturally in many fields of applied science and engineering, including but not limited to statistics, signal processing, mathematical finance, and weather and power consumption forecasting. Although time series data have been well studied in the past, they still present a challenge to the scientific community. Advanced operations such as classification, segmentation, prediction, anomaly detection and motif discovery are widely used in machine learning and in other scientific fields. The advent of Big Data in almost every scientific domain motivates us to provide an in-depth study of the state-of-the-art techniques for efficient querying of time series. This chapter aims to provide a comprehensive review of the existing solutions related to time series representation, processing, indexing and querying operations.

Introduction

Time series data refer to a collection of data points which represent the evolution or behavior of a specific entity in time. Examples include, but are not limited to, consumption information from distinct customers on a power grid, stock price closing values (Bao, 2008), patient vital signs as monitored by special equipment and, more recently, tweets and blog posts. A time series is defined as a sequence of pairs $(v_i, t_i)$, where $v_i$ is a data point (value) and $t_i$ is the timestamp at which $v_i$ is recorded. Timestamps can be omitted for simplicity, in which case a time series object S is described by the vector $S = (s_1, s_2, \ldots, s_n)$, where $s_j$ is the observed value of the j-th time interval. Values are assumed to be presented in the same order as they are observed, so for $i < j$, $s_i$ appears before $s_j$ in the corresponding vector. The length of the time interval between consecutive values can be fixed or variable.

This definition refers to univariate time series (Chatfield, 2013). Multivariate time series (Box, Jenkins, & Reinsel, 2013), (Wang, Zhu, Li, Wan, & Zhang, 2014) refer to a sequence of observations with multiple values at every given point in time. Time series graphs refer to snapshots of evolving/temporal/dynamic graphs (Park, Priebe, & Youssef, 2013), (Wang, Tang, Park, & Priebe, 2014), (Yan & Eidenbenz, 2014). A collection of these snapshots constitutes the time series object. This chapter focuses on univariate and multivariate time series and the queries related to them. Graph time series are included for completeness and are discussed only at a high level, as part of the more recent developments in the field.

Differences in sampling rates can make it difficult to compare distinct time series objects. Interpolation is a standard preprocessing operation used to fill gaps between intervals induced either by incompatible sampling rates or by missing values. Interpolation techniques are not discussed in this chapter but are mentioned for completeness as part of time series workflows.
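
As a rough illustration of how interpolation can align two series with different sampling rates, the following minimal sketch resamples a coarsely sampled univariate series onto a finer grid using linear interpolation. The function name `interpolate_linear`, the sample data, and the choice of linear interpolation with clamping at the boundaries are assumptions made for this example; they are not taken from the chapter.

```python
from bisect import bisect_right


def interpolate_linear(timestamps, values, query_times):
    """Linearly interpolate a univariate series at the given query times.

    Assumes timestamps are strictly increasing. Query times outside the
    observed range are clamped to the first or last observed value
    (an illustrative choice, not the only reasonable one).
    """
    if not timestamps or len(timestamps) != len(values):
        raise ValueError("timestamps and values must be non-empty and equal in length")
    result = []
    for q in query_times:
        if q <= timestamps[0]:
            result.append(float(values[0]))
        elif q >= timestamps[-1]:
            result.append(float(values[-1]))
        else:
            hi = bisect_right(timestamps, q)  # first index whose timestamp is > q
            lo = hi - 1
            t0, t1 = timestamps[lo], timestamps[hi]
            v0, v1 = values[lo], values[hi]
            # Weighted average of the two neighbouring observations.
            result.append(v0 + (v1 - v0) * (q - t0) / (t1 - t0))
    return result


# Hypothetical series sampled every 2 time units, resampled onto a unit grid.
coarse_t = [0, 2, 4]
coarse_v = [10.0, 14.0, 12.0]
fine = interpolate_linear(coarse_t, coarse_v, [0, 1, 2, 3, 4])
print(fine)  # [10.0, 12.0, 14.0, 13.0, 12.0]
```

Once both series are expressed on the same grid, they can be compared value by value as vectors of equal length.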
Another common preprocessing step is time series normalization, which removes differences in offset and amplitude by subtracting the mean of the series from each value and dividing by the standard deviation (Loh, Kim, & Whang, 2000).
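
The normalization step described above can be sketched in a few lines. This is a minimal example, assuming the population standard deviation is used and that a constant series (zero standard deviation) maps to all zeros; both choices, along with the name `z_normalize` and the sample data, are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def z_normalize(values):
    """Return (v - mean) / std for every value v in the series.

    Uses the population standard deviation (an assumption; the sample
    standard deviation is also common). A constant series has zero
    standard deviation, so it is mapped to all zeros to avoid division
    by zero.
    """
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


# Two hypothetical series with the same shape but different amplitude.
small = z_normalize([2.0, 4.0, 6.0])
large = z_normalize([20.0, 40.0, 60.0])
print(small)
print(large)
```

After normalization the two series are the same up to floating-point error, which is why normalization is typically applied before comparing series whose shapes matter more than their absolute levels.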

Key Terms in this Chapter

Cyber-Physical System: A system consisting of computational components that are used to control and monitor physical entities.

Relational Database Management System (RDBMS): A database management system that is based on the concept of tables and primary keys to organize data and the relationships between them.

Query Language: A computer language that aims at providing factual answers to questions or providing information that is relevant to the corresponding area of inquiry.

Data Structure: A way of organizing data so that it can be efficiently accessed and updated.

Analytics: The discovery and transmission of patterns in data that are meaningful based on the context of data.

Big Data: Data that is collectively too large and complex to be analyzed with traditional data mining techniques.

Time Series: A sequence of data points consisting of consecutive measurements that are made over a time interval.
