Large Multivariate Time Series Forecasting: Survey on Methods and Scalability

Youssef Hmamouche (Aix-Marseille Université, France), Piotr Marian Przymus (Aix-Marseille Université, France), Hana Alouaoui (Aix-Marseille Université, France), Alain Casali (Aix-Marseille Université, France) and Lotfi Lakhal (Aix-Marseille Université, France)
Copyright: © 2019 |Pages: 28
DOI: 10.4018/978-1-5225-4963-5.ch006


Research on the analysis of time series has gained momentum in recent years, as knowledge derived from time series analysis can improve the decision-making process in industrial and scientific fields. Furthermore, time series analysis is often an essential part of business intelligence systems. With the growing interest in this topic, a novel set of challenges emerges. Utilizing forecasting models that can handle a large number of predictors is a popular approach that can improve results compared to univariate models. However, issues arise for high-dimensional data. Not all variables have a direct impact on the target variable, and adding unrelated variables may make the forecasts less accurate. Thus, the authors explore methods that can effectively deal with time series with many predictors. The authors discuss state-of-the-art methods for optimizing the selection, dimension reduction, and shrinkage of predictors. While similar research exists, it exclusively targets small and medium datasets; this research therefore aims to fill the knowledge gap in the context of big data applications.
Chapter Preview


Time series analysis and time series data mining aim to describe patterns and evolutions occurring in data over time. Among the many useful applications of time series data mining and analysis, time series forecasting is especially salient, as it contributes crucial information to corporate and/or institutional decision-making. Thus, unsurprisingly, it is often an important part of business intelligence (BI) systems, which allow a company to gather, store, access, and analyze corporate data to aid in decision-making.

In today’s information-driven world, countless numerical time series are generated by industry and researchers on any given day. Many applications (biology, medicine, finance, and industry, among others) require high-dimensional time series. Modern time series analysis systems are expected to process and store millions of such high-dimensional data points per minute, twenty-four hours a day, seven days a week, generating terabytes of logs. Needless to say, dealing with such voluminous datasets raises various new and interesting challenges.

The first models developed for time series forecasting were univariate models based on the auto-regression principle: historic observations of a series are used to forecast its future values. The most popular of these univariate models include the Auto-Regressive (AR), Auto-Regressive Moving Average (ARMA), and Auto-Regressive Integrated Moving Average (ARIMA) models (Box, 2013). Let us detail the ARIMA model, which takes three integer parameters (p, d, q), where p is the lag parameter of the auto-regressive part, d is the non-stationarity order of the time series, and q is the lag parameter of the moving average part.

This model allows for non-stationary time series. Consider a time series that is non-stationary of order d.

The ARIMA (p, d, q) model consists in applying the ARMA (p, q) model after transforming the time series to a stationary one by differencing it d times, where d is the order of integration or non-stationarity. The ARMA (p, q) model expresses a stationary time series y(t) according to the q last error terms and the p past observations. It can be expressed as follows:

y(t) = c + Σ_{i=1}^{p} α_i y(t−i) + Σ_{j=1}^{q} β_j ε(t−j) + ε(t),

where ε(t) are the error terms, and α_i and β_j are the parameters of the model.
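To make the two steps above concrete, here is a minimal pure-Python sketch (not from the chapter) of d-times differencing and a one-step-ahead ARMA(p, q) forecast. The function names and the coefficient values in the usage example are illustrative, not estimated from any dataset.

```python
def difference(y, d=1):
    """Difference a series d times: each pass replaces y(t) with y(t) - y(t-1)."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y

def arma_forecast(y, errors, alpha, beta, c=0.0):
    """One-step-ahead ARMA(p, q) forecast:
    y_hat = c + sum_{i=1..p} alpha_i * y(t-i) + sum_{j=1..q} beta_j * e(t-j),
    with p = len(alpha), q = len(beta).
    `y` and `errors` are ordered oldest-to-newest."""
    ar_part = sum(a * y[-i] for i, a in enumerate(alpha, start=1))
    ma_part = sum(b * errors[-j] for j, b in enumerate(beta, start=1))
    return c + ar_part + ma_part

# Usage: twice-differencing a quadratic-growth series yields a constant one,
# illustrating why d passes of differencing remove order-d non-stationarity.
diffed = difference([1, 3, 6, 10], d=2)          # [1, 1]
y_hat = arma_forecast([1.0, 2.0], [0.5], alpha=[0.6], beta=[0.3])
```

In practice the α and β coefficients would be estimated (e.g., by maximum likelihood) rather than supplied by hand; the sketch only shows the forecast recursion itself.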

Despite their advantages, univariate forecasting approaches have some drawbacks: for one, they do not take into account potentially exploitable information from other time series in the same dataset.

Therefore, multivariate forecasting models, which incorporate such data into their analysis, were developed. These include the Vector Auto-Regressive (VAR) model, an extended version of the AR model, and the co-integrated VAR model, namely the Vector Error Correction Model (VECM) (Johansen, 1991). The principle underlying multivariate forecasting models is that the value of a given variable often depends on past values of itself and of other related variables. Such models are still popular today and are used independently or in combination with other new techniques, for instance, artificial neural network non-linear modeling (Thielbar & Dickey, 2011).
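As an illustration of this principle, a one-step VAR(1) forecast makes each variable's next value a linear combination of the previous values of all variables. The sketch below uses a hand-picked 2×2 coefficient matrix for illustration; in a real VAR these coefficients would be estimated from the data, typically by least squares.

```python
import numpy as np

def var1_forecast(y_prev, A, c):
    """One-step VAR(1) forecast: y(t) = c + A @ y(t-1).
    Row i of A holds the weights of every variable's past value
    in the forecast of variable i."""
    return c + A @ y_prev

# Illustrative (not estimated) coefficients for a 2-variable system.
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])
c = np.array([0.2, -0.1])
y_prev = np.array([1.0, 2.0])
y_next = var1_forecast(y_prev, A, c)
```

Note that the first variable's forecast depends on both past values (non-zero row [0.5, 0.1]), which is exactly the cross-series information a univariate AR model would discard.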

Despite significant developments in multivariate forecast modeling, two main issues still appear when dealing with highly dimensional data: (i) how to select predictors for a target variable; (ii) how to assess whether the quantity of variables used has affected the accuracy of predictions. Both problems have been examined by researchers in recent years, yielding original findings and novel approaches. Namely, recent literature (J. H. Stock & Watson, 2012), (Jiang, Athanasopoulos, Hyndman, Panagiotelis, & Vahid, 2017) suggests that, for econometric data specifically, the performance of these methods depends on the dataset and the target variable. Thus, it is advisable to evaluate a wide variety of methods before making a final decision on the model to use.

There is, however, an upper limit on the number of variables that existing multivariate models can exploit to improve forecasting quality.
