A Comparison of Deep Learning Models in Time Series Forecasting of Web Traffic Data From Kaggle

Bingnan Wang, Dickson K. W. Chiu, Kevin K. W. Ho
DOI: 10.4018/978-1-7998-9016-4.ch014

Abstract

In recent years, time series forecasting has attracted increasing attention from academia and industry. This research used raw data from the “Web Traffic Forecasting” competition on the Kaggle platform to test the prediction accuracy of different time series models, especially the generalization performance of various deep learning models. The experiments used historical traffic data for 145,063 Wikipedia pages from 2015-07-01 to 2017-11-13. Traffic data from 2015-07-01 to 2017-09-10 was used to forecast traffic from 2017-09-13 to 2017-11-13, a total of 62 days. The experimental results showed that almost all deep learning models predicted far more accurately than the statistical and machine learning models, suggesting that deep learning models have great potential for time series forecasting problems.

Introduction

Time series forecasting refers to using historical data from a given period to predict future values. Its application scenarios are rich, including quantitative trading, macroeconomic trend forecasting, merchandise sales forecasting in e-commerce, store sales forecasting in the retail industry, and electricity forecasting for power supply. The input and output format of time series forecasting is similar to that of regression problems in supervised learning, but there is one fundamental difference: time series data usually has time dependence (Robinson & Sims, 1994). For a batch of input data of the same size, if the order of the data is changed, the patterns and relations in the data are likely to change substantially, and the models’ predictions may be completely different as well. Other regression problems may not have this characteristic.
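
The role of ordering can be illustrated with a small sketch. The synthetic series below (a hypothetical linear trend plus a weekend bump, not the Kaggle data) and the helper `lag1_autocorr` are assumptions introduced only for illustration. Shuffling leaves the set of values unchanged but removes most of the lag-1 autocorrelation, which is the kind of temporal structure a forecasting model relies on.

```python
import random


def lag1_autocorr(xs):
    """Lag-1 autocorrelation: how strongly each value tracks the previous one."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, n))
    return cov / var


# Hypothetical daily series: slow upward trend plus a bump on two days per week.
series = [100 + 0.5 * t + (20 if t % 7 in (5, 6) else 0) for t in range(140)]

# Same values, different order (fixed seed so the sketch is repeatable).
shuffled = series[:]
random.Random(0).shuffle(shuffled)

ordered_acf = lag1_autocorr(series)
shuffled_acf = lag1_autocorr(shuffled)

print(f"lag-1 autocorrelation, original order: {ordered_acf:.2f}")
print(f"lag-1 autocorrelation, shuffled order: {shuffled_acf:.2f}")
```

An ordinary tabular regression dataset would typically be unaffected by such a row shuffle, whereas here the dependence between consecutive observations is largely destroyed.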

Although there are many successful deep learning applications in time series forecasting in academia, many industrial and commercial applications still use traditional statistical models or machine learning models to deal with time series forecasting problems. The industrial and commercial sectors are still in the exploratory stage of applying deep learning technology to time series forecasting. The main reasons are:

  1. Most time series forecasting studies in academia test complex deep learning models on very small datasets, which is of little significance for evaluating the actual performance of the models. Few studies have used industrial-grade datasets to test these models’ performance. Thus, conclusions drawn from such experiments are not very convincing;

  2. Compared with mature fields, such as natural language processing, computer vision, and recommendation systems, the research field of time series forecasting is relatively niche. This means that the industry lacks related expertise; and

  3. At present, the industry that can generate the greatest benefits from time series forecasting is mainly the financial industry, especially quantitative trading. However, due to the unique nature of this industry, the results and details of such applications are rarely disclosed.

Thus, the objectives of this chapter are:

  1. Comparing the performance of statistical models, machine learning models, and different deep learning models on the same data set; and

  2. Comparing in detail the performance of different deep learning models on the same data set, and summarizing the advantages and disadvantages of the different deep learning algorithms.

Time series forecasting has many important applications in commercial and industrial sectors. However, compared to popular fields, such as natural language processing, recommendation systems, and computer vision, time series forecasting is a relatively niche research field. So far, few articles have summarized the most valuable methods, i.e., deep learning algorithms for time series forecasting. Therefore, to fill this gap, this chapter compares in detail the effects of different deep learning algorithms on the same real-life industrial data set and evaluates their performance. These algorithms are representative, popular, and used as solutions to time series forecasting problems in industry and competitions, and they have great potential commercial value. This chapter provides a reference for various industries seeking to solve practical time series forecasting problems in the real world.

Key Terms in this Chapter

Direct Forecasting Strategy: Requires developing a separate model for each forecast time step, thus avoiding error accumulation in each step.

Additive Trend Components: The time series contains a seasonal component, and the seasonal component combines additively with the other components of the time series.

Deep Learning: Imitates the way humans gain knowledge for automating predictive analytics and can be abstracted as a combination of the physical structure of a multi-layer neural network with activation functions, objective functions, optimization algorithms, and various auxiliary functions.

Multiplicative Trend Components: The time series contains a seasonal component, and the seasonal component combines multiplicatively with the other components of the time series.

Hybrid Forecasting Strategy: Combines the recursive and direct forecasting strategies so that static or dynamic time-invariant features help increase forecasting accuracy while, as in the direct forecasting strategy, error accumulation is avoided.

Time Series Forecasting: Using historical data in a period to predict the future. The input and output format of time series forecasting is similar to regression problems in supervised learning, but with time dependence, i.e., if the order of the data is changed, all data patterns and relations are likely to undergo huge changes affecting the prediction results.

Recursive Forecasting Strategy: Applies a one-step model multiple times, with the prediction for each time step fed in as input for predicting the next time step.
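
The difference between the recursive and direct forecasting strategies defined above can be sketched in a few lines. This is a minimal illustration, not the chapter's method: it assumes a single-feature least-squares line as a stand-in for any forecasting model, and the names `fit_line`, `recursive_forecast`, and `direct_forecast` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def recursive_forecast(history, horizon):
    """One one-step model, applied repeatedly on its own predictions."""
    a, b = fit_line(history[:-1], history[1:])  # learn y[t] from y[t-1]
    preds, last = [], history[-1]
    for _ in range(horizon):
        last = a + b * last  # the prediction becomes the next input
        preds.append(last)
    return preds


def direct_forecast(history, horizon):
    """A separate model per forecast step, each fed only observed data."""
    preds = []
    for h in range(1, horizon + 1):
        a, b = fit_line(history[:-h], history[h:])  # learn y[t+h] from y[t]
        preds.append(a + b * history[-1])
    return preds


# Hypothetical noise-free series that grows by 2 per step; the last value is 68.
history = [10.0 + 2.0 * t for t in range(30)]
recursive = recursive_forecast(history, horizon=3)
direct = direct_forecast(history, horizon=3)
print(recursive, direct)
```

On this noise-free linear series both strategies give the same three forecasts (70, 72, 74, up to floating-point rounding). On noisy data they generally diverge: the recursive version compounds its own one-step errors, while the direct version has to fit one model per horizon.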
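
The additive and multiplicative components defined above can also be contrasted with a short sketch. The linear trend, the weekly sine-shaped seasonality, and all function names and parameter values below are assumptions chosen for illustration only. In the additive case the seasonal swing stays constant as the level rises, whereas in the multiplicative case it scales with the level.

```python
import math


def trend(t):
    """Hypothetical linear trend."""
    return 100.0 + 2.0 * t


def additive(t, period=7, amplitude=10.0):
    # y_t = trend_t + seasonal_t
    return trend(t) + amplitude * math.sin(2 * math.pi * t / period)


def multiplicative(t, period=7, rel_amplitude=0.1):
    # y_t = trend_t * seasonal_t, with the seasonal factor oscillating around 1
    return trend(t) * (1.0 + rel_amplitude * math.sin(2 * math.pi * t / period))


def seasonal_swing(model, start, period=7):
    """Peak-to-trough range of the detrended values over one period."""
    detrended = [model(t) - trend(t) for t in range(start, start + period)]
    return max(detrended) - min(detrended)


# Compare the first week with a week ten periods later, when the level is higher.
add_early, add_late = seasonal_swing(additive, 0), seasonal_swing(additive, 70)
mul_early, mul_late = seasonal_swing(multiplicative, 0), seasonal_swing(multiplicative, 70)
print(f"additive swing:       {add_early:.2f} -> {add_late:.2f}")
print(f"multiplicative swing: {mul_early:.2f} -> {mul_late:.2f}")
```

A common practical consequence is that a multiplicative series can often be modeled additively after a log transform, since the logarithm of a product is a sum.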
