Time Series Data Mining: A Retail Application

Daniel Hebert (Market Analyst, Rogers Corporation, Woodstock, CT, USA), Billie Anderson (Department of Mathematics, Bryant University, Smithfield, RI, USA), Alan Olinsky (Department of Mathematics, Bryant University, Smithfield, RI, USA) and J. Michael Hardin (Dean, Culverhouse College of Commerce and Business Administration, University of Alabama, Tuscaloosa, AL, USA)
Copyright: © 2014 | Pages: 18
DOI: 10.4018/ijban.2014100104

Abstract

Modern technologies allow data to be amassed at an unprecedented rate, and organizations can now routinely collect and process massive volumes of it. Much of this regularly collected information can be ordered using an appropriate time interval, which turns it into a time series. Time series data mining methodology identifies commonalities between sets of time-ordered data, detecting similar time series using a technique known as dynamic time warping (DTW). This research provides a practical application of time series data mining. A real-world data set was provided to the authors by dunnhumby. A time series data mining analysis is performed on data from a retail grocery store chain, and the results are presented.

1. Introduction

Businesses are collecting data at an unprecedented rate through web sources, cellular phones and social media. The growth of internet businesses has led to a whole new scale of data processing challenges. Companies like Google, Facebook, Yahoo, Twitter, and Amazon now routinely collect and process hundreds to thousands of terabytes of data daily. The growing availability of retail data likewise represents a significant change in the volume of data that can be processed.

In retail environments, information about sales transactions is routinely recorded. Each purchase transaction exists as an observation within a data set, and each record carries the date and time of the transaction. Such time-ordered data presents new challenges for businesses. Sequential data collected over a period of time is referred to as a time series. When faced with large data sets that contain time series, organizations must employ new and sophisticated techniques for analysis. Traditional data mining methods and statistical techniques are often inappropriate for data that possess a time factor. This dilemma has led to the development and rising importance of time series data mining techniques. To address the increasing prevalence of time-ordered data, researchers and analysts have begun seeking analysis methods that account for information collected over time. To support informed decision making and gain useful knowledge, organizations are searching for novel means of understanding and interpreting time series.

Time series analysis consists of methods for analyzing time series data to extract meaningful information. Time series data mining combines data mining techniques with time series analysis to:

  • Conduct similarity analysis of time series data to validate the forecasting of new products.

  • Gain a deeper understanding of key markets of interest and notable buying behaviors.

  • Identify products that share similar purchase rates.

  • Improve predictive modeling capabilities.

  • Achieve better customer retention and satisfaction.

  • Enhance the goods consumption ratio.
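The similarity analysis in the first item above is typically built on a distance between two series, and the abstract names dynamic time warping (DTW) for that purpose. What follows is a minimal sketch of the textbook DTW recurrence, not the authors' implementation. The function name `dtw_distance`, the absolute-difference local cost, and the three short sales series are all assumptions made up for illustration.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Runs in O(len(a) * len(b)) time; the local cost is the absolute difference.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical weekly unit sales (made-up numbers, not from the dunnhumby data).
product_a = [1, 2, 3, 4, 3, 2, 1]
product_b = [1, 1, 2, 3, 4, 3, 2, 1]  # same shape as product_a, one week late
product_c = [4, 4, 4, 4, 4, 4, 4]     # flat demand

print(dtw_distance(product_a, product_b))
print(dtw_distance(product_a, product_c))
```

Because DTW may stretch or compress the time axis, the delayed copy `product_b` is scored as much closer to `product_a` than the flat series `product_c` is, even though the two series differ in length. A point-by-point comparison would not handle that case directly.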

Time series analysis is traditionally concerned with identifying patterns in the data, such as trend and seasonality, and with forecasting. Data mining is the process of detecting hidden relationships and patterns in very large data sets. It focuses on developing predictive models and using nonparametric techniques such as clustering, and it includes methods that attempt to automate the scientific discovery process. What distinguishes data mining is the type of problem it faces: large data sets that contain complex and hidden relationships.

The difference between traditional time series analysis and time series data mining is the sheer amount of data involved. In time series data mining, the number of time series available to analyze is so large that traditional time series methods are not feasible (Liu, Bhattacharyya, Sclove, Chen, & Lattyak, 2011).

The quantity of time series is not the only reason that traditional time series methods are inapplicable in certain situations. Traditional methods such as the Box-Jenkins approach and the Autoregressive Integrated Moving Average (ARIMA) model assume that the time series is stationary; that is, the statistical properties of the series, such as its mean and variance, remain constant through time (Box & Jenkins, 1976; Pandit & Wu, 1983). For real-world time series such as customer purchases or stock market movements, stationarity is not a realistic assumption. The ARIMA model also assumes that the system generating the time series is linear; that is, the system can be described by linear difference equations (Gabel & Roberts, 1980). In practice, the system generating the time series is often not linear.
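To make the stationarity assumption concrete, the following minimal sketch generates a synthetic sales series with an upward trend and compares the mean of its first and second halves. The data, the helper names `mean` and `difference`, and the half-versus-half comparison are all assumptions for illustration; this is a crude check, not a formal stationarity test.

```python
import random


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def difference(xs):
    """First difference of a series: x[t] - x[t-1]."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


random.seed(0)
# Synthetic weekly sales over two years: linear trend plus Gaussian noise.
sales = [100 + 2 * t + random.gauss(0, 1) for t in range(104)]

half = len(sales) // 2
# The mean drifts upward over time, so the raw series is not stationary.
print(mean(sales[:half]), mean(sales[half:]))

diffs = difference(sales)
h = len(diffs) // 2
# After differencing, both halves have a mean close to the trend slope of 2.
print(mean(diffs[:h]), mean(diffs[h:]))
```

First differencing is the operation behind the "Integrated" term in ARIMA. It removes a linear trend like the one simulated here, but it does not address every form of non-stationarity, such as a variance that changes over time.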
