A Fast Shapelet Discovery Algorithm Based on Important Data Points

Cun Ji (School of Computer Science and Technology, Shandong University, Jinan, China), Chao Zhao (School of Computer Science and Technology, Shandong University, Jinan, China), Li Pan (School of Computer Science and Technology, Shandong University, Jinan, China & Engineering Research Center of Digital Media Technology, Ministry of Education, Jinan, China), Shijun Liu (School of Computer Science and Technology, Shandong University, Jinan, China & Engineering Research Center of Digital Media Technology, Ministry of Education, Jinan, China), Chenglei Yang (School of Computer Science and Technology, Shandong University, Jinan, China & Engineering Research Center of Digital Media Technology, Ministry of Education, Jinan, China) and Lei Wu (School of Computer Science and Technology, Shandong University, Jinan, China & Engineering Research Center of Digital Media Technology, Ministry of Education, Jinan, China)
Copyright: © 2017 | Pages: 14
DOI: 10.4018/IJWSR.2017040104

Abstract

Time series classification (TSC) has attracted significant interest over the past decade. A shapelet is a subsequence of a time series that captures the characteristics of a class. A classifier based on shapelets is interpretable and can be more accurate and faster than competing classifiers. However, discovering shapelets takes an enormous amount of time. This article proposes a fast shapelet (FS) discovery algorithm based on important data points (IDPs). First, the algorithm identifies IDPs. Next, subsequences containing one or more IDPs are selected as candidate shapelets. Finally, the best shapelets are selected from the candidates. Results show that the proposed algorithm reduces the shapelet discovery time by approximately 14.0% while maintaining the same level of classification accuracy.

1. Introduction

The internet of things (IoT) is made up of small sensors and actuators embedded in objects with internet access. It plays a key role in solving challenges faced in today’s society (Nunes et al., 2016; Sampaio, Lima, Mendonça, & Filho, 2013). Sensors in an IoT system collect observations at equal intervals, so the data collected in an IoT system is time series data.

A time series is a sequence of data points indexed in time order (either listed or graphed). The values are obtained through sequential measurements over time. A time series is characterized by its numerical and continuous nature (Esling & Agon, 2012; Fu, 2011).

A time series is considered as a whole rather than as a set of individual numerical fields. Its high dimensionality, high feature correlation, and typically high levels of noise make time series classification (TSC) an interesting research problem (Gabr & Fatehy, 2013; Keogh & Kasetty, 2003; Ye & Keogh, 2009). Effective TSC has been an important research problem for both academic researchers and industry practitioners.

In TSC, an unlabeled time series is assigned to one of at least two predefined classes (Keogh & Kasetty, 2003). TSC arises in many real-world fields, including electrocardiogram classification, fault detection and identification of physical systems, automotive preventive diagnosis, gesture recognition, alarm interpretation of telecommunication networks, data sensor analysis, speaker identification and/or authentication, and aerospace health monitoring (Prieto, Alonso-González, & Rodríguez, 2015).

Many classification algorithms can be applied to time series (for example, classification trees, nearest neighbors, discriminant analysis, and iterative classification). Empirical evidence strongly suggests that classification based on time series shapelets outperforms many of these algorithms (He, Dong, Zhuang, Shang, & Shi, 2012). Shapelets are discriminative subsequences with the property that the minimum distance between a shapelet and a time series is a good predictor of the class of that time series (Wistuba, Grabocka, & Schmidt-Thieme, 2015). Algorithms based on shapelets are interpretable, more accurate, and faster than state-of-the-art classifiers (Mueen, Keogh, & Young, 2011; Ye & Keogh, 2009, 2011).
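The minimum distance between a shapelet and a time series can be illustrated with a short sketch. The function name `subsequence_distance` and the sample data are hypothetical, and the plain Euclidean distance without normalization is an assumption made for brevity; practical implementations often normalize each window first.

```python
import math


def subsequence_distance(series, shapelet):
    """Minimum Euclidean distance between `shapelet` and any
    equal-length window of `series` (no normalization; an assumption)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide the shapelet over every window of the same length.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


# Hypothetical data: series_a contains the peak, series_b does not.
series_a = [0.0, 0.1, 1.0, 2.0, 1.0, 0.0]
series_b = [0.0, 0.1, 0.0, 0.1, 0.0, 0.1]
shapelet = [1.0, 2.0, 1.0]

dist_a = subsequence_distance(series_a, shapelet)
dist_b = subsequence_distance(series_b, shapelet)
print(dist_a, dist_b)
```

A small distance for `series_a` and a large one for `series_b` is what makes such a subsequence useful as a classification feature: a threshold on the distance separates the two series.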

However, the time complexity of the shapelet selection process is high, even though shapelets are computed offline (Rakthanmanon & Keogh, 2013).

To address this cost, an FS discovery algorithm based on IDPs (FS-IDPs) is proposed. First, the algorithm identifies IDPs. Next, only subsequences containing one or more IDPs are selected as candidate shapelets. Finally, the best shapelets are selected from the candidates. Restricting candidates to subsequences that contain IDPs reduces the number of shapelet candidates, which in turn shortens shapelet discovery.
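The candidate-reduction idea can be sketched as follows. This excerpt does not define how IDPs are identified, so the sketch assumes, purely for illustration, that an IDP is a strict interior local extremum; `find_idps` and `candidate_windows` are hypothetical names, not the authors' implementation. The final step of scoring candidates and selecting the best shapelets is omitted.

```python
def find_idps(series):
    """ASSUMPTION: treat strict interior local extrema as IDPs.
    The article's actual IDP definition is not given in this excerpt."""
    idps = []
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            idps.append(i)
    return idps


def candidate_windows(series, length, idps):
    """Return (start, end) index pairs of windows of `length`
    that contain at least one IDP; all other windows are skipped."""
    idp_set = set(idps)
    windows = []
    for start in range(len(series) - length + 1):
        if any(i in idp_set for i in range(start, start + length)):
            windows.append((start, start + length))
    return windows


# Hypothetical series: mostly flat with a single peak.
series = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
length = 3

idps = find_idps(series)
kept = candidate_windows(series, length, idps)
total = len(series) - length + 1  # windows an exhaustive search would examine

print(idps, kept, total)
```

In this toy example only the windows overlapping the peak survive, so far fewer candidates reach the expensive quality evaluation than in an exhaustive search over all windows.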

In this article, the authors propose FS-IDPs, which uses IDPs to speed up the shapelet discovery process. Comparison experiments among different shapelet discovery algorithms show that FS-IDPs speeds up shapelet discovery while maintaining the same level of classification accuracy. The remainder of the article introduces definitions, summarizes related work, presents the proposed FS-IDPs algorithm, describes the experiments and their results, and presents a conclusion.
