A Fast Shapelet Discovery Algorithm Based on Important Data Points

Cun Ji, Chao Zhao, Li Pan, Shijun Liu, Chenglei Yang, Lei Wu
Copyright: © 2017 |Pages: 14
DOI: 10.4018/IJWSR.2017040104

Abstract

Time series classification (TSC) has attracted significant interest over the past decade. A shapelet is a fragment of a time series that represents the characteristics of its class. Shapelet-based classifiers are interpretable and have been reported to be more accurate and faster than other classifiers. However, finding shapelets takes an enormous amount of time. This article proposes a fast shapelet (FS) discovery algorithm based on important data points (IDPs). First, the algorithm identifies IDPs. Next, subsequences containing one or more IDPs are selected as candidate shapelets. Finally, the best shapelets are selected from the candidates. Results show that the proposed algorithm reduces the shapelet discovery time by approximately 14.0% while maintaining the same level of classification accuracy.

1. Introduction

The internet of things (IoT) is made up of small sensors and actuators embedded in objects with internet access. It plays a key role in solving challenges faced in today’s society (Nunes et al., 2016; Sampaio, Lima, Mendonça, & Filho, 2013). Sensors in an IoT system collect observations at equal intervals, so the data collected in an IoT system is time series data.

A time series is a sequence of data points indexed in time order, obtained through sequential measurements over time. It is characterized by its numerical and continuous nature (Esling & Agon, 2012; Fu, 2011).

A time series is usually treated as a whole rather than as a set of individual numerical fields. Its high dimensionality, high feature correlation, and typically high levels of noise make it an interesting research problem (Gabr & Fatehy, 2013; Keogh & Kasetty, 2003; Ye & Keogh, 2009). Effective TSC has therefore been an important research problem for both academic researchers and industry practitioners.

In TSC, an unlabeled time series is assigned to one of two or more predefined classes (Keogh & Kasetty, 2003). TSC arises in many real-world fields, including electrocardiogram classification, fault detection and identification of physical systems, automotive preventive diagnosis, gesture recognition, alarm interpretation of telecommunication networks, data sensor analysis, speaker identification and/or authentication, and aerospace health monitoring (Prieto, Alonso-González, & Rodríguez, 2015).

Many classification algorithms can be applied to time series (for example, classification trees, nearest neighbors, discriminant analysis, and iterative classification). Empirical evidence strongly suggests that classification based on time series shapelets can outperform many of these algorithms (He, Dong, Zhuang, Shang, & Shi, 2012). Shapelets are discriminative subsequences with the property that the minimum distance between a shapelet and a time series is a good predictor for TSC (Wistuba, Grabocka, & Schmidt-Thieme, 2015). Algorithms based on shapelets are interpretable, more accurate, and faster than state-of-the-art classifiers (Mueen, Keogh, & Young, 2011; Ye & Keogh, 2009, 2011).
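The minimum distance between a shapelet and a time series can be illustrated with a short sketch. It assumes plain Euclidean distance over sliding windows and omits any normalization a real implementation might apply; the function name `subsequence_distance` is hypothetical and not taken from the article.

```python
import math


def subsequence_distance(series, shapelet):
    """Return the smallest Euclidean distance between `shapelet` and any
    window of `series` that has the same length as the shapelet.

    Assumption: raw (unnormalized) Euclidean distance.
    """
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")

    best_sq = math.inf  # smallest squared distance seen so far
    for start in range(len(series) - m + 1):
        sq = 0.0
        for a, b in zip(series[start:start + m], shapelet):
            sq += (a - b) ** 2
            if sq >= best_sq:
                # Early abandon: this window cannot beat the current best.
                break
        best_sq = min(best_sq, sq)
    return math.sqrt(best_sq)
```

A series that contains the shapelet exactly yields a distance of zero, while a series that never comes close to the shapelet's shape yields a larger value. Thresholding this single number is what makes it usable as a classification feature.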

Despite these advantages, shapelet discovery is expensive: although shapelets are computed offline, the time complexity of the shapelet selection process is high (Rakthanmanon & Keogh, 2013).

To reduce this cost, an FS discovery algorithm based on IDPs (FS-IDPs) is proposed. First, the algorithm identifies IDPs. Next, only subsequences containing one or more IDPs are selected as candidate shapelets. Finally, the best shapelets are selected from the candidates. Restricting candidates to subsequences that contain IDPs reduces the number of shapelet candidates, which in turn shortens shapelet discovery.
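The candidate-pruning step can be sketched as follows. This preview does not state how IDPs are identified, so the sketch substitutes a simple strict local-extremum test purely as a placeholder assumption; `find_idps` and `candidate_windows` are hypothetical names, and neither function should be read as the authors' method.

```python
def find_idps(series):
    """Placeholder IDP detector (an assumption, not the article's rule):
    return indices where the series strictly changes direction."""
    idps = []
    for i in range(1, len(series) - 1):
        left = series[i] - series[i - 1]
        right = series[i + 1] - series[i]
        if left * right < 0:  # rising then falling, or falling then rising
            idps.append(i)
    return idps


def candidate_windows(series_len, idps, min_len, max_len):
    """Yield half-open windows (start, end) whose length lies in
    [min_len, max_len] and that contain at least one IDP index."""
    for win_len in range(min_len, max_len + 1):
        for start in range(series_len - win_len + 1):
            end = start + win_len
            if any(start <= p < end for p in idps):
                yield (start, end)


# Usage: a flat series with a single spike at index 4.
series = [0, 0, 0, 0, 3, 0, 0, 0, 0, 0]
idps = find_idps(series)
kept = list(candidate_windows(len(series), idps, 3, 3))
total = len(series) - 3 + 1  # all length-3 windows, before pruning
```

In this toy series only three of the eight length-3 windows contain the spike, so only those three would be evaluated as candidate shapelets. How much pruning occurs on real data depends entirely on how IDPs are defined and how densely they occur.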

This article proposes the FS-IDPs algorithm, which uses IDPs to speed up the shapelet discovery process. Comparison experiments among different shapelet discovery algorithms show that it speeds up shapelet discovery while maintaining the same level of classification accuracy. The remainder of the article introduces definitions, summarizes related work, and presents the proposed FS-IDPs algorithm. It then describes the experiments, shows the results, and presents a conclusion.
