Introduction of Item Constraints to Discover Characteristic Sequential Patterns

Shigeaki Sakurai (Toshiba Digital Solutions Corporation, Japan)
Copyright: © 2019 | Pages: 14
DOI: 10.4018/978-1-5225-5516-2.ch011

Abstract

This chapter introduces a method that discovers characteristic sequential patterns from sequential data based on background knowledge. The sequential data is composed of rows of items. The chapter focuses on sequential data derived from tabular structured data; that is, each item is composed of an attribute and an attribute value. It also focuses on item constraints as a means of describing the background knowledge. The constraints describe the combinations of items included in sequential patterns and can represent the interests of analysts. Therefore, the method can easily discover sequential patterns that coincide with the interests of the analysts as characteristic sequential patterns. In addition, this chapter focuses on a special case of the item constraints, in which the constraint is placed on the last item of the sequential patterns. The discovered patterns are used for the analysis of causes and reasons, and can predict the last item when a sub-sequence is given. This chapter introduces the property of the item constraints for the last item.
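As a rough illustration of the idea summarized above, the following Python sketch treats each item as an (attribute, attribute value) pair and checks whether a candidate pattern ends in an item that the analyst has marked as interesting. It is not the chapter's algorithm: the function names, the toy data, and the simplification that each position in a sequence holds exactly one item are all assumptions made for this example.

```python
from typing import List, Sequence, Set, Tuple

# An item is an (attribute, attribute value) pair, as in the abstract.
Item = Tuple[str, str]


def is_subsequence(pattern: Sequence[Item], sequence: Sequence[Item]) -> bool:
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator up to the first match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in pattern)


def satisfies_last_item_constraint(pattern: Sequence[Item],
                                   allowed_last: Set[Item]) -> bool:
    """Hypothetical constraint check: the pattern must end in an allowed item."""
    return len(pattern) > 0 and pattern[-1] in allowed_last


def support(pattern: Sequence[Item], sequences: List[Sequence[Item]]) -> float:
    """Fraction of sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


# Toy data (invented for illustration only).
sequences = [
    [("symptom", "cough"), ("test", "x-ray"), ("diagnosis", "flu")],
    [("symptom", "cough"), ("diagnosis", "flu")],
    [("symptom", "fever"), ("test", "x-ray"), ("diagnosis", "cold")],
]
# The analyst is only interested in patterns that end in a diagnosis.
allowed_last = {("diagnosis", "flu"), ("diagnosis", "cold")}
pattern = [("symptom", "cough"), ("diagnosis", "flu")]
```

Under these assumptions, `pattern` satisfies the constraint and occurs in two of the three toy sequences, so it could be read as a rule that predicts the last item (`diagnosis = flu`) when the preceding sub-sequence (`symptom = cough`) is observed.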

Introduction

Owing to the progress of computer and network environments, it is easy to collect data with time information such as daily business reports, weblog data, and physiological information. This is the context in which methods of analyzing data with time information have been studied. This chapter focuses on methods that discover sequential patterns from discrete sequential data. This line of research extends the pattern discovery task (Agrawal & Srikant, 1994). The methods proposed by Garofalakis et al. (2010), Pei et al. (2001), Srikant and Agrawal (1996), and Zaki (2001) efficiently discover frequent patterns as characteristic patterns. However, the discovered patterns do not always correspond to the interests of analysts, because frequent patterns tend to be commonplace and are not a source of new knowledge for the analysts.

The problem has been pointed out in connection with the discovery of association rules. Blanchard et al. (2005), Brin et al. (1997), Silberschatz et al. (1996), and Suzuki et al. (2005) propose other evaluation criteria in order to discover other kinds of characteristic patterns. The patterns discovered by these criteria are not always frequent but are characteristic from some viewpoints. The criteria may be applicable to discovery methods of sequential patterns. However, these criteria do not satisfy the Apriori property, so it is difficult for methods based on them to discover the patterns efficiently. On the other hand, Sakurai et al. (2008b) propose sequential interestingness as an evaluation criterion satisfying the Apriori property. It can discover sequential patterns including sub-patterns with relatively small frequency. The sequential patterns are regarded as rules which predict the remaining sub-patterns when the sub-sequential patterns are given. We can anticipate that the analysts are interested in the sequential patterns to some degree.
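To make the role of the Apriori property concrete, the following Python sketch performs a level-wise search that extends only those patterns that already meet the threshold. It uses plain support (frequency) as the criterion, not sequential interestingness, and the function names and toy data are assumptions introduced for this example. Because extending a pattern can never increase its support, any pattern below the threshold can be discarded together with all of its extensions.

```python
from typing import Hashable, List, Sequence, Tuple

Pattern = Tuple[Hashable, ...]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def count_support(pattern: Sequence, sequences: List[Sequence]) -> int:
    """Number of sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences)


def frequent_patterns(sequences: List[Sequence], min_count: int) -> List[Pattern]:
    """Level-wise search that relies on the Apriori property of support."""
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    items = sorted({x for s in sequences for x in s})
    # Level 1: single items that meet the threshold.
    frontier = [(x,) for x in items if count_support((x,), sequences) >= min_count]
    result = list(frontier)
    while frontier:
        next_frontier = []
        for p in frontier:
            for x in items:
                candidate = p + (x,)
                # Only frequent patterns are extended; infrequent ones
                # are pruned along with every longer pattern built on them.
                if count_support(candidate, sequences) >= min_count:
                    next_frontier.append(candidate)
        result.extend(next_frontier)
        frontier = next_frontier
    return result


# Toy data (invented for illustration only).
toy = [list("abc"), list("abd"), list("acd")]
found = frequent_patterns(toy, min_count=2)
```

A criterion that lacks this property offers no such guarantee: a pattern scoring below the threshold could still have an extension that scores above it, so the search could not safely stop extending it.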

Also, the discovery methods tend to discover a large number of sequential patterns when the thresholds of the evaluation criteria are not appropriate. The thresholds depend on the sequential data. Therefore, the selection of appropriate thresholds is not easy for the analysts. Thus, methods that limit the number of sequential patterns (Fournier-Viger et al., 2013), (Hathi & Ambasana, 2015), (Maciag, 2017), (Sakurai & Nisihizawa, 2015), (Tzvetkov et al., 2003) have been proposed. These methods can discover an appropriate number of sequential patterns.
