Introduction
As sensors of many kinds become smaller and cheaper, they are more easily embedded in real-world environments. In the near future, we expect them to be connected to one another and to form sensor networks. Large amounts of sequential data will be collected through these networks, and we anticipate that analysis of the data will lead to improvements in our daily life.
Against this background, many methods for analyzing sequential data have been proposed. Sakurai and Ueno (2014) propose a method that discovers sequential patterns from text sequential data. The method extracts events representing texts from the data and generates event sequential data. It can pick up sequential patterns that satisfy constraints based on the interests of users. Sakurai et al. (2006) propose a new evaluation criterion that measures the interestingness of sequential patterns. The criterion can evaluate future relationships between sequential sub-patterns. The paper also proposes a method that discovers patterns based on the criterion. Sakurai et al. (2008) propose a method that discovers sequential patterns from sequential data with a tabular structure. For such data, each item in a pattern is composed of an attribute and its attribute value. The method can efficiently discover the patterns by referring to relationships between attributes and attribute values.
Although these methods can deal with large amounts of sequential data, their time constraints are not strict and their data is described in a single format. The methods cannot process the data in real time, and they cannot directly handle data in various formats. In contrast, sequential data collected from sensor networks must be processed in real time in its application fields, so methods with a high calculation cost cannot be applied to it. Moreover, sensor networks are composed of various types of sensors, so it is necessary to deal with data described in multiple formats.
In this paper, we focus on complex sequential data related to evaluation targets. The data is composed of numerical sequential data and text sequential data. For example, when the evaluation targets are companies, the former is the sequence of their stock prices and the latter is the sequence of news articles describing them. We also focus on methods whose calculation cost is comparatively low. This paper proposes a method that generates a ranking model of the evaluation targets by referring to the complex sequential data. The method is applied to data composed of stock price information and news articles. We compare the proposed method with a method based on random selection and verify the effect of the proposed method through the experimental results.
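As a rough illustration of the data described above, the following Python sketch shows one possible way to hold the two sequences for each evaluation target, together with a random-selection baseline of the kind used for comparison. It is only a sketch under stated assumptions: the names EvaluationTarget and random_selection, the field layout, and the sample values are hypothetical and are not taken from the proposed method.

```python
import random
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class EvaluationTarget:
    """One evaluation target (here, a company) and its two sequences.

    Hypothetical container; the field layout is an assumption.
    """
    name: str
    # Numerical sequential data: (date, stock price) pairs in time order.
    prices: List[Tuple[date, float]] = field(default_factory=list)
    # Text sequential data: (date, news article text) pairs in time order.
    articles: List[Tuple[date, str]] = field(default_factory=list)


def random_selection(targets: List[EvaluationTarget], k: int,
                     seed: Optional[int] = None) -> List[EvaluationTarget]:
    """Baseline: choose k distinct targets uniformly at random.

    The sequences are ignored entirely, which is what makes this a
    reference point for a ranking model that does use them.
    """
    rng = random.Random(seed)
    return rng.sample(targets, k)


if __name__ == "__main__":
    # Placeholder values for illustration only, not experimental data.
    company = EvaluationTarget(
        name="Company A",
        prices=[(date(2015, 1, 5), 100.0), (date(2015, 1, 6), 101.5)],
        articles=[(date(2015, 1, 5), "Company A announces a new product.")],
    )
    others = [EvaluationTarget(name=n) for n in ("Company B", "Company C")]
    chosen = random_selection([company] + others, k=2, seed=0)
    print([t.name for t in chosen])
```

Keeping both sequences on the same object, each indexed by date, makes it straightforward to align price movements with the articles published around the same time.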
The remainder of this paper is organized as follows. Section 2 introduces related work dealing with complex sequential data, in particular data related to the economic field. Section 3 explains the analysis policy for the data and proposes a method that generates a ranking model of evaluation targets. Section 4 explains the experimental data, the experimental method, and the evaluation method. It also shows the experimental results and discusses the effect of the proposed method. Section 5 presents future research directions. Lastly, Section 6 summarizes this paper and notes the remaining subjects in this field.