Advances in Classification of Sequence Data
Pradeep Kumar (University of Hyderabad, Gachibowli, India), P. Radha Krishna (University of Hyderabad, Gachibowli, India), Raju S. Bapi (University of Hyderabad, Gachibowli, India) and T. M. Padmaja (University of Hyderabad, Gachibowli, India)
Copyright: © 2008
In recent years, advanced information systems have enabled the collection of increasingly large amounts of data that are sequential in nature. The interdisciplinary field of Knowledge Discovery in Databases (KDD) provides methods for analyzing such large volumes of sequential data. The most important step in the KDD process is data mining, which is concerned with extracting valid patterns. Recent research in data mining includes stream data mining, sequence data mining, web mining, text mining, visual mining, multimedia mining and multi-relational data mining. Sequence data may be discrete or continuous. Most research on discrete sequence data has concentrated on discovering frequently occurring patterns; comparatively little work has addressed the classification of discrete sequence data. This chapter introduces a data taxonomy and reviews the state of the art in sequence data classification. It demonstrates the usefulness of embedding partial subsequence information, extracted with a sliding window technique, into a traditional classifier such as kNN, which is tested with various vector-based distance/similarity metrics. Further, the S3M similarity metric is used to extract the full subsequence information embedded in the data sequences. The experiments use the DARPA’98 IDS benchmark dataset, collected from the UCIML dataset repository. The chapter closes by pointing out application areas of sequence data and open issues in the sequence data classification problem.
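As a rough illustration of the sliding window idea mentioned in the abstract, the following Python sketch turns each discrete sequence into a vector of window counts and classifies a query sequence with kNN under cosine similarity. It is not the chapter's implementation: the function names, the window width, the choice of cosine similarity, and the toy sequences are all assumptions made for this example, and the S3M metric is not shown.

```python
from collections import Counter
from math import sqrt


def window_counts(sequence, width):
    """Count every contiguous subsequence of length `width` in `sequence`."""
    return Counter(
        tuple(sequence[i:i + width]) for i in range(len(sequence) - width + 1)
    )


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3, width=3):
    """Label `query` by majority vote among its k most similar training sequences."""
    query_vec = window_counts(query, width)
    scored = sorted(
        ((cosine_similarity(query_vec, window_counts(seq, width)), label)
         for seq, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data invented for this sketch (not taken from the DARPA'98 dataset):
# integers stand in for discrete event identifiers.
training = [
    ([1, 2, 3, 1, 2, 3, 1, 2, 3], "normal"),
    ([1, 2, 3, 4, 1, 2, 3, 4], "normal"),
    ([5, 6, 5, 6, 5, 6, 5], "attack"),
    ([6, 5, 6, 5, 7, 6, 5], "attack"),
]

print(knn_classify([1, 2, 3, 1, 2, 3], training))
print(knn_classify([5, 6, 5, 6], training))
```

Other vector-based measures (for example Euclidean or Jaccard) could be substituted for `cosine_similarity` without changing the rest of the sketch, which is how a comparison across metrics might be set up.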