A New Similarity Metric for Sequential Data

Pradeep Kumar (Indian Institute of Management, India), Bapi S. Raju (University of Hyderabad, India) and P. Radha Krishna (Infosys Technologies Limited, Hyderabad, India)
Copyright: © 2010 | Pages: 17
DOI: 10.4018/jdwm.2010100102

In many data mining applications, both classification and clustering algorithms require a distance or similarity measure. The central problem in similarity-based clustering and classification of sequential data is choosing an appropriate similarity metric. Existing metrics such as Euclidean, Jaccard, and Cosine do not explicitly exploit the sequential nature of the data. In this paper, the authors propose a similarity-preserving function called the Sequence and Set Similarity Measure (S3M), which captures both the order in which items occur in sequences and the items that constitute them. The authors evaluate the proposed measure on classification and clustering tasks. Experiments were conducted on two benchmark datasets, DARPA’98 and msnbc, for a classification task in intrusion detection and a clustering task in web mining, respectively. The results indicate that the proposed measure is useful for both tasks.

Sequence Similarity

A sequence is made up of a set of items that occur in time or one after another, that is, ordered by position but not necessarily by time. In other words, a sequence is an ordered set of items. A sequence is denoted as follows:

S = <a1, a2, …, an>

where a1, a2, …, an are the item sets in sequence S. Sequence S contains n elements, or ordered item sets. The length of a sequence is the number of item sets it contains; it is denoted |S|, and here |S| = n.

Formally, similarity is a nonnegative real-valued function S defined on the Cartesian product X × X of a set X (here S denotes the similarity function rather than a sequence). It is called a metric on X if, for every x, y ∈ X, the following properties are satisfied by S.
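The sequence notation can be made concrete with a small sketch. The following Python snippet is only an illustration under stated assumptions, not the authors' S3M implementation: each element of a sequence is simplified to a single item rather than an item set, the item names are hypothetical, and the helper names `sequence_length` and `jaccard` are introduced here for illustration. It computes the sequence length |S| and shows why a purely set-based measure such as Jaccard cannot distinguish two sequences that contain the same items in a different order.

```python
from typing import Hashable, Sequence


def sequence_length(seq: Sequence[Hashable]) -> int:
    """Return |S|, the number of elements in the sequence."""
    return len(seq)


def jaccard(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Jaccard similarity of the item sets of two sequences.

    Order and repetition are discarded when the sequences are
    converted to sets, so this measure is blind to ordering.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumed convention for two empty sequences
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical example sequences: same items, opposite order.
s1 = ("login", "search", "cart", "checkout")
s2 = ("checkout", "cart", "search", "login")

print(sequence_length(s1))  # 4
print(jaccard(s1, s2))      # 1.0, although the order differs
print(s1 == s2)             # False: as sequences they are not equal
```

Because the two sequences share exactly the same items, the set-based score is maximal even though the order of occurrence is reversed. This is the gap that an order-aware measure is meant to address.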
