Simon Fong (Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, China)
DOI: 10.4018/IJEACH.2020010107


Similarity measures are essential for solving many pattern recognition problems, such as classification, clustering, and retrieval. Various distance and similarity measures are applicable to comparing two probability density functions. Data comparison is widely used in society today and plays a very important part in it. Comparing two objects is a common task for people from all walks of life: people want or need to find the similarity between two different objects, or the difference between two similar objects. Different data may share some similarity in some given attribute(s). To compare two datasets on their attributes with classification algorithms, the attributes must be selected by rules. Such a system is known as a rule-based reasoning system or expert system, and it classifies a given test instance into a particular outcome using the learned rules. The test instance carries multiple attributes, which are usually the values of diagnostic tests. In this article, we propose a classifier ensemble-based method for comparing two datasets, or one dataset with different features. Ensemble data mining learning methods are applied for rule generation, and a multi-criterion evaluation approach is used to select reliable rules from the results of the ensemble methods. The efficacy of the proposed methodology is illustrated with an example of two disease datasets; strictly speaking, this is a combined dataset with the same instances and ordinary attributes that differs only in the class. This article introduces a fuzzy rule-based classification method called FURIA, uses FURIA rules to obtain the relationship between two datasets, and finds the similarity between them.
Article Preview


Calculating the similarity of two datasets is a necessary and important task in statistics and data mining, and even in everyday life. Classification algorithms are widely used in different fields, such as image processing, medical diagnosis, data filtration, and data comparison. One of these algorithms is the Fuzzy Unordered Rule Induction Algorithm, or FURIA for short, a novel fuzzy rule-based classification method that modifies and extends the state-of-the-art rule learner RIPPER (Cohen, 1995). In particular, FURIA learns fuzzy rules instead of conventional rules, and unordered rule sets instead of rule lists. Moreover, to deal with uncovered examples, it makes use of an efficient rule stretching method.
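The difference between a conventional rule condition and a fuzzy one can be sketched as follows. This is only a minimal illustration, not FURIA itself: the function names, parameter names, and the example interval are hypothetical, and the trapezoidal shape (a core with full membership and a wider support with partial membership) is assumed here as the form of the fuzzy interval.

```python
def crisp_membership(x, lo, hi):
    """Conventional rule condition: x either satisfies lo <= x <= hi or not."""
    return 1.0 if lo <= x <= hi else 0.0


def trapezoid_membership(x, support_lo, core_lo, core_hi, support_hi):
    """Fuzzy rule condition: full membership inside the core,
    linearly decreasing membership between core and support bounds,
    and zero membership outside the support.
    """
    if core_lo <= x <= core_hi:
        return 1.0
    if support_lo < x < core_lo:
        return (x - support_lo) / (core_lo - support_lo)
    if core_hi < x < support_hi:
        return (support_hi - x) / (support_hi - core_hi)
    return 0.0


# Hypothetical attribute value just outside the crisp interval [4, 6].
value = 3
crisp_degree = crisp_membership(value, 4, 6)
fuzzy_degree = trapezoid_membership(value, 2, 4, 6, 8)
```

With the crisp condition the value 3 is simply rejected, whereas the fuzzy condition still covers it to a partial degree, which is the kind of soft boundary that fuzzy rules provide.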

There is similar work in image processing: Fast Approximate Energy Minimization via Graph Cuts by Yuri Boykov, Olga Veksler, and Ramin Zabih (2001). That paper describes how to split a picture apart algorithmically. A picture is made of pixels, so the questions are how to cut the picture and where the threshold between pixels lies. Each pixel has its own code, that is, a set of numbers, so we can see it as a vector. Since the whole picture is made of pixels, numerically it is a pixel matrix: a matrix formed by vectors of variables. We can therefore also treat the matrix as a dataset.
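The view of a picture as a dataset can be made concrete with a small sketch. The 2 x 3 RGB image and the helper name `image_to_dataset` are made up for illustration; the only point is that each pixel vector becomes one instance and each channel becomes one attribute.

```python
# A hypothetical 2 x 3 RGB image: each pixel is a vector of three channel values.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 10, 10), (200, 200, 200), (255, 255, 255)],
]


def image_to_dataset(img):
    """Flatten a height x width grid of pixel vectors into a list of rows.

    Each row is one instance (a pixel); each column is one attribute (a channel).
    """
    return [list(pixel) for row in img for pixel in row]


dataset = image_to_dataset(image)
n_instances = len(dataset)
n_attributes = len(dataset[0])
```

Here the six pixels become six instances with three attributes each, so any method that works on a table of attribute vectors can be applied to the picture.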

A dataset contains many attributes, so it is high-dimensional in the mathematical sense, and we treat every data item as a vector IJEACH.2020010107.m01. We want to classify the dataset, and every class we want to assign is also a vector IJEACH.2020010107.m02 (j is the number of attribute classes).

For the IJEACH.2020010107.m03, we need a value for IJEACH.2020010107.m04, so we use K-means to obtain the vectors, with k = j.

Then we iterate based on the IJEACH.2020010107.m05, calculating every distance between IJEACH.2020010107.m06 and IJEACH.2020010107.m07.
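The K-means step described here, with k equal to the number of classes and repeated distance calculations between data vectors and class vectors, might look like the sketch below. The formulas of the original are not reproduced in this preview, so several choices are assumptions: Euclidean distance, caller-supplied initial class vectors, and the names `euclidean` and `k_means`. The toy data are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(data, centroids, max_iter=100):
    """Plain K-means with caller-supplied initial centroids.

    Returns the final centroids and the index of the nearest centroid
    for every data vector.
    """
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: compute every distance and pick the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: euclidean(x, centroids[j]))
            for x in data
        ]
        # Update step: move each centroid to the mean of its assigned vectors.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable, so the iteration has converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data with two obvious groups, so k = 2.
data = [[0, 0], [0, 1], [10, 10], [10, 11]]
centroids, labels = k_means(data, centroids=[[0, 0], [10, 10]])
```

Passing the initial centroids explicitly keeps the sketch deterministic; in practice they would usually be chosen at random or by a seeding heuristic.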
