Article Preview
TopBackground
To calculate the similarity of two datasets actually is a necessary and important part in statistics or data mining, even in our lifetime. The classification algorithms are widely used in different fields, like image processing, medical diagnosis, data filtration, data comparison and so on. And the FURIA is one of the classification algorithms, FURIA is a novel fuzzy rule-based classification method called Fuzzy Unordered Rule Induction Algorithm or FURIA for short, which is a modification and extension of the state-of-the-art rule learner RIPPER (Cohen, 1995). In particular, FURIA learns fuzzy rules instead of conventional rules and unordered rule sets instead of rule lists. Moreover, to deal with uncovered examples, it makes use of an efficient rule stretching method.
There is a similar work in image processing, it’s the Fast Approximate Energy Minimization via Graph Cuts by Yuri Boykov, Olga Veksler, and Ramin Zabin (2001). This paper describes how to get a picture apart by algorithm, we know a picture is made from pixels, but how can I cut a picture, and which is the threshold among pixels is of a question. Pixels has its own code that means each one is a set of number. We can see it as a vector, and a whole picture is made from pixels. Numerically it is a pixel matrix, a matrix formed by vectors of variables. And now, we also can treat the matrix as a dataset.
So, for a dataset, it contains many attributes. It has a high dimension in mathematics, so we see every data as a vector
. We want to do classification for the data set, every class we want to classify is also a vector
(j is the number of the attribute class).
For the
, we need have a value for
, so to use K-means to get the vectors, for k = j.
The do the iteration based on the
, for calculate every distance of
and
.