Fuzzy Classification on Relational Databases

Andreas Meier (University of Fribourg, Switzerland), Günter Schindler (Galexis AG, Switzerland) and Nicolas Werro (University of Fribourg, Switzerland)
Copyright: © 2008 | Pages: 29
DOI: 10.4018/978-1-59904-853-6.ch023

In practice, information systems are based on very large data collections mostly stored in relational databases. As a result of information overload, it has become increasingly difficult to analyze huge amounts of data and to generate appropriate management decisions. Furthermore, data are often imprecise because they do not accurately represent the world or because they are themselves imperfect. For these reasons, a context model with fuzzy classes is proposed to extend relational database systems. More precisely, fuzzy classes and linguistic variables and terms, together with appropriate membership functions, are added to the database schema. The fuzzy classification query language (fCQL) allows the user to formulate unsharp queries that are then transformed into appropriate SQL statements using the fCQL toolkit so that no migration of the raw data is needed. In addition to the context model with fuzzy classes, fCQL and its implementation are presented here, illustrated by concrete examples.

Key Terms in this Chapter

Fuzzy Classification Database Schema: A fuzzy classification database schema R(A,C,X,T) is a database schema with a set of attributes A, a set of associated contexts C, a set of linguistic variables X, and a set of corresponding terms T. Each linguistic variable Xi has an associated set of terms T(Xi) := {T1,…,Tk}. Note that the number of terms depends on the linguistic variable; that is, different linguistic variables need not have the same number of terms.
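
As an illustration of R(A,C,X,T), the following Python sketch models a schema in which two linguistic variables carry different numbers of terms. The class names, the attribute names (`turnover`, `payment_delay`), and the assumption of at most one linguistic variable per attribute are hypothetical, not taken from the chapter or the fCQL toolkit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LinguisticVariable:
    """A linguistic variable X_i together with its term set T(X_i)."""
    name: str
    terms: tuple[str, ...]


@dataclass
class FuzzyClassificationSchema:
    """Sketch of R(A, C, X, T); assumes at most one linguistic variable per attribute."""
    attributes: tuple[str, ...]                  # A
    contexts: dict[str, tuple[str, ...]]         # C: equivalence-class labels per attribute
    variables: dict[str, LinguisticVariable] = field(default_factory=dict)  # X; T lives on each X_i

    def __post_init__(self):
        # Every context and linguistic variable must belong to an attribute in A.
        unknown = (set(self.contexts) | set(self.variables)) - set(self.attributes)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")

    def terms(self, attribute: str) -> tuple[str, ...]:
        """Return T(X_i) for the linguistic variable attached to `attribute`."""
        return self.variables[attribute].terms


# Hypothetical customer schema: the two variables have different numbers of terms.
schema = FuzzyClassificationSchema(
    attributes=("name", "turnover", "payment_delay"),
    contexts={"turnover": ("low", "high"), "payment_delay": ("short", "medium", "long")},
    variables={
        "turnover": LinguisticVariable("turnover", ("low", "high")),
        "payment_delay": LinguisticVariable("payment_delay", ("short", "medium", "long")),
    },
)
```

Here `turnover` has two terms and `payment_delay` has three, which the schema accepts without complaint.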

Merge Operator: The merge operator applied to two context-redundant tuples t and t’ yields a new tuple u=(u1,…,un), defined component-wise as the set-theoretic union uj := tj ∪ tj’.
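
A minimal Python sketch of the merge operator, assuming tuples are Python tuples whose components are either scalars (treated as singleton sets) or sets produced by earlier merges. The function names `as_set` and `merge` and the example values are hypothetical, and the sketch does not itself check that its inputs are context redundant.

```python
def as_set(component):
    """Treat a scalar tuple component as the singleton set {component}."""
    if isinstance(component, (set, frozenset)):
        return frozenset(component)
    return frozenset({component})


def merge(t, t_prime):
    """Merge operator: u_j = t_j ∪ t'_j for every component j.

    The caller is assumed to pass context-redundant tuples; that is not checked here.
    """
    if len(t) != len(t_prime):
        raise ValueError("tuples must have the same number of components")
    return tuple(as_set(a) | as_set(b) for a, b in zip(t, t_prime))


# Two hypothetical tuples (segment, turnover) assumed to be context redundant.
u = merge(("retail", 520), ("retail", 900))
```

The result can be merged again with a further redundant tuple, since set components are passed through unchanged.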

Gamma Operator: The γ-operator is used to weigh several attributes against each other and to aggregate their membership degrees x1, …, xm: $\mu_\gamma(x_1,\dots,x_m) = \left(\prod_{i=1}^{m} x_i\right)^{1-\gamma} \cdot \left(1 - \prod_{i=1}^{m} (1 - x_i)\right)^{\gamma}$. The γ-operator is composed of the algebraic product, a t-norm, and its counterpart, the algebraic sum, a t-conorm. The parameter γ, ranging from 0 to 1, specifies whether the result tends toward the algebraic product (γ=0) or toward the algebraic sum (γ=1). The parameter γ therefore determines the strength of the compensation mechanism.
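
The description matches the compensatory γ-operator of Zimmermann and Zysno. The Python sketch below assumes its unweighted form, (∏ xi)^(1−γ) · (1 − ∏ (1 − xi))^γ; any attribute weights the chapter may apply are not modelled, and the function name `gamma_operator` is hypothetical.

```python
import math


def gamma_operator(degrees, gamma):
    """Compensatory γ-operator: (∏ x_i)^(1-γ) * (1 - ∏ (1 - x_i))^γ."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if not degrees or any(not 0.0 <= x <= 1.0 for x in degrees):
        raise ValueError("membership degrees must be non-empty and lie in [0, 1]")
    algebraic_product = math.prod(degrees)                       # t-norm
    algebraic_sum = 1.0 - math.prod(1.0 - x for x in degrees)    # t-conorm
    return algebraic_product ** (1.0 - gamma) * algebraic_sum ** gamma


degrees = [0.8, 0.6]
print(gamma_operator(degrees, 0.0))  # algebraic product, about 0.48
print(gamma_operator(degrees, 1.0))  # algebraic sum, about 0.92
print(gamma_operator(degrees, 0.5))  # compensation between the two
```

Intermediate values of γ yield results between the strict "and" (product) and the lenient "or" (algebraic sum).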

Linguistic Variable: A linguistic variable is characterized by a quintuple (X,T,U,G,M), where X is the name of the variable, T is the set of terms of X, U is the universe of discourse, G is a syntactic rule for generating the names of the terms, and M is a semantic rule associating each term with its meaning, that is, a fuzzy set defined on U.
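
The quintuple (X,T,U,G,M) can be sketched in Python as follows, assuming trapezoidal membership functions for the semantic rule M and a closed interval for U. The term set T is read off M, and the syntactic rule G is not modelled; the names `trapezoid` and `LinguisticVariable`, the variable `turnover`, and its breakpoints are made-up example values.

```python
from dataclasses import dataclass
from typing import Callable


def trapezoid(a, b, c, d):
    """Trapezoidal membership function: 0 outside [a, d], 1 on [b, c], linear in between."""
    def mu(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)   # rising edge
        return (d - x) / (d - c)       # falling edge
    return mu


@dataclass
class LinguisticVariable:
    name: str                                      # X
    universe: tuple[float, float]                  # U, taken as a closed interval
    meaning: dict[str, Callable[[float], float]]   # M: term -> fuzzy set on U
    # T is the key set of `meaning`; the syntactic rule G is not modelled here.

    @property
    def terms(self):
        return tuple(self.meaning)

    def fuzzify(self, x):
        """Membership degree of x in each term's fuzzy set."""
        low, high = self.universe
        if not low <= x <= high:
            raise ValueError(f"{x} is outside the universe of discourse {self.universe}")
        return {term: mu(x) for term, mu in self.meaning.items()}


# Hypothetical variable with made-up breakpoints.
turnover = LinguisticVariable(
    "turnover",
    (0, 1000),
    {"low": trapezoid(0, 0, 200, 600), "high": trapezoid(200, 600, 1000, 1000)},
)
print(turnover.fuzzify(400))  # {'low': 0.5, 'high': 0.5}
```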

Hierarchical Decomposition: A multidimensional fuzzy classification, that is, one whose classification space has more than two dimensions, leads to a large number of classes whose semantics cannot be properly derived. To keep class definitions meaningful, a multidimensional fuzzy classification can be decomposed into a hierarchy of fuzzy classifications. The hierarchical decomposition merges subsets of qualifying attributes into fuzzy subclassifications (composed attributes). These composed attributes are integrated as linguistic variables in classes at higher levels, leading to a hierarchy of fuzzy classifications. The value v(e) of an element e of a composed attribute is derived by assigning to each fuzzy class Ck a grade gr(Ck) expressing the meaning of the composed attribute and aggregating these grades multiplied by the membership degrees μCk(e) of the classified element: $v(e) = \sum_{k} gr(C_k) \cdot \mu_{C_k}(e)$.
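
The aggregation v(e) = Σk gr(Ck) · μCk(e) can be computed as below. This is a Python sketch with a hypothetical function name `composed_value`; the four classes, their grades, and the membership degrees are illustrative values only.

```python
def composed_value(memberships, grades):
    """v(e) = sum_k gr(C_k) * mu_{C_k}(e) for one classified element e.

    memberships: class name -> membership degree of e in that class
    grades:      class name -> grade gr(C_k) expressing the composed attribute's meaning
    """
    missing = set(memberships) - set(grades)
    if missing:
        raise ValueError(f"no grade defined for classes: {sorted(missing)}")
    return sum(grades[c] * mu for c, mu in memberships.items())


# Hypothetical four-class subclassification and one element's membership degrees.
grades = {"C1": 1.0, "C2": 0.5, "C3": 0.5, "C4": 0.0}
memberships = {"C1": 0.6, "C2": 0.2, "C3": 0.2, "C4": 0.0}
v = composed_value(memberships, grades)  # approximately 0.8
```

Because the membership degrees in this example sum to 1, v(e) is a weighted average of the grades and therefore stays within their range.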

Database Schema with Contexts: A relational database schema R(A,C) with contexts consists of a set of attributes A=(A1,…,An) with an associated set of contexts C=(C1(A1),…,Cn(An)). To every attribute Aj, defined on a domain D(Aj), a context C(Aj) is added. A context C(Aj) is a partition of D(Aj) into equivalence classes.
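
A partition of D(Aj) into equivalence classes is equivalent to a function assigning each value the label of its class, so a context can be sketched in Python as such a function. This sketch assumes numeric domains split into half-open intervals; `interval_context`, the attributes, and the boundaries are hypothetical.

```python
from bisect import bisect_right


def interval_context(boundaries, labels):
    """Context C(A_j): partition a numeric domain into half-open intervals.

    boundaries must be sorted; value v belongs to labels[i] where i is the
    number of boundaries <= v, so every value lands in exactly one class.
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")
    if list(boundaries) != sorted(boundaries):
        raise ValueError("boundaries must be sorted")

    def class_of(value):
        return labels[bisect_right(boundaries, value)]

    return class_of


# Hypothetical schema R(A, C): each attribute gets a context over its domain.
contexts = {
    "turnover": interval_context([500], ["low", "high"]),
    "payment_delay": interval_context([10, 30], ["short", "medium", "long"]),
}
print(contexts["turnover"](499), contexts["turnover"](500))  # low high
```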

Context-Based Selection: For a relation r of a database schema R(A,C) with contexts, and a schema R(QA,QC) with a query set of attributes QA ⊆ A and a set of associated query contexts QC, there exists a context-based selection Σ[β(QA,QC)](r), where β(QA,QC) is a Boolean condition on the values of the attribute set QA and the corresponding query contexts QC.
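
The definition above leaves the exact selection semantics open. One plausible reading, assumed here and not confirmed by this section, is that Σ widens the classical selection: a tuple qualifies if its QA values fall into the same equivalence classes under QC as some tuple satisfying β. A Python sketch of that reading, with hypothetical names (`context_select`, `turnover_class`) and data:

```python
def turnover_class(value):
    """Hypothetical context on turnover: two equivalence classes split at 500."""
    return "low" if value < 500 else "high"


def context_select(relation, predicate, query_contexts):
    """Sketch of Σ[β(QA,QC)](r) under an assumed semantics: keep every tuple whose
    QA values fall into the same equivalence classes (under QC) as some tuple
    that satisfies the Boolean condition β.

    relation:       list of dicts (attribute -> value)
    predicate:      the condition β on a tuple
    query_contexts: QA -> context function mapping a value to its class label (QC)
    """
    def classes(t):
        return tuple(ctx(t[a]) for a, ctx in query_contexts.items())

    matching_classes = {classes(t) for t in relation if predicate(t)}
    return [t for t in relation if classes(t) in matching_classes]


customers = [
    {"name": "A", "turnover": 450},
    {"name": "B", "turnover": 520},
    {"name": "C", "turnover": 900},
    {"name": "D", "turnover": 100},
]
result = context_select(customers, lambda t: t["turnover"] >= 800, {"turnover": turnover_class})
print([t["name"] for t in result])  # ['B', 'C']
```

Under this reading the result always contains the classical selection, here customer C, and adds context-equivalent tuples such as B.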

Fuzzy Classification Query Language (fCQL): fCQL is a data analysis tool that allows users to query a predefined fuzzy classification of relational databases. In contrast to fuzzy query languages, the user does not need to deal with a fuzzy SQL or with fuzzy predicates, which could lead to varying semantics and different interpretations of the original data collection. From the user’s point of view, fCQL is a human-oriented query language because it operates at the linguistic level: queries are formulated without numerical values, using predefined linguistic variables and their associated verbal terms. Classification queries are therefore intuitive to formulate, since their meaning is expressed linguistically.

Context Redundancy: Two tuples t and t’ of a relation r with associated schema R(A,C) are context redundant with respect to the set of contexts C=(C1(A1),…,Cn(An)) if, for every j, the components tj and tj’ belong to the same equivalence class of Cj(Aj).
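
Context redundancy reduces to comparing class labels component by component. In this Python sketch each context Cj is assumed to be a function mapping a value to the label of its equivalence class; the function names, the two contexts, and their thresholds are hypothetical.

```python
def turnover_class(value):
    """Hypothetical context C1 on turnover."""
    return "low" if value < 500 else "high"


def delay_class(days):
    """Hypothetical context C2 on payment delay (days)."""
    return "short" if days < 10 else "long"


def context_redundant(t, t_prime, contexts):
    """True if t_j and t'_j lie in the same equivalence class of C_j for every j."""
    if not len(t) == len(t_prime) == len(contexts):
        raise ValueError("tuples and contexts must have the same arity")
    return all(ctx(a) == ctx(b) for ctx, a, b in zip(contexts, t, t_prime))


contexts = (turnover_class, delay_class)
print(context_redundant((520, 5), (900, 8), contexts))   # True
print(context_redundant((450, 5), (900, 5), contexts))   # False
```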

Context-Based Relational Algebra: Classical relational algebra consists of five operators: set-theoretic union, set-theoretic difference, and Cartesian product, as well as projection π and selection σ. Context-based relational algebra extends classical relational algebra to relational database schemas R(A,C) with contexts. For a given set of query attributes QA and a set of corresponding query contexts QC, there exist a context-based union, a context-based difference, and a context-based Cartesian product, as well as a context-based projection Π and selection Σ.

Context-Based Projection: For a relation r of a database schema R(A,C) with contexts and a schema R(QA,QC) with a query set of attributes QA ⊆ A and a set of associated query contexts QC, there exists a context-based projection Π[QA,QC](r). The projection of r with respect to R(QA,QC) is a relation qr of R(QA,QC). The tuples in qr are obtained by reducing the tuple components of r to the query attributes QA; tuples that are context redundant with respect to QC are then merged by the merge operator.
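
A Python sketch of the context-based projection, assuming relations are lists of dicts and each query context is a function from a value to its class label; `context_project`, `turnover_class`, and the data are hypothetical. Grouping tuples by their class labels gives the same result as merging context-redundant pairs repeatedly, because "same class in every component" is an equivalence relation, so each group yields exactly one merged tuple.

```python
def turnover_class(value):
    """Hypothetical context on turnover: two equivalence classes split at 500."""
    return "low" if value < 500 else "high"


def context_project(relation, query_contexts):
    """Sketch of Π[QA,QC](r).

    1. Reduce every tuple to the query attributes QA (the keys of query_contexts).
    2. Group tuples whose QA components fall into the same equivalence classes
       under QC, i.e. context-redundant tuples.
    3. Merge each group with the merge operator (component-wise set union).
    """
    attrs = tuple(query_contexts)
    groups = {}  # class signature -> attribute -> set of merged values
    for t in relation:
        signature = tuple(query_contexts[a](t[a]) for a in attrs)
        merged = groups.setdefault(signature, {a: set() for a in attrs})
        for a in attrs:
            merged[a].add(t[a])
    return [{a: frozenset(values) for a, values in m.items()} for m in groups.values()]


customers = [
    {"name": "A", "turnover": 450},
    {"name": "B", "turnover": 520},
    {"name": "C", "turnover": 900},
]
projected = context_project(customers, {"turnover": turnover_class})
# two tuples: one for class "low" holding {450}, one for class "high" holding {520, 900}
```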
