2.1. An Exemplary NIS
A NIS Φ is a quadruplet (OB, AT, {VAL_A | A ∈ AT}, g), where OB, AT and VAL_A are finite sets, and g is a mapping from OB × AT to the power set of ∪_{A∈AT} VAL_A. The elements of OB, AT and VAL_A are called objects, attributes and attribute values, respectively. The mapping g gives each attribute value as a set: we assume that the actual value belongs to this set, but we do not know which element it is. Since each set of attribute values is finite, we can replace each set in Φ with one of its elements and thereby obtain a standard table. We call such tables derived DISs from the NIS Φ. Let DD(Φ) denote the set of all derived DISs. For Φ in Table 1, DD(Φ) = {ψ | ψ is a derived DIS from Φ}, and |DD(Φ)| is 2304 (= 2^8 × 3^2).
Table 1. An exemplary NIS for the suitcase data set. The values of the attributes color, size, weight and price are as follows:
VAL_color = {red, blue, green},
VAL_size = {small, medium, large},
VAL_weight = {light, heavy},
VAL_price = {high, low}
object | color | size | weight | price |
1 | {red,blue,green} | {small} | {light,heavy} | {low} |
2 | {red} | {small,medium} | {light,heavy} | {high} |
3 | {red,blue} | {small,medium} | {light} | {high} |
4 | {red} | {medium} | {heavy} | {low,high} |
5 | {red} | {small,medium,large} | {heavy} | {high} |
6 | {blue,green} | {large} | {heavy} | {low,high} |
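The size of the set of derived DISs can be checked directly from Table 1: the choices in the cells are independent, so the count is the product of the cardinalities of all value sets (6 × 4 × 4 × 2 × 3 × 4 for objects 1 to 6). The following is a minimal Python sketch of that count and of a lazy enumeration of the derived DISs. The names `NIS`, `ATTRIBUTES`, `count_derived_diss` and `derived_diss` are illustrative assumptions and are not taken from the cited implementations.

```python
from itertools import product
from math import prod

# Attribute order used for every row (illustrative naming).
ATTRIBUTES = ("color", "size", "weight", "price")

# Table 1: each cell holds the set of possible values g(x, A).
NIS = {
    1: ({"red", "blue", "green"}, {"small"}, {"light", "heavy"}, {"low"}),
    2: ({"red"}, {"small", "medium"}, {"light", "heavy"}, {"high"}),
    3: ({"red", "blue"}, {"small", "medium"}, {"light"}, {"high"}),
    4: ({"red"}, {"medium"}, {"heavy"}, {"low", "high"}),
    5: ({"red"}, {"small", "medium", "large"}, {"heavy"}, {"high"}),
    6: ({"blue", "green"}, {"large"}, {"heavy"}, {"low", "high"}),
}


def count_derived_diss(nis):
    """Number of derived DISs: product of the sizes of all value sets."""
    return prod(len(cell) for row in nis.values() for cell in row)


def derived_diss(nis):
    """Lazily yield every derived DIS as {object: {attribute: value}}."""
    objects = sorted(nis)
    width = len(ATTRIBUTES)
    # Flatten the table row by row; sort each cell for a stable order.
    cells = [sorted(cell) for obj in objects for cell in nis[obj]]
    for choice in product(*cells):
        yield {
            obj: dict(zip(ATTRIBUTES, choice[i * width:(i + 1) * width]))
            for i, obj in enumerate(objects)
        }


if __name__ == "__main__":
    print(count_derived_diss(NIS))
    print(next(derived_diss(NIS)))
```

Enumeration of this kind is feasible only for small tables such as Table 1; it is shown here to make the definition concrete, not as a method for analyzing a NIS.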
Figure 1. An exemplary NIS and 2304 derived DISs
Generally, as illustrated by Figure 1, the number of derived DISs may increase exponentially. For large data sets with a relatively high level of non-determinism, understood as the cardinalities of the value sets, scalable data analysis requires a way to handle Φ directly, without considering each derived DIS individually. In the case of rule generation, we have solved this problem by using a rough set-based framework. In our research, we have addressed the following challenges:
1. Management of possible equivalence relations (Sakai & Okuma, 2004);
2. The minimum and the maximum degrees of data dependency (Sakai, 2004);
3. Certain and possible rules, and rule generation (Sakai et al., 2008);
4. Stability factor of rules and calculation (Sakai et al., 2011B);
5. Management of missing values (Sakai et al., 2008; Sakai et al., 2011A);
6. Management of an actual value by intervals (Sakai et al., 2011A);
7. Management of numerical patterns and figures (Sakai et al., 2008);
8. Direct question-answering (Sakai et al., 2011B).