1. Introduction
Knowledge discovery, data mining, and decision making have wide applications in solving practical problems in many fields (Cao et al., 2007; Dat et al., 2019; Mohamed et al., 2020; Patro et al., 2020; Tey et al., 2019). As data volume grows, data sets become larger and more complex, and implementing data mining algorithms on such large data sets is very challenging, particularly for classification and clustering algorithms (Kwok, Smith, Lozano & Taniar, 2002; Tan & Taniar, 2007). To improve the efficiency of data mining algorithms, it is necessary to remove redundant attributes (Liu & Wang, 2020; You et al., 2020); this task is called attribute reduction. One of the most effective tools for attribute reduction on decision systems is rough set theory (Pawlak, 1991). However, rough-set-based algorithms have all been designed for complete decision systems, in which every attribute value is present. In practical problems, values in the attribute domain are often absent; such entries are called missing values, and systems containing them are called incomplete decision systems. Kryszkiewicz (1998) constructed a tolerance relation as an extension of the equivalence relation of traditional rough sets and introduced the tolerance rough set model. This model can be applied to attribute reduction and rule extraction directly on incomplete decision systems, without pre-processing the missing values.
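To make the tolerance relation concrete, the following is a minimal sketch (not the authors' implementation): two objects of an incomplete decision system are tolerant when, on every conditional attribute, their values agree or at least one value is missing. The table, attribute names, and the "*" marker for missing values below are illustrative assumptions.

```python
# Sketch of Kryszkiewicz's tolerance relation on an incomplete decision
# system. "*" denotes a missing value; the table below is hypothetical.
MISSING = "*"

def tolerant(x, y, attributes):
    """Objects x and y are tolerant iff, on every conditional attribute,
    their values agree or at least one of them is missing."""
    return all(
        x[a] == y[a] or x[a] == MISSING or y[a] == MISSING
        for a in attributes
    )

def tolerance_class(obj, universe, attributes):
    """All objects tolerant with `obj` -- the analogue of an
    equivalence class in the classical rough set model."""
    return [y for y in universe if tolerant(obj, y, attributes)]

# Hypothetical incomplete decision table with conditional attributes
# a1, a2 and a missing value for object u2.
universe = [
    {"id": "u1", "a1": 1, "a2": 0},
    {"id": "u2", "a1": 1, "a2": MISSING},
    {"id": "u3", "a1": 0, "a2": 1},
]
attrs = ["a1", "a2"]

# u2 falls into u1's tolerance class via its missing value on a2.
print([o["id"] for o in tolerance_class(universe[0], universe, attrs)])
```

Note that, unlike an equivalence relation, the tolerance relation is reflexive and symmetric but not necessarily transitive, so tolerance classes may overlap rather than partition the universe.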
In many applications, decision systems are very large and change over time (Atay & Garani, 2019), especially when object sets are added or deleted. For example, in online data analysis systems, data is continually added and removed. In other forecasting systems, data may have been collected over many years, yet the analysis model must be rebuilt every day or week, so out-of-date data must be removed as useless. This situation creates two main difficulties for attribute reduction on such dynamic decision systems. First, the decision system is extremely large, so obtaining a reduct is hard given the limits of memory size and computing capacity. Second, the data cannot be provided all at once: whenever the data changes, traditional methods must repeat every step on the entire decision system, which is time-consuming. To overcome these difficulties, incremental approaches have been proposed to decrease the execution time of attribute reduction techniques.
Incremental algorithms update the reduct using only the changed data, without re-computing it over the whole decision table. As a result, their computation time decreases significantly compared to non-incremental algorithms, so they can be applied to large, changing decision tables.
The motivation of this paper is the fact that existing incremental algorithms based on the tolerance rough set do not achieve the best results in terms of classification accuracy and reduct cardinality. Because they are filter algorithms, the final reduct contains the conditional attributes that preserve the original measure, and classification accuracy is calculated only after the reduct is obtained. Consequently, the result is optimized neither in the number of attributes nor in classification accuracy, which degrades classifier performance. KGIRA-M and KGIRD-M (Zhang, Dai & Chen, 2020) are two incremental algorithms for computing a reduct when objects are added to or deleted from incomplete decision systems, but they too are filter algorithms. In (Giang et al., 2020), we proposed filter-wrapper incremental algorithms based on fuzzy partition distance to obtain a reduct when multiple objects are added or deleted. However, the algorithms in (Giang et al., 2020) use fuzzy rough sets and compute reducts only for complete decision tables.