1. Introduction
Attribute reduction is a crucial problem in the data preprocessing step of the data mining process (Song, Li & Qu, 2018; You et al., 2020). The objective of attribute reduction is to find a reduced attribute set, known as a reduct, in order to improve the efficiency of the data mining model (Liu & Wang, 2020). Rough set theory (Pawlak, 1991) is considered an effective tool for solving the attribute reduction problem in decision systems. In practical problems, "missing values" often appear in the value domains of the conditional attribute set of a decision system; such systems are called incomplete decision systems. To obtain a reduct directly from incomplete decision systems without preprocessing the missing values, Kryszkiewicz (1998) introduced the tolerance rough set model as an extension of traditional rough set theory, and many algorithms based on this model have been proposed for finding reducts of such systems. Today, with the rapid growth of big data, incomplete decision systems with high dimensionality, i.e., very large numbers of attributes, are common in many fields; for example, some decision systems over bioinformatics data have millions of attributes. Furthermore, such systems are frequently changed or updated on different time scales (Wang, 2020), especially when attributes or dimensions change (Atay & Garani, 2019). For example, in medical diagnosis, clinical symptoms serve as the initial attributes for a physician's diagnosis; test indicators are then continually added and updated as further attributes to help improve diagnostic accuracy.
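To make the tolerance rough set model concrete, the following is a minimal illustrative sketch (not from the cited works): under Kryszkiewicz's tolerance relation, two objects are tolerant on an attribute set if, on every attribute, their values agree or at least one of them is missing. The `MISSING` marker, attribute names, and toy table below are assumptions for illustration.

```python
# Hypothetical sketch of the tolerance relation in an incomplete
# decision system; "*" marks a missing value.
MISSING = "*"

def tolerant(x, y, attrs):
    """x and y are tolerant iff, on every attribute in attrs,
    their values agree or at least one value is missing."""
    return all(x[a] == y[a] or x[a] == MISSING or y[a] == MISSING
               for a in attrs)

def tolerance_class(i, objects, attrs):
    """Indices of all objects tolerant with object i under attrs."""
    return [j for j, y in enumerate(objects) if tolerant(objects[i], y, attrs)]

# Toy incomplete decision table with two conditional attributes c1, c2.
objects = [
    {"c1": 1, "c2": "*"},  # x0: value of c2 is unknown
    {"c1": 1, "c2": 0},    # x1
    {"c1": 2, "c2": 0},    # x2
]
attrs = ["c1", "c2"]
print(tolerance_class(0, objects, attrs))  # x0 is tolerant with x0 and x1
```

Note that the tolerance relation is reflexive and symmetric but not necessarily transitive, so tolerance classes may overlap, unlike the equivalence classes of traditional rough set theory.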
In such decision systems, algorithms that find a reduct according to traditional approaches face two major challenges. First, when the attribute set changes, these algorithms must recompute the reduct on the entire decision system after each change, so the computation time increases significantly. Second, on large decision systems, these algorithms run into the limits of memory space and computational speed. To overcome these two challenges, this paper proposes incremental algorithms for finding reducts of incomplete decision systems when conditional attributes are added or deleted. The incremental algorithms only update the reduct with respect to the changed attribute set, rather than recomputing it on the entire original incomplete decision system, and therefore greatly reduce the runtime. Moreover, the incremental algorithms can be applied to high-dimensional incomplete decision systems by splitting such a system into separate parts by attributes; the final reduct is then computed by adding the parts one by one.
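One way to see why the attribute-wise splitting idea is feasible under the tolerance rough set model is that the tolerance relation over a union of attribute sets is the element-wise intersection of the relations over each set. A tolerance matrix can therefore be refined block by block as attribute parts arrive, rather than rebuilt from the full system. The sketch below illustrates only this refinement step, not the authors' full reduction algorithm; the function names and toy data are assumptions.

```python
# Hypothetical sketch: refine a tolerance matrix block by block as new
# conditional attribute parts are added; "*" marks a missing value.
MISSING = "*"

def tolerance_matrix(objects, attrs):
    """Boolean matrix T[i][j]: objects i and j are tolerant under attrs."""
    n = len(objects)
    return [[all(objects[i][a] == objects[j][a]
                 or objects[i][a] == MISSING
                 or objects[j][a] == MISSING for a in attrs)
             for j in range(n)] for i in range(n)]

def refine(T, objects, new_attrs):
    """Tolerance over A ∪ B is the element-wise AND of the matrices over
    A and B, so only the newly added attribute block must be scanned."""
    T_new = tolerance_matrix(objects, new_attrs)
    n = len(T)
    return [[T[i][j] and T_new[i][j] for j in range(n)] for i in range(n)]

objects = [
    {"c1": 1, "c2": "*"},
    {"c1": 1, "c2": 0},
    {"c1": 1, "c2": 1},
]
T = tolerance_matrix(objects, ["c1"])  # all pairs tolerant on c1 alone
T = refine(T, objects, ["c2"])         # adding c2 separates x1 from x2
```

The same element-wise AND applies when further attribute parts are added, which is what allows a high-dimensional system to be processed one part at a time.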
In dynamic complete decision systems, many incremental algorithms for finding reducts have been proposed so far, based on traditional rough sets and several extended rough set models. For the addition or deletion of objects, researchers have proposed incremental attribute reduction algorithms based on various measures, such as distance measures (Demetrovics et al., 2014; Huong & Giang, 2016), information granularity (Jing, Li, Fujita, Yu, & Wang, 2017; Jing, Li, Huang, Chen, & Hong, 2017), fuzzy attribute dependency (Liu et al., 2017), the fuzzy discernibility relation (Yang et al., 2017), the extended discernibility matrix (Lang et al., 2017; Yang et al., 2017; Wei et al., 2018; Ma et al., 2019), the positive region (Das, Sengupta & Bhattacharyya, 2018; Lang et al., 2018; Hao et al., 2019), membership functions (Shua, Qian & Xie, 2019), the simplified discernibility matrix (Yang et al., 2019), the indiscernibility relation (Nandhini & Thangadurai, 2019), covering granularity (Cai et al., 2019), information entropy (Shu, Qian & Xie, 2020), the key instance set in fuzzy rough sets (Ni et al., 2020), fuzzy rough set based information entropy (Zhang et al., 2020), and the discernibility matrix (Liu et al., 2020).
For the addition or deletion of attributes, many incremental attribute reduction algorithms have also been proposed, based on measures such as the tolerance matrix in set-valued decision systems (Zhang, Li & Ruan, 2012), information entropy (Wang, Liang & Qian, 2013), the fuzzy dependency function in fuzzy rough sets (Zeng et al., 2015), distance measures (Demetrovics et al., 2016), the dependency function (Raza & Qamar, 2016), knowledge granularity (Jing, Li & Huang, 2016), approximation sets in covering decision systems (Cai, Li & Ma, 2017), knowledge granularity with a multi-granulation view (Jing et al., 2017), the compressed binary discernibility matrix (Ma et al., 2019), the discernibility matrix in compacted decision systems (Wei et al., 2019), the indiscernibility relation (Nandhini & Thangadurai, 2019), and discernibility relations (Chen, Dong & Mi, 2020).