Introduction
One of the most common problems encountered in many data mining tasks, including music data analysis and signal processing, is the 'curse of dimensionality.' This problem arises with high-dimensional data containing a massive number of attributes. Using the whole set of attributes is inefficient in terms of processing time and storage requirements; in addition, the data may be difficult to interpret and the classification performance may decrease. The solution to this problem is to remove irrelevant and redundant features and select the most important features, which may yield a better classifier (Liu, Jiang, & Yang, 2010). This process is known as feature selection or attribute reduction.
It has been proven that finding all possible reducts in an information system is an NP-hard problem; for that reason, the reduction of data ought to be properly addressed. The theory of rough sets proposed by Pawlak in the 1980s (Pawlak, 1982) and the theory of soft sets proposed by Molodtsov (1999) have emerged as powerful tools for dealing with the uncertainty that arises from inexact, noisy, or incomplete information. In feature selection, rough set theory is applied with the aim of finding the minimal subsets of attributes that are sufficient to produce the same classification accuracy as the whole set of attributes. Such a minimal feature set is known as a reduct. Banerjee et al. (2006) stated that the concepts of reduct and core in rough set theory are relevant in feature selection for identifying the essential features among the non-redundant ones. Liu, Jiang, and Yang (2010) also noted that rough-set-based reduction has been applied by many researchers to feature selection problems; in their work, the concept of inconsistency in attribute reduction is proposed.

Soft sets, in turn, are also called elementary neighborhood systems. Molodtsov pointed out that one of the main advantages of soft set theory is that it is free from the inadequacy of the parameterization tools found in theories such as fuzzy sets, probability, and interval mathematics (Molodtsov, 1999). Soft set theory has also been applied as a dimensionality reduction method (Maji, Roy, & Biswas, 2002; Chen et al., 2003, 2005; Kong et al., 2008). Maji, Roy, and Biswas (2002) applied soft set theory to a decision-making problem with the help of Pawlak's rough reduct: the reduct soft set algorithm, defined from rough set theory, is employed as a reduction method, and a weighted choice value is then embedded in the algorithm to select the optimal decision. Chen et al. (2003, 2005) and Kong et al. (2008) studied the parameterization reduction of soft sets and its applications. They highlighted two major problems in Maji, Roy, and Biswas (2002): the result of computing the reduction is incorrect, and the algorithm used to compute the reduction and then select the optimal objects is not reasonable. To address these problems, they presented a new definition of the parameterization reduction of soft sets based on the concepts of attribute reduction in rough set theory.
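To make the reduct and dependency notions above concrete, the following minimal Python sketch (using a hypothetical toy decision table, not data from this study) computes indiscernibility classes, the positive region, the dependency degree of the decision attribute on a set of condition attributes, and all reducts by exhaustive search; the exhaustive search also illustrates why finding all reducts becomes intractable as the number of attributes grows.

```python
from itertools import combinations

def partition(rows, attrs):
    """Group object indices by their values on the attribute subset `attrs`."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, set()).add(i)
    return list(blocks.values())

def dependency(rows, cond_attrs, dec_attr):
    """Dependency degree gamma_B(D) = |POS_B(D)| / |U|."""
    decision_blocks = partition(rows, [dec_attr])
    pos = 0
    for block in partition(rows, cond_attrs):
        # A condition block lies in the positive region if it is
        # contained in a single decision class.
        if any(block <= d for d in decision_blocks):
            pos += len(block)
    return pos / len(rows)

def reducts(rows, cond_attrs, dec_attr):
    """All minimal attribute subsets preserving the full dependency degree."""
    full = dependency(rows, cond_attrs, dec_attr)
    found = []
    for k in range(1, len(cond_attrs) + 1):
        for subset in combinations(cond_attrs, k):
            if dependency(rows, list(subset), dec_attr) == full:
                # Keep only minimal subsets (no already-found reduct inside).
                if not any(set(r) <= set(subset) for r in found):
                    found.append(subset)
    return found

# Hypothetical decision table: a1, a2, a3 are condition attributes, d the decision.
table = [
    {"a1": 1, "a2": 0, "a3": 1, "d": "yes"},
    {"a1": 1, "a2": 1, "a3": 0, "d": "yes"},
    {"a1": 0, "a2": 1, "a3": 1, "d": "no"},
    {"a1": 0, "a2": 0, "a3": 0, "d": "no"},
]
print(reducts(table, ["a1", "a2", "a3"], "d"))  # -> [('a1',), ('a2', 'a3')]
```

In this toy table both {a1} and {a2, a3} classify the objects as well as the full attribute set, so each is a reduct, while no single attribute other than a1 suffices.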
In this paper, we attempt to apply soft set theory as a data cleansing technique, developed using matrix computations over multi-soft sets. Furthermore, we propose a feature selection technique using rough set theory based on the maximum dependency of attributes proposed by Herawan, Mustafa, and Abawajy (2010), aimed at the classification of Traditional Malay musical instrument sounds. The main contribution of our work is to delete irrelevant features using the soft set approach and then select the most significant features by ranking the relevant features according to the highest dependency of attributes on the dataset using the rough set approach. After that, redundant features with the same dependency value are deleted.
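As a rough illustration of the ranking-and-deduplication step described above (a sketch, not the authors' exact algorithm), the snippet below assumes the `dependency` function and the toy `table` from the previous sketch are in scope. It ranks each condition attribute by its individual dependency degree on the decision attribute and drops attributes whose dependency value duplicates one already kept; attributes with zero dependency are skipped here as irrelevant, whereas the paper removes irrelevant features with a soft-set-based cleansing step.

```python
def rank_by_dependency(rows, cond_attrs, dec_attr):
    """Rank condition attributes by their dependency degree on the decision."""
    scores = {a: dependency(rows, [a], dec_attr) for a in cond_attrs}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def select_features(rows, cond_attrs, dec_attr):
    """Keep the highest-ranked attributes, dropping duplicates of equal dependency."""
    selected, seen_scores = [], set()
    for attr, score in rank_by_dependency(rows, cond_attrs, dec_attr):
        # Zero-dependency attributes are treated as irrelevant in this sketch;
        # attributes repeating an already-seen score are treated as redundant.
        if score > 0 and score not in seen_scores:
            selected.append(attr)
            seen_scores.add(score)
    return selected

print(select_features(table, ["a1", "a2", "a3"], "d"))  # -> ['a1']
```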