1. Introduction
Breast cancer continues to be a significant public health problem worldwide, and early detection is the key to improving its prognosis. Mammography is one of the most reliable methods for the early detection of breast cancer. Computer-Aided (CA) diagnosis systems with promising performance have been developed to aid radiologists in detecting mammographic lesions. Various CA diagnosis algorithms have been proposed for the characterization of microcalcifications (MCs), an important indicator of malignancy (Cheng et al., 2003, 2006; Thangavel et al., 2005). These algorithms extract image features from regions of interest (ROIs) and estimate the probability of malignancy for a given MC cluster. One of the most important steps in the classification task is extracting suitable features capable of distinguishing between classes, and considerable effort has been devoted to extracting appropriate features from microcalcification clusters (Shen et al., 1994). To reduce the complexity and increase the performance of the classifier, redundant and irrelevant features must be eliminated from the original feature set (Mohanty et al., 2013).
The objective of feature selection is to choose a subset of the available features by eliminating those that are unnecessary. To extract as much information as possible from a given image set while using the minimal number of features, we should discard features with little or no predictive information and ignore redundant features that are strongly correlated with one another (Zhang, 2000; Guyon & Elisseeff, 2003). As a result, a great deal of computation time can be saved. The chosen feature subset influences many aspects of image classification, including the time required to learn a classification function, the accuracy of the learned classifier, the time and space cost associated with the features, and the number of samples required for training.
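To make the idea of discarding strongly correlated (redundant) features concrete, the following is a minimal filter-style sketch; the feature names, toy values, and the 0.95 threshold are illustrative assumptions, not taken from this work:

```python
# Toy filter-style redundancy removal: drop any feature whose absolute
# Pearson correlation with an already-kept feature exceeds a threshold.
# Feature names, values, and threshold are illustrative only.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def drop_correlated(features, threshold=0.95):
    """features: dict name -> list of values; returns the names kept."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept

# Three hypothetical MC-cluster features; "area" and "perimeter" are
# nearly collinear, so one of the pair is removed as redundant.
features = {
    "area":      [10.0, 20.0, 30.0, 40.0],
    "perimeter": [11.0, 21.0, 29.0, 41.0],
    "contrast":  [5.0, 1.0, 4.0, 2.0],
}
print(drop_correlated(features))  # → ['area', 'contrast']
```

In practice a statistical filter like this runs once, independently of any classifier, which is what makes the filter family cheap compared with wrapper search.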
Much progress has been made on feature subset selection in recent years. Such techniques can be categorized from several viewpoints: filter, wrapper and embedded (Muni et al., 2006; Zhu et al., 2007), unsupervised (Liu & Yu, 2005) and supervised (Bhatt & Gopal, 2005; Neumann et al., 2005; Hu et al., 2008a, 2008b), among others (Liu & Yu, 2004). In the wrapper approach, candidate feature subsets are evaluated using a classifier to judge the quality of the selection (Guyon & Elisseeff, 2003), whereas in the filter approach, each subset is evaluated with a statistical measure (Dash & Liu, 1997). The embedded approach combines the strengths of both the wrapper and filter approaches (Huang et al., 2007). Sometimes the dataset may be imbalanced, with a class distribution that is not uniform among the classes; special care must be taken with such datasets. Dash et al. (2013) used information gain theory (a filter approach) to eliminate irrelevant features and differential evolution to tune the centers and spreads of radial basis functions on both balanced and imbalanced data. These methods can follow either supervised or unsupervised learning; wrapper approaches mostly follow the supervised model because they employ a classifier, whereas for unsupervised models the filter approach is widely used. The major step in feature selection is searching for the features from which to construct the subset, and there are a number of ways to perform this search.
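The wrapper approach described above can be sketched as a greedy forward search that scores each candidate subset with a classifier. The 1-nearest-neighbour classifier, leave-one-out scoring, toy data and stopping rule below are illustrative assumptions only, not the method of any cited work:

```python
# Minimal wrapper-style forward selection: greedily add the feature that
# most improves leave-one-out accuracy of a 1-nearest-neighbour classifier.
# Classifier choice, data, and stopping rule are illustrative assumptions.

def loo_accuracy(X, y, idx):
    """Leave-one-out accuracy of 1-NN using only feature columns in idx."""
    correct = 0
    for i in range(len(X)):
        best, pred = None, None
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][k] - X[j][k]) ** 2 for k in idx)
            if best is None or d < best:
                best, pred = d, y[j]
        correct += pred == y[i]
    return correct / len(X)

def forward_select(X, y):
    """Greedily grow the subset until accuracy stops improving."""
    selected = []
    remaining = list(range(len(X[0])))
    best_acc = 0.0
    while remaining:
        acc, f = max((loo_accuracy(X, y, selected + [f]), f)
                     for f in remaining)
        if selected and acc <= best_acc:
            break  # no improvement from any remaining feature: stop
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc

# Hypothetical two-feature samples: column 0 separates the classes well,
# column 1 is noise, so the search keeps only column 0.
X = [(0.0, 5.0), (0.2, 1.0), (0.1, 4.0),
     (1.0, 2.0), (0.9, 5.0), (1.1, 0.0)]
y = [0, 0, 0, 1, 1, 1]
print(forward_select(X, y))  # → ([0], 1.0)
```

Because every candidate subset is scored by retraining or re-evaluating the classifier, wrapper search is far more expensive than a filter measure, which is the trade-off the paragraph above refers to.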