Efficiency and Scalability Methods in Cancer Detection Problems

Inna Stainvas (General Motors - Research & Development, Israel) and Alexandra Manevitch (Siemens Computer Aided Diagnosis Ltd., Israel)
Copyright: © 2013 | Pages: 20
DOI: 10.4018/978-1-4666-3942-3.ch004

A computer-aided detection (CAD) system for cancer detection from X-ray images is in high demand among radiologists. For CAD systems to be successful, a large amount of data must be collected. This poses new challenges for developing learning algorithms that are efficient and scalable to large datasets. One way to achieve this efficiency is through good feature selection.
Chapter Preview


A CAD system for cancer detection in mammography and chest X-ray (CXR) imaging is a viable and efficient source of assistance to radiologists. CAD systems are based on statistical modeling of cancer using data from collected X-ray images. These images are extremely heterogeneous owing to substantial differences in the human population and in image acquisition conditions. Modeling cancer detection therefore requires large datasets that represent this variability. Recently, this has become possible thanks to exceptional growth in computing power, storage capacity, and networking technology.

Today, medical datasets are growing rapidly. This poses new challenges for developing learning algorithms that are both efficient and scalable with dataset size. There are two main approaches to addressing the challenge: intelligently manipulating large amounts of data, and reducing its dimensionality. Data manipulation is non-trivial for medical data: first, because data labeling by medical experts is costly; second, because medical data is imbalanced, i.e., the number of pathological cases is always insufficient. The second approach therefore emerges as the important one in medical applications: it is preferable to describe data succinctly rather than to manipulate it.

This chapter presents two new feature selection methods that we propose. Feature selection is usually done in two steps, referred to as filters and wrappers (Dash, 1997). In the filter stage, the best features are selected according to some heuristic goodness measure, such as mutual information (MI) or the Fisher discriminative measure. This stage is blind to the specific classifier and does not optimize the classifier's objective. Tuning to the classifier is done later, in the wrapper stage, where the number of features (or the filter-stage parameters) that best optimizes the classifier is found. While this two-stage approach is effective for reducing data dimensionality, it does not guarantee that the selected feature set is optimal for a given classifier.
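The two-stage scheme can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the Fisher score is used as the filter-stage goodness measure, and the wrapper is abstracted as a caller-supplied `evaluate` function (a hypothetical name) that returns the validation score of a given classifier on a candidate feature subset.

```python
def fisher_score(pos, neg):
    """Fisher discriminative measure for one feature: squared
    difference of class means over the sum of class variances."""
    mp = sum(pos) / len(pos)
    mn = sum(neg) / len(neg)
    vp = sum((x - mp) ** 2 for x in pos) / len(pos)
    vn = sum((x - mn) ** 2 for x in neg) / len(neg)
    return (mp - mn) ** 2 / (vp + vn + 1e-12)

def filter_stage(X_pos, X_neg):
    """Rank feature indices by Fisher score, best first.
    X_pos / X_neg: lists of samples, each a list of feature values.
    This stage never consults the classifier."""
    d = len(X_pos[0])
    scores = [fisher_score([x[j] for x in X_pos],
                           [x[j] for x in X_neg]) for j in range(d)]
    return sorted(range(d), key=lambda j: scores[j], reverse=True)

def wrapper_stage(ranked, evaluate, max_k=None):
    """Tune to the classifier: keep the number of top-ranked
    features that maximizes the validation score `evaluate(subset)`."""
    max_k = max_k or len(ranked)
    best_k = max(range(1, max_k + 1),
                 key=lambda k: evaluate(ranked[:k]))
    return ranked[:best_k]
```

Note that the wrapper only searches over *how many* top-ranked features to keep; it cannot revisit the ranking itself, which is why the combined set is not guaranteed optimal for the classifier.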

Although many different feature goodness measures have been proposed, the stability of feature selection is an issue that has rarely been addressed (Loscalzo, Yu, & Ding, 2009). Like Loscalzo et al. (2009), we separate features into clusters; unlike that work, however, we use feature-space clustering to enforce diversity in feature selection: features from different clusters are dissimilar and, as such, largely independent of each other.

After separating the features into clusters, we select a fixed number of the best features from each cluster. Our selection is based on a new goodness measure that combines a feature's discriminative power with its stability; the balance between discrimination and stability is controlled by a regularization parameter. We define the final stability of feature selection as the stability of the classification results on unseen data. The efficacy of the method is demonstrated on real data for cancer detection in CXR images.
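The cluster-wise selection step can be sketched as below. The chapter does not give the exact form of the combined measure, so the linear combination here (with regularization parameter `lam`) is an assumption for illustration; the per-cluster top-n selection follows the description above.

```python
def combined_goodness(disc, stab, lam):
    """Combine a feature's discriminative power `disc` with its
    stability `stab`. `lam` in [0, 1] is the regularization
    parameter (lam = 0: pure discrimination; lam = 1: pure
    stability). The linear form is an assumed stand-in, not the
    chapter's actual formula."""
    return (1.0 - lam) * disc + lam * stab

def select_per_cluster(clusters, goodness, n_per_cluster):
    """Take the `n_per_cluster` highest-goodness features from
    every cluster, so the final set is diverse across clusters.
    clusters: dict mapping cluster id -> list of feature indices.
    goodness: dict mapping feature index -> combined goodness."""
    chosen = []
    for feats in clusters.values():
        ranked = sorted(feats, key=lambda f: goodness[f], reverse=True)
        chosen.extend(ranked[:n_per_cluster])
    return sorted(chosen)
```

Because one winner is drawn from each cluster rather than from a single global ranking, highly correlated features (which land in the same cluster) cannot crowd out the rest of the selection.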

Our second feature selection method is designed for a specific type of classifier and is based on compressive sensing (CS). It finds features that are wired into the classifier's architecture, instead of blindly selecting features on the basis of external heuristics, and it trains the classifier robustly. This method enables classifiers to be trained in very high-dimensional spaces and on large amounts of data.

We adapted compressive sensing approaches to find the salient features required for classification. Like Davenport (2007), we restrict ourselves to a specific class of projective classifiers, so that the obtained features are wired into the classifier's structure. Our feature selection is driven by the classification task rather than by data reconstruction, and it avoids the usual two-step feature selection procedure. The method efficiently reduces the computational burden and is computationally light-weight.
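The projective idea can be illustrated with the standard compressive-sensing ingredient: a random Gaussian measurement matrix that maps each d-dimensional sample to k << d projections, on which a classifier then operates directly. This sketch only shows the projection step and is not the chapter's method; the matrix shape and scaling are common CS conventions.

```python
import math
import random

def measurement_matrix(k, d, seed=0):
    """k x d Gaussian random matrix, a standard compressive-sensing
    choice; entries scaled by 1/sqrt(k) so projected norms roughly
    match the originals."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
            for _ in range(k)]

def project(A, x):
    """Compressive measurements y = A x: the projective classifier
    is trained on the k values of y instead of the d raw features,
    so the 'features' are built into the classifier's structure."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]
```

In this setting there is no separate filter/wrapper pass: dimensionality reduction and classification share the same projection, which is what makes the approach computationally light-weight.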
