A Hybrid GA-LDA Scheme for Feature Selection in Content-Based Image Retrieval

Khadidja Belattar (Constantine 2 University, Constantine, Algeria), Sihem Mostefai (Constantine 2 University, Constantine, Algeria) and Amer Draa (Constantine 2 University, Constantine, Algeria)
Copyright: © 2018 | Pages: 24
DOI: 10.4018/IJAMC.2018040103

Feature selection is an important pre-processing technique in the pattern recognition domain. This article proposes a hybridization of a Genetic Algorithm (GA) and Linear Discriminant Analysis (LDA) for solving the feature selection problem in Content-Based Image Retrieval (CBIR) applied to dermatological images. In the first step, the input image is preprocessed and segmented, and color and texture features characterizing healthy skin and the segmented skin lesion are derived. A binary GA then evolves chromosomes encoding feature subsets, whose fitness is evaluated by a Logistic Regression classifier. The best features identified are then fed to LDA for a CBIR system based on K-Nearest Neighbor classification. To assess the proposed approach, the authors opted for K-fold cross-validation on a database of 1097 images of melanomas and other skin lesions. As a result, the authors obtained a reduced number of features and an improved Content-Based Dermoscopic Image Retrieval (CBDIR) system compared to the PCA, LDA and ICA methods.

1. Introduction

With the rapid development of multimedia technologies, many kinds of electronic imaging equipment have become widely used, resulting in a growing amount of multimedia information. The proliferation of image databases has driven the growth of the image retrieval field, a challenging and expanding research area. Thus, determining how to effectively and efficiently retrieve a desired image from a constantly growing image database has become a serious issue.

In the dermatology field, traditional image retrieval systems are based on features of the original data, such as skin lesion image type, anatomical location, symptom, pathophysiology and skin lesion diagnosis. When applied to large-scale skin lesion image databases, these features become troublesome, tedious, time-consuming, subjective and even inadequate for describing image content. Therefore, Content-Based Image Retrieval (CBIR) is an interesting alternative. An automatic CBIR system extracts visual information from the image and converts it to a multidimensional feature vector representation. For retrieval, the dissimilarities (distances) between the feature vector of a query image and the feature vectors of the images in the database are computed. Then, the database images that are most similar to the query are returned to the user.
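The retrieval step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Euclidean distance is assumed as the dissimilarity measure, and the function name `retrieve`, the three-dimensional feature vectors, the image identifiers and the labels are all hypothetical, not taken from the article.

```python
import math

def retrieve(query, database, k=3):
    """Return the k database entries closest to the query feature vector.

    Each database entry is a tuple (image_id, feature_vector, label).
    Euclidean distance is an assumption made for this sketch.
    """
    ranked = sorted(database, key=lambda entry: math.dist(query, entry[1]))
    return ranked[:k]

# Hypothetical toy database of already-extracted feature vectors.
db = [
    ("img_a", [0.10, 0.20, 0.30], "melanoma"),
    ("img_b", [0.80, 0.90, 0.70], "other"),
    ("img_c", [0.15, 0.25, 0.35], "melanoma"),
    ("img_d", [0.50, 0.50, 0.50], "other"),
]
query = [0.12, 0.22, 0.31]

# Print the two most similar images with their distance to the query.
for image_id, features, label in retrieve(query, db, k=2):
    print(image_id, label, round(math.dist(query, features), 4))
```

In a real system the feature vectors would come from the color and texture descriptors computed on each image, and the sorted scan would typically be replaced by an index structure once the database grows.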

A wide variety of CBIR techniques have been proposed, including a hierarchical Bayesian model (Baldi, Quartulli, Murace, Dragonetti, Manganaro, Guerra & Bizzi, 2010), Support Vector Machines (Hui, Caiming & Hua, 2010) and semi-supervised SVM batch mode active learning approaches (Hoi, Jin, Zhu & Lyu, 2009).

Content-Based Dermoscopic Image Retrieval (CBDIR) may be of particular interest to both inexperienced and experienced clinicians for Computer-Aided Diagnosis purposes when it is partly based on dermoscopic skin lesion images. Dermoscopy, the examination of skin using skin surface microscopy, is the most common modality used in dermatology, since it increases the sensitivity and specificity of diagnosis compared to naked-eye examination.

In a typical diagnosis process, a dermatologist frequently needs to label an unknown skin disease represented by a skin image. The dermatologist can then submit such an image to a CBDIR system, which identifies relevant, correctly labeled cases from a known database. In this way, physicians can improve the quality of skin lesion diagnosis by using the CBDIR system as a second opinion.

Hence, the CBDIR system requires the analysis of the actual visual content of dermoscopic images. This content can be described by color, texture, spatial relationships, shape features and patterns. However, the computed features may not reflect the clinical and morphological parameters applied by physicians for diagnosis of the skin lesion; this is a common issue in CBIR known as the semantic gap.

To improve the performance of CBDIR systems, it is important to have a proper image feature set that precisely describes the content of the image. The more relevant the image features are, the higher the retrieval performance will be.

Relevant pattern recognition and CBDIR are two related problems. From a machine learning standpoint, relevant pattern recognition is a feature selection problem, while CBDIR is seen as a supervised classification problem. The latter refers to visually retrieving, from the image database, the images of known labels that are most similar to the user query, using as input the significant features/patterns identified in the selection step. The query sample is then assigned to the appropriate type of skin disease.

In this context, effective dimensionality reduction methods are needed for pattern recognition. These methods map the original feature space into a new space of reduced dimensionality; the examples used by machine learning algorithms are then represented in this new space. This mapping is usually performed in one of two ways: feature reduction or feature selection.

Feature reduction produces new features as linear combinations of the original ones and discards the less important of the resulting features. It involves a mathematical transformation that projects the original feature space into a lower-dimensional space, while preserving as much relevant information as possible.
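In symbols, and using notation that is ours rather than the article's, a linear feature reduction of this kind can be written as a projection of a d-dimensional feature vector onto k < d new features:

```latex
\[
  \mathbf{y} = W^{\top}\mathbf{x},
  \qquad \mathbf{x} \in \mathbb{R}^{d},\;
  W \in \mathbb{R}^{d \times k},\;
  \mathbf{y} \in \mathbb{R}^{k},\;
  k < d .
\]
```

For PCA, for example, the columns of W are the k leading eigenvectors of the covariance matrix of the (centered) data. Feature selection corresponds to the special case in which each column of W is a distinct standard basis vector, so that y simply keeps k of the original d features unchanged.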

Among the most popular approaches used for skin lesion recognition, we cite Principal Component Analysis (PCA) (Rahman, Bhattacharya & Desai, 2006; Celebi & Aslandogan, 2004), Independent Component Analysis (ICA) (Trojan, 2004), LImited RAnk Matrix Learning Vector Quantization (LiRaM LVQ) and Large Margin Nearest Neighbor (LMNN) (Bunte, Biehl, Jonkman & Petkov).
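The other route, feature selection, is the one the abstract describes: a binary GA evolves feature subsets. The following is a minimal Python sketch of that idea, not the authors' implementation. Each chromosome is a 0/1 mask over the features. As a simplifying assumption, fitness here is leave-one-out 1-nearest-neighbor accuracy minus a small per-feature penalty, used as a stand-in for the Logistic Regression classifier mentioned in the abstract. The toy data, the penalty, the genetic operators and all names and parameter values are hypothetical.

```python
import math
import random

# Hypothetical toy data: feature 0 separates the classes, features 1-2 are noise.
X = [[0.1, 0.9, 0.3], [0.2, 0.1, 0.8], [0.0, 0.5, 0.1],
     [0.9, 0.8, 0.2], [1.0, 0.2, 0.9], [0.8, 0.4, 0.0]]
y = [0, 0, 0, 1, 1, 1]

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only features where mask[j] == 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = math.inf, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the sample itself out
            d = sum((xi[j] - xk[j]) ** 2 for j in idx)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)

def fitness(X, y, mask, penalty=0.01):
    """Stand-in fitness (assumption): accuracy minus a cost per selected feature."""
    return loo_1nn_accuracy(X, y, mask) - penalty * sum(mask)

def binary_ga(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve 0/1 feature masks with elitism, tournaments, one-point crossover."""
    rng = random.Random(seed)
    n = len(X[0])
    score = lambda m: fitness(X, y, m)
    # Seed the population with the all-features mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        children = [pop[0][:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, n)                # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
    return max(pop, key=score)

best_mask = binary_ga(X, y)
print(best_mask, fitness(X, y, best_mask))
```

Because the all-features mask is in the initial population and the best individual is always carried over, the returned mask never scores worse than using every feature. In the pipeline outlined in the abstract, the selected features would then be passed on to LDA and the K-Nearest Neighbor retrieval stage.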
