A Machine Learning-Based Framework for Diagnosis of Breast Cancer

Ravi Kumar Sachdeva, Priyanka Bathla
Copyright: © 2022 | Pages: 11
DOI: 10.4018/IJSI.301221

Abstract

Machine learning is used in the health care sector because of its ability to make predictions. Breast cancer is now a major cause of death among women. In this paper, a machine learning-based framework for the diagnosis of breast cancer is proposed. The authors apply three feature selection methods to the Breast Cancer Wisconsin (Diagnostic) dataset: Chi-square, Pearson correlation between features, and Feature Importance (FI). The effectiveness of the feature selection methods is analyzed using different machine learning classifiers and performance measures such as accuracy, sensitivity, specificity, precision, and F-measure. The classifiers used are Random Forest (RF), Extra Tree Classifier (ETC), and Logistic Regression (LR). Results reveal that FI is the best-performing feature selection method among those tested when applied with the different classifiers. Results also show that the ETC classifier gives the best accuracy in comparison with the RF and LR classifiers.

Introduction

Breast cancer is an increasingly common disease among women. With an age-adjusted prevalence of 25.8 per 100,000 women, it now ranks as the most common cancer among Indian women. Breast cancer cases among women are more numerous in less developed regions than in more developed regions (Malvia et al., 2017).

The machine learning field is constantly evolving. It allows computers to learn automatically without human intervention and helps them build models from sample data to make predictions. Supervised learning is used for classification and regression problems. In supervised learning, the program is trained on a set of predefined training data, and its accuracy is then checked on test data (Simon et al., 2015).
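The train-then-test procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy two-feature data, the nearest-centroid classifier, and the helper names (`train_test_split`, `fit_centroids`, `predict`, `accuracy`) are hypothetical and are not the classifiers or the dataset used in this paper.

```python
import random


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle the sample indices and hold out a fraction for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


def fit_centroids(rows, labels):
    """'Training' step: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def accuracy(centroids, rows, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Toy data (made up for illustration): two well-separated classes.
X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1],
     [5.0, 5.2], [4.8, 5.0], [5.1, 4.9], [4.9, 5.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y)
model = fit_centroids(X_train, y_train)
acc = accuracy(model, X_test, y_test)
print(f"held-out accuracy: {acc:.2f}")
```

The key point is that `accuracy` is computed only on samples the model never saw during fitting, which is what makes it an estimate of predictive performance.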

Feature selection is the process of reducing the number of features while creating a predictive model, by identifying and selecting the relevant features from those available. The purpose of feature selection methods is to identify and delete unnecessary features from the input data that do not help the model perform better (Vanaja and Kumar, 2014). Feature selection methods fall into three categories: filter, wrapper, and embedded. Filter methods rank features according to some criterion and then select the highest-ranked features. Wrapper methods evaluate combinations of features by training a model on each and keep the best-performing subset. Embedded methods perform feature selection during model training (Miao and Nio, 2016).
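As a rough illustration of the filter approach, the sketch below scores each feature independently and keeps the top-ranked ones. The ranking criterion chosen here, the absolute Pearson correlation between a feature and the class label, is an assumption made for the example and is not necessarily the criterion used by the authors; the feature names and values are likewise hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_top_k(rows, labels, names, k):
    """Filter method: rank features by |correlation with label|, keep the top k."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), name))
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scores[:k]]


# Hypothetical toy data: 'radius' tracks the label, 'texture' weakly,
# and 'constant' carries no information at all.
names = ["radius", "texture", "constant"]
rows = [[1.0, 3.0, 4.0], [2.0, 1.0, 4.0], [1.5, 2.0, 4.0],
        [5.0, 2.0, 4.0], [6.0, 3.0, 4.0], [5.5, 2.0, 4.0]]
labels = [0, 0, 0, 1, 1, 1]

selected = select_top_k(rows, labels, names, k=2)
print(selected)
```

A wrapper method would instead retrain a classifier for each candidate subset, which is more expensive but takes feature interactions into account.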

In this paper, breast cancer is diagnosed using different combinations of classifiers and feature selection methods. The authors compare the performance of various feature selection methods with different machine learning classifiers. The research contributions of the paper are as follows:

  1. Find out the best method for feature selection

  2. Propose a methodology for diagnosis of breast cancer

  3. For diagnosis of breast cancer, find optimal mix of classifier and feature selection approach

The remainder of the paper is organized as follows. Section 2 reviews the relevant literature. Section 3 presents the proposed methodology. Section 4 reports the experimental results of the different feature selection methods. Section 5 presents the conclusions and recommendations.

New systems for disease diagnosis have been developed as technology in the medical industry has advanced. The following studies are related to the topic of this paper:

On the Wisconsin breast cancer dataset, Islam et al. (2020) compared the accuracy, specificity, sensitivity, precision, F1 score, false-positive rate, negative predictive value, and Matthews correlation coefficient of five machine learning techniques: K-nearest Neighbors (KNN), LR, RF, Support Vector Machine (SVM), and Artificial Neural Networks (ANNs). According to the authors, ANNs had the best precision, accuracy, and F1 score.

Alickovic and Subasi (2015) used a Genetic Algorithm (GA) as a feature selection technique to eliminate insignificant features. The authors applied several machine learning techniques, including LR, Decision Tree (DT), RF, and SVM, to the Wisconsin datasets and found that RF combined with GA feature selection gives the highest accuracy.

Fatih (2020) applied the LR, DT, KNN, Naive Bayes (NB), RF, and Rotation Forest machine learning techniques to the Wisconsin breast cancer dataset (WBCD). The author ran the classification algorithms in three configurations: ‘All features included’, ‘Highly correlated features included’, and ‘Low correlated features included’. Results reveal that LR had the best classification accuracy with all features.
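The performance measures named in this paper and in the studies above (accuracy, sensitivity, specificity, precision, and F-measure) can all be derived from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the label vectors are made up for illustration and are not results from any of the cited works, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Compute the five measures from the confusion-matrix counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        # F-measure: harmonic mean of precision and sensitivity.
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Made-up labels: 1 = malignant (positive class), 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

scores = classification_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In a diagnostic setting, sensitivity and specificity are usually reported alongside accuracy because a missed malignant case and a false alarm carry very different costs.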
