Severity of Breast Mass Prediction in Mammograms Based on an Optimized Naive Bayes Diagnostic System

Abeer S. Desuky
DOI: 10.4018/978-1-6684-5092-5.ch012

Abstract

Mammography is the most effective tool for breast mass screening. It is a specialized low-dose X-ray technique used to detect breast tumors early and accurately. Detecting tumors at an early stage has improved the survival rate of breast cancer patients. Computer-aided diagnostic systems help physicians detect breast cell abnormalities earlier than traditional procedures do. The main aim of this chapter is to improve physicians' ability to determine the severity of a mammographic mass lesion from the BI-RADS features and the patient's age, using the bio-inspired chicken swarm optimization (CSO) algorithm to optimize a Naive Bayes classifier (NBC). The mammographic mass dataset is used to analyze the proposed method (CSO-NBC). The dataset is preprocessed, and the CSO-NBC system is trained and tested using 5-fold cross-validation. The performance of the proposed classification system is compared with results from other research to show how accurately it predicts the severity of breast tumors.
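The 5-fold cross-validation protocol mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function names `k_fold_indices` and `cross_validate`, the shuffling seed, and the plain (non-stratified) split are all assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k shuffled test folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index starting at position j.
    return [indices[j::k] for j in range(k)]


def cross_validate(n_samples, evaluate, k=5, seed=0):
    """Average evaluate(train_idx, test_idx) over the k folds.

    `evaluate` is a caller-supplied (hypothetical) function that trains on
    the training indices and returns a score on the test indices.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for j, test_idx in enumerate(folds):
        # Every fold except fold j forms the training set.
        train_idx = [i for m, fold in enumerate(folds) if m != j for i in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```

Each sample is used for testing exactly once and for training in the other four folds, so the averaged score uses all of the data without testing a model on samples it was trained on.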

Introduction

Cancer is one of the leading causes of death. It is caused by the uncontrolled growth of cells that invade and spread through the body, often resulting in death. It is the second leading cause of death globally and was responsible for 8.8 million deaths in 2015¹. According to the World Health Organization (WHO), nearly 1 in 6 deaths worldwide is due to cancer. Per the US National Cancer Institute, 60 percent of the world’s new cancer cases occur in Asia, Africa, and Central and South America, and 70 percent of global cancer deaths occur in those same regions. Moreover, breast cancer caused 571,000 deaths in 2015, equivalent to 6.5 percent of all cancer deaths² (http://www.worldatlas.com). For this reason, early detection and accurate diagnosis play a key role in reducing mortality from this disease.

Mammography is an efficient imaging tool for the early detection of breast cell abnormalities. There are two types of mammography. The first, screening mammography, is a breast X-ray used to check for changes in the breast area so that tumors can be detected early in women with no symptoms of cancer. It can also detect tiny calcium deposits (microcalcifications), which are one of the manifestations of cancer. The second, diagnostic mammography, is a breast X-ray used to check for signs of breast cancer after a mass or other symptom has been detected. Symptoms include changes in breast size or shape, skin thickening, pain, or nipple discharge (Sickles et al., 2002; Ramani & Vanitha, 2014).

Significant improvements can be made in the lives of breast cancer patients by detecting cancer early and avoiding delays in care. Computer-aided diagnosis (CAD) and data-driven clinical decision support systems (CDSS) can help physicians perform this task faster and more accurately than traditional procedures.

Over the last decade, breast cancer detection has improved with the development of machine learning approaches in diagnostic systems. The aim of using machine learning approaches is to minimize the mistakes that specialists may make in diagnosis (Güzel et al., 2013).

Recently, many researchers have focused on intelligent techniques and systems. Computer-aided diagnosis and decision support have been studied for tumor diagnosis, with breast tumors receiving special attention. Some studies proposed a single technique, while others proposed a combination of techniques to obtain the most reliable results on different data. Ramani and Vanitha (2014) used weighted histogram algorithms to select features from mammogram data and then classified the selected features using random forest, naive Bayes, and ANN algorithms. Another method, proposed by Sahar and Alaa (2013), was based on missing-value imputation; three models were derived for prediction: a support vector machine (SVM) with a polynomial kernel, an ANN with pruning parameters, and a DT with Chi-squared interaction detection.

Alickovic and Subasi (2017) proposed a two-stage system. In the first stage, a genetic algorithm was used to extract the most useful and essential features from a data set containing 569 cases and 32 features. In the second stage, the authors applied individual and hybrid data mining techniques to find the most accurate result. The best accuracy, 99.48%, was obtained by the Random Forest method.

Wang et al. (2018) combined Support Vector Machine (SVM) algorithms with the Weighted Area Under the Receiver Operating Characteristic Curve Ensemble (WAUCE) to propose a novel method for increasing accuracy and reducing the variance of breast cancer diagnosis. The proposed method was evaluated using three data sets, two of which are published and one of which is real-world data. The highest accuracy, 97.68%, was achieved on the WDBC data set with 10-fold cross-validation. Joshi and Mehta (2018) tested the accuracy of their proposed K Nearest Neighbor (KNN) diagnosis technique in the R environment using a data set with 569 cases and 32 features. They applied their technique with two widely used feature selection methods: Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). LDA achieved the better accuracy, at 97.06%.

Key Terms in this Chapter

Chicken Swarm Optimization: A bio-inspired swarm algorithm that simulates the behavior and hierarchical order of a chicken swarm while it searches for food. In the CSO algorithm, each chicken is considered a candidate solution to the optimization problem.

World Health Organization (WHO): A part of the United Nations (UN) that deals with the major issues of health worldwide. WHO sets standards for health care, disease control, and medicines.

Feature Extraction: Selecting the most appropriate or optimal set of features, that is, the set that yields the best classification accuracy.

Bio-Inspired Swarm Algorithms: Population-based algorithms in which the population consists of a number of unsophisticated agents. Each agent is considered a possible solution to the optimization problem at hand.

Feature Weight: The degree of feature importance in the process of classification.

Classifier: A machine learning algorithm used to classify the dataset samples into two or more categories or classes.
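To make the Feature Weight and Classifier terms concrete, the sketch below shows one common way feature weights can enter a Naive Bayes classifier: each feature's log-likelihood is multiplied by its weight, so the predicted class maximizes log P(c) + Σ wᵢ · log P(xᵢ | c). The preview does not state how the chapter couples CSO with the classifier, so this formulation is an assumption. The function names, the add-one smoothing, and the toy shape and margin values are likewise hypothetical and are not taken from the mammographic mass dataset.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_nb(model, row, weights):
    """Return the class with the highest weighted log-posterior score."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one (Laplace) smoothing avoids log(0) for unseen values.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += weights[i] * math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (mass shape, mass margin) -> severity.
rows = [
    ("round", "circumscribed"), ("round", "circumscribed"),
    ("oval", "circumscribed"), ("irregular", "spiculated"),
    ("irregular", "ill-defined"), ("lobular", "spiculated"),
]
labels = ["benign"] * 3 + ["malignant"] * 3
model = train_nb(rows, labels)
```

With all weights equal to 1 this reduces to the standard Naive Bayes rule. A weight of 0 removes a feature from the decision, which is how a weight vector proposed by an optimizer such as CSO could change the predicted severity for the same input.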
