Intelligent Information Retrieval for Reducing Missed Cancer and Improving the Healthcare System

Madhu Kumari, Prachi Ahlawat
Copyright: © 2022 | Pages: 25
DOI: 10.4018/IJIRR.2022010102

Abstract

This study presents an intelligent information retrieval system that extracts useful information from breast cancer datasets and uses that information to build a classification model. The proposed model aims to reduce the missed cancer rate by providing comprehensive decision support to the radiologist. The model is built on two datasets: the Wisconsin Breast Cancer Dataset (WBCD) and 365 free-text mammography reports from a hospital. The datasets were prepared for learning with several pre-processing steps: missing values were filled using regression, a Natural Language Processing (NLP) parser was developed to handle the free-text mammography reports, and the dataset was balanced with the Synthetic Minority Oversampling Technique (SMOTE). The most relevant features were selected with a filter method and tf-idf scores. The K-NN and SGD classifiers were then optimized, the former by choosing the optimum value of k and the latter by tuning its hyperparameters with grid search.
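
To make the idea of choosing an optimum k for K-NN concrete, the following is a minimal sketch only. It assumes leave-one-out cross-validation as the selection criterion, which the abstract does not state, and it runs on a tiny synthetic two-feature dataset rather than WBCD or the mammography reports. The function names (knn_predict, loo_accuracy, best_k), the candidate values of k, and the tie-break toward the smallest k are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Label a query point by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            hits += 1
    return hits / len(X)


def best_k(X, y, candidates):
    """Return the candidate k with the highest leave-one-out accuracy.

    Ties are broken in favour of the smallest k (an arbitrary assumption).
    """
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    chosen = max(candidates, key=lambda k: (scores[k], -k))
    return chosen, scores


# Synthetic, well-separated toy data (NOT taken from the paper):
# "B" stands for benign, "M" for malignant.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
     (3.0, 3.2), (3.1, 2.9), (2.8, 3.0), (3.3, 3.1)]
y = ["B", "B", "B", "B", "M", "M", "M", "M"]

# Odd values of k avoid tied votes when there are two classes.
chosen, scores = best_k(X, y, [1, 3, 5, 7])
print("leave-one-out accuracy per k:", scores)
print("selected k:", chosen)
```

On real data, the same loop would normally run over features that have already been imputed, balanced, and selected as the abstract describes. A library grid search could replace the hand-written loop for both the K-NN and the SGD classifier.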

Introduction

Cancer is a major chronic health risk worldwide: 12.7 million cases were reported in 2008, and the number is predicted to rise to 21 million by 2030 (Society, 2011). Breast cancer is the most invasive life-threatening disease among females. Late diagnosis and the high cost of treatment lead to high mortality rates. Cancer arises when cells begin to grow uncontrollably, forming lumps that can be fatal. These lumps are known as tumors, which can be benign (non-cancerous) or malignant (cancerous). Most breast cancers are discovered in the milk-producing glands, called lobules, or in the ducts that connect to the nipple. Tumors are small in the initial stages and may not cause noticeable symptoms; therefore, breast cancer is difficult to diagnose early. However, advances in diagnostic techniques allow the oncologist to detect breast cancer during the developing stages. Accurate and timely detection of cancer helps oncologists devise effective treatment strategies that can increase patient survival (Jemal, 2005). Early diagnosis requires a reliable and robust diagnostic system that can accurately distinguish between malignant and benign tumors. Machine learning techniques are increasingly being applied to improve diagnostic capabilities (Osareh, 2010; Kumari, 2017). With the assistance of machine learning techniques, the possibility of human error can be minimized, and healthcare data can be analyzed rapidly with a higher degree of accuracy (Dutra, 201). Statistically, early tumor detection increases the chance of successful treatment by 30% and improves overall survival rates (Elmore, 2003; Veronesi U, 2005). Consequently, efficient diagnostic techniques are required to detect tumors at an early stage so that effective treatment plans and strategies for long-term survival can be prepared. Medical experts and researchers are intensifying efforts to improve detection rates of the disease in its initial stages.

Mammography is a preliminary diagnostic screening exam that uses low-dose x-rays to visualize potentially malignant breast tumors, with a detection accuracy of 80% (Elmore J. G., 2005). Each breast screening results in a minimum of one x-ray image and one free-text report narrated by a radiologist. Each mammography report is assessed and categorized according to the Breast Imaging-Reporting and Data System (BI-RADS) (Liberman, 2002), a standardized classification system issued by the American College of Radiology for risk assessment that brings uniformity to radiologists' reports. Mammograms have the potential to identify tumors several years before the development of physical symptoms; however, false positives and false negatives are not uncommon. Double evaluation of mammograms by two different radiologists is recommended to reduce the proportion of misdiagnoses; however, this practice increases workload and is costly and time-consuming (Brown, 1996).

To confirm detection, Fine Needle Aspiration (FNA) is used as an additional microscopic analysis, with a detection accuracy of 65-98% (Giard, 1992). A fine needle is used to extract breast tissue for pathological assessment, and a comprehensive report is provided on the cell type, including comments on malignancy. Surgical biopsy is a further diagnostic technique, with a detection accuracy of ~100%. The accuracy of visual interpretation of mammograms and FNA varies widely, so neither is the most reliable breast cancer detection method. Surgical biopsy reveals most of the malignant cases; however, this technique is invasive and expensive. Despite the availability of contemporary diagnostic procedures and advances in healthcare systems, breast cancers continue to be missed (Singh, 2007). A cancer missed during diagnosis is the most damaging and expensive kind of diagnostic error.
