Classification of Polycystic Ovary Syndrome Based on Correlation Weight Using Machine Learning

Marcelo Marreiros, Diana Ferreira, Cristiana Neto, Deden Witarsyah, José Machado
DOI: 10.4018/978-1-7998-9172-7.ch006

Abstract

Polycystic ovarian syndrome (PCOS) is the most common endocrine pathology in reproductive-age women worldwide. Research has shown that the application of machine learning (ML) and data mining (DM) can have a positive impact on the diagnosis of this condition. This study aims to develop a model to identify patients with PCOS using different scenarios based on correlation weights. Five DM techniques were applied, namely random forest (RF), decision tree (DT), naive Bayes (NB), logistic regression (LR), and artificial neural network (ANN), to determine the best model, which was the RF classifier. The results show that this model predicted PCOS with 93.06% accuracy, 92.66% precision, 93.52% sensitivity, and 92.59% specificity. Compared with previous work by the authors, feature selection based solely on the correlation weight decreased accuracy by 1.9%, precision by 3.7%, sensitivity by 0.3%, and specificity by 3.6%.

Introduction

The Stein-Leventhal syndrome, currently known as Polycystic Ovary Syndrome (PCOS), is the most common endocrine pathology in reproductive-age women around the world (Leon & Mayrin, 2020; Balen & Rajkowha, 2003). PCOS is a hormonal disorder in which about 10 small cysts, between 2 and 9 mm in diameter, develop in one or both ovaries and/or the ovarian volume of at least one ovary surpasses 10 ml (El Hayek et al., 2016). Its major features include menstrual dysfunction, anovulation, and signs of hyperandrogenism (Witchel et al., 2019). Consequently, the population at greater risk includes reproductive-age women with clinical evidence of hyperandrogenism (i.e., hirsutism, acne, or alopecia), menstrual and/or ovulatory dysfunction, polycystic ovary morphology, insulin resistance, and metabolic abnormalities or obesity (ESHRE & ASRM-Sponsored PCOS Consensus Workshop Group, 2004). PCOS affects about 5 to 15% of women worldwide, depending on the diagnostic criteria used (Leon & Mayrin, 2020). Despite such a high prevalence, many cases remain undiagnosed, and even when they are correctly diagnosed, the process is usually lengthy (Gibson-Helm et al., 2017).

In general, it is widely accepted that the diagnosis of PCOS should be based on the presence of two of the following three criteria: chronic anovulation, hyperandrogenism (clinical or biological), and polycystic ovaries (Leon & Mayrin, 2020). Nonetheless, as is to be expected of a syndrome, no single test is available to establish its diagnosis, and various disorders may present in a similar way. This makes an extensive workup necessary if clinical features suggest other causes (Azziz et al., 2009; El Hayek et al., 2016).

In today's world, more than ever before, it is increasingly easy to create and store data from many fields, and these data are accumulating at a dramatic pace (Ferreira et al., 2020). Consequently, there is a growing gap between the generation of data and human understanding of it. In the growing pool of data lies hidden, potentially useful information that is rarely made explicit or taken advantage of. Thus, one of the grand challenges of the information age is turning data into information and turning information into knowledge (Witten et al., 2005). This can be achieved through the use of Machine Learning (ML) (Zhang, 2020) and Data Mining (DM) (Hand & Adams, 2014), since the former is used to extract information from the raw data in databases and the latter is the application of specific algorithms for extracting patterns from data (Silva et al., 2018).

As health organizations generate and store large volumes of data every day, clinical decisions could be based not only on the doctor’s intuition and experience but also on hidden knowledge stored over time in healthcare databases (Silva et al., 2018). In this sense, the aim of this study is to predict whether a patient has PCOS by applying classification techniques such as Random Forest (RF), Decision Tree (DT), Naive Bayes (NB), Logistic Regression (LR), and Artificial Neural Network (ANN). This application of DM may improve operating efficiency in healthcare organizations, since the diagnosis of this syndrome can be hard to achieve. Out of the array of DM methodologies, the Cross Industry Standard Process for Data Mining (CRISP-DM) was the one applied to the problem at hand, since it is a popular methodology used for improving the efficiency and scalability of DM projects.
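
Classifiers of this kind are compared in the abstract through accuracy, precision, sensitivity, and specificity. As a minimal sketch, assuming binary labels (1 for PCOS, 0 otherwise) and toy predictions that are not taken from the study, these four measures can be derived from the confusion-matrix counts as follows. The function names are illustrative only and do not come from the chapter.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = PCOS, 0 = no PCOS)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, sensitivity, and specificity as fractions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        # Share of all patients classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted PCOS cases that are true PCOS cases.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true PCOS cases that the model detects (recall).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of non-PCOS patients correctly classified as such.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels for illustration only; they are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity alongside accuracy matters in a diagnostic setting, because a model can reach high accuracy on an imbalanced dataset while still missing many true cases.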

The paper is organized as follows: the next section presents previous studies on PCOS classification; then the CRISP-DM methodology is presented, with a detailed description of each stage; next, the results are discussed; lastly, conclusions are drawn and some ideas for future work are outlined.
