Early-Stage Ovarian Cancer Diagnosis Using Fuzzy Rough Sets with SVM Classification

Nora Shoaip (Mansoura University, Egypt), Mohammed Mahfouz Elmogy (Faculty of Computers and Information, Mansoura University, Egypt), Alaa M. Riad (Mansoura University, Egypt), Hosam Zaghloul (Mansoura University, Egypt) and Farid A. Badria (Mansoura University, Egypt)
Copyright: © 2017 | Pages: 18
DOI: 10.4018/978-1-5225-2229-4.ch003


Ovarian cancer is one of the most dangerous cancers among women and ranks high among the cancers that cause death. Diagnosing ovarian cancer is very difficult, especially at an early stage, because most of the associated symptoms, such as difficulty eating or feeling full quickly, pelvic or abdominal pain, and bloating, are common and also occur in women who do not have ovarian cancer. The CA-125 test is used as a tumor marker: high levels could be a sign of ovarian cancer, but the test is not always reliable because not all women with ovarian cancer have high CA-125 levels. In particular, only about 20% of ovarian cancers are found at an early stage. In this paper, we try to find the most important rules for early-stage ovarian cancer diagnosis by evaluating the significance of the data relating ovarian cancer to amino acids. To this end, we propose a Fuzzy Rough feature selection with Support Vector Machine (SVM) classification model. In the pre-processing stage, we use Fuzzy Rough set theory for feature selection. In the post-processing stage, we use SVM classification, a powerful method for achieving good classification performance. Finally, we compare the output of the proposed system with other classification techniques to verify that it achieves the highest classification performance.
Chapter Preview


Ovarian cancer forms in the tissues of the ovary (National Cancer Institute, 2014). Abnormal cells of this cancer can be found in one or both ovaries and can spread to the pelvis and abdomen and then throughout the body (Mayoclinic, 2014). Each year, ovarian cancer is diagnosed in nearly a quarter of a million women worldwide and is responsible for hundreds of thousands of deaths. Only about 45% of women with ovarian cancer survive for five years, compared to 89% of women with breast cancer (World Ovarian Cancer, 2014). The risk of ovarian cancer is higher in certain groups of women: for example, women older than 50 years, women who have not given birth or have had difficulty becoming pregnant, and women who have relatives with cancer, such as breast, ovarian, colon, uterine, or cervical cancer (Centers for Disease Control and Prevention CDC, 2014).

Cancer antigen-125 (CA-125) is a protein found at higher levels in ovarian cancer cells than in normal cells. CA-125 is produced on the surface of cells and then released into the bloodstream (Johns Hopkins University, 2014). Physicians use CA-125 as a tumor marker for ovarian cancer by measuring the level of the protein in a woman’s blood. However, it is not an adequate early detection tool. High levels of CA-125 can be a sign of ovarian cancer, but the marker is neither accurate nor reliable on its own because not all women with ovarian cancer have high CA-125 levels. Ovarian cancer is difficult to diagnose in its early stages, before it has spread to other parts of the body. More than 60% of women are diagnosed at stage III or stage IV (Wikipedia, 2014a).

Medical diagnosis is highly difficult and faces two main problems. The first is that diagnosis is a classification process that must analyze many factors under difficult circumstances, such as diagnostic disparity and limited observation. The second is the uncertainty of the processed data, which affects the diagnosis process.

Among the machine learning techniques that deal efficiently with difficult data, such as incomplete, uncertain, and inconsistent data, Rough set theory (Pawlak, 1982) is used to analyze vague and uncertain data. In practice, the Rough set approach classifies discrete attributes with high accuracy, but it does not perform well with real-valued, continuous attributes. This has led to the creation of hybrid systems that integrate Rough set theory with other machine learning techniques, such as Fuzzy sets. These methods are complementary, and combining them can provide improved solutions for dealing with continuous attributes.
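
To make the discrete case concrete, the following is a minimal sketch of how classical Rough set theory brackets a concept between a lower approximation (objects that certainly belong to it) and an upper approximation (objects that possibly belong to it). The toy table, the attribute names `marker` and `pain`, and the function name `approximations` are hypothetical and chosen only for illustration; they are not taken from the chapter's data.

```python
from collections import defaultdict

def approximations(rows, attrs, target):
    """Return (lower, upper) approximations of `target`, a set of row indices,
    under the indiscernibility relation induced by the attributes in `attrs`."""
    # Group objects that share identical values on every chosen attribute.
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:      # block lies entirely inside the concept
            lower |= block
        if block & target:       # block overlaps the concept
            upper |= block
    return lower, upper

# Hypothetical toy table with discrete attributes (illustration only).
rows = [
    {"marker": "high", "pain": "yes"},  # 0: cancer
    {"marker": "high", "pain": "yes"},  # 1: healthy
    {"marker": "high", "pain": "no"},   # 2: cancer
    {"marker": "low",  "pain": "yes"},  # 3: healthy
    {"marker": "low",  "pain": "no"},   # 4: healthy
]
cancer = {0, 2}

lower, upper = approximations(rows, ["marker", "pain"], cancer)
print("certainly cancer:", sorted(lower))
print("possibly cancer: ", sorted(upper))
```

Objects 0 and 1 have identical attribute values but different outcomes, so neither can be placed in the lower approximation. A continuous attribute would first have to be discretized before it could be grouped this way, which is the limitation that motivates the hybrid approach.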

Fuzzy-rough set theory is a successful hybrid of the Rough set and the fuzzy set, proposed by Dubois and Prade, and underlies some of the most common and efficient feature selection algorithms (Dubois & Prade, 1990; Hassanien Aboul Ella et al., 2010). The fuzzy-rough set (Jensen & Shen, 2004; Jensen, 2005; Shen & Jensen, 2007; Chen et al., 2008) generalizes the lower and upper approximations of the rough set to provide greater flexibility in handling uncertainty. Compared to Rough sets, fuzzy-rough sets handle the uncertainty present in real-valued data better, without requiring any transformation such as discretization.
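
As an illustration of how a fuzzy-rough lower approximation can drive feature selection on real-valued data, here is a minimal sketch. It rests on several assumptions that the text above does not confirm: a range-normalized similarity per attribute, the minimum t-norm to combine attributes, the Łukasiewicz implicator, and a greedy forward search in the spirit of QuickReduct. The function names and the toy data are hypothetical, and this is one common formulation rather than the chapter's own procedure.

```python
def similarity(col, i, j):
    """Assumed fuzzy similarity of objects i and j on one real-valued attribute."""
    span = max(col) - min(col)
    if span == 0:
        return 1.0
    return max(0.0, 1.0 - abs(col[i] - col[j]) / span)

def dependency(X, y, attrs):
    """Fuzzy-rough dependency degree of decision y on a non-empty attribute subset."""
    n = len(X)
    cols = {a: [row[a] for row in X] for a in attrs}
    labels = set(y)
    total = 0.0
    for i in range(n):
        # Similarity of x_i to every object: minimum t-norm over the attributes.
        rel = [min(similarity(cols[a], i, j) for a in attrs) for j in range(n)]
        best = 0.0
        for c in labels:
            # Lower approximation of class c at x_i:
            # inf_j I(R(i, j), c(j)) with the Lukasiewicz implicator
            # I(a, b) = min(1, 1 - a + b).
            low = min(
                min(1.0, 1.0 - rel[j] + (1.0 if y[j] == c else 0.0))
                for j in range(n)
            )
            best = max(best, low)  # membership in the fuzzy positive region
        total += best
    return total / n

def greedy_select(X, y):
    """Add the attribute that raises the dependency most until no gain remains."""
    remaining = list(range(len(X[0])))
    chosen, best = [], 0.0
    while remaining:
        score, attr = max((dependency(X, y, chosen + [a]), a) for a in remaining)
        if score <= best:
            break
        chosen.append(attr)
        remaining.remove(attr)
        best = score
    return chosen, best

# Hypothetical toy data: attribute 0 separates the classes, attribute 1 barely does.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 5.2], [0.9, 1.1]]
y = [0, 0, 1, 1]

gamma_a = dependency(X, y, [0])
gamma_b = dependency(X, y, [1])
chosen, best = greedy_select(X, y)
print(gamma_a, gamma_b, chosen, best)
```

No discretization step appears anywhere in the sketch: the continuous values enter directly through the similarity function. The selected attribute subset could then be passed to a classifier such as an SVM.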
