Prediction of Breast Cancer Using Extremely Randomized Clustering Forests (ERCF) Technique: Prediction of Breast Cancer

Akhil Gupta, Rohit Anand, Digvijay Pandey, Nidhi Sindhwani, Subodh Wairya, Binay Kumar Pandey, Manvinder Sharma
Copyright: © 2021 | Pages: 15
DOI: 10.4018/IJDST.287859


Breast cancer is a significant public health concern in both developed and developing countries. It accounts for almost one in three cancers diagnosed in women. Data mining and pattern recognition, applied together, have proven useful for extracting medically relevant information. This work presents the extremely randomized clustering forests (ERCF) technique, a pattern recognition technique, as a prediction model for breast cancer (BC). The accuracy achieved with ERCF is compared with that of k-NN (correlation) and k-NN (Euclidean), where k-NN refers to the k-nearest neighbours technique, and conclusions are drawn from the testing attributes. The results show that the accuracy of ERCF in predicting breast cancer is considerably higher than that of k-NN (correlation) and k-NN (Euclidean). Hence ERCF, a randomized technique for pattern classification, performs best among the techniques compared.
Article Preview

1. Introduction

Breast cancer (BC) is a common type of cancer that affects women. It strikes almost 10% of women at some stage of their lives. The causes of breast cancer are not fully known (Konneworleans, n.d.). However, researchers have identified a number of factors (called potential risk factors) that raise (or lower) the likelihood of developing breast cancer. Although this form of cancer is one of the leading causes of cancer death in women, the survival rate is considerably high. With early detection, more than 97 percent of women may live for more than 5 years (Kharya, 2012). Over the last two decades, the strong research focus on cancer has produced new and unconventional methods for early identification and investigation that help reduce the cancer death rate. In medical prognosis, survival analysis is the application of various procedures to the available data in order to estimate how long a patient affected by a disease survives over a period of time. With the growing use of technology augmented by computerised and automated tools (Delen et al., 2005), massive amounts of medical information are now being amassed and made accessible to a wide range of medical research communities, which use it to build many kinds of prediction models and improve the long-term effectiveness of medical research. As a result, emerging research directions such as knowledge discovery in databases (KDD), which also uses data mining algorithms (Delen et al., 2005), have become well-known tools for medical researchers who want to find and use the patterns and relationships across a wide range of variables in order to predict the outcome of a type of cancer from stored databases.
In recent years, data mining has become a valuable tool for extracting and manipulating data and for identifying patterns that inform effective decisions. Data mining is the process of filtering, investigating, and modelling a large amount of data to discover previously unknown consistency, uniformity, or correspondence, in order to produce effective results from the database (Kataria & Sharma, 2013). In other words, data mining refers to the automated analysis of enormous databases to find patterns that are valid, novel, useful, and understandable. Data mining has emerged as an aid that supplies answers to analysts’ problems.

Commonly used data mining approaches include decision trees (Teli & Kanikar, 2015), logistic regression (Komarek, 2004), support vector machines (Wang, 2005), k-NN (Cai et al., 2010), and artificial neural networks (Arockiaraj, 2013).

Automatic (machine) identification, interpretation, categorization, and clustering of patterns are important tasks with applications in a wide range of engineering and scientific fields, including biology, psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing, among many others (Jain et al., 2000). A common characteristic of many of these applications is that the features are not supplied by domain experts but are extracted from the acquired data. The availability of technologies with higher computing power and faster processing of huge data sets has made it possible to diversify the techniques for data analysis and classification. In many emerging applications, several approaches are combined to achieve optimal classification. For this reason, combining several sensing approaches and classifiers is common practice in pattern recognition (Jain et al., 2000).

Commonly used pattern recognition techniques include Decision Trees (Patel & Rana, 2014), Logistic Regression (Turkov et al., 2012), SVM (Wang, 2005), k-NN (Suguna & Thanushkodi, 2010), Random Forests (Ghosal, 2009), and ERCF (Moosmann et al., 2008).
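
The two k-NN baselines named in the abstract, k-NN (Euclidean) and k-NN (correlation), differ only in how they measure the distance between samples. The following is a minimal sketch of that difference and not the authors' implementation: the function names, the value k = 3, and the tiny data set are all hypothetical and invented for illustration, and correlation distance is assumed here to mean one minus the Pearson correlation coefficient. ERCF itself is not shown.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """One minus the Pearson correlation of two feature vectors (assumed definition)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm_a = math.sqrt(sum(x * x for x in da))
    norm_b = math.sqrt(sum(y * y for y in db))
    if norm_a == 0 or norm_b == 0:
        # Correlation is undefined for a constant vector; treat it as uncorrelated.
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(da, db)) / (norm_a * norm_b)


def knn_predict(train_X, train_y, query, k, distance):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, invented for illustration only (not from the article).
train_X = [
    [10.0, 20.0, 30.0], [12.0, 22.0, 33.0], [11.0, 19.0, 31.0],  # rising profile
    [3.0, 2.0, 1.0], [4.0, 3.0, 1.0], [3.0, 2.5, 2.0],           # falling profile
]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
query = [1.0, 2.0, 3.0]  # small magnitude, rising profile

print(knn_predict(train_X, train_y, query, 3, euclidean_distance))    # malignant
print(knn_predict(train_X, train_y, query, 3, correlation_distance))  # benign
```

On this toy data the two metrics disagree: Euclidean distance groups the query with samples of similar magnitude, while correlation distance groups it with samples whose features rise and fall in the same pattern. This is one reason the two variants can reach different accuracies on the same data.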
